Abstract
Attention mechanisms are widely used in NLP tasks and show strong performance in modeling local and global dependencies. The directional self-attention network (DSAN) achieves competitive performance on various datasets, but it does not consider the reverse information of a sentence. In this paper, we propose the Multiway Dynamic Mask Attention Network (MDMAN). The model consists of two modules: a dynamic mask selector and a multi-attention encoder. The dynamic mask selector chooses high-quality reverse information via reinforcement learning and feeds it to the multi-attention encoder; the multi-attention encoder uses four attention functions to match each word against the other words in the same sentence at different token levels, then combines the information from all four functions to obtain the final representation. Experiments on two publicly available NLI datasets show that MDMAN achieves significant improvements over DSAN.
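To make the multi-attention encoder concrete, below is a minimal PyTorch sketch of one way to combine four intra-sentence attention functions. The abstract only states that four attention functions score token pairs and that their outputs are combined; the specific functions chosen here (dot-product, scaled dot-product, bilinear, and additive) and the concatenate-then-project fusion are illustrative assumptions, not the paper's confirmed design.

```python
import torch
import torch.nn as nn


class MultiAttentionEncoder(nn.Module):
    """Hypothetical encoder: four attention views over one sentence, fused."""

    def __init__(self, d: int):
        super().__init__()
        self.d = d
        self.W_bil = nn.Linear(d, d, bias=False)   # bilinear scoring matrix
        self.W_add = nn.Linear(2 * d, d)           # additive (Bahdanau-style) score
        self.v_add = nn.Linear(d, 1, bias=False)
        self.fuse = nn.Linear(4 * d, d)            # combine the four views

    @staticmethod
    def _attend(scores: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # scores: (batch, len, len); x: (batch, len, d)
        return torch.softmax(scores, dim=-1) @ x

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, len, d) token representations of one sentence
        dot = x @ x.transpose(1, 2)                # dot-product scores
        scaled = dot / self.d ** 0.5               # scaled dot-product scores
        bil = self.W_bil(x) @ x.transpose(1, 2)    # bilinear scores
        # additive scores: score(i, j) = v^T tanh(W [x_i; x_j])
        n = x.size(1)
        pairs = torch.cat(
            [x.unsqueeze(2).expand(-1, -1, n, -1),
             x.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        add = self.v_add(torch.tanh(self.W_add(pairs))).squeeze(-1)
        views = [self._attend(s, x) for s in (dot, scaled, bil, add)]
        return self.fuse(torch.cat(views, dim=-1))  # (batch, len, d)


# Usage: encode a batch of 2 sentences, 10 tokens each, dimension 64.
enc = MultiAttentionEncoder(d=64)
out = enc(torch.randn(2, 10, 64))                  # -> shape (2, 10, 64)
```

A reverse-information mask from the dynamic mask selector would plausibly be applied to the score matrices before the softmax; that step is omitted here since the abstract does not specify its form.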
