Abstract
Attention mechanisms are widely used in NLP tasks and show strong performance in modeling local and global dependencies. The directional self-attention network (DSAN) achieves competitive performance on various datasets, but it does not consider the reverse information of a sentence. In this paper, we propose the Multiway Dynamic Mask Attention Network (MDMAN). The model consists of two modules: a dynamic mask selector and a multi-attention encoder. The dynamic mask selector chooses high-quality reverse information via reinforcement learning and feeds it to the multi-attention encoder; the multi-attention encoder uses four attention functions to match each word against the other words in the same sentence at different token levels, then combines the information from all four functions to obtain the final representation. Experiments on two publicly available NLI datasets show that MDMAN achieves significant improvements over DSAN.
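To make the multi-attention encoder concrete, below is a minimal PyTorch sketch of one way to combine four intra-sentence attention functions. The abstract only states that four attention functions score token pairs and that their outputs are combined; the specific functions chosen here (dot-product, scaled dot-product, bilinear, and additive) and the concatenate-then-project fusion are illustrative assumptions, not the paper's confirmed design.

```python
import torch
import torch.nn as nn


class MultiAttentionEncoder(nn.Module):
    """Hypothetical encoder: four attention views over one sentence, fused."""

    def __init__(self, d: int):
        super().__init__()
        self.d = d
        self.W_bil = nn.Linear(d, d, bias=False)   # bilinear scoring matrix
        self.W_add = nn.Linear(2 * d, d)           # additive (Bahdanau-style) score
        self.v_add = nn.Linear(d, 1, bias=False)
        self.fuse = nn.Linear(4 * d, d)            # combine the four views

    @staticmethod
    def _attend(scores: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        # scores: (batch, len, len); x: (batch, len, d)
        return torch.softmax(scores, dim=-1) @ x

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, len, d) token representations of one sentence
        dot = x @ x.transpose(1, 2)                # dot-product scores
        scaled = dot / self.d ** 0.5               # scaled dot-product scores
        bil = self.W_bil(x) @ x.transpose(1, 2)    # bilinear scores
        # additive scores: score(i, j) = v^T tanh(W [x_i; x_j])
        n = x.size(1)
        pairs = torch.cat(
            [x.unsqueeze(2).expand(-1, -1, n, -1),
             x.unsqueeze(1).expand(-1, n, -1, -1)], dim=-1)
        add = self.v_add(torch.tanh(self.W_add(pairs))).squeeze(-1)
        views = [self._attend(s, x) for s in (dot, scaled, bil, add)]
        return self.fuse(torch.cat(views, dim=-1))  # (batch, len, d)


# Usage: encode a batch of 2 sentences, 10 tokens each, dimension 64.
enc = MultiAttentionEncoder(d=64)
out = enc(torch.randn(2, 10, 64))                  # -> shape (2, 10, 64)
```

A reverse-information mask from the dynamic mask selector would plausibly be applied to the score matrices before the softmax; that step is omitted here since the abstract does not specify its form.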
