Abstract
Increased depth is essential to the success of deep neural networks, but it also makes them harder to train. In light of this, the authors propose a novel multi-layer LSTM model, Highway-DC, which introduces Highway Networks (Highway) into Densely Connected Bi-LSTM (DC-Bi-LSTM), where the representation of each layer is the concatenation of its own output and the outputs of all preceding layers. Highway is applied to control how much of each layer's input or output in DC-Bi-LSTM flows to the next layer. However, the results reveal that Highway-DC shows no improvement over DC-Bi-LSTM, so an extended version of Highway, named Highway II, is proposed: it eliminates the multiplicative connection between the transform gate and the output in Highway, thereby preserving the learning of each layer. The resulting model is named Highway II-DC. Evaluated on seven benchmark text-classification datasets against DC-Bi-LSTM and other state-of-the-art approaches, Highway II-DC shows promising performance, achieving state-of-the-art results on three datasets and surpassing DC-Bi-LSTM on six datasets while converging faster. Moreover, it still benefits from increased depth up to 30 layers, whereas DC-Bi-LSTM saturates early at a depth of 15.
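The gating change that distinguishes Highway II can be sketched numerically. The standard Highway layer computes y = T(x) ⊙ H(x) + (1 − T(x)) ⊙ x; one plausible reading of the modification described above is that the transform gate no longer multiplies the layer's output H(x), so the learned transformation always passes through and the gate only scales the carried input. The minimal NumPy sketch below illustrates both variants; the function and parameter names (`highway`, `highway_ii`, `W_h`, `W_t`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway(x, W_h, b_h, W_t, b_t):
    # Standard Highway layer: y = T(x) * H(x) + (1 - T(x)) * x
    h = np.tanh(x @ W_h + b_h)   # candidate transformation H(x)
    t = sigmoid(x @ W_t + b_t)   # transform gate T(x)
    return t * h + (1.0 - t) * x

def highway_ii(x, W_h, b_h, W_t, b_t):
    # Highway II (as read from the abstract): the gate no longer
    # multiplies H(x), so each layer's learned transformation is
    # preserved; the gate only scales the carried input x.
    h = np.tanh(x @ W_h + b_h)
    t = sigmoid(x @ W_t + b_t)
    return h + (1.0 - t) * x
```

When the gate saturates toward 1, both variants reduce to the plain transformation H(x); they differ when the gate is partially open, where Highway II keeps H(x) at full strength instead of attenuating it.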
