Abstract
Increased depth is essential to the success of deep neural networks, but it also makes them harder to train. In light of this, the authors propose a novel multi-layer LSTM model, Highway-DC, which introduces Highway Networks (Highway) into Densely Connected Bi-LSTM (DC-Bi-LSTM), where the representation of each layer is the concatenation of its own output and the outputs of all preceding layers. Highway is applied to control how much of each layer's input or output in DC-Bi-LSTM flows to the next layer. However, the results reveal that Highway-DC shows no improvement over DC-Bi-LSTM, so an extended version of Highway, named Highway II, is proposed: it eliminates the multiplicative connection between the transform gate and the output in Highway, thereby preserving the learning of each layer. The resulting model is named Highway II-DC. Evaluated on seven benchmark text-classification datasets against DC-Bi-LSTM and other state-of-the-art approaches, Highway II-DC shows promising performance, achieving state-of-the-art results on three datasets and surpassing DC-Bi-LSTM on six datasets while converging faster. Moreover, it still benefits from increased depth up to 30 layers, whereas DC-Bi-LSTM saturates early at a depth of 15.
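The gating change that distinguishes Highway II can be sketched numerically. The standard Highway layer computes y = T(x) ⊙ H(x) + (1 − T(x)) ⊙ x; one plausible reading of the modification described above is that the transform gate no longer multiplies the layer's output H(x), so the learned transformation always passes through and the gate only scales the carried input. The minimal NumPy sketch below illustrates both variants; the function and parameter names (`highway`, `highway_ii`, `W_h`, `W_t`, etc.) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def highway(x, W_h, b_h, W_t, b_t):
    # Standard Highway layer: y = T(x) * H(x) + (1 - T(x)) * x
    h = np.tanh(x @ W_h + b_h)   # candidate transformation H(x)
    t = sigmoid(x @ W_t + b_t)   # transform gate T(x)
    return t * h + (1.0 - t) * x

def highway_ii(x, W_h, b_h, W_t, b_t):
    # Highway II (as read from the abstract): the gate no longer
    # multiplies H(x), so each layer's learned transformation is
    # preserved; the gate only scales the carried input x.
    h = np.tanh(x @ W_h + b_h)
    t = sigmoid(x @ W_t + b_t)
    return h + (1.0 - t) * x
```

When the gate saturates toward 1, both variants reduce to the plain transformation H(x); they differ when the gate is partially open, where Highway II keeps H(x) at full strength instead of attenuating it.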
