Abstract
This paper proposes a pre-classification-based language identification (LID) system for Indian languages. The system first pre-classifies languages into tonal and non-tonal categories and then identifies the individual language from among the languages of the respective category. The language-discriminating ability of several acoustic features, namely pitch chroma, mel-frequency cepstral coefficients (MFCCs), and their combination, is investigated. System performance is analyzed for features extracted over different analysis units, namely syllables and utterances. The effectiveness of the deep residual network (ResNet) model for identifying Indian languages is studied, and its performance is compared with that of other deep neural network architectures, namely a convolutional neural network (CNN) model and a cascaded CNN-long short-term memory (LSTM) model, and with a shallow architecture, an artificial neural network (ANN). Experiments are carried out on the NIT Silchar language database (NITS-LD) and the OGI-Multilingual database (OGI-MLTS). Experimental analysis suggests that the proposed ResNet model, based on syllable-level features, outperforms the other models. The pre-classification module provides accuracies of 96.6%, 93.2% and 90.6% for NITS-LD, and 92.1%, 89.3% and 85.4% for OGI-MLTS, with 30 s, 10 s and 3 s test data, respectively. For NITS-LD, the pre-classification module improves system performance by 3.8%, 4.1% and 4.3% for 30 s, 10 s and 3 s test data, respectively; for OGI-MLTS, the respective improvements are 6.8%, 6.5% and 5.4%.
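The two-stage decision flow described above can be illustrated with a minimal sketch. It shows only the routing logic: a pre-classifier assigns a category, and a category-specific classifier then picks the language. All names here (`identify_language`, the toy classifiers, the placeholder language labels and the 0.5 thresholds) are hypothetical stand-ins introduced for illustration; they are not the ResNet models, features, or languages used in the paper.

```python
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]
Classifier = Callable[[Features], str]


def identify_language(
    features: Features,
    pre_classifier: Classifier,
    language_classifiers: Dict[str, Classifier],
) -> Tuple[str, str]:
    """Route features through a pre-classifier, then a per-category classifier."""
    # Stage 1: decide the broad category (e.g. "tonal" or "non_tonal").
    category = pre_classifier(features)
    if category not in language_classifiers:
        raise ValueError(f"no language classifier for category {category!r}")
    # Stage 2: identify the language among those of that category only.
    language = language_classifiers[category](features)
    return category, language


# Toy stand-ins (assumptions, not trained models): simple threshold rules.
def toy_pre_classifier(features: Features) -> str:
    return "tonal" if features[0] > 0.5 else "non_tonal"


def toy_tonal_classifier(features: Features) -> str:
    return "tonal_lang_A" if features[1] > 0.5 else "tonal_lang_B"


def toy_non_tonal_classifier(features: Features) -> str:
    return "non_tonal_lang_A" if features[1] > 0.5 else "non_tonal_lang_B"


TOY_STAGE_TWO: Dict[str, Classifier] = {
    "tonal": toy_tonal_classifier,
    "non_tonal": toy_non_tonal_classifier,
}

if __name__ == "__main__":
    print(identify_language([0.9, 0.2], toy_pre_classifier, TOY_STAGE_TWO))
```

In such a design, the second-stage classifier only has to separate languages within one category, which is one plausible reason a pre-classification step could help; an error in the first stage, however, cannot be recovered in the second.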
