Abstract
Speech analysis for extracting attributes such as the speaker, gender, accent and like has been a field of great interest and has been widely studied. The paper presents a novel architecture for accent identification by using a cascade of two deep-learning architecture. We design and test our proposed architecture on common voice dataset. The architecture consists of a cascade of Convolutional Neural Network (CNN) and Convolutional Recurrent Neural Network (CRNN). It is trained on Mel-spectrogram of the audios. We consider five of the most popular English accents groups namely India, Australia, US, England, Canada in this study. The proposed model has an accuracy of 78.48% using CNN and 83.21% using CRNN.
Get full access to this article
View all access options for this article.
