Computer Aided Qur’an Pronunciation using DNN

Abstract

This paper presents a system for improving pronunciation error detection and correction in Qur’an recitation by non-Arabic speakers. Most classical speech recognition systems combine the Hidden Markov Model (HMM) with a Gaussian Mixture Model (GMM). This paper attempts to improve on the GMM-HMM model by replacing the mixture of Gaussians with a Deep Neural Network (DNN), with the aim of solving some pronunciation problems and enhancing the overall performance of the recognition system. Most of the work lies in collecting and processing speakers’ data, and in building and evaluating the baseline GMM system and the proposed DNN acoustic models for the Qur’an recitation framework. The DNN-HMM model outperforms the GMM-HMM model by 1.02% according to HTK’s word accuracy equation. On insertion errors, DNN-HMM improves on GMM-HMM by 2.59%. On substitution errors, DNN-HMM improves recognition of the frequently confused phonemes DAA by 15.09% and DHA by 17.28%. All experiments and results are presented and discussed in detail.
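For readers unfamiliar with the HTK word accuracy equation mentioned above, the following is a minimal sketch of how HTK's scoring is commonly defined: percent correct is (N - D - S) / N and accuracy is (N - D - S - I) / N, where N is the number of reference labels and D, S, and I are the deletion, substitution, and insertion counts. The function name and the counts in the usage line are hypothetical and are not taken from the paper's experiments.

```python
def htk_word_scores(n_ref, deletions, substitutions, insertions):
    """Return (percent_correct, accuracy) in the style of HTK scoring.

    n_ref         -- number of labels in the reference transcription (N)
    deletions     -- deletion errors (D)
    substitutions -- substitution errors (S)
    insertions    -- insertion errors (I)
    """
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    hits = n_ref - deletions - substitutions
    # Percent correct ignores insertions; accuracy also penalizes them.
    percent_correct = 100.0 * hits / n_ref
    accuracy = 100.0 * (hits - insertions) / n_ref
    return percent_correct, accuracy


# Hypothetical counts, for illustration only (not results from the paper).
correct, accuracy = htk_word_scores(n_ref=1000, deletions=20,
                                    substitutions=50, insertions=30)
print(f"Correct = {correct:.2f}%, Accuracy = {accuracy:.2f}%")
```

Because insertions appear only in the accuracy term, a model that reduces insertion errors can raise accuracy without changing percent correct, which is why the two error types are worth reporting separately.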

Keywords

Computer Aided Language Pronunciation, Hidden Markov Model, Automatic Speech Recognition, Deep Neural Network

