Abstract
This study addresses Chinese pronunciation errors made by native Uyghur speakers by proposing a phoneme confusion modeling framework that integrates cross-linguistic phonological comparison with data-driven methods. Based on speech transfer theory, a corpus of 52,510 standard and non-standard pronunciations (total duration 167.2 hours) was constructed. The dynamic time warping (DTW) algorithm was used to align phoneme sequences, and Apriori association rule mining was used to uncover high-frequency confusion patterns. The results show that phonological differences between the two languages, such as the lack of retroflex consonants and the simplification of compound vowels, cause systematic errors, for example the high-frequency affricate substitution “ch→q” (62.5%) and the compound vowel substitution “uo→o/u” (76.1%). The data-driven methods further validated the phonological hypotheses and identified context-dependent phenomena (such as the medial weakening “ian→an”). The confusion rule base proved robust, achieving an F1-score of 0.87. The quantitative model based on the confusion matrix can be integrated into intelligent speech assessment systems to provide real-time corrective feedback (e.g., “‘uo’ misread as ‘o’, probability 76.1%”). The study thus provides an interpretable rule base and a dynamic alignment technique as a foundation for cross-language pronunciation error correction algorithms, with theoretical and practical significance for optimizing computer-assisted language teaching systems.
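As a rough illustration of how aligned phoneme sequences can yield confusion probabilities and feedback messages of the kind described above, the following minimal Python sketch may help. It is not the authors' implementation: a symbolic edit-distance alignment with unit costs stands in for the DTW step, the Apriori mining step is omitted, and the function names (`align`, `confusion_probabilities`, `feedback`) and the toy utterances are hypothetical.

```python
from collections import Counter, defaultdict


def align(ref, hyp):
    """Align two phoneme sequences with a unit-cost dynamic program.

    Assumption: a symbolic edit-distance alignment used here as a
    simplified stand-in for the DTW alignment named in the abstract.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimal cost of aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(match, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the end to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        diag = i > 0 and j > 0 and (
            cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
        )
        if diag:
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # phoneme dropped by the speaker
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # phoneme inserted by the speaker
            j -= 1
    pairs.reverse()
    return pairs


def confusion_probabilities(utterances):
    """Estimate P(produced | target) from (reference, produced) sequence pairs."""
    counts = defaultdict(Counter)
    for ref, hyp in utterances:
        for target, produced in align(ref, hyp):
            if target is not None and produced is not None:
                counts[target][produced] += 1
    return {
        target: {p: c / sum(row.values()) for p, c in row.items()}
        for target, row in counts.items()
    }


def feedback(probs, target, produced):
    """Format a corrective message from the estimated confusion probability."""
    p = probs.get(target, {}).get(produced, 0.0)
    return f"'{target}' misread as '{produced}', probability {p:.1%}"


# Toy data (hypothetical, not drawn from the study's corpus).
utterances = [
    (["ch", "i"], ["q", "i"]),
    (["ch", "a"], ["ch", "a"]),
    (["d", "uo"], ["d", "o"]),
    (["g", "uo"], ["g", "uo"]),
]
probs = confusion_probabilities(utterances)
print(feedback(probs, "uo", "o"))
```

Each row of `probs` is one normalized row of a confusion matrix, so the same table that supports rule mining can also drive the feedback string. A production system would presumably align acoustic features rather than symbols and would estimate the probabilities from the full corpus.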
