Abstract
Rhythmic stress detection is an important but difficult problem in speech recognition. This paper describes an approach to the automatic detection of rhythmic stress in New Zealand spoken English. The approach uses a linear genetic programming system, with speaker-independent prosodic features and vowel quality features as terminals, to classify each vowel segment as stressed or unstressed. In addition to the four standard arithmetic operators, the function set includes other functions, such as trigonometric and conditional functions, to cope with the complexity of the task. The error rate on the training set is used as the fitness function. The approach is compared with a decision tree approach and a support vector machine approach on a speech data set of 703 vowels segmented from 60 utterances by adult female speakers. The genetic programming approach achieved a maximum average accuracy of 92.6%. The results suggest that, on this data set, the genetic programming approach outperforms the decision tree and support vector machine approaches for stress detection in terms of detection accuracy, the ability to handle redundant features, and automatic feature selection capability.
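The fitness function mentioned above, the error rate on the training set, can be illustrated with a minimal sketch. This is not the paper's implementation: the names `error_rate` and `example_program`, the three-element feature vector, and the rule that a positive output means "stressed" are all assumptions made for illustration. The example program only shows the kinds of functions the abstract names (arithmetic, trigonometric, conditional).

```python
import math
from typing import Callable, Sequence

# Hypothetical type: an evolved program maps a feature vector to a real number.
Program = Callable[[Sequence[float]], float]


def error_rate(program: Program,
               features: Sequence[Sequence[float]],
               labels: Sequence[bool]) -> float:
    """Fraction of training vowels the program misclassifies (lower is fitter).

    Assumption: an output greater than 0 is read as "stressed".
    """
    errors = sum(
        1 for x, is_stressed in zip(features, labels)
        if (program(x) > 0.0) != is_stressed
    )
    return errors / len(labels)


def example_program(x: Sequence[float]) -> float:
    """Hand-written stand-in for an evolved program (illustrative only)."""
    r = math.sin(x[0]) + x[1]           # trigonometric and arithmetic functions
    return r if x[2] > 0 else r - 1.0   # conditional function


# Toy data, invented for this sketch; not taken from the paper's data set.
toy_features = [(0.0, 1.0, 1.0), (0.0, -1.0, 1.0), (0.0, 0.5, -1.0)]
toy_labels = [True, False, True]
fitness = error_rate(example_program, toy_features, toy_labels)
```

Under this sketch, a genetic programming run would prefer programs with a lower `error_rate` on the training vowels.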
