Abstract
This paper presents a technique to detect the six affective states of an individual using audio cues. Bi-spectral features extracted from the entire speech signal and from the voiced part of speech are used to create feature vectors. For classification, K-Nearest Neighbor (KNN) and Simple Logistic (SL) classifiers are used. The eNTERFACE audio-visual emotional speech corpus, which covers six archetypal affective states (fear, anger, disgust, sadness, happiness, and surprise), is considered. The performance of the system is analyzed for features obtained from the voiced part of speech versus features obtained from the entire speech signal. The proposed work is the first of its kind in affect computation in which a compact 13-dimensional bi-spectral feature vector extracted from the voiced speech segments yields promising performance. Compared to existing approaches evaluated on emotion samples from the same speech corpus, the proposed methodology achieves a considerable improvement of 8.46%–27.6% in recognition rate, adding novelty to the proposed work.
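To make the described pipeline concrete, the following is a minimal Python sketch of this style of processing: frame the signal, keep only voiced frames, accumulate a direct (FFT-based) bispectrum estimate, reduce it to a 13-dimensional summary, and feed the result to a KNN classifier. The frame length, hop size, energy-based voiced/unvoiced decision, and radial-binning reduction are all illustrative assumptions, not the paper's exact feature definition.

```python
# Illustrative sketch (not the authors' code) of bispectrum-based
# emotion features from voiced speech frames, classified with KNN.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def frame_signal(x, frame_len=512, hop=256):
    """Split a 1-D signal into overlapping frames (pads a short signal)."""
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n = (len(x) - frame_len) // hop + 1
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def is_voiced(frame, energy_thresh=1e-3):
    """Crude voiced/unvoiced decision via short-time energy (assumption)."""
    return np.mean(frame ** 2) > energy_thresh

def bispectrum_features(x, frame_len=512, hop=256, n_feat=13):
    """Average the bispectrum magnitude over voiced frames, then reduce
    the 2-D estimate to n_feat radial-band averages (assumption)."""
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    idx = np.arange(frame_len)
    window = np.hanning(frame_len)
    acc = np.zeros((frame_len, frame_len))
    count = 0
    for f in frames:
        if not is_voiced(f):
            continue
        F = np.fft.fft(f * window)
        # Direct bispectrum estimate: B(f1, f2) = X(f1) X(f2) X*(f1 + f2),
        # with frequency indices taken modulo frame_len (FFT periodicity).
        B = F[:, None] * F[None, :] \
            * np.conj(F[(idx[:, None] + idx[None, :]) % frame_len])
        acc += np.abs(B)
        count += 1
    if count == 0:  # no voiced frames found
        return np.zeros(n_feat)
    acc /= count
    # Collapse the 2-D bispectrum into n_feat radial-band averages.
    fx, fy = np.meshgrid(idx, idx)
    r = np.hypot(fx, fy)
    bins = np.linspace(0.0, r.max() + 1e-9, n_feat + 1)
    return np.array([acc[(r >= bins[k]) & (r < bins[k + 1])].mean()
                     for k in range(n_feat)])

# Usage: rows of X_train are 13-D feature vectors, y_train the six
# emotion labels; the paper's SL classifier would be swapped in similarly.
# knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# pred = knn.predict([bispectrum_features(test_signal)])
```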
