Abstract
Audio sentiment analysis is pivotal for discerning nuances in spoken communication, with applications in fields such as customer support and healthcare. This paper investigates recent methodologies and technological advancements in audio sentiment analysis, assessing the primary techniques used to recognize and interpret emotions within audio signals. By drawing on diverse multilingual datasets, the study demonstrates versatility and applicability across languages. The proposed emotion classification model, which combines an LSTM and a CNN with logistic regression, achieves an accuracy of 93.33%. By leveraging LSTM networks, VGGish features processed through a convolutional neural network, and logistic regression as a stacking meta-learner, the model offers a rich framework for analyzing emotional content in audio recordings. Future work will focus on applying the model in practice, for example to enhance user experience in virtual assistants, improve mental health monitoring systems, and integrate emotion recognition into everyday communication tools. Key challenges, including data diversity and model robustness, are discussed, along with emerging trends and directions for future research. The study thus provides a comprehensive view of the current field and identifies promising avenues for further development in audio sentiment analysis.
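To make the described architecture concrete, the following is a minimal sketch of one plausible realization of the stacked ensemble: an LSTM branch over frame-level features, a CNN branch over VGGish embeddings, and a logistic regression meta-learner fit on the concatenated branch probabilities. The input shapes, layer sizes, and the six-class emotion label space are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of an (LSTM + CNN) -> logistic regression stack.
# All shapes and hyperparameters below are assumptions for illustration.
import numpy as np
import tensorflow as tf
from sklearn.linear_model import LogisticRegression

NUM_CLASSES = 6  # assumed number of emotion categories

def build_lstm_branch(time_steps=100, n_mfcc=40):
    # LSTM over frame-level acoustic features (e.g., MFCC sequences).
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(time_steps, n_mfcc)),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_cnn_branch(n_patches=10, vggish_dim=128):
    # CNN over a sequence of 128-dimensional VGGish embeddings,
    # treated as a single-channel 2-D map.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_patches, vggish_dim, 1)),
        tf.keras.layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def fit_stacking_classifier(lstm_model, cnn_model, X_seq, X_vggish, y):
    # Concatenate each branch's predicted class probabilities and
    # train the logistic regression meta-learner on them.
    features = np.hstack([lstm_model.predict(X_seq),
                          cnn_model.predict(X_vggish)])
    meta = LogisticRegression(max_iter=1000)
    meta.fit(features, y)
    return meta
```

In a stack of this kind, both branches would first be trained (or fine-tuned) on the emotion labels, and the meta-learner would then be fit on held-out branch predictions to avoid leaking training data into the stacking stage.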
