A Novel Stacked Ensemble Framework for Sentiment Classification in Kashmiri

Abstract

Sentiment analysis is a fundamental Natural Language Processing (NLP) task because it allows systems to group texts by the subjective opinions they express. Although progress has been made in multilingual sentiment classification, the Kashmiri language remains underrepresented due to the absence of labeled datasets and pre-trained models. This study presents EnsembleSenti-Kash, a stacked ensemble learning system built for Kashmiri sentiment analysis. To address the lack of existing resources, a labeled Kashmiri sentiment dataset was manually developed for this study, along with a custom stopword file to enhance text preprocessing. TF-IDF vectorization was employed for feature extraction. The proposed model combines Support Vector Machine (SVM), Random Forest (RF), XGBoost, and Logistic Regression (LR) as base classifiers, with a Logistic Regression classifier as the meta-classifier. In addition to the ensemble model, individual classifiers (SVM, RF, LR) were trained and evaluated, achieving accuracies of 93.1%, 91.5%, and 92.75%, respectively. The ensemble model outperformed all individual classifiers, with an overall accuracy of 93.35%. To strengthen the analysis, deep learning models including Long Short-Term Memory (LSTM) and multilingual BERT (mBERT) were also evaluated as baselines. A real-time usability study reporting model size, number of parameters, and inference time showed that the proposed model offers a practical balance between accuracy and efficiency. The performance of all models under study was also confirmed through AUC-ROC evaluation. This research lays important groundwork for Kashmiri NLP by providing a new dataset and demonstrating an effective sentiment classification approach.
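The stacked architecture described in the abstract can be illustrated with a minimal sketch, assuming scikit-learn. This is not the authors' implementation: the function name `build_stacked_model`, all hyperparameters, and the toy English placeholder texts are hypothetical, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so that the example needs only one library.

```python
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_stacked_model(stop_words=None, seed=0):
    """TF-IDF features -> four base classifiers -> LR meta-classifier.

    stop_words: optional list of tokens to drop, standing in for the
    custom stopword file mentioned in the abstract (assumption).
    """
    base_learners = [
        ("svm", SVC(kernel="linear", random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=seed)),
        # Stand-in for XGBoost (assumption, to avoid an extra dependency).
        ("gb", GradientBoostingClassifier(n_estimators=50, random_state=seed)),
        ("lr", LogisticRegression(max_iter=1000)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,  # out-of-fold base predictions are used to train the meta-classifier
    )
    return make_pipeline(TfidfVectorizer(stop_words=stop_words), stack)


# Toy placeholder data (NOT the Kashmiri dataset): 1 = positive, 0 = negative.
texts = [
    "good great happy", "great lovely fine", "happy good lovely",
    "fine great good", "lovely happy fine", "good fine lovely",
    "bad awful sad", "awful poor terrible", "sad bad poor",
    "terrible awful bad", "poor sad terrible", "bad terrible poor",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

model = build_stacked_model()
model.fit(texts, labels)
predictions = model.predict(texts)
```

With `cv=3`, each base classifier's predictions on held-out folds become the meta-classifier's training features, which keeps the Logistic Regression meta-classifier from simply memorizing base-model outputs on data they were fitted to.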

Keywords

sentiment analysis; natural language processing; stacked ensemble learning; TF-IDF; feature extraction; LSTM; mBERT; real-time usability

