
Abstract

Tamil is one of the world's oldest classical languages still in use, with a rich and extensive literary tradition dating back over 2,000 years. Its literature addresses many aspects of life, including love, war, social values and religion. Classical Tamil literature encodes human emotions through dense metaphor, symbolism, and cultural convention, which poses significant challenges for automatic emotion analysis. This research investigates the classification of melancholic emotions in Kuruntokai, a Sangam-era Tamil poetic anthology, focusing on two dominant affective categories: Lamentation and Consolation. A manually annotated dataset of 401 poems, together with their explanatory prose (urai), is used to evaluate classical machine learning models, recurrent neural networks, and a fine-tuned multilingual BERT (mBERT) model. To address the linguistic complexity of classical Tamil, the framework incorporates morphological analysis, a word reformation algorithm tailored to poetic constructs, and subword-level tokenization. Experimental results show that Support Vector Machines perform best among the classical classifiers, while the fine-tuned mBERT model performs best overall, attaining an accuracy of 78% on urai-based classification. Quantitative analysis, supported by statistical significance tests and confidence intervals, indicates that the explanatory prose provides richer emotional cues than the original poems. Qualitative error analysis further shows how metaphorical compression in the poems leads to misclassification, and how the urai resolves the resulting ambiguity. The findings highlight the effectiveness of transformer-based models for emotion classification in classical Tamil literature and underscore the importance of explanatory prose for reliable affective modelling.
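
The role of confidence intervals in reading the reported accuracy can be illustrated with a minimal sketch. The abstract does not state the evaluation set size or the interval method used, so both are assumptions here: the Wilson score interval is one common choice for a proportion, and the counts (62 correct out of 80 held-out items, about 78%) are hypothetical values chosen only to resemble the reported figure.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a classification accuracy.

    z = 1.96 corresponds to roughly 95% coverage. This is an illustrative
    choice of method, not necessarily the one used in the study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must lie between 0 and total")

    p = correct / total                      # observed accuracy
    denom = 1 + z ** 2 / total               # shared normalising term
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = (
        z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical counts: the test-set size is NOT given in the abstract.
HYPOTHETICAL_CORRECT = 62
HYPOTHETICAL_TOTAL = 80

low, high = wilson_interval(HYPOTHETICAL_CORRECT, HYPOTHETICAL_TOTAL)
accuracy = HYPOTHETICAL_CORRECT / HYPOTHETICAL_TOTAL
print(f"accuracy = {accuracy:.3f}, approx. 95% CI = [{low:.3f}, {high:.3f}]")
```

With an evaluation set of this assumed size the interval is wide, which is why a comparison between poem-based and urai-based classifiers on a 401-poem corpus benefits from interval estimates and significance tests rather than point accuracies alone.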

Keywords

Tamil literature, natural language processing, text classification, emotions

Fine-Tuning Multilingual BERT for Melancholic Emotion Detection in Kuruntokai
