Abstract
The rise of user-generated content on social media has led to increasing instances of toxic speech—such as hate, abuse, and cyberbullying—which threaten online safety and integrity. Manual moderation struggles to cope with the vast content volume and the mental burden it places on human moderators, highlighting the urgent need for automated detection systems. However, many existing approaches require large labeled datasets, which are costly and difficult to obtain. To address this challenge, we propose U-GIFT (Uncertainty-Guided Few-Shot Detection of Toxic Speech), a self-training framework that excels in low-resource settings. U-GIFT integrates Bayesian neural networks with an uncertainty-guided sample selection strategy to identify high-confidence pseudo-labeled samples from unlabeled data. This strategy improves detection accuracy even with minimal supervision. Experiments across multiple benchmarks show that U-GIFT outperforms strong baselines, achieving a 14.92% gain in the 5-shot setting. The method can be applied to different pre-trained language models, is robust to sample imbalance, and generalizes well across domains. By reducing dependence on annotated data, U-GIFT offers an effective, scalable solution for toxic speech detection and supports safer online interactions through improved automated moderation.
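The core idea of uncertainty-guided pseudo-label selection can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration (not the authors' released code): it assumes uncertainty is estimated via Monte Carlo dropout-style stochastic forward passes, whose softmax outputs are aggregated so that only low-variance (high-confidence) predictions are kept as pseudo-labels for self-training. The function names and the variance threshold are illustrative assumptions.

```python
import numpy as np

def select_pseudo_labels(probs, var_threshold=0.02):
    """Select high-confidence pseudo-labels from stochastic predictions.

    probs: array of shape (T, N, C) holding T stochastic softmax outputs
           (e.g., from MC-dropout forward passes) for N unlabeled samples
           over C classes. Both the aggregation and threshold are
           illustrative, not the paper's exact criterion.
    """
    mean = probs.mean(axis=0)                  # predictive mean per sample
    uncertainty = probs.var(axis=0).mean(axis=1)  # mean predictive variance
    labels = mean.argmax(axis=1)               # pseudo-label = argmax of mean
    keep = uncertainty < var_threshold         # keep only confident samples
    return labels[keep], np.flatnonzero(keep)

# Toy example: 4 stochastic passes, 3 unlabeled samples, 2 classes.
probs = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
    [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8]],
    [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]],
    [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]],
])
labels, idx = select_pseudo_labels(probs)
# Sample 0 (stable, class 0) and sample 2 (stable, class 1) are kept;
# sample 1's predictions fluctuate too much and are discarded.
```

In a full self-training loop, the retained `(idx, labels)` pairs would be added to the labeled pool and the classifier retrained, repeating until convergence.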