Abstract
The rise of user-generated content on social media has led to increasing instances of toxic speech—such as hate, abuse, and cyberbullying—which threaten online safety and integrity. Manual moderation struggles to cope with the vast content volume and the mental burden it places on human moderators, highlighting the urgent need for automated detection systems. However, many existing approaches require large labeled datasets, which are costly and difficult to obtain. To address this challenge, we propose U-GIFT (Uncertainty-Guided Few-Shot Detection of Toxic Speech), a self-training framework that excels in low-resource settings. U-GIFT integrates Bayesian neural networks with an uncertainty-guided sample selection strategy to identify high-confidence pseudo-labeled samples from unlabeled data. This strategy improves detection accuracy even with minimal supervision. Experiments across multiple benchmarks show that U-GIFT outperforms strong baselines, achieving a 14.92% gain in the 5-shot setting. The method can be applied to different pre-trained language models, is robust to sample imbalance, and generalizes well across domains. By reducing dependence on annotated data, U-GIFT offers an effective, scalable solution for toxic speech detection and supports safer online interactions through improved automated moderation.
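The core idea of uncertainty-guided pseudo-label selection can be illustrated with a short sketch. The snippet below is a minimal, hypothetical illustration (not the authors' released code): it assumes uncertainty is estimated via Monte Carlo dropout-style stochastic forward passes, whose softmax outputs are aggregated so that only low-variance (high-confidence) predictions are kept as pseudo-labels for self-training. The function names and the variance threshold are illustrative assumptions.

```python
import numpy as np

def select_pseudo_labels(probs, var_threshold=0.02):
    """Select high-confidence pseudo-labels from stochastic predictions.

    probs: array of shape (T, N, C) holding T stochastic softmax outputs
           (e.g., from MC-dropout forward passes) for N unlabeled samples
           over C classes. Both the aggregation and threshold are
           illustrative, not the paper's exact criterion.
    """
    mean = probs.mean(axis=0)                  # predictive mean per sample
    uncertainty = probs.var(axis=0).mean(axis=1)  # mean predictive variance
    labels = mean.argmax(axis=1)               # pseudo-label = argmax of mean
    keep = uncertainty < var_threshold         # keep only confident samples
    return labels[keep], np.flatnonzero(keep)

# Toy example: 4 stochastic passes, 3 unlabeled samples, 2 classes.
probs = np.array([
    [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]],
    [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8]],
    [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]],
    [[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]],
])
labels, idx = select_pseudo_labels(probs)
# Sample 0 (stable, class 0) and sample 2 (stable, class 1) are kept;
# sample 1's predictions fluctuate too much and are discarded.
```

In a full self-training loop, the retained `(idx, labels)` pairs would be added to the labeled pool and the classifier retrained, repeating until convergence.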