Abstract
Current social media platforms face two major issues: declining credibility due to the proliferation of artificial intelligence–generated content (AIGC) and contamination by human-generated implicitly toxic content. We propose an end-to-end deep learning method for identifying both AIGC and implicitly toxic content. The method integrates semantic features with long-distance textual dependency features, thereby improving recognition accuracy. We constructed two datasets comprising 78,798 messages across 54 topics collected from social media platforms. Experimental results demonstrate that the proposed method identifies 98.25% of AIGC and 98.06% of human-generated implicitly toxic content. In identifying AIGC, its accuracy is 1.12% and 13.89% higher than that of ERNIE and GPTZero, respectively; in identifying human-generated implicitly toxic content, it outperforms BERT and XGBoost by 1.01% and 5.69%, respectively. Finally, we conducted an interpretability analysis of the model using the SHAP method to understand how it identifies AIGC and human-generated implicitly toxic content.
