Abstract
As large language models (LLMs) evolve rapidly, distinguishing AI-generated text (AIGT) from human-written text (HWT) is becoming increasingly challenging. Several AIGT detectors have recently been developed to address this challenge and have achieved reasonable accuracy. However, their brittle text representations make them highly susceptible to text perturbations, such that even minor character-level perturbations can reverse their predictions. In this work, we propose a multi-grained latent feature denoising and contrastive representation learning architecture that enhances the granularity, robustness, and distinguishability of text representations, thereby achieving robust AIGT detection. Specifically, we first extract both document-level and fine-grained segment-level features using a dual network, which captures the global and subtle local differences between AIGT and HWT. To encourage feature stability under perturbations, we inject random noise into both latent features and employ a denoising network to reconstruct the original representations. While this does not precisely simulate discrete character-level perturbations, it acts as a feature-level regularizer that suppresses non-essential variations and promotes smoother, more stable representations. Considering the similarities between AIGT and HWT, we further design a contrastive augmentation mechanism to increase the distinguishability between them. Extensive experiments demonstrate that our method not only outperforms baseline models in terms of classification accuracy but also exhibits superior robustness against various text perturbations.
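To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation: it combines document-level and segment-level features, injects Gaussian noise into both latent features and denoises them with small MLPs, and adds a supervised contrastive term separating AIGT from HWT. All names, the mean-pooling of segment features, the MLP denoisers, and the hyperparameters (noise_std, temperature, loss weights) are illustrative assumptions rather than details from the paper.

```python
# Illustrative sketch only; assumes doc/segment features come from a backbone encoder.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiGrainedDetector(nn.Module):
    def __init__(self, hidden_dim=768, noise_std=0.1):
        super().__init__()
        self.noise_std = noise_std
        # Denoising networks: reconstruct clean features from noised ones (assumed MLPs).
        self.doc_denoiser = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.seg_denoiser = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        self.classifier = nn.Linear(2 * hidden_dim, 2)  # AIGT vs. HWT

    def forward(self, doc_feat, seg_feat):
        # doc_feat: (B, H) document-level features; seg_feat: (B, S, H) segment-level features.
        seg_pooled = seg_feat.mean(dim=1)  # simple pooling, an assumption of this sketch

        # Inject random noise into both latent features, then denoise.
        doc_noised = doc_feat + self.noise_std * torch.randn_like(doc_feat)
        seg_noised = seg_pooled + self.noise_std * torch.randn_like(seg_pooled)
        doc_rec = self.doc_denoiser(doc_noised)
        seg_rec = self.seg_denoiser(seg_noised)

        # Feature-level regularizer: reconstruct the clean representations.
        denoise_loss = F.mse_loss(doc_rec, doc_feat) + F.mse_loss(seg_rec, seg_pooled)

        fused = torch.cat([doc_rec, seg_rec], dim=-1)
        logits = self.classifier(fused)
        return logits, denoise_loss, fused


def contrastive_loss(feats, labels, temperature=0.1):
    """Supervised contrastive term: pull same-class (AIGT or HWT) features
    together and push the two classes apart."""
    feats = F.normalize(feats, dim=-1)
    sim = feats @ feats.T / temperature                            # (B, B)
    self_mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    pos_mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    pos_mask = pos_mask.masked_fill(self_mask, 0.0)                # drop self-pairs
    log_prob = sim - torch.logsumexp(
        sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    pos_per_anchor = pos_mask.sum(1).clamp(min=1)
    return -(pos_mask * log_prob).sum(1).div(pos_per_anchor).mean()


# Example combined objective (weights are illustrative):
#   logits, l_dn, feats = model(doc_feat, seg_feat)
#   loss = F.cross_entropy(logits, labels) + 0.5 * l_dn + 0.5 * contrastive_loss(feats, labels)
```

The point of the sketch is the division of roles: the denoising term regularizes the latent space against perturbation-like variation, while the contrastive term enlarges the margin between the two classes; the classification head then operates on the stabilized, fused representation.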
