Abstract
Conditional Semantic Textual Similarity (C-STS) evaluates the semantic similarity between two sentences under a given condition. Recent methods often overlook the inherent ambiguity in human-annotated scoring criteria. This research hypothesizes that C-STS annotations reflect a combination of explicit instructions and latent, implicit scoring standards. Unlike prior explanation-aware similarity approaches that treat explanation generation and scoring as independent stages, this work jointly optimizes both: LLM-generated explanations serve as candidate inputs, and a fine-tuned lightweight LLM scorer selects the most relevant ones. This design addresses the inherent limitations of general-purpose LLMs in subjective scoring tasks while maintaining adaptability and computational efficiency. Furthermore, a score-guided explanation selection mechanism is introduced that identifies optimal explanations by retrospectively evaluating candidate explanations under the trained scoring model. Experiments on the C-STS dataset demonstrate improved similarity estimation by approximately 9% compared to static encoders such as SimCSE. Additionally, the selection process reveals explanations that could theoretically yield up to 38% higher correlation, indicating the latent upper bound of explanation-driven scoring and validating the potential of reverse filtering. These findings highlight the importance of modeling implicit reasoning and demonstrate the potential of lightweight LLMs in explanation-sensitive evaluation tasks. Source code is available at the following link.1
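The score-guided (reverse-filtering) selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions: it presumes a trained scorer callable that maps a sentence pair, condition, and candidate explanation to a similarity score, and a gold annotation to evaluate against; all names and the interface are hypothetical, not the paper's actual API.

```python
def select_explanation(scorer, sent1, sent2, condition, candidates, gold_score):
    """Retrospectively evaluate each candidate explanation under the trained
    scorer and return the one whose predicted similarity lands closest to the
    gold score (illustrative sketch; `scorer` is a hypothetical callable)."""
    scored = [
        (abs(scorer(sent1, sent2, condition, expl) - gold_score), expl)
        for expl in candidates
    ]
    # Smallest deviation from the gold score wins.
    return min(scored)[1]


# Toy usage with a dummy scorer standing in for the fine-tuned LLM.
def toy_scorer(s1, s2, cond, expl):
    return {"expl_a": 2.0, "expl_b": 4.5, "expl_c": 3.0}[expl]


best = select_explanation(
    toy_scorer, "sent1", "sent2", "condition",
    ["expl_a", "expl_b", "expl_c"], gold_score=4.0,
)
```

Aggregating the deviation between the selected explanations' predicted scores and the gold annotations over a dataset is what exposes the latent upper bound reported in the abstract.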