Abstract
In text classification tasks involving complex models and high-stakes domains, the alignment between predictions and explanations tends to be weak because post-hoc explainability methods operate independently of model training. In this paper, we propose ATM-AM, a Gated Recurrent Unit (GRU)-based approach that combines Bahdanau attention with a training-time SHAP-guided alignment objective to provide real-time, context-aware interpretability without sacrificing predictive performance. The model is evaluated on three widely used sentiment analysis datasets, IMDb (https://huggingface.co/datasets/imdb), Amazon Reviews (https://www.kaggle.com/datasets/bittlingmayer/amazonreviews), and SST-2 (https://huggingface.co/datasets/glue/viewer/sst2), achieving accuracies of 91.8%, 89.5%, and 90.0%, with F1-scores of 0.899, 0.877, and 0.889, respectively. All results are averaged over three runs for statistical soundness. The additional training latency introduced by ATM-AM is modest (13–18%), and inference remains fast (3–4 ms per sample), making the model feasible for real-time deployment. A user-centered interpretability study with 30 participants yielded an average rating of 4.6/5, indicating that users trust the explanations produced by the proposed model. These findings position ATM-AM as a practical, interpretable solution for text classification in contexts where model behavior must be accountable and reliable.
