Abstract
Machine generation of arithmetic word problems (AWPs) is challenging because these problems require the correct use of quantities and of the mathematical relationships among them. While state-of-the-art deep-learning (DL) models excel at generating text with language variations, the mathematical validity of the generated problems often remains unchecked. Metrics such as BLEU-4, METEOR, and ROUGE-L assess the language quality of generated problems, but end-to-end mathematical validity-checking of AWPs is less explored. This work focuses on transfer-case (TC) AWPs, i.e., problems involving the transfer of objects among agents. Although DL systems are trained on a dataset of valid problems, they generate valid, near-valid, and invalid problems; near-valid cases are problems that are grammatically correct but mathematically invalid. The proposed work focuses on validity-checking of TC-AWPs and on repairing the near-valid cases. Detecting valid/near-valid problems manually requires significant effort and is error-prone; encoding the relevant domain knowledge as an ontology is very helpful in these tasks. We propose leveraging an extended TC-ontology, previously developed to solve TC-AWPs, for automated validity-checking and for repairing near-valid problems. We construct a problem-specific representation (an ontology Assertional Box, or ABox) of an auto-generated problem by leveraging a sentence classifier and BERT language models (LMs). The training set for these LMs consists of problem texts in which sentence parts are annotated with ontology class names. The proposed approach ensures that the TC-AWPs produced as output are always valid. We also briefly discuss how our ontology-based approach can be adapted to generate TC-AWPs that contain multiple object transfers and are guaranteed to be valid. Adapting this approach to generate other types of AWPs is interesting future work.
