Abstract
The widespread application of Large Language Models (LLMs) now extends to education and assessment, underscoring their broadening scope. In this context, pretrained LLMs are employed to generate large volumes of questions from the linguistic patterns learned during pretraining. However, automatically generating multihop questions remains challenging: hallucinatory named entities and false-positive phrases appear when such questions are generated at scale. This challenge arises from the tendency of language models to capture co-occurrence within a given context rather than accurately identifying the relationships between named entities, a phenomenon known as knowledge gaps. Techniques that support named-entity expansion are therefore in high demand. Retrieval Augmented Models (RAMs) address this need by incorporating standard knowledge models, such as ontologies, for named entity-based expansion of the source text. However, the factual coverage of any single ontology is rarely adequate for text expansion, which is where Ontology Mapping (OM) requires attention. Two key objectives are pursued here: 1. extraction of overlapping entity mappings in the Basic Matching (BM) stage, and 2. ranking of intersectional entity mappings in the Final Alignment (FA) stage. With this motivation, experiments are conducted on OM approaches and on their integration with RAM and transformer models for the multihop question generation process. Analysis using benchmark datasets and evaluation metrics demonstrates that the proposed hybrid LLM model, which combines Ontology Mapping (OM), a RAG model (RAM), and a Large Language Model (LLM), achieves ROUGE-L scores ranging from 41% to 45% and BERTScore between 74% and 88%, indicating strong relevance to the entity context.
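The two OM stages named above can be illustrated with a minimal sketch. The entity labels, the string-similarity ranking, and the query entity are illustrative assumptions for exposition only, not the paper's actual matching or alignment implementation:

```python
from difflib import SequenceMatcher

def basic_matching(onto_a, onto_b):
    """BM stage: extract entity labels that overlap between two ontologies."""
    return set(onto_a) & set(onto_b)

def final_alignment(candidates, query):
    """FA stage: rank the intersectional entity mappings, here by simple
    string similarity to a query entity (an assumed scoring criterion)."""
    score = lambda e: SequenceMatcher(None, e.lower(), query.lower()).ratio()
    return sorted(candidates, key=score, reverse=True)

# Toy ontologies with hypothetical entity labels
onto_a = {"Barack Obama", "Hawaii", "Harvard Law School"}
onto_b = {"Hawaii", "Harvard Law School", "Chicago"}

overlap = basic_matching(onto_a, onto_b)          # BM output
ranked = final_alignment(overlap, "Harvard")      # FA output
```

The ranked mappings would then feed the retrieval-augmented expansion step that supplies entity context to the question generator.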
Additionally, the study introduces a new metric, RAG Assessment (RAGAS), and the results reveal that the proposed approach effectively balances ROUGE-L and RAGAS-Precision scores, which range from 39% to 43%, highlighting reduced hallucination in auto-generated multihop questions.
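For reference, the ROUGE-L scores reported above are based on the longest common subsequence (LCS) between a generated question and a reference question. A minimal sketch of the standard LCS-based F-measure follows (this is the textbook definition, not the authors' evaluation code; the beta value is a common default assumption):

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of two token lists."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if a[i] == b[j]
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[m][n]

def rouge_l_f1(candidate, reference, beta=1.2):
    """ROUGE-L F-measure from LCS-based precision and recall."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * p * rec / (rec + beta**2 * p)
```

BERTScore and RAGAS-Precision, by contrast, score semantic similarity and retrieval faithfulness respectively, which is why the paper reports them alongside the lexical ROUGE-L measure.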
