End-to-end knowledge graph-based question answering system for complex queries

Abstract

Knowledge graph-based question answering (KGQA) systems face several challenges, including the need for detailed training data, difficulty in handling complex multi-hop queries, and dense knowledge gap interactions. Models are typically trained on annotated entities and relations, which requires significant human effort and time. We developed methodologies that improve end-to-end question answering over knowledge graphs and eliminate the need for pre-annotated entities (gold entities). Our approach incorporates two language models, the Text-to-Text Transfer Transformer (T5) and Longformer, and employs a named entity disambiguation technique. We reduced the dependency on gold entities by first removing explicit entity annotations from the training data and then augmenting the data with relevant knowledge base facts. We explored two methodologies: (1) training T5 and Longformer on this augmented dataset to answer factoid questions using inferred knowledge graph entities, and (2) applying transfer learning with SPARQL-based supervision to improve generalization. The experimental results demonstrate that the proposed models are efficient and offer effective strategies for answering complex questions while significantly reducing the need for manual annotation of training data.
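The data preparation step described above (dropping explicit entity annotations, then appending knowledge base facts) can be illustrated with a minimal sketch. This is not the authors' implementation: the `gold_entities` field name, the `question:`/`facts:` prompt layout, the separators, and the sample record are all assumptions made for illustration. The facts are supplied by hand here, whereas the paper retrieves them via inferred entities.

```python
from typing import Dict, List, Tuple

# (subject label, relation label, object label); labels rather than IDs are an assumption.
Triple = Tuple[str, str, str]


def strip_gold_entities(example: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of the example without the hypothetical 'gold_entities' field."""
    return {key: value for key, value in example.items() if key != "gold_entities"}


def linearize_facts(facts: List[Triple], max_facts: int = 3) -> str:
    """Flatten up to max_facts triples into one string; the separators are arbitrary choices."""
    return " ; ".join(" | ".join(triple) for triple in facts[:max_facts])


def build_model_input(example: Dict[str, object], facts: List[Triple]) -> str:
    """Build a text-to-text style input from a question plus linearized facts."""
    stripped = strip_gold_entities(example)
    return f"question: {stripped['question']} facts: {linearize_facts(facts)}"


# Illustrative record, not taken from the paper's datasets.
example = {
    "question": "Who directed Inception?",
    "gold_entities": ["HYPOTHETICAL_ENTITY_ID"],
    "answer": "Christopher Nolan",
}
facts = [
    ("Inception", "director", "Christopher Nolan"),
    ("Inception", "publication date", "2010"),
]
text = build_model_input(example, facts)
print(text)
```

The resulting string could serve as the source sequence for a sequence-to-sequence model, with the `answer` field as the target. How facts are selected and ranked is left open in this sketch.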

Keywords

question answering system, knowledge graph, knowledge base, named entity disambiguation

