Sage Journals: Discover world-class research

Abstract

Event-centric knowledge graphs help bring coherence to otherwise fragmented and overwhelming data by establishing causal and temporal connections between relevant pieces of information. We address the challenge of automatically constructing event-centric knowledge graphs from generic ones. We present ChronoGrapher, a two-step system to build an event-centric knowledge graph from grand events such as the French Revolution. First, a pruned, semantically informed best-first search traversal retrieves a subgraph from large, open-domain knowledge graphs. We define event-centric filters to prune the search space and a heuristic ranking to prioritize nodes such as events. Second, we combine a structured triple enrichment method with a text-based triple enrichment method to build event-centric knowledge graphs. ChronoGrapher demonstrates adaptability across datasets like DBpedia and Wikidata, outperforming approaches from the literature. Furthermore, it is designed to be flexible and to operate over any knowledge graph accessible through Header, Dictionary, Triples (HDT) dumps or SPARQL endpoints. To evaluate the utility of these constructed graphs, we conduct a preliminary user study comparing different prompting techniques for event-centric question-answering. Our results demonstrate that prompts enriched with event-centric knowledge graph triples yield more factual answers, measured by how well answers are grounded in source information, than those enriched with generic triples or base prompts, while preserving succinctness and relevance.

Keywords

knowledge graphs; event-centric knowledge graphs; graph traversal; knowledge graph construction

1. Introduction

We tackle the problem of automatically building event-centric knowledge graphs (KGs) starting from generic KGs. An event-centric KG can be seen as a KG that treats events, rather than generic entities, as its central entities. An event is an occurrence or happening that can be described by its attributes and relationships to other entities within a KG. It includes information such as time, location, participants, and other relevant properties. Relating events to one another brings coherence to otherwise fragmented and overwhelming data by establishing causal and temporal connections between them. There are numerous scenarios in which systems could assist people by constructing event-centric KGs for their tasks: in news article retrieval for event-centric exploration (Voskarides et al., 2021), in the legal domain where experts must associate events that could bolster their case (Filtz et al., 2020), or in the historical domain when generating hypotheses by retrieving valuable information from archives (Merono et al., 2015).

Let us imagine a historian seeking to understand the events that happened at the end of the French Revolution. Our system aims to produce an event-centric KG such as the one illustrated in Figure 1, which we created manually, mainly using the Simple Event Model (SEM) ontology (Van Hage et al., 2011) and Wikidata. Figure 1 shows that the Coup of 18 Brumaire directlyPrecedes the French Consulate, and that Napoleon was a participant in the Coup of 18 Brumaire and First Consul during the French Consulate. We can therefore infer that the Coup of 18 Brumaire marked the beginning of the French Consulate and enabled Napoleon to become First Consul.

Figure 1.

One Example of an Event-Centric Knowledge Graph (KG) on the Coup of 18 Brumaire.

It should be noted that the classification of entities such as the French Directory and the French Consulate can be ambiguous, as they may be interpreted either as governing bodies (i.e., institutional entities) or as historical periods marking distinct phases in political history. In our work, we choose to treat such entities as events, consistent with how they are often described in historical discourse (e.g., “the period of the French Directory”) and represented in sources like DBpedia and Wikidata, where these entities frequently include temporal attributes (start and end dates) and are part of event-type hierarchies.

Previous work has approached event-centric KG construction as a two-step process: first, identifying relevant cues within a large memory or knowledge base (Step 1), and second, analyzing and reasoning about these cues (Step 2) (Sloman, 1996). Recent studies have also used generic KGs as background knowledge to support event-centric KG construction (Guan et al., 2022). In this context, we approach event-centric KG construction in two steps: given an input event node in a KG, we extract an event-centric subgraph from that KG (Step 1), and we enrich and link this subgraph with additional information to build a new event-centric KG (Step 2). Each triple in the event-centric subgraph from Step 1 has a sub-event of the input event as its subject or object, making the extracted content directly relevant to the construction of the new event-centric KG in Step 2. Most existing event-centric KG resources are, however, built from annotated text (Leetaru & Schrodt, 2013; Nguyen et al., 2016; Rospocher et al., 2016) or from generic KGs (Gottschalk & Demidova, 2018) (Step 2), but do not focus on identifying related events for event-centric KG construction (Step 1). Likewise, event subgraph extraction (Step 1) has been used for downstream tasks such as question-answering (QA) (Jia et al., 2021; Li et al., 2021; Souza Costa et al., 2020), but not for building event-centric KGs (Step 2). We propose ChronoGrapher, a system that both retrieves event-centric subgraphs and generates new event-centric KGs, and we evaluate an event-centric KG constructed by ChronoGrapher in a QA setting through a user study. ChronoGrapher, along with the source code and detailed instructions to reproduce the experiments, is publicly available on GitHub under the GPL-v3 license,¹ and archived on Zenodo.²

Our objective is to construct event-centric KGs that support downstream tasks such as QA, for example by enabling the summarization of complex events like the French Revolution or verifying whether figures like Napoleon and Paul Barras were both involved in the Coup of 18 Brumaire. We define three research questions:

RQ1: How can an event-centric subgraph be extracted from a KG, given an input event node? (Step 1)

RQ2: How can an event-centric KG be created from that subgraph? (Step 2)

RQ3: Can integrating event-centric KGs into a QA pipeline improve the output quality of large language models (LLMs) compared to using LLMs alone?

Figure 2 provides an overview of the different tasks we tackle, as well as the input and output of each component.

Figure 2.

Overview of the Diverse Tasks We Tackle to Answer Our Research Questions. The Example on the Bottom Is With the French Revolution as Starting Node and DBpedia as the Knowledge Graph (KG). ChronoGrapher Integrates Both a Subgraph Extraction Step Through a Pruned, Semantically Informed Best-First Search Traversal, and a KG Construction Step Through a Structured Triple Enrichment Method and a Text-Based Triple Enrichment Method.

For each research question, we hypothesize the following:

RQ1: Due to the vastness of KGs, exploring the entire graph is impractical. A search strategy can help efficiently extract the most relevant parts of the graph.

RQ2: Combining coarse-grained (e.g., entity-level) and finer-grained (e.g., roles) information leads to a richer and more meaningful representation.

RQ3: Since KGs can be difficult to navigate or interpret directly, they are often not exposed to end users. Instead, their utility can be evaluated through downstream tasks such as QA, where improved performance indicates higher-quality structure and content.

To address these questions, we present ChronoGrapher, a framework composed of two components. First, a pruned, semantically informed best-first search traversal mechanism extracts event-centric subgraphs around a given event node (RQ1). Second, a KG construction module converts this subgraph into an event-centric KG using information from both the structure and literals (RQ2). We then evaluate the resulting KGs through a QA task to assess whether integrating event-centric KGs enhances the performance of LLMs compared to using LLMs alone (RQ3).

For RQ1, ChronoGrapher performs an informed graph traversal to extract subgraphs composed of sub-events related to a given event node in the KG. It is a pruned, semantically informed best-first search. First, it uses event-centric filters to prune the search space and guide the exploration of the graph, identifying which nodes are the most relevant for building an event-centric KG. Second, it adds a ranking stage to prioritize nodes in the best-first search, which, to the best of our knowledge, has not been used in search systems that start from a single node in a KG. We define the ranking heuristics based on syntactic and semantic properties of the graph.

For RQ2, ChronoGrapher combines a structured triple enrichment method (or IRI-based method, where IRI stands for Internationalized Resource Identifier) with a text-based triple enrichment method (or abstract-based method) to generate an event-centric KG. The structured triple enrichment method is a rule-based system that converts the triples from the original KG into event-centric ones. The text-based triple enrichment method exploits the information embedded in textual literals, such as dbo:abstract. For this, we extract frame-based semantics (Fillmore & Baker, 2001) from these texts using semantic parsing. Frames can be seen as templates with placeholders (roles) to be filled in. For example, from the sentence “The coup of 18 Brumaire ended the French Revolution,” one could extract a Causation frame where “coup of 18 Brumaire” is the Cause and “end of the French Revolution” is the Effect.
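To make the frame example concrete, the sketch below shows one possible way to flatten a filled Causation frame into triples. It is an illustration only: the `ex:` prefix, the frame identifier, and the choice of using each role name as a predicate are hypothetical and do not reproduce ChronoGrapher's actual output vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class FrameInstance:
    """A filled frame: the frame name plus role -> text-span bindings."""
    frame: str
    roles: dict = field(default_factory=dict)


def frame_to_triples(frame_id, instance):
    """Flatten a frame instance into (subject, predicate, object) triples.

    Hypothetical encoding: one rdf:type triple for the frame, then one
    triple per role, using the role name as an 'ex:' predicate.
    """
    triples = [(frame_id, "rdf:type", f"ex:{instance.frame}")]
    for role, filler in sorted(instance.roles.items()):
        triples.append((frame_id, f"ex:{role}", filler))
    return triples


causation = FrameInstance(
    frame="Causation",
    roles={"Cause": "coup of 18 Brumaire", "Effect": "end of the French Revolution"},
)
for triple in frame_to_triples("ex:frame_1", causation):
    print(triple)
```

In a real pipeline the role fillers would be linked to KG entities rather than kept as raw text spans; this sketch leaves them as strings.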

Concerning the technical novelty of the work, most existing event representations in KGs are underdeveloped, and building them manually is time-consuming; ChronoGrapher instead builds event-centric KGs automatically from a generic KG. Unlike most approaches, ChronoGrapher integrates data from existing KGs and texts, and produces more concise and precise outputs than baselines adapted from existing systems. The modularity of our approach ensures flexibility and adaptability, while the novel combination and adaptation of previously unintegrated elements contributes to the field and improves on the adapted baselines (Fionda et al., 2015; Isele et al., 2010). We evaluate ChronoGrapher and methods adapted from the literature (Fionda et al., 2015; Isele et al., 2010) against ground-truth events from EventKG (Gottschalk & Demidova, 2018). ChronoGrapher shows adaptability and efficiency across datasets like DBpedia and Wikidata, outperforming existing methods.

For RQ3, we assess whether integrating event-centric KGs enhances the performance of LLMs compared to using LLMs alone. A human evaluation is conducted to compare three types of zero-shot QA prompts (without explicit context, with triples from generic KGs, and with triples from event-centric KGs) across six different question types. Our results demonstrate that prompts enriched with event-centric KG triples yield more factual answers, measured by how well answers are grounded in source information, than those enriched with generic KG triples or base prompts, achieving groundedness scores of 2.85, 2.24, and 1.11, respectively, while preserving succinctness and relevance.

2. Related Work

In this section, we review related work along three dimensions: (i) event-centric KGs and their usage, (ii) methods for event-centric KG construction from KGs, and (iii) retrieval-augmented generation. Recent advances have introduced new approaches to build event-centric KGs with various applications (i). In particular, we propose a method for event-centric KG construction, which directly contributes to this field. Unlike most previous methods that focus on constructing event-centric KGs from unstructured data, our approach builds KGs from existing, structured KGs. To achieve this, we aim to improve how event-centric subgraph information is searched and navigated within input KGs by optimizing existing graph traversal methods. To this end, we present related methods that facilitate this process (ii), including link traversal-based techniques for dynamic data discovery, graph navigational languages for expressing complex patterns, and entity relatedness measures to prioritize relevant nodes. Lastly, we review works that use event-centric KGs to improve the performance of LLMs, highlighting the practical applications of event-centric KGs in downstream tasks (iii).

(i) Event-centric KGs and their usage. Recent research on KGs has emphasized the importance of more event-centric structures to facilitate the analysis and understanding of sequences of events (Riedl, 2016). In this context, event-centric KGs have been developed in a wide range of domains, including news (Leetaru & Schrodt, 2013; Nguyen et al., 2016; Rospocher et al., 2016), political events (Boschee et al., 2015), literary narratives (Kawamura et al., 2019), the biomedical domain (Liu et al., 2018), tourism (Wu et al., 2020), the legal domain (Filtz et al., 2020), olfactory heritage (Lisena et al., 2022), and plenary debates (Sinikallio et al., 2021). In the digital humanities and cultural heritage domains, KGs are increasingly used to link objects to events and the stories surrounding them. For example, the ArCo project constructs an Italian cultural heritage KG (Carriero et al., 2019) that was built on XML data using the eXtreme Design methodology (Blomqvist et al., 2010). Other projects focus on understanding historical processes, whether through the history of a city such as Amsterdam (Noordegraaf et al., 2019) or through various historical archives (Meroño-Peñuela & Hoekstra, 2014; Singh et al., 2016; Tomasi, 2021; Van den Akker et al., 2010). Van den Akker et al. (2010) focused on the extraction and modeling of events from a text archive, and Tomasi (2021) proposes an agile cataloging process to build a KG. These works primarily extract events from textual sources, often employing domain-specific heuristics or manual structuring.

In this work, we focus on KG construction from KGs. The work most closely related to ours is EventKG (Gottschalk & Demidova, 2019), an event-centric KG that was built starting from large, generic KGs, such as DBpedia or Wikidata. EventKG follows a global, schema-driven approach, in which event types are predefined and extracted in bulk across the entire KG. In contrast, our method adopts a local and dynamic approach: we start from a given input event, i.e., a node in the KG, and automatically retrieve a subgraph that contains sub-events of that input event. This allows for more context-sensitive and targeted event graph construction. Moreover, our framework is designed to be more flexible: it can be applied to any KG, not only the largest open-domain KGs, and it can be configured to retrieve subgraphs around any type of node, not just events, making it broadly adaptable beyond the event-centric setting.

In addition to KG construction, event-centric KGs have been explored for downstream applications. For example, Guyet (2020) improved sequential pattern mining with time and reasoning to analyze complex dynamical systems. Understanding such systems helps not only in acquiring new knowledge but also in predicting and controlling their behavior. Event-centric KGs also help in decision-making, for instance, in the epidemiological domain (Bakalara et al., 2021) or in maritime transport (Del Mondo et al., 2021). Lastly, Kroll et al. (2020) use structured narrative representations to build hypotheses on knowledge bases. These applications underscore the growing importance of event-centric KGs in both analytical and predictive tasks.

(ii) Methods for event-centric KG construction from KG. To construct event-centric KGs from KGs, our method performs a pruned, semantically informed, best-first search graph traversal. Although not originally designed for event-centric KG construction, several sub-domains offer valuable methods and insights that support this task. We draw on techniques from linked data traversal, graph navigational languages, and entity relatedness and summarization to inform our approach. These methods guide how information is explored, filtered, and extracted from large graphs, which is crucial for retrieving event-centric content efficiently. We compare their scope to ChronoGrapher’s.

Link traversal-based methods focus on identifying and extracting task-relevant information by navigating large-scale Web data. In Link Traversal Query Processing (LTQP), engines start from seed URIs and dynamically discover sources by following hyperlinks (Hartig et al., 2009; Hartig & Freytag, 2012). One of the pioneering systems in this domain is LDSpider (Isele et al., 2010), which employs diverse search strategies such as breadth-first search or load balancing to crawl Web data. Subsequent systems aimed to reduce the search space during the traversal, encoding features like property selection (Schaffert et al., 2012) or basic graph patterns (Umbrich et al., 2012). Systems such as Linked Data-fu (Stadtmüller et al., 2013) and LiDAQ (Umbrich et al., 2012) and its extensions (Umbrich et al., 2015) added reasoning capabilities to their approaches. Other systems like SQUIN (Hartig, 2013) focused on optimizing query execution efficiency. Lynden et al. (2013) propose a hybrid SPARQL query execution strategy that combines link traversal with distributed query processing over SPARQL endpoints. A follow-up to their work (Hartig & Özsu, 2016) evaluated 14 ranking approaches to prioritize result-relevant data sources early during query execution. Taelman and Verborgh (2023) extend LTQP by leveraging structural properties of decentralized systems such as Solid to improve query performance at scale. As a follow-up, Eschauzier et al. (2023) investigate the behavior of the link queue in LTQP and reveal two distinct execution patterns. Their findings highlight the importance of understanding link queue behavior for optimizing LTQP performance in decentralized settings. Lastly, Comunica (Taelman et al., 2018) is a modular query engine for Linked Data, designed to flexibly support and evaluate diverse query algorithms, SPARQL features, and Web interfaces.
These methods offer a variety of traversal strategies designed for efficient and goal-directed search through KGs, which we draw on to dynamically guide the extraction of event-centric subgraphs in our method.

Graph navigational languages allow structured and iterative exploration of Linked Data through declarative instructions to retrieve nodes or subgraphs. There are two types of approaches, namely SPARQL extensions (Fionda et al., 2019; Harris et al., 2013; Reutter et al., 2015) and standalone languages (Fionda et al., 2015; Hartig & Pérez, 2016). Most of these approaches explore the graph using regular expressions, as in SPARQLeR (Kochut & Janik, 2007) and PSPARQL (Alkhateeb et al., 2009), and later nested regular expressions, as in Zauner et al. (2010) and nSPARQL (Pérez et al., 2010). Although our traversal is not expressed in a declarative language, we compare the scope and output of our method to those of navigational languages. Our approach can handle both local KGs and SPARQL endpoints, and its output is a subgraph, referred to as a “region” in the context of GeL (Fionda & Pirrò, 2017) and NautiLOD (Fionda et al., 2015). The most expressive system appears to be LDQL (Hartig & Pérez, 2016), which separates pattern processing from source selection. These approaches provide techniques to extract nodes or subgraphs that match specific semantic criteria, which we draw on to refine and optimize the extraction of event-centric subgraphs in our framework.

Entity relatedness methods support the identification of paths, “regions,” or subgraphs of high semantic relevance within KGs (Jiménez et al., 2022). Entity relatedness systems compute all paths using search heuristics and activation criteria to prune the search space and rank paths (Herrera, 2017). The search heuristics include bidirectional search (Fang et al., 2011; Tiddi et al., 2016), the backward search heuristic (Herrera et al., 2016), SPARQL queries (Pirrò, 2015), exploratory association search (Cheng et al., 2014), and A* (De Vocht et al., 2013). Activation criteria include entity clustering (Cheng et al., 2014), metrics based on node degree (Fang et al., 2011; Moore et al., 2012) or ontology (Lohmann et al., 2010), and similarity measures (De Vocht et al., 2013; Herrera et al., 2016) such as Jaccard, Wikipedia link-based measures (Witten & Milne, 2008), or SimRank (Jeh & Widom, 2002). For ranking paths, metrics include PF-ITF (Pirrò, 2015), exclusivity-based relatedness (Herrera et al., 2016), and pointwise mutual information (Hulpuş et al., 2015). Table 1 provides an overview of the entity relatedness systems we described.
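As an illustration of a similarity-based activation criterion, the following sketch computes the Jaccard similarity between the neighborhoods of two nodes in a KG stored as a set of triples. Treating a node's neighborhood as every node linked to it by an ingoing or outgoing triple is an assumption made here for simplicity; the cited systems may define the compared sets differently.

```python
def neighbors(kg, node):
    """Nodes linked to `node` by an outgoing or ingoing triple (excluding itself)."""
    outgoing = {o for (s, _, o) in kg if s == node}
    ingoing = {s for (s, _, o) in kg if o == node}
    return (outgoing | ingoing) - {node}


def jaccard(kg, a, b):
    """Jaccard similarity |N(a) & N(b)| / |N(a) | N(b)|; 0.0 when both are empty."""
    na, nb = neighbors(kg, a), neighbors(kg, b)
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Toy graph loosely based on Figure 1 (identifiers simplified for readability).
kg = {
    ("Coup_of_18_Brumaire", "directlyPrecedes", "French_Consulate"),
    ("Coup_of_18_Brumaire", "hasActor", "Napoleon"),
    ("French_Consulate", "hasActor", "Napoleon"),
}
print(jaccard(kg, "Coup_of_18_Brumaire", "French_Consulate"))
```

Here the two events share one neighbor (Napoleon) out of three distinct neighbors overall, so their similarity is 1/3.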

Entity summarization systems (Liu et al., 2021) take a KG as input and output a graph that retains certain original properties of the graph. The methods used to select nodes are similar to the ranking ones from entity relatedness systems. These systems offer a range of search heuristics and path-ranking strategies to identify meaningful connections between nodes, which we adapt to dynamically guide the extraction of subgraphs in our framework.

Table 1.
Overview of Entity Relatedness Systems.

System	Path search	Activation criteria	Ranking
Tiddi et al. (2016)	Bidirectional search	Primitives	Cost functions
Profiler (Herrera et al., 2016)	Backward search heuristic	Similarity measures	–
RECAP (Pirrò, 2015)	SPARQL queries	–	Path informativeness
EXPLASS (Cheng et al., 2014)	Exploratory association search	Clustering entities	–
De Vocht et al. (2013)	A*	Jaccard-based metrics	Top $k$ paths
Moore et al. (2012)	Shortest paths	Low-degree nodes	Top $k$ paths
REX (Fang et al., 2011)	Bidirectional search	Degree of node	Interestingness measure
RelFinder (Lohmann et al., 2010)	All paths	Class, link and length filter	–

“–” indicates that the component is absent in the system.

Our method traverses KGs to extract event-centric subgraphs for event-centric KG construction, similarly to link traversal-based methods and graph navigational languages. It operates on local graphs or any SPARQL endpoint and produces meaningful subgraphs, like the systems of Fionda and Pirrò (2017) and Fionda et al. (2015). Unlike existing traversal methods, which often lack fine-grained control and expressiveness, our approach introduces flexible navigation and reasoning capabilities. It employs filters to prune the search space without prior knowledge of predicates, whereas graph navigational languages require regular expressions. Additionally, we introduce a ranking stage inspired by entity relatedness and summarization systems, which prioritizes nodes based on a cost function. To the best of our knowledge, among existing navigational systems, such dynamic prioritization is only found in Hartig and Özsu (2016), who propose only two graph-related approaches. In contrast to entity relatedness systems, which typically compute all paths between entities, our system computes paths dynamically. Furthermore, our approach requires only one starting node as input, with target nodes defined as variables. It extracts subgraphs from the original KG with the aim of retrieving all events related to a given entity, without the property-specific considerations of entity summarization systems. Lastly, while entity summarization techniques focus on preserving specific property distributions, our method targets the extraction of complete event-centric subgraphs, ensuring the retrieval of all sub-events linked to the input node.

(iii) Retrieval-augmented generation (RAG). Prior studies have shown that LLMs store factual knowledge in their parameters (Petroni et al., 2019), for example, in the form of implicit knowledge bases (Razniewski et al., 2021; Roberts et al., 2020). However, retrieving and interpreting this knowledge remains challenging. RAG systems (Chen et al., 2024) explicitly provide external information to the model through a dedicated knowledge retriever component. By separating knowledge storage from generation, RAG systems improve interpretability and often lead to more factual, diverse, and contextually grounded output (Lewis et al., 2021), with applications in tasks such as QA (Chen et al., 2017) or reasoning (Sun et al., 2024). Systems such as ORQA (Lee et al., 2019) or REALM (Guu et al., 2020) learn the retriever and the reader jointly, whereas systems like DrQA (Chen et al., 2017) have a fixed document retrieval component. Karpukhin et al. (2020) propose a dense passage retriever to index all documents in a latent space. Petroni et al. (2020) augment the reader with a retriever in an unsupervised way.

In our user study, we adopt a RAG-like setup to compare three types of zero-shot prompting strategies for QA: one prompting without explicit context, and two prompting with triples from generic KGs and event-centric KGs, respectively. Our knowledge retriever component is implemented using SPARQL queries that extract relevant triples from the KG to condition the LLM’s generation.
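The three prompting strategies can be sketched as a single prompt builder, assuming the triples have already been retrieved (e.g., by a SPARQL query against the generic or the event-centric KG). The `build_prompt` helper and its template wording are hypothetical placeholders, not the prompts used in the user study.

```python
from typing import Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def build_prompt(question: str, triples: Optional[Sequence[Triple]] = None) -> str:
    """Build a zero-shot QA prompt, optionally conditioned on KG triples.

    The template wording is a hypothetical placeholder.
    """
    if not triples:  # base prompt: no explicit context
        return f"Answer the following question.\nQuestion: {question}\nAnswer:"
    context = "\n".join(f"({s}, {p}, {o})" for s, p, o in triples)
    return (
        "Answer the following question using the triples below as context.\n"
        f"Triples:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


question = "Who took part in the Coup of 18 Brumaire?"
base_prompt = build_prompt(question)
# The generic-KG and event-centric-KG prompts differ only in the triples passed in.
event_prompt = build_prompt(
    question, [("ex:Coup_of_18_Brumaire", "sem:hasActor", "ex:Napoleon")]
)
print(event_prompt)
```

Keeping the template fixed and varying only the triples isolates the effect of the retrieved context, which is the comparison the study targets.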

3. ChronoGrapher

As displayed in Figure 2 from Section 1, ChronoGrapher takes an event IRI as input, extracts an event-centric subgraph in which every triple contains a sub-event of that input IRI, and outputs an event-centric KG. In this section, we formalize the two main components: subgraph extraction (Section 3.1) and event-centric KG construction (Section 3.2). We begin by defining the core concepts used throughout the two components: KGs, event-centric KGs, events, and sub-events.

Definition 1 (Knowledge Graph)

A knowledge graph ($KG$) is an RDF graph where the nodes represent entities (subjects or objects) and literals (objects) and the edges represent the relationships (predicates) between these entities and literals. Formally, $KG = \{(s, p, o) \mid s \in E, o \in E \cup L, p \in R\}$, where $E$ is the set of entities (nodes) in the KG, $L$ is the set of literals in the KG, and $R$ is the set of predicates (edges) representing relationships in the KG.
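Definition 1 can be read directly as a membership check. The sketch below represents a KG as a Python set of (s, p, o) tuples, with hypothetical `ex:` identifiers, and verifies that every triple respects the constraints s ∈ E, p ∈ R, and o ∈ E ∪ L.

```python
def is_well_formed(kg, entities, literals, predicates):
    """True if every (s, p, o) satisfies s in E, p in R and o in E or L (Definition 1)."""
    return all(
        s in entities and p in predicates and (o in entities or o in literals)
        for (s, p, o) in kg
    )


E = {"ex:Coup_of_18_Brumaire", "ex:Napoleon"}
L = {"Coup of 18 Brumaire"}
R = {"sem:hasActor", "rdfs:label"}
kg = {
    ("ex:Coup_of_18_Brumaire", "sem:hasActor", "ex:Napoleon"),
    ("ex:Coup_of_18_Brumaire", "rdfs:label", "Coup of 18 Brumaire"),
}
print(is_well_formed(kg, E, L, R))

# A literal in subject position violates s in E.
bad = kg | {("Coup of 18 Brumaire", "rdfs:label", "ex:Napoleon")}
print(is_well_formed(bad, E, L, R))
```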

Definition 2 (Event-Centric KG)

In this work, an event-centric KG is a KG whose vocabulary, classes, and relationships originate from the SEM (Van Hage et al., 2011) and NIF (Hellmann et al., 2013) ontologies. There are four core classes in SEM: sem:Event (what), sem:Actor (who), sem:Place (where), and sem:Time (when). The NIF ontology enables the integration of text data into KGs.
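The following sketch encodes a simplified fragment of Figure 1 with SEM terms, typing the events and the actor with the SEM core classes. Modeling Napoleon's involvement in both events with a plain sem:hasActor link is a simplification made here, and the `http://example.org/` namespace is hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"  # SEM namespace IRI
EX = "http://example.org/"  # hypothetical namespace for the example


def sem(term):
    return SEM + term


def ex(term):
    return EX + term


# Simplified fragment of Figure 1: two events sharing one actor.
kg = {
    (ex("Coup_of_18_Brumaire"), RDF_TYPE, sem("Event")),
    (ex("French_Consulate"), RDF_TYPE, sem("Event")),
    (ex("Napoleon"), RDF_TYPE, sem("Actor")),
    (ex("Coup_of_18_Brumaire"), sem("hasActor"), ex("Napoleon")),
    (ex("French_Consulate"), sem("hasActor"), ex("Napoleon")),
}


def instances_of(kg, cls):
    """Sorted subjects explicitly typed with `cls`, e.g. one of the SEM core classes."""
    return sorted(s for (s, p, o) in kg if p == RDF_TYPE and o == cls)


print(instances_of(kg, sem("Event")))
```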

Definition 3 (Event and Sub-Event)

An event is defined as an occurrence in time that can be characterized by attributes from the SEM ontology, such as actor and location. A sub-event is defined as an event that forms part of a larger, composite event. Sub-events maintain the same formal structure as events but are contextually linked to parent events through temporal, causal, or structural relations (e.g., sem:subEventOf).

3.1. Event-Centric Subgraph Extraction (RQ1)

Given a target event of interest and a knowledge graph $KG$, our goal is to extract an event-centric subgraph from $KG$ that captures relevant information about this event. To achieve this, we propose a link traversal-based method tailored to the event-centric setting. Our approach takes the form of a pruned best-first search: it incrementally explores the most promising nodes while explicitly pruning others, enabling focused and efficient traversal.

We begin by introducing the core concepts required for graph traversal (Section 3.1.1) and formally define the task and the search algorithm (Section 3.1.2). We then present the two key components of our method: the event-centric filters, which prune the search space by discarding nodes (Section 3.1.3); and the ranking and scoring mechanism, which selects the highest-scoring nodes for expansion based on their contextual relevance (Section 3.1.4). Together, these components drive the extraction of compact and semantically meaningful subgraphs centered around a given event. To facilitate understanding of the subgraph extraction task and its components, we summarize the key definitions used throughout this section in Table 2.

Table 2.
Useful Definitions for the Subgraph Extraction Task.

Definition number	Definition name	Relevance to subgraph extraction task
4	Ingoing triples	Node expansion (cf. Definition 6)
5	Outgoing triples	Node expansion (cf. Definition 6)
6	Node expansion	KG traversal
7	Event-centric subgraph retrieval task	Task
8	Pruned best-first search	Method, with two main components: node pruning and node ranking
9	Event-centric filters	Node pruning (cf. Definition 8)
10	Relation patterns	Node ranking (cf. Definition 8)
11	Scoring	Node ranking (cf. Definition 8)
12	Preference function	Node ranking (cf. Definition 8)

KG = knowledge graph.

3.1.1. Preliminaries

We formally define preliminary concepts that are useful for our subgraph extraction task and our link traversal-based method: the ingoing and outgoing triples of a node in a KG, as well as node expansion.

Definition 4 (Ingoing Triples)

Let $KG = \{(s, p, o) \mid s \in E, o \in E \cup L, p \in R\}$, and $P(KG) = \{A \mid A \subseteq KG\}$ the set of all subsets of $KG$. The ingoing triples of a node $n$ are formally defined as follows:

\begin{aligned} \mathit{ingoing}_{KG} : E & \longrightarrow P(KG) \\ n & \longmapsto \{(s, p, n) \mid \exists p \in R, \exists s \in E \text{ such that } (s, p, n) \in KG\} \end{aligned}

Definition 5 (Outgoing Triples)

Let $KG = \{(s, p, o) \mid s \in E, o \in E \cup L, p \in R\}$, and $P(KG) = \{A \mid A \subseteq KG\}$ the set of all subsets of $KG$. The outgoing triples of a node $n$ are formally defined as follows:

\begin{aligned} \mathit{outgoing}_{KG} : E & \longrightarrow P(KG) \\ n & \longmapsto \{(n, p, o) \mid \exists p \in R, \exists o \in E \cup L \text{ such that } (n, p, o) \in KG\} \end{aligned}

Definition 6 (Node Expansion)

Expanding a node $n$ means retrieving its ingoing and outgoing triples.
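Definitions 4 to 6 translate directly into set comprehensions over a KG held in memory as a set of triples. This is a sketch for illustration only: a linear scan like this would not scale to DBpedia or Wikidata, where expansion would instead rely on an index (such as an HDT dump) or on SPARQL queries against an endpoint, e.g., `SELECT ?s ?p WHERE { ?s ?p <n> }` for ingoing and `SELECT ?p ?o WHERE { <n> ?p ?o }` for outgoing triples.

```python
def ingoing(kg, n):
    """Definition 4: triples of the KG whose object is n."""
    return {(s, p, o) for (s, p, o) in kg if o == n}


def outgoing(kg, n):
    """Definition 5: triples of the KG whose subject is n."""
    return {(s, p, o) for (s, p, o) in kg if s == n}


def expand(kg, n):
    """Definition 6: expanding n retrieves its ingoing and outgoing triples."""
    return ingoing(kg, n) | outgoing(kg, n)


kg = {
    ("Coup_of_18_Brumaire", "directlyPrecedes", "French_Consulate"),
    ("Coup_of_18_Brumaire", "hasActor", "Napoleon"),
    ("French_Consulate", "hasActor", "Napoleon"),
}
print(sorted(expand(kg, "French_Consulate")))
```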

Building on these preliminary concepts, we now formalize the event-centric subgraph retrieval task and describe our pruned best-first search algorithm to tackle it.

3.1.2. Task and Method Overview

We formalize the event-centric subgraph retrieval task and present the framework used to address it. Our method is a pruned best-first search that iteratively expands a local subgraph by exploring the most promising candidate nodes while discarding others. This section first introduces the task formulation and algorithm definition, followed by a step-by-step breakdown in pseudo-code and an illustration of the traversal process.

Definition 7 (Event-Centric Subgraph Retrieval Task)

Given a knowledge graph $KG = \{(s, p, o) \mid s \in E, o \in E \cup L, p \in R\}$ and a starting event node $n_{start}$, the goal of the event-centric subgraph retrieval task is to extract a subgraph $G' \subseteq KG$ such that, for each triple $(s', p', o') \in G'$, either $s'$ or $o'$ is a sub-event of $n_{start}$.³ Formally:

\begin{aligned} retrieval_{KG} : E & \longrightarrow P(KG) \\ n_{start} & \longmapsto \{(s', p', o') \in KG \mid subevent\_of(s', n_{start}) \lor subevent\_of(o', n_{start})\} \end{aligned}

Note that, under this definition, $n_{start}$ is not included in the extracted subgraph unless it is explicitly encoded as a sub-event of itself. However, $n_{start}$ is re-integrated during the KG construction step (Section 3.2) and is therefore present as an event node in the final output KG. Furthermore, events linked through non-sub-event relations, such as preceding events, are neither treated as sub-events nor included in the subgraph extracted under Definition 7. Extending this approach to other relations, such as those defined in Allen’s interval algebra (Allen, 1983), is a promising direction for future work.
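The retrieval of Definition 7 can be sketched in Python as a filter over triples. This sketch does not reproduce the exact sub-event criterion behind `subevent_of`; it assumes, hypothetically, that $x$ is a sub-event of $n_{start}$ whenever a chain of part-of edges leads from $x$ to $n_{start}$. The `PART_OF` predicate set and the toy triples are illustrative assumptions, not DBpedia data. As noted above, $n_{start}$ is not its own sub-event unless the graph encodes it (here, only through a cycle).

```python
from collections import deque

# Hypothetical: predicates read as "x is part of y" for this sketch only.
PART_OF = {"dbo:isPartOfMilitaryConflict"}


def subevents(kg, n_start, part_of=PART_OF):
    """Nodes linked to n_start by a chain of part-of edges (transitive, not reflexive)."""
    found, queue = set(), deque([n_start])
    while queue:
        parent = queue.popleft()
        for s, p, o in kg:
            if p in part_of and o == parent and s not in found:
                found.add(s)
                queue.append(s)
    return found


def retrieval(kg, n_start, part_of=PART_OF):
    """Definition 7: keep triples whose subject or object is a sub-event of n_start."""
    subs = subevents(kg, n_start, part_of)
    return {(s, p, o) for s, p, o in kg if s in subs or o in subs}


# Illustrative toy triples, not an extract of DBpedia.
kg = {
    ("dbr:Storming_of_the_Bastille", "dbo:isPartOfMilitaryConflict", "dbr:French_Revolution"),
    ("dbr:Storming_of_the_Bastille", "dbo:place", "dbr:Paris"),
    ("dbr:French_Revolution", "rdf:type", "dbo:MilitaryConflict"),
}
print(len(retrieval(kg, "dbr:French_Revolution")))  # 2: the rdf:type triple is left out
```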

Definition 8 (Pruned Best-First Search Algorithm)

A pruned best-first search algorithm explores a graph by expanding the unvisited node with the highest score according to a predefined heuristic. During traversal, the algorithm prunes (i.e., excludes) nodes and edges of specific types, reducing the search space while focusing the exploration on the most promising parts of the graph.

We provide the pseudo-code for our subgraph extraction method in Algorithm 1, which implements a pruned best-first search in four stages at each iteration: the nodes with the highest score are selected and expanded in the graph (Stage 1); event-centric filters prune certain nodes from the search space (Stage 2); the pending set and the output subgraph are updated accordingly (Stage 3); and the nodes in the pending set are scored and ranked to guide the next iteration (Stage 4). The $to\_extract$ parameter specifies the type of nodes to extract from the graph; while it is set to events in our experiments, it can be adapted to extract other node types depending on the dataset or use case. The set $nodes$, initialized in line 4, is the priority queue of nodes to be expanded, drawn from the pending set $P$ initialized in line 3.
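The four stages can be illustrated with a compact Python sketch of the search loop. It is a simplification under stated assumptions, not Algorithm 1 itself: the KG is an in-memory set of triples, all filters are folded into one hypothetical `keep` callable, patterns are keyed by (direction, predicate, expanded node) and scored by their number of unvisited nodes (a predicate-object frequency), the preference function is ignored, and every surviving triple is kept rather than only sub-event triples. The toy graph mirrors Figure 3, with `ex:location` as a placeholder predicate.

```python
from collections import defaultdict


def pruned_best_first_search(kg, start, keep, max_iter=10):
    """Simplified sketch of the four stages of Algorithm 1.

    kg: in-memory set of (s, p, o) triples (assumption).
    start: IRI of the starting event.
    keep: hypothetical callable (triple, node) -> bool; False means the filters prune node.
    Returns the collected triples and the set of expanded nodes.
    """
    visited, subgraph = set(), set()
    # Pending nodes grouped by pattern: (direction, predicate, expanded node) -> nodes
    patterns = defaultdict(set)
    to_expand = {start}
    for _ in range(max_iter):
        if not to_expand:
            break
        for node in to_expand:
            visited.add(node)
            # Stage 1: expand the node by scanning its ingoing and outgoing triples
            for s, p, o in kg:
                if node not in (s, o):
                    continue
                direction, other = ("out", o) if s == node else ("in", s)
                # Stage 2: skip visited neighbours and those pruned by the filters
                if other in visited or not keep((s, p, o), other):
                    continue
                # Stage 3: record the neighbour as pending and keep the triple
                patterns[(direction, p, node)].add(other)
                subgraph.add((s, p, o))
        # Stage 4: score each pattern by its number of unvisited nodes
        # (a predicate-object frequency) and expand the best pattern next
        scores = {key: len(nodes - visited) for key, nodes in patterns.items()}
        best = max(scores, key=scores.get, default=None)
        if best is None or scores[best] == 0:
            to_expand = set()
        else:
            to_expand = patterns[best] - visited
    return subgraph, visited


# Toy graph mirroring Figure 3; "ex:location" is a placeholder predicate.
FR = "dbr:French_Revolution"
kg = {
    ("dbr:Storming_of_the_Bastille", "dbo:isPartOfMilitaryConflict", FR),
    ("dbr:Insurrection_of_10_August_1792", "dbo:isPartOfMilitaryConflict", FR),
    ("dbr:Pierre_Douville", "dbo:battle", FR),
    (FR, "rdf:type", "dbo:MilitaryConflict"),
    (FR, "ex:location", "dbr:Kingdom_of_France"),
}
LOCATIONS = {"dbr:Kingdom_of_France"}


def keep(triple, node):
    """WHAT-like and WHERE-like pruning: drop rdf:type edges and known locations."""
    return triple[1] != "rdf:type" and node not in LOCATIONS


subgraph, visited = pruned_best_first_search(kg, FR, keep, max_iter=2)
print(sorted(visited))
```

With `max_iter=2`, the first iteration expands dbr:French_Revolution and selects the dbo:isPartOfMilitaryConflict pattern (two pending nodes, against one for dbo:battle), so the second iteration expands the two sub-events, as in Figure 3.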

To illustrate these stages more intuitively, Figure 3 shows how a single iteration of the graph search unfolds.⁴ This visual aid complements the pseudo-code with a concrete example of node expansion, filtering, and scoring. In Stage 1, the highest-ranked nodes are expanded by retrieving their ingoing and outgoing neighbors. In the figure, dbr:French_Revolution is expanded. This leads to new, unvisited ingoing nodes such as dbr:Insurrection_of_10_August_1792 or dbr:Storming_of_the_Bastille, and unvisited outgoing nodes such as dbo:MilitaryConflict and dbr:Kingdom_of_France. Stage 2 applies the node filtering. For example, dbr:Kingdom_of_France is removed because it is a location node (WHERE), dbo:MilitaryConflict is discarded because it is reached via the rdf:type predicate (WHAT), and dbr:Jean-Jacques_Rousseau is excluded because he died before the French Revolution began (WHEN). These filters focus the traversal on event nodes, which are more likely to lead to other relevant events, and thereby reduce noise during the search. Importantly, some filtered nodes, typically people and locations, are re-integrated during the KG construction phase to ensure a complete and coherent event representation. Stage 3 updates the pending node set and the output subgraph with the remaining nodes. In the figure, dbr:Pierre_Douville, dbr:Storming_of_the_Bastille, and dbr:Insurrection_of_10_August_1792 are added to the pending nodes to possibly be expanded, and the triples connecting the latter two nodes to dbr:French_Revolution are added to the output subgraph. Lastly, in Stage 4, candidate nodes are grouped by patterns and ranked using scoring metrics such as predicate frequency or entropy predicate frequency to guide the next iteration. In the figure, dbr:Storming_of_the_Bastille and dbr:Insurrection_of_10_August_1792 will be expanded at the next iteration.

Figure 3.

Description of One Iteration During the Informed Graph Traversal. The Type of Ranking Is Predicate-Object Frequency. The WHERE, WHAT, and WHEN Filters Are Activated (Formally Defined as $e\_filter_{\{\text{dbo:Location}\}, DB, \text{rdf:type}}$, Where $DB$ Is the Set of Triples in DBpedia, $p\_filter_{\{\text{rdf:type}\}}$, and $t\_filter_{\text{dbo:startDate}, \text{1789-05-05}, \text{dbo:endDate}, \text{1799-12-31}}$, Respectively).

The pseudo-code and the visual overview highlight the key stages of our graph traversal process. We now formally define the event-centric filters (Section 3.1.3) and the ranking mechanism (Section 3.1.4), which guide the subgraph extraction by pruning irrelevant nodes and prioritizing the most informative ones.

3.1.3. Event-Centric Filters

We define a set of generic filters to prune the search space during traversal, instantiated in our experiments as event-centric filters (see Section 4.1). While the filters are generic, our usage for the event-centric KGs follows the SEM ontology (Van Hage et al., 2011) and is aligned with its four core classes, as detailed in the introduction of Section 3.

The intuition behind these filters is to restrict the traversal to event-centric paths, that is, to nodes that are themselves events. This design helps prevent the search from drifting into semantically less relevant or noisy parts of the graph. For instance, expanding a person or location node may lead to biographical or geographic information that, while informative, does not contribute directly to the sub-event structure we aim to retrieve. Therefore, such entities are, when possible, not expanded further during traversal. However, they remain essential to the broader context of events and are reintegrated during the KG construction phase, which enriches the retrieved event-centric core with relevant contextual information such as people, places, and other linked entities.

In Figure 3 (Stage 2), the filters act dynamically, pruning newly visited nodes based on their type or connection. These filters could be implemented using SPARQL queries or graph navigational languages such as NautiLOD (Fionda et al., 2015), though such approaches lack the ability to rank patterns (see Section 3.1.4). Below, we formally define the filter mechanism.

Definition 9 (Event-Centric Filters)

We hereafter define three types of event-centric filters: predicate-based, entity-based, and temporal-based. Each filter is a Boolean function used to prune the search space in a knowledge graph $KG = \{(s, p, o) \mid s \in E, o \in E \cup L, p \in R\}$: a value of 1 means that the triple or node is discarded.

Definition 9.1
Predicate-based filter.

Let $R^{'} \subset R$ and $(s, p, o) \in K G$ . The predicate-based filter discards $(s, p, o)$ from the search space if $p \in R^{'}$ . It can be formally defined as follows:
$\begin{aligned} p\_filter_{R'} : KG & \longrightarrow \{0, 1\} \\ (s, p, o) & \longmapsto \begin{cases} 1 & \text{if } p \in R' \\ 0 & \text{otherwise} \end{cases} \end{aligned}$
–
WHAT-filter. For example, $R' = \{\text{rdf:type}\}$.

Definition 9.2
Entity-based filter.

Let $E' \subset E$, $p \in R$, and $s \in E$. The entity-based filter discards $s$ from the search space if $s$ is linked through $p$ to some element of $E'$ (with $p$ = rdf:type, if $s$ is of any type in $E'$). It can be formally defined as follows:
$\begin{aligned} e\_filter_{E', KG, p} : E & \longrightarrow \{0, 1\} \\ s & \longmapsto \begin{cases} 1 & \text{if } \exists t \in E' \text{ such that } (s, p, t) \in KG \\ 0 & \text{otherwise} \end{cases} \end{aligned}$

–
WHO-filter. For example, $E' = \{\text{dbo:Person}\}$, $p = \text{rdf:type}$.
–
WHERE-filter. For example, $E' = \{\text{dbo:Location}\}$, $p = \text{rdf:type}$.

Definition 9.3
Temporal-based filter.

Let $L_{date} \subseteq L$ be the subset of literals corresponding to dates. Let $(p_{start}, p_{end}) \in R^{2}$ and $(t_{start}, t_{end}) \in L_{date}^{2}$, where $p_{start}$ and $p_{end}$ are predicates denoting the start and end dates of an event, respectively, and $t_{start}$ and $t_{end}$ are literal timestamps. The temporal-based filter discards $s \in E$ from the search space if $s$ ends before $t_{start}$ or starts after $t_{end}$. It can be formally defined as follows:
$\begin{aligned} t\_filter_{p_{start}, t_{start}, p_{end}, t_{end}} : E & \longrightarrow \{0, 1\} \\ s & \longmapsto \begin{cases} 1 & \text{if } \exists t_{1} \in L_{date} \text{ such that } (s, p_{start}, t_{1}) \in KG \text{ and } t_{1} > t_{end}, \\ & \text{or } \exists t_{2} \in L_{date} \text{ such that } (s, p_{end}, t_{2}) \in KG \text{ and } t_{2} < t_{start} \\ 0 & \text{otherwise} \end{cases} \end{aligned}$
–
WHEN-filter. For example, $p_{start} = \text{dbo:birthDate}$, $p_{end} = \text{dbo:deathDate}$, $t_{start} = \text{“1789-05-05”}$, and $t_{end} = \text{“1799-12-31”}$.

The above filters are parameters of the traversal and can be activated on demand. In Figure 3, the activated filters are WHERE, WHAT, and WHEN (formally defined as $e\_filter_{\{\text{dbo:Location}\}, DB, \text{rdf:type}}$, $p\_filter_{\{\text{rdf:type}\}}$, and $t\_filter_{\text{dbo:startDate}, \text{1789-05-05}, \text{dbo:endDate}, \text{1799-12-31}}$, respectively). For WHAT, dbo:MilitaryConflict is discarded from the search space, since it is the object of the triple (dbr:French_Revolution, rdf:type, dbo:MilitaryConflict). For WHERE, dbr:Kingdom_of_France is discarded since it is of type dbo:Location. Lastly, dbr:Jean-Jacques_Rousseau is discarded by the WHEN filter because Rousseau died before the French Revolution started.
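The three filter families of Definition 9 can be sketched as Python functions that return 1 when a triple or node should be discarded. The sketch assumes date literals have already been parsed into `datetime.date` objects. The toy triples reproduce the Figure 3 examples; Rousseau's birth and death dates (28 June 1712 and 2 July 1778) are added for illustration, and the birthDate/deathDate predicates follow the WHEN-filter example of Definition 9.3.

```python
from datetime import date


def p_filter(triple, banned_predicates):
    """Definition 9.1: 1 (discard) if the triple's predicate is in R'."""
    _, p, _ = triple
    return int(p in banned_predicates)


def e_filter(s, kg, banned_types, p="rdf:type"):
    """Definition 9.2: 1 (discard) if (s, p, t) is in KG for some t in E'."""
    return int(any((s, p, t) in kg for t in banned_types))


def t_filter(s, kg, p_start, t_start, p_end, t_end):
    """Definition 9.3: 1 (discard) if s starts after t_end or ends before t_start."""
    starts_late = any(q == p_start and o > t_end for x, q, o in kg if x == s)
    ends_early = any(q == p_end and o < t_start for x, q, o in kg if x == s)
    return int(starts_late or ends_early)


kg = {
    ("dbr:French_Revolution", "rdf:type", "dbo:MilitaryConflict"),
    ("dbr:Kingdom_of_France", "rdf:type", "dbo:Location"),
    ("dbr:Jean-Jacques_Rousseau", "dbo:birthDate", date(1712, 6, 28)),
    ("dbr:Jean-Jacques_Rousseau", "dbo:deathDate", date(1778, 7, 2)),
}
what = p_filter(("dbr:French_Revolution", "rdf:type", "dbo:MilitaryConflict"), {"rdf:type"})
where = e_filter("dbr:Kingdom_of_France", kg, {"dbo:Location"})
when = t_filter("dbr:Jean-Jacques_Rousseau", kg, "dbo:birthDate", date(1789, 5, 5),
                "dbo:deathDate", date(1799, 12, 31))
print(what, where, when)  # 1 1 1
```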
3.1.4. Ranking Mechanism

To prioritize the most relevant nodes for expansion, we implement a ranking mechanism that operates at the level of relation patterns. Rather than scoring individual nodes in isolation, our method groups nodes according to the structural and semantic patterns they instantiate in the graph. Each pattern is then evaluated and ranked based on its relevance to the current search context.

The ranking strategy combines two complementary components: a quantitative heuristic, which assigns a numerical score to each pattern, and a semantic preference function, which classifies each pattern as semantically relevant or not. The final score of a pattern integrates both aspects to reflect contextual fit and semantic importance. Once the top-ranked pattern is selected, all nodes in the pending set that match this pattern are expanded. The process repeats until no relevant patterns or nodes remain, or until the maximum number of iterations is reached. We now formally introduce relation patterns and node satisfaction, before detailing the scoring and preference mechanisms used to rank them. All examples are taken from Figure 3.

Definition 10 (Relation Patterns and Node Satisfaction)

Let $KG = \{(s, p, o) \mid s \in E, o \in E \cup L, p \in R\}$. We define the following relation patterns in $KG$ as Boolean functions. For each pattern defined below, we say that a node $n \in E$ satisfies a pattern if the corresponding pattern function returns 1 for $n$.

Definition 10.1
Single-step outgoing predicate pattern.

Given a predicate $p \in R$, we formally define $opp_{p, KG}$ as follows:
$\begin{aligned} opp_{p, KG} : E & \longrightarrow \{0, 1\} \\ n & \longmapsto \begin{cases} 1 & \text{if } \exists o \in E \text{ such that } (n, p, o) \in KG \\ 0 & \text{otherwise} \end{cases} \end{aligned}$
–
Example. We assume $KG = \text{DBpedia}$ and $p = \text{dbo:isPartOfMilitaryConflict}$.

dbr:Storming_of_the_Bastille satisfies $o p p_{p, K G}$ , since

(dbr:Storming_of_the_Bastille, $p$ , dbr:French_Revolution) $\in K G$ .

Definition 10.2
Single-step ingoing predicate pattern.

Given a predicate $p \in R$, we formally define $ipp_{p, KG}$ as follows:
$\begin{aligned} ipp_{p, KG} : E & \longrightarrow \{0, 1\} \\ n & \longmapsto \begin{cases} 1 & \text{if } \exists s \in E \text{ such that } (s, p, n) \in KG \\ 0 & \text{otherwise} \end{cases} \end{aligned}$
–
Example. We assume $KG = \text{DBpedia}$ and $p = \text{dbo:isPartOfMilitaryConflict}$.

dbr:French_Revolution satisfies $ipp_{p, KG}$, since

(dbr:Storming_of_the_Bastille, $p$, dbr:French_Revolution) $\in KG$.

Definition 10.3
Single-step outgoing predicate-object pattern.

Given a predicate $p \in R$ and an entity $e \in E$, we formally define $opop_{p, e, KG}$ as follows:
$\begin{aligned} opop_{p, e, KG} : E & \longrightarrow \{0, 1\} \\ n & \longmapsto \begin{cases} 1 & \text{if } (n, p, e) \in KG \\ 0 & \text{otherwise} \end{cases} \end{aligned}$
–
Example. We assume $KG = \text{DBpedia}$, $p = \text{dbo:isPartOfMilitaryConflict}$, and

$e = \text{dbr:French\_Revolution}$.

dbr:Storming_of_the_Bastille satisfies $opop_{p, e, KG}$, since

(dbr:Storming_of_the_Bastille, $p$, dbr:French_Revolution) $\in KG$.

Definition 10.4
Single-step ingoing predicate-object pattern.

Given a predicate $p \in R$ and an entity $e \in E$, we formally define $ipop_{p, e, KG}$ as follows:
$\begin{aligned} ipop_{p, e, KG} : E & \longrightarrow \{0, 1\} \\ n & \longmapsto \begin{cases} 1 & \text{if } (e, p, n) \in KG \\ 0 & \text{otherwise} \end{cases} \end{aligned}$
–
Example. We assume $KG = \text{DBpedia}$, $p = \text{dbo:isPartOfMilitaryConflict}$, and

$e = \text{dbr:Storming\_of\_the\_Bastille}$.

dbr:French_Revolution satisfies $ipop_{p, e, KG}$, since

(dbr:Storming_of_the_Bastille, $p$, dbr:French_Revolution) $\in KG$.
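The four relation patterns of Definition 10 translate directly into membership tests. The Python sketch below follows the formulas as stated and evaluates them on the Table 3 triple (dbr:Storming_of_the_Bastille, dbo:isPartOfMilitaryConflict, dbr:French_Revolution). For brevity, it omits the check that the neighbouring node is an entity rather than a literal; the constant names are ours.

```python
def opp(kg, p, n):
    """Definition 10.1: n has an outgoing p-edge (entity check on the object omitted)."""
    return int(any(s == n and q == p for s, q, _ in kg))


def ipp(kg, p, n):
    """Definition 10.2: n has an ingoing p-edge (entity check on the subject omitted)."""
    return int(any(o == n and q == p for _, q, o in kg))


def opop(kg, p, e, n):
    """Definition 10.3: (n, p, e) is in KG."""
    return int((n, p, e) in kg)


def ipop(kg, p, e, n):
    """Definition 10.4: (e, p, n) is in KG."""
    return int((e, p, n) in kg)


P = "dbo:isPartOfMilitaryConflict"
BASTILLE, FR = "dbr:Storming_of_the_Bastille", "dbr:French_Revolution"
KG = {(BASTILLE, P, FR)}  # the Table 3 triple about the Storming of the Bastille

print(opp(KG, P, BASTILLE), ipp(KG, P, FR))                  # 1 1
print(opop(KG, P, FR, BASTILLE), ipop(KG, P, BASTILLE, FR))  # 1 1
print(ipop(KG, P, FR, BASTILLE))                             # 0
```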

Definition 11 (Scoring)

Let $G' \subseteq KG$.

We first define scoring metrics for ingoing and outgoing predicate patterns. Consider a predicate $p \in R$ and the patterns $ipp_{p}$ and $opp_{p}$ from Definition 10.

–
Predicate frequency (pf). Edges with predicates that are often used in the dataset are favored.
$pf(ipp_{p}) = |\{s \mid \exists o \in E \text{ such that } (s, p, o) \in G'\}|$
(1)
$pf(opp_{p}) = |\{o \mid \exists s \in E \text{ such that } (s, p, o) \in G'\}|$
(2)
–
Entropy predicate frequency (epf). Edges with the most informative predicates in the dataset are favored. Let $pp \in \{ipp_{p}, opp_{p}\}$.
$epf(pp) = \begin{cases} -\frac{pf(pp)}{|G'|} \times \log\left(\frac{pf(pp)}{|G'|}\right) & \text{if } pf(pp) > 0 \\ 0 & \text{otherwise} \end{cases}$
(3)
–
Inverse predicate frequency (ipf). Edges with predicates that are rarely used in the dataset are favored. Let $pp \in \{ipp_{p}, opp_{p}\}$.
$ipf(pp) = \begin{cases} \frac{1}{pf(pp)} & \text{if } pf(pp) > 0 \\ 0 & \text{otherwise} \end{cases}$
(4)

Table 3.
Triples Considered for the Scoring Example. Only Unvisited Nodes Are Taken Into Account.

Subject	Predicate	Object
dbr:Pierre_Douville	dbo:battle	dbr:French_Revolution
dbr:Insurrection_of_10_August_1792	dbo:isPartOfMilitaryConflict	dbr:French_Revolution
dbr:Storming_of_the_Bastille	dbo:isPartOfMilitaryConflict	dbr:French_Revolution

Second, we define scoring metrics for ingoing and outgoing predicate-object patterns. Consider $p \in R$, $e \in E$, and the patterns $ipop_{p, e}$ and $opop_{p, e}$ from Definition 10.
–
Predicate-object frequency (pof). This is similar to pf, but this metric also differentiates between objects.
$pof(ipop_{p, e}) = |\{s \mid (s, p, e) \in G'\}|$
(5)
$pof(opop_{p, e}) = |\{o \mid (e, p, o) \in G'\}|$
(6)
–
Entropy predicate-object frequency (epof). This is similar to epf, but this metric also differentiates between objects. Let $pop \in \{ipop_{p, e}, opop_{p, e}\}$.
$epof(pop) = \begin{cases} -\frac{pof(pop)}{|G'|} \times \log\left(\frac{pof(pop)}{|G'|}\right) & \text{if } pof(pop) > 0 \\ 0 & \text{otherwise} \end{cases}$
(7)
–
Inverse predicate-object frequency (ipof). This is similar to ipf, but this metric also differentiates between objects. Let $pop \in \{ipop_{p, e}, opop_{p, e}\}$.
$ipof(pop) = \begin{cases} \frac{1}{pof(pop)} & \text{if } pof(pop) > 0 \\ 0 & \text{otherwise} \end{cases}$
(8)

Consider the triples in Table 3, for which $|G'| = 3$.

Since dbr:French_Revolution was already visited, we only consider the ingoing patterns. Given the above definitions, we obtain the following scores:
–
$pf(ipp_{\text{dbo:battle}}) = 1$
–
$pf(ipp_{\text{dbo:isPartOfMilitaryConflict}}) = 2$
–
$epf(ipp_{\text{dbo:battle}}) = -\frac{pf(ipp_{\text{dbo:battle}})}{|G'|} \times \log\left(\frac{pf(ipp_{\text{dbo:battle}})}{|G'|}\right) = -\frac{1}{3} \times \log \frac{1}{3} \approx 0.16$
–
$epf(ipp_{\text{dbo:isPartOfMilitaryConflict}}) = -\frac{pf(ipp_{\text{dbo:isPartOfMilitaryConflict}})}{|G'|} \times \log\left(\frac{pf(ipp_{\text{dbo:isPartOfMilitaryConflict}})}{|G'|}\right) = -\frac{2}{3} \times \log \frac{2}{3} \approx 0.12$

The scoring metric used in Figure 3 is pof. The score of the pattern $ipop_{\text{dbo:isPartOfMilitaryConflict}, \text{dbr:French\_Revolution}}$ is 2, since two ingoing nodes of dbr:French_Revolution are connected to it with the predicate dbo:isPartOfMilitaryConflict. If the scoring metric were epof, the score would be $-\frac{2}{3} \times \log \frac{2}{3} \approx 0.12$.
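The worked example can be reproduced with a short Python sketch of Equations (1), (3), (5), and (7), plus the inverse scores. Two points are our reading rather than statements of the text: the logarithm is taken in base 10, since that is what yields the reported values 0.16 and 0.12 (natural logarithms would give about 0.37 and 0.27), and the inverse score is computed as a positive $1/f$, following Equation (8), so that rarely used predicates score higher. Function names are ours.

```python
import math

# Table 3: triples considered for the scoring example, |G'| = 3.
G_PRIME = [
    ("dbr:Pierre_Douville", "dbo:battle", "dbr:French_Revolution"),
    ("dbr:Insurrection_of_10_August_1792", "dbo:isPartOfMilitaryConflict", "dbr:French_Revolution"),
    ("dbr:Storming_of_the_Bastille", "dbo:isPartOfMilitaryConflict", "dbr:French_Revolution"),
]


def pf_in(g, p):
    """Equation (1): distinct subjects s such that some (s, p, o) is in G'."""
    return len({s for s, q, _ in g if q == p})


def pof_in(g, p, e):
    """Equation (5): distinct subjects s such that (s, p, e) is in G'."""
    return len({s for s, q, o in g if q == p and o == e})


def entropy_score(freq, size):
    """Equations (3) and (7): -(f/|G'|) * log10(f/|G'|), or 0 when f = 0."""
    if freq <= 0:
        return 0.0
    ratio = freq / size
    return -ratio * math.log10(ratio)


def inverse_score(freq):
    """Inverse scores, written as the positive 1/f of Equation (8), or 0 when f = 0."""
    return 1 / freq if freq > 0 else 0.0


n = len(G_PRIME)
print(pf_in(G_PRIME, "dbo:battle"), pf_in(G_PRIME, "dbo:isPartOfMilitaryConflict"))  # 1 2
print(round(entropy_score(1, n), 2), round(entropy_score(2, n), 2))  # 0.16 0.12
print(pof_in(G_PRIME, "dbo:isPartOfMilitaryConflict", "dbr:French_Revolution"))  # 2
```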
Definition 12 (Preference Function)

The preference function is a Boolean function that classifies a pattern as semantically relevant if it meets certain conditions. Let $KG = \{(s, p, o) \mid s \in E, o \in E \cup L, p \in R\}$ be a KG. Let $d \in R$ and $sc \in R$ be predicates denoting domain and subclass information, respectively. Let $t_{e} \in E$ be the type of events to retrieve. Let $p \in R$, $e \in E$, and let $pp \in \{ipp_{p}, opp_{p}, ipop_{p, e}, opop_{p, e}\}$ be a pattern built on $p$. We formally define the preference function as follows:

\begin{aligned} pref\_fpp_{d, sc, t_{e}} : \{ipp_{p}, opp_{p}, ipop_{p, e}, opop_{p, e}\} & \longrightarrow \{0, 1\} \\ pp & \longmapsto \begin{cases} 1 & \text{if } \exists n \in E \text{ such that } (p, d, n) \in KG \text{ and } (n, sc, t_{e}) \in KG \\ 0 & \text{otherwise} \end{cases} \end{aligned}

–
For example, $d = \text{rdfs:domain}$, $sc = \text{rdfs:subClassOf}$, $t_{e} = \text{dbo:Event}$.

Among the patterns scored in the scoring step, the preference function prioritizes those whose predicates favor nodes of the correct type. We use rdfs:domain and rdfs:range, or their equivalent predicates in the ontology at hand. In DBpedia, we have (dbo:isPartOfMilitaryConflict, rdfs:domain, dbo:MilitaryConflict) and (dbo:MilitaryConflict, rdfs:subClassOf*, dbo:Event). Consequently, in a triple (s, dbo:isPartOfMilitaryConflict, o), $s$ must be of type dbo:Event. With the preference function, the system prefers patterns with dbo:isPartOfMilitaryConflict over dbo:battle, since the former has dbo:MilitaryConflict, a subclass of dbo:Event, as its domain, whereas the latter lacks explicit information about its domain or range. In Figure 3, the preference function does not affect the selection of nodes to expand. Without the preference function, patterns are ranked in descending order of their scores from the scoring step.
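The preference function can be sketched in Python as follows. Following the text, the subclass relation is read transitively (rdfs:subClassOf*), which the formula in Definition 12 leaves implicit. The toy ontology fragment is modelled on the DBpedia ontology for illustration, and since the preference depends only on the predicate $p$, patterns are represented here by their predicate alone. The `rank` function, which sorts preferred patterns first and breaks ties by score, is a hypothetical way of combining preference and score; the exact combination is not specified here.

```python
from collections import deque

# Toy ontology fragment modelled on the DBpedia ontology (illustrative).
KG = {
    ("dbo:isPartOfMilitaryConflict", "rdfs:domain", "dbo:MilitaryConflict"),
    ("dbo:MilitaryConflict", "rdfs:subClassOf", "dbo:SocietalEvent"),
    ("dbo:SocietalEvent", "rdfs:subClassOf", "dbo:Event"),
}


def superclasses(kg, cls, sc="rdfs:subClassOf"):
    """Classes reachable from cls through zero or more sc edges (sc*)."""
    seen, queue = {cls}, deque([cls])
    while queue:
        current = queue.popleft()
        for s, p, o in kg:
            if s == current and p == sc and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen


def pref(kg, predicate, t_e, d="rdfs:domain", sc="rdfs:subClassOf"):
    """1 if some domain n of the predicate satisfies (n, sc*, t_e), else 0."""
    domains = {o for s, p, o in kg if s == predicate and p == d}
    return int(any(t_e in superclasses(kg, n, sc) for n in domains))


def rank(scored_patterns, kg, t_e):
    """Hypothetical combination: preferred patterns first, then by descending score."""
    return sorted(scored_patterns, key=lambda ps: (pref(kg, ps[0], t_e), ps[1]), reverse=True)


patterns = [("dbo:battle", 5), ("dbo:isPartOfMilitaryConflict", 2)]  # hypothetical scores
print(rank(patterns, KG, "dbo:Event")[0][0])  # dbo:isPartOfMilitaryConflict
```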

We have presented the event-centric subgraph extraction component of ChronoGrapher. We now present its event-centric KG population component.
3.2. Event-Centric KG Population (RQ2)

To build a comprehensive and fine-grained representation of events, ChronoGrapher incorporates a second component dedicated to constructing event-centric KGs. This process complements the subgraph extraction phase by enriching the retrieved events with contextual information. Such information may be explicitly encoded in the KG through IRIs or expressed in texts stored as literals.

Algorithm 2 outlines the KG construction procedure. Starting from the events identified during the subgraph extraction step, the system applies two complementary triple enrichment strategies, one structure-based and one text-based, to build the event-centric KG:

1:
Structured triple enrichment (IRI-based). For each event, the system retrieves its outgoing triples from the original KG and maps them to SEM-compatible relations. This mapping relies on a predefined set of predicate labels ($S$ in the pseudo-code), which determine the corresponding narrative dimensions (e.g., time, place, and actor). This step is rule-based and aims to reproduce the type of structured information found in resources such as EventKG (Gottschalk & Demidova, 2018).
2:
Text-based triple enrichment (abstract-based). The system then processes the abstract associated with each event, typically encoded as a literal. Using a semantic information extraction pipeline based on frame semantics, it identifies frames and their roles (frame elements), which are then linked to named entities. This allows the construction of more fine-grained relations between events and entities, such as causal relations.

An example of the IRI-based KG construction is shown in Figure 4. On the left, we show a portion of the original Wikidata subgraph for the Coup of 18 Brumaire (wd:Q620965), while the right side presents the transformed, SEM-aligned event-centric KG. Table 4 lists the label patterns used in this process. For each triple (s, p, o), if the label of p matches one of the predefined labels associated with a narrative dimension (e.g., time, place, and actor), a corresponding SEM-compatible triple is added to the output graph. This transformation is governed by a rule-based mapping approach, which assigns SEM predicates based on the labels of the original RDF predicates. For instance, consider a predicate whose label contains the substring “person.” This label indicates a relation involving an actor. In this case, the system adds the triple (s, sem:hasActor, o) to the event-centric KG.⁵

Figure 4.
Populating a SEM Knowledge Graph From the Coup of 18 Brumaire in Wikidata. wd Is a Prefix Used in Wikidata for http://www.wikidata.org/entity/.

Table 4.
Labels Used to Retrieve Information From Knowledge Graph (KG) Triples.

Narrative dimension	SEM predicate	Labels
Who	sem:hasActor	person, combatant, commander, and participant
When (begin)	sem:hasBeginTimeStamp	start time, date, and point in time
When (end)	sem:hasEndTimeStamp	end time
Where	sem:hasPlace	place, location, and country
Part of	sem:subEventOf	partof and part of
Part of (inverse)	sem:hasSubEvent	has part and significant event
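The rule-based mapping of Table 4 can be sketched in Python as a case-insensitive substring match over predicate labels. The first-match-in-table-order policy, the helper names, and the placeholder objects (`ex:...`) are our assumptions. The Wikidata property labels in the example (P710 “participant”, P580 “start time”, P31 “instance of”) are the properties' English labels; in practice they would be fetched from the KG.

```python
# Table 4 rules: (SEM predicate, label substrings), checked in table order.
LABEL_RULES = [
    ("sem:hasActor", ["person", "combatant", "commander", "participant"]),
    ("sem:hasBeginTimeStamp", ["start time", "date", "point in time"]),
    ("sem:hasEndTimeStamp", ["end time"]),
    ("sem:hasPlace", ["place", "location", "country"]),
    ("sem:subEventOf", ["partof", "part of"]),
    ("sem:hasSubEvent", ["has part", "significant event"]),
]


def sem_predicate(label):
    """SEM predicate of the first rule whose substring occurs in the label, or None."""
    label = label.lower()
    for sem_pred, substrings in LABEL_RULES:
        if any(sub in label for sub in substrings):
            return sem_pred
    return None


def to_sem(triples, labels):
    """Rewrite (s, p, o) as (s, sem_predicate, o) when the label of p matches a rule."""
    out = set()
    for s, p, o in triples:
        sem_pred = sem_predicate(labels.get(p, ""))
        if sem_pred is not None:
            out.add((s, sem_pred, o))
    return out


labels = {"wdt:P710": "participant", "wdt:P580": "start time", "wdt:P31": "instance of"}
triples = [
    ("wd:Q620965", "wdt:P710", "ex:participant_1"),  # placeholder objects
    ("wd:Q620965", "wdt:P580", "ex:start_literal"),
    ("wd:Q620965", "wdt:P31", "ex:some_class"),
]
for t in sorted(to_sem(triples, labels)):
    print(t)
```

The instance-of triple is dropped because its label matches no rule, which mirrors how unmapped predicates are left out of the SEM-aligned graph.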

Beyond transforming IRI triples, ChronoGrapher enriches the KG by extracting structured information from textual descriptions, specifically event abstracts encoded as literals. To achieve this, we rely on frame semantics theory (Fillmore & Baker, 2001), which represents situations as frames, that is, as structured templates involving participants, temporal properties, and other contextual roles. Some event-centric KGs built from news articles (Leetaru & Schrodt, 2013; Rospocher et al., 2016) also use frames. Although EventKG contains text events and links them to entities, it does not provide the finer-grained semantic structure of frames. One example is the Change_of_leadership frame.⁶ It describes “the appointment of a New_leader or removal from office of an Old_leader.” Semantic roles such as Old_leader and New_leader are called frame elements.

ChronoGrapher automatically identifies such frames and their associated roles by reusing the pre-trained transformer-based model introduced by Chanin (2023). Each frame instance is aligned with the NIF ontology (Hellmann et al., 2013) to anchor it in the source text, and DBpedia Spotlight (Mendes et al., 2011) is used to link frame elements to entities. The resulting frame structures are then integrated into the KG and linked to the Framester ontology (Gangemi et al., 2016), using a representation consistent with Framester’s semantic model.⁷,⁸

Figure 5 illustrates this process. Starting from the abstract of dbr:Coup_of_18_Brumaire, the text is segmented into sentences (e.g., ex:S1). A frame annotation such as ex:FNAnnot1 is associated with the frame frame:Cause_to_end, which in turn connects to roles like ex:E1, representing entities such as the French Revolution. These literals are grounded to IRI entities (e.g., dbr:French_Revolution) via entity linking. This frame-based enrichment enables ChronoGrapher to capture causal, temporal, and participant-related semantics that go beyond what is available from structured triples alone, and is particularly useful for answering complex event-centric queries (as explored in RQ3).

Figure 5.
Example of Frames Extracted From Text About the Coup of 18 Brumaire. The Knowledge Graph (KG) Contains Both Literals (in Gray) and Linked KG Entities. For Instance, the French Revolution Appears as a Literal, Directly Tied to Its Sentence and Serving as the Value of Role ex:E1, While the Linked DBpedia Entity (dbr:French_Revolution) is Associated to ex:E1 via skos:related.
4. Evaluation

We present the setup (Section 4.1), then results on the quality of the method (Section 4.2) and on the impact of the constructed event-centric KGs on LLM question-answering (Section 4.3). The quantitative evaluation (Section 4.2, RQ1 and RQ2) assesses the quality of the method, both for the traversal (RQ1) and for the KG population (RQ2). The evaluation for RQ1 is fully automatic and reproducible. The evaluation for RQ2 is partly automatic and partly manual. For RQ3, we conducted a preliminary user study to assess the potential of event-centric KGs in an LLM question-answering setting, and found encouraging qualitative results.

4.1. Experimental Setting (RQ1 and RQ2)

Our experiments first aim to assess the quality of ChronoGrapher, both for event-centric subgraph extraction (RQ1) and for event-centric KG population (RQ2). For RQ1, we investigate the effects of filtering and ranking on the graph traversal approach. We experiment with different configurations of six parameters of the search process: (1) the scoring metric, (2) the preference function (cf. Section 3.1.4), and (3)–(6) the filters pertaining to WHAT, WHERE, WHEN, and WHO (Definition 9). The scoring metric is one of the six options defined in Definition 11, whereas all other parameters are Boolean.

To speed up the experiments, we used two Ubuntu machines with similar hardware configurations (40 CPU cores, and 314 GiB and 376 GiB of memory, respectively). Given their close similarity, we do not expect the use of two machines to have introduced meaningful performance variability. Memory usage remains relatively constant because the KG is kept in memory, whereas the approach is expected to be mostly CPU-intensive, as it issues a large number of query interface calls during execution.

Parameter notation. Throughout this section, we use the following binary parameters to indicate which semantic filters or retrieval strategies are enabled in each configuration:

–
what, where, when, who $\in \{0, 1\}$: These indicate whether the corresponding semantic filters are active. For instance, who = 1 means the WHO-filter defined in Section 3.1.3 is applied.
–
domain_range $\in \{0, 1\}$: When set to 1, the domain-range preference function of Definition 12 is used to prioritize patterns whose predicates are compatible with the target event type.

Baselines. For RQ1 on event-centric subgraph extraction, we compare ChronoGrapher with three categories of baselines that we implemented, adapting LDSpider (Isele et al., 2010) and NautiLOD (Fionda et al., 2015) to align with our method. The performance comparison between ChronoGrapher and these baselines focuses solely on the quality of the extracted sub-events, not on the entire subgraph.
–
random-5, random-10, and random-15: At each iteration, $nb$ nodes are randomly selected to be expanded. We used $nb \in \{5, 10, 15\}$. Parameters (1)–(6) are not applicable.
–
ldspider-m: Inspired by LDSpider (Isele et al., 2010), this search is a breadth-first search with some predefined predicates to be pruned out of the search space. Parameters (1) and (2) are not applicable, and (3)–(6) are set to 0.
–
nautilod-m: Inspired by NautiLOD (Fionda et al., 2015), this baseline works as a breadth-first search combined with more complex filters than ldspider-m. Parameters (1) and (2) are not applicable, but we experiment with (3)–(6).

Evaluation. We chose EventKG 3.1 (Gottschalk & Demidova, 2020) as a gold standard primarily because, to the best of our knowledge, it is the only existing resource that automatically constructs event-centric KGs from large-scale KGs such as DBpedia and Wikidata, making it the most relevant point of comparison for our method. Additionally, EventKG relies on similar input datasets, which allows for a meaningful overlap in evaluation. Although the original EventKG paper does not detail a formal evaluation of its own event selection process, its scale, structured format, and availability make it a valuable reference baseline for both subgraph extraction and KG construction tasks. We discuss this further in Section 5.1.

For RQ1 on event-centric subgraph extraction, we constructed a gold standard by querying EventKG using SPARQL SELECT queries to retrieve relevant event instances. The notebook is available to replicate this process.⁹ For RQ2 on event-centric KG population, we used SPARQL CONSTRUCT queries to adapt EventKG’s triples, specifically, we replaced EventKG IRIs with the original DBpedia/Wikidata instances and aligned the predicates to SEM. The code implementing this adaptation is publicly available in our repository.¹⁰

We chose to compare our method’s output to EventKG using the standard F1, precision and recall metrics. For RQ1, which focuses on extracting a subgraph from a KG given an input event, we use these metrics at the event level to evaluate whether the extracted subject or object nodes are indeed sub-events of the target event. This allows us to assess the correctness and completeness of the events in the subgraph. For RQ2, which concerns constructing an event-centric KG, we use the same metrics at the triple level to evaluate whether the resulting KG correctly describes sub-events with relevant features like place, time, or actor. This helps us determine how well the structural component of the KG captures the intended event semantics. These metrics thus offer a standard and interpretable way to assess performance for both subgraph extraction and KG construction.
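Set-based precision, recall, and F1 can be computed with a few lines of Python. The same function applies at the event level (RQ1), by passing event IRIs, and at the triple level (RQ2), by passing (s, p, o) tuples. The example IRIs below are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Set-based precision, recall, and F1 of predicted items against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # items found in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Hypothetical event IRIs: two of the four predicted events are in the gold standard.
p, r, f = precision_recall_f1({"ex:e1", "ex:e2", "ex:e3", "ex:e4"}, {"ex:e1", "ex:e2", "ex:e5"})
print(round(p, 2), round(r, 2), round(f, 2))  # 0.5 0.67 0.57
```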

Datasets. Our experiments focus on DBpedia (2021-09) (Auer et al., 2007), Wikidata (2021-03-05) (Vrandečić & Krötzsch, 2014) and YAGO4 (Pellissier Tanon et al., 2020) because EventKG was built on them. We use the HDT (Header, Dictionary, and Triples) version of DBpedia, which we obtained from TriplyDB.¹¹ We obtain Wikidata from rdfhdt¹² and YAGO4 from their official website.¹³ We chose HDT primarily for performance and scalability reasons when working with large datasets such as DBpedia and Wikidata. HDT allows for fast, local querying thanks to its compressed, indexed structure, which significantly reduces loading and access times compared to traditional RDF formats or remote SPARQL endpoints.¹⁴ This is especially suitable for tasks involving repeated and large-scale traversals, such as our method. However, our method is not tied to HDT. We also implemented a SPARQL-based interface, enabling the traversal to be executed on any SPARQL endpoint. This ensures the approach remains flexible and broadly applicable beyond the HDT setting.

Table 5 shows statistics on events and their sub-events across datasets. All types of events are taken into account, mainly historical events, sports events, and political events such as elections. We retain only the historical events with more than ten sub-events, as they offer more than a simple list of events. Nonetheless, we report additional experiments on sports and political events in Section 5.3.2. The final distribution of historical events in EventKG and their average number of outgoing links are presented in Table 6. Events with more sub-events tend to have a higher average number of features. For YAGO4, the limited number of outgoing links caused traversals to end after one iteration. YAGO4 was built on Wikidata with an improved schema but without the full content of Wikidata. We found that its ingoing links were mostly about creative works and its outgoing links mostly about metadata, which the search discards. This resulted in a limited number of features. We therefore only used Wikidata and DBpedia.
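
The selection criterion can be stated compactly in code. The following is a minimal sketch under two assumptions: the sub-event links are available as (sub-event, parent) pairs, and each parent event already carries a coarse category label (how historical events are identified is not detailed here). `retain_events` is a hypothetical helper that keeps the historical events with more than ten distinct sub-events.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def retain_events(
    sub_event_links: Iterable[Tuple[str, str]],
    category: Dict[str, str],
    min_sub_events: int = 10,
    keep_category: str = "historical",
) -> List[str]:
    """Parent events of `keep_category` with more than `min_sub_events` sub-events.

    sub_event_links: (sub_event, parent_event) pairs; duplicates are ignored.
    category: parent event -> category label (assumed precomputed).
    """
    counts = Counter(parent for _, parent in set(sub_event_links))
    return sorted(
        event
        for event, n in counts.items()
        if n > min_sub_events and category.get(event) == keep_category
    )


links = (
    [(f"ex:A_sub{i}", "ex:A") for i in range(11)]    # 11 sub-events, historical
    + [(f"ex:B_sub{i}", "ex:B") for i in range(11)]  # 11 sub-events, sports
    + [(f"ex:C_sub{i}", "ex:C") for i in range(3)]   # 3 sub-events, historical
)
category = {"ex:A": "historical", "ex:B": "sports", "ex:C": "historical"}
print(retain_events(links, category))
```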

Table 5.
Statistics on Events and Their Sub-Events Across Datasets, Extracted From EventKG.

	Number of sub-events
Dataset	=1	>1	>10	Retained
Wikidata	203,988	238,094	2408	341
DBpedia	84,599	95,504	1333	250
YAGO4	70,738	76,682	993	306

Each cell shows, per dataset, the number of events with at least one sub-event, broken down by the total number of sub-events: exactly one (=1), more than one (>1), and more than ten (>10). The “Retained” column indicates the subset of historical events with more than 10 sub-events that were retained for our experiments. KG = knowledge graph.

Table 6.
Number of Retained Events and Average Number of Features for Events With More Than $n$ Sub-Events.

	Number of retained events					Average number of features
Dataset	>10	>20	>30	>50	>100	>10	>20	>30	>50	>100
Wikidata	341	208	147	91	47	237	273	290	323	381
DBpedia	275	158	106	66	37	122	133	141	147	160
YAGO4	306	180	126	76	44	5	4	5	5	5

Filters and preference function definition. The filters from Definition 9 are generic; we instantiate them with event-centric variables for the WHAT, WHERE, WHEN, and WHO filters. Table 7 lists the predicates we used for each filter. Each filter relates to one of the core classes of the SEM ontology (Van Hage et al., 2011), which makes them event-centric. Other filters can also be implemented by instantiating other variables, which makes the method easy to extend and to reuse for other tasks.
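
To make the DBpedia rows of Table 7 concrete, the sketch below wires them into three pruning checks. Definition 9 is not restated in this section, so the pruning semantics are our assumptions:

- The WHAT predicate filter stops traversal along edges whose predicate is in $R'$.
- The WHO and WHERE entity filters keep people and places as features but do not expand them.
- The WHEN temporal filter prunes nodes whose dates, read through the start and end predicates, fall entirely outside the target event’s time span.

All function names are hypothetical. Date literals are assumed to be plain ISO strings.

```python
from datetime import date
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# DBpedia instantiation from Table 7.
WHAT_R = {"rdf:type"}                                      # predicate filter R'
WHO_E, WHO_P = {"dbo:Person"}, "rdf:type"                  # entity filter (E', p)
WHERE_E, WHERE_P = {"dbo:Place", "dbo:Location"}, "rdf:type"
WHEN_PS = {"dbp:date", "dbp:startDate", "dbp:birthDate"}   # start predicates
WHEN_PE = {"dbp:endDate", "dbp:deathDate"}                 # end predicates


def what_filter(triples: Iterable[Triple]) -> List[Triple]:
    """Assumed: never traverse along predicates in R' (they lead to generic class nodes)."""
    return [t for t in triples if t[1] not in WHAT_R]


def is_expandable(node: str, kb: Set[Triple]) -> bool:
    """Assumed: people (WHO) and places (WHERE) are kept as features but not expanded."""
    typed_who = any((node, WHO_P, c) in kb for c in WHO_E)
    typed_where = any((node, WHERE_P, c) in kb for c in WHERE_E)
    return not (typed_who or typed_where)


def passes_when(node: str, kb: Set[Triple], start: date, end: date) -> bool:
    """Assumed: prune nodes whose dates lie entirely outside [start, end].

    Nodes without any start/end date are kept, since missing dates are common.
    """
    begins = [date.fromisoformat(o) for s, p, o in kb if s == node and p in WHEN_PS]
    ends = [date.fromisoformat(o) for s, p, o in kb if s == node and p in WHEN_PE]
    if not begins and not ends:
        return True
    node_start, node_end = min(begins or ends), max(ends or begins)
    return node_start <= end and node_end >= start


kb = {
    ("dbr:Battle_of_Valmy", "dbp:date", "1792-09-20"),
    ("dbr:Battle_of_Waterloo", "dbp:date", "1815-06-18"),
    ("dbr:Georges_Danton", "rdf:type", "dbo:Person"),
    ("dbr:Paris", "rdf:type", "dbo:Place"),
}
revolution = (date(1789, 5, 5), date(1799, 11, 9))  # illustrative time span
print(passes_when("dbr:Battle_of_Valmy", kb, *revolution))     # True
print(passes_when("dbr:Battle_of_Waterloo", kb, *revolution))  # False
print(is_expandable("dbr:Georges_Danton", kb))                 # False
```

Swapping in the Wikidata rows of Table 7 would only change the constants at the top.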

Table 7.
Instantiated Variables Used for Each of the Filters and the Preference Function.

Name	Type	Dataset	Variable	Predicates
WHAT	Predicate filter	DB	$R'$	rdf:type
		WD	$R'$	wp:P31
WHO	Entity filter	DB	$E'$	dbo:Person
			$p$	rdf:type
		WD	$E'$	wd:Q5
			$p$	wd:P31
WHERE	Entity filter	DB	$E'$	dbo:Place, dbo:Location
			$p$	rdf:type
		WD	$E'$	wd:P17, wp:P276, wd:Q6256
			$p$	wp:P31
WHEN	Temporal filter	DB	$p_s$	dbp:date, dbp:startDate, dbp:birthDate
			$p_e$	dbp:endDate, dbp:deathDate
		WD	$p_s$	wp:P585, wp:P580, wp:P569
			$p_e$	wp:P582, wp:P570
Preference function	–	DB	$d$	rdfs:domain
			$sc$	rdfs:subClassOf
			$t_e$	dbo:Event
		WD	$d$	wd:Q21503250
			$sc$	wp:P279
			$t_e$	wd:Q13418847

DB = DBpedia; WD = Wikidata.

4.2. Quantitative Evaluation (RQ1 and RQ2)

We now present the quantitative evaluation of our method, structured around RQ1 on event-centric subgraph extraction and RQ2 on KG construction. Section 4.2.1 presents the parameter selection procedure for the event-centric subgraphs (RQ1), whereas Section 4.2.2 reports the results of the search component, including comparisons with baselines. Section 4.2.3 addresses RQ2 by evaluating the construction of enriched event-centric KGs.

4.2.1. Parameter Selection for the Search (RQ1)

For chronographer and nautilod-m, we first experimented on 12 events to select the most meaningful parameters before running the graph traversal on all events. Table 8 shows the 12 events used for parameter selection. There were 192 possible parameter combinations ($6 \times 2^{5}$, i.e., six scoring metrics combined with five binary parameters) for chronographer, and 16 ($2^{4}$, i.e., four binary filters) for nautilod-m.
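
The two counts follow from crossing the scoring metrics with binary switches. The sketch below takes its parameter names from Table 9 and from the setting reported for the full experiments (domain_range, what, where, when, who). It assumes that nautilod-m has the four filter switches but no domain_range ranking. Under that assumption, the enumeration reproduces 192 and 16 configurations. The function names are hypothetical.

```python
from itertools import product

METRICS = ["pf", "pof", "epf", "epof", "ipf", "ipof"]            # six scoring metrics
CHRONO_FLAGS = ["domain_range", "what", "where", "when", "who"]  # five on/off switches
NAUTILOD_FLAGS = ["what", "where", "when", "who"]                # no domain_range ranking


def chronographer_grid():
    """Yield every (metric, switches) configuration for chronographer."""
    for metric, bits in product(METRICS, product([0, 1], repeat=len(CHRONO_FLAGS))):
        yield {"metric": metric, **dict(zip(CHRONO_FLAGS, bits))}


def nautilod_grid():
    """Yield every switch configuration for nautilod-m."""
    for bits in product([0, 1], repeat=len(NAUTILOD_FLAGS)):
        yield dict(zip(NAUTILOD_FLAGS, bits))


print(len(list(chronographer_grid())), len(list(nautilod_grid())))  # 192 16
```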

Table 8.
Events Used for Parameter Selection.

World War I	American Indian Wars	Mediterranean and Middle East Theatre of World War II
French Revolution	Cold War	European Theatre of World War II
Napoleonic Wars	Coalition Wars	European Theatre of World War I
Pacific War	Russian Civil War	Yugoslav Wars

We investigated the correlation between our ranking and filtering parameters and the best F1 score. For chronographer, we found a strong positive correlation between domain_range and higher F1 scores: prioritizing events based on their domain range thus significantly improves event retrieval performance. We observed a weak but statistically significant correlation between when and the F1 score, suggesting that adding temporal information could help the search process. For nautilod-m, we found a strong positive correlation between what and the best F1 score. Indeed, the what filter removes more generic nodes from the search, which explains its effectiveness here.
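
The text does not name the correlation coefficient that was used. Because each parameter is binary, one natural choice, assumed here, is the point-biserial coefficient. It is simply Pearson’s r between the 0/1 switch and the F1 score. A significance test (for instance SciPy’s `scipy.stats.pointbiserialr`) is not reproduced. The run values and the helper name are made up for illustration.

```python
from math import sqrt
from typing import Sequence


def point_biserial(flag: Sequence[int], f1: Sequence[float]) -> float:
    """Pearson correlation between a 0/1 parameter and F1 scores."""
    if len(flag) != len(f1) or len(flag) < 2:
        raise ValueError("need two equally long sequences with at least two runs")
    n = len(flag)
    mean_x, mean_y = sum(flag) / n, sum(f1) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(flag, f1))
    var_x = sum((x - mean_x) ** 2 for x in flag)
    var_y = sum((y - mean_y) ** 2 for y in f1)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical runs: F1 with the switch off (0) or on (1).
domain_range = [0, 0, 1, 1]
f1_scores = [0.40, 0.50, 0.70, 0.80]
print(round(point_biserial(domain_range, f1_scores), 3))
```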

Table 9 shows the contribution of each filter individually. For instance, what = 1 means that only the experiments with this parameter activated, and all others set to 0, are taken into account. For nautilod-m, the parameter that yields the best result is what, with an average F1 of 0.36, and the best average F1 score is obtained when all parameters are activated. With the non-inverse metrics (pf, pof, epf, and epof), chronographer performs better than or comparably to nautilod-m under all configurations. For chronographer, the parameter that yields the best results is domain_range. For computational and performance reasons, we set the following parameters: domain_range = when = what = where = who = 1. For the full experiments, we kept all metrics for chronographer except the inverse ones.
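
The selection rule behind Table 9 (exactly the listed switches on, all others off) can be written as a filter over individual runs before averaging. The sketch assumes one record per (event, configuration) run, with hypothetical field names and helpers. The “no filter” row corresponds to an empty set of active switches and the “all” row to every switch.

```python
from statistics import mean
from typing import Dict, Iterable, List, Set

FLAGS = ["domain_range", "what", "where", "when", "who"]


def mean_f1(runs: Iterable[Dict], metric: str, active: Set[str]) -> float:
    """Mean F1 over runs using `metric` with exactly the switches in `active` on."""
    scores: List[float] = [
        r["f1"]
        for r in runs
        if r["metric"] == metric and all(r[g] == int(g in active) for g in FLAGS)
    ]
    if not scores:
        raise ValueError("no run matches this configuration")
    return mean(scores)


def run(metric: str, active: Set[str], f1: float) -> Dict:
    """Build a hypothetical run record."""
    return {"metric": metric, "f1": f1, **{g: int(g in active) for g in FLAGS}}


runs = [
    run("pof", {"what"}, 0.50),
    run("pof", {"what"}, 0.46),
    run("pof", set(FLAGS), 0.77),
    run("pf", {"what"}, 0.10),
]
print(mean_f1(runs, "pof", {"what"}))    # "what = 1" row
print(mean_f1(runs, "pof", set(FLAGS)))  # "all" row
```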

Table 9.

Mean F1 Score for Individual Parameters. no filter and all Mean That All Parameters Are Set to 0 and 1, Respectively. pf, pof, epf, epof, ipf, and ipof Are the Six Possible Scoring Metrics (Section 4.1).

		chronographer
Parameter	nautilod-m	pf	pof	epf	epof	ipf	ipof
domain_range = 1	N/A	0.66	0.77	0.66	0.77	0.66	0.65
what = 1	0.36	0.50	0.48	0.37	0.39	0.25	0.27
where = 1	0.28	0.47	0.48	0.33	0.39	0.25	0.23
when = 1	0.27	0.47	0.48	0.45	0.43	0.29	0.30
who = 1	0.25	0.43	0.50	0.39	0.48	0.26	0.26
no filter	0.31	0.46	0.48	0.34	0.39	0.24	0.24
all	0.53	0.69	0.77	0.69	0.77	0.69	0.72

pf = predicate frequency; pof = predicate-object frequency; epf = entropy predicate frequency; epof = entropy predicate-object frequency; ipf = inverse predicate frequency; ipof = inverse predicate-object frequency.

Lastly, Table 10 shows the results of all systems. We set a timeout of 10 h for each experiment. Only five experiments from ldspider-m timed out; in these cases, we took the latest output of the search at that point. Our systems achieve better results than the baselines. The search is also more localized and efficient, since the output graphs are smaller and the runtimes shorter.
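
A minimal sketch of a best-first traversal with a time budget follows. It is not ChronoGrapher’s implementation. The `score` function stands in for the ranking metrics and the domain_range preference, and the `keep` predicate stands in for the event-centric filters. The toy graph and scores are invented. The point it illustrates is how a timed-out run can still return the latest output of the search.

```python
import heapq
import time
from typing import Callable, Iterable, List, Set


def best_first_search(
    start: str,
    neighbours: Callable[[str], Iterable[str]],
    score: Callable[[str], float],
    keep: Callable[[str], bool],
    max_expansions: int = 100,
    time_budget_s: float = 10 * 3600,  # 10 h timeout, as in the experiments
) -> List[str]:
    """Expand the highest-scoring frontier node first; return nodes in visit order.

    When the time budget runs out, the nodes visited so far are returned,
    i.e. the latest output of the search at that point.
    """
    deadline = time.monotonic() + time_budget_s
    frontier = [(-score(start), start)]  # heapq is a min-heap, so negate scores
    seen: Set[str] = {start}
    visited: List[str] = []
    while frontier and len(visited) < max_expansions:
        if time.monotonic() > deadline:
            break  # timeout: keep the partial result
        _, node = heapq.heappop(frontier)
        visited.append(node)
        for nxt in neighbours(node):
            if nxt not in seen and keep(nxt):  # filters prune the frontier
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return visited


graph = {
    "French_Revolution": ["Battle_of_Valmy", "Flight_to_Varennes", "Paris"],
    "Battle_of_Valmy": ["Battle_of_Jemappes"],
}
scores = {"French_Revolution": 1.0, "Battle_of_Valmy": 0.9, "Battle_of_Jemappes": 0.8,
          "Flight_to_Varennes": 0.5, "Paris": 0.1}
order = best_first_search(
    "French_Revolution",
    neighbours=lambda n: graph.get(n, []),
    score=lambda n: scores[n],
    keep=lambda n: n != "Paris",  # stand-in for the event-centric filters
)
print(order)
```

Because the frontier is ordered by score, the high-scoring sub-event Battle_of_Jemappes is visited before the lower-scoring Flight_to_Varennes, even though it is one hop further away.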

Table 10.

Results Comparison of Various Systems.

System	F1	Precision	Recall	Nb_Nodes	Nb_Preds	Nb_Triples	Runtime
random-5	0.24	0.68	0.16	40,356	70	40,915	38’00”
random-10	0.31	0.54	0.24	1,554	70	2,027	43’00”
random-15	0.28	0.57	0.21	11,890	89	12,744	25’00”
ldspider-m	0.23	0.70	0.23	126,015	137	241,049	5’20”
nautilod-m	0.53	0.51	0.63	5,096	121	13,755	23’00”
chronographer-pf	0.69	0.70	0.71	874	21	2366	1’51”
chronographer-pof	0.78	0.79	0.77	750	17	2090	2’13”
chronographer-epf	0.69	0.70	0.72	857	20	2353	1’35”
chronographer-epof	0.77	0.78	0.77	726	17	2082	1’42”

pf = predicate frequency; pof = predicate-object frequency; epf = entropy predicate frequency; epof = entropy predicate-object frequency.

4.2.2. Search Results (RQ1)

Table 11 shows the final results on the remaining 604 events for all systems. The three metrics tend to improve when moving from the random baselines to our chronographer methods: the random baselines and ldspider-m obtain the lowest F1 scores (0.63–0.65), and chronographer-epof obtains the best ones (tied with chronographer-pof on Wikidata). The differences are most visible in the recall scores, which means that the informed systems produce fewer false negatives than the random baselines. Precision scores for Wikidata tend to be higher than those for DBpedia, whereas the reverse holds for recall; the content retrieved from Wikidata is therefore less noisy but also less complete. On DBpedia, the random baselines and ldspider-m yield much bigger graphs than nautilod-m and the chronographer variants, and take more time to run. nautilod-m performs better than these baselines; however, its graph sizes and runtimes remain roughly four times or more those of the chronographer variants.
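
The size and runtime gap between nautilod-m and the chronographer variants can be checked directly against Table 11. The short computation below uses only the DBpedia columns. On these numbers, the ratios range from about 3.8 (runtime of chronographer-epf) to about 5.9 (triples of chronographer-pof). The `to_seconds` parser is a hypothetical helper for the table’s minutes’seconds” notation.

```python
def to_seconds(runtime: str) -> int:
    """Parse runtimes written as 2’02” (minutes’seconds”) or 26” (seconds only)."""
    minutes, _, seconds = runtime.rstrip("”").rpartition("’")
    return int(minutes or 0) * 60 + int(seconds)


# DBpedia columns of Table 11: (nodes, triples, runtime).
nautilod_m = (481, 1150, "2’02”")
chronographer = {
    "pf": (108, 213, "26”"),
    "pof": (97, 196, "25”"),
    "epf": (108, 212, "32”"),
    "epof": (99, 199, "24”"),
}

ratios = {}
for name, (nodes, triples, runtime) in chronographer.items():
    # nautilod-m value divided by the chronographer value, per column
    ratios[name] = (
        nautilod_m[0] / nodes,
        nautilod_m[1] / triples,
        to_seconds(nautilod_m[2]) / to_seconds(runtime),
    )
    print(name, [round(r, 1) for r in ratios[name]])
```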

Table 11.
Systems Results on the 604 Events.

	F1		Precision		Recall		Nb Nodes		Nb Preds		Nb Triples		Runtime
System	DB	WD	DB	WD	DB	WD	DB	WD	DB	WD	DB	WD	DB	WD
random-5	0.65	0.63	0.72	0.79	0.68	0.59	22,185	427	23	9	23,221	530	23’00”	11’00”
random-10	0.65	0.64	0.72	0.78	0.69	0.60	30,078	1033	27	10	33,264	1470	18’56”	11’33”
random-15	0.64	0.63	0.71	0.78	0.68	0.60	41,868	1139	27	11	53,561	1582	18’26”	16’43”
ldspider-m	0.64	0.63	0.73	0.78	0.68	0.61	131,607	4992	52	20	239,548	7389	27’05”	44’05”
nautilod-m	0.70	0.66	0.72	0.81	0.74	0.62	481	4787	23	14	1150	7321	2’02”	39’26”
chronographer-pf	0.75	0.67	0.75	0.82	0.80	0.63	108	661	5	6	213	1146	26”	9’38”
chronographer-pof	0.77	0.68	0.76	0.83	0.82	0.64	97	612	5	6	196	863	25”	8’43”
chronographer-epf	0.75	0.67	0.75	0.83	0.80	0.62	108	670	6	6	212	963	32”	9’07”
chronographer-epof	0.78	0.68	0.77	0.83	0.82	0.63	99	752	5	6	199	1093	24”	8’16”

DB = DBpedia, WD = Wikidata; pf = predicate frequency; pof = predicate-object frequency; epf = entropy predicate frequency; epof = entropy predicate-object frequency.

To summarize findings on event-centric subgraph extraction (RQ1), adding event-centric filters (WHAT, WHERE, WHEN, and WHO) and a ranking step with an entropy predicate-object frequency heuristic helps to select relevant events while being more time-efficient and exploring smaller parts of the graphs, as shown by the better performance of the chronographer systems in Table 11.

4.2.3. Event-Centric KG Population Results (RQ2)

To generate our event-centric KGs, we use information from triples (i) and text (ii). On average, over all 281 historical events with more than 10 sub-events from DBpedia, 28,123 triples were generated during the extraction from text, against 548 other triples; the extraction from text thus generates about 51 times as many triples. For the triples generated by the information extraction, the KG predicates are distributed, on average per event, as 41.9% rdf predicates, 40.0% wsj predicates, 12.2% nif predicates, 5.6% skos predicates, and 0.3% dbo:abstract predicates.

(i) Constructing event-centric KG from triples. We generate event-centric KGs for each event using all sub-events from the ground truth, and count the triples they have in common with EventKG. The results are shown in Table 12. ChronoGrapher achieves an overall F1 score of 67.2% and 76.0% for DBpedia and Wikidata, respectively. For DBpedia, precision is higher and recall lower than for Wikidata, which means that the DBpedia event-centric KGs are less noisy but less complete. For Wikidata, it is the opposite.

Table 12.
Our Event-Centric KGs Against EventKG.

	F1		Precision		Recall
Predicate	DB	WD	DB	WD	DB	WD
all	67.2	76.0	92.1	72.1	52.9	80.4
sem:hasActor	65.0	43.6	90.4	58.1	50.7	34.9
sem:hasBeginTimeStamp	77.7	81.9	90.5	71.7	68.0	95.5
sem:hasEndTimeStamp	77.7	75.9	90.5	70.4	68.1	82.4
sem:hasPlace	63.6	90.3	95.0	92.1	47.8	88.6

DB = DBpedia; WD = Wikidata; KGs = knowledge graphs.

Table 13 shows the F1, precision, and recall scores of the narrative graphs generated from the output of the search algorithm presented in Section 3.1. Our end-to-end system achieves an overall (all predicates) F1 score of 51.7% and 49.2% for DBpedia and Wikidata, respectively. As in Table 12, precision is higher for DBpedia, while recall tends to be higher for Wikidata. These results are lower than those in Table 12, which is expected since the output of the search also contains events that are not among the ground-truth events: none of the triples generated for such events can appear in the EventKG ground truth.

Table 13.

Metrics of Generated Narrative Graphs Compared to EventKG.

	F1		Precision		Recall
Pred	DB	WD	DB	WD	DB	WD
all	51.7	49.2	79.3	41.1	38.4	61.4
sem:hasActor	48.5	27.6	76.5	35.1	35.5	22.8
sem:hasBeginTimeStamp	62.6	50.3	79.5	38.0	51.6	74.4
sem:hasEndTimeStamp	62.7	48.3	79.5	38.1	51.7	65.9
sem:hasPlace	48.2	59.2	81.6	50.7	34.2	71.2

DB = DBpedia; WD = Wikidata; KGs = knowledge graphs.

(ii) Constructing event-centric KG from text. We generate event-centric KGs from all the DBpedia events and sub-events that have an abstract; the content extracted in this part is not present in EventKG. In total, this amounts to 9438 events. On average, each event had 18 distinct frames and 48 frame instantiations with at least one mapped role. The most frequent frames are Military (26,681), Hostile_encounter (26,111), Political_locales (17,513), Attack (16,155), and Leadership (14,002). Since we are interested in events and the relations between them, we focus our qualitative analysis on “causation” frames, although many other frames are extracted. For 100 frame annotations, we manually annotate (1) whether there is a causation frame, (2) whether the cause and the effect are correctly identified, and (3) whether the DBpedia entity is correct. Out of the 100 frames, 97 had at least one identified Cause or Effect, and 85 were correctly identified as Causation frames. False positive causation frames often contain causative lemmas, such as the noun “cause”, that might have misled the model. Among the 171 retrieved frame elements, 151 (88.3%) were correct; most of the incorrect frame elements came from false positive causation frames. Among the 94 DBpedia entities that were identified, 80 (85.1%) were correct.

To summarize findings on event-centric KG population (RQ2), we use information extraction from both triples and text. For triples, our system achieves F1 scores of 67.2% and 76.0% for DBpedia and Wikidata, respectively. For text, we conduct a qualitative evaluation on the “causation” frames and find that the system retrieves this information with good performance.

To illustrate the constructed KGs, we provide several sample KG files in the public repository,¹⁵ along with a file describing the contents of each. These examples, based on the French Revolution scenario, illustrate the different KG construction strategies used in our pipeline. Specifically, eventkg_ng.ttl corresponds to the KG generated from EventKG; frame_ng.ttl is derived from event abstracts (abstract-based); generation_ng.ttl results from ground-truth event IRIs (IRI-based); and search_ng.ttl captures the KG generated from the output of the subgraph extraction component (IRI-based).

4.3. User Study (RQ3)

We conducted a user study focusing on a QA task. The goal of the study was to evaluate whether integrating our constructed event-centric KGs could enhance the quality of answers produced by LLMs, compared to using LLMs alone. Section 4.3.1 first outlines the design of the user study, including the preparation of evaluation data and the experimental setup. Section 4.3.2 presents and analyzes the results of our user study.

4.3.1. User Study Setup

To evaluate whether integrating event-centric KGs enhances the performance of LLMs compared to using LLMs alone (RQ3), we designed six SEM-centric question types, each reflecting a core role or relation from the SEM ontology: summary of an event, causal explanations for an event, types of events within a time range, sub-events of an event, events in which an actor participated, and events in which two actors both participated. For each type, we created a question template and instantiated it with two manually selected events, resulting in 12 questions. Each question was manually associated with an event during the design phase; no automated mapping was used. Once the event was selected, we retrieved its corresponding event-centric KG using SPARQL queries and used the resulting triples as context in the LLM prompt alongside the natural language question. Table 14 shows examples of the six types of questions.

Table 14.
Event-Centric Templated Question Examples.

No.	Type	Example
1	Summary	“Please provide a summary of the French Revolution”
2	Causal explanation	“What happened at the end of the French Revolution? Please provide the main causal happenings that led to the unfolding of the French Revolution.”
3	Event types within a time range	“What were the main type of events that happened between 1792-01-01 and 1793-01-01 during the French Revolution?”
4	Sub-events	“What were the main events of the French Revolutionary Wars during the French Revolution? Can you list and order them in chronological order?”
5	One actor in an event	“What happened to Antoine Balland during the French Revolution?”
6	Two actors in an event	“In which events were Jean-Baptiste Jourdan and Joseph Bonaparte both involved during the French Revolution?”

We used three different types of prompts: (a) base: base prompt, without context triples, (b) db-kg: base + context triples from DBpedia, and (c) ec-kg: base + context triples from the event-centric KG built by ChronoGrapher. Table 15 details how context triples are retrieved for each question type, both for DBpedia and ChronoGrapher. The detailed queries are available in the code.¹⁶ To integrate structured knowledge into the model’s reasoning process without fine-tuning, we adopt a prompting strategy based on in-context learning. Specifically, we append relevant context triples to a base question prompt. This method enables the model to consider factual information from the knowledge graph at inference time. On average, 345 triples from db-kg and 75 from ec-kg are included in each prompt, with ranges of 18–2606 and 6–434 triples, respectively, depending on the question and retrieved context. On average, 54 triples in the ec-kg prompts originated from the text-based triple enrichment step, accounting for $\sim$ 51% of the total triples. The remaining triples were contributed by the structured triple enrichment process.
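
The three prompt variants differ only in the context appended to the question. The sketch below makes several assumptions, since the paper does not specify the serialization or the wording: one triple per line in a simple `subject predicate object .` form, placed under a `Context triples:` header. The template placeholders are derived from the Table 14 examples and are likewise hypothetical, as is the `build_prompt` helper.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

# Two of the six templates, derived from the examples in Table 14.
TEMPLATES = {
    "summary": "Please provide a summary of the {event}",
    "one_actor": "What happened to {actor} during the {event}?",
}


def build_prompt(question: str, context: Optional[Iterable[Triple]] = None) -> str:
    """base prompt when `context` is None; db-kg / ec-kg prompts otherwise."""
    if context is None:
        return question
    lines = [f"{s} {p} {o} ." for s, p, o in context]
    return question + "\n\nContext triples:\n" + "\n".join(lines)


question = TEMPLATES["one_actor"].format(actor="Antoine Balland", event="French Revolution")
ec_kg_context = [("ex:event_1", "sem:hasActor", "dbr:Antoine_Balland")]  # placeholder IRIs
print(build_prompt(question))                 # base
print(build_prompt(question, ec_kg_context))  # ec-kg
```

The db-kg variant would call the same function with triples retrieved from DBpedia instead.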

Table 15.

Context Triples Retrieval for DBpedia and the Event-centric KGs Built by ChronoGrapher. The Context Triples Are Retrieved Through CONSTRUCT Queries or Equivalent Implementation Using the HDT Data Format.

#	Type	DBpedia	ChronoGrapher
1	Summary	Ingoing and outgoing triples of the event	Temporal, spatial, and frame-based contexts
2	Causal explanation	Ingoing and outgoing triples of the event	Frame annotations for causal frames (e.g., frame:Causation)
3	Event types within a time range	Events of type dbo:Event with timestamps, filtered by time range	Events filtered using start/end timestamps
4	Sub-events	Linked via dbo:isPartOfMilitaryConflict and their properties	Events connected with sem:subEventOf, enriched with abstracts and temporal data
5	One actor in an event	Events where the actor appears in subject or object position	Events with sem:hasActor or frame-mapped role mentions linked to the actor
6	Two actors in an event	Events where both actors co-occur in triples	Events where both actors co-occur as sem:hasActor

KGs = knowledge graphs; HDT = Header, Dictionary, Triples.

Table 16.

Qualitative Evaluation Metrics Used in the User Study. Grounding Was Assessed by Us Only.

Metric	Definition	Scoring guidelines to the participants
Granularity	Level of specificity in the answer (generic vs. specific references)	Is the answer generic or specific? (mention of generic events vs. mention of specific events). A specific answer should get a high score, whereas a generic answer should get a low score.
Relevance	Degree to which the answer addresses the question	Does the answer provide an actual answer to the question? If so, it should get a high score, else a low score.
Succinctness	Clarity and brevity of the answer	Is the answer brief and clearly expressed? If so, it should get a high score, else a low score.
Diversity	Lexical and semantic variation in the answer	Is the answer varied in content, or very repetitive? It should get a high score if the answer is diverse, and a low score if it is repetitive.
Grounding	Extent to which the answer is supported by the provided context triples (assessed internally)	–

In our user study, we used GPT-4.¹⁷ In April 2024, when we conducted the user study, GPT-4 was among the top-performing LLMs available,¹⁸ with strong capabilities in reasoning, coding, and general language understanding. At that time, GPT-4 was widely recognized for its performance across various benchmarks, making it a suitable choice for our user study.

The participants were asked to rate the quality of the answers on a scale of 1 to 5 according to these criteria: (1) granularity of the information, (2) relevance to the question, (3) succinctness, and (4) diversity in content. We also manually assessed (5) grounding to factual events. Above all, a QA system must provide faithful content, which makes groundedness (5) the priority metric. The relevance metric measures how well the answer provides a plausible answer to the question, but not its actual truthfulness.

We gathered 10 participants for our user study, recruited from our research laboratories. The participating researchers are based in Paris or Amsterdam and have a background in computer science and/or computational linguistics. They were not overly familiar with the questions we had them evaluate, and were familiar with LLM systems and their pitfalls. We designed two forms¹⁹ with six questions each, and with three answers per question corresponding to the three prompt types. We split the 10 participants evenly across these two forms. Each participant rated 18 answers (6 question types $\times$ 3 prompt types). Each prompt type was thus assessed 60 times for metrics 1–4 (6 $\times$ 10), and 12 times for metric 5 (6 $\times$ 2). Given the small scale of the study, we focused solely on the French Revolution.

4.3.2. User Study Results

Table 17 presents the average scores for each metric and prompt type. The base prompt has the lowest groundedness, which undermines its reliability; we found that it was prone to hallucinations, often about actors outside the LLM’s implicit knowledge base. This might be linked to its lowest granularity and highest succinctness scores, as it has less content to work with. For the remainder of the analysis, we discard the base prompt because of its low groundedness score.
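
The entries of Table 17 are means with a spread for each prompt type and metric. A minimal aggregation sketch follows. It assumes that individual ratings are stored as (prompt type, metric, score) tuples and that the ± values are sample standard deviations; the paper does not say which deviation is reported. The ratings below are invented, and `summarize` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

Rating = Tuple[str, str, int]  # (prompt type, metric, score)


def summarize(ratings: Iterable[Rating]) -> Dict[Tuple[str, str], Tuple[float, float]]:
    """Mean and sample standard deviation per (prompt type, metric)."""
    groups: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for prompt, metric, score in ratings:
        groups[(prompt, metric)].append(score)
    return {
        key: (mean(scores), stdev(scores) if len(scores) > 1 else 0.0)
        for key, scores in groups.items()
    }


ratings = [
    ("ec-kg", "relevance", 4),
    ("ec-kg", "relevance", 5),
    ("ec-kg", "relevance", 3),
    ("db-kg", "relevance", 5),
]
for (prompt, metric), (m, sd) in sorted(summarize(ratings).items()):
    print(f"{prompt} {metric}: {m:.2f} ± {sd:.2f}")
```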

Table 17.
Average Scores for Metrics on the Human Evaluation. The Groundedness Metric Is the Most Important One as It Assesses the Faithfulness of the Answers.

Prompt type	Groundedness	Granularity	Relevance	Succinctness	Diversity
base	1.11 $\pm$ 1.47	(3.92 $\pm$ 0.96)	(4.18 $\pm$ 1.03)	(4.22 $\pm$ 0.92)	(3.83 $\pm$ 1.03)
db-kg	2.24 $\pm$ 1.6	4.15 $\pm$ 0.9	4.02 $\pm$ 1.07	3.35 $\pm$ 1.01	3.97 $\pm$ 1.09
ec-kg	2.85 $\pm$ 1.86	4.13 $\pm$ 0.96	4.12 $\pm$ 1.01	3.62 $\pm$ 0.99	3.82 $\pm$ 1.03

Comparing the two triple-based prompts, ec-kg stands out with the highest groundedness (+27%) and succinctness (+8%) scores, indicating that its generated answers are both more faithful and more concise. Additionally, ec-kg scores slightly higher in relevance (+2.5%) than db-kg, whereas db-kg achieves a higher diversity score (+3.8%), likely due to its more varied context triples, and a slightly higher granularity score (+0.5%). Overall, ec-kg emerges as the most promising approach, effectively balancing faithfulness, relevance, and conciseness.
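
The percentages in this paragraph are consistent with changes relative to db-kg, computed from the rounded means of Table 17. This reading is our inference, and the short check below makes it explicit.

```python
# Mean scores from Table 17 (db-kg and ec-kg rows).
DB_KG = {"groundedness": 2.24, "granularity": 4.15, "relevance": 4.02,
         "succinctness": 3.35, "diversity": 3.97}
EC_KG = {"groundedness": 2.85, "granularity": 4.13, "relevance": 4.12,
         "succinctness": 3.62, "diversity": 3.82}


def relative_change(new: float, reference: float) -> float:
    """Percentage change of `new` with respect to `reference`."""
    return 100.0 * (new - reference) / reference


for metric in DB_KG:
    # positive: ec-kg higher than db-kg; negative: db-kg higher
    print(f"{metric}: {relative_change(EC_KG[metric], DB_KG[metric]):+.1f}%")
```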

This first, small user study thus suggests that event-centric KGs can help an LLM provide more factual answers to event-centric questions (groundedness of 2.85) than a base prompt (1.11) or a generic KG-based prompt (2.24), while preserving succinctness and relevance.

5. Discussion

In this section, we reflect on the design choices, assumptions, and limitations of our approach. We begin by discussing the role of EventKG as the ground truth (Section 5.1) with an example. We then consider the methodological scope of our framework and motivate our design decisions, particularly the choice of a rule-based approach over learning-based alternatives (Section 5.2). Following this, we explore the generalisability of ChronoGrapher: both to new datasets (Section 5.3.1) and to different types of events beyond the historical domain we primarily focus on (Section 5.3.2). Lastly, we reflect on the potential uses of the event-centric KG produced by ChronoGrapher, and the kinds of usage and applications it supports (Section 5.4).

5.1. EventKG as the Ground Truth

As many parts of our evaluation rely on EventKG as a reference, we conducted a qualitative comparison on a few sub-events of the French Revolution between the subgraphs extracted by ChronoGrapher and their counterparts in EventKG. Our goal was to better understand the complementarity between the two systems.

Figure 6 provides a visual comparison between the outputs of EventKG and ChronoGrapher. In the top part of the figure, we observe that ChronoGrapher does not retrieve dbr:Flight_to_Varennes and dbr:Tennis_Court_Oath that are present in EventKG. Upon inspection, we found that these events are not explicitly typed as dbo:Event in DBpedia, which prevents ChronoGrapher from including them during traversal. Despite these omissions, ChronoGrapher also uncovers relevant sub-events absent from EventKG. For instance, it successfully identifies dbr:Battle_of_Amsteg and dbr:Invasion_of_Guadeloupe_(1794) as sub-events of the French Revolution, which arguably should be considered part of the historical narrative. In the middle part of Figure 6, we show that both EventKG and ChronoGrapher include temporal and spatial information. However, as illustrated in the bottom section of the figure, ChronoGrapher provides a more complete representation, offering additional sub-event links and richer actor associations.

Figure 6.

Comparison Between EventKG and the Output of ChronoGrapher With Two Events.

5.2. Methodological Scope

Our methodology adopts a heuristic, modular approach that prioritizes semantic transparency and temporal control in the construction of event-centric KGs. This design decision is motivated by the requirements of understanding and reasoning tasks, where explainability and domain alignment are essential. While basic in form, the approach performs effectively, as demonstrated in both automatic and user-based evaluations. Moreover, the focus of this work lies not in proposing a novel learning algorithm, but in defining a clear and extensible pipeline for event-centric KG construction and use. By anchoring event representation in a semantically coherent and temporally aware framework, we provide a reliable foundation for downstream learning-based methods, such as embedding models or graph neural networks, that could build upon the structured graphs produced. We are currently working on identifying the best syntactic representations to use to train embeddings for event-centric KGs. We believe this balance of simplicity, effectiveness, and extensibility offers practical value for real-world applications while supporting future integration with more complex models.

5.3. Generalisability of ChronoGrapher

We discuss the generalisability of ChronoGrapher, both in terms of its adaptability to different datasets (Section 5.3.1) and its applicability to diverse types of events (Section 5.3.2).

5.3.1. Generalisability to Other Datasets

ChronoGrapher is flexible and could be applied to other types of nodes or datasets, including those using the PROV vocabulary, which could also yield event-centric graphs. As a reminder, it consists of two components: (1) a pruned, semantically informed best-first search, and (2) a KG constructor. The search component is highly configurable through a lightweight configuration file²⁰ and supports various traversal strategies, such as:

– Semantically informed or random walks, with or without ground truths.
– Target-based walks in the graph.

This is illustrated in our open-source implementation.²¹ The search is also not bound to focus on events only, but can be configured to focus on any type of nodes, such as people or places. Lastly, the search can be run on any KG that is accessible through an HDT dump or a SPARQL endpoint. The constructor is adaptable, provided that predicate labels are retrievable (e.g., via SPARQL).

Furthermore, while ChronoGrapher does not currently implement incremental updates, its design allows for a practical workaround. In dynamic settings (e.g., live sports events), one could extract only the newly added parts of the dataset—centered, for instance, around new event nodes—and rerun ChronoGrapher on these subgraphs. This avoids reconstructing the full event-centric KG from scratch and enables more efficient updates.
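
One way to realize this workaround is sketched below, under our own assumptions. New event nodes are detected as subjects newly typed dbo:Event between two snapshots, and the subgraph handed to ChronoGrapher is their one-hop neighbourhood. Both helpers are hypothetical, the rerun itself is not shown, and the `ex:` IRIs are placeholders.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def typed_events(kb: Set[Triple], event_class: str = "dbo:Event") -> Set[str]:
    """Subjects explicitly typed with `event_class`."""
    return {s for s, p, o in kb if p == "rdf:type" and o == event_class}


def delta_subgraph(old: Set[Triple], new: Set[Triple]) -> Set[Triple]:
    """One-hop neighbourhood, in the new snapshot, of events absent from the old one."""
    seeds = typed_events(new) - typed_events(old)
    return {t for t in new if t[0] in seeds or t[2] in seeds}


old = {("ex:Match_1", "rdf:type", "dbo:Event")}
new = old | {
    ("ex:Match_2", "rdf:type", "dbo:Event"),
    ("ex:Match_2", "dbo:location", "ex:Stadium"),
    ("ex:Player_1", "dbo:team", "ex:Club_1"),
}
print(sorted(delta_subgraph(old, new)))  # only the ex:Match_2 triples
```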

5.3.2. Generalisability to Other Event Types

While our evaluation primarily focuses on historical conflict events, we also explored the applicability of our approach to other domains, notably political and sports events. These types of events often differ structurally from historical ones in KGs: political events (e.g., elections) and sports events (e.g., Olympic competitions) tend to be organized as lists of sub-events (disciplines, rounds, or contests) within a broader event. Moreover, the semantic richness in these domains is often limited, with many relationships captured through generic predicates such as dbo:wikiPageWikiLink, which lack fine-grained semantics.

To assess performance in this context, we conducted additional experiments on DBpedia, accepting dbo:wikiPageWikiLink as a relation for traversal. Of 42 political events and 982 sports events, 5% and 48%, respectively, received an F1 score of zero, with average F1 scores of 56.5% for political events and 22.5% for sports events (considering only one iteration to reduce noise from dbo:wikiPageWikiLink links). This lower performance reveals structural and semantic limitations in how DBpedia currently encodes such events; for instance, we observe that some relevant nodes lack explicit rdf:type information. These insights highlight important directions for future work, including better category integration, improved type inference, and more precise handling of weakly typed links in domain-specific event graphs.
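For reference, per-event F1 and the two aggregate figures reported above (mean F1 and share of events with an F1 of zero) can be computed as in the following Python sketch. It assumes that F1 is computed per event by comparing the set of retrieved nodes with the ground-truth set and then averaged over events; it is an illustration, not the evaluation code from the repository.

```python
def f1(predicted: set[str], gold: set[str]) -> float:
    """Harmonic mean of precision and recall between two node sets."""
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0  # also covers empty predictions, avoiding a division by zero
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


def summarise(results: dict[str, tuple[set[str], set[str]]]) -> tuple[float, float]:
    """Return (mean F1, share of events with F1 == 0) over all events."""
    if not results:
        raise ValueError("no events to summarise")
    scores = [f1(predicted, gold) for predicted, gold in results.values()]
    mean_f1 = sum(scores) / len(scores)
    zero_share = sum(score == 0.0 for score in scores) / len(scores)
    return mean_f1, zero_share
```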

5.4. Usage of the Event-Centric KG

In this work, ChronoGrapher was designed to support a wide variety of question types by focusing on events, people, locations, frame-based relationships, and causal interactions. The resulting event-centric KGs thus capture a rich set of relationships between entities and events. This allows us to answer questions ranging from straightforward fact retrieval (e.g., Who was involved? or Where did the event take place?) to more complex reasoning questions (e.g., What caused this event? or What were the consequences?).
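As a rough illustration of how such questions map onto the graph, the Python sketch below routes a question type to a set of predicates and reads the matching objects of an event node. The predicates sem:hasActor, sem:hasPlace, sem:hasBeginTimeStamp and sem:hasEndTimeStamp come from the Simple Event Model; the causal predicates ex:hasCause and ex:hasEffect are hypothetical placeholders, and a real system would first need to classify the question.

```python
Triple = tuple[str, str, str]

# Question type -> predicates to follow from the event node.
QUESTION_PREDICATES: dict[str, set[str]] = {
    "who": {"sem:hasActor"},
    "where": {"sem:hasPlace"},
    "when": {"sem:hasBeginTimeStamp", "sem:hasEndTimeStamp"},
    "cause": {"ex:hasCause"},  # hypothetical causal predicate
    "consequence": {"ex:hasEffect"},  # hypothetical causal predicate
}


def answer(kg: list[Triple], event: str, question_type: str) -> list[str]:
    """Sorted objects of the event's triples whose predicate matches the question type."""
    predicates = QUESTION_PREDICATES[question_type]
    return sorted(o for s, p, o in kg if s == event and p in predicates)
```

A question whose predicates are absent from the graph simply yields an empty list, since answers can only come from triples that the KG actually contains.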

However, we acknowledge that the types of questions that can be answered are inherently constrained by the content of the KG. As with any knowledge-based task, the quality and scope of the answers are determined by the data represented in the KG. This limitation is a natural consequence of working with structured data, where the captured knowledge reflects the domain's scope and the granularity of its representation. Furthermore, ChronoGrapher's design aligns with the broader goal of narrative construction, which aims to facilitate understanding and reasoning about events within a narrative context. In our previous work, we identified specific requirements for narrative structures that inform the development of such KGs, ensuring that ChronoGrapher can address a variety of questions within its scope and purpose (Blin et al., 2024).

Adapting ChronoGrapher to a new dataset requires some familiarity with the target KG’s ontology, particularly to identify predicates analogous to those in the SEM ontology (e.g., for time, location, or actor roles). While this setup currently requires manual configuration, we envision an interface that could assist non-expert users. Such an interface could suggest relevant predicates based on common usage in the dataset and help define event-centric filters through guided forms or templates. In practice, applying ChronoGrapher to new domains may also involve collaboration with a domain curator or a developer familiar with the schema. These additions would help extend ChronoGrapher’s applicability beyond users familiar with coding.
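One heuristic such an interface might use is to rank predicates by how many event nodes use them. The following Python sketch does this over an in-memory list of triples; the function name, the rdf:type-based detection of event nodes, and the counting rule (each (event, predicate) pair counted once) are assumptions for illustration.

```python
from collections import Counter

Triple = tuple[str, str, str]


def suggest_predicates(triples: list[Triple], event_type: str, top_k: int = 5) -> list[tuple[str, int]]:
    """Rank predicates by the number of distinct event nodes that use them as subject."""
    events = {s for s, p, o in triples if p == "rdf:type" and o == event_type}
    # Count each (event, predicate) pair once so multi-valued predicates are not over-weighted.
    pairs = {(s, p) for s, p, o in triples if s in events and p != "rdf:type"}
    counts = Counter(p for _, p in pairs)
    return counts.most_common(top_k)
```

The top-ranked predicates would then be candidates for the user to map onto their SEM counterparts (time, place, actor) rather than a final configuration.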

6. Conclusion

We address the challenge of automatically constructing event-centric KGs from generic ones. We present ChronoGrapher, a system that extracts event-centric subgraphs from generic KGs and builds event-centric KGs from them. To extract event-centric subgraphs (RQ1), ChronoGrapher uses a pruned, semantically informed best-first search traversal that integrates event-centric filters to prune the search space and a ranking step to prioritize nodes, yielding F1 scores of 0.78 and 0.68 on DBpedia and Wikidata, respectively. To generate event-centric KGs (RQ2), ChronoGrapher combines a structured triple enrichment based on IRIs with a text-based triple enrichment that applies information extraction to abstracts encoded as literals in KGs, achieving F1 scores of 0.67 and 0.76 on DBpedia and Wikidata, respectively. To evaluate whether integrating event-centric KGs enhances the performance of LLMs compared to using LLMs alone (RQ3), we conduct preliminary experiments comparing different prompts in an event-centric QA setting, and show that prompts enriched with event-centric KG triples give more factual answers while maintaining succinctness (2.95 vs. 1.11 and 2.24).

Future work will focus on improving ChronoGrapher in several directions: expanding the search method to reach more sub-events, for example by using EventKG for additional type information; expanding the event-centric KG population beyond EventKG to tackle data noise and incompleteness; improving the modeling of complex entities; considering novel ontologies, vocabularies, and data models for event representation (e.g., RDF-star); and using more sophisticated RAG-based methods in a larger user study. We also plan to develop additional quantitative and automatic metrics to better evaluate the event-centric KGs.

Supplemental Material Statement: Source code for our system, the baselines we (re-)implemented, and the experiments are available on GitHub.²² The repository contains the predicate labels used to generate the event-centric KG (Section 3.2), pointers to download the datasets and more general statistics on events and their sub-events (Section 4.1), the 12 events used for parameter selection (Section 4.2.1), additional results of experiments on the end-to-end system combining event extraction and graph generation, as well as the manual annotations for the causation frames (Section 4.2.3), and all code, prompts, and data linked to the user study (Section 4.3).

Footnotes

Acknowledgements

This work was funded by the European MUHAI project (Horizon 2020 research and innovation program) under grant agreement number 951846 and the Sony Computer Science Laboratories-Paris. We thank Frank van Harmelen for fruitful discussions, and our reviewers for their constructive and insightful feedback. We also thank our user study participants for their time and insights. Figures were created using draw.io.

Funding

This work was funded by the European MUHAI project (Horizon 2020 research and innovation program) under grant agreement number 951846 and the Sony Computer Science Laboratories - Paris.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

ORCID iDs

Inès Blin

Ilaria Tiddi

Remi van Trijp

Annette ten Teije

Notes

References

Alkhateeb, Baget, J.-F., & Euzenat (2009). Extending SPARQL with regular expression patterns (for querying RDF). Journal of Web Semantics, 7(2), 57–73.

Allen, J. F. (1983). Maintaining knowledge about temporal intervals. Communications of the ACM, 26(11), 832–843.

Auer, Bizer, Kobilarov, Lehmann, Cyganiak, & Ives (2007). DBpedia: A nucleus for a web of open data. In The semantic web (pp. 722–735). Springer.

Bakalara, Guyet, Dameron, Happe, & Oger (2021). An extension of chronicles temporal model with taxonomies - Application to epidemiological studies. In HEALTHINF 2021 - 14th International conference on health informatics (pp. 1–10).

Blin, ten Teije, van Harmelen, & Tiddi (2024). Structured representations for narratives. In International conference on knowledge engineering and knowledge management (pp. 133–154). Springer.

Blomqvist, Presutti, Daga, & Gangemi (2010). Experimenting with eXtreme Design. In International conference on knowledge engineering and knowledge management (pp. 120–134). Springer.

Boschee, Lautenschlager, O'Brien, Shellman, Starz, & Ward (2015). ICEWS coded event data. https://doi.org/10.7910/DVN/28075

Carriero, V. A., Gangemi, Mancinelli, M. L., Marinucci, Nuzzolese, A. G., Presutti, & Veninata (2019). ArCo: The Italian cultural heritage knowledge graph. In International semantic web conference (pp. 36–52). Springer.

Chanin (2023). Open-source frame semantic parsing. arXiv preprint arXiv:2303.12788.

Chen, Fisch, Weston, & Bordes (2017). Reading Wikipedia to answer open-domain questions. In R. Barzilay & M. Y. Kan (Eds.), Proceedings of the 55th annual meeting of the association for computational linguistics (Volume 1: Long papers) (pp. 1870–1879). Vancouver, Canada: Association for Computational Linguistics. https://doi.org/10.18653/v1/P17-1171

Chen, Lin, Han, & Sun (2024). Benchmarking large language models in retrieval-augmented generation. In Proceedings of the AAAI conference on artificial intelligence (Vol. 38, pp. 17754–17762).

Cheng & Zhang (2014). Explass: Exploring associations between entities via top-k ontological patterns and facets. In The Semantic Web–ISWC 2014: 13th International semantic web conference, Riva del Garda, Italy, October 19–23, 2014, Proceedings, Part II (pp. 422–437). Springer.

Del Mondo, Peng, Gensel, & Claramunt (2021). Leveraging spatio-temporal graphs and knowledge graphs: Perspectives in the field of maritime transportation. ISPRS International Journal of Geo-Information, 10(8), 541.

De Vocht, Coppens, Verborgh, Vander Sande, Mannens, & Van de Walle (2013). Discovering meaningful connections between resources in the web of data. In LDOW.

Eschauzier, Taelman, & Verborgh (2023). How does the link queue evolve during traversal-based query processing? In QuWeDa/MEPDaW@ISWC (pp. 26–33).

Fang, Sarma, A. D., & Bohannon (2011). REX: Explaining relationships between entity pairs. arXiv preprint arXiv:1111.7170.

Fillmore, C. J., & Baker, C. F. (2001). Frame semantics for text understanding. In Proceedings of WordNet and other lexical resources workshop, NAACL (Vol. 6).

Filtz, Navas-Loro, Santos, Polleres, & Kirrane (2020). Events matter: Extraction of events from court decisions.

Fionda & Pirrò (2017). Explaining graph navigational queries. In The Semantic Web: 14th International conference, ESWC 2017, Portorož, Slovenia, May 28–June 1, 2017, Proceedings, Part I (pp. 19–34). Springer.

Fionda, Pirrò, & Consens, M. P. (2019). Querying knowledge graphs with extended property paths. Semantic Web, 10(6), 1127–1168.

Fionda, Pirrò, & Gutierrez (2015). NautiLOD: A formal language for the web of data graph. ACM Transactions on the Web (TWEB), 9(1), 1–43.

Gangemi, Alam, Asprino, Presutti, & Recupero, D. R. (2016). Framester: A wide coverage linguistic linked data hub. In Knowledge engineering and knowledge management: 20th International conference, EKAW 2016, Bologna, Italy, November 19–23, 2016, Proceedings (pp. 239–254). Springer.

Gottschalk & Demidova (2018). EventKG: A multilingual event-centric temporal knowledge graph. In European semantic web conference (pp. 272–287). Springer.

Gottschalk & Demidova (2019). EventKG - the hub of event knowledge on the web - and biographical timeline generation. Semantic Web, 10(6), 1039–1070.

Gottschalk & Demidova (2020). EventKG. https://doi.org/10.5281/zenodo.4720078

Guan, Cheng, Bai, Zhang, Zeng, Jin, & Guo (2022). What is event knowledge graph: A survey. IEEE Transactions on Knowledge and Data Engineering, 35, 7569–7589.

Guu, Lee, Tung, Pasupat, & Chang, M. W. (2020). REALM: Retrieval-augmented language model pre-training. arXiv:2002.08909 [cs]. https://doi.org/10.48550/arXiv.2002.08909

Guyet (2020). Enhancing sequential pattern mining with time and reasoning. PhD thesis, Université de Rennes 1.

Harris, Seaborne, & Prud'hommeaux (2013). SPARQL 1.1 query language. W3C Recommendation. https://www.w3.org/TR/sparql11-query

Hartig (2013). SQUIN: A traversal based query execution system for the web of linked data. In Proceedings of the 2013 ACM SIGMOD international conference on management of data (pp. 1081–1084).

Hartig, Bizer, & Freytag, J. C. (2009). Executing SPARQL queries over the web of linked data. In International semantic web conference (pp. 293–309). Springer.

Hartig & Freytag, J. C. (2012). Foundations of traversal based query execution over linked data. In Proceedings of the 23rd ACM conference on hypertext and social media (pp. 43–52).

Hartig & Özsu, M. T. (2016). Walking without a map: Ranking-based traversal for querying linked data. In The Semantic Web–ISWC 2016: 15th International semantic web conference, Kobe, Japan, October 17–21, 2016, Proceedings, Part I (pp. 305–324). Springer.

Hartig & Pérez (2016). LDQL: A query language for the web of linked data. Journal of Web Semantics, 41, 9–29.

Hellmann, Lehmann, Auer, & Brümmer (2013). Integrating NLP using linked data. In The Semantic Web–ISWC 2013: 12th International semantic web conference, Sydney, NSW, Australia, October 21–25, 2013, Proceedings, Part II (pp. 98–113). Springer.

Herrera, J. E. T. (2017). On the connectivity of entity pairs in knowledge bases. PhD thesis, Department of Informatics, Pontifical Catholic University of Rio de Janeiro.

Herrera, J. E. T., Casanova, M. A., Nunes, B. P., Lopes, G. R., & Leme (2016). DBpedia profiler tool: Profiling the connectivity of entity pairs in DBpedia. In Proceedings of the 5th international workshop on intelligent exploration of semantic data (IESD 2016) (pp. 17–18).

Hulpuş, Prangnawarat, & Hayes (2015). Path-based semantic relatedness on linked data and its use to word and entity disambiguation. In The Semantic Web–ISWC 2015: 14th International semantic web conference, Bethlehem, PA, USA, October 11–15, 2015, Proceedings, Part I (pp. 442–457). Springer.

Isele, Umbrich, Bizer, & Harth (2010). LDSpider: An open-source crawling framework for the web of linked data. In Proceedings of the 2010 international conference on posters & demonstrations track (Vol. 658, pp. 29–32). Citeseer.

Jeh & Widom (2002). SimRank: A measure of structural-context similarity. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 538–543).

Jia, Pramanik, Saha Roy, & Weikum (2021). Complex temporal question answering on knowledge graphs. In Proceedings of the 30th ACM international conference on information & knowledge management (pp. 792–802).

Jiménez, J. G., Leme, L. A. P. P., & Casanova, M. A. (2022). CoEPinKB: Evaluating path search strategies in knowledge bases. Journal of the Brazilian Computer Society, 28(1), 13–25.

Karpukhin, Oguz, Min, Lewis, Edunov, Chen, & Yih, W.-t. (2020). Dense passage retrieval for open-domain question answering. In B. Webber, T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) (pp. 6769–6781). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.550

Kawamura, Egami, Tamura, Hokazono, Ugai, Koyanagi, Nishino, Okajima, Murakami, & Takamatsu (2019). Report on the first knowledge graph reasoning challenge 2018. In Joint international semantic technology conference (pp. 18–34). Springer.

Kochut, K. J., & Janik (2007). SPARQLeR: Extended SPARQL for semantic association discovery. In The Semantic Web: Research and applications: 4th European semantic web conference, ESWC 2007, Innsbruck, Austria, June 3–7, 2007, Proceedings (pp. 145–159). Springer.

Kroll, Nagel, & Balke, W. T. (2020). Modeling narrative structures in logical overlays on top of knowledge repositories. In International conference on conceptual modeling (pp. 250–260). Springer.

Lee, Chang, M. W., & Toutanova (2019). Latent retrieval for weakly supervised open domain question answering. In A. Korhonen, D. Traum, & L. Màrquez (Eds.), Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 6086–6096). Florence, Italy: Association for Computational Linguistics. https://doi.org/10.18653/v1/P19-1612

Leetaru & Schrodt, P. A. (2013). GDELT: Global data on events, location, and tone, 1979–2012. In ISA annual convention (Vol. 2, pp. 1–49). Citeseer.

Lewis, Perez, Piktus, Petroni, Karpukhin, Goyal, Küttler, Lewis, Yih, W.-t., Rocktäschel, Riedel, & Kiela (2021). Retrieval-augmented generation for knowledge-intensive NLP tasks. arXiv:2005.11401 [cs]. http://arxiv.org/abs/2005.11401

Jin, Guan, Guo, Wang, & Cheng (2021). Search from history and reason for future: Two-stage reasoning on temporal knowledge graphs. arXiv preprint arXiv:2106.00327.

Lisena, Schwabe, van Erp, Troncy, Tullett, Leemans, Marx, & Ehrich, S. C. (2022). Capturing the semantics of smell: The Odeuropa data model for olfactory heritage information. In European semantic web conference (pp. 387–405). Springer.

Liu, Cheng, & Gunaratna (2021). Entity summarization: State of the art and future challenges. Journal of Web Semantics, 69, 100647.

Liu, Jin, Wang, Ruan, Zhou, Gao, & Yin (2018). PatientEG dataset: Bringing event graph model with temporal relations to electronic medical records. arXiv preprint arXiv:1812.09905.

Lohmann, Heim, Stegemann, & Ziegler (2010). The RelFinder user interface: Interactive exploration of relationships between objects of interest. In Proceedings of the 15th international conference on intelligent user interfaces (pp. 421–422).

Lynden, S. J., Kojima, Matono, Nakamura, & Yui (2013). A hybrid approach to linked data query processing with time constraints. LDOW, 996, 1–10.

Mendes, P. N., Jakob, García-Silva, & Bizer (2011). DBpedia Spotlight: Shedding light on the web of documents. In Proceedings of the 7th international conference on semantic systems (pp. 1–8).

Merono, Ashkpour, van Erp, Mandemakers, Breure, Scharnhorst, Schlobach, & van Harmelen (2015). Semantic technologies for historical research: A survey. Semantic Web, 6(6), 539–564. https://doi.org/10.3233/SW-140158

Meroño-Peñuela & Hoekstra (2014). What is linked historical data? In EKAW.

Moore, J. L., Steinke, & Tresp (2012). A novel metric for information retrieval in semantic networks. In The Semantic Web: ESWC 2011 workshops, Heraklion, Greece, May 29–30, 2011, Revised selected papers (pp. 65–79). Springer.

Nguyen, K. H., Tannier, Ferret, & Besançon (2016). A dataset for open event extraction in English. In Proceedings of the tenth international conference on language resources and evaluation (LREC'16) (pp. 1939–1943).

Noordegraaf, van Erp, Zijdeman, Raat, van Oort, Zandhuis, Vermaut, Mol, van der Sijs, Doreleijers, Baptist, Vrielink, Assendelft, Rasterhoff, & Kisjes (2019). Semantic deep mapping in the Amsterdam Time Machine: Viewing late 19th- and early 20th-century theatre and cinema culture through the lens of language use and socio-economic status. In F. Niebling, S. Münster, & H. Messemer (Eds.), Research and education in urban history in the age of digital libraries – second international workshop, UHDL 2019, Dresden, Germany, October 10–11, 2019, Revised selected papers, Communications in Computer and Information Science (Vol. 1501, pp. 191–212). Springer. https://doi.org/10.1007/978-3-030-93186-5_9

Pellissier Tanon, Weikum, & Suchanek (2020). YAGO 4: A reasonable knowledge base. In European semantic web conference (pp. 583–596). Springer.

Pérez, Arenas, & Gutierrez (2010). nSPARQL: A navigational language for RDF. Journal of Web Semantics, 8(4), 255–270.

Petroni, Lewis, Piktus, Rocktäschel, Miller, A. H., & Riedel (2020). How context affects language models' factual predictions. https://openreview.net/forum?id=025X0zPfn

Petroni, Rocktäschel, Riedel, Lewis, Bakhtin, & Miller (2019). Language models as knowledge bases? In K. Inui, J. Jiang, V. Ng, & X. Wan (Eds.), Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 2463–2473). Hong Kong, China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1250

Pirrò (2015). Explaining and suggesting relatedness in knowledge graphs. In The Semantic Web–ISWC 2015: 14th International semantic web conference, Bethlehem, PA, USA, October 11–15, 2015, Proceedings, Part I (pp. 622–639). Springer.

Razniewski, Yates, Kassner, & Weikum (2021). Language models as or for knowledge bases. arXiv preprint arXiv:2110.04888.

Reutter, J. L., Soto, & Vrgoč (2015). Recursion in SPARQL. In The Semantic Web–ISWC 2015: 14th International semantic web conference, Bethlehem, PA, USA, October 11–15, 2015, Proceedings, Part I (pp. 19–35). Springer.

Riedl, M. O. (2016). Computational narrative intelligence: A human-centered goal for artificial intelligence. arXiv preprint arXiv:1602.06484.

Roberts, Raffel, & Shazeer (2020). How much knowledge can you pack into the parameters of a language model? In B. Webber, T. Cohn, Y. He, & Y. Liu (Eds.), Proceedings of the 2020 conference on empirical methods in natural language processing (EMNLP) (pp. 5418–5426). Online: Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.emnlp-main.437

Rospocher, van Erp, Vossen, Fokkens, Aldabe, Rigau, Soroa, Ploeger, & Bogaard (2016). Building event-centric knowledge graphs from news. Journal of Web Semantics, 37, 132–151.

Schaffert, Kurz, Glachs, Bauer, Dorschel, & Fernandez (2012). The linked media framework.

Singh, Nejdl, & Anand (2016). History by diversity: Helping historians search news archives. In Proceedings of the 2016 ACM on conference on human information interaction and retrieval (pp. 183–192).

Sinikallio, Drobac, Tamper, Leal, Koho, Tuominen, Mela, M. L., & Hyvönen (2021). Plenary debates of the Parliament of Finland as linked open data and in Parla-CLARIN markup. In 3rd Conference on language, data and knowledge, LDK 2021. Schloss Dagstuhl-Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing.

Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3.

Souza Costa, Gottschalk, & Demidova (2020). Event-QA: A dataset for event-centric question answering over knowledge graphs. In Proceedings of the 29th ACM international conference on information & knowledge management (pp. 3157–3164).

Stadtmüller, Speiser, Harth, & Studer (2013). Data-Fu: A language and an interpreter for interaction with read/write linked data. In Proceedings of the 22nd international conference on World Wide Web (pp. 1225–1236).

Sun, Tang, Wang, Lin, Gong, L. M., Shum, H. Y., & Guo (2024). Think-on-graph: Deep and responsible reasoning of large language model on knowledge graph. arXiv:2307.07697 [cs]. https://doi.org/10.48550/arXiv.2307.07697

Taelman, Van Herwegen, Vander Sande, & Verborgh (2018). Comunica: A modular SPARQL query engine for the web. In The Semantic Web–ISWC 2018: 17th International semantic web conference, Monterey, CA, USA, October 8–12, 2018, Proceedings, Part II (pp. 239–255). Springer.

Taelman & Verborgh (2023). Link traversal query processing over decentralized environments with structural assumptions. In International semantic web conference (pp. 3–22). Springer.

Tiddi, d'Aquin, & Motta (2016). Learning to assess linked data relationships using genetic programming. In The Semantic Web–ISWC 2016: 15th International semantic web conference, Kobe, Japan, October 17–21, 2016, Proceedings, Part I (pp. 581–597). Springer.

Tomasi (2021). ARTchives: A linked open data native catalogue of art historians' archives.

Umbrich, Hogan, Polleres, & Decker (2012). Improving the recall of live linked data querying through reasoning. RR, 7497, 188–204.

Umbrich, Hogan, Polleres, & Decker (2015). Link traversal querying for a diverse web of data. Semantic Web, 6(6), 585–624.

Van den Akker, Aroyo, Cybulska, Van Erp, Gorgels, Hollink, Jager, Legene, van der Meij, & Oomen (2010). Historical event-based access to museum collections. Applied Artificial Intelligence, 25, 1–9.

Van Hage, W. R., Malaisé, Segers, Hollink, & Schreiber (2011). Design and use of the simple event model (SEM). Journal of Web Semantics, 9(2), 128–136.

Voskarides, Meij, Sauer, & de Rijke (2021). News article retrieval in context for event-centric narrative creation. In Proceedings of the 2021 ACM SIGIR international conference on theory of information retrieval (pp. 103–112).

Vrandečić & Krötzsch (2014). Wikidata: A free collaborative knowledgebase. Communications of the ACM, 57(10), 78–85.

Witten, I. H., & Milne, D. N. (2008). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links.

Zhu & Zhang (2020). Event-centric tourism knowledge graph: A case study of Hainan. In International conference on knowledge science, engineering and management (pp. 3–15). Springer.

Zauner, Linse, Furche, & Bry (2010). A RPL through RDF: Expressive navigation in RDF graphs. In Web reasoning and rule systems: Fourth international conference, RR 2010, Bressanone/Brixen, Italy, September 22–24, 2010, Proceedings (pp. 251–257). Springer.

	Number of sub-events
Dataset	=1	>1	>10	Retained
Wikidata	203,988	238,094	2408	341
DBpedia	84,599	95,504	1333	250
YAGO4	70,738	76,682	993	306

	Number of sub-events					Average number of features
Dataset	>10	>20	>30	>50	>100	>10	>20	>30	>50	>100
Wikidata	341	208	147	91	47	237	273	290	323	381
DBpedia	275	158	106	66	37	122	133	141	147	160
YAGO4	306	180	126	76	44	5	4	5	5	5

Name	Type	Dataset	Variable	Predicates
WHAT	Predicate filter	DB	$R^{'}$	rdf:type
		WD		wp:P31
WHO	Entity filter	DB	$E^{'}$	dbo:Person
			$p$	rdf:type
		WD	$E^{'}$	wd:Q5
			$p$	wd:P31
WHERE	Entity filter	DB	$E^{'}$	dbo:Place
				dbo:Location
			$p$	rdf:type
		WD	$E^{'}$	wd:P17
				wp:P276
				wd:Q6256
			$p$	wp:P31
WHEN	Temporal filter	DB	$p_s$	dbp:date
				dbp:startDate
				dbp:birthDate
			$p_e$	dbp:endDate
				dbp:deathDate
		WD	$p_s$	wp:P585
				wp:P580
				wp:P569
			$p_e$	wp:P582
				wp:P570
Preference function	–	DB	$d$	rdfs:domain
			$sc$	rdfs:subClassOf
			$t_{e}$	dbo:Event
		WD	$d$	wd:Q21503250
			$sc$	wp:P279
			$t_{e}$	wd:Q13418847
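The WHEN row of the table above lists DBpedia start predicates (dbp:date, dbp:startDate, dbp:birthDate) and end predicates (dbp:endDate, dbp:deathDate). The following Python sketch shows one way a temporal filter might use them. It assumes that a node is kept unless its known dates place it entirely outside the interval of the main event, and that undated or open-ended nodes are kept; this is an illustrative choice rather than the exact filter definition, and the function names are hypothetical.

```python
from datetime import date
from typing import Optional

# DBpedia start/end predicates, as listed in the WHEN row of the table.
START_PREDICATES = {"dbp:date", "dbp:startDate", "dbp:birthDate"}
END_PREDICATES = {"dbp:endDate", "dbp:deathDate"}

Triple = tuple[str, str, object]


def node_interval(triples: list[Triple], node: str) -> tuple[Optional[date], Optional[date]]:
    """Earliest start and latest end date asserted for a node (None if absent)."""
    starts = [o for s, p, o in triples if s == node and p in START_PREDICATES]
    ends = [o for s, p, o in triples if s == node and p in END_PREDICATES]
    return (min(starts) if starts else None, max(ends) if ends else None)


def passes_temporal_filter(triples: list[Triple], node: str, event_start: date, event_end: date) -> bool:
    """Keep a node unless its known dates place it entirely outside the event interval."""
    start, end = node_interval(triples, node)
    if start is not None and start > event_end:
        return False  # begins after the event has ended
    if end is not None and end < event_start:
        return False  # ends before the event has started
    return True  # overlapping, open-ended (no end date) or undated nodes are kept
```

Treating a missing end date as open-ended avoids discarding living people, at the cost of keeping point-in-time nodes (dbp:date) that precede the event.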

ChronoGrapher: Event-Centric Knowledge Graph Construction via Informed Graph Traversal
