Preface
Knowledge graphs (KGs; Hogan et al., 2021) have become essential for realizing the Semantic Web vision, enabling the development of intelligent, data-driven systems and, more recently, supporting advances in large language models (LLMs). Yet, their construction remains a complex and evolving challenge that continues to attract attention from both academia and industry (Chaves-Fraga et al., 2024). This special issue brings together recent research contributions that advance the state of the art in this field, offering theoretical insights, methodological innovations, and practical solutions that collectively push the boundaries of KG construction (KGC) and pave the way for future innovations.
Over the past decade, significant progress has been made in developing methods and systems for transforming semistructured data, such as tabular data (e.g., relational databases or CSV files) or hierarchical data (e.g., XML or JSON), into Resource Description Framework (RDF)-based KGs (Van Assche et al., 2023). Declarative frameworks, including mapping languages (Dimou et al., 2014; Iglesias-Molina et al., 2023) and rule-based approaches, have laid the foundations for transparent and reproducible KGC (Arenas-Guerrero et al., 2024; Iglesias et al., 2020). However, important challenges remain regarding the formalization of mapping languages, the automated definition of mapping rules, the scalability and performance of KGC systems, and their systematic evaluation. Addressing these issues requires both conceptual advances and efficient implementations that can cope with the increasing volume and heterogeneity of data.
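To make the declarative idea concrete, the following minimal sketch keeps the mapping rules as plain data, separate from the small generic routine that applies them to CSV rows and emits N-Triples. It is an illustration under stated assumptions, not an implementation of RML, ShExML, or any system discussed in this issue: the `MAPPING` structure, the `apply_mapping` and `escape_literal` helpers, the `example.org` IRIs, and the sample data are all hypothetical, and IRI escaping, datatypes, and joins are deliberately left out.

```python
import csv
import io

# Hypothetical mapping rules: a subject IRI template plus predicate/column pairs.
# This structure is illustrative only and does not follow RML or ShExML syntax.
MAPPING = {
    "subject": "http://example.org/person/{id}",
    "class": "http://xmlns.com/foaf/0.1/Person",
    "properties": [
        ("http://xmlns.com/foaf/0.1/name", "name"),
        ("http://xmlns.com/foaf/0.1/age", "age"),
    ],
}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def apply_mapping(csv_text: str, mapping: dict) -> list[str]:
    """Apply the declarative rules to each CSV row and return N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Fill the subject template from the row (no IRI escaping in this sketch).
        subject = mapping["subject"].format(**row)
        triples.append(f"<{subject}> <{RDF_TYPE}> <{mapping['class']}> .")
        for predicate, column in mapping["properties"]:
            value = row.get(column) or ""
            if value == "":
                # Skip empty or missing cells rather than emit empty literals.
                continue
            triples.append(f'<{subject}> <{predicate}> "{escape_literal(value)}" .')
    return triples


# Hypothetical sample input; the second row has no age value.
SAMPLE = "id,name,age\n1,Alice,30\n2,Bob,\n"

if __name__ == "__main__":
    for line in apply_mapping(SAMPLE, MAPPING):
        print(line)
```

Because the rules are data rather than code, they can be inspected, versioned, and re-executed independently of the engine, which is the property that the transparency and reproducibility argument above relies on.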
The human factor introduces its own complexities as well, including designing intuitive and accessible user interfaces, minimizing cognitive load, and balancing automation with manual oversight. Equally demanding is the process of conducting user-centered evaluations, which poses challenges in defining meaningful metrics, capturing diverse user needs, and ensuring that systems are both usable and effective in real-world scenarios. Meeting these challenges requires incorporating human-in-the-loop approaches and enabling iterative feedback so that the resulting systems satisfy real-world usability requirements.
In parallel, the field is undergoing a transformation driven by machine learning (ML) and, more recently, LLMs (Freund et al., 2025; Schmidt et al., 2025). These approaches are opening new avenues for the automation and enrichment of KGs (Dimou & Chaves-Fraga, 2022), from schema generation and entity extraction to data validation and quality assessment. While such models bring remarkable potential, they also raise new questions concerning explainability, reproducibility, and human oversight, calling for hybrid approaches that integrate declarative methods with ML techniques in a human-in-the-loop setting.
The contributions included in this special issue reflect these diverse but interconnected perspectives. They explore the interplay between declarative and procedural paradigms, the optimization of mapping systems, and the integration of ML and LLMs in the knowledge engineering lifecycle. Together, they offer a comprehensive and up-to-date picture of the research landscape, illustrating how established methodologies continue to evolve while new paradigms emerge.
This collection provides both an overview of the current state of KGC and inspiration for further research in this vibrant and rapidly advancing area of the Semantic Web.
Contributions
This special issue received 14 submissions, of which seven were accepted for publication. Among the accepted articles, one addresses the formalization of mapping rules (Oo et al., 2025), two address optimizations of KGC systems (García-González, 2025; Van Assche et al., 2025), three address LLMs (Blin et al., 2025; Dang et al., 2025; Moskvoretskii et al., 2025), two address event-centric KGC (Blin et al., 2025; Van Assche et al., 2025), and one addresses the explainability of the KGC process for users (Zhang et al., 2025). Some articles fall under more than one topic. In detail:
Analysis and Beyond
Our analysis of the accepted articles shows that their main contributions focus on two areas: the development of standards and formalization techniques for declarative KGC methods, and the automation of, or provision of user support for, the KGC process.
Standards and Standardization
A recurring theme in KGC is the role of standards and their adoption by the community. In this special issue, we received contributions related to the RML (Dimou et al., 2014; Iglesias-Molina et al., 2023) and ShExML (García-González et al., 2020) mapping languages. RML extends R2RML, a W3C Recommendation for transforming data contained in relational databases into RDF. RML has been instrumental in supporting mappings beyond relational databases. The KGC Community Group was established in 2019 and has been actively working toward establishing a charter for a candidate submission to the W3C since 2023. Similarly, ShExML builds upon ShEx, which, although not a W3C Recommendation, is widely acknowledged as a specification for validation by the Semantic Web community. This close alignment with recognized and widely adopted specifications underscores how ShExML, like RML, is rooted in standards, whether endorsed by a standardization body or by a community.
We note that this special issue did not receive contributions on Façade-X (Asprino et al., 2024; Daga et al., 2021). Yet, it is important to recognize that a W3C Community Group was established in September 2025 for this initiative as well. Façade-X enables the use of SPARQL to interact with various sources as RDF graphs. RML and ShExML differ from Façade-X in that the former focus on declaratively specifying mappings from source data to RDF according to a target ontology. In contrast, the latter focuses on using SPARQL to access data contained in heterogeneous data sources as RDF via direct mappings. These direct mappings are a natural evolution of a direct mapping approach as proposed by the
Taken together, these developments suggest that emerging standards are evolving to address different use cases and niches in KGC. The diversity of approaches highlights both the richness of the field and the importance of continued dialogue between communities to ensure maximum interoperability and sustainable adoption.
Mapping Languages’ Formalization
Another observation is the increased effort within the community to formalize mapping languages for KGC. While R2RML introduced a reference algorithm, it did not provide a formal foundation. This gap has been addressed by, for example, Calvanese et al. (2017) and Elhalawati et al. (2025). The former offered a formalization of R2RML for the Ontop system; the latter proposed a Datalog-based formalization of R2RML. In this special issue, Min Oo et al. presented initial steps toward formalizing RML. Although this work was well received, we note for the readers’ benefit that the lead author of the article has recently published a similar work in collaboration with a computer scientist (Oo & Hartig, 2025), which won the Best Paper award at the 22nd European Semantic Web Conference (ESWC) in 2025.
We expect that the formalization of mapping languages will play an important role in shaping and driving the standardization process of KGC. Historically, KGC has been led primarily by engineers focused on system development and practical applications. In contrast, formalization is often the domain of (theoretical) computer scientists. As KGC languages grow more expressive and begin to intersect with fields such as programming languages, logic, and database theory, closer collaboration between these communities will be essential. It will otherwise be challenging to establish solid, extensible, and verifiable foundations.
Large Language Models (LLMs)
A notable trend is the integration of LLMs to streamline and enhance various stages of KGC, ranging from initial design to iterative refinement and enrichment. This relationship between LLMs and KGs is bidirectional. On the one hand, LLMs can support and accelerate KGC; on the other hand, the continuous enhancement of KGs with newly constructed knowledge strengthens the performance of LLMs, improving their factual grounding, reducing hallucinations, and enabling more accurate reasoning. LLMs have so far been considered for all aspects of KGC, ranging from ontology development and taxonomy construction and enrichment (Moskvoretskii et al., 2025) to table interpretation and KG enrichment. However, as automated approaches increasingly rely on LLMs, challenges arise regarding the validity and interpretability of the results, as Dang et al. (2025) showed. Ambiguous or incorrect results highlight the need for mechanisms that explain them (Zhang et al., 2025), validate them (Dang et al., 2025), and improve them.
Moreover, as KGC is a complex process that requires a good understanding of the data to semantically annotate entities and their relations, we observe that KGC with declarative mapping languages remains primarily manual. Automated KGC systems rely more and more on LLMs, but they either only indicate the semantic annotations without constructing a knowledge graph, or directly produce a knowledge graph without producing any declarative mapping rules. Just as the declarative KGC community has its own community groups and venues, for example, the KGC workshop, the automated KGC community has its own, for example, the SemTab challenge and the Table Representation Learning Workshop. Consequently, we observe that the declarative and automated approaches evolve in different directions, and little interaction occurs between them.
Human-in-the-Loop
Although human interaction in KGC was one of the highlighted topics of the special issue, only one article addressed this aspect. Existing approaches to human interaction tend to fall into two categories: declarative methods, which emphasize editor tools and structured methodologies such as the pay-as-you-go paradigm (Sequeda & Miranker, 2017) to assist users, and automated solutions, which often rely on human feedback for refinement. However, there has been limited investment in comprehensive human-in-the-loop methodologies that actively integrate user input throughout the entire construction process. Recognizing this gap, a dedicated workshop on “Users and Knowledge Graphs” was recently organized to foster dialogue, share best practices, and promote research that prioritizes usability, collaboration, and human-centered design in knowledge graph development.
Challenges and Future Work
KGC continues to evolve rapidly, driven by new paradigms, technologies, and the increasing need for scalable, explainable, and interoperable data integration methods. Despite the significant advances reflected in this special issue, several open challenges remain and point toward promising directions for future research.
A first challenge lies in bridging declarative and learning-based approaches. Declarative mapping languages provide transparency, reproducibility, and formal rigor, whereas ML and LLMs offer adaptability and automation. The integration of both paradigms, through neuro-symbolic approaches with human-in-the-loop pipelines, remains an open area that requires methodological frameworks and evaluation metrics capable of balancing interpretability and performance.
A second line of research concerns evaluation and benchmarking. Although several benchmarks such as KROWN (Van Assche et al., 2024) and GTFS-Madrid-Bench (Chaves-Fraga et al., 2020) have played an important role in assessing the performance of KGC engines, they require updates to align with current languages, specifications, and parameters. Moreover, the rise of LLM-based and incremental construction approaches calls for new evaluation scenarios that go beyond traditional materialization pipelines. Recent initiatives such as the BLINKG benchmark (Castedo et al., 2026) represent a significant step toward reproducible, fine-grained, and comparable evaluation of automatic KGC methods. However, broader community coordination is still needed to establish unified datasets, metrics, and evaluation protocols that ensure fairness and cumulative scientific progress.
Another emerging challenge involves maintaining and evolving KGs in dynamic and distributed environments. The growth of data spaces and the proliferation of real-time data streams call for incremental and event-driven graph construction methods, enabling continuous synchronization, provenance tracking, and version management while minimizing computational costs (Geisler et al., 2025).
Finally, from a foundational perspective, formalization and standardization remain essential. Ongoing efforts toward algebraic semantics for mapping languages and formal models of execution need to be consolidated (Oo & Hartig, 2025) and connected with W3C standardization initiatives. Ensuring semantic interoperability across languages, engines, and specifications will be key to achieving maturity and long-term sustainability in the field.
In summary, the future of KGC lies in combining solid theoretical foundations with adaptive, explainable, and interoperable systems. The convergence of declarative design, formal semantics, and ML holds the potential to make KGC not only more efficient and scalable but also more trustworthy and impactful across scientific and industrial domains.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
