
Abstract

The adoption of Semantic Web and Linked Data technologies in web application development has been hampered by a less-than-optimal developer experience. Front-end web application development is inherently challenging, as developers must master a multitude of technologies and frameworks even to build simple applications. Adding Linked Data technologies to the mix further complicates this challenge by increasing the number of technologies that need to be learned. The Semantic Web community has historically struggled to provide front-end developers with quality tools and libraries for working with Linked Data; consequently, developers often prefer traditional solutions based on relational or document databases that offer a far superior developer experience. To address this issue, we developed LDkit, an innovative Object Graph Mapping framework for TypeScript. The framework works as the data access layer, providing model-based abstraction for querying and retrieving RDF data. LDkit transforms the data between RDF representation and TypeScript primitives according to user-defined data schemas, simplifying the use of the data and ensuring end-to-end data type safety. This paper introduces LDkit and describes its design and implementation fundamentals, with a focus on its developer interface and integration with other related technologies. Building on community feedback and experience from using LDkit, we introduce major enhancements that simplify data querying and updating for common and uncommon web application scenarios, further improving the developer experience. Finally, we demonstrate the impact of LDkit by examining the usage of the framework in real-world projects. LDkit aims to enhance the web ecosystem by making Linked Data more accessible and integrated into mainstream web technologies.

Keywords

linked data, developer experience, data abstraction

1. Introduction

The Semantic Web and Linked Data have emerged as powerful technologies to enrich the World Wide Web with structured and interconnected data (Berners-Lee et al., 2001). These technologies enable a more meaningful representation of web content, facilitating advanced data integration and interoperability. However, despite their advantages, their adoption has been relatively slow. A primary obstacle is the difficulty of querying distributed Linked Data within web applications (Bizer et al., 2009), a challenge often framed as the expressivity/complexity trade-off.

Expressivity in Linked Data refers to the ability to represent rich semantics and relationships between resources, often using ontologies and domain-specific vocabularies (Heath & Bizer, 2011). A more expressive data model allows for precise and semantically rich descriptions, improving data integration and reasoning capabilities. However, this increased expressivity comes at a cost. Greater expressivity typically results in increased complexity in data retrieval and processing. More sophisticated data models require advanced query languages, such as SPARQL (Harris & Seaborne, 2013), and computationally intensive processing methods. As a result, working with Linked Data often demands significant development effort and extensive computational resources, creating barriers to its practical implementation in web applications.

To address these challenges, several tools have been developed to simplify Linked Data querying while maintaining its expressivity. Notable examples include Comunica, a modular query engine designed for querying heterogeneous Linked Data sources (Taelman et al., 2018), and LDflex, a domain-specific language that provides a more intuitive way to query RDF data (Verborgh & Taelman, 2020). These solutions help bridge the gap between expressivity and usability by abstracting the complexities of Linked Data querying. Nevertheless, the ecosystem of Linked Data tools for web development remains limited, with existing solutions lacking the maturity and comprehensive feature set found in more established web technologies.

Over the past decade, web development has undergone a profound transformation, driven by the increasing demand to migrate traditionally desktop-native applications to the web. As more complex software systems transitioned to web-based environments, the need for more powerful and scalable front-end technologies became evident. This shift led to the emergence and widespread adoption of advanced frameworks such as React,¹ Vue² and Angular,³ which enabled the development of highly interactive and responsive web applications.

Alongside these advancements, the rise of TypeScript,⁴ a statically typed superset of JavaScript, further strengthened the web development ecosystem. By introducing static typing, TypeScript improved code quality, enhanced developer productivity, and provided robust tooling support, making it more feasible to build and maintain large-scale applications. These innovations have collectively equipped developers with the necessary tools to create sophisticated, feature-rich web applications that rival their desktop counterparts in functionality and performance.

However, the recent advancements in web development have significantly expanded the skill set required of web developers, who now must navigate a growing ecosystem of technologies. The increasing complexity of web applications demands a deeper understanding of software architecture principles and many advanced concepts such as state management, asynchronous programming, security best practices, and performance optimization. In this context, the adoption of Linked Data in web applications presents yet another challenge for developers due to its inherent complexity. Unlike conventional data management approaches, Linked Data requires developers to work with RDF, SPARQL, and other knowledge graph technologies, which introduce unfamiliar paradigms and demand a deeper understanding of semantic data structures. Therefore, to ease this transition, it is essential to provide robust developer tools that abstract much of the complexity while preserving the expressiveness and power of Linked Data.

To address the needs of modern web developers, we created LDkit, a novel Linked Data abstraction designed to provide a type-safe way of interacting with Linked Data from within web applications. LDkit enables developers to directly utilize Linked Data in their web applications by providing a mapping from Linked Data to simple, well-defined data objects; it shields the developer from the challenges of querying, fetching and processing RDF data.

This paper extends our previous work on LDkit (Klíma et al., 2023) and makes three key contributions to the development of Linked Data applications and adoption of Semantic Web technologies:

– First, it elaborates on the architecture and design of LDkit, providing an in-depth discussion of data schema construction, as well as data querying and updating. The paper includes numerous code examples that demonstrate typical Linked Data use cases, illustrating LDkit’s internal mechanisms such as SPARQL query generation and the transformation between the RDF data model and TypeScript objects.
– Second, it presents enhancements to LDkit introduced since the original publication, addressing former limitations and incorporating feedback from the community. These enhancements focus on improving the developer experience, including usability refinements and solutions to common challenges in working with Linked Data.
– Third, it provides an empirical analysis of LDkit’s usage, offering insights into adoption trends, real-world application scenarios, and performance considerations. This analysis not only validates the effectiveness of LDkit but also identifies areas for future research and optimization.

The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 provides an overview of the design, implementation, usage examples, and integration of LDkit in web applications, followed by Section 4, which provides a thorough overview of improvements in LDkit V2. Section 5 then discusses the developer experience aspects of LDkit, including a comparison with related tools and current limitations. Section 6 evaluates the performance of LDkit and discusses potential performance bottlenecks. Section 7 examines the usage of LDkit, presenting usage metrics and an analysis of its adoption in existing tools. Finally, Sections 8 and 9 outline directions for future research and provide concluding remarks.

2. Related Work

This section presents an overview of the existing approaches and technologies that are closely related to LDkit.

The first subsection focusses on common web application data abstraction solutions and introduces several key data access libraries that are considered state-of-the-art in web development. By examining these well-established solutions, we can infer a set of general qualities that a library such as LDkit should possess in order to achieve wide adoption, ensure usability, and integrate seamlessly with the web application development ecosystem.

The second subsection reviews existing RDF-specific JavaScript/TypeScript libraries, focussing on their capabilities, limitations and relevance to web application development. By examining these solutions, we aim to identify the gaps that remain unaddressed, especially when compared to traditional data abstraction tools.

The third subsection introduces alternative RDF querying approaches that leverage mainstream web application technologies, such as RESTful interfaces and GraphQL.

The fourth subsection examines RDF data documentation strategies and data descriptors, such as data shape definitions, and explores their potential use in Linked Data applications.

The last subsection provides an overview of the different JavaScript runtime environments that are relevant to the execution of LDkit and similar libraries. This technical background is important for understanding the broader context in which LDkit operates. Since LDkit is written in TypeScript and intended to be used in diverse web development environments, it is crucial to discuss the available runtime options, particularly how they affect TypeScript execution, development workflow, and the overall developer experience. Finally, this subsection also clarifies the runtime assumptions made in the paper.

This structure demonstrates how LDkit fits within the broader landscape of web development tools, while also illustrating the unique contributions it makes by integrating Linked Data into mainstream web technologies.

2.1. Web Application Data Abstractions

There are various styles of abstractions over data sources to facilitate access to databases in web development. These abstractions often cater to different preferences and use cases.

Object-Relational Mapping (ORM) and Object-Document Mapping (ODM) abstractions map relational or document database entities to objects in the programming language, using a data schema. They provide a convenient way to interact with the database using familiar object-oriented paradigms, and generally include built-in type casting, validation and query building out of the box. Examples of ORM and ODM libraries for JavaScript/TypeScript include Prisma,⁵ TypeORM⁶ or Mongoose.⁷ Corresponding tools for graph databases are typically referred to as Object-Graph Mapping (OGM) or Object-Triple Mapping (Ledvinka & Křemen, 2020) libraries, and include Neo4j OGM⁸ for Java and GQLAlchemy⁹ for Python.

Query Builders provide a fluent interface for constructing queries in the programming language, with support for various database types. They often focus on providing a more flexible and composable way to build queries compared to ORM/ODM abstractions, but lack convenient development features like automated type casting. A prominent query builder for SQL databases in the web application domain is Knex.js.¹⁰

Driver-based abstractions provide a thin layer over the database-specific drivers, offering a simplified and more convenient interface for interacting with the database. An example of a driver-based abstraction heavily utilized in web applications is the MongoDB Node.js Driver.¹¹

Finally, API-based Data Access abstractions facilitate access to databases indirectly through APIs, such as RESTful or GraphQL APIs. They provide client-side libraries that make it easy to fetch and manipulate data exposed by the APIs. Examples of API-based data access libraries include tRPC¹² and Apollo Client.¹³

Each style of abstraction caters to different needs and preferences. Ultimately, the choice of abstraction style depends on the project’s specific requirements and architecture, as well as the database technology being used. There are, however, several shared qualities among these libraries that contribute to a good developer experience.

All of these libraries have static type support, which is especially beneficial for large or complex projects, where maintaining consistent types can significantly improve developer efficiency. Static types provide early error detection during compile time rather than runtime, which reduces bugs and unexpected behaviour in production since many common mistakes, such as assigning the wrong data type to variables or passing incorrect arguments to functions, are caught during development. In large projects, where many developers work on the same codebase, static types act as a form of self-documentation, improving code readability, maintenance, and developer collaboration.

Another aspect of the reviewed libraries, which is closely related to static types, is good tooling support: These libraries often provide integrations with popular development tools and environments. This support can include autocompletion, syntax highlighting, and inline error checking, further enhancing the developer experience and productivity.

Furthermore, most of these libraries offer a consistent API across different database systems, which simplifies the process of switching between databases or working with multiple databases in a single application. Finally, abstracting away low-level details allows developers to focus on their application’s logic rather than dealing with the intricacies of the underlying database technology.

2.2. JavaScript/TypeScript RDF Libraries

JavaScript is a versatile programming language that can be utilized in various execution environments, such as web browsers, servers, or desktop. As Linked Data and RDF have gained traction in web development, several JavaScript libraries have emerged to work with RDF data. These libraries offer varying levels of RDF abstraction and cater to different use cases.

Most of the existing libraries conform to the RDF/JS Data model specification (Bergwinkl et al., 2022) and therefore share the same RDF data representation in JavaScript, which makes them largely compatible with one another. Often, RDF libraries make use of the JavaScript Object Notation for Linked Data (JSON-LD) (Sporny et al., 2020), a lightweight syntax that enables JSON objects to be enhanced with RDF semantics. JSON-LD achieves this by introducing the concept of JSON-LD context, which is a mechanism used to map terms in JSON data to concepts and entities in external vocabularies via RDF property and type IRIs¹⁴. This mapping allows for JSON objects to be interpreted as RDF graphs.
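
To make the context mechanism more concrete, the following sketch interprets a flat JSON object as a set of triples. It is a deliberately simplified illustration and not a conformant JSON-LD processor: the `expand` helper, the `Triple` type and the sample data are hypothetical, and only flat term-to-IRI mappings are handled.

```typescript
// Simplified illustration of a JSON-LD-style context; not a conformant JSON-LD processor.
type Context = Record<string, string>; // term -> property IRI
type Triple = { subject: string; predicate: string; object: string };

// Hypothetical context mapping two JSON terms to schema.org property IRIs.
const context: Context = {
  name: "http://schema.org/name",
  author: "http://schema.org/author",
};

// Hypothetical helper: read a flat JSON object as RDF triples about its "@id".
function expand(doc: Record<string, string>, ctx: Context): Triple[] {
  const subject = doc["@id"];
  if (subject === undefined) throw new Error("document has no @id");
  const triples: Triple[] = [];
  for (const [term, value] of Object.entries(doc)) {
    if (term === "@id") continue;
    const predicate = ctx[term];
    if (predicate === undefined) continue; // terms absent from the context are dropped
    triples.push({ subject, predicate, object: value });
  }
  return triples;
}

const triples = expand(
  {
    "@id": "http://example.org/book/1",
    name: "Dune",
    author: "Frank Herbert",
    note: "not mapped by the context",
  },
  context,
);
```

Only the two terms defined in the context produce triples; the unmapped `note` key carries no RDF meaning in this sketch and is ignored.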

One of the most comprehensive projects is Comunica (Taelman et al., 2018), a modular query engine for Linked Data, enabling developers to execute SPARQL queries over multiple heterogeneous data sources with extensive customizability.

LDflex (Verborgh & Taelman, 2020) is a domain-specific language that provides a developer-friendly API for querying and manipulating RDF data with an expressive, JavaScript-like syntax. It makes use of JSON-LD contexts to interpret JavaScript expressions as SPARQL queries. While it does not provide end-to-end type safety, LDflex is one of the most versatile Linked Data abstractions that are available. Since it does not utilize a fixed data schema, it is especially useful for use cases where the underlying Linked Data is not well defined or known.

There are also several object-oriented abstractions that provide access to RDF data through JavaScript objects. RDF Object¹⁵ and SimpleRDF¹⁶ enable per-property access to RDF data through JSON-LD context mapping. Linked Data Objects (LDOs)¹⁷ leverage Shape Expressions (ShEx) (Prud’hommeaux et al., 2019) data shapes to generate an RDF-to-JavaScript interface and static typings for the JavaScript objects. Object-semantic mapping¹⁸ utilizes a proprietary model definition to map RDF data to model instances. Soukai-solid¹⁹ provides OGM-like access to Solid Pods²⁰ based on a proprietary data model format.

Except for LDflex, the major drawback of all the aforementioned Linked Data abstractions is that they require pre-loading the source RDF data into memory. For large decentralized environments like Solid, this pre-loading is often impossible, and we instead require discovery of data during query execution (Taelman & Verborgh, 2023). While these libraries offer valuable tools for working with RDF, when it comes to web application development, none of them provides the same level of type safety, tooling support and overall developer experience as their counterparts that target relational or document databases.

2.3. Alternative RDF Querying Approaches

In addition to dedicated RDF libraries, several attempts have been made to simplify the integration of Linked Data into web applications by leveraging technologies familiar to web developers and abstracting away the complexity of SPARQL querying.

One such project is GRLC (Meroño-Peñuela & Hoekstra, 2016), a tool that transforms a set of SPARQL queries into Linked Data Web APIs. This enables application developers to interact with Linked Data via a REST API, without requiring knowledge of SPARQL. Furthermore, GRLC automatically generates an OpenAPI²¹ specification, which developers can leverage as documentation and use to generate additional application artifacts, such as TypeScript definitions.

GRLC offers benefits to both application developers and RDF data providers. Developers may deploy GRLC as a security layer to prevent exposure of SPARQL queries and endpoints to end users. RDF data providers, on the other hand, may host a GRLC server alongside an existing SPARQL endpoint to facilitate easier access for developers unfamiliar with SPARQL, in order to simplify data browsing and querying. However, this extra architectural layer may be impractical or undesirable in certain scenarios, and may add significant performance overhead.

In recent years, the GraphQL²² interface has gained popularity as an alternative to REST interfaces, due to its flexible data retrieval, strongly typed schema, and the ability to group multiple REST requests into one. A notable element of this interface is the GraphQL query language, which is popular among developers due to its ease of use and wide tooling support. However, GraphQL uses custom interface-specific schemas, which are difficult to federate over, and have no relation to the RDF data model.

For this reason, recent years have seen several initiatives (Taelman et al., 2018; Taelman et al., 2019; Angele et al., 2022) that attempt to bridge the worlds of GraphQL and RDF by translating GraphQL queries into SPARQL, with the goal of lowering the entry barrier for writing queries over RDF. While these initiatives addressed the problems to some extent, there are still several drawbacks to this approach. Most notably, similar to GRLC, it requires the deployment of a dedicated server.

2.4. RDF Data Descriptors

One of the key challenges in developing Linked Data applications is understanding what data is available within an RDF data source and how it can be explored effectively. While it is possible to investigate data directly using RDF data visualization tools or via SPARQL endpoint queries, this approach can be inefficient and opaque, particularly when dealing with unfamiliar or complex datasets. Consequently, various forms of documentation and data descriptors have been developed to facilitate comprehension, navigation, and use of RDF data.

ShEx (Prud’hommeaux et al., 2019) and the Shapes Constraint Language (SHACL) (Knublauch & Kontokostas, 2017) are two prominent schema languages designed to describe and validate RDF data structures. ShEx offers a compact and expressive syntax for defining graph patterns, enabling both validation and data documentation. SHACL, standardized by the W3C, employs an RDF-based syntax to define constraints and rules over RDF graphs, making it highly interoperable within the semantic web stack. These shape definitions may serve as a means of accessing RDF data from web applications, as demonstrated by the aforementioned LDO library, which utilizes ShEx shapes to generate a type-safe RDF data access layer.

The JSON-LD context provides a form of RDF data description as well, albeit not as specific as ShEx or SHACL. For example, the LDflex library uses JSON-LD contexts to generate SPARQL queries and to map RDF data to JavaScript primitives.

Finally, the Vocabulary of Interlinked Datasets (VoID) (Alexander et al., 2011) provides a standardized way to describe metadata about RDF datasets, including structural metadata that describe the structure and schema of datasets. This is particularly useful for tasks such as querying and data integration. The SPARQL Editor²³ uses the VoID description present in the triplestore to provide autocomplete features when composing SPARQL queries.

These data description technologies are typically employed in back-end systems or data pipelines, with the possible exception of JSON-LD, which benefits from a comprehensive web-based library²⁴. Consequently, the available tooling for these technologies predominantly targets server-side technologies and programming languages. However, thanks to the general support of WebAssembly across modern web platforms, it has become feasible to leverage tooling originally developed for backend environments within web runtime contexts. Notable WebAssembly-enabled projects include Rudof (Labra-Gayo et al., 2024), a Rust library for handling RDF data models and shapes, and Oxigraph,²⁵ a graph database that implements the SPARQL standard.

Despite the portability advantages offered by WebAssembly, its integration into web applications must be approached with caution. The size of WebAssembly binaries and the startup latency they introduce can negatively impact the responsiveness of browser-based applications (for instance, the Oxigraph WASM binary is 3.5 MB). For these reasons, it may be more appropriate to deploy WebAssembly-based libraries in server-side JavaScript environments.

2.5. JavaScript Runtime Environments

JavaScript code can be executed in a variety of environments. The most widely used runtime is perhaps Node.js,²⁶ which pioneered usage of JavaScript outside of web browsers. Node.js benefits from a large ecosystem of libraries and tools, facilitated by the NPM²⁷ package registry, the largest software registry in the world. One of the drawbacks of Node.js is that it cannot execute TypeScript files directly – in order to achieve that, TypeScript must first be transpiled into JavaScript, either as a build step, or through just-in-time transformation using Node.js wrappers like ts-node.²⁸

Insufficient TypeScript support, the dated CommonJS module system, and the absence of a comprehensive development toolchain in Node.js have given rise to alternative JavaScript runtimes in recent years.

The most prominent of them is Deno.²⁹ The Deno runtime natively supports TypeScript, JSX and modern ECMAScript features with zero configuration. It is built on web standards, and includes essential tools to build, test and deploy applications. Deno is backwards compatible with Node.js code and supports the NPM package registry; in addition, it can import modules from any location on the web via URL, such as GitHub, a personal web server or a CDN like esm.sh.³⁰ Recently, the Deno team introduced JSR,³¹ a modern open-source JavaScript package registry that is runtime agnostic and supports TypeScript natively.

Bun³² is another JavaScript runtime. Similar to Deno, it is an all-in-one JavaScript and TypeScript toolkit for bundling and testing applications. Bun is designed as a drop-in replacement for Node.js, and includes an NPM-compatible package manager.

Web browsers represent the traditional platform for JavaScript execution, arguably the most important one when it comes to user-facing applications. Like Node.js, web browsers do not support TypeScript and require script transpilation. Additionally, because some of the interfaces provided by Node.js or other runtimes do not have direct equivalents in browser environments, they need to be polyfilled to ensure full functionality. Polyfills allow developers to emulate missing features, providing a seamless and consistent experience across server and client-side executions.

The examples in the rest of the paper assume the use of TypeScript and the Node.js runtime, since it is the predominant runtime at the time of writing, but they can be adapted to other runtime environments with minimal changes.

3. LDkit

LDkit is a Linked Data OGM toolkit that provides a type-safe data abstraction layer for interacting with Linked Data from within web applications. It enables RDF data query and update over a variety of data sources. It is written in TypeScript and designed to be used on the client or server. In this section, we provide a high-level perspective on LDkit’s design philosophy, architecture, and capabilities, and discuss some of its most important components.

3.1. Design Philosophy

The development of LDkit was driven by the following design principles, which evolved from the initial requirements (Klíma et al., 2023) to fully realized features:

P1 Embraces Linked Data heterogeneity. The inherent heterogeneity of Linked Data ecosystem arises due to the decentralized nature of Linked Data, where various data sources, formats, and ontologies are independently created and maintained by different parties (Bizer et al., 2009). As a result, data from multiple sources can exhibit discrepancies in naming conventions, data models, and relationships among entities, making it difficult to combine and interpret the information seamlessly (Hogan et al., 2012). LDkit embraces this heterogeneity by supporting the querying of Linked Data from various data sources, such as SPARQL endpoints and Solid pods, and various RDF representations.

P2 Provides a simple way of Linked Data model specification. The core of any ORM, ODM or OGM framework is a specification of a data model. This model is utilized for shielding the developer from the complexities of the underlying data representation. It is a developer-friendly programming interface for data querying and validation that encapsulates the complexity of translation between the simplified application model and the underlying data representation. In LDkit, the data model (called schema) is represented by a simple TypeScript object that contains a definition of a data shape – a class of entities and their properties – eventually utilized to query RDF data. An example of a simple schema of a book is available in Listing 1. LDkit schemas are based on JSON-LD context format, and simple JSON-LD contexts can usually be easily transformed to schemas. As such, schemas are easy to create and maintain, and are easily separable from the rest of the application so that they can be shared as standalone artifacts. Even though the Schema is a simple structure, it offers comprehensive RDF data abstraction including data types specification, arity of values specification (optional value, single value, array of values), or nesting one schema in another one. Schemas are discussed in more detail in Section 3.3.

P3 Has a flexible architecture. A Linked Data abstraction for web applications needs to encompass several inter-related processes, such as generating SPARQL queries based on the data model, executing queries across one or more data sources, and transforming RDF data to JavaScript primitives and vice-versa. In LDkit, each of these processes is implemented as a standalone component for maximum flexibility. A flexible architecture allows LDkit to adapt to varying use cases and requirements, making it suitable for a wide range of web applications that leverage Linked Data. Developers can customize the framework to their specific needs, modify individual components, or extend the functionality to accommodate unique requirements. Finally, as Linked Data and web technologies evolve, a flexible architecture ensures that LDkit remains relevant and can accommodate new standards, formats, or methodologies that may emerge in the future.

P4 Aims to provide positive developer experience. LDkit achieves a good developer experience by focussing on several key aspects. First, LDkit provides an ORM-like programming interface that should feel familiar to developers new to the framework, making the learning curve more manageable. Second, the toolkit leverages TypeScript’s type safety features, enabling better tooling support and error prevention. This provides developers with instantaneous feedback in the form of autocomplete or error highlighting within their development environment. Third, LDkit is compatible with popular web application libraries and frameworks, allowing developers to incorporate LDkit into their existing workflows easily. By focussing on these aspects, LDkit creates a positive developer experience that fosters rapid adoption and encourages the effective use of the framework for reading and writing Linked Data in web applications.

P5 Adheres to existing Web standards and best practices. LDkit adheres to both general web standards and web development best practices, and Linked Data specific standards for several reasons. First, compliance with established standards ensures interoperability and seamless integration with existing web technologies, tools, and services, thereby enabling developers to build on the current web ecosystem’s strengths. Second, adhering to Linked Data specific standards fosters best practices and encourages broader adoption of Linked Data technologies, contributing to a more robust and interconnected Semantic Web. Finally, compliance with existing web standards allows for the long-term sustainability and evolution of the LDkit framework, as it can adapt and grow with the ever-changing landscape of web technologies and standards.
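
Returning to principle P2, the sketch below shows what a schema expressed as a plain TypeScript object might look like, covering a typed value, an optional value, an array of values and a nested schema. It is an illustration under stated assumptions rather than a reproduction of any listing in this paper: the property names and vocabulary IRIs are chosen for the example, and the `@optional`, `@array` and `@schema` keys reflect our understanding of the schema format and should be verified against the LDkit documentation for the version in use.

```typescript
// Illustrative sketch only: property names and vocabulary IRIs are assumptions,
// and the "@optional", "@array" and "@schema" keys should be checked against LDkit's docs.
const AuthorSchema = {
  "@type": "http://schema.org/Person",
  name: "http://schema.org/name",
} as const;

const BookSchema = {
  "@type": "http://schema.org/Book",
  // Shorthand: a single required value identified only by its predicate IRI.
  title: "http://schema.org/name",
  // A typed value that may be missing.
  published: {
    "@id": "http://schema.org/datePublished",
    "@type": "http://www.w3.org/2001/XMLSchema#date",
    "@optional": true,
  },
  // Zero or more values.
  genres: { "@id": "http://schema.org/genre", "@array": true },
  // One schema nested inside another.
  author: { "@id": "http://schema.org/author", "@schema": AuthorSchema },
} as const;

// Because a schema is plain data, it can be inspected like any other object.
const propertyNames = Object.keys(BookSchema).filter((key) => !key.startsWith("@"));
```

Keeping the schema as a plain, immutable object (`as const`) is what allows it to live in its own module and be shared as a standalone artifact, as described in P2.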

3.2. Design Fundamentals

In this section, we provide a high-level overview of how LDkit works and of its capabilities, and discuss some of its most important components.

The primary objective of LDkit is to provide a TypeScript-native abstraction over RDF data. It achieves this using the Lens³³ component, which provides a programming interface for querying RDF with a simple LDkit query language. Using the Schema, the Lens translates these simple queries into SPARQL queries, which are then executed over a target RDF data source by the Query engine. The Query engine returns the retrieved RDF data to the Lens; the data is then decoded, again using the Schema, into TypeScript-native objects that are ready to be handled by the developer. The process is depicted in Figure 1.

Figure 1. Basic operation of LDkit.

Let us illustrate how to display simple Linked Data in a web application, using the following objective:

Query DBpedia for persons. A person should have a name property and a birth date property of type date. Find me a person by a specific IRI.

The example in Listing 2 demonstrates how to query, retrieve and display Linked Data in TypeScript using LDkit in only 20 lines of code.

On lines 4-11, the user creates a data Schema, which describes the shape of data to be retrieved, including mapping to RDF properties and optionally their data type. On line 13, they create a Lens object, which acts as an intermediary between Linked Data and TypeScript paradigms. Finally, on line 18, the user requests a data artifact using its resource IRI and receives a plain JavaScript object that can then be printed in a type-safe way.

Under the hood, LDkit performs the following:

– Generates a SPARQL query based on the data schema.
– Queries remote data sources and fetches RDF data.
– Converts RDF data to JavaScript plain objects and primitives.
– Infers TypeScript types for the provided data.

Listing 3 illustrates how exactly the data schema corresponds to the original RDF data and resulting TypeScript primitives, and presents an example of equivalent data models in both domains.

3.3. Schema

The schema is the key concept in LDkit; understanding of the term is essential for efficient use of the library.

On a conceptual level, a data schema is a definition of a data shape that determines how the RDF data are queried and how they are eventually transformed to JavaScript primitives. It is similar to a data model in standard ORM libraries. Through a schema, library users describe a class of entities and their properties to be retrieved from an RDF source.

Listing 4 includes a TypeScript definition of the schema object, including RDF types and properties specification. Any object that satisfies the Schema type is a valid LDkit schema.

3.3.1. RDF Type Definition

A schema may include a @type definition, that is, a specification of one or more RDF types (specified by IRIs) of the entities to be queried. LDkit uses this information as a restriction and considers only the subjects that have all of the specified types. The type definition is optional – if the user omits it, then only the shape of properties is considered for querying data. Listing 5 includes examples of type definitions.

3.3.2. Properties Definition

The schema lets developers define the general shape of the data by specifying the properties of entities, their data type (a primitive data type based on XSD (Peterson et al., 2012), a custom primitive data type,³⁴ or a nested schema), and their arity.

A schema typically includes a map of multiple data properties, that is, a mapping from simplified names to RDF predicates specified by IRI.

In addition, each property definition may include one or more property modifiers, further specifying how the data should be queried and outputted. LDkit schema supports the following modifiers:

– @id (required): The RDF predicate IRI.
– @type: The RDF datatype of the property value. If omitted, defaults to xsd:string.
– @schema: Nested subschema. It may contain the definition in place, or a reference to another JavaScript object containing LDkit schema specification, thanks to the composability properties of schema. Alternative to @type.
– @optional: If set, indicates that the property is optional. By default, properties are considered required, and LDkit will only query entities having such properties, or will require property values when creating or updating an entity.
– @array: If set, indicates to treat the property as having multiple values. By default, properties are considered single-value only. If there are multiple values, only one of them is accepted – the first one encountered in the dataset.
– @multilang: If set, LDkit will treat the property as language-enabled, that is, it will transform literal values annotated by @language tags to a key-value map of languages and their respective literal values.
– @inverse: If set, indicates that the property is in an inverse relation, which is useful to represent incoming links. Normally, the properties are matched using <?entity ?property ?value> pattern, but if the inverse attribute is set, the matching is done using <?value ?property ?entity> instead. The attribute is equivalent to JSON-LD @reverse keyword or ShExJ inverse keyword.

The modifiers may be combined together as required. For example, a property with the @array and @multilang flags will have all its literal values transformed to a key-value map, the key being the @language annotation, and the value being an array of literals belonging to the particular language.
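To make the combination of @array and @multilang more concrete, the following minimal sketch groups language-tagged literals into a language map. It is an illustration only and not LDkit's implementation; the Literal shape and both function names are assumptions introduced here.

```typescript
// A language-tagged literal, reduced to the two fields needed here (assumed shape).
interface Literal {
  value: string;
  language: string;
}

// @multilang + @array: every language maps to all of its literal values.
function groupByLanguage(literals: Literal[]): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const { value, language } of literals) {
    if (!result[language]) result[language] = [];
    result[language].push(value);
  }
  return result;
}

// @multilang only: every language maps to the first value encountered.
function firstPerLanguage(literals: Literal[]): Record<string, string> {
  const result: Record<string, string> = {};
  for (const { value, language } of literals) {
    if (!(language in result)) result[language] = value;
  }
  return result;
}

const labels: Literal[] = [
  { value: "Prague", language: "en" },
  { value: "Praha", language: "cs" },
  { value: "City of Prague", language: "en" },
];
const allLabels = groupByLanguage(labels);
const singleLabels = firstPerLanguage(labels);
```

In this sketch, a single-valued multilingual property keeps the first literal per language, mirroring the default single-value behaviour described for @array above.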

The @type property includes a datatype IRI. LDkit supports two-way conversion between commonly used RDF data types and TypeScript native types. Both the data and TypeScript types are adequately converted. For example, a property of type xsd:date is converted to a TypeScript Date object. The complete reference of supported data types, including a mapping to resulting TypeScript types, is available in the LDkit documentation³⁵.

In addition to the built-in types, LDkit may be extended by custom datatypes handlers. We discuss this in more detail in Section 4.

While property values of data entities are usually literals, in some cases it may be useful to query for named nodes (e.g. IRIs of linked entities) instead of literal property values. To address that, we introduced a special datatype ldkit:IRI from the LDkit ontology.³⁶

Finally, for developer convenience, the schema supports a shorthand property notation, where instead of a complex property definition using a key-value object, the developer provides only its predicate IRI. In such case, LDkit assumes that the property is required, of type xsd:string, and has a single value (not an array).
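The defaults implied by the shorthand notation can be sketched as a small normalization step. This is a hypothetical helper written for illustration (normalizeProperty and NormalizedProperty are not LDkit names); it only encodes the rule stated above: a bare predicate IRI means a required, single-valued xsd:string property.

```typescript
const XSD_STRING = "http://www.w3.org/2001/XMLSchema#string";

// Full property definition, limited to the modifiers relevant for this sketch.
interface PropertyDefinition {
  "@id": string;
  "@type"?: string;
  "@optional"?: true;
  "@array"?: true;
}

// A property is either a predicate IRI (shorthand) or a full definition.
type PropertyInput = string | PropertyDefinition;

interface NormalizedProperty {
  id: string;
  type: string;
  optional: boolean;
  array: boolean;
}

// Expands the shorthand and fills in the documented defaults.
function normalizeProperty(input: PropertyInput): NormalizedProperty {
  const def: PropertyDefinition = typeof input === "string" ? { "@id": input } : input;
  return {
    id: def["@id"],
    type: def["@type"] ?? XSD_STRING,
    optional: def["@optional"] === true,
    array: def["@array"] === true,
  };
}

const shorthand = normalizeProperty("http://xmlns.com/foaf/0.1/name");
const explicit = normalizeProperty({
  "@id": "http://xmlns.com/foaf/0.1/nick",
  "@optional": true,
  "@array": true,
});
```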
3.3.3. Schema Nesting, Composition, Recursion and Handling Complex Types

The LDkit schema support is designed with two important constraints. First, the schema cannot be recursive. Second, it must be possible to retrieve all the data necessary to populate the schema using a single SPARQL query, in order to apply data constraints directly in the query (e.g. ensure existence of all required properties).

Due to these constraints, although LDkit supports schema nesting and composition, a schema cannot reference itself, even transitively. This restriction is enforced syntactically in TypeScript, thereby preventing users from inadvertently creating invalid recursive schemas.

However, Linked Data often includes entities that reference others of the same type, and potentially the same schema. Such cases can be addressed through application-level recursion. At the schema level, instead of referencing the complete schema recursively, users may retrieve only the IRIs of related entities, or use a subschema. Listing 6 illustrates a typical use case involving the foaf:knows predicate, where one person may refer to another. In this example, instead of referencing the entire PersonSchema, the query retrieves only the IRIs of referenced entities of the RDF type foaf:Person. These IRIs can then be used in subsequent queries that apply the main schema.
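Application-level recursion of this kind might look like the following sketch. It assumes a hypothetical synchronous load function that stands in for a query by IRI using the main schema, and an in-memory map that stands in for the data source; in a real application the lookup would be asynchronous.

```typescript
// Entity shape assumed for this sketch: the schema returns only IRIs for foaf:knows.
interface PersonNode {
  $id: string;
  name: string;
  knows: string[];
}

type Loader = (iri: string) => PersonNode | undefined;

// Breadth-first traversal with a depth limit; the visited set guards against cycles.
function collectAcquaintances(start: string, load: Loader, maxDepth: number): PersonNode[] {
  const visited = new Set<string>();
  const result: PersonNode[] = [];
  let frontier = [start];
  for (let depth = 0; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const iri of frontier) {
      if (visited.has(iri)) continue;
      visited.add(iri);
      const person = load(iri); // one query per IRI, applying the main schema
      if (!person) continue;
      result.push(person);
      next.push(...person.knows);
    }
    frontier = next;
  }
  return result;
}

// In-memory stand-in for a remote source; note the cycle between a and b.
const people: Record<string, PersonNode> = {
  "http://example.org/a": { $id: "http://example.org/a", name: "A", knows: ["http://example.org/b"] },
  "http://example.org/b": { $id: "http://example.org/b", name: "B", knows: ["http://example.org/a", "http://example.org/c"] },
  "http://example.org/c": { $id: "http://example.org/c", name: "C", knows: [] },
};
const loadPerson: Loader = (iri) => people[iri];
const direct = collectAcquaintances("http://example.org/a", loadPerson, 1);
const everyone = collectAcquaintances("http://example.org/a", loadPerson, 5);
```

The depth limit and the visited set are design choices of the sketch: they keep the number of follow-up queries bounded even when the data contains cycles.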

Another common use case in Linked Data is the handling of complex data types, for example, those where the range of a property is the union or the intersection between two classes. There are two alternative strategies to address this scenario.

To illustrate these strategies, consider the schema:author property from the Schema.org³⁷ ontology, whose range is defined as either schema:Person or schema:Organization.

The first approach, demonstrated in Listing 7, is similar to the recursion example discussed earlier. It uses an intermediary schema to resolve RDF types of the linked entity. The application can then query additional data based on the resolved RDF types, ideally using dedicated schemas for Person and Organization.

The second strategy involves creating a synthetic schema that contains properties that belong to either Person or Organization entities, using the @optional property attribute. This is possible to achieve since LDkit does not require rdf:type specification on the schema level. The example synthetic schema is demonstrated in Listing 8.

3.4. Querying Data

In Listing 2, we showed how LDkit can map a particular data entity specified by IRI from RDF to TypeScript model. When building web applications that browse data, a more advanced interface is required, so that the user can efficiently browse and utilize the data. For example, traditional ORM solutions provide a way to retrieve all entities, lookup entities based on particular criteria, or paginate results to support large datasets. LDkit strives to support the same use cases. A key differentiator of LDkit is the ability to retrieve previously unknown data entities without knowing their resource IRI upfront. In contrast, other existing RDF-based solutions typically require a starting IRI or having the entire dataset loaded into memory.

To support such advanced querying needs, LDkit introduces a custom query language that abstracts SPARQL and enables developers to perform filtering and pagination without writing SPARQL manually. This query language provides a familiar and declarative syntax akin to traditional ORMs. Listing 9 shows an example of how to fetch persons named "Alan" and limit the total number of results to ten. It also shows the particular SPARQL query that is created and executed by LDkit under the hood.

The LDkit query may contain a where clause that lets the user restrict the values of specific properties. It uses FILTER expressions and built-in SPARQL functions to achieve that.

LDkit supports a range of search and filtering operations that are data-type specific; most of them are realized using built-in SPARQL functions. There is support for general comparisons³⁸, string functions³⁹, and array functions⁴⁰. In addition, users can specify a custom filter expression that is inserted directly into the resulting SPARQL query. Most of these filtering operations can also be combined for a single entity property, so that users can, for example, query for a date range (Find all persons born after 1950 and before 1990).
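The date-range example can be illustrated by a small sketch that turns a comparison object into SPARQL FILTER expressions. The operator names ($gt, $gte, $lt, $lte) and the toFilters helper are assumptions made for this illustration and do not necessarily match the LDkit query language; values are not escaped, which a real implementation would have to handle.

```typescript
// Hypothetical comparison operators for a single property.
type Comparison = { $gt?: string; $gte?: string; $lt?: string; $lte?: string };

const OPERATORS = { $gt: ">", $gte: ">=", $lt: "<", $lte: "<=" } as const;

// Produces one FILTER expression per operator present in the where clause.
function toFilters(variable: string, datatype: string, where: Comparison): string[] {
  const filters: string[] = [];
  for (const key of Object.keys(OPERATORS) as (keyof typeof OPERATORS)[]) {
    const value = where[key];
    if (value !== undefined) {
      filters.push(`FILTER(?${variable} ${OPERATORS[key]} "${value}"^^<${datatype}>)`);
    }
  }
  return filters;
}

// Find all persons born after 1950 and before 1990.
const XSD_DATE = "http://www.w3.org/2001/XMLSchema#date";
const birthFilters = toFilters("birthDate", XSD_DATE, {
  $gt: "1950-12-31",
  $lt: "1990-01-01",
});
```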

For pagination, there are take and skip parameters, that correspond to the LIMIT and OFFSET in the SPARQL query.

In order to support this kind of query specification, the resulting SPARQL queries are quite complex. The queries can be broken down into three parts:

First, a set of entities represented by their IRIs must be established. This is done using a SELECT subquery that checks for required properties defined in schema, and employs filtering and pagination.

Second, for each IRI found, a graph corresponding to the defined schema is matched, including optional properties. Filtering must be applied on this level as well in order to yield correct results.

Third, the graph is finalized using CONSTRUCT query, and a special type ldkit:Resource is added for each IRI so that it is clear which of the IRIs contained in the resulting set corresponds to the entities found.
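The three parts can be sketched as a simple string-building function. This is a simplified illustration under stated assumptions, not the query LDkit actually emits: the QuerySpec shape and buildQuery are hypothetical, the ldkit:Resource IRI is a placeholder, and filters are assumed to refer to required properties only.

```typescript
// Placeholder standing in for ldkit:Resource (assumption, not the real IRI).
const LDKIT_RESOURCE = "http://example.org/ldkit#Resource";

interface Prop { name: string; iri: string; optional?: boolean }
interface QuerySpec {
  type?: string;
  props: Prop[];
  filters?: string[]; // ready-made FILTER expressions
  take?: number;      // maps to LIMIT
  skip?: number;      // maps to OFFSET
}

function buildQuery(spec: QuerySpec): string {
  const pattern = (p: Prop) => `?iri <${p.iri}> ?${p.name} .`;
  const typePattern = spec.type ? [`?iri a <${spec.type}> .`] : [];
  const filters = spec.filters ?? [];

  // Part 1: subquery selecting entity IRIs by required properties, filters and pagination.
  const required = spec.props.filter((p) => !p.optional).map(pattern);
  const modifiers = [
    spec.take !== undefined ? `LIMIT ${spec.take}` : "",
    spec.skip !== undefined ? `OFFSET ${spec.skip}` : "",
  ].filter(Boolean).join(" ");
  const subquery = `{ SELECT DISTINCT ?iri WHERE { ${[...typePattern, ...required, ...filters].join(" ")} } ${modifiers} }`;

  // Part 2: match the full shape for each IRI, including optional properties and filters.
  const shape = spec.props.map((p) => (p.optional ? `OPTIONAL { ${pattern(p)} }` : pattern(p)));

  // Part 3: CONSTRUCT template, marking each found entity with the resource type.
  const template = [`?iri a <${LDKIT_RESOURCE}> .`, ...typePattern, ...spec.props.map(pattern)];
  return `CONSTRUCT { ${template.join(" ")} } WHERE { ${[subquery, ...typePattern, ...shape, ...filters].join(" ")} }`;
}

const alanQuery = buildQuery({
  type: "http://xmlns.com/foaf/0.1/Person",
  props: [{ name: "name", iri: "http://xmlns.com/foaf/0.1/name" }],
  filters: ['FILTER(?name = "Alan")'],
  take: 10,
});
```

The sketch omits an ORDER BY clause, so the page contents depend on the endpoint's result order.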

In summary, LDkit enhances data browsing by offering retrieval of data entities without known IRIs and does not require the entire dataset to be in memory. It features advanced search and filtering capabilities using SPARQL, allowing efficient exploration of large datasets. Users can filter data based on properties, use data-type specific operations, execute custom SPARQL expressions, and apply pagination, making it versatile for querying any kind of data.

3.5. Updating Data

LDkit provides means to insert, update and delete RDF data in a developer-friendly way, centred around the schema definition, similarly to reading data. The read operations convert RDF data to TypeScript plain objects and primitives. The update operations work in exactly the opposite way: the input consists of plain objects holding information about entities, which are then converted to RDF and SPARQL Update queries. An example of creating a new entity is shown in Listing 10.

When updating the data, LDkit assumes that the data present in the data source correspond to the defined data schema, and the update operations are designed so that the data entities remain sound, that is, if the developer modifies an entity through LDkit, the resulting RDF data always correspond to the defined schema. For example, it is not possible to delete a required property of an entity. That said, it is always the responsibility of the developer to use the provided interface appropriately, for example, not to insert an entity that already exists, or not to update an entity that does not exist yet. For performance reasons, LDkit does not check for data integrity in the data source.

To add new data, users need to specify the full entity (or entities) to insert, including all the required properties. For an update operation, it is only necessary to provide a subset of entity properties, namely those that are supposed to change. Listing 11 shows an example of an update query.

3.5.1. The Singularization Problem

One of the peculiarities of working with Linked Data, or with graph data sources in general, is that a particular property of an entity often has multiple values, and the list of values may sometimes be quite large. The majority of existing RDF abstractions deal with this issue in a Linked Data way, simply treating every property in the data as multi-valued. This, however, does not provide a good enough developer experience in many scenarios, especially if one needs to rely on a particular data model or has the data under their control. Some of the available libraries do provide means to define this behaviour.

To better understand the issue, we provide examples of singularization in the GraphQL-LD and LDflex libraries.

The GraphQL-LD (Taelman et al., 2018) library treats all properties as arrays by default, but allows for @single or @plural directives to be added inside the queries to indicate which fields should be singularized and which ones should remain plural. Listing 12 gives an example of how to retrieve single or multiple labels of an entity.

The LDflex (Verborgh & Taelman, 2020) library adopts a different approach by providing a fluid JavaScript interface for interacting with Linked Data, allowing users to traverse data similarly to local objects. When a user retrieves a property using await, a singular value is returned, whereas iterating over a property using for await yields all available values. Listing 13 shows how to print a single or multiple values of the entity.

3.5.2. Singularization in LDkit

LDkit supports singularization definition directly within the model. In the data schema, users can specify which properties are restricted to a single value (the default), and which may contain multiple values. This approach enhances the developer experience by ensuring data integrity and providing tailored methods for reading and writing properties based on whether they represent a singular value or an array.

Furthermore, as we mentioned earlier, some arrays may be quite large, raising the question of how to update such lists efficiently. The LDkit library addresses this challenge by offering developers a mechanism to set array-like properties to a specific list of values, add new values, or remove existing ones. This functionality enables efficient updates to large datasets without requiring the entire dataset to be transmitted. An example of such an update is shown in Listing 14.
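The idea behind such incremental updates can be sketched as follows. The operation names ($set, $add, $remove) and the buildArrayUpdate helper are assumptions made for this illustration, and the generated SPARQL Update requests are one possible translation rather than the queries LDkit produces. Only the affected values are transmitted, not the whole list.

```typescript
// Hypothetical description of a change to a multi-valued property.
interface ArrayUpdate {
  $set?: string[];    // replace all values
  $add?: string[];    // add values, keep the rest
  $remove?: string[]; // remove values, keep the rest
}

function buildArrayUpdate(entity: string, predicate: string, update: ArrayUpdate): string {
  // JSON.stringify yields a double-quoted, escaped string usable as a SPARQL literal.
  const triples = (values: string[]) =>
    values.map((v) => `<${entity}> <${predicate}> ${JSON.stringify(v)} .`).join(" ");
  const ops: string[] = [];
  if (update.$set) {
    ops.push(`DELETE WHERE { <${entity}> <${predicate}> ?o }`);
    if (update.$set.length > 0) ops.push(`INSERT DATA { ${triples(update.$set)} }`);
  }
  if (update.$remove && update.$remove.length > 0) {
    ops.push(`DELETE DATA { ${triples(update.$remove)} }`);
  }
  if (update.$add && update.$add.length > 0) {
    ops.push(`INSERT DATA { ${triples(update.$add)} }`);
  }
  // Multiple operations in one SPARQL Update request are separated by ";".
  return ops.join(";\n");
}

const MOVIE = "http://example.org/movie/1";
const CAST = "http://example.org/cast";
const addOne = buildArrayUpdate(MOVIE, CAST, { $add: ["Alice"] });
const replaceAll = buildArrayUpdate(MOVIE, CAST, { $set: ["A", "B"] });
```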

3.6. Data Sources and Query Engine

In LDkit, a Query engine is a component that handles execution of SPARQL queries over data sources. The query engine must follow the RDF/JS Query specification (Taelman & Scazzosi, 2023) and implement the StringSparqlQueryable interface.

LDkit ships with a simple default query engine that lets developers execute queries over a single SPARQL endpoint. It is lightweight and optimized for browser environment, and it can be used as a standalone component, independently of the rest of LDkit. The engine supports all SPARQL endpoints that conform to the SPARQL 1.1 specification (Harris & Seaborne, 2013).

LDkit is fully compatible with Comunica-based query engines. Comunica (Taelman et al., 2018) provides access to RDF data from multiple sources and various source types, including Solid pods, RDF files, Triple/Quad Pattern Fragments, and HDT files.

3.7. Converting Data Between RDF and TypeScript

LDkit implements a runtime RDF decoding mechanism that transforms RDF triples into structured TypeScript objects, according to a user-defined schema. This decoding is performed by the Decoder component, which interprets the RDF graph through a schema-driven approach. The algorithm supports language-tagged literals, nested entities, optional and repeated properties, and type validation, enabling reliable and predictable transformation of RDF data into a TypeScript object aligned with the schema.

The input of the decoding algorithm is an RDF graph, a schema, and an optional preferred language setting. The RDF graph is a set of triples (subject S, predicate P, object O) serialized in the following tree structure, grouping predicates by subject, and objects by predicate:

The decoding process operates in several stages, formally described in Algorithm 1.

1. Resource identification: All RDF subjects in the graph are scanned for an rdf:type including ldkit:Resource.⁴¹ Only those nodes are considered as root nodes for decoding.
2. Per-node decoding: Each resource node is decoded based on the schema, which maps property keys to RDF predicates and type constraints.
3. Property handling: Each property is evaluated based on its constraints.
   – Required vs Optional: Required properties must be present; missing values raise errors⁴².
   – Multilingual literals: If @multilang attribute is set, terms are grouped by language.
   – Arrays: Properties marked with @array are decoded into lists.
   – Nested schemas: Properties with @schema are recursively decoded into sub-objects.
4. Language preference: If a preferred language is specified, literals in that language are prioritized.
5. Value resolution: RDF terms are mapped to TypeScript values using built-in or custom data type converters.
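The stages above can be illustrated with a compact, self-contained sketch. It is not the Decoder component itself: the Graph, Term, Property and Schema types, the decode and decodeNode functions, and the ldkit:Resource placeholder IRI are assumptions made for this example, and details such as returning null for a missing optional value or allowing empty arrays are choices of the sketch.

```typescript
const RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
const XSD = "http://www.w3.org/2001/XMLSchema#";
const RESOURCE = "http://example.org/ldkit#Resource"; // placeholder for ldkit:Resource

interface Term { value: string; language?: string }
type Graph = Record<string, Record<string, Term[]>>; // subject -> predicate -> objects
interface Property {
  "@id": string; "@type"?: string; "@schema"?: Schema;
  "@optional"?: true; "@array"?: true; "@multilang"?: true;
}
interface Schema { [key: string]: Property }

// Step 5: map a lexical value to a TypeScript value based on the datatype IRI.
function convert(value: string, datatype?: string): unknown {
  switch (datatype) {
    case XSD + "integer": return parseInt(value, 10);
    case XSD + "boolean": return value === "true";
    case XSD + "date": return new Date(value);
    default: return value;
  }
}

// Steps 2-4: decode a single node according to the schema.
function decodeNode(graph: Graph, iri: string, schema: Schema, lang?: string): Record<string, unknown> {
  const node = graph[iri] ?? {};
  const entity: Record<string, unknown> = { $id: iri };
  for (const [key, prop] of Object.entries(schema)) {
    const found = node[prop["@id"]] ?? [];
    if (found.length === 0 && !prop["@optional"] && !prop["@array"]) {
      throw new Error(`Missing required property "${key}" on <${iri}>`);
    }
    // Step 4: stable sort that moves literals in the preferred language to the front.
    const terms = lang
      ? [...found].sort((a, b) => Number(b.language === lang) - Number(a.language === lang))
      : found;
    const nested = prop["@schema"];
    const decodeTerm = (t: Term): unknown =>
      nested ? decodeNode(graph, t.value, nested, lang) : convert(t.value, prop["@type"]);
    if (prop["@multilang"]) {
      const byLanguage: Record<string, string | string[]> = {};
      for (const t of terms) {
        const tag = t.language ?? "";
        const current = byLanguage[tag];
        if (prop["@array"]) byLanguage[tag] = Array.isArray(current) ? [...current, t.value] : [t.value];
        else if (current === undefined) byLanguage[tag] = t.value;
      }
      entity[key] = byLanguage;
    } else if (prop["@array"]) {
      entity[key] = terms.map(decodeTerm);
    } else {
      entity[key] = terms.length > 0 ? decodeTerm(terms[0]) : null;
    }
  }
  return entity;
}

// Step 1: only subjects typed as the resource marker are decoded as root entities.
function decode(graph: Graph, schema: Schema, lang?: string): Record<string, unknown>[] {
  return Object.keys(graph)
    .filter((iri) => (graph[iri][RDF_TYPE] ?? []).some((t) => t.value === RESOURCE))
    .map((iri) => decodeNode(graph, iri, schema, lang));
}

const EX = "http://example.org/";
const graph: Graph = {
  [EX + "ada"]: {
    [RDF_TYPE]: [{ value: RESOURCE }],
    [EX + "name"]: [{ value: "Ada Lovelace" }],
    [EX + "born"]: [{ value: "1815-12-10" }],
    [EX + "label"]: [{ value: "matematička", language: "cs" }, { value: "mathematician", language: "en" }],
    [EX + "address"]: [{ value: EX + "addr" }],
  },
  [EX + "addr"]: { [EX + "city"]: [{ value: "London" }] },
};
const PersonSchema: Schema = {
  name: { "@id": EX + "name" },
  born: { "@id": EX + "born", "@type": XSD + "date" },
  label: { "@id": EX + "label" },
  nick: { "@id": EX + "nick", "@optional": true },
  address: { "@id": EX + "address", "@schema": { city: { "@id": EX + "city" } } },
};
const [ada] = decode(graph, PersonSchema, "en");
```

In the example, the address node is not typed as a resource, so it is decoded only as a nested object of the person and never as a root entity.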

To facilitate end-to-end type safety, in addition to the runtime type conversion provided by the Decoder component, LDkit includes a compile-time TypeScript type conversion through the SchemaInterface type helper. This utility takes the data schema as input and infers a TypeScript type that describes the shape of a single data entity corresponding to the schema. This shape is equivalent to the one produced by the Decoder component.

The resolution algorithm employed by SchemaInterface closely mirrors that of the runtime Decoder in terms of semantic interpretation of schema definitions, including handling of optional properties, arrays, language maps, and value coercion. However, because it operates at compile time, it must express this logic entirely within the TypeScript type system, leveraging advanced TypeScript programming features, including mapped types, conditional types, template literal types, and recursive type inference⁴³.

The SchemaInterface helper is integrated in the Lens component interface and facilitates the data-specific auto-completion during development time.
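The flavour of such compile-time inference can be conveyed with a heavily simplified sketch built from mapped and conditional types. The InferEntity helper below is hypothetical and covers only a few modifiers and datatypes; it is not the SchemaInterface implementation, and the choice of null for optional values is an assumption of the sketch.

```typescript
// Datatype IRIs mapped to TypeScript types (a small illustrative subset).
type XsdTypes = {
  "http://www.w3.org/2001/XMLSchema#string": string;
  "http://www.w3.org/2001/XMLSchema#integer": number;
  "http://www.w3.org/2001/XMLSchema#boolean": boolean;
  "http://www.w3.org/2001/XMLSchema#date": Date;
};

// Value type of a property before arity is applied.
type BaseType<P> = P extends { "@schema": infer S } ? InferEntity<S>
  : P extends { "@type": infer T } ? (T extends keyof XsdTypes ? XsdTypes[T] : string)
  : string;

// Shorthand IRIs are strings; @array and @optional adjust the resulting type.
type PropertyType<P> = P extends string ? string
  : P extends { "@array": true } ? BaseType<P>[]
  : P extends { "@optional": true } ? BaseType<P> | null
  : BaseType<P>;

// Every entity carries its IRI in $id; the schema-level @type key is skipped.
type InferEntity<S> = { $id: string } & {
  [K in Exclude<keyof S, "@type">]: PropertyType<S[K]>;
};

const PersonSchema = {
  "@type": "http://xmlns.com/foaf/0.1/Person",
  name: "http://xmlns.com/foaf/0.1/name",
  birthDate: {
    "@id": "http://example.org/birthDate",
    "@type": "http://www.w3.org/2001/XMLSchema#date",
  },
  nick: { "@id": "http://xmlns.com/foaf/0.1/nick", "@optional": true },
  knows: {
    "@id": "http://xmlns.com/foaf/0.1/knows",
    "@array": true,
    "@schema": { name: "http://xmlns.com/foaf/0.1/name" },
  },
} as const;

type Person = InferEntity<typeof PersonSchema>;

// The compiler checks this object against the inferred shape.
const ada: Person = {
  $id: "http://example.org/ada",
  name: "Ada",
  birthDate: new Date("1815-12-10"),
  nick: null,
  knows: [{ $id: "http://example.org/charles", name: "Charles" }],
};
```

The as const assertion matters here: without it, the literal datatype IRIs and the true flags would widen to string and boolean, and the conditional types could no longer distinguish them.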

Since LDkit supports not only querying but also updating RDF data, conversion from TypeScript back to RDF is also required. This functionality is provided by the Encoder component, which effectively inverts the operation performed by the Decoder. It takes as input a set of data entities along with a schema and produces a corresponding set of RDF triples.

However, the encoding algorithm is not entirely symmetrical, as updating RDF data involves more complexity than reading it. Data insertion is relatively straightforward: A data entity is transformed to a set of RDF triples, which are then added to the data store using the INSERT operation of SPARQL Update query. In contrast, data updates are more complex, as they often require replacing existing values in the data store (e.g., when assigning a new value to a required property), or removing them (e.g., when deleting the value of an optional property). Consequently, a typical update operation comprises both triple patterns to be deleted and new triples to be inserted. To support this, the Encoder component also generates the corresponding deletion patterns, where the subject is the IRI of the entity being updated, the predicate is the property IRI, and the object is a variable. This pattern enables the removal of any existing values for the specified properties prior to the insertion of new values.
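The combination of deletion patterns and inserted triples can be sketched as below. The Change shape and buildUpdateQuery are hypothetical, values are treated as plain string literals, and the generated request is one plausible form of a SPARQL Update DELETE/INSERT operation rather than the exact output of the Encoder.

```typescript
// A changed property: a new string value, or null to remove the existing value.
interface Change {
  predicate: string;
  value: string | null;
}

function buildUpdateQuery(entity: string, changes: Change[]): string {
  // Deletion patterns: entity IRI as subject, property IRI as predicate, variable as object.
  const deletes = changes.map((c, i) => `<${entity}> <${c.predicate}> ?v${i} .`);
  // New triples are generated only for properties that receive a value.
  const inserts = changes
    .filter((c) => c.value !== null)
    .map((c) => `<${entity}> <${c.predicate}> ${JSON.stringify(c.value)} .`);
  // OPTIONAL keeps the insert working even when no previous value exists.
  const where = deletes.map((d) => `OPTIONAL { ${d} }`);
  return [
    `DELETE { ${deletes.join(" ")} }`,
    `INSERT { ${inserts.join(" ")} }`,
    `WHERE { ${where.join(" ")} }`,
  ].join("\n");
}

const renameQuery = buildUpdateQuery("http://example.org/ada", [
  { predicate: "http://xmlns.com/foaf/0.1/name", value: "Ada" },
]);
```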
3.8. LDkit Components

Thanks to its modular architecture, components comprising the LDkit OGM framework can be further extended or used separately, accommodating advanced use cases of leveraging Linked Data in web applications. Besides the Schema, Lens, Query engine, Decoder and Encoder already presented, LDkit also provides several additional components and utilities that facilitate development with Linked Data.

The QueryBuilder component automatically generates SPARQL queries for basic Create, Read, Update, and Delete operations based on the data schema. This includes query construction for inserting new entities, retrieving data by IRI or property filters, updating property values, and deleting entities or specific statements.

LDkit also offers a general-purpose, type-safe SPARQL query builder that allows developers to construct arbitrary SPARQL queries programmatically using a fluent interface.

Finally, LDkit also includes Namespaces definitions for popular Linked Data vocabularies, such as Dublin Core,⁴⁴ FOAF⁴⁵ or Schema.org.⁴⁶

This level of flexibility means that LDkit could also support other query languages, such as GraphQL.

3.9. Implementation

The current version of LDkit is implemented in TypeScript, and requires TypeScript version 5.5 or higher to be used effectively in applications. For execution in a server environment, LDkit requires at least Node v20.19.3 or Deno v2.1. The source code is available under the MIT license on GitHub⁴⁷ and Zenodo.⁴⁸

Following the standard practices, LDkit is published as an NPM package⁴⁹ and as a Deno module.⁵⁰ At the time of writing, the latest release is at version 2.5.1.

To make adoption easy for new developers, comprehensive documentation, API reference and code examples are available at https://ldkit.io or linked from the GitHub repository. This resource includes several examples of fully working demo applications covering both Node and Deno environments, and using vanilla JavaScript, React⁵¹ or Preact⁵² frameworks.

LDkit includes a comprehensive suite of over 200 unit tests and integration tests to verify the functionality and interactions of each component within the framework. By rigorously testing the library across various use cases and edge cases, LDkit ensures stability and dependability with each new release, facilitating seamless updates and enhancements while minimizing the risk of introducing regressions.

LDkit is actively developed and maintained by a group of researchers at the Department of Software Engineering, Faculty of Mathematics and Physics, Charles University in Prague, Czechia.

4. LDkit V2 Improvements

The release of LDkit 2.0 addressed several limitations of its predecessor (Klíma et al., 2023), and since then LDkit has continued to evolve, introducing further refinements and capabilities to support a broader range of application development scenarios. In this section, we examine the major changes in the library.

4.1. LDkit Query Language

The most significant improvement in LDkit V2 is the introduction of the custom query language that enables filtering and pagination of data entities without requiring knowledge of the SPARQL language. This query language has been thoroughly described in Section 3.4. In V1, users had to provide either a full custom SPARQL query, or at minimum, a custom WHERE clause to issue advanced queries and precisely define the desired results. While the custom SPARQL queries remain supported, the new query language offers a more effective abstraction over SPARQL and functions similarly to traditional ORMs, making it more accessible to application developers.

4.2. Efficient Large Array Manipulation

Another major enhancement introduced in LDkit V2 concerns the manipulation of large arrays, or to be precise, multi-valued properties in RDF datasets. In earlier versions, modifying these array-like structures – such as adding or removing individual items – required either passing the whole updated array to LDkit, or specifying the RDF triples to insert and delete from the underlying datastore. This approach was not only error-prone but also placed a considerable cognitive and technical burden on developers, particularly those unfamiliar with RDF syntax and semantics.

LDkit V2 simplifies this process by introducing a high-level interface for array operations, which was described in Section 3.5. Developers can now perform incremental updates on array-like properties using concise and declarative commands, without the need to manage the RDF graph state manually. This abstraction aligns with the design principles of modern application frameworks, where collection manipulation is a routine task that should not involve low-level data operations. This improvement reduces the likelihood of inconsistencies or unintended side effects in the RDF graph.

4.3. Custom Data Types

In addition to built-in datatypes, LDkit supports custom two-way conversion based on datatype IRI between RDF literals and TypeScript native primitive or complex types. This is useful when working with complex data formats, such as geometrical points, dates, monetary values, or domain-specific representations that require special parsing and serialization.

To support custom types, including end-to-end type safety, developers must provide an explicit TypeScript type definition by augmenting LDkit's CustomDataTypes interface and register conversion functions that translate between RDF literal values and TypeScript native values.
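The two-way conversion keyed by datatype IRI can be pictured with the following generic sketch. It deliberately does not use LDkit's registration API: the registry, registerDataType, decodeLiteral and encodeValue are hypothetical names, and the point datatype IRI is made up for the example.

```typescript
// A pair of conversion functions for one datatype.
interface DataTypeHandler<T> {
  decode(literal: string): T; // RDF literal value -> TypeScript value
  encode(value: T): string;   // TypeScript value -> RDF literal value
}

// Handlers are looked up by datatype IRI.
const registry = new Map<string, DataTypeHandler<unknown>>();

function registerDataType<T>(iri: string, handler: DataTypeHandler<T>): void {
  registry.set(iri, handler as DataTypeHandler<unknown>);
}

// Unknown datatypes fall back to the plain lexical form.
function decodeLiteral(iri: string, literal: string): unknown {
  const handler = registry.get(iri);
  return handler ? handler.decode(literal) : literal;
}

function encodeValue(iri: string, value: unknown): string {
  const handler = registry.get(iri);
  return handler ? handler.encode(value) : String(value);
}

// Example: a geometrical point serialized as "x,y" (hypothetical datatype).
interface Point { x: number; y: number }
const POINT = "http://example.org/datatypes#point";

registerDataType<Point>(POINT, {
  decode: (literal) => {
    const [x, y] = literal.split(",").map(Number);
    return { x, y };
  },
  encode: (p) => `${p.x},${p.y}`,
});

const point = decodeLiteral(POINT, "1.5,2") as Point;
const literal = encodeValue(POINT, point);
```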

To demonstrate the usage of custom datatypes, Listing 15 illustrates how to store a complex TypeScript object in RDF using JSON serialization.

4.4. Schema Generators

LDkit provides experimental schema generators that transform existing Linked Data definitions into TypeScript schemas compatible with LDkit. These tools are available via the LDkit CLI and support generating code directly from JSON-LD contexts or ShEx shapes.

Although the generators do not fully support the complete feature set of JSON-LD or ShEx – potentially omitting or simplifying complex validation rules, advanced constraints, and specialized constructs – the output schemas can nevertheless serve as a robust starting point for LDkit-based applications.

4.4.1. Using the CLI

The LDkit CLI is included in the NPM ldkit package, and as such it can be invoked via npx or installed⁵³.

The general command syntax:

– <command>: One of context-to-schema, shexc-to-schema or shexj-to-schema
– <method>: Defines how the input is provided. Possible values:
   * url – The input is a URL pointing to the resource.
   * file – The input is a path to a local file.
   * arg – The input is passed directly as a string argument.
– <input>: The actual input data, such as URL, depending on the selected method.

The generators produce TypeScript code that can be used directly in LDkit projects. The CLI outputs the schemas to stdout which can be redirected to a file. For example, the following command transforms a JSON-LD context from a local file and writes the resulting LDkit schema to a schema.ts file:

4.4.2. JSON-LD

The context-to-schema command transforms the @context entry from a JSON-LD 1.1 document to a compatible LDkit schema.

Supported JSON-LD features:

– property datatypes set via @type keyword
– containers of type @language, @set and @list
– @reverse properties
– nested @context entries

Since JSON-LD contexts do not support specifying property cardinality, all properties are treated as required by default in the LDkit schema, which may require manual adjustment. Nested context entries are transformed to explicit LDkit schemas, meaning that a single JSON-LD context may yield multiple LDkit schemas that can be used independently, if needed.
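A rough idea of such a transformation is given by the sketch below. It is not the generator shipped with the LDkit CLI: contextToSchema and the types around it are hypothetical, only flat term definitions with full IRIs are handled, and prefix expansion and nested contexts are left out.

```typescript
// Simplified JSON-LD term definition: a plain IRI or an expanded definition.
type TermDefinition =
  | string
  | { "@id"?: string; "@reverse"?: string; "@type"?: string; "@container"?: string };
type Context = Record<string, TermDefinition>;

interface SchemaProperty {
  "@id": string;
  "@type"?: string;
  "@array"?: true;
  "@multilang"?: true;
  "@inverse"?: true;
}

function contextToSchema(context: Context): Record<string, SchemaProperty> {
  const schema: Record<string, SchemaProperty> = {};
  for (const [term, def] of Object.entries(context)) {
    if (term.startsWith("@")) continue; // skip keywords such as @vocab
    if (typeof def === "string") {
      schema[term] = { "@id": def };
      continue;
    }
    const iri = def["@reverse"] ?? def["@id"];
    if (!iri) continue;
    const prop: SchemaProperty = { "@id": iri };
    if (def["@reverse"]) prop["@inverse"] = true;
    // "@id" as a type marks IRI-valued terms; this sketch leaves them untyped.
    if (def["@type"] && def["@type"] !== "@id") prop["@type"] = def["@type"];
    if (def["@container"] === "@set" || def["@container"] === "@list") prop["@array"] = true;
    if (def["@container"] === "@language") prop["@multilang"] = true;
    schema[term] = prop; // no cardinality in JSON-LD: every property stays required
  }
  return schema;
}

const context: Context = {
  name: "http://schema.org/name",
  birthDate: {
    "@id": "http://schema.org/birthDate",
    "@type": "http://www.w3.org/2001/XMLSchema#date",
  },
  label: { "@id": "http://www.w3.org/2000/01/rdf-schema#label", "@container": "@language" },
  children: { "@reverse": "http://schema.org/parent", "@container": "@set" },
};
const generated = contextToSchema(context);
```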

Example of a JSON-LD context transformation:

4.4.3. ShEx

LDkit supports generating schemas from ShEx 2.1 using two commands – shexc-to-schema and shexj-to-schema that perform ShExC and ShExJ (JSON representation) conversion, respectively.

Supported ShEx features:

– explicit property types
– property types inferred from enumerations (value sets)
– property cardinalities, represented as optional and/or array LDkit properties
– expressions with choices, represented as optional properties
– inverse properties
– nested shapes (both explicit and anonymous)
– simplified AND/OR shapes logic
– reuse of named triple expressions

Explicit ShEx shapes result in corresponding LDkit schemas in the generated TypeScript code. Since ShEx language is more expressive than LDkit schemas, some of the ShEx rules need to be simplified – most notably, LDkit schema does not support alternative choices of any kind, that is, it cannot accurately represent constraints such as Schema S contains either property A, or property B. Such rules would be converted to the equivalent of Schema S contains an optional property A and an optional property B, meaning that the resulting LDkit schema is more permissive. Nevertheless, the converted schemas should work well enough with data that are valid according to the original ShEx schema.
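How cardinalities could translate into schema flags is illustrated by the sketch below. The mapping rule is an assumption made for this example (a minimum of zero yields an optional property, a maximum above one or unbounded yields an array) and may differ from the generator's actual behaviour; it relies on the ShExJ convention that min and max default to 1 and that -1 denotes an unbounded maximum.

```typescript
// Cardinality of a ShExJ triple constraint.
interface Cardinality {
  min?: number; // defaults to 1
  max?: number; // defaults to 1, -1 means unbounded
}

interface PropertyFlags {
  optional: boolean;
  array: boolean;
}

// Assumed mapping from ShEx cardinality to schema property flags.
function cardinalityToFlags(cardinality: Cardinality): PropertyFlags {
  const min = cardinality.min ?? 1;
  const max = cardinality.max ?? 1;
  return {
    optional: min === 0,
    array: max === -1 || max > 1,
  };
}

const exactlyOne = cardinalityToFlags({});                // no cardinality given
const zeroOrOne = cardinalityToFlags({ min: 0, max: 1 }); // ShExC "?"
const oneOrMore = cardinalityToFlags({ min: 1, max: -1 }); // ShExC "+"
const zeroOrMore = cardinalityToFlags({ min: 0, max: -1 }); // ShExC "*"
```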

Example of a ShExC schema transformation:

4.5. Other Improvements

LDkit V2 includes built-in support for new RDF data serializations – N-Triples, N-Quads, and TriG. These formats complement the already supported Turtle and RDF/JSON serializations, increasing interoperability with a wider range of Linked Data sources and tools.

Additionally, V2 introduced support for inverse property relations in the schema using the @inverse property attribute, discussed in Section 3.3. In V1, the same result could only have been achieved by specifying custom SPARQL queries.

5. Developer Experience

Ensuring a positive developer experience is crucial for the adoption and success of any programming tool. This section explores the various ways in which LDkit improves the workflow for developers, making it easier for them to use Semantic Web and Linked Data technologies in their applications.

5.1. Comparison with Similar Tools

Table 1 presents a comparison of LDkit with other similar software based on key features. The comparison includes LDflex, which is the most widely used; LDO, which offers the most advanced TypeScript integration and type safety; and GraphQL-LD. Each of these libraries employs a different approach to querying Linked Data and offers a distinct developer experience. Together, they represent the state of the art in the development of web applications based on Linked Data.

Table 1.
Comparison of LDkit, LDflex, LDO, and GraphQL-LD.

Feature	LDkit	LDflex	LDO	GraphQL-LD
Language Support	TypeScript	JavaScript	TypeScript	TypeScript
Data Modelling	Utilizes schemas based on JSON-LD context for defining data shapes	Does not use fixed data schemas; flexible property access	Employs ShEx to define and validate data shapes	Does not use fixed data schemas; flexible property access via GraphQL queries
Type Safety	End-to-end type safety with TypeScript types automatically inferred from the property RDF datatypes specified in schema	No built-in type safety	Provides static typings through TypeScript interfaces generated from ShEx shapes	No built-in type safety
Query Mechanism	Generates SPARQL queries based on defined schemas	Uses JavaScript-like expressions interpreted as SPARQL queries	Translates object manipulations into RDF/JS Dataset queries	Uses GraphQL queries translated into SPARQL queries
Update Mechanism	Generates SPARQL UPDATE queries based on defined schemas	Uses JavaScript-like expressions interpreted as SPARQL queries	Translates object manipulations into RDF/JS Dataset queries	Not supported
Environment Compatibility	Browsers – via build tool; Node.js – native; Deno – native	Browsers – via build tool; Node.js – native; Deno – via NPM compatibility layer	Browsers – via build tool; Node.js – native; Deno – via NPM compatibility layer	Browsers – via build tool; Node.js – native; Deno – via NPM compatibility layer
Integration with Frameworks	Promise-based API; serializable results	Iterative promise-based API; multiple queries needed to read a complex data entity	Promise-based API; React components for Solid connectivity	Promise-based API; serializable results
Data Source Compatibility	Supports multiple distributed data sources	Supports multiple distributed data sources	Requires pre-loading RDF data into memory	Supports multiple distributed data sources
Developer Experience	Focusses on providing a clear API with strong tooling support	Aims to simplify Linked Data interactions with a familiar syntax	Offers an object-like approach with ShEx-based typings	Familiar GraphQL syntax makes it accessible to GraphQL developers

JSON-LD: JavaScript object notation for linked data; LDO: linked data objects; ShEx: shape expressions.

5.2. Key Differentiators

LDkit lets developers specify Linked Data models through schemas, a flexible mechanism for defining custom data models and RDF mappings suited to their application’s requirements. The schema syntax is based on JSON-LD context and inherits its qualities: it is self-explanatory and easy to create, and schemas can be reused, nested, and shared independently of LDkit.

The Lens interface for reading and writing Linked Data should feel familiar even to developers new to RDF, as it is inspired by the interfaces of analogous model-based abstractions for relational databases. LDkit provides tooling support by incorporating end-to-end data type safety, giving developers immediate feedback in the form of autocomplete or error highlighting within their development environment. The development experience is further enhanced by the fact that the typings of data entities are inferred directly from the defined schema at development time. There is no need to generate additional TypeScript artifacts or provide explicit type information, which all other similar tools require.

Other Linked Data libraries, such as LDflex or LDO, use the JavaScript Proxy object to provide virtual access to data and to override some of its properties. While that approach works well for developer-friendly paradigms such as fluent interfaces, it can be problematic when working with complex data, because proxied objects cannot, in principle, be cached, serialized, printed, or sent from server to client. LDkit, on the other hand, always represents data as plain JavaScript objects and primitives, which inherently support all of these operations. The same approach is used by the most advanced ORMs, such as Prisma.
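The serialization issue can be illustrated without any Linked Data library. The following minimal sketch uses a hypothetical in-memory `store` and a bare `Proxy`; it does not reproduce how LDflex or LDO are implemented, and only shows why lazily resolved properties tend not to survive `JSON.stringify`, whereas a plain object does.

```typescript
// Hypothetical in-memory "store" standing in for a remote RDF source.
const store: Record<string, string> = { name: "Ada", city: "London" };

// Proxy-style access: properties are resolved lazily through a `get` trap,
// so the underlying target object has no own enumerable keys.
const proxied = new Proxy({} as Record<string, string>, {
  get: (_target, prop) => (typeof prop === "string" ? store[prop] : undefined),
});

// Plain-object style: the data is materialized up front.
const plain = { name: store.name, city: store.city };

// Reading a property works in both cases...
const nameViaProxy = proxied.name;

// ...but only the plain object keeps its data when serialized.
const proxiedJson = JSON.stringify(proxied);
const plainJson = JSON.stringify(plain);

console.log(nameViaProxy, proxiedJson, plainJson);
```

Because the proxy target is empty, the serializer finds no own properties to write out, which is why results built from plain objects are easier to cache or send from server to client.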

5.3. Integration and Compatibility

LDkit adheres to selected W3C standards and recommendations, ensuring compatibility and integration capabilities within the Semantic Web and Linked Data ecosystem. The core of the toolkit is based on the RDF/JS data model (Bergwinkl et al., 2022) and query (Taelman & Scazzosi, 2023) specifications, which standardize the representation of RDF in JavaScript. To interact with data sources, LDkit uses the SPARQL Query Language (Harris & Seaborne, 2013) and SPARQL Update (Gearon et al., 2013); its built-in query engine communicates with SPARQL endpoints using the SPARQL Protocol (Feigenbaum et al., 2013).

One of the key features of LDkit is its Promise-based API, which aligns with modern asynchronous programming practices in JavaScript. As a result, LDkit can be easily integrated with existing frontend and backend frameworks, such as React, Angular or Express. Whether developers are building complex single-page applications or more traditional multi-page websites, LDkit enables RDF data integration and manipulation across different parts of an application.

Further extending its versatility, LDkit is compatible with multiple JavaScript runtimes, including Node.js, Deno, and Bun. This compatibility ensures that LDkit can be used in a variety of execution environments – from server-side applications, serverless and edge workers, to client-side interfaces. This wide range of applicability allows developers to deploy LDkit in diverse scenarios, whether they are building backend services, interactive client-side applications, or applications requiring low-latency responses at the edge.

5.4. The Expressivity/Complexity Trade-Off

In this section, we examine how the use of LDkit affects expressivity and complexity inherent in Linked Data applications.

LDkit provides a schema-based abstraction over RDF data. Using this schema, LDkit reduces the complexity of RDF data querying by automatically generating SPARQL queries, fetching data from a data source, and transforming it into TypeScript native types. This preserves the expressivity of the data model through semantic mapping in the schema, while eliminating the need for direct SPARQL querying and RDF data mapping in developer code.

However, building the schema can be challenging, as it requires knowledge of data ontologies and possibly the data itself. To reduce schema creation complexity, LDkit provides converters that enable the reuse of existing ShEx shapes or JSON-LD contexts to generate an LDkit schema.

When using LDkit, the expressivity of data querying is deliberately reduced through its interface, based on the assumption that typical application use cases do not require the full degree of expressivity offered by SPARQL. Instead of SPARQL, developers use an ORM-like programming interface along with simplified query expressions to filter data.

For less typical use cases, LDkit offers an advanced interface that allows developers to specify custom SPARQL queries to execute against the data source or explicitly define triples to insert into or remove from the data source. Even in these cases, complexity is still reduced, as LDkit handles query execution and potentially data retrieval and transformation. Additionally, LDkit includes a type-safe SPARQL query builder with a fluent interface that helps to create syntactically correct queries.

In summary, LDkit simplifies RDF data access by reducing complexity while maintaining sufficient expressivity for typical and advanced Linked Data use cases.

5.5. Current Limitations

To date, the greatest limitation of LDkit remains the inherent complexity of the SPARQL queries it generates to read and update data in RDF data sources. Since the complexity of such queries grows with the complexity of the data schemas, the more properties developers add to a schema, the less performant LDkit becomes.

Section 6 presents performance tests of LDkit using various schemas, ranging from simple to complex. These tests provide insight into the expected performance and help identify the threshold of schema complexity at which LDkit continues to operate efficiently. In cases where performance issues arise, we recommend refactoring the data schemas by removing unnecessary properties, splitting the schemas into multiple simpler ones, or reducing schema nesting. In many instances, executing several simpler SPARQL queries can be much faster than running a single complex one.

6. Performance

In our previous work (Klíma et al., 2023), we evaluated the performance of LDkit using three typical scenarios involving data queries of increasing complexity, executed via the DBpedia SPARQL endpoint⁵⁴. The experiments showed that the overall performance of LDkit is primarily determined by the execution speed of the SPARQL query engine. The library itself maintained stable performance without substantial degradation, even as the complexity of the scenarios increased. In other words, the majority of the processing time is attributable to the execution of the SPARQL query.

While the evaluation demonstrated that LDkit’s performance is adequate for use in Linked Data-based applications and confirmed our assumption that increasing data schema complexity affects performance, it did not provide detailed insight into the extent of this impact – for example, the performance degradation associated with adding a single new property to the data schema.

To address this limitation, we designed a new performance test that better indicates the impact of data schema complexity. A synthetic dataset was constructed comprising 10,000 entities, each containing 10 simple properties, 10 array properties, and 10 object properties. Each array property holds three literal values, and each object property includes a sub-entity with three simple properties. To ensure consistency, UUIDv4 random string values were assigned to all properties. The resulting dataset contains a total of 910,000 explicit RDF statements.
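The text does not itemize the 910,000 figure. The sketch below shows one breakdown that reproduces it, under the assumption (not stated in the text) that every entity and every sub-entity also carries exactly one type statement; all variable names are illustrative.

```typescript
const entityCount = 10000;

// Statements per top-level entity, following the dataset description.
const simpleTriples = 10;        // 10 simple properties, one value each
const arrayTriples = 10 * 3;     // 10 array properties, three literals each
const objectLinkTriples = 10;    // 10 links from the entity to its sub-entities
const subEntityTriples = 10 * 3; // 10 sub-entities, three simple properties each

// ASSUMPTION: one type statement for the entity and one per sub-entity.
const typeTriples = 1 + 10;

const triplesPerEntity =
  simpleTriples + arrayTriples + objectLinkTriples + subEntityTriples + typeTriples;

const totalTriples = triplesPerEntity * entityCount;

console.log(triplesPerEntity, totalTriples);
```

Under this assumption each entity contributes 91 statements, which matches the reported total.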

The test scenario involves querying all entities in the dataset over 10 iterations, retrieving 1000 entities in each iteration. To accurately measure the performance overhead introduced by LDkit, each query was executed twice: first using LDkit, and then directly against the SPARQL endpoint using the same SPARQL query⁵⁵. To reduce the influence of outliers, the minimum and maximum execution times were discarded, and the remaining values were averaged to yield the mean query time in milliseconds per 1000 entities.
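The aggregation step described above can be sketched as follows. The function name `trimmedMean` and the sample timings are illustrative and are not taken from the published test scripts.

```typescript
// Average of the measurements after discarding the single smallest
// and single largest value, as described for the 10 iterations.
function trimmedMean(timesMs: number[]): number {
  if (timesMs.length < 3) {
    throw new Error("need at least three measurements to trim min and max");
  }
  const sorted = [...timesMs].sort((a, b) => a - b);
  const kept = sorted.slice(1, -1); // drop minimum and maximum
  return kept.reduce((sum, t) => sum + t, 0) / kept.length;
}

// Made-up timings (ms per 1000 entities) with one slow outlier.
const sampleTimes = [100, 300, 200, 1000, 50];
const sampleMean = trimmedMean(sampleTimes);
console.log(sampleMean);
```

With 10 iterations, this leaves eight values per test case, so a single slow or fast run does not skew the reported mean.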

The sole test parameter – and the only variable across the test runs – is the data schema used for retrieving the data. To evaluate the limits of LDkit, we employed five distinct data schema types:

– A: includes only simple properties with singular values,
– B: includes only object properties, with each sub-entity containing three simple properties with singular values,
– C: includes only array properties, each containing three distinct values,
– AB: a combination of A and B, incorporating both simple properties and sub-entities,
– ABC: a combination of A, B, and C, encompassing all property types.

The number of properties in each schema type ranged from one to ten. For the composite schema types AB and ABC, this range applies to each individual property type. Listing 16 presents an example of the ABC(1) schema type along with the corresponding entity.

Figure 2 presents the test results, showing the average time required to query 1000 entities using the different data schema types. The tests were conducted on a PC with an Intel CPU @ 2.40 GHz, 8 GB RAM, running Windows 10. To minimize the impact of network latency, a local installation of GraphDB version 11.0 was used as the triplestore. The test scripts, the dataset generator, execution instructions, and raw results are publicly available on GitHub⁵⁶.

Figure 2.
Performance test results of LDkit showing execution time for five different schema types based on the number of properties the schemas include. Schema A includes only simple properties with one value, schema B includes properties with sub-entities (each of those containing three simple properties), and schema C includes array properties containing three values. Schemas AB and ABC are the combinations, for example, Schema AB with five properties includes five simple properties and five properties with sub-entities.

The results demonstrate that LDkit performs efficiently when handling simple properties or properties with sub-entities. For instance, for a schema with 10 simple properties, the average execution time to query 1000 entities was 374 ms. Similarly, for a schema of type AB, comprising 10 simple properties and 10 sub-entities, the average execution time was 1923 ms, which we consider satisfactory. However, the results clearly indicate that array properties have a significant negative impact on performance. When the number of array properties exceeds four, the execution time surpasses two seconds, rendering such schemas impractical for use in web applications.

The findings also help identify the potential limits of LDkit schema complexity and suggest strategies to enhance performance in web applications. If performance is suboptimal, one viable approach is to refactor the schema by extracting some or all array properties into separate schemas, which can then be queried independently using the same entity IRI.
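The refactoring strategy above can be sketched with plain objects. The types below only approximate an LDkit-style schema (a type IRI plus properties, with `@array` marking multi-value properties); they are assumptions for illustration, not the library's own typings. The helpers `splitArrayProperties` and `mergeByIri`, the `http://example.org/` IRIs, and the `$id` key used to join results are likewise hypothetical.

```typescript
// Simplified, assumed shape of a schema: not LDkit's actual type definitions.
interface PropertyObject {
  "@id": string;
  "@array"?: boolean;
  "@schema"?: Schema;
}
type PropertySpec = string | PropertyObject;
interface Schema {
  "@type": string;
  [property: string]: PropertySpec;
}

// Move every array property into a second schema for the same entity type.
function splitArrayProperties(schema: Schema): { base: Schema; arrays: Schema } {
  const base: Schema = { "@type": schema["@type"] };
  const arrays: Schema = { "@type": schema["@type"] };
  for (const [key, spec] of Object.entries(schema)) {
    if (key === "@type") continue;
    const isArray = typeof spec !== "string" && spec["@array"] === true;
    (isArray ? arrays : base)[key] = spec;
  }
  return { base, arrays };
}

// Entities retrieved separately can be joined again on their IRI.
type Entity = { $id: string; [property: string]: unknown };
function mergeByIri(baseEntities: Entity[], arrayEntities: Entity[]): Entity[] {
  const byIri = new Map(arrayEntities.map((e) => [e.$id, e] as const));
  return baseEntities.map((e) => {
    const extra = byIri.get(e.$id);
    return extra ? { ...e, ...extra } : e;
  });
}

// A schema with one simple, one object, and one array property.
const abc1: Schema = {
  "@type": "http://example.org/Entity",
  title: "http://example.org/title",
  author: {
    "@id": "http://example.org/author",
    "@schema": { "@type": "http://example.org/Person", name: "http://example.org/name" },
  },
  tags: { "@id": "http://example.org/tag", "@array": true },
};

const parts = splitArrayProperties(abc1);
console.log(Object.keys(parts.base), Object.keys(parts.arrays));
```

Each resulting schema would then be queried on its own, and the two result sets combined in application code, trading one complex SPARQL query for two simpler ones.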

Finally, comparing the execution time of LDkit with direct SPARQL endpoint queries, the average overhead introduced by LDkit was 7.47% per test case. Figure 3 illustrates the overhead in various test schemas. This overhead is more pronounced for simple schemas and lower for more complex ones, especially those containing multi-value properties. As the SPARQL query execution time increases, LDkit benefits from the streaming nature of result processing, enabling it to process much of the data in parallel as it is received from the SPARQL endpoint.
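The overhead figure is a relative increase over the direct query time, averaged across test cases. A minimal sketch of that computation follows; the function names and the sample timings are illustrative and do not come from the published results.

```typescript
// Percentage increase of the LDkit time over the direct SPARQL query time.
function overheadPercent(ldkitMs: number, directMs: number): number {
  if (directMs <= 0) throw new Error("direct time must be positive");
  return (100 * (ldkitMs - directMs)) / directMs;
}

// Mean of the per-test-case overheads.
function meanOverheadPercent(cases: { ldkitMs: number; directMs: number }[]): number {
  if (cases.length === 0) throw new Error("no test cases");
  const total = cases.reduce((sum, c) => sum + overheadPercent(c.ldkitMs, c.directMs), 0);
  return total / cases.length;
}

// Made-up timings: a fast query with 10% overhead, a slow one with 5%.
const sampleOverhead = meanOverheadPercent([
  { ldkitMs: 110, directMs: 100 },
  { ldkitMs: 210, directMs: 200 },
]);
console.log(sampleOverhead);
```

Note that the same absolute overhead in milliseconds yields a smaller percentage as the direct query time grows, which is consistent with the lower relative overhead reported for complex schemas.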

Figure 3.
Performance test results of LDkit showing execution time overhead as the percentage increase in execution time compared to direct SPARQL endpoint queries. Schemas A, B, C, AB, and ABC are the same as in Figure 2: Schema A includes only simple properties with one value, schema B includes properties with sub-entities (each of those containing three simple properties), and schema C includes array properties containing three values. Schemas AB and ABC are the combinations, for example, Schema AB with five properties includes five simple properties and five properties with sub-entities.

6.1. Independent Comparative Performance Evaluation

The performance of LDkit was also independently evaluated (González, 2025) together with six other RDF data abstraction libraries, including LDflex and LDO. Although the evaluation has not been published or peer reviewed at the time of writing, the dataset and results are publicly available on GitHub⁵⁷.

The repository provides a detailed description of the performance scenarios, including the steps required to reproduce the tests.

The evaluation assessed library performance across two use cases:

– fetching a set of entities from an RDF data source, and
– fetching a set of entities that satisfy a specific condition.

Statistical analysis⁵⁸ indicates that LDkit outperforms all other evaluated libraries and demonstrates significantly better performance than both LDflex and LDO.

7. Usage

In this section, we present an overview of the practical adoption and application of LDkit across various platforms and projects. Understanding the extent of LDkit’s utilization not only validates its effectiveness but also demonstrates its impact within the developer community. We explore this through usage metrics, which quantify its integration into development environments, and an in-use analysis, which provides qualitative insights from real-world applications and developer experiences.

7.1. Usage Metrics

The adoption and popularity of LDkit can be partly quantified by several key metrics commonly used within the open-source community.

Since the source code of LDkit is hosted on GitHub, we can inspect some of the statistics that the platform provides. As of November 2025, the LDkit repository has gained 69 stars from other GitHub users, and 13 other public repositories depend on LDkit directly. There seem to be no transitive dependencies, which indicates that LDkit is being used in applications and not in libraries, exactly as intended.

We can gather additional insight from the NPM platform⁵⁹, where the Node.js version of LDkit is hosted. The public NPM statistics indicate that the total number of downloads of the LDkit package between February 1st 2024 and February 1st 2025 was 11,819, averaging more than 227 downloads per week. As with GitHub, NPM does not indicate any dependent packages, supporting the assumption that LDkit is being used in user applications, not in libraries or similar tools.

As we mentioned earlier, LDkit is also available as a Deno module.⁶⁰ Unfortunately, the Deno platform does not provide download statistics or dependencies information, therefore we cannot easily gauge the adoption of LDkit for this particular JavaScript runtime.

It is important to note that while these metrics offer valuable insights, they come with certain limitations. For instance, LDkit’s usage on platforms like Deno is not tracked, and projects hosted in private GitHub repositories, or other source code platforms such as Bitbucket or GitLab are not included in these metrics. This lack of comprehensive tracking can lead to underestimations of the actual usage figures. Additionally, the metrics from GitHub and NPM might not fully capture the toolkit’s impact and adoption due to these platforms’ inherent tracking limitations and the fact that they do not encompass all usage scenarios or the variety of developer environments. Moreover, the number of downloads reported by NPM may be overestimated, as a portion of these downloads could be attributed to automated processes such as bots, rather than actual human users.

Thus, while the data from GitHub stars, project dependencies, and NPM downloads are informative, they should be considered indicative rather than exhaustive. These metrics, albeit with their imperfections, still provide a snapshot of LDkit’s presence and relevance in the field of Linked Data application development.

7.2. In-Use Analysis

This section presents a qualitative analysis of selected projects that use LDkit: Dutch Digital Heritage Network, Assembly Line (AL) for conceptual models, Dataspecer, TypeSPARQ, Schema Forge, and Linked Data Engine (LDE). Two of these projects include contributions from the authors of this paper, as noted in the respective subsections below.

7.2.1. Netwerk Digitaal Erfgoed

Netwerk Digitaal Erfgoed (NDE),⁶¹ or the Dutch Digital Heritage Network, is a collaborative initiative in the Netherlands focussed on improving access to and preservation of digital heritage collections from Dutch museums, archives, libraries, and other cultural heritage institutions.

The main goal of NDE is to make the rich digital heritage of the Netherlands accessible, sustainable, and interconnected for users, researchers, and the public. It aims to foster cooperation among heritage institutions and promote the use of open standards, linked data, and sustainable digital practices.

A core component of NDE is The Network of Terms⁶² – a search engine for finding terms in terminology sources, such as thesauri, classification systems and reference lists. Given a textual search query, the engine searches one or more terminology sources in real-time and returns matching terms, including their labels and URIs, aggregating the results to the SKOS data model.

This project uses LDkit to query JSON-LD datasets, using Comunica as a query engine. The authors created a complex data schema that makes advanced use of LDkit. It includes numerous data properties with various data types and leverages features such as nested schemas, arrays, optional properties, and multilingual support.

7.2.2. AL for Conceptual Models

LDkit is used in a project for the Czech government⁶³ that aims to build a set of web applications for distributed modelling and maintenance of government ontologies⁶⁴. The ensemble is called Assembly Line (AL) (Klíma et al., 2023). It allows business glossary experts and conceptual modelling engineers from different public bodies to model their domains in the form of domain vocabularies, each consisting of a business glossary further extended to a formal UFO-based ontology (Guizzardi et al., 2022). The individual domain vocabularies are managed in a distributed fashion by the different parties through AL. AL also enables interlinking related domain vocabularies and linking them to the centrally defined common upper public government ontology. Domain vocabularies are natively represented and published⁶⁵ in SKOS (business glossary) and OWL (ontology). The AL tools have to process this native representation of the domain vocabularies in their front-end parts. Dealing with the native representation directly would, however, be unnecessarily complex for the front-end developers of these tools. Therefore, they use LDkit to simplify their codebase. This allows them to focus on the UX of their domain-modelling front-end features while keeping the complexity of SKOS and OWL behind the LDkit schemas and lenses. At the same time, the native SKOS and OWL representations of the domain models make their publishing, sharing, and reuse much easier. LDkit removes the need for additional steps in the back-end components of the AL tools that would transform this native representation.

Disclaimer: Authors of the paper contributed to this project.

7.2.3. Dataspecer

Dataspecer (Stenchlák et al., 2022) is a tool for management and modelling of data specifications based on a domain ontology. Using Dataspecer, users can generate technical artifacts such as data schemas, for example, in JSON Schema or XML Schema, and human-readable documentation for a specific dataset based on the provided ontology, while maintaining the semantic mapping from the generated artifacts to the ontology. This significantly eases the task of developing data specifications and keeping the corresponding technical artifacts consistent in the process.

Dataspecer provides support for the implementation of artifact generators for any target format, including LDkit. Using Dataspecer, users can automatically generate LDkit data schemas based on a generic data specification. These schemas are generated as TypeScript files that are ready to use in an LDkit-based application, considerably reducing development time.

Disclaimer: Authors of the paper contributed to this project.

7.2.4. TypeSPARQ

TypeSPARQ⁶⁶ is a dynamic and user-friendly service that simplifies the process of exploring and extracting data from SPARQL endpoints. With TypeSPARQ, users can easily navigate and understand the schema of a SPARQL endpoint, thanks to its graphical interface that provides a visual representation of the endpoint’s schema.

Additionally, TypeSPARQ enables users to visually select data structures or subsets of ontologies and directly generate a starter LDkit application from those selections. This includes the specification of the endpoint, data schemas, and Lens components necessary for the development.

As a result, TypeSPARQ enables application developers to bootstrap their Linked Data applications easily, even without previous knowledge of RDF and related technologies. TypeSPARQ produces a ready-to-use starter template that provides end-to-end type-safe access to selected data in the target SPARQL endpoint. This significantly lowers the barrier to entry for new developers in the field of Semantic Web and Linked Data, allowing them to focus more on application logic and less on the complexities of RDF data handling. It not only simplifies the initial setup process but also accelerates the development cycle, making it more approachable and manageable for developers of all skill levels.

7.2.5. Schema Forge

Schema Forge⁶⁷ is a low-code, pattern-driven schema engineering tool designed to facilitate the interactive exploration and authoring of RDF-based ontologies. It supports the ingestion of ontologies serialized in formats such as Turtle and JSON-LD, enabling users to navigate classes, properties, and relationships through a dynamic graphical interface, including visualization of ontology structures.

To enable dynamic retrieval of ontology elements, Schema Forge integrates LDkit for querying Linked Data resources. Specifically, it utilizes LDkit’s querying capabilities to obtain class hierarchies and enumerate subclass relationships within the connected dataset, thereby ensuring type-safe and schema-aware interaction with RDF data.

While Schema Forge is currently in an early stage of development and lacks some features required for production-level deployment, it provides a functional foundation for ontology exploration and editing. The project appears to be under active development, with a publicly available roadmap indicating planned features and enhancements.

7.2.6. Linked Data Engine

The LDE⁶⁸ is a modular suite of Node.js libraries that are designed to support the development and execution of Linked Data applications and data processing pipelines.

The project encompasses a broad set of functionalities, including dataset modelling, downloading dataset distributions, and retrieving dataset descriptions from DCAT registries. It also facilitates the deployment of local SPARQL endpoints for testing purposes, as well as the import of data dumps into local SPARQL endpoints for querying.

LDE is currently under active development. A public roadmap outlines upcoming features, including a pipeline builder tool to query, transform and enrich Linked Data.

At present, LDE uses LDkit to query dataset descriptions from DCAT-AP 3.0 registries. The implementation includes a custom DCAT namespace and a dataset schema built upon this namespace, and it leverages LDkit’s filtering and pagination capabilities.

8. Future Work

While LDkit is a production-ready toolkit suitable for developing most common Linked Data-based applications, several aspects could be improved in future iterations.

First, LDkit is currently RDF graph-agnostic when reading and writing data. To enhance flexibility, we plan to introduce support for named graph constraints in LDkit resources, allowing users to specify particular graphs for their data. A key challenge in this implementation is determining how to handle resources that span multiple graphs. This raises the question of whether to enforce graph constraints at the resource level or at the property level.

Second, LDkit users would benefit from support for RDF containers and collections, enabling the retrieval of ordered or unordered lists of entities – a common requirement in application development. Since LDkit relies on SPARQL queries for data retrieval and updates, effective support for these RDF structures must first be implemented in SPARQL to ensure efficient integration into LDkit and similar tools. While SPARQL 1.1 introduced support for property paths, making it somewhat easier to work with ordered lists, it still leaves a lot to be desired.

Finally, to support the development of user-friendly applications, LDkit must be able to query data quickly. As previously discussed, more complex data schemas result in more complex SPARQL queries, leading to longer execution times within query engines. Addressing this challenge is nontrivial, as SPARQL performance issues stem from the graph-based data model, inefficient storage or indexing, and the computational cost of joins and path queries. Given these constraints, there are limits to the performance improvements that can be reasonably expected from SPARQL query engines in the future. Instead, performance optimizations could be implemented at the LDkit level by leveraging knowledge of the data schema. This could involve decomposing complex SPARQL queries into smaller queries, deferring their execution, or supporting the retrieval of partial results.

9. Conclusion

Web application developers face considerable challenges, primarily due to the vast array of technologies they need to master. The addition of Linked Data into the web development mix introduces yet another layer of complexity with its distinct set of technologies and standards. This exacerbates the learning curve and can detract from a positive development experience. Historically, the Semantic Web community has struggled to keep pace with the rapid evolution of application development standards, and there has been a notable scarcity of robust, production-ready frameworks or libraries that adequately support developers in integrating Semantic Web technologies smoothly. Consequently, despite the potential benefits of Semantic Web technologies, their adoption within the broader web development community remains limited.

LDkit 2.0 represents a major step forward in lowering the barrier to the adoption of Linked Data and Semantic Web technologies. LDkit is the result of a decade-long effort and experience of building front-end web applications that leverage Linked Data, and as such it is a successor to many different RDF abstractions that we have built along the way. It is specifically designed to simplify the complexities associated with querying RDF data, while still providing developers complete control over semantic data mapping through its schema-based approach. By abstracting the intricacies of SPARQL queries and RDF data handling, LDkit allows developers to focus on the logic of their applications rather than getting bogged down by the underlying data structure complexities. The framework offers a schema-driven approach that enables precise control over how data is mapped and manipulated within applications. This means developers can define custom data models that directly correspond to their application needs, ensuring that the integration of semantic data is both seamless and efficient.

In conclusion, we believe that LDkit is a valuable contribution to the Linked Data community, providing a comprehensive abstraction of Linked Data technologies. Throughout this paper, we have presented evidence to support this claim, demonstrating how LDkit addresses specific web development needs and its utility in real-world scenarios. We have illustrated its impact by presenting both quantitative and qualitative insights from existing projects that currently utilize LDkit, including those that further simplify the use of Semantic Web technologies for developers. Based on this evidence, we are confident that LDkit will foster further adoption of Linked Data in web applications.

Funding

The authors received the following financial support for the research, authorship and/or publication of this article: This research was supported by SVV project number 260 698.

Footnotes

Declaration of Conflicting Interests

Ruben Taelman is a postdoctoral fellow of the Research Foundation – Flanders (FWO) (1202124N). The remaining authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes
