Indexing and querying algorithm based on structure indexing for managing massive-scale RDF data

Abstract

Resource description framework (RDF) data management systems (triplestores), which store, index, and process RDF data sets, have been a core research interest in the semantic web area. Even though the value-based approach, which constructs triplestores with RDBMS support, is highly successful in practice, it falls short in storing massive-scale RDF data sets and in improving query processing performance. In this paper, we propose a novel indexing approach for massive-scale RDF data management. Our indexing approach builds collections of vertices, predicates, and graph/relation information in a form optimized for both SPARQL queries and keyword-based queries. We also propose a grouping and flagging algorithm to optimize the stored data. A triplestore based on our approach can support high scalability with a distributed repository, as well as high-performance query processing. Our experimental results show that the proposed indexing approach performs considerably better in query processing time than conventional approaches, and that the performance benefits are even greater when the query is complex (more constraints or with FILTER).

Keywords

Resource description framework, indexing, querying, distributed storage, keyword search
