Abstract
Knowledge Graphs (KGs) are composed of structured information about a particular domain in the form of entities and relations. In addition to this structured information, KGs facilitate interconnectivity and interoperability between different resources represented in the Linked Data Cloud. KGs have been used in a variety of applications such as entity linking, question answering, and recommender systems. However, KG applications suffer from high computational and storage costs. Hence, there arises the necessity for a representation able to map high dimensional KGs into low dimensional spaces, i.e., an embedding space, preserving structural as well as relational information. This paper conducts a survey of KG embedding models which consider not only the structured information contained in a KG in the form of entities and relations, but also its unstructured information represented as literals such as text, numerical values, images, etc. Along with a theoretical analysis and comparison of the methods proposed so far for generating KG embeddings with literals, an empirical evaluation of the different methods under identical settings has been performed for the general task of link prediction.
Introduction
Knowledge Graphs (KGs) have become quite crucial for storing structured information. There has been a surge of attention towards using KGs in various applications, mainly in the area of artificial intelligence. For instance, in a more general sense, KGs can be used to support decision making processes and to improve machine learning applications such as question answering [3], recommender systems [82], and relation extraction [74]. Some of the most popular publicly available general purpose KGs are DBpedia [40], Wikidata [68], and YAGO [47]. These general purpose KGs often consist of a huge number of facts constructed from millions of entities (represented as nodes) and relations (as edges connecting these nodes).
Although KGs are effective in representing structured data, some issues hinder their efficient manipulation: i) different KGs are usually based on different rigorous symbolic frameworks, which makes it hard to utilize their data in other applications [6], and ii) a significant number of important graph algorithms needed for the efficient manipulation and analysis of graphs have proven to be NP-complete [23]. In order to address these issues and use a KG more efficiently, it is beneficial to transform it into a low dimensional vector space while preserving its underlying semantics. To this end, various attempts have been made so far to learn vector representations (embeddings) for KGs.
As discussed in [69], a typical KG embedding approach, which uses only structured information from the KG, generally follows three steps: (i) determining the form of entity and relation representations, (ii) defining a scoring function, and (iii) learning entity and relation representations. In the first step, the forms in which entities and relations are represented in the vector space are determined. Entities can be represented as vectors or modeled as multivariate Gaussian distributions, whereas relations can be encoded as operations, matrices, tensors, multivariate Gaussian distributions, or mixtures of Gaussians. Once the forms of the entities and relations are determined, in the second step, a scoring function which measures the plausibility of a triple is defined. The main goal is to enable the model to assign higher scores to true triples and lower scores to false/negative/corrupted triples. To achieve this, the third step solves an optimization problem which maximizes the plausibility of true facts in order to learn the embeddings of entities and relations. Note that the method used to generate false/negative/corrupted triples has an impact on the performance of a model. The various negative triple generation methods and their differences are discussed in detail in [36].
Among the different embedding approaches proposed so far, TransE [5] is, to the best of our knowledge, the very first attempt to use a distance-based scoring function to learn KG embeddings. Given a triple (h, r, t), TransE models the relation r as a translation from the head entity h to the tail entity t in the vector space, i.e., it enforces h + r ≈ t, and scores the triple by the negative distance −∥h + r − t∥.
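To make this concrete, the following minimal sketch (in NumPy, with toy embeddings and hypothetical entity and relation names) computes the TransE score of a triple:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# Toy lookup tables; in practice these are learned parameters.
entities = {name: rng.normal(size=dim) for name in ["Berlin", "Germany"]}
relations = {name: rng.normal(size=dim) for name in ["capitalOf"]}

def transe_score(h: str, r: str, t: str, norm: int = 2) -> float:
    """Negative distance ||h + r - t||; higher means more plausible."""
    diff = entities[h] + relations[r] - entities[t]
    return -np.linalg.norm(diff, ord=norm)

print(transe_score("Berlin", "capitalOf", "Germany"))
```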
Literals can bring advantages to the process of learning KG embeddings in two major ways: they provide additional semantic information beyond what the structured triples capture, and they supply evidence for entities that are only sparsely connected in the graph.
In order to learn the embedding for such a sparsely connected entity, a model relying only on structured triples has very little information available.
Fig. 1. A small fraction of triples taken from the KG DBpedia [40].
Fig. 2. A small fraction of triples with literals taken from the KG DBpedia [40].

However, having the model trained with more triples containing literal values for these entities, as shown in Fig. 2, would improve the embeddings of the entities. For instance, entities sharing similar values for the same data relation can be drawn closer together in the vector space.
The above example indicates that using literals along with their respective entities adds more semantics, so that similar entities can be represented close to each other in the vector space while dissimilar ones lie further apart.
Recently, some approaches have been proposed which leverage the information present in literals to learn KG embeddings. The types of literals considered in these embedding methods are either text, numeric, images, or multi-modal literals, i.e., a combination of more than one medium of information. These methods use different techniques in order to incorporate the literals into the KG embeddings. However, data typed literals are not addressed either in these KG embedding models or in the surveys conducted on KG embeddings so far. The main challenge with data typed literals, such as date and time, is that they require additional semantics in order to be properly represented in KG embeddings.
KG embedding models and their categories
This survey analyses different embedding approaches which make use of literals and highlights their advantages and drawbacks in handling challenges such as multi-valued data properties/relations, data typed literals, and units of literals. A review of the applications used by the different KG embedding models for evaluation is also presented. Furthermore, experiments with some of the models have been conducted, specifically on the link prediction task. The contributions of this paper are summarized as follows:
A detailed analysis of the existing literal enriched KG embedding models and their approaches. In addition, the models are categorized into different classes based on the type of literals used.
An evaluation oriented comparison of the existing models on the link prediction task under the same experimental settings.
An indication of the research gaps in using literals for KG embeddings, which can open directions for further research.
The rest of this paper is organized as follows: Section 2 presents a brief overview of related work. In Section 3, the problem formulation, including definitions, preliminaries, types of literals, and research questions, is provided, while Section 4 analyses different KG embedding techniques with literals. Section 5 reviews the different tasks used to train or evaluate the embedding models. Section 6 discusses the experiments conducted with the existing KG embedding models with literals on the link prediction task. Finally, Section 7 presents concluding remarks summarizing our findings on KGs with literals, along with future directions.
This section describes the state-of-the-art algorithms proposed for generating KG embeddings. It also gives a brief overview of the surveys already published along these lines and what is lacking in those studies.
A brief overview of the most popular KG embedding techniques, including state-of-the-art approaches, is given in Table 1. The categories presented in this table are inspired by a previous survey [69] for the models without literals (column 1). These categories are created based on the methods used by the models, i.e., translation distance, semantic matching, entity types, relation paths, logical rules, temporal information, and graph structures. We have also categorized the techniques which use literals with respect to the same set of categories. Since a detailed discussion of these categories for the models without literals has already been presented in [69], the main focus of the current study lies on analysing the models which make use of literals. The standard KG embedding techniques which are extended by the models with literals are listed in Table 2.
KG embedding models with literals and their corresponding base models
Few attempts have been made to conduct surveys on the techniques and applications of KG embeddings [7,25,69]. The survey [25] covers factorization based, random walk based, and deep learning based network embedding approaches such as DeepWalk and Node2vec. [7,69] discuss only RESCAL [52] and KREAR [43] as methods which use attributes of entities for KG embeddings, and focus mostly on structure-based embedding methods, i.e., methods using non-attributive triples, such as the translation based embedding models listed in Table 1. However, RESCAL is a matrix factorization method for relational learning which encodes each object/data property as a slice of a tensor, which automatically increases the dimensionality of the tensor. This method therefore suffers from efficiency issues if literals are utilized while generating KG embeddings. Similarly, KREAR considers only those data properties which have categorical values, i.e., a fixed number of values, and ignores those which take arbitrary literals as values. One of the recent surveys [53] summarizes the methods proposed so far for refining KGs. However, this survey does not confine itself to embedding techniques and also does not consider most of the approaches making use of literals. Another very recent related study [57] discusses different aspects of KG embedding models, such as model architectures, training strategies, and hyperparameter optimization, but it takes into consideration only models without literals.
None of the surveys mentioned above include all the existing KG embedding models which make use of literals, such as the ones categorized as models incorporating information represented in literals in Table 1. To the best of our knowledge, this is the first attempt to analyse the algorithms proposed so far for generating KG embeddings using literals. In this paper, discussions on the type of literals, the embedding approaches, and the applications/tasks on which the embedding models are evaluated are given. A categorization of the models based on the type of literals they use is also provided.
This survey is an extension of an already published short survey [24]. The major differences between the two versions are that (i) this survey contains a much more detailed theoretical analysis of the KG embedding models with literals proposed so far, and (ii) it performs an empirical evaluation of the discussed models under the same experimental settings on the example task of link prediction.
This section briefly introduces the fundamentals of KGs and KG embeddings, followed by a formal definition of KG embeddings with literals. It also poses various research questions explaining why this study is a stepping stone for future development.
Preliminaries
Types of literals
Literals in a KG encode additional information which is not captured by the entities or relations. Different types of literals are present in KGs, such as text (e.g., names, labels, and descriptions), numeric values (e.g., height, population, and dates), and images.
Research questions
As can be seen from the above discussion, the information represented in KGs is diverse, and modelling it is a challenging task. The challenges targeted in this study are given as follows:
Knowledge graph embeddings with literals
This section investigates KG embedding models with literals, divided into the following categories based on the types of literals utilized: (i) text, (ii) numeric, (iii) image, and (iv) multi-modal. A KG embedding model which makes use of at least two types of literals providing complementary information is considered multi-modal. In the subsequent sections, a description of the models in each category, analyzing their similarities and differences, is provided, followed by a discussion of potential drawbacks.
Models with text literals
In this section, seven KG embedding models utilizing text literals are discussed, namely, Extended RESCAL [51], Jointly(desp) [84], DKRL [78], Jointly [80], SSP [76], KDCoE [9], and KGloVe with literals [10]. A detailed description followed by a summary comparing these models is given, along with their drawbacks. Moreover, in order to show the differences between the models in terms of complexity, the number of parameters of each model is presented in Table 3.
Complexity of the models with text literals in terms of the number of parameters. Θ is the number of parameters in the base model, H is the entity embedding size, N_d is the number of data relations, L is the number of attribute-value pairs, N_r is the number of relations, N_w is the number of words, D_w is the word embedding size, n1 is the dimension of input vectors at the first layer, n1′ is the dimension of output vectors at the first layer, K is the window size, n2 is the dimension of input vectors at the second layer, n2′ is the dimension of output vectors at the second layer, E1 and E2 denote the number of entities in two different languages of a multilingual KG, R1 and R2 denote the number of relations in two different languages of a multilingual KG, N is the total number of entities and relations, and M is the total number of entities, relations, and words
In contrast to the original algorithm, the extended RESCAL algorithm handles the attributive triples in a separate matrix. The factorization of this matrix is performed jointly with the tensor factorization of the non-attributive triples. The attributive triples containing only text literals are encoded in an entity-attribute matrix.
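As an illustration of this joint factorization, the following is a rough NumPy sketch of the objective; the plain squared-error formulation and all variable names are simplifying assumptions rather than the exact objective of [51]:

```python
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, n_attr, rank = 20, 4, 6, 5

# Binary tensor of relational triples and entity-attribute matrix.
X = rng.integers(0, 2, size=(n_rel, n_ent, n_ent)).astype(float)
D = rng.integers(0, 2, size=(n_ent, n_attr)).astype(float)

A = rng.normal(scale=0.1, size=(n_ent, rank))        # entity factors (shared)
R = rng.normal(scale=0.1, size=(n_rel, rank, rank))  # one matrix per relation
V = rng.normal(scale=0.1, size=(rank, n_attr))       # attribute factors

def joint_loss(lam: float = 0.01) -> float:
    """Squared reconstruction error of the tensor and the attribute
    matrix; the entity factor matrix A couples the two factorizations."""
    tensor_err = sum(
        np.sum((X[k] - A @ R[k] @ A.T) ** 2) for k in range(n_rel)
    )
    attr_err = np.sum((D - A @ V) ** 2)
    reg = lam * (np.sum(A**2) + np.sum(R**2) + np.sum(V**2))
    return tensor_err + attr_err + reg

print(joint_loss())
```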
Then, the loss function of the knowledge model is defined as follows:
The text model adopts the same assumption made in [72], namely that if two words occur in the same context, then there is a relation between them. Based on this assumption, the text model defines the probability of a pair of words co-occurring in the same context window.
By adopting the joint embedding framework in [72], the main loss of Jointly(desp) is defined as follows:
In order to learn structure-based representations, the TransE approach is directly applied, which considers the relation in a triple as a translation from the head entity to the tail entity. On the other hand, a Continuous Bag of Words (CBOW) and a deep Convolutional Neural Network (CNN) model are used to generate the description-based representations of the head and tail entities. In the case of CBOW, a short text is generated from the description based on keywords, and the corresponding word embeddings are summed up to generate the entity embedding. In the CNN model, after preprocessing the description, pretrained word vectors from Wikipedia are provided as input. This CNN model has five layers, and after every convolutional layer pooling is applied to decrease the parameter space of the CNN and to filter noise. Max-pooling is applied for the first pooling operation and mean pooling for the last one. The activation function used is either tanh or ReLU. The CNN model works better than CBOW because it preserves the order of words.
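The sketch below, assuming PyTorch, illustrates a description encoder in the spirit of DKRL's CNN; the two-block structure, layer sizes, and pooling choices are illustrative simplifications of the five-layer architecture described above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptionEncoder(nn.Module):
    """Encodes a sequence of word vectors into an entity embedding."""

    def __init__(self, word_dim: int = 100, ent_dim: int = 50):
        super().__init__()
        self.conv1 = nn.Conv1d(word_dim, ent_dim, kernel_size=2)
        self.conv2 = nn.Conv1d(ent_dim, ent_dim, kernel_size=2)

    def forward(self, words: torch.Tensor) -> torch.Tensor:
        # words: (batch, seq_len, word_dim) -> (batch, word_dim, seq_len)
        x = words.transpose(1, 2)
        x = torch.tanh(self.conv1(x))
        x = F.max_pool1d(x, kernel_size=2)  # max-pooling after the first block
        x = torch.tanh(self.conv2(x))
        return x.mean(dim=2)                # mean pooling after the last block

encoder = DescriptionEncoder()
desc = torch.randn(8, 40, 100)  # a batch of 8 descriptions, 40 tokens each
print(encoder(desc).shape)      # torch.Size([8, 50])
```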
In order to train DKRL, the following margin-based ranking loss is considered as the objective function and minimized using standard back propagation with stochastic gradient descent (SGD): L = Σ_(h,r,t) Σ_(h′,r′,t′) max(γ + d(h + r, t) − d(h′ + r′, t′), 0), where γ is the margin, d is the distance function, (h′, r′, t′) is a corrupted triple, and the head and tail representations may be either structure-based or description-based.
For an entity, Jointly learns both a structure-based and a text-based representation and combines them.
The entity descriptions are encoded using either bag-of-words, LSTM, or Attentive LSTM (ALSTM) encoders in order to generate text-based representations for the corresponding entities. On the other hand, to better model the structure-based embeddings, entities and relations can be pre-trained with any existing KG embedding model, such as TransE.
Jointly’s score function is inspired by TransE and defined as follows:
SSP applies the following scoring function:
SSP provides two different settings for training, which are referred to as Std and Joint.
The first co-occurrence matrix is created by applying the original KGloVe approach to the KG itself.
In order to create the second matrix, the Named Entity Recognition (NER) task is performed on the entity description text using the list of entities and predicates of the KG as input. The NER step employs a simple exact string matching technique, which leads to drawbacks such as missing entities due to different keywords having the same semantics. All English words that do not match any entity label are added to the entity-predicate list. Then, GloVe co-occurrence counting is applied to the modified text (i.e., DBpedia abstracts and comments) using the entity-predicate and word list as input. Finally, the two co-occurrence matrices are summed up to create a single unified matrix. The proposed approach has been evaluated on classification and regression tasks, and the results indicate that for most of the classifiers used, except SVM, the approach does not bring significant improvement over KGloVe. However, the approach could potentially be improved by extensive parameter tuning.
In this section, an analysis of the KG embedding models which use numeric literals, namely MT-KGNN [63], KBLRN [22], LiteralE [37], and TransEA [75], is presented, followed by a summary. Moreover, in order to show the differences between the models in terms of complexity, the number of parameters of each model is presented in Table 4.
Complexity of the models with numerical literals in terms of the number of parameters. Θ is the number of parameters in the base model, H is the entity embedding size, N_d is the number of data relations, Λ is the size of the hidden layer in the AttrNet networks of MT-KGNN, N_r is the number of relations, and M is the attribute embedding size
In RelNet, a concatenated triple is passed through a nonlinear transformation, followed by a linear transformation to which a sigmoid function is applied:
The overall loss of the AttrNet is computed by adding the MSE of the head AttrNet and that of the tail AttrNet as follows:
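A condensed sketch of such an attribute regression network is given below, assuming PyTorch; a single shared network is shown instead of separate head and tail networks, and all names and sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttrNet(nn.Module):
    """Regresses a normalized numeric literal value from the concatenation
    of an entity embedding and a data-relation (attribute) embedding."""

    def __init__(self, ent_dim: int = 50, rel_dim: int = 50, hidden: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ent_dim + rel_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),
            nn.Sigmoid(),  # literal values are assumed normalized to [0, 1]
        )

    def forward(self, ent: torch.Tensor, rel: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([ent, rel], dim=-1)).squeeze(-1)

attr_net = AttrNet()
ent, rel = torch.randn(4, 50), torch.randn(4, 50)
values = torch.rand(4)                         # ground-truth literal values
loss = F.mse_loss(attr_net(ent, rel), values)  # MSE, as in the AttrNet loss
print(loss.item())
```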
For the attribute embedding, it uses all attributive triples containing numeric values as input and applies a linear regression model to learn embeddings of entities and attributes. Given an attributive triple (e, a, v), where v is a numeric value, the attribute a is represented by a vector a and a bias b_a, and the loss penalizes the deviation of the predicted value a · e + b_a from the actual value v.
The main loss function for TransEA is a weighted combination of the two component losses, L = (1 − α) · L_R + α · L_A, where L_R is the translation-based loss over relational triples, L_A is the attribute loss over attributive triples, and α is a hyperparameter balancing the two.
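A minimal NumPy sketch of this combined objective is shown below; the toy entities, the normalized literal value, and the linear-regression attribute score follow the assumptions stated above:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, alpha = 50, 0.4

ent = {e: rng.normal(size=dim) for e in ["Berlin", "Germany"]}
rel = {"capitalOf": rng.normal(size=dim)}
attr_vec = {"population": rng.normal(size=dim)}
attr_bias = {"population": 0.0}

def struct_loss(h, r, t):
    """TransE part: distance ||h + r - t||."""
    return np.linalg.norm(ent[h] + rel[r] - ent[t])

def attr_loss(e, a, v):
    """Linear-regression part: |a . e + b_a - v| for a numeric literal v."""
    pred = attr_vec[a] @ ent[e] + attr_bias[a]
    return abs(pred - v)

total = (1 - alpha) * struct_loss("Berlin", "capitalOf", "Germany") \
        + alpha * attr_loss("Berlin", "population", 0.37)  # normalized value
print(total)
```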
For example, given two attributive triples stating a weight of 110.0 in kilograms and a weight of 110.0 in pounds, the literal value "110.0" from the first triple and the literal value "110.0" from the second triple would be considered exactly the same if the semantics of the types kilogram and pound are ignored. Moreover, most of the models do not have a proper mechanism to handle multi-valued literals.
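One straightforward mitigation, sketched below under the assumption that a datatype IRI is available for each literal, is to normalize values to a common unit before they enter the embedding model; the conversion table is a hypothetical example:

```python
# Hypothetical unit-normalization step applied before embedding training:
# convert all mass literals to kilograms so that "110.0 kg" and
# "110.0 lb" are no longer treated as identical values.
TO_KILOGRAM = {
    "http://dbpedia.org/datatype/kilogram": 1.0,
    "http://dbpedia.org/datatype/pound": 0.45359237,
}

def normalize_mass(value: float, datatype_iri: str) -> float:
    return value * TO_KILOGRAM[datatype_iri]

print(normalize_mass(110.0, "http://dbpedia.org/datatype/kilogram"))  # 110.0
print(normalize_mass(110.0, "http://dbpedia.org/datatype/pound"))     # ~49.9
```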
Regarding model complexity, the number of parameters used in each model is presented in Table 4. It can be noted that the complexity of the models depends on the size of the dataset and that TransEA has lower complexity compared to the other models.
In this section, the KG embedding models utilizing images of entities, namely IKRL [77] and MTKGRL [58], are discussed. First, a detailed analysis of the models is presented, followed by a summary. Moreover, in order to show the differences between the models in terms of complexity, the number of parameters of each model is presented in Table 5.
Complexity of the models with image literals in terms of the number of parameters. Θ is the number of parameters in the base model, H is the entity embedding size, D_i represents the dimension of image features, Θ_A is the number of parameters in AlexNet [38], N_e represents the number of entities, and N_i is the number of images
For the image-based representations, each image of an entity is encoded into a feature vector and projected into the entity embedding space.
Attention-based multi-instance learning is used to integrate the representations learned for each image instance by automatically calculating the attention that should be given to each instance. The attention for the i-th image is computed from the similarity between that image's representation and the structure-based representation of the entity, normalized with a softmax over all images of the entity.
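A compact sketch of this aggregation step is given below, assuming NumPy; the dot-product compatibility score and softmax weighting are illustrative choices:

```python
import numpy as np

def aggregate_images(image_embs: np.ndarray, entity_emb: np.ndarray) -> np.ndarray:
    """Combine several image embeddings of one entity into a single
    image-based representation, weighting each image by softmax attention."""
    scores = image_embs @ entity_emb              # (n_images,)
    scores = scores - scores.max()                # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()  # softmax weights
    return attn @ image_embs                      # weighted sum

rng = np.random.default_rng(0)
images = rng.normal(size=(5, 50))  # 5 images, 50-dim projected features
entity = rng.normal(size=50)       # structure-based entity embedding
print(aggregate_images(images, entity).shape)  # (50,)
```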
Given a triple, the overall energy function is defined by combining four energy functions (i.e., E_SS, E_SI, E_IS, and E_II, which pair the structure-based (S) and image-based (I) representations of the head and tail entities).
Given the energy function, the model is trained by minimizing a margin-based ranking loss over true and corrupted triples, as in TransE.
MTKGRL defines an energy function for each kind of representation and also for their combinations, i.e., structural energy, multimodal energies, and structural-multimodal energies. The structural energy is adopted from TransE and is defined as E_S = ∥h_s + r − t_s∥, where h_s and t_s are the structure-based embeddings of the head and tail entities.
The multimodal energy function under the translational assumption is given as E_M = ∥h_m + r − t_m∥, where h_m and t_m are the multimodal representations of the head and tail entities.
The overall energy function, shown in Equation (43), is defined by combining the aforementioned energy functions, i.e., the structural, multimodal, and structural-multimodal energies.
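The following NumPy sketch illustrates this family of energies under the translational assumption; the four combinations mirror the description above, and all vectors are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

h_s, t_s = rng.normal(size=dim), rng.normal(size=dim)  # structural embeddings
h_m, t_m = rng.normal(size=dim), rng.normal(size=dim)  # multimodal embeddings
r = rng.normal(size=dim)                               # relation embedding

def energy(h, t):
    """Translational energy ||h + r - t||; lower means more plausible."""
    return np.linalg.norm(h + r - t)

total_energy = (
    energy(h_s, t_s)    # structural
    + energy(h_m, t_m)  # multimodal
    + energy(h_s, t_m)  # structural head, multimodal tail
    + energy(h_m, t_s)  # multimodal head, structural tail
)
print(total_energy)
```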
This section presents an analysis of the embedding models making use of at least two types of literals providing complementary information. First, the category with numeric and text literals is discussed, followed by the category with numeric, text, and image literals. Moreover, in order to show the differences between the models in terms of complexity, the number of parameters of each model is presented in Table 6.
Complexity of the models with multimodal literals in terms of the number of parameters. Θ is the number of parameters in the base model, H is the entity embedding size, N_d is the number of data relations, N_c is the number of characters, N_i is the number of images, Θ_CNN is the number of parameters in the CNN model used in [19], Θ_ARAE is the number of parameters in ARAE [83], where instead of using the random noise vector z the generator is conditioned on the entity embeddings, and Θ_GAN denotes the sum of the number of parameters in BE-GAN [1] and in pix2pix-GAN [30]
On the other hand, the attribute character embedding is designed to learn embeddings for entities from the strings occurring in the attributes associated with the entities. The purpose is to enable the entity embeddings from two KGs to fall into the same vector space despite the fact that the attributes come from different KGs. The attribute character embedding is inspired by the concept of translation in TransE: given an attributive triple, the attribute relation is interpreted as a translation from the entity to the attribute value, where the value is encoded by a compositional function over its characters.
The following objective function is defined for the attribute character embedding:
The attribute character embedding supports different compositional functions to encode an attribute value from its characters, such as a simple sum of the character embeddings, an LSTM-based encoding, or an N-gram-based function.
All three objective functions are summed up into an overall objective function.
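A short sketch of such a character-level compositional encoding is shown below, assuming NumPy and the simple sum-of-character-embeddings variant; the vocabulary and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50

# One embedding per character; in practice these are learned.
char_emb = {c: rng.normal(size=dim)
            for c in "abcdefghijklmnopqrstuvwxyz0123456789. "}

def encode_value(value: str) -> np.ndarray:
    """Compositional encoding of an attribute value as the sum of its
    character embeddings (the SUM variant; LSTM/N-gram are alternatives)."""
    return np.sum([char_emb[c] for c in value.lower()], axis=0)

# Similar strings yield nearby vectors, placing entities from two KGs
# with similar attribute values close together in the vector space.
v1, v2 = encode_value("50.9989"), encode_value("50.9988")
print(np.linalg.norm(v1 - v2) < np.linalg.norm(v1 - encode_value("berlin")))
```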
Models with numeric, text, and image literals
The binary cross-entropy loss, as defined below, is used to train the model:
Moreover, using these learned embeddings and different neural decoders, a novel multimodal imputation model is introduced to generate missing multimodal values, such as numerical data, categorical data, text, and images, from information in the knowledge base. In order to predict missing numerical and categorical data such as dates, gender, and occupation, a simple feed-forward network on the entity embedding is used. For text, an adversarially regularized autoencoder (ARAE) is used to train generators that decode text from continuous codes, with the generator conditioned on the entity embeddings instead of a random noise vector. Similarly, a combination of the BE-GAN structure with the pix2pix-GAN model is used to generate images, again conditioning the generator on the entity embeddings.
Applications
This section discusses different applications of KG embeddings on which the previously described methods have been trained and/or evaluated.
Link prediction is one of the most common tasks used for evaluating the performance of KG embeddings. Head prediction, tail prediction, and relation prediction are its sub-tasks. Head prediction aims at identifying a missing head entity where the relation and tail entity are given, and analogously for tail prediction and relation prediction. Most of the models discussed in Section 4 have been evaluated on some or all of these prediction tasks. Head and tail prediction are used to evaluate the models LiteralE [37], TransEA [75], KBLRN [22], KDCoE [9], EAKGAE [67], IKRL [77], MKBE [55], MTKGRL [58], Jointly(Desp) [84], Jointly [80], and SSP [76]. On the other hand, DKRL [78] has been evaluated on all kinds of link prediction tasks: head, tail, and relation prediction. In Extended RESCAL [51], two kinds of link prediction experiments have been conducted on YAGO 2 [61], including tail prediction with a fixed relation type.
Nearest Neighbor Analysis is the task of detecting the nearest neighbors of given entities in the latent space learned by an embedding model. This task has been performed in LiteralE [37] to compare the neighborhoods produced by the literal-enriched model with those of its base model.
Experiments on link prediction
This section provides an empirical evaluation of the methods discussed in the previous section under a unified experimental setting and discusses the results based on the performance of the approaches applied to the task of link prediction. In this work, link prediction is chosen because most of the KG embedding models with literals are trained and evaluated on it. One major issue encountered while conducting these experiments is that the source code of some of these models is not openly available and the models are not easily reproducible; such methods were excluded from the experimentation. In the subsequent sections, the datasets and the experiments with text, numeric, image, and multi-modal literals are presented.
Based on the results of the experiments, a clear comparison is presented between the models with literals on link prediction. In addition, these models are also compared with the standard KG embedding approaches that they extend. Note that these models may inherit problems that already exist in their corresponding base models (the standard KG embedding models that they extend). For instance, the models that extend DistMult, such as LiteralE with DistMult as its base, inherit DistMult's inability to model asymmetric relations.
Datasets
The performance of the aforementioned models was measured using two of the most commonly used datasets for link prediction, i.e., FB15K [5] and FB15K-237 [64]. FB15K is a subset of Freebase [2] which mostly contains triples describing facts about movies, actors, awards, sports, and sport teams. It is randomly split into training, validation, and test sets. The issue with this dataset is that the test set contains a large number of triples obtained by simply inverting triples in the training set. This enables a simple embedding model which is symmetric with respect to the head and tail entities to obtain excellent performance. In order to avoid this, the dataset FB15K-237 was created by removing inverse relations from FB15K. The statistics of these datasets are given in Table 7.
The number of entities, object relations, data relations, relational triples, train sets, valid sets, and test sets of the FB15K and the FB15K-237 datasets
As discussed in Section 4.1, the embedding models Extended RESCAL, DKRL, KDCoE, and KGloVe with literals utilize text literals. However, all of these models except DKRL are excluded from the experimentation due to the following issues:
The implementation of the model KGloVe with literals is not publicly available and it is not easily reproducible.
KDCoE is designed specifically for the cross-lingual entity alignment task, which makes it difficult to apply to link prediction.
In the case of Extended RESCAL, the method is computationally expensive in practice and is thus not considered a feasible embedding model for incorporating literals.
Moreover, none of the models with literals discussed in this paper consider Extended RESCAL in their experiments.
In order to conduct experiments with text literals, the English descriptions of the 15,239 entities common to both FB15K and FB15K-237 (see Table 7) are taken from LiteralE [37]. The focus lies on the common entity descriptions, i.e., no descriptions are used for entities existing in FB15K but not in FB15K-237, because experiments using the full set of entity descriptions for FB15K have already been reported in the original paper. This makes it possible to analyse the effect of the size of the dataset (the entity descriptions) on the performance of the embedding models. The average number of words (tokens) in the descriptions is 143, whereas the maximum and minimum are 804 and 2, respectively.
Experimental results using the DKRL model on the FB15K and FB15K-237 datasets
The results of link prediction on the FB15K and FB15K-237 datasets are shown in Table 8 for the models TransE and DKRL, the latter trained with negative sampling based on either the Bernoulli or the uniform distribution.
Note that in the original paper, the result of DKRL on FB15K is slightly better than that of TransE. However, in our experiments, as the results in Table 8 indicate, TransE achieves better results on the FB15K dataset than both versions of DKRL on all metrics except MRR and MR. The reason for this is that, as mentioned above, the set of entity descriptions used in our experiments is common to both FB15K and FB15K-237, i.e., fewer entity descriptions are used in our experiment than in the original paper for FB15K. This indicates that the size of the dataset (the entity descriptions) has an impact on the performance of the model. On the other hand, on the dataset FB15K-237, TransE is outperformed by the DKRL variants.
Furthermore, the results allow the two DKRL variants to be compared, i.e., the model trained with the Bernoulli negative sampling strategy against the one trained with the uniform strategy.
MT-KGNN, KBLRN, LiteralE, and TransEA are the KG embedding models which make use of numeric literals (see Section 4.2). KBLN, the submodel of KBLRN which excludes the relational information provided by graph feature methods, is used in the experiments instead of the main model KBLRN. This is because KBLN is directly comparable with the other three models (i.e., MT-KGNN, LiteralE, and TransEA), whereas KBLRN is not. The code2
Moreover, the model LiteralE has different varieties depending on the base model and the transformation function used. As discussed in Section 4, LiteralE provides two transformation functions: a simple linear transformation and a nonlinear, GRU-inspired gated function.
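A sketch of the gated variant is given below, assuming PyTorch; the gating structure follows the GRU-inspired function described above, and the dimensions are illustrative:

```python
import torch
import torch.nn as nn

class GatedLiteralE(nn.Module):
    """Enriches an entity embedding e with a numeric-literal vector l:
    g(e, l) = z * h + (1 - z) * e, with gate z and candidate h."""

    def __init__(self, ent_dim: int = 50, lit_dim: int = 10):
        super().__init__()
        self.gate = nn.Linear(ent_dim + lit_dim, ent_dim)
        self.cand = nn.Linear(ent_dim + lit_dim, ent_dim)

    def forward(self, e: torch.Tensor, l: torch.Tensor) -> torch.Tensor:
        x = torch.cat([e, l], dim=-1)
        z = torch.sigmoid(self.gate(x))  # how much literal info to let in
        h = torch.tanh(self.cand(x))     # literal-aware candidate embedding
        return z * h + (1 - z) * e

g = GatedLiteralE()
e = torch.randn(4, 50)  # entity embeddings
l = torch.rand(4, 10)   # one normalized value per data relation
print(g(e, l).shape)    # torch.Size([4, 50]); fed into the base scorer
```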
Runtime of the models considered in the experiments with numeric literals. The results are per single iteration and reported in milliseconds
Another possible analysis is to compare the results of the standard models presented in Table 11 with the results of their extensions shown in the 'both head and tail prediction' part of Table 10. For instance,
On the other hand, referring to the overall results on the FB15K-237 dataset as shown in Table 12, the model
Link prediction results on FB15K dataset using filtered setting
Link prediction results with models without literals on FB15K using filtered setting
Link prediction results on FB15K-237 dataset using filtered setting
Link prediction results with models without literals on FB15K-237 dataset using filtered setting
Note that it is not possible to compare MKBE [55] as a whole with any other model, as it is the only embedding model which utilizes three types of literals together: text, numeric, and images. Therefore, its submodel
MRR results on link prediction task on YAGO-10 taken from MKBE [55]
Link prediction results on FB15K and FB15K-237 datasets using filtered setting
As discussed in Section 4, the existing multi-modal embeddings are categorized into two types: i) models with text, numeric, and image literals and ii) models with text and numeric literals. However, since MKBE is the only model in the first category, only its submodel
The experimental results obtained on the datasets FB15K and FB15K-237 are shown in Table 15. As the results indicate, combining text and numeric literals on the FB15K dataset with
Discussion and conclusion
Given the recent massive attention towards the use of KGs in various applications, different KG embedding techniques have been proposed to enable efficient use of KGs. In some of these techniques, an attempt has been made to utilize the information represented in the literals present in KGs for a better quality embedding of the elements of the KGs, i.e., entities and relations. In this paper, a comprehensive survey of those KG embedding models with literals has been presented. The survey provides a detailed analysis and categorization of these models based on their methodology, along with their application scenarios and limitations. Moreover, various experiments on the link prediction task have been conducted with these models so as to compare their performance. [62] is a very recent work, not included in this survey, which re-evaluates KG completion models.
In this paper, two major research questions are formulated and presented in Section 3. The answers to these questions are given as follows:
In order to use both data sources, i.e., triples with object relations and triples with data relations (attributive triples), together for representation learning, the models discussed in this paper broadly follow one of two techniques:
Handling literals separately: defining one task per data source, as in TransEA, or using a separate encoder for literals, as in DKRL. The two tasks are trained simultaneously so that, for every entity, the information available in both data sources is used to learn its embedding. Depending on how the model works, the embeddings learned from each data source may or may not be unified in the vector space. For instance, Jointly(Desp) learns a unified representation for each entity, whereas DKRL generates two representations per entity and does not force them to be unified.
Incorporating literals directly into entity embeddings: as in LiteralE, a latent feature method can be extended by directly enriching the entity embeddings with information from literals via a learnable function, while keeping the scoring function of the base method.
Similarly, the following are possible ways to combine different kinds of literals, i.e., text, numeric, etc., for representation learning:
Encoding each type of literal separately: in order to capture the semantics of literals, different encoders can be used for different types of literals, for example, a CNN for textual descriptions. Then, as shown in MKBE, each attributive triple can be treated the same as a structured triple and a single scoring function can be used for training.
Incorporating the information present in every kind of literal directly into the entity embedding: as in LiteralE, for a given entity, the literals associated with it are first encoded as vectors, using one vector per type of literal. Then, a mapping function maps all these vectors (including the structure-based vector representation of the entity) into a single vector.
As mentioned in Section 4 and seen from the results of the experiments in Section 6, these embedding models have different drawbacks, such as:
The effect that data types/units have on the semantics of literals has not been considered by any of the models. Most of the embedding models which make use of numerical literals, such as LiteralE, TransEA, MT-KGNN, and KBLN, consider only the year part of date typed literals and ignore the month and day values. This hinders the ability to properly capture the information represented in such literals.
Most of the models do not have a proper mechanism to handle multi-valued literals.
The performance of most of the models depends on the dataset used for training and testing, which shows that these models are not robust.
Not all the models are effective in combining different types of literals.
Only a few approaches have been proposed for multi-modal KG embeddings, and none of them take into consideration literals with URIs pointing to items such as audio, video, or PDF files.
The above described shortcomings of the existing models clearly indicate that a thorough investigation is needed on how to address the different types of literals, which carry different inherent semantics. For instance, a possible perspective arising from this detailed analysis is the need to properly handle data typed literals, such as the values of date-typed data relations.
One cannot expect that, by leaving out information available in the original KG, its latent representation, being only an approximation of the original KG, will perform equally well on tasks that depend on its semantic information content. Overall, the inclusion of datatyped literals with a proper representation of their semantics into the representation learning process will increase a model's semantic content and might thereby lead to quality improvements.
Summary of applications
Summary of different applications on which the KG embedding techniques with literals, in their original papers, have been trained and/or evaluated. The applications covered are: link prediction, triple classification, entity classification, entity alignment, attribute value prediction, nearest neighbour analysis, data linking, document classification, and relational fact extraction. The models covered are: Extended RESCAL, LiteralE, TransEA, KBLRN, DKRL, KDCoE, KGloVe with literals, IKRL, EAKGE, MKBE, MT-KGNN, LiteralE with blocking, Jointly(Desp), Jointly, SSP, and MTKGRL.
