Abstract
In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are applied to the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply ConvKB to a search personalization problem, which aims to tailor the search results to each specific user based on the user’s personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple (query, user, document) on which ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performance than the standard ranker as well as strong search personalization baselines.
Introduction
Large-scale knowledge bases (KBs), such as YAGO [41], Freebase [3] and DBpedia [25], are usually databases of triples representing the relationships between entities in the form of fact(head entity, relation, tail entity) denoted as
Many embedding models have been proposed to learn vector or matrix representations for entities and relations, obtaining state-of-the-art (SOTA) link prediction results [35]. In these embedding models, valid triples obtain lower implausibility scores than invalid triples. Let us take the well-known embedding model TransE [4] as an example. In TransE, entities and relations are represented by k-dimensional vector embeddings. TransE employs a transitional characteristic to model relationships between entities, in which it assumes that if
Recently, convolutional neural networks (CNNs), originally designed for computer vision [24], have received significant research attention in natural language processing [9,22]. A CNN learns non-linear features to capture complex relationships with remarkably fewer parameters than fully connected neural networks. Inspired by the success in computer vision, Dettmers et al. [10] proposed ConvE, the first model applying a CNN to KB completion. In ConvE, only
The score functions in previous models and in our ConvKB model. $\|\boldsymbol{v}\|_p$ denotes the p-norm of $\boldsymbol{v}$. $\langle \boldsymbol{v}_h, \boldsymbol{v}_r, \boldsymbol{v}_t \rangle = \sum_i v_{h_i} v_{r_i} v_{t_i}$ denotes a tri-linear dot product. $\mathrm{Re}(\boldsymbol{c})$ and $\mathrm{Im}(\boldsymbol{c})$ denote the real and imaginary parts of the complex-valued vector $\boldsymbol{c}$, and ı denotes the square root of −1. In addition, g denotes a non-linear function. ∗ denotes a convolution operator. · denotes a dot product. $\mathsf{concat}$ denotes a concatenation operator. $\widehat{\boldsymbol{v}}$ denotes a 2D reshaping of $\boldsymbol{v}$. Ω denotes a set of filters.
In this paper, we present ConvKB, a novel embedding model that makes novel use of a CNN for the KB completion task. In ConvKB, each entity or relation is associated with a unique k-dimensional embedding. Let
Our contributions in this paper are as follows:
We introduce ConvKB – a novel embedding model of entities and relationships for knowledge base completion. ConvKB models the relationships among same dimensional entries of the embeddings. This implies that ConvKB generalizes transitional characteristics in transition-based embedding models.
We evaluate ConvKB on two benchmark datasets WN18RR [10] and FB15k-237 [45], and show that ConvKB obtains better link prediction performance than previous SOTA embedding models. In particular, ConvKB obtains the best mean rank and the highest Hits@10 on WN18RR, and produces the highest mean reciprocal rank and highest Hits@10 on FB15k-237.
We also evaluate ConvKB for triple classification on two benchmark datasets WN11 and FB13 [40]. The goal is to classify whether a given triple is valid or not. ConvKB also does better than previous SOTA models, obtaining the best and second best accuracies on WN11 and FB13, respectively.
We adapt ConvKB to search personalization, where the search results for a query from a user are driven toward the personal needs of that user by exploiting historical interactions (e.g., submitted queries, clicked documents) between the user and the system [1,18,42,44,56]. Our general aim is to re-rank the documents returned by the search system so that the more relevant documents are ranked higher. More specifically, we train ConvKB to compute a score for each triple (query, user, document), and to assign higher plausibility scores to more relevant documents. We then verify this application of ConvKB on the query logs of a commercial web search engine. Experimental results show that ConvKB significantly improves the ranking quality over the strong baselines.
The paper is organized as follows. We provide related work in Section 2. We then describe our proposed model ConvKB in Section 3. We evaluate and compare ConvKB with previous models on the link prediction and triple classification tasks in Section 4. The application of ConvKB to search personalization is introduced in Section 5. The conclusion is finally presented in Section 6.
TransH [54] extends TransE to allow entities to play different roles in different relations. Each relation r is associated with a relation-specific hyperplane
TransR [27] extends TransH to perform projections where each relation r is associated with a projection matrix
DISTMULT [62] and ComplEx [48] use a tri-linear dot product to compute the score for each triple. See formal definitions of DISTMULT and ComplEx in Table 1. In addition, NTN [40] integrates a bilinear tensor operator into a neural network to compute the triple score. Recent approaches also show that relation paths between entities in the KBs provide contextual information that improves KB completion performance [16,26,33,47]. For example, PTransE-ADD [26] and TransE-

Process involved in the proposed ConvKB (with
A knowledge base
We denote the dimensionality of embeddings by k such that each embedding triple (
Our ConvKB uses different filters
Formally, we define the ConvKB score function f as follows:
If we only use one filter
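The scoring pipeline described above (a 3-column matrix, a convolution layer with several filters, concatenated feature maps, and a dot product with a weight vector) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `convkb_score` and `transe_l1`, the 1×3 filter shape, the optional per-filter bias, and the ReLU default are all choices made for this sketch. The last lines illustrate one way such a model can generalize a translation-based score: with the single illustrative filter (1, 1, −1), the absolute value as the non-linearity, and an all-ones weight vector, the score equals the L1 norm of h + r − t.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def relu(x: float) -> float:
    """Rectified linear unit, used here as an assumed default non-linearity."""
    return x if x > 0.0 else 0.0


def convkb_score(
    v_h: Vector,
    v_r: Vector,
    v_t: Vector,
    filters: Sequence[Sequence[float]],  # assumption: each filter has shape 1x3
    w: Vector,                           # weight vector of length len(filters) * k
    g: Callable[[float], float] = relu,
    biases: Sequence[float] = (),        # assumption: optional, one bias per filter
) -> float:
    """Score a triple from its three k-dimensional embeddings."""
    k = len(v_h)
    if len(v_r) != k or len(v_t) != k:
        raise ValueError("all three embeddings must have the same dimension")
    if len(w) != len(filters) * k:
        raise ValueError("w must have one entry per feature-map entry")

    features: List[float] = []
    for idx, (a, b, c) in enumerate(filters):
        bias = biases[idx] if biases else 0.0
        # Slide the 1x3 filter over the k rows of the k x 3 matrix [v_h, v_r, v_t];
        # each row holds the same-dimensional entries of the three embeddings.
        for h_i, r_i, t_i in zip(v_h, v_r, v_t):
            features.append(g(a * h_i + b * r_i + c * t_i + bias))

    # `features` is the concatenation of all feature maps; the score is its
    # dot product with the weight vector.
    return sum(f * w_i for f, w_i in zip(features, w))


def transe_l1(v_h: Vector, v_r: Vector, v_t: Vector) -> float:
    """L1 translation score ||h + r - t||_1, shown for comparison."""
    return sum(abs(h + r - t) for h, r, t in zip(v_h, v_r, v_t))


# Toy embeddings (hypothetical values chosen to be exact in floating point).
h = [0.5, -1.0, 2.0]
r = [1.0, 0.25, -0.5]
t = [1.0, 0.5, 1.0]

# One filter (1, 1, -1), g = |x|, and w = all ones reproduce the L1 score.
single = convkb_score(h, r, t, filters=[(1.0, 1.0, -1.0)], w=[1.0] * 3, g=abs)
print(single, transe_l1(h, r, t))
```

With several learned filters and a learned weight vector, the same function mixes the three same-dimensional entries in more than one way, which is the sense in which the sketch extends the single-filter special case.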

Parameter optimization for ConvKB in the KB completion
We use the Adam optimizer [23] to train ConvKB by minimizing the loss function
We evaluate ConvKB on two KB completion tasks: the link prediction task [4] and the triple classification task [40]. We use benchmark datasets WN18RR [10] and FB15k-237 [45] for link prediction, and datasets FB13 and WN11 [40] for triple classification. WN18RR and FB15k-237 are subsets of two common datasets WN18 and FB15k [4], respectively. As noted by Toutanova and Chen [45], WN18 and FB15k are easy because they contain many reversible relations. Knowing that relations are reversible makes it easy to predict the majority of test triples; e.g., state-of-the-art results on both WN18 and FB15k are obtained by using a simple reversal rule, as shown in Dettmers et al. [10]. WN18RR and FB15k-237 were therefore created to avoid this reversible relation problem, which makes the knowledge base completion task on them more realistic. It is also worth noting that when constructing datasets FB13 and WN11, Socher et al. [40] filtered out triples from the test set if either or both of their head and tail entities also appear in the training set in a different relation type or order. Table 2 gives statistics of the experimental datasets.
Statistics of the experimental datasets. In both WN11 and FB13, each validation and test set also contains the same number of incorrect triples as correct triples. Note that the FB13 test set is filtered to contain only 7 of the 13 relations appearing in the FB13 training set.
Task description
In the KB completion or link prediction task [4], the purpose is to predict a missing entity given a relation and another entity, i.e., inferring h given
We employ three common evaluation metrics: mean rank (MR), mean reciprocal rank (MRR), and Hits@10 (i.e., the proportion of the valid test triples ranking in top 10 predictions). Lower MR, higher MRR or higher Hits@10 indicate better performance. We report the final scores on the test set for the model obtaining the highest Hits@10 score on the validation set.1
Some previous works also reported Hits@1. However, the formulas of MRR and Hits@1 show a strong correlation between these two scores. So using Hits@1 does not really reveal any additional information for this task.
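The three metrics can be computed directly from the rank that each valid test triple receives among its candidates. The sketch below assumes 1-based ranks have already been obtained; the helper names and the toy rank list are illustrative only.

```python
from typing import Sequence


def mean_rank(ranks: Sequence[int]) -> float:
    """Average rank of the valid test triples (lower is better)."""
    return sum(ranks) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over the valid test triples (higher is better)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks: Sequence[int], n: int = 10) -> float:
    """Proportion of valid test triples ranked in the top n (higher is better)."""
    return sum(1 for r in ranks if r <= n) / len(ranks)


# Hypothetical 1-based ranks of four valid test triples.
ranks = [1, 2, 4, 20]
print(mean_rank(ranks), mean_reciprocal_rank(ranks), hits_at(ranks, 10), hits_at(ranks, 1))
```

Every triple ranked first contributes 1 to both the MRR sum and the Hits@1 count, and every other triple contributes a positive amount to MRR only, so MRR is never smaller than Hits@1. This is one way to see why the two scores move together.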
Link prediction results on WN18RR and FB15k-237 test sets. MRR and H@10 denote the mean reciprocal rank and Hits@10 (in %), respectively. [⋆]: Results are taken from Dettmers et al. [10] where Hits@10 and MRR are rounded to 2 decimal places on WN18RR. The last 4 rows report results of models that exploit information about relation paths (
We use the common Bernoulli trick [27,54] to generate the head or tail entities when sampling invalid triples. We also use entity and relation embeddings produced by TransE to initialize entity and relation embeddings in ConvKB.2
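For readers unfamiliar with the Bernoulli trick, the following sketch shows the usual formulation: for each relation, the head is replaced with probability tph / (tph + hpt), where tph is the average number of tails per head and hpt is the average number of heads per tail. The function names, the toy triples, the rejection of corrupted triples that are already known to be valid, and the `max_tries` guard are assumptions of this sketch rather than details taken from the experimental setup.

```python
import random
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (head, relation, tail)


def head_replacement_probs(triples: Sequence[Triple]) -> Dict[str, float]:
    """Per-relation probability of corrupting the head instead of the tail."""
    tails_of = defaultdict(set)  # (relation, head) -> set of tails
    heads_of = defaultdict(set)  # (relation, tail) -> set of heads
    for h, r, t in triples:
        tails_of[(r, h)].add(t)
        heads_of[(r, t)].add(h)

    probs: Dict[str, float] = {}
    for rel in {r for _, r, _ in triples}:
        tail_counts = [len(s) for (r, _), s in tails_of.items() if r == rel]
        head_counts = [len(s) for (r, _), s in heads_of.items() if r == rel]
        tph = sum(tail_counts) / len(tail_counts)  # average tails per head
        hpt = sum(head_counts) / len(head_counts)  # average heads per tail
        probs[rel] = tph / (tph + hpt)
    return probs


def corrupt(
    triple: Triple,
    entities: Sequence[str],
    probs: Dict[str, float],
    known: Set[Triple],
    rng: random.Random,
    max_tries: int = 100,
) -> Triple:
    """Sample one invalid triple by replacing the head or the tail."""
    h, r, t = triple
    replace_head = rng.random() < probs[r]
    for _ in range(max_tries):
        e = rng.choice(entities)
        candidate = (e, r, t) if replace_head else (h, r, e)
        # Assumption: reject candidates that are known valid triples.
        if candidate not in known:
            return candidate
    raise RuntimeError("could not sample an invalid triple")


# Toy many-to-one relation: several heads share the same tail.
train = [("a", "born_in", "X"), ("b", "born_in", "X"),
         ("c", "born_in", "X"), ("d", "born_in", "Y")]
probs = head_replacement_probs(train)
negative = corrupt(train[0], ["a", "b", "c", "d", "X", "Y"], probs,
                   set(train), random.Random(0))
print(probs["born_in"], negative)
```

In this toy many-to-one relation the tail is replaced more often than the head, which lowers the chance of accidentally generating a triple that is in fact valid.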
We employ a TransE implementation available at:
To learn our model parameters including entity and relation embeddings, filters
Table 3 compares the experimental results of our ConvKB model with previously published results, using the same experimental setup. Table 3 shows that ConvKB obtains the best MR and highest Hits@10 scores on WN18RR and also the highest MRR and Hits@10 scores on FB15k-237.
ConvKB does better than the closely related model TransE on both experimental datasets, especially on FB15k-237 where ConvKB gains significant improvements of
ConvKB obtains better scores than ConvE on both datasets (except MRR on WN18RR and MR on FB15k-237), thus showing the usefulness of taking transitional characteristics into account. In particular, on FB15k-237, ConvKB achieves improvements of
Following Bordes et al. [4], we explore the Hits@10 results on the FB15k-237 test set corresponding to the relation categories. For each relation r, we calculate the averaged number

Hits@10 (in %) on the FB15k-237 test set w.r.t. each relation category.

Hits@10 and MRR on the WN18RR test set w.r.t. each relation. The right y-axis is the percentage of triples corresponding to relations.
Figure 2 shows the Hits@10 results for separately predicting head and tail entities on the FB15k-237 test set with respect to (w.r.t.) each relation category. We find that ConvKB is outperformed by TransE on 1-1 relations, as 1-1 relations are relatively rare. We also find that both TransE and ConvKB predict entities more easily on the relational “side 1” of triples (i.e., predicting head entities in 1-1 and 1-M, and predicting tail entities in 1-1 and M-1). However, TransE is not good at predicting head entities in M-1 and M-M, where it obtains Hits@10 scores of 9.9% and 39.8%, whereas ConvKB achieves Hits@10 scores of 38.6% and 47.5%, respectively. A probable reason is that ConvKB, unlike TransE, can generalize the projection of embedding triples into the vector space of relations. This helps ConvKB better model M-1 and M-M relations.
For a more concrete example, Fig. 3 presents Hits@10 and MRR scores on WN18RR w.r.t. each relation type.
Task description
The triple classification task aims to predict whether a given triple
Accuracy results (in %) on the WN11 and FB13 test sets. The last 4 rows report accuracies of the models that use relation paths or incorporate a large external corpus. The best score is in bold and the second best score is underlined. “Avg.” denotes the accuracy averaged over the two datasets.
Similar to the training protocol in Section 4.1.2, we sample invalid triples using the common Bernoulli trick and also train TransE to produce entity and relation embeddings for initializing embeddings in ConvKB. The best accuracies obtained by TransE on the validation set are when using
Main results
Table 4 presents the accuracy results of our ConvKB model and previously published results on the WN11 and FB13 datasets. On WN11, ConvKB obtains an accuracy of 87.6%, which outperforms all other models. On FB13, ConvKB obtains the second highest accuracy of 88.8%, which is 0.3% below that of TransD. Compared to TransE, ConvKB achieves absolute improvements of 1.1% on WN11 and 1.3% on FB13. Overall, ConvKB yields the best performance averaged over these two benchmark datasets, which suggests that ConvKB generalizes across different datasets.
Regarding TransE, Table 4 demonstrates that we obtain very competitive accuracies of 86.5% and 87.5% on WN11 and FB13, respectively. On WN11, TransE is comparable with TransD, TransR-FT and TranSparse-S while it scores better than lppTransD and TransE-

Accuracy results on the FB13 test set w.r.t. each relation. The right y-axis is the number of triples corresponding to relations.
Figure 4 visualizes the accuracy results of different relations on FB13 for TransE and ConvKB. Relations institution and profession can be categorized as M-M, where ConvKB achieves about 2.3% higher absolute accuracy than TransE, while the remaining relations can be categorized as M-1. In short, Fig. 4 shows that ConvKB performs equal to or better than TransE for all 7 relations in the FB13 test set.
Search personalization, an important feature of commercial search engines, has recently attracted much attention from both academia [6,8,11,28,30,49,50,64] and industry (e.g., Bing, Google, Airbnb [15]). Unlike classical search methods, personalized search systems utilize historical interactions, such as submitted queries and clicked documents, between a user and the system to tailor returned search results to the needs of that user [1,18,42,44,56]. This historical information can be used to build the user profile, which is crucial to effective personalization [42,44,51,52].
Given a user, a submitted query and the documents returned by a search system for that query, our approach is to re-rank the returned documents so that the more relevant documents are ranked higher. Following Vu et al. [49], we represent the relationship between the submitted query, the user and the returned document as a
We evaluate ConvKB using the search results returned by a commercial search engine. We use the same dataset of query logs of 106 anonymous users from Vu et al. [49]. A log entry consists of a user identifier, a query, the top-10 returned documents ranked by the search engine, and the clicked documents along with the user’s dwell time. Vu et al. [49] employed the SAT criteria [13] to identify whether or not a clicked document is relevant from the query logs (i.e., a SAT click). They then assigned a
Basic statistics of the dataset [49]
Our ConvKB model is used to re-rank the original list of top-10 documents returned by the commercial search engine as follows: (1) We train ConvKB and use the trained model to calculate a score for each triple
We re-rank the list of top-10 documents returned by the search engine, so all models obtain the same Hits@10 scores.
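The re-ranking step can be sketched as a sort of the returned documents by the model score of each (query, user, document) triple. In the sketch below, `score_fn` stands in for a trained model, the `higher_is_better` flag makes the assumed sort direction explicit, and the document identifiers and toy scores are hypothetical. The example also shows why Hits@10 cannot distinguish the models: re-ranking only permutes the same ten documents.

```python
from typing import Callable, List, Optional, Sequence, Set


def rerank(
    query: str,
    user: str,
    documents: Sequence[str],
    score_fn: Callable[[str, str, str], float],
    higher_is_better: bool = True,  # assumption: higher score = more relevant
) -> List[str]:
    """Return the documents sorted by the score of (query, user, document)."""
    return sorted(
        documents,
        key=lambda d: score_fn(query, user, d),
        reverse=higher_is_better,
    )


def first_relevant_rank(ranking: Sequence[str], relevant: Set[str]) -> Optional[int]:
    """1-based rank of the first relevant document, or None if there is none."""
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return position
    return None


# Hypothetical top-10 list from the search engine; d4 is the relevant document.
original = [f"d{i}" for i in range(1, 11)]
relevant = {"d4"}

# Stand-in for a trained model: a fixed table of toy scores.
toy_scores = {"d4": 0.9, "d1": 0.5}
reranked = rerank("q", "u", original,
                  lambda q, u, d: toy_scores.get(d, 0.0))

before = first_relevant_rank(original, relevant)
after = first_relevant_rank(reranked, relevant)
print(before, after)
```

Because the relevant document is within the top 10 both before and after re-ranking, only rank-sensitive metrics such as MRR and Hits@1 can reflect the change.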
Query and document embedding initialization
We initialize query and document embeddings for ConvKB and the baseline TransE, then fix query and document embeddings (i.e. not updating these embeddings) during training.
To initialize document embeddings, we follow Vu et al. [49] to train an LDA topic model [2] with 200 topics only on the relevant documents (i.e., SAT clicks) extracted from the query logs. We then use the trained LDA model to infer the probability distribution over topics for each document. We use the topic proportion vector of each document as its document embedding (i.e.
We also represent each query by a probability distribution vector over topics. Let
Hyper-parameter tuning
Similar to the training protocol presented in Section 4.1.2, we run the model for up to 200 epochs and perform a grid search to choose optimal hyper-parameters on the validation set. Following Vu et al. [49], we use
Results
Experimental results on the test set. ⋆ denotes the results reported in Vu et al. [49]. SE: the original ranking returned by the search engine. CI: a baseline that uses a personalized navigation method based on previously clicked returned documents [43]. SP: a search personalization method that makes use of short-term profiles [1,52]. The subscripts denote the relative improvement over the baseline TransE.
Table 6 presents the experimental results of ConvKB, TransE and the previously published results of other strong baselines, in which ConvKB obtains the highest MRR and Hits@1 scores. In particular, ConvKB does significantly better than TransE, with relative improvements of 12.1% for MRR and 17.7% for Hits@1. This is probably because our model not only captures richer relational characteristics within the triple but also generalizes the transitional relationships between embeddings of user queries and relevant documents for user profiles. We also obtain higher TransE results than those reported in Vu et al. [49]. The reason is that for each valid triple, rather than using only one invalid triple as in [49], we take into account all of its invalid triples to train TransE (each valid or invalid triple contains a relevant- or irrelevant-labelled document, respectively).
In this paper, we propose a novel embedding model ConvKB for the knowledge base completion task. ConvKB applies the convolutional neural network to explore the global relationships among same dimensional entries of the entity and relation embeddings, so that ConvKB generalizes the transitional characteristics in the transition-based embedding models. Experimental results show that our ConvKB model outperforms other state-of-the-art models on two benchmark datasets WN18RR and FB15k-237 for the link prediction task, and on two other benchmark datasets WN11 and FB13 for the triple classification task. ConvKB obtains the best mean rank and the highest Hits@10 on WN18RR and obtains the highest mean reciprocal rank and Hits@10 on FB15k-237. In addition, ConvKB produces the best accuracy on WN11 and the second best accuracy on FB13. Moreover, we show the effectiveness of ConvKB for search personalization, in which ConvKB outperforms the strong baselines on the query logs of a commercial web search engine.
In future work, we plan to extend ConvKB with relation path information to achieve better performance. We will also adapt ConvKB to other personalization tasks where each task can be modeled as a triple relationship, e.g., personalized query suggestion or auto-completion.
Our ConvKB implementation is available at:
Bibliographic note
This paper extends our paper [32] published in Proceedings of NAACL-HLT 2018 (Volume 2: Short Papers). We first add a significantly improved analysis of the link prediction task. We then conduct a new extensive empirical study on the triple classification and search personalization tasks.
Acknowledgements
This research was partially supported by the Australian Research Council (ARC) DP150100031 and DP160103934.
