
Abstract

In this paper, we propose a novel embedding model, named ConvKB, for knowledge base completion. Our model ConvKB advances state-of-the-art models by employing a convolutional neural network, so that it can capture global relationships and transitional characteristics between entities and relations in knowledge bases. In ConvKB, each triple (head entity, relation, tail entity) is represented as a 3-column matrix where each column vector represents a triple element. This 3-column matrix is then fed to a convolution layer where multiple filters are operated on the matrix to generate different feature maps. These feature maps are then concatenated into a single feature vector representing the input triple. The feature vector is multiplied with a weight vector via a dot product to return a score. This score is then used to predict whether the triple is valid or not. Experiments show that ConvKB obtains better link prediction and triple classification results than previous state-of-the-art models on benchmark datasets WN18RR, FB15k-237, WN11 and FB13. We further apply our ConvKB to a search personalization problem which aims to tailor the search results to each specific user based on the user’s personal interests and preferences. In particular, we model the potential relationship between the submitted query, the user and the search result (i.e., document) as a triple (query, user, document) on which the ConvKB is able to work. Experimental results on query logs from a commercial web search engine show that ConvKB achieves better performances than the standard ranker as well as strong search personalization baselines.

Keywords

Knowledge base completion, convolutional neural network, ConvKB, link prediction, triple classification, search personalization

1. Introduction

Large-scale knowledge bases (KBs), such as YAGO [41], Freebase [3] and DBpedia [25], are usually databases of triples representing the relationships between entities in the form of fact (head entity, relation, tail entity) denoted as $(h, r, t)$, e.g., (Melbourne, cityOf, Australia). These KBs are useful resources in many applications such as semantic searching and ranking [21,38,60], question answering [17,66] and machine reading [61]. However, the KBs are still incomplete, i.e., they are missing many valid triples [40,55]. Therefore, much research has been devoted to knowledge base completion, or link prediction, which predicts whether a triple $(h, r, t)$ is valid or not [5].

Many embedding models have been proposed to learn vector or matrix representations for entities and relations, obtaining state-of-the-art (SOTA) link prediction results [35]. In these embedding models, valid triples obtain lower implausibility scores than invalid triples. Let us take the well-known embedding model TransE [4] as an example. In TransE, entities and relations are represented by k-dimensional vector embeddings. TransE employs a transitional characteristic to model relationships between entities, in which it assumes that if $(h, r, t)$ is a valid fact, the embedding of head entity h plus the embedding of relation r should be close to the embedding of tail entity t, i.e. $v_{h} + v_{r} \approx v_{t}$ (here, $v_{h}$, $v_{r}$ and $v_{t}$ are embeddings of h, r and t respectively). That is, a TransE score $‖ v_{h} + v_{r} - v_{t} ‖_{p}$ of the valid triple $(h, r, t)$ should be close to 0 and smaller than a score $‖ v_{h^{'}} + v_{r^{'}} - v_{t^{'}} ‖_{p}$ of an invalid triple $(h^{'}, r^{'}, t^{'})$. The transitional characteristic in TransE also implies the global relationships among same dimensional entries of $v_{h}$, $v_{r}$ and $v_{t}$. Other transition-based models extend TransE to use additional projection vectors or matrices to translate head and tail embeddings into the relation vector space, such as: TransH [54], TransR [27], TransD [19], STransE [34] and TranSparse [20]. Furthermore, DISTMULT [62] and ComplEx [48] use a tri-linear dot product to compute the score for each triple. Recent research has shown that using relation paths between entities in the KBs could help to get contextual information for improving KB completion performance [16,26,29,33,47]. See other embedding models for KB completion in Nguyen [31].
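To make the TransE score concrete, the following is a minimal Python sketch. The 3-dimensional vectors are hand-picked toy values rather than learned embeddings, and the entity France is an illustrative assumption used only to build an invalid triple.

```python
def transe_score(v_h, v_r, v_t, p=1):
    """Implausibility score ||v_h + v_r - v_t||_p; lower means more plausible."""
    diffs = [abs(h + r - t) for h, r, t in zip(v_h, v_r, v_t)]
    if p == 1:
        return sum(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


# Toy embeddings (assumed values, chosen so that v_h + v_r equals v_t exactly
# for the valid triple).
v_melbourne = [0.25, 0.5, -0.25]
v_city_of = [0.25, -0.5, 0.5]
v_australia = [0.5, 0.0, 0.25]
v_france = [-0.5, 0.75, 1.0]  # hypothetical corrupted tail entity

valid_score = transe_score(v_melbourne, v_city_of, v_australia)
invalid_score = transe_score(v_melbourne, v_city_of, v_france)
print(valid_score, invalid_score)
```

With these toy values the valid triple (Melbourne, cityOf, Australia) receives a lower score than the corrupted triple, which is the ordering that the embedding models aim to learn.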

Recently, convolutional neural networks (CNNs), originally designed for computer vision [24], have received significant research attention in natural language processing [9,22]. A CNN learns non-linear features to capture complex relationships with remarkably fewer parameters than fully connected neural networks. Inspired by the success in computer vision, Dettmers et al. [10] proposed ConvE – the first model applying CNN for KB completion. In ConvE, only $v_{h}$ and $v_{r}$ are reshaped and then concatenated into an input matrix which is fed to the convolution layer. Different filters of the same $3 \times 3$ shape are operated over the input matrix to output feature map tensors. These feature map tensors are then vectorized and mapped into a vector via a linear transformation. Then this vector is combined with $v_{t}$ via a dot product to return a score for $(h, r, t)$. See a formal definition of the ConvE score function in Table 1. It is worth noting that ConvE focuses on the local relationships among different dimensional entries in each of $v_{h}$ or $v_{r}$, i.e., ConvE does not observe the global relationships among same dimensional entries of an embedding triple ($v_{h}$, $v_{r}$, $v_{t}$), so that ConvE ignores the transitional characteristic in transition-based models, which is one of the most useful intuitions for KB completion.

Table 1
The score functions in previous models and in our ConvKB model. $‖ v ‖_{p}$ denotes the p-norm of $v$ . $⟨ v_{h}, v_{r}, v_{t} ⟩$ = $\sum_{i} v_{h_{i}} v_{r_{i}} v_{t_{i}}$ denotes a tri-linear dot product. $\overline{v} = Re (v) - ı Im (v)$ with $Re (v)$ and $Im (v)$ corresponding to the real and imaginary parts of the complex-valued vector $v$ , and ı denoting the square root of −1. In addition, g denotes a non-linear function. ∗ denotes a convolution operator. · denotes a dot product. $concat$ denotes a concatenation operator. $\hat{v}$ denotes a 2D reshaping of $v$ . Ω denotes a set of filters

Model	The score function $f (h, r, t)$
TransE	$‖ v_{h} + v_{r} - v_{t} ‖_{p}$
DISTMULT	$⟨ v_{h}, v_{r}, v_{t} ⟩$
ComplEx	$Re (⟨ v_{h}, v_{r}, {\overline{v}}_{t} ⟩)$
ConvE	$g (vec (g (concat ({\hat{v}}_{h}, {\hat{v}}_{r}) * Ω)) W) \cdot v_{t}$
Our ConvKB	$concat (g ([v_{h}, v_{r}, v_{t}] * Ω)) \cdot w$

In this paper, we present ConvKB – an embedding model which proposes a novel use of CNN for the KB completion task. In ConvKB, each entity or relation is associated with a unique k-dimensional embedding. Let $v_{h}$, $v_{r}$ and $v_{t}$ denote k-dimensional embeddings of h, r and t, respectively. For each triple $(h, r, t)$, the corresponding triple of k-dimensional embeddings ($v_{h}$, $v_{r}$, $v_{t}$) is represented as a $k \times 3$ input matrix. This input matrix is fed to the convolution layer where different filters of the same $1 \times 3$ shape are used to extract the global relationships among same dimensional entries of the embedding triple. That is, these filters are repeatedly operated over every row of the input matrix to produce different feature maps. The feature maps are concatenated into a single feature vector which is then combined with a weight vector via a dot product to produce a score for the triple $(h, r, t)$. This score is used to infer whether the triple $(h, r, t)$ is valid or not.

Our contributions in this paper are as follows:

We introduce ConvKB – a novel embedding model of entities and relationships for knowledge base completion. ConvKB models the relationships among same dimensional entries of the embeddings. This implies that ConvKB generalizes transitional characteristics in transition-based embedding models.

We evaluate ConvKB on two benchmark datasets WN18RR [10] and FB15k-237 [45], and show that ConvKB obtains better link prediction performance than previous SOTA embedding models. In particular, ConvKB obtains the best mean rank and the highest Hits@10 on WN18RR, and produces the highest mean reciprocal rank and highest Hits@10 on FB15k-237.

We also evaluate ConvKB for triple classification on two benchmark datasets WN11 and FB13 [40]. The goal is to classify whether a given triple is valid or not. ConvKB also does better than previous SOTA models, obtaining the best and second best accuracies on WN11 and FB13, respectively.

We adapt ConvKB to search personalization where the search results for a query from a user are driven toward the personal needs of that user by exploiting historical interactions (e.g., submitted queries, clicked documents) between the user and the system [1,18,42,44,56]. Our general aim is to re-rank the documents returned by the search system in such a way that the more relevant documents are ranked higher. More specifically, we train our ConvKB to compute a score for each triple (query, user, document), and to reward higher plausibility scores for more relevant documents. We then verify this application of ConvKB on the query logs of a commercial web search engine. Experimental results show that ConvKB significantly improves the ranking quality over the strong baselines.

The paper is organized as follows. We provide related work in Section 2. We then describe our proposed model ConvKB in Section 3. We evaluate and compare ConvKB with previous models on the link prediction and triple classification tasks in Section 4. The application of ConvKB to search personalization is introduced in Section 5. The conclusion is finally presented in Section 6.

2. Related work

TransH [54] extends TransE to allow entities playing different roles in different relations. Each relation r is associated with a relation-specific hyperplane $w_{r}$ , and then the embeddings of h and t are projected to this hyperplane. The TransH score function is defined as: $\begin{matrix} f_{TransH} (h, r, t) = ‖ v_{h ⊥} + v_{r} - v_{t ⊥} ‖_{p} \end{matrix}$ where $v_{h ⊥} = v_{h} - w_{r}^{T} v_{h} w_{r}$ and $v_{t ⊥} = v_{t} - w_{r}^{T} v_{t} w_{r}$ are the projected embeddings of h and t on $w_{r}$ respectively.
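The TransH projection can be illustrated with a small Python sketch. It assumes that the hyperplane normal $w_{r}$ has unit $l_{2}$-norm, and all vectors are toy values chosen for illustration only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_to_hyperplane(v, w_r):
    """v_perp = v - (w_r^T v) w_r, assuming ||w_r||_2 = 1."""
    scale = dot(w_r, v)
    return [vi - scale * wi for vi, wi in zip(v, w_r)]


def transh_score(v_h, v_r, v_t, w_r):
    """TransH score with the l1-norm: ||v_h_perp + v_r - v_t_perp||_1."""
    h_perp = project_to_hyperplane(v_h, w_r)
    t_perp = project_to_hyperplane(v_t, w_r)
    return sum(abs(h + r - t) for h, r, t in zip(h_perp, v_r, t_perp))


# Toy values (assumed): the hyperplane normal is the third coordinate axis.
w_r = [0.0, 0.0, 1.0]
v_h = [1.0, 2.0, 3.0]
v_r = [1.0, 0.0, 0.0]
v_t = [2.0, 2.0, -5.0]

projected_score = transh_score(v_h, v_r, v_t, w_r)
unprojected_score = sum(abs(h + r - t) for h, r, t in zip(v_h, v_r, v_t))
print(projected_score, unprojected_score)
```

In this toy case the head and tail differ only along $w_{r}$, so the projection removes the difference and the TransH score drops to zero while the unprojected TransE-style score does not.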

TransR [27] extends TransH to perform projections where each relation r is associated with a projection matrix $W_{r}$ which is used to map entity embeddings into the vector space of relations: $\begin{matrix} f_{TransR} (h, r, t) = ‖ W_{r} v_{h} + v_{r} - W_{r} v_{t} ‖_{p} \end{matrix}$ Both TransH and TransR use only one vector or matrix to perform projections, ignoring the fact that head and tail entities play different roles in each relation. Therefore, head and tail entities should be associated with their own projection vectors or matrices as presented in direct extensions of TransH and TransR such as TransD [19], STransE [34], lppTransD [65], TransR-FT [12], TranSparse [20] and ITransF [59]. The transitional characteristics in these transition-based models can be intuitively defined as: if $(h, r, t)$ is a valid fact, the projected embedding of h plus the embedding of r is close to the projected embedding of t. This reflects global relationships among same dimensional entries of projected entity embeddings with the relation embedding. In our ConvKB model, each filter with the shape of $1 \times 3$ is responsible for mapping head, tail and relation embeddings to the relation vector space. So ConvKB can generalize the transitional characteristics in the transition-based models.

DISTMULT [62] and ComplEx [48] use a tri-linear dot product to compute the score for each triple. See formal definitions of DISTMULT and ComplEx in Table 1. In addition, NTN [40] uses a bilinear tensor operator within a neural network to compute the triple score. Recent approaches also show that using relation paths between entities in the KBs could help to get contextual information for improving the KB completion performance [16,26,33,47]. For example, PTransE-ADD [26] and TransE-comp [16] represent each path by summing up the embeddings of all relations in the path, while Bilinear-comp [16] and PRUNED-PATHS [47] represent each relation in the path by a matrix and directly use matrix multiplication to model the relation path. Some approaches also incorporate textual mentions derived from a large external corpus for improving the performance [14,45,46,53]. See other methods for learning from KBs in [31,35].

Fig. 1.

Process involved in the proposed ConvKB (with $k = 4$ and $τ = 3$ for illustration purpose).

3. Proposed ConvKB model

A knowledge base $G$ is a collection of valid factual triples in the form of (head entity, relation, tail entity) denoted as $(h, r, t)$ such that $h, t \in E$ and $r \in R$ where $E$ is a set of entities and $R$ is a set of relations. Embedding models aim to define a score function f giving an implausibility score for each triple $(h, r, t)$ such that valid triples receive lower scores than invalid triples. Table 1 presents score functions in previous SOTA models.

We denote the dimensionality of embeddings by k such that each embedding triple ($v_{h}$, $v_{r}$, $v_{t}$) is viewed as a matrix $A = [v_{h}, v_{r}, v_{t}] \in R^{k \times 3}$, and $A_{i, :} \in R^{1 \times 3}$ denotes the ith row of $A$. Suppose that we use a filter $ω \in R^{1 \times 3}$ operated on the convolution layer. $ω$ is not only aimed to examine the global relationships between same dimensional entries of the embedding triple ($v_{h}$, $v_{r}$, $v_{t}$), but also to generalize the transitional characteristics in the transition-based models. $ω$ is repeatedly operated over every row of $A$ to finally generate a feature map $v = [v_{1}, v_{2}, \dots, v_{k}] \in R^{k}$ as: $\begin{matrix} v_{i} = g (ω \cdot A_{i, :} + b) \end{matrix}$ where $b \in R$ is a bias term and g is a non-linear activation function such as ReLU.

Our ConvKB uses different filters $\in R^{1 \times 3}$ to generate different feature maps. Let Ω and τ denote the set of filters and the number of filters, respectively, i.e. $τ = | Ω |$ , resulting in τ feature maps. These τ feature maps are concatenated into a single vector $\in R^{τ k \times 1}$ which is then computed with a weight vector $w \in R^{τ k \times 1}$ via a dot product to give a score for the triple $(h, r, t)$ . Figure 1 illustrates the computation process in ConvKB.

Formally, we define the ConvKB score function f as follows: $\begin{matrix} f (h, r, t) = concat (g ([v_{h}, v_{r}, v_{t}] * Ω)) \cdot w \end{matrix}$ where Ω and w are shared parameters, independent of h, r and t; ∗ denotes a convolution operator; and $concat$ denotes a concatenation operator.

If we only use one filter $ω$ (i.e. using $τ = 1$ ) with a fixed bias term $b = 0$ and the activation function $g (x) = | x |$ or $g (x) = x^{2}$ , and fix $ω = [1, 1, - 1]$ and $w = 1$ during training, ConvKB reduces to the plain TransE model [4]. So our ConvKB model can be viewed as an extension of TransE to further model global relationships.
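The score function and its reduction to TransE can be checked with a short Python sketch. It is a plain-list illustration under stated assumptions, not the authors' implementation: the embeddings and the second filter are toy values, and "$w = 1$" is interpreted here as a weight vector whose entries are all 1.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def convkb_score(v_h, v_r, v_t, filters, biases, w, g=relu):
    """f(h, r, t) = concat(g([v_h, v_r, v_t] * Omega)) . w

    filters: tau filters, each a length-3 list (the 1 x 3 shape).
    biases:  tau bias terms, one per filter.
    w:       weight vector of length tau * k.
    """
    feature_vector = []
    for omega, b in zip(filters, biases):
        # Slide the filter over every row A_i = (v_h[i], v_r[i], v_t[i]).
        for row in zip(v_h, v_r, v_t):
            feature_vector.append(g(sum(o * a for o, a in zip(omega, row)) + b))
    if len(feature_vector) != len(w):
        raise ValueError("w must have tau * k entries")
    return sum(f * wi for f, wi in zip(feature_vector, w))


def transe_l1(v_h, v_r, v_t):
    return sum(abs(h + r - t) for h, r, t in zip(v_h, v_r, v_t))


# Toy embeddings with k = 4 (assumed values).
v_h = [0.5, -1.0, 0.25, 2.0]
v_r = [1.0, 0.5, -0.75, 0.0]
v_t = [1.0, 0.25, 0.5, 1.5]

# Special case: tau = 1, omega = [1, 1, -1], b = 0, g(x) = |x|, all-ones w.
reduced = convkb_score(v_h, v_r, v_t, [[1.0, 1.0, -1.0]], [0.0], [1.0] * 4, g=abs)

# General case: tau = 2 filters with ReLU (the second filter is hypothetical).
general = convkb_score(
    v_h, v_r, v_t,
    [[1.0, 1.0, -1.0], [0.5, -0.5, 0.0]], [0.0, 0.0], [1.0] * 8,
)
print(reduced, transe_l1(v_h, v_r, v_t), general)
```

In the special case each feature-map entry is $| v_{h_{i}} + v_{r_{i}} - v_{t_{i}} |$, so the dot product with the all-ones vector is exactly the $l_{1}$ TransE score.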

Algorithm 1:

Parameter optimization for ConvKB in the KB completion

We use the Adam optimizer [23] to train ConvKB by minimizing the loss function $L$ [48] with $L_{2}$ regularization on the weight vector w of the model: $L = \sum_{(h, r, t) \in G \cup G^{'}} \log (1 + \exp (l_{(h, r, t)} \cdot f (h, r, t))) + \frac{λ}{2} ‖ w ‖_{2}^{2}$ in which $l_{(h, r, t)} = \begin{cases} 1 & \text{for } (h, r, t) \in G \\ - 1 & \text{for } (h, r, t) \in G^{'} \end{cases}$ Here, $G^{'}$ is a collection of invalid triples generated by corrupting valid triples in $G$. We use the common Bernoulli trick [27,54] to generate the head or tail entities for invalid triples. For each relation r, let $η_{h}$ denote the averaged number of head entities per tail entity whilst $η_{t}$ denotes the averaged number of tail entities per head entity. Given a valid triple $(h, r, t)$ of relation r, we then generate a new head entity $h^{'}$ with probability $\frac{η_{t}}{η_{t} + η_{h}}$ to form an invalid triple $(h^{'}, r, t)$ and a new tail entity $t^{'}$ with probability $\frac{η_{h}}{η_{t} + η_{h}}$ to form an invalid triple $(h, r, t^{'})$. Algorithm 1 details the learning process of our ConvKB model.
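The loss and the corruption probability can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the scores are passed in as plain numbers rather than computed by a trained model, and the numerically stable softplus form is an implementation choice not stated in the text.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def convkb_loss(scored_triples, w, lam):
    """Loss L over valid and corrupted triples.

    scored_triples: iterable of (f(h, r, t), label) pairs, where label is
                    1 for a triple in G and -1 for a triple in G'.
    w:              weight vector of the model.
    lam:            L2 regularization coefficient lambda.
    """
    data_term = sum(softplus(label * score) for score, label in scored_triples)
    reg_term = 0.5 * lam * sum(wi * wi for wi in w)
    return data_term + reg_term


def head_corruption_probability(eta_h, eta_t):
    """Bernoulli trick: probability of replacing the head; the tail is
    replaced with the complementary probability eta_h / (eta_t + eta_h)."""
    return eta_t / (eta_t + eta_h)


loss_at_zero = convkb_loss([(0.0, 1), (0.0, -1)], [1.0, 2.0], lam=0.5)
loss_separated = convkb_loss([(-2.0, 1), (2.0, -1)], [1.0, 2.0], lam=0.5)
p_head = head_corruption_probability(eta_h=1.0, eta_t=3.0)
print(loss_at_zero, loss_separated, p_head)
```

Because valid triples carry the label 1, minimizing the loss pushes their scores down and the scores of corrupted triples up, which matches the convention that valid triples receive lower implausibility scores.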

4. KB completion evaluation

We evaluate ConvKB on two KB completion tasks: the link prediction task [4] and the triple classification task [40]. We use benchmark datasets WN18RR [10] and FB15k-237 [45] for link prediction, and datasets FB13 and WN11 [40] for triple classification. WN18RR and FB15k-237 are subsets of two common datasets WN18 and FB15k [4], respectively. As noted by Toutanova and Chen [45], WN18 and FB15k are easy because they contain many reversible relations. So knowing relations are reversible allows us to easily predict the majority of test triples, e.g. state-of-the-art results on both WN18 and FB15k are obtained by using a simple reversal rule as shown in Dettmers et al. [10]. Therefore, WN18RR and FB15k-237 are created to not suffer from this reversible relation problem in WN18 and FB15k, for which the knowledge base completion task is more realistic. It is also worth noting that when constructing datasets FB13 and WN11, Socher et al. [40] filtered out triples from the test set if either or both of their head and tail entities also appear in the training set in a different relation type or order. Table 2 gives statistics of the experimental datasets.

Table 2
Statistics of the experimental datasets. In both WN11 and FB13, each validation and test set also contains the same number of incorrect triples as the number of correct triples. Note that the FB13 test set is filtered to only contain 7 relations taken from the 13 relations appearing in the FB13 training set

Dataset	$| E |$	$| R |$	#Triples in train	#Triples in valid	#Triples in test
FB15k-237	14,541	237	272,115	17,535	20,466
WN18RR	40,943	11	86,835	3,034	3,134
FB13	75,043	13	316,232	5,908	23,733
WN11	38,696	11	112,581	2,609	10,544

4.1. Link prediction

4.1.1. Task description

In the KB completion or link prediction task [4], the purpose is to predict a missing entity given a relation and another entity, i.e., inferring h given $(r, t)$ or inferring t given $(h, r)$. The results are calculated based on ranking the scores produced by the score function f on test triples. Following Bordes et al. [4], for each valid test triple $(h, r, t)$, we replace either h or t by each of the other entities in $E$ to create a set of corrupted triples. We use the “Filtered” setting protocol [4], i.e., not taking any corrupted triples that appear in the KB into account. We rank the valid test triple and corrupted triples in ascending order of their scores.

We employ three common evaluation metrics: mean rank (MR), mean reciprocal rank (MRR), and Hits@10 (i.e., the proportion of the valid test triples ranking in top 10 predictions). Lower MR, higher MRR or higher Hits@10 indicate better performance. We report the final scores on the test set for the model obtaining the highest Hits@10 score on the validation set.¹

¹ Some previous works also reported Hits@1. However, the formulas of MRR and Hits@1 show a strong correlation between these two scores. So using Hits@1 does not really reveal any additional information for this task.
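The "Filtered" ranking and the three metrics can be sketched as follows. This is a minimal illustration with hypothetical function names, and tie handling (a corrupted triple with the same score does not push the test triple down) is an assumption that the text does not specify.

```python
def filtered_rank(test_score, corrupted, known_triples):
    """Rank (1 = best) of a valid test triple among its corrupted triples.

    corrupted:     list of (triple, score) pairs for the corrupted triples.
    known_triples: set of triples that appear in the KB; these are skipped
                   under the "Filtered" setting.
    Scores are implausibility scores, so ranking is in ascending order.
    """
    better = sum(
        1
        for triple, score in corrupted
        if triple not in known_triples and score < test_score
    )
    return better + 1


def link_prediction_metrics(ranks):
    """Return MR, MRR and Hits@10 (in %) for a list of ranks."""
    n = len(ranks)
    return {
        "MR": sum(ranks) / n,
        "MRR": sum(1.0 / r for r in ranks) / n,
        "Hits@10": 100.0 * sum(1 for r in ranks if r <= 10) / n,
    }


# Toy example (assumed scores): one corrupted triple is a known fact and is
# therefore ignored, even though it scores better than the test triple.
corrupted = [(("a", "r", "t"), 0.5), (("b", "r", "t"), 0.2), (("c", "r", "t"), 3.0)]
rank = filtered_rank(1.0, corrupted, known_triples={("b", "r", "t")})
metrics = link_prediction_metrics([1, 2, 20, 4])
print(rank, metrics)
```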

Table 3

Link prediction results on WN18RR and FB15k-237 test sets. MRR and H@10 denote the mean reciprocal rank and Hits@10 (in %), respectively. [⋆]: Results are taken from Dettmers et al. [10] where Hits@10 and MRR are rounded to 2 decimal places on WN18RR. The last 4 rows report results of models that exploit information about relation paths (${KB}_{LRN}$, R-GCN+ and Neural LP) or textual mentions derived from a large external corpus (Node + LinkFeat). The best score is in bold, while the second best score is underlined

Method	WN18RR			FB15k-237		
	MR	MRR	H@10	MR	MRR	H@10
IRN [39]	–	–	–	211	–	46.4
KBGAN [7]	–	0.213	48.1	–	0.278	45.8
DISTMULT [62] [⋆]	5110	0.43	49	254	0.241	41.9
ComplEx [48] [⋆]	5261	0.44	51	339	0.247	42.8
ConvE [10]	5277	0.46	48	246	0.316	49.1
TransE [4] (our results)	3384	0.226	50.1	347	0.294	46.5
Our ConvKB model	2554	0.248	52.5	257	0.396	51.7
${KB}_{LRN}$ [14]	–	–	–	209	0.309	49.3
R-GCN+ [37]	–	–	–	–	0.249	41.7
Neural LP [63]	–	–	–	–	0.240	36.2
Node + LinkFeat [45]	–	–	–	–	0.293	46.2

4.1.2. Training protocol

We use the common Bernoulli trick [27,54] to generate the head or tail entities when sampling invalid triples. We also use entity and relation embeddings produced by TransE to initialize entity and relation embeddings in ConvKB.²

² We employ a TransE implementation available at: https://github.com/datquocnguyen/STransE.

We train TransE using a grid search of hyper-parameters: the dimensionality of embeddings $k \in \{50, 100\}$, SGD learning rate $\in \{1 e^{- 4}, 5 e^{- 4}, 1 e^{- 3}, 5 e^{- 3}\}$, $l_{1}$-norm or $l_{2}$-norm, and margin $γ \in \{1, 3, 5, 7\}$. The highest Hits@10 scores on the validation set are obtained when using $l_{1}$-norm, learning rate at $5 e^{- 4}$, $γ = 5$ and $k = 50$ for WN18RR, and using $l_{1}$-norm, learning rate at $5 e^{- 4}$, $γ = 1$ and $k = 100$ for FB15k-237.

To learn our model parameters including entity and relation embeddings, filters $ω$ and the weight vector w, we use Adam [23] and select its initial learning rate $\in \{5 e^{- 6}, 1 e^{- 5}, 5 e^{- 5}, 1 e^{- 4}, 5 e^{- 4}\}$. We use ReLU as the activation function g. We fix the batch size at 256 and set the $L_{2}$-regularizer λ at 0.001 in our objective function. The filters $ω$ are initialized by a truncated normal distribution or by $[0.1, 0.1, - 0.1]$. We select the number of filters $τ \in \{50, 100, 200, 400, 500\}$. We run ConvKB up to 200 epochs and use outputs from the last epoch for evaluation. The highest Hits@10 scores on the validation set are obtained when using $k = 50$, $τ = 500$, the truncated normal distribution for filter initialization, and the initial learning rate at $1 e^{- 4}$ on WN18RR; and $k = 100$, $τ = 50$, $[0.1, 0.1, - 0.1]$ for filter initialization, and the initial learning rate at $5 e^{- 6}$ on FB15k-237.

4.1.3. Main experimental results

Table 3 compares the experimental results of our ConvKB model with previously published results, using the same experimental setup. Table 3 shows that ConvKB obtains the best MR and highest Hits@10 scores on WN18RR and also the highest MRR and Hits@10 scores on FB15k-237.

ConvKB does better than the closely related model TransE on both experimental datasets, especially on FB15k-237 where ConvKB gains significant improvements of $347 - 257 = 90$ in MR (which is about 26% relative improvement) and $0.396 - 0.294 = 0.102$ in MRR (which is 34+% relative improvement), and also obtains $51.7\% - 46.5\% = 5.2\%$ absolute improvement in Hits@10. Previous work shows that TransE obtains very competitive results [26,33,36,48]. However, when comparing the CNN-based embedding model ConvE with other embedding models, Dettmers et al. [10] did not experiment with TransE. We reconfirm previous findings that TransE in fact is a strong baseline model, e.g., TransE obtains better MR and Hits@10 than ConvE on WN18RR.

ConvKB obtains better scores than ConvE on both datasets (except MRR on WN18RR and MR on FB15k-237), thus showing the usefulness of taking transitional characteristics into account. In particular, on FB15k-237, ConvKB achieves improvements of $0.396 - 0.316 = 0.080$ in MRR (which is about 25% relative improvement) and $51.7\% - 49.1\% = 2.6\%$ in Hits@10, while both ConvKB and ConvE produce similar MR scores. ConvKB also obtains an about 28% relatively higher MRR score (0.396 versus 0.309) than the relation path-based model ${KB}_{LRN}$ on FB15k-237. In addition, ConvKB gives better Hits@10 than ${KB}_{LRN}$; however, ${KB}_{LRN}$ gives better MR than ConvKB. We plan to extend ConvKB with relation path information to obtain better link prediction performance in future work.

Following Bordes et al. [4], we explore the Hits@10 results on the FB15k-237 test set corresponding to the relation categories. For each relation r, we calculate the averaged number $η_{h}$ of heads per tail and the averaged number $η_{t}$ of tails per head. If $η_{h} < 1.5$ and $η_{t} < 1.5$ , r is classified as one-to-one (1-1). If $η_{h} < 1.5$ and $η_{t} ⩾ 1.5$ , r is classified as one-to-many (1-M). If $η_{h} ⩾ 1.5$ and $η_{t} < 1.5$ , r is classified as many-to-one (M-1). If $η_{h} ⩾ 1.5$ and $η_{t} ⩾ 1.5$ , r is classified as many-to-many (M-M). We find that 17, 26, 81 and 113 relations are classified as 1-1, 1-M, M-1 and M-M, respectively. And 0.9%, 6.3%, 20.5% and 72.3% of the FB15k-237 test triples have their relations classified as 1-1, 1-M, M-1 and M-M, respectively.
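The categorization rule can be written as a short Python sketch. The function name and the toy triples are illustrative assumptions (only the triple (Melbourne, cityOf, Australia) appears in the text), and in practice the averages would be computed over the actual dataset.

```python
from collections import defaultdict


def relation_categories(triples, threshold=1.5):
    """Map each relation to '1-1', '1-M', 'M-1' or 'M-M' from eta_h and eta_t."""
    heads_of = defaultdict(set)  # (r, t) -> distinct heads
    tails_of = defaultdict(set)  # (r, h) -> distinct tails
    for h, r, t in triples:
        heads_of[(r, t)].add(h)
        tails_of[(r, h)].add(t)

    heads_per_tail = defaultdict(list)
    tails_per_head = defaultdict(list)
    for (r, _), heads in heads_of.items():
        heads_per_tail[r].append(len(heads))
    for (r, _), tails in tails_of.items():
        tails_per_head[r].append(len(tails))

    categories = {}
    for r in heads_per_tail:
        eta_h = sum(heads_per_tail[r]) / len(heads_per_tail[r])  # heads per tail
        eta_t = sum(tails_per_head[r]) / len(tails_per_head[r])  # tails per head
        head_side = "1" if eta_h < threshold else "M"
        tail_side = "1" if eta_t < threshold else "M"
        categories[r] = f"{head_side}-{tail_side}"
    return categories


# Toy triples (assumed for illustration).
toy_triples = [
    ("Melbourne", "cityOf", "Australia"),
    ("Sydney", "cityOf", "Australia"),
    ("Paris", "cityOf", "France"),
    ("Lyon", "cityOf", "France"),
    ("Canberra", "capitalOf", "Australia"),
    ("Paris", "capitalOf", "France"),
]
categories = relation_categories(toy_triples)
print(categories)
```

In the toy data each country has two cities but each city has one country, so cityOf has $η_{h} = 2$ and $η_{t} = 1$ and falls into the M-1 category.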

Fig. 2.

Hits@10 (in %) on the FB15k-237 test set w.r.t. each relation category.

Fig. 3.

Hits@10 and MRR on the WN18RR test set w.r.t. each relation. The right y-axis is the percentage of triples corresponding to relations.

Figure 2 shows the Hits@10 results for separately predicting head and tail entities on the FB15k-237 test set with respect to (w.r.t.) each relation category. We find that ConvKB is outperformed by TransE in the 1-1 category, as 1-1 relations are relatively rare. We also find that both TransE and ConvKB predict entities more easily on the “side 1” of relations (i.e., predicting head entities in 1-1 and 1-M, and predicting tail entities in 1-1 and M-1). However, TransE is not good at predicting head entities in M-1 and M-M, where TransE obtains Hits@10 scores of 9.9% and 39.8%, while ConvKB does better, achieving Hits@10 scores of 38.6% and 47.5%, respectively. A probable reason is that, unlike TransE, ConvKB generalizes the projection of embedding triples into the vector space of relations. Hence, this helps ConvKB to better model M-1 and M-M relations.

For a more concrete example, Fig. 3 presents Hits@10 and MRR scores on WN18RR w.r.t. each relation type. $member\_meronym$ and $hypernym$ are 1-M and M-1 relations, respectively. We find that TransE encounters difficulty when dealing with these relation types. E.g., for the 1,251 triples containing the relation $hypernym$ among the 3,134 triples in the WN18RR test set, TransE only obtains Hits@10 and MRR scores of 17.4% and 0.076 respectively, while ConvKB performs better than TransE and gets Hits@10 and MRR scores of 22.5% and 0.121 respectively. In summary, Figs 2 and 3 show that ConvKB is better at modeling 1-M, M-1 and M-M relations than TransE.

4.2. Triple classification

4.2.1. Task description

The triple classification task aims to predict whether a given triple $(h, r, t)$ is valid or not [40]. Each relation r is associated with a threshold $θ_{r}$ . For an unseen test triple $(h, r, t)$ , if its score is below $θ_{r}$ then it will be classified as valid, otherwise invalid. Following Socher et al. [40], the relation-specific threshold $θ_{r}$ is obtained by maximizing the micro-averaged classification accuracy on the validation set.
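The threshold selection for a single relation can be sketched as below. The candidate set (midpoints between consecutive distinct validation scores, plus one value on each side) and the function names are assumptions made for this sketch. Since the thresholds are relation-specific, maximizing the number of correct decisions per relation also maximizes the micro-averaged accuracy.

```python
def best_threshold(scores, labels):
    """Pick theta_r maximizing validation accuracy for one relation.

    scores: implausibility scores of the validation triples of relation r.
    labels: True for valid triples, False for invalid ones.
    A triple is predicted valid iff its score is below theta_r.
    """
    ordered = sorted(set(scores))
    candidates = [ordered[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(ordered, ordered[1:])]
    candidates.append(ordered[-1] + 1.0)

    def accuracy(theta):
        correct = sum((s < theta) == y for s, y in zip(scores, labels))
        return correct / len(scores)

    return max(candidates, key=accuracy)


def classify(score, theta):
    """Return True when the triple is classified as valid."""
    return score < theta


# Toy validation scores for one relation (assumed values).
val_scores = [0.25, 0.5, 2.0, 3.0]
val_labels = [True, True, False, False]
theta_r = best_threshold(val_scores, val_labels)
print(theta_r, classify(0.5, theta_r), classify(2.0, theta_r))
```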

Table 4
Accuracy results (in %) on the WN11 and FB13 test sets. The last 4 rows report accuracies of the models that use relation paths or incorporate a large external corpus. The best score is in bold while the second best score is underlined. “Avg.” denotes the averaged accuracy over two datasets

Method	WN11	FB13	Avg.
NTN [40]	70.6	87.2	78.9
TransH [54]	78.8	83.3	81.1
TransR [27]	85.9	82.5	84.2
TransD [19]	86.4	89.1	87.8
TransR-FT [12]	86.6	82.9	84.8
TranSparse-S [20]	86.4	88.2	87.3
TranSparse-US [20]	86.8	87.5	87.2
ManifoldE [57]	87.5	87.2	87.4
TransG [58]	87.4	87.3	87.4
lppTransD [65]	86.2	88.6	87.4
TransE [4] (our results)	86.5	87.5	87.0
Our ConvKB model	87.6	88.8	88.2
TransE-NMM [33]	86.8	88.6	87.7
TEKE_H [53]	84.8	84.2	84.5
Bilinear-comp [16]	77.6	86.1	81.9
TransE-comp [16]	80.3	87.6	84.0

4.2.2. Training protocol

Similar to the training protocol in Section 4.1.2, we sample invalid triples using the common Bernoulli trick and also train TransE to produce entity and relation embeddings for initializing embeddings in ConvKB. The best accuracies obtained by TransE on the validation set are when using $l_{1}$-norm, learning rate at 0.01, $γ = 7$ and $k = 50$ for WN11, and using $l_{2}$-norm, learning rate at 0.01, $γ = 1$ and $k = 100$ for FB13. We then use a grid search to choose the hyper-parameters for ConvKB. We monitor the accuracy after each training epoch, and obtain the best accuracies on the validation set when using $k = 50$, $τ = 200$, the truncated normal distribution for filter initialization, and the initial learning rate at $5 e^{- 4}$ on WN11; and $k = 100$, $τ = 200$, also the truncated normal distribution for filter initialization, and the initial learning rate at $5 e^{- 5}$ on FB13.

4.2.3. Main results

Table 4 presents the accuracy results of our ConvKB model and previously published results on the WN11 and FB13 datasets. On WN11, ConvKB obtains an accuracy of 87.6%, which outperforms all other models. On FB13, ConvKB obtains the second highest accuracy of 88.8%, which is 0.3% lower than that of TransD. Compared to TransE, ConvKB achieves absolute improvements of 1.1% on WN11 and 1.3% on FB13. Overall, ConvKB yields the best performance averaged over these two benchmark datasets. This also indicates that ConvKB generalizes over different datasets.

Regarding TransE, Table 4 demonstrates that we obtain very competitive accuracies of 86.5% and 87.5% on WN11 and FB13 respectively. On WN11, TransE is comparable with TransD, TransR-FT and TranSparse-S while it scores better than lppTransD and TransE-comp. On FB13, TransE performs slightly better than ManifoldE and TransG while it achieves similar scores in comparison with TransE-comp and TranSparse-US. Note that TransR, TranSparse-S/US and TransD also perform the embedding initialization using TransE outputs (but these models do not report their TransE accuracy results). Hence, these models might get better results when using our TransE results as shown in this paper.

Fig. 4.

Accuracy results on the FB13 test set w.r.t. each relation. The right y-axis is the number of triples corresponding to relations.

Figure 4 visualizes the accuracy results of different relations on FB13 for TransE and ConvKB. The relations institution and profession can be categorized as M-M, where ConvKB obtains about 2.3% absolute higher accuracy than TransE, while the remaining relations can be categorized as M-1. In short, Fig. 4 shows that ConvKB performs equal to or better than TransE for all 7 relations in the FB13 test set.

5. Application for search personalization

Search personalization, an important feature of commercial search engines, has recently attracted much attention from both academia [6,8,11,28,30,49,50,64] and industry (e.g., Bing, Google, Airbnb [15]). Unlike classical searching methods, personalized search systems utilize the historical interactions such as submitted queries and clicked documents between a user and the systems to tailor returned search results to the needs of that user [1,18,42,44,56]. That historical information can be used to build the user profile, which is crucial to effective personalization [42,44,51,52].

Given a user, a submitted query and the documents returned by a search system for that query, our approach is to re-rank the returned documents so that the more relevant documents are ranked higher. Following Vu et al. [49], we represent the relationship between the submitted query, the user and the returned document as a $(h, r, t)$-like triple (query, user, document). The triple captures how much interest a user has in a document given a query. Therefore, we can also evaluate the effectiveness of our ConvKB model on the search personalization task.
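
To make the triple view concrete, the following is a minimal sketch (not the released implementation) of how a (query, user, document) triple could be scored in the way the abstract describes: the three embeddings form a 3-column matrix, each filter produces one feature map, the feature maps are concatenated, and the result is combined with a weight vector via a dot product. The 1×3 filter shape, the ReLU activation, the absence of bias terms and all function and parameter names are assumptions made here for illustration.

```python
from typing import List, Sequence


def relu(x: float) -> float:
    """Rectified linear unit (assumed activation)."""
    return x if x > 0.0 else 0.0


def convkb_score(
    v_query: Sequence[float],            # plays the role of the head entity h
    v_user: Sequence[float],             # plays the role of the relation r
    v_doc: Sequence[float],              # plays the role of the tail entity t
    filters: Sequence[Sequence[float]],  # each filter holds 3 weights (assumed 1x3 shape)
    w: Sequence[float],                  # weight vector of length k * len(filters)
) -> float:
    """Score one triple from its k-dimensional embeddings."""
    k = len(v_query)
    assert len(v_user) == k and len(v_doc) == k
    assert len(w) == k * len(filters)

    features: List[float] = []
    for f in filters:
        # One feature map per filter: apply the filter to each row of the
        # k x 3 matrix [v_query, v_user, v_doc].
        for i in range(k):
            features.append(relu(f[0] * v_query[i] + f[1] * v_user[i] + f[2] * v_doc[i]))

    # Dot product of the concatenated feature maps with the weight vector.
    return sum(x * y for x, y in zip(features, w))


score = convkb_score(
    [1.0, 2.0], [0.5, 0.5], [1.5, 3.0],
    filters=[[1.0, 1.0, -1.0], [1.0, 0.0, 0.0]],
    w=[1.0, 1.0, 0.5, 0.5],
)
```

In this toy call the first filter mimics a translation-style check (query + user − document), and the second simply reads the query embedding; both are hypothetical values rather than learned filters.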

We evaluate ConvKB using the search results returned by a commercial search engine. We use the same dataset of query logs of 106 anonymous users as Vu et al. [49]. A log entry consists of a user identifier, a query, the top-10 returned documents ranked by the search engine, and the clicked documents along with the user’s dwell time. Vu et al. [49] employed the SAT criteria [13] to identify from the query logs whether or not a clicked document is relevant (i.e., a SAT click). They then assigned a $relevant$ label to a returned document if it is a SAT click and assigned $irrelevant$ labels to the remaining top-10 documents. The rank position of the $relevant$ labeled documents is used as the ground truth to evaluate the search performance before and after re-ranking. As a result, the dataset contains 8,052 valid triples (query, user, relevant document), of which 5,658, 1,184 and 1,210 are used for training, validation and test, respectively. Table 5 presents the dataset statistics.

Table 5
Basic statistics of the dataset [49]

# users	106
# distinct queries	6,632
# SAT clicks	8,052
# distinct documents	33,591

5.1. Evaluation protocol

Our ConvKB model is used to re-rank the original list of top-10 documents returned by the commercial search engine as follows: (1) We train ConvKB and use the trained model to calculate a score for each triple (query, user, document). (2) We then sort the scores in ascending order to obtain a new ranked list. To evaluate the performance, we use two common metrics in document ranking: MRR and Hits@1.³

³ We re-rank the list of top-10 documents returned by the search engine, so all models obtain the same Hits@10 scores.
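
The re-ranking step and the two metrics can be sketched as follows. This is an illustrative sketch only: the function names and the toy document identifiers are hypothetical, and it assumes that each evaluated triple has exactly one $relevant$ document that appears in its top-10 list.

```python
from typing import List, Sequence, Tuple


def rerank(scored_docs: Sequence[Tuple[str, float]]) -> List[str]:
    """Sort (document, score) pairs by ascending score, as in step (2)."""
    # sorted() is stable, so tied scores keep the original engine order.
    return [doc for doc, _ in sorted(scored_docs, key=lambda pair: pair[1])]


def mrr_and_hits_at_1(
    ranked_lists: Sequence[Sequence[str]],
    relevant_docs: Sequence[str],
) -> Tuple[float, float]:
    """Return (MRR, Hits@1 in percent) over a set of re-ranked lists."""
    reciprocal_ranks = []
    hits = 0
    for ranking, relevant in zip(ranked_lists, relevant_docs):
        rank = list(ranking).index(relevant) + 1  # 1-based rank of the relevant document
        reciprocal_ranks.append(1.0 / rank)
        hits += int(rank == 1)
    n = len(reciprocal_ranks)
    return sum(reciprocal_ranks) / n, 100.0 * hits / n


# Toy example with two queries and hypothetical scores.
list_a = rerank([("d1", 0.9), ("d2", 0.1), ("d3", 0.5)])  # d2 is ranked first
list_b = rerank([("d4", 0.2), ("d5", 0.7)])               # d4 is ranked first
mrr, hits1 = mrr_and_hits_at_1([list_a, list_b], ["d2", "d5"])
```

Here the relevant document is ranked first for one query and second for the other, so the sketch yields an MRR of 0.75 and a Hits@1 of 50%.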

5.2. Training protocol

5.2.1. Query and document embedding initialization

We initialize the query and document embeddings for ConvKB and the TransE baseline, and then keep them fixed (i.e., not updated) during training.

To initialize document embeddings, we follow Vu et al. [49] and train an LDA topic model [2] with 200 topics only on the relevant documents (i.e., SAT clicks) extracted from the query logs. We then use the trained LDA model to infer the probability distribution over topics for each document. We use the topic proportion vector of each document as its document embedding (i.e., $k = 200$). In particular, the $z$th element ($z = 1, 2, \dots, k$) of the vector embedding for document $d$ is $v_{d, z} = P (z | d)$, where $P (z | d)$ is the probability of topic $z$ given document $d$.

We also represent each query by a probability distribution vector over topics. Let $D_{q} = \{d_{1}, d_{2}, \dots, d_{n}\}$ be the set of top $n$ ranked documents returned for a query $q$ (here, $n = 10$). The $z$th element of the vector embedding for query $q$ is defined as in [49]: $v_{q, z} = \sum_{i = 1}^{n} λ_{i} P (z | d_{i})$, where $λ_{i} = \frac{δ^{i - 1}}{\sum_{j = 1}^{n} δ^{j - 1}}$ is an exponential decay function of $i$, the rank of $d_{i}$ in $D_{q}$, and $δ$ is the decay hyper-parameter ($0 < δ < 1$).
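
The query embedding formula can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the per-document topic distributions $P(z | d_{i})$ are assumed to be already inferred and listed in rank order, and the default $δ = 0.8$ is the value adopted in Section 5.2.2.

```python
from typing import List, Sequence


def decay_weights(n: int, delta: float) -> List[float]:
    """lambda_i = delta^(i-1) / sum_j delta^(j-1), for ranks i = 1..n."""
    raw = [delta ** (i - 1) for i in range(1, n + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def query_embedding(
    doc_topic_vectors: Sequence[Sequence[float]],  # row i holds P(z | d_i) for rank i+1
    delta: float = 0.8,
) -> List[float]:
    """v_{q,z} = sum_i lambda_i * P(z | d_i)."""
    n = len(doc_topic_vectors)
    k = len(doc_topic_vectors[0])
    lam = decay_weights(n, delta)
    return [
        sum(lam[i] * doc_topic_vectors[i][z] for i in range(n))
        for z in range(k)
    ]


# Toy example: two returned documents, two topics, delta = 0.5.
v_q = query_embedding([[1.0, 0.0], [0.0, 1.0]], delta=0.5)  # approximately [2/3, 1/3]
```

Because the weights $λ_{i}$ sum to 1 and each $P(\cdot | d_{i})$ is a probability distribution, the resulting query vector is itself a distribution over topics, with higher-ranked documents contributing more.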

5.2.2. Hyper-parameter tuning

Similar to the training protocol presented in Section 4.1.2, we run each model for up to 200 epochs and perform a grid search to choose optimal hyper-parameters on the validation set. Following Vu et al. [49], we use $δ = 0.8$. We monitor the MRR score after each training epoch and obtain the highest MRR score on the validation set when using a margin of 5, the $l_{1}$-norm and a learning rate of $5 \times 10^{-3}$ for TransE; and when using $τ = 500$, the truncated normal distribution for filter initialization, and an initial learning rate of $5 \times 10^{-4}$ for ConvKB.

5.3. Results

Table 6
Experimental results on the test set. ⋆ denotes the results reported in Vu et al. [49]. SE: The original ranking returned by the search engine. CI: A baseline that uses a personalized navigation method based on previously clicked returned documents [43]. SP: A search personalization method that makes use of short-term profiles [1,52]. The percentages prefixed with + denote the relative improvement over the baseline TransE (our results)

Model	MRR	Hits@1 (%)
SE [⋆]	0.559	38.5
CI [43] [⋆]	0.597	41.6
SP [1,52] [⋆]	0.631	45.2
TransE [4] [⋆]	0.645	48.1
STransE [34] [⋆]	0.656	50.1
TransE (our results)	0.669	50.9
Our ConvKB model	0.750 (+12.1%)	59.9 (+17.7%)
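
As a quick worked check, the relative improvements reported with the ConvKB row of Table 6 can be recomputed from the table values, taking the "TransE (our results)" row as the baseline. The helper name below is hypothetical.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


mrr_gain = relative_improvement(0.750, 0.669)   # MRR: ConvKB vs. TransE (our results)
hits1_gain = relative_improvement(59.9, 50.9)   # Hits@1: ConvKB vs. TransE (our results)
print(f"MRR: +{mrr_gain:.1f}%  Hits@1: +{hits1_gain:.1f}%")
```

Rounded to one decimal place, these come to 12.1% for MRR and 17.7% for Hits@1, matching the table.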

Table 6 presents the experimental results of ConvKB and TransE together with the previously published results of other strong baselines; ConvKB obtains the highest MRR and Hits@1 scores. In particular, ConvKB does significantly better than TransE, with relative improvements of 12.1% for MRR and 17.7% for Hits@1. This is probably because our model can not only capture richer relational characteristics within the triple but also generalize the transitional relationships between embeddings of user queries and relevant documents for user profiles. We also obtain higher TransE results than those reported in Vu et al. [49]. The reason is that, for each valid triple, rather than using only one invalid triple as in [49], we take into account all of its invalid triples to train TransE (each valid or invalid triple contains a relevant- or irrelevant-labelled document, respectively).
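
The difference between the two TransE training setups can be illustrated with a small sketch of how valid and invalid triples might be paired for one logged query. All names are hypothetical. Taking the first invalid triple in the single-negative case is only a placeholder, since the way [49] selects its one invalid triple is not described here.

```python
from typing import List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (query, user, document)


def split_triples(
    query: str, user: str, top_docs: Sequence[str], sat_clicked: Set[str]
) -> Tuple[List[Triple], List[Triple]]:
    """Valid triples hold relevant (SAT-clicked) documents; invalid triples hold the rest."""
    valid = [(query, user, d) for d in top_docs if d in sat_clicked]
    invalid = [(query, user, d) for d in top_docs if d not in sat_clicked]
    return valid, invalid


def training_pairs(
    valid: Sequence[Triple], invalid: Sequence[Triple], use_all_invalid: bool = True
) -> List[Tuple[Triple, Triple]]:
    """Pair each valid triple with all of its invalid triples, or with a single one."""
    pairs = []
    for v in valid:
        # Placeholder choice when only one invalid triple is used per valid triple.
        negatives = invalid if use_all_invalid else invalid[:1]
        pairs.extend((v, neg) for neg in negatives)
    return pairs


docs = [f"doc{i}" for i in range(1, 11)]  # a hypothetical top-10 list
valid, invalid = split_triples("q", "u", docs, sat_clicked={"doc3"})
all_pairs = training_pairs(valid, invalid)                         # 9 (valid, invalid) pairs
one_pair = training_pairs(valid, invalid, use_all_invalid=False)   # 1 (valid, invalid) pair
```

With one SAT click in a top-10 list, using all invalid triples gives nine training pairs for that valid triple instead of one.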

6. Conclusion

In this paper, we propose a novel embedding model, ConvKB, for the knowledge base completion task. ConvKB applies a convolutional neural network to explore the global relationships among same-dimensional entries of the entity and relation embeddings, so that ConvKB generalizes the transitional characteristics in the transition-based embedding models. Experimental results show that our ConvKB model outperforms other state-of-the-art models on two benchmark datasets, WN18RR and FB15k-237, for the link prediction task, and on two other benchmark datasets, WN11 and FB13, for the triple classification task. ConvKB obtains the best mean rank and the highest Hits@10 on WN18RR and obtains the highest mean reciprocal rank and Hits@10 on FB15k-237. In addition, ConvKB produces the best accuracy on WN11 and the second best accuracy on FB13. Moreover, we show the effectiveness of ConvKB for search personalization, in which ConvKB outperforms the strong baselines on the query logs of a commercial web search engine.

In future work, we plan to extend ConvKB with relation path information to achieve better performance. We will also adapt ConvKB to other personalization tasks that can be modeled as a triple relationship, e.g., personalized query suggestion or auto-completion.

Our ConvKB implementation is available at: https://github.com/daiquocnguyen/ConvKB.

Bibliographic note

This paper extends our paper [32] published in Proceedings of NAACL-HLT 2018 (Volume 2: Short Papers). We first add a significantly improved analysis of the link prediction task. We then conduct a new extensive empirical study on the triple classification and search personalization tasks.

Footnotes

Acknowledgements

This research was partially supported by the Australian Research Council (ARC) DP150100031 and DP160103934.

References

1. P.N. Bennett, R.W. White, Chu, S.T. Dumais, Bailey, Borisyuk and Cui, Modeling the impact of short- and long-term behavior on search personalization, in: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’12, Portland, Oregon, USA, ACM, 2012, pp. 185–194. doi:10.1145/2348283.2348312.

2. D.M. Blei, A.Y. Ng and M.I. Jordan, Latent Dirichlet allocation, in: Advances in Neural Information Processing Systems, Vol. 14, T.G. Dietterich, Becker and Ghahramani, eds, MIT Press, 2002, pp. 601–608. http://papers.nips.cc/paper/2070-latent-dirichlet-allocation.pdf.

3. Bollacker, Evans, Paritosh, Sturge and Taylor, Freebase: A collaboratively created graph database for structuring human knowledge, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD’08, Vancouver, Canada, ACM, 2008, pp. 1247–1250. ISBN 978-1-60558-102-6. doi:10.1145/1376616.1376746.

4. Bordes, Usunier, Garcia-Duran, Weston and Yakhnenko, Translating embeddings for modeling multi-relational data, in: Advances in Neural Information Processing Systems, Vol. 26, C.J.C. Burges, Bottou, Welling, Ghahramani and K.Q. Weinberger, eds, Curran Associates, Inc., 2013, pp. 2787–2795.

5. Bordes, Weston, Collobert and Bengio, Learning structured embeddings of knowledge bases, in: Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI’11, AAAI Press, San Francisco, California, 2011, pp. 301–306. https://www.aaai.org/ocs/index.php/AAAI/AAAI11/paper/view/3659.

6. Cai, Wang and de Rijke, Behavior-based personalization in web search, Journal of the Association for Information Science and Technology 68(4) (2017), 855–868. doi:10.1002/asi.23735.

7. Cai and W.Y. Wang, KBGAN: Adversarial learning for knowledge graph embeddings, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, Association for Computational Linguistics, 2018, pp. 1470–1480. http://www.aclweb.org/anthology/N18-1133.

8. Cheng, Jialie and S.C. Hoi, On effective personalized music retrieval by exploring online user behaviors, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’16, Pisa, Italy, ACM, 2016, pp. 125–134. http://doi.org/10.1145/2911451.2911491.

9. Collobert, Weston, Bottou, Karlen, Kavukcuoglu and Kuksa, Natural language processing (almost) from scratch, Journal of Machine Learning Research 12 (2011), 2493–2537. www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf.

10. Dettmers, Minervini, Stenetorp and Riedel, Convolutional 2D knowledge graph embeddings, in: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI’18, AAAI Press, 2018, pp. 1811–1818. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17366.

11. Dou, Song and J.-R. Wen, A large-scale evaluation and analysis of personalized search strategies, in: Proceedings of the 16th International Conference on World Wide Web, WWW’07, Banff, Alberta, Canada, International World Wide Web Conferences Steering Committee, 2007, pp. 581–590. http://doi.org/10.1145/1242572.1242651.

12. Feng, Huang, Wang, Zhou, Hao and Zhu, Knowledge graph embedding by flexible translation, in: Proceedings of the Fifteenth International Conference on Principles of Knowledge Representation and Reasoning, KR’16, Cape Town, South Africa, AAAI Press, 2016, pp. 557–560. https://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12887.

13. Fox, Karnawat, Mydland, Dumais and White, Evaluating implicit measures to improve web search, ACM Transactions on Information Systems 23(2) (2005), 147–168. doi:10.1145/1059981.1059982.

14. García-Durán and Niepert, KBLRN: End-to-end learning of knowledge base representations with latent, relational, and numerical features, 2017, Preprint, arXiv:1709.04676.

15. Grbovic, Search ranking and personalization at Airbnb, in: Proceedings of the Eleventh ACM Conference on Recommender Systems, RecSys’17, Como, Italy, ACM, 2017, pp. 339–340. doi:10.1145/3109859.3109920.

16. Guu, Miller and Liang, Traversing knowledge graphs in vector space, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, Lisbon, Portugal, 2015, pp. 318–327. http://aclweb.org/anthology/D15-1038.

17. Hao, Zhang, Liu, He, Liu, Wu and Zhao, An end-to-end model for question answering over knowledge base with cross-attention combining global knowledge, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, Association for Computational Linguistics, 2017, pp. 221–231. http://aclweb.org/anthology/P17-1021.

18. Harvey, Crestani and M.J. Carman, Building user profiles from topic models for personalised search, in: Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM’13, San Francisco, California, USA, ACM, 2013, pp. 2309–2314. http://doi.org/10.1145/2505515.2505642.

19. Ji, He, Xu, Liu and Zhao, Knowledge graph embedding via dynamic mapping matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, Association for Computational Linguistics, 2015, pp. 687–696. http://www.aclweb.org/anthology/P15-1067.

20. Ji, Liu, He and Zhao, Knowledge graph completion with adaptive sparse transfer matrix, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, Phoenix, Arizona, AAAI Press, 2016, pp. 985–991. https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/11982.

21. Kasneci, F.M. Suchanek, Ifrim, Ramanath and Weikum, Naga: Searching and ranking knowledge, in: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE’08, IEEE Computer Society, Washington, DC, USA, 2008, pp. 953–962. doi:10.1109/ICDE.2008.4497504.

22. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, Association for Computational Linguistics, 2014, pp. 1746–1751. http://www.aclweb.org/anthology/D14-1181.

23. Kingma and Ba, Adam: A method for stochastic optimization, 2014, Preprint, arXiv:1412.6980.

24. Lecun, Bottou, Bengio and Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86(11) (1998), 2278–2324. doi:10.1109/5.726791.

25. Lehmann, Isele, Jakob, Jentzsch, Kontokostas, P.N. Mendes, Hellmann, Morsey, Van Kleef, Auer et al., DBpedia – a large-scale, multilingual knowledge base extracted from Wikipedia, Semantic Web 6 (2015), 167–195. https://doi.org/10.3233/SW-140134.

26. Lin, Liu, Luan, Sun, Rao and Liu, Modeling relation paths for representation learning of knowledge bases, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, Association for Computational Linguistics, 2015, pp. 705–714. http://aclweb.org/anthology/D15-1082.

27. Lin, Liu, Sun, Liu and Zhu, Learning entity and relation embeddings for knowledge graph completion, in: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, AAAI’15, Austin, Texas, AAAI Press, 2015, pp. 2181–2187. https://www.aaai.org/ocs/index.php/AAAI/AAAI15/paper/view/9571.

28. Liu, Modeling users’ dynamic preference for personalized recommendation, in: Proceedings of the 24th International Conference on Artificial Intelligence, IJCAI’15, Buenos Aires, Argentina, AAAI Press, 2015, pp. 1785–1791. https://www.ijcai.org/Proceedings/15/Papers/254.pdf.

29. Luo, Wang, Wang and Guo, Context-dependent knowledge graph embedding, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, Association for Computational Linguistics, 2015, pp. 1656–1661. http://aclweb.org/anthology/D15-1191.

30. Nanas, Vavalis and De Roeck, A network-based model for high-dimensional information filtering, in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’10, Geneva, Switzerland, ACM, 2010, pp. 202–209. http://doi.org/10.1145/1835449.1835485.

31. D.Q. Nguyen, An overview of embedding models of entities and relationships for knowledge base completion, 2017, Preprint, arXiv:1703.08098.

32. D.Q. Nguyen, T.D. Nguyen, D.Q. Nguyen and Phung, A novel embedding model for knowledge base completion based on convolutional neural network, in: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), New Orleans, Louisiana, Association for Computational Linguistics, 2018, pp. 327–333. http://www.aclweb.org/anthology/N18-2053.

33. D.Q. Nguyen, Sirts, Qu and Johnson, Neighborhood mixture model for knowledge base completion, in: Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany, Association for Computational Linguistics, 2016, pp. 40–50. http://www.aclweb.org/anthology/K16-1005.

34. D.Q. Nguyen, Sirts, Qu and Johnson, Stranse: A novel embedding model of entities and relationships in knowledge bases, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, Association for Computational Linguistics, 2016, pp. 460–466. http://www.aclweb.org/anthology/N16-1054.

35. Nickel, Murphy, Tresp and Gabrilovich, A review of relational machine learning for knowledge graphs, Proceedings of the IEEE 104(1) (2016), 11–33. doi:10.1109/JPROC.2015.2483592.

36. Nickel, Rosasco and Poggio, Holographic embeddings of knowledge graphs, in: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, AAAI Press, Phoenix, Arizona, 2016, pp. 1955–1961. https://www.aaai.org/ocs/index.php/AAAI/AAAI16/paper/view/12484.

37. Schlichtkrull, Kipf, Bloem, R.v.d. Berg, Titov and Welling, Modeling relational data with graph convolutional networks, 2017, Preprint, arXiv:1703.06103.

38. Schuhmacher and S.P. Ponzetto, Knowledge-based graph document modeling, in: Proceedings of the 7th ACM International Conference on Web Search and Data Mining, WSDM’14, New York, New York, USA, ACM, 2014, pp. 543–552. doi:10.1145/2556195.2556250.

39. Shen, Huang, Chang and Gao, Traversing knowledge graph in vector space without symbolic space guidance, 2017, Preprint, arXiv:1611.04642v4.

40. Socher, Chen, C.D. Manning and Ng, Reasoning with neural tensor networks for knowledge base completion, in: Advances in Neural Information Processing Systems, Vol. 26, C.J.C. Burges, Bottou, Welling, Ghahramani and K.Q. Weinberger, eds, Curran Associates, Inc., 2013, pp. 926–934.

41. F.M. Suchanek, Kasneci and G. Weikum, YAGO: A core of semantic knowledge, in: Proceedings of the 16th International Conference on World Wide Web, WWW’07, Banff, Alberta, Canada, International World Wide Web Conferences Steering Committee, 2007, pp. 697–706. http://doi.org/10.1145/1242572.1242667.

42. Teevan, S.T. Dumais and Horvitz, Personalizing search via automated analysis of interests and activities, in: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’05, Salvador, Brazil, ACM, 2005, pp. 449–456. doi:10.1145/1076034.1076111.

43. Teevan, D.J. Liebling and G.R. Geetha, Understanding and predicting personal navigation, in: Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, WSDM’11, Hong Kong, China, ACM, 2011, pp. 85–94. http://doi.org/10.1145/1935826.1935848.

44. Teevan, M.R. Morris and Bush, Discovering and using groups to improve personalized search, in: Proceedings of the Second ACM International Conference on Web Search and Data Mining, WSDM’09, Barcelona, Spain, ACM, 2009, pp. 15–24. doi:10.1145/1498759.1498786.

45. Toutanova and Chen, Observed versus latent features for knowledge base and text inference, in: Proceedings of the 3rd Workshop on Continuous Vector Space Models and Their Compositionality, Beijing, China, Association for Computational Linguistics, 2015, pp. 57–66. http://www.aclweb.org/anthology/W15-4007.

46. Toutanova, Chen, Pantel, Poon, Choudhury and Gamon, Representing text for joint embedding of text and knowledge bases, in: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, Association for Computational Linguistics, 2015, pp. 1499–1509. http://aclweb.org/anthology/D15-1174.

47. Toutanova, Lin, W.-t. Yih, Poon and Quirk, Compositional learning of embeddings for relation paths in knowledge base and text, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, Association for Computational Linguistics, 2016, pp. 1434–1444. http://www.aclweb.org/anthology/P16-1136.

48. Trouillon, Welbl, Riedel, Gaussier and Bouchard, Complex embeddings for simple link prediction, in: Proceedings of the 33rd International Conference on Machine Learning, ICML’16, Vol. 48, New York, NY, USA, JMLR.org, 2016, pp. 2071–2080. http://proceedings.mlr.press/v48/trouillon16.pdf.

49. Vu, D.Q. Nguyen, Johnson, Song and Willis, Search personalization with embeddings, in: Proceedings of the European Conference on Information Retrieval, Aberdeen, Scotland, Springer, 2017, pp. 598–604. doi:10.1007/978-3-319-56608-5_54.

50. Vu, Song, Willis, S.N. Tran and Li, Improving search personalisation with dynamic group formation, in: Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval, SIGIR’14, Gold Coast, Queensland, Australia, ACM, 2014, pp. 951–954. http://doi.org/10.1145/2600428.2609482.

51. Vu, Willis, Kruschwitz and Song, Personalised query suggestion for intranet search with temporal user profiling, in: Proceedings of the 2017 Conference on Human Information Interaction and Retrieval, CHIIR’17, Oslo, Norway, ACM, 2017, pp. 265–268. doi:10.1145/3020165.3022129.

52. Vu, Willis, S.N. Tran and Song, Temporal latent topic user profiles for search personalisation, in: Proceedings of the European Conference on Information Retrieval, Springer International Publishing, Vienna, Austria, 2015, pp. 605–616. https://doi.org/10.1007/978-3-319-16354-3_67.

53. Wang and Li, Text-enhanced representation learning for knowledge graph, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, New York, New York, USA, AAAI Press, 2016, pp. 1293–1299. https://www.ijcai.org/Proceedings/16/Papers/187.pdf.

54. Wang, Zhang, Feng and Chen, Knowledge graph embedding by translating on hyperplanes, in: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, AAAI’14, Quebec City, Quebec, Canada, AAAI Press, 2014, pp. 1112–1119. https://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8531.

55. West, Gabrilovich, Murphy, Sun, Gupta and Lin, Knowledge base completion via search-based question answering, in: Proceedings of the 23rd International Conference on World Wide Web, WWW’14, Seoul, Korea, International World Wide Web Conferences Steering Committee, 2014, pp. 515–526. http://doi.org/10.1145/2566486.2568032.

56. R.W. White, Chu, Hassan, He, Song and Wang, Enhancing personalized search by mining and modeling task behavior, in: Proceedings of the 22nd International Conference on World Wide Web, WWW’13, Rio de Janeiro, Brazil, International World Wide Web Conferences Steering Committee, 2013, pp. 1411–1420. http://doi.org/10.1145/2488388.2488511.

57. Xiao, Huang and Zhu, From one point to a manifold: Knowledge graph embedding for precise link prediction, in: Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI’16, New York, New York, USA, AAAI Press, 2016, pp. 1315–1321. https://www.ijcai.org/Proceedings/16/Papers/190.pdf.

58. Xiao, Huang and Zhu, Transg: A generative model for knowledge graph embedding, in: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, Association for Computational Linguistics, 2016, pp. 2316–2325. http://www.aclweb.org/anthology/P16-1219.

59. Xie, Ma, Dai and Hovy, An interpretable knowledge transfer model for knowledge base completion, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, Association for Computational Linguistics, 2017, pp. 950–962. http://aclweb.org/anthology/P17-1088.

60. Xiong, Power and Callan, Explicit semantic ranking for academic search via knowledge graph embedding, in: Proceedings of the 26th International Conference on World Wide Web, WWW’17, Perth, Australia, International World Wide Web Conferences Steering Committee, 2017, pp. 1271–1279. https://doi.org/10.1145/3038912.3052558.

61. Yang and Mitchell, Leveraging knowledge bases in lstms for improving machine reading, in: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, Association for Computational Linguistics, 2017, pp. 1436–1446. http://aclweb.org/anthology/P17-1132.

62. Yang, W.-t. Yih, He, Gao and Deng, Embedding entities and relations for learning and inference in knowledge bases, in: Proceedings of the International Conference on Learning Representations, 2015. https://arxiv.org/abs/1412.6575.

63. Yang, Yang and W.W. Cohen, Differentiable learning of logical rules for knowledge base reasoning, in: Advances in Neural Information Processing Systems, Vol. 30, Guyon, U.V. Luxburg, Bengio, Wallach, Fergus, Vishwanathan and Garnett, eds, Curran Associates, Inc., 2017, pp. 2319–2328.

64. Yang, Guo, Song, Meng, Shokouhi, McDonald and W.B. Croft, Modeling user interests for zero-query ranking, in: Proceedings of the European Conference on Information Retrieval, Springer International Publishing, 2016, pp. 171–184. https://doi.org/10.1007/978-3-319-30671-1_13.

65. H.-G. Yoon, H.-J. Song, S.-B. Park and S.-Y. Park, A translation-based knowledge graph embedding preserving logical property of relations, in: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, Association for Computational Linguistics, 2016, pp. 907–916. http://www.aclweb.org/anthology/N16-1105.

66. Zhang, Liu, He, Ji, Liu, Wu and Zhao, Question answering over knowledge base with neural attention combining global knowledge information, 2016, Preprint, arXiv:1606.00979.


A convolutional neural network-based model for knowledge base completion and its application to search personalization

¹ Some previous works also reported Hits@1. However, the formulas of MRR and Hits@1 show a strong correlation between these two scores. So using Hits@1 does not really reveal any additional information for this task.

² We employ a TransE implementation available at: https://github.com/datquocnguyen/STransE.
