Sage Journals: Discover world-class research

Abstract

Sequential recommendation aims to predict users’ future activities based on their historical interaction sequences. Various neural network architectures, such as Recurrent Neural Networks (RNN), Graph Neural Networks (GNN), and self-attention mechanisms, have been employed for this task, exploring multiple aspects of user preferences, including general interests, short-term interests, long-term interests, and item co-occurrence patterns. Although these methods achieve good performance, they still have limitations in capturing complex user preferences. Specifically, current RNN- and GNN-based structures capture only item-level transition relations and neglect attribute-level transition relations. Additionally, item co-occurrence modules model explicit item relations but cannot capture implicit item-item relations. To address these issues, we propose a knowledge-augmented Gated Recurrent Unit (GRU) to improve the short-term user interest module and adopt a collaborative item aggregation method to enhance the item co-occurrence module. Additionally, our long-term interest module uses a bitwise gating mechanism to select the historical item features that are significant to users’ current preferences. We extensively evaluate our model against competitive methods on three real-world datasets, demonstrating its effectiveness in top-$K$ sequential recommendation.

Keywords

Knowledge graph aggregation, collaborative item aggregation, user preferences, sequential recommendation

1. Introduction

With the explosive growth of online information, recommender systems play an increasingly important role in understanding users’ requirements and providing personalized suggestions. In real-world scenarios, user interaction sequences are generated chronologically; they reflect users’ dynamic interests, and adjacent interactions are strongly correlated [1]. Therefore, unlike traditional recommendation tasks that model users’ preferences statically, sequential recommendation captures dynamic preferences and predicts the next items a user will interact with, which better suits real-world requirements.

Recurrent neural network (RNN)-based and graph neural network (GNN)-based sequential recommendation models [2, 3, 4] are effective at learning transition relations among items and modeling short-term user preferences, but have difficulty capturing long-range dependencies. One way to deal with this problem is to adopt a self-attention mechanism, as in SASRec [5] and Bert4Rec [6]. Another way is to model the short-term and long-range context separately, as in CLSR [7] and SLi-Rec [8]. In recent work, MA-GNN [9] models user general interests, short-term interests, long-term interests and item co-occurrence patterns simultaneously to fully describe user preferences and capture the joint occurrence of related items. These works achieve good performance but still have limitations. First, they consider only item transition relations, neglecting attribute-level transition patterns. Second, modeling the items within an interaction sequence can only distill explicit item co-occurrence patterns, ignoring implicit item-item relations.

To address the limitations of existing sequential recommendation models, we propose integrating a knowledge graph (KG) and collaborative signals to jointly conduct sequence modeling. As shown in Fig. 1, the middle arrow illustrates Alice’s behavior sequence $e_{0},\ldots,e_{n}$. The lower part shows the KG with item-side information. Incorporating it into sequence modeling helps capture attribute-level transition patterns and introduces semantic relatedness among items. The upper part is the user-item graph, which shows that users whose interactions are similar to Alice’s also prefer the movies ‘Avatar II’, ‘Avengers’ and ‘Flipped’. Therefore, collaborative signals can help learn implicit item-item relations. From these two aspects, we could build the recommendation list for Alice at the next time step with ‘Avengers’ in first place. One reason is that ‘Avengers’ is closely related to the sequence in the KG through common actors, genres, etc. In addition, according to user behaviors, ‘Avengers’ has the highest co-occurrence frequency with the current sequence. Since the movies ‘Avatar II’ and ‘Hulk’ have different degrees of correlation with the sequence, they are ranked behind ‘Avengers’ as supplements to the recommendation list. Under the joint influence of the KG and collaborative signals, we can learn more diverse and intricate user preferences.

Figure 1.

An illustration of Alice’s interaction sequence.

Based on the above observations, we propose Aggregating Knowledge and Collaborative Information for Sequential Recommendation (AKCISR) to improve recommendation accuracy. It consists of a general interest module, a short-term interest module, a long-term interest module and an item co-occurrence module. Specifically, AKCISR first incorporates external information into item embeddings with a knowledge graph embedding (KGE) method. Then, user preferences are modelled through the four modules. User vectors represent general interests. In the short-term module, we combine knowledge propagation with a gated recurrent unit (GRU) to capture the user’s attribute-level and item-level preference transitions in an end-to-end manner. In the long-term module, a bitwise gating mechanism dynamically discriminates the importance of each item feature. In the co-occurrence module, we propose a collaborative item aggregation mechanism that enriches item representations with collaborative signals. To summarize, the major contributions of this paper are as follows:

• We embed items in users’ interactions with knowledge-based (KB) information to capture users’ interests.

• We design a KG-propagation-augmented recurrent neural network that learns the structural and semantic information of the KG as well as item transition relations.

• We aggregate collaborative information from interactions to capture implicit item co-occurrence patterns.

• Experiments on three real-world datasets demonstrate the efficacy of AKCISR over several baselines.

2. Related work

KGs are commonly introduced in traditional recommendation tasks to supplement external information and enrich representations, but are rarely used in sequential recommendation. In this section, we briefly review the recent advances in KG-based recommendation, sequential recommendation, and their combination that inspired our work.

2.1 Knowledge graphs for recommendation

Introducing a KG into a recommender system helps improve its diversity and interpretability. According to the survey [10], existing KG-based recommendation algorithms fall into three main categories. The first is the embedding-based method, which usually embeds items together with external knowledge into low-dimensional vectors to enrich item or user representations. For instance, DKN [11] and KSR [12] adopt a two-stage method: they pretrain item embeddings with the KG and then load the embeddings into the recommendation model. CKE [13] and SHINE [14] train the KGE module and the recommendation module jointly, providing additional regularization terms for the recommendation task. The second is the connection-based method, which provides interpretability through path connectivity; however, it is computationally intensive and time-consuming. The third is the propagation-based method, which obtains enriched user and item representations by combining the semantic representations and connectivity information of entities. Representative works such as RippleNet [15], KGCN [16] and KGAT [17] demonstrate significant superiority over many other baselines.

2.2 Sequential recommendation

Sequential recommendation aims to learn changes in users’ interests and predict their subsequent behavior by modeling historical sequences. Traditional algorithms apply Markov chains to model the relevance between behaviors. For instance, FPMC [18] constructs an item transition matrix from historical interactions, while Fossil [19] alleviates the sparsity problem by combining high-order Markov chains with similarity-based methods.

With the rise of deep learning, basic deep neural networks based on RNNs, Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs) have proved helpful for predicting successive items. RNNs are suitable for learning transition patterns between items; classical RNN-based models such as GRU4Rec [2] and NARM [3] perform well in session-based recommendation tasks. CNNs can capture global information of sequences: Caser [1], for example, captures point-level, union-level and skip behaviors through horizontal and vertical convolutional filters. Wu et al. [20] propose mapping session sequences into graphs so that complex item transitions can be taken into account by GNNs, which led to the rise of GNN-based sequential recommendation. However, all the above studies struggle with long-term dependencies. To address this limitation, SASRec [5] and BERT4Rec [6] leverage a self-attention structure to model long sequences, while RUM [21] uses a memory network to store historical user preferences. MA-GNN [9] combines multiple channels to improve recommendation accuracy effectively, but neglects the associated information of items, which leads to insufficient item embeddings. Therefore, a few sequential recommendation methods introduce KGs to capture more fine-grained user preferences.

2.3 Knowledge graph for sequential recommendation

KSR [12] and ADKGN [4] incorporate the KG with a KGE method, which takes semantic representations into consideration but neglects connectivity patterns. Note that KSR [12] also proposes a memory network to integrate external knowledge; although this improves interpretability, attribute transition relations are difficult to capture. KASR [22] adopts a propagation-based method and an RNN to model user preferences. However, its attention score is computed only from relation and item embeddings, lacking personalization. In addition, because the sequences are segmented by a sliding window, only incomplete records are used to model user preferences, which is insufficient.

Building on the work above, our model also considers multiple factors, namely general interests, short-term interests, long-term interests and the co-occurrence probability of items. The difference is that we integrate KB information into the short-term interest module in a personalized way and incorporate collaborative filtering (CF) signals into the item co-occurrence module. Therefore, AKCISR not only captures the transition and co-occurrence relationships between items, but also leverages KB and CF information to assist decision-making.

3. Problem formulation

We address the recommendation task under implicit feedback. Each user’s interactions are ordered chronologically as $S_{u}=\{e_{1}^{u},e_{2}^{u},e_{3}^{u},\ldots,e^{u}_{\lvert E^{u}\rvert}\}$, where $e_{*}^{u}$ are the items that user $u$ has interacted with sequentially. In addition, the item set is equipped with a corresponding KG $G$, formed by a set of KB triples. A KB triple $(e_{h},r,e_{t})$ states that entities $e_{h}$ and $e_{t}$ are connected by relation $r$, where $e_{h}$ and $e_{t}$ belong to the entity set $E$ and $r$ belongs to the relation set $R$. Note that the item set $I$ in sequential recommendation is a subset of the KB entity set $E$.

Given the user-item historical interaction sequences and the KG, the problem is to choose a list of items from the $\lvert I\rvert$ items for each user and evaluate whether the chosen items appear in the next few time steps. Our goal is therefore to learn a prediction function $\hat{y}_{S_{u},e}=f(S_{u},u,e)$, where $\hat{y}_{S_{u},e}$ denotes the probability that user $u$ will engage with item $e$ next, given the user’s interaction sequence $S_{u}$.

4. Methodology

The framework of our method is illustrated in Fig. 2. We start with an embedding module based on a KGE method, and then model user preferences through four sub-modules: the general interest module, the short-term interest module, the long-term interest module and the item co-occurrence module. Finally, we introduce the prediction and training process.

Figure 2.

The architecture of AKCISR. AKCISR consists of three major components: the embedding module, the user interest module and the prediction module. $\sim$ denotes the softmax function and $\otimes$ denotes element-wise (bitwise) multiplication.

4.1 Embedding module

The embedding module is shown as (a) in Fig. 2. We convert the entities and relations in the KG into low-dimensional vector representations with TransD [23] to preserve structural information. Compared with earlier KGE methods, TransD is a more fine-grained framework that regards relations as translations from head entities to tail entities and represents each entity or relation by two vectors. Specifically, assume the dimension of both the entity space and the relation space is $d$. A triplet $(e_{h},r,e_{t})$ has embedding vectors $e_{h},r,e_{t}\in R^{d}$ for the head, relation and tail, respectively, and corresponding projection vectors $e_{hp},e_{rp},e_{tp}\in R^{d}$, where the subscript $p$ marks a projection vector. Two mapping matrices $M_{r}^{h},M_{r}^{t}\in R^{d*d}$ project entities from the entity space to the relation space and are defined as:

$\displaystyle M_{r}^{h}=e_{rp}e^{T}_{hp}+I^{d*d},$ (1) $\displaystyle M_{r}^{t}=e_{rp}e^{T}_{tp}+I^{d*d},$ (2)

where $I^{d*d}$ represents the identity matrix. Then, the projected vectors can be:

$\displaystyle e_{h}^{r}=e_{h}M_{r}^{h},$ (3) $\displaystyle e_{t}^{r}=e_{t}M_{r}^{t}.$ (4)

The score function is correspondingly defined as:

$\displaystyle f_{r}(e_{h},e_{t})={\|e_{h}^{r}+r-e_{t}^{r}\|}_{2}^{2}.$ (5)

For each valid triplet $(e_{h},r,e_{t})$, a broken triplet $(e_{h},r,e^{\prime}_{t})$ is formed by replacing the tail entity through negative sampling. The loss is then computed as:

$\displaystyle L_{KG}=\sum_{(e_{h},r,e_{t},e^{\prime}_{t})\in T}\max(0,\gamma+f_{r}(e_{h},e_{t})-f_{r}(e_{h},e^{\prime}_{t})),$ (6)

where $T=\{(e_{h},r,e_{t},e^{\prime}_{t})\mid(e_{h},r,e_{t})\in G,(e_{h},r,e^{\prime}_{t})\notin G\}$ and $\gamma$ is the margin separating valid (golden) triplets from negative triplets. This layer models entities and relations together with their connection information, thereby enhancing their representation ability.
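To make Eqs. (1)-(6) concrete, here is a minimal sketch in plain Python lists. It is an illustration under stated assumptions, not the authors’ implementation: it follows the row-vector convention $e_{h}M_{r}^{h}$ used in Eqs. (3)-(4), and the helper names (`mapping_matrix`, `transd_score`, `margin_term`) and the toy vectors are hypothetical.

```python
# Sketch of the TransD-style projection, score and margin loss of Eqs. (1)-(6).
# Vectors are row vectors, matching e_h M_r^h in Eqs. (3)-(4).

def outer(a, b):
    """Outer product a b^T as a list-of-lists matrix."""
    return [[ai * bj for bj in b] for ai in a]


def mapping_matrix(rel_proj, ent_proj):
    """Eqs. (1)-(2): M = e_rp e_p^T + I, a d x d matrix."""
    m = outer(rel_proj, ent_proj)
    for i in range(len(m)):
        m[i][i] += 1.0
    return m


def row_times(v, m):
    """Row vector v (length d) times matrix m (d x d), as in Eqs. (3)-(4)."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def transd_score(h, h_p, r, r_p, t, t_p):
    """Eq. (5): ||e_h^r + r - e_t^r||_2^2; lower means more plausible."""
    h_r = row_times(h, mapping_matrix(r_p, h_p))
    t_r = row_times(t, mapping_matrix(r_p, t_p))
    return sum((a + b - c) ** 2 for a, b, c in zip(h_r, r, t_r))


def margin_term(pos_score, neg_score, gamma=1.0):
    """One summand of Eq. (6): max(0, gamma + f(valid) - f(broken))."""
    return max(0.0, gamma + pos_score - neg_score)


# Toy usage: the valid triplet should score lower than the broken one.
pos = transd_score([1.0, 0.0], [0.1, 0.0], [0.5, 0.5], [0.0, 0.1], [1.5, 0.5], [0.0, 0.0])
neg = transd_score([1.0, 0.0], [0.1, 0.0], [0.5, 0.5], [0.0, 0.1], [-1.0, 2.0], [0.0, 0.0])
print(pos, neg, margin_term(pos, neg))
```

When the broken triplet already scores worse than the valid one by more than $\gamma$, the hinge term is zero and contributes no gradient.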

4.2 User interest module

As shown in (b) of Fig. 2, the user interest module contains four sub-modules: the general interest module, the short-term interest module, the long-term interest module and the item co-occurrence module. Each is described below.

4.2.1 General interest module

To capture a user’s inherent preferences, we take the user embedding $u\in R^{d}$ as the representation of the user’s general interests, where $d$ is the embedding dimension.

4.2.2 Short-term interest module

Each user’s interaction sequence $S_{u}=\{e_{1}^{u},e_{2}^{u},e_{3}^{u},\ldots,e_{\lvert E^{u}\rvert}^{u}\}$ is segmented into short-term sequences by a sliding window of size $Z$: $S_{u,z}=\{e_{z}^{u},e_{z+1}^{u},\ldots,e_{z+\lvert Z\rvert-1}^{u}\}$ is the $z$th short-term sequence, and $e_{z+Z}^{u}$ is its target.
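The segmentation can be sketched as follows. This is a hypothetical helper (`split_sequence`), assuming a stride of 1 between consecutive windows and 0-based list indices; it also pairs each window with the long-term prefix $L_{u,z}=\{e^{u}_{1},\ldots,e^{u}_{z}\}$ defined in Section 4.2.3.

```python
def split_sequence(seq, Z):
    """Return (long-term prefix, short-term window, target) triples.

    With 1-based z as in the text: the window holds Z items starting at e_z,
    the target is e_{z+Z}, and the long-term sequence is e_1 .. e_z.
    Assumption: windows advance by one position (stride 1).
    """
    samples = []
    for start in range(len(seq) - Z):          # 0-based index of e_z
        long_term = seq[:start + 1]             # e_1 .. e_z
        window = seq[start:start + Z]           # e_z .. e_{z+Z-1}
        target = seq[start + Z]                 # e_{z+Z}
        samples.append((long_term, window, target))
    return samples


for long_term, window, target in split_sequence(["a", "b", "c", "d", "e"], Z=3):
    print(long_term, window, target)
```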

For each item (entity) representation $e$, its adjacent entities and relations can be extracted from the KG $G$. The first-order neighbors are entities that carry attribute-level information, so items are connected through their common attributes. In our model, a fixed number $N$ of neighbors is sampled randomly for each item: the fixed size meets the requirements of matrix computation, and random sampling reduces the computational scale. The $N$ sampled neighbors of all entities form a matrix $NX\in R^{\lvert E\rvert*N}$, and we use $r_{e_{h},e_{t}}$ to denote the relation between $e_{h}$ and $e_{t}$.
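A minimal sketch of building the fixed-size neighbor matrix $NX$ is shown below. The text only states that $N$ neighbors are sampled randomly; the fallback of sampling with replacement when an entity has fewer than $N$ neighbors is an assumption (a common choice in KGCN-style code), and the function name and toy entity ids are hypothetical.

```python
import random


def sample_neighbors(adj, N, seed=0):
    """Build NX: exactly N (relation, tail) pairs per entity.

    adj maps an entity id to a non-empty list of (relation, tail) pairs.
    Assumption: sample without replacement when an entity has at least N
    neighbors and with replacement otherwise, so every row has length N.
    """
    rng = random.Random(seed)   # seeded for reproducibility
    nx = {}
    for entity, neighbors in adj.items():
        if len(neighbors) >= N:
            nx[entity] = rng.sample(neighbors, N)
        else:
            nx[entity] = [rng.choice(neighbors) for _ in range(N)]
    return nx


kg = {
    "item_1": [("actor", "actor_7"), ("genre", "genre_2"), ("director", "director_3")],
    "genre_2": [("genre_of", "item_1")],
}
print(sample_neighbors(kg, N=2))
```

The fixed row length is what allows the later aggregation steps to be batched as dense matrix operations.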

Similar to KGCN, we perform neighborhood aggregation on each item in the interaction sequences for a given user, corresponding to the yellow box in Fig. 2. The attention score $\pi_{r}^{u}$ , obtained by the inner product of the user vector and relation vector, can be used to judge which relationships the user pays more attention to. The function is defined as:

$\displaystyle\pi^{u}_{r_{e_{h},e_{t}}}=g(u,r_{e_{h},e_{t}}),$ (7)

where $u\in R^{d}$ and $r\in R^{d}$ are the user and relation embeddings, and the function $g$ is the inner product. The normalized score $\widetilde{\pi}_{r}^{u}$ assigns weights to the neighboring entities:

$\displaystyle\widetilde{\pi}^{u}_{r_{e_{h},e_{t}}}=\frac{\exp(\pi^{u}_{r_{e_{h},e_{t}}})}{\sum_{e_{k}\in NX(e_{h})}\exp(\pi^{u}_{r_{e_{h},e_{k}}})}.$ (8)

Given this, the neighborhood aggregation can be computed by linear combination:

$\displaystyle e^{u}_{NX(e_{h})}=\sum_{e_{t}\in NX(e_{h})}\widetilde{\pi}^{u}_{r_{e_{h},e_{t}}}e_{t}.$ (9)
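The user-aware neighborhood aggregation of Eqs. (7)-(9) can be sketched for a single head entity as follows. This is an illustrative sketch, not the authors’ code: embeddings are plain Python lists stored in dictionaries, and `user_aware_aggregate` and the toy ids are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)                                 # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def user_aware_aggregate(user, neighbors, rel_emb, ent_emb):
    """Eqs. (7)-(9) for one head entity e_h.

    neighbors: the sampled (relation, tail) pairs NX(e_h).
    Returns the neighborhood vector e^u_{NX(e_h)} and the attention weights.
    """
    scores = [dot(user, rel_emb[r]) for r, _ in neighbors]     # Eq. (7): pi = <u, r>
    weights = softmax(scores)                                   # Eq. (8)
    dim = len(user)
    agg = [sum(w * ent_emb[t][k] for w, (_, t) in zip(weights, neighbors))
           for k in range(dim)]                                 # Eq. (9)
    return agg, weights


user = [1.0, 0.0]                        # this user aligns with the first latent factor
rel_emb = {"actor": [2.0, 0.0], "genre": [0.0, 2.0]}
ent_emb = {"actor_7": [1.0, 0.0], "genre_2": [0.0, 1.0]}
vec, w = user_aware_aggregate(user, [("actor", "actor_7"), ("genre", "genre_2")], rel_emb, ent_emb)
print(w, vec)   # the "actor" neighbor receives the larger weight
```

Because the scores depend on $u$, two users who interacted with the same item can receive different neighborhood vectors for it, which is the personalization the text contrasts with KASR.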

We adopt a sum aggregator to combine the entity embedding $e_{h}$ with its neighborhood embedding $e^{u}_{NX(e_{h})}\in R^{d}$ into the final aggregated vector:

$\displaystyle f_{\textit{agg}}(e_{h},e^{u}_{NX(e_{h})})=\sigma(W_{f}\cdot(e_{h}+e^{u}_{NX(e_{h})})+b_{f}),$ (10)

where $W_{f}\in R^{d*d}$ is a weight matrix, $b_{f}\in R^{d}$ is a bias vector, and $\sigma$ is the sigmoid function. To capture high-order semantic information for each entity, we repeat the above procedure and stack multiple aggregation layers. Formally, for $a=1,\ldots,A$, we iteratively define the $a$th-order information aggregation of $e$ as:

$\displaystyle e^{au}=e^{au}_{h}=f_{\textit{agg}}(e^{(a-1)u}_{h},e^{(a-1)u}_{NX(e_{h})}),$ (11)

where the updated embedding $e^{au}_{h}$, also denoted $e^{au}$, is derived from the central entity representation $e^{(a-1)u}_{h}$ and its neighborhood aggregation $e^{(a-1)u}_{NX(e_{h})}$ at the previous layer. The $z$th short-term interaction sequence of user $u$ enriched with neighborhood knowledge is thus $S^{\prime}_{u,z}=\{e_{z}^{au},e_{z+1}^{au},\ldots,e_{z+\lvert Z\rvert-1}^{au}\}$. The knowledge propagation process is detailed in Algorithm 4.2.2.

The sequences are fed into a GRU, which is well suited to learning item transition patterns. Because side information has already been incorporated, users’ attribute-level preference transitions can also be captured. For each input, the hidden state is computed as:

$\displaystyle h_{t}=\textit{GRU}(h_{t-1},e_{i}^{au},\phi),$ (12)

where $h_{t-1}$ is the state of the previous step, $e_{i}^{au}$ is the aggregated item embedding, and $\phi$ represents all the parameters of GRU.
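The recurrence in Eq. (12) can be illustrated with a from-scratch GRU step. The paper does not spell out the gate equations, so this sketch assumes one common formulation (update gate $z$, reset gate $r$, candidate state $n$, and $h_{t}=(1-z)\odot h_{t-1}+z\odot n$); gate conventions differ slightly between references and libraries. The parameter names inside `p` and the toy values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h, p):
    """One update h_t = GRU(h_{t-1}, x_t; phi) in a common gate formulation.

    p holds W_* (hidden x input), U_* (hidden x hidden) and b_* (hidden).
    """
    def gate(W, U, b, hidden, act):
        return [act(a + c + d) for a, c, d in zip(matvec(W, x), matvec(U, hidden), b)]

    z = gate(p["W_z"], p["U_z"], p["b_z"], h, sigmoid)           # update gate
    r = gate(p["W_r"], p["U_r"], p["b_r"], h, sigmoid)           # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    n = gate(p["W_n"], p["U_n"], p["b_n"], reset_h, math.tanh)   # candidate state
    return [(1.0 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]


def run_gru(xs, p, hidden_size):
    """Feed the aggregated embeddings e_i^{au} in order and return h_1 .. h_t."""
    h = [0.0] * hidden_size
    states = []
    for x in xs:
        h = gru_step(x, h, p)
        states.append(h)
    return states


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


d_in, d_h = 2, 2
params = {f"{k}_{g}": (zeros(d_h, d_in) if k == "W" else zeros(d_h, d_h))
          for k in ("W", "U") for g in ("z", "r", "n")}
params.update({f"b_{g}": [0.0] * d_h for g in ("z", "r")})
params["b_n"] = [1.0] * d_h               # toy bias so the state moves away from zero
states = run_gru([[0.3, -0.2], [0.1, 0.4]], params, d_h)
print(states[-1])
```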

We take $h_{1},\ldots,h_{t}$ as new item representations that carry both sequential and semantic information. To determine which items are more informative about user preferences, we apply an attention model similar to that in NARM [3]:

$\displaystyle\alpha_{i}=W_{a}^{3}\cdot\sigma(W_{a}^{1}h_{t}+W_{a}^{2}h_{i}),$ (13) $\displaystyle p_{u,z}^{S}=\left(\sum_{i=1}^{t}\alpha_{i}h_{i}\right)\|h_{t},$ (14)

where $W_{a}^{1},W_{a}^{2},W_{a}^{3}\in R^{d*d}$ are learnable parameters, and $h_{t}$ and $h_{i}$ are the final hidden state and the state at the $i$th step, respectively. $\|$ denotes concatenation. $p_{u,z}^{S}\in R^{2d}$ represents the user’s final short-term interests, obtained by concatenating the weighted aggregation of the states $\sum_{i=1}^{t}\alpha_{i}h_{i}$ with the final state $h_{t}$.
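The readout of Eqs. (13)-(14) is sketched below. One assumption deserves attention: for $\alpha_{i}$ to be a scalar weight, the sketch treats $W_{a}^{3}$ as a length-$d$ vector, as in NARM, rather than the $d\times d$ matrix stated above. The weights are left unnormalized because Eq. (13) applies no softmax. The function name and toy inputs are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def attention_readout(states, W1, W2, w3):
    """Eqs. (13)-(14): alpha_i = w3 . sigmoid(W1 h_t + W2 h_i); p^S = (sum_i alpha_i h_i) || h_t.

    Assumption: w3 is a length-d vector so that alpha_i is a scalar (NARM-style).
    """
    h_t = states[-1]
    base = matvec(W1, h_t)
    alphas = []
    for h_i in states:
        s = [sigmoid(a + b) for a, b in zip(base, matvec(W2, h_i))]
        alphas.append(sum(w * si for w, si in zip(w3, s)))
    context = [sum(a * h_i[k] for a, h_i in zip(alphas, states)) for k in range(len(h_t))]
    return context + h_t          # list concatenation gives a 2d-dimensional vector


H = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
print(attention_readout(H, Z2, Z2, [1.0, 1.0]))
```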

Algorithm 4.2.2: Constructing Knowledge-propagation Interaction Sequences
Input: knowledge graph $G=(E,R)$; interaction sequence $S_{u,z}=\{e_{z}^{u},e_{z+1}^{u},\ldots,e_{z+\lvert Z\rvert-1}^{u}\}$; neighbor sampling size $N$; aggregation depth $A$.
Output: knowledge-propagation interaction sequence $S^{\prime}_{u,z}=\{e_{z}^{au},e_{z+1}^{au},\ldots,e_{z+\lvert Z\rvert-1}^{au}\}$.
1: initialize the neighbor matrix $NX$
2: for each $e\in E$ do
3:   $NX[e]\leftarrow$ sample $N$ neighbors of $e$ in the KG
4: end for
5: for each $e$ in $S_{u,z}$ do
6:   $\{\textit{Field}[a]\}_{a=0}^{A}\leftarrow$ Get-$A$-Order-Propagation-Field($e$, $N$)
7:   for $a=1,\ldots,A$ do
8:     for each $e_{h}\in\textit{Field}[a]$ do
9:       $e^{(a-1)u}_{NX(e_{h})}=\sum_{e_{t}\in NX(e_{h})}\widetilde{\pi}^{u}_{r_{e_{h},e_{t}}}e_{t}^{(a-1)u}$  (Eq. (9))
10:      $e^{au}=e^{au}_{h}=f_{\textit{agg}}(e^{(a-1)u}_{h},e^{(a-1)u}_{NX(e_{h})})$  (Eqs. (10) and (11))
11:    end for
12:  end for
13:  $S^{\prime}_{u,z}\leftarrow\textit{concat}(S^{\prime}_{u,z},e^{au})$
14: end for
15: return $S^{\prime}_{u,z}$
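The algorithm above can be sketched end to end in plain Python, combining Eqs. (7)-(11). Several details are assumptions rather than statements from the paper: the receptive field is expanded breadth-first (standing in for Get-$A$-Order-Propagation-Field), a single $W_{f},b_{f}$ pair is shared across layers because Eq. (10) carries no layer index, and $NX$ is assumed to be already sampled with the same size $N$ for every entity. All function names and toy embeddings are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sum_aggregator(e_h, e_nx, W, b):
    """Eq. (10): sigmoid(W (e_h + e_NX) + b), applied row by row."""
    s = [x + y for x, y in zip(e_h, e_nx)]
    return [1.0 / (1.0 + math.exp(-(dot(row, s) + bi))) for row, bi in zip(W, b)]


def propagate(item, user, NX, rel_emb, ent_emb, A, W, b):
    """Return e^{Au}: the item embedding after A rounds of user-aware aggregation."""
    # fields[a]: entities a hops from the item; rels[a]: relation to each one's parent.
    fields, rels = [[item]], [[None]]
    for _ in range(A):
        parents = fields[-1]
        fields.append([t for h in parents for _, t in NX[h]])
        rels.append([r for h in parents for r, _ in NX[h]])
    N = len(NX[item])                      # every row of NX has the same size N
    vecs = [[ent_emb[e] for e in hop] for hop in fields]
    for a in range(A):                     # each round shrinks the field by one hop
        new_vecs = []
        for hop in range(A - a):
            layer = []
            for i, v in enumerate(vecs[hop]):
                kids = vecs[hop + 1][i * N:(i + 1) * N]
                kid_rels = rels[hop + 1][i * N:(i + 1) * N]
                w = softmax([dot(user, rel_emb[r]) for r in kid_rels])       # Eqs. (7)-(8)
                nb = [sum(wk * kid[k] for wk, kid in zip(w, kids))
                      for k in range(len(v))]                                 # Eq. (9)
                layer.append(sum_aggregator(v, nb, W, b))                     # Eqs. (10)-(11)
            new_vecs.append(layer)
        vecs = new_vecs
    return vecs[0][0]


def knowledge_sequence(window, user, NX, rel_emb, ent_emb, A, W, b):
    """Map the window S_{u,z} to S'_{u,z}, one propagated vector per item."""
    return [propagate(e, user, NX, rel_emb, ent_emb, A, W, b) for e in window]


NX = {"m1": [("genre", "g"), ("actor", "p")], "m2": [("genre", "g"), ("actor", "p")],
      "g": [("genre_of", "m1"), ("genre_of", "m2")], "p": [("acted_in", "m1"), ("acted_in", "m2")]}
ent_emb = {"m1": [0.5, -0.5], "m2": [-0.5, 0.5], "g": [1.0, 0.0], "p": [0.0, 1.0]}
rel_emb = {"genre": [1.0, 0.0], "actor": [0.0, 1.0], "genre_of": [1.0, 0.0], "acted_in": [0.0, 1.0]}
W, b = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
print(knowledge_sequence(["m1", "m2"], [1.0, 0.0], NX, rel_emb, ent_emb, 2, W, b))
```

The resulting list plays the role of $S^{\prime}_{u,z}$ and would be fed to the GRU of Eq. (12).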

4.2.3 Long-term interest module

Whereas short-term interactions are of great significance for capturing users’ current interests, long-term interactions are better suited to storing users’ historical preferences, which mitigates the long-term dependency problem and the homogenization of recommendation lists.

As shown in the top channel of Fig. 2, AKCISR applies a positional encoding function and a bitwise gating mechanism to the long-term sequence, instead of the GRU used for the short-term sequence. One reason is that the gating mechanism has been shown to be effective in learning the importance of each item embedding dynamically, weakening unimportant items and strengthening important ones. Another reason is that modeling the short-term and long-term sequences in different ways helps decouple them.

For the $z$th long-term sequence $L_{u,z}=\{e^{u}_{1},e^{u}_{2},\ldots,e^{u}_{z}\}$ of user $u$, positional information is computed with the sinusoidal positional encoding proposed for the Transformer [24]:

$\displaystyle PE_{\textit{pos},2i}=\sin(\textit{pos}/10000^{2i/d_{\textit{model}}}),$ (15) $\displaystyle PE_{\textit{pos},2i+1}=\cos(\textit{pos}/10000^{2i/d_{\textit{model}}}),$ (16) $\displaystyle L_{u,z}=L_{u,z}+PE(L_{u,z}),$ (17)

where $PE(\cdot)$ is the positional encoding function. Then, the user long-term preference can be modelled by a bitwise gating mechanism:

$\displaystyle o_{u,z}=\textit{softmax}(W_{s}^{2}\tanh(W_{s}^{1}L_{u,z})),$ (18) $\displaystyle p_{u,z}^{L}=\textit{avg}(\tanh(W_{s}^{3}(o_{u,z}\cdot L_{u,z}^{T}))),$ (19)

where $W_{s}^{1}\in R^{d*d}$, $W_{s}^{2}\in R^{d}$ and $W_{s}^{3}\in R^{d*2d}$ are learnable parameters, and $o_{u,z}\in R^{Z*d}$ are the weight scores of the long-term sequence $L_{u,z}\in R^{Z*d}$. $p_{u,z}^{L}\in R^{2d}$ represents the user’s final long-term interests.
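The long-term channel can be sketched as follows: sinusoidal positions from Eqs. (15)-(17), then one possible reading of the bitwise gate in Eqs. (18)-(19). The shapes in the text leave room for interpretation, so the gate here is an assumption: scores are $w_{2}\odot\tanh(W_{s}^{1}l_{j})$ for each row $l_{j}$, the softmax is taken over positions separately for every feature (giving a $Z\times d$ weight matrix), and $W_{s}^{3}$ maps each gated $d$-dimensional row to $2d$ dimensions before averaging. Function names and toy values are hypothetical.

```python
import math


def positional_encoding(length, d_model):
    """Eqs. (15)-(16): row pos holds the sin/cos features of position pos."""
    pe = []
    for pos in range(length):
        row = []
        for k in range(d_model):
            angle = pos / (10000 ** (2 * (k // 2) / d_model))
            row.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def add_positions(L):
    """Eq. (17): L + PE(L)."""
    pe = positional_encoding(len(L), len(L[0]))
    return [[x + p for x, p in zip(row, pe_row)] for row, pe_row in zip(L, pe)]


def bitwise_gate(L, W1, w2, W3):
    """One reading of Eqs. (18)-(19). L: Z x d, W1: d x d, w2: length d, W3: d x 2d."""
    Z, d = len(L), len(L[0])
    scores = [[w2[k] * math.tanh(sum(W1[k][m] * row[m] for m in range(d))) for k in range(d)]
              for row in L]
    o = [[0.0] * d for _ in range(Z)]
    for k in range(d):                                   # softmax over positions, per feature
        col = [scores[j][k] for j in range(Z)]
        mx = max(col)
        exps = [math.exp(c - mx) for c in col]
        total = sum(exps)
        for j in range(Z):
            o[j][k] = exps[j] / total
    outs = []
    for j in range(Z):
        gated = [o[j][k] * L[j][k] for k in range(d)]    # element-wise (bitwise) gating
        outs.append([math.tanh(sum(gated[m] * W3[m][n] for m in range(d)))
                     for n in range(len(W3[0]))])
    return [sum(col) / Z for col in zip(*outs)]           # average over positions -> 2d


L = add_positions([[0.2, -0.1], [0.4, 0.3], [0.0, 0.5]])
W1 = [[1.0, 0.0], [0.0, 1.0]]
W3 = [[1.0, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 0.5]]
print(bitwise_gate(L, W1, [1.0, 1.0], W3))
```

Unlike Eq. (13), which weights whole hidden states, this gate can down-weight individual features of an item while keeping others.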

4.2.4 Interests fusion module

Similarly, we apply a gating mechanism to fuse the long-term and short-term embeddings, since it can adjust the contributions of the two channels, as demonstrated in previous studies [25, 9]. When a user continuously pays attention to the same type of items, the short-term interest module should contribute more to the fused result; otherwise, the contribution of the long-term interest module should be increased. The fusion is therefore defined as:

$\displaystyle g_{u,z}=\sigma(W_{g}^{1}p_{u,z}^{S}+W_{g}^{2}p_{u,z}^{L}),$ (20) $\displaystyle p_{u,z}^{F}=g_{u,z}\otimes p_{u,z}^{S}+(1-g_{u,z})\otimes p_{u,z}^{L},$ (21)

where $\otimes$ indicates element-wise multiplication, $W_{g}^{1}\in R^{2d*2d}$ and $W_{g}^{2}\in R^{2d*2d}$ are the learnable parameters. $g_{u,z}\in R^{2d}$ represents the gate embedding. $p_{u,z}^{F}\in R^{2d}$ represents the fusion of short-term and long-term interests.
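A minimal sketch of the gated fusion in Eqs. (20)-(21) follows, with $\sigma$ taken to be the sigmoid as in Eq. (10). The function name and toy weights are hypothetical. With zero weights the gate sits at 0.5 and simply averages the two channels; larger weights let it lean towards one channel per dimension.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gated_fusion(p_s, p_l, Wg1, Wg2):
    """Eqs. (20)-(21): g = sigmoid(Wg1 p^S + Wg2 p^L); p^F = g*p^S + (1-g)*p^L."""
    g = [sigmoid(a + b) for a, b in zip(matvec(Wg1, p_s), matvec(Wg2, p_l))]
    fused = [gi * s + (1.0 - gi) * l for gi, s, l in zip(g, p_s, p_l)]
    return fused, g


zero = [[0.0] * 2 for _ in range(2)]
fused, gate = gated_fusion([2.0, 0.0], [0.0, 2.0], zero, zero)
print(gate, fused)   # a zero-weight gate sits at 0.5 and averages the channels
```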

4.2.5 Item co-occurrence module

Unlike the short-term and long-term modules, which learn user preferences with external knowledge, the item co-occurrence module focuses on capturing the similarity of user behavior. This module is a key component, since items strongly correlated with those in $S_{u,z}$ are more likely to be targets. However, item sparsity and short interaction sequences limit its capability.

Given this, we propose a more direct way to use collaborative filtering information to enrich item representations and capture implicit item-item relations. This module corresponds to the bottom channel of Fig. 2. Concretely, we first determine the set of related items for each item and form a similar-item matrix based on the ItemCF-IUF algorithm [26], which measures item correlations according to the ratio of common users. Formally, the correlation score between item $e_{i}$ and item $e_{j}$ is defined as:

$\displaystyle\textit{Cor}(e_{i},e_{j})=\frac{1}{\lvert C(e_{i})\rvert\lvert C(e_{j})\rvert}\sum_{u\in C(e_{i})\cap C(e_{j})}\frac{1}{\log(1+\lvert C(u)\rvert)},$ (22)

where $\lvert C(e_{i})\rvert$ and $\lvert C(e_{j})\rvert$ are the numbers of users who interacted with items $e_{i}$ and $e_{j}$, respectively, and $\lvert C(u)\rvert$ is the number of items that user $u$ has interacted with, so that each common user contributes less the more active that user is. Therefore, the higher the ratio of common users, the stronger the correlation. The top $M$ most similar items of each item are selected to form a matrix $MX\in R^{\lvert I\rvert*M}$, which is subsequently used for information aggregation, where $\lvert I\rvert$ is the number of items. For item $e_{i}$, $MX(e_{i})$ is the set of its neighborhood items. To learn the relational score between item $e_{i}$ and its neighborhood item $e_{j}\in MX(e_{i})$ more flexibly, we define the relational score $\pi_{e_{i}}^{e_{j}}$ as:

$\displaystyle\pi^{e_{j}}_{e_{i}}=g(e_{i},e_{j}),$ (23)

where $g:R^{d}\times R^{d}\rightarrow R$ is the inner product, with $e_{i},e_{j}\in R^{d}$. The normalized score $\widetilde{\pi}_{e_{i}}^{e_{j}}$ is computed as:

$\displaystyle\widetilde{\pi}^{e_{j}}_{e_{i}}=\frac{\exp(\pi^{e_{j}}_{e_{i}})}{\sum_{e_{k}\in MX(e_{i})}\exp(\pi^{e_{k}}_{e_{i}})}.$ (24)

The aggregation of the $M$ most similar items is obtained by a linear combination of their embeddings:

$\displaystyle e_{MX(e_{i})}=\sum_{e_{j}\in MX(e_{i})}\widetilde{\pi}^{e_{j}}_{e_{i}}e_{j}.$ (25)

The final aggregated representation is constructed by the sum aggregator, injecting collaborative signals.

$\displaystyle e^{c}=f_{\textit{agg}}(e_{i},e_{MX(e_{i})})=\sigma(W_{c}\cdot(e_{i}+e_{MX(e_{i})})+b_{c}),$ (26)

where $W_{c}\in R^{d*d}$ is a weight matrix, $b_{c}\in R^{d}$ is a bias vector, $\sigma$ is the sigmoid function, and $e^{c}$ is the aggregated representation with collaborative information. The sequence in the item co-occurrence module is thus updated to $S_{u,z}^{c}=\{e_{u,z}^{c},e_{u,z+1}^{c},\ldots,e_{u,z+\lvert Z\rvert-1}^{c}\}$. The CF-based aggregation process is described in Algorithm 4.2.5. To model the pairwise relationships between the sequence $S_{u,z}^{c}$ and the target items, the co-occurrence score takes the form:

$\displaystyle\frac{1}{\lvert S_{u,z}^{c}\rvert}\sum_{e_{i}\in S_{u,z}^{c}}e_{i}Q,$ (27)

where $Q\in R^{d*\lvert I\rvert}$ is the output item embedding. The results capture the average co-occurrence score from each item in $S_{u,z}$ to all the candidate items.
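The similar-item matrix can be built from raw interaction logs as sketched below. The sketch follows Eq. (22) as written (normalizing by the product of the two item popularities) and assumes the natural logarithm, since the base is not stated; the function name, the tie-breaking by item id and the toy logs are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def itemcf_iuf(user_items, M):
    """Correlation scores of Eq. (22) and the top-M similar-item matrix MX."""
    item_users = defaultdict(set)                     # C(e): users of item e
    for u, items in user_items.items():
        for e in items:
            item_users[e].add(u)

    co = defaultdict(float)
    for u, items in user_items.items():
        distinct = sorted(set(items))
        if len(distinct) < 2:
            continue                                  # no item pairs for this user
        penalty = 1.0 / math.log(1 + len(distinct))   # IUF: active users count less
        for i, j in combinations(distinct, 2):
            co[(i, j)] += penalty
            co[(j, i)] += penalty

    cor = {(i, j): s / (len(item_users[i]) * len(item_users[j])) for (i, j), s in co.items()}

    candidates = defaultdict(list)
    for (i, j), s in cor.items():
        candidates[i].append((-s, j))                 # negate so sorting is descending
    MX = {i: [j for _, j in sorted(c)[:M]] for i, c in candidates.items()}
    return cor, MX


logs = {"u1": ["a", "b"], "u2": ["a", "b", "c"], "u3": ["c"]}
cor, MX = itemcf_iuf(logs, M=1)
print(MX)
```

Here items "a" and "b" share two users, so each is the other’s top neighbor, while "c" shares only one (fairly active) user with either.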

Algorithm 4.2.5: Constructing Interaction Sequences with CF Information
Input: interaction sequence $S_{u,z}=\{e_{z}^{u},e_{z+1}^{u},\ldots,e_{z+\lvert Z\rvert-1}^{u}\}$; similar-item sampling size $M$.
Output: interaction sequence containing CF signals $S_{u,z}^{c}=\{e_{u,z}^{c},e_{u,z+1}^{c},\ldots,e_{u,z+\lvert Z\rvert-1}^{c}\}$.
1: initialize the similar-item matrix $MX$
2: compute the correlation score $\textit{Cor}(e_{i},e_{j})$ for every pair of items by Eq. (22)
3: for each $e\in I$ do
4:   $MX[e]\leftarrow$ the $M$ items most correlated with $e$
5: end for
6: for each $e_{i}$ in $S_{u,z}$ do
7:   $e_{MX(e_{i})}=\sum_{e_{j}\in MX(e_{i})}\widetilde{\pi}^{e_{j}}_{e_{i}}e_{j}$  (Eq. (25))
8:   $e^{c}=f_{\textit{agg}}(e_{i},e_{MX(e_{i})})=\sigma(W_{c}\cdot(e_{i}+e_{MX(e_{i})})+b_{c})$  (Eq. (26))
9:   $S^{c}_{u,z}\leftarrow\textit{concat}(S^{c}_{u,z},e^{c})$
10: end for
11: return $S^{c}_{u,z}$
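The per-item loop of this algorithm and the co-occurrence score of Eq. (27) can be sketched as follows. This is an illustration under assumptions: embeddings live in a dictionary keyed by item id, $MX$ is the precomputed similar-item matrix, $Q$ is stored as a $d\times\lvert I\rvert$ list of rows, and the function names and toy values are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cf_sequence(window, MX, emb, W, b):
    """Map S_{u,z} to S^c_{u,z} with Eqs. (23)-(26)."""
    out = []
    for e_i in window:
        neigh = MX[e_i]
        w = softmax([dot(emb[e_i], emb[e_j]) for e_j in neigh])               # Eqs. (23)-(24)
        agg = [sum(wk * emb[e_j][k] for wk, e_j in zip(w, neigh))
               for k in range(len(emb[e_i]))]                                  # Eq. (25)
        s = [x + y for x, y in zip(emb[e_i], agg)]
        out.append([1.0 / (1.0 + math.exp(-(dot(row, s) + bi)))
                    for row, bi in zip(W, b)])                                 # Eq. (26)
    return out


def cooccurrence_scores(seq_c, Q):
    """Eq. (27): average of e_i Q over the sequence, one score per candidate item."""
    n_items = len(Q[0])
    return [sum(dot(e, [Q[k][j] for k in range(len(e))]) for e in seq_c) / len(seq_c)
            for j in range(n_items)]


emb = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [-1.0, 0.0]}
seq_c = cf_sequence(["a"], {"a": ["b", "c"]}, emb, [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(seq_c, cooccurrence_scores([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]))
```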

4.3 Prediction and training module

Given a user’s interaction sequence $S_{u}$ which is divided into a short sequence $S_{u,z}$ and a long sequence $L_{u,z}$ , the prediction score for candidate item $j$ is:

$\displaystyle\hat{y}_{S_{u},j}=(p_{u,z}^{F}\|u)*q_{j}+\frac{1}{\lvert S_{u,z}^{c}\rvert}\sum_{e_{i}\in S_{u,z}^{c}}e_{i}*q_{j},$ (28)

where $q_{j}\in R^{d}$ is the $j$ th column of the output item embedding $Q$ . This module is illustrated as (c) in Fig. 2.

We optimize the proposed model by using gradient descent on the cross-entropy loss:

$\displaystyle\textit{argmin}\sum_{i=1}^{\lvert I\rvert}-D_{i}\log(\hat{D}_{i})+\lambda(\|U\|^{2}+\|E\|^{2}+\|R\|^{2}+\|\theta\|^{2}),$ (29)

where $\hat{D}$ is the predicted probability distribution, $D$ is the true distribution, $\lvert I\rvert$ is the number of items, $\lambda$ is the regularization coefficient, $U$, $E$ and $R$ denote the user, entity and relation embeddings, and $\theta$ denotes the parameters of the whole network.
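For a single training example, Eq. (29) can be sketched as a softmax cross-entropy plus an L2 penalty. The sketch assumes that $\hat{D}$ is the softmax of the scores $\hat{y}_{S_{u},j}$ over all $\lvert I\rvert$ items and that the parameter groups are passed as flat lists standing in for $U$, $E$, $R$ and $\theta$; the function names and toy numbers are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target_dist):
    """-sum_i D_i log(softmax(logits)_i), using a stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(d * (x - log_z) for d, x in zip(target_dist, logits) if d > 0)


def regularized_loss(logits, target_dist, param_groups, lam):
    """Eq. (29) for one example: cross-entropy plus lam times the squared L2 norms."""
    l2 = sum(v * v for group in param_groups for v in group)
    return softmax_cross_entropy(logits, target_dist) + lam * l2


# Three candidate items, the first one is the observed next item.
print(regularized_loss([2.0, 0.5, -1.0], [1.0, 0.0, 0.0], [[0.1, -0.2], [0.3]], lam=0.01))
```

Working in log space avoids overflow when item scores are large, which matters once $\lvert I\rvert$ reaches tens of thousands.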

5. Experiments

In this section, we analyse the proposed model to answer the following research questions (RQs).

RQ 1. What effects do the hyperparameters, namely neighbor sampling size $N$, aggregation depth $A$ and collaborative item size $M$, have on the recommendation results?
RQ 2. Does AKCISR outperform other baselines?
RQ 3. Do the KG and CF parts improve performance and alleviate the sparsity problem?
RQ 4. How is AKCISR affected by each component?

5.1 Datasets

Since our model requires KB information, we consider only recommendation benchmark datasets that come with a corresponding KG. We conduct experiments on MovieLens-1M,¹ Last-FM² and Amazon-Book,³ whose knowledge graphs are provided by the open Knowledge Base [27].⁴

¹ https://grouplens.org/datasets/movielens/.
² http://www.cp.jku.at/datasets/LFM-1b/.
³ http://jmcauley.ucsd.edu/data/amazon/.
⁴ The KGs of the Amazon-Book and Last-FM datasets are available at https://github.com/LunaBlack/KGAT-pytorch/tree/master/datasets. The KG of MovieLens-1M is available at https://github.com/hwwang55/KGCN/tree/master/data.

• MovieLens-1M is a widely used benchmark dataset for movie recommendation, consisting of approximately 1 million explicit ratings from the MovieLens website.
• Amazon-Book is a large-scale book review dataset released by Amazon, including about 70 thousand users and approximately 1 million interactions.
• Last-FM is a music listening dataset from the Last.fm online music system, containing over 10 million listening records. In particular, we use the subset of Last-FM that retains the first $5\%$ of each user’s interactions according to timestamp.

Detailed information of the three datasets is shown in Table 1.

Table 1
Basic statistics for the three datasets

                MovieLens-1M   Amazon-Book   Last-FM
Users           6,036          70,679        11,994
Items           2,445          24,915        19,999
Interactions    753,772        847,733       1,288,370
Entities        182,011        113,488       78,267
Relations       12             39            9
Triplets        923,718        2,557,746     286,378

We transform the three datasets into implicit feedback. After arranging the interactions of each user in order according to the timestamp, we hold the first 70% of the actions in each user’s record as the training set, and leave the next 10% of actions as the validation set to tune the hyper-parameters. The remaining 20%, as the test set, is used for reporting the performance of the model.
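The per-user chronological split can be sketched as below. How fractional counts are rounded is not stated, so the sketch assumes the boundaries are floored with integer arithmetic (which also avoids floating-point surprises such as 0.7 + 0.1 falling just below 0.8); the function name and toy records are hypothetical.

```python
def chronological_split(records, train_pct=70, valid_pct=10):
    """Split one user's (timestamp, item) records into train/valid/test by time.

    Assumption: split boundaries are floored using integer arithmetic.
    """
    items = [item for _, item in sorted(records, key=lambda rec: rec[0])]
    n = len(items)
    end_train = n * train_pct // 100
    end_valid = n * (train_pct + valid_pct) // 100
    return items[:end_train], items[end_train:end_valid], items[end_valid:]


records = [(t, f"i{t}") for t in reversed(range(10))]   # deliberately unsorted
train, valid, test = chronological_split(records)
print(train, valid, test)
```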
5.2 Baselines and metrics


We compare AKCISR⁵ with sequential recommendation models (i.e., Caser, SASRec and MA-GNN) and KG-based recommendation models (i.e., KGCN and KGAT).

⁵ https://github.com/Jillyq/AKCISR.

• Caser [1], a convolutional sequence embedding model, learns sequential patterns using convolutional filters.
• SASRec [5], a self-attention-based sequential model, adopts a self-attention mechanism to identify relevant items from a user’s action history.
• MA-GNN [9], a memory-augmented graph neural network, applies a GNN to model short-term preferences and uses a memory network to capture long-term dependencies.
• KGCN [16], a knowledge graph convolutional network, captures item-item relatedness by modeling high-order proximity information in the KG.
• KGAT [17], a knowledge graph attention network, models high-order connectivities in a hybrid structure of the KG and the user-item graph.

To evaluate all the models, we adopt three widely used top-$K$ ranking metrics: Recall@$K$, Mean Average Precision (MAP@$K$) and Normalized Discounted Cumulative Gain (NDCG@$K$), with $K \in \{10, 15, 20\}$. Recall@$K$ measures completeness: the fraction of a user's test items that appear among the top-$K$ recommended items. MAP@$K$ measures exactness: it averages the precision at each rank within the top $K$ at which a test item is retrieved. NDCG@$K$ is the normalized discounted cumulative gain at cutoff $K$, which evaluates ranking accuracy by rewarding correct items placed near the top of the list. All results are averaged over the test users. The hyper-parameters of the baselines are set to the best values reported in their original papers.
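The three metrics can be sketched for a single user with binary relevance, as below; MAP@$K$ is then the mean of the per-user AP@$K$ values. This is a minimal sketch: the function names are hypothetical, and the AP normalizer $\min(|\text{test}|, K)$ is one common convention that the paper does not specify.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Fraction of the user's test items that appear in the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def average_precision_at_k(ranked, relevant, k):
    """Mean of precision@i over the ranks i <= k that hold a test item.

    Dividing by min(|relevant|, k) is one common convention (an assumption here).
    """
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / min(len(relevant), k)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG of the ranked list divided by the ideal DCG."""
    dcg = sum(1.0 / math.log2(i + 1)
              for i, item in enumerate(ranked[:k], start=1) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal


ranked, relevant = ["a", "b", "c", "d"], {"a", "c", "e"}
print(recall_at_k(ranked, relevant, 3),
      average_precision_at_k(ranked, relevant, 3),
      ndcg_at_k(ranked, relevant, 3))
```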

Figure 3.
Effect of neighbor sampling size $N$ in the short-term interest module.

Figure 4.
Effect of aggregation depth $A$ in the short-term interest module.

Figure 5.
Effect of collaborative item size $M$ in the item co-occurrence module.

5.3 Parameter analysis and settings (RQ 1)

To gain insight into how our proposed model works, we investigate the impact of three hyper-parameters: the neighbor sampling size $N$, the aggregation depth $A$ and the collaborative item size $M$. When one of them is varied, the other two remain fixed. The remaining parameters are set as follows: the embedding size $d$ is 50, the number of hidden units $u$ is 100, the sliding-window length $Z$ is 5, the learning rate and regularization factor are 0.001 and 0.01, respectively, and the batch size is 128. Due to space limitations, we report only Recall@10 and NDCG@10 on all the datasets.
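The one-factor-at-a-time protocol can be expressed as a small configuration sweep. This is a sketch under stated assumptions: the field names are hypothetical, the values held fixed during a sweep are taken to be the final choices ($N=6$, $A=1$, $M=6$), the step size for $N$ is assumed, and the ranges for $A$ and $M$ are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    # Fixed settings reported in Section 5.3.
    embedding_size: int = 50   # d
    hidden_units: int = 100    # u
    window_length: int = 5     # Z
    learning_rate: float = 0.001
    l2_reg: float = 0.01
    batch_size: int = 128
    # Swept hyper-parameters; defaults assumed to be the final values.
    n_neighbors: int = 6       # N
    agg_depth: int = 1         # A
    n_collab_items: int = 6    # M


# N spans 2..8 as in the text (step of 2 assumed); the A and M ranges are hypothetical.
GRID = {"n_neighbors": [2, 4, 6, 8], "agg_depth": [1, 2, 3], "n_collab_items": [2, 4, 6, 8]}


def one_at_a_time(base, grid):
    """Yield (name, config) pairs that change one hyper-parameter and keep the rest at `base`."""
    for name, values in grid.items():
        for value in values:
            yield name, replace(base, **{name: value})


runs = list(one_at_a_time(Config(), GRID))
print(len(runs))
```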

(1) Effect of neighbor sampling size $N$ in the short-term interest module

We vary the number of neighbors sampled in the KG from 2 to 8 on all the datasets. The results are illustrated in Fig. 3. The neighbor sampling size $N$ should not be too small, since too few neighbors cannot incorporate enough side information; nor should it be too large, since too many neighbors introduce noise into the item representation. The MovieLens-1M and AmazonBook datasets, whose KGs contain many entities and relations, achieve their best performance at $N=6$, while the Last-FM dataset, whose KG has relatively few entities and relations, prefers a smaller value, $N=4$. This difference is reasonable, since a larger KG offers more side information to collect than a smaller one.
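Fixed-size neighbor sampling can be sketched as follows. The sketch assumes a KGCN-style rule (sample without replacement when an entity has at least $N$ neighbors, with replacement otherwise); whether AKCISR samples exactly this way is an assumption, and the self-loop fallback for isolated entities and the toy entity names are hypothetical.

```python
import random


def sample_neighbors(kg, entity, n, rng):
    """Return exactly n (relation, tail) pairs adjacent to `entity`.

    Uniform sampling without replacement when enough neighbours exist, and with
    replacement otherwise, mirrors the fixed-size receptive field of KGCN;
    whether AKCISR samples the same way is an assumption.
    """
    neighbors = kg.get(entity, [])
    if not neighbors:
        # Hypothetical fallback: pad with self-loops for isolated entities.
        return [(None, entity)] * n
    if len(neighbors) >= n:
        return rng.sample(neighbors, n)
    return [rng.choice(neighbors) for _ in range(n)]


kg = {"movie_1": [("directed_by", "person_7"), ("has_genre", "drama"), ("has_genre", "crime")]}
rng = random.Random(0)
print(sample_neighbors(kg, "movie_1", 2, rng))
```

A fixed $N$ keeps tensor shapes constant for batching, which is why $N$ trades off coverage of side information against noise as discussed above.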

(2) Effect of aggregation depth $A$ in the short-term interest module

As shown in Fig. 4, first-order aggregation ($A=1$) achieves the best performance on all three datasets, which is consistent with some previous work (e.g., KGCN and KASR), whereas higher-order aggregation degrades performance. A likely reason is that a long relation chain may capture irrelevant information.
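The role of the aggregation depth $A$ can be illustrated with a recursive sketch in which each round mixes an entity's vector with the mean of its neighbors' vectors from the previous round, so depth $A$ reaches entities $A$ hops away. The parameter-free `tanh(self + mean)` aggregator and the toy entity names are assumptions for illustration, not AKCISR's learned aggregator.

```python
import numpy as np


def aggregate(entity, emb, neighbors, depth):
    """Representation of `entity` after `depth` rounds of neighbourhood aggregation.

    Round a combines the entity's round-(a-1) vector with the mean of its
    neighbours' round-(a-1) vectors, so depth A reaches entities A hops away.
    The parameter-free tanh(self + mean) aggregator is an assumption chosen
    for illustration; it is not AKCISR's learned aggregator.
    """
    if depth == 0:
        return emb[entity]
    self_rep = aggregate(entity, emb, neighbors, depth - 1)
    adj = neighbors.get(entity, [])
    if not adj:
        return self_rep
    neigh = np.mean([aggregate(t, emb, neighbors, depth - 1) for t in adj], axis=0)
    return np.tanh(self_rep + neigh)


# Toy chain item -> attr -> far_entity (hypothetical names).
emb = {"item": np.zeros(2), "attr": np.ones(2), "far_entity": np.full(2, 2.0)}
neighbors = {"item": ["attr"], "attr": ["far_entity"]}
print(aggregate("item", emb, neighbors, 1), aggregate("item", emb, neighbors, 2))
```

With $A=1$ the item only sees its direct attribute; with $A=2$ the distant entity also leaks into the item vector, which is the kind of weakly related signal the paragraph above blames for the degradation.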

(3) Effect of collaborative item size $M$ in the item co-occurrence module

The effect of $M$ is shown in Fig. 5. Since $M$ controls the number of collaborative items to be aggregated, it shows a pattern similar to that of $N$: a small $M$ is insufficient to encode CF information, and increasing $M$ improves performance, but an overly large $M$ causes performance to decline.
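Selecting and aggregating the top-$M$ collaborative items can be sketched as below. The similarity measure (cosine over item columns of a binary user-item matrix), the similarity-weighted mean, and the additive combination with the item embedding are assumptions for illustration; the function names are hypothetical.

```python
import numpy as np


def top_m_collaborative(interactions, m):
    """For each item, indices of the m most similar items by cosine similarity.

    `interactions` is a binary users x items matrix. Cosine similarity over
    item columns is an assumed choice of CF similarity.
    """
    norms = np.linalg.norm(interactions, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for items with no interactions
    unit = interactions / norms
    sim = unit.T @ unit
    np.fill_diagonal(sim, -np.inf)  # an item is never its own collaborative neighbour
    top = np.argsort(-sim, axis=1, kind="stable")[:, :m]
    return top, sim


def aggregate_collaborative(item_emb, top, sim):
    """Add a similarity-weighted mean of each item's top-m neighbours to its embedding."""
    rows = np.arange(top.shape[0])[:, None]
    weights = np.maximum(sim[rows, top], 0.0)
    weights = weights / np.maximum(weights.sum(axis=1, keepdims=True), 1e-12)
    return item_emb + np.einsum("im,imd->id", weights, item_emb[top])


interactions = np.array([[1, 1, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]])
top, sim = top_m_collaborative(interactions, m=1)
print(top[:, 0], aggregate_collaborative(np.eye(3), top, sim))
```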

According to the tuning results of the hyper-parameters, we set $N=6$ , $A=1$ and $M=6$ for our proposed model.

5.4 Performance comparison (RQ 2 and RQ 3)

We first report the overall comparison and then evaluate how our model handles the sparsity and cold-start problems.

5.4.1 Overall comparison

Table 2 reports the performance of all the methods. We make the following observations.

Table 2
Recommendation Performance. The best performing method is boldfaced. The underlined result represents the best baseline for each metric. ${}^{*}$ indicates statistical significance at $p \le 0.05$ compared with the best baseline method based on the paired $t$-test

	Recall			MAP			NDCG
	@10	@15	@20	@10	@15	@20	@10	@15	@20
MovieLens-1M
KGCN	0.0616	0.0858	0.1083	0.0560	0.0504	0.0513	0.1164	0.1189	0.1238
KGAT	0.0810	0.1139	0.1434	0.0689	0.0639	0.0624	0.1386	0.1446	0.1522
Caser	0.1616	0.2179	0.2638	0.1150	0.1139	0.1155	0.2212	0.2344	0.2466
SASRec	0.1593	0.2135	0.2568	0.1318	0.1275	0.1272	0.2389	0.2488	0.2588
MA-GNN	0.1557	0.2055	0.2473	0.1295	0.1247	0.1246	0.2332	0.2412	0.2514
Our model	0.1798 ${}^{\ast}$	0.2334 ${}^{\ast}$	0.2777 ${}^{\ast}$	0.1542 ${}^{\ast}$	0.1490 ${}^{\ast}$	0.1486 ${}^{\ast}$	0.2675 ${}^{\ast}$	0.2751 ${}^{\ast}$	0.2844 ${}^{\ast}$
AmazonBook
KGCN	0.0313	0.0445	0.0598	0.0098	0.0094	0.0103	0.0202	0.0274	0.0311
KGAT	0.0463	0.0603	0.0739	0.0188	0.0199	0.0208	0.0307	0.0352	0.0391
Caser	0.0434	0.0575	0.0716	0.0102	0.0105	0.0109	0.0233	0.0285	0.0326
SASRec	0.0573	0.0744	0.0890	0.0236	0.0250	0.0261	0.0371	0.0426	0.0472
MA-GNN	0.0668	0.0869	0.1036	0.0270	0.0287	0.0298	0.0439	0.0504	0.0552
Our model	0.0862 ${}^{\ast}$	0.1088 ${}^{\ast}$	0.1269 ${}^{\ast}$	0.0376 ${}^{\ast}$	0.0395 ${}^{\ast}$	0.0407 ${}^{\ast}$	0.0589 ${}^{\ast}$	0.0660 ${}^{\ast}$	0.0713 ${}^{\ast}$
Last-FM
KGCN	0.0712	0.0905	0.1066	0.0601	0.0487	0.0401	0.1166	0.1010	0.1003
KGAT	0.0903	0.1071	0.1181	0.0721	0.0622	0.0647	0.1225	0.1160	0.1177
Caser	0.0941	0.1115	0.1249	0.0812	0.0727	0.0696	0.1291	0.1249	0.1241
SASRec	0.1116	0.1337	0.1512	0.0934	0.0896	0.0832	0.1418	0.1395	0.1401
MA-GNN	0.0981	0.1262	0.1314	0.0815	0.0760	0.0732	0.1288	0.1257	0.1271
Our model	0.1339 ${}^{\ast}$	0.1551 ${}^{\ast}$	0.1709 ${}^{\ast}$	0.1242 ${}^{\ast}$	0.1095 ${}^{\ast}$	0.1034 ${}^{\ast}$	0.1936 ${}^{\ast}$	0.1829 ${}^{\ast}$	0.1793 ${}^{\ast}$
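The percentages in the first observation below are relative gains over the strongest baseline for each dataset and metric, i.e. 100 × (ours - best) / best. A short script that recomputes them from the @10 columns of Table 2 might look like this; the values are copied from the table, and rounding to one decimal place is our choice.

```python
def relative_improvement(ours, baselines):
    """Percentage gain of `ours` over the strongest baseline value."""
    best = max(baselines)
    return 100.0 * (ours - best) / best


# @10 results from Table 2; baselines listed as KGCN, KGAT, Caser, SASRec, MA-GNN.
TABLE2_AT10 = {
    "MovieLens-1M": {"Recall": (0.1798, [0.0616, 0.0810, 0.1616, 0.1593, 0.1557]),
                     "MAP": (0.1542, [0.0560, 0.0689, 0.1150, 0.1318, 0.1295]),
                     "NDCG": (0.2675, [0.1164, 0.1386, 0.2212, 0.2389, 0.2332])},
    "AmazonBook": {"Recall": (0.0862, [0.0313, 0.0463, 0.0434, 0.0573, 0.0668]),
                   "MAP": (0.0376, [0.0098, 0.0188, 0.0102, 0.0236, 0.0270]),
                   "NDCG": (0.0589, [0.0202, 0.0307, 0.0233, 0.0371, 0.0439])},
    "Last-FM": {"Recall": (0.1339, [0.0712, 0.0903, 0.0941, 0.1116, 0.0981]),
                "MAP": (0.1242, [0.0601, 0.0721, 0.0812, 0.0934, 0.0815]),
                "NDCG": (0.1936, [0.1166, 0.1225, 0.1291, 0.1418, 0.1288])},
}

for dataset, metrics in TABLE2_AT10.items():
    gains = {m: round(relative_improvement(o, b), 1) for m, (o, b) in metrics.items()}
    print(dataset, gains)
```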

•

Our proposed model, AKCISR, achieves the best performance on all three datasets across all evaluation metrics. Compared with the best baseline for each dataset and metric, AKCISR improves Recall@10 by 11.3%, 29.0% and 20.0%, MAP@10 by 17.0%, 39.3% and 33.0%, and NDCG@10 by 12.0%, 34.2% and 36.5% on MovieLens-1M, AmazonBook and Last-FM, respectively. First, AKCISR outperforms Caser, which models only users' short-term interests, whereas AKCISR models short-term interests, long-term interests and item co-occurrence probabilities separately. In particular, the Recall@10 of AKCISR is 42.3% higher than that of Caser on the Last-FM dataset. Second, compared with SASRec, which uses the self-attention mechanism to capture item transition patterns, AKCISR combines the KG transfer module and the GRU module to capture item transition patterns and feature transition patterns simultaneously. Third, like MA-GNN, which models user preferences through separate channels, AKCISR uses multiple modules, and it additionally aggregates KG and CF information to enrich them. The Recall@10 of our model exceeds that of MA-GNN by more than 15%, 29% and 36% on the three datasets. Fourth, AKCISR outperforms KGCN and KGAT, likely because these methods capture only users' general interests and neglect sequential patterns.

•

Among the sequential recommendation models Caser, SASRec and MA-GNN, Caser achieves the best Recall@$K$ on the MovieLens-1M dataset. However, it performs relatively poorly on the AmazonBook dataset (high sparsity) and the Last-FM dataset (long sequences), which suggests that Caser has difficulty handling sparse data and long-range dependencies. In contrast, SASRec and MA-GNN perform better on AmazonBook and Last-FM, suggesting that they help alleviate the issues of data sparsity and long-term memory loss.

•

The KG-based recommendation methods perform relatively poorly, probably because they are not designed for sequential recommendation tasks. In addition, KGAT consistently outperforms KGCN on all datasets. One possible reason is that KGAT models high-order connectivities in the collaborative knowledge graph, while KGCN considers only KG information.

•

We conduct the Nemenyi test [28], and Fig. 6 shows the critical difference (CD) diagram over the average ranks of the compared recommendation models. The CD is 4.35, and methods whose average ranks differ by less than this value, i.e. are not significantly different, are connected by a bar. The method with the lowest (best) rank lies on the far right. According to Fig. 6, the test is not sufficient to distinguish AKCISR from SASRec, MA-GNN, KGAT and Caser, although AKCISR has the best average rank; AKCISR is, however, significantly better than KGCN.
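The critical difference can be reproduced with the Nemenyi formula from Demšar [28], $CD = q_\alpha \sqrt{k(k+1)/(6N)}$, assuming $k = 6$ methods, $N = 3$ datasets and $\alpha = 0.05$. The sketch also computes average ranks; the paper does not state which metric the ranks are based on, so Recall@10 from Table 2 is used purely as an example input, and the function names are hypothetical.

```python
import math

# Two-tailed Nemenyi critical values q_alpha at alpha = 0.05 for k compared
# methods, as tabulated by Demšar (2006).
Q_ALPHA_005 = {2: 1.960, 3: 2.343, 4: 2.569, 5: 2.728, 6: 2.850,
               7: 2.949, 8: 3.031, 9: 3.102, 10: 3.164}


def nemenyi_cd(k, n_datasets):
    """Critical difference CD = q_alpha * sqrt(k (k + 1) / (6 N))."""
    return Q_ALPHA_005[k] * math.sqrt(k * (k + 1) / (6.0 * n_datasets))


def average_ranks(scores):
    """Average rank per method over datasets (rank 1 = highest score; ties share the mean rank)."""
    methods = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = dict.fromkeys(methods, 0.0)
    for d in range(n_datasets):
        ordered = sorted(methods, key=lambda m: -scores[m][d])
        i = 0
        while i < len(ordered):
            j = i
            while j + 1 < len(ordered) and scores[ordered[j + 1]][d] == scores[ordered[i]][d]:
                j += 1
            for m in ordered[i:j + 1]:
                totals[m] += (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
            i = j + 1
    return {m: t / n_datasets for m, t in totals.items()}


# Recall@10 on MovieLens-1M, AmazonBook and Last-FM (Table 2), used only as an example input.
recall10 = {
    "AKCISR": [0.1798, 0.0862, 0.1339], "Caser": [0.1616, 0.0434, 0.0941],
    "SASRec": [0.1593, 0.0573, 0.1116], "MA-GNN": [0.1557, 0.0668, 0.0981],
    "KGCN": [0.0616, 0.0313, 0.0712], "KGAT": [0.0810, 0.0463, 0.0903],
}
ranks = average_ranks(recall10)
cd = nemenyi_cd(k=len(recall10), n_datasets=3)
significant = [m for m in ranks if ranks[m] - ranks["AKCISR"] > cd]
print(round(cd, 2), ranks, significant)
```

With only three datasets the CD is large relative to the rank range of 1 to 6, which is why only the gap between AKCISR and KGCN exceeds it in this example.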

5.4.2 Performance comparison w.r.t. interaction sparsity level

Cold-start and sparsity problems often limit the performance of recommender systems. Here we compare AKCISR with competitive sequential recommendation baselines to evaluate whether our model alleviates these problems and whether incorporating KG and CF information is effective.

We divide users into four groups according to their number of interactions, such that each group contains roughly the same total number of interactions. For example, on the MovieLens-1M dataset the four groups contain users with fewer than 95, 199, 356, and 1440 interactions, respectively. Fig. 7 shows the NDCG@10 results of our model and the sequential recommendation baselines. AKCISR-wo-kg&cf denotes the AKCISR model without KG and CF information.
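The grouping rule (roughly equal total interactions per group) can be sketched as follows. Assigning each user, in order of increasing activity, to the group given by the share of interactions already placed is one simple way to achieve this and is an assumption; the function name and toy counts are hypothetical, while the real MovieLens-1M bounds above come from the full data.

```python
def equal_mass_groups(counts, n_groups=4):
    """Split users into n_groups with roughly equal total interactions.

    `counts` maps user -> number of interactions. Users are sorted from least
    to most active, and each user goes to the group given by the share of all
    interactions already assigned (an assumed greedy rule).
    """
    users = sorted(counts, key=lambda u: counts[u])
    total = sum(counts.values())
    groups = [[] for _ in range(n_groups)]
    running = 0
    for user in users:
        idx = min(n_groups - 1, running * n_groups // total)
        groups[idx].append(user)
        running += counts[user]
    return groups


counts = {"u1": 10, "u2": 10, "u3": 10, "u4": 15, "u5": 15, "u6": 30, "u7": 30}
groups = equal_mass_groups(counts)
upper = [max(counts[u] for u in g) for g in groups if g]  # most active user per group
print(groups, upper)
```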

Fig. 7 shows that the density level increases with the number of interactions, and the NDCG@10 of all the methods follows the same trend. AKCISR outperforms the sequential recommendation baselines in all user groups, especially in the four user groups of the Last-FM dataset; the ablation analysis in Section 5.5 shows that KG information plays a major role on this dataset. Removing the KG and CF components from AKCISR decreases performance in most cases, which further supports the importance of KG and CF information for user groups of varying density.

Figure 6.

CD diagram over the average rank of AKCISR and the other 5 methods.

5.5 Ablation analysis (RQ 4)

To verify the effectiveness of the proposed modules in AKCISR, we conduct an ablation analysis (Table 3) to show how each module contributes to our model. (1) We use a user vector to model users' general preferences, and a GRU with an attention mechanism to model users' short-term interests. (2) We use the KG transition method to enhance the short-term interest module on top of (1). (3) We fuse long-term and short-term interests with a gating mechanism on top of (2). (4) We additionally apply the KG transition method in the long-term interest module. (5) We incorporate the item co-occurrence module, which captures item-item relations with a linear function, on top of (3). (6) We select similar items with a CF method and aggregate them to enrich the item co-occurrence patterns. (7) We pretrain the representations with TransD [23], a KG embedding method, on top of (6), which yields our final proposed model, AKCISR.
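Step (3) fuses long-term and short-term interests with a gate. AKCISR's exact gate is defined in its methodology section and is not reproduced here; the sketch below shows a common element-wise form, $g = \sigma(W_s s + W_l l + b)$ with output $g \odot s + (1 - g) \odot l$, using hypothetical weight names, only to illustrate how a gate interpolates the two interest vectors dimension by dimension.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_fusion(short_term, long_term, W_s, W_l, b):
    """Element-wise gate that mixes short- and long-term interest vectors.

    g = sigmoid(W_s s + W_l l + b); output = g * s + (1 - g) * l.
    This generic form is an assumption, not necessarily AKCISR's exact gate.
    """
    g = sigmoid(W_s @ short_term + W_l @ long_term + b)
    return g * short_term + (1.0 - g) * long_term


s = np.array([1.0, 2.0, 3.0])
l = np.array([3.0, 2.0, 1.0])
Z = np.zeros((3, 3))
print(gated_fusion(s, l, Z, Z, np.zeros(3)))  # zero weights give g = 0.5, the plain average
```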

Figure 7.
Performance comparison over the sparsity distribution of user groups on different datasets. The background histograms indicate the density of each user group, and the lines show the performance w.r.t. NDCG@10.

Table 3
Recommendation Performance. The best performing method is boldfaced. The underlined result represents the best baseline for each metric. ${}^{*}$ indicates statistical significance at $p \le 0.05$ compared with the best baseline method based on the paired $t$-test

	MovieLens-1M			AmazonBook			Last-FM
	Recall@20	MAP@20	NDCG@20	Recall@20	MAP@20	NDCG@20	Recall@20	MAP@20	NDCG@20
(1) U + S	0.2618	0.1364	0.2687	0.1200	0.0370	0.0679	0.1444	0.0820	0.1520
(2) U + S (KG-based)	0.2734	0.1443	0.2803	0.1202	0.0374	0.0676	0.1474	0.0841	0.1549
(3) U + S (KG-based) + L	0.2759	0.1456	0.2818	0.1210	0.0386	0.0681	0.1475	0.0840	0.1551
(4) U + S (KG-based) + L (KG-based)	0.2731	0.1441	0.2805	0.1198	0.0371	0.0669	0.1474	0.0839	0.1548
(5) U + S (KG-based) + L + I	0.2760	0.1456	0.2843	0.1200	0.0361	0.0668	0.1467	0.0830	0.1537
(6) U + S (KG-based GRU) + L + I (CF-based)	0.2777	0.1479	0.2843	0.1241	0.0405	0.0700	0.1516	0.0861	0.1592
(7) AKCISR	0.2777	0.1486	0.2844	0.1269	0.0407	0.0731	0.1709	0.1034	0.1793

U: user general interest module; S: short-term interest module; L: long-term interest module; I: item co-occurrence module.

According to the results shown in Table 3, we make the following observations.

•
Using the user general interest module and the short-term interest module to represent user preferences already yields reasonable results, as (1) indicates.
•
Comparing (1) and (2), incorporating KG propagation into the short-term interest module clearly improves performance on the MovieLens-1M dataset and slightly improves it on the Last-FM dataset, whereas there is no significant change on the AmazonBook dataset. This may be related to the scale of the KG: when neighbors are selected from too many entities, the information aggregated into the item representations may be only weakly related to the items, which can make learning harder.
•
Integrating the short-term and long-term interest modules yields a slight improvement, as shown by (2) and (3).
•
According to (3) and (4), incorporating KG information in both the short-term and long-term modules performs worse than incorporating it only in the short-term module, possibly because aggregating side information over long interaction histories introduces considerable noise.
•
(5) and (6) add the item co-occurrence module and the CF-based co-occurrence module, respectively. Compared with (5), (6) further improves performance on all the datasets, which shows the benefit of aggregating each item with information from similar items to capture implicit item-item relations.
•
The KGE method also helps enrich the representations, especially on the Last-FM dataset, according to (7).
•
Overall, the improvement on the MovieLens-1M dataset comes mainly from KG propagation (2), which increases Recall@20 by about 4%. The AmazonBook dataset benefits most from collaborative item aggregation (6), which improves Recall@20 by 3.4%. KG embedding plays the largest role on the Last-FM dataset, where (7) improves Recall@20 by 12.7%.
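TransD [23], used in (7) to pretrain the representations, scores a triple $(h, r, t)$ by projecting the head and tail with relation-specific matrices $M_{r} = r_p e_p^{\top} + I$ and measuring $\lVert h_\perp + r - t_\perp \rVert_2^2$. The sketch below implements only this scoring function, assuming equal entity and relation dimensions so that $I$ is square; the margin-based training loss, the norm constraints, and how AKCISR consumes the pretrained vectors are omitted, and the toy vectors are hypothetical.

```python
import numpy as np


def transd_project(e, e_p, r_p):
    """TransD projection (r_p e_p^T + I) e, assuming equal entity and relation dimensions."""
    return r_p * np.dot(e_p, e) + e


def transd_score(h, h_p, r, r_p, t, t_p):
    """Squared L2 distance ||h_perp + r - t_perp||^2; lower means a more plausible triple."""
    h_perp = transd_project(h, h_p, r_p)
    t_perp = transd_project(t, t_p, r_p)
    return float(np.sum((h_perp + r - t_perp) ** 2))


h, h_p = np.array([1.0, 0.0]), np.array([1.0, 0.0])
r, r_p = np.array([0.5, 0.5]), np.array([1.0, 1.0])
t_p = np.array([0.0, 0.0])
t = transd_project(h, h_p, r_p) + r  # with t_p = 0, t_perp = t, so this triple scores 0
print(transd_score(h, h_p, r, r_p, t, t_p))
```

Computing $r_p (e_p^{\top} e) + e$ avoids materializing the projection matrix, which keeps the cost linear in the embedding size.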

6. Conclusion

In this paper, we propose AKCISR, a framework with four modules for sequential recommendation. External knowledge and collaborative information are integrated into the short-term interest module and the item co-occurrence module, respectively, through neighborhood aggregation. Gating mechanisms are adopted in the long-term module to emphasize the important items in the historical interactions. The comparison results show that AKCISR outperforms all the baselines across the evaluation metrics. Moreover, the experiments on interaction sparsity levels indicate that our model helps alleviate the sparsity problem, and the ablation experiments show that the proposed modules contribute to the performance of AKCISR. Future work could improve the parallel efficiency of our model and integrate semantic-path information from the KG to build more explainable recommender systems.

Acknowledgments

This work was supported by the National Key R & D Program of China (2021ZD0113002) and Fundamental Research Funds for the Central Universities (2022JBMC011).

Conflict of interest

The authors declare that they have no conflicts of interest.

Data availability

The MovieLens-1M, AmazonBook and Last-FM datasets used to support the findings are publicly available from the GroupLens repository (https://grouplens.org/datasets/movielens/), the UCSD repository (http://jmcauley.ucsd.edu/data/amazon/) and the JKU repository (http://www.cp.jku.at/datasets/LFM-1b/), respectively. The KB information is extracted from the open knowledge base [] for recommender systems.

References

1. Tang, Wang, Personalized top-n sequential recommendation via convolutional sequence embedding, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 565–573.

2. Hidasi, Karatzoglou, Baltrunas, Tikk, Session-based recommendations with recurrent neural networks, arXiv preprint arXiv:1511.06939, 2015.

3. Ren, Chen, Ren, Lian, Neural attentive session-based recommendation, in: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1419–1428.

4. Yao, Song, Wang, Deng, Wang, Xiao, Lin, Gui, ADKGN: An Attentive Dynamic Knowledge Graph Network for Sequential Recommendation, in: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, 2021, pp. 1–8.

5. Kang W.-C., McAuley, Self-attentive sequential recommendation, in: 2018 IEEE International Conference on Data Mining (ICDM), IEEE, 2018, pp. 197–206.

6. Sun, Liu, Pei, Lin, Jiang, BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1441–1450.

7. Zheng, Gao, Chang, Niu, Song, Jin, Disentangling long and short-term interests for recommendation, in: Proceedings of the ACM Web Conference 2022, 2022, pp. 2256–2267.

8. Lian, Mahmoody, Liu, Xie, Adaptive user modeling with long and short-term preferences for personalized recommendation, in: IJCAI, 2019, pp. 4213–4219.

9. Zhang, Sun, Liu, Coates, Memory augmented graph neural networks for sequential recommendation, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34, 2020, pp. 5045–5052.

10. Guo, Zhuang, Qin, Zhu, Xie, Xiong, A survey on knowledge graph-based recommender systems, IEEE Transactions on Knowledge and Data Engineering 34(08) (2020), 3549–3568.

11. Wang, Zhang, Xie, Guo, DKN: Deep knowledge-aware network for news recommendation, in: Proceedings of the 2018 World Wide Web Conference, 2018, pp. 1835–1844.

12. Huang, Zhao W.X., Dou, Wen J.-R., Chang E.Y., Improving sequential recommendation with knowledge-enhanced memory networks, in: The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, 2018, pp. 505–514.

13. Zhang, Yuan N.J., Lian, Xie, W.-Y., Collaborative knowledge base embedding for recommender systems, in: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 353–362.

14. Wang, Zhang, Hou, Xie, Guo, Liu, SHINE: Signed heterogeneous information network embedding for sentiment link prediction, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 592–600.

15. Wang, Zhang, Wang, Zhao, Xie, Guo, RippleNet: Propagating user preferences on the knowledge graph for recommender systems, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 417–426.

16. Wang, Zhao, Xie, Li, Guo, Knowledge graph convolutional networks for recommender systems, in: The World Wide Web Conference, 2019, pp. 3307–3317.

17. Wang, Cao, Liu, Chua T.-S., KGAT: Knowledge graph attention network for recommendation, in: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 950–958.

18. Liu, Wang, Tan, A dynamic recurrent model for next basket recommendation, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2016, pp. 729–732.

19. McAuley, Fusing similarity models with Markov chains for sparse sequential recommendation, in: 2016 IEEE 16th International Conference on Data Mining (ICDM), IEEE, 2016, pp. 191–200.

20. Tang, Zhu, Wang, Xie, Tan, Session-based recommendation with graph neural networks, in: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 33, 2019, pp. 346–353.

21. Chen, Zhang, Tang, Cao, Qin, Zha, Sequential recommendation with user memory networks, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 108–116.

22. Wang, Xiong, Zhu, P.S., KASR: Knowledge-aware sequential recommendation, in: Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data, Springer, 2020, pp. 493–508.

23. Liu, Zhao, Knowledge graph embedding via dynamic mapping matrix, in: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2015, pp. 687–696.

24. Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez A.N., Kaiser Ł., Polosukhin, Attention is all you need, 2017.

25. Jin, Sun, Lin, Yang, SDM: Sequential deep matching model for online large-scale recommender system, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 2635–2643.

26. Breese J.S., Heckerman, Kadie, Empirical analysis of predictive algorithms for collaborative filtering, arXiv preprint arXiv:1301.7363, 2013.

27. Zhao W.X., Yang, Dou, Huang, Ouyang, Wen J.-R., KB4Rec: A data set for linking knowledge bases with recommender systems, Data Intelligence 1(2) (2019), 121–136.

28. Demšar, Statistical comparisons of classifiers over multiple data sets, The Journal of Machine Learning Research 7 (2006), 1–30.

Aggregating knowledge and collaborative information for sequential recommendation
