Heterogeneous Information Network Recommendation Driven by Automatic Meta-Path Identification

Abstract

Heterogeneous information networks significantly enhance the performance of recommendation systems by integrating rich structural and semantic information. However, most existing methods are designed for homogeneous networks and utilize techniques such as attention mechanisms and multi-layer architectures, which often introduce unnecessary parameter complexity. Furthermore, meta-path-based approaches typically rely on manually predefined meta-paths, which limit the models’ generalization capabilities. This article conducts an in-depth investigation into meta-path identification and proposes an Automatic Meta-Path Identification Recommendation (AMPIRec) framework. The framework optimizes the semantic representation of meta-paths through matrix self-transformation properties using weighted aggregation, while simultaneously reducing model complexity. Additionally, AMPIRec employs a multi-head attention mechanism for the joint training and optimization of user and item features, with its primary advantages being numerical stability and high prediction accuracy. To enhance recommendation precision, we specifically design a tailored top-k smoothed loss function that aligns recommendation objectives with real-world requirements. Extensive experiments on multiple real-world heterogeneous graph datasets demonstrate that AMPIRec achieves outstanding performance in both accuracy and stability.

Keywords

heterogeneous information network; recommendation system; meta-path identification; graph neural network

1. Introduction

The entire world is a universally interconnected organic whole. The connections between all entities can be abstracted into a vast heterogeneous information network (HIN) (Sun & Han, 2013). Extracting appropriate subgraphs from this extensive HIN allows us to study and analyze specific phenomena and relationships. In the field of recommendation systems, researchers have previously extracted subgraphs of user-item interaction data and applied powerful graph neural networks (GNNs) (Scarselli et al., 2008) to recommendation tasks, utilizing graph representation learning to simulate and analyze user interaction behaviors. Traditionally, GNNs have been primarily applied to homogeneous graphs, and this non-linear model has demonstrated superior performance compared to algorithmic reasoning models in recommendation tasks involving single relational data (Liu et al., 2022; Wu et al., 2020). However, GNN-based recommendation models typically rely on continuous message passing and aggregation using interaction information between users and items (Kipf & Welling, 2016). During this message-passing process, they often assume that all messages are homogeneous, which is inadequate for analyzing these information-rich datasets, as it fails to capture the structural information inherent in single relationships. In contrast, HIN recommendation models fully leverage real-world information and generally outperform traditional recommendation models in terms of recommendation accuracy. Consequently, the development of robust representation learning methods specifically designed for HINs has become essential.

In GNNs designed for HINs, meta-paths serve as a sophisticated analytical tool that effectively captures higher-order semantic relationships between instances. These meta-paths are utilized to identify neighbors along paths of specific lengths and relationships. By analyzing the connections between instances, the innovative design of meta-paths allows for a deeper exploration of these higher-order semantic relationships, thereby overcoming the limitations associated with analyzing complex networks from a singular perspective. This approach significantly enhances the interpretability of the model. However, the advantage of improved model explanation can only be realized after experts assess and establish the importance and effectiveness of particular meta-paths. In large networks, this process can lead to a notable decrease in processing efficiency. Furthermore, in practical applications, the effectiveness of meta-paths is often constrained by the quality and completeness of the network data. In neural network recommendation models that utilize meta-path design, it is common practice to first assign weights to different meta-paths based on their statistical information or through machine learning techniques. Alternatively, an average weight may be directly assigned to each meta-path. These weight assignment methods are heavily dependent on specific datasets and contexts, making them unsuitable for universal application across different networks. Subsequently, deep learning techniques are employed to integrate the features of the meta-paths. Finally, various neural network methods are used to process the fused data. The multi-parameter learning involved in this process demands substantial computational resources and time.

In light of these limitations, this article introduces the automatic meta-path identification recommendation (AMPIRec) model, which builds upon the meta-path concept to address the complexities of representation learning in HINs. AMPIRec is designed to automatically identify the optimal meta-path prior to training, thereby reducing the need for exhaustive computations and expert intervention. To further enhance the original user-item interactions and simplify the complexities associated with multi-layer neural network aggregation methods, AMPIRec employs a conditional weighted mean meta-path neighbor strategy for aggregating semantic information. This enriched interaction data is then utilized to learn user and item embeddings through a self-attention mechanism, enabling a more precise analysis of their correlations.

To validate the effectiveness of AMPIRec, we conducted experiments using two real-world datasets: an extended version of the MovieLens dataset, released by the GroupLens research group at the University of Minnesota, and the Amazon-Book Recommendations Database hosted on the Kaggle platform. The results indicate that AMPIRec consistently outperforms state-of-the-art methods in recommendations within HINs. This article presents the following contributions:

(i) We propose an automatic meta-path identification method based on the properties of the meta-path matrix prior to training.
(ii) We have developed a conditional weighted average aggregation method for meta-path semantic information. This approach enhances the original interactions and optimizes the complexity of semantic aggregation.
(iii) Extensive experiments conducted on two real-world datasets validate the effectiveness of the AMPIRec model, demonstrating its superiority over existing advanced methods.

2. Related Work

In this section, we review the key developments in HIN recommendations and embeddings. We explore both early methods that rely on similarity measures and more recent approaches that incorporate GNNs and embeddings, discussing their contributions and limitations in handling heterogeneous data.

2.1. Embedding Techniques for HINs

In recent years, HINs and their rich semantic information have significantly enhanced the performance of recommendation systems. By leveraging various edge relationships, HINs can establish indirect connections between new users or items and existing users and items, thereby alleviating the cold-start problem faced by recommendation models. HIN embedding techniques capture the network’s topological structure, node attributes, and the complex relationships among different types of nodes and edges, transforming the high-dimensional HIN into a low-dimensional representation. HIN2Vec (Fu et al., 2017) is a prominent model used for embedding HINs. It generates low-dimensional node embeddings by integrating random walks with the skip-gram model (Mikolov et al., 2013). This approach predicts the existence of edges between pairs of nodes while simultaneously learning both node and meta-path representations.

The rich edge relationships in HINs allow nodes to be interconnected through various paths of differing lengths. However, as the path length increases, node pairs recur, resulting in a substantial number of repeated pairs. While these repetitions convey distinct information, they also introduce unnecessary noise into the model. Meta-paths enable researchers to analyze the similarity between users and items from multiple perspectives. Sun and Han (2013) were the first to propose the method of utilizing meta-paths to mine neighbors, which mitigates the explosion of node pair counts by restricting attention to paths that carry specific semantic relationships and provides enhanced interpretability. PathSim (Sun et al., 2011) evaluates the similarity between nodes of the same type by analyzing their shared meta-paths. Building on this concept, HeteSim (Shi et al., 2014) introduced a comprehensive framework for assessing the similarity between nodes of different types. The embedding vectors generated by HeteSim effectively capture the heterogeneity and structural characteristics of the network. HERec (Shi et al., 2018) captures the intricate relationships between users and items by extracting various types of meta-paths within the network and utilizes these meta-paths to construct a recommendation model. Metapath2Vec (Dong et al., 2017) employs specifically designed meta-paths to guide random walks, ensuring that the embedding process effectively captures particular semantic relationships within the network.

2.2. GNNs for HINs

GNNs, a deep learning technique, have been widely applied to manage graph-structured data. Their primary advantage is the ability to effectively capture and leverage the dependencies between nodes. One notable example is the LightGCN (He et al., 2020) algorithm, a recommendation model based on a simplified graph convolutional network (GCN). LightGCN streamlines the GNN architecture by eliminating non-essential components, such as feature transformations and activation functions. This simplification not only enhances training efficiency but also improves recommendation performance. The remarkable success of GNNs has inspired numerous researchers to develop various recommendation models specifically tailored for heterogeneous GNNs (HGNNs).

HGNNs that utilize meta-paths typically enhance the learning process by aggregating the features of adjacent nodes with similar semantics. One notable example of this approach is the RGCN (Schlichtkrull et al., 2018), which learns low-dimensional representations of nodes by stacking multiple graph convolutional layers and aggregating features from local neighborhood information. Additionally, the self-attention mechanism (Vaswani et al., 2017), as an advanced component of GNNs, enables models to adaptively focus on the most significant adjacent nodes during the information aggregation process. This mechanism allows the model to dynamically adjust weights, thereby improving the effectiveness and flexibility of representation learning. Self-attention and similar methods, such as the GAT (Veličković et al., 2017), are particularly effective in capturing relationships and dependencies between nodes while being adaptable to various topological structures. However, these methods are sensitive to hyperparameters, such as the number of attention heads and learning rates, which can significantly impact their overall performance. For instance, the meta-path-based HAN (Wang et al., 2019) takes into account multiple types of nodes and edges in heterogeneous networks and implements two levels of attention mechanisms to learn node representations through a hierarchical attention process. SeHGNN (Yang et al., 2023) learns node representations by considering both the structural and semantic roles of the nodes. To alleviate computational burden, CGGP (Liu et al., 2023) establishes a set of comprehensive pruning criteria to progressively eliminate unimportant nodes and edges from the graph, resulting in a sparser network.

In this article, we present a recommendation model that automatically selects and weights meta-paths utilizing graph transformation capabilities. It also learns node embedding representations based on the transformer architecture to effectively

3. Preliminaries

Definition 1
A HIN can be defined as $G = (V, E)$, where $V$ represents the set of nodes, and $E$ represents the set of links between these nodes. The network includes a mapping function $τ : V \to O$, which associates each node with its corresponding attribute (node type), and a mapping function $ϕ : E \to R$, which links each connection to its corresponding relationship. Here, $| O |$ denotes the number of distinct attributes, and $| R |$ denotes the number of distinct relationships. For each node $v \in V$, there is an associated attribute $τ (v) \in O$. Similarly, for each link $e \in E$, there is an associated relationship type $ϕ (e) \in R$. If $| O | > 1$ or $| R | > 1$, $G$ is classified as a HIN. Otherwise, it is considered a homogeneous information network.
Example 1
The concise HIN illustrated in Figure 1 contains a total of nine nodes: two users (U), five movies (M), a director (D), and an actor (A). For instance, for user $U_{1}$, there exists a $User \overset{rate}{\to} Movie$ connected path between $U_{1}$ and $M_{1}$, indicating that user $U_{1}$ rates the movie $M_{1}$. Formally, the node set in this network is denoted as $V = \{U_{1}, U_{2}, M_{1}, M_{2}, M_{3}, M_{4}, M_{5}, D_{1}, A_{1}\}$. The edge set is represented as $E = \{(U_{1}, rate, M_{1}), (U_{1}, target, M_{2}), (U_{2}, rate, M_{2}), (U_{2}, target, M_{5}), (D_{1}, film, M_{1}), (D_{1}, film, M_{4}), (A_{1}, act, M_{1}), (A_{1}, act, M_{2}), (A_{1}, act, M_{3})\}$. Additionally, the sets of node types and edge types are defined as $O = \{User, Movie, Director, Actor\}$ and $R = \{User \overset{target}{\to} Movie, User \overset{rate}{\to} Movie, Director \overset{film}{\to} Movie, Actor \overset{act}{\to} Movie\}$, respectively.
Figure 1.
Example of heterogeneous information network (HIN).
Definition 2
A meta-path is a path defined on $G$ and is represented in the form of $O_{1} \to O_{2} \to \dots \to O_{l}$. Consequently, it defines a composite relationship $R = R_{1} \circ R_{2} \circ \dots \circ R_{l - 1}$ between two node types $O_{1}$ and $O_{l}$, where $\circ$ denotes the composition operator applied to relations. A specific instance of this relationship can be expressed as Meta-Path $(v_{1}, v_{l}) = v_{1} e_{1} v_{2} e_{2} \dots e_{l - 1} v_{l}$. In this instance, $v_{l}$ is also referred to as the $(l - 1)$-hop neighbor of $v_{1}$.
Example 2
In the HIN illustrated in Figure 1, multiple distinct meta-paths can be derived, each representing different semantic relationships between nodes. One such meta-path, referred to as UMAM, captures a higher-order semantic relationship: a user who has expressed interest in a particular movie may be inclined to watch other movies featuring the same actor as the initially selected movie. This meta-path reflects the user’s potential preferences based on their interest in specific actors, thereby facilitating more personalized and contextually relevant recommendations.
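To make the UMAM meta-path concrete, the following minimal sketch counts UMAM path instances for the toy network of Example 1 by multiplying adjacency matrices. It assumes NumPy is available and, purely for illustration, merges the "rate" and "target" relations into a single User-Movie adjacency matrix; the helper name `adjacency` is hypothetical.

```python
import numpy as np

users = ["U1", "U2"]
movies = ["M1", "M2", "M3", "M4", "M5"]
actors = ["A1"]

# User-Movie edges from Example 1 (assumption: "rate" and "target" are merged).
user_movie_edges = [("U1", "M1"), ("U1", "M2"), ("U2", "M2"), ("U2", "M5")]
# Actor-Movie edges from Example 1.
actor_movie_edges = [("A1", "M1"), ("A1", "M2"), ("A1", "M3")]


def adjacency(rows, cols, edges):
    """Build a 0/1 matrix with one row per node in `rows` and one column per node in `cols`."""
    mat = np.zeros((len(rows), len(cols)), dtype=int)
    for r, c in edges:
        mat[rows.index(r), cols.index(c)] = 1
    return mat


A_um = adjacency(users, movies, user_movie_edges)    # User -> Movie
A_am = adjacency(actors, movies, actor_movie_edges)  # Actor -> Movie

# UMAM: User -> Movie -> Actor -> Movie. Entry [u, m] counts path instances.
umam = A_um @ A_am.T @ A_am
print(umam)
```

In this toy case, $U_{1}$ reaches $M_{3}$ through two path instances (via $M_{1}$ and via $M_{2}$), although $U_{1}$ never interacted with $M_{3}$ directly.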
4. AMPIRec: HIN Recommendation Via Automated Meta-Path Identification

This section introduces the automatic meta-path identification recommendation (AMPIRec) model. AMPIRec is designed to automatically identify optimal meta-paths within HINs and generate effective embedding representations for recommendations. As illustrated in Figure 2, AMPIRec comprises three key modules: the meta-path identification module, which identifies the most relevant meta-paths; the interaction information enhancement module, which enriches interaction data by incorporating these meta-paths; and the context-aware user-item embedding module, which refines node embeddings and captures complex dependencies between users and items. Together, these modules enable AMPIRec to deliver precise and contextually relevant recommendations by leveraging both structural and semantic information in HINs.

Figure 2.

The model of automatic meta-path identification recommendation (AMPIRec).

4.1. Meta-Path Identification Module

At a macro level, the abstract representation of the model is $A W = B$. The stability of the input matrix $A$ influences how the weight matrix $W$ is adjusted during training and affects the convergence speed of the model. To break down this abstract representation into its feature dimensions, it can be expressed as $A w = b$. Considering feedback adjustment in the neural network, let the correction to $w$ in each training batch be denoted as $δ w$. The change in the output of the entire neural network can then be expressed as $A (w + δ w) = b + δ b$. This representation helps in understanding how adjustments to weights impact the overall model output, providing insights into the model’s stability and learning dynamics. From the perspective of solving linear equations, the condition number of the matrix $A$ is a crucial factor in assessing numerical stability during matrix computations.

Definition 3
The condition number of the input matrix $A$ is defined as $Cond (A) = ‖ A ‖ ‖ A^{- 1} ‖$, which satisfies $Cond (A) \geq 1$, where $‖ \cdot ‖$ denotes the spectral norm.

Inspired by matrix perturbation analysis from Matrix Theory (Franklin, 2012), we examine how perturbations to the weight vector $w$ in the neural network model affect its performance. Our main focus is whether the output vector representation can effectively and accurately capture the characteristics of a given node. Consider the model’s output $b$ , and analyze the equation $A δ w = δ b$ . Taking the norm of both sides, we obtain $‖ A δ w ‖ = ‖ δ b ‖$ . According to the properties of matrix norms, we have the following equation:
$\begin{aligned} ‖ A^{- 1} ‖ = \frac{1}{min \frac{‖ A x ‖}{‖ x ‖}} \geq \frac{1}{\frac{‖ A δ w ‖}{‖ δ w ‖}} = \frac{‖ δ w ‖}{‖ A δ w ‖} = \frac{‖ δ w ‖}{‖ δ b ‖} . \end{aligned}$
(1)

By taking the norm of both sides of the equation $A w = b$ , we obtain $‖ A w ‖ = ‖ b ‖$ . According to the properties of matrix norms, this implies as follows:
$\begin{aligned} ‖ A ‖ = max \frac{‖ A x ‖}{‖ x ‖} \geq \frac{‖ A w ‖}{‖ w ‖} . \end{aligned}$
(2)

Based on the non-negativity of norms and by combining equation (1) with equation (2), we obtain as follows:
$\begin{aligned} \frac{1}{‖ A ‖ ‖ A^{- 1} ‖} \frac{‖ δ w ‖}{‖ w ‖} \leq \frac{‖ δ b ‖}{‖ b ‖} . \end{aligned}$
(3)

Equation (3) gives a lower bound on the relative change in the output, $‖ δ b ‖ / ‖ b ‖$, in terms of the relative change in the feedback-adjusted weights, $‖ δ w ‖ / ‖ w ‖$. Assume that $‖ A ‖ ‖ A^{- 1} ‖ = 10$ and that the weights change by 10%, that is, $‖ δ w ‖ = 0.1 ‖ w ‖$. Consequently, $‖ δ b ‖ \geq 0.01 ‖ b ‖$, indicating that the output $b$ will change by at least 1%.

Similarly, considering $‖ A δ w ‖ = ‖ δ b ‖$ , and based on the properties of matrix norms, we have the following equation:
$\begin{aligned} ‖ A ‖ = max \frac{‖ A x ‖}{‖ x ‖} \geq \frac{‖ A δ w ‖}{‖ δ w ‖} = \frac{‖ δ b ‖}{‖ δ w ‖} . \end{aligned}$
(4)

Furthermore, for $‖ A w ‖ = ‖ b ‖$ , we can apply the properties of matrix norms, including those of the inverse matrix, to derive the following:
$\begin{aligned} ‖ A^{- 1} ‖ = \frac{1}{min \frac{‖ A x ‖}{‖ x ‖}} \geq \frac{1}{\frac{‖ A w ‖}{‖ w ‖}} = \frac{‖ w ‖}{‖ A w ‖} = \frac{‖ w ‖}{‖ b ‖} . \end{aligned}$
(5)

By combining equation (4) with equation (5), we can determine the supremum of the change in output, $‖ δ b ‖$ :
$\begin{aligned} \frac{‖ δ b ‖}{‖ b ‖} \leq ‖ A ‖ ‖ A^{- 1} ‖ \frac{‖ δ w ‖}{‖ w ‖} . \end{aligned}$
(6)
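The two bounds in equations (3) and (6) can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses NumPy, a small random square matrix standing in for $A$ (so that $A^{-1}$ exists, which real interaction matrices generally do not guarantee), and hypothetical vectors `w` and `dw`.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))    # hypothetical square, invertible input matrix
w = rng.normal(size=5)         # hypothetical weight vector
dw = 0.1 * rng.normal(size=5)  # hypothetical correction to w

b = A @ w
db = A @ dw                    # since A(w + dw) = b + db

cond = np.linalg.cond(A, 2)    # ||A|| * ||A^{-1}|| in the spectral norm
rel_w = np.linalg.norm(dw) / np.linalg.norm(w)
rel_b = np.linalg.norm(db) / np.linalg.norm(b)

lower = rel_w / cond           # right-hand bound from equation (3)
upper = cond * rel_w           # right-hand bound from equation (6)
print(f"{lower:.4g} <= {rel_b:.4g} <= {upper:.4g}")
```

The smaller the condition number, the tighter the interval between the two bounds, which is the motivation for preferring well-conditioned inputs.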

For the actual input $A$, the adjacency matrix between node types $o_{i}$ and $o_{j}$ is denoted as $A_{o_{i}, o_{j}} \in \mathbb{R}^{n \times m}$, where $n$ and $m$ denote the number of instances of types $o_{i}$ and $o_{j}$, respectively (users and items in this context). We can derive the adjacency matrices for paths of length $l$ as follows:
$\begin{aligned} A_{o_{i}, o_{j}}^{l} = A_{o_{i}, o_{k_{1}}} A_{o_{k_{1}}, o_{k_{2}}} \cdots A_{o_{k_{l - 1}}, o_{j}} . \end{aligned}$
(7)

To evaluate the stability of meta-paths, we perform eigenvalue decomposition on each meta-path and calculate its condition number:
$\begin{aligned} D & = A_{o_{i}, o_{j}}^{l} {A_{o_{i}, o_{j}}^{l}}^{T} + ϵ I, \end{aligned}$
(8)

$\begin{aligned} D & = U \times diag \{μ_{1}, μ_{2}, \dots, μ_{n}\} \times V^{T}, \end{aligned}$
(9)

$\begin{aligned} Cond (A_{o_{i}, o_{j}}^{l}) & = \sqrt{\frac{μ_{1}}{μ_{r}}}, \end{aligned}$
(10)
where the singular values of $A_{o_{i}, o_{j}}^{l}$ are ordered as $\sqrt{μ_{1}} \geq \sqrt{μ_{2}} \geq \dots \geq \sqrt{μ_{r}} \geq \sqrt{μ_{r + 1}} = \sqrt{μ_{r + 2}} = \dots = \sqrt{μ_{n}} = 0$ . In summary, when identifying meta-paths as auxiliary information, we choose the top $k$ meta-paths with the smallest condition numbers $Cond (A)$ .
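The selection rule can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes NumPy, treats eigenvalues of $D$ below a hypothetical tolerance `tol` as zero when locating $μ_{r}$, and uses made-up candidate matrices with placeholder names (`path_a`, `path_b`, `path_c`).

```python
import numpy as np


def meta_path_condition_number(A, eps=1e-8, tol=1e-6):
    """Condition number of a reachability matrix following equations (8)-(10).

    eps is the regularizer in equation (8); tol (an assumption) decides which
    eigenvalues of D count as non-zero.
    """
    A = np.asarray(A, dtype=float)
    D = A @ A.T + eps * np.eye(A.shape[0])  # equation (8)
    mu = np.linalg.eigvalsh(D)              # eigenvalues of symmetric D, ascending
    nonzero = mu[mu > tol]
    if nonzero.size == 0:                   # all-zero matrix: treat as unusable
        return float("inf")
    return float(np.sqrt(nonzero[-1] / nonzero[0]))  # equation (10)


def select_meta_paths(candidates, k):
    """Return the names of the k candidates with the smallest condition numbers."""
    conds = {name: meta_path_condition_number(A) for name, A in candidates.items()}
    ranked = sorted(conds, key=conds.get)
    return ranked[:k], conds


# Hypothetical reachability matrices, for illustration only.
candidates = {
    "path_a": np.array([[2.0, 0.0], [0.0, 1.0]]),
    "path_b": np.array([[10.0, 0.0], [0.0, 1.0]]),
    "path_c": np.array([[3.0, 0.0], [0.0, 1.0]]),
}
selected, conds = select_meta_paths(candidates, k=2)
print(selected, conds)
```

Because the condition numbers depend only on the reachability matrices, this ranking can be computed once before training.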
4.2. Interaction Information Enhancement Module

To mitigate the excessive number of parameters in neural networks and the associated computational overhead that can result in extended training times, we refrain from using the conventional method of employing a multi-layer perceptron (MLP) to aggregate neighbors during each training iteration. Instead, drawing inspiration from the high-order interaction information captured through multiple convolutions in LightGCN, we treat the meta-path reachability matrix as analogous to the high-order information derived from these convolutions, with the length of the meta-path corresponding to the number of GCN layers. In our model, the stability of various meta-path reachability matrices is a crucial factor in assessing their effectiveness as auxiliary information.

In the abstract model of AMPIRec, the equation $A W = B$ represents the relationship between the model’s components, where the input to the model is defined as $A = M_{0} + M_{1} + \dots + M_{k}$ . Here, $M_{0}$ denotes the original interaction information, while $M_{1}, M_{2}, \dots, M_{k}$ refer to the auxiliary information represented by $A_{k}^{*} W_{f}^{k}$ . In this framework, the final output $B$ is determined by the combination of these $k + 1$ matrices.

$\begin{aligned} B & = B_{0} + B_{1} + B_{2} + \dots + B_{k} \\ & = (M_{0} + M_{1} + M_{2} + \dots + M_{k}) W \\ & = (A_{0} W_{f} + \sum_{i} A_{i}^{*} W_{f}^{i}) W \\ & = A_{e x} W, \end{aligned}$

(11)

where $i = 1, 2, \dots, k$.

To minimize the impact of the meta-path reachability matrices, which serve as auxiliary information for the output matrix $B$ , we utilize the reciprocal of the condition number as the weight for the initial observations. Following this, we compute a weighted average of these matrices, and the resulting value is integrated into the primary interaction matrix. The final computation method for the interaction information enhancement module enables the selection of four different coefficients, resulting in four distinct formulas to choose from:

$\begin{aligned} A_{e x}^{1} & = A_{0} W_{f} + \sum_{k = 1}^{K} \frac{1}{{Cond}_{k}} A_{k}^{*} W_{f}^{k}, \end{aligned}$

(12)

$\begin{aligned} A_{e x}^{2} & = A_{0} W_{f} + \sum_{k = 1}^{K} \frac{e^{- {Cond}_{k}}}{\sum_{j = 1}^{K} e^{- {Cond}_{j}}} A_{k}^{*} W_{f}^{k}, \end{aligned}$

(13)

$\begin{aligned} A_{e x}^{3} & = A_{0} W_{f} + \sum_{k = 1}^{K} \frac{1}{1 + e^{- {Cond}_{k}}} A_{k}^{*} W_{f}^{k}, \end{aligned}$

(14)

$\begin{aligned} A_{e x}^{4} & = A_{0} W_{f} + \sum_{k = 1}^{K} \frac{e^{- {Cond}_{k}} - \min (e^{- Cond})}{\max (e^{- Cond}) - \min (e^{- Cond})} A_{k}^{*} W_{f}^{k}, \end{aligned}$

(15)

where $A_{k}$ denotes the reachability matrix of the $k$-th meta-path, ${Cond}_{k}$ is its condition number, and $A_{k}^{*}$ represents its row-normalized matrix, with $A_{k}^{*} [i, j] = \frac{A_{k} [i, j]}{\sum_{j' = 1}^{m} A_{k} [i, j']}$.

$\begin{aligned} A_{e x}^{[U]} & = A_{e x} W_{u} + b_{u}, \end{aligned}$

(16)

$\begin{aligned} A_{e x}^{[I]} & = A_{e x}^{T} W_{i} + b_{i} . \end{aligned}$

(17)

After incorporating the $k$ meta-paths, we obtain the user preference matrix $A_{e x}^{[U]}$ by a linear transformation layer. Similarly, by enhancing with the inverse paths of the $k$ meta-paths, we derive the item audience matrix $A_{e x}^{[I]}$ .
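The four weighting options in equations (12)-(15) can be sketched as below. This is an illustrative sketch under explicit assumptions: it uses NumPy, takes the transformations $W_{f}$ and $W_{f}^{k}$ to be identities (so only the weighting and normalization are shown), and the function names `row_normalize`, `path_weights`, and `enhance` are hypothetical.

```python
import numpy as np


def row_normalize(A):
    """Divide each row by its sum; all-zero rows stay zero."""
    A = np.asarray(A, dtype=float)
    sums = A.sum(axis=1, keepdims=True)
    return np.divide(A, sums, out=np.zeros_like(A), where=sums != 0)


def path_weights(conds, scheme):
    """Coefficient of each meta-path given its condition number."""
    c = np.asarray(conds, dtype=float)
    e = np.exp(-c)
    if scheme == 1:   # equation (12): reciprocal of the condition number
        return 1.0 / c
    if scheme == 2:   # equation (13): softmax over -Cond
        return e / e.sum()
    if scheme == 3:   # equation (14): sigmoid of Cond
        return 1.0 / (1.0 + e)
    if scheme == 4:   # equation (15): min-max scaling (undefined if all Cond are equal)
        return (e - e.min()) / (e.max() - e.min())
    raise ValueError("scheme must be 1, 2, 3, or 4")


def enhance(A0, paths, conds, scheme=1):
    """A_ex = A0 + sum_k weight_k * row_normalize(A_k), with W_f assumed to be identity."""
    A_ex = np.asarray(A0, dtype=float).copy()
    for w_k, A_k in zip(path_weights(conds, scheme), paths):
        A_ex += w_k * row_normalize(A_k)
    return A_ex


# Hypothetical 2x2 interaction matrix and one meta-path reachability matrix.
A0 = np.eye(2)
paths = [np.array([[2.0, 2.0], [0.0, 4.0]])]
print(enhance(A0, paths, conds=[2.0], scheme=1))
```

Note that schemes 1, 2, and 4 give smaller weights to meta-paths with larger condition numbers, whereas the coefficient in equation (14) as written grows with the condition number.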

4.3. Context-Aware User-Item Embedding Module

This module aims to leverage the powerful context-awareness of multi-head attention mechanisms to uncover latent relationships between users and items within the preference matrix and audience matrix, thereby generating more representative node embeddings. The attention mechanism enhances the model’s ability to adaptively focus on the most significant neighboring nodes, thereby improving its capacity to uncover the relevance between users and items. The specific steps are as follows. The preference matrix and audience matrix are vertically concatenated to form the input matrix $X = [\begin{matrix} A_{e x}^{[U]} \\ A_{e x}^{[I]} \end{matrix}]$. Each node vector is then mapped to the query, key, and value embedding spaces. These embeddings are divided into $h$ attention heads, and within each head the scaled dot product between queries and keys is computed to obtain attention scores. These attention scores are transformed into attention weights using the softmax function, which indicate the degree of focus each user (or item) places on all other users and items. Finally, the attention weights are applied to the value matrix $V_{i}$ to produce the attention output. The embedding process of the attention mechanism can be expressed as follows:

$\begin{aligned} Q_{i} & = X W_{q_{i}}, \quad K_{i} = X W_{k_{i}}, \quad V_{i} = X W_{v_{i}}, \end{aligned}$

(18)

$\begin{aligned} {head}_{i} (Q_{i}, K_{i}, V_{i}) & = softmax (\frac{Q_{i} K_{i}^{T}}{\sqrt{d_{k}}}) V_{i}, \end{aligned}$

(19)

where $W_{q_{i}}$, $W_{k_{i}}$, and $W_{v_{i}}$ are the transformation matrices for the $i$-th attention head, and $d_{k}$ is the dimension of the key vector. The outputs from the $h$ attention heads are concatenated and then passed through a linear transformation to obtain the final output:

$\begin{aligned} [\begin{matrix} U_{e m b} \\ I_{e m b} \end{matrix}] = Concat ({head}_{1}, {head}_{2}, \dots, {head}_{h}) W_{o} . \end{aligned}$

(20)

Here, $Concat (\cdot)$ denotes the horizontal concatenation function, and the transformation matrix $W_{o} \in \mathbb{R}^{(h \times d) \times d}$ maps the concatenated output to a $d$-dimensional space, resulting in the final embeddings for users and items. The prediction scores are then computed as the product of the user and item embeddings:

$\begin{aligned} pred = U_{e m b} \times I_{e m b}^{T} . \end{aligned}$

(21)
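Equations (18)-(21) can be sketched in a few lines of NumPy. The sketch is a forward pass only, with randomly initialized matrices standing in for learned parameters; all sizes (`n_users`, `n_items`, `d_in`, `d`, `h`) and the function name `multi_head_embed` are assumptions made for illustration.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def multi_head_embed(X, Wq, Wk, Wv, Wo):
    """X: (n + m, d_in); Wq, Wk, Wv: lists of h matrices (d_in, d); Wo: (h * d, d)."""
    heads = []
    for Wq_i, Wk_i, Wv_i in zip(Wq, Wk, Wv):
        Q, K, V = X @ Wq_i, X @ Wk_i, X @ Wv_i      # equation (18)
        scores = Q @ K.T / np.sqrt(K.shape[1])      # scaled dot product
        heads.append(softmax(scores) @ V)           # equation (19)
    return np.concatenate(heads, axis=1) @ Wo       # equation (20)


rng = np.random.default_rng(0)
n_users, n_items, d_in, d, h = 3, 4, 6, 8, 2        # hypothetical sizes
A_ex_U = rng.normal(size=(n_users, d_in))           # stand-in preference matrix
A_ex_I = rng.normal(size=(n_items, d_in))           # stand-in audience matrix
X = np.vstack([A_ex_U, A_ex_I])                     # vertical concatenation

Wq = [rng.normal(size=(d_in, d)) for _ in range(h)]
Wk = [rng.normal(size=(d_in, d)) for _ in range(h)]
Wv = [rng.normal(size=(d_in, d)) for _ in range(h)]
Wo = rng.normal(size=(h * d, d))

emb = multi_head_embed(X, Wq, Wk, Wv, Wo)
U_emb, I_emb = emb[:n_users], emb[n_users:]
pred = U_emb @ I_emb.T                              # equation (21)
print(pred.shape)
```

Because users and items share one attention input, every user row can attend to item rows and to other user rows, which is how the module relates the two sides.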

The row vectors in $pred$ represent the scores that AMPIRec assigns to all candidate items for each user. AMPIRec aims to recommend the most suitable items to users by optimizing model parameters through the minimization of the Bayesian personalized ranking (BPR) loss. Instead of predicting user-item interactions directly, BPR emphasizes comparing the model’s predicted rankings of items. $BPR\_Loss$ is a contrastive loss function that encourages the predicted score for items with user interactions to be higher than that for items without interactions. Specifically, for each user $u$ and a pair of items $i$ (positive sample) and $j$ (negative sample):

$\begin{aligned} BPR\_Loss = - \ln σ ({\hat{y}}_{u, i} - {\hat{y}}_{u, j}), \end{aligned}$

(22)

where ${\hat{y}}_{u, i}$ and ${\hat{y}}_{u, j}$ represent the model’s predicted interest levels for the positive sample item $i$ and the negative sample item $j$, respectively. The function $σ$ denotes the sigmoid function, which is used to convert these predicted interest levels into probabilities.

Beyond the pairwise differences captured by $BPR\_Loss$, we design a top-k smoothed distribution contrastive loss based on the discrete top-k preferences of users in the training data. By applying a Gaussian kernel to smooth user interest scores and predicted results, the influence of noise and outliers on model training is reduced. The Gaussian kernel considers the information from neighboring data points, allowing the loss function to emphasize the overall trend of the data rather than individual extreme values. Figure 3 shows an example of the distribution difference for one user after top-k smoothing. By selecting the top-k elements for loss calculation, this approach enables the model to focus on the elements that most strongly affect the prediction results. The loss is defined as follows, and Algorithm 1 provides the pseudocode for AMPIRec:

$\begin{aligned} L & = \frac{1}{k N} \sum_{i = 1}^{k} | \sum_{j = 1}^{N} G (i, j) {predict}_{j} - \sum_{j = 1}^{N} G (i, j) {true}_{j} |, \end{aligned}$

(23)

$\begin{aligned} G (i, j) & = \exp (- \frac{(t_{i} - j)^{2}}{2 σ^{2}}), \end{aligned}$

(24)

where $t_{i}$ represents the index of the $i$-th element in the Top-k list, ${predict}_{j}$ and ${true}_{j}$ denote the predicted result and the user’s interest score for the $j$-th item, respectively, and $σ$ in equation (24) is the bandwidth of the Gaussian kernel. The total loss is then:

$\begin{aligned} Loss = BPR\_Loss + L . \end{aligned}$

(25)

Figure 3.

Demonstration of Top-30 smoothed distribution discrepancy between user’s ground-truth ratings and predicted ratings.
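The loss terms in equations (22)-(25) can be sketched for a single user as follows. Two details are assumptions made for this sketch, since the text does not spell them out: the Top-k list is taken to be the $k$ items with the highest ground-truth scores, and the BPR term is averaged over the sampled (positive, negative) pairs. The function names and the example scores are hypothetical, and NumPy is assumed.

```python
import numpy as np


def bpr_loss(y_pos, y_neg):
    """Equation (22), averaged over sampled pairs (assumption)."""
    diff = np.asarray(y_pos, dtype=float) - np.asarray(y_neg, dtype=float)
    # -ln(sigmoid(x)) equals ln(1 + exp(-x)); logaddexp evaluates it stably.
    return float(np.mean(np.logaddexp(0.0, -diff)))


def topk_smoothed_loss(predict, true, k, sigma=1.0):
    """Equations (23)-(24) for one user with N items."""
    predict = np.asarray(predict, dtype=float)
    true = np.asarray(true, dtype=float)
    n_items = true.shape[0]
    t = np.argsort(-true)[:k]                 # assumed Top-k list: highest true scores
    j = np.arange(n_items)
    G = np.exp(-((t[:, None] - j[None, :]) ** 2) / (2.0 * sigma ** 2))  # shape (k, N)
    return float(np.abs(G @ predict - G @ true).sum() / (k * n_items))


# Hypothetical scores for one user over four items.
true_scores = np.array([5.0, 1.0, 3.0, 0.0])
pred_scores = np.array([4.0, 2.0, 2.5, 0.5])

L = topk_smoothed_loss(pred_scores, true_scores, k=2)
total = bpr_loss(y_pos=[pred_scores[0]], y_neg=[pred_scores[3]]) + L    # equation (25)
print(L, total)
```

The smoothed term is zero when predictions match the ground truth exactly and grows as the kernel-weighted neighborhoods of the top-k items diverge.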

5. Experiments

In this section, we introduce the two datasets utilized in our experiments, present the results obtained from these datasets, and compare our findings with those of state-of-the-art recommendation methods.

5.1. Datasets and Metrics

The datasets utilized in this experiment are an extension of the MovieLens 1M dataset,¹ published by the GroupLens research team (Harper & Konstan, 2015), and the Amazon-Book Reviews Database available on Kaggle.²

The MovieLens dataset includes ratings from 6,040 users ( $U$ ) for 3,952 movies ( $M$ ), with ratings ranging from 1 to 5; each user has at least 20 ratings. Additionally, it contains supplementary information such as movie genres ( $G$ ), occupation ( $O$ ), and age ( $A$ ). The Amazon-Book Reviews dataset is filtered so that each user has a minimum of 20 interactions. It comprises ratings from 8,950 users (U) across 53,213 books (B), along with supplementary information, including the author (A) of each book and its corresponding category (C). A detailed description of both datasets is provided in Table 1.

Table 1.
Statistics of the Datasets.

Dataset	Relation (A–B)	Number of A	Number of B
MovieLens	User-Movie	6,040	3,883
	User-Age	6,040	7
	Movie-Genre	3,883	18
	User-Occupation	6,040	21
Amazon-Book	User-Book	8,950	53,213
	Book-Category	53,213	2,780
	Book-Author	53,213	31,203

To comprehensively evaluate the performance of various recommendation models, we employ three core metrics that are widely used in the assessment of recommendation systems: recall, normalized discounted cumulative gain (NDCG), and precision. These metrics match those used by the latest MIMA method and assess the quality and effectiveness of the recommendation system from complementary perspectives.

Recall (Koren et al., 2009) calculates the proportion of relevant items that the model successfully recommends out of the total number of relevant items. It directly reflects the model’s ability to identify items of interest to users, specifically assessing whether the model can discover as many items as possible that the user is likely to appreciate.

NDCG (Järvelin & Kekäläinen, 2002) assesses both the relevance of recommended items and the significance of their order in the recommendation list. It assigns weights to the recommended items based on their position, with items appearing higher in the list receiving greater weight. This approach emphasizes the importance of placing relevant items at the top of the list.

Precision (Herlocker et al., 1999) measures the accuracy of the recommendation system, defined as the proportion of items in the recommendation list that the user is genuinely interested in. It calculates this by determining the ratio of the number of correctly recommended items to the total number of recommended items.
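The three metrics can be sketched for a single user as follows. This is a minimal illustration that assumes binary relevance (an item is either relevant or not) and a precision denominator of $K$; the function names and the toy item IDs are hypothetical and not taken from the paper.

```python
import math


def recall_at_k(ranked, relevant, k):
    """Share of the user's relevant items that appear in the top-k list."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def precision_at_k(ranked, relevant, k):
    """Share of the top-k list that is relevant (denominator is k)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: DCG of the list divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)          # position 0 gets weight 1 / log2(2) = 1
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)    # best case: all hits ranked first
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example (hypothetical data): a ranked list and the held-out relevant items.
ranked = ["m3", "m7", "m1", "m9", "m4"]
relevant = {"m1", "m4", "m8"}

recall = recall_at_k(ranked, relevant, 5)
precision = precision_at_k(ranked, relevant, 5)
ndcg = ndcg_at_k(ranked, relevant, 5)
```

In an evaluation such as the one reported here, these per-user values would typically be averaged over all test users for each cutoff $K$.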

5.2. Baselines

To provide a comprehensive comparison and evaluate the performance of our proposed model, we employ six well-established baseline models. These baselines are selected to represent a diverse array of approaches in graph-based recommendation systems, enabling us to benchmark our model against various state-of-the-art techniques and demonstrate its effectiveness across different methodologies.

PUP (Zheng et al., 2020) discusses the integration of price factors into recommendation systems. It proposes a recommendation framework based on GCNs that takes into account users’ sensitivity to item prices. By learning the interaction graph between users and items, along with their relationship to price information, the model enhances the accuracy of recommendations and increases user satisfaction.

PinSAGE (Ying et al., 2018) employs graph convolutional neural networks to enhance large-scale recommendation systems. It introduces a GCN model designed to manage recommendation systems comprising millions of users and items. This model utilizes a graph structure to represent interactions between users and items, effectively capturing intricate user preferences and item similarities by propagating information throughout the network.

ATGCF (Ma et al., 2022) proposes an HGNN model designed to manage multi-behavior feature interactions in recommendation systems. This model constructs a heterogeneous graph to capture the intricate relationships between user behaviors, thereby enhancing the predictive accuracy of recommendations, with a particular focus on the interactions among features.

HetGNN (Zhang et al., 2019) is an HGNN specifically designed for multi-behavior feature-interaction recommendations. It utilizes an advanced message-passing mechanism to manage various node and edge types, thereby enabling the effective capture of intricate user-item interactions.

HAN (Wang et al., 2019) is an advanced GNN specifically designed for heterogeneous graphs. It incorporates a hierarchical attention mechanism that separately addresses node-level and semantic-level attentions. This approach enables the model to effectively capture the significance of various types of nodes and relationships, facilitating more accurate representation learning for complex graph structures that involve multiple node and edge types.

MIMA (Li et al., 2024) is a multi-feature interaction meta-path aggregation HGNN specifically designed for recommendation systems. It combines a meta-path aggregation strategy with a multi-feature interaction mechanism, allowing the model to effectively capture the intricate relationships and interactions among various features and meta-paths in heterogeneous graphs. This approach enhances the understanding of user-item interactions, resulting in improved recommendation performance on complex datasets.

5.3. Experimental Results

In all experiments, the meta-path identification module selects three meta-paths as auxiliary information ($k = 3$), and the multi-head attention mechanism uses $h = 4$ heads. The notation $@K$ refers to the top $K$ items in the recommendation list.

5.3.1. Meta-Path Identification

Following the meta-path identification process applied to both datasets, the condition numbers $Cond (A_{o_{i}, o_{j}}^{l})$ for each meta-path are presented in Figure 4.

Figure 4.

Cond for the datasets: (a) cond for MovieLens and (b) cond for Amazon-Book.

Figure 4 shows that the three meta-paths with the smallest condition numbers, and hence the most stable adjacency matrices, are UMGM, UOUM, and UOUOUM in the MovieLens dataset and UBUB, UBUBUB, and UBCB in the Amazon-Book dataset. A semantic analysis of these meta-paths shows that the most stable meta-path in the MovieLens dataset is UMGM, indicating that users select movies primarily based on genre. The second most stable meta-path, UOUM, suggests that users who share an occupation tend to have similar tastes in films. UOUOUM pertains to movies that are widely circulated among users within their respective occupations. In the Amazon-Book dataset, UBUB ranks first, reflecting that users who enjoy the same book often share common preferences, which indicates a transfer of preferences among users. The meta-path UBUBUB encompasses a broader range of users with shared interests, while UBCB captures users' preferences for specific book categories. Overall, the identified meta-paths align well with the patterns and preferences commonly observed in everyday choices.
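The ranking step can be illustrated with a small sketch. It assumes that the adjacency matrix of a meta-path is the product of the relation matrices along the path, and that stability is measured by the ratio of the largest to the smallest non-negligible singular value; the ablation discussion treats smaller condition numbers as more stable. The exact definition of $Cond (A_{o_{i}, o_{j}}^{l})$ is the one given earlier in the paper, so the helper names, the tolerance, and the toy matrices below are assumptions for illustration only.

```python
import numpy as np

# Toy relation matrices (hypothetical data): 4 users, 4 movies, 2 genres, 2 occupations.
UM = np.array([[1, 1, 0, 0],
               [0, 1, 1, 0],
               [0, 0, 1, 1],
               [1, 0, 0, 1]], dtype=float)   # user-movie interactions
MG = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # movie-genre
UO = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)  # user-occupation


def effective_cond(a, tol=1e-10):
    """Largest singular value divided by the smallest non-negligible one.

    Assumption: near-zero singular values are dropped, because products that
    pass through a small node type (e.g. genres) are rank-deficient.
    """
    s = np.linalg.svd(a, compute_uv=False)   # singular values, descending
    if s[0] == 0:
        return float("inf")
    s = s[s > tol * s[0]]
    return float(s[0] / s[-1])


# Candidate user-to-movie meta-paths, built as products of relation matrices.
candidates = {
    "UMUM": UM @ UM.T @ UM,
    "UMGM": UM @ MG @ MG.T,
    "UOUM": UO @ UO.T @ UM,
    "UOUOUM": UO @ UO.T @ UO @ UO.T @ UM,
}


def select_stable_paths(paths, k):
    """Return the k meta-paths with the smallest condition numbers."""
    scored = sorted((effective_cond(m), name) for name, m in paths.items())
    return [(name, cond) for cond, name in scored[:k]]


selected = select_stable_paths(candidates, k=3)
for name, cond in selected:
    print(f"{name}: cond = {cond:.3f}")
```

On real data the relation matrices are large and sparse, so a sparse singular-value routine would likely replace the dense SVD used here.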

5.3.2. Model Comparison

In this subsection, we have chosen equation (12) as our information enhancement module. Effectiveness experiments were conducted to evaluate the performance of all baseline methods, as well as our proposed model. Table 2 presents a comparison of the performance of AMPIRec against the baseline methods across both datasets. This comparison highlights the effectiveness of AMPIRec in relation to state-of-the-art approaches, providing a comprehensive overview of its performance in various scenarios.

Table 2.
Comparative Results Between Baselines and AMPIRec.

Dataset	Algorithm	Recall@20	Recall@30	Recall@50	Precision@10	Precision@20	NDCG@20
MovieLens	PUP	0.0455	0.0665	0.1048	0.0588	0.0596	0.0548
	PinSAGE	0.0431	0.0607	0.1015	0.0496	0.0478	0.0537
	ATGCF	0.0482	0.0689	0.107	0.0645	0.0627	0.0581
	HetGNN	0.0533	0.0702	0.1033	0.0771	0.0747	0.0572
	HAN	0.0529	0.0677	0.0995	0.0767	0.0729	0.0559
	MIMA	0.0545	0.0712	0.117	0.082	0.0755	0.0621
	AMPIRec	0.0732	0.1028	0.153	0.0874	0.0829	0.0931
Amazon-Book	PUP	0.0382	0.0415	0.0754	0.0316	0.0285	0.0624
	PinSAGE	0.0283	0.0357	0.071	0.0317	0.0229	0.0545
	ATGCF	0.0408	0.0426	0.0775	0.0358	0.0298	0.0651
	HetGNN	0.0416	0.0433	0.0783	0.0367	0.0309	0.0658
	HAN	0.0432	0.0455	0.0729	0.0377	0.0359	0.0649
	MIMA	0.0475	0.0503	0.0834	0.041	0.0381	0.0719
	AMPIRec	0.0752	0.0963	0.1141	0.0603	0.0487	0.1184

The results show that our model, AMPIRec, outperforms all baseline models on both datasets and on every reported metric. On the MovieLens dataset, the largest gain is in NDCG@20, where AMPIRec achieves an improvement of nearly 50% over the strongest baseline, MIMA (0.0931 versus 0.0621). On the Amazon-Book dataset, AMPIRec improves Recall@K over MIMA by roughly 37% to 91%, depending on $K$, and also exhibits substantial gains in Precision@K and NDCG@20. This consistent performance across datasets and metrics points to good generalization capability. The advantages of the model become increasingly evident when the value of $K$ is small, which suggests that it is effective in long-tail recommendation settings and, in particular, in improving the precision of the items shown to users. These results validate the efficacy of AMPIRec and highlight its benefits in HIN recommendations.
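The relative improvements quoted for Table 2 can be checked with a few lines of arithmetic. The sketch below copies the MIMA and AMPIRec values from Table 2 and assumes that the sub-header row of the table gives the cutoffs (Recall at 20, 30, and 50; NDCG at 20); the dictionary layout and the function name are illustrative only.

```python
# Values copied from Table 2: strongest baseline (MIMA) vs. the proposed model.
mima = {
    ("MovieLens", "NDCG@20"): 0.0621,
    ("Amazon-Book", "Recall@20"): 0.0475,
    ("Amazon-Book", "Recall@30"): 0.0503,
    ("Amazon-Book", "Recall@50"): 0.0834,
}
ampirec = {
    ("MovieLens", "NDCG@20"): 0.0931,
    ("Amazon-Book", "Recall@20"): 0.0752,
    ("Amazon-Book", "Recall@30"): 0.0963,
    ("Amazon-Book", "Recall@50"): 0.1141,
}


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return 100.0 * (new - base) / base


gains = {key: relative_gain(ampirec[key], mima[key]) for key in mima}
for (dataset, metric), gain in gains.items():
    print(f"{dataset} {metric}: +{gain:.1f}%")
```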

5.3.3. Ablation Study

In this section, we explore various information enhancement modules to identify the most effective one. Specifically, we evaluate four variations of our model: AMPIRec $^{1}$ uses equation (12) as the information enhancement module, AMPIRec $^{2}$ applies equation (13), AMPIRec $^{3}$ utilizes equation (14), and AMPIRec $^{4}$ employs equation (15). By comparing these variants, we aim to determine the optimal module for enhancing the model’s performance. The experimental results are shown in Figure 5.

Figure 5.

Results of four coefficient calculation methods: (a) impact in the MovieLens and (b) impact in the Amazon-Book.

As shown in Figure 5, the information enhancement module, which employs the reciprocal of the condition number as a coefficient, demonstrates optimal performance across all evaluated metrics. Consequently, we recommend utilizing equation (12) as the practical computation method for the interaction information enhancement module.
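The idea of using the reciprocal of the condition number as a coefficient can be sketched as follows. Equation (12) itself is defined earlier in the paper and is not reproduced here; this sketch only assumes that each selected meta-path matrix is scaled by $1 / Cond$ and that the scaled matrices are summed without further normalization. The function names and toy values are hypothetical.

```python
def reciprocal_coefficients(cond_numbers):
    """Map each meta-path to 1 / cond, so better-conditioned paths weigh more."""
    return {path: 1.0 / cond for path, cond in cond_numbers.items()}


def weighted_sum(matrices, coeffs):
    """Element-wise sum of same-shaped matrices, each scaled by its coefficient."""
    first = next(iter(matrices.values()))
    rows, cols = len(first), len(first[0])
    out = [[0.0] * cols for _ in range(rows)]
    for path, matrix in matrices.items():
        for i in range(rows):
            for j in range(cols):
                out[i][j] += coeffs[path] * matrix[i][j]
    return out


# Toy example (hypothetical values): two meta-path matrices and their condition numbers.
cond_numbers = {"P1": 2.0, "P2": 4.0}
matrices = {
    "P1": [[1.0, 0.0], [0.0, 1.0]],
    "P2": [[0.0, 2.0], [2.0, 0.0]],
}

coeffs = reciprocal_coefficients(cond_numbers)
enhanced = weighted_sum(matrices, coeffs)
```

With this weighting, a meta-path whose condition number is twice as large contributes half as much per entry, which matches the intuition that less stable paths should carry less weight.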

Next, we conducted experiments with different meta-paths using AMPIRec $^{1}$ to validate the effectiveness of the meta-path identification module. Within this module, we compared two selection strategies: using the three most stable meta-paths as auxiliary information, and using the three least stable ones. The results, presented in Figure 6, show that selecting the three meta-paths with the smallest condition numbers consistently yields better recommendation performance on both datasets. Furthermore, the experiments quantitatively confirm that longer meta-paths tend to exhibit lower stability.

Figure 6.

Results of the two meta-path combination selection modes on each dataset: (a) results on MovieLens and (b) results on Amazon-Book.

6. Conclusion

In this article, we introduce AMPIRec, a novel HGNN model aimed at enhancing recommendation systems by leveraging intricate relationships within HINs. AMPIRec integrates various types of node interactions and incorporates extensive attribute information, providing a comprehensive framework for accurately capturing user preferences and item characteristics.

Our evaluation of AMPIRec on two real-world datasets, Amazon-Book and MovieLens, demonstrates its superior performance compared to several advanced baseline models. The results indicate clear improvements in Recall, Precision, and NDCG, underscoring AMPIRec's effectiveness in predicting user preferences and enhancing recommendation accuracy. Notably, AMPIRec captures not only direct interactions but also auxiliary information from the automatically identified meta-paths. This holistic approach enables AMPIRec to outperform both traditional and contemporary methods that rely on HINs.

Looking ahead, future research could explore extending AMPIRec to additional domains and incorporating a broader range of interaction types and attributes. Such developments would further enhance the model’s versatility and performance. By building on the principles demonstrated in AMPIRec, we aim to push the boundaries of personalized recommendation systems, ultimately delivering more precise and meaningful recommendations to users.

Footnotes

Acknowledgments

We extend our sincere gratitude to the editors and reviewers for their invaluable efforts in facilitating the publication of this manuscript. We also thank the curators of the MovieLens and Amazon-Book databases for making their valuable datasets publicly available.

ORCID iD

Yongjie Liang

Funding

This work was supported by the National Natural Science Foundation of China grant No. 12261027.

Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Dong, Y., Chawla, N. V., & Swami, A. (2017). metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining (pp. 135–144).

2. Franklin, J. N. (2012). Matrix Theory. Courier Corporation.

3. Fu, T. Y., Lee, C. W., & Lei, Z. (2017). Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 1797–1806).

4. Harper, F. M., & Konstan, J. A. (2015). The movielens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TIIS), 5(4), 1–19.

5. He, X., Deng, K., Wang, X., Li, Y., Zhang, Y., & Wang, M. (2020). Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd international acm sigir conference on research and development in information retrieval (pp. 639–648).

6. Herlocker, J. L., Konstan, J. A., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval (pp. 230–237).

7. Järvelin, K., & Kekäläinen, J. (2002). Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS), 20(4), 422–446.

8. Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

9. Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, 42(8), 30–37.

10. Li, Y., Yan, S., Zhao, F., Jiang, Y., Chen, S., Wang, L., & Ma, L. (2024). Mima: Multi-feature interaction meta-path aggregation heterogeneous graph neural network for recommendations. Future Internet, 16(8), 270.

11. Liu, C., Ma, X., Zhan, Y., Ding, L., Tao, D., Du, B., Hu, W., & Mandic, D. P. (2023). Comprehensive graph gradual pruning for sparse training in graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 35(10), 14903–14917.

12. Liu, X., Yan, M., Deng, L., Li, G., Ye, X., Fan, D., Pan, S., & Xie, Y. (2022). Survey on graph neural network acceleration: An algorithmic perspective. arXiv preprint arXiv:2202.04822.

13. Ma, L., Chen, Z., Fu, Y., & Li, Y. (2022). Heterogeneous graph neural network for multi-behavior feature-interaction recommendation. In International conference on artificial neural networks (pp. 101–112). Springer.

14. Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems, 26, 3111–3119.

15. Scarselli, F., Gori, M., Tsoi, A. C., Hagenbuchner, M., & Monfardini, G. (2008). The graph neural network model. IEEE Transactions on Neural Networks, 20(1), 61–80.

16. Schlichtkrull, M., Kipf, T. N., Bloem, P., Van Den Berg, R., Titov, I., & Welling, M. (2018). Modeling relational data with graph convolutional networks. In The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15 (pp. 593–607). Springer.

17. Shi, C., Hu, B., Zhao, W. X., & Philip, S. Y. (2018). Heterogeneous information network embedding for recommendation. IEEE Transactions on Knowledge and Data Engineering, 31(2), 357–370.

18. Shi, C., Kong, X., Huang, Y., Philip, S. Y., & Wu, B. (2014). Hetesim: A general framework for relevance measure in heterogeneous networks. IEEE Transactions on Knowledge and Data Engineering, 26(10), 2479–2492.

19. Sun, Y., & Han, J. (2013). Mining heterogeneous information networks: A structural analysis approach. ACM SIGKDD Explorations Newsletter, 14(2), 20–28.

20. Sun, Y., Han, J., Yan, X., Yu, P. S., & Wu, T. (2011). Pathsim: Meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment, 4(11), 992–1003.

21. Vaswani, A. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 6000–6010.

22. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., & Bengio, Y. (2017). Graph attention networks. arXiv preprint arXiv:1710.10903.

23. Wang, X., Ji, H., Shi, C., Wang, B., Ye, Y., Cui, P., & Yu, P. S. (2019). Heterogeneous graph attention network. In The world wide web conference (pp. 2022–2032).

24. Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., & Philip, S. Y. (2020). A comprehensive survey on graph neural networks. IEEE Transactions on Neural Networks and Learning Systems, 32(1), 4–24.

25. Yang, X., Yan, M., Pan, S., Ye, X., & Fan, D. (2023). Simple and efficient heterogeneous graph neural network. In Proceedings of the AAAI conference on artificial intelligence (Vol. 37, pp. 10816–10824).

26. Ying, R., He, R., Chen, K., Eksombatchai, P., Hamilton, W. L., & Leskovec, J. (2018). Graph convolutional neural networks for web-scale recommender systems. In Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 974–983).

27. Zhang, C., Song, D., Huang, C., Swami, A., & Chawla, N. V. (2019). Heterogeneous graph neural network. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 793–803).

28. Zheng, Y., Gao, C., He, X., Li, Y., & Jin, D. (2020). Price-aware recommendation with graph convolutional networks. In 2020 IEEE 36th international conference on data engineering (ICDE) (pp. 133–144). IEEE.
