
Abstract

Recommendation-based dialogue systems aim to capture user preferences via interactive conversations for personalized recommendations. While existing studies focus on modeling user preferences, real-time dialog scenarios face challenges in balancing historical conversation contexts and immediate interests. This study proposes MGIRD, a multi-aspect graph representation approach integrating ordinary graphs and hypergraphs. We use graph structures to model users’ current interests and hypergraphs for historical conversation features, while incorporating historical behaviors in the recommendation module to balance context relevance. A novel item selection mechanism is introduced during dialog generation to naturally integrate recommended items. Experiments on Chinese TG-Redial and English Redial datasets show MGIRD outperforms most state-of-the-art methods in recommendation accuracy and dialog diversity, validating its effectiveness in enhancing recommendation quality and conversational fluency.

Keywords

natural language processing conversation recommendation systems knowledge graphs graph feature extraction

1. Introduction

The rapid expansion of smart assistants and e-commerce platforms has led to the widespread adoption of recommendation systems in many areas. These systems are now a key part of our digital lives, making our experiences more personalized and convenient (Gao et al., 2021). Their core function is to predict users’ potential needs and preferences by analyzing historical interaction data between users and products. Recently, driven by rapid advances in natural language processing (NLP) and the wide application of data mining techniques on the web (Ashraf, 2021), a new methodology has emerged: adopting NLP techniques and large amounts of data to train recommender system models and thereby improve their accuracy. This methodology has gradually become a focus of research in the field. Its essence lies in leveraging NLP to analyze user-generated text in depth, thereby gaining a better understanding of users’ preferences and needs. Through NLP methods, recommendation systems can detect subtle emotional shifts and specific preferences of users, thus providing more personalized and precise recommendations. Recent research (Chen et al., 2019; Lei et al., 2020a, 2020b; Sun & Zhang, 2018; Wang et al., 2022) has shown that recommendation systems can identify user interests more accurately and effectively by analyzing users’ real-time interaction data. In this context, recommendation-based dialogue systems, also known as conversation recommendation systems (CRS) (Chen et al., 2019; Li et al., 2018, 2022; Liang et al., 2021; Zhou et al., 2020), have emerged. These systems are a specific kind of task-oriented dialogue system whose fundamental goal is to discern user preferences through dialogues with real-time interaction data, thereby enabling precise recommendations. This has become an increasingly prominent research trend in recent years. A successful CRS must not only understand user interests but also possess natural and fluent language generation capabilities to foster natural and enjoyable dialogues with users. A typical case of CRS is shown in Figure 1.

Figure 1.

A typical case of conversation recommendation system.

A typical recommendation-based dialogue system is composed of two core modules: the recommendation module and the dialogue module. The recommendation module focuses on making appropriate recommendations by parsing user preferences from the context of the conversation. Meanwhile, the dialogue module concentrates on generating fluent conversations to improve the user’s experience. Today, such systems have been widely deployed across various practical application scenarios, serving either as standalone systems or as integrated modules within large-scale voice assistant platforms, showcasing extensive potential for application.

In conversational recommendation research, accurately capturing user preferences is crucial, and many conversational recommendation systems currently focus on improving their ability to recognize user preferences. Some methods use complex encoder structures to enhance natural language understanding (Li et al., 2018), while others incorporate external information such as knowledge graphs (Chen et al., 2019; Wang et al., 2022; Zhang et al., 2021) and user comments (Lu et al., 2021) to enrich user modeling. Although these methods can improve recommendation accuracy to some extent, an excessive focus on current preferences may lead to heavily duplicated recommendation items and to responses that fail to explain why these items were selected. In practical recommendation processes, users’ historical behavioral data holds significant value for more comprehensive user modeling while maintaining alignment with their current interests. Some studies have begun exploring the use of user historical information to enhance user modeling (Li et al., 2022). However, user historical information mostly follows a “long-tail distribution” (Shang et al., 2023; Zhao et al., 2023): only a few users have abundant historical data, which is extremely complex and difficult to extract accurately, while for most users historical information is sparse, making it challenging to extract effective information that meaningfully supports understanding their current interests.

The goal of our research is to integrate knowledge graphs to strengthen the understanding of users’ current conversation state, and to integrate users’ historical information to enrich user models. Without altering the user’s current preferences, we aim to enhance current user interest modeling through comprehensive integration of relevant historical data. This approach is designed to improve recommendation accuracy and make the responses generated by the dialogue module more reasonable by effectively combining historical information with the user’s current interests. Inspired by Shang et al. (2023) and Bai et al. (2021), we use hypergraphs to model users’ historical activities. We then extract key features separately from the knowledge graph and the historical hypergraph data to enhance the effectiveness of recommendation and dialogue. In view of the challenges above, we propose a feature learning strategy that integrates users’ multi-faceted graph representations.

Incorporating historical information into the dialogue helps users understand the logic behind a recommendation and enhances the interpretability of the recommendation system. With this method, we can effectively integrate user information and reflect this integration in the content of the dialogue, thus increasing the richness and diversity of the dialogue.

Our main contributions are summarized as follows:

Our proposed method adopts the strategy of fusing the features of three graphs. This method aims to increase the richness of user features.

We propose to enhance users’ current relevant preferences by leveraging historical activity modeled with hypergraph structures, and we use this strategy to optimize the “slot filling” step for recommended items during dialogue generation.

We tested our model on two real conversation recommendation datasets, where it outperformed most current baselines. In addition, through an ablation study, we verified that all three graph features are indispensable.

2. Related Work

2.1. Task Description for Conversation Recommendation System

In recent years, recommendation systems have gradually become a core function of some of the largest online services in the world (such as Microsoft, Google, and Amazon) (Tran et al., 2020), and are widely used in e-commerce, social networks, movies, music, and so on. By analyzing user behavior data and other relevant information, the system scores and ranks candidate options, identifying products or information that users may find interesting. However, traditional recommendation systems often adopt static models and rely heavily on users’ offline historical data, which may be sparse or noisy (Gao et al., 2021). At the same time, in some recommendation scenarios users’ real-time preferences may vary; a traditional static model cannot adapt to this change and therefore cannot quickly capture users’ intentions, which reduces recommendation accuracy.

With the development of dialogue systems, dialogue technology provides a new way to solve the problems faced by traditional recommendation. By simulating human language interaction, dialogue systems can recommend interesting items and even perform complex tasks (Moradizeyveh, 2022). A conversation recommendation system is the deep integration of dialogue and recommendation systems (Lei et al., 2020a). Its purpose is to collect detailed information about users’ preferences through real-time communication with users and to meet users’ immediate needs according to this information. This method can provide personalized, real-time, and coherent recommendation content, thus improving both the user experience and recommendation quality.

At present, the research on conversation recommendation system is mainly concentrated in two directions: one is attribute-based conversation recommendation system, and the other is generation-based conversation recommendation system. Next, we will introduce these two types of systems in detail.

2.2. Attribute-Based Conversation Recommendation System

Attribute-based conversation recommendation systems (Deng et al., 2021; Lei et al., 2020b; Sun & Zhang, 2018; Zhang et al., 2018, 2022; Zhao et al., 2023a, 2023b; Zou et al., 2020) evolved from interactive (Guo et al., 2017) and comment-based (Luo et al., 2020) recommendation systems. These systems follow the SAUR interaction mode (System Asks, User Responses) (Zhang et al., 2018), which is intended to optimize the recommendation strategy by continuously asking users questions and obtaining their feedback. Attribute-based conversation recommendation systems aim to reduce the number of conversation turns required and improve the accuracy of recommendation results. EAR (Lei et al., 2020a) divides the dialog recommendation task into three steps, Estimation, Action, and Reflection, and identifies the basic constructs of such a dialog recommendation system, consisting of an attribute model and a dialog model; it also introduces a dialog strategy to control whether the system continues asking questions or makes a recommendation. SCPR (Lei et al., 2020b) models the dialog recommendation task as a path reasoning problem on the knowledge graph, treating the interaction between the system and users as a path-walking process on the graph. To address the issue of users’ multi-interest deviation, Zhang et al. (2022) proposed a method that comprehensively extracts users’ multiple interests based on multiple-choice questions.

This type of system focuses on improving the efficiency of interactive recommendation and completing the recommendation task in the least number of rounds as far as possible. However, constantly asking questions mechanically may sometimes sacrifice the natural fluency of the dialogue. In addition, because such systems often choose to reply in several preset templates, the query attributes of users are limited, which may have a negative impact on the overall experience of users, thus limiting their application in various dialogue scenarios.

2.3. Generation-Based Conversation Recommendation System

Unlike the former, generation-based conversation recommendation systems (Chen et al., 2019; Feng et al., 2023; Li et al., 2018, 2022, 2023; Liang et al., 2021; Lu et al., 2021; Ma et al., 2020; Shang et al., 2023; Wang et al., 2022; Zhou et al., 2020, 2022) are a special kind of task-based dialogue system. Such a system takes successful recommendation as its final task and independently trains two modules, namely, dialogue and recommendation. By analyzing the user’s current conversation content and combining external resources with the user’s personal information, it aims to achieve accurate recommendation by modeling the user’s representation. At the same time, it generates appropriate and informative replies in the dialogue generation module by using an end-to-end framework. As deep learning and language modeling continue to advance, new techniques keep improving the dialogue understanding and generation abilities of these systems.

Obtaining a representation of a user’s current interests relies heavily on the context of the conversation, but because conversation data is limited, the system cannot extract enough information to ensure accurate recommendations. This limitation makes it challenging for recommender systems to accurately capture user needs and preferences. Therefore, early research usually relied on external data, such as knowledge graphs (Chen et al., 2019; Ren et al., 2023; Zhang et al., 2021, 2024; Zhou et al., 2020). This enabled the system to enrich the content of entities related to user preferences and extract the user’s currently expressed interests, improving recommendation accuracy when data is limited. On this basis, recent research mainly explores other user information, such as user history, comments (Lu et al., 2021), and similar user characteristics (Li et al., 2022). At the same time, recent research has begun to pay attention to the dialogue generation module, focusing on how to naturally integrate the recommended content into the reply and how to make the generated dialogue more explanatory (Guo et al., 2023). The goal of this kind of dialogue recommendation system is to play a role similar to that of a “real-world shopping guide,” who can skillfully guide users toward the recommendation while keeping the dialogue natural and smooth (Zhou et al., 2020).

Our model captures users’ current interests using graph data representations and integrates users’ historical information using hypergraphs, thereby fusing several graph features. These features are fed into the dialogue generation module to generate answers that both incorporate the recommended content and remain natural and fluent. In addition, by introducing user information integrated from multiple graph features, our system can bring in other entities related to the recommended items during dialogue generation. This not only enriches the dialogue content but also explains the reason for the recommendation, enhancing interpretability.

3. Description of Method and Structure

In this section, we cover not only the representation of the current conversation content but also the features extracted from the user’s historical conversations. First, we focus on capturing users’ current interests. Then, we carry out feature extraction and data enhancement on users’ historical interactions by applying hypergraph techniques. After that, we fuse the historical information representation with the entity and word encoding features. Finally, we describe how to integrate the user representation features to support the movie selection and dialogue generation modules. The overall structure of the system is shown in Figure 2.

3.1. Current User Preference Extractor

In recommendation scenarios, each conversation usually contains only a few questions and answers, so little data on items related to the user’s interests is available in the conversation. This leads to data sparsity in the content of the current conversation. To address this difficulty, Zhou et al. (2020) proposed extracting and integrating information from external knowledge graphs to enhance the feature representation of the current conversation. Following this line of work, we employ graph convolutional networks (GCNs) (Kipf & Welling, 2016) and relational graph convolutional networks (R-GCNs) (Schlichtkrull et al., 2018) to extract features of words and entities, respectively, from two different knowledge graphs.

3.1.1. Encoding Word-Oriented KG

We select ConceptNet (Speer et al., 2017), a well-recognized knowledge graph, as our word-oriented knowledge base, where semantic facts are stored in the form of triples: word-relationship-word. To mitigate the negative impact of noise on recommendation performance, we extract only words that appear in the corpus and retrieve their related triples in ConceptNet. By applying GCNs, we are able to capture contextual dependencies and potential semantic connections between nodes (i.e., words), thus transforming isolated word embeddings into context-rich representations, ultimately obtaining the current word encoding representation of the user, denoted as $n_{w}^{l + 1}$ . The GCN formula is as follows:

$$n_{w}^{l + 1} = \sigma \left( D^{- 1 / 2} A D^{- 1 / 2} n_{w}^{l} W_{w}^{l} \right) \tag{1}$$

where $\sigma$ is the activation function, $A$ is the adjacency matrix, $D$ is the diagonal degree matrix, $n_{w}^{l}$ is the representation of the nodes at the $l$-th layer, and $W_{w}^{l}$ is the learnable matrix at the $l$-th layer.
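As an illustration of the propagation rule in equation (1), the following minimal sketch applies one GCN layer to a toy three-word graph using only the Python standard library. The toy adjacency matrix (with self-loops), the identity weight matrix, the choice of ReLU as the activation, and the helper names `matmul` and `gcn_layer` are all assumptions made for the example and are not taken from the original implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def gcn_layer(adj, feats, weight):
    """One step of sigma(D^{-1/2} A D^{-1/2} N W), with ReLU assumed as sigma."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # diagonal of the degree matrix D
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    # Symmetrically normalised adjacency: D^{-1/2} A D^{-1/2}
    norm_adj = [[inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
                for i in range(n)]
    out = matmul(matmul(norm_adj, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]

# Toy word graph with three nodes; self-loops on the diagonal are an assumption.
adj = [[1, 1, 0],
       [1, 1, 1],
       [0, 1, 1]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # initial word embeddings n_w^l
weight = [[1.0, 0.0], [0.0, 1.0]]             # identity W, so only smoothing is visible
hidden = gcn_layer(adj, feats, weight)        # n_w^{l+1}
```

With an identity weight matrix, each output row is simply the degree-normalised mixture of a word's own embedding and those of its neighbors, which is the "context-rich" effect described above.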

3.1.2. Encoding Entity-Oriented KG

We employ DBpedia (Bizer et al., 2009), a structured database derived from Wikipedia, as an entity-oriented knowledge graph for entity linking. We systematically identify and collect all mentioned entities from the corpus following the methodology of Chen et al. (2019). Considering the diversity of relationships between entities (e.g., directors, actors, screenwriters), traditional graph neural networks may not adequately capture the associative properties of these relations. We therefore adopt the R-GCN, which can assign different weights to neighboring nodes according to the type of relation connecting them to the target node. This ability makes R-GCN well suited to data containing rich relation types, and it has demonstrated superior performance in tasks such as entity classification and link prediction. The corresponding entity embeddings obtained using R-GCN are denoted as $n_{e}^{l + 1}$. The formula for the embedding of entity $e$ at layer $l + 1$ is as follows:

$$n_{e}^{l + 1} = \sigma \left( \sum_{r \in R} \sum_{e^{'} \in N_{e}^{r}} \frac{1}{Z_{e, r}} W_{r}^{l} n_{e^{'}}^{l} + W^{l} n_{e}^{l} \right) \tag{2}$$

where $R$ is the set of relations, $N_{e}^{r}$ represents the set of neighboring nodes of $e$ under relation $r$, $W_{r}^{l}$ and $W^{l}$ denote the learnable matrices used for aggregating neighboring nodes under each relation and for representation transformation, respectively, $Z_{e, r}$ is the normalization factor, and $\sigma$ represents the sigmoid function.
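The relation-specific aggregation in equation (2) can be sketched as follows. This is a minimal standard-library example under stated assumptions: the normalization factor $Z_{e, r}$ is taken to be the number of neighbors of $e$ under relation $r$, and the toy graph, relation names, weight matrices, and helper names (`matvec`, `rgcn_layer`) are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rgcn_layer(node_feats, neighbors, rel_weights, self_weight):
    """One R-GCN step; neighbors[rel][e] lists the neighbors of node e under rel."""
    out = []
    for e, feat in enumerate(node_feats):
        total = matvec(self_weight, feat)               # W^l n_e^l (self connection)
        for rel, w_r in rel_weights.items():
            nbrs = neighbors[rel].get(e, [])
            for nb in nbrs:
                msg = matvec(w_r, node_feats[nb])       # W_r^l n_{e'}^l
                # Assumed normalisation: Z_{e,r} = |N_e^r|
                total = [t + m / len(nbrs) for t, m in zip(total, msg)]
        out.append([1.0 / (1.0 + math.exp(-t)) for t in total])  # sigmoid
    return out

# Toy graph: node 0 is a movie, node 1 its director, node 2 one of its actors.
node_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
neighbors = {"directed_by": {0: [1]}, "starring": {0: [2]}}
rel_weights = {
    "directed_by": [[1.0, 0.0], [0.0, 1.0]],
    "starring": [[2.0, 0.0], [0.0, 2.0]],   # a different weight per relation type
}
self_weight = [[1.0, 0.0], [0.0, 1.0]]
entity_emb = rgcn_layer(node_feats, neighbors, rel_weights, self_weight)
```

Because each relation has its own matrix, the "starring" neighbor contributes twice as strongly as the "directed_by" neighbor in this toy setting, which is the behavior a plain GCN cannot express.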

3.1.3. Self-Attention Layer

In this step, we apply a self-attention transformation separately to the current word and entity representations to identify key information and convert them into the same dimensionality for ease of computation, obtaining the current word and entity encodings $R_{w}^{c}$ and $R_{e}^{c}$. The attention transformation is as follows:

$$R_{w}^{c} = \sum_{j = 1}^{N} \left( \frac{\exp (w_{i, j})}{\sum_{i = 1}^{N} \exp (w_{i, j})} \right) \tag{3}$$

$$R_{e}^{c} = \sum_{j = 1}^{N} \left( \frac{\exp (e_{i, j})}{\sum_{i = 1}^{N} \exp (e_{i, j})} \right) \tag{4}$$

where

$$w_{i, j} = b^{T} \tanh (W^{T} n_{w}^{l + 1}) \tag{5}$$

$$e_{i, j} = b^{T} \tanh (W^{T} n_{e}^{l + 1}) \tag{6}$$

where $W^{T}$ and $b^{T}$ are both learnable parameters, and $N$ represents the length of the entity and word vectors.
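A common way to realize this kind of attention transformation is attention pooling: each node vector is scored with $b^{T} \tanh (W^{T} n)$, the scores are normalized with a softmax, and the node vectors are combined using the resulting weights. The sketch below follows that reading. The final weighted sum over node vectors is our assumption about how the normalized weights are used, and the toy inputs and the function name `attention_pool` are hypothetical.

```python
import math

def attention_pool(nodes, W, b):
    """Score nodes with b^T tanh(W^T n), softmax the scores, return the weighted sum."""
    scores = []
    for n in nodes:
        # (W^T n)_k = sum_i W[i][k] * n[i]
        hidden = [math.tanh(sum(W[i][k] * n[i] for i in range(len(n))))
                  for k in range(len(W[0]))]
        scores.append(sum(bk * hk for bk, hk in zip(b, hidden)))
    top = max(scores)                                  # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # Assumed final step: weighted sum of the node vectors
    pooled = [sum(w * n[d] for w, n in zip(weights, nodes))
              for d in range(len(nodes[0]))]
    return pooled, weights

nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy word or entity vectors
W = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 1.0]
pooled, weights = attention_pool(nodes, W, b)
```

Applying the same function to the word vectors and to the entity vectors would yield two fixed-size encodings of equal dimensionality, matching the roles of $R_{w}^{c}$ and $R_{e}^{c}$.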

Figure 2.

Overall structure diagram: our model (MGIRD) consists of four parts: (a) Current user preference extractor; (b) Historical user preference extractor; (c) Recommender system; (d) Dialogue generator.

3.2. Historical User Preference Extractor

Given the difference in interaction frequency between users and the system, with some users engaging frequently while others relatively infrequently, we have decided to utilize a “hypergraph” structure to model historical information.

Hypergraphs have been used in various fields such as social networking (Zhao et al., 2023a), sequence recommendation (Xia et al., 2022), data mining (Alam et al., 2021), and so on, due to their unique ability to deal with “unpaired relationships.” They are particularly suitable for capturing higher-order relationships in data, which is crucial for us in dealing with complex historical information.

First, the “one-side multiple nodes” capability of hypergraphs, in which a single hyperedge connects multiple nodes, can integrate multiple entities (Bai et al., 2021; Zhao et al., 2023b), enhancing the connections between them. Additionally, the flexibility of hypergraphs allows the model structure to adjust to continuously changing data. For example, when user behavior or dialogue changes, the hypergraph can adjust its structure by adding or deleting nodes and hyperedges.

3.2.1. Construction of Hypergraphs

Following the work done in Shang et al. (2023), we constructed two types of hypergraphs. The build process is shown in Figure 3.

Figure 3.

Graph from dialogue to hypergraph structure.

First, we constructed a hypergraph that reflects the history of user interactions. Typically, interactions between a user and the system are centered around specific topics, especially for users with frequent interactions. Certain items may recur multiple times in these interactions, thus providing opportunities to connect different conversations. Specifically, we treat items in the same historical dialog as vertices and represent their connections by constructing hyperedges. This approach allows us to create associations between conversations and items, thus enhancing the relevance of items to each other, regardless of the chronological order.

Figure 3 illustrates this with movie recommendation as an example. In one of the history conversations, the user mentions three movies: “Escape Plan,” “Terminator 2,” and “Terminator.” We use this conversation as a hyperedge to connect these three movie nodes. Later, in a new interaction (which will itself appear in the history hypergraph in future conversations), the user mentions several other movies, and “Terminator 2” and “Terminator” appear again, so we assign them more weight. Since the two dialog hyperedges are connected to the same nodes, the other nodes they connect also become related to each other.
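The construction of the history hypergraph can be sketched as building an incidence matrix in which each past conversation is one hyperedge. In the minimal example below, the first session uses the three movies named above; the extra movie in the second session and the function name `build_incidence` are illustrative assumptions.

```python
def build_incidence(sessions):
    """Return (items, H) where H[v][e] = 1 if item v occurs in session (hyperedge) e."""
    items = sorted({item for session in sessions for item in session})
    index = {item: v for v, item in enumerate(items)}
    H = [[0] * len(sessions) for _ in items]
    for e, session in enumerate(sessions):
        for item in session:
            H[index[item]][e] = 1
    return items, H

sessions = [
    ["Escape Plan", "Terminator 2", "Terminator"],
    # The third movie in this later session is a hypothetical choice.
    ["Terminator 2", "Terminator", "Avatar The Way of Water"],
]
items, H = build_incidence(sessions)

# Vertex degree = number of conversations an item appears in.
vertex_degree = {item: sum(row) for item, row in zip(items, H)}
```

Items that recur across conversations, here "Terminator" and "Terminator 2", end up with a higher vertex degree and link the two hyperedges, which is how otherwise separate conversations become connected.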

We have noticed the common phenomenon of “long-tail distribution” in real dialog scenarios, where a few users interact with the system frequently while the majority of users may only interact once or twice. This situation affects the accurate assessment of users’ historical preferences. To solve this problem, we create a knowledge-enhanced hypergraph by merging each item in the history and its N-hop neighbors in the knowledge graph into the hypergraph construction. We chose adjacent nodes because the neighbors of an item usually share similar or related semantic properties with that item. In this way, all hyperedges are interconnected through their shared entities (either items or attributes), ensuring that each hyperedge contains a specific item and multiple related entities, effectively mitigating the problem of sparse historical data.

The bottom left corner of Figure 3 shows “Avatar The Way of Water” and “Terminator” as examples: we use a hyperedge to connect each of them to its related entities or properties (e.g., director: James Cameron, star: Arnold Schwarzenegger, movie genre: sci-fi and thriller). When two or more identical nodes are shared between the hyperedges, the connection between the two movies is strengthened, and if either movie matches the user’s interest, the other is prioritized in the recommendation.
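A minimal sketch of the knowledge-enhanced hyperedges: for each history item, a breadth-first search collects the item together with its N-hop neighbors in the knowledge graph, and that set forms one hyperedge. The toy knowledge graph below only loosely mirrors the example in Figure 3; which attribute is attached to which movie, and the function name `n_hop_neighbors`, are assumptions made for illustration.

```python
from collections import deque

def n_hop_neighbors(kg, start, hops):
    """Return the start item plus every entity reachable within `hops` edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for nb in kg.get(node, []):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen

# Toy knowledge graph (adjacency lists); the attribute assignment is illustrative.
kg = {
    "Terminator": ["James Cameron", "Arnold Schwarzenegger", "sci-fi"],
    "Avatar The Way of Water": ["James Cameron", "sci-fi"],
}
hyperedges = {item: n_hop_neighbors(kg, item, hops=1) for item in kg}

# Entities shared by both hyperedges tie the two movies together.
shared = hyperedges["Terminator"] & hyperedges["Avatar The Way of Water"]
```

Even when a user has only one or two past conversations, each item's hyperedge now contains several attribute nodes, so sparse histories still produce a connected structure.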

3.2.2. Extraction of Hypergraph Structural Features From User History

In this paper, we define the normalized hyperedge convolution layer using the hypergraph neural network (Bai et al., 2021) convolution method as:

$$X^{(l + 1)} = \sigma \left( D_{v}^{- 1 / 2} H W D_{e}^{- 1} H^{T} D_{v}^{- 1 / 2} X^{(l)} \theta^{(l)} \right) \tag{7}$$

Here, $X^{(l)}$ and $X^{(l + 1)}$ represent the input features of the $l$-th and $(l + 1)$-th layers, respectively, while $\theta^{(l)}$ is a trainable parameter used for feature transformation; $D_{v}$ and $D_{e}$ are the degree matrices of the vertices and hyperedges, respectively, used for normalization; and $H$ is the incidence matrix.

From the formula, we can view the operation of this hypergraph convolution as a two-stage process: the first stage aggregates vertex features into hyperedges to form hyperedge features; the second stage propagates the aggregated hyperedge features back to the vertices, culminating in the hypergraph convolution formula $\mathrm{HCon}(\cdot)$:

$$\mathrm{HCon} (X^{(l)}) = D^{- 1} H B^{- 1} H^{T} X^{(l)} W \tag{8}$$

$$X^{(l + 1)} = \mathrm{HCon} (X^{(l)}) \tag{9}$$
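The two-stage reading of equation (8) can be made concrete with a short sketch: vertex features are first averaged within each hyperedge ($B^{- 1} H^{T} X$) and the hyperedge features are then averaged back onto each vertex ($D^{- 1} H \cdot$), followed by the transform $W$. The toy incidence matrix, scalar features, identity $W$, and function names are assumptions; the sketch also assumes every vertex and hyperedge has nonzero degree.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def hcon(H, X, W):
    """HCon(X) = D^{-1} H B^{-1} H^T X W for an incidence matrix H (vertices x edges)."""
    n_v, n_e = len(H), len(H[0])
    d_v = [sum(H[v]) for v in range(n_v)]                        # vertex degrees (D)
    d_e = [sum(H[v][e] for v in range(n_v)) for e in range(n_e)]  # hyperedge degrees (B)
    Ht = [list(col) for col in zip(*H)]
    # Stage 1: vertices -> hyperedges (mean of member vertices)
    edge_feats = [[val / d_e[e] for val in row]
                  for e, row in enumerate(matmul(Ht, X))]
    # Stage 2: hyperedges -> vertices (mean of incident hyperedges)
    vert_feats = [[val / d_v[v] for val in row]
                  for v, row in enumerate(matmul(H, edge_feats))]
    return matmul(vert_feats, W)

# Four items, two conversations (hyperedges); items 2 and 3 occur in both.
H = [[0, 1],
     [1, 0],
     [1, 1],
     [1, 1]]
X = [[1.0], [2.0], [3.0], [4.0]]  # one scalar feature per item
W = [[1.0]]                       # identity transform for readability
smoothed = hcon(H, X, W)
```

Items that sit in both hyperedges receive a blend of both conversations' averages, so information flows between conversations through the shared items.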

To distinguish between them, we denote the vertex degree matrix $D_{v}$ as $D$ and the hyperedge degree matrix $D_{e}$ as $B$; $W$ denotes the weighting matrix. To maintain consistency between the representation of user historical interactions and the current user encoding, we define $R_{i}^{h}$ as the hypergraph representation of user historical interactions and $R_{k}^{h}$ as the hypergraph representation of items enhanced by the relevant knowledge graph. Following the approach of Shang et al. (2023), we now perform feature extraction on the constructed hypergraphs.

For the hypergraph built from historical interactions, we form a set $N_{i}^{H}$ of the items that appear in historical dialogues and convert it into an embedding matrix $R_{H}$, which is then fed into a hypergraph convolutional network for further processing. Similarly, for the hypergraph enhanced by the knowledge graph, we form a vertex set $N_{i}^{k}$ from the historical items $N_{i}^{H}$ and their N-hop neighbors and convert it into an embedding matrix $R_{k}$, which is then input into the hypergraph convolution for processing.

$$R_{i}^{h} = \mathrm{HCon} (H_{i}, D_{i}, B_{i}, R_{H}) \tag{10}$$

$$R_{k}^{h} = \mathrm{HCon} (H_{k}, D_{k}, B_{k}, R_{k}) \tag{11}$$

where $H_{i}$ and $H_{k}$, $D_{i}$ and $D_{k}$, and $B_{i}$ and $B_{k}$ represent the incidence matrices, vertex degree matrices, and hyperedge degree matrices of the history interaction hypergraph and the knowledge graph-enhanced hypergraph, respectively.

3.2.3. Combining the Two Hypergraph Representations

After obtaining the two forms of historical hypergraph representations, $R_{i}^{h}$ and $R_{k}^{h}$, the next step is to merge them to better understand the user’s historical interests and preferences. Because users’ interests often shift over time, the system may identify several preferences, such as romance, science fiction, and martial arts, from the historical records. However, given the context of the current conversation, it might only be necessary to focus on one specific genre. To prevent other historical interests from influencing the user’s current preferences, we use the hypergraph attention layer (Bai et al., 2021) to fuse the context entity representation (if entities are mentioned in the current dialogue) with historical interests, thereby obtaining a user history representation driven by the current interests.

Specifically, we use the entity encoding obtained via R-GCN from the context entities to perform self-attention operations: the computed values serve as the representation of user historical features and provide a basis for the multi-feature fusion strategy in item recommendation. The formula is as follows:

$$R_{i k}^{h} = \mathrm{MHA} (R_{e}^{c}, [R_{i}^{h} : R_{k}^{h}], [R_{i}^{h} : R_{k}^{h}]) \tag{12}$$

where $\mathrm{MHA}(Q, K, V)$ denotes a multi-head attention mechanism that operates on the input matrices $Q$ (query), $K$ (key), and $V$ (value), $R_{i}^{h}$ and $R_{k}^{h}$ denote the two types of hypergraph features, and $R_{e}^{c}$ denotes the current entity representation extracted through R-GCN.
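To illustrate how the current entity representation can select the relevant part of the history, the sketch below uses a single-head scaled dot-product attention with the current entity vector as the query and the stacked history features as keys and values. It is a simplification of equation (12): a real multi-head layer would add learned projections and several heads, the bracket $[\,\cdot : \cdot\,]$ is assumed to stack the rows of the two hypergraph feature matrices, and all vectors and the function name `attend` are toy values chosen for the example.

```python
import math

def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)                                  # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return fused, weights

R_e_c = [1.0, 0.0]                 # current interest, e.g. a "sci-fi" direction
R_i_h = [[1.0, 0.0], [0.0, 1.0]]   # history-interaction hypergraph features
R_k_h = [[0.9, 0.1]]               # knowledge-enhanced hypergraph features
memory = R_i_h + R_k_h             # assumed meaning of [R_i^h : R_k^h]
R_ik_h, weights = attend(R_e_c, memory, memory)
```

History rows aligned with the current interest receive the largest weights, so unrelated past interests are down-weighted rather than removed outright.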

3.3. Recommendation System

Using the methods described above, we obtain the item’s word feature representation $R_{w}^{c}$ , context entity feature representation $R_{e}^{c}$ , and historical feature representation $R_{i k}^{h}$ . Our ultimate goal is to use these user representations to calculate the probability of recommending a particular item to the user. Specifically, we first merge the current semantic and entity-level representations, extracting the learned individual word and entity vectors represented by $r_{w}^{c}$ and $r_{e}^{c}$ , respectively. Then, we use a gated network to merge these two forms of feature representations, creating a preference representation $r_{u}^{c}$ that reflects the user’s preferences. The formula is as follows:

$$r_{u} = (\beta \cdot r_{w}^{c} + (1 - \beta) \cdot r_{e}^{c}) \cdot R_{i k}^{h} \tag{13}$$

where

$$\beta = \sigma (W_{gate} [r_{w}^{c} : r_{e}^{c}]) \tag{14}$$

We use a gating mechanism here because it can dynamically adjust the influence of the word feature representation and the entity feature representation on the current user preference representation. By learning the gating signal, the model can determine which type of information is relatively more important and decide whether to weight semantic content or entity-related information more heavily in the current context. This yields a representation that reflects user preferences more accurately and allows the model to respond flexibly to complex contexts and user needs.
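The gated fusion of equations (13) and (14) can be sketched as follows. Two points are assumptions on our part: the gate $\beta$ is treated as a scalar computed from the concatenated vectors, and the product with the historical representation $R_{i k}^{h}$ is taken to be element-wise. The inputs and the function name `gated_fusion` are hypothetical.

```python
import math

def gated_fusion(r_w, r_e, r_hist, w_gate):
    """Blend word and entity vectors with a learned gate, then modulate by history."""
    concat = r_w + r_e                                        # [r_w^c : r_e^c]
    logit = sum(w * x for w, x in zip(w_gate, concat))
    beta = 1.0 / (1.0 + math.exp(-logit))                     # sigmoid gate
    current = [beta * a + (1.0 - beta) * b for a, b in zip(r_w, r_e)]
    # Assumed element-wise product with the history representation
    fused = [c * h for c, h in zip(current, r_hist)]
    return fused, beta

r_w = [1.0, 0.0]               # word-level preference vector
r_e = [0.0, 1.0]               # entity-level preference vector
r_hist = [2.0, 2.0]            # history representation R_ik^h
w_gate = [0.0, 0.0, 0.0, 0.0]  # untrained gate: both sources weighted equally
r_u, beta = gated_fusion(r_w, r_e, r_hist, w_gate)
```

With zero gate weights the two sources are mixed evenly; during training, the gate weights would shift $\beta$ toward whichever source is more informative for a given context.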

Since item recommendations should still be based on the user’s immediate preferences, we perform matrix operations between the feature representation of the current user and the user’s historical feature representation $R_{i k}^{h}$ to achieve a linear transformation in high-dimensional space. Fusing the user’s related historical preferences in this way enhances the current user interest and improves the accuracy of the recommendations, yielding the final user preference $r_{u}$. We then calculate the probability of recommending item $i$ to the user, where $n_{i}$ is the vector embedding of the recommended item $i$. Finally, the recommendation set is generated by sorting the items according to the recommendation probability distribution:

$$\mathrm{Pr}_{rec} (u, n_{i}) = \mathrm{Softmax} (r_{u}^{T} \cdot n_{i}) \tag{15}$$

where $r_{u}^{T}$ is the transpose of $r_{u}$.
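A minimal sketch of the scoring and ranking step: each item embedding is scored by its inner product with the user vector, the scores are normalized with a softmax over all items, and the items are sorted by probability. The item embeddings, the user vector, and the function name `recommend` are toy assumptions.

```python
import math

def recommend(r_u, item_embs, top_k=2):
    """Rank items by Softmax(r_u^T n_i) and return the top_k names with all probabilities."""
    scores = {name: sum(a * b for a, b in zip(r_u, emb))
              for name, emb in item_embs.items()}
    top = max(scores.values())                        # subtract max for stability
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    probs = {name: e / total for name, e in exps.items()}
    ranked = sorted(probs, key=probs.get, reverse=True)
    return ranked[:top_k], probs

r_u = [1.0, 1.0]  # final user preference vector
item_embs = {     # hypothetical item embeddings n_i
    "Terminator": [2.0, 1.0],
    "Terminator 2": [1.0, 1.0],
    "Escape Plan": [1.0, 0.5],
}
candidates, probs = recommend(r_u, item_embs)
```

The returned `candidates` list plays the role of the recommendation set that is later passed to the dialogue module.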

We have implemented a strategy with a separately trained recommendation module, using the cross-entropy loss function as the objective function to train the model’s parameters so that its predictions closely align with the actual user behavior.

$$L_{rec} = - \sum_{j = 1}^{BS} \sum_{i = 1}^{N_{i}} \left[ (1 - y_{i j}) \cdot \log \left( 1 - \mathrm{Pr}_{rec}^{(j)} (u, n_{i}) \right) + y_{i j} \cdot \log \left( \mathrm{Pr}_{rec}^{(j)} (u, n_{i}) \right) \right] \tag{16}$$

where $j$ is the index of a conversation, $i$ is the index of an item, $BS$ represents the batch size, $N_{i}$ denotes the size of the item set, and $y_{i j}$ indicates the target label.
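The training objective can be sketched as a cross-entropy summed over the conversations in a batch and over the items. The sketch below uses the standard binary cross-entropy form; the small constant `eps` added for numerical safety, the toy probabilities, and the function name `rec_loss` are assumptions for illustration.

```python
import math

def rec_loss(probs, labels, eps=1e-12):
    """Cross-entropy summed over a batch; probs[j][i] is Pr_rec for item i in dialog j."""
    total = 0.0
    for p_row, y_row in zip(probs, labels):
        for p, y in zip(p_row, y_row):
            # eps guards against log(0); it is not part of the formula itself
            total += y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return -total

labels = [[1, 0]]                          # item 0 is the ground-truth recommendation
good = rec_loss([[0.9, 0.1]], labels)      # confident and correct prediction
bad = rec_loss([[0.1, 0.9]], labels)       # confident but wrong prediction
```

The loss is small when probability mass sits on the labeled item and grows as mass moves to the wrong items, which is what pushes predictions toward actual user behavior.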

3.4. Dialogue Generation

Following previous work (Liang et al., 2021; Zhou et al., 2020), we divide dialogue generation into two main operations. First, the response template generator creates the basic template and framework for replies. When the recommendation state has not yet been reached, the response template generator engages in casual conversation with the user, generating replies without recommendation items while also assessing the dialogue’s state to steer its direction. Second, the item selector is responsible for choosing the most suitable recommendation item from a set of options and integrating it into the generated dialogue template. This step-by-step design helps improve the quality and efficiency of replies, making the system more flexible and adaptable to new environments. The structure of the dialogue generation module is shown in Figure 4.

Figure 4.

Response generation module.

3.4.1. Response Template Generator

Unlike traditional methods that rely on fixed reply templates (Deng et al., 2021; Lei et al., 2020b; Sun & Zhang, 2018), we adopt a Transformer-based approach to generate reply templates, which produces more natural and realistic dialogue. Specifically, we use the standard Transformer encoder (Vaswani et al., 2017) to encode the current dialogue and feed the encoded representation into a KG-enhanced decoder (Wang et al., 2022). The KG-enhanced decoder integrates information from the knowledge graph during generation, which enhances the interpretability of the generated replies. Moreover, following Liang et al. (2021), we add a special token [movie] to the vocabulary and use it as a mask to cover all items mentioned in the dialogue corpus. Thus, at each time step, the reply template generator predicts either an ordinary word or this special token, and the probability of generating the next token is:

$$P_{dial} (w) = \mathrm{Softmax}(W_{d} e + b_{d}) \tag{17}$$

where $W_{d}$ and $b_{d}$ are the weight and bias parameters, $e$ is the representation produced by the decoder, and $w$ denotes a word from the vocabulary $V$. After generation, the [movie] tokens in the template are replaced with items from the candidate set produced by the recommendation module. By tying generation closely to the recommendation module, this strategy increases the relevance and personalization of the generated response.
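The replacement of [movie] tokens can be pictured with a deliberately simplified sketch. It fills slots left to right from an already ranked candidate list, whereas the method described in this paper chooses the item for each slot with a learned item selector. The names `MOVIE_SLOT` and `fill_template` are hypothetical.

```python
MOVIE_SLOT = "[movie]"  # placeholder token named in the text

def fill_template(template_tokens, candidate_items):
    """Replace each [movie] slot, left to right, with the next candidate item.

    Slots with no remaining candidate are left untouched.
    """
    remaining = iter(candidate_items)
    filled = []
    for token in template_tokens:
        if token == MOVIE_SLOT:
            filled.append(next(remaining, MOVIE_SLOT))
        else:
            filled.append(token)
    return filled

template = ["You", "might", "enjoy", "[movie]", "or", "[movie]", "."]
reply = fill_template(template, ["Escape Plan", "Rambo: First Blood Part II"])
```

Keeping the slot as a single vocabulary token lets the generator decide where an item belongs without having to spell out its title.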

3.4.2. Item Selector

After the previous steps, we have obtained the response template. Using the same probability calculation as the recommendation module, we select candidate items to form the candidate set $I_{r}$:

$$\mathrm{Sim}(u, n_{i}) = \mathrm{Softmax}(r_{u}^{T} \cdot n_{i}) \tag{18}$$

How can the recommended items be filled into the template effectively? Current research on dialogue-based recommendation typically explains only the final recommended items, which often leaves the recommendations insufficiently explained. To improve explainability, we aim, provided the recommendation succeeds, to let the system supply additional information about the final item in the reply template (Liang et al., 2021), such as the item’s attributes and features, so that the user better understands the reasons for the recommendation.

To achieve this goal, we first use a KG-enhanced decoder (Zhou et al., 2020) to generate the reply template $G$ , defining $G$ as the tagged token matrix $G_{S}$ and the remaining token matrix $G_{T}$ . The encoder’s output matrix is defined as $X$ , the candidate item set $I_{r}$ forms the embedding matrix $H_{r}$ , and semantic slot filling is performed through self-attention calculation. The specific calculation method is as follows:

$$A_{0}^{n} = \mathrm{MHA}(G_{S}, G_{T}, G_{T}) \tag{19}$$

$$A_{1}^{n} = \mathrm{MHA}(A_{0}^{n}, X, X) \tag{20}$$

$$A_{2}^{n} = \mathrm{MHA}(A_{1}^{n}, H_{r}, H_{r}) \tag{21}$$

$$R_{n} = \mathrm{FFN}(A_{2}^{n}) \tag{22}$$

where MHA stands for multi-head attention mechanism, and FFN stands for feed-forward neural network.

Through this approach, we gradually integrate the generated template, the dialogue context, and key information from the candidate item set. With the item selector in the dialogue module, we can then predict the probability of each item in the candidate set being filled into the dialogue template and select the item with the highest probability for insertion:

$$P_{rec} (w) = \mathrm{Softmax}(W_{r} R_{n} + b_{r}) \tag{23}$$

where $W_{r}$ and $b_{r}$ are weight and bias parameters.
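The chain of attention steps in Equations (19) to (23) can be sketched in plain Python under strong simplifying assumptions: single-head scaled dot-product attention stands in for MHA, there are no learned projections, the FFN is treated as the identity, and the output layer reuses the candidate embeddings $H_{r}$ in place of $W_{r}$ with zero bias. The function names are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-width vectors."""
    d = len(keys[0])
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys])
        out.append([sum(w * v[c] for w, v in zip(weights, values))
                    for c in range(len(values[0]))])
    return out

def select_items(G_S, G_T, X, H_r):
    """Return, for each slot, the index of the highest-probability candidate."""
    a0 = attention(G_S, G_T, G_T)  # Eq. (19): slots attend to the rest of the template
    a1 = attention(a0, X, X)       # Eq. (20): then to the encoded dialogue context
    a2 = attention(a1, H_r, H_r)   # Eq. (21): then to the candidate item embeddings
    picks = []
    for r in a2:                   # Eq. (22)-(23) with identity FFN and W_r = H_r (assumption)
        probs = softmax([sum(a * b for a, b in zip(r, h)) for h in H_r])
        picks.append(max(range(len(H_r)), key=probs.__getitem__))
    return picks

# Toy inputs: one slot, two template tokens, one context vector, two candidates.
G_S = [[1.0, 0.0]]
G_T = [[1.0, 0.0], [0.0, 1.0]]
H_r = [[5.0, 0.0], [0.0, 5.0]]
pick_first = select_items(G_S, G_T, [[1.0, 0.0]], H_r)
pick_second = select_items(G_S, G_T, [[0.0, 1.0]], H_r)
```

In this toy setting the dialogue context alone decides which candidate is chosen, which mirrors the intent of conditioning slot filling on the conversation.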

Unlike conventional chatbot systems, the goal of a dialogue recommendation system is to integrate recommended items and their associated attributes, entities, or keywords into the generated responses. Therefore, in this specific conversational context, we adopt the method proposed by Liang et al. (2021), training separately for dialogue loss and item slot-filling loss, and combining them. First, we train the loss for the dialogue generation template:

$$L_{gen} = - \sum_{t = 1}^{N} \log P_{dial} (s_{t} \mid s_{1}, \dots, s_{t - 1}) \tag{24}$$

where $N$ is the number of dialogue turns and $s_{t}$ is the $t$-th utterance in the dialogue. The loss for the item selector is calculated as follows:

$$L_{sel} = - \sum_{i_{r} \in I_{r}} \log P_{rec} (i_{r}) \tag{25}$$

where $I_{r}$ is the candidate item set. The total loss is given by:

$$L_{response} = λ L_{gen} + L_{sel} \tag{26}$$

where $λ$ is a weighting hyperparameter that balances the dialogue generation and item selection losses.
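A small worked example may help show how the two losses combine. The sketch below assumes each loss is a negative log-likelihood over the probabilities the model assigned to the correct targets; the function names and the probability values are hypothetical.

```python
import math

def nll(target_probs):
    """Negative log-likelihood: -sum(log p) over the probabilities of the targets."""
    return -sum(math.log(p) for p in target_probs)

def response_loss(gen_probs, sel_probs, lam=5.0):
    """L_response = lam * L_gen + L_sel, following Eqs. (24)-(26)."""
    return lam * nll(gen_probs) + nll(sel_probs)

# Hypothetical probabilities assigned to the correct targets.
gen_probs = [0.5, 0.25]  # two generation steps
sel_probs = [0.5]        # one filled item slot
loss = response_loss(gen_probs, sel_probs, lam=5.0)
```

With these values, $L_{gen} = \ln 8$ and $L_{sel} = \ln 2$, so the total is $5 \ln 8 + \ln 2 = 16 \ln 2$. A larger weight makes template generation dominate the gradient.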

4. Experiments

4.1. Datasets and Setup

4.1.1. Datasets

We conduct experiments on two widely used CRS datasets: the English ReDial dataset and the Chinese TG-ReDial dataset. ReDial was collected through crowdsourcing on the Amazon Mechanical Turk (AMT) platform to obtain high-quality movie recommendation dialogues, and consists of 10,006 dialogues comprising 182,150 utterances. TG-ReDial is annotated in a semi-automatic manner, with a focus on natural and fluent transitions from non-recommendation scenes to recommendation scenes, and contains 10,000 dialogues with 129,392 utterances. Both datasets are built from casual conversations between real humans: participants play the role of either a seeker or a recommender and converse with each other to reach a recommendation. Statistics for both datasets are shown in Table 1. Each dataset is split into training, validation, and test sets at a ratio of 8:1:1.

Table 1.
Datasets Overview.

Dataset	# Dialogs	# Utterance	# Items	# Users
ReDial (Li et al., 2018)	10,006	182,150	51,699	956
TG-ReDial (Zhou et al., 2020)	10,000	129,392	33,834	1,482
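An 8:1:1 split like the one described above can be sketched as follows. Shuffling with a fixed seed and assigning the remainder to the test set are assumptions of this example; the text does not state how the dialogues were ordered or how rounding was handled.

```python
import random

def split_dialogues(dialogue_ids, ratios=(8, 1, 1), seed=0):
    """Shuffle the ids and cut them into train, validation, and test parts."""
    ids = list(dialogue_ids)
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_valid = len(ids) * ratios[1] // total
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test

# 10,006 is the ReDial dialogue count reported in Table 1.
train, valid, test = split_dialogues(range(10006))
```

Splitting at the dialogue level, rather than the utterance level, keeps all turns of a conversation in the same partition.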

4.1.2. Baseline

In the research domain of CRSs, we focus on two core tasks: recommendation and dialogue generation. To accurately assess the performance of our model and demonstrate its relative advantages, we conducted independent training for each task and compared the results with several landmark CRS models.

Text CNN (Chen, 2015): This recommender system uses a CNN-based model to extract text features from utterances for recommendations.

ReDial (Li et al., 2018): Introduced alongside the ReDial dataset, it incorporates an HRED (hierarchical recurrent encoder-decoder) and an autoencoder in its recommendation module.

KBRD (Chen et al., 2019): Enhances user representations with a knowledge graph (KG) and utilizes a transformer model in the dialogue module.

KGSF (Zhou et al., 2020): Integrates semantic and KG information to model user preferences and uses mutual information maximization to align entity and word representations in current dialogues.

TG-ReDial (Zhou et al., 2020): A topic-guided conversational recommendation mechanism, controlling state transitions with thematic cues, and uses historical interactions and dialogue to obtain preferences.

BART (Devlin et al., 2018): A denoising auto-encoder pre-trained language model, further fine-tuned for the recommendation task.

RevCore (Lu et al., 2021): It introduces unstructured external knowledge, namely movie-related reviews as an enhancement to user representations.

NTRD (Liang et al., 2021): Proposes a neural dialogue recommender system framework that achieves item recommendation to response generation with a two-stage strategy.

UCCR (Li et al., 2022): Centers on the user, considering multiple aspects of current and historical dialogues as well as similar users to model preferences.

MHIM (Shang et al., 2023): Proposes constructing a multi-granularity hypergraph to model historical dialogue content, employing hypergraph convolution and attention to extract user preferences.

4.1.3. Evaluation Metrics

CRSs are designed to provide high-quality recommendations through natural and fluid interactions. It is therefore crucial to evaluate the dialogue and recommendation tasks separately, with appropriate metrics for each. We selected the following metrics to assess the accuracy and ranking quality of the recommendation task:

Recall@k: Measures the proportion of relevant items that are successfully retrieved by the recommendation system among its top k recommendations.

MRR@k (Mean Reciprocal Rank at k): Evaluates the system’s ability to rank relevant recommendations higher in the list.

NDCG@k (Normalized Discounted Cumulative Gain at k): Measures the overall quality of the recommendation list, taking into account the relevance of each recommended item and their positions in the list.
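The three ranking metrics can be sketched for the common case where each test case has exactly one ground-truth item. That single-target setting is an assumption of this example, as are the function names.

```python
import math

def recall_at_k(ranked, target, k):
    """1 if the target item appears in the top-k list, else 0."""
    return 1.0 if target in ranked[:k] else 0.0

def mrr_at_k(ranked, target, k):
    """Reciprocal of the target's 1-based rank within the top-k, else 0."""
    top = ranked[:k]
    return 1.0 / (top.index(target) + 1) if target in top else 0.0

def ndcg_at_k(ranked, target, k):
    """With one relevant item the ideal DCG is 1, so NDCG = 1 / log2(rank + 1)."""
    top = ranked[:k]
    return 1.0 / math.log2(top.index(target) + 2) if target in top else 0.0

def evaluate(cases, k):
    """Average the three metrics over (ranked_list, target) pairs."""
    n = len(cases)
    return {
        "recall": sum(recall_at_k(r, t, k) for r, t in cases) / n,
        "mrr": sum(mrr_at_k(r, t, k) for r, t in cases) / n,
        "ndcg": sum(ndcg_at_k(r, t, k) for r, t in cases) / n,
    }

# Two toy cases: the target is ranked second in one and missing in the other.
cases = [(["a", "b", "c"], "b"), (["a", "b", "c"], "z")]
scores = evaluate(cases, k=2)
```

Recall@k only checks membership, while MRR@k and NDCG@k also reward placing the target nearer the top of the list.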

For dialogue tasks, we employ the Dist-k metric.

Dist-k: measures the diversity of the response generated content in dialogue systems tasks. Specifically, it calculates the ratio of distinct k-grams (consecutive k items, such as words or characters) to the total number of k-grams in all generated texts. This metric helps to understand whether the system tends to produce repetitive or patterned responses.
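The ratio described above can be computed as in the sketch below, which assumes responses are already tokenised into lists. A ratio defined this way cannot exceed 1, so scores above 1 would imply a different normaliser (for example, the number of responses); that reading is an assumption, and the function name `distinct_k` is hypothetical.

```python
def distinct_k(responses, k):
    """Ratio of unique k-grams to total k-grams over all tokenised responses."""
    unique = set()
    total = 0
    for tokens in responses:
        for i in range(len(tokens) - k + 1):
            unique.add(tuple(tokens[i:i + k]))
            total += 1
    return len(unique) / total if total else 0.0

responses = [["i", "like", "it"], ["i", "like", "that"]]
dist_1 = distinct_k(responses, 1)
dist_2 = distinct_k(responses, 2)
```

Here the two responses share the bigram ("i", "like"), so three of the four bigrams are distinct.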

4.1.4. Parameter Settings

Our experiments are implemented in PyTorch, using the CRSLab framework (Zhou et al., 2021) as the basis for modifying and extending models. We also performed comparative experiments to verify the effectiveness of our methods. For both the recommendation and dialogue tasks, we set the embedding sizes for knowledge graphs and word tokens to 300 and 128, respectively. To reduce the information redundancy caused by excessively long dialogue texts, we truncate historical and current dialogues to lengths of 1024 and 256, respectively. For efficiency, we set the number of layers for GCN, R-GCN, and HGCN to 1. We use the Adam optimizer with its default parameter configuration and a learning rate of 0.001. During training, the batch sizes for the recommendation and dialogue tasks are 128 and 64, respectively; gradient clipping is limited to [0, 0.1]; and the dialogue loss weight $λ$ is set to 5.
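For reference, the settings listed above can be gathered into a single configuration object. The field names below are hypothetical and are not CRSLab configuration keys; only the values come from the text.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters as reported in the text; field names are illustrative."""
    kg_embedding_size: int = 300
    word_embedding_size: int = 128
    max_history_length: int = 1024   # truncation length for historical dialogues
    max_context_length: int = 256    # truncation length for the current dialogue
    num_graph_layers: int = 1        # shared by GCN, R-GCN, and HGCN
    optimizer: str = "adam"
    learning_rate: float = 1e-3
    rec_batch_size: int = 128
    dialog_batch_size: int = 64
    gradient_clip: float = 0.1
    dialog_loss_weight: float = 5.0  # lambda in Eq. (26)

config = TrainingConfig()
```

Freezing the dataclass prevents the settings from being changed accidentally between the recommendation and dialogue training stages.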

4.2. Experiment Results of Recommendation Task

4.2.1. Result Analysis

In this subsection, we report a series of comparative experiments that verify the performance of MGIRD on the recommendation task. Table 2 shows the results of different methods on the ReDial and TG-ReDial datasets.

Table 2.
Comparison of Models on the ReDial and TG-ReDial Datasets.

Dataset	ReDial						TG-ReDial					
Model	R@10	R@50	MRR@10	MRR@50	NDCG@10	NDCG@50	R@10	R@50	MRR@10	MRR@50	NDCG@10	NDCG@50
Text CNN	0.073	0.181	0.044	0.048	0.058	0.081	0.005	0.019	0.002	0.002	0.003	0.006
ReDial	0.173	0.336	0.078	0.084	0.097	0.135	0.010	0.037	0.003	0.004	0.005	0.011
KBRD	0.183	0.369	0.078	0.086	0.100	0.143	0.014	0.048	0.005	0.006	0.007	0.014
KGSF	0.201	0.403	0.084	0.093	0.111	0.156	0.018	0.054	0.007	0.009	0.010	0.018
TG-ReDial	0.189	0.380	0.080	0.088	0.103	0.148	0.017	0.051	0.006	0.008	0.009	0.016
BART	0.169	0.378	0.065	0.074	0.083	0.135	0.005	0.019	0.001	0.002	0.002	0.005
RevCore	0.206	0.409	0.085	0.095	0.113	0.158	0.019	0.058	0.008	0.009	0.011	0.019
NTRD	0.180	0.360	0.068	0.077	0.094	0.134	0.026	0.066	0.012	0.014	0.015	0.024
UCCR	0.216	0.426	0.088	0.098	0.118	0.164	0.023	0.066	0.009	0.011	0.012	0.021
MHIM	0.197	0.383	0.074	0.083	0.102	0.144	0.030	0.078	0.011	0.013	0.015	0.026
MGIRD	0.266	0.460	0.123	0.132	0.156	0.199	0.030	0.079	0.012	0.014	0.016	0.027

The results show that Text CNN, a standalone recommendation model, performs significantly worse than the full CRS models, reaching only about 50% of their average performance. This is mainly because Text CNN cannot capture the user’s immediate interests, whereas a CRS senses real-time changes in user preferences through interactive dialogue, thereby improving recommendation accuracy.

Furthermore, our results show that integrating external data improves recommendation performance. Specifically, incorporating diverse information such as knowledge graphs (Zhou et al., 2020), user reviews (Lu et al., 2021), and historical interactions (Li et al., 2022) into the recommendation model can significantly improve the relevance and personalization of the recommendations. On the ReDial dataset, the models built on these three types of information (KGSF, RevCore, and UCCR) and our MGIRD all exceed 0.4 in R@50, indicating relatively high recall.

MGIRD matches or outperforms every baseline on all metrics: by integrating multiple user features, it achieves the best results on ReDial by a clear margin and equals or slightly exceeds the strongest baselines (MHIM and NTRD) on TG-ReDial. These results further validate the effectiveness of our proposed CRS in handling complex user requirements.
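To make the size of the margin on ReDial concrete, the relative gain of MGIRD over UCCR, the strongest baseline on every ReDial metric in Table 2, can be computed directly from the reported scores. The helper name `relative_gain` is hypothetical; the numbers are copied from Table 2.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, as a fraction."""
    return (ours - baseline) / baseline

# ReDial columns of Table 2.
mgird = {"R@10": 0.266, "R@50": 0.460, "MRR@10": 0.123,
         "MRR@50": 0.132, "NDCG@10": 0.156, "NDCG@50": 0.199}
uccr = {"R@10": 0.216, "R@50": 0.426, "MRR@10": 0.088,
        "MRR@50": 0.098, "NDCG@10": 0.118, "NDCG@50": 0.164}

gains = {metric: relative_gain(mgird[metric], uccr[metric]) for metric in mgird}
for metric, gain in gains.items():
    print(f"{metric}: {gain:.1%}")
```

The gain is smallest for R@50 and largest for MRR@10, which suggests the improvement comes mainly from ranking the target item closer to the top.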

4.2.2. Ablation Study

To investigate the contributions of various components to the recommendation task, we conducted ablation experiments on the following three types of graph features: (a) MGIRD w/o Wo: removing the word embeddings extracted from ConceptNet; (b) MGIRD w/o En: removing the entity embeddings extracted from DBpedia; (c) MGIRD w/o Hist: removing the context-aware historical representations constructed and extracted from hypergraphs. The results are shown in Figure 5.

Figure 5.

Ablation study results on ReDial and TG-ReDial datasets: (a) ablation study results conducted on ReDial and (b) ablation study results conducted on TG-ReDial.

The experimental results show that all three components are indispensable: removing any one of them degrades performance. In particular, removing either of the first two features, the word and entity representations that make up the current user’s representation, decreases performance by more than 20%. This suggests that in a dialogue-based recommender system, user preferences are captured primarily through the representation of the user’s current interests.

The user’s historical information, although beneficial, contributes less to the overall modeling of the user’s interests: removing it decreases performance by only about 10%. In addition, on both the English and Chinese datasets, the entity and word representations of the current user differ in their importance to the system’s understanding of the user’s interests.

4.2.3. Comparison With Large Language Model

In recent years, large language models (LLMs) have demonstrated unprecedented reasoning and generative capabilities, providing new opportunities for developing powerful CRSs. Several studies have begun to exploit the inference capabilities of LLMs by fine-tuning them to improve their performance on conversational recommendation tasks. Here, we compare our approach with LLMCRS, proposed by Feng et al. (2023), which divides the conversational recommendation task into four subtasks: subtask detection, model matching, subtask execution, and response generation. It takes the outputs of the first three subtasks and the conversation context as inputs and uses reinforcement learning driven by CRS performance feedback to fine-tune the LLM (LLaMA and T5 in this case) for response generation. Given the large number of parameters and strong dialogue generation capacity of LLMs, our comparison focuses on recommendation performance, evaluated on the TG-ReDial dataset. Table 3 details the comparison results:

Table 3.
Comparison Results with the Large Language Model (TG-ReDial).

Model	R@10	R@50	MRR@50	NDCG@50
LLMCRS (Flan-T5)	0.0302	0.0792	0.0138	0.0261
LLMCRS (LLaMA)	0.0308	0.0791	0.0139	0.0263
MGIRD	0.0300	0.0794	0.0140	0.0267

These results show that our method performs well on R@50, MRR@50, and NDCG@50, but trails LLMCRS on R@10 (LLMCRS: 0.0308; MGIRD: 0.0300). We speculate that this is because our model was trained only for recommendation performance, without further using and training on the recommended candidates within the conversational system.

4.3. Experiment Results of Dialogue Task

In this section, we evaluate the effectiveness of our approach on the dialogue task; Table 4 compares the evaluation metrics across methods. Notably, LLMCRS and TG-ReDial use additional pre-trained models for dialogue generation, which would make direct comparison with the other methods unfair; we therefore exclude them from the comparison. For dialogue performance, we focus primarily on the diversity metric Dist-k. The specific comparison results are shown in Table 4.

Table 4.
Diversity Metrics Comparison Across Models.

Dataset	ReDial			TG-ReDial		
Model	Dist-2	Dist-3	Dist-4	Dist-2	Dist-3	Dist-4
ReDial	0.0689	0.2697	0.4638	0.2672	0.5288	0.8012
KBRD	0.0712	0.2883	0.4893	0.4629	1.0540	1.5720
KGSF	0.0761	0.3865	0.8470	0.5269	1.2560	1.9240
RevCore	0.0769	0.3065	0.5283	0.4513	1.0932	1.6631
NTRD	0.0896	0.3566	0.7294	0.5843	1.5200	2.3710
UCCR	0.0818	0.3289	0.5635	0.5365	1.2783	1.9376
MGIRD	0.0833	0.4054	0.9682	0.8976	1.9540	2.8450
MGIRD w/o Wo	0.0694	0.3320	0.7724	0.7568	1.5900	2.2297
MGIRD w/o En	0.0639	0.3207	0.8073	0.7159	1.5720	2.2790
MGIRD w/o Sel	0.0474	0.2996	0.9461	0.6118	1.5880	2.4640

4.3.1. Results Analysis

As depicted in Table 4, our approach demonstrates superior performance compared to the majority of other baseline methods across various scenarios.

Further analysis of the baselines reveals that the KG-enhanced decoder significantly improves the Transformer’s ability to understand user conversations in recommendation scenarios. By incorporating rich semantic associations between entities and contextual knowledge into the model, the KG-enhanced decoder captures subtle variations in user intent more accurately. NTRD also outperforms the other baselines; in particular, its Dist-2 score on the ReDial dataset reaches 0.0896, the highest among all methods. We believe this is because NTRD uses multi-head attention in the item selection process to match appropriate recommended items with the dialogue content. NTRD also adds user contextual information to enrich the item slots in each attention layer, thus generating more diverse and richer responses.

Building on these approaches, we further infuse the integrated user preference information into the decoding process. With the dialogue template generator, the generated content includes not only movie recommendations but also casual conversation, which broadens the direction of dialogue generation and raises the Dist scores. The experimental results demonstrate that our method generates responses that are more diverse and better aligned with user interests.

4.3.2. Ablation Study

To explore the role of each component in the conversational task, we conducted an ablation study by removing graph features and the item selector in the generation module.

Specifically, the study included: (a) MGIRD w/o Wo: removing word embeddings extracted from ConceptNet, (b) MGIRD w/o En: removing entity embeddings extracted from DBpedia, and (c) MGIRD w/o Sel: removing the item selector in the generation module. The results are shown in Table 4.

The ablation results show that removing any of the components degrades performance. This suggests that the multi-aspect graph representations used in the dialogue module improve both the item selector and dialogue generation, and that our item selection mechanism contributes positively to dialogue generation by introducing more diverse and content-rich information. Our method achieves better Dist performance on both the Chinese and English datasets: on ReDial, it exceeds the strongest baseline (KGSF) by about 5% on Dist-3 and 14% on Dist-4, although NTRD remains ahead on Dist-2, and on TG-ReDial it exceeds NTRD by roughly 20% to 54% across Dist-2 to Dist-4. This suggests that our method generates more diverse responses that better match user interests during the conversation.

4.3.3. Realistic Conversation Recommendation Scenario Analysis

To verify the practicality of the method, we analyze a realistic conversational recommendation scenario, shown in Figure 6. First, the user makes a recommendation request. Our method recognizes the keyword “Action Movie” through self-attention computation, which narrows the candidate space, and then enriches this information with the entity “Escape Plan” mentioned in the user’s history to generate the candidate “Rambo: First Blood Part II,” which is inserted into the reply. When the user responds with a prompt such as “I’ve already seen it, but like it,” which indicates that the recommendation is heading in the right direction, the system generates two candidate entities with stronger relevance. The system then gives more weight to the entity that is more relevant to the historical information, prioritizes movies related to that entity, and integrates the result with the generated dialogue template to produce a satisfactory response.

4.3.4. Sensitivity Analysis of Dialogue Parameters

To verify that the preset dialogue loss weight $λ$ is well chosen, we conducted a sensitivity analysis of this parameter. The weight matters because the model constructs a response in two stages, generating a dialogue template and selecting items: template generation governs the fluency and logical coherence of the language, while item selection directly determines the relevance and accuracy of the recommended content. Emphasizing either part alone during training may bias dialogue quality, so our goal is to balance the two. We therefore introduce the weighting parameter $λ$ to guide learning toward a better combination of the two objectives and thus higher-quality responses. The parameter sensitivity analysis is shown in Figure 7.

Figure 6.

Realistic scenario analysis.

Figure 7.

Sensitivity analysis of dialogue parameters on ReDial and TG-ReDial datasets: (a) ReDial and (b) TG-ReDial.

Parameter sensitivity experiments on both datasets show that our setting achieves relatively good performance. We therefore fix the dialogue loss weight $λ$ to 5 in our model, which helps maintain dialogue quality and enhance the user experience.

5. Conclusion

This study explores integrating multi-aspect graph and hypergraph structures to enhance the representation of user information. First, it uses graph convolutional networks to extract graph features of words and entities from the knowledge graphs related to the current conversation, and then integrates the user’s historical information by constructing a hypergraph structure. It then extracts information from the historical conversations and the item-related entities to enhance the understanding of the user’s current interests. In the conversation generation phase, we use these features to guide template generation and item selection. Extensive experiments on both datasets show the superior performance of our approach compared to the baselines.

Several aspects of our work can still be improved. In constructing the hypergraphs, we give the same weight to all historical information and ignore the influence of time on the user’s historical representation: the closer a piece of historical information is to the current conversation, the more it affects the user’s current interest, and the older it is, the less it does. Moreover, a user’s interests are reflected not only in historical information but also in other sources (e.g., reviews and information about similar users). In the future, we plan to explore more comprehensive dimensions of user information to generate richer dialogue content, while also accounting for the effect of time on the extraction of historical hypergraph information. In addition, given the rapid development of dialogue models, we hope to improve the user’s dialogue experience by adopting more advanced dialogue models (including, but not limited to, LLMs) to make the dialogue more natural and its wording more accurate.

Footnotes

ORCID iD

Qing Yang Bai

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Alam, M. T., Ahmed, C. F., Samiullah, & Leung, C. K. (2021). Mining frequent patterns from hypergraph databases. In Pacific–Asia conference on knowledge discovery and data mining (pp. 3–15).

2. Ashraf (2021). Avoiding vulnerabilities and attacks with a proactive strategy for web applications. Advances in Robotics and Mechanical Engineering, 3(2), 263–271.

3. Bai, Zhang, & Torr, P. H. (2021). Hypergraph convolution and hypergraph attention. Pattern Recognition, 110, 107637.

4. Bizer, Lehmann, Kobilarov, Auer, Becker, Cyganiak, & Hellmann (2009). DBpedia - A crystallization point for the web of data. Journal of Web Semantics, 7(3), 154–165.

5. Chen (2015). Convolutional neural network for sentence classification. Master’s thesis, University of Waterloo.

6. Chen, Lin, Zhang, Ding, Cen, Yang, & Tang (2019). Towards knowledge-based recommender dialog system. arXiv preprint arXiv:1908.05391.

7. Deng, Sun, Ding, & Lam (2021). Unified conversational recommendation policy learning via graph-based reinforcement learning. In Proceedings of the 44th International ACM SIGIR conference on research and development in information retrieval (pp. 1431–1441).

8. Devlin, Chang, M. W., Lee, & Toutanova (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

9. Feng, Liu, Xue, Cai, Jiang, & Sun (2023). A large language model enhanced conversational recommender system. arXiv preprint arXiv:2308.06212.

10. Gao, Lei, de Rijke, & Chua, T. S. (2021). Advances and challenges in conversational recommender systems: A survey. AI Open, 2, 100–126.

11. Guo, & Tang (2017). DeepFM: A factorization-machine based neural network for CTR prediction. arXiv preprint arXiv:1703.04247.

12. Guo, Zhang, Sun, Ren, Chen, & Ren (2023). Towards explainable conversational recommender systems. In Proceedings of the 46th International ACM SIGIR conference on research and development in information retrieval (pp. 2786–2795).

13. Kipf, T. N., & Welling (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907.

14. Lei, Miao, Hong, Kan, M. Y., & Chua, T. S. (2020a). Estimation-action-reflection: Towards deep interaction between conversational and recommender systems. In Proceedings of the 13th international conference on web search and data mining (pp. 304–312).

15. Lei, Zhang, Miao, Wang, Chen, & Chua, T. S. (2020b). Interactive path reasoning on graph for conversational recommendation. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 2073–2083).

16. Li, Ebrahimi Kahou, Schulz, Michalski, Charlin, & Pal (2018). Towards deep conversational recommendations. Advances in Neural Information Processing Systems, 31.

17. Li, Wei, Mao, X. L., Yuan, Xie, & Chen (2023). TREA: Tree-structure reasoning schema for conversational recommendation. arXiv preprint arXiv:2307.10543.

18. Li, Xie, Zhu, & Zhuang (2022). User-centric conversational recommendation with multi-aspect user modeling. In Proceedings of the 45th International ACM SIGIR conference on research and development in information retrieval (pp. 223–233).

19. Liang, Miao, Chen, & Jiang (2021). Learning neural templates for recommender dialogue system. arXiv preprint arXiv:2109.12302.

20. Lu, Bao, Song, & Cui (2021). RevCore: Review-augmented conversational recommendation. arXiv preprint arXiv:2106.00957.

21. Luo, Yang, & Sanner (2020). Deep critiquing for VAE-based recommender systems. In Proceedings of the 43rd International ACM SIGIR conference on research and development in information retrieval (pp. 1269–1278).

22.

Takanobu

Huang

(2020). Cr-walker: Tree-structured graph reasoning and dialog acts for conversational recommendation. arxiv preprint arxiv:2010.10333.

23.

Moradizeyveh

(2022). Intent recognition in conversational recommender systems. arXiv preprint arXiv:2212.03721.

24. Ren, Chen, Nguyen, Q. V. H., Cui, Huang, Yin (2024). Explicit knowledge graph reasoning for conversational recommendation. ACM Transactions on Intelligent Systems and Technology, 15(4), 1–21.

25. Schlichtkrull, Kipf, T. N., Bloem, Van Den Berg, Titov, Welling (2018). Modeling relational data with graph convolutional networks. In The Semantic Web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings (pp. 593–607). Springer International Publishing.

26. Shang, Hou, Zhao, W. X., Zhang (2023). Multi-grained hypergraph interest modeling for conversational recommendation. AI Open, 4, 154–164.

27. Speer, Chin, Havasi (2017). Conceptnet 5.5: An open multilingual graph of general knowledge. In Proceedings of the AAAI conference on artificial intelligence (pp. 1).

28. Sun, Zhang (2018). Conversational recommender system. In The 41st international ACM SIGIR conference on research & development in information retrieval (pp. 235–244).

29. Tran, D. H., Sheng, Q. Z., Zhang, W. E., Hamad, S. A., Zaib, Tran, N. H., Khoa, N. L. D. (2020). Deep conversational recommender systems: A new frontier for goal-oriented dialogue systems. arXiv preprint arXiv:2004.13245.

30. Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, A. N., Polosukhin (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

31. Wang, Zhou, Wen, J. R., Zhao, W. X. (2022). Towards unified conversational recommender systems via knowledge-enhanced prompt learning. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining (pp. 1929–1937).

32. Xia, Huang, Zhang (2022). Self-supervised hypergraph transformer for recommender systems. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining (pp. 2100–2109).

33. Zhang, Chen, Yang, Croft, W. B. (2018). Towards conversational search and recommendation: System ask, user respond. In Proceedings of the 27th ACM International conference on information and knowledge management (pp. 177–186).

34. Zhang, Huang, Zou (2024). Improving conversational recommender systems via multi-preference modelling and knowledge-enhanced. Knowledge-Based Systems, 286, 111361.

35. Zhang, Liu, Yuan, Miao (2022). Minimalist and high-performance conversational recommendation with uncertainty estimation for user preference. arXiv preprint arXiv:2206.14468.

36. Zhang, Liu, Zhong, Zhang, Wang, Miao (2021). Kecrs: Towards knowledge-enriched conversational recommendation system. arXiv preprint arXiv:2105.08261.

37. Zhang, Shen, Pang, Wei, Pei (2022). Multiple choice questions based multi-interest policy learning for conversational recommendation. In Proceedings of the ACM Web conference (pp. 2153–2162).

38. Zhao, Wei, Liu, Wang, Mao, X. L., Wen (2023a). Towards hierarchical policy learning for conversational recommendation with hypergraph-based reinforcement learning. arXiv preprint arXiv:2305.02575.

39. Zhao, Wei, Mao, X. L., Zhu, Yang, Wen, Zhu (2023b). Multi-view hypergraph contrastive policy learning for conversational recommendation. In Proceedings of the 46th International ACM SIGIR conference on research and development in information retrieval (pp. 654–664).

40. Zhao, Zhou, Wang, Zhao, W. X., Pan, Cao, Wen, J. R. (2023). Alleviating the long-tail problem in conversational recommender systems. In Proceedings of the 17th ACM conference on recommender systems (pp. 374–385).

41. Zhou, Wang, Zhou, Shang, Cheng, Zhao, W. X., Wen, J. R. (2021). CRSLab: An open-source toolkit for building conversational recommender system. arXiv preprint arXiv:2101.00939.

42. Zhou, Zhao, W. X., Bian, Zhou, Wen, J. R. (2020). Improving conversational recommender systems via knowledge graph based semantic fusion. In Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery & data mining (pp. 1006–1014).

43. Zhou, Zhao, W. X., Wang, Jiang (2022). C²-CRS: Coarse-to-fine contrastive learning for conversational recommender system. In Proceedings of the Fifteenth ACM international conference on web search and data mining (pp. 1488–1496).

44. Zhou, Zhao, W. X., Wang, Wen, J. R. (2020). Towards topic-guided conversational recommender system. arXiv preprint arXiv:2010.04125.

45. Zou, Chen, Kanoulas (2020). Towards question-based recommender systems. In Proceedings of the 43rd International ACM SIGIR conference on research and development in information retrieval (pp. 881–890).

Dataset	ReDial						TG-ReDial
Model	R@10	R@50	MRR@10	MRR@50	NDCG@10	NDCG@50	R@10	R@50	MRR@10	MRR@50	NDCG@10	NDCG@50
Text CNN	0.073	0.181	0.044	0.048	0.058	0.081	0.005	0.019	0.002	0.002	0.003	0.006
ReDial	0.173	0.336	0.078	0.084	0.097	0.135	0.010	0.037	0.003	0.004	0.005	0.011
KBRD	0.183	0.369	0.078	0.086	0.100	0.143	0.014	0.048	0.005	0.006	0.007	0.014
KGSF	0.201	0.403	0.084	0.093	0.111	0.156	0.018	0.054	0.007	0.009	0.010	0.018
TG-ReDial	0.189	0.380	0.080	0.088	0.103	0.148	0.017	0.051	0.006	0.008	0.009	0.016
BART	0.169	0.378	0.065	0.074	0.083	0.135	0.005	0.019	0.001	0.002	0.002	0.005
RevCore	0.206	0.409	0.085	0.095	0.113	0.158	0.019	0.058	0.008	0.009	0.011	0.019
NTRD	0.180	0.360	0.068	0.077	0.094	0.134	0.026	0.066	0.012	0.014	0.015	0.024
UCCR	0.216	0.426	0.088	0.098	0.118	0.164	0.023	0.066	0.009	0.011	0.012	0.021
MHIM	0.197	0.383	0.074	0.083	0.102	0.144	0.030	0.078	0.011	0.013	0.015	0.026
MGIRD	0.266	0.460	0.123	0.132	0.156	0.199	0.030	0.079	0.012	0.014	0.016	0.027
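
The recall, mean reciprocal rank, and normalized discounted cumulative gain columns in the table above are standard top-K ranking metrics. The following is a minimal sketch of how such scores are commonly computed, assuming exactly one ground-truth item per recommendation turn (an assumption made here for illustration; this section does not state the evaluation protocol). The function names are hypothetical and are not taken from the MGIRD implementation.

```python
import math
from typing import Dict, Sequence


def recall_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """Return 1.0 if the target item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def mrr_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """Reciprocal of the 1-based rank of the target within the top-k, else 0.0."""
    top = list(ranked[:k])
    return 1.0 / (top.index(target) + 1) if target in top else 0.0


def ndcg_at_k(ranked: Sequence[int], target: int, k: int) -> float:
    """NDCG with a single relevant item (assumed): the ideal DCG is 1."""
    top = list(ranked[:k])
    return 1.0 / math.log2(top.index(target) + 2) if target in top else 0.0


def evaluate(rankings: Sequence[Sequence[int]],
             targets: Sequence[int],
             k: int) -> Dict[str, float]:
    """Average each metric over all (ranking, target) pairs."""
    n = len(targets)
    pairs = list(zip(rankings, targets))
    return {
        "recall": sum(recall_at_k(r, t, k) for r, t in pairs) / n,
        "mrr": sum(mrr_at_k(r, t, k) for r, t in pairs) / n,
        "ndcg": sum(ndcg_at_k(r, t, k) for r, t in pairs) / n,
    }
```

Under the single-target assumption, NDCG@K reduces to 1/log2(rank + 1) for a hit at position rank, which is why it always lies between MRR@K and Recall@K for the same K.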

Dataset	ReDial			TG-ReDial
Model	Dist-2	Dist-3	Dist-4	Dist-2	Dist-3	Dist-4
ReDial	0.0689	0.2697	0.4638	0.2672	0.5288	0.8012
KBRD	0.0712	0.2883	0.4893	0.4629	1.0540	1.5720
KGSF	0.0761	0.3865	0.8470	0.5269	1.2560	1.9240
RevCore	0.0769	0.3065	0.5283	0.4513	1.0932	1.6631
NTRD	0.0896	0.3566	0.7294	0.5843	1.5200	2.3710
UCCR	0.0818	0.3289	0.5635	0.5365	1.2783	1.9376
MGIRD	0.0833	0.4054	0.9682	0.8976	1.9540	2.8450
MGIRD w/o Wo	0.0694	0.3320	0.7724	0.7568	1.5900	2.2297
MGIRD w/o En	0.0639	0.3207	0.8073	0.7159	1.5720	2.2790
MGIRD w/o Sel	0.0474	0.2996	0.9461	0.6118	1.5880	2.4640
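
Dist-n (Distinct-n) measures the lexical diversity of generated responses by counting distinct n-grams. Several TG-ReDial values above exceed 1, which cannot happen when the count is divided by the total number of n-grams; this suggests normalization by the number of generated responses instead, although that is an inference from the numbers and is not stated in this section. The sketch below supports both normalizations; `distinct_n` and its `per_response` flag are hypothetical names used only for illustration.

```python
from typing import List, Set, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """All contiguous n-grams of a token list (empty if it is shorter than n)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(responses: List[List[str]], n: int, per_response: bool = True) -> float:
    """Number of distinct n-grams across all responses, normalized.

    per_response=True  divides by the number of responses (assumed here,
                       because it allows values above 1).
    per_response=False divides by the total n-gram count (always <= 1).
    """
    unique: Set[Tuple[str, ...]] = set()
    total = 0
    for tokens in responses:
        grams = ngrams(tokens, n)
        unique.update(grams)
        total += len(grams)
    denominator = len(responses) if per_response else total
    return len(unique) / denominator if denominator else 0.0
```

For example, the two tokenized responses "i like this movie" and "i like that movie" share one bigram, so they contain five distinct bigrams out of six in total.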

Multi-Aspect Graph Representation Feature Integration for Recommender Dialogue System


Table 1. Datasets Overview.

Dataset	# Dialogs	# Utterances	# Items	# Users
ReDial (Li et al., 2018)	10,006	182,150	51,699	956
TG-ReDial (Zhou et al., 2020)	10,000	129,392	33,834	1,482

Table 3. The Comparison Results with the Large Language Model (TG-ReDial).

Model	R@10	R@50	MRR@50	NDCG@50
LLMCRS (Flan-T5)	0.0302	0.0792	0.0138	0.0261
LLMCRS (LLaMA)	0.0308	0.0791	0.0139	0.0263
MGIRD	0.0300	0.0794	0.0140	0.0267
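
The differences in Table 3 are small in absolute terms, so it can help to read them as relative changes against the stronger of the two LLMCRS variants on each metric. The sketch below does this arithmetic using only the values reported in Table 3; the dictionary layout and the function names are illustrative choices, not part of the original work.

```python
from typing import Dict

# Values copied from Table 3 (TG-ReDial).
TABLE3: Dict[str, Dict[str, float]] = {
    "LLMCRS (Flan-T5)": {"R@10": 0.0302, "R@50": 0.0792, "MRR@50": 0.0138, "NDCG@50": 0.0261},
    "LLMCRS (LLaMA)": {"R@10": 0.0308, "R@50": 0.0791, "MRR@50": 0.0139, "NDCG@50": 0.0263},
    "MGIRD": {"R@10": 0.0300, "R@50": 0.0794, "MRR@50": 0.0140, "NDCG@50": 0.0267},
}


def relative_change(ours: float, baseline: float) -> float:
    """Signed relative difference of `ours` with respect to `baseline`."""
    return (ours - baseline) / baseline


def compare_to_best_baseline(table: Dict[str, Dict[str, float]],
                             ours: str = "MGIRD") -> Dict[str, float]:
    """For each metric, compare `ours` with the best-scoring other model."""
    changes = {}
    for metric, value in table[ours].items():
        best = max(row[metric] for name, row in table.items() if name != ours)
        changes[metric] = relative_change(value, best)
    return changes


if __name__ == "__main__":
    for metric, change in compare_to_best_baseline(TABLE3).items():
        print(f"{metric}: {change:+.2%}")
```

On these numbers MGIRD is slightly below the best LLMCRS variant on R@10 and slightly above it on R@50, MRR@50, and NDCG@50, with every gap under 3% in relative terms.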