User Clustering Based on Cross-Domain Cognition for Recommendation Services

Abstract

The current trend in recommendation services is to prioritize personalization to ensure accurate recommendations. This study aims to enhance the user-based collaborative filtering algorithm for cross-domain recommendations by exploiting the similarity in user cognition across multiple domains. The approach consists of three steps: (i) gathering user feedback from various domains to represent their cognition, (ii) constructing a user cognition-based collaborative filtering model for multi-domain recommendations, and (iii) generating recommendations in the target domain. The experimental results demonstrate that the proposed model outperforms all baseline methods. In particular, it improves on the baselines by approximately 12% to 16% in terms of mean average precision and normalized discounted cumulative gain.

Keywords

user cognition, collaborative filtering, cross-domain, recommendation systems

1. Introduction

In modern web applications, recommendation systems (RSs) are extensively utilized in various areas, such as video sharing on YouTube, e-commerce on Amazon, and social networking on Facebook, to address the issue of information overload (Ricci et al., 2010). Among the existing RS techniques, collaborative filtering (CF) is the most promising approach (Nguyen et al., 2020a, 2020b; Ricci et al., 2010; Vuong Nguyen et al., 2021). The fundamental principle of CF is to identify the most appropriate products for a specific user based on the preferences of other users with similar interests. Many effective RS techniques have been proposed in recent years, particularly those relying on CF (Nguyen et al., 2020c, 2021).

However, most real-world applications encounter the issue of data sparsity, as only a limited number of users provide ratings or reviews for items (Ricci et al., 2010). As a result, the accuracy of recommendations generated by CF algorithms is reduced. CF-based RSs face data sparsity problems to varying degrees (known as the cold-start problem), particularly for new users or products (Duan et al., 2022; Zhao et al., 2022). This issue can lead to over-fitting during CF model training, which can significantly impact the accuracy of recommendations. To address this problem, two approaches have been developed. The first method involves intelligently eliciting user preferences, while the second method involves inferring user preferences using other data. A promising solution under the second approach is the cross-domain recommendation, which uses user preferences and item features from different but related domains to make recommendations in the target domain (Yu et al., 2019; Zhu et al., 2020). Cross-domain RSs (CDRSs) have emerged to overcome data sparsity issues by utilizing relatively richer information, such as user/item information, thumbs-up, tags, reviews, and observed ratings, from the source domain to improve recommendation accuracy in the target domain (Berkovsky et al., 2007). For example, an RS can recommend books to users based on their movie reviews because a common user in multiple domains is likely to have similar preferences.

CDRSs are RSs that provide personalized recommendations by leveraging data from multiple domains or sources (Fernández-Tobías et al., 2019; Yu et al., 2019; Zhang et al., 2022). These systems aim to overcome the limitations of traditional RSs, which are restricted to a single domain. CDRSs are currently being studied in various domains for different purposes, such as cross-system personalization in user modeling (Hu et al., 2019; Yu et al., 2019; Yuan et al., 2019), practical applications of transfer learning techniques in machine learning (Fernández-Tobías et al., 2019; Li & Tuzhilin, 2020; Zhang et al., 2022), and alleviating the lack of user preference data in RSs. Cross-domain CF (CDCF) is a recommendation technique in a CDRS that leverages user and item data from multiple domains to make recommendations in a target domain. This method addresses the cold-start problem by utilizing the data from other domains where more user and item data are available. The basic idea behind CDCF (Liu et al., 2022) is to identify correlations and similarities between the different domains, which allows the algorithm to transfer knowledge from one domain to another. CDCF has been successfully applied in various domains, such as music, movies, and e-commerce. The main advantage of this approach is that it allows for better personalization, even for new users or items, by using information from other domains. However, challenges such as domain adaptation and feature selection still need to be addressed to ensure the effectiveness and efficiency of CDCF (Natarajan et al., 2022; Nguyen & Jung, 2023; Yu et al., 2022; Zhang et al., 2022).

This research aims to extend the CF algorithms to overcome the difficulties of CDRSs stated above. This requires transforming the CF problem into a multi-task learning problem, in which similarities between multiple domains are utilized to handle the issue (Krohn-Grimberghe et al., 2012; Singh & Gordon, 2008). Remarkably, multi-domain settings often exhibit significant similarities in online user behavior. For instance, similar products from book and fashion domains are recommended when users rate or like any item in the movie domain of the SABRE platform (Nguyen & Jung, 2020). Clicking on these recommended products implies that they are similar items in the auxiliary domains of users in the source domain, which helps gather cognitive similarity data of users across different domains. Hence, in multiple-domain scenarios, we can exploit the cognitive similarity between users across different domains to define the group of nearest neighbors for the target user. The idea is to represent the target user’s cognition with similar items by the set of similar neighbors, which can be obtained by combining the domain-specific and domain-shared cognitive similarities. This approach can help to overcome the data sparsity problem and improve the accuracy of recommendations in CDRS. To enhance the accuracy of the clustering method for similar users from multiple domains, the user cognition-based CF (UCCF) model is presented for a top-N cross-domain recommendation task. UCCF is based on the adaptive K-nearest-neighbor (KNN) framework, which identifies a set of nearest neighbors who are sufficiently similar to the active user based on cognitive similarity data across multiple domains. To collect this data, a crowdsourcing platform called SABRE is employed (Nguyen & Jung, 2020), where users can provide explicit and implicit feedback on similar items from various domains. In summary, the contributions of this research are as follows.

Deploying SABRE, a crowdsourcing platform that collects cognitive similarity data from users across multiple domains such as movies, books, tourism, and fashion.

Proposing UCCF models for clustering users across multiple domains without considering the knowledge transfer between domains.

The organization of this paper is as follows. Section 2 presents a literature review on CDRSs using CF and prior efforts to enhance the performance of the user-based CF approach. Section 3 presents the proposed UCCF model. Section 4 describes the SABRE crowdsourcing platform, the datasets, and the experimental results and evaluation. In Section 5, we summarize the results of the study and propose future research directions.

2. Related Work

The earliest studies on CDCF include Berkovsky et al. (2007), Li et al. (2009a), and Li et al. (2009b). These studies proposed various methods for aggregating rating vectors of users from different domains, which can be categorized into neighborhood-based models and latent factor models. The former assumes shared users or items in different domains, while the latter does not require shared users or items. One specific example of the neighborhood-based CDCF (N-CDCF) approach is introduced in Berkovsky et al. (2007), which estimates the similarity between users or items. N-CDCF can be divided into the user-based nearest neighbor (N-CDCF-U) and item-based nearest neighbor (N-CDCF-I) models. In this paper, we focus on improving the user-based method and therefore review the N-CDCF-U model in detail.

The study in Berkovsky et al. (2007) presents the problem formulation for N-CDCF-U. The formulation involves $m$ domains represented by the set $D = \{D_{1}, D_{2}, \dots, D_{m}\}$ and a user set $U = \{u_{1}, u_{2}, \dots, u_{n}\}$, where $n$ is the number of users. The item set $I_{k} = \{i_{k}^{1}, i_{k}^{2}, \dots, i_{k}^{n (k)}\}$ belongs to domain $D_{k}$ ($1 \leq k \leq m$), where $n (k)$ is the size of the item set in $D_{k}$. To estimate the user similarity $Sim (u, v)$, the user-based CDCF algorithm computes the Pearson correlation between users $u$ and $v$ over the items they have co-rated, as shown in equation (1).

Sim (u, v) = \frac{\sum_{i \in i_{u, v}} (r_{u, i} - {\bar{r}}_{u}) (r_{v, i} - {\bar{r}}_{v})}{\sqrt{\sum_{i \in i_{u, v}} {(r_{u, i} - {\bar{r}}_{u})}^{2}} \sqrt{\sum_{i \in i_{u, v}} {(r_{v, i} - {\bar{r}}_{v})}^{2}}}

(1)

In equation (1), the set of co-rated items of users $u$ and $v$ is denoted as $i_{u, v} = i_{u} \cap i_{v}$, where $i_{u}$ and $i_{v}$ represent the items that users $u$ and $v$ interacted with over all domains $D_{k}$. The variables $r_{u, i}$ and $r_{v, i}$ indicate the ratings given by users $u$ and $v$ on item $i$, respectively, while ${\bar{r}}_{u}$ and ${\bar{r}}_{v}$ are the average ratings of users $u$ and $v$ for all rated items, respectively. In the second step, the predicted rating of item $p$ for user $u$ can be computed using the following equation:

{\hat{r}}_{u, p} = {\bar{r}}_{u} + \frac{\sum_{v \in U_{u, p}^{k}} Sim (u, v) \cdot (r_{v, p} - {\bar{r}}_{v})}{\sum_{v \in U_{u, p}^{k}} | Sim (u, v) |}

(2)

where $U_{u, p}^{k}$ denotes the set of the $k$ neighbors (top-$k$ users) who are most similar to user $u$ and who rated item $p$.
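
Equations (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited work: the nested-dictionary layout `ratings[user][item]` and the function names are assumptions made here, and users with no co-rated items or zero variance are given a similarity of 0.

```python
from math import sqrt


def mean_rating(user_ratings):
    """Average rating of one user over all items the user rated."""
    return sum(user_ratings.values()) / len(user_ratings)


def pearson_sim(ru, rv):
    """Equation (1): Pearson correlation over the co-rated items of two users."""
    common = set(ru) & set(rv)
    if not common:
        return 0.0
    mu, mv = mean_rating(ru), mean_rating(rv)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    du = sqrt(sum((ru[i] - mu) ** 2 for i in common))
    dv = sqrt(sum((rv[i] - mv) ** 2 for i in common))
    if du == 0 or dv == 0:
        return 0.0
    return num / (du * dv)


def predict(ratings, u, p, k):
    """Equation (2): predicted rating of item p for user u from the top-k neighbors."""
    mu = mean_rating(ratings[u])
    # Candidate neighbors are the other users who rated item p.
    candidates = [
        (pearson_sim(ratings[u], ratings[v]), v)
        for v in ratings
        if v != u and p in ratings[v]
    ]
    neighbors = sorted(candidates, key=lambda t: t[0], reverse=True)[:k]
    denom = sum(abs(s) for s, _ in neighbors)
    if denom == 0:
        return mu  # fall back to the user's own average
    num = sum(s * (ratings[v][p] - mean_rating(ratings[v])) for s, v in neighbors)
    return mu + num / denom


ratings = {
    "u": {"a": 5, "b": 3, "c": 4},
    "v": {"a": 4, "b": 2, "c": 3, "p": 5},
    "w": {"a": 1, "b": 5, "c": 3, "p": 1},
}
print(predict(ratings, "u", "p", k=1))
```

Because the items may come from any domain, pooling a user's ratings across domains into one dictionary is enough to apply the same two functions in the cross-domain setting.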

In addition to the aforementioned approach, the traditional matrix factorization (MF) model is also utilized for handling CDCF problems. In Koren et al. (2009), the Funk-SVD model is introduced as the most commonly used MF model for a single-domain collaborative filtering RS. The main idea of this model is to map users and items into a joint latent factor space of dimension $f$. The decomposition of the Funk-SVD model is illustrated in Figure 1, where each item $i$ is associated with a latent vector $q_{i} \in R^{f}$ and each user $u$ is associated with a latent vector $p_{u} \in R^{f}$. The predicted rating of user $u$ on item $i$ is represented as ${\hat{r}}_{u, i}$ and can be formulated as

{\hat{r}}_{u, i} = q_{i}^{T} p_{u}

(3)

Figure 1.

Illustration of the decomposition of the Funk-SVD model.

To learn the latent vectors, the Funk-SVD model minimizes the regularized squared error on the set of known ratings, which is expressed by the following equation:

\min_{q_{*}, p_{*}} \sum_{(u, i) \in K} {(r_{u, i} - q_{i}^{T} p_{u})}^{2} + α (\| q_{i} \|^{2} + \| p_{u} \|^{2})

(4)

In the above equation, $K$ represents the set of $(u, i)$ pairs for which the rating $r_{u, i}$ is known. The constant $α$ controls the degree of regularization and prevents overfitting. It is typically determined through cross-validation (Kohavi, 1995). Stochastic gradient descent is used to solve the optimization problem. This involves iterating through all the ratings in the training dataset. In each training step, the prediction error associated with the predicted rating ${\hat{r}}_{u, i}$ is computed using the following equation:

e_{u, i} = r_{u, i} - q_{i}^{T} p_{u}

(5)

The parameters are then updated by a magnitude proportional to $θ$ (i.e., the learning rate) in the opposite direction of the gradient.

\begin{aligned} q_{i} &\leftarrow q_{i} + θ (e_{u, i} p_{u} - α q_{i}) \\ p_{u} &\leftarrow p_{u} + θ (e_{u, i} q_{i} - α p_{u}) \end{aligned}

(6)
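
The training loop described by equations (3) to (6) can be sketched as follows. This is an illustrative pure-Python version under assumed settings: the `(user, item, rating)` triple layout, the initialization range, and the default values of `f`, `theta`, `alpha`, and `epochs` are choices made here, not values reported in the cited work.

```python
import random


def train_funk_svd(ratings, n_users, n_items, f=2, theta=0.01, alpha=0.02,
                   epochs=300, seed=0):
    """Learn latent vectors p_u and q_i with stochastic gradient descent."""
    rng = random.Random(seed)
    p = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_users)]
    q = [[rng.uniform(-0.1, 0.1) for _ in range(f)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            # Equation (5): error of the prediction q_i^T p_u from equation (3).
            e = r - sum(qa * pa for qa, pa in zip(q[i], p[u]))
            for a in range(f):
                q_old, p_old = q[i][a], p[u][a]
                # Equation (6): simultaneous update of both factors.
                q[i][a] += theta * (e * p_old - alpha * q_old)
                p[u][a] += theta * (e * q_old - alpha * p_old)
    return p, q


def squared_error(ratings, p, q):
    """Sum of squared prediction errors over the known ratings."""
    return sum(
        (r - sum(qa * pa for qa, pa in zip(q[i], p[u]))) ** 2
        for u, i, r in ratings
    )


ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
p0, q0 = train_funk_svd(ratings, 3, 3, epochs=0)   # untrained factors
p1, q1 = train_funk_svd(ratings, 3, 3, epochs=300)
print(squared_error(ratings, p0, q0), squared_error(ratings, p1, q1))
```

Storing the old values of `q_i` and `p_u` before updating keeps the two updates in equation (6) simultaneous, so neither uses the other's already-modified value.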

The standard MF approach is widely used to solve the CDCF problem. By combining the rating matrices from multiple domains, we can create a more comprehensive matrix that captures the interactions between users and items across all domains. The MF model can then be trained on this combined matrix to identify latent factors that capture users’ preferences and items’ attributes. These latent factors are then used to predict how users rate items they have not interacted with. However, one issue with this approach is that the N-CDCF and MF-CDCF methods assume that items are homogeneous within a domain, which may not be accurate in real-world scenarios. As a result, the performance of these methods may suffer when dealing with highly diverse items from different domains. Nevertheless, researchers continue to develop new and improved techniques for addressing the CDCF problem, including deep learning-based approaches that can capture more complex interactions between users and items.

In a different study, the cross-domain triadic factorization model (Hu et al., 2013) was utilized, which considers the full triadic relationship between users, products, and domains to effectively capture user preferences for items across different domains. A third-order tensor is employed to represent the user–item–domain interactions, and a tensor factorization approach is used to factorize users, items, and domains into latent feature vectors. The user–item–domain rating is generated by taking the element-wise product of the latent factors for user, item, and domain. However, the time complexity of tensor factorization is exponential, with a computational cost of $O (k^{m})$, where $k$ is the number of factors and $m$ is the number of domains.

The transfer by collective factorization (TCF) model presented in Pan et al. (2011) aims to address the issue of data sparsity in numerical ratings by leveraging knowledge from auxiliary domains. This model assumes that the latent feature matrices for user–item pairs are identical and uses both numerical rating data and binary like/dislike auxiliary data. Unlike the code book transfer model (Li et al., 2009a) and the rating matrix generative model (Li et al., 2009b), which do not share latent features, the TCF model shares these features and analyzes data-dependent information using two inner matrices. However, this approach is only suitable for a single auxiliary domain and requires the alignment of users/items between the target matrix (rating) and the auxiliary binary matrix (like/dislike).

3. User Cognition-Based Collaborative Filtering Model

Given the set of $m$ domains $D = \{D_{1}, D_{2}, \dots, D_{m}\}$ and a set of $n$ users $U = \{u_{1}, u_{2}, \dots, u_{n}\}$, we define $X_{D} \in R^{m \times n}$ as the matrix containing the interactions of all users $U_{D}$ with the set of items $V_{D}$ in domain $D$. Here, with a slight abuse of notation, $m$ denotes the size of $V_{D}$ rather than the number of domains. We assume that $X_{D}$ is a binary matrix, since collecting information on actions such as clicks and purchases is easier than collecting rating values. Thus, when a user interacts with an item, the corresponding element of $X_{D}$ is set to $1$, and to $0$ otherwise.

The research aims to provide personalized recommendations for users in each domain based on their preferences. The problem is formulated by introducing a model based on general similarity in a single domain. This model uses a function $f (X, Θ)$ to determine preference values, where $Θ$ is a model parameter and $X$ is the user–item interaction matrix. The research proposes an adaptive KNN algorithm that generates top-N recommendations effectively. The KNN algorithm is a popular collaborative filtering method (Ning & Karypis, 2011).

In this user-based KNN problem, we set the parameter $Θ$ to a matrix $W \in R^{n \times n}$, where $W$ results from the domain-specific similarity ($V_{D}$) and the domain-shared similarity ($V$), and the function $f (\cdot)$ is an aggregation of the interaction values of the $k$ nearest neighbors. Mathematically, the predicted score ${\hat{x}}_{i j}$ is calculated by

{\hat{x}}_{i j} = x_{i}^{T} w_{j}

(7)

where $x_{i} \in R^{n}$ is the interaction indicator vector of item $v_{i}$ and $w_{j} \in R^{n}$ is the vector of similarity coefficients of user $u_{j}$. In the next optimization step, the parameter $W$ is treated as a similarity between users and is optimized by the following function:

\tilde{W} = \arg\min_{W} L (W) + Ω (W)

(8)

where the loss function $L$ is defined as least squares and the regularization penalty $Ω$ is used to enforce parameters with specific structures.
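
Once a similarity matrix $W$ is available, the scoring rule in equation (7) amounts to the matrix product $X W$, and a top-N list is obtained by ranking the items a user has not yet interacted with. The sketch below assumes an items-by-users layout for `X` (matching $X_{D} \in R^{m \times n}$) and uses a hand-written `W` with a zero diagonal; both the toy values and the function names are illustrative assumptions.

```python
def predict_scores(X, W):
    """Equation (7) for all pairs: score[i][j] = x_i^T w_j, i.e. the product X W.

    X is an m x n binary matrix (items x users); W is an n x n similarity matrix
    whose column j holds the similarity coefficients of user j.
    """
    n = len(W)
    return [
        [sum(X[i][l] * W[l][j] for l in range(n)) for j in range(n)]
        for i in range(len(X))
    ]


def top_n(X, W, user, N):
    """Rank the items the user has not interacted with by predicted score."""
    scores = predict_scores(X, W)
    candidates = [i for i in range(len(X)) if X[i][user] == 0]
    return sorted(candidates, key=lambda i: scores[i][user], reverse=True)[:N]


# Toy data: 3 items x 3 users; user 0 is most similar to user 1.
X = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
W = [[0.0, 0.9, 0.1],
     [0.9, 0.0, 0.2],
     [0.1, 0.2, 0.0]]
print(top_n(X, W, user=0, N=1))
```

In this toy case, user 0 is recommended item 1 because the most similar neighbor (user 1) interacted with it.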

The idea of CDRS involves exploiting the similarity between common users, which are overlapping users interacting with different items across multiple domains, based on their cognition of comparable items. However, the primary challenge is establishing a group of nearest neighbors for the target user based on cognitive similarity across different domains. As a result, we can represent the cognition of the target user using a set of similar neighbors. In a single domain, personalized neighbors for each user are typically determined based on shared cognitive similarities, known as domain-specific cognitive similarities. However, in multiple-domain scenarios, we aim to merge domain-specific cognitive similarity with cognitive similarities shared between domains, referred to as domain-shared cognitive similarity. The proposed model is depicted in Figure 2 and formulated as follows:

W_{D} = V + V_{D}, \quad D \in \{D_{1}, D_{2}, \dots, D_{m}\}

(9)

where the personalized neighbors of the target user in various domains and the representative neighbors of the target user in a specific domain are represented by the parameter matrices $V$ and $V_{D}$, respectively, both of dimensions $n \times n$. The structure of $V_{D}$ is regularized using group lasso to explore similarity. The study indicates that domain-specific representation results in better user representation, which leads to denser results for corresponding rows in $V_{D}$. The definition of the group lasso $Ω_{g-lasso}$ for $V_{D}$ is given below.

Ω_{g-lasso} (V_{D}) = δ_{D} \| V_{D} \|_{2, 1} = \sum_{i = 1}^{n} δ_{D} \| v_{D}^{i} \|_{2}

(10)

where $\| v_{D}^{i} \|_{2}$ is the $l_{2}$-norm of the $i$-th row of the parameter matrix $V_{D}$, while $δ_{D}$ represents the contribution of group lasso in each specific domain. For simplicity, we assume that $δ_{D} = δ$ for all domains. Furthermore, the personalized neighbors of the target users in the overall user set are sparse. Thus, a lasso constraint is used to reduce noise coefficients in $V$, which is formulated as follows:

Ω_{lasso} (V) = γ \| V \|_{1} = \sum_{i = 1}^{n} \sum_{j = 1}^{n} γ | v_{i j} |

(11)

Figure 2.

Illustration of the user cognition-based collaborative filtering (UCCF) model.

Finally, the cognitive similarity can be integrated into a loss function with the least square loss, which is defined by the following equation:

L (W_{D}) = \frac{1}{2} \| X_{D} - X_{D} W_{D} \|_{2}^{2}

(12)

Following equation (8), the UCCF model is proposed as follows:

\min_{Θ} \sum_{D = 1}^{m} α_{D} (L (W_{D}) + Ω_{g-lasso} (V_{D})) + Ω_{lasso} (V)

(13)

In equation (13), $Θ$ concisely represents the parameter set $\{W_{D}, V_{D}, V\}$, and $α_{D}$ is used to balance the effect of the different domains. Figure 2 illustrates the complete UCCF model.
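
To make the pieces of equation (13) concrete, the sketch below evaluates the objective for given parameter matrices. It only computes the value of the objective; the optimization procedure itself is not specified in this section and is not shown. The list-of-lists matrix layout, the function names, and the toy inputs are assumptions for illustration.

```python
from math import sqrt


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def least_squares(X_D, W_D):
    """Equation (12): 0.5 * squared norm of X_D - X_D W_D."""
    XW = matmul(X_D, W_D)
    return 0.5 * sum(
        (x - y) ** 2 for rx, ry in zip(X_D, XW) for x, y in zip(rx, ry)
    )


def group_lasso(V_D, delta):
    """Equation (10): delta times the sum of the l2-norms of the rows of V_D."""
    return delta * sum(sqrt(sum(v * v for v in row)) for row in V_D)


def lasso(V, gamma):
    """Equation (11): gamma times the sum of absolute entries of V."""
    return gamma * sum(abs(v) for row in V for v in row)


def uccf_objective(X_by_domain, V, V_by_domain, alphas, delta, gamma):
    """Equation (13), with W_D = V + V_D as in equation (9)."""
    total = lasso(V, gamma)
    for X_D, V_D, alpha_D in zip(X_by_domain, V_by_domain, alphas):
        W_D = [[a + b for a, b in zip(rv, rd)] for rv, rd in zip(V, V_D)]
        total += alpha_D * (least_squares(X_D, W_D) + group_lasso(V_D, delta))
    return total


# Toy setting: n = 2 users, two domains with 2 items and 1 item respectively.
X1 = [[1, 0], [0, 1]]
X2 = [[1, 1]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
print(uccf_objective([X1, X2], zeros, [zeros, zeros], [0.4, 0.6], 0.5, 0.01))
```

With all parameters at zero, only the least-squares terms contribute, which gives a simple baseline value against which any candidate $V$ and $V_{D}$ can be compared.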

4. Experiments

This section provides an overview of the feedback collection platform used in our experiment. We outline the platform’s architecture and functionality, detailing how it gathers and processes user feedback to inform our analysis. The platform’s user interface, data handling capabilities, and integration with other experimental tools are explained to illustrate its robustness and reliability. Following this, we detail SABRE’s initial dataset, describing its composition, sources, and the preprocessing steps undertaken to ensure data quality and consistency. This includes information on the types of data collected and the specific features extracted for our study. We also present the dataset collected from SABRE during the experiment, highlighting how the data evolves with user interactions. Finally, we discuss the experimental results, presenting an analysis of the data obtained through our feedback collection platform. We provide quantitative and qualitative evaluations to demonstrate UCCF’s performance.

Figure 3.

Architecture of SABRE crowdsourcing platform.

4.1. Overview of SABRE Platform

The SABRE platform was created using Java 8 and the Spring framework. Its architecture follows the MVC model and includes both web and background services. On the web service side, we use Tomcat 11, while MySQL is used for the background service. All user data is collected and stored in the MySQL database and then extracted into CSV files for our experiments. An architecture diagram of the SABRE platform is shown in Figure 3. Our goal with the SABRE platform is to gather user cognition effectively, so we prioritized designing a user-friendly interface that individuals with varying skill levels could use. To achieve this, we consulted the essential guide for user-interface design outlined in Galitz (2007). To interact with SABRE, users only need to remember URLs rather than a predefined sequence, and the platform offers alerts and roll-back methods in case of errors. Additionally, users can quickly resume their work if they need to stop abruptly. Simplicity was our primary design strategy for SABRE. Hence, we applied the following three golden rules.

Placing users in control: We prioritize flexibility in the user interface by providing users with easy-to-use links that enable them to access any function without having to remember predefined or complicated sequences of tasks. If users take an inappropriate action, the system has roll-back tools that allow them to return to a previous state quickly. Additionally, the SABRE platform is interruptible by design. For example, if users encounter an unexpected situation or need to leave their work, they can quickly return to it later. This capability is easily accessible, in line with our goal of making the platform easy to use for people of all skill levels.

Making the interface consistent: To ensure consistency in the user interface, it is recommended to use a single template for all functionality. Hence, the homepage of the SABRE cross-domain crowdsourcing platform, as shown in Figure 4, utilizes the Bootstrap template as the basis for its user interface design. Additionally, users should be able to recognize the same elements, such as buttons, fonts, and symbols, regardless of where they are or what they are doing. Users feel more comfortable and are more likely to interact appropriately when presented with a consistent user interface. We also keep user behavior consistent across all features by using the same templates to display results, which creates comparable user experiences for different functionalities.

Reducing memory load: Pagination is a method used by websites to organize content into related pages for users’ convenience. E-commerce websites use this technique because they typically have many products that cannot be listed on a single page within a category. Additionally, for sites with large amounts of data that cannot be feasibly presented on a single page, pagination is the go-to solution to load new items on the website. One variation of this technique, “Infinite Scrolling,” was implemented in the SABRE platform to improve the user experience. An example of this can be seen in Figure 5, where Infinite Scrolling is used to display posters of movies. Infinite Scrolling is frequently used for continuous content, such as social media and entertainment sites, where the material can remain current. The advantage of not having to click “next page” is that it keeps users engaged with the content and less concerned with moving to the next page.

Figure 4.

The homepage of the SABRE cross-domain crowdsourcing platform.

Figure 5.

An example in the movie domain of the SABRE platform with the variant of pagination technique, called Infinite Scrolling.

4.2. Datasets

To assess the efficacy of the proposed approach, we utilized a dataset obtained from SABRE. We first imported data from Kaggle and IMDB into the SABRE platform as an initial dataset, consisting of users, items, and ratings in the Movie, Book, and Fashion domains. Table 1 describes the initial dataset in detail. Cognitive similarity data for users was gathered on the SABRE platform by allowing them to provide implicit feedback, such as clicking on suggested products or similar products. An illustration of user interaction on SABRE is presented in Figure 6. To ensure a reliable evaluation, we removed the data of all users with fewer than 10 feedback entries. The final dataset collected from SABRE contained around 7,000 interactions from 3,210 users.
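
The filtering step described above can be sketched as follows. The record layout `(user, item, domain)` and the function name are hypothetical, chosen only for illustration; the actual export format of the SABRE CSV files is not described in this paper.

```python
from collections import Counter


def filter_sparse_users(feedback, min_feedback=10):
    """Keep only the records of users with at least `min_feedback` entries.

    `feedback` is a list of (user, item, domain) tuples (assumed layout).
    """
    counts = Counter(user for user, _item, _domain in feedback)
    return [row for row in feedback if counts[row[0]] >= min_feedback]


# Toy data: user "a" has 10 feedback entries, user "b" has only 3.
feedback = [("a", f"item{i}", "movie") for i in range(10)]
feedback += [("b", f"item{i}", "book") for i in range(3)]
kept = filter_sparse_users(feedback)
print(len(kept))
```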

4.3. Evaluation

To evaluate the effectiveness of our proposed model, we compared the top recommendations with the actual behavior of users (i.e., clicking on similar products). Our experiments were assessed using two metrics, namely, mean average precision (MAP) and normalized discounted cumulative gain (NDCG), with $N$ set to 5 for NDCG. Better recommendations are indicated by higher results for these metrics.

Table 1.
Description of Datasets.

Domains	#Items	#Users	Sparsity
Movie	14,235	1,337	0.9887
Book	32,577	4,332	0.9975
Fashion	44,121	3,221	0.9994

The MAP measure takes into account the relative order of the relevant items in the recommendation ranking by generating the precision score after each one is discovered as follows:

MAP = \frac{1}{| U |} \sum_{u \in U} AP (u), \quad AP (u) = \frac{1}{| {Rel}_{u} |} \sum_{n = 1}^{k} P@n \cdot 1 (i_{n} \in {Rel}_{u})

(14)

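where $i_{n}$ is the item in the $n$-th position of the recommendation list of user $u$, ${Rel}_{u}$ is the set of items relevant to user $u$, and $1 (\cdot)$ is the indicator function. The precision at cutoff $k$, $P@k$, where $| {Rel}_{u} @k |$ denotes the number of relevant items among the top $k$ recommendations for user $u$, is computed as follows:

P@k = \frac{1}{| U |} \sum_{u \in U} \frac{| {Rel}_{u} @k |}{k}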

(15)
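
A small sketch of how MAP can be computed from ranked lists is given below. It uses the per-user precision at each hit position and normalizes the average precision by the number of relevant items, which is the common convention and an assumption here; the dictionary layout and function names are also illustrative.

```python
def average_precision(ranked, relevant, k):
    """AP for one user: mean of P@n over the positions n <= k that hold a relevant item."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for n, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / n  # per-user precision at position n
    return total / len(relevant)


def mean_average_precision(rankings, relevants, k):
    """MAP: average of AP over all users in `rankings`."""
    return sum(
        average_precision(rankings[u], relevants[u], k) for u in rankings
    ) / len(rankings)


rankings = {"u1": ["a", "b", "c", "d"], "u2": ["x", "y"]}
relevants = {"u1": {"a", "c"}, "u2": {"y"}}
print(mean_average_precision(rankings, relevants, k=4))
```

For user `u1` the relevant items sit at positions 1 and 3, so the average precision is (1/1 + 2/3) / 2; for `u2` it is 1/2.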

The NDCG proposed in Järvelin and Kekäläinen (2002) is a suitable metric when there are multiple levels of relevance in the ground truth. The more relevant an item is, the more it contributes to the quality of the recommendation, adjusted by its relative position in the ranking:

NDCG @ k = \frac{1}{| U |} \sum_{u \in U} \frac{{DCG}_{u} @ k}{{IDCG}_{u} @ k}, \quad {DCG}_{u} @ k = \sum_{n = 1}^{k} \frac{2^{{rel}_{u} (i_{n})} - 1}{\log (n + 1)}

(16)

where ${rel}_{u} (i_{n})$ is the graded relevance for user $u$ of the item in the $n$-th position of the ranking, and ${IDCG}_{u} @ k$ is the discounted cumulative gain of the ideal ranking for user $u$ at cutoff $k$.
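
The per-user term of equation (16) can be sketched as below. The logarithm base is not stated in the equation; base 2 is assumed here, and the choice does not affect the ratio because the same base appears in both DCG and IDCG. The dictionary of graded relevances and the function names are illustrative assumptions.

```python
from math import log2


def dcg_at_k(gains, k):
    """DCG@k for a list of graded relevances given in ranked order."""
    return sum(
        (2 ** rel - 1) / log2(n + 1) for n, rel in enumerate(gains[:k], start=1)
    )


def ndcg_at_k(ranked, relevance, k):
    """NDCG@k for one user; `relevance` maps item -> graded relevance."""
    gains = [relevance.get(item, 0) for item in ranked]
    ideal = sorted(relevance.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


relevance = {"a": 2, "b": 1}
print(ndcg_at_k(["a", "b", "c"], relevance, k=3))  # ideal order
print(ndcg_at_k(["b", "a"], relevance, k=3))       # swapped order scores lower
```

Averaging `ndcg_at_k` over all users gives the NDCG@k value in equation (16).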

Figure 6.

An example of interactions of users across multiple domains in the SABRE platform.

Table 2.

The Prediction Performance of the Proposed Model (UCCF) and mrSLIM, mrBPR, PopRank, NCDCF-U on Three Domains. The Best Results are in the Bold Text.

	Movie		Book		Fashion
	MAP	NDCG	MAP	NDCG	MAP	NDCG
mrSLIM	0.3114	0.1006	0.3961	0.1553	0.6215	0.4341
mrBPR	0.3433	0.1171	0.4280	0.1654	0.6306	0.3477
PopRank	0.2470	0.0719	0.2979	0.1022	0.4884	0.2349
NCDCF-U	0.2926	0.0913	0.3747	0.1425	0.5982	0.3259
UCCF	0.3846	0.1341	0.4445	0.1803	0.6431	0.3552

Note. UCCF = user cognition-based collaborative filtering; MAP = mean average precision; NDCG = normalized discounted cumulative gain.
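
As a worked illustration of how the values in Table 2 can be compared, the sketch below computes the relative improvement of UCCF over the strongest baseline for the Movie domain MAP column. The formula (difference divided by the baseline score) is an assumed, commonly used definition; the paper does not state how its improvement percentages were derived.

```python
def relative_improvement(proposed, baseline):
    """Relative gain of the proposed score over a baseline score."""
    return (proposed - baseline) / baseline


# MAP values for the Movie domain, copied from Table 2.
movie_map = {"mrSLIM": 0.3114, "mrBPR": 0.3433, "PopRank": 0.2470, "NCDCF-U": 0.2926}
uccf_movie_map = 0.3846

best_name = max(movie_map, key=movie_map.get)
gain = relative_improvement(uccf_movie_map, movie_map[best_name])
print(f"{best_name}: {gain:.1%}")
```

Under this definition, the gain over the strongest Movie baseline (mrBPR) is roughly 12%.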

To evaluate the proposed model, we compared UCCF with several popular baseline methods: sparse linear methods for Top-N recommender systems (SLIM; Ning & Karypis, 2011), reported as mrSLIM; popularity-based recommendation (PopRank); the user-based neighborhood method integrating users’ multiple types of behavior (NCDCF-U; Yuan et al., 2014); and Bayesian personalized ranking (Krohn-Grimberghe et al., 2012), reported as mrBPR.

The feedback dataset was separated into two parts, with 80% used for training and 20% for testing. This process was repeated 10 times with randomly selected samples. The weights for the three domains in the UCCF model were set to $0.2$, $0.3$, and $0.5$, respectively, and the optimal parameters were $δ = 0.5$ and $γ = 0.01$.
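
The repeated random split can be sketched as follows. The function name, the seed handling, and the use of a plain list of records are assumptions for illustration; the paper does not specify how the random samples were drawn.

```python
import random


def repeated_splits(records, test_ratio=0.2, repeats=10, seed=0):
    """Yield `repeats` random (train, test) partitions of `records`."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for _ in range(repeats):
        shuffled = records[:]
        rng.shuffle(shuffled)
        cut = int(round(len(shuffled) * (1 - test_ratio)))
        yield shuffled[:cut], shuffled[cut:]


records = list(range(50))
splits = list(repeated_splits(records))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Metrics such as MAP and NDCG would then be computed on each test part and averaged over the 10 repetitions.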

Table 2 presents the results of our experiments, which show that the proposed UCCF model outperforms the baseline methods. PopRank, which recommends popular items, is inferior to all other methods, highlighting the importance of personalization in RSs. We also observed that generating movie recommendations is relatively easy, as all methods except PopRank perform well in the movie domain. This suggests that leveraging knowledge from the movie domain as an auxiliary domain to transfer information to other domains (e.g., book and fashion) could be beneficial. However, NCDCF-U, with $k = 100$ and $k = 50$ nearest neighbors, fails to achieve satisfactory results in the book and fashion domains due to the lack of a well-constructed mechanism for multi-task learning. We constructed mrSLIM by setting the elastic net weights to $δ = 5$ and $α = 0.1$ to obtain the best performance for comparison. Furthermore, UCCF outperforms the state-of-the-art ranking algorithm mrBPR, which uses a learning rate of $0.01$ and an $l_{2}$-norm regularization weight of $1 \times 10^{- 3}$. In summary, the proposed UCCF method outperforms all baseline methods, even when they are tuned to their best settings. Figures 7 and 8 compare the UCCF model with the baselines in terms of MAP and NDCG, respectively.

Figure 7.

Comparison of the prediction performance between UCCF, mrSLIM, mrBPR, PopRank, and NCDCF-U with the MAP metric in three domains. Note. UCCF = user cognition-based collaborative filtering; MAP = mean average precision.

Figure 8.

Comparison of the prediction performance between UCCF, mrSLIM, mrBPR, PopRank, and NCDCF-U with the NDCG metric in three domains. Note. UCCF = user cognition-based collaborative filtering; NDCG = normalized discounted cumulative gain.

Moreover, we aimed to investigate whether the prediction performance of our proposed model can be further enhanced with more domains. Thus, we conducted experiments on three scenarios of overlapping datasets: Movie-Book, Movie-Fashion, and Fashion-Book. We evaluated the prediction performance of our UCCF model in each overlapping scenario and compared it with the same baselines and settings used in the previous experiments. The results, presented in Table 3, demonstrate that the UCCF model consistently outperforms the baselines in all three scenarios. Specifically, our proposed method achieved better prediction performance in terms of the MAP metric in all scenarios, indicating the effectiveness of our method in multi-domain RSs.

Table 3.

Comparison of the Prediction Performance Between Proposed and Baseline Methods (MAP Metric). The Best Results are in the Bold Text.

	mrSLIM	mrBPR	PopRank	NCDCF-U	UCCF
Fashion-Book	0.2014	0.2333	0.1370	0.1826	0.2746
Movie-Fashion	0.2861	0.3180	0.1879	0.2647	0.3345
Movie-Book	0.5115	0.5206	0.3784	0.4982	0.5341

Note. MAP = mean average precision; UCCF = user cognition-based collaborative filtering.

However, UCCF itself does not perform uniformly better in the scenarios that use three domains. Notably, when generating Top-N recommendations in the Movie domain, the proposed model using three domains performs very well in comparison with the scenarios using two domains. We also measure the prediction performance of the proposed method, UCCF, using the normalized discounted cumulative gain (NDCG) metric. The comparison of the prediction performance between the proposed and baseline methods in terms of NDCG is shown in Table 4.

Table 4.

Comparison of the Prediction Performance Between Proposed and Baseline Methods (NDCG Metric). UCCF achieves the best result in every scenario.

Scenario	mrSLIM	mrBPR	PopRank	NCDCF-U	UCCF
Fashion-Book	0.0906	0.1071	0.0619	0.0813	0.1241
Movie-Fashion	0.1453	0.1554	0.0922	0.1325	0.1703
Movie-Book	0.3241	0.3377	0.2249	0.3159	0.3471

Note. NDCG = normalized discounted cumulative gain; UCCF = user cognition-based collaborative filtering.
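For readers who want to reproduce scores of the kind reported in Tables 3 and 4, the sketch below shows one common way to compute mean average precision and normalized discounted cumulative gain for top-N lists. It assumes binary relevance (an item is either relevant to the user or not), a log2 position discount, and average precision normalized by the number of relevant items; the exact variants and cut-offs used in the experiments are not restated in this section, so these choices and all function names are assumptions.

```python
import math


def average_precision(ranked_items, relevant):
    """AP of one ranked list, normalized by the number of relevant items."""
    hits, precision_sum = 0, 0.0
    for rank, item in enumerate(ranked_items, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit position
    return precision_sum / len(relevant) if relevant else 0.0


def ndcg_at_k(ranked_items, relevant, k):
    """Normalized DCG at cut-off k with binary gains and a log2 discount."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, item in enumerate(ranked_items[:k], start=1)
        if item in relevant
    )
    # Ideal ranking places all relevant items (up to k) at the top.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def mean_over_users(metric, recommendations, ground_truth, **kwargs):
    """Average a per-user metric over every user with a recommendation list."""
    scores = [
        metric(items, ground_truth.get(user, set()), **kwargs)
        for user, items in recommendations.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

With `recommendations` mapping each user to a ranked item list and `ground_truth` mapping each user to a set of held-out relevant items, `mean_over_users(average_precision, recommendations, ground_truth)` gives MAP, and `mean_over_users(ndcg_at_k, recommendations, ground_truth, k=10)` gives the mean normalized discounted cumulative gain at an (assumed) cut-off of 10.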

5. Conclusion

This study introduces a new approach for cross-domain recommendation called the UCCF model. It incorporates both user cognition and personalization to construct a collective similarity parameter. To achieve this, an online crowdsourcing platform was utilized to gather cognitive similarity data of items from users across multiple domains. By utilizing the similarity between users based on their cognitive similarity data across various domains, the cognitive similarity data of active users can be predicted through optimized neighbors. The experiments conducted on cross-domain datasets illustrate the effectiveness of the proposed UCCF model in generating top-N recommendations compared to other existing methods.

However, the cognitive similarity dataset has some limitations. Specifically, only three datasets, in the movie, book, and fashion domains, were selected for the experiments. This narrows the scope of the experiments, and the results may not generalize to many more domains. In addition, the number of actual users who provide feedback on the crowdsourcing platform is growing slowly, which is another limitation of our approach. We aim to grow the number of actual users as quickly as possible so that more extensive cognitive similarity data are available for future research on user cognition-based approaches to the problems of RSs. Because of these limitations, our future work will extend the experiments to more datasets from different domains. To this end, we continue to collect cognitive similarity data in more domains through our online crowdsourcing platform. In addition, we will select other methods and metrics to evaluate the proposed model on multiple aspects of recommendation quality, namely accuracy, novelty, diversity, and coverage.

Footnotes

ORCID iDs

Luong Vuong Nguyen

GwanPil Kim

Jason J. Jung

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Chung-Ang University Research Scholarship Grants in 2023.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
