Abstract

Personalized news recommendation aims to address information overload problems and find interesting news for users, which is essential for online news portals and platforms nowadays. Existing deep neural networks and pretrained language models require some labeled data to train end-to-end models or fine-tune objective functions. More recently, prompt-tuning methods have been proposed to minimize the gap between the knowledge acquired during pretraining and the fine-tuned model. Although these methods have achieved fairly good results in news recommendation, how to take the personalized characteristics of items into template construction and achieve competitive performance compared to hand-crafted prompts through soft prompt-tuning is still an important academic and practical challenge. In this article, we propose a soft prompt-tuning method for personalized news recommendation. The side information of news is introduced to learn the characteristics of items in the recommendation, and different strategies of verbalizer optimization are designed for performance improvement. Specifically, the summaries and subcategories of news are first introduced for template construction to consider the characteristics of news, which can provide a more comprehensive and accurate description of news. Secondly, three different strategies are designed to expand the label word space for modifying soft prompts, and the integration of these strategies is used for final verbalizer optimization, which can significantly reduce additional noise and improve recommendation accuracy. Extensive experiments conducted on the MINDsmall and MINDlarge datasets validated the effectiveness of the proposed method compared to other state-of-the-art prompt methods.

Keywords

soft prompt-tuning, news recommendation, prompt-tuning, verbalizer

1. Introduction

With the rapid development of the internet, online news portals and platforms have played an important role in people’s daily life. Users can easily access all kinds of news whenever and wherever they want, which brings convenience but also leads to the serious problem of “information overload.” To address this issue, personalized news recommendation aims to help users find the news that interests them most among huge amounts of news, which can be an effective filtering tool for users and an essential function for online news platforms nowadays (Wang et al., 2018; Zhang & Wang, 2023).

The research paradigm of news recommendation evolved from deep neural networks to pretrained language models (PLMs), which both show substantial performance in this task. Existing news recommendation methods based on deep neural networks mainly focus on learning the higher-level and abstract feature representations of users and items, which aim to find the connections between users and items to help personalized recommendations. The main intuition behind these methods is to learn the similarities between users and items in different networks. For example, Wang et al. (2018) proposed a deep knowledge-aware network for news recommendation (DKN) that incorporated knowledge graph representation for news recommendation, which introduced a convolution neural network and attention module to discover latent connections among news. Wu et al. (2019b) proposed a news recommendation method based on multihead self-attention (NRMS), where the representations of news and users are learned from news titles and users’ browsing, respectively. However, the static word vectors, such as Word2Vec and GloVe, are predominantly utilized as initializations in these deep-based news recommendation methods, which primarily focus on extracting information inside the recommendation dataset itself, while often neglecting the wealth of semantic and linguistic information available in real-world large-scale corpora.

Recently, some efforts have been devoted to applying PLMs to personalized news recommendation. In these methods, some popular PLMs (e.g., BERT and RoBERTa) have been introduced to fine-tune downstream recommendation tasks on pretrained knowledge. The PLMs are utilized as the news encoder, and the specific function for news recommendation is used to train the model, which has achieved substantial performance compared to deep-based methods. For example, Sun et al. (2019) proposed a sequential recommendation method named BERT4Rec, in which a masked language model is introduced to learn the representations of user behaviors, and the masked items are predicted using both left and right context. However, due to the significant gap of objective forms in pretraining and fine-tuning, these fine-tuned pretrained language model (PLM) methods cannot stimulate the abundant and rich knowledge distributed in a large-scale pretrained model for news recommendation.

To address the huge gap between the knowledge acquired during pretraining and the fine-tuned model, the prompt-tuning model is proposed and has achieved strong performance in various natural language processing (NLP) downstream tasks, especially for few-shot and even zero-shot learning scenarios, including machine translation (Zhang et al., 2023), question answering (Chappuis et al., 2022), sentiment analysis (Mao et al., 2022), and text classification (Zhu et al., 2024). In prompt-tuning, the input statements are converted into cloze-style tasks by introducing a natural language template and adapting the masked language model (Ding et al., 2021). For instance, given the news $x$ to be classified as interesting or uninteresting to the user, the input can be wrapped with a manual template into “The title of news is [‘placeholder’: ‘text_a’], the user [MASK] the news,” and the probability of a sentiment word such as “like” or “dislike” will be calculated to fill the “[MASK]” token. In contrast to previous fine-tuning PLM methods, there is no requirement for an additional neural layer in prompt-tuning, and it can achieve strong performance in few-shot scenarios.
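As a minimal sketch of the cloze-style wrapping described above, the snippet below fills the manual template with a news title and leaves a single “[MASK]” slot. The helper name `wrap_with_template` and the example headline are hypothetical and chosen only for illustration; they are not taken from the authors' code.

```python
# Illustrative sketch only: wrap a news title into the cloze-style template
# quoted in the text. `wrap_with_template` is a hypothetical helper name.
MASK_TOKEN = "[MASK]"


def wrap_with_template(title: str) -> str:
    """Fill the manual template with a title, keeping one [MASK] slot."""
    return f"The title of news is {title}, the user {MASK_TOKEN} the news"


# Hypothetical headline used purely as a placeholder input.
prompt = wrap_with_template("Example headline")
print(prompt)
```

A masked language model would then score candidate words such as “like” or “dislike” at the “[MASK]” position of `prompt`.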

Motivated by the recent success of the prompt-tuning model, in this paper, we propose a Soft Prompt-tuning method for Personalized News Recommendation (SP-PNR), which aims to take the personalized characteristics of items into template construction and achieve competitive performance compared to hand-crafted prompts through soft prompt-tuning. Specifically, first, the side information of news, including the summaries and subcategories, is introduced to learn the characteristics of news, which can provide a more comprehensive and accurate description of news for personalized recommendation. Secondly, several strategies are designed to capture different characteristics of expanded words, and the integration of these strategies is used for final verbalizer optimization, which can significantly reduce additional noise and improve recommendation accuracy (Acc) for modifying soft prompts. Extensive experimental results conducted on the MINDsmall and MINDlarge datasets validate the effectiveness of our SP-PNR compared to other state-of-the-art methods. The contributions of our method can be summarized as follows:

To take the personalized characteristics of news into template construction, a prompt-tuning model with the summaries and subcategories of news is proposed.

To achieve better recommendation performance by soft prompt-tuning, several strategies are employed to capture different characteristics of the expanded words for verbalizer optimization.

The experimental results on the MINDsmall and MINDlarge datasets confirm that our SP-PNR can achieve state-of-the-art performance compared to other news recommendation methods based on deep neural networks and prompt-tuning.

2. Related Work

2.1. Personalized News Recommendation

Personalized news recommendation methods have played critical roles in almost all online news platforms to alleviate the information overload problems of users.

In the past few years, deep neural networks have exhibited powerful advantages in multilevel feature extraction, implicit feature learning, sequence modeling, and attention mechanisms. Most popular deep neural networks, such as convolutional neural network (CNN), recurrent neural network (RNN), long short-term memory (LSTM), and autoencoder, have been applied to news recommendation tasks and achieved significant performance improvements. For example, Wu et al. (2019a) proposed a neural news recommendation approach (neural attentive multiview learning for news recommendation [NAML]), which can learn informative representations of users and news by exploiting different kinds of news information. Zhu et al. (2019) proposed a deep attentional neural network, which employed a parallel CNN enhanced with attention to condense user interest features and an attention-augmented RNN to capture intricate sequential patterns within user click behaviors. An et al. (2019) proposed a neural news recommendation method with long- and short-term user representations (LSTUR), which fused the user’s long-term and short-term interests to model user interest. While deep neural network methods demonstrate remarkable performance, they often fail to fully exploit the rich semantic structures within news articles. In light of this, recent advancements in natural language processing have paved the way for the integration of PLMs into personalized news recommendations.

Recently, fine-tuned PLMs such as BERT (Devlin et al., 2018), ALBERT (Lan et al., 2019), and BioGPT (Luo et al., 2022) have emerged as a powerful tool for exploiting rich knowledge in NLP tasks. By fine-tuning PLMs with specific downstream tasks, the latent information can be learned, and these models have achieved tremendous success in various NLP tasks. Unlike traditional models that are typically trained directly on labeled data for specific tasks, PLMs first undergo pretraining on a large-scale unlabeled corpus using self-supervised learning to encode universal text information. Consequently, PLMs usually offer a more advantageous starting point for fine-tuning in downstream tasks (Qiu et al., 2020). Considering the superior performance, fine-tuning PLM methods have been widely applied to acquire richer semantic information for news recommendation. For example, Wu et al. (2021) explored modeling news with PLMs and fine-tuned them with the news recommendation task. Yu et al. (2021) proposed the Tiny-NewsRec framework, which effectively improved the performance and efficiency of news recommendation systems based on PLMs by reinforcing the model’s understanding of domain-specific features and optimizing the knowledge transfer process. Zhang et al. (2021) proposed the User–News Matching BERT for News Recommendation (UNBERT) approach, which employed a multilayered transformer architecture to enhance text representation through pretrained models with rich linguistic knowledge. Although fine-tuning PLMs has achieved sound performance in personalized news recommendation, the significant gap between objective forms in pretraining and fine-tuning has restricted taking full advantage of knowledge in PLMs.

2.2. Prompt-Tuning

In response to the challenges of the huge gap between pretraining and fine-tuning in fine-tuning PLM methods, more recently, prompt-tuning has been advanced, influenced by GPT-3. Prompt-tuning streamlines the process by constructing natural language templates and strategically inserting input statements, followed by fine-tuning the masked model to transform tasks into cloze-style completion tasks, which aims to reduce the reliance on manual customization and enhance the consistency and reliability of evaluations. Prompt-tuning has shown impressive performance in various downstream NLP tasks, including relation extraction (Chen et al., 2022), data augmentation (Wang et al., 2022), and sentiment analysis (Li et al., 2021), particularly in few-shot learning scenarios.

The success of the prompt-tuning methods relies heavily on appropriate templates and suitable label words. The hand-crafted templates are first designed, which refer to discrete prompts that are manually specified and remain unchanged during training. For instance, Brown et al. (2020) created manually crafted prefix prompts to handle a wide variety of tasks, including question answering, translation, and probing tasks for common sense reasoning. Han et al. (2022) applied logic rules to construct prompts with several sub-prompts for relation classification, consistently outperforming existing state-of-the-art baselines without introducing additional model layers, manual annotations, or augmented data. Li et al. (2023) designed a series of personalized templates specifically to accommodate the unique preferences among different users, thereby enhancing the personalization level of news recommendations and ultimately optimizing the overall performance of the recommendation system. While the strategy of hand-crafted templates is intuitive and does allow solving various tasks with some degree of Acc, there are also several issues to be addressed: (1) Creating and experimenting with these prompts is an art that takes time and experience, particularly for some complicated tasks such as semantic parsing (Shin et al., 2021); (2) even experienced prompt designers may fail to manually discover optimal prompts (Jiang et al., 2020).

To address these problems, several methods have been proposed to automate the template design process. Soft templates are continuous prompts, usually presented as vectors, that can be continually optimized during training to obtain optimal results. For example, Shin et al. (2020) proposed a gradient-based prompt search method to automatically generate templates in prompt-tuning. Su et al. (2021) enhanced prompt-tuning via prompt transfer and investigated the transferability of soft prompts across distinct downstream tasks. Liu et al. (2022) proposed an automatic prompt generation method that achieved promising performance in Natural Language Understanding tasks by identifying the template suitable for downstream tasks and incorporating learnable vectors into the template while continually optimizing it during training. Zhang and Wang (2023) designed a diverse array of prompt templates consisting of discrete, continuous, and hybrid types, and correspondingly built answer spaces for each template to systematically examine and validate the efficacy and applicability of their proposed Prompt4NR framework.

Besides the generation of templates, the mapping from label words to categories, that is, the verbalizer, has proven effective in addressing the discrepancy between text and label space. For example, Hu et al. (2021) demonstrated the effectiveness of knowledge-enhanced tuning by enriching their label vocabulary with external knowledge bases and subsequently refining the augmented set of labels through meticulous processing. Wei et al. (2022) proposed a prototypical network that aggregates the semantic information of labels to construct a prototypical prompt verbalizer, thereby enabling the generation of prototypical embeddings for various labels within the feature space. Cui et al. (2022) proposed an evolutionary verbalizer search algorithm, which aims to improve prompt-based tuning with the high-performance verbalizer.

3. Methods

In this section, we outline in turn the comprehensive framework, the automatic template generation, the construction of the verbalizer, and the news recommendation. The whole framework of the proposed SP-PNR is illustrated in Figure 1.

Figure 1.

The Whole Framework of our Soft Prompt-Tuning Method for Personalized News Recommendation (SP-PNR).

3.1. Comprehensive Framework

The insightful observation of prompt-tuning motivates us to complete the soft prompt-tuning method for personalized news recommendations. To ensure the personalization of recommendations, we train a separate model for each user. As shown in Figure 1, there are three main components in our SP-PNR: automatic template generation, verbalizer construction, and news recommendation. Firstly, in the experiments, we obtain the titles, subcategories, and summaries from the news, which aims to obtain as much comprehensive semantic information as possible from the news. These three are then taken as inputs and denoted by $x$ . The input $x$ and the mask are mapped to the embedding via PLMs (such as BERT in the experiments). Different from the hand-crafted templates, our method employs a neural network, such as BiLSTM in the experiments, for training the soft prompt. Secondly, considering that users’ responses to news are not simply a matter of like or dislike, but rather manifest as nuanced variations in their levels of interest and engagement with the news content, we have introduced a vocabulary expansion strategy. Through this external expansion mechanism, we can more comprehensively address the diverse expressions that users might have toward news at a deeper level. Finally, the constructed soft prompts and verbalizers are employed to predict users’ sentiment classification toward candidate news items, and based on these predictions, news recommendations are realized accordingly.

3.2. Automatic Template Generation

In our SP-PNR, news titles, subcategories, and summaries are denoted as $x_{a}$ , $x_{b}$ , and $x_{c}$ , respectively, which served as inputs incorporated within the soft prompt $T$ . Moreover, the soft prompt $T$ consists of the mask and the soft tokens.

As an example, to accurately predict user U730’s attitude toward the candidate news, we take the following news: Title: “Young and the Restless” star William Wintersole passes away at 88, Category: TV news, Summary: Well-known soap opera actor William Wintersole passes away at age 88. We incorporated all the detailed semantic information of the above news, including title, category, and summary, into our predefined template. Through this process, we aim to accurately determine user U730’s attitudinal inclination toward the candidate news.

For the task of automatic template generation, the PLMs are represented as $M$ in the experiments. Given the input sentence $x = \{x_{0}, \dots, x_{i}, \dots, x_{h}\}$, it is first mapped into the embeddings $e (x) = \{e (x_{0}), \dots, e (x_{i}), \dots, e (x_{h})\}$ by the PLM embedding function $e \in M$. The soft prompt $T$ then consists of the soft tokens, $e (x)$, and the embedding of the mask $e (mask)$. Let $V$ represent the vocabulary of the PLMs $M$ and $[P_{i}]$ represent the $i^{th}$ prompt token in the template; the soft prompt $T$ can be written as (1):

\begin{aligned} T = \{[P_{0}], \dots, [P_{i}], e (x), [P_{i + 1}], \dots, [P_{n}], e (mask)\} . \end{aligned}

(1)
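The layout in (1) can be sketched as a simple concatenation of embedding vectors. The snippet below is only an illustration of the ordering of the pieces; the function name `build_soft_prompt`, the toy dimension, and the constant vectors are assumptions, and real embeddings would come from the PLM and from trainable prompt parameters.

```python
from typing import List

Vector = List[float]


def build_soft_prompt(prefix: List[Vector], input_emb: List[Vector],
                      suffix: List[Vector], mask_emb: Vector) -> List[Vector]:
    """Concatenate [P_0..P_i], e(x), [P_{i+1}..P_n], e(mask) in that order."""
    return prefix + input_emb + suffix + [mask_emb]


dim = 4  # toy embedding size, chosen only for illustration
prefix = [[0.0] * dim for _ in range(2)]     # stands in for [P_0], [P_1]
suffix = [[0.0] * dim for _ in range(2)]     # stands in for [P_2], [P_3]
input_emb = [[1.0] * dim for _ in range(3)]  # stands in for e(x_0..x_2)
mask_emb = [-1.0] * dim                      # stands in for e(mask)

T = build_soft_prompt(prefix, input_emb, suffix, mask_emb)
print(len(T))  # number of positions in the assembled prompt
```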

Different from the hand-crafted template, we employ the neural network in the experiments for training soft tokens, and the template we use can be represented as (2):

\begin{aligned} T = \{h_{0}, \dots, h_{i}, e (x), h_{i + 1}, \dots, h_{n}, e (mask)\}, \end{aligned}

(2)

where $h_{i}$ is the embedding tensor. To address the problem of discreteness, the BiLSTM model is introduced as the neural network to model $h_{i}$, which can be formulated as (3):

\begin{aligned} h_{i} = (\overrightarrow{h}_{i}, \overleftarrow{h}_{i}) = (\overrightarrow{\mathrm{LSTM}} (h_{0}, \overrightarrow{h}_{i - 1}), \overleftarrow{\mathrm{LSTM}} (h_{i + 1}, \overleftarrow{h}_{n})) . \end{aligned}

(3)

Notably, a significant advantage of soft prompt templates lies in their ability to dynamically adjust their parameters based on specific task contexts. This characteristic endows soft prompts with enhanced adaptability and generalization capabilities across diverse scenarios and tasks, eliminating the need for manual design or tedious modification of fixed token templates. In contrast, hand-crafted templates are constrained by their inherent rigidity and limited vocabulary selection, potentially hindering the full expression of rich model semantics and thereby increasing the risk of over-fitting. Soft prompts, on the other hand, generate appropriate tokens through parameter optimization, allowing them to capture and convey complex semantic nuances more accurately, which reduces the likelihood of over-fitting. In short, generating prompts via parameter optimization both enhances the understanding and representation of intricate semantics and, to a large extent, mitigates over-fitting.

3.3. Verbalizer Construction

Prompt-tuning entails a procedure in which a verbalizer systematically assigns label words to their corresponding categories, serving as a valuable strategy for enhancing the performance of downstream tasks. Our approach began with a focus on elementary sentiment-bearing terms, such as “like” and “dislike,” initially augmenting our vocabulary collection by compiling their synonymous counterparts. Subsequently, we employed an advanced large language model to uncover emotionally charged label words derived from user reactions to candidate news items, thereby further enriching the content of our label repository.

Furthermore, our SP-PNR incorporates three additional strategies for extending the range of label words. These methods not only mitigate potential noise within the expanded label words but also boost overall efficiency by abbreviating execution times. Each strategy distinctively addresses a particular attribute of the enlarged words’ nature. The following provides a detailed account of these three strategies:

BERT Prediction. Probability prediction is a crucial feature for news recommendation, as it provides information about both the context and the mask word in the template. To leverage the vast knowledge contained in PLMs, probability prediction is an essential strategy for constructing the verbalizer. Existing PLMs optimize two training objectives: masked language modeling (MLM) and next sentence prediction. MLM aims to learn to fill in the word at the masked position, which is randomly sampled from the input sentence and can directly predict the “[MASK]” word in the template. The probability of each predicted word corresponds to its relevance to the corresponding category. If the probability of a predicted word is higher, it will have a higher selected ranking. In the experiments, we use BERT to obtain the probability distribution $p ([MASK] | T)$ of the vocabulary corresponding to the “[MASK]” word, and the top $N_{a}$ words are selected from the probability distribution. BERT prediction enables the utilization of contextual information in the input sentence to predict the most relevant words for the category.
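The selection step of this strategy can be sketched as follows, assuming the masked-position distribution $p ([MASK] | T)$ has already been obtained from the PLM. The function name `top_label_words` and the probabilities are hypothetical; they only illustrate ranking by probability and keeping the top $N_{a}$ words.

```python
from typing import Dict, List


def top_label_words(mask_probs: Dict[str, float], n_a: int) -> List[str]:
    """Return the n_a words with the highest predicted [MASK] probability."""
    ranked = sorted(mask_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n_a]]


# Hypothetical probabilities; real values would come from the masked LM.
mask_probs = {
    "likes": 0.31, "enjoys": 0.22, "reads": 0.18,
    "ignores": 0.12, "skips": 0.09, "hates": 0.08,
}
selected = top_label_words(mask_probs, n_a=3)
print(selected)
```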

Feature Similarity. Another widely used feature for constructing verbalizers is the similarity between concepts and the category name $y$ . In the experiments, the classical Term Frequency-Inverse Document Frequency algorithm is utilized to obtain vector representations of words, which prefers to use a word that is relevant to a specific document while irrelevant to other documents as the keyword of the document. In our method, a class with the category name $y$ is analogous to the document, while a label word is comparable to the word in the document. From this perspective, the similarity between the category name $y$ and the expanded label words is measured by cosine similarity. Given the vector representations of the category name $y$ and expanded label word $s$ as $v_{y}$ and $v_{s}$ , respectively, the cosine similarity is computed as shown in equation (4):

\begin{aligned} \cos (v_{y}, v_{s}) = \frac{\sum_{i = 1}^{g} v_{y}^{i} v_{s}^{i}}{\sqrt{\sum_{i = 1}^{g} {(v_{y}^{i})}^{2}} \times \sqrt{\sum_{i = 1}^{g} {(v_{s}^{i})}^{2}}}, \end{aligned}

(4)

where $g$ is the dimension of the vector representation, and $v_{y}^{i}$ is the $i^{th}$ dimension of $v_{y}$. As with other strategies, words with lower similarity are discarded, and the top $N_{a}$ words are selected.
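Equation (4) translates directly into a few lines of code. The sketch below uses plain Python lists as stand-ins for the TF-IDF vectors $v_{y}$ and $v_{s}$; the example vectors are made up, and returning 0.0 for a zero vector is an assumption of this sketch rather than something stated in the paper.

```python
import math
from typing import Sequence


def cosine_similarity(v_y: Sequence[float], v_s: Sequence[float]) -> float:
    """Cosine similarity between two vectors of the same dimension g."""
    if len(v_y) != len(v_s):
        raise ValueError("vectors must have the same dimension g")
    dot = sum(a * b for a, b in zip(v_y, v_s))
    norm_y = math.sqrt(sum(a * a for a in v_y))
    norm_s = math.sqrt(sum(b * b for b in v_s))
    if norm_y == 0.0 or norm_s == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_y * norm_s)


# Toy vectors standing in for the category name and one expanded label word.
v_y = [1.0, 0.0, 1.0]
v_s = [1.0, 0.0, 0.0]
sim = cosine_similarity(v_y, v_s)
print(round(sim, 4))
```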

Context Information. The expanded words should take the sequence of words preceding and following the masked word into consideration, which refers to context information. In the experiments, we introduced PLMs such as BERT instead of traditional N-gram language modeling for context information. However, since BERT is a nonautoregressive language model, we cannot directly compute the likelihood of a sentence. Therefore, we introduce a symmetric window of size $c$ around the masked word “[MASK]” as context. We represent the context of the masked word $w$ as $W = \dots w_{- c}, \dots, w_{- 1}, w, w_{1}, \dots, w_{c}, \dots$ , where each $w_{i}$ in $W$ is then masked from front to back and fed into the BERT model to compute the loss of $w$ , which can be expressed as (5):

\begin{aligned} L (w_{i}) = - \sum_{v_{i} \in V} \mathbf{1} (v_{i} = w_{i}) \times \log p (v_{i} = w_{i} | W_{\setminus w_{i}}), \end{aligned}

(5)

where $V$ represents the set of words in the vocabulary, $\mathbf{1} (\cdot)$ is the indicator function, and $p (v_{i} = w_{i} | W_{\setminus w_{i}})$ is the BERT prediction distribution conditioned on $W$ excluding $w_{i}$. The total loss of $W$ is then computed as the average of the loss for each word $w_{i}$, which can be represented as (6):

\begin{aligned} L (W) = \frac{1}{2 c + 1} \sum_{i = - c}^{i = c} L (w_{i}) . \end{aligned}

(6)

All the expanded words are sorted based on their corresponding sequence loss $L (W)$. Notably, in the experiments, we set $c$ to 5; words with higher loss are discarded, and the $N_{a}$ words with the lowest loss are selected.
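Because the indicator in (5) keeps only the term for the actual word, each $L (w_{i})$ reduces to a negative log-probability, and (6) is their mean over the window. The sketch below assumes the per-position probabilities have already been obtained by masking each word in turn; the helper names, the window size, and the numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def sequence_loss(window_probs: Sequence[float]) -> float:
    """Mean of -log p(w_i | W without w_i) over the 2c+1 window positions."""
    if not window_probs:
        raise ValueError("window must contain at least one position")
    # Probabilities are assumed to be strictly positive.
    return sum(-math.log(p) for p in window_probs) / len(window_probs)


def keep_lowest_loss(candidates: Dict[str, Sequence[float]], n_a: int) -> List[str]:
    """Rank candidate label words by sequence loss and keep the n_a lowest."""
    ranked = sorted(candidates, key=lambda w: sequence_loss(candidates[w]))
    return ranked[:n_a]


# Hypothetical per-position probabilities for a window with c = 1.
candidates = {
    "enjoys": [0.6, 0.7, 0.5],
    "likes": [0.8, 0.9, 0.7],
    "table": [0.6, 0.01, 0.5],
}
kept = keep_lowest_loss(candidates, n_a=2)
print(kept)
```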

It is worth noting that $N_{a} = 5$ is used for all three strategies in the experiments. The selected words are subsequently merged with the initial vocabulary set and de-duplicated to ensure the uniqueness of the vocabulary.
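The merge-and-de-duplicate step can be sketched as below. The word lists are invented for illustration, and preserving first-seen order is a design choice of this sketch; the paper only states that duplicates are removed.

```python
from typing import Iterable, List


def merge_label_words(initial: Iterable[str], *expansions: Iterable[str]) -> List[str]:
    """Merge word lists, dropping duplicates while keeping first-seen order."""
    merged: List[str] = []
    seen = set()
    for words in (initial, *expansions):
        for word in words:
            if word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


# Hypothetical outputs of the initial set and the three strategies.
initial = ["like", "enjoy"]
from_bert = ["like", "love"]
from_similarity = ["enjoy", "favor"]
from_context = ["love", "prefer"]

label_words = merge_label_words(initial, from_bert, from_similarity, from_context)
print(label_words)
```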

3.4. News Recommendation

Upon successfully constructing the final verbalizer through a variety of strategies, a crucial subsequent step involves appropriately mapping the predicted probabilities for each label word to their respective categories. This mapping process can be effectively represented by an objective function $g$ , which signifies the utilization of the verbalizer. Specifically, $g$ can be calculated as (7):

\begin{aligned} g = \arg\max_{y \in Y} \frac{1}{| V_{y} |} \sum_{v \in V_{y}} p ([MASK] = v | x_{p}), \end{aligned}

(7)

where $V_{y}$ represents the set of label words corresponding to the label $y$ and $| V_{y} |$ denotes the cardinality of $V_{y}$. The function $p ([MASK] = v | x_{p})$ computes the probability of the label word $v$ given the input text $x_{p}$.

Based on precise modeling of user interests, following the application of objective function $g$ to effectuate meaningful mappings, the system can accurately differentiate between candidate news items that users are potentially interested in and those they are not. Consequently, we can effectively tailor recommendations to present users with likely interesting candidate news stories, thereby enhancing both the precision of personalized recommendations and the overall user experience.
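The mapping in (7) averages the predicted probabilities of each label's words and picks the label with the highest mean. A minimal sketch follows, assuming every label has at least one label word; the label names, label words, and probabilities are hypothetical.

```python
from typing import Dict, List


def predict_label(word_probs: Dict[str, float],
                  verbalizer: Dict[str, List[str]]) -> str:
    """Return the label whose label words have the highest mean probability."""
    def mean_prob(words: List[str]) -> float:
        # Words missing from word_probs contribute probability 0.0.
        return sum(word_probs.get(w, 0.0) for w in words) / len(words)

    return max(verbalizer, key=lambda label: mean_prob(verbalizer[label]))


# Hypothetical verbalizer V_y and [MASK] probabilities for one candidate news.
verbalizer = {
    "interested": ["like", "love", "enjoy"],
    "uninterested": ["dislike", "ignore"],
}
word_probs = {"like": 0.30, "love": 0.15, "enjoy": 0.15,
              "dislike": 0.10, "ignore": 0.05}

label = predict_label(word_probs, verbalizer)
print(label)
```

Averaging rather than summing keeps a label with many expanded words from dominating a label with few.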

4. Experiments

In this section, we conduct extensive experiments on classic news datasets to evaluate the effectiveness of our proposed personalized recommendation method. Specifically, we first provide a detailed description of the datasets used in the experiments. Secondly, the compared methods and evaluation metrics of the experiment are introduced. Then, we introduce the results of the experiments and their observations.

4.1. Dataset

We followed the approach in the HDNR (Wang et al., 2023) and selected MINDlarge and MINDsmall as our experimental datasets (Table 1). The following section provides an introduction to these two datasets.

MINDsmall-488: The MINDsmall dataset is a widely used news recommendation dataset derived from anonymous user behavior logs from Microsoft’s news website. We created the MINDsmall-488 subset by selecting 488 users with historical click volumes between 160 and 320, ensuring sufficient interaction records without excessive redundancy. The dataset was processed by merging and deduplicating the training and testing sets, allocating 70% for training and 30% for testing. Each user’s data was formatted into combinations of user IDs, news titles, categories, and summaries.

MINDlarge-1138: The MINDlarge dataset is also derived from the same source. We constructed the MINDlarge-1138 subset by selecting 1,138 users with click volumes between 200 and 500, following the same criteria and processing methods as MINDsmall. This subset captures richer user–news interactions while maintaining focused user interests.
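The subset construction described above (filter users by click volume, merge and de-duplicate records, then split 70/30) can be sketched as follows. This is not the authors' preprocessing script: the function names, the inclusive bounds, the seeded shuffle, and the toy records are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Tuple


def select_users(click_counts: Dict[str, int], low: int, high: int) -> List[str]:
    """Keep users whose click volume lies in [low, high] (bounds assumed inclusive)."""
    return sorted(u for u, n in click_counts.items() if low <= n <= high)


def split_records(records: List[str], train_ratio: float = 0.7,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """De-duplicate records, shuffle them, and split into train and test parts."""
    unique = list(dict.fromkeys(records))  # drops duplicates, keeps order
    random.Random(seed).shuffle(unique)    # shuffling is an assumption here
    cut = round(len(unique) * train_ratio)
    return unique[:cut], unique[cut:]


# Toy data: hypothetical user click counts and news record IDs.
click_counts = {"U1": 150, "U2": 200, "U3": 320, "U4": 400}
users = select_users(click_counts, low=160, high=320)

records = [f"N{i}" for i in range(10)] + ["N0"]  # "N0" appears twice
train, test = split_records(records)
print(users, len(train), len(test))
```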

Furthermore, we have made our dataset and source code publicly available on “https://github.com/zhuyiYZU/SP-PNR.”

4.2. Compared Methods and Evaluation Metrics

4.2.1. Compared Methods

We compare SP-PNR with the following deep learning and fine-tuning PLM methods for news recommendation to demonstrate the effectiveness of our method:

NRMS (Wu et al., 2019b). The NRMS method employs the multihead self-attention mechanism in the Transformer architecture to capture the long-distance dependence of news headlines and content and the complex patterns of historical user behavior.

Neural News Recommendation with LSTUR (An et al., 2019). The LSTUR method provides a powerful and flexible approach to modeling user interests for recommender systems by effectively fusing long-term and short-term user interest features by utilizing user IDs and recent behavioral sequences, respectively.

NAML (Wu et al., 2019a). The method designs a semi-automatic encoder-based hybrid collaborative filtering recommendation method that utilizes multiperspective learning and attention mechanisms to process complex information for news recommendation.

DKN (Wang et al., 2018). The method improves the Acc and relevance of recommendations by fusing knowledge graph information with deep learning techniques to better understand and capture the complex semantics and diversity of user interests in news content.

UNBERT (Zhang et al., 2021). The UNBERT model employs a multilayered Transformer architecture to enhance the textual representation by pretrained models with rich linguistic knowledge.

4.2.2. Implementation Details and Parameter Settings

We utilize BERT (Devlin et al., 2018) as the backbone PLMs, and the BERT-base-cased model is used in our experiments. Acc, recall (Rec), and missing alarm rate (MAR) are adopted as the metrics. In our SP-PNR, the learning rate, the batch size, the hidden size, and the dropout rate are set to $1 \times 10^{- 4}$ , 16, 200, and 0.5, respectively. The AdamW is utilized as the optimizer. Furthermore, the number of epochs and the weight decay are set to 3 and 0.01. For NRMS, LSTUR, NAML, DKN, and UNBERT, we use the default parameters as reported in An et al. (2019), Wang et al. (2018), Wu et al. (2019a, 2019b), and Zhang et al. (2021), respectively.

Table 1.
Details of Dataset Used in our Experiments.

Dataset	Number of users	Number of news	Number of interactions
MINDsmall-488	488	20,274	219,776
MINDlarge-1138	1,138	33,787	947,951

All experimental results were obtained on a server with an NVIDIA Geforce RTX 3090 Founders Edition GPU, an Intel(R) Core(TM) i9-10980XE CPU running at 3.00 GHz, and 64 GB of memory. In addition, we employed Python version 3.7 in conjunction with PyTorch version 1.13.1 and OpenPrompt version 1.0.1.

4.2.3. Evaluation Metrics

In the experiments, Acc, Rec, and MAR are used to evaluate the effectiveness of our proposed SP-PNR and all compared methods; these three evaluation metrics are defined as (8), (9), and (10). The bigger the values of Acc and Rec, the better the performance of the methods. The smaller the values of MAR, the better the performance of the methods. Increased Acc signifies a closer alignment between our predictions of users’ news preferences and the actual situation. In a news recommendation system, a higher Rec ensures that our predicted results increasingly encompass all the news content that genuinely interests users. Rec and MAR work in tandem, both contributing to maximizing the provision of potentially favored content for users, which is the central goal of a news recommendation. By synergistically coordinating and optimizing these three evaluation metrics, we can more effectively achieve this objective.

$$Acc = \frac{\sum_{p_{i}, p_{i}^{'} \in TestSet} | p_{i} = p_{i}^{'} |}{| TestSet |} \tag{8}$$

$$Rec = \frac{\sum_{p_{i}, p_{i}^{'} \in TestSet} | p_{i} = p_{i}^{'} = 1 |}{\sum_{p_{i} \in TestSet} | p_{i} = 1 |} \tag{9}$$

$$MAR = 1 - \frac{\sum_{p_{i}, p_{i}^{'} \in TestSet} | p_{i} = p_{i}^{'} = 1 |}{\sum_{p_{i} \in TestSet} | p_{i} = 1 |} \tag{10}$$

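where $p_{i}$ denotes the true label and $p_{i}^{'}$ denotes the predicted label. A term of the form $| \cdot |$ around a condition equals 1 when the condition holds and 0 otherwise.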
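The three definitions can be checked with a few lines of code. This is a minimal sketch assuming binary labels (1 = the user is interested, 0 = not interested); the function name is hypothetical, and returning a Rec of 0 when a test set contains no positive labels is an assumption of the sketch that the text does not specify.

```python
from typing import Sequence, Tuple


def acc_rec_mar(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute Acc, Rec, and MAR as in Equations (8)-(10) for binary labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Numerator of Eq. (8): predictions that match the true label.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    # Denominator of Eq. (9) and (10): samples whose true label is 1.
    positives = sum(1 for t in y_true if t == 1)
    # Numerator of Eq. (9) and (10): positives that were also predicted as 1.
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

    acc = correct / len(y_true)
    # Assumption: report Rec = 0.0 when there are no positive samples.
    rec = true_positives / positives if positives else 0.0
    mar = 1.0 - rec
    return acc, rec, mar


# Toy labels for illustration only.
acc, rec, mar = acc_rec_mar([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(acc, rec, mar)
```

In the toy example, three of five predictions are correct and two of the three positive items are recovered, so Rec and MAR sum to 1 as Equation (10) requires.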

4.3. Experimental Results

For both datasets, we train a model for each user independently to fully demonstrate the personalized characteristics of the recommendation system. Considering the large number of users and the uneven distribution of information among individuals, we adopt the average value over all users as the overall performance index. The results of Acc, Rec, and MAR for all the methods are presented in Table 2. From these results, we make the following observations:

While deep neural network methods, such as NAML, NRMS, LSTUR, and DKN, have presented fairly good and competitive results on both datasets, they often fail to fully exploit the rich semantic structures within news articles. We believe that is why these methods cannot achieve the best performance in the two datasets. Moreover, the results of these deep neural networks fluctuate greatly on different datasets, indicating that these methods have a strong dependence on training data, and their robustness and generality are relatively poor.

Although fine-tuned PLMs such as UNBERT can acquire richer semantic information for news recommendation, the significant gap between the objective forms of pretraining and fine-tuning prevents them from taking full advantage of the knowledge in PLMs, which is reflected in the results on both datasets. In contrast, our SP-PNR achieves the best performance on both datasets, reflecting the advantage of introducing prompt-tuning.

Furthermore, our SP-PNR achieves the best Rec and MAR among all compared news recommendation methods on both datasets. These results suggest that our method incorporates the deep semantic information of news to a greater extent than other approaches, which enables it to capture user interests more accurately and enhances the personalization level of the recommendation system.
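The per-user averaging used for the reported numbers can be sketched as follows. This is an illustration under stated assumptions rather than the authors' evaluation script: the function names and the toy data are hypothetical, labels are assumed binary, and a user with no positive test labels is assigned a Rec of 0.

```python
from statistics import mean
from typing import Dict, List, Tuple

Labels = List[int]


def user_metrics(y_true: Labels, y_pred: Labels) -> Tuple[float, float, float]:
    """Acc, Rec, and MAR for the test samples of a single user."""
    acc = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    positives = sum(1 for t in y_true if t == 1)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    rec = hits / positives if positives else 0.0  # assumption for empty positives
    return acc, rec, 1.0 - rec


def average_over_users(per_user: Dict[str, Tuple[Labels, Labels]]) -> Tuple[float, float, float]:
    """Average each metric over users, giving every user the same weight."""
    rows = [user_metrics(y_true, y_pred) for y_true, y_pred in per_user.values()]
    accs, recs, mars = zip(*rows)
    return mean(accs), mean(recs), mean(mars)


# Hypothetical users with different numbers of test samples.
toy = {
    "user_a": ([1, 0, 1, 0], [1, 0, 0, 0]),
    "user_b": ([1, 1], [1, 1]),
}
avg_acc, avg_rec, avg_mar = average_over_users(toy)
print(avg_acc, avg_rec, avg_mar)
```

Averaging per user means that a user with few interactions counts as much as a user with many, which differs from pooling all test samples into one set.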

Table 2.
The Performance of Acc, Rec, and MAR on MINDsmall-488 and MINDlarge-1138.

Data	Methods	Acc $↑$	Rec $↑$	MAR $↓$
MINDsmall-488	NAML	0.49046	0.50571	0.49429
	NRMS	0.58442	0.60924	0.39076
	LSTUR	0.53487	0.52251	0.47749
	DKN	0.54133	0.26106	0.73894
	UNBERT	0.60058	0.60328	0.39672
	SP-PNR	0.65922	0.75542	0.24458
MINDlarge-1138	NAML	0.56401	0.50073	0.49927
	NRMS	0.57792	0.56659	0.43341
	LSTUR	0.57365	0.48467	0.51533
	DKN	0.58274	0.31521	0.68479
	UNBERT	0.57395	0.47235	0.52765
	SP-PNR	0.59473	0.62012	0.37988

Note. Acc = accuracy; Rec = recall; MAR = missing alarm rate; NAML = neural attentive multiview learning for news recommendation; NRMS = neural news recommendation with multihead self-attention; LSTUR = long- and short-term user representations; DKN = deep knowledge-aware network for news recommendation; UNBERT = user–news matching BERT for news recommendation; SP-PNR = soft prompt-tuning method for personalized news recommendation. Bold values indicate the best results.

4.4. The Influence of Template

To systematically validate the effectiveness of our SP-PNR in integrating news content, we conducted a comparative analysis against multiple hand-crafted templates. The three manual templates used on the dataset are listed in Table 3, and the experimental results are shown in Table 4.

Table 3.
The Information of Different Hard Templates.

ID	Template
1	The headline of the news is {“placeholder”: “text_a”}, its category falls under {“placeholder”: “text_b”}, and the summary of the news is {“placeholder”: “text_c”}. The user’s attitude toward this news is {“mask”}.
2	There are some information of this news,title: {“placeholder”: “text_a”}, subcategory: {“placeholder”: “text_b”}, summary:{“placeholder”: “text_c”}. The user’s attitude toward this news is {“mask”}.
3	News {“placeholder”: “text_a”}, in subcategory {“placeholder”: “text_b”}, and summed up as {“placeholder”: “text_c”}, meet with user sentiment {“mask”}.
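To make the placeholder notation in Table 3 concrete, the sketch below fills Template 1 with the fields of one news item. It is plain string processing written for illustration and is not OpenPrompt's implementation; the function name, the example field values, and the use of straight quotes inside the braces are assumptions, and `[MASK]` is the mask token of BERT.

```python
import re

# Template 1 from Table 3, written with straight quotes inside the braces.
TEMPLATE_1 = (
    'The headline of the news is {"placeholder": "text_a"}, '
    'its category falls under {"placeholder": "text_b"}, '
    'and the summary of the news is {"placeholder": "text_c"}. '
    'The user’s attitude toward this news is {"mask"}.'
)

PLACEHOLDER = re.compile(r'\{"placeholder":\s*"(\w+)"\}')


def fill_template(template: str, fields: dict, mask_token: str = "[MASK]") -> str:
    """Substitute each placeholder with its field and the mask slot with mask_token."""
    filled = PLACEHOLDER.sub(lambda match: fields[match.group(1)], template)
    return filled.replace('{"mask"}', mask_token)


# Hypothetical news item: text_a = title, text_b = subcategory, text_c = summary.
news = {
    "text_a": "Team wins season opener",
    "text_b": "sports",
    "text_c": "A short recap of the game",
}
prompt = fill_template(TEMPLATE_1, news)
print(prompt)
```

The PLM then predicts a word at the mask position, and the verbalizer maps that word to the user's attitude label.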

Table 4.

The Performance of Different Templates.

ID	Acc $↑$	Rec $↑$	MAR $↓$
1	0.63781	0.73423	0.26577
2	0.63031	0.75010	0.24990
3	0.63842	0.76422	0.23578
avg	0.63551	0.74952	0.25048
soft template	0.65922	0.75542	0.24458

Note. Acc = accuracy; Rec = recall; MAR = missing alarm rate. Bold values indicate the best results.

From the results in Table 4, we can observe that the manual templates achieve fairly good results: their Acc ranges from 0.63031 to 0.63842, and Template 3 even reaches a slightly higher Rec (0.76422) than the soft template (0.75542). However, these methods exhibit limitations, mainly the difficulty of flexibly responding to the diverse and personalized interest tendencies of a wide range of users. Moreover, testing and matching numerous customized templates individually for each user in real scenarios is resource-intensive and operationally difficult.

In contrast, our proposed soft template generation strategy attains the highest Acc (0.65922, compared with an average of 0.63551 for the manual templates) and a better Rec and MAR than the manual-template average. With its flexibility and generalization capability, the method can adapt to the diversity of user needs and provide more accurate and efficient information matching for a wide range of user groups. In the experiments, we designed three different hand-crafted templates and compared the average of their results with that of our soft template, which highlights the potential of our approach in improving the efficiency and user experience of personalized news recommendation.
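The averaged row of Table 4 and the margin of the soft template over it can be reproduced with simple arithmetic. The values below are copied from Table 4; the variable and function names are illustrative.

```python
# (Acc, Rec, MAR) for the three manual templates, copied from Table 4.
manual_templates = {
    1: (0.63781, 0.73423, 0.26577),
    2: (0.63031, 0.75010, 0.24990),
    3: (0.63842, 0.76422, 0.23578),
}
soft_template = (0.65922, 0.75542, 0.24458)


def column_means(rows):
    """Average each metric column and round to five decimals as in the table."""
    columns = list(zip(*rows))
    return tuple(round(sum(column) / len(column), 5) for column in columns)


manual_avg = column_means(manual_templates.values())
# Difference between the soft template and the manual-template average.
margin = tuple(round(s - m, 5) for s, m in zip(soft_template, manual_avg))

print(manual_avg)
print(margin)
```

The computed averages agree with the avg row of Table 4, and the margin is largest for Acc, while Rec and MAR differ from the manual average by less than 0.01.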

4.5. The Influence of News Semantic Information

To examine how the completeness of news semantic information affects the performance of a personalized news recommendation method, we removed the subcategory and the summary of each news item from the entire dataset, yielding the variants SP-PNR(-sub) and SP-PNR(-sum), respectively. The purpose of the experiment is to quantify the impact of reduced semantic richness on recommendation effectiveness.

The experimental results are presented in Table 5. Removing the subcategory lowers Acc from 0.65922 to 0.64838 and Rec from 0.75542 to 0.74224, while removing the summary leaves Acc almost unchanged (0.65918) but lowers Rec to 0.74717. The full model is the best on all three metrics, which indicates that richer news semantic information helps recommendation Acc and the matching of user interests.

Table 5.
Representation of Different Semantic Information.

News Semantic Information	Acc $↑$	Rec $↑$	MAR $↓$
SP-PNR(-sub)	0.64838	0.74224	0.25776
SP-PNR(-sum)	0.65918	0.74717	0.25283
SP-PNR	0.65922	0.75542	0.24458

Note. Acc = accuracy; Rec = recall; MAR = missing alarm rate; SP-PNR = soft prompt-tuning method for personalized news recommendation. Bold values indicate the best results.

4.6. The Influence of Different PLMs

To investigate the influence of different PLMs on personalized news recommendation, we further introduced RoBERTa and DeBERTa as the backbone models of our SP-PNR and assessed how the choice of PLM affects recommendation Acc and effectiveness.

The experimental results are presented in Table 6, where the SP-PNR rows correspond to the BERT backbone. SP-PNR with BERT provides the best Acc on both datasets. RoBERTa-base attains the highest Rec, and hence the lowest MAR, on both datasets (0.81145 on MINDsmall-488 and 0.69257 on MINDlarge-1138), but its Acc is lower, especially on the MINDlarge dataset (0.48821). DeBERTa-base falls below BERT on all three metrics. These results indicate that BERT offers the best balance of Acc and Rec, while RoBERTa-base favors Rec at the cost of Acc.

Table 6.
The Influence of Different PLMs.

Data	Methods	Acc $↑$	Rec $↑$	MAR $↓$
MINDsmall-488	RoBERTa	0.60826	0.81145	0.18855
	DeBERTa	0.63095	0.71175	0.28825
	SP-PNR	0.65922	0.75542	0.24458
MINDlarge-1138	RoBERTa	0.48821	0.69257	0.30743
	DeBERTa	0.53770	0.52608	0.47392
	SP-PNR	0.59473	0.62012	0.37988

Note. PLM = pretrained language model; Acc = accuracy; Rec = recall; MAR = missing alarm rate; SP-PNR = soft prompt-tuning method for personalized news recommendation. Bold values indicate the best results.

4.7. The Performance on the MovieLens-1M Dataset

To further validate the generalizability of our SP-PNR, we conduct experiments on the classical MovieLens-1M dataset. Because the news-specific baselines, including NRMS, LSTUR, NAML, DKN, and UNBERT, rely on characteristics of the MIND dataset such as the news title, content, and reading timestamp, they cannot be applied to the MovieLens dataset. Thus, we introduced classical recommendation methods as baselines. The detailed descriptions of the dataset, baselines, and experimental results are as follows.

Dataset: MovieLens 1M.¹ It is a widely used benchmark for evaluating recommendation systems. It comprises 1,000,209 ratings, each ranging from 1 to 5, provided by 6,040 users on 3,706 movies. Each user has rated at least 20 movies, ensuring sufficient user activity for meaningful analysis. In addition to user-item ratings, the dataset includes metadata about the movies (such as titles, genres, and release dates), facilitating content-based and hybrid recommendation strategies.

Baseline Methods:

Nonnegative Matrix Factorization (NMF; Wang & Zhang, 2012). A basic matrix factorization method for recommendation, using the generalized Kullback–Leibler divergence for updates in our experiments.

Singular Value Decomposition Plus (SVD++; Koren, 2008). A model that combines both explicit and implicit feedback from users, integrating latent factor and neighborhood models for recommendation.

Personalized Recommendation With Knowledge Graph via Dual-Autoencoder (PRKG; Yang et al., 2022). This method uses auxiliary information from item knowledge graphs, encoded into low-dimensional representations via a semi-autoencoder to enhance recommendations.

LLaMA (Touvron et al., 2023). A large language model based on the Transformer architecture and trained on a dataset containing 2 trillion tokens, which demonstrates strong performance across various benchmarks.

Notably, the PREA toolkit (Lee et al., 2014)² is adopted for the implementation of NMF and SVD++. For PRKG, the default parameters as reported in Yang et al. (2022) are used in the experiments. It is worth noting that the existing experimental results reported in their papers are directly copied into our tables. For LLaMA, recommendations are made directly through a conversational approach.

The experimental results on the MovieLens-1M dataset are presented in Table 7. Our SP-PNR achieves the highest Acc (0.61777) and Rec (0.76315) and the lowest MAR (0.23685) among all methods. The baselines NMF, SVD++, PRKG, and LLaMA perform notably worse, particularly in Rec, where none of them exceeds 0.48671.

Table 7.
The Performance of Acc, Rec, and MAR on MovieLens-1M Dataset.

Data	Methods	Acc $↑$	Rec $↑$	MAR $↓$
MovieLens-1M	NMF	0.57313	0.48671	0.51329
	SVD++	0.56614	0.35335	0.64665
	PRKG	0.58801	0.36337	0.28825
	LLaMA	0.46675	0.32128	0.28825
	SP-PNR	0.61777	0.76315	0.23685

Note. Acc = accuracy; Rec = recall; MAR = missing alarm rate; NMF = nonnegative matrix factorization; SVD++ = singular value decomposition plus; PRKG = personalized recommendation with knowledge graph via dual-autoencoder; SP-PNR = soft prompt-tuning method for personalized news recommendation. Bold values indicate the best results.

4.8. Case Study

We conducted a case study to evaluate the effectiveness and robustness of our SP-PNR method. User U13234 demonstrated a strong preference for news articles in the “lifestyle” category, with their clicking behavior highly concentrated on this category. In our experiments, the prediction Acc for this user reached 96.7%, indicating that our SP-PNR method performs well for users with focused interests. In contrast, User U25769 exhibited a more diverse interest profile, with clicking behavior spanning over 10 different categories. However, further analysis revealed that the user rarely clicked on news articles from the “auto” category, despite this category frequently appearing in the recommendations, which suggests that the “auto” category does not align with the user’s interests. The SP-PNR method identified both the user’s preferences and the categories they were less inclined to engage with, contributing to more targeted and accurate recommendations.

Additionally, we observed more fine-grained interest patterns. For example, User U2237 showed a strong interest in news related to “basketball-NBA” but did not engage with news in the “football” category, indicating that some users may only be interested in a specific subcategory within a larger category. In contrast, User U15631 demonstrated interest in “lifestyle” news, along with a notable interest in “movies,” “music,” and “travel” news, showing that this user enjoys a broad range of topics related to leisure and lifestyle. The SP-PNR method can identify these varying interest patterns, whether they are highly concentrated or span multiple categories, providing more precise and personalized recommendations.

Our prompt-tuning templates demonstrated strong robustness by effectively capturing the multidimensional semantic information in news articles. For users with concentrated interests, the templates achieved high prediction Acc. For users with diverse interests, the templates still succeeded in identifying their primary preferences, showcasing adaptability across different user profiles. This robustness ensures that the SP-PNR method performs reliably across varying types of user behavior. However, as more semantic information is integrated, the interests of some users may become less distinct, indicating that the templates still have limitations in fully capturing complex user preferences.

Overall, the SP-PNR method, supported by a robust template design, exhibits strong flexibility and adaptability in capturing user interests, making it a reliable foundation for personalized recommendations. In future work, we aim to further enhance the coverage and flexibility of the templates to better address the needs of users with complex or highly diverse interests.

4.9. Parameter Sensitivity

Hyperparameters such as the learning rate and batch size can significantly affect the experimental results. In this section, we conducted parameter sensitivity experiments on the MINDsmall-488 dataset. The results for batch size and learning rate are presented in Figures 3 and 4, respectively. When one parameter is varied, the others are kept fixed.

Figure 2.

User U13234’s Personalized News Interaction Report With Category Insights.

Figure 3.

The Effect of Batch Size on Experimental Results.

4.9.1. Batch Size

Batch size is a pivotal hyperparameter in news recommendation systems, governing the number of samples processed per training iteration. Its selection influences both the efficiency and effectiveness of model training, with the specific impacts shown in Figure 3.

The experimental results show that the model performance is optimal when the batch size is set to 16, with Acc and Rec peaking at 0.65922 and 0.75542, respectively, and MAR reaching its lowest value of 0.24458. The small batch sizes (4 and 8) outperform the large ones (32 and 64) in terms of Rec, implying that they are more effective in identifying relevant entries and reducing omissions. Acc peaks at the medium batch size and then declines as the batch size increases, possibly because of noise introduced by large training batches or weakened generalization ability. Therefore, appropriate batch size selection is important for both the Acc and the coverage of the recommendation system.

Figure 4.

The Effect of Learning Rate on Experimental Results.

4.9.2. Learning Rate

From the experimental results, the performance is best when the learning rate is $1 \times 10^{-4}$: Acc reaches 0.65922, Rec reaches 0.75542, and MAR falls to its lowest level of 0.24458 (Figure 4). Both lower learning rates (e.g., $1 \times 10^{-5}$) and higher learning rates (e.g., $1 \times 10^{-3}$ and $1 \times 10^{-2}$) decrease performance to different degrees, which suggests that a learning rate that is too small may lead to a sluggish learning process, while one that is too large may trigger training instability and in turn hinder the generalization ability of the model.
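The one-parameter-at-a-time protocol used in this section can be sketched as a small configuration generator. The candidate values are those mentioned in the text; whether they form the complete grid used in the experiments is an assumption, and the function and variable names are hypothetical.

```python
from typing import Dict, Iterator, List

# Default setting reported in the implementation details.
DEFAULTS: Dict[str, float] = {"batch_size": 16, "learning_rate": 1e-4}

# Candidate values mentioned in the text (assumed to be the full grid).
GRID: Dict[str, List[float]] = {
    "batch_size": [4, 8, 16, 32, 64],
    "learning_rate": [1e-5, 1e-4, 1e-3, 1e-2],
}


def one_at_a_time(defaults: Dict[str, float], grid: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield configurations that vary one parameter while fixing the others."""
    for name, values in grid.items():
        for value in values:
            config = dict(defaults)  # copy so the defaults stay untouched
            config[name] = value
            yield config


configs = list(one_at_a_time(DEFAULTS, GRID))
for config in configs:
    print(config)
```

Each yielded configuration would be passed to a training and evaluation run; compared with a full grid search, this keeps the number of runs equal to the sum, not the product, of the candidate counts.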

5. Conclusion

In this paper, we propose SP-PNR, a soft prompt-tuning method for personalized news recommendation. It improves on previous methods by incorporating the personalized characteristics of items into template construction and achieves competitive performance compared to hand-crafted prompts through soft prompt-tuning. The side information of news, including the summaries and subcategories, is introduced to learn the characteristics of news, and three strategies are designed to capture different characteristics of expanded words for verbalizer optimization. Finally, extensive experiments show the effectiveness of the proposed method.

In the future, we will extend our research work in the following two directions. Firstly, we aim to incorporate more auxiliary information from external knowledge for personalized news recommendations. Secondly, we plan to explore better methods for automatic template construction.

Footnotes

ORCID iD

Xu Yuan

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was partially supported by the Graduate Student Scientific Research Innovation Projects in Jiangsu Province of China (SJCX23_1896), the National Natural Science Foundation of China under grants (62076217), the Key Research and Development Program of Jiangsu Province in China (BE2023315), and the Open Project of Anhui Provincial Key Laboratory for Intelligent Manufacturing of Construction Machinery (IMCM-2023-01).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Notes

References

Zhang, Liu, Xie (2019). Neural news recommendation with long- and short-term user representations. In Proceedings of the 57th annual meeting of the association for computational linguistics (pp. 336–345). Association for Computing Machinery.

Brown, Mann, Ryder, Subbiah, Kaplan, J. D., Dhariwal, Neelakantan, Shyam, Sastry, Askell, Agarwal, Herbert-Voss, Krueger, Henighan, Child, Ramesh, Ziegler, Winter, Hesse, Amodei (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

Chappuis, Zermatten, Lobry, Le Saux, Tuia (2022). Prompt-RSVQA: Prompting visual context to a language model for remote sensing visual question answering. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (pp. 1372–1381). IEEE.

Chen, Zhang, Xie, Deng, Yao, Tan, Huang, Chen (2022). KnowPrompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction. In Proceedings of the ACM Web conference 2022 (pp. 2778–2788). Association for Computing Machinery.

Cui, Ding, Huang, Liu (2022). Prototypical verbalizer for prompt-based few-shot tuning. arXiv preprint arXiv:2203.09770.

Devlin, Chang, M. W., Lee, Toutanova (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Ding, Zhao, Chen, Liu, Zheng, H. T., Sun (2021). OpenPrompt: An open-source framework for prompt-learning. arXiv preprint arXiv:2111.01998.

Han, Zhao, Ding, Liu, Sun (2022). PTR: Prompt tuning with rules for text classification. AI Open, 3, 182–192.

Ding, Wang, Liu, Sun (2021). Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification. CoRR abs/2108.02035.

Jiang, F. F., Araki, Neubig (2020). How can we know what language models know? Transactions of the Association for Computational Linguistics, 8, 423–438.

Koren (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD international conference on knowledge discovery and data mining (pp. 426–434). Association for Computing Machinery.

Lan, Chen, Goodman, Gimpel, Sharma, Soricut (2019). ALBERT: A Lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942.

Lee, Sun, Lebanon, Sonnenburg (2014). Prea: Personalized recommendation algorithms toolkit. Journal of Machine Learning Research, 13(3), 2699–2703.

Gao, Chen, Shao, Zheng, Zhang, Wang (2021). SentiPrompt: Sentiment knowledge enhanced prompt-tuning for aspect-based sentiment analysis. arXiv preprint arXiv:2109.08306.

Zhang, Malthouse, E. C. (2023). PBNR: Prompt-based news recommender system. ArXiv abs/2304.07862. https://api.semanticscholar.org/CorpusID:258179168.

Liu, Tam, Yang, Tang (2022). P-Tuning: Prompt tuning can be comparable to fine-tuning across scales and tasks. In Proceedings of the 60th annual meeting of the association for computational linguistics (Volume 2: Short Papers) (pp. 61–68). Association for Computational Linguistics.

Luo, Sun, Xia, Qin, Zhang, Poon, Liu, T. Y. (2022). BioGPT: Generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 23(6), bbac409.

Mao, Liu, Cambria (2022). The biases of pre-trained language models: An empirical study on prompt-based sentiment analysis and emotion detection. IEEE Transactions on Affective Computing, 14(3), 1743–1753.

Qiu, Sun, Shao, Dai, Huang (2020). Pre-trained models for natural language processing: A survey. Science China Technological Sciences, 63(10), 1872–1897.

Shin, Lin, Thomson, Chen, Roy, Platanios, E. A., Pauls, Klein, Eisner, Van Durme (2021). Constrained language models yield few-shot semantic parsers. In M. F. Moens, X. Huang, L. Specia & S. W. T. Yih (Eds.), Proceedings of the 2021 conference on empirical methods in natural language processing (pp. 7699–7715). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.608.

Shin, Razeghi, Logan, I. V. R. L., Wallace, Singh (2020). AutoPrompt: Eliciting knowledge from language models with automatically generated prompts. arXiv preprint arXiv:2010.15980.

Wang, Qin, Chan, C. M., Lin, Wang, Wen, Liu, Hou, Sun, Zhou (2021). On transferability of prompt tuning for natural language processing. arXiv preprint arXiv:2111.06719.

Sun, Liu, Pei, Lin, Jiang (2019). BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM international conference on information and knowledge management (pp. 1441–1450). Association for Computing Machinery.

Touvron, Martin, Stone, Albert, Almahairi, Babaei, Bashlykov, Batra, Bhargava, Bhosale, Bikel, Blecher, Canton Ferrer, Chen, Cucurull, Esiobu, Fernandes, Fuller, Goyal (2023). Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

Wang, Zhang, Xie, Guo (2018). DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 world wide web conference (pp. 1835–1844). International World Wide Web Conferences Steering Committee.

Wang, Guo, Wang, Liu (2023). HDNR: A hyperbolic-based debiased approach for personalized news recommendation. In Proceedings of the 46th international ACM SIGIR conference on research and development in information retrieval (pp. 259–268). Association for Computing Machinery.

Wang, Sun, Tao, Geng, Jiang (2022). PromDA: Prompt-based data augmentation for low-resource NLU tasks. arXiv preprint arXiv:2202.12499.

Wang, Y. X., Zhang, Y. J. (2012). Nonnegative matrix factorization: A comprehensive review. IEEE Transactions on Knowledge and Data Engineering, 25(6), 1336–1353.

Wei, Jiang, Zhao (2022). Eliciting knowledge from pretrained language models for prototypical prompt verbalizer. In International conference on artificial neural networks (pp. 222–233). Springer.

Huang, Xie (2019a). Neural news recommendation with attentive multi-view learning. arXiv preprint arXiv:1907.05576.

Huang, Xie (2019b). Neural news recommendation with multi-head self-attention. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 6389–6394). Association for Computational Linguistics.

Huang (2021). Empowering news recommendation with pre-trained language models. In Proceedings of the 44th international ACM SIGIR conference on research and development in information retrieval (pp. 1652–1656). Association for Computing Machinery.

Yang, Zhu (2022). Personalized recommendation with knowledge graph via dual-autoencoder. Applied Intelligence, 52(6), 6196–6207.

Liu (2021). Tiny-NewsRec: Effective and efficient PLM-based news recommendation. arXiv preprint arXiv:2112.00944.

Zhang, Haddow, Birch (2023). Prompting large language model for machine translation: A case study. In International conference on machine learning (pp. 41092–41110). PMLR.

Zhang, Jia, Wang, Zhu, Wang (2021). UNBERT: User–news matching BERT for news recommendation. In International joint conferences on artificial intelligence (pp. 3356–3362). IJCAI Organization.

Zhang, Wang (2023). Prompt learning for news recommendation. In Proceedings of the International ACM SIGIR conference on research and development in information retrieval (pp. 227–237). Association for Computing Machinery.

Zhu, Zhou, Song, Tan, Guo (2019). DAN: Deep attention neural network for news recommendation. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, pp. 5973–5980). AAAI Press.

Zhu, Wang, Qiang (2024). Prompt-learning for short text classification. IEEE Transactions on Knowledge and Data Engineering, 36(10), 5328–5339.

Soft Prompt-Tuning for Personalized News Recommendation
