A Novel News Recommendation Model With Knowledge Enhancement and Stability

Abstract

With the rapid development of information technology, the demands placed on recommendation systems keep rising, especially with regard to recommendation accuracy. A distinctive feature of news recommendation is its high timeliness: the popularity of a news article declines exponentially within a week. As a result, traditional recommendation methods are often not effective enough for news recommendation. To further improve accuracy, knowledge graphs have been widely applied to news recommendation, because the nodes and edges of a knowledge graph can better represent the relationships between the entities in an article; compared with traditional recommendation methods, they also better alleviate the problems of data sparsity and cold start. However, some relational entities in the triples of the knowledge graph may distort the meaning of an article or have a near-zero impact on it, and we regard both types of relational entities as untrustworthy. This paper proposes a relational entity credibility discrimination model that eliminates relational entities without credibility to improve the accuracy of news recommendation. Experimental results show the effectiveness and efficiency of the model.

Keywords

News recommendation; knowledge graph; credibility; accuracy

1. Introduction

With the rapid development of the internet, its reach is growing quickly, and platforms such as Facebook and TikTok compete for users. This competition raises the requirements placed on recommendation systems. Accurately recommending the items and articles that users like has become a major means of attracting new users and retaining existing ones. An excellent recommendation system creates more value for an enterprise.

Generally speaking, news recommendation is more demanding than commodity, tourism, and other kinds of recommendation, mainly because news is time-sensitive. Traditional recommendation methods therefore often perform suboptimally on news. Wang et al. (2018) introduced the knowledge graph into news recommendation with the deep knowledge-aware network for news recommendation (DKN), which embeds the news title into the knowledge graph through knowledge graph embedding via dynamic mapping matrix (Trans-D; Ji et al., 2015) and extremely efficient hardware keypoint detection with a compact convolutional neural network (KCNN; Febbo et al., 2018). Its disadvantage is that, if only the article title is embedded, cases of low consistency between content and title are inevitable, and these greatly affect the recommendation results. Liu et al. (2020) then proposed knowledge-aware document representation for news recommendations (KRED), which embeds the whole article into the knowledge graph through the knowledge graph attention network for recommendation (KGAT; Wang et al., 2019) and introduces a three-tier model to enhance the knowledge of the article entities before recommending them.

The previous news recommendation based on the knowledge graph mainly embeds the entities and relational entities in the news into the knowledge graph, but it is impossible to confirm whether the embedded relational entities will affect the results of the news recommendation. This paper proposes a relational entity credibility discrimination model. The model analyzes the credibility of the relational entities embedded in the knowledge graph and eliminates the relational entities whose credibility does not meet the standard, as shown in Figure 1. The main contributions of this paper are summarized as follows:

A relational entity credibility discrimination model is proposed to judge whether each relational entity in the knowledge graph has credibility.

A two-layer confirmation model is designed to further improve the accuracy of the relational entity credibility discrimination model.

Experimental results on real datasets prove the efficiency and effectiveness of our model.

Figure 1.

This is a real news article about the US–Iraq war. After it is embedded into the knowledge graph, the graph contains various entities and relational entities; the red edges are the relational entities to be deleted.

2. Definitions and Symbols

This section first introduces and defines commonly used concepts and notations.

2.1. Definitions

First, we define the basic concepts used in the rest of this article.

Definition 1
The credibility discrimination model treats two types of relational entities as untrustworthy: (a) relational entities that have a significant impact on the general idea of the article and may distort it; (b) relational entities that have too little impact on the general idea of the article and are dispensable for the article. These two types are untrusted relational entities; all other relational entities are trusted.
Definition 2
$T (h, r, t)$ represents the weight of a triple in the knowledge graph, where $h$ denotes the head entity, $r$ denotes the relational entity, and $t$ denotes the tail entity.
2.2. Symbols

Next is the definition of some symbols (Table 1).

Table 1.
Commonly Used Notations.

Notation	Description
$\oplus$	Vector concatenation
$e_{h}$	Head entity vector
$e_{r}$	Relational entity vector
$e_{t}$	Tail entity vector
$e_{i}$	Entity representation layer vector
$e_{x}$	Contextual embedding layer vector
$e_{n}$	Information distillation vector
$e_{p}$	Position entity vector
$e_{c}$	Category entity vector
$e_{f}$	Frequency entity vector
$P_{ev}$	Prediction function

3. Related Work

We divide news recommendation methods into two categories: methods based on nonneural networks and methods based on neural networks.

3.1. News Recommendation Method Based on Nonneural Network

Cantador et al. (2008) describe news content and user preferences in terms of a set of concepts in a domain ontology. They develop a semantic context-aware recommendation model that personalizes the order in which news articles are displayed to users according to their long-term interest profile, as well as further models that reorder the list of news items according to the semantic context in which the user is currently interested.

IJntema et al. (2010) focus on the benefits of using a domain ontology rather than term-based methods to recommend news items. They propose an ontology-based news recommender, Athena, which extends the existing Hermes framework. Athena uses user profiles to store the terms or concepts found in the news items that users browse. Based on this information, the framework uses a traditional term frequency–inverse document frequency (TF–IDF) method and several ontology-based methods to recommend new articles to users.

Goossen et al. (2011) observe that most traditional algorithms rely on TF–IDF, a term-based weighting method mainly used for information retrieval and text mining. They therefore propose a personalized recommendation model based on semantic mining. Adjusting TF–IDF with the semantics of a domain ontology yields concept frequency–inverse document frequency (CF–IDF), which produces better results than the original TF–IDF method. CF–IDF is built and tested in Athena, and a statistical evaluation of the TF–IDF and CF–IDF recommenders shows that using an ontology can significantly improve the performance of traditional recommenders.

Capelle et al. (2012) note that news recommendation typically combines TF–IDF weighting with cosine similarity but does not consider the actual meaning of words. They therefore propose a semantics-based news recommendation model that uses five semantic similarity (SS) measures to calculate the similarity between news items. Capelle et al. (2013) likewise argue that traditional content-based news recommendation uses a word vector space model without taking semantics into account. They propose a semantic news recommendation model that uses WordNet-based and Bing-based similarities and extends the state-of-the-art semantic-lexicon-driven SS recommendation method by additionally considering named entities. First, as in SS, the similarity between the WordNet synonym sets in unread news items and the synonym sets in read news items (stored in the user profile) is calculated. Then, the page counts returned by the Bing Web search engine for named entities are used to calculate the named entity similarity between unread and read news items.

Rao et al. (2013) focus on using background knowledge to obtain latent semantic relevance and thereby improve personalized news recommendation. They model articles and user profiles with larger real-world ontologies collected from the web, without requiring much manual annotation. They further study concept similarity and news–user matching by exploiting the structure naturally embedded in the ontology and calculate news–user similarity through the collaboratively constructed ontology structure.

Kumar and Kulkarni (2013) develop a news personalization algorithm that adapts the classic nearest neighbor algorithm and combines it with a knowledge graph that they create. Using implicit user data, such as read and unread articles and their location and distance in the graph, the algorithm ranks new articles by the user's predicted interest in their content.

Joseph and Jiang (2019) hold that content-based news recommendation systems need to recommend news articles based on the topic and content of the article rather than on user-specific information. Many news articles describe specific events and name the people, places, or objects involved. They therefore propose a graph traversal algorithm and a new weighting scheme that use these named entities for content-based cold-start news recommendation. To achieve a higher degree of user-specific relevance, their algorithm calculates the shortest distance between named entities, across news articles, on a large knowledge graph.

However, the above methods offer a low degree of personalization and limited innovation. They cannot model nonlinearity in data with nonlinear activation functions such as Softmax, ReLU, Sigmoid, and Tanh. Moreover, they rely on manual feature design, and their actual recommendation quality does not match that of recommendation models that identify features automatically. Finally, nonneural news recommendation methods are less flexible and cannot benefit from the tool support offered by frameworks such as TensorFlow.

3.2. News Recommendation Method Based On Neural Network

Wang et al. (2018) observe that news language is highly condensed and full of knowledge entities and common sense. They propose a deep knowledge-aware network for news recommendation that combines knowledge graph entity embedding with neural networks. It uses a convolutional neural network that incorporates knowledge (KCNN) to form a new embedding representation by merging the semantic representation of news with its knowledge representation. It then establishes an attention mechanism from the user's news click history to the candidate news and recommends the news with the higher scores to the user.

Chu et al. (2019) hold that click sequences reveal users' latent preferences from their interaction history and can be used to predict their future preferences. They propose a self-attention sequential knowledge-aware recommendation (Saskr) system composed of a sequence-aware model and a knowledge-aware model. The sequence-aware model uses self-attention to discover sequential patterns. The knowledge-aware model uses the knowledge graph as side information to mine deep links between news items, thereby improving the diversity and scalability of recommendations.

Liu et al. (2020) argue that the key information carried by important entities helps to understand content more directly. They propose a knowledge-enhanced news recommendation model that aggregates information from an entity's neighborhood in the knowledge graph, thereby enriching the entity embedding. A context embedding layer then annotates the dynamic context of each entity, such as frequency, category, and position. Finally, an information distillation layer aggregates the entity embeddings under the guidance of the original document representation and converts the document vector into a new document vector. The model is optimized under a multitask framework so that different news recommendation applications are unified and useful information is shared between tasks.

To consider both knowledge and content factors, Wang et al. (2020) propose the knowledge and content aware news recommendation network (KCNR). KCNR represents users and news in the form of knowledge and content and then predicts the weight of user preferences on knowledge and content through a user preference prediction mechanism. In addition, based on the weight of user preferences on knowledge, it expands user preferences over the entities in the knowledge graph.

Raza and Ding (2021) propose a deep neural network that jointly learns news and user representations in a unified framework. It learns the news representation (features) from the headline, snippet (body), and taxonomy (category and subcategory) of the news. An attention mechanism learns a reader's long-term interests from the complete click history, short-term interests from recent clicks via long short-term memory networks, and diverse interests.

Although these models are effective, they do not consider whether their initial inputs are credible. For example, in nonneural recommendation methods, it is unclear whether some of the reference users or reference items used for similarity measurement need to be removed; in neural recommendation methods, it is unclear whether some of the relational entity vectors between entities need to be removed and whether recommendation accuracy improves after removal.

The neural methods above nevertheless differ from nonneural recommendation methods. They innovate in the input of the recommendation model, the embedding mode of the knowledge graph, and the way knowledge-level and text-level components are aggregated, and they achieve a high degree of personalization. Neural network-based news recommendation has significant advantages in two respects. First, it models users and items by learning, so it can describe user interests and news article attributes more accurately. It is more effective for features that are difficult to capture with traditional methods, such as news article features and news type features; it deepens and broadens the understanding of users and items; it improves on the interpretability of nonneural news recommendation methods; and it has strong fitting ability. Second, neural networks can capture arbitrary relationships between users and news, such as nonlinear, dynamic, and spatiotemporal relationships, and can learn more accurate user–article interaction functions.

In summary, the neural network-based news recommendation method has several advantages:

Nonlinear transformation. Unlike linear models, neural networks can model nonlinearity in data with nonlinear activation functions such as Softmax, ReLU, Sigmoid, and Tanh. This property makes it possible to capture complex user interaction patterns. Conventional methods such as FM and sparse linear models are essentially linear models.

Representation learning. Using neural networks can effectively learn potential explanatory factors and useful feature representations from model inputs. Generally, a large amount of descriptive information about news articles and users can be obtained in actual news recommendation applications. In this way, we can use this information to promote our understanding of news articles and users, thereby improving the accuracy of news recommendations. Therefore, representation learning in news recommendation models based on neural networks is a natural choice.

Flexibility. Neural network technology is highly flexible, especially given the emergence of many popular deep learning frameworks such as TensorFlow, Keras, Caffe, MXNet, and DeepLearning4j. These tools are available for neural network-based news recommendation, and most of them are developed in a modular manner with active community and professional support. Good modularity gives neural network-based news recommendation models better tool support in the future. For example, it is easy to combine different neural structures into a powerful hybrid model or to replace one module with another. We can therefore easily build hybrid and composite recommendation models that capture different characteristics and factors simultaneously. Next, we introduce the three key layers in RTD (the entity representation layer, the contextual embedding layer, and the information distillation layer) and the relational entity credibility discrimination model.

Figure 2.

Overview of the model proposed in this paper. $K D V$ is the text vector produced by the model; an activation function combines it with the user vector generated by user embedding to predict the click probability.

4. Methodology

We propose a recommendation model with knowledge enhancement and stability for news recommendation. As shown in Figure 2, $D V$ is the initial text vector generated by BERT (Devlin et al., 2019), and the article is embedded into the knowledge graph through KGAT. The stability model calculates the credibility of each pair of entities and its relational entity and then regenerates the knowledge graph, after which the final text vector is obtained through the information background layer. Next, we introduce the stability model and the information background layer model.

Figure 3.

Construction of triplet of knowledge graph of news articles.

4.1. The Relation Entity Credibility Discrimination Model

Following the construction idea of KRED, we use Trans-E (Bordes et al., 2013) to embed the entities and relational entities in the text into the knowledge graph. A triple $(h, r, t)$ represents entity–relation–entity, where $h$ is the head entity, $t$ is the tail entity, and $r$ is the relational entity between $h$ and $t$ (Figure 3). Using the KGAT graph attention mechanism, the output weight of the triple $(h, r, t)$ is calculated through a two-layer activation function (Liu et al., 2020):

\begin{aligned} T^{\prime\prime} (h, r, t) & = B \, \mathrm{ReLU} (C (e_{h} \oplus e_{r} \oplus e_{t}) + D) + E \end{aligned}

(1)

\begin{aligned} T (h, r, t) & = \frac{\exp (T^{\prime\prime} (h, r, t))}{\sum_{(h, r^{\prime}, t^{\prime}) \in I} \exp (T^{\prime\prime} (h, r^{\prime}, t^{\prime}))} \end{aligned}

(2)

Figure 4.

Construction of Relationship Entity Credibility Identification Model (RTD).

Here $I$ represents the set of triples, and $B$ , $C$ , $D$ , and $E$ are parameters obtained in training. After the construction is completed, some relational entities $e_{r}$ may have insufficient credibility: they either have almost zero impact on the text semantics or may distort the general idea of the article. We identify and eliminate such relational entities. To make the exclusion more accurate, we propose a two-layer confirmation model. A triple whose relational entity meets one of two conditions is identified as having insufficient credibility (Figure 4):

b_{p} = ((P_{0} > S) \land (P_{1} < V))

(3)

where $b_{p}$ takes only the value 0 or 1: $b_{p} = 0$ means that the triple $(h, r, t)$ has no credibility, and $b_{p} = 1$ means that the triple $(h, r, t)$ has credibility. $P_{0}$ and $P_{1}$ represent the two judgment conditions, and $S$ and $V$ are parameters obtained in training. The first condition of the two-layer confirmation model, $P_{0}$ , determines whether the relational entity in the triple has too great an impact on the meaning of the text, which would lead to misinterpretation of the text. It is calculated as follows:

P_{0} (T (h, r, t)) = \frac{T (h, r, t)}{\overline{T (h, r, t)} + T (h, r, t)}

(4)

where $\overline{T (h, r, t)}$ represents the average weight $T (h, r, t)$ in the news article. The larger the value of $P_{0}$ , the greater the influence of the triple $(h, r, t)$ on the article. Only triples whose weights $T (h, r, t)$ are greater than $\overline{T (h, r, t)}$ are substituted into this calculation. The second condition of the two-layer confirmation model, $P_{1}$ , determines whether the influence of the relational entity in the triple $(h, r, t)$ on the news article is small enough to be ignored. It is calculated as follows:

P_{1} (T (h, r, t)) = \left( \frac{(\overline{T (h, r, t)} - T (h, r, t))^{d}}{(\max - \min)^{k}} \right)^{y}

(5)

where the smaller the value of $P_{1}$ , the smaller the influence of the triple $(h, r, t)$ on the article. Only triples whose weights $T (h, r, t)$ are less than $\overline{T (h, r, t)}$ are substituted into this calculation. $\max$ and $\min$ are the largest and smallest triple weights, $d$ and $k$ are parameters obtained in training, and $y$ is as follows:

y = \frac{\overline{T (h, r, t)} - \min}{\max - \overline{T (h, r, t)}}

(6)

Through the above two-layer evaluation model, we mark the relational entity vector $e_{r}$ of every triple $(h, r, t)$ that does not meet the requirements as having no credibility, and then recalculate the weight $T (h, r, t)$ of the new triple through the two-layer activation function of formulas (1) and (2). When the relational entity vector $e_{r}$ in formula (1) is eliminated, it is replaced with an empty vector, thereby reducing the weight $T (h, r, t)$ of the triple.
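The two-layer check of formulas (4) to (6) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `credibility_flags`, the toy weights, and the values chosen for `S`, `V`, `d`, and `k` are all assumptions. The sketch follows the prose reading that a triple is untrusted when either condition fires, with $P_{0}$ evaluated only above the mean weight and $P_{1}$ only below it.

```python
def credibility_flags(weights, S, V, d, k):
    """Return b_p for each triple weight: 1 = trusted, 0 = untrusted.

    weights: attention weights T(h, r, t) of the triples in one article.
    S, V, d, k: illustrative stand-ins for the trained parameters.
    """
    mean = sum(weights) / len(weights)
    w_max, w_min = max(weights), min(weights)
    if w_max == w_min:
        # All weights are equal, so no triple stands out in either direction.
        return [1] * len(weights)
    # Formula (6): exponent y from the position of the mean within [min, max].
    y = (mean - w_min) / (w_max - mean)
    flags = []
    for w in weights:
        if w > mean:
            p0 = w / (mean + w)  # formula (4), only for weights above the mean
            flags.append(0 if p0 > S else 1)
        elif w < mean:
            # Formula (5), only for weights below the mean.
            p1 = ((mean - w) ** d / (w_max - w_min) ** k) ** y
            flags.append(0 if p1 < V else 1)
        else:
            flags.append(1)  # a weight equal to the mean triggers neither check
    return flags


# Hypothetical weights and thresholds, for illustration only.
weights = [0.1, 0.2, 0.3, 0.4]
flags = credibility_flags(weights, S=0.6, V=0.2, d=1, k=1)
kept = [w for w, b in zip(weights, flags) if b == 1]
```

The sketch evaluates formula (5) literally; in the paper the thresholds and exponents are obtained in training, so the values above carry no empirical meaning.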

4.2. Entity Representation Layer

Following the idea of KGAT, we generate an entity vector from the triples, represented as follows (Liu et al., 2020):

e_{i} = \mathrm{ReLU} (A (e_{h} \oplus \sum_{(h, r, t) \in I} T (h, r, t) \oplus e_{t}))

(7)

where $A$ is a parameter obtained in training, $e_{i}$ is the entity vector of the triple, and $T (h, r, t)$ is the weight reconstructed by the relational entity credibility discrimination model. Although the relational entity $e_{r}$ is eliminated, this does not affect the construction of $e_{i}$ .
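The weights $T (h, r, t)$ used in this layer come from formulas (1) and (2). The following sketch shows one way to compute them and to recompute them after a relational entity vector has been replaced with a zero vector. It assumes that $B$ acts as a weight vector, $C$ as a matrix, $D$ as a bias vector, and $E$ as a scalar; the function names and the toy parameter values are hypothetical.

```python
import math


def raw_score(e_h, e_r, e_t, B, C, D, E):
    """Formula (1): B * ReLU(C (e_h ⊕ e_r ⊕ e_t) + D) + E."""
    z = e_h + e_r + e_t  # list concatenation plays the role of ⊕
    hidden = [max(0.0, sum(c * x for c, x in zip(row, z)) + b)
              for row, b in zip(C, D)]
    return sum(b * h for b, h in zip(B, hidden)) + E


def triple_weights(triples, B, C, D, E):
    """Formula (2): softmax of the raw scores over the triples in I."""
    scores = [raw_score(h, r, t, B, C, D, E) for h, r, t in triples]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def drop_relation(triple):
    """Replace the relational entity vector with a zero vector."""
    h, r, t = triple
    return (h, [0.0] * len(r), t)


# Toy parameters (assumptions): one hidden unit that sums the entries of e_r.
B, C, D, E = [1.0], [[0.0, 0.0, 1.0, 1.0, 0.0, 0.0]], [0.0], 0.0
triples = [([1.0, 0.0], [1.0, 1.0], [0.0, 1.0]),
           ([1.0, 0.0], [0.0, 0.0], [1.0, 0.0])]

before = triple_weights(triples, B, C, D, E)
after = triple_weights([drop_relation(triples[0]), triples[1]], B, C, D, E)
```

With these toy parameters the weight of the first triple falls once its relational entity vector is zeroed; whether this holds in general depends on the trained parameters.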

4.3. Contextual Embedding Layer

The weight of an entity in a news article should also reflect other relevant factors, such as the position and frequency of the entity. The entity vector $e_{i}$ output by the entity representation layer is the input of the context embedding layer, where it is combined with position, category, and frequency information as follows (Liu et al., 2020):

e_{x} = e_{i} + e_{p} + e_{c} + e_{f}

(8)

$e_{x}$ represents the vector output by the context embedding layer. $e_{p}$ represents the position where the entity appears, encoded as 1 or 2: 1 if the entity appears in the body text and 2 if it appears in the title. $e_{f}$ represents the number of occurrences of the entity, with a range of 0–10; 10 or more occurrences are represented as 10. $e_{c}$ represents the category of the entity; for example, the category of President Biden of the United States is people, and the category of Lamborghini is cars. Each entity is assigned a category, which is embedded as a vector to make the information more comprehensive (Figure 5).

Figure 5.

Embedding other entity information in context embedding layer.
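Formula (8) can be illustrated with a small lookup-and-add sketch. The embedding tables below are hypothetical stand-ins for learned parameters, and the function name `context_embedding` is an assumption; only the encoding rules (position 1 or 2, frequency clipped at 10, one category per entity) come from the text.

```python
def context_embedding(e_i, in_title, occurrences, category,
                      position_table, frequency_table, category_table):
    """Formula (8): e_x = e_i + e_p + e_c + e_f (element-wise sum)."""
    position_id = 2 if in_title else 1   # 1 = body text, 2 = title
    frequency_id = min(occurrences, 10)  # 10 or more occurrences map to 10
    e_p = position_table[position_id]
    e_f = frequency_table[frequency_id]
    e_c = category_table[category]
    return [i + p + c + f for i, p, c, f in zip(e_i, e_p, e_c, e_f)]


# Hypothetical two-dimensional embedding tables, for illustration only.
position_table = {1: [1.0, 0.0], 2: [2.0, 0.0]}
frequency_table = {n: [0.0, float(n)] for n in range(11)}
category_table = {"people": [0.5, 0.0], "cars": [0.0, 0.5]}

e_x = context_embedding([1.0, 1.0], in_title=True, occurrences=14,
                        category="people",
                        position_table=position_table,
                        frequency_table=frequency_table,
                        category_table=category_table)
```

In a trained model the three tables would be learned embedding matrices with the same dimension as $e_{i}$ .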

4.4. Information Distillation Layer

The key entities to be expressed differ from article to article. To sharpen the focus of the article vector, we take the original text vector as input and perform knowledge enhancement through this layer (Figure 6). The representation is as follows (Liu et al., 2020):

\begin{aligned} T^{\prime\prime} (h, v) & = B \, \mathrm{ReLU} (C (e_{x} \oplus V d) + D) + E \end{aligned}

(9)

\begin{aligned} T (h, v) & = \frac{\exp (T^{\prime\prime} (h, v))}{\sum_{t \in \varepsilon_{V}} \exp (T^{\prime\prime} (t, v))} \end{aligned}

(10)

\begin{aligned} e_{n} & = \sum_{h \in \varepsilon_{V}} T (h, v) e_{x} \end{aligned}

(11)

Figure 6.

Information distillation layer knowledge enhancement process.

$T (h, v)$ is constructed in the same way as in formula (1). $B$ , $C$ , $D$ , and $E$ are parameters obtained in training, $\varepsilon_{V}$ is the set of entities of $v$ , and $e_{n}$ is the entity vector obtained through this layer. Next, the knowledge-enhanced vector $e_{n}$ is passed to the prediction function, which is as follows:

P_{ev} = \sigma (e_{n} v)

(12)

where $e_{n}$ is the vector finally obtained by the model in this paper, and $v$ is the user vector obtained by the BERT model. The following is the loss function:

\begin{aligned} Loss & = - \log \prod_{(i, j) \in H} S (i | j) + \lambda \| \Theta \| \end{aligned}

(13)

\begin{aligned} S (i | j) & = \frac{\exp (\gamma P_{e_{i} v_{j}})}{\sum_{i^{\prime} \in g} \exp (\gamma P_{e_{i^{\prime}} v_{j}})} \end{aligned}

(14)

where $S (i | j)$ is constructed from the positive pair of user $j$ and text item $i$ together with a negative set of 10 randomly sampled text items $i^{\prime}$ for user $j$ ; $g$ is the set of these 11 items, $\gamma$ is the smoothing factor of the softmax function, $H$ is the interaction set of user $j$ and item $i$ , $\Theta$ is the parameter set, and $\lambda$ is the regularization coefficient.
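Formulas (12) to (14) can be read as a sigmoid click score followed by a softmax over one positive and ten negative items. The sketch below is an illustration under stated assumptions: $e_{n} v$ is taken to be a dot product, $\| \Theta \|$ is taken to be the L2 norm, and all function names are hypothetical.

```python
import math


def click_probability(e_n, v):
    """Formula (12): sigmoid of the dot product of document and user vectors."""
    dot = sum(a * b for a, b in zip(e_n, v))
    return 1.0 / (1.0 + math.exp(-dot))


def softmax_likelihood(pos_prob, neg_probs, gamma):
    """Formula (14): share of the positive item among the items in g."""
    logits = [gamma * pos_prob] + [gamma * p for p in neg_probs]
    top = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in logits]
    return exps[0] / sum(exps)


def loss(samples, gamma, lam, params):
    """Formula (13): negative log-likelihood plus a norm penalty on params.

    samples: list of (positive score, list of negative scores) pairs.
    params: flat list standing in for the parameter set Theta (assumption).
    """
    nll = -sum(math.log(softmax_likelihood(pos, negs, gamma))
               for pos, negs in samples)
    norm = math.sqrt(sum(x * x for x in params))  # L2 norm, assumed here
    return nll + lam * norm
```

As a sanity check, when the positive item scores the same as all ten negatives, `softmax_likelihood` returns 1/11 and the unregularized loss for that sample equals `math.log(11)`.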

5. Experiments

5.1. Dataset and Settings

We use a real-world dataset and knowledge graph from Microsoft News. The news was collected from January 15, 2019 to January 28, 2019. For the personalized recommendation task, the first week is used for the training and validation sets, and the second week is used for the test set. To build user profiles, we collect logs for another two weeks before the training period and aggregate each user's behavior for user modeling. Users who clicked on fewer than five articles during profile creation are filtered out. After filtering, the instance set contains 665,034 users, 24,542 news articles, and 1,590,092 interactions. The average number of words in a document is 701.

5.2. Algorithm and Parameter Settings

In models with machine learning or deep learning, parameter settings are particularly important because in different experimental environments and datasets, it may be necessary to set parameters to achieve the highest accuracy of the model. In this study, we specifically introduce the setting of four parameters: learning rate and regularization coefficient $λ$ , neural network dimensions, and entity embedding dimensions.

5.3. Evaluation Indicators

AUC is the area under the curve. When comparing different classification models, you can draw the receiver operating characteristic curve of each model and compare the AUC as an indicator of the advantages and disadvantages of the model.

NDCG can be decomposed into four components: normalization (N), discount (D), cumulation (C), and gain (G). Here $x$ denotes a query, $n$ indicates that the NDCG of this query is calculated from the first $n$ answers returned, and $i$ denotes the position of an answer. The size of G is independent of $i$ and depends only on the quality of the answer. D is a discount applied to the gain: the higher an answer is ranked, the more it should contribute to the score, and because G is independent of position, D controls the size of the contribution. D is therefore a quantity that increases with the answer position $i$ . C is the sum of G/D over positions 1 to $n$ , which gives the score of the query. N normalizes this score by the ideal score, that is, the highest score that can be achieved.
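For reference, one common way of writing NDCG in terms of these components is given below. The logarithmic discount $D(i) = \log_{2}(i + 1)$ is the standard choice and is an assumption here, since the text does not state which discount is used; $\mathrm{IDCG}$ denotes the value of $\mathrm{DCG}$ for the ideal ordering of the answers.

```latex
\mathrm{DCG}@n(x) = \sum_{i=1}^{n} \frac{G(i)}{D(i)}, \qquad
D(i) = \log_{2}(i + 1), \qquad
\mathrm{NDCG}@n(x) = \frac{\mathrm{DCG}@n(x)}{\mathrm{IDCG}@n(x)}
```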

The $F_{1}$ score, also known as the balanced $F$ -score, is defined as the harmonic mean of precision and recall, with a maximum value of 1 and a minimum value of 0. A larger value of $F_{1}$ means better model performance.
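The three indicators can be computed with a few lines of plain Python. The sketch below is illustrative only: it assumes binary click labels, a pairwise definition of AUC, and a base-2 logarithmic discount for NDCG, and the function names are hypothetical.

```python
import math


def auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dcg(relevances, k):
    """Sum of gain divided by the discount log2(position + 1), top k only."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(labels, scores, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: -pair[0])]
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(ranked, k) / ideal if ideal > 0 else 0.0


def f1(y_true, y_pred):
    """Harmonic mean of precision and recall for binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, with labels `[1, 0, 1, 0]` and scores `[0.9, 0.8, 0.7, 0.1]`, three of the four positive-negative pairs are ordered correctly, so `auc` returns 0.75.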

5.4. Benchmark Model Comparison and Experiments

FM (Rendle, 2010) is a factorization-based recommendation method that captures the interaction characteristics between users and items. Compared with the model in this paper, FM focuses on linear interactions, while the model in this paper uses the knowledge graph structure and information about entities and relationships to capture more complex relationships.

DKN (Wang et al., 2018) is a recommendation model that combines knowledge graphs with convolutional neural networks. It uses the entity and relationship information in the knowledge graph to enhance article representation. Compared with the model in this paper, DKN focuses on the entity information of the knowledge graph, but does not explicitly consider the paths between entities.

NAML (Wu et al., 2019) uses multiview learning and attention mechanisms to extract multiple features of news content. Compared with the model in this paper, NAML focuses on extracting features from articles, while the model in this paper uses entity and relationship information from the knowledge graph to construct recommendations.

STCKA (Chen et al., 2019) uses stacked co-attentive mechanisms to capture interaction features between users and items and uses knowledge graphs to provide additional contextual information. In contrast to the model in this paper, STCKA focuses on the entity information of the knowledge graph, but does not explicitly consider the paths between entities.

ERNIE (Zhang et al., 2019) is a pretraining-based language model that uses knowledge graphs to enhance text representation. Compared with the model in this paper, ERNIE focuses on the improvement of text representation, while the model in this paper combines entity and relationship information from the knowledge graph to construct recommendations.

KRED (Liu et al., 2020) combines the knowledge graph with user behavior data to improve recommendation accuracy and diversity. Unlike the model in this paper, it does not consider whether relational entities may distort the general meaning of the article or have no influence on it, so it is less comprehensive in its handling of such details.

The relational entity credibility recognition model (RTD) proposed in this paper outperforms the other models in recommendation effectiveness on one week of news data once the relational entities that the model deems untrustworthy are removed, which demonstrates both its feasibility and its effectiveness with respect to news timeliness. The benchmark models are comparable to the model in this paper in that they all try to use knowledge graphs or other external information to improve the accuracy and relevance of recommendations. They differ, however, in implementation and focus, which makes a comparison of their performance and features meaningful.

Table 2.
Parameter Settings.

Parameter name	Value
Learning rate	0.001
Regularization coefficient $λ$	$1 \times 10^{- 5}$
Neural network dimensions	128
Entity embedding dimensions	90

5.5. Ablation Study

Next, we examine whether the experimental results are affected when one of the two judgment conditions, $P_{0}$ or $P_{1}$, is removed from the two-layer confirmation model. Table 4 shows that removing either judgment condition reduces performance.

When the judgment condition $P_{0}$ is removed from the relational entity credibility identification model, the two performance indicators AUC and NDCG@10 decrease by 0.44% and 0.55%, respectively. When the judgment condition $P_{1}$ is removed, AUC and NDCG@10 decrease by 0.28% and 0.41%, respectively. This indicates that eliminating both types of relational entities is effective, and that the proposed credibility recognition model for relational entities is feasible.
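The ablation percentages are relative decreases with respect to the full RTD model. The short sketch below recomputes them from the AUC and NDCG@10 values in Table 4; the helper name `relative_drop` is an illustrative assumption. The recomputed values agree with the quoted figures to within 0.01 percentage points, the residual difference being a matter of rounding in the second decimal place.

```python
def relative_drop(full, ablated):
    """Relative decrease of `ablated` with respect to `full`, in percent."""
    return (full - ablated) / full * 100.0


# (AUC, NDCG@10) values copied from Table 4.
full_model = {"AUC": 0.6912, "NDCG@10": 0.2684}
ablations = {
    "w/o P0": {"AUC": 0.6881, "NDCG@10": 0.2669},
    "w/o P1": {"AUC": 0.6892, "NDCG@10": 0.2673},
}

# Relative drop of every metric for every ablated variant.
drops = {
    name: {m: relative_drop(full_model[m], scores[m]) for m in full_model}
    for name, scores in ablations.items()
}

for name, d in drops.items():
    print(name, {m: f"{val:.2f}%" for m, val in d.items()})
```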

Figure 7.

Improvement rate of RTD in AUC index compared with other benchmark models. Note. AUC = area under the curve.

Figure 8.

Improvement rate of RTD in NDCG@10 index compared with other benchmark models.

Figures 7 and 8 show that the proposed RTD model achieves its largest improvement, in both AUC and NDCG@10, over the benchmark model FM. The main reason is that FM, unlike the deep learning models, takes manually engineered features as input, and in practice manual feature engineering is less workable than the automatic feature extraction of deep learning. The second largest improvement is over DKN. Although DKN, STCKA, and ERNIE are three different types of knowledge-aware document understanding models, DKN uses only the title of the article, so its recommendation performance degrades when the title does not match the content of the article (Figure 9).

Comparing the experimental results of the proposed RTD model with those of the benchmark models shows that RTD outperforms all six classic benchmark models, with the largest gains over FM and DKN. The main reason is that the information used by FM is not comprehensive enough, while DKN uses a knowledge graph but only the news headlines, so the news content is not fully exploited. RTD follows the idea of KRED and makes full use of the entire news text, which makes the information used by the model more comprehensive. Compared with NAML, RTD, following KRED, takes into account the frequency, category, and position of entities in the news text, which gives it an advantage in handling entity details. STCKA, ERNIE, and KRED do not consider the credibility of entities: during training, all entity information in the news text enters the computation, so the resulting news article vectors are less focused.

As shown in Table 4, the credibility identification model of RTD contains two judgment conditions, $P_{0}$ and $P_{1}$, which are used to eliminate the two types of relational entities with low credibility. Removing either condition lowers the results to a different degree: removing $P_{0}$ decreases AUC and NDCG@10 by 0.44% and 0.55%, respectively, and removing $P_{1}$ decreases them by 0.28% and 0.41%, respectively.

Finally, on the Microsoft News 3 dataset, RTD improves the overall AUC over the benchmark models FM, DKN, NAML, STCKA, ERNIE, and KRED by 5.3%, 3.6%, 2.3%, 2.8%, 1.6%, and 0.1%, respectively (relative to the Overall AUC values in Table 3). In NDCG@10, the improvements over the same models are 9.1%, 7.0%, 4.6%, 4.2%, 3.2%, and 0.3%, respectively. In the performance index $F_{1}$, RTD improves on the benchmark models by 0.2% to 0.68%. These results indicate that the credibility concept proposed in this article improves performance in news recommendation. The results of the proposed RTD model are reported in Tables 3 and 4.
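The improvement rates over each benchmark model can be recomputed directly from the Overall columns of Table 3 as relative gains. The sketch below does so; the helper name `improvement` is an illustrative assumption, and values recomputed this way may differ from figures quoted in the text where those were rounded or transcribed differently.

```python
# Overall (AUC, NDCG@10) values copied from Table 3.
overall = {
    "FM": (0.6565, 0.2460),
    "DKN": (0.6674, 0.2509),
    "NAML": (0.6755, 0.2565),
    "STCKA": (0.6727, 0.2575),
    "ERNIE": (0.6804, 0.2600),
    "KRED": (0.6904, 0.2675),
    "RTD": (0.6912, 0.2684),
}


def improvement(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


rtd_auc, rtd_ndcg = overall["RTD"]
rates = {
    name: (improvement(rtd_auc, auc), improvement(rtd_ndcg, ndcg))
    for name, (auc, ndcg) in overall.items()
    if name != "RTD"
}

for name, (auc_gain, ndcg_gain) in rates.items():
    print(f"{name:6s} AUC +{auc_gain:.1f}%  NDCG@10 +{ndcg_gain:.1f}%")
```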

Figure 9.

F1@K in top-K recommendation for MicrosoftNews3.

Table 3.

Comparison Results With Other Baseline Methods.

	AUC							NDCG@10
	Day 1	Day 2	Day 3	Day 4	Day 5	Day 6	Day 7	Overall	Overall
FM	0.6556	0.6600	0.6507	0.6587	0.6571	0.6395	0.6601	0.6565	0.2460
DKN	0.6645	0.6706	0.6685	0.6722	0.6621	0.6585	0.6655	0.6674	0.2509
NAML	0.6733	0.6802	0.6725	0.6779	0.6728	0.6699	0.6798	0.6755	0.2565
STCKA	0.6711	0.6807	0.6658	0.6801	0.6717	0.6612	0.6712	0.6727	0.2575
ERNIE	0.6765	0.6834	0.6761	0.6859	0.6719	0.6730	0.6828	0.6804	0.2600
KRED	0.6872	0.6921	0.6910	0.6977	0.6860	0.6759	0.6902	0.6904	0.2675
RTD	0.6880	0.6923	0.6918	0.6983	0.6869	0.6766	0.6910	0.6912	0.2684

Note. AUC = area under the curve.

Table 4.

Deleting Any Criteria Will Degrade Performance.

Model	AUC	NDCG@10
RTD	0.6912	0.2684
w/o $P_{0}$	0.6881 $↓$ 0.44%	0.2669 $↓$ 0.55%
w/o $P_{1}$	0.6892 $↓$ 0.28%	0.2673 $↓$ 0.41%

Note. AUC = area under the curve.

5.6. Case Study

We provide a case study on how to approximate the weight of relational entities in an article. As shown in Figure 1, the core view expressed in the article is that it was wrong for the United States to launch the war in Iraq. However, because militants are quoted many times in the article as saying that “the war in Iraq is correct,” the statement that the war in Iraq was correct is likely to receive excessive weight during training, so that the general meaning of the article is misinterpreted. To address this problem, one option is to extract the statements with excessive weight and compare them with other articles on the same event, checking whether the weight of each statement is similar to that of the related statements in those articles. If it is not, the weight of the statement in the article is reduced.
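The reweighting idea in this case study can be sketched as follows. This is only an illustration under loud assumptions: the paper does not specify how similarity is measured or how a weight is reduced, so the function name `adjust_statement_weight`, the `tolerance` threshold, and the rule of falling back to the peer average are all hypothetical.

```python
from statistics import mean


def adjust_statement_weight(weight, peer_weights, tolerance=0.2):
    """Damp a statement weight that is excessive compared with its peers.

    weight       : weight of the statement in the current article
    peer_weights : weights of the related statement in other articles
                   covering the same event
    tolerance    : hypothetical threshold on the allowed excess
    """
    if not peer_weights:
        return weight  # nothing to compare against
    reference = mean(peer_weights)
    if weight - reference > tolerance:
        # Assumed reduction rule: pull the weight back to the peer average.
        return reference
    return weight


# An over-weighted statement is reduced; a typical one is left alone.
adjusted = adjust_statement_weight(0.9, [0.3, 0.4, 0.35])
unchanged = adjust_statement_weight(0.4, [0.3, 0.4, 0.35])
print(adjusted, unchanged)
```

Only weights that exceed the peer average are changed, which matches the concern in the case study that a frequently repeated quotation receives too much weight rather than too little.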

6. Conclusions

We propose a stability model to evaluate the credibility of each entity and its relational entities. Under our notion of credibility, entity vectors and relational entities whose impact on the semantics of the article is almost zero, or which may distort the general meaning of the article, are eliminated. To this end, we put forward a two-layer confirmation model, that is, a double evaluation standard for credibility: each entity and its relational entities must meet two conditions at the same time to be judged ineligible. Our research is mainly applied to news recommendation. Extensive experiments show that our model outperforms the baseline models. In future work, we will further study the problems raised in the case study above.

Footnotes

Funding

This work was supported by the Xihua University Science and Technology Innovation Competition Project for Postgraduate Students (Grant No. YK20240148).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

Bordes, Usunier, García-Durán, Weston, & Yakhnenko (2013). Translating embeddings for modeling multi-relational data. In C. J. C. Burges, L. Bottou, Z. Ghahramani, and K. Q. Weinberger (Eds.), Advances in neural information processing systems 26: 27th annual conference on neural information processing systems 2013. ACM NeurIPS Proceedings of a meeting held 5–8 December 2013, Lake Tahoe, Nevada, United States (pp. 2787–2795). https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html

Cantador, Bellogín, & Castells (2008). Ontology-based personalised and context-aware recommendations of news items. In 2008 IEEE/WIC/ACM international conference on web intelligence, WI 2008, 9–12 December 2008, Sydney, NSW, Australia, Main conference proceedings (pp. 562–565). IEEE Computer Society. https://doi.org/10.1109/WIIAT.2008.204

Capelle, Frasincar, Moerland, & Hogenboom (2012). Semantics-based news recommendation. In D. D. Burdescu, R. Akerkar, and C. Badica (Eds.), 2nd international conference on web intelligence, mining and semantics, WIMS’12, Craiova, Romania, 6–8 June 2012 (pp. 27:1–27:9). ACM. https://doi.org/10.1145/2254129.2254163

Capelle, Hogenboom, & Frasincar (2013). Semantic news recommendation using WordNet and Bing similarities. In S. Y. Shin, and J. C. Maldonado (Eds.), Proceedings of the 28th annual ACM symposium on applied computing, SAC ’13, Coimbra, Portugal, 18–22 March 2013 (pp. 296–302). ACM. https://doi.org/10.1145/2480362.2480426

Chen, Hu, Liu, Xiao, & Jiang (2019). Deep short text classification with knowledge powered attention. In The thirty-third AAAI conference on artificial intelligence, AAAI 2019, the thirty-first innovative applications of artificial intelligence conference, IAAI 2019, the ninth AAAI symposium on educational advances in artificial intelligence, EAAI 2019, Honolulu, Hawaii, USA, 27 January–1 February 2019 (pp. 6252–6259). AAAI Press. https://doi.org/10.1609/aaai.v33i01.33016252

Chu, Liu, Sun, & Zhou (2019). Next news recommendation via knowledge-aware sequential model. In M. Sun, X. Huang, H. Ji, Z. Liu, and Y. Liu (Eds.), Chinese computational linguistics—18th China national conference, CCL 2019, Kunming, China, 18–20 October 2019. Proceedings, Vol. 11856 of lecture notes in computer science (pp. 221–232). Springer. https://doi.org/10.1007/978-3-030-32381-3_18

Devlin, Chang, Lee, & Toutanova (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, and T. Solorio (Eds.), Proceedings of the 2019 conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019. Volume 1 (Long and Short Papers) (pp. 4171–4186). Association for Computational Linguistics. https://doi.org/10.18653/v1/n19-1423

Febbo, P. D., Mutto, C. D., Tieu, & Mattoccia (2018). KCNN: Extremely-efficient hardware keypoint detection with a compact convolutional neural network. In 2018 IEEE conference on computer vision and pattern recognition workshops, CVPR workshops 2018, Salt Lake City, UT, USA, 18–22 June 2018 (pp. 682–690). Computer Vision Foundation/IEEE Computer Society. https://doi.org/10.1109/CVPRW.2018.00111. http://openaccess.thecvf.com/content_cvpr_2018_workshops/w12/html/Di_Febbo_KCNN_Extremely-Efficient_Hardware_CVPR_2018_paper.html

Goossen, IJntema, W., Frasincar, Hogenboom, & Kaymak (2011). News personalization using the CF-IDF semantic recommender. In R. Akerkar (Ed.), Proceedings of the international conference on web intelligence, mining and semantics, WIMS 2011, Sogndal, Norway, 25–27 May 2011 (p. 10). ACM. https://doi.org/10.1145/1988688.1988701

Ji, He, Xu, Liu, & Zhao (2015). Knowledge graph embedding via dynamic mapping matrix. In Proceedings of the 53rd annual meeting of the Association for Computational Linguistics and the 7th international joint conference on natural language processing of the Asian Federation of Natural Language Processing, ACL 2015, 26–31 July 2015, Beijing, China, Volume 1: Long papers (pp. 687–696). The Association for Computer Linguistics. https://doi.org/10.3115/v1/p15-1067

IJntema, W., Goossen, Frasincar, & Hogenboom (2010). Ontology-based news recommendation. In F. Daniel, L. M. L. Delcambre, F. Fotouhi, I. Garrigós, G. Guerrini, J. Mazón, M. Mesiti, S. Müller, J. Trujillo, T. M. Truta, B. Volz, E. Waller, L. Xiong, and E. Zimányi (Eds.), Proceedings of the 2010 EDBT/ICDT workshops, Lausanne, Switzerland, 22–26 March 2010, ACM international conference proceeding series (Article No. 16, pp. 1–6). ACM. https://doi.org/10.1145/1754239.1754257

Joseph & Jiang (2019). Content based news recommendation via shortest entity distance over knowledge graphs. In S. Amer-Yahia, M. Mahdian, A. Goel, G. Houben, K. Lerman, J. J. McAuley, R. Baeza-Yates, and L. Zia (Eds.), Companion of the 2019 world wide web conference, WWW 2019, San Francisco, CA, USA, 13–17 May 2019 (pp. 690–699). ACM. https://doi.org/10.1145/3308560.3317703

Kumar & Kulkarni (2013). Graph based techniques for user personalization of news streams. In R. K. Shyamasundar, L. Shastri, D. Janakiram, and S. Padmanabhuni (Eds.), Proceedings of the 6th ACM India computing convention, COMPUTE 2013, Vellore, Tamil Nadu, India, 22–24 August 2013 (pp. 12:1–12:7). ACM. https://doi.org/10.1145/2522548.2523129

Liu, Lian, Wang, Qiao, Chen, Sun, & Xie (2020). KRED: Knowledge-aware document representation for news recommendations. In R. L. T. Santos, L. B. Marinho, E. M. Daly, L. Chen, K. Falk, N. Koenigstein, and E. S. de Moura (Eds.), RecSys 2020: Fourteenth ACM conference on recommender systems, virtual event, Brazil, 22–26 September 2020 (pp. 200–209). ACM. https://doi.org/10.1145/3383313.3412237

Rao, Jia, Feng, & Zhao (2013). Personalized news recommendation using ontologies harvested from the web. In J. Wang, H. Xiong, Y. Ishikawa, J. Xu, and J. Zhou (Eds.), Web-age information management—14th international conference, WAIM 2013, Beidaihe, China, 14–16 June 2013. Proceedings, Vol. 7923 of lecture notes in computer science (pp. 781–787). Springer. https://doi.org/10.1007/978-3-642-38562-9_79

Raza & Ding (2021). Deep neural network to tradeoff between accuracy and diversity in a news recommender system. In Y. Chen, H. Ludwig, Y. Tu, U. M. Fayyad, X. Zhu, X. Hu, S. Byna, X. Liu, J. Zhang, S. Pan, V. Papalexakis, J. Wang, A. Cuzzocrea, and C. Ordonez (Eds.), 2021 IEEE international conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021 (pp. 5246–5256). IEEE. https://doi.org/10.1109/BigData52589.2021.9671467

Rendle (2010). Factorization machines. In G. I. Webb, B. Liu, C. Zhang, D. Gunopulos, and X. Wu (Eds.), ICDM 2010, the 10th IEEE international conference on data mining, Sydney, Australia, 14–17 December 2010 (pp. 995–1000). IEEE Computer Society. https://doi.org/10.1109/ICDM.2010.127

Wang, He, Cao, Liu, & Chua (2019). KGAT: Knowledge graph attention network for recommendation. Vol. abs/1905.07854. http://arxiv.org/abs/1905.07854

Wang, Zhang, Chen, & Liu (2020). Incorporating knowledge and content information to boost news recommendation. In X. Zhu, M. Zhang, Y. Hong, and R. He (Eds.), Natural language processing and Chinese computing—9th CCF international conference, NLPCC 2020, Zhengzhou, China, 14–18 October 2020, Proceedings, Part I, Vol. 12430 of lecture notes in computer science (pp. 443–456). Springer. https://doi.org/10.1007/978-3-030-60450-9_35

Wang, Zhang, Xie, & Guo (2018). DKN: Deep knowledge-aware network for news recommendation. In Proceedings of the 2018 world wide web conference on world wide web, WWW 2018, Lyon, France, 23–27 April 2018 (pp. 1835–1844). ACM. https://doi.org/10.1145/3178876.3186175

Wu, Wu, An, Huang, Huang, & Xie (2019). Neural news recommendation with attentive multi-view learning. Vol. abs/1907.05576. http://arxiv.org/abs/1907.05576

Zhang, Han, Liu, Jiang, Sun, & Liu (2019). ERNIE: Enhanced language representation with informative entities. In A. Korhonen, D. R. Traum, and L. Màrquez (Eds.), Proceedings of the 57th conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, 28 July–2 August 2019, Volume 1: Long papers (pp. 1441–1451). Association for Computational Linguistics. https://doi.org/10.18653/v1/p19-1139
