Abstract

Current dynamic graph anomaly detection models learn the multibehavior patterns of abnormal edges poorly and rely too heavily on differences between long-term snapshots. To address these problems, a combine dual behavior contrast dynamic graph anomaly detection model (CDBC-DGADM) is proposed. Firstly, a dual behavior learning module is designed, in which the role-based behavior learning submodule constructs a graphlet degree vector by identifying four self-isomorphic orbits to capture deep structural features, while the attribute-based behavior learning submodule obtains attribute vectors through a graph convolutional network. Then, the results are combined in the dynamic edge representation module to form dynamic edge representations that capture dual behavior patterns. Lastly, the anomaly detection module detects newly generated edges by combining contrastive learning with a gated recurrent unit. We conduct experiments from four perspectives: anomaly detection accuracy, parameter sensitivity, robustness of module variants, and model runtime efficiency. The results demonstrate that the model achieves a peak area under the curve (AUC) of 0.9205 in the task of dynamic edge anomaly detection.

Keywords

graph anomaly detection, contrastive learning, graphlet

1. Introduction

Graph anomaly detection is widely used in many domains, including finance, news, and social networks (Xu et al., 2018). From the perspective of graph composition, dynamic graph anomaly detection can be categorized into abnormal node detection, abnormal edge detection, abnormal subgraph detection, and abnormal graph detection. Most research focuses on abnormal node detection but overlooks the importance of abnormal edge detection. For example, messages exchanged between users in social networks include information such as the timestamp and content of the messages; this information is modeled as dynamic edges, which carry significant semantic features within dynamic graphs. Therefore, we aim to detect abnormal edges in dynamic graphs.

In the task of dynamic abnormal edge detection, methods based on graph neural networks (GNNs) have demonstrated superior performance. Early dynamic graph anomaly detection methods (e.g., Ji et al., 2013; Ranshous et al., 2016) perform anomaly detection by defining specific anomaly patterns for structure changes. However, such methods rely too much on predefined anomaly patterns and only apply to specific data types. To solve this problem, feature extraction-based methods have become mainstream; they are divided into graph embedding methods and spatiotemporal feature-based methods. Graph embedding methods continuously update the node embeddings by traversal methods such as random walks, depth-first search (DFS), and breadth-first search (BFS; e.g., Grover & Leskovec, 2016; Perozzi et al., 2014; Von Luxburg, 2007; Yu et al., 2018). Although these methods can be applied to various types of data, they do not adequately capture the neighborhood features in the graph, leading to suboptimal performance. Additionally, spatiotemporal feature-based methods, while capturing features from both spatial and temporal dimensions, overlook the role patterns of dynamic edges, which presents certain limitations (e.g., Cai et al., 2021; Yang et al., 2023; Zheng et al., 2019). Recently, reconstruction-based methods (e.g., Gao et al., 2022; Liu et al., 2021) have achieved superior performance.

However, existing research still has some problems. Firstly, abnormal edges in the network are camouflaged by establishing connections with normal entities, and feature extraction-based methods usually use only the one-hop neighbor information of a node as input, so the deep structural features of nodes are ignored. As shown in Figure 1(a), an abnormal edge between the nodes $v_{6}$ and $v_{7}$ emerges when the graph changes from a sparse graph at timestamp t to a dense graph at timestamp t+k. For the node $v_{5}$, the one-hop neighbors remain unchanged while the surrounding subgraph changes, so attending only to one-hop neighbors is insufficient to represent the impact of the newly added abnormal edge. This is the problem of lacking global views. Therefore, the first challenge is how to mine the deep structural features of nodes to obtain global views.

Figure 1.

Two challenges of existing methods: (a) the one-hop neighbors are insufficient to represent the impact of the newly added edges and (b) using only reconstruction error as the anomaly score is too one-sided.

Secondly, reconstruction-based methods usually take the reconstruction errors of long-term snapshots as anomaly scores, which only focus on the graph structure changes between snapshots instead of distinguishing normal and abnormal edges. As shown in Figure 1(b), a normal edge is added from $G^{1}$ to $G^{2},$ while an abnormal edge is added from $G^{t - 1}$ to $G^{t}$ , and reconstruction errors can only measure the structural differences between two graphs, but fail to capture the changes in the graphs when normal or abnormal edges are added in real-time. This is the problem of ignoring edge differences. Therefore, the second challenge lies in effectively capturing structural changes while enhancing the difference between normal and abnormal edges.

To address the first challenge, the key is to capture the deep features of nodes. Small induced subgraphs called graphlets, which can be counted efficiently by the Orbit Counting Algorithm (ORCA; Hočevar & Demšar, 2014), are emerging as a tool for exploring the global and local structure of networks and analyzing the roles of individual nodes. We consider that the behavior patterns of the roles constituted by the graphlet degree vector (GDV) of the nodes can provide a more comprehensive view for abnormal edge detection. To address the second challenge, it is necessary to increase the difference between normal and abnormal edges, so contrastive learning (Zhu et al., 2021) is adopted to learn representations by contrasting positive samples against negative ones for anomaly detection. Based on these, a combine dual behavior contrast dynamic graph anomaly detection model (CDBC-DGADM) is proposed to realize edge anomaly detection. The main contributions of the paper can be summarized as follows:

An anomaly detection model CDBC-DGADM for dynamic graphs is proposed, in which, a dual behavior learning module (DBLM) is designed to capture the features of edges fully, and an anomaly detection module (ADM) is designed to amplify the differences between normal and abnormal edges by contrastive learning.

In DBLM, the role-based behavior patterns are captured by GDV, and the attribute-based behavior patterns are obtained through a graph convolutional network (GCN).

Through extensive experiments on multiple real-world datasets, the effectiveness of CDBC-DGADM is shown on all evaluated datasets.

The rest of this paper is organized as follows: Section 2 introduces the related works; Section 3 presents the architecture and the details of CDBC-DGADM; Section 4 shows the experimental results; and Section 5 concludes the paper and discusses future works.

2. Related Work

Existing dynamic graph anomaly detection methods are mainly categorized into traditional graph analysis methods, feature extraction-based methods, and reconstruction-based methods.

2.1. Traditional Graph Analysis Methods

Traditional graph analysis methods usually define specific graph anomaly patterns (Ji et al., 2013; Ranshous et al., 2016). For example, Count-Min-Sketch (Ranshous et al., 2016) leverages the Count-Min sketch, a probabilistic data structure, to approximate global and local structural features of the graph, and IcLEOD (Ji et al., 2013) constructs a local substructure named Corenet for each node and detects local evolutionary outliers by comparing these substructures between different snapshots. However, such methods rely too much on their predefined anomaly patterns and are suitable only for some specific data types.

2.2. Feature Extraction-Based Methods

To increase the scalability of detection methods and fully capture the features of dynamic graphs, feature extraction-based methods have become mainstream; they can be divided into graph embedding methods and spatiotemporal feature-based methods. Graph embedding methods (Grover & Leskovec, 2016; Perozzi et al., 2014; Von Luxburg, 2007; Yu et al., 2018) first capture the features of dynamic graphs by traversal methods such as random walks, DFS, and BFS, and then combine them with a classification or clustering algorithm for edge anomaly detection. However, such methods focus on learning structural relationships while neglecting temporal relationships, so spatiotemporal feature-based methods (Cai et al., 2021; Yang et al., 2023; Zheng et al., 2019), which capture spatial features by a GNN and temporal features through a recurrent neural network (RNN), have gradually gained attention.

2.3. Reconstruction-Based Methods

Reconstruction-based methods use an autoencoder-like framework combined with a GNN to learn and update node embeddings, and compute anomaly scores from reconstruction errors; examples include the Transformer-based Anomaly Detection framework for DYnamic graphs (TADDY; Liu et al., 2021) and the method of Gao et al. (2022). In recent years, contrastive learning has shown superior performance in anomaly detection tasks for its ability to strengthen differences. For example, AddGraph (Zheng et al., 2019) widens the gap between the normal and the abnormal edges by constructing positive and negative samples, where the negative edges are sampled based on the Bernoulli distribution.

However, according to the analysis in the Introduction section, existing methods still have the problems of lacking global views and ignoring edge differences. Therefore, CDBC-DGADM is designed to address these problems.

3. Methodology

3.1. Problem Definition

A dynamic graph consisting of T snapshots can be represented as $\{G^{t}\}_{t = 1}^{T}$, and the snapshot at timestamp t can be expressed as $G^{t} = \{V^{t}, E^{t}\}$, where $V^{t}$ and $E^{t}$ denote the node set and edge set at timestamp t, respectively. An edge is written as $e_{i j}^{t} = (v_{i}^{t}, v_{j}^{t}) \in E^{t}$ with $v_{i}^{t}, v_{j}^{t} \in V^{t}$. $A^{t}$ is the adjacency matrix at timestamp t, where $A_{i, j}^{t} = 1$ if there is an edge between $v_{i}^{t}$ and $v_{j}^{t}$, and $A_{i, j}^{t} = 0$ otherwise.

The research objective is to design the anomaly scoring function $f (\cdot)$ , calculate the anomaly score $η_{i j}^{T}$ of the edge $e_{i j}^{T}$ , then determine M edges with the largest anomaly score as abnormal edges.

For ease of understanding, the mathematical symbols used in this paper are summarized in Table 1.

Table 1.
Commonly Used Mathematical Symbols With Explanations.

Mathematical symbol	Explanation
$G^{t} = \{V^{t}, E^{t}\}$	The snapshot graph at timestamp t.
$\{G^{t}\}_{t = 1}^{T}$	A dynamic graph consisting of T snapshots.
${V^{t}}$	The node set at timestamp t.
${E^{t}}$	The edge set at timestamp t.
$v_{i}^{t}$	The node at timestamp t.
$e_{i j}^{t}$	The edge between node $v_{i}$ and node $v_{j}$ at timestamp t.
$A$	The adjacency matrix.
$D$	The degree matrix of $A$ .
$f (\cdot)$	An anomaly scoring function.
$η_{i j}^{t}$	The anomaly scores of the edges $e_{i j}^{t}$ .
M	The number of artificially injected abnormal edges.
$m^{t}$	The number of original edges in the snapshot $G^{t}$ .
$g_{i}^{t}$	The graphlet vector of node $v_{i}$ at timestamp t.
$r_{i}^{t}$	The reset gate that balances the input and memory at timestamp t.
$z_{i}^{t}$	The update gate that controls the output at timestamp t.
$d_{i}^{t}$	The hidden layers of node $v_{i}$ at timestamp t
$W_{r}$ , $U_{r}$	The weight matrices of reset gate.
$W_{z}$ , $U_{z}$	The weight matrices of update gate.
$W_{d}$ , $U_{d}$	The weight matrices of hidden layer.
W	The parameter matrix of the output layer.
$b_{r}$ , $b_{z}$ , $b_{d}$	The biases of reset gate, update gate, and hidden layer, respectively.
$y_{i}^{t}$	The role-based behavior pattern of node $v_{i}$ at timestamp t.
$h_{i}^{t}$	The attribute vectors of node $v_{i}$ at timestamp t.
$s_{i}^{t}$	The attribute-based behavior pattern of node $v_{i}$ at timestamp t.
$a_{i}^{t}$	The final vector representation of the node $v_{i}$ .
$W_{s}$	The weight matrix of the multilayer perceptron (MLP).
$b_{s}$	The bias coefficient of the MLP.
$β$	Training ratio.
$p$	Anomaly injection ratio.
n	The average number of nodes in the snapshots.
C	The number of iterations.

3.2. The Architecture of CDBC-DGADM

The framework of the CDBC-DGADM is shown in Figure 2.

Figure 2.

The overall framework of combine dual behavior contrast dynamic graph anomaly detection model (CDBC-DGADM).

The model consists of five modules: (1) input module, (2) DBLM, (3) dynamic edge representation module (DERM), (4) ADM, and (5) output module. Among them, the DBLM can be further divided into two submodules: role-based behavior pattern learning submodule (RBPLS) and attribute-based behavior pattern learning submodule. Firstly, the input module includes a dynamic graph consisting of T snapshots. Secondly, dynamic role-based behavior patterns and attribute-based patterns of the edges are learned by DBLM and combined to form the dynamic edge representations in DERM. At last, newly generated edges are used as negative samples to calculate the anomaly score of the target edges by contrastive learning. The detailed descriptions of the modules are as follows.

3.3. Input Module

In the input module, the dynamic graph consisting of consecutive snapshots $\{G^{t}\}_{t = 1}^{T}$ is taken as the model input, including the target edge $e_{i j}^{t}$ and the structural features of its end nodes $v_{i}$ and $v_{j}$.

3.4. Dual Behavior Learning Module (DBLM)

This module is used to learn about role-based behavior patterns and attribute-based behavior patterns in dynamic graphs. When capturing spatiotemporal features in dynamic graphs, existing methods typically extract spatial features from neighbors in a single snapshot while capturing temporal features from consecutive snapshots (Ranshous et al., 2015; Wang et al., 2019). However, these methods only learn simple features and do not consider the complex behavior patterns of dynamic graphs. To effectively detect abnormal edges in dynamic graphs, this module learns dual behavior patterns.

3.4.1. Role-Based Behavior Patterns Learning Submodule

This submodule learns the role-based behavior patterns by constructing GDV to obtain a global view. Compared to the shallow structural features used in traditional methods, the GDV can capture richer and more complex local structural information, providing more context-aware and multiscale structural features of nodes. Therefore, we construct the GDV and use it as the role vector for the nodes.

Specifically, a graphlet is a small, connected, nonisomorphic induced subgraph of a graph, and the GDV of a node records how many times the node occupies each orbit of these graphlets; it is widely used to determine the role of a node in a network (Sarajlić et al., 2016; Yaveroğlu et al., 2014). This module uses ORCA (Hočevar & Demšar, 2014) for GDV sampling. In Figure 3, four self-isomorphic orbits are included in three different subgraphs, labeled 0, 1, 2, and 3 in turn, and the number of times each node participates in each of the four orbits is counted separately. For example, the GDV of node $v_{3}$ at timestamp t can be expressed as $g_{3}^{t} = [1, 0, 0, 0]$. At timestamp $t + 1$, the edge between $v_{3}$ and $v_{5}$ is added, and the GDV of $v_{3}$ changes accordingly. The red color in Figure 3 indicates the changed GDV values. Overall, the role vector is obtained by counting orbits.
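
As a minimal sketch of this orbit counting, the following Python function computes a four-dimensional GDV for every node of one undirected snapshot by brute force. It assumes that the four orbits are those of the graphlets with two and three nodes (edge endpoint, end of an induced two-edge path, middle of an induced two-edge path, and triangle member); the function name `gdv4` and the adjacency-dictionary input are illustrative choices rather than part of the original model, and ORCA obtains such counts far more efficiently on large graphs.

```python
from itertools import combinations


def gdv4(adj):
    """Return {node: [orbit0, orbit1, orbit2, orbit3]} for an undirected simple graph.

    adj maps each node to the set of its neighbors (hypothetical input format).
    Orbit numbering is an assumption: 0 = edge endpoint, 1 = end of an induced
    2-edge path, 2 = middle of an induced 2-edge path, 3 = triangle member.
    """
    gdv = {}
    for v, nbrs in adj.items():
        deg = len(nbrs)
        # Orbit 3: pairs of neighbors of v that are themselves connected.
        tri = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Orbit 2: pairs of neighbors of v that are NOT connected (v is the middle).
        mid = deg * (deg - 1) // 2 - tri
        # Orbit 1: paths v - u - w where w is neither v nor a neighbor of v.
        end = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
        gdv[v] = [deg, end, mid, tri]
    return gdv


path = {1: {2}, 2: {1, 3}, 3: {2}}
print(gdv4(path))  # {1: [1, 1, 0, 0], 2: [2, 0, 1, 0], 3: [1, 1, 0, 0]}
```

Recomputing the vector on each snapshot gives the per-timestamp sequence $g_{i}^{1}, \ldots, g_{i}^{T}$ of a node.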

Based on the above properties of the graphlet, the research captures dynamic role-based behavior patterns. Firstly, the GDV $g_{i}^{t}$ of each node $v_{i}$ is extracted from each snapshot. Secondly, to capture the temporal features of the dynamic graph, a gated recurrent unit (GRU), which performs well in long sequence tasks, is used (Dey & Salem, 2017). Specifically, the GDV is input to the GRU to obtain the role-based behavior pattern of node $v_{i}$ at timestamp t, which is calculated as follows:

$$r_{i}^{t} = σ (W_{r} g_{i}^{t} + U_{r} d_{i}^{t - 1} + b_{r}), \tag{1}$$

$$z_{i}^{t} = σ (W_{z} g_{i}^{t} + U_{z} d_{i}^{t - 1} + b_{z}), \tag{2}$$

$$d_{i}^{t} = (1 - z_{i}^{t}) \cdot d_{i}^{t - 1} + z_{i}^{t} \cdot \tanh (W_{d} g_{i}^{t} + U_{d} (r_{i}^{t} \cdot d_{i}^{t - 1}) + b_{d}), \tag{3}$$

$$y_{i}^{t} = σ (W d_{i}^{t}). \tag{4}$$

Figure 3.

Graphlet degree vector (GDV) sampling.

The output $y_{i}^{t}$ is the role behavior pattern of node $v_{i}$ at the timestamp t.
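
The recurrence of Equations (1) to (4) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the candidate state uses the hidden-layer matrix $U_{d}$ listed in Table 1, products between gates and states are taken element-wise, the initial hidden state is zero, and the helper names (`matvec`, `gate`, `gru_step`, `role_pattern`) and the parameter dictionary `p` are hypothetical.

```python
import math


def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(W, U, b, x, h, act):
    """Compute act(W x + U h + b) element by element."""
    return [act(wx + uh + bi) for wx, uh, bi in zip(matvec(W, x), matvec(U, h), b)]


def gru_step(g, d_prev, p):
    """One time step: GDV g and previous hidden state d_prev -> new hidden state."""
    r = gate(p["W_r"], p["U_r"], p["b_r"], g, d_prev, sigmoid)      # Eq. (1)
    z = gate(p["W_z"], p["U_z"], p["b_z"], g, d_prev, sigmoid)      # Eq. (2)
    gated = [ri * di for ri, di in zip(r, d_prev)]                  # r * d^{t-1}
    cand = gate(p["W_d"], p["U_d"], p["b_d"], g, gated, math.tanh)  # candidate state
    return [(1 - zi) * di + zi * ci for zi, di, ci in zip(z, d_prev, cand)]  # Eq. (3)


def role_pattern(gdv_sequence, p, hidden_size):
    """Run the GRU over g^1..g^t and apply the output layer of Eq. (4)."""
    d = [0.0] * hidden_size  # assumption: zero initial hidden state
    for g in gdv_sequence:
        d = gru_step(g, d, p)
    return [sigmoid(v) for v in matvec(p["W"], d)]
```

In practice a library GRU layer would replace these helpers; the explicit form is only meant to make the role of each gate visible.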

3.4.2. Attribute-Based Behavior Patterns Learning Submodule

The submodule learns the attribute-based behavior patterns of nodes. First, node embedding can be performed by graph embedding methods (e.g., GCN and graph autoencoder), and we choose GCN to model the input graph in this subsection. Specifically, taking node $v_{i}$ as an example, GCN is used to learn the attribute vector $h_{i}$, which can be represented as follows:

$$H^{(l)} = ϕ (D^{- 1 / 2} \tilde{A} D^{- 1 / 2} H^{(l - 1)} W^{(l - 1)}), \tag{5}$$

where $\tilde{A} = A + I$ is the graph adjacency matrix with self-loops, $D$ is the degree matrix of $A$, defined as $D_{i i} = \sum_{j = 1}^{n} A_{i j}$, $W^{(l - 1)} \in R^{(l - 1) \times (l)}$ is the weight matrix of the $(l - 1)$th layer, and $ϕ (\cdot)$ is an activation function such as ReLU. The attribute vector $h_{i}^{t}$ of the target node is the $i$th row of the output matrix $H^{(L)}$.
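
A single propagation step of this kind can be sketched in plain Python as below, using the symmetric normalization $D^{-1/2} \tilde{A} D^{-1/2}$. The function name `gcn_layer` and the list-of-lists matrix format are hypothetical. Following the definition in the text, the degrees are taken from $A$, so the sketch assumes that no node is isolated; many GCN implementations instead use the degrees of $\tilde{A}$, which also avoids division by zero.

```python
import math


def gcn_layer(A, H, W, activation=lambda x: max(0.0, x)):
    """One GCN layer: activation(D^{-1/2} (A + I) D^{-1/2} H W), ReLU by default.

    A: n x n adjacency matrix, H: n x d_in features, W: d_in x d_out weights.
    """
    n = len(A)
    # Adjacency matrix with self-loops.
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # D_ii = sum_j A_ij, as defined in the text (requires every degree > 0).
    inv_sqrt = [1.0 / math.sqrt(sum(A[i])) for i in range(n)]
    A_hat = [[inv_sqrt[i] * A_tilde[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]
    d_in, d_out = len(W), len(W[0])
    # Aggregate neighbor features, then apply the linear map and the activation.
    AH = [[sum(A_hat[i][k] * H[k][c] for k in range(n)) for c in range(d_in)] for i in range(n)]
    return [[activation(sum(AH[i][k] * W[k][c] for k in range(d_in))) for c in range(d_out)]
            for i in range(n)]
```

Stacking $L$ such layers and reading off row $i$ of the final output gives the attribute vector of node $v_{i}$ for one snapshot.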

Then follow the same steps as the RBPLS: use the attribute vector $h_{i}^{t}$ as the input to the GRU to obtain the attribute-based behavior pattern $s_{i}^{t}$ of node $v_{i}$ at timestamp t.

Finally, the final vector embedding $a_{i}^{t}$ is obtained by performing a dot product between the role vector $y_{i}^{t}$ and attribute vector $s_{i}^{t}$ :

$$a_{i}^{t} = y_{i}^{t} \cdot s_{i}^{t}. \tag{6}$$

3.5. Dynamic Edge Representation Module (DERM)

This module is used to generate the dynamic representation of the target edge. The specific operation involves concatenating the representations $a_{i}^{t}$ and $a_{j}^{t}$ to obtain the dynamic edge representation $e_{i j}^{t}$ , which combines the role-based behavior pattern and the attribute-based behavior pattern. The calculation process can be expressed as follows:

$$e_{i j}^{t} = a_{i}^{t} \oplus a_{j}^{t}. \tag{7}$$

3.6. Anomaly Detection Module (ADM)

This module is designed to get an anomaly score of the target edge. The commonly used reconstruction-based methods (Gao et al., 2022; Liu et al., 2021) calculate the reconstruction error between different snapshots as the anomaly score, which cannot effectively distinguish normal edges from abnormal edges. To solve this problem, contrastive learning is used in this module. As shown in Figure 4, the edges present in the graph are regarded as positive examples, and the edges constructed by the abnormal edge injection strategy (Yu et al., 2018) are regarded as negative examples.

Figure 4.

Positive and negative edge sampling.

Then, $e_{i j}^{t}$ is fed into a fully connected layer with a sigmoid activation function, and the output $η_{i j}^{t}$ is the anomaly score of the edge between $v_{i}$ and $v_{j}$:

$$η_{i j}^{t} = f (e_{i j}^{t}) = \mathrm{sigmoid} (e_{i j}^{t} W_{s} + b_{s}). \tag{8}$$
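
The path from node patterns to an edge score (Equations 6 to 8) can be sketched as follows. Two points are assumptions made for illustration: the product in Equation (6) is interpreted element-wise so that $a_{i}^{t}$ remains a vector, and $W_{s}$ is treated as a single weight vector producing one scalar score. The function names are hypothetical.

```python
import math


def node_embedding(y, s):
    """Eq. (6), read as an element-wise product of role and attribute patterns."""
    return [yi * si for yi, si in zip(y, s)]


def edge_representation(a_i, a_j):
    """Eq. (7): concatenate the embeddings of the two end nodes."""
    return list(a_i) + list(a_j)


def anomaly_score(e, W_s, b_s):
    """Eq. (8): sigmoid of a linear map of the edge representation."""
    logit = sum(ek * wk for ek, wk in zip(e, W_s)) + b_s
    return 1.0 / (1.0 + math.exp(-logit))


a_i = node_embedding([1.0, 2.0], [0.5, 0.5])   # [0.5, 1.0]
a_j = node_embedding([0.0, 1.0], [1.0, 1.0])   # [0.0, 1.0]
e_ij = edge_representation(a_i, a_j)           # [0.5, 1.0, 0.0, 1.0]
print(anomaly_score(e_ij, [0.0, 0.0, 0.0, 0.0], 0.0))  # 0.5 for untrained zero weights
```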

3.7. Output Module

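Finally, this module trains the model with a binary cross-entropy (BCE) loss function so that abnormal edges can be detected, which is represented as follows:

$$L = - \sum^{m^{t}} \left[ \log (1 - f (e_{i j}^{t, P})) + \log (f (e_{p q}^{t, N})) \right], \tag{9}$$

where $e_{i j}^{t, P}$ and $e_{p q}^{t, N}$ denote a positive (existing) edge and a negative (injected) edge, respectively.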

The larger the anomaly score of an edge is, the more likely it is to be an abnormal edge. Finally, all anomaly scores of edges are sorted in descending order, and the first M edges are considered abnormal edges.
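
A minimal sketch of the loss and of the final ranking step is given below. It assumes that positive (existing) edges should receive scores near 0 and negative (injected) edges scores near 1; the two sums are accumulated separately, which matches a paired sum when both sets have the same size. The function names and the small clipping constant `eps` are illustrative additions.

```python
import math


def bce_loss(pos_scores, neg_scores, eps=1e-12):
    """Binary cross-entropy over positive (normal) and negative (injected) edges."""
    loss = 0.0
    for s in pos_scores:
        loss -= math.log(max(1.0 - s, eps))  # push normal-edge scores toward 0
    for s in neg_scores:
        loss -= math.log(max(s, eps))        # push injected-edge scores toward 1
    return loss


def top_m_edges(scores, M):
    """Return the M edges with the largest anomaly scores, highest first."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [edge for edge, _ in ranked[:M]]


scores = {(1, 2): 0.1, (3, 4): 0.9, (5, 6): 0.4}
print(top_m_edges(scores, 2))  # [(3, 4), (5, 6)]
```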

The detailed algorithm for the model training is shown in Algorithm 1. In particular, the value of $g_{i}^{t}$ is obtained using the ORCA algorithm, the role-based behavior patterns $y_{i}^{t}$ are derived from the GRU, and the attribute-based behavior patterns $s_{i}^{t}$ are obtained by using GCN followed by the GRU. The embedding $a_{i}^{t}$ is calculated by taking the dot product of $y_{i}^{t}$ and $s_{i}^{t}$. Subsequently, the embedding of node $v_{j}$, denoted as $a_{j}^{t}$, is obtained through the same steps. The embeddings of dynamic edges are then computed by concatenating $a_{i}^{t}$ and $a_{j}^{t}$. Finally, the anomaly scores of the edges are calculated by the fully connected layer, and the model is trained with the BCE loss.

3.8. Complexity Analysis

The time complexity of the model is analyzed by considering each significant component of the CDBC-DGADM separately. First, for the DERM, the time complexity is $O (T n^{2})$ , where T is the number of snapshots, and n is the average number of nodes in the snapshots. For the ADM, the time complexity is $O (C n)$ , where C is the number of iterations. Finally, the total time complexity is $O (n (C + T n))$ .

4. Experiments

The effectiveness and efficiency of the model are demonstrated by conducting experiments on four publicly available datasets and analyzing the results in detail. In the experimental part, the model is evaluated by analyzing the following four questions.

RQ1: Does CDBC-DGADM perform better than existing models?
RQ2: Do the key parameters affect the performance of the model?
RQ3: How do the role-based behavior patterns and attribute-based behavior patterns in CDBC-DGADM affect the detection results, respectively?
RQ4: What is the computational cost of CDBC-DGADM?

4.1. Experimental Setup

4.1.1. Datasets

To evaluate the performance of the proposed model, four real-world dynamic graph datasets are used, listed in Table 2.

Social networks. UCI Messages (Opsahl & Panzarasa, 2009) is a small anonymized social network, where nodes denote users and edges denote message exchanges between users, and each edge is labeled with a timestamp.

News networks. Digg (De Choudhury et al., 2009) is a user behavior dataset. It contains users’ voting, commenting, tagging, and posting behaviors on news, where nodes represent users, edges are messages between users, and each message carries a timestamp.

Transaction networks. Bitcoin-Alpha (Kumar et al., 2016) and Bitcoin-OTC (Kumar et al., 2018) are transaction datasets for Bitcoin, in which nodes are transaction users and edges represent scoring behaviors between users with timestamps.

Table 2.
Datasets.

Dataset	Nodes	Edges	Avg. degree
UCI Messages	1,899	13,888	14.57
Digg	30,360	85,155	5.61
Bitcoin-Alpha	3,777	24,173	12.80
Bitcoin-OTC	5,881	35,588	12.10

Firstly, in the preprocessing stage, based on the references (Yu et al., 2018; Zheng et al., 2019), duplicate edges are removed because edges with the same timestamp belong to the same snapshot.

Secondly, when choosing an anomaly generation strategy for self-supervised anomaly injection, it is important that the injected anomalies resemble real anomalies as closely as possible; otherwise, the results of model training and evaluation may not accurately reflect performance on real datasets. Therefore, the type and number of injected anomalies, as well as the way they are injected, need to be carefully selected. To this end, the same anomaly generation method as in Yu et al. (2018) is used: for each snapshot $G^{t}$, M pairs of nodes are connected randomly, where $M = p \times m^{t}$ is the number of injected abnormal edges, $p$ is the anomaly injection ratio, and $m^{t}$ is the number of original edges in $G^{t}$.
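
The injection step can be sketched as follows. This is an illustrative reading of the procedure rather than the reference implementation: the function name `inject_anomalies` is hypothetical, and rejecting self-loops, existing edges, and repeated pairs is an assumption about how randomly connected pairs are filtered.

```python
import random


def inject_anomalies(nodes, edges, p, seed=0):
    """Sample M = p * m^t random node pairs that are not existing edges."""
    rng = random.Random(seed)
    nodes = list(nodes)
    existing = {frozenset(e) for e in edges}
    M = round(p * len(edges))
    # Guard against asking for more non-edges than the snapshot can provide.
    free_pairs = len(nodes) * (len(nodes) - 1) // 2 - len(existing)
    if M > free_pairs:
        raise ValueError("not enough unconnected node pairs to inject M edges")
    injected = set()
    while len(injected) < M:
        u, v = rng.sample(nodes, 2)  # two distinct nodes, so no self-loops
        pair = frozenset((u, v))
        if pair not in existing:
            injected.add(pair)       # a set, so repeated pairs are ignored
    return sorted(tuple(sorted(pair)) for pair in injected)
```

With $p = 10\%$ and a snapshot of 1,000 original edges, this yields $M = 100$ injected edges.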

4.1.2. Baselines

To further test the detection performance of the model proposed in this article, the following seven classical dynamic graph anomaly detection models are selected as benchmarks for experimental comparison:

DeepWalk+support vector machine (SVM) (Perozzi et al., 2014): generates node sequences by random walk and learns low-dimensional representations of nodes by Skip-gram. Then, SVM is used for classification of the edges.

Node2vec+SVM (Grover & Leskovec, 2016): learns embeddings of nodes by random walks combining BFS and DFS. Then, SVM is used for classification.

Spectral Clustering (Von Luxburg, 2007): considers maintaining local connectivity in the graph as the objective, and learns embedding representations of nodes by maximizing node similarity in the neighborhood.

NetWalk (Yu et al., 2018): learns the representation of nodes and edges by random walk, and determines whether the edge is abnormal by the distance of the target edge to the clustering center.

AddGraph (Zheng et al., 2019): simulates a dynamic process by continuously adding new edges to the initial graph, and uses a GNN-based autoencoder for embedding learning and anomaly detection.

StrGNN (Cai et al., 2021): constructs h-hop subgraph structures and uses a combination of GCN and GRU for feature extraction and timing modeling.

TADDY (Liu et al., 2021): captures informative representation from dynamic graphs with coupled spatial–temporal patterns via a dynamic graph transformer model.

4.1.3. Evaluation Metrics

The area under the receiver operating characteristic curve (AUC; Fawcett, 2005) is used as the evaluation metric to measure the model’s performance. The closer the AUC is to 1, the better the performance of the method.
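
For reference, the AUC equals the probability that a randomly chosen abnormal edge receives a higher anomaly score than a randomly chosen normal edge, with ties counted as one half. The sketch below computes it directly from this pairwise definition; the function name `roc_auc` is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(labels, scores):
    """AUC from labels (1 = abnormal, 0 = normal) and anomaly scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one abnormal and one normal edge")
    # Count abnormal/normal pairs ranked correctly; ties contribute one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```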

4.1.4. Experimental Settings

In the experiments, the models are implemented in PyTorch. The experiments run on a Windows server with an Intel Core i5-9300 CPU@2.90GHz and a GeForce 1650 GPU.

The dimension of the GDV is set to 4, the learning rate of the Adam optimizer is set to 0.001, and the number of training rounds is set to 100 for the UCI Messages, Bitcoin-Alpha, and Bitcoin-OTC, 200 for Digg. The dimension of the hidden layer of GRU is set to 128. The snapshot size is set to 1,000 for the Bitcoin-OTC and UCI Messages, 2,000 for Bitcoin-Alpha, and 6,000 for Digg. The training ratio $β$ is set to 80%, and the anomaly injection ratio $p$ is set to 10%.
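
As an illustration of how the snapshot size could be applied, the sketch below cuts a timestamped edge stream into consecutive snapshots. It assumes that the snapshot size is the number of edges per snapshot and that edges are ordered by timestamp; both points, together with the names `SNAPSHOT_SIZE` and `split_into_snapshots`, are assumptions rather than details stated in the settings.

```python
# Snapshot sizes from the experimental settings, keyed by dataset name.
SNAPSHOT_SIZE = {
    "UCI Messages": 1000,
    "Bitcoin-OTC": 1000,
    "Bitcoin-Alpha": 2000,
    "Digg": 6000,
}


def split_into_snapshots(edge_stream, snapshot_size):
    """Group (u, v, timestamp) edges into chronological chunks of fixed size."""
    ordered = sorted(edge_stream, key=lambda e: e[2])
    return [ordered[k:k + snapshot_size] for k in range(0, len(ordered), snapshot_size)]


stream = [(1, 2, 5), (2, 3, 1), (3, 4, 3), (4, 5, 2), (5, 6, 4)]
print(split_into_snapshots(stream, 2))
# [[(2, 3, 1), (4, 5, 2)], [(3, 4, 3), (5, 6, 4)], [(1, 2, 5)]]
```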

4.2. Performance Comparison

In response to RQ1, CDBC-DGADM is compared with baselines. The experimental results are summarized in Table 3, where bold indicates the optimal performance.

Table 3.
Results on Four Real-World GAD Datasets.

Models	UCI Message	Digg	Bitcoin-Alpha	Bitcoin-OTC
DeepWalk+SVM	0.6837 $\pm$ 0.0079	0.6385 $\pm$ 0.0065	0.6793 $\pm$ 0.0024	0.7279 $\pm$ 0.0013
Node2vec+SVM	0.6867 $\pm$ 0.0158	0.6512 $\pm$ 0.0013	0.6774 $\pm$ 0.0036	0.6757 $\pm$ 0.0024
Spectral Clustering	0.5832 $\pm$ 0.0158	0.5548 $\pm$ 0.0125	0.7135 $\pm$ 0.0056	0.7039 $\pm$ 0.0034
NetWalk	0.7135 $\pm$ 0.0017	0.6794 $\pm$ 0.0136	0.8353 $\pm$ 0.0018	0.7547 $\pm$ 0.0026
AddGraph	0.7654 $\pm$ 0.0014	0.8354 $\pm$ 0.0025	0.8499 $\pm$ 0.0039	0.8584 $\pm$ 0.0018
StrGNN	0.7954 $\pm$ 0.0063	0.8272 $\pm$ 0.0125	0.8634 $\pm$ 0.0057	0.8843 $\pm$ 0.0026
TADDY	0.8168 $\pm$ 0.0019	0.816 $\pm$ 0.0013	0.8857 $\pm$ 0.0127	0.8993 $\pm$ 0.0067
CDBC-DGADM	0.8315 $\pm$ 0.0030	0.8438 $\pm$ 0.0045	0.9032 $\pm$ 0.0014	0.9205 $\pm$ 0.0079

Note. SVM = support vector machine; TADDY = Transformer-based Anomaly Detection framework for DYnamic graphs; CDBC-DGADM = combine dual behavior contrast dynamic graph anomaly detection model.

From Table 3, it can be seen that CDBC-DGADM exhibits the best detection performance on all datasets.

Through the observation and analysis of the experimental results, the following conclusions can be drawn:

(1) DeepWalk+SVM, Node2vec+SVM, Spectral Clustering, and NetWalk use graph embedding to learn and update the dynamic representation of nodes; they do not make full use of the node neighborhood information, which leads to poor results.

(2) The spatiotemporal feature-based methods such as AddGraph and StrGNN learn the behavior evolution pattern in continuous snapshots, which fully uses the temporal information in the dynamic graph.

(3) The reconstruction-based method TADDY performs well because the majority of abnormal edges are indeed those that are difficult to reconstruct. However, it achieves suboptimal performance because it ignores the role-based behavior patterns of the edges.

(4) CDBC-DGADM has the highest detection accuracy because it fully learns rich role and attribute features; meanwhile, it can detect newly generated abnormal edges more effectively through contrastive learning.

4.3. Parameters Sensitivity

In response to RQ2, sets of parameter experiments on $β$ and $p$ are conducted.

4.3.1. Training Ratio $β$

In our experiments, we analyze the performance of the CDBC-DGADM framework using training data with different training ratios. In Figure 5, $β$ is set to {20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%} separately, and the results show that the AUC tends to increase as $β$ increases. This indicates that the model has a stable detection performance with sufficient training data. However, when $β$ reaches 90%, the AUC no longer increases, which may be due to the $9 : 1$ ratio of the training set to the test set leading to overfitting.

Figure 5.

AUC of CDBC-DGADM on four datasets with different training ratio. Note. AUC = area under the curve; CDBC-DGADM = combine dual behavior contrast dynamic graph anomaly detection model.

4.3.2. Anomaly Injection Ratio $p$

As shown in Figure 6, $p$ is set to {1%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, 50%}. As $p$ increases, the overall AUC shows a downward trend despite some fluctuations, which may be caused by the following reasons:

(1) Data imbalance. In the graph anomaly detection task, the number of abnormal edges is far smaller than the number of normal edges, so injecting too many abnormal edges could degrade model performance.
(2) Increased noise. Injecting a large number of abnormal edges may increase the noise in the dataset. The noisy data may not correspond to actual abnormalities, making it more difficult for the algorithm to distinguish real abnormal edges from noise.

Nevertheless, even when the anomaly injection ratio reaches 50%, the model still achieves competitive performance, which supports the effectiveness of our framework.

Figure 6.
AUC of CDBC-DGADM on four datasets with different anomaly injection ratios. Note. AUC = area under the curve; CDBC-DGADM = combine dual behavior contrast dynamic graph anomaly detection model.
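
To make the role of $p$ concrete, the following sketch injects synthetic abnormal edges into a set of normal edges. It rests on several assumptions that the text above does not confirm: the graph is treated as undirected, abnormal edges are random node pairs that do not already exist, and $p$ is interpreted as the number of injected edges divided by the number of normal edges. The function name `inject_anomalies` is hypothetical.

```python
import random
from typing import List, Set, Tuple

Pair = Tuple[int, int]


def inject_anomalies(edges: List[Pair], num_nodes: int, p: float,
                     seed: int = 0) -> List[Tuple[Pair, int]]:
    """Return labeled edges: label 0 for original edges, 1 for injected ones.

    Assumptions: undirected graph, anomalies are random non-existing
    node pairs, and p = (#injected edges) / (#original edges).
    """
    rng = random.Random(seed)
    existing: Set[Pair] = {(min(u, v), max(u, v)) for u, v in edges}
    target = int(round(p * len(edges)))
    free_pairs = num_nodes * (num_nodes - 1) // 2 - len(existing)
    if target > free_pairs:
        raise ValueError("not enough unconnected node pairs to inject")

    injected: Set[Pair] = set()
    while len(injected) < target:
        u, v = rng.randrange(num_nodes), rng.randrange(num_nodes)
        pair = (min(u, v), max(u, v))
        # Skip self-loops and pairs that already exist or were injected.
        if u != v and pair not in existing and pair not in injected:
            injected.add(pair)

    labeled = [(e, 0) for e in edges] + [(e, 1) for e in sorted(injected)]
    rng.shuffle(labeled)  # mix normal and abnormal edges
    return labeled


# Toy example: a ring of 20 nodes with 10% injected anomalies.
ring = [(i, (i + 1) % 20) for i in range(20)]
labeled_edges = inject_anomalies(ring, num_nodes=20, p=0.10)
print(sum(label for _, label in labeled_edges), "abnormal edges injected")
```

Under this interpretation, a larger $p$ means that a growing share of the evaluated edges is random noise, which matches the two reasons listed above.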
4.4. Ablation Study

In response to RQ3, ablation experiments are performed on the key components of CDBC-DGADM: (1) CDBC-DGADM-x, which uses only the attribute-based behavior module for anomaly detection; and (2) CDBC-DGADM-gv, which uses only the role-based behavior module for anomaly detection. Additionally, to verify the effectiveness of the GRU, it is replaced with long short-term memory (LSTM) and a recurrent neural network (RNN), yielding the variants CDBC-DGADM–LSTM and CDBC-DGADM–RNN, respectively. The results of the experiments are shown in Table 4.

Table 4.
Ablation Study for CDBC-DGADM and Its Variants on Four Datasets.

Models	UCI Message	Digg	Bitcoin-Alpha	Bitcoin-OTC
CDBC-DGADM–LSTM	0.7859 $\pm$ 0.0140	0.8230 $\pm$ 0.0013	0.8704 $\pm$ 0.0062	0.9051 $\pm$ 0.0027
CDBC-DGADM–RNN	0.7761 $\pm$ 0.0075	0.8114 $\pm$ 0.0028	0.8648 $\pm$ 0.0088	0.9031 $\pm$ 0.0034
CDBC-DGADM-x	0.7789 $\pm$ 0.0064	0.8321 $\pm$ 0.0037	0.8579 $\pm$ 0.0116	0.8938 $\pm$ 0.0129
CDBC-DGADM-gv	0.7912 $\pm$ 0.0054	0.8411 $\pm$ 0.0015	0.8603 $\pm$ 0.0074	0.8981 $\pm$ 0.0036
CDBC-DGADM	0.8315 $\pm$ 0.0130	0.8438 $\pm$ 0.0145	0.9032 $\pm$ 0.0104	0.9105 $\pm$ 0.0179

Note. CDBC-DGADM = combine dual behavior contrast dynamic graph anomaly detection model; LSTM = long short-term memory; RNN = recurrent neural network.
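
Each entry in Table 4 has the form of a mean with a spread over repeated runs. The sketch below shows one way such an entry could be computed from anomaly scores. It assumes that AUC is obtained from the pairwise ranking definition and that the spread is the sample standard deviation. Neither detail is stated in the table, and the function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def auc_score(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random abnormal edge (label 1)
    receives a higher score than a random normal edge (label 0).
    Ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_runs(aucs: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs
    (assumption: the table reports the sample standard deviation)."""
    return mean(aucs), stdev(aucs)


# Toy example: four edges, two of them abnormal.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc_score(toy_labels, toy_scores))

# Toy example: AUC values from three hypothetical runs.
run_mean, run_std = summarize_runs([0.82, 0.84, 0.83])
print(f"{run_mean:.4f} $\\pm$ {run_std:.4f}")
```

The quadratic pairwise loop is only suitable for small examples. A rank-based computation would be preferable on datasets of the size used here.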

From the experimental results in Table 4, the following conclusions can be drawn:

(1) The full CDBC-DGADM model achieves the best anomaly detection performance, indicating that learning the combined dual-behavior patterns provides more comprehensive information for anomaly detection.

(2) On the social network dataset UCI Message and the news network dataset Digg, CDBC-DGADM-gv performs clearly better than CDBC-DGADM-x (0.7912 vs. 0.7789 and 0.8411 vs. 0.8321, respectively). On the trading network datasets Bitcoin-Alpha and Bitcoin-OTC, the two variants are not significantly different. This suggests that in networks with high socialization and strong message connections, role-based behavior patterns provide more contextual and semantic information about the nodes, which helps anomaly detection.

(3) CDBC-DGADM outperforms both CDBC-DGADM–LSTM and CDBC-DGADM–RNN across all four datasets, indicating that the GRU is better suited to this task than LSTM and RNN.
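
The comparisons behind these conclusions are simple differences between rows of Table 4. As a small worked example, the following sketch copies the mean values from Table 4 and computes the per-dataset gap between any two variants. The dictionary layout and the helper name `auc_gap` are illustrative only, and the standard deviations are omitted.

```python
from typing import Dict, Tuple

DATASETS = ("UCI Message", "Digg", "Bitcoin-Alpha", "Bitcoin-OTC")

# Mean values copied from Table 4 (standard deviations omitted).
TABLE4: Dict[str, Tuple[float, float, float, float]] = {
    "CDBC-DGADM-LSTM": (0.7859, 0.8230, 0.8704, 0.9051),
    "CDBC-DGADM-RNN": (0.7761, 0.8114, 0.8648, 0.9031),
    "CDBC-DGADM-x": (0.7789, 0.8321, 0.8579, 0.8938),
    "CDBC-DGADM-gv": (0.7912, 0.8411, 0.8603, 0.8981),
    "CDBC-DGADM": (0.8315, 0.8438, 0.9032, 0.9105),
}


def auc_gap(model_a: str, model_b: str) -> Dict[str, float]:
    """Per-dataset difference (model_a minus model_b), rounded to 4 decimals."""
    return {
        name: round(a - b, 4)
        for name, a, b in zip(DATASETS, TABLE4[model_a], TABLE4[model_b])
    }


# Role-only variant versus attribute-only variant (conclusion 2).
print(auc_gap("CDBC-DGADM-gv", "CDBC-DGADM-x"))

# Full model versus every variant (conclusions 1 and 3).
for variant in TABLE4:
    if variant != "CDBC-DGADM":
        print(variant, auc_gap("CDBC-DGADM", variant))
```

The gap between the role-only and attribute-only variants is largest on UCI Message and Digg and smallest on the two Bitcoin datasets, which is the pattern described in conclusion (2).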

4.5. Comparison of Efficiency

In response to RQ4, the training efficiency of CDBC-DGADM is evaluated. Taking UCI Messages as an example, the running time of all methods in an epoch is recorded, and the results are shown in Figure 7.

Figure 7.

The running time of each model.

In general, traditional methods such as DeepWalk+SVM, Node2vec+SVM, Spectral Clustering, and NetWalk are more efficient, while deep learning methods such as AddGraph, StrGNN, and TADDY take longer to train. It is worth mentioning that although CDBC-DGADM improves AUC only marginally on certain datasets, it requires less training time than these deep learning methods, which is advantageous when training on large-scale datasets. Additionally, the GRU used in the model is more efficient than the other sequence models at comparable accuracy levels. In conclusion, the proposed CDBC-DGADM achieves a good balance between performance and efficiency.
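
A per-epoch running time of the kind reported in Figure 7 can be measured with a wall-clock timer around each training epoch. The sketch below is a generic illustration, not the measurement code used for the figure. `train_one_epoch` stands for any model's training step, and `dummy_epoch` is a hypothetical placeholder workload.

```python
import time
from statistics import mean
from typing import Callable, List


def time_epochs(train_one_epoch: Callable[[], None], num_epochs: int) -> List[float]:
    """Return the wall-clock duration in seconds of each training epoch."""
    durations = []
    for _ in range(num_epochs):
        start = time.perf_counter()
        train_one_epoch()
        durations.append(time.perf_counter() - start)
    return durations


def dummy_epoch() -> None:
    """Hypothetical placeholder for one epoch of model training."""
    sum(i * i for i in range(100_000))


durations = time_epochs(dummy_epoch, num_epochs=3)
print(f"mean time per epoch: {mean(durations):.4f} s")
```

Averaging over several epochs reduces the influence of one-off costs such as data loading or cache warm-up on the comparison.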

5. Conclusions

Existing models for dynamic graphs learn only attribute-based behavior patterns for anomaly detection and ignore the essential differences between normal and abnormal edges. To address these problems, we proposed CDBC-DGADM. First, the model learns dual behavior patterns and obtains the final vector embeddings with a GRU. Second, to strengthen its ability to target abnormal edges, it uses contrastive learning to separate the embeddings of positive and negative samples. Finally, experiments demonstrate that CDBC-DGADM achieves a good balance between performance and efficiency.

Future work includes but is not limited to incorporating human prior knowledge into anomaly detection to improve effectiveness and efficiency and applying our model to different domains (e.g., financial fraud detection or anomaly log detection).

Footnotes

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
