Abstract
There is a growing need to perform real-time analytics on dynamic graphs in order to deliver the values of big data to users. An important problem from such applications is continuously identifying and monitoring critical patterns when fine-grained updates at a high velocity occur on the graphs. A lot of efforts have been made to develop practical solutions for these problems. Despite the efforts, existing algorithms showed limited running time and scalability in dealing with large and/or many graphs. In this paper, we study the problem of continuous multi-query optimization for subgraph matching over dynamic graph data. (1) We propose
Introduction
Dynamic graphs emerge in different domains, such as financial transaction network, mobile communication network, data center network [20–22], uncertain network [2], etc. These graphs usually contain a very large number of vertices with different attributes, and have complex relationships among vertices. In addition, these graphs are highly dynamic with frequent updates of edge insertions and deletions.
Identifying and monitoring critical patterns in a dynamic graph is important in various application domains [8] such as fraud detection, cyber security, and emergency response, etc. For example, cyber security applications should detect cyber intrusions and attacks in computer network traffic as soon as they appear in the data graph [3]. In order to identify and monitor such patterns, existing work [3,13,15] studies the continuous subgraph matching problem that focuses on
Figure 1 shows an example of continuous subgraph matching. Given a query pattern

An example of continuous subgraph matching.
However, these applications deal with dynamic graphs in such a setup that is often essential to be able to support hundreds or thousands of continuous queries simultaneously. Optimizing and answering each query separately over the dynamic graph is not always the most efficient. Zervakis et al. [30] first propose a continuous multi-query process engine, namely,
These problems of existing methods motivated us to develop a novel concept of annotated query graph (
In summary, our
We propose an efficient continuous multi-query matching system,
We define annotated query graph, in which corresponding matching results can be obtained in one pass instead of multiple.
We construct a newly data-centric auxiliary data structure based on the equivalent query tree of the annotated query graph to represent the partial solution in a compact form.
We propose an incremental maintenance strategy to efficiently maintain the intermediate results in
We experimentally evaluate the proposed solution using three different datasets, and compare the performance against the three baselines. The experiment results show that our solution can achieve up to two orders of magnitude improvement in query processing time against the sequential processing strategy.
In this section, we first introduce several essential notions and formalize the continuous multi-query processing over dynamic graphs problem. Then, we overview the proposed solution.
Preliminaries
We focus on a labeled undirected graph
(Graph Update Stream).
A graph update stream
A
(Subgraph homomorphism).
Given a query graph
Since subgraph isomorphism can be obtained by just checking the injective constraint [15], we use the subgraph homomorphism as our default matching semantics. Note that we omit edge labels for ease of explanation, while the actual implementation of our solution and our experiments support edge labels.
Based on the above definitions, let us now define the problem of multi-query processing over dynamic graphs.
Overview of solution
In this subsection, we overview the proposed solution, which is referred to as Representation of intermediate results should be compact and can be used to calculate the corresponding matches of affected queries in one pass. Update operation needs to be efficient such that the intermediate results can be maintained incrementally to quickly detect the affected queries.
The former challenge corresponds to
Algorithm 1 shows the outline of

When an edge update occurs, it is costly to conduct sequential query processing. The central idea of multi-query handling is to employ a delicate data structure, which can be used to compute matches of affected queries in one pass.
Annotated query graph
Different from the work proposed in [30] that decomposes queries into covering paths and handles updates by finding affected paths, we provide a novel concept of annotated query graph, namely,
The queries in Fig. 2(a) are overlaped and can be merged into an annotated
Note that, there exists a case that the queries in the

Annotated query graph.
Since continuous multi-query processing is triggered by each update operation on the data graph, it is more useful to maintain some intermediate results for each vertex in the data graph as
The
Note that since we will transform all edges of
For each vertex

Example of constructing
Let
Consider the edge
Based on
Note that we do not store
Figure 3(c) gives the
We rely on an incremental maintain strategy to efficiently maintain
Incremental maintenance of intermediate results
We propose an edge state transition model to efficiently identify which update operation can affect the current intermediate results and/or contribute to positive/negative matches for each affected query. The edge state transition model consists of three states and six transition rules, which demonstrates how one state is transited to another.

Maintenance strategy.
Figure 4(b)–(c) give the example of edge transition rule from
Figure 4(d)–(e) give the example of edge transition rule from
Figure 4(f) gives the example of edge transition rule from
Figure 4(g)–(h) give the example of edge transition rule from
If the state of an edge
In order to calculate the matching order,
Intuitively,
In the next stage,

As show in Fig. 3(d), the state of edge insertion
In this section, we present detailed algorithms for
MDCG construction
In this subsection, we explain
In the worst case,

Note that not all the insertion edge

Here,
In this section, we perform extensive experiments on both real and synthetic datasets to show the performance of
Datasets and query generation
In order to construct the set of query graph patterns
Our method, denoted as
We measure and evaluate (1) the elapsed time and the size of intermediate results for
Evaluating the efficiency of IncMQO
In this subsection, we evaluated the performance of

Performance on SNB1M, NYC and BioGRID.

Performance of varying the percentage of query overlap.
In Fig. 6(1) we give the results of the time efficiency evaluation when varying the parameter

Performance of varying the average query size.
In this subsection, we evaluate the impact of the average query size in

Performance of varying query database size.
In this subsection, we evaluate the impact of the size of query database on the performance of

Performance of varying the edge insertion size.
Figure 9(1)–(2) show the performance results using SNB1M for varying edge insertion size. Here, we vary the number of newly-inserted edges from

Performance of varying the edge deletion size.
Figure 10(1)–(2) show the performance results using SNB1M for varying edge deletion size. Here, we vary the number of expired edges from
Varying dataset size
Figure 11(1)–(2) show the performance results using SNB for varying dataset size. Here, we fixed

Performance of varying dataset size.
In this set of experiments, we evaluate the performance of different matching orderings on SNB1M dataset. We compare our proposed matching order with that proposed in

Comparison of different matching orders.
In this set of experiments, we evaluate the performance of single query to text the efficiency when there is no common components in the query set

Performance evaluation of single query.
In this subsection, we further compare the incremental algorithm (

Results of the comparison.
We categorize the related work as follows.
Conclusion and further work
In this paper, we proposed an efficient continuous multi-query processing engine, namely,
A couple of issues need further study. We only simply consider the scenario that all the queries have been given at the start. However, the queries set will actually be updated due to users’ demands. To this end, we are going to consider this scenario in the future work and design an efficient algorithm to deal with the updates of queries set in multiple queries processing over dynamic graph.
Footnotes
Acknowledgement
This work is partially supported by National key research and development program under Grant Nos. 2018YFB1800203, Tianjin Science and Technology Foundation under Grant No. 18ZXJMTG00290, National Natural Science Foundation of China under Grant No. 61872446, and National Natural Science Foundation of Hunan Province under grant No. 2019JJ20024.
