Abstract

Network data are ubiquitous: telecommunication networks, transport systems, online social networks, protein-protein interaction networks, and so on. Because of the huge scale and complexity of such data, earlier machine learning systems struggled to make sense of them. Multi-granular cognitive computation, in contrast, simulates the problem-solving process of the human brain: it simplifies complex problems and solves them from easier to harder. Researchers have therefore increasingly applied multi-granularity problem-solving ideas and methods to network data mining, whether intentionally or unintentionally. This paper examines the domain of network representation learning (NRL) and systematically reviews the research in this field in recent years. We find that, when dealing with the complexity of networks while pursuing efficient use of computing resources, a multi-granularity solution is a promising path that is hard to avoid. Although several surveys of NRL exist, to the best of our knowledge we are the first to survey NRL from the perspective of multi-granular computing. This paper identifies the challenges that NRL faces, and analyzes and discusses the feasibility of addressing them with multi-granular computing methodologies. Finally, some potential key scientific problems in applying multi-granular computing to NRL research are identified and discussed.

Keywords

Granular computing, network representation learning, network embedding, data mining

1. Introduction

A network is a set of items, which we will call vertices or sometimes nodes, with connections or edges between them [1]. Networks are ubiquitous in our lives, and objects connected by relationships can be abstracted as networks. Network data abstract the complex interactions within large-scale distributed information systems and exist widely in the physical world. Social activities among people, interactions between human beings, the units of social organizations, the transportation of industrial products or vehicles, etc. can all be abstracted as information networks. In the field of industrial robot fault detection, network analysis can mine hidden knowledge from an expert knowledge graph in order to build intelligent recognition algorithms for fault diagnosis in industrial robot operation [2]. Network analysis has become a non-trivial domain of big-data intelligent mining and has attracted an enormous amount of research [1, 3].

Early network analysis mainly focused on the centrality (which individuals have strong influence) and connectivity (how individuals are connected through the network) of nodes. In the following decades, with the expansion of data scale and the improvement of computing capacity, many studies shifted from analyzing the characteristics of nodes and edges in individual small-scale networks to analyzing the statistical characteristics of large-scale networks [4]. For example, statistical methods for quantifying large-scale networks took shape around 2002. Such a method identifies statistical characteristics that describe the structure and behavior of a network system, builds a mathematical network model, and predicts the behavior and local rules of the network system based on these characteristics. The academic community has made great progress on the first two goals (characterizing and modeling network structures), but it has only just begun to study the impact of network structures. In recent years, researchers have also studied heterogeneous information networks, edge weights, hyperedges and other problems [4].

Intuitively, a network or graph can be represented by tables. Consider a simple example from Singh [5]: a social network describing actors and their participation in events, also called an affiliation network [6]. This simple structure can be represented as several distinct graphs. For example, we may construct a network in which the actors are nodes and an edge links two actors who have participated in an event together. Nevertheless, for large-scale networks with millions of nodes and edges, the tabular representation makes the application of machine learning infeasible. A good representation of large-scale network data is key for subsequent analysis.

Representation is not a new concept, though. It is used in many fields to extract the features of cognitive objects such as images [7]. To extract features of graphs, traditional approaches often rely on summary graph statistics (e.g., degrees or clustering coefficients) [8], kernel functions [9], or carefully engineered features [10, 11]. Because these hand-engineered features are inflexible, i.e., they cannot adapt during the learning process, manually designing graph features is inefficient and infeasible for large-scale network analysis. Moreover, the matrix representation of a network is usually high-dimensional and sparse, so it consumes too many computational resources. More recently, researchers have sought to learn representations that encode the features of graphs through self-iteration of the model, as in image feature learning.

In recent years, machine learning methods and feature extraction technologies represented by deep learning have achieved great success [12, 13, 14, 15]. One of the key preconditions for analyzing network data with machine learning is deciding how to represent the data, that is, how to extract its features.

This article discusses the use of multi-granularity cognitive computing in network representation learning. The idea of multi-granularity cognitive computing is widely present in data mining and artificial intelligence. Granular computing simulates the way the human brain thinks: it hierarchically simplifies complex things and extracts the multi-granularity features of objects. This is of great significance for improving the performance and effectiveness of data mining algorithms. Network data are naturally complex, and network structure features have multi-granularity characteristics. Therefore, it is necessary to study and summarize the existing work on network representation learning from the perspective of multi-granularity computing. This article is based on this idea and provides a unique perspective from which to summarize the relevant papers in the NRL field. Through this article, researchers in granular computing can see how multi-granularity computing ideas are embodied and implemented in NRL, and NRL researchers may be inspired to use these ideas to achieve better results.

1.1 Network data representation and network representation learning

Graphs are a ubiquitous way to organize a diverse set of real-world information. Intuitively, a network or graph can be represented by tables that show which pairs of nodes are connected. Consider a simple example from Singh [5]: a social network describing actors and their participation in events. Such social networks are commonly called affiliation networks [6] and are easily represented by three tables holding the actors, the events, and the participation relationships. Even this simple structure can be represented as several distinct graphs. For example, we may construct a network in which the actors are nodes and an edge links two actors who have participated in an event together.
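
As a minimal sketch of the actor projection described above, the following Python snippet turns a participation table into an actor-actor graph. The table contents, the function name `project_actors` and the choice to weight an edge by the number of shared events are illustrative assumptions, not part of the example in [5].

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical participation table: one (actor, event) row per participation.
participation = [
    ("alice", "e1"),
    ("bob", "e1"),
    ("carol", "e1"),
    ("bob", "e2"),
    ("dave", "e2"),
]


def project_actors(rows):
    """Return co-participation edges {(actor_a, actor_b): shared_event_count}."""
    actors_by_event = defaultdict(set)
    for actor, event in rows:
        actors_by_event[event].add(actor)

    edges = defaultdict(int)
    for actors in actors_by_event.values():
        # Every pair of actors at the same event gets (or strengthens) an edge.
        for a, b in combinations(sorted(actors), 2):
            edges[(a, b)] += 1
    return dict(edges)


actor_graph = project_actors(participation)
```

The same table could equally be projected onto events (two events linked when they share an actor), which illustrates how one affiliation structure yields several distinct graphs.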

For large-scale networks consisting of millions or billions of nodes, the traditional network representation has billions of dimensions, which makes the application of machine learning infeasible. Unfortunately, there is no straightforward way to encode this high-dimensional, non-Euclidean information about graph structure into a feature vector. For big-data analysis with machine learning methods, the significant issues are how to represent the data and how to select its features. A good representation of the original data is key for subsequent analysis. Representation is not a new concept, though. It is used in many fields to extract the features of cognitive objects such as images. Since 1989, image features have been designed manually [7]. To extract features of graphs, traditional approaches often rely on summary graph statistics (e.g., degrees or clustering coefficients) [8], kernel functions [9] or carefully engineered features that measure local neighborhood structures [10, 11]. Because these hand-engineered features are inflexible, i.e., they cannot adapt during the learning process, manually designing graph features is inefficient and infeasible for large-scale network analysis. More recently, researchers have sought to learn representations that encode the features of graphs through self-iteration of the model, as in image feature learning. This is the motivation for network representation learning.

Network representation learning (NRL) learns a low-dimensional vector to represent each vertex of a network. In doing so, NRL encodes the attributes of each vertex, such as its structural properties, into a vector space. The high-dimensional adjacency-matrix row of each vertex is thereby represented at a coarser granular level, so that off-the-shelf machine learning methods become applicable. From the perspective of granular computing, this is essentially a transformation of data from a finer granularity level to a coarser one. Thus, NRL transforms the extremely high-dimensional representation of a large network into a low-dimensional, dense vector space that is easy for machine learning technologies to process. NRL essentially discovers the latent discriminative factors behind the data.

Because information at different scales in a network is interconnected and interacts, NRL models that can investigate the latent laws and discover the influential factors behind the data at multiple granularities may improve NRL. With respect to the common characteristics of network components and the challenges in NRL, we discuss how to design NRL models by taking advantage of human cognitive laws and network features.

1.2 The development of NRL

Traditional representation learning methods for network data are mainly based on spectral methods, optimization methods and probabilistic generative models. Spectral methods are a class of algorithms that exploit the spectral characteristics of a matrix, such as eigenvalues, eigenvectors, singular values and singular vectors [16]. They are commonly used to obtain a low-dimensional representation of a matrix [17]. For example, the classic PCA (principal component analysis) algorithm reduces dimensionality by projecting onto the leading eigenvectors of the sample covariance matrix. A network can be represented as an adjacency matrix and given as input to PCA or singular value decomposition (SVD) to obtain a low-dimensional representation of the nodes. This representation is usually of poor quality because it lacks information from within the nodes [18].
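
The spectral route described above can be sketched in a few lines of Python with NumPy. This is an illustrative sketch only: the toy graph, the function name `svd_embedding` and the scaling of the singular vectors by the square roots of the singular values are assumptions made for the example, not a method prescribed by [16, 17].

```python
import numpy as np


def svd_embedding(A, d):
    """Rank-d spectral embedding: rows of U_d scaled by sqrt of singular values."""
    U, S, _ = np.linalg.svd(A, full_matrices=False)  # singular values descending
    return U[:, :d] * np.sqrt(S[:d])


# Toy undirected graph: two triangles joined by the edge (2, 3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    A[i, j] = A[j, i] = 1.0

R = svd_embedding(A, d=2)
print(R.shape)  # (6, 2): one 2-dimensional vector per vertex
```

Only the adjacency structure enters this computation, which is consistent with the remark above that such representations ignore information carried inside the nodes.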

NRL algorithms based on optimization solve a well-designed objective function in which the vector representations of vertices in the low-dimensional space are the parameters. Y Jacob et al. propose the LSHM (latent space heterogeneous model) algorithm [19] to learn node vectors together with a linear classification function for labels. The objective function has two parts: first, the representations of adjacent nodes should be as similar as possible; second, the classification function should predict the known labels well. For predicting information diffusion in networks, the traditional approach first discovers the hidden diffusion structure from users’ behavior [20], that is, it builds a new network. Reference [21] studies low-dimensional representations of user nodes in a continuous latent space. NRL algorithms based on optimization are closely tied to specific network analysis tasks. All of them use stochastic gradient descent, which converges slowly and may lead to over-fitting.

Other research uses a probabilistic sampling process to model how network data are generated. These models are usually solved by Gibbs sampling, variational inference or the expectation-maximization algorithm. Representative algorithms include PLSA [22] and LDA [23], which are essentially probabilistic graphical models. Ramesh Nallapati et al. propose the LINK-PLSA-LDA model for academic paper citation networks [24], and Jonathan Chang et al. propose the RTM model [25]. RTM assumes that if there is an edge between two text nodes, their distributions in the topic space should be similar. Tuan MV Le et al. extend RTM and propose the PLANE model [18], which learns low-dimensional representations of topics and text nodes from a visualization perspective.

With the development of representation learning in natural language processing (NLP) [26, 27, 28], NRL researchers have adopted these methods and achieved remarkable improvements with algorithms such as DeepWalk [29], LINE [30], PTE [31] and node2vec [32]. Moreover, many NRL works are based on stochastic language models. They combine network properties with the spirit of multi-granular computing methodologies.

1.3 The challenges in NRL

The ubiquitous multi-granularity, multi-scale, multi-level, evolving and complex characteristics of network data pose great challenges to network representation learning. Bengio et al. [33] argue that raw data are generated by multiple sources of factors, and that good feature extraction can only be achieved if representation learning models learn to identify and disentangle the underlying explanatory factors hidden in the observed milieu of low-level sensory data. In network data, the discriminative information is distributed over multiple granular layers because of the hierarchical characteristics of the data.

Handling network data with millions or billions of nodes is one of the challenges in NRL. Such large-scale data raise the problem of computability. Vertices in a network interact with each other, and simply breaking any link introduces new interference and noise. It is difficult to define where analysis or representation should start. This is why traditional network analysis algorithms focus on stochastic analysis but cannot achieve high performance in subsequent network inference tasks.

Most prior work on NRL has treated this with a ‘one-size-fits-all’ approach. However, network information has multi-scale properties, and a single granularity of node representation cannot fully explain the connections between nodes. It is desirable to learn the embedding in a multi-granular model that gradually shifts the focus to the node connectivity behaviors that remain unexplained [34]. ‘One-size-fits-all’ approaches fail to explicitly capture the multiple scales of relationships in a network.

Because of the hierarchical components of networks, features that lie on multiple granular layers may be difficult to capture. When representing features that span multiple granular layers, NRL models must handle the contradiction between sparsity and interconnection. For instance, community structure, which exists in almost all network data, is an important mesoscopic description of network structure. The sparsity and interconnection of network data introduce challenges for NRL under community structure constraints. On the other hand, two nodes in different communities can have strong connections at the microscopic level because of the interconnections in networks. Recent works such as LINE [30] and GraRep [35] extend NRL to approximate higher-order proximity, but the mesoscopic description of the network is largely ignored. Furthermore, the nodes or components of a network may be created from multiple types of sources, such as text, image, video and sound. If different types of models are used to handle the heterogeneous data, a decoupling between interconnection and semantic similarity may emerge. In addition, network data accumulate and evolve quickly over time. Analysts need to assess the status of components and predict development trends in networks based on both historical and newly arriving data.

1.4 Hierarchical information in networks and multi-granular computing

The hierarchical taxonomy in NRL results is discussed by Liu et al. [36], and Ma et al. present an NRL algorithm that preserves the hierarchical taxonomy in networks [37]. Both works reveal the existence of hierarchical information in networks. Therefore, this paper categorizes research on network representation that embodies the idea of multi-granularity cognitive computing. The possibility of using multi-granular computing methodologies to create NRL models that extract the hierarchical and multi-granular features of network data is discussed in the following sections. This paper discusses representation learning algorithms for network data and sorts out the possible multi-level features of networks from the perspective of multi-granular cognitive computing.

Representation learning refers to algorithms designed to automatically capture the hidden features of the original data in the training results. The output of representation learning (also treated as the features of the original data) is used directly for subsequent inference tasks such as node classification, prediction and target recognition. Multi-granular cognitive computing, in turn, is a set of methodologies inspired by human cognitive laws. It can be used to process information gradually at multiple granularities, for example by recognizing complex things at multiple scales and layers and by providing approximate solutions to complex problems.

Granular computing has emerged as one of the fastest growing intelligent computing paradigms in the domain of cognitive intelligence and artificial intelligence [38]. It is often loosely regarded as an umbrella term covering theories, methodologies, techniques, and tools that make use of granules in complex problem solving [39]. Granulation, an operation that constructs or decomposes granules, is one of the key issues of granular computing [40]. Inspired by human ways of granulating and manipulating information, Zadeh proposes a theory of fuzzy information granulation (TFIG) [41, 42]. The theory can provide a granular representation of uncertain information about a variable of interest. Liu et al. develop a multiple-granularity concept generation model in the form of hierarchical concept trees, as shown in Fig. 1. Xu et al. develop an adaptive hierarchical clustering approach that generates a hierarchical tree [43] for mining data with hierarchical properties. Emad et al. propose a multi-connect architecture (MCA) associative memory that improves the Hopfield neural network by modifying the net architecture and the learning and convergence processes [44]. Compared with the traditional Hopfield neural network, it can learn and recognize unlimited patterns of varying size with an acceptable noise rate.

Figure 1. Hierarchical concept tree [45].

Multi-granular computation treats a complex problem as a set of sub-problems of different complexity. Following the principles of human cognition, a person may handle a complex problem from coarse to fine. From a multi-granular perspective, NRL models may process complex network analysis problems progressively and solve them on different granular layers step by step. Multi-granular computing methodologies may offer new ideas and approaches for resolving the contradictions described above. From this perspective, a model that performs representation learning on a single granular layer ignores multi-scale structural properties, and nodes do not share attributes between different granular layers. Multi-granular methodologies may inspire NRL models to handle previously unseen nodes and to process evolving networks without launching additional rounds of optimization. We believe it is significant to explore how multi-granularity methodologies are reflected in existing NRL algorithms and to analyze the issues in applying multi-granular computing to NRL. We hope to provide the NRL research community with some reference and inspiration for applying the idea of multi-granularity computation.

NRL research should also pay attention to the cognitive rules of humans and the latent multi-granular features of the network itself. The success of deep learning is a concrete demonstration of the multi-granular cognitive mechanism. Soujanya et al. [46] apply deep convolutional neural networks to natural language feature extraction and combine the feature vectors of multi-modal heterogeneous data such as text, image and audio. In feature extraction for networks, some studies also embody the multi-granular cognitive mechanism, such as GraRep [35], NEU [47], HARP [48], AROPE [49] and MOANA [50].

1.5 Multi-granular methodology in NRL algorithms

Because NRL faces the contradictions and challenges caused by the multi-granular characteristics of networks discussed above, most representation methods extract multi-granular features ineffectively, since they hinge on a ‘one size fits all’ approach to network representation learning [51]. Therefore, some earlier NRL studies have taken the hierarchical properties of network data into account, such as GraRep [35], WALKLETS [52], HARP [48] and AROPE [49]. These approaches use multi-layer representations and heterogeneous information to extract knowledge from networks.

For instance, GraRep shows that DeepWalk, as well as the skip-gram model with negative sampling [28], can essentially be optimized as a matrix factorization problem. By extending the length of the sampling path, the global structural information of the graph can be captured. From the perspective of granular computing, GraRep is thus an extension of DeepWalk that adds global knowledge to the representation. In NEU [47], the authors analyze network representation methods that can be viewed as matrix factorization. They find that the representation performs better if the high-order information of graph vertices is contained in the factorized matrix, but the complexity increases as the order of information goes higher. Higher-order information can be viewed as finer knowledge in granular computing, and adding finer information to a system inevitably brings complexity. The instances above all represent structural properties from coarser to finer. They encode more information about networks, so the richness of the extracted knowledge increases. In contrast, works such as [51] and HARP [48] represent information from finer to coarser. WALKLETS captures coarser information by sub-sampling, so that more global structural properties of the graph are captured. HARP explicitly introduces the hierarchical representation learning problem for graphs. It simplifies the original graph by collapsing nodes and edges, approximating its structure with a series of smaller graphs that contain fewer nodes and edges. This approximation of the original graph by hierarchical small-scale graphs can be seen as encoding the complex whole into multiple granular layers. To shift embedding vectors between proximities of arbitrary orders, AROPE [49] reveals the intrinsic relationship between proximities of different orders. This work can be viewed as a concrete study of network representation models on multiple granularities.
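
To make the idea of representing a network at several orders of proximity concrete, the sketch below computes the $k$-step transition matrices of a small graph and concatenates one low-rank factorization per step. It is a deliberately simplified illustration in the spirit of the matrix-factorization view discussed above: it is not the GraRep algorithm itself (in particular, it omits any reweighting of the transition probabilities before factorization), and the function names and the toy graph are assumptions made for the example.

```python
import numpy as np


def k_step_transitions(A, K):
    """Return [P^1, ..., P^K] for the row-normalized transition matrix P."""
    degrees = A.sum(axis=1, keepdims=True)
    P = A / np.maximum(degrees, 1e-12)  # guard against isolated vertices
    powers, current = [], np.eye(A.shape[0])
    for _ in range(K):
        current = current @ P
        powers.append(current)
    return powers


def multi_scale_embedding(A, K, d):
    """Concatenate a rank-d SVD embedding of each k-step transition matrix."""
    blocks = []
    for Pk in k_step_transitions(A, K):
        U, S, _ = np.linalg.svd(Pk, full_matrices=False)
        blocks.append(U[:, :d] * np.sqrt(S[:d]))
    return np.hstack(blocks)


# Path graph 0 - 1 - 2 - 3.
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

R = multi_scale_embedding(A, K=3, d=2)  # shape (4, 6): 3 scales x 2 dimensions
```

Each block of columns in `R` describes the vertices at one step length, so the concatenated vector keeps the different orders of proximity separate rather than averaging them into a single scale.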

These works can be seen as embodiments of multi-granular computing methodologies: they represent the latent knowledge in networks on multiple granular layers. In the following sections, we discuss specifically how earlier works in network analysis and representation incorporate the idea of multi-granular knowledge extraction. Considering the shortcomings of current NRL studies, we also point out the advantages of multi-granular cognitive computing that may help network representation learning. By integrating the methodology of multi-granular cognitive computing, we hope to offer some inspiration for improving the performance of NRL algorithms and exploring the essential principles of NRL, especially in time-limited, structure-constrained and data-streaming scenarios.

In this paper, we focus on recently proposed NRL methods that were developed with multi-granular computation in mind and that aim at network inference. The taxonomy of the related works is shown in Fig. 2.

Figure 2. The taxonomy of multi-granular network representation learning methods.

1.6 Contributions

Research [37, 53] shows that networks often exhibit hierarchical organization, and granular computing provides true and natural representations of such hierarchical systems. Although there are several surveys of NRL [54, 55], to the best of our knowledge we are the first to survey NRL technologies from the perspective of multi-granular computing, which is consistent with human thinking [38]. This survey reviews state-of-the-art network representation learning techniques from that perspective. It covers traditional methods of network analysis but mainly focuses on the multi-granular characteristics of network data and the resulting contradictions and challenges in the NRL field. Furthermore, it discusses the emergence of a body of work that borrows multi-granular computing methodologies for hierarchical network feature extraction to face those challenges. By doing so, we hope to provide a new and significant perspective for the NRL community and to focus on the most essential and ubiquitous multi-layer characteristics of network data. In particular, this survey makes three major contributions:

We analyze the contradictions in NRL research and, based on them, propose a new taxonomy for categorizing NRL works that can be seen as embodiments of multi-granularity computing methodologies.

We show the connections between the laws of human cognition and multi-granular computing in feature extraction, and analyze the potential advantages of applying multi-granular computing to NRL research.

We discuss the issues and deficiencies of multi-granularity methods in current NRL research and outline the prospects and potential of applying multi-granular methodologies in further NRL research.

The rest of this article is organized as follows. In Section 2, we introduce the preliminaries and definitions required to understand the models discussed later. In Section 3, we present a multi-granularity perspective on categorizing the existing representative NRL techniques. Successful applications of network representation learning are discussed in Section 4. We discuss potential research directions in Section 5, and conclude this article in Section 6.

2. Preliminaries and definitions

In this section, we first define the notations used to discuss NRL research, and then give a formal definition of the NRL problem. The common notations used throughout the survey are listed in Table 1.

Table 1. A summary of common notations

Notation	Description
$G$	The given information network
$V$	Set of vertices in the given information network
$E$	Set of edges in the given information network
$A$	Adjacency matrix
$a_{ij}$	The element of $A$ at the $i$-th row and the $j$-th column
$n$	Number of vertices, $n=\left|V\right|$
$v$	An element of $V$, i.e. a vertex of graph $G=\left({V,E}\right)$
$u$	An element of $V$, i.e. a vertex of graph $G=\left({V,E}\right)$
$\left|E\right|$	Number of edges
$m$	Number of vertex attributes
$d$	Dimension of learned vertex representations
$X$	The vertex attribute matrix
${\cal Y}$	Set of vertex labels
$\left|{\cal Y}\right|$	Number of vertex labels
$Y\in\mathbb{R}^{\left|V\right|\times\left|{\cal Y}\right|}$	The vertex label matrix

Definition 1 (Graph) A graph $G=\left({V,E}\right)$ consists of a set of vertices $V=\{v_{1},\ldots,v_{n}\}$ representing the nodes and an edge set $E=\{e_{i,j}\}_{i,j=1}^{n}$ representing the edges connecting nodes. $n=\left|V\right|$ denotes the number of vertices in graph $G$. The adjacency matrix $A\in\mathbb{R}^{\left|V\right|\times\left|V\right|}$ of graph $G$ contains the non-negative weights associated with the edges. The value on the $i$-th row and the $j$-th column satisfies $a_{ij}>0$ if there is an edge between node $v_{i}$ and node $v_{j}$. If $v_{i}$ and $v_{j}$ are not connected, then $a_{ij}=0$. The value of $a_{ij}$ ranges from 0 to 1 and indicates the strength of the relationship between node $v_{i}$ and node $v_{j}$.
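
As a small worked instance of Definition 1, the following sketch builds the adjacency matrix of a weighted graph from an edge list. The helper name `adjacency_matrix`, the 0-based vertex indices and the assumption that the graph is undirected (so that $A$ is symmetric) are choices made for this illustration; the definition itself does not require them.

```python
import numpy as np


def adjacency_matrix(n, weighted_edges):
    """Build A for an undirected graph from (i, j, weight) triples."""
    A = np.zeros((n, n))
    for i, j, w in weighted_edges:
        if not 0.0 < w <= 1.0:
            raise ValueError("edge weights are assumed to lie in (0, 1]")
        A[i, j] = A[j, i] = w  # a_ij > 0 exactly when v_i and v_j are connected
    return A


# Hypothetical graph with n = 4 vertices and three weighted edges.
edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 0.8)]
A = adjacency_matrix(4, edges)
```

Entries such as `A[0, 3]` stay at zero because the corresponding vertices are not connected.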

Definition 2 (First-order proximity) The first-order proximity is the local pairwise proximity of two nodes at the lowest level of relationship. The value of $a_{ij}$ is defined as the first-order proximity between vertices $v_{i}$ and $v_{j}$. The first-order proximity represents the foremost measure of similarity between two vertices.

Definition 3 (Second-order proximity) The second-order proximity is the similarity of the neighborhood structures of two nodes. Let $a_{i}=\left[a_{i1},\ldots,a_{in}\right]$ denote the first-order proximity between $v_{i}$ and all other vertices. Then the second-order proximity of $v_{i}$ and $v_{j}$ is defined as the similarity between $a_{i}$ and $a_{j}$. Equivalently, the second-order proximity can be measured by the 2-step transition probability from $v_{i}$ to $v_{j}$. Higher-order proximity captures more global structure by exploring the $k$-step ($k\geqslant 3$) relations between each pair of vertices.
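
Definitions 2 and 3 can be illustrated with a short sketch. The definition leaves the similarity between $a_{i}$ and $a_{j}$ unspecified, so the use of cosine similarity below is an assumption made for the example, as are the function names and the toy adjacency matrix.

```python
import numpy as np


def first_order(A, i, j):
    """First-order proximity: the edge weight a_ij."""
    return A[i, j]


def second_order(A, i, j):
    """Second-order proximity, here taken as cosine similarity of a_i and a_j."""
    a_i, a_j = A[i], A[j]
    denom = np.linalg.norm(a_i) * np.linalg.norm(a_j)
    return float(a_i @ a_j / denom) if denom > 0 else 0.0


# Vertices 0 and 1 are not linked but share the neighbors 2 and 3.
A = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
], dtype=float)

direct = first_order(A, 0, 1)    # no edge, so the first-order proximity is zero
shared = second_order(A, 0, 1)   # identical neighborhoods, so close to 1
```

The pair (0, 1) shows why both notions are needed: the two vertices have no direct relationship, yet their neighborhood structures coincide.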

Definition 4 (Network data representation learning) Formally, let $G=\left({V,E}\right)$ be a graph, where $V$ and $E\subseteq\left({V\times V}\right)$ are the sets of vertices and edges of the network, respectively. Network representation learning (NRL) aims to learn a representation vector $r_{v}\in\mathbb{R}^{d}$ for every graph node $v$, where $d\ll\left|V\right|$ is the dimension of the learned representation. Let $A\in\mathbb{R}^{\left|V\right|\times\left|V\right|}$ be the adjacency matrix. The NRL transformation is a mapping function ${\Phi}:A\mapsto r\in\mathbb{R}^{\left|V\right|\times d}$ that defines the latent representation of each node $v\in V$. The learned vertex representations should preserve the vertex proximity reflected by the network structure or by other information available in the network. In particular, from the perspective of multi-granular computing, the NRL transformation is called the granulation of the network data, and the low-dimensional representation $r_{v}\in\mathbb{R}^{d}$ of every graph node $v\in V$ is called a granule of network information.

3. Multi-granular network representation algorithms

This section first discusses traditional graph embedding methods, and then discusses multi-granularity representation learning methods for preserving structural information, community information, and content information, and for advanced network inference tasks. In different types of representation learning methods, the multi-granularity computing methodology is embodied in different ways and has different connotations.

3.1 Multi-level embedding for graphs

Definition 5 (Graph embedding) Graph embedding is a methodology aimed at representing a whole graph, along with the attributes attached to its nodes and edges, as a point in a suitable vector space. Graph embedding approaches learn a $d$-dimensional embedding dictionary $r\in\mathbb{R}^{\left|V\right|\times d}$, containing a continuous real-valued vector $r_{v}\in\mathbb{R}^{d}$ for every graph node $v\in V$. Earlier approaches to computing embeddings include Eigenmaps [56], which embeds $r_{v_{i}}$ and $r_{v_{j}}$ close to each other if the two nodes are connected (i.e. $\left({v_{i},v_{j}}\right)\in E$, or equivalently $a_{ij}=1$ [57]).

Graph embedding offers a straightforward solution by combining the representational power of symbolic data structures with the computational superiority of feature vectors [58]. It acts as a bridge between structural and statistical approaches [59, 60] and allows a pattern recognition task to benefit from the computational efficiency of state-of-the-art statistical models and tools [61], along with the convenience and representational power of classical symbolic representations. This permits the last three decades of research on graph-based structural representations in various domains [62] to benefit from state-of-the-art machine learning models and tools. For further reading on the application of graph embedding, interested readers can refer to [63].

As most existing graph embedding methods can only handle graphs whose edges have a single attribute and whose vertices have either no attributes or only symbolic attributes, researchers have designed graph embedding methods for attributed graphs with many symbolic as well as numeric attributes on both nodes and edges [64, 65, 66]. The method is named fuzzy multilevel graph embedding, abbreviated as FMGE. It preserves multi-facet information (see Fig. 3) from the global, topological, and local points of view, and it is based on a multilevel analysis of a graph for embedding it into a feature vector. FMGE employs fuzzy overlapping trapezoidal intervals to minimize the information loss when mapping from the continuous graph space to the discrete feature vector space. FMGE has built-in unsupervised learning abilities and is thus inexpensively deployable to a wide range of application domains [66].

Figure 3.

Multi-facet view of discriminatory information in graph.

The Fuzzy Multilevel Graph Embedding (FMGE) method performs a multilevel analysis of a graph to extract discriminatory information at three different levels: graph-level information, structural-level information, and elementary-level information. These three levels represent three different views of the graph, capturing global details, details on the topology of the graph, and details on its elementary building units. The feature vector of FMGE is named the Fuzzy Structural Multilevel Feature Vector, or FSMFV (see Fig. 4) [67].

Figure 4.

The feature vector of FMGE [67].

The feature vector of FMGE, i.e. the Fuzzy Structural Multilevel Feature Vector (FSMFV), consists of six parts. Graph order is the number of vertices in the graph ($\left|V\right|$). A graph vertex is an abstract representation of the primitive components of the underlying content. The order of a graph provides important discriminatory topological information: it distinguishes a small graph with few vertices from a bigger graph with a large number of vertices. Graph size is the number of edges in the graph ($\left|E\right|$), and it also provides important discriminatory information on the topological details of the graph. The embedding of node degree is the histogram of node degrees. Node degrees represent the distribution of edges in the graph and provide complementary discriminatory information on its structure and topology. The embedding of subgraph homogeneity consists of histograms of node attribute resemblance and histograms of edge attribute resemblance. For node attributes, the histogram counts the number of subgraphs at each level of node attribute similarity; it can be used to detect recurring patterns or motifs in the graph that are characterized by specific combinations of node attribute values. For edge attributes, which are features associated with each edge such as weight or direction, the histogram counts the number of subgraphs at each level of edge attribute similarity; it can be used to detect recurring patterns or motifs in the edge structure that are characterized by specific combinations of edge attribute values.
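To make the first three parts of the FSMFV concrete, the following sketch computes graph order, graph size, and a node-degree histogram for a toy graph. It is a simplification and not the FMGE implementation: it uses crisp (non-overlapping) bins where FMGE uses fuzzy overlapping trapezoidal intervals, it omits the attribute-based parts, and the function names and bin edges are hypothetical.

```python
def graph_order(adj):
    """Number of vertices |V|."""
    return len(adj)

def graph_size(adj):
    """Number of edges |E|; each undirected edge appears in two adjacency lists."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2

def degree_histogram(adj, bin_edges):
    """Crisp histogram: bin b counts nodes with bin_edges[b] <= degree < bin_edges[b + 1].

    FMGE itself uses fuzzy overlapping trapezoidal intervals; crisp bins are
    a simplification made for this sketch.
    """
    counts = [0] * (len(bin_edges) - 1)
    for nbrs in adj.values():
        degree = len(nbrs)
        for b in range(len(counts)):
            if bin_edges[b] <= degree < bin_edges[b + 1]:
                counts[b] += 1
                break
    return counts

# Toy undirected graph as adjacency lists: edges 0-1, 0-2, 1-2, 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

# Partial feature vector: [order, size, degree histogram ...].
features = [graph_order(adj), graph_size(adj)] + degree_histogram(adj, [0, 2, 4])
```

The resulting vector has a fixed length regardless of the size of the input graph, which is what allows graphs of different sizes to be compared in the same feature space.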

There are both connections and differences between graph embedding and network representation learning. The goal of graph embedding is similar to that of network representation learning, that is, to embed a graph into a low-dimensional vector space [68]. The theoretical connection between skip-gram based network embedding algorithms and the theory of the graph Laplacian is provided in [69]. However, graph embedding can be regarded as a special case of network representation learning, as graph embedding mainly works on graphs constructed from feature-represented data sets, where the proximity among nodes encoded by the edge weights is well defined in the original feature space. On the other hand, network representation learning works on naturally formed networks and targets network inference tasks such as node classification, link prediction, and visualization [70].

Although there are differences between graph embedding and NRL, multi-granular attributes exist in network data, and they are explicitly proposed and studied in the graph embedding domain [66]. This shows that the multi-granular methodology may help us analyze both graphs and networks, each of which consists of vertices and edges.

3.2 Multi-granular representation for the structure properties of network

The original network structure is usually represented by an adjacency matrix, which explicitly contains the information on the connections between all the vertices in the graph. This can be regarded as the finest-grained information representation. In DGCC [40], data is considered as knowledge at the lowest (finest) granularity level, and knowledge is considered to be the abstraction of data at different higher granularity layers. From the viewpoint of DGCC, the adjacency representation of the original network can be regarded as the knowledge in the finest granular layer. The process of NRL is essentially an extraction of knowledge from the information in the finest granular layer, where the extracted knowledge is required to contain all or part of the properties of the original network.

Typically, the purpose of network embedding is to extract the features of a graph, which can be viewed as automatic feature engineering. Unsupervised feature learning for graphs is traditionally based on the spectral properties of various matrix representations of the graph, such as the adjacency matrix and the Laplacian matrix [56]. Some representative works are dimensionality-reduction techniques such as PCA [71] and IsoMap [72]. PCA is a linear dimensionality-reduction technique and IsoMap is a non-linear one [73, 61]. These kinds of feature extraction essentially obtain higher-order knowledge of networks, while the original representations such as the adjacency matrix can be viewed as the knowledge at the lowest granular level.

Since the invention of word2vec [26, 28], the skip-gram model, which is borrowed from the natural language processing domain, has significantly advanced the research of network embedding, as shown by the emergence of the DeepWalk [29], LINE [30], PTE [31], and node2vec [32] approaches.

The most representative work inspired by word2vec, which treats the network as a document, is DeepWalk. It generates a corpus by walking randomly along the edges between vertices and optimizes its objective by maximizing the likelihood of observing each vertex's neighborhood. After DeepWalk, Tang et al. proposed the LINE (Large-scale Information Network Embedding) model. Building on DeepWalk, LINE represents the relationships between nodes as first-order and second-order proximities. Taking the KL divergence between the empirical distribution and the model distribution as the optimization objective, it learns low-dimensional vector representations of nodes by stochastic gradient descent. However, LINE only models the first-order and second-order linkage information and ignores higher-order information. Neither DeepWalk nor LINE has a clear winning sampling strategy that works across all networks and all prediction tasks. In 2016, node2vec was proposed to overcome this limitation by designing a flexible objective that is not tied to a particular sampling strategy and that provides parameters to tune the explored search space.
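The corpus-generation step described for DeepWalk can be sketched as follows. This is a minimal illustration assuming uniform truncated random walks and a symmetric context window; the function names, the walk length, and the number of walks per node are hypothetical, and the skip-gram training that would consume the resulting pairs is omitted.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` vertices starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk

def build_corpus(adj, walks_per_node, length, seed=0):
    """Treat each walk as a 'sentence'; the list of walks is the corpus."""
    rng = random.Random(seed)
    nodes = list(adj)
    corpus = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in random order on each pass
        for v in nodes:
            corpus.append(random_walk(adj, v, length, rng))
    return corpus

def skipgram_pairs(walk, window):
    """(center, context) pairs for every vertex within `window` steps in the walk."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs

# Toy undirected graph as adjacency lists: edges 0-1, 0-2, 1-2, 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = build_corpus(adj, walks_per_node=2, length=5)
pairs = skipgram_pairs(corpus[0], window=2)
```

Each walk plays the role of a sentence and each vertex the role of a word, so the pairs can be fed to any skip-gram trainer.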

GraRep [35] extends the length of the random-walk paths used in DeepWalk. It inspects the global transition probability matrix and increases the amount of information contained in the random walk, and then uses matrix factorization for optimization. In this way, finer information is encoded and the model represents more detailed structural properties of networks. It is reasonable that GraRep has higher computational complexity, because, from the perspective of multi-granular computing, encoding finer knowledge introduces higher complexity [74].
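The growth in cost can be seen in a sketch of the $k$-step transition matrices that this family of methods inspects. The code below only computes the powers $P^{1},\ldots,P^{k}$ of the row-normalized adjacency matrix; the subsequent transformation and factorization of each matrix are omitted, and the function names are hypothetical.

```python
def row_normalize(A):
    """1-step transition matrix P: each row of A divided by its row sum."""
    P = []
    for row in A:
        total = sum(row)
        P.append([x / total if total else 0.0 for x in row])
    return P

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def k_step_transitions(A, k_max):
    """Return [P^1, P^2, ..., P^k_max]; each extra step costs one dense matrix product."""
    P = row_normalize(A)
    powers = [P]
    for _ in range(k_max - 1):
        powers.append(matmul(powers[-1], P))
    return powers

# Path graph 0 - 1 - 2.
A = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
P1, P2, P3 = k_step_transitions(A, 3)
```

In this toy path graph, vertices 0 and 2 are not adjacent, so the 1-step probability between them is zero while the 2-step probability is positive. Every additional order requires another dense $\left|V\right|\times\left|V\right|$ product, which is one source of the higher complexity noted above.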

Yang et al. [47] analyze the NRL algorithms that can be viewed as matrix factorization and show that this kind of NRL algorithm can be improved when higher-order proximities are encoded into the proximity matrix. However, an accurate computation of high-order proximities is time-consuming and thus not scalable to large-scale networks. They propose the network embedding update (NEU) algorithm, which builds on lower-order proximities to avoid this computational cost, but the algorithm can only process unweighted and undirected graphs. Zhang et al. [49] propose the AROPE model, which supports shifts across arbitrary orders with a low marginal cost. This can be seen as a concrete work on shifting an NRL model between different granularities. AROPE suffers from the need to search for the optimal hyperparameters for each network inference task.

We can see some manifestations of multi-granular methodologies in the factorization of the adjacency matrix. For instance, LINE encodes only the first-order and second-order information, which is less than what DeepWalk encodes. In the process of generating the corpus, GraRep encodes nodes along longer paths to enrich the information contained in the granule, which yields finer perception at the price of higher computational cost. The same trade-off is shown by the NEU algorithm.

Figure 5.

Heatmap of cosine distance from vertex $v_{35}$ (shown by arrow) in the Cora network [51].

Figure 6.

Illustration of graph coarsening algorithms [48].

From the perspective of how the structural information in a granule changes, some works increase the information capacity by increasing the depth of the random walk or the number of transition matrices, which increases the amount of computation. In methods such as GraRep and NEU, refining the granularity improves accuracy but also increases the computational overhead, and the model becomes infeasible for large-scale networks (over 1 million nodes). How to find a suitable granularity of information, and thus a compromise between precision and performance, is therefore an open problem. In the framework of matrix decomposition, we must measure the refinement of granularity caused by the number of transition matrices and answer questions such as: how much does the computation increase, and is the increase linear? Can the computation time be predicted in advance from the number of nodes to be examined? DeepWalk is essentially a matrix decomposition problem. In the DeepWalk framework, how to use approximate solutions to keep precision within a reasonable range while reducing resource consumption is also worth studying.

In addition to the methods that increase the information in the granule to obtain a finer representation of network knowledge, works that, in contrast, decrease the information in the granule and transform the knowledge from finer to coarser have also been proposed in recent years. B. Perozzi et al. propose the WALKLETS algorithm [51], which extracts knowledge of the network's structural properties from finer to coarser by subsampling the sequences produced by random walks. It can be seen as the reverse of GraRep, which encodes finer structural knowledge by creating $k$-length paths of vertices via transition matrix multiplication. Afterward, WALKLETS extracts the network representation vectors at a specific level of granularity through matrix factorization. This work shows that the selection of granularity is significant and that a finer representation is not always better. They also define multi-scale representation learning: given a graph $G=\left({V,E}\right)$, learn a family of $k$ successively coarser representations, $X_{1},\ldots,X_{k}$, where $X_{k}\in\mathbb{R}^{\left|V\right|\times d}$ captures the view of the network at scale $k$. Multi-scale representation learning essentially extracts the knowledge of a network at multiple granular levels. By calculating the cosine distance from vertex $v_{35}$ through a series of successively coarser representations, we obtain the heatmaps shown in Fig. 5. Fig. 5(a), Fig. 5(b) and Fig. 5(c) correspond to WALKLETS representations derived from the $A^{1}$, $A^{3}$ and $A^{5}$ powers of the adjacency matrix respectively. Deeper red points are closer in cosine distance in the latent representation space.
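The subsampling idea behind this finer-to-coarser transformation can be sketched as follows: from each random walk, keep only the pairs of vertices that are exactly $k$ steps apart, so that each scale $k$ gets its own training corpus. This is a simplified reading of the description above, not the reference implementation of WALKLETS, and the function names are hypothetical.

```python
def skip_pairs(walk, k):
    """Pairs of vertices exactly k steps apart in one random walk."""
    return [(walk[i], walk[i + k]) for i in range(len(walk) - k)]

def multiscale_pairs(walks, scales):
    """One pair corpus per scale; a separate representation X_k is learned from each."""
    return {k: [pair for walk in walks for pair in skip_pairs(walk, k)]
            for k in scales}

# Two example walks (vertex ids); in practice these come from random walks on G.
walks = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0]]
corpora = multiscale_pairs(walks, scales=[1, 2, 3])
```

Larger values of `k` relate vertices that are farther apart, giving a coarser view of the network, and the number of pairs per walk shrinks as `k` grows.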

Unlike most NRL methods, the approach in [57] explicitly models an edge as a function of node embeddings, which drastically reduces the dimensionality of the final node representation. The method markedly improves performance on link-prediction tasks: it uses only 8 or 16 dimensions per node and reaches the same or a higher AUC than baseline methods that need 128 or more dimensions. This shows that applying multi-granularity computation methodologies can decrease computational cost while matching or exceeding the precision of the baseline methods.

Among the works that go from finer to coarser, such as WALKLETS and HARP, WALKLETS achieves multi-granular NRL across scales. It obtains the different granularities by sub-sampling the raw data. The multi-granularity method is not integrated into the computation model, because the later processing model is fixed for a given sampling result. This differs from typical multi-granularity computation models such as deep learning, which form a multi-granularity solution layer by layer from the original data; there, the granular layers are related yet distinct, and together they form a unified whole. The validity of WALKLETS stems from the fact that network topology does have multi-scale characteristics, as do network nodes. However, the WALKLETS model only uses simple sub-sampling and a fixed post-processing model, which severs the original connections between the granular layers. This is the limitation of WALKLETS: it fails to achieve the effect that deep learning has achieved in image, speech, and NLP tasks.

Another work, proposed by H. Chen, B. Perozzi et al., is HARP [48], a method for learning low-dimensional embeddings of a graph's nodes that preserve higher-order structural features. They show that DeepWalk, LINE, and node2vec have at least two main disadvantages: (1) higher-order graph structural information is not modeled, and (2) their stochastic optimization can fall victim to poor initialization. HARP finds smaller graph approximations to serve as initial representations. They propose a general meta-strategy to improve state-of-the-art neural algorithms for embedding graphs, including DeepWalk, LINE, and node2vec. Nevertheless, the HARP model lacks a criterion for evaluating the granular level of a granule. Such a criterion is necessary to show and improve the granulation quality of models. Figure 6 illustrates the edge collapsing strategies used when coarsening the original graph: Fig. 6(a) shows edge collapsing on a graph snippet, Fig. 6(b) shows how edge collapsing fails to coalesce star-like structures, and Fig. 6(c) shows how the star collapsing scheme coalesces the same graph snippet efficiently.
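The coarsening step can be illustrated with a small sketch of edge collapsing. It assumes a greedy matching over the edge list, which is one simple way to pick the edges to collapse and not necessarily the selection rule used in HARP; the function name and data layout are hypothetical.

```python
def edge_collapse(edges, num_nodes):
    """Merge the endpoints of a greedily chosen set of disjoint edges into supernodes.

    Returns (mapping, coarse_edges), where mapping[v] is the supernode id of v.
    """
    mapping = [None] * num_nodes
    next_id = 0
    for u, v in edges:
        # Collapse an edge only if neither endpoint has been merged yet.
        if u != v and mapping[u] is None and mapping[v] is None:
            mapping[u] = mapping[v] = next_id
            next_id += 1
    for v in range(num_nodes):
        if mapping[v] is None:  # unmatched vertices become their own supernode
            mapping[v] = next_id
            next_id += 1
    coarse = set()
    for u, v in edges:
        cu, cv = mapping[u], mapping[v]
        if cu != cv:
            coarse.add((min(cu, cv), max(cu, cv)))
    return mapping, sorted(coarse)

# A path 0-1-2-3 halves in size after one pass.
path_map, path_edges = edge_collapse([(0, 1), (1, 2), (2, 3)], 4)

# A star with center 0 barely shrinks: only one edge can be collapsed.
star_map, star_edges = edge_collapse([(0, 1), (0, 2), (0, 3), (0, 4)], 5)
```

The star example mirrors the situation in Fig. 6(b): because every edge shares the center vertex, only one edge can be collapsed per pass, which is the motivation for a separate star collapsing scheme.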

In HARP, edge collapsing is used for coarsening, and refinement is then performed through layer-by-layer merging. However, HARP adopts a uniform reduction method for the whole network, which is problematic: such a global reduction erases the relationship between local nodes and the overall behavior of the larger community. For example, the activity of a small group may affect the speech of the larger population. HARP thus still has deficiencies in the expression and correlation of structural information across multiple granular layers.

The cases above show concrete embodiments of the multi-granularity idea in the study of NRL for preserving structural properties. Because networks have multi-granularity characteristics, multi-granularity methods such as GraRep, NEU, WALKLETS, and HARP have been successfully applied in NRL to preserve structural properties. They provide good inspiration for improving the efficiency of large-scale network data processing and the performance of algorithm models in further data mining tasks.

3.3 Multi-granular representation for the community properties of network

In the research field of complex network analysis, the properties of community structures in networks have been studied by many researchers. Community structure, as a mesoscopic level of organization, is a common feature of networks [75, 76, 77, 78]. Peter J. Mucha et al. developed a methodology to remove these limits, generalizing the determination of community structure via quality functions to multi-slice networks that are defined by coupling multiple adjacency matrices (CMAM) [79].

Many works propose methods for detecting the community structure in complex networks and show the hierarchical organization of communities [80, 81, 82]. Hierarchical organization in link communities therefore exists widely in complex networks, which makes it important to study representation learning methods that can preserve hierarchical community structures. Some works on NRL consider preserving the structural properties of communities. However, no method that explicitly represents community properties in a hierarchical way has been seen so far, so the hierarchical organization that widely exists in the link communities of networks may be ignored.

Wang et al. [83] assume that if the representation of a node is similar to that of a community, the node has a high propensity to belong to this community. They introduce an auxiliary community representation matrix to bridge the representations of nodes with the community structure. They propose a nonnegative matrix factorization model (M-NMF) for network embedding that preserves both the microscopic structure and the mesoscopic community structure [75]. They adopt the nonnegative matrix factorization (NMF) model [84] to factorize the pairwise node similarity matrix and learn the representations of nodes, so that the microscopic structure, i.e., the first-order and second-order proximities of nodes, is preserved. Meanwhile, the community structure is detected by modularity maximization [85].
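For readers unfamiliar with the quantity being maximized, the sketch below evaluates the standard Newman modularity $Q=\frac{1}{2m}\sum_{ij}\left(a_{ij}-\frac{k_{i}k_{j}}{2m}\right)\delta\left(c_{i},c_{j}\right)$ of a given partition, where $k_{i}$ is the degree of $v_{i}$ and $m$ is the number of edges. It only scores a fixed community assignment; it does not perform the maximization, and it is not the M-NMF objective. The function name and toy graph are assumptions for this example.

```python
def modularity(A, communities):
    """Newman modularity Q of a partition of an undirected graph.

    A is a symmetric adjacency matrix; communities[i] is the community label of node i.
    """
    n = len(A)
    degrees = [sum(row) for row in A]
    two_m = sum(degrees)  # twice the number of edges
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += A[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the single edge 2-3.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(A, [0, 0, 0, 0, 0, 0])  # everything in one community
```

Splitting the graph into its two triangles gives a clearly positive score, while placing all nodes in a single community gives zero, which is why maximizing $Q$ favors the mesoscopic structure.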

Figure 7.

WALKLETS captures multiple scales of social relationships [51].

It is desirable to have a family of representations that captures the full range of an individual's community membership. WALKLETS shows some results on this topic (see Fig. 7). WALKLETS captures multiple scales of social relationships like those shown in Fig. 7(a). The scale of a community is illustrated as a heatmap on the original graph in Fig. 7(b) and Fig. 7(c). Color depicts cosine distance to a single vertex, with red indicating close vertices (low distance) and blue indicating far vertices (high distance). Figure 7(b): in a fine-grained representation, only immediate neighbors are near the input vertex. Figure 7(c): in a coarse representation, all vertices in the local cluster of the graph are close to the input vertex. The subgraph is taken from the Cora citation network [51].

Many NRL methods aim to preserve the local structure of a node, including neighborhood structure, high-order proximity, and community structure. However, no work that explicitly represents community properties in a hierarchical way has been proposed, although multi-scale communities, the hierarchical organization of communities, and multi-scale community detection methods have all been studied and discussed in earlier work. As a result, the hierarchical organization that widely exists in the link communities of networks may be ignored.

3.4 Multi-granular representation for the content in network

Granular computing is a broad concept. In addition to the structure and community information in the granules obtained by NRL algorithms, the content in the network can also be regarded as information that should be contained in an information granule. In reality, network vertices contain rich information (such as text) that cannot be handled well by the algorithmic frameworks of typical representation learning methods. In this section, we discuss information network representation methods that consider the content of nodes and edges in the network. Unlike in the NRL methods discussed above, the connotation of multi-granularity here is neither the variation in the number of nodes nor the scale of communities. Instead, the content or side information on nodes and edges is regarded as another kind of information granule.

With the proliferation of rich graph content, such as user profiles in social networks and gene annotations in protein interaction networks, it is necessary to consider both the structure and the content information of a graph for high-quality graph clustering. In the field of graph clustering, for the task of content-enriched graph embedding, AGC [87] takes the side information into account: it quantifies vertex-wise attribute proximity into edge weights and employs truncated, attribute-aware random walks to learn the latent representations of vertices, so that the localized structural and attribute information of vertices is encoded.

Le et al. [18] join the various representations of a document (words, links, topics, and coordinates) within a generative model and learn the representation of nodes in the topic space based on the Relational Topic Model (RTM) [25]. They estimate the hidden representations of the vertex associated with a text document through maximum a posteriori (MAP) estimation using the EM algorithm [88]. PTE [31] addresses the representation problem of large-scale heterogeneous text networks and uses the learned vectors for prediction tasks. PTE adds different types of information as new kinds of information granules: it encodes three types of text networks, which can be seen as three kinds of information granules. Qiu et al. [69] show that all of the aforementioned models with negative sampling can be unified into the matrix factorization framework with closed forms. Their analysis and proofs show that, as an extension of LINE, PTE can be seen as the joint factorization of multiple networks' Laplacians. From the perspective of multi-granularity computing, this can be viewed as mixing several types of information into the granule when granulating the text information in the network.

Figure 8.

(a) Deepwalk as matrix factorization. (b) Text associated matrix factorization in TADW [86].

Yang et al. propose TADW [86]. TADW (Text-associated DeepWalk) incorporates text features of vertices into network representation learning under the framework of matrix factorization. Building on the matrix factorization view of DeepWalk (see Fig. 8(a)), they introduce text information into matrix factorization (MF) for NRL. Figure 8(b) shows the main idea of this method: factorize the matrix $M$ into the product of three matrices, $W\in\mathbb{R}^{k\times\left|V\right|}$, $H\in\mathbb{R}^{k\times f_{t}}$, and the text features $T\in\mathbb{R}^{f_{t}\times\left|V\right|}$. The $2k$-dimensional representations of vertices are then obtained by concatenating $W$ and $HT$. TADW suffers from high computational cost, and the node attributes are simply incorporated as unordered features, which loses much semantic information.
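The shapes involved in this factorization can be checked with a short sketch. It assumes the three matrices are combined as $W^{\top}HT$ to approximate $M$, consistent with the dimensions given above, and it only shows the reconstruction and the concatenation step; learning $W$ and $H$ by optimization is omitted. The toy values and function names are hypothetical.

```python
def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def tadw_reconstruct(W, H, T):
    """Approximation of M as W^T (H T); result is |V| x |V|."""
    return matmul(transpose(W), matmul(H, T))

def tadw_embeddings(W, H, T):
    """Concatenate column v of W with column v of H T: one 2k-dimensional vector per vertex."""
    HT = matmul(H, T)
    num_vertices = len(W[0])
    return [[row[v] for row in W] + [row[v] for row in HT]
            for v in range(num_vertices)]

# Toy sizes: k = 2, |V| = 3, f_t = 4 (values chosen by hand, not learned).
W = [[1, 0, 2],
     [0, 1, 1]]                      # k x |V|
H = [[1, 0, 0, 1],
     [0, 1, 1, 0]]                   # k x f_t
T = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 1],
     [1, 1, 0]]                      # f_t x |V|

M_hat = tadw_reconstruct(W, H, T)
R = tadw_embeddings(W, H, T)
```

Each vertex ends up with $k$ coordinates that come from structure alone and $k$ coordinates that pass through its text features, which is how the text enters the representation.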

In order to increase the discrimination in the task of node classification, MMDW (Max-margin DeepWalk) uses semi-supervised learning for NRL [89], based on the matrix factorization NRL model and a max-margin classifier. From the perspective of multi-granular computing, the label information can be viewed as additional granules in the NRL process. Similarly, DDRW (Discriminative Deep Random Walk) [89] jointly trains the DeepWalk model and a max-margin classifier to improve the performance of node classification.

Another work that focuses on the differences between relationships, using the associated text information in NRL, is CANE [90]. CANE uses text information to describe the different styles of relationships in social networks, such as those between co-authors of papers. The authors assume that the representation vector of each node consists of a text representation and a structural representation, so that the node representation encodes context information, because the text representations are correlated with the neighbors in the network. Convolutional neural networks are used to encode the text information of two nodes connected by an edge. Lu et al. [91] consider the embedding of relational structure in heterogeneous information networks. This is a concrete work applying multi-granularity computing methodologies to merge the structural granule with the content granule in NRL models.

Pan et al. propose TriDNR [92], a tri-party deep network representation model that uses information from three parties, namely node structure, node content, and node labels (if available), to jointly learn an optimal node representation. As a result, the learned representations are enhanced by three sides of network information. LANE [93] incorporates label information into the representation of networks based on spectral techniques.

Figure 9.

Multi-layer Graph Convolutional Network (GCN) with first-order filters [95].

The graph convolutional network (GCN) models (see Fig. 9) that have appeared in recent years generalize well-established neural models such as RNNs and CNNs to arbitrarily structured networks, and they have achieved remarkable success. From RNNs and CNNs, GCNs [94, 95, 96, 97, 98] borrow the idea of stacking multiple connected layers to extract multi-granular features of objects, and these works are typical embodiments of multi-granular computing methodologies. These models mainly aggregate information over the neighborhood of each node, together with node features such as text data or protein-protein interactions, to generate representations. They extract the local structural features between a node and its neighbors through operations similar to the convolutional kernels of CNNs. In addition, sharing parameters among nodes increases the efficiency of model training.
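The neighborhood aggregation and parameter sharing described above can be sketched with one commonly used layer rule, $H^{(l+1)}=\mathrm{ReLU}\left(\tilde{D}^{-1/2}\tilde{A}\tilde{D}^{-1/2}H^{(l)}W^{(l)}\right)$ with $\tilde{A}=A+I$. Treating this as the rule behind Fig. 9 is an assumption of this example, and the fixed weights, toy features, and function names are hypothetical; a real model would learn the weights by gradient descent.

```python
import math

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(A)
    A_tilde = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in A_tilde]
    return [[A_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def gcn_layer(A_hat, H, W):
    """One layer: aggregate neighbor features, apply shared weights W, then ReLU."""
    Z = matmul(matmul(A_hat, H), W)
    return [[max(0.0, z) for z in row] for row in Z]

# Path graph 0 - 1 - 2 with 2-dimensional node features.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
W0 = [[1.0, -1.0],
      [0.5, 1.0]]   # hypothetical fixed weights, shared by all nodes
W1 = [[1.0],
      [1.0]]

A_hat = normalized_adjacency(A)
H1 = gcn_layer(A_hat, X, W0)   # each node sees its 1-hop neighborhood
H2 = gcn_layer(A_hat, H1, W1)  # stacking a second layer reaches 2 hops
```

Stacking layers widens the receptive field by one hop per layer, which is the layer-by-layer granulation referred to in the text, and the same weight matrices are applied at every node.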

The content of nodes and edges is incorporated into the network representation by these approaches through various strategies. The side information can be viewed as an additional information granule for obtaining finer knowledge of networks, and the accuracy of a model should be higher at a finer level of granularity.

The multi-granularity characteristics of this information should also be considered and captured progressively. Although these models address the problem of multiple data sources, the data may also have latent multi-level features, which existing models do not consider. If the multi-level features were considered while the granule is refined by side information, the multi-granularity properties of both the side information and the structural information could be captured simultaneously by the NRL model.

3.5 Multi-granular representation for advanced information in the network

Network analysis and mining tasks go far beyond node classification, link prediction, and community detection. There are also many specific and more advanced analysis tasks, such as modeling information diffusion mechanisms [99], information cascades [100], and anomaly detection [101]. Network representation learning that preserves advanced information tends to consider the representation jointly with its specific network inference task. Methods for representing the structural properties of networks must be designed while the domain knowledge of the specific task is also taken into account. This requires researchers to have domain knowledge, and model design needs to consider the application scenarios. This section discusses the embodiment of multi-granularity ideas in the NRL methods that contain advanced information.

In DGCC [40], data is considered to be knowledge at the lowest granularity level, while knowledge is considered to be the abstraction of data at different granularity layers. Researchers combine advanced network analysis tasks with the domain knowledge for these tasks in order to obtain network representations that contain the properties needed by the advanced tasks. Wang et al. discussed the relationship between the prior knowledge of domain experts and the knowledge mined from data [102]. They address these basic issues of data mining from the viewpoint of informatics [103].

Domain-oriented knowledge is information at a higher granularity. Here, multi-granularity network representation means that the domain knowledge, as information at the higher granularity, is combined with the structural properties of networks, as information at the lower granularity.

Information diffusion [99] is a ubiquitous phenomenon on the web, especially in social networks. Bourigault et al. [21] propose a social network embedding algorithm for predicting information diffusion. The basic idea is to map the observed information diffusion process onto a heat diffusion process modeled by a diffusion kernel in a continuous space. On the other hand, for the task of popular topic detection, Leskovec et al. [104] show that the temporal dynamics of the most popular topics in social media are made up of successive bursts of popularity. Their analysis is based on the work of Kleinberg [105], which yields a nested representation of the set of bursts that imposes a hierarchical structure on the overall stream. If this hierarchical structure is considered in representation learning, there may be some improvement in information diffusion tasks.

Information cascades have been identified as a major factor in almost every plausible or disastrous social network phenomenon, such as viral marketing and the diffusion of innovation. Predicting the increment of cascade size after a given time interval has therefore become significant in social analysis [100]. Unlike previous work, which depends on bags of hand-crafted features to represent the cascade and network structures, Li et al. [106] use the idea of network representation learning to build an end-to-end model that predicts information cascades. Once the representation of a cascade is obtained, a multi-layer perceptron [107] can be adopted to output the final predicted size of the cascade. However, because information cascades have multi-scale properties [108], any work that explicitly extracts multi-granular knowledge from them will be significant.

Table 2
Summary of the key works of each section of the survey

Sections for multi-granular NRL	Key works of each section
Multi-level embedding for graphs	SSGEAL [63], Fuzzy-interval approaches [64, 65, 66], Graph embedding [68]
Multi-granular representation for the structure properties of network	PCA [71], IsoMap [72], DeepWalk [29], LINE [30], PTE [31], node2vec [32], GraRep [35], AROPE [49], HARP [48]
Multi-granular representation for the community properties of network	WALKLETS [51], Community feature extraction [75, 76, 77, 78], CMAM [79], M-NMF [83], NMF [84]
Multi-granular representation for the content in network	AGC [87], RTM [25], TADW [86], MMDW [89], CANE [90], TriDNR [92], GCNs [94, 95, 96, 97, 98]
Multi-granular representation for advanced information in the network	NRL for information diffusion [21], NRL for popular topic detection [104, 105], DeepCas [106]

To the best of our knowledge, other network representation approaches for advanced information, such as anomaly detection, do not exhibit multi-granular properties; that is, there is no work on multi-granular-property-preserving network representation learning for anomaly detection in graphs. A representative work on anomaly detection based on network representation learning was proposed by Hu et al. in 2016 [101]. A summary of the key research works mentioned in the above sections for multi-granular NRL analysis is given in Table 2.

4. Application

We look into this topic to find whether any multi-granular methods exist for applying network representations to specific network inference tasks. Network representations can be applied in multiple ways. The important tasks in network analysis involve prediction and classification over nodes and edges, such as node classification, link prediction, community detection, and visualization. There are certainly other applications that have received less attention, and we discuss them where needed.

4.1 Node classification

Node classification predicts the most probable labels of nodes in a network [109]. For instance, in a social network we may need to predict the interests of a node based on its structural properties [110, 111], and in a protein-protein interaction network we may need to predict the functional labels of proteins [112]. This is the most common application of NRL. Given a network in which some nodes have known labels, the node classification problem is to assign the remaining nodes to classes. Classification accuracy can increase if the community properties of nodes are considered, as in M-NMF [83], whose authors reported higher performance than DeepWalk, LINE [30], GraRep [35] and node2vec [32] on node classification. Through this, they verified the necessity of introducing mesoscopic community structure into network representation learning. Because communities have multi-scale properties, we may obtain different classification results if we consider the community structure at different granular levels when representing networks in a low-dimensional vector space.

4.2 Link prediction

Link prediction has attracted a large amount of research as a fundamental problem in network data mining [10, 113]. Most often, some links are observed and one attempts to predict unobserved links, or there is a temporal aspect: a snapshot of the set of links at time $t$ is given and the goal is to predict the links at time $t+1$. The common setup [32] is to “hold out” test edges $E_{\textit{test}}\subset E$ and train on the remaining $E_{\textit{train}}=E-E_{\textit{test}}$. Structure-preserving representations should retrieve the “held-out” $E_{\textit{test}}$ with high accuracy. In a wide variety of domains, such as genomics, link prediction helps to discover novel interactions between genes [114]. It can also identify real-world friends in social network analysis [115]. Getoor et al. summarize the main approaches to link prediction [116] and show that model-based probabilistic approaches come at a computational price: exact inference is generally intractable, so approximate inference techniques are necessary.
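
The hold-out setup described above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph given as a list of node pairs; the function name `split_edges`, the 20% test fraction and the toy edge list are our own assumptions rather than the protocol of any particular paper.

```python
import random

def split_edges(edges, test_fraction=0.2, seed=0):
    """Return (E_train, E_test) with E_test a random subset of E and E_train = E - E_test."""
    unique_edges = sorted(set(edges))
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(unique_edges)
    n_test = max(1, round(len(unique_edges) * test_fraction))
    e_test = set(unique_edges[:n_test])
    e_train = set(unique_edges[n_test:])
    return e_train, e_test

# Hypothetical toy graph with ten undirected edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4),
         (4, 5), (3, 5), (1, 4), (0, 5), (2, 5)]
e_train, e_test = split_edges(edges)
print(len(e_train), len(e_test))
```

A representation is then learned from `e_train` only, and the held-out pairs in `e_test` are scored against node pairs that are not connected in the graph.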

Network representations capture the implicit structural properties of networks, which are highly relevant to link prediction and inference. Hence, applying network representations can improve link prediction. Various studies on network representation learning report their performance on the link prediction task [70]. Moreover, because networks have multi-granular structural properties, multi-granular network representation may allow link prediction to be carried out at multiple granularities. For instance, when network data arrive as a stream, some scenarios require tasks such as link prediction to be completed within a limited time. A coarser representation of the network data, which consumes fewer computational resources, may meet that requirement.

Table 3
Performance of typical NRL algorithms covered in this paper

Algorithm	Micro-F1 (%) of node classification on CiteSeer [118]	Micro-F1 (%) of node classification on Cora [95]	ROC AUC of link prediction on BlogCatalog [117]
M-NMF [83]	84.7	73.9	0.825
DeepWalk [29]	64.7	67.2	0.914
node2vec [32]	71.3	75.1	0.932
LINE [30]	73.1	76.6	0.927
GraRep [35]	82.2	78.4	0.910
NEU [47]	85.4	81.1	0.948
AROPE [49]	83.0	78.5	0.924
WALKLETS [52]	69.2	60.3	0.930
HARP [48]	82.9	78.1	0.949

The performance of some typical network representation learning algorithms is listed in Table 3. To make a fair comparison, we selected the Cora [95] and CiteSeer [118] datasets to compare these algorithms on the node classification task, since the original papers of these algorithms include experimental results on these two datasets. We also list the performance of these algorithms on the link prediction task, which is generally reported as the area under the ROC curve (AUC). Note that the BlogCatalog [117] dataset is used for the link prediction comparison.
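
For readers less familiar with the two metrics in Table 3, the following Python sketch computes them from their definitions. It assumes multi-label ground truth given as one label set per node and link scores given as two lists, one for held-out edges and one for non-edges; the toy labels and scores are invented for illustration and are unrelated to the values in the table.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1: pool true/false positives and false negatives over all nodes."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        tp += len(truth & pred)
        fp += len(pred - truth)
        fn += len(truth - pred)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)

def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical labels for four nodes and scores for five node pairs.
truth = [{"ml"}, {"db"}, {"ml", "ir"}, {"ir"}]
pred = [{"ml"}, {"ml"}, {"ml"}, {"ir"}]
f1 = micro_f1(truth, pred)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3])
print(round(f1, 3), round(auc, 3))
```

The pairwise form of AUC used here is quadratic in the number of scored pairs, which is adequate for a sketch but not for large graphs.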

4.3 Community detection

Community detection, also called group detection in the link mining area [116], aims to cluster the nodes of a graph into groups (clusters) that share some common characteristics. Nodes within the same cluster are more similar to each other than to nodes in different clusters. Traditionally, many approaches have been proposed in various research communities to address the node clustering problem in graphs. For graphs with a single type of node and edge and without attributes, agglomerative or divisive clustering methods are used. Deterministic methods such as blockmodeling [6], spectral graph partitioning [104, 119], and edge betweenness [120, 121, 83] separate the graph into clusters, whereas stochastic blockmodeling, which comes from social network analysis, assumes that the observed social network is a realization of a pair-dependent stochastic blockmodel [1]. Several generalizations and extensions of stochastic blockmodeling have been proposed [122, 123, 124]. To exploit multi-relational data to detect indicators of collaboration, Adibi et al. [125] propose a hybrid approach that initially posits potential groups using knowledge-based reasoning techniques. A generative model for multi-type link generation was proposed by Kubica et al. [126, 127]. Wang et al. [128] propose a generalization of the general stochastic blockmodeling approach that allows joint inference of groups and topics based on observed relationships and their textual attributes. Such a model provides a mechanism to connect an observed relationship with its underlying context.

Recently, network representation learning research has provided new approaches to group detection and community detection, which can be achieved by directly clustering the learned node representations. Many typical clustering methods, such as K-means [129], can be applied directly to cluster nodes based on their learned representations. Many researchers have tested their network representation learning methods on the task of community detection [130, 131]. For instance, Wang et al. [83] evaluated their community-preserving network embedding on node clustering. Network datasets such as YAGO [132] and Freebase [133] are used for experiments.
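
Clustering learned node vectors with K-means can be sketched as follows. This is a plain implementation of Lloyd's iterations in pure Python, assuming the embeddings are already available as tuples of floats; the farthest-first initialisation is a simple deterministic choice made here for reproducibility, and the toy vectors are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first_centers(points, k):
    """Deterministic initialisation: start at points[0], then repeatedly add the farthest point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return [list(c) for c in centers]

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; returns one cluster index per point."""
    centers = farthest_first_centers(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical 2-D node embeddings forming two well-separated groups.
node_vectors = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels = kmeans(node_vectors, k=2)
print(labels)
```

The resulting cluster indices play the role of detected communities, which can then be compared with ground-truth communities where these are known.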

The central challenge for network representation learning aimed at community detection is to develop scalable methods that can exploit increasingly complex graphs to aid the knowledge discovery process [116]. Integrating multi-scale properties into the representation vectors during NRL may open up possibilities for overcoming that challenge. For instance, Xu et al. [43] propose a hierarchical clustering method that embodies multi-granular methodology for the clustering problem. It views the data as nodes in a tree and analyzes the data at different granularities by clustering at multiple granularity levels. This idea can be integrated into tasks such as node clustering or community detection for network data mining.
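
To make the idea of clustering at several granularities concrete, the sketch below builds a hierarchy with generic single-linkage agglomerative clustering and records the partition at every level. This is a swapped-in textbook technique chosen for brevity, not the density-peak method of [43]; the function name and the one-dimensional toy embeddings are assumptions.

```python
def agglomerative_levels(points):
    """Single-linkage agglomerative clustering.

    Returns a dict mapping the number of clusters to the partition at that
    level, where a partition is a sorted list of sorted index lists.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    clusters = [[i] for i in range(len(points))]
    levels = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest to each other.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(sq_dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        levels[len(clusters)] = sorted(list(c) for c in clusters)
    return levels

# Hypothetical 1-D node embeddings.
points = [(0.0,), (0.1,), (1.0,), (1.2,), (5.0,)]
levels = agglomerative_levels(points)
print(levels[3])  # finer granularity
print(levels[2])  # coarser granularity
```

Reading the hierarchy at different levels gives community assignments at different granularities from a single run.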

4.4 Visualization

Visualizing the nodes of a network makes it easy to see the big picture of a sophisticated network, and the structure of the network becomes intuitive. It is also a typical application of the low-dimensional vectors obtained by NRL. Some properties of networks can be easily discriminated through visualization. For example, LINE, GraRep, EOE [111] and SDNE [101] have been applied to the DBLP citation network to generate a meaningful layout of the network using a visualization tool such as t-SNE [134]. Pan et al. [92] show a visualization of another citation network, Citeseer-M10 [135], which consists of scientific publications.

The low-dimensional representations produced by multi-granular NRL may show network attributes at different granular layers. In addition, multi-granular methodology can be used to simplify the visualized result. For instance, a coarser visualization shows the global characteristics of a network, while a finer visualization shows the specific local properties of nodes or communities. This allows us to switch the visualization among different levels of granularity and obtain the multi-granular view we want.

4.5 Advanced applications of NRL

In addition to the original network inference tasks such as node classification, link prediction, community detection and visualization, NRL models have also been applied to advanced network analysis tasks such as anomaly detection [101], recommendation [136, 137], information cascades [138], knowledge graphs [139], identity detection across networks [50, 140, 141] and outlier detection [142]. For such specific tasks, the models need to consider the particular factors that discriminate the network data for the target application. The most important issue is how to incorporate domain-oriented knowledge into NRL models and generate knowledge-bearing granules for advanced, domain-oriented, specific network inference tasks. From the multi-granular computing perspective, this problem can be viewed as how to merge the coarser knowledge at the higher level into the model during granulation.

5. Issues and open research problems

We discuss the issues that exist in the study of multi-granular network representation. Some of these issues exist not only in multi-granular NRL but also in general NRL studies, such as the meaning of each dimension of the representation vector $v_{i}$ for vertex $i$. NRL can be viewed as feature extraction that automatically generates latent representation vectors containing the inherent discriminative properties of network data. It is essentially a granulation of the original network information. Any work that can qualitatively or quantitatively describe the meaning of each representation vector will be significant.

Many theoretical issues remain to be studied before multi-granular NRL models can be implemented. For instance, large-scale streaming data requires NRL algorithms to handle time-limited scenarios and provide solutions at different granularity levels in a timely way. Meanwhile, seeking an optimal granularity level and a reasonable approach to granulation for a specific network inference task becomes an important problem. We sort the issues and scientific problems into the six aspects below and discuss future research prospects for them.

5.1 Progressive multi-granular NRL model

Usually, coarser answers can be generated at a higher granularity layer at lower time cost, while finer solutions are obtained at a lower granularity layer at higher time cost [40]. Progressive, variable-granularity NRL algorithms and models should be developed for scenarios with a range of time limits. Matrix-factorization-based NRL algorithms such as GraRep [35] and TADW [86] suffer from high computational cost, so how to approximate the network information and simplify the matrix factorization becomes significant. It is desirable that coarser answers be generated first and that more exact answers become available later at lower granularity layers. The optimal granular layer needs to be selected depending on the attributes of the network and the specific analysis task. Any conclusive laws that can guide the granulation process will be of great value.

Figure 10.

Granulation of network made at several different layers.

5.2 Multiple granularity joint NRL model and problem-solving mechanism

Data, information and knowledge should be represented together in a multiple-granularity space by a multiple-granularity joint NRL model. The multi-granular representations of such NRL models could be used for problem solving simultaneously, in parallel. As shown in Fig. 10, granulation of network data can be made at several different layers simultaneously. Representations in different layers might be either dependent or independent. Therefore, mechanisms for joint computing and NRL in a multiple-granularity space are required. Moreover, mechanisms for merging and decomposing granules across multi-granular layers are also needed. WALKLETS shows some results that offer multi-granularity representations preserving structural properties. However, models that can transform granules across multi-granular layers by approximating graph structures, for example through edge collapse, are still needed for multi-granular joint problem solving. In addition, well-designed models that can process, in parallel, granules consisting of multi-granular information sources such as side information and community information would be of great significance.
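
One way to picture moving a granule to a coarser layer through edge collapse is the following Python sketch. It greedily selects a set of edges that share no endpoints and merges each selected pair of nodes into a supernode. The function name, the greedy ordering and the toy path graph are assumptions made for illustration, and the sketch does not reproduce the coarsening procedure of any cited method.

```python
def collapse_edges(edges):
    """One coarsening step on an undirected graph.

    Greedily picks a matching (edges with no shared endpoints), merges each
    matched pair into a supernode, and returns (coarse_edges, mapping), where
    mapping sends every original node to its supernode id.
    """
    nodes = sorted({u for edge in edges for u in edge})
    mapping = {}
    next_id = 0
    for u, v in sorted(edges):
        if u != v and u not in mapping and v not in mapping:
            mapping[u] = mapping[v] = next_id
            next_id += 1
    for u in nodes:                      # unmatched nodes become their own supernode
        if u not in mapping:
            mapping[u] = next_id
            next_id += 1
    coarse = {tuple(sorted((mapping[u], mapping[v])))
              for u, v in edges if mapping[u] != mapping[v]}
    return sorted(coarse), mapping

# Hypothetical path graph 0-1-2-3-4-5, coarsened twice.
level0 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
level1, map01 = collapse_edges(level0)
level2, map12 = collapse_edges(level1)
print(level1, level2)
```

Keeping the mappings between layers is what allows a representation learned on a coarse layer to be carried back to the nodes of a finer layer.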

5.3 Multi-granular representation for network evolutions

In the online learning of dynamic networks, edges and nodes disappear from or are added to the network over time. Most existing NRL methods can only recompute the result when changes appear. DeepWalk and similar models show some capability to handle newly arriving nodes online. For multi-granular NRL, however, how to deal with streaming data and how to determine the optimal granularity for it need more research. Highly efficient approximation mechanisms for processing large-scale streaming network data are of great value for time-limited dynamic network inference. We note that Duan et al. focus on the problems of dynamic networks [143] and that Sun et al. [144] embed network transfer behaviors in a vector space. How to merge the methodologies of incremental computing and multi-granular computing in an NRL model, so as to exploit the advantages of incremental computing when processing large-scale networks, is still of great interest.

5.4 The criterion for describing the size of granule obtained by NRL

Granulation in NRL takes different forms. Some models, such as GraRep, LINE and NEU, achieve it by increasing the order of proximities or by transforming matrices. Other models, such as TADW and TriDNR, add side information to increase the knowledge that the representation contains, which can be viewed as representation at a finer granular level. From the perspective of multi-granular computing, information granulation for obtaining finer or coarser granules should have a criterion that describes the size of a granule. The lack of such criteria may be the reason NRL researchers cannot yet seek the optimal granularity in multi-granular NRL tasks. If there were a reliable criterion for evaluating the amount of information represented by the granules that a multi-granular NRL model generates, seeking the optimal granularity and granulation method would become possible. Thus, how to represent the level of granularity is still a significant problem.

5.5 Evaluating and avoiding the influence of imbalance in networks

The power-law degree distribution indicates that a few nodes hold a large share of the edges while most nodes are associated with only a small number of edges. How this imbalance influences NRL, and how to prevent it from having a negative influence on the multi-granular NRL process, are still largely unexplored.
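
A simple way to quantify this imbalance before training is to measure how much of the total degree is concentrated in the highest-degree nodes. The sketch below is one such diagnostic, assuming an undirected edge list; the function name, the 10% cut-off and the toy star graph are hypothetical choices, not an established criterion.

```python
from collections import Counter

def degree_share(edges, top_fraction=0.1):
    """Fraction of all edge endpoints that belong to the highest-degree nodes."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ranked = sorted(degree.values(), reverse=True)
    top_k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:top_k]) / sum(ranked)

# Hypothetical star graph: hub 0 connected to nine leaves.
star = [(0, leaf) for leaf in range(1, 10)]
share = degree_share(star, top_fraction=0.1)
print(share)
```

A value close to the chosen fraction indicates a balanced degree distribution, while a much larger value signals that a few hubs dominate the structure seen by the NRL model.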

5.6 Representation for the higher-order structural properties in networks

Multi-granular computing for NRL transforms knowledge from a finer representation to a coarser one. It is worth mentioning that most of the complex local structure of a node can be considered to provide higher-level constraints. Recent NRL methods assume that nodes sharing an edge should have higher proximity, and they work well for tasks such as link prediction. However, a node's centrality information, which is usually related to a more complex structure, would be ignored. Representing higher-order structures, for example motif-preserving [40] representation, is still an open problem for multi-granular network representation learning research.
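
As a small illustration of higher-order structure that edge-level proximity does not capture, the following Python sketch counts, for every node, the number of triangles (the simplest motif) it takes part in. The function name and the toy graph are assumptions; richer motifs would need a more general counting procedure.

```python
from itertools import combinations

def triangle_counts(edges):
    """Number of triangles (3-node cliques) each node participates in."""
    neighbors = {}
    for u, v in edges:
        if u != v:                       # ignore self-loops
            neighbors.setdefault(u, set()).add(v)
            neighbors.setdefault(v, set()).add(u)
    counts = {node: 0 for node in neighbors}
    for node, nbrs in neighbors.items():
        # A triangle at `node` is a pair of its neighbors that are themselves linked.
        for a, b in combinations(sorted(nbrs), 2):
            if b in neighbors[a]:
                counts[node] += 1
    return counts

# Hypothetical graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
counts = triangle_counts(edges)
print(counts)
```

In this toy graph nodes 2 and 3 share an edge, yet they differ in triangle participation, which is the kind of information a purely edge-based proximity assumption would overlook.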

6. Conclusion

Networks are complex and carry highly abstract information and characteristics, which has attracted a wide range of data mining research. Compared with traditional network analysis methods, network representation learning maps the original data into a lower-dimensional space so that off-the-shelf machine learning algorithms can be applied. However, NRL algorithms meet bottlenecks such as low efficiency and low accuracy. Considering the multi-level characteristics of networks and the resulting contradictions in NRL, multi-granular methodology is applicable. Multi-granular methodology is inspired by human cognition. It approximates complex information as coarser knowledge and simplifies the original problem by decomposing it into multiple granularities. From the perspective of multi-granular computing, a finer granularity in NRL can be obtained by encoding more detailed information into models, and the improvements achieved by such algorithms can be viewed as gaining accuracy at the finer granular level. We propose a multi-granularity perspective on recent NRL research and sort out several prospective scientific problems concerning how to apply multi-granular computing methodologies in future NRL research. Although quite a number of earlier works show the inspiration of multi-granular methodology, there should be more explicit and better-designed applications of multi-granular computing methodology in NRL models.

Footnotes

Acknowledgments

This work is supported in part by the Science and Technology Research Program of Chongqing Municipal Education Commission (Grant Nos. KJZD-M202203201, KJQN202303203), Doctoral Fund of Chongqing Industry Polytechnic College (No. 2023GZYBSZK3-03), the National Science Foundation of China (No. 6206049), the Excellent Young Scientific and Technological Talents Foundation of Guizhou Province (QKH-platform talent [2021] No. 5627), and the Science and Technology Top Talent Project of Guizhou Education Department (QJJ2022[088]).

Conflict of interest

The authors declare that they have no conflicts of interest to report regarding the present study.

Funding statement

The author(s) received no specific funding for this study.

References

1. Newman M.E.J., The Structure and Function of Complex Networks, SIAM Review 45(2) (2003), 167–256.

2. Yang, Tao, Zhong, Compound Fault Diagnosis of Harmonic Drives Using Deep Capsule Graph Convolutional Network, IEEE Transactions on Industrial Electronics (2022).

3. Wang, Chen, Complex networks: small-world, scale-free and beyond, IEEE Circuits and Systems Magazine 3(1) (2003), 6–20.

4. Chen, Zhange, Network Representation Learning, Big Data (in Chinese) 1(3) (2017), 2015025.

5. Singh, Getoor, Licamele, Pruning Social Networks Using Structural Properties and Descriptive Attributes, in Proceedings of the Fifth IEEE International Conference on Data Mining, Washington, DC, USA, 2005, pp. 773–776. doi: 10.1109/ICDM.2005.125.

6. Wasserman, Faust, Social network analysis: Methods and applications, vol. 8. Cambridge university press, 1994.

7. Chaudhuri, Chatterjee, Katz, Nelson, Goldbaum, Detection of blood vessels in retinal images using two-dimensional matched filters, IEEE Transactions on Medical Imaging 8(3) (1989), 263–269.

8. Bhagat, Cormode, Muthukrishnan, Node classification in social networks, in Social network data analytics, Springer, 2011, pp. 115–148.

9. Vishwanathan S.V.N., Schraudolph N.N., Kondor, Borgwardt K.M., Graph kernels, Journal of Machine Learning Research 11 (2010), 1201–1242.

10. Liben-Nowell, Kleinberg, The link-prediction problem for social networks, Journal of the American Society for Information Science and Technology 58(7) (2007), 1019–1031.

11. Mora-Gutiérrez R.A., Rincón-García A.E., Ponsich, Ramírez-Rodríguez, Méndez-Gurrola I.I., Influence of social network on method musical composition, Artificial Intelligence Review 46(2) (2016), 225–266.

12. Hinton et al., Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine 29(6) (2012), 82–97.

13. Hinton G.E., Salakhutdinov R.R., Reducing the dimensionality of data with neural networks, Science 313(5786) (2006), 504–507.

14. Lee E.W.M., Wong L.T., Mui K.W., Development of a Hybrid Artificial Neural Network Model and its Application to Data Regression, Intelligent Automation & Soft Computing 18(4) (2012), 319–332. doi: 10.1080/10798587.2012.10643246.

15. Jahani Fariman, Ahmad S.A., Hamiruce Marhaban, Ali Jan Ghasab, Chappell P.H., Simple and Computationally Efficient Movement Classification Approach for EMG-controlled Prosthetic Hand: ANFIS vs. Artificial Neural Network, Intelligent Automation & Soft Computing 21(4) (2015), 559–573. doi: 10.1080/10798587.2015.1008735.

16. Kannan, Vempala, Spectral algorithms, Foundations and Trends® in Theoretical Computer Science 4(3–4) (2009), 157–288.

17. Brand, Huang, A unifying theorem for spectral embedding and clustering, in AISTATS, 2003.

18. T.M., Lauw H.W., Probabilistic latent document network embedding, in IEEE International Conference on Data Mining, 2014, pp. 270–279.

19. Jacob, Denoyer, Gallinari, Learning latent representations of nodes for classifying in heterogeneous social networks, in Proceedings of the 7th ACM international conference on Web search and data mining, 2014, pp. 373–382.

20. Yang, Leskovec, Modeling information diffusion in implicit networks, in Data Mining (ICDM), 2010 IEEE 10th International Conference on, 2010, pp. 599–608.

21. Bourigault, Lagnier, Lamprier, Denoyer, Gallinari, Learning social network embeddings for predicting information diffusion, in Proceedings of the 7th ACM international conference on Web search and data mining, 2014, pp. 393–402.

22. Hofmann, Probabilistic latent semantic indexing, in Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval, 1999, pp. 50–57.

23. Blei D.M., A.Y., Jordan M.I., Latent dirichlet allocation, Journal of Machine Learning Research 3 (2003), 993–1022.

24. Nallapati R.M., Ahmed, Xing E.P., Cohen W.W., Joint latent topic models for text and citations, in Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, 2008, pp. 542–550.

25. Chang, Blei, Relational Topic Models for Document Networks, in Artificial Intelligence and Statistics, 2009, pp. 81–88.

26. Mikolov, Chen, Corrado, Dean, Efficient Estimation of Word Representations in Vector Space, in Workshop Track Proceedings of International Conference on Learning Representations, Scottsdale, Arizona, USA, 2013.

27. Mikolov, Yih, Zweig, Linguistic regularities in continuous space word representations, in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2013, pp. 746–751.

28. Mikolov, Sutskever, Chen, Corrado G.S., Dean, Distributed representations of words and phrases and their compositionality, in Advances in neural information processing systems, 2013, pp. 3111–3119.

29. Perozzi, Al-Rfou, Skiena, Deepwalk: Online learning of social representations, in Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, 2014, pp. 701–710.

30. Tang, Wang, Zhang, Yan, Mei, Line: Large-scale information network embedding, in Proceedings of the 24th International Conference on World Wide Web, Florence, 2015, pp. 1067–1077.

31. Tang, Mei, Pte: Predictive text embedding through large-scale heterogeneous text networks, in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, pp. 1165–1174.

32. Grover, Leskovec, node2vec: Scalable Feature Learning for Networks, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13–17, 2016, pp. 855–864.

33. Bengio, Courville, Vincent, Representation learning: A review and new perspectives, IEEE transactions on pattern analysis and machine intelligence 35(8) (2013), 1798–1828.

34. Chen, Tong, Liu, Multi-layered network embedding, in Proceedings of the 2018 SIAM International Conference on Data Mining, 2018, pp. 684–692.

35. Cao, Grarep: Learning graph representations with global structural information, in Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, 2015, pp. 891–900.

36. Liu, Huang, On Interpretation of Network Embedding via Taxonomy Induction, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1812–1820.

37. Cui, Wang, Zhu, Hierarchical taxonomy aware network embedding, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1920–1929.

38. Yao J.T., Vasilakos A.V., Pedrycz, Granular computing: perspectives and challenges, IEEE Transactions on Cybernetics 43(6) (2013), 1977–1989.

39. Yao, Perspectives of granular computing, in 2005 IEEE international conference on granular computing 1 (2005), 85–90.

40. Wang, DGCC: data-driven granular cognitive computing, Granular Computing 2(4) (2017), 343–355.

41. Klir G.J., Yuan, Fuzzy sets, fuzzy logic and fuzzy systems: selected papers by Lotfi A. Zadeh. World Scientific Publishing Co., Inc., 1996.

42. Zadeh L.A., Toward a theory of fuzzy information granulation and its centrality in human reasoning and fuzzy logic, Fuzzy Sets and Systems 90(2) (1997), 111–127. doi: 10.1016/S0165-0114(97)00077-8.

43. Wang, Deng, DenPEHC: Density peak based efficient hierarchical clustering, Information Sciences 373 (2016), 200–218.

44. Kareem E.I.A., Alsalihy W.A.H.A., Jantan, Multi-Connect Architecture (MCA) Associative Memory: A Modified Hopfield Neural Network, Intelligent Automation & Soft Computing 18(3) (2012), 279–296. doi: 10.1080/10798587.2008.10643243.

45. Liu, Wang, Granular computing based on gaussian cloud transformation, Fundamenta Informaticae 127(1–4) (2013), 385–398.

46. Poria, Cambria, Gelbukh, Deep convolutional neural network textual features and multiple kernel learning for utterance-level multimodal sentiment analysis, in Proceedings of the 2015 conference on empirical methods in natural language processing, 2015, pp. 2539–2544.

47. Yang, Sun, Liu, Fast network embedding enhancement via high order proximity approximation, in Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI, 2017, pp. 19–25.

48. Chen, Perozzi, Skiena, Harp: Hierarchical representation learning for networks, in Proceedings of the AAAI conference on artificial intelligence 32(1) (2018), 2127–2134.

49. Zhang, Cui, Wang, Pei, Yao, Zhu, Arbitrary-Order Proximity Preserved Network Embedding, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining – KDD ’18, London, United Kingdom, 2018, pp. 2778–2786. doi: 10.1145/3219819.3219969.

50. Zhang, Tong, Maciejewski, Eliassi-Rad, Multilevel Network Alignment, presented at the International World Wide Web Conference Committee, 2019.

51. Perozzi, Kulkarni, Chen, Skiena, Don’t Walk, Skip!: Online Learning of Multi-scale Network Embeddings, in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 258–265.

52. Perozzi, Kulkarni, Skiena, Walklets: Multiscale Graph Embeddings for Interpretable Network Classification, p. 16.

53. Clauset, Moore, Newman M.E.J., Hierarchical structure and the prediction of missing links in networks, Nature 453(7191) (2008), 98–101.

54. Cai, Zheng V.W., Chang K.C.-C., A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications, IEEE Transactions on Knowledge and Data Engineering 30(9) (2018), 1616–1637.

55. Hamilton W.L., Ying, Leskovec, Representation Learning on Graphs: Methods and Applications, IEEE Data Engineering Bulletin 40(3) (2017), 52–74.

56. Belkin, Niyogi, Laplacian eigenmaps and spectral techniques for embedding and clustering, in Advances in neural information processing systems, 2002, pp. 585–591.

57. Abu-El-Haija, Perozzi, Al-Rfou, Learning edge representations via low-rank asymmetric projections, in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1787–1796.

58. Bunke, Irniger, Neuhaus, Graph matching–challenges and potential solutions, in International Conference on Image Analysis and Processing, 2005, pp. 1–10.

59. Bunke, Günter, Jiang, Towards bridging the gap between statistical and structural pattern recognition: Two new concepts in graph matching, in International Conference on Advances in Pattern Recognition, 2001, pp. 1–11.

60. Roth, Laub, Kawanabe, Buhmann J.M., Optimal cluster preserving embedding of nonmetric proximity data, IEEE Transactions on Pattern Analysis and Machine Intelligence 25(12) (2003), 1540–1551.

61. Chen, Yang, Tang, Directed graph embedding, in Proceedings of the 20th International Joint Conference on Artificial Intelligence, 2007, pp. 2707–2712.

62. Conte, Foggia, Sansone, Vento, Thirty years of graph matching in pattern recognition, International Journal of Pattern Recognition and Artificial Intelligence 18(3) (2004), 265–298.

63. Lee, Madabhushi, Semi-supervised graph embedding scheme with active learning (SSGEAL): classifying high dimensional biomedical data, in IAPR International Conference on Pattern Recognition in Bioinformatics, 2010, pp. 207–218.

64. Luqman M.M., Lladós, Ramel J.-Y., Brouard, A fuzzy-interval based approach for explicit graph embedding, in Recognizing Patterns in Signals, Speech, Images and Videos, Springer, 2010, pp. 93–98.

65. Luqman M.M., Lladós, Ramel J.-Y., Brouard, Dimensionality Reduction for Fuzzy-Interval Based Explicit Graph Embedding, in Ninth IAPR International Workshop on Graphics RECognition 9 (2011), 117–120.

66. Luqman M.M., Ramel J.-Y., Lladós, Brouard, Fuzzy multilevel graph embedding, Pattern Recognition 46(2) (2013), 551–565.

67. Conte et al., A comparison of explicit and implicit graph embedding methods for pattern recognition, in International Workshop on Graph-Based Representations in Pattern Recognition, 2013, pp. 81–90.

68. Yan, Zhang H.-J., Graph embedding: A general framework for dimensionality reduction, in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on 2 (2005), 830–837.

69. Qiu, Dong, Wang, Tang, Network embedding as matrix factorization: Unifying DeepWalk, LINE, PTE, and node2vec, in Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, 2018, pp. 459–467.

70. Cui, Wang, Pei, Zhu, A survey on network embedding, IEEE Transactions on Knowledge and Data Engineering, 2018. doi: 10.1109/TKDE.2018.2849727.

71. Jolliffe I.T., Principal component analysis and factor analysis, in Principal Component Analysis, Springer, 1986, pp. 115–128.

72. Tenenbaum J.B., De Silva, Langford J.C., A global geometric framework for nonlinear dimensionality reduction, Science 290(5500) (2000), 2319–2323.

73. Roweis S.T., Saul L.K., Nonlinear dimensionality reduction by locally linear embedding, Science 290(5500) (2000), 2323–2326.

74. Yang, Fujita, Liu, Yao, A unified model of sequential three-way decisions and multilevel incremental processing, Knowledge-Based Systems 134 (2017), 172–188.

75. Girvan, Newman M.E., Community structure in social and biological networks, Proceedings of the National Academy of Sciences 99(12) (2002), 7821–7826.

76. Newman M.E., Detecting community structure in networks, The European Physical Journal B 38(2) (2004), 321–330.

77. Alessandro, Guido, Large scale structure and dynamics of complex networks: from information technology to finance and natural science, Vol. 2, World Scientific, 2007.

78. Meyers R.A., Encyclopedia of complexity and systems science, Springer, 2009.

79. Mucha P.J., Richardson, Macon, Porter M.A., Onnela J.-P., Community structure in time-dependent, multiscale and multiplex networks, Science 328(5980) (2010), 876–878.

80. Lancichinetti, Fortunato, Kertész, Detecting the overlapping and hierarchical community structure in complex networks, New Journal of Physics 11(3) (2009), 033015.

81. Ahn Y.-Y., Bagrow J.P., Lehmann, Link communities reveal multiscale complexity in networks, Nature 466(7307) (2010), 761.

82. Rosvall, Bergstrom C.T., Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems, PLoS ONE 6(4) (2011), e18209.

83. Wang T.-S., Lin H.-T., Wang, Weighted-spectral clustering algorithm for detecting community structures in complex networks, Artificial Intelligence Review 47(4) (2017), 463–483.

84. Lee D.D., Seung H.S., Algorithms for non-negative matrix factorization, in Advances in Neural Information Processing Systems, 2001, pp. 556–562.

85. Newman M.E., Finding community structure in networks using the eigenvectors of matrices, Physical Review E 74(3) (2006), 036104.

86. Yang, Liu, Zhao, Sun, Chang E.Y., Network representation learning with rich text information, in IJCAI, 2015, pp. 2111–2117.

87. Akbas, Zhao, Attributed Graph Clustering: an Attribute-aware Graph Embedding Approach, in Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, 2017, pp. 305–308.

88. Dempster A.P., Laird N.M., Rubin D.B., Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society, Series B (Methodological), 1977, pp. 1–38.

89. Zhang, Liu, Sun, Max-Margin DeepWalk: Discriminative Learning of Network Representation, in IJCAI, 2016, pp. 3889–3895.

90. Liu, Sun, CANE: Context-aware network embedding for relation modeling, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 1 (2017), 1722–1731.

91. Shi, Liu, Relation Structure-Aware Heterogeneous Information Network Embedding, in Thirty-Third AAAI Conference on Artificial Intelligence, 2019.

92. Pan, Zhu, Zhang, Wang, Tri-party deep network representation, in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 1895–1901.

93. Huang, Label informed attributed network embedding, in Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017, pp. 731–739.

94. Hamilton, Ying, Leskovec, Inductive representation learning on large graphs, in Advances in Neural Information Processing Systems, 2017, pp. 1024–1034.

95. Kipf T.N., Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907, 2016.

96. Kipf T.N., Welling, Variational graph auto-encoders, arXiv preprint arXiv:1611.07308, 2016.

97. Schlichtkrull, Kipf T.N., Bloem, van den Berg, Titov, Welling, Modeling relational data with graph convolutional networks, in European Semantic Web Conference, 2018, pp. 593–607.

98. van den Berg, Kipf T.N., Welling, Graph convolutional matrix completion, arXiv preprint arXiv:1706.02263, 2017.

99. Guille, Hacid, Favre, Zighed D.A., Information diffusion in online social networks: A survey, ACM SIGMOD Record 42(2) (2013), 17–28.

100. Cheng, Adamic, Dow P.A., Kleinberg J.M., Leskovec, Can cascades be predicted?, in Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 925–936.

101. Wang, Cui, Zhu, Structural deep network embedding, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1225–1234.

102. Wang, 3DM: Domain-oriented Data-driven Data Mining, Fundamenta Informaticae 90(4) (2009), 395–426. doi: 10.3233/FI-2009-0026.

103. Wang, On cognitive informatics, Brain and Mind 4(2) (2003), 151–167.

104. Leskovec, Backstrom, Kleinberg, Meme-tracking and the dynamics of the news cycle, in Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2009, pp. 497–506.

105. Kleinberg, Bursty and hierarchical structure in streams, Data Mining and Knowledge Discovery 7(4) (2003), 373–397.

106. Guo, Mei, DeepCas: An End-to-end Predictor of Information Cascades, in Proceedings of the 26th International Conference on World Wide Web, Republic and Canton of Geneva, Switzerland, 2017, pp. 577–586. doi: 10.1145/3038912.3052643.

107. Ruck D.W., Rogers S.K., Kabrisky, Oxley M.E., Suter B.W., The multilayer perceptron as an approximation to a Bayes optimal discriminant function, IEEE Transactions on Neural Networks 1(4) (1990), 296–298.

108. Arneodo, Muzy J.-F., Sornette, Direct causal cascade in the stock market, The European Physical Journal B-Condensed Matter and Complex Systems 2(2) (1998), 277–282.

109. Tsoumakas, Katakis, Multi-label classification: An overview, International Journal of Data Warehousing and Mining (IJDWM) 3(3) (2007), 1–13.

110. Yang S.-H., Long, Smola, Sadagopan, Zheng, Zha, Like like alike: joint friendship and interest propagation in social networks, in Proceedings of the 20th International Conference on World Wide Web, 2011, pp. 537–546.

111. Wei, Cao, P.S., Embedding identity and interest for social networks, in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, pp. 859–860.

112. Radivojac et al., A large-scale evaluation of computational protein function prediction, Nature Methods 10(3) (2013).

113. Lü, Zhou, Link prediction in complex networks: A survey, Physica A: Statistical Mechanics and Its Applications 390(6) (2011), 1150–1170.

114. Vazquez, Flammini, Maritan, Vespignani, Global protein function prediction from protein-protein interaction networks, Nature Biotechnology 21(6) (2003), 697.

115. Backstrom, Leskovec, Supervised random walks: predicting and recommending links in social networks, in Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, 2011, pp. 635–644.

116. Getoor, Diehl C.P., Link mining: a survey, ACM SIGKDD Explorations Newsletter 7(2) (2005), 3–12.

117.

Tang

Sun

Wang

Yang

, Blogs as a collective intelligence community, in Proceedings of the 2009 ACM SIGKDD international conference on Knowledge discovery and data mining, 2009, pp. 77–86.

118. Giles C.L., Bollacker K.D., Lawrence, CiteSeer: An automatic citation indexing system, in Proceedings of the Third ACM Conference on Digital Libraries, 1998, pp. 89–98.

119. Ding, A tutorial on spectral clustering, talk presented at ICML (slides available at http://crd.lbl.gov/cding/Spectral/), 2004.

120. Freeman L.C., Centrality in social networks conceptual clarification, Social Networks 1(3) (1978), 215–239.

121. Huang, Chen, Wen, Detecting network communities using regularized spectral clustering algorithm, Artificial Intelligence Review 41(4) (2014), 579–594.

122. Nowicki, Snijders T.A.B., Estimation and prediction for stochastic blockstructures, Journal of the American Statistical Association 96(455) (2001), 1077–1087.

123. Kemp, Griffiths T.L., Tenenbaum J.B., Discovering latent classes in relational data, CSAIL Technical Reports, 2004.

124. Wolfe A.P., Jensen, Playing multiple roles: Discovering overlapping roles in social networks, in ICML-04 Workshop on Statistical Relational Learning and its Connections to Other Fields, 2004, p. 75.

125. Adibi, Chalupsky, Melz, Valente, The KOJAK group finder: Connecting the dots via integrated knowledge-based and statistical reasoning, in Proceedings of the National Conference on Artificial Intelligence, 2004, pp. 800–807.

126. Carnegie J.K., Kubica, Moore, Schneider, Tractable Group Detection on Large Link Data Sets, in The Third IEEE International Conference on Data Mining, 2003.

127. Kubica, Moore, Schneider, Yang, Stochastic link and group detection, in Proceedings of the National Conference on Artificial Intelligence, 2002, pp. 798–806.

128. Wang, Mohanty, McCallum, Group and topic discovery from relations and text, in Proceedings of the 3rd International Workshop on Link Discovery, 2005, pp. 28–35.

129. MacQueen, Some methods for classification and analysis of multivariate observations, in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability 1 (1967), 281–297.

130. Huang, Mamoulis, Heterogeneous Information Network Embedding for Meta Path based Proximity, arXiv preprint arXiv:1701.05291, 2017.

131. Cao, Deep neural networks for learning graph representations, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 1145–1152.

132. Huang, Zheng, Cheng, Sun, Mamoulis, Meta structure: Computing relevance in large heterogeneous information networks, in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 1595–1604.

133. Bollacker, Evans, Paritosh, Sturge, Taylor, Freebase: a collaboratively created graph database for structuring human knowledge, in Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, 2008, pp. 1247–1250.

134. van der Maaten, Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008), 2579–2605.

135. Lim K.W., Buntine, Bibliographic Analysis with the Citation Network Topic Model, in Asian Conference on Machine Learning, 2015, pp. 142–158.

136. Xie, Yin, Wang, Chen, Wang, Learning graph-based POI embedding for location-based recommendation, in Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, pp. 15–24.

137. Zhou, Josang, Cox, The state-of-the-art in personalized recommender systems for social networking, Artificial Intelligence Review 37(2) (2012), 119–132.

138. Bourigault, Lamprier, Gallinari, Representation learning for information diffusion through social networks: an embedded cascade model, in Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, 2016, pp. 573–582.

139. Feng, Huang, Yang, GAKE: Graph aware knowledge embedding, in Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, 2016, pp. 641–651.

140. Liu, Cheung W.K., Liao, Aligning Users Across Social Networks Using Network Embedding, in Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016, pp. 1774–1780.

141. Liu, Zhang, Zhong, Zhang, ABNE: An Attention Based Network Embedding for User Alignment Across Social Networks, IEEE Access, 2019, p. 1. doi: 10.1109/ACCESS.2019.2900095.

142. Bandyopadhyay, L. N, Murty M.N., Outlier Aware Network Embedding for Attributed Networks, in Thirty-Third AAAI Conference on Artificial Intelligence, 2019.

143. Duan, Incremental K-clique clustering in dynamic social networks, Artificial Intelligence Review 38(2) (2012), 129–147.

144. Sun, Song, Dong, Plant, Böhm, Network Structure and Transfer Behaviors Embedding via Deep Prediction Model, in Thirty-Third AAAI Conference on Artificial Intelligence, 2019.
