Research on multi-attribute controller for virtual data domain based on software definition network

Abstract

Large data sets are currently deployed on large-scale clusters, which require a large amount of physical resources. However, the current network architecture does not support flexible deployment, making it difficult to adjust physical resources after deployment. Based on software-defined networking (SDN), this article proposes a framework for building virtual data domains. The framework establishes a multi-attribute decision model over network nodes to optimize the deployment of the control layer and thereby support large-scale deployment. By analyzing the actual usage and virtual distribution of the underlying data resources, a network resource overhead–allocation ratio mapping algorithm is proposed; it adjusts the mapping space of network flows according to the mapping result so that more virtual domain applications can be accommodated. At the same time, reasonable utilization of resources helps to reduce intra-domain communication delay. Simulation results show that, compared with the shortest path mapping and greedy resource mapping algorithms, the virtual domain established by the network resource overhead–allocation algorithm improves the resource utilization rate by 10% and reduces the intra-domain communication delay by 30%. Therefore, as data domains continue to grow in scale, this framework addresses the problem that the current network architecture cannot keep up with the development of the information field.

Keywords

Virtual data domain, multi-attribute decision model, overhead–allocation ratio

Introduction

With the continuous promotion of big data concepts, analysts analyze technology business data sets to provide decision makers with data-based decisions. These data sets are very large, and because their statistical indicators differ, they vary widely from one to another. At present, some studies rely on large-scale clusters, for example by introducing distributed storage and computing architectures such as Hadoop, distributed file systems represented by HDFS (Hadoop Distributed File System), and big data processing technologies with various storage models such as HBase.^1–3 The physical resources these approaches require are expensive, which clearly cannot keep pace with the growing demand for data.

Some methods^4,5 use cloud computing to provide elastic scaling for large data, cheaper storage, and computing services, but cloud computing offers infrastructure, platform, and software as services and has no service oriented to the data itself. Data sets need to be service-oriented, with data domains built dynamically according to different statistical metrics and business data.^6–8 As the scale of data domains increases, it becomes difficult for the underlying traditional hardware to adapt to this trend.^9 Under the traditional network architecture, neither the performance of a single network device nor a structure of multiple network devices can easily support the rapidly growing demand for information.^10,11 The main reason is that the current network architecture cannot adapt to the development of big data. It is therefore necessary to put forward a data domain construction method that adapts to business development under a new network architecture.^12,13

Software-Defined Networking (SDN) is one of the popular new network architectures.^14 It separates the data plane from the control plane, reduces the complexity of the network, and increases the flexibility of network programming. SDN network virtualization technology abstracts and distributes network resources and increases their sharing. Since each virtual network is managed by one controller, SDN network mapping should consider not only network resource mapping but also the optimized deployment of controllers.^6,15,16

This article considers controller deployment in an SDN network virtualization environment. Based on node betweenness centrality and node reliability, it puts forward a multi-attribute decision-making model for controller deployment that optimizes the transmission delay of control information. So that the underlying network resources can support the construction of more virtual data domains, a network resource overhead–allocation ratio mapping algorithm is proposed. The algorithm is used to generate the virtual data domain mapping and the network flow mapping space, which realize information communication within a domain and information isolation between domains.

Virtual data domain framework

The virtual data domain is built on the SDN architecture.^17 The data layer contains the data marts and the relationships among them. For the connection between the data layer and the virtualized control layer, the design places a mapping unit between the switches and the controllers; the mapping unit runs as independent hardware between them and acts as a protocol agent for both.^18–20

Figure 1 is a schematic diagram of the SDN-based virtual data domain for technology data. The underlying physical network resources are located in the SDN data layer and include five data mart nodes. The solid black lines represent link connections, and the data mart nodes together with the links between them constitute the underlying network topology. The protocol agent between the data layer and the control layer assigns a slice ID to each virtual domain. When a slice is connected to an SDN controller, the data flows of different domains are isolated, and each virtual domain is managed by a separate controller. At the same time, the protocol agent establishes a flow mapping space according to the mapping between the virtual data domains and the real network resources. As shown in the figure, the dashed line represents the traffic of the virtual technology project domain, whose slice ID is set to slice 1; the dotted line represents the communication flow of the virtual technology talent domain, whose slice ID is set to slice 2.

Figure 1.

A schematic diagram of a virtual data domain building framework.

Based on the slice ID of the virtual domain, the controller obtains the virtual network resources it controls, constructs the whole network view within the virtual domain, and generates the corresponding flow table entries according to the business in the virtual domain. When a flow table entry reaches the protocol agent, it is rewritten according to the established network flow mapping space so that it refers to the actual physical resources. Then, following the mapping rules of the protocol agent, the entry is issued through the OpenFlow protocol to the corresponding data mart node in the SDN data layer, which realizes the business communication of that node.

This article takes technology business service data as an example. According to the field of technology service, the technology business data are divided into six categories. As shown in Figure 2, these are science and technology project plans, science and technology agencies, scientific and technological personnel, scientific and technological achievements, innovative services, and statistical yearbooks. Each category is further divided into subclasses, and each category of business data forms a large business data mart.

Figure 2.

Technology data mart classification.

Based on the analysis of technology business data, the technology data are divided into six categories. In the data layer of SDN, these six kinds of technology data are different data sets, and each kind of data domain is oriented to a certain kind of data business. When the data required for a business come from several different data sets, multiple physical data sets need to be virtualized into one virtual data domain to support that business^21–23; this makes it easy to integrate data from the physical data sets and provide the corresponding data resources. Under the SDN framework, this article discusses how to set up virtual data domains over the data marts based on business requests.^24,25

Virtual data domain building

Multi-attribute controller deployment model

The closer the deployment node selected for the controller is to the network center, the faster control information is forwarded.^26 Therefore, betweenness centrality is used as a measurement parameter to evaluate how central a node is in the network topology. If the controller is deployed on a central node that exchanges data directly with the others, the flow of information is most efficient; however, if that node fails, the impact on the network is huge. To guard against this, node reliability is also used as a measurement parameter.

Node properties

An undirected network graph is represented by G(V, E), where V is the set of nodes and E is the set of edges. Let L_ij denote the number of shortest paths between nodes i and j, and let L_ij(m) denote the number of those shortest paths that pass through node m. Thus, in a network G(V, E), the betweenness centrality of any node m is defined as follows

B (m) = \sum_{i \neq j \neq m \in V} \frac{L_{ij} (m)}{L_{ij}}

(1.1)
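A minimal sketch of how equation (1.1) could be evaluated, reading L_ij as the number of shortest paths between i and j and L_ij(m) as the number of those passing through m (the usual betweenness-centrality reading, assumed here). The function names are hypothetical, each unordered pair (i, j) is counted once, and the example topology is the five-node network of Figure 3 rather than the eight-node topology used in the experiments.

```python
from collections import deque


def shortest_path_counts(adj, src):
    """BFS from src; returns hop distances and the number of shortest paths to each node."""
    dist = {src: 0}
    sigma = {src: 1}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def betweenness(adj):
    """B(m) = sum over pairs i, j (both different from m) of L_ij(m) / L_ij."""
    nodes = sorted(adj)
    info = {v: shortest_path_counts(adj, v) for v in nodes}
    score = {m: 0.0 for m in nodes}
    for idx, i in enumerate(nodes):
        dist_i, sigma_i = info[i]
        for j in nodes[idx + 1:]:        # each unordered pair once
            if j not in dist_i:
                continue                 # i and j are disconnected
            for m in nodes:
                if m in (i, j) or m not in dist_i:
                    continue
                dist_m, sigma_m = info[m]
                # m lies on a shortest i-j path iff the distances add up
                if j in dist_m and dist_i[m] + dist_m[j] == dist_i[j]:
                    score[m] += sigma_i[m] * sigma_m[j] / sigma_i[j]
    return score


# Topology of Figure 3: links AB, AC, BC, CD, DE
adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
centrality = betweenness(adj)
```

On this small topology node C carries the most shortest paths, followed by D, while A, B, and E carry none.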

A network component fault refers to the failure of a link or a node, and the set of all failure scenarios and the failure-free scenario in the network is denoted S. The probability that multiple network components fail at the same time is small; most faults are caused by a single node or a single link. Once such a failure occurs, a control path fails and the controller can no longer issue control information to all network elements. The nodes that the controller cannot reach because of a failed control path are defined as inaccessible nodes. For the scenarios s ∈ S in which a control path fails because of a network component failure, θ′ denotes the percentage of inaccessible nodes, and the reliability of the node is denoted θ. W_s is the weight assigned to scenario s, W_sc is the probability that the control path fails under scenario s, Q_s is the number of switch nodes that cannot be connected to the control network after the control path fails in scenario s, and Q_n is the total number of nodes in the network

θ' = \sum_{s \in S} W_{s} \times \frac{W_{sc} \times Q_{s}}{Q_{n}}

(1.2)

θ = 1 - θ'

(1.3)
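Equations (1.2) and (1.3) amount to a weighted sum over failure scenarios. The sketch below is only illustrative: the function name and the scenario values are hypothetical, and each scenario is given as a tuple (W_s, W_sc, Q_s).

```python
def node_reliability(scenarios, total_nodes):
    """Return theta = 1 - theta', with theta' computed as in equation (1.2).

    scenarios: iterable of (w_s, w_sc, q_s) tuples, one per failure scenario
    total_nodes: Q_n, the total number of nodes in the network
    """
    theta_prime = sum(w_s * w_sc * q_s / total_nodes for w_s, w_sc, q_s in scenarios)
    return 1.0 - theta_prime


# Hypothetical values: two scenarios in an eight-node network
example_scenarios = [(0.5, 0.1, 2), (0.5, 0.2, 4)]
theta = node_reliability(example_scenarios, total_nodes=8)
```

With these made-up inputs θ′ is 0.0125 + 0.05 = 0.0625, so θ is 0.9375.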

Multi-attribute decision model

For a multi-attribute decision problem, let X = (x₁, x₂,…, x_n) be the solution set and U = (u₁, u₂,…, u_n) the attribute set, so that each solution x_i is described by its values on the attributes in U. The decision matrix is constructed with one row per scheme x_i and one column per attribute u_j: each row of the matrix represents a scheme in the solution set, and each column represents an attribute of the scheme. The processing steps are as follows.

1. Define the attribute weights W = (w₁, w₂,…, w_n)^T, in which each w_j is in [0, 1]. The weight of an attribute is set according to the importance of its effect on the schemes, so different attributes may carry different weights, but all attribute weights must sum to 1

\sum_{j = 1}^{n} w_{j} = 1

(1.4)

2. Once the decision matrix is constructed, it is normalized; the normalized decision matrix is R = [r_ij]_n×m. Normalization uses the extreme value method, which scales the values of each attribute across the schemes to eliminate the influence of the attribute's order of magnitude: the raw value a_ij of an attribute in scheme x_i is divided by the maximum value of that attribute over all schemes

r_{ij} = \frac{a_{ij}}{\max_{x_{i}} a_{ij}}

(1.5)

3. The linear weighted fusion value of each scheme is computed, where U_i is the resulting fusion value of scheme x_i and w_j is the weight of each attribute under this scheme. For the decision matrix R = [r_ij]_n×m, each attribute value is multiplied by its attribute weight and the products are summed, which gives the linear weighted fusion value of each scheme

U_{i} = \sum_{j = 1}^{n} w_{j} \times r_{ij}

(1.6)

4. According to the linear weighted fusion value U_i, the scheme set is sorted and the best solution is selected.

Finally, the row of the selected best scheme identifies the node on which the controller is deployed.
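The four steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name is hypothetical, each row holds one candidate node's (centrality, reliability) pair, and the attribute values and equal weights are made up for the example.

```python
def rank_schemes(matrix, weights):
    """Normalize by column maximum (1.5), fuse with weights (1.6), pick the best row."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("attribute weights must sum to 1 (equation 1.4)")
    # Column-wise maxima: one per attribute
    col_max = [max(col) for col in zip(*matrix)]
    # r_ij = a_ij / max over schemes of a_ij (a zero column maps to 0.0)
    normalized = [
        [a / mx if mx else 0.0 for a, mx in zip(row, col_max)]
        for row in matrix
    ]
    # U_i = sum_j w_j * r_ij
    scores = [sum(w * r for w, r in zip(weights, row)) for row in normalized]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores, best


# Hypothetical candidates: (betweenness centrality, reliability)
candidates = ["A", "C", "D"]
decision_matrix = [
    [0.0, 0.95],   # A: not central, very reliable
    [4.0, 0.60],   # C: most central, least reliable
    [3.0, 0.90],   # D: fairly central and reliable
]
scores, best_index = rank_schemes(decision_matrix, weights=[0.5, 0.5])
best_node = candidates[best_index]
```

With these illustrative numbers the most central node does not win: D scores about 0.849 against about 0.816 for C, which shows how the reliability attribute can change the choice a centrality-only method would make.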

Network resource overhead–allocation ratio mapping algorithm

The underlying data marts communicate through the OpenFlow protocol, and a network topology information database is set up for them that stores the resource information of each data mart and the link information between data marts.^27–29 The network topology is represented by a weighted graph G_s = (N_s, L_s), where N_s is the set of data marts and L_s is the set of physical links. For a data mart n_s ∈ N_s, the node's compute resources (CPU) and its Flowtable Storage and Processing Capabilities (FSPC) are the two attributes of the node. A physical link l_s ∈ L_s is represented by an unordered pair of data marts and denotes the link between them. The bandwidth BW(l_s) is the link's bandwidth resource attribute.

The network mapping of a virtual data domain is the process of mapping the virtual network graph $G'_{s}$ onto a subset of the physical SDN network G_s. This process consists of two parts, the node mapping ƒ_N and the link mapping ƒ_L.

1. The node mapping $f_{N} : N'_{s} \to N_{s}$

The data node set of the virtual data domain is defined as $N'_{s}$ , and $N'_{s}$ is mapped to a subset of the physical network. In this subset network, each virtual data mart node takes up CPU and FSPC resources mapped to the actual data mart. Assuming that $n'_{s} \in N'_{s}$ , the actual CPU resource value for each virtual node is $cpu (n'_{s})$ and the actual FSPC resource value is $fspc (n'_{s})$ .

2. The link mapping $f_{L} : L'_{s} \to L_{s}$

The set of links between the data nodes of the virtual data domain is defined as $L'_{s}$ . Based on the mapping of nodes, a physical link that meets the bandwidth requirement of the virtual data domain is found, and ƒ_L maps the virtual link to that physical link. Assuming that $l'_{s} \in L'_{s}$ , the bandwidth data value for each virtual link is $bw (l'_{s})$ .

Figure 3 represents the virtual data domain network mapping. The second half of the SDN network virtualization diagram is the real data mart network topology G_s = (N_s, L_s). The data mart nodes are represented by letters, namely, N_s = {A, B, C, D, E}, and the links between them are L_s = {AB, AC, BC, CD, DE}. The numbers next to each data mart node give its total physical CPU and FSPC resource values, and the number next to each link gives the total bandwidth of the link.

Figure 3.

Virtual data domain network mapping.

When establishing a virtual data domain, it is important to keep overhead low. The underlying network mapping should accommodate as many data domain applications as possible to improve the benefit of the physical SDN network, while ensuring that each link has the required bandwidth.^30,31 When a virtual domain application generates a virtual domain, the underlying physical network resources admit different mapping schemes. The network resource overhead–allocation ratio is put forward as the criterion for selecting the optimal mapping scheme among them. Definitions are as follows:

1. The value of the spent resource O(N_s, L_s): (N_s, L_s) represents the subset of the underlying network onto which the virtual domain is mapped. In this network subset, the total resources that the underlying network has already allocated to other virtual domains are O(N_s, L_s). This comprises the CPU and FSPC values of the underlying data mart nodes that have been assigned to other domains, plus the sum of the link bandwidth bw(l_s) already allocated between data mart nodes. The larger the value, the more of the underlying resources are already in use

O (N_{s}, L_{s}) = \sum_{n_{s} \in N_{s}} (c (n_{s}) + fspc (n_{s})) + \sum_{l_{s} \in L_{s}} bw (l_{s})

(2.1)

2. Maximum resource value to be allocated: $D (N'_{s}, L'_{s})$ is the maximum amount of resources that the network subset can allocate to this virtual domain $(N'_{s}, L'_{s})$ . Its node part is the maximum CPU and FSPC value that can be allocated to the virtual data domain, excluding the resources of the data mart nodes that are already allocated

D_{1} (N'_{s}, L'_{s}) = \sum_{n'_{s} \in N'_{s}} [(C (n_{s}) - c (n'_{s})) + (FSPC (n_{s}) - fspc (n'_{s}))]

(2.2)

The maximum allocatable bandwidth of the links between the data mart nodes is represented as $D_{2} (N'_{s}, L'_{s})$ . It gives the largest bandwidth value that the current underlying network can assign to the virtual domain; the larger it is, the more resources remain available for virtual domain applications

D_{2} (N'_{s}, L'_{s}) = \sum_{l'_{s} \in L'_{s}} (BW (l_{s}) - bw (l'_{s}))

(2.3)

Thus, the maximum resource value that can be allocated is represented as

D (N'_{s}, L'_{s}) = D_{1} (N'_{s}, L'_{s}) + D_{2} (N'_{s}, L'_{s})

(2.4)

3. Overhead–allocation ratio: β is the ratio of the resource value already spent to the maximum resource value that can be allocated. The smaller the spent resource value and the larger the allocatable resource value, the smaller β becomes. A smaller β indicates a better mapping scheme: network resources are better allocated, more resources are left for future virtual domain applications, and more virtual domain applications can be accepted

β = \frac{O (N_{s}, L_{s})}{D (N'_{s}, L'_{s})}

(2.5)

Based on the above overhead–allocation ratio formula, the virtual network mapping algorithm is as follows. When a virtual data domain application is created, the algorithm takes as input the start and end points of the data mart communication and the maximum attribute requirements of the virtual data domain. A depth-first search enumerates all paths between the start and end points, and each path is checked to determine whether the remaining physical network resources can meet the attribute requirements of the virtual data domain. Among the paths that meet them, the one with the smallest overhead–allocation ratio is selected, and the algorithm outputs it as the virtual network mapping scheme.

Algorithm input: data mart communication starting point A and end point B, and the CPU, FSPC, and bw values required by the virtual network.

1) path = DFS(A, B);

2) truepath = check(path, cpu, fspc, bw);

3) for each i ∈ truepath.size

4) Node mapping $f_{N} : N'_{s} \to N_{s}$ ;

5) Link mapping $f_{L} : L'_{s} \to L_{s}$ ;

6) Calculate the spent resource O(N_s, L_s);

7) Calculate the maximum allocatable resource $D (N'_{s}, L'_{s})$ ;

8) Calculate the overhead–allocation ratio β;

9) min-β = Compare(β);

10) end for

11) $G'_{s} (N'_{s}, L'_{s})$ = truepath(min-β);

Algorithm output: virtual mapping subset $G'_{s} (N'_{s}, L'_{s})$ .
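The listing above can be sketched in runnable form as follows. This is an illustration under stated assumptions rather than the authors' code. All function and variable names are hypothetical. The allocatable value D is taken here as total capacity minus the amount already allocated on every node and link of a candidate path, which is one reading of equations (2.2) and (2.3). The capacities come from Tables 1 and 2, while the `used` and `bw_used` values are made up for the example.

```python
def all_simple_paths(adj, start, end, path=None):
    """Depth-first enumeration of every simple path from start to end."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    found = []
    for nxt in adj[start]:
        if nxt not in path:
            found.extend(all_simple_paths(adj, nxt, end, path))
    return found


def link_key(a, b):
    """Unordered link name, e.g. ('C', 'A') -> 'AC'."""
    return "".join(sorted((a, b)))


def select_mapping(adj, cap, used, bw_cap, bw_used, start, end, cpu, fspc, bw):
    """Return (path, beta) with the smallest overhead-allocation ratio, or (None, None)."""
    best_path, best_beta = None, None
    for path in all_simple_paths(adj, start, end):
        links = [link_key(a, b) for a, b in zip(path, path[1:])]
        # check(): remaining resources must cover the request on every node and link
        nodes_ok = all(cap[n][0] - used[n][0] >= cpu and
                       cap[n][1] - used[n][1] >= fspc for n in path)
        links_ok = all(bw_cap[l] - bw_used[l] >= bw for l in links)
        if not (nodes_ok and links_ok):
            continue
        # O: resources already spent on this subset (equation 2.1)
        spent = sum(sum(used[n]) for n in path) + sum(bw_used[l] for l in links)
        # D: resources still allocatable on this subset (assumed reading of 2.2-2.4)
        free = (sum(sum(cap[n]) - sum(used[n]) for n in path)
                + sum(bw_cap[l] - bw_used[l] for l in links))
        if free == 0:
            continue
        beta = spent / free                      # equation 2.5
        if best_beta is None or beta < best_beta:
            best_path, best_beta = path, beta
    return best_path, best_beta


# Capacities from Table 1 (CPU, FSPC) and Table 2 (bandwidth)
cap = {"A": (16, 10), "B": (20, 15), "C": (17, 17), "D": (20, 16), "E": (32, 10)}
bw_cap = {"AB": 30, "AC": 40, "BC": 25, "CD": 35, "DE": 25, "AE": 35}
# Hypothetical amounts already allocated to other virtual domains
used = {"A": (4, 2), "B": (2, 1), "C": (6, 5), "D": (5, 4), "E": (20, 6)}
bw_used = {"AB": 5, "AC": 20, "BC": 5, "CD": 10, "DE": 15, "AE": 20}

adj = {}
for name in bw_cap:
    a, b = name
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

chosen_path, chosen_beta = select_mapping(
    adj, cap, used, bw_cap, bw_used, "A", "D", cpu=5, fspc=3, bw=8)
```

With these illustrative loads the three feasible paths A-B-C-D, A-C-D, and A-E-D have ratios of about 0.28, 0.49, and 0.86, so the longer but less loaded path A-B-C-D is selected. Enumerating all simple paths is exponential in the worst case, so this form is only suitable for small topologies.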

Experimental demonstration

Controller deployment simulation

The experiment uses Mininet to simulate the underlying network topology and OpenDaylight as the controller. A topology with eight nodes is set up, the betweenness centrality and reliability of the nodes are analyzed, and the optimal controller deployment node is chosen with the multi-attribute decision-making model. The result is compared with a method that uses network topology centrality as its single parameter. Figure 4 compares the number of routing hops needed for the controller to issue flow tables when the controller is deployed at node D, which is selected by this method, and at node A. The comparison shows that, when node A is selected as the control node, the number of routing hops to each other node is greater than or equal to the number when node D is the deployment node. Therefore, deploying at D reduces the number of routing hops for flow tables issued by the controller and saves routing cost.

Figure 4.

Comparison of the routing number of the flow table of the controller.

Node B and node E communicate 10 times, and the first communication time and the arithmetic mean of the round-trip delay are compared with D as the deployment node and with A as the deployment node. Figure 5 shows the experimental comparison of the two deployments. With D as the deployment node, both the first communication time and the arithmetic mean of the round-trip delay are better than with A as the deployment node, so deploying at D rather than at the reference node A effectively reduces the propagation time of control information.

Figure 5.

Control path information communication for the first time comparison.

As the scale of the network topology increases, the first communication time and the arithmetic mean of the round-trip delay are compared in the same experimental environment; the results are shown in Figure 6. When the underlying network topology is small, the choice of controller deployment node by either method has little effect on the time needed to issue flow tables, and the time performance of the two methods is similar. As the underlying network grows, the choice of deployment node has an increasing influence on the flow table issuing time, which shows that the deployment node affects SDN network performance. At the same time, the communication time for nodes selected by the multi-parameter model is less than that of the method based on network topology centrality, which shows that selecting the controller deployment node with the method of this article shortens message propagation time and reduces transmission delay.

Figure 6.

Deployment method network performance comparison.

Virtual mapping algorithm simulation

The network mapping selection process for virtual domain applications is simulated in Matlab, comparing the mapping method based on the overhead–allocation ratio proposed in this article with the shortest path network mapping method. The underlying physical SDN network is set to 10, 50, 100, 150, and 200 nodes, respectively, and each pair of nodes in each group is connected with a probability of 0.5. The CPU and FSPC resources of the physical nodes and the bandwidth of the physical links follow a uniform distribution over 50–100. For each network topology, the CPU, FSPC, and link bandwidth requirements of the requested virtual nodes follow a uniform distribution over 0–50. The request acceptance rate and the underlying resource utilization ratio of the two methods are compared.
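The substrate networks described above can be generated along the following lines. The source experiment was run in Matlab; this Python sketch only illustrates the stated setup, with hypothetical names, and assumes the uniform distributions are continuous.

```python
import random


def random_substrate(n, p=0.5, low=50, high=100, seed=None):
    """Random physical network: n nodes, each pair linked with probability p.

    Node CPU/FSPC values and link bandwidths are drawn uniformly from [low, high].
    """
    rng = random.Random(seed)
    nodes = {
        i: {"cpu": rng.uniform(low, high), "fspc": rng.uniform(low, high)}
        for i in range(n)
    }
    links = {}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                links[(i, j)] = rng.uniform(low, high)  # bandwidth of link i-j
    return nodes, links


# One topology per network size used in the experiment
substrates = {n: random_substrate(n, seed=n) for n in (10, 50, 100, 150, 200)}
```

Virtual requests could be drawn the same way with `low=0, high=50` for the demanded CPU, FSPC, and bandwidth values.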

Figure 7 shows that the acceptance rate of both methods is low when the network topology is small, although the request acceptance rate of the overhead–allocation ratio scheme is slightly higher than that of the shortest path mapping scheme. As the network scale increases, both methods can accept more virtual domain applications, because a larger network topology offers more mapping paths for a virtual domain request. The request acceptance rate of the overhead–allocation ratio mapping scheme nevertheless remains higher than that of the shortest path scheme.

Figure 7.

Request acceptance rate comparison.

The experimental results in Figure 8 reflect that the overhead–allocation algorithm takes the CPU and FSPC resources of the physical nodes and the bandwidth of the physical links into account when mapping a virtual network. The shortest path mapping scheme, in contrast, usually considers only a locally optimal solution and not global network resource usage. Therefore, with the overhead–allocation ratio method, the underlying network can accept more virtual domain applications and make reasonable use of the underlying resources.

Figure 8.

Comparison of resource utilization.

Virtual data domain building simulation

The experiment uses technology business data. There are five data sets at the bottom layer; the CPU and FSPC resources and the IP information of each data mart are shown in Table 1. These data sets reside in different physical locations and constitute the underlying data network topology. The control layer adopts the OpenDaylight controller, and each virtual data domain is controlled by a single controller. As the agent between the control layer and the data layer, the protocol agent manages the underlying data sets, the data mart domain information, and the mapping between virtual domains and real domains.

Table 1.

Data mart resource information table.

Data mart name	CPU	FSPC	IP
Science and technology project set (B)	20	15	10.0.1.1
Science and technology talent set (A)	16	10	10.0.1.2
Science and technology organization set (C)	17	17	10.0.1.3
Innovative service set (E)	32	10	10.0.1.4
Scientific and technological achievements set (D)	20	16	10.0.1.5

CPU: compute resources; FSPC: Flowtable Storage and Processing Capabilities.

Once the protocol agent has started successfully, it establishes communication with the underlying data marts through the OpenFlow protocol to obtain the resource information of each data mart and the link information between them, so as to build the undirected graph G_s = (N_s, L_s). The data mart information obtained by the protocol agent is shown in Table 1, including the data mart name, CPU resource, FSPC resource, and IP. For example, the science and technology project set is represented by B; its total CPU resource is CPU(B) = 20, its FSPC resource is FSPC(B) = 15, and its IP address is set to 10.0.1.1.

At the same time, the protocol agent also constructs the network links between data marts. Each link is represented by a pair of data mart letters, and its attribute is the total bandwidth of the link, as shown in Table 2. Link AB is the link between data mart A and data mart B, and its value represents the data transmission capability of the link.

Table 2.

Data mart link information table.

Link	Bandwidth
AB	30
AC	40
BC	25
CD	35
DE	25
AE	35

For the virtual framework of Figure 1, the science and technology project set (B), the science and technology talent set (A), and the science and technology organization set (C) are virtualized as the technology project domain. The innovative service set (E) and the scientific and technological achievements set (D) are virtualized as the domain of scientific and technological achievements. When the talent achievement domain needs to be constructed, its data come from the science and technology talent set (A) and the scientific and technological achievements set (D). The network mapping scheme proposed in this article is compared with the greedy resource mapping scheme and the shortest path mapping scheme. Under the same network conditions, the virtual data domain is set up with each of the three schemes, the D and A data mart nodes communicate 10 times, and the network performance of each scheme is recorded; the comparison is shown in Figure 9. Because the data set and the communication time in this experiment are limited, the times of the different virtual mapping schemes do not differ much. Therefore, in the following experiments, the scale of the underlying data domain is increased to bring out the time differences between the schemes.

Figure 9.

Virtual mapping scheme comparison.

The first communication time in the A-B-C-D virtual mapping is the time taken for the flow table rule issued by the controller to be translated by the protocol agent and added to the data mart nodes. As can be seen from the figure, the first communication time of the A-B-C-D virtual mapping scheme is the shortest, indicating that the flow table processing (FSPC) of its data mart nodes is faster than in the other schemes. Across the 10 communications from data mart node D to data mart node A under the three schemes, the average time of the A-B-C-D scheme is less than that of the other schemes, which suggests that its nodes have better CPU capacity for processing information than those of the other schemes. However, the fastest single time among the 10 communications belongs to the A-C-D scheme, which indicates that the A-B-C-D scheme incurs a certain path cost.

In the same experimental environment, 30 underlying data domains are set; each pair of data domains is connected with a probability of 0.5, and the CPU, FSPC, and BW resource values of each data domain node follow a uniform distribution over 50–100. The number of requests per unit time is set to 1/m, and simulated users have four units of time to build application processes and virtual data domains. Over 10 runs, the number of virtual nodes in each request follows a uniform distribution over 3–6, and the requested resource values follow a uniform distribution over 0–30. The average delay and maximum delay of the network resource overhead–allocation ratio mapping scheme, the greedy resource mapping scheme, and the shortest path mapping scheme are compared.

Figure 10 shows the average delay (left) and maximum delay (right) of the three mapping schemes as the number of virtual data domain applications increases. The network resource overhead–allocation ratio mapping algorithm has the shortest average delay, stable at 7.08 ms, which is 29% and 35% less than the greedy resource mapping algorithm (10.08 ms) and the shortest path resource mapping algorithm (11.06 ms), respectively. The main reason is that the overhead–allocation ratio algorithm evaluates each mapping scheme as a whole, taking into account both the overall resource value and the resource value already assigned to other virtual data domains, whereas the other two algorithms consider only a local optimum and not the global resource situation of the mapping scheme. The maximum delay index shows that the network resource overhead–allocation ratio mapping algorithm also has the smallest maximum delay, about 8.08 ms.
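As a quick check on the reported reductions, the relative differences can be recomputed from the three average delays quoted above (this arithmetic is added for illustration and does not appear in the source):

\frac{10.08 - 7.08}{10.08} \approx 0.298, \qquad \frac{11.06 - 7.08}{11.06} \approx 0.360

That is roughly 30% and 36%, in line with the stated 29% and 35% if those figures were truncated rather than rounded.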

Figure 10.

Average control delay and maximum control time delay comparison.

Figure 11 compares the overhead–allocation ratio of the three mapping schemes. By formula (2.5), the ratio measures how much of the underlying resource has already been allocated to other virtual domains relative to what can still be allocated to a virtual request. As virtual domain applications continue to arrive, the overhead–allocation ratio of the network resource overhead–allocation ratio mapping algorithm is steady at 62%, compared with 74% for the greedy resource mapping algorithm and 81% for the shortest path resource mapping algorithm.

Figure 11.

Mapping overhead–allocation ratio.

Because the shortest path mapping scheme leaves many factors out of consideration in virtual network mapping, its effect is the worst. For most virtual mapping requests, the greedy resource mapping scheme can find the optimal virtual mapping, so it is better than the shortest path mapping scheme. The network resource overhead–allocation ratio mapping scheme fully considers both the underlying resources and the resources actually occupied by the mapping.

Conclusion

Under the new network architecture, this article discusses SDN controller deployment. A multi-attribute decision-making model for controller deployment is set up, and a network resource overhead–allocation ratio mapping algorithm is put forward for establishing virtual data domains on request. Simulation experiments realize the mapping of virtual data domains onto real data domains and guarantee communication within the virtual data domains that are built. Compared with the greedy resource mapping algorithm and the shortest path algorithm, the proposed algorithm reduces the communication delay of the virtual domain and makes effective use of the underlying resources.

However, the method proposed in this article still has some shortcomings. For example, when multiple virtual networks are mapped onto the same underlying network facilities, the failure of a single underlying network device affects multiple virtual networks. Therefore, no underlying network device should be a single point of failure. Future research on this problem can build on the virtual data domain construction framework proposed in this article by establishing backup resource data nodes and dynamically updating the resource usage of the entire network.
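The failure scenario described above can be illustrated with a small check that lists the underlying devices shared by more than one virtual network. This is only a sketch under assumed inputs: the mapping format, the device names, and the function `devices_shared_by_virtual_networks` are hypothetical and are not taken from the framework in this article.

```python
from collections import defaultdict


def devices_shared_by_virtual_networks(mapping):
    """Return {device: sorted virtual networks} for every underlying
    device that hosts more than one virtual network."""
    hosted = defaultdict(set)
    for vnet, devices in mapping.items():
        for device in devices:
            hosted[device].add(vnet)
    return {d: sorted(v) for d, v in hosted.items() if len(v) > 1}


# Hypothetical mapping of virtual data domains to underlying devices.
example_mapping = {
    "vdomain-A": ["s1", "s2"],
    "vdomain-B": ["s2", "s3"],
    "vdomain-C": ["s4"],
}

shared = devices_shared_by_virtual_networks(example_mapping)
print(shared)  # {'s2': ['vdomain-A', 'vdomain-B']}
```

In this example, a failure of device `s2` would affect two virtual data domains, so it is a natural candidate for a backup resource data node.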

Footnotes

Handling Editor: Luca Reggiani

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors acknowledge the National Natural Science Foundation of China (Grant: 61373160), the Standardization Processing and Application System Development of Science and Technology’s Big Data (Grant: 17210113D), and Science and Technology Resource Survey, Statistical Analysis and System Development (Grant: 179676334D).

ORCID iD

Wenbin Zhao

References

1. Khan, Yaqoob, Hashem IAT, et al. Big data: survey, technologies, opportunities, and challenges. Sci World J 2014; 2014: 712826.

2. Dev, Patgiri. A survey of different technologies and recent challenges of big data. In: Nagar, Mohapatra, Chaki (eds) Proceedings of 3rd international conference on advanced computing, networking and informatics. New Delhi, India: Springer, 2016, pp.537–548.

3. Jiang, Wang, Dan, et al. Key technology of distributed file system for big data analysis. Comput Res Develop 2014; 51(2): 382–394.

4. Liu. Research on key technologies of distributed storage in cloud computing environment. In: International conference on electrical & electronics engineering and computer science, Hangzhou, China, 26 April 2016.

5. König. Method for storing and recovering data, utilization of the method in a storage cloud, storage server and computer program product. Patent 9419796B2, USA, 2016.

6. Costa LHMK. Vulnerabilities and solutions for isolation in FlowVisor-based virtual network environments. J Internet Services Appl 2015; 6(1): 18.

7. Zhao. A review of data visualization. J Chongqing Univ Posts Telecommun 2016; 28(4): 494–502.

8. Zhang, Crew, Raksuntorn. Software defined network (SDN) research progress. J Software 2015; 26(1): 62–81.

9. Bosch. Big data in market research: why more data does not automatically mean better information. GFK Market Intell Review 2016; 8(2): 56–63.

10. Deng, Luo, et al. Research on SDN research. Comput Appl Res 2014; 31(11): 3208–3213.

11. Zuo, Chen, Zhao, et al. Research on SDN technology based on OpenFlow. J Software 2013; 24(5): 1078–1097.

12. Wang S-L, Zhang, et al. SDN architecture and security research. Telecommun Sci 2013; 29(3): 117–122.

13. Zhao, Fan, Nie, et al. Research on attribute dimension partition based on SVM classifying and MapReduce. Wireless Pers Commun 2018; 102(4): 2759–2774.

14. Shen, Huang, Luo. Software definition network and its application analysis. Comput Visual 2014(4): 133–137.

15. Zheng, Zhu, Lyu MR. Service-generated big data and big data-as-a-service: an overview. In: International congress on big data, Santa Clara, CA, 27 June–2 July 2013, pp.403–410. New York: IEEE.

16. Zhao, Yin, Fan, et al. Research on influence spread of scientific research team based on scientific factor quantification of big data. Int J Distrib Sensor Netw 2019; 15(4): 1550147719842158.

17. Wei, Wang, Liu, et al. Design of virtual platform based on SDN. Telecom Technol 2014(6): 47–52.

18. Yin, Huang, Wang, et al. Software defined virtualization platform based on double-protocol in multiple domain networks. In: International ICST conference on communications and networking in China, Maoming, Guangdong, China, 14 August 2014, pp.776–780. New York: IEEE.

19. Liu, Huang, Zhang, et al. Overview of the virtual slice mechanism of SDN test bed network. J Commun 2016; 37(4): 159–171.

20. Cai, et al. High reliability virtual network mapping algorithm based on Openflow network. J Electron Inform 2014; 36(2): 396–402.

21. Xiao, Jiang, Chen, et al. A survey of cloud computing data virtualization service. Applied Mechanics & Materials 2014; 441: 1016–1019.

22. Yang, Tang. Data virtualization for coupling command and control (c2) and combat simulation systems. In: Chinese conference on image and graphics technologies, Beijing, China, 19–20 June 2015. Berlin: Springer.

23. Tofigh, Adibi, Mobasher, et al. Novel approach to big data collaboration with network operators network function virtualisation (NFV). Int J Parall Emerg Distrib Syst 2015; 30(1): 65–78.

24. Liao, Shami, Leung VCM. Distributed FlowVisor: a distributed FlowVisor platform for quality of service aware cloud network virtualisation. IET Netw 2015; 4(5): 270–277.

25. Jackson, Nejabati, Agraz, et al. Demonstration of the benefits of SDN technology for all-optical data centre virtualisation. In: Optical fiber communications conference and exhibition, Los Angeles, CA, 19–23 March 2017, p.Tu3L3. New York: IEEE.

26. Caraguay ÁLV, Fernández JAP, Villalba LJG. An optimisation framework for monitoring of SDN/OpenFlow networks. Int J Ad Hoc Ubiquit Comput 2017; 26(4): 263–273.

27. Kourtis, Xilouris, Gardikis, et al. Statistical-based anomaly detection for NFV services. In: Conference on network function virtualization and software defined networks, Palo Alto, CA, 7–10 November 2016, pp.161–166. New York: IEEE.

28. Herrera, Botero JF. Resource allocation in NFV: a comprehensive survey. IEEE Trans Netw Service Manage 2017; 13(3): 518–532.

29. Vizarreta, Condoluci, Machuca, et al. QoS-driven function placement reducing expenditures in NFV deployments. In: IEEE international conference on communications, Paris, 21–25 May 2017, pp.1–7. New York: IEEE.

30. Assi, Shaban, et al. A reliability-aware network service chain provisioning with delay guarantees in NFV-enabled enterprise datacenter networks. IEEE Trans Netw Service Manage 2017; 14: 554–568.

31. Neves, Calé, Costa, et al. The SELFNET approach for autonomic management in an NFV/SDN networking paradigm. Int J Distrib Sensor Netw 2016; 12: 2897479.