Abstract
The Internet of Things (IoT), which combines identification, sensing, computing, and communication technologies, is considered one of the major trends in information and communication technologies. Communication performance is critical for IoT applications. According to previous research, an internet-based overlay model is feasible for the implementation of the IoT. One important issue in the overlay routing model is the overlay node placement problem (ONPP). Once the size of overlay node set is fixed to a particular number k, the ONPP changes to k-ONPP. In this work, the IoT-based overlay node placement problem is formulized and analyzed. The major contributions of the paper include providing the time complexity of multi hop k-ONPP and its theoretical limit boundary of approximation ratio and proposing a local search algorithm. Furthermore, the time complexity and approximation ratio boundary of the local search algorithm are given. The proposed local search algorithm is evaluated by both time and efficiency where efficiency refers to the degree of approximation of algorithm results with optimal solutions. Another algorithm, TAG, is used for comparison. Finally, a simulation experiment based on network simulator EstiNet is provided. The experimental results show network delay benefits from the proposed method.
1. Introduction
The Internet of Things (IoT) has been regarded as the future of internet and one of the major trends in information and communication technologies [1]. The key idea of IoT is combining identification, sensing, computing, and communication technologies to provide a better description of physical processes. IoT technologies can be applied in a wide variety of applications such as smart homes, smart cities, environmental monitoring, and health care [2].
Many IoT-based applications require timely interaction between users and physical objects. Therefore, communication performance is very important in IoT implementation. There are three options for the implementation of the IoT: using the current internet, building a new network, and building a dual-layer network [3]. Based on the consideration of both performance and ease of implementation, an internet based dual-layer network is suitable for the IoT. Here, the dual-layer network refers to the overlay network. Currently, many IoT applications are implemented using an overlay network. Take the smart grid, for example. One typical example of a smart grid is the wide area management system (WAMS). The WAMS uses the phasor measurement unit (PMU) as sensor and data collector. The collected data need to be transferred to a control center for analysis. The current WAMS is built on an IP-based network, and many studies have been conducted on the influence of network performance on WAMS [4–6].
However, as the internet only provides a best-effort service, internet-based overlay networks should add additional methods to improve network performance. Such methods include admission control and overlay routing. Admission control guarantees the worst-case delay boundary, but it may deny a connection [7] and requires special network devices. Overlay routing has been proved useful in reducing end-to-end delay [8], and no further devices are needed. The overlay routing method can be used to reduce the communication delay between sensors and the data center where the data are analyzed. One important issue in the overlay routing model is the overlay node placement problem (ONPP) [9]. The objective of the ONPP is to find the optimal overlay node set with minimum total data transfer cost. However, the size of overlay node set may be fixed to a given number k due to cost and efficiency considerations. This modified ONPP is called k-ONPP.
In this work, the overlay node placement problem (ONPP) in IoT applications is formulized and a local search algorithm is proposed. The time complexity of k-ONPP is analyzed. Furthermore, we give the theoretical limit boundary of the approximation ratio for k-ONPP. Additionally, the approximation ratio boundary of the proposed local search algorithm is provided. A genetic algorithm and a greedy algorithm are introduced for performance comparison. All algorithms are evaluated by time cost and efficiency with MATLAB tools. Here, efficiency refers to the degree of approximation of algorithm results with optimal solutions. Finally, a simulation experiment based on the network simulator EstiNet [10] is provided to test the efficiency of the proposed overlay network model and algorithm. The experimental results show network delay benefits from the proposed method.
The rest of the paper is organized as follows. Section 2 introduces the research background and related work in overlay node placement. Section 3 presents the model and formulation of k-ONPP in the IoT and provides a theoretical analysis for this problem. Section 4 proposes a local search algorithm and provides its time complexity and approximation ratio boundary. Section 5 evaluates the algorithms based on time cost and algorithm approximation using MATLAB tools. A genetic algorithm and a greedy algorithm, TAG, are introduced for comparison. The local search algorithm is tested in a simulation scenario with the network simulator EstiNet in Section 6. Some additional factors that impact the algorithm are discussed. Finally, a general conclusion is provided in Section 7.
2. Research Background and Related Work
Overlay routing has been proved to be a feasible method to improve network performance with unreliable internet infrastructure [8]. The basic concept of overlay routing is choosing one or more nodes in the overlay network as hop nodes for data transfer. Overlay routing will use an additional routing algorithm, separate from the underlying internet routing algorithm. There are two types of overlay networks: peer-to-peer networks and infrastructure networks [9]. Network nodes of peer-to-peer networks may change rapidly, while the nodes of infrastructure networks have higher persistence. In most infrastructure overlay networks, overlay nodes belong to a single entity, so it is feasible to apply the routing algorithm in these overlay nodes. Currently, most networks of IoT applications are similar to infrastructure overlay networks, so this paper focuses on the overlay node placement problem in such networks.
Previous work in [8] has proved that with the overlay routing method, the RTT of end-to-end packet may be reduced. Andersen et al. found that network performance would improve with only one hop node [11]. A random-k algorithm is proposed in [12]. The basic idea of this algorithm is randomly selecting k nodes out of M. Then, a source sends a test packet through these k nodes to the destination. The intermediate node whose response comes back first will be chosen as the overlay node. This algorithm aims to improve the network reliability after path failure occurs, so other performance metrics such as communication delay are not considered. Additionally, this algorithm is based on experimental experience, no theoretical analysis is given.
In [13], Zhu et al. further studied the node placement problem with one hop node. Their scenario uses overlay routing and multihoming to improve network performance and availability. They proved the overlay node placement problem with one hop node which is an NP-hard problem based on a reduction from the set covering problem. The number of overlay nodes is not fixed in their scenario. In contrast to previous work in [12], network performance is considered in [13]. Four heuristic methods, Random, Customer-driven, Traffic-driven, and Performance-driven, are introduced and tested. Their results show an improved RTT from sources to destinations with overlay routing.
A measurement study on the benefits from overlay routing is made in [14]. Their scenario uses overlay routing to improve end-to-end network performance. The intermediate node is also set to be one. In contrast to previous work in [11–13], the number of overlay nodes is fixed to a given number k in this work. The problem is also proved to be an NP-hard one. Four greedy heuristic algorithms are introduced and tested.
A generic description of k-ONPP is given in [9] as follows. Given M possible overlay nodes and a number k, choose k nodes out of M to optimize the application-specialized performance metric. Moreover, the number of intermediate nodes in an overlay path is unfixed. In [15], Yang et al. considered this generalized problem with the performance metric group delays from sources to destinations. In this work, the node placement problem is described by a linear programming formula and solved with ILOG CPLEX, an optimization software package designed by IBM. Although in this work the generalized problem is considered, it is unfeasible to implement the solution in a real network as the results are calculated by another software.
The latest work on overlay node placement problem can be found in [16–18]. In [16], Roy et al. introduced a greedy algorithm called Traffic Aware Greedy (TAG) and compared this algorithm with the node degree-based algorithm. Another greedy algorithm is proposed in [17]. Cohen and Raz made progress on this problem by providing a theoretical limit boundary of the approximation ratio that can be achieved by a polynomial algorithm based on the set cover problem [18]. The overlay node placement problem also has been discussed in other contexts such as web cache placement in [19, 20], but the motivation and objective are quite different.
As described above, most of the current works on the overlay node placement algorithm are greedy algorithms and lack a theoretical analysis. Although [18] analyzed this problem in a theoretical manner and provided a limit boundary of approximation ratio, their analysis is based on a one-hop overlay. In addition, the approximation ratio of their proposed algorithm is denoted by another parameter m, where m is the size of the maximum minimal overlay vertex cut. As written in [18], finding the minimal overlay vertex cut is not easy. In this work, the overlay node placement model for IoT applications is proposed, and a different analysis approach is made based on a reduction from the k-median problem. We further analysis the k-ONPP problem with a multihop overlay and provide its time complexity and the limit boundary of the approximation ratio. A local search algorithm is proposed, and the approximation ratio boundary has been provided in a more calculable way.
3. Problem Formulation and Analysis
According to previous research in IoT [21, 22], the architectural of IoT can be divided into three parts: sensor/actuator layer; information transmission layer (network layer), and application layer. As mentioned in the introduction, currently the internet based dual-layer network is feasible for an IoT application, so an abstract IoT architectural scheme can be described as in Figure 1.

IoT architectural scheme.
Consider the above IoT architectural scheme, the communication network of IoT is used for data gathering from distributed sensors to analysis centers. Some data concentrators, such as PMU in smart grid, are deployed so that an internet-based dual layer network is already there. These concentrators and sensors are fully constructed and controlled by the same entity or group, so they can be used as an overlay node to gain benefits. As these concentrators are almost persistent, this overlay network can be regarded as an infrastructure overlay network. Thus, the problem is how to place these concentrators to maximize the benefits. Figure 2 in the following page provides a sketch map of such a dual layer network.

Sketch map of a dual layer network of the IoT.
3.1. Problem Formulation
We consider the performance metric of group communication delay, which is the total communication delay from each sensor to the analysis center. Group communication delay is used because data generated by sensors in IoT application is predictable and generally periodic. This means that if group communication delay drops, system performance may become better. Then, the overlay node placement problem can be formulized as follows.
Consider a physical network represented as a graph
3.2. Problem Analysis
In this section, we will analysis the time complexity for k-ONPP and discuss the theoretical limit boundary of approximation ratio. We give the following theorems.
Theorem 1.
k-ONPP is an NP-hard problem.
Proof.
First, we consider another NP-complete problem. The k-median decision problem is a typical NP-complete problem which can be described as follows. Given a client set
Next, we modify problem
Now, we consider the original k-ONPP. Given a graph Use the polynomial algorithm for k-ONPP to find the result R of constructed k-ONPP. Test if
Because
Next, we provide the theoretical limit boundary of an approximation ratio for k-ONPP. In k-ONPP, define following parameters:
Theorem 2.
There is no polynomial algorithm for k-ONPP with an approximation ratio less than
Proof.
As proved above, problem
Both α and ω are easy to calculate in this formula. It is obvious that the time complexity to obtain ω is
4. Proposed Algorithm and Analysis
As discussed above, the k-ONPP problem is similar to the k-median problem, so the algorithm for the k-median problem may also be applied in k-ONPP. The proposed local search algorithm is modified from the local search algorithm developed by Arya et al. in [24]. However, the discussion in [24] is based on the metric space, which means that the distance definition satisfies the symmetrical characteristic and triangle inequality. However, neither of these two properties is consistent with network delay. In fact, if network delay satisfies triangle inequality, there is no need to optimize the delay as
We define the cost function as Cost
Delay graph (1) Random select a set N which (2) Constructing a new graph (3) Apply Dijkstra algorithm in (4) Calculating the (5) If (6) Return N.
Next, we discuss the time complexity of the proposed algorithm. We state the following theorem.
Theorem 3.
The time complexity of the proposed local search algorithm is polynomial.
Proof.
Let
Finally, we provided the approximation ratio boundary of the local search algorithm in Theorem 4.
Theorem 4.
The approximation ratio boundary of the proposed local search algorithm is
Proof.
Suppose that the optimal set for k-ONPP is O and local optimal set is N. Let
5. Algorithm Evaluation
In this paper, the proposed local search algorithm is tested with both Matlab and the network simulator EstiNet. EstiNet is used to test the algorithm's performance in a network environment. However, as the amount of network nodes is limited in a simulator, in this section, Matlab is used to evaluate the algorithm based on time cost and effectiveness. A genetic algorithm is introduced to approach the optimal result, while the TAG algorithm proposed in [16] is used for comparison.
5.1. Experiment Design and Implementation
As described above, a physical network can be represented as a graph
The Matlab test is implemented on a laptop with two Intel i5 core processors and 3 GB memory. For each experiment, we test the algorithm 500 times. The mean value of the proposed algorithm is then calculated to better expose the performance. In addition, the 95% quartile of 500 tests is calculated and the 95% confidence interval of the mean value is obtained. For the genetic algorithm, the best result of the 500 tests will be recorded, as it is used to generate the optimal result.
5.2. Two Other Algorithms for Comparison
5.2.1. Genetic Algorithm
To evaluate the time cost and algorithm approximation, the global optimal result is needed for comparison. However, as previously proved, k-ONPP is NP-hard; we cannot calculate the global optimal solution with an increasing problem scale. So, the result of a genetic algorithm is used to approximate the global optimal result. It is unnecessary to provide the detailed steps of the genetic algorithm, so only the key definitions are described here as follows.
Genetic representation. The target of k-ONPP is finding k nodes out of possible set B to minimize group delay. A natural thought is using the tag of these nodes to represent the solution. So, we mark each node in set B with a number and represent each individual as a set of numbers. The size of the individual set is fixed to k. Fitness function. The property of the fitness function is that the better the solution is, the larger the fitness will be. However, the cost function defined above is the group delay. Suppose that N denotes current generations and C denotes the fitness of calculated individual. We define fitness function as follows:
Crossover. Randomly choose the crossover point in an individual set. Then, the tag number before or after the crossover point is swapped. However, the same point may appear twice in a single individual set after crossover. For example, this occurs if individual A is (3, 4, 7, 9); individual B is (1, 3, 5, 7), and the crossover has happened at the second point; one of the results is (3, 3, 5, 7). In case of such a scenario, we extract the set of the same points from individuals and cross the left part. After the crossover, this set is added into both results. During the experiment, an adaptive crossover probability is used based on the work of Srinivas and Patnaik in [25]. Mutation. Given a mutation probability, for each point in the individual set, randomly generate a number between 0 and 1. If the random number is bigger than mutation probability, select another point in candidate set B to replace this point. An adaptive mutation probability is also used based on [25].
5.2.2. TAG Algorithm
The TAG algorithm is a greedy algorithm proposed by Roy et al. in [16]. The TAG algorithm works as follows. It selects overlay nodes from candidate set B based on a greedy strategy. During each step, the algorithm chooses the node that gives the best value of the cost function. Suppose that there are already m nodes in the overlay node set O. Then, the
5.3. Experimental Results
5.3.1. Comparison between Optimal Solutions and Genetic Algorithm Solutions
As a genetic algorithm is used to acquire the global optimal result, the efficiency of genetic algorithm must first be tested. Table 1 presents the comparison of the genetic algorithm with the global optimal algorithm in a small scale problem. The global optimal algorithm is achieved by the traversing method, so it is limited by the problem scale. For example, the time cost with
Comparison of genetic algorithm results with global optimal results.
As mentioned above, the adaptive crossover and mutation probability are used in the implementation of genetic algorithm. Adaptive function and parameters are the same as described in [25] except a minimum crossover probability is set to 0.1 and a minimum mutation probability is set to 0.05. Population size is set to 300. The iteration stop condition is that the fitness value remains unchanged for 100 iterations.
As Table 1 in the following page presents, the genetic algorithm works efficiently with the above definitions and parameters. The obtained results of genetic algorithm are very close to the results calculated by traversing method. Also, the best result of genetic algorithm can be the same as optimal result as illustrated in Table 1. These results illustrate the efficiency of the proposed genetic algorithm. During the experiment, we also found that increased population size would lead to increased probability of getting optimal results but with increased time cost too.
5.3.2. Comparison of Different Algorithms
Figure 3 compares the results of the local search algorithm, the TAG algorithm, and the best result achieved by the genetic algorithm.

Algorithm results comparison.
Clearly, in Figure 3, the results of local search algorithm are almost as good as the best result of the genetic algorithm, while the results of the TAG algorithm are very unstable. Although in some cases the TAG algorithm can obtain a result close to local search algorithm, the result of TAG is not monotonic, as shown in Figure 3. This occurs when the candidate set B increases but the result decreases. This is because TAG selects a node that gives the best cost value according to the current node set. However, a good node in one step may not be part of the best overlay node set. And, once the node is selected, there is no strategy for replacement.
Figure 4 illustrates the time cost of different algorithms. Time cost of local search algorithm and TAG algorithm is the mean time cost of 500 tests. Time cost of genetic algorithm is calculated based on the mean value of iteration step which returns the best solution.

Time cost comparison.
As Figure 4 presents, the genetic algorithm is more time-consuming than the local search algorithm, and the time cost of TAG is minimal. This result is consistent with the theoretical analysis. Additionally, these results clearly show that the time cost of the local search algorithm is linear with the size of candidate set B.
5.3.3. Stability of the Proposed Algorithm
As there is no randomness in the TAG algorithm, each of the 500 tests obtains the same result. For the proposed local search algorithm, there are random steps, so the corresponding result may fluctuate. However, during the tests, we discovered that the result of the proposed algorithm only varies over a small interval; therefore, the proposed algorithm works stably. Table 2 presents the 95% quartile of 500 tests and the 95% confidence interval of the mean value. Regardless of the distribution of the results, the confidence interval of the mean value is still reasonable because of the central limit theorem. Each of the experiments can be treated as an independent random variable, and these independent random variables have the same mean and variance. Thus, the mean value of these independent random variables follows the normal distribution.
Stability of proposed algorithm.
As Table 2 presents, the proposed algorithm works quite stably.
5.3.4. Impact of the Number of Overlay Nodes
In Section 3, we noted that because of the cost of deploying overlay nodes and the cost of maintaining communication delay information between overlay nodes, the size of set O should be limited. Figures 5 and 6 present the algorithm results and time cost with k increasing under the same M.

Algorithm results with an increasing k.

Time cost with an increasing k.
As Figures 5 and 6 show, the algorithm result and the number of overlay nodes are not linearly correlated. When k increases to a certain range, the algorithm results may remain unchanged. In contrast, the time cost will keep increasing. Thus, considering both the algorithm efficiency and cost of deployment, the size of the overlay set should be limited.
6. Experiment with Network Simulation
In this section, the proposed local search algorithm is tested in a network simulation environment. This section is used to represent the algorithm performance and some factors that impact the performance. The network simulator EstiNet is applied.
6.1. EstiNet Network Simulation
EstiNet is a novel network simulator developed by Wang et al. since 1999 [26]. The current version is EstiNet 8.0, which can be found in [10]. A novel simulation methodology called “kernel reentering methodology” is implemented in EstiNet. It combines the advantages of both simulation and emulation. In contrast to existing network simulators NS2 or OPNET, EstiNet uses the real-life UNIX TCP/IP protocol stack during the simulation, and thus all the real-life network application programs can readily run on any node in a simulated network without any modification. This approach can only be performed in emulation mode with other simulators. However, with emulation mode, the performance of switches, links, and so forth, is controlled by the operating system, which makes the results unpredictable; thus, the simulation result cannot be repeated precisely. There is a useful tool in EstiNet called “Generate Large Internet-like Network.” It automatically generates a large network that is similar to the Internet. We use this tool to implement the following experiments.
6.2. Experiment Design
Figure 7 in the following page shows the schematic diagram of the proposed experiment. The red points denote the sensors that are used to collect data. The black points denote the data analyzer used to gather data from sensors. The blue points denote the overlay node, and the red dotted lines denote the logic path from sensor to analyzer through the overlay node. A data-generating program is implemented in each sensor which periodically generates data packets with size L. A data server program is implemented in the analyzer. A data transfer program is implemented in the overlay node, and it retransfers each packet from the sensor to the next overlay node or data server. The goal is to find the optimal overlay node set with given size k that minimizes the sum of time delay from each sensor to the analyzer.

Schematic diagram of the experiments.
In a realistic environment, a delay testing program and transfer program should be placed at each candidate node. Once the optimal overlay set has been found, the transfer program in overlay node is set to continue while the other transfer program suspends. The recalculation should be triggered by time or by network events such as increasing network delay from sensors. For simplicity of analysis, we decompose the progress in the simulation. The simulation steps are as follows.
First, we add network traffic in the simulation environment. Then, we find the original time delay from the sensors to the analyzer. In this step, data-generating programs implemented in sensors are directly sending data to the server with the server IP address. To obtain the optimal overlay node set, a delay matrix that indicates the time delay of each of the two nodes in candidate set B is needed. Thus, a program to measure time delay from each of the two nodes is required. Here, we use the ping method to acquire the RTT of each of the two nodes. We modified the ping program to calculate the mean RTT and record it in a file. We construct the delay matrix out of measured time delay and find the optimal overlay node set. We implement the node transfer programs in the overlay nodes and retest the group time delay from the sensors to the analyzer.
The simulation network includes 167 nodes; we randomly select 11 nodes as sensors and 1 node as the analyzer. The overlay node set size is set to 5, and all the other nodes are added into candidate node set. 5 overlay nodes are used because system performance improvement becomes trivial with more overlay nodes. To eliminate the effect of randomness of the network, each of the following experiments has been repeated five times.
6.3. Experimental Results and Analysis
Network communication delay consists of four parts: nodal processing delay, queuing delay, transmission delay, and propagation delay. Queuing delay, nodal processing delay, and propagation delay can be measured by RTT, while transmission delay is related to network bandwidth and data size. To test the overlay network performance with different conditions, different data sizes are used in the experiment. In addition, all the network programs used in the experiment are implemented based on the Linux socket. Within socket programming, a large file should be decomposed into small packet for transfer; otherwise, the delay would increase. Consequently, packet size is also used as a parameter in the experiment. Some interesting phenomena are revealed.
Table 3 illustrates the overlay network performance with increasing data size. Here, D denotes the size of data sent from sensors and P denotes the program transfer packet size. The original cost means the group delay from sensors to analyzer without the overlay. The overlay network cost means the group delay after applying the overlay. The total improvement is calculated with a percentage.
Comparison of overlay network performance with increasing data size.
Table 3 illustrates that overlay network with the proposed algorithm can improve network delay under all conditions. Additionally, the transmission delay would affect system performance. When the sensor data size is small enough, the delay improvement is extremely good. This performance would decrease with the increase of sensor data size. However, after data size achieves a certain range, system performance would be stable. In addition, Table 2 also reveals that the transferring packet size should be set in agreement with MTU. When P is 1430, the system performance is the best. (Although when P is 1800, the delay improvement seems better and the actual delay cost increases.)
6.4. Further Discussion
Although the above experiments show good results of overlay network with the proposed algorithm, there is still work to be done. As described above, the weight of an edge in a network is represented by the average RTT. This weight matrix is the basic requirement for optimization. However, the mean value may not be sufficient to denote this network edge weight matrix. According to current research in internet performance, both internet traffic and delay show the self-similarity characteristic [27, 28]. In a self-similarity time-series, mean value and variance are not appropriate for describing the property because they may not exist [29]. To simulate the effect of hop phenomenon of RTT in real network, a large file is randomly added into network traffic. According to [30], this may be one cause of self-similarity. In Figure 8, the effect of such a phenomenon is shown.

Time cost with random large files.
Figure 8 shows the group delay of 30 former experiments with
7. Conclusions
Network delay is one of the critical issues in IoT applications. Based on both performance and ease of implementation, an internet based dual-layer network, which refers to the overlay network, is suitable for the IoT. Overlay routing has been proved to be a feasible solution to optimize the end-to-end delay. In this work, one of the most important issues in overlay routing, the overlay node placement problem (ONPP), has been discussed. The NP-hardness of multihop k-ONPP has been proven, and a theoretical boundary for k-OPNN optimization is provided.
With defined parameters ω and α, there is no polynomial algorithm with an approximation ratio that is less than
The local search algorithm is finally tested with the network simulator EstiNet. The experimental results show a stable benefit from the proposed method. Moreover, an additional discussion about measuring network edge weight is provided, which reveals a future research direction for ONPP.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work is supported in part by the Ministry of Science and Technology of China under the National 973 Basic Research Program (Grants no. 2013CB228206 and no. 2011CB302505) and the National 863 Science and Technology Support Program (Grant no. 2013BAH19F01), the National Natural Science Foundation of China (Grants no. 61233016), the China Southern Power Grid Science and Technology project (K-SZ2012-026), and the Tsinghua National Laboratory for Information Science and Technology (TNList) cross-disciplinary research program. The authors thank the EstiNet Company for providing a free license for the EstiNet software and technical support.
