Abstract
Three-dimensional network on chip (3D NoC) is developed based on three-dimensional integrated circuit, system on chip and two-dimensional network on chip. The 3D NoC is mainly used to solve the problems such as communication bottleneck of highly integrated chips. Mapping of 3D NoC is a key problem in the research area of 3D NoC. The authors proposed a low-power mapping algorithm based on quantum-behaved particle swarm optimization. Simulation results show that the proposed algorithm can significantly reduce power consumption, but with large-scale application characteristic graph, the efficiency of power optimization cannot be improved very much. To solve this problem, a low-power mapping algorithm for 3D NoC based on diversity-controlled quantum-behaved particle swarm optimization is proposed in this paper. Simulation results show that for large-scale application characteristic graph, this algorithm is able to maintain a stable power optimization efficiency (4.08–8.04%) and converges much faster.
Keywords
Introduction
Three-dimensional network-on-chip (3D NoC) is an effective solution to solve the problem of complex interconnection for large-scale system on chip (SoC). Since the adoption of integrated circuit stacking technology, 3D NoC breaks the limitations on performance and size of two-dimensional NoC (2D NoC). Mapping of 3D NoC is a key issue in 3D NoC research field. The results of mapping will directly affect the power consumption of communication in system, delay and other properties. 1 In terms of the optimal solution search method, 3D NoC mapping algorithm can be divided into three categories, i.e. search algorithms, construction algorithms, and enumeration algorithm. Search algorithms are the most commonly used and can be divided into two types of heuristic search and systematic search. 2 Heuristic search algorithm based on particle swarm optimization (PSO) is a kind of very typical algorithm to solve 3D NoC mapping problem. Zhang Zhen in Guangdong University of Technology improved the PSO to solve the problem of multi-objective and discrete space in Chip Multiprocessors (CMP) on-chip network mapping algorithm.3,4 Pradip Kumar Sahu et al. in Indian Institute of Technology Kharagpur presented a discrete particle swarm optimization (DPSO)-based strategy to map applications on both 2D and 3D mesh-connected NoC. 5 Bhardwaj and Mane in BITS Pilani-Goa Campus of India proposed centralized 3D mapping (C3Map) using a new octahedral traversal technology and realized the H-C3Map 6 which is formed from C3Map and attractive and repulsive PSO (ARPSO). 7 We have applied quantum-behaved particle swarm optimization (QPSO) to 3D NoC mapping problem for the first time, which reduced power consumption of mapping. But good power-optimized efficiency can only be maintained when the scale of the application characteristic graph (APCG) is within a certain range. To solve this problem, in this paper, a low-power mapping algorithm for 3D NoC based on diversity-controlled quantum-behaved particle swarm optimization (DCQPSO) is proposed. Simulation results show that for large-scale APCG, this algorithm is able to maintain a stable power optimization efficiency (4.08–8.04%), and converges faster.
Basic questions of 3D NoC mapping
Basic concept of 3D NoC mapping
Mapping of 3D NoC is used to decide which IP in a given APCG should be represented by which node of topology architecture graph of 3D NoC to achieve the goal of minimizing power consumption or packet latency described as objective functions.8,9 Mapping problem belongs to quadratic assignment problem and is a NP-hard problem.
10
Figure 1 shows a mapping instance of NoC.
11
A mapping instance of NoC.
Power consumption model
Mapping of 3D NoC is to find the optimal mapping relationship between IP cores and resource nodes of 3D NoC through an algorithm, so that one or more performance metrics such as lowest power consumption, shortest execution time of application, the most balanced network load and so on can be optimized. A major concern with 3D NoC mapping is power consumption among all these metrics. 11
Using the power consumption model given in Yang et al.,
12
power consumption of the process of node ti sending 1 flit to node tj can be formulated as:
In equation (1), μ represents the number of routers between ti to tj.
Mapping of 3D NoC is to find a mapping function under the conditions of a given APCG and TAG so that the following conditions in equations (2) and (3) are met and the total power consumption is lowest, i.e. the objective function in equation (4) is minimum (
For a determined topology,
Review of mapping algorithm of 3D NoC based on QPSO
Description of the problem
QPSO algorithm is proposed by Sun Jun from Jiangnan university of China. 14 QPSO is a global convergence algorithm and has many advantages such as global convergence, less control parameters, faster convergence, and stronger ability of optimization. The authors had applied QPSO algorithm to the mapping problem of 3D NoC for the first time to optimize the mapping position of IP cores of APCG on the resource nodes of 3D NoC.
Realization of mapping algorithm of 3D NoC based on QPSO
In this part, we show the realization of mapping algorithm based on QPSO. The steps are shown in Algorithm 1.
Algorithm 1. Mapping algorithm of 3D NoC based on QPSO
Input: Particle swarm randomly generated.
Output: Optimum position
Step 1: Parameters initialization. Define the number of particles as
Step 2: Calculate the average optimal position of the particle swarm according to equation (10).
Step 3: Calculate the fitness value of each particle according to equation (5). If the particle adapts better than
Step 4: If the fitness of
Step 5: For every dimension of the particle, calculate a random position according to equation (8).
Step 6: Update the position of particles according to equation (7).
Step 7: Correct the obtained position and velocity information of particles according to the determined range of value of position and velocity so that they would not exceed the available space.
Step 8: Decide whether the current iteration number reaches the maximum number
Step 9: Output best and end the algorithm.
Study of mapping algorithm of 3D NoC based on DCQPSO
DCQPSO algorithm
The diversity of QPSO algorithm is relatively high at the beginning of searching since the initialization of particle swarm. In the subsequence searching process, the diversity of the population continues to decline, resulting in stronger local search capability and much weakened global search capability. In the early and middle stages of the search, reducing the diversity of particle swarm is necessary for the search efficiency. However, the diversity of particle swarm is very low at the late stage of the search because the particles are gathered in a relatively small area. In this time, the capability of global searching gets very weak and the possibility of large-scale searching is very small. In this case, if the global best position is the locally optimal solution or suboptimal solutions, the algorithm prematurity will occur.
In order to effectively avoid prematurity and improve the performance of QPSO algorithm, Sun proposed diversity-controlled QPSO (DCQPSO).
14
In DCQPSO algorithm, the search in the algorithm is guided by a diversity measurement
Proposal and realization of mapping algorithm of 3D NoC based on DCQPSO
Theory of mapping algorithm of 3D NoC based on DCQPSO
The fitness function of mapping algorithm based on DCQPSO used is shown in equation (5),
In equation (5),
In equation (6),
In equation (7),
Design and realization of mapping algorithm of 3D NoC based on DCQPSO
In this part, we show the realization of mapping algorithm based on DCQPSO which is proposed in this paper. The steps are shown in Algorithm 2.
Algorithm 2. Mapping algorithm of 3D NoC based on DCQPSO
Input: Particle swarm randomly generated.
Output: Optimum position
Step 1: Parameters initialization. Define the number of particles as
Step 2: Calculate the average optimal position of the particle swarm according to equation (10).
Step 3: Calculate the fitness value of each particle according to equation (5). If the particle adapts better than
Step 4: If the fitness of
Step 5: Calculate the diversity
Step 6: Calculate the contraction–expansion coefficients α according to equation (9).
Step 7: If the diversity
Step 8: For every dimension of the particle, calculate a random positon according to equation (8).
Step 9: Update the position of particles according to equation (7).
Step 10: Decide whether the current iteration number reaches the maximum number MAX_GENERS. If it is reached, then output best, else jump to step 2 and iteration number adds 1.
Step 11: Output best and end the algorithm.
Experiments and performance analysis
Simulation platform and parameter design
Simulation platform
In this paper, Access Noxim 0.2 developed by Kai-Yuan Jheng in National Taiwan University is adopted as our simulation tool.
15
The Access Noxim is a co-simulation platform for 3D NoC system that couples the network model, power model, and thermal model. They integrate Noxim (a cycle-accurate SystemC NoC simulator) and HotSpot (HotSpot provides the architecture-level thermal model), and adopt the power model of Intel’s 80-core processor.
Choice of topology and routing algorithm
The 3D mesh architecture is formed by extending 2D mesh directly to 3D space
16
and has a regular structure. The position of network nodes of 3D mesh is homogeneous and the wiring is simple. TSV technology is used in the vertical direction which can reduce the overall wiring length. In our experiment, 3D mesh is used in our simulations. As for routing algorithm, we adopt XYZ routing algorithm. In XYZ algorithm, data packets are passed first in the x-axis, and then in the y-axis, finally in the z-axis. The algorithm is easy to understand and be realized and it is the most commonly used routing algorithm.
Parameter initialization
We initialized the parameters of the algorithms and the simulation tool as follows. Population size is set as 200 and the number of iterations is set 100. Data packets are injected in the way of memory-less Poisson distribution. Packet injection rate is set as 0.02. The total power consumption of 5000 cycles is summarized. The size of data packet is between 2 flits to 10 flits. The size of cache of each channel of the router is 8 flits.
Experiment results and performance analysis
In order to compare our proposed mapping algorithm of 3D NoC based on DCQPSO with the original mapping algorithm of 3D NoC based on QPSO, we conduct the experiments of these two algorithms on the same simulation platform. To reflect the universality of the algorithms, we use both classic APCG and random APCG in our experiments. Classic APCG includes PIP (8-core), MWD (12-core), VOPD (16-core), and DVOPD (32-core). Random APCG is generated by using task builder TGFF 17 and different APCGs have different numbers of IP cores. For each APCG, simulation experiments were conducted separately using mapping algorithms based on DCQPSO and QPSO. Then the communication mode files with the best solutions were generated. Finally, the communication mode files are fed into Access Noxim to start the simulation. In this paper, both convergence speed and power consumption of the two algorithms are compared.
Comparison on convergence speed of mapping algorithms based on DCQPSO and QPSO
Comparison on convergence speed of the two algorithms for classic APCG
For classic APCG, comparison on convergence speed in 100 iterations of mapping algorithms based on DCQPSO and QPSO is shown in Figure 2, where the abscissa represents the number of iterations, the ordinate represents fitness value and the solid circle represents the convergence point of the algorithm. Take Figure 2(b) as an example, for MWD, mapping algorithm based on DCQPSO converges to the optimal fitness value 2220 at the 35th iteration while mapping algorithm based on QPSO converges to the optimal fitness value 2189 at the 42th iteration. From Figure 2, we can observe that for any classic APCG, mapping algorithm based on DCQPSO can get optimal solutions within a few iterations and an obvious improvement in convergence speed can be observed compared with the mapping algorithm based on QPSO (about 16.67–66.67%). Also, except for the PIP, optimal solution of mapping algorithm based on DCQPSO fits better than the optimal solution of mapping algorithm based on QPSO. Because the number of IP cores of PIP is less, QPSO algorithm is not easy to fall into prematurity and we can get the same optimal solution with DCQPSO algorithm. It also can be seen that for APCG with large number of IP cores, the advantages of mapping algorithm based on DCQPSO are more obvious, i.e. it can get better solutions within a short period of time.
Comparison on convergence speed of the two algorithms for random APCG
Comparison on convergence speed for different classic APCG. (a) Comparison on conversation speed for PIP, (b) MWD, (c) VOPD, and (d) DVOPD.

In this paper, random APCGs with 26-core, 51-core, 75-core, 101-core, 127-core, and 151-core are generated using TGFF. For different random APCGs, comparison on convergence speed in 100 iterations of mapping algorithms based on DCQPSO and QPSO is shown in Figure 3. Take Figure 3(e) as an example, for random APCG with 127 cores, mapping algorithm based on DCQPSO converges to the optimal fitness value 164,330 at the 49th iteration while mapping algorithm based on QPSO converges to the optimal fitness value 154,395 at the 77th iteration. From Figure 3, we can observe that for any random APCG, convergence speed of mapping algorithm based on DCQPSO is improved obviously related to the mapping algorithm based on QPSO (about 12.05–50.52%). The final solution of mapping algorithm based on DCQPSO is also better than that of mapping algorithm based on QPSO.
Comparison on conversation speed for APCGs with different numbers of IP cores. (a) Comparison on conversation speed for 26-core APCG, (b) 51-core APCG, (c) 75-core APCG, (d) 101-core APCG, (e) 127-core APCG, and (f) 151-core APCG.
Comparison on power consumption of mapping algorithms based on DCQPSO and QPSO
In our simulation experiments, for each APCG, the two algorithms are separately repeated for 10 times and the best individual is used to generate communication mode files.
Comparison on power consumption of the two algorithms for classic APCG
For different classic APCGs, mapping algorithm based on DCQPSO and QPSO are repeated for 10 times, respectively, and get the value of power consumption from the simulation platform. The experiment results are analyzed and the average power consumption, maximum power consumption, and minimum power consumption of the two algorithms for the same APCG in 10 simulation trials are compared separately.
The comparison on the average power consumption of mapping algorithms based on DCQPSO and QPSO is shown in Figure 4. From Figure 4 we can observe for all classic APCGs, the power consumption of mapping algorithm based on DCQPSO is lower than that of mapping algorithm based on QPSO with maximum case as 5.29% (for DVOPD) and the minimum case as 0.36% (for PIP).
Comparison on the average power consumption for classic APCG.
The comparison on the maximum power consumption of mapping algorithms based on DCQPSO and QPSO is shown in Figure 5. From Figure 5 we can see that for all APCGs, the power consumption of mapping algorithm based on DCQPSO is lower than that of mapping algorithm based on QPSO with the maximum case as 6.85% (for DVOPD) and the minimum case as 0.25% (for PIP).
Comparison on the maximum power consumption for classic APCG.
The comparison on the minimum power consumption of mapping algorithms based on DCQPSO and QPSO is shown in Figure 6. From Figure 6 we can see that for all APCGs, the power consumption of mapping algorithm based on DCQPSO is lower than that of mapping algorithm based on QPSO with the maximum case as 3.89% (for VOPD) and the minimum case as 0.70% (for PIP).
Comparison on the minimum power consumption for classic APCG.
In summary, it can be observed that power consumption is reduced in varying degrees with mapping algorithm based on DCQPSO compared to the original mapping algorithm based on QPSO. Related to mapping algorithm based on QPSO, the reduction ratio of power consumption of mapping algorithm based on DCQPSO is shown in Figure 7, where the abscissa represents different classic APCGs, the ordinate represents the ratio of reduction of power consumption of mapping algorithm based on DCQPSO compared to mapping algorithm based on QPSO. When the number of IP cores of APCG is small, QPSO algorithm rarely falls in prematurity and the advantages of DCQPSO algorithm is not very obvious. When the number of IP cores of APCG is large, QPSO algorithm tends to fall in prematurity and the ability of optimization of DCQPSO algorithm is stronger.
Comparison on power consumption of the two algorithms for random APCG
The reduction ratio of power consumption of mapping algorithm based on DCQPSO for classic APCG.

In this paper, random APCGs with 26-core, 51-core, 75-core, 101-core, 127-core, and 151-core are generated using TGFF. For different random APCGs, mapping algorithm based on DCQPSO and QPSO are repeated for 10 times, respectively, and get the value of power consumption from the simulation platform. The experiment results are analyzed and the average power consumption, maximum power consumption, and minimum power consumption of the two algorithms for the same APCG in 10 simulation trials are compared separately. The comparison on the average power consumption of mapping algorithms based on DCQPSO and QPSO is shown in Figure 8. From Figure 8 we can see that for all random APCGs the power consumption of mapping algorithm based on DCQPSO is lower than that of mapping algorithm based on QPSO with the maximal decrease could reach 5.51% (for 127-core) and the minimum case as 4.26% (for 101-core).
Comparison on the average power consumption for random APCG.
The comparison on the maximum power consumption of mapping algorithms based on DCQPSO and QPSO is shown in Figure 9. From Figure 9 we can see that for all random APCGs, the power consumption of mapping algorithm based on DCQPSO is lower than that of mapping algorithm based on QPSO with the maximum case as 5.57% (for 75-core) and the minimum case as 3.56% (for 26-core).
Comparison on the maximum power consumption for random APCG.
The comparison on the minimum power consumption of mapping algorithms based on DCQPSO and QPSO is shown in Figure 10. From Figure 10 we can see that for all random APCGs the power consumption of mapping algorithm based on DCQPSO is lower than that of mapping algorithm based on QPSO with the maximum case as 8.04% (for 75-core) and the minimum case as 4.08% (for 151-core).
Comparison on the minimum power consumption for random APCG.
In summary, it can be observed that for random APCG, power consumption is reduced in varying degrees and the performance is more stable with mapping algorithm based on DCQPSO compared to the original mapping algorithm based on QPSO. This is because once the mapping algorithm based on QPSO falls into prematurity it will lead to poor results. In the comparison of mapping algorithms based on QPSO and PSO, mapping algorithm based on QPSO is most suitable for APCG with about 60 IP cores. But for APCG with large-scale, the efficiency of power-optimization cannot be improved very much. For APCG with large number of IP cores (100-core to 150-core), optimal results of the proposed mapping algorithm based on DCQPSO are still relatively stable and power consumption can decrease by 4.08–8.04%. Related to mapping algorithm based on QPSO, the reduction ratio of power consumption of mapping algorithm based on DCQPSO is shown in Figure 11, where the abscissa represents different random APCGs, the ordinate represents the ratio of reduction of power consumption of mapping algorithm based on DCQPSO related to mapping algorithm based on QPSO.
The reduction ratio of power consumption of mapping algorithm based on DCQPSO for random APCG.
Conclusions
In this paper, both mapping algorithms based on DCQPSO and QPSO are used to solve the problem of 3D NoC mapping. Simulation results show that mapping algorithm based on DCQPSO can maintain diversity from start to finish and is less prone to fall into prematurity. Convergence speed is significantly improved (12.05–66.67%) and power consumption of mapping is effectively reduced (4.08–8.04%). For APCG with different number of IP cores, the optimal efficiency is very stable. In future studies, we will continue to optimize the objective function and improve the efficiency of 3D NoC mapping in different ways. In addition, we will focus on the study of 3D NoC mapping algorithm for different topologies and routing, such as heterogeneous topologies and personalization routing, so that the proposed low-power mapping algorithm based DCQPSO can be more generally applied.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
