Abstract
Closed-circuit televisions serve as prevention against crime, and many studies for closed-circuit television deployment have been conducted. The closed-circuit television deployment in downtown is similar to solving coverage problem in wireless camera-based sensor networks. The difference between the two problems is various environmental factors such as buildings, roads, camera capability, and movements of pedestrians. We use a genetic algorithm to increase the efficiency of closed-circuit television deployment in two-dimensional topography. In addition, a parallel experiment using general-purpose computing on graphics processing units is added to improve computing speed, which is a disadvantage in genetic algorithms. The target region is 500 m × 500 m and consists of 50 × 50 grids. The fitness of the evaluation, which refers to a detection rate, is calculated from the corresponding cell when a pedestrian moves to each cell depending on whether the pedestrian is detected. The proposed experiment was superior to the random deployment experiment by approximately 37.5%. There was no significant difference in the detection rate between the CPU experiment and a NVIDIA GeForce GTX 970 experiment in the 95% confidence interval. The efficiency of a CUDA kernel function using the NVIDIA GeForce GTX 970 graphic card was analyzed.
Introduction
Boundary coverage1–3 involves the detection of an entity by forming a boundary when an entity passes a region of interest (RoI). This is applicable to various fields such as border surveillance, protection of resources, surveillance patrol, and invasion detection. Many sensor nodes must be deployed to cover the total RoI in order to reach full coverage. However, the number of deployed sensor nodes can be substantially reduced by forming boundary coverage and detecting the moving entities. The efficiency of boundary coverage increases significantly with wider RoI, but problems occur because coverage holes appear.
Boundary coverage can be classified into weak barrier coverage and strong barrier coverage methods. Weak barrier coverage detects the moving entities that pass the RoI in a fixed path. Strong barrier coverage detects moving entities that pass the RoI in a random path. 4
Metaheuristic algorithms such as tabu search, gradient ascend, simulated annealing, and evolution operations are used to optimize the boundary coverage. This is a non-deterministic polynomial-time (NP) hardness problem. 5 A genetic algorithm,6,7 which is a type of evolution operation, is used in this study to optimize the boundary of closed-circuit television (CCTV) deployment. Adopting parallelism is more convenient in genetic algorithms compared with other metaheuristic-based algorithms because high-quality solutions can be easily gained in a short time during parallel processing.
CCTVs are used in crime prevention, criminal tracking, invasion detection, and boundary surveillance patrol. Uses of CCTV equipment are gradually increasing. CCTV’s efficacy is especially important in urban areas that have concentrated populations.
It is quite difficult to deploy CCTVs efficiently in wide RoIs and on complex roads using a suitable number of CCTVs. This is an NP-hard problem and it is proper to adopt approaches that apply metaheuristic-based algorithms.
Zang 8 researched the boundary coverage of camera sensor networks and aimed to gain high-sensing coverage. Regardless of the direction from which an object approaches, the camera boundary is generated with full-view coverage. The camera sensor can effectively capture facial images. The authors aimed to generate a boundary using few camera sensors by analyzing a full-view coverage model and camera selection algorithm. Geometric models are used to achieve full-view coverage in both deterministic deployment and random deployment.
Wang and Cao 9 claimed that boundary coverage in camera sensor networks with directivity was not effective only with the sum of RoI sensing ranges. The direction of an object and the camera angle was used to measure the sensing quality. When the camera direction and direction in which the object faces are close enough, one object is completely covered from all angles regardless of its facing direction. This was used to generate a camera boundary that could completely cover all points within the RoI. They proposed a new camera deployment method to achieve boundary coverage in deterministic environments, and they reduced a redundancy to optimize the number of required cameras.
Tao et al. 4 researched a sensing model that had directivity. To provide strong boundary coverage, the researchers used a video camera to make sensors randomly control the direction. First, the researchers proposed to study strong boundary coverage in one-dimension (1D) using a polynomial-time algorithm, which can achieve the required coverage using a minimum number of sensors. Moreover, for strong boundary coverage, a geographical relation was found between deployment region boundaries and sensors with directivity. The concept of virtual nodes was introduced to reduce the solution space from a continuous domain into a discrete domain. A directivity boundary graph that can provide quick answers if the directions of directional sensors exist, to achieve strong boundary coverage in a given region, was generated. If such a direction exists, it was claimed that efficient solutions that minimize the maximum rotation angle of directional sensors could be found. The efficiency of the solutions was proven through simulation.
Navin
10
proposed a new distributed genetic algorithm to overcome camera coverage control problem, which has high time complexity. He/She divided one population into
Shih et al. 11 proposed a distributed algorithm, namely CoBRA (Cone-based Barrier coveRage Algorithm), to achieve barrier coverage in wireless camera sensor networks (WCSNs). The basic concept of CoBRA is that each camera sensor can determine the local possible barrier lines according to the geographical relations with their neighbors. Experiment results show that CoBRA can efficiently achieve barrier coverage in WCSNs. Comparing to the ideal results, CoBRA can use fewer nodes to accomplish barrier coverage in random deployment scenarios.
Navin et al. 12 also showed the ability of genetic algorithm to solve coverage problem in WCSNs. Simulation results of their presented method showed the better coverage ratio compared with the self-orientation algorithm presented in 2008.
Ma et al. 13 focused on the Minimum Camera Barrier Coverage Problem (MCBCP) in WCSNs in which the camera sensors are deployed randomly in a target field. In addition, they proposed an optimal algorithm for the MCBCP. Simulation results demonstrated that their algorithm outperformed the existing algorithm.
Costa and Guedes 14 surveyed the state of the art of the coverage problem issue in WCSNs, regarding strategies, algorithms, and general computational solutions. Open research areas are also discussed, envisaging promising investigation considering coverage in WCSNs.
This study aimed to optimize the deployment of CCTV in two dimensions. A downtown area was set as a background, and the locations and angles of CCTVs were fixed during deployment. Detection is determined as a random probability when both the pedestrian facing direction and CCTV facing direction overlap. This procedure was optimized using a genetic algorithm, and the movement detection of each pedestrian was processed in parallel. Our research deals with a barrier coverage problem. Barrier coverage problem is used to successfully detect events for some objects to cross sensing area. As a special case, CCTV deployments also aim to detect pedestrians crossing a region of downtown. Deterministic deployment and random deployment are types of sensor deployment. Deterministic sensor deployment is generally convenient for installation and is used in accessible environments. Random deployment is used in randomly scattered form when direct deployment is difficult. In this study, deterministic sensor deployment that identifies the locations and number of sensors in advance is used for installation, and a genetic algorithm that is a metaheuristic-based algorithm is used for optimization. Genetic algorithms include inherent parallelism differently from other heuristic algorithms such as tabu search15,16 and gradient ascend.17,18 Our research focused on the improvement on computing time. Tabu search enhances the performance of local search by relaxing a basic rule. At each step, worsening moves can be accepted if no improving move is available. Gradient ascend is a first-order iterative optimization algorithm for finding the maximum of a function. These algorithms do not include any parallelism and they use just one solution to find the optimum.
The remainder of this article is organized as follows. Section “CCTV deployments” explains CCTV deployments based on CPUs. In section “Parallel CCTV Deployments,” we present CCTV placements using a parallel technique such as CUDA. In section “Experiments,” we explain our experimental environments, plan, and results. This article ends with some concluding remarks in section “Concluding remarks.”
CCTV deployments
CCTV deployment is a difficult issue that must be determined by considering many variables. CCTV deployment was simplified as a problem to detect pedestrians by determining only CCTV locations and angles in a two-dimensional (2D) plane. CCTV types and detection rates were set in reference to actual products, and the detection direction was simplified into eight directions.
A downtown was set as the region in the experiment, and virtual topography was used to create Scenarios 1 and 2. Deployment was limited to a 2D plane, and the topography was composed only of buildings and roads. CCTVs can be deployed only on roadsides, and the detection direction and detection distance are different depending on the type of CCTV. The area of the RoI is 500 m × 500 m and is a grid composed of 50 × 50 cells.
The pedestrian moved randomly from a starting point of (
CCTV types and detection capabilities
Detectable distance and detection rate were set as listed in Table 1 by referring to CCTV products types sold in the market. CCTV1 is based on the DS-2CD2010-I model and has a maximum visible distance of 30 m. CCTV2 is based on the OPS-BMIP720P3 model and has a maximum visible distance of 40 m. CCTV3 is based on the VSTARCAM-100BP model and has a maximum visible distance of 50 m.
Distance and the detection rates according to CCTV type.
CCTV: closed-circuit television.
The directions of pedestrians were set in eight types, as shown in Figure 1(a). The detection directions of CCTVs were also set in eight directions. When a CCTV can detect pedestrians, the case is limited to the situation in which the pedestrian facing direction and CCTV facing direction are facing each other, as shown in Figure 1(b).

Relation between the direction of a moving person and that of a CCTV: (a) moving direction of pedestrian and (b) detectable direction of CCTV.
Calculation of detection ratio in each cell
Probability of detection (POD) of CCTV deployment should be calculated according to the movement direction of pedestrians, because CCTV detection has a limited detection direction, unlike the scalar sensor detection, which can detect in all angles. The POD should be calculated again for new movement because the pedestrian and CCTV facing directions are different when the pedestrian moves to a new cell. After adding the number of cases for both detected and undetected cases of a pedestrian in the relevant cell by comparing the calculated PODs and random values, equation (1) is used to calculate the detection rate.
Genetic algorithm
A genetic algorithm was used to deploy CCTVs in the target region, which was composed of 50 × 50 grids. The size of the population was 100, the number of generations was 100, tournament selection is used for selective operation, and the
Selection, crossover, and variation operations are repeated in every generation as many times as the number of populations. Then, the genetic algorithm with generation type is used for a substitution operation. Table 2 lists the parameters of the CPU-based genetic algorithm. Regarding encoding, integers that represent the eight directions are added to the integer-type
GA parameters for CCTV deployment.
CCTV: closed-circuit television; GA: genetic algorithm.

Flowchart of the proposed GA.
Parallel CCTV deployments
POD for each cell in CCTV parallelism should be calculated whenever a pedestrian crosses each cell. The CCTV deployment simulator, which is a fitness function, calculates the POD in each CUDA thread, processes the detection of pedestrians, and returns the number of detected and undetected pedestrians. Our research deals with a barrier coverage problem, of which the goal is how well a deployment detects events for some pedestrians to cross sensing area of CCTVs. Hence, it is very natural that the detection rate is considered as our fitness evaluation measure. The detection rate is calculated by applying equation (1) to the returned result.
The most important part of the parallelism of the CCTV deployment simulator was a path search using the A* algorithm. The A* algorithm priority queue enabled a hip alignment in space with a fixed size. The weighted value was put in the direction of pedestrian movement when setting the level values to diversify moving paths. Moving paths were attempted to be diversified by generating random numbers when setting the level values, but too much computing time was used owing to the frequent uses of random-number generation function. Moreover, the random-number generator used in CPU experiments could not be used in CUDA kernel function in the same way. We made a random-number generator based on the method proposed by Park and Miller. 20 The generator effectively reduced the computing time in CUDA kernel function.
Genetic algorithms have a trade-off between solution quality and computation time. When population size or the number of generations is large, genetic algorithm may converge too slowly. On the contrary, if it is small, the algorithm may take less computing time but produce worse solutions. Genetic algorithms usually adopt 100–200 solutions over 100–500 generations.21,22 Although not shown in this article, we conducted several preliminary experiments varying population size and the number of generations to find better parameter settings. When population size is 100 and the number of generations is 100, we could reach to an appropriate trade-off point.
Calculation of parallel A* algorithm
A path search calculation, which is part of the fitness function, required the longest computing time in the CPU-based CCTV deployment simulation results. Figure 3 shows the configuration of the NVIDIA CUDA platform 23 block used for the parallelism of the path search and thread per block. The number of blocks and thread per block are set as shown in Table 3.

Blocks and threads for detection of moving persons based on NVIDIA CUDA platform.
Blocks and threads used in detection of moving persons.
A total of 100 blocks are set, 10 threads work on a block, and each thread is used to calculate a path search of pedestrians. Usually, the number of blocks and that of threads per block used in CUDA kernel function are set considering the number of CUDA cores and the amount of shared memory of the graphic card used in the experiments. In addition, various optimization algorithms are used to reduce computation time in GPU. We set the number of blocks and that of threads per block considering graphic card specifications given in Table 5, shared memory usage used in CUDA kernel function of Figure 4, and implementation simplicity. We set the number of threads per streaming multiprocessor (SM) to be 10 because the amount of shared memory used in the CUDA kernel function was large.

Pseudocode of kernel function for POD calculation.
Implementation of parallel A* algorithm
Figure 4 shows a pseudocode to detect pedestrians who pass the RoI. Location information on all CCTVs deployed for all solutions were encoded in a 1D array;
Topography information for the scenarios and CCTV deployment information are saved in the 2D array
Experiments
Experimental environments
The experiment is composed of various combinations with different types and numbers of CCTVs. To measure the detection rate of pedestrians, a boundary is formed by considering buildings and roads in a downtown environment. Optimizations of sensor and CCTV deployment maximize the detection rate of detectable entities and use fewer CCTVs. A fitness evaluation is performed by using the NVIDIA CUDA platform to enhance the computing speed, which is a disadvantage in metaheuristic algorithms. 5
A statistical method is used on the experimental results of the CPU and GPU to compare and analyze detection rate values. The computing speeds of NVIDIA GeForce GTX graphic cards are compared. In addition, improvements in the kernel function are presented through an analysis of the kernel function using the NVIDIA Nsight profiler.24,25
The C language and CUDA-C language based on NVIDIA CUDA platform 7.0 were used to develop 2D deployment simulators to establish the experimental environment. A CCTV deployment simulator and genetic algorithm based on CPU were implemented using the C language. A parallel CCTV deployment simulator was implemented using the CUDA-C language.
Table 4 shows information about the clock speed, the number of cores, and the amount of CPU memory used in the experiment. (These are commonly used in all of the experiments.) Table 5 lists information about the three types of graphic cards used in the experiment. The NVIDIA GeForce GTX 970 graphic card is mainly used, and the other two cards are used for comparative experiments. The number of CUDA cores and the L2 cache size are the items that have significant influence on enhancing the computing speed of the GPU specification. The L2 cache is used in shared memories and registers. Using the L2 cache can result in great enhancements in the performance of the memory approach speed. The L2 cache of the NVIDIA GeForce GTX 970 is more than three times larger than that of other GPUs, and the time for memory approach and copying can be greatly shortened if this is used effectively.
CPU specifications used in experiments.
CPU: central-processing unit.
GPU specifications used in experiments.
GPU: graphics-processing unit; CUDA: Compute Unified Device Architecture.
A significant reduction in computing time can be achieved when there are many tasks that must be processed at the same time and do not require high-performance operation. In addition, the computing speed can be improved by optimizing software aspects such as block and thread configurations that have high efficiency, using a random-number generator in the GPU environment, 20 and by avoiding or reducing remainder operations and division operations.
Results
The experiment was repeated 30 times, which is the minimum statistically significant number.
19
Scenarios 1 and 2 were classified according to topography and were also classified according to the numbers and types of CCTVs. These are listed in Table 6. An analysis of the experiment results included a statistical analysis of the detection rate using 3
The number and types of CCTVs.
CCTV: closed-circuit television.
Statistical analysis of CCTV deployment
Table 7 lists the results of the experiment in Scenario 1. The averages of the NVIDIA GeForce GTX 970 and CPU were similar to the analysis of the 95% confidence interval in Figure 5(a). The computing speed of the NVIDIA GeForce GTX 970 was approximately 3.7 times faster.
Test results of CCTV Scenario 1.
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.

95% confidence interval graph of CCTV deployment: (a) 95% confidence interval graph of Scenario 1 and (b) 95% confidence interval graph of Scenario 2.
Table 8 lists the results of the experiment in Scenario 2. The averages of the NVIDIA GeForce GTX 970 and CPU were similar to the analysis of the 95% confidence interval in Figure 5(b). The computing speed of the NVIDIA GeForce GTX 970 was approximately 4.2 times faster. Figure 6 shows the results of a comparison between computing time according to scenario and instance.
Test results of CCTV Scenario 2.
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.

Comparison between CPU and GPU computing time according to scenarios and the number of CCTVs: (a) CPU versus GTX 970 (Scenario 1) and (b) CPU versus GTX 970 (Scenario 2).
The computing time of the NVIDIA GeForce GTX 560 and NVIDIA GeForce GTX 770 was similar to the results listed in Tables 9 and 10. The computing time of NVIDIA GeForce GTX 970 was decreased by approximately half. In addition, the NVIDIA GeForce GTX 970, which has an L2 cache with a capacity that was three times larger, showed relatively better performance.
Comparison of Test results according to GPU types for CCTV Scenario 1.
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.
Comparison of Test results according to GPU types for CCTV Scenario 2.
CCTV: closed-circuit television; GA: genetic algorithm; POD: probability of detection.
Figure 7 shows the comparative experiment of computing time using three types of NVIDIA GeForce GTX graphic cards on Scenario 1. An overall even difference in computing time is shown for Instances 1 and 2. The time width of the operation in Instance 3 increases, and it is shown that the GPU operation time has significant influence when there are more than 100 deployed CCTVs.

Comparison of computing time of GPUs according to the number of CCTVs in Scenario 1.
Figures 8 and 9 show the results of Instance 3 experiment and examples of CCTV deployment for Scenarios 1 and 2, respectively. The CCTV detection directions of the initial solutions in Figures 8(a) and 9(a) are different, but the cases of CCTV detection direction facing the direction of pedestrian movement increased slightly in the optimum solutions in Figures 8(b) and 9(b). In Figures 8 and 9, CCTV deployment location was concentrated along the path of pedestrian movement.

Optimized CCTV deployment of Scenario 1: (a) initial solution and (b) improved solution.

Optimized CCTV deployment of Scenario 2: (a) initial solution and (b) improved solution.
CCTV kernel function analysis
An analysis of the kernel function in the three aspects of occupancy, instruction statistics, and issue efficiency were performed using NVIDIA Nsight profiler. A total of 100 blocks were allocated in the implemented kernel function, and 10 threads per block were set.
In the occupancy analysis, the number of warps that can be used when there are 10 threads per block is shown in the “Varying Block Size,” and low values are shown proportionately to the number of blocks and threads per block that were configured. An enhancement in performance is expected by increasing the number of threads per block. A total of 64 registers per thread are used in the “Varying Register Count.” It is considered that to enhance performance, an adjustment procedure is required to increase the number of registers to use per thread. The number of warps when using a shared memory of 1200 bytes per block is shown in the “Varying Shared Memory Usage.” It is considered that an enhancement in performance can be expected by increasing the used shared memory per block.
In the analysis of instruction statistics, the average of total SMs in the “SM Activity” is 93.50%, but the overall SM activity is unbalanced in that the exculpation time is different or the number of scheduled blocks per SM is variable. Thus, different activities in each SM, that is a Tail Effect, have an unfavorable influence on the total execution time. The balance level of workload distribution is also not effective in the “Instructions per Warp.” This can be considered as an effect of cases that have conditional statements depending on the block index, or when there is a high workload only in a few blocks.
The overall low performance appears to be a result of the concentrated operations in a few CUDA cores. In addition, the path-searching part of the A* algorithm is processed in order as in the CPU experiment. A considerable enhancement in performance is expected by improving the parallelism level in the path-searching part.
The analysis of issue efficiency highlights an efficiency issue of the kernel function in CCTV deployment. The number of warps per SM is considerably small in the “Warps per SM,” and the value of “No Eligible” in the “Warp Issue Efficiency” is very high. This shows that the efficiency of the kernel code is not high. The “No Eligible” part is related to the contents in the “Issue Stall Reasons,” and the memory dependency value in the “Issue Stall Reasons” is high, with a value of 70.25%. The values show a high possibility that the requested resource cannot be used or is being completely used by another process or too many processors have requested the resource. It is considered that this problem can be improved at some point by memory alignment and optimization of the memory approach pattern.
Analysis
A comparison of detection performance, comparison of computation time, and analysis of the kernel function using the NVIDIA GeForce GTX 970 were analyzed in the experiment. The simulator used in the experiment is classified into CPU-based sequential processing simulators and GPU-based parallel processing simulators. The C language was used to develop the sequential processing simulator, and a CUDA-C language based on the NVIDIA CUDA platform 7.0 was used to develop the parallel processing simulator.
It was found in the analysis (on the 95% confidence interval graph using 3
Concluding remarks
A generational genetic algorithm was used to optimize 2D deployment in the boundary coverage of this study. The C language and a CUDA-C language were used to develop the sequential processing simulator and parallel processing simulator for the evaluation of fitness.
The CCTV deployment issues in the two downtown environments were used as the case studies, and the results of various experiments using these cases were analyzed. An experimental analysis consisted of a statistical analysis on the detection rate, a comparison between the computing time of the sequential processing experiment and parallel processing experiment, and an analysis of the CUDA kernel function.
The detection rates of the CPU experiment and the experiment using the NVIDIA GeForce GTX 970 graphic card were similar and showed a considerable improvement in computing time. Problems with the kernel function were investigated, and an analysis of future improvements was conducted.
Like other precedent papers, a direct comparison of values with the results of other research is difficult. However, a new method for the optimization of 2D deployment was presented in from a methodological aspect and is expected to be used in various military, security, and invasion detection fields.
It is considered that more realistic experiment results can be obtained if various metaheuristic-based algorithms are used by finding parameter combinations of high efficiency. It is required to find efficient GPU parameter settings to reduce computing time. It is also considered that the efficiency of block settings and memory usage must be increased in order to enhance the parallelism level of the simulators.
Footnotes
Handling Editor: Bin Gao
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Wonkwang University in 2017.
