Maneuvering target recognition method based on multi-perspective light field reconstruction

Abstract

Reconstructing a complete light field is difficult, and reconstructed light fields have so far been used to recognize only specific fixed targets. These limitations restrict the practical applications of light fields. To address these problems, this article introduces multi-perspective distributed information fusion into light field reconstruction to monitor and recognize maneuvering targets. First, the light field is represented as sub-light fields at different perspectives (i.e. a multi-sensor distributed network), and sparse representation and reconstruction are then performed. Second, we establish multi-perspective distributed information fusion under regional full-coverage constraints. Finally, the light field data from multiple perspectives are fused and the states of the maneuvering targets are estimated. Experimental results show that, compared with existing methods such as the spatially varying bidirectional reflectance distribution function and micro-lens array methods, the proposed method reconstructs the light field in no more than 583 s with a reconstruction accuracy above 92%. For maneuvering target recognition, the recognition time of the proposed algorithm is no more than 3.5 s, and its recognition accuracy reaches up to 86.739%. Moreover, the accuracy generally increases as more viewing angles are used.

Keywords

Multi-perspective light field, multi-perspective information fusion, light field reconstruction, maneuvering target recognition, distributed cooperation

Introduction

The light field was first proposed in 1936 by the Soviet physicist Gershun¹ in a classic book. Gershun described the light field as the all-light function of a point in a given direction, whose value represents the brightness per unit area. Because the light field involves a large amount of data and is costly to acquire, it is difficult to obtain in practice. In 1996, Levoy and Hanrahan² introduced the concept of the light field into computer graphics for the first time. They simplified the seven-dimensional (7D) function of light into a four-dimensional (4D) function. Using only a small amount of scene geometry information, they rendered the entire light field, and this initial reconstruction achieved the desired results.

The light field is a parametric representation of the 4D light radiation field that contains both position and direction information in space. In other words, it contains all the images of the same object taken from different positions and angles. A multi-perspective light field represents the complete light field through different perspectives. By combining the target information captured by collaborating sensors, the light field data set of many sensors can be obtained. The light field can therefore naturally serve as a feature library of target images.

Maneuvering target recognition is the process of distinguishing a particular moving target from other targets. Its essence is to determine the image information of the maneuvering target in each frame. In short, target recognition includes both the identification of similar targets and the identification of targets of different types.

With the rapid development of light field reconstruction technologies, maneuvering target recognition based on light fields has been widely studied in computer vision, pattern recognition, and image processing. Many researchers exploit the sparsity of the light field in the angular domain to monitor and identify targets. Light field reconstruction reduces the dependence of recognition on the shooting angle and relaxes the constraints on target recognition, which broadens its applications.

The main contributions of this article are as follows:

The complete light field is represented by multiple perspectives, and the sub-light field at each perspective is reconstructed based on the sparsity of the light field itself. This reduces the amount of data during reconstruction and improves reconstruction accuracy.

For maneuvering targets, a collaboration mechanism model of the multi-perspective light field under regional full-coverage constraints is established to fuse the information captured by the multi-perspective distributed network and to estimate the state of the maneuvering target.

This article conducts extensive experiments to evaluate the performance of the proposed method in recognizing both fixed and maneuvering targets. The experimental results indicate that the proposed method outperforms state-of-the-art methods.

The remainder of this article is organized as follows. The “Related work” section reviews the related literature. The sections “The multi-perspective representation and reconstruction model of light field,” “The multi-perspective distributed information fusion mechanism,” and “Maneuvering target recognition based on the information fusion of different sub-light fields” describe, respectively, the representation and reconstruction models of the multi-perspective light field, the multi-perspective collaboration mechanism, and the maneuvering target recognition method based on multiple perspectives. The simulation experiments are discussed in the “Experimental results and analyses” section. Finally, the “Conclusion” section concludes the article. The overall framework is shown in Figure 1.

Figure 1.

Maneuvering target recognition method based on the multi-perspective light field reconstruction.

Related work

A complete light field is difficult to obtain: large amounts of data and complex scenes are obstacles to constructing it. Therefore, many researchers have proposed light field reconstruction, which exploits the sparsity of the light field to reconstruct unknown information about targets from locally known information. To improve the sampling efficiency and image quality of Fourier-transform ghost imaging, Zhu et al.³ proposed a spatial multiplexing reconstruction method; experimental results show that the Fourier spectrum reconstructed by this method has better visibility and signal-to-noise ratio. Lu et al.⁴ effectively addressed the scattering problem of underwater light field images with deep convolutional neural networks combined with depth estimation. Light field cameras are widely used because they offer a convenient three-dimensional (3D) imaging method. Zhu et al.⁵ reported a fundamental comparison between unfocused light field (ULF) and focused light field (FLF) cameras. Wang et al.⁶ proposed a spatially varying bidirectional reflectance distribution function ((SV)BRDF) invariance theory for recovering 3D shape and reflectance with a light field camera, and validated it with extensive experiments. Zhao et al.⁷ proposed an optical sectioning tomography (OST) technique combined with light field imaging for 3D flame temperature measurement; the light field OST (LF-OST) system is simpler and faster than conventional imaging. Cai et al.⁸ proposed the 3D structured light field (SLF) and depth-SLF⁹ methods to solve the multi-perspective reconstruction problem, and their experimental results show that both methods achieve efficient SLF reconstruction. To record the direction information of light, Chen et al.¹⁰ proposed the selected structural key views (SC-SKV) encoder, which significantly improves performance over traditional methods.
Schedl et al.¹¹ proposed an angular super-resolution method that obtains the optimal sampling mask by minimizing the coherence of a representative global dictionary. This method differs fundamentally from the prior art. To overcome the limited spatial resolution that the micro-lenses impose on light field cameras, Zhou et al.¹² proposed a method for reconstructing high-resolution images from the angular images; experiments with a magnification factor of 8 demonstrate its effectiveness. Alam and Gunturk¹³ proposed a hybrid stereo imaging system; experimental results show that it effectively overcomes the limitations of the light field camera while retaining its light field imaging capability.

In the area of maneuvering target recognition, Nam and Han¹⁴ proposed a visual tracking algorithm based on convolutional neural networks, which combines pretrained shared layers with a new binary classification layer that is updated online while tracking the target. To address model drift in online tracking, Zhang et al.¹⁵ proposed a multi-expert recovery scheme, which significantly improved the robustness of the tracker in experiments, especially under frequent occlusions and repeated appearance variations. Tao et al.¹⁶ proposed a new tracker that uses only the initial observation of the target in the first frame, which is sufficient to achieve state-of-the-art performance. For visual tracking with limited training data, Danelljan et al.¹⁷ proposed spatially regularized discriminative correlation filters (SRDCF), which introduce a spatial regularization component into the learning to penalize correlation filter coefficients according to their spatial location; experiments show that this method achieves higher accuracy than existing trackers on four data sets. Valmadre et al.¹⁸ proposed an algorithm for training linear templates to discriminate between images, which interprets the closed-form correlation filter learner as a differentiable layer in a deep neural network; the experimental results show that the method achieves state-of-the-art performance at high frame rates compared with existing algorithms.

In summary, many researchers have made effective improvements in light field reconstruction and target recognition. However, several problems remain: (1) obtaining the complete light field is still difficult, (2) the viewing range of the complete light field is limited, and (3) the target may be lost when monitoring maneuvering targets.

The multi-perspective representation and reconstruction model of light field

In the reconstruction process, the large amount of data increases the error and causes other problems, so the running time and recognition accuracy of methods based on light field reconstruction are not ideal. Therefore, a multi-perspective representation model of the light field is presented. With this model, the complete light field is divided into multi-perspective light fields, which are then reconstructed.

The multi-perspective representation model of light field

Taking the Lego Bulldozer scene from the Stanford University light field library as an example, a given image is represented as a weighted graph $G = (ν, ε)$, in which the nodes $ν$ are the pixels of the image, the edge set $ε$ is determined by the chosen neighborhood structure, and each edge $e_{ij} \in ε$ carries a weight. To locate the threshold (contour) $C$ of the multi-perspective light field in the image domain $Ω$, a suitable functional is minimized:

E (C) = \int_{0}^{1} \left( - {| \nabla I (C (s)) |}^{2} + α {| C_{s} (s) |}^{2} + β {| C_{ss} (s) |}^{2} \right) ds

(1)

where $s \in [0, 1]$ parameterizes the contour $C$, $\nabla I$ is the gray-level gradient of the image, so that the first term attracts the contour to strong edges, and $α$ and $β$ are weighting parameters that control the elasticity and rigidity of the contour, respectively. Adding further constraints to the threshold leads to computing a piecewise smooth approximation $u$ of the gray-value function $I$ by minimizing the Mumford–Shah functional

E (u, C) = \int_{Ω} {(u - I)}^{2} dx + λ \int_{Ω \setminus C} {| \nabla u |}^{2} dx + ν | C |

(2)

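where $I : Ω \to R$ is the input gray-value image on the domain $Ω$, $λ$ and $ν$ are positive weights, and $| C |$ is the length of $C$. Partitioning $Ω$ into $k + 1$ disjoint regions $Ω_{0}, \dots, Ω_{k}$ leads to the minimal partition problem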

\begin{matrix} \min_{Ω_{0}, \dots, Ω_{k}} \frac{1}{2} \sum_{i = 0}^{k} \mathrm{Per}_{g} (Ω_{i}; Ω) + \sum_{i = 0}^{k} \int_{Ω_{i}} f_{i} (x) dx \\ \text{s.t. } \bigcup_{i = 0}^{k} Ω_{i} = Ω, \quad Ω_{s} \cap Ω_{t} = \emptyset, \ \forall s \neq t \end{matrix}

(3)

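where $\mathrm{Per}_{g} (Ω_{i}; Ω)$ is the weighted perimeter of region $Ω_{i}$ in $Ω$ and $f_{i} (x)$ is the cost of assigning pixel $x$ to region $i$. In this article, the convex representation of image segmentation¹⁹ is introduced into multi-perspective light field reconstruction. The regions $Ω_{i}$ in equation (3) are represented by the labeling function $u : Ω \to \{0, \dots, k\}$, with $u (x) = i$ for $x \in Ω_{i}$. The multi-label function $u$ is equivalent to $k$ binary functions $θ (x) = (θ_{1} (x), \dots, θ_{k} (x))$ defined by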

θ_{i} (x) = \begin{cases} 1, & u (x) \geq i \\ 0, & \text{otherwise} \end{cases} \quad i = 1, \dots, k

(4)

The labeling function $u$ is recovered from these binary functions by formula (5)

u (x) = \sum_{i = 1}^{k} θ_{i} (x)

(5)

Therefore, the light field model is divided into many perspectives, as shown in Figure 2.

Figure 2.

The multi-perspective 3D segmentation model.
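The equivalence between the labeling function $u$ and the binary functions $θ_{i}$ in equations (4) and (5) can be checked numerically. The following Python sketch illustrates only that representation, not the convex relaxation or its solver; the function names `labels_to_binary` and `binary_to_labels` are hypothetical.

```python
import numpy as np

def labels_to_binary(u, k):
    """Equation (4): theta_i(x) = 1 if u(x) >= i, else 0, for i = 1..k."""
    u = np.asarray(u)
    # One threshold level per binary function, broadcast over the image
    levels = np.arange(1, k + 1).reshape((k,) + (1,) * u.ndim)
    return (u[None, ...] >= levels).astype(np.uint8)

def binary_to_labels(theta):
    """Equation (5): u(x) = sum_i theta_i(x)."""
    return theta.sum(axis=0)

u = np.array([[0, 2], [3, 1]])   # label map with k = 3, labels in {0, ..., 3}
theta = labels_to_binary(u, 3)   # shape (3, 2, 2)
assert np.array_equal(binary_to_labels(theta), u)
```

Because $θ_{1} (x) \geq θ_{2} (x) \geq \dots \geq θ_{k} (x)$ at every pixel, the encoding is monotone; this ordering is the constraint that convex multi-label formulations typically keep when the $θ_{i}$ are relaxed to the interval [0, 1].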

The reconstruction model of multi-perspective light field

Based on the multi-perspective light field model, the multi-perspective light field is reconstructed with a light field reconstruction algorithm that combines the wavelet transform with the sparse Fourier transform. The raw light field image is a grid of micro-lens images indexed by $(x, y)$. Each micro-lens image records the light that reaches one micro-lens on the imaging surface from different $(u, v)$ positions on the main lens, as shown in Figure 3.

Figure 3.

The light field imaging model.
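A minimal sketch of how a raw plenoptic image could be rearranged into the 4D array $L (x, y, u, v)$ described above, assuming an idealized sensor in which every micro-lens covers exactly $n_{u} \times n_{v}$ pixels aligned with the pixel grid (real cameras need calibration and resampling first). The helper names are hypothetical. Fixing $(u, v)$ yields one sub-aperture view, and summing over $(u, v)$ gives a discrete analogue of the aperture integral in equation (6), with the $\cos^{4} θ$ factor and the constant $1 / F^{2}$ dropped.

```python
import numpy as np

def lenslet_to_4d(raw, n_u, n_v):
    """Rearrange a raw lenslet image into L[x, y, u, v].

    Assumes each micro-lens image is an aligned n_u x n_v pixel block, so
    raw[x * n_u + u, y * n_v + v] holds the ray through lens (x, y) and
    aperture position (u, v). Incomplete blocks at the border are cropped.
    """
    n_x, n_y = raw.shape[0] // n_u, raw.shape[1] // n_v
    raw = raw[: n_x * n_u, : n_y * n_v]
    return raw.reshape(n_x, n_u, n_y, n_v).transpose(0, 2, 1, 3)

def sub_aperture_view(L, u, v):
    """Image seen through a single aperture position (u, v)."""
    return L[:, :, u, v]

def integrate_aperture(L):
    """Discrete aperture integral: sum over (u, v) for every micro-lens."""
    return L.sum(axis=(2, 3))
```

For example, a $6 \times 4$ raw image with $3 \times 2$ pixels per micro-lens becomes a $2 \times 2 \times 3 \times 2$ light field.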

The raw image is made up of a series of pixels, and each pixel of the resulting photograph is obtained from one micro-lens image by integrating over the main lens aperture:

E_{F} (x, y) = \frac{1}{F^{2}} \int \int L_{F} (x, y, u, v) \cos^{4} θ dudv

(6)

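where $L_{F} (x, y, u, v)$ is the light field parameterized by the main lens plane $(u, v)$ and the image plane $(x, y)$ at separation $F$, and $\cos^{4} θ$ is the attenuation (vignetting) factor, with $θ$ the angle between the ray and the image plane normal. For a virtual image plane at depth $F' = α F$, the ray that intersects it at $(x', y')$ is re-parameterized as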

\begin{matrix} L_{F'} (x', y', u, v) = L_{F} (u + \frac{x' - u}{α}, v + \frac{y' - v}{α}, u, v) \\ = L_{F} (u (1 - \frac{1}{α}) + \frac{x'}{α}, v (1 - \frac{1}{α}) + \frac{y'}{α}, u, v) \end{matrix}

(7)

Substituting equation (7) into equation (6) and omitting the $\cos^{4} θ$ factor gives the image formed on any virtual plane under the mapping $(x, y, u, v) \to (x', y', u, v)$

E_{(α, F)} (x', y') = \frac{1}{α^{2} F^{2}} \int \int L_{F} (u (1 - \frac{1}{α}) + \frac{x'}{α}, v (1 - \frac{1}{α}) + \frac{y'}{α}, u, v) dudv

(8)
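Equation (8) is the basis of shift-and-add refocusing. The sketch below implements a discrete version in flatland, that is, with one spatial coordinate $x$ and one aperture coordinate $u$ instead of two of each, and it replaces the constant $1 / (α^{2} F^{2})$ with a plain average over the aperture samples; both simplifications, and the function name `refocus_flatland`, are assumptions made for brevity.

```python
import numpy as np

def refocus_flatland(L, u_coords, x_coords, alpha):
    """Shift-and-add refocusing of a 2D light field L[u, x], cf. equation (8).

    Row L[i] is the light field sampled at aperture position u_coords[i]
    over the increasing sensor grid x_coords.
    """
    out = np.zeros(len(x_coords))
    for i, u in enumerate(u_coords):
        # Position on the original sensor plane hit by the ray, cf. equation (7)
        x_src = u * (1.0 - 1.0 / alpha) + x_coords / alpha
        # Linear interpolation; positions outside the grid take the edge value
        out += np.interp(x_src, x_coords, L[i])
    return out / len(u_coords)
```

With $α = 1$ the shift vanishes and the result is simply the average of the aperture samples, which corresponds to the conventional photograph on the original image plane.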

Based on this imaging model, the frequency-domain information of the images is obtained by applying the 4D Fourier transform to the multi-perspective light field. The images are then reconstructed using the central slice theorem and the inverse wavelet transform. The resulting multi-perspective light field reconstruction flow is shown in Figure 4.

Figure 4.

The multi-perspective light field reconstruction flowchart.
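The central slice step relies on the Fourier slice theorem. In the simplest case, $α = 1$, the 2D spectrum of the photograph in equation (8) equals the slice of the 4D light field spectrum at zero angular frequency; for $α \neq 1$ the slice is sheared and requires interpolation in the frequency domain, and the wavelet stage is not shown. The following Python check uses a random toy light field and only demonstrates the theorem; it is not the reconstruction algorithm itself.

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.random((8, 8, 5, 5))            # toy light field L[x, y, u, v]

# Photograph for alpha = 1: integrate over the aperture (u, v), constants dropped
photo = L.sum(axis=(2, 3))

# 4D spectrum of the light field and its slice at zero angular frequency
spectrum_4d = np.fft.fftn(L)
central_slice = spectrum_4d[:, :, 0, 0]

# Fourier slice theorem: the slice equals the 2D spectrum of the photograph,
# so the photograph can be recovered by a 2D inverse FFT of the slice
photo_spectrum = np.fft.fft2(photo)
recovered = np.fft.ifft2(central_slice).real
```

Extracting a slice costs far less than integrating the full 4D data for every refocused image, which is why the frequency-domain route is attractive once the 4D spectrum is available.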

The multi-perspective distributed information fusion mechanism

The visual range of a complete light field is limited when monitoring targets.^20,21 This article therefore proposes a multi-perspective light field collaboration mechanism, in which the different sub-light fields are fused with information from other perspectives while the targets are monitored.^22–26

The network topology among the different perspectives is represented by the multi-perspective light field digraph $G = (ν, ε, E)$, whose nodes are the perspectives, as shown in Figure 5.

Figure 5.

The directed graph model of the multi-perspective light field.

The following diagonal matrix is defined:

[I] = \mathrm{diag} \{λ_{1}, λ_{2}, \dots, λ_{N}\}

(9)

The state error of each perspective $i$ relative to the reference state $ξ_{0} (k)$ is

{\bar{ξ}}_{i} (k) = ξ_{i} (k) - ξ_{0} (k)

(10)

Therefore, the closed-loop error system of the different perspectives is

\bar{ξ} (k + 1) = (I_{N} \otimes \bar{A}) \bar{ξ} (k) - (I_{N} \otimes \bar{B}) δ ((M \otimes c R^{- 1} {\bar{B}}^{T} P \bar{A}) \bar{ξ} (k))

(11)

The directed graph network consists of the different perspectives, and the dynamics of each perspective are given by equation (12)

x_{i} (k + 1) = A x_{i} (k) + B δ_{ϕ} (u_{i} (k)), i = 1, 2, 3, \dots, n

(12)

Let the state equation of a given perspective $x_{i}$ be

x_{i} (k + 1) = A x_{n} (k)

(13)

From the above formula, the state equation of $x_{n}$ can be obtained. The global stability of the collaboration among the multi-perspective light fields is analyzed with the Lyapunov function

\begin{matrix} V (k) = {\bar{ξ}}^{T} (k) (M \otimes P) \bar{ξ} (k) \\ Δ V (k) \leq {\bar{ξ}}_{s}^{T} (k) (M \otimes ({\bar{A}}_{s}^{T} P_{s} {\bar{A}}_{s} - P_{s})) {\bar{ξ}}_{s} (k) \leq 0 \end{matrix}

(14)

Since $Δ V (k) \leq 0$ on the directed graph topology, the Lyapunov function does not increase, and the overall coordination of the multi-perspective light fields remains consistent.
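As a sketch of how the stacked closed-loop system (11) and the Lyapunov function $V (k)$ in (14) can be evaluated, the following Python code uses Kronecker products to propagate all $N$ perspectives at once. The text does not define $δ (\cdot)$, so it is assumed here to be an element-wise saturation with the hypothetical limit `sat_limit`; the matrices $\bar{A}$, $\bar{B}$, $M$, $P$, $R$ and the scalar $c$ are taken as given inputs, and the function names are hypothetical. The code evaluates one step only; it neither designs $P$ nor verifies $Δ V (k) \leq 0$.

```python
import numpy as np

def closed_loop_step(xi_bar, A_bar, B_bar, M, P, R, c, sat_limit=np.inf):
    """One step of equation (11) for N stacked perspective errors.

    xi_bar has length N * n; delta(.) is ASSUMED to be an element-wise
    saturation to [-sat_limit, sat_limit] (the text does not define it).
    """
    N = M.shape[0]
    # Feedback gain c R^{-1} B^T P A, computed with a linear solve instead of inv(R)
    K = c * np.linalg.solve(R, B_bar.T @ P @ A_bar)
    u = np.clip(np.kron(M, K) @ xi_bar, -sat_limit, sat_limit)
    I_N = np.eye(N)
    return np.kron(I_N, A_bar) @ xi_bar - np.kron(I_N, B_bar) @ u

def lyapunov_value(xi_bar, M, P):
    """V(k) = xi_bar^T (M kron P) xi_bar from equation (14)."""
    return float(xi_bar @ np.kron(M, P) @ xi_bar)
```

The Kronecker form keeps the per-perspective model ($\bar{A}$, $\bar{B}$) separate from the network coupling ($M$), so changing the topology only changes $M$.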

Suppose that target Z is covered by the coalition of perspectives $c_{i} = \{a_{1}, a_{2}, \dots, a_{n}\}$, and that the utility function of the coalition is $v (c_{i})$. When a task is performed on target Z, the total utility of the coalitions formed from the different perspectives is $\sum_{i = 1}^{n} v (c_{i}) = v (Z)$.

If $\sum_{i = 1}^{n} v (c_{i}) < v (Z)$, there must exist two coalitions $c_{i}$ and $c_{i + 1}$ whose combination yields a nonnegative utility increment:

v (c_{i} \cup c_{i + 1}) - v (c_{i}) - v (c_{i + 1}) \geq 0

(15)

By formula (15), when a target appears, at least two perspective light fields must monitor it, as shown in Figure 6.

Figure 6.

The multi-perspective cooperative sampling model.
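The merging condition in equation (15) can be tested directly once a coalition utility is chosen. In the following sketch the utility `utility(c) = len(c) ** 2` is purely hypothetical, picked because it is superadditive; the text does not specify $v (\cdot)$. Coalitions are modeled as Python `frozenset`s of perspective indices, and the function names are also hypothetical.

```python
from itertools import combinations

def merge_gain(v, c_i, c_j):
    """Left-hand side of equation (15): v(c_i U c_j) - v(c_i) - v(c_j)."""
    return v(c_i | c_j) - v(c_i) - v(c_j)

def find_profitable_merge(v, coalitions):
    """Return the first pair of coalitions whose merger has a nonnegative gain, or None."""
    for c_i, c_j in combinations(coalitions, 2):
        if merge_gain(v, c_i, c_j) >= 0:
            return c_i, c_j
    return None

def utility(c):
    """HYPOTHETICAL superadditive utility: grows with the square of the coalition size."""
    return len(c) ** 2

coalitions = [frozenset({1}), frozenset({2}), frozenset({3})]
pair = find_profitable_merge(utility, coalitions)
```

With a superadditive utility any two single-perspective coalitions are worth merging, which matches the conclusion that a target should be monitored by at least two perspectives; with a utility that does not reward cooperation, no merge is found.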

Maneuvering target recognition based on the information fusion of different sub-light fields

The information fusion of different sub-light fields

The target information detected by each perspective (i.e. sub-light field) is fused, and the perspectives reach consensus on the average of all observations, supported by the information captured by each perspective.^27–29

The degree of perspective $i$ with respect to perspective $j$ is

a_{ij} (k) = \frac{| z_{i} (k), z_{j} (k) |}{v_{i} (k) \times {u'}_{uijj}} \mp {({f'}_{ghj} \times {B'}_{gjkk})}^{n}

(16)

In formula (16), $z_{i} (k)$ and $z_{j} (k)$ represent the observed values of the light field at time $k$ from the $i$th and $j$th viewing angles, respectively, and $v_{i} (k)$ represents the observation noise at time $k$.

In formula (17), $r'_{i} (k)$ represents the consistency of the observation information of each perspective, $ϖ'_{fgjk}$ represents the reliability of each perspective in the entire observation space, and $μ'_{fgp}$ represents the observation information variable of each perspective; with these quantities, all the observation information in the space is fused.

In formula (18), $p'_{hjk}$ represents the probability distribution function of the light field information at each perspective, $p'_{hj}$ represents the characteristic function of the light field at each perspective, and $l'_{asjj}$ represents the confidence distance between the information observed from perspective $i$ and the observation information of perspective $j$; with these quantities, the information of all observed light fields is made consistent.

W'_{asjj} = \frac{{μ'}_{fgp} \times {ϖ'}_{fgjk}}{{r'}_{i} (k)} \mp \frac{{i'}_{hjk} \pm {p'}_{jk}}{{l'}_{jh}}

(17)

E'_{sdnn} = \frac{{p'}_{hjk} \times {p'}_{hj} \pm {i, j}}{{l'}_{asjj} \times a_{ij} (k)} \times W'_{asjj}

(18)

In formula (19), ${χ ″}_{pojj}$ represents the threshold of the basic credibility assignment of the multi-perspective light field data; the information fusion of the distributed multi-perspective data set is then completed by

b'_{jjhf} = \frac{{χ ″}_{pojj} \mp {c'}_{kl}}{{E'}_{sdnn}} \times W'_{asjj}

(19)

From the above formula, the fused information of the maneuvering target is obtained, which gives the observation of the maneuvering target under the multi-perspective light field.
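Equations (16) to (19) contain symbols that are not fully defined in the text, so they are not reproduced in code. As a substitute, the following Python sketch shows a common support-degree weighted average for fusing scalar observations from several perspectives: each observation is weighted by how strongly the other perspectives agree with it, using a Gaussian support function with a hypothetical `scale` parameter. This is a generic technique named plainly here, not the fusion rule of this article.

```python
import numpy as np

def support_weighted_fusion(z, scale=1.0):
    """Fuse scalar observations z[i] from several perspectives.

    Support between perspectives i and j is exp(-((z_i - z_j) / scale)^2);
    each perspective is weighted by the total support it receives.
    """
    z = np.asarray(z, dtype=float)
    diff = z[:, None] - z[None, :]
    support = np.exp(-(diff / scale) ** 2)
    np.fill_diagonal(support, 0.0)       # a perspective does not support itself
    total = support.sum(axis=1)
    if total.sum() == 0.0:               # no mutual support: fall back to a plain mean
        weights = np.full(z.size, 1.0 / z.size)
    else:
        weights = total / total.sum()
    return float(weights @ z), weights

# Three consistent perspectives and one outlier
fused, weights = support_weighted_fusion([10.0, 10.2, 9.9, 20.0])
```

The outlier receives almost no support from the other perspectives, so the fused value stays close to 10, whereas a plain average would be pulled to about 12.5.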

The state estimation model of the maneuvering target

The state of the maneuvering target is estimated by introducing multiple target motion models^30–32 and weighting the state estimate of each motion model by its probability, which enables the maneuvering target to be monitored.

Maneuvering target state estimation process

Assume that the target state ${\hat{x}}_{k}^{i}$, covariance $P_{k}^{i}$, and model probability $μ_{k}^{i}$ of each model $i$ at time $k$ have been obtained. By the law of total probability, the conditional probability density of the state $x_{k}$ can be decomposed as

\begin{matrix} P [x_{k} | Z^{k}] = \sum_{j = 1}^{r} P [x_{k} | m_{k}^{j}, Z^{k}] P [m_{k}^{j} | Z^{k}] \\ = \sum_{j = 1}^{r} P [x_{k} | m_{k}^{j}, z_{k}, Z^{k - 1}] μ_{k}^{j} \end{matrix}

(20)

In the above formula, $m_{k}^{j}$ denotes the event that model $j$ is in effect at time $k$, and $Z^{k} = \{z_{1}, \dots, z_{k}\}$ represents the measurements captured up to time $k$.

By Bayes' rule, the posterior probability density of the maneuvering target's state conditioned on model $j$ is

P [x_{k} | m_{k}^{j}, z_{k}, Z^{k - 1}] = \frac{P [z_{k} | m_{k}^{j}, x_{k}]}{P [z_{k} | m_{k}^{j}, Z^{k - 1}]} P [x_{k} | m_{k}^{j}, Z^{k - 1}]

(21)
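The denominator of equation (21), $P [z_{k} | m_{k}^{j}, Z^{k - 1}]$, is the likelihood of model $j$, and in multiple-model filtering it is also used to update the model probabilities $μ_{k}^{j}$ that weight the sum in equation (20). The sketch below assumes, as with a Kalman filter per model, that this likelihood is Gaussian in the measurement innovation with innovation covariance $S$; the function names and the Gaussian assumption are not taken from the text.

```python
import numpy as np

def gaussian_likelihood(innovation, S):
    """Gaussian density N(innovation; 0, S) of the measurement innovation."""
    nu = np.atleast_1d(np.asarray(innovation, dtype=float))
    S = np.atleast_2d(np.asarray(S, dtype=float))
    maha = nu @ np.linalg.solve(S, nu)   # squared Mahalanobis distance
    norm = np.sqrt((2.0 * np.pi) ** nu.size * np.linalg.det(S))
    return float(np.exp(-0.5 * maha) / norm)

def update_model_probabilities(prior, likelihoods):
    """Bayes rule: mu_k^j proportional to likelihood_j * prior_j, then normalized."""
    w = np.asarray(prior, dtype=float) * np.asarray(likelihoods, dtype=float)
    return w / w.sum()
```

A model whose predicted measurement explains $z_{k}$ well (small innovation) gains probability, which is how the weighting shifts toward the motion model that currently fits the maneuver.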

The state estimates and covariances obtained under the $r$ models are combined as

{\hat{x}}_{k} = \sum_{i = 1}^{r} {\hat{x}}_{k}^{i} μ_{k}^{i}

(22)

P_{k} = \sum_{i = 1}^{r} μ_{k}^{i} \left[ P_{k}^{i} + ({\hat{x}}_{k}^{i} - {\hat{x}}_{k}) {({\hat{x}}_{k}^{i} - {\hat{x}}_{k})}^{T} \right]

(23)

Equations (22) and (23) show that the target is monitored by introducing several maneuvering target motion models and weighting each model by its probability; the second term in equation (23) accounts for the spread between the individual model estimates.
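Equations (22) and (23) translate directly into code. The following Python sketch, with the hypothetical function name `combine_estimates`, takes the model-conditioned estimates ${\hat{x}}_{k}^{i}$, covariances $P_{k}^{i}$, and probabilities $μ_{k}^{i}$ and returns the combined estimate and covariance.

```python
import numpy as np

def combine_estimates(x_hats, Ps, mu):
    """Combine r model-conditioned estimates, equations (22) and (23)."""
    x_hats = np.asarray(x_hats, dtype=float)   # shape (r, n)
    Ps = np.asarray(Ps, dtype=float)           # shape (r, n, n)
    mu = np.asarray(mu, dtype=float)           # shape (r,)
    x = mu @ x_hats                            # equation (22)
    P = np.zeros_like(Ps[0])
    for mu_i, x_i, P_i in zip(mu, x_hats, Ps):
        d = (x_i - x)[:, None]                 # spread of model i around the mean
        P += mu_i * (P_i + d @ d.T)            # equation (23)
    return x, P

# Two equally likely models that disagree on a 1D state
x, P = combine_estimates([[0.0], [2.0]], [[[1.0]], [[1.0]]], [0.5, 0.5])
# x -> [1.0], P -> [[2.0]]
```

In the example each model reports unit variance, but because the two estimates disagree, the spread term raises the combined variance to 2.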

Maneuvering target recognition

When the multi-perspective light fields are scheduled,^33–36 the key issue is to ensure that the maneuvering target observed from different perspectives is the same target. This subsection uses feature point matching to identify maneuvering targets entering the monitoring area.

The Gaussian scale space of an image is obtained by convolving the image with Gaussian kernels $G (x, y, σ)$ of different scales:

L (x, y, σ) = G (x, y, σ) * I (x, y)

(24)

G (x, y, σ) = \frac{1}{2 π σ^{2}} e^{- \frac{x^{2} + y^{2}}{2 σ^{2}}}

(25)

where $σ$ is the scale-space factor, i.e. the standard deviation of the Gaussian distribution, which reflects how strongly the image is blurred: the larger $σ$ is, the more blurred the image and the larger the corresponding scale. If $k$ is the constant factor between two adjacent scales, the difference of Gaussians (DoG) is

\begin{matrix} D (x, y, σ) = [G (x, y, k σ) - G (x, y, σ)] * I (x, y) \\ = L (x, y, k σ) - L (x, y, σ) \end{matrix}

(26)

where $L (x, y, σ)$ is the Gaussian scale space of the image. The local extrema of the DoG obtained by convolution lie on a discrete grid, so their positions are refined as follows.
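A minimal NumPy sketch of equations (24) to (26): a sampled, normalized version of the Gaussian kernel (25) is applied separably to build the scale space, and adjacent levels are subtracted to obtain the DoG images. The base scale `sigma = 1.6`, the factor `k = sqrt(2)`, the kernel radius of $3σ$, reflective border handling, and the function names are common choices assumed here, not values given in the text.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Sampled 1D factor of the Gaussian in equation (25), normalized to sum 1."""
    radius = int(np.ceil(3.0 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    g = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return g / g.sum()

def gaussian_blur(image, sigma):
    """L(x, y, sigma) = G(x, y, sigma) * I(x, y), equation (24), done separably."""
    k = gaussian_kernel_1d(sigma)
    r = k.size // 2
    padded = np.pad(np.asarray(image, dtype=float), r, mode="reflect")
    # Filter the rows, then the columns; "valid" removes the padding again
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def difference_of_gaussians(image, sigma=1.6, k=np.sqrt(2.0), levels=4):
    """D = L(k sigma) - L(sigma) for adjacent scales, equation (26)."""
    blurred = [gaussian_blur(image, sigma * k ** i) for i in range(levels)]
    return [blurred[i + 1] - blurred[i] for i in range(levels - 1)]
```

Separable filtering works because the 2D Gaussian factors into a product of 1D Gaussians in $x$ and $y$, which reduces the cost per pixel from quadratic to linear in the kernel width.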

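Let $x$ be a feature point selected on the maneuvering target and $Δ x$ its offset to the true extremum. The contrast $D (x)$ is approximated by the second-order Taylor expansion of the DoG function, where $D$ and its derivatives are evaluated at $x$: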

D (x) = D + \frac{\partial D^{T}}{\partial x} Δ x + \frac{1}{2} Δ x^{T} \frac{\partial^{2} D}{\partial x^{2}} Δ x

(27)

Setting the derivative of equation (27) with respect to $Δ x$ to zero gives the offset of the extremum

Δ x = - {\left( \frac{\partial^{2} D}{\partial x^{2}} \right)}^{- 1} \frac{\partial D}{\partial x}

(28)

If the contrast threshold is $T$ , finally bringing the found $Δ x$ into equation (29)

D (\hat{x}) = D + \frac{1}{2} \frac{\partial D^{T}}{\partial x} \hat{x}

(29)

If $| D (\hat{x}) | \geq T$, the feature point is retained; otherwise, it is discarded.
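The refinement in equations (27) to (29) can be sketched with finite differences on a $3 \times \dots \times 3$ neighborhood of DoG values centered on the sampled extremum (for example $3 \times 3 \times 3$ over $x$, $y$, and $σ$). The function name and the default threshold `contrast_threshold = 0.03` (a value often used for images scaled to [0, 1]) are assumptions for illustration.

```python
import numpy as np

def refine_extremum(cube, contrast_threshold=0.03):
    """Sub-sample refinement of a DoG extremum, equations (27) to (29).

    cube: DoG values on a 3 x ... x 3 neighborhood centered on the sample point.
    Returns the offset dx, the contrast D(x_hat), and whether the point is kept.
    """
    cube = np.asarray(cube, dtype=float)
    n = cube.ndim
    center = (1,) * n

    def at(offset):
        return cube[tuple(c + o for c, o in zip(center, offset))]

    def step(i, s):
        o = [0] * n
        o[i] = s
        return o

    grad = np.empty(n)
    hess = np.empty((n, n))
    for i in range(n):
        # Central differences for the gradient and the Hessian diagonal
        grad[i] = (at(step(i, 1)) - at(step(i, -1))) / 2.0
        hess[i, i] = at(step(i, 1)) - 2.0 * cube[center] + at(step(i, -1))
        for j in range(i + 1, n):
            def corner(si, sj):
                o = [0] * n
                o[i], o[j] = si, sj
                return at(o)
            # Mixed second derivative from the four diagonal neighbors
            hess[i, j] = hess[j, i] = (corner(1, 1) - corner(1, -1)
                                       - corner(-1, 1) + corner(-1, -1)) / 4.0
    dx = -np.linalg.solve(hess, grad)           # equation (28)
    contrast = cube[center] + 0.5 * grad @ dx   # equation (29)
    return dx, contrast, bool(abs(contrast) >= contrast_threshold)
```

For a DoG response that is exactly quadratic near the sample, the finite differences are exact, so the returned offset and contrast coincide with the true peak.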

Experimental results and analyses

Experimental settings

Controller hardware

In this article, we use MATLAB under Windows 10 for the experimental simulation. The simulations run on a small server with an Intel Xeon E5-2630 v4 CPU (2.2 GHz) and 32 GB of memory.

Experimental parameters

This section uses the Stanford University Light Field Library as the light field data set for the simulation experiments. The data set contains 17 × 17 angular views, and each view is resized to 256 × 256 pixels to keep the experiments computationally feasible. For the light field reconstruction sampling scheme, this article adopts the line model, in which the sampling rate is the ratio of the number of sampled lines to the dimension of the angular domain.

The multi-perspective light field reconstruction algorithm

The data for this experiment are from the Stanford University Square database. The results are compared with those of the existing (SV)BRDF,⁶ LF-OST,⁷ 3D-SLF,⁸ and micro-lens array (MLA)¹³ reconstruction algorithms.

Figure 7 and Table 1 show the reconstruction accuracy and reconstruction time of each algorithm. Because LF-OST⁷ was designed for a different application background, its reconstruction time is above 600 s and its reconstruction accuracy is below 91%. (SV)BRDF⁶ and MLA¹³ are both innovative light field camera reconstruction methods, so they are compared with each other. The reconstruction accuracy of (SV)BRDF⁶ on Lego Bulldozer (93.583%) and its reconstruction time on Bracelet (235 s) are both better than the corresponding results of the proposed algorithm (92.447% and 328 s). The reconstruction time of MLA¹³ on Chess and Lego Bulldozer is shorter than that of (SV)BRDF,⁶ but its reconstruction accuracy is lower than those of (SV)BRDF⁶ and the proposed algorithm. 3D-SLF⁸ addresses 3D reconstruction, which increases the complexity of light field reconstruction, so its reconstruction accuracy is low. The reconstruction time of the proposed algorithm is less than 600 s, and its reconstruction accuracy exceeds 92%. The proposed algorithm is the best on six of the eight metrics (reconstruction time and accuracy on four scenes), which shows that it is effective.

Figure 7.

The multi-perspective light field reconstruction experiment results. The red box area is the light field reconstruction part of the experiment.

Table 1.

Comparison of light field reconstruction algorithms.

	(SV)BRDF⁶		LF-OST⁷		3D-SLF⁸		MLA¹³		Ours
	s	(%)	s	(%)	s	(%)	s	(%)	s	(%)
Chess	573	93.028	841	86.395	660	84.881	554	89.297	486	95.726
Lego Bulldozer	648	93.583	867	83.761	713	74.625	627	79.531	583	92.447
Bracelet	235	95.702	615	90.599	439	86.022	372	90.862	328	96.213
Treasure Chest	501	91.394	793	85.912	594	79.681	633	83.694	449	92.198

(SV)BRDF: spatially varying bidirectional reflectance distribution function; LF-OST: light field optical sectioning tomography; 3D-SLF: three-dimensional structured light field; MLA: micro-lens array; s: time; %: accuracy.

Values in bold indicate excellent indicators of each algorithm.
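The statements about Table 1 can be checked directly. The following Python snippet transcribes the table and counts the metrics on which the proposed algorithm is strictly best (shortest time or highest accuracy among the five methods); it also reports the largest reconstruction time and the smallest accuracy of the proposed algorithm over the four scenes. The variable names are illustrative.

```python
# (time in s, accuracy in %) per method, in the column order of Table 1:
# (SV)BRDF, LF-OST, 3D-SLF, MLA, Ours
table1 = {
    "Chess":          [(573, 93.028), (841, 86.395), (660, 84.881), (554, 89.297), (486, 95.726)],
    "Lego Bulldozer": [(648, 93.583), (867, 83.761), (713, 74.625), (627, 79.531), (583, 92.447)],
    "Bracelet":       [(235, 95.702), (615, 90.599), (439, 86.022), (372, 90.862), (328, 96.213)],
    "Treasure Chest": [(501, 91.394), (793, 85.912), (594, 79.681), (633, 83.694), (449, 92.198)],
}

wins = []
for scene, row in table1.items():
    *others, (t_ours, acc_ours) = row
    if t_ours < min(t for t, _ in others):      # strictly shortest time
        wins.append((scene, "time"))
    if acc_ours > max(a for _, a in others):    # strictly highest accuracy
        wins.append((scene, "accuracy"))

max_time = max(row[-1][0] for row in table1.values())
min_accuracy = min(row[-1][1] for row in table1.values())
print(f"Ours is best on {len(wins)} of {2 * len(table1)} metrics")  # 6 of 8
print(f"Ours: max time {max_time} s, min accuracy {min_accuracy}%")  # 583 s, 92.198%
```

The two metrics the proposed algorithm does not win are the Lego Bulldozer accuracy and the Bracelet time, both won by (SV)BRDF.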

Maneuvering target recognition performance testing based on the multi-perspective light field reconstruction

To verify the effectiveness of the proposed algorithm, this section uses the TB-50 and TB-100 video data sets as test data sets in simulation experiments. We use TensorFlow under Windows 10 for the experimental simulation, which runs on a small server with an Intel Xeon E5-2630 v4 CPU (2.2 GHz) and 32 GB of memory. Several representative sequences are selected, and the proposed algorithm is compared with the existing MDNet,¹⁴ MEEM,¹⁵ SINT,¹⁶ SRDCF,¹⁷ and CFNet¹⁸ algorithms, as shown in Figure 8. The maneuvering target recognition data fall into two categories: humans and cars.

The human video data sets include KiteSurf, Matrix, and Skating2. In the KiteSurf sequence, all algorithms identify the target effectively, but the bounding box of CFNet¹⁸ drifts away from the target. In the second image of the Matrix sequence, MEEM¹⁵ identifies a target other than the subject, while the other algorithms identify the target effectively. In the fourth image of the Skating2 sequence, the target is severely occluded, and only the proposed algorithm identifies it effectively.

The car video data sets include Car24, BlurCar4, and CarScale. In the Car24 sequence, MEEM,¹⁵ SINT,¹⁶ and CFNet¹⁸ identify targets other than the subject. The second and fifth images of the BlurCar4 sequence show severe motion blur; all algorithms identify the target effectively, and the proposed algorithm characterizes the target with a small red box in the second image. In the CarScale sequence, only MEEM¹⁵ shows a bounding box offset, in the first two frames of the video.

Figure 8.

Maneuvering target recognition result.

Figure 9 shows the time analysis of maneuvering target recognition, where Ours-1 denotes single-perspective recognition with the proposed algorithm. In the second and fifth frames of the Car24 sequence, the target recognition time is less than 2 s, and in this scenario (green histogram) Ours-1 is the fastest of all algorithms. However, the recognition accuracy of Ours-1 on Car24 is only comparable to that of similar algorithms, as shown in Table 2.

Figure 9.

Maneuvering target recognition time for various algorithms in different scenarios. The X-axis represents a scene, and different colors represent different scenes. The Y-axis stands for time. The Z-axis represents different algorithms and different perspectives of the algorithm.

Table 2.

Maneuvering target recognition accuracy.

	MDNet¹⁴	MEEM¹⁵	SINT¹⁶	SRDCF¹⁷	CFNet¹⁸	Ours-1	Ours-2	Ours-3	Ours-Multiple
KiteSurf	76.481	72.821	71.563	74.634	68.653	–	80.391	82.318	84.336
Matrix	73.936	68.714	76.738	75.408	70.182	–	78.062	83	86.739
Skating2	75.427	75.392	73.946	70.691	73.321	–	82.6	83.592	85
Car24	78.662	66.107	68.769	72.662	65.412	77.217	–	–	81.402
BlurCar4	72.025	75	73.868	71.717	70.829	76.761	78.536	80.153	73.428
CarScale	77.138	76.346	74.521	72.426	73.637	–	–	82	84.225

Average detection accuracy (%) on TB-50 and TB-100. Ours-1: one perspective detects the target; Ours-2: two perspectives detect the target; Ours-3: three perspectives detect the target; Ours-Multiple: four or more perspectives detect the target.

Values in bold indicate excellent indicators of each algorithm.

For the Car24 sequence in Figure 9, the target recognition time clearly increases as the number of viewing angles increases. The average recognition time of Ours-Multiple is close to 3 s, which is 0.3–1 s longer than that of the compared algorithms. However, Ours-Multiple achieves the highest recognition accuracy on five of the six video data sets in Table 2 (all except BlurCar4). In summary, the proposed algorithm is effective in maneuvering target recognition.
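The accuracy claims can be checked against Table 2. The Python snippet below transcribes the table, using `None` for the entries marked –, lists the sequences on which Ours-Multiple has the highest accuracy, and tests whether accuracy increases strictly with the number of perspectives among the Ours columns that were evaluated. The variable names are illustrative.

```python
columns = ["MDNet", "MEEM", "SINT", "SRDCF", "CFNet",
           "Ours-1", "Ours-2", "Ours-3", "Ours-Multiple"]
table2 = {
    "KiteSurf": [76.481, 72.821, 71.563, 74.634, 68.653, None, 80.391, 82.318, 84.336],
    "Matrix":   [73.936, 68.714, 76.738, 75.408, 70.182, None, 78.062, 83.0, 86.739],
    "Skating2": [75.427, 75.392, 73.946, 70.691, 73.321, None, 82.6, 83.592, 85.0],
    "Car24":    [78.662, 66.107, 68.769, 72.662, 65.412, 77.217, None, None, 81.402],
    "BlurCar4": [72.025, 75.0, 73.868, 71.717, 70.829, 76.761, 78.536, 80.153, 73.428],
    "CarScale": [77.138, 76.346, 74.521, 72.426, 73.637, None, None, 82.0, 84.225],
}

best_multiple = []
non_monotone = []
for name, row in table2.items():
    scores = {c: s for c, s in zip(columns, row) if s is not None}
    if max(scores, key=scores.get) == "Ours-Multiple":
        best_multiple.append(name)
    # Ours-k columns that were evaluated, in order of increasing number of views
    ours = [s for s in row[5:] if s is not None]
    if not all(a < b for a, b in zip(ours, ours[1:])):
        non_monotone.append(name)

print(best_multiple)   # ['KiteSurf', 'Matrix', 'Skating2', 'Car24', 'CarScale']
print(non_monotone)    # ['BlurCar4']
```

In both checks BlurCar4 is the only exception: there Ours-3 is the most accurate, and adding more perspectives lowers the accuracy.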

Conclusion

Because light field reconstruction involves a large amount of data, complex scenes, and long computation times, it is difficult to obtain a complete light field. Moreover, the monitoring range is limited because the complete light field is stationary, and because the state of a maneuvering target keeps changing, the information on its motion is incomplete. To solve these problems, this article proposes a maneuvering target recognition method based on multi-perspective light field reconstruction. First, the complete light field is represented by multiple perspectives, and the sub-light fields at the different perspectives are reconstructed based on the sparsity of the light field itself. Second, a multi-perspective distributed information fusion model under regional full-coverage constraints is established, and the light field at each perspective is sampled optimally according to the location of the target. Third, the sampling information of the multi-perspective distributed network is fused to estimate the state of the maneuvering target. The experimental results show that the proposed method performs well in both reconstruction accuracy and maneuvering target recognition accuracy. The method still has shortcomings: the more viewing angles are used, the more difficult it is to converge from the initial position, which increases the target recognition time. Future work will focus on solving the problems of large initial errors and difficult convergence.

Footnotes

Handling Editor: Salvatore Serrano

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This article was supported by the National Natural Science Foundation of China (61703143), Science and Technology Project of Henan Province (192102310260), Scientific and Technological Innovation Talents in Xinxiang (CXRC17004), the young backbone teacher training project of Henan University (2017GGJS123), and Science and Technology Major Special Project of Xinxiang City (ZD18006).

ORCID iD

Lei Cai

References

1. Gershun. The light field. Stud Appl Math 1939; 18(1): 51–151.

2. Levoy, Hanrahan. Light field rendering. In: Proceedings of the 23rd annual conference on computer graphics and interactive techniques, New Orleans, LA, 4–9 August 1996, pp.31–42. New York: ACM.

3. Zhu et al. Spatial multiplexing reconstruction for Fourier-transform ghost imaging via sparsity constraints. Opt Express 2018; 26(3): 2181–2190.

4. Uemura et al. Low illumination underwater light field images reconstruction using deep convolutional neural networks. Future Gener Comp Sy 2018; 82: 142–148.

5. Zhu, Lai, Eaton, et al. On the fundamental comparison between unfocused and focused light field cameras. Appl Optics 2018; 57(1): A1–A11.

6. Wang, Chandraker, Efros, et al. SVBRDF-invariant shape and reflectance estimation from light-field cameras. In: 2016 IEEE conference on computer vision and pattern recognition, Las Vegas, NV, 27–30 June 2016, pp.5451–5459. New York: IEEE.

7. Zhao, Zhang, et al. Optical sectioning tomographic reconstruction of three-dimensional flame temperature distribution using single light field camera. IEEE Sens J 2017; 18(2): 528–539.

8. Cai, Liu, Peng, et al. Ray calibration and phase mapping for structured-light-field 3D reconstruction. Opt Express 2018; 26(6): 7598–7613.

9. Cai, Liu, Peng, et al. Universal phase-depth mapping in a structured light field. Appl Optics 2018; 57(1): A26–A32.

10. Chen, Hou, Chau L-P. Light field compression with disparity guided sparse coding based on structural key views. IEEE T Image Process 2017; 27(1): 314–324.

11. Schedl, Birklbauer, Bimber. Optimized sampling for view interpolation in light fields using local dictionaries. Comput Vis Image Und 2017; 168: 93–103.

12. Zhou, Yuan, et al. Multiframe super resolution reconstruction method based on light field angular images. Opt Commun 2017; 404: 189–195.

13. Alam, Gunturk. Hybrid light field imaging for improved spatial resolution and depth range. Mach Vision Appl 2016; 29(1): 11–22.

14. Nam, Han. Learning multi-domain convolutional neural networks for visual tracking. In: 2016 IEEE conference on computer vision and pattern recognition, Las Vegas, NV, 27–30 June 2016, pp.4293–4302. New York: IEEE.

15. Zhang, Sclaroff. MEEM: robust tracking via multiple experts using entropy minimization. In: European conference on computer vision, Zurich, 6–12 September 2014, pp.188–203. Cham: Springer.

16. Tao, Gavves, Smeulders AWM. Siamese instance search for tracking. In: 2016 IEEE conference on computer vision and pattern recognition, Las Vegas, NV, 27–30 June 2016, pp.1420–1429. New York: IEEE.

17. Danelljan, Häger, Khan, et al. Learning spatially regularized correlation filters for visual tracking. In: 2015 IEEE international conference on computer vision, Santiago, Chile, 7–13 December 2015, pp.4310–4318. New York: IEEE.

18. Valmadre, Bertinetto, Henriques, et al. End-to-end representation learning for correlation filter based tracking. In: 2017 IEEE conference on computer vision and pattern recognition, Honolulu, HI, 21–26 July 2017, pp.5000–5008. New York: IEEE.

19. Cúth, Kalenda OFK, Petr. Isometric representation of Lipschitz-free spaces over convex domains in finite-dimensional spaces. Mathematika 2017; 63(2): 538–552.

20. Wang, Han, et al. Cascaded H-bridge multilevel inverter system fault diagnosis using a PCA and multi-class relevance vector machine approach. IEEE T Power Electr 2015; 30(12): 7006–7018.

21. Baraglia, Cakmak, Nagai, et al. Initiative in robot assistance during collaborative task execution. In: 2016 11th ACM/IEEE international conference on human-robot interaction, Christchurch, New Zealand, 7–10 March 2016, pp.67–74. New York: IEEE.

22. Wei, Wang, Chen, et al. Psychological contract model for knowledge collaboration in virtual community of practice: an analysis based on the game theory. Appl Math Comput 2018; 329: 175–187.

23. Wen, Wang, Liu, et al. Recursive distributed filtering for a class of state-saturated systems with fading measurements and quantization effects. IEEE T Syst Man Cy S 2018; 48(6): 930–941.

24. Cheng, Wang, Jin, et al. New on-orbit geometric interior parameters self-calibration approach based on three-view stereoscopic images from high-resolution multi-TDI-CCD optical satellites. Opt Express 2018; 26(6): 7475–7493.

25. Pineda, Easley, Karczmar. Dynamic field-of-view imaging to increase temporal resolution in the early phase of contrast media uptake in breast DCE-MRI: a feasibility study. Med Phys 2018; 45(3): 1050–1058.

26. Chen, Yang, Song, et al. Calibrate multiple consumer RGB-D cameras for low-cost and efficient 3D indoor mapping. Remote Sens 2018; 10(2): 328–356.

27. Farid, Lucenteforte, Grangetto. Evaluating virtual image quality using the side-views information fusion and depth maps. Inform Fusion 2018; 43: 47–56.

28. Wen, Wang, et al. Recursive filtering for state-saturated systems with randomly occurring nonlinearities and missing measurements. Int J Robust Nonlin 2018; 28(5): 1715–1727.

29. Ramírez-Gallego, Fernández, García, et al. Big Data: tutorial and guidelines on information and process fusion for analytics algorithms with MapReduce. Inform Fusion 2017; 42: 51–61.

30. Dan, Miguez. Nested particle filters for online parameter estimation in discrete-time state-space Markov models. Statistics 2018; 24(4): 3039–3086.

31. Duník, Straka. State estimate consistency monitoring in Gaussian filtering framework. Signal Process 2018; 148: 145–156.

32. Chakraborty, Chattaraj, Mukherjee. Performance evaluation of particle filter resampling techniques for improved estimation of misalignment and trajectory deviation. Multidim Syst Sign P 2018; 29: 821–838.

33. Hollands, Terhaar, Pavlovic. Effects of resolution, range, and image contrast on target acquisition performance. Hum Factors 2018; 60(3): 363–383.

34. Cai, Long, Shao. Adaptive RGB image recognition by visual-depth embedding. IEEE T Image Process 2018; 27(5): 2471–2483.

35. Zhang. Robust infrared small target detection using local steering kernel reconstruction. Pattern Recogn 2018; 77: 113–125.

36. Deng et al. From one to many: pose-aware metric learning for single-sample face recognition. Pattern Recogn 2018; 77: 426–437.