Determining next best view based on occlusion information in a single depth image of visual object

Abstract

Determining the camera’s next best view is a challenging problem in computer vision. This article proposes a next best view approach based on the occlusion information in a single depth image. First, occlusion detection is performed on the depth image of the visual object in the current view to obtain the occlusion boundary and the nether adjacent boundary. Second, the external surface of the occluded region is constructed and modeled from the occlusion boundary and the nether adjacent boundary. Third, the observation direction, observation center point, and area of the external surface of the occluded region are computed. Fourth, the set of candidate observation directions and the visual space of each candidate direction are determined. Finally, the next best view is obtained by solving for the next best observation direction and the camera’s observation position. The proposed approach requires no prior knowledge of the visual object and does not restrict the camera position to a specially appointed surface. Experimental results demonstrate that the approach is feasible and effective.

Keywords

Depth image, next best view, occlusion information, external surface of occluded region, visual space

Introduction

Determining the next best view is an important and difficult problem in many fields, such as robot navigation, 3D reconstruction, automatic assembly, object recognition, and spacecraft docking, and it has attracted extensive attention from scholars.¹⁻⁵

Two main types of image information are currently used to determine the next best view: intensity information⁶⁻⁸ and depth information.⁹⁻¹⁵ Methods based on intensity information are fewer than those based on depth information. Compared with a 2D intensity image, a 2.5D depth image makes it easier to obtain the 3D information of the scene; therefore, existing next best view methods usually rely on depth images. Connolly,⁹ one of the earlier scholars studying next best view, adopted an octree model to describe the visual object and marked each node to determine the next best view. However, this method does not consider occlusion, so its accuracy cannot be ensured for visual objects that exhibit occlusion. Scott¹⁰ proposed a method based on the modified measurability matrix (3M) model to solve the viewpoint planning problem; however, this method needs prior knowledge of the scene. Banta et al.¹¹ proposed a method based on voxel information and an overall observation strategy to determine the next best view. This method not only needs prior knowledge of the scene but also restricts the camera position to a fixed surface. Li and Liu¹² proposed a viewpoint planning method that calculates information entropy and regards the view with maximal information entropy as the next best view, but the camera position is again restricted to a fixed surface. Vasquez-Gomez et al.¹³ proposed a method based on an octree model and ray tracing to determine the next best view. Krainin et al.¹⁴ proposed a method based on the contour information of the visual object to determine the next best view. Zhang et al.¹⁵ proposed a method based on visual servoing to determine the next best view. However, the methods in the literature¹³⁻¹⁵ all depend on specific equipment.

To address the shortcomings of existing next best view methods, such as ignoring occlusion, needing prior knowledge, restricting the camera position, or depending on specific equipment, this article proposes a next best view method based on the occlusion information of the visual object. First, the occlusion of the visual object is detected and the occluded region is constructed. Then, the external surface of the occluded region is modeled to solve for the next best view. It should be pointed out that the proposed method differs from next best view methods for 3D reconstruction: it mainly aims at observing the occluded region of the visual object. The next best view is determined from the occlusion information in a single depth image of the visual object, so that the occluded region of the visual object can then be observed. The rest of the article is organized as follows. The “Method overview” section gives an overview of the proposed method. The “The determination of next best view” section discusses in detail how to determine the next best view based on the occlusion information of the depth image. The “Experiment and analysis” section presents the experimental results and a comparative analysis. The last section concludes the article.

Method overview

Problem description of next best view

The next best view problem can be defined as solving for the next best observation direction and camera position from which the camera can observe the maximal occluded region of the visual object. Figure 1 shows the spatial relation when an ideal visual object is observed by the camera in a certain view. The ideal visual object is composed of the triangle ABC and the quadrilateral BCDE. In the current camera view, the triangle A′BC is the part of the quadrilateral BCDE occluded by the triangle ABC. AB and AC are the occlusion boundaries, and A′B and A′C are the nether adjacent boundaries corresponding to AB and AC, respectively. The interior of the triangular pyramid ABCA′ is the occluded region, and the triangles ABA′ and ACA′ are its external surfaces. Because the maximal occluded region can be observed from the view of the orange camera, solving the next best view amounts to determining the observation direction and position of the orange camera. Because the occluded region is unknown in the current camera view, the external surface of the occluded region is used to represent it approximately.¹¹ Thus, the problem of calculating the best view for observing the unknown information in the occluded region is transformed into the problem of calculating the best view for observing the external surface of the occluded region.

Figure 1.

The sketch map of camera observing ideal visual model.

Problem analysis of next best view

Based on the problem description of next best view, we know that the observation about the unknown information in the occluded region can be transformed into the observation about the external surface of occluded region. Therefore, the view where the external surface information of occluded region can be obtained maximally is regarded as next best view. Here, the amount of information is measured by the area of external surface of occluded region. Accordingly, the problem of next best view can be defined as

x_{NBV} = \underset{x_{NV} \in X_{NV}}{\arg\max} \, f(x_{NV}), \quad \text{s.t.} \ 0 \leq f(x_{NV}) \leq S

where x_NBV is the next best view, x_NV is any candidate next view, $f(x_{NV})$ is the area of the external surface of the occluded region observed by the camera in the view x_NV, $X_{NV}$ is the set of all possible next views, and S is the total area of the external surface of the occluded region.

Under ideal conditions the set $X_{NV}$ is infinite, so it is difficult to solve for the next best view directly. A finite set $X_{NV}$ can be obtained by imposing constraints on the camera view and then searched for the next best view, but this limits the flexibility of the camera. Moreover, the search becomes more time-consuming as the number of elements in $X_{NV}$ grows.

In view of the above, this article takes the occluded region as the research object and proposes a novel next best view method based on the occlusion information in a single depth image of the visual object. The general idea of the method is as follows. First, occlusion detection is performed on the depth image of the visual object to obtain the occlusion boundary and the nether adjacent boundary. Second, the external surface of the occluded region is constructed and modeled. Third, the observation direction, observation center point, and area of the external surface of the occluded region are calculated from the occlusion information. Fourth, the set of candidate observation directions and the visual space of each candidate observation direction are determined. Finally, the next best view is obtained by comparing the visual spaces of the candidate observation directions. The overall process of the proposed method is shown in Figure 2.

Figure 2.

The overall process of next best view approach.

The determination of next best view

Determining the external surface of occluded region

In this article, the next best view is determined from the occlusion information in a depth image. Because a method for obtaining occlusion information from a depth image has been presented in related research,¹⁶ that process is not discussed here; this article focuses on determining the next best view from the occlusion information. The occlusion boundary and its corresponding nether adjacent boundary are obtained using the method of Zhang et al.¹⁶ In 3D space, the surface formed by the occlusion boundary and its corresponding nether adjacent boundary along the camera observation direction is called the external surface of the occluded region. Because several occlusion boundaries may be detected in one depth image, there may be several external surfaces of occluded regions. In what follows, a single external surface of an occluded region is taken as an example to explain how it is modeled and how its area is calculated.

Modeling the external surface of occluded region

The external surface of the occluded region is modeled as follows. First, the set of occlusion straight-line segments corresponding to the external surface is constructed. Second, each occlusion straight-line segment is divided into a point set, and the total number of points on all segments is counted. The model of the external surface is thus built by decomposing the surface into a set of occlusion straight-line segments and their point sets. Concretely, take the ith occlusion boundary point P_i in turn (i ∈ [1, n], where n is the total number of points on the occlusion boundary), and construct the occlusion straight-line segment $P_{i} P_{i}'$ of length l_i by connecting P_i with its corresponding nether adjacent boundary point $P_{i}'$; all such segments form the occlusion straight-line segment set. Suppose the average distance between two neighboring occlusion boundary points is d_o and the average distance between two neighboring nether adjacent boundary points is d_a. The average value $(d_{o} + d_{a}) / 2$ is then taken as the distance d between two neighboring points on an occlusion straight-line segment, so the number of points N_i on the segment $P_{i} P_{i}'$ is defined as

N_{i} = \left\lceil \frac{l_{i}}{d} + 0.5 \right\rceil = \left\lceil \frac{2 \cdot l_{i}}{d_{o} + d_{a}} + 0.5 \right\rceil

As known from formula (2), after modeling the external surface of occluded region, the total point number N on the external surface of occluded region is

N = \sum_{i = 1}^{n} N_{i}
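The point counts N_i and N can be illustrated with a short sketch. It is not the authors' implementation: the function name `segment_point_counts` and the tuple-based point representation are hypothetical, the neighbor spacings d_o and d_a are taken as mean distances between consecutive boundary points, and formula (2) is read as rounding up l_i / d + 0.5 with d = (d_o + d_a) / 2, which is an assumption about the intended placement of the 0.5 term.

```python
import math


def segment_point_counts(occlusion_pts, adjacent_pts):
    """Return ([N_1, ..., N_n], N) for one external surface.

    occlusion_pts[i] is P_i and adjacent_pts[i] is its nether adjacent
    point P_i'. Both are lists of (x, y, z) tuples (hypothetical layout).
    """
    n = len(occlusion_pts)
    if n != len(adjacent_pts) or n < 2:
        raise ValueError("need two boundaries of equal length >= 2")

    def mean_spacing(pts):
        # Average distance between consecutive boundary points.
        gaps = [math.dist(a, b) for a, b in zip(pts, pts[1:])]
        return sum(gaps) / len(gaps)

    # d = (d_o + d_a) / 2, the point spacing used along each segment.
    d = (mean_spacing(occlusion_pts) + mean_spacing(adjacent_pts)) / 2.0
    if d == 0:
        raise ValueError("boundary points must not all coincide")

    # Assumed reading of formula (2): N_i = ceil(l_i / d + 0.5).
    counts = [math.ceil(math.dist(p, q) / d + 0.5)
              for p, q in zip(occlusion_pts, adjacent_pts)]
    return counts, sum(counts)
```

For two parallel boundaries with unit point spacing that lie two units apart, each segment has l_i / d + 0.5 = 2.5 and therefore receives three points.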

Figure 3 shows the result of modeling one external surface of an occluded region with the method described above. In Figure 3, the green boundary is the occlusion boundary, the blue boundary is the nether adjacent boundary corresponding to the green one, and the red line segments are the occlusion straight-line segments. Likewise, the green points are occlusion boundary points, the blue points are nether adjacent boundary points, and the red points are points on the occlusion straight-line segments.

Figure 3.

The sketch map after modeling the external surface of occluded region.

Solving the observation direction and center point of occlusion straight-line segment on the external surface of occluded region

Calculating the observation direction of occlusion straight-line segment

To solve for the observation direction of an occlusion straight-line segment, it is necessary to determine the planes in which the segment lies. Take a point $P_{i} (x_{i}, y_{i}, z_{i})$ on the occlusion boundary that is not an endpoint. Suppose $P_{i}' (x_{i}', y_{i}', z_{i}')$ is the nether adjacent boundary point corresponding to $P_{i}$, the occlusion boundary point $P_{i + 1} (x_{i + 1}, y_{i + 1}, z_{i + 1})$ is the right neighboring point of $P_{i}$, and the nether adjacent boundary point $P_{i - 1}' (x_{i - 1}', y_{i - 1}', z_{i - 1}')$ is the left neighboring point of $P_{i}'$. The triangles $P_{i} P_{i + 1} P_{i}'$ and $P_{i}' P_{i - 1}' P_{i}$ formed by the points $P_{i}$, $P_{i}'$, $P_{i + 1}$, and $P_{i - 1}'$ can then be regarded as the two planes in which the occlusion straight-line segment $P_{i} P_{i}'$ lies, as shown in Figure 4.

Figure 4.

The planes for the occlusion straight-line segments.

The normal vectors of the triangles $P_{i} P_{i + 1} P_{i}'$ and $P_{i}' P_{i - 1}' P_{i}$ that point into the occluded region are determined, and their sum is regarded as the observation direction $v_{i}$ of the occlusion straight-line segment $P_{i} P_{i}'$. Because the vertex coordinates of triangle $P_{i} P_{i + 1} P_{i}'$ are known, the normal vector of its plane can be calculated by $V_{P_{i} P_{i + 1}} \times V_{P_{i} P_{i}'}$, namely

V_{P_{i} P_{i + 1}} \times V_{P_{i} P_{i}'} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ x_{i + 1} - x_{i} & y_{i + 1} - y_{i} & z_{i + 1} - z_{i} \\ x_{i}' - x_{i} & y_{i}' - y_{i} & z_{i}' - z_{i} \end{vmatrix}

Similarly, the normal vector of the plane for triangle $P_{i}' P_{i - 1}' P_{i}$ can be determined by $V_{P_{i - 1}' P_{i}'} \times V_{P_{i} P_{i}'}$ , so the observation direction v _i of occlusion straight-line segment $P_{i} P_{i}'$ is defined as

v_{i} = V_{P_{i} P_{i + 1}} \times V_{P_{i} P_{i}'} + V_{P_{i - 1}' P_{i}'} \times V_{P_{i} P_{i}'}

Because the first and the last occlusion straight-line segments each belong to only one triangle, the normal vector of that triangle pointing into the occluded region is taken directly as the observation direction of the segment.
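As an illustration of formulas (4) and (5), the following sketch evaluates the two cross products for an interior segment and adds them. The helper names and tuple representation are hypothetical. The sketch also assumes the point ordering already makes both normals point into the occluded region; the source requires inward-pointing normals but does not spell out the sign test, so none is implemented here.

```python
def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a, b):
    """Cross product a x b of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def segment_direction(p_i, p_next, q_i, q_prev):
    """Observation direction v_i of segment P_i P_i' (formula (5)).

    p_i, p_next: occlusion boundary points P_i and P_{i+1}.
    q_i, q_prev: nether adjacent boundary points P_i' and P_{i-1}'.
    Assumes (not checked) that both normals already point inward.
    """
    seg = sub(q_i, p_i)                      # V_{P_i P_i'}
    n1 = cross(sub(p_next, p_i), seg)        # V_{P_i P_{i+1}} x V_{P_i P_i'}
    n2 = cross(sub(q_i, q_prev), seg)        # V_{P_{i-1}' P_i'} x V_{P_i P_i'}
    return (n1[0] + n2[0], n1[1] + n2[1], n1[2] + n2[2])
```

For the first or last segment, only one of the two cross products is available, and that single normal would be used as the direction.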

Calculating the observation center point of occlusion straight-line segment

The midpoint of each occlusion straight-line segment can be regarded as the observation center point of that segment. Suppose $(x_{i}, y_{i}, z_{i})$ and $(x_{i}', y_{i}', z_{i}')$ are the occlusion boundary point and the nether adjacent boundary point of the ith occlusion straight-line segment $P_{i} P_{i}'$, respectively; the corresponding observation center point $c_{i} (x_{c_{i}}, y_{c_{i}}, z_{c_{i}})$ is then defined as

(x_{c_{i}}, y_{c_{i}}, z_{c_{i}}) = \frac{(x_{i} + x_{i}', y_{i} + y_{i}', z_{i} + z_{i}')}{2}

Solving the area of external surface of occluded region

In order to determine next best view, it is necessary to solve the area of each external surface of occluded region. In this article, the area of external surface of occluded region is solved based on the mass and surface density information of external surface of occluded region. The methods of calculating the mass M and surface density ρ of external surface of occluded region are given below.

Calculating the mass of external surface of occluded region

The mass of external surface of occluded region can be determined by the number and mass of all the points on the occlusion straight-line segments which belong to this external surface of occluded region. According to “Modeling the external surface of occluded region” section, we can know that the total point number of external surface of occluded region is N after modeling. Suppose m is the mass of each point on the external surface of occluded region, then the mass M of external surface of occluded region can be defined as

M = m \cdot N = m \cdot \sum_{i = 1}^{n} N_{i} = m \cdot \sum_{i = 1}^{n} \left\lceil \frac{2 \cdot l_{i}}{d_{o} + d_{a}} + 0.5 \right\rceil

where n is the number of points on the occlusion boundary, namely, n is the number of occlusion straight-line segments on the external surface of occluded region, N_i is the number of points on the ith occlusion straight-line segment, l_i is the length of the ith occlusion straight-line segment, d_o is the average distance between any two neighboring occlusion boundary points, and d_a is the average distance between any two neighboring nether adjacent boundary points.

Calculating the surface density of external surface of occluded region

Because the external surface of the occluded region is determined by the occlusion boundary and the nether adjacent boundary, the average linear density of the occlusion boundary and the nether adjacent boundary can be taken as the surface density ρ of the external surface of the occluded region. Suppose $(x_{i}, y_{i}, z_{i})$ and $(x_{i + 1}, y_{i + 1}, z_{i + 1})$ are two neighboring points on the occlusion boundary, and $(x_{i}', y_{i}', z_{i}')$ and $(x_{i + 1}', y_{i + 1}', z_{i + 1}')$ are the two corresponding points on the nether adjacent boundary; the surface density ρ is then defined as

ρ = \frac{2 \cdot n \cdot m}{\sum_{i = 1}^{n - 1} \sqrt{{(x_{i} - x_{i + 1})}^{2} + {(y_{i} - y_{i + 1})}^{2} + {(z_{i} - z_{i + 1})}^{2}} + \sum_{i = 1}^{n - 1} \sqrt{{(x_{i}' - x_{i + 1}')}^{2} + {(y_{i}' - y_{i + 1}')}^{2} + {(z_{i}' - z_{i + 1}')}^{2}}}

where n is the number of occlusion boundary points on the external surface of occluded region, and m is the mass of each point on the occlusion boundary or the nether adjacent boundary, namely, m is the mass of each point on the external surface of occluded region.

Calculating the area of external surface of occluded region

Suppose M and ρ are the mass and surface density of one external surface of occluded region, then the area s of this external surface of occluded region is defined as

s = \frac{M}{ρ}

Substituting formulas (7) and (8) into formula (9) gives formula (10):

s = \frac{\sum_{i = 1}^{n} \left\lceil \frac{2 \cdot l_{i}}{d_{o} + d_{a}} + 0.5 \right\rceil}{2 n} \cdot \sum_{i = 1}^{n - 1} \left( \sqrt{{(x_{i} - x_{i + 1})}^{2} + {(y_{i} - y_{i + 1})}^{2} + {(z_{i} - z_{i + 1})}^{2}} + \sqrt{{(x_{i}' - x_{i + 1}')}^{2} + {(y_{i}' - y_{i + 1}')}^{2} + {(z_{i}' - z_{i + 1}')}^{2}} \right)
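Because the point mass m cancels in s = M / ρ, the area reduces to N / (2n) multiplied by the summed lengths of the occlusion boundary and the nether adjacent boundary. The sketch below computes exactly that from the boundary points and a list of per-segment point counts N_i supplied by the caller. The function names and data layout are hypothetical and are not taken from the authors' program.

```python
import math


def polyline_length(pts):
    """Sum of distances between consecutive points of a boundary."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def external_surface_area(occlusion_pts, adjacent_pts, counts):
    """Area measure s = M / rho with the point mass m cancelled.

    counts[i] is N_i, the number of points on the i-th occlusion
    straight-line segment (computed elsewhere from formula (2)).
    """
    n = len(occlusion_pts)
    if n < 2 or len(adjacent_pts) != n or len(counts) != n:
        raise ValueError("inputs must share the same length n >= 2")
    total_points = sum(counts)                       # N in formula (3)
    boundary_sum = (polyline_length(occlusion_pts)
                    + polyline_length(adjacent_pts))
    return total_points / (2 * n) * boundary_sum
```

Since N is a dimensionless count, the value appears best read as a relative score for comparing surfaces within one depth image rather than as a metric area.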

Solving next best view

The “Problem description of next best view” section shows that the next best view consists of the next best observation direction and the next best observation position. In order to determine next best view, it is necessary to determine the set of candidate observation directions and the visual space of each candidate observation direction first.

Determining the sets of candidate observation directions and center points

The observation directions and the observation center points of all occlusion straight-line segments on each external surface of occluded region can be determined according to the methods mentioned in “Calculating the observation direction of occlusion straight-line segment” and “Calculating the observation center point of occlusion straight-line segment” sections. After that, the observation direction and the observation center point of each external surface of occluded region can be determined using the observation directions and the observation center points of all occlusion straight-line segments. Suppose V _t and C_t are the observation direction and the observation center point of the tth external surface of occluded region, v _i and c_i are the observation direction and the observation center point of the ith occlusion straight-line segment on the tth external surface of occluded region, so the V _t and C_t are defined as

\begin{cases} V_{t} = \sum_{i = 1}^{n} v_{i} \\ C_{t} = \frac{1}{n} \sum_{i = 1}^{n} c_{i} \end{cases}

where n is the number of occlusion boundary points on the external surface of occluded region, namely, the number of occlusion straight-line segments on this external surface of occluded region.
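A minimal sketch of formulas (6) and (11) follows: segment midpoints are averaged into C_t and segment directions are summed into V_t. The function name and the list-of-tuples inputs are hypothetical, and the segment directions are assumed to have been computed beforehand.

```python
def surface_direction_and_center(segment_dirs, occlusion_pts, adjacent_pts):
    """Return (V_t, C_t) for one external surface of an occluded region.

    segment_dirs[i] is v_i; occlusion_pts[i] and adjacent_pts[i] are the
    endpoints P_i and P_i' of the i-th occlusion straight-line segment.
    """
    n = len(segment_dirs)
    if n == 0 or len(occlusion_pts) != n or len(adjacent_pts) != n:
        raise ValueError("inputs must be non-empty and of equal length")

    # V_t: plain sum of the segment observation directions.
    v_t = tuple(sum(v[k] for v in segment_dirs) for k in range(3))

    # c_i: midpoint of each segment (formula (6)).
    centers = [tuple((p[k] + q[k]) / 2 for k in range(3))
               for p, q in zip(occlusion_pts, adjacent_pts)]

    # C_t: mean of the segment center points.
    c_t = tuple(sum(c[k] for c in centers) / n for k in range(3))
    return v_t, c_t
```

Note that V_t is left unnormalized, as in formula (11), so longer boundaries contribute larger direction vectors.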

The set of candidate observation directions V_candidate and the set of candidate observation center points C_candidate are obtained by taking each V_t and C_t as a candidate observation direction and a candidate observation center point, respectively. To obtain more candidate observation directions, the sum of any two observation directions corresponding to two external surfaces of occluded regions is also taken as a candidate observation direction, and the midpoint of the two corresponding center points is taken as the observation center point of the new candidate direction. Suppose V_i and V_j are any two candidate observation directions in set V_candidate, C_i and C_j are the observation center points corresponding to V_i and V_j, and V_new and C_new are the newly obtained candidate observation direction and observation center point; then V_new and C_new are defined as

\begin{cases} V_{new} = V_{i} + V_{j} \\ C_{new} = \frac{C_{i} + C_{j}}{2} \end{cases}

All newly obtained V_new and C_new are added to set V_candidate and set C_candidate, respectively. This completes the candidate observation direction set and the candidate observation center point set.
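The construction of the two candidate sets can be sketched as follows. The function name `build_candidates` is hypothetical, and the sketch assumes that pairs are formed only among the original per-surface directions (the first kind), so that combined directions are not combined again.

```python
from itertools import combinations


def build_candidates(surface_dirs, surface_centers):
    """Return (V_candidate, C_candidate) as parallel lists.

    The first len(surface_dirs) entries are the per-surface directions
    V_t and centers C_t; the remaining entries are pairwise sums of
    directions (formula (12)) with midpoints of the matching centers.
    """
    if len(surface_dirs) != len(surface_centers):
        raise ValueError("directions and centers must pair up")
    cand_dirs = list(surface_dirs)
    cand_centers = list(surface_centers)
    for i, j in combinations(range(len(surface_dirs)), 2):
        vi, vj = surface_dirs[i], surface_dirs[j]
        ci, cj = surface_centers[i], surface_centers[j]
        cand_dirs.append(tuple(a + b for a, b in zip(vi, vj)))
        cand_centers.append(tuple((a + b) / 2 for a, b in zip(ci, cj)))
    return cand_dirs, cand_centers
```

With t external surfaces this yields t + t(t - 1)/2 candidates, so the set stays small and finite, in contrast to the infinite view set of formula (1).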

Determining the visual spaces of all candidate observation directions

In this article, the visual space of each candidate observation direction in set V_candidate is determined from the areas of the external surfaces of the occluded regions. Set V_candidate contains two kinds of candidate observation directions. The first kind corresponds directly to an external surface of an occluded region, and the second kind is obtained by combining two directions of the first kind. The visual space is solved differently for the two kinds. In addition, we assume that the whole visual object is in the field of view of the camera and that the camera observation distance is constant and sufficiently large while the visual space of each candidate observation direction is determined. The visual spaces are solved as follows:

(1) Determining the visual spaces of the first kind of candidate observation directions

First, a candidate observation direction V_i of the first kind is selected from set V_candidate. Suppose V_j is any first-kind candidate observation direction in set V_candidate and α_ij is the included angle between V_i and V_j; the value of α_ij then determines whether the external surface of the occluded region corresponding to V_j is visible from V_i. Specifically, that surface is visible from V_i when $α_{i j} \leq \frac{π}{2}$ and invisible from V_i when $α_{i j} > \frac{π}{2}$. The visual space S_i corresponding to the candidate observation direction V_i is the total area of all external surfaces of occluded regions that are visible from V_i, each weighted by cos α_ij. Therefore, the visual space S_i is defined as

S_{i} = \sum_{j = 1}^{n_{v}} s_{j} \cdot \cos α_{i j}, α_{i j} \leq \frac{π}{2}

where n_v is the number of first-kind candidate observation directions in set V_candidate, s_j is the area of the external surface of the occluded region corresponding to V_j, and α_ij is the included angle between V_i and V_j.

(2) Determining the visual spaces of the second kind of candidate observation directions

Similarly, a candidate observation direction V_k of the second kind is selected from set V_candidate. Suppose V_i is the ith first-kind candidate observation direction in set V_candidate, β_ik is the included angle between V_i and V_k, S_k is the visual space corresponding to the candidate observation direction V_k, s_i is the area of the external surface of the occluded region corresponding to V_i, and n_v is the number of first-kind candidate observation directions in set V_candidate; the visual space S_k can then be defined as

S_{k} = \sum_{i = 1}^{n_{v}} s_{i} \cdot \cos β_{i k}, β_{i k} \leq \frac{π}{2}

Thus, the visual spaces of all candidate observation directions in set V_candidate can be determined according to the formulas (13) and (14).
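Formulas (13) and (14) share the same form: the area of every external surface whose direction makes an angle of at most π/2 with the candidate is weighted by the cosine of that angle and summed. The sketch below uses one function for both kinds of candidate. Its name and arguments are hypothetical, and the angle test is expressed as a non-negative cosine.

```python
import math


def visual_space(candidate_dir, surface_dirs, surface_areas):
    """Visual space of one candidate direction (formulas (13)/(14)).

    surface_dirs[j] is the first-kind direction V_j of surface j and
    surface_areas[j] is its area s_j.
    """
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    c_norm = norm(candidate_dir)
    if c_norm == 0:
        raise ValueError("candidate direction must be non-zero")

    total = 0.0
    for v, s in zip(surface_dirs, surface_areas):
        v_norm = norm(v)
        if v_norm == 0:
            raise ValueError("surface direction must be non-zero")
        cos_angle = sum(a * b for a, b in zip(candidate_dir, v)) / (c_norm * v_norm)
        # Angle <= pi/2 is equivalent to cos >= 0: the surface is visible.
        if cos_angle >= 0:
            total += s * cos_angle
    return total
```

For a first-kind candidate, its own surface contributes with cos 0 = 1, so the visual space is never smaller than that surface's own area.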

Determining next best view

We take the candidate observation direction with the maximal visual space as the next best observation direction V_NBV. Because the elements of the candidate observation direction set V_candidate and of the observation center point set C_candidate are in one-to-one correspondence, the next best observation center point C_view follows from the next best observation direction V_NBV, and the camera observation position P_camera can then be determined. Suppose $(x_{N B V}, y_{N B V}, z_{N B V})$ are the coordinates of the next best observation direction V_NBV, $(x_{view}, y_{view}, z_{view})$ are the coordinates of the next best observation center point C_view, d_camera is the camera observation distance, and $(x_{P_{camera}}, y_{P_{camera}}, z_{P_{camera}})$ are the coordinates of the camera observation position P_camera. Because the vector from the camera observation position P_camera to the observation center point C_view has the same direction as the next best observation direction $V_{N B V}$, the camera observation position P_camera is defined as

(x_{P_{camera}}, y_{P_{camera}}, z_{P_{camera}}) = (x_{view}, y_{view}, z_{view}) - \frac{d_{camera}}{‖ V_{N B V} ‖} (x_{N B V}, y_{N B V}, z_{N B V})

The result $(V_{N B V}, P_{camera})$ consisting of the next best observation direction and the observation position is the final next best view.
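The final selection step and formula (15) can be sketched together. The function name and argument layout are hypothetical: the three lists are assumed to be aligned index by index, matching the one-to-one correspondence between V_candidate and C_candidate described above.

```python
import math


def next_best_view(cand_dirs, cand_centers, visual_spaces, d_camera):
    """Return (V_NBV, P_camera) for aligned candidate lists.

    The candidate with the largest visual space is chosen, and the
    camera is placed d_camera behind its center point along V_NBV.
    """
    if not (len(cand_dirs) == len(cand_centers) == len(visual_spaces)):
        raise ValueError("candidate lists must be aligned")
    best = max(range(len(cand_dirs)), key=lambda k: visual_spaces[k])
    v_nbv, c_view = cand_dirs[best], cand_centers[best]

    length = math.sqrt(sum(x * x for x in v_nbv))
    if length == 0:
        raise ValueError("next best direction must be non-zero")

    # Formula (15): P_camera = C_view - d_camera * V_NBV / ||V_NBV||.
    p_camera = tuple(c - d_camera * v / length
                     for c, v in zip(c_view, v_nbv))
    return v_nbv, p_camera
```

Placing the camera at C_view minus the scaled direction makes the vector from P_camera to C_view parallel to V_NBV, which is the condition stated before formula (15).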

Description of algorithm

Algorithm: Next best view based on occlusion information of depth image

Input: Depth image and camera internal and external parameters

Output: The next best view

Step 1: Detecting the occlusion information in depth image

Step 2: Constructing the external surface of occluded region according to occlusion information

Step 3: Modeling the external surface of occluded region

Step 4: Calculating the area of each external surface of occluded region according to formula (10)

Step 5: Calculating the observation direction and center point of each external surface of occluded region according to formula (11)

Step 6: Determining the sets of candidate observation directions and observation center points

Step 7: Calculating the visual space of each candidate observation direction according to formulas (13) and (14)

Step 8: Determining the next best observation direction and the observation position to achieve next best view

Experiment and analysis

Experimental environment and dataset

An experiment was conducted to validate the feasibility and effectiveness of the proposed method. The hardware environment is a Pentium(R) Dual-Core 2.94 GHz CPU with 2.0 GB of memory. The next best view program is implemented in C++. The 3D models of the visual objects are from the Stuttgart Range Image Database. During the experiment, OpenGL is used to simulate the camera observing the 3D model and acquiring the depth image. The parameters of the projection matrix in OpenGL are $(60, 1, 200, 600)$, the window size is 400 × 400, and the camera observation distance is 300 mm.

Experimental results and analysis

Experimental results and analysis of proposed method

Figure 5 shows the experimental results for visual objects of different complexities. In Figure 5, the first row gives the names of the visual objects (from left to right: Duck, Mole, Rocker, Bunny, and Dragon). The second row is the depth image of the visual object acquired by the camera in the current view. The third row shows the occlusion boundary points (green pixels) and the nether adjacent boundary points (blue pixels) in the depth image. The fourth row shows the observation directions of the occlusion straight-line segments on the external surface of the occluded region. The fifth row shows the occlusion straight-line segments visible to the camera in the next best view. The sixth row is the depth image of the visual object acquired by the camera in the next best view calculated by the proposed method.

Figure 5.

The experimental results of next best view.

As can be seen from Figure 5, for a visual object that exhibits occlusion in the current view, the proposed method calculates reasonable observation directions for the occlusion straight-line segments from the occlusion information and then obtains a next best view in which the camera can observe the maximal occluded region of the visual object. For the visual object Duck, occlusion is not pronounced, so the number of visible occlusion straight-line segments on the external surface of the occluded region in the next best view is small. For the visual objects Mole, Rocker, Bunny, and Dragon, occlusion is pronounced, so the number of visible occlusion straight-line segments in the next best view is larger; that is, the red region in the fifth row of Figure 5 is larger. Therefore, the more pronounced the occlusion of the visual object, the more effective the proposed method, which agrees with the idea of solving the next best view from occlusion information. Comparing the depth images in the second row of Figure 5 with those in the sixth row shows that the next best view obtained by the proposed method accords with the observing habits of human vision.

Because no ground truth for the next best view is currently available, it is very difficult to evaluate our method by comparing its results against ground truth. To further verify the rationality of the proposed method, we therefore analyze the experimental results in Figure 5 quantitatively. The results are shown in Table 1, where N_NBV is the number of surface points of the visual object obtained by the camera in the next best view, N_OVERLAP is the number of overlap points, N_NEW is the number of newly added points in the next best view with $N_{NEW} = N_{N B V} - N_{OVERLAP}$, R_OVERLAP is the overlap rate with $R_{OVERLAP} = N_{OVERLAP} / N_{N B V}$, and R_NEW is the newly added rate with $R_{NEW} = N_{NEW} / N_{N B V}$. When the next best view is evaluated, the number of newly added points is considered first and the newly added rate second.

Table 1.

The quantitative evaluation of experimental results.

Visual object	N_NBV	N_OVERLAP	N_NEW	R_OVERLAP (%)	R_NEW (%)
Duck	17850	3219	14631	18.03	81.97
Mole	19521	1000	18521	5.12	94.88
Rocker	9754	570	9184	5.84	94.16
Bunny	23582	1046	22536	4.44	95.56
Dragon	9588	395	9193	4.12	95.88

As can be seen from Table 1, the next best view yields many newly added points and a high newly added rate, which indicates that the camera can indeed observe a large previously unobserved region of the visual object. Figure 5 and Table 1 together show that, for the visual object Duck with fewer visible occlusion straight-line segments, the overlap rate R_OVERLAP is somewhat higher. The main reason is that the external surface of the occluded region of Duck is relatively small, so the change between the next best view and the current view is also small, and more already-observed points are observed again in the next best view. For the visual objects Mole, Rocker, Bunny, and Dragon, which have relatively more visible occlusion straight-line segments, the overlap rates R_OVERLAP are lower and the newly added rates R_NEW are higher. This agrees with the fact that these visual objects have more occluded regions and larger external surfaces of occluded regions.
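The derived columns of Table 1 follow directly from N_NBV and N_OVERLAP. The short sketch below recomputes them using the definitions given with Table 1; the function name `view_metrics` is hypothetical and the rates are returned as percentages.

```python
def view_metrics(n_nbv, n_overlap):
    """Return (N_NEW, R_OVERLAP in %, R_NEW in %) for one visual object."""
    if n_nbv <= 0 or not 0 <= n_overlap <= n_nbv:
        raise ValueError("need 0 <= n_overlap <= n_nbv and n_nbv > 0")
    n_new = n_nbv - n_overlap                 # N_NEW = N_NBV - N_OVERLAP
    r_overlap = 100.0 * n_overlap / n_nbv     # R_OVERLAP = N_OVERLAP / N_NBV
    r_new = 100.0 * n_new / n_nbv             # R_NEW = N_NEW / N_NBV
    return n_new, r_overlap, r_new


# Duck row of Table 1: N_NBV = 17850, N_OVERLAP = 3219.
n_new, r_overlap, r_new = view_metrics(17850, 3219)
print(n_new, round(r_overlap, 2), round(r_new, 2))  # 14631 18.03 81.97
```

By construction the two rates sum to 100%, so either one determines the other.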

Experimental result comparison and analysis of different methods

To better evaluate the proposed method, we compare it with the methods in the studies by Banta et al.¹¹ and Li and Liu,¹² which are also based on depth images and consider occlusion information. Figure 6 shows the depth images of the visual objects in the next best views obtained by the different methods. In Figure 6, the first row gives the name of each visual object, the second row shows its depth image in the current view, the third row shows its depth image in the next best view obtained by the method in the study by Banta et al.,¹¹ the fourth row shows the corresponding image for the method in the study by Li and Liu,¹² and the fifth row shows the corresponding image for the method proposed in this article.

Figure 6.

The depth image of visual object in next best view for different methods.

Combining Figure 6 with the principles of the different methods shows that the next best view of the method in the study by Banta et al.¹¹ focuses on observing the back of the visual object relative to the current view. The next best view obtained by the method in the study by Li and Liu¹² focuses on observing the unknown region adjoining the point of largest information gain in the current view. The next best view of the proposed method is derived from the occluded region generated by the shape of the visual object and mainly focuses on observing the unknown information of that occluded region, which matches the original intention of determining the next best view by making full use of occlusion information.

To further evaluate the performance of the different methods, Table 2 gives a quantitative comparison of their results.

Table 2.

The quantitative evaluation of experimental results in next best view for different methods.

Visual object	Method in the study by Banta et al.¹¹					Method in the study by Li and Liu¹²					Proposed method
Visual object	N_NBV	N_OVERLAP	N_NEW	R_OVERLAP (%)	R_NEW (%)	N_NBV	N_OVERLAP	N_NEW	R_OVERLAP (%)	R_NEW (%)	N_NBV	N_OVERLAP	N_NEW	R_OVERLAP (%)	R_NEW (%)
Duck	21580	25	21555	0.12	99.88	21838	9769	12069	44.73	55.27	17850	3219	14631	18.03	81.97
Mole	17612	0	17612	0	100	14767	6391	8376	43.28	56.72	19521	1000	18521	5.12	94.88
Rocker	4523	0	4523	0	100	4383	1157	3226	26.40	73.60	9754	570	9184	5.84	94.16
Bunny	18839	1	18838	0.01	99.99	17601	5053	12548	28.71	71.29	23582	1046	22536	4.44	95.56
Dragon	8017	12	8005	0.15	99.85	9557	585	8972	6.12	93.88	9588	395	9193	4.12	95.88

Compared with the method in the study by Banta et al.¹¹ in Table 2, the proposed method obtains clearly more newly added points for the visual objects Mole, Rocker, Bunny, and Dragon (although its newly added rate is slightly lower), because for these objects the external surface of the occluded region is larger and the back region is smaller. For the visual object Duck, the back region is relatively large in the current view, and because the proposed method focuses on observing the occluded region, it obtains fewer newly added points than the method in the study by Banta et al.¹¹ Overall, the method in the study by Banta et al.¹¹ focuses on observing the back of the visual object and does not consider the occluded region generated by the shape of the visual object, whereas the proposed method makes full use of occlusion information to determine the next best view, which is more beneficial for resolving the occluded region.

Compared with the method in the study by Li and Liu,¹² for the visual objects Mole, Rocker, Bunny, and Dragon, where the occlusion is obvious and the external surface of the occluded region is larger, both the number of newly added points and the newly added rate of the proposed method are higher. For the visual object Duck, whose occlusion is less obvious, the newly added rate of the proposed method is lower than for the visual objects Mole, Rocker, Bunny, and Dragon, but the proposed method still clearly outperforms the method in the study by Li and Liu.¹² In general, because the method in the study by Li and Liu¹² focuses on observing the unknown region adjoining the point of largest information gain in the current view, it has a higher overlap rate and fewer newly added points.

The proposed method uses the occlusion information of the visual object in the current view to solve the candidate observation directions of the external surface of the occluded region and the visual space of each candidate observation direction, and then determines the next best view. Therefore, the proposed method performs better at observing the occluded region of the visual object.
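The comparison above can be checked directly against the tabulated counts. The following sketch, with variable and function names chosen here purely for illustration, ranks the three methods per visual object by N_NEW, the primary criterion stated alongside Table 1. The numbers are copied from Table 2.

```python
# N_NEW values copied from Table 2, in the order
# (Banta et al., Li and Liu, proposed method).
METHODS = ("Banta et al.", "Li and Liu", "Proposed")
N_NEW = {
    "Duck": (21555, 12069, 14631),
    "Mole": (17612, 8376, 18521),
    "Rocker": (4523, 3226, 9184),
    "Bunny": (18838, 12548, 22536),
    "Dragon": (8005, 8972, 9193),
}


def rank_methods(counts):
    """Return the method names sorted by N_NEW, largest first."""
    order = sorted(range(len(METHODS)), key=lambda i: counts[i], reverse=True)
    return [METHODS[i] for i in order]


rankings = {obj: rank_methods(counts) for obj, counts in N_NEW.items()}
for obj, ranking in rankings.items():
    print(f"{obj:7s} " + " > ".join(ranking))
```

Under this ordering the proposed method ranks first for Mole, Rocker, Bunny, and Dragon and second for Duck, which agrees with the discussion of Table 2.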

To compare the computational cost of the different methods, Table 3 lists their time consumption. Each value for a visual object is the average over 10 experiments.

Table 3.

The time consumption of different methods.

Visual object	Time consumption (s)
Visual object	Method in the study by Banta et al.¹¹	Method in the study by Li and Liu¹²	Proposed method
Duck	19.741	30.072	2.232
Mole	19.614	28.098	2.323
Rocker	19.549	4.056	2.485
Bunny	19.672	9.335	2.565
Dragon	19.561	4.847	2.772
Average time consumption (s)	19.632	13.594	2.475

From Table 3, we can see that the average time consumption of the methods in the studies by Banta et al.¹¹ and Li and Liu¹² is 19.632 and 13.594 s, respectively, whereas the average time consumption of the proposed method is 2.475 s. The proposed method is therefore faster on average than both of the other methods, which indicates that its computational cost is low and that it is better suited to real-time use.
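The averaging protocol behind Table 3 (10 runs per visual object) can be sketched as follows. This is only an illustration under stated assumptions: the article does not say how time was measured, so the use of `time.perf_counter`, the helper `average_runtime`, and the stand-in workload `placeholder_planner` are all hypothetical.

```python
import time
from statistics import mean


def average_runtime(fn, *args, repeats=10):
    """Run fn(*args) `repeats` times and return the mean wall-clock time in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def placeholder_planner(n):
    """Hypothetical stand-in for a next-best-view computation."""
    return sum(i * i for i in range(n))


avg_seconds = average_runtime(placeholder_planner, 10_000, repeats=10)
print(f"average over 10 runs: {avg_seconds:.6f} s")
```

Averaging over repeated runs reduces the influence of one-off delays such as cache warm-up or background load on the reported figure.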

Conclusions

A next best view approach based on the occlusion information in a single depth image is proposed. This work makes three contributions: (1) the occlusion information of the visual object is taken as the focus for solving the next best view of the camera, which requires neither prior knowledge of the visual object nor a camera position restricted to a fixed surface, so the approach is suitable for different types of visual objects; (2) a modeling scheme is proposed that divides the external surface of the occluded region into sets of occlusion straight-line segments and points, which provides a feasible way to model the unknown occluded region from observed information; and (3) a method is presented for solving the area of the external surface of the occluded region from its mass and surface density, from which the visual space of each candidate observation direction and the next best view are obtained step by step. This method offers a physics-based way of approaching the next best view problem.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is supported by the National Natural Science Foundation of China under grant no. 61379065 and the Natural Science Foundation of Hebei province under grant no. F2014203119.

References

1. Wong, Dumont, Abidi. Next best view system in a 3-D object modeling task. In: Proceedings of the IEEE international symposium on computational intelligence in robotics and automation, Monterey, California, USA, 8–9 November 1999, pp. 306–311. Piscataway, NJ: IEEE.

2. Bottino, Laurentini. What’s NEXT? An interactive next best view approach. Pattern Recognit 2006; 39(1): 126–132.

3. Pintilie, Stuerzlinger. An evaluation of interactive and automated next best view methods in 3D scanning. Comput Aided Des Appl 2013; 10(2): 279–291.

4. Potthast, Sukhatme. A probabilistic framework for next best view estimation in a cluttered environment. J Vis Commun Image Represent 2014; 25(1): 148–164.

5. Dutagaci, Cheung, Godil. A benchmark for best view selection of 3D objects. In: Proceedings of the ACM workshop on 3D object retrieval, Firenze, Italy, 25 October 2010, pp. 45–50. New York, NY: ACM.

6. Trummer, Munkelt, Denzler. Online next-best-view planning for accuracy optimization using an extended E-criterion. In: Proceedings of the international conference on pattern recognition, Istanbul, Turkey, 23–26 August 2010, pp. 1642–1645. Piscataway, NJ: IEEE.

7. Kriegel, Bodenmüller, Suppa. A surface-based next-best-view approach for automated 3D model completion of unknown objects. In: Proceedings of the IEEE international conference on robotics and automation, Shanghai, China, 9–13 May 2011, pp. 4869–4874. Piscataway, NJ: IEEE.

8. Haner, Heyden. Covariance propagation and next best view planning for 3D reconstruction. In: Proceedings of the European conference on computer vision, Florence, Italy, 7–13 October 2012, pp. 545–556. Berlin, Heidelberg: Springer Verlag.

9. Connolly. The determination of next best views. In: Proceedings of the IEEE international conference on robotics and automation, Missouri, USA, 25–28 March 1985, pp. 432–435. Piscataway, NJ: IEEE.

10. Scott. Model-based view planning. Mach Vis Appl 2009; 20(1): 47–69.

11. Banta, Wong, Dumont. Next-best-view system for autonomous 3-D object reconstruction. IEEE Trans Syst Man Cybern Bs Cybern 2000; 30(5): 589–598.

12. Li, Liu. Information entropy-based viewpoint planning for 3-D object reconstruction. IEEE Trans Robot 2005; 21(3): 324–337.

13. Vasquez-Gomez, Sucar, Murrieta-Cid. Hierarchical ray tracing for fast volumetric next-best-view planning. In: Proceedings of the international conference on computer and robot vision, Regina, SK, Canada, 29–31 May 2013, pp. 181–187. Washington, DC: IEEE.

14. Krainin, Curless, Fox. Autonomous generation of complete 3D object models using next best view manipulation planning. In: Proceedings of the IEEE international conference on robotics and automation, Shanghai, China, 9–13 May 2011, pp. 5031–5037. Piscataway, NJ: IEEE.

15. Zhang, Zuo, Yao. A robot visual servo-based approach to the determination of next best views. In: Proceedings of the IEEE international conference on mechatronics and automation, Beijing, China, 2–5 August 2015, pp. 2654–2659. Piscataway, NJ: IEEE.

16. Zhang, Liu, Kong. Using random forest for occlusion detection based on depth image. Acta Optica Sinica 2014; 34(9): 0915003: 1–12.