Sage Journals: Discover world-class research

Abstract

Feature matching is one of the most important steps in the location technology of zooming images. Based on the scale-invariant feature transform (SIFT) matching algorithm, this article proposes and compares several improved algorithms for eliminating false matches. First, the features of zooming images and the ranging models are introduced in detail within the theoretical framework of SIFT feature detection and matching, and the key role of feature matching and false matches elimination in zooming-image ranging is discussed. Second, false matches are eliminated with higher accuracy by a proposed approach based on the geometry constraint of zooming images. Third, false matches are removed by an elimination algorithm based on the properties of SIFT features. Finally, an iterative false matches elimination algorithm based on the distance from the epipole to the epipolar lines is proposed; this algorithm also solves the real-time calibration of the shrink-amplify center of zooming images. Experimental results demonstrate that the three proposed algorithms are stable, that combining them effectively eliminates false matches of feature points, and that the remaining matching points can be applied to robot visual servoing.

Keywords

Zooming image, scale-invariant feature transform, geometry constraint, false matches elimination, robot visual servoing

Introduction

One key step toward image understanding is image depth estimation, a fundamental problem in computer vision research with important applications in robotics, scene understanding, and three-dimensional (3D) reconstruction.^1,2 Zooming images provide a depth cue for monocular vision and are widely used in visual monitoring, visual tracking, robot environment sensing, and map building.^3–5 According to the literature, depth estimation with a zoom lens was first proposed by Ma and Olsen,⁶ and zooming images can in theory provide depth information. Lavest and colleagues^7–10 proposed precise depth estimation using a thick-lens model of the zoom lens, based on a precise study of its optical properties, but their experiments were conducted in structured scenes. Asada and colleagues^11–13 proposed a model with three parameters (zoom, focus, and aperture) based on the actual structure of the zoom lens; their experimental results show that the model is suitable only for high-accuracy, low-distortion lenses. Fayman et al.¹⁴ proposed active zoom tracking for depth estimation in visual tracking, widening the applications of zooming-image depth estimation. Gao et al.¹⁵ investigated mobile robot visual servoing for object tracking and obstacle avoidance based on zooming images: robust SIFT-based feature matching was achieved with the geometry constraint of zooming images, 3D reconstruction of real scenes from zooming images was established, and robot experiments validated the practicability of the related algorithms. However, these studies mostly focus on reconstruction models of zooming images using a few special points, and they lack automatic sparse and dense matching, especially false matches elimination algorithms.
Sparse matching focuses on automatically matching feature points between images taken at two different focal lengths, whereas dense matching focuses on pixel-by-pixel matching between the two images. These two kinds of matching are the basis of sparse reconstruction and dense reconstruction, respectively. For sparse matching, a crucial aspect of image registration is the choice of image features.

Classical feature detectors for this problem include the Harris corner detector,¹⁶ the SUSAN corner detector,¹⁷ and the SIFT detector.¹⁸ SIFT is one of the most popular and effective of these because it is insensitive to illumination, rotation, and scaling. Since two images of the same scene taken at different focal lengths are related by rotation, translation, and scaling, SIFT is an ideal choice for detecting and matching their features. However, some false matches remain after initial SIFT matching, and they greatly affect the accuracy of subsequent 3D reconstruction or 3D ranging. For binocular vision systems, false matches have been removed with the random sample consensus (RANSAC) method,^19,20 by parallax filtering in 3D space,^21,22 and with association rules.²³ These methods, however, are limited to binocular stereo vision systems, which differ considerably from zooming ranging systems. This article therefore investigates approaches to eliminating false matches in zooming images. First, feature detection and matching algorithms based on the SIFT operator are studied. Second, three improved false matches elimination algorithms based on different constraints are proposed. Finally, experimental results are discussed to demonstrate the performance of the proposed methods.
The rest of the article is organized as follows. The theory of zooming-image depth estimation is described in section “Depth estimation principle of zooming images.” The SIFT matching algorithm for zooming images is described in section “Feature detection and matching based on SIFT operator.” Three improved false matches elimination algorithms are proposed in section “False matches elimination.” Experimental results and an analysis of the different false matches elimination algorithms in the same environment are given in section “Experiment results and analysis.” Finally, the article is concluded in section “Conclusion.”

Depth estimation principle of zooming images

Depth estimation principle of the pinhole model

In the pinhole model, zooming is equivalent to moving the camera optical center along the optical axis (by $Δ f$ in Figure 1). It follows that, in the pinhole model, a change in object distance (the distance between the camera optical center and the object) is equivalent to a change in focal length (a translation of the optical center). In this model, the depth of an object can be obtained from images taken at two or more different focal lengths. As shown in Figure 1, the depth of the object is calculated as

\text{Depth} = \frac{Δf \, r_{1} f_{2}}{f_{1} r_{2} - f_{2} r_{1}}

(1)

where $f_{1}$ and $f_{2}$ are the two focal lengths, and $r_{1}$ and $r_{2}$ are the radial distances of a pair of matching points (measured from the zoom center) in the two zoom images.
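As a quick numerical check of equation (1), the sketch below evaluates the formula on a synthetic scene point. The function name `zoom_depth` and all numbers are hypothetical. It assumes that zooming from $f_{1}$ to $f_{2}$ moves the optical center by $Δf = f_{2} - f_{1}$ toward the scene; under that convention the formula returns the point's distance from the second optical center. Passing $Δt$ instead of $Δf$ evaluates equation (2), which has the same form.

```python
def zoom_depth(baseline, f1, f2, r1, r2):
    """Depth from two zoom images, equation (1); pass Δt as `baseline` for equation (2).

    baseline: shift of the optical center (Δf) between the two focal settings
    f1, f2:   the two focal lengths
    r1, r2:   radial distances of the matched point in the two images
    """
    denom = f1 * r2 - f2 * r1
    if denom == 0:
        raise ValueError("degenerate match: f1 * r2 == f2 * r1")
    return baseline * r1 * f2 / denom


# Hypothetical scene point: lateral offset R = 100 and distance Z = 1000 from the
# first optical center; zooming from f1 = 18 to f2 = 55 moves the optical center
# df = f2 - f1 = 37 toward the point (pinhole assumption).
R, Z, f1, f2 = 100.0, 1000.0, 18.0, 55.0
df = f2 - f1
r1 = f1 * R / Z
r2 = f2 * R / (Z - df)
print(zoom_depth(df, f1, f2, r1, r2))  # approximately 963.0, i.e. Z - df
```

Note that the denominator $f_{1} r_{2} - f_{2} r_{1}$ is a small difference of two similar products, so in practice small errors in $r_{1}$ and $r_{2}$ (for example from false matches) are strongly amplified in the depth estimate.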

Figure 1.

Pinhole model for two distinct focal lengths.

Depth estimation principle of the thick-lens model

The thick-lens model is considered an ideal model of the zoom lens.⁷ As shown in Figure 2(a), planes $H_{oxy}$ and $H_{ixy}$ are called the principal planes; both are perpendicular to the optical axis, and light rays travel parallel to the axis between $H_{oxy}$ and $H_{ixy}$. $H_{o}$ and $H_{i}$ are the intersections of the principal planes $H_{oxy}$ and $H_{ixy}$, respectively, with the optical axis. The object distance $p_{o}$ is the distance from the object to the principal plane $H_{oxy}$, and the image distance $p_{i}$ is the distance from the image plane to the principal plane $H_{ixy}$. When zooming, the two principal planes move along the axis, so the distance between them changes. If $H_{o}$ and $H_{i}$ coincide, the thick-lens model reduces to the pinhole model, as shown in Figure 2(b), where C is the projection center of the pinhole model, $p_{o} = T_{z}$ (the distance from the object to the projection center), and $p_{i} = f$ (the distance from the image plane to the projection center).

Figure 2.

Thick-lens model for zoom lenses.

Compared with the pinhole model, the thick-lens model is a more accurate model for zoom depth estimation, because the actual change in focal length during zooming is not equal to the change in object distance.⁷ In the thick-lens model, the translation of the principal plane $H_{oxy}$ is determined by the position of the incident light, and the focal length is determined by the translation of the principal plane $H_{ixy}$. Taking $H_{o}$ as a fixed reference point and translating the principal plane $H_{ixy}$ until it coincides with $H_{oxy}$, the change $Δ t$ in object distance becomes the key variable for zoom ranging (Figure 2(c)). Using images taken at two or more focal lengths, the depth in the thick-lens model is calculated as

\text{Depth} = \frac{Δt \, r_{1} f_{2}}{f_{1} r_{2} - f_{2} r_{1}}

(2)

Clearly, in either model, once the camera model has been calibrated, obtaining matching points in the zoom images is a crucial step for zoom depth estimation.

Feature detection and matching based on SIFT operator

SIFT feature detection

The SIFT algorithm was first proposed by D.G. Lowe in 1999 and further optimized in Lindeberg.¹⁸ Ke and Sukthankar²⁴ described PCA-SIFT, which replaces the gradient histogram with principal component analysis (PCA), together with further improvements. The method is robust to image changes caused by scaling, rotation, partial occlusion, viewpoint change, and distortion, which makes it well suited to image-sequence processing. Mikolajczyk and Schmid²⁵ experimentally compared the performance of 10 representative feature descriptors (such as moment invariants, cross-correlation, and SIFT). Their results show that the SIFT descriptor still yields accurate and stable feature points under changes in illumination, image scaling, rotation, and affine transformation.

Scale-space extrema detection

Scale-space extrema detection searches over all image locations and scales. It has been shown that the Gaussian kernel is the only linear kernel that can generate a scale-space representation of an image. The scale space of an image is defined as a function L(x, y, σ) produced by convolving a variable-scale Gaussian G(x, y, σ) with the input image I(x, y)

L (x, y, σ) = G (x, y, σ) \otimes I (x, y)

(3)

with

G (x, y, σ) = \frac{1}{2 π σ^{2}} e^{- (x^{2} + y^{2}) / (2 σ^{2})}

(4)

where (x, y) are the pixel coordinates, I(x, y) is the input image, and σ denotes the scale-space factor.

To detect stable feature points effectively in scale space, the original image is convolved with a series of Gaussian kernels of successively increasing scale, generating a set of scale-space images, that is, a multi-scale representation of the image. This is equivalent to adding a scale coordinate to the image data.

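The difference-of-Gaussian (DoG) operator, which approximates the scale-normalized Laplacian of Gaussian (LoG) operator, is defined as the difference between Gaussian-smoothed images at two scales separated by a constant factor k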

\begin{array}{l} D (x, y, σ) = (G (x, y, k σ) - G (x, y, σ)) \otimes I (x, y) \\ = L (x, y, k σ) - L (x, y, σ) \end{array}

(5)
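The sketch below builds a small Gaussian scale space and its DoG levels following equations (3) to (5). It is plain Python on nested lists for readability, not an efficient implementation, and the function names are hypothetical. The base scale σ0 = 1.6 follows Lowe's paper, while k = √2, the 3σ kernel truncation, clamped borders, and a single octave without downsampling are simplifications chosen here.

```python
import math


def gaussian_kernel(sigma):
    """Sampled 1D Gaussian, truncated at 3*sigma and normalized to sum to 1.
    The 2D Gaussian of equation (4) is separable: G(x, y) = g(x) * g(y)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(img, sigma):
    """L = G(sigma) * I, equation (3), as two 1D passes with clamped borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    # Horizontal pass, then vertical pass on the intermediate result.
    tmp = [[sum(k[j] * img[y][min(max(x + j - r, 0), w - 1)] for j in range(len(k)))
            for x in range(w)] for y in range(h)]
    return [[sum(k[j] * tmp[min(max(y + j - r, 0), h - 1)][x] for j in range(len(k)))
             for x in range(w)] for y in range(h)]


def dog_stack(img, sigma0=1.6, k=math.sqrt(2), levels=4):
    """Blur at sigma0 * k**i, then D = L(k*sigma) - L(sigma), equation (5)."""
    L = [blur(img, sigma0 * k ** i) for i in range(levels)]
    return [[[a - b for a, b in zip(row_hi, row_lo)]
             for row_hi, row_lo in zip(L[i + 1], L[i])]
            for i in range(levels - 1)]
```

Because the kernel is normalized, a uniform image yields DoG levels that are zero up to rounding, which is a convenient sanity check: the DoG responds only to structure, not to constant brightness.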

In general, the smaller the scale of the DoG, the less the image is smoothed. To find the extrema of scale space, each pixel is compared with the pixels around it to determine whether it is an extremum in both its image domain and its scale domain. Each candidate pixel is compared with its 8 neighbors at the same scale and the 9 × 2 pixels in the two adjacent scales (26 neighbors in total), to ensure that it is an extremum in both scale space and image space.
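The 26-neighbor comparison can be sketched as follows. The sketch assumes `dog` is a list of DoG levels indexed as `dog[s][y][x]` and that (s, y, x) is an interior position, that is, not on the first or last DoG level and not on the image border; the function name is hypothetical.

```python
def is_scale_space_extremum(dog, s, y, x):
    """True if dog[s][y][x] is strictly larger, or strictly smaller, than all
    26 neighbors: 8 in its own DoG level and 9 in each adjacent level."""
    v = dog[s][y][x]
    neighbors = [dog[s + ds][y + dy][x + dx]
                 for ds in (-1, 0, 1)
                 for dy in (-1, 0, 1)
                 for dx in (-1, 0, 1)
                 if (ds, dy, dx) != (0, 0, 0)]
    # A tie with any neighbor disqualifies the point in this strict version.
    return all(v > n for n in neighbors) or all(v < n for n in neighbors)
```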

Edge response of extreme points

An extremum of the DoG operator that lies on an edge has a small principal curvature along the edge but a large principal curvature across it. Such extrema are poorly localized, so the extreme points must be located more accurately and edge responses rejected. The principal curvatures can be computed from the $2 \times 2$ Hessian matrix H

H = \begin{bmatrix} D_{xx} & D_{xy} \\ D_{xy} & D_{yy} \end{bmatrix}

(6)

It is clear from formula (6) that matrix H is symmetric. If H is positive definite, a minimum can be found by setting the derivative to zero; if H is negative definite, a maximum can be found in the same way; otherwise, no extremum exists. The required derivatives can be estimated from differences between neighboring pixels.

The principal curvatures of D are proportional to the eigenvalues of H, so

Tr (H) = D_{xx} + D_{yy} = α + β

(7)

Det (H) = D_{xx} D_{yy} - {(D_{xy})}^{2} = α β

(8)

where $α$ is maximum eigenvalue and $β$ is minimum eigenvalue.

If $α = r β$ , then

\frac{Tr {(H)}^{2}}{Det (H)} = \frac{{(α + β)}^{2}}{α β} = \frac{{(r β + β)}^{2}}{r β^{2}} = \frac{{(r + 1)}^{2}}{r}

(9)

where r is the ratio of the larger eigenvalue to the smaller one. The quantity $(r + 1)^{2} / r$ increases with r (for r ≥ 1) and reaches its minimum when $α = β$, that is, when r = 1.

Therefore, checking whether the ratio of the principal curvatures of D is below the threshold r is equivalent to checking whether

\frac{Tr {(H)}^{2}}{Det (H)} < \frac{{(r + 1)}^{2}}{r}

(10)

holds; points with Det(H) ≤ 0, whose curvatures have opposite signs, are discarded. Generally, $r = 10$.
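A minimal sketch of the edge-response test of equation (10), estimating the Hessian entries of equation (6) by finite differences on a single DoG level indexed as `D[y][x]`. Rejecting points with a non-positive determinant follows Lowe's formulation; the function name is hypothetical.

```python
def passes_edge_test(D, y, x, r=10.0):
    """Keep a DoG extremum only if Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    dxx = D[y][x + 1] + D[y][x - 1] - 2 * D[y][x]
    dyy = D[y + 1][x] + D[y - 1][x] - 2 * D[y][x]
    dxy = (D[y + 1][x + 1] - D[y + 1][x - 1] - D[y - 1][x + 1] + D[y - 1][x - 1]) / 4
    tr = dxx + dyy               # α + β, equation (7)
    det = dxx * dyy - dxy * dxy  # αβ, equation (8)
    if det <= 0:                 # curvatures of opposite sign (or zero): reject
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```

For an isotropic blob both curvatures are equal and the ratio reaches its minimum of 4, whereas a ridge has one curvature near zero, which drives the ratio far above the threshold of 12.1 for r = 10.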

Orientation assignment of key points

Each key point in the SIFT algorithm is assigned a principal orientation, which ensures rotation invariance. The gradient magnitude and orientation are computed at each pixel in the region around the key point as follows

m (x, y) = \sqrt{{(L (x + 1, y) - L (x - 1, y))}^{2} + {(L (x, y + 1) - L (x, y - 1))}^{2}}

(11)

θ (x, y) = \operatorname{atan2} (L (x, y + 1) - L (x, y - 1), L (x + 1, y) - L (x - 1, y))

(12)

where m(x, y) is the gradient magnitude, θ(x, y) is the orientation, L is the Gaussian-smoothed image at the scale of the key point, and x and y are the abscissa and ordinate of the pixel.
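The sketch below evaluates equations (11) and (12) and picks a principal orientation as the peak of a magnitude-weighted orientation histogram. The 36-bin histogram follows Lowe's formulation; the fixed square window, the absence of Gaussian weighting, and the lack of peak interpolation are simplifications made here, and the function names are hypothetical.

```python
import math


def gradient(L, y, x):
    """Gradient magnitude m and orientation θ (radians), equations (11)-(12).
    L is a smoothed image indexed as L[y][x]."""
    dx = L[y][x + 1] - L[y][x - 1]
    dy = L[y + 1][x] - L[y - 1][x]
    return math.hypot(dx, dy), math.atan2(dy, dx)


def principal_orientation(L, y0, x0, radius=3, n_bins=36):
    """Peak of a magnitude-weighted orientation histogram in a square window.
    Returns the lower edge (degrees) of the strongest bin."""
    width = 360.0 / n_bins
    hist = [0.0] * n_bins
    for y in range(y0 - radius, y0 + radius + 1):
        for x in range(x0 - radius, x0 + radius + 1):
            m, theta = gradient(L, y, x)
            deg = math.degrees(theta) % 360.0
            hist[int(deg // width) % n_bins] += m
    best = max(range(n_bins), key=lambda i: hist[i])
    return best * width
```

Using `atan2` rather than a plain arctangent of the ratio keeps the full 0° to 360° range and avoids division by zero when the horizontal difference vanishes.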

At this point, the key points of the image have been detected. Each key point has three parameters: position, scale, and orientation.

Characteristic vector descriptor creation

Each key point is taken as the center of an 8 × 8 neighborhood window, as shown in Figure 3(a), where the central position is the current key point. In the standard SIFT layout, the window is divided into 4 × 4 subregions, each contributing an 8-bin orientation histogram, so each key point yields a 4 × 4 × 8 = 128-dimensional feature vector. In Figure 3(a), each cell represents a pixel, the arrow direction represents the gradient orientation of the pixel, and the arrow length represents the gradient magnitude. The circle represents the range of the Gaussian weighting. The feature vector descriptor is generated by accumulating gradient orientation histograms, as shown in Figure 3(b).
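To make the 4 × 4 × 8 layout concrete, the following simplified sketch accumulates a 128-dimensional descriptor from a 16 × 16 patch of gradient magnitudes and orientations. It assumes the patch has already been rotated to the key point's principal orientation and omits the trilinear interpolation of full SIFT. The Gaussian window with σ equal to half the patch width and the normalize, clip at 0.2, renormalize step follow Lowe's paper; the function name is hypothetical.

```python
import math


def sift_like_descriptor(mag, ori, n_cells=4, n_bins=8, clip=0.2):
    """128-D descriptor from a square patch of gradient magnitudes `mag` and
    orientations `ori` (radians). Each cell contributes an n_bins histogram."""
    size = len(mag)                 # 16 for the standard layout
    cell = size // n_cells          # 4 x 4 pixels per cell
    sigma = size / 2.0              # Gaussian weighting window
    c = (size - 1) / 2.0            # patch center
    vec = [0.0] * (n_cells * n_cells * n_bins)
    for y in range(size):
        for x in range(size):
            w = math.exp(-((x - c) ** 2 + (y - c) ** 2) / (2 * sigma ** 2))
            b = int((ori[y][x] % (2 * math.pi)) / (2 * math.pi) * n_bins) % n_bins
            idx = ((y // cell) * n_cells + (x // cell)) * n_bins + b
            vec[idx] += w * mag[y][x]

    def normalize(v):
        n = math.sqrt(sum(t * t for t in v)) or 1.0
        return [t / n for t in v]

    vec = normalize(vec)
    # Clipping large entries reduces the influence of non-linear illumination changes.
    return normalize([min(t, clip) for t in vec])
```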

Figure 3.

Characteristic vectors are generated by image gradients: (a) image gradients and (b) key point characteristic vector.

Feature matching based on minimum distance

Once the SIFT feature vectors are generated, the Euclidean distance between the feature vectors of key points in the two images is used as the similarity criterion for feature matching. For each key point in the first image, an exhaustive search finds the two key points in the second image with the smallest Euclidean distances. If the ratio of the closest distance to the second-closest distance is below a threshold, the key point and its nearest neighbor are accepted as a pair of matching points. Lowering the threshold reduces the number of SIFT matches but makes matching more stable (in this article, the threshold is 0.6). However, most feature points of the reference image that lie outside the overlapping area of the two images cannot find matching points, and sensor motion, scene occlusion, and similar structures may produce false matches. It is therefore necessary to investigate effective false matches elimination algorithms.
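The nearest-neighbor ratio test described above can be sketched as a brute-force search. The name `ratio_test_matches` is hypothetical, descriptors are plain sequences of floats, `math.dist` requires Python 3.8 or later, and the default threshold is the 0.6 used in this article.

```python
import math


def ratio_test_matches(desc1, desc2, ratio=0.6):
    """Brute-force nearest-neighbor matching with the distance-ratio test.
    Returns (i, j) pairs meaning desc1[i] matches desc2[j]."""
    matches = []
    if len(desc2) < 2:
        return matches  # the ratio needs a second-nearest neighbor
    for i, d in enumerate(desc1):
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc2))
        (d_first, j_first), (d_second, _) = dists[0], dists[1]
        if d_second > 0 and d_first / d_second < ratio:
            matches.append((i, j_first))
    return matches
```

A descriptor that is almost equally close to two candidates (ratio near 1) is rejected as ambiguous, which is exactly why lowering the threshold trades the number of matches for stability.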

False matches elimination

False matches elimination based on zooming image geometry constraint

Some false matches are inevitable when the SIFT algorithm is used for image matching. Whether an application or improvement of the SIFT algorithm succeeds depends largely on the proportion of correct matches, so determining which matching results are correct is the essential problem.

Research on the SIFT algorithm generally uses synthetic images as samples, because the correspondence between an original image and its noise-added version is known in advance, which simplifies the statistics of matching results and favors the development of the study. For images taken at different focal lengths, however, the correspondence between matching points varies unpredictably with image content. The characteristics of the images must therefore be analyzed to find appropriate criteria for evaluating matching results.

Figure 4 shows ideal and practical matching points in zooming images: P1 and P2 are a pair of ideal matching points, and P1′ and P2′ are a pair of practical matching points.

Figure 4.

The ideal matching points and practical matching points in zooming images.

Depth estimation from zooming images rests on a basic assumption: in the ideal case, matching points have the same radial slope (matching points P1 and P2 in Figure 4). Having the same radial slope is thus a necessary condition for a correct match, and this condition can be used to remove false matches from the matching results.

In practice, because of imaging distortion, even correct matches may not have exactly the same radial slope (matching points P1′ and P2′ in Figure 4). A reasonable tolerance is therefore needed to select the ideal matching points and eliminate false matches.

Depth estimation from zooming images usually captures images at two fixed focal lengths, so the zoom center of the captured images can be determined in advance by calibration at these two focal lengths. On this basis, the following experiment is designed to measure the level of the radial angles of matching points. In the structured scene shown in Figure 5, matching points can be found from corner points.

Figure 5.

Structured scene for the radial-angle detection experiment of matching points.

Assume that there are N pairs of matching points in the zooming images. Because each pair of matching points determines a line, N line equations $a_{i} x + b_{i} y = c_{i}$ can be obtained. Let $(Z_{x}, Z_{y})$ be the zoom center, which ideally lies on every line; the resulting matrix equation is

\begin{bmatrix} a_{1} & b_{1} \\ a_{2} & b_{2} \\ \vdots & \vdots \\ a_{N} & b_{N} \end{bmatrix} \begin{bmatrix} Z_{x} \\ Z_{y} \end{bmatrix} = \begin{bmatrix} c_{1} \\ c_{2} \\ \vdots \\ c_{N} \end{bmatrix}

(13)

The matrix equation can be simplified as

A θ = b

(14)

Estimating the zoom center $θ = (Z_{x}, Z_{y})^{T}$ is therefore a least-squares parameter estimation problem. When N > 2 and A has full column rank (that is, not all lines are parallel), the zoom center is uniquely determined from equation (13) by

\hat{θ}_{LS} = {(A^{T} A)}^{-1} A^{T} b

(15)

After the zoom center is identified, the radial angle can be calculated from the two lines joining it to the matching points. Let the coordinates of a pair of matching points be $(x_{i}, y_{i})$ and $(x'_{i}, y'_{i})$; the pair of vectors with the zoom center as their common vertex is $\vec{a}_{i} = (x_{i} - Z_{x}, y_{i} - Z_{y})$ and $\vec{b}_{i} = (x'_{i} - Z_{x}, y'_{i} - Z_{y})$. The radial angle of the i-th pair of matching points is then

α_{i} = \arccos \frac{\vec{a}_{i} \cdot \vec{b}_{i}}{| \vec{a}_{i} | | \vec{b}_{i} |}

(16)

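Using equation (16), the set of radial angles of the matching points is $A = \{α_{i} \mid α_{i} = f (p_{i}), i \in [1, n]\}$, where f denotes the computation in equation (16). Given an acceptable radial-angle interval $[a, b]$, the set of ideal matching points is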

P' = \{p_{i} \in P \mid α_{i} = f (p_{i}), α_{i} \in [a, b]\}

(17)

In summary, false matches elimination based on zooming image geometry constraint consists of the following steps:

Step 1: Use the traditional SIFT method to obtain the set P of matching points in the zooming images;

Step 2: Estimate the image zoom center from P using equation (15);

Step 3: Calculate the radial angle set A of the matching points by equation (16);

Step 4: Obtain the set of ideal matching points $P'$ by equation (17).
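The four steps above can be sketched in plain Python as follows, taking the SIFT matches of Step 1 as input. The line form $a_{i} x + b_{i} y = c_{i}$ and the 2 × 2 normal equations implement equation (15). The function names, the default interval [0°, 2°] (chosen here only to echo the radial-angle statistics reported in the experiments), and returning NaN for a point that coincides with the zoom center are assumptions of this sketch.

```python
import math


def line_through(p, q):
    """Coefficients (a, b, c) of the line a*x + b*y = c through points p and q."""
    a = q[1] - p[1]
    b = p[0] - q[0]
    return a, b, a * p[0] + b * p[1]


def zoom_center(pairs):
    """Least-squares zoom center of equation (15): solve (A^T A) θ = A^T b."""
    saa = sab = sbb = sac = sbc = 0.0
    for p, q in pairs:
        a, b, c = line_through(p, q)
        saa += a * a
        sab += a * b
        sbb += b * b
        sac += a * c
        sbc += b * c
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; zoom center undetermined")
    return (sac * sbb - sab * sbc) / det, (saa * sbc - sab * sac) / det


def radial_angle(p, q, center):
    """Radial angle (degrees) between p - Z and q - Z, equation (16)."""
    ax, ay = p[0] - center[0], p[1] - center[1]
    bx, by = q[0] - center[0], q[1] - center[1]
    na, nb = math.hypot(ax, ay), math.hypot(bx, by)
    if na == 0 or nb == 0:
        return float("nan")  # direction undefined at the zoom center
    cos_a = max(-1.0, min(1.0, (ax * bx + ay * by) / (na * nb)))
    return math.degrees(math.acos(cos_a))


def geometry_filter(pairs, center, a=0.0, b=2.0):
    """Indices of pairs whose radial angle lies in [a, b], equation (17).
    NaN angles fail both comparisons and are therefore rejected."""
    return [i for i, (p, q) in enumerate(pairs)
            if a <= radial_angle(p, q, center) <= b]
```

Because false matches also enter the least-squares fit of Step 2, the estimated center is biased when many outliers are present; a zoom center obtained beforehand by calibration at the two focal lengths, as described above, avoids this.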

False matches elimination based on SIFT feature property

The algorithm introduced above obtains more ideal matching points by using the geometry constraint of zooming images. However, it cannot fully guarantee correct matching results, because the geometry constraint is only a necessary condition for a correct match. The experimental results also show that more obvious false matches appear as the tolerance of the geometry constraint increases. In this section, most correct matching points are obtained directly by analyzing the SIFT feature attributes of the matches that satisfy the geometry constraint of zooming images.

A SIFT key point generally consists of its scale, principal orientation, and coordinates, of which the scale and principal orientation are the more important here. Once matching points have been obtained with the geometry constraint of zooming images, the influence of these SIFT feature attributes on the matching points can be examined.

Because SIFT matching relies mainly on local gradient features, key points of any scale and principal orientation can form a matching pair; scale and principal orientation do not directly affect the matching result. In zooming images, however, there is no rotation or translation between the two images, so the principal orientations of matching points should be consistent. The scale represents the degree of blurring of the original image, and in the SIFT algorithm only descriptors with similar levels of blur are likely to become matching points. For the same scene, one zooming image is usually relatively sharp and the other relatively blurred, with a roughly fixed difference in blur. Hence, for two matched features to have a similar degree of blur, their smoothing scales must differ by a roughly fixed ratio. This is the scale ratio of the matching points; because focal lengths of 18 mm and 55 mm are used here, the focal length ratio, and hence the expected scale ratio, is close to 3.

In zooming images, therefore, matching points have the same principal orientation, and their scale ratio is close to a constant, which is generally the ratio of the two imaging focal lengths. Comparing scale and orientation during key point matching can thus improve both the accuracy and the efficiency of matching.

A simple method to eliminate matches with abnormal scale or orientation is to use the statistical properties of the scales and principal orientations of all matching points. Assume that N matching points are obtained by the SIFT algorithm, the scale ratio of the i-th matching point is $s (i)$, and its principal orientation ratio is $o (i)$, $i \in [1, N]$. The mean scale ratio and mean principal orientation ratio of the matching points are

μ_{s} = \frac{1}{N} \sum_{i = 1}^{N} s (i)

(18)

μ_{o} = \frac{1}{N} \sum_{i = 1}^{N} o (i)

(19)

The standard deviations are

σ_{s} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(s (i) - μ_{s})}^{2}}

(20)

σ_{o} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(o (i) - μ_{o})}^{2}}

(21)

Assume that the scale ratio and the principal orientation ratio of the matching points each follow a normal distribution; matches with abnormal scale or orientation can then be eliminated with a confidence interval. Let the confidence intervals of the scale ratio and the principal orientation ratio be $[μ_{s} - k_{s} σ_{s}, μ_{s} + k_{s} σ_{s}]$ and $[μ_{o} - k_{o} σ_{o}, μ_{o} + k_{o} σ_{o}]$, where the standard deviation factors $k_{s}$ and $k_{o}$ control the widths of the intervals. Because depth estimation from zooming images places strict requirements on the orientation of matching points, $k_{o}$ takes a small value, here $k_{o} = 0.5$. The scale ratio factor $k_{s}$ can be larger; here $k_{s} = 1$.
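The confidence-interval screening can be sketched as below, with the population statistics of equations (18) to (21) and the factors $k_{s} = 1$ and $k_{o} = 0.5$ from the text. The inputs are the per-match ratios $s(i)$ and $o(i)$; how these ratios are computed from the key points is left to the caller, and the function names are hypothetical.

```python
import math


def confidence_interval(values, k):
    """[μ - kσ, μ + kσ] using the population mean and standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    return mu - k * sigma, mu + k * sigma


def sift_property_filter(scale_ratios, orient_ratios, k_s=1.0, k_o=0.5):
    """Indices of matches whose scale ratio s(i) and orientation ratio o(i)
    both fall inside their confidence intervals."""
    s_lo, s_hi = confidence_interval(scale_ratios, k_s)
    o_lo, o_hi = confidence_interval(orient_ratios, k_o)
    return [i for i, (s, o) in enumerate(zip(scale_ratios, orient_ratios))
            if s_lo <= s <= s_hi and o_lo <= o <= o_hi]
```

Since an outlier inflates both the mean and the standard deviation it is tested against, a small factor such as $k_{o} = 0.5$ keeps the interval tight enough to reject it, at the cost of also rejecting inliers that deviate moderately from the mean.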

Iterative false matches elimination based on distance from epipole to epipolar line

A zooming image pair is in fact a special kind of translated image pair. In the ideal case, the lines connecting matching points intersect at a common epipole, as shown in Figure 6. The epipole can therefore be fitted from the epipolar lines of the matching points, and the distance from the epipole to each epipolar line can then be used to eliminate abnormal epipolar lines and, with them, false matches.

Figure 6.

Epipole of zooming image.

The epipole is fitted by the least-squares method. To improve its accuracy, the SIFT feature attributes can be used to remove most false matches before the epipole is fitted. The distance from the epipole to each line is calculated by equation (22)

d_{i} = \frac{| A_{i} x_{0} + B_{i} y_{0} + C_{i} |}{\sqrt{A_{i}^{2} + B_{i}^{2}}}

(22)

where $A_{i} x + B_{i} y + C_{i} = 0$ is the equation of the i-th epipolar line and $(x_{0}, y_{0})$ is the epipole fitted by the least-squares method.

The distribution of distances from the epipole to the epipolar lines is shown in Figure 7(a), and the perpendiculars from the epipole to the epipolar lines are shown in Figure 7(b), where the length of each perpendicular is the Euclidean distance from the epipole to that epipolar line. The figure shows that a few epipolar lines lie far from the epipole and should be eliminated. Let the i-th distance be $d (i)$, $i \in [1, N]$. The mean distance from the epipole to the epipolar lines is

μ_{d} = \frac{1}{N} \sum_{i = 1}^{N} d (i)

(23)

Figure 7.

Statistical results of distances from the epipole to the epipolar lines: (a) distribution of distances from the epipole to the epipolar lines and (b) position of the epipole.

The standard deviation is

σ_{d} = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {(d (i) - μ_{d})}^{2}}

(24)

Assume that the distance from the epipole to the epipolar lines follows a normal distribution. Abnormal matching points can then be eliminated with a confidence interval. The confidence interval of the epipolar line distance of the matching points is $[μ_{d} - k_{d} σ_{d}, μ_{d} + k_{d} σ_{d}]$, where $k_{d}$ is the standard deviation factor of the epipolar line distance. Because most false matches have already been removed using the SIFT feature attributes, $k_{d}$ can take a small value; here, $k_{d} = 0.5$.

Let $d_{\max}$ be the upper limit of the distance from the epipole to the epipolar lines. All matching points whose distance is larger than $d_{\max}$ are eliminated, and the above steps are repeated until all newly computed distances from the epipole to the epipolar lines are smaller than $d_{\max}$. The termination condition is

d_{i}^{m} < d_{\max}, i \in [1, N^{m}]

(25)

where $d_{i}^{m}$ is the distance from the epipole to the i-th epipolar line in the m-th iteration and $N^{m}$ is the number of matching points remaining in the m-th iteration.

In summary, the steps of false matches elimination algorithm based on distance from epipole to epipolar line are as follows:

Step 1: Use the traditional SIFT algorithm to obtain the set P of matching points in the zoom images;

Step 2: Apply the false matches elimination algorithm based on the zooming image geometry constraint to P;

Step 3: Fit the epipole by the least-squares method, calculate the distances from the epipole to the epipolar lines and the confidence interval by equations (23) and (24), and eliminate false matches with the standard deviation factor $k_{d} = 0.5$;

Step 4: Recalculate $d_{i}$ from the remaining matching points. If equation (25) is satisfied, terminate the iteration; otherwise, return to Step 3 and continue until all new distances from the epipole to the epipolar lines are smaller than $d_{\max}$. Output the final matching results.
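A minimal sketch of the iterative procedure (Steps 3 and 4), assuming the input pairs have already passed Step 2. The epipole is fitted by least squares on lines normalized so that $A_{i}^{2} + B_{i}^{2} = 1$, which makes each residual equal to the distance of equation (22). Because distances are non-negative and small values indicate consistent lines, this sketch applies only the upper bound $μ_{d} + k_{d} σ_{d}$ of the confidence interval; that one-sided choice, the values of `d_max` and `max_iter`, and the function names are assumptions, not part of the method as stated.

```python
import math


def normalized_line(p, q):
    """Line A*x + B*y + C = 0 through p and q, scaled so that A^2 + B^2 = 1."""
    A, B = q[1] - p[1], p[0] - q[0]
    C = q[0] * p[1] - p[0] * q[1]
    n = math.hypot(A, B)
    return A / n, B / n, C / n


def fit_epipole(lines):
    """Least-squares point minimizing the sum of squared distances to the lines."""
    saa = sum(A * A for A, _, _ in lines)
    sab = sum(A * B for A, B, _ in lines)
    sbb = sum(B * B for _, B, _ in lines)
    ra = -sum(A * C for A, _, C in lines)
    rb = -sum(B * C for _, B, C in lines)
    det = saa * sbb - sab * sab  # zero if all lines are parallel
    return (ra * sbb - sab * rb) / det, (saa * rb - sab * ra) / det


def iterative_epipole_filter(pairs, k_d=0.5, d_max=1.0, max_iter=20):
    """Return (indices of kept pairs, fitted epipole)."""
    keep = list(range(len(pairs)))
    for _ in range(max_iter):
        lines = [normalized_line(*pairs[i]) for i in keep]
        x0, y0 = fit_epipole(lines)
        dists = [abs(A * x0 + B * y0 + C) for A, B, C in lines]  # eq. (22)
        if max(dists) < d_max:  # termination test, eq. (25)
            return keep, (x0, y0)
        mu = sum(dists) / len(dists)  # eq. (23)
        sigma = math.sqrt(sum((d - mu) ** 2 for d in dists) / len(dists))  # eq. (24)
        upper = mu + k_d * sigma  # one-sided bound (assumption, see lead-in)
        new_keep = [i for i, d in zip(keep, dists) if d <= upper]
        if len(new_keep) < 2 or len(new_keep) == len(keep):
            return keep, (x0, y0)  # nothing more can be removed
        keep = new_keep
    return keep, fit_epipole([normalized_line(*pairs[i]) for i in keep])
```

The fitted epipole returned alongside the surviving matches is also an estimate of the zoom (shrink-amplify) center, which is how this procedure doubles as a real-time calibration of that center.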

Experiment results and analysis

Results of initial SIFT matching

The experimental results (Table 1) show that some obvious false matches remain even after the RANSAC algorithm is applied. Further false matches elimination is therefore necessary to obtain ideal matching points.

Table 1.

The statistics of matching results.

Scene	Resolution (pixels)	Feature points (left image)	Feature points (right image)	Initial matches	Matches after RANSAC
Flower	1176 × 784	1638	791	208	176
Flower with checkerboard	1176 × 784	876	1432	213	193

Results of three proposed false matches elimination methods

False matches elimination based on zooming image geometry constraint

Before actual images are matched, the level and distribution of the radial angles of matching points can be measured in a simple structured scene, providing a reference for removing false matches. Let the radial angle interval of matching points be $[a, b]$, the set of matching points obtained by the traditional SIFT method for the zooming images be $P = \{p_{1}, p_{2}, \dots, p_{n}\}$, and $p_{i} = \{(x_{i}, y_{i}), (x'_{i}, y'_{i})\}$. When false matches are removed, a is generally set to 0 and b to the maximum acceptable radial angle.

The experimental results show that the radial angles of matching points in actual zoom images vary but are mostly small: about 69.6% are below 1° and about 93.1% are below 2°. Matching points with larger radial angles (only 6.9% exceed 2°) are concentrated near the zoom center, where the radial vectors are short and a small localization error therefore produces a large angle. Based on these results, a suitable tolerance is chosen to screen ideal matching points from the SIFT matching results.

The SIFT matching results in Figure 8 are further processed with the false matches elimination algorithm based on the geometry constraint. First, the number of matching points whose radial angle is below a given value is counted, as shown in Figures 9 and 10. The angle ranges from 0° to 180°, which covers all matching points. The figures show that the number of remaining matching points increases gradually as the angle level increases.

Figure 8.

Matching results of traditional method: (a) a pair of original various focus images, (b) initial SIFT matching results, and (c) matching results after RANSAC algorithm.

Figure 9.

Results of radial angle of the matching point detection experiments: (a) results of the corner detection, (b) results of the angle of the matching points, (c) distribution of the angle of the matching points, and (d) distribution of large angle (angle is greater than 2°).

Figure 10.

Angle distribution of the matching points.

As Figure 11(a)–(c) shows, the number of remaining matching points increases as the tolerance level increases. The improved SIFT method effectively eliminates false matches of feature points, achieves accurate matching of the feature points, and yields ideal matching points for depth estimation of the zooming image.

Figure 11.

The effect of false matches elimination: (a) the upper limit of the angle: 7°, (b) the upper limit of the angle: 15°, and (c) the upper limit of the angle: 30°.

Figure 11 shows the effect of false matches elimination. For the ‘Flower’ scene, there are no obvious false matches when the angle limit is small, whereas obvious false matches appear when it is large. Since depth estimation from zooming images cannot tolerate large radial angles, ideal matching points are obtained with a small angle limit. The analysis of ‘Flower with checkerboard’ is the same as for ‘Flower’, as the data in Figure 11(a)–(c) show.

These results confirm that the algorithm based on the geometry constraint eliminates false matches and yields ideal matching points for depth estimation of zooming images. Like the RANSAC algorithm, it can also adjust the number of matches through its threshold.

False matches elimination based on SIFT feature property

Taking ‘Flower’ as an example, the scale distribution and the main direction distribution of the ideal match points are analyzed in Figures 12 and 13. Figure 12(a) shows the curve of the scale of the match points in the left and right images, and Figure 12(b) shows the curve of the scale ratio of corresponding match points. Figure 13(a) shows the curve of the main direction of the match points in the left and right images, and Figure 13(b) shows the curve of the ratio of the main directions of corresponding match points.

Figure 12.

Scale distribution of ideal match points: (a) curve of the scale of match points and (b) curve of the scale ratio of match points.

Figure 13.

Main direction distribution of ideal match points: (a) curve of the main direction of match points and (b) curve of the ratio of the main directions of match points.

As the scale distribution of the ideal match points in Figure 12 shows, the scale ratio of the match points is close to a constant value of 3, and only the scale ratio of the first match point is abnormal. Likewise, Figure 13 shows that the ratio of the main directions of corresponding match points is close to 1, and again only the first match point has an abnormal ratio. Inspecting this match on the original images confirms that the point with abnormal scale and direction is indeed a false match, as shown in Figures 14 and 15.
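The ratio check described above can be sketched as follows, assuming each match carries the SIFT scale and main direction (in degrees) of its left and right keypoints. The confidence interval is modelled here as a relative tolerance around the median ratio; the use of the median, the tolerance values, and the function names are illustrative assumptions rather than parameters given in the text.

```python
import math
from statistics import median


def ratio_mask(left, right, rel_tol, eps=1e-9):
    """True where right/left lies within rel_tol of the median ratio.

    Ratios with a (near-)zero denominator are marked False.
    """
    ratios = [b / a if abs(a) > eps else math.nan for a, b in zip(left, right)]
    valid = [r for r in ratios if not math.isnan(r)]
    m = median(valid)  # the median is less sensitive to false matches than the mean
    return [not math.isnan(r) and abs(r - m) <= rel_tol * abs(m) for r in ratios]


def sift_property_filter(matches, scale_tol=0.2, dir_tol=0.2):
    """Indices of matches whose scale ratio AND main-direction ratio are typical.

    Each match is (scale_left, scale_right, dir_left_deg, dir_right_deg).
    """
    s1, s2, d1, d2 = zip(*matches)
    scale_ok = ratio_mask(s1, s2, scale_tol)
    dir_ok = ratio_mask(d1, d2, dir_tol)
    return [i for i, (a, b) in enumerate(zip(scale_ok, dir_ok)) if a and b]


# Hypothetical keypoints: scale ratios near 3 and direction ratios near 1,
# except the last match, which plays the role of the abnormal point.
matches = [
    (2.0, 6.1, 40.0, 41.0),
    (1.5, 4.4, 100.0, 99.0),
    (3.0, 9.2, 200.0, 202.0),
    (2.5, 7.4, 310.0, 309.0),
    (1.2, 1.3, 80.0, 250.0),
]
print(sift_property_filter(matches))  # -> [0, 1, 2, 3]
```

A plain ratio of main directions does not handle wrap-around (for example 359° against 1°); this sketch leaves that case unhandled.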

Figure 14.

False matches of abnormal scale and direction.

Figure 15.

The result of false matches elimination based on SIFT feature property.

The initial matches used in the experiment are shown in Figure 8(b): there are 208 match points, many of which are false matches. The result of the false matches elimination algorithm based on properties of the SIFT features is shown in Figure 16. The number of remaining match points reaches 121, accounting for 91.67% of all match points, and only one false match remains. This result is superior to that of the algorithm based on the geometry constraint of zooming image and can be used for subsequent 3D surface reconstruction. The algorithm based on the geometry constraint of zooming image can still be improved, since false match points that satisfy the scale ratio and main direction ratio conditions still exist. The analysis of ‘Flower with checkerboard’ is the same as for ‘Flower’; the data can be seen in Figures 12–15.

Figure 16.

Statistical results after false matches elimination: (a) distance from epipole to epipolar line and (b) position of the epipole.

Iterative false matches elimination based on distance from epipole to epipolar line

In each iteration, a new epipole is fitted from the latest matching points, and the distances from the epipole to the epipolar lines are recalculated with it. Figure 16 shows the statistical results: the distances from the epipole to the epipolar lines are limited to a small range.
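The iteration can be sketched under stated assumptions. Both images use one pixel frame, so each match (p1, p2) defines a line that should pass through the epipole. The epipole is fitted by linear least squares as the point closest to all such lines. Each round removes only the single worst match when its distance exceeds a hypothetical threshold `max_dist`, then refits. Removing one match at a time is a design choice: a large false match can bias the first fit, and dropping everything above the threshold at once could discard good matches. The function names and data are illustrative, not the authors' code. Under these assumptions the fitted epipole also serves as an estimate of the shrink-amplify center.

```python
import math


def fit_epipole(matches):
    """Least-squares point closest to every line through a match (p1, p2)."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x1, y1), (x2, y2) in matches:
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy)
        nx, ny = -dy / norm, dx / norm  # unit normal of the line
        d = nx * x1 + ny * y1           # line equation: nx*x + ny*y = d
        # accumulate the 2x2 normal equations: sum(n n^T) e = sum(n d)
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * d
        b2 += ny * d
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("lines are (nearly) parallel; epipole is undetermined")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def line_distance(e, p1, p2):
    """Perpendicular distance from point e to the line through p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x2 - x1, y2 - y1
    return abs(dx * (e[1] - y1) - dy * (e[0] - x1)) / math.hypot(dx, dy)


def iterative_epipole_filter(matches, max_dist, eps=1e-9):
    """Refit the epipole and drop the worst match until all lie within max_dist."""
    # A pair whose two points coincide defines no line, so it is skipped here.
    kept = [m for m in matches if math.dist(m[0], m[1]) > eps]
    while len(kept) >= 2:
        e = fit_epipole(kept)
        dists = [line_distance(e, p1, p2) for p1, p2 in kept]
        worst = dists.index(max(dists))
        if dists[worst] <= max_dist:
            return e, kept
        del kept[worst]  # remove only the single worst match, then refit
    raise ValueError("too few matches left to fit an epipole")


# Hypothetical data: true center (320, 240), zoom factor 1.5, one false match.
matches = [
    ((420, 240), (470, 240)),
    ((320, 340), (320, 390)),
    ((220, 140), (170, 90)),
    ((400, 300), (440, 330)),
    ((100, 100), (300, 50)),  # false match, far from the epipole
]
epipole, inliers = iterative_epipole_filter(matches, max_dist=2.0)
print([round(c, 3) for c in epipole], len(inliers))  # -> [320.0, 240.0] 4
```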

The accuracy of the matching results can also be seen from the radial angles of the matching points. The statistical results are shown in Figure 17: the maximum angle is 7.6°, the mean angle is 0.4°, and most radial angles of the matching points are limited to a small range.

Figure 17.

Radial angles of the matching points.

The analysis of ‘Flower with checkerboard’ is the same as for ‘Flower’, and the data can be seen in Figures 16–18. The results of false matches elimination based on the distance from epipole to epipolar line are shown in Figure 18; no false matches remain.

Figure 18.

Results of false matches elimination.

Conclusion

To address the false matches produced by the SIFT matching algorithm, the features of zooming images and the ranging model are studied in detail on the basis of the basic theory of SIFT feature detection and matching. Three false matches elimination approaches are investigated. First, part of the false matches is eliminated effectively based on the geometry constraint of zooming images; the experimental results show that matches can be filtered by the error level of the geometry constraint. However, the percentage of ideal matching points needs to be increased further. Second, a false matches elimination algorithm based on properties of the SIFT features is investigated, which further removes false matches by setting confidence intervals for the scale ratio and the main direction ratio of match points, and more match points are obtained. Third, an iterative false matches elimination algorithm based on the distance from epipole to epipolar line is proposed. Experiments on real images show that the three proposed algorithms are stable, practical, and valuable, and that combining them eliminates the false matches of feature points effectively. Applied on top of the RANSAC algorithm, the three proposed algorithms eliminate almost all false matches, and the remaining matching points can be applied to robot visual servoing, achieving more desirable 3D reconstruction results than those in Gao et al.¹⁵

Future work can focus on improving the speed of the SIFT matching algorithm and on finding more constraints to eliminate as many false matches as possible. The developed elimination algorithms can also be applied to feature matching of image pairs captured by a single camera moving along a linear guide rail.

Footnotes

Handling Editor: Fei Chen

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work is partly supported by the National Natural Science Foundation of China (grant no. 51175494), the State Key Laboratory of Robotics Foundation (grant no. 2016008), Program for Liaoning Excellent Talents in University (grant no. LJQ2014021), the Natural Science Foundation of Liaoning Province (grant no. 201602652), and Shenyang Ligong University Computer Application Key Discipline Foundation (grant no. 4771004kfx09).

References

1. Zhuang, Chen, Han, et al. Status and development of natural science understanding for vision-based outdoor mobile robot. Acta Autom Sin 2010; 36: 1–11.

2. Zhang, Yang, Cai ST. Model design of monocular front vision for mobile robot. Control Decis 2012; 27: 792–796.

3. Wang YQ. A monocular stereo vision algorithm based on bifocal imaging. Robot 2007; 29: 41–44.

4. Liu, Wang YQ. Analysis of the monocular stereo system based on bifocal imaging. Comput Meas Control 2008; 16: 1316–1318.

5. Siagian, Itti, Chang CK. Mobile robot navigation system in outdoor pedestrian environment using vision-based road recognition. In: IEEE international conference on robotics and automation, Karlsruhe, 6–10 May 2013, pp.564–571. New York: IEEE.

6. Olsen. Depth from zooming. J Opt Soc Am 1990; 7: 1883–1890.

7. Lavest, Rives, Dhome. Three-dimensional reconstruction by zooming. IEEE T Robotic Autom 1993; 9: 196–207.

8. Lavest, Rives, Dhome. Modeling an object of revolution by zooming. IEEE T Robotic Autom 1995; 11: 267–271.

9. Delherm, Lavest, Dhome, et al. Dense reconstruction by zooming. Lect Notes Comput Sci 1996; 1065: 427–438.

10. Lavest, Delherm, Peuchot, et al. Implicit reconstruction by zooming. Comput Vis Image Und 1997; 66: 301–315.

11. Asada, Baba, Oda. Depth from blur by zooming. In: Proceedings of the vision interface, Ottawa, ON, Canada, 7–9 June 2001, pp.165–172. New York: ACM.

12. Baba, Asada, Oda, et al. A thin lens based camera model for depth estimation from defocus and translation by zooming. In: Proceedings of the vision interface, Calgary, AB, Canada, 27–29 May 2002, pp.274–281. New York: ACM.

13. Baba, Oda, Asada, et al. Depth from defocus by zooming using thin lens-based zoom model. Electr Commun Jpn 2006; 89: 53–62.

14. Fayman, Sudarsky, Rivlin, et al. Zoom tracking and its applications. Mach Vision Appl 2001; 13: 25–37.

15. Gao, Liu, et al. Distance measurement of zooming image for a mobile robot. Int J Control Autom 2013; 11: 782–789.

16. Vino, Sappa AD. Revisiting Harris corner detector algorithm: a gradual thresholding approach. Lect Notes Comput Sci 2013; 7950: 354–363.

17. Zhu, et al. Corner detection algorithm based on grey absolute correlation degree. Chin J Sci Instr 2014; 35: 1230–1238.

18. Lindeberg. Scale selection. In: Ikeuchi (ed.) Computer vision: a reference guide. Berlin: Springer, 2014, pp.701–713.

19. Dong, Wang, Zhu, et al. Stereo vision image matching based on RANSAC algorithm. J Beijing Univ Technol 2009; 35: 1–6.

20. Shi, Zhang, Wei. Robust feature matching algorithm for random texture images. J Nanjing Univ Aeronaut Astronaut 2010; 42: 1–7.

21. YZ. Research on false match filtering method for large-parallax images of stereo vision. Internet Things Technol 2011; 4: 63–65.

22. The research and application of vehicle scene 3D reconstruction based on binocular stereo vision. Master Thesis, Hefei University of Technology, Hefei, China, 2009.

23. Liu TJ. An image feature matching algorithm based on association rules. Chin J Sens Actuator 2009; 22: 1737–1741.

24. Sukthankar. PCA-SIFT: a more distinctive representation for local image descriptors. In: Proceedings of the IEEE conference on computer vision and pattern recognition, Washington, DC, 27 June–2 July 2004, pp.506–514. New York: IEEE.

25. Mikolajczyk, Schmid. A performance evaluation of local descriptors. IEEE T Pattern Anal 2005; 27: 1615–1621.

Zooming image based false matches elimination algorithms for robot navigation
