Epipolar geometry and stereo matching algorithm for underwater fish-eye images

Abstract

A fish-eye lens achieves a large field of view at the cost of image distortion, and underwater imaging with a fish-eye lens additionally introduces refraction when light passes between different media. To obtain three-dimensional information under these conditions, we propose a stereo matching algorithm that analyzes the geometric characteristics of underwater fish-eye images, taking both distortion and refraction into account. First, the underwater imaging model of a stereo fish-eye camera is established, and the epipolar curve of the underwater fish-eye image is calculated. Then, in the matching step, an adaptive window based on mean-shift segmentation is proposed to further alleviate the impact of distortion. Experiments are performed on synthetic images and natural scene images. The results show that the proposed model and the calculated epipolar curve are effective and that the adaptive window method improves the precision of stereo matching on underwater fish-eye images.

Keywords

Stereo vision, underwater fish-eye images, refractive geometry, epipolar curve, adaptive support window

Introduction

In recent years, imaging devices have come into common use in underwater environments: on autonomous underwater vehicles, on remotely operated vehicles, and by divers.¹ As a result, three-dimensional (3-D) perception in underwater environments has attracted increasing attention.²,³ Stereo matching is a key step in obtaining 3-D information, and the epipolar line is its most important constraint. Stereo matching of general perspective images has been studied extensively, and a detailed comparative study has been conducted by Scharstein and Szeliski.⁴ However, stereo matching of underwater general images remains challenging: because of refraction, the epipolar line of an underwater image bends into an epipolar curve. Queiroz-Neto et al.⁵ performed stereo matching while ignoring the effect of refraction. Ferreira et al.⁶ estimated refraction approximately and then removed its effect from the underwater image. These methods do not calculate the epipolar curve from a refraction model and often suffer from errors. To address this problem, Zhang et al.⁷,⁸ calculated epipolar curves by taking refraction into account, but they obtained only a sparse disparity map based on scale-invariant feature transform (SIFT) feature points, which cannot describe a 3-D scene in detail. Gedge et al.⁹ calculated epipolar curves based on a refraction model and thereby improved the accuracy of dense matching. In summary, obtaining an accurate dense disparity map of underwater images requires an imaging model of refraction, with the epipolar curve as an essential constraint when searching for correspondences.

The abovementioned studies focus on general images. A general camera’s field of view (FOV) is usually 50°–60°, whereas a fish-eye camera has an FOV of at least 180°. Compared with general cameras, fish-eye cameras therefore capture much more of an ocean scene. To date, few studies have investigated stereo matching of underwater fish-eye images. Naruse et al.¹⁰ studied underwater 3-D measurement using fish-eye images: they first converted the central parts of fish-eye images in air to general images and then performed the rest of the process on these rectified parts. Thus, the large FOV of the fish-eye lens was not fully exploited, and the distortion rectification introduced significant error. Yamashita et al.¹¹ studied stereo matching of underwater panoramic images acquired through a convex mirror and a general lens rather than a fish-eye lens. Their vision system was complex, and the captured images were also rectified before stereo matching.

None of the previous works performed stereo matching on unrectified underwater fish-eye images. To avoid the errors introduced by rectification, we work directly on unrectified images. A complete imaging model of the underwater fish-eye lens is established, and the epipolar curve is calculated from this model. Note that, in addition to the distortion caused by refraction, fish-eye lenses themselves introduce severe distortion. Some of the abovementioned works use features and therefore obtain only sparse disparity maps, while others use regular rectangular windows without considering distortion. To deal with this problem and achieve dense stereo matching of underwater fish-eye images, we design adaptive support windows for the matching cost calculation.

The remainder of this article is organized as follows. The “Epipolar geometry” section builds a refraction model and derives the epipolar curve. The “Stereo matching for underwater fish-eye image” section presents a stereo matching algorithm based on an adaptive support window. The “Experimental results and analysis” section presents our experimental results on both synthetic images and natural scene images, and the “Conclusions” section gives concluding remarks.

Epipolar geometry

In general, the epipolar constraint is the most important constraint in stereo vision: it reduces the correspondence search from a two-dimensional area to a one-dimensional straight line. The epipolar constraint is equally essential for underwater fish-eye images. However, the distortion of a fish-eye lens and the refraction caused by underwater imaging bend the epipolar line into a curve, called an epipolar curve.

Figure 1 shows a pair of fish-eye cameras in a canonical configuration, with baseline B and focal length f. In this configuration, points e_L1, O_L, e_L2, e_R1, O_R, and e_R2 lie on the same line. The distance between each optic center and the refractive surface is h, n_A is the refractive index of air, and n_W is the refractive index of water. O_L is taken as the origin of the coordinate system. A transparent waterproof cover is needed when placing the lens underwater, so light travels successively through water, glass, and air. Because the cover is very thin, its effect is neglected, and we assume that refraction occurs only where light passes from water to air.¹²,¹³ Our fish-eye lens has an FOV of 180° and is modeled by the equisolid projection model:

$$ r = 2f\sin\frac{\theta}{2} \tag{1} $$

Figure 1.

Imaging model of the underwater binocular fish-eye lens. With this model, the epipolar curve on the right image can be obtained from a point p_L on the left image.

where θ is the angle of incidence and r is the radial distance of the image point from the image center.

In this study, we regard the left image as the reference image and the right image as the target image. The derivation of the epipolar curve is as follows.

Let $p_{L} (x_{l}, y_{l})$ be a pixel of the left image, with coordinates measured from the image center. Its polar coordinates are then given by:

$$ r_{L} = \sqrt{x_{l}^{2} + y_{l}^{2}}, \qquad \varphi_{L} = \arctan\frac{y_{l}}{x_{l}} \tag{2} $$
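As a minimal sketch of equations (1) and (2), the function below back-projects a left-image pixel to its polar angle and its incidence angle in air. It assumes that pixel coordinates are measured from the image center in the same units as f; the name `pixel_to_ray_angles` is our own. `math.atan2` is used in place of arctan(y_l/x_l) so that φ_L falls in the correct quadrant.

```python
import math
from typing import Tuple


def pixel_to_ray_angles(x_l: float, y_l: float, f: float) -> Tuple[float, float]:
    """Back-project a left-image pixel under the equisolid model r = 2 f sin(theta / 2).

    Assumes (x_l, y_l) are measured from the image centre in the same units as f.
    Returns (theta, phi): the incidence angle in air and the polar angle phi_L.
    """
    r = math.hypot(x_l, y_l)           # r_L in equation (2)
    phi = math.atan2(y_l, x_l)         # quadrant-aware version of arctan(y_l / x_l)
    s = r / (2.0 * f)
    if s > 1.0:
        raise ValueError("radius exceeds the range of the equisolid model")
    theta = 2.0 * math.asin(s)         # inverse of equation (1)
    return theta, phi
```

For a pixel with x_l < 0, the plain arctan(y_l/x_l) would place φ_L in the opposite half-plane, which is why the two-argument form is preferable here.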

Suppose that p_L is the projection of q_L, a point on the hemisphere surface of the left lens, and that q_L is in turn the projection of Q_L on the refractive surface. If depth information is unknown, the object point corresponding to Q_L could be any point $Q_{i} (i = 1, \dots, n)$ on the ray $Q_{L} Q_{0}$. The coordinates of $Q_{i} (X_{i}, Y_{i}, Z_{i})$ are given by

$$ \begin{cases} X_{i} = \left(d\tan\alpha_{L1} + h\tan\alpha_{L2}\right)\cos\varphi_{L} \\ Y_{i} = \left(d\tan\alpha_{L1} + h\tan\alpha_{L2}\right)\sin\varphi_{L} \\ Z_{i} = d + h \end{cases} \tag{3} $$

where d is the distance between Q_i and the refractive surface, and α_L1 and α_L2 are the angles between the ray and the optic axis in water and in air, respectively. The projection of Q_i on the right refractive surface is Q_Ri, with coordinates $(X_{r i}, Y_{r i}, Z_{r i})$, calculated as follows:

$$ \begin{cases} \sqrt{(X_{ri} - B)^{2} + Y_{ri}^{2}} = L_{i2} \\ \dfrac{X_{i} - B}{X_{ri} - B} = \dfrac{Y_{i}}{Y_{ri}} \end{cases} \tag{4} $$

where L_i2 is the distance between Q_Ri and the right optic axis.
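Equations (3) and (4) leave two quantities implicit. The worked relations below are our reading of Figure 1 rather than formulas stated in the text: they assume the flat water–air interface of the model, take α_L2 as the in-air angle obtained by inverting equation (1), obtain α_L1 from Snell's law, and set Z_ri = h because Q_Ri lies on the refractive surface.

```latex
\begin{aligned}
\alpha_{L2} &= 2\arcsin\frac{r_{L}}{2f}, \\
\sin\alpha_{L1} &= \frac{n_{A}}{n_{W}}\,\sin\alpha_{L2}, \\
L_{i1} &= \sqrt{(X_{i} - B)^{2} + Y_{i}^{2}}, \\
X_{ri} &= B + \frac{L_{i2}}{L_{i1}}\,(X_{i} - B), \\
Y_{ri} &= \frac{L_{i2}}{L_{i1}}\,Y_{i}, \\
Z_{ri} &= h.
\end{aligned}
```

The second line of equation (4) states that Q_Ri lies on the line through the right optic axis and the projection of Q_i onto the XY-plane, and the first line fixes its distance L_i2 from that axis. Of the two intersections, the one on the same side as Q_i is kept, which gives the positive ratio L_i2/L_i1 used above.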

The projection of Q_Ri on the right lens’ hemisphere is q_Ri, and the projection of q_Ri on the right image is $p_{R i} (x_{r i}, y_{r i})$ . The coordinates of p_Ri are given by

$$ x_{ri} = r_{Ri}\cos\varphi_{Ri}, \qquad y_{ri} = r_{Ri}\sin\varphi_{Ri} \tag{5} $$
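The text does not spell out r_Ri and φ_Ri. Assuming that the right camera also follows equation (1), that its image coordinates are measured from the right image center, and that its optic axis is parallel to the left one (canonical configuration), they follow from Q_Ri as:

```latex
\theta_{Ri} = \arctan\frac{L_{i2}}{h}, \qquad
r_{Ri} = 2f\sin\frac{\theta_{Ri}}{2}, \qquad
\varphi_{Ri} = \operatorname{atan2}\!\left(Y_{ri},\, X_{ri} - B\right)
```

Because Q_Ri lies on the ray from the right optic axis toward Q_i, φ_Ri also equals atan2(Y_i, X_i − B), so under these assumptions the azimuth does not depend on L_i2; only the radius r_Ri does.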

From the above formulations, each value of d yields a different point p_Ri on the right image, and a piecewise-linear approximation through these points gives the epipolar curve of p_L on the right image. Theoretically, d ranges over $[0, \infty)$; for underwater fish-eye images, however, d can be limited to $[0, d_{max}]$, so only a segment of the epipolar curve needs to be calculated.
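The piecewise-linear approximation can be sketched as follows, assuming the samples p_Ri are ordered by increasing d and expressed in pixel units. The helper name `rasterize_polyline` is hypothetical; it joins consecutive samples with straight segments and returns an ordered list of 8-connected integer pixels without consecutive repeats, so that every pixel on the curve has a well-defined position along it.

```python
import math
from typing import List, Sequence, Tuple


def rasterize_polyline(points: Sequence[Tuple[float, float]]) -> List[Tuple[int, int]]:
    """Ordered integer pixels along the piecewise-linear curve through `points`."""
    def nearest(v: float) -> int:
        # Round half up; unlike round() (half to even), neighbouring samples
        # that are at most 1 px apart always map to pixels at most 1 px apart.
        return math.floor(v + 0.5)

    if len(points) == 1:
        return [(nearest(points[0][0]), nearest(points[0][1]))]
    pixels: List[Tuple[int, int]] = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # One step per pixel along the dominant axis of this segment.
        steps = max(1, math.ceil(max(abs(x1 - x0), abs(y1 - y0))))
        for k in range(steps + 1):
            t = k / steps
            px = (nearest(x0 + t * (x1 - x0)), nearest(y0 + t * (y1 - y0)))
            if not pixels or pixels[-1] != px:   # skip consecutive duplicates
                pixels.append(px)
    return pixels
```

Sampling each segment once per pixel along its dominant axis is a simple alternative to Bresenham's algorithm that handles the sub-pixel endpoints produced by the projection.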

L_i2 in equation (4) is still unknown and is computed as follows.⁹

Figure 2 is an enlarged part of Figure 1, and its variables have the same meanings as in Figure 1. Applying Snell's law at the refraction point in Figure 2 and squaring both sides, we have

$$ \left[\left(\frac{n_{W}}{n_{A}}\right)^{2} \left(h^{2} + L_{i2}^{2}\right) - L_{i2}^{2}\right] \left(L_{i1} - L_{i2}\right)^{2} - L_{i2}^{2} d^{2} = 0 \tag{6} $$

Figure 2.

Refraction at the junction of water and air (part of Figure 1). L_i2 must be calculated to obtain the epipolar curves.

where L_i1 represents the distance between Q_i and the right optic axis.

Equation (6) is a fourth-degree polynomial in L_i2, but only one root is physically valid, and it always lies in $[0, L_{i 1}]$. The pseudocode for computing the epipolar curve from the imaging model is given in Algorithm 1.
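A minimal sketch of solving equation (6) with NumPy: the quartic's coefficients are built from its factored form with `np.polymul` and `np.polysub`, `np.roots` returns all four roots, and the real root in [0, L_i1] is kept. The defaults n_W = 1.333 and n_A = 1.0 are typical values we assume, not values given in the text; the tolerance for accepting nearly real roots and the name `refraction_root` are likewise our own choices.

```python
import numpy as np


def refraction_root(L1: float, d: float, h: float,
                    n_w: float = 1.333, n_a: float = 1.0) -> float:
    """Physically valid root L_i2 of equation (6), searched in [0, L1]."""
    if d <= 0.0:
        return L1  # Q_i lies on the refractive surface, so L_i2 = L_i1
    k = (n_w / n_a) ** 2
    # Equation (6) written as [(k - 1) L^2 + k h^2] (L1 - L)^2 - d^2 L^2 = 0,
    # with polynomial coefficients in descending powers of L.
    poly = np.polysub(np.polymul([k - 1.0, 0.0, k * h * h], [1.0, -2.0 * L1, L1 * L1]),
                      [d * d, 0.0, 0.0])
    tol = 1e-6 * max(1.0, L1)
    for z in np.roots(poly):
        if abs(z.imag) <= tol and -tol <= z.real <= L1 + tol:
            return float(min(max(z.real, 0.0), L1))  # clamp round-off into [0, L1]
    raise ValueError("no root of equation (6) in [0, L1]")
```

The leading coefficient is (n_W/n_A)² − 1, which is nonzero for water, so the polynomial is genuinely of degree four. On [0, L_i1] both sides of Snell's law are nonnegative, so squaring adds no spurious root there; moreover, the in-air term grows and the in-water term shrinks as L_i2 increases, so the interval contains exactly one root.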

Algorithm 1.

Computation of epipolar curve.

The epipolar curves are calculated separately, before matching, and the results are stored in a lookup table (LUT) that is accessed directly during the subsequent stereo matching. For each imaging model, the LUT needs to be calculated only once.
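The sketch below is our own Python reconstruction of the steps described in this section, not the authors' Algorithm 1. It assumes n_A = 1.0 and n_W = 1.333, samples d uniformly on [0, d_max] (`n_samples` is a hypothetical parameter), requires left pixels with θ_L < 90° so that tan α_L2 is finite, and uses a plain dictionary keyed by left pixel for the precompute-once LUT. All function names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

import numpy as np

N_A, N_W = 1.0, 1.333  # assumed refractive indices of air and water
Point = Tuple[float, float]


def _refraction_root(L1: float, d: float, h: float) -> float:
    # Equation (6): [(k - 1) L^2 + k h^2] (L1 - L)^2 - d^2 L^2 = 0, root in [0, L1].
    if d <= 0.0:
        return L1
    k = (N_W / N_A) ** 2
    poly = np.polysub(np.polymul([k - 1.0, 0.0, k * h * h], [1.0, -2.0 * L1, L1 * L1]),
                      [d * d, 0.0, 0.0])
    tol = 1e-6 * max(1.0, L1)
    for z in np.roots(poly):
        if abs(z.imag) <= tol and -tol <= z.real <= L1 + tol:
            return min(max(float(z.real), 0.0), L1)
    raise ValueError("no physically valid root of equation (6)")


def epipolar_curve(x_l: float, y_l: float, f: float, B: float, h: float,
                   d_max: float, n_samples: int = 200) -> List[Point]:
    """Sample the epipolar curve of left pixel (x_l, y_l) on the right image."""
    r_l, phi_l = math.hypot(x_l, y_l), math.atan2(y_l, x_l)    # equation (2)
    alpha_air = 2.0 * math.asin(r_l / (2.0 * f))                # inverse of equation (1)
    alpha_water = math.asin(N_A * math.sin(alpha_air) / N_W)    # Snell's law
    curve: List[Point] = []
    for d in np.linspace(0.0, d_max, n_samples).tolist():
        rho = d * math.tan(alpha_water) + h * math.tan(alpha_air)   # equation (3)
        X, Y = rho * math.cos(phi_l), rho * math.sin(phi_l)
        L1 = math.hypot(X - B, Y)                 # distance from Q_i to the right axis
        L2 = _refraction_root(L1, d, h)           # equation (6)
        theta_r = math.atan2(L2, h)               # in-air angle at the right camera
        r_r = 2.0 * f * math.sin(theta_r / 2.0)   # equation (1)
        phi_r = math.atan2(Y, X - B)              # azimuth about the right optic axis
        curve.append((r_r * math.cos(phi_r), r_r * math.sin(phi_r)))  # equation (5)
    return curve


def build_lut(pixels: Iterable[Tuple[int, int]], f: float, B: float, h: float,
              d_max: float) -> Dict[Tuple[int, int], List[Point]]:
    """Compute every epipolar curve once per imaging model and keep it for matching."""
    return {p: epipolar_curve(p[0], p[1], f, B, h, d_max) for p in pixels}
```

The samples returned by `epipolar_curve` are sub-pixel points; joining them by straight segments gives the piecewise-linear curve stored for each left pixel.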

Stereo matching for underwater fish-eye image

Adaptive window based on mean-shift segmentation

To reduce matching ambiguity, local stereo methods commonly aggregate support from neighboring pixels within a size-constrained window. To estimate disparity accurately near depth discontinuities, the local support window must adapt its shape and size so that it collects support only from pixels at the same depth.¹⁴ In addition, for underwater fish-eye images, a support window of fixed shape and size is unsuitable because the distortion is inhomogeneous. In this case, color is a reliable cue. Therefore, we adopt an adaptive support window based on mean-shift color segmentation.¹⁵ Segmentation-based support relies on the hypothesis that pixels lying in the same segment have similar disparity values, and it therefore yields improvements on low-textured surfaces and near depth discontinuities.¹⁶ As illustrated in Figure 3, the adaptive window is constructed as follows:

1. Both the left and right images are segmented by the mean-shift method, and a label is obtained for each segment (Figure 3(a) and (e)).

2. Let i₁ be a pixel of the left image that lies in segment L_l1, and establish a k × k rectangular window D_i1 centered on i₁. The overlap of L_l1 and D_i1 is denoted J_i1.

3. Let j₁, j₂, …, j_max be the pixels on the epipolar curve of i₁ in the right image (red curve in Figure 3(f)). For each candidate, such as j₁, the area J_j1 is obtained as in step 2.

4. The overlap of J_i1 and J_j1, with the two windows aligned at their centers, is denoted J_ij; J_ij is the final adaptive support window for the pair i₁ and j₁.
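Steps 2 to 4 can be sketched with boolean masks. The sketch assumes that `left_labels` and `right_labels` are non-negative, signed-integer label maps produced by a mean-shift segmentation (the segmentation itself is not shown), that k is odd, and that pixels are addressed as (row, column); the function names are hypothetical. Window positions outside the image are excluded by padding with the label −1.

```python
from typing import Tuple

import numpy as np

Pixel = Tuple[int, int]


def segment_window(labels: np.ndarray, center: Pixel, k: int) -> np.ndarray:
    """Steps 2 and 3: k x k mask of the pixels in D that share the centre's segment."""
    half = k // 2
    # Pad with -1 so that window positions outside the image never match a label.
    padded = np.pad(labels, half, mode="constant", constant_values=-1)
    r, c = center[0] + half, center[1] + half   # centre in padded coordinates
    window = padded[r - half:r + half + 1, c - half:c + half + 1]
    return window == labels[center]


def adaptive_window(left_labels: np.ndarray, right_labels: np.ndarray,
                    i: Pixel, j: Pixel, k: int) -> np.ndarray:
    """Step 4: J_ij is the overlap of J_i1 and J_j1 with both windows centred."""
    return segment_window(left_labels, i, k) & segment_window(right_labels, j, k)
```

Intersecting the two masks keeps only the offsets that belong to the central segment in both images, which is how step 4 guards against inconsistent left and right segmentations.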

Figure 3.

Adaptive window based on mean-shift segmentation. The two images in the first column ((a) and (e)) are the label maps of the left and right images, respectively. Panels (b) to (d) and (f) to (h) are enlarged versions of the areas within the green borders in (a) and (e), respectively. (b) Left segmented region L_l1 (area bounded by blue edge), (c) left rectangular window D_i1 (area bounded by red edge), (d) left adaptive window J_i1 (area bounded by yellow edge), (f) right segmented region L_r1 (area bounded by blue edge), (g) right rectangular window D_j1 (area bounded by red edge), and (h) right adaptive window J_j1 (area bounded by yellow edge).

Note that segmentation may produce inconsistent results on the left and right images.¹⁷ This is why we use the overlap J_ij computed in step 4. The dissimilarity between pixels i and j is calculated with the mean absolute difference (MAD) measure⁴:

$$ C(i, j) = \frac{1}{n_{all}} \sum_{i_{q} \in J_{lij},\; j_{q} \in J_{rij}} \left| I(i_{q}) - I(j_{q}) \right| \tag{7} $$

where i and j are corresponding pixels in the left and right images, respectively; J_lij and J_rij are their support windows, that is, the adaptive window J_ij placed at i and at j, so that each i_q is paired with the j_q at the same offset; and n_all is the total number of pixels in the adaptive support window.
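Equation (7) reduces to averaging absolute differences at the offsets selected by the window mask. The sketch below assumes a k × k boolean mask J_ij centered on both i and j (as in step 4), that every selected offset falls inside both images, and that color images are compared by averaging over channels; `mad_cost` is a hypothetical name.

```python
from typing import Tuple

import numpy as np

Pixel = Tuple[int, int]


def mad_cost(left: np.ndarray, right: np.ndarray, i: Pixel, j: Pixel,
             mask: np.ndarray) -> float:
    """Equation (7): mean absolute difference over the offsets selected by `mask`.

    `mask` is a k x k boolean window (k odd) centred on both i and j; every
    selected offset is assumed to fall inside both images.
    """
    if not mask.any():
        raise ValueError("empty support window")
    half = mask.shape[0] // 2
    dr, dc = np.nonzero(mask)
    dr, dc = dr - half, dc - half                  # offsets from the window centre
    a = left[i[0] + dr, i[1] + dc].astype(float)   # I(i_q); float avoids uint8 wrap-around
    b = right[j[0] + dr, j[1] + dc].astype(float)  # I(j_q) at the same offsets
    diff = np.abs(a - b)
    if diff.ndim > 1:                              # colour: average over channels
        diff = diff.mean(axis=1)
    return float(diff.sum() / diff.shape[0])       # n_all = number of pixels in J_ij
```

Converting to float before subtracting matters for 8-bit images, where `uint8` subtraction would otherwise wrap around.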

Then, the winner-takes-all (WTA) strategy⁴ is used to select the corresponding point of i, that is, the candidate on its epipolar curve with the minimum cost. The main steps of our stereo matching algorithm are illustrated in Figure 4.
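A minimal WTA sketch over the candidates of one epipolar curve, assuming the curve is given as an ordered list of right-image pixels and `cost` is any callable returning C(i, j), such as MAD over the adaptive window. It returns the winner's zero-based position along the curve together with the matched pixel; `wta_match` is a hypothetical name, and ties go to the earliest candidate.

```python
from typing import Callable, Sequence, Tuple

Pixel = Tuple[int, int]


def wta_match(i: Pixel, curve: Sequence[Pixel],
              cost: Callable[[Pixel, Pixel], float]) -> Tuple[int, Pixel]:
    """Winner-takes-all along one epipolar curve.

    Returns (m, j): the zero-based position of the lowest-cost candidate on the
    curve and the candidate pixel itself; ties keep the earliest candidate.
    """
    if not curve:
        raise ValueError("empty epipolar curve")
    costs = [cost(i, j) for j in curve]
    m = min(range(len(costs)), key=costs.__getitem__)
    return m, curve[m]
```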

Figure 4.

Flowchart of the algorithm.

Experimental results and analysis

Our experiments are performed in MATLAB R2010a. The natural scene images are captured by a binocular vision system, and the synthetic image is rendered with the Persistence of Vision Raytracer. As shown in Figure 5, the binocular vision system consists mainly of two parts: a binocular fish-eye camera system (model NM33-F) and a laptop computer with a 2.67-GHz Intel Core i5 CPU and 2 GB of memory. The binocular vision system is placed behind a large water-filled tank.

Figure 5.

(a) Overview of the laboratory equipment. As shown in (b), the binocular fish-eye camera system is outside the pool and the objects are inside it.

The test images, shown in Figure 6, are the reference images of each stereo pair.

Figure 6.

Test images. Both the synthetic image (a) and the natural scene images (b and c) are used as test images. The synthetic image is rendered with the Persistence of Vision Raytracer. In the natural scene images, many regions are not submerged in water.

Figure 7 shows selected feature points of a synthetic scene in the left image (Figure 7(a)) and their epipolar curves in the right image (Figure 7(b)). Each epipolar curve in the right image passes exactly through the corresponding point.

Figure 7.

Examples of epipolar curves. The red crosses within blue circles mark the chosen points in the left image (a), which are used to calculate the epipolar curves in the right image (b).

Conventionally, disparity is defined as the difference between the x-coordinates of two corresponding points. For underwater fish-eye images, however, disparity occurs in both the horizontal and the vertical directions. We therefore redefine disparity: the disparity of a pixel i in the left image is the index of its corresponding pixel along i’s epipolar curve in the right image; if i’s corresponding point is the mth pixel on the curve, the disparity of i is m − 1.

Figure 8 shows sparse stereo matching results based on SIFT features.¹⁸ The left column is obtained by the SIFT matching method and the right column by our method. Table 1 presents the percentage of bad matches and the running times.

Figure 8.

Sparse matching results: (a) results of the SIFT matching method and (b) results of our method (epipolar curve + adaptive support window). The feature points in both (a) and (b) are SIFT points. SIFT: scale-invariant feature transform.

Table 1.

Evaluation of sparse matching based on SIFT points.

Image	Method	Total points	Wrong matches	Bad matches (%)	Time (s)
Synthetic image	SIFT	393	77	19.6	4.7
Synthetic image	Ours	393	35	8.9	10.1
Natural image 1	SIFT	228	39	17.1	3.2
Natural image 1	Ours	228	18	7.8	9.8
Natural image 2	SIFT	307	51	16.6	3.8
Natural image 2	Ours	307	26	8.4	10.0

SIFT: scale-invariant feature transform.

Figure 8 and Table 1 show a significant improvement in accuracy. A SIFT descriptor is a high-dimensional vector that is robust to changes in scale, rotation, and translation. However, SIFT does not work well on underwater fish-eye images, because SIFT descriptors cannot handle the distortion of omnidirectional images;¹⁹ underwater fish-eye images additionally suffer from refractive distortion. Our method nevertheless obtains a better result with the help of the epipolar curve, even though it uses the simplest matching scheme (MAD to calculate the matching cost and WTA to select corresponding points). This high accuracy indicates that the derived epipolar curve is accurate and effective. Our method is more time-consuming than the SIFT method; because only a few points are matched, most of its running time is spent on image segmentation.

Figure 9 shows dense disparity maps obtained with a rectangular window and with an adaptive support window. Both methods use the epipolar curve derived above. Because many regions of the captured images are not actually submerged in water (e.g., the fluorescent tube), we use a mask to indicate which parts of the image are considered for stereo matching. For convenience, the method with rectangular support windows is called “method 1,” and the method with our adaptive support window is called “method 2.”

Figure 9.

Dense disparity map: (a) matching results from regular rectangular window method (epipolar curve + rectangular support window) and (b) matching results from our method (epipolar curve + adaptive support window). Depths range from close (red) to far (blue).

No data set comparable to Middlebury exists for evaluating stereo matching on underwater fish-eye images, and we have no ground truth for the test images. We therefore evaluate the disparity maps in Figure 9 as follows: we randomly pick 1% of the points in Figure 9(a) and calculate the percentage of bad matches. The quantitative evaluation results are presented in Table 2.
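The sampling protocol can be sketched as follows. The text does not state how a sampled match is judged wrong, so here that judgement is an input (`is_bad`, for example from manual inspection). Drawing the sample with a fixed seed is our assumption, made so that methods 1 and 2 are scored on the same points; both function names are hypothetical.

```python
import random
from typing import Hashable, List, Sequence


def sample_points(valid_pixels: Sequence[Hashable], fraction: float = 0.01,
                  seed: int = 0) -> List[Hashable]:
    """Draw a reproducible random subset (default 1%) of the masked-in pixels."""
    n = max(1, round(fraction * len(valid_pixels)))
    return random.Random(seed).sample(list(valid_pixels), n)


def bad_match_percentage(is_bad: Sequence[bool]) -> float:
    """Percentage of sampled points whose match was judged wrong."""
    return 100.0 * sum(is_bad) / len(is_bad)
```

With the counts reported for the synthetic image and method 1 in Table 2 (59 wrong out of 339), `bad_match_percentage([True] * 59 + [False] * 280)` gives about 17.4.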

Table 2.

Evaluation of dense matching.

Image	Method	Total points	Wrong matches	Bad matches (%)	Time (s)
Synthetic image	Method 1	339	59	17.4	31.4
Synthetic image	Method 2	339	20	5.8	71.2
Natural image 1	Method 1	131	23	17.5	2.8
Natural image 1	Method 2	131	9	6.9	14.9
Natural image 2	Method 1	207	46	22.2	3.7
Natural image 2	Method 2	207	24	11.6	17.3

In Figure 9, method 2 is significantly more accurate than method 1. The improvements appear mainly in disparity-discontinuous regions: the results of method 1 show serious foreground fattening, and the adaptive support window based on color segmentation largely removes it. Figure 10 shows the matching results for some points in disparity-discontinuous regions of natural scene image 1.

Figure 10.

Matching results for points in disparity-discontinuous regions obtained with method 1 (b) and with method 2 (d). The mismatch rate of method 1 is 60.8%, whereas that of method 2 is only 15.2%. The regular rectangular window of method 1 leads to more mismatches in disparity-discontinuous regions.

In non-occluded regions with rich texture, such as the patterned area on the box in the natural scene images, the adaptive support window is relatively small, so that it contains only points belonging to the same segment. In low-textured areas, such as the floor in the synthetic image, the window is relatively large and therefore contains more neighboring information.

The computational complexity of both methods 1 and 2 is $O (n^{2})$. However, computing the adaptive support window of method 2 is more complex than computing a fixed rectangular window, and the adaptive support window also requires an image segmentation step. Consequently, method 2 is more time-consuming than method 1, as shown by the running times in Tables 1 and 2. When only a few points are matched, most of the running time is spent on image segmentation, which takes 9.7 s. Mean-shift segmentation is relatively time-consuming and could be optimized to reduce this time. The efficiency of the proposed method could also be improved with parallel processors, because the adaptive support windows of different pixels can be computed in parallel. In addition, an acceleration strategy could be adopted to reduce the computational complexity, as in the box filter method and the multiresolution matching method for general images.
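As an illustration of the box-filter idea for general images, where one disparity hypothesis shifts every pixel of a window by the same amount, the per-pixel costs for that hypothesis can be summed over every k × k window with a summed-area table, so each window sum costs four lookups regardless of k. This generic sketch assumes odd k and zero padding at the borders, and `box_sum` is a hypothetical name; it is not part of the proposed method, and adapting it to curve-indexed disparities and segment-shaped windows (for example, with orthogonal integral images as in reference 14) is not shown.

```python
import numpy as np


def box_sum(values: np.ndarray, k: int) -> np.ndarray:
    """Sum of `values` over the k x k window (k odd) centred on every pixel.

    Uses a summed-area table, so each window sum needs four lookups whatever k is.
    Pixels outside the image count as zero.
    """
    half = k // 2
    padded = np.pad(values.astype(float), half, mode="constant")
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)   # ii[a, b] = padded[:a, :b].sum()
    h, w = values.shape
    return ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
```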

Conclusions

This article studies stereo matching of underwater fish-eye images. To obtain 3-D information accurately without rectification, we examine how the distortion of fish-eye images and refraction jointly affect the epipolar geometry, and we derive an epipolar curve from the imaging model. In addition, an adaptive support window for underwater fish-eye stereo matching is proposed. The experimental results demonstrate the necessity and effectiveness of the epipolar curve. Moreover, the proposed method with an adaptive support window improves the accuracy of the matching results compared with the method using a fixed rectangular window.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Yakun Zhang

References

1. Servos, Smart, Waslander. Underwater stereo SLAM with refraction correction. In: IEEE/RSJ international conference on intelligent robots and systems (IROS), Tokyo, Japan, 3–7 November 2013, pp. 3350–3355. IEEE.

2. Chang, Chen. Multi-view 3D reconstruction for scenes under the refractive plane with known vertical direction. In: IEEE international conference on computer vision (ICCV), Barcelona, Spain, 6–13 November 2011, pp. 351–358.

3. Wei, Kang. Image-based underwater 3D reconstruction with single viewpoint adjustment camera model. J Beijing Univ Aeron Astron 2015; 41(11): 2017–2022.

4. Scharstein, Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. In: IEEE workshop on stereo and multi-baseline vision, Kauai, USA, 2002, pp. 131–140.

5. Queiroz-Neto, Carceroni, Barros. Underwater stereo. In: XX Brazilian symposium on computer graphics and image processing (SIBGRAPI), Curitiba, Brazil, 20 October 2004, pp. 170–177. IEEE.

6. Ferreira, Costeira, Santos. Stereo reconstruction of a submerged scene. Iber Conf Pattern Recognit Image Anal 2005; 3522: 102–109.

7. Zhang. Research on underwater stereo matching method based on color segmentation. Acta Optica Sinica 2016; 36(8): 193–200.

8. Zhang, Hao. Research on scale invariant feature transform feature matching based on underwater curve constraint. Acta Optica Sinica 2014; 34(2): 183–189.

9. Gedge, Gong, Yang. Refractive epipolar geometry for underwater stereo matching. In: Canadian conference on computer and robot vision (CRV), St. John's, Canada, 25–27 May 2011, pp. 146–152. IEEE.

10. Naruse, Yamashita, Kaneko. 3D measurement of objects in water using fish-eye stereo camera. In: International conference on image processing, Florida, USA, 30 September–3 October 2012, pp. 2773–2776. IEEE.

11. Yamashita, Kawanishi, Koketsu. Underwater sensing with omni-directional stereo camera. In: IEEE international conference on computer vision workshops, Barcelona, Spain, 6–13 November 2011, pp. 304–311.

12. Qin, Zhang, Huang. Collinearity theory and the camera calibration method of underwater photogrammetry. Acta Metro Sinica 2014; 35(2): 133–138.

13. Xiao, Zhang, Tan. Effect of aberration induced by refractive index mismatch on imaging in confocal microscopy. Laser Optoelect Prog 2015; 52(2): 202–210.

14. Zhang, Lafruit. Cross-based local stereo matching using orthogonal integral images. IEEE Trans Circuits Syst Video Technol 2009; 19(7): 1073–1079.

15. Comaniciu, Meer. Mean shift: a robust approach toward feature space analysis. IEEE Trans Pattern Anal Mach Intell 2002; 24(5): 603–619.

16. Tombari, Mattoccia, Di Stefano. Segmentation-based adaptive support for accurate stereo correspondence. In: Pacific Rim conference on advances in image and video technology, Heidelberg, Germany, 2007, pp. 427–438. Berlin: Springer-Verlag.

17. Yang, Ahuja. Stereo matching using epipolar distance transform. IEEE Trans Image Proc 2012; 21(10): 4410–4419.

18. Lowe. Distinctive image features from scale-invariant keypoints. Int J Comput Vis 2004; 60(2): 91–110.

19. Tong, Chen. A spherical model based keypoint descriptor and matching algorithm for omnidirectional images. Adv Mech Eng 2014; 2014(2): 1–7.