Sage Journals: Discover world-class research

Abstract

Stereo vision has been studied for decades as a fundamental problem in the field of computer vision. In recent years, computer vision and image processing with a large field of view, especially using omnidirectional vision and panoramic images, has been receiving increasing attention. An important problem for stereo vision is calibration. Although various kinds of calibration methods for omnidirectional cameras are proposed, most of them are limited to calibrate catadioptric cameras or fish-eye cameras and cannot be applied directly to multi-camera systems. In this work, we propose an easy calibration method with closed-form initialization and iterative optimization for omnidirectional multi-camera systems. The method only requires image pairs of the 2D target plane in a few different views. A method based on the spherical camera model is also proposed for rectifying omnidirectional stereo pairs. Using real data captured by Ladybug3, we carry out some experiments, including stereo calibration, rectification and 3D reconstruction. Statistical analyses and comparisons of the experimental results are also presented. As the experimental results show, the calibration results are precise and the effect of rectification is promising.

Keywords

Stereo Calibration Rectification Omnidirectional Multi-camera System

1. Introduction

Stereo vision has been studied as a fundamental problem in computer vision for decades. By stereo matching, depth information can be obtained, which can be used in many applications such as autonomous navigation, obstacle detection, 3D reconstruction, virtual reality, object recognition and surveillance. For stereo matching, calibration and rectification are considered to be important pre-steps. By calibrating a stereo system, extrinsic parameters of the stereo system can be obtained to calculate the depth information. Calibration is also a must when rectifying stereo image pairs. A pair of images is thought to be rectified when its epipolar lines coincide [1]. One of the most important advantages of rectification is that the search for correspondences is done along the rows or the columns of the rectified images [2].

In recent years, computer vision and image processing with a large field of view (FOV), especially using omnidirectional vision and panoramic images, have been receiving increasing attention for navigation and surveillance applications [3]. Especially, the omnidirectional multi-camera system (OMS) has some more attractive advantages over other types of omnidirectional cameras. Firstly, an OMS supplies higher resolution. Additionally, it also provides uniform resolution for any direction of view. Stereo systems composed of OMSs also have these advantages. Consequently, the research into the calibration and rectification methods for this kind of systems is meaningful and necessary.

Many algorithms for calibration and rectification have been proposed in omnidirectional applications. However, the existing approaches suffer the following defects: (1) Most of them are limited to calibrate and rectify catadioptric cameras with parabolic, hyperbolic, elliptical or spherical mirror and fisheye cameras. It is not convenient to apply them directly to OMSs (Ladybug3 [4], e.g.). (2) Some calibration methods need special facilities and scenes which are not simple to make and commonly available. (3) Some of the rectification methods are not scanline rectification methods, thus losing the important advantage of making the search for correspondences faster and simpler. (4) Part of the rectification methods produce heavily distorted images.

To solve these issues we propose a method for calibration and rectification in this paper which is proved to be well suited to OMSs. This calibration method is based on a spherical model without intrinsic parameters. We first deduce the projective properties of a group of equidistant parallel lines on the checkerboard. Based on these properties, the closed-form solutions of the extrinsic parameters are obtained. With the closed-form solutions as initial guesses, the calibration method produces a refined result by optimizing the object functions we propose. The rectification method we propose is a scanline method. It reprojects the stereo image pair onto a public unit sphere, on which the corresponding epipolar lines are projected onto the same longitude line. For a certain stereo system, the rectification process can be parameterized as a look-up table, which is easy to implement on a field-programmable gate array (FPGA) or a graphics processing unit (GPU).

There are two main contributions in this paper. Firstly, we propose a calibration method with iterative optimization and closed-form initialization which is well suited to OMSs. Secondly, we design a rectification algorithm [5] to project the corresponding epipolar lines onto the same longitude line on a unit sphere. This algorithm is a scanline algorithm and is convenient for hardware implementation with highly parallel processing. The algorithm also avoids the heavy distortion which is common in existing methods.

The remainder of this paper is organized as follows: In Section 2 we introduce the related work. Section 3 discusses the proposed calibration method for an OMS. In Section 4, we describe the process of stereo rectification in detail. Lastly we present our experimental results for real images and draw the conclusions.

2. Related Work

2.1 Calibration Methods for Omnidirectional Cameras

Researchers have proposed many calibration methods for single viewpoint omnidirectional cameras and most of these methods can be classified into the following three general categories [6]: the planar-grid-based methods, the line-based methods, and the non-prior-knowledge methods.

2.1.1 Planar-grid-based Calibration Methods

As methods using planar grids [7] are wildly adopted for their simplicity of use and accurate results in the calibration of perspective cameras, researchers naturally tend to develop similar approaches for omnidirectional cameras. D. Scaramuzza et al. [8] propose a method to calibrate single viewpoint omnidirectional cameras based on the assumption that the image projection function can be described by a Taylor series expansion whose coefficients are estimated by solving a two-step least squares linear minimization problem. However C. Mei and P. Rives point out that the user has to select each point of the calibration grid independently when calibrating for the use of the polynomial approximation of the projection function. They propose an improved method in [9] by introducing a slightly modified sphere model of Geyer [10] and Barreto [11]. This method is simple without the need to know the mirror parameters. Gasparini et al. recover the intrinsic parameters of the central catadioptric cameras from the image of the absolute conic (IAC) in [12]. Deng et al. [13] use the bounding ellipse and the field of view to initialize the intrinsic parameters and then obtain the extrinsic parameters by computing the homography using a direct linear transformation (DLT) algorithm with data normalization. Ikeda, S. et al. [14] propose a calibration method for an OMS. They compute the intrinsic and extrinsic parameters using a target plane and a laser total station. Generally speaking, the main advantages of this kind of method are the high precision, and that the target plane used is simple to make and commonly available.

2.1.2 Line-based Calibration Methods

Line-based calibration methods such as [15]-[19] usually need only one view to carry out the calibration process. These approaches estimate the intrinsic parameters of the catadioptric system by means of computing the image of the absolute conic. The main advantage of line-based calibration methods is that no special pattern is needed because lines are easily found in the natural scene. However, the accuracy of this kind of method is usually lower than that of the planar-grid-based calibration methods.

2.1.3 Non-prior-knowledge Calibration Methods

Self-calibration methods, such as [20]-[22], use only point correspondences in multiple views, without needing any prior knowledge about the scene. However, they are less satisfying for 3D reconstruction and motion estimation due to limited precision.

Compared with previous methods, our calibration method is also based on a sphere model, but the model is simplified without intrinsic parameters. Additionally, our method is a combination of the planar-grid-based method and the line-based method. The line-based algorithm is used in the initialization stage. As we mentioned above, this algorithm enables us to get relatively accurate initial calibration results conveniently from only one view. In the iterative optimization stage, we design the objective function using the planar-grid-based technique which produces higher precision.

2.2 Rectification Methods for Omnidirectional Cameras

Considering that former researchers have proposed a lot of effective methods on matching perspective stereo pairs, people try to rectify omnidirectional stereo pairs and then transform omnidirectional stereo problems into traditional stereo problems. Many methods are proposed to rectify different kinds of omnidirectional stereo image pairs. C. Geyer et al. propose a method to rectify stereo image pairs of parabolic catadioptric cameras [23]. However, this method is limited to parabolic catadioptric stereo systems. Also, the method produces heavily distorted images. S. Abraham et al. propose an algorithm for fish-eye-stereo calibration and epipolar rectification [3]. Although it avoids heavy distortion and over expansion near the epipoles, the method is not a scanline rectification method, thus losing the important advantage of making computing stereo correspondences simpler. Researchers also exploit the rectification under different geometry frameworks. For example, Gonzalez-Barbosa et al. [24] rectify omnidirectional images on the rectangular grid, while Takiguchi et al. [25] reproject omnidirectional images onto cylinders. F. Kangni et al. [26] rectify the image pairs using the cubic projection model.

Although all the approaches mentioned above exhibit good behaviour in rectification, they still suffer at least one of the following defects: (1) some of the rectification methods are not scanline rectification methods, thus losing the important advantage of making the search for correspondences faster and simpler; (2) some of them produce heavily distorted images; (3) some are not easy for hardware implementation.

3. Stereo Calibration of OMSs

3.1 The Ladybug3 Omnidirectional Camera System

The high resolution Ladybug3 spherical digital video camera system () has six 2 MP cameras that enable the system to collect video from more than 80% of the full 360° sphere, and an IEEE-1394b (FireWire) interface with locking screw connection that allows JPEG-compressed 12MP resolution images to be streamed to disk at 15fps. The Ladybug3 camera system can output not only six normal images, but also an omnidirectional image either in DOME mode or Mercator mode. The Mercator mode panoramic images provided by the Ladybug3 are created using a spherical coordinate system. The projection model is as follows. The image axes u and v map to the two spherical angles θ and φ respectively in the following formulations:

\{\begin{array}{l} u = \frac{π - θ}{2 π} \cdot n C o l s & ​ \\ v = \frac{φ}{π} \cdot n R o w s, & ​ \end{array}\}

(1)

where nCols × nRows is the size of the panoramic images, θ and φ defined in are computed as follows:

\{\begin{array}{l} θ = a t a n 2 (y, x) & ​ \\ φ = a r c c o s \frac{z}{\sqrt{x^{2} + y^{2} + z^{2}}}, & ​ \end{array}\}

(2)

where (x, y, z) is a point in the camera head coordinate frame. Substitute Eq. 2 into Eq. 1, we get

\{\begin{array}{l} u = \frac{π - a t a n 2 (y, x)}{2 π} \cdot n C o l s & ​ \\ v = \frac{a r c c o s (z / d)}{π} \cdot n R o w s, & ​ \end{array}\}

(3)

where d = √x² + y² + z².

Figure 1.

Ladybug3 Omnidirectional System.

Figure 2.

The projection model of Ladybug3.

Figure 3.

The model plane is on the x – y plane of the world coordinate system.

Figure 4.

Two sets of the great circles intersect at E_x+, E_x– and E_y+, E_y–.

This simple spherical angle projection has been chosen because it maps nicely to a 2D image display.

3.2 Independent Extrinsic Calibration

In this part, we will calibrate the rotation and the translation between the target plane and the camera in each view. The final results are solved with the Levenberg-Marquardt algorithm. Consequently, we will discuss the initial guess of the rotation and the translation first. And then, a complete procedure of estimating the best result will be given.

3.2.1 Initialization of Extrinsic Parameters in Each View

In our algorithm of initialization, we will first estimate the rotation matrix. Then the translation can be obtained based on the estimated rotation matrix.

Let P _w be the coordinate of an arbitrary point attached to the target plane in the world reference frame and R _wc be the 3 × 3 orthonormal rotation matrix that rotates the world reference frame into the camera head coordinate frame and T _wc be the the translation. Then the transformation equation that transforms a point from the world reference frame to the camera head coordinate frame can be written as:

P_{c} = R_{w c} P_{w} + T_{w c} .

(4)

Without loss of generality, we assume the model plane is on the x – y plane of the world coordinate system, as shown in, that is z _w = 0. Let dX and dY denote the size of each square in the grid in X_w and Y_w.

There are two sets of parallel lines on the checkerboard. Each line is perpendicular to the lines in the other set. When projected to the viewing sphere in, each line forms a great circle. As shown in, the great circles corresponding to the parallel lines from the same set intersect at two antipodal junctions (E_x+, E_x– and E_y+, E_y–), just like the meridian lines intersect at the North Pole and the South Pole. It is easy to prove that vector OE_x+ is parallel with axis X_w and vector OE_y+ is parallel with axis Y_w. Therefore, if we denote the ith column of the rotation matrix R _wc by r _i , we have

\{\begin{array}{l} r_{1} = E_{x +} & ​ \\ r_{2} = E_{y +} & ​ \\ r_{3} = r_{1} \times r_{2}, & ​ \end{array}\}

(5)

Once we obtain the initial guess of the rotation matrix R _wc , we are able to solve the estimated translation vector T _wc . Let p _c,mn = (u_mn, v_mn) be the pixel of a grid corner on the checkerboard, where the subscripts m and n denote the indexes of the row and the column. According to Eq. 1, we have θ_mn = π – 2π · u_mn/nCols and φ_mn = π · v_mn/nRows. As shown in, we define OP'(θ, φ, 1) as the unit directional vector of OP'(θ, φ,‖OP‖). For any grid corner P _c,mn on the checkerboard, we can write the unit directional vector of OP _c,mn (θ_mn, φ_mn, ‖OP_c,_mn‖) as

\begin{matrix} M_{m n} = {O P'}_{c, m n} (θ_{m n}, φ_{m n}, 1) \\ = [\begin{matrix} s i n φ_{m n} c o s θ_{m n} \\ s i n φ_{m n} s i n θ_{m n} \\ \cos φ_{m n} \end{matrix}] . \end{matrix}

(6)

Let d_mn be the distance from P _c,mn to the center of the viewing sphere, that is d_mn = ‖OP _c,mn ‖. We have

P_{c, m n} = d_{m n} M_{m n} .

(7)

As the normal of the checkerboard is r₃, assuming that the distance from the center of the viewing sphere to the checkerboard plane is d_c, any point P _c,mn on the checkerboard plane satisfies the following equation

r_{3} \cdot P_{c, m n} + d_{c} = 0 .

(8)

Substitute Eq. 7 into Eq. 8, we get

r_{3} \cdot (d_{m n} M_{m n}) + d_{c} = 0 .

(9)

We can rewrite it as

d_{m n} = - \frac{d_{c}}{r_{3} \cdot M_{m n}} .

(10)

Substitute Eq. 10 into Eq. 7, we get

P_{c, m n} = - \frac{d_{c}}{r_{3} \cdot M_{m n}} M_{m n} .

(11)

Let P _w,mn be the corresponding coordinate of P _c,mn in the world coordinate system. According to Eq. 4, the relationship between them is given by:

P_{w, m n} = R_{w c}^{⊤} P_{c, m n} - R_{w c}^{⊤} T_{w c} .

(12)

Since the model plane is on the x – y plane of the world coordinate system and the point P_w,11 is the origin, as shown in, we have P_w,11 = 0. Substitute it into Eq. 4, we have P_c,11 = R _wc P_w,11 + T _wc . That is

T_{w c} = P_{c, 11} .

(13)

Substitute Eq. 13 and Eq. 11 into Eq. 12, we have

\begin{array}{l} P_{w, m n} = R_{w c}^{⊤} P_{c, m n} - R_{w c}^{⊤} P_{c, 11} \\ = [\begin{matrix} (\frac{r_{1} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{1} \cdot M_{m n}}{r_{3} \cdot M_{m n}}) d_{c} \\ (\frac{r_{2} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{2} \cdot M_{m n}}{r_{3} \cdot M_{m n}}) d_{c} \\ 0 \end{matrix}] . \end{array}

(14)

According to, we can rewrite P _w,mn as

P_{w, m n} = [\begin{matrix} (m - 1) d X \\ (n - 1) d Y \\ 0 \end{matrix}] .

(15)

From Eq. 14 and Eq. 15, we can construct two equations with a single point P _w,mn :

\{\begin{matrix} (\frac{r_{1} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{1} \cdot M_{m n}}{r_{3} \cdot M_{m n}}) d_{c} = (m - 1) d X \\ (\frac{r_{2} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{2} \cdot M_{m n}}{r_{3} \cdot M_{m n}}) d_{c} = (n - 1) d Y \end{matrix}\}

(16)

For a checkerboard with M × N grid corners, we can construct (M – 1)N + M(N – 1) equations. All these equations form an overdetermined system with the form Ad_c = b. The least squares formula can be written as

d_{c} = {(A^{⊤} A)}^{- 1} A^{⊤} b,

(17)

where

\begin{array}{l} A^{⊤} A = \sum_{m = 2}^{M} \sum_{n = 1}^{N} {(\frac{r_{1} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{1} \cdot M_{m n}}{r_{3} \cdot M_{m n}})}^{2} \\ + \sum_{m = 1}^{M} \sum_{n = 2}^{N} {(\frac{r_{2} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{2} \cdot M_{m n}}{r_{3} \cdot M_{m n}})}^{2}, \end{array}

(18)

and

\begin{array}{l} A^{⊤} b = \sum_{m = 2}^{M} \sum_{n = 1}^{N} (\frac{r_{1} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{1} \cdot M_{m n}}{r_{3} \cdot M_{m n}}) (m - 1) d X \\ + \sum_{m = 1}^{M} \sum_{n = 2}^{N} (\frac{r_{2} \cdot M_{11}}{r_{3} \cdot M_{11}} - \frac{r_{2} \cdot M_{m n}}{r_{3} \cdot M_{m n}}) (n - 1) d Y . \end{array}

(19)

According to Eq. 13 and Eq. 11, we have

T_{w c} = - \frac{d_{c}}{r_{3} \cdot M_{11}} M_{11} .

(20)

3.2.2 Optimization of Extrinsic Parameters

Next, we will find the final results of the extrinsic parameters by applying the iterative optimization.

By extracting the grid corners in the images, we are given M × N points on the target plane. Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate of R _wc and T _wc can be obtained by minimizing the following function:

\begin{array}{l} R_{w c}, T_{w c} \\ = arg min \sum_{m = 2}^{M} \sum_{n = 1}^{N} | | P_{c, m n} \\ - {\hat{P}}_{c, m n} (R_{w c}, T_{w c}, P_{w, m n}) | |^{2}, \end{array}

(21)

where p̂ _c,mn (R _wc , T _wc , P _w,mn ) is the projection of point P _w,mn according to Eq. 3 and Eq. 4. For a known target plane, P _w,mn can be easily calculated from Eq. 15 and the corresponding projection p _c,mn can be obtained by extracting the grid corners in the image. This is a nonlinear minimization problem, which can be solved with the Levenberg-Marquardt Algorithm [27][28]. After we get R _wc and T _wc , the normal of the target plane in the camera frame can be calculated. Let's rewrite Eq. 4 as follows:

R_{w c}^{⊤} P_{c} - R_{w c}^{⊤} T_{w c} = P_{w} .

(22)

Noticing that P _w = [x_w y_w 0]^T, we have

r_{3} \cdot P_{c} - r_{3} \cdot T_{w c} = 0 .

(23)

Therefore, the normal of the target plane in the camera frame is given by:

{N = (r_{3} \cdot T_{w c}) r}_{3} .

(24)

3.3 Stereo System Calibration

We treat the stereo system as a strictly rigid system. Consequently, the calibration of this stereo system is to estimate the best transform between Camera h and Camera l. We propose a two-step calibration process. In step I, we decouple the problem of finding the best transform into estimating a rotation part and a translation part independently. In step II, we use the results in step I as the initial guess to optimize the cost function to refine the results. We have two choices of the cost function: the first one is chosen to be the distance from the 3D points of the grid corners in one camera to the corresponding target plane observed from the other camera; the second one is the Euclidean distance between the 3D correspondences observed in the two cameras.

For each posture of the target plane, we can calculate its rotation R _wc , transformation T _wc and the normal of the target plane N in the camera head reference frame of Camera h and Camera l from the previous section. Let's denote them as R _h,i , T _h,i , N _h,i and R _l.i , T _l,i , N _l,i , respectively, where the subscripts h and l denote the camera head reference frame h and l; and i is the index of the observation. Let

\begin{array}{l} N_{h} = [\begin{matrix} \frac{N_{h, 1}}{‖N_{h, 1}‖} & \frac{N_{h, 2}}{‖N_{h, 2}‖} & \dots & \frac{N_{h, m}}{‖N_{h, m}‖} \end{matrix}] \\ D_{h} = [\begin{matrix} ‖N_{h, 1}‖ & ‖N_{h, 2}‖ & \dots & ‖N_{h, m}‖ \end{matrix}], \end{array}

(25)

and

\begin{array}{l} N_{l} = [\begin{matrix} \frac{N_{l, 1}}{‖N_{l, 1}‖} & \frac{N_{l, 2}}{‖N_{l, 2}‖} & \dots & \frac{N_{l, m}}{‖N_{l, m}‖} \end{matrix}] \\ D_{l} = [\begin{matrix} ‖N_{l, 1}‖ & ‖N_{l, 2}‖ & \dots ‖N_{l, m}‖ \end{matrix}], \end{array}

(26)

where mis the number of stereo image pairs.

Then the estimate of the translation has a closed form solution as follows:

T_{h l} = {(N_{l} N_{l}^{⊤})}^{- 1} N_{l} {(D_{l} - D_{h})}^{⊤},

(27)

and

R_{h l} = {V U}^{⊤},

(28)

where N_hN_l^T = USV^T is the associated singular value decomposition and the subscript hl means the translation is from the reference frame h to the reference frame l.

In Step II, what we do is minimizing a cost function which stands for the error of R _hl and T _hl to reduce the influence of noise. Here we introduce two kinds of cost functions used for optimization. The minimization of both the objective functions can be done through an iterative optimization procedure with initial estimates obtained from step I.

Let P_h,l^j be the j^th 3D grid corner points from the i^th image' captured by Camera h and P_l,i^j be the corresponding points from the image of Camera l. For m different views of the target plane and n 3D grid corner points per view, the refined extrinsic parameters R _hl and T _hl can be obtained by minimizing the following reprojection error:

E = \sum_{i}^{m} \sum_{j}^{n} {(\frac{N_{l, i}}{‖N_{l, i}‖} \cdot (R_{h l} P_{h, i}^{j} + T_{h l}) - ‖N_{l, i}‖)}^{2} .

(29)

This means the distance from the 3D points of the grid corners in one camera to the corresponding target plane observed from the other camera should be zero if the extrinsic parameters R _hl and T _hl are ideally estimated.

Another objective function is given more naturally in that it requires the Euclidean distance between the 3D points observed in the two cameras should be a minimum. The objective function is given as follows:

E = \sum_{i}^{m} \sum_{j}^{n} {‖(R_{h l} P_{h, i}^{j} + T_{h l}) - P_{l, i}^{j}‖}^{2} .

(30)

These two nonlinear optimization problems given in Eq. 29 and Eq. 30 can be solved by using the Levenberg-Marquadrt algorithm.

When solving the optimization problem using the first object function Eq. 29, we should notice that at least three views of the target plane which are nonparallel to each other are required to fully constrain the optimization problem. Since the solution of the system has 6 degrees of freedom (DOF) and each view can constrain 2 DOF, we need at least three views of the target plane. If the number of views is smaller than three, the solution obtained will not converge to the actual value.

4. Stereo Rectification Method

4.1 Geometric Framework

Firstly, let's make some notations clear. Let O_h be the origin of the camera head reference frame h and O_l be the the origin of the camera head reference frame l. From the calibration data, we are able to derive the extrinsic parameters R _hl and T _hl of the omnidirectional stereo camera system, in which R _hl is the 3 × 3 rotation matrix from h to l and T _hl is the translation.

Imagining that if we set up a rectification reference frame r whose z is along the baseline T _hl , as shown in Figure 5, every epipolar plane is perpendicular to the x – y plane. It means that all the points belonging to a certain epipolar plane are projected on the same line in the x – y plane. Consequently, all of the points on the same epipolar line should have the same ratio of their x and y coordinates (aligned in frame r). In other words, an epipolar plane intersects two spheres at two great circles with constant zenithal angles if we consider the epipoles as the two poles of the spheres.

Figure 5.

All the points belonging to a certain epipolar plane are projected on the same line in the x – y plane. For clarity, the x – y plane is moved from O_r to O′ _r

In order to transform the coordinates from the reference frames h and l to the reference frame r, we have to decide R _hr and R _lr , which are the rotation matrices from the reference frames h and l to the reference frame r respectively. Next we will show that R _hr is able to be derived as long as R _lr is known. As shown in Figure 5, let P be an arbitrary point in 3D space. The coordinates of point P in the reference frame of camera system h, l and r are P _h , P _l and P _r respectively. We have

\begin{matrix} P_{r} = R_{l r} P_{l} + T_{l r}, \\ P_{r} = R_{h r} P_{h} + T_{h r} \end{matrix}

(31)

where T _lr = 0 for the origins of the reference frame l and r are coincident. As a rigid transformation, we have

P_{l} = R_{h l} P_{h} + T_{h l} .

(32)

By substituting Eq. 32 and T _lr = 0 into Eq. 31, we get

\begin{array}{l} P_{r} = R_{l r} R_{h l} P_{h} + R_{l r} T_{h l} \\ P_{r} = R_{h r} P_{h} + T_{h r} . \end{array}

(33)

From Eq. 33, we can derive

\begin{matrix} R_{h r} = R_{l r} R_{h l} \\ T_{h r} = R_{l r} T_{h l} . \end{matrix}

(34)

Obviously, once R _lr is determined, transformation from the reference frames h and l to the reference frame r is able to be computed.

Next, let's estimate R _lr . As shown in the previous section, R _hl and T _hl are known after calibration. Considering point O _h , its coordinate vector in reference frame l is vector T _hl . At the same time, point O _h , is on the z axis of the reference frame r and its coordinates in the reference frame r are [0,0,‖T _hl ‖]^T. R _lr is the 3 × 3 rotation matrix and we write it as follows:

R_{l r} = [\begin{matrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{matrix}] .

(35)

Substitute the coordinates of point O _h in the reference frames h and r into Eq. 33, we have

[0,0, ‖T_{h l}‖]^{⊤} = R_{l r} T_{h l}

(36)

and the following constraint because R _lr is orthonormal as a rotation matrix:

R_{l r} R_{l r}^{⊤} = I,

(37)

where I is an identity matrix. Eq. 36 and Eq. 37 form a group of indeterminate equations because the reference frame r can rotate around the axis z. For simplicity, we let r₁₁ = 0 and show the corresponding analytic solutions as follows:

\begin{array}{l} r_{12} = - \frac{r_{33}}{r_{32}} \sqrt{\frac{r_{32}^{2}}{r_{32} + r_{33}}}, \\ r_{13} = \sqrt{\frac{r_{32}^{2}}{r_{32} + r_{33}}}, \\ r_{21} = - \frac{r_{22} (r_{32}^{2} + r_{33}^{2})}{r_{31} r_{32}}, \\ r_{22} = \sqrt{1 / [\frac{r_{33}^{2}}{r_{32}^{2}} + {(\frac{r_{32}}{r_{31}} + \frac{r_{33}^{2}}{r_{31} r_{32}})}^{2} + 1]}, \\ r_{23} = r_{22} \frac{r_{33}}{r_{32}}, \\ [\begin{matrix} r_{31} & r_{32} & r_{33} \end{matrix}] = \frac{T_{h l}^{⊤}}{‖T_{h l}‖} . \end{array}

(38)

4.2 Rectification with Interpolation

Without loss of generality, we do not limit the projection model of the camera as long as the correspondences in the same row or column of the rectified images have constant zenithal angles if we take frame r as the reference for the spherical coordinate system. Here we take Mercator projection as the model and the relation between every pixel in the rectified images and the corresponding points on the sphere is given by Eq. 3.

Supposing that we want to derive the intensity of a pixel with the coordinate [i, j] in the rectified image, we first calculate its coordinates on the sphere X _r in the frame r by transforming its spherical polar coordinate (d, θ, φ) into a Cartesian coordinate X _r , where d is the radius of the sphere, θ and φ can be obtained by Eq. 1. Thus its Cartesian coordinate X _l in frame l is X_l = R_lr⁻¹X_r, here R_lr⁻¹ = R_lr^T because the rotation matrix R _lr is orthonormal. Then we can compute its corresponding pixel coordinates [m, n] in the image of Camera l by Eq. 3. In most cases m and n are not integers. This means that the corresponding pixels fall into the rectangular region delimited by the four original pixels in the original images. In order to derive a more accurate result of the rectified image, some interpolation techniques need to be introduced. For the balance of complexity and performance, we choose bilinear interpolation as our preferred interpolation method. Let I_R(i, j) be the intensity of pixel [i, j] in the rectified images and I_O(〈m〉, 〈n〉|) be the intensity of pixel [〈m|,〉n〉] in the original images, we have

\begin{matrix} I_{R} (i, j) = I_{O} (⌊m⌋, ⌊n⌋) (1 - p) (1 - q) \\ + I_{O} (⌊m⌋ + 1, ⌊n⌋) p (1 - q) \\ + I_{O} (⌊m⌋, ⌊n⌋ + 1) (1 - p) q \\ + I_{O} (⌊m⌋ + 1, ⌊n⌋ + 1) p q, \end{matrix}

(39)

where p = m – 〈m〉 and q = n – 〈n〉, 〈• 〉 means rounding the element to the nearest integer towards minus infinity.

5. Experimental Results

The experiments are conducted on the platform of Ladybug3. Camera h and Camera l are installed with a baseline of about 37cm in the vertical direction. The resolution of the images captured by Ladybug3 can reach up to 5,400×2,700, which enables us to obtain a more precise calibration result. We carry out the experiments in two different scenes and show one of them for rectification and the other for 3D reconstruction.

5.1 Calibration

The method we proposed requires image pairs of the 2D target plane in a few different views. In this experiment, a target plane with a printed chessboard pattern is used. The target plane has 11 × 11 squares and the side of each square is 30mm. We get nine pairs of stereo images in total. In this experiment, we have done some statistical analysis on the calibration results. The rotation part of the calibration result is expressed in Euler angles (α, β, γ), and the translation part is expressed as a 3D vector (t_x, t_y, t_z). In Table 1, we have shown the variation of the calibration results obtained from different numbers of images. The results shown in Table 1 are obtained by using Eq. 29 and Eq. 30. As the method using Eq. 29 requires at least three views of the target plane to give the right results, we set the ‘Num of images’ to increase from 3. We can see that the σ of the results given by Eq. 29 is generally larger than that produced by Eq. 30.

Table 1.

Results using real data of 3 to 9 images using Eq. 29 and Eq. 30

Num of images		Using Eq. 29						Using Eq. 30
Num of images		α	β (10⁻³)	γ	t_x	t_y (in mm)	t_z	α	β (10⁻³)	γ	t_x	t_y (in mm)	t_z
3	Initial	−3.92	−4.16	−4.45	−2.98	−94.84	350.47	−3.92	−4.16	−4.45	−2.98	−94.84	350.47
	Final	−3.92	−4.16	−4.45	−1.81	−94.44	349.94	−0.06	−5.48	−5.62	−0.43	−95.93	351.47
	σ	4.80	5.50	2.90	3.16	2.90	6.22	1.50	6.50	3.10	2.18	2.47	5.02

4	Initial	−2.12	−0.80	−3.58	−2.62	−95.00	353.67	−2.12	−0.80	−3.58	−2.62	−95.00	353.67
	Final	−2.19	−0.36	−3.25	−1.98	−95.06	351.59	0.19	−3.56	−4.04	−0.01	−96.90	352.66
	σ	3.10	3.60	3.10	3.30	2.63	4.94	1.40	5.00	2.90	1.99	2.64	4.23

5	Initial	−0.09	−0.91	−3.16	−3.19	−96.61	349.79	−0.09	−0.91	−3.16	−3.19	−96.61	349.79
	Final	−1.07	0.37	−3.38	−2.10	−95.30	351.28	0.01	−1.86	−3.50	0.34	−97.16	353.57
	σ	2.20	3.60	3.10	3.40	2.56	5.16	1.40	4.00	3.00	1.88	2.74	3.77

6	Initial	0.44	−0.56	−4.20	−4.21	−95.26	349.06	0.44	−0.56	−4.20	−4.21	−95.26	349.06
	Final	−0.98	−1.45	−2.87	−2.38	−95.42	349.79	0.27	−1.15	−2.77	−0.10	−97.63	354.01
	σ	2.10	3.90	3.30	3.63	2.53	6.35	1.30	3.80	3.30	2.02	2.98	3.62

7	Initial	−0.72	0.50	−4.61	−1.76	−93.14	352.16	−0.72	0.50	−4.61	−1.76	−93.14	352.16
	Final	−11.10	3.13	2.08	−1.37	−97.51	354.04	−0.62	0.26	−3.91	0.54	−96.62	355.08
	σ	11.80	4.80	7.10	2.81	2.91	3.61	1.80	3.60	2.90	1.84	2.55	3.44

8	Initial	−1.25	1.38	−3.92	−1.29	−94.73	354.53	−1.25	1.38	−3.92	−1.29	−94.73	354.53
	Final	−9.92	0.42	0.45	−0.94	−96.97	353.19	−0.17	0.04	−3.46	0.64	−96.93	354.97
	σ	10.70	3.60	5.70	2.50	2.67	3.95	1.50	3.60	3.00	1.83	2.65	3.44

9	Initial	−0.45	1.08	−3.60	−1.74	−95.44	352.10	−0.45	1.08	−3.60	−1.74	−95.44	352.10
	Final	−7.52	2.46	−0.42	−1.25	−96.40	353.82	0.45	0.40	−2.99	0.87	−97.26	355.16
	σ	8.30	4.40	4.90	2.72	2.51	3.68	1.30	3.60	3.20	1.83	2.79	3.44

In order to investigate the stability of the proposed methods, we have applied the two methods using Eq. 29 and Eq. 30 to all combinations of eight images from the available nine images. We use Exc. i to denote the combination with the ith image excluded. For example, Exc. 9 means the results come from the images of 1–8. All the results are shown in Table 2 and Table 3. We find that the deviations from both methods are quite small and it implies that both the proposed algorithms can be considered stable. Since the deviation of the results given by Eq. 29 is still slightly larger, this may indicate that the method using Eq. 30 is probably more robust.

Table 2.

Variation of the calibration results among all continuous 8 images using Eq. 29

	Exc. 1	Exc. 2	Exc. 3	Exc. 4	Exc. 5	Exc. 6	Exc. 7	Exc. 8	Exc. 9	Mean	σ
α(10⁻³)	−1.54	−7.82	−7.31	−7.72	−7.35	−4.60	−1.08	−0.82	−9.77	−5.33	3.21
β(10⁻³)	2.93	2.89	2.13	1.96	3.08	3.53	0.89	1.03	0.40	2.10	1.05
γ(10⁻³)	−0.28	−1.62	−0.84	−0.72	−1.22	−2.32	−3.10	−1.37	0.29	−1.24	0.97
t_x(mm)	0.19	0.29	−0.57	−0.56	−0.16	0.47	−2.62	−0.97	−0.73	−0.52	0.88
t_y(mm)	−96.34	−92.14	−93.26	−93.39	−92.97	−94.06	−95.00	−94.85	−93.40	−93.93	1.20
t_z(mm)	354.83	355.88	354.10	354.10	355.61	355.80	350.97	352.33	354.17	354.20	1.55

Table 3.

Variation of the calibration results among all continuous 8 images using Eq. 30

	Exc. 1	Exc. 2	Exc. 3	Exc. 4	Exc. 5	Exc. 6	Exc. 7	Exc. 8	Exc. 9	Mean	σ
α(10⁻³)	2.32	0.47	0.45	0.53	0.34	−0.40	−0.07	−0.08	−0.17	0.38	0.75
β(10⁻³)	1.71	0.65	0.47	0.30	0.62	0.84	−0.67	0.55	0.04	0.50	0.60
γ(10⁻³)	−1.13	−3.07	−3.33	−3.16	−3.00	−2.90	−2.47	−3.17	−3.46	−2.85	0.66
t_x(mm)	1.75	0.97	0.66	0.60	0.69	1.31	−0.05	0.70	0.32	0.77	0.50
t_y(mm)	−99.72	−97.32	−97.06	−97.26	−97.39	−97.20	−97.74	−97.11	−96.87	−97.52	0.81
t_z(mm)	356.00	355.35	355.21	354.97	355.40	355.56	354.24	355.21	354.99	355.21	0.45

We also compare the results from our method with other previous methods in Table 4. The existing algorithms or toolboxes are mostly designed specifically for dioptric or catadioptric cameras. Their contributions are mainly concerned with the estimation of the parameters of the projection model. The Ladybug3 system is not only an OMS without intrinsic parameters, but also with a different projection model. In order to apply the previous methods on the Ladybug3 system, we partly modify the methods, including removing the calculation of the intrinsic parameters and replacing the projection model. We compare two kinds of the existing methods with ours In Table 4. One is the planar-grid-based (PGB) method similar to [9], in Table 4. The other is the line-based (LB) method similar to [15]. The PGB and LB methods in Table 4 are followed by Eq. 29 or Eq. 30, which means Eq. 29 or Eq. 30 is chosen as the object function for the optimization of the stereo calibration.

Table 4.

Comparison of our calibration method with other previous ones

Method	α	β (10–3)	γ	t_x	t_y (in mm)	t_z	baseline (in mm)
Our Method (Eq. 29)	−7.52	2.46	−0.42	−1.25	−96.40	353.82	366.72
Our Method (Eq. 30)	0.45	0.40	−2.99	0.87	−97.26	355.16	368.24
PGB (Eq. 29)	1.68	83.85	49.61	31.64	−93.03	364.07	377.10
PGB (Eq. 30)	−2.79	75.60	43.54	28.48	−94.54	358.01	371.38
LB (Eq. 29)	−1.88	3.40	2.85	0.33	−101.90	355.06	369.39
LB (Eq. 30)	3.26	2.09	−0.18	0.84	−101.10	356.66	370.71

5.2 Stereo Rectification

Figure 6 is the original image pair captured by Ladybug3 in this experiment and Figure 7 shows the rectified images from our rectification method. We line up the correspondences which are found and matched using Scale-invariant feature transform (SIFT). According to Figure 7, we can see that the correspondences fall on the same columns in the rectified pair. In order to make the results clear, we make some comparisons between the picture blocks from an unrectified stereo pair and a rectified stereo pair.

Figure 6.

Original image pair

Figure 7.

Rectified image pair

Figure 8(a) presents the details from the unrectified image pair. It is clearly shown that the stereo correspondences are not on the same line. Consequently, the performance of stereo matching will be poor.

Figure 8.

Comparison between the picture blocks from an unrectified stereo pair and a rectified stereo pair. Picture blocks in the first row are from Camera h and those in the second row are from Camera l .

Figure 9.

Experiment of 3D reconstruction.

Figure 8(b) presents the details from the rectified image pair. We can observe that the stereo correspondences fall on the same line after rectification. Comparing the details from Figure 8(a) and Figure 8(b), we can conclude that our method of calibration and rectification is effective.

We also record the time required to perform the rectification on a PC platform. The average time to rectify a 2700×1350 grey image is 2.66 seconds, while the average time for a coloured image of the same size is 9.29 seconds. All experiments were carried out on a server with an Intel Xeon X5560 CPU (2.80GHz), 12G RAM, running Matlab 2009 64bit. For an FPGA with an 80MHz system clock, it theoretically takes 228ms to rectify a 2700×1350 grey image pair by taking advantage of the parallel computing and the pipeline framework. For a coloured image of the same size, the time increases to 0.68s.

5.3 3D reconstruction

In this experiment, we have also obtained the 3D reconstruction results of the example scene by calculating the intersection of two rays using equation [29]. The virtual bird's-eye view (by IPM) is shown in (a). The reconstruction results can be seen in (b). As the experiments are carried out in the corner of the room, we should observe two walls perpendicular to each other in the bird's-eye view of the reconstruction result. (b) strongly agrees with this inference. In the point cloud, the two perpendicular walls, the desks and the chair can be clearly seen. We have obtained the 360° 3D reconstruction results of the example scene and the results are promising.

6. Conclusion

In this paper, we have developed an easy stereo calibration and rectification method for omnidirectional multi-camera systems. In the calibration procedure, iterative optimization is adopted to reduce the influence of noise on the calibration results. We give the closed-form solution based on the projective properties of equidistant parallel lines as the initial guess for iterative optimization. We have tested the algorithm with real data and carried out some analysis on the calibration results. According to the analysis, we believe that the method we propose is robust and reliable. Furthermore, we have applied the scanline rectification method we propose to the unrectified images. By comparing the details before and after rectification, we can conclude that the calibration results are precise enough to support rectification and that the rectification method is well suited to the OMS. Generally, we can conclude that the calibration and the rectification methods for the OMS are effective and promising.

Footnotes

7. Acknowledgements

This research work was supported in part by the National Natural Science Foundation of China via grant 61001171, 60534070 and 90820306, and by the Chinese Universities Scientific Fund.

References

Heller

Pajdla

(2009) Stereographic rectification of omnidirectional stereo pairs. In Computer Vision and Pattern Recognition. CVPR 2009. IEEE 2009:1414–1421.

Fusiello

Trucco

Verri

(2000) A compact algorithm for rectification of stereo pairs. Machine Vision and Applications 2000, 12:16–22.

Abraham

Förstner

(2005) Fish-eye-stereo calibration and epipolar rectification. ISPRS Journal of photogrammetry and remote sensing 2005, 59(5):278–288.

Ladybug3 spherical digital video camera system. http://www.ptgrey.com/products/ladybug3/.

Yanchang

(2011) A unified rectification method for single viewpoint multi-camera system. In Advanced Video and Signal-Based Surveillance (AVSS), 2011 8th IEEE International Conference on. 2011.

Puig

Bermúdez

Sturm

Guerrero

(2012) Calibration of omnidirectional cameras in practice: A comparison of methods. Computer Vision and Image Understanding 2012, 116:120–137.

Zhang

(1999) Flexible camera calibration by viewing a plane from unknown orientations. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, Volume 1, IEEE 1999:666–673.

Scaramuzza

Martinelli

Siegwart

(2006) A Flexible Technique for Accurate Omnidirectional Camera Calibration and Structure from Motion. In Proceedings of the Fourth IEEE International Conference on Computer Vision Systems, ICVS '06 2006:45–52.

Mei

Rives

(2007) Single View Point Omnidirectional Camera Calibration from Planar Grids. In Proceedings of IEEE International Conference on Robotics and Automation: 10–14 April 2007; Roma 2007.

10.

Geyer

Daniilidis

(2000) A unifying theory for central panoramic systems and practical implications. In European Conference on Computer Vision (ECCV), Springer 2000:445–461.

11.

Barreto

Araujo

(2001) Issues on the geometry of central catadioptric image formation. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, Volume 2, IEEE 2001:II–422.

12.

Gasparini

Sturm

Barreto

(2009) Plane-based calibration of central catadioptric cameras. In Computer Vision, 2009 IEEE 12th International Conference on, IEEE 2009:1195–1202.

13.

Deng

(2007) An easy calibration method for central catadioptric cameras. Acta automatic sinica 2007, 33(8):801–808.

14.

Ikeda

Sato

Yokoya

(2003) A calibration method for an omnidirectional multi-camera system. In Proc. SPIE, Volume 5006 2003:499–507.

15.

Geyer

Daniilidis

(1999) Catadioptric Camera Calibration. In Proc. Seventh Int'l Conf. Computer Vision, vol. I, pp. 398–404, 1999.

16.

Ying

(2004) Catadioptric camera calibration using geometric invariants. Pattern Analysis and Machine Intelligence, IEEE Transactions on 2004, 26(10):1260–1271.

17.

Barreto

Araujo

(2005) Geometric properties of central catadioptric line images and their application in calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on 2005, 27(8):1327–1333.

18.

Caglioti

Taddei

Boracchi

Gasparini

Giusti

(2007) Single-image calibration of off-axis catadioptric cameras using lines. In Proceedings of IEEE 11th International Conference on Computer Vision, IEEE 2007:1–6.

19.

Duan

(2008) A new linear algorithm for calibrating central catadioptric cameras. Pattern Recognition 2008, 41(10):3166–3172.

20.

Micusik

Pajdla

(2006) Structure from motion with wide circular field of view cameras. Pattern Analysis and Machine Intelligence, IEEE Transactions on 2006, 28(7):1135–1149.

21.

Ramalingam

Sturm

Lodha

(2010) Generic self-calibration of central cameras. Computer Vision and Image Understanding 2010, 114(2):210–219.

22.

Espuny

Gil

Burgos JI

(2011) Generic Self-calibration of Central Cameras from Two Rotational Flows. Int. J. Comput. Vision 2011, 91(2):131–145.

23.

Geyer

Daniilidis

(2003) Conformal rectification of omnidirectional stereo pairs. In Computer Vision and Pattern Recognition Workshop, 2003. CVPRW'03. Conference on, Volume 7, IEEE 2003:73–73.

24.

Gonzalez-Barbosa

Lacroix

(2005) Fast dense panoramic stereovision. In Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on, IEEE 2005:1210–1215. 18

25.

Takiguchi

Yoshida

Takeya

Eino

Hashizume

(2002) High precision range estimation from an omni-directional stereo system. In Intelligent Robots and Systems, 2002. IEEE/RSJ International Conference on, Volume 1, IEEE 2002:263–268.

26.

Kangni

Laganiere

(2006) Epipolar Geometry for the Rectification of Cubic Panoramas. In The 3^rd Canadian Conference on Computer and Robot Vision. 2006:70.

27.

Levenberg

(1944) A method for the solution of certain problems in least squares. Quarterly of applied mathematics 1944, 2:164–168.

28.

Marquardt

(1963) An algorithm for least-squares estimation of nonlinear parameters. Journal of the society for Industrial and Applied Mathematics 1963, 11(2):431–441.

29.

Hill

(1994) The pleasures of ‘perp dot’ products. Graphics gems IV 1994:138–148.

Stereo Calibration and Rectification for Omnidirectional Multi-Camera Systems

Abstract

Keywords

1. Introduction

2. Related Work

2.1 Calibration Methods for Omnidirectional Cameras

2.1.1 Planar-grid-based Calibration Methods

2.1.2 Line-based Calibration Methods

2.1.3 Non-prior-knowledge Calibration Methods

2.2 Rectification Methods for Omnidirectional Cameras

3. Stereo Calibration of OMSs

3.1 The Ladybug3 Omnidirectional Camera System

3.2 Independent Extrinsic Calibration

3.2.1 Initialization of Extrinsic Parameters in Each View

3.2.2 Optimization of Extrinsic Parameters

3.3 Stereo System Calibration

4. Stereo Rectification Method

4.1 Geometric Framework

4.2 Rectification with Interpolation

5. Experimental Results

5.1 Calibration

5.2 Stereo Rectification

5.3 3D reconstruction

6. Conclusion

Footnotes

7. Acknowledgements

References