Oriented Edge-Based Feature Descriptor for Multi-Sensor Image Alignment and Enhancement

Abstract

In this paper, we present an efficient image alignment and enhancement method for multi-sensor images. The shape of the object captured in a multi-sensor images can be determined by comparing variability of contrast using corresponding edges across multi-sensor image. Using this cue, we construct a robust feature descriptor based on the magnitudes of the oriented edges. Our proposed method enables fast image alignment by identifying matching features in multi-sensor images. We enhance the aligned multi-sensor images through the fusion of the salient regions from each image. The results of stitching the multi-sensor images and their enhancement demonstrate that our proposed method can align and enhance multi-sensor images more efficiently than previous methods.

Keywords

Feature Descriptor Multi-sensor Images Image Alignment Image Enhancement

1. Introduction

For autonomous robots, fusion of multi-sensor images provides more useful information regarding an environment than a single-sensor image, because different sensors can be used to provide different information. Specifically, since infrared (IR) images provide detailed information about dark or smoky areas that is not available in charge coupled device (CCD) images, they are commonly used to enhance CCD images [1 –3]. Lex et al. [1] proposed the dehazing method in the fog-shrouded area of the CCD image using a Near IR (NIR) image. Zhang et al. [3] enhanced CCD image dark area using a corresponding NIR image. For the fusion of multi-sensor images, precise alignment of the images is the most important step in the processing procedure. However, because the intensities of IR images are determined by the temperatures of the captured objects, whereas the intensities of the CCD images are determined by the colours, the intensities of the same object in the IR and the CCD images are represented by different values and in different proportions; this difference makes proper alignment challenging. In addition, it is difficult to capture CCD and IR images using the same device for the pre-alignment because of the hardware differences in the CCD and IR capturing devices. Therefore, alignment of the images is a required part of pre-processing for their enhancement.

Figure 1.

Examples of four oriented edge maps of CCD and IR images (top: examples of CCD images; bottom: examples of IR images). The oriented edge maps are generated according to four directions: 22.5°∼67.5°, 67.5°∼112.5°, 112.5°∼157.5°, 157.5°∼202.5° from left to right.

For the alignment of multi-sensor images, most previous works minimized the error in the overlapped area using edge maps [4], oriented edge maps [5], edge contours and mutual information (MI) [5 –10]. In this approach, the features used for error minimization are designed as invariant to brightness changes induced by the physical characteristics of various sensors. However, regardless of their successive aligned results, it is hard to use the figures in practice, due to the computational expense necessary for global optimization. On the other hand, feature-based approaches, such as SIFT [11, 12] and SURF [13], have also been proposed for performing fast alignment, and have yielded promising results. However, these methods are not suitable for aligning multi-sensor images because of the intensity and gradient differences between the CCD and IR images.

To solve these problems, we propose a robust feature descriptor for multi-sensor image alignment. Since the proposed alignment method is based on feature matching between images, the aligned result is achieved in a few seconds, without the need for global iterative optimization. To overcome the difference in properties among the multi-sensor images, our descriptor uses the magnitudes of the oriented edge maps, which are independent to the expressional values of the pixels. The final descriptor formula is constructed by using geometric blur with a coarse-to-fine strategy. Following the alignment of the IR and CCD images, we are also able to enhance the CCD image by combining its salient region with the corresponding one in the IR image. The enhanced result represents more detailed information than either CCD or the IR images alone.

The rest of this paper is organized as follows. Section 2 presents the proposed alignment method for multi-sensor images. In Section 3, we describe the enhancement of a CCD image using its aligned IR image. In Section 4, we show that our aligned result is accurate, and is achieved with a faster computational time than previous methods. We also show that the enhanced image not only has natural variations but also retains details from the IR image.

2. Multi-sensor Image Alignment

In multi-sensor images, the recorded intensity range and the values for the specific objects vary depending on the capture device. However, most object shapes appear in multi-sensor images with different contrasts. For the detection of feature points, we use a Hessian Detector using Integral Images [13], which is one of the most popular fast feature detection methods available. By using the feature locations detected and the scales of the features, we generate proposed feature descriptors for matching the images.

To construct the feature descriptor, we use the orientation properties of the signals around the detected features. The orientation of each signal is constructed from the edge around it. With the assumption of gradient preservation, we use the features in the oriented edge maps [14] constructed from the oriented map [5] of the input images. The oriented map is generated by determining the edge direction at the pixel that belongs to the edge group. To decide whether an arbitrary pixel belongs to the edge group or not, we use an eigenanalysis [5, 15] of a structure matrix E at each pixel as

E = (\begin{matrix} ω_{1} & ω_{2} \end{matrix}) (\begin{matrix} λ_{1} & 0 \\ 0 & λ_{2} \end{matrix}) (\begin{matrix} ω_{1}^{T} \\ ω_{2}^{T} \end{matrix})

(1)

where ω₁ and ω₂ denote the eigenvectors representing the maximum and minum directions of intensity variations, respectively, and λ₁ and λ₂ are the eigenvalues which represent the magnitudes ω₁ and ω₂, respectively. Since the pixels belonging to the edge group have large differences betwen their two eivenvalues, we determine edge pixels as follows:

(λ_{1} - λ_{2}) \cdot \frac{λ_{1} - λ_{2}}{λ_{1} + λ_{2}} > T_{e d g e}, where λ_{1} \geq λ_{2} \geq 0

(2)

T_edge is the threshold value which is chosen as the mean of the local variances (see [15] for more details).

The oriented map is constructed by the orientations of edge pixels as

θ_{x, y} = \arctan (\frac{d y}{d x})

(3)

The oriented edge maps are generated using the edge magnitude according to the orientation angle θ defined in (3). We quantize the orientation angle into k directions; then, the edge magnitude of each pixel is assigned to one of the oriented edge maps according to its quantized orientation angle:

I_{e}^{i} (x, y) = {\begin{matrix} M (x, y) \\ 0 \end{matrix} \begin{matrix} if θ \in q_{i} \\ otherwise \end{matrix}

(4)

where I_eⁱ is the i th oriented edge map, which is constructed with the quantized angle range q_i. M(x, y) is the magnitude of the gradient at the pixel, I (x, y), which depends on the assigned orientation angle θ_{x, y}. In our experiments, we used $M (x, y) = \sqrt{d x^{2} + d y^{2}}$ . We also quantized the orientation angle into four directions for the k value. Figure 1 shows examples of CCD and IR images and their oriented edge maps in our experiments.

The feature descriptor is constructed by sampling around the feature in the oriented edge maps using geometric blur [14, 16]. The geometric blur descriptor provides discriminative information by using a spatially varying kernel. Given one of the oriented edge maps, I_e, the sampled information at the central feature point, x_c, is defined by a Gaussian blur with a varying standard deviation as

B_{x_{c}} (x) = I_{e} \circ G_{α ‖ x_{c} - x ‖ + β}

(5)

where x is the signal around the feature point, x_c, and ◯ is the convolution operator. α and β are constants that determine the amount of blur. In our experiments, we used the values α=0.5 and β=1.0.

Figure 2.

Sampling strategy for constructing a feature descriptor using geometric blur

To create a feature descriptor, we sample 51 pixels from each oriented edge map using six scales and 30-degree intervals, as shown in Figure 2. We concatenate the sampled pixels, thereby creating a descriptor with dimensions 51×k.

To ensure scale invariance in feature matching, we generate feature descriptors with a scaling 1.1 times that of the oriented edge map. Feature descriptors for one feature point are defined by six descriptors with dimensions 51×k. The feature matching of our descriptors is applied in the same manner as in [13]. The initial homography between each input image pair is calculated using RANSAC [11], and then is adjusted in the bundle phase.

3. Multi-Sensor Image Enhancement

In this Section, we propose an image enhancement method using mid-wavelength IR (MWIR) or long-wavelength IR (MWIR) images. In the CCD image, we detect the regions which need to be enhanced. Then, we combine these regions with those in the IR image, while the same regions in the IR image may have more details. Intuitively, regions which are too bright or dark with low saturation incur a loss of detail [3]. On this basis, the salient region, R_salient can be calculated based on saturation and brightness value:

\begin{matrix} R_{s a l i e n t} = R_{s} \cdot R_{v}, \\ and R_{s} = 1 - e^{- p_{s} c_{s}}, R_{v} = 1 - e^{- p_{v} c_{v}} \end{matrix}

(6)

Figure 3.

Example of the proposed salient region computation.

where R_s and R_v denote weights of saturation and brightness, p_s and p_v are the appearance probabilities of the saturation and brightness, and c_s and c_v are location weights of the saturation and brightness in the image, respectively. The appearance probabilites p_s and p_v enale a wide area distribution of the salient regions. These values are obtained from the normalized histograms of the S and V channels. For the regtions to be enhanced, the location weights c_s and c_v delineate the regions that are too bright or too dark with low saturation.

\begin{array}{l} c_{s} = 1.0 - F (s), \\ c_{v} = | 1.0 - 2 F (v) |, \end{array}

(7)

where F(·) is an accumulated histogram function. Figure 3 shows one of our salient region results and weights used. We enhance the higher values in the salient region using the aligned IR image, and vice versa.

In the IR image, the caputred objects are recorded by their temperatures. This means that the major objects may have higher temperatures, which puts them in the salient regions. Therefore, we directly construct the salient region weights of the IR image by using its normalized pixel values. For smoothing out variation between the salient and non-salient pixels, we map the pixel values to the salient region weight W_salient with two thresholds.

W_{s a l i e n t} = {\begin{matrix} 1 & i f & I_{I R} \geq T_{h i g h} \\ \frac{I_{I R} - T_{l o w}}{T_{h i g h} - T_{l o w}} & i f & T_{l o w} < I_{I R} < T_{h i g h} \\ 0 & o t h e r w i s e \end{matrix}

(8)

Figure 4.

Comparison of feature matching results between CCD and MWIR images: red lines are classified as inliers by RANSAC, while blue lines are outliers.

where I_IR is the pixel value in the IR image, and T_high and T_low are the thresholds to control the range of the salient region values with T_low ≤T_high. When both of the thresholds have lower values, the enhanced result will have detailed expressions in the salient regions of the IR image. On the other hand, only strong pixel regions in the IR image will be detailed in the enhanced result when both of the thresholds have higher values. The weights between two thresholds increase from 0 (T_low) to 1 (T_high) linearly. In our experiments, we set T_high =0.5 and T_low =0.1 for the normalized IR pixel range.

Given the aligned CCD and IR images with their salient regions, we enhance the CCD image by adjusting the brightness of the salient regions in the CCD image using the salient region weight of the IR image. The brightness of the CCD image is replaced with:

\begin{array}{l} V_{C C D}^{'} = (R_{s a l i e n t} \cdot W_{s a l i e n t} - 1) \cdot V_{C C D} \\ + R_{s a l i e n t} \cdot W_{s a l i e n t} \cdot I_{I R} \end{array}

(9)

where V_CCD and V'_CCD are the brightness and enhanced brightness of the CCD image.

Figure 5.

Aligned results of the proposed method applied to IR and CCD Images.

4. Experiment Results

In our experiments, we used CCD and IR images for multi-sensor image alignment. The IR images were mid-wavelength IR (MWIR), which spanned the 3–5 µm wavelength range, and long-wavelength IR (LWIR), which spanned 8–14 µm. They were captured with cooled technology by using a detector from Sofradir. All images had a uniform sizes of 640×480 pixels and were captured in an open space in which houses, factories, buildings, trees, roads with electrical fences and a bridge were featured.

To evaluate the matching performance of the proposed descriptor, we compared our method to other feature-based approaches such as SIFT [11, 12] and SURF [13]. Figure 4 shows one of the matching comparisons. For performance visibility, we classified each matching line as either an inlier or an outlier by using RANSAC [11]: the red line signifies an inlier and the blue line an outlier. In Figure 4, the captured objects – house, fences, telegraph poles – have enough features such as corners and edges. However, it can be seen that SIFT and SURF cannot find enough accurate feature matching results, while our descriptor results in sufficient matched features. Table 1 compares the matching performance of our method with that of SIFT and SURF according to the type of input image pairs. Our method yielded sufficient inlier numbers; the number of inliers was greater for our method than for the others. Our method therefore yields more accurately aligned results.

Figure 6.

Comparison of computational time with MI-based method – a global optimization method – according to image size variation.

Table 1.

Comparison of matching feature numbers

Input Images	SIFT [1]	SURF [2]	Our method
CCD + LWIR	2 / 8	4 / 46	125 / 318
CCD + MWIR	1 / 9	4 / 31	17 / 113
LWIR + MWIR	340 / 345	218 / 280	1212 / 1726
(Inlier num. / Total matching num.)

Figure 5 shows the alignment results of the proposed method. To confirm the continuity of the overlapped edges, each result is displayed by using the composition of the two spliced input images in turn. Due to use of feature matching rather than global optimization, the computational time of the proposed method is very short, as shown in Figure 6. Accurate aligned results are achieved, as shown in Figure 5. It can be seen that our method was about 300 times faster than the MI-based method, one general alignment method that uses global optimization.

Testing our method in respect to another application, we generated stitched images using two different partially overlapping sensor images, as shown in Figure 7. Since we were able to match enough features in the overlapped area, an accurate stitched result was achieved.

Figure 7.

Stitched result using the proposed method.

Figure 8.

Comparison between BEMD fusion and the proposed method.

To evaluate our enhanced results, we compared our result with the Bi-dimensional Empirical Mode Decomposition (BEMD) fusion method [17]. As shown in Figure 8, our enhanced result provides not only detailed information from the IR image (the smoky region) but also natural variations (a cloudy sky). In most previous image fusion methods, the enhanced result values are too low (e.g. a cloudy sky in Figure 8(b)) due to the dark region in the IR image, which may combine low importance with low temperatures, while the same region has high values in the CCD image. However, since we used the pixel values in the IR image as adjusted by the salient region weights, our result features natural variations with enhanced details. Figure 9 and 10 show additional enhanced results when using the proposed method. The enhanced results show more detail with fusion between the CCD and IR images. However, the results are limited in that they lose contrast in some regions, such as the wires in Figure 10, since we only enhanced the brightness of the salient regions in the CCD image.

Figure 9.

Enhanced result #1 of the proposed method.

Figure 10.

Enhanced result #2 of the proposed method.

4. Conclusion

This paper has presented alignment and enhancement methods for multi-sensor images such as CCD and IR images. For the alignment, we used a geometric descriptor for feature matching based on shape similarities between input pairs. In our enhancement process, we enhanced the CCD image using its aligned IR image. With the salient region weights in both CCD and IR images, our enhanced result has not only enhanced detail but also natural variations. Our method is undemanding of terms of processing power and shows the desirable aligned results. Therefore, we expect that this method of alignment between multi-sensor images may quickly become widely adopted for general application. However, it should be noted that when matching features are not sufficiently clearly detected due to noise or lack of distinctive shapes, our method cannot adequately align the input image pairs.

Footnotes

6. Acknowledgments

This work was partly supported by Defense Acquisition Program Administration and Agency for Defense Development under the contract (UD100001ID)

References

Lex

Clement

Sabine

, (2009) Color Image Dehazing using the Near-Infrared. IEEE Interational Conf. on Image Processing. 1629–1632.

Sun

Tang

, (2009) Single Image Haze Removal Using Dark Channel Prior. Proc. IEEE Conf. on Computer Vision and Pattern Recognition. 33(12): 2341–2353.

Zhang

X. P.

Sim

Miao

X. P.

(2008) Enhancing Photographs with Near Infrared Images. Proc. IEEE Conf. on Computer Vision and Pattern Recognition. 1–8.

Sharma

Pavel

, (1997) Registration of video sequences from multiple sensors. In Proceedings of Image Registration Workshop. 361–366.

Kim

Y. S.

Lee

J. H.

J. B.

(2008) Multi-sensor image registration based on intensity and edge orientation information. Pattern recognition. 41(11):3356–3365.

Peng

H. C.

Long

Ding

, (2005) Feature selection based on mutual information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. on Pattern Analysis and Machine Intelligence. 27(8):1226–1238.

Studholme

Hill

D. L. G.

Hawkes

D. J.

(1999) An overlap invariant entropy measure of 3D medical image alignment. Pattern Recognition. 32(1):71–86.

Rueckert

Clarkson

M. J.

Hill

D. L. G.

Hawkes

D. J.

(2000) Non-rigid registration using higher-order mutual information. In Proc. of SPIE Medical Imaging. 438–447.

Gan

Chung

A. C. S.

(2005) Multi-dimensional mutual information based robust image registration using maximum distance-gradient-magnitude. In Proc. of Information Processing in Medical Imaging. 210–221.

10.

Guo

, (2006) Multi-modality image registration using mutual information based on gradient vector flow. In Proc. of the 18th International Conference on Pattern Recognition. 697–700.

11.

Lowe

, (1999) Object recognition from local scale-invariant features. Proc. Int. Conf. on Computer Vision. 2:1150–1157.

12.

Lowe

, (2004) Distinctive image features from scale-invariant keypoints, cascade ltering ap-proach. IJCV. 60(2):91–110.

13.

Bay

Tuytelaars

Gool

L. V.

(2006) SURF: Speeded Up Robust Features. Proc. European Conference on Computer Vision. 404–417.

14.

Berg

A. C.

Malik

, (2005) Shape Matching and Object Recognition using Low Distortion Correspondence. Proc. IEEE Conf. on Computer Vision and Pattern Recognition. 26–33.

15.

Harris

Stephens

, (1988) A combined corner and edge detector. Proc. in Alvey Vision Conf., 147–151.

16.

Martin

Fowlkes

Malik

, (2004) Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, 26(5):530–549.

17.

Zhang

X.-Q

, (2009) Comparison of EMD Based Image Fusion Methods. International Conference on Computer and Automation Enginerring, 302–305.