Abstract
In this paper, we present an efficient image alignment and enhancement method for multi-sensor images. The shape of an object captured by multiple sensors can be recovered by comparing the contrast variability of its corresponding edges across the images. Using this cue, we construct a robust feature descriptor based on the magnitudes of oriented edges. The proposed method enables fast image alignment by identifying matching features across multi-sensor images. We then enhance the aligned multi-sensor images by fusing the salient regions of each image. The results of stitching and enhancing multi-sensor images demonstrate that our proposed method aligns and enhances them more efficiently than previous methods.
1. Introduction
For autonomous robots, fusion of multi-sensor images provides more useful information about an environment than a single-sensor image, because different sensors provide different kinds of information. Specifically, since infrared (IR) images provide detailed information about dark or smoky areas that is not available in charge-coupled device (CCD) images, they are commonly used to enhance CCD images [1–3]. Lex et al. [1] proposed a dehazing method for fog-shrouded areas of a CCD image using a near-IR (NIR) image. Zhang et al. [3] enhanced the dark areas of a CCD image using a corresponding NIR image. For the fusion of multi-sensor images, precise alignment of the images is the most important step in the processing procedure. However, because the intensities of IR images are determined by the temperatures of the captured objects, whereas the intensities of CCD images are determined by their colours, the same object is represented by different intensity values, and in different proportions, in the IR and CCD images; this difference makes proper alignment challenging. In addition, it is difficult to capture CCD and IR images with the same device for pre-alignment because of hardware differences between CCD and IR capturing devices. Therefore, alignment of the images is a required pre-processing step for their enhancement.

Examples of four oriented edge maps of CCD and IR images (top: examples of CCD images; bottom: examples of IR images). The oriented edge maps are generated according to four directions: 22.5°∼67.5°, 67.5°∼112.5°, 112.5°∼157.5°, 157.5°∼202.5° from left to right.
For the alignment of multi-sensor images, most previous works minimized the error in the overlapped area using edge maps [4], oriented edge maps [5], edge contours and mutual information (MI) [5–10]. In this approach, the features used for error minimization are designed to be invariant to the brightness changes induced by the physical characteristics of the various sensors. However, despite their successful alignment results, these methods are hard to use in practice because of the computational expense of global optimization. On the other hand, feature-based approaches, such as SIFT [11, 12] and SURF [13], have also been proposed for fast alignment and have yielded promising results. However, these methods are not suitable for aligning multi-sensor images because of the intensity and gradient differences between the CCD and IR images.
To solve these problems, we propose a robust feature descriptor for multi-sensor image alignment. Since the proposed alignment method is based on feature matching between images, the aligned result is achieved in a few seconds, without the need for global iterative optimization. To overcome the differing properties of the multi-sensor images, our descriptor uses the magnitudes of the oriented edge maps, which are independent of the pixels' recorded values. The final descriptor is constructed using geometric blur with a coarse-to-fine strategy. After aligning the IR and CCD images, we can also enhance the CCD image by combining its salient region with the corresponding one in the IR image. The enhanced result contains more detailed information than either the CCD or the IR image alone.
The rest of this paper is organized as follows. Section 2 presents the proposed alignment method for multi-sensor images. In Section 3, we describe the enhancement of a CCD image using its aligned IR image. In Section 4, we show that our alignment is accurate and is achieved with a shorter computational time than previous methods. We also show that the enhanced image not only exhibits natural variations but also retains details from the IR image.
2. Multi-sensor Image Alignment
In multi-sensor images, the recorded intensity range and the values assigned to specific objects vary depending on the capture device. However, most object shapes appear in multi-sensor images, albeit with different contrasts. For the detection of feature points, we use a Hessian detector based on integral images [13], one of the most popular fast feature detection methods. Using the detected feature locations and their scales, we generate the proposed feature descriptors for matching the images.
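The speed of the Hessian detector in [13] comes from integral images, which make any box-filter response a constant-time operation regardless of filter size. A minimal NumPy sketch of this building block (function names are ours, not the paper's):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero-padded first row/column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1, c0:c1] from four table lookups, O(1) per box."""
    return ii[r1, c1] - ii[r0, c1] - ii[r1, c0] + ii[r0, c0]
```

Because every box sum costs the same four lookups, the box-filter approximations of the Hessian in [13] can be evaluated at all scales without rescaling the image.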
To construct the feature descriptor, we use the orientation properties of the signals around the detected features. The orientation of each signal is derived from the edges around it. Under the assumption of gradient preservation, we use the features in the oriented edge maps [14] constructed from the oriented map [5] of the input images. The oriented map is generated by determining the edge direction at each pixel that belongs to the edge group. To decide whether an arbitrary pixel belongs to the edge group, we use an eigenanalysis [5, 15] of a structure matrix

S(x) = [ Σ_w I_x²  Σ_w I_x I_y ; Σ_w I_x I_y  Σ_w I_y² ],

where I_x and I_y are the image gradients accumulated over a window w around x, ω1 and ω2 denote the eigenvectors representing the maximum and minimum directions of intensity variation, respectively, and λ1 and λ2 are the eigenvalues representing the magnitudes of variation along ω1 and ω2, respectively. Since pixels belonging to the edge group have a large difference between their two eigenvalues, we determine edge pixels as those satisfying

λ1(x) − λ2(x) > τ,

where τ is an edge threshold.
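The eigenanalysis and edge test just described can be sketched in NumPy. A single-pixel window is used for brevity (the paper presumably aggregates the tensor entries over a neighbourhood), and the threshold value is an assumption:

```python
import numpy as np

def structure_eigenvalues(img):
    """Eigenvalues of [[Ix^2, IxIy], [IxIy, Iy^2]] at every pixel.

    For a symmetric 2x2 matrix [[a, c], [c, b]] the eigenvalues are
    (a+b)/2 +- sqrt(((a-b)/2)^2 + c^2)."""
    Iy, Ix = np.gradient(img.astype(float))
    a, b, c = Ix * Ix, Iy * Iy, Ix * Iy
    mean = 0.5 * (a + b)
    root = np.sqrt((0.5 * (a - b)) ** 2 + c * c)
    return mean + root, mean - root  # lambda1 >= lambda2 everywhere

def edge_mask(img, tau=0.1):
    """Edge pixels have a large gap between the two eigenvalues."""
    lam1, lam2 = structure_eigenvalues(img)
    return (lam1 - lam2) > tau
```

On a vertical step edge, the gap λ1 − λ2 is large only at the columns straddling the step, which is exactly the behaviour the edge-group test relies on.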
The oriented map is constructed from the orientations of the edge pixels as

Θ(x) = ∠ω1(x), for x in the edge group,

where ∠ω1(x) denotes the orientation angle of the dominant eigenvector at x. The oriented edge maps are then generated from the edge magnitude according to the orientation angle:

O_k(x) = m(x) if Θ(x) lies in the k-th angular bin, and O_k(x) = 0 otherwise,

where m(x) = sqrt(I_x²(x) + I_y²(x)) is the gradient magnitude and the four angular bins are those shown in Figure 1 (22.5°∼67.5°, 67.5°∼112.5°, 112.5°∼157.5° and 157.5°∼202.5°).
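The construction of the oriented edge maps can be sketched as follows. This is a minimal NumPy version using the four bins of Figure 1; taking orientations modulo 180° and shifting the bin boundaries by 22.5° are our assumptions:

```python
import numpy as np

def oriented_edge_maps(img, n_bins=4):
    """Split the gradient magnitude into maps by orientation bin.

    Orientations are taken modulo 180 degrees and the bins are shifted by
    22.5 degrees to mirror Figure 1 (22.5-67.5, ..., 157.5-202.5 deg)."""
    Iy, Ix = np.gradient(img.astype(float))
    mag = np.hypot(Ix, Iy)
    ang = (np.degrees(np.arctan2(Iy, Ix)) - 22.5) % 180.0
    # guard against a float rounding to exactly 180.0 at the wrap-around
    idx = np.minimum((ang * n_bins // 180.0).astype(int), n_bins - 1)
    return [np.where(idx == k, mag, 0.0) for k in range(n_bins)]
```

Since every pixel falls in exactly one bin, the four maps form a partition of the gradient magnitude: summing them recovers it.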
The feature descriptor is constructed by sampling around the feature in the oriented edge maps using geometric blur [14, 16]. The geometric blur descriptor provides discriminative information by using a spatially varying kernel. Given one of the oriented edge maps O_k, the sample at offset x from the feature location is

B_k(x) = (G_{σ(|x|)} ∗ O_k)(x),

where G_{σ(|x|)} is a Gaussian kernel whose standard deviation σ(|x|) = α|x| + β increases linearly with the distance |x| from the feature centre, so that samples far from the feature are blurred more heavily than those near it.

Sampling strategy for constructing a feature descriptor using geometric blur
To create a feature descriptor, we sample 51 pixels from each oriented edge map using six scales and 30-degree intervals, as shown in Figure 2. We concatenate the sampled pixels from the four oriented edge maps, creating a descriptor of dimension 51×4. To ensure scale invariance in feature matching, we generate additional feature descriptors by rescaling the oriented edge maps by successive factors of 1.1. The feature descriptors for one feature point are thus defined by six descriptors, each of dimension 51×4.
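The exact 51-point layout of Figure 2 is not reproduced here; the sketch below builds a generic radial pattern (rings at 30-degree intervals; the ring radii and the α, β blur parameters are assumed values) to illustrate how the sample positions and their geometric-blur amounts could be generated:

```python
import numpy as np

def sample_offsets(n_rings=6, angle_step_deg=30, r0=2.0, growth=1.4):
    """Centre point plus concentric rings of samples at 30-degree steps."""
    offsets = [(0.0, 0.0)]
    for ring in range(n_rings):
        r = r0 * growth ** ring
        for a in range(0, 360, angle_step_deg):
            t = np.radians(a)
            offsets.append((r * np.cos(t), r * np.sin(t)))
    return np.array(offsets)

def blur_sigmas(offsets, alpha=0.5, beta=1.0):
    """Geometric blur: sigma grows linearly with distance from the centre."""
    return alpha * np.hypot(offsets[:, 0], offsets[:, 1]) + beta
```

The descriptor is then the concatenation, over all oriented edge maps, of the map values read at these offsets after blurring each sample by its own sigma.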
3. Multi-Sensor Image Enhancement
In this section, we propose an image enhancement method using mid-wavelength IR (MWIR) or long-wavelength IR (LWIR) images. In the CCD image, we detect the regions that need to be enhanced, and then combine them with the corresponding regions of the IR image, since those regions may contain more detail in the IR image. Intuitively, regions that are too bright or too dark with low saturation incur a loss of detail [3]. On this basis, we define the salient region of the CCD image as the set of pixels that are too bright or too dark while having low saturation.
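A hedged sketch of this cue, operating on HSV channels scaled to [0, 1]; the threshold values are our assumptions, not values from the paper:

```python
import numpy as np

def ccd_salient_mask(value, saturation, v_lo=0.2, v_hi=0.8, s_max=0.25):
    """Salient CCD pixels: too dark or too bright AND weakly saturated,
    i.e. the pixels most likely to have lost detail."""
    extreme = (value < v_lo) | (value > v_hi)
    return extreme & (saturation < s_max)
```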

Example of the proposed salient region computation.
In the IR image, the captured objects are recorded according to their temperatures, which means that the major objects tend to have higher temperatures and therefore fall within the salient regions. We therefore construct the salient region weights of the IR image directly from its normalized pixel values. To smooth the transition between salient and non-salient pixels, we map the pixel values to the salient region weight through a smooth monotonic mapping.

Comparison of feature matching results between CCD and MWIR images: red lines are classified as inliers by RANSAC, while blue lines are outliers.
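One plausible form of the mapping from normalized IR intensities to salient weights is shown below; the logistic S-curve and its gain/midpoint parameters are our assumptions for the smoothing described above:

```python
import numpy as np

def ir_salient_weight(ir, gain=10.0, mid=0.5):
    """Normalise IR intensities to [0, 1], then map them to weights with a
    smooth S-curve so salient/non-salient transitions are gradual."""
    x = (ir - ir.min()) / (np.ptp(ir) + 1e-8)
    return 1.0 / (1.0 + np.exp(-gain * (x - mid)))
```

The mapping is monotone, so hotter pixels always receive larger weights, while the soft knee avoids a hard boundary between salient and non-salient pixels.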
Given the aligned CCD and IR images with their salient regions, we enhance the CCD image by adjusting the brightness of its salient regions using the salient region weight of the IR image. The brightness of the CCD image is replaced with

V′(x) = (1 − W_IR(x)) · V_CCD(x) + W_IR(x) · V_IR(x),

where V_CCD and V_IR denote the brightness values of the CCD and IR images and W_IR denotes the salient region weight of the IR image.
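One way such a brightness replacement can be realised is a linear blend of the two brightness channels driven by the IR salient weight; this is a sketch of the idea, not necessarily the paper's exact formula:

```python
import numpy as np

def enhance_brightness(v_ccd, v_ir, w_ir):
    """Blend CCD and IR brightness: where the IR salient weight is high,
    the IR detail dominates; elsewhere the CCD brightness is kept."""
    w = np.clip(w_ir, 0.0, 1.0)
    return (1.0 - w) * v_ccd + w * v_ir
```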

Aligned results of the proposed method applied to IR and CCD Images.
4. Experiment Results
In our experiments, we used CCD and IR images for multi-sensor image alignment. The IR images were mid-wavelength IR (MWIR), spanning the 3–5 µm wavelength range, and long-wavelength IR (LWIR), spanning 8–14 µm. They were captured with a cooled detector from Sofradir. All images had a uniform size of 640×480 pixels and were captured in an open space featuring houses, factories, buildings, trees, roads with electrical fences and a bridge.
To evaluate the matching performance of the proposed descriptor, we compared our method to other feature-based approaches, namely SIFT [11, 12] and SURF [13]. Figure 4 shows one of the matching comparisons. For visual clarity, we classified each matching line as either an inlier or an outlier by using RANSAC [11]: red lines signify inliers and blue lines outliers. In Figure 4, the captured objects – house, fences, telegraph poles – have plenty of features such as corners and edges. Nevertheless, SIFT and SURF cannot find enough accurate feature matches, whereas our descriptor yields sufficient matched features. Table 1 compares the matching performance of our method with that of SIFT and SURF according to the type of input image pair. Our method yielded a greater number of inliers than the others and therefore produces more accurately aligned results.
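The inlier/outlier split of Figure 4 can be sketched with a minimal RANSAC loop. A pure-translation model is used here to keep the code short, whereas the paper's alignment uses a full projective model:

```python
import numpy as np

def ransac_translation(src, dst, n_iter=200, tol=2.0, seed=0):
    """Find the translation supported by the most matches; matches within
    tol pixels of it are inliers, the rest are outliers."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(n_iter):
        i = rng.integers(len(src))          # one match fixes a translation
        t = dst[i] - src[i]
        err = np.linalg.norm(src + t - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():      # keep the largest consensus set
            best = inliers
    return best
```

Matches consistent with the dominant motion survive as inliers even when a substantial fraction of the putative matches are wrong, which is why the inlier count is a meaningful proxy for matching quality.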

Comparison of computational time with MI-based method – a global optimization method – according to image size variation.
Comparison of matching feature numbers
Figure 5 shows the alignment results of the proposed method. To confirm the continuity of the overlapped edges, each result is displayed as a composition of alternating strips from the two input images. Because it uses feature matching rather than global optimization, the computational time of the proposed method is very short, as shown in Figure 6, while accurate alignment is still achieved, as shown in Figure 5. Our method was about 300 times faster than the MI-based method, a common alignment method that relies on global optimization.
To test our method in another application, we generated stitched images from two partially overlapping sensor images, as shown in Figure 7. Since enough features were matched in the overlapping area, an accurate stitched result was achieved.

Stitched result using the proposed method.

Comparison between BEMD fusion and the proposed method.
To evaluate our enhanced results, we compared them with the Bi-dimensional Empirical Mode Decomposition (BEMD) fusion method [17]. As shown in Figure 8, our enhanced result provides not only detailed information from the IR image (the smoky region) but also natural variations (a cloudy sky). In most previous image fusion methods, the enhanced result values are too low (e.g. the cloudy sky in Figure 8(b)) because of dark regions in the IR image, which associate low temperatures with low importance even when the same region has high values in the CCD image. However, since we used the pixel values of the IR image adjusted by the salient region weights, our result features natural variations with enhanced details. Figures 9 and 10 show additional enhanced results produced by the proposed method. The enhanced results show more detail through the fusion of the CCD and IR images. However, the results are limited in that they lose contrast in some regions, such as the wires in Figure 10, because we only enhance the brightness of the salient regions in the CCD image.

Enhanced result #1 of the proposed method.

Enhanced result #2 of the proposed method.
5. Conclusion
This paper has presented alignment and enhancement methods for multi-sensor images such as CCD and IR images. For the alignment, we used a geometric descriptor for feature matching based on shape similarities between input pairs. In our enhancement process, we enhanced the CCD image using its aligned IR image. With the salient region weights of both the CCD and IR images, our enhanced result exhibits not only enhanced detail but also natural variations. Our method is undemanding in terms of processing power and produces accurate alignment results. Therefore, we expect that this method of aligning multi-sensor images may quickly become widely adopted for general applications. However, it should be noted that when matching features cannot be detected clearly enough, because of noise or a lack of distinctive shapes, our method cannot adequately align the input image pairs.
6. Acknowledgments
This work was partly supported by the Defense Acquisition Program Administration and the Agency for Defense Development under contract UD100001ID.
