Abstract

In this paper, a multi-focus image fusion method based on Dual-Tree Compactly Supported Shearlet Transform (DT CSST) and Direction Decision Map (DDM) is proposed. DT CSST is a shift invariant modification of conventional Compactly Supported Shearlet Transform (CSST). Based on the mitigation of shift variance of CSST in DT CSST, a clearer fused image could be acquired through the General Image Fusion (GIF) method, and this image is called the initial fused image in this paper. The decision map is determined by the similarity of the initial fused image and the source images. The generation algorithm of the decision map in this paper takes advantage of the directional nature of DT CSST: every direction of the transform generates an initial directional decision map and then yields the final map through vote and smooth steps. This scheme is called DDM in this paper. The proposed method is evaluated by four groups of standard images. The results show that the proposed method is able to improve the quality indices compared with two algorithms which have excellent quality indices.

Keywords

Multi-focus image fusion, Shearlet, Dual-tree compactly supported shearlet transform, Directional decision map

1. Introduction

Image fusion is a long-studied field that is attracting ever-increasing attention for a number of applications, such as remote sensing and robot navigation. Multi-focus image fusion, a branch of image fusion, refers to the fusing of two or more source images with various focus distances into an “all-in-focus” image, in which blurred objects or areas are replaced with their sharper counterparts. There are two types of image fusion method. The first type is performed directly in the spatial domain, for example by setting each pixel of the fused image to the average, maximum or minimum of the corresponding pixels of the input images. Methods of this type are simple and easy to implement, but their quality is relatively low. The second type is performed in a certain Multi-Scale Transform (MST) domain, and it is called General Image Fusion (GIF) in this paper. Although methods of this type are more complex and computationally more expensive, they usually result in better visual perception and quality indices. Many MSTs have been introduced into these methods, such as Discrete Wavelet Transform (DWT) [1–5], Curvelet [6, 7], Band Limited Shearlet [8] and Compactly Supported Shearlet Transform (CSST) [9, 10].

Shearlet has emerged in recent years as among the most successful frameworks for the efficient representation of multidimensional data. Indeed, many other transforms are introduced to overcome the limitation of traditional MST's poor ability in capturing edges or other anisotropic features. Shearlet transform stands out since it has many unique advantages: a single or finite set of generating functions; optimally sparse representations for multidimensional data; unified treatment of the continuum and digital realms; and compactly supported transform. In this paper, the CSST is selected as the MST for image fusion, because 1) it is easy to implement, 2) its computation cost is comparable to DWT, 3) it is compactly supported in the spatial domain, which is in accordance with the features of natural images such as edge, texture, etc., which are also compactly supported in the spatial domain. However, conventional CSST is shift-variant and will cause the presence of artefacts in fused images. The mitigation of its shift-variant property is necessary.

Figure 1.

Forward and backward transform of CSST

In [11], Li proposed a spatial fusion method called Local Fractal Dimension with Decision Map. Its quality indices are lower than those of frequency-based methods. In [12], Li et al. proposed a multi-focus image fusion method with an additional post-processing step, called the detection of the focused region, after GIF is performed. The idea of the post-processing step is to determine the similarity between the initial fused image and both source images based on the Root Mean Square Error (RMSE). The similarity is recorded by a decision map, and the final fused image is the weighted sum of the clearer areas of the source images and the initial fused image. This method is motivated by the fusion of noisy images, but the scheme can also improve the quality of noise-free images. In another paper [13], Li et al. gave more details about how to calculate the decision map from both regional RMSE and Correlation Coefficient (CC), and a dual-window technique is used to calculate the final fused images. Both methods are successful: the fused images are visually good and their quality indices are higher than those of many other methods. However, they can still be improved. Firstly, both methods are based on the wavelet transform, which lacks the ability for geometric analysis. Secondly, the decision map is based on RMSE, which has been shown not to reflect perceived image quality [14], and on the regional CC, whose denominator becomes zero if all pixels in the local region are equal. In this paper, the scheme of Directional Decision Maps (DDM) is proposed, whereby the initial decision maps are generated in each direction of the CSST. DDM takes advantage of CSST's geometric analysis ability. In addition, the local Q values [14], instead of RMSE or CC, are used in generating each decision map to avoid the failure of the regional CC.
The experiments based on several standard images are used to evaluate the visual perceptions and quality indices of the proposed method and methods in [12] and [13].

2. Dual-Tree Compactly Supported Shearlet Transform

Conventional CSST, first proposed by Lim in [15], describes the spatial domain implementation of a cone-adapted shearlet transform. CSST has also proved to be almost optimally sparse for cartoon-like images [16, 17].

As this paper is not a thorough introduction to the theory of CSST, only the steps are briefly given in Figure 1. The left parts are the steps of the forward transform and the right parts are the steps of the backward transform. Both forward and backward transform have two steps: Shear and Anisotropic Discrete Wavelet Transform (ADWT), but in the reverse order. As shown in Figure 2, the shear step can elongate source images along the horizontal directions by different selected k, as well as in the vertical directions. ADWT can also further generate the coefficients at different scales (j) and positions (m). The coefficients at different scales, directions and positions in the horizontal and vertical cones are represented by $C_{j, k, m}$ and ${\tilde{C}}_{j, k, m}$ , respectively, in Figure 1.

In order to mitigate the shift variance of conventional CSST, ADWT is substituted by the dual-tree complex wavelet transform (DT CWT) in [18]. Mitigating the shift variance of a transform has also been shown to improve image fusion performance for other transforms, such as DT CWT [5] and curvelet [7].

Figure 2.

Shear in horizontal direction

3. General Image Fusion based on DT CSST

Research on the General Image Fusion method can be said to originate in the work of Piella [19], where a basic framework for image fusion was proposed, which greatly improved the design and evaluation of image fusion methods. In this paper, GIF refers to the process given in Figure 3, which is in fact a simplified version of Piella's framework. Piella's framework further splits the fusion rule step into several sub-steps; however, these sub-steps are only suitable for competition rules, where the two source images are treated equally.

Figure 3.

The steps of GIF

In Figure 3, the GIF has three steps: forward transform, fusion rule and inverse transform. The forward transform decomposes the source images into coefficients ($C_{A}$ and $C_{B}$), and the inverse transform eventually composes the fused coefficients back into an image. The fusion rule describes how to fuse $C_{A}$ and $C_{B}$ into one set $C_{F}$. Thus, the key problems of GIF are the choice of transform and the design of the fusion rule. In this paper, the transform is DT CSST and the fusion rules are given in equation (1). The output of GIF based on DT CSST already has good visual quality, but the quality indices can be further improved by the Directional Decision Map (DDM) method, as shown in the following section.

\begin{array}{l} C_{F L} = \mathrm{mean} (C_{A L}, C_{B L}) \\ C_{F H} = \begin{cases} C_{A H}, & S_{A} \geq S_{B} \\ C_{B H}, & S_{A} < S_{B} \end{cases} \end{array}

(1)

where the subscripts L and H denote the low- and high-frequency coefficients, respectively, $S = \sum_{p ∊ P} | x_{p} - \bar{x} |$ , P is the local region around a pixel, p indexes every position in P, and $\bar{x}$ is the average of all values in P. The geometric analysis is the essence of the proposed method. In many algorithms, P is a (2N + 1) × (2N + 1) square window, but in the proposed method P is a (2M + 1) × (2N + 1) rectangle with M ≠ N. Combined with the shear operation, its shape varies with direction. Figure 4 shows the windows for five directions of the horizontal cone, where M < N; in the vertical cone, M > N.
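The fusion rule of equation (1) can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: it assumes the sub-band coefficients are already available as 2-D arrays (real or complex) on a common grid, it leaves out the shear step that reshapes the window per direction, and the function names and the default window half-sizes `M` and `N` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def local_activity(coeff, M, N):
    """S = sum over a (2M+1) x (2N+1) window of |x_p - mean(x)|."""
    # Reflect padding (an assumption) keeps the output the same size as the input.
    padded = np.pad(coeff, ((M, M), (N, N)), mode="reflect")
    windows = sliding_window_view(padded, (2 * M + 1, 2 * N + 1))
    mean = windows.mean(axis=(-2, -1), keepdims=True)
    return np.abs(windows - mean).sum(axis=(-2, -1))


def fuse_coefficients(low_a, low_b, high_a, high_b, M=1, N=2):
    """Equation (1): average the low-pass bands, select high-pass by activity."""
    fused_low = 0.5 * (low_a + low_b)
    s_a = local_activity(high_a, M, N)
    s_b = local_activity(high_b, M, N)
    # Ties go to source A, matching the ">=" branch of equation (1).
    fused_high = np.where(s_a >= s_b, high_a, high_b)
    return fused_low, fused_high
```

With M < N the window is wider than it is tall, which corresponds to the horizontal-cone case described above; swapping the two values gives the vertical-cone window.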

Figure 4.

Local regions at different directions

4. Directional Decision Map

In [12], a focused region detection method based on the Root Mean Square Error (RMSE) is adopted, and the initial decision map is generated from the lower value of the regional RMSE between the initial fused image and each source image. In [13], Li et al. recognized that RMSE alone cannot represent the decision map precisely and added the local Correlation Coefficient (CC) as an auxiliary measure. However, in [14] it was stated that RMSE cannot precisely represent the similarity of two images, and the index Q was proposed, which outperforms RMSE under different types of image distortion. Moreover, the local CC becomes invalid if all pixel values within the local region of the initial fused image and the source images are equal. To overcome these disadvantages, in this paper the initial map is generated based on the local Q instead of RMSE or the local CC. The steps of the proposed method are as follows.

1) Calculate the value of Q for each pixel within a (2M + 1) × (2N + 1) window between each source image and the initial fused image. The choice of M and N was introduced in the previous section.

\begin{array}{l} Q_{A F}^{(k)} (x, y) = \frac{4 σ_{A F} (x, y) \bar{A^{(k)}} (x, y) \bar{F^{(k)}} (x, y)}{(σ_{A}^{2} (x, y) + σ_{F}^{2} (x, y)) [{(\bar{A^{(k)}} (x, y))}^{2} + {(\bar{F^{(k)}} (x, y))}^{2}]} \\ Q_{B F}^{(k)} (x, y) = \frac{4 σ_{B F} (x, y) \bar{B^{(k)}} (x, y) \bar{F^{(k)}} (x, y)}{(σ_{B}^{2} (x, y) + σ_{F}^{2} (x, y)) [{(\bar{B^{(k)}} (x, y))}^{2} + {(\bar{F^{(k)}} (x, y))}^{2}]} \end{array}

(2)

where (x, y) denotes every location in the whole image, $N_{Ω} = (2 M + 1) (2 N + 1)$ , $\bar{T^{(k)}} (x, y) = \frac{1}{N_{Ω}} \sum_{i = - M}^{M} \sum_{j = - N}^{N} T^{(k)} (x + i, y + j)$ and $σ_{T}^{2} (x, y) = \frac{1}{N_{Ω} - 1} \sum_{i = - M}^{M} \sum_{j = - N}^{N} {(T^{(k)} (x + i, y + j) - \bar{T^{(k)}} (x, y))}^{2}$ for $T ∊ \{A, B, F\}$ , and $σ_{T F} (x, y) = \frac{1}{N_{Ω} - 1} \sum_{i = - M}^{M} \sum_{j = - N}^{N} (T^{(k)} (x + i, y + j) - \bar{T^{(k)}} (x, y)) (F^{(k)} (x + i, y + j) - \bar{F^{(k)}} (x, y))$ for $T ∊ \{A, B\}$ .

2) Compare the values at each point: a larger value of Q indicates that, at this position, the initial fused image and the corresponding source image are more similar. The initial decision maps $Z_{A F}^{(k)} (x, y)$ and $Z_{B F}^{(k)} (x, y)$ are constructed by equation (3).

Figure 5.

Steps of the proposed method

\begin{array}{l} Z_{A F}^{(k)} (x, y) = \begin{cases} 1, & D_{S_{1}^{- k}} (Q_{A F}^{(k)} (x, y)) \geq D_{S_{1}^{- k}} (Q_{B F}^{(k)} (x, y)) \\ 0, & D_{S_{1}^{- k}} (Q_{A F}^{(k)} (x, y)) < D_{S_{1}^{- k}} (Q_{B F}^{(k)} (x, y)) \end{cases} \\ Z_{B F}^{(k)} (x, y) = \begin{cases} 1, & D_{S_{1}^{- k}} (Q_{B F}^{(k)} (x, y)) \geq D_{S_{1}^{- k}} (Q_{A F}^{(k)} (x, y)) \\ 0, & D_{S_{1}^{- k}} (Q_{B F}^{(k)} (x, y)) < D_{S_{1}^{- k}} (Q_{A F}^{(k)} (x, y)) \end{cases} \end{array}

(3)

where the superscript (k) refers to the k-th direction.

3) Vote step: First, add up the directional decision maps. If the sum at a position is larger than the empirical threshold t = K + 1, where K is the number of directions in each cone, this position is regarded as in focus. The voted maps are denoted by $Z_{A F}$ and $Z_{B F}$.

\begin{array}{l} Z_{A F} (x, y) = \begin{cases} 1, & \sum_{k = 1}^{2 K} Z_{A F}^{(k)} (x, y) > t \\ 0, & \text{else} \end{cases} \\ Z_{B F} (x, y) = \begin{cases} 1, & \sum_{k = 1}^{2 K} Z_{B F}^{(k)} (x, y) > t \\ 0, & \text{else} \end{cases} \end{array}

(4)

4) Smooth step: A strategy based on morphology is performed to remove the thin protrusions, thin gulfs, narrow breaks and small holes in Z_AF and Z_BF. In multi-focus images, the area which is in focus is usually continuous and contains an adequate number of pixels. The smooth step is denoted by Smooth(Z) and is divided into three sub-steps: (i) fill the holes of both {Z_AF, 1-Z_AF} and {Z_BF, 1-Z_BF}; (ii) perform a morphological closing operation to smooth the edges of the remaining areas; (iii) apply size filtering to {Z_AF, 1-Z_AF} and {Z_BF, 1-Z_BF}.

The structuring element of the closing operation is very important; in the proposed method, a ‘disk’ structuring element is chosen. In the size-filtering sub-step, if the size of a connected area is less than an empirical threshold, this area is set to 0. Because Smooth(Z_AF) and 1-Smooth(1-Z_BF) are in fact not identical, the overlapped pixels are set to 0. The smoothed maps are denoted by $Z_{A F}^{Smooth}$ and $Z_{B F}^{Smooth}$.
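Steps 1) to 4) can be sketched with NumPy and SciPy as follows. This is an illustrative sketch under several assumptions, not the authors' code: the Q maps are assumed to be already mapped back to the original image grid (the operator $D_{S_{1}^{- k}}$ of equation (3) is not implemented), the smooth step is applied to each map only (not to its complement), and the names `flat_value`, `eps` and `radius` are hypothetical parameters that the paper does not specify.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy import ndimage


def local_q(t, f, M=1, N=2, eps=1e-12, flat_value=1.0):
    """Step 1, equation (2): local Q between source image t and fused image f."""
    pad, win = ((M, M), (N, N)), (2 * M + 1, 2 * N + 1)
    tw = sliding_window_view(np.pad(np.asarray(t, float), pad, mode="reflect"), win)
    fw = sliding_window_view(np.pad(np.asarray(f, float), pad, mode="reflect"), win)
    mt = tw.mean(axis=(-2, -1), keepdims=True)
    mf = fw.mean(axis=(-2, -1), keepdims=True)
    n = win[0] * win[1]
    var_t = ((tw - mt) ** 2).sum(axis=(-2, -1)) / (n - 1)
    var_f = ((fw - mf) ** 2).sum(axis=(-2, -1)) / (n - 1)
    cov = ((tw - mt) * (fw - mf)).sum(axis=(-2, -1)) / (n - 1)
    mt, mf = mt[..., 0, 0], mf[..., 0, 0]
    num = 4.0 * cov * mt * mf
    den = (var_t + var_f) * (mt ** 2 + mf ** 2)
    # Assumption: windows whose denominator vanishes are assigned flat_value.
    out = np.full_like(num, flat_value)
    np.divide(num, den, out=out, where=np.abs(den) > eps)
    return out


def directional_maps(q_af, q_bf):
    """Step 2, equation (3), for Q maps already on the original grid."""
    return q_af >= q_bf, q_bf >= q_af


def vote(maps, t):
    """Step 3, equation (4): in focus where more than t directional maps agree."""
    return np.sum(maps, axis=0) > t


def disk(radius):
    """Boolean disk-shaped structuring element."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def remove_small_regions(mask, min_size):
    """Size filtering: drop connected regions with fewer than min_size pixels."""
    labels, count = ndimage.label(mask)
    if count == 0:
        return mask.copy()
    keep = np.bincount(labels.ravel()) >= min_size
    keep[0] = False  # label 0 is the background
    return keep[labels]


def smooth_map(z, radius=3, min_size=20000):
    """Step 4: fill holes, close with a disk, then size-filter (radius >= 1)."""
    mask = ndimage.binary_fill_holes(np.asarray(z, bool))
    # Edge padding avoids erosion of regions that touch the image border.
    padded = np.pad(mask, radius, mode="edge")
    closed = ndimage.binary_closing(padded, structure=disk(radius))
    return remove_small_regions(closed[radius:-radius, radius:-radius], min_size)


def smooth_pair(z_af, z_bf, **kwargs):
    """Smooth both maps and set overlapping pixels to 0 in each."""
    a, b = smooth_map(z_af, **kwargs), smooth_map(z_bf, **kwargs)
    overlap = a & b
    return a & ~overlap, b & ~overlap
```

The default `min_size=20000` follows the threshold reported in the evaluation section; it only makes sense for images of comparable size and should be scaled down for small test arrays.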

After the four steps above, the DDM is calculated. Then, the final fused image is calculated by equation (5):

F = F_{initial} • (1 - Z_{A F}^{Smooth} - Z_{B F}^{Smooth}) + A • Z_{A F}^{Smooth} + B • Z_{B F}^{Smooth}

(5)

where $F_{initial} = mean(F_{initial}^{(1)}, F_{initial}^{(2)}, …, F_{initial}^{(2K)})$ and • denotes element-wise multiplication.
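Equation (5) translates almost directly into array arithmetic. The sketch below assumes that the 2K directional initial fused images are stacked along the first axis and that the two smoothed maps are disjoint, as the overlap rule of the smooth step implies; the function name is hypothetical.

```python
import numpy as np


def final_fusion(initial_stack, a, b, z_a, z_b):
    """Equation (5): combine sources and the mean initial fused image."""
    z_a = np.asarray(z_a, dtype=bool)
    z_b = np.asarray(z_b, dtype=bool)
    if np.any(z_a & z_b):
        raise ValueError("decision maps must not overlap")
    # F_initial is the mean over all directional initial fused images.
    f_initial = np.mean(initial_stack, axis=0)
    w_a, w_b = z_a.astype(float), z_b.astype(float)
    return f_initial * (1.0 - w_a - w_b) + a * w_a + b * w_b
```

Pixels covered by neither map fall back to the initial fused image, so the decision maps only override it where one source was judged clearly in focus.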

To sum up, all steps of the proposed method are given as shown in Figure 5. The steps are represented by the rectangle with the name label, and the data are represented by a parallelogram. The symbols of variables which appear in previous and current sections are marked at the arrows where they are calculated.

5. Evaluation

In this section, the evaluations of the proposed method, as well as of several comparative methods, are reported. The standard source images are given in Figure 6. From left to right, they are named ‘clock’, ‘lab’, ‘disk’ and ‘flower’, respectively. In the evaluation, the number of directions is 18, which means K = 9 directions in each cone. The empirical threshold in the smooth step is set to 20000.

Figure 6.

Source images

Figure 7 and Figure 8 show the initial fused images and their mean image for ‘clock’ and ‘disk’. In Figures 7 and 8, (a)–(e) are the initial fused images of the five odd directions in the horizontal cone, labelled $F_{initial}^{1}$ , $F_{initial}^{3}$ , $F_{initial}^{5}$ , $F_{initial}^{7}$ and $F_{initial}^{9}$ . The images of the vertical cone, labelled from $F_{initial}^{10}$ to $F_{initial}^{18}$ , are not shown in order to save space. (f) is $F_{initial}$ , the average of the initial fused images of all 18 directions. In fact, (f) has good visual quality because it contains most of the clearer information of both source images. Its differences from the source images A and B are shown in (g) and (h), respectively. Small distortions can be observed in the clear areas, and these distortions degrade the quality indices.

In Figure 9 and Figure 10, some key intermediate images for ‘lab’ and ‘flower’ are shown. (a) and (b) are $Z_{A F}^{1}$ and $Z_{A F}^{9}$ ; (c) and (d) are $Z_{B F}^{1}$ and $Z_{B F}^{9}$ . It can be observed that the small holes have certain directional features. (e) and (f) are $Z_{A F}$ and $Z_{B F}$ . After the vote step, the impact of direction can hardly be observed, but many holes remain. (g) and (h) are $Z_{A F}^{Smooth}$ and $Z_{B F}^{Smooth}$ . Their shapes are very similar, but not exactly the same. (i) is the final fused image F, (j) is the difference image between A and F, and (k) is the difference image between B and F.

The quality indices are Mutual Information (MI) and Q^AB|F [20]. MI measures how much information in the source images would be contained in the fused images. It is defined as:

MI = I_{F A} + I_{F B}

(6)

where $I_{F A} (f, a) = \sum_{f, a} P_{F A} (f, a) \log \frac{P_{F A} (f, a)}{P_{F} (f) P_{A} (a)}$ and $I_{F B} (f, b) = \sum_{f, b} P_{F B} (f, b) \log \frac{P_{F B} (f, b)}{P_{F} (f) P_{B} (b)}$ , P is the distribution of greyscale in the images. Q^AB|F evaluates the amount of edge information that is transferred from the source images to the fused images [20]. Larger Q^AB|F means more important edge information being transferred. The experiment's results are shown in Table 1 and Table 2.
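The MI index of equation (6) can be estimated from joint grey-level histograms. The following sketch assumes 8-bit greyscale images and a base-2 logarithm; the paper does not state the base, so the absolute values may differ from Tables 1 and 2 by a constant factor. The function names are hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=256):
    """I(X; Y) in bits, estimated from the joint grey-level histogram."""
    joint, _, _ = np.histogram2d(
        np.ravel(x), np.ravel(y), bins=bins, range=[[0, 256], [0, 256]]
    )
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = p_xy > 0                            # skip empty bins to avoid log(0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))


def fusion_mi(f, a, b):
    """Equation (6): MI = I_FA + I_FB."""
    return mutual_information(f, a) + mutual_information(f, b)
```

Because each term is bounded by the entropy of the fused image, the sum for 8-bit images cannot exceed 16 bits under this convention.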

From Table 1 and Table 2, it can be observed that the values of MI and Q^AB|F of the proposed method are higher than those of both methods in [12, 13] on all four image groups. Both of those methods take advantage of decision maps, and their quality indices are already very high. The main difference between the proposed method and these studies is the selection of MST. In the proposed method, the MST is the shift-invariant shearlet transform, DT CSST, which has the capacity for geometric analysis, and this allows the quality indices to be further improved.

Figure 7.

Initial fused image of ‘clock’

Figure 8.

Initial fused image of ‘disk’

Figure 9.

Key intermediate and final fused images of ‘lab’

Table 1.

MI of the proposed method and the methods of [12] and [13]

MI	clock	lab	disk	flower
proposed	8.5113	8.8436	8.3026	8.2054
[12]	8.4812	8.7460	8.2256	7.9303
[13]	8.5022	8.8262	8.2976	8.0693

Table 2.

Q^AB|F of the proposed method and the methods of [12] and [13]

Q^AB|F	clock	lab	disk	flower
proposed	0.7355	0.7591	0.7401	0.7255
[12]	0.7229	0.7575	0.7354	0.7239
[13]	0.7203	0.7590	0.7380	0.7248

6. Conclusion

In this paper, a multi-focus image fusion method based on the combination of DT CSST and DDM is proposed. DT CSST is approximately shift invariant. Compared with wavelet transform, which provides only one initial fused image and the decision map, the geometric analysis ability of DT CSST can provide initial fused images and DDMs in each direction. These redundancies provide more candidates with which to generate a finer decision map. Based on the parameter of local Q, instead of local RMSE or CC, the initial decision map can be generated, which may also contribute to the improvement of performance. In general, the proposed multi-focus image fusion method has better quality indices than the methods of [12, 13], which already had good perceptions. In a future study, the proposed method will be applied in the navigation of robots in noisy environments.

Figure 10.

Key intermediate and final fused images of ‘flower’

7. Acknowledgments

This work is supported by the Young Scholars Development Fund of SWPU (South West Petroleum University), No. 201499010119.

References

1. Roux (2010) Multifocus image fusion based on redundant wavelet transform. Image Processing, IET 4(4): 283–293.

2. Sheng, Wen-Zhong, Liu (2007) Multisource image fusion method using support value transform. IEEE Transactions on Image Processing 16(7): 1831–1839.

3. Cvejic, Seppanen, Godsill (2009) A nonreference image fusion metric based on the regional importance measure. Selected Topics in Signal Processing, IEEE 3(2): 212–221.

4. Yang (2010) Multifocus image fusion and restoration with sparse representation. IEEE Transactions on Instrumentation and Measurement 59(4): 884–892.

5. Ioannidou, Karathanassi (2007) Investigation of the dual-tree complex and shift-invariant discrete wavelet transforms on quickbird image fusion. Geoscience and Remote Sensing Letters, IEEE 4(1): 166–170.

6. Liang, Junping, Qian, Qingping (2013) Feature-based image fusion with a uniform discrete curvelet transform. Int J Adv Robot Syst 10:255.

7. Yan, Xiao, Zhu (2008) Image fusion algorithm based on spatial frequency-motivated pulse coupled neural networks in nonsubsampled contourlet transform domain. Acta Automatica Sinica 34(12): 1508–1514.

8. Duan, Wang (2011) Multi-focus image fusion based on shearlet transform. Journal of Information and Computational Science 8(15): 3713–3720.

9. Miao, Shi (2011) Multi-focus image fusion algorithm based on shearlets. Chinese Optics Letters 9(4): 041001–1–5.

10. Miao, Shi (2011) A novel algorithm of image fusion using shearlets. Optics Communications 284(6): 1540–1547.

11. Qingping, Junping, Liang (2013) Multi-focus image fusion using the local fractal dimension. Int J Adv Robot Syst 10:251.

12. Chai, Yin (2012) Multifocus image fusion and denoising scheme based on homogeneity similarity. Optics Communications 285(2): 91–100.

13. Chai (2013) A new fusion scheme for multifocus images based on focused pixels detection. Machine Vision and Applications: 1–15.

14. Wang, Bovik (2002) A universal image quality index. Signal Processing Letters, IEEE 9(3): 81–84.

15. Lim W Q (2010) The discrete shearlet transform: A new directional transform and compactly supported shearlet frames. IEEE Transactions on Image Processing 19(5): 1166–1180.

16. Kutyniok, Lim (2011) Compactly supported shearlets are optimally sparse. Journal of Approximation Theory 163(11): 1564–1589.

17. Kittipoom, Kutyniok, Lim (2012) Construction of compactly supported shearlet frames. Constructive Approximation 35(1): 21–72.

18. Duan, Huang, Wang (2014) Remote sensing image fusion based on IHS and dual tree compactly supported shearlet transform. International Journal of Signal Processing, Image Processing & Pattern Recognition 7(5): 361–374.

19. Piella (2003) A general framework for multiresolution image fusion: From pixels to regions. Information Fusion 4(4): 259–280.

20. Petrović, Xydeas (2000) On the effects of sensor noise in pixel-level image fusion performance. Information Fusion, Proceedings of the Third International Conference 2: WEC3/14–WEC3/19.

A Novel Multi-Focus Image Fusion Method Based on Dual-Tree Shearlet Transform
