Abstract
Multi-exposure image fusion is becoming increasingly influential in enhancing the quality of experience of consumer electronics. However, until now few works have been conducted on the performance evaluation of multi-exposure image fusion, especially colorful multi-exposure image fusion. Conventional quality assessment methods for multi-exposure image fusion mainly focus on grayscale information, while ignoring the color components, which also convey vital visual information. We propose an objective method for the quality assessment of colored multi-exposure image fusion based on image saturation, together with texture and structure similarities, which are able to measure the perceived color, texture, and structure information of fused images. The final image quality is predicted using an extreme learning machine with texture, structure, and saturation similarities as image features. Experimental results for a public multi-exposure image fusion database show that the proposed model can accurately predict colored multi-exposure image fusion image quality and correlates well with human perception. Compared with state-of-the-art image quality assessment models for image fusion, the proposed metric has better evaluation performance.
Introduction
Data fusion has become popular recently and various data fusion algorithms have been proposed. 1–6 Multi-exposure image fusion is a typical data fusion area, and is considered an effective quality enhancement technique that is widely adopted in consumer electronics. 7 With many multi-exposure image fusion algorithms 8–13 at hand, it is essential to evaluate their performance. Numerous quality metrics 14–27 for image fusion have been developed, but few of these measure color characteristics. Therefore, in this article, we propose a method for colorful multi-exposure image fusion assessment.
Multi-exposure image fusion takes a sequence of images with different exposure levels as inputs and synthesizes an output image that is more informative and perceptually appealing than any of the input images. 28,29 In general, the problem of multi-exposure image fusion can be formulated as 30

Y(i) = Σ_{j=1}^{K} W_j(i) X_j(i)
where K is the number of multi-exposure input images in the source sequence, Xj (i) and Wj (i) usually represent the luminance value (or the coefficient amplitude in the transform domain) and the weight of the ith pixel in the j th exposure image, respectively, and Y denotes the fused image. The weight map Wj often bears information regarding the relative structural detail and perceptual importance at different exposure levels.
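As a sketch of this formulation, the weighted combination above can be written in a few lines of NumPy (a hypothetical helper, not the authors' code); the weight maps are normalized per pixel so that they sum to one, as is common in weighted-sum fusion:

```python
import numpy as np

def fuse(sources, weights):
    """Weighted-sum fusion Y(i) = sum_j W_j(i) X_j(i).

    sources, weights: lists of K equally sized 2-D arrays.
    Weight maps are normalized per pixel to sum to one.
    """
    X = np.stack([np.asarray(s, dtype=np.float64) for s in sources])  # (K, H, W)
    W = np.stack([np.asarray(w, dtype=np.float64) for w in weights])  # (K, H, W)
    W = W / np.clip(W.sum(axis=0, keepdims=True), 1e-12, None)        # normalize
    return (W * X).sum(axis=0)                                        # (H, W)
```

With uniform weights this reduces to a per-pixel average of the exposures.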
In the last few years, numerous multi-exposure image fusion methods have been proposed. 8 –13 The existing multi-exposure image fusion algorithms mainly differ in the computation of Wj . For example, global and local energy weighting algorithms employ global and local energy in source images to determine Wj , respectively. Mertens et al. 8 defined contrast, color saturation, and exposure intensity as measures to compute weights for multi-exposure image fusion. Based on the work of Mertens et al., 8 Li et al. 9 enhanced the details of a given fused image by solving a quadratic optimization problem. Gu et al. 10 extracted gradient information from the structure tensor and smoothed it to compute weights. Edge-preserving filters, such as a bilateral filter, 11 a fast multi-exposure median filter and recursive filter, 12 and a guided filter, 13 have been applied to retrieve edge information to refine Wj . All of these multi-exposure image fusion approaches present diverse fusion performance, as shown in Figure 1. Figure 1(a) shows multi-exposure source images. Figure 1(b) to (i) are fused images generated using the aforementioned multi-exposure image fusion algorithms. 8 –13 One can see that the images in Figure 1(b) and (g) have low brightness. Hence, they may lead to texture information loss. Figure 1(c) and (i) suffer from structure disordering, as highlighted by the red rectangles. In addition, Figure 1(c) contains unnatural artifacts. There are some obvious color distortions in Figure 1(f). Figure 1(h) and (i) present some unnatural black areas, which cause uncoordinated color. Compared with the other fused images in Figure 1, Figure 1(d) and (e) show better image quality. Therefore, it is necessary to conduct studies to assess multi-exposure image fusion quality, to evaluate the performance of different multi-exposure image fusion methods.

With the development of image quality assessment, considerable effort has been made to develop performance measurement for image fusion; these evaluations can be categorized as subjective or objective. Subjective evaluations 14–16 are reliable but expensive and time-consuming. Most importantly, they cannot be embedded into automated frameworks of systems, which makes them unsuitable for practical applications. Hence, objective algorithms have been developed for image fusion. These algorithms can be categorized into four types.
Based on information theory. Qu et al. 17 adopted summation of the mutual information between the fused and multiple input images to evaluate image quality.
Based on features. Xydeas and Petrovic 18 proposed an edge-based performance measure that computes the amount of edge information that is transferred from the source images to the fused image. A similar idea was employed by Wang and Liu, 19 who retrieved edge strength using a two-scale Haar wavelet. Zheng et al. 20 computed spatial frequency to measure the activity level of the image to estimate the fused image quality.
Based on structure similarity. These algorithms are mostly based on the structural similarity index. 21 Piella and Heijmans 22 combined local image salience with a universal quality index 23 to predict fused image quality. Cvejic et al. 24 and Yang et al. 25 built their quality measures using structural information theory. Ma et al. 30 extracted multi-scale local contrast and patch structures of input images to calculate similarities with the fused image.
Based on human perception. By extracting edge information, Chen and Varshney 26 calculated local saliency and combined it with a contrast sensitivity function. Chen and Blum 27 applied the contrast sensitivity function in the frequency domain and preserved local information to measure image quality.
However, most of the aforementioned image quality assessment algorithms only consider fusion cases with two input images, which makes them unsuitable for multi-exposure image fusion. In addition, they suffer from various drawbacks. For example, the mutual information algorithm 17 treats an image as a global entity and attributes a single score to it, without taking individual pixel intensities and regional structures into account. Xydeas and Petrovic 18 only compute the edge similarity between the source and fused images as the evaluator, while ignoring texture information and human perception. The same problem can also be found in the works of Wang and Liu 19 and Zheng et al. 20 The metrics of Chen and colleagues 26,27 estimate image quality by computing local saliency maps, which unavoidably leads to some loss of background information. The algorithms of Piella and Heijmans, 22 Cvejic et al., 24 Yang et al., 25 and Ma et al. 30 consider the similarities between the source images and the fused image based on different weights in grayscale; thus, they are unable to evaluate the quality degradation caused by color information loss. Figure 2(a) and (c) are produced by Gu's algorithm, 10 while Figure 2(b) and (d) are generated by a global energy weighted algorithm. Obviously, there exist large differences in color between the two sets of images. The color of objects in Figure 2(a) and (c) is unnatural, which degrades image quality. The mean opinion scores of Figure 2(a) and (b) are 4.9130 and 6.6522, and those of Figure 2(c) and (d) are 3.9565 and 5.0435. However, the quality scores predicted by the method of Ma et al. 30 are 0.9216, 0.9066, 0.8760, and 0.8358, which are not consistent with the mean opinion scores. Therefore, it is necessary to take color information into account in the assessment of multi-exposure image fusion images.

Examples of color information degradation. (a, c) Fused images by Gu et al.; 10 (b, d) fused images by global energy weighted algorithm. MOS: mean opinion score.
To address the aforementioned drawbacks, we propose an objective quality assessment method for colored multi-exposure image fusion based on image texture, structure, and saturation. First, texture similarity is employed to evaluate the performance of image texture preservation. Second, structure similarity is adopted as a measure of spatial consistency. With respect to color information, saturation similarity models the accuracy and naturalness of the colors in fused images. Then, an extreme learning machine is used to learn the relationship between the texture, structure, and saturation similarities and image quality. Experimental results demonstrate that our proposed model correlates with subjective scores better than other models.
Motivation
A colorful image can be composed of texture, structure, and color information; 31 distortion in any of these three components degrades the overall visual quality. Thus, a colorful image fusion method should correctly combine all three types of image information. In our algorithm, we employ texture, structure, and saturation similarities to evaluate colorful multi-exposure fusion images.
Texture information
As defined by Shapiro and Stockman, 32 image texture can provide information about the spatial arrangement of color or intensities in an image or a selected region of an image. It has previously been explained that the main information of image texture lies in the middle- and high-frequency regions. 33 –35 The discrete wavelet transform can decompose images into four subbands in low (LL), middle (LH, HL) and high (HH) frequencies, and is often used to extract image texture. 33 –35 Figure 3(b) and (d) are discrete wavelet transform decompositions of the fused images in Figure 3(a) and (c). The middle- (LH, HL) and high-frequency (HH) subimages in Figure 3(b) and (d) contain rich textures of rocks and trees. The amount of texture information contained in the fused image is closely related to its quality. As indicated in Figure 3, there exists a large difference in image quality between Figure 3(a) and (c). The middle- and high-frequency subbands (i.e., LH, HL, HH) in Figure 3(b) have a richer texture than those in Figure 3(d), which is in accordance with the mean opinion score values of Figure 3(a) and (c) of 7.6957 and 4.2609. Therefore, in this work, we use the discrete wavelet transform to extract texture information from images for further analysis.
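A one-level discrete wavelet transform, enough to separate the LL, LH, HL, and HH subbands discussed above, can be sketched as follows. This is a simplified, unnormalized Haar variant written for illustration; the paper's exact wavelet and normalization are not specified here:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar wavelet decomposition (unnormalized variant).

    Returns (LL, LH, HL, HH) subbands at half resolution:
    LL holds the low-frequency content; LH, HL, HH hold the
    mid/high-frequency content carrying most of the texture.
    """
    img = np.asarray(img, dtype=np.float64)
    # Row-wise low-pass (average) and high-pass (difference) halves.
    a = (img[0::2, :] + img[1::2, :]) / 2.0
    d = (img[0::2, :] - img[1::2, :]) / 2.0
    # Column-wise pass on each half gives the four subbands.
    LL = (a[:, 0::2] + a[:, 1::2]) / 2.0
    LH = (a[:, 0::2] - a[:, 1::2]) / 2.0
    HL = (d[:, 0::2] + d[:, 1::2]) / 2.0
    HH = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return LL, LH, HL, HH
```

A flat region yields zero LH/HL/HH coefficients, so larger coefficient magnitudes in those subbands directly indicate richer texture, which is the property exploited above.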

Structure similarity
Natural images are highly structured: adjacent pixels have strong relevance and carry important structural information about the visual content. Maintaining the structural integrity of the source images is a challenging task in multi-exposure image fusion. In most cases, the fused images are subject to structure loss or structure disordering. Wang et al. 21 proposed the structural similarity index to evaluate image structure distortion. Figure 4 provides a demonstration of the structure similarity maps of two fused images generated from the same source sequence, where brighter regions in the maps indicate better structure preservation. Figure 4(a), which is generated by a local energy weighting algorithm, fails to maintain good structure, and its structure similarity map (Figure 4(b)) shows unnatural artifacts around the bold edges of the books and lamp. In comparison, Figure 4(c), which is fused by the algorithm of Li and Kang, 12 shows better structure. Hence, we extract the structure of images based on the model of Ma et al. 30 for multi-exposure image fusion structure assessment.

(a, c) Images fused by a local energy weighting algorithm and the algorithm of Li and Kang, 12 respectively; (b, d) corresponding structure similarity maps.
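The structure comparison underlying the structural similarity index 21 can be illustrated with its structure component s(x, y) = (σ_xy + C)/(σ_x σ_y + C). The sketch below computes it globally over two patches, whereas quality models apply it locally over sliding windows; the constant C is a placeholder value:

```python
import numpy as np

def structure_term(x, y, C=1e-3):
    """SSIM structure component s(x, y) = (sigma_xy + C) / (sigma_x * sigma_y + C).

    Computed over two equally sized patches as a whole; real SSIM-style
    metrics evaluate this in local windows. C stabilizes near-flat patches.
    """
    x = np.asarray(x, dtype=np.float64).ravel()
    y = np.asarray(y, dtype=np.float64).ravel()
    sx, sy = x.std(), y.std()
    sxy = np.mean((x - x.mean()) * (y - y.mean()))  # cross-covariance
    return (sxy + C) / (sx * sy + C)
```

Identical patches score 1, uncorrelated patches score near 0, and inverted structure scores negative, which is why bright regions in the maps of Figure 4 indicate good structure preservation.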
Saturation
The color of images in Figure 2(a) and (c) is obviously distorted and inconsistent with human perception. According to Mohan and Moorthy, 36 saturation is able to describe how human beings naturally respond to color information. Saturation is the colorfulness of a color relative to its own brightness. 37 As shown in Figure 5, pixels with high saturation have bright color. In contrast, pixels with low saturation have dim color, meaning they have lost some colorfulness. Therefore, saturation can be applied to evaluate the quality of colorful multi-exposure image fusion.

Saturation distribution map.
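The saturation measure used later in this paper, the per-pixel standard deviation across the R, G, and B channels as defined by Mertens et al., 8 is straightforward to compute; the following is a minimal sketch:

```python
import numpy as np

def saturation_map(rgb):
    """Per-pixel saturation as the (population) standard deviation across
    the R, G, B channels, following Mertens et al.'s definition.

    rgb: array of shape (H, W, 3). Returns an (H, W) saturation map.
    """
    return np.asarray(rgb, dtype=np.float64).std(axis=-1)
```

Gray pixels (R = G = B) get saturation 0, while strongly colored pixels, where the channels differ widely, get large values, matching the distribution map above.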
Objective quality assessment of colorful multi-exposure image fusion images
Figure 6 shows the flow diagram of our proposed objective quality method for colorful multi-exposure image fusion images. The input source images (multi-exposure images) and the multi-exposure image fusion image are colored images. The texture similarity (ascertained using the discrete wavelet transform), the structure similarity (ascertained using the structural similarity index), and the saturation similarity are computed to measure the consistency between the source images and the multi-exposure image fusion image in terms of texture, structure, and color information. The resulting similarities are then input to an extreme learning machine to determine the final quality score of the tested fused image.

Proposed model.
Texture similarity
A good fused image should contain the rich textures of the source images. 38,39 To estimate texture preservation, we extract the richest texture map from the source images. First, the discrete wavelet transform is applied to the input source images and the test fused image. Given an image
where

(a–c) One-scale discrete wavelet transform decomposition results of three source images; (d) maximum wavelet coefficients map of LH, HL, and HH subbands in (a), (b), and (c), respectively. The red and yellow rectangles correspond to the texture of the cave and the trees outside the cave.

Texture similarity calculation. DWT, discrete wavelet transform.
where
In this study, we adopted a three-scale discrete wavelet transform for image decomposition, since the low-frequency subimage LL of the one-scale discrete wavelet transform still contains some image texture. Then the final texture similarity of one fused image is
Structure similarity
As illustrated in Figure 4, the fused images suffer from some structure degradations, such as structure loss or structure disordering. Following the concept of structure similarity, 21 we calculate the structure similarity between the source images and the fused image. Ma et al. 30 built a model to extract the local structure from source images, as
and
where
and
where p ≥ 0 is an exponent parameter and R is used to represent consistency between a set of vectors. In this article, we extract image structure based on the method of Ma et al. 30 and the structure similarity can be computed as
where
where μ_x and μ_y are the mean values of x and y, respectively.
Saturation similarity
Mertens et al. 8 measured the saturation by computing the standard deviation within the R, G, and B channels; this is defined as

S(i) = sqrt{ [ (R(i) − μ(i))^2 + (G(i) − μ(i))^2 + (B(i) − μ(i))^2 ] / 3 }

where μ(i) = (R(i) + G(i) + B(i))/3 denotes the mean of the three color channels at pixel i, and R(i), G(i), and B(i) are the red, green, and blue values at pixel i.

(a–c) Source images; (d) maximum saturation image. All the pixels are picked up from source images with the maximum saturation. The regions highlighted by red, yellow, and green rectangles in (d) are the most colorful parts of their corresponding exposure images.
where
In this article, the saturation similarity is calculated between the saturation map of fused image and the maximum saturation map to evaluate colorfulness distortion. The saturation similarity can be defined as 40
where
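As a hedged sketch of such a similarity computation, the common pointwise form (2ab + C)/(a² + b² + C) can be applied between the fused image's saturation map and the maximum saturation map. Note that this specific form and the constant C are assumptions made for illustration; the paper's exact definition follows reference 40:

```python
import numpy as np

def similarity_map(a, b, C=1e-4):
    """Pointwise similarity between two feature maps, here the fused
    saturation map and the maximum saturation map.

    Uses the common form (2ab + C) / (a^2 + b^2 + C); the exact formula
    and constant in the paper (via its reference 40) may differ.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return (2.0 * a * b + C) / (a**2 + b**2 + C)
```

The map equals 1 wherever the two saturation values agree and drops toward 0 as they diverge, so its mean is a natural scalar measure of colorfulness preservation.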
Quality prediction based on extreme learning machine
In this subsection, we introduce the quality prediction process of this work. In the image quality assessment literature, different kinds of methods have been used for feature mapping, such as manually designed linear or nonlinear weighted summations or multiplications, 18–22 neural networks, 41,42 and support vector regression. 43,44 Usually, the manually designed formulas are applied to model the unknown complex relationship between image features and the quality score. The learning-based tools (e.g., neural networks, support vector regression) are often time-consuming and suffer from problems such as overfitting and convergence to local optima. 45 Therefore, in this study, we adopt an emergent machine learning technique, the extreme learning machine, for quality prediction.
Huang et al. 45–47 originally proposed the extreme learning machine for generalized single hidden-layer feedforward neural networks, and it has been used in various applications. 48–54 The extreme learning machine aims to learn an approximation function based on training data. A single hidden-layer feedforward neural network with K hidden nodes can be represented by

f_K(x) = Σ_{i=1}^{K} β_i h(x; θ_i)

where β_i is the output weight of the ith hidden node and h(x; θ_i) is its activation function. With the hidden-node parameters assigned randomly, the output weights are obtained by solving the least squares problem

min_β ∥Hβ − T∥2

where θ = (a, b) are the parameters of the mapping function, H is the hidden-layer output matrix, T is the target vector, and ∥⋅∥2 denotes the Euclidean norm.
Huang et al. 46 proved that single hidden-layer feedforward neural networks are able to approximate any continuous target function over any compact subset X with sigmoid and radial basis activation functions. Training an extreme learning machine is equivalent to solving a regularized least squares problem, which is considerably more efficient than training support vector machines or learning with back-propagation. Therefore, in our model, an extreme learning machine is employed to map the image features X = [TS SS SAS] into objective quality scores.
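A minimal extreme learning machine along these lines, with random hidden-node parameters and output weights solved by least squares, can be sketched as follows. This is an illustrative implementation, not the authors' code; the hidden-layer size and the sigmoid activation are assumptions:

```python
import numpy as np

def elm_train(X, t, n_hidden=50, seed=0):
    """Train a minimal extreme learning machine.

    X: (n_samples, n_features) feature matrix, t: (n_samples,) targets.
    Hidden-node parameters (a, b) are random and fixed; only the output
    weights beta are learned, via least squares.
    """
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
    b = rng.standard_normal(n_hidden)                 # random biases
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))            # sigmoid hidden outputs
    beta = np.linalg.lstsq(H, t, rcond=None)[0]       # least-squares solution
    return a, b, beta

def elm_predict(X, a, b, beta):
    """Predict targets for new samples with the trained parameters."""
    H = 1.0 / (1.0 + np.exp(-(X @ a + b)))
    return H @ beta
```

In this work the feature vector would be the three similarities [TS, SS, SAS] per fused image, with mean opinion scores as training targets; because only a linear solve is needed, training is far cheaper than back-propagation.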
Experimental results
In this section, we compare the performance of the proposed algorithm with other state-of-the-art algorithms 18–20,22,26,27,30,55–57 on the multi-exposure image fusion database. 30 We analyze the Spearman rank-order correlation coefficients and the Pearson linear correlation coefficients between the objective quality scores and the corresponding subjective scores. Furthermore, a scatter plot of the objective scores predicted by our method versus the subjective scores is also provided to demonstrate good consistency.
Experimental setup
Database
The multi-exposure image fusion database 30 contains 17 natural source image sequences, which are shown in Figure 10 and listed in Table 1. Each source image sequence contains 3 to 30 different exposure images. We choose the best quality source image in terms of subjective evaluation to represent each source sequence, as presented in Figure 10. Eight multi-exposure image fusion algorithms are selected, including (1) global energy weighted linear combination, (2) local energy weighted linear combination, (3) Mertens07, 8 (4) Li12, 9 (5) Gu12, 10 (6) Raman09, 11 (7) ShutaoLi12, 12 and (8) ShutaoLi13. 13 Eventually, a total of 136 fused images are generated, with eight fused images for each image sequence. An example is shown in Figure 1, which includes ten source images at different exposure levels (Figure 1(a)) and eight fused images (Figure 1(b) to (i)).

Input source image sequences in the database. Each image sequence is represented by one image, which is a fused image of the sequence that has the best quality in the subjective test.
Source input image sequences.
Evaluation criteria
As recommended by the Video Quality Expert Group, 58 we use two criteria to evaluate the performance of our proposed image quality assessment model: (1) Spearman rank-order correlation coefficient, which evaluates prediction monotonicity; (2) Pearson linear correlation coefficient, which measures prediction accuracy. The relationship between subjective scores and predicted quality scores may not be linear, owing to the nonlinear responses of human observers. A five-parameter logistic regression function is built between the predicted scores and the subjective scores when calculating the Pearson linear correlation coefficient. Assuming that Q and Qp are the predicted scores before and after regression, respectively, the logistic regression function is defined as

Qp = β1 (1/2 − 1/(1 + exp(β2 (Q − β3)))) + β4 Q + β5

where β1, β2, β3, β4, and β5 are regression model parameters.
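A sketch of this logistic mapping, using the standard five-parameter form recommended in VQEG-style evaluations (assumed here; the paper's exact variant may differ):

```python
import numpy as np

def logistic_5(q, b1, b2, b3, b4, b5):
    """Five-parameter logistic regression function (VQEG-style form):

        Qp = b1 * (1/2 - 1 / (1 + exp(b2 * (Q - b3)))) + b4 * Q + b5

    Maps raw objective scores q to the subjective-score scale before
    computing the Pearson linear correlation coefficient.
    """
    q = np.asarray(q, dtype=np.float64)
    return b1 * (0.5 - 1.0 / (1.0 + np.exp(b2 * (q - b3)))) + b4 * q + b5
```

In practice the five parameters are fitted (e.g., by nonlinear least squares) so that the mapped scores best match the mean opinion scores.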
Parameter setting. The three constants, that is,
Colorful multi-exposure image fusion assessment performance comparison
We compare the proposed approach with state-of-the-art quality metrics for multi-exposure image fusion 18–22,26,27,30,55–57 on the multi-exposure image fusion database. 30 Among these metrics, those of Hossny et al., 55 Cvejic et al., 56 and Wang et al. 57 are based on information theory; the metrics of Xydeas and Petrovic, 18 Wang and Liu, 19 and Zheng et al. 20 use feature-based methods; the metrics of Ma et al. 30 and Piella and Heijmans 22 are based on structure similarity; and the metrics of Chen and colleagues 26,27 are based on human perception. Tables 2 and 3 show the Pearson linear correlation coefficients and Spearman rank-order correlation coefficients of the proposed and other compared quality metrics for multi-exposure image fusion. It can be seen that the proposed method delivers the best performance for almost all sets of test images. On average, the Pearson linear correlation coefficient and the Spearman rank-order correlation coefficient of our proposed method on the multi-exposure image fusion database are 0.9299 and 0.8958, which are much higher than those of the second-best (Pearson linear correlation coefficient: 0.8928; Spearman rank-order correlation coefficient: 0.8570) and third-best methods (Pearson linear correlation coefficient: 0.6950; Spearman rank-order correlation coefficient: 0.6198).
Pearson linear correlation coefficient performance evaluation of proposed model against existing models.
Spearman rank-order correlation coefficient performance evaluation of proposed model against existing models.
Figure 11 shows scatter plots of subjective scores versus the scores of our proposed model on the multi-exposure image fusion database. Note that the scatter plots of the proposed model exhibit good linearity, tight clustering, and a relatively uniform density. They are consistent with the high Pearson linear correlation coefficient and Spearman rank-order correlation coefficient shown in the bottom rows of Tables 2 and 3. We choose a sequence of images fused by the aforementioned multi-exposure image fusion algorithms 8–13 and list the predicted scores of the algorithm of Ma et al., 30 the scores of our proposed algorithm, and the mean opinion scores in Figure 12. It can be seen that the mean opinion score and the quality score of Figure 12(c) predicted by the method of Ma et al. 30 are 3.9565 (highlighted in blue) and 0.8760 (highlighted in red), while those of Figure 12(a) are 5.0434 and 0.8356. Obviously, they are not consistent. In contrast, the proposed method shows good consistency, with values of 5.4440 and 4.8622 for Figure 12(a) and (c) (highlighted in green). The method of Ma et al. 30 exhibits the same inconsistency for Figure 12(c) and (g).

Scatter plots of subjective scores versus our proposed model scores in multi-exposure image fusion database. MOS: mean opinion score.

Fusion images generated by aforementioned multi-exposure image fusion algorithms, 8 –13 with predicted scores of the algorithm of Ma et al. 30 and our proposed algorithm, and mean opinion score values. Numbers in red and green correspond to predicted results of the method of Ma et al. 30 and our proposed algorithm, and those in blue correspond to the mean opinion score values. MOS: mean opinion score.
To further illustrate that the apparent advantages of our proposed model over the other compared methods are statistically significant, a one-sided t test was conducted on the multi-exposure image fusion database using the Spearman rank-order correlation coefficient values. The one-sided t test examines the equivalence of the mean values of two samples drawn from independent, normally distributed populations. Figure 13 shows the t test results, where the symbols "1", "0", and "−1", highlighted in blue, yellow, and red, indicate that the row model is statistically better than, indistinguishable from, or worse than the column model, respectively. The predominantly blue "1" values in the bottom row show that our method is statistically superior to the previous methods for quality assessment on the multi-exposure image fusion database.

One-sided t test results conducted using Spearman rank-order correlation coefficient values of compared quality metrics provided in Table 3. A value of “1” indicates that the row model is statistically better than the column model, while a value of “−1” indicates that the column model is statistically better. A value of “0” indicates that the two models are statistically equivalent. MEF IQA: Multi-exposure image fusion.
Conclusions and discussion
In this work, we propose a quality assessment metric for colorful multi-exposure image fusion. The texture, structure, and saturation similarities are computed as measurements of texture, structure, and color information. The normalized similarities are mapped to objective quality scores by an extreme learning machine. The experimental results on the multi-exposure image fusion database show that the proposed model correlates well with subjective perception and outperforms other compared state-of-the-art image fusion image quality assessment models. For future research, there is still plenty of room for the quality evaluation of other types of image fusion, including multi-focus image fusion, hyper-spectral image fusion, and multi-source heterogeneous image fusion. These all have their own specific properties and remain to be explored.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Major Special Project-the China High-Resolution Earth Observation System (grant number 30-Y20A06-9003-15/16) and partially by the National Natural Science Foundation of China (grant number 61301090).
