Model-Based Sensitivity Analysis on Aerosol Optical Thickness Prediction

Abstract

Prediction of aerosol optical thickness (AOT) is important to study worldwide climate changes. Researchers have built multiple AOT prediction models. However, few researches were focused on the validation of input attributes for AOT regression. In this paper, we proposed a support vector regression (SVR) model-based sensitivity analysis approach to order 35 MODIS input attributes according to their sensitivity to prediction outputs. Next, the attribute sensitivity orders are used for feature selection in the context of regression by removing insensitive attribute one at a time or by removing attributes whose sensitive orders are larger than number k. The experimental results based on the collocated data between MODIS and AERONET from 2009 to 2011 showed that the top 10 insensitive attributes can be screened to speed up prediction model computation with very little loss of accuracy. The results also suggested that the top sensitive attributes are the most informative attributes, requiring the highest precision for accurate AOT prediction. Thereby, our approach will be valuable for remote sensing scientists or atmospheric scientists to optimize the design precision of top sensitive attributes in scanning equipment like MODIS and therefore improve AOT retrieval accuracy.

1. Introduction

Aerosols are small solid or liquid particles produced by natural or man-made sources. The research on atmospheric aerosols is very useful to reveal the mechanism of the earth's solar radiation budget, water cycle balance, and climate change dynamics [1, 2]. Aerosol optical thickness (AOT) is one of the most important aerosol properties. AOT has been computed both by ground-based and satellite-based methods for years and many aerosol retrieval theories and algorithms were proposed. Although ground-based measurements, such as AERONET (Aerosol Robotic Network), turn out to be effective and have high accuracy in AOT retrievals, they require great cost correspondingly and are conditioned to a small amount of sporadic land observation sites. Satellite-based measurements based on domain model, such as MODIS (Moderate Resolution Imaging Spectroradiometer), have global coverage with low costs. But they have no enough accuracy due to the complexity nature of chemical or physical processes affecting aerosols.

Thereby, in recent years, scientists worldwide proposed several data-driven retrieval approaches or machine learning methods for AOT prediction and validation [3–13]. In these approaches, satellite-based measurements generally are used as input attributes and corresponding ground-based measurements are used as AOT validation outputs. They are combined together to build a supervised global AOT prediction model.

However, few researches were focused on the validation of input attributes, or, say, feature selection, for regression models. Although feature selection has been studied in classification tasks [14–17], few works have been implemented in regress tasks. To some degree, too many input attributes could be redundant or noisy for accurate model prediction. Particularly, for some types of data, such as images, multiple feature extraction approaches can be applied to obtain many features [18, 19]. But only parts of them are informative for regression models. Thereby, it is very useful to study feature selection problem in the context of regression.

In this paper, we proposed a novel model-based sensitivity analysis approach for feature selection in AOT regression. The approach combines support vector regression (SVR) model with sensitivity analysis (SA) method together to validate the usefulness of model input attributes. SA is a useful tool to ascertain how the output of a given model depends on its input attributes. It has been developed in a growing number of fields, such as domain modeling in hydrology, ecosystem, structure engineering, and so forth, where computational models are used to simulate the real world [20–26]. SA helps the modeler to understand the model better, especially when the model is complex and unknown correlations exist among input attributes.

Our approach consists of three steps. Firstly, we build an optimized support vector regression (SVR) model to predict AOT. This is because SVR model has been demonstrated to achieve better prediction accuracy than other effective machine learning approaches for AOT prediction [3]. Next, we propose a model-based sensitivity analysis approach to order the input attributes by their sensitivity effects to the prediction model outputs. Further, we compare prediction accuracy of model with full inputs to those of models by removing insensitive attribute one at a time or by removing attributes whose sensitive orders are larger than a number k. The experimental results based on the collocated data between MODIS and AERONET from 2009 to 2011 show that the top 10 insensitive attributes can be screened to speed up prediction model computation with very little loss of accuracy. The results also suggested that the top sensitive attributes are the most informative attributes, requiring the highest precision for accurate AOT prediction. Thereby, our model-based sensitivity analysis method will be valuable for remote sensing scientists or atmospheric scientists to optimize the design precision of top sensitive attributes in scanning equipment like MODIS and therefore improve AOT retrieval accuracy.

The contribution of this paper is two sides. On one side, feature selection is normally studied in data classification task and few researchers study the problem in the context of regression prediction. In this paper, we proposed a novel model-based sensitivity analysis method for this purpose. On the other side, experimental results show that our method not only can refine the inputs to AOT prediction model, but also provide valuable insights for remote sensing scientists or atmospheric scientists to optimize the observation equipment design and therefore improve their geophysical parameter retrieval algorithm.

2. Related Work

In recent years, several data-driven retrieval approaches or machine learning methods were proposed for AOT prediction and validation [3–13]. Radosavljevic et al. used MODIS radiance observations as inputs and predicted AERONET AOT by neural networks [4]. Further, they applied five measures to evaluate the AOT retrieval accuracy [5]. Both experimental results showed that the proposed ensemble of neural networks was significantly more accurate than domain-based AOT retrievals for all measures. To make models more accurate, Radosavljevic et al. proposed that the predictors should be customized according to different spatiotemporal partitions [6]. They also explored to reduce AERONET sites and select only most informative neighborhood sites to improve accuracy [7]. Based on the research by Radosavljevic et al., Ristovski et al. proposed a bootstrap technique for regression and uncertainty estimation [8]. Das et al. argued that AOT predictor could be enhanced by combining active learning method with neural networks [9]. Han et al. applied a statistic approach to predict AOT as a complement to the domain algorithm [10]. In their research, two statistic approaches, spatial interpolation and neural network predictors, were explored. The results showed that statistic approach could serve as a useful complement to traditional deterministic methods with reduced computational efforts. Albayrak et al. used a neural network algorithm with one hidden layer to build a global bias adjustment model to improve MODIS AOT retrieval accuracy [11]. Besides of neural networks methods, support vector regression (SVR) was used for AOT prediction by Nguyen et al. [12]. They used instance data set and aggregate data set, respectively, to build two SVR models for AOT predictions and they achieved more accurate results than neural network predictions. In addition, Djuric et al. proposed a semisupervised approach to integrate AOD estimations from multiple satellite sensors together and make more accurate estimations [13]. However, all aforementioned related work did not use data-driven approach to validate the input attributes and check if the inputs really make significant contributions for AOT regression models.

3. Method

In this section, we will explain the proposed model-based sensitivity analysis approach for feature selection in AOT regression. The approach consists of three steps, building a SVR prediction model, SVR model-based sensitivity analysis, and feature selection for regression. The processing steps are illustrated in Figure 1. We will explain each step in detail as follows.

Figure 1

Processing flow.

3.1. Prediction Model: SVR

SVR is a machine learning method to do regression prediction. It was built on the basis of the Vapnik-Chervonenkis dimension theory and structure risk minimization principle in statistical learning. SVR maps model's input attributes from lower nonlinear dimensions to higher dimensional feature space and tries to find the best regression hyperplane. The most commonly used SVR model is epsilon-SVR.

Suppose the training dataset is composed of points $(x_{1}, y_{1}), (x_{2}, y_{2}), \dots, (x_{n}, y_{n})$ , where $x_{i} = 〈a_{1}, a_{2}, \dots, a_{m}〉$ is a feature vector of m attributes for sample point i, $a_{j}$ is attribute j, $y_{i}$ is the regression target for $x_{i}$ , and n is the number of points in dataset. In epsilon-SVR, we aim to find a regression hyperplane with an epsilon-insensitive loss function as follows:

\begin{matrix} f (x) = 〈w, \emptyset (x)〉 + b, \end{matrix}

(1)

where w and b are weight vector and offset, respectively and ∅ is a nonlinear mapping function from the input attribute space to a high dimensional feature space.

〈\cdot, \cdot〉

represents the inner product of involved parameters.

For the function f to be epsilon-insensitive and also as flat as possible, we have the objective function and constraints for SVR as follows:

\begin{array}{l} Minimize & \frac{1}{2} {‖ w ‖}^{2} + C \sum_{i = 1}^{n} (δ_{i} + δ_{i}^{*}) \\ Subject to & y_{i} - 〈 w, \emptyset (x_{i}) 〉 - b \leq epsilon + δ_{i} \\ 〈 w, \emptyset (x_{i}) 〉 + b - y_{i} \leq epsilon + δ_{i}^{*} \\ δ_{i}, δ_{i}^{*} \geq 0. \end{array}

(2)

The constant parameter

C > 0

determines the trade-off between the flatness of f and the training error.

δ_{i}

and

δ_{i}^{*}

are slack variables suggesting the amount for exceeding the target value by more than epsilon or below the target value by epsilon.

By introducing the Lagrangian and performing optimization, we can have the regression hyperplane in the following dual representation:

\begin{matrix} f (x) = \sum_{i = 1}^{n} (α_{i} - α_{i}^{*}) K (X_{i}, X) + b . \end{matrix}

(3)

Here, $α_{i}$ and $α_{i}^{*}$ are Lagrange multipliers and $K (X_{i}, X)$ is the kernel function which represents the inner product $〈\emptyset (X_{i}), \emptyset (X)〉$ .

There are four major types of kernel functions, linear kernel, polynomial kernel, radial basis function kernel, and sigmoid kernel. In this study, we use epsilon-SVR with radial basis function kernel provided by libsvm [27] for AOT prediction. Formula (4) shows Gaussian Kernel, one of the radial basis function kernel used by libsvm:

\begin{matrix} K (x_{i}, x_{j}) = \exp (- γ {‖x_{i} - x_{j}‖}^{2}) . \end{matrix}

(4)

Here

x_{i}

x_{j}

are two samples in the dataset, represented as vectors in an input attribute space.

{‖x_{i} - x_{j}‖}^{2}

represents the squared Euclidean distance between them. γ is a free parameter.

3.2. Model-Based Sensitivity Analysis

To apply feature selection, we firstly need to order attributes according to their contribution to the regression model, such that the most informative attributes are in the top and the noisy or useless attributes are in the bottom. In this paper, we combine SVR regression model with SA method together to propose a model-based sensitivity analysis approach for deciding the attribute order.

SA is a powerful method widely used in studying the uncertainty of model inputs. It orders input attributes by their degree influencing the model outputs. The most sensitive attributes with the biggest impact to outputs are ranked in the top. And the insensitive attributes with no or very little impact on outputs are ranked in the bottom.

In our approach, by using the kernel function showed in formula (4), we build the epsilon-SVR regression model.

As we explain before, $x_{i}$ is a vector of multiple attributes; that is, $x_{i} = 〈a_{i}, a_{2}, \dots, a_{m}〉$ . To analyze the influence of a small offset Δ of attribute variable $a_{j}$ to the dependent variable $Y_{i}$ , we could use the following formula:

\begin{array}{l} d_{j} (x) \\ = \sum_{i = 1}^{n} | \frac{Y_{i}^{'} - Y_{i}}{Δ} | \\ = \sum_{i = 1}^{n} | \frac{f {(a_{1}, \dots, (a_{j} + Δ), \dots, a_{m})}_{i} - f {(a_{1}, \dots, a_{j}, \dots, a_{m})}_{i}}{Δ} | . \end{array}

(5)

The larger the $d_{j} (x)$ is, the more influence $a_{j}$ has on $Y_{i}$ . Therefore, we change the value of each attribute slightly at one time in m rotation cycles and we could get m changes of the SVR model outputs. Sorting the input attributes by their influences on SVR outputs, we will have the attribute sensitivity order.

3.3. Feature Selection by Filtering out Insensitive Attributes

After we obtained the sensitivity order of input attributes to regression outputs, we can apply feature selection by filtering out insensitive attributes.

We have designed two types of feature selection. One is the univariate filtering out. In this case, according to the reversed sensitivity order, we will remove an insensitive attribute from inputs to the SVR regression model one at a time. By experiments, we can pragmatically point out which attribute can be filtered out from inputs with no or little loss of regression accuracy or even improving accuracy. The other type of feature selection is multivariate filtering out. We will leave out attributes whose sensitivity orders are larger than number k. k is optimized by experiments. The SVR regression model with the remaining attributes will achieve similar accuracy as the model with full attributes.

3.4. Measurements of Prediction Accuracy

Our sensitivity analysis is built based on the SVR regression model. Each input attributes are evaluated by their sensitivity degree to impact the SVR prediction accuracy. For fully judge of the impact, we select multiple widely used regression accuracy measures in AOT retrievals.

The most simple and plain measurements are mean square error (MSE):

\begin{matrix} MSE = \frac{\sum_{i = 1}^{n} {(y_{i} - t_{i})}^{2}}{n}, \end{matrix}

(6)

where

t_{i}

is the AERONET AOT as ground trues and

y_{i}

is the corresponding AOT prediction obtained from the SVR model.

Another most commonly used measure is coefficient of determination ( $R^{2}$ ):

\begin{matrix} R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - t_{i})}^{2}}{\sum_{i = 1}^{n} {(t_{i} - \bar{t})}^{2}} . \end{matrix}

(7)

Here, $\bar{t}$ is the mean value of t across all n collected samples. The highest $R^{2}$ accuracy is 1. The closer it is to 1, the more accurate the model output is.

Correlation coefficient (CORR) is the indicator of the degree of correlation between truth and prediction variables; it is often used to measure the regression accuracy and defined as

\begin{matrix} CORR = \frac{\sum_{i = 1}^{n} (y_{i} - \bar{y}) (t_{i} - \bar{t})}{\sqrt{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}} \sqrt{\sum_{i = 1}^{n} {(t_{i} - \bar{t})}^{2}}} . \end{matrix}

(8)

According to domain scientists, there are inherent measurement errors in MODIS AOT retrievals [2]. The expected boundary is defined as

\begin{matrix} |y_{i} - t_{i}| \leq 0.05 + 0.15 t_{i} . \end{matrix}

(9)

Based on this boundary, two domain specific measurements of AOT retrieval accuracy, mean square relative error (MSRE) and fraction of successful prediction (FRAC), were proposed in [4]. We also use these two measurements to evaluate our model.

MSRE is defined as

\begin{matrix} MSRE = \frac{1}{n} \sum_{i = 1}^{n} {(\frac{y_{i} - t_{i}}{0.05 + 0.15 t_{i}})}^{2} . \end{matrix}

(10)

The closer it is to 0, the more accurate the AOT predictor is.

FRAC is defined as

\begin{matrix} FRAC = \frac{I}{N} * 100 %, \end{matrix}

(11)

where I is the number of predictions that drop within the expected boundary. The closer it is to 1, the more accurate the AOT regression model is.

4. Experimental Results

4.1. Data Set

Like many previous studies, we use AERONET retrievals as ground-truth for validation of our prediction outputs. We randomly picked 24 AERONET sites among 40 whose longitude is between 70°E and 140°E and latitude between 20°N and 50°N (shown in Figure 2). And their level 2.0 cloud-screened and quality-assured AERONET data from 2009 to 2011 were collected. Since AERONET does not provide AOT retrievals at wavelength of 550 nm, we interpolated it with its measurements at 440 nm and 675 nm by the following equation:

\begin{matrix} \log_{10} τ_{550} = \frac{\log_{10} (τ_{675} / τ_{440})}{\log_{10} (675 / 440)} * \log_{10} (\frac{550}{440}) + \log_{10} τ_{440} . \end{matrix}

(12)

Figure 2

AERONET sites used in the experiment.

As for MODIS, we choose three aerosol related products from MODIS instrument aboard on TERRA. They are MOD02SSH level 1B radiance data with spatial resolution of 5 km, MOD04 level 2 aerosol retrievals, and MOD35 level 2 cloud mask product with resolution of 1 km. To collocate MODIS data with AERONET data spatially, we apply a region box of ±0.15 degree in latitude and longitude around the corresponding AERONET site when considering MODIS data. MODIS information in the region box is synchronized with the temporal mean values of the AERONET AOT observations taken within ±30 minutes of MODIS overpass. In this way, we derived 35 attributes from MODIS products, represented as $x = 〈a_{1}, a_{2}, \dots, a_{35}〉$ . $a_{1}$ is the number of validated gflags in the region, $a_{2}$ is AOT at 550 nm for both ocean (best) and land (corrected) with best quality data, $a_{3} – a_{5}$ are corrected optical thickness land at 470, 550, and 660 nm by MODIS deterministic algorithm, $a_{6} – a_{19}$ are the average and minimum MODIS radiances over cloud-free pixels for seven wavelengths between 0.47–2.1 $μ m$ , $a_{20} – a_{26}$ are the average MODIS radiance uncertainties for the seven wavelengths of cloud-free pixels, $a_{27} – a_{31}$ are the number of cloud-free pixels over water, coastal, desert, and land in the region, and $a_{32} – a_{35}$ are mode analysis results of the first four bytes in MODIS quality assurance (QA) flags of land. MODIS provides five bytes in QA flags of land in total. But we observe that the fifth byte values are all constant 0 throughout the collected samples, so it is omitted and we just use the first four bytes of QA flags for modeling. The name and explanation of 35 attributes are summarized in Table 1.

Table 1

35 attributes derived from three MODIS products.

Attribute index	Name and explanation
1	Number of validated gflags in the region

2	MODIS AOT at 550 nm by deterministic algorithm for both ocean (best) and land (corrected) with best quality data

3–5	MODIS corrected AOT land at 470, 550, and 660 nm by deterministic algorithm

6–12	Average of MODIS radiances over cloud-free pixels for seven wavelengths

13–19	Minimum of MODIS radiances over cloud-free pixels for seven wavelengths

20–26	Average MODIS radiance uncertainties for seven wavelengths over cloud-free pixels

27–31	Number of cloud-free pixels over water, coastal, desert, and land in the region

32–35	Mode analysis results of the first four bytes in MODIS quality assurance flags of land

For each piece of MODIS data, it was taken into account only when MODIS data within the region described above contains at least one noncloud pixel and at least one AERONET AOT retrieval is available within the time range of ±30 minutes around MODIS overpass. In total, we obtain 1080 spatially and temporally collocated data samples.

4.2. SVR Model Optimization

To build a SVR predict model, we use the tool libsvm [27]. It is an efficient and widely used SVM tool package. It can be applied to many classification and regression problems and provides four types of common kernel functions. We first format the experiment data into following form:

\begin{matrix} 〈y_{i}〉 〈index 1〉 : 〈a_{1}〉 〈index 2〉 : 〈a_{2}〉 \dots . \end{matrix}

(13)

The AERONET AOT retrieval result was labeled as $y_{i}$ above, followed by 35 MODIS derived attributes.

Then we normalize the input attributes of formatted samples. All attribute values were normalized into the range $[- 1,1]$ . After that, 5 rounds of random selection were applied to the 1080 samples and we obtained 5 groups of training sets and test sets. In each group, 540 samples were selected as training set and the rest samples as test set. The SVR model was trained on each training set and applied on each corresponding test set. $R^{2}$ was used to evaluate the model.

In order to gain an optimal model, parameters of each SVR model are optimized by using grid-search provided by libsvm. We choose epsilon-SVR and radial basis kernel function. The optimized parameters are c, g, and p, representing cost in epsilon-SVR, gamma in kernel function, and epsilon in loss function of epsilon-SVR, respectively. Other parameters are set to the default values.

First, the searching ranges of the three parameters were all set to [ $2^{- 15}, 2^{15}$ ], and the search step increment was set to 2². For each parameter setting, the model was trained on 5 train sets and tested on corresponding test sets. The 5 rounds of resulted R-squares are reported in Table 2. From the results, we see that through all 5 rounds of modeling and tests, the optimized parameters are very close to each other and the R-squares are around 0.81. It shows that SVR model stably achieves optimization with these settings.

Table 2

Results of SVR model parameter optimization.

Number	Param.	Min	Step	Max	Search result
1	c	2⁻¹⁵	2²	2¹⁵	$c = 512$
	g	2⁻¹⁵	2²	2¹⁵	$g = 0.0078125$
	p	2⁻¹⁵	2²	2¹⁵	$p = 0.0078125$
	p	2⁻¹⁵	2²	2¹⁵	$R^{2} = 0.811328$

2	c	2⁵	2	2¹⁵	$c = 128$
	g	2⁻¹²	2	2⁰	$g = 0.015625$
	p	2⁻¹²	2	2⁰	$p = 0.015625$
	p	2⁻¹²	2	2⁰	$R^{2} = 0.812316$

3	c	2⁵	2^0.5	2⁹	$c = 90.5097$
	g	2⁻⁹	2^0.5	2⁻³	$g = 0.015625$
	p	2⁻⁹	2^0.5	2⁻³	$p = 0.0220971$
	p	2⁻⁹	2^0.5	2⁻³	$R^{2} = 0.812806$

4	c	2⁵	2^0.25	2⁸	$c = 90.5097$
	g	2⁻⁷	2^0.25	2⁻⁴	$g = 0.015625$
	p	2⁻⁷	2^0.25	2⁻⁴	$p = 0.0220971$
	p	2⁻⁷	2^0.25	2⁻⁴	$R^{2} = 0.812806$

5	c	2⁶	2^0.1	2⁷	$c = 78.7932$
	g	2⁻⁷	2^0.1	2⁻⁵	$g = 0.0167465$
	p	2⁻⁷	2^0.1	2⁻⁴	$p = 0.0236831$
	p	2⁻⁷	2^0.1	2⁻⁴	$R^{2} = 0.812822$

4.3. Sensitivity Analysis Experiments

To explore the impact of each input attribute on model prediction output, we change the value of each attribute with a small offset one at a time and then calculate the prediction output difference as described in (5).

By sorting $d_{j} (x)$ in descending order with a constant Δ, we obtain the sensitivity order of all input attributes to the model. To get a statistically robust result, we have done the experiment for 1000 times. Every time the training set and test set are randomly selected. Δ is set to 0.01. We obtain a $1000 * 35$ matrix M. Matrix element $M (i, j)$ represents the sensitivity order of attribute j in experiment i. The ascending order of $\sum_{i = 1}^{1000} M (i, j)$ suggests the overall statistic sensitivity result for all 35 attributes. The detailed experimental results are reported in Table 3. In the table, the sensitivity order of 35 attributes can be divided into three groups. The first 10 attributes are in the sensitive group, the last 10 are in the insensitive group and the rest are in the medium-sensitive group.

Table 3

Sensitivity order of attributes by 1000 times of experiments.

Sensitivity order	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18

Attribute number	3	11	7	26	9	10	13	4	16	5	18	12	21	19	8	6	2	28

Sensitivity order	19	20	21	22	23	24	25	26	27	28	29	30	31	32	33	34	35

Attribute number	20	25	15	17	29	14	24	30	27	22	33	1	32	34	31	23	35

Note: 1st attribute in sensitivity order is the most sensitive attribute and the 35th-order attribute is the most insensitive attribute.

To further validate the sensitivity order of 35 attributes, we repeat experiments by changing parameters from three perspectives: the repeating number of independent experiments, the value of Δ, and the model parameter setting.

For the first aspect, we simply repeat the experiments for 2000 times. For the second aspect, we try $Δ \in [- 0.3,0.3]$ with a step of 0.01 and obtain 60 statistic results with different Δ. For the third aspect, we apply model jitter with a small offset on SVR model parameters. We aim to see if the sensitivity orders of attributes remain stable by jittering.

By analyzing the experimental results, we find that the sorting order of attributes in the sensitive group and insensitive group is stable through all experiments by various conditions in the three aspects. The rest medium-sensitive group shows no fixed pattern. But it is rare that attributes in one group moved to another group when condition changed.

These experimental results proved that the sensitivity orders of attributes reported in Table 3 keep stable in various conditions.

4.4. Feature Selection Evaluation

In this step, we aim at exploring feature selection by using the obtained sensitivity order results. We implement two groups of experiments. In the first group, we remove an attribute from inputs to the model one at a time in the reversed sensitivity order. In the second group, we remove all input attributes whose sensitivity orders are larger than number k. In both groups, attributes are removed in reversed sensitivity order as reported in Table 3. That is, in the first group, we rotationally remove the 35th, 23rd, 31st, …, until 3rd attribute. In the second group of experiments, we will firstly model regression by leaving out 35th attribute, then model regression by leaving out 35th and 23rd attributes, next model regression by leaving out 35th, 23^rd, and 31st attributes, …, and so forth. When every time we remove an attribute or multiple attributes with sensitivity order larger than k, the model training process was reoptimized and prediction test process was redone. We use the five measures, $R^{2}$ , MSE, CORR, MSRE, and FRAC, to evaluate the prediction accuracy.

The changes of the above five measures as we remove one attribute from inputs to the model in the first group of feature selection experiments are shown in Figure 3. From Figure 3, we can see that roughly the five measures changes little when we remove any of the top 25 insensitive and medium-sensitive attributes. In such conditions, models with one less attributes achieve accuracy around $R^{2} = 0.805$ and it is very close to the model accuracy with the full 35 input attributes. It suggests that any insensitive and medium-sensitive attribute can be screened to speed up prediction model with almost no loss of accuracy.

Figure 3

Changes of R-square, MSE, CORR, MSRE, and FRAC when we remove an attribute in sensitivity order one at a time.

In particular, Figure 3 shows that the 3rd sensitive order attribute; that is, attribute 7, the average MODIS radiances over cloud-free pixels at wavelength 0.55 $μ m$ , plays an import role for achieving accurate prediction. Thereby, the design of high precision detection at this wavelength on MODIS will be very useful for resulting in more accurate AOT retrievals.

The exception insensitive attributes are in the order at 5 and 7. The 5th insensitive attribute is $x 32$ , representing the first byte of QA flag for MODIS retrieval on land. The 7th insensitive attribute, $x 33$ , represents the second byte of QA flag. When removing $x 32$ , the value of R-square and CORR decrease and the value of MSE increase visibly. The result indicates that the information in the first byte of MODIS QA flag land has an important effect on model prediction even though $x 32$ is an insensitive attribute. While leaving out $x 33$ shows the opposite effect. The model prediction accuracy is even better when we remove it. It suggests that we can safely leave out $x 33$ .

In this way, by implementing experiments practically, we found that sensitive attributes affect the prediction accuracy greatly while other insensitivities are not. In addition, we observe that there are some correlations among the input attributes, affecting the SVR model prediction accuracy. So if a sensitive attribute was left out, but the information it contains can be covered by other remaining attributes, then the prediction accuracy will not change much. This phenomenon can partially explain the reason why removing any attribute in the range of reversed sensitivity order 10–25 has little impact to regression accuracy.

The changes of the five measures as we remove all input attributes whose sensitivity orders are larger than a number k are shown in Figure 4. From Figure 4, we see that the five measures did not change much ( $R^{2}$ keeps around 0.80) even if we leave out the top 10 insensitive attributes. But when removing the top sensitive attributes, the prediction accuracy will drop sharply. It suggests that the sensitivity order of attributes is effective for feature selection and useful to build a fast prediction model with less input attributes.

Figure 4

Changes of R-square, MSE, CORR, MSRE, and FRAC when removing all input attributes whose sensitivity orders are less than k.

5. Conclusions

In this paper, we proposed a SVR model-based sensitivity analysis approach for feature selection in the context of AOT regression. Specifically, we firstly use our method to order 35 input MODIS attributes according to their sensitivity to prediction outputs. Next, the attribute sensitivity orders are used to carry out feature selection by removing insensitive attribute one at a time or by removing attributes whose sensitive orders are larger than number k. The experimental results of regression based on the collocated data between MODIS and AERONET from 2009 and 2011 showed that the top 10 insensitive attributes can be screened to speed up prediction model computation with very little loss of accuracy. The results also suggested that the top sensitive attributes are helpful to identify the most informative attributes as well as the attributes requiring the highest precision for AOT prediction. Thereby, our approach will be valuable for remote sensing scientists or atmospheric scientists to optimize the design precision for some observation attributes in scanning equipment, like MODIS, and further improve AOT retrieval accuracy.

6. Future Work

Next, we will experimentally combine sensitivity analysis with other data mining models, such as neural networks and decision trees, to test their model-based sensitivity analysis ability.

In addition, we will apply the proposed model-based sensitivity analysis method to other satellite sensors in A-train satellite constellation and analyze their informative observing attributes for AOT regression. Then we aim to fuse these selected attributes from multiple sensors together for further improving AOT regression accuracy.

Our future work also includes extending our proposed method in big data cases in the internet of things or astronomy context [28–30]. In these cases, we will collect information from multiple sources with different scale of uncertainty. For example, for estimating astronomy photometric redshift, we need to combine multibands together from multiple sky surveys, such as SDSS, UKIDSS, and WISE. It is useful to first apply our model-based sensitivity analysis approach to pick out those attributes which make significant contributions for the regression model before information fusion from multiple sources.

Footnotes

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61272272 and 61440054. The authors also would like to thank the NASA scientists for providing MODIS and AERONET data.

References

Xie

S.-P.

Xiang

B. Q.

Similar spatial patterns of climate responses to aerosol and greenhouse gas changes

Nature Geoscience 2013 6 10 828 832

10.1038/ngeo1931

2-s2.0-84884990477

Ichoku

Kaufman

Y. J.

Remer

L. A.

Levy

Global aerosol remote sensing from MODIS

Advances in Space Research 2004 34 4 820 827

10.1016/j.asr.2003.07.071

2-s2.0-3242660787

Lary

D. J.

Remer

L. A.

MacNeill

Roscoe

Paradise

Machine learning and bias correction of MODIS aerosol optical depth

IEEE Geoscience and Remote Sensing Letters 2009 6 4 694 698

10.1109/lgrs.2009.2023605

2-s2.0-70350276188

Radosavljevic

Vutevic

Obradovic

Aerosol optical depth retrieval by neural networks ensemble with adaptive cost function

Proceedings of the 10th International Conference on Engineering Applications of Neural Networks

August 2007

Thessaloniki, Greece

266 275

Radosavljevic

Vucetic

Obradovic

A data-mining technique for aerosol retrieval across multiple accuracy measures

IEEE Geoscience and Remote Sensing Letters 2010 7 2 411 415

10.1109/LGRS.2009.2037720

2-s2.0-77951208625

Radosavljevic

Vucetic

Obradovic

Spatio-temporal partitioning for improving aerosol prediction accuracy

Proceedings of the SIAM International Conference on Data Mining

2008

609 620

Radosavljevic

Vucetic

Obradovic

Reduction of ground-based sensor sites for spatio-temporal analysis of aerosols

Proceedings of the 3rd International Workshop on Knowledge Discovery from Sensor Data (SensorKDD ‘09)

June 2009

71 78

10.1145/1601966.1601980

2-s2.0-70449876035

Ristovski

Vucetic

Obradovic

Uncertainty analysis of neural-network-based aerosol retrieval

IEEE Transactions on Geoscience and Remote Sensing 2012 50 2 409 414

10.1109/TGRS.2011.2166120

2-s2.0-84856363007

Das

Radosavljevic

Vucetic

Obradovic

Reducing need for collocated ground and satellite based observations in statistical aerosol optical depth estimation

Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS ‘08)

July 2008

II879 II882

10.1109/igarss.2008.4779135

2-s2.0-66549117136

10.

Han

Vucetic

Braverman

Obradovic

A statistical complement to deterministic algorithms for the retrieval of aerosol optical thickness from radiance data

Engineering Applications of Artificial Intelligence 2006 19 7 787 795

10.1016/j.engappai.2006.05.009

2-s2.0-33749242922

11.

Albayrak

Wei

Petrenko

Lynnes

Levy

R. C.

Global bias adjustment for MODIS aerosol optical thickness using neural network

Journal of Applied Remote Sensing 2013 7 1

073514

10.1117/1.jrs.7.073514

2-s2.0-84887467141

12.

Nguyen

T. N. T.

Vani

S. M.

Campalani

Cavicchi

Bottoni

Aerosol optical thickness retrieval from satellite observation using support vector regression

Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications 2010 6419

Berlin, Germany

Springer

492 499 Lecture Notes in Computer Science

10.1007/978-3-642-16687-7_65

13.

Djuric

Kansakar

Vucetic

Semi-supervised learning for integration of aerosol predictions from multiple satellite instruments

Proceedings of the 23rd international joint conference on Artificial Intelligence

2013

AAAI Press

2797 2803

14.

Pal

Foody

G. M.

Feature selection for classification of hyperspectral data by SVM

IEEE Transactions on Geoscience and Remote Sensing 2010 48 5 2297 2307

10.1109/tgrs.2009.2039484

2-s2.0-77951295936

15.

Bron

Smits

van Swieten

Niessen

Klein

Feature selection based on SVM significance maps for classification of dementia

Machine Learning in Medical Imaging 2014 8679

Springer

272 279 Lecture Notes in Computer Science

10.1007/978-3-319-10581-9_34

16.

Lin

K. C.

Huang

Y. H.

Hung

J. C.

Lin

Y. T.

Feature selection and parameter optimization of support vector machines based on modified cat swarm optimization

International Journal of Distributed Sensor Networks. In press

17.

Hao

Liu

A rule based feature selection approach for target classification in wireless sensor networks with sensitive data applications

International Journal of Distributed Sensor Networks 2014 2014 7

429651

10.1155/2014/429651

2-s2.0-84899951031

18.

Nixon

Feature Extraction & Image Processing 2008

Academic Press

19.

Liu

Zhang

W.-S.

Feature extraction using maximum variance sparse mapping

Neural Computing and Applications 2012 21 8 1827 1833

10.1007/s00521-010-0519-9

2-s2.0-84867732117

20.

Zhan

C.-S.

Song

X.-M.

Xia

Tong

An efficient integrated approach for global sensitivity analysis of hydrological model parameters

Environmental Modelling and Software 2013 41 39 52

10.1016/j.envsoft.2012.10.009

2-s2.0-84871038799

21.

Hall

J. W.

Boyce

S. A.

Wang

Dawson

R. J.

Tarantola

Saltelli

Sensitivity analysis for hydraulic models

Journal of Hydraulic Engineering 2009 135 11 959 969

10.1061/(asce)hy.1943-7900.0000098

2-s2.0-70350402150

22.

Sandberg

N. H.

Sartori

Brattebø

Sensitivity analysis in long-term dynamic building stock modeling—exploring the importance of uncertainty of input parameters in Norwegian segmented dwelling stock model

Energy and Buildings 2014 85 136 144

10.1016/j.enbuild.2014.09.028

23.

Confalonieri

Bellocchi

Bregaglio

Donatelli

Acutis

Comparison of sensitivity analysis techniques: a case study with the rice model WARM

Ecological Modelling 2010 221 16 1897 1906

10.1016/j.ecolmodel.2010.04.021

2-s2.0-77954034219

24.

Marco

Gilles

Global sensitivity analysis in the identi cation of cohesive models using full-eld kinematic data

Blucher Material Science Proceedings 2014 1 1 29

25.

Morris

D. J.

Speirs

D. C.

Cameron

A. I.

Heath

M. R.

Global sensitivity analysis of an end-to-end marine ecosystem model of the North Sea: Factors affecting the biomass of fish and benthos

Ecological Modelling 2014 273 251 263

10.1016/j.ecolmodel.2013.11.019

2-s2.0-84890158343

26.

Campolongo

Saltelli

Cariboni

From screening to quantitative sensitivity analysis. A unified approach

Computer Physics Communications 2011 182 4 978 988

10.1016/j.cpc.2010.12.039

2-s2.0-79251599792

27.

Chang

C. C.

Lin

C. J.

LIBSVM: a Library for support vector machines

ACM Transactions on Intelligent Systems and Technology 2011 2 3, article 27

10.1145/1961189.1961199

2-s2.0-79955702502

28.

Sun

Yan

Zhang

Xia

Wang

Bie

Tian

Organizing and querying the big sensing data with event-linked network in the internet of things

International Journal of Distributed Sensor Networks 2014 2014 11

218521

10.1155/2014/218521

29.

Sun

Jara

A. J.

An extensible and active semantic model of information organizing for the internet of things

Personal and Ubiquitous Computing 2014 18 8 1821 1833 3

10.1007/s00779-014-0786-z

30.

Zhang

Peng

Zhao

X.-B.

Estimating photometric redshifts of quasars via the k-nearest neighbor approach based on large survey databases

The Astronomical Journal 2013 146 2, article 22

10.1088/0004-6256/146/2/22

2-s2.0-84880637782