Abstract
This paper introduces an efficient feature learning framework via sparse coding for no-reference image quality assessment. The core of the proposed framework is sparse feature extraction from a sparse representation matrix, which is computed using a sparse coding algorithm. Image patches extracted from salient regions of unlabeled images are used to learn a sparse coding dictionary. The
Introduction
Nowadays digital images have a significant effect on representing and communicating information. Since digital images are often distorted during image acquisition and image processing, how to accurately measure image quality has become a popular and challenging issue in computer vision and image processing. According to whether image quality assessment (IQA) requires human participation, it can be classified into objective IQA and subjective IQA. Subjective IQA requires image quality evaluation results to be provided by human visual inspection, and is difficult to employ in real-time image processing because of its strong randomness. Correspondingly, objective IQA can be implemented automatically by computing devices running IQA algorithms; therefore, studying efficient and reliable IQA algorithms has enormous practical significance. Recently, many objective IQA algorithms have been proposed. Based on the availability of a reference image, we can classify objective IQA algorithms into three categories: full-reference (FR), no-reference (NR) and reduced-reference (RR) methods. 1 Although many FR and RR methods that achieve satisfying results have been proposed, 2–5 NR methods, which do not need any pristine reference knowledge to measure quality scores, have more room for improvement and greater practical value. Usually, NR-IQA methods aim to measure the objective quality scores of images with specific distortion types. 6–8 However, methods that estimate the quality scores of arbitrary images are more useful in practical applications and are called distortion-generic methods. In this work we focus on distortion-generic NR-IQA methods.
Most distortion-generic NR-IQA methods estimate image quality by measuring deviations from a Natural Scene Statistics (NSS) model that captures the statistical ‘naturalness’ of non-distorted images. 9 The BIQI and DIIVINE metrics are two-stage frameworks based on the NSS model, adopting a Gaussian scale mixture model to extract features from the wavelet coefficients of images. 10,11 The BRISQUE metric used scene statistics of locally normalized luminance coefficients to evaluate possible losses of naturalness in images, and then applied these losses to predict image quality. 12 The BLIINDS-II method used an NSS model to extract a set of features from the Discrete Cosine Transform (DCT) coefficients of an input image, and then adopted a Bayesian inference approach on these features to predict quality. 13 The M3 model utilized the joint statistics of two local contrast features, the gradient magnitude map and the Laplacian of Gaussian response, to predict image quality. 14 Recently, some NR-IQA approaches have used machine learning to construct a quality assessment model. Tang et al. defined a radial basis function integrated with a deep network to predict perceived image quality. 15 First, the network was pre-trained in an unsupervised manner and labeled data was used to fine-tune the network; then, the image quality scores were predicted by a Gaussian process regression. Bianco and Celona investigated the use of deep learning for distortion-generic NR-IQA. 16 They used a Convolutional Neural Network (CNN) as a feature extractor and then employed a Support Vector Regression (SVR) machine to predict the perceived quality scores.
Ye et al. presented an efficient unsupervised feature learning approach called CORNIA, which adopted several training and encoding methods, including K-means, sparse coding (SC), hard-assignment and soft-assignment (SA), amongst others. 17 Their experimental results show that SA encoding slightly outperformed SC encoding when max-pooling was used to extract features, and their approach achieved good generality and evaluation results. Theoretically, SC, whose basis vectors resemble the receptive fields of simple cells in the mammalian primary visual cortex (also known as the striate cortex and V1), is better suited to obtaining image representations. 18 The reasons why SA encoding achieved better results than SC encoding in the approach of Ye et al. are that SA encoding adopted max-pooling to extract features, while SC encoding ignored the sparsity of the coefficient matrix. Inspired by the overcompleteness and sparsity of SC, 19 we propose a feature extraction framework in which a sparse representation matrix of salient patches is converted to a fixed-length feature vector for NR-IQA. More precisely, a spectral residual (SR) approach, a saliency detection method, is adopted to locate the salient image regions from which image patches are extracted. 20 In addition, to learn the overcomplete dictionary of SC and calculate the sparse representation matrix of salient image patches, the
The LIVE, CSIQ and TID2013 databases are used to test the performance of our proposed approach. Experimental results show that the SFOSR framework, compared with other IQA algorithms, achieves excellent performance in estimating image quality, including in cross-database trials.
The rest of this paper is structured as follows. We first describe the quality estimation approach via SFOSR in Section 2. Section 3 provides the experimental results and corresponding analysis. Section 4 concludes the paper.
Proposed approach
There are four parts to the process of calculating the SFOSR of an image: extracting image patches from salient regions, learning an overcomplete dictionary, calculating sparse representation and performing sparse feature extraction.
Extraction of image patches from salient regions
Usually, the human visual system (HVS) pays more visual attention to pixels in the salient regions of images; in other words, visually salient content captures more attention. Engelke et al. have shown that including information about visual saliency can improve the performance of IQA metrics. 21 Based on this, we apply the SR approach to detect the salient regions. 20
Representing the information of salient regions, the spectral residual is obtained by subtracting the averaged log spectrum from the log spectrum. Given a grayscale input image I(x), let A(f) and P(f) denote the amplitude and phase spectra of its Fourier transform, and let L(f) = log A(f) be the log spectrum. The spectral residual is

R(f) = L(f) − h_n(f) ∗ L(f),

where h_n(f) is an n×n local averaging filter and ∗ denotes convolution. The saliency map is then obtained in the spatial domain as

S(x) = g(x) ∗ |F^(−1)[exp(R(f) + iP(f))]|^2,

where F^(−1) denotes the inverse Fourier transform and g(x) is a Gaussian filter that smooths the result. Considering that the output saliency map S(x) assigns a saliency value to every location, each image patch can be assigned the saliency value of its region in the map. The larger the saliency value, the more visual attention is given to a patch. Patches whose saliency value exceeds a given threshold are retained as salient image patches.

The left column displays the original images. The middle column displays the saliency maps via SR. The right column displays salient image patches from the original gray images.
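As a concrete illustration, the SR saliency computation described above can be sketched in a few lines of NumPy; the 3×3 averaging filter and the Gaussian smoothing width below are illustrative assumptions rather than the paper's exact settings.

```python
# Sketch of the spectral residual (SR) saliency map used to locate
# salient regions; filter sizes here are illustrative assumptions.
import numpy as np

def spectral_residual_saliency(gray, sigma=2.5):
    """gray: 2-D float array; returns a saliency map of the same shape, scaled to [0, 1]."""
    f = np.fft.fft2(gray)
    log_amplitude = np.log(np.abs(f) + 1e-8)   # L(f)
    phase = np.angle(f)                        # P(f)
    # Spectral residual: log spectrum minus its local average (3x3 box filter).
    kernel = np.ones((3, 3)) / 9.0
    pad = np.pad(log_amplitude, 1, mode="edge")
    avg = sum(pad[i:i + gray.shape[0], j:j + gray.shape[1]] * kernel[i, j]
              for i in range(3) for j in range(3))
    residual = log_amplitude - avg             # R(f)
    # Back-transform and square to obtain the raw saliency map.
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    # Smooth with a separable truncated Gaussian g(x).
    x = np.arange(-8, 9)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    saliency = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 0, saliency)
    saliency = np.apply_along_axis(lambda r: np.convolve(r, g, mode="same"), 1, saliency)
    return saliency / (saliency.max() + 1e-12)
```

Patches would then be ranked by the saliency value of their region and kept when it exceeds the chosen threshold.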
Dictionary learning
Because SC explains key response properties of the HVS, and its sparsification of natural images describes the representation pattern of the HVS well, SC is suitable for perceptual quality prediction. This step involves learning an overcomplete dictionary from unlabeled learning images. Feng et al. have described the process of adopting image patches to learn an overcomplete dictionary via SC. 23 Similarly, an overcomplete dictionary can be learned from a set of learning images to represent their local structures. Salient image patches are vectorized and collected as the columns of a data matrix X. From X, dictionary learning seeks a dictionary D and a sparse coefficient matrix S that reconstruct the patches using only a few active basis vectors per patch. This problem can then be transformed into an unconstrained optimization problem, and we can adopt (5) to learn an overcomplete dictionary D:

min_{D,S} ||X − DS||_F^2 + λ||S||_1,    (5)

where λ balances the reconstruction error against the sparsity of S and each basis vector (column of D) is normalized.

The process of learning an overcomplete dictionary via SC. (a) is the projection map of the dictionary.
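A minimal sketch of this dictionary-learning step, using scikit-learn's MiniBatchDictionaryLearning in place of the paper's solver; the 7×7 patch size matches the text, while the dictionary size, sparsity weight and random patches below are small illustrative assumptions.

```python
# Sketch: learn an overcomplete dictionary from vectorized salient patches.
# The random patch matrix stands in for real salient image patches.
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
patches = rng.standard_normal((500, 49))           # 500 patches, 7*7 = 49 pixels each
patches -= patches.mean(axis=1, keepdims=True)     # remove the DC component per patch

learner = MiniBatchDictionaryLearning(
    n_components=100,   # overcomplete: 100 atoms > 49 dimensions
    alpha=1.0,          # weight of the L1 sparsity penalty
    random_state=0,
)
codes = learner.fit_transform(patches)             # sparse representation of the patches
dictionary = learner.components_                   # shape (100, 49): one atom per row
```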
Sparse representation
Given an overcomplete dictionary D learned in the previous step, the sparse representation S of the salient patches of a tested image is obtained by solving the same L1-penalized least-squares problem with D held fixed; each column of S contains the sparse coefficients of one patch.

The process of obtaining the sparse representation for a tested image. (a) is the projection map of the dictionary, and (b) indicates the three-dimensional scatter plots of the sparse representation matrix
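With the dictionary fixed, this encoding step amounts to an L1-regularized regression (lasso) per patch; scikit-learn's sparse_encode implements it directly. The dictionary and patches below are random illustrative stand-ins, and the penalty weight is an assumption.

```python
# Sketch: compute the sparse representation of test patches over a fixed dictionary.
import numpy as np
from sklearn.decomposition import sparse_encode

rng = np.random.default_rng(1)
dictionary = rng.standard_normal((100, 49))        # 100 atoms of dimension 49
dictionary /= np.linalg.norm(dictionary, axis=1, keepdims=True)
test_patches = rng.standard_normal((20, 49))       # 20 salient patches from a test image

# Each row of S holds the sparse coefficients of one patch over the dictionary.
S = sparse_encode(test_patches, dictionary, algorithm="lasso_lars", alpha=0.1)
print(S.shape)  # (20, 100); most entries are exactly zero
```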
Sparse feature extraction
We now have the sparse representation matrix of a tested image; the remaining task is to convert this variable-size matrix into a fixed-length feature vector.

The three-dimensional scatter plots of sparse representation matrices for four images with white noise distortion; the degree of distortion increases from
Considering that the L1-norm is taken as the sparse penalty term when calculating the sparse representation, we add the L1-norm of the sparse representation as part of the final output features. The sparse representation
where symbol vectors
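The paper's exact pooling formula is given in its equations; as a hedged sketch only, the fixed-length feature below combines per-atom max-pooling of the absolute coefficients with the L1-norm term that the text says is appended to the features. The combination and normalization are assumptions for illustration.

```python
# Hedged sketch of converting a sparse representation matrix into a
# fixed-length feature vector (per-atom max-pooling + an L1-norm term).
import numpy as np

def pool_sparse_features(S):
    """S: (n_patches, n_atoms) sparse representation of one image."""
    abs_s = np.abs(S)
    max_pool = abs_s.max(axis=0)                   # one value per dictionary atom
    l1_term = abs_s.sum(axis=0) / max(len(S), 1)   # average L1 mass per atom (assumed form)
    return np.concatenate([max_pool, l1_term])     # fixed length: 2 * n_atoms

features = pool_sparse_features(np.array([[0.0, 1.5, 0.0], [-0.5, 0.0, 2.0]]))
print(features.shape)  # (6,)
```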
Experimental results
Databases and evaluation criteria
To evaluate the performance of the SFOSR framework for NR-IQA, three subjective image quality evaluation databases, LIVE, CSIQ and TID2013, were used in our experiments. The difference mean opinion score (DMOS) or mean opinion score (MOS) values for images in these databases are provided as subjective scores. To illustrate the performance of IQA metrics, four evaluation criteria are used: Spearman rank order correlation coefficient (SROCC), Pearson linear correlation coefficient (LCC), root-mean-squared error (RMSE) and mean absolute error (MAE). Prediction accuracy is measured by LCC, RMSE and MAE, and prediction monotonicity by SROCC. The closer SROCC and LCC are to 1, and the smaller RMSE and MAE are, the better the performance of an IQA method. SROCC and LCC can be calculated by equations (8) and (9)
In these formulas,
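All four criteria can be computed directly with scipy and numpy; the toy score vectors below are illustrative, not results from the paper.

```python
# Sketch: computing SROCC, LCC, RMSE and MAE for predicted vs. subjective scores.
import numpy as np
from scipy.stats import pearsonr, spearmanr

predicted = np.array([30.0, 45.0, 52.0, 61.0, 70.0])     # illustrative predictions
subjective = np.array([28.0, 50.0, 49.0, 63.0, 72.0])    # illustrative DMOS values

srocc = spearmanr(predicted, subjective).correlation      # monotonicity
lcc = pearsonr(predicted, subjective)[0]                  # linear accuracy
rmse = np.sqrt(np.mean((predicted - subjective) ** 2))
mae = np.mean(np.abs(predicted - subjective))
```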
In our experiments, 80% of the samples were selected randomly as a training set for the SVR model and the remaining 20% were used as a testing set. In addition, each experimental result in this section was computed over 1000 train–test iterations. The overcomplete dictionary was obtained from 240 images, consisting of some reference images and images with all types of distortions from the LIVE and CSIQ databases, with the size of image patches fixed to 7×7.
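The regression stage described above can be sketched as follows: an SVR model maps pooled sparse features to quality scores, trained on a random 80/20 split. The synthetic data and RBF-kernel hyperparameters are assumptions, not the paper's exact settings.

```python
# Sketch of the SVR regression stage on an 80/20 random split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 50))                   # 200 images, 50-D feature vectors
y = X[:, 0] * 10 + 50 + rng.normal(0, 1, 200)        # synthetic quality scores

X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.8, random_state=0)            # 80% train, 20% test

model = SVR(kernel="rbf", C=100.0, epsilon=0.5).fit(X_train, y_train)
predicted = model.predict(X_test)
```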
Impact of algorithm modules and parameters
In the SFOSR framework, several modules and parameters influence the performance. Here, we discuss two algorithm modules, the saliency detection method and the feature extraction method, as well as the size of the overcomplete dictionary. The experiments reported here were conducted on the LIVE database with all types of distortions under consideration (80% of samples for training and the rest for testing).
Impact of saliency detection methods
Figure 5 shows 12 groups of non-distortion-specific experimental results for the evaluation measures (SROCC, LCC, RMSE and MAE) of three saliency options: SR, Shannon’s entropy and no saliency detection. Each group has six results for different dictionary sizes (200, 500, 1000, 2000, 3000 and 4000). Comparing these data, we can conclude that, for a fixed dictionary size, adopting SR in NR-IQA achieves better results than either Shannon’s entropy or no saliency detection. In addition, a small dictionary learned from salient patches via SR can replace a large dictionary learned from patches without saliency detection. That a small dictionary can still yield good assessment results is one of this framework’s advantages over the approach of Ye et al. 17

Median LCC (a), SROCC (b), RMSE (c) and MAE (d) across 1000 train–test random splits of the LIVE database, with respect to the size of dictionary. Three modules (SR, Shannon’s entropy and a method without saliency detection) are taken into consideration; for each of the modules, the results of evaluation criteria for different dictionary sizes (200, 500, 1000, 2000, 3000, 4000) are provided.
Impact of dictionary size
Here we turn our attention to the effect of different overcomplete dictionary sizes on the evaluation measures. As shown in Figure 5, the results of this framework improve as the number of basis vectors in the dictionary increases, and satisfying results are obtained when the dictionary size is 4000.
Impact of feature extraction from sparse matrix method
In addition to feature extraction via the

Median SROCC of different feature extraction methods fixing the size of dictionary to 4000.
Performance of SFOSR
The LIVE, CSIQ and TID2013 databases were utilized as the benchmark databases to test the performance of our approach, and the overcomplete dictionary with 4000 basis vectors was used in our experiment. The median values of SROCC and LCC across the 1000 iterations are chosen as the performance evaluation criteria of the IQA metrics. The full-reference metrics PSNR and SSIM, and the blind metrics BIQI, DIIVINE, BLIINDS-II, BRISQUE, M3 and CORNIA, are compared with the proposed SFOSR framework. The SROCC and LCC results obtained on the LIVE, CSIQ and TID2013 databases are given in Tables 1–5, with the best results marked in bold.
Median Spearman rank order correlation coefficient (SROCC) across 1000 iterations on the LIVE database.
Median LCC across 1000 iterations on the LIVE database.
Median SROCC across 1000 iterations on the CSIQ database.
Median LCC across 1000 iterations on the CSIQ database.
Median SROCC across 1000 iterations on the TID2013 database.
Results from the LIVE database are shown in Tables 1 and 2. From these tables, it is clear that our IQA metric via the SFOSR framework outperforms the other general purpose IQA methods in the experiment including all samples. Our proposed method also achieves good performance on distortion-specific samples. In addition, Figure 7, which shows scatter plots of predicted DMOS versus subjective DMOS, roughly illustrates the performance of SFOSR; these plots also demonstrate that our framework achieves stable and good performance across different distortions and the entire LIVE database.

The scatter plots of subjective scores (DMOS) versus predicted quality scores obtained by SFOSR on the LIVE database. In each plot, plus signs (+) colored in blue represent the (predicted quality score, DMOS) values of images, and the black line represents the curve fitted to these values by the least-squares method.
Results from the CSIQ database are shown in Tables 3 and 4, where we tested BIQI, DIIVINE, BLIINDS-II, BRISQUE, M3 and our proposed framework. It is clear that SFOSR and M3 significantly outperform most of the other methods. Compared with M3, SFOSR shows a considerable improvement in SROCC and LCC for the distortion type named ‘global contrast decrements’. Table 5 displays the SROCC tested on TID2013 for BRISQUE, M3 and SFOSR. Although the SROCC of SFOSR for some distortion types, such as mean shift, contrast change and change of color saturation, is not satisfying, SFOSR achieves superior performance for most distortion types compared with the other two methods. For most distortion types in the TID2013 database, the SROCC results in Table 5 illustrate that the dictionary learned from images in the LIVE and CSIQ databases does possess a certain generalization ability.
Cross-database test
In order to illustrate the cross-database performance of our metric, we conduct a database independence experiment by training on the LIVE database and testing on the CSIQ database. We chose CSIQ as the test database because the scenes of its samples are very different from those in the LIVE database. There are four distortions common to the two databases, JPEG2k, JPEG, WN and GBLUR, and we test SROCC and LCC only on these four. For each distortion-specific test and for the all-distortion test, the SVR model is trained on features from the LIVE database and then predicts the assessment results of samples in the CSIQ database. Table 6, which shows the SROCC in the cross-database test, illustrates that our approach has good database independence.
Database independence test: trained on LIVE and tested on CSIQ.
LCC: Pearson linear correlation coefficient.
Conclusions
This paper presented a feature extraction framework named SFOSR for NR-IQA. The algorithm utilizes a sparse representation matrix as the coefficient matrix from which sparse features are extracted. We detailed the sparse feature extraction framework, discussed the impact of dictionary size and two algorithm modules on NR-IQA performance, and compared the performance of SFOSR with other general purpose IQA metrics on the LIVE, CSIQ and TID2013 databases. The experiments demonstrate database independence and show that the proposed algorithm has good generalization ability and performs better than, or on par with, other general purpose IQA algorithms.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
