Abstract
The internal patch prior (e.g. self-similarity) has achieved great success in image denoising. However, it is challenging to utilize clean external natural patches for denoising, because natural image patches come from very complex distributions that are hard to learn without supervision. In this paper, we use an autoencoder to discover and exploit these underlying distributions, learning a compact representation that is more robust to realistic noise. By jointly exploiting the learned external prior and internal self-similarity, we develop an efficient patch sparse coding scheme for real-world image denoising. Numerical experiments demonstrate that the proposed method outperforms many state-of-the-art denoising methods, especially in removing realistic noise.
Introduction
Image denoising is a classical ill-posed problem in low-level vision. It aims to recover the original image signal
The seminal work of nonlocal means 7 is based on the assumption that a local patch often has many similar nonlocal patches across the image. The use of such internal self-similarity has significantly enhanced denoising performance and has led to many effective denoising algorithms, such as block-matching three-dimensional filtering (BM3D). 8
Based on sparse representation, another popular approach is to encode an image patch as a linear combination of a few atoms selected from a dictionary. Since the seminal work of KSVD, 11 learning dictionaries from natural image patches has attracted much attention.10,12,13 Given a noisy patch matrix
Internal self-similarity-based patch methods have been successful in denoising. However, learning a good prior from natural patches remains a great challenge. The plain multilayer perceptron (MLP) method 15 uses a neural network to learn a denoising procedure from training examples consisting of pairs of noisy and noise-free image patches. By viewing image patches as samples of a multivariate random vector and considering that natural images are non-Gaussian, Zoran and Weiss 16 modeled clean natural image patches using Gaussian mixture models (GMMs) with learned means, full covariance matrices, and mixing weights over all pixels. Recently, GMMs have been extended to patch group learning 17 and patch clustering 18,19 for high-performance denoising.
Since the image patch space is very complex, there is no guarantee that a GMM is a good choice for patch prior learning. Figure 1 shows two patches with very similar average intensity and covariance matrix; the GMM classifier 16 cannot distinguish them directly. As a consequence, collaborative filtering over such patches may fail to restore image intensity effectively. In contrast to the GMM, which mainly learns the covariance matrices of clean natural patches, in this paper we take a different approach, inspired by recent advances in unsupervised learning.20–22 Using clean natural patches, we train an autoencoder to learn patch features that are more robust against realistic noise, since we do not assume the corrupting noise to be additive white Gaussian noise (AWGN). In the denoising stage, the learned external prior guides internal noisy patch clustering, followed by a sparse coding scheme that estimates the latent patch group for image recovery.

Two patches have very similar values of average intensity and covariance matrix.
Image patch learning by autoencoder
An autoencoder neural network is an unsupervised learning algorithm.20,22 It attempts to map inputs to their hidden representations. Suppose we have a total of
The hidden features
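For concreteness, a single-hidden-layer autoencoder of the kind described above can be sketched as follows. The patch size (8 × 8), hidden width (32 units), and sigmoid activations are illustrative assumptions, not the exact configuration used in our experiments:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Hypothetical sizes: 8x8 patches (64-d input), 32 hidden units.
n_in, n_hidden = 64, 32
W1 = rng.normal(0, 0.1, (n_hidden, n_in))   # encoder weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0, 0.1, (n_in, n_hidden))   # decoder weights
b2 = np.zeros(n_in)

def encode(x):
    """Map a vectorized patch to its compact hidden representation."""
    return sigmoid(W1 @ x + b1)

def decode(h):
    """Reconstruct the patch from the hidden features."""
    return sigmoid(W2 @ h + b2)

x = rng.random(n_in)          # a vectorized 8x8 patch
h = encode(x)                 # 32-d feature vector
x_hat = decode(h)             # 64-d reconstruction
print(h.shape, x_hat.shape)
```

Training minimizes the reconstruction error between x and x_hat over the set of clean patches, so the hidden vector h is forced to be a compact summary of the patch.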
Modeling natural patches is challenging because image patches are continuous and high-dimensional. Figure 2(a) visualizes the eigenvectors of one Gaussian component of the GMM learned from the Berkeley segmentation dataset. 23 It can be seen that each eigenvector encodes only a single kind of patch structure. Figure 2(b) visualizes the encoding weights of the learned autoencoder. Each input pixel in the encoder has an associated vector of weights, which is trained to respond to a particular visual feature of image patches. The encoding weights can thus extract more compact features from image patches. Therefore, the learned parameters

Eigenvectors of one Gaussian component learned by the GMM and encoding weights by the learned autoencoder. (a) Eigenvectors of one Gaussian component learned by the GMM and (b) Encoding weights by the learned autoencoder.
Image denoising by using external patch prior
Nonlocal self-similarity has been widely adopted in patch-based image denoising. However, how to learn a patch prior from clean natural images and apply it to image restoration is still an open problem. Based on the idea that a good patch prior should be robust to noise, we incorporate the autoencoder-based external patch prior into the denoising framework.
Including external patch prior into the framework
Given a noisy image
Then, all overlapped noisy patches could be partitioned into
Note that the parameter
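The partitioning step can be illustrated with a minimal sketch: extract all overlapped patches and assign each one to its nearest class center. The helper names, image, and class centers below are hypothetical stand-ins; in the proposed method the assignment of equation (3) operates on autoencoder features rather than raw patch intensities:

```python
import numpy as np

def extract_patches(image, patch_size, stride=1):
    """Collect all overlapped patches of the image as rows of a matrix."""
    H, W = image.shape
    p = patch_size
    patches = [image[i:i+p, j:j+p].ravel()
               for i in range(0, H - p + 1, stride)
               for j in range(0, W - p + 1, stride)]
    return np.array(patches)

def cluster_labels(features, centers):
    """Assign each patch feature to the nearest class center."""
    d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)

rng = np.random.default_rng(1)
img = rng.random((16, 16))                         # toy image
P = extract_patches(img, patch_size=8, stride=4)   # 9 patches, each 64-d
centers = rng.random((3, P.shape[1]))              # 3 hypothetical class centers
labels = cluster_labels(P, centers)
print(P.shape, labels.shape)
```

In practice each patch would first be passed through the learned encoder, and the class centers would be computed in the feature space, which is what makes the clustering robust to non-Gaussian noise.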
Firstly, GMM-based image patch modeling could not effectively capture the natural patch distribution. For example, Figure 3(a) shows a noisy image

GMM model vs. autoencoder. (a) A noisy patch from image
Secondly, GMM-based patch clustering generally depends on the noise type. Since the denoising procedure needs several iterations for better results, the noise distribution is no longer Gaussian after the first iteration. Unlike existing methods,14,17,18 which require estimating the standard deviation of the noise to search for similar patches in each iteration, autoencoder-based patch learning obtains a more compact representation and can be considered a nonlinear mapping that is more robust to noise.
Thirdly, the realistic noise in real-world noisy images is much more complex than white Gaussian noise. The low-rank approximation used in PCLR becomes much less effective when applied to real-world noisy images captured by CCD or CMOS cameras. In contrast, the weighted sparse coding scheme can characterize the statistics of realistic noise in a patch group.
Algorithm: Image denoising by using external patch prior
Input: Noisy image
Output: The recovered image
Initialization:
Iterative regularization:
  Patch clustering by computing the class label by (3);
  For each class do
    Weighted sparse coding model by (6);
    Reconstruct the image by aggregation
  end for
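A minimal Python skeleton of this procedure might look as follows. The clustering rule, the class-dependent threshold, and the aggregation are deliberately simplified stand-ins for equations (3) and (6), not the paper's exact formulation:

```python
import numpy as np

def denoise_sketch(y, n_iters=3, delta=0.1, tau=0.05, n_classes=4, p=8):
    """Illustrative skeleton: iterative regularization, patch clustering,
    per-class shrinkage, and reconstruction by aggregation."""
    rng = np.random.default_rng(0)
    x = y.astype(np.float64).copy()
    H, W = x.shape
    centers = rng.random((n_classes, p * p))      # hypothetical class centers
    for _ in range(n_iters):
        x = x + delta * (y - x)                   # iterative regularization step
        out = np.zeros_like(x)
        cnt = np.zeros_like(x)
        for i in range(0, H - p + 1, p):
            for j in range(0, W - p + 1, p):
                patch = x[i:i+p, j:j+p].ravel()
                # Class label by nearest center (analogue of Eq. (3)).
                k = ((centers - patch) ** 2).sum(axis=1).argmin()
                # Class-dependent soft threshold as a stand-in for Eq. (6).
                est = np.sign(patch) * np.maximum(np.abs(patch) - tau * (k + 1), 0.0)
                out[i:i+p, j:j+p] += est.reshape(p, p)
                cnt[i:i+p, j:j+p] += 1
        x = out / np.maximum(cnt, 1)              # reconstruct by aggregation
    return x

y = np.random.default_rng(1).random((16, 16))
x_hat = denoise_sketch(y)
print(x_hat.shape)
```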
Denoising algorithm
Owing to the rapid development of non-convex optimization techniques, 19 it can be shown that equation (4) has a closed-form solution: a soft-thresholding operation on the sparse coding coefficient matrix
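The soft-thresholding operation itself is simple to state in code. A scalar threshold is shown here; passing a matrix of per-entry thresholds (which works by broadcasting) gives the weighted variant used in weighted sparse coding:

```python
import numpy as np

def soft_threshold(A, tau):
    """Entrywise soft-thresholding: zero out entries with |a| <= tau,
    and shrink the magnitude of the remaining entries by tau."""
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

A = np.array([[0.8, -0.3],
              [0.1, -1.2]])
print(soft_threshold(A, 0.5))  # 0.8 -> 0.3, -1.2 -> -0.7, small entries -> 0
```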
Due to the non-convexity of the objective function, equation (5) is difficult to solve directly, and an alternating minimization algorithm is commonly employed. Given the prior
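An alternating scheme of this kind can be sketched as follows. The two sub-steps, a soft-thresholded coding update and a reconstruction update that balances data fidelity, are illustrative stand-ins under an assumed orthonormal dictionary D, not the exact updates of the paper's equations:

```python
import numpy as np

rng = np.random.default_rng(2)
D = np.linalg.qr(rng.normal(size=(16, 16)))[0]   # assumed orthonormal dictionary
Y = rng.normal(size=(16, 8))                     # noisy patch group, one patch per column

X, tau = Y.copy(), 0.2
for _ in range(5):
    C = D.T @ X                                        # codes under the fixed dictionary
    A = np.sign(C) * np.maximum(np.abs(C) - tau, 0.0)  # sparse coding sub-step
    X = 0.5 * (Y + D @ A)                              # reconstruction sub-step
print(X.shape)
```

Alternating the two sub-steps until convergence yields the estimated latent patch group, which is then aggregated back into the image.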
Experimental results and discussion
To validate the effectiveness of the proposed method, we apply it to both synthetic AWGN-corrupted images and real-world noisy images captured by CCD or CMOS cameras. The proposed method contains two stages: the prior learning stage and the denoising stage. In the learning stage, we use an autoencoder with default parameter settings to learn the patch prior from a set of

The 12 popularly used test images.

The 15 cropped real-world noisy images used in the dataset. 25
Results on AWGN noise removal
To better demonstrate the role of the external patch prior in our model, we compare it with several state-of-the-art denoising methods, including BM3D, 8 EPLL, 16 DnCNN, 26 SAIST, 14 PGPD, 17 and PCLR. 18
In the denoising stage, we test it on 12 popularly used test images. Gaussian white noise with standard deviations
As shown in Table 1, the best two PSNR results for each image are highlighted in bold, and the proposed algorithm outperforms the other methods in most cases in terms of PSNR. When the standard deviation of the noise varies from 10 to 90, the average PSNR of the proposed method is about 0.4 dB higher than that of BM3D. The visual comparison of the denoising methods on noise level (

Denoising results on image
Denoising PSNR (dB) results by different denoising methods.
Results on realistic noise removal
This subsection evaluates the proposed method on a publicly available real-world noisy image dataset. 25 Since the dataset is very large, Xu et al. 27 cropped 15 smaller images of size 512 × 512 for the experiments. These noisy images were captured by Canon 5D Mark 3, Nikon D600, and Nikon D800 cameras, as shown in Figure 5. Each scene was captured in 500 shots, and the mean image of these 500 shots serves as a kind of ground truth for computing the PSNR and SSIM. 28 We compare the proposed method with CBM3D, 29 DnCNN, 26 the commercial software Neat Image (NI), 30 the Noise Clinic algorithm (NC), 31 and GID. 19 For CBM3D, PCLR, and the proposed method, we use a statistical method 32 to estimate the standard deviation of the noise. For DnCNN in blind mode, we use the color version provided by the authors, so there is no need to estimate the noise standard deviation. To handle RGB images, we extend PCLR and the proposed method by stacking the three color channels.
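With the mean image as ground truth, the PSNR evaluation reduces to a few lines. The synthetic scene and Gaussian shot noise below are placeholders for the real captured shots:

```python
import numpy as np

def psnr(clean, noisy, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(noisy, float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
scene = rng.integers(0, 256, size=(32, 32)).astype(np.float64)  # toy scene
shots = scene + rng.normal(0, 5, size=(500, 32, 32))            # 500 synthetic shots
ground_truth = shots.mean(axis=0)   # per-pixel mean over shots as ground truth
print(round(psnr(ground_truth, shots[0]), 1))
```

Averaging over many shots suppresses the zero-mean noise by roughly the square root of the number of shots, which is why the mean image is a reasonable ground-truth proxy.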
The PSNR (dB) results and average computational times of the different methods are listed in Table 2; the best PSNR result for each image and the fastest runtime are highlighted in bold. One can see that on 11 out of the 15 images, our method achieves the best PSNR values, because real-world noise is much more complex than Gaussian noise. All experiments are run on a laptop with an Intel Xeon E3 CPU (3.40 GHz). Visual comparisons of the denoising methods are shown in Figure 7. One can see that CBM3D and DnCNN generate noise-caused color artifacts across the whole image, while PCLR tends to over-smooth the image slightly; these results show that methods designed for AWGN are not effective for realistic noise removal. The methods NI and NC also fail to remove the realistic noise effectively. Since autoencoder-based patch learning is more robust to realistic noise, the proposed method removes the noise while maintaining details (see the zoom-in window) much better than the competing method GID.

Denoised images of a region cropped from the real-world noisy image by different methods. The images are better to be zoomed-in on screen. (a) Noisy image (PSNR 29.63 dB), (b) CBM3D (PSNR 31.13 dB), (c) DnCNN (PSNR 29.83 dB), (d) NI (PSNR 31.28 dB), (e) NC (PSNR 33.49 dB), (f) PCLR (PSNR 35.04 dB), (g) GID (PSNR 33.28 dB), (h) Ours (PSNR 35.15 dB) and (i) Mean image.
PSNR (dB) results and averaged computational time (s) on 15 cropped real-world noisy images used in the dataset. 25
Conclusion
This paper extends patch clustering-based image denoising by incorporating an external patch prior. Due to the inherent complexity of the patch space, an autoencoder is used to learn patch features and build a good low-dimensional representation. In contrast to a GMM, the proposed denoising algorithm finds better similar patches and preserves edge and texture areas more efficiently. Experimental results show that the proposed algorithm achieves very competitive denoising performance; in particular, it preserves texture structures under realistic noise better than the other state-of-the-art denoising algorithms.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (61771141), the Natural Science Foundation of Fujian Province (2017J01751, 2017J01502), and the Scientific Research Foundation of Fuzhou University (XRC-17015).
