Abstract

BACKGROUND:

In clinical medicine, low-dose radiographic image noise reduces the quality of the detected image features and may have a negative impact on disease diagnosis.

OBJECTIVE:

In this study, Adaptive Projection Network (APNet) is proposed to reduce noise from low-dose medical images.

METHODS:

APNet is built on a U-shaped network architecture to capture multi-scale information and achieve end-to-end image denoising. To adaptively calibrate important features during information transmission, a dual attention residual block is integrated throughout the encoding and decoding phases. A non-local attention module separates noise from image texture details by applying adaptive image projection during feature fusion.

RESULTS:

To verify the effectiveness of APNet, experiments are performed on lung CT images with synthetic noise. The results demonstrate that the proposed approach outperforms recent methods in both quantitative metrics and visual quality. A denoising experiment on dental CT images further indicates that the network generalizes to some degree.

CONCLUSIONS:

The proposed APNet is an effective method that can reduce image noise and preserve the required image details in low-dose radiographic images.

Keywords

Medical image denoising, residual learning, subspace projection, X-ray CT, low dose CT

1 Introduction

X-ray and computed tomography (CT) technologies have become widely used in clinical medicine over the past few years for both diagnosis and therapy. CT is an advanced technology built on X-ray imaging that can represent more information about organs in three-dimensional space. To minimize the radiation dose to which patients are exposed, the body is usually scanned with a low-dose CT (LDCT) device. Nevertheless, LDCT scanning results in noise and a loss of image sharpness. Medical images typically contain granular noise because of the intrinsic properties of the imaging equipment, interference from the outside environment, and other causes. Two types of noise that frequently appear in medical images are Gaussian noise and quantum noise; the latter follows the Poisson distribution. Noise-polluted images lose features, which reduces the accuracy of clinical diagnosis. Therefore, a denoising pre-processing stage is crucial for medical images.

The goal of image denoising is to remove noise from images. Existing noise reduction techniques first assumed a single noise distribution, such as additive white Gaussian noise (AWGN) [1–5], and were later extended to Poisson-Gaussian mixed noise [6, 7]. Because they can distort the details of image edges while denoising, traditional denoising techniques such as wavelet filtering [8], non-local means (NLM) [9], and block-matching 3D (BM3D) [10] cannot meet the needs of medical diagnostics.

Convolutional neural network (CNN)-based image denoising has gained popularity thanks to advances in deep learning. To enhance the performance of AWGN removal, DnCNN [1] adds batch normalization and residual learning. FFDNet [11] and CBDNet [12] progressively consider more sophisticated noise distributions. To achieve deep end-to-end denoising, the Memory Network (MemNet) [13] presents an extended network memory model that incorporates the output of all short-term and long-term memory units. The Convolutional Super-Resolution Network for Multiple Degradations (SRMD) [14] obtains a model appropriate for multiple degraded images by considering the influence of both the noise level and the blur kernel. Zamir et al. [15] proposed MIRNet, an architecture for full-resolution processing that retains high-resolution spatial details while obtaining contextual information from low-resolution representations. They also proposed the multi-stage MPRNet [16], which significantly enhances noise removal performance. With NBNet, Cheng et al. [17] introduced adaptive image projection to image denoising for the first time; the network is capable of noise suppression and maximal detail retention. The Half Instance Normalization Network (HINet) [18] uses instance normalization (IN) [19] as a building block to enhance network performance.

Although these techniques have improved image denoising, several drawbacks remain. First, CNN-based approaches produce over-smoothed transitions and missing details because they cannot adapt to texture features. In a scene with hardly discernible details, it is challenging to restore high-quality images. Second, because LDCT images contain many types of noise, channel characteristics should be adjusted based on a prior so that areas with considerable noise receive more weight. Moreover, many networks improve model performance by adding more layers [20–22]. Redundant layers, however, cause model degradation as the network's depth grows and increase the time and memory needed for computation.

At present, the above methods have been applied to LDCT image denoising. It is challenging for model-based methods [24–29] to use the hybrid noise distribution as a prior for the LDCT image denoising problem. Learning-based approaches [30–36], in turn, are not sufficiently adaptable to the image content, which makes it difficult for them to distinguish between noise and texture information. To address these issues, a projection-based deep attention denoising network called Adaptive Projection Network (APNet) is proposed. The underlying clean image with texture features is recovered from the noisy medical image using non-local information. To account for changes in texture and edges, we designed a Dual Attention Residual Block (DARB) that weights significant features. The network uses DARB in an encoder-decoder structure based on U-Net [23] to eliminate noise from coarse to fine. Additionally, we designed a Subspace Feature Fusion Module (SFFM) that separates the combined noise and detail texture in medical images by feature projection and further fuses features in the skip connections. Our approach achieves good performance while using significantly less computation than the most sophisticated approaches.

In brief, the contributions to this work are as follows:

A dual attention residual block that includes both spatial and channel attention mechanisms is suggested to integrate with the U-shaped network. While extracting noise features, the weight can be modified adaptively to suppress unnecessary information.

A subspace feature fusion module that combines encoding and decoding capabilities is proposed. Convolution is used to enhance the stable network in hierarchical feature recovery. Then, the orthogonal projection approach is introduced to reconstruct the image in feature space. The combined noise and the intricate texture of the image may be distinguished in the rebuilt image.

An adaptive projection network is proposed.

The remaining parts of the paper are organized as follows. Section 2 describes the motivation of the usage of the image projection for the image denoising problem. Section 3 presents the proposed method. Section 4 explains the innovation of the proposed approach. Section 5 presents experimental results. Section 6 concludes the paper.

2 Research motivation

Deep learning methods usually involve a huge amount of high-dimensional data, and in the case of images, each dimension is also known as an image feature. However, for the training dataset of the network, only a few features contain useful information, and the importance of features in other dimensions is insufficient. Through feature extraction, representative features are selected from the high-dimensional feature vectors to improve the performance of network learning information while reducing the dimension. However, information loss is inevitable in the process of dimension reduction. In order to reduce the loss of dimension reduction, it is necessary to find those dimensional features that represent important information as much as possible when designing the algorithm.

Therefore, some subspace methods are used in image processing, such as the Fourier transform and wavelet denoising. The basic principle of this kind of method is to decompose the digital image over a set of bases, let the neural network find the basis in which the noise is located, and remove this part while preserving the original information as much as possible.

The proposed algorithm exploits non-local image information through an orthogonal linear projection that effectively compresses high-dimensional feature data. The principle of image projection is shown in Fig. 1. A set of image basis vectors is generated by the neural network from the feature map of the input image, and the vectors in the original feature map are then projected onto these basis vectors; that is, the reconstructed image is obtained in the subspace spanned by these basis vectors. The learned basis vectors suppress the noise components without affecting the components that carry the image texture. During training, the network automatically learns to generate the basis vectors of the signal subspace and projects the input image into that subspace. Since natural images usually lie in a low-rank signal space, accurately learned basis vectors allow the image texture information to be enhanced after reconstruction. Through image projection, the reconstructed image retains the important features and most of the original information while suppressing noise that is unrelated to the generated basis vectors. Subspace projection therefore makes it possible to train a network that separates signal from noise.

Fig. 1

Principle of image projection. By learning and generating a set of feature basis vectors accurately through neural network, the vectors in the input feature map are projected onto these feature basis vectors, and the reconstruction image can retain the structure information of the image well and suppress the noise.

3 Proposed network architecture

3.1 Network structure

Denoising algorithms in use have been successful at learning specific noise information, but when faced with more complicated noise intensity, these networks frequently fail to hold onto non-significant features. As a result, we suggest APNet, a network for denoising using image adaptive projection. The network uses the orthogonal linear projection given in the previous section and has adaptive learning ability. Figure 2 depicts the structure. The three steps of APNet are the encoder stage, decoder stage, and fusion stage.

Fig. 2

The network architecture of the Adaptive Projection Network (APNet).

After the noisy image is fed into the network, the encoding stage performs down-sampling four times, and four groups of DARBs extract the features at each scale. The decoder receives the low-level features obtained by encoding via a residual block (RB). DARBs are also used for up-sampling and for extracting high-level features during the decoding phase. To make up for information lost during sampling, RB and SFFM are introduced into the skip connections in the fusion stage. The encoder sends low-level features to the SFFM, and the decoder sends high-level features to the module. To further restore the image’s details, the projection features obtained from the low-level feature mapping are combined with the high-level features before being sent to the appropriate encoder. A convolution layer then produces the residual output of the reconstructed image.

The network’s training set is made up of pairs of clean images and images with medical noise. We denote the clean image by x, the noisy image by y, and the denoised image produced by the network by $\hat{y}$. Our model can thus be represented as follows: $\hat{y} = F_{APNet} (y)$ (1)

F_APNet (·) stands for the model created through neural network training. L₁ loss between the clean image x and the denoising result $\hat{y}$ is utilized as a loss function during the training process. The formalized form of the loss function L₁ (·) is:

$L_{1} (x, \hat{y}) = \| x - \hat{y} \|_{1}$ (2)

The trained network model F_APNet (·) can then be applied directly to the noisy images in the test set to produce denoised images.
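As a small worked instance of Eq. (2), the sketch below evaluates the L₁ distance between a clean image and a denoised estimate stored as nested lists. The function name `l1_loss` and the toy pixel values are illustrative assumptions and are not taken from the authors' code.

```python
def l1_loss(clean, denoised):
    """Sum of absolute pixel differences between two equally sized images (Eq. 2)."""
    if len(clean) != len(denoised) or any(
        len(a) != len(b) for a, b in zip(clean, denoised)
    ):
        raise ValueError("images must have the same shape")
    return sum(
        abs(a - b)
        for row_clean, row_denoised in zip(clean, denoised)
        for a, b in zip(row_clean, row_denoised)
    )


# Hypothetical 2x2 example: clean image x and network output y_hat.
x = [[0.0, 0.5], [1.0, 0.25]]
y_hat = [[0.25, 0.5], [0.5, 0.25]]
loss = l1_loss(x, y_hat)  # |0 - 0.25| + |1 - 0.5| = 0.75
```

Deep learning frameworks often average this sum over all pixels by default (for example, PyTorch's `L1Loss` uses a mean reduction), which rescales the loss but does not change its minimizer.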

3.2 Dual attention residual block (DARB)

We offer a residual learning design module to execute down-sampling and up-sampling operations to increase the stability of network training. Figure 3(a) demonstrates that our RB is made up of convolutional layers and the Leaky ReLU [37]. In contrast to the modules in the traditional residual network (ResNet) [38], our RB eliminates extraneous elements to streamline the information flow during the learning phase.

Fig. 3

The structure of some key building blocks of the network.

To improve the network’s capacity for detail recognition, the RB contains a dual attention mechanism. Figure 3(b) depicts the structure of the dual attention residual block. The deep attention (DA) module is on the upper branch, whereas the channel attention (CA) [39, 43] module is on the lower branch. CA uses convolutional feature mapping to extract significant channel characteristics. The input feature map is first squeezed with global average pooling (GAP) [41] to reduce its spatial dimension. The compressed feature descriptor then passes through an excitation operation that consists of two convolution layers and a Leaky ReLU activation function. Lastly, the features are transformed with the Sigmoid activation function to obtain the CA branch output. Going deeper into the network, the DA branch mines the correlation of spatial variables. The DA module first performs GAP and maximum pooling (MP) operations on the feature map and concatenates the outputs. The weight of the input features is then calibrated in the spatial dimension, and the deep mapping is obtained using a convolutional layer and the Sigmoid activation function.

The output feature graph I_DARB of DARB can be expressed as follows given an input feature graph I: $I_{DARB} = I + F_{Conv} (F_{C} (F_{S} (I)) \cdot F_{D} (F_{S} (I)))$ (3)

Before the feature enters the attention branch, the shallow feature extraction process is represented by F_S, the output of the CA and DA branches are represented by F_C and F_D, respectively, and the final convolution operation is represented by F_Conv. To aid the network in learning features more effectively, DARB is utilized to extract useful detail texture information from feature maps and suppress irrelevant information.
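The gating idea behind Eq. (3) can be illustrated with a toy example. The sketch below is one plausible reading only, under strong assumptions: the learned layers F_S and F_Conv and the excitation convolutions are replaced by identity mappings, the pooling in the spatial branch is taken across channels, and all function names are hypothetical. It is not the authors' implementation.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_gate(feat):
    """One weight per channel: global average pooling followed by a sigmoid."""
    gates = []
    for channel in feat:
        values = [v for row in channel for v in row]
        gates.append(sigmoid(sum(values) / len(values)))
    return gates


def spatial_gate(feat):
    """One weight per pixel: mean and max across channels, then a sigmoid."""
    height, width = len(feat[0]), len(feat[0][0])
    gate = []
    for i in range(height):
        row = []
        for j in range(width):
            pixel = [channel[i][j] for channel in feat]
            row.append(sigmoid(sum(pixel) / len(pixel) + max(pixel)))
        gate.append(row)
    return gate


def darb_toy(feat):
    """Residual output I + (channel gate * spatial gate * I); learned layers omitted."""
    cg, sg = channel_gate(feat), spatial_gate(feat)
    return [
        [[v + cg[c] * sg[i][j] * v for j, v in enumerate(row)]
         for i, row in enumerate(channel)]
        for c, channel in enumerate(feat)
    ]


# Hypothetical feature map with 2 channels of size 2x2.
feat = [[[1.0, -2.0], [0.5, 0.0]],
        [[0.0, 3.0], [-1.0, 2.0]]]
out = darb_toy(feat)
```

Because both gates lie in (0, 1), every output value keeps the sign of its input and is scaled by a factor between 1 and 2, which shows how the residual path preserves the input while the attention branches reweight it.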

3.3 Subspace feature fusion module (SFFM)

Figure 4 depicts the structure of the Subspace Feature Fusion Module (SFFM), which is added to each skip connection during the fusion stage. Before the low-level features from the encoder and the high-level features from the decoder are combined, two convolution layers aggregate them. This makes the network more stable during optimization, because the structural information from the deep layers and the texture information from the shallow layers are better integrated during hierarchical feature recovery. The fused features are then fed into the RB, which resizes the feature map. The noise information is then separated using an orthogonal linear subspace projection. In the neural network-based image projection method, a set of basis vectors is created in the feature space of the input image, and the vectors of the feature map are projected onto these basis vectors to transform the feature map into the signal subspace.

Fig. 4

The architecture of the Subspace Feature Fusion Module (SFFM).

After convolutional refinement, the lower-level feature map from the skip connection is projected into the signal subspace guided by the up-sampled higher-level feature, and the projected feature is then fused with the original higher-level feature. Let I_en denote the low-level feature map that the encoder passes through the skip connection and refines by convolution, and let I_de denote the refined high-level feature map, which has the same size as I_en. Assume that a subspace U based on the feature maps I_en and I_de has a set of basis vectors [u₁, u₂, ⋯ , u_K]. The orthogonal linear projection of the low-level feature vector I_en onto these basis vectors is:

$U (I_{en}) = λ_{1} u_{1} + λ_{2} u_{2} + \dots + λ_{K} u_{K} = \sum_{k = 1}^{K} λ_{k} u_{k} = U λ$ (4)

where λ = [λ₁, λ₂, ⋯ , λ_K] ^T and U = [u₁, u₂, ⋯ , u_K] is the matrix whose columns are the basis vectors. As the vector I_en - U (I_en) is orthogonal to every basis vector of the subspace U, we can also derive: $\begin{bmatrix} u_{1}^{T} \\ \vdots \\ u_{K}^{T} \end{bmatrix} (I_{en} - U (I_{en})) = 0 \Rightarrow U^{T} (I_{en} - U λ) = 0 \Rightarrow λ = {(U^{T} U)}^{- 1} U^{T} I_{en}$ (5)

The reconstruction of the feature map I_en in the signal subspace is therefore: $U (I_{en}) = U λ = U {(U^{T} U)}^{- 1} U^{T} I_{en}$ (6)

The feature map supplied by the encoder contains more of the original information. The image is rebuilt via a projection operation, and the denoising procedure incorporates the global structure. The projection can preserve the local structure information of the input signal, improve the reconstructed signal, and improve the ability of the denoising process to discriminate the noise.
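Equations (4) to (6) can be checked numerically on a small example. The following sketch solves the normal equations (UᵀU)λ = UᵀI_en by Gauss-Jordan elimination and then forms the projection Uλ. The helper names and the toy basis are assumptions for illustration; in APNet the basis vectors are generated by the network rather than fixed by hand.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project(basis, v):
    """Orthogonal projection of v onto span(basis) via the normal equations (Eqs. 5-6)."""
    k = len(basis)
    # Augmented system [U^T U | U^T v].
    aug = [
        [dot(basis[i], basis[j]) for j in range(k)] + [dot(basis[i], v)]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("basis vectors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    lam = [aug[i][k] / aug[i][i] for i in range(k)]          # coefficients λ
    proj = [sum(lam[i] * basis[i][d] for i in range(k))      # U λ
            for d in range(len(v))]
    return lam, proj


# Hypothetical example: two non-orthogonal basis vectors spanning the x-y plane.
basis = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
v = [2.0, 3.0, 5.0]
lam, proj = project(basis, v)
residual = [a - b for a, b in zip(v, proj)]
```

Here the component of v outside the plane ends up entirely in the residual, which is orthogonal to both basis vectors. This is the mechanism by which content that the learned basis does not represent, ideally the noise, is discarded.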

4 Innovation of the proposed approach

The network proposed in this paper refers to the classic U-Net U-shaped structure, whose backbone is divided into two symmetrical left and right parts. On the left is the feature extraction network, the encoder, where the original input image is down-sampled four times through a stack of convolution and maximum pooling. On the right side is the feature fusion network, that is, the decoder. While up-sampling is carried out, the feature maps of each level and the feature maps obtained by deconvolution are fused by skip connection. Inspired by U-Net, we adopted this coding-decoding framework. The difference is that:

For low-level visual tasks, especially images in the medical field, traditional network structures are not sufficient to extract multi-level features. Therefore, we introduce residual modules in encoder and decoder to extract the features of each scale. Compared with U-Net convolution, the structure of residual can solve the training optimization problem when the number of layers deepens. This is because the residual structure does not increase the number of parameters of the model through the identity mapping, which means that the computational complexity will not increase, and the convergence speed of the model will also be accelerated, so that the accuracy of the network performance will be improved to a certain extent.

In particular, to further avoid information loss while features are transmitted, we transform the one-way skip connection of the traditional structure into a two-way information transmission. The skip connection in U-Net transmits features from the encoder to the decoder, whereas in our network the skip connection receives features from the decoder and the encoder at the same time to compensate for the information lost during sampling. The traditional convolution structure depends on fixed local filters, so we also introduce the projection mechanism, which reconstructs the texture based on globally determined coefficients. The reconstructed image helps the network further refine the result and separate the mixed noise from the real texture information.

5 Experiments

5.1 Experimental setting

5.1.1 Dataset

We used 2048 images from the Low-Dose Parallel Beam (LoDoPaB)-CT dataset [33] to train our model for image denoising. The ground truth (GT) images in this dataset are two-dimensional slices of human chest CT reconstructions taken from the public LIDC/IDRI database. Paired training data were generated by adding simulated noise at different levels to the 2048 GT images. Gaussian noise and Poisson noise were each synthesized with MATLAB’s built-in function imnoise. To test the denoising effect of the model under various noise intensities, 256 images from the LoDoPaB-CT dataset were used as the test set. In addition, we used dental CT images provided by SHDMU (Stomatological Hospital of Dalian Medical University, China) to further test the model’s denoising capabilities.
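The noise synthesis can be sketched in a few lines. The example below is a minimal illustration and not a reimplementation of imnoise: it assumes 8-bit intensities in [0, 255], adds zero-mean Gaussian noise first, and then draws each pixel from a Poisson distribution whose mean is the Gaussian-corrupted value. The function names, the clipping, and the sampling method (Knuth's multiplication method) are assumptions.

```python
import math
import random


def poisson_sample(mean, rng):
    """Draw one Poisson variate with Knuth's multiplication method."""
    if mean <= 0:
        return 0
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def add_mixed_noise(image, sigma, rng):
    """Add Gaussian noise with standard deviation sigma, then Poisson noise."""
    noisy = []
    for row in image:
        noisy_row = []
        for pixel in row:
            # Gaussian step, clipped to the assumed 8-bit range.
            g = min(255.0, max(0.0, pixel + rng.gauss(0.0, sigma)))
            # Poisson step: the corrupted intensity is used as the mean.
            noisy_row.append(min(255, poisson_sample(g, rng)))
        noisy.append(noisy_row)
    return noisy


rng = random.Random(0)                      # fixed seed for repeatability
clean = [[50.0] * 8 for _ in range(8)]      # hypothetical flat 8x8 patch
noisy = add_mixed_noise(clean, sigma=10.0, rng=rng)
```

Setting `sigma=0.0` reduces the sketch to pure Poisson noise. How the noise levels used in the experiments map onto imnoise parameters is not specified here, so the values above are placeholders.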

5.1.2 Baseline algorithm

As a baseline, the proposed network was evaluated against numerous image-denoising methods, such as DnCNN [1], IRCNN [40], FFDNet [11], and NBNet [17].

5.1.3 Evaluation metric

We calculated the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) of the denoised images as evaluation indices to compare the algorithm’s objective performance.
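For reference, PSNR is computed from the mean squared error (MSE) between the reference and the denoised image as 10·log₁₀(MAX² / MSE). The sketch below assumes a peak value MAX of 255; the intensity range used in the experiments is not stated, so `max_value` is a parameter, and the pixel values are made up for illustration.

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    squared_errors = [
        (a - b) ** 2
        for row_ref, row_test in zip(reference, test)
        for a, b in zip(row_ref, row_test)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 example: two pixels are off by 10 grey levels, so MSE = 50.
reference = [[100.0, 100.0], [100.0, 100.0]]
denoised = [[110.0, 90.0], [100.0, 100.0]]
value = psnr(reference, denoised)  # roughly 31 dB
```

SSIM is more involved because it compares local means, variances, and covariances over sliding windows, so an established library implementation is usually preferable to a hand-written one.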

5.2 Network parameters and training settings

The network structure proposed in this paper has three stages: the encoder stage, the decoder stage, and the fusion stage. Dual attention residual blocks are used in both the encoder and the decoder, and the fusion stage consists of residual blocks and subspace feature fusion modules. The convolution layers in these modules use 3×3 kernels. For the LDCT images, the clean images in both the training set and the test set are cropped into 128×128 blocks, and various amounts of synthetic noise are applied to them. The batch size was set to 4, the learning rate to 1e-5, and the training procedure was iterated 200 times, with Adam as the optimizer. The algorithms in this paper are implemented in PyTorch. All comparison algorithms were run on the same workstation, equipped with an NVIDIA GeForce RTX 3090 GPU and a Core (TM) i7-7700K CPU.
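The patch preparation and the reported hyperparameters can be summarized as follows. Whether the 128×128 blocks were taken as a non-overlapping grid or by random cropping is not stated, so the grid tiling below is an assumption, as are the function name, the dictionary keys, and the 256×256 toy image.

```python
def crop_patches(image, size=128):
    """Split a 2-D image into non-overlapping size x size blocks, dropping ragged borders."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            patches.append([row[left:left + size] for row in image[top:top + size]])
    return patches


# Training settings as reported in the text; key names are hypothetical.
train_config = {
    "patch_size": 128,
    "kernel_size": 3,
    "batch_size": 4,
    "learning_rate": 1e-5,
    "iterations": 200,
    "optimizer": "Adam",
}

# Hypothetical 256x256 image whose pixel value encodes its position.
image = [[r * 256 + c for c in range(256)] for r in range(256)]
patches = crop_patches(image, train_config["patch_size"])
```

With this tiling a 256×256 image yields four patches; images whose sides are not multiples of 128 lose their border rows and columns.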

5.3 Comparison of denoising performance of chest CT images synthesized with Poisson noise

Poisson noise arises in medical CT during photoelectric conversion, so this experiment simulates the impact of low-intensity Poisson noise. To examine the effect of different denoising techniques, we synthesized Poisson noise at two intensity levels on the clean images of the test set, which contains 256 LDCT images, and denoised them with the proposed method and the comparison methods. The average results of the image evaluation metrics are shown in Tables 1 and 2, respectively. Our method achieves the highest values at both noise levels.

Table 1
Average PSNR (dB) and SSIM results on synthetic Poisson×1 noisy images

Noise level	Methods	Evaluation Metric
Poisson×1	DnCNN	PSNR	32.45
		SSIM	0.7369
	IRCNN	PSNR	36.10
		SSIM	0.8598
	FFDNet	PSNR	36.19
		SSIM	0.8611
	NBNet	PSNR	38.56
		SSIM	0.9413
	APNet	PSNR	39.14
		SSIM	0.9467

Table 2

Average PSNR (dB) and SSIM results on synthetic Poisson×2 noisy images

Noise level	Methods	Evaluation Metric
Poisson×2	DnCNN	PSNR	29.34
		SSIM	29.34
	IRCNN	PSNR	32.82
		SSIM	0.7635
	FFDNet	PSNR	33.19
		SSIM	0.7728
	NBNet	PSNR	36.40
		SSIM	0.9255
	APNet	PSNR	37.58
		SSIM	0.9324

Figures 5 and 6 show the visual comparison. Because the network adapts to the content of medical images, our approach can recover intricate textures from low-intensity Poisson noise without adding artifacts.

Fig. 5

Denoising results of chest images with noise level Poisson×1.

Fig. 6

Denoising results of chest images with noise level Poisson×2.

5.4 Comparison of denoising performance of chest CT images synthesized with mixed Poisson-Gaussian noise

To simulate mixed Poisson-Gaussian noise, we first added Gaussian noise to the images and then added Poisson noise. Because additive white Gaussian noise is usually used in the mixed Poisson-Gaussian noise model, we also adopt this type of Gaussian noise. We consider two Gaussian noise levels, σ= 10 and σ= 20. The average PSNR and SSIM values for the proposed technique and a few commonly used denoising methods are shown in Tables 3 and 4. Our approach considerably enhances the denoising performance on the LDCT images and outperforms the other methods in terms of average metric values.

Table 3
Average PSNR (dB) and SSIM results on mixed Gaussian noise with σ= 10 and Poisson noisy images

Noise level		Methods	Evaluation Metric
Gaussian σ= 10	Poisson×1	DnCNN	PSNR	31.74
			SSIM	0.6872
		IRCNN	PSNR	32.42
			SSIM	0.7490
		FFDNet	PSNR	32.34
			SSIM	0.7528
		NBNet	PSNR	35.27
			SSIM	0.9025
		APNet	PSNR	36.46
			SSIM	0.9101
	Poisson×2	DnCNN	PSNR	31.21
			SSIM	0.6602
		IRCNN	PSNR	31.29
			SSIM	0.6793
		FFDNet	PSNR	31.32
			SSIM	0.6776
		NBNet	PSNR	33.61
			SSIM	0.8729
		APNet	PSNR	34.61
			SSIM	0.8845

Table 4

Average PSNR (dB) and SSIM results on mixed Gaussian noise with σ= 20 and Poisson noisy images

Noise level		Methods	Evaluation Metric
Gaussian σ= 20	Poisson×1	DnCNN	PSNR	29.00
			SSIM	0.5794
		IRCNN	PSNR	30.89
			SSIM	0.7499
		FFDNet	PSNR	30.24
			SSIM	0.7237
		NBNet	PSNR	34.90
			SSIM	0.8943
		APNet	PSNR	35.59
			SSIM	0.8986
	Poisson×2	DnCNN	PSNR	28.68
			SSIM	0.5554
		IRCNN	PSNR	28.94
			SSIM	0.5999
		FFDNet	PSNR	28.88
			SSIM	0.5940
		NBNet	PSNR	31.62
			SSIM	0.8291
		APNet	PSNR	32.82
			SSIM	0.8660

Figures 7 and 8 compare the denoising outcomes for various noise levels. As can be observed, while eliminating noise, our suggested method maintains and preserves the texture of microscopic lung tissue in medical images. The visual comparison confirms the proposed method achieved higher performance in LDCT image denoising.

Fig. 7

Denoising results of chest images with mixed Gaussian noise level with σ= 10 and Poisson×1.

Fig. 8

Denoising results of chest images with mixed Gaussian noise with σ= 20 and Poisson×2.

5.5 Comparison of denoising performance of dental CT images synthesized with mixed noise

To further verify the denoising ability of the model, we added experiments in real scenarios. Noise was synthesized on dental CT images, and the denoising results of the different methods were tested. Two groups of experiments are shown in Figs. 9 and 10, which compare the denoising results and evaluation metrics under Poisson noise and Poisson-Gaussian mixed noise, respectively. Overall, the proposed method obtains the best denoising effect.

Fig. 9

Denoising results of dental CT images with noise level Poisson×2.

Fig. 10

Denoising results of dental CT images with mixed Gaussian noise with σ= 10 and Poisson×1.

6 Conclusion

In this paper, we have proposed a deep projection-based attention image denoising network. By projecting features onto a learned signal subspace, the network performs adaptive denoising and successfully separates noise from detail texture in LDCT images. Through the use of dual attention residual blocks, the introduction of convolutional refinement, and the inclusion of orthogonal projection reconstruction, the network weights significant features based on content and texture and adapts to essential texture and edge information in medical images. The encoder-decoder structure also preserves detail-rich images accurately and keeps the network efficient. The proposed network offers qualitative and quantitative advantages over conventional image denoising methods. This technique can support pathological judgment in medical diagnosis and significantly improve the processing of uneven texture features in medical CT images.

Acknowledgments

This work is supported by General project of Liaoning Provincial Department of Education, China, No. LJKMZ20222010; Joint Fund LH-JSRZ-202205; Liaoning Provincial Education Science Planning Project JG21EB029. Fund receiver: Dr. Xiang Li. This study is funded by University of Economics Ho Chi Minh City, Vietnam. Fund receiver: Dr. Dang Ngoc Hoang Thanh.

References

Zhang

, Zuo

, Chen

, Meng

and Zhang

, Beyond a gaussian denoiser: Residual learning of deep CNN for image denoising, IEEE Trans Image Process 26(7) (2017), 3142–3155.

, Park

, Jeong

Deep iterative down-up CNN for image denoising, CVPR Workshops (2019), 2095–2103.

Lefkimmiatis

Universal denoising networks: A novel CNN architecture for image denoising, CVPR (2018), 3204–3213.

Chang

, Li

, Feng

and Xu

, Spatial-adaptive network for single image denoising,, ECCV (30) (2020), 171–187.

Tai

, Yang

, Liu

, Xu

MemNet: A persistent memory network for image restoration, ICCV (2017), 4549–4557.

Luisier

, Blu

and Unser

, Image denoising in mixed poisson-gaussian noise, IEEE Trans Image Process 20(3) (2011), 696–708.

Byun

, Cha

, Moon

FBI-denoiser: Fast blind image denoiser for poisson-gaussian noise, CVPR (2021), 5768–5777.

Mishro

P.K.

, Agrawal

and Panda

, Medical image denoising using spline based fuzzy wavelet shrink technique, CVIP (1) (2019), 185–194.

Buades

and Coll

, Jean-michel morel: A non-local algorithm for image denoising, CVPR (2) (2005), 60–65.

10.

Dabov

, Foi

, Katkovnik

, Egiazarian

K.O.

Image restoration by sparse 3D transform-domain collaborative filtering, Image Processing: Algorithms and Systems (2008), 681207.

11.

Zhang

, Zuo

and Zhang

, FFDNet: Toward a fast and flexible solution for CNN-based image denoising, IEEE Trans Image Process 27(9) (2018), 4608–4622.

12.

Guo

, Yan

, Zhang

, Zuo

, Zhang

Toward convolutional blind denoising of real photographs, CVPR (2019), 1712–1722.

13.

Tai

, Yang

, Liu

, Xu

MemNet: A persistent memory network for image restoration, ICCV (2017), 4549–4557.

14.

Zhang

, Zuo

, Zhang

Learning a single convolutional super-resolution network for multiple degradations, CVPR (2018), 3262–3271.

15.

Zamir

, Arora

, Khan

S.H.

, Hayat

, Khan

F.S.

, Yang

and Shao

, Learning enriched features for real image restoration and enhancement, ECCV (25) (2020), 492–511.

16.

Zamir

S.W.

, Arora

, Khan

S.H.

, Hayat

, Khan

F.S.

, Yang

, Shao

Multi-stage progressive image restoration, CVPR (2021), 14821–14831.

17.

Cheng

, Wang

, Huang

, Liu

, Fan

and Liu

, NBNet: Noise basis learning for image denoising with subspace projection, CVPR (2021), 4896–4906.

18.

Chen

, Lu

, Zhang

, Chu

, Chen

HINet: Half instance normalization network for image restoration, CVPR Workshops (2021), 182–192.

19.

Ulyanov

, Vedaldi

, Lempitsky

V.S.

Improved texture networks: Maximizing quality and diversity in feedforward stylization and texture synthesis, CVPR (2017), 4105–4113.

20.

Simonyan

, Zisserman

Very deep convolutional networks for large-scale image recognition, ICLR (2015).

21.

Krizhevsky

, Sutskever

and Hinton

G.E.

, ImageNet classification with deep convolutional neural networks, Commun ACM 60(6) (2017), 84–90.

22.

, Müller

, Ali

Thabet, bernard ghanem: DeepGCNs: Can GCNs go as deep as CNNs? ICCV (2019), 9266–9275.

23.

Ronneberger

, Fischer

and Brox

, U-Net: Convolutional networks for biomedical image segmentation, MICCAI (3) (2015), 234–241.

24.

Manduca, Yu, Trzasko J.D., Khaylova, Kofler J.M., McCollough C.M. and Fletcher J.G., Projection space denoising with bilateral filtering and CT noise modeling for dose reduction in CT, Medical Physics 36(11) (2009), 4911–4919.

25.

Andria, Attivissimo, Cavone, Spadavecchia and Magli, Denoising filter to improve the quality of CT images, Proceedings of IEEE Conference on Instrumentation and Measurement Technology (2009), 947–950.

26.

Soni V.K. and Karanjgaokar, Wavelet based noise reduction in medical images, IEEE Trans Med Imaging 27(12) (2008), 1685–1703.

27.

Kang, Slomka, Nakazato, Woo, Berman S.D., Kuo C.-C. J. and Dey, Image denoising of low-radiation dose coronary CT angiography by an adaptive block-matching 3D algorithm, SPIE Medical Imaging, International Society for Optics and Photonics (2013), 86692G.

28.

Ma, Huang, Feng, Zhang, Lu, Liang and Chen, Low-dose computed tomography image restoration using previous normal-dose scan, Med Phys 38 (2011), 5713–5731.

29.

Xu, Yu, Mou, Zhang, Hsieh and Wang, Low-dose X-ray CT reconstruction via dictionary learning, IEEE Trans Med Imaging 31(9) (2012), 1682–1697.

30.

Chen, Zhang, Liao, Li, Zhou and Wang, Low-dose CT denoising with convolutional neural network, CoRR abs/1610.00321 (2016).

31.

Kang, Min and Ye, WaveNet: A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction, CoRR abs/1610.09736 (2016).

32.

Chen, Zhang, Kalra M.K., Lin, Chen, Liao, Zhou and Wang, Low-dose CT with a residual encoder-decoder convolutional neural network, IEEE Trans Med Imaging 36(12) (2017), 2524–2535.

33.

Leuschner, Schmidt, Baguer D.O. and Maass, LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction, Scientific Data 8 (2021), 109.

34.

Goodfellow I.J., Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, Courville A.C. and Bengio, Generative adversarial networks, CoRR abs/1406.2661 (2014).

35.

Yi and Babyn P.S., Sharpness-aware low-dose CT denoising using conditional generative adversarial network, J Digit Imaging 31(5) (2018), 655–669.

36.

Yang, Yan, Zhang, Yu, Shi, Mou, Kalra M.K., Zhang, Sun and Wang, Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss, IEEE Trans Med Imaging 37(6) (2018), 1348–1357.

37.

Maas A.L., Hannun A.Y. and Ng A.Y., Rectifier nonlinearities improve neural network acoustic models, Proc ICML 30 (2013), 3.

38.

He, Zhang, Ren and Sun, Deep residual learning for image recognition, CVPR (2016), 770–778.

39.

Woo, Park, Lee and Kweon, CBAM: Convolutional block attention module, ECCV (7) (2018), 3–19.

40.

Liu, Zhang, Lian and Zuo, Multi-level wavelet convolutional neural networks, IEEE Access 7 (2019), 74973–74985.

41.

Wang, Liu and Fan, An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection, Information Fusion 98 (2019), 101828.

42.

Wang, Tang, Pan and Tang, Learning a tree-structured channel-wise refinement network for efficient image deraining, IEEE ICME (2021), 1–6.

43.

Wang, Ma, Liu and Fan, Semantic-aware texture-structure feature collaboration for underwater image enhancement, IEEE ICRA (2022), 4592–4598.


Table 1
Average PSNR (dB) and SSIM results on synthetic Poisson×1 noisy images

Noise level    Method    PSNR (dB)    SSIM
Poisson×1      DnCNN     32.45        0.7369
Poisson×1      IRCNN     36.10        0.8598
Poisson×1      FFDNet    36.19        0.8611
Poisson×1      NBNet     38.56        0.9413
Poisson×1      APNet     39.14        0.9467