Abstract
In modern urban traffic systems, intersection monitoring systems are used to monitor traffic flows and track vehicles by recognizing license plates. However, intersection monitors often produce motion-blurred images because of the rapid movement of cars. If a deep learning network is used for image deblurring, the blur can be removed first, and the complete vehicle information can then be obtained to improve the recognition rate. To restore a dynamically blurred image to a sharp image, this paper proposes a multi-scale modified U-Net image deblurring network that uses dilated convolution and employs a variable-scale iterative strategy to make the scheme more adaptable to real blurred images. The multi-scale architecture uses scale changes to learn image characteristics at different scales, and dilated convolution enlarges the receptive field to obtain more feature information without increasing the computational cost. Experimental results are obtained using a synthetic motion-blurred image dataset and a real blurred image dataset for comparison with existing deblurring methods. The results demonstrate that the proposed image deblurring method performs well on real motion-blurred images.
Introduction
In road image surveillance, vehicles are the primary objects that require identification. License plate recognition helps track vehicles and obtain vehicle owner information. 1 However, a license plate may not be accurately recognized owing to motion blur caused by the rapid movement of the vehicle. In the event of a traffic accident, or when a vehicle is being tracked, additional time and cost are incurred if the image cannot be identified because of blur and afterimages. According to the results of Kupyn et al., 2 image deblurring can improve the success rate of image recognition. If a deep learning-based image deblurring method is used to remove the motion blur of the monitor footage in advance, it can assist image recognition and analysis and restore information lost to blurring.
Digital photography often captures defective, low-quality images, which may be caused by inappropriate camera settings or poor camera hardware. Common defects are mainly caused by dynamic blurring, which results from shooting moving objects with a static camera or from camera shake. Image deblurring methods model a blurred image as the convolution of a sharp image with a blur kernel, which can be expressed as follows:
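In the standard blind deconvolution formulation (the symbols here are our own notation rather than the paper's: $B$ is the blurred image, $S$ the sharp image, $K$ the blur kernel, $N$ additive noise, and $\ast$ the convolution operator):

$$B = K \ast S + N$$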
In research on image deblurring, traditional methods restore blurred images through different constraints and priors.3–6 Fergus et al. 3 used a large number of image statistics to discover the law of image gradient distribution and proposed a mixture-of-Gaussians model to simulate the gradient distribution, estimate the blur kernel, and restore the sharp image. Cho and Lee 4 proposed an edge-prediction method based on the method of Kupyn et al.: 2 a shock filter was used to enhance the edges of the image, and the sharpened image was then used to iteratively predict the sharp edges. Whyte et al. 5 described camera dynamics in terms of camera rotation and pointed out that the blur caused by camera movement is non-uniform; they then proposed a parametric model for the non-uniform blur problem. Hu et al. 6 found that the blur kernel cannot be reliably estimated from every image region, even regions with regular edge and gradient distributions, and that the blur kernels obtained from different regions of the image are not consistent. They therefore proposed an image deblurring method using conditional random fields to identify the most informative image region and estimate the blur kernel from it for deblurring.
Researchers have introduced learning-based methods for complex natural blurring. Sun et al. 7 proposed a patch-based enhancement method that learns from a large number of clear images to collect clear patches with features similar to those of the blurred image. These patches replace the edges in the blurred image, and the edge-enhanced image is then used to estimate the blur kernel and obtain a clear image. Xu et al. 8 combined traditional image optimization methods with deep convolutional neural networks (CNNs) to design a separable network structure divided into two main parts: a deconvolution network, which extracts features, and a noise reduction network, which restores the sharp image. Because no image preprocessing is required, end-to-end image deblurring is achieved. Isola et al. 9 and Zhu et al. 10 treated the task as image-to-image translation without estimating the blur kernel. The former requires pairs of input images and corresponding target images as a training dataset, while the latter does not require corresponding image pairs; instead, it collects a large number of target images containing objects similar to the input images as a style-transfer training dataset.
Methodology
In this section, the research theory and framework of this study are presented.
U-Net
Encoder–decoder networks are frequently used in computer vision and have achieved good results in image deblurring. This architecture is a CNN designed symmetrically with an encoder and a decoder. 11 The encoder downsamples the input image into feature maps with more channels and smaller size while extracting shallow features rich in detail, whereas the decoder upsamples the smaller feature maps into larger feature maps with fewer channels and extracts deeper features. Ronneberger et al. 12 added skip connections (also called shortcut connections) between the encoder and decoder networks, whose main function is feature fusion. As the network becomes deeper, the downsampling process loses a significant amount of feature information. To ensure that the final feature map retains sufficiently detailed information, the feature maps extracted by the encoder skip the intermediate layers and are connected directly to the corresponding decoder layers, where they are fused with the upsampled feature maps so that both shallow detail and deep features are retained. Because the network structure resembles a U shape, it is called U-Net, as shown in Figure 1. The blue, gray, red, and green arrows represent convolution, skip connection, max pooling, and deconvolution, respectively. U-Net downsamples using max pooling, which divides the feature map into 2 × 2 regions and outputs the maximum value of each region, reducing the amount of data while retaining important information.

U-Net architecture. 12
Tao et al. 13 proposed combining a residual block (ResBlock) 14 with the encoder–decoder U-Net architecture, as shown in Figure 2, where conv denotes a convolutional layer and ReLU (Rectified Linear Unit) 15 is the activation function. ResBlock deepens the network and helps avoid the vanishing gradient problem. Deeper features have a larger receptive field, which helps to address dynamic blurring.

Structure of ResBlock. 13
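As a rough illustration, the ResBlock of Figure 2 (two convolutions with a ReLU between them, plus an identity skip connection) can be sketched in a few lines of Keras; the filter count and kernel size are illustrative assumptions, not values taken from the paper:

```python
from tensorflow.keras import layers

def res_block(x, filters=64, kernel_size=3):
    # conv -> ReLU -> conv, then add the input back (assumes x already has `filters` channels).
    shortcut = x
    y = layers.Conv2D(filters, kernel_size, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, kernel_size, padding="same")(y)
    return layers.Add()([shortcut, y])
```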
Multi-scale architecture
Multi-scale architectures are widely used in deblurring research. Nah et al. 14 proposed a deep multi-scale convolutional neural network (MSCNN), shown in Figure 3, for dynamic scenes, in which the scale ranges from large to small. A Gaussian pyramid was used to handle scale changes, and the model stacked multiple ResBlocks to deepen the network in a cascaded architecture. However, this multi-scale model is excessively large, which increases the difficulty of training and prediction. Kupyn et al. 16 proposed a faster and more efficient generative adversarial network (GAN) based on DeblurGAN, which uses a feature pyramid network in the generator. 17 The input images are processed directly by the generator for multi-scale operations within the GAN, which significantly reduces the operation time and model size.

Structure of MSCNN. 14 MSCNN: multi-scale convolutional neural network.
The aforementioned image deblurring methods use fixed multi-scale architectures for both training and prediction. Ye et al. 18 considered that such fixed multi-scale architectures are somewhat inadequate for naturally blurred images. They therefore proposed a scale-iterative upscaling network (SIUN) that can adjust the scale structure and the number of iterations according to the degree of image blurring, as shown in Figure 4. SIUN uses U-Net with a residual dense network (RDN), 19 a network that helps improve image resolution and allows the model to recover more detail in the image.

Network architecture for raising the scale and iterating. 17
The general multi-scale deblurring method inputs an image into the model, upsamples the deblurred output, and then feeds it to the next scale to continue the operation. This training method achieves only a single pass of learning at each scale, whereas Ye et al. 18 treated each scale as one scale iteration. The input image first passes through the U-Net1 model, combined with residual dense blocks, and is upsampled to increase the scale. The upsampled image is then input into U-Net2, completing one scale iteration and yielding a deblurred image. The SIUN training method can deepen the network compared with other fixed multi-scale architectures. The results show that the best model is trained with three scale iterations, and the best deblurring results are obtained with four scale iterations during prediction. Compared with previous fixed multi-scale architectures, a moderate scale-iteration strategy, used in both training and prediction, adapts better to naturally blurred images from different scenes.
Dilated convolution
Dilated convolution, also called atrous convolution, originated in the field of image segmentation. 20 Dilated convolution can expand the receptive field without losing features and can obtain more information from images or feature maps without increasing the computational burden. The receptive field refers to the region of the input space that influences a particular feature of a CNN; informally, it is the part of the input tensor that contributes to a feature after convolution. The dilation rate controls the number of holes inserted between adjacent pixels of the convolution kernel. The area covered by the dilated convolution kernel is calculated as follows:
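A standard way to write this relation (our notation: $k$ is the side length of the original kernel and $r$ the dilation rate) is

$$k_{\mathrm{eff}} = k + (k-1)(r-1), \qquad \text{area} = k_{\mathrm{eff}}^{2}.$$

For example, a 3 × 3 kernel covers a 3 × 3 area at r = 1, a 5 × 5 area at r = 2, and a 7 × 7 area at r = 3, consistent with the kernels sketched in Figure 5.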

Schematic diagram of the convolution kernel of dilated convolution. (a) r = 1, (b) r = 2, and (c) r = 3.
Although dilated convolution can provide a larger receptive field with the same number of parameters, it is not suitable for small objects in an image, because such a large receptive field is unnecessary. In addition, the pixels sampled by the dilated kernel are not contiguous; if several dilated convolution layers are stacked, certain pixels of the feature map may never participate in the computation. This potential problem is known as the gridding effect. To solve it, Wang et al. 21 proposed hybrid dilated convolution (HDC), which mixes several convolutions with different dilation rates so that the receptive field covers all areas without holes. Figure 6(a) shows the gridding effect when all convolutional layers use a dilation rate of r = 2, and Figure 6(b) shows the receptive field when the convolutional layers use dilation rates r = [1, 2, 3].

Schematic diagram of dilated convolution. 21 (a) The gridding effect when all convolutional layers use a dilation rate of r = 2. (b) The receptive field when using dilation rates r = [1, 2, 3].
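As a worked example (our own arithmetic, using the usual stride-1 recurrence $RF_{l} = RF_{l-1} + (k_{\mathrm{eff},l} - 1)$): stacking three 3 × 3 layers with dilation rates [1, 2, 3] gives effective kernel sizes of 3, 5, and 7, so

$$RF_{1} = 3, \qquad RF_{2} = 3 + 4 = 7, \qquad RF_{3} = 7 + 6 = 13,$$

that is, a 13 × 13 receptive field with every pixel covered. Three layers that all use r = 2 reach the same 13 × 13 size ($1 + 4 + 4 + 4 = 13$) but sample it with gaps, which is the gridding effect illustrated in Figure 6(a).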
Network architecture
This section considers the image deblurring preprocessing of intersection monitor images, which mainly aims to deblur naturally blurred images, remove in advance the dynamic blur caused by the rapid movement of vehicles, and restore the blurred content to a sharp image to improve image recognition accuracy.
Because usually only the blurred images are available and the blur kernels are unknown, this is a blind deconvolution problem. We propose a model that uses dilated convolution combined with U-Net to improve the removal of motion blur. The model is also combined with a multi-scale architecture, following the SIUN architecture of Ye et al. 18 The image deblurring network architecture is illustrated in Figure 7.

Architecture diagram of the image deblurring network, which uses dilated convolution combined with a U-Net to improve the removal of motion blur.
Upscaling iterative strategy
This subsection addresses the scale-up iterative network architecture proposed by Ye et al. 18 to restore sharp images through iterative upscaling. To simplify the network model, each scale iteration is divided into an upper and a lower part, U-Net1 and U-Net2, as shown in Figure 8. Because the network requires two images of the same scale spliced together as its input, the blurred and deblurred images are concatenated into a six-channel image. For the first input to the network, no deblurred image of the same scale as the blurred image exists yet, so the blurred image is spliced with itself. We set I as the number of scale iterations.

Schematic diagram of iterative design for scale-up.
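The following sketch illustrates our reading of the prediction loop under this strategy; it treats the whole network as a single model call per scale (the actual architecture splits each iteration into U-Net1 and U-Net2), and the starting scale, number of iterations, and resizing method are illustrative assumptions rather than values from the paper:

```python
import numpy as np
import cv2

def predict_iteratively(blurred, model, num_iterations=4, start_scale=0.25):
    # Run the deblurring model from the coarsest scale up to the full resolution.
    h, w = blurred.shape[:2]
    deblurred = None
    for i in range(num_iterations):
        # The scale grows geometrically from start_scale to 1.0 over the iterations.
        t = i / max(num_iterations - 1, 1)
        scale = start_scale ** (1.0 - t)
        size = (max(int(w * scale), 1), max(int(h * scale), 1))
        blurred_s = cv2.resize(blurred, size)
        # First iteration: no previous output exists, so the blurred image is spliced with itself.
        prev = cv2.resize(deblurred, size) if deblurred is not None else blurred_s
        x = np.concatenate([blurred_s, prev], axis=-1)  # six-channel input
        deblurred = model.predict(x[np.newaxis, ...])[0]
    return deblurred
```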
U-Net with dilated convolution
Encoder–decoder networks are often used in deep learning image deblurring methods. Ronneberger et al. 12 proposed a U-Net with skip connections between the corresponding encoder and decoder layers. Tao et al. 22 added ResBlock 14 to U-Net to increase the depth of the network model. This study mainly refers to the U-Net used by Ronneberger et al. 12 and improves on it. As shown in Figure 7, this study replaces all ReLU activation functions used in ResBlock, E-Block, and D-Block with LeakyReLU.15,23 In the encoder, a convolutional layer with a stride of 2 is used for downsampling, and each downsampling doubles the number of channels of the feature map. For example, if the input image is w × h × c, then after an E-Block it becomes a feature map of (w/2) × (h/2) × 2c.
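A minimal sketch of such an E-Block is given below, reusing the res_block helper from the earlier sketch; the number of residual blocks and the LeakyReLU slope are our own illustrative choices (the paper only states that ReLU is replaced by LeakyReLU):

```python
from tensorflow.keras import layers

def e_block(x, out_channels, num_res_blocks=3):
    # Stride-2 convolution: halves the spatial size and sets the (doubled) channel count,
    # i.e. w x h x c  ->  (w/2) x (h/2) x out_channels.
    y = layers.Conv2D(out_channels, 3, strides=2, padding="same")(x)
    y = layers.LeakyReLU(alpha=0.2)(y)
    # A few residual blocks deepen the block (count and slope are assumptions).
    for _ in range(num_res_blocks):
        y = res_block(y, filters=out_channels)
    return y
```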
This study combines dilated convolution with a U-Net. Dilated convolution is used only in the encoder; the main purpose of this design is to capture features at more scales and enhance feature extraction without adding parameters. Because dilated convolution inserts holes into the convolution kernel according to the dilation rate, stacking several such convolutions may cause gridding effects on the feature map. To solve this problem, the HDC proposed by Wang et al. 21 is used to design a dilated convolution block (DilateBlock) with dilation rates r = [1, 2, 3], consisting of convolutional layers with dilation rates of 1, 2, and 3 in sequence. The dilated block uses skip connections for feature fusion and concatenates the feature maps output by each convolutional layer along the channel dimension, as shown in Figure 9.

Dilated block design scheme.
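The following is a minimal Keras sketch of our reading of the DilateBlock described above and in Figure 9, not the authors' exact code; the filter count and LeakyReLU slope are illustrative assumptions:

```python
from tensorflow.keras import layers

def dilate_block(x, filters=32):
    # Three 3x3 convolutions applied in sequence with dilation rates 1, 2, 3 (the HDC
    # pattern r = [1, 2, 3]); the input and every intermediate output are concatenated
    # along the channel axis for feature fusion.
    outputs = [x]
    y = x
    for rate in (1, 2, 3):
        y = layers.Conv2D(filters, 3, padding="same", dilation_rate=rate)(y)
        y = layers.LeakyReLU(alpha=0.2)(y)
        outputs.append(y)
    return layers.Concatenate(axis=-1)(outputs)
```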
Loss function
In the multi-scale image deblurring networks investigated by Nah et al. 14 and Tao et al., 22 the losses between the output image and the target image at each scale are weighted and summed. In contrast, this study mainly follows the scale-raising iteration architecture proposed by Ye et al., 18 in which one scale iteration is regarded as an independent deblurring subtask that remains effective even if training ends at any iteration. Therefore, only the difference between the final deblurred image and the target image is computed, and the mean absolute error (MAE) is selected as the loss function. The MAE is also called the L1 loss.
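For completeness, the standard form of the MAE (the symbols are our own notation: $\hat{S}$ is the deblurred output, $G$ the ground-truth sharp image, and $N$ the number of pixels) is

$$L_{\mathrm{MAE}} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{S}_{i} - G_{i}\right|.$$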
Image dataset
The image deblurring networks used in this study were trained using the GoPro 14 dataset, and the GoPro and Lai 24 datasets were used for the prediction and comparison of the deblurring results.
Synthetic image dataset. Among related image deblurring methods, the most commonly used synthetic dataset is the GoPro dataset, which was captured with a GoPro Hero4 Black action camera shooting video at 240 fps; blurred images of varying intensities were then produced by averaging different numbers of consecutive frames. The dataset consists of 3214 pairs of sharp and blurred images with a resolution of 1280 × 720 pixels. The training set contains 2103 image pairs, and the test set contains 1111 image pairs.
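As a rough illustration of the synthesis idea (this is our assumption about the procedure, not the dataset's release code, and it omits details such as camera-response-function handling):

```python
import numpy as np

def synthesize_blur(frames, start, window):
    # Average `window` consecutive sharp frames to approximate the motion blur
    # accumulated over a longer exposure.
    stack = np.stack(frames[start:start + window]).astype(np.float32)
    return np.clip(stack.mean(axis=0), 0, 255).astype(np.uint8)
```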
Real image dataset. The motivation for this study is the deblurring of intersection monitor images, so the images to be predicted must come from real blurred scenes. Therefore, the Lai dataset was selected to evaluate naturally blurred images. This dataset contains 100 naturally blurred images collected from Flickr, Google, the previous literature, or photos taken by Lai et al. So that these images are suitable for most image deblurring algorithms, the image size is kept below 1200 pixels.
Experimental results and discussion
In this study, the deblurred and original sharp images were evaluated using two standard image quality metrics: the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM). The root-mean-square error (RMSE), which measures the square root of the mean squared difference between the predicted and observed values, is given by
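The standard definitions of these quantities (our notation, with $\hat{S}$ the deblurred image, $G$ the ground-truth sharp image, $N$ the number of pixels, and $\mathrm{MAX}_{I}$ the maximum pixel value, 255 for 8-bit images) are

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{S}_{i} - G_{i}\right)^{2}}, \qquad \mathrm{PSNR} = 20\log_{10}\!\left(\frac{\mathrm{MAX}_{I}}{\mathrm{RMSE}}\right).$$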
Dilated convolution combined with U-Net experiment
Next, for the architecture of dilated convolution combined with U-Net, two experiments were designed, "dilation rate strategy of mixed dilated convolution" and "network model of dilated convolution combined with U-Net," to determine a multi-scale dilated U-Net design suitable for image deblurring. The following experiments were based on the GoPro dataset 14 for training and testing. The PSNR values in the comparison tables are averages, and the best results are shown in bold.
Dilation rate strategy of mixed dilated convolution. For the image pyramid used in training, the largest input image scale was 128 × 128 and the smallest was 32 × 32. The network model must adapt to these different image scales simultaneously, so the dilation rate of the convolutional layers must be adjusted according to the image scale. To determine which type of DilateBlock combined with U-Net is more suitable for image deblurring, we used HDC to design the DilateBlock and compared four different dilation rate strategies.

Schematic diagram of dilated convolution strategy.
As shown in Table 1, the pure dilation rate strategy D = 3 obtains the highest PSNR of 30.05 dB and SSIM of 0.869 and is therefore the most suitable dilation rate strategy for image deblurring.
Comparison of experimental results of the dilation rate strategies.
PSNR: peak signal-to-noise ratio; SSIM: structural similarity index measure.
Bold values indicate the best results.
Network model experiment of dilated convolution combined with U-Net. According to the results of the "dilation rate strategy of mixed dilated convolution" experiment, the better dilation rate strategy is the pure dilation rate strategy D = 3 with a single DilateBlock added to the U-Net. Because dilated convolution can enhance feature extraction without adding parameters, we consider adding more dilated blocks to the U-Net. Therefore, based on the architecture used in the dilation rate strategy experiment, this experiment proposes four dilated U-Net strategies for comparison and determines which model of dilated convolution combined with U-Net is more suitable for deblurring.
The basic network architecture of this experiment adds a dilated block with the pure dilation rate strategy D = 3 to the U-Net. Because the purpose of adding dilated convolution is to strengthen feature extraction in the encoder, a dilated block is inserted before an E-Block. Two U-Net models are used in the network, referred to as U-Net1 and U-Net2. The four proposed dilated U-Net strategies are listed below, and the corresponding network models are shown in Figure 11.
Strategy A: a dilated block is added before the first E-Block of U-Net1 and U-Net2.
Strategy B: a dilated block is added before and after the first blocks of U-Net1 and U-Net2.
Strategy C: a dilated block is added before and after the first E-Block of U-Net1, and a dilated block is added before the first E-Block of U-Net2.
Strategy D: a dilated block is added before all blocks of U-Net1, and a dilated block is added before the first E-Block of U-Net2.

Schematic diagram of the network model of dilated convolution combined with U-Net.
It can be observed from Table 2 that Strategy C yields the highest PSNR of 30.45 dB and SSIM of 0.918, making it the more suitable image deblurring network.
Comparison of experimental results of a network model of dilated convolution combined with U-Net.
PSNR: peak signal-to-noise ratio; SSIM: structural similarity index measure.
Bold values indicate the best results.
Experimental results of model test
The network models were trained and tested using the Keras library on the TensorFlow framework and implemented on a computer with an Intel Core i7-8700 CPU, an NVIDIA GeForce RTX 2070 GPU, and 16 GB of RAM. Training ran for 2000 epochs with the Adam optimizer; the learning rate was scheduled over 1e−4, 3e−5, 5e−6, and 1e−6 and updated according to the training loss.
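A minimal sketch of the stated training setup is shown below; it is our reconstruction, not the authors' script, and the patience value, callback design, and the names model and train_dataset are assumptions:

```python
import tensorflow as tf

# `model` and `train_dataset` are assumed to be defined elsewhere.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
              loss="mean_absolute_error")

lr_steps = [1e-4, 3e-5, 5e-6, 1e-6]

class LossDrivenLR(tf.keras.callbacks.Callback):
    """Step down through the listed learning rates when the training loss stops improving."""
    def __init__(self, rates, patience=50):  # patience is an illustrative assumption
        super().__init__()
        self.rates, self.patience = rates, patience
        self.best, self.wait, self.idx = float("inf"), 0, 0

    def on_epoch_end(self, epoch, logs=None):
        loss = logs["loss"]
        if loss < self.best:
            self.best, self.wait = loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience and self.idx < len(self.rates) - 1:
                self.idx += 1
                self.wait = 0
                tf.keras.backend.set_value(self.model.optimizer.learning_rate,
                                           self.rates[self.idx])

model.fit(train_dataset, epochs=2000, callbacks=[LossDrivenLR(lr_steps)])
```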
Although Ye et al. 18 and Tao et al. 22 have released trained weights, their models were retrained on the computer used in this study for testing. In their published papers, Ye et al. 18 and Tao et al. 22 reported PSNR values of 30.22 and 30.26 dB, respectively, on the GoPro dataset.
Test data using artificially blurred images
This study compares our network model with recent deep learning-based image deblurring methods using the widely adopted synthetically blurred GoPro dataset 14 for testing. The PSNR, SSIM, and time values in Table 3 are reported as averages (avg) and standard deviations (std), and the best results are shown in bold. Table 3 shows that, on the synthetically blurred images, the proposed method obtains a PSNR of 30.45 dB (avg) and an SSIM of 0.918 (avg), performing better than the other methods. The differences between this study and the other methods are analyzed below through the deblurred image comparisons in Figures 12–14.



Comparison of deblurring results on the GoPro dataset.
PSNR: peak signal-to-noise ratio; SSIM: structural similarity index measure.
Bold values indicate the best results.
In Figure 12, the numbers and text on the car advertising billboard cannot be recognized because of dynamic blurring. Compared with the other deblurring methods, the edges of the numbers in the image produced by our method are clearer, which helps identify the information on the billboard.
This study is intended to be applied to the deblurring of intersection monitor images. In Figure 13, the numbers on the license plate in the original image are blurred by dynamic blur and are almost impossible to distinguish. Our method restores the numbers on the license plate to a degree that they can be read directly, which is better than the restoration achieved by the other methods.
In the blurred image shown in Figure 14, ringing is generated by the dynamic blurring. Taking the feet shown in the figure as an example, the deblurring results of the other methods still exhibit ripples and edge distortion, whereas our method almost completely suppresses these artifacts and is closer to the original sharp image.
Test data with naturally blurred images
The goal of this study is to deblur intersection monitor images, where the blurred images to be predicted come from real blurred scenes. To test whether our network model can adapt to natural blur, the widely used Lai dataset 24 was adopted for testing. Because these images originate from real blurred scenes, there are no ground-truth sharp images, so the difference metrics cannot be calculated and the results can only be evaluated visually. Compared with the other methods, the method proposed in this paper removes natural motion blur more effectively; the deblurring results of the naturally blurred images are shown in Figures 15–17.



Figure 15 shows a naturally blurred image captured in a dark indoor scene. Under insufficient light, the facial features in the blurred image become indistinct. Although the method proposed by Kupyn et al. 16 deblurs the face well, the edges of the facial features are not sufficiently sharp, and the mouth of the glass still shows ripple artifacts that leave the image blurry. Our method achieves better edge restoration.
The blurred image in Figure 16 contains afterimages caused by dynamic blurring. Most other deblurring methods merely sharpen these afterimages, making the text appear even more blurred. Although the method used in this study cannot completely restore a sharp image, it effectively removes most of the afterimages.
Comparing the deblurring results in Figure 17, although some methods can make the text clearer, edge distortion still occurs. Compared with the other methods, the deblurring results of this study moderately suppress the afterimages.
Conclusion
In the literature on deep learning for deblurring motion-blurred images, a multi-scale architecture uses scale changes to learn image characteristics at different scales. A scale-recurrent network can share network weights across scales to reduce the number of parameters and make full use of the feature information. Compared with fixed multi-scale architectures, the number of multi-scale iterations can be adjusted according to the variety of blurred images.
The multi-scale modified U-Net with dilated convolution proposed in this study was compared with other image deblurring methods. Our method achieved a PSNR of 30.45 dB on the synthetically blurred test data, a better deblurring effect than the other methods. The results show that our method effectively suppresses afterimages and ripple artifacts. Although the natural blur dataset test could not be evaluated numerically, visual inspection shows that our network model adapts well to naturally blurred images, especially blurred text and images with insufficient light, and obtains better deblurring results.
Acknowledgements
The authors thank the anonymous referees for their helpful comments and suggestions.
Authors’ contributions
Xiao-Pei Shi and Song-Yih Lin were responsible for the conception and design, acquisition of data, analysis and interpretation of data, drafting the initial manuscript, and revising it critically for important intellectual content. Min-Lang Yang was responsible for the conception and design, interpretation of data, and reviewing all drafts of the manuscript. Chung-Chi Huang and Jen-Chun Lee were responsible for the experiments in the manuscript. All authors read and approved the final manuscript.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Author biography
Jen-Chun Lee is a professor and the vice president of the Maritime College at National Kaohsiung University of Science and Technology.
