Abstract
Low-light image enhancement aims to improve the contrast and brightness of images captured in dim conditions. Diffusion models, owing to their adeptness at capturing intricate details, have achieved strong results in image enhancement, but problems remain, such as inadequate estimation of noise characteristics and color bias in the enhanced images. To address these problems, this paper proposes a diffusion model-based method for low-light image enhancement, termed KANDiff. Within the diffusion model architecture, the Kolmogorov–Arnold network (KAN), which uses nonlinear learnable activation functions, is embedded into the U-Net noise estimation network to generate higher-quality enhanced images. Additionally, KANDiff mitigates color bias in the enhanced images through a joint loss function and employs a patch-based image restoration strategy to significantly improve the model's generalization capability. Experimental results show that the proposed KANDiff algorithm achieves high-quality image enhancement and outperforms competing algorithms.
Introduction
During image acquisition, images captured in low-light environments often exhibit various forms of degradation, including diminished contrast, reduced visibility, and heightened noise. These deficiencies can adversely impact subsequent visual tasks, such as target detection (Liang et al., 2021), image classification (Loh & Chan, 2019), and automated driving (Li et al., 2021), potentially compromising the efficacy of numerous computer vision systems. Consequently, the challenge of transforming low-light images into high-quality, well-exposed images has attracted considerable research attention and carries substantial theoretical significance and practical value.
Traditional low-light enhancement methods primarily rely on adjusting contrast and brightness to improve image visibility. Among them, histogram equalization and Retinex theory are the most commonly used. Histogram equalization-based methods (Ibrahim & Kong, 2007; Kim, 1997) improve image contrast by modifying the image histogram, while methods based on Retinex theory (Land & McCann, 1971; Li et al., 2018) decompose the image into illumination and reflectance components and enhance the low-light image by altering the dynamic range of the pixels in the illumination map and reducing the noise in the reflectance map. Although traditional methods are simple to implement and computationally cheap, they may introduce noise artifacts or color distortion, as they often fail to adequately account for the spatial information inherent in the image.
With the rapid advancement of deep learning, which has demonstrated significant advantages in processing complex scenes, preserving details, and adapting to varied conditions, deep learning techniques have increasingly been adopted as a more comprehensive approach to low-light image enhancement. Deep learning-based methods can be broadly categorized into learning-based approaches and generative model-based approaches. Learning-based methods typically employ structures such as convolutional neural networks (CNNs) to enhance low-light images by learning mapping relationships, with an emphasis on the direct enhancement of input images. Shen et al. (2017) conceptualized the traditional Retinex method as a feed-forward CNN with varying Gaussian convolutional kernels, facilitating end-to-end mapping from dark to bright images. Cai et al. (2018) implemented a two-stage CNN structure to solve the color bias problem associated with single-stage CNN architectures. Additionally, Hao et al. (2022) decoupled the enhancement model into two sequential stages to improve the network's feature extraction capability, producing more natural and realistic enhanced images. Learning-based methods generally treat enhancement as a pixel-wise mapping between low-light and normal-light images and employ pixel-wise losses during training, achieving acceptable results. However, these methods often struggle to preserve visual fidelity, leading to suboptimal results.
Generative model-based methods for low-light image enhancement typically utilize generative models such as generative adversarial networks (GANs), diffusion models, and flow models, with a primary focus on generating realistic images rather than direct enhancement. Liu et al. (2021) employed GANs in conjunction with a residual dense-patch encoder–decoder structure to effectively suppress noise while finely adjusting illumination, successfully enhancing low-light images even under extremely low illumination. Wang et al. (2022) proposed an effective method for modeling the light distribution of an image using the normalizing flow technique, thereby improving the quality of the enhanced images. Nevertheless, GAN-based and flow-based methods are often subject to unstable training and mode collapse, which may introduce artifacts into the enhanced images.
In comparison to other generative models such as GANs and variational autoencoders, diffusion models (Ho et al., 2020) facilitate high-quality mapping from randomly sampled Gaussian noise to the target image or latent distribution by incorporating Gaussian noise into data samples and subsequently predicting the noise in a stepwise manner. As a result, diffusion models offer greater stability and robustness and are less susceptible to problems such as mode collapse. Liu et al. (2024) decoupled diffusion models into two distinct processes, residual diffusion and noise diffusion, extending the traditional image restoration task into a unified and interpretable framework. In Özdenizci and Legenstein (2023), images were decomposed into several patches, facilitating the recovery of images of varying sizes. Diffusion models comprise a forward diffusion process and an inverse diffusion process. In the forward process, Gaussian noise is added to the input data, which is ultimately transformed into approximately pure Gaussian noise. In the inverse process, the original input data is recovered from its noisy state by iteratively predicting the noise to be removed at each denoising step through a noise estimation network, typically structured as a U-Net (Ronneberger et al., 2015). Existing studies have primarily focused on applying pre-trained diffusion U-Nets to downstream tasks, while the internal characteristics of diffusion U-Nets remain largely underexplored. Recently, the Kolmogorov–Arnold network (KAN) (Liu et al., 2024) has garnered significant attention due to its outstanding performance. KAN places learnable activation functions on the network edges (i.e., the weights) to enhance the flexibility and expressiveness of the model while preserving interpretability. Consequently, Li et al. (2024) integrated U-Net with KAN to obtain a network with strong interpretability, particularly in the context of medical image segmentation.
Diffusion model-based low-light image enhancement methods have made significant advances in illumination and detail restoration but remain insufficient in noise estimation, which leads to problems such as image artifacts and inadequate color reproduction. To address these limitations, this paper proposes a novel low-light image enhancement approach that uses a joint backbone comprising U-Net and KAN as the noise estimation network, thereby strengthening the noise modeling capability of the diffusion model. Additionally, a patch-based strategy is employed to effectively enhance low-light images of varying sizes. The principal contributions of this paper are as follows:
1. KAN is embedded into the U-Net architecture and applied as the noise predictor of a diffusion model for low-light image enhancement, yielding the KANDiff algorithm. KANDiff predicts noise more accurately and efficiently, improving the precision and fidelity of the generated images.
2. A patch-based image restoration structure makes the model applicable to images of varying sizes, enhancing its generalization ability.
3. A joint loss function is formulated to regulate the training process, aiding in error reduction, strengthening the fitting ability, bolstering model robustness, and minimizing the occurrence of color bias in the generated images.
4. KANDiff is assessed across a range of standard datasets, including the synthetic paired datasets LOLv1 (Wei et al., 2018), VE-LOL (Liu et al., 2021), and LSRW (Hai et al., 2023), the real-world unpaired datasets DICM (Lee et al., 2013) and LIME (Guo et al., 2016), and our own LIDS dataset. The experiments demonstrate that KANDiff surpasses existing low-light enhancement techniques in its capacity for detail recovery.
Low-Light Image Enhancement
After decades of research, a variety of algorithms for low-light image enhancement have been proposed to improve image quality. Current methods fall primarily into two groups: traditional methods and deep learning-based methods. Traditional methods are primarily composed of histogram equalization-based and Retinex theory-based approaches. Histogram equalization uniformly adjusts the dynamic range of the image, either linearly or nonlinearly, to enhance brightness. Dynamic histogram equalization, proposed by Abdullah-Al-Wadud et al. (2007), partitions the image histogram based on local minima, allocates specific grayscale ranges to each partition, and then equalizes the partitions separately. Ooi and Isa (2010) improved image quality by segmenting the histogram into four sub-histograms and performing histogram equalization on each of them. These methods are simple and efficient, but they do not take the spatial information of the image into account, which may lead to artifacts resulting from over- or under-enhancement. Retinex theory (Land & McCann, 1971) simulates the human visual system by decomposing the image into illumination and reflectance components and optimizes the illumination map to enhance both contrast and luminance. Fu et al. (2016) introduced a weighted variational model that provides a more robust prior in the regularization term, preserving additional details in the estimated reflectance and yielding favorable low-light enhancement results. Retinex-based methods are designed to address illumination issues, but they may fall short in mitigating color loss and can even introduce further color distortion in the resulting image.
Deep learning-based methods have become the mainstream in image enhancement. In 2017, Lore et al. (2017) constructed a stacked sparse denoising autoencoder and introduced it into a low-light image enhancement network, providing a reference example for applying deep learning models to such tasks. Wei et al. (2018) proposed Retinex-Net, a low-light enhancement method that integrates Retinex theory with CNNs. However, modeling the relationships among global long-range pixels within an image remains challenging, so the reconstructed images exhibit substantial noise. To solve this problem, Jiang et al. (2021) and Guo et al. (2020) introduced effective unsupervised methods, EnlightenGAN and Zero-reference deep curve estimation (Zero-DCE), respectively, which enable image enhancement without paired data. In 2022, Wang et al. (2022) introduced the LLFlow network, which effectively captures both local pixel dependencies and global image characteristics by modeling the pixel distribution of a normally exposed image, resulting in improved enhancement quality. Despite the significant advances made by the aforementioned methods, problems remain: GANs suffer from mode collapse and training instability, while flow-based models may struggle to model multi-peak distributions. In comparison, diffusion models are characterized by a more stable training process and a heightened capacity to recover details obscured by noise. In 2023, Jiang et al. (2023) proposed a wavelet-based conditional diffusion model, which leveraged the generative capabilities of diffusion models to produce results with satisfactory perceptual fidelity. With the advantage of the wavelet transform, inference was significantly accelerated and computational resources greatly reduced without compromising information integrity; these operations made the model lightweight while retaining commendable enhancement outcomes. Additionally, Zhou et al. (2023) employed a novel pyramid diffusion sampling technique that incrementally increases the resolution during the inverse process, thereby improving the performance and efficiency of the model.
Image Restoration Based on Diffusion Models
The improved denoising diffusion probabilistic model (DDPM) has achieved remarkable results in image generation; as a result, researchers introduced diffusion models to image restoration, where they have demonstrated state-of-the-art (SOTA) performance across various low-level restoration tasks. In the realm of image deblurring, Whang et al. (2022) proposed an alternative framework for blind deblurring centered on conditional diffusion models: a stochastic sampler is trained to refine the output of a deterministic predictor and can generate a range of plausible reconstructions for a given input, leading to a notable improvement in perceptual quality compared to existing methods. Utilizing noisy synthesis for training, guided by a domain-generalizable multiscale representation of the input image, the method proposed by Ren et al. (2022) achieved superior performance relative to other methods. In the field of image super-resolution, SOTA results have been attained across multiple datasets by generating higher-resolution images with a cascade of diffusion models (Ho et al., 2022). In low-light enhancement, Wang et al. (2023) proposed a degradation-aware learning scheme for low-light image enhancement using diffusion models, which adeptly integrates degradation and image priors into the diffusion process. Robust shadow removal was achieved by Guo et al. (2023) through the progressive refinement of degradation and generation priors. Denoising diffusion restoration models (Kawar et al., 2022) utilize pre-trained diffusion models to address arbitrary linear inverse problems. In comparison to GAN-based and normalizing flow-based models, diffusion models exhibit superior performance in image restoration, producing high-quality images in tasks such as super-resolution, image de-raining, and image dehazing (Özdenizci & Legenstein, 2023; Saharia et al., 2022).
In the realm of noise estimation networks, Peebles and Xie (2023) proposed a new type of diffusion model, designated DiT. The DiT model is based on the transformer architecture and replaces the U-Net as the backbone network for noise prediction within diffusion models, demonstrating significant scalability. Additionally, Si et al. (2024) proposed FreeU, which strategically re-weights the contributions of the U-Net skip connections and backbone feature maps to leverage the strengths of both components, resulting in a marked improvement in the quality of generated images and videos. However, the internal properties of the diffusion U-Net remain only partially explored. This paper proposes that combining diffusion models with KAN can significantly improve the quality of noise prediction and achieve more efficient and robust low-light image enhancement.
Kolmogorov–Arnold Network (KAN)
The theoretical foundation of the KAN is the Kolmogorov–Arnold representation theorem (KART) (Kolmogorov, 1957), formulated by Andrei Kolmogorov and Vladimir Arnold in 1957. The theorem asserts that any multivariate continuous function $f$ on a bounded domain can be written as a finite composition of continuous univariate functions and addition:

$$f(x_1, \ldots, x_n) = \sum_{q=1}^{2n+1} \Phi_q\!\left(\sum_{p=1}^{n} \phi_{q,p}(x_p)\right),$$

where the $\phi_{q,p}$ and $\Phi_q$ are continuous univariate functions.
Recently, Liu et al. (2024) proposed a novel network architecture based on the Kolmogorov–Arnold theorem, termed the KAN. The authors argue that, within the finite superposition of univariate functions used to represent a multivariate function, each univariate function can be parameterized as a B-spline. These B-splines are defined by a set of control points and a knot vector, which ensures desirable smoothness properties. Furthermore, the original architecture is extended to accommodate arbitrary widths and depths, as opposed to the two-layer, width-$(2n+1)$ form given by the representation theorem.
Owing to its strong interpretability and expressive power, KAN has been successfully applied in various domains. For instance, Vaca-Rubio et al. (2024) employed the spline function as an activation function, which allows for adaptive adjustment of the activation pattern in accordance with the dynamic characteristics of time series data. This approach enables a more accurate capture of complex patterns and variations in time series, thereby demonstrating the unique advantages of KAN in enhancing the performance of prediction models through adaptive activation functions. Additionally, Bresson et al. (2024) applied KAN to graph learning tasks instead of the traditional multilayer perceptron (MLP) and demonstrated that KAN has significant advantages in graph regression tasks.
Method
Diffusion Models
DDPM (Ho et al., 2020) is characterized by a two-step process comprising a forward diffusion process and an inverse diffusion process. In the forward process, random noise is systematically added to the data, while in the inverse process, data samples are progressively recovered from the noised data. The principle of the diffusion process is illustrated in Figure 1.

Figure 1. Diffusion models principle.
The DDPM defines a forward diffusion process that gradually corrupts a clean sample $x_0$ with Gaussian noise over $T$ steps according to a variance schedule $\beta_1, \ldots, \beta_T$, equations (1) and (2):

$$q(x_{1:T} \mid x_0) = \prod_{t=1}^{T} q(x_t \mid x_{t-1}), \quad (1)$$

$$q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\, \sqrt{1-\beta_t}\, x_{t-1},\, \beta_t \mathbf{I}\right). \quad (2)$$
By combining equation (2) with the reparameterization technique, the relationship between $x_t$ and $x_0$ can be expressed in closed form, equation (3):

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon, \quad (3)$$

where $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, and $\epsilon \sim \mathcal{N}(0, \mathbf{I})$.
Since the reverse conditional $q(x_{t-1} \mid x_t)$ is intractable, the inverse diffusion process approximates it with a parameterized Gaussian $p_\theta(x_{t-1} \mid x_t) = \mathcal{N}(x_{t-1};\, \mu_\theta(x_t, t),\, \sigma_t^2 \mathbf{I})$, whose mean is computed from the noise $\epsilon_\theta(x_t, t)$ predicted by the noise estimation network at each denoising step.
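As a concrete illustration of equations (1) to (3), the following minimal PyTorch sketch implements the closed-form forward noising step. The linear schedule bounds and tensor shapes are illustrative assumptions, not the paper's settings.

```python
import torch

# Illustrative linear variance schedule (assumed values, not the paper's).
T = 1000
betas = torch.linspace(1e-4, 0.02, T)        # beta_1 ... beta_T
alphas = 1.0 - betas                          # alpha_t = 1 - beta_t
alpha_bars = torch.cumprod(alphas, dim=0)     # cumulative product \bar{alpha}_t

def q_sample(x0, t, noise):
    """Draw x_t ~ q(x_t | x_0) via equation (3): closed-form forward noising."""
    a_bar = alpha_bars[t].view(-1, 1, 1, 1)   # broadcast over (B, C, H, W)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise
```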
Although methods such as DDPM (Ho et al., 2020) and its improved variants are capable of generating realistic images, a notable gap remains between these approaches and mainstream GAN-based methods in terms of metrics such as the Fréchet inception distance. To address this issue, Ho and Salimans (2022) introduced classifier-free guidance, which strengthens model constraints by incorporating conditioning information such as category labels during generation, thereby improving the quality of the generated images. The core idea, as illustrated in Figure 2, is that high fidelity to the data distribution is achieved for the sampled image by combining the conditional and unconditional noise estimates at each sampling step.

Figure 2. Conditional diffusion models.
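A minimal sketch of the guided noise estimate used in classifier-free guidance is given below; the function and argument names are hypothetical, and the network is assumed to accept a null condition for the unconditional branch.

```python
def guided_eps(eps_net, x_t, t, cond, w):
    """Classifier-free guidance: extrapolate between conditional and
    unconditional noise predictions using the guidance weight w."""
    eps_cond = eps_net(x_t, t, cond)    # prediction with the condition
    eps_uncond = eps_net(x_t, t, None)  # condition dropped (null token)
    return (1.0 + w) * eps_cond - w * eps_uncond
```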
During training, we sample from paired data distributions (e.g., a normal-light image $x_0$ and its low-light counterpart $\tilde{x}$), with the low-light image serving as the condition that guides the denoising of the normal-light image.
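One conditional training step might look as follows, reusing q_sample from the sketch above. Conditioning by channel-wise concatenation is a common choice in conditional diffusion models and is assumed here; the paper's exact conditioning mechanism may differ.

```python
def conditional_step(eps_net, x0_normal, x_low, optimizer):
    """One training step of a conditional diffusion model (sketch)."""
    t = torch.randint(0, T, (x0_normal.shape[0],))
    noise = torch.randn_like(x0_normal)
    x_t = q_sample(x0_normal, t, noise)       # noised normal-light image
    inp = torch.cat([x_t, x_low], dim=1)      # condition via concatenation
    loss = torch.mean((noise - eps_net(inp, t)) ** 2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```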
When designing low-light image enhancement models, most existing models assume that the size of the input image is fixed, but real images are often of varying sizes, which limits practical application. Recently, Whang et al. (2022) proposed a diffusion model-based deblurring algorithm for images of unknown size. The core of the method lies in segmenting an image of arbitrary size into fixed-size blocks for the deblurring operation, which effectively eliminates the blur. Although this design has advantages in processing inputs of varying sizes, the strategy relies on a fully convolutional network, which entails high computational demands, particularly during testing. Moreover, to process the entire image, the model must load the complete image into memory, which can incur a high memory cost in practice.
Drawing on the core idea of that algorithm, the patch-based method was proposed (Özdenizci & Legenstein, 2023). The general idea is to divide the input image into sample blocks of the same size (patches), use the diffusion model to perform local operations on each patch, and finally merge the results to obtain the global image; the split operation is shown in Figure 3. However, since each patch is recovered independently, artifacts may arise during merging. To mitigate this issue, the model guides the reverse sampling process so that transitions between neighboring patches become smoother: an image of arbitrary size is decomposed into overlapping fixed-size patches, and at each denoising time step the pixels in overlapping regions are updated with the average of the noise estimated for all patches covering them.

Figure 3. Illustration of the overlapping patch operation. The image shows a simplified schematic of this operation, where the colored blocks are different patches and the white grid lines mark the regions where four patches overlap; pixels in these regions are updated based on the average estimated noise of the four overlapping patches at each denoising time step $t$.
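The following sketch shows how such an overlap-averaged noise estimate could be computed. The patch size, stride, and the assumption that the patch grid covers the whole image are illustrative simplifications, not the paper's settings.

```python
def patched_noise_estimate(eps_net, x_t, cond, t, p=64, stride=32):
    """Average per-pixel noise estimates over overlapping p x p patches
    (sketch; border handling for non-divisible sizes is omitted)."""
    B, C, H, W = x_t.shape
    acc = torch.zeros_like(x_t)                       # summed noise estimates
    cnt = torch.zeros(B, 1, H, W, device=x_t.device)  # patches covering a pixel
    for i in range(0, H - p + 1, stride):
        for j in range(0, W - p + 1, stride):
            patch = torch.cat([x_t[:, :, i:i+p, j:j+p],
                               cond[:, :, i:i+p, j:j+p]], dim=1)
            acc[:, :, i:i+p, j:j+p] += eps_net(patch, t)
            cnt[:, :, i:i+p, j:j+p] += 1
    return acc / cnt.clamp(min=1)                     # per-pixel average
```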
By minimizing the loss function, the network narrows the distance between the output image and the reference image, achieving the goal of low-light enhancement. Different loss functions have different feature extraction capabilities; to obtain a higher-quality output image, a joint loss function is used in this paper. Considering the structural and contextual information between images, a structural loss and a Charbonnier loss are introduced in addition to the noise estimation loss of the diffusion model.
Images captured in low-light environments frequently exhibit structural distortion. The structural similarity index (SSIM), a widely recognized metric for assessing the similarity between two images, is commonly employed to evaluate the degree of similarity before and after distortion, as well as the fidelity of images generated by models. Therefore, the SSIM metric is utilized to formulate a structural loss that minimizes the disparity between the generated image $\hat{y}$ and the reference image $y$, equations (12) and (13):

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}, \quad (12)$$

$$L_{\mathrm{SSIM}} = 1 - \mathrm{SSIM}(\hat{y}, y), \quad (13)$$

where $\mu$, $\sigma^2$, and $\sigma_{xy}$ denote the means, variances, and covariance of the two images, and $C_1$, $C_2$ are small stabilizing constants.
The Charbonnier loss is similar to the L1 loss but has a smoothing property: it behaves like the L2 loss at small errors while retaining the linear behavior of the L1 loss at large errors. This smoothness helps mitigate the vanishing gradient problem and stabilizes optimization. Gradient instability at small errors is avoided by adding a small constant $\varepsilon$, equation (14):

$$L_{\mathrm{Char}} = \sqrt{\lVert \hat{y} - y \rVert^2 + \varepsilon^2}. \quad (14)$$
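A sketch of how the joint loss could be assembled is shown below. The SSIM term is a simplified global SSIM (a single window over the whole image rather than the usual sliding window), and the weights w_ssim and w_char are illustrative assumptions rather than the paper's values.

```python
import torch

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """1 - SSIM with a single global window (simplification of eqs. 12/13)."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim

def charbonnier_loss(x, y, eps=1e-3):
    """Smooth L1-like penalty of equation (14)."""
    return torch.mean(torch.sqrt((x - y) ** 2 + eps ** 2))

def joint_loss(noise, noise_pred, y_hat, y, w_ssim=0.2, w_char=0.2):
    """Noise-prediction loss plus structural and Charbonnier terms."""
    l_noise = torch.mean((noise - noise_pred) ** 2)
    return (l_noise + w_ssim * ssim_loss(y_hat, y)
            + w_char * charbonnier_loss(y_hat, y))
```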
In this paper, KANDiff, a low-light image enhancement method, is proposed by combining the patch-based network structure with the diffusion model and utilizing KAN to strengthen the noise estimation ability of the diffusion model. The network structure is shown in Figure 4. The overall structure adheres to the forward noising and reverse denoising framework of diffusion models, and the model employs a patch-based image recovery strategy in which the image is divided into overlapping fixed-size patches that are restored and then merged.

Figure 4. Overall network framework.
In the noise estimation networks utilized in conventional diffusion models, excessive convolution operations can result in the loss of high-frequency detail. Consequently, skip connections are employed to ensure that the extracted features encapsulate richer semantic information. Additionally, since traditional noise estimation networks do not adequately meet the demands of image restoration when predicting noise, KAN is integrated into the noise estimation network to enhance its generative capacity. This integration improves the network's accuracy in estimating noise, thereby enabling high-quality enhancement of low-light images.
In the backward denoising process of the diffusion model, the noise estimation network estimates the noise while denoising the image to complete image recovery. The image quality depends on the accuracy of this network; since KAN provides a stronger noise estimation ability, this paper combines it with U-Net to obtain Unet-Kan, which realizes more accurate noise estimation and thus improves the quality of the recovered image. First, the image is down-sampled to obtain image features at different levels; then the time embedding is added in the intermediate layer and feature extraction is performed by the KAN module; finally, up-sampling yields high-quality, accurate noise predictions. The structure of the KAN module is shown in Figure 5.

Figure 5. Internal structure of the Kolmogorov–Arnold network (KAN) module.
When the feature map produced by the down-sampling path enters the KAN module, it is first reshaped into a sequence of tokens (tokenization).
When tokenization is completed, the resulting tokens are passed to the KAN layer, which is similar in principle to the MLP and consists of multiple nested layers of learnable activation functions, each with fixed input and output dimensions, allowing for more efficient and interpretable feature extraction. A KAN with $K$ layers can be written as the composition $\mathrm{KAN}(Z) = (\Phi_{K-1} \circ \cdots \circ \Phi_1 \circ \Phi_0)(Z)$, where each $\Phi_k$ is a matrix of learnable univariate activation functions.
After each KAN layer, layer normalization is applied and the output features are passed to the next block; for example, the output of the $k$-th block can be written as $X_k = \mathrm{LN}(\Phi_k(X_{k-1}))$, where $\mathrm{LN}$ denotes layer normalization.
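To make the layer concrete, below is a simplified KAN layer sketch. It follows the spirit of the original design but substitutes Gaussian radial basis functions for the B-spline parameterization (a known simplification) and includes the base branch and layer normalization described above; all names and sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KANLayer(nn.Module):
    """Simplified KAN layer: per-edge learnable univariate functions,
    approximated here with Gaussian RBFs instead of B-splines."""

    def __init__(self, dim_in, dim_out, num_basis=8, grid=(-2.0, 2.0)):
        super().__init__()
        self.register_buffer("centers",
                             torch.linspace(grid[0], grid[1], num_basis))
        self.spline_w = nn.Linear(dim_in * num_basis, dim_out, bias=False)
        self.base_w = nn.Linear(dim_in, dim_out)   # SiLU "base" branch
        self.norm = nn.LayerNorm(dim_out)          # LN after the KAN layer

    def forward(self, x):                          # x: (B, N, dim_in) tokens
        phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)  # RBF features
        phi = phi.flatten(-2)                      # (B, N, dim_in * num_basis)
        out = self.base_w(F.silu(x)) + self.spline_w(phi)
        return self.norm(out)
```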
In addition, the purpose of the noise prediction network Unet-Kan is to predict the noise $\epsilon_\theta(x_t, \tilde{x}, t)$ added at each time step $t$, so that the reverse diffusion process can progressively remove it and recover a well-exposed image.
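A structural sketch of such a U-Net with a KAN bottleneck is given below (not the authors' implementation). It assumes channel-wise concatenation of the noisy image and the low-light condition, uses a simple linear time embedding in place of the usual sinusoidal one, and builds on the KANLayer sketched above.

```python
class UNetKAN(nn.Module):
    """Sketch of a noise estimation U-Net with a KAN bottleneck."""

    def __init__(self, ch=64, emb_dim=64):
        super().__init__()
        self.enc1 = nn.Conv2d(6, ch, 3, stride=2, padding=1)      # x_t + condition
        self.enc2 = nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1)
        self.t_mlp = nn.Sequential(nn.Linear(1, emb_dim), nn.SiLU(),
                                   nn.Linear(emb_dim, ch * 2))
        self.kan = KANLayer(ch * 2, ch * 2)                        # bottleneck
        self.dec2 = nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1)
        self.dec1 = nn.ConvTranspose2d(ch * 2, 3, 4, stride=2, padding=1)

    def forward(self, x, t):
        h1 = torch.relu(self.enc1(x))
        h2 = torch.relu(self.enc2(h1))
        h2 = h2 + self.t_mlp(t.float().view(-1, 1))[:, :, None, None]
        b, c, h, w = h2.shape
        tokens = h2.flatten(2).transpose(1, 2)        # tokenize: (B, H*W, C)
        h2 = self.kan(tokens).transpose(1, 2).view(b, c, h, w)
        d2 = torch.relu(self.dec2(h2))
        return self.dec1(torch.cat([d2, h1], dim=1))  # skip connection
```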
Experimental Setup and Datasets
The KANDiff model is trained using the PyTorch framework on an NVIDIA Tesla T4 GPU, with the same initial learning rate used uniformly across all experiments.
The model is trained on the LOLv1 (Wei et al., 2018) dataset, which contains 500 image pairs, of which 485 are allocated for training and 15 for validation. Furthermore, the performance of KANDiff is evaluated on two synthetic paired datasets: VE-LOL (Liu et al., 2021) and LSRW (Hai et al., 2023). The VE-LOL dataset consists of 500 image pairs for training and 100 for validation, whereas the LSRW dataset includes 5,650 image pairs from diverse scenes, with 5,600 randomly selected for training and the remaining 50 reserved for validation. We also test the generalization ability of the proposed method to unknown scenes on two commonly used real-world unpaired datasets, DICM (Lee et al., 2013) and LIME (Guo et al., 2016). Moreover, this paper introduces LIDS, a low-light dataset containing 16 low-light images of varying sizes.
Evaluation Metrics
To objectively assess the performance of the model, this paper employs the peak signal-to-noise ratio (PSNR) and SSIM as the primary evaluation metrics for paired datasets. For unpaired datasets, the natural image quality evaluator (NIQE) serves as the main evaluation criterion.
PSNR is a widely used metric to quantify the quality of an image or video, typically comparing the quality of a compressed image with its original counterpart. Higher PSNR values generally indicate superior image quality.
SSIM is a metric designed to evaluate the perceptual similarity between two images. Unlike PSNR, SSIM incorporates factors such as luminance, contrast, and structural information, providing a metric that aligns more closely with human visual perception. The SSIM value ranges from 0 to 1, with higher values indicating greater similarity between the two images.
NIQE is a widely adopted metric for assessing the quality of an image in the absence of a reference image. Lower NIQE values signify higher image quality, as the image exhibits characteristics more closely resembling those of natural images.
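For paired data, both reference-based metrics can be computed with scikit-image, as in the hedged example below (images assumed to be float arrays in [0, 1]); NIQE is not part of scikit-image and typically requires a separate implementation.

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced, reference):
    """PSNR and SSIM for one enhanced/reference pair of shape (H, W, 3)."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=1.0)
    ssim = structural_similarity(reference, enhanced, data_range=1.0,
                                 channel_axis=-1)
    return psnr, ssim
```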
Ablation Experiments
To evaluate the effectiveness of KANDiff, we conducted two ablation experiments: ablation experiment 1 used the original U-Net for model training, and ablation experiment 2 modified the patch size used in the patch-based restoration strategy. The results are compared in Figure 6 and Table 1.

Figure 6. Comparison of the effect of the proposed method with the ablation experiments.
Table 1. Comparison Between Our Method and the Ablation Experiments on the LOLv1, VE-LOL, and LSRW Datasets. Ablation Experiment 1 Used the Original U-Net for Model Training; Ablation Experiment 2 Used a Modified Patch Size.
Note. FLOPs = floating-point operations; PSNR = peak signal-to-noise ratio; SSIM = structural similarity index.
Comparison Experiments
The KANDiff proposed in this paper is compared with RetinexNet (Wei et al., 2018), Zero-DCE (Guo et al., 2020), EnlightenGAN (Jiang et al., 2021), Restormer (Zamir et al., 2022), UHD-LL (Li et al., 2023), LLFlow (Wang et al., 2022), and LLFormer (Wang et al., 2023) on the LOLv1, VE-LOL, and LSRW datasets, using PSNR and SSIM as evaluation metrics. The results, presented in Table 2, indicate that the proposed method achieves the highest PSNR and SSIM on the VE-LOL dataset. On the LOLv1 and LSRW datasets, KANDiff outperforms the other methods in SSIM, and its PSNR trails the best-performing method by only 0.988 and 1.56 dB, respectively. In summary, on the LOLv1, VE-LOL, and LSRW datasets, the proposed algorithm combines the advantages of the diffusion model and KAN and achieves the best overall image enhancement, demonstrating the effectiveness of the proposed method.
Table 2. Comparison of Our Method With Mainstream Low-Light Enhancement Methods on the LOLv1, VE-LOL, and LSRW Datasets. The Best Results Are Highlighted in Bold.
Note. PSNR = peak signal-to-noise ratio; SSIM = structural similarity index; Zero-DCE = Zero-reference deep curve estimation.
To better illustrate the low-light enhancement capability of the proposed model, visual comparisons on the LOLv1 and VE-LOL datasets are presented alongside the outputs of other algorithms. As shown in Figures 7 and 8, KANDiff effectively enhances image contrast and recovers image details in low-light environments.

Figure 7. Comparison of images generated by the proposed method and other methods on the LOLv1 dataset, together with the normal-light and low-light images.

Figure 8. Comparison of images generated by the proposed method and other methods on the VE-LOL dataset, together with the normal-light and low-light images.
To evaluate the performance of the model on low-light images of unknown sizes, this paper introduces LIDS, an unpaired low-light image dataset containing images of different sizes. In the experiments on LIDS, the proposed method effectively enhances image contrast and significantly recovers dark details. The comparison in Figure 9 clearly shows that the method maintains high image quality when processing low-light images of different sizes, indicating a relatively strong generalization ability to low-light images of various unknown sizes and demonstrating good potential for practical applications.

Figure 9. Comparison of the images generated by the proposed method on the LIDS dataset with the corresponding low-light images.
To further verify the generalization ability of the proposed algorithm, experiments are conducted on the unpaired datasets LIME and DICM, with systematic comparisons against the mainstream low-light enhancement approaches Zero-DCE and Restormer. The comparison results are shown in Figure 10. On the DICM dataset, the image enhanced by KANDiff shows effectively brightened indoor low-light regions while preserving image details, demonstrating the effectiveness and applicability of the proposed method in low-light conditions. On the LIME dataset, the images enhanced by Zero-DCE and Restormer exhibit varying degrees of color bias, which degrades their overall quality. In contrast, KANDiff successfully recovers image details without introducing color bias, and the overall visual effect is superior.

Figure 10. Comparison of the effectiveness of the proposed method and other methods on the LIME and DICM datasets.
In addition, the no-reference metric NIQE is used to quantify the performance of different methods on the unpaired datasets. The analysis in Table 3 shows that the NIQE values of the proposed KANDiff are the lowest on both datasets, 4.171 and 3.898, respectively, indicating its superiority in image quality assessment. The experimental results show that the proposed method exhibits good generalization ability across a wide range of datasets and offers higher reliability.
Table 3. Comparison of NIQE Values on the Unpaired Datasets LIME and DICM. Lower NIQE Values Represent Higher Image Quality. The Best Results Are Highlighted in Bold.
Note. NIQE = natural image quality evaluator; Zero-DCE = Zero-reference deep curve estimation.
While the proposed method has demonstrated SOTA performance in low-light image enhancement, we further explore its potential for cross-domain adaptation. Underwater image enhancement presents unique challenges due to wavelength-dependent light attenuation and scattering caused by suspended particles, which manifest as color casts and low contrast. Addressing these issues is important for marine exploration and ecological monitoring and continues to drive interdisciplinary innovation in computational imaging and underwater robotics, carrying significant research value and practical urgency.
We extended the approach presented in this paper to underwater image enhancement and conducted training and testing on the UIEB dataset. As shown in Table 4, the proposed method outperforms Water-Net (Li et al., 2019) and PUGAN (Cong et al., 2023) in both PSNR and SSIM, achieving the best results. The visual results of the proposed method are illustrated in Figure 11.

Figure 11. Images generated by the proposed method on the UIEB dataset.
Table 4. Comparison Between Our Method and Mainstream Underwater Image Enhancement Methods on the UIEB Dataset. The Best Results Are Highlighted in Bold.
Note. PSNR = peak signal-to-noise ratio; SSIM = structural similarity index.
Conclusion
In this paper, a diffusion model-based low-light image enhancement network, named KANDiff, is proposed. The network significantly improves the accuracy of noise estimation by combining KAN with the noise estimation network, which in turn achieves an efficient inverse diffusion process and high-quality image enhancement. In addition, KANDiff adopts a patch-based image restoration strategy that enables it to handle images of unknown size, further enhancing the generalization ability of the model, and introduces a joint loss function that effectively suppresses color bias.
The proposed method is systematically evaluated on several popular low-light image datasets. The experimental results show that KANDiff achieves significant improvements in several objective metrics (e.g., PSNR and SSIM) compared to existing algorithms and reaches the optimal level in most cases. In addition, experimental validation on underwater image enhancement has demonstrated KANDiff's robustness and adaptability across diverse environments. The subjective evaluation results also show that KANDiff performs well on real-world unpaired datasets, and the NIQE analysis further demonstrates that the enhanced images retain more detailed information and align more closely with human visual perception. The results of this paper provide a theoretical foundation and practical guidance for the development of image enhancement techniques in low-light environments.
While KANDiff exhibits impressive performance in the domain of low-light image enhancement, several promising avenues for further research remain. First, the computational efficiency of diffusion models continues to pose a significant challenge. Future studies could investigate the application of knowledge distillation techniques or the development of lightweight variants of KANDiff, aimed at facilitating real-time processing on resource-constrained devices. Additionally, expanding the current framework to accommodate dynamic video sequences, as opposed to static images, would greatly enhance its potential applications in fields such as surveillance and mobile photography.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project has received funding from The National Natural Science Foundation of China (Grant Nos. 62001272 and 62472264) and Shandong Provincial Natural Science Foundation, China (Grant No. ZR2023MF015).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
