Abstract
Low-light conditions reduce brightness and contrast, obscure structural details, and amplify noise, which degrades visual quality and adversely affects downstream tasks, such as object detection and segmentation. Despite significant advances in deep learning-based enhancement, existing approaches still struggle to simultaneously balance brightness, detail preservation, and color fidelity. To address this, we propose a two-stage Retinex-based algorithm guided by multi-channel integrated feature optimization. Our main contributions are threefold. (1) We introduce a three-channel illumination decomposition strategy that models RGB illumination independently to mitigate color distortion. (2) We design a U-Net–based decomposition network with deformable convolutions, dual-layer attention, and selective-kernel fusion for multi-scale feature extraction. (3) We develop a two-branch fusion network incorporating detail enhancement, low-frequency filtering, and a curve-based illumination adjustment module for artifact-free enhancement. Extensive experiments on standard datasets show that our method outperforms state-of-the-art algorithms in terms of PSNR, SSIM, and NIQE. Furthermore, experiments on the ExDark dataset demonstrate that the proposed method improves object detection mAP to 0.193, representing a 67.8% increase over the original low-light images, thereby validating its effectiveness for downstream vision tasks.
Keywords
1. Introduction
In many real-world applications, such as security monitoring, autonomous driving, medical imaging, and remote sensing, images captured under low-light conditions often suffer from insufficient brightness, low contrast, blurred details, and heavy noise. These degradations not only reduce visual quality but also hinder high-level vision tasks such as object detection and segmentation.1 With advances in deep learning, low-light image enhancement has developed rapidly, leveraging neural networks to learn illumination estimation and restoration strategies directly from large-scale datasets.2,3 In this work, low-light conditions refer to scenes captured under insufficient illumination, typically characterized by low luminance, reduced contrast, and low signal-to-noise ratio (SNR), often accompanied by noise amplification and color distortion. Such conditions commonly occur in nighttime environments, poorly illuminated indoor scenes, and complex real-world scenarios.4
Recent methods can be categorized as follows. Supervised learning approaches train models on paired low-/normal-light images to directly learn decomposition and enhancement mappings. Early representative works adopt Retinex-inspired decompositions or encoder–decoder structures to estimate illumination and reflectance from paired datasets.5,6 However, many supervised methods rely on a single-channel illumination estimate that is simply replicated across RGB channels, which can cause color distortions under complex lighting.3 Reinforcement-based or exposure-learning methods cast enhancement as a sequential decision problem, where the network learns to adjust exposure or enhancement parameters through reward signals rather than pixel-wise loss.7 These methods can flexibly adapt to varying lighting conditions and sometimes preserve details better without paired supervision.8 Nevertheless, they often require carefully designed rewards and may be less stable during training.9
Unsupervised approaches, based on adversarial training, learn enhancement from unpaired datasets by combining generative adversarial network (GAN) objectives with perceptual or cycle-consistency constraints.10,11 These methods tend to generalize better to diverse real-world scenarios but can produce artifacts or inconsistent color fidelity if the discriminator or perceptual losses are not well balanced.12
Zero-shot and test-time optimization methods derive enhancement mappings from the input image itself.13 These methods avoid dataset domain gaps and can operate in real time, but may struggle with extremely noisy or severely underexposed inputs.14,15 Motivated by the complementary strengths and weaknesses of these paradigms, we propose a two-stage Retinex-based framework that jointly leverages multi-channel illumination modeling and learned fusion to balance brightness recovery, color fidelity, and noise suppression.16,17
Despite significant progress, several challenges remain.18 Many existing models struggle to balance brightness enhancement, detail preservation, and color fidelity simultaneously, resulting in blurry structures or unnatural colors.19 Low-light images captured in complex scenes exhibit coupled degradations such as noise, blur, and non-uniform illumination, which remain challenging for traditional Retinex-based or single-branch models.20 Models relying on single-channel illumination estimation inherently limit color accuracy, because illumination is not strictly uniform across RGB channels under real imaging conditions. Finally, encoder–decoder structures based on upsampling operations frequently introduce checkerboard artifacts that degrade visual quality.
To address these challenges, we propose a two-stage low-light image enhancement algorithm based on Retinex theory, incorporating multi-channel integrated feature optimization. In the decomposition stage, we introduce a three-channel illumination modeling strategy that separately learns the illumination distributions for the R, G, and B channels. This design effectively mitigates the color distortion commonly observed in single-channel approaches. A U-Net architecture serves as the backbone of the proposed decomposition network.21 This backbone is augmented with deformable convolution and a dual-layer attention mechanism.22,23 These enhancements substantially strengthen the network’s capability for multi-scale feature extraction in low-light conditions. Consequently, the model achieves a more precise decomposition of reflectance and illumination components. In the fusion stage, a Detail–Low-Frequency Module (DLM) inspired by PE-YOLO is employed to jointly enhance texture information and suppress noise.23 Additionally, a learnable curve-based illumination adjustment module is integrated to achieve smooth and artifact-free brightness enhancement, avoiding the checkerboard effects associated with conventional upsampling.
The proposed network is optimized with a redesigned loss function that includes an illumination grayscale constraint, encouraging consistent and low-contrast illumination across channels and improving decomposition stability. To fully exploit reference images in paired datasets, reflectance and illumination components are jointly restored using the enhanced fusion module.
Recent research has highlighted that low-light image enhancement does not always lead to consistent improvements in downstream vision tasks. In some cases, enhancement artifacts may degrade detection or classification accuracy, and the impact is often category-dependent.24 To address this issue, several studies have explored joint enhancement–recognition frameworks, which aim to optimize low-level image restoration and high-level semantic understanding in a unified manner.25 These findings suggest that evaluating enhancement methods solely based on image quality metrics may be insufficient, and downstream task performance should be carefully considered.
Extensive experiments on standard datasets demonstrate the superiority of the proposed method. On LOL-v1 and LOL-v2-real, the algorithm achieves improvements in PSNR, SSIM, and NIQE over state-of-the-art methods including RetinexNet,5 KinD++,26 RRM,27 R2RNet,28 EnlightenGAN,12 URetinexNet,29 Diff-Retinex, and LYT-Net.30,31 On unpaired datasets such as LIME, NPE, MEF, and VV, our method shows strong generalization and produces more natural enhancement results. Furthermore, validation on the ExDark dataset demonstrates that the enhanced images significantly improve object detection accuracy when combined with DETR or YOLOv5 detectors, confirming the method’s practicality in downstream visual tasks.32,33
2. Methodology
Building on the two-stage Retinex-based low-light image enhancement algorithm, we propose an improved multi-channel low-light image enhancement algorithm (MC-LLE) that performs multi-channel decomposition of illumination. As illustrated in Figure 1, the algorithm primarily consists of an image decomposition network and an image fusion network. In the image decomposition stage, based on the U-Net encoder–decoder architecture,34 a Multi-Scale Deformable Feature Fusion with Dual-Layer Attention (MSDFF) module and a multi-feature fusion strategy are incorporated to decompose the input low-light image into a multi-channel reflectance image and a multi-channel illumination image. In the image fusion stage, histogram equalization is incorporated into the network as a prior. Specifically, the input low-light image is first processed by histogram equalization, and the resulting enhanced image is jointly fed into the fusion network (PX-DenseUNet) together with the reflectance and illumination components produced by the decomposition network. Formally, the histogram-equalized image serves as an auxiliary input branch, providing global intensity distribution priors that complement the learned reflectance and illumination representations during feature fusion. The fusion network adopts a DenseUNet architecture with a channel attention mechanism and is trained under the supervision of reconstruction and structural similarity (SSIM) losses.
Figure 1. Multi-channel low-light image enhancement algorithm architecture.
Specifically, the proposed PX-DenseUNet differs from standard DenseUNet in three aspects: (1) pixel reorganization is adopted instead of traditional upsampling to alleviate checkerboard artifacts; (2) a channel attention mechanism is embedded into each dense block to enhance feature selection; (3) multi-source inputs (reflectance, illumination, and histogram-equalized images) are jointly fused, enabling cross-domain feature interaction.
This design effectively integrates the color and texture information provided by histogram equalization into the final enhanced result, thereby alleviating color distortion and detail loss.
Additionally, the fusion stage employs a PX-DenseUNet-based architecture, integrated with a learnable curve estimation module, to perform joint optimization of reflectance and illumination components, enabling natural and artifact-free enhancement.
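As a concrete illustration of the pixel reorganization mentioned in point (1) above, the following is a minimal PyTorch sketch of checkerboard-free upsampling, assuming a PixelShuffle-style rearrangement; the module name and channel sizes are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class PixelReorgUp(nn.Module):
    """Checkerboard-free upsampling: a 3x3 conv expands channels by r^2,
    then PixelShuffle rearranges them into an r-times larger feature map."""
    def __init__(self, in_ch, out_ch, r=2):
        super().__init__()
        self.proj = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)  # (B, C*r^2, H, W) -> (B, C, H*r, W*r)

    def forward(self, x):
        return self.shuffle(self.proj(x))

# usage: upsample a 64-channel map to 32 channels at 2x resolution
x = torch.randn(1, 64, 32, 32)
up = PixelReorgUp(64, 32, r=2)
print(up(x).shape)  # torch.Size([1, 32, 64, 64])
```

Because every output pixel is produced by the same convolution before reorganization, this avoids the uneven kernel overlap that makes transposed convolutions prone to checkerboard artifacts.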
2.1. Image decomposition network
Traditional Retinex theory-based methods decompose low-light images into three-channel reflectance components and a single-channel illumination component. When fusing the illumination and reflectance components, these methods typically expand the single-channel illumination component to three channels through simple replication, followed by pixel-wise multiplication with the reflectance component. However, due to the nonlinearity of color channels and the complexity of low-light image data, this expansion approach tends to cause local color loss during the enhancement process.
To overcome these decomposition challenges, an improved illumination modeling strategy is introduced, as shown in Figure 2.
Figure 2. The improved illumination modeling strategy.
This strategy models illumination independently for each RGB channel, enabling the network to learn channel-specific illumination distributions under diverse lighting conditions. This yields a more accurate estimate of the three-channel illumination components, and in turn more precise reflectance components, effectively enhancing the generalization ability of the model for low-light image decomposition. In the subsequent image fusion stage, this strategy performs nonlinear fusion of the reflectance and illumination components within the three RGB channels, which can be formulated as:
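The original equation is not reproduced in this version; a standard per-channel Retinex formulation consistent with the description is:

```latex
S_c = R_c \odot I_c, \qquad c \in \{R, G, B\},
```

where $S_c$ denotes channel $c$ of the observed image, $R_c$ the corresponding reflectance channel, $I_c$ the channel-specific illumination estimate, and $\odot$ pixel-wise multiplication.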
2.2. Image fusion network
2.2.1. Detail–low-frequency module (DLM)
To address noise characteristics and detail blurring during image fusion, inspired by PE-YOLO, 35 we propose a Detail–Low-Frequency Module (DLM) to jointly perform detail enhancement and noise suppression on the image reflectance component. The DLM module adopts a parallel architecture design, comprising two core sub-modules: the Detail Processing Module (DPM) for enhancing image texture details and the Low-Frequency Enhancement Module (LEF) for extracting low-frequency semantic information and suppressing high-frequency noise. This parallel design enhances the feature representation by jointly capturing high-frequency details and low-frequency structural cues, thereby improving the fidelity and robustness of the reconstructed images.
In the fusion network stage, the DPM module is utilized to enhance the image features derived from the decomposition network. As shown in Figure 3, the DPM adopts a dual-branch structure, including a context branch and an edge branch. The context branch captures global context information by establishing long-range dependencies, enabling global enhancement of image features. The edge branch calculates image gradients using Sobel operators in two different directions, effectively extracting edge information of image features and enhancing texture detail features of the image.
Figure 3. DPM module architecture diagram.
The context branch employs a residual learning mechanism to preserve abundant low-frequency information through skip connections. This branch contains two residual blocks: the first increases the number of channels of the input feature from 3 to 32, while the second reduces the number of channels from 32 back to 3. This bottleneck structure enables the model to aggregate contextual information while preserving low-frequency components through residual connections. Such a design is beneficial for low-level vision tasks where both local detail and global structure must be retained. The calculation process of the context branch can be expressed as:
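The original expression is not preserved here; in its place, the following PyTorch sketch mirrors the description above (two residual blocks forming a 3→32→3 bottleneck, plus an outer skip connection). Module and variable names are hypothetical.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """3x3 conv residual block; a 1x1 conv aligns channels on the skip path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class ContextBranch(nn.Module):
    """Bottleneck: expand 3 -> 32 channels, aggregate context, reduce 32 -> 3."""
    def __init__(self):
        super().__init__()
        self.block1 = ResBlock(3, 32)
        self.block2 = ResBlock(32, 3)

    def forward(self, x):
        # outer residual keeps low-frequency content from the input
        return x + self.block2(self.block1(x))
```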
In the edge branch, Sobel operators compute discrete derivatives in the horizontal and vertical directions, thereby approximating the image gradient for edge feature extraction. The branch re-extracts the resulting edge information via convolutional filters and employs residual connections to enhance information flow. This process is mathematically expressed as:
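Again, the original formula is unavailable; one reading consistent with the description is:

```latex
F_{\text{edge}} = \mathrm{Conv}\big(S_x * F\big) + \mathrm{Conv}\big(S_y * F\big) + F,
```

where $S_x$ and $S_y$ are the horizontal and vertical Sobel kernels, $*$ denotes convolution, $\mathrm{Conv}$ is the learnable re-extraction filter, and the additive $F$ term is the residual connection.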
In addition to the DPM, the fusion network integrates the LEF module, specifically designed to extract low-frequency information from the input reflectance component. The structural design of the LEF module is shown in Figure 4.
Figure 4. LEF module architecture diagram.
2.2.2. Illumination component enhancement module
In the decomposition framework based on Retinex theory, the reflectance component carries the inherent structure and texture details of the image, while the illumination component determines the brightness distribution and overall atmosphere of the image. Effective enhancement of the illumination component is a key step in improving the visual quality of low-light images. Traditional encoder–decoder networks often introduce checkerboard artifacts during such tasks due to the non-divisibility issue in upsampling operations,36 resulting in unnatural local patch-like effects in the enhanced image. To address this problem and achieve refined, adaptive adjustment of the illumination component, this study designs an illumination enhancement module based on learnable curve estimation, whose workflow is shown in Figure 5. Abandoning the traditional downsampling–upsampling paradigm, this module transforms the enhancement process into a pixel-level nonlinear curve mapping,37 facilitating flexible and robust brightness improvement while maintaining smooth transitions across the entire image.
Figure 5. Schematic diagram of the curve estimation method.
The illumination component image serves as the input to the curve estimation module. The module estimates eight sets of per-pixel curve coefficients, which are applied iteratively to the illumination feature map to produce the adjusted output.
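A minimal sketch of this iterative curve application is given below, assuming a Zero-DCE-style quadratic curve, which is consistent with the eight coefficient sets described and the curve-mapping formulation cited above; the exact curve family used in the paper may differ.

```python
import torch

def apply_curves(illum, coeffs):
    """Iteratively apply eight learned quadratic curves (Zero-DCE style).

    illum:  (B, C, H, W) illumination component, values in [0, 1]
    coeffs: (B, 8*C, H, W) per-pixel curve coefficients from the estimator
    """
    x = illum
    for a in torch.chunk(coeffs, 8, dim=1):   # eight coefficient maps
        x = x + a * x * (1.0 - x)             # monotonic, bounded adjustment
    return x

# demo with random tensors
out = apply_curves(torch.rand(1, 3, 64, 64), torch.rand(1, 24, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```

Because the mapping is applied per pixel rather than through downsampling and upsampling, brightness transitions stay smooth and no checkerboard pattern can arise.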
2.3. Loss function design
To train the decomposition network more effectively, constraints must be imposed on the three-channel illumination components. During Retinex decomposition of the image, the decomposed reflectance component is expected to contain more of the image’s color information. Since the original input image can be obtained by pixel-wise multiplication of the illumination and reflectance components, enhancing color details in the output three-channel reflectance component implies reducing color details in the three-channel illumination component. Therefore, a grayscale regularization term is introduced to constrain the inter-channel consistency of illumination. The corresponding loss function is formulated as:
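The printed equation is missing from this version; one consistent instantiation of an inter-channel illumination consistency term is:

```latex
\mathcal{L}_{\text{gray}} = \sum_{c \in \{R,G,B\}} \big\lVert I_c - \bar{I} \big\rVert_1, \qquad \bar{I} = \tfrac{1}{3}\sum_{c} I_c,
```

which penalizes deviations of each illumination channel from their mean, pushing the illumination component toward grayscale so that color information settles in the reflectance component.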
The loss function of the decomposition network, after incorporating the improved decomposition strategy, is the sum of the original decomposition network loss function and the illumination grayscale loss, weighted by a balancing coefficient λ.
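Under the same caveat that the printed formula is unavailable, the combined objective would take the form:

```latex
\mathcal{L}_{\text{decom}} = \mathcal{L}_{\text{decom}}^{\text{orig}} + \lambda\,\mathcal{L}_{\text{gray}},
```

with $\lambda$ the balancing coefficient evaluated in Table 1.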
Table 1. Quantitative evaluation of model performance under different values of the grayscale loss weight λ.
To fully utilize the reference normal-light image, the reflectance and illumination components of the low-light image are restored and enhanced using the reflectance and illumination components of the reference image. The improved loss function of the fusion network is:
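The printed formula is missing; given that Section 2 states the fusion network is supervised by reconstruction and SSIM losses, a plausible form is:

```latex
\mathcal{L}_{\text{fuse}} = \big\lVert \hat{S} - S_{\text{ref}} \big\rVert_1 + \lambda_{s}\,\big(1 - \mathrm{SSIM}(\hat{S}, S_{\text{ref}})\big),
```

where $\hat{S}$ is the fused output, $S_{\text{ref}}$ the reference normal-light image, and $\lambda_s$ a weighting term; the actual reconstruction norm and weighting used in the paper may differ.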
2.4. Experiments
All experiments are implemented based on the open-source KinD++ framework, with modifications to both the decomposition and fusion networks.26 The decomposition network adopts the improved DeSK-Unet together with the three-channel illumination decomposition strategy, and is trained with the decomposition loss proposed in Section 2.3. The reflectance image, illumination image, and histogram equalization map of the low-light image are fused by the improved fusion network, which is trained with the fusion loss proposed in Section 2.3.
2.4.1. Experimental environment and parameter configuration
All experiments in this study were implemented based on the PyTorch deep learning framework and conducted under a unified hardware and software environment to ensure comparability and reproducibility of results. The specific configuration is as follows:
• Operating System: Ubuntu 20.04
• GPU: NVIDIA GeForce RTX 3090 (24 GB)
• CPU: AMD R9-5900X
• CUDA: 11.3
• cuDNN: 8.2
• Deep Learning Framework: PyTorch 1.12
For training hyperparameters, the basic settings of the KinD++ open-source framework are adopted, with fine-tuning performed for the tasks in this work. The specific parameters are as follows:
• Optimizer: Adam
• Initial Learning Rate: 0.0004
• Learning Rate Scheduler: ExponentialLR (gamma = 0.997)
• Batch Size: 10
• Training Epochs: 250
• Image Patch Size: 256×256
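For reference, these settings translate into PyTorch as sketched below; the one-layer model and random tensors are stand-ins for the actual networks and data loader, which are not reproduced here.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(3, 3, 3, padding=1)   # stand-in for the full network
optimizer = torch.optim.Adam(model.parameters(), lr=4e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.997)

for epoch in range(250):
    # one synthetic batch of 10 random 256x256 patches per epoch (demo only)
    low = torch.rand(10, 3, 256, 256)
    ref = torch.rand(10, 3, 256, 256)
    loss = nn.functional.l1_loss(model(low), ref)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # lr *= 0.997 after each epoch
```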
2.4.2. Datasets and evaluation metrics
To evaluate the proposed method, experiments are conducted on widely used low-light image enhancement datasets, including both paired and unpaired data.
2.4.2.1. Paired training datasets
The LOL-v1 and LOL-v2-real datasets are used for supervised training. LOL-v1 contains 500 paired images, with 485 for training and 15 for testing. LOL-v2-real includes 689 real-captured image pairs with more complex illumination variations and realistic noise characteristics.
These datasets cover diverse indoor and outdoor scenes under varying lighting conditions, enabling a comprehensive evaluation of model robustness. All experiments follow the official dataset splits for fair comparison.
2.4.2.2. Unpaired testing datasets
To further assess generalization, additional experiments are conducted on unpaired real-world low-light datasets. These datasets include diverse scenes such as nighttime and low-illumination environments without ground-truth references.
Evaluation on these datasets relies on no-reference metrics and visual comparison, reflecting more practical application scenarios.
The following widely recognized image quality evaluation metrics are used for quantitative analysis of the enhancement results:
• Peak Signal-to-Noise Ratio (PSNR): measures the pixel-level fidelity between the enhanced image and the reference normal-light image; higher values indicate better performance.
• Structural Similarity Index Measure (SSIM): evaluates the similarity in structural information between the enhanced image and the reference image; values range over [0, 1], with higher values indicating better similarity.
• Natural Image Quality Evaluator (NIQE): a no-reference image quality metric; lower values indicate higher naturalness and better visual quality.
• Mean Average Precision (mAP): evaluated using the DETR detector on the ExDark dataset to measure the practical value of the enhanced images in object detection tasks.
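The two full-reference metrics can be computed with scikit-image (≥ 0.19 for the channel_axis argument) as sketched below; NIQE and mAP require separate tooling. This is a usage illustration, not the paper's evaluation script.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(enhanced, reference):
    """Full-reference metrics on uint8 RGB arrays of identical shape."""
    psnr = peak_signal_noise_ratio(reference, enhanced, data_range=255)
    ssim = structural_similarity(reference, enhanced,
                                 channel_axis=-1, data_range=255)
    return psnr, ssim

# demo with random images
a = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
print(evaluate_pair(a, b))
```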
3. Results and analysis
3.1. Validation of the effectiveness of the three-channel illumination decomposition strategy
Table 2. Performance comparison of different illumination decomposition strategies on the LOL-v1 test set.

Qualitative comparison corresponding to Table 2. The three-channel illumination decomposition strategy effectively reduces color distortion and improves color fidelity.
The results show that all key metrics improve after adopting the three-channel illumination decomposition strategy. Compared with the single-channel version, the three-channel strategy yields moderate gains in PSNR (+0.081 dB) and SSIM (+0.013), indicating more accurate restoration of image structure and details. The NIQE value also decreases by 0.121, reflecting a higher degree of naturalness in the enhanced image. Together, these results indicate that the three-channel strategy provides more accurate illumination estimation.
3.2. Comprehensive ablation study
To thoroughly evaluate the effectiveness of each component in the proposed framework, we conduct a comprehensive ablation study covering three aspects: (1) the decomposition network design, (2) the fusion network modules, and (3) the proposed loss function.
All experiments are conducted on the LOL-v1 test set under the same training configuration to ensure fairness.
3.2.1. Ablation study on the decomposition network
To validate the contribution of the proposed Multi-Scale Deformable Feature Fusion with Dual-Layer Attention (MSDFF) module and its internal components, we perform a progressive ablation study starting from a baseline U-Net architecture.
The evaluated components include deformable convolution, the dual-layer attention mechanism, selective-kernel (SK) fusion, and the overall MSDFF module.
Table 3. Ablation study on the decomposition network.
First, the introduction of deformable convolution leads to a noticeable improvement across all metrics. This gain can be attributed to its ability to adaptively adjust spatial sampling locations, allowing the network to better capture irregular structures and non-uniform illumination patterns commonly observed in low-light images. Compared with standard convolution, this flexibility is particularly beneficial for Retinex decomposition, where accurate separation of illumination and reflectance depends heavily on local structural fidelity.
Second, the integration of the dual-layer attention mechanism further enhances performance, especially in terms of SSIM. This indicates that attention mechanisms play a critical role in selectively emphasizing informative spatial regions and feature channels while suppressing irrelevant or noisy responses. In low-light conditions, where signal-to-noise ratio is inherently low, such adaptive feature reweighting significantly improves structural consistency and detail preservation.
Third, the inclusion of selective-kernel (SK) fusion introduces multi-scale adaptability into the network. By dynamically aggregating features from different receptive fields, the model becomes more capable of handling diverse illumination distributions and object scales. This results in improved detail reconstruction and a reduction in perceptual artifacts, as reflected by the steady decrease in NIQE.
Finally, when all components are integrated into the MSDFF module, the model achieves the best overall performance. This demonstrates that the combination of deformable sampling, attention-based feature selection, and multi-scale fusion forms a synergistic architecture. Rather than acting as independent enhancements, these components jointly improve the network’s ability to perform accurate and robust Retinex decomposition under challenging lighting conditions.
3.2.2. Ablation experiment of the fusion network modules
To evaluate the effectiveness of the proposed fusion-stage components, we analyze the contributions of the Detail–Low-Frequency Module (DLM) and the curve estimation module.
Table 4. Ablation study on fusion modules.
When the DLM module is introduced independently, the model shows a clear improvement in SSIM and a reduction in NIQE, while the gain in PSNR remains relatively modest. This suggests that the primary contribution of the DLM module lies in enhancing structural fidelity and suppressing noise rather than directly increasing pixel-wise accuracy. This behavior is consistent with its architectural design: the detail processing branch strengthens high-frequency texture information, while the low-frequency enhancement branch captures global semantic structures and filters out noise. The parallel design enables the model to balance detail enhancement and denoising, which is crucial for perceptual quality.
In contrast, the curve estimation module significantly improves PSNR when applied alone, indicating its effectiveness in adjusting global illumination and restoring brightness. However, this improvement is accompanied by a slight degradation in NIQE, suggesting that brightness enhancement without proper structural guidance may amplify noise or introduce unnatural artifacts. This observation aligns with the nature of pixel-wise nonlinear mapping, which enhances intensity but lacks explicit constraints on structural consistency.
When both modules are combined, the model achieves the best performance across all metrics. This confirms that the two modules address complementary aspects of the enhancement problem: the DLM module ensures structural integrity and noise suppression, while the curve estimation module provides flexible and smooth illumination adjustment. Their integration enables the model to simultaneously achieve accurate brightness restoration and high perceptual quality, avoiding the trade-offs observed when each module is used independently.
3.2.3. Ablation study on the loss function
To validate the effectiveness of the proposed grayscale regularization loss Equation (7), we compare the model performance with and without this constraint.
Table 5. Ablation study on grayscale regularization loss.
Specifically, introducing the grayscale constraint leads to consistent improvements in SSIM and a noticeable reduction in NIQE, while also yielding a moderate gain in PSNR. This indicates that the loss function not only stabilizes the decomposition process but also enhances the visual naturalness of the reconstructed images.
From a mechanistic perspective, the grayscale regularization enforces consistency among the RGB illumination channels, effectively constraining the illumination component to exhibit low inter-channel variance. This encourages the model to encode color information primarily within the reflectance component, which is more appropriate under the Retinex formulation. As a result, color distortion artifacts—commonly observed in single-channel or unconstrained illumination models—are significantly reduced.
Moreover, this constraint improves training stability by reducing the solution space of the decomposition problem. Without such regularization, the model may produce ambiguous decompositions where color information is inconsistently distributed between illumination and reflectance. By enforcing a physically meaningful prior, the grayscale loss leads to more reliable and interpretable decomposition results, ultimately improving downstream enhancement quality.
3.2.4. Overall component analysis
To further analyze the cumulative contribution of different components, we perform a combined ablation experiment by progressively integrating the major modules.
Table 6. Overall ablation study of the proposed model.

Visual comparison of the progressive integration of model components. From (c) to (g), the results demonstrate the incremental improvements in detail preservation, brightness enhancement, noise suppression, and color consistency.
Starting from the baseline, the introduction of the MSDFF module yields a substantial improvement across all metrics, confirming that enhanced feature extraction is fundamental to accurate Retinex decomposition. By improving the quality of the initial decomposition, subsequent enhancement stages are provided with more reliable inputs.
Adding the DLM module further improves SSIM and reduces NIQE, highlighting its effectiveness in refining structural details and suppressing noise during the fusion process. This indicates that even with a strong decomposition backbone, dedicated mechanisms for detail enhancement remain essential.
When the curve estimation module is incorporated, PSNR increases further due to improved brightness and contrast. However, a slight fluctuation in NIQE is observed when the grayscale constraint is not applied, suggesting that illumination enhancement alone may introduce minor perceptual inconsistencies.
Finally, with the inclusion of the grayscale regularization loss, the model achieves the best performance across all metrics. This confirms that enforcing illumination consistency is crucial for stabilizing the overall pipeline and improving color fidelity.
Overall, the progressive improvements observed in Table 6 demonstrate that each component contributes to a specific aspect of the enhancement task, including feature extraction, structural refinement, illumination adjustment, and color consistency. More importantly, these components operate in a highly complementary manner, forming a coherent and well-balanced framework rather than a simple aggregation of independent modules.
3.3. Comprehensive performance comparison with advanced algorithms
Table 7. Performance comparison on paired datasets (LOL-v1 and LOL-v2-real).
As shown in Table 7, on the two mainstream paired datasets, the proposed algorithm achieves the best performance in PSNR and NIQE and remains highly competitive in SSIM. On the more challenging LOL-v2-real dataset in particular, the proposed method improves PSNR by more than 4.767 dB and SSIM by more than 0.011 over the other advanced algorithms, significantly widening the performance gap and demonstrating the effectiveness of the multi-channel decomposition and enhancement strategy. Figure 8 compares the low-light image enhancement results of different algorithms on the LOL-v1 dataset: the proposed algorithm exhibits less color loss and richer details than the alternatives. Figure 9 presents a visual comparison on the LOL-v2-real dataset. RetinexNet, RRM, and EnlightenGAN tend to produce unnatural images; the results of KinD++ and Diff-Retinex suffer from insufficient brightness; R2RNet shows obvious color loss; and URetinexNet achieves good color and detail on LOL-v2-real but still exhibits noticeable noise.
Figure 8. Comparison of low-light image enhancement effects by different algorithms on the LOL-v1 dataset.
Figure 9. Comparison of low-light image enhancement effects by different algorithms on the LOL-v2-real dataset.

To evaluate the practicality of the proposed method, we compare its computational complexity with representative approaches in terms of FLOPs and parameter count, as shown in Table 7.
The proposed MC-LLE achieves superior enhancement performance while maintaining relatively low computational cost. Compared with high-complexity methods, it requires significantly fewer FLOPs and parameters, while delivering better reconstruction quality than lightweight models.
These results indicate that the performance gains are achieved through efficient architectural design rather than increased model size. Therefore, the proposed method achieves a favorable performance–efficiency trade-off, making it suitable for real-world applications.
Comparison of NIQE values of different algorithms on unpaired datasets.
The comparison results of the algorithms on the MEF and VV datasets are shown in Figures 10 and 11. The enhancement result of RetinexNet exhibits relatively serious color distortion and noise, while the results of KinD++, RRM, and EnlightenGAN contain obvious noise.
Figure 10. Comparison of algorithm result images on the MEF dataset.
Figure 11. Comparison of algorithm result images on the VV dataset.

3.4. Application validation in downstream visual tasks
To evaluate the practical value of the proposed method, we conduct object detection experiments on the full ExDark dataset, which contains 7,363 low-light images across 12 categories. To ensure statistical reliability, all experiments are repeated three times, and the mean ± standard deviation is reported.
Object detection results on the ExDark dataset.
Per-category analysis reveals that performance improvements are category-dependent. Objects with strong structural features (e.g., car, bus) benefit more, while texture-sensitive categories (e.g., bicycle) show smaller gains. This suggests that enhancement artifacts, such as over-smoothing or contrast distortion, may negatively affect certain categories.
Overall, the proposed method improves both visual quality and downstream detection performance, while maintaining stable generalization across detectors.
According to the test results, all enhancement algorithms effectively improve object detection performance under low-light conditions. Among them, the proposed multi-channel illumination enhancement algorithm achieves the highest mAP (0.193), a 67.8% improvement over detection on the original low-light images. These findings indicate that the enhanced images generated by the proposed algorithm not only possess higher visual quality but also provide more discriminative visual information for downstream computer vision tasks, underscoring the method’s practical value.
4. Discussion
This work presents a two-stage Retinex-based framework integrating three-channel illumination decomposition with multi-scale fusion and dual-layer attention. The method further employs a parallel Detail–Low-Frequency Module (DLM) and a learnable curve-based illumination adjustment in the fusion stage. Collectively, these design choices seek to improve brightness restoration, color fidelity, and detail preservation under challenging low-light conditions.
Mechanistically, modeling illumination per RGB channel allows the network to capture non-uniform and color-dependent lighting effects that are not well represented by a single-channel approximation. Similarly, the multi-scale dynamic feature fusion and dual-attention modules facilitate complementary aggregation of semantic and low-level cues: attention weighting emphasizes informative spatial–channel locations, while deformable convolutions enable adaptive receptive fields for local structure recovery. In the fusion stage, the parallel detail and low-frequency branches decouple texture enhancement from denoising, and the learnable curve mapping enforces smooth, artifact-free luminance adjustments at the pixel level.
Empirically, the combined architecture yields consistent gains in both fidelity and perceptual metrics. For instance, adopting the three-channel decomposition improved PSNR and SSIM on the LOL-v1 test set (+0.081 dB and +0.013, respectively) and reduced NIQE. Moreover, applying the enhanced images to downstream DETR and YOLOv5 detectors resulted in a substantive mAP increase on ExDark, indicating the practical benefit for high-level vision tasks.
While the proposed method demonstrates strong performance across various benchmarks, it is important to further analyze its limitations under more challenging conditions.
In extremely low-light scenarios, where the signal-to-noise ratio is severely degraded, the model may still exhibit slight noise amplification and limited structural recovery. This is mainly due to insufficient reliable information in the input, which makes accurate illumination–reflectance decomposition more difficult. As a result, noise may be amplified along with useful signals, and fine structural details may not be fully preserved.
In addition, although the proposed modules improve robustness in most cases, their effectiveness may decrease under highly uneven illumination or severely underexposed regions. The current model also introduces moderate computational overhead compared to lightweight methods. Future work will focus on improving robustness under extreme conditions while further optimizing model efficiency.
5. Conclusion
To address the challenges of insufficient brightness, detail degradation, color distortion, and noise in low-light image enhancement, this paper proposes a Retinex-based framework that integrates multi-channel illumination decomposition with a two-stage fusion strategy. The method models illumination independently across RGB channels and combines decomposition and fusion networks to achieve robust enhancement under complex lighting conditions.
The core contributions of this work lie in three aspects. First, a three-channel illumination modeling strategy is introduced to overcome the limitations of traditional single-channel assumptions, significantly improving color fidelity. Second, a multi-scale feature extraction mechanism incorporating deformable convolution and dual-layer attention is designed to enhance decomposition accuracy. Third, a fusion framework integrating a Detail–Low-Frequency Module (DLM) and a learnable curve estimation module is developed to jointly optimize structural details and illumination adjustment.
Extensive experiments on both paired and unpaired datasets demonstrate that the proposed method consistently outperforms existing approaches in terms of PSNR, SSIM, and NIQE. In addition, evaluations on the ExDark dataset show that the enhanced images substantially improve object detection performance, achieving a 67.8% increase in mAP compared with original low-light inputs, which highlights the practical value of the proposed method for downstream vision tasks.
Despite these promising results, the proposed method still has limitations in extremely low-light scenarios and introduces moderate computational overhead. Future work will focus on improving robustness under severe illumination degradation, reducing model complexity through lightweight design, and exploring joint optimization strategies that better align low-level enhancement with high-level vision tasks.
Footnotes
Ethical considerations
This study did not involve human participants, animal experiments, or any biological samples requiring ethical review.
Author contributions
Zhiwen Wang conducted the programming tasks and performed the primary data analysis. Yulong Qiao supervised the overall study and critically reviewed and refined the manuscript for submission. Yan Cang contributed to formal analysis and investigation, and drafted the initial manuscript. All authors contributed to the research and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
All datasets used in this study are publicly available and can be accessed as follows:
• LOL-v1 and LOL-v2-real datasets: These paired low-light/normal-light image datasets are available at https://www.github.com/albrateanu/LYT-Net/tree/main/PyTorch. These datasets were used for supervised training of the proposed model.
• LIME, NPE, MEF, and VV datasets: These unpaired datasets for generalization testing are available at https://www.github.com/weichen582/RetinexNet.
• ExDark dataset: This low-light object detection dataset is available at . It was used to evaluate the impact of image enhancement on downstream object detection tasks.
All datasets were accessed and used in accordance with their respective terms of use. No new datasets were generated during the current study.
