Sage Journals: Discover world-class research

Abstract

Background

Deep generative models can improve the generalization of deep learning in medical imaging by enriching limited training data with diverse, realistic synthetic images.

Purpose

To assess whether synthetic MRI generated by Denoising Diffusion Probabilistic Models (DDPMs), with and without mutual information (MI) regularization, enhances brain tumor classification across heterogeneous datasets.

Study Type

Retrospective.

Population

A total of 559 patients with low- and high-grade brain tumors (LGG, HGG) were included from two datasets: a public dataset (BraTS, n = 335) and a clinical dataset (TASMC, n = 224), the latter used exclusively to evaluate model generalization.

Field Strength/Sequence

1.5 T/3.0T-MR / T1WI, T1WI + C, T2WI, and FLAIR images.

Assessment

DDPM models were trained to generate synthetic MR images of low grade glioma (LGG) and high grade glioma (HGG), with a variant incorporating MI. Image quality was assessed using Pearson correlation, Fréchet Inception Distance (FID), and Inception Score (IS). For classification, a 2D ResNet-152 was trained under four setups: (1) real images (baseline), (2) +augmentation, (3) +DDPM, and (4) +DDPM + MI. Performance was assessed by accuracy and F1-score. Robustness was tested through cross-dataset evaluation using a 5-fold ensemble.

Results

The DDPM models, with and without MI, generated high-quality synthetic images, achieving FID = 31.47 and 45.00, and IS = 1.50 and 1.25, respectively. Lower FID and higher IS indicate enhanced realism and diversity, suggesting that MI improved both the quality and variability of the generated images. Cross-dataset evaluation demonstrated that DDPMs with MI achieved superior generalization performance in the brain tumor classification task, with accuracies of 0.89 and 0.85 for BraTS-to-TAMSC and TAMSC-to-BraTS evaluations, respectively. These results outperform the baseline model (0.87, 0.80), traditional data augmentation (0.85, 0.78), and the standard DDPM without MI (0.82, 0.83).

Data Conclusion

DDPM + MI with ensemble learning significantly improves the generalization of brain tumor classification across diverse datasets, consistently outperforming baseline, traditional augmentation, and standard DDPM. This combination offers a robust solution for cross-institutional clinical applications.

Keywords

model generalization; DDPM; mutual information; brain tumors; classification

Introduction

Deep learning (DL) algorithms have demonstrated significant efficacy in brain lesion segmentation and classification tasks, particularly when leveraging multiparametric MRI data (T1WI, T1WI + C, T2WI, and FLAIR).¹

Generalization is a critical aspect of DL models, especially in medical imaging, where the ability to make accurate predictions on unseen data is essential for clinical applications. Medical imaging datasets frequently encompass only a narrow spectrum of pathological diagnoses, with relatively homogeneous image acquisition protocols and patient demographics, potentially leading to model overfitting.² The efficacy of DL models in classifying unseen data is crucial, yet frequently compromised by the constrained nature of training datasets.³ This limitation often leads to overfitting, where models perform exceptionally well on familiar data but falter when presented with unseen inputs. The discrepancy between performance on training data and generalization to new, unseen data poses a significant challenge in deploying these models in real-world scenarios. This is particularly evident in critical domains such as medical imaging.⁴

To address these challenges, solutions such as data augmentation and synthetic data generation can be employed.^5,6 Synthetically generated images can serve as an effective form of data augmentation by enriching the training dataset with diverse and realistic variations. However, these strategies must be implemented carefully to ensure that they enhance model generalizability without introducing artifacts that could further skew classification results.

Several models have been proposed for synthetic MRI data generation. GAN models have been effectively employed to generate synthetic T1WI and FLAIR MRI, which are then utilized for brain lesion segmentation on the BraTS dataset.^7,8 Conte et al accomplished this task using an image-to-image translation GAN model.⁹ Moshe et al successfully used a GAN model to generate missing T1WI, T2WI, FLAIR, or both, for classification tasks.¹⁰ Guan and Loew demonstrated that incorporating GAN-generated images into the training dataset can mitigate classifier overfitting when distinguishing between normal and abnormal tissue. However, their results indicate that traditional data augmentation techniques yielded marginally superior performance compared to the GAN-based approach.¹¹

It has been shown that images, including medical images,^12–14 generated by Denoising Diffusion Probabilistic Models (DDPMs) are of higher quality than those generated by GANs.^15,16 DDPMs have emerged as an alternative to GANs in generative modeling, offering advantages in training stability, sample quality, and flexibility. They have been shown to generate higher-quality medical images, resulting in improved diversity and fewer artifacts within the outputs.^17,18 In medical imaging tasks, Yuan et al and Salehinejad et al used DDPMs to manage missing data and address class imbalances in both Alzheimer's and chest x-ray classification tasks.^19,20 Yi et al investigated the generalization capabilities of DDPMs, defining generalization as the correlation between the training set and the quality of generated images. They proposed that a lower correlation and a lower mutual information value indicate better generalization, helping to distinguish models that truly learn data distributions from those that memorize.²¹

The aim of this study was to evaluate the impact of DDPM-generated synthetic MRI data, with and without mutual information regularization, on brain tumor classification and generalization across heterogeneous datasets.

Material and Methods

The network was trained and tested on a Linux system using a single graphics processing unit (GPU; Nvidia RTX 2080 Ti) with CUDA 9.1. Analysis was performed with Python (version 3.7.7) and fastai (version 1.0.61).

Datasets

The Brain Tumor Segmentation Challenge (BraTS 2019^22–24) dataset, available for download at https://www.med.upenn.edu/cbica/brats2019/data.html, was used for model training and evaluation. The BraTS dataset included 335 patients scanned between 2012 and 2018 (an estimate based on^24,25): 76 cases with low grade glioma (LGG) and 259 cases with high grade glioma (HGG). Each scan contains T1WI, T1WI + C, T2WI, and FLAIR images and a mask of the enhancing tumor.

The BraTS dataset was acquired with different clinical protocols and various scanners from multiple (n = 19) institutions. For each subject, the central tumor slice (based on the provided segmentation of the enhancing lesion) was selected.

TAMSC dataset: A local dataset from the Tel Aviv Sourasky Medical Center (TAMSC) was used exclusively to assess the model's generalization capabilities. The TAMSC dataset included 224 patients scanned between 2008 and 2021, of whom 37 had LGG and 187 had HGG. The MRI scans were part of routine clinical assessments. A 3D U-Net,^26,27 a validated model specifically trained for brain tumor segmentation, was employed to automatically segment the enhancing lesion area. All resulting segmentations were reviewed and manually refined as needed.

The study was approved by the TAMSC institutional review board (IRB) which waived informed consent (IRB approval number 0200-10).

Denoising Diffusion Probabilistic Model (DDPM)

A DDPM works by incrementally corrupting the data $x$ with Gaussian noise $ϵ_{t} \sim N(μ_{t}, σ_{t})$ in a series of forward steps, and by learning to reconstruct the original data by reversing this corruption process. In this way, the model learns to generate data by denoising step by step until the data is fully restored.¹⁵ In broad terms, a forward process is defined on the data distribution $q(x_{0})$. This forward process is modeled as a discrete-time Markov chain $\{x_{t}\}_{t=0}^{T}$ with transitions defined by $q(x_{t} | x_{t-1})$. By the Markov property, the initial distribution $q(x_{0})$ evolves toward a known stationary distribution $p(x_{T}) = N(0, I)$, as described in equation (1):

$$q(x_{1}, \dots, x_{T} \mid x_{0}) = \prod_{t=1}^{T} q(x_{t} \mid x_{t-1}), \qquad q(x_{t} \mid x_{t-1}) = N\left(x_{t}; \sqrt{1 - β_{t}}\, x_{t-1}, β_{t} I\right) \tag{1}$$

Here the noise schedule $β_{t} \in (0, 1)$ is a set of hyperparameters, typically increasing linearly over time, that gives the amount of noise added to the signal at each step of the forward process. Defining $α_{t} = 1 - β_{t}$ and accumulating the product over time as $\bar{α}_{t} = \prod_{i=1}^{t} α_{i}$ gives the total signal preservation up to time $t$. Instead of applying noise incrementally across steps, a noisy version $x_{t}$ can then be sampled directly at any time $t$ using equation (2):

$$x_{t} = \sqrt{\bar{α}_{t}}\, x_{0} + \sqrt{1 - \bar{α}_{t}}\, ϵ_{t} \tag{2}$$

where $ϵ_{t} \sim N(0, I)$.
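The closed-form sampling of equation (2) can be illustrated with a minimal NumPy sketch. The schedule endpoints (`beta_start`, `beta_end`) and the function names are assumptions for illustration; the study states only that a linear schedule over 1000 steps was used.

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # beta_start / beta_end are assumed values, not taken from the study.
    return np.linspace(beta_start, beta_end, num_steps)

def q_sample(x0, t, alpha_bar, rng):
    """Draw x_t directly from x_0 using equation (2); t is a 0-based index."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return x_t, eps

betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)    # cumulative signal preservation

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 96, 96))  # stand-in for a 4-channel 96 x 96 patch
x_t, eps = q_sample(x0, t=500, alpha_bar=alpha_bar, rng=rng)
```

Because `alpha_bar` decays toward zero, samples at late timesteps are dominated by the noise term, which is consistent with the chain approaching $N(0, I)$.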

In the reverse process, the backward transition $p_{θ}(x_{t-1} | x_{t})$ is modeled as a learnable conditional distribution. Each step depends only on the current noisy input, approximating the reverse-time dynamics. This process is approximated by a neural network, as in equation (3):

$$p_{θ}(x_{t-1} \mid x_{t}) = N\left(x_{t-1}; μ_{θ}(x_{t}, t), \Sigma_{θ}(x_{t}, t)\right) \tag{3}$$

where the mean $μ_{θ}(x_{t}, t)$ and variance $\Sigma_{θ}(x_{t}, t) = σ_{t}^{2} I$ are parameterized through a neural network $ϵ_{θ}(x_{t}, t)$, and θ represents the learnable parameters. In practice, $ϵ_{θ}(x_{t}, t)$ is a U-Net that receives the noisy image $x_{t}$ and the timestep $t$, effectively capturing both local and global features, which makes it well suited for high-resolution image generation. The variance $σ_{t}^{2}$ is typically either fixed to $β_{t}$ or computed as $\tilde{β}_{t} = \frac{1 - \bar{α}_{t-1}}{1 - \bar{α}_{t}} β_{t}$. Sampling begins with $x_{T} \sim N(0, I)$ and proceeds iteratively by drawing $x_{t-1}$ from the learned distribution $p_{θ}(x_{t-1} | x_{t})$ until reaching $x_{0}$, yielding the final generated sample $\hat{x}_{0} \sim p(x_{0})$. This sampling process is described by equation (4):

$$x_{t-1} = \frac{1}{\sqrt{α_{t}}} \left( x_{t} - \frac{1 - α_{t}}{\sqrt{1 - \bar{α}_{t}}}\, ϵ_{θ}(x_{t}, t) \right) + σ_{t} z \tag{4}$$

where $z \sim N(0, I)$ if $t > 0$, else $z = 0$.
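The iterative sampling of equation (4) can be sketched as follows. This is an illustrative NumPy loop, not the study's implementation: `zero_eps_model` is a hypothetical placeholder standing in for the trained U-Net, the short 50-step schedule is assumed for a quick run, and the fixed-variance choice $σ_{t}^{2} = β_{t}$ is used.

```python
import numpy as np

def p_sample_step(x_t, t, eps_pred, betas, alpha_bar, rng):
    """One reverse step of equation (4); t is a 0-based timestep index."""
    alpha_t = 1.0 - betas[t]
    mean = (x_t - (1.0 - alpha_t) / np.sqrt(1.0 - alpha_bar[t]) * eps_pred) / np.sqrt(alpha_t)
    if t == 0:
        return mean                      # z = 0 at the final step
    sigma_t = np.sqrt(betas[t])          # fixed variance: sigma_t^2 = beta_t
    return mean + sigma_t * rng.standard_normal(x_t.shape)

def sample(eps_model, shape, betas, rng):
    alpha_bar = np.cumprod(1.0 - betas)
    x = rng.standard_normal(shape)       # start from x_T ~ N(0, I)
    for t in reversed(range(len(betas))):
        x = p_sample_step(x, t, eps_model(x, t), betas, alpha_bar, rng)
    return x

def zero_eps_model(x, t):
    # Hypothetical placeholder; a trained noise-prediction network goes here.
    return np.zeros_like(x)

betas = np.linspace(1e-4, 0.02, 50)      # short assumed schedule for illustration
x0_hat = sample(zero_eps_model, (4, 8, 8), betas, np.random.default_rng(0))
```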

During training, Smooth L1 loss (Huber loss) was used to train the denoising network $ϵ_{θ}$. It balances L1 and L2 behavior: it is quadratic, and therefore stable, near zero, and linear, and therefore less sensitive to outliers, for large errors. It compares the predicted noise $\hat{ϵ}_{θ}(x_{t}, t)$ to the true noise $ϵ$, as in equation (5):

$$L(\hat{ϵ}_{θ}, ϵ) = \begin{cases} 0.5\, (\hat{ϵ}_{θ} - ϵ)^{2} & \text{if } |\hat{ϵ}_{θ} - ϵ| < 1 \\ |\hat{ϵ}_{θ} - ϵ| - 0.5 & \text{otherwise} \end{cases} \tag{5}$$
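Equation (5) translates directly into a few lines of NumPy. The sketch below applies the piecewise rule element-wise; averaging over elements is an assumed reduction, since the reduction is not stated in the text.

```python
import numpy as np

def smooth_l1(eps_pred, eps_true):
    """Element-wise Smooth L1 loss of equation (5), averaged over all elements."""
    diff = np.abs(eps_pred - eps_true)
    per_element = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return per_element.mean()  # mean reduction is an assumption

small_error = smooth_l1(np.array([0.5]), np.array([0.0]))  # quadratic branch
large_error = smooth_l1(np.array([3.0]), np.array([0.0]))  # linear branch
```

The two branches meet at an absolute error of 1, where both give 0.5, so the loss is continuous.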

Denoising Diffusion Probabilistic Model with Mutual Information (DDPM + MI)

To improve diversity and reduce overfitting in synthetic image generation, we incorporated mutual information (MI) into the DDPM loss function. Although DDPMs generate high-quality images, they can produce outputs that are highly correlated with the training data.^28–30 Used as a regularization term, MI encourages the model to generate more diverse samples by reducing dependency on the input. MI is defined in equation (6):

$$I(X; Y) = \iint p(x, y) \log \left( \frac{p(x, y)}{p(x)\, p(y)} \right) dx\, dy \tag{6}$$

The total loss combined the standard denoising loss and the MI term as equation (7):

$$L_{\text{total}} = λ_{1} L + λ_{2} I(X; Y) \tag{7}$$

The MI term measures how much information the generated image shares with the input. Penalizing it helps the model produce realistic images while avoiding memorization, leading to better generalization in clinical classification tasks.
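To make equations (6) and (7) concrete, the sketch below estimates MI between two images from a joint intensity histogram and combines it with a denoising loss. The histogram estimator and the bin count are assumptions for illustration only; the text does not specify how MI was estimated, and a histogram is not differentiable, so training would require a differentiable estimator. The default weights 0.9 and 0.1 are those reported for training.

```python
import numpy as np

def mutual_information(x, y, bins=32):
    """Histogram estimate of I(X; Y) in nats between two equally sized images."""
    joint, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nonzero = p_xy > 0
    ratio = p_xy[nonzero] / (p_x @ p_y)[nonzero]
    return float(np.sum(p_xy[nonzero] * np.log(ratio)))

def total_loss(denoise_loss, mi, lambda_1=0.9, lambda_2=0.1):
    """Weighted sum of equation (7)."""
    return lambda_1 * denoise_loss + lambda_2 * mi

rng = np.random.default_rng(0)
image = rng.standard_normal((96, 96))
unrelated = rng.standard_normal((96, 96))
mi_self = mutual_information(image, image)       # high: identical images
mi_other = mutual_information(image, unrelated)  # near zero: independent images
```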

DDPM Model Training

The DDPM was trained with the AdamW optimizer and smooth L1 loss for 100 epochs at a learning rate of 0.0002. The model was implemented with and without MI regularization and trained separately for each tumor type, generating HGG and LGG images from the original images (Figure 1B). Training used a linear noise schedule over 1000 diffusion steps. The batch size was read at run time from the shape of the pixel data in the current batch. The model with the lowest validation loss was selected.

Figure 1.

Input data. A. Extraction of lesion patches, B. Experimental design for data classification involves four classification model setups: (1) real images, (2) real images with augmentation, (3) real images with images generated by Denoising Diffusion Probabilistic Models (DDPM), and (4) real images with images generated by DDPM with Mutual Information (MI) optimization.

In the DDPM + MI variant, MI was incorporated into the loss function as a regularization term. This model was trained under the same conditions as the standard DDPM, with the MI term weighted at 0.1 and the smooth L1 loss weighted at 0.9.

DDPM Model Evaluation

1. Fréchet Inception Distance (FID) and Inception Score (IS)

The DDPM results were evaluated by comparing the generated images to the input images using FID and IS.³¹ FID measures the similarity between the distributions of real and generated images in a feature space; a lower FID indicates greater similarity between the two distributions. IS reflects the quality and diversity of the generated images, with higher values being better.³² To ensure consistency with standard 3-channel datasets and methodologies, the 4-channel data were reduced to 3 channels using an encoder that preserves structural information.
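As an illustration of how FID compares two feature distributions, the sketch below computes the Fréchet distance between Gaussians fitted to two feature sets. In practice the features come from a pretrained Inception network; here random arrays stand in for them, and the function name is hypothetical. The feature extraction and the 4-to-3 channel encoder are not shown.

```python
import numpy as np

def frechet_distance(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((cov_r cov_g)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_r @ cov_g, which are real and non-negative in exact arithmetic.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)

rng = np.random.default_rng(0)
real_feats = rng.standard_normal((200, 5))    # stand-in for Inception features
same = frechet_distance(real_feats, real_feats)           # identical distributions
shifted = frechet_distance(real_feats, real_feats + 2.0)  # mean shifted by 2 per dimension
```

With identical feature sets the distance is zero up to rounding, and a pure mean shift contributes the squared length of the shift vector, which is why lower values indicate closer distributions.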

2. Pearson Correlation - Image Similarity to Training Data

To assess the similarity between synthetic images and the training data, we used the Pearson correlation coefficient: for each synthetic image, we computed its highest correlation with any training image, and then averaged these maxima over the synthetic images of each dataset. This approach shows how similar the synthetic images are to the training set, and whether they reflect excessive similarity (ie, memorization) or appropriate diversity. High correlation values may indicate memorization or data leakage, while lower correlations suggest greater diversity (a desirable trait in generative tasks).
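A minimal sketch of this similarity measure is given below: every synthetic image is correlated with every training image, the maximum per synthetic image is kept, and the maxima are summarized. Flattening all channels into one vector per image is an assumption, and constant images (zero variance) are not handled.

```python
import numpy as np

def mean_max_pearson(synthetic, training):
    """synthetic: (n_syn, ...), training: (n_train, ...) image arrays."""
    s = synthetic.reshape(len(synthetic), -1).astype(float)
    r = training.reshape(len(training), -1).astype(float)
    # Standardize each image so that a scaled dot product gives Pearson's r.
    s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)
    r = (r - r.mean(axis=1, keepdims=True)) / r.std(axis=1, keepdims=True)
    corr = s @ r.T / s.shape[1]            # (n_syn, n_train) correlation matrix
    max_per_synthetic = corr.max(axis=1)   # closest training image per sample
    return max_per_synthetic.mean(), max_per_synthetic.std()

rng = np.random.default_rng(0)
train_imgs = rng.standard_normal((10, 4, 8, 8))
memorized_mean, _ = mean_max_pearson(train_imgs[:3], train_imgs)
novel_mean, _ = mean_max_pearson(rng.standard_normal((3, 4, 8, 8)), train_imgs)
```

Exact copies of training images give a mean maximum correlation of 1, while unrelated images give a value near zero, which is the contrast the metric is meant to expose.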

Application for Tumor Classification

The segmented enhancing tumor was used as the target area for classification. Each image was first cropped around the center of the enhanced lesion area, and then resized to a 96 × 96 image size (Figure 1A). This size was chosen as a practical compromise—large enough to capture the tumor and surrounding tissue, yet efficient for DDPM training. This design is in line with previous studies using DDPMs on localized regions.³³

The BraTS data were split at the subject level into 80% training and 20% testing. The training dataset was further split into 80% training and 20% validation. For cross-dataset evaluation, models trained on the BraTS training dataset were tested on the TAMSC dataset (BraTS-to-TAMSC). Conversely, a model was trained on the full TAMSC dataset and tested on the BraTS dataset (TAMSC-to-BraTS).
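The nested subject-level split can be sketched as follows. The subject identifiers, the seed, and the function name are hypothetical, and the sketch does not stratify by tumor grade because the text does not say whether stratification was used.

```python
import random

def split_subjects(subject_ids, test_frac=0.2, val_frac=0.2, seed=0):
    """Split subject IDs into train/val/test so no subject appears in two sets."""
    ids = sorted(subject_ids)
    random.Random(seed).shuffle(ids)
    n_test = round(test_frac * len(ids))
    test, rest = ids[:n_test], ids[n_test:]
    n_val = round(val_frac * len(rest))   # validation is 20% of the remaining 80%
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# Hypothetical IDs for the 335 BraTS subjects.
train, val, test = split_subjects([f"subject_{i:03d}" for i in range(335)])
```

Splitting on subject IDs rather than on images keeps all data from one patient in a single partition.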

A 2D ResNet-152 convolutional neural network was employed for classification. Training was conducted with a batch size of 32, using focal loss with α = 0.25 and γ = 2. The initial learning rate was set to 0.01, and training spanned 200 epochs. Additionally, the network structure was modified to accommodate 4 channels of input data (accounting for the different MRI contrasts).
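For reference, a binary focal loss with the stated α = 0.25 and γ = 2 can be written as below. This follows the common formulation in which α weights the positive class and 1 − α the negative class; which tumor grade is treated as positive is an assumption not specified in the text.

```python
import numpy as np

def focal_loss(prob_pos, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss; prob_pos is the predicted probability of class 1."""
    prob_pos = np.clip(prob_pos, eps, 1.0 - eps)
    p_t = np.where(target == 1, prob_pos, 1.0 - prob_pos)   # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)     # class-dependent weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

easy = focal_loss(np.array([0.9]), np.array([1]))  # confident, correct prediction
hard = focal_loss(np.array([0.1]), np.array([1]))  # confident, wrong prediction
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of well-classified samples, so training focuses on hard examples, which is useful when one class dominates.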

The classification was performed under four setups (Figure 1B): (1) Baseline: real images only. (2) Baseline + augmentation: augmentation through transformations was applied to the real images dataset. All transformations were applied randomly, with some restricted to limited ranges. These included random horizontal and vertical flips, random rotations between −2° and 2°, and a lighting adjustment of 0.5. (3) Baseline + DDPM-generated images. (4) Baseline + DDPM + Mutual Information images (DDPM + MI).

Integrating Ensemble Modeling for Enhanced Robustness

To enhance prediction accuracy and robustness, we employed an ensemble model using 5-fold cross-validation within each dataset. This approach combined the five fold models by averaging their outputs. Overall, ensemble models provided a more robust and accurate classification by mitigating weaknesses and amplifying strengths.³⁴
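A sketch of output averaging across the five fold models is shown below. It assumes each model emits class probabilities (for example, softmax outputs) for the same samples; the array layout and function name are illustrative.

```python
import numpy as np

def ensemble_predict(fold_probs):
    """fold_probs: (n_folds, n_samples, n_classes) class probabilities."""
    mean_probs = np.mean(fold_probs, axis=0)   # average over the fold models
    return mean_probs.argmax(axis=1), mean_probs

# Probability of class 1 from five fold models for two samples.
p1 = np.array([[0.90, 0.10],
               [0.45, 0.20],
               [0.45, 0.30],
               [0.45, 0.20],
               [0.45, 0.10]])
fold_probs = np.stack([1.0 - p1, p1], axis=-1)  # shape (5, 2, 2)
labels, mean_probs = ensemble_predict(fold_probs)
```

In this example four of the five models individually favor class 0 for the first sample, but the averaged probability of class 1 is 0.54, so averaging outputs can differ from majority voting.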

Evaluation of Classification Results

Classification performance across the four experimental setups was evaluated using 5-fold cross-validation within each dataset and cross-dataset evaluation between the BraTS and TAMSC datasets, with accuracy and F1 score as performance metrics.
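The two metrics can be computed as in the following sketch. It uses the binary F1 definition with a single positive class; the choice of positive class and of binary rather than macro or weighted averaging are assumptions, since the text does not state them.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and binary F1 score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1

accuracy, f1 = accuracy_and_f1([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

For this toy input there are 2 true positives, 1 false positive, and 1 false negative, giving an accuracy of 0.6 and an F1 score of 4/6.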

Results

The mean tumor volume was 2.54 ± 1.41 mL (range: 0.3-10.5 mL) for HGG and 2.72 ± 3.01 mL (range: 0.34-14.47 mL) for LGG. Statistical analysis using the Mann–Whitney U test showed no significant difference in tumor volumes between HGG and LGG (P = .14), supporting the suitability of the selected patch size across tumor types.

DDPM Model Evaluation

The FID and IS values presented in Table 1 indicate that the generated images achieved a notable level of quality and diversity. The results align with other studies,³¹ confirming the high image quality and diversity of the generated images. The Pearson correlation between the generated and real images reached a mean value of 0.55 ± 0.11 for both datasets. When mutual information was incorporated into the DDPM model, the correlation decreased to a mean value of 0.47 ± 0.11, suggesting reduced memorization and increased diversity. Nonetheless, image quality remained comparable, as reflected by similar IS values and only a moderate impact on FID scores.

Table 1.

Evaluation of Generated Images Model on BraTS and TAMSC Datasets.

Tumor type	Model	BraTS IS	BraTS FID	TAMSC IS	TAMSC FID
LGG	DDPM	1.36 ± 0.07	36.25	1.16 ± 0.03	53.45
LGG	DDPM + MI	2.32 ± 0.02	43.65	1.14 ± 0.03	46.93
HGG	DDPM	1.41 ± 0.04	17.67	1.19 ± 0.01	19.49
HGG	DDPM + MI	1.28 ± 0.01	65.88	1.13 ± 0.01	22.85

HGG, high grade glioma; LGG, low grade glioma; FID, Fréchet Inception Distance; IS, Inception Score.

Tumor Classification

BraTS Dataset (Baseline)

The baseline model reached a mean 5-fold accuracy of 0.90 ± 0.03 and F1 score of 0.91 ± 0.03 on the validation dataset, and a mean accuracy of 0.86 ± 0.03 and F1 score of 0.86 ± 0.03 on the test dataset. Using an ensemble model that averages the outputs of the five fold models improved the overall test accuracy and F1 score to 0.97. Based on these results, an ensemble model was employed for all subsequent inference.

Model Generalization - Cross-Dataset Evaluation

Figure 2 shows classification performance heatmaps comparing the four scenarios (Baseline, +Augmentation, +DDPM, +DDPM + MI) across two transfer directions (BraTS→TAMSC and TAMSC→BraTS). The integration of mutual information into the DDPM model, combined with ensemble learning, resulted in improvements in generalization across datasets. DDPM with MI significantly outperformed all other methods, achieving the highest accuracy and F1 score: 0.89 and 0.90 for BraTS-to-TAMSC, and 0.85 and 0.86 for TAMSC-to-BraTS. These results highlight the effectiveness of combining MI with ensemble learning to enhance model robustness, reduce overfitting, and improve transferability to unseen datasets in cross-dataset evaluations.

Figure 2.

Classification performance heatmaps comparing four different scenarios.

Discussion

Our findings demonstrate that DDPM with mutual information regularization and ensemble learning substantially improved cross-dataset generalization in brain tumor classification, consistently outperforming baseline models, traditional augmentation, and standard DDPM-based approaches across heterogeneous clinical datasets.

Model generalization presents a critical challenge in deploying DL models for clinical applications. Models trained on one dataset often struggle to perform well on unseen data from different institutions.^20,21 In clinical data, this is particularly important, due to the inherent variability in patient populations, imaging devices, and acquisition protocols across varied hospitals or regions. Ensuring robust generalization is key for achieving reliable, real-world performance in clinical settings.³⁵

It has been suggested that integrating synthetic images into model training can significantly enhance DL model generalization. By increasing the diversity of the training set, models can adapt better to unseen conditions and external datasets.

In this study we used DDPMs for synthetic MRI image generation. Studies have demonstrated that DDPMs improve classification tasks by generating realistic synthetic MRI data and addressing class imbalances. However, DDPMs alone tend to produce samples with high correlation to the training data, which can limit their effectiveness in promoting generalization. To overcome this, incorporating mutual information (MI) into the loss function^21,28,36 aims to reduce correlation with the training set and enhance data diversity.³⁷ This offers a promising solution to the overfitting problem commonly encountered in small, imbalanced datasets.^13,17,18 In the current study, we evaluated the contribution of synthetic data generation with MI-augmented DDPMs to improving model generalization in brain tumor classification tasks. We achieved high image quality, as demonstrated by the FID and IS scores.³²

2D classification was selected due to computational constraints and limitations of current DDPM architectures, which are optimized for 2D slice generation given memory and stability challenges. This approach enabled efficient training with MI regularization, ensemble learning, and cross-dataset evaluation. The 2D strategy also allowed for the creation of larger training sets by extracting tumor-centered slices, thereby enhancing DDPM stability. Central slices were chosen to support lesion segmentation and to capture the most diagnostically relevant features.

To further evaluate the impact of synthetic data on model robustness, we conducted cross-dataset generalization tests (training the classification model on the BraTS dataset and testing it on an external cohort (TAMSC dataset), and vice versa). These evaluations reflect real-world clinical scenarios, where models are expected to generalize across institutions with differing imaging protocols and patient populations. Four inference settings were assessed: (1) baseline, (2) baseline with traditional data augmentation, (3) baseline with standard DDPM-generated images, and (4) baseline with DDPM-generated images incorporating MI.

While traditional augmentation is well established for improving classification performance, the use of DDPM-based synthetic data revealed more nuanced outcomes.^38,39 Notably, the DDPM model with MI consistently achieved superior performance across datasets. These findings underscore the limitations of traditional augmentation techniques and standard DDPMs in capturing sufficient variability, and highlight the importance of low correlation with the training data.

To further enhance stability and performance, we incorporated ensemble learning during training. Specifically, we employed a 5-fold cross-validation strategy, where the model was trained independently on five different subsets of training data. The final prediction was obtained by averaging the outputs of these five models. This ensemble approach helped reduce prediction variance and mitigated the risk of overfitting to specific data distributions. Ultimately this leads to more robust and reliable performance across diverse test sets.

These findings indicate that synthetic data plays a crucial role in improving model generalization across diverse test datasets. It creates entirely new, diverse samples, simulating scenarios absent from the original dataset. As a result, models can generalize more effectively to unseen conditions or rare edge cases.^40,41

Limitations

This study is limited to conventional MRI (T1WI, T1WI + C, T2WI, and FLAIR sequences); including other MRI modalities or advanced imaging techniques might reveal further benefits or limitations of the proposed approach. In addition, the use of DDPMs and large models like ResNet-152 demands substantial computational resources, potentially limiting their accessibility in resource-constrained settings. It is also acknowledged that 3D classification could offer richer volumetric context and is proposed as a direction for future work, particularly to enhance anatomical coherence and capture spatial dependencies across slices.

Conclusion

This study has demonstrated that the combination of DDPM-generated synthetic data, with mutual information as regularization and ensemble learning, substantially improves brain tumor classification performance across diverse datasets. This approach consistently surpassed conventional methods including baseline models, traditional augmentation, and standard DDPM implementations, offering an effective strategy for addressing cross-institutional variability in clinical imaging.

Footnotes

Acknowledgments

Thanks to Dr Aviad Rabinowich, Mr Tuvia Genut and Mrs. Shelly Gur for the technical support. The results published here are in whole or part based upon data generated by the TCGA Research Network: .

ORCID iD

Moran Artzi

Ethics Approval

Approved by the Tel Aviv Sourasky Medical Center IRB (No. 0200-10). The study was conducted in accordance with the Declaration of Helsinki.

Informed Consent

Patient consent was waived by the IRB due to the retrospective and anonymized nature of the data.

Author Contributions

Yael H. Moshe – Conceptualization, Methodology, Analysis, Writing.

Mina Teicher – Supervision, Theory, Review & Editing.

Moran Artzi – Conceptualization, Methodology, Data Curation, Clinical Insight, Supervision, Review & Editing.

All authors approved the final manuscript.

Funding

Israel Science Foundation 205501.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability

The MRI data used in this study were obtained from the Brain Tumor Segmentation Challenge (BraTS 2019), available for download at .

Use of Artificial Intelligence

No generative AI was used for data or content creation. ChatGPT was used only for minor language editing; language editing was performed by a professional English editor.

References

1. Cui, Mao, Jiang, Liu, Xiong. Automatic semantic segmentation of brain gliomas from MRI images using a deep cascaded neural network. J Healthc Eng. 2018;2018(1):4940593.

2. Hosny, Parmar, Quackenbush, Schwartz, Aerts. Artificial intelligence in radiology. Nat Rev Cancer. 2018;18(8):500-510.

3. Zech, Badgeley, Liu, Costa, Titano, Oermann. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 2018;15(11):e1002683.

4. Oakden-Rayner, Dunnmon, Carneiro, Ré. Hidden stratification causes clinically meaningful failures in machine learning for medical imaging. 2020:151-159.

5. Goceri. Medical image data augmentation: techniques, comparisons and interpretations. Artif Intell Rev. 2023;56(11):12561-12605.

6. Swati ZNK, Zhao, Kabir, et al. Brain tumor classification for MR images using transfer learning and fine-tuning. Comput Med Imaging Graph. 2019;75:34-46.

7. Grøvik, et al. Handling missing MRI sequences in deep learning segmentation of brain metastases: a multicenter study. NPJ Digit Med. 2021;4(1):1-7.

8. Dalmaz, Yurt, Çukur. Resvit: residual vision transformers for multimodal medical image synthesis. IEEE Trans Med Imaging. 2022;41(10):2598-2614.

9. Conte, Weston, Vogelsang, et al. Generative adversarial networks to synthesize missing T1 and FLAIR MRI sequences for use in a multisequence brain tumor segmentation model. Radiology. 2021;299(2):313-323.

10. Moshe, Buchsweiler, Teicher, Artzi. Handling missing MRI data in brain tumors classification tasks: usage of synthetic images vs. duplicate images and empty images. J Magn Reson Imaging. 2024;60(2):561-573.

11. Guan, Loew. Breast cancer detection using synthetic mammograms from generative adversarial networks in convolutional neural networks. J Med Imaging. 2019;6(3):031411-031411.

12. Khader, Müller-Franzes, Tayebi Arasteh, et al. Denoising diffusion probabilistic models for 3D medical image generation. Sci Rep. 2023;13(1):7303.

13. Chang C-W, Peng, Safari, et al. High-resolution MRI synthesis using a data-driven framework with denoising diffusion probabilistic modeling. Phys Med Biol. 2024;69(4):045001.

14. Dorjsembe, Pao H-K, Odonchimed, Xiao. Conditional diffusion models for semantic 3D brain MRI synthesis. IEEE J Biomed Health Inform. 2024;28(7):4084-4093.

15. Azizi, Kornblith, Saharia, Norouzi, Fleet. Synthetic data from diffusion models improves imagenet classification. arXiv preprint arXiv:2304.08466. 2023.

16. Dhariwal, Nichol. Diffusion models beat gans on image synthesis. Adv Neural Inf Process Syst. 2021;34:8780-8794.

17. Müller-Franzes, Niehues, Khader, et al. A multimodal comparison of latent denoising diffusion probabilistic models and generative adversarial networks for medical image synthesis. Sci Rep. 2023;13(1):12098.

18. Jain, Abbeel. Denoising diffusion probabilistic models. Adv Neural Inf Process Syst. 2020;33:6840-6851.

19. Yuan, Duan, Tustison, Hubbard, Linn. Remind: recovery of missing neuroimaging using diffusion models with application to Alzheimer’s disease. medRxiv. 2023.

20. Salehinejad, Valaee, Dowdell, Colak, Barfett. Generalization of deep neural networks for chest pathology classification in x-rays using generative adversarial networks. IEEE; 2018:990-994.

21. Sun. On the generalization of diffusion model. arXiv preprint arXiv:2305.14712. 2023.

22. Bakas, Akbari, Sotiras, Bilello, Rozycki, Kirby. Advancing the cancer genome atlas glioma MRI collections with expert segmentation labels and radiomic features. Nature Scientific Data 4 (1)(Sep 2017). arXiv preprint arXiv:1409.1556. 2014:267-373.

23. Bakas, Reyes, Jakab, et al. Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge. arXiv preprint arXiv:1811.02629. 2018.

24. Menze, Jakab, Bauer, et al. The multimodal brain tumor image segmentation benchmark (BRATS). IEEE Trans Med Imaging. 2014;34(10):1993-2024.

25. CBICA. Multimodal brain tumor segmentation challenge 2019. https://www.med.upenn.edu/cbica/brats-2019/

26. Isensee, Kickingereder, Wick, Bendszus, Maier-Hein. Brain tumor segmentation and radiomics survival prediction: contribution to the brats 2017 challenge. Springer; 2017:287-297.

27. Isensee, Jaeger, Kohl, Petersen, Maier-Hein. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods. 2021;18(2):203-211.

28. Zhai, Long, Pan, Chen. Mutual information compensation for high-fidelity image generation with limited data. IEEE Signal Process Lett. 2024;31:2145-2149.

29. Shaghayegh, Mostafa, Reza. MidGAN: mutual information in GAN-based dialogue models. Appl Soft Comput. 2023;148:110909.

30. Yang, Yan, Cheng, Zhang. Learning deep generative clustering via mutual information maximization. IEEE Trans Neural Netw Learn Syst. 2022;34(9):6263-6275.

31. Nguyen, Nhat. Class label conditioning diffusion model for robust brain tumor MRI synthesis. Authorea Preprints. 2023.

32. Salimans, Goodfellow, Zaremba, Cheung, Radford, Chen. Improved techniques for training gans. Adv Neural Inf Process Syst. 2016;29:2226-2234.

33. Wang, Jiang, Zheng, et al. Patch diffusion: faster and more data-efficient training of diffusion models. Adv Neural Inf Process Syst. 2023;36:72137-72154.

34. Anand, Gupta, et al. Weighted average ensemble deep learning model for stratification of brain tumor in MRI images. Diagnostics. 2023;13(7):1320.

35. Wong, Gatt, Stamatescu, McDonnell. Understanding data augmentation for classification: when to warp? IEEE; 2016:1-6.

36. Liu, Fan, Liu, et al. Generative diffusion models on graphs: methods and applications. arXiv preprint arXiv:2302.02591. 2023.

37. Hamidi, Yang E-H. Conditional mutual information based diffusion posterior sampling for solving inverse problems. arXiv preprint arXiv:2501.02880. 2025.

38. Shorten, Khoshgoftaar. A survey on image data augmentation for deep learning. J Big Data. 2019;6(1):1-48.

39. Mikołajczyk, Grochowski. Data augmentation for improving deep learning in image classification problem. IEEE; 2018:117-122.

40. Tremblay, Prakash, Acuna, et al. Training deep networks with synthetic data: bridging the reality gap by domain randomization. 2018:969-977.

41. Nikolenko. Synthetic Data for Deep Learning. Vol. 174. Springer; 2021.

Enhancing Brain Tumor Classification and Generalization Using DDPM-Generated MRI, Mutual Information and Ensemble Learning
