Abstract
Introduction
Magnetic resonance imaging (MRI) plays an increasingly important role in radiotherapy. It improves the accuracy of delineation of target volumes and organs at risk (OARs), but generating accurate synthetic CT (sCT) from MRI remains challenging, and the lack of electron density information limits its further clinical application. Therefore, the purpose of this study was to develop and evaluate a CBAMPix2Pix model for MRI-to-CT synthesis.
Methods
We developed the CBAMPix2Pix architecture by incorporating a convolutional block attention module (CBAM) into the Pix2Pix framework to synthesize CT images from 17,260 MRI slices of 86 patients with metastatic brain cancer. The model analyzes local features to enhance image authenticity and is designed to map T1-weighted contrast-enhanced (T1wc) images to sCT. To address the data imbalance between normal tissue and bone, we introduced a structural similarity (SSIM) loss to enhance the learning of local image features, thereby better reducing differences in Hounsfield units (HU).
Results
We evaluated the performance of the model through quantitative and qualitative assessments. Our proposed model achieved a peak signal-to-noise ratio (PSNR) of 27.5 ± 3.3 dB, a normalized mean absolute error (NMAE) of 0.019 ± 0.023, and a structural similarity index (SSIM) of 0.857 ± 0.059 for sCT images generated from MR simulation sequences; the mean absolute error (MAE) was 74.48 ± 22.88 HU in the body and 185.89 ± 21.59 HU in bone.
Conclusion
We have developed a novel CBAMPix2Pix model that effectively generates realistic sCT images comparable to real CT images, potentially improving the accuracy of MRI-based treatment planning.
Introduction
Magnetic resonance imaging (MRI) is extensively employed in radiotherapy due to its superior soft-tissue contrast compared to computed tomography (CT), which enables more accurate delineation of tumor targets and organs at risk.1–4 The development of a radiotherapy modality based solely on MRI simulation, namely MR-only radiotherapy, is emerging as a prevalent research focus.5 In head-and-neck cancer radiation treatment, studies have shown that using MRI can greatly reduce inter-observer variability in tumor delineation and improve treatment outcomes.6–11 MR-only workflows also eliminate the additional CT scan, thereby reducing patients' radiation exposure.12 However, to implement MR-only radiotherapy, synthetic CT (sCT) images must be derived from MR images to obtain the electron densities required for radiation dose calculation. Unlike CT, the MR signal depends primarily on hydrogen nuclei rather than the overall material composition, so MRI alone cannot be directly linked to the key physical properties needed for dose calculation, such as electron density in photon therapy. Moreover, MR images are often fused with the simulation CT through inter-modality registration, and physicians delineate contours on the fused MR view. This approach is prone to systematic registration errors, which introduce geometric uncertainties into the contours.13,14
The primary challenge in MR-only workflows lies in obtaining electron density data from MRI for radiation dose calculation. Unlike CT numbers, which can be converted directly into electron density, MRI pixel values reflect the magnetic relaxation properties of tissues and are not directly related to electron density. However, tissue relaxation properties can first be converted into CT numbers (Hounsfield units, HU) and then into electron density.15 To address this problem, numerous approaches for generating sCT images have been proposed, including statistical modeling,16 traditional machine learning,17,18 and multi-atlas-based methods.19–23 More recently, interest has grown in artificial intelligence-based methods, especially deep convolutional neural networks (CNNs). With deep learning based on CNNs, a model can automatically extract features at different scales and combine them into an end-to-end network for prediction, reducing the need for manual feature extraction.24–30 The performance of these methods can be affected by the accuracy of MR-CT registration, and even tiny voxel-wise misalignments may blur the synthesized images. To create a complex, nonlinear mapping between the MR and CT image domains, self-learning and self-optimizing strategies can be used. Once the optimal deep learning (DL) parameters are determined, sCT images can be obtained in a few seconds by feeding new MR images into the trained model. This approach allows efficient and accurate extraction of electron density information from MRI, thereby overcoming the limitations of single-modality imaging in clinical radiotherapy and enabling more precise radiation dose calculation. Recently, increasing interest has focused on generative adversarial networks (GANs)31–33 and their variants,34–36 which have shown promise in synthesizing high-quality sCT images with less blurriness than conventional CNN approaches.
In this study, we are dedicated to generating sCT images from MR images to optimize MR-based radiotherapy workflows. We propose a novel deep learning network called CBAMPix2Pix, which introduces a convolutional block attention module (CBAM) into the Pix2Pix framework to enhance the feature representation capabilities of the deep network and extract more effective features. Additionally, we employ a structural similarity (SSIM) loss function to ensure geometric consistency between MR and sCT images. This allows for a more comprehensive evaluation of image characteristics and provides guidance for optimizing radiotherapy plans and treatment outcomes.
Materials and Methods
Table 1 summarizes the characteristics of the MR and CT datasets used in this study. A total of 86 brain cancer patients treated at the authors' institutions between January 2023 and March 2025 were enrolled: 61 for model training, 10 for validation, and 15 for testing. The data split was performed at the patient level to ensure independence and prevent information leakage; all MRI and CT slices from a given patient were assigned exclusively to one of the training, validation, or test sets. All MR and CT scans were acquired using identical patient positioning and thermoplastic masks on a dedicated radiotherapy simulation scanner. CT images were rigidly aligned to MRI and resampled to 1 mm³ isotropic resolution, and MRI was deformably registered to CT using ANTsPy (v0.0.7) with the SyN algorithm. The field of view was cropped to 256 × 256 × (160–240) voxels. CT Hounsfield units (HU) were clipped to [−1000, 3071], and both MR and deformed CT (dCT) underwent histogram matching and intensity scaling to [−1, 1]. MRI preprocessing additionally included (1) N4 bias field correction37 to address intensity non-uniformity and (2) histogram equalization.
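A minimal sketch of this preprocessing pipeline using ANTsPy is shown below. The file names, the linear intensity scaling of the MR volume, and the exact ordering of steps are our assumptions; only the SyN registration, N4 correction, HU window, and [−1, 1] scaling are taken from the text.

```python
# Hypothetical preprocessing sketch using ANTsPy; paths are illustrative.
import ants
import numpy as np

ct = ants.image_read("patient001_ct.nii.gz")
mr = ants.image_read("patient001_t1wc.nii.gz")

# Resample CT to 1 mm isotropic resolution (linear interpolation).
ct = ants.resample_image(ct, (1.0, 1.0, 1.0), use_voxels=False, interp_type=0)

# N4 bias field correction of the MR image.
mr = ants.n4_bias_field_correction(mr)

# Deformably register MRI to CT with the SyN algorithm.
reg = ants.registration(fixed=ct, moving=mr, type_of_transform="SyN")
mr_warped = reg["warpedmovout"]

# Clip CT Hounsfield units to [-1000, 3071] and scale both volumes to [-1, 1].
hu = np.clip(ct.numpy(), -1000, 3071)
ct_scaled = 2.0 * (hu + 1000.0) / 4071.0 - 1.0

arr = mr_warped.numpy()
mr_scaled = 2.0 * (arr - arr.min()) / (np.ptp(arr) + 1e-8) - 1.0
```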
Table 1. Data Characteristics of Magnetic Resonance (MR) and Computed Tomography (CT) Images.
All models (CycleGAN, Pix2Pix, CBAMPix2Pix) were implemented in PyTorch (v1.12.0) and trained on an RTX 2080 GPU (12 GB). A batch size of 1 was used with the Adam optimizer (β₁ = 0.5, β₂ = 0.999) and an initial learning rate of 1e−4 for both the generator and the discriminator. Training ran for up to 40 epochs with early stopping based on validation loss, and the best model was selected by validation performance.
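The optimizer and early-stopping setup could be sketched as follows. Only the Adam hyperparameters, learning rate, and 40-epoch cap are reported in the paper; the patience value and the `train_one_epoch`, `evaluate`, `generator`, `discriminator`, and loader names are placeholders.

```python
# Hypothetical training-loop sketch matching the reported hyperparameters.
import torch

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.999))

best_val, patience, bad_epochs = float("inf"), 5, 0  # patience is assumed
for epoch in range(40):
    train_one_epoch(generator, discriminator, opt_g, opt_d, train_loader)
    val_loss = evaluate(generator, val_loader)
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(generator.state_dict(), "best_generator.pt")  # keep best model
    else:
        bad_epochs += 1
        if bad_epochs >= patience:  # early stopping on validation loss
            break
```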
Models
GANs have shown significant potential in image-to-image translation tasks, particularly in domain adaptation within medical image analysis. They have driven advances in segmentation, cross-modal synthesis, anatomical registration, and radiation treatment planning, including dose calculation. However, their clinical application is hindered by challenges such as model instability arising from adversarial training, limited generalization across diverse imaging modalities, and reliance on high-quality paired datasets. Cycle-consistency models such as CycleGAN38 can process misaligned images but may produce multiple solutions, making them less suitable for high-precision medical image translation. Pix2Pix,39 a conditional GAN (cGAN) model, comprises a generator with residual blocks for capturing context and multi-scale features and a discriminator for assessing image authenticity. While effective in many tasks, it struggles with complex scenes and fine details.
To enhance the classic Pix2Pix model, we introduced CBAM to create the CBAMPix2Pix architecture, boosting the model's feature capture and representation abilities. The overall architecture is illustrated in Figure 1. CBAM, a lightweight attention module, combines channel and spatial attention mechanisms: it first employs channel attention to learn the importance of each channel and then feeds the output into the spatial attention module to learn spatial position weights. This process adaptively recalibrates the input feature maps, enabling the model to focus on key channels and spatial positions, better capture important features, and improve performance in image-to-image translation tasks. By incorporating CBAM into the Pix2Pix framework, CBAMPix2Pix addresses the limitations of Pix2Pix in handling complex scenes and fine-grained details and achieves better results in generating synthetic CT images from MRI data.

Figure 1. (a) Overall architecture of CBAMPix2Pix; (b), (c), and (d) show the operation flowcharts of the CBAM module and its channel and spatial attention mechanisms, respectively.
Figure 2 presents the architecture of the modified generator used in this study. In the CBAMPix2Pix generator, a CBAM is incorporated before the residual blocks to enhance feature extraction. The feature map first undergoes convolution to extract preliminary features and then enters the CBAM module. Within CBAM, the channel attention module employs global average and max pooling alongside a shared multilayer perceptron (MLP) to generate a channel attention map, which weights the feature map's channels, amplifying important ones and suppressing less relevant ones. The weighted feature map then enters the spatial attention module, which calculates the importance of each spatial position to produce a spatial attention map, further emphasizing key spatial regions. After CBAM processing, the feature map proceeds to the up-sampling stage. This design enables the generator to more accurately capture image details and semantics: it enhances feature capture, focuses on key image regions, reduces noise and irrelevant features, and boosts the model's expressiveness, allowing it to better preserve image details and textures and produce more realistic and accurate images.
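The CBAM computation described above can be sketched in PyTorch as follows. The reduction ratio of 16 and the 7 × 7 spatial kernel are standard CBAM defaults, not values reported in this paper.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both global average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max pooling
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w                         # reweight channels

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        # Stack channel-wise average and max maps, then learn a spatial mask.
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in the generator."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        return self.sa(self.ca(x))
```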

Figure 2. Illustration of the CBAMPix2Pix generator. The model begins with a convolutional layer for input image processing, followed by CBAM.
Loss Functions
The proposed model employs a multi-loss framework (L1 loss, adversarial loss, and structural similarity loss) to optimize the CBAMPix2Pix architecture. The primary loss components are as follows.

The L1 loss penalizes voxel-wise differences between the generated image $G(x)$ and the reference image $y$, as shown in Equation (1):

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\left[\lVert y - G(x) \rVert_1\right] \tag{1}$$

Equation (2) is the adversarial loss function for Pix2Pix, where $G$ is the generator, $D$ is the discriminator, $x$ is the input MR image, and $y$ is the corresponding dCT image:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x}\left[\log\left(1 - D(x, G(x))\right)\right] \tag{2}$$

In order to effectively measure the similarity of the regional contours of MRI and CT images, a structural similarity (SSIM) loss was adopted, as shown in Equation (3):

$$SSIM(x, y) = \frac{(2\mu_x \mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)} \tag{3}$$

In Equation (3), $\mu_x$ and $\mu_y$ are the local means, $\sigma_x^2$ and $\sigma_y^2$ the local variances, and $\sigma_{xy}$ the covariance of the two images; $c_1$ and $c_2$ are small constants that stabilize the division. The SSIM loss is defined as $\mathcal{L}_{SSIM} = 1 - SSIM$.

The total loss is given in Equation (4):

$$\mathcal{L}_{total} = \mathcal{L}_{cGAN}(G, D) + \lambda_1 \mathcal{L}_{L1}(G) + \lambda_2 \mathcal{L}_{SSIM}(G) \tag{4}$$

The hyperparameters $\lambda_1$ and $\lambda_2$ weight the contributions of the L1 and SSIM terms against the adversarial loss.
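A compact PyTorch sketch of this combined objective is given below. The uniform-window SSIM approximation and the weights λ₁ = 100 and λ₂ = 10 are our assumptions, not values reported in the paper.

```python
import torch
import torch.nn.functional as F

def ssim_loss(x, y, win=11, c1=0.01 ** 2, c2=0.03 ** 2):
    # Local statistics via a uniform window (an approximation of the usual
    # Gaussian-window SSIM); inputs are expected in a fixed range such as [-1, 1].
    pad = win // 2
    mu_x = F.avg_pool2d(x, win, 1, pad)
    mu_y = F.avg_pool2d(y, win, 1, pad)
    var_x = F.avg_pool2d(x * x, win, 1, pad) - mu_x ** 2
    var_y = F.avg_pool2d(y * y, win, 1, pad) - mu_y ** 2
    cov = F.avg_pool2d(x * y, win, 1, pad) - mu_x * mu_y
    ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
    return 1.0 - ssim.mean()  # L_SSIM = 1 - SSIM, Equation (3)

def generator_loss(d_fake_logits, fake, real, lam_l1=100.0, lam_ssim=10.0):
    # Adversarial term: the generator tries to make D label fakes as real.
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    # Equation (4): adversarial + lambda1 * L1 + lambda2 * SSIM losses.
    return adv + lam_l1 * F.l1_loss(fake, real) + lam_ssim * ssim_loss(fake, real)
```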
Model Evaluation
Model performance was quantitatively evaluated using four metrics: peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), mean absolute error (MAE), and normalized mean absolute error (NMAE). NMAE, PSNR, and SSIM were computed on a per-2D-slice basis across the entire test set and then averaged across all slices; MAE was computed on 3D volumes (Table 3).
PSNR is an important indicator for evaluating image or signal quality. It is mainly used to measure the similarity between the generated image and the real image, and is especially widely used in the field of image compression and reconstruction. Higher PSNR values indicate greater similarity to the original image, with the formula defined as:
$$PSNR = 10 \log_{10}\!\left(\frac{MAX_I^{\,2}}{MSE}\right) \tag{5}$$

In Equation (5), $MAX_I$ denotes the maximum possible pixel intensity of the image, and $MSE$ is the mean squared error between the sCT and dCT images.
SSIM measures the similarity between two images in terms of three components: luminance, contrast, and structure. Higher SSIM values indicate better perceived quality. The SSIM formula is shown in Equation (3).
MAE is a widely used metric for assessing image quality, quantifying the discrepancy between a generated image and its corresponding real image. It is calculated by averaging the absolute differences between the true and predicted pixel values, as shown in Equation (6):

$$MAE = \frac{1}{N}\sum_{i=1}^{N} \left| dCT_i - sCT_i \right| \tag{6}$$

A lower MAE indicates a higher degree of similarity between the generated and real images, reflecting superior performance of the image generation model. The NMAE is computed after mapping model outputs and ground truth from the [−1, 1] preprocessing range to [0, 1]; it normalizes the MAE by the intensity range, as shown in Equation (7):

$$NMAE = \frac{MAE}{I_{max} - I_{min}} \tag{7}$$

where $N$ is the number of pixels, $dCT_i$ and $sCT_i$ are the intensities of the $i$-th pixel, and $I_{max}$ and $I_{min}$ bound the normalized intensity range. Thus NMAE ∈ [0, 1], where 0 represents perfect agreement and 1 indicates maximal deviation.
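For illustration, the per-slice metrics could be computed with scikit-image as in the following sketch, assuming slices have already been mapped to [0, 1] (so NMAE coincides with MAE on that scale):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def slice_metrics(sct, dct):
    # Inputs: 2D slices already mapped from [-1, 1] to [0, 1].
    psnr = peak_signal_noise_ratio(dct, sct, data_range=1.0)  # Equation (5)
    ssim = structural_similarity(dct, sct, data_range=1.0)    # Equation (3)
    nmae = np.mean(np.abs(dct - sct))  # Equation (7); range is 1, so NMAE = MAE
    return psnr, ssim, nmae
```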
Statistical Analysis
All statistical analyses were performed using Python (version 3.10.13; https://www.python.org/) and SPSS Statistics 27. Differences between the sCT and dCT images, and between models, were assessed using the Wilcoxon signed-rank test, as the comparisons were performed on paired observations.
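As a sketch, the paired comparison could be run in Python with SciPy; the per-patient SSIM values below are purely illustrative, not data from this study.

```python
from scipy.stats import wilcoxon

# Paired per-patient SSIM values for two models (illustrative numbers only).
ssim_pix2pix = [0.84, 0.81, 0.86, 0.88, 0.83]
ssim_cbam = [0.86, 0.83, 0.87, 0.90, 0.85]

# Wilcoxon signed-rank test on the paired observations.
stat, p_value = wilcoxon(ssim_cbam, ssim_pix2pix)
print(f"Wilcoxon statistic = {stat}, p = {p_value:.4f}")
```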
Results
Quantitative and Qualitative Evaluation
The three models required 36 (CycleGAN), 16 (Pix2Pix), and 17 (CBAMPix2Pix) hours of training, respectively. Once trained, each model generated sCT images from a new MR volume in a few seconds. Figure 3 shows a visual comparison at different anatomical locations, containing axial views of the original MR, the corresponding dCT, and the sCT images generated by CycleGAN, Pix2Pix, and the proposed CBAMPix2Pix. On visual inspection, the results of our method are more consistent with the dCT than those of the other two methods in both global and local imaging regions. As can be seen in Figure 3, the sCT generated by CBAMPix2Pix shows sharper boundaries than those generated by the original CycleGAN and Pix2Pix.

Figure 3. Comparison of deformed CT (dCT) and synthetic CT (sCT) images generated by different models. The first and second columns show the MR images and dCT; the third, fourth, and fifth columns display the sCT images generated by CycleGAN, Pix2Pix, and CBAMPix2Pix, respectively. Red boxes highlight anatomical regions where CBAMPix2Pix shows the most significant improvement.
Table 2 summarizes the quantitative NMAE, SSIM, and PSNR results for the sCT images of all test patients relative to the reference CT images. CBAMPix2Pix-generated sCT images show smaller errors (NMAE) and higher similarity (PSNR and SSIM) to dCT than those from CycleGAN and Pix2Pix. For CycleGAN, Pix2Pix, and CBAMPix2Pix, the mean ± SD of NMAE for T1wc images was 0.023 ± 0.022, 0.020 ± 0.023, and 0.019 ± 0.023; of PSNR, 25.4 ± 2.4 dB, 27.1 ± 3.0 dB, and 27.5 ± 3.3 dB; and of SSIM, 0.818 ± 0.061, 0.843 ± 0.067, and 0.857 ± 0.059, respectively. CBAMPix2Pix thus outperforms the traditional CycleGAN in synthesis quality. Beyond the improvements in NMAE (0.019), PSNR (27.5 dB), and SSIM (0.857) for T1wc images, our model demonstrated small but consistent gains across all structures.
Table 2. Errors (NMAE) and Similarities (PSNR, SSIM) Between sCT and dCT in the Test Group.
Abbreviations: NMAE = normalized mean absolute error; PSNR (dB) = peak signal-to-noise ratio; SSIM = structural similarity index; sCT = synthetic CT; dCT = deformed CT. The calculated metrics are presented as mean ± SD.
The sCT images generated by CBAMPix2Pix from T1wc MR images show smaller differences in Hounsfield unit (HU) values from dCT. As shown in Table 3, the MAE is 74.48 ± 22.88 HU for the body and 185.89 ± 21.59 HU for bone, significantly lower than those of CycleGAN and Pix2Pix. These results highlight the enhanced accuracy of CBAMPix2Pix in synthesizing CT-like images from MRI data.
Table 3. HU Discrepancies Between the Deformed CT (dCT) Image and the Synthetic CT Image Generated from the Models Trained with All Samples on 3D Volumes.
Abbreviations: HU = Hounsfield unit; MAE = mean absolute error; SD = standard deviation. P value: P3 = CBAMPix2Pix vs CycleGAN, P4 = CBAMPix2Pix vs Pix2Pix.
Discussion
Generating CT images from MRI data is inherently a style transfer challenge. Recent advances in image-to-image translation within generative models have spurred research into improving sCT quality via GANs and their variants. Nevertheless, the exclusive use of the sigmoid cross-entropy loss function in the original GAN tends to result in unstable training. Pix2Pix, functioning as a cGAN, has demonstrated efficacy in image-to-image translation tasks by learning a mapping from input to output images when trained on paired data. In contrast to CycleGAN, which operates on unpaired data, Pix2Pix requires paired datasets; yet, owing to the direct supervision afforded by paired MR and CT images, it can produce more precise and detailed sCT images. Notwithstanding this advantage, when Pix2Pix is employed to generate sCT images from MR data, the L1 loss function is typically used to regulate the similarity between the generated CT images and the dCT images. Although the L1 loss imposes constraints at the pixel level and computes loss values pixel by pixel, it cannot precisely gauge structural similarity in terms of global structural cues such as contour or shape consistency. This limitation can give rise to deficiencies in the sCT images, particularly in the depiction of bone structures. For example, Peng et al40 reported that, in patients with nasopharyngeal carcinoma (NPC), the MAE of sCT images generated by cGAN was lower than that of CycleGAN: the MAE values were 69.67 ± 9.27 HU (cGAN) and 100.62 ± 7.90 HU (CycleGAN) in soft tissues, and 203.71 ± 28.22 HU (cGAN) and 288.17 ± 17.22 HU (CycleGAN) in bone. Critically, despite these residual errors (particularly in bone), the same study reported 2%/2-mm γ passing rates of (98.68 ± 0.94)% and (98.52 ± 1.13)% for the cGAN and CycleGAN, respectively, while the absolute dose discrepancies within the regions of interest were (0.49 ± 0.24)% and (0.62 ± 0.36)%, respectively. These γ passing rates substantially exceed the commonly accepted clinical threshold of 95%. Furthermore, Kang et al41 found that, with CycleGAN, the mean SSIM and PSNR values were only 0.90 ± 0.03 and 26.3 ± 0.7 dB in patients with pelvic, thoracic, and abdominal tumors.
In this study, we concentrated on producing sCT images from the MR images of metastatic brain cancer patients, using paired 3 T MR data. We introduced a new deep learning model named CBAMPix2Pix, designed to enhance the MR-based radiotherapy workflow. This model extends the Pix2Pix framework with CBAM to strengthen feature representation, and incorporates a structural similarity loss function to guarantee geometric consistency between MR and sCT images. Our experiments showed that CBAMPix2Pix works effectively on brain cancer patients' MR images. The sCT images closely resemble real CT images in image quality and HU accuracy; the HU values of the sCT images are nearly identical to those of real CT images, fulfilling radiotherapy dose calculation requirements and reducing dose errors caused by inaccurate HU values. The generated sCT images bear a striking similarity to dCT images, especially in high-density bony tissue, making them well suited for direct use in quantitative applications such as dose calculation and adaptive treatment planning, as shown in Figure 3. Unlike CycleGAN, which uses unpaired data and may fail to preserve local features, our method uses paired data for more accurate and detailed sCT generation. By integrating CBAM and a structural similarity loss function, CBAMPix2Pix overcomes the limitations of traditional Pix2Pix methods: it better captures global structural cues and ensures geometric consistency, producing higher-quality and more realistic sCT images, which is vital for clinical radiotherapy planning and treatment. Recently, many modified GAN-based methods have emerged to boost the accuracy and efficiency of sCT generation from MR. For example, Yang et al42 designed an additional structure-consistency loss based on the modality-independent neighborhood descriptor for unsupervised MR-to-CT synthesis; their method outperforms the original CycleGAN in synthetic CT accuracy and visual quality. A novel compensation-CycleGAN has also been proposed, modifying the cycle-consistency loss of the traditional CycleGAN to simultaneously generate sCT images and compensate for missing anatomy in truncated MR images. Focusing on cervical cancer patients, Cusumano et al43 applied a conditional GAN (cGAN) to 20 test patients, evaluating image metrics with MAE and mean error (ME) and achieving an MAE of 78.71 HU and an ME of 10.83 HU in the abdominal region. Fu et al44 used cGAN and CycleGAN networks, validated via leave-one-out cross-validation on 12 abdominal tumor patients, assessing image quality with MAE and PSNR; the cGAN achieved an MAE of 89.80 HU and a PSNR of 27.4 dB, while the CycleGAN achieved an MAE of 94.10 HU and a PSNR of 27.2 dB.
The geometric and structural accuracy of sCT images is crucial for radiotherapy planning, as it ensures precise radiation dose delivery; even minor anatomical inaccuracies can lead to severe consequences in treatment outcomes. To address this, we proposed the CBAMPix2Pix method, which effectively preserves structural characteristics and focuses on the human body region or the most critical parts of the sCT. We evaluated our method on 15 independent test patients (see Table 2). The results indicated that while CycleGAN and Pix2Pix can produce realistic CT images, they struggle with regions of low MR signal, such as bone interfaces, leading to uncertainties. In contrast, CBAMPix2Pix outperformed the other two methods in terms of NMAE, PSNR, and SSIM, likely owing to its integration of the advantages of Pix2Pix with channel and spatial attention mechanisms (see Figure 2). A common challenge in MR-to-CT synthesis is less accurate HU values compared with dCT images; however, all three methods implemented in this study corrected the CT values of MRI well, with CBAMPix2Pix achieving the best results, followed by Pix2Pix and then the original CycleGAN. Comparing our results with previous studies, Zhao et al6 reported average MAE, PSNR, and SSIM values of 91.30 HU, 27.4 dB, and 0.94, respectively, for their Comp-CycleGAN models trained with body contour information. Our CBAMPix2Pix model showed better performance in quantitative metrics, achieving a PSNR of 27.5 dB and an MAE of 74.48 HU in the body. Accurate sCT generation aids oncologists in contouring precise targets, leading to more accurate radiotherapy plans for cancer patients and potentially improving survival rates.
CBAMPix2Pix, which integrates an SSIM loss function and channel-spatial attention mechanisms, has certain limitations. First, all data were acquired at a single institution using a fixed protocol (United Imaging 3 T MRI and Philips Big Bore CT), limiting generalizability across vendors, field strengths, and acquisition settings; multi-center, multi-vendor validation is essential for clinical translation and will be pursued in future collaborations. Second, while our method reduces HU prediction errors, a full dosimetric evaluation was not performed. Task-based validation, such as dose recalculation in a commercial treatment planning system (TPS) using clinical plans, is critical to confirm clinical acceptability. Future work will quantify dosimetric accuracy and assess suitability for applications including dose calculation, online adaptive radiotherapy, and multi-center trials. Notably, as a fully MR-derived approach, our framework is inherently compatible with MR-guided radiotherapy (MRgRT) platforms, where it could enable CT-free daily replanning by eliminating manual bulk density assignment. Third, the current model uses only a single MR sequence; future studies will explore multimodal MR inputs to further improve sCT quality and robustness. Beyond these technical directions, our framework also aligns with emerging clinical paradigms that integrate artificial intelligence (AI), imaging, and real-world clinical data, as highlighted by Bilski et al.45 Finally, while our sCT model was developed for photon therapy, its use in proton beam therapy, where accurate HU-to-stopping-power-ratio (SPR) conversion is critical, especially for central nervous system (CNS) tumors, is not currently supported; systematic discrepancies between sCT and true CT at bone and air-tissue interfaces remain a barrier to reliable SPR estimation.
Conclusion
This study proposes a CBAMPix2Pix network to generate sCT from MR images and optimize radiotherapy procedures. The network integrates the CBAM mechanism, adopts an SSIM loss, and uses T1wc images to improve performance and generalization. Qualitative and quantitative experiments showed that our approach outperformed both CycleGAN and Pix2Pix in terms of structural preservation and HU accuracy. Our analysis indicates that sCT images can provide a new perspective for radiotherapy plan optimization and efficacy prediction. In the future, the model will be optimized, the sample size will be expanded to verify stability, and its applicability to different tumor types will be explored.
Acknowledgements
Not applicable.
Ethics Considerations and Consent to Participate
The studies involving humans were approved by Scientific Research Ethical Review of Ganzhou Cancer Hospital (Approval number 2025-71). The studies were conducted in accordance with the local legislation and institutional requirements. Written informed consent for participation was not required from the participants or the participants’ legal guardians/next of kin in accordance with the national legislation and institutional requirements.
Author Contributions
C.X., C.C., Q.H., F.X., L.L., W.L., and Y.X. contributed to the study conception and design; C.C., Q.H., F.X., L.L., and W.L. prepared materials and collected data; C.X. and Y.X. analyzed the experiments. The first draft of the manuscript was written by C.X., and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated and analyzed during the current study are not publicly available due to patient privacy and institutional policies but may be available from the corresponding author upon reasonable request and with permission from Ganzhou Cancer Hospital.
