Abstract
Introduction
This study aims to develop an individualized risk prediction model for radiation pneumonitis (RP) based on unsupervised image feature learning. A deep convolutional generative adversarial network (DCGAN) was utilized to automatically extract features from computed tomography (CT) images.
Methods
A retrospective analysis was conducted on 180 lung cancer patients treated with volumetric modulated arc therapy (VMAT) at Kaohsiung Veterans General Hospital between 2015 and 2022. To mitigate clinical sample size limitations, rotation-based augmentation was employed to expand the training dataset. The pretreatment CT images were processed into three input configurations: whole-lung, V5Gy dose regions, and V20Gy dose regions. An unsupervised feature extraction model, designated RP-GAN, was constructed to capture latent representations associated with RP risk. High-dimensional features were selected using the least absolute shrinkage and selection operator (LASSO) and integrated into a stacking ensemble learning framework comprising random forest (RF), support vector machine (SVM), k-nearest neighbors (KNN), and extreme gradient boosting (XGBoost) base learners with a logistic regression (LR) meta-learner. Model stability and generalization were evaluated through 10-fold cross-validation and an independent test set, and clinical interpretability was assessed using gradient-weighted class activation mapping (Grad-CAM) and local interpretable model-agnostic explanations (LIME).
Results
The whole-lung input model demonstrated superior performance, achieving an area under the receiver operating characteristic curve (AUC) of 0.856, an accuracy of 0.861, and a recall of 0.778. In contrast, the model restricted to V20Gy dose regions showed a marked decline in sensitivity, with recall decreasing to 0.273. Explainable artificial intelligence (XAI) visualization showed that the model attended not only to the tumor bed but also to the peritumoral parenchyma and contralateral lung.
Conclusion
The proposed RP-GAN architecture captures subtle textural changes across the whole-lung microenvironment without requiring manual annotations. This framework provides a promising tool for individualized RP risk assessment and may facilitate the optimization of radiation therapy plans.
Keywords
1. Introduction
Radiation therapy (RT) is a cornerstone modality in the clinical management of lung cancer and is capable of achieving substantial tumor control and improving survival outcomes. However, radiation pneumonitis (RP) caused by inadvertent irradiation of normal lung parenchyma remains a serious complication, with clinical manifestations ranging from dry cough and dyspnea to irreversible pulmonary fibrosis, thereby posing a significant threat to patient prognosis and quality of life. Owing to the combined influence of dose distribution, age, and host-related physiological factors, the development of RP exhibits marked inter-individual variability. Although commonly used dose–volume parameters (such as V5Gy and V20Gy) have been shown to correlate with the occurrence of RP, single or limited indices are insufficient to capture its inherent complexity, thus constraining the precision of individualized risk prediction.1-3
Most existing RP risk prediction studies rely on supervised learning and heavily depend on manually contoured clinical structures delineated by medical physicists. However, the pathogenesis of RP involves widespread and complex lung responses that may not be fully encompassed by pre-defined annotated regions. In addition, the clinical implementation of deep learning faces two major challenges: first, the substantial time and subjective variability associated with high-quality manual annotation; second, as highlighted by Guo et al (2024), the heterogeneity of data across different imaging scanners often leads to performance degradation and undermines the robustness of model deployment in real-world practice.4,5
Motivated by the need for a more individualized and clinically scalable approach to RP risk assessment, we sought to move beyond conventional dose–volume metrics and annotation-dependent supervised models. Although parameters such as V5Gy and V20Gy remain clinically useful, they provide only limited summaries of radiation exposure and may not adequately capture the complex pulmonary background that modulates individual RP susceptibility. At the same time, most existing deep learning approaches depend on manually defined structures, which are labor-intensive to generate and may miss latent risk-related signals distributed outside pre-specified regions.
To address these gaps, this study proposes an unsupervised feature learning strategy based on a deep convolutional generative adversarial network (DCGAN) and develops a model termed RP-GAN. The motivation behind this framework is to determine whether whole-lung imaging contains clinically meaningful latent information that can improve RP prediction without requiring manual annotations. By automatically learning texture, structural, and spatial distribution patterns from CT images, RP-GAN is designed to capture lung-wide features potentially associated with subclinical inflammatory responses and inter-individual susceptibility.
In recent years, explainable artificial intelligence (XAI) has become essential for improving transparency and trust in cancer detection systems. Prior studies emphasize that medical AI should move beyond accuracy-driven models toward clinically traceable decision support, particularly in high-stakes oncological settings.6,7 To improve decision transparency and clinical trustworthiness, XAI techniques, including gradient-weighted class activation mapping (Grad-CAM) and local interpretable model-agnostic explanations (LIME), were incorporated to visualize model attention regions and quantify their contributions to predictions. Furthermore, because no single classifier may be sufficient to model the heterogeneity of RP risk, a stacking ensemble learning framework was adopted to integrate the complementary strengths of multiple base learners, such as random forest (RF) and support vector machine (SVM), with the aim of improving generalizability and stability in RP risk prediction.8-10
The central hypothesis of this study is that RP risk arises from the interaction between the global pulmonary microenvironment and the three-dimensional dose distribution rather than being driven solely by specific high-dose subregions. The scientific problem addressed in this study is whether RP risk can be adequately characterized by conventional dose-restricted regions alone, or whether clinically meaningful risk signals are distributed across the whole lung and embedded in the global pulmonary microenvironment. Because RP is biologically heterogeneous and spatially diffuse, models based only on predefined local regions may fail to capture latent susceptibility patterns relevant to individualized prediction. Accordingly, three image input configurations—whole-lung, V5Gy, and V20Gy regions—are designed to evaluate the necessity of incorporating full-lung information for enhancing predictive performance.11-13 In summary, by integrating the RP-GAN, stacking ensemble learning, and XAI techniques, this work establishes a clinically oriented risk prediction framework intended to assist clinicians in precise individualized RP risk assessment, thereby facilitating treatment plan optimization and reducing the incidence of RP.
2. Materials and Methods
2.1. Study Framework
The proposed model was developed by optimizing a DCGAN architecture and is hereafter referred to as RP-GAN. The overall workflow, illustrated in Figure 1, comprises four major stages: data preprocessing, RP-GAN model training, feature engineering, and the construction and evaluation of the RP risk prediction model. First, image format conversion, window setting, centering, and normalization were performed. The images were then divided into whole-lung regions and dose-restricted regions (V5Gy and V20Gy) as separate inputs to investigate the impact of different regions on feature extraction and model performance. RP-GAN was subsequently used to perform unsupervised feature learning on unlabeled images, and the resulting features were processed with the least absolute shrinkage and selection operator (LASSO) for feature selection and dimensionality reduction.14,15 The selected features were fed into multiple base classifiers, and a stacking ensemble was used to integrate their outputs and build the RP risk prediction model. Finally, gradient-weighted class activation mapping (Grad-CAM; Selvaraju et al, USA) and local interpretable model-agnostic explanations (LIME; Ribeiro et al, USA) were applied to visualize model attention regions, and multiple evaluation metrics were used to assess model performance. This study was reported in accordance with the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines.16
Workflow of the proposed RP risk prediction and explainability framework. The overall framework is divided into four core stages: (1) 
2.2. Data Collection
Baseline Clinical Characteristics and Radiotherapy Parameters of the Study Cohort
This table summarizes the baseline characteristics of 180 patients with lung cancer, including demographic variables (age, sex, and BMI), tumor stage, and chemotherapy. By comparing the distributions between the RP and non-RP groups, this table provides the clinical context for subsequent individualized risk assessment. Abbreviation: RP: Radiation Pneumonitis, BMI: Body Mass Index.
Distribution of the Study Cohort and Imaging Data Across Datasets
This table details the distribution of the 180 patients enrolled at Kaohsiung Veterans General Hospital (KSVGH), reported at both the patient level and the image-slice level. Patients with and without radiation pneumonitis (RP) were allocated to the training and test sets at an 8:2 ratio. The resulting slice-level imaging dataset provides the training material for unsupervised feature learning in RP-GAN. Abbreviation: RP: Radiation Pneumonitis.
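Because each patient contributes many CT slices, the 8:2 allocation should be made on patients rather than on slices so that slices from one patient never appear in both sets. The sketch below assumes a split stratified by the patient-level RP label; the function name `patient_level_split`, the fixed seed, and the rounding rule are illustrative assumptions, not the authors' code.

```python
import numpy as np

def patient_level_split(patient_ids, labels, test_fraction=0.2, seed=0):
    """Stratified train/test split on unique patients (hypothetical helper).

    patient_ids: 1-D array of unique patient identifiers
    labels:      1-D array of patient-level RP labels (1 = RP, 0 = no RP)
    Returns (train_ids, test_ids); every slice follows its patient's assignment.
    """
    rng = np.random.default_rng(seed)
    patient_ids = np.asarray(patient_ids)
    labels = np.asarray(labels)
    train, test = [], []
    for cls in np.unique(labels):
        # Shuffle the patients of one class, then hold out ~test_fraction of them
        ids = rng.permutation(patient_ids[labels == cls])
        n_test = int(round(test_fraction * len(ids)))
        test.extend(ids[:n_test])
        train.extend(ids[n_test:])
    return np.array(train), np.array(test)
```

Slice-level membership then follows from `np.isin(slice_patient_ids, test_ids)`, where `slice_patient_ids` is a hypothetical per-slice identifier array.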
2.3. Assessment of Radiation Pneumonitis
RP is a common complication in patients with lung cancer undergoing radiotherapy and is triggered by radiation-induced damage to normal lung tissue. In the early phase, radiation injury damages alveolar cells and elicits acute inflammatory responses, which may progress over time to irreversible fibrosis and other structural changes. The development of RP is driven by multiple factors, including dose distribution, treatment volume, and the patient’s pre-existing pulmonary conditions. Although modern techniques such as VMAT have substantially reduced the incidence of severe toxicity, comprehensively capturing early and mild pulmonary reactions remains clinically important. In this study, RP was defined as RP of grade 1 or higher according to the Radiation Therapy Oncology Group (RTOG) grading criteria, so that subtle and early manifestations were included. The typical imaging features of RP, highlighted by the red circle in Figure 2A, are contrasted with the normal lung appearance shown in Figure 2B. Representative CT appearances of radiation pneumonitis (RP). (A) RP image: The red circle highlights the irradiated lung region showing ground-glass opacities and interstitial thickening, consistent with acute inflammatory changes. (B) Normal lung image: The lung parenchyma demonstrates preserved vascular markings and normal aerated spaces. In this study, cases with RP of grade 1 or higher according to the RTOG criteria were classified into the RP group, enabling the model to capture early, patient-specific subclinical abnormalities that may already be visually appreciable on CT.
RTOG Radiation Pneumonitis Grading Scale
This table summarizes the clinical symptoms and imaging features associated with RP from grade 0 to grade 5 according to the Radiation Therapy Oncology Group (RTOG) criteria (e.g., ground-glass opacities and radiation-induced pulmonary fibrosis). In this study, cases with RP of grade 1 or higher were classified into the RP group in order to capture subtle, early subclinical changes and thereby improve the clinical sensitivity of individualized risk prediction. Abbreviation: RTOG: Radiation Therapy Oncology Group.
2.4. Image Preprocessing
Pretreatment thoracic CT images were standardized through a multi-stage preprocessing pipeline to ensure computational consistency and data traceability. Original DICOM data were batch-processed and converted into PNG format using a systematic indexing convention to facilitate precise cross-referencing between model predictions and anatomical locations. Lung regions of interest were extracted using radiotherapy structure files to restrict analysis to the pulmonary parenchyma. To match the output range of the RP-GAN’s Tanh activation layers, all images were centered and pixel intensities were normalized to the range [−1, 1].
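A minimal NumPy sketch of the intensity steps is shown below. The lung window (center −600 HU, width 1500 HU) is an assumed, commonly used setting because the window values are not reported; the function name `preprocess_slice` is hypothetical, and DICOM reading, PNG export, and spatial centering are omitted.

```python
import numpy as np

def preprocess_slice(hu, lung_mask, center=-600.0, width=1500.0):
    """Window a CT slice (Hounsfield units), restrict it to the lung, scale to [-1, 1].

    center/width form an ASSUMED lung window; the study does not report its values.
    """
    lo, hi = center - width / 2.0, center + width / 2.0
    clipped = np.clip(hu.astype(np.float32), lo, hi)
    scaled = 2.0 * (clipped - lo) / (hi - lo) - 1.0    # lo -> -1, hi -> +1 (Tanh range)
    # Pixels outside the lung mask are set to the minimum value (-1)
    return np.where(lung_mask, scaled, -1.0).astype(np.float32)
```

Mapping the window linearly onto [−1, 1] keeps real images in the same range as the generator's Tanh output, which is the reason the text gives for this normalization.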
Three input configurations were constructed: whole-lung images, V5Gy dose regions, and V20Gy dose regions (lung regions receiving at least 5 Gy and 20 Gy, respectively). The latter two were derived from the clinical lung dose distributions to evaluate dose-relevant features. Systematic reviews by Keffer et al reported significant associations between RP risk and the proportion of lung volume exposed to both low- and high-dose ranges; V20Gy is a widely used clinical dose constraint in radiotherapy planning, and V5Gy has also demonstrated a significant correlation with RP risk. Guided by this clinical evidence, the V5Gy and V20Gy regions were selected as dose-based high-risk controls to evaluate the performance of the proposed unlabeled feature learning framework in the absence of explicit clinical structure information.17 All other preprocessing steps were identical to those used for the models trained with the whole-lung input.
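The dose-restricted inputs and the corresponding clinical VxGy metric can be sketched as below, assuming that the planning dose has already been resampled onto the CT grid in Gy. The helper names `dose_region` and `v_x` are hypothetical, and filling excluded pixels with −1 is an assumption chosen to match the [−1, 1] intensity range.

```python
import numpy as np

def dose_region(ct_norm, lung_mask, dose_gy, threshold_gy):
    """Keep lung pixels receiving >= threshold_gy; fill all other pixels with -1."""
    region = lung_mask & (dose_gy >= threshold_gy)
    return np.where(region, ct_norm, -1.0), region

def v_x(lung_mask, dose_gy, threshold_gy):
    """Clinical VxGy: percentage of lung volume receiving >= threshold_gy (2-D or 3-D arrays)."""
    covered = np.count_nonzero(lung_mask & (dose_gy >= threshold_gy))
    return 100.0 * covered / np.count_nonzero(lung_mask)
```

For example, `dose_region(ct, lung, dose, 5)` yields the V5Gy input and `dose_region(ct, lung, dose, 20)` the V20Gy input, while `v_x` returns the scalar dose–volume summary that these image inputs are meant to go beyond.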
2.5. Development of the RP-GAN and Data Augmentation
The workflow of the RP-GAN is illustrated in Figure 3. To mitigate clinical sample size limitations and alleviate class imbalance, data augmentation was applied to the training set. Specifically, rotation-based transformations within a range of ±10° were applied exclusively to the minority class (RP-positive samples). During adversarial training, the generator and discriminator were optimized alternately toward adversarial equilibrium. The model was implemented in Python 3.10 (Python Software Foundation, Wilmington, DE, USA) using TensorFlow 2.0 (Google LLC, Mountain View, CA, USA) and trained with a batch size of 32 for 100 epochs using the Adam optimizer. The trained discriminator was subsequently used as a feature extractor for RP risk prediction. Architecture of the RP-GAN model and schematic of unsupervised feature learning. The proposed framework is optimized from a DCGAN architecture and consists of a generator–discriminator adversarial training pipeline. The convolutional feature maps derived from the discriminator are used as latent representations, enabling the model to automatically learn lung-wide texture patterns and spatial structural features associated with radiation-induced injury from whole-lung images without relying on manual annotations.
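The minority-only rotation step can be sketched in NumPy as below. The uniform sampling of angles in [−10°, 10°], nearest-neighbour interpolation, the number of rotated copies per image (`n_copies`), and the −1 background fill are assumptions, as are the helper names; the augmentation is assumed to run only after the train/test split so that no test patient contributes augmented images.

```python
import numpy as np

def rotate_nearest(img, angle_deg, fill=-1.0):
    """Rotate a 2-D image about its center (nearest-neighbour, inverse mapping)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    yy, xx = np.mgrid[0:h, 0:w]
    # Map each output pixel back to its source location in the input image
    ys = cy + (yy - cy) * np.cos(theta) - (xx - cx) * np.sin(theta)
    xs = cx + (yy - cy) * np.sin(theta) + (xx - cx) * np.cos(theta)
    yi, xi = np.rint(ys).astype(int), np.rint(xs).astype(int)
    valid = (yi >= 0) & (yi < h) & (xi >= 0) & (xi < w)
    out = np.full_like(img, fill)
    out[valid] = img[yi[valid], xi[valid]]
    return out

def augment_minority(images, labels, n_copies=1, max_angle=10.0, seed=0):
    """Append randomly rotated copies of RP-positive (label 1) training images only."""
    rng = np.random.default_rng(seed)
    new_imgs, new_labels = [], []
    for img, y in zip(images, labels):
        if y == 1:
            for _ in range(n_copies):
                angle = rng.uniform(-max_angle, max_angle)
                new_imgs.append(rotate_nearest(img, angle))
                new_labels.append(1)
    if not new_imgs:
        return images, labels
    return np.concatenate([images, np.stack(new_imgs)]), np.concatenate([labels, new_labels])
```

Restricting rotation to the positive class raises the share of RP-positive samples in the training set while leaving the negative class and the test set untouched.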
2.5.1. Generator and Discriminator Architecture
The generator adopted a multi-layer transposed convolutional architecture in which a 100-dimensional noise vector was progressively upsampled to produce 512 × 512 single-channel grayscale images. Starting from a 4 × 4 feature map, the spatial resolution was increased stepwise through intermediate layers (including 128 × 128) up to the final 512 × 512 output, while the number of channels was reduced stepwise to refine image details. Each layer used a 4 × 4 kernel for feature learning and spatial expansion, enabling the generated images to exhibit coherent structure and fine-grained texture suitable for downstream model training. The detailed layer configuration, including feature map sizes and channel numbers, is shown in Figure 4A. RP-GAN deep learning architecture and multi-scale feature extraction scheme. (A) Generator evolution: A 100-dimensional random noise vector is progressively transformed through multiple transposed convolution (deconvolution) layers, reconstructing image details stepwise and expanding the feature maps to a final spatial resolution of 512 × 512. This pathway illustrates how latent noise is mapped into realistic lung-like images. (B) Discriminator convolutional pathway: Seven convolutional layers interleaved with pooling operations compress the input image into 28,672 latent features (multi-feature layer), which serve as the raw representations for subsequent unsupervised learning. This architecture enables high-dimensional characterization of lung texture and structure for RP risk modeling. Abbreviation: deconv: deconvolution, conv: convolution
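The resolution schedule follows from the output-size formula of a transposed convolution, out = (in − 1)·s − 2p + k. Only the 4 × 4 kernel is stated in the text; the stride of 2 and padding of 1 for the upsampling layers, and the stride-1, no-padding projection from a 1 × 1 noise map to 4 × 4, are the usual DCGAN choices and are assumptions here. Under those values each layer doubles the feature map, so seven upsampling layers are needed to go from 4 × 4 to 512 × 512.

```python
def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a 2-D transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_sizes(start=1, target=512):
    # Layer 0 (assumed): project the 100-d noise vector, viewed as a 1x1 map, to 4x4
    sizes = [conv_transpose_out(start, kernel=4, stride=1, padding=0)]
    # Later layers (assumed k=4, s=2, p=1): each one doubles the resolution
    while sizes[-1] < target:
        sizes.append(conv_transpose_out(sizes[-1]))
    return sizes

print(generator_sizes())  # [4, 8, 16, 32, 64, 128, 256, 512]
```

In Keras, `Conv2DTranspose` with `strides=2` and `padding="same"` gives the same doubling, so the arithmetic carries over to a TensorFlow implementation.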
The discriminator consists of seven convolutional layers and two max-pooling layers, which progressively extract deep features from the input images and perform real–fake classification. The input was a 512 × 512 single-channel grayscale image. Rather than using automated convergence criteria or early stopping, the model was trained for a fixed number of epochs with the binary cross-entropy (BCE) loss. To stabilize feature learning, a feature matching loss was integrated into the generator’s objective, minimizing the
2.6. Feature Engineering
2.6.1. Aggregation of Image Slices
The model inputs were constructed from features extracted from multiple CT slices per patient. To integrate slice-level information and build patient-level representations, mean pooling was applied to aggregate features across all slices from the same patient, yielding a single feature vector per case. This aggregation strategy reduces slice-to-slice variability and improves the consistency of patient-level predictions.
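Slice-to-patient mean pooling is a grouped average. This NumPy sketch assumes that every slice carries its patient identifier; the function name `aggregate_by_patient` is hypothetical.

```python
import numpy as np

def aggregate_by_patient(slice_features, slice_patient_ids):
    """Mean-pool slice-level feature vectors into one vector per patient.

    slice_features:    (n_slices, n_features) array
    slice_patient_ids: (n_slices,) array of patient identifiers
    Returns (patient_ids, patient_features), with patients in sorted order.
    """
    ids, inverse = np.unique(slice_patient_ids, return_inverse=True)
    sums = np.zeros((len(ids), slice_features.shape[1]))
    np.add.at(sums, inverse, slice_features)           # accumulate each slice into its patient row
    counts = np.bincount(inverse, minlength=len(ids))  # number of slices per patient
    return ids, sums / counts[:, None]
```

Because every patient ends up with exactly one row, patients with many slices do not outweigh patients with few slices in the downstream classifiers.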
2.6.2. LASSO
To reduce feature dimensionality and model complexity, LASSO regression with L1 regularization was employed; the L1 penalty shrinks the coefficients of noninformative features to exactly zero, so that only features with meaningful predictive contributions are retained. All features were standardized prior to training, and the regularization parameter λ was selected via cross-validation to mitigate overfitting and enhance model generalizability. The optimal λ was 0.0295, which retained 66 of the original 32,512 extracted features for subsequent model development.
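A minimal NumPy sketch of the LASSO step by cyclic coordinate descent is given below. It minimizes (1/2n)·‖y − Xβ‖² + λ‖β‖₁ on standardized features, the same scaling convention as scikit-learn's `Lasso`, whose `alpha` plays the role of λ. Treating the binary RP label as a continuous response, the fixed iteration count, and the function names are assumptions; the cross-validated search for λ is not reproduced here (in scikit-learn, `sklearn.linear_model.LassoCV` performs that search).

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values with |z| <= gamma become exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO by cyclic coordinate descent on standardized X and centered y.

    Minimizes (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1.
    Assumes no constant columns (their standard deviation would be zero).
    """
    n, p = X.shape
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    b = np.zeros(p)
    resid = yc.copy()
    col_sq = (Xs ** 2).sum(axis=0) / n        # equals 1 for standardized columns
    for _ in range(n_iter):
        for j in range(p):
            resid += Xs[:, j] * b[j]          # remove feature j's current contribution
            rho = Xs[:, j] @ resid / n        # correlation of feature j with partial residual
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= Xs[:, j] * b[j]          # add the updated contribution back
    return b
```

With the reported penalty, the retained features would be `np.flatnonzero(lasso_cd(X, y, lam=0.0295))`, where `X` and `y` stand for the patient-level feature matrix and RP labels.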
2.7. Ensemble Learning: Stacking Classifier
The dataset was split into training and test sets at an 8:2 ratio. The test set was kept completely independent of model training and was used only for the final performance evaluation. Ten-fold cross-validation was applied to the training set: in each iteration, nine folds were used for training and the remaining fold for validation, and the process was repeated ten times so that every training sample received one out-of-fold prediction, yielding the meta-feature matrix. The base learners (RF, SVM, KNN, and XGBoost) were trained sequentially with class weighting to address potential label imbalance, and their out-of-fold predictions were collected as inputs for the meta-learner, which was implemented via logistic regression (LR). The hyperparameters of the base learners were optimized via a randomized search combined with cross-validation, with the area under the ROC curve (AUC) serving as the primary evaluation metric. The overall stacking procedure is depicted in Figure 5. Stacking ensemble architecture and meta-learner fusion workflow. In the meta-feature generation stage, the training set is processed using 10-fold cross-validation to train four base learners, namely random forest (RF), support vector machine (SVM), k-nearest neighbors (KNN), and extreme gradient boosting (XGB), and to construct a meta-feature matrix that captures their complementary prediction patterns. In the decision fusion stage, logistic regression (LR) is used as the meta-learner to perform a linear weighted combination of the predicted probabilities from all base models, aiming to reduce the overfitting risk associated with any single classifier and to yield a stable final prediction of RP risk. Abbreviation: RF: Random Forest, SVM: Support Vector Machine, KNN: K Nearest Neighbor, XGB: Extreme Gradient Boosting, LR: Logistic Regression
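The out-of-fold construction of the meta-feature matrix can be illustrated with NumPy alone. To keep the sketch self-contained, two simple stand-in base learners (a k-nearest-neighbour vote and a nearest-centroid score) replace RF, SVM, KNN, and XGBoost; the folds are not stratified; class weighting and the randomized hyperparameter search are omitted; and the logistic-regression meta-learner is fitted by plain gradient descent. All function names are hypothetical.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -50.0, 50.0)))

def knn_learner(k=5):
    """Stand-in base learner: fraction of RP-positive cases among the k nearest neighbours."""
    def fit_predict(X_tr, y_tr, X_te):
        d = ((X_te[:, None, :] - X_tr[None, :, :]) ** 2).sum(axis=2)
        nn = np.argsort(d, axis=1)[:, :k]
        return y_tr[nn].mean(axis=1)
    return fit_predict

def centroid_learner():
    """Stand-in base learner: squashed difference of squared distances to class centroids."""
    def fit_predict(X_tr, y_tr, X_te):
        mu0 = X_tr[y_tr == 0].mean(axis=0)
        mu1 = X_tr[y_tr == 1].mean(axis=0)
        score = ((X_te - mu0) ** 2).sum(axis=1) - ((X_te - mu1) ** 2).sum(axis=1)
        return _sigmoid(score)  # closer to the RP centroid -> higher value
    return fit_predict

def out_of_fold_meta_features(X, y, learners, n_folds=10, seed=0):
    """(n_samples, n_learners) matrix; row i comes from models that never saw sample i."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    meta = np.zeros((len(y), len(learners)))
    for val_idx in folds:
        tr_idx = np.setdiff1d(np.arange(len(y)), val_idx)
        for j, learner in enumerate(learners):
            meta[val_idx, j] = learner(X[tr_idx], y[tr_idx], X[val_idx])
    return meta

def fit_logistic(Z, y, lr=0.5, n_iter=2000):
    """Logistic-regression meta-learner (intercept + one weight per base learner)."""
    Z1 = np.hstack([np.ones((len(Z), 1)), Z])
    w = np.zeros(Z1.shape[1])
    for _ in range(n_iter):
        w -= lr * Z1.T @ (_sigmoid(Z1 @ w) - y) / len(y)  # gradient of the mean log-loss
    return w

def predict_stacked(X_tr, y_tr, X_te, learners, w):
    """Fit base learners on the full training set, then combine them with the meta-learner."""
    Z_te = np.column_stack([f(X_tr, y_tr, X_te) for f in learners])
    return _sigmoid(np.hstack([np.ones((len(Z_te), 1)), Z_te]) @ w)
```

Training the meta-learner only on out-of-fold predictions keeps it from rewarding base learners that merely memorized their own training folds; this is the leakage the 10-fold construction is meant to prevent.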
2.8. Model Evaluation Methods
2.8.1. Evaluation of Predictive Performance (Confusion Matrix)
To comprehensively assess the performance of the proposed RP prediction models, multiple evaluation metrics were employed, including the area under the receiver operating characteristic curve (AUC), accuracy, positive predictive value (PPV), negative predictive value (NPV), specificity, recall (sensitivity), and F1-score. These metrics quantify both global and detailed aspects of model behavior in clinical classification tasks. The threshold-dependent indices were derived from the binary confusion matrix, which is based on four fundamental quantities: true positives (TPs), false positives (FPs), false negatives (FNs), and true negatives (TNs), whereas the AUC summarizes discrimination across all classification thresholds.
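The metric definitions can be made explicit as below. The counts in the usage line are hypothetical and are not the study's results. AUC is computed with the rank (Mann–Whitney) formulation, i.e., the probability that a randomly chosen RP-positive case receives a higher score than a randomly chosen RP-negative case, with ties counted as one half. Zero-count denominators are not guarded.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Threshold-dependent metrics derived from the binary confusion matrix."""
    recall = tp / (tp + fn)                      # sensitivity
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                         # precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * ppv * recall / (ppv + recall)       # harmonic mean of PPV and recall
    return {"accuracy": accuracy, "ppv": ppv, "npv": npv,
            "specificity": specificity, "recall": recall, "f1": f1}

def auc_rank(scores, labels):
    """ROC AUC: probability that a random positive outranks a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(confusion_metrics(tp=8, fp=2, fn=2, tn=24))  # recall = 8 / (8 + 2) = 0.8
```

Because recall depends only on TP and FN, it isolates the false-negative risk that the Discussion argues should be prioritized for RP.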
2.8.2. Visualization With Explainable AI (XAI) Models
Grad-CAM was adopted as one of the explainable AI techniques; it generates heatmaps by weighting convolutional feature maps with the gradients of the class score, thereby visualizing the image regions to which the model attends during decision-making. To further enhance interpretability at the output level, LIME was applied for localized analysis. LIME segments an image into superpixel regions and randomly occludes subsets of these regions to create perturbed samples; it then evaluates the resulting changes in model output to estimate the contribution of each region to the prediction. Whereas Grad-CAM emphasizes feature-level explanations, LIME focuses on local interpretability at the output level; together, the two methods provide complementary insights into model behavior.19,20
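Both explanation methods reduce to short computations once the model is available. The NumPy sketch below assumes that the activations of one convolutional layer and the gradients of a differentiable class score with respect to them have already been obtained (in TensorFlow, typically with `tf.GradientTape`), and that `predict_fn` maps one image to an RP probability. The LIME part uses a simplified proximity kernel on the fraction of occluded superpixels and an unregularized weighted least-squares surrogate, so it illustrates the idea rather than reproducing the reference `lime` package; all names are hypothetical.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM map for one conv layer.

    feature_maps: (H, W, K) activations A^k; gradients: (H, W, K) d(class score)/dA^k.
    """
    alphas = gradients.mean(axis=(0, 1))                        # channel weights: spatially averaged gradients
    cam = np.maximum((feature_maps * alphas).sum(axis=2), 0.0)  # ReLU(sum_k alpha_k * A^k)
    peak = cam.max()
    return cam / peak if peak > 0 else cam                      # scale to [0, 1] for display

def lime_superpixel_weights(predict_fn, image, segments, n_samples=500,
                            kernel_width=0.25, fill=-1.0, seed=0):
    """LIME-style contribution of each superpixel (simplified, hypothetical helper).

    segments: (H, W) integer superpixel labels 0..S-1; predict_fn: image -> RP probability.
    """
    rng = np.random.default_rng(seed)
    n_seg = int(segments.max()) + 1
    masks = rng.integers(0, 2, size=(n_samples, n_seg))  # 1 = superpixel kept, 0 = occluded
    masks[0] = 1                                         # include the unperturbed image
    preds = np.array([predict_fn(np.where(m[segments] == 1, image, fill)) for m in masks])
    dist = 1.0 - masks.mean(axis=1)                      # fraction of occluded superpixels
    weights = np.exp(-dist ** 2 / kernel_width ** 2)     # closer perturbations count more
    sw = np.sqrt(weights)
    A = np.hstack([np.ones((n_samples, 1)), masks]) * sw[:, None]
    beta, *_ = np.linalg.lstsq(A, preds * sw, rcond=None)  # weighted least-squares surrogate
    return beta[1:]                                      # drop the intercept
```

The two functions mirror the contrast drawn in the text: `grad_cam` reads internal activations (feature level), whereas `lime_superpixel_weights` only queries the model's output on perturbed inputs (output level).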
3. Results
3.1. Predictive Performance of RP Risk Models and Pathophysiological Interpretation
Summary of Classification Performance Under Different Image Input Configurations
This table presents the core outcome metrics of the study, summarizing AUC, accuracy, recall, and F1-score for comprehensive performance evaluation. The results show that the stacking architecture achieves the best performance when using whole-lung images as input and quantitatively demonstrate that, when image information is restricted to the V20Gy high-dose region, the model’s sensitivity (recall) for identifying high-risk cases declines markedly, underscoring the pivotal role of whole-lung information in RP risk prediction. Abbreviation: AUC: Area Under the ROC Curve, ROC: Receiver Operating Characteristic, PPV: Positive Predictive Value, NPV: Negative Predictive Value, SVM: Support Vector Machine, RF: Random Forest, KNN: K Nearest Neighbor, XGB: Extreme Gradient Boosting, RP: Radiation Pneumonitis.
In contrast, when the input was restricted to the V5Gy or V20Gy regions, model performance deteriorated markedly; under the V20Gy configuration, the AUC of the stacking model decreased to 0.653 and its recall dropped to 0.273. These findings highlight the limitations of models trained solely on local dose subregions and suggest that neglecting global lung information severely compromises the ability to capture subclinical injury signals, thereby reducing the precision of individualized RP risk stratification.
This performance gap is strongly supported by radiobiological mechanisms. RP is increasingly recognized as a complex, lung-wide inflammatory process rather than a purely focal tissue injury. Although the V20Gy region receives the highest radiation dose, previous studies have shown that radiation can trigger the release of pro-inflammatory cytokines—such as transforming growth factor-β (TGF-β), interleukin-1 (IL-1), and interleukin-6 (IL-6)—through bystander and abscopal effects, thereby altering the microenvironment even in non-irradiated lung regions. The proposed RP-GAN framework is capable of capturing subtle textural and structural alterations distributed throughout the lungs, which likely correspond to subclinical inflammation that is not readily discernible by visual inspection. When analysis is confined only to high-dose regions, the model effectively loses access to the global “inflammatory background,” resulting in a markedly reduced ability to identify high-risk patients.
Moreover, individual tolerance to radiation is strongly influenced by the baseline lung microenvironment, including pre-existing pulmonary conditions such as chronic lung disease, emphysema, or structural remodeling, which are spatially distributed across the entire lung. Whole-lung images preserve this baseline status and thus encode susceptibility factors that shape RP risk at the patient level. Consistent with this concept, explainability analyses via Grad-CAM and LIME revealed that model attention was not confined to the tumor region but also extended to the surrounding and contralateral lung parenchyma, suggesting that RP risk emerges from the interaction between the three-dimensional dose distribution and the global pulmonary microenvironment. These results further support the proposed unlabeled feature learning strategy, suggesting that it can capture latent features closely related to individual physiological responses and may deliver more accurate personalized risk prediction than traditional dose-volume metrics alone.
3.2. Convolutional Attention Patterns and Interpretability Analysis of the RP Risk Model
Figure 6A shows the Grad-CAM visualizations of the RP risk prediction model across different convolutional layers. In the early convolutional layers (first to fourth layers), feature extraction focuses primarily on the tumor and its immediately adjacent regions, showing substantial spatial overlap with the clinically contoured tumor volume. As the convolutional depth increases, the model’s attention gradually expands from the local lesion to the surrounding lung parenchyma and eventually extends into the contralateral lung, indicating that the model integrates spatial information from the entire lung rather than relying solely on focal tumor features during decision-making. XAI-based visualization of feature evolution from local lesions to whole-lung patterns for a patient with radiation pneumonitis (RP=1). (A) Grad-CAM feature-level visualization: heatmaps of model attention are shown across different convolutional layers. The early layers (Layers 1–4) primarily focus on the local tumor region, whereas the deeper layers (Layers 5–7) progressively integrate spatial features from the surrounding and contralateral lung, indicating that the model is capable of deriving global lung microenvironment features from an initially lesion-centered representation. (B) LIME output-level visualization: superpixel-based maps quantify the contribution of individual regions to the model prediction. The results demonstrate that the model’s decisions are informed not only by the left-sided tumor region (yellow dashed contour in the original image) but also by signals from the right, non-irradiated lung, thereby supporting the hypothesis that RP risk is influenced by background whole-lung features. Abbreviation: RP: Radiation Pneumonitis, Grad-CAM: Gradient-weighted Class Activation Mapping, LIME: Local Interpretable Model-agnostic Explanations
LIME-based visualizations further complement this analysis by quantifying the local contributions of different image regions to the model’s predictions. As shown in Figure 6B, high-weight superpixels are predominantly located in the tumor-bearing left lung, but notable attention regions are also observed in the contralateral, non-irradiated right lung. This pattern is consistent with the “global lung microenvironment interaction” hypothesis described in Section 3.1 and suggests that the proposed RP-GAN can capture subclinical risk signals distributed throughout the lungs without requiring manual annotations.
These findings suggest that the model assesses RP risk by treating the entire lung as an interconnected microenvironment rather than focusing solely on peritumoral high-dose regions. Irradiated cells within high-dose volumes release soluble factors, including cytokines, reactive oxygen species (ROS), and exosomes, that mediate the radiation-induced bystander effect (RIBE). These signaling molecules propagate via systemic circulation or local diffusion to non-irradiated lung parenchyma, including the contralateral lung, thereby inducing cytotoxic and genotoxic damage. Together with radiation-induced immune activation, this process triggers systemic sterile inflammation and reconfigures the pulmonary immune landscape.21-23 Collectively, these interactions manifest as cross-lung spatial effects, which RP-GAN appears to capture as predictive subclinical features. The concordant attention distributions revealed by Grad-CAM and LIME, two distinct interpretability frameworks, not only enhance the transparency of model decisions but also provide radiobiologically plausible explanations for the predicted RP risk.
4. Discussion
4.1. RP-GAN Feature Extraction Model
Beyond model development, the key scientific finding of this study is that whole-lung imaging features provide more informative RP risk signals than dose-restricted regions alone. This finding suggests that RP should not be viewed solely as a focal toxicity confined to high-dose areas, but rather as a lung-wide process influenced by the interaction between radiation exposure and the pre-existing pulmonary microenvironment. The superior performance of the whole-lung model therefore supports a broader biological interpretation of RP susceptibility and highlights the importance of preserving global pulmonary information in risk modeling. The RP-GAN model developed in this study employs an unsupervised feature learning strategy that enables automatic extraction of imaging features without relying on annotated data, and these learned representations are subsequently used for RP risk prediction. Through adversarial training, the RP-GAN learns texture and structural patterns directly from CT images under label-free conditions and can capture latent signals associated with disease risk that may not be explicitly encoded in conventional contours or dose parameters.
Most RP-related studies to date have focused on supervised risk prediction models that depend heavily on clinically contoured structures as inputs, such as physician- or physicist-delineated lung and tumor volumes. Although such contours are routinely available from radiotherapy planning, the formation of RP involves complex and widespread pulmonary responses that may extend beyond predefined anatomical or high-dose regions, meaning that excessive reliance on manual annotations can constrain the model’s capacity to learn subtle risk-relevant features. The RP-GAN framework was therefore designed to reduce dependence on clinical annotations and to extract RP-associated features directly from imaging data via unlabeled feature learning.24,25
With only a standard image preprocessing pipeline, the RP-GAN can be retrained and adapted to local datasets from different institutions, making it inherently data-driven and flexible with respect to variations in scanners, acquisition parameters, and processing protocols. Prior work has shown that even visually similar medical images acquired on different scanners or with different parameter settings can lead to substantial performance degradation in deep learning models, underscoring the impact of cross-device heterogeneity on model robustness and generalizability. By enabling site-specific adaptation without the burden of manual re-annotation, the RP-GAN has the potential to improve deployment efficiency in clinical practice and to facilitate integration into real-world radiotherapy workflows.
4.2. Advantages of Whole-Lung Imaging Features and Pathophysiological Mechanisms
Given the 13%–37% clinical incidence of RP and its associated mortality risk, predictive models must prioritize sensitivity to minimize false-negative predictions.26 Related meta-analyses report that effective prediction models achieve sensitivities exceeding 0.74 with a pooled AUC of 0.93, and the whole-lung RP-GAN model in this study likewise exceeded this sensitivity level (recall 0.778). The results of this study show that models using whole-lung images as input (AUC 0.856) markedly outperform those restricted to high-dose regions such as V20Gy (AUC 0.653), thereby challenging traditional risk assessment strategies that focus primarily on high-dose irradiated volumes. This observation is consistent with radiobiological principles. RP is increasingly recognized as a complex, lung-wide immune-inflammatory process rather than a purely focal tissue injury confined to high-dose regions. Although the V20Gy volume receives the bulk of the radiation dose, prior studies have reported that localized lung irradiation can induce the release of pro-inflammatory cytokines, including transforming growth factor-β (TGF-β), interleukin-1 (IL-1), and interleukin-6 (IL-6), through bystander and abscopal effects, with these mediators disseminating via the bloodstream and interstitium to non-irradiated regions and altering the global pulmonary microenvironment. RP-GAN appears able to capture subtle whole-lung textural changes that likely reflect such subclinical inflammatory processes, which are difficult to perceive via routine visual inspection.17,27
Low-dose regions, such as those encompassed by V5Gy, also play a non-negligible role in pulmonary injury. When broad areas of the lung receive low-dose radiation, the dose may fall below the threshold for overt cell death yet still be sufficient to activate alveolar macrophages and alter vascular endothelial permeability. In this study, using whole-lung input instead of the V20Gy region increased recall from 0.273 to 0.778. This suggests that many latent high-risk features reside in low-dose background lung tissue and would be overlooked if analysis were confined to the V20Gy region. Under such restricted conditions, the model effectively loses its ability to evaluate the global "inflammatory background," which results in a high rate of missed high-risk cases.
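Recall (sensitivity) is the fraction of true RP cases that a model flags as high risk, TP / (TP + FN). Each "missed high-risk case" described above is a false negative and lowers this ratio. The sketch below shows the calculation. The label vectors are hypothetical and are not data from this study.

```python
def recall_score_binary(y_true, y_pred, positive=1):
    """Recall (sensitivity) = TP / (TP + FN) for a binary task."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive cases")
    return tp / (tp + fn)


# Hypothetical labels (1 = RP, 0 = no RP); NOT data from this study.
y_true          = [1, 1, 1, 1, 0, 0, 0, 1, 0, 0]
restricted_pred = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # misses 4 of 5 RP cases
whole_lung_pred = [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]  # misses 1 of 5 RP cases

print(recall_score_binary(y_true, restricted_pred))  # 0.2
print(recall_score_binary(y_true, whole_lung_pred))  # 0.8
```

The false positive at index 5 of `whole_lung_pred` does not affect recall. This is why recall is usually reported together with precision, F1, or specificity.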
From a radiobiological standpoint, the baseline lung microenvironment is a key determinant of individual tolerance to radiation. Whole-lung images inherently encode pre-existing pulmonary conditions such as emphysema, interstitial changes, or microvascular abnormalities. These conditions are distributed throughout the lungs and collectively form a susceptibility background for RP. Through unsupervised learning, the RP-GAN appears to extract latent features that are not explicitly derived from the dose distribution yet are closely related to individual physiological responses. This potentially enables more accurate personalized risk prediction than models relying solely on traditional dose–volume metrics.
4.3. Contribution of XAI Techniques to Model Interpretability and Clinical Applicability
The Grad-CAM and LIME visualizations of the risk prediction model show that the primary attention regions lie over the tumor and the surrounding lung parenchyma, with strong spatial correspondence to clinically defined high-dose regions. This indicates that, without manual structural contours or explicit dose information, the model can autonomously learn to localize image regions closely related to RP risk, reflecting its ability to capture treatment-related lung responses. Notably, the attention maps are not confined to the high-dose core but extend into adjacent lung tissue. This suggests that RP risk may arise from the interaction between high-dose exposure and the neighboring pulmonary microenvironment rather than from a single dose band alone. The observation is consistent with the superior performance of the whole-lung input model over models restricted to V5Gy and V20Gy inputs, further supporting the importance of global lung information for RP risk assessment. Previous studies have shown that Grad-CAM provides feature-level spatial heatmaps, whereas LIME offers instance-level explanations. 28 Integrating the two enables a more comprehensive understanding of model behavior and enhances the reliability of clinical decision-making. 6 In addition, the broad concordance between the attention patterns produced by these two independent interpretability methods increases confidence in the model's decision-making process. It also suggests that the proposed annotation-free feature learning strategy exhibits good stability and clinical plausibility for RP risk prediction.
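The core Grad-CAM computation can be summarized in two steps. First, each channel of a convolutional layer is weighted by the spatial average of the gradient of the class score with respect to that channel. Second, the heatmap is the ReLU of the weighted channel sum. The sketch below is a minimal NumPy illustration of those steps and is not the authors' implementation. It assumes the activations and gradients have already been exported from the trained network. Which RP-GAN layer was used is not specified here, upsampling the coarse map to CT resolution is omitted, and LIME is not shown.

```python
import numpy as np


def grad_cam(feature_maps: np.ndarray, gradients: np.ndarray) -> np.ndarray:
    """Grad-CAM heatmap for one convolutional layer.

    feature_maps: activations A with shape (K, H, W) for K channels.
    gradients:    d(y_c)/dA for the target class score y_c, same shape.
    Returns an (H, W) map scaled to [0, 1].
    """
    if feature_maps.shape != gradients.shape:
        raise ValueError("feature_maps and gradients must have the same shape")
    # Channel weights alpha_k: global-average-pooled gradients.
    alpha = gradients.mean(axis=(1, 2))  # shape (K,)
    # Weighted sum over channels; ReLU keeps only evidence that raises the score.
    cam = np.maximum(np.tensordot(alpha, feature_maps, axes=1), 0.0)  # (H, W)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy example: channel 0 fires top-left and raises the score,
# channel 1 fires bottom-right and lowers it.
A = np.zeros((2, 4, 4))
A[0, :2, :2] = 1.0
A[1, 2:, 2:] = 1.0
dA = np.stack([np.full((4, 4), 0.5), np.full((4, 4), -0.5)])
heatmap = grad_cam(A, dA)
print(heatmap[0, 0], heatmap[3, 3])  # 1.0 0.0
```

Only the top-left region, where the activation increases the class score, is highlighted. Regions with negative influence are suppressed by the ReLU.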
4.4. Advantages of Ensemble Learning
To improve robustness and classification stability in the presence of heterogeneous data, this study employed a stacking ensemble architecture for RP risk prediction. The approach integrated the outputs of four base classifiers, namely Random Forest (RF), Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and XGBoost, with logistic regression (LR) serving as the meta-classifier for the final decision. The stacking framework outperformed every individual base classifier on this task, achieving a higher AUC (0.856), accuracy (0.861), and F1 score (0.737) than any single base learner (Table 4). Although the AUC improvement over the best-performing single model (SVM, AUC 0.846) is modest, the gain in sensitivity was clinically substantial. Single models such as RF and XGBoost showed limited ability to identify high-risk cases, with recall values of only 0.444 and 0.333, respectively, whereas the stacking ensemble increased recall to 0.778. In RP, failing to identify a high-risk patient (a false negative) may lead to severe pulmonary complications. This gain in sensitivity therefore supports adopting the ensemble approach despite its greater complexity. The observation aligns with previous studies demonstrating that stacking can exploit the complementary strengths of base learners and compensate for their individual weaknesses. 29
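The two-layer design described above can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch, not the authors' code. Synthetic data from `make_classification` stand in for the LASSO-selected RP-GAN features, and the hyperparameters are arbitrary. `GradientBoostingClassifier` is swapped in for XGBoost so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for selected image features (NOT study data).
X, y = make_classification(n_samples=180, n_features=20, n_informative=6,
                           weights=[0.75, 0.25], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                          random_state=0)

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    # Stand-in for XGBoost so the sketch needs only scikit-learn.
    ("gb", GradientBoostingClassifier(random_state=0)),
]
# Layer 2: logistic regression fuses the base learners' cross-validated probabilities.
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5, stack_method="predict_proba")
stack.fit(X_tr, y_tr)

proba = stack.predict_proba(X_te)[:, 1]
pred = stack.predict(X_te)
print(f"AUC={roc_auc_score(y_te, proba):.3f} "
      f"recall={recall_score(y_te, pred):.3f} F1={f1_score(y_te, pred):.3f}")
```

The printed scores describe only this synthetic example and say nothing about the study's results.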
To balance performance and complexity, logistic regression was chosen as the meta-classifier. Using a simple linear fusion model in the second layer combined the diverse predictive capabilities of the base learners while limiting the risk of overfitting and preserving computational efficiency. This design is consistent with prior work in cardiovascular risk prediction. That work showed that logistic regression as a fusion classifier can strike an appropriate balance between predictive performance and model complexity in stacked ensemble frameworks. 30
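The "simple linear fusion" can be made explicit by building the second layer by hand. Each base learner contributes out-of-fold probabilities, so every sample is scored by a model that never saw it during training. The logistic regression then learns one weight per base learner plus an intercept, which keeps the second layer very small. scikit-learn's `StackingClassifier` trains its final estimator on cross-validated predictions in a similar way. This is a sketch on synthetic data with three base learners for brevity, not the study's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT study data).
X, y = make_classification(n_samples=180, n_features=20, n_informative=6,
                           weights=[0.75, 0.25], random_state=1)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
base = {
    "rf": RandomForestClassifier(n_estimators=200, random_state=1),
    "svm": make_pipeline(StandardScaler(), SVC(probability=True, random_state=1)),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
# Out-of-fold positive-class probabilities: one column per base learner.
Z = np.column_stack([
    cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for model in base.values()
])
# Second layer: only len(base) coefficients + 1 intercept are learned.
meta = LogisticRegression().fit(Z, y)
for name, w in zip(base, meta.coef_[0]):
    print(f"{name}: fusion weight {w:+.2f}")
```

The fitted coefficients show how much each base learner contributes to the final decision.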
4.5. Limitations and Future Directions
The current model applies the RP-GAN for unsupervised feature extraction, substantially reducing the dependence on manual annotations. Future work could incorporate self-supervised learning (SSL) strategies, such as masked image modeling and contrastive learning. These could be combined with singular value decomposition (SVD)-based spectral pooling techniques (e.g., the singular pooling proposed by Zhu et al.) to guide the network toward more representative anatomical features and spatial relationships learned from large collections of unlabeled CT images. 31 Such approaches are expected to enhance the model's ability to capture subtle lung texture alterations and to improve sensitivity to early subclinical inflammatory changes.
In addition, inspired by recent work on the CR-SCAD framework, 32 future studies may explore whether RP-GAN-derived latent features can be further modeled with a strategy that combines collaborative representation with sparse variable selection. Such a design may be particularly relevant to RP prediction, where patient-level samples are relatively limited but the learned image features are high-dimensional. A CR-SCAD-inspired downstream module may help preserve relationships among patient representations while selecting the most discriminative latent variables associated with RP risk. This could improve robustness and feature interpretability, especially when imaging, dosimetric, and clinical variables are integrated in a multimodal prediction setting.
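The "sparse variable selection" part of a CR-SCAD-inspired module refers to the smoothly clipped absolute deviation (SCAD) penalty of Fan and Li (2001). The sketch below contrasts SCAD with the LASSO used in this study for a single coefficient under an orthonormal design. Both zero out small coefficients, but SCAD leaves large ones unshrunk. This illustrates the penalty only. The collaborative-representation component is not shown, this is not the CR-SCAD algorithm, and a = 3.7 is the conventional default rather than a value from this work.

```python
import numpy as np


def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), applied elementwise (requires a > 2)."""
    t = np.abs(np.asarray(theta, dtype=float))
    linear = lam * t                                                # |theta| <= lam
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))   # lam < |theta| <= a*lam
    constant = np.full_like(t, lam**2 * (a + 1) / 2)                # |theta| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))


def lasso_threshold(z, lam):
    """Soft thresholding: the LASSO solution under an orthonormal design."""
    z = np.asarray(z, dtype=float)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def scad_threshold(z, lam, a=3.7):
    """SCAD solution for one coefficient under an orthonormal design."""
    z = np.asarray(z, dtype=float)
    soft = lasso_threshold(z, lam)                                  # |z| <= 2*lam
    middle = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)         # 2*lam < |z| <= a*lam
    return np.where(np.abs(z) <= 2 * lam, soft,
                    np.where(np.abs(z) <= a * lam, middle, z))      # |z| > a*lam: unchanged


z = np.array([0.5, 1.5, 3.0, 6.0])  # hypothetical unpenalized coefficient estimates
print(lasso_threshold(z, lam=1.0))  # LASSO shrinks every surviving coefficient by lam
print(scad_threshold(z, lam=1.0))   # SCAD leaves the large coefficient (6.0) untouched
```

The reduced bias on strong predictors is the usual motivation for preferring SCAD over LASSO when a few latent features carry most of the signal.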
The stacking ensemble framework used in this study involves multiple classifiers and non-trivial hyperparameter tuning. A future direction is to introduce automated machine learning (AutoML) techniques, including neural architecture search (NAS) and automated hyperparameter optimization (HPO), to systematically identify the best combination of base learners and meta-classifier weights. This would improve development efficiency, reduce manual tuning effort, and help the model maintain strong generalization performance and stability when confronted with heterogeneous data from different institutions. 33
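Random search is one of the simplest forms of automated HPO. It is narrower than full AutoML or NAS, but it shows how base-learner and meta-classifier settings can be tuned jointly. In scikit-learn, nested `<name>__<parameter>` keys reach inside a `StackingClassifier` and its pipelines, so one search covers both layers. This is a sketch on synthetic data with an arbitrary search space. The AUC objective is an assumption, and a recall-oriented metric could be substituted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (NOT study data).
X, y = make_classification(n_samples=180, n_features=20, n_informative=6,
                           weights=[0.75, 0.25], random_state=2)
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=2)),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=2))),
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)
# make_pipeline names steps after their lowercased class names (e.g. "svc").
search_space = {
    "rf__n_estimators": [50, 100],
    "rf__max_depth": [None, 3, 6],
    "svm__svc__C": [0.1, 1.0, 10.0],
    "knn__kneighborsclassifier__n_neighbors": [3, 5, 9],
    "final_estimator__C": [0.1, 1.0, 10.0],
}
search = RandomizedSearchCV(
    stack, search_space, n_iter=8, scoring="roc_auc",
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=2),
    random_state=2,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

With data from several institutions, the outer cross-validation split could be grouped by site so the search rewards settings that transfer across centers.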
Furthermore, as a prerequisite for large-scale clinical adoption, future iterations must address data privacy and security through mechanisms such as federated learning or blockchain. 34,35 Given the progressive nature of RP, future studies will also explore the integration of time-series imaging and multidimensional clinical information to construct a dynamic risk prediction system capable of quantifying both the severity and temporal evolution of RP. In addition, external validation using multicenter datasets will be conducted to test the model's applicability under varying scan protocols and population backgrounds. Prospective validation and real-world workflow integration studies will also be necessary before the framework can be considered for routine clinical implementation in radiotherapy planning.
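As an example of how federated learning keeps patient data on site, the widely used federated averaging (FedAvg) scheme has each hospital train locally. A coordinating server then combines the sites' model parameters, weighted by local sample size. The sketch below shows only that aggregation step, with hypothetical sites and sizes. Local training, secure aggregation, and blockchain-based mechanisms are not shown. Parameter sharing alone does not guarantee privacy, and additional safeguards such as secure aggregation or differential privacy may be required.

```python
import numpy as np


def fed_avg(site_weights, site_sizes):
    """Federated averaging (FedAvg) of model parameters.

    site_weights: one entry per site; each entry is a list of NumPy arrays
                  (one array per layer, same shapes at every site).
    site_sizes:   number of local training samples at each site.
    Only parameters are exchanged; CT images and labels stay at each site.
    """
    total = float(sum(site_sizes))
    n_layers = len(site_weights[0])
    return [
        sum(w[layer] * (n / total) for w, n in zip(site_weights, site_sizes))
        for layer in range(n_layers)
    ]


# Two hypothetical hospitals with a one-layer model.
site_a = [np.array([1.0, 1.0])]  # trained on 100 local patients
site_b = [np.array([3.0, 3.0])]  # trained on 300 local patients
global_weights = fed_avg([site_a, site_b], [100, 300])
print(global_weights[0])  # sample-size-weighted mean: 0.25 * 1 + 0.75 * 3 = 2.5
```

In a full round, the server would send `global_weights` back to each site as the starting point for the next local training epoch.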
5. Conclusion
This study developed a radiation pneumonitis (RP) risk prediction framework that integrates RP-GAN-based unsupervised feature extraction, stacking ensemble learning, and explainable AI (XAI) techniques. The experimental results demonstrate that the whole-lung image model markedly outperforms models that use only V20Gy- or V5Gy-restricted dose regions, achieving an AUC of 0.856 and increasing the recall for high-risk cases to 0.778. These findings indicate that imaging features associated with RP risk are not confined to high-dose subvolumes but are strongly linked to global lung texture, structural alterations, and spatial distribution patterns.
The proposed RP-GAN model highlights the advantages of unsupervised feature learning, enabling the automatic discovery of key risk-related signals from CT images without labor-intensive manual annotation. This reduces the high cost of clinical labeling. Because the model can be retrained on local data, it also helps address the challenge of cross-device imaging heterogeneity. By integrating multiple classifiers through a stacking strategy, the framework further improves the stability and accuracy of risk prediction. Moreover, XAI analyses with Grad-CAM and LIME show that model attention is concentrated mainly in the lung parenchyma surrounding the tumor, with good correspondence to high-dose regions, supporting the clinical plausibility and transparency of the model's decision process.
Taken together, these findings underscore the lung-wide inflammatory nature of RP, whose risk is shaped by the interaction between the global pulmonary microenvironment and subclinical radiation-induced damage. The proposed prediction framework has substantial potential for clinical application as an adjunctive tool for optimizing radiotherapy planning. Before routine clinical implementation can be considered, future research should expand the sample size and include multicenter external validation, prospective evaluation, and workflow integration studies. Such work should also aim to develop a dynamic risk prediction system covering different RP severity grades and to support individualized precision radiotherapy for patients with lung cancer.
Supplemental Material
Supplemental Material for Individualized Prediction of Radiation Pneumonitis Using RP-GAN: Leveraging Global Lung Features and Explainable Artificial Intelligence by Yang-Wei Hsieh, Pei-Ju Chao, Yi-Lun Liao, Wen-Ping Yun, Ling-Chuan Chang-Chien, Cheng-Shie Wuu, Yu-Wei Lin and Tsair-Fwu Lee in Technology in Cancer Research & Treatment.
Footnotes
Acknowledgements
We acknowledge the use of artificial intelligence tools solely for linguistic refinement and improving the readability of this manuscript. All scientific concepts, methodological developments, and clinical interpretations presented in this study are the sole work of the authors.
Consent to Participate
Informed consent was waived due to the retrospective design and anonymization of patient data, as approved by the Institutional Review Board (IRB) of Kaohsiung Veterans General Hospital (approval number: KSVGH25-CT1-06, approval date: December 17, 2024).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was partially supported by grants from the National Science and Technology Council (NSTC), Executive Yuan, Taiwan, Republic of China (113-2221-E-992-011-MY2, 114-2637-8-992-002).
Institutional Review Board Statement: This study involving human participants was approved by the Institutional Review Board (IRB) of Kaohsiung Veterans General Hospital (approval number: KSVGH25-CT1-06, approval date: December 17, 2024), in compliance with ethical standards and regulatory requirements. The requirement for informed consent was waived.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The datasets generated and/or analyzed during the current study are not publicly available due to institutional ethical restrictions and patient privacy regulations but are available from the corresponding authors (Yang-Wei Hsieh, …).
Use of Artificial Intelligence Statement
During the preparation of this manuscript, we used artificial intelligence (AI) tools solely for language improvement and polishing to enhance clarity and readability. All content was carefully reviewed and edited by the authors, who take full responsibility for the final manuscript. The core scientific contributions of this study—including the development of the RP-GAN unsupervised feature extraction model, the design of the stacking ensemble learning framework, and the clinical interpretation of global lung microenvironment signals—were conceived and completed entirely by the authors without the use of generative AI.
Supplemental Material
Supplemental material for this article is available online.
Appendix
References
