Abstract
Purpose
To predict the voxel-based dose distribution for postoperative cervical cancer patients who underwent volumetric modulated arc therapy using deep learning models.
Method
A total of 254 patients with cervical cancer who received volumetric modulated arc therapy at the authors’ hospital from January 2018 to September 2021 were enrolled in this retrospective study. Two deep learning networks (a 3D deep residual neural network and 3DUnet) were adapted, trained (203 cases), and tested (51 cases) to assess the feasibility and effectiveness of the prediction method. The performance of the deep learning models was evaluated by comparing their results with those of the treatment planning system using dose-volume histogram metrics of the target volumes and organs at risk.
Results
The dose distributions predicted by the deep learning models were clinically acceptable. The automatic dose prediction time was around 5 to 10 min, which was about one-eighth to one-tenth of the manual optimization time. The maximum dose difference was observed in the D98 of the rectum, with a |δD| of 5.00 ± 3.40% and 4.88 ± 3.99% for Unet3D and ResUnet3D, respectively. The minimum difference was observed in the D2 of the clinical target volume, with a |δD| of 0.53 ± 0.45% and 0.83 ± 0.45% for ResUnet3D and Unet3D, respectively.
Conclusion
The 2 deep learning models adapted in this study showed feasibility and reasonable accuracy in voxel-based dose prediction for postoperative cervical cancer patients who underwent volumetric modulated arc therapy. Automatic dose distribution prediction of volumetric modulated arc therapy with deep learning models is of clinical significance for the postoperative management of patients with cervical cancer.
Introduction
Cervical cancer (CC) is one of the most common gynecologic malignancies and remains the fourth most common cancer worldwide.1 Intensity-modulated radiotherapy (IMRT) and volumetric modulated arc therapy (VMAT) have been widely applied in external beam radiotherapy for postoperative CC in order to minimize the radiation dose to the bowel, bladder, and other critical structures in comparison with conventional conformal radiotherapy.2,3 However, radiotherapy plan development is a complex and time-consuming process that consists of numerous iterations of optimization and takes hours or even days.4 Another challenge is the variability in plan quality resulting from the different compromises that planners make among the various objective functions (inter- and intraplanner variability), which can also lead to suboptimal plans being delivered in the clinic and to worse clinical outcomes.5,6
With the introduction of artificial intelligence into radiotherapy, automatic planning has been investigated intensively to enhance the speed, efficiency, and uniformity of IMRT/VMAT planning.7 Knowledge-based planning (KBP) models, which use geometric/anatomical features extracted from regions of interest (ROIs) of previously treated patients to predict dose-volume objectives automatically for new patients, have demonstrated clinically acceptable quality in the treatment of CC, head-and-neck cancer, and rectal cancer.8,11 One key limitation of KBP models is that the predicted dose-volume histogram (DVH) lacks spatial information, so extra work may be required to deal with cases in which the geometry of the ROIs is uncommon.12,13 In addition, the number of geometric/anatomical features that influence the dose distribution of ROIs is very large, and not all of them may be sufficiently represented in the KBP to establish a reliable model.14,15
In recent years, voxel-based prediction has been suggested as a way to preserve spatial information by predicting the dose values of individual voxels.16,17 More recently, dose distribution prediction models using deep learning (DL) have been further investigated to improve the efficiency and accuracy of automatic planning.18,19 Accurately predicted dose distributions using DL have been demonstrated for nasopharyngeal, prostate, lung, and esophageal cancers treated with IMRT.20,23 However, the predicted dose distribution is hardly reproducible and not readily clinically applicable because of the complexities of training DL networks, and as a result few studies have directly converted the predicted dose distributions into clinically deliverable plans.24 In addition, few studies have predicted the voxel-based VMAT dose distribution for CC.
In this study, we adapted a 3D deep residual neural network (3DResUnet) to predict the voxel-based dose distribution of VMAT for CC patients and compared it with the conventional 3DUnet. This DL-based automatic planning can not only provide clinically deliverable DVH objectives but also give voxel-level feedback to planners to further improve the dose distribution.
Materials and Methods
Patients and Plans
Patients with CC treated with postoperative VMAT at the authors’ hospital from 2018 to 2021 were retrospectively reviewed and enrolled in this study. Patients were immobilized in the supine position with a thermoplastic abdominal fixation device and simulated using a 16-slice Brilliance Big Bore CT scanner (Philips Healthcare) with contiguous 3-mm slices. The clinical target volume (CTV) of each patient was contoured by a resident radiation oncologist and then reviewed, edited, and finally approved by a senior radiation oncologist according to the consensus guideline.2 The corresponding planning target volume (PTV) was generated with a 3-dimensional margin of 5 mm around the CTV. VMAT plans were optimized and calculated with a dose calculation grid of 3 mm in the Monaco treatment planning system (TPS) version 5.1.03 (Elekta) with a dose prescription of 50 Gy in 25 fractions or 45 Gy in 25 fractions to the PTV, as reported in the previous study.25 The reporting of this study conforms to the STROBE guidelines.26 Because of the simulation nature of this retrospective study, ethical approval was waived by the ethical committee of the authors’ hospital.
Data Preprocessing
Because the dose depends on the spatial distance between the PTVs and the organs at risk (OARs), all images were cropped to 256 × 256 pixels. The scientific computing package NumPy was used to process all images.27 To account for the dependence of the dose on the anatomical geometry in the region adjacent to a given anatomy, 6 additional anatomical images from the regions superior and inferior to the target volumes were also included in the dose matrix prediction for that anatomy. All these anatomical images were treated as different channels of the input layer when fed into the learning network.
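The preprocessing described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the helper names `center_crop` and `stack_neighbors`, the choice of a centered crop, the split of the 6 additional images into 3 superior and 3 inferior slices, and the clamping of out-of-range indices at the ends of the scan are all assumptions made for illustration.

```python
import numpy as np

def center_crop(volume, size=256):
    """Crop every axial slice of a (slices, rows, cols) volume to size x size
    around the slice center (a centered crop is an assumption)."""
    _, rows, cols = volume.shape
    r0 = (rows - size) // 2
    c0 = (cols - size) // 2
    return volume[:, r0:r0 + size, c0:c0 + size]

def stack_neighbors(volume, index, n_each_side=3):
    """Return slice `index` together with its neighbours as channels.
    Assumes 3 slices on each side (6 additional images in total); indices
    outside the volume are clamped to the nearest valid slice."""
    n_slices = volume.shape[0]
    idx = np.arange(index - n_each_side, index + n_each_side + 1)
    idx = np.clip(idx, 0, n_slices - 1)
    return volume[idx]

# Synthetic stand-in for one patient's image volume
volume = np.random.rand(40, 512, 512).astype(np.float32)
cropped = center_crop(volume)
channels = stack_neighbors(cropped, index=20)
print(cropped.shape, channels.shape)
```

The central slice ends up in the middle channel, so the network sees the anatomy immediately above and below the slice whose dose is being predicted.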
Dose Prediction Network
As shown in Figure 1, 3DResUNet and 3DUNet were adapted for the dose prediction. Both networks are able to restore the resolution of the high-level semantic feature map to that of the original images through skip connections, and 3DResUNet additionally uses residual connections that bypass the convolution layers to avoid vanishing gradients during backpropagation when the weights are updated and optimized.18,28 To increase the robustness of the model, batch normalization and ReLU were added to each convolution layer. The inputs to these architectures were the contours of the PTV and the OARs, including the bladder, rectum, small intestine, left and right femoral heads, and a body region that covered the whole scanning volume but was not within the previous ROIs. The output was the dose distribution of the corresponding ROIs.

Illustration of the network architectures (3D ResUNet and 3D UNet) used in this analysis. (A) The architecture of the 3D deep residual neural network (3DResUnet); (B) the architecture of 3DUnet.
As shown in Figure 1, ResUnet3D is composed of an encoder (left) and a decoder (right) with 4 layers, in which the encoder extracts features from the 3D matrices of the CT images, OARs, PTV, and RTDose, and the decoder performs a regression fit of these 3D features to the dose distribution. The encoder contains two 3D convolution layers (3 × 3 × 3) with a stride of 2 × 2 × 2. After the data of the upper layer pass through a 3 × 3 × 3 pooling layer with a stride of 2 × 2 × 2, the size is halved and the number of channels is increased by 2. The transposed convolution increases the size while keeping the number of channels constant. To reduce the loss of semantic information in the upsampling path, data of the same size from the downsampling path are connected to the upsampled data through skip connections. The data in the upsampling path then pass through a 3 × 3 × 3 3D convolution layer with a stride of 1 × 1 × 1, after which the transposed convolution is performed again. The last layer of the model carries out nearest-neighbor interpolation to restore the data to a size consistent with that of the initial input image.
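To make the building blocks described above concrete, the following PyTorch sketch shows one possible residual 3D convolution block with batch normalization and ReLU, followed by a transposed convolution that restores the spatial size. It is a hedged illustration and not the authors' network: the class name `ResBlock3D`, the channel counts, the 1 × 1 × 1 projection on the shortcut, and the use of a strided convolution (instead of the pooling layer mentioned in the text) for downsampling are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock3D(nn.Module):
    """Hypothetical residual block: two 3x3x3 convolutions plus a shortcut."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1, bias=False),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm3d(out_ch),
        )
        # Project the shortcut when the shape changes so both branches can be added
        if stride != 1 or in_ch != out_ch:
            self.shortcut = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm3d(out_ch),
            )
        else:
            self.shortcut = nn.Identity()
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # The residual addition gives gradients a direct path past the convolutions
        return self.act(self.body(x) + self.shortcut(x))

# Downsample with a stride-2 residual block, then upsample with a transposed convolution
block = ResBlock3D(in_ch=8, out_ch=16, stride=2)
up = nn.ConvTranspose3d(16, 16, kernel_size=2, stride=2)  # channels kept constant

x = torch.randn(1, 8, 16, 32, 32)  # (batch, channels, depth, height, width)
down = block(x)
restored = up(down)
print(down.shape, restored.shape)
```

With stride 2 every spatial dimension is halved in the encoder step, and the transposed convolution with kernel size 2 and stride 2 doubles it again, which is the size bookkeeping the skip connections rely on.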
The loss function used in the optimization was the mean absolute error (MAE; L1 loss). The Adam optimization algorithm was used to minimize the loss between the predicted dose and the clinical truth. The batch size was set to 1, and the total number of training epochs was set to 200. The stochastic gradient descent method was used to optimize the network with an initial learning rate of 0.0001. The final model was taken from the training epoch at which the loss on the validation set stopped decreasing, which indicates that the model had stopped learning useful image features and helps prevent overfitting to the training data. This final model was then run on the testing set to assess its performance. The networks were coded in the PyTorch 1.5.0 framework and run on a workstation with a GeForce RTX 2080 Ti graphics card and 64 GB of RAM.
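The training settings above can be sketched in PyTorch as follows. This is an illustrative outline under stated assumptions, not the authors' training script: the single `nn.Conv3d` layer is a stand-in for the real networks, the tensors are synthetic, the 7 input channels and the tensor sizes are arbitrary, and the loop runs 5 epochs instead of 200 so that it finishes quickly. The text mentions both Adam and stochastic gradient descent; the sketch uses Adam only.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in model (assumption): contour channels in, one dose channel out
model = nn.Conv3d(7, 1, kernel_size=3, padding=1)
criterion = nn.L1Loss()                                    # mean absolute error
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # initial learning rate 0.0001

def fake_case():
    """Synthetic (input, dose) pair with a batch dimension of 1."""
    return torch.randn(1, 7, 8, 16, 16), torch.randn(1, 1, 8, 16, 16)

train_set = [fake_case() for _ in range(4)]
val_set = [fake_case() for _ in range(2)]

best_val = float("inf")
best_state = None
n_epochs = 5  # the paper reports 200 epochs; shortened here for the demo

for epoch in range(n_epochs):
    model.train()
    for x, y in train_set:  # batch size 1: one case per step
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() for x, y in val_set) / len(val_set)

    # Keep the weights from the epoch with the lowest validation loss
    if val_loss < best_val:
        best_val = val_loss
        best_state = copy.deepcopy(model.state_dict())

model.load_state_dict(best_state)
print(f"best validation MAE: {best_val:.4f}")
```

Keeping a copy of the best-performing weights is one simple way to select the model at the point where the validation loss stops improving.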
Model Evaluation
The performance of the proposed 3DResUNet and 3DUNet models was evaluated by comparing the dose distributions and the DVH parameters of the OARs and PTV between the prediction and the clinical truth. Voxel-wise dose differences were evaluated using |δD| = |Dc − Dpre|, where Dc and Dpre denote the clinical and predicted dose of each voxel within the body, respectively. The mean and standard deviation of |δD| were calculated to evaluate the prediction bias and precision.
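The evaluation metrics above can be computed with a few lines of NumPy. The sketch below is illustrative only: the function names `abs_dose_diff` and `dvh_dx` are hypothetical, and expressing |δD| as a percentage of the prescription dose is an assumption, since the text does not state the normalization. Dx follows the definition used in this paper, the dose received by at least x% of the structure volume.

```python
import numpy as np

def abs_dose_diff(d_clin, d_pred, body_mask, prescription):
    """Mean and standard deviation of |Dc - Dpre| over voxels inside the body,
    expressed as a percentage of the prescription dose (assumed normalization)."""
    diff = np.abs(d_clin - d_pred)[body_mask] / prescription * 100.0
    return diff.mean(), diff.std()

def dvh_dx(dose, mask, x):
    """Dx: dose received by at least x% of the structure volume, that is,
    the (100 - x)th percentile of the voxel doses inside the mask."""
    return np.percentile(dose[mask], 100.0 - x)

# Synthetic example: uniform 50 Gy clinical dose, prediction 1 Gy higher everywhere
d_clin = np.full((4, 4, 4), 50.0)
d_pred = d_clin + 1.0
body = np.ones_like(d_clin, dtype=bool)
mean_diff, std_diff = abs_dose_diff(d_clin, d_pred, body, prescription=50.0)
print(mean_diff, std_diff)

# D98 and D2 of a structure whose voxel doses run from 0 to 99 Gy
ramp = np.arange(100, dtype=float)
ramp_mask = np.ones(100, dtype=bool)
print(dvh_dx(ramp, ramp_mask, 98), dvh_dx(ramp, ramp_mask, 2))
```

D98 is therefore a near-minimum dose and D2 a near-maximum dose of the structure, which is why they are sensitive to errors at the low-dose and high-dose ends of the DVH, respectively.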
Statistical Analysis
The models were built using PyTorch 1.5.0, Keras 2.4.0, and Python 3.7. The characteristics of the patients were analyzed using the Fisher exact test and the Mann-Whitney U test. Statistical analyses were performed using SPSS version 19.0 (SPSS, Inc., IBM), with P < .05 considered statistically significant.
Results
A total of 254 patients with postoperative CC were enrolled in this study, with a mean age of 54.68 ± 10.89 years. Patients were randomly divided into a training set (203 cases) and a test set (51 cases) at a ratio of 4:1. Most of the patients had squamous cell carcinoma (60.2%) and were at stage I (48.4%) or stage II (31.1%). Detailed characteristics of these patients are presented in Table 1.
Clinical Characteristics of Enrolled Patients and Images.a
P value is calculated from the univariate association test between subgroups. Mann-Whitney U test for continuous variables, Fisher exact test for categorized variables.
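The 4:1 random division of the 254 patients into 203 training and 51 test cases reported above can be reproduced in size with a short NumPy sketch. The random seed and the use of integer patient indices are assumptions for illustration; the actual assignment used in the study is not described.

```python
import numpy as np

n_patients = 254
rng = np.random.default_rng(seed=0)  # seed chosen arbitrarily for reproducibility

# Shuffle patient indices, then cut at 80% for a 4:1 split
shuffled = rng.permutation(n_patients)
n_train = int(round(n_patients * 0.8))

train_ids = shuffled[:n_train]
test_ids = shuffled[n_train:]
print(len(train_ids), len(test_ids))
```

Splitting at the patient level, as here, keeps all slices of one patient in the same set.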
Figure 2 shows the DVH comparison between the planned and predicted dose distributions with the 2 DL models for 4 patients with different prescription doses. The prescription dose was 45 Gy for the PTV in Figure 2A, 35 Gy for the PTV and 40 Gy for the CTV in Figure 2B, 45 Gy for the PTV and 50 Gy for the CTV in Figure 2C, and 50 Gy for the PTV and 55 Gy for the CTV in Figure 2D. Both DL models were able to predict the dose distribution for patients with varied dose prescriptions.

The dose-volume histogram (DVH) comparison between planned and predicted dose distributions with 2 deep learning (DL) models for 4 patients with different prescription doses. (A) Prescription dose of 4500 cGy to planning target volume (PTV); (B) prescription dose of 3500 cGy to PTV and 4000 cGy to GTV; (C) prescription dose of 4500 cGy to PTV and 5000 cGy to GTV; and (D) prescription dose of 5000 cGy to PTV and 5500 cGy to GTV.
Figure 3 shows the pixel-wise difference maps between the clinical and predicted dose distributions of 2 patients with ResUnet3D and Unet3D, respectively. The mean and standard deviation of the dose differences of all voxels within the body for each testing patient are shown in Table 2. The maximum dose difference was observed in the D98 of the rectum, with a |δD| of 5.00 ± 3.40% and 4.88 ± 3.99% for Unet3D and ResUnet3D, respectively. The minimum difference was observed in the D2 of the CTV, with a |δD| of 0.53 ± 0.45% and 0.83 ± 0.45% for ResUnet3D and Unet3D, respectively.

The pixel-wise difference maps between the clinical and predicted dose distributions of 2 patients with ResUnet3D and Unet3D, respectively.
The Clinical Indices of PTVs and OARs for 51 Cervix Cancer Patients in the Manually Optimized Plan and the Predicted Plan.
Abbreviations: CTV, clinical target volume; PTV, planning target volume; Dx%, dose received by ≥ x% of the objective structure; SI, small intestine; FR, right femoral head; FL, left femoral head; TPS, treatment planning system; OAR, organ at risk.
Discussion
In this study, 2 DL models were adapted for voxel-based dose prediction for CC patients who underwent VMAT. The 2 DL models were able to predict the dose distribution for patients with varied dose prescriptions, with the maximum dose difference observed in the D98 of the rectum, with a |δD| of 5.00 ± 3.40% and 4.88 ± 3.99% for Unet3D and ResUnet3D, respectively.
IMRT/VMAT has been widely applied in the clinic to improve target coverage and reduce the irradiated volume and toxicity of OARs for patients with CC after hysterectomy.29 However, owing to the increased number of parameters and degrees of freedom during optimization, the treatment planning time for VMAT has increased dramatically; a 10-fold increase in comparison with IMRT optimization has been reported.30 For the models suggested in this study, the automatic dose prediction time was around 5 to 10 min, which was about one-eighth to one-tenth of the manual optimization time. However, as the predicted dose distributions are not directly clinically applicable, dose mimicking is needed to translate them into deliverable plans afterward. Song et al applied the deep neural network DeepLabv3+ to predict the radiotherapy dose distribution for rectal cancer patients and demonstrated that the DL model produced a clinically acceptable dose distribution with a time saving of 13.66 to 15.76 min for planners with 1 year and 6 years of experience.31 In the study of Chen et al, a reduction in planning time from 33 to 37 min with manual planning to an average of 8.5 min with automatic CNN planning was reported for patients with CC who underwent IMRT.32
In this study, the output was a spatial dose objective specifying the desired dose per voxel, which eliminates the need to tune the complexity of dose-volume objectives.16 The dose distributions predicted with 3DUnet and 3DResUnet were all acceptable, as shown in Figure 2. The DVH curves and the edge dose of the body predicted by 3DResUnet were closer to those of the TPS than those predicted by 3DUnet. The maximum dose error of the PTV was less than 1.64% ± 1.51% and 1.27% ± 1.67% for 3DUnet and 3DResUnet, respectively; the detailed dose differences for the DVH metrics are shown in Table 2. These were better than the reported differences (1.80% ± 1.09%) with 2DUnet for prostate cancer.19
Using 2DUNet to predict the 3D dose distribution on a slice-by-slice basis, rather than predicting the real 3D volume, is challenging and may cause uncertainty, especially at the edge of the PTV.33 The 3DUnet applied in this study achieved performance similar to that reported by Zhang et al, in which the overall average MAE and maximum MAE for the predicted dose of CC patients using 3DUnet were 2.43 ± 3.17% and 3.16 ± 4.01% of the prescribed dose, respectively.34 However, a few metrics in our study showed somewhat higher dose differences, such as the D98 of the rectum and bladder, which may have resulted from the variability in plan complexity and OAR priority during optimization for different patients in the training set. This also indicates that there is still room to further improve our prediction models.
In this study, we further tested the performance of our models in predicting the dose distribution of CC patients with unusual dose prescriptions, as shown in Figure 2B and D for patients with prescriptions of 36 Gy and 60 Gy, respectively. These 2 prescriptions were not included in the training set. This also demonstrates the feasibility and flexibility of DL models for voxel-based dose distribution prediction. One limitation of this study is that the number of training cases was not large enough and that all cases were enrolled from a single institute. In addition, the depth of the 3DResUnet may be insufficient because of the limited hardware capacity.
Conclusion
In conclusion, in comparison with the 3DUNet model, the edge dose of the body predicted by the 3DResUNet model was closer to the dose distribution of the TPS. With the addition of the residual block, 3DResUNet improved the dose prediction for the PTV. The 2 DL models adapted in this study both showed feasibility and reasonable accuracy in voxel-based dose prediction for postoperative CC patients who underwent VMAT. Automatic dose distribution prediction of VMAT with DL models is of clinical significance for the postoperative management of patients with CC.
Footnotes
Acknowledgments
The authors sincerely thank everyone who worked on this paper.
Authors’ Note
Because of the simulation nature of this retrospective study, ethical approval was waived by the ethical committee of the authors’ hospital.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported in part by funding from the Major Project of Wenzhou Science and Technology Bureau (ZY2022016) and Wenzhou Municipal Science and Technology Bureau (Y20190423).
