Abstract
Purpose:
To develop and evaluate an automatic intensity-modulated radiation therapy (IMRT) planning program for cervical cancer, comprising a convolutional neural network (CNN)-based prediction model and an automated optimization strategy.
Methods:
A CNN deep learning model was trained to predict a patient-specific set of IMRT objectives based on overlap volume histograms (OVH) and high-quality plans of previous patients. A total of 140 cervical cancer patients were enrolled in this study: 100 in the training set, 20 in the validation set and 20 in the testing set. The input of the model was OVH data and the output was the values of the IMRT plan objectives. For patients in the testing set, the planning objectives were predicted by the CNN model and used to automatically generate IMRT plans. Manual plans for the same patients were generated by 1 beginner planner and 1 senior planner. Dose distribution, dosimetric parameters and planning time were then analyzed. In addition, the 3 types of plans were blindly compared and ranked by an experienced oncologist.
Results:
There were almost no statistically significant differences among the 3 types of plans in target coverage and dose conformity. Dose homogeneity was slightly worse in automatic plans, while the mean dose and dose-volume parameters for most organs-at-risk (OARs) were lower. In comparison with manual plans by the beginner planner in particular, V40 of the bladder and rectum decreased by 6.3% and 12.3%, and the mean doses to the rectum and marrow were 1.1 Gy and 1.8 Gy lower with automatic plans (each P < 0.017). In the blinded comparison, automatic plans were chosen as the best plan in 14 of 20 cases.
Conclusions:
For cervical cancer, automatic IMRT plans optimized from the CNN-generated objectives achieved superior OAR dose sparing without compromising target dose. The automatic workflow also significantly reduced planning time.
Introduction
IMRT has been widely used for external beam radiotherapy (EBRT) of cervical cancer. 1,2 Compared with 3-dimensional (3D) conformal radiotherapy (3DCRT), IMRT generally improves target coverage and organs-at-risk (OARs) dose sparing. 3,4 However, obtaining an optimal IMRT plan is a demanding and time-consuming task because of the irregular shapes of the target volumes and surrounding OARs. 5 Plan quality usually depends heavily on the individual planner’s skill and experience, and it may vary between treatment centers. 6 Many clinically acceptable IMRT plans could still be optimized further, which could potentially improve treatment outcomes.
Automatic planning is a promising solution for producing high-quality clinical treatment plans consistently and efficiently, regardless of planner skill and experience. 7-26 Since Xing et al demonstrated a significant improvement in inverse planning efficiency from an iterative algorithm for automatically selecting the importance factors, 7 many commercial software packages (AutoPlan in Pinnacle, 11,15 RapidPlan in Eclipse 12,14,19,20 and Multi-Criteria Optimization in RayStation 13,16 ) and in-house systems 8,9,17 have been developed for automatic planning, continuously improving the quality and consistency of treatment plans. Recent developments and breakthroughs in machine learning have found many applications in medicine 27-29 and radiotherapy, such as toxicity prediction, 30 automatic segmentation 31 and quality assurance. 32 Novel machine learning-based dose-volume histogram (DVH) prediction 33 and 3D dose distribution prediction 21-24,34 have attracted increasing attention for developing better automatic planning techniques. Ma et al developed a support vector regression (SVR) model for DVH prediction for prostate cancer patients. 25 A voxel-based dose prediction and dose mimicking method was proposed to generate complete treatment plans without user interaction. 21 Similarly, Mardani et al proposed a dose prediction model based on the convolutional neural network (CNN), which can be used to derive a contour-to-dose relation map. 18 DoseNet 22 and ResNet 23 have also demonstrated success in 3D dose distribution prediction.
The CNN is a deep learning algorithm capable of extracting nonlinear higher-order features from input data through a hierarchical learning process. 35 Although useful in 3D spatial dose prediction, 21-24 CNN models have often been difficult to implement clinically because of the technical complexity of training networks to map patient anatomy, represented by raw CT images and structure contours, directly to ideal dose distributions. The conversion from a 3D spatial dose prediction to a clinically deliverable plan is also challenging. 24-26 As a result, previous results have been hard to reproduce and are rarely readily applicable in a standard clinical setting. On the other hand, DVH prediction is relatively simple and easier to achieve, since the DVH itself is an intrinsic generalization of the relationship between patient anatomy and dose distribution. With the improved feature extraction provided by a CNN, DVH predictions from patient anatomy represented by target and OAR data alone can theoretically be more accurate. 36 Reports of fast and effective prediction of patient-specific IMRT optimization objectives by deep learning methods remain very limited. Nonetheless, if the patient-specific IMRT objectives can be predicted successfully, clinically deliverable plans could be obtained with a single TPS optimization in any commercial system. In addition, few attempts have been made to explore the benefit of automatic IMRT planning for planners with different levels of planning experience. 19 This study developed an automatic IMRT planning workflow based on a novel CNN-based DVH prediction model. Overlap volume histograms (OVH), which characterize the spatial relationship between the target volume and the OARs, 37 were used as the input for a CNN deep learning prediction model. IMRT plan objectives from the model outputs were subsequently applied to generate plans automatically, fully driven by the native Pinnacle3 scripting technology.
The feasibility and the efficacy of the automatic planning (AP) workflow were evaluated by comparing dosimetry against the corresponding manual plans from a veteran planner and a novice.
Materials and Methods
A CNN-based deep learning model was trained to predict patient-specific IMRT objectives for cervical cancer. The automatic plans were generated from these predicted objectives. The flowchart of the automatic planning program is shown in Figure 1.

Figure 1. Flowchart of the proposed automatic planning program. CXCA represents cervical cancer.
Patient Data and Treatment Planning
A total of 140 cervical cancer patients treated with IMRT from 2018 to 2019 in our institution were selected for this study. They were immobilized in the supine position with a thermoplastic mask. Planning CT scans were acquired with a Brilliance CT (Big Bore, Philips Medical Systems Inc., Cleveland, OH, USA) with a slice thickness of 5 mm.
The clinical target volume (CTV) and OARs including the bladder, rectum, bowel, bilateral femoral heads and marrow were delineated by an attending oncologist. The prescription was 48.6 Gy (1.8 Gy per fraction) to the planning target volume (PTV), generated by a 1 cm uniform expansion of the CTV. Additional contours were created: (1) PTV-0.3, the PTV contracted by 3 mm; (2) Def1 and Def2, 5-mm-wide rings at distances of 0.5 cm and 1 cm from the PTV; (3) Def3, the external contour excluding the PTV expanded by 1.5 cm.
All IMRT plans were designed in the Pinnacle3 treatment planning system (TPS) (version 9.2, Philips Radiation Oncology Systems, Madison, WI) for an Elekta Synergy accelerator. Seven equally spaced coplanar 6 MV photon beams were employed and optimized with direct machine parameter optimization (DMPO) for all IMRT plans. The maximum number of segments, minimum segment area and minimum segment MU were set to 70, 8 cm², and 8, respectively.
A total of 120 patients were randomly chosen to build the model. IMRT expert plans were generated by a planner with more than 10 years of experience. Planning began with the preset objectives (Table 1). Fifteen parameters (marked with “X”) were adjustable by the planner, who iteratively improved the plan quality until the OAR doses were as low as possible without compromising PTV coverage.
Table 1. IMRT Objectives Set for the Expert Plans.
X represents an adjustable value.
The Prediction Model Using CNN Deep Learning
The OVH(r) for an OAR is defined as the fraction of the OAR volume that overlaps the PTV after the PTV has been expanded (r > 0) or contracted (r < 0) by a margin of r in mm. In this study, OVHs for 7 OARs (bladder, rectum, bowel, femoral_L, femoral_R, marrow_L and marrow_R) were extracted in the TPS over a distance range from -2 cm to 3 cm with a step size of 1 mm. Thus, 51 OVH values were calculated for each OAR, and a total of 357 (7 × 51) input values were obtained for each patient. For supervised learning, the final values of the 15 parameters from the expert plans were used as the training output values.
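The OVH definition above can be illustrated with a minimal sketch. It assumes that a signed distance from each OAR voxel to the PTV surface is already available (negative inside the PTV) and that all voxels have equal volume; the function name and the toy distances are hypothetical and are not taken from the TPS extraction used in this study.

```python
def overlap_volume_histogram(signed_distances_mm, r_values_mm):
    """Fraction of OAR voxels lying inside the PTV expanded by r (contracted if r < 0).

    Assumption: signed_distances_mm holds, for each OAR voxel, its signed
    distance to the PTV surface in mm (negative means inside the PTV).
    A voxel is inside the PTV expanded by r exactly when its distance <= r.
    """
    n_voxels = len(signed_distances_mm)
    if n_voxels == 0:
        raise ValueError("OAR contains no voxels")
    return [
        sum(1 for d in signed_distances_mm if d <= r) / n_voxels
        for r in r_values_mm
    ]

# Margins from -2 cm to 3 cm in 1 mm steps, as in the text: 51 values per OAR.
R_VALUES_MM = list(range(-20, 31))

# Hypothetical toy OAR with five voxels.
toy_distances_mm = [-25.0, -5.0, 0.0, 12.0, 40.0]
ovh = overlap_volume_histogram(toy_distances_mm, R_VALUES_MM)

# Seven OARs with 51 values each give the 357 model inputs per patient.
n_inputs = 7 * len(R_VALUES_MM)
```

By construction the curve is non-decreasing in r, and an OAR that lies close to the PTV rises toward 1 at smaller margins than a distant one.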
The same 120 patients were randomly divided into training and validation sets (100/20). As shown in Figure 2, the CNN model contained 2 convolution layers, 2 activation (ReLU) layers, 2 max pooling layers and 2 fully connected layers. 38 The specifications of these layers are given in Table 2. Adam, a robust optimizer, was applied with a learning rate of 0.005. 39 The batch size was set to 20 and the number of iterations to 10000. The mean square error (MSE) loss function was applied to ensure stable learning. The CNN model was implemented in TensorFlow using Spyder (version 3.31). 40 It was trained on a personal computer with an Intel(R) Core(TM) i5-7200U 2.5 GHz CPU and 4 GB of RAM. The whole training time was about 40 minutes.
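To make the data flow through such a network concrete, the following is a shape-tracing sketch of a forward pass with 2 convolution, 2 ReLU, 2 max pooling and 2 fully connected layers, mapping the 7 × 51 OVH input to 15 outputs. It is written in plain NumPy rather than TensorFlow, uses random untrained weights, and every layer size (16 and 32 filters, kernel width 3, pool size 2, 64 hidden units), as well as the choice to treat the 7 OARs as input channels, is an assumption made for illustration; it does not reproduce the specification in Table 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Valid 1D convolution. x: (C_in, L), w: (C_out, C_in, K), b: (C_out,)."""
    c_out, _, k = w.shape
    length = x.shape[1] - k + 1
    out = np.empty((c_out, length))
    for i in range(length):
        # Contract the (C_in, K) window against every filter at once.
        out[:, i] = np.tensordot(w, x[:, i:i + k], axes=([1, 2], [0, 1])) + b
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling along the length axis."""
    n = x.shape[1] // size
    return x[:, :n * size].reshape(x.shape[0], n, size).max(axis=2)

def dense(x, w, b):
    return w @ x + b

def init(shape):
    return rng.normal(0.0, 0.1, size=shape)

# Hypothetical layer sizes (assumptions, not the values of Table 2).
params = {
    "w1": init((16, 7, 3)), "b1": np.zeros(16),
    "w2": init((32, 16, 3)), "b2": np.zeros(32),
    "w3": init((64, 32 * 11)), "b3": np.zeros(64),
    "w4": init((15, 64)), "b4": np.zeros(15),
}

def forward(ovh, p):
    h = max_pool(relu(conv1d(ovh, p["w1"], p["b1"])))  # (16, 49) -> (16, 24)
    h = max_pool(relu(conv1d(h, p["w2"], p["b2"])))    # (32, 22) -> (32, 11)
    h = relu(dense(h.ravel(), p["w3"], p["b3"]))       # (64,)
    return dense(h, p["w4"], p["b4"])                  # (15,) objective values

def mse(pred, target):
    """Mean square error, the loss named in the text."""
    return float(np.mean((pred - target) ** 2))

ovh_input = rng.random((7, 51))   # stand-in for one patient's OVH data
prediction = forward(ovh_input, params)
```

In training, the MSE between `prediction` and the 15 expert-plan parameter values would be minimized with an optimizer such as Adam; that step is omitted here.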

Figure 2. Schematic diagram of the CNN architecture.
Table 2. Specification of the CNN Model.
Automatic Planning and Manual Planning for Patients in Test Dataset
The patient-specific IMRT objectives derived from the CNN model were used to generate automatic plans (abbreviated as AP) for the remaining 20 patients in the test data set. A maximum of 80 iterations was allowed and no further adjustments were made once the AP was completed.
Meanwhile, manually optimized plans for the same patients were generated by a beginner planner (MP1, <1 year of experience) and a senior planner (MP2, >5 years of experience). The basic optimization parameters of the manual plans were the same as those of the AP, as shown in section 2.B. However, the planners were allowed to generate additional contours, set all objectives freely and run as many optimization iterations as they deemed necessary.
Plan Evaluation and Statistical Analysis
For quantitative comparison, the 3 sets of plans (AP, MP1 and MP2) were normalized such that 95% of the PTV volume received the prescription dose. The average DVHs for the PTV and all delineated OARs were calculated and analyzed for the 3 sets.
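The normalization step can be sketched as a single global rescaling of the dose. The snippet below assumes that the PTV dose is available as a list of per-voxel values with equal voxel volumes; the helper names and the toy doses are hypothetical, and the TPS performs the equivalent operation internally.

```python
import math

def dose_at_volume(doses_gy, volume_percent):
    """Dose received by at least volume_percent of the voxels (e.g. D95)."""
    ordered = sorted(doses_gy, reverse=True)
    k = max(math.ceil(len(ordered) * volume_percent / 100.0), 1)
    return ordered[k - 1]

def normalize_to_d95(doses_gy, prescription_gy):
    """Rescale all doses so that D95 equals the prescription dose."""
    scale = prescription_gy / dose_at_volume(doses_gy, 95.0)
    return [d * scale for d in doses_gy], scale

# Hypothetical PTV with 20 voxels; prescription of 48.6 Gy as in this study.
ptv_doses_gy = [46.0, 47.0] + [49.0] * 18
normalized, scale = normalize_to_d95(ptv_doses_gy, 48.6)
covered_fraction = sum(1 for d in normalized if d >= 48.6 - 1e-9) / len(normalized)
```

Because the whole plan is rescaled, the same factor `scale` would also apply to every OAR dose, which is why normalization must precede the OAR comparison.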
For the PTV, the dose to 2% of the volume (D2), the dose to 98% of the volume (D98), the mean dose (Dmean), the conformity index, CI = (V_prescription in PTV / V_PTV) × (V_prescription in PTV / V_prescription), and the homogeneity index, HI = (D2 - D98) / D_prescription, were evaluated. Here V_prescription in PTV is the PTV volume receiving at least the prescription dose, V_PTV is the PTV volume and V_prescription is the total volume receiving at least the prescription dose. For the bladder, rectum and bowel, V30, V40, V45 and Dmean were evaluated. For the marrow and the femoral heads (each combining the bilateral structures), Dmean was analyzed. The average planning duration and monitor units (MU) per fraction were also included for comparison.
Shapiro-Wilk tests were performed for all relevant parameters to verify that the differences were normally distributed. Paired t tests were carried out (between AP and MP1, AP and MP2, MP1 and MP2) for the dosimetric parameters described above, planning time and MU. Statistical Package for the Social Sciences (SPSS 21.0; SPSS Inc., Chicago, IL, USA) was used to perform these tests, and P < 0.017 after the Bonferroni correction was considered statistically significant.
Furthermore, the plan quality for each patient was evaluated by another experienced oncologist, taking clinical judgment into account. All AP, MP1 and MP2 plans were blindly reviewed and ranked from best to worst.
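The two indices can be written directly as functions. The sketch below follows the CI and HI definitions given in the text; the function names and all numerical inputs are hypothetical illustration values, not results from this study.

```python
def conformity_index(v_rx_in_ptv, v_ptv, v_rx):
    """CI = (V_rx_in_PTV / V_PTV) * (V_rx_in_PTV / V_rx); 1.0 is ideal.

    v_rx_in_ptv: PTV volume receiving at least the prescription dose
    v_ptv:       total PTV volume
    v_rx:        total volume receiving at least the prescription dose
    """
    return (v_rx_in_ptv / v_ptv) * (v_rx_in_ptv / v_rx)

def homogeneity_index(d2_gy, d98_gy, prescription_gy):
    """HI = (D2 - D98) / D_prescription; smaller means a more uniform dose."""
    return (d2_gy - d98_gy) / prescription_gy

# Hypothetical volumes in cm^3 and doses in Gy.
ci = conformity_index(v_rx_in_ptv=950.0, v_ptv=1000.0, v_rx=1100.0)
hi = homogeneity_index(d2_gy=51.5, d98_gy=47.9, prescription_gy=48.6)
```

The first CI factor penalizes under-coverage of the target and the second penalizes prescription dose spilling outside it, so a plan can only approach 1.0 by doing well on both.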
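The significance threshold follows from dividing the family-wise error rate by the number of pairwise comparisons. The sketch below assumes a family-wise level of 0.05 (not stated explicitly in the text, but consistent with 0.05 / 3 ≈ 0.017) and computes only the paired t statistic and its degrees of freedom; the P value itself was obtained with SPSS in this study. The sample values are hypothetical.

```python
import math
from statistics import mean, stdev

def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance level after Bonferroni correction."""
    return alpha / n_comparisons

def paired_t_statistic(sample_a, sample_b):
    """Paired t statistic and degrees of freedom for matched samples."""
    if len(sample_a) != len(sample_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(sample_a, sample_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Three pairwise comparisons: AP vs MP1, AP vs MP2, MP1 vs MP2.
threshold = bonferroni_threshold(0.05, 3)

# Hypothetical mean rectum doses (Gy) for five patients, AP versus MP1.
ap_dose = [30.1, 28.4, 31.0, 29.5, 27.8]
mp1_dose = [31.2, 29.9, 31.8, 31.0, 28.5]
t_stat, dof = paired_t_statistic(ap_dose, mp1_dose)
```

A negative `t_stat` here indicates lower doses with AP; the result would be called significant only if the corresponding two-sided P value fell below `threshold`.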
Results
Average DVH Curves
Average DVH curves of the PTV and delineated OARs for the AP, MP1 and MP2 test plans are shown in Figure 3. For the PTV, AP had slightly higher doses than MP1 and MP2 above 50 Gy, while no obvious difference was found below 50 Gy. The slopes of MP1 and MP2 were slightly steeper than that of AP, indicating slightly better dose homogeneity. For the bladder, AP resulted in noticeably lower doses in the region from 20 Gy to 50 Gy. For the rectum, AP had lower doses between 30 Gy and 45 Gy and was similar to both MPs in other regions. For the bowel, femoral heads and marrow, AP in general showed lower doses, although the difference was less noticeable than for the bladder and rectum. In addition, MP2 showed lower doses than MP1 for most OARs, except for the bowel.

Figure 3. Average DVHs for AP (black line), MP1 (red line) and MP2 (blue line) in the test set.
Dosimetric Quality Metric Comparison and Analysis
DVH parameters are summarized in Table 3 for quantitative comparison. The PTV maximum dose D2 and mean dose Dmean were slightly higher in AP than in MP1 and MP2 (P < 0.017), while the minimum doses D98 were comparable (P > 0.04). The differences in target conformity were small among the 3 sets of plans. However, the target homogeneity of AP was slightly worse than that of MP1 and MP2 (both P < 0.017).
Table 3. Summary of DVH Parameters of Interest for the PTV and OARs.
For OARs, most DVH parameters for AP were lower than or equal to those of the 2 manual plan sets. Compared with MP1, for instance, the V30 and V40 of the bladder decreased by 8.7% and 6.3%, while the V40 and V45 of the rectum decreased by 12.3% and 10.2%, respectively. The mean doses to the bladder, rectum, bowel, femoral heads and marrow were lower in AP by 0.5 Gy, 1.1 Gy, 0.2 Gy, 0.3 Gy and 1.8 Gy, respectively. Compared with MP2, AP had a 0.8 Gy (P < 0.017) lower mean marrow dose, while the doses to the other OARs were comparable. For the remaining parameters, AP performed slightly better, although most of the differences were not statistically significant (P > 0.017).
In addition, most DVH parameters for OARs were lower for MP2 than for MP1.
Blind Comparison by Oncologist
A blind comparison by an experienced oncologist was performed for each patient to rank the 3 types of plans. Both DVHs and isodose distributions were carefully examined. The result is illustrated in Figure 4. In 70% of the cases (14 out of 20 patients), AP was rated the best, while MP2 was deemed the best for the other cases. MP1 was considered least favorable for all patients.

Figure 4. Blind comparison by an experienced attending radiation oncologist. Green, yellow and white tiles represent the most, intermediate and least favorable plan for each patient, respectively.
MU and Planning Time Comparison
The average MU of the AP and MP plans was calculated. The average MU for AP was 772, which was 4.5% higher than that of MP1 (739) and 9.5% higher than that of MP2 (705) (both P < 0.017). In practice, such an increase in MU was considered clinically acceptable given the improved plan quality.
All APs were completed within a single optimization cycle, while the MPs were iteratively optimized for as many cycles as needed. The average planning time of AP was 8.5 minutes, including OVH extraction. In comparison, the beginner planner and the senior planner spent 33 minutes and 37 minutes on average on the planning process, respectively (both P < 0.017).
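The quoted percentages are relative increases with the manual plan as the reference, which can be checked with a short calculation. The function name is hypothetical; the MU values are those reported above.

```python
def percent_increase(new_value, reference_value):
    """Relative increase of new_value over reference_value, in percent."""
    return 100.0 * (new_value - reference_value) / reference_value

mu_ap, mu_mp1, mu_mp2 = 772, 739, 705

increase_vs_mp1 = percent_increase(mu_ap, mu_mp1)  # (772 - 739) / 739
increase_vs_mp2 = percent_increase(mu_ap, mu_mp2)  # (772 - 705) / 705
```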
Discussion
Automatic treatment planning has been introduced to improve plan quality, and it is usually implemented with one of 2 general techniques. The first is based on an optimization algorithm, which automatically attempts to generate an acceptable plan that meets the preset clinical criteria by adjusting optimization parameters. 15 The commercial AutoPlan from Pinnacle3 uses this approach. 11,15 The second approach, known as knowledge-based planning (KBP), 41 aims to predict DVH objectives and/or the 3D dose distribution for a new patient by generalizing from a library of previously treated patients with similar target and OAR anatomical features. RapidPlan in Eclipse 12,14,19,20 is an example of this method. Both techniques can consistently generate high-quality plans, even compared with experienced planners.
In this study, a new automatic IMRT planning program for cervical cancer patients was developed based on native Pinnacle3 scripts, with the IMRT objectives predicted by machine learning. This approach has shown certain advantages in recent studies. Patient-specific 3D dose distributions for head-and-neck cancer treatments were successfully predicted by deep learning techniques 21-24 and then used as guidelines for plan optimization. Compared with DVH and 3D dose distribution prediction, there have been few reports of direct prediction of IMRT objectives. In our study, a CNN prediction model was implemented to predict patient-specific IMRT objective settings. OVHs that represent PTV-OAR relationships were extracted and used as the input data for the CNN model. Using the OVH information greatly reduced the amount of data needed for the CNN, such that the model can be implemented on a standard office PC without the need for a GPU or other cost-prohibitive computational resources. The entire training process for our study was less than 1 hour on a Windows PC with a 2.4 GHz CPU and 16G RAM. Patient-specific sets of IMRT objectives were successfully predicted by the CNN model and used to generate plans automatically with scripts in the Pinnacle3 TPS. Although optimized in only a single cycle, the dosimetric quality of these plans was comparable to or better than that of manually created plans from planners of different experience levels. Compared with manual plans from the beginner planner, all OARs received statistically significantly lower doses. In particular, the mean doses to the rectum and marrow decreased by 1.1 Gy and 1.8 Gy, respectively, and the V40 of the bladder and rectum decreased by 6.3% and 12.3%. Although the target homogeneity of the PTV slightly decreased and the mean MU increased by 4.5% with AP, consistent with previous studies, 42,43 this was considered acceptable in exchange for the improvement in plan quality.
Compared with manual plans from the senior planner, the AP plan quality was slightly superior or comparable, but most differences were statistically insignificant. However, the consistency of plans optimized with our approach is expected to be better, since the process needs little to no manual intervention. Consequently, the average planning time was significantly reduced.
Few studies have focused on the benefit of automatic IMRT planning for planners with different levels of planning experience. One study showed that KBP, without any manual intervention, effectively improved IMRT plan quality for left-sided breast cancer relative to manual plans from less experienced planners. 19 Our study demonstrated that plan quality correlates strongly with the planner’s individual experience and skill level. In most cases, better OAR sparing was achieved by the senior planner while the target dose parameters were comparable. Our study revealed that clinically acceptable IMRT plans can be suboptimal and could be improved further, especially plans generated by inexperienced planners. In such cases, automatic planning could be particularly beneficial as a tangible, patient-specific plan quality guideline. The feasibility of this new automatic program was also supported by the blind comparison by an experienced oncologist, in which 70% of APs were chosen as the best.
However, this study had some limitations. First, although OVH may be sufficient for generalizing the anatomical features of cervical cancer patients, this index alone could be inadequate for characterizing patients with tumors at other sites. Second, our current dataset may not be large enough to include all anatomies that could present in the clinic. The robustness of the CNN model is questionable for such outliers not “seen” during training. Finally, this study only included IMRT plans. Volumetric modulated arc therapy (VMAT) is increasingly used in the treatment of cervical cancer, and a library of VMAT plans would be more representative for follow-up studies. 44
Conclusion
In conclusion, a robust automatic IMRT treatment planning program for cervical cancer was developed based on a CNN deep learning model and native Pinnacle3 scripts. The results suggest that this automatic program can generate high-quality IMRT plans consistently. The program will mostly benefit clinics with entry-level planners by improving plan quality. For senior planners, it may improve planning efficiency.
Footnotes
Abbreviations
Authors’ Note
This study was approved by the ethics committee of Fujian Cancer Hospital (approval no. SQ2016-048-01). All patients provided written informed consent prior to enrollment in the study.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The project was sponsored by Fujian Provincial Health Technology Project (Grant No: 2017-ZQN-15, 2018-ZQN-19 and 2017-CX-8), Fujian Provincial Department of Science and Technology (Grant No: 2016Y0018), and Science and Technology program of Fujian Province, China (Grant No: 2018Y2003).
