Abstract
Background and Purpose
Artificial intelligence (AI) refers to techniques that attempt to reason like humans and mimic human behavior. Because human participation is a principal source of uncertainty in radiotherapy (RT), AI has been considered as an alternative for many human-dependent steps in RT. The aim of this work is to provide a systematic summary of the current literature on AI applications in RT and to clarify their role in RT practice from a clinical perspective.
Materials and Methods
A systematic literature search of PubMed and Google Scholar was performed to identify original articles involving AI applications in RT published from database inception to 2022. Studies were included if they reported original data and explored the clinical applications of AI in RT.
Results
The selected studies were categorized into three aspects of RT: organ and lesion segmentation, treatment planning and quality assurance. For each aspect, this review discussed how these AI tools could be involved in the RT protocol.
Conclusions
Our study revealed that AI was a potential alternative for the human-dependent steps in the complex process of RT.
Introduction
Radiotherapy (RT) is one of the most common treatment modalities for tumors.1,2 The RT workflow is a complex process consisting of several human-dependent steps that affect treatment effectiveness. Artificial intelligence (AI), a modern technology designed to think like humans and mimic their actions, appears to be a potential alternative in the following RT aspects: (1) lesion and organ contouring, (2) treatment planning, and (3) quality assurance (QA).
Lesion and organ contouring means identifying and delineating the edges of lesions and organs on hundreds of two-dimensional images. In the current clinical protocol, it is mainly done manually, which is labor-intensive and time-consuming. Although automatic or semi-automatic segmentation tools are commercially available to relieve this burden, they cannot achieve satisfactory performance. Take atlas-based automatic segmentation,3,4 a common tool, as an example. It is an image registration-based approach in which the atlas refers to reference images with organ contours. A new image is matched with the reference images using registration algorithms. Based on the registration results, the organ edges on the new image are generated by transforming the contours annotated on the reference images. The performance therefore depends on the choice of atlas and registration approach, and human checking and correction remain necessary. The drawback of the currently available automatic segmentation methods is that they do not exhibit human intelligence. With its ongoing development, AI shows the potential to mimic humans and to perform well on lesion and organ contouring.
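The atlas-based procedure described above (register the atlas to the new image, then transform the atlas contours) can be illustrated with a deliberately simplified sketch. It assumes a 1D intensity profile and a translation-only registration found by brute force; the profiles, the organ indices and the function names `best_shift` and `propagate_contour` are hypothetical and are not taken from any commercial tool.

```python
# Toy 1D illustration of atlas-based contour propagation (hypothetical data).

def best_shift(atlas, new, max_shift=3):
    """Return the integer shift that best aligns `atlas` to `new` (translation only)."""
    def mean_squared_difference(shift):
        # Compare only the samples that overlap after shifting.
        pairs = [
            (atlas[i], new[i + shift])
            for i in range(len(atlas))
            if 0 <= i + shift < len(new)
        ]
        return sum((a - b) ** 2 for a, b in pairs) / len(pairs)

    return min(range(-max_shift, max_shift + 1), key=mean_squared_difference)


def propagate_contour(atlas_labels, shift, length):
    """Move the atlas organ indices onto the new image using the registration result."""
    return [i + shift for i in atlas_labels if 0 <= i + shift < length]


# Reference (atlas) profile with its annotated organ, and a new profile of the same anatomy.
atlas_profile = [0, 0, 10, 80, 90, 80, 10, 0, 0, 0, 0, 0]
atlas_organ = [3, 4, 5]
new_profile = [0, 0, 0, 0, 10, 80, 90, 80, 10, 0, 0, 0]

shift = best_shift(atlas_profile, new_profile)
new_organ = propagate_contour(atlas_organ, shift, len(new_profile))
print(shift, new_organ)
```

In this sketch the propagated contour is only as good as the registration, which mirrors the dependence on atlas choice and registration approach noted above.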
Treatment planning is a human-computer interaction process that solves an optimization problem. The goal of the optimization is a satisfactory treatment plan (ie, the dose delivered to the tumor reaches the prescription and the dose to normal organs is as low as possible). To achieve this, a human planner gives the computer an initial optimization goal (including the minimum dose delivered to the tumor, the tolerated dose to various normal organs, optimization weights, etc). The computer updates the treatment plan parameters, such as the linear accelerator (LINAC) gantry angles and the multi-leaf collimator shapes, to approach this goal. During the process, the planner decides whether the treatment plan has reached the optimum and how to adjust the optimization goal to obtain a better plan. The quality of a treatment plan therefore depends on the planner’s experience, which introduces quality uncertainty. AI, as a machine designed to think like humans, thus becomes an alternative for making such decisions.
QA is a systematic process of determining whether a piece of equipment or a procedural step meets specified requirements. RT involves many pieces of equipment, such as the LINAC, simulator and laser positioning systems. RT also consists of various steps, including computed tomography (CT) scanning, tumor identification and treatment plan optimization. Any error in one piece of equipment or one step may cause medical risk, and QA aims to reduce the likelihood of these errors. Because humans operate QA, human-dependent factors affect QA precision. Additionally, complex QA procedures impose a labor burden on a clinic and take up equipment time that would otherwise be available for treatment. QA therefore urgently needs accuracy, efficiency, and uniform standards. AI, as a machine capable of human-like intelligence, seems a good candidate for QA.
Given its great potential in these three RT aspects, AI has been explored to improve quality, standardization, and speed.5-10 This article provides a systematic literature review of the application of AI in the above three areas, along with its promises and limitations for clinical use. The work is organized as follows. The “Search Strategy and Selection Criteria” section introduces the search strategy and selection criteria. The sections “Organ and Lesion Segmentation”, “Treatment Planning” and “QA” review the AI techniques for segmentation, treatment planning and QA respectively, and discuss their applications from the perspective of the clinical community. The sections “Limitations and Challenges” and “Prospects for the Future” discuss the challenges and prospects of AI in RT in terms of management, economics and society. The “Conclusion” section gives a conclusion.
Search Strategy and Selection Criteria
To assemble the literature relevant to this work, the authors searched PubMed and Google Scholar, from inception until the end of May 2022, for articles employing AI in RT. Specifically, we searched PubMed and Google Scholar using the following list of queries: [“Artificial Intelligence” AND “automatic segmentation”], [“Artificial Intelligence” AND “automated treatment planning” AND “radiotherapy”], [“Artificial Intelligence” AND “dose prediction”], [“Artificial Intelligence” AND “automated optimization”], [“Artificial Intelligence” AND “quality assurance”], [“Artificial Intelligence” AND “QA”], [“Artificial Intelligence” AND “patient-specific QA”], [“Artificial Intelligence” AND “machine-specific QA”] and [“Artificial Intelligence” AND “prognosis prediction”]. “Artificial Intelligence” was in turn replaced with the terms “Machine Learning”, “Deep Learning” and “Neural Network”. The authors then checked the relevance of the retrieved articles to the topic. Articles that were not related to RT were excluded from this review. Additionally, the reference lists of the selected articles were hand-searched for other relevant articles.
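As a small illustration of how the query list expands when “Artificial Intelligence” is replaced in turn by the other three terms, the sketch below enumerates the combinations. It only builds query strings under the assumption that each lead term is combined with each topic group using AND; it does not call PubMed or Google Scholar, and the names `LEAD_TERMS`, `TOPIC_GROUPS` and `build_queries` are hypothetical.

```python
from itertools import product

# Lead terms: the original term plus its three replacements.
LEAD_TERMS = [
    "Artificial Intelligence",
    "Machine Learning",
    "Deep Learning",
    "Neural Network",
]

# Topic groups copied from the query list in the text.
TOPIC_GROUPS = [
    ["automatic segmentation"],
    ["automated treatment planning", "radiotherapy"],
    ["dose prediction"],
    ["automated optimization"],
    ["quality assurance"],
    ["QA"],
    ["patient-specific QA"],
    ["machine-specific QA"],
    ["prognosis prediction"],
]


def build_queries():
    """Combine every lead term with every topic group using AND."""
    queries = []
    for lead, topics in product(LEAD_TERMS, TOPIC_GROUPS):
        terms = [lead, *topics]
        queries.append(" AND ".join(f'"{term}"' for term in terms))
    return queries


queries = build_queries()
print(len(queries))
print(queries[0])
```

Under this assumption, the nine topic groups and four lead terms give 36 query strings in total.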
The literature search was limited to English-language original research articles published in journals. Eventually, we categorized the selected articles into three groups: (1) organ and lesion segmentation, (2) treatment planning and (3) QA.
Organ and Lesion Segmentation
In RT, the segmentation of organs and lesions is used for inverse treatment planning and clinical evaluation. It reveals the spatial relationship between organs at risk (OARs) and the lesion, and provides parameters (such as volume) to calculate clinical goals. Segmentation accuracy and consistency are therefore necessary to guarantee plan quality and treatment effectiveness.11,12 Automatic OAR segmentation using AI is relatively easier than lesion delineation, since OARs are similar across patients, whereas lesions vary in shape and size from patient to patient.
Organ Segmentation
AI-based automatic segmentation of OARs has been reported,13-16 and the relevant tools are also commercially available (as shown in Figure 1). The fully convolutional network (FCN) 5 is the primary type of AI for automatic segmentation. For OARs that have high contrast with their surrounding tissues, such as the lung, eye and bladder, FCNs can achieve high accuracy. Zhu et al 13 reported an average Dice similarity coefficient (DSC) of 0.95 for the lung. A bladder DSC of 0.94 and an eye DSC of 0.91 were reported by Zhou et al. 14 OARs with small volumes and fuzzy boundaries, such as the optic chiasma, pose challenges to the segmentation task. To deal with this problem, researchers have tried to improve their network architecture designs 16 or loss functions. 16 A cross-layer spatial attention map fusion architecture 16 was proposed to enhance the network’s attention to the target area. A multi-task learning paradigm with shape constraints 17 aimed to learn well-generalizing features. The focal loss, 18 exponential logarithmic Dice loss (ELD-Loss) 19 and top-K exponential logarithmic Dice loss (TELD-Loss) 16 were introduced to solve the imbalance problem (ie, the imbalance among organs of various sizes, and the imbalance between difficult-to-segment organs and easy-to-segment organs).
Figure 1. Automatic organ segmentation by a commercial AI tool. (A-B) CT slices of the head and thorax, respectively. (C-D) CT images of the abdomen.
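To make the role of the loss functions named in the organ segmentation discussion more concrete, the following is a minimal sketch of a binary focal loss and a soft Dice loss on per-voxel foreground probabilities. It uses the standard textbook definitions and plain Python lists rather than a deep learning framework; the ELD-Loss and TELD-Loss variants are not reproduced here, and the example probabilities are hypothetical.

```python
import math


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean cross-entropy over voxels; every voxel is weighted equally."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if t == 1 else 1.0 - p  # probability assigned to the true class
        total += -math.log(p_t)
    return total / len(probs)


def binary_focal_loss(probs, targets, gamma=2.0, eps=1e-7):
    """Focal loss: down-weights well-classified (easy) voxels by (1 - p_t) ** gamma."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if t == 1 else 1.0 - p
        total += -((1.0 - p_t) ** gamma) * math.log(p_t)
    return total / len(probs)


def soft_dice_loss(probs, targets, eps=1e-6):
    """1 - soft Dice; depends on overlap, so background voxels do not dominate."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


# Hypothetical predictions: three easy voxels and one hard background voxel.
probs = [0.9, 0.8, 0.1, 0.6]
targets = [1, 1, 0, 0]

print(binary_cross_entropy(probs, targets))
print(binary_focal_loss(probs, targets))
print(soft_dice_loss(probs, targets))
```

With `gamma=0` the focal loss reduces to the ordinary cross-entropy, which is one way to see that the focusing term is what shifts the training signal toward difficult voxels.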
Lesion Segmentation
FCN plays a main role in lesion segmentation. Because tumors have diverse shapes, sizes and locations, and poor contrast with their surrounding tissues, this task is hard and relies on delicate network architectures20,21 or multimodal images.20,22,23
Jin et al 20 proposed a two-stream deep network fusion framework and a progressive semantically-nested network (PSNN) segmentation model to delineate gross target volume (GTV) for esophageal cancer on CT and positron emission tomography (PET). They achieved DSC of 79% and average surface distance (ASD) of 5.7 mm. Attention mechanisms, 21 modified ResNet 24 and a context block 25 were also adopted in an FCN to segment different tumors.
Although CT is the primary imaging modality for lesion segmentation in most published papers,26,27 it shows blurred boundaries and low contrast for certain tumors (as shown in Figure 2). PET28,29 and magnetic resonance imaging (MRI)30,31 provide complementary information. The two imaging techniques are versatile, since the different radiotracers used in PET can target different molecules, and the different sequences in MRI can highlight different tissues. To exploit the information from these multimodal images via AI, image fusion is a prerequisite. Besides their spatial alignment, data noise is also a major factor that can negatively affect lesion segmentation accuracy, because noise is usually unwanted information in images and AI is susceptible to it.32,33 More details about data fusion and its relevant preprocessing can be found in the work of Wang et al 34 and the report of Zhang et al. 35
Figure 2. Illustration of a brain tumor in a CT slice and an MR image. (A) The medical images. (B) The GTV edge (red line) on these images. Compared with CT, the brain tumor has a clearer boundary in MR. GTV is gross target volume.
Clinical Views and Applications on AI-Based Automatic Segmentation
Time saving, accuracy and consistency are the purposes of developing AI-based automatic segmentation. The currently available FCNs have been shown to speed up the delineation process and to reduce the time cost to tens of minutes or less.36-39 Geometric metrics, such as DSC, 37 Hausdorff distance (HD) 37 and average surface distance (ASD),40,41 appear in many published reports. However, a lower DSC or a higher ASD does not always reflect a bad segmentation, owing to inter-observer variation and differing guidelines among institutions. 15 Likewise, a higher DSC does not always indicate clinical acceptability. For example, GTV is a volume that needs accurate boundary definition for treatment effectiveness. A larger GTV tends to present a higher DSC, yet the automatically generated edge may still be unstable. 42 Therefore, in clinical practice, manual checking is necessary.
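The dependence of DSC on structure size can be checked with a short calculation. The sketch below assumes two square "contours" of the same size that differ by the same one-pixel boundary shift; the square shapes and the helper names `square_mask` and `dice` are illustrative assumptions, not data from the cited studies.

```python
def square_mask(side, offset=0):
    """Set of (row, col) pixels of a side x side square, shifted by `offset` columns."""
    return {(r, c + offset) for r in range(side) for c in range(side)}


def dice(a, b):
    """Dice similarity coefficient of two pixel sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# The same one-pixel boundary error applied to a small and a large structure.
small_dsc = dice(square_mask(5), square_mask(5, offset=1))
large_dsc = dice(square_mask(50), square_mask(50, offset=1))
print(small_dsc, large_dsc)
```

For a square of side n shifted by one pixel, the DSC equals (n - 1)/n, so the same boundary error yields 0.8 for the 5-pixel structure and 0.98 for the 50-pixel one. This is why DSC alone is usually read together with distance-based metrics such as HD or ASD.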
Dosimetric evaluations of automated OAR contours have shown their potential for routine clinical use. Zhu et al 43 performed a dosimetric evaluation of the automated delineation of OARs for esophageal cancer and found it clinically acceptable. Liu et al 38 found that there was no significant difference in the dose-volume parameters between manually and automatically delineated OARs for non-small-cell lung cancer radiotherapy.
Treatment Planning
Inverse treatment planning is an iterative optimization process driven by a series of optimization parameters. During manual treatment planning, these parameters, including target coverage and OAR constraints, are modified repeatedly in search of the optimal plan. To speed up this procedure while guaranteeing plan quality, supplying the optimal parameters as the initial input to the treatment planning system (TPS) can shorten the back-and-forth modifications. Some researchers also use AI to guide and supervise the optimization process. These are the roles of AI in automated treatment planning.
Automated Dose Map Prediction
The dose map prediction is the primary use of AI for treatment planning. It can be categorized as dose-volume histogram (DVH) prediction 44 and voxel-based dose prediction.45,46 From the predicted DVH and voxel-based dose, the optimization parameters (such as maximum dose and volume receiving a certain dose) can be derived.
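To show how optimization parameters such as the maximum dose and the volume receiving a certain dose follow from a voxel-based dose, the sketch below computes a cumulative DVH and a few derived values for one structure. The voxel doses are hypothetical, equal voxel volumes are assumed, and the function names are illustrative only.

```python
def cumulative_dvh(doses, bin_width=1.0):
    """Cumulative DVH: fraction of the structure receiving at least each dose level."""
    n_bins = int(max(doses) // bin_width) + 1
    levels = [i * bin_width for i in range(n_bins + 1)]
    volumes = [sum(d >= level for d in doses) / len(doses) for level in levels]
    return levels, volumes


def volume_receiving(doses, threshold):
    """V_x: fraction of the structure receiving at least `threshold` Gy."""
    return sum(d >= threshold for d in doses) / len(doses)


def mean_dose(doses):
    return sum(doses) / len(doses)


# Hypothetical voxel doses (Gy) inside one OAR, assuming equal voxel volumes.
oar_doses = [2, 5, 12, 18, 22, 25, 30, 8, 15, 40]

levels, volumes = cumulative_dvh(oar_doses, bin_width=10.0)
print(levels)
print(volumes)
print(volume_receiving(oar_doses, 20), mean_dose(oar_doses), max(oar_doses))
```

In this example, 40% of the structure receives at least 20 Gy, the mean dose is 17.7 Gy and the maximum dose is 40 Gy; values of this kind are what a planner would pass to the TPS as objectives or constraints.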
Initially, the inputs of DVH prediction models were hand-crafted features, such as OAR DVH, 47 organ volumes 48 and distance-to-target histogram (DTH).48,49 The quantity and variety of manually selected features are limited. They can hardly cover all DVH-related characteristics and thus are hard to map perfectly to the DVH. By resorting to the automatic feature extraction of neural networks, Liu et al 50 used a connected residual deconvolution network to correlate the spatial distribution of planning target volumes (PTVs) and OARs directly to the DVHs of OARs. The spatial distribution of PTVs and OARs was a multi-channel image. In each channel, the pixels were labelled with different digits to denote different OARs or PTVs. Similarly, Chen et al 51 used a ResNet-101-based network to predict the OAR DVH based on a two-channel structure image. Cao et al 52 adopted a gated recurrent unit-based recurrent neural network (GRU-RNN) to predict DVHs using the dosimetric information induced by individual beams.
Given that the predicted one-dimensional (1D) DVH lacks spatial dose distribution information, 53 AI has been explored for three-dimensional (3D) dose distribution prediction. Its common design is shown in Figure 3. Song et al 54 used the deep neural network DeepLabv3+ to predict dose distributions for rectal cancer, and invited four dosimetrists with different years of experience to conduct replanning based on the predicted dose. Their results showed that the doses predicted by DeepLabv3+ were all clinically acceptable. Using the predicted dose information saved an average replanning time of 13.66 to 15.76 min. Gronberg et al 55 proposed a 3D dense dilated U-Net architecture to predict 3D dose distributions for head and neck radiation plans. They achieved an average mean absolute difference of 2.56 Gy between the ground truth and the predicted dose distributions. A hierarchically densely connected U-net 56 was explored for automated treatment planning for head and neck patients. It proved effective in 3D dose prediction, with an error of less than 6.3% for all OARs’ maximum doses and an error of less than 5.1% for the prescription dose.
Figure 3. An example of an AI-based prediction model for dose map distribution for breast cancer. PTV is planning target volume. ROI is the region of interest.
Automated Optimization Process
Automated optimization uses AI to simulate the interaction between the TPS and human planners. Zhang et al 57 trained a reinforcement learning (RL)-based planning bot for pancreas stereotactic body radiation therapy (SBRT) plans. Their 24 test plans achieved target coverage similar to that of clinical plans while satisfying other dose constraints. Shen et al 58 developed a hierarchical virtual treatment planner network (HieVTPN) to operate a TPS to generate a treatment plan. HieVTPN consisted of three networks: Structure-Net, Parameter-Net and Action-Net. During automated optimization, the three networks were employed sequentially. Structure-Net decided which structures needed adjustment. Parameter-Net decided which parameters of the selected structures needed updating. Action-Net decided the specific adjustment for the selected parameters. HieVTPN achieved a plan score of 8.62 ± 0.83 (the best score was 9) on 59 testing prostate intensity modulated radiation therapy (IMRT) plans and a plan score of 139.07 (the best score was 150) on 5 testing prostate SBRT plans.
Clinical Views and Applications on AI-Based Automated Treatment Planning
Standardization and improved treatment plan quality are the contributions of AI-based automated treatment planning, owing to the expert experience learnt by AI. The currently available AI-based dose prediction tools can be used as guidance for planners to adjust plan optimization parameters, or can be added to a commercially available automated method to generate customized inputs to the TPS for personalized treatment plans. The AI-based optimization tools can be merged into a TPS for automated planning. Specifically, after a TPS receives the initial input parameters, the AI tools tune them until the optimization reaches a preset maximum iteration number or an acceptable convergence tolerance. The time and computing resources required for a plan, especially a complex one (such as a plan for a tumor in the head and neck), are a concern for clinical application. Furthermore, automated planning is a possible solution for adaptive radiation therapy, since it has the potential to accelerate treatment planning. 54
QA
QA in RT comprises all procedures that ensure consistency between the medical prescription and its safe fulfilment. 59 QA involves all aspects of the course of RT, and thus entails a significant workload and machine downtime. The main feasibility studies of AI for QA cover patient-specific and machine-specific QA. Prognosis prediction is also included in this section for its potential in the QA of treatment efficacy.
Patient-specific QA
Patient-specific QA measures the consistency between the delivered dose and the expected dose. By using AI to predict plan passing rate or whether the delivery errors exist in a plan, the workload of measuring and analyzing dose using a phantom can be reduced or avoided.
Interian et al 60 used a convolutional neural network (CNN) to predict the gamma passing rate from fluence maps, and obtained a mean absolute error of 0.70 ± 0.05. Similarly, Tomori et al 61 developed a 15-layer CNN to predict the gamma passing rate with the input of dose distribution, structure volumes and monitor unit values for each field. Nyflot et al 62 adopted a deep learning approach to predict the presence or absence of RT delivery errors from gamma images. In their work, the RT delivery errors were random and systematic multi-leaf collimator (MLC) errors. The deep learning approach achieved an accuracy of 77.3% in classifying plans with and without errors, and an accuracy of 64.3% in labelling plans as containing random MLC errors, containing systematic MLC errors, or error-free.
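For readers unfamiliar with the quantity these models predict, the sketch below computes a gamma passing rate for two 1D dose profiles using the standard gamma-index definition (combined dose difference and distance-to-agreement). It is a simplified illustration under stated assumptions: global normalization to the reference maximum, 3%/3 mm criteria, a 10% low-dose cutoff, no interpolation between samples, and hypothetical profiles. It is not the measurement pipeline of the cited studies.

```python
import math


def gamma_passing_rate(reference, evaluated, spacing_mm,
                       dose_tol=0.03, dta_mm=3.0, low_dose_cutoff=0.10):
    """Percentage of reference points with gamma <= 1 (global 1D gamma analysis)."""
    d_max = max(reference)
    dose_criterion = dose_tol * d_max  # global normalization (assumption)
    passed = counted = 0
    for i, d_ref in enumerate(reference):
        if d_ref < low_dose_cutoff * d_max:
            continue  # skip low-dose points, as is common in practice
        counted += 1
        # Gamma is the minimum combined distance/dose discrepancy over all evaluated points.
        gamma_squared = min(
            ((j - i) * spacing_mm / dta_mm) ** 2
            + ((d_eval - d_ref) / dose_criterion) ** 2
            for j, d_eval in enumerate(evaluated)
        )
        if math.sqrt(gamma_squared) <= 1.0:
            passed += 1
    if counted == 0:
        raise ValueError("no reference points above the low-dose cutoff")
    return 100.0 * passed / counted


# Hypothetical planned profile (arbitrary dose units, 1 mm sample spacing).
planned = [0, 10, 50, 100, 100, 100, 50, 10, 0]
shifted = [0] + planned[:-1]          # delivered profile displaced by 1 mm
scaled = [1.1 * d for d in planned]   # delivered profile 10% too hot

print(gamma_passing_rate(planned, planned, 1.0))
print(gamma_passing_rate(planned, shifted, 1.0))
print(gamma_passing_rate(planned, scaled, 1.0))
```

In this toy case the 1 mm displacement stays within the distance-to-agreement criterion, whereas the 10% dose scaling fails on the plateau and lowers the passing rate.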
Patient-specific QA test results are influenced by many factors, including machine accuracy (eg, leaf position and velocity, gantry angle and dose rate) and dose calculation precision (eg, the model used in the TPS). A few factors alone can hardly be correlated perfectly with the QA test results. This may be the reason for the low prediction accuracy in the above reports. Including more information in the model input has the potential to improve performance.
Machine-specific QA
Machine-specific QA consists of assessing the performances of all devices involved in RT, such as LINAC, CT simulator and on-board imaging equipment.
Valdes et al 63 used a support vector machine (SVM) to identify image artifacts. Naqa et al 64 reported work on predicting gantry sag, radiation field shift and MLC offset data using machine learning methods. Li et al 65 developed an artificial neural network (ANN) time-series model to predict beam symmetry, and achieved a mean square error of around 0.14. The ANN time-series model outperformed the autoregressive moving average (ARMA) model.
Prediction on Patient Efficacy and Side Effect
Prognosis prediction is not part of routine QA. It is included in this section for its potential application. Current plan evaluation metrics are 1D, such as average dose and the volume receiving a certain dose. They are derived by reducing complex treatment data and thereby discard spatial information. This has proven to be a particular problem for normal tissue complication probability (NTCP) prediction. 66 Relating the spatial dose to radiotherapy outcomes may improve the prediction accuracy, and may push QA into a new era of assuring treatment efficacy rather than just delivery quality.
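The loss of spatial information in 1D metrics can be seen in a small example. The sketch below assumes two hypothetical dose profiles across the same organ that contain identical voxel doses in different positions; the helper `longest_hot_run` is an illustrative spatial descriptor introduced here, not a clinical metric.

```python
def mean_dose(doses):
    return sum(doses) / len(doses)


def volume_receiving(doses, threshold):
    """Fraction of voxels receiving at least `threshold` Gy."""
    return sum(d >= threshold for d in doses) / len(doses)


def longest_hot_run(doses, threshold):
    """Length of the longest contiguous run of voxels at or above `threshold`."""
    longest = current = 0
    for d in doses:
        current = current + 1 if d >= threshold else 0
        longest = max(longest, current)
    return longest


# Same voxel doses (Gy), different spatial arrangement (hypothetical values).
clustered = [30, 30, 30, 5, 5, 5, 5, 5]
scattered = [30, 5, 5, 30, 5, 5, 30, 5]

for profile in (clustered, scattered):
    print(mean_dose(profile), volume_receiving(profile, 20), longest_hot_run(profile, 20))
```

Both profiles give the same mean dose and the same volume above 20 Gy, yet one contains a contiguous hot region and the other does not. A model that takes the full dose distribution as input can, in principle, distinguish the two.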
Ibragimov et al 67 proposed a 3D CNN to predict hepatobiliary toxicity from the 3D dose distribution and CT images. The prediction performance reached 0.73 in terms of the area under the receiver operating characteristic curve. Liang et al 68 built a 3D CNN to predict radiation pneumonitis grade with the input of 3D dose. Their prediction performance (ie, the area under the curve, AUC) was 0.842, better than that of three comparative multivariate logistic regression models. Similarly, Liang et al 69 correlated multi-modality data, namely 3D dose, ventilation image (VI) and functional dose (obtained by weighting the dose distribution with VI), with radiation pneumonitis grade. Their AUC was 0.874.
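Since the studies above report their results as AUC, a brief sketch of what that number measures may help. It uses the rank-based (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it, with ties counted as one half. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are required to compute the AUC")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted toxicity scores and observed outcomes (1 = toxicity).
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(scores, labels)
print(auc)
```

Here seven of the nine positive-negative pairs are ordered correctly, giving an AUC of 7/9. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation.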
Clinical Views on the AI Application for QA
AI approaches for QA are still at an early stage. Although a large number of papers, as detailed above, have reported explorations in various aspects of QA, there is no consensus on how to bring them into the clinical routine. The lack of interpretability of some AI approaches, such as deep learning, reduces the trustworthiness of their QA results. If actual QA tests are replaced by these AI-based QA tests (acting as virtual tests), the treatment risk may increase, which is contrary to the goal of performing QA in RT (ie, treatment safety).
Currently, AI-based QA approaches can be used as an early warning system for anomalous event detection, prompting radiation physicists to conduct tests. The elimination of some QA tests (eg, replacing a patient-specific QA test with a passing rate prediction) requires more research on interpretability and accuracy.
Limitations and Challenges
To date, the application of AI has been explored in many aspects of radiation treatment. Some results have been satisfactory for the clinical community, such as the segmentation of normal lung. Some research fields are still challenging, such as tumor segmentation and the automated optimization process. These technical achievements and challenges have been discussed in detail in the last subsection of each section above. In this section, we discuss the limitations and challenges of AI in terms of management, economics and society.
The lack of regulation for the use of AI in radiotherapy practice is one of the major limitations. AI is a machine with no guarantee of 100% accuracy, and its “black box phenomenon” reduces its trustworthiness for clinical practice. Human surveillance is therefore necessary, but the details of this surveillance are lacking. For example, which items need to be checked, and who is responsible for these inspections? There are no answers to these questions yet. Fundamentally, the problem is the lack of unified standards for radiation treatment practice.
The fragmented data used for training hinder the widespread use of AI. The available AI products are trained using data from several institutions, and different institutions or hospitals follow different protocols. Models trained on such fragmented data therefore cannot give an optimal solution for all hospitals.
The current AI may hinder the progress of radiation treatment. Radiotherapy, like medicine in general, is a continuously developing discipline. Most current AI products are static machines that acquire knowledge from historical labelled data. Their closed architectures limit knowledge exploration and extraction by physicians, since most physicians have no educational background in computer science. Furthermore, if AI replaces some human work (such as prognosis prediction), radiation oncologists and technicians who have just graduated will lose the opportunity to discover new problems and solutions through this kind of experience, which is not good for the progress of the discipline. In sum, it is essential to find the best way to use AI while addressing these concerns.
Uncertain market demand is limiting the translation of AI products into practice. At present, AI tools are created to mimic humans, and hence act as robots integrating knowledge from experienced and educated professionals. AI products are therefore not needed in first-class hospitals, where many experts work. The need for AI assistance seems urgent in primary hospitals, but a primary hospital is not the first choice for a cancer patient. Furthermore, patients are accustomed to seeing a doctor face-to-face and talking to a real person, and such a habit is difficult to change.
Potential unemployment is a social ramification of AI. AI may make some jobs redundant as it cuts costs and reduces pressure on clinicians. As a consequence, a large number of people who do repetitive work may be laid off, which could cause social instability.
Prospects for the Future
Although AI faces numerous unsolved problems in technology, management, economics and society, it shows promise in the practice of radiation treatment. It frees clinicians from tedious work and gives them more time to interact with patients. It cuts costs and improves the quality of medical service, and thus has the potential to bring all radiation treatment departments or centers up to a first-class level. The automation brought by AI accelerates the realization of adaptive radiotherapy.
By resorting to the feature extraction of AI, researchers can learn more about pathogenesis and other treatment-related issues. When AI moves from the current era of mimicking humans into a new era of deriving new knowledge, it may push radiotherapy, and even medicine, into a whole new world.
Conclusion
Overview of the AI Applications in RT Summarized in This Work.a
a Abbreviations: AI, artificial intelligence; RT, radiation treatment; TPS, treatment planning system.
Footnotes
Authors’ Contributions
GP Shan wrote this manuscript. SF Yu, J Zhang and ZJ Lai reviewed and analyzed the AI-related papers for radiotherapy. ZQ Xuan, BB Wang and Y Ge revised the manuscript.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Key Research and Development Program of Zhejiang Province, Number: 2024C03070; Key Cultivation Foundation for National Natural Science Foundation of China.
