Abstract
Introduction
Although long short-term memory (LSTM) networks have been tested to predict short-term respiratory motion, their performance in long-term forecasting under breathing irregularities must be assessed. We aim to evaluate and enhance the long-term prediction of internal motion from an external surrogate using subject-specific LSTM models through a hybrid, adaptive approach.
Methods
Concurrent internal navigator and external bellows respiratory-motion waveforms were acquired for ten volunteers during two four-dimensional magnetic resonance imaging (4DMRI) scans lasting 3–10 min each. Approximately 20 min intervened between the first (mid-term) and second (long-term) scan. After training on the first half of the mid-term data, subject-specific LSTM models were applied to the remaining mid-term and entire long-term datasets to predict internal waveforms. The accuracy of a model's prediction was assessed with Pearson's correlation (
Results
Compared to the native waveforms (
Conclusion
The feasibility of a novel, adaptive, subject-specific LSTM-TCC modeling approach was tested in 10 subjects, demonstrating that the high accuracy of external-to-internal motion predictions over 3–10 min can be extended to 30 min, overcoming breathing irregularities without remodeling. Further investigations of the adaptive LSTM-TCC model are warranted as a potential clinical solution.
Keywords
Introduction
In the presence of respiratory motion, radiotherapy treatments target the volume encompassing the trajectory of a moving lung or liver tumor throughout a respiratory cycle. Locating a tumor in real time would reduce the motion margin needed in the internal tumor volume (ITV) for respiratory-gated radiotherapy (RGRT) or for real-time tumor-tracking radiotherapy (RTRT).1,2 Real-time fluoroscopic imaging to track tumors throughout stereotactic body radiotherapy (SBRT) was not recommended, however, because SBRT can be associated with significant skin toxicity. 3 Over the last two decades, alternative approaches (including those employing external-to-internal correlation, semi-empirical formulas, and physical modeling) have been advanced for predicting respiratory-induced tumor motion.1,4–10 Nevertheless, it remains challenging to build predictive motion models with clinically acceptable accuracy over typical treatment timeframes. For example, CyberKnife's Synchrony system creates a patient-specific respiratory-motion model before treatment based on external- and internal-motion data pairs, and radiographic imaging is used to verify and update the model periodically during treatment. 11 However, breathing irregularities often invalidate the pre-treatment model, and several rounds of model rebuilding are often required during tumor-tracking treatments that can last up to 60 min. The lack of accurate and reliable real-time guidance hinders the practicality of using external surrogates to guide RGRT or RTRT, especially for isocentric linear accelerators, the type most commonly used in radiotherapy clinics. Consequently, real-time magnetic resonance (MR) imaging has been adopted to build MR-integrated linear accelerators (MRL) for MR-guided radiotherapy (MRgRT) as an alternative to external-to-internal motion modeling. Nevertheless, obtaining an MRL for MRgRT requires a large investment, and real-time MRgRT in intra-fractional motion management is still under development.
To date, MRL accounts for only a small portion of all clinical linear accelerators, limiting its impact on cancer treatments worldwide.
Recently, deep-learning neural networks have been introduced to radiotherapy clinics. These methods outperform conventional approaches for image reconstruction, organ segmentation, deformable image registration, treatment planning, quality assurance, and outcome prediction.12–16 In respiratory-motion management, long short-term memory (LSTM) deep-learning neural networks have so far been the most widely applied to learn and predict time-series events.17–22 LSTM-based modeling can be categorized according to prediction timeframe (near future vs long term), datasets (within a single series vs across a pair of datasets), and learning scope (population-based vs patient-specific). Examples of population-based, single-dataset, short-term predictions include (1) Lin et al, 17 who applied an LSTM network to real-time positioning management (RPM) breathing traces from 985 patients to predict the values of external breathing traces 500 ms into the future and found that the LSTM model's accuracy surpassed that of conventional machine-learning models, and (2) Lombardo et al, 23 who reported that adaptive LSTM modeling (offline + online LSTM) yielded the best prediction of superior-to-inferior tumor motion at 250, 500, and 750 ms into the future using 2D cine MRI datasets from 88 patients. Patient-specific modeling requires sufficient data points per patient. For example, Nie and Li 24 forecasted patient-specific tumor positions 125 ms and 250 ms into the future by applying an LSTM network to 4DMRI image libraries (960 time points with 240 3D images per patient) of lung tumors, and superimposed the projected tumor volumes onto the next MR 2D cine frame in the beam's-eye view for MR-guided RGRT delivery. For short-term, subject-specific, cross-dataset prediction, Wang et al 19 studied the external-to-internal motion correlation of seven volunteers using an LSTM network with 6500 data points per subject to predict internal motion 450 ms into the future (RMSE ≤ 1.0 mm) to reduce gating latency.
These studies have demonstrated convincing evidence that LSTM networks can accurately predict short-term tumor motion and reduce tumor-tracking latency in motion management during radiotherapy.
The accuracy of the subject-specific LSTM model's prediction of internal motion from external-motion data has not been explored over timeframes—such as the 5 to 30 min needed for radiotherapy treatments—extending well beyond the collection period for obtaining training data. Additionally, clinical applications will require robust LSTM models that incorporate mechanisms for adapting to changes in breathing patterns. Breathing irregularities commonly occur among cancer patients due to changes in their breathing behavior. An adaptive LSTM model aims to enhance the model stability over a long timeframe, such as a 30-min treatment fraction in radiotherapy, in which new breathing variations would occur, allowing non-interruptive, high-quality prediction without lengthy remodeling processes. Previously, maximization of the time-domain cross-correlation (TCC) was shown to be effective in correcting the phase shift between external and internal respiratory-motion waveforms, and the identified phase shifts tended to be stable over long timeframes.9,10 The TCC method, therefore, provides a conventional standard against which the performance of an LSTM model can be compared. More broadly, the long-term accuracy of LSTM models and methods for introducing adaptation are of general interest in deep-learning research and clinical applications. 25
In this IRB-approved study of 10 healthy volunteers, we aimed to develop an adaptive LSTM model that can maintain high performance over the radiotherapy treatment timeframe (∼30 min). We first tested the LSTM network's ability to predict the internal navigator motion from the external bellows waveform in the data that immediately followed the model-training dataset, namely the second half of the mid-term timeframe (3-10 min). The LSTM model's performance was compared to the correlation between the native external and internal waveforms and the correlation after correcting the phase shift between the native waveforms through the conventional TCC method.9,10 An effective method was developed to select hyperparameters to create a high-performance LSTM model. We further assessed long-term predictions using data from the second scan (20-30 min after the training data) using both the LSTM models and a hybrid approach that—through the application of the TCC method to check and adjust LSTM model predictions—enabled the LSTM models to adapt to new breathing variations that were not present in the training dataset. The Pearson correlation coefficient (
Materials and Methods
Concurrent External and Internal Respiratory-Motion Waveforms of 10 Subjects
Under an IRB-approved protocol, external bellows and internal navigator waveforms were acquired concurrently from 10 healthy volunteers during two 4DMRI scans in 2015–2016. The duration was 9 ± 3 min (mid-term) for the first scan and, after an intervening 20 ± 5 min, 12 ± 2 min for the second scan (long-term). The bellows was placed about 5 cm inferior to the xiphoid process, and the navigator (3 × 3 × 6 cm³) interrogated the position of the right diaphragm's dome. The MR bellows was an airbag placed under a fixed-length Velcro strip around the belly; it measured air-pressure changes during respiration, forming an external surface-motion waveform. An MR navigator defined a region of interest in which the MR signal was triggered and acquired as a one-dimensional image. When the navigator was placed on a high-contrast interface, such as the diaphragm, the position of the interface could be detected from the MR navigator images, generating an internal-motion waveform. The setup of the bellows and navigator is illustrated in Figure 1. The timestamps (in ms) of the scan log files were used to synchronize the two waveforms, and the 496 Hz bellows signal was down-sampled to match the 20 Hz navigator signal. Because both the navigator and bellows waveforms were collected at arbitrary scales, Z-score normalization was applied: the waveforms were normalized to have a mean of zero and a standard deviation of one. Because the navigator waveform served as the internal-motion signal for acquiring and binning images in the retrospective 4DMRI scans, 26 the navigator scan was paused during the acquisition of 4DMRI slices. Owing to breathing irregularities, the waiting time to capture a particular not-yet-scanned phase became longer. Consequently, most of the waveform data were acquired toward the end of the two 4DMRI scans, and the length of time contained within the waveforms was therefore shorter than the duration of the 4DMRI scans.
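As an illustration of the synchronization and normalization steps, the following Python sketch resamples a high-rate signal onto lower-rate timestamps and then applies Z-score normalization. It is not the code used in the study. The function names and the toy data are hypothetical, linear interpolation is assumed for the down-sampling (the resampling method is not specified), and the population standard deviation is assumed for the Z-score.

```python
from bisect import bisect_left
from statistics import fmean, pstdev


def resample_linear(src_times, src_values, dst_times):
    """Linearly interpolate a signal at dst_times (src_times strictly increasing)."""
    out = []
    for t in dst_times:
        i = bisect_left(src_times, t)
        if i == 0:                      # at or before the first sample
            out.append(src_values[0])
        elif i == len(src_times):       # after the last sample
            out.append(src_values[-1])
        else:
            t0, t1 = src_times[i - 1], src_times[i]
            w = (t - t0) / (t1 - t0)
            out.append(src_values[i - 1] + w * (src_values[i] - src_values[i - 1]))
    return out


def zscore(values):
    """Rescale to zero mean and unit (population) standard deviation."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical timestamps in ms: bellows at 496 Hz, navigator at 20 Hz.
bellows_t = [i * 1000.0 / 496.0 for i in range(4960)]   # about 10 s of data
bellows_v = [0.001 * t for t in bellows_t]              # toy ramp signal
navigator_t = [i * 50.0 for i in range(200)]            # 20 Hz -> 50 ms steps

# Bellows signal on the navigator time base, in Z-score units.
bellows_20hz = zscore(resample_linear(bellows_t, bellows_v, navigator_t))
```

After this step both waveforms share one time base and one dimensionless scale, so correlation and RMSE values are comparable across subjects.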

Experimental setting and workflow: (A & B) navigator (red box) and bellows (blue) placements, (C) MR-compatible bellows, and (D) respiratory-correlated (RC) 4DMRI scanning (dash-line box), correlation modeling (dash-dot-line box), and model testing workflows in both mid-term and long-term time frames.
This study adhered to the relevant Equator guidelines, specifically the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD), 27 for study design, method development, and result reporting.
Long Short-Term Memory (LSTM) Networks for Subject-Specific Training and Testing
LSTM deep-learning networks were employed to build subject-specific models that predict internal organ motion from external respiratory waveforms. Following Z-score normalization, each of the 10 subjects’ first set of concurrent motion waveforms (mid-term: 3-10 min) was split into two halves, and the first half of both waveforms was used to train the LSTM networks. Each half of the mid-term waveforms contained between 88 s × 20 Hz = 1760 and 300 s × 20 Hz = 6000 data points. Testing was performed on the second half of the scan, wherein the trained models were deployed to predict the internal waveform from the external waveform. Because the ground truth was present in both training and testing datasets, the accuracy of the predicted waveform was evaluated by calculating the root mean square error (RMSE) between the predicted and actual waveforms. The trained models were subsequently tested—without retraining—on the entire long-term dataset (obtained 20-30 min after the mid-term dataset), and the prediction was again compared to the ground truth. A laptop computer (Intel® Core™ i7-5500U CPU 2.40 GHz, RAM 8.00 GB, Intel® HD Graphics 5500 GPU 4.0 GB) was used in the data analysis.
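The split-and-score procedure can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's implementation: the helper names are hypothetical, and an odd trailing sample (if any) is simply dropped so that the two halves are equal in length.

```python
import math


def split_halves(external, internal):
    """Split paired waveforms into equal-length training and testing halves."""
    n = min(len(external), len(internal)) // 2
    train = (external[:n], internal[:n])
    test = (external[n:2 * n], internal[n:2 * n])
    return train, test


def rmse(predicted, actual):
    """Root mean square error between two equal-length waveforms."""
    if len(predicted) != len(actual):
        raise ValueError("waveforms must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def points_per_half(seconds, fs=20):
    """Number of samples in one half of a mid-term waveform sampled at fs Hz."""
    return int(seconds * fs)


# The shortest and longest halves quoted in the text.
shortest, longest = points_per_half(88), points_per_half(300)   # 1760 and 6000
```

Because the waveforms are Z-score normalized, the RMSE computed this way is expressed in units of standard deviation rather than millimeters.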
Common Hyperparameters for LSTM Model Training
The machine-learning toolbox in MATLAB (version 2023a) was used to construct the LSTM network architecture, which contained the following layers: a sequence-input layer with input size 1, an LSTM layer with 150 hidden units, a fully connected layer with output size 100, a batch-normalization layer, a rectified linear unit (ReLU) layer, a dropout layer with 80% probability of randomly setting input elements equal to 0, an additional fully connected layer with output size 1, and a regression-output layer. Additionally, the optimizer was set to stochastic gradient descent with momentum to allow random selection of initial points, the maximum number of epochs was 50, the gradient threshold was 1, and shuffling was not applied.
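To put the size of this architecture in perspective relative to the 1760 to 6000 training points available per subject, the number of learnable parameters can be estimated from the layer sizes listed above. The Python sketch below is a back-of-the-envelope calculation that was not reported in the study. It assumes the standard parameter counts for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias), for fully connected layers (weights plus biases), and for batch normalization (one scale and one offset per channel); the function names are illustrative.

```python
def lstm_params(input_size, hidden_units):
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * hidden_units * (input_size + hidden_units + 1)


def fc_params(n_in, n_out):
    """Weight matrix plus one bias per output."""
    return n_in * n_out + n_out


def batchnorm_params(channels):
    """One learnable scale and one learnable offset per channel."""
    return 2 * channels


layers = {
    "LSTM (1 -> 150)": lstm_params(1, 150),                  # 91200
    "fully connected (150 -> 100)": fc_params(150, 100),     # 15100
    "batch normalization (100)": batchnorm_params(100),      # 200
    "fully connected (100 -> 1)": fc_params(100, 1),         # 101
}
total_params = sum(layers.values())                          # 106601

for name, count in layers.items():
    print(f"{name}: {count}")
print("total:", total_params)
```

The ReLU, dropout, sequence-input, and regression-output layers carry no learnable parameters, so they do not appear in the count. Under these assumptions the single-layer network has roughly 1e5 parameters, which is far more than the number of training samples and is consistent with the heavy (80%) dropout used.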
Because many of the major LSTM configuration parameters (including the number of LSTM layers, hidden units per layer, epochs, learning rate, and time lags) do not have obvious optimal settings, 17 their values were selected from commonly used ranges. Although an exhaustive optimization of hyperparameter values was not undertaken, a limited exploration of the mini-batch size (10, 20, 50), initial learn rate (0.001, 0.005, 0.01), learn rate drop factor (0.05, 0.1, 0.2), learn rate drop period (5, 10, 15, 50), and number of LSTM layers (1, 2, 4) was performed to find an optimal, subject-specific LSTM setting near the reported optimal range. 17 Specifically, 12 unique sets of hyperparameters, each with five replicates, were tested for every volunteer, amounting to 60 LSTM models per volunteer. The individual model that minimized the RMSE between the predicted and actual waveforms during training was subsequently applied to the testing datasets. The workflow is shown in Figure 1D.
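The scale of this limited search, and the rule for picking one model per volunteer, can be sketched in Python as follows. The dictionary keys and the result records are hypothetical names introduced for illustration, and the RMSE values are made up; only the candidate values and the minimum-training-RMSE rule come from the text.

```python
from itertools import product

# Candidate values listed in the text (key names are illustrative).
search_space = {
    "mini_batch_size": (10, 20, 50),
    "initial_learn_rate": (0.001, 0.005, 0.01),
    "learn_rate_drop_factor": (0.05, 0.1, 0.2),
    "learn_rate_drop_period": (5, 10, 15, 50),
    "num_lstm_layers": (1, 2, 4),
}

# A full factorial grid would contain 3 * 3 * 3 * 4 * 3 = 324 combinations;
# the study tested 12 of them, each with five replicates (60 models per volunteer).
full_grid = list(product(*search_space.values()))


def select_best(results):
    """Return the trained model record with the lowest training RMSE."""
    return min(results, key=lambda r: r["train_rmse"])


# Made-up records for one volunteer (group, replicate, training RMSE).
example_results = [
    {"group": 1, "replicate": 1, "train_rmse": 0.41},
    {"group": 4, "replicate": 3, "train_rmse": 0.22},
    {"group": 9, "replicate": 2, "train_rmse": 0.35},
]
best = select_best(example_results)   # the group 4, replicate 3 record
```

Selecting on the training RMSE alone keeps the testing data untouched during model selection, which is what allows the later mid-term and long-term results to serve as independent tests.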
Assessment of LSTM Model Accuracy and Selection of LSTM Hyperparameters
Pearson correlation coefficients (
Aiming for potential RGRT applications, we assessed how accurately the models predict the temporal occurrence of respiratory peaks in the internal waveforms for both the mid-term (3-10 min) and long-term (20-30 min) cases. Only non-negative peak values in predicted waveforms were considered, and each peak in the predicted waveform was paired with the peak in the actual waveform that was nearest in time. The pairing of the predicted and native internal respiratory peaks was visually verified. The temporal error (Δ
Selection of Hyperparameters for High-Performance LSTM Models
We investigated whether the RMSE values generated by the LSTM models in the training dataset could predict strong correlations when the LSTM models were applied to the testing data. For this analysis, results for all hyperparameter groups, including five replicates per group, were used, providing a sample of 600 data points (10 volunteers × 12 groups × 5 replicates). A previous study 17 examined LSTM models thoroughly with more than 44 000 searches and found them to be insensitive to the hyperparameter settings; the optimal settings reported there guided the selection of the testing groups in this study.
We set
Conventional Time-Domain Cross-Correlation (TCC) for Phase-Shift (or Time-Shift) Correction
Respiratory irregularities refer to any breathing variations, including pattern changes, baseline drifts, and changes in frequency, amplitude, and synchronization of external and internal motion waveforms. A phase shift between paired external-internal signals, leading to a reduction of their correlation, can be identified with the time-domain cross-correlation,
The Fast Fourier Transform (FFT) was also used to analyze breathing irregularities in the motion waveforms. Furthermore, changes in breathing patterns from training to testing (mid-term and long-term) datasets can be evaluated using the cross-correlation of the FFT spectra by shifting the testing datasets relative to the training dataset. Between the mid-term and long-term datasets, the plots of cross-correlation versus frequency shift were compared to determine if the breathing patterns remained similar or changed drastically.
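A minimal Python sketch of this spectral comparison is given below. It is an illustration under stated assumptions rather than the study's implementation: the function names are hypothetical, a naive discrete Fourier transform stands in for the FFT (adequate for short toy signals only), the mean is removed before transforming, and the cross-correlation is normalized by the norms of the two magnitude spectra.

```python
import cmath
import math


def magnitude_spectrum(wave):
    """One-sided magnitude spectrum via a naive DFT (O(N^2), for illustration)."""
    n = len(wave)
    mean = sum(wave) / n
    centered = [v - mean for v in wave]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(centered))
        spectrum.append(abs(coeff) / n)
    return spectrum


def spectral_cross_correlation(reference, test, max_shift):
    """Normalized cross-correlation of two spectra versus bin shift of `test`."""
    norm = math.sqrt(sum(a * a for a in reference) * sum(b * b for b in test))
    scores = {}
    for shift in range(-max_shift, max_shift + 1):
        total = 0.0
        for k, a in enumerate(reference):
            j = k + shift
            if 0 <= j < len(test):
                total += a * test[j]
        scores[shift] = total / norm if norm else 0.0
    return scores


# Toy example: the "testing" waveform breathes faster than the "training" one.
n = 64
training = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]   # peak at bin 4
testing = [math.sin(2 * math.pi * 6 * i / n) for i in range(n)]    # peak at bin 6

scores = spectral_cross_correlation(
    magnitude_spectrum(training), magnitude_spectrum(testing), max_shift=5)
best_shift = max(scores, key=scores.get)   # bins by which the testing spectrum is offset
```

One bin corresponds to fs/N Hz, where fs is the sampling rate and N the number of samples. A cross-correlation that peaks near zero shift suggests an unchanged breathing frequency, whereas a peak at a large shift, or a flattened curve, suggests a changed pattern.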
A Hybrid, Adaptive Approach Using TCC to Enhance the LSTM Model Stability
Owing to possible large breathing variations (eg, changes in breathing patterns) that were not captured in the training dataset, a residual phase shift may arise between the actual internal waveform and the waveform predicted by the LSTM model in the long-term timeframe. Previously, it was reported that phase-shift correction was an effective approach for improving external-to-internal motion correlation.9,10 To evaluate whether weak correlations could be rescued, the time (phase) shift between each actual internal waveform and its paired model-predicted waveform was corrected by maximizing the TCC. By correcting possible residual phase shifts in the mid-term and long-term testing datasets that were not present in the training data, this hybrid approach (LSTM-TCC) may achieve near real-time model adaptation without requiring retraining. Again, Pearson correlation, RMSE values, and temporal error of inspiratory peaks were used to evaluate the hybrid, adaptive approach, similar to that for the LSTM model assessment described earlier (see Figure 1D).
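The residual-shift correction can be sketched in Python as follows. This is a simplified illustration, not the study's implementation: the function names are hypothetical, the Pearson correlation over the overlapping samples is assumed as the cross-correlation measure at each lag, the search window of ±1 s is an arbitrary choice, and both waveforms are assumed to have the same length and the 20 Hz sampling rate.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_time_shift(reference, signal, fs=20.0, max_shift_s=1.0):
    """Find the delay of `signal` relative to `reference` that maximizes correlation."""
    max_lag = int(round(max_shift_s * fs))
    best_lag, best_corr = 0, -2.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:   # compare reference[i] with signal[i + lag]
            ref, sig = reference[:len(reference) - lag], signal[lag:]
        else:
            ref, sig = reference[-lag:], signal[:len(signal) + lag]
        corr = pearson(ref, sig)
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag / fs, best_corr


def apply_shift(signal, shift_s, fs=20.0):
    """Advance `signal` by shift_s seconds, padding the edge with the end sample."""
    lag = int(round(shift_s * fs))
    if lag > 0:
        return signal[lag:] + [signal[-1]] * lag
    if lag < 0:
        return [signal[0]] * (-lag) + signal[:lag]
    return list(signal)


# Toy example: a predicted waveform that lags the actual one by 3 samples (0.15 s).
actual = [math.sin(2 * math.pi * 0.25 * i / 20.0) for i in range(400)]
predicted = [actual[0]] * 3 + actual[:-3]

shift_s, corr_at_shift = best_time_shift(actual, predicted)
corrected = apply_shift(predicted, shift_s)
```

In this toy case the detected shift is 0.15 s, and the correlation between the actual and corrected waveforms is higher than before the correction. With only a few dozen candidate lags to evaluate, the search itself is inexpensive compared with retraining a model.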
The correlation and RMSE values between the native internal and external waveforms provided a baseline against which the TCC method, LSTM models, and hybrid approach were compared. The native waveforms were also employed to calculate baseline temporal errors in locating inspiratory peaks. To compare various methods to the native strategy and with each other, single-tailed, exact Wilcoxon signed-rank tests were employed to evaluate the significance of differences among Pearson correlation coefficients, RMSE values, and average temporal errors. Exact, single-tailed Mann-Whitney U tests were used to compare the distributions of temporal errors generated by the various methods. Statistical significance was
Results
Limited Hyperparameter Search for Training the LSTM Models
Twelve sets of hyperparameters were investigated, as shown in Table 1. Although multiple sets of hyperparameters generated low RMSE values between the predicted and actual internal waveforms in the training dataset for most volunteers, the hyperparameters in group 4 yielded the lowest RMSE values across all volunteers. Consequently, modeling results using group 4 hyperparameters were used to evaluate the performance of the LSTM models on the testing datasets.
Tested Ranges of LSTM Hyperparameters.
Twelve parameter groups (each with five hyperparameters) were used for LSTM model training. The shaded boxes highlight differences between groups of hyperparameters.
The lengths of the datasets ranged from 88 s to 300 s for the training and mid-term testing datasets, and from 39 s to 326 s for the long-term datasets (Table 2). By design, the training and mid-term testing datasets were equal in length. The long-term testing data were shorter than the mid-term data for six volunteers and longer for the others. Because they contained multiple respiratory cycles, all of the waveforms were sufficiently long for testing. On average, for each second of training data, 0.48 s of training time—or about half the length of the training data—was needed to train the LSTM model with group 4 hyperparameters. Consequently, 96 ± 34 s were needed, on average, to train the LSTM models. By contrast, the trained LSTM models required just 0.2 s on average to predict the entire length of the testing data. Because clinical applications require forecasting only a few breathing cycles into the future, the LSTM models’ near real-time prediction could be further hastened by shortening the prediction window.
The Length of Training and Testing Waveform Datasets (20 Hz), the Average Time for LSTM Model Training and Prediction Using the Group 4 Hyperparameters among Five Replicates, and the Mean Motion Range (mm) and External-Internal Time Shift (s) in the Two Timeframes.
Note: substantial differences in subjects 4 and 7 are highlighted in bold.
The average time of long-term prediction is the same as that for mid-term prediction.
Validation of the LSTM Models
The LSTM models were validated by testing the trained models on the same data upon which they were trained. Correlation and RMSE values were found for four different methods: native, TCC, LSTM, and LSTM-TCC (Table 3). The external-to-internal motion correlations obtained for the LSTM-predicted waveforms (
Correlation and RMSE Values for the Training Dataset (Mid-Term) for Validation among Four Different Approaches.
For all subjects, the correlation coefficient increased in the order of Native < TCC < LSTM ≤ LSTM-TCC.
Bold italic font indicates no difference between the Hybrid approach and the LSTM models alone.
As a control, the hybrid approach was also applied to the training data, and only negligible residual time shifts (0.15 s for volunteer 7 and 0.00-0.05 s for all others) were found, resulting in a trivial improvement (at most 1.1% increase in correlation,
Internal-Motion Predictions in the Mid-Term (3-10 min) and Long-Term (20-30 min) Periods
Applied to the mid-term external waveforms, the trained LSTM models predicted internal waveforms that correlated strongly with the actual internal waveforms and always exceeded the correlation obtained for the native waveforms and the TCC method: Whereas the LSTM models achieved acceptable correlations (
Correlation Values for the Mid-Term (3-10 min) and Long-Term (20-30 min) Testing Datasets.
The LSTM model achieved an acceptable correlation (
*The time shift found in the testing data was applied to the external waveforms of the mid-term and long-term testing datasets. Bold numbers are acceptable correlations (C ≥ 0.8).
When applied to the long-term datasets, the LSTM method failed to achieve

Heat maps portray the correlations achieved by the LSTM and LSTM-TCC models for each of the 12 hyperparameter groups (with five replicates in each group). The LSTM models predict well (
The Ability of the Training RMSE Values to Identify High-Performance LSTM Models
The LSTM training RMSE values were strongly correlated with the LSTM predictions in the mid-term scan (
Predicting the Temporal Occurrence of Inspiratory Peaks in the Actual Internal Waveform
The waveforms predicted by the LSTM models recapitulated the qualitative shapes of the actual internal waveforms with high fidelity (Figure 3). To quantify the accuracy of predicting the occurrence of inspiratory peaks in the internal waveforms, subject-specific distributions were found for the temporal errors in locating peaks (Figure 4).
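The peak-pairing procedure behind these temporal errors can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's code: the function names are hypothetical, a peak is taken to be a simple local maximum (noisy waveforms would need smoothing or a minimum peak spacing), and the toy waveforms are synthetic.

```python
import math
from statistics import median


def find_peaks(wave, fs=20.0, min_value=None):
    """Return times (s) of local maxima; optionally keep only values >= min_value."""
    times = []
    for i in range(1, len(wave) - 1):
        if wave[i - 1] < wave[i] >= wave[i + 1]:
            if min_value is None or wave[i] >= min_value:
                times.append(i / fs)
    return times


def temporal_errors(predicted_peaks, actual_peaks):
    """Absolute time difference between each predicted peak and the nearest actual peak."""
    return [min(abs(tp - ta) for ta in actual_peaks) for tp in predicted_peaks]


# Toy example: a 0.25 Hz "internal" waveform and a prediction delayed by 4 samples (0.2 s).
fs = 20.0
actual = [math.sin(2 * math.pi * 0.25 * i / fs) for i in range(400)]
predicted = [actual[0]] * 4 + actual[:-4]

# Only non-negative peaks of the predicted waveform are considered, as in the Methods.
errors = temporal_errors(find_peaks(predicted, fs, min_value=0.0),
                         find_peaks(actual, fs))
median_error = median(errors)   # approximately 0.2 s for this toy case
```

At a 20 Hz sampling rate, temporal errors computed this way are resolved in steps of 0.05 s.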

Comparison of the actual (red) diaphragmatic-motion trajectories and those predicted (blue) by the LSTM model for three volunteers (1, 5, 7) in (A) mid-term (5-10 min) and (B) long-term (20-30 min) tests. The peaks of the predicted waveforms are marked by blue dots, and the corresponding peaks in the actual waveforms by black dots. All plots illustrate accurate peak prediction (less than 0.18±0.15 s), despite some spatial uncertainty. (C) Fast Fourier Transform (FFT) spectra of the waveforms in the training (Tr), mid-term (MT), and long-term (LT) timeframes. (D) Cross-correlation of the FFT spectra of the MT and LT waveforms versus the Tr waveform, obtained by shifting the MT and LT spectra relative to the fixed Tr spectrum. For subjects 1 and 5, the FFT spectra and cross-correlations appear similar between timeframes, whereas for subject 7, both plots (C, D) show substantial changes between the long-term and mid-term waveforms.

Temporal errors in box plots depicting the median (horizontal, red lines), quartile, and range of the predicted inspiratory peaks in reference to the navigator waveforms for the mid-term (top) and long-term (bottom) scans. Original (O, red), TCC method (T, green), LSTM model (L, blue), and hybrid LSTM-TCC approach (H, purple). The dashed, horizontal line indicates an error of 0.25 s. The results of statistical comparisons are also shown, and the number of asterisks indicates the degree of certainty (*
In the mid-term scan, the LSTM models located the peaks more accurately than the native waveforms for nine out of 10 volunteers (Figure 4A) and more accurately than the TCC method in all 10 subjects. Although small residual time shifts were detected in the mid-term data for six volunteers, the sizes of the observed differences in the temporal errors were small between the LSTM and LSTM-TCC approaches (Figure 4A).
In the long-term scan, the LSTM models located peaks more accurately than the native waveforms for six volunteers, but the temporal errors were significantly worse for volunteer 4 (Figure 4B). The LSTM models were more accurate than the TCC method for seven volunteers, and no statistically significant difference was found for the other three subjects. In the six volunteers for whom a residual time shift was detected during the long-term scan, the hybrid approach was more accurate than the LSTM models for three subjects (Figure 4B). Notably, the hybrid approach could rescue the poor performance of the LSTM models alone for volunteer 4. Major breathing variations arising in the long-term timeframes, therefore, necessitate the application of the hybrid LSTM-TCC approach to maintain an acceptable performance level.
The LSTM models yielded median temporal errors of up to 0.25 s for eight out of ten subjects in the mid-term timeframe and for nine subjects in the long-term timeframe. The hybrid approach produced median temporal errors below 0.25 s for all subjects in the mid-term scan and for nine out of 10 subjects in the long-term scan. For volunteer 7, the one exception to the hybrid approach achieving the 0.25-s benchmark, the median temporal error was 0.35 s. For this single subject, the shape of the external waveform changed considerably between the mid-term and long-term scans, rendering the relationship between the internal and external waveforms learned from the training dataset less valid.
In the mid-term scan, the mean error in predicting the timing of peaks for all 10 subjects was 0.76 ± 0.61 s for the native waveforms, 0.67 ± 0.53 s for the TCC method, 0.18 ± 0.12 s for the LSTM models, and 0.17 ± 0.10 s for the hybrid approach. In the long-term scan, the average errors were 0.38 ± 0.37 s for the native waveforms, 0.39 ± 0.22 s for the TCC method, 0.18 ± 0.15 s for the LSTM models, and 0.15 ± 0.11 s for the hybrid approach.
Discussion
Although LSTM models have been employed to study time-series processes in radiotherapy, their use for long-term predictions of external-to-internal respiratory-induced organ motion has not been fully explored, especially the accuracy and adaptability of their predictions in the presence of common breathing irregularities. LSTM models can be used to predict future motion (1) within the same waveform, or across waveforms to predict the motion in one waveform from the other; (2) for short-term, just-in-time predictions (such as the next 50-500 ms) or for relatively long timeframes (such as 5-30 min); and (3) for population-based modeling with a large subject pool, or subject-specific modeling with sufficient data per subject for a handful of individuals. In this study, we deployed LSTM models, with and without adaptation, to investigate long-term, subject-specific relationships between external and internal motions in 10 healthy volunteers, demonstrating proof of principle for the novel adaptive LSTM-TCC approach.
Advantages of Subject-Specific, Adaptive LSTM Prediction of Long-Term, Internal Motion
This study tested the ability of LSTM networks to predict internal respiratory motion from external-motion waveforms over mid-term (3-10 min) and long-term (20-30 min) timeframes. Because training datasets cannot capture the entire range of an individual's breathing patterns, we sought to develop an adaptation mechanism to address breathing variations occurring over longer timescales. Adaptation of the trained LSTM models was achieved by identifying and correcting residual phase shifts through a conventional TCC method. This near-real-time adaptive mechanism avoids time-consuming retraining, which makes it potentially useful for clinical RGRT applications. We therefore assessed not only the correlations obtained between the actual and predicted internal waveforms, but also the temporal accuracy achieved by the models in predicting inspiratory peaks, which are usually better for beam gating than the typically flat expiratory valleys.28,29
Because patients may display breathing irregularities that were not captured in the training datasets, the accuracy of the LSTM models’ predictions was expected to decline from the 3–10 min to 20–30 min periods. Indeed, within mid-term, the
The median temporal error in predicting inspiratory peaks was < 0.25 s for the mid-term timeframe in eight out of ten subjects using the LSTM models, and in all subjects using the LSTM-TCC approach. In the long-term timeframe, the hybrid approach rescued the large temporal errors yielded by the LSTM model for volunteer 4, the only subject for whom the temporal errors were statistically worse than for the native waveforms. The hybrid approach produced median temporal errors below 0.25 s in nine out of ten subjects, and the median temporal error was 0.35 s in one subject (volunteer 7) whose bellows waveform changed considerably between the first and second scans. Again, subjects 4 and 7 experienced substantial breathing pattern changes (Table 2). For respiratory gating, the temporal uncertainty can be further reduced using an automatic <0.5 s delay for beam-on/off after receiving the gating signal in Varian TrueBeam (Siemens-Varian Medical Systems, Palo Alto, CA).
Sources of Phase (Time) Shifts Between External and Internal Respiratory Motions
Phase (time) shifts between external and internal respiratory motions are common,9,10,27 reflecting the imperfect synchronization between the diaphragmatic, superior-to-inferior (SI) motion detected by the navigator and the upper-abdominal, anterior-to-posterior (AP) motion detected by the bellows. In free breathing, the diaphragmatic and intercostal muscles are the main driving forces for diaphragmatic SI and chest AP motions, respectively. Time delays—or phase shifts—therefore arise when these muscles’ motions are not precisely synchronized during respiration. Because distinct muscle groups are involved, the diaphragmatic SI motion that propagates to the abdominal AP motion (measured by the bellows device) differs from that propagating to the chest AP motion. Using the TCC method to correct the phase shift between the diaphragmatic SI motion and abdominal AP motion, therefore, enhances the correlation between the two motions (Table 2). Breathing irregularities—such as a sudden uncoupling between internal and external respiratory motions—that deviate from an individual's normal muscle motion and synchronization may corrupt the correlation and generate outliers among the temporal errors.
The various muscle groups make different contributions to respiration. Namely, abdominal-driven, thoracic-driven, or mixed breathing patterns may be observed, and, in a few minutes, a patient's respiration may drift from one pattern to another. In this study, two cases in the long-term timeframe displayed substantial breathing pattern changes, and the residual phase shifts engendered by those changes could not be accommodated by LSTM models trained on the mid-term datasets. Correcting residual phase shifts through the TCC method effectively boosted the LSTM models’ performance (Table 3). This approach constitutes a near real-time, adaptive mechanism that maintains the fidelity of LSTM models’ predictions and avoids time-consuming retraining. Because the LSTM-TCC model always outperforms or equals the performance of the LSTM model alone, the TCC serves as a potential near-real-time quality assurance step, at least in part, ensuring accurate prediction results.
Phase shifts between external and internal waveforms, quantified by phase-domain or time-domain techniques, were previously found to be steady over time for most subjects despite breathing irregularities.9,10 Whereas residual phase shifts and their impact on model performance were negligible for most participants in the present study, two subjects displayed large residual phase shifts (or time-shift changes) in the long-term timeframe (Table 2). By learning the complex, non-linear relationships between external and internal motions, the LSTM model can correct phase shifts and achieve results that are superior to those obtained by oversimplified linear correlation (such as TCC) and approximate physical modeling approaches. However, the TCC approach can take new variations into account and correct any residual phase shifts in the LSTM predictions almost instantly (<100 ms). The hybrid approach (LSTM + TCC) is therefore more accurate and reliable for internal motion prediction.
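The residual time-shift correction discussed above can be illustrated with a minimal sketch. It assumes that the correction amounts to finding the lag that maximizes Pearson's correlation between a predicted waveform and a reference waveform. The function names, the 10 Hz sampling rate, the ±1 s search window, and the synthetic sine waveforms are illustrative assumptions, not the implementation used in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(predicted, reference, max_lag):
    """Return (lag, r): the shift in samples, within +/- max_lag, that
    maximizes the correlation between predicted[i + lag] and reference[i]."""
    n = len(reference)
    best = (0, -2.0)  # any real correlation exceeds -2
    for lag in range(-max_lag, max_lag + 1):
        lo, hi = max(0, -lag), min(n, n - lag)  # overlapping index range
        r = pearson(predicted[lo + lag:hi + lag], reference[lo:hi])
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic example (assumed values): 4 s breathing period sampled at 10 Hz,
# with the predicted waveform delayed by 3 samples relative to the reference.
fs = 10.0
reference = [math.sin(2 * math.pi * i / 40) for i in range(200)]
predicted = [math.sin(2 * math.pi * (i - 3) / 40) for i in range(200)]

lag, r = best_lag(predicted, reference, max_lag=10)
shift_s = lag / fs  # residual time shift in seconds
print(f"residual shift: {shift_s:.1f} s, correlation after correction: {r:.3f}")
```

In this sketch the search window is kept well below half a breathing period so that the maximum is unambiguous; a window approaching a full period would admit a second, equally good lag.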
Selection of Hyperparameter Values During LSTM Training and the Model Stability
The performance of the LSTM model is insensitive to hyperparameter selection within the previously tested range. 17 Therefore, we limited our testing range to only the reported optimal hyperparameter settings for LSTM modeling, rather than exhaustive optimizations. Indeed, similar performance levels are obtained (Figure 2) for hyperparameter values—including the number of LSTM layers, hidden units per LSTM layer, epochs, learning rate, and forgetting rate—chosen from recommended ranges, 17 suggesting that the deep learning network may find a way to optimize its learning outcomes. It is worthwhile to note, however, that the stochastic gradient descent with momentum optimization algorithm should be used to ensure non-biased deep-learning results. The large AUROC values found in this study additionally reveal that high-performance, subject-specific LSTM models can be identified during the training period from limited sets of hyperparameter values, such as the 12 combinations used herein.
The prediction stability of the trained model depends on the variation of the testing datasets, including both mid-term and long-term timeframes. In this study, several indicators are used to characterize the breathing pattern changes in a subject, including motion amplitudes and time shifts in Table 2 and FFT spectra and cross-correlation of the spectra in reference to that of the training data in Figure 3. For subjects 4 and 7, the LSTM model predictions begin to fail (C ≈ 0.6) in the long-term testing, which is directly caused by the substantial changes in these values. The poor predictions for these two subjects in the long-term time frame are rescued by correcting the time shifts using the hybrid LSTM-TCC model. However, other breathing pattern changes, such as low-frequency baseline drifts, are also present in the testing data and reduce the accuracy of positioning prediction, especially in motion amplitude (Figure 3), beyond the capability of the hybrid model.
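One of the indicators mentioned above, the cross-correlation of a testing spectrum against the training spectrum, can be sketched as follows. This is an illustrative example only: it assumes that spectral similarity is measured as Pearson's correlation between one-sided magnitude spectra, uses a naive DFT to stay self-contained (an FFT routine such as `numpy.fft.rfft` would normally be preferred), and relies on synthetic waveforms with an assumed 4 Hz sampling rate.

```python
import cmath
import math

def magnitude_spectrum(x):
    """One-sided DFT magnitude of a mean-removed signal (naive O(n^2) DFT)."""
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    return [abs(sum(xc[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def peak_frequency(x, fs):
    """Frequency (Hz) of the largest spectral component."""
    spec = magnitude_spectrum(x)
    k = max(range(len(spec)), key=spec.__getitem__)
    return k * fs / len(x)

# Synthetic waveforms (assumed values): 128 samples at 4 Hz, i.e. 32 s.
fs, n = 4.0, 128
train = [math.sin(2 * math.pi * 0.25 * t / fs) for t in range(n)]
same_pattern = [0.7 * math.sin(2 * math.pi * 0.25 * t / fs) for t in range(n)]
new_pattern = [math.sin(2 * math.pi * 0.5 * t / fs) for t in range(n)]

ref_spec = magnitude_spectrum(train)
similar = pearson(ref_spec, magnitude_spectrum(same_pattern))
changed = pearson(ref_spec, magnitude_spectrum(new_pattern))
print(peak_frequency(train, fs), round(similar, 3), round(changed, 3))
```

An amplitude change alone leaves the spectral correlation near 1 in this sketch, whereas a change in breathing frequency drives it toward 0, which is why amplitude is tracked as a separate indicator.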
Potential Clinical Applications Using the Hybrid LSTM-TCC Modeling
This study has demonstrated that the hybrid LSTM-TCC models can predict internal organ motion with high correlation and temporal accuracy and with stable performance over 5–30 min, and thus have potential for RGRT applications. The advantages of this adaptive LSTM-TCC approach include (1) potential near real-time prediction and adaptation, whose latency (LSTM: 200 ms and TCC: 100 ms) could be further reduced in the future by GPU-based computation, with up to two orders of magnitude acceleration, 6 and (2) long-term stability within 30 min, which covers the entire treatment fraction without remodeling. This study has demonstrated the feasibility of using the new hybrid LSTM-TCC to predict internal organ motion in 20–30 min via adaptation. Using the hybrid LSTM-TCC approach (
Breathing irregularities may occur more frequently in patients than in volunteers, as cancer patients, especially lung cancer patients, may experience breathing difficulties and variations caused by pain, disease progression, and other health changes. Therefore, an adaptive approach that copes with a higher probability of severe breathing pattern changes is critical to ensure the stability of model prediction and to avoid breathing remodeling that interrupts patient treatment. 11 LSTM-TCC prediction could be performed continuously with a latency of about 300 ms, up from 200 ms for the LSTM alone because of the added TCC computation.
Additionally, the external-internal motion correlation study was based on normalized motion waveforms, and the absolute accuracy (in mm) could only be estimated from the motion ranges measured on the reconstructed RC-4DMRI images. On the other hand, respiratory gating is often based on the normalized motion range near full exhalation, with a gating threshold of 30%–50%. Normalized waveforms should therefore be suitable for evaluating the potential of LSTM-TCC models for RGRT applications in the future.
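As a simple illustration of threshold-based gating on a normalized waveform, the sketch below marks the beam as on whenever the normalized position lies within the threshold of full exhalation. The min-max normalization (0 at full exhalation, 1 at full inhalation), the function names, and the sample values are assumptions made for illustration and do not describe the normalization or gating logic used in this study.

```python
def normalize(waveform):
    """Min-max normalize so that 0 is full exhalation and 1 is full inhalation
    (assumed convention)."""
    lo, hi = min(waveform), max(waveform)
    return [(v - lo) / (hi - lo) for v in waveform]

def gate(waveform, threshold):
    """Beam-on flags: True where the normalized position is within
    `threshold` of full exhalation."""
    return [v <= threshold for v in normalize(waveform)]

def duty_cycle(beam_on):
    """Fraction of samples during which the beam would be on."""
    return sum(beam_on) / len(beam_on)

# Hypothetical predicted internal positions (arbitrary units).
predicted = [0.0, 2.0, 4.0, 2.0, 0.0, 1.0, 4.0, 1.0]

duty_30 = duty_cycle(gate(predicted, 0.30))  # 30% gating threshold
duty_50 = duty_cycle(gate(predicted, 0.50))  # 50% gating threshold
print(duty_30, duty_50)
```

Because the gating decision depends only on whether the normalized position crosses the threshold, timing errors in the predicted waveform matter more here than errors in absolute amplitude.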
Concerns and Limitations of the Hybrid LSTM Network for Motion Management
As discussed above, the conventional linear TCC method is inferior to the LSTM, a non-linear deep-learning approach, and yields less enhancement of the external-internal correlation relative to the native correlation (Table 4). However, as the LSTM may lose its predictive power when facing severe respiratory pattern variations, using the TCC method to correct any residual time shift in the predicted results becomes a safeguard for the overall results with near real-time performance (<100 ms). This hybrid approach avoids the lengthy LSTM model retraining process by correcting the major cause of external-internal correlation degradation. Other factors that affect the correlation, such as changes in the shape of the waveforms, may not be fully correctable using the linear TCC approach.
Whether breathing variations outside the training dataset result in consequences other than residual phase shifts remains to be investigated. We observed one example (volunteer 7) of considerable change in the shape of the external waveform (with changing frequencies in FFT, Figure 3), and, although an acceptable correlation was obtained, the hybrid approach yielded a median temporal error in identifying inspiratory peaks that exceeded the benchmark (<0.25 s). More subjects should be recruited and studied to determine whether the hybrid, adaptive LSTM-TCC approach is sufficiently versatile and universal to adapt to various possible breathing variations under different clinical scenarios. Changes in the shape of the bellows waveform would be readily detected, and the LSTM models could be retrained for the patients demonstrating such variation. Additionally, low-frequency baseline shifts were also observed, affecting the prediction accuracy in motion amplitude (Figure 3). Although increasing the size of the training dataset may fix this issue, as more motion variations can be included in the LSTM model, other adaptive approaches may be necessary to handle new breathing pattern changes, especially for cancer patients. Furthermore, a significantly longer timeframe—typically a week—separates a patient's simulation from their treatment. In a future study, we will investigate whether patient-specific LSTM models trained at simulation can be adapted via the hybrid LSTM-TCC approach to account for differences in breathing motions that occur on the day of treatment. Before it can be put into clinical practice, more investigations are needed to test and validate the hybrid LSTM model's stability and reliability.
Lastly, the LSTM and LSTM-TCC models’ predictions for respiratory-motion amplitudes are imprecise (Figure 3), and further improvements are required for tumor-tracking applications. Potential solutions are (1) to increase the training sample size so that more variations, including low-frequency baseline drifts, can be included in the LSTM modeling, and (2) to explore other adaptive approaches to update the LSTM model in near real-time. For respiratory gating, however, a high temporal accuracy (0.18 ± 0.15 s) of the models would be sufficient, as it suggests a high motion sensitivity required to determine the beam-on and beam-off windows. An ongoing simulation study has been initiated to characterize respiratory gating capability using both the LSTM and hybrid LSTM-TCC approaches, including accuracy, efficiency, and uncertainty for RGRT. Overall, the hybrid LSTM-TCC model has been demonstrated to be feasible as a potential adaptive strategy for combating common breathing irregularities and maintaining accurate and reliable external-to-internal motion predictions over a 30-min timeframe.
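The temporal-error benchmark for inspiratory peaks referred to above (<0.25 s) can be checked with a short sketch. It assumes that peaks are simple local maxima above a height threshold and that each measured peak is paired with the nearest predicted peak; the function names, the 10 Hz sampling rate, and the synthetic waveforms are illustrative assumptions rather than the procedure used in this study.

```python
import math
from statistics import median

def peak_times(waveform, fs, min_height):
    """Times (s) of local maxima that exceed min_height."""
    return [i / fs for i in range(1, len(waveform) - 1)
            if waveform[i] > min_height
            and waveform[i - 1] < waveform[i] >= waveform[i + 1]]

def median_peak_error(pred_times, true_times):
    """Median absolute time difference between each measured peak and the
    nearest predicted peak."""
    return median(min(abs(p - t) for p in pred_times) for t in true_times)

# Synthetic example (assumed values): 4 s breathing period sampled at 10 Hz,
# with the prediction lagging the measured waveform by 2 samples (0.2 s).
fs = 10.0
measured = [math.sin(2 * math.pi * i / 40) for i in range(200)]
predicted = [math.sin(2 * math.pi * (i - 2) / 40) for i in range(200)]

err = median_peak_error(peak_times(predicted, fs, 0.5),
                        peak_times(measured, fs, 0.5))
within_benchmark = err < 0.25  # benchmark quoted in the text
print(f"median peak error: {err:.2f} s, within benchmark: {within_benchmark}")
```

Real waveforms contain noise and irregular cycles, so a practical version would need more robust peak detection (for example, a minimum peak distance or prominence criterion) than this local-maximum rule.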
Conclusions
In this study, we investigated the LSTM network and a hybrid LSTM-TCC approach for subject-specific, cross-dataset, and mid- and long-term prediction of internal-organ motion, demonstrating a feasible strategy to predict internal organ motion with high correlation and temporal accuracy in the presence of common breathing irregularities. By correcting any residual phase/time shifts in the LSTM model predictions caused by new breathing variations beyond the training dataset, the hybrid LSTM-TCC model can help maintain near real-time, high prediction accuracy throughout a 5–30-min timeframe without remodeling. This adaptive approach could serve as a safeguard to check and correct any phase/time shifts, securing prediction performance. Further investigations are required before the hybrid LSTM approach is ready for clinical applications.
Footnotes
PACS Classification Numbers
Artificial intelligence, 07.05.Mh
Respiration, 87.19.Wx
Magnetic resonance imaging, 87.61.−c
Treatment planning, 87.55.D-
treatment strategy in, 87.55.-x
Abbreviations
Acknowledgments
This research is supported in part by the MSK Cancer Center Support Grant/Core Grant (P30 CA008748). The authors are grateful to the simulation therapists for acquiring MRI scans of all participants under an IRB-approved protocol.
Ethical Considerations
This clinical research was conducted under an MSK IRB-approved protocol (IRB 073-015, project #4), in accordance with the principles embodied in the Declaration of Helsinki and US NIH regulations. All subjects entered this study voluntarily and signed the informed consent.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Clinical Center (grant number P30 CA008748).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
