Sage Journals: Discover world-class research

Abstract

Introduction

Although long short-term memory (LSTM) networks have been tested to predict short-term respiratory motion, their performance in long-term forecasting under breathing irregularities must be assessed. We aim to evaluate and enhance the long-term prediction of internal motion from an external surrogate using subject-specific LSTM models through a hybrid, adaptive approach.

Methods

Concurrent internal navigator and external bellows respiratory-motion waveforms were acquired for ten volunteers during two four-dimensional magnetic resonance imaging (4DMRI) scans lasting 3–10 min each. Approximately 20 min intervened between the first (mid-term) and second (long-term) scan. After training on the first half of the mid-term data, subject-specific LSTM models were applied to the remaining mid-term and entire long-term datasets to predict internal waveforms. The accuracy of a model's prediction was assessed with Pearson's correlation (C), referenced to the native waveforms, maximized through the time-domain cross-correlation (TCC), and enhanced by correcting residual phase shifts in the LSTM models using a hybrid (LSTM-TCC) approach. Hyperparameter selection by minimizing the root mean square error (RMSE) to identify high-performance (C ≥ 0.8) LSTM models was evaluated by the area under the receiver operating characteristic curve (AUROC). The temporal accuracy of inspiratory-peak predictions was characterized.

Results

Compared to the native waveforms (C = 0.42 ± 0.28) and TCC method (C = 0.77 ± 0.09), the LSTM models yielded more accurate predictions (C = 0.89 ± 0.07) in the mid-term scans. Over 20–30 min, LSTM predictions faltered (C < 0.80) in two subjects but were rescued by LSTM-TCC (C = 0.90 ± 0.09). The temporal error in predicting inspiratory peaks was smaller for LSTM-TCC (Δt = 0.15 ± 0.11 sec) than LSTM (Δt = 0.18 ± 0.15 sec). RMSE reliably identified high-performance models: $\mathrm{AUROC}_{\mathrm{LSTM}}^{\text{mid-term}} = 0.82$, $\mathrm{AUROC}_{\mathrm{LSTM}}^{\text{long-term}} = 0.74$, and $\mathrm{AUROC}_{\mathrm{hybrid}}^{\text{long-term}} = 0.83$.

Conclusion

The feasibility of a novel adaptive, subject-specific LSTM-TCC modeling approach was tested in 10 subjects, demonstrating that the high accuracy of external-to-internal motion predictions over 3–10 min can be extended to 30 min, overcoming breathing irregularities without remodeling. Further investigations of the adaptive LSTM-TCC model are warranted as a potential clinical solution.

Keywords

artificial intelligence, external-internal motion correlation, LSTM and hybrid LSTM, deep learning, respiratory motion prediction, motion management for treatment planning

Introduction

In the presence of respiratory motion, radiotherapy treatments target the volume encompassing the trajectory of a moving lung or liver tumor throughout a respiratory cycle. Locating a tumor in real time would reduce the motion margin needed in the internal tumor volume (ITV) for respiratory-gated radiotherapy (RGRT) or for real-time tumor-tracking radiotherapy (RTRT).^1,2 However, because stereotactic body radiotherapy (SBRT) can be associated with significant skin toxicity,³ real-time fluoroscopic imaging to track tumors throughout SBRT was not recommended. Over the last two decades, alternative approaches (including those employing external-to-internal correlation, semi-empirical formulas, and physical modeling) have been advanced for predicting respiratory-induced tumor motion.^1,4–10 Nevertheless, it remains challenging to build predictive motion models with clinically acceptable accuracy over typical treatment timeframes. For example, CyberKnife's Synchrony system creates a patient-specific respiratory-motion model before treatment based on external- and internal-motion data pairs, and radiographic imaging is used to verify and update the model periodically during treatment.¹¹ However, breathing irregularities often violate the pre-treatment model, and several rounds of model rebuilding are often required during tumor-tracking treatments that can last up to 60 min. The lack of accurate and reliable real-time guidance hinders the practicality of utilizing external surrogates to guide RGRT or RTRT, especially for isocentric linear accelerators, the type most commonly used in radiotherapy clinics. Consequently, real-time magnetic resonance (MR) imaging has been adopted to build MR-integrated linear accelerators (MRL) for MR-guided radiotherapy (MRgRT) as an alternative to external-to-internal motion modeling. Nevertheless, obtaining an MRL for MRgRT requires a large investment, and real-time MRgRT in intra-fractional motion management is still under development. To date, MRL accounts for only a small portion of all clinical linear accelerators, limiting its impact on cancer treatments worldwide.

Recently, deep-learning neural networks have been introduced to radiotherapy clinics. These methods outperform conventional approaches for image reconstruction, organ segmentation, deformable image registration, treatment planning, quality assurance, and outcome prediction.^12–16 In respiratory-motion management, long short-term memory (LSTM) deep-learning neural networks have so far been the most widely applied to learn and predict time-series events.^17–22 LSTM-based modeling can be categorized according to prediction timeframes (near future vs long term), datasets (within single series vs across a pair of datasets), and learning scope (population-based vs patient-specific). Examples of population-based, single-dataset, short-term predictions include the following: (1) Lin et al¹⁷ applied an LSTM network to real-time positioning management (RPM) breathing traces from 985 patients to predict the values of external breathing traces 500 ms into the future and found that the LSTM model's accuracy surpassed that of conventional machine-learning models, and (2) Lombardo et al²³ reported that adaptive LSTM modeling (offline + online LSTM) yielded the best prediction of superior-to-inferior tumor motion at 250, 500, and 750 ms into the future using 2D cine MRI datasets from 88 patients. Patient-specific modeling requires sufficient data points per patient. For example, Nie and Li²⁴ forecasted patient-specific tumor positions 125 ms and 250 ms into the future by applying the LSTM network to 4DMRI image libraries (960 time points with 240 3D images per patient) of lung tumors, and superimposed projected tumor volumes onto the next MR 2D cine frame in the beam's-eye view for MR-guided RGRT delivery. For short-term, subject-specific, cross-dataset prediction, Wang et al¹⁹ studied the external-to-internal motion correlation of seven volunteers using an LSTM network with 6500 datapoints per subject to predict internal motion 450 ms in the future (RMSE ≤ 1.0 mm) to reduce gating latency. These studies have provided convincing evidence that LSTM networks can accurately predict short-term tumor motion and reduce tumor-tracking latency in motion management during radiotherapy.

The accuracy of the subject-specific LSTM model's prediction of internal motion from external-motion data has not been explored over timeframes—such as the 5 to 30 min needed for radiotherapy treatments—extending well beyond the collection period for obtaining training data. Additionally, clinical applications will require robust LSTM models that incorporate mechanisms for adapting to changes in breathing patterns. Breathing irregularities commonly occur among cancer patients due to changes in their breathing behavior. An adaptive LSTM model aims to enhance the model stability over a long timeframe, such as a 30-min treatment fraction in radiotherapy, in which new breathing variations would occur, allowing non-interruptive, high-quality prediction without lengthy remodeling processes. Previously, maximization of the time-domain cross-correlation (TCC) was shown to be effective in correcting the phase shift between external and internal respiratory-motion waveforms, and the identified phase shifts tended to be stable over long timeframes.^9,10 The TCC method, therefore, provides a conventional standard against which the performance of an LSTM model can be compared. More broadly, the long-term accuracy of LSTM models and methods for introducing adaptation are of general interest in deep-learning research and clinical applications.²⁵

In this IRB-approved study of 10 healthy volunteers, we aimed to develop an adaptive LSTM model that can maintain high performance over the radiotherapy treatment timeframe (∼30 min). We first tested the LSTM network's ability to predict the internal navigator motion from the external bellows waveform in the data that immediately followed the model-training dataset, namely the second half of the mid-term timeframe (3-10 min). The LSTM model's performance was compared to the correlation between the native external and internal waveforms and the correlation after correcting the phase shift between the native waveforms through the conventional TCC method.^9,10 An effective method was developed to select hyperparameters to create a high-performance LSTM model. We further assessed long-term predictions using data from the second scan (20-30 min after the training data) using both the LSTM models and a hybrid approach that—through the application of the TCC method to check and adjust LSTM model predictions—enabled the LSTM models to adapt to new breathing variations that were not present in the training dataset. The Pearson correlation coefficient (C) and temporal error (Δt) in locating the inspiratory peaks of the respiratory cycles were calculated to evaluate the quality of a model's predictions.

Materials and Methods

Concurrent External and Internal Respiratory-Motion Waveforms of 10 Subjects

Under an IRB-approved protocol, external bellows and internal navigator waveforms were acquired concurrently from 10 healthy volunteers during two 4DMRI scans in 2015–2016. The duration was 9 ± 3 min (mid-term) for the first scan and, after an intervening 20 ± 5 min, 12 ± 2 min for the second scan (long-term). The bellows was placed about 5 cm inferior to the xiphoid process, and the navigator (3 × 3 × 6 cm³) interrogated the position of the right diaphragm's dome. The MR bellows was an airbag placed under a fixed-length Velcro strip around the belly, measuring air pressure changes during respiration, forming an external surface motion waveform. An MR navigator defined a region of interest, in which the MR signal can be triggered and acquired as a one-dimensional image. When the navigator was placed on a high-contrast interface, such as the diaphragm, its position could be detected from the MR navigator images, generating an internal motion waveform. The setup of the bellows and navigator is illustrated in Figure 1. The timestamps (in ms) of the scan log files were used to synchronize the two waveforms, and the 496 Hz bellows signal was down-sampled to match the 20 Hz navigator signal. Because both the navigator and bellows waveforms were collected at arbitrary scales, Z-score normalization was applied: The waveforms were normalized to have a mean of zero and a standard deviation of one. Because the waveforms were acquired during retrospective 4DMRI scans as an internal-motion signal for acquiring and binning,²⁶ the navigator scan was paused during the acquisition of 4DMRI slices. Due to breathing irregularities, the waiting time to get a particular not-yet-scanned phase became longer. Consequently, most of the waveform data were acquired toward the end of the two 4DMRI scans, and the length of time contained within the waveforms was therefore shorter than the 4DMRI scans.
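The preprocessing described above (timestamp alignment, down-sampling of the 496 Hz bellows signal to the 20 Hz navigator grid, and Z-score normalization) can be illustrated with a minimal Python sketch. The nearest-sample selection rule, the use of the population standard deviation, and all function and variable names are assumptions made for illustration; the source does not specify these details, and the waveform below is synthetic.

```python
import bisect
import math

def resample_nearest(src_times_ms, src_values, target_times_ms):
    """For each target timestamp, pick the source sample closest in time.

    Assumes src_times_ms is sorted ascending. Nearest-sample selection is an
    assumption of this sketch, not a method stated in the source.
    """
    out = []
    for t in target_times_ms:
        j = bisect.bisect_left(src_times_ms, t)
        if j == 0:
            k = 0
        elif j == len(src_times_ms):
            k = j - 1
        else:
            # Choose whichever neighbour is closer in time (ties go to the earlier one).
            k = j if (src_times_ms[j] - t) < (t - src_times_ms[j - 1]) else j - 1
        out.append(src_values[k])
    return out

def zscore(values):
    """Rescale to zero mean and unit standard deviation (population form, assumed)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]

# Synthetic 10 s example: a 0.25 Hz "breathing" sine sampled at 496 Hz.
bellows_t = [i * 1000.0 / 496.0 for i in range(4960)]
bellows_v = [3.0 + 2.0 * math.sin(2 * math.pi * 0.25 * t / 1000.0) for t in bellows_t]
navigator_t = [i * 50.0 for i in range(200)]  # 20 Hz navigator timestamps (ms)

bellows_20hz = zscore(resample_nearest(bellows_t, bellows_v, navigator_t))
```

After this step, both waveforms share the same 20 Hz time base and a common arbitrary scale, which is what allows RMSE values to be compared across subjects.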

Figure 1.

Experimental setting and workflow: (A & B) navigator (red box) and bellows (blue) placements, (C) MR-compatible bellows, and (D) respiratory-correlated (RC) 4DMRI scanning (dash-line box), correlation modeling (dash-dot-line box), and model testing workflows in both mid-term and long-term time frames.

This study adhered to the relevant Equator guidelines, specifically the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD),²⁷ for study design, method development, and result reporting.

Long Short-Term Memory (LSTM) Networks for Subject-Specific Training and Testing

LSTM deep-learning networks were employed to build subject-specific models that predict internal organ motion from external respiratory waveforms. Following Z-score normalization, each of the 10 subjects’ first set of concurrent motion waveforms (mid-term: 3-10 min) was split into two halves, and the first half of both waveforms was used to train the LSTM networks. Each half of the mid-term waveforms contained between 88 s × 20 Hz = 1760 and 300 s × 20 Hz = 6000 data points. Testing was performed on the second half of the scan, wherein the trained models were deployed to predict the internal waveform from the external waveform. Because the ground truth was present in both training and testing datasets, the accuracy of the predicted waveform was evaluated by calculating the root mean square error (RMSE) between the predicted and actual waveforms. The trained models were subsequently tested, without retraining, on the entire long-term dataset (obtained 20-30 min after the mid-term dataset), and the prediction was again compared to the ground truth. A laptop computer (Intel® Core™ i7-5500U CPU 2.40 GHz, RAM 8.00 GB, Intel® HD Graphics 5500 GPU 4.0 GB) was used in the data analysis.

Common Hyperparameters for LSTM Model Training

The machine-learning toolbox in MATLAB (version 2023a) was used to construct the LSTM network architecture, which contained the following layers: a sequence-input layer with input size 1, an LSTM layer with 150 hidden units, a fully connected layer with output size 100, a batch-normalization layer, a rectified linear unit (ReLU) layer, a dropout layer with 80% probability of randomly setting input elements equal to 0, an additional fully connected layer with output size 1, and a regression-output layer. Additionally, the optimizer was set to stochastic gradient descent with momentum to allow random selection of initial points, the maximum number of epochs was 50, the gradient threshold was 1, and shuffling was not applied.

Because many of the major LSTM configuration parameters—including the number of LSTM layers, hidden units per layer, epochs, and learning rate and time lags—do not have obvious optimal settings,¹⁷ values were selected from commonly used ranges. Although an exhaustive optimization of hyperparameter values was not undertaken, a limited exploration of the mini-batch size (10, 20, 50), initial learn rate (0.001, 0.005, 0.01), learn rate drop factor (0.05, 0.1, 0.2), learn rate drop period (5, 10, 15, 50), and number of LSTM layers (1, 2, 4) was performed to find an optimal, subject-specific LSTM setting, near the reported optimal setting range.¹⁷ Specifically, 12 unique sets of hyperparameters, each with five replicates, were tested for every volunteer, amounting to 60 LSTM models per volunteer. The individual model that minimized the RMSE between the predicted and actual waveforms during training was subsequently used when the trained LSTM models were applied to the testing datasets. The workflow is shown in Figure 1D.

Assessment of LSTM Model Accuracy and Selection of LSTM Hyperparameters

Pearson correlation coefficients (C) and RMSE values were calculated between the predicted and actual internal waveforms:

C = \frac{\sum_{i = 1}^{N} [S_{I} (t_{i}) - \bar{S}_{I}] [S_{E} (t_{i}) - \bar{S}_{E}]}{\sqrt{\sum_{i = 1}^{N} {[S_{I} (t_{i}) - \bar{S}_{I}]}^{2} \sum_{i = 1}^{N} {[S_{E} (t_{i}) - \bar{S}_{E}]}^{2}}}

(1)

RMSE = \sqrt{\frac{1}{N} \sum_{i = 1}^{N} {[S_{I} (t_{i}) - S_{E} (t_{i})]}^{2}}

(2)

where $S_{I} (t_{i})$ and $S_{E} (t_{i})$ are the two time series for the internal and external waveforms, respectively, and $\bar{S}_{I}$ and $\bar{S}_{E}$ are the corresponding mean values. The average correlation and RMSE value were also calculated across all 10 subjects.
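Equations (1) and (2) translate directly into code. The following is a minimal Python sketch using only the standard library; the function names `pearson` and `rmse` are illustrative and not taken from the source.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient C between two equal-length series (Eq. 1)."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den

def rmse(a, b):
    """Root mean square error between two equal-length series (Eq. 2)."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Perfectly proportional series correlate at +1 regardless of scale,
# whereas RMSE depends on the absolute difference between them.
c_example = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
e_example = rmse([0.0, 0.0], [3.0, 4.0])
```

Because the waveforms are Z-score normalized, the RMSE values here are in units of standard deviations rather than millimetres.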

Aiming for potential RGRT applications, we assessed how accurately the models predict the temporal occurrence of respiratory peaks in the internal waveforms for both the mid-term (3-10 min) and long-term (20-30 min) cases. Only non-negative peak values in predicted waveforms were considered, and each peak in the predicted waveform was paired with the peak in the actual waveform that was nearest in time. The pairing of the predicted and native internal respiratory peaks was visually verified. The temporal error (Δt) in the predicted peaks was then taken to be the difference in the timing of the actual and predicted peaks.
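The peak-pairing procedure can be sketched in Python as follows. The simple three-point local-maximum detector, the use of absolute time differences, and the function names are assumptions of this sketch; the source does not describe the peak detector it used, and its pairings were additionally verified visually.

```python
import math

def find_peaks(values, min_height=None):
    """Indices of simple local maxima, optionally keeping only peaks >= min_height."""
    peaks = []
    for i in range(1, len(values) - 1):
        if values[i - 1] < values[i] >= values[i + 1]:
            if min_height is None or values[i] >= min_height:
                peaks.append(i)
    return peaks

def peak_timing_errors(predicted, actual, fs_hz=20.0):
    """Absolute time difference (s) between each predicted peak and the nearest actual peak.

    Only non-negative peaks of the predicted waveform are considered, as in the text.
    """
    pred_peaks = find_peaks(predicted, min_height=0.0)
    act_peaks = find_peaks(actual)
    if not act_peaks:
        return []
    errors = []
    for p in pred_peaks:
        nearest = min(act_peaks, key=lambda a: abs(a - p))
        errors.append(abs(nearest - p) / fs_hz)
    return errors

# Synthetic check: a 4 s breathing cycle at 20 Hz, with the prediction delayed by 2 samples.
actual = [math.sin(2 * math.pi * i / 80) for i in range(400)]
predicted = [math.sin(2 * math.pi * (i - 2) / 80) for i in range(400)]
errors = peak_timing_errors(predicted, actual)  # each error is 2 samples / 20 Hz = 0.1 s
```

On real waveforms, noise can create spurious local maxima, so some smoothing or a minimum peak separation would likely be needed before pairing.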

Selection of Hyperparameters for High-Performance LSTM Models

We investigated whether the RMSE values generated by the LSTM models in the training dataset could predict strong correlations when the LSTM models were applied to the testing data. For this analysis, results for all hyperparameter groups—including five replicates per group—were used, providing a sample of 600 data points. It is worthwhile to emphasize that the LSTM model was thoroughly studied with more than 44 000 searches and found to be insensitive to the hyperparameter settings, as shown previously,¹⁷ and the reported optimal settings were applied to selecting the testing groups in this study.

We labeled LSTM models with C ≥ 0.8 in the testing data as positive (high-performance) and those with C < 0.8 as negative, and applied the receiver operating characteristic (ROC) curve to this binary classification. The area under the ROC curve (AUROC) was used to evaluate the predictive power of the training RMSE to identify LSTM models that would accurately predict (C ≥ 0.8) the internal waveform in the testing datasets. Note that C = 0.80 was used as the minimal acceptable threshold, although C ≥ 0.90 is more desirable in clinical applications. Bootstrapping with 1000 samples was used to estimate 95% confidence intervals (CI) for the AUROC, and Student's t-test was used to determine whether differences between AUROC values for the LSTM models and the hybrid approach were significant.
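The AUROC analysis can be illustrated with a small Python sketch in which the negated training RMSE serves as the score (lower RMSE should indicate a better model) and the label is whether the testing correlation reached 0.8. The pairwise (Mann-Whitney) formulation of the AUROC, the percentile bootstrap, the function names, and the toy numbers are all assumptions for illustration; they are not the authors' implementation or data.

```python
import random

def auroc(scores, labels):
    """AUROC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUROC."""
    if all(labels) or not any(labels):
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # skip resamples that lost one of the two classes
        stats.append(auroc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy values (not study data): training RMSE and testing correlation for 8 models.
train_rmse = [0.20, 0.25, 0.30, 0.35, 0.50, 0.55, 0.60, 0.70]
test_corr = [0.95, 0.90, 0.85, 0.70, 0.88, 0.75, 0.60, 0.65]
labels = [c >= 0.8 for c in test_corr]
scores = [-r for r in train_rmse]  # negate so that a higher score means a better model

area = auroc(scores, labels)  # 15 of 16 positive-negative pairs are correctly ordered
ci_low, ci_high = bootstrap_ci(scores, labels, n_boot=200)
```

The pairwise loop is quadratic in the number of models; for the 600 models in this analysis a rank-based computation would be faster, but the result is the same.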

Conventional Time-Domain Cross-Correlation (TCC) for Phase-Shift (or Time-Shift) Correction

Respiratory irregularities refer to any breathing variations, including pattern changes, baseline drifts, and changes in frequency, amplitude, and synchronization of external and internal motion waveforms. A phase shift between paired external-internal signals, leading to a reduction of their correlation, can be identified with the time-domain cross-correlation, $TCC (\tau_{j})$, which was calculated as follows:

TCC [s_{E}, s_{I}] (\tau_{j}) = \frac{\sum_{k = 1}^{N} s_{E} (t_{k} + \tau_{j}) s_{I} (t_{k}) - \frac{1}{N} \sum_{k = 1}^{N} s_{E} (t_{k} + \tau_{j}) \sum_{i = 1}^{N} s_{I} (t_{i})}{\sqrt{\sum_{k = 1}^{N} {\left[ s_{E} (t_{k} + \tau_{j}) - \frac{1}{N} \sum_{i = 1}^{N} s_{E} (t_{i} + \tau_{j}) \right]}^{2}} \sqrt{\sum_{k = 1}^{N} {\left[ s_{I} (t_{k}) - \frac{1}{N} \sum_{i = 1}^{N} s_{I} (t_{i}) \right]}^{2}}}

(3)

where N is the number of data points contained within both the internal navigator signal, $s_{I}$, and the external bellows waveform, $s_{E}$. $TCC [s_{E}, s_{I}] (0)$ gives the correlation between the original external and internal waveforms. The time shift $\tau_{max}^{train}$ was chosen to maximize the TCC between the internal and external waveforms in the training dataset. Then $TCC [s_{E}^{train}, s_{I}^{train}] (\tau_{max}^{train})$ gives the maximum time-domain cross-correlation, termed Max TCC, between the original internal signal and the time-shifted external signal in the training dataset. This can also be understood as the correlation between $s_{I}^{train} (t)$ (the internal waveform in the training dataset) and $s_{E}^{train} (t + \tau_{max}^{train})$ (the time-shifted external waveform in the training dataset). Shifting the external signal effectively corrects the phase shift that exists between the internal and external waveforms.^9,10 The same time shift, $\tau_{max}^{train}$, was also applied to the external waveform in the mid-term and long-term testing datasets, and the Pearson correlations were calculated for the waveform pairs $\{ s_{I}^{test, mid} (t), s_{E}^{test, mid} (t + \tau_{max}^{train}) \}$ and $\{ s_{I}^{test, long} (t), s_{E}^{test, long} (t + \tau_{max}^{train}) \}$. RMSE values and the differences between the temporal locations of the paired signals’ peaks were also found for all three pairs of waveforms: $\{ s_{I}^{train} (t), s_{E}^{train} (t + \tau_{max}^{train}) \}$, $\{ s_{I}^{test, mid} (t), s_{E}^{test, mid} (t + \tau_{max}^{train}) \}$, and $\{ s_{I}^{test, long} (t), s_{E}^{test, long} (t + \tau_{max}^{train}) \}$.
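The search for the time shift that maximizes the TCC can be sketched in Python as a scan over integer sample lags (one sample is 0.05 s at 20 Hz). Computing the correlation only over the overlapping samples at each lag, the bounded search range `max_lag`, and the function names are assumptions of this sketch rather than details given in the source.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den

def tcc(ext, intr, lag):
    """Correlation between s_I(t_k) and s_E(t_k + lag), lag in samples, on the overlap.

    Assumes ext and intr have equal length.
    """
    if lag >= 0:
        return pearson(ext[lag:], intr[:len(intr) - lag])
    return pearson(ext[:len(ext) + lag], intr[-lag:])

def max_tcc(ext, intr, max_lag):
    """Return (best lag in samples, Max TCC) within the range [-max_lag, +max_lag]."""
    best = max(range(-max_lag, max_lag + 1), key=lambda lag: tcc(ext, intr, lag))
    return best, tcc(ext, intr, best)

# Synthetic check: the external signal trails the internal one by 8 samples (0.4 s at 20 Hz).
internal = [math.sin(2 * math.pi * k / 80) for k in range(400)]
external = [math.sin(2 * math.pi * (k - 8) / 80) for k in range(400)]

lag_train, corr_max = max_tcc(external, internal, max_lag=20)
tau_seconds = lag_train / 20.0
```

In the workflow described above, `lag_train` would be found once on the training half and then reused unchanged on the mid-term and long-term testing data.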

The Fast Fourier Transform (FFT) was also used to analyze breathing irregularities in the motion waveforms. Furthermore, changes in breathing patterns from training to testing (mid-term and long-term) datasets can be evaluated using the cross-correlation of the FFT spectra by shifting the testing datasets relative to the training dataset. Between the mid-term and long-term datasets, the plots of cross-correlation versus frequency shift were compared to determine if the breathing patterns remained similar or changed drastically.

A Hybrid, Adaptive Approach Using TCC to Enhance the LSTM Model Stability

Owing to possible large breathing variations (eg, changes in breathing patterns) that were not captured in the training dataset, a residual phase shift may arise between the actual internal waveform and the waveform predicted by the LSTM model in the long-term timeframe. Previously, it was reported that phase-shift correction was an effective approach for improving external-to-internal motion correlation.^9,10 To evaluate whether weak correlations could be rescued, the time (phase) shift between each actual internal waveform and its paired model-predicted waveform was corrected by maximizing the TCC. By correcting possible residual phase shifts in the mid-term and long-term testing datasets that were not present in the training data, this hybrid approach (LSTM-TCC) may achieve near real-time model adaptation without requiring retraining. Again, Pearson correlation, RMSE values, and temporal error of inspiratory peaks were used to evaluate the hybrid, adaptive approach, similar to that for the LSTM model assessment described earlier (see Figure 1D).
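A minimal Python sketch of the hybrid correction step follows: given an LSTM-predicted waveform and the corresponding actual internal waveform, it finds the residual lag that maximizes their cross-correlation and shifts the prediction accordingly. The predicted waveform here is a synthetic stand-in (no LSTM is trained), and the function names, the bounded lag search, and the overlap-only correlation are assumptions for illustration.

```python
import math

def corr(a, b):
    """Pearson correlation between two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    return num / den

def apply_lag(predicted, actual, lag):
    """Pair predicted(t + lag) with actual(t) on their overlap; lag in samples."""
    if lag >= 0:
        return predicted[lag:], actual[:len(actual) - lag]
    return predicted[:lag], actual[-lag:]

def residual_lag(predicted, actual, max_lag):
    """Lag within [-max_lag, +max_lag] that maximizes the cross-correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: corr(*apply_lag(predicted, actual, lag)))

# Synthetic stand-in for a long-term case: the "prediction" trails the actual
# internal waveform by 10 samples (0.5 s at 20 Hz), mimicking a residual phase shift.
actual = [math.sin(2 * math.pi * k / 80) for k in range(400)]
predicted = [math.sin(2 * math.pi * (k - 10) / 80) for k in range(400)]

before = corr(predicted, actual)                      # below the 0.8 threshold
lag = residual_lag(predicted, actual, max_lag=20)     # residual shift in samples
after = corr(*apply_lag(predicted, actual, lag))      # after hybrid correction
```

Because only a single lag is estimated, this adjustment is far cheaper than retraining, which is consistent with the near real-time adaptation the hybrid approach aims for.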

The correlation and RMSE values between the native internal and external waveforms provided a baseline against which the TCC method, LSTM models, and hybrid approach were compared. The native waveforms were also employed to calculate baseline temporal errors in locating inspiratory peaks. To compare the various methods with the native strategy and with each other, single-tailed, exact Wilcoxon signed-rank tests were employed to evaluate the significance of differences among Pearson correlation coefficients, RMSE values, and average temporal errors. Exact, single-tailed Mann-Whitney U tests were used to compare the distributions of temporal errors generated by the various methods. The significance level was set a priori at α = 0.05, and the Benjamini-Hochberg procedure was used to account for multiple hypothesis testing.
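For readers unfamiliar with the Benjamini-Hochberg procedure, a minimal Python sketch of its step-up rule is given below. The function name and the example p-values are illustrative assumptions and are not taken from the study.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k (1-based, ascending p-values) with
    p_(k) <= alpha * k / m, then reject every hypothesis ranked at or below k.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject

# Illustrative p-values only (not study results).
decisions = benjamini_hochberg([0.001, 0.02, 0.01, 0.045, 0.30])
# The step-up behaviour: 0.02 alone fails its own threshold (0.05 * 1 / 3),
# but is still rejected because 0.03 passes the rank-2 threshold (0.05 * 2 / 3).
step_up = benjamini_hochberg([0.02, 0.03, 0.5])
```

The procedure controls the false discovery rate across the whole family of comparisons rather than the error rate of each test individually.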

Results

Limited Hyperparameter Search for Training the LSTM Models

Twelve sets of hyperparameters were investigated, as shown in Table 1. Although multiple sets of hyperparameters generated low RMSE values between the predicted and actual internal waveforms in the training dataset for most volunteers, the hyperparameters in group 4 yielded the lowest RMSE values across all volunteers. Consequently, modeling results using group 4 hyperparameters were used to evaluate the performance of the LSTM models on the testing datasets.

Table 1.

Tested Ranges of LSTM Hyperparameters.

Parameter Group	Mini Batch Size	Initial Learn Rate	Learn Rate Drop Factor	Learn Rate Drop Period	Number of LSTM Layers
1	10	0.005	0.2	15	1
2	20	0.005	0.2	15	1
3	50	0.005	0.2	15	1
4	20	0.01	0.2	15	1
5	20	0.001	0.2	15	1
6	20	0.005	0.05	15	1
7	20	0.005	1.0	15	1
8	20	0.005	0.2	5	1
9	20	0.005	0.2	10	1
10	20	0.005	0.2	50	1
11	20	0.005	0.2	15	2
12	20	0.005	0.2	15	4

Twelve parameter groups (each with five hyperparameters) were used for LSTM model training. The shaded boxes highlight differences between groups of hyperparameters.

The lengths of the datasets ranged from 88 s to 300 s for the training and mid-term testing datasets, and from 39 s to 326 s for the long-term datasets (Table 2). By design, the training and mid-term testing datasets were equal in length. The long-term testing data were shorter than the mid-term data for six volunteers and longer for the others. Because they contained multiple respiratory cycles, all of the waveforms were sufficiently long for testing. On average, for each second of training data, 0.48 s of training time—or about half the length of the training data—was needed to train the LSTM model with group 4 hyperparameters. Consequently, 96 ± 34 s were needed, on average, to train the LSTM models. By contrast, the trained LSTM models required just 0.2 s on average to predict the entire length of the testing data. Because clinical applications require forecasting only a few breathing cycles into the future, the LSTM models’ near real-time prediction could be further hastened by shortening the prediction window.

Table 2.

The Length of Training and Testing Waveform Datasets (20 Hz), the Average Time for LSTM Model Training and Prediction Using the Group 4 Hyperparameters among Five Replicates, and the Mean Motion Range (mm) and External-Internal Time Shift (s) in the two Timeframes.

Subject	Mid-Term Data (3-10 min)						Long-Term Data (20-30 min)
	Training			Testing^$			Testing^#
	Length (s)^$	Avg t^T (s)	Avg t^T (s) Per Training Data (s)	Avg t^P (s)^#	Motion Range (mm)	Time Shift (sec)	Length (s)	Motion Range (mm)	Motion Change (%)	Time Shift (sec)	Time Shift Change (%)
1	133.5	73.3	0.55	0.8	13.8	0.40	42.5	15.6	13%	0.35	−14%
2	256.6	133.0	0.52	0.3	13.2	0.70	50.0	16.5	25%	0.55	−27%
3	88.0	43.0	0.49	0.1	16.8	0.40	33.0	17.3	3%	0.45	11%
4	115.0	55.2	0.48	0.1	7.7	1.95	70.0	66.9	769%	0.50	−290%
5	228.0	110.7	0.49	0.1	14.1	0.35	275.0	14.6	4%	0.35	0%
6	175.0	82.6	0.47	0.1	17.1	0.65	266.6	19.2	12%	0.70	7%
7	175.2	84.4	0.48	0.2	13.9	2.95	195.0	22.1	59%	0.55	−436%
8	300.0	145.1	0.48	0.2	9.2	1.35	325.7	6.9	−25%	1.50	10%
9	261.4	125.1	0.48	0.2	5.1	0.65	81.0	6.1	20%	0.45	−44%
10	224.9	106.1	0.47	0.2	25.7	0.70	39.2	20.1	−22%	0.65	−8%
AVG	195.8	95.9	0.5	0.2	13.6	1.0	137.8	21.2	95%	0.6	−79%
STD	69.6	33.7	0.0	0.2	6.1	0.8	115.1	18.0	254%	0.3	155%

Note: substantial differences in subjects 4 and 7 are highlighted in bold.

^$ The length of the waveforms for the training (first half) and mid-term testing (second half) is the same.

^# The average time of long-term prediction is the same as that for mid-term prediction.

t^T – Time needed to train the LSTM model, averaged across 5 replicates, using 1760–24 000 data points/subject.

t^P – Time needed for the trained LSTM model to generate the predicted waveform, averaged over 5 replicates.

Validation of the LSTM Models

The LSTM models were validated by testing the trained models on the same data upon which they were trained. Correlation and RMSE values were found for four different methods: native, TCC, LSTM, and LSTM-TCC (Table 3). The external-to-internal motion correlations obtained for the LSTM-predicted waveforms (C = 0.93 ± 0.05) significantly exceeded those obtained for the native waveforms (C = 0.42 ± 0.28, p < .001) and the TCC-corrected waveforms (C = 0.75 ± 0.12, p < .001). Moreover, the performance of the LSTM models surpassed that of the TCC method for all volunteers: Whereas the latter approach yielded only two acceptable correlations (C ≥ 0.8), the former method produced acceptable correlations for all subjects.

Table 3.

Correlation and RMSE Values for the Training Dataset (mid-Term) for Validation among Four Different Approaches.

Subject	Native		TCC		LSTM		LSTM-TCC Hybrid^†
Subject	Corr	RMSE	Corr	RMSE	Corr	RMSE	Corr	RMSE
1	0.690	0.792	0.864	0.531	0.955	0.305	0.955	0.305
2	0.411	1.062	0.793	0.632	0.972	0.247	0.974	0.237
3	0.836	0.581	0.969	0.253	0.982	0.204	0.982	0.204
4	0.207	1.258	0.714	0.770	0.922	0.392	0.923	0.404
5	0.654	0.847	0.727	0.753	0.844	0.546	0.845	0.563
6	0.456	1.059	0.702	0.787	0.957	0.302	0.968	0.302
7	0.012	1.382	0.788	0.651	0.937	0.370	0.941	0.346
8	0.024	1.435	0.720	0.773	0.986	0.169	0.986	0.169
9	0.355	1.069	0.524	0.938	0.920	0.400	0.924	0.510
10	0.553	0.903	0.725	0.718	0.859	0.516	0.860	0.535
AVG	0.42	1.04	0.75	0.68	0.93	0.35	0.94	0.36
STD	0.28	0.27	0.12	0.19	0.05	0.12	0.05	0.14

For all subjects, the correlation coefficient increased in the order of Native < TCC < LSTM ≤ LSTM-TCC.

† Bold italic font indicates no difference between the Hybrid approach and the LSTM models alone.

As a control, the hybrid approach was also applied to the training data, and only negligible residual time shifts (0.15 s for volunteer 7 and 0.00-0.05 s for all others) were found, resulting in a trivial improvement (at most 1.1% increase in correlation, p > .05) from LSTM to LSTM-TCC.

Internal-Motion Predictions in the mid-Term (3-10 min) and Long-Term (20-30 min) Periods

Applied to the mid-term external waveforms, the trained LSTM models predicted internal waveforms that correlated strongly with the actual internal waveforms and always exceeded the correlation obtained for the native waveforms and the TCC method: Whereas the LSTM models achieved acceptable correlations (C ≥ 0.8) for all volunteers, the TCC method alone achieved acceptable correlations in only 2 volunteers (Table 4). The LSTM models' performance for the mid-term testing dataset (C = 0.89 ± 0.07) significantly surpassed that of the TCC method (C = 0.73 ± 0.13, p < .001) and the native correlation (C = 0.42 ± 0.28, p < .001). The RMSE values obtained for the LSTM method (0.45 ± 0.18) were smaller than those for the TCC method (0.72 ± 0.23) and the native waveforms (1.06 ± 0.29). Only minor residual time shifts were identified in the mid-term testing dataset: within ±0.05 s for 7 volunteers, −0.15 s for one volunteer, 0.20 s for volunteer 4, and 0.25 s for volunteer 7. The hybrid approach, therefore, conferred little benefit, but no harm, over the LSTM models alone in the mid-term dataset, obtaining an improvement in correlation of 1.1% on average, which accounts for the correlation increase from C = 0.89 to C = 0.90 (Table 4).

Table 4.

Correlation Values for the mid-Term (3-10 min) and Long-Term (20-30 min) Testing Datasets.

Subject	Native Corr		TCC Corr*		LSTM Corr ^#		LSTM-TCC Corr ^#
Subject	Mid-term	Long-term	Mid-term	Long-term	Mid-term	Long-term	Mid-term	Long-term
1	0.67	0.80	0.88	0.96	0.91	0.99	0.91	0.99
2	0.34	0.58	0.76	0.88	0.97	0.97	0.97	0.98
3	0.86	0.77	0.97	0.91	0.99	0.94	0.99	0.95
4	0.32	0.45	0.71	−0.17	0.92	0.63	0.93	0.80
5	0.74	0.66	0.79	0.76	0.84	0.80	0.85	0.80
6	0.45	0.45	0.71	0.69	0.81	0.84	0.82	0.84
7	0.08	0.60	0.66	−0.63	0.83	0.61	0.84	0.85
8	0.03	−0.01	0.67	0.71	0.97	0.97	0.97	0.97
9	0.17	0.49	0.47	0.67	0.90	0.85	0.90	0.86
10	0.49	0.52	0.69	0.71	0.80	0.89	0.80	0.89
AVG	0.42	0.53	0.73	0.55	0.89	0.85	0.90	0.89
STD	0.28	0.23	0.13	0.52	0.07	0.14	0.07	0.07

The LSTM model achieved an acceptable correlation (C ≥ 0.8, in bold) in all ten subjects for the mid-term predictions, while the hybrid model (LSTM-TCC) achieved an acceptable correlation (C ≥ 0.8) in all subjects for both mid-term (5-10 min) and long-term (20-30 min) predictions.

*The time shift found in the training data was applied to the external waveforms of the mid-term and long-term testing datasets. Bold numbers are acceptable correlations (C ≥ 0.8).

^# The bold italics values are identical between LSTM and LSTM-TCC results.

When applied to the long-term datasets, the LSTM method failed to achieve C ≥ 0.80 for volunteers 4 and 7, but the low performance was rescued by the hybrid approach, and C ≥ 0.80 was thus obtained for all volunteers. Whereas residual time shifts for other volunteers remained within ±0.10 s, the time shifts for volunteers 4 and 7 increased to 0.50 s and 0.55 s, respectively, in the long-term scan. Although the absolute difference was small, the average correlation obtained by the hybrid approach was significantly higher than that obtained by the LSTM models alone (p = .02). Notably, the TCC method yielded acceptable correlations for only three volunteers. The RMSE values for the hybrid approach (0.42 ± 0.19) were significantly lower than those for the LSTM models (0.49 ± 0.26, p = .01), the TCC method (0.84 ± 0.48, p < .001), and the native waveforms (0.95 ± 0.22, p < .001). Compared to the LSTM models, the hybrid approach confers the greatest benefit when the residual time shift changes over time (Figure 2). When the residual time shift becomes large, the LSTM model falters, and adaptation, like that enacted by the hybrid approach, is needed to ensure adequate performance. Overall, the hybrid LSTM-TCC model provided correlations equal to or better than those of the LSTM model alone in both the mid-term and long-term timeframes, without exception, serving as a safeguard for the predictions.

Figure 2.

Heat maps portray the correlations achieved by the LSTM and LSTM-TCC models for each of the 12 hyperparameter groups (with five replicates in each group). The LSTM models predict well (C > 0.8) in mid-term testing (A) but suffer in long-term testing (B), while the hybrid models (LSTM-TCC) predict well (C > 0.8) for both mid-term (C) and long-term (D) testing. The improvements in subjects 4 and 7 are significant, boosting the correlation to an acceptable level (C > 0.8), because the hybrid models can correct the changes in phase shifts, adapting to the new breathing patterns.

The Ability of the Training RMSE Values to Identify High-Performance LSTM Models

The LSTM training RMSE values were strongly correlated with the LSTM predictions in the mid-term scan (C = 0.889) and reliably identified LSTM models that yielded correlations of 0.8 or greater—termed high-performance LSTM models—in the mid-term dataset (AUROC = 0.818, 95%CI [0.780, 0.852]). The hybrid approach conferred a small but significant improvement in the ability to identify high-performance models in the mid-term dataset (AUROC = 0.844, 95%CI [0.804, 0.882], p < .001). For the long-term timeframe, however, the RMSE's ability to identify high-performance LSTM models dropped (AUROC = 0.764, 95% CI [0.726, 0.802]), but was rescued and significantly enhanced by the hybrid approach (AUROC = 0.830, 95%CI [0.788, 0.867], p < .001).
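
As a rough illustration of how training RMSE can be scored as a classifier of high-performance models, the sketch below computes the AUROC through its rank (pairwise-comparison) interpretation. The helper name `auroc`, the use of negative RMSE as the score, and the six RMSE/correlation pairs are made-up assumptions for demonstration only; they are not the study's values or code.

```python
def auroc(scores, labels):
    """AUROC as the probability that a positive case outscores a negative one,
    counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical models: training RMSE and the correlation reached in testing.
train_rmse = [0.30, 0.35, 0.50, 0.55, 0.60, 0.45]
test_corr = [0.92, 0.88, 0.85, 0.70, 0.65, 0.78]

# A model counts as high-performance when C >= 0.8; a lower RMSE should
# indicate a better model, so the score is the negated RMSE.
labels = [c >= 0.8 for c in test_corr]
scores = [-r for r in train_rmse]
auc = auroc(scores, labels)
print(f"AUROC = {auc:.3f}")
```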

Predicting the Temporal Occurrence of Inspiratory Peaks in the Actual Internal Waveform

The waveforms predicted by the LSTM models recapitulated the qualitative shapes of the actual internal waveforms with high fidelity (Figure 3). To quantify the accuracy of predicting the occurrence of inspiratory peaks in the internal waveforms, subject-specific distributions were found for the temporal errors in locating peaks (Figure 4).

Figure 3.

Comparison of the actual (red) diaphragmatic-motion trajectories and those predicted (blue) by the LSTM model for three volunteers (1, 5, 7) in (A) mid-term (5-10 min) and (B) long-term (20-30 min) tests. The peaks of the predicted waveforms are marked by blue dots, and the corresponding peaks in the actual waveforms by black dots. All plots illustrate accurate peak prediction (temporal error less than 0.18 ± 0.15 s), despite some spatial uncertainty. (C) The Fast Fourier Transform (FFT) spectra of the waveforms in the training (Tr), mid-term (MT), and long-term (LT) timeframes. (D) The cross-correlation of the FFT spectra of the MT and LT waveforms versus the Tr waveform, obtained by shifting the MT and LT spectra relative to the fixed Tr spectrum. For subjects 1 and 5, the FFT and cross-correlation appear similar in terms of frequency and cross-correlation distributions, whereas for subject 7, both plots (C, D) show substantial changes between the long-term and mid-term waveforms.

Figure 4.

Temporal errors in box plots depicting the median (horizontal, red lines), quartile, and range of the predicted inspiratory peaks in reference to the navigator waveforms for the mid-term (top) and long-term (bottom) scans. Original (O, red), TCC method (T, green), LSTM model (L, blue), and hybrid LSTM-TCC approach (H, purple). The dashed, horizontal line indicates an error of 0.25 s. The results of statistical comparisons are also shown, and the number of asterisks indicates the degree of certainty (* p < .05, ** p < .01, *** p < .001, and ns for not significant; Mann-Whitney U test). Red asterisks indicate that the opposite relation was true. For example, the single, red asterisk in the mid-term scan for volunteer 1 on the T < O line indicates that the temporal error yielded by the TCC method was statistically larger than that for the native waveforms (p < .05). Gray shading indicates that no residual time shift was detected and, consequently, the hybrid approach was no different from the LSTM models alone.

In the mid-term scan, the LSTM models located the peaks more accurately than the native waveforms for nine out of 10 volunteers (Figure 4A) and more accurately than the TCC method in all 10 subjects. Although small residual time shifts were detected in the mid-term data for six volunteers, the sizes of the observed differences in the temporal errors were small between the LSTM and LSTM-TCC approaches (Figure 4A).

In the long-term scan, the LSTM models located peaks more accurately than the native waveforms for six volunteers, but the temporal errors were significantly worse for volunteer 4 (Figure 4B). The LSTM models were more accurate than the TCC method for seven volunteers, and no statistically significant difference was found for the other three subjects. In the six volunteers for whom a residual time shift was detected during the long-term scan, the hybrid approach was more accurate than the LSTM models for three subjects (Figure 4B). Notably, the hybrid approach could rescue the poor performance of the LSTM models alone for volunteer 4. Major breathing variations arising in the long-term timeframes, therefore, necessitate the application of the hybrid LSTM-TCC approach to maintain an acceptable performance level.

The LSTM models yielded median temporal errors of up to 0.25 s for eight out of ten subjects in the mid-term timeframe and for nine subjects in the long-term timeframe. The hybrid approach produced median temporal errors below 0.25 s for all subjects in the mid-term scan and for nine out of 10 subjects in the long-term scan. For volunteer 7, the one exception to the hybrid approach achieving the 0.25-s benchmark, the median temporal error was 0.35 s. For this single subject, the shape of the external waveform changed considerably between the mid-term and long-term scans, rendering the relationship between the internal and external waveforms learned from the training dataset less valid.

In the mid-term scan, the mean error in predicting the timing of peaks for all 10 subjects was 0.76 ± 0.61 s for the native waveforms, 0.67 ± 0.53 s for the TCC method, 0.18 ± 0.12 s for the LSTM models, and 0.17 ± 0.10 s for the hybrid approach. In the long-term scan, the average errors were 0.38 ± 0.37 s for the native waveforms, 0.39 ± 0.22 s for the TCC method, 0.18 ± 0.15 s for the LSTM models, and 0.15 ± 0.11 s for the hybrid approach.
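
The peak-timing errors reported above can be sketched as follows. This is a minimal illustration that assumes peaks are simple local maxima above a height threshold and that each actual peak is paired with the nearest predicted peak; the helper names (`find_peaks`, `peak_timing_errors`), the threshold, the 10 Hz sampling rate, and the synthetic waveforms are assumptions, and real navigator data would likely also need smoothing and a minimum peak spacing.

```python
import math
import statistics

def find_peaks(wave, min_height):
    """Indices of samples that exceed min_height and both of their neighbors."""
    return [i for i in range(1, len(wave) - 1)
            if wave[i] > min_height and wave[i - 1] < wave[i] > wave[i + 1]]

def peak_timing_errors(predicted, actual, fs, min_height=0.5):
    """Absolute time difference (s) between each actual peak and the
    nearest predicted peak."""
    pred_peaks = find_peaks(predicted, min_height)
    actual_peaks = find_peaks(actual, min_height)
    if not pred_peaks:
        return []
    return [min(abs(p - a) for p in pred_peaks) / fs for a in actual_peaks]

# Synthetic demo (not study data): 10 Hz sampling, 4 s breathing period,
# and predicted peaks arriving 0.2 s (2 samples) late.
fs = 10.0
actual = [math.sin(2 * math.pi * i / 40) for i in range(200)]
predicted = [math.sin(2 * math.pi * (i - 2) / 40) for i in range(200)]

errors = peak_timing_errors(predicted, actual, fs)
median_error = statistics.median(errors)
print(f"{len(errors)} peaks, median temporal error = {median_error:.2f} s")
```

The median of such per-peak errors is the quantity that would be compared against the 0.25-s benchmark for each subject.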

Discussion

Although they have been employed to study time series processes in radiotherapy, LSTM models for long-term predictions of external-to-internal respiratory-induced organ motion have not been fully explored, especially regarding the accuracy and adaptability of their predictions in the presence of common breathing irregularities. LSTM models can be used to predict future motion (1) within the same waveform or across waveforms to predict the motion in one waveform from the other; (2) for short-term, just-in-time predictions (such as the next 50-500 ms) or for relatively long timeframes (such as 5-30 min); and (3) for population-based modeling with a large subject pool, or subject-specific modeling with a sufficient dataset per subject for a handful of individuals. In this study, we deployed LSTM models, with and without adaptation, to investigate long-term, subject-specific relationships between external and internal motions in 10 healthy volunteers, demonstrating the proof of principle of the novel adaptive LSTM-TCC approach.

Advantages of Subject-Specific, Adaptive LSTM Prediction of Long-Term, Internal Motion

This study tested the ability of LSTM networks to predict internal respiratory motion from external-motion waveforms over mid-term (3-10 min) and long-term (20-30 min) timeframes. Because training datasets cannot capture the entire range of an individual's breathing patterns, we sought to develop an adaptation mechanism to address breathing variations occurring over longer timescales. Adaptation of the trained LSTM models was achieved by identifying and correcting residual phase shifts through a conventional TCC method. This near-real-time adaptive mechanism avoids time-consuming retraining, which is potentially useful for clinical RGRT applications. We therefore assessed not only the correlations obtained between the actual and predicted internal waveforms, but also the temporal accuracy achieved by the models in predicting inspiratory peaks, which are usually better for beam gating than the typically flat expiratory valleys.^28,29

Because patients may display breathing irregularities that were not captured in the training datasets, the accuracy of the LSTM models’ predictions was expected to decline from the 3–10 min to the 20–30 min period. Indeed, even within the mid-term scan, the correlation dropped from C = 0.93 on the first-half (training) data to C = 0.89 on the second-half (testing) data. More importantly, in the long-term scan, the LSTM models failed to achieve acceptable correlations (C ≥ 0.8) for two subjects, but the models’ performance was rescued by correcting the residual phase shifts through the hybrid LSTM-TCC approach. Breathing pattern changes in subjects 4 and 7 were illustrated by substantial changes in their motion ranges and external-internal time shifts between the mid- and long-term timeframes, compared with the other subjects (Table 2). These results reveal that LSTM models cannot account for severe changes in breathing behaviors that occur outside the training dataset, underscoring the need for adaptation. Although LSTM models can be retrained by adding newly obtained data to the training dataset, such approaches will cause a time delay of a few minutes per retraining, interrupting potential clinical applications. In contrast, the hybrid, adaptive approach (LSTM-TCC) is almost instantaneous (300 ms), is readily incorporated to identify and correct residual phase shifts, and helps ensure acceptable performance during a treatment. To the best of our knowledge, this is the first study to demonstrate a potential adaptive deep-learning method that corrects the time shifts, which are the major cause of incoherent motion waveforms.^9,10,28

The median temporal error in predicting inspiratory peaks was < 0.25 s for the mid-term timeframe in eight out of ten subjects using the LSTM models, and in all subjects using the LSTM-TCC approach. In the long-term timeframe, the hybrid approach rescued the large temporal errors yielded by the LSTM model for volunteer 4, the only subject for whom the temporal errors were statistically worse than for the native waveforms. The hybrid approach produced median temporal errors below 0.25 s in nine out of ten subjects, and the median temporal error was 0.35 s in one subject (volunteer 7) whose bellows waveform changed considerably between the first and second scans. Again, subjects 4 and 7 experienced substantial breathing pattern changes (Table 2). For respiratory gating, the temporal uncertainty can be further reduced using an automatic <0.5 s delay for beam-on/off after receiving the gating signal in Varian TrueBeam (Siemens-Varian Medical Systems, Palo Alto, CA).

Sources of Phase (Time) Shifts Between External and Internal Respiratory Motions

Phase (time) shifts between external and internal respiratory motions are common,^9,10,27 reflecting the imperfect synchronization between the diaphragmatic, superior-to-inferior (SI) motion detected by the navigator and the upper-abdominal, anterior-to-posterior (AP) motion detected by the bellows. In free breathing, the diaphragmatic and intercostal muscles are the main driving forces for diaphragmatic SI and chest AP motions, respectively. Time delays—or phase shifts—therefore arise when these muscles’ motions are not precisely synchronized during respiration. Because distinct muscle groups are involved, the diaphragmatic SI motion that propagates to the abdominal AP motion (measured by the bellows device) differs from that propagating to the chest AP motion. Using the TCC method to correct the phase shift between the diaphragmatic SI motion and abdominal AP motion, therefore, enhances the correlation between the two motions (Table 2). Breathing irregularities—such as a sudden uncoupling between internal and external respiratory motions—that deviate from an individual's normal muscle motion and synchronization may corrupt the correlation and generate outliers among the temporal errors.

The various muscle groups make different contributions to respiration. Namely, abdominal-driven, thoracic-driven, or mixed breathing patterns may be observed, and, in a few minutes, a patient's respiration may drift from one pattern to another. In this study, two cases in the long-term timeframe displayed substantial breathing pattern changes, and the residual phase shifts engendered by those changes could not be accommodated by LSTM models trained on the mid-term datasets. Correcting residual phase shifts through the TCC method effectively boosted the LSTM models’ performance (Table 3). This approach constitutes a near real-time, adaptive mechanism that maintains the fidelity of LSTM models’ predictions and avoids time-consuming retraining. Because the LSTM-TCC model always outperforms or equals the performance of the LSTM model alone, the TCC serves as a potential near-real-time quality assurance step, at least in part, ensuring accurate prediction results.

Phase shifts between external and internal waveforms, quantified by phase-domain or time-domain techniques, were previously found to be steady over time for most subjects despite breathing irregularities.^9,10 Whereas residual phase shifts and their impact on model performance were negligible for most participants in the present study, two subjects displayed large residual phase shifts (or time-shift changes) in the long-term timeframe (Table 2). By learning the complex, non-linear relationships between external and internal motions, the LSTM model can correct phase shifts and achieve results that are superior to those obtained by oversimplified linear correlation (such as TCC) and approximate physical modeling approaches. However, the TCC approach can take new variations into account and correct any residual phase shifts in the LSTM predictions almost instantly (<100 ms). The hybrid approach (LSTM + TCC) is therefore more accurate and reliable for internal motion prediction.

Selection of Hyperparameter Values During LSTM Training and the Model Stability

The performance of the LSTM model is insensitive to hyperparameter selection within the previously tested range.¹⁷ Therefore, we limited our testing range to only the reported optimal hyperparameter settings for LSTM modeling, rather than exhaustive optimizations. Indeed, similar performance levels are obtained (Figure 2) for hyperparameter values—including the number of LSTM layers, hidden units per LSTM layer, epochs, learning rate, and forgetting rate—chosen from recommended ranges,¹⁷ suggesting that the deep learning network may find a way to optimize its learning outcomes. It is worthwhile to note, however, that the stochastic gradient descent with momentum optimization algorithm should be used to ensure non-biased deep-learning results. The large AUROC values found in this study additionally reveal that high-performance, subject-specific LSTM models can be identified during the training period from limited sets of hyperparameter values, such as the 12 combinations used herein.

The prediction stability of the trained model depends on the variation of the testing datasets, including both mid-term and long-term timeframes. In this study, several indicators are used to characterize the breathing pattern changes in a subject, including motion amplitudes and time shifts in Table 2 and FFT spectra and cross-correlation of the spectra in reference to that of the training data in Figure 3. For subjects 4 and 7, the LSTM model predictions begin to fail (C ≈ 0.6) in the long-term testing, which is directly caused by the substantial changes in these values. The poor predictions for these two subjects in the long-term time frame are rescued by correcting the time shifts using the hybrid LSTM-TCC model. However, other breathing pattern changes, such as low-frequency baseline drifts, are also present in the testing data and reduce the accuracy of positioning prediction, especially in motion amplitude (Figure 3), beyond the capability of the hybrid model.

Potential Clinical Applications Using the Hybrid LSTM-TCC Modeling

This study has demonstrated that the hybrid LSTM-TCC models can predict internal organ motion with high correlation and temporal accuracy, with stable performance over 5–30 min, and thus have the potential for RGRT applications. The advantages of this adaptive LSTM-TCC approach include (1) potential near real-time prediction and adaptation, whose latency (LSTM: 200 ms and TCC: 100 ms) can be further reduced in the future by GPU-based computation, which offers up to two orders of magnitude of acceleration,⁶ and (2) long-term stability within 30 min, which covers an entire treatment fraction without remodeling. This study has demonstrated the feasibility of using the new hybrid LSTM-TCC approach to predict internal organ motion over 20–30 min via adaptation. Building on the hybrid LSTM-TCC results (C = 0.89 ± 0.07, ranging 0.80-0.99, with a 100-300 ms latency), an ongoing investigation has been initiated to characterize respiratory gating accuracy and efficiency, and to evaluate the dosimetric consequences, treatment margin, and clinical feasibility. Although a 10-subject sample size is sufficient for a proof-of-principle study, the generality of the approach needs further testing in a larger number of cancer patients, who may exhibit more frequent or extreme breathing variations.

It is worthwhile to note that breathing irregularities may occur more frequently in patients than in volunteers, as cancer patients, especially lung cancer patients, may experience breathing difficulties and variations caused by pain, disease progression, and other health changes. Therefore, an adaptive approach to cope with a higher probability of severe breathing pattern changes is critical to ensure the stability of model prediction, avoiding breathing remodeling that interrupts patient treatment.¹¹ LSTM-TCC prediction could be performed continuously with a latency of 300 ms, increased from the 200 ms of the LSTM alone by the addition of the TCC computation.

Additionally, the external-internal motion correlation study was based on normalized motion waveforms, and the absolute accuracy (in mm) could only be estimated from the motion ranges measured on the reconstructed RC-4DMRI images. On the other hand, respiratory gating is often based on the normalized motion range near full exhalation with a gating threshold of 30%-50%. Therefore, normalized waveforms should be suitable for evaluating the potential of LSTM-TCC models for RGRT applications in the future.
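
To make the idea of threshold gating on a normalized waveform concrete, the sketch below rescales a waveform to the 0-1 range and marks the samples that fall at or below a gating threshold. It assumes that full exhalation corresponds to the waveform minimum and that the beam is enabled below the threshold; the function names (`normalize`, `gating_mask`), the 30% threshold choice, and the synthetic waveform are illustrative assumptions rather than the gating logic of any clinical system.

```python
import math

def normalize(wave):
    """Rescale a waveform to [0, 1], with 0 taken as full exhalation."""
    lo, hi = min(wave), max(wave)
    if hi == lo:
        raise ValueError("waveform has no motion range")
    return [(v - lo) / (hi - lo) for v in wave]

def gating_mask(wave, threshold=0.3):
    """True where the normalized displacement is within the gating window."""
    return [v <= threshold for v in normalize(wave)]

# Synthetic demo (not study data): five 4 s breathing cycles sampled at 10 Hz.
wave = [math.sin(2 * math.pi * i / 40) for i in range(200)]

mask = gating_mask(wave, threshold=0.3)
duty_cycle = sum(mask) / len(mask)
print(f"beam-on fraction at a 30% threshold: {duty_cycle:.2f}")
```

Raising the threshold toward 50% widens the beam-on window and increases the duty cycle at the cost of admitting more residual motion, which is the trade-off a gating study would need to quantify.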

Concerns and Limitations of the Hybrid LSTM Network for Motion Management

As discussed above, the conventional linear TCC method is inferior to the LSTM, a non-linear deep-learning approach, resulting in less enhancement in the external-internal correlation, in reference to the native correlation (Table 4). However, as the LSTM may lose its predicting power when facing severe respiratory pattern variations, using the TCC method to correct any residual time shift in the predicted results becomes a safeguard to the overall results with a near real-time performance (<100 ms). This hybrid approach overcomes the lengthy LSTM model re-training process by correcting the major cause of external-internal correlation degradation. Other factors affect the correlation, such as changes in the shape of the waveforms, which may not be fully correctable using the linear TCC approach.

Whether breathing variations outside the training dataset result in consequences other than residual phase shifts remains to be investigated. We observed one example (volunteer 7) of considerable change in the shape of the external waveform (with changing frequencies in FFT, Figure 3), and, although an acceptable correlation was obtained, the hybrid approach yielded a median temporal error in identifying inspiratory peaks that exceeded the benchmark (<0.25 s). More subjects should be recruited and studied to determine whether the hybrid, adaptive LSTM-TCC approach is sufficiently versatile and universal to adapt to various possible breathing variations under different clinical scenarios. Changes in the shape of the bellows waveform would be readily detected, and the LSTM models could be retrained for the patients demonstrating such variation. Additionally, low-frequency baseline shifts were also observed, affecting the prediction accuracy in motion amplitude (Figure 3). Although increasing the size of the training dataset may fix this issue, as more motion variations can be included in the LSTM model, other adaptive approaches may be necessary to handle new breathing pattern changes, especially for cancer patients. Furthermore, a significantly longer timeframe—typically a week—separates a patient's simulation from their treatment. In a future study, we will investigate whether patient-specific LSTM models trained at simulation can be adapted via the hybrid LSTM-TCC approach to account for differences in breathing motions that occur on the day of treatment. Before it can be put into clinical practice, more investigations are needed to test and validate the hybrid LSTM model's stability and reliability.

Lastly, the LSTM and LSTM-TCC models’ predictions for respiratory-motion amplitudes are imprecise (Figure 3), and further improvements are required for tumor-tracking applications. Potential solutions are (1) to increase the training sample size so that more variations, including low-frequency baseline drifts, can be included in the LSTM modeling, and (2) to explore other adaptive approaches to update the LSTM model in near real-time. For respiratory gating, however, a high temporal accuracy (0.18 ± 0.15 s) of the models would be sufficient, as it suggests a high motion sensitivity required to determine the beam-on and beam-off windows. An ongoing simulation study has been initiated to characterize respiratory gating capability using both the LSTM and hybrid LSTM-TCC approaches, including accuracy, efficiency, and uncertainty for RGRT. Overall, the hybrid LSTM-TCC model has been demonstrated to be feasible as a potential adaptive strategy for combating common breathing irregularities and maintaining accurate and reliable external-to-internal motion predictions over a 30-min timeframe.

Conclusions

In this study, we investigated the LSTM network and a hybrid LSTM-TCC approach for subject-specific, cross-dataset, and mid- and long-term prediction of internal-organ motion, demonstrating a feasible strategy to predict internal organ motion with high correlation and temporal accuracy in the presence of common breathing irregularities. By correcting any residual phase/time shifts in the LSTM model caused by new breathing variations beyond the training dataset, the hybrid LSTM-TCC model can help maintain near real-time, high prediction accuracy throughout a 5–30 min timeframe without remodeling. This adaptive approach could serve as a safeguard to check and correct any phase/time shifts, securing prediction performance. Further investigations are required before the hybrid LSTM approach is ready for clinical applications.

Footnotes

PACS Classification Numbers

Artificial intelligence, 07.05.Mh

Respiration, 87.19.Wx

Magnetic resonance imaging, 87.61.−c

Treatment planning, 87.55.D-

treatment strategy in, 87.55.-x

Abbreviations

Acknowledgments

This research is supported in part by the MSK Cancer Center Support Grant/Core Grant (P30 CA008748). The authors are grateful to the simulation therapists for acquiring MRI scans of all participants under an IRB-approved protocol.

ORCID iD

Guang Li

Ethical Considerations

This clinical research was conducted under an MSK IRB-approved protocol (IRB 073-015, project #4), in accordance with the principles embodied in the Declaration of Helsinki and US NIH regulations. All subjects entered this study voluntarily and signed the informed consent.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Clinical Center (grant number P30 CA008748).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

1. Keall, Mageras, Balter, et al. The management of respiratory motion in radiation oncology report of AAPM task group 76 [published online ahead of print 2006/11/09]. Med Phys. 2006;33(10):3874-3900.

2. Mageras, Dong, Mohan. Image-guided radiation therapy. In: Khan, Sperduto, Gibbons, eds. Treatment planning in radiation therapy. Wolters Kluwer; 2021:374-403.

3. Hoppe, Laser, Kowalski, et al. Acute skin toxicity following stereotactic body radiation therapy for stage I non-small-cell lung cancer: Who's at risk? [published online ahead of print 2008/11/26]. Int J Radiat Oncol Biol Phys. 2008;72(5):1283-1286.

4. Sharp, Zhao, Shirato, Jiang. Statistical analysis and correlation discovery of tumor respiratory motion [published online ahead of print 2007/08/03]. Phys Med Biol. 2007;52(16):4761-4774.

5. Eom, Shi. Predictive modeling of lung motion over the entire respiratory cycle using measured pressure-volume data, 4DCT images, and finite-element analysis [published online ahead of print 2010/10/01]. Med Phys. 2010;37(8):4389-4400.

6. Lewis, Jia, et al. 3D Tumor localization through real-time volumetric x-ray imaging for lung cancer radiotherapy [published online ahead of print 2011/07/23]. Med Phys. 2011;38(5):2783-2794.

7. Keall, Aun Ng, O'Brien, et al. The first clinical treatment with kilovoltage intrafraction monitoring (KIM): A real-time image guidance method [published online ahead of print 2015/01/08]. Med Phys. 2015;42(1):354-358.

8. Kim, Lee, Chie, Kang, Park. Development of patient-controlled respiratory gating system based on visual guidance for magnetic-resonance image-guided radiation therapy [published online ahead of print 2017/07/05]. Med Phys. 2017;44(9):4838-4846.

9. Milewski, Olek, Deasy, Rimner. Enhancement of long-term external-internal correlation by phase-shift detection and correction based on concurrent external bellows and internal navigator signals [published online ahead of print 2019/04/24]. Adv Radiat Oncol. 2019;4(2):377-389.

10. Milewski. Stability and reliability of enhanced external-internal motion correlation via dynamic phase-shift corrections over 30-min timeframe for respiratory-gated radiotherapy. Technol Cancer Res Treat. 2022;21:1-15. 15330338221111592

11. Torshabi, Pella, Riboldi, Baroni. Targeting accuracy in real-time tumor tracking via external surrogates: A comparative study [published online ahead of print 2010/11/13]. Technol Cancer Res Treat. 2010;9(6):551-562.

12. Teng, Chen, Zhang, Ren. Respiratory deformation registration in 4D-CT/cone beam CT using deep learning [published online ahead of print 2021/02/04]. Quant Imaging Med Surg. 2021;11(2):737-748.

13. Cui, Tseng, Pakela, Ten Haken, El Naqa. Introduction to machine and deep learning for medical physicists. Med Phys. 2020;47(5):e127-e147.

14. Shi, Dvornek, et al. Automatic inter-frame patient motion correction for dynamic cardiac PET using deep learning [published online ahead of print 20211130]. IEEE Trans Med Imaging. 2021;40(12):3293-3304.

15. Kim, Park, Gach, Chun, Mutic. Technical note: Real-time 3D MRI in the presence of motion for MRI-guided radiotherapy: 3D dynamic keyhole imaging with super-resolution [published online ahead of print 2019/08/04]. Med Phys. 2019;46(10):4631-4638.

16. Dai, Lei, Roper, et al. Deep learning-based motion tracking using ultrasound images [published online ahead of print 20211101]. Med Phys. 2021;48(12):7747–7756. doi:10.1002/mp.15321

17. Lin, Shi, Wang, Chan, Tang. Towards real-time respiratory motion prediction based on long short-term memory neural networks [published online ahead of print 20190410]. Phys Med Biol. 2019;64(8):085010.

18. Jeong, Cheon, Cho, Han. Clinical applicability of deep learning-based respiratory signal prediction models for four-dimensional radiation therapy [published online ahead of print 20221018]. PloS one. 2022;17(10):e0275719.

19. Wang, et al. Real-time liver tracking algorithm based on LSTM and SVR networks for use in surface-guided radiation therapy [published online ahead of print 20210114]. Radiat Oncol. 2021;16(1):13.

20. Chun, Zhang, Gach, et al. MRI super-resolution reconstruction for MRI-guided adaptive radiotherapy using cascaded deep learning: In the presence of limited training data and unknown translation model [published online ahead of print 2019/07/17]. Med Phys. 2019;46(9):4148-4164.

21. Chang, Dang, Dai, Sun. Real-Time respiratory tumor motion prediction based on a temporal convolutional neural network: Prediction model development study [published online ahead of print 20210827]. J Med Internet Res. 2021;23(8):e27235.

22. Zhang, Yan, Xiao, Zhong. Modeling of artificial intelligence-based respiratory motion prediction in MRI-guided radiotherapy: A review [published online ahead of print 20241008]. Radiat Oncol. 2024;19(1):140.

23. Lombardo, Rabe, Xiong, et al. Offline and online LSTM networks for respiratory motion prediction in MR-guided radiotherapy [published online ahead of print 20220419]. Phys Med Biol. 2022;67(9):095006. doi:10.1088/1361-6560/ac60b7

24. Nie. Real-Time 2D MR cine from beam eye's view with tumor-volume projection to ensure beam-to-tumor conformality for MR-guided radiotherapy of lung cancer [published online ahead of print 20220629]. Front Oncol. 2022;12:898771. doi:10.3389/fonc.2022.898771

25. Han, Huang, Song, Yang, Wang. Dynamic neural networks: A survey [published online ahead of print 20221004]. IEEE Trans Pattern Anal Mach Intell. 2022;44(11):7436-7456.

26. Wei, Olek, et al. Direct comparison of respiration-correlated four-dimensional magnetic resonance imaging reconstructed using concurrent internal navigator and external bellows [published online ahead of print 2016/12/25]. Int J Radiat Oncol Biol Phys. 2017;97(3):596-605.

27. Collins, Reitsma, Altman, Moons KGM. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. Br Med J. 2015;350:g7594. doi:10.1136/bmj.g7594

28. Vedam, Kini, Keall, Ramakrishnan, Mostafavi, Mohan. Quantifying the predictability of diaphragm motion during respiration with a noninvasive external marker [published online ahead of print 2003/05/02]. Med Phys. 2003;30(4):505-513.

29. Santanam, Yang, Parikh. Accuracy and consistency of respiratory gating in abdominal cancer patients [published online ahead of print 2012/06/22]. Int J Radiat Oncol Biol Phys. 2013;85(3):854-861.

A Feasibility Study of Hybrid Deep-Learning Prediction with Online Adaptation of Breathing Irregularities for Long-Term Internal Organ Motion During Radiotherapy
