Abstract
Introduction
Recent advances in deep learning have significantly improved the ability to solve ill-posed problems, making 4D cone-beam CT (CBCT) reconstruction from projections of 3D CBCT imaging achievable. However, extracting respiratory signals from CBCT projections for 4D CBCT phase sorting remains challenging. This study aims to evaluate conventional and deep learning methods for extracting respiratory signals from projections of clinical 3D CBCT imaging.
Methods
This study analyzed 70 sets of projections from clinical 3D CBCT imaging, involving thoracic and abdominal cancer patients with regular and irregular respiratory motion patterns. Using the labeled apex of the diaphragm as a reference, respiratory signals extracted using conventional methods—including intensity analysis (IA), Fourier transform (FT), Amsterdam Shroud (AS), and local principal component analysis (LPCA)—as well as a deep learning-based method (U-Net) were compared through correlation analysis and phase-sorting capability.
Results
The U-Net significantly outperformed the conventional methods across varying conditions, achieving a correlation coefficient of 0.93 ± 0.07. Among the conventional methods, LPCA and AS outperformed IA and FT, with LPCA considered superior because the AS method is sensitive to the cutoff frequencies of the bandpass filter.
Conclusion
The U-Net demonstrates superiority in extracting respiratory signals from clinical 3D CBCT projections, highlighting its potential to enhance respiratory phase sorting and 4D CBCT reconstruction.
Introduction
Radiotherapy has become increasingly important in cancer treatment with the advancement of precision radiotherapy techniques, such as intensity-modulated radiation therapy (IMRT), stereotactic body radiotherapy (SBRT), and particle therapy. These techniques enable the delivery of highly conformal doses to targets, while also posing challenges for respiratory motion management in thoracoabdominal cancer radiotherapy. Currently, respiratory motion management methods mainly include 4DCT, breath hold, abdominal compression, respiratory gating, and motion tracking. 1 In clinical practice, breath hold and abdominal compression require significant patient cooperation and tolerance, 2 while respiratory gating and motion tracking involve more complex technologies and depend on specialized equipment.3,4 Conversely, 4DCT, which creates an internal target volume (ITV) to encompass the respiratory motion of targets, is widely used for its accessibility and reduced patient burden.5,6 Nevertheless, the motion characteristics extracted from 4DCT often fail to represent those observed during dose delivery, leading to dose uncertainties and potentially unfavorable clinical outcomes.7-9 In other words, the 4DCT-based ITV suffers from inter-fractional motion. A well-established approach for quantifying inter-fractional motion is 4D cone-beam CT (CBCT). 10
4D CBCT images comprise a series of 3D CBCT images, each representing the motion state at a specific breathing phase. Conventionally, 4D CBCT images are generated by sorting projections into different breathing phases and reconstructing 3D CBCT images for each phase separately using the Feldkamp-Davis-Kress (FDK) algorithm. 11 In clinical practice, the generation of 4D CBCT images often necessitates thousands of projections, which result in excessive imaging dose and increased scan time.12,13 Recently, deep learning-based volumetric image reconstruction has made the reduction of projections for 4D CBCT reconstruction achievable.14,15 Thummerer et al. 16 achieved 4D CBCT reconstruction using projections obtained with a 3D acquisition protocol on the gantry-mounted CBCT scanner of an IBA Proteus Plus proton therapy system. Yang et al. 17 proposed a multiscale-discriminator generative adversarial network (MSD-GAN) for 4D CBCT reconstruction from projections obtained with a single routine scan. Notably, both studies underscore the importance of extracting internal respiratory signal from CBCT projections for effective 4D CBCT phase sorting and subsequent reconstruction, as external respiratory signal (e.g., from optical surface imaging or respiratory belt) may not accurately represent the motion of internal anatomical structures. 18
Extracting internal respiratory signal manually from CBCT projections is labor-intensive and time-consuming, prompting several studies to explore data-driven methods to overcome these challenges. Zijp et al. 19 proposed the Amsterdam Shroud (AS) method, which converts all the projections into a so-called AS image, enabling the extraction of respiratory signal. Kavanagh et al. 20 introduced the intensity analysis (IA) method, which examines the variations in lung intensity. Vergalasova et al. 21 incorporated Fourier transform (FT) theory and demonstrated that both phase information (FT-p) and magnitude information (FT-m) extracted from projections are applicable for 4D CBCT phase sorting. Yan et al. 22 summarized that variations in the AS image are mainly influenced by respiratory motion and gantry rotation, and achieved real-time extraction of respiratory signal using the local principal component analysis (LPCA) method. To further accelerate the LPCA method, Chao et al. 23 developed an adaptive robust z-normalization filtering technique to augment the weak oscillating structures in the AS image. Tsai et al. 24 combined external respiratory signals and the AS image to provide additional information for LPCA. Edmunds et al. 25 and Radig et al. 26 incorporated a deep learning-based approach to extract respiratory signal in a more direct manner.
However, comparison studies among these methods remain limited, particularly in the context of respiratory signal extraction from projections of 3D CBCT imaging. Tan et al. 27 conducted a case study comparing the conventional methods, finding that the LPCA method outperformed the AS method, the IA method, and the FT method. In a follow-up case study, Tan et al. 28 demonstrated that on average, the AS method outperformed other methods across various motion patterns, and found that the performance of these conventional methods is associated with the motion pattern. Collectively, these comparison studies did not include the more advanced deep learning-based methods. Additionally, the inter-patient variability in CBCT projections, which cannot be fully captured by case studies, may significantly influence the performance of data-driven methods. 22 Therefore, in this paper, we aim to evaluate the performance of the conventional methods and a deep learning-based method for extracting respiratory signals from projections of clinical 3D CBCT images across varying anatomical locations and respiratory cycle regularity.
The main contributions of this paper are summarized as follows: (1) To our knowledge, a comprehensive comparison of conventional methods and a deep learning–based method for extracting respiratory signals is performed using a large cohort of clinical 3D CBCT projection data, providing a more robust evaluation than previously reported case-based studies. (2) The evaluated methods are assessed using low-dose 3D CBCT projections, whose image quality is inferior to that of projections acquired for 4D CBCT. This investigation establishes a practical basis for 4D CBCT reconstruction using routine clinical CBCT scans without additional imaging dose. (3) The advantages and limitations of each method are systematically analyzed from a clinical perspective, offering practical guidance for method selection and highlighting key challenges for reliable respiratory signal extraction across different anatomical sites and levels of respiratory regularity.
Materials, Patients and Methods
Patient Data Acquisition and Reference Signal Extraction
The Parameters Used for CBCT Imaging
For each projection, the apexes of the bilateral diaphragms were manually labeled by two independent observers and cross-checked. The two-dimensional image coordinates
As a result, data from 16 patients (8 lung cancer cases, 8 liver cancer cases) out of the 86 were excluded from the analysis. The reference signals extracted from the 70 sets of projections (a total of 17,191 projections) were then classified as irregular according to the following criteria: (1) baseline shifts (the difference in valley positions across multiple respiratory cycles) exceeding 20% of the average respiratory motion amplitude; and (2) a coefficient of variation 29 of the respiratory cycle durations exceeding 0.2. Accordingly, the reference signals were categorized into the following groups: Thorax and Regular (15 patients), Abdomen and Regular (20 patients), Thorax and Irregular (12 patients), and Abdomen and Irregular (23 patients).
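The two irregularity criteria can be sketched as follows. This is an illustrative Python snippet (the study's analysis was implemented in MATLAB); the function name and the max-minus-min definition of baseline shift are assumptions, and whether the criteria combine with "or" or "and" is not stated explicitly in the text, so "or" is assumed here.

```python
import numpy as np

def is_irregular(valley_positions, cycle_durations, mean_amplitude):
    """Classify a reference respiratory signal as irregular.

    Criterion 1: baseline shift (spread of valley positions across
                 cycles) exceeds 20% of the mean motion amplitude.
    Criterion 2: coefficient of variation of cycle durations exceeds 0.2.
    """
    valleys = np.asarray(valley_positions, dtype=float)
    cycles = np.asarray(cycle_durations, dtype=float)
    baseline_shift = valleys.max() - valleys.min()   # spread of valleys
    cv = cycles.std() / cycles.mean()                # variability of cycle length
    return bool(baseline_shift > 0.2 * mean_amplitude or cv > 0.2)
```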
The Conventional Methods
Figure 1 shows an overview of the conventional methods for extracting respiratory signals from projections of clinical 3D CBCT imaging. All image processing and data analysis were implemented using an in-house developed MATLAB program (v2014b, MathWorks Inc, Natick, MA).
Figure 1. Overview of the conventional methods for extracting respiratory signals from projections of clinical 3D CBCT imaging
The principle of the AS method 19 for extracting respiratory signal is based on the periodic variations in anatomical structures (i.e., the diaphragm) in sequential CBCT projections due to respiratory motion. The main steps of the AS method are as follows: (1) for each projection, logarithmic transformation is applied to the pixel values, and the first derivative along the superior-inferior direction is computed to achieve edge enhancement; (2) each projection is individually processed by averaging the enhanced values along the left-right axis, and the results from all projections are then concatenated to form the AS image; (3) L2-minimization is performed by comparing the first column of the z-normalized AS image with all subsequent columns to extract the variations in anatomical structures; (4) a bandpass filter is used to extract the respiratory signal from the variations in anatomical structures.
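Steps 1 to 3 above can be sketched in Python (the authors' implementation was in MATLAB). The array layout and function names are assumptions; step 4, the bandpass filter, is omitted here.

```python
import numpy as np

def amsterdam_shroud(projections, eps=1e-6):
    """Build the AS image from a stack of CBCT projections.

    projections: array of shape (n_proj, rows, cols), with rows running
    along the superior-inferior direction. Returns the z-normalized AS
    image with one column per projection.
    """
    # (1) log-transform, then derivative along the superior-inferior axis
    log_proj = np.log(projections + eps)
    edges = np.diff(log_proj, axis=1)                 # edge enhancement
    # (2) average along the left-right axis; one column per projection
    as_image = edges.mean(axis=2).T                   # shape (rows-1, n_proj)
    # z-normalize each column before comparison
    as_image = (as_image - as_image.mean(axis=0)) / (as_image.std(axis=0) + eps)
    return as_image

def extract_variations(as_image):
    """(3) L2 distance of every column to the first column."""
    ref = as_image[:, [0]]
    return np.sqrt(((as_image - ref) ** 2).sum(axis=0))
```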
The LPCA method 22 provides an alternative approach for extracting respiratory signal using the AS image. In this method, foreground extraction is applied to enhance the variations in the AS image, which primarily result from respiratory motion and gantry rotation. Consequently, PCA is performed sequentially on the enhanced AS image to isolate the respiratory signal component, using a sliding window of 55 columns. Notably, the LPCA method eliminates the need for applying a bandpass filter.
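A minimal sketch of the sliding-window PCA (55-column window, as above) follows. This is a Python illustration, not the published implementation: the foreground-extraction step is omitted, and sign consistency between successive windows is handled heuristically.

```python
import numpy as np

def lpca_signal(as_image, window=55):
    """Sliding-window PCA on the (enhanced) AS image.

    For each window of columns, the score of the newest column on the
    first principal component is taken as one sample of the signal.
    """
    n_cols = as_image.shape[1]
    signal, prev_pc = [], None
    for end in range(window, n_cols + 1):
        block = as_image[:, end - window:end]
        block = block - block.mean(axis=1, keepdims=True)   # center rows
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        pc = u[:, 0]                                        # first principal axis
        # keep the sign consistent between successive windows
        if prev_pc is not None and np.dot(pc, prev_pc) < 0:
            pc = -pc
        prev_pc = pc
        signal.append(float(pc @ block[:, -1]))             # score of newest column
    return np.array(signal)
```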
The IA method 20 and the FT method 21 mainly focus on respiratory-induced tissue variations within the selected region-of-interest (ROI), typically beginning at the superior portion of the lungs and encompassing all regions below. More specifically, the IA method detects the average density within the selected ROI from each CBCT projection. The FT method, encompassing FT-m and FT-p, utilizes the 2D Fourier transform to extract tissue variations from each CBCT projection. FT-m extracts the respiratory signal by plotting the direct current component, representing the average intensity of each projection. As reported in Tan et al., 27 FT-m and IA are equivalent in practice, so in this study we combine them for analysis. The FT-p method extracts the first low-frequency component along the y-axis in Fourier space, specifically at the (0,1) location, from each CBCT projection. However, the tissue variations across sequential projections can also be influenced by gantry rotation and heartbeat, which occur at frequencies distinct from respiratory motion. Therefore, a bandpass filter was applied to isolate the desired respiratory signal for these methods.
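The IA and FT-p extraction steps can be sketched as follows (a Python illustration; the ROI convention and function names are assumptions, and the index of the "(0,1)" component depends on the FFT axis convention — here axis 0 is taken as the y-axis).

```python
import numpy as np

def ia_signal(projection, roi):
    """IA / FT-m: mean intensity inside the ROI (r0:r1, c0:c1)."""
    r0, r1, c0, c1 = roi
    return float(projection[r0:r1, c0:c1].mean())

def ftp_signal(projection, roi):
    """FT-p: phase of the first low-frequency Fourier component along
    the y-axis of the ROI (the '(0,1)' location in the paper's notation)."""
    r0, r1, c0, c1 = roi
    spectrum = np.fft.fft2(projection[r0:r1, c0:c1])
    return float(np.angle(spectrum[1, 0]))
```

Note that the DC component `spectrum[0, 0]` equals the ROI sum, so FT-m is the mean intensity up to a constant factor, which is why IA and FT-m behave equivalently in practice.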
Additionally, given the variation in respiratory motion frequencies across patients, the choice of cut-off frequencies for the bandpass filter can impact the performance of the AS, IA, and FT methods. In this study, patient-specific cut-off frequencies were first applied to the bandpass filter to evaluate the performance of these methods. Subsequently, we calculated the 95% confidence interval of the respiratory motion frequencies across all patients to determine population-based cut-off frequencies.
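A bandpass step of this kind can be sketched with a simple FFT-mask filter in Python. This is only a stand-in: the published methods typically use an IIR (e.g., Butterworth) bandpass, and the sampling rate and cutoffs below are illustrative.

```python
import numpy as np

def bandpass(signal, fs, f_lo, f_hi):
    """Zero out FFT bins outside [f_lo, f_hi] (Hz) and invert.

    signal: 1D array sampled at fs Hz.
    """
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.fft.rfft(signal)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0   # reject out-of-band bins
    return np.fft.irfft(spectrum, n=len(signal))
```

For example, with cutoffs of 0.10 Hz and 0.55 Hz (the population-based interval reported later in this study), a 0.3 Hz respiratory component passes while a 2 Hz cardiac-range component is rejected.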
Deep Learning-Based Method
In this study, the classic U-Net architecture 30 was employed to represent the deep learning-based method, with a single CBCT projection used as input. To meet the output requirements of the U-Net, we fitted a parabolic curve to the apex points of the bilateral diaphragms to approximate the diaphragm contour, as shown in Figure 2A. The implementation of the U-Net was based on publicly available code. Figure 2C illustrates the architecture of the U-Net used in this study. The network follows the classic encoder-decoder structure with symmetric skip connections between corresponding layers. The encoder consists of a series of convolutional and max-pooling layers that progressively extract high-level features while reducing spatial resolution. The decoder mirrors the encoder with up-sampling and convolutional layers, gradually restoring spatial resolution and combining feature maps from the encoder via skip connections to enhance localization accuracy. The y-coordinate of the apex of the higher diaphragm was extracted from the segmented output image and used as the respiratory motion signal.
Figure 2. Overview of the U-Net for extracting respiratory signals from projections of clinical 3D CBCT imaging. (A) Parabolic curve fitted to diaphragm apex points to generate the input contour for the U-Net. (B) Training/test dataset split. (C) The architecture of the U-Net
To ensure that the model performance was evaluated on all available data, a rotational testing strategy was adopted, as illustrated in Figure 2B. Specifically, the 70 projection datasets were divided into five groups, each serving once as the test set while the remaining groups were used for training. The combined results on all test sets constituted the final evaluation results.
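The rotational testing strategy amounts to a 5-fold split in which every dataset appears in exactly one test set. A minimal sketch (function name and the interleaved group assignment are assumptions; the paper does not specify how the 70 datasets were grouped):

```python
def rotational_folds(n_items, n_groups=5):
    """Yield (train, test) index lists; each group serves once as the
    test set while the remaining groups form the training set."""
    groups = [list(range(g, n_items, n_groups)) for g in range(n_groups)]
    for test_idx in groups:
        test_set = set(test_idx)
        train_idx = [i for i in range(n_items) if i not in test_set]
        yield train_idx, test_idx
```

Pooling the predictions from all five test sets then gives one evaluation result per dataset, covering the entire cohort.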
Statistical Analysis
Taking the labeled respiratory signal as the reference, the extracted signals from the conventional methods and the U-Net were evaluated using the Pearson correlation coefficient (r).
The goal of extracting respiratory signals using these methods is to enable 4D CBCT phase sorting, which can be accomplished through both phase and amplitude sorting. Amplitude error and phase error are used to evaluate the accuracies of amplitude and phase sorting, respectively. It should be emphasized that these 4D CBCT phase-sorting metrics are applicable only to regular breathing patterns.
The amplitude error, which equals the normalized root mean square error, ranges from 0% to 100%, with 0% indicating perfect agreement between the two signals and 100% representing the maximum error.
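The two evaluation metrics can be sketched in Python as follows. The min-max normalization used for the amplitude error is an assumption (the text states only that the error is a normalized RMSE bounded by 100%), and the function names are illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two signals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))

def amplitude_error(extracted, reference):
    """Normalized RMSE in percent: 0% is perfect agreement, 100% the
    maximum error. Both signals are min-max scaled to [0, 1] first so
    that the error is bounded."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min())
    e, r = norm(extracted), norm(reference)
    return float(100.0 * np.sqrt(np.mean((e - r) ** 2)))
```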
Results
Correlation Analysis of the Conventional Methods and the U-Net
Correlation Coefficient
Phase-Sorting Capability in the Regular Groups
Supplemental Material 1 shows a comparison of extracted and reference respiratory signals at different correlation levels. The extracted signals exhibit a notable discrepancy from the reference signals at correlation coefficients below 0.8. Therefore, we did not analyze the amplitude error and phase error for the IA/FT-m and FT-p methods. As shown in Figure 3, no statistically significant differences were found in the amplitude error and phase error between the respiratory signals extracted by AS and LPCA for the regular groups based on the paired t-test (amplitude error: 12.32 ± 2.27% vs 12.71 ± 2.57%, P = 0.14; phase error: 7.85 ± 2.95% vs 8.17 ± 2.32%, P = 0.44). In contrast, the U-Net achieved significantly superior and more reliable phase-sorting performance, yielding amplitude and phase errors of 7.58 ± 2.57% and 3.79 ± 1.63%, respectively.
Figure 3. The amplitude error and phase error of the extracted respiratory signals from AS, LPCA, and the U-Net, respectively. The distributions of errors are presented as violin plots
Worst-Case Analysis of the U-Net
Figure 4 shows representative worst-case analyses of the U-Net. In (A), the couch structure was misidentified as the diaphragm. In (B), a parabolic contour in the neck region was incorrectly recognized as the diaphragm. In (C), the bilateral diaphragms were erroneously merged and detected as a single hemidiaphragm. However, the confidence scores for correctly identified diaphragms were generally higher than 0.7, while misidentified structures tended to have lower confidence scores. Alternatively, the respiratory signal corresponding to the higher side of the diaphragm can be selected from the scatter plots, as shown in Figure 4D.
Figure 4. Representative worst-case analysis of the U-Net. (A) The couch structure was misidentified as the diaphragm. (B) A parabolic contour in the neck region was incorrectly recognized as the diaphragm. (C) The bilateral diaphragms were erroneously merged and detected as a single hemidiaphragm. (D) Scatter plots of detected diaphragm points from which the respiratory signal on the higher side of the diaphragm can be selected
Effect of Cutoff Frequencies on the Conventional Methods
Although the U-Net demonstrates superior performance, comparative analysis of the conventional methods remains essential for understanding their parameter dependencies and practical limitations. Figure 5A shows the distribution of low-frequency cutoff values for the patient-specific bandpass filter. Based on these data, the 95% confidence interval for the respiratory motion frequencies across all patients is estimated to range from 0.10 Hz to 0.55 Hz, corresponding to the lower and upper cutoff frequencies of the population-based bandpass filter. As shown in Figure 5B, applying the population-based bandpass filter significantly reduces the correlation between the reference and the extracted respiratory signals from IA/FT-m (from 0.60 ± 0.10 to 0.28 ± 0.15, P < 0.05), FT-p (from 0.63 ± 0.11 to 0.32 ± 0.15, P < 0.05), and AS (from 0.82 ± 0.05 to 0.75 ± 0.12, P < 0.05) for the regular groups.
Figure 5. The impact of cutoff frequencies of the bandpass filter on the performance of IA/FT-m, FT-p, and AS. (A) The distribution of low-frequency cutoff values for the patient-specific bandpass filter. (B) Comparisons of results from patient-specific and population-based bandpass filters. The distributions of errors are presented as violin plots
Discussion
4D CBCT plays a critical role in quantifying both inter-fractional and intra-fractional motion, especially in patients with small tumors.31,32 However, most linear accelerators are equipped with only 3D CBCT and lack 4D CBCT capability, which requires additional equipment, such as external respiratory signal sensors synchronized with the CBCT system, to extract respiratory signals for phase sorting in 4D CBCT reconstruction. Therefore, in this study, we evaluated the conventional methods and the U-Net for extracting respiratory signals and phase sorting from projections of clinical 3D CBCT across anatomical locations and levels of respiratory cycle regularity. Our findings suggest that the U-Net significantly outperformed the conventional methods across varying conditions. Among the conventional methods, AS and LPCA consistently outperformed IA and FT, with LPCA being superior because it does not rely on bandpass filtering.
The results for the conventional methods obtained in our study (Table 2) are inferior to those reported in the case study conducted by Tan et al., 27 in which the results from AS, LPCA, and FT-p were around 0.90. Interestingly, in a follow-up case study, Tan et al. 28 observed that correlation coefficients of these methods ranged from 0.56 to 0.90, with variations depending on the motion patterns. Our findings suggest that these methods are more effective for the patient data from the regular groups, as opposed to the irregular groups. This highlights respiratory motion regularity as a key factor influencing the performance of the conventional methods. However, the patient data from the irregular groups do not meet the data requirements for 4D CBCT reconstruction.33,34 For this reason, the results from the regular groups were considered representative of the final performance of the conventional methods. Conversely, the performance of the U-Net is not influenced by anatomical locations and respiratory cycle regularity, suggesting its robustness over the conventional methods against varying conditions.
In the absence of deep learning models such as the U-Net, it remains necessary to assess the relative performance of the conventional methods. As in Tan et al.,27,28 AS and LPCA outperform IA and FT across varying conditions. One possible reason is that AS and LPCA are less dependent on the accuracy of CBCT projection intensities and thus tend to be more reliable under low-dose CBCT imaging protocols. In addition, AS and LPCA are less affected by the differences between the thoracic and abdominal groups compared to IA and FT. Indeed, IA and FT are primarily designed for extracting respiratory signals in thoracic patients. Additionally, although AS with a patient-specific bandpass filter and LPCA perform similarly in phase sorting for 4D CBCT reconstruction, LPCA is considered superior to AS. This is because LPCA operates without relying on bandpass filtering, whereas the cutoff frequencies of the filter (set by the respiratory motion frequency) may not be accurately determined without additional external respiratory signal sensors, which are not always available.
LPCA is also reported to be superior to IA, FT, and AS in real-time applications, such as respiratory motion tracking.22-24 However, the average correlation coefficients from LPCA for the regular groups in our study are only slightly above 0.80, which is considered insufficient for respiratory motion tracking.35-37 Fortunately, a correlation coefficient of 0.80 corresponds to errors within 1/6 of a respiratory cycle (amplitude error: 12.71 ± 2.56%, phase error: 8.17 ± 2.31%), indicating a minor impact on 4D CBCT reconstruction across six respiratory phases. 10 Moreover, the respiratory signal extracted using LPCA is more robust than the results from external respiratory signal sensors, such as the real-time position management (RPM) system. Wang et al. 18 reported a large variance in the internal-external correlation from 4D CT, with values ranging from 0.01 to 0.99 in thoracic cancer patients and from 0.55 to 1.00 in abdominal cancer patients. In contrast, the results from LPCA in our study ranged from 0.63 to 0.90 in thoracic cancer patients and from 0.76 to 0.87 in abdominal cancer patients. Nevertheless, external respiratory signal-based methods are not patient-dependent, whereas in our study 16 out of 86 patients could not yield continuous respiratory signals because of difficulties in manual labeling (example plots are shown in Supplemental Material 2). In cases where manual delineation is challenging, automated methods are even less likely to produce reliable results. This highlights that approaches relying on internal anatomical structures are inherently patient-dependent, with the U-Net being particularly affected.
A limitation of our study is that only patient data with distinct moving anatomical features (the diaphragm) were included in the analysis, as this was essential for labeling the reference signals. This restriction prevents us from examining a known limitation of the deep learning-based methods and the AS method, noted by Yan et al., 22 namely their dependence on the presence of distinct moving anatomical features. However, our study also found that LPCA outperforms AS in the presence of the diaphragm. Furthermore, the dataset used in this study was relatively limited in size and was collected from a single institution. Therefore, the generalizability of the U-Net model to data acquired from different scanners or imaging protocols remains to be validated. The U-Net model was trained and evaluated on the same dataset without external validation. Further evaluation using independent patient cohorts is required to confirm the model's robustness and clinical applicability.
Conclusion
In this study, we demonstrated that the U-Net significantly outperforms the conventional methods (IA, FT, AS, and LPCA) in extracting respiratory signals across varying anatomical locations and levels of respiratory cycle regularity. However, the U-Net relies on diaphragm delineation and is inherently patient-dependent. Among the conventional methods, LPCA is superior because it does not rely on bandpass filtering. These findings contribute to advancing phase-sorting techniques and facilitating deep learning-based 4D CBCT reconstruction using projections from clinical 3D CBCT imaging protocols.
Supplemental Material
Supplemental Material - Comparative Evaluation of Conventional and Deep Learning Methods for Respiratory Signal Extraction From Clinical 3D CBCT Projections
Supplemental Material for Comparative Evaluation of Conventional and Deep Learning Methods for Respiratory Signal Extraction From Clinical 3D CBCT Projections by Wan Li, Weihang Yang, Xiangyu Zhang, Yinan Huang, Xiaokang Wang, Renming Zhong, and Xiangbin Zhang in Technology in Cancer Research & Treatment.
Footnotes
Ethical Considerations
This study was approved by the Ethics Committee on Biomedical Research, West China Hospital of Sichuan University (Approval No. 20230614).
Consent to Participate
All participants provided written informed consent.
Author Contributions
W. Li and W. Yang contributed equally to this work. W. Li, W. Yang, X. Wang, and Y. Huang performed experiments, data curation, and formal analysis. X. Zhang and R. Zhong conceived and supervised the study. W. Li and W. Yang drafted the manuscript, and all authors contributed to manuscript revision and approved the final version. X. Zhang acquired funding and managed the project.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by the National Natural Science Foundation of China (No. 12405390) and the Science and Technology Department of Sichuan Province (No. 2024YFFK0147, 2026NSFSC1898).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Supplemental Material
Supplemental material for this article is available online.
