Abstract
Introduction
Recent advances in deep learning have significantly improved the ability to solve ill-posed problems, making 4D cone-beam CT (CBCT) reconstruction from projections of 3D CBCT imaging achievable. However, extracting respiratory signals from CBCT projections for 4D CBCT phase sorting remains challenging. This study aims to evaluate conventional and deep learning methods for extracting respiratory signals from projections of clinical 3D CBCT imaging.
Methods
This study analyzed 70 sets of projections from clinical 3D CBCT imaging, involving thoracic and abdominal cancer patients with regular and irregular respiratory motion patterns. Using the labeled apex of the diaphragm as a reference, respiratory signals extracted using conventional methods—including intensity analysis (IA), Fourier transform (FT), Amsterdam Shroud (AS), and local principal component analysis (LPCA)—as well as a deep learning-based method (U-Net) were compared through correlation analysis and phase-sorting capability.
Results
The U-Net significantly outperformed the conventional methods across varying conditions, achieving a correlation coefficient of 0.93 ± 0.07. Among the conventional methods, LPCA and AS outperformed IA and FT, with LPCA considered superior because the AS method is sensitive to the cutoff frequencies of the bandpass filter.
Conclusion
The U-Net demonstrates superiority in extracting respiratory signals from clinical 3D CBCT projections, highlighting its potential to enhance respiratory phase sorting and 4D CBCT reconstruction.
Introduction
Radiotherapy has become increasingly important in cancer treatment with the advancement of precision radiotherapy techniques, such as intensity-modulated radiation therapy (IMRT), stereotactic body radiotherapy (SBRT), and particle therapy. These techniques enable the delivery of highly conformal doses to targets, while also posing challenges for respiratory motion management in thoracoabdominal cancer radiotherapy. Currently, respiratory motion management methods mainly include 4DCT, breath hold, abdominal compression, respiratory gating, and motion tracking. 1 In clinical practice, breath hold and abdominal compression require significant patient cooperation and tolerance, 2 while respiratory gating and motion tracking involve more complex technologies and depend on specialized equipment.3,4 Conversely, 4DCT, which creates an internal target volume (ITV) to encompass the respiratory motion of targets, is widely used for its accessibility and reduced patient burden.5,6 Nevertheless, the motion characteristics extracted from 4DCT often fail to represent those observed during dose delivery, leading to dose uncertainties and potentially unfavorable clinical outcomes.7-9 In other words, the 4DCT-based ITV suffers from inter-fractional motion. A well-established approach for quantifying inter-fractional motion is 4D cone-beam CT (CBCT). 10
4D CBCT images comprise a series of 3D CBCT images, each representing the motion state at a specific breathing phase. Conventionally, 4D CBCT images are generated by sorting projections into different breathing phases and reconstructing 3D CBCT images for each phase separately using the Feldkamp-Davis-Kress (FDK) algorithm. 11 In clinical practice, the generation of 4D CBCT images often necessitates thousands of projections, which result in excessive imaging dose and increased scan time.12,13 Recently, deep learning-based volumetric image reconstruction has made the reduction of projections for 4D CBCT reconstruction achievable.14,15 Thummerer et al. 16 achieved 4D CBCT reconstruction using projections obtained with a 3D acquisition protocol on the gantry-mounted CBCT scanner of an IBA Proteus Plus proton therapy system. Yang et al. 17 proposed a multiscale-discriminator generative adversarial network (MSD-GAN) for 4D CBCT reconstruction from projections obtained with a single routine scan. Notably, both studies underscore the importance of extracting internal respiratory signal from CBCT projections for effective 4D CBCT phase sorting and subsequent reconstruction, as external respiratory signal (e.g., from optical surface imaging or respiratory belt) may not accurately represent the motion of internal anatomical structures. 18
Extracting internal respiratory signal manually from CBCT projections is labor-intensive and time-consuming, prompting several studies to explore data-driven methods to overcome these challenges. Zijp et al. 19 proposed the Amsterdam Shroud (AS) method, which converts all the projections into a so-called AS image, enabling the extraction of respiratory signal. Kavanagh et al. 20 introduced the intensity analysis (IA) method, which examines the variations in lung intensity. Vergalasova et al. 21 incorporated Fourier transform (FT) theory and demonstrated that both phase information (FT-p) and magnitude information (FT-m) extracted from projections are applicable for 4D CBCT phase sorting. Yan et al. 22 summarized that variations in the AS image are mainly influenced by respiratory motion and gantry rotation, and achieved real-time extraction of respiratory signal using the local principal component analysis (LPCA) method. To further accelerate the LPCA method, Chao et al. 23 developed an adaptive robust z-normalization filtering technique to augment the weak oscillating structures in the AS image. Tsai et al. 24 combined external respiratory signals and the AS image to provide additional information for LPCA. Edmunds et al. 25 and Radig et al. 26 incorporated a deep learning-based approach to extract respiratory signal in a more direct manner.
However, comparison studies among these methods remain limited, particularly in the context of respiratory signal extraction from projections of 3D CBCT imaging. Tan et al. 27 conducted a case study comparing the conventional methods, finding that the LPCA method outperformed the AS method, the IA method, and the FT method. In a follow-up case study, Tan et al. 28 demonstrated that on average, the AS method outperformed other methods across various motion patterns, and found that the performance of these conventional methods is associated with the motion pattern. Collectively, these comparison studies did not include the more advanced deep learning-based methods. Additionally, the inter-patient variability in CBCT projections, which cannot be fully captured by case studies, may significantly influence the performance of data-driven methods. 22 Therefore, in this paper, we aim to evaluate the performance of the conventional methods and a deep learning-based method for extracting respiratory signals from projections of clinical 3D CBCT images across varying anatomical locations and respiratory cycle regularity.
The main contributions of this paper are summarized as follows: (1) To our knowledge, a comprehensive comparison of conventional methods and a deep learning–based method for extracting respiratory signals is performed using a large cohort of clinical 3D CBCT projection data, providing a more robust evaluation than previously reported case-based studies. (2) The evaluated methods are assessed using low-dose 3D CBCT projections, whose image quality is inferior to that of projections acquired for 4D CBCT. This investigation establishes a practical basis for 4D CBCT reconstruction using routine clinical CBCT scans without additional imaging dose. (3) The advantages and limitations of each method are systematically analyzed from a clinical perspective, offering practical guidance for method selection and highlighting key challenges for reliable respiratory signal extraction across different anatomical sites and levels of respiratory regularity.
Materials, Patients and Methods
Patient Data Acquisition and Reference Signal Extraction
The Parameters Used for CBCT Imaging
For each projection, the apexes of the bilateral diaphragms were manually labeled by two independent observers and cross-checked. The two-dimensional image coordinates
As a result, data from 16 patients (8 lung cancer cases, 8 liver cancer cases) out of the 86 were excluded from the analysis. The reference signals extracted from the 70 sets of projections (a total of 17,191 projections) were then classified as irregular according to the following criteria: (1) baseline shifts (the difference in valley positions across multiple respiratory cycles) exceeding 20% of the average respiratory motion amplitude; and (2) a coefficient of variation 29 of the respiratory cycle durations exceeding 0.2. Accordingly, the reference signals were categorized into the following groups: Thorax and Regular (15 patients), Abdomen and Regular (20 patients), Thorax and Irregular (12 patients), and Abdomen and Irregular (23 patients).
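The two irregularity criteria can be sketched as follows. This is an illustrative Python snippet (the study's analysis was implemented in MATLAB); the function name and the max-minus-min definition of baseline shift are assumptions, and whether the criteria combine with "or" or "and" is not stated explicitly in the text, so "or" is assumed here.

```python
import numpy as np

def is_irregular(valley_positions, cycle_durations, mean_amplitude):
    """Classify a reference respiratory signal as irregular.

    Criterion 1: baseline shift (spread of valley positions across
                 cycles) exceeds 20% of the mean motion amplitude.
    Criterion 2: coefficient of variation of cycle durations exceeds 0.2.
    """
    valleys = np.asarray(valley_positions, dtype=float)
    cycles = np.asarray(cycle_durations, dtype=float)
    baseline_shift = valleys.max() - valleys.min()   # spread of valleys
    cv = cycles.std() / cycles.mean()                # variability of cycle length
    return bool(baseline_shift > 0.2 * mean_amplitude or cv > 0.2)
```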
The Conventional Methods
Figure 1 shows an overview of the conventional methods for extracting respiratory signals from projections of clinical 3D CBCT imaging. All image processing and data analysis were implemented using an in-house developed MATLAB program (v2014b, MathWorks Inc, Natick, MA).
Figure 1. Overview of the conventional methods for extracting respiratory signals from projections of clinical 3D CBCT imaging
The principle of the AS method 19 for extracting respiratory signal is based on the periodic variations in anatomical structures (i.e., the diaphragm) in sequential CBCT projections due to respiratory motion. The main steps of the AS method are as follows: (1) for each projection, logarithmic transformation is applied to the pixel values, and the first derivative along the superior-inferior direction is computed to achieve edge enhancement; (2) each projection is individually processed by averaging the enhanced values along the left-right axis, and the results from all projections are then concatenated to form the AS image; (3) L2-minimization is performed by comparing the first column of the z-normalized AS image with all subsequent columns to extract the variations in anatomical structures; (4) a bandpass filter is used to extract the respiratory signal from the variations in anatomical structures.
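Steps 1 to 3 above can be sketched in Python (the authors' implementation was in MATLAB). The array layout and function names are assumptions; step 4, the bandpass filter, is omitted here.

```python
import numpy as np

def amsterdam_shroud(projections, eps=1e-6):
    """Build the AS image from a stack of CBCT projections.

    projections: array of shape (n_proj, rows, cols), with rows running
    along the superior-inferior direction. Returns the z-normalized AS
    image with one column per projection.
    """
    # (1) log-transform, then derivative along the superior-inferior axis
    log_proj = np.log(projections + eps)
    edges = np.diff(log_proj, axis=1)                 # edge enhancement
    # (2) average along the left-right axis; one column per projection
    as_image = edges.mean(axis=2).T                   # shape (rows-1, n_proj)
    # z-normalize each column before comparison
    as_image = (as_image - as_image.mean(axis=0)) / (as_image.std(axis=0) + eps)
    return as_image

def extract_variations(as_image):
    """(3) L2 distance of every column to the first column."""
    ref = as_image[:, [0]]
    return np.sqrt(((as_image - ref) ** 2).sum(axis=0))
```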
The LPCA method 22 provides an alternative approach for extracting respiratory signal using the AS image. In this method, foreground extraction is applied to enhance the variations in the AS image, which primarily result from respiratory motion and gantry rotation. Consequently, PCA is performed sequentially on the enhanced AS image to isolate the respiratory signal component, using a sliding window of 55 columns. Notably, the LPCA method eliminates the need for applying a bandpass filter.
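A minimal sketch of the sliding-window PCA (55-column window, as above) follows. This is a Python illustration, not the published implementation: the foreground-extraction step is omitted, and sign consistency between successive windows is handled heuristically.

```python
import numpy as np

def lpca_signal(as_image, window=55):
    """Sliding-window PCA on the (enhanced) AS image.

    For each window of columns, the score of the newest column on the
    first principal component is taken as one sample of the signal.
    """
    n_cols = as_image.shape[1]
    signal, prev_pc = [], None
    for end in range(window, n_cols + 1):
        block = as_image[:, end - window:end]
        block = block - block.mean(axis=1, keepdims=True)   # center rows
        u, s, vt = np.linalg.svd(block, full_matrices=False)
        pc = u[:, 0]                                        # first principal axis
        # keep the sign consistent between successive windows
        if prev_pc is not None and np.dot(pc, prev_pc) < 0:
            pc = -pc
        prev_pc = pc
        signal.append(float(pc @ block[:, -1]))             # score of newest column
    return np.array(signal)
```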
The IA method 20 and the FT method 21 mainly focus on respiratory-induced tissue variations within the selected region-of-interest (ROI), typically beginning at the superior portion of the lungs and encompassing all regions below. More specifically, the IA method detects the average density within the selected ROI from each CBCT projection. The FT method, encompassing FT-m and FT-p, utilizes the 2D Fourier transform to extract tissue variations from each CBCT projection. FT-m extracts the respiratory signal by plotting the direct current component, representing the average intensity of each projection. As reported in Tan et al., 27 FT-m and IA are equivalent in practice, so in this study we combine them for analysis. The FT-p method extracts the first low-frequency component along the y-axis in Fourier space, specifically at the (0,1) location, from each CBCT projection. However, the tissue variations across sequential projections can also be influenced by gantry rotation and heartbeat, which occur at frequencies distinct from respiratory motion. Therefore, a bandpass filter was applied to isolate the desired respiratory signal for these methods.
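The IA and FT-p extraction steps can be sketched as follows (a Python illustration; the ROI convention and function names are assumptions, and the index of the "(0,1)" component depends on the FFT axis convention — here axis 0 is taken as the y-axis).

```python
import numpy as np

def ia_signal(projection, roi):
    """IA / FT-m: mean intensity inside the ROI (r0:r1, c0:c1)."""
    r0, r1, c0, c1 = roi
    return float(projection[r0:r1, c0:c1].mean())

def ftp_signal(projection, roi):
    """FT-p: phase of the first low-frequency Fourier component along
    the y-axis of the ROI (the '(0,1)' location in the paper's notation)."""
    r0, r1, c0, c1 = roi
    spectrum = np.fft.fft2(projection[r0:r1, c0:c1])
    return float(np.angle(spectrum[1, 0]))
```

Note that the DC component `spectrum[0, 0]` equals the ROI sum, so FT-m is the mean intensity up to a constant factor, which is why IA and FT-m behave equivalently in practice.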
Additionally, given the variation in respiratory motion frequencies across patients, the choice of cut-off frequencies for the bandpass filter can impact the performance of the AS, IA, and FT methods. In this study, patient-specific cut-off frequencies were first applied to the bandpass filter to evaluate the performance of these methods. Subsequently, we calculated the 95% confidence interval of the respiratory motion frequencies across all patients to determine population-based cut-off frequencies.
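A bandpass step of this kind can be sketched with a simple FFT-mask filter in Python. This is only a stand-in: the published methods typically use an IIR (e.g., Butterworth) bandpass, and the sampling rate and cutoffs below are illustrative.

```python
import numpy as np

def bandpass(signal, fs, f_lo, f_hi):
    """Zero out FFT bins outside [f_lo, f_hi] (Hz) and invert.

    signal: 1D array sampled at fs Hz.
    """
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum = np.fft.rfft(signal)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0   # reject out-of-band bins
    return np.fft.irfft(spectrum, n=len(signal))
```

For example, with cutoffs of 0.10 Hz and 0.55 Hz (the population-based interval reported later in this study), a 0.3 Hz respiratory component passes while a 2 Hz cardiac-range component is rejected.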
Deep Learning-Based Method
In this study, the classic U-Net architecture 30 was employed to represent the deep learning-based method, with a single CBCT projection used as input. To meet the output requirements of the U-Net, we fitted a parabolic curve to the apex points of the bilateral diaphragms to approximate the diaphragm contour, as shown in Figure 2A. The implementation of the U-Net was based on publicly available code. Figure 2C illustrates the architecture of the U-Net used in this study. The network follows the classic encoder-decoder structure with symmetric skip connections between corresponding layers. The encoder consists of a series of convolutional and max-pooling layers that progressively extract high-level features while reducing spatial resolution. The decoder mirrors the encoder with up-sampling and convolutional layers, gradually restoring spatial resolution and combining feature maps from the encoder via skip connections to enhance localization accuracy. The y-coordinate of the apex of the higher diaphragm was extracted from the segmented output image and used as the respiratory motion signal.
Figure 2. Overview of the U-Net for extracting respiratory signals from projections of clinical 3D CBCT imaging. (A) Parabolic curve fitted to diaphragm apex points to generate the input contour for the U-Net. (B) Training/test dataset split. (C) The architecture of the U-Net
To ensure that the model performance was evaluated on all available data, a rotational testing strategy was adopted, as illustrated in Figure 2B. Specifically, the 70 projection datasets were divided into five groups, each serving once as the test set while the remaining groups were used for training. The combined results on all test sets constituted the final evaluation results.
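The rotational testing strategy amounts to a 5-fold split in which every dataset appears in exactly one test set. A minimal sketch (function name and the interleaved group assignment are assumptions; the paper does not specify how the 70 datasets were grouped):

```python
def rotational_folds(n_items, n_groups=5):
    """Yield (train, test) index lists; each group serves once as the
    test set while the remaining groups form the training set."""
    groups = [list(range(g, n_items, n_groups)) for g in range(n_groups)]
    for test_idx in groups:
        test_set = set(test_idx)
        train_idx = [i for i in range(n_items) if i not in test_set]
        yield train_idx, test_idx
```

Pooling the predictions from all five test sets then gives one evaluation result per dataset, covering the entire cohort.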
Statistical Analysis
Taking the labeled respiratory signal as the reference, the extracted signals from the conventional methods and the U-Net were evaluated using the Pearson correlation coefficient (r).
The goal of extracting respiratory signals using these methods is to enable 4D CBCT phase sorting, which can be accomplished through both phase and amplitude sorting. Amplitude error and phase error are used to evaluate the accuracies of amplitude and phase sorting, respectively. It should be emphasized that these 4D CBCT phase-sorting metrics are applicable only to regular breathing patterns.
The amplitude error, which equals the normalized root mean square error, ranges from 0% to 100%, with 0% indicating perfect agreement between the two signals and 100% representing the maximum error.
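The two evaluation metrics can be sketched in Python as follows. The min-max normalization used for the amplitude error is an assumption (the text states only that the error is a normalized RMSE bounded by 100%), and the function names are illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two signals."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    xm, ym = x - x.mean(), y - y.mean()
    return float((xm @ ym) / np.sqrt((xm @ xm) * (ym @ ym)))

def amplitude_error(extracted, reference):
    """Normalized RMSE in percent: 0% is perfect agreement, 100% the
    maximum error. Both signals are min-max scaled to [0, 1] first so
    that the error is bounded."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        return (s - s.min()) / (s.max() - s.min())
    e, r = norm(extracted), norm(reference)
    return float(100.0 * np.sqrt(np.mean((e - r) ** 2)))
```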
Results
Correlation Analysis of the Conventional Methods and the U-Net
Correlation Coefficient
Phase-Sorting Capability in the Regular Groups
Supplemental Material 1 shows a comparison of extracted and reference respiratory signals at different correlation levels. The extracted signals exhibit a notable discrepancy from the reference signals at correlation coefficients below 0.8. Therefore, we did not analyze the amplitude error and phase error for the IA/FT-m and FT-p methods. As shown in Figure 3, no statistically significant differences were found in the amplitude error and phase error between the respiratory signals extracted by AS and LPCA for the regular groups based on the paired t-test (amplitude error: 12.32 ± 2.27% vs 12.71 ± 2.57%, P = 0.14; phase error: 7.85 ± 2.95% vs 8.17 ± 2.32%, P = 0.44). In contrast, the U-Net achieved significantly superior and more reliable phase-sorting performance, yielding amplitude and phase errors of 7.58 ± 2.57% and 3.79 ± 1.63%, respectively.
Figure 3. The amplitude error and phase error of the extracted respiratory signals from AS, LPCA, and the U-Net, respectively. The distributions of errors are presented as violin plots
Worst-Case Analysis of the U-Net
Figure 4 shows representative worst-case analyses of the U-Net. In (A), the couch structure was misidentified as the diaphragm. In (B), a parabolic contour in the neck region was incorrectly recognized as the diaphragm. In (C), the bilateral diaphragms were erroneously merged and detected as a single hemidiaphragm. However, the confidence scores for correctly identified diaphragms were generally higher than 0.7, while misidentified structures tended to have lower confidence scores. Alternatively, the respiratory signal corresponding to the higher side of the diaphragm can be selected from the scatter plots, as shown in Figure 4D.
Figure 4. Representative worst-case analysis of the U-Net. (A) The couch structure was misidentified as the diaphragm. (B) A parabolic contour in the neck region was incorrectly recognized as the diaphragm. (C) The bilateral diaphragms were erroneously merged and detected as a single hemidiaphragm. (D) Scatter plots of detected diaphragm points from which the respiratory signal on the higher side of the diaphragm can be selected
Effect of Cutoff Frequencies on the Conventional Methods
Although the U-Net demonstrates superior performance, comparative analysis of the conventional methods remains essential for understanding their parameter dependencies and practical limitations. Figure 5A shows the distribution of low-frequency cutoff values for the patient-specific bandpass filter. Based on these data, the 95% confidence interval for the respiratory motion frequencies across all patients is estimated to range from 0.10 Hz to 0.55 Hz, corresponding to the lower and upper cutoff frequencies of the population-based bandpass filter. As shown in Figure 5B, applying the population-based bandpass filter significantly reduces the correlation between the reference and the extracted respiratory signals from IA/FT-m (from 0.60 ± 0.10 to 0.28 ± 0.15, P < 0.05), FT-p (from 0.63 ± 0.11 to 0.32 ± 0.15, P < 0.05), and AS (from 0.82 ± 0.05 to 0.75 ± 0.12, P < 0.05) for the regular groups.
Figure 5. The impact of cutoff frequencies of the bandpass filter on the performance of IA/FT-m, FT-p, and AS. (A) The distribution of low-frequency cutoff values for the patient-specific bandpass filter. (B) Comparisons of results from patient-specific and population-based bandpass filters. The distributions of errors are presented as violin plots
Discussion
4D CBCT plays a critical role in quantifying both inter-fractional and intra-fractional motion, especially in patients with small tumors.31,32 However, most linear accelerators are equipped with only 3D CBCT and lack 4D CBCT capability, which requires additional equipment, such as external respiratory signal sensors synchronized with the CBCT system, to extract respiratory signals for phase sorting in 4D CBCT reconstruction. Therefore, in this study, we evaluated the conventional methods and the U-Net for extracting respiratory signals and phase sorting from projections of clinical 3D CBCT across anatomical locations and levels of respiratory cycle regularity. Our findings suggest that the U-Net significantly outperformed the conventional methods across varying conditions. Among the conventional methods, AS and LPCA consistently outperformed IA and FT, with LPCA being superior because it does not rely on bandpass filtering.
The results for the conventional methods obtained in our study (Table 2) are inferior to those reported in the case study conducted by Tan et al., 27 in which the results from AS, LPCA, and FT-p were around 0.90. Interestingly, in a follow-up case study, Tan et al. 28 observed that correlation coefficients of these methods ranged from 0.56 to 0.90, with variations depending on the motion patterns. Our findings suggest that these methods are more effective for the patient data from the regular groups, as opposed to the irregular groups. This highlights respiratory motion regularity as a key factor influencing the performance of the conventional methods. However, the patient data from the irregular groups do not meet the data requirements for 4D CBCT reconstruction.33,34 For this reason, the results from the regular groups were considered representative of the final performance of the conventional methods. Conversely, the performance of the U-Net is not influenced by anatomical locations and respiratory cycle regularity, suggesting its robustness over the conventional methods against varying conditions.
In the absence of deep learning models such as the U-Net, it remains necessary to assess the relative performance of the conventional methods. As in Tan et al.,27,28 AS and LPCA outperform IA and FT across varying conditions. One possible reason is that AS and LPCA are less dependent on the accuracy of CBCT projection intensities and thus tend to be more reliable under low-dose CBCT imaging protocols. In addition, AS and LPCA are less affected by the differences between the thoracic and abdominal groups compared to IA and FT. Indeed, IA and FT are primarily designed for extracting respiratory signals in thoracic patients. Additionally, although AS with a patient-specific bandpass filter and LPCA perform similarly in phase sorting for 4D CBCT reconstruction, LPCA is considered superior to AS. This is because LPCA operates without relying on bandpass filtering, whereas the cutoff frequencies of the filter (set by the respiratory motion frequency) may not be accurately determined without additional external respiratory signal sensors, which are not always available.
LPCA is also reported to be superior to IA, FT, and AS in real-time applications, such as respiratory motion tracking.22-24 However, the average correlation coefficients from LPCA for the regular groups in our study are only slightly above 0.80, which is considered insufficient for respiratory motion tracking.35-37 Fortunately, a correlation coefficient of 0.80 corresponds to errors within 1/6 of a respiratory cycle (amplitude error: 12.71 ± 2.56%, phase error: 8.17 ± 2.31%), indicating a minor impact on 4D CBCT reconstruction across six respiratory phases. 10 Moreover, the respiratory signal extracted using LPCA is more robust than the results from external respiratory signal sensors, such as the real-time position management (RPM) system. Wang et al. 18 reported a large variance in the internal-external correlation from 4D CT, with values ranging from 0.01 to 0.99 in thoracic cancer patients and from 0.55 to 1.00 in abdominal cancer patients. In contrast, the results from LPCA in our study ranged from 0.63 to 0.90 in thoracic cancer patients and from 0.76 to 0.87 in abdominal cancer patients. Nevertheless, external respiratory signal-based methods are not patient-dependent, whereas in our study 16 out of 86 patients could not yield continuous respiratory signals because of difficulties in manual labeling (example plots are shown in Supplemental Material 2). In cases where manual delineation is challenging, automated methods are even less likely to produce reliable results. This highlights that approaches relying on internal anatomical structures are inherently patient-dependent, with the U-Net being particularly affected.
A limitation of our study is that only patient data with distinct moving anatomical features (the diaphragm) were included in the analysis, as this was essential for labeling the reference signals. This restriction prevents us from examining a known limitation of the deep learning-based methods and the AS method, noted by Yan et al., 22 namely their dependence on the presence of distinct moving anatomical features. However, our study also found that LPCA outperforms AS in the presence of the diaphragm. Furthermore, the dataset used in this study was relatively limited in size and was collected from a single institution. Therefore, the generalizability of the U-Net model to data acquired from different scanners or imaging protocols remains to be validated. The U-Net model was trained and evaluated on the same dataset without external validation. Further evaluation using independent patient cohorts is required to confirm the model's robustness and clinical applicability.
Conclusion
In this study, we demonstrated that the U-Net significantly outperforms the conventional methods (IA, FT, AS, and LPCA) in extracting respiratory signals across varying anatomical locations and levels of respiratory cycle regularity. However, the U-Net relies on diaphragm delineation and is inherently patient-dependent. Among the conventional methods, LPCA is superior because it does not rely on bandpass filtering. These findings contribute to advancing phase-sorting techniques and facilitating deep learning-based 4D CBCT reconstruction using projections from clinical 3D CBCT imaging protocols.
Supplemental Material
Supplemental Material - Comparative Evaluation of Conventional and Deep Learning Methods for Respiratory Signal Extraction From Clinical 3D CBCT Projections
Supplemental Material for Comparative Evaluation of Conventional and Deep Learning Methods for Respiratory Signal Extraction From Clinical 3D CBCT Projections by Wan Li, Weihang Yang, Xiangyu Zhang, Yinan Huang, Xiaokang Wang, Renming Zhong, and Xiangbin Zhang in Technology in Cancer Research & Treatment.
Footnotes
Ethical Considerations
This study was approved by the Ethics Committee on Biomedical Research, West China Hospital of Sichuan University (Approval No. 20230614).
Consent to Participate
All participants provided written informed consent.
Author Contributions
W. Li and W. Yang contributed equally to this work. W. Li, W. Yang, X. Wang, and Y. Huang performed experiments, data curation, and formal analysis. X. Zhang and R. Zhong conceived and supervised the study. W. Li and W. Yang drafted the manuscript, and all authors contributed to manuscript revision and approved the final version. X. Zhang acquired funding and managed the project.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study is supported by the National Natural Science Foundation of China (No. 12405390) and the Science and Technology Department of Sichuan Province (No. 2024YFFK0147, 2026NSFSC1898).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Supplemental Material
Supplemental material for this article is available online.
