Abstract
Introduction
Leukemia is one of the most prevalent cancers in children. The use of total marrow and lymphoid irradiation (TMLI) via helical tomotherapy (TOMO) as a conditioning regimen prior to bone marrow transplant (BMT) has been widely adopted in clinical practice. Accurate and efficient segmentation of target volumes and organs at risk (OARs) is a prerequisite for precise TMLI. The purpose of this study was to investigate the feasibility of deep learning-based auto-segmentation (using 2D U-net and 3D V-net models) of target volumes (bone marrow and lymphatic drainage regions) and OARs in pediatric TMLI.
Methods
This was a retrospective study. Thirty-six pediatric patients treated with TMLI between 2018 and 2024 were included. Target volumes and OARs were manually segmented and refined. The CT images and corresponding contours were imported into the AccuLearning workstation (Manteia Company, Xiamen, China) to train, validate, and test 2D U-net and 3D V-net deep learning models. The auto-segmentation performance was evaluated on 6 test cases using the Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (HD95), and Average Surface Distance (ASD).
Results
DSC values exceeded 0.7 for all OARs except the lenses segmented by the 3D V-net model. For target volumes, bone structures achieved high segmentation accuracy.
Conclusion
The 3D V-net model demonstrated superior performance compared to the 2D U-net model. Auto-segmented contours generated by the 2D U-net and 3D V-net models, with minor manual adjustments, are clinically applicable for TMLI radiotherapy planning.
Introduction
Radiotherapy is a common treatment modality for malignant tumors, with over 70% of cancer patients requiring radiation therapy. 1 Accurate and efficient segmentation of target volumes and Organs at Risk (OARs) is a prerequisite for precise radiotherapy. 2 Currently, manual segmentation by physicians remains the gold standard, though it is a time-consuming process. Studies3,4 indicate that contouring for a single total marrow and lymphoid irradiation (TMLI) patient typically requires 12–16 h. Even with established contouring guidelines, inter-observer variability exists due to differing physician preferences, and intra-observer variations may occur when the same physician contours at different times. 5
Compared to manual segmentation, auto-segmentation has demonstrated significant advantages since its inception, including reduced workload for physicians, 6 shorter patient waiting times, and improved therapeutic ratios for tumors.7–11 Currently, the most widely used auto-segmentation technology relies on Deep Learning (DL)-based algorithms, with the workflow illustrated in Figure 1.

Flow chart of the auto-segmentation.
Leukemia is one of the most prevalent cancers in children. The use of TMLI via helical tomotherapy (TOMO) as a conditioning regimen prior to bone marrow transplant (BMT) has been widely adopted in clinical practice.3,12–16 Several recent studies have investigated TMLI auto-segmentation.3,4,17–19
The primary objective of this study is to explore the feasibility of auto-segmentation for target volumes and OARs in pediatric TMLI using deep learning-based techniques [2D U-net (Figure 2) and 3D V-net (Figure 3) models]. This approach aims to address the limitations of conventional auto-segmentation software in handling pediatric-specific anatomical variations,20,21 along with overcoming the time-consuming nature and inter-observer variability inherent in traditional manual segmentation, thereby enhancing radiotherapy workflow efficiency.

2D U-net structure.

3D V-net structure.
Materials and Methods
Dataset Acquisition and Preprocessing
This retrospective study enrolled 36 consecutive pediatric patients (25 males and 11 females; age distribution is presented in Figure 4) who underwent TMLI at the Department of Radiotherapy, The Seventh Medical Center of Chinese PLA General Hospital between 2018 and 2024. All patient data were de-identified. The study protocol was approved by the Ethics Committee of the same institution (No. S 2025-080-01), and the requirement for written informed consent was waived due to the retrospective nature of the study and the use of fully anonymized data.

Patient age distribution.
CT simulation was performed using a Philips Brilliance Big Bore CT scanner (Philips Healthcare, Best, the Netherlands). Patients were positioned supine with a head-first orientation, immobilized using a head-neck-shoulder thermoplastic mask for the upper thorax, a thermoplastic mask for the abdomen/pelvis, and a vacuum cushion for the lower extremities. Due to variations in pediatric patient height, full-body scans (from head to feet) were acquired for shorter children, while taller children underwent two-phase scanning: initial head-first supine scans up to the knees, followed by feet-first supine scans to complete the procedure. Notably, the feet-first supine (FFS) scans only covered the leg bone marrow target regions. To standardize the study scope, all defined target volumes were limited to regions above the knees. Acquired CT images had a resolution of 512 × 512, with slice thickness and spacing of 5 mm, and a tube voltage of 120 kV. These images were then imported into the Pinnacle3 treatment planning system (Philips Radiation Oncology Systems, Madison, WI, USA) for manual segmentation.
The OARs defined in this study include: brain, brainstem, heart, kidneys, liver, lungs, oral cavity, parotid glands, stomach, bladder, lenses, eyeballs, thyroid, esophagus, small bowel, colon, bowel bag and rectum. The clinical target volume (CTV) is subdivided into three components (Figure 5): CTV1 (femoral heads, humeral heads, and bone marrow excluding the appendicular bones), CTV2 (bone marrow above the knees excluding CTV1 regions), and CTV3 (lymphatic drainage regions).

Clinical target volume (CTV1 in red, CTV2 in green, CTV3 in purple).
The planning target volume (PTV) was generated by merging all three CTV components with appropriate margin expansions for subsequent treatment planning. Following initial segmentation, manual refinements were performed based on the patients’ clinical data and unified contouring criteria. The final contours were reviewed and validated by three experienced radiation oncologists, serving as the Ground Truth (GT).
No specific bladder preparation protocol was required during simulation, as the degree of bladder filling showed no significant impact on treatment delivery or OAR protection. While a filled bladder could be delineated with clearer boundaries on CT images, the borders between an unfilled bladder and adjacent immature pediatric pelvic structures (such as the uterus or prostate) were often indistinct due to their underdeveloped anatomical differentiation. To date, no internationally established guidelines exist for TMLI target volume definition; the criteria adopted in this study were based on our institution's clinical experience. The reporting of this study conforms to the STROBE guidelines. 22
Environment Configuration
The training and validation of the auto-segmentation models were conducted using AccuLearning, a Deep Learning (DL)-based medical image auto-segmentation training platform developed by Manteia Technologies Co., Ltd (Xiamen, China). The platform operates on a Windows 10 system with an Intel® Core™ i7-10700 CPU @ 2.90 GHz processor. AccuLearning supports small-sample training for auto-segmentation algorithms, where high-precision models can be generated even with limited datasets, with a minimum recommended training cohort size of 30 cases. To date, few studies have reported the feasibility of small-sample algorithms. In this work, we trained and tested models using data from 36 pediatric leukemia patients on the AccuLearning platform to evaluate the platform's applicability. During training, the platform enables data-driven parameter updates and automated feature extraction, outperforming traditional image processing algorithms. 23
In recent years, Convolutional Neural Networks (CNNs), particularly 2D U-net and 3D V-net architectures and their variants, have been widely adopted in medical image auto-segmentation with promising results.6,24–34 The AccuLearning platform utilizes two network architectures: 2D U-net and 3D V-net. Generally, 3D networks yield superior performance when processing large datasets with abundant z-axis slices in three-dimensional data, whereas 2D networks may achieve better results for smaller datasets or those with limited z-axis slices.
Model Training
The manually segmented patient data were transferred to the AccuLearning platform. Within AccuLearning, each dataset consisted of a CT image series and its corresponding RT Structure file. To optimize data processing and ensure model accuracy, regions of interest (ROIs) were divided into six training datasets: OARs Group 1: Brain, brainstem, oral cavity, lungs (bilateral), heart, liver, stomach, and bladder (8 OARs). OARs Group 2: Left/right eyeballs, left/right lenses, left/right parotid glands, and left/right kidneys (8 OARs). OARs Group 3: thyroid, esophagus, small bowel, colon, bowel bag, and rectum (6 OARs). CTV Groups: CTV1, CTV2, and CTV3 were assigned to three separate training sets.
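The grouping described above can be written down as a simple lookup table. The following is a minimal sketch, assuming hypothetical ROI names and a plain dictionary layout; it is not the AccuLearning configuration format, which is not described in this study.

```python
# Illustrative grouping of ROIs into the six training datasets described in the text.
# ROI names and the dictionary layout are assumptions, not the platform's own format.
ROI_GROUPS = {
    "OARs_group_1": ["Brain", "Brainstem", "OralCavity", "Lungs",
                     "Heart", "Liver", "Stomach", "Bladder"],
    "OARs_group_2": ["Eyeball_L", "Eyeball_R", "Lens_L", "Lens_R",
                     "Parotid_L", "Parotid_R", "Kidney_L", "Kidney_R"],
    "OARs_group_3": ["Thyroid", "Esophagus", "SmallBowel",
                     "Colon", "BowelBag", "Rectum"],
    "CTV1": ["CTV1"],
    "CTV2": ["CTV2"],
    "CTV3": ["CTV3"],
}


def group_of(roi_name):
    """Return the name of the training dataset that contains the given ROI."""
    for group, rois in ROI_GROUPS.items():
        if roi_name in rois:
            return group
    raise KeyError(f"ROI not assigned to any training dataset: {roi_name}")


def count_unique_rois(groups):
    """Count ROIs across all groups, raising if any ROI appears in two groups."""
    seen = set()
    for rois in groups.values():
        for roi in rois:
            if roi in seen:
                raise ValueError(f"ROI assigned to more than one group: {roi}")
            seen.add(roi)
    return len(seen)
```

Keeping each ROI in exactly one group means every structure is learned by a single model per architecture, which is what the uniqueness check enforces.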
The 36 patient cases were randomly split into training, validation, and test sets at a ratio of 26:4:6. The training set was used for model development, the validation set for hyperparameter tuning and training progress monitoring, and the test set for final model evaluation. Training parameters were standardized between the 2D U-net and 3D V-net frameworks to ensure comparability. After model training, auto-segmentation was performed on the 6 test cases, generating corresponding RT Structure files. 35
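A random 26:4:6 split of this kind can be sketched in a few lines. The case identifiers, the function name `split_cases`, and the fixed seed below are assumptions for illustration only; the study does not report how the randomization was implemented.

```python
import random


def split_cases(case_ids, n_train=26, n_val=4, n_test=6, seed=0):
    """Randomly partition case IDs into training, validation, and test sets."""
    if len(case_ids) != n_train + n_val + n_test:
        raise ValueError("Split sizes must sum to the number of cases")
    shuffled = list(case_ids)
    # A seeded generator makes the (hypothetical) split reproducible.
    random.Random(seed).shuffle(shuffled)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


# Hypothetical de-identified case labels for the 36 patients.
cases = [f"case_{i:02d}" for i in range(1, 37)]
splits = split_cases(cases)
```

Because the three subsets are slices of one shuffled list, no patient can appear in more than one subset, which keeps the test cases unseen during training and tuning.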
Evaluation Indicators
The target volumes and OARs manually segmented by physicians served as the Ground Truth (GT). Quantitative evaluation of the auto-segmentation model's performance was conducted using three metrics: Dice Similarity Coefficient (DSC), 95% Hausdorff Distance (HD95), and Average Surface Distance (ASD). Studies36,37 have indicated that a DSC value greater than 0.70 suggests good reproducibility between structures, and the auto-segmentation results are considered clinically acceptable. While DSC provides a straightforward and intuitive measure, it alone cannot fully characterize all aspects of segmentation quality. Therefore, we supplemented the evaluation with two distance-sensitive metrics: the HD95 and ASD.
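For reference, the DSC between an auto-segmented volume A and a manual volume B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it on sets of voxel indices; the function name `dice`, the toy 2D masks, and the convention of returning 1.0 for two empty masks are assumptions for illustration, not the evaluation code used in this study.

```python
def dice(auto_voxels, manual_voxels):
    """Dice Similarity Coefficient between two collections of voxel indices."""
    a, b = set(auto_voxels), set(manual_voxels)
    if not a and not b:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


# Toy example: two 4 x 4 squares offset by one voxel along x.
auto = {(x, y) for x in range(0, 4) for y in range(4)}    # 16 voxels
manual = {(x, y) for x in range(1, 5) for y in range(4)}  # 16 voxels
score = dice(auto, manual)  # overlap is 12 voxels, so 2 * 12 / 32 = 0.75
```

In this toy case a one-voxel shift of a small structure already lowers the DSC to 0.75, close to the 0.70 acceptability threshold, which illustrates why small organs tend to score lower than large ones.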
The Hausdorff Distance (HD) is used to evaluate the surface distance in three-dimensional space between automatically and manually segmented structures. Because the maximum HD is sensitive to outlier points, the 95th percentile of the surface-to-surface distances (HD95) is typically used instead. A smaller HD95 value indicates closer agreement between the automatic and manual contours, representing higher segmentation accuracy. HD95 tolerates isolated outliers better than the maximum HD, yet it remains a highly position-sensitive parameter. 38 When image alignment is good, HD95 values remain very small; however, even minor deviations can cause HD95 values to increase abruptly to dozens or even hundreds of millimeters. Generally, larger anatomical regions correspond to higher HD95 values. Therefore, there is no definitive standard value for determining “good” HD95 values; as long as the HD95 value is not abnormally large, the alignment can generally be considered acceptable. The ASD quantifies the mean distance between two point sets by dividing the sum of the mutual surface distances by the total number of surface points, serving as a metric to evaluate the overall contour deviation between automatically and manually segmented structures. An ASD value approaching zero indicates minimal shape discrepancy between the auto-segmented and manual reference contours.
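Both distance metrics can be sketched on two lists of surface points given in millimeters. Definitions of HD95 vary between software packages; the version below assumes the larger of the two directed 95th-percentile distances and a symmetric ASD, and the function names are illustrative rather than those of any tool used in this study.

```python
import math


def _nearest_distances(src, dst):
    """Distance from every point in src to its closest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def _percentile(values, q):
    """Percentile of a non-empty list, using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def hd95(surface_a, surface_b):
    """95% Hausdorff distance: larger of the two directed 95th percentiles."""
    return max(_percentile(_nearest_distances(surface_a, surface_b), 95),
               _percentile(_nearest_distances(surface_b, surface_a), 95))


def asd(surface_a, surface_b):
    """Average surface distance: mean of all nearest distances in both directions."""
    d = _nearest_distances(surface_a, surface_b) + _nearest_distances(surface_b, surface_a)
    return sum(d) / len(d)


# Toy example: two parallel two-point "surfaces" 3 mm apart along z.
auto_surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
manual_surface = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
```

The brute-force nearest-point search is quadratic in the number of surface points, so it suits small examples only; a spatial index or distance transform would be the usual choice for full CT volumes.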
Results
Tables 1–3 present the three evaluation metrics for the auto-segmentation performance of all OARs using the 2D U-net and 3D V-net models, as well as the differences between the two models for the same metrics.
DSC Values for OARs Auto-Segmented by 2D U-net and 3D V-net Models
Note: Boldface indicates that 3D V-net outperforms 2D U-net.
HD95 (mm) for OARs Auto-Segmented by 2D U-net and 3D V-net Models
Note: Boldface indicates that 3D V-net outperforms 2D U-net.
ASD (mm) for OARs Auto-Segmented by 2D U-net and 3D V-net Models
Note: Boldface indicates that 3D V-net outperforms 2D U-net.
The degree of bladder filling showed no significant impact on auto-segmentation outcomes; thus, no specific bladder preparation protocol was required during simulation. While partially filled bladders could be segmented on CT images, the boundaries of unfilled bladders were often indistinct from adjacent immature pediatric pelvic structures (uterus/prostate) due to their underdeveloped anatomical differentiation.
The 2D U-net model demonstrated clinically acceptable performance for lens contouring, with DSC values consistently around the 0.7 threshold, meeting predefined expectations for segmentation accuracy. The 3D V-net model failed to achieve the DSC threshold of 0.7 for lens segmentation, with significant deviations from ground truth contours and occasional segmentation errors.
Table 4 lists the evaluation metrics for CTV auto-segmentation under the 2D U-net and 3D V-net frameworks. For CTV2 (structurally simpler regions), DSC values were 0.87 ± 0.02 (2D U-net) and 0.89 ± 0.04 (3D V-net), with HD95 of 12.16 ± 12.14 mm and 8.61 ± 7.72 mm, respectively, indicating good overlap accuracy, with the 3D V-net performing better. CTV3, representing the lymphatic drainage regions, remains the most complex and labor-intensive component of the entire target volume. Nevertheless, its auto-segmentation achieved favorable results. For clinical implementation, only minor refinements to auto-segmented details are required to meet treatment planning specifications. Although CTV1 metrics appeared acceptable, the 2D U-net failed to adequately contour the annular skull region above the pituitary gland, capturing only the outer bone margins while missing the inner contours (Figure 6), likely causing DSC underestimation.

Comparison of CTV1 segmentation results using the 2D U-net model at the head (manual segmentation in green, auto-segmentation in red).
Evaluation Metrics for CTVs Auto-Segmented by 2D U-net and 3D V-net Models
Note: Boldface indicates that 3D V-net outperforms 2D U-net.
Manual contouring for a pediatric TMLI patient typically requires 5–8 h. By utilizing the auto-segmentation model trained in this study, the entire workflow—from data import and auto-segmentation to contour review and modifications meeting clinical planning requirements—can be reduced to approximately 1.5 h, thereby enhancing workflow efficiency and reducing pre-treatment waiting periods for patients.
Discussion
This study focused on pediatric TMLI contour segmentation, training and validating 2D U-net and 3D V-net models using imaging data from 30 pediatric patients and testing the auto-segmentation models on 6 additional pediatric cases. Analysis of the test set results demonstrated the feasibility of CNN-based auto-segmentation for pediatric TMLI. This approach enhances workflow efficiency in radiotherapy centers and provides technical support for implementing pediatric TMLI. While 3D neural networks slightly outperformed 2D counterparts in medical image segmentation, specific challenges persisted, such as the suboptimal lens segmentation observed in this study and the performance for CTV1 and CTV3.
The suboptimal lens segmentation performance of the 3D V-net model may be attributed to the inherently small volume of the lenses, which typically span only 2 CT slices. This resulted in insufficient z-axis data for robust 3D model training, thereby compromising the network's ability to learn effective spatial features. Regarding the failure of the 2D U-net model to adequately contour the inner boundaries of annular bony structures, repeated testing confirmed this persistent limitation, suggesting potential inherent limitations at the algorithmic level for parsing such complex anatomical configurations. In contrast, the 3D V-net model did not exhibit this specific issue.
Watkins et al 18 trained an auto-segmentation model using 100 clinical TMLI patient datasets based on the U-net framework within the Medical Mind AI-software, and applied the trained model to 21 clinical cases. The results showed that 18 out of 21 OARs achieved a DSC >0.8, aligning closely with the findings of the current study. Notably, the DSC values for the oral cavity and stomach exceeded 0.9, surpassing the corresponding results in this study. Although the target definition criteria in their study differed slightly from ours (eg, lymphatic drainage coverage and bone marrow subdivision), the auto-segmentation outcomes exhibited remarkable similarity, indicating model robustness to anatomical variations. For lens auto-segmentation, Watkins et al 18 also encountered suboptimal performance for small organs (eg, lenses and optic chiasm), mirroring the challenges observed in our study. Literature39,40 further confirms that lens auto-segmentation generally underperforms; however, since manual lens segmentation is quick (typically requiring only a few minutes) and does not significantly increase clinicians’ workload, excessive focus on optimizing its auto-segmentation may be unwarranted. In previous studies, the cranial bones were consistently defined as a separate target volume, achieving DSC values of 0.814 ± 0.99 and 0.893 ± 0.005, both indicating poorer auto-segmentation performance compared to other bony targets in the respective studies. 18 Combined with the suboptimal cranial bone contouring observed in the 2D U-net model of this study, this limitation may stem from inherent algorithmic constraints, specifically the U-net neural network's potential inability to recognize annular anatomical configurations (eg, the pituitary-adjacent skull region). However, this hypothesis requires further targeted studies for validation.
CNNs for auto-segmentation are not limited to U-net and V-net architectures; alternative networks can be explored to enhance performance. For instance, Chen et al 41 employed the WB-net artificial intelligence algorithm, achieving modest improvements in auto-segmentation accuracy. Shi et al 17 proposed a dual-encoder hybrid architecture (DE-net) and applied it to TMLI patient auto-segmentation, demonstrating superior performance compared to conventional algorithms and highlighting the potential of hybrid neural networks in medical image analysis. Additionally, studies have introduced decision tree-based approaches that combine atlas-based models with CNN frameworks to tailor auto-segmentation for OARs with distinct anatomical features, thereby improving precision.
This study utilized a limited cohort of 36 pediatric cases, which may constrain model generalizability. Furthermore, the test set contained only 6 cases, a limited sample size that precluded a formal statistical comparison of the performance differences between the two models. The current auto-segmentation results cannot be directly applied in clinical practice without manual verification and refinement. The model also demonstrated limited capability in handling unusual anatomical variations. Future efforts should focus on expanding datasets with multicenter imaging from diverse CT scanners, refining algorithms—including exploring alternative neural network architectures—to enhance pediatric TMLI-specific adaptability and robustness, and ultimately advancing clinical radiotherapy workflows through robust auto-segmentation support.
Conclusion
Based on the study findings, training deep learning-based auto-segmentation models for target volumes and OARs in pediatric TMLI appears feasible. This approach may address the limitations of traditional manual contouring, including its time-consuming nature and inter-observer variability, potentially enhancing radiotherapy efficiency. Overall, the 3D V-net model demonstrated slightly superior performance compared to the 2D U-net. However, in practical applications, the optimal network architecture should be selected based on the anatomical characteristics of specific structures to optimize results. The auto-segmented contours require manual verification and refinement to ensure clinical accuracy before application in radiotherapy planning. In practice, this refinement step is efficient: clinicians typically need only 5–10 min to perform minor manual adjustments to structures such as the lens, maxilla, mandible, testes, and the junctions of target regions.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
