Abstract
Background:
Accurate quantification of the glenoid track is critical for determining optimal treatment strategies in patients with anterior shoulder instability. However, conventional computed tomography (CT)-based assessment methods require approximately 2 hours of manual segmentation and suffer from limited interobserver consistency, which may compromise diagnostic accuracy and surgical planning. Deep learning has demonstrated significant potential in medical image analysis.
Purpose:
To propose a deep learning-based framework for automated CT segmentation and bone defect quantification in anterior shoulder dislocation to enhance diagnostic efficiency and consistency.
Study Design:
Cohort Study (diagnosis); Level of evidence, 3.
Methods:
A deep learning model was developed by adapting the TotalSegmentator framework to perform automated segmentation and 3-dimensional (3D) reconstruction of CT images from 43 patients with anterior shoulder dislocation. On the reconstructed models, glenoid track width (GTW) was measured using the Two-Thirds Glenoid Height Technique and the Hill-Sachs interval (HSI) was measured on the humeral head by 1 senior shoulder and elbow surgeon and 2 junior physicians. Semi-automated determination of the on-track/off-track status was also performed. Segmentation performance was evaluated using the Dice similarity coefficient, and measurement reliability was evaluated using the intraclass correlation coefficient (ICC).
Results:
The model achieved excellent segmentation accuracy, with mean Dice similarity coefficients of 0.958 for the proximal humerus and 0.950 for the scapula. Automated segmentation required no more than 30 seconds per case, compared with approximately 2 hours for manual segmentation. Based on the segmented images, the GTW measured using the Two-Thirds Glenoid Height Technique demonstrated excellent intra- and interobserver reliability (ICC > 0.90). HSI measurements showed excellent intraobserver reliability (ICC > 0.90) and good interobserver reliability (ICC = 0.879-0.884). Semi-automated determination of on-track/off-track status further shortened the calculation step (median, 30 vs 119 seconds), and the complete artificial intelligence-assisted workflow saved approximately 2 hours per case compared with the fully manual approach.
Conclusion:
This study integrates deep learning techniques into the entire diagnostic workflow for shoulder dislocation, enabling rapid, accurate quantification of bony defects. The reliability of using the Two-Thirds Glenoid Height Technique for measuring glenoid parameters on 3D models was validated, offering an efficient tool for surgical planning.
Shoulder dislocation accounts for >40% of all joint dislocations, with the vast majority being anterior dislocations. 18 After an anterior dislocation, the posterior aspect of the humeral head often impacts the anterior glenoid rim, resulting in bipolar osseous defects, namely glenoid bone loss (true glenoid bony deficiency) and a Hill-Sachs lesion (HSL). These osseous lesions are key contributors to shoulder instability and recurrent anterior dislocation.3,20 The concept of the glenoid track (GT), proposed by Yamamoto et al, 21 has gained widespread acceptance among orthopaedic surgeons. Determining whether the HSL is “on-track” or “off-track” has become an important criterion for surgical decision-making. 6 The Hill-Sachs interval (HSI) is a key parameter for evaluating whether the HSL is on-track or off-track. Therefore, reliable identification of both the glenoid track width (GTW) and the HSI is critically important for surgical planning in patients with anterior shoulder dislocation.9,16
Computed tomography (CT) remains the gold standard for evaluating bony defects in patients with shoulder dislocation. 2 Some studies have utilized 2-dimensional (2D) CT to calculate the GT. 5,22 However, the selection of the optimal measurement plane and angle is time-consuming and labor-intensive. Moreover, studies have raised concerns about the accuracy and consistency of estimating glenoid bone loss from axial CT slices. 7 Sugaya et al 17 introduced the best-fit circle method using 3D CT to quantify GTW. While 3D reconstruction models provide clearer and more intuitive measurements of bone loss, the glenoid morphology is not always geometrically regular, which can undermine the reliability of the best-fit circle placement. Thus, the Two-Thirds Glenoid Height Technique, which relies on the more consistently identifiable superior and inferior poles of the glenoid, may offer a more reliable method for calculating glenoid-related parameters. 10 Additionally, CT-based 3D reconstruction is time-consuming, and the subsequent determination of the GT and classification of HSL (on-track vs off-track) are still labor-intensive. There is a need for accurate, rapid, and quantitative methods to assess bipolar bone defects in the shoulder joint, thereby substantially improving diagnostic efficiency and surgical planning.
Deep learning, a major subfield of artificial intelligence, has been widely applied in medical image analysis, including the automated segmentation of shoulder radiographs, CT, and magnetic resonance imaging (MRI) images.4,15 One representative example is the TotalSegmentator model developed by Wasserthal et al, 19 which enables robust multiorgan segmentation across the entire body. However, the application of TotalSegmentator to the shoulder joint has primarily focused on anatomically normal structures, and no existing models have yet been developed to simultaneously identify both normal and pathological shoulder anatomies in CT scans, perform automatic segmentation, and enable accurate 3D reconstruction.
Therefore, the purpose of this study was to develop and validate a deep learning-based workflow that (1) automatically segments the scapula and humerus on CT scans of anterior shoulder dislocation, (2) quantifies the GTW on the 3D reconstructed scapula using the Two-Thirds Glenoid Height Technique and the HSI on the 3D reconstructed humerus, and (3) provides a semi-automated on-track/off-track classification to improve efficiency and reduce interobserver variability.
Methods
Patient Selection
We retrospectively collected shoulder CT scan data from 80 patients with anterior shoulder dislocation treated at our institution between January 2019 and December 2023. After applying the inclusion and exclusion criteria, 43 cases were included as the validation cohort for model evaluation.
The inclusion criteria were as follows: (1) age between 18 and 80 years, regardless of sex; (2) skeletally mature individuals without severe deformities; and (3) documented history of shoulder dislocation and availability of CT scans, with a maximum slice thickness of 1.5 mm. The exclusion criteria were (1) posterior shoulder dislocation; (2) scapular fractures; (3) humeral fractures; (4) shoulder tumors or developmental abnormalities; (5) absence of HSL on the humeral head; and (6) CT data incompatible with 3D-Slicer processing (Figure 1).

Figure 1. Flowchart of study population selection. CT, computed tomography; CTA, computed tomography angiography; HSI, Hill-Sachs interval; 3D, 3-dimensional.
In addition, we retrospectively collected 50 chest CT scans from healthy individuals who underwent routine health check-ups at our institution between January 2019 and December 2023. These scans were used to augment model training. All CT scans were acquired using the following imaging parameters: 120 kV tube voltage, automated tube current modulation, and slice thickness of ≤1.5 mm.
Data Annotation
The ground truth was created manually by a chief orthopaedic surgeon (X.L.) with 15 years of experience. Using 3D-Slicer software, 8 the surgeon annotated the scapula and humerus slice by slice on the axial CT images of the shoulder joint, producing the reference segmentation and 3D reconstruction of both bones.
Glenoid Track Measurement and On-Track/Off-Track Evaluation
As demonstrated by Makovicka et al, 10 the Two-Thirds Glenoid Height Technique yields more consistent and reliable measurements of the GT than the best-fit circle method. Based on this evidence, we implemented this technique on 3D reconstructed glenoid models in our study and further evaluated its measurement reliability (Figure 2).

Figure 2. Measurement of glenoid parameters in the 3D reconstructed model. (A) Frontal view of the reconstructed glenoid after segmentation. (B) Illustration of the measurement process: the long axis of the glenoid (L) is defined by connecting the superior and inferior poles of the glenoid. The radius of the best-fit circle is calculated by dividing L by 3. Starting from the inferior pole, a distance equal to the radius is measured upward along the long axis to determine the center of the best-fit circle. The shortest distance from this center to the anterior margin of the glenoid is recorded as d′. 3D, 3-dimensional.
After performing a 3D reconstruction of the patient's CT scan, the glenoid view was obtained by adjusting the glenoid sagittal plane. This view was then exported to ImageJ, where 2D measurements of the 3D model were taken. 13
The long axis of the glenoid (L) was defined by connecting its superior and inferior poles. The radius of the best-fit circle was calculated as one-third of the glenoid's long axis length (ie, 1/3L). Starting from the inferior pole, a measurement was made upward along the long axis until the radius length was reached; this point was designated as the center of the best-fit circle. The shortest distance from this center to the anterior margin of the glenoid was recorded as d′. The glenoid bone defect length was then calculated as 1/3L − d′.
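The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the landmark coordinates are hypothetical 2D points picked on the en-face glenoid view, the function and argument names are not from the study, and the anterior rim is assumed to have been sampled as a list of points by the observer.

```python
import math


def glenoid_circle_parameters(superior_pole, inferior_pole, anterior_rim_points):
    """Apply the Two-Thirds Glenoid Height construction to 2D landmarks.

    All inputs are (x, y) tuples in the same unit (for example mm).
    The names and the point-sampled rim are assumptions for illustration.
    """
    # Long axis L: distance between the superior and inferior poles.
    long_axis = math.dist(superior_pole, inferior_pole)
    if long_axis == 0:
        raise ValueError("superior and inferior poles must differ")

    # Radius of the best-fit circle is one-third of the long axis.
    radius = long_axis / 3.0

    # Unit vector pointing from the inferior pole toward the superior pole.
    ux = (superior_pole[0] - inferior_pole[0]) / long_axis
    uy = (superior_pole[1] - inferior_pole[1]) / long_axis

    # Circle center: one radius above the inferior pole along the long axis.
    center = (inferior_pole[0] + radius * ux, inferior_pole[1] + radius * uy)

    # d': shortest distance from the center to the sampled anterior margin.
    d_prime = min(math.dist(center, p) for p in anterior_rim_points)

    return {
        "L": long_axis,
        "radius": radius,
        "center": center,
        "d_prime": d_prime,
        "defect": radius - d_prime,  # glenoid bone defect length, 1/3L - d'
    }
```

With hypothetical landmarks such as a superior pole at (0, 30), an inferior pole at (0, 0), and rim points around x = 8 to 10, the function returns the same L, radius, d′, and defect length that an observer would otherwise read off manually in ImageJ.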
To calculate the GTW, we used the following formula: GTW = 83%·D − d, 1 where D is the diameter of the best-fit circle of the glenoid (ie, 2/3L) and d is the glenoid bone defect length (ie, 1/3L − d′).
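Substituting the two definitions into the formula shows that GTW depends only on the two quantities measured on the glenoid, L and d′. The derivation below is a worked simplification, not a formula stated in the source; the numeric values in the example are hypothetical.

```latex
\begin{align*}
\mathrm{GTW} &= 0.83\,D - d \\
             &= 0.83 \cdot \tfrac{2}{3}L - \left(\tfrac{1}{3}L - d'\right) \\
             &= \tfrac{1.66 - 1}{3}\,L + d' \\
             &= 0.22\,L + d'
\end{align*}
% Hypothetical example: L = 30 mm, d' = 8 mm
% D = 20 mm, d = 10 - 8 = 2 mm
% GTW = 0.83 * 20 - 2 = 14.6 mm, which equals 0.22 * 30 + 8 = 14.6 mm
```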
The HSI was measured directly on the 3D reconstructed humeral model in 3D-Slicer. It was defined as the distance between the innermost point of the HSL and the inner edge of the rotator cuff footprint 1 (Figure 3).

Figure 3. HSI measurement in the 3D reconstructed model. (A) The posterior view of the humeral head after segmentation and reconstruction. (B) The measurement process: point h represents the innermost edge of the HSL, point f marks the inner edge of the rotator cuff footprint, and HSI is the distance between points h and f. The area indicated by the yellow double arrows represents the HSL. HSI, Hill-Sachs interval; HSL, Hill-Sachs lesion; 3D, 3-dimensional.
Finally, the GTW was compared with the HSI. If the HSI was greater than the GTW, the HSL was classified as off-track; conversely, if the HSI was less than the GTW, the lesion was considered on-track. To streamline the analysis, an automated computational workflow was developed to perform these calculations. The detailed process is illustrated in Figure 4.
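The comparison rule can be expressed as a short function. The following is a minimal sketch of such a calculation step, not the module shown in Figure 4; the function names are hypothetical, and because the text does not say how a tie (HSI equal to GTW) is handled, the sketch reports that case separately instead of guessing.

```python
def glenoid_track_width(long_axis, d_prime):
    """GTW = 83% * D - d, with D = 2/3 L and d = 1/3 L - d'."""
    diameter = 2.0 * long_axis / 3.0      # D, best-fit circle diameter
    defect = long_axis / 3.0 - d_prime    # d, glenoid bone defect length
    return 0.83 * diameter - defect


def classify_hill_sachs(long_axis, d_prime, hsi):
    """Return (GTW, status) from the three measured values (same unit)."""
    gtw = glenoid_track_width(long_axis, d_prime)
    if hsi > gtw:
        status = "off-track"
    elif hsi < gtw:
        status = "on-track"
    else:
        # Assumption: the tie case is not defined in the text, so flag it.
        status = "undefined (HSI equals GTW)"
    return gtw, status
```

For example, with hypothetical inputs L = 30 mm and d′ = 8 mm, an HSI of 12 mm falls inside the track and an HSI of 16 mm falls outside it.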

Figure 4. Flowchart of the automated determination process for whether a Hill-Sachs lesion is on-track or off-track. GTW, glenoid track width.
All measurements were performed by a senior orthopaedic surgeon (X.L.) with 15 years of experience and 2 junior orthopaedic residents (F.Z., Y.Y.). Each junior doctor independently measured all data twice. The senior surgeon provided both verbal and written instructions to the junior doctors, following a unified and standardized measurement protocol to minimize technical variability.
All observers performed the assessments independently and were blinded to each other's measurements. The interval between the 2 measurements by each junior doctor exceeded 2 weeks. All assessors discussed and agreed upon the prescribed measurement technique before data collection. However, during the assessment process, no communication or discussion among the observers was allowed to ensure the independence of the evaluations.
Deep Learning
The TotalSegmentator model developed by Wasserthal et al 19 has been demonstrated to robustly segment a wide range of anatomic structures, including the scapula and humerus. This model is based on the nnU-Net framework, a U-Net–based implementation that automatically configures all hyperparameters according to the characteristics of the input dataset. Building on this foundation, we trained and fine-tuned the model using a dataset of 100 healthy shoulder CT scans to enhance its focus and accuracy in segmenting the scapula and humerus. To further validate its segmentation performance under pathological conditions, we evaluated the model using data from patients with anterior shoulder dislocation (Figure 5).

Figure 5. Workflow for fine-tuning TotalSegmentator. CT, computed tomography; 3D, 3-dimensional.
Evaluation Metrics
The Dice and Jaccard coefficients were used to evaluate segmentation performance, with higher values indicating greater overlap between the predicted and reference segmentations. The Average Surface Distance and Hausdorff Distance, as distance-based metrics, were used to assess the accuracy of boundary delineation; lower values of these distance metrics indicate better segmentation.
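To make these metrics concrete, the sketch below computes the Dice coefficient, the Jaccard index, and a symmetric Hausdorff distance in plain Python. It is an illustration only: masks are assumed to be flattened sequences of 0/1 voxel labels, surfaces are assumed to be small lists of point coordinates, and none of the function names come from the study, which does not describe its metric implementation.

```python
import math


def _overlap_counts(pred, ref):
    """Count intersection, union, and total foreground of two binary masks."""
    pred = [bool(v) for v in pred]
    ref = [bool(v) for v in ref]
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p and r for p, r in zip(pred, ref))
    union = sum(p or r for p, r in zip(pred, ref))
    return intersection, union, sum(pred) + sum(ref)


def dice_coefficient(pred, ref):
    """Dice = 2|A ∩ B| / (|A| + |B|); defined as 1.0 when both are empty."""
    intersection, _, total = _overlap_counts(pred, ref)
    return 1.0 if total == 0 else 2.0 * intersection / total


def jaccard_index(pred, ref):
    """Jaccard = |A ∩ B| / |A ∪ B|; defined as 1.0 when both are empty."""
    intersection, union, _ = _overlap_counts(pred, ref)
    return 1.0 if union == 0 else intersection / union


def hausdorff_distance(points_a, points_b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(src, dst):
        # Largest distance from any source point to its nearest target point.
        return max(min(math.dist(s, d) for d in dst) for s in src)

    return max(directed(points_a, points_b), directed(points_b, points_a))
```

The brute-force Hausdorff loop is quadratic in the number of points, so it suits small examples; full-resolution surfaces would normally need a spatial index or a dedicated library.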
Statistical Analysis
All statistical analyses were conducted using SPSS Version 26 (IBM). Measurements obtained from the 3D CT models were compared using the paired Wilcoxon signed-rank test. The intraclass correlation coefficient (ICC) was calculated to assess the consistency of measurement methods. An ICC value < 0.5 indicates poor reliability; 0.5 ≤ ICC < 0.75 indicates moderate reliability; 0.75 ≤ ICC < 0.9 suggests good reliability; and ICC ≥ 0.9 reflects excellent reliability. The standard error of measurement was interpreted as follows: ≤5%, excellent; >5% and ≤10%, good; >10% and ≤20%, questionable; >20%, negative. P < .05 was considered statistically significant.
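The interpretation thresholds above translate directly into two small lookup functions. This is a sketch for illustration; the function names are hypothetical, and the analyses in the study were run in SPSS rather than in code like this.

```python
def interpret_icc(icc):
    """Map an ICC value to the reliability category used in this study."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.9:
        return "good"
    return "excellent"


def interpret_sem_percent(sem_percent):
    """Map a standard error of measurement (in %) to its category."""
    if sem_percent <= 5:
        return "excellent"
    if sem_percent <= 10:
        return "good"
    if sem_percent <= 20:
        return "questionable"
    return "negative"
```

Under these cutoffs, an ICC of 0.879 is classified as good and an ICC of 0.963 as excellent.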
Results
Segmentation and 3D Reconstruction
After fine-tuning the TotalSegmentator model, the segmentation and reconstruction of the humerus and scapula were highly accurate across a wide range of morphological variations (Figure 6). Comparison with the original TotalSegmentator results demonstrated improved segmentation performance for both the proximal humerus and scapula, with Dice coefficients of 0.958 ± 0.027 and 0.950 ± 0.037, respectively (Table 1).

Figure 6. Representative cases of proximal humerus and scapula contour segmentation by the chief orthopaedic surgeon (X.L.) (ground truth) and the fine-tuned TotalSegmentator.
Table 1. Results of Automatic Segmentation of 2 Osseous Components of the Shoulder Joint Using the Fine-Tuned TotalSegmentator
Data are presented as mean ± SD. ASD, average surface distance; HD, Hausdorff Distance.
In addition, automatic segmentation of each shoulder CT scan required no more than 30 seconds, which was far shorter than the manual segmentation time of approximately 2 hours.
The deep learning model demonstrated robust performance, achieving accurate segmentation and 3D reconstruction of the shoulder joint even in the presence of severe dislocation and extensive bone loss (Figure 7).

Figure 7. Representative case of segmentation and reconstruction of a severely dislocated shoulder joint with a large bony defect using the automatic segmentation model.
Interobserver Reliability of Measurement
To evaluate the interobserver reliability of the measurement method, ICCs were calculated for 4 parameters obtained during the measurement process: Glenoid L, Glenoid d′, GTW, and HSI. We first assessed the agreement between the 2 junior clinicians. The ICC values for glenoid-related parameters, including Glenoid L, Glenoid d′, and GTW, were all >0.90, indicating excellent interobserver agreement when applying the Two-Thirds Glenoid Height Technique. The ICC for HSI was slightly lower but still indicated good interobserver agreement (ICC = 0.879) (Table 2). The agreement between the 2 observers is further illustrated by the Bland-Altman plots (Figure 8).
Table 2. Interrater Agreement Among Junior Physicians
CI, confidence interval; GTW, glenoid track width; HSI, Hill-Sachs interval; ICC, intraclass correlation coefficient; SEM, standard error of measurement.

Figure 8. Bland-Altman plots for interrater measurements of Glenoid L, Glenoid d′, HSI, and GTW by the 2 junior physicians (F.Z., Y.Y.), showing the difference between their measurements against the mean of their measurements (center black line = mean difference; outer dashed lines = 95% limits of agreement). GTW, glenoid track width; HSI, Hill-Sachs interval.
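The quantities drawn in a Bland-Altman plot can be computed as follows. This sketch assumes the conventional definition of the 95% limits of agreement (mean difference ± 1.96 × SD of the differences); the text does not state how the limits were calculated, and the function name and sample values are hypothetical.

```python
import statistics


def bland_altman(measure_a, measure_b):
    """Return per-case means and differences plus bias and 95% limits."""
    if len(measure_a) != len(measure_b):
        raise ValueError("both observers must measure the same cases")
    if len(measure_a) < 2:
        raise ValueError("at least 2 paired measurements are required")

    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    means = [(a + b) / 2.0 for a, b in zip(measure_a, measure_b)]

    bias = statistics.mean(diffs)   # center line of the plot
    sd = statistics.stdev(diffs)    # sample SD of the differences

    return {
        "means": means,             # x-axis values
        "diffs": diffs,             # y-axis values
        "bias": bias,
        "lower_limit": bias - 1.96 * sd,
        "upper_limit": bias + 1.96 * sd,
    }
```

Plotting `diffs` against `means` with horizontal lines at `bias`, `lower_limit`, and `upper_limit` reproduces the layout described in the caption.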
Subsequently, we evaluated the interobserver reliability between the senior physician and the junior clinicians. The ICC values for all 4 parameters were comparable with those observed between the 2 junior clinicians. Specifically, Glenoid d′ and GTW exhibited ICCs exceeding 0.90, while HSI demonstrated good interobserver agreement (ICC = 0.884) (Table 3). The agreement between the junior and senior physicians is further illustrated by the Bland-Altman plots (Figure 9).
Table 3. Interrater Agreement Between Junior Physicians and the Senior Physician
Data are presented as mean (SD), unless otherwise indicated. CI, confidence interval; GTW, glenoid track width; HSI, Hill-Sachs Interval; ICC, intraclass correlation coefficient; SEM, standard error of measurement.

Figure 9. Bland-Altman plots for interrater measurements of Glenoid L, Glenoid d′, HSI, and GTW by junior and senior physicians, showing the difference between their measurements against the mean of their measurements (center black line = mean difference; outer dashed lines = 95% limits of agreement). GTW, glenoid track width; HSI, Hill-Sachs interval.
Intraobserver Reliability of Measurement
The repeated measurements performed by the junior clinicians demonstrated excellent intraobserver reliability, with ICCs >0.95 for all 4 parameters, including HSI. These results support the reproducibility of the proposed method. Because the intraobserver ICC for HSI was higher than its interobserver ICC, they also suggest that HSI measurement depends on each clinician's individual judgment when selecting landmarks. The detailed ICC values are summarized in Table 4.
Table 4. Intrarater Agreement Within the Junior Physician Group
Data are presented as mean (SD), unless otherwise indicated. CI, confidence interval; GTW, glenoid track width; HSI, Hill-Sachs interval; ICC, intraclass correlation coefficient; SEM, standard error of measurement.
Artificial Intelligence-Assisted Computational Tasks
To evaluate the translational potential of the proposed workflow, we recorded the time required for (1) segmentation and 3D reconstruction of the scapula and proximal humerus and (2) calculation and on-track/off-track determination, using both the manual workflow and our artificial intelligence (AI)-assisted pipeline. In the validation cohort (n = 43), automated segmentation and reconstruction were completed in a median of 27 seconds (range, 25-30 s) per case, whereas manual segmentation required a median of 7773 seconds (approximately 2.2 h; range, 7255-8457 s). For the calculation and on-track/off-track determination step, the semiautomated computational module (Figure 4) required a median of 30 seconds (range, 27-34 s), compared with 119 seconds (range, 111-133 s) for manual computation and classification. These findings demonstrate substantial efficiency gains from our AI-assisted workflow compared with the manual workflow, as summarized in Table 5.
Table 5. Time Required for Each Step of the Workflow, Including Segmentation and On-Track/Off-Track Determination
Data are presented as median (range) in seconds, per case.
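The per-step savings implied by the reported medians can be tabulated with a few lines of arithmetic. The sketch below uses only the median values given in the text; the dictionary layout and function name are hypothetical, and adding the step medians together gives only an approximation of the per-case total, since the median of a sum need not equal the sum of medians.

```python
# Median time per case in seconds, as reported for the validation cohort.
MEDIAN_SECONDS = {
    "segmentation": {"manual": 7773, "ai_assisted": 27},
    "on_off_track": {"manual": 119, "ai_assisted": 30},
}


def summarize_time_savings(medians=MEDIAN_SECONDS):
    """Return seconds saved and speed-up factor for each workflow step."""
    summary = {}
    for step, times in medians.items():
        summary[step] = {
            "saved_s": times["manual"] - times["ai_assisted"],
            "speedup": times["manual"] / times["ai_assisted"],
        }
    return summary


def approximate_total_saved_hours(medians=MEDIAN_SECONDS):
    """Approximate total time saved per case, in hours (sum of step medians)."""
    saved = sum(v["saved_s"] for v in summarize_time_savings(medians).values())
    return saved / 3600.0
```

Nearly all of the saving comes from the segmentation step; the calculation step contributes about a minute and a half per case.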
Discussion
This study systematically validates the feasibility and advantages of deep learning algorithms for automated segmentation and bone defect quantification in CT images of anterior shoulder dislocations. The TotalSegmentator model was fine-tuned and applied to simultaneously process the scapular and humeral structures in both normal and pathological states, providing a highly automated and consistent tool for determining whether the HSL is "on-track" or "off-track." Comparison with the traditional manual evaluation method demonstrated that this approach shows significant advantages in terms of accuracy, time efficiency, and consistency.
The fine-tuned TotalSegmentator model exhibited excellent segmentation performance on anterior shoulder dislocation CTs, achieving mean Dice coefficients of 0.958 for the proximal humerus and 0.950 for the scapula, an improvement over the original model (Table 1). The automatic segmentation process was considerably faster than manual procedures, thus greatly reducing the time required for clinical assessment and improving overall workflow efficiency. Moreover, even in complex cases involving severe bone loss, the model maintained a high degree of segmentation accuracy, reflecting strong robustness in pathological scenarios. These findings are consistent with the growing trend reported in the literature, where deep learning-based segmentation is gradually replacing manual processing methods.15,19
To address the limitations of the traditional “best-fit circle” method—specifically, its high subjectivity and large measurement error due to irregular glenoid morphology—we employed the Two-Thirds Glenoid Height Technique combined with a semiautomated measurement workflow. This approach significantly enhanced the reproducibility and interrater reliability of GTW measurements (ICC ≥ 0.90). The proposed method demonstrated advantages in image clarity, ease of operation, and adaptability to anatomic variations, indicating high clinical applicability. In agreement with the recently proposed modified techniques by Makovicka et al, 10 our study emphasizes the importance of standardized measurement pathways for improving data consistency.
Meanwhile, Rayes et al 14 proposed estimating native glenoid width from glenoid height using ipsilateral 3D CT, which incorporates 3D morphology and orientation and may therefore yield a slightly different height-width relationship compared with standardized planar measurements. In the present study, measurements were performed on a standardized en-face glenoid view exported from the reconstructed 3D model and measured in ImageJ, consistent with the Two-Thirds Glenoid Height Technique as a practical and reproducible measurement workflow. Accordingly, we interpret D = 2/3·L as a standardized approximation for circle construction rather than a universal anatomic constant. Future studies could directly compare 3D morphologic derivations and en-face planar techniques and evaluate their effect on GTW estimation and on-track/off-track classification.
In HSI measurement, although the assessment still relies on anatomic landmark identification on the reconstructed humerus, our results demonstrated good interobserver agreement (ICC = 0.879-0.884) and excellent intraobserver reliability (ICC = 0.963). This pattern likely reflects the fact that HSI measurement depends on the observer's selection of the innermost point of the HSL and the medial margin of the rotator cuff footprint, which may be less sharply defined in certain cases. Further standardization of landmark definitions and/or automated landmark detection may help reduce observer-related variability in future studies.
The proposed workflow in this study significantly shortens the time required to determine whether the HSL is “on-track” or “off-track” and reduces the impact of operator subjectivity, thereby creating an efficient pathway for preoperative bone defect assessment. Beyond time savings, this framework may provide additional clinical benefits. First, automated segmentation combined with a standardized quantitative measurement pathway may reduce operator dependence and interobserver variability, which is clinically relevant because the on-track/off-track classification directly informs treatment strategy selection in anterior shoulder instability. 16 Second, the framework reduces clinicians’ computational burden: in conventional practice, GTW requires formula-based calculation (GTW = 83%·D − d) and subsequent comparison with HSI, whereas our semiautomated computational module performs these calculations and the on-track/off-track determination automatically after the user inputs a small set of measured values, thereby minimizing manual arithmetic steps and potential calculation errors—particularly for junior physicians. Finally, by lowering technical and arithmetic barriers while providing consistent quantitative outputs, the proposed workflow may improve accessibility as a decision-support tool for less-experienced clinicians and in settings without specialized shoulder surgeons, thereby facilitating more standardized assessment, counseling, and referral/surgical planning.
Despite the clinical relevance of our findings, several limitations should be noted. First, the relatively small sample size (n = 43) may limit the generalizability of the model. Future studies should include larger cohorts and incorporate data from multiple imaging centers and equipment vendors to enhance the algorithm's robustness and adaptability to heterogeneous datasets. Additionally, the development of deep learning-based modules capable of automatically detecting GTW and HSI parameters could further facilitate fully automated measurement with minimal human intervention.
Second, HSI determination still involves a degree of subjectivity, particularly when the medial margin of the rotator cuff footprint is indistinct or anatomic boundaries are altered, which may reduce interobserver consistency. This observation aligns with the findings of Cantarelli Rodrigues et al, 4 who reported similar challenges in 3D MRI modeling. To address this issue, future research should incorporate more objective and standardized measurement criteria, such as automated recognition of key humeral landmarks or geometric features of the humeral head.11,12
Finally, cases with severe anatomic abnormalities, such as fractures or tumors, were excluded from this study. As such, the applicability of our model in special pathological scenarios remains to be validated. Subsequent studies should evaluate the model's performance across more complex case types to further assess its clinical generalizability and boundaries of use.
Conclusion
This study developed an automated evaluation workflow based on deep learning that is capable of efficiently and accurately performing CT image segmentation and bone defect quantification for anterior shoulder dislocations, significantly improving diagnostic efficiency and consistency. The results demonstrate that the system exhibits excellent reliability and clinical applicability in determining whether the HSL is "on-track" or "off-track," providing robust support for preoperative planning in patients with shoulder instability.
Footnotes
Final revision submitted January 4, 2026; accepted February 23, 2026.
One or more of the authors has declared the following potential conflict of interest or source of funding: This work was supported by the Key R&D Project of Jilin Provincial Department of Science and Technology (20250206025ZP). AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval for this study was obtained from the Ethics Committee of The Second Hospital of Jilin University.
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding authors on reasonable request.
