Abstract
Background:
Minimally invasive or percutaneous surgery (MIS) for hallux valgus correction has seen increased adoption because of a growing evidence base of positive clinical and radiographic outcomes following surgery. However, no standardized or validated radiographic classification exists to evaluate the first metatarsal osteotomy healing following MIS hallux valgus surgery. The aim was to develop a new radiographic classification system for assessing bone healing following MIS distal transverse osteotomy for hallux valgus.
Methods:
A 4-domain radiographic classification system based on callus formation, anteroposterior (AP) osteotomy line, lateral osteotomy line, and remodeling for MIS osteotomy healing was developed and tested on a cohort of 27 feet that underwent percutaneous transverse osteotomy for hallux valgus correction. Patients had simultaneous postoperative weightbearing computed tomography (WBCT) and standard radiographs following surgery. Five surgeons reviewed anonymized radiographs to evaluate interobserver reliability. WBCT was used to confirm union status and classification interpretation.
Results:
The classification system demonstrated substantial interobserver reliability for lateral osteotomy line (Fleiss kappa = 0.671, 95% CI 0.505-0.814) and AP osteotomy line assessment (Fleiss kappa = 0.664, 95% CI 0.459-0.811), with moderate agreement for callus formation (κ = 0.465) and remodeling (κ = 0.439). The classification showed strong correlation with WBCT findings, with an optimal threshold of 8 points identified to differentiate union from nonunion, achieving an overall classification accuracy of 85.2%. This finding was supported by the area under the receiver operating characteristic (ROC) curve of 0.832. At the optimal threshold, the classification demonstrated 90.0% sensitivity and 71.4% specificity for detecting union.
Conclusion:
This preliminary classification provides a reliable tool for assessing first metatarsal bone healing following MIS hallux valgus osteotomies, with substantial interobserver reliability. It offers a standardized approach for radiographic evaluation, which may enhance comparability across studies and serve as a radiographic research tool pending further validation. Its clinical applicability remains to be determined.
Level of Evidence:
Level III, diagnostic study.
Introduction
Minimally invasive surgery (MIS) or percutaneous surgery for hallux valgus (HV) deformity correction has gained significant popularity in recent years because of growing evidence demonstrating improvement in clinical and radiologic outcomes.1,9,13,18,20,26 One of the key stages in fourth-generation percutaneous hallux valgus deformity correction is the percutaneous transverse osteotomy and large metatarsal head shift, which corrects the intermetatarsal angle and reduces the 1-2 intermetatarsal space. 18 This leads to a characteristic pattern of remodeling as the osteotomy heals (Figure 1). First, there is central healing, followed by medial and lastly lateral bone formation. On the lateral view, the osteotomy unites from the dorsal side to the plantar side. Typically, patients are restricted from high-risk activities such as running and jumping until bony union is achieved. 28

Figure 1. Stages of healing and pattern of remodeling following percutaneous hallux valgus surgery using a distal extra-articular transverse osteotomy.
There are currently no validated radiographic criteria for assessing healing patterns and no universal agreement on what constitutes radiographic bony union in the context of MIS hallux valgus surgery. Traditional radiologic markers of bone healing, such as callus formation and cortical continuity, can be difficult to interpret in MIS because of altered healing patterns and remodeling compared with traditional open osteotomy techniques. 30 Little has been published regarding osteotomy healing following percutaneous hallux valgus correction, and no validated classification exists. Blitz et al evaluated 172 feet with mean 8-month follow-up and described the lateral cortical remodeling following MIS. 3 Spacek et al 34 noted in a cadaveric study that the lateral pyramidal space is contained, which may explain the pattern of remodeling visualized in this region. The lack of a standardized assessment tool for osteotomy healing means there is potential for considerable variability in the assessment and documentation of radiographic healing following minimally invasive hallux valgus procedures. This is particularly relevant in cases of suspected delayed union or discrepancy between radiologic healing and clinical symptoms.
Studies investigating the definition of fracture healing have found a lack of consensus in the current orthopaedic literature. 5 Without valid and reliable clinical or radiographic measures of union, the interpretation of studies reporting bony union remains difficult. There is a need to develop standardized and validated criteria for radiographic bone healing following percutaneous HV surgery to provide physicians and researchers with reproducible and reliable information. Such criteria can guide surgeons advising patients on return to activity, particularly when there are concerns over delayed union or nonunion, and can allow comparison with other studies of percutaneous techniques.
Aims
This preliminary study aims to evaluate the interobserver reliability of a new classification for assessing bone healing following a percutaneous transverse osteotomy for hallux valgus surgery and to validate this classification against weightbearing computed tomography (WBCT).
Methods
Development of Classification
We performed a scoping literature search to identify existing osteotomy or fracture healing classifications.3 -5,11,16,23,25,35 We then modified these to create a new classification, based on plain radiographs, that reflects the specific pattern of osteotomy healing observed following percutaneous transverse metatarsal osteotomy, as shown in Table 1 and Figures 1 and 2. The classification consists of 4 domains (callus formation, visibility of anteroposterior [AP] and lateral osteotomy lines, and remodeling). Each domain had 4 grades, assigned a score of 0-3. A total score for each case could be generated through summation of the individual domain scores. The maximum possible score was 12.
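The summation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the domain key names are hypothetical labels chosen for this example, and only the 4 domains, the 0-3 grading, and the maximum of 12 come from the text.

```python
# Hypothetical key names for the 4 domains named in the text.
DOMAINS = ("callus", "ap_osteotomy_line", "lateral_osteotomy_line", "remodeling")


def total_score(grades):
    """Sum the 4 domain grades (each 0-3) into a total between 0 and 12."""
    if set(grades) != set(DOMAINS):
        raise ValueError("expected exactly one grade for each of the 4 domains")
    for name, grade in grades.items():
        if not isinstance(grade, int) or not 0 <= grade <= 3:
            raise ValueError(f"grade for {name} must be an integer from 0 to 3")
    return sum(grades[name] for name in DOMAINS)


# Illustrative grades only, not study data.
example = {
    "callus": 2,
    "ap_osteotomy_line": 1,
    "lateral_osteotomy_line": 3,
    "remodeling": 2,
}
print(total_score(example))
```

Validating the grade range up front keeps an out-of-range entry from silently inflating or deflating the total.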
Table 1. Classification and Scoring Guide for Percutaneous Osteotomy Healing on Radiographs: The Percutaneous Osteotomy Healing Score.
Abbreviations: AP, anteroposterior; XR, radiograph.
Remodeling is assessed relative to the medial and lateral cortex of the proximal first metatarsal fragment. Lateral: bone formation and remodeling lateral to the proximal screw; central: bone formation and remodeling between the proximal and distal screw; medial: bone formation and remodeling medial to the distal screw.

Figure 2. Illustrated anteroposterior and lateral radiographs highlighting key radiographic features of the percutaneous osteotomy healing classification including medial, central, and lateral remodeling zones and the presence of visible osteotomy lines.
Study Design
This is a preliminary observational analysis designed to evaluate the interobserver reliability of our proposed classification of osteotomy healing following percutaneous transverse osteotomy for hallux valgus correction (Percutaneous Osteotomy Healing Score [PERC-OS]). This study was reported in line with the Guidelines for Reporting Reliability and Agreement Studies (GRRAS). 14 This study was not designed to assess the time taken for union to occur or the impact of osteotomy configuration on rate of osteotomy healing.
We planned to validate the accuracy of the classification by comparing the classification outcome of radiographic union or nonunion on plain radiographs against WBCT imaging performed at the same time as the radiographs (Figure 3). We hypothesized that if the classification was valid, then predicted union on plain radiographs would correlate with union on WBCT imaging. All WBCTs were reviewed by a single consultant radiologist, with radiologic union defined as follows:
Cortical continuity in at least 3 of 4 cortices (medial, lateral, dorsal, and plantar) with bridging bone visible on multiplanar reconstructions
Trabecular bone crossing the osteotomy site in at least 50% of the cross-sectional area
We purposefully identified patients with imaging at various postoperative time points to capture a range of healing stages, with some at early time points when radiographic union would not be expected and others at later time points when complete healing would be anticipated. It is important to note that our use of the terms “union” and “nonunion” in this study refers specifically to the radiographic appearance at the time of imaging acquisition rather than the clinical definition of nonunion (traditionally defined as absence of healing after 6 months). We have deliberately chosen this terminology to evaluate the classification’s ability to distinguish between radiographically united and nonunited osteotomies at any given time point, rather than making a clinical determination of pathologic nonunion.

Figure 3. Flowchart demonstrating the methodology of the study.
Each surgeon assessed a series of cases with standard weightbearing AP and lateral radiographs using an online platform for image distribution and evaluation. The images were anonymized to maintain confidentiality and ensure unbiased assessments. The WBCT scans were performed at a single institution, using consistent imaging protocols to assess bone healing.
Setting and Rater Population
The study was conducted across multiple centers, with surgeons based in Europe, America, and Australia. We purposefully recruited both experienced and inexperienced minimally invasive surgery practitioners to participate in the image review process to enhance the validity and generalizability of our classification.
Study Population
We identified consecutive patients diagnosed with symptomatic hallux valgus who had exhausted conservative treatment and underwent fourth-generation percutaneous hallux valgus deformity correction using a transverse osteotomy between July 2022 and September 2023. Patients who had undergone simultaneous postoperative WBCT and plain radiographs at some point in the postoperative period as part of routine care were retrospectively selected from a research database. Exclusion criteria were adolescent patients (age ≤ 17 years), history of prior forefoot surgery, peripheral neuropathy, and incomplete clinical and radiographic data. The studied cohort consisted of 27 feet from 18 patients. Demographic data including age, sex, laterality, and union status (from WBCT) were also collected (Table 2). The mean radiographic follow-up was 0.66 ± 0.31 years (0.16-1.38), and 74.1% of feet demonstrated radiographic union on WBCT.
Table 2. Patient Demographics.
Variables
The primary variables assessed in this study were the 4 radiographic markers of bone healing, as defined in our classification system. These variables were used to evaluate union at the osteotomy site and included callus formation, visibility of the AP osteotomy line, visibility of the lateral osteotomy line, and remodeling. The primary outcome was the interobserver reliability of the classification and its correlation with bony union.
Data Sources and Measurement
The scans were evaluated using the proposed 4-domain classification system explicitly developed for radiographic assessment of healing following MIS hallux valgus correction. Each surgeon independently reviewed the 27 cases. The interobserver reliability was determined by comparing the ratings across all surgeons. The mean classification score for each case was compared to the WBCT assessment of radiologic union to assess validity. Radiographic union on WBCT was based on an independent consultant radiologist assessment of the scan.
Bias
To reduce bias, the surgeons received standardized training on the classification system prior to the image review. Furthermore, the scans were anonymized before distribution to the surgeons. Each surgeon was masked to the other surgeons' grading of each scan and to the WBCT results.
Study Size
We reviewed the literature investigating the interobserver reliability of osteotomy/fracture healing classifications to guide a sample size calculation. Sample size calculation determined that 27 cases and 5 raters would provide 80% power to detect a difference between moderate agreement (κ = 0.40) and substantial agreement (κ = 0.75), with α = 0.05. These thresholds were selected a priori based on established criteria for meaningful improvements in interrater reliability. 33 This is the first study to evaluate this novel classification system for bone healing following percutaneous transverse osteotomy in hallux valgus surgery; there were no previous studies on which to base our power calculation. Our sample size was determined based on healing of other distal extremity fractures. Recognizing the limitations of sample size in reliability studies, we specifically included WBCT comparison to provide additional objective evidence of bony radiographic union for the classification system’s validity.
Statistical Methods
Statistical analysis was performed using SciPy. Continuous and categorical data were reported with descriptive statistics. Fleiss kappa analysis with 95% CIs was used to assess the interobserver reliability between observers.8,10 The Landis and Koch criteria were used to judge interobserver reliability: 0.00-0.20, slight agreement; 0.21-0.40, fair; 0.41-0.60, moderate; 0.61-0.80, substantial; and 0.81-1.00, almost perfect. 15
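For readers unfamiliar with the statistic, the following is a minimal pure-Python sketch of the standard Fleiss kappa formula together with an agreement-band lookup. It is not the study's analysis code (which is not given in the text), and the small ratings table is invented purely to show the calculation. The band labels use the Landis and Koch wording that the Results apply ("moderate", "substantial").

```python
def fleiss_kappa(counts):
    """Fleiss kappa for a table where counts[i][j] is the number of raters
    who assigned case i to category j. Every case needs the same rater count."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must be rated by the same number of raters")
    n_categories = len(counts[0])
    total = n_cases * n_raters
    # Proportion of all ratings that fall in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]
    # Observed agreement for each case, then averaged over cases.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_cases
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        return float("nan")  # undefined when every rating is in one category
    return (p_bar - p_e) / (1 - p_e)


def agreement_band(kappa):
    """Map a kappa value to a Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Invented example: 3 cases, 3 raters, 2 categories (not study data).
toy = [[3, 0], [0, 3], [2, 1]]
k = fleiss_kappa(toy)
print(round(k, 3), agreement_band(k))
```

In the study design, each domain would yield one such table with 27 rows (cases), 4 columns (grades 0-3), and row sums of 5 (raters).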
To assess the validity of our classification system, we performed a comparative analysis using computed tomography (ie, WBCT) as the gold standard. The WBCT scans were used to definitively categorize each case as either “united” or “not united.” We then applied our classification scoring system to the corresponding radiographs. Using logistic regression, we analyzed the relationship between radiographic classification scores and the binary WBCT-determined radiographic union status. We generated a receiver operating characteristic (ROC) curve to evaluate the classification system’s diagnostic accuracy and determine the optimal score threshold for predicting radiographic union.
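The threshold selection described above can be illustrated with a short pure-Python sketch. It assumes a cut of the form "score at or above the threshold predicts union" and picks the threshold that maximizes the Youden index (sensitivity + specificity - 1); the AUC is computed with the rank-based (Mann-Whitney) formulation. The scores and union labels below are invented for illustration and are not study data. For a single predictor, ranking cases by a fitted logistic probability or by the raw score gives the same ROC curve as long as the fitted coefficient is positive, so the sketch works directly on the scores.

```python
def roc_points(scores, united):
    """Return (threshold, sensitivity, specificity) for each observed score,
    classifying a case as united when score >= threshold."""
    n_pos = sum(1 for u in united if u)
    n_neg = len(united) - n_pos
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, u in zip(scores, united) if u and s >= t)
        tn = sum(1 for s, u in zip(scores, united) if not u and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def youden_threshold(scores, united):
    """Threshold with the largest Youden index J = sensitivity + specificity - 1."""
    best = max(roc_points(scores, united), key=lambda p: p[1] + p[2] - 1)
    return best[0]


def auc(scores, united):
    """Probability that a united case outscores a non-united one (ties count 0.5)."""
    pos = [s for s, u in zip(scores, united) if u]
    neg = [s for s, u in zip(scores, united) if not u]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: 8 cases with total scores and reference union status.
toy_scores = [3, 5, 6, 7, 8, 9, 10, 11]
toy_united = [0, 0, 0, 0, 1, 0, 1, 1]
print(youden_threshold(toy_scores, toy_united))
print(round(auc(toy_scores, toy_united), 3))
```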
Results
In September and October 2024, 5 surgeons of a range of experience levels (3 consultants, 1 fellow, and 1 trainee) reviewed and completed radiographic assessment of the study population using the classification. Patient demographics are shown in Table 2.
Primary Outcome
The primary outcome was the interrater reliability of our classification system using Fleiss kappa coefficient across the 4 radiographic domains in this classification. There was substantial agreement between observers in assessment of the lateral osteotomy line (κ = 0.671, 95% CI 0.505-0.814) and AP osteotomy line (κ = 0.664, 95% CI 0.459-0.811). 15 There was moderate agreement in the callus formation domain (κ = 0.465, 95% CI 0.281-0.602) and remodeling domain (κ = 0.439, 95% CI 0.234-0.612).
Table 3 demonstrates the summarized cell counts by domain.
Table 3. Summarized Cell Counts by Domain for a Classification of First Metatarsal Osteotomy Healing Following Minimally Invasive Hallux Valgus Surgery.
Abbreviations: AP, anteroposterior; XR, radiograph.
Preliminary Validation and Correlation of Classification Threshold With Bony Union
The classification of union outcomes was compared against WBCT imaging, which served as the reference standard for confirming the true union status of each case. ROC curve analysis of the combined scoring system demonstrated good discriminative ability with an AUC of 0.832. Using Youden index, an optimal threshold score of 8 was identified to balance sensitivity and specificity. At this threshold, a 2 × 2 contingency table (Table 4) achieved a sensitivity of 90.0% and specificity of 71.4% with an overall accuracy of 85.2%. The positive predictive value was 75.9% and the negative predictive value was 87.7%. The F1 score was 90.0%.
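As a worked check of how sensitivity, specificity, accuracy, and the F1 score follow from a 2 × 2 table, the sketch below uses cell counts that are an assumption: Table 4 is not reproduced here, so the counts were back-calculated from 27 feet, 74.1% union on WBCT (20 united, 7 not united), and the reported sensitivity and specificity. They should be treated as illustrative rather than as the published table.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard diagnostic accuracy measures from 2 x 2 contingency counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    youden_j = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
        "youden_j": youden_j,
    }


# Assumed counts (back-calculated, not taken from Table 4):
# 20 united feet on WBCT, 18 scored as united; 7 not united, 5 scored as not united.
metrics = diagnostic_metrics(tp=18, fn=2, tn=5, fp=2)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```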
Table 4. Contingency Table Based on Classification Threshold of 8 to Identify Bony Union.
Abbreviation: WBCT, weightbearing computed tomography.
When analyzed individually, all 4 radiographic parameters demonstrated fair discriminative ability with AUC values ranging from 0.746 to 0.786 (Figure 4). Using an optimal threshold of 2, the parameters showed varying sensitivity and specificity patterns. The AP osteotomy line demonstrated the highest sensitivity (95.0%) but lowest specificity (42.9%), whereas the lateral osteotomy line showed the highest specificity (85.7%) with lower sensitivity (60.0%). Callus formation and remodeling showed more balanced diagnostic performance characteristics, with accuracies ranging from 74.1% to 81.5%. Detailed ROC analysis metrics for each parameter are presented in Figure 4.

Figure 4. Receiver operating characteristic (ROC) analysis of osteotomy healing classification system. (Left) Individual domain ROC curves demonstrating varying diagnostic performance. (Right) Combined score ROC curve showing improved diagnostic performance with an optimal score threshold of 8.00. Diagonal reference lines represent random chance (area under the curve = 0.5).
Further analysis of scoring thresholds revealed clinically relevant decision points that enable a practical approach to assessing healing status. Scores below 6 demonstrated perfect sensitivity (1.000) and reasonable specificity (0.429) for identifying nonunion, suggesting that patients below this threshold warrant careful monitoring and potential intervention. Scores between 6 and 8 represent an indeterminate zone where union status is unclear and continued protection with close radiographic follow-up is advised. Scores above 8 showed optimal discriminative ability for union (sensitivity 90.0%, specificity 71.4%), indicating that patients above this threshold can be considered to have achieved satisfactory radiographic healing suitable for progressive rehabilitation.
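The decision points described above can be expressed as a simple lookup. This is a hedged sketch rather than a validated clinical rule: the function name and tier labels are hypothetical, and because the text leaves a score of exactly 8 ambiguous, the sketch assumes the "between 6 and 8" zone is inclusive at both ends.

```python
def healing_tier(score):
    """Map a total score (0-12) to one of the 3 tiers described in the text.

    Assumption: scores of exactly 6 and exactly 8 fall in the indeterminate
    zone. Fractional values are allowed because mean scores across raters
    need not be whole numbers.
    """
    if not 0 <= score <= 12:
        raise ValueError("total score must lie between 0 and 12")
    if score < 6:
        return "likely nonunion"
    if score <= 8:
        return "indeterminate"
    return "likely union"


for s in (4, 6, 8, 8.4, 11):
    print(s, healing_tier(s))
```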
Discussion
This study presents a new preliminary classification system that may have some validity for assessing the healing of first metatarsal osteotomies following percutaneous hallux valgus correction, demonstrating substantial interobserver reliability and preliminary validation against WBCT. The kappa scores observed in our analysis and correlation with bony union as assessed on WBCT suggest this classification is generalizable with external validity. This classification provides a reliable framework for monitoring the progress of first metatarsal osteotomy healing, offering structured radiographic insights that may complement clinical decision making. However, its utility in guiding patient management remains unproven and requires further validation. The 3-tier approach shown in Figure 5, with scores <6 indicating likely nonunion, 6-8 indicating unclear union status, and >8 indicating likely union, provides clinicians with a framework for radiographic assessment while acknowledging the continuous nature of the healing process.

Figure 5. Summary of preliminary classification osteotomy healing score and score implications.
Our findings demonstrate that this radiographic classification system for osteotomy healing achieves substantial interobserver reliability for osteotomy line assessment (κ > 0.66 for both AP and lateral views) while maintaining moderate reliability for the more subjective parameters of callus formation and remodeling (κ = 0.465 and 0.439, respectively). This gradient of reliability across different domains aligns with previous studies of fracture healing classification systems, where more objective parameters typically show higher reliability than those requiring qualitative judgement. 2 The system’s overall diagnostic accuracy of 85.2% against WBCT, with a positive predictive value of 75.9% and negative predictive value of 87.7%, suggests it provides clinically meaningful guidance for healing assessment. Particularly noteworthy is the high sensitivity (90.0%) at the optimal threshold score of 8, indicating the system’s strong ability to identify true radiographic unions while maintaining acceptable specificity (71.4%) to avoid missing nonunions. The high negative predictive value (87.7%) suggests that surgeons can use this system to identify cases requiring continued monitoring or intervention. The threshold score of 8 may offer a reference point for radiographic assessment of healing, though its use in guiding weightbearing or return to activities requires further clinical validation. Furthermore, the system’s objective nature using multiple domains makes it useful for longitudinal monitoring of healing progression.
Remodeling of the osteotomy site generally follows a predictable course, as highlighted in Figure 1, though the specific factors influencing the rate and pattern of healing remain unclear. A recent study on the pyramidal space surrounding the lateral osteotomy site found that this is a contained area. 34 We hypothesize that this enclosed space contains bone swarf, cells, and other inflammatory mediators that promote lateral cortical remodeling. The remodeling process is not limited to the lateral side: one study of long-term radiographic outcomes reported that metalwork prominence secondary to medial remodeling and resorption can be an issue. 21 Further work on understanding first metatarsal remodeling is needed. High-volume surgeons will inevitably encounter cases such as those shown in Figure 6 where, despite bony union, lateral, central, or medial remodeling fails to occur, leading to union of <25% of the metatarsal neck width. We call this the “Filament-Union” sign, characterized by bony union across <25% of the metatarsal neck with absent or minimal central, medial, and lateral bony remodeling; the name reflects the thin bony union bridge. 22

Figure 6. Plain radiograph of a 55-year-old woman taken 12 months following percutaneous hallux valgus surgery demonstrating bony union with absent medial or lateral remodeling. The patient was clinically asymptomatic.
Blitz et al 3 also noted this phenomenon in their study of 172 feet, in which they classified bone healing into 3 types of lateral remodeling. Cases where lateral remodeling fails to occur are of particular concern because of the (presumably) increased risk of fracture or metalwork failure due to poor bony consolidation. Several factors may explain this phenomenon, including the fixation technique, number of screws used (1-screw vs 2-screw fixation 19 ), bone density,6,27 and patient factors and comorbidities. It is hypothesized that reduced mechanical stability, as seen with single-screw or intramedullary fixation, may impact callus formation per Wolff law. However, clinical studies comparing 1 screw vs 2 screws have so far not shown any difference in union rates.3,12 The relationship between fixation and union/remodeling should certainly be further investigated, as there are numerous studies using first- (no metal fixation) or second-generation (Kirschner-wire fixation) techniques that do not report high levels of nonunion (although loss of position is certainly higher with less rigid fixation). 7 Other factors that could hypothetically affect bone healing and subsequent remodeling include thermal injury from the burr,31,32 tourniquet usage, 17 and osteotomy location, configuration, and percentage head shift. 24 Reports of nonunion following percutaneous hallux valgus correction are generally rare, with systematic reviews and large clinical studies indicating consistently low rates of nonunion regardless of osteotomy technique.1,9,13,20,21,26,29,36 These potential explanations warrant further investigation and could help guide future modifications to surgical technique or postoperative management.
The strengths of this study include the use of multiple independent observers with varying degrees of experience to assess interobserver reliability and the use of WBCT to confirm union, which helps validate the generalizability of our study. However, there are important limitations to acknowledge, of which the most important is the lack of intraobserver validation of this classification. We were also unable to perform clinical assessment and correlate the clinical and radiologic findings of this classification. Pain on weightbearing is a well-recognized indication of incomplete bony union that is not accounted for in our classification. We purposefully chose to exclude this because of the subjective nature of pain and high risk of reporting or response bias. In addition, radiographic signs of bony healing may lag behind the actual functional recovery of patients, which may affect the interpretation of the correlation between radiologic and clinical outcomes. The sample size calculation was limited by the novel nature of this classification system, with no previous studies available for reference. Our sample size, although meeting minimal requirements for statistical power, is modest and was limited by the number of patients undergoing simultaneous WBCT and radiographs. The study may nevertheless have been underpowered to detect the predetermined difference between moderate (κ = 0.40) and substantial agreement (κ = 0.75), as evidenced by wide CIs and kappa values that failed to reach the target effect size. As such, these findings warrant further validation in a larger, independent cohort. Additional limitations include the fact that each WBCT was read by only 1 observer without testing the interobserver reliability of this reference standard. We treated each osteotomy as an independent case, which may not fully account for within-subject correlation in bilateral cases, and the uneven distribution of scores could affect the stability and interpretation of the kappa statistics.
Notably, our population had a high proportion of female patients, and the union-skewed population may limit the generalizability and robustness of the findings, which should be taken into account when interpreting the results. Furthermore, our assumption that the classification can be applied at any postoperative time point lacks empirical validation across a full healing timeline, and future studies should assess its reliability at specific postoperative intervals. We wish to emphasize that all references to “union” in this study denote radiographic union only, and do not imply clinical or functional union, or clinical outcomes. A significant limitation is that radiographic signs of ossification may not directly correlate with the mechanical stability or clinical resistance of the healing bone. This potential disconnect between imaging findings and functional outcomes should be tested in future studies through biomechanical testing and correlation with clinical outcomes to determine if radiographic union truly reflects functional stability.
Conclusion
This preliminary study reports a classification system for assessment of union following a percutaneous transverse distal osteotomy for hallux valgus. This system may aid researchers in standardizing the radiographic assessment of bone healing following MIS osteotomy. Its clinical role remains investigational and awaits further prospective validation.
Supplemental Material
sj-pdf-1-fao-10.1177_24730114251345818 – Supplemental material for Preliminary Radiographic Classification of First Metatarsal Osteotomy Healing Following Minimally Invasive Hallux Valgus Surgery
Supplemental material, sj-pdf-1-fao-10.1177_24730114251345818 for Preliminary Radiographic Classification of First Metatarsal Osteotomy Healing Following Minimally Invasive Hallux Valgus Surgery by Thomas L. Lewis, Sanjana Mehrotra, Jonathan Kaplan, Tyler Gonzalez, Sergio Morales, Thomas J. Goff, Vikramman Vignaraja, Ayla Claire Newton, Robbie Ray and Peter Lam in Foot & Ankle Orthopaedics
Footnotes
Ethical Approval
This study complied with the Declaration of Helsinki and was approved by the institutional review board. All observers provided informed consent.
Declaration of Conflicting Interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Thomas L. Lewis, MBChB(Hons), BSc(Hons), FRCS(Tr&Orth), MFSTEd, reports royalties and consulting from Vilex beyond the scope of this study and PhD tuition fees supported by MIFAS. Jonathan Kaplan, MD, reports disclosures relevant to manuscript from royalties Enovis/DJO, staples; royalties from Treace Medical, hallux valgus implants; royalties from Vilex, calcaneal osteotomy; Artelon, paid consultancy; Edge Surgical, paid consultancy; and general disclosures from AOFAS Governance Committee Chair, unpaid; AOFAS Education Committee, unpaid; Foot & Ankle Orthopaedics journal editor, unpaid; Treace Medical, stock; GLW Medical Innovation, investor/stock options; Exactech (discontinued), surgeon advisory board; Surgical Fusion Inc (discontinued), consultant; and Royalty (soft tissue anchor). Tyler Gonzalez, MD, MBA, reports consultancy fees from Treace Medical Concepts Inc, Surgical Fusion Technologies, Stryker, Enovis, Exactech, and Surgebright and royalties from Surgical Fusion Technologies, Treace Medical Concepts, and Vilex, all outside scope of this study. Robbie Ray, MBChB, ChM(T&O), FRCSEd(Tr&Orth), FEBOT, reports prospective royalty payments from Enovis/Novastep and payment for honoraria, lectures, presentations, and speakers fees from Enovis/Novastep, Medartis/IBRA, and Marquardt UK, beyond the scope of this study. Peter Lam, MBBS(Hons), FRACS, reports royalties from Enovis, consulting fees from Enovis and Paragon 28, and payment from AOFAS for Kenneth Johnson Lecture 2024. All fees outside the scope of this manuscript. Disclosure forms for all authors are available online.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
