Abstract
Background:
The proximal tibial epiphyseal inclination can be used as a prognostic factor for good results after knee osteotomy and measured using the tibial bone varus angle (TBVA). This angle depends on the visibility of the epiphyseal plate, which has shown poor reproducibility when measured on standard radiographs by conventional methods.
Purpose:
To evaluate the measurement reliability of the TBVA and other angles based on the epiphyseal scar using a digital image display.
Study Design:
Cohort study (diagnosis); Level of evidence, 3.
Methods:
A total of 100 whole-leg radiographs were analyzed twice by 3 orthopaedic surgeons from 2 countries in a blinded and randomized manner. Observers measured the hip-knee-ankle angle, mechanical lateral distal femoral angle, medial proximal tibial angle, and TBVA. The growth plate–tibial plateau (GPTP) angle, defined as the angle between the epiphyseal scar and tibial plateau, was measured; this angle has not yet been described for osteotomy. In addition, a modified version of the TBVA (mTBVA), defined as that between the epiphyseal scar, its center, and the center of the talus, was measured. The Ahlbäck score for osteoarthritis and a 3-grade score for epiphyseal scar visibility were also determined. The reliability of the angle measurements and scoring was evaluated using the Fleiss kappa and intraclass correlation coefficient (ICC).
Results:
The scores for epiphyseal scar visibility showed fair interobserver (Fleiss kappa correlation coefficient [κ] = 0.29-0.35) and strong intraobserver (Fleiss κ = 0.62-0.69) reliability. TBVA, GPTP angle, and mTBVA measurements showed good interobserver reliability (ICC, 0.76-0.77), while the GPTP angle achieved excellent intraobserver reliability (ICC, >0.9).
Conclusion:
Using digital image display, angles that depend on the epiphyseal scar—such as TBVA, GPTP angle, and mTBVA—can achieve acceptable measurement reliability despite the low agreement on the visibility of the epiphyseal scar.
Knee osteotomy for deformity correction has been widely used to treat or delay degenerative changes and the necessity of knee replacement. 2 The outcomes of knee osteotomy are usually measured by survival in years before conversion to knee arthroplasty, and the 10-year survival rate4,7,17,19,20 varies from 51% to over 90%. Many factors influencing survival and outcome in knee osteotomy have become evident. Among these are age, weight, grade of deformity, and degree of correction.3,16 To assess the deformity on the anteroposterior (AP) plane, many angles are measured on whole-leg radiographs (WLRs), 21 the most prominent being the hip-knee-ankle (HKA) angle, mechanical lateral distal femoral angle (mLDFA), and medial proximal tibial angle (MPTA). 13
In tibial osteotomy, surgeons have paid particular attention to the tibia’s proximal epiphyseal alignment. In 1991, Bonnin and Levigne 5 from the Lyon Knee School described the tibial bone varus angle (TBVA). They postulated that the varus inclination of the tibial epiphysis was a critical prognostic outcome factor. Other authors have investigated the radiographic reliability of the TBVA and found it difficult to obtain reproducible measurements. In 2005, Jenny et al 9 compared measurement results from 50 knee radiographs by 2 observers using the intraclass correlation coefficient (ICC) and calculated an intraobserver reliability of 0.62 and an interobserver reliability of 0.41. In contrast, ICCs for a common angle such as the HKA angle are usually >0.9. 29 In another study, van Raaij et al 23 investigated 83 tibial osteotomies and determined the preoperative TBVA. They reported ICCs of 0.52 and 0.48 for inter- and intraobserver reliability, respectively, and concluded that TBVA assessments did not seem reliable. The lack of clear visibility of the epiphyseal scar has emerged as the main problem in these studies. However, both sets of authors reported that observers had used manual goniometers, which may also have limited the reproducibility.9,23
Apart from the TBVA, some authors have suggested calculating the angle between the epiphyseal scar (the former growth plate) and the tibial plateau, namely the growth plate–tibial plateau (GPTP) angle. 8 However, the clinical significance of this angle for osteotomies has not yet been determined.
Since digital image display provides tools such as zoom and contrast, it is said to increase the general visibility of landmarks 10 and thus the visibility of the epiphyseal scar. This study aimed to evaluate the reliability of the epiphyseal scar–based angles by digital image analysis compared with the standard angles measured before knee osteotomy. We hypothesized that enhanced visibility of the epiphyseal scar would improve the measurement reliability of angles based on the epiphyseal scar.
Methods
In this study, we selected the latest AP WLRs from a population of White patients, including 47% men and 53% women, with a mean body mass index of 24.9 ± 4.2 kg/m². Images were not preselected and contained all indications, including older patients for knee replacements and younger patients for potential knee osteotomy. Images were analyzed by 3 fellowship-trained orthopaedic surgeons (M.P., B.d.G., M.J.) from 2 different countries. Since the lowest available ICC for the TBVA in the literature 9 was 0.41, a minimum acceptable reliability of 0.4 and an expected reliability of 0.65 were assumed. Using 3 observers, a significance level of 0.05, and a power of 0.8, the minimum sample size was 40 according to Walter et al. 30 Ultimately, 100 images were selected and analyzed. To perform intraobserver reliability studies, 50% of the images were analyzed twice.
All procedures were performed in accordance with the ethical standards of the institutional and national research committee, the 1964 Declaration of Helsinki and its subsequent amendments, or comparable ethical standards.
Scores and Measurements
On every image, observers determined 2 scores: the Ahlbäck score for osteoarthritis 1 (Figure 1) and a score to evaluate the visibility of the proximal epiphyseal scar on the tibia. Scar visibility was graded as either “well visible,” “visible,” or “not visible” (Figure 2). A similar grading scale has already been used and validated. 6 In addition, the observers measured 6 angles. Among these were the HKA angle, the mLDFA, the MPTA (Figure 3), and the TBVA (Figure 4). The landmarks for the TBVA are the center of the proximal tibial epiphyseal scar, the proximal tibial interspinous point, and the center of the talus. It was postulated that the TBVA therefore captures the epiphyseal inclination. 5 Moreover, the GPTP angle was measured. It was defined as the angle between the tibial plateau and the proximal tibial epiphyseal scar 8 (Figure 4). Finally, an angle that has not yet been described was measured between the center of the talus, the center of the epiphyseal scar, and the lateral epiphyseal scar (Figure 4). This new angle was named the “modified TBVA” (mTBVA).

Figure 1. Each anteroposterior knee radiograph was graded using the Ahlbäck score for osteoarthritis: grade 1 = joint space narrowing (<3 mm); grade 2 = joint space obliteration; grade 3 = minor bone attrition (0-5 mm); grade 4 = moderate bone attrition (5-10 mm); and grade 5 = severe bone attrition (>10 mm).

Figure 2. The epiphyseal scars were graded by 3 observers as “well visible,” “visible,” or “not visible.” They were able to use zoom and contrast tools to increase visibility.

Figure 3. The HKA angle was determined using the Cobb angle tool in Tyche. Landmarks were (from proximal to distal) the femoral head center, the center of the distal femoral intercondylar notch, the proximal tibial interspinous point, and the center of the talus. The mLDFA was determined using the standard angle tool. Landmarks were (from proximal to distal) the femoral head center and the center of the tangential of the distal femoral condyles. The MPTA was determined using the standard angle tool. Landmarks were (from proximal to distal) the center of the tangential of the tibial plateau and the center of the talus. The orange lines show the “auxiliary lines,” and the white lines show the angle tools. HKA, hip-knee-ankle; mLDFA, mechanical lateral distal femoral angle; MPTA, medial proximal tibial angle.

Figure 4. The TBVA, GPTP angle, and mTBVA rely on the visibility of the epiphyseal scar. The TBVA was determined using the standard angle tool. Landmarks were the center of the epiphyseal scar (orange line), the proximal tibial interspinous point, and the center of the talus. The GPTP angle was determined using the Cobb angle tool in Tyche. Landmarks were the tibial plateau and a line intersecting the most medial and most lateral part of the growth plate (ie, epiphyseal scar). The mTBVA was determined using the standard angle tool. Landmarks (from distal to proximal) were the center of the talus, the center of the epiphyseal scar, and the lateral axis of the epiphyseal scar. The orange lines show the “auxiliary lines,” and the white lines show the angle tools. The blue arrows indicate the center of the epiphyseal scar. GPTP, growth plate–tibial plateau; mTBVA, modified tibial bone varus angle; TBVA, tibial bone varus angle.
Image Analysis Using Tyche
The online tool Tyche® (Philipp Schippers) was utilized to facilitate a multicenter study and obtain objective measurement results.14,24,25,28 Fully anonymized images were temporarily uploaded to Tyche, where only dedicated observers had temporary access with encrypted connections. Observers analyzed the images blinded and in random order inside a web browser; they were provided with standard imaging tools (eg, pan, zoom, and contrast) and angle tools (eg, standard angle, Cobb angle). In addition, they could store results on the same window inside tailored input forms. After the observers had finished the analysis, the results from Tyche were exported into a spreadsheet.
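The standard angle and Cobb angle tools described above reduce to plane geometry on landmark coordinates. The following is a minimal Python sketch of that geometry, not the Tyche implementation: the function names and the pixel coordinates are hypothetical, and the sketch assumes that the middle landmark of a 3-point angle is its vertex.

```python
import math


def midpoint(p, q):
    """Center of the segment p-q, e.g. of a line drawn along the epiphyseal scar."""
    return ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)


def three_point_angle(a, vertex, b):
    """Angle in degrees at `vertex` between the rays vertex->a and vertex->b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    return math.degrees(math.atan2(abs(cross), dot))


def cobb_angle(line1, line2):
    """Acute angle in degrees between two undirected lines, each given as (point, point)."""
    (p1, p2), (q1, q2) = line1, line2
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (q2[0] - q1[0], q2[1] - q1[1])
    cross = d1[0] * d2[1] - d1[1] * d2[0]
    dot = d1[0] * d2[0] + d1[1] * d2[1]
    return math.degrees(math.atan2(abs(cross), abs(dot)))


# Hypothetical landmark coordinates in pixels (x to the right, y downward).
scar_medial, scar_lateral = (100.0, 210.0), (200.0, 204.0)
plateau_medial, plateau_lateral = (95.0, 180.0), (205.0, 180.0)
talus_center = (155.0, 600.0)

scar_center = midpoint(scar_medial, scar_lateral)

# GPTP-like angle: scar line versus tibial plateau line (Cobb-type measurement).
gptp_like = cobb_angle((scar_medial, scar_lateral), (plateau_medial, plateau_lateral))

# mTBVA-like angle: talus center, scar center (assumed vertex), lateral scar end.
mtbva_like = three_point_angle(talus_center, scar_center, scar_lateral)

print(round(gptp_like, 2), round(mtbva_like, 2))
```

Computing the scar center with `midpoint` avoids the manual workaround of drawing a second line of half the length, which matters for the TBVA and mTBVA because both use the center of the scar as a landmark.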
Statistical Analysis
Prism 9.4 (GraphPad Software) and XLSTAT 2019 (Lumivero) were used for statistical analysis. For all angles, mean values and standard deviations were calculated. According to Popović and Thomas, 22 measurement accuracy was calculated as follows: for every image and every angle, the standard deviation of the 3 observers’ measurements was calculated. The mean of these standard deviations was calculated and termed the “mean of individual standard deviations,” with a low value indicating that the results between the observers varied very little. The Kruskal-Wallis test was used to compare the means of individual standard deviations between the angles.
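The “mean of individual standard deviations” can be illustrated with a short Python sketch. The function name and the example values are hypothetical, and the sketch assumes the sample standard deviation (n − 1 denominator), which the text does not specify.

```python
from statistics import mean, stdev


def mean_of_individual_sds(measurements):
    """Mean of the per-image standard deviations across observers.

    `measurements` holds one list per image, each containing one value per
    observer for a single angle. The sample SD is an assumption here.
    """
    per_image_sd = [stdev(values) for values in measurements]
    return mean(per_image_sd)


# Hypothetical HKA-like readings (degrees): 3 images, 3 observers each.
example = [
    [176.0, 177.0, 176.5],
    [180.0, 180.0, 180.0],
    [170.0, 172.0, 171.0],
]

result = mean_of_individual_sds(example)
print(result)
```

A value near 0 means that the observers' readings on the same image almost coincide; larger values indicate more spread between observers.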
In addition, ICCs for inter- and intraobserver reliability were calculated. In case the epiphyseal scar was not visible (see Figure 2), observers were able to state that measurements for angles relying on the epiphyseal scar were not possible. For the nonmetric parameters (Ahlbäck score and epiphyseal scar visibility), the Fleiss kappa correlation coefficient (κ) was determined. To assess whether any angles were associated with one another, we used the Spearman correlation coefficient (rS). The ICC, Spearman correlation, and Fleiss κ values were interpreted as listed in Table 1.
Table 1. Interpretations for ICC, Spearman Correlation, and Fleiss Kappa Coefficient Values
ICC, intraclass correlation coefficient.
According to Koo and Li. 11
According to Schober et al. 26
According to Landis and Koch. 12
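To make the ICC calculation concrete, the following Python sketch computes a single-rater, two-way random-effects, absolute-agreement ICC, often written ICC(2,1), from an images × observers matrix and labels it with the cutoffs of Koo and Li. The ICC form is an assumption, since the text does not state which model was used; the function names, the handling of values that fall exactly on a cutoff, and the example ratings are also illustrative only.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings[i][j]` is the measurement of subject (image) i by rater j.
    """
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                 # between-subject mean square
    ms_cols = ss_cols / (k - 1)                 # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))   # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Koo and Li cutoffs; the treatment of exact boundary values is assumed."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical angle readings (degrees): 5 images, 3 observers each.
readings = [
    [5.0, 5.5, 4.5],
    [2.0, 2.5, 3.0],
    [8.0, 7.0, 8.5],
    [3.5, 4.0, 3.0],
    [6.0, 6.5, 7.0],
]
value = icc_2_1(readings)
print(round(value, 2), interpret_icc(value))
```

With these cutoffs, the ICCs reported in this study map as stated in the text: 0.72 is moderate, 0.76 and 0.88 are good, and values above 0.9 are excellent.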
Results
Angle Measurements
The mean values for the measured angles were as follows: HKA angle, 176.9° ± 5.8°; mLDFA, 88° ± 1.8°; MPTA, 87.3° ± 2.3°; GPTP angle, 3° ± 1.9°; TBVA, 5.2° ± 3.1°; and mTBVA, 88.3° ± 2.2° (Figure 5 and Table 2). The HKA angle had the lowest mean of individual standard deviations (0.41), which was significantly lower than those of the mLDFA (0.76) and the MPTA (0.96). The means of individual standard deviations were not statistically different between the mLDFA and the MPTA. The GPTP angle, TBVA, and mTBVA had higher means of individual standard deviations (GPTP angle, 1.2; TBVA, 1.72; and mTBVA, 2.1) but without significant differences (Figure 6 and Table 2).

Figure 5. Violin plots showing the mean (dashed line), standard deviations (dotted lines), and distribution of all angle measurements by the 3 observers. GPTP, growth plate–tibial plateau angle; HKA, hip-knee-ankle angle; LDFA, lateral distal femoral angle; MPTA, medial proximal tibial angle; mTBVA, modified tibial bone varus angle; TBVA, tibial bone varus angle.
Table 2. Measurement Results From All Observers for All Images
Data are reported as mean ± SD (rounded range) unless otherwise indicated. GPTP, growth plate–tibial plateau angle; HKA, hip-knee-ankle angle; mLDFA, mechanical lateral distal femoral angle; MPTA, medial proximal tibial angle; mTBVA, modified tibial bone varus angle; TBVA, tibial bone varus angle.
The mean of individual standard deviations is calculated as the mean from standard deviations between the observers on every image. It can be used to estimate accuracy.
The HKA angle had a significantly lower mean of individual standard deviations when compared with the mLDFA or MPTA (P < .0001).

Figure 6. Mean values of individual standard deviations between the 3 observers for all images. ns, nonsignificant. GPTP, growth plate–tibial plateau angle; HKA, hip-knee-ankle angle; LDFA, lateral distal femoral angle; MPTA, medial proximal tibial angle; mTBVA, modified tibial bone varus angle; TBVA, tibial bone varus angle.
Reliability of Scoring
Table 3 shows the distribution of Ahlbäck scores and epiphyseal scar visibility grades. According to the Ahlbäck score, 65% of patients had grade 1 and 24% had grade 2 osteoarthritis. The epiphyseal scar was considered visible in 48.4% of patients, well visible in 37.6%, and not visible in 14% (Table 3). The interobserver reliability (Fleiss κ) was 0.5 (95% CI, 0.42-0.58) for the Ahlbäck score, which was considered moderate, and the intraobserver reliability was 0.66 (95% CI, 0.54-0.76), which was considered strong (Table 4). For epiphyseal scar visibility, the interobserver reliability was only 0.29 (95% CI, 0.20-0.37), while the intraobserver reliability was 0.62 (95% CI, 0.50-0.74). When the score was simplified and the grades “visible” and “well visible” were combined, the κ coefficients increased to 0.35 (95% CI, 0.24-0.47) for interobserver and 0.69 (95% CI, 0.53-0.85) for intraobserver reliability.
Table 3. Frequency of Grades From the Ahlbäck Score for Osteoarthritis and the Visibility of the Epiphyseal Scar
OA, osteoarthritis.
Table 4. Inter- and Intraobserver Reliability for Ahlbäck Score and Epiphyseal Scar Visibility
Data are reported as Fleiss κ (95% CI). Statistical significance was found for all values (P ≤ .001).
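For readers unfamiliar with the Fleiss κ, the following Python sketch shows how it can be computed from a table of category counts and how merging 2 grades into 1 changes the input. The function names and the example counts are hypothetical and do not reproduce the study data.

```python
def fleiss_kappa(counts):
    """Fleiss kappa for a subjects x categories table of rating counts.

    `counts[i][j]` is the number of raters who assigned subject i to
    category j; every row must sum to the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Overall proportion of assignments per category.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per subject.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_subj) / n_subjects
    p_expected = sum(p * p for p in p_cat)
    return (p_observed - p_expected) / (1 - p_expected)


def merge_categories(counts, a, b):
    """Combine categories a and b (e.g. 'well visible' and 'visible') into one."""
    merged = []
    for row in counts:
        rest = [c for j, c in enumerate(row) if j not in (a, b)]
        merged.append([row[a] + row[b]] + rest)
    return merged


# Hypothetical counts: 4 images, 3 observers, columns are
# [well visible, visible, not visible].
visibility = [
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [0, 1, 2],
]
kappa_3_grade = fleiss_kappa(visibility)
kappa_2_grade = fleiss_kappa(merge_categories(visibility, 0, 1))
print(round(kappa_3_grade, 3), round(kappa_2_grade, 3))
```

Merging grades changes both the observed and the chance-expected agreement, so κ does not necessarily rise after simplification; in this toy table it falls, whereas in the study data it rose slightly.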
Reliability of the Angle Measurements
Apart from the interobserver agreement for mLDFA, which showed good reliability (ICC, 0.88), the HKA, mLDFA, and MPTA showed excellent reliability (ICC, >0.9). All angles that relied on the epiphyseal scar (GPTP, TBVA, and mTBVA) showed good to excellent reliability (ICC, 0.75-0.90). The only exception was intraobserver reliability for the TBVA, which was considered moderate (ICC, 0.72) (Table 5).
Table 5. Inter- and Intraobserver Reliability of the Angle Measurements
Data are reported as ICC (95% CI). Statistical significance was found for all values (P ≤ .0001). GPTP, growth plate–tibial plateau angle; HKA, hip-knee-ankle angle; ICC, intraclass correlation coefficient; mLDFA, mechanical lateral distal femoral angle; MPTA, medial proximal tibial angle; mTBVA, modified tibial bone varus angle; TBVA, tibial bone varus angle.
The correlation between the angles is shown in Table 6. There was a strong negative correlation between HKA and MPTA (rS = −0.73; P ≤ .001). A moderate negative correlation (rS = −0.42; P ≤ .001) was also noted between the GPTP angle and mTBVA. No correlation was found between the TBVA and mTBVA (rS = 0.12; P = .177) or the GPTP angle and TBVA (rS = 0.02; P ≤ .001).
Table 6. Correlation Between Angles (Spearman r)
GPTP, growth plate–tibial plateau angle; HKA, hip-knee-ankle angle; mLDFA, mechanical lateral distal femoral angle; MPTA, medial proximal tibial angle; mTBVA, modified tibial bone varus angle; TBVA, tibial bone varus angle.
Statistical significance (P ≤ .001).
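The Spearman coefficient used for these comparisons is the Pearson correlation of the ranked values. The following Python sketch illustrates this with average ranks for ties; the function names and the example angle pairs are hypothetical and are not taken from the study data.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman_r(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical paired angle measurements (degrees) on 5 images.
angle_a = [176.0, 178.5, 172.0, 181.0, 174.5]
angle_b = [88.0, 87.0, 90.5, 85.5, 89.0]
print(round(spearman_r(angle_a, angle_b), 2))
```

Because only ranks enter the calculation, the coefficient captures monotonic association and is not affected by the different scales of the angles being compared.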
Discussion
The most important finding of the present study was acceptable measurement reliability for the TBVA that can be considered moderate (ICC, 0.72) for intraobserver reliability and good (ICC, 0.76) for interobserver reliability. Our results, obtained from digital image display, are better than those reported in previous reliability studies9,23 (ICC, 0.41-0.62) that used hard-copy radiographs. However, there was only fair agreement between the observers for the visibility of the epiphyseal scar, which is fundamental for measuring the TBVA and other epiphyseal scar-based angles. Thus, our initial hypothesis—that better visibility of the epiphyseal scar and, thus, higher measurement reliability could be achieved using image display techniques—can only be partially confirmed.
The HKA, mLDFA, and MPTA showed excellent intraobserver reliability (ICC, 0.96-0.99). Apart from interobserver reliability for the mLDFA (good reliability; ICC, 0.88), MPTA and HKA angles showed excellent interobserver reliability as well (ICC, 0.92-0.99). These results are in line with findings in the literature, where ICCs >0.9 are usually reported.10,31 This supports the validity of the study and allows for interpretations of the findings from the epiphyseal scar-based angles. The TBVA has been well studied in the literature; however, reported ICCs vary widely, ranging from 0.41 to 0.99.5,9,23,27 Our findings show ICCs ranging from 0.72 to 0.76 and are thus in the middle of those reported in the literature. For the TBVA, studies have postulated a better outcome when the value is >5° according to Bonnin and Chambat, 5 >3° to 6° according to Niemeyer et al, 18 and >6° to 9° according to Schuster et al. 27 However, the mean of individual standard deviations for the TBVA from our study (1.72°) should be kept in mind. These findings indicate that individual measurements that are close to the above-mentioned thresholds need to be cautiously interpreted. Furthermore, including multiple observers should be considered before basing treatment decisions on a single measurement.
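One hypothetical way to make “close to a threshold” concrete is to flag any TBVA reading that lies within a chosen multiple of the mean of individual standard deviations (1.72°) of a cutoff. The following Python sketch is illustrative only: the function name and the multiplier `k` are assumptions, not a rule proposed or validated in this study.

```python
def is_borderline(measured_deg, threshold_deg, observer_sd_deg=1.72, k=1.0):
    """True if the reading lies within k * observer_sd_deg of the threshold.

    observer_sd_deg defaults to the mean of individual standard deviations
    reported for the TBVA; k is a hypothetical tolerance multiplier.
    """
    return abs(measured_deg - threshold_deg) <= k * observer_sd_deg


# Hypothetical readings compared with a 5-degree cutoff.
for reading in (5.8, 9.0):
    print(reading, is_borderline(reading, 5.0))
```

A reading of 5.8° differs from a 5° cutoff by less than the typical spread between observers, so a second observer could plausibly place it on the other side of the cutoff, whereas a reading of 9.0° is well clear of it.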
As the measurements rely heavily on the visibility of the epiphyseal scar, we investigated its visibility on standard WLRs. Although agreement on the Ahlbäck score was only moderate (Table 4), most of the images analyzed had a low grade of osteoarthritis. Hence, they were good potential candidates for knee osteotomy and thus should have had good epiphyseal scar visibility. The scar was considered visible or even well visible in 86% of the images (Table 3). However, interobserver agreement was as low as 0.29 when applying a 3-grade score and only increased to 0.35 when using a 2-grade score. This partly explains the difficulties in producing reliable measurements for the TBVA and the diverging ICCs reported in the literature. The intraobserver agreement was higher (κ = 0.62), which showed that individual observers were more consistent with their own assessments. Standardizing the settings for brightness, contrast, and zoom could increase observer agreement but might make the scars less visible in general.
Apart from the visibility of the epiphyseal scar, there are several challenges when measuring the TBVA and mTBVA. Sophisticated measuring tools that allow one to precisely determine the center of a straight line are needed. If they are unavailable, observers or clinicians need to draw a first line along the epiphyseal scar, determine its length, and then draw a second parallel line with half the length above it. In addition, the measurements for the TBVA ranged between 0° and 16° (Table 2), indicating that the center of the epiphyseal scar often lies next to the mechanical axis. Thus, exact angle tools need to be employed, and being able to zoom in is crucial. To address potential difficulties arising from the proximity of the mechanical axis and the center of the epiphyseal scar, we introduced a modified version of the TBVA and named it the mTBVA (Figure 4). Interestingly, the mTBVA had slightly higher ICCs (Table 5). However, the mean of individual standard deviations was not lower when compared with the TBVA (2.10 vs 1.72, respectively). Since the mTBVA was found not to correlate with the TBVA (rS = 0.12), it might capture different geometric patterns and should therefore be studied alongside the TBVA in future osteotomy studies to determine its clinical significance. In summary, there are many pitfalls when measuring the TBVA and mTBVA. In contrast, measuring the GPTP angle only requires a Cobb angle tool and good visibility of the epiphyseal scar. Consequently, the GPTP angle had slightly higher ICCs (Table 5). Thus, it might be worthwhile to determine the clinical significance of the GPTP angle in the setting of knee osteotomy.
Limitations
This study has several limitations. First, measurements were not correlated with patient-reported outcomes or survival rates. The clinical significance, especially of the mTBVA and the GPTP angle, is still unknown. However, this is beyond the scope of this study, as the goal was to evaluate their radiographic measurement reliability. A higher number of observers could have further increased the validity of the results. Another weakness is that images were acquired at only 1 institution, which might have its own way of capturing WLRs. Furthermore, the study was conducted on a population of White patients, which might limit the transferability of the results to other races, as differences in normal alignment and angles are a known phenomenon. 15 Last, this study was performed on WLRs obtained for all indications and not specifically for knee osteotomy. However, this can also be considered a strength.
Conclusion
Using digital image display, angles that depend on the epiphyseal scar, such as the TBVA, GPTP angle, and mTBVA, can achieve acceptable measurement reliability despite low agreement on the visibility of the epiphyseal scar.
Footnotes
Final revision submitted September 25, 2023; accepted November 15, 2023.
One or more of the authors has declared the following potential conflict of interest or source of funding: P.S. created and maintains Tyche software. J.-F.G. has received consulting fees from Amplitude. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Ethical approval was not sought for the present study.
