Abstract
Background:
Patellofemoral instability (PFI) is common in adolescent patients and can lead to a reduction in quality of life and function, as well as long-term arthritis. Treatment of PFI involves assessing and, at times, surgically correcting underlying anatomic abnormalities. Trochlear dysplasia is the most common anatomic risk factor present in PFI.
Purpose:
To determine the interrater reliability of existing measures and classification systems for the assessment of trochlear dysplasia.
Study Design:
Cohort study (diagnosis); Level of evidence, 3.
Methods:
As part of the PRISM (Pediatric Research in Sports Medicine) patellar instability Research Interest Group (RIG) project, a database was created to include 60 knees (40 with documented patellar instability and 20 control knees) with perfect lateral radiographs and magnetic resonance imaging (MRI). Five pediatric sports medicine orthopaedic surgeons, who were blinded to the diagnosis, assessed trochlear dysplasia in all knees. The axial MRI slice number selected by each reviewer for measurement was noted for each knee. The measures evaluated included crossing sign, double contour sign, presence of trochlear bump, sulcus angle, trochlear depth, lateral inclination angle, trochlear bump height, Oswestry-Bristol (OB) classification, and Dejour classification (2-grade and 4-type classifications). Continuous variables were assessed using the intraclass correlation coefficient, and categorical data were evaluated using the Fleiss kappa and percent agreement.
Results:
Of the 60 knees included in this study, 63% belonged to women, with a mean age of 14.2 ± 3 years. The mean age at MRI was 14.6 ± 3.3 years for the control cohort and 14 ± 2.3 years for the trochlear dysplasia cohort. Raters agreed on the MRI axial slice for trochlear evaluations 69% of the time. The crossing sign, double contour sign, and presence of trochlear bump on lateral radiographs demonstrated poor reliability. Continuous variables showed poor reliability, except for lateral inclination angle and trochlear bump height, which showed moderate reliability. The reliability of the OB classification and the Dejour classification (2-grade and 4-type) was fair.
Conclusion:
Overall, the radiographic criteria for assessing trochlear dysplasia demonstrate only poor to moderate reliability when assessed by 5 pediatric sports medicine orthopaedic surgeons. A more reliable system for evaluating and classifying trochlear dysplasia would be beneficial in the future.
Patellofemoral instability (PFI) is a common condition in adolescents. The annual incidence in this population has been estimated at 29 to 42 cases per 100,000, 4 but recent evidence suggests this may be increasing. 11 The rate of recurrent dislocation after an initial event ranges from 15% to 80%. 18 The natural history of recurrent patellar dislocations includes pain, recurrent instability, decreased physical function, and patellofemoral arthritis. 6 The cause of PFI is multifactorial, with underlying causes including anatomic abnormalities of the patellofemoral joint, ligamentous laxity, and previous trauma. Anatomic factors that increase the risk for PFI include trochlear dysplasia, genu valgum, patella alta, patellar tilt, rotational abnormalities, and an increased tibial tubercle-trochlear groove distance. Of these, trochlear dysplasia is the most common anatomic risk factor, present in 85% of patients. 5,8 A normal trochlear groove is concave in the axial plane, providing bony constraint to the patella with knee flexion. In trochlear dysplasia, the groove may be shallow, flat, or convex. Recognition of the severity of trochlear dysplasia is important for risk assessment, counseling, selecting the appropriate treatment for patients with PFI requiring surgical intervention, and understanding prognosis.
For decades, the Dejour classification has been the most commonly used system for identifying and classifying trochlear dysplasia. The original 3-part classification system, based on a lateral knee radiograph, was modified to the current 4-part classification system, based on a lateral knee radiograph and axial magnetic resonance imaging/computed tomography (MRI/CT) images of the knee. 5 The radiographic criteria obtained from a true lateral radiograph include the crossing sign, supratrochlear bump, and double contour sign. 5 The description of trochlear dysplasia on axial MRI/CT images includes shallow, flat, convex, and cliff patterns. Using a combination of these data, the Dejour classification categorizes trochlear dysplasia into types A to D. 4 Higher grades of trochlear dysplasia have been associated with higher rates of recurrence and suboptimal results after patellar stabilization surgery. 7 The Dejour classification has been further simplified by Nelitz et al 12 into a 2-grade analysis: low-grade dysplasia (Dejour type A) and high-grade dysplasia (Dejour types B, C, and D). 12 However, the reliability of this classification system can be affected by the quality of the lateral knee radiograph and by the selection of the axial MRI/CT image for analysis. 17
Because of the subjective nature (qualitative analysis) and limited reliability of the Dejour classification, other alternatives to evaluate trochlear dysplasia have been attempted. The Oswestry-Bristol (OB) classification system classifies trochlear dysplasia into 4 types based on axial MRI: normal, shallow, flat, and convex. 15 Pfirrmann et al 13 utilized objective measurements for trochlear dysplasia on MRI and noted that trochlear depth on axial MRI was specific and sensitive for the diagnosis of trochlear dysplasia. The reliability of these newer classification systems and measurements is limited in the existing literature, particularly within the adolescent population.
Commonly used qualitative measurements of trochlear dysplasia include the crossing sign, double contour sign, presence of trochlear bump (Dejour classification), and OB classification. Common quantitative measurements of trochlear dysplasia include the sulcus angle, trochlear depth, lateral inclination angle, and trochlear bump height. The purpose of this study was to evaluate the interrater reliability of existing trochlear dysplasia assessment tools among patients with and without patellar instability. It was hypothesized that existing trochlear dysplasia metrics would demonstrate poor interrater reliability.
Methods
Following institutional review board approval, a de-identified imaging database was created at a large urban pediatric hospital. The database comprised 40 consecutive patients who had undergone patellar stabilization and had preoperative knee radiographs with a corresponding MRI within 6 months of the radiographic study between 2008 and 2018. The criteria for selection were the following: a perfect lateral knee radiograph with superimposition of femoral condyles, no previous surgery, and an MRI within 6 months of radiographs. These 40 cases represented a mix of low- and high-grade trochlear dysplasia, though fewer patients had high-grade trochlear dysplasia than low-grade. Another 20 patients (20 control knees) without a history of patellar instability were age- and sex-matched to the previous group and included in the database. Patients were excluded if a fracture was present on imaging, if they had previous knee surgery, if the lateral radiograph was inadequate (greater than 3 mm overlap of posterior femoral condyles), or if the MRI was not within 6 months of preoperative radiographs. 17 The senior author reviewed all radiographs and MRI scans and confirmed the presence of trochlear dysplasia in patients with patellar instability and the absence of trochlear dysplasia in the control population, using the crossing sign for trochlear dysplasia. All radiographs and corresponding MRI scans were uploaded to the medical image management system AMBRA (AMBRA Health, Raleigh, NC) in DICOM format, and access was provided to all participants. This image-sharing platform allowed the operator to measure and manipulate images, including image enlargement and adjustments to contrast and brightness. All patient-identifying information was removed to enable blinded review by study raters. The images were randomly mixed so that the control knees were scattered among the cases.
Study data were collected and managed using REDCap electronic data capture tools hosted at Cincinnati Children's Hospital Medical Center. All raters were fellowship-trained pediatric sports medicine orthopaedic surgeons who were members of the Pediatric Research in Sports Medicine (PRISM) Patellofemoral Instability Research Interest Group (RIG). Each surgeon assessed all 60 knees for the crossing sign, double contour sign, presence of trochlear bump, sulcus angle, trochlear depth (Figure 1), lateral trochlear inclination angle (LTI) (Figure 2), trochlear bump height (Figure 3), OB classification, and Dejour classification (4-type and 2-grade classifications). The original references for the measurements and classification systems were provided to each reviewer. Reviewers were asked to record the axial MRI slice number selected for each knee. They were also asked to select the criteria for choosing the appropriate axial slice: the most proximal extent of the trochlear cartilage on a single image, 2 a level 3 cm above the tibiofemoral joint line, 13 the level of the Roman arch of the posterior femoral condyles, 17 mid-patellar height, 17 and/or the location of the largest epicondylar width. 3 All of these options were provided to the reviewers in this study, along with the original references and explanations from the literature. Table 1 presents the variables and assessment methods.

Figure 1. Trochlear depth: The anteroposterior distances of the medial femoral condyle (a) and the lateral femoral condyle (b) are measured. The distance between the deepest point of the trochlear groove and the posterior condyles (c) is also measured. Trochlear depth is then calculated using the formula ([a + b]/2) − c.

Figure 2. Lateral trochlear inclination: An angle is measured between the lateral trochlear cartilaginous surface and a horizontal reference line along the posterior femoral condyles. If the apex is medial, it is assigned a positive value. If the apex is lateral, it is assigned a negative value.

Figure 3. Trochlear bump height: A line is drawn along the anterior femoral cortex. The most anterior point of the trochlear line is identified. The distance between this point and the anterior cortical line represents the trochlear bump.
Table 1. Description of Imaging Assessments a
a MRI, magnetic resonance imaging.
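The trochlear depth formula described above, ([a + b]/2) − c, can be illustrated with a short Python sketch. The function name and the example distances are hypothetical and are not taken from the study data; the sketch only assumes that a, b, and c are measured in millimeters on the same axial slice.

```python
def trochlear_depth(a_mm: float, b_mm: float, c_mm: float) -> float:
    """Trochlear depth = ([a + b] / 2) - c.

    a_mm: anteroposterior distance of the medial femoral condyle
    b_mm: anteroposterior distance of the lateral femoral condyle
    c_mm: distance from the deepest point of the trochlear groove
          to the posterior condyles
    """
    return (a_mm + b_mm) / 2.0 - c_mm


# Hypothetical measurements, for illustration only.
depth = trochlear_depth(a_mm=60.0, b_mm=64.0, c_mm=58.0)
print(f"Trochlear depth: {depth:.1f} mm")  # (60 + 64) / 2 - 58 = 4.0 mm
```

Because the result is a difference of three manually placed distances, a small disagreement between raters on any one landmark carries directly into the depth value.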
Statistical Analysis
Demographic variables were compared using t tests (age) and the Fisher exact test (sex). Continuous variables were analyzed using the intraclass correlation coefficient (ICC) in a 2-way mixed-effects model. Categorical data were evaluated using the Fleiss kappa. An ICC value <0.5 indicated poor reliability, between 0.5 and 0.75 indicated moderate reliability, between 0.75 and 0.9 indicated good reliability, and any value >0.9 indicated excellent reliability. 10 Good or excellent reliability was considered satisfactory reliability for the present study. Kappa results were interpreted as follows: values ≤0 as indicating no agreement, 0.01 to 0.20 as none to slight, 0.21 to 0.40 as fair, 0.41 to 0.60 as moderate, 0.61 to 0.80 as substantial, and 0.81 to 1 as almost perfect agreement. 10 Substantial or almost perfect agreement was considered satisfactory reliability for the present study. The sample size of 40 participants was estimated based on 5 raters (M.E., J.S., S.R.W., B.A.W., L.H.R.), an expected ICC of 0.5, a precision of 0.15, and a 95% CI. 1 Statistical significance was set at P < .05. Statistical analyses were performed using R (R Foundation for Statistical Computing).
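The interpretation bands and the sample size estimate described above can be sketched in Python. Two assumptions are made here. First, values that fall exactly on a shared boundary (for example, an ICC of 0.75) are assigned to the lower band, because the text does not say which band they belong to. Second, the text does not state which sample size formula was used; the sketch uses a commonly cited approximation for the confidence interval width of an ICC (often attributed to Bonett) and reads "a precision of 0.15" as the half-width of the 95% CI. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def interpret_icc(icc: float) -> str:
    """ICC bands as stated in the text; ties go to the lower band (assumption)."""
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


def interpret_kappa(kappa: float) -> str:
    """Kappa bands as stated in the text."""
    if kappa <= 0:
        return "no agreement"
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


def approx_sample_size(icc: float, raters: int, half_width: float,
                       confidence: float = 0.95) -> int:
    """Approximate number of subjects for a target ICC confidence interval.

    Assumed formula (not confirmed by the source):
    n = 8 z^2 (1 - icc)^2 (1 + (k - 1) icc)^2 / (k (k - 1) w^2) + 1,
    where k is the number of raters and w is the full CI width.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    width = 2 * half_width
    n = (8 * z ** 2 * (1 - icc) ** 2 * (1 + (raters - 1) * icc) ** 2
         / (raters * (raters - 1) * width ** 2)) + 1
    return math.ceil(n)


print(interpret_icc(0.62))       # moderate
print(interpret_kappa(0.35))     # fair
print(approx_sample_size(icc=0.5, raters=5, half_width=0.15))  # 40
```

Under these assumptions the approximation returns 40 subjects for 5 raters and an expected ICC of 0.5, which is consistent with the sample size reported in the text, although that agreement does not confirm the method actually used.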
Results
Of the 60 knees included in this study, 63% belonged to women, with a mean age of 14.2 ± 3 years. The mean age at MRI was 14.6 ± 3.3 years for the control cohort and 14 ± 2.3 years for the trochlear dysplasia cohort. The percentage of women was 60% in the control cohort and 65% in the trochlear dysplasia cohort. No differences were observed in age or sex distribution between the control and trochlear dysplasia cohorts.
Across the entire cohort, the crossing sign, double contour sign, and the presence of a trochlear bump demonstrated only slight agreement. These 3 measurements had percent agreements of 11.7%, 35%, and 8.3%, respectively. The reliability of the OB classification and the Dejour classification (4-type and 2-grade) was fair. The percent agreement was 21.7% for the OB classification, 11.7% for the 4-type Dejour classification, and 25% for the 2-grade Dejour classification. When patients were separated into control and trochlear dysplasia cohorts, agreement remained slight for the crossing sign, double contour sign, and presence of a trochlear bump. The control patients demonstrated fair reliability for the OB classification and the Dejour classifications (4-type and 2-grade), whereas the OB classification and the 2-grade Dejour classification demonstrated moderate agreement for trochlear dysplasia patients (Table 2).
Table 2. Reliability Assessments of Each Trochlear Dysplasia Radiographic Assessment a
a ICC, intraclass correlation coefficient.
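The categorical agreement statistics reported above can be sketched in Python. The Fleiss kappa follows its standard definition. The percent agreement function assumes that "percent agreement" means the share of knees on which all raters chose the same category; the text does not define the term, although the reported values are consistent with whole counts out of 60 knees (for example, 7/60 = 11.7%). The ratings below are hypothetical and the function names are illustrative.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss kappa for a list of subjects, each a list of category labels
    (one label per rater; every subject has the same number of raters).
    Undefined (division by zero) if every rating is the same category."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    categories = {label for row in ratings for label in row}

    # Observed agreement: mean over subjects of the proportion of
    # agreeing rater pairs.
    p_i = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions.
    p_j = [sum(c[cat] for c in counts) / (n_subjects * n_raters)
           for cat in categories]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)


def complete_agreement(ratings):
    """Share of subjects on which every rater gave the same label (assumed
    definition of percent agreement)."""
    return sum(len(set(row)) == 1 for row in ratings) / len(ratings)


# Hypothetical crossing-sign ratings: 4 knees, 5 raters each.
example = [
    ["yes", "yes", "yes", "yes", "yes"],
    ["yes", "yes", "yes", "no", "no"],
    ["no", "no", "no", "no", "no"],
    ["yes", "no", "no", "no", "no"],
]
print(round(fleiss_kappa(example), 3))   # 0.495
print(complete_agreement(example))       # 0.5
```

With 5 raters, complete agreement is a strict criterion: a single dissenting rater removes a knee from the numerator, which helps explain how percent agreement can be low even when kappa falls in the fair range.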
When assessing continuous variables across the entire cohort, the sulcus angle and trochlear depth showed poor interrater reliability. The lateral inclination angle and trochlear bump height demonstrated moderate reliability. When the cohorts were separated into trochlear dysplasia and control groups, the reliability of the lateral inclination angle and trochlear bump height was once again moderate. The sulcus angle showed poor reliability in both trochlear dysplasia and control patients. For trochlear depth, measurements in control patients showed moderate reliability, whereas those in patients with trochlear dysplasia showed poor reliability (Table 2).
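For readers unfamiliar with how an ICC is obtained for continuous measures such as those above, the following Python sketch computes one from a knees-by-raters table. The text specifies a 2-way mixed-effects model but not whether the single-rater or average-rater form was used, or whether consistency or absolute agreement was assessed; the sketch assumes the single-rater consistency form, often written ICC(3,1). The function name and the sulcus angle values are hypothetical.

```python
def icc_3_1(scores):
    """Single-rater consistency ICC from a two-way ANOVA decomposition.

    scores: list of rows, one per knee; each row holds one measurement
    per rater, in the same rater order.
    """
    n = len(scores)       # number of knees
    k = len(scores[0])    # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between knees
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Hypothetical sulcus angles (degrees): 4 knees measured by 3 raters.
sulcus = [
    [140.0, 152.0, 145.0],
    [155.0, 150.0, 162.0],
    [148.0, 158.0, 143.0],
    [160.0, 149.0, 157.0],
]
print(f"ICC(3,1) = {icc_3_1(sulcus):.2f}")
```

In the consistency form, a rater who measures every knee a fixed amount higher than the others does not lower the ICC; only disagreement in how knees are ranked and spaced does. An absolute-agreement form would penalize such systematic offsets as well.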
Table 3 shows reliability measurements stratified by whether raters agreed on the optimal slice for assessment. Raters agreed on the MRI axial slice for trochlear evaluations 69% of the time. The percent agreement was 68% for patients with dysplasia and 75% for the control group. All raters chose the MRI slice based on the most proximal slice with cartilage coverage, although some raters used additional parameters as well. Except for trochlear bump height, the reliability of all measurements was similar, regardless of whether the raters agreed on the MRI slice. The trochlear bump height demonstrated moderate reliability when raters agreed on the slice and poor reliability when they did not.
Table 3. Reliability Assessments Stratified Based on Agreement on Slice for Assessment a
a ICC, intraclass correlation coefficient.
Discussion
Several commonly used quantitative and qualitative assessments of trochlear dysplasia exist; however, the literature supporting their reliability is sparse. This study aimed to assess the reliability of these measurements among 5 pediatric sports medicine orthopaedic surgeons in a blinded imaging study of patients with and without trochlear dysplasia. Our findings indicate that none of the existing measurements demonstrated satisfactory interrater reliability.
A key finding in the present study involves the selection of the appropriate MRI slice for assessing trochlear dysplasia. Previous studies have shown that MRI axial slices at different levels can alter the shape and, hence, the measurements in trochlear dysplasia. 17 The following options exist in the literature 2 for choosing the appropriate axial slice: most proximal extent of the trochlear cartilage on a single image, 3 cm above the tibiofemoral joint line, 13 level of Roman arch of the posterior femoral condyles, 17 mid-patellar height, 17 and location of the largest epicondylar width. 3 All of these options were provided to the reviewers in this study, along with original references and explanations from the literature. Reviewers were asked to indicate which definition they used for each patient. All reviewers chose the most proximal extent of the trochlear cartilage on a single image, which may reflect a common thought that evaluation of the proximal-most part of the trochlea may represent trochlear dysplasia more accurately than other parameters. Theoretically, selecting the same axial slice should increase measurement reliability. Despite this finding, the present study showed poor reliability among the different measurements. Thus, the existing measurements are unreliable, even when made on the same axial slice.
Several studies corroborate the poor interrater agreement of the Dejour classification. Remy et al 14 studied 68 lateral radiographs using 7 observers and classified trochlear dysplasia utilizing the Dejour classification. None of the 68 were recognized as having the same shape by the 7 observers. The authors concluded that the classification has low interobserver agreement. Lippacher et al 9 assessed intra- and interobserver agreement of the Dejour classification for both radiographs and MRIs. They found that the 4-grade analysis showed fair intra- and interobserver agreements (24%-78%) and that the lateral radiograph tended to underestimate the severity of trochlear dysplasia compared with axial MRI. Nelitz et al 12 modified the Dejour classification to a 2-grade analysis and reported improved interobserver and intraobserver repeatability in high-grade dysplasia (Dejour type BCD) and almost perfect inter- and intraobserver repeatability in low-grade dysplasia (Dejour type A). They concluded that measurements could not be reliably performed in high-grade trochlear dysplasia, but that trochlear inclination, trochlear facet asymmetry, and trochlear groove depth may help distinguish between low- and high-grade dysplasia. Stepanovich et al 16 analyzed 36 lateral radiographs and axial MRI scans of skeletally immature patients with PFI using the Dejour classification, trochlear depth, lateral trochlear inclination (LTI), and medial condyle trochlear offset. They determined that the Dejour classification had the poorest intra- and interobserver reliability among the radiological parameters. Tscholl et al 17 showed that the Dejour classification has fair agreement between true lateral radiographs and axial MRI.
Other methods for measuring trochlear dysplasia have also been assessed. Carrillon et al 2 found that an LTI of <11° on MRI has a sensitivity, specificity, and accuracy of 93%, 87%, and 90%, respectively, in diagnosing PFI. Pfirrmann et al 13 assessed a number of quantitative criteria on axial MRI. The authors found that dysplasia can be reliably diagnosed using axial images 3 cm above the joint line. They reported that a trochlear depth of ≤3 mm had a sensitivity of 100% and a specificity of 96%. They also reported that a trochlear bump height of >8 mm had a sensitivity of 75% and a specificity of 83% for the diagnosis of trochlear dysplasia. The present study found that LTI and trochlear bump height demonstrated moderate reliability. Nelitz et al 12 studied the correlation between the Dejour classification and several other objective parameters and found that none of these parameters correlated with the Dejour classification. Sharma et al 15 assessed 32 CT and axial MRI scans and classified each according to the OB classification and the Dejour classification systems. In their study, the OB classification showed superior intra- and interobserver agreement to the Dejour classification. In the present study, the reliability of the Dejour and OB classifications was fair.
The present study is not without limitations. All measurements were performed by the raters one time, and intrarater reliability assessments were not performed for this study. Raters used a third-party imaging platform to enable blinded evaluation of the imaging studies, which may have differed in some measurement features (eg, imaging triangulation) from the software at each reviewer's own institution. Additionally, although surgeons agreed on an axial slice definition for trochlear assessments, variation in other aspects of measurement techniques or measurement errors may have affected the quantitative measures. While more extensive rater training may have improved consensus on technique and enhanced measurement reliability, the authors believed that the study design provided a more accurate, real-world assessment of these trochlear measures. Furthermore, the limited agreement identified in this work carries significant implications for the interpretation of the existing PFI literature and our ability to make appropriate comparisons across the literature base. In conclusion, these findings support the need for a more reliable tool for assessing trochlear morphology.
Footnotes
Final revision submitted April 22, 2025; accepted June 16, 2025.
S.R.W. has received educational support from Alon Medical Technology. B.A.W. has received educational support from Liberty Surgical Inc and Arthrex Inc; and travel and lodging support from Arthrex Inc. J.S. has received nonconsulting fees from Arthrex Inc; travel and lodging support from Arthrex Inc; educational support from Arthrex Inc and Micromed Inc; and food and beverage from Arthrex Inc. L.H.R. has received honoraria from AcelRx Pharmaceuticals; educational support from Gotham Surgical Solutions & Devices; and travel and lodging support from Arthrex Inc. M.E. has received educational support and travel and lodging from Medine of Texas and Arthrex Inc. S.N.P. has received consulting fees from Pfizer Inc; and travel and lodging from Linvatec Corporation. D.G. has received nonconsulting fees from Arthrex Inc; royalties or a license from OrthoPediatrics Canada LLC and Arthrex Inc; consulting fees from Arthrex Inc and OrthoPediatrics Corp; travel and lodging from OrthoPediatrics Corp, Arthrex Inc, OrthoPediatrics Canada LLC, and Synthes GmbH; and faculty or speaking fees from Synthes GmbH. J.M. has received educational support from CDC Medical LLC and Arthrex Inc. J.P. has received nonconsulting fees from Arthrex Inc; consulting fees from Joint Restoration Foundation Inc and Arthrex Inc; travel and lodging from Joint Restoration Foundation Inc and Arthrex Inc; educational support from Pylant Medical and Kairos Surgical Inc; and food and beverage from Arthrex Inc. J.R. has received consulting fees from OrthoPediatrics Corp. K.S. has received consulting fees from OrthoPediatrics Corp and educational support from Evolution Surgical Inc. C.V. has received travel and lodging support from Smith & Nephew Inc.
Ethical approval for this study was waived by Cincinnati Children's Hospital (IRB ID: 2021-0126).
