Abstract
Background:
Femoroacetabular impingement (FAI) syndrome is a cause of pain and reduced range of motion in the hip joint. Given the limited number of randomized controlled trials, prospective cohort studies constitute the dominant part of the available prospective evidence evaluating relevant clinical outcomes after arthroscopic hip surgery for FAI.
Purpose:
To assess the methodological quality of prospective cohort studies evaluating arthroscopic surgery for FAI and to determine whether there has been an improvement in methodological quality over time.
Study Design:
Systematic review; Level of evidence, 4.
Methods:
A systematic literature search was performed in PubMed, Embase (OvidSP), and the Cochrane Library. Included studies were clinical prospective cohort studies of primary arthroscopic surgery for cam and/or pincer morphology FAI. Methodological quality was assessed with the Methodological Index for Non-randomized Studies (MINORS). The mean MINORS score for studies published during the first 5 years of the period was compared with those published during the last 5 years to evaluate methodological improvement over time. The methodological quality of randomized controlled trials was also assessed with the Coleman Methodology Score.
Results:
The search yielded 53 studies. There were 34 noncomparative studies, 15 nonrandomized comparative studies, and 4 randomized controlled trials. The included studies were published between 2008 and 2017. The mean ± SD MINORS score for noncomparative and comparative studies was 10.4 ± 1.4 of 16 possible and 18.7 ± 2.0 of 24 possible, respectively. The mean Coleman Methodology Score for randomized controlled trials was 79.0 ± 7.0 of 100 possible.
Conclusion:
The methodological quality of prospective cohort studies evaluating arthroscopic surgery for FAI is moderate for comparative and noncomparative studies. Common areas for improvement include unbiased assessment of study endpoints and prospective sample-size calculations. Despite an increase in the number of published studies, an improvement in methodological quality over time was not observed.
Femoroacetabular impingement (FAI) syndrome is a cause of pain and reduced range of motion in the hip joint. The syndrome is caused by a prominent femoral head-neck junction (ie, cam morphology) or a prominent acetabular rim (ie, pincer morphology). Patients may also present with a combination of the 2, having both cam and pincer morphology. 25 During motion, these morphologies have the potential to cause abnormal mechanical stresses within the hip joint, which may cause subsequent soft tissue damage. 2 The surgical treatment of FAI aims to correct cam and/or pincer morphologies and repair damaged soft tissue. 62 Initially, this was performed with an open surgical approach, 24 but arthroscopic management has now emerged as the treatment of choice. 37
Ganz et al 25 formulated the modern concept of FAI in 2003, and hip arthroscopy is now one of the fastest-growing fields in orthopaedic surgery. The number of hip arthroscopy procedures increased 18-fold between 1999 and 2009. 15 There has also been a corresponding increase in the literature related to FAI. 32 In a systematic review exploring the trends in FAI-related publications between 2011 and 2015, Khan et al 38 identified 1066 published articles. As of 2018, some randomized controlled trials (RCTs) have been published; however, only 2 RCTs comparing arthroscopic treatment with physical therapy as the primary treatment have been published. 30,50
The Gothenburg Hip Arthroscopic Registry 62 and the Danish Hip Arthroscopic Registry 53 are both examples of initiatives to evaluate the arthroscopic treatment of FAI with a prospective approach, and several studies have evolved from these registries over the past few years. 45,59–61 Given the limited number of RCTs, prospective cohort studies constitute the dominant part of the available prospective evidence evaluating relevant clinical outcomes. As the majority of patients with FAI can be included in prospective cohort studies, the risk of selection bias is reduced, thereby increasing the general applicability of the results. Nonetheless, difficulty controlling for confounding factors is a major weakness of prospective cohort studies. The methodological quality of published prospective cohort studies evaluating arthroscopic surgery for FAI has not previously been evaluated with a systematic approach.
The aim of the present study was to evaluate the quality of available evidence for prospective cohort studies on arthroscopic treatment for FAI. In the present study, the hypothesis was that there would be wide variation in methodological quality, with an improvement in quality over time.
Methods
This systematic review was conducted in accordance with the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. 52
Eligibility Criteria
The study inclusion criteria were clinical prospective cohort studies of primary arthroscopic surgery for cam and/or pincer morphology FAI. Only studies with clinical outcomes, patient-reported outcomes, and/or complications were included. Studies with only radiographic outcomes were not included, nor were studies that comprised <8 patients. Further exclusion criteria were cohorts described as adolescent and/or with open physes, retrospective reviews of prospectively collected data, and studies relating to the validation of outcome scores. Comparative studies in which the main aim was to evaluate diagnoses other than FAI, including patients with FAI only as a control group, were also excluded.
Information Sources and Search
A systematic literature search was performed in PubMed, Embase (OvidSP), and the Cochrane Library in January 2018. Searches were conducted with controlled vocabulary and title/abstract words. Variations of the words hip impingement OR CAM impingement OR femoroacetabular impingement OR FAI were used with variations of the word arthroscopy. The searches were performed and validated by a librarian at the Sahlgrenska University Hospital Library, Gothenburg, Sweden. Detailed search strategies for all databases are available in Appendix Tables A1 to A3.
Data Collection and Analysis
Study Selection
All the studies yielded by the electronic search were screened by abstract by 2 reviewers (A.Ö., L.K.), each of whom screened the results from all 3 databases. Studies were analyzed in full text if the abstract did not provide enough data to make a decision in terms of fulfilment of inclusion criteria. Separate studies of the same cohort were included, as meta-analysis of data was not planned. The included studies were then categorized as a cohort study, a nonrandomized comparative study, or an RCT. The researchers were not blinded to the author, year, and journal of publication. Disagreement between the reviewers was resolved by consensus or by discussion with the senior author (M.S.) when consensus was not reached. Interobserver agreement for the reviewers’ assessment of study eligibility was calculated with the Cohen κ coefficient.
Data Collection Process
For studies that lacked any data, attempts to contact the corresponding authors were made via email.
Data Items
The data extracted from the included studies were as follows: year, country, study size, age, sex ratio, follow-up time, and outcome measurements. If reported, demographic data including patients lost to follow-up were always presented in preference to demographic data excluding patients lost to follow-up. For comparative studies, if both groups matched the inclusion criteria of the present study, the total of both groups became the presented study size; if only 1 group matched the inclusion criteria, only its size was included as the study size. For age and sex ratio, if both groups in a comparative study matched the inclusion criteria, the mean age and sex ratio for each group were included, if reported. Regarding the follow-up time, if both the mean and minimum follow-up times were reported, only the mean follow-up time was included; if the mean follow-up time was not reported, the minimum follow-up time was included. Only clinical outcome scores were included. For studies with several outcome scores, a maximum of 3 per study were included.
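The extraction rules for study size and follow-up time can be written as two small functions. The following is only an illustrative sketch in Python, not code used in the review; the function names, argument names, and example values are hypothetical.

```python
from typing import Optional, Sequence


def reported_follow_up(mean_follow_up: Optional[float],
                       min_follow_up: Optional[float]) -> Optional[float]:
    """Prefer the mean follow-up; fall back to the minimum if no mean is reported."""
    if mean_follow_up is not None:
        return mean_follow_up
    return min_follow_up


def reported_study_size(group_sizes: Sequence[int],
                        group_eligible: Sequence[bool]) -> int:
    """Sum the sizes of only those groups that match the inclusion criteria."""
    return sum(size for size, ok in zip(group_sizes, group_eligible) if ok)


# Hypothetical example values (not taken from any included study):
# reported_follow_up(24.0, 12.0)                  -> 24.0 (mean preferred)
# reported_follow_up(None, 12.0)                  -> 12.0 (minimum as fallback)
# reported_study_size([40, 35], [True, True])     -> 75   (both groups eligible)
# reported_study_size([40, 35], [True, False])    -> 40   (only 1 group eligible)
```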
Synthesis of Results
No statistical meta-analysis was performed, owing to the heterogeneity of outcome reporting.
Assessment of Risk of Bias
The risk of bias was evaluated with the Methodological Index for Non-randomized Studies (MINORS). 67 The MINORS is a validated instrument used to determine the methodological quality of nonrandomized surgical studies, comparative and noncomparative. It consists of 8 items for noncomparative studies and an additional 4 items for comparative studies. The maximum score is 16 for noncomparative studies and 24 for comparative studies. For noncomparative studies, the scores are as follows: 0-4, very low quality; 5-8, low quality; 9-12, moderate quality; and 13-16, high quality. For comparative studies, the scores are as follows: 0-6, very low quality; 7-12, low quality; 13-18, moderate quality; and 19-24, high quality. 39 The scoring method is described in Appendix Table A4. To study the development of methodological quality over time, articles published during the first 5 years of the period were compared with papers published during the last 5 years of the period. Noncomparative and comparative studies were analyzed separately. The methodological quality of RCTs was additionally assessed by the Coleman Methodology Score (maximum score, 100). 14
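The score bands above map a total MINORS score to a quality category. A minimal sketch of that lookup in Python follows; the function and constant names are hypothetical, and the bands are copied from the text rather than from the MINORS instrument itself.

```python
# Score bands as stated in the text: (inclusive upper bound, quality label).
NONCOMPARATIVE_BANDS = [(4, "very low"), (8, "low"), (12, "moderate"), (16, "high")]
COMPARATIVE_BANDS = [(6, "very low"), (12, "low"), (18, "moderate"), (24, "high")]


def minors_quality(total_score: int, comparative: bool) -> str:
    """Return the quality category for an integer total MINORS score."""
    bands = COMPARATIVE_BANDS if comparative else NONCOMPARATIVE_BANDS
    max_score = bands[-1][0]
    if not 0 <= total_score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    for upper_bound, label in bands:
        if total_score <= upper_bound:
            return label
    raise AssertionError("unreachable: the last band covers the maximum score")


# minors_quality(10, comparative=False) -> "moderate"
# minors_quality(14, comparative=False) -> "high"
# minors_quality(12, comparative=True)  -> "low"
# minors_quality(21, comparative=True)  -> "high"
```

The bands are defined only for integer totals, so the sketch does not say how a noninteger mean score should be categorized.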
Statistical Analysis
Statistical analysis was performed with the SAS System for Windows (v 9; SAS Institute). Descriptive data are presented as mean ± SD. Comparisons between articles were made with the Mann-Whitney U test, as the data were not normally distributed. Statistical significance was defined with an alpha of .05 for 2-sided tests. For agreement between reviewers for full-text screening, kappa and percentage agreement were calculated.
Results
The electronic search yielded 891 studies in Embase, 65 studies in Cochrane Library, and 866 studies in PubMed. A total of 636 duplicates were removed, leaving 1186 unique studies. Of these, 1001 were excluded per their abstracts, and 132 were excluded after a full-text assessment. Ultimately, 53 studies were included. Figure 1 presents a flowchart of the included and excluded studies. The percentage agreement between reviewers was 93.5% (173 of 185), while the kappa for agreement between reviewers for full-text screening was 0.83 (95% CI, 0.74-0.92), indicating excellent agreement. 41
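The study flow and the percentage agreement reported above can be re-derived from the stated counts. The sketch below (Python, with hypothetical variable names) repeats that arithmetic. As an additional step that is not part of the review, it assumes the standard definition of the Cohen κ, κ = (p_o − p_e) / (1 − p_e), and back-calculates the chance agreement p_e implied by the rounded values; that figure is approximate.

```python
# Counts as reported in the text.
records = {"Embase": 891, "Cochrane Library": 65, "PubMed": 866}
duplicates = 636
excluded_on_abstract = 1001
excluded_on_full_text = 132

total_records = sum(records.values())                      # 1822
unique_records = total_records - duplicates                # 1186
full_text_assessed = unique_records - excluded_on_abstract # 185
included = full_text_assessed - excluded_on_full_text      # 53

# Reviewer agreement: 173 of 185 assessments.
agreements, assessments = 173, 185
p_o = agreements / assessments                             # observed agreement

# Assumption: standard Cohen kappa, solved for the chance agreement p_e.
kappa = 0.83
p_e = (p_o - kappa) / (1 - kappa)

print(unique_records, full_text_assessed, included)        # 1186 185 53
print(f"observed agreement: {p_o:.1%}")                    # observed agreement: 93.5%
print(f"implied chance agreement: {p_e:.2f}")              # implied chance agreement: 0.62
```

The denominator of the percentage agreement (185) equals the number of studies remaining after exclusion by abstract, which is consistent with agreement having been calculated at the full-text stage.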

Figure 1. Outline of systematic search strategy used.
Characteristics of the Included Studies
Of the 53 studies included, there were 34 noncomparative studies, 15 nonrandomized comparative studies, and 4 RCTs. The included studies were published between 2008 and 2017 in 26 journals. Most studies came from the United States (n = 16), followed by Switzerland (n = 8) and the United Kingdom (n = 7). Of the patient-reported outcome scores recommended by the Warwick agreement, 29 the Hip Outcome Score was the most frequently reported outcome (n = 13), followed by the international Hip Outcome Tool (n = 7) and the Copenhagen Hip and Groin Outcome Score (n = 6). Demographic data for each study are presented in Table 1.
Table 1. Demographic Data and MINORS Scores a
a ADL, Activities of Daily Living; AU, Australia; BRT, brake reaction time; CMS, Coleman Methodology Score; EQ-5D, EuroQoL-5 Dimension; EQ-5D-5L, EuroQoL-5 Dimensions 5 levels; FAA, Functional Activity Assessment; HAGOS, Copenhagen Hip and Groin Outcome Score; HHS, Harris Hip Score; HOOS, Hip disability and Osteoarthritis Outcome Score; HOS, Hip Outcome Score; HSAS, Hip Sports Activity Scale; iHOT, international Hip Outcome Tool; LEFS, Lower Extremities Functional Scale; med, median; mHHS, modified Harris Hip Score; min, minimum; MINORS, Methodological Index for Non-randomized Studies; MS, muscle strength; NAHS, Non-arthritic Hip Score; NRS, numeric rating scale; OHS, Oxford Hip Score; ROM, range of motion; RTS, return to sport; SFS, Sports Frequency Score; SF-12, 12-Item Short Form Health Survey; STST, Sit-to-Stand Test; SUI, Switzerland; TMG, tensiomyography; VAS, visual analog scale; VPT, Vibratory Perception Threshold; WOMAC, Western Ontario and McMaster Universities Osteoarthritis Index; X, no data.
b Age is reported as mean years. Values separated by a slash report two cohorts. MINORS scores are reported as total/maximum.
c CMS: 87 of 100 maximum.
d CMS: 79 of 100 maximum.
e CMS: 80 of 100 maximum.
f CMS: 70 of 100 maximum.
Risk of Bias
None of the included studies received a full score according to the MINORS. The best noncomparative study received a global score of 14 out of 16. The mean ± SD global MINORS score for all noncomparative studies was 10.4 ± 1.4. The areas of weakest reporting for noncomparative studies were an unbiased assessment of the study endpoint, a prospective calculation of study size, and endpoints appropriate to the aim of the study. The best comparative studies received a global MINORS score of 21 out of 24 (n = 3 studies), and the mean global score for all comparative studies was 18.7 ± 2.0. The areas of weakest reporting for comparative studies were an unbiased assessment of the study endpoint, a prospective calculation of study size, and loss to follow-up <5%. Apart from the prospective collection of data, which was an inclusion criterion, the area of strongest reporting for both noncomparative and comparative studies was a clearly stated aim. For comparative studies, an adequate control group was an area of equally strong reporting. Table 1 presents the MINORS score for each study. More studies were published during the last 5 years of the period (n = 37) than the first 5 years (n = 16). There were no statistically significant differences in MINORS scores between studies published during the first 5 years of the period and those published during the last 5 years, either for noncomparative studies (10.3 ± 1.2 vs 10.4 ± 1.5, P ≥ .999) or for comparative studies (17.7 ± 1.5 vs 18.9 ± 2.0, P = .21). The mean Coleman Methodology Score for RCTs was 79.0 ± 7.0. Weak areas of methodological quality for RCTs were short follow-up time and description of postoperative rehabilitation.
Discussion
Key Findings
The most important finding in this systematic review was that the methodological quality of published prospective cohort studies is moderate for both comparative and noncomparative studies, with only a few studies having low methodological quality. Despite an increase in the number of studies published during the last 5 years of the period, the hypothesis that the methodological quality would improve over time was not confirmed.
To our knowledge, this is the first systematic review assessing the methodological quality of prospective cohort studies that evaluate arthroscopic surgery for FAI. This review addresses a growing area of research and evaluates the best available evidence related to FAI. An extensive and comprehensive database search and adherence to strict guidelines are further strengths of the present study.
Limitations
The limitations of the present study are the restriction to studies published in English and the fact that not all data were available, despite an effort to contact authors. The limited number of studies also impaired the opportunity for more robust statistical analysis or pooling of data. As the present study sought to evaluate the available research with the highest possible level of evidence, retrospective studies were excluded. This might have somewhat biased the results.
Prospective cohort studies are generally regarded as a lower level of evidence, and the present study reveals that the mean methodological quality of such available studies is moderate. The results in the present study are similar to those reported by Khan et al, 39 who, in a systematic review of the utility of hip injections for FAI, reported that the methodological quality according to the MINORS was 11 for noncomparative studies and 17.3 for comparative studies. Similar results were reported by Sim et al 66 in a systematic review of non–hip score outcomes after surgery for FAI. Moreover, it was not possible to confirm an improvement in methodological quality over time in the present study. Overall, most studies failed to use an unbiased assessment of study endpoint or to report reasons for not blinding. In addition, a prospective calculation of study size was lacking in most included studies. For RCTs, weak areas of methodological quality were short follow-up time and description of postoperative rehabilitation. However, not all studies failed in these areas, indicating that improvements in methodological quality would be possible in future studies.
Future Directions
Given the natural progression of a research field, the availability of level 1 studies on the outcome of arthroscopic treatment for FAI will probably increase in the future. However, because of the difficulty involved in performing an RCT and the inherent weakness of its narrow inclusion criteria, there will still be a need for observational studies, and it is therefore important to also improve the methodological quality of lower-level studies. 69 Based on the findings in the present study, it is important that authors of comparative and noncomparative studies strive to utilize an unbiased assessment of the study endpoint when possible and calculate the study size prospectively. Noncomparative studies could, in addition, benefit from using endpoints appropriate to the aim of the study, and authors of comparative studies may consider action to minimize the loss to follow-up. RCTs need longer follow-up times and a more detailed description of postoperative rehabilitation.
Conclusion
The methodological quality of prospective cohort studies evaluating arthroscopic surgery for FAI is moderate for comparative and noncomparative studies. Common areas for improvement include unbiased assessment of study endpoints and prospective sample size calculations. Despite an increase in the number of published studies, an improvement in methodological quality over time was not observed.
Footnotes
Acknowledgment
The authors express their gratitude to Kristian Samuelsson for his substantial contributions to the conception and design of the work.
One or more of the authors has declared the following potential conflict of interest or source of funding: O.R.A. is an educational consultant for the speakers’ bureau of Smith & Nephew and ConMed. AOSSM checks author disclosures against the Open Payments Database (OPD). AOSSM has not conducted an independent investigation on the OPD and disclaims any liability or responsibility relating thereto.
Appendix
Table A4. MINORS Scoring Method a
| Item 1 |
| • 0: Aim not reported |
| • 1: Aim reported but not precise |
| • 2: Aim is precise |
| Item 2 |
| • 0: Inclusion not reported |
| • 1: Inclusion reported but not consecutive |
| • 2: Inclusion of consecutive patients, or reasons for exclusion were reported |
| Item 3 |
| • 0: NA |
| • 1: NA |
| • 2: Prospective collection of data |
| Item 4 b |
| • 0: Endpoints not reported |
| • 1: Clinical endpoints but not iHOT, HAGOS, or HOS |
| • 2: The endpoints used are iHOT, HAGOS, or HOS |
| Item 5 c |
| • 0: Evaluation of endpoints not blinded |
| • 1: Blind evaluations of objective endpoints and double-blind evaluation of subjective endpoints but inadequate blinding |
| • 2: Blind evaluations of objective endpoints and double-blind evaluation of subjective endpoints; or reasons for not blinding were reported |
| Item 6 d |
| • 0: Follow-up period not reported |
| • 1: Follow-up period reported but less than mean 2 y |
| • 2: Follow-up period mean 2 y or longer |
| Item 7 |
| • 0: Loss to follow-up not reported |
| • 1: Loss to follow-up ≥5% |
| • 2: Loss to follow-up <5%; or, number of patients lost to follow-up should not exceed proportion experiencing major endpoint e |
| Item 8 f |
| • 0: Study size was not calculated |
| • 1: Study size was calculated, but actual study size was smaller than calculated size |
| • 2: Study size was calculated, and actual study size was equal to or larger than calculated size |
| Item 9 g |
| • 0: Characteristics of control group not reported |
| • 1: Control group assessed as inadequate by the first author |
| • 2: Control group assessed as adequate by the first author |
| Item 10 |
| • 0: Not reported if groups were contemporary or not |
| • 1: Reported but not contemporary groups |
| • 2: Contemporary groups |
| Item 11 |
| • 0: Baseline equivalence of groups not reported |
| • 1: Baseline equivalence of groups questioned by the authors of the respective study |
| • 2: Baseline equivalence of groups not questioned by the authors of the respective study |
| Item 12 |
| • 0: No statistical analyses were performed |
| • 1: Statistical analyses were performed but no P values were presented |
| • 2: A P value was presented |
a Items 9 to 12 were used only for nonrandomized comparative studies and randomized comparative studies. HAGOS, Copenhagen Hip and Groin Outcome Score; HOS, Hip Outcome Score; iHOT, international Hip Outcome Tool; NA, not applicable.
b The “intention to treat” aspect was deemed irrelevant for the majority of the included studies and was therefore not considered, to avoid bias.
c A study was considered to be blinded as long as some part of the treatment was blinded; the surgery per se did not need to be blinded.
d If the mean follow-up was not reported, the minimum follow-up was used instead.
e Used only when a major endpoint was clearly stated.
f Any calculation of study size was accepted. The calculation of study size had to be performed for at least 1 of the outcomes, but it was not necessary for all outcomes.
g An assessment of adequateness was performed per the aim of each study.
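Two of the scoring rules in this table are numeric thresholds, and the global score is a plain sum of item scores. The following Python sketch illustrates that logic under stated simplifications: all names are hypothetical, item 7 ignores the alternative criterion tied to a major endpoint, and the item counts (8 for noncomparative, 12 for comparative studies) follow the description in the Methods.

```python
from typing import Dict, Optional, Tuple


def score_item_6(follow_up_years: Optional[float]) -> int:
    """Follow-up: 0 if not reported, 1 if shorter than 2 y, 2 if 2 y or longer.

    The caller passes the mean follow-up, or the minimum if no mean is reported.
    """
    if follow_up_years is None:
        return 0
    return 2 if follow_up_years >= 2 else 1


def score_item_7(lost: Optional[int], enrolled: int) -> int:
    """Loss to follow-up: 0 if not reported, 1 if >=5%, 2 if <5% (simplified)."""
    if lost is None:
        return 0
    # Integer comparison avoids floating-point issues at exactly 5%.
    return 2 if lost * 100 < 5 * enrolled else 1


def minors_total(item_scores: Dict[int, int], comparative: bool) -> Tuple[int, int]:
    """Return (total score, maximum score) for items 1-8 or 1-12."""
    n_items = 12 if comparative else 8
    if set(item_scores) != set(range(1, n_items + 1)):
        raise ValueError(f"expected scores for items 1 to {n_items}")
    if any(score not in (0, 1, 2) for score in item_scores.values()):
        raise ValueError("each item is scored 0, 1, or 2")
    return sum(item_scores.values()), 2 * n_items


# Hypothetical noncomparative study: full marks except items 5 and 8.
example = {item: 2 for item in range(1, 9)}
example[5] = 0  # endpoint evaluation not blinded
example[8] = 0  # study size not calculated
print(minors_total(example, comparative=False))  # (12, 16)
```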
