Abstract
Background
Artificial intelligence (AI) has rapidly gained momentum in the field of orthopaedics, with an increasing number of systematic reviews and meta-analyses providing synthesised evidence. However, most studies have focused on individual subspecialties or specific applications, and a comprehensive overview across the discipline is lacking.
Aim
The aim of this study is to chart publication trends and geographical distribution, classify clinical and anatomical focus, and map AI methodologies and applications in orthopaedic settings, thereby highlighting research opportunities in underexplored areas.
Methods
We conducted a scoping review of freely accessible systematic reviews, with or without meta-analysis, that evaluated the use of AI in orthopaedics, searching the PubMed, Web of Science and Scopus databases from 2015 to July 2025. Data were extracted on publication characteristics, geographical origin, orthopaedic subspecialty focus, anatomical region, AI methodologies, data modalities, and application types. The methodological quality of the included reviews was appraised using A Measurement Tool to Assess Systematic Reviews-2 (AMSTAR-2). Descriptive trends were summarised, and associations between variables were analysed using R software.
Results
We identified 183 eligible systematic reviews published in the last 10 years, with an exponential increase in publications over the past 5 years. Most reviews concentrated on fractures, arthroplasty, and surgery-related studies, particularly in the spine, knee, and hip. Imaging datasets predominated, with deep learning most frequently applied to radiological tasks, while machine learning methods were more common in structured clinical data applications. Notable gaps remain in underrepresented anatomical regions and in underexplored applications such as prescriptive modelling.
Conclusion
Our review highlights that while there is rapid growth in AI research across orthopaedics, certain clinical domains remain underexplored. These gaps represent opportunities for future work to align AI methods with clinical needs. By addressing these areas, AI has the potential to effectively support orthopaedic care and improve patient outcomes.
Introduction
Artificial intelligence (AI) has emerged as one of the most significant technological innovations in healthcare, offering new possibilities for improving diagnostic accuracy, optimising treatment planning, predicting patient outcomes, and personalising patient care. 1 The rise of AI applications in healthcare is having a prominent impact, not only by potentially enhancing clinical decision-making and reducing human error, but also by improving the efficiency of healthcare delivery to patients. 2 In orthopaedics, AI applications have expanded rapidly in recent years, encompassing image-based diagnosis, prognostic modelling, surgical planning and navigation, rehabilitation monitoring, and decision support systems. 3
The resurgence of interest in the use of AI in orthopaedics has increased the number of published systematic reviews and meta-analyses. These studies are important for consolidating current knowledge, assessing the reliability of evidence and understanding the limitations of AI application in orthopaedic settings. While they provide clinicians and researchers with synthesised evidence and diverse perspectives, the sheer volume and diversity of these systematic reviews and meta-analyses can make it difficult for clinicians and researchers to fully understand the research landscape in this field. Previous systematic reviews typically focus on individual subspecialties or specific applications of AI (e.g., fracture detection, joint arthroplasty, or outcome prediction), and no study has comprehensively mapped the breadth of AI research across the entire orthopaedic discipline. Significant gaps remain in the reporting of trends in AI use in orthopaedics, especially across anatomical regions, clinical domains and the types of applications for which the technology is used. Clarifying the relationship between orthopaedic focus, data modalities, AI methodologies, and clinical applications can provide important insights into how AI application in the field has evolved and where it should progress next.
The objective of this scoping review is to identify and summarise published systematic reviews and meta-analyses that examine the use of AI in orthopaedics. Specifically, we aim to chart publication trends and geographical distribution, classify clinical and anatomical focus, and map AI methodologies and applications in orthopaedic settings. By providing a structured overview of this emerging body of evidence, our review highlights current priorities and underexplored domains, and guides future directions for AI research in orthopaedics.
Materials and methods
Search strategy
The search and selection processes were conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) (Figure 1). A search strategy was formulated and executed in three electronic databases (PubMed, Web of Science (WoS) and Scopus), covering 1 January 2015 to 23 July 2025. Filters were applied in each database to retrieve reviews only: the Systematic Review and Meta-analysis filters in PubMed, and the Review Article filter in WoS and Scopus. For accessibility reasons, we limited searches to English-language articles and to Free Full Text in PubMed or Open Access (Gold, Gold-Hybrid) in WoS and Scopus.
Summary of search queries using selected keywords.
Inclusion and exclusion criteria
The PCC framework (Population, Concept and Context) recommended by the Joanna Briggs Institute (JBI) for scoping reviews was used as a guide to identify the main concepts and eligibility criteria for our scoping review. 4 This review focuses on orthopaedic conditions in human populations in clinical settings (musculoskeletal injuries, degenerative diseases, trauma, rehabilitation, oncology, etc.). The primary concept of this review is artificial intelligence (AI) applications, including machine learning, deep learning, radiomics, natural language processing, and other computational methods, as reported in systematic reviews and/or meta-analyses. Published reviews conducted in any geographical region, and in any clinical or research setting related to orthopaedics, were considered.
The inclusion criteria of the current study are as follows: (1) systematic reviews and meta-analyses that explicitly report systematic search methods (search strategy, inclusion criteria) and that synthesize primary studies; (2) reviews that evaluate or report on AI applications in orthopaedics (e.g., diagnosis, prognosis, image analysis, surgical planning/assistance, treatment assistance, etc.); (3) reviews concerning patients with musculoskeletal/orthopaedic conditions (excluding dental/orthodontic settings); and (4) reviews published in the English language.
The exclusion criteria are as follows: (1) scoping reviews, narrative reviews, editorials, commentaries, primary studies, conference abstracts without full systematic review text, (2) reviews on non-orthopaedic specialties, (3) reviews without AI use or integration, (4) publications not available in English.
Data extraction
Title-and-abstract screening and full-text screening were performed by experienced reviewers based on the search strategy and eligibility criteria. We documented reasons for exclusion and present the selection process in a PRISMA flow diagram (Figure 1).
Figure 1. PRISMA flow diagram of the study selection process.
The following data were extracted for each included review: (i) bibliographic details (first author, title, year of publication, region of corresponding author); (ii) orthopaedic focus; (iii) anatomical region(s) of the body; (iv) AI application(s); (v) type of AI used; (vi) data modalities; and (vii) number of primary studies reviewed.
Methodological quality assessment
The methodological quality of the included reviews was appraised using A Measurement Tool to Assess Systematic Reviews-2 (AMSTAR-2). 5 To interpret the weaknesses detected in each of the critical domains of AMSTAR-2, a quantitative score was calculated by assigning 1 point for ‘yes’, 0.5 points for ‘partial yes’, and 0 points for ‘no’, resulting in a maximum of 7 points for systematic reviews with meta-analyses and 5 points for those without.
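The scoring rule described above can be illustrated with a short Python sketch. This is not the authors' implementation (the analyses were run in R); the function names and the example ratings are hypothetical and serve only to show how the points add up.

```python
# Points per AMSTAR-2 rating, as stated in the text.
RATING_POINTS = {"yes": 1.0, "partial yes": 0.5, "no": 0.0}


def amstar2_critical_score(ratings):
    """Sum the points for a list of critical-domain ratings."""
    return sum(RATING_POINTS[r.strip().lower()] for r in ratings)


def max_score(has_meta_analysis):
    """Maximum attainable score: 7 with meta-analysis, 5 without."""
    return 7 if has_meta_analysis else 5


# Hypothetical review without meta-analysis (five critical domains rated).
example_ratings = ["yes", "partial yes", "no", "yes", "yes"]
score = amstar2_critical_score(example_ratings)
print(f"{score} / {max_score(False)}")
```

Because the maximum differs between the two review types, raw scores are only directly comparable within each type, which is consistent with the separate analyses reported in the Results.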
Statistical analysis
Descriptive analyses were performed to explore trends and patterns in our dataset. Frequencies were used for categorical variables, while measures of central tendency and dispersion (mean, median, standard deviation, and interquartile range, where appropriate) were calculated for continuous variables.
To explore associations between categorical variables, we conducted cross-tabulation analyses using Pearson’s Chi-square tests. Strength and direction of relationships between continuous variables were assessed using Spearman’s rank correlation coefficient as the data did not fully meet the assumptions of normality required for Pearson’s correlation. Correlation coefficients were interpreted according to conventional thresholds (e.g., 0.1–0.3 = weak, 0.3–0.5 = moderate, >0.5 = strong).
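For readers unfamiliar with the rank-based coefficient, the following Python sketch computes Spearman's correlation as the Pearson correlation of average ranks and labels its magnitude with the thresholds quoted above. It is an illustration only: the handling of values that fall exactly on a threshold and the label "negligible" for values below 0.1 are assumptions, and the example data are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def strength(rho):
    """Label |rho| using the thresholds in the text (boundaries assumed)."""
    r = abs(rho)
    if r > 0.5:
        return "strong"
    if r >= 0.3:
        return "moderate"
    if r >= 0.1:
        return "weak"
    return "negligible"  # assumed label; the text gives no name below 0.1


# Hypothetical paired observations.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(rho, 3), strength(rho))
```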
For comparisons of continuous variables across more than two independent groups, we employed the Kruskal–Wallis H test. When significant differences were detected, Dunn’s post-hoc pairwise tests with Bonferroni correction were conducted to identify specific group differences while adjusting for multiple testing.
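As a rough illustration of these two steps, the Python sketch below computes the Kruskal–Wallis H statistic (with the standard tie correction) and applies a Bonferroni adjustment to a list of pairwise p-values. It is a simplified stand-in rather than the R routines used in the study: it does not compute Dunn's pairwise statistics, and converting H to a p-value requires a chi-square distribution with k − 1 degrees of freedom, which is left to a statistics library. All example values are hypothetical.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        t = j - i + 1                 # size of this block of tied values
        mean_rank = (i + j) / 2 + 1   # average rank shared by the block
        for k in range(i, j + 1):
            rank_sums[pooled[k][1]] += mean_rank
        tie_term += t ** 3 - t
        i = j + 1
    h = 12 / (n * (n + 1)) * sum(
        rs ** 2 / len(grp) for rs, grp in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical scores for three independent groups.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 2))
# Hypothetical unadjusted pairwise p-values from a post-hoc test.
print(bonferroni([0.01, 0.04, 0.5]))
```

The Bonferroni adjustment is conservative: with three pairwise comparisons, an unadjusted p-value must be below roughly 0.0167 to remain below 0.05 after adjustment.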
All statistical analyses were conducted using R software, version 4.5.1. A two-tailed p-value of <0.05 was considered statistically significant.
Results
Study selection
Table 2. Extracted data and methodological quality of included systematic reviews. ML – Machine learning, DL – Deep learning, NLP – Natural language processing.
Table 2 provides a consolidated overview of the reviewed studies, detailing their publication characteristics, orthopaedic focus, AI applications, and methodological quality. This table is structured to help readers directly relate each AI application to familiar clinical conditions. The orthopaedic domains are categorised according to major anatomical regions such as the spine, knee and hip with examples of specific focus areas including fracture, arthroplasty, oncology, etc. Correspondingly, the AI applications are mapped to clinical tasks such as diagnosis, surgical assistance and image analysis to name a few. This summary serves to illustrate AI utilisation across various anatomical regions and orthopaedic subspecialties which forms the foundation for our subsequent analyses presented in later sections. The accompanying discussion further elaborates on observed trends and identifies research areas that are well-established versus those that remain underexplored.
Publication trends
From 2016 to 2024, the number of systematic reviews and meta-analyses on AI applications in orthopaedics increased markedly. Publication activity began to accelerate in 2019 (n = 3), followed by a steady annual increase, peaking in 2024 with 58 publications. As illustrated in Figure 2, the steepest acceleration occurred between 2023 and 2024, with an addition of 23 publications, the largest single-year increase, while the second steepest occurred from 2021 to 2022, with an addition of 15 publications. More modest increments were observed for 2020–2021 and 2022–2023, with 8 publications added in each period, indicating steady, sustained growth. Overall, the trend demonstrates a sustained upward trajectory in research activity over the last decade.
Figure 2. Number of papers published per year from 2016 until July 2025.
Geographic distribution
A total of 183 publications originated from 35 countries. The United States (n = 29, 15.9%), China (n = 23, 12.6%), and the United Kingdom (n = 20, 10.9%) were the top three contributors, together accounting for nearly 40% of global publication output. Other notable contributors included Italy (n = 16), the Netherlands (n = 10), and Australia, Germany, Korea, and Malaysia (n = 6 each). The Herfindahl–Hirschman Index (HHI) for this distribution was 734. A value below 1500 indicates a highly dispersed authorship base with minimal concentration in any single country. Figure 3 shows a geographical representation of publication density by country.
Figure 3. Geographic distribution of publications. The map shows the number of publications by country, with the colour scale representing the publication count.
We further examined the correlation between the number of publications and each country's gross domestic expenditure on research and development (GERD), expressed as a percentage of gross domestic product (GDP). Data for GERD were obtained from the World Bank Open Data for the most recent year recorded for each country. As shown in Figure 4, a statistically significant positive correlation was observed between publication output and GERD (correlation coefficient = 0.352, p = 0.0381), suggesting that higher national investment in R&D is associated with greater scientific productivity in the field of orthopaedics.
Figure 4. Relationship between national R&D expenditure and publication output.
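The HHI is conventionally computed as the sum of squared percentage shares, giving a value between 10,000/k (equal shares across k countries) and 10,000 (a single country). The Python sketch below assumes this convention was used here, which the text does not state explicitly; the counts in the example are hypothetical and are not the review's data.

```python
def hhi(counts):
    """Herfindahl-Hirschman Index from per-country publication counts.

    Each share is expressed as a percentage, so the index ranges from
    10000 / len(counts) (perfectly even) to 10000 (one country only).
    """
    total = sum(counts)
    return sum((100 * c / total) ** 2 for c in counts)


# Hypothetical distributions, for illustration only.
print(hhi([10, 10, 10, 10]))   # four equal contributors
print(hhi([40]))               # a single contributor
print(round(hhi([1] * 35), 1)) # 35 equal contributors
```

Under this convention, 35 countries contributing equally would give an index of about 286, so the reported value of 734 sits much closer to the even-distribution end of the scale than to the concentrated end.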
Distribution of other study variables
Figure 5 shows the distribution of data within each categorical variable we examined in this review. Among the included systematic reviews and meta-analyses, the most common orthopaedic focus was on fracture (n = 28), followed by arthroplasty (n = 25) and surgery (n = 18).
Figure 5. Number of studies based on (a) orthopaedic focus, (b) body parts, (c) AI applications, (d) AI types, and (e) data types.
Most of the studies involved general or multiple body regions (n = 69). This was followed by studies focusing on the spine (n = 41), knee (n = 33), and hip (n = 22). The shoulder (n = 7) and lower limb (n = 9) were less frequently studied, with very few reviews focusing on the wrist, upper limb, ribs, and foot/ankle (n = 1–4 each).
AI was most often applied for diagnosis (n = 86), followed by prognosis (n = 69). Other applications included surgical planning/assistance (n = 22), image analysis (n = 22), and treatment assistance (n = 12). Machine learning (n = 137) and deep learning (n = 133) dominated the AI approaches reported in the reviews. Fewer reviews mentioned hybrid/other models (n = 10), natural language processing (n = 6), or rule-based systems (n = 2). For data types, most reviews evaluated AI models using medical imaging data (n = 119) and/or clinical data (n = 68). Sensor data (n = 24) and radiomics data (n = 7) were less common.
A cross-tabulation of each variable pair was performed to evaluate the distribution of data, as shown in Figure 6. To satisfy the assumptions of the chi-square test, we first reduced the number of categories within each categorical variable by excluding those with insufficient frequencies. Specifically, we retained only the most frequent categories: Orthopaedic_Focus (Fracture, Arthroplasty, Surgery, General), Body_Part (General, Spine, Knee, Hip), AI_Application (Diagnosis, Prognosis, Surgical assistance, Image analysis), AI_Type (Machine Learning, Deep Learning), and Data_Type (Medical Imaging, Clinical Data). Despite this preliminary filtering, three variable pairs (Orthopaedic_Focus × Body_Part, Orthopaedic_Focus × AI_Application, and Body_Part × AI_Application) still did not meet the chi-square assumption that no more than 20% of cells have expected frequencies below 5. These pairs were excluded from further analysis because insufficient cell frequencies could lead to unreliable test results.
Figure 6. Heatmaps of cross-tabulation between variable pairs. Red outlines indicate significant pairs.
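The expected-frequency check described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' R code: the function names are hypothetical, the 5 and 20% thresholds are taken from the text, and both example tables are invented.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def meets_expected_count_rule(table, min_expected=5, max_fraction=0.20):
    """True if at most max_fraction of cells have expected count < min_expected."""
    cells = [e for row in expected_counts(table) for e in row]
    small = sum(1 for e in cells if e < min_expected)
    return small / len(cells) <= max_fraction


def chi_square_statistic(table):
    """Pearson chi-square statistic; assumes no empty row or column."""
    exp = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, exp)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical 2 x 2 cross-tabulations.
dense = [[20, 30], [30, 20]]
sparse = [[1, 2], [3, 40]]
print(meets_expected_count_rule(dense), chi_square_statistic(dense))
print(meets_expected_count_rule(sparse))
```

In the sparse example, three of the four expected counts fall below 5, so the pair would be excluded under the rule applied in this review.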
Chi-square test results showing the relationships between study variables.
The table shows the chi-square statistic, degrees of freedom (df) and p-value.
Publication trends based on study variables
We analysed temporal trends in all five categorical variables from 2015 to 2024 using Spearman’s correlation test, as shown in Figure 7. Specifically, our results show that all AI applications (Diagnosis, Image analysis, Treatment assistance, Prognosis, Surgical assistance, and Biomechanical analysis) and AI types (Deep learning, Machine learning, Hybrid/Others, and Natural language processing) were positively correlated with publication year (p < 0.05; correlation coefficient ≥ 0.70).
Figure 7. Publication trends up to year 2024 for the top six categories of (a) orthopaedic focus, (b) body part, (c) AI application, (d) AI type, and (e) data type.
Anatomical regions (General, Knee, Spine, Hip, Lower Limb, Shoulder) mostly showed statistically significant positive correlation with year, but studies involving upper limb were only moderately correlated with publication year (correlation coefficient 0.50 ≤ |rs| < 0.70), while wrist studies showed no significant correlation with publication year (p ≥ 0.05).
Data types (Medical Imaging, Clinical Data, Sensor Data, Text/Unstructured) mostly showed statistically significant positive correlation with publication year, but radiomics studies showed no significant correlation with publication year.
Orthopaedic focus (Arthroplasty, Oncology, Rehabilitation, Injuries, Fracture, Surgery, Kinematics, Scoliosis, Trauma, General, Osteoarthritis, Musculoskeletal Disorders, Degenerative Disease) mostly showed statistically significant positive correlation with publication year; however, the number of studies involving tears and osteoporosis only moderately correlated with publication year (correlation coefficient 0.50 ≤ |rs| < 0.70), while osteoporosis-related fractures and back pain studies showed no significant correlation with publication year (p ≥ 0.05).
Methodological quality (AMSTAR-2)
Methodological quality was assessed separately for systematic reviews with meta-analysis (n = 52) and those without meta-analysis (n = 131). The Shapiro–Wilk test for normality was statistically significant (p < 0.05), indicating that the data were not normally distributed. Therefore, we proceeded with the non-parametric Spearman’s correlation test.
For reviews without meta-analysis, AMSTAR-2 scores showed a statistically significant weak positive correlation with publication year (rs = 0.268, p = 0.002, N = 131), indicating that study quality, as measured by AMSTAR-2 scores, has improved over time. For reviews with meta-analysis, a stronger correlation was seen between publication year and AMSTAR-2 scores (rs = 0.363, p = 0.008, N = 52), indicating a clearer trend of quality improvement than for systematic reviews without meta-analysis (Figure 8).
Figure 8. Publication trends for systematic reviews with and without meta-analysis.
Discussion
In this scoping review, we aimed to map and provide an overview of systematic reviews and meta-analyses evaluating AI applications in orthopaedic settings. Our study showed a steady increase in the number of published systematic reviews, with and without meta-analyses, from 2016 to 2024. This reflects growing interest and increased investment in AI research in the field of orthopaedics. The steep growth, especially after 2020, may be partly attributable to the COVID-19 pandemic’s impact on healthcare digitization. 189
Geographically, we observed publishing dominance by the USA, China and the UK, reflecting regional research strengths and resource distribution in AI technologies in healthcare. 190 This pattern is consistent with our statistical findings, in which we identified a modest but significant positive correlation between GERD and publication output, providing a plausible explanation for the leading number of publications from countries with established research infrastructure and sustained research and development (R&D) funding. A similar association was reported by Meo, Al Masri 191 who demonstrated a strong positive correlation between national R&D expenditure and bibliometric output across Middle Eastern countries. While informative, these geographical findings should nevertheless be interpreted with caution. Since our search strategy included only open-access and freely available full-text articles, the dominance of certain countries may partly reflect differences in open-access publishing mandates and institutional policies rather than true disparities in research output. High-income countries and well-funded institutions are more likely to publish in open-access journals or comply with funder requirements for public access, whereas studies from regions with lower open-access adoption may be underrepresented. This potential bias should be considered when interpreting the observed global distribution. Interestingly, Figure 4 shows that countries with lower GERD, such as Malaysia, produced a number of AI-related publications comparable to that of higher-GERD nations such as Korea, suggesting that substantial funding is not the only determinant of progress in this field.
Our finding of a relatively low HHI (734) reflects a diverse authorship landscape spread across multiple countries, with contributions not only from high-income nations but also from middle-income countries, indicating growing interest in the use of AI in orthopaedic settings within these regions.
Our review highlights that AI applications in orthopaedics are heavily concentrated in the domains of fractures, arthroplasty, and surgery-related studies, especially in high-burden anatomical sites such as the spine, knee, and hip. For instance, fractures are one of the most common orthopaedic conditions worldwide, with high morbidity and healthcare costs, making them a suitable target for AI-driven diagnostic and prognostic tools. 192 Similarly, knee and hip arthroplasty procedures are among the fastest-growing elective surgeries, with demand projected to increase dramatically as populations age. 193 These domains appear to attract the greatest interest in AI applications, likely because of the greater availability of specialists and the abundance of data in these areas, as imaging modalities provide a robust source for training AI models. 194 These patterns suggest that research activity is influenced not only by disease burden but also by the accessibility and availability of high-quality datasets. Our findings on these well-researched domains were further supported by a supplementary screening of the studies excluded because their full text was inaccessible (n = 65), which revealed similar patterns in research focus areas. Among these additional studies, fracture diagnosis remained the most investigated domain (13 studies), with the majority examining general fracture detection (6 studies) and upper limb fractures (4 studies). Surgery-related applications also emerged as a significant area of interest (8 studies), particularly for spine procedures (6 studies), where AI is being explored for both intraoperative assistance and postoperative outcome prediction. Other notable focus areas included arthroplasty outcome prediction (3 studies), osteoporosis diagnosis (3 studies) and detection of ACL and meniscal tears (3 studies), which reflect the predominance of diagnostic applications identified in our main analysis. However, as this supplementary analysis was restricted to abstracts, the observation should be interpreted cautiously and viewed as contextual rather than conclusive.
A significant association was found between orthopaedic focus areas and data modalities, revealing that different orthopaedic subspecialties have distinct preferences for data sources. This suggests that each specialty has unique clinical characteristics and requires specific diagnostic approaches or methodologies to achieve clinically useful outcomes. Fracture research, for example, relies heavily on imaging datasets, in line with the radiological nature of fracture diagnosis and the standardized protocols for assessing fractures. 195 In contrast, surgical studies integrate imaging and clinical data in a more balanced way. This is because operative decision-making is usually comprehensive and requires consideration not only of the anatomical precision offered by imaging data, but also of patient-specific clinical information such as surgical or medical history. 196 This subspecialty-specific pattern suggests that orthopaedic AI research is increasingly adapting to the clinical protocols and workflow requirements of each discipline. However, several anatomical regions such as the upper limb and foot/ankle remain relatively underexplored despite their clinical importance. Addressing these gaps could expand the applicability of AI in research and provide broader clinical relevance across orthopaedic settings.
Another significant association was observed between AI type and data modality, highlighting a key difference in methodological approach within orthopaedic AI research. Deep learning was the dominant approach in medical imaging applications (112 studies), reflecting its ability to extract complex features from high-dimensional visual data such as radiographs, CT, and MRI using convolutional neural networks (CNNs). For instance, Thian, Li 197 used Faster R-CNN (Inception-ResNet v2) to detect and localize wrist fractures and achieved a sensitivity of 91–96% across views. In contrast, machine learning methods were more prominent in clinical data applications. This is because algorithms such as random forests, support vector machines, and logistic regression excel at processing structured datasets, which are often smaller and tabular in nature. 198 These methods offer advantages in interpretability and computational efficiency, making them well suited to applications such as pre-processing of electronic health record data, risk stratification and outcome prediction. 199 Together, these patterns suggest that methodological selection in orthopaedic AI research requires alignment between data modality, clinical task and algorithmic strength, with deep learning being the standard for imaging-based diagnostics, while machine learning plays a central role in prognostics and decision-making.
The strong association between AI application and data type reflects the distinct data requirements of diagnostic versus prognostic models. We observed that diagnostic applications are primarily image-driven, with 86 studies relying on radiological data. This reflects the synergy with visual diagnostic tasks in orthopaedics, such as fracture detection, 155 anatomical measurement, 75 and pathology identification. 124 The visual nature of orthopaedic conditions, combined with the standardized protocols for acquiring medical images, creates favourable conditions for AI-assisted diagnosis, where deep learning algorithms can potentially identify subtle findings that might be overlooked in routine clinical practice. On the other hand, prognostic applications exhibit a more balanced distribution across clinical and imaging data, reflecting the recognition that outcome prediction requires multimodal information. Prognostic models often integrate patient demographics, surgical or medical history, functional data and even lifestyle considerations in addition to imaging. 200 This more holistic approach acknowledges that patient outcomes are influenced by a mixture of biological, functional and environmental factors, not all of which are visible on imaging.
Implications and future directions
From our review, we observed growing interest in the use of AI in orthopaedic research, reflected in the steady rise in the number of publications in this field. Collectively, these findings reveal that orthopaedic AI research is influenced by factors such as clinical burden, data availability, and methodological pragmatism. Researchers and healthcare institutions can use these findings to plan and guide research on AI implementation and integration. Our findings can help inform the selection of suitable technology and improve the efficiency of resource allocation by clarifying which approach works best for specific orthopaedic areas, anatomical regions, clinical tasks and data modalities, underscoring that the adoption of AI in orthopaedic settings is highly context dependent.
The significant positive correlation between national R&D expenditure and publication output highlights the importance of sustained investment and improved research infrastructure in driving innovation. Furthermore, our findings point to several avenues for future research: they show opportunities for expanding AI research into underrepresented anatomical regions such as the shoulder, hand, foot and ankle, and highlight the need for future studies to explore emerging application areas such as prescriptive modelling. 201 Prescriptive models go beyond conventional diagnosis and risk prediction tasks to recommend specific actions or interventions that are likely to lead to the best outcomes for individual patients, thus helping to guide personalized interventions based on unique patient profiles.
Limitations
Several limitations of the current scoping review are worth noting. First, our scope was restricted to applications in orthopaedics, and therefore the findings may not be generalizable to related specialties such as orthodontics or other musculoskeletal fields. Secondly, the literature search was limited to three electronic databases (PubMed, Scopus, and Web of Science), with no inclusion of grey literature sources such as conference proceedings, theses, or preprints. Additionally, only open-access and freely available full-text articles were included. This may have introduced selection bias, as subscription-based journals often host high-impact studies, potentially affecting the regional and institutional representation of the included literature. Nevertheless, the studies excluded due to unavailability of full texts accounted for only 22% of those that initially met the selection criteria and were sought for retrieval. A further review of their abstracts indicated research trends comparable to those identified in our full-text analyses of open-access publications. Finally, our methodological appraisal assessed only the critical domains of AMSTAR-2; the non-critical domains were not evaluated in this study. While this could limit interpretation of the overall strength of evidence, the emphasis of this study is on mapping evidence and identifying research gaps rather than evaluating study quality.
Conclusion
Current AI research in orthopaedics is largely driven by clinical foci, data availability, and methodological feasibility, with certain domains remaining underexplored. Our review provides a comprehensive mapping of systematic reviews and meta-analyses in the field, offering guidance for researchers, clinicians, and policymakers to shape future research priorities effectively.
Footnotes
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Ministry of Higher Education, Malaysia; FRGS/1/2021/STGO1/UM/02/5(FP033-2021).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
