Abstract
Osteoarthritis (OA) is the commonest musculoskeletal disease worldwide, and its prevalence is increasing as populations age. It causes joint pain and disability, reduces quality of life, and places a heavy burden on healthcare services and society. However, the main diagnostic methods currently in use are not suitable for diagnosing OA at an early stage. The use of machine learning (ML) in OA diagnosis has increased dramatically in the past few years. In this review article, we therefore describe research progress in the application of ML to the early diagnosis of OA, discuss the current trends and limitations of ML approaches, and propose future research priorities for applying these tools in the field of OA. Accurate ML-based predictive models, combined with imaging techniques that are sensitive to early changes in OA before clinical features emerge, are expected to address the current dilemma. Fusion models that combine multidimensional information may make patient-specific early diagnosis and prognosis estimation of OA possible in the future.
Introduction
Osteoarthritis (OA), one of the commonest forms of rheumatic disease, is an age-related joint disorder. Pain and dysfunction are the major symptoms of OA, and the knee is the most commonly affected joint.1,2 OA affected more than 520 million people globally in 2019,3,4 and the aging global population is driving a much greater increase in the absolute number of new cases, placing a growing burden on health services and societies worldwide. In the United States, the estimated medical costs and lost earnings caused by OA amount to 303 billion dollars annually. 5
To date, there are no disease-modifying drugs that can delay the progression of OA. Early identification of patients with OA, or of those at higher risk of progression, could facilitate clinical decision-making and allow the design of more effective and specific therapeutic interventions. X-rays are currently the ‘gold standard’ for diagnosing radiographic osteoarthritis (ROA). However, X-rays lack sensitivity and specificity for changes in bone and cartilage, which delays timely clinical intervention. 6 Magnetic resonance imaging (MRI) can detect pre-radiographic abnormalities in bone and soft tissues. 7 There are multiple methods for assessing OA abnormalities on MRI; however, a commonly accepted MRI-based diagnostic standard for OA is still lacking. In addition, clinical symptoms in many patients with OA are not consistent with imaging findings. 8
Research on OA biomarkers has attracted much attention, with the latest development of sequencing and imaging technologies.9,10 Currently, biochemical markers are mainly derived from blood, urine, and joint synovial fluid. 11 In the last few years, a growing number of investigators have been working on establishing quantitative imaging biomarkers as reproducible and reliable indicators for early diagnosis of OA. 12 Biochemical and radiographic factors are the major categories of intense research for identifying OA biomarkers.13,14
Machine learning (ML) is a type of artificial intelligence (AI) that applies algorithmic methods and enables machines to solve problems without being explicitly programmed. 15 ML can be broadly divided into supervised learning and unsupervised learning. The most common supervised learning tasks are classification and regression. Typical examples of unsupervised learning are clustering methods, such as k-means clustering and hierarchical clustering (commonly visualized with dendrograms). ML was first promoted in the 1950s and has recently shown impressive performance in a variety of domains. 16 Automatic diagnosis and evaluation of disease prognosis based on ML models have shown promising potential in fields such as cancer, sepsis, and ankylosing spondylitis.17–19 The use of ML in OA diagnosis has also increased dramatically in recent years. Therefore, we aimed to review research progress on ML in the early diagnosis of OA, that is, at a stage when patients have few or no symptoms but may already have early subclinical structural changes. 20 We also discuss the current trends and limitations of ML approaches and propose future research priorities for this approach in the field of OA.
Materials and methods
A literature search was conducted using different databases such as PubMed, Web of Science, Embase, Google Scholar, and Cochrane Library for all types of articles on the application of ML in the early diagnosis of OA. Additional references were identified by searching the bibliographies of retrieved articles. We performed a keyword search restricted to the abstract of the articles, using terms relating to ML and early diagnosis of OA. The date of publication was not used as an entry selection criterion. Articles were reviewed according to their titles and abstracts, with all article types taken into consideration if they provided data relevant to the research questions.
ML application in conventional radiographs and computed tomography scans
X-ray imaging is frequently used to define ROA, because it is cost-effective, easy, and rapid. 6 The Kellgren–Lawrence (KL) grading system and Osteoarthritis Research Society International (OARSI) atlas are both widely used, yet both suffer from the subjectivity of the readers. 21 ML could be a promising technique to improve this situation by automating the two grading systems. 21 One study comparing the reliability of ROA assessment between non-clinical readers and an experienced radiologist, suggested that in cost-constrained settings, ML models may be potentially suitable readers of ROA for increased throughput in image inspection and accuracy, that can achieve ‘super-human performances’. 22 Antony 23 proposed an approach based on a convolutional neural network (CNN) for the synchronous analysis of KL and OARSI grades using two large public data sets, the Osteoarthritis Initiative (OAI) study and the Multi-center Osteoarthritis (MOST) study. Nevertheless, the agreements between the predictions and the labels of the test set for KL and OARSI grades were lower than the human observers’ inter-rater agreements. 21 In the study by Tiulpin and Saarakkala, 21 a method based on deep learning (DL) was developed to automatically perform KL and OARSI grading simultaneously from knee X-ray images using transfer learning, and demonstrated an excellent agreement with the test set labels. This was the first study of OA that used an independent test set for automatic OARSI grading from plain radiographs. Brahim et al. 24 developed a computer-aided diagnosis (CAD) system to detect early knee OA (KOA) using knee radiographs and ML algorithms. This approach was applied to the classification using naive Bayes and random forest (RF) classifiers on 1024 knee radiographs from the OAI database, which showed excellent diagnostic values for OA (82.98% accuracy, 87.15% sensitivity, and up to 80.65% specificity). 
However, it does not account for the variations in the acquisition parameters or the impact of soft tissue on X-ray absorption and diffusion. To build a clinically robust applicable model, another study produced the most advanced methods for KL grade, trained solely with the MOST data set and tested with the OAI data set. This method yielded a quadratic kappa coefficient of 0.83. The area under the receiver operating characteristic (ROC) curve was 0.93, and the average multiclass accuracy was 66.71%. This research indicates that the ability of the model to learn features related to OA is robust toward different data settings, 25 being practical in clinical settings. Bayramoglu et al. 26 used a landmark detection tool (Bone Finder) to automatically detect the patella region of interest (ROI) on the lateral view of 5507 knee X-ray images from the MOST database. They trained an ML model using a gradient boosting machine for the radiographic patellofemoral OA (PFOA) for texture detection, using local binary pattern (LBP) features. Furthermore, a deep neural network (DNN) method, that trained end-to-end, was used to directly detect PFOA on the texture patches. The results showed that textural ROI classification using DNN had good prediction performance, demonstrating the potential of using texture features of the patella in knee radiographs for PFOA prediction. 27
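The quadratic kappa coefficient quoted above measures agreement between predicted and reference ordinal grades, penalizing large grade differences more heavily than small ones. The following is a minimal sketch of how such a coefficient can be computed, assuming grades are integers from 0 to n_classes - 1 (as with KL grades 0 to 4); the function name and the toy grades are illustrative and are not taken from the cited study.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """Cohen's kappa with quadratic weights for ordinal grades 0..n_classes-1."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (grade_a, grade_b)
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Quadratic weight: 0 on the diagonal, 1 for the largest possible gap.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            weighted_observed += weight * observed[(i, j)] / n
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters used one identical grade throughout; treat as full agreement.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


if __name__ == "__main__":
    # Hypothetical KL grades (0-4) from a model and from a reference reader.
    reference = [0, 1, 2, 3, 4]
    predicted = [0, 1, 2, 3, 3]
    print(round(quadratic_weighted_kappa(reference, predicted, 5), 3))
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement.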
To automatically diagnose hip OA, a DNN method was trained and validated on 420 hip plain radiographs from Beijing Chaoyang Hospital, China. 6 The study applied a verified CNN model and compared it with diagnoses from physicians with different experience levels. The CNN model’s diagnostic performance was comparable to that of an attending physician with 10 years of experience. 6 Another study adopted a model based on end-to-end DL, namely, faster R-CNN (region-CNN), that can directly locate the knee joint in plain radiographs and quantify KOA severity automatically. To fit the real data of the radiographs, a study by Bin et al. 28 made some adjustments, which indicated that the adjusted model outperformed the faster R-CNN. It yielded a mean precision of nearly 0.82, a sensitivity of 78%, and a specificity of 94%, and only took 0.33 s to test each image, and thus achieved a better balance between accuracy and speed.
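Most of the studies in this section report accuracy, sensitivity, specificity, or precision. As a reminder of how these figures relate to one another, the sketch below derives them from binary labels (1 = OA, 0 = no OA). The function name and the example labels are hypothetical and do not reproduce any study's data.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and precision for 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # None signals that the metric is undefined (e.g. no positive cases).
        return numerator / denominator if denominator else None

    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": ratio(tp, tp + fn),  # share of OA cases that are detected
        "specificity": ratio(tn, tn + fp),  # share of non-OA cases correctly cleared
        "precision": ratio(tp, tp + fp),    # share of positive calls that are correct
    }


if __name__ == "__main__":
    # Hypothetical labels: 4 joints with OA, 6 without.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    calls = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
    for name, value in binary_metrics(truth, calls).items():
        print(name, value)
```

Because accuracy depends on the proportion of OA cases in the test set, sensitivity and specificity are usually the more informative pair when comparing models across cohorts.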
A technique for detecting early-stage KOA, using ML for feature extraction and classification, was presented by Mahum et al. 29 The input radiographs, taken from Mendeley Data set VI, were pre-processed, and features of ROA were extracted using hybrid feature descriptors such as CNN with LBP and CNN with a histogram of oriented gradients (HOG). Multi-class classifiers, including a support vector machine (SVM), RF, and K-nearest neighbor (KNN), were used for KOA classification and were evaluated with five-fold validation and cross-validation. The HOG feature descriptor yielded an accuracy of 97.14% in cross-validation and 98% in five-fold validation for early KOA detection and classification. The proposed algorithm yielded 93% average accuracy within 1870 iterations.
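Local binary patterns, used as texture descriptors in several of the studies above, encode each pixel by comparing it with its neighbors. The following is a minimal sketch of the basic 3 × 3 variant, assuming a grayscale image stored as a list of rows; the cited studies may use other neighborhood sizes or rotation-invariant variants, and the function names here are illustrative.

```python
# Clockwise neighbor offsets (row, column), starting at the top-left pixel.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, row, col):
    """8-bit LBP code for an interior pixel: bit k is set if neighbor k >= center."""
    center = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(NEIGHBOR_OFFSETS):
        if image[row + d_row][col + d_col] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    counts = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            counts[lbp_code(image, row, col)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("image must be at least 3 x 3 pixels")
    return [count / total for count in counts]


if __name__ == "__main__":
    # Toy 4 x 4 patch standing in for a region of interest.
    patch = [
        [10, 10, 12, 14],
        [10, 11, 13, 15],
        [9, 12, 14, 16],
        [8, 11, 15, 18],
    ]
    histogram = lbp_histogram(patch)
    print([code for code, share in enumerate(histogram) if share > 0])
```

The resulting histogram is the feature vector that would be passed to a classifier such as a gradient boosting machine, SVM, or RF.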
CT has a high resolution, which allows pathological conditions to be shown more clearly, and a high detection rate for KOA. 30 A novel approach using phase-contrast imaging (PCI) with CT to visualize the knee cartilage matrix enables direct examination of chondrocyte patterns and their subsequent correlation with OA. 14 Nagarajan et al. 14 extracted statistical features using gray-level co-occurrence matrices (GLCMs) and geometric features using the scaling index method (SIM) from 404 ROIs annotated on PCI images of normal and osteoarthritic specimens. Both feature sets achieved high performance, with an area under the ROC curve (AUC) of 0.98 for classification between normal and OA specimens, suggesting that these quantitative analyses of the knee cartilage matrix can distinguish between normal and diseased tissue patterns with high accuracy. Abidin et al. 31 used two different CNN models, CaffeNet and Inception-v3, to extract features for characterizing healthy and diseased tissues. They evaluated the features quantitatively in a classification task measured by the ROC curve and visualized them with t-distributed stochastic neighbor embedding (t-SNE), a dimension reduction approach. Features from the last convolutional layer and the fully connected layer of CaffeNet showed the best classification performance, with an AUC of up to 0.91. The approach using Inception-v3 yielded similar classification performance, with an AUC of up to 0.95. The visualized features from these layers showed adequate characterization of chondrocyte patterns in the classification between normal and osteoarthritic tissues. These techniques can potentially be used to detect early osteoarthritic changes in the articular cartilage, which was previously impossible on CT images. In the early diagnosis of OA, imaging changes may precede the emergence of clinical symptoms. The application of ML in radiological imaging makes the early diagnosis of OA possible. 
However, studies have not yet been carried out on a large-scale clinical application, and its clinical benefits need to be further explored.
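The gray-level co-occurrence matrix features mentioned in this section summarize how often pairs of gray levels occur next to each other in an image. The sketch below computes a GLCM for a single pixel offset and derives three common statistics from it. It assumes an image already quantized to integer gray levels 0 to levels - 1; the offset, the choice of statistics, and the toy image are illustrative and are not those of the cited studies.

```python
def glcm(image, levels, d_row=0, d_col=1):
    """Normalized co-occurrence matrix for pixel pairs separated by (d_row, d_col)."""
    counts = [[0] * levels for _ in range(levels)]
    n_rows, n_cols = len(image), len(image[0])
    for row in range(n_rows):
        for col in range(n_cols):
            row2, col2 = row + d_row, col + d_col
            if 0 <= row2 < n_rows and 0 <= col2 < n_cols:
                counts[image[row][col]][image[row2][col2]] += 1
    total = sum(sum(line) for line in counts)
    if total == 0:
        raise ValueError("offset leaves no valid pixel pairs")
    return [[value / total for value in line] for line in counts]


def glcm_features(matrix):
    """Contrast, energy, and homogeneity of a normalized GLCM."""
    size = len(matrix)
    cells = [(i, j, matrix[i][j]) for i in range(size) for j in range(size)]
    return {
        "contrast": sum(p * (i - j) ** 2 for i, j, p in cells),
        "energy": sum(p * p for _, _, p in cells),
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
    }


if __name__ == "__main__":
    # Toy 4 x 4 image with four gray levels.
    toy = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [0, 2, 2, 2],
        [2, 2, 3, 3],
    ]
    print(glcm_features(glcm(toy, levels=4)))
```

In practice, matrices are often computed for several offsets and directions, and the resulting statistics are concatenated into a single texture feature vector.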
ML application in MRI
MRI plays an increasingly essential role in OA diagnosis and related research, and has been regarded as a hot spot in combination with computer-aided diagnostic technology. 32 According to previous studies, cartilage degeneration is not a geometrically regular process. 33 Tibial trabecular bone structural markers detected in MRI could be effective predictors of articular cartilage loss. 34 A community-based cohort study, including 159 participants with MRI images of both the left and right knees, presented a framework using ML and texture analysis to diagnose ROA. They used an ML approach to apply feature selection and extraction to identify the linear combination of the features that best related to the volume of cartilage loss. However, due to the limited sample size and selection bias, the results can only be partially applied to patients at the early stages of ROA. 34 Chang et al. 35 used a methodology based on bone flattening to estimate the presence of osteophytes and ROA (KL ⩾ 2). The AUC for osteophytes was 0.85. However, the test set size was relatively small, suggesting that the results require further validation.
Kundu et al. 36 used transport-based learning to develop a model that distinguishes progressors from non-progressors on the basis of pre-symptomatic baseline cartilage texture maps, enabling the early detection of OA. Their model detected future symptomatic OA progression 3 years before its occurrence with a robust test accuracy of 0.78, and demonstrated that early OA diagnosis is possible at a potentially reversible phase. In another study, 313 knee scans were included, and six different ML classifiers were used to analyze trabecular bone structures and quantify OA risks. The best-performing approaches were linear discriminant analysis (LDA) (with an area under the ROC curve of 0.82) and weighted KNN (with an area under the ROC curve of 0.81). The results demonstrated the feasibility of using ML to separate healthy and OA knees based on trabecular bone structures. Moreover, because the framework was fully automatic, the markers were well-suited for longitudinal clinical studies. 37 In a study of 68 participants from the OAI database, an ML algorithm, weighted neighbor distance using a compound hierarchy of algorithms representing morphology (WND-CHRM), was applied to T2 mapping of the central medial femoral condyle. This study found that WND-CHRM could classify T2 maps of cartilage with 75% accuracy in predicting the development of OA. 12 Another study used a fast imaging employing steady-state acquisition sequence to quantitatively analyze bone. A total of 665 right knees from the Rotterdam Study were scanned with 1.5 T MRI, and elastic net models were used to differentiate knees with and without MRI-defined OA. The image feature (17 shape features, 3 orientation features, 12 histogram-based features, and 271 texture features) model had an AUC under 0.80. These results suggest that radiomic features from the subchondral bone are useful imaging biomarkers for the diagnosis of OA. 38
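The AUC values quoted throughout this review can be read as the probability that a randomly chosen OA case receives a higher model score than a randomly chosen non-OA case. The sketch below computes the AUC directly from that definition, counting ties as half. The function name and the example scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0   # positive case ranked above negative case
            elif pos_score == neg_score:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Hypothetical model scores for two knees without OA and two with OA.
    labels = [0, 0, 1, 1]
    scores = [0.10, 0.40, 0.35, 0.80]
    print(roc_auc(labels, scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values of about 0.8 are usually described as good but not definitive discrimination.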
The quality of the infrapatellar fat pad (IPFP) is considered a marker for predicting KOA. 39 However, the current semiquantitative scoring system is largely constrained by reader-dependent reliability, and signal alterations of the IPFP are considered nonspecific findings. 40 Jia Li et al. 41 investigated 690 participants at risk of KOA from the Pivotal OAI MRI Analyses incident OA cohort, who were divided into development (n = 500, 340 women; mean age, 60 years) and test (n = 190, 120 women; mean age, 61 years) cohorts. All knees had a KL grade of 1 or less at baseline. IPFP texture features were extracted by three-dimensional (3D) texture analysis based on MRI. In both cohorts, IPFP texture scores (AUC ⩾ 0.75) showed greater discrimination than clinical scores (AUC ⩽ 0.6) at baseline, P-1 (1 year before P0), and P0 (the incident time point of KOA), with significant differences in pairwise comparisons (p ⩽ 0.002). Greater predictive and concurrent validities of IPFP texture scores (AUC ⩾ 0.75) compared with MRI Osteoarthritis Knee Scores (AUC ⩽ 0.66) were also demonstrated (p < 0.001). Therefore, MRI-based 3D texture of the IPFP could help predict incident radiographic KOA and the development of incident KOA.
The low resolution of MRI of the knee joint seriously affects the diagnosis of KOA. In one study, a high resolution (~0.3 mm in plane pixel size) was applied to make an accurate early diagnosis. 42 Another study compared the efficient medical image super-resolution (EMISR) method with the sparse coding-based network (SCN), super-resolution convolutional neural network (SRCNN), and efficient sub-pixel convolution neural network (ESPCN) methods. They found that the reconstruction quality of EMISR was higher, and the reconstruction speed of EMISR was faster than its counterparts. 43 However, T1 and T2 alone are not representative of cartilage changes; thus, combining them may improve the sensitivity and specificity of OA diagnosis. Pedoia et al. 44 mapped 178 participants by integrating demographic and clinical information, gait kinematics, kinetics, cartilage compositional T1ρ and T2, and R2-R1ρ (1/T2–1/T1ρ) acquired at 3 T and whole-organ magnetic resonance imaging score (WORMS) morphological grading. This study suggested R2-R1ρ as an early predictor of cartilage lesion progression in KOA, reflecting the advantage of data-driven topological data analysis (TDA). However, an unsupervised learning method was used in this study. In the future, a supervised learning method should be adopted to improve predictive ability. 44 Although current developments in MRI hardware, such as compositional and functional assessments, can provide helpful information for clinicians, challenges remain in selecting the best features from a large number of variables to integrate into prediction models. Different types of MRI scanning may have an impact on the sensitivity and accuracy of the prediction model. Therefore, further research should make full use of MRI data and explore the best features of MRI to reflect the imaging changes of OA for early diagnosis of OA and make a more accurate diagnosis.
ML application in biochemical biomarkers
As OA is a multifaceted chronic joint disease, studying it from an imaging perspective alone is not fully persuasive. With the development of sequencing technology and the application of ML approaches, biochemical biomarkers with the potential to diagnose OA have also been identified.
Diagnostic algorithms were applied in a study comprising 46 patients with early OA in the training data set. Glycated, oxidized, and nitrated proteins and amino acids were assessed in the synovial fluid and plasma of patients with early OA. 45 In the algorithm-based model, feature selection by the ML process produced algorithms with multiple oxidation, nitration, and glycation adducts, suggesting that assessment of these adducts increases the ability to discriminate early-stage disease from controls. 45 However, due to the complexity and heterogeneity of OA, a single biomarker is unlikely to achieve a comprehensive classification. Thus, aggregate biomarkers based on both biochemical and imaging assessments are complementary, and investigating them could be a superior strategy for the early diagnosis of OA. Using publicly available data from the Foundation for the National Institutes of Health (FNIH) cohort study, Nelson et al. 46 incorporated data on biochemical, demographic, and clinical variables, and MRI assessment, to simultaneously distinguish different KOA phenotypes. This was achieved through the application of novel ML methodologies, including distance-weighted discrimination (DWD), t-SNE, and principal components analysis (PCA), which identified the variables most strongly related to progressor status (e.g. bone marrow lesions, osteophytes, medial meniscal extrusion, and urine CTX-II). These innovative ML algorithms provide a way to evaluate numerous variables of multiple types and scalings simultaneously in relation to KOA progression, and could be applied to other fields in the future. However, the authors acknowledged that this study was a preliminary step, requiring further internal validation using other methodologies in this cohort, and external validation in other data sets, to test the validity of the results.
Temporomandibular joint (TMJ) disease is the second most common musculoskeletal disorder. A study enrolled 46 TMJ-OA patients and 46 healthy controls and used four ML models (logistic regression (LR), RF, LightGBM, and XGBoost) to analyze 52 features, including 5 clinical variables, 20 radiomics features from high-resolution cone-beam tomography imaging (HR-CBCT), 25 biomolecular features (13 from serum and 12 from saliva), and 2 demographic variables (age and gender). The results showed that the XGBoost + LightGBM model achieved an accuracy of 0.82 (SD 0.03) and an AUC of 0.87. By combining clinical, biological, and radiomics features, this study is expected to encourage future studies that integrate multidimensional information for the diagnosis of OA. 47 Another study by Celia Le et al. 48 included 46 TMJ OA patients and 46 healthy controls. Fifty-two features were obtained from the ROI, which was the lower part of the TMJ. In addition, 32 mandibular fossa radiomic features from the upper part of the TMJ were tested. A total of 3828 features and interaction features were obtained in the training data set. The results showed that LightGBM and XGBoost were the most efficient diagnostic methods because they had the highest AUCs (0.831) and F1-scores (0.761). However, this study had only 92 participants, so the robustness of its predictive models could still be improved by expanding the sample size. OA is a complex disease with various clinical manifestations and imaging features in different populations and stages of the disease. Therefore, it is necessary to combine multidimensional information, including a variety of biomarkers, to establish an early diagnosis model of the disease. In addition, the internal and external validation of such models needs to be further explored to enhance their credibility.
ML application in the multi-omics study
Omics include, but are not limited to, genomics, epigenomics, transcriptomics, proteomics, and metabolomics. Omics approaches have been increasingly used to identify novel biomarkers, pathways, and mechanisms for OA, 49 and significant progress has been made by integrating multiple algorithm techniques and multi-omics data sets. 50 Swan et al. 13 used high throughput proteomics to identify potential biomarkers in OA articular cartilage and found that interleukin 1β (IL-1β) played a significant role in the diagnosis of OA. As a rule-based ML technique, bioinformatics-oriented hierarchical evolutionary learning (BioHEL) proved to be excellent in classifying proteomics results, which indicated that some proteins, including matrix metalloproteinase-3 (MMP-3), interleukin 8 (IL-8), hyaluronan and proteoglycan link protein 1 (HPLN1), matrix gla protein (MGP), and apolipoprotein E (APOE), were associated with the occurrence of OA, whereas others had little effect on OA. Another study developed an ML heuristic based on BioHEL, ranked guided iterative feature elimination (RGIFE), whose diagnostic accuracy for OA in proteomics and transcriptomics data sets was better than that of other ML methods. 51
The application of ML has significantly improved the efficiency of research in the context of the Big Data era. A primary challenge faced by multi-omics studies is the under-representation of the lower abundance of potentially valuable proteins. Therefore, ML sensitivity plays an important role in resolving this issue. 13 Further analyses should use larger sample sizes and more replicate populations to investigate the individual proteins in the networks and to provide patients with individualized treatment guidance. Moreover, building more publicly available data sets, developing more transparent and reproducible computational analyses, and validating biomarkers across populations and time points are of crucial importance for early diagnostic OA biomarkers derived from multi-omics studies.
ML application in other studies
In addition to the above methods, many other studies have applied ML approaches for the early diagnosis of OA, such as gait analysis, vibroarthrography, and cartilage texture analysis.
Kotti et al. 52 built a computer system that used body kinetics as input and produced an output of an estimation of the KOA presence that filled the interpretability gap between engineering and medical approaches. They used RF regressors to map the parameters by rule induction for KOA grading that led to an accuracy of quintuple cross-validation of 72.60% ± 4.20%. Vibroarthrography can evaluate the cartilage damage of KOA during extension and flexion movements that is cheap and radiation-free. 53 Using ML with a linear SVM, a satisfactory classification performance could be achieved that produced a specificity of nearly 0.80 and a sensitivity of 0.75. 54 Lim et al. 55 employed a DNN method for OA detection using 5749 patients’ medical utilization data and their health behavior information. They used PCA with quantile transformer scaling to generate features from background medical information of the participants. This study yielded an AUC of 0.77, and the proposed method minimized the effort for features generation.
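Cross-validated results such as the 72.60% ± 4.20% accuracy quoted above are obtained by repeatedly holding out one fold for testing, training on the remaining folds, and summarizing the per-fold scores as a mean and standard deviation. The sketch below illustrates this procedure with a deliberately simple one-feature threshold classifier. The classifier, the function names, and the toy data are assumptions chosen for illustration and are unrelated to the RF regressors used in the cited study.

```python
import statistics


def kfold_indices(n_samples, k):
    """Yield (train, test) index lists; fold sizes differ by at most one sample."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def fit_threshold(values, labels):
    """Toy model: threshold halfway between the two class means.

    Assumes both classes occur in the training data and class 1 has larger values.
    """
    mean_0 = statistics.mean([v for v, y in zip(values, labels) if y == 0])
    mean_1 = statistics.mean([v for v, y in zip(values, labels) if y == 1])
    return (mean_0 + mean_1) / 2


def cross_validate(values, labels, k=5):
    """Return mean and sample standard deviation of per-fold accuracy."""
    fold_accuracy = []
    for train, test in kfold_indices(len(values), k):
        threshold = fit_threshold([values[i] for i in train], [labels[i] for i in train])
        correct = sum(1 for i in test if (values[i] > threshold) == (labels[i] == 1))
        fold_accuracy.append(correct / len(test))
    return statistics.mean(fold_accuracy), statistics.stdev(fold_accuracy)


if __name__ == "__main__":
    # Hypothetical single gait-derived feature; labels alternate so every fold has both classes.
    feature = [1.0, 5.0, 1.2, 5.1, 0.9, 4.8, 1.1, 5.3, 1.3, 4.9]
    has_koa = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
    mean_acc, sd_acc = cross_validate(feature, has_koa, k=5)
    print(f"accuracy: {mean_acc:.2%} +/- {sd_acc:.2%}")
```

In real studies, samples are usually shuffled or stratified by class before being split, and all samples from one patient should be kept in the same fold to avoid optimistic estimates.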
One study included 407 middle-aged women with a mean body mass index (BMI) of 27 kg/m² and without symptomatic KOA, and built an analytics pipeline with the RGIFE method to predict the early incidence of KOA using clinical variables, food and pain questionnaires, and biochemical and imaging information. 56 Five highly predictive models were proposed that could possibly be adopted for early diagnosis of the disease. All models achieved an AUC > 0.70 even when only a few variables were used. The study demonstrated the potential of applying the ML approach to generate predictive models that integrate all types of information for KOA diagnosis. The method proposed in this study is general and can be extended to other diseases. The influence of BMI on the development of OA is well recognized. Therefore, subsequent research could consider a broad range of information, including BMI, to increase the applicability of diagnostic models.
Perspectives of ML in OA
In this review, we summarized various early predictive models based on ML, and different perspectives that can be applied in patients with OA, including radiographs, MRI, biochemical biomarkers, and multi-omics (Table 1 and Figure 1). The commonest method for radiographic definition is the KL grading system and atlas, which has been used for more than 60 years and contains common limitations of reader-dependent reliability. 21 The widely used American College of Rheumatology (ACR) classification criteria for clinical OA diagnosis was first proposed in 1995. 57 Comprehensive predictive models combining multidimensional information are needed to early diagnose OA and further understand the etiology and progression of joint degeneration. Effective prediction models could pave the way for developing efficient and personalized therapies for OA patients.
Table 1. Extracted data for OA early diagnosis using machine-learning techniques.
3D, three-dimensional; AP, average precision; AUC, area under curve; BioHEL, bioinformatics-oriented Hierarchical Evolutionary Learning; BMI, body mass index; CCBR, Center for Clinical and Basic Research; CNN, convolutional neural network; DESS, dual-echo in steady state; DNN, deep neural network; ECM, extracellular matrix; eOA, early-stage OA; eRA, early rheumatoid arthritis; FIESTA, fast imaging employing steady-state acquisition; FL, femoral-lateral; FM, femoral-medial; FS, fat suppression; FSE, fast spin echo; GLCM, gray-level co-occurrence matrices; GP, General Practitioner; IL, interleukin; IPFP, infrapatellar fat pad; JSN, joint space narrowing; KL, Kellgren-Lawrence; KNN, k-nearest neighbor; KOA, knee OA; KR, knee replacement; LDA, linear discriminant analysis; LR, logistic regression; MGP, matrix Gla protein; MidL, lateral mid-part; MidM, medial mid-part; ML, machine learning; MMP, matrix metalloproteinase; MOST, Multicenter Osteoarthritis Study; MRI, magnetic resonance imaging; NA, not available; NN, nearest-neighbor; OA, osteoarthritis; OAI, Osteoarthritis Initiative; PA, prealbumin; PCA, principal components analysis; PCI-CT, phase-contrast X-ray computed tomography; PD, proton density; QDA, quadratic discriminant analysis; R-CNN, Region-CNN; RF, random forest; RGIFE, Ranked Guided Iterative Feature Elimination; ROA, radiographic osteoarthritis; ROI, region of interest; RR, ridge regression; SBL, lateral subchondral bone; SBL, subchondral bone length; SBM, medial subchondral bone; SGE, spoiled gradient echo; SIM, Scaling Index Method; SVM, support vector machine; TBL, lateral trabecular bone; TBM, medial trabecular bone; TDA, topological data analysis; TGF-β, transforming growth factor-β; TL, tibial-lateral; TM, tibial-medial; TMJ, temporomandibular joint; VOIs, volumes of interest; wkNN, weighted k-nearest-neighbor; WND-CHRM, weighted neighbor distance using compound hierarchy of algorithms representing morphology; wNN, weighted nearest-neighbor.

Figure 1. Machine-learning model working flow chart.
Recent studies have regarded MRI, especially high-resolution MRI, as an important imaging tool in clinical and research applications. However, early diagnostic criteria for OA using MRI have yet to be established. It is accepted that high-resolution MRI reflects cartilage changes earlier than radiographs; however, the course of OA is typically diverse and, in an individual patient, largely unpredictable. 58 In the future, supervised learning methods could significantly improve the predictive ability of high-resolution MRI. Moreover, although aging has always been considered an important etiological factor for OA, the mechanisms are still not fully understood. Current ML methods cannot distinguish osteoarthritic structural changes from the manifestations of normal aging. 59
Biomarkers are essential for the early diagnosis of OA. However, the development of biochemical biomarkers that can accurately and early diagnose OA is far from trivial, since changes in the disease are subtle. Recent advances in multi-omics technology, including genomics, transcriptomics, proteomics, and metabolomics, have promoted the development of new methods to identify disease biomarkers. Bioinformatics tools have detected several OA-related proteins and have shown the potential to investigate the individual early diagnosis of OA.
Owing to the complexity and heterogeneity of OA, it is unlikely that a single marker will provide an accurate early diagnosis of the disease. Applying ML to KOA remains challenging because of the high heterogeneity among OA subtypes and localizations, which calls for more diverse studies. Moreover, how best to divide OA into phenotypes and endotypes, and whether certain subsets are of any clinical value, are important issues that require internal validation in the discovery cohort and external validation in other populations. In this regard, the availability of high-quality and easy-to-obtain patient data, ideally longitudinal and from different populations, is key for OA phenotyping and/or endotyping research.60–62 Therefore, additional research combining measurements from MRI, biomarkers, clinical manifestations, and multi-omics is needed to build a more accurate and generalizable model for the early diagnosis and prediction of OA. Along with the development of related techniques, a growing number of public and private data sets will provide extremely large amounts of data for OA research. Patient-specific prognosis estimation and personalized prevention and treatment of OA may thus become achievable. Further studies are needed to verify whether current ML algorithms are also applicable to other data sets and to establish their generalizability.
One major challenge that needs to be highlighted is the lack of consistent quality and standardization in extracting OA features across studies, particularly with respect to acquisition techniques. Although data sources are becoming larger and larger, the lack of high-quality labeling may reduce their effectiveness in ML model training. Until now, the principle behind the feature maps, and the target location of the most important features extracted by ML, have remained unknown. Therefore, standardized protocols and acceptance testing should be established to ensure that ML studies meet certain criteria before routine clinical use. Moreover, most studies applying ML to OA trained and tested their models on the same data set, which often yields high comparability but limited generalizability. In addition, most of the cohorts currently available predate the interest in ML approaches in the field. This situation reinforces the need for purpose-built cohorts for ML analyses, and future studies should aim to test ML models on broad data to demonstrate their robustness and generalizability.63–65
Although ML could be an important tool to aid the work of researchers, and eventually clinicians, in the early identification of OA, acceptable quality standards for real-world applications have not been fully met. It is hoped that the heightened pace of external validation added to the current pace of model development and internal validation will accelerate the translation to the accurate, rapid, and cost-effective detection of OA in the near future. 66
The main limitation of ML in OA is that the performance of the algorithm may be biased. Although a method may perform well on the entire data set, its performance can still be unsatisfactory when the test set is stratified according to the stage of OA development. Therefore, future studies should focus on improving performance when OA is not severe. In addition, most current studies are based on imaging and lack external validation and real-world clinical application, and few have combined clinical and biological data. It is therefore necessary to use larger-scale and multifaceted data to make diagnostic models more credible and reproducible. Most of the applied ML algorithms are based on supervised learning, which may limit the ability to identify new phenotypes. Moreover, developing interpretable ML models that can determine which features contribute to a specific model decision is also a challenge. 65 Despite these limitations, we believe that investigational methodologies are emerging in clinical trials and could systematically provide better quantitative information about the joints of patients who already have OA. 21
In conclusion, we have summarized current ML-based predictive models for the early diagnosis of OA. An accurate prediction model that can detect early changes in OA before the emergence of clinical features is expected to alter the current state of OA diagnosis. Although no set of imaging or biological markers currently allows routine early diagnosis of OA in clinical practice, the diagnostic performance of fusion models combining multidimensional information could enable patient-specific prognosis estimation and personalized therapy for OA in the future. In addition, to standardize and improve the quality of such studies, it is necessary to develop ML checklists, consensus statements, and training for the OA scientific community.
