Abstract
Objective
This study aimed to examine the performance of machine learning models in predicting the progression of knee pain, functional decline, and incidence of knee osteoarthritis (OA) in high-risk individuals, with automated machine learning (AutoML) being used to automate the prediction process.
Design
There were four stages in the process of our AutoML-integrated prediction. Stage 1—Data preparation: The data of 3200 eligible individuals in the Osteoarthritis Initiative (OAI) study who were considered at high risk of knee OA at the baseline visit were extracted and used. Specifically, 1094 variables from the OAI study were used to predict the changes in knee pain, physical function, and incidence of knee OA (i.e. the first occurrence of frequent knee symptoms and definite tibial osteophytes (Kellgren and Lawrence grade ≥2)) over a 9-year period. Stage 2—Model training: The AutoML approach was used to automatically train nine widely used machine learning (ML) models. Stage 3—Model testing: The AutoML approach was used to automatically test the performance of the ML models. Stage 4—Selection of important input variables: The AutoML approach automated the process of computing the importance scores of all input variables and identifying the most important ones, using the technique of permutation feature importance.
Results
Using the AutoML approach, the weighted ensemble model and the CatBoost model showed the best performance among all nine ML models. For the prediction of each outcome in each year, the five most important input variables were identified, most of which were obtained from self-reported questionnaire surveys and radiographic imaging reports.
Conclusion
The AutoML approach has shown potential in automating the process of using ML models to predict long-term changes in knee OA-related outcomes. Its use could support the deployment of ML solutions, facilitating the provision of personalized interventions to prevent the deterioration of knee health and incident knee OA.
Introduction
Knee osteoarthritis (OA) is a leading cause of knee pain, immobility, and poor quality of life, which affects more than 20% of middle-aged and older adults worldwide.1–5 Many more people will be at risk of knee OA in future because of aging and obesity, 6 so it is critical for OA to be prevented or to be detected early. Typically, diagnosis of knee OA relies on patient-reported symptoms and evidence from X-radiographic images, but individuals may not notice subtle changes in their knee health, and X-radiography has limitations in detecting early disease.7–9 As a result, clinicians may fail to ensure that their patients take necessary precautions to prevent the progression of knee pain, functional decline, and the occurrence of knee OA.
Artificial intelligence-based prognostic models may enable the prediction of incident knee OA, and machine learning (ML) algorithms such as logistic regression 10 and k-nearest neighbors (k-NN) 11 have been used for this purpose. However, there are limitations to such approaches. First, the deployment of high-performance models demands manual preprocessing of data, feature engineering, model selection, hyperparameter tuning, and model testing, all of which can be complex and time-consuming. 12 As such, clinicians who lack knowledge and experience in programming may struggle to use predictive models. Second, the risk factors that were used as the input variables in previously developed predictive models may not be the most important variables to use for accurate prediction. 13 Identifying the most important factors for precise prediction is an ongoing challenge in this field.
Automated machine learning (AutoML) can be used to automatically execute the processes of implementing ML models to generate predictions (e.g. selecting and combining ML models, optimizing hyperparameter settings, and achieving optimal performance) in a more flexible, robust, and efficient manner than by using traditional ML techniques.14–16 With the assistance of AutoML, clinicians would be able to devise and deploy ML solutions without the need to expend extensive time and effort in model training, testing, and identification of important input variables. 14 Despite these benefits, little research has been conducted on the use of AutoML to assist in the process of implementing ML models to predict knee health conditions in individuals at high risk of knee OA.
Therefore, the aim of the present study was to apply an AutoML approach to automatically manage the processes of implementing ML models and to examine the performance of these models in predicting the progression of knee pain, functional decline, and incidence of knee OA in individuals at high risk of knee OA. Data from the Osteoarthritis Initiative (OAI) study, which comprise over 3000 eligible participants and over 1000 input variables (i.e. clinical and imaging data collected from the OAI clinical visits), were used to train and test the ML models, and the most important variables for the predictions were identified.
Methods
Study sample
The OAI (https://nda.nih.gov/oai) is a longitudinal study of the natural progression of knee OA in 4796 individuals aged 45 to 79 years from four clinical centers in the United States. These individuals are examined annually, and the OAI clinical data collected at the baseline visit and nine follow-up visits have been made public. Ethical approval was obtained from the Institutional Review Boards of four OAI clinical sites, located at Baltimore, Maryland; Columbus, Ohio; Pittsburgh, Pennsylvania; and Pawtucket, Rhode Island. All participants included in the OAI study provided written informed consent.
In the current study, we used the OAI study's eligibility criteria to include individuals at high risk of knee OA. The OAI study defines individuals at high risk of knee OA (i.e. the incidence cohort) as those who had risk factors at baseline but were not diagnosed with symptomatic tibial-femoral OA, that is, those who did not have frequent knee symptoms (pain, soreness, or stiffness in or around the knee on most days for at least 1 month in the past 12 months) and definite tibial osteophytes (Kellgren and Lawrence imaging grade ≥2) in the same knee. 17 The risk factors used by the OAI study to identify the individuals at high risk of knee OA (n = 3284) from all of the participants include age above 70, overweight, presence of knee symptoms during the past 12 months, previous knee injury, previous knee surgery, family history of total knee replacement for knee OA, and Heberden's node in the hand. 17
Knee OA-related outcomes
Three knee OA-related outcomes were used in our predictions: progression of knee pain, functional decline, and incidence of knee OA. We used the participants’ scores on the pain subscale and the physical function subscale of the Western Ontario and McMaster Universities Osteoarthritis Index (WOMAC) 18 to predict the progression of knee pain and functional decline over nine years. The pain subscale contains five items on knee pain: knee pain during walking, while using stairs, while in bed, while sitting or lying, and while standing upright. The physical function subscale of the WOMAC comprises 17 items measuring the difficulty of performing daily activities, such as using stairs, rising from sitting, bending, shopping, and performing domestic duties. The items are scored on a 5-point Likert scale ranging from 0 (none) to 4 (extreme), with a maximum score of 20 for the pain subscale and 68 for the physical function subscale. A high score indicates worse knee health. With respect to the prediction of incident knee OA, the participants included in this study had no knee OA at baseline, and we predicted whether they exhibited knee OA at the follow-up visits each year. According to the OAI study, the incidence of knee OA is defined as the first occurrence of frequent knee symptoms (pain, soreness, or stiffness in or around the knee on most days for at least 1 month in the past 12 months) and definite tibial osteophytes (Kellgren and Lawrence imaging grade ≥2) in the same knee. 17
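The scoring rules and the incidence definition described above can be illustrated with a minimal sketch. It assumes that each subscale score is the plain sum of its 0 to 4 item scores, which is consistent with the stated maxima of 20 and 68; the function names are hypothetical and are not taken from the OAI data dictionary.

```python
def womac_subscale_score(item_scores, n_items):
    """Sum Likert items (0 = none ... 4 = extreme) into a subscale score."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(item_scores)}")
    if any(s not in (0, 1, 2, 3, 4) for s in item_scores):
        raise ValueError("each item must be an integer from 0 to 4")
    return sum(item_scores)


def incident_knee_oa(frequent_symptoms, kl_grade):
    """True when frequent symptoms and KL grade >= 2 occur in the same knee."""
    return bool(frequent_symptoms) and kl_grade >= 2


# Maximum possible scores: 5 pain items and 17 physical function items.
pain_max = womac_subscale_score([4] * 5, n_items=5)
function_max = womac_subscale_score([4] * 17, n_items=17)
print(pain_max, function_max)  # 20 68
```

Both conditions must hold for the same knee, so a knee with frequent symptoms but a Kellgren and Lawrence grade of 1 would not count as an incident case in this sketch.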
Input variables used in our predictions
We first excluded the administrative variables from the OAI study data, such as data collection date and staff identifications, and finally extracted 1094 variables, which we subsequently used in our prediction of the progression of knee pain, functional decline, and incidence of knee OA in the participants over nine years. The input variables we used to generate our predictions comprised demographic information, joint symptoms and functions, quality of life, medical history, surgery history, performance of physical examinations, physical activity, dietary nutrition intake, knee examination, and assessments of knee X-ray images.
AutoML-based prediction
For each of the three outcomes, we used the baseline data of both knees as input variables and predicted the conditions of the right and left knees separately for each follow-up visit. This approach would enable individuals to have a better understanding of their knee health, such as which knee may experience deterioration in specific years in the future, allowing for the provision of more personalized interventions. Figure 1 presents the four stages in the ML pipeline of AutoML applied in our prediction task. We applied AutoGluon 15 (an open source AutoML platform) to execute the processes of model training and testing automatically. This platform has been shown to be faster, more robust, and much more accurate than many other AutoML platforms. 15 The details of the four stages are described below.

Pipeline of automated machine learning used to predict the progression of knee pain, functional decline, and incidence of knee osteoarthritis in the participants.
Stage 1—Data preparation
The data records of the OAI's participants who were considered at high risk of knee OA at the baseline visit were located, and input variables and their outcome data were extracted. Outcome data acquired after knee replacement surgery and outcome data of individuals who had died were not extracted or used. After excluding the participants with no outcome data, the remaining sample was used for the training and testing of the predictive models. Their data were randomly divided into a training set (90% of the data) to fit the parameters of the predictive models, and a testing set (10% of the data) to examine the performance of the models by comparing the predicted values with the true values of the outcomes. The procedure of data splitting was performed 10 times, along with the following processes of model training and testing, to alleviate the problem of overfitting or selection bias. 19 The mean performance of each model across these 10 runs was reported.
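The repeated 90%/10% splitting procedure can be sketched as follows. This is an illustration under stated assumptions rather than the study's actual code: the helper names, the use of the run number as the random seed, and the placeholder evaluator are all hypothetical.

```python
import random
from statistics import mean


def split_90_10(n_rows, seed):
    """Return (train_idx, test_idx) for one random 90%/10% split."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_test = round(0.10 * n_rows)
    return idx[n_test:], idx[:n_test]


def repeated_holdout(n_rows, evaluate, n_runs=10):
    """Average a test metric over n_runs independent random splits."""
    scores = []
    for run in range(n_runs):
        train_idx, test_idx = split_90_10(n_rows, seed=run)
        scores.append(evaluate(train_idx, test_idx))
    return mean(scores)


# Placeholder evaluator: a real one would fit a model on train_idx
# and return its RMSE or AUC on test_idx.
def held_out_fraction(train_idx, test_idx):
    return len(test_idx) / (len(train_idx) + len(test_idx))


print(repeated_holdout(3200, held_out_fraction))  # 0.1
```

With 3200 participants, each run in this sketch holds out 320 records for testing and trains on the remaining 2880.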
Stage 2—Model training
We applied a random forest algorithm, an extra-trees algorithm, a gradient-boosting algorithm (using five implementations), a k-NN algorithm, and a weighted ensemble approach 15 to develop regression models to predict the progression of knee pain and functional decline, and applied classification models to predict the incidence of knee OA. We used AutoML to automatically search for hyperparameters to optimize the performance of each model in the training set.
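To make the idea of a weighted ensemble concrete, the sketch below blends the predictions of several base models by a weighted average and chooses the weights by greedy forward selection on RMSE. The selection rule is an assumption made for illustration; the text does not describe the exact weighting procedure used by the AutoML platform, and the model names and toy predictions are hypothetical.

```python
from math import sqrt


def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def blend(preds_by_model, weights):
    """Weighted average of per-model prediction lists."""
    n = len(next(iter(preds_by_model.values())))
    return [sum(weights[m] * preds_by_model[m][i] for m in weights) for i in range(n)]


def greedy_ensemble_weights(preds_by_model, y_true, n_rounds=20):
    """Repeatedly add (with replacement) the model that most lowers RMSE."""
    counts = {m: 0 for m in preds_by_model}
    for step in range(1, n_rounds + 1):
        best_model, best_err = None, float("inf")
        for m in preds_by_model:
            # Candidate weights if model m were added at this step.
            trial = {k: (counts[k] + (k == m)) / step for k in counts}
            err = rmse(y_true, blend(preds_by_model, trial))
            if err < best_err:
                best_model, best_err = m, err
        counts[best_model] += 1
    return {m: c / n_rounds for m, c in counts.items()}


# Toy validation data: two models with opposite biases and one noisy model.
y_true = [2.0, 4.0, 6.0, 8.0]
preds = {
    "model_a": [3.0, 5.0, 7.0, 9.0],
    "model_b": [1.0, 3.0, 5.0, 7.0],
    "knn": [8.0, 2.0, 9.0, 1.0],
}
weights = greedy_ensemble_weights(preds, y_true)
print(weights)
print(rmse(y_true, blend(preds, weights)))
```

In this toy case the two biased models cancel each other out and the noisy model receives no weight, which shows why an ensemble can match or exceed its best member on the data used to fit the weights.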
Stage 3—Model testing
We calculated the root-mean-square errors (RMSEs) to evaluate the performance of the models in predicting knee pain and functional decline, with a small RMSE indicating high accuracy, that is, a small difference between the predicted and actual WOMAC scores. For the incidence of knee OA, we calculated the area under the receiver operating characteristic curve (AUC) to test the accuracy of binary classification, with a large AUC suggesting that a model exhibited good performance in distinguishing between the knee OA and no knee OA conditions. In theory, the weighted ensemble model should perform at least as well as the best individual models in the training set; however, because of overfitting, it can produce less accurate predictions in the testing set. Thus, we compared the performance of all models to identify any potential differences.
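The two metrics can be written out in a few lines. The sketch below is a plain-Python illustration with made-up inputs, not the evaluation code used in the study; the AUC is computed from its pairwise-ranking interpretation, with ties counted as half.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root-mean-square error; lower means predictions are closer to the truth."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def auc(labels, scores):
    """Probability that a random positive case is scored above a random negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(rmse([4, 10, 0], [5, 8, 2]))               # 1.7320508075688772
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to chance-level ranking, so values below 0.5 indicate that a model ranks cases worse than random guessing would.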
Stage 4—Selection of important input variables
After we had predicted the outcomes using all 1094 input variables, we employed the technique of permutation feature importance 20,21 to calculate the importance score of each variable and thus identify the most important variables. The higher the score of a variable, the more important it was to the predictive accuracy. Specifically, the values of each variable were permuted individually, and the predictive accuracy was assessed after each permutation; the decrease in accuracy indicated how much the predictive performance relied on the variable. 21 This study focused on the importance of features in determining the predictive performance of ML models, regardless of the direction of the feature effect. We identified the five most important variables for the prediction of each outcome in each year.
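The permutation procedure described above can be sketched as follows for a regression outcome. This is a simplified illustration: the number of repeats, the random seed, and the toy model are assumptions, and the importance score is taken here as the mean increase in RMSE after shuffling one column.

```python
import random
from math import sqrt


def rmse(y_true, y_pred):
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def permutation_importance(predict, rows, y_true, n_repeats=5, seed=0):
    """Mean increase in RMSE when one column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = rmse(y_true, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only this column's value replaced.
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(rmse(y_true, [predict(r) for r in permuted]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical fitted model: depends only on column 0 and ignores column 1.
def toy_model(row):
    return 2.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(30)]
y = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, y)
top = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
print(top[:5])  # column indices ranked by importance, highest first
```

Shuffling a column that the model ignores leaves the error unchanged, so its score is zero, whereas shuffling an informative column raises the error. Ranking the scores and keeping the first five entries mirrors the selection of the five most important variables.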
Results
Study sample
After removing the 84 individuals who had been enrolled in the OAI study but were later found to have no outcome data due to knee replacement surgery or confirmed death, a total of 3200 individuals who were considered at high risk of knee OA at baseline were included as participants in the current study to generate predictive models. Table 1 summarizes the baseline characteristics of the participants.
Baseline characteristics of the 3200 participants in this study.
WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index.
Prediction results
Figure 2 presents the performances of the ML models for each outcome, and Appendix A shows their detailed performance data. With increasing years, the models’ RMSEs increased and their AUCs slightly decreased, indicating that their performances were worsening over time.

Predictive performances of the models over nine years.
Specifically, the results indicated that the weighted ensemble model, category gradient-boosting (CatBoost) model, and extra-trees model exhibited the best performance in predicting pain around the right knee, as these models each had mean RMSEs over nine years of 2.27 (SD = 0.16) on the 0–20 WOMAC scale. For pain around the left knee, the weighted ensemble model showed the best performance (mean RMSE = 2.30, SD = 0.13). The k-nearest neighbors (k-NN) model performed the worst in predicting the progression of knee pain over nine years in both knees (right knee: mean RMSE = 2.83, SD = 0.16; left knee: mean RMSE = 2.88, SD = 0.14).
With respect to functional decline over nine years, for the right knee, the weighted ensemble model (mean RMSE = 7.01, SD = 0.65) and CatBoost model (mean RMSE = 7.01, SD = 0.66) showed the best predictive performance, and for the left knee, the weighted ensemble model showed the best predictive performance (mean RMSE = 7.33, SD = 0.59). In contrast, the k-NN model showed the worst predictive performance for both knees (right knee: mean RMSE = 9.05, SD = 0.47; left knee: mean RMSE = 9.46, SD = 0.61).
Most of the models exhibited high performance in predicting the incidence of OA in both the right and left knees, with the mean AUC over nine years ranging from 0.78 to 0.81. The best-performing models were the weighted ensemble model, CatBoost model, and extra-trees model (right knee: mean AUC = 0.81, SD = 0.01; left knee: mean AUC = 0.81, SD = 0.02). In contrast, the worst-performing model was the k-NN model, with a mean AUC less than 0.5 for both knees (right knee: mean AUC = 0.47, SD = 0.07; left knee: mean AUC = 0.48, SD = 0.02).
Important input variables for prediction
From all the input variables collected at the OAI baseline visit, we identified the five most important ones for the prediction of each outcome in each year (see Appendix B for the priorities and importance scores of each important variable). The important variables identified for the prediction of all three outcomes over nine years are summarized in Table 2 and were found by analyzing self-reported questionnaire surveys, performance measures, clinical examination reports, and radiographic imaging reports. Some questionnaire-based variables were found to be important for predicting the progression of knee pain and functional decline, such as the WOMAC score, Knee injury and Osteoarthritis Outcome Score (KOOS), severity of knee pain, and Medical Outcomes Study 12-item Short-Form Health Survey score. With respect to the incidence of knee OA, we found that the radiographic data (including data on the composite OA grade, osteophytes, and joint space narrowing) showed the greatest impacts on prediction results, and other important variables included BMI, knee examination results, dietary nutrition intake, etc.
Summary of the most important input variables for the prediction of progression of knee pain, functional decline, and incidence of knee OA over nine years.
OA = osteoarthritis, WOMAC = Western Ontario and McMaster Universities Osteoarthritis Index, KOOS = Knee injury and Osteoarthritis Outcome Score, SF-12 = Medical Outcomes Study (MOS) 12-item short-form health survey, JSN = joint space narrowing.
Discussion
In this study, we applied AutoML to automate the process of using ML models to predict the progression of knee pain, functional decline, and incidence of knee OA in individuals at high risk of such outcomes, and determined the predictive performance of the models. Out of the nine predictive models we used, the weighted ensemble model and the CatBoost model performed the best. This may be attributable to the fact that the weighted ensemble model uses a multi-layer strategy that integrates multiple ML models; as a result, it has demonstrated superior performance in model training and achieved one of the best performances in model testing. 15 It may also be attributable to the fact that, compared with traditional gradient-boosting models, the CatBoost model handles categorical variables better and is less prone to overfitting. 22 In contrast, we found that the k-NN model exhibited worse predictive performance than the other models. This may be attributable to the k-NN model having limitations in dealing with a large number of input variables, as it is ineffective at determining the most important variables. 23
Future studies can adopt our prediction and AutoML approach to automate the process of model training and testing, and then obtain prediction results for clinical use in various contexts. For instance, our approach can be used to predict various outcomes, such as lower limb muscle strength, joint space width, and the probability of total knee replacement. It can also be applied to predictions that use different input variables, including a set of variables predetermined by clinicians and variables collected over multiple years. Furthermore, it can also be used to conduct predictions on different datasets, such as data collected from different patient populations or specific subgroups.
We examined the importance of all input variables collected at baseline and identified the ones that contributed most to the predictive accuracy of the ML models. The identification of the most important variables can help simplify the process of prediction, allowing clinicians to make predictions using a smaller set of important input variables. For the three outcomes investigated in our study, the most important variables varied. It was not surprising to find that the most important variables for predicting the progression of knee pain and functional decline included the subscale and total scores of questionnaire-based measures (e.g. WOMAC and KOOS) collected at baseline, as in our predictions these two outcomes were determined by examining the changes in WOMAC scores. 24 Regarding the prediction of the incidence of knee OA, radiographic data collected at baseline were found to have greater importance than other variables. One reason for this could be that radiographic evidence is considered a mainstay in the diagnosis of knee OA. 25 Another reason could be that the individuals with radiographic evidence of knee OA at baseline were likely to develop knee symptoms,26,27 which would result in a diagnosis of knee OA with both radiographic evidence and knee symptoms in the follow-up visits. Moreover, there were differences in important variables for predicting the outcomes in left and right knees. Specifically, when predicting outcomes for the left knees, we found that a majority of important input variables were associated with the left knees, and the same pattern was observed for the predictions on right knees. This could be because the input variables measured on one knee reflect the health condition of that knee more accurately and are therefore more influential in predicting outcomes for the same knee.
Our findings have several implications for research. First, there was a decreasing trend in the predictive accuracy of our models over time, indicating that short-term prediction was more accurate than long-term prediction. Thus, it is necessary to develop models that can achieve higher accuracy for long-term prediction than the models used in our study. Alternatively, to improve the accuracy, it may be beneficial to use the data from multiple visits as input variables, instead of solely from the baseline visit, as research has shown that using data from longer periods can obtain more robust results compared to using short-term data when predicting the cartilage loss trajectory among OAI participants.28,29 Second, out of over 1,000 variables, we determined only the five most important input variables for prediction. In future research, it would be useful to allow clinicians to determine the appropriate number of input variables for prediction and select the variables based on clinical need. Third, future studies could use more datasets than we used in this study, as this would enable the validation of AutoML's predictive performance in more healthcare settings, such as predicting the progression of other chronic diseases.
In practical settings, accurate prediction results can enable both clinicians and individuals at high risk to have a clear prospective understanding of the potential progression of knee OA-related symptoms, allowing them to take action to manage knee health in a timely manner before their condition worsens. For example, we performed predictions for both left and right knees to provide more personalized results. This would allow individuals to be aware of the knee that is more vulnerable and take appropriate precautions, such as avoiding injury and minimizing excessive usage of that knee. Moreover, based on personalized prediction results, interventions can be customized to meet the demand for individualized healthcare. For example, individuals at high risk of knee OA who experience a significant decline in physical function could be encouraged to take immediate actions, such as having regular check-ups, implementing body weight control, or taking exercise.30,31 It is also essential to deliver personalized education to individuals at high risk on the process of OA, its pain mechanisms, and preventive strategies. 32 Further, new interventions can be developed and implemented to improve knee health. Similar to the implementation of other healthcare technologies, their usability, acceptance, and effects on knee health are worth further investigation among the end users.33–39
Conclusions
AutoML was found to be able to automate the process of using ML models to predict the progression of knee pain, functional decline, and incidence of knee OA in individuals at high risk of these outcomes. The weighted ensemble model and the CatBoost model showed the best performance among all nine ML models examined. Among all the input variables, the questionnaire-based outcomes and radiographic data were found to be more important than others in prediction. For future practice, with the use of AutoML, clinicians would be able to generate predictions without the need for extensive knowledge of and experience in ML. This would allow the development of personalized interventions for the prevention of health issues. Moreover, the use of this predictive approach for assessing individuals at high risk of knee OA would help make them sufficiently aware of their knee problems to take immediate action to improve their knee health.
Supplemental Material
sj-docx-1-dhj-10.1177_20552076231216419 - Supplemental material for Automated machine learning-based prediction of the progression of knee pain, functional decline, and incidence of knee osteoarthritis in individuals at high risk of knee osteoarthritis: Data from the osteoarthritis initiative study
Supplemental material, sj-docx-1-dhj-10.1177_20552076231216419 for Automated machine learning-based prediction of the progression of knee pain, functional decline, and incidence of knee osteoarthritis in individuals at high risk of knee osteoarthritis: Data from the osteoarthritis initiative study by Tianrong Chen and Calvin Kalun Or in DIGITAL HEALTH
Footnotes
Contributorship
TC and CKLO designed the study. TC conducted data analyses and drafted the manuscript. CKLO reviewed and revised the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Ethical approval of the OAI study was obtained from the Institutional Review Boards of four OAI clinical sites, located at Baltimore, Maryland; Columbus, Ohio; Pittsburgh, Pennsylvania; and Pawtucket, Rhode Island. All participants included in the OAI study provided written informed consent.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by the Department of Industrial and Manufacturing Systems Engineering, the University of Hong Kong.
Guarantor
CKLO.
Supplemental material
Supplemental material for this article is available online.
References
