Sage Journals: Discover world-class research

Abstract

Background:

Rheumatoid arthritis (RA) is a chronic autoimmune disease, and predicting clinical improvement is essential.

Objectives:

The aim of the present study was to identify predictor variables of clinical improvement in patients with RA using artificial intelligence (AI) models in a specialized RA center.

Design:

A retrospective cohort study in adult RA patients was conducted between January and June 2022. Follow-up data related to clinical improvement were taken from 6 to 12 months after baseline. Predictive models were generated with machine learning (ML) using the Python programming language. The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines were followed to harmonize this AI-based study.

Methods:

The response variable was classified as improved or non-improved. Patients were considered improved if they maintained or achieved a Disease Activity Score 28-joints (DAS28) <3.2 at the end of the follow-up period or experienced a decrease ⩾0.6 compared to baseline, regardless of the initial DAS28 value. Explainability techniques in AI were applied to identify the most relevant clinical features.

Results:

In total, 3161 RA patients were included. The median age was 65 years (interquartile range (IQR) 57–72), and 82.7% were female. The median disease duration was 8.3 years (IQR 4.9–11.3). The median baseline DAS28 was 2.1 (IQR 2.1–2.8). A total of 2668 patients (84.4%) were classified as improved and 493 (15.6%) as non-improved. Among the ML models, the Extra Trees model showed the highest sensitivity in cross-validation (0.841). Regarding clinical improvement prediction with the Shapley Additive Explanations method, low values of baseline DAS28 were positively associated with clinical improvement. The use of biologic disease-modifying antirheumatic drugs and the presence of anti-cyclic citrullinated peptide (anti-CCP) antibodies were related to an increased probability of non-improvement, which may be secondary to the severity of the disease.

Conclusion:

AI models in RA can predict clinical improvement at initial consultations, enabling targeted approaches. Disease severity may be influenced by anti-CCP positivity and the use of biologic therapies when conventional treatments fail.

Keywords

artificial intelligence, clinical improvement, machine learning, probability learning, rheumatoid arthritis

Introduction

Rheumatoid arthritis (RA) is a chronic autoimmune disease that not only affects small and large joints but also causes systemic involvement.¹ It affects 0.45%–1% of the population,² and in Colombia, the prevalence is estimated to be 1.49%.³ Over time, the disease increases patients’ functional limitation, affecting their quality of life and work activity.^4,5 When the disease is diagnosed and treated in its early stages, anatomical and functional damage is less than in long-standing untreated RA.^6–11 Therefore, early detection using different tools, together with targeted treatment based on the specific characteristics of each individual, is essential to prevent its progression.^1,4 The most commonly used strategy to achieve targeted treatment and reach remission or decreased disease activity based on clear objectives is called treat-to-target (T2T),^12,13 which proposes therapeutic goals based on disease activity and the patient’s functional limitations, thus guiding management strategies directed by an interdisciplinary team but on an individual basis.^13,14 To meet the goal of reducing disease activity until remission is achieved, the European League Against Rheumatism (EULAR) sets out treatment response objectives, guided by measuring disease activity with the Disease Activity Score 28-joints (DAS28),^15,16 which rate the treatment response and clinical improvement according to the level of change in the DAS28 from baseline to follow-up.^15,17 Although these scales have been widely validated and are used in daily consultation, they present certain problems in patient populations where the response to treatment is dynamic, with changes in the patient’s quality of life and functional status that affect the perception of pain and therefore the measured activity of the disease. Identifying models in which multiple variables enable the assessment of clinical improvement is therefore particularly relevant in the management of this disease.

Artificial intelligence (AI) is a branch of science that uses computing speed, capacity, and software to perform complex tasks through algorithm-based models that exhibit intelligent behavior.^18,19 In medicine,^19,20 AI-implemented models, such as machine learning (ML), enable computers to learn from experience and adjust the processing of specific information, creating patterns to make decisions based on predefined algorithms.²⁰ ML approaches, increasingly used in clinical research, can complement or even surpass the statistical methods traditionally used to predict clinical outcomes, since they incorporate information in a way that yields more precise predictions. ML models can handle complex nonlinear relationships between individuals’ attributes, which are difficult to model with traditional statistical methods. For this reason, ML is postulated as a competent tool for implementing complex multiparametric decision algorithms.²¹ ML encompasses deep learning, which evaluates and reprocesses multiple sets of data simultaneously until a conclusion is reached.²⁰ This tool is especially beneficial in clinical decision-making, as it helps not only to diagnose complex diseases but also to guide more personalized treatments based on a large and constantly growing amount of population data.

In rheumatic diseases, AI has acquired a crucial role in clinical decision-making, with RA being one of the conditions where its applications have been widely explored. AI models have demonstrated the ability to make predictions through various bioinformatics approaches, aiding in diagnosis, disease progression assessment, and, in some cases, predicting clinical improvement. Several studies have shown promising results in early diagnosis, impact assessment, and disease progression prediction.^22–25

Some studies²⁵ have shown that ML techniques, such as Random Forest, outperform traditional regression models, including multivariate logistic regression, in predicting patient outcomes with biologic disease-modifying antirheumatic drug (bDMARD) therapy. While regression models remain a standard approach in clinical research, classification-based ML models can capture complex, nonlinear relationships between multiple variables, potentially improving predictive performance in clinical settings. In the field of health, and particularly in rheumatology, this represents a step toward so-called “precision medicine,” the purpose of which is to make decisions on management and treatment tailored to each patient. Thus, the aim of the present study was to identify predictor variables of clinical improvement in patients with RA at a specialized RA center using AI models.

Materials and methods

Population and data collection

A retrospective cohort study was conducted in patients older than 18 years with a confirmed diagnosis of RA who met the validated international criteria for RA (2010 Rheumatoid Arthritis Classification Criteria, American College of Rheumatology/EULAR Collaborative Initiative)⁵ and who were routinely monitored for disease control at a specialized outpatient RA center. Patients seen between January and June 2022 were included. Patients with the onset of arthritis before the age of 18 years were excluded. This cohort included a mix of prevalent cases (patients already under follow-up at the center before the study period) and incident cases (newly diagnosed patients or those attending the center for the first time within the study period). However, no distinction was made between these groups in the analyses. This methodological aspect is acknowledged as a limitation and will be discussed later in the study.

Baseline data were obtained from the time of the patient’s last visit to the center within the aforementioned time period; follow-up data related to clinical improvement were taken from the closest follow-up performed between 6 and 12 months from the last visit.

The data collected included sociodemographic variables and comorbidities (presence or absence at any date up to the baseline). As low frequencies were reported for some comorbidities, hypertension, diabetes mellitus, and cardiovascular disease were grouped into a single variable called “cardiometabolic disease.” The following variables were defined and measured to assess the clinical status of the patients and were obtained from the clinical records. Rheumatoid factor and anti-cyclic citrullinated peptide (anti-CCP) antibodies were determined using standard immunoassays, with positivity defined according to laboratory reference ranges. C-reactive protein (CRP) was measured in mg/L, with values above 5 mg/L indicating elevated inflammation. Erythrocyte sedimentation rate (ESR) was measured in mm/h, with levels above 20 mm/h in men and 30 mm/h in women considered elevated. Leukocyte counts were obtained through routine complete blood counts, with normal ranges between 4000 and 11,000 cells/μL. Alanine aminotransferase (ALT) was measured in U/L, with normal levels ranging from 7 to 56 U/L. Glomerular filtration rate was estimated using the CKD-EPI formula, with values above 60 mL/min/1.73 m² indicating normal renal function. The Health Assessment Questionnaire (HAQ)²⁶ was used to evaluate the functional status of patients, with scores ranging from 0 to 3. Higher scores indicate greater disability, with a score of 0 representing no difficulty and 3 indicating an inability to perform daily activities. The DAS28²⁷ was calculated using a combination of tender and swollen joint counts, the ESR or CRP levels, and the patient’s global health assessment. DAS28 scores range from 0 to 10, with scores below 2.6 indicating remission, 2.6–3.2 suggesting low disease activity, 3.2–5.1 indicating moderate activity, and scores above 5.1 reflecting high disease activity.
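The DAS28 thresholds described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the handling of boundary values follows the inequalities used in Table 1. The formula shown is the commonly cited DAS28-ESR equation; it is an assumption that this variant applies here, since the text only states that ESR or CRP was used.

```python
import math


def das28_esr(tender28, swollen28, esr_mm_h, global_health_mm):
    """Commonly cited DAS28-ESR formula (assumed variant, not confirmed by the study).

    tender28 / swollen28: joint counts out of 28.
    esr_mm_h: erythrocyte sedimentation rate in mm/h (must be > 0).
    global_health_mm: patient global health on a 0-100 mm visual analogue scale.
    """
    return (0.56 * math.sqrt(tender28)
            + 0.28 * math.sqrt(swollen28)
            + 0.70 * math.log(esr_mm_h)
            + 0.014 * global_health_mm)


def das28_category(score):
    """Map a DAS28 score to the activity categories used in Table 1."""
    if score <= 2.6:
        return "remission"
    if score <= 3.2:
        return "low disease activity"
    if score <= 5.1:
        return "moderate disease activity"
    return "high disease activity"


# Example with the cohort medians reported in the Results (2.1 and 4.2).
print(das28_category(2.1), "|", das28_category(4.2))
```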

Medications were organized into different groups based on their pharmacological class. Opioid analgesics (medications containing any opioid component used for pain relief) such as codeine and tramadol were included, while non-opioid analgesics comprised acetaminophen and dipyrone. Nonsteroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen and naproxen were included. Antimalarials such as hydroxychloroquine and chloroquine were included. Sulfasalazine and leflunomide treatments were grouped into the category of conventional disease-modifying antirheumatic drugs (csDMARDs), while methotrexate treatment was considered a separate variable. Biologic disease-modifying antirheumatic drugs (bDMARDs) such as tumor necrosis factor inhibitors, interleukin-6 inhibitors, B-cell depleting agents, and T-cell costimulation inhibitors were included.

Defining the response variable

Clinical improvement was represented as a Boolean variable, classifying patients as improved or non-improved. Patients were considered improved if they maintained or achieved a DAS28 <3.2 at the end of the follow-up period or experienced a decrease ⩾0.6 compared to the baseline value, regardless of their initial DAS28 value, according to the Disease Activity Score and the EULAR criteria.¹⁷ The remaining individuals were classified as non-improved. Non-improved patients were defined as those who did not achieve low disease activity or who experienced disease flares.
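The response definition above can be written as a short rule. The following is a minimal sketch with a hypothetical function name; the rounding step is an implementation assumption added to avoid floating-point artifacts at the 0.6 boundary and is not described in the study.

```python
def is_improved(baseline_das28, final_das28):
    """Return True when a patient meets the improvement definition in the text."""
    # Criterion 1: DAS28 below 3.2 at the end of follow-up.
    reaches_low_activity = final_das28 < 3.2
    # Criterion 2: a decrease of at least 0.6 from baseline.
    # Rounding to 2 decimals avoids artifacts such as
    # 3.8 - 3.2 evaluating to 0.5999999999999996.
    decrease = round(baseline_das28 - final_das28, 2)
    return reaches_low_activity or decrease >= 0.6


# A patient who stays in remission, one who drops from 5.5 to 4.8,
# and one who worsens from 2.3 to 4.2.
print(is_improved(2.1, 2.1), is_improved(5.5, 4.8), is_improved(2.3, 4.2))
```

Under this rule a patient can be labeled improved while still in moderate or high disease activity, which is consistent with the 63 and 5 such patients listed in the improved column of Table 1.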

ML methodology

Input variables

The Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis, plus the use of AI (TRIPOD + AI) guidelines, were followed.²⁸ A descriptive analysis of the data was carried out to assess its distribution and identify missing values. During the selection of input variables, those with 40% or more missing data were excluded to minimize the risk of bias and ensure robust model performance. A high proportion of missing data can lead to unreliable estimates and negatively impact the predictive power of ML models.²⁹ Similarly, categorical variables with low variability (category frequency <10%) were excluded, as they provide minimal discriminatory power for classification tasks and may introduce noise rather than meaningful information.³⁰ As a result, 25 variables were preselected for modeling.
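The two exclusion rules above can be sketched as a simple screening step. This is an illustrative sketch only: the function name, the toy records, and the reading of "category frequency <10%" as "the rarest category covers less than 10% of observed values" are assumptions, not the study's code.

```python
from collections import Counter


def screen_variables(rows, categorical, max_missing=0.40, min_category_freq=0.10):
    """Return the variable names that pass both screening rules.

    rows: list of dicts with identical keys; None marks a missing value.
    categorical: set of variable names treated as categorical.
    """
    n = len(rows)
    kept = []
    for name in rows[0]:
        values = [row[name] for row in rows]
        # Rule 1: drop variables with 40% or more missing data.
        if sum(v is None for v in values) / n >= max_missing:
            continue
        # Rule 2 (assumed interpretation): drop categorical variables whose
        # rarest category covers less than 10% of the observed values.
        if name in categorical:
            observed = [v for v in values if v is not None]
            rarest_share = min(Counter(observed).values()) / len(observed)
            if rarest_share < min_category_freq:
                continue
        kept.append(name)
    return kept


# Hypothetical toy data: "crp" is 50% missing and "ckd" has a 5% category.
toy_rows = [
    {"das28": 2.1, "crp": None if i < 10 else 3.0, "ckd": i == 0, "mtx": i % 2 == 0}
    for i in range(20)
]
print(screen_variables(toy_rows, categorical={"ckd", "mtx"}))
```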

Preprocessing and modeling

The data were split into independent training (80%, n = 2528) and test (20%, n = 633) data sets. Predictive models were generated using only the training data set. The test data set did not participate in any internal training or validation of the predictive models; it was used only for the final external validation of each trained model. Missing data in the training set were imputed using simple imputation: the mean for numerical variables and the most frequent category (mode) for categorical variables. To address the class imbalance in the training set, the majority class was subsampled, and the Synthetic Minority Oversampling Technique (SMOTE) was applied to augment the minority class with synthetic data.
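The preprocessing steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the study's pipeline (which used PyCaret): all function names are hypothetical, the split is unstratified, and `smote_like` reproduces only the core interpolation idea of SMOTE (a synthetic point placed on the segment between a minority sample and one of its nearest minority neighbours).

```python
import math
import random
import statistics


def train_test_indices(n, test_fraction=0.20, seed=0):
    """Shuffle indices and hold out a test set, rounding its size up."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_fraction * n)
    return idx[n_test:], idx[:n_test]


def fit_simple_imputer(train_rows, numeric, categorical):
    """Learn fill values from the training rows only (mean or mode)."""
    fill = {}
    for name in numeric:
        fill[name] = statistics.mean([r[name] for r in train_rows if r[name] is not None])
    for name in categorical:
        fill[name] = statistics.mode([r[name] for r in train_rows if r[name] is not None])
    return fill


def apply_imputer(rows, fill):
    """Replace missing values (None) with the learned fill values."""
    return [{k: (fill[k] if v is None and k in fill else v) for k, v in r.items()}
            for r in rows]


def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance.
        neighbours = sorted((p for p in minority if p is not x),
                            key=lambda p: math.dist(x, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to nb
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


train_idx, test_idx = train_test_indices(3161)
print(len(train_idx), len(test_idx))
```

With 3161 patients, rounding the 20% test share up gives 633 test and 2528 training records, matching the sizes reported above.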

To evaluate the performance of the models, avoid overfitting, and tune hyperparameters, 10-fold cross-validation was performed on the training set. In each iteration, the model was trained on nine segments and evaluated on the remaining one. This process was repeated 10 times, so that each segment acted once as a validation set and nine times as part of the training set. The performance score of each model was estimated by averaging across these iterations.
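The fold rotation described above can be sketched as an index generator. This is a minimal, unshuffled and unstratified illustration with a hypothetical function name; whether the study's folds were stratified by outcome is not stated.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# With the 2528 training records, each record is validated exactly once.
folds = list(k_fold_indices(2528, k=10))
print([len(validation) for _, validation in folds])
```

A model's cross-validation score is then the mean of the metric computed on the 10 validation segments.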

Regarding modeling, different classification models were compared for predicting clinical improvement. Based on the average cross-validation performance of the trained models, the five models with the highest scores in accuracy, area under the curve (AUC), recall, precision, and F1 score were selected. The selected models, Gradient Boosting Classifier (GBC), Random Forest Classifier (RF), Extra Trees Classifier (ET), Light Gradient Boosting Machine (LightGBM), and Extreme Gradient Boosting (XGboost), were then evaluated on the previously separated test data set as an external validation, providing a new measurement of performance.

Identification of important clinical characteristics associated with clinical improvement

Explainability techniques in AI were used to identify the most relevant clinical characteristics associated with clinical improvement. To achieve this, the Shapley Additive Explanations (SHAP) method was used,³¹ which approximates a complex model with a linear one and explains the importance of each characteristic in that model. This approach shows how a specific characteristic influences the prediction. In addition, it provides a graph that facilitates the understanding of the complex relationship between variables and results. For this analysis, the decision-tree-based RF model for binary classification was used.
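The idea behind SHAP can be illustrated by computing exact Shapley values for a tiny toy model. Everything in this sketch is hypothetical: the feature names, the toy value function, and its coefficients are invented for illustration and are not estimates from the study. The brute-force enumeration shown here is only practical for a handful of features; the study itself used the `shap` package.

```python
from itertools import combinations
from math import factorial

FEATURES = ["low_baseline_das28", "opioid_use", "low_haq"]


def toy_value(known):
    """Hypothetical predicted probability of improvement given known features."""
    value = 0.50
    if "low_baseline_das28" in known:
        value += 0.30
    if "opioid_use" in known:
        value -= 0.10
    if "low_baseline_das28" in known and "low_haq" in known:
        value += 0.04  # interaction term, shared between the two features
    return value


def shapley_values(value_fn, features):
    """Exact Shapley values: weighted average marginal contribution per feature."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_f = value_fn(set(subset) | {f})
                without_f = value_fn(set(subset))
                total += weight * (with_f - without_f)
        phi[f] = total
    return phi


phi = shapley_values(toy_value, FEATURES)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

The contributions add up to the difference between the full prediction and the baseline value, which is the additive (linear) explanation referred to above; a positive value pushes the prediction toward improvement and a negative value away from it.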

Statistical analysis

All quantitative variables were summarized using the median and interquartile range (IQR), while absolute and relative frequencies were presented for qualitative variables. Stata 18 and Python 3.10.12 were used for data analysis. The ML models were developed in Python in the Google Colab environment, using the Scikit-learn 1.2.2, PyCaret 3.2.0, and Shap 0.43.0 packages.

Bias control

To control and reduce potential selection bias, only patients with a confirmed diagnosis were included, using international criteria, validated and accepted by the scientific community, for the diagnosis of RA, as mentioned above. It is important to highlight that the center where the study was conducted and from which the records were taken is a reference center, specialized in the care of patients with RA, which allowed the inclusion of patients across the spectrum of the disease; in addition, it has expert, highly qualified professionals with experience in the diagnosis and management of this disease, so it is considered that there was a low risk of bias during the selection of participants. To minimize possible information bias, the data was initially extracted by an engineering expert in database management, subsequently reviewed, and refined by the group of researchers. It should be noted that all patients treated at the center are monitored and followed under the same follow-up protocols.

Ethical considerations

It is worth noting that the group of researchers involved in this project is aware of and has adhered to the second version of the Ethical Framework for AI in Colombia document.³² The Presidency of the Republic of Colombia has identified the importance of ethical considerations for the design, development, and implementation of AI in Colombia, and more precisely the need to adopt an Ethical Framework as a non-binding guide for the implementation of AI in the country.

Results

Patient characteristics at baseline

A total of 3161 patients diagnosed with RA were included (Figure 1), with a median age of 65 years (IQR 57–72); 82.7% were female. Of the total patients, 2668 (84.4%) were classified as “improved” and 493 (15.6%) as “non-improved.” The median disease duration was 8.3 years (IQR 4.9–11.3), with 60.2% (1902/3161) of patients having a disease duration of less than 10 years. Rheumatoid factor and anti-CCP results were not available for all patients; positivity was reported in 75.9% (2216/2918) and 73.6% (2060/2798), respectively. Anti-CCP positivity was significantly more frequent in the non-improved group (81.7% vs 72.2%, p < 0.001), whereas rheumatoid factor positivity did not differ between groups (p = 0.899).

Figure 1.

Strategy to define clinically improved patients and non-improved patients.

The median baseline HAQ was 0 (IQR 0–0.2). The median baseline DAS28 score was 2.1 (IQR 2.1–2.8); 71.3% of patients had an initial DAS28 ⩽2.6 and were considered in remission, and 10.4% had low disease activity. The baseline DAS28 was significantly higher in the non-improved group. The median follow-up time between the baseline visit and the closest control visit was 8.6 months (IQR 7.9–9.6).

Regarding comorbidities, the most frequent diagnosis was osteoporosis (37.6%), followed by arterial hypertension (35.3%). Sjögren’s syndrome was present in 10.6% of the included population. CRP was considered reactive when the value exceeded the reference laboratory threshold (greater than 6 mg/L). Overall, 36.5% of patients had elevated CRP, compared with 43.3% in the non-improved group. The median ESR was 10 mm/h (IQR 4–22) in the total group and 14 mm/h in the non-improved group; both CRP positivity and ESR were significantly higher in the non-improved group.

Regarding treatment, 63.5% of patients were receiving methotrexate, while 55.7% of patients were being treated with other csDMARDs. The use of biological agents was recorded in 23.6% of patients. The proportion of patients receiving corticosteroids, opioid analgesics, NSAIDs, and bDMARDs at baseline was higher in the non-improved group compared to the improved group. All comparisons can be seen in Table 1.

Table 1.

Sociodemographic and clinical characteristics of patients with RA, by improvement status (improved vs non-improved).

Variable	All (N = 3161)	Improved (n = 2668)	Non-improved (n = 493)	p-Value
Median age at diagnosis in years (IQR)	55.1 (46.4–63.2)	55.4 (46.7–63.4)	53.7 (44.5–62.2)	0.004
Current age, median (IQR)	65 (57–72)	65 (57.5–72)	63 (55–71)	<0.001
Female gender, n (%)	2613 (82.7)	2211 (82.9)	402 (81.5)	0.474
Disease duration in years, median (IQR)	8.3 (4.9–11.3)	8.3 (5–11.3)	7.8 (4.7–11.2)	0.135
BMI in kg/m², median (IQR)	25.3 (22.8–28.3)	25.3 (22.8–28.3)	25.3 (22.6–28.3)	0.757
Rheumatoid factor positive, n (%)	2216/2918 (75.9)	1877/2473 (75.9)	339/445 (76.2)	0.899
Anti-CCP positive, n (%)	2060/2798 (73.6)	1712/2372 (72.2)	348/426 (81.7)	<0.001
DAS-28 initial score, median (IQR)	2.1 (2.1–2.8)	2.1 (2–2.7)	2.3 (2.1–3.4)	<0.001
Disease activity at baseline				<0.001
Remission, DAS28 ⩽ 2.6	2255 (71.3)	1978 (74.1)	277 (56.2)
LDA, 2.6 < DAS28 ⩽ 3.2	328 (10.4)	258 (9.7)	70 (14.2)
MDA, 3.2 < DAS28 ⩽ 5.1	466 (14.7)	333 (12.5)	133 (27)
HDA, DAS28 > 5.1	112 (3.6)	99 (3.7)	13 (2.6)
DAS-28 final score, median (IQR)	2.1 (2.1–2.9)	2.1 (2.1–2.3)	4.2 (3.6–4.9)	<0.001
Disease activity at the end of follow-up (EoF)				<0.001
Remission, DAS28 ⩽ 2.6	2232 (70.6)	2232 (83.7)	0 (0)
LDA, 2.6 < DAS28 ⩽ 3.2	372 (11.8)	368 (13.8)	4 (0.8)
MDA, 3.2 < DAS28 ⩽ 5.1	458 (14.5)	63 (2.3)	395 (80.1)
HDA, DAS28 > 5.1	99 (3.1)	5 (0.2)	94 (19.1)
HAQ score, median (IQR)	0 (0–0.2)	0 (0–0.2)	0.1 (0–0.3)	<0.001
CRP reactive, n (%)	703/1925 (36.5)	568/1613 (35.2)	135/312 (43.3)	0.007
ESR in mm/h, median (IQR)	10 (4–22)	10 (4–20)	14 (5.5–30)	<0.001
ALT normal, n (%)	2810/2980 (94.3)	2376/2522 (94.2)	434/458 (94.8)	<0.001
Glomerular filtration rate in mL/min, median (IQR)	79.7 (64.6–98.6)	79.2 (64.3–97.4)	84.2 (66.4–108)	0.001
Cardiometabolic disease	1269 (40.2)	1086 (40.7)	183 (37.1)	0.136
Arterial hypertension, n (%)	1116 (35.3)	953 (35.7)	163 (33.1)
Diabetes mellitus, n (%)	286 (9.1)	242 (9.1)	44 (8.9)
Cardiovascular disease, n (%)	182 (5.8)	158 (5.9)	24 (4.9)
Chronic kidney disease, n (%)	32 (1)	28 (1.1)	4 (0.8)	0.628
Osteoporosis, n (%)	1187 (37.6)	1003 (37.6)	184 (37.3)	0.909
Sjögren’s syndrome, n (%)	335 (10.6)	282 (10.6)	53 (10.8)	0.905
Corticosteroids, n (%)	1974 (64.5)	1619 (60.7)	355 (72)	<0.001
Non-opioid analgesics (acetaminophen, dipyrone), n (%)	1376 (43.5)	1162 (43.6)	214 (43.4)	0.952
Opioid analgesics (codeine, tramadol), n (%)	1078 (34.1)	882 (33.1)	196 (39.8)	0.004
NSAIDs, n (%)	200 (6.3)	159 (6)	41 (8.3)	0.048
Methotrexate, n (%)	2007 (63.5)	1704 (63.9)	303 (61.5)	0.308
bDMARDs, n (%)	747 (23.6)	586 (22)	161 (32.7)	<0.001
csDMARDs without methotrexate, n (%)	1762 (55.7)	1472 (55.2)	290 (58.8)	0.134
Antimalarials, n (%)	367 (11.6)	312 (11.7)	55 (11.2)	0.732

ALT, alanine aminotransferase; Anti-CCP, anti-cyclic citrullinated peptide; bDMARDs, biologic disease-modifying antirheumatic drugs; BMI, body mass index; CRP, C-reactive protein; csDMARDs, conventional disease-modifying antirheumatic drugs; EoF, end of follow-up; ESR, erythrocyte sedimentation rate; HAQ, Health Assessment Questionnaire; HDA, high disease activity; IQR, interquartile range; LDA, low disease activity; MDA, moderate disease activity; NSAIDs, nonsteroidal anti-inflammatory drugs; RA, rheumatoid arthritis.

Prediction of clinical improvement

At the end of follow-up, a median DAS28 of 2.1 (IQR 2.1–2.9) was observed for all patients. In the group of patients classified as “improved,” the final median DAS28 remained at 2.1 (IQR 2.1–2.3), with 83.7% of patients considered to be in remission (DAS28 ⩽ 2.6) and 13.8% in low disease activity (2.6 < DAS28 ⩽ 3.2). In contrast, in the group of patients classified as “non-improved,” an increase in the DAS28 score was observed at the end of follow-up, with a value of 4.2 (IQR 3.6–4.9), with no patients considered in remission and only 0.8% of patients in low disease activity.

The design of the algorithm was carefully aligned with the TRIPOD checklist, as shown in Supplemental Material 1. Each item was either evaluated or considered not applicable to this study. Overall, the performance of all the ML models evaluated was quite similar according to the metrics used. During cross-validation, an accuracy of around 0.65 was recorded for all models. In this context, the “ET” model stood out by achieving a higher recall or sensitivity (0.841). However, when evaluating the prediction on the test data set, all models showed a slight deterioration in their performance, with a decrease in at least one of the performance metrics. The “RF” model was the most consistent, with the smallest drop in performance between the training and test sets, maintaining high precision and recall (0.7081 and 0.8506, respectively), suggesting that it is better at handling new data. The “ET” model, despite high precision, showed a greater mismatch in its generalization ability when evaluated on the test set, with a significant reduction in precision (0.7073). The “XGboost” model showed interesting behavior, as it had low recall during training (0.7522) but achieved the highest recall in the test set (0.9805), indicating that it seems to favor true positives in the test set, although this could also be a sign of overfitting. Table 2 provides a detailed summary of the performance metrics of the five models during cross-validation training and in the evaluation of the test set. Learning curves, classification reports, and feature importance plots for each of the models trained after hyperparameter optimization are shown in Supplemental Material 2. In addition, a summary of the variables excluded due to missing data and low variability, as well as model calibration on the test set, is presented in Supplemental Material 3.

Table 2.

Performance metrics of the models in the training and test sets.

Model	Training					Test
Model	Accuracy	AUC	Precision	F1-score	Recall	Accuracy	AUC	Precision	F1-score	Recall
GBC	0.6596	0.6509	0.8073	0.7200	0.7597	0.6737	0.6410	0.7200	0.7660	0.8182
RF	0.6582	0.6504	0.8232	0.7116	0.7629	0.6737	0.6316	0.7081	0.7729	0.8506
ET	0.6566	0.6157	0.8411	0.7045	0.7657	0.6356	0.5937	0.7073	0.7296	0.7532
LightGBM	0.6520	0.6220	0.7869	0.7206	0.7507	0.6356	0.5770	0.6977	0.7362	0.7792
Xgboost	0.6505	0.6132	0.7960	0.7146	0.7522	0.6695	0.6147	0.6681	0.7947	0.9805

AUC, area under the curve; ET, extra trees classifier; GBC, gradient boosting classifier; LightGBM, light gradient boosting machine; RF, random forest classifier; Xgboost, extreme gradient boosting.
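As a worked check of how the test-set metrics in Table 2 relate to one another, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported test-set precision and recall; the function and variable names are illustrative, and small differences are expected because the table values are rounded to four decimals.

```python
def f1_from_precision_recall(precision, recall):
    """F1 score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Test-set values from Table 2: (precision, recall, reported F1-score).
test_metrics = {
    "GBC": (0.7200, 0.8182, 0.7660),
    "RF": (0.7081, 0.8506, 0.7729),
    "ET": (0.7073, 0.7532, 0.7296),
    "LightGBM": (0.6977, 0.7792, 0.7362),
    "Xgboost": (0.6681, 0.9805, 0.7947),
}

for model, (precision, recall, reported_f1) in test_metrics.items():
    recomputed = f1_from_precision_recall(precision, recall)
    print(f"{model}: recomputed F1 = {recomputed:.4f}, reported = {reported_f1:.4f}")
```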

Important features for predicting clinical improvement

According to the SHAP analysis, baseline DAS28 was the most relevant characteristic for predicting clinical improvement. Low baseline DAS28 values were positively associated with improvement, while high values were negatively related to clinical improvement. The use of opioids ranked second in importance (Figure 2), indicating an association between the use of these therapies and lack of clinical improvement. The HAQ score, leukocyte count, and ESR ranked third, fifth, and ninth in importance, respectively; low values of these characteristics were positively related to clinical improvement. On the other hand, the use of biologics and the presence of anti-CCP were associated with an increased probability of non-improvement. The interpretation of the importance of each characteristic is detailed in Figure 2.

Figure 2.

SHAP summary plot.

Discussion

This study is novel as it focuses on a Latin American population with established RA, aiming to predict clinical improvement using ML models. A notable aspect of our cohort is the relatively low baseline DAS28 scores, suggesting that many patients were already under surveillance for disease control. However, when comparing groups, moderate disease activity at baseline was more frequent among patients classified as “non-improved” (27% vs 12.5%), whereas baseline remission was more frequent in the “improved” group (74.1% vs 56.2%). ML models identified key factors, such as low baseline DAS28 scores and the absence of opioid use, as critical predictors of clinical improvement, with performance varying between cross-validation and test datasets.

This cohort is one of the largest reported in which an AI-based model is used to predict clinical improvement; its population is much larger than that of other studies that predict improvement through AI.^25,33 Most patients were seropositive with a median disease duration of 8.3 years, indicating a chronic disease course similar to other Latin American studies.¹⁴ The majority of these patients were on methotrexate treatment (63.5%), and only 23.6% used biological agents.^21,33 This contrasts with other studies that include only patients under biological treatment, showing that this study covers a real-life cohort treated with both csDMARDs and bDMARDs.

AI applications for predicting clinical improvement in RA remain scarce in Latin America, particularly in Colombia. This study is novel in addressing this gap, considering the region’s genetic diversity and the severity of RA in Latin American patients.³⁴ Remission rates remain low, with only 19.3% achieving remission and 32.5% reaching low disease activity at 1 year in early RA cases,³⁵ reinforcing the need for predictive tools to enable personalized medicine.

Although some AI-based algorithms have been developed to improve the diagnosis of autoimmune diseases,²² most studies in Latin America remain theoretical.^36–38 In Colombia, initiatives have explored AI applications in rheumatology, including computational models for patient classification using genetic, serological, and clinical data^39–41 and pharmacogenetic models for predicting outcomes with DMARDs.⁴² However, few studies use real-world data or focus on clinical improvement predictors. The scarcity of AI-based studies in Latin America, combined with the high disease burden and prognostic challenges in RA, underscores the need for further research. AI-driven models, such as ML, offer an objective approach to identifying key predictors of clinical improvement, as demonstrated in this study.

In this context, baseline DAS28 was the most relevant predictor of clinical improvement, consistent with other ML-based models where lower baseline disease activity is associated with a higher likelihood of remission.^21,33 Similarly, low ESR, HAQ, and leukocyte count were linked to better clinical improvement, likely reflecting a lower baseline inflammatory and functional burden, which facilitates better outcomes.⁴³ Conversely, biologic use was associated with non-improvement, possibly because patients with more severe disease require more aggressive therapies, yet still show limited improvement.^33,43 Additionally, anti-CCP positivity predicted non-improvement, aligning with ML-based studies identifying it as a marker of worse prognosis. Likewise, opioid use was linked to non-improvement, potentially reflecting a higher pain burden from RA and other comorbidities.

ML models are widely used to assess clinical improvement in RA. While logistic regression has traditionally been used to identify independent predictors,²¹ it assumes a direct and isolated association between prognostic factors and remission, which may not accurately reflect the multifactorial nature of RA.^21,44 Some studies suggest that ML methods, such as Random Forest and XGBoost, better capture complex relationships and improve predictive performance.^25,45 In this study, ML models showed comparable performance, but Random Forest did not perform optimally, while the ET model achieved higher sensitivity. Similarly, XGBoost demonstrated high efficiency in identifying key clinical features, achieving the highest recall in the test set, aligning with findings from a study on RA remission prediction.⁴⁵

In this study, the comparison of logistic regression models helped determine which best fit the adjusted variables. The XGBoost model demonstrated a stronger correlation with the input variables, suggesting better interpretability, while the LightGBM model showed a weaker correlation. This may be due to the inclusion of binary variables and the specific characteristics introduced into the model (Supplemental Material 2).

While some studies highlight the superiority of specific models, such as Random Forest for predicting methotrexate response in RA,⁴⁶ in this study it performed similarly to other models. However, it was chosen for its decision tree-based approach to binary classification. Similarly, other researchers have used Random Forest to identify key predictors of clinical response to bDMARDs,⁴⁷ often incorporating SHAP to assess feature importance and direction of effect, as done in this study. Their analysis also showed a positive correlation between lower DAS28-ESR values and a higher probability of clinical improvement. Most studies focus on predicting response to a single drug, such as MTX,^43,46,48 or a pharmacological class like bDMARDs,^49–51 whereas this study takes a broader approach, reflecting real-world clinical practice in a specialized center, with a specific focus on clinical improvement.
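
As an illustration of how SHAP-style attributions convey both the importance and the direction of a feature's effect, the following minimal Python sketch computes exact Shapley values for a single hypothetical patient. The scoring function, its coefficients, the feature names, and the patient and reference values are assumptions made for illustration only; they are not the fitted models or data of this study, and the SHAP library relies on more efficient estimators than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, reference):
    """Exact Shapley values for one instance `x` relative to a `reference` instance."""
    names = list(x)
    n = len(names)

    def value(subset):
        # Features in `subset` take the patient's value; the rest keep the reference value.
        mixed = {k: (x[k] if k in subset else reference[k]) for k in names}
        return predict(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {name}) - value(s))
        phi[name] = total
    return phi


def toy_score(f):
    # Hypothetical "improvement score"; the coefficients are invented for illustration.
    return (0.9
            - 0.10 * f["baseline_das28"]
            - 0.05 * f["anti_ccp_positive"]
            - 0.08 * f["biologic_use"])


# Hypothetical patient and reference profile (not taken from the study data).
patient = {"baseline_das28": 4.5, "anti_ccp_positive": 1, "biologic_use": 1}
reference = {"baseline_das28": 2.1, "anti_ccp_positive": 0, "biologic_use": 0}

phi = shapley_values(toy_score, patient, reference)
for feature, contribution in phi.items():
    direction = "toward improvement" if contribution > 0 else "toward non-improvement"
    print(f"{feature}: {contribution:+.3f} ({direction})")
```

For this toy linear score, each attribution equals the coefficient multiplied by the difference between the patient's value and the reference value, and the attributions sum to the difference between the two scores. A negative attribution indicates a contribution toward non-improvement.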

The limitations of the study include selection and information bias, since the data were taken from the medical records of a center specialized in RA. However, because the center specializes in the disease, there is greater confidence in the accuracy of the data, particularly as these records are also part of national registries.¹⁴ A strength of the study is that it analyzes outcomes under real-world conditions and overall clinical improvement in a specialized center, within a T2T approach. Clinical improvement was not compared between drug groups, and no distinction was made between the csDMARD and bDMARD groups; however, the influence of these variables on the prediction model is clearly described. Another important finding is that anti-CCP positivity and the use of biological therapies are related to disease severity, since these therapies are used when control of the disease is not achieved with conventional therapies (confounding by indication).

Another possible bias is that most of the patients started with low disease activity; however, some patients had high or moderate disease activity from the beginning, particularly in the non-improvement group. The inclusion of both incident and prevalent cases, without distinguishing between them in the analysis, may have introduced bias. Patients newly entering the cohort with poorly controlled disease could differ significantly from those already under regular follow-up, potentially confounding the identification of predictors in the model. However, this approach enhances the generalizability of our findings, as it reflects real-world clinical conditions where patients present with varying disease stages and treatments. The results of this study should be interpreted considering its limitations, which also include: (a) class imbalance, with the non-improvement group as the minority class, which may affect the generalization capacity of the models; and (b) the possibility that the input features were insufficient to properly classify patients as improved or non-improved. Although the models should be validated in other settings and patient groups to optimize their predictive capacity and applicability, the features identified as contributing to the prediction of clinical improvement in patients with RA may be useful in decision-making regarding the management of these patients.
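
The effect of class imbalance noted in limitation (a) can be illustrated with a short Python sketch. It uses only the class counts reported for this cohort (2668 improved and 493 non-improved) and a hypothetical classifier that labels every patient as improved; this naive classifier is an assumption for illustration and is not one of the models evaluated here.

```python
def recall_for(label, y_true, y_pred):
    """Fraction of patients whose true class is `label` that were predicted as `label`."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    total = sum(1 for t in y_true if t == label)
    return hits / total


# Class counts reported for the whole cohort (not a held-out test set).
n_improved, n_non_improved = 2668, 493
y_true = ["improved"] * n_improved + ["non_improved"] * n_non_improved

# Hypothetical naive classifier: always predicts the majority class.
y_pred = ["improved"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall_improved = recall_for("improved", y_true, y_pred)
recall_non_improved = recall_for("non_improved", y_true, y_pred)

print(f"accuracy                = {accuracy:.3f}")
print(f"recall (improved)       = {recall_improved:.3f}")
print(f"recall (non-improved)   = {recall_non_improved:.3f}")
```

Under these assumptions, the naive classifier reaches an accuracy of about 0.844 while identifying none of the non-improved patients, which is why accuracy alone can be misleading when the minority class is the one of clinical interest.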

We emphasize that in the present study, we strictly followed the TRIPOD guidelines, as it has been observed that few studies implementing ML in RA prediction—whether for clinical improvement, prognosis, or other response variables—fully adhere to these guidelines.⁵² This helps to minimize biases and errors in the present study.

Conclusion

The application of AI models in RA enables the analysis of large datasets to predict clinical improvement in both newly referred patients and those already under follow-up in specialized centers. This facilitates targeted treatment approaches by identifying variables associated with lower probabilities of improvement. The ET model, which demonstrated higher sensitivity, suggests that disease severity prediction may be influenced by factors such as anti-CCP positivity and the use of biological therapies. While this model provides valuable insights, its direct clinical applicability remains limited at this stage. Implementation in rheumatology practice would require further validation in external cohorts, integration into clinical workflows, and prospective evaluation to assess its impact on decision-making. However, its ability to identify key predictors of clinical improvement highlights the potential of AI-driven models to support personalized treatment strategies in RA. Notably, this study focuses on a Latin American population, contributing to the understanding of AI applications in diverse real-world settings.

Supplemental Material

sj-docx-1-tab-10.1177_1759720X251342426 – Supplemental material for Development and evaluation of a multivariable prediction model for clinical improvement in an established cohort of Colombian rheumatoid arthritis patients

Supplemental material, sj-docx-1-tab-10.1177_1759720X251342426 for Development and evaluation of a multivariable prediction model for clinical improvement in an established cohort of Colombian rheumatoid arthritis patients by Claudia Ibáñez-Antequera, Gabriel-Santiago Rodríguez-Vargas, Fernando Rodríguez-Florido, Pedro Rodríguez-Linares, Adriana Rojas-Villarraga and Pedro Santos-Moreno in Therapeutic Advances in Musculoskeletal Disease

sj-docx-2-tab-10.1177_1759720X251342426 – Supplemental material for Development and evaluation of a multivariable prediction model for clinical improvement in an established cohort of Colombian rheumatoid arthritis patients

Supplemental material, sj-docx-2-tab-10.1177_1759720X251342426 for Development and evaluation of a multivariable prediction model for clinical improvement in an established cohort of Colombian rheumatoid arthritis patients by Claudia Ibáñez-Antequera, Gabriel-Santiago Rodríguez-Vargas, Fernando Rodríguez-Florido, Pedro Rodríguez-Linares, Adriana Rojas-Villarraga and Pedro Santos-Moreno in Therapeutic Advances in Musculoskeletal Disease

sj-docx-3-tab-10.1177_1759720X251342426 – Supplemental material for Development and evaluation of a multivariable prediction model for clinical improvement in an established cohort of Colombian rheumatoid arthritis patients

Supplemental material, sj-docx-3-tab-10.1177_1759720X251342426 for Development and evaluation of a multivariable prediction model for clinical improvement in an established cohort of Colombian rheumatoid arthritis patients by Claudia Ibáñez-Antequera, Gabriel-Santiago Rodríguez-Vargas, Fernando Rodríguez-Florido, Pedro Rodríguez-Linares, Adriana Rojas-Villarraga and Pedro Santos-Moreno in Therapeutic Advances in Musculoskeletal Disease

Footnotes

Acknowledgements

The authors would like to thank Laura Villarreal and Nicolás Gutiérrez for their participation and help in the data collection for this study.

Declarations

ORCID iDs

Claudia Ibáñez-Antequera

Fernando Rodríguez-Florido

Adriana Rojas-Villarraga

Pedro Santos-Moreno

Supplemental material

Supplemental material for this article is available online.

References

1. Smolen, Aletaha, McInnes. Rheumatoid arthritis. Lancet 2016; 388: 2023–2038.

2. Almutairi, Nossent, Preen, et al. The prevalence of rheumatoid arthritis: a systematic review of population-based studies. J Rheumatol 2021; 48: 669–676.

3. Londoño, Peláez Ballestas, Cuervo, et al. Prevalencia de la enfermedad reumática en Colombia, según estrategia COPCORD-Asociación Colombiana de Reumatología. Estudio de prevalencia de enfermedad reumática en población colombiana mayor de 18 años. Rev Colomb Reumatol 2018; 25: 245–256.

4. Guzmán Moreno, Restrepo Suárez. Artritis reumatoide temprana. Rev Colomb Reumatol 2002; 9: 171–175.

5. Aletaha, Neogi, Silman, et al. 2010 Rheumatoid arthritis classification criteria: an American College of Rheumatology/European League Against Rheumatism collaborative initiative. Arthritis Rheum 2010; 62: 2569–2581.

6. Machado-Alba, Ruiz, Machado-Duque. Effectiveness of treatment with biologic- and disease-modifying antirheumatic drugs in rheumatoid arthritis patients in Colombia. Int J Clin Pract 2016; 70: 506–511.

7. Quintana-Duque, Rondon-Herrera, Mantilla, et al. Predictors of remission, erosive disease and radiographic progression in a Colombian cohort of early onset rheumatoid arthritis: a 3-year follow-up study. Clin Rheumatol 2016; 35: 1463–1473.

8. Bautista-Molano, Fernández-Avila, Jiménez, et al. Epidemiological profile of Colombian patients with rheumatoid arthritis in a specialized care clinic. Reumatol Clin 2016; 12: 313–318.

9. Santos-Moreno, Sánchez, Gomez, et al. Clinical outcomes in a cohort of Colombian patients with rheumatoid arthritis treated with Etanar, a new biologic type rhTNFR:Fc. Clin Exp Rheumatol 2015; 33: 858–862.

10. Rojas-Villarraga, Ortega-Hernandez, Gomez, et al. Risk factors associated with different stages of atherosclerosis in Colombian patients with rheumatoid arthritis. Semin Arthritis Rheum 2008; 38: 71–82.

11. Rogers, Brotherton, de Luis, et al. Depressive symptoms are independently associated with pain perception in Colombians with rheumatoid arthritis. Acta Reumatol Port 2015; 40: 40–49.

12. Smolen. Treat-to-target as an approach in inflammatory arthritis. Curr Opin Rheumatol 2016; 28: 297–302.

13. Fraenkel, Bathon, England, et al. 2021 American College of Rheumatology guideline for the treatment of rheumatoid arthritis. Arthritis Care Res (Hoboken) 2021; 73: 924–939.

14. Santos-Moreno, Rodríguez-Vargas, Martínez, et al. Better clinical results in rheumatoid arthritis patients treated under a multidisciplinary care model when compared with a national rheumatoid arthritis registry. Open Access Rheumatol 2022; 14: 269–280.

15. Van Gestel, Prevoo MLL, Van ’t Hof, et al. Development and validation of the European League Against Rheumatism response criteria for rheumatoid arthritis. Comparison with the preliminary American College of Rheumatology and the World Health Organization/International League Against Rheumatism Criteria. Arthritis Rheum 1996; 39: 34–40.

16. Van Gestel, Haagsma, van Riel. Validation of rheumatoid arthritis improvement criteria that include simplified joint counts. Arthritis Rheum 1998; 41: 1845–1850.

17. Fransen, van Riel PLCM. The disease activity score and the EULAR response criteria. Rheum Dis Clin North Am 2009; 35: 745–757.

18. Hamet, Tremblay. Artificial intelligence in medicine. Metabolism 2017; 69S: S36–S40.

19. Ramesh, Kambhampati, Monson JRT, et al. Artificial intelligence in medicine. Ann R Coll Surg Engl 2004; 86: 334–338.

20. Mintz, Brodie. Introduction to artificial intelligence in medicine. Minim Invasive Ther Allied Technol 2019; 28: 73–81.

21. Venerito, Angelini, Fornaro, et al. A machine learning approach for predicting sustained remission in rheumatoid arthritis patients on biologic agents. J Clin Rheumatol 2022; 28: e334–e339.

22. Fernández-Ávila, Rojas, Mora, et al. Design of an algorithm for the diagnostic approach of patients with joint pain. Clin Rheumatol 2021; 40: 1581–1591.

23. Liu, Chen. A 9 mRNAs-based diagnostic signature for rheumatoid arthritis by integrating bioinformatic analysis and machine-learning. J Orthop Surg Res 2021; 16: 44.

24. Kothari, Gionfrida, Bharath, et al. Artificial intelligence (AI) and rheumatology: a potential partnership. Rheumatology 2019; 58: 1894–1895.

25. Lee, Kang, Eun, et al. Machine learning-based prediction model for responses of bDMARDs in patients with rheumatoid arthritis and ankylosing spondylitis. Arthritis Res Ther 2021; 23: 254.

26. Bruce, Fries. The health assessment questionnaire (HAQ). Clin Exp Rheumatol 2005; 23: S14–S18.

27. Prevoo MLL, Van ’t Hof, Kuper, et al. Modified disease activity scores that include twenty-eight-joint counts. Development and validation in a prospective longitudinal study of patients with rheumatoid arthritis. Arthritis Rheum 1995; 38: 44–48.

28. Collins, Moons KGM, Dhiman, et al. TRIPOD+AI statement: updated guidance for reporting clinical prediction models that use regression or machine learning methods. BMJ 2024; 385: e078378.

29. Rios, Miller RJH, Manral, et al. Handling missing values in machine learning to predict patient-specific risk of adverse cardiac events: insights from REFINE SPECT registry. Comput Biol Med 2022; 145: 105449.

30. Pudjihartono, Fadason, Kempa-Liehr, et al. A review of feature selection methods for machine learning-based disease risk prediction. Front Bioinform 2022; 2: 927312.

31. Lundberg, Lee. A unified approach to interpreting model predictions. Adv Neural Inf Process Syst 2017; 2017: 4766–4775.

32. Guío Español, Tamayo Uribe, Gómez Ayerbe. Marco ético para la inteligencia artificial en Colombia [Internet]. Bogotá, Ministry of Health of Colombia, 2021, pp. 1–63, https://minciencias.gov.co/sites/default/files/marco-etico-ia-colombia-2021.pdf (accessed 5 June 2024).

33. Koo, Eun, Shin, et al. Machine learning model for identifying important clinical features for predicting remission in patients with rheumatoid arthritis treated with biologics. Arthritis Res Ther 2021; 23: 178.

34. Castro-Santos, Díaz-Peña. Genetics of rheumatoid arthritis: a new boost is needed in Latin American populations. Rev Bras Reumatol 2016; 56: 171–177.

35. Gamboa-Cárdenas, Ugarte-Gil, Loreto, et al. Clinical predictors of remission and low disease activity in Latin American early rheumatoid arthritis: data from the GLADAR cohort. Clin Rheumatol 2019; 38: 2737–2746.

36. Fajardo, Graf. Inteligencia artificial, ¿transformación de la reumatología? - II Parte. Global Rheumatol. Epub ahead of print 30 June 2022. DOI: 10.46856/GRP.26.E125.

37. Romero-Sánchez, Beltrán-Ostos. Aproximaciones de la inteligencia artificial aplicada en la inmunología de las enfermedades autoinmunes y autoinflamatorias, https://aipocrates.blog/2022/11/27/aproximaciones-de-la-inteligencia-artificial-aplicada-en-la-inmunologia-de-las-enfermedades-autoinmunes-y-autoinflamatorias/ (2022, accessed 1 January 2023).

38. González. Machine learning models in rheumatology. Rev Colomb Reumatol (Engl Ed) 2015; 22: 77–78.

39. Morales Muñoz. Modelo computacional para la identificación de endofenotipos en pacientes con Artritis Reumatoide utilizando información del Antígeno Leucocitario Humano HLA clase II [Internet]. Bogotá, Universidad Nacional de Colombia, https://repositorio.unal.edu.co/bitstream/handle/unal/50373/98400299.2014.pdf (2014, accessed 5 June 2014).

40. Morales Muñoz, Quintana, Niño. Modelo computacional para la identificación de endofenotipos y clasificación de pacientes con artritis reumatoide a partir de datos genéticos, serológicos y clínicos, utilizando técnicas de inteligencia computacional. Rev Colomb Reumatol 2015; 22: 90–103.

41. Del Risco Morales. Modelo computacional para el análisis de historias clínicas de pacientes con Artritis Reumatoide aplicando bioinformática traslacional y minería de textos [Internet]. Bogotá, Universidad Nacional de Colombia, https://repositorio.unal.edu.co/handle/unal/80693 (2021, accessed 5 June 2024).

42. Hernández Tarapués. Modelo farmacogenético y clínico para la predicción de desenlaces en pacientes con artritis reumatoide tratados con metotrexato y adalimumab, https://repositorio.unal.edu.co/handle/unal/78759 (2020, accessed 1 January 2023).

43. Duong, Crowson, Athreya, et al. Clinical predictors of response to methotrexate in patients with rheumatoid arthritis: a machine learning approach using clinical trial data. Arthritis Res Ther 2022; 24: 162.

44. Chen, et al. Early and accurate prediction of clinical response to methotrexate treatment in juvenile idiopathic arthritis using machine learning. Front Pharmacol 2019; 10: 1155.

45. Ventura, Sheta, Alsaber, et al. Machine learning-based remission prediction in rheumatoid arthritis patients treated with biologic disease-modifying anti-rheumatic drugs: findings from the Kuwait rheumatic disease registry. Front Big Data 2024; 7: 1406365.

46. Lim AJW, Lim, Ooi BNS, et al. Functional coding haplotypes and machine-learning feature elimination identifies predictors of methotrexate response in rheumatoid arthritis patients. EBioMedicine 2022; 75: 103800.

47. Salehi, Lopera Gonzalez, Bayat, et al. Machine learning prediction of treatment response to biological disease-modifying antirheumatic drugs in rheumatoid arthritis. J Clin Med 2024; 13: 3890.

48. Myasoedova, Athreya, Crowson, et al. Toward individualized prediction of response to methotrexate in early rheumatoid arthritis: a pharmacogenomics-driven machine learning approach. Arthritis Care Res (Hoboken) 2022; 74: 879–888.

49. Kalweit, Burden, Boedecker, et al. Patient groups in rheumatoid arthritis identified by deep learning respond differently to biologic or targeted synthetic DMARDs. PLoS Comput Biol 2023; 19: e1011073.

50. Sonomoto, Fujino, Tanaka, et al. A machine learning approach for prediction of CDAI remission with TNF inhibitors: a concept of precision medicine from the FIRST registry. Rheumatol Ther 2024; 11: 709–736.

51. Ukalovic, Leeb, Rintelen, et al. Prediction of ineffectiveness of biological drugs using machine learning and explainable AI methods: data from the Austrian Biological Registry BioReg. Arthritis Res Ther 2024; 26: 1–12.

52. Mendoza-Pinto, Sánchez-Tecuatl, Berra-Romani, et al. Machine learning in the prediction of treatment response in rheumatoid arthritis: a systematic review. Semin Arthritis Rheum 2024; 68: 152501.
