Abstract
This narrative review outlines the current applications and considerations of artificial intelligence (AI) for diagnosis, management, and prognosis in rheumatoid arthritis (RA), axial spondyloarthritis (axSpA), and psoriatic arthritis (PsA). Advances in AI, mainly in machine learning and deep learning, have significantly influenced medical research and clinical practice over the past decades by enabling more precise data interpretation and treatment approaches. In RA, AI applications have enhanced risk prediction models, early diagnosis, and disease management. Predictive models have guided treatment decisions, such as anticipating response to methotrexate and biologics, while wearable devices and electronic health records (EHR) improve disease activity monitoring. In addition, AI applications are reported as promising for the early identification of extra-articular involvement and for the prediction, detection, and assessment of comorbidities. In axSpA, AI-driven models using imaging techniques such as sacroiliac radiography, magnetic resonance imaging, and computed tomography have increased diagnostic accuracy, especially for early inflammatory changes. Predictive algorithms help stratify and predict disease outcomes, while clinical decision support systems integrate clinical and imaging data for optimized management. For PsA, AI has also allowed for early detection among psoriasis patients using genetic markers, immune profiling, and EHR-based natural language processing systems. Overall, AI models may predict diagnosis, disease severity, treatment response, and comorbidities to improve care in patients with RA, axSpA, and PsA. As a rapidly developing area, AI has the potential to change our current perspective of medical practice by offering better diagnostic evaluation, treatment, and patient follow-up. Multimodal AI, focusing on collaboration, reliability, transparency, and patient-centered innovation, appears likely to shape the future of medical practice. 
However, data quality, model interpretability, and ethical considerations must be addressed to ensure reliable and equitable applications in clinical practice.
Introduction
Humanity has witnessed various revolutions throughout history, bringing about new opportunities and challenges. Although scientific knowledge and curiosity have progressed cumulatively, the advancements of the last 50 years have achieved unparalleled momentum in pace and impact. 1 Like links in a chain, the progression of computer technologies has led to the emergence of one of the most significant developments of recent decades: artificial intelligence (AI). 2 As it influences every aspect of life, AI has already changed, and will probably continue to change, our understanding of medical practice, medical education, and scientific research methodologies. In recent years, AI algorithms, particularly machine learning (ML) and deep learning (DL), have begun to be employed across nearly all areas of medicine for purposes such as diagnosis, differential diagnosis, imaging analysis, treatment response prediction, drug discovery and development, precision medicine, public health applications, and prognosis assessment.3,4
Rheumatic diseases, in particular, represent an appropriate field for using the benefits of these technologies. Chronic rheumatic diseases, which often exhibit autoimmune and/or autoinflammatory characteristics, present complex clinical profiles where AI applications may provide advantages. 4 These diseases frequently involve overlapping symptomatology, complex autoantibody profiles, and diverse management strategies. 5 The traditional classification and diagnostic criteria often fall short of adequately capturing these complex profiles, underscoring the need for a paradigm shift in rheumatology. 5 The rapid increase in AI applications in rheumatology research in recent years has deepened our understanding of these conditions.2,3,6 AI also has the potential to elucidate the underlying mechanisms of disease progression.2,3,6 In this regard, AI’s capabilities in big data analysis and multidimensional data processing may have significant potential for developing clinical decision support systems (CDSS) that can enhance diagnostic accuracy and the comprehensiveness of disease management.3,6
This narrative review will first provide a brief overview of essential AI terminology to assist clinicians in understanding key concepts. Subsequently, it will explore the contributions of AI to disease management across key stages—from diagnosis to prognosis determination—in rheumatoid arthritis (RA), axial spondyloarthritis (axSpA), and psoriatic arthritis (PsA) and discuss relevant examples from literature related to the subtopic. Finally, it will discuss the opportunities that AI presents, the challenges it may introduce, and the potential obstacles researchers and healthcare professionals in rheumatology may face in integrating these technologies.
Methods
The PubMed/MEDLINE and Web of Science databases were searched until March 15, 2025. The search was performed by using combinations of the terms “artificial intelligence,” “machine learning,” “deep learning,” “large language models,” “rheumatoid arthritis,” “axial spondyloarthritis,” and “psoriatic arthritis.” Original research articles and review papers written in English and relevant articles cited within these works were included in this review. As this is a narrative review, the articles selected for inclusion were chosen based on their relevance to the topic, adequate sample size, validation process, reporting a new finding, and/or reflecting the author’s perspective. If the results’ validation (internal or external) was not assessed, it was indicated in the text.
AI and related concepts
Artificial intelligence
AI can be described as the use of machines that mimic human intelligence and make decisions similar to those of humans. 7 Unlike traditional programming methods, AI learns from data to uncover relationships and derive insights, combining computer science and statistics. ML, natural language processing (NLP), speech recognition, expert systems, robotics, computer vision, and several other branches work together under the umbrella of AI. Among these branches, ML and NLP appear to be the most prominent in medicine.
Machine learning
ML, a specialized field within AI, enables systems to recognize patterns and generate predictions from data without explicit programming. ML models identify relationships within data, improving decision-making processes used in diagnosis, prognosis, and treatment planning in medicine.7,8 Models generalize patterns learned from datasets to predict new outcomes.7,8 Primary ML methods include supervised learning (learning input-output relationships from labeled datasets, such as linking patient symptoms to diagnoses), unsupervised learning (identifying hidden structures or patient subgroups from unlabeled data), semi-supervised learning (combining limited labeled data with extensive unlabeled data, particularly useful in rare diseases), and reinforcement learning (learning sequential decision-making through feedback, potentially beneficial for developing treatment strategies).7,8
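To make the supervised learning idea concrete, the following is a minimal sketch of a nearest-centroid classifier written with the Python standard library only. The feature values (C-reactive protein and swollen joint count), the labels, and the function names are hypothetical and chosen purely for illustration; the sketch does not reproduce any model from the studies reviewed here.

```python
from statistics import mean

# Hypothetical toy data: (CRP in mg/L, swollen joint count) -> label.
# All values are invented for illustration only.
labeled = [
    ((2.0, 0), "non-RA"),
    ((3.5, 1), "non-RA"),
    ((18.0, 6), "RA"),
    ((25.0, 9), "RA"),
]


def fit_centroids(samples):
    """Supervised step: average the feature vectors of each labeled class."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign a new case to the class whose centroid is nearest."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(centroids, key=lambda label: squared_distance(centroids[label], features))


centroids = fit_centroids(labeled)
print(predict(centroids, (20.0, 7)))
print(predict(centroids, (1.0, 0)))
```

The labeled examples play the role of the training dataset; an unsupervised method would receive the same feature vectors without the labels and would have to discover the two groups itself.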
Deep learning
DL is a subset of ML, using neural networks with multiple layers to automatically learn representations from large and complex datasets. 9 Unlike traditional ML algorithms, such as the Support Vector Machine (SVM), Random Forests (RnFr), Decision Trees, and Gradient Boosting Machines, which normally require manual feature extraction, DL models learn relevant features themselves from raw data. 3 Therefore, they are particularly suited to high-dimensional and unstructured data such as images, audio, and text. DL models have proved particularly powerful, as their depth enables them to capture non-linear, complex relationships among data. The common architectures are convolutional neural networks (CNNs) for image data and recurrent neural networks (RNNs) for sequential data like time series or natural language. The transformer architecture is a DL model designed primarily for NLP and sequence-related tasks. Transformers were proposed by researchers at Google in 2017. They use an “attention” mechanism that allows the model to process relationships across the entire input sequence in parallel rather than sequentially, as in RNNs. 10 This makes transformers very efficient, especially for dealing with long data sequences. Popular models include BERT, generative pre-trained transformer (GPT), and T5, all based on the transformer architecture. 11 Their applications include in-depth analysis of electronic health records (EHR), radiology reports, genomic data, and the medical literature. Vision transformers (ViTs) are DL models that adapt the transformer architecture, initially designed for NLP, to image analysis tasks. By dividing images into fixed-size patches and processing them as sequences, they effectively capture long-range dependencies, achieving state-of-the-art performance in image classification. This enables them to be applied in medical image segmentation and classification.12,13
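The patch-splitting step that ViTs apply to an image can be sketched as follows. This is a simplified illustration using plain Python lists; the function name split_into_patches and the 4 x 4 toy image are assumptions, and a real ViT would additionally project each patch to an embedding and add positional information.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order, which is the order in which
    they would be fed to the model as a sequence.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4 x 4 "image" whose pixel values are simply 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, 2)
print(len(patches), patches[0])
```

Each flattened patch then plays the same role as a word token in an NLP transformer, which is what lets the attention mechanism relate distant regions of the image to one another.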
Radiomics
Radiomics is a technique for extracting and analyzing complex features from medical images, such as X-rays, computed tomography (CT), and magnetic resonance imaging (MRI) scans. Radiomics can characterize tissue properties by examining details within images that may not be visible to the human eye, providing valuable insights for diagnosis, treatment planning, and disease monitoring. 14 Although initially popular in cancer diagnostics, radiomics has also become relevant for rheumatology, helping to evaluate joint inflammation, bone erosion, and synovial thickness, which provide critical information on disease activity and prognosis.
Traditional radiomics uses feature extraction algorithms to identify structural, textural, and shape-based characteristics within the images, and ML algorithms then classify these features. More recently, DL techniques, especially CNNs and ViTs, have also been applied in radiomics to automatically learn and classify features from images, further enhancing diagnostic capabilities and predictive modeling.12–14
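As a hedged illustration of the traditional feature-extraction step, the sketch below computes three common first-order intensity features (mean, variance, and Shannon entropy) from a small region of interest. The function name and the toy pixel values are assumptions; dedicated radiomics toolkits compute many more features, including texture and shape descriptors, and apply intensity discretization that is omitted here.

```python
import math
from collections import Counter


def first_order_features(roi):
    """Compute simple first-order features from a 2D region of interest."""
    pixels = [value for row in roi for value in row]
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((value - mean) ** 2 for value in pixels) / n
    # Shannon entropy of the intensity histogram (in bits).
    counts = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "entropy": entropy}


# Hypothetical 2 x 2 region with two intensity levels.
features = first_order_features([[0, 0], [1, 1]])
print(features)
```

The resulting feature dictionary is the kind of tabular input that a conventional ML classifier would then consume, in contrast to DL models that learn features directly from the pixels.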
Natural language processing
NLP is a sub-field of AI that enables computers to understand, interpret, and generate human language.15,16
In medicine, NLP can potentially improve the quality of patient care, research, and administrative activities by extracting relevant information from the unstructured clinical notes of EHRs, medical literature, and patient-generated data.15,17 In addition, NLP can support mining scientific literature and clinical trial data for drug candidates and help describe the mechanism of action, including adverse effects. 17 One of the most notable advancements in NLP is the development of large language models (LLMs), such as GPT and BERT, which are trained on vast amounts of text data to perform complex language tasks. These models can summarize clinical notes, extract disease-relevant features, assist in automated diagnosis, and support clinical decision-making by understanding the context and semantics of medical language. 15
A summary of AI and related concepts is provided in Table 1.
EHRs, medical imaging data, genomic and other omics data, clinical trial data, patient-generated data, and medical literature and research data are key data sources that form the backbone of AI applications in medical practice and research. These sources may support accurate diagnosis, treatment, and continuous patient monitoring.
Overview of AI methodologies, algorithms, and their applications in medicine.
AI, artificial intelligence; CNN, convolutional neural networks; DL, deep learning; GPT, generative pre-trained transformer; KNN, K-nearest neighbors; LLM, large language model; ML, machine learning; NLP, natural language processing; RNN, recurrent neural networks; SVM, support vector machines; ViT, vision transformer.
AI in the management of RA
AI, ML, and DL algorithms may evolve the management of RA through more precise diagnostics, better treatment strategies, and effective follow-up of disease progression through their capacity to analyze large datasets, including patient records, imaging data, and genetic profiles (Table 2).
AI applications in RA.
AI, artificial intelligence; ANN, artificial neural network; AUC, area under the receiver operating characteristic curve; AuRA, automated RA scoring algorithm; CNN, convolutional neural network; CT, computed tomography; CVD, cardiovascular disease; DL, deep learning; DNN, deep neural network; EHR, electronic health record; FVC, forced vital capacity; GAN, generative adversarial network; ILD, interstitial lung disease; LLM, large language model; mAP, mean average precision; MEG, multi-evidence genes; ML, machine learning; MRI, magnetic resonance imaging; mTSS, modified total Sharp score; NER, named entity recognition; NLP, natural language processing; OP, osteoporosis; OSS, Overall Sharp Score; PPI, protein-protein interaction; PPV, positive predictive value; PRO, patient-reported outcome; RA, rheumatoid arthritis; RMSE, root mean square error; RnFr, Random Forest; ROC, receiver operating characteristic; RSF, random survival forest; SHAP, SHapley Additive exPlanations; SNP, single-nucleotide polymorphism; SvH, Sharp–van der Heijde score; SVM, support vector machine; ViT, Vision Transformer; XGBoost, extreme gradient boosting.
Predicting the risk of RA
After recognizing that early diagnosis and detection can dramatically change the course of RA, interest in the disease’s early phase and the preclinical phase has grown. 18 In this regard, predicting the risk of developing RA plays an important role. Thus, identifying these people allows for close monitoring that may help reduce the disease burden from the beginning. The recognition and follow-up of at-risk individuals could also be significant in gaining valuable molecular and clinical data regarding the preclinical phase of RA. As they represent an important group for future potential RA prevention studies, they will likely provide further insight into the RA development process. AI applications may benefit this area, where predictive factors are limited to serological and clinical data. AI models have been utilized to predict RA risk by analyzing serum proteomics, genomics, SNPs, clinical and laboratory data, EHRs, and imaging data.19–28
A study by Chung et al. 23 using ML algorithms based on human leukocyte antigen (HLA) reported that the alleles HLA-DQA1*03:03, DQB1*0401, and DRB1*0405 could predict the risk of RA development. The following non-HLA genes were studied for their association with RA development: PTPN22, STAT4, TRAF1, CD40, and PADI4; a decision tree model showed the best performance, with sensitivity and specificity of around 70%. 24 In addition, IRF5 gene polymorphisms have been shown to influence RA susceptibility, particularly in anti-citrullinated protein antibody (ACPA)-negative and shared epitope-positive patients. 29 A model based on 23 proteins, regardless of ACPA status, predicted the risk of developing RA with 91.2% accuracy. 25 Another study using ML and weighted gene co-expression network analysis identified mitochondrial oxidative stress-related genes with high predictive value in RA: CDKN1A, GADD45B, and MAFF. 26
Fujii et al. developed a Feedforward Neural Network model to predict the progression from seronegative undifferentiated arthritis (UA) to RA using clinical and laboratory data. Utilizing the KURAMA cohort (training dataset, 210 patients with seronegative UA, 27.1% progressed to RA) and the ANSWER cohort (validation dataset, 140 patients, 32.1% progressed to RA), the model incorporated demographic data, acute phase reactants, autoantibodies, and physical examination findings, achieving an area under the curve (AUC) of 0.924 in the training set and 0.777 in the validation set. 27 Salehi et al. aimed to estimate the time and risk of RA onset in 154 preclinical RA patients using various survival ML models, with the Random Survival Forest (RSF) model performing best (C-index: 0.798). By employing SHapley Additive Explanations (SHAP), the model identified baseline rheumatoid factor (RF) and anti-cyclic citrullinated peptide (anti-CCP) antibody levels as key predictors, facilitating patient stratification into low-, medium-, and high-risk groups; however, the model needs further validation. 28
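The concordance index (C-index) reported for survival models such as the RSF can be illustrated with a minimal sketch of Harrell's C-index. The follow-up times, event indicators, and risk scores below are invented, and the pairwise implementation is intended for clarity rather than efficiency; it is not the evaluation code of the cited study.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored data.

    A pair (i, j) is comparable when subject i had the event and a strictly
    shorter follow-up time than subject j. The pair is concordant when the
    model assigned the higher risk to subject i; tied risks count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical cohort: months to RA onset, event flag (0 = censored), model risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
risks = [0.9, 0.1, 0.4, 0.7]
c_index = concordance_index(times, events, risks)
print(c_index)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of onset times, which gives context for a reported C-index of about 0.8.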
When combined with ML algorithms, ultrasonography may help predict RA development.30,31 In the study by Daskareh et al., 326 patients with arthralgia were followed up with regular ultrasound imaging over 2 years, and 123 patients developed RA by the end of the follow-up. Although further validation is needed, ML data analysis showed that radiocarpal synovial thickness, PIP/MCP synovitis, wrist effusion, and RF and anti-CCP levels were associated with the development of RA. 30 Hu et al. aimed to improve early identification of RA progression in 432 UA patients using ML models, particularly a RnFr model integrating clinical data with the 18-joint ultrasound scoring system (US18). The validated RnFr model showed the highest accuracy and sensitivity, with SHAP analysis highlighting US18 Grade 2 joint counts, total US18 score, and swollen joint count as key predictive variables. 31
Diagnosis and differential diagnosis of RA
The diagnosis of RA is clinical, but specific diagnostic and classification criteria assist clinicians in identifying the disease and distinguishing it from other conditions. 32 Besides clinical signs and symptoms, serological markers, especially rheumatoid factor and anti-cyclic citrullinated peptide, ease the diagnostic process. Given the crucial role of early diagnosis and treatment in managing the disease, tools that facilitate and confirm diagnosis effectively have become increasingly important. 32 Data from EHRs, sensor and monitoring data, imaging and biopsy data, and omics data—such as proteomics, genomics, metagenomics, transcriptomics, and metabolomics—have been used in AI applications developed for RA diagnosis and classification.19,32–34
“Omic,” molecular and histopathologic data
Omics data (genomics, epigenomics, transcriptomics, proteomics, metabolomics, and metagenomics), together with molecular and histopathologic data, may aid the diagnosis of RA by providing multidimensional views of the disease.33–47 Epigenetic mechanisms, including DNA methylation, histone modifications, chromatin remodeling, and non-coding RNAs, regulate gene expression and provide insight into the molecular processes underlying RA. 33 Some examples from the current literature are discussed below.33–47
Mu et al. utilized ML-assisted bioinformatics to identify and validate potential biomarkers and immune molecular mechanisms associated with RA. They analyzed gene expression data from the Gene Expression Omnibus (GEO) database and identified 79 key Differentially Expressed Genes (DEGs). Further screening using three ML methods (LASSO regression, SVM recursive feature elimination, and RnFr) identified 12 hub genes. Among these, seven (KYNU, EVI2A, CD52, C1QB, BATF, AIM2, and NDC80) showed high diagnostic potential (all had AUC > 0.80). 35 Liu et al. 36 reported that RnFr models identified nine mRNAs that distinguished RA from healthy samples with high accuracy (AUC ranging between 0.95 and 1.00). However, the model needs validation. In another setting, ML models applied to 16 genes effectively differentiated RA from osteoarthritis (OA), achieving accuracy rates from 91% to 96% with external validation. 37 Geng et al. 38 reported and internally validated models based on the expression of 19 N6-methyladenosine (m6A) methylation regulators, particularly IGF2BP3 and YTHDC2; these models achieved high AUC values (above 0.8) for distinguishing RA from non-RA, demonstrating the diagnostic potential of m6A methylation profiles.
With immunoblotting analysis, Rychkov et al. analyzed peripheral blood cells of RA patients and healthy controls and synovial cells of RA patients. TNFAIP6/TSG6 and HSP90AB1/HSP90 were found valuable for RA diagnosis and monitoring, and they externally validated the results in an independent cohort. 39 A RnFr model based on specific serum proteins accurately classified RA, ACPA-positive RA, and ACPA-negative RA with high precision (AUCs of 0.9949, 0.9913, and 1.0, respectively), which was supported by internal validation. 34 Wu et al. identified and validated two fibroblast-related biomarkers via three different ML techniques. AIM2 and PSMB9 proteins were strongly associated with RA. 40
Using cytokine and chemokine profiles obtained from synovial biopsies, Yeo et al. 41 employed the Generalized Matrix Learning Vector Quantization method to distinguish RA from non-inflammatory conditions and differentiate early RA from self-remitting arthritis with high accuracy (AUCs of 0.99 and 0.76, respectively), although further validation is needed. Another study by Heard et al. 42 revealed and internally validated that 12 cytokines can significantly differentiate RA from osteoarthritis and healthy individuals (AUC > 0.90).
A study utilizing Liquid Chromatography-Mass Spectrometry (LC-MS) to analyze metabolomic data successfully differentiated RA cases from healthy controls, highlighting the value of a holistic metabolomic and lipidomic approach in diagnosis, with internal validation. 43 Metagenomic data from gut microbiota studies underscored the microbiota’s influence on the immune system and metabolic pathways relevant to RA. 44 Kopeć et al. aimed to distinguish OA from RA using advanced ML techniques applied to metabolomic data from synovial fluid. ML models, optimized by genetic algorithms and interpreted through SHAP analysis, identified key metabolites (glutamine, pyruvate, and proline) with high accuracy. 45
Heo et al. reported a plasmonic diagnostics platform combined with ML to accurately classify RA from synovial fluid analysis. The platform achieved high sensitivity and specificity, but further validation is needed. 46
In addition to omics and molecular data analysis, histopathological examination evaluated through ML algorithms can provide a high diagnostic yield. When symptoms overlap among conditions, particularly in cases affecting the joints, a tissue biopsy, such as a synovial biopsy, can provide essential clarity. A RnFr algorithm was utilized in a study involving synovial samples from 147 osteoarthritis patients and 60 RA patients who had undergone total knee arthroplasty. By combining pathologist-assigned scores with cellular density measures derived from computer vision, the model achieved an AUC of 0.91, effectively distinguishing between OA and RA. 47
EHR, clinical, and digital data
EHRs contain comprehensive patient information, provide a real-world perspective on patient care, and offer insights into treatment outcomes and disease progression. However, because EHR data are vast and largely unstructured, efficiently identifying RA patients can be challenging. To improve the accuracy of case identification, various ML algorithms have been developed to detect RA cases within EHRs.48–56
For example, in a study conducted in the Netherlands and Germany, six ML algorithms and a naïve word-matching approach were compared in identifying RA patients. Among these, the SVM algorithm had the highest performance scores, with an AUROC of 0.98 and a positive predictive value (PPV) of 0.94; this model identified 2873 RA patients out of 23,300 in less than 7 s, proving the efficiency of this model. 48 Another study assessed the use of ML to identify RA patients from EHRs based on rheumatologist diagnoses, comparing this with patients meeting the 1987 and 2010 classification criteria. The ML model had relevant accuracy, with 80% of the ML-identified cases meeting at least one classification set, and no significant differences in clinical or demographic parameters were noted between the groups. 49 Carroll et al. developed an SVM model that achieved an AUROC greater than 0.90 in predicting RA status, exceeding the performance of deterministic models. Follow-up evaluations demonstrated the robustness of the model even after revisions in diagnostic codes and medications. 50 Kronzer et al. developed a novel RA algorithm using a LASSO-based ML approach to enhance diagnostic accuracy within the Mayo Clinic Biobank and Tapestry Study. Compared to standard rules-based and eMERGE algorithms, the ML model achieved higher sensitivity (4%–11% improvement) at 90% PPV, outperforming conventional methods. 51
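The two metrics that recur in these EHR studies, AUROC and PPV, can be computed from first principles as in the sketch below. The labels and scores are invented, the 0.5 decision threshold is an arbitrary assumption, and the pairwise AUROC formulation is chosen for readability rather than speed.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def positive_predictive_value(labels, scores, threshold):
    """Fraction of cases flagged at or above the threshold that are true cases."""
    flagged = [label for label, s in zip(labels, scores) if s >= threshold]
    if not flagged:
        raise ValueError("no cases flagged at this threshold")
    return sum(flagged) / len(flagged)


# Hypothetical chart-review labels (1 = RA) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
print(auroc(labels, scores))
print(positive_predictive_value(labels, scores, 0.5))
```

AUROC summarizes ranking quality across all thresholds, whereas PPV depends on the chosen threshold and on disease prevalence, which is why studies often report sensitivity at a fixed PPV.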
ML algorithms using clinical data (including laboratory or clinical assessment) may also provide diagnostic benefits.57,58 For instance, Bai et al. developed an artificial neural network (ANN) model utilizing patient demographics and antibody profiles to identify RA patients. This model incorporated six features: age, sex, rheumatoid factor, anti-citrullinated peptide antibody, 14-3-3η, and anti-carbamylated protein antibodies. It achieved an AUROC of 0.95 and an F1 score of 0.916, indicating a high level of accuracy. 57 Fukae et al. developed a novel diagnostic approach by transforming clinical data into two-dimensional color-coded array images (TDA). These were then used to fine-tune a CNN to distinguish RA from non-RA cases. Their results demonstrated that while certain subsets of clinical information alone yielded moderate classification accuracy, the full TDA images—integrating both joint assessment and various clinical parameters, including laboratory parameters—achieved the highest diagnostic performance (up to 98% accuracy), comparable to expert rheumatologists. 58
Smartphone photographs can be used to detect inflammation in the joints. 59 However, Knitza et al. assessed the diagnostic accuracy of two digital decision support systems, Ada and Rheport, for identifying inflammatory rheumatic diseases (IRDs), including RA, in a large, multicenter trial involving 600 patients, showing overall diagnostic accuracies of 63% (Ada D1), 58% (Ada D5), and 52% (Rheport). While Ada performed better in identifying RA compared to other IRDs, the overall accuracy of both systems was limited, underlining the need for improved digital diagnostic decision support systems and stricter regulations to ensure efficacy and safety. 60
Imaging
Imaging is the cornerstone in diagnosing and grading RA, with conventional radiography remaining one of the most important imaging modalities. However, interpreting radiographs can be challenging due to the insidious nature of radiographic changes in musculoskeletal diseases and interobserver variability. To address some of these challenges, AI, ML, and DL, especially CNN and ViTs, have begun to be integrated into image analysis to enhance diagnostic precision and speed. 13
Hand radiographs are widely used to identify characteristic features in RA, such as periarticular osteopenia and juxta-articular erosions. Several studies have indicated the potential for using CNNs and other algorithms to automate RA diagnosis from hand radiographs.61–66 Ureten et al. 63 proposed a CNN-based model that achieved an accuracy of 73.33% in distinguishing RA patients from healthy controls, demonstrating the feasibility of AI-assisted diagnosis using standard radiographic images. Another CNN-based automated RA diagnostic system using GoogLeNet and VGG16 architectures on hand radiographs achieved a high AUC of 97.80% and 100% sensitivity for RA recognition (GoogLeNet), and 83.36% AUC with 92.67% sensitivity for RA staging (VGG16). 64
Fung et al. reported a DL-based joint detection approach using YOLOv5l6 (You Only Look Once), trained on pediatric hand radiographs, to identify joints in RA patient X-rays. Despite being trained only on healthy left-hand images, the model achieved excellent performance on RA hand radiographs, with F1 scores exceeding 0.98 for all joints (PIP, MCP, wrist, radius, ulna) in the test set. 65 Liu et al. proposed MediMatrix, a novel DL method using grayscale ImageNet pre-training with ShuffleAttention and dense residual connections, and showed that it efficiently classified RA X-ray images. 66
ML-based thermal imaging is a newly proposed approach for RA diagnosis and evaluation.67–70 Bardhan et al. 67 proposed a two-tier classification approach that accurately classified arthritis-affected knees and further identified those affected by RA, successfully labeling almost 75% of knee thermograph scans. Ahalya et al. 69 focused on classifying RA and normal subjects using thermographic hand images. LogitBoost achieved the highest accuracy, 93.75%, with 10-fold cross-validation, while Quantum Support Vector Machine (QSVM) reached a slightly lower accuracy of 92.7% with reduced computational cost and training time. Kesavapillai et al. reported an automated diagnosis system for RA using AI and quantum computing, QSVM, and ViT models. QSVM achieved 93.75% accuracy for X-rays and 87.5% accuracy for thermal images, while ViT achieved 80% for X-rays and 90% for thermal images. 70
Studies using computed tomography are scarce in the area of RA diagnosis and AI applications. Folle et al. reported that a neural network trained on 3D joint shapes from HR-pQCT scans successfully differentiated RA, PsA, and healthy controls with AUCs of 82%, 75%, and 68%, respectively. When applied to UA patients, the network classified 86% as RA, 11% as PsA, and 3% as healthy; however, further validation is needed. 71
Ultrasound is a sensitive method for detecting early soft tissue changes in RA, including synovitis and joint effusions. Various AI models have been developed to analyze this modality and enhance the detection and quantification of pathological features.72–77 For example, Wu et al. 72 employed a DL-based model to evaluate synovial proliferation in US images with high accuracy (AUC > 0.85 for all models) in assessing RA. Cipolletta et al. 73 developed and cross-validated a VGG16-based DL algorithm that accurately identified ultrasound images of metacarpal head cartilage (AUC = 0.99), showed near-perfect agreement with expert assessments (Cohen’s κ = 0.84), and outperformed beginner sonographers in both accuracy and efficiency. Karageorgos et al. 74 further reported and cross-validated a DL-based approach for automated synovium segmentation in ultrasound images of RA-affected finger joints, using CNNs and data augmentation strategies to achieve a Dice score of 0.768 and an intersection over union (IoU) of 0.624.
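The Dice score and IoU used to evaluate segmentation can be sketched for binary masks as follows. The masks are toy examples, and the convention of returning 1.0 for two empty masks is an assumption that varies between implementations.

```python
def dice_and_iou(predicted, truth):
    """Return (Dice, IoU) for two binary masks given as lists of rows."""
    pred_flat = [v for row in predicted for v in row]
    truth_flat = [v for row in truth for v in row]
    intersection = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    pred_size = sum(1 for p in pred_flat if p)
    truth_size = sum(1 for t in truth_flat if t)
    union = pred_size + truth_size - intersection
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_size + truth_size)
    iou = intersection / union
    return dice, iou


# Hypothetical 2 x 3 masks: 1 marks pixels labeled as synovium.
predicted = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
dice, iou = dice_and_iou(predicted, truth)
print(dice, iou)
```

For a single image the two metrics are linked by IoU = Dice / (2 - Dice), so a Dice score of 0.768 corresponds to an IoU of roughly 0.62, which is in line with the values quoted above.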
AI algorithms have also been utilized on MRI data to detect and quantify bone erosions and bone marrow edema (BME). Automatic quantification of BME and tenosynovitis in early arthritis patients can achieve accuracy comparable to clinicians.78–80
Schlereth et al. developed and validated a CNN-based automated scoring system for assessing bone erosions, osteitis, and synovitis in hand MRIs of RA and PsA patients, using MRIs from 211 patients for training and 220 patients for external validation. The model achieved high diagnostic accuracy with macro-AUCs of 92% for erosions, 91% for osteitis, and 85% for synovitis, and Spearman correlations of 90% for erosions, 78% for osteitis, and 69% for synovitis compared to expert assessments. 79 Methods of the studies often involve combining axial and coronal MRI slices into a single high-resolution 3D image, identifying and segmenting the carpal bones and extensor/flexor tendon regions in the wrist, and performing specific image intensity analysis to determine the proportion of voxels indicative of BME and tenosynovitis within a given bone or tendon.81–84 Folle et al. used ResNet neural networks to classify hand MRI inflammation patterns among seropositive RA, seronegative RA, and PsA, achieving AUROCs of 75%, 74%, and 67%, respectively, with minimal impact from omitting contrast-enhanced sequences or adding clinical data. When applied to psoriasis (PsO) patients without clinical arthritis, the model frequently assigned them to PsA, suggesting early PsA-like MRI patterns may be detectable before clinical onset. 84 DL-based MRI with automated segmentation effectively classifies knee synovitis (RA, gouty arthritis, and pigmented villonodular synovitis). 85 A generative adversarial network (GAN) model, PatchGAN, was introduced to override the need for gadolinium enhancement and showed strong performance for generating contrast-like images. 86
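The intensity-analysis step mentioned above, in which the proportion of voxels suggestive of BME is measured within a segmented bone, can be sketched as follows. The flattened voxel list, the segmentation mask, and the fixed threshold of 60 are all hypothetical; the cited studies may define abnormal intensity differently, for example relative to a reference tissue.

```python
def fraction_above_threshold(intensities, mask, threshold):
    """Proportion of voxels inside the mask whose intensity exceeds the threshold."""
    inside = [value for value, in_region in zip(intensities, mask) if in_region]
    if not inside:
        raise ValueError("mask selects no voxels")
    return sum(1 for value in inside if value > threshold) / len(inside)


# Hypothetical flattened voxel intensities and a mask for one carpal bone.
intensities = [10, 50, 80, 90, 20, 70]
bone_mask = [1, 1, 1, 1, 0, 0]
bme_fraction = fraction_above_threshold(intensities, bone_mask, threshold=60)
print(bme_fraction)
```

Restricting the count to voxels inside the segmentation is what makes the result a per-bone or per-tendon score rather than a whole-image measure.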
In addition, a novel DL-based method applied to 99mTc-maraciclatide imaging was reported to have high performance for segmenting normal, low, and highly inflamed tissues in RA, with Dice scores of 0.94, 0.51, and 0.76, respectively; however, further validation is needed. 87 AI-based fluorescence optical imaging is another promising tool for the differential diagnosis of RA. 88
Predicting joint damage and disease flares
Joint damage assessment
ML applications extend to RA imaging, mainly radiographs, which are used to monitor joint damage.13,89–103 ANN- and DL-based models can automate these assessments, which may reduce the reliance on physician expertise and enable rapid comparisons between images.
Izumi et al. proposed a DNN-based model for classifying wrist joint subluxation and ankylosis as part of an automated radiographic scoring system for RA based on a modified total Sharp score (mTSS). Although the model was trained on a small dataset, it demonstrated high accuracy (AUC of 97%) and solid performance metrics compared to AlexNet, ResNet, DenseNet, and ViT, contributing to the automation of mTSS evaluation; further validation of the model is needed. 93 Another model was developed as an automated system employing “You Only Look Once” for detecting and classifying joint regions in hand X-rays based on the mTSS. This model effectively detected finger and carpal joints, achieving a mean average precision of 0.92 for joint detection and an average classification accuracy of 0.88. Although the authors describe their approach as “mTSS-based,” they apply a simplified three-class classification (healthy, mild, severe) rather than regressing continuous mTSS scores. While this technically constitutes a modification of the original scoring system, it does not invalidate the performance metrics, provided they are interpreted within the scope of a categorical classification task. 94 Radke et al. aimed to improve erosion detection in X-rays of RA patients by training RetinaNet models with the focal loss function and adaptively modified IoU values, using the Sharp van der Heijde (SvH) score as the gold standard. The proposed adaptive IoU approach achieved 94% accuracy and a mean average precision (mAP) of 0.81, outperforming static IoU models, which reached only 80% accuracy and a mAP of 0.43. 95
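Detection metrics such as mAP depend on an IoU threshold that decides whether a predicted box counts as a correct detection. The sketch below shows this dependence for a single pair of boxes. The coordinates and thresholds are invented, and it illustrates only the general matching rule, not the adaptive IoU scheme of Radke et al.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(predicted_box, reference_box, iou_threshold=0.5):
    """A prediction counts as correct when its IoU reaches the threshold."""
    return box_iou(predicted_box, reference_box) >= iou_threshold


# Hypothetical pixel coordinates for one annotated erosion and one prediction.
reference_box = (0, 0, 10, 10)
predicted_box = (5, 0, 15, 10)

overlap = box_iou(predicted_box, reference_box)
print(f"IoU = {overlap:.3f}")
print("Correct at threshold 0.5:", is_true_positive(predicted_box, reference_box, 0.5))
print("Correct at threshold 0.3:", is_true_positive(predicted_box, reference_box, 0.3))
```

The same prediction is scored as a miss or a hit depending on the threshold, which is one reason reported accuracy and mAP values are comparable only when the matching criterion is stated.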
The RA2-DREAM Challenge successfully developed ML algorithms that accurately quantify joint damage in RA using radiographic images, achieving high concordance with expert-calculated SvH scores (0.71–0.82). 96 Venalainen et al. developed and externally validated an automated RA scoring algorithm (AuRA) developed during the RA2-DREAM challenge, designed to predict SvH total scores in hand and foot radiographs for monitoring radiographic progression in RA. AuRA was trained on 367 radiographs from clinical studies and validated on 205 radiographs, achieving superior performance (root mean square error (RMSE) 23.6) compared to the two top-performing RA2-DREAM algorithms (RMSEs 35.0 and 35.6). The algorithm significantly correlated with expert-assessed scores over time (Pearson’s R = 0.74, p < 0.001). 97 Moradmand et al. published the first report of an automated, multistage DL model using a ViT to predict the Overall Sharp Score from hand X-ray images in RA patients, reducing reliance on time-consuming manual scoring. The model achieved high accuracy in joint identification (99%) and demonstrated strong predictive performance for Sharp scores. 98
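RMSE and Pearson correlation, both reported for automated SvH scoring, answer different questions: the first measures absolute error, the second only linear association. A minimal sketch with invented scores (not data from AuRA or the RA2-DREAM challenge) shows how a model can correlate perfectly with experts while still being systematically offset.

```python
import math


def rmse(predicted, observed):
    """Root mean square error between two equally long score lists."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical total scores: the model overestimates every patient by 5 points.
expert_scores = [0, 10, 20, 30]
model_scores = [5, 15, 25, 35]

error = rmse(model_scores, expert_scores)
correlation = pearson_r(model_scores, expert_scores)
print(f"RMSE = {error:.1f}, Pearson r = {correlation:.2f}")
```

Reporting both, as the studies above do, guards against a high correlation masking a clinically relevant bias in the predicted scores.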
In addition to peripheral joint assessments, DL-based models of X-rays may automatically evaluate other joints, such as the atlantoaxial joint and knee.99–101
He et al. developed a DL model using multimodal US images to quantify RA activity according to the EULAR–OMERACT Synovitis Scoring system. Dynamic power Doppler models performed better than static models and surpassed many senior radiologists, achieving AUC values up to 0.95. 104 Fiorentino et al. 105 reported a DL framework-based ultrasonographic approach capable of automatically measuring cartilage thickness.
Predicting disease flares
AI and ML algorithms may be beneficial for predicting RA flares.106–112 The FLARE-RA study is a biopsy-driven clinical trial designed to predict and prevent disease flares in patients with RA who are in sustained remission. 106 It integrates synovial tissue spatial transcriptomics and blood single-cell RNA sequencing data from patients undergoing treatment withdrawal across various therapeutic classes (conventional disease-modifying antirheumatic drugs (cDMARDs), tumor necrosis factor inhibitor (TNFi), interleukin-6 receptor inhibitor (IL-6Ri), JAKi, CTLA4-Ig), combining these molecular signatures with clinical and imaging data. Using explainable AI, the study aims to develop and prospectively validate a decision-support tool to guide individualized treatment tapering strategies and optimize long-term remission outcomes. 106 The project is ongoing.
Matsuo et al. aimed to predict disease relapse in RA patients who were in remission, using ML approaches based on baseline ultrasound and blood test data. Three classifiers (Logistic Regression, RnFr, and XGBoost) were trained on 73 features from 210 patients, with XGBoost achieving the highest predictive performance (AUC = 0.747). The model identified 10 key features, including superb microvascular imaging scores for the wrist and metatarsophalangeal joints, and outperformed conventional prognostic markers. However, the absence of external validation limits the generalizability of the findings and underscores the need for further prospective evaluation. 107 O’Neil et al. aimed to identify serum proteomic signatures associated with clinical remission and future disease flare in RA patients using ML. In 130 RA patients from the RETRO cohort, 1307 serum proteins were quantified at baseline, and unsupervised clustering revealed four proteomic subgroups, one of which (Cluster 4) was associated with lower Disease Activity Score in 28 joints (DAS28) values and humoral immune activity. While clustering alone did not predict flare, an XGBoost classifier based on baseline proteomic data achieved an AUC of 0.80 in predicting a 12-month relapse after medication withdrawal. 108 Vodencarevic et al. utilized ML models to predict flare occurrences during tapering of biologic DMARDs (bDMARDs) in 41 RA patients in remission, achieving an AUC of 0.81. Baseline serum proteomics was analyzed from 130 stable RA patients in remission; an XGBoost model demonstrated superior predictive performance for future flares, with an AUC of 0.80. 109 Labinsky et al. explored the Rheuma Care Manager as a CDSS incorporating an AI-based flare risk prediction tool for managing RA. The tool showed good discrimination for flare prediction, with an AUROC of 0.80. 110 Beyond traditional ML methods, DL algorithms have been applied in various EHR settings to assess severity and flare risk, achieving moderate discrimination (AUC of around 0.7). 111 These findings warrant further research to assess the feasibility of utilizing routinely collected clinical and laboratory data to evaluate and predict future disease activity and flare risk in RA. However, systematic discrepancies in EHR data quality pose challenges to model generalizability.
Wearable activity trackers monitor movements and provide insights into symptom flare-ups in RA.112–114 ML facilitates recognizing activities from accelerometer data using post-processing techniques such as confidence thresholds and logical filters to enhance predictions. The French ActConnect study, which analyzed data from 155 patients, 1339 flare assessments, and 224,952 activity hours, achieved a sensitivity of 96%, specificity of 97%, PPV of 91%, and negative predictive value (NPV) of 99% in detecting flares using Bayesian methods. 112 Gandrup et al. assessed the feasibility of predicting RA flares using ML models trained on daily symptom data collected via a smartphone app (REMORA) from 20 patients over 3 months. Among several classifiers, logistic regression with elastic net regularization achieved the highest performance with an AUC of 0.82, a sensitivity of 0.60, and a specificity of 0.80 for flare detection. However, external validation is needed. 113
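Sensitivity and specificity are properties of a classifier, whereas PPV and NPV also depend on how common flares are in the assessed population. The sketch below applies Bayes' rule to the sensitivity and specificity quoted for ActConnect. The prevalence values are assumptions chosen for illustration and are not reported in the text above.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for given test characteristics and flare prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as quoted above; both prevalences are hypothetical.
ppv_assumed, npv_assumed = predictive_values(0.96, 0.97, prevalence=0.24)
ppv_rare, npv_rare = predictive_values(0.96, 0.97, prevalence=0.05)

print(f"Prevalence 24%: PPV = {ppv_assumed:.2f}, NPV = {npv_assumed:.2f}")
print(f"Prevalence  5%: PPV = {ppv_rare:.2f}, NPV = {npv_rare:.2f}")
```

With flares assumed in roughly a quarter of assessments, the resulting values are close to the reported PPV and NPV. With rarer flares the PPV drops markedly even though the classifier itself is unchanged, which matters when such a tool is moved to a population with a different flare rate.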
Assessment of disease activity and prediction of disease activity scores
Accurate assessment of disease activity in RA is essential for guiding treatment decisions, yet conventional indices like DAS28 or Clinical Disease Activity Index (CDAI) can be time-consuming. AI can streamline this process by predicting disease activity using diverse data sources, including EHRs, clinical notes (via NLP techniques such as named entity recognition and sentiment analysis), laboratory values, imaging, and patient-reported outcomes (PROs).115–123
Norgeot et al. developed a longitudinal DL model to predict RA disease activity at the next clinical visit using structured EHR data from two distinct health systems. Despite differing patient populations and treatment patterns, the model achieved high predictive performance at the university hospital (AUROC = 0.91) and maintained reasonable accuracy at the safety-net hospital (AUROC = 0.74). 116 Feldman et al. evaluated whether supplementing Medicare claims data with EHR data improves the estimation of RA disease activity, using DAS28-C-reactive protein (CRP) as the reference standard. Among 300 patients from the BRASS cohort, adding EHR-derived medications and laboratory data to claims data improved model performance, with C-statistics increasing from 0.61 (claims-only) to 0.76 (claims + medications + labs) for classifying high/moderate versus low disease activity. While continuous DAS28-CRP estimation remained suboptimal (adjusted R² = 0.18 at best), combining claims and EHR data enabled reasonable discrimination between binary disease activity categories. 117 Spencer et al. reported an ML model to estimate CDAI scores from clinical notes using data from the OM1 RA Registry. The model demonstrated strong performance with an AUC of 0.88, PPV of 0.80, NPV of 0.84, and Spearman’s R of 0.72. 118 Curtis et al. assessed whether longitudinal PROs could serve as surrogates for the CDAI in predicting low disease activity (LDA) among RA patients starting new biologics. ML techniques, primarily RnFrs, yielded an accuracy of approximately 80%, highlighting the potential of PRO data for real-world disease activity classification when physician-derived measures are unavailable. 119 However, the main barriers to daily usage include the heterogeneity of EHR data, variability in clinical documentation practices, and difficulty distinguishing between disease symptoms and treatment-related side effects. 115 Moreover, NLP models may struggle with domain-specific language, abbreviations, and inconsistent use of clinical terminology. Incorporating multimodal data integration approaches, expanding training datasets to include diverse patient populations, and enhancing model interpretability through tools like SHAP may help overcome these issues. 115
Mobile activity trackers may be useful for predicting disease activity and PROs in RA.124,125 Rao et al. developed ML models to predict PRO scores in RA patients. They suggested that activity tracker data can effectively monitor health status over time while underlining the need for external validation. 124 The weaRAble-PRO study explored how digital health technologies, including smartphones and wearables, could enable the development of innovative RA assessment methods by combining PRO (PRO-RAPID3 Score) with sensor-based data. Over 14 days, health status, mobility, fatigue, and RA-specific symptoms measured with smartwatches and iPhones accurately differentiated between RA patients and healthy controls (F1: 0.807) and improved the detection of RA severity and flare risk when paired with PROs (F1: 0.833 compared to 0.759 for PROs alone). 126
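The AUROC and C-statistic values quoted in this section share one interpretation: the probability that a randomly chosen patient with the outcome receives a higher predicted score than a randomly chosen patient without it. A minimal sketch of that pairwise definition follows. The scores and labels are invented, and the quadratic pairwise loop is chosen for clarity rather than efficiency.

```python
def auroc(scores, labels):
    """AUROC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return correct / (len(positives) * len(negatives))


# Hypothetical predicted probabilities of high disease activity at the next visit.
predicted = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
observed = [1, 1, 0, 1, 0, 0]  # 1 = high disease activity actually observed

auc = auroc(predicted, observed)
print(f"AUROC = {auc:.3f}")
```

Because the metric depends only on ranking, it says nothing about calibration, so a model can retain a reasonable AUROC at a second site while its predicted probabilities are systematically too high or too low there.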
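The F1 values quoted for weaRAble-PRO combine precision (PPV) and recall (sensitivity) into a single number through their harmonic mean. The short sketch below uses invented confusion-matrix counts, not the study's data.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if true_pos == 0:
        return 0.0  # no correct positive predictions
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 40 RA patients correctly flagged, 10 controls wrongly
# flagged, and 8 RA patients missed.
f1 = f1_score(true_pos=40, false_pos=10, false_neg=8)
print(f"F1 = {f1:.3f}")
```

True negatives do not enter the formula, so F1 is most informative when correctly identifying the positive class is the main concern.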
Besides NLP algorithms, AI-based MRI and/or ultrasonography imaging can potentially assess RA disease activity.127–130 Mao et al. developed an AI-based model using dynamic contrast-enhanced MRI to quantify synovitis in RA patients, achieving high disease activity assessment performance with AUCs ranging from 0.941 to 0.965 and Dice scores between 0.557 and 0.615. The model correlated with expert assessments (Spearman’s 0.884–0.927 for joints and 0.736–0.831 for whole hands). 127 The RATING system, a DL-based tool using multimodal ultrasound and self-supervised pretraining, improves RA activity scoring accuracy to 86.1% and enhances radiologists’ diagnostic performance from 41.4% to 64.0%. 128 However, external validation is needed for both studies.
Two novel composite remote disease activity indexes, ThermoDAI and ThermoDAI-CRP, are based on thermal imaging and ML to assess RA activity without formal joint counts. ThermoDAI and ThermoDAI-CRP showed strong correlations with ultrasound-determined synovitis (GS = 0.52–0.58; PD = 0.56–0.61) and high sensitivity for detecting active synovitis, outperforming PGA and PGA + CRP in relation to CDAI, SDAI, and DAS28-CRP scores (ρ > 0.81).131,132
Predicting treatment response
In RA treatment, the broad armamentarium of therapeutic options, including glucocorticoids, conventional and targeted synthetic DMARDs (tsDMARDs), and bDMARDs, makes selecting the optimal therapy challenging, owing to the lack of solid data across different patient populations, variability in drug responses, and reliance on a trial-and-error approach. AI applications may improve the prediction of treatment response to DMARDs and of adverse event risk.133–143
For the prediction of methotrexate response from fecal RNA using RnFr, an AUC of 0.84 was achieved. 144 To predict adverse events via EHRs, Surendran et al. developed an ML model using RnFr to forecast liver enzyme elevation in RA patients treated with methotrexate. This model demonstrated high accuracy (F1 score: 0.87), with baseline high-normal transaminase levels and elevated lymphocyte and neutrophil counts identified as the primary predictors. 145 Furthermore, other genome, exome, and biologic material sequencing models predicted methotrexate response with AUCs ranging from 0.7 to 0.8. 19 Li et al. developed a diagnostic XGBoost model using circRNA biomarkers (hsa-circ0002715, hsa-circ0001946) identified through ML techniques for predicting methotrexate-insufficient response and optimizing TNFi treatment in RA patients. 146 Li et al. aimed to predict 6-month non-remission in 222 DMARD-naïve RA patients starting methotrexate monotherapy, using an ML model built on baseline characteristics from the ARCTIC trial. A super learner algorithm combining elastic net, RnFr, and SVM achieved area under the receiver operating characteristic (ROC) curve values of 0.75–0.76 with sensitivities of 0.77–0.81. The model identified the baseline Rheumatoid Arthritis Impact of Disease score as the most powerful predictor. 147
ANNs and RnFr algorithms have been used to predict responses to TNFi like infliximab, adalimumab, and etanercept in RA patients.19,141 Based on clinical and molecular data, these models reached a high degree of accuracy, sensitivity, and specificity, demonstrating the capability of ML to guide better treatment decisions. Sonomoto et al. developed inexpensive ML models, specifically lasso logistic regression, for predicting CDAI remission 6 months after initiating TNFi in patients with RA. The models, which relied exclusively on routine clinical data input without subjective physician contribution, demonstrated promising accuracy and underlined the feasibility of regional or institution-specific precision medicine approaches. 148 A study integrating clinical and molecular data attained an AUC of 0.91 in predicting the response to anti-TNF therapy. The investigation examined the impact of RETN polymorphisms on remission in patients with RA treated with TNF-α inhibitors and used ML to forecast remission, with the elastic net algorithm showing the best predictive performance. 149
Besides the TNFi, several prediction models were published for different biologics. Johansson et al. presented the development and validation of a remission prediction score for tocilizumab monotherapy in RA using RCT and real-world data. The models’ performance in predicting remission was good, with improved discrimination achieved by retraining on real-world data and expanding the variable set. 150 Rehberg et al. applied an ML-based decision tree (GUIDE) to identify a clinically actionable rule for predicting response to sarilumab in RA. The rule (anti-CCP positivity combined with CRP >12.3 mg/L) was derived from the MOBILITY trial and validated across multiple trials (MONARCH, ASCERTAIN), showing strong predictive value for ACR and DAS28-CRP responses. Although the performance was limited in TNFi-refractory patients (TARGET), the approach demonstrated potential for guiding individualized biologic selection between sarilumab and adalimumab. 151 Kalweit et al. applied DL to cluster RA patients and investigate their response to b/tsDMARDs. Results showed that patients with at least two conventional synthetic DMARDs (csDMARDs) and prednisone at the start of b/tsDMARD, male patients, and those with lower disease burden had a significantly better response to tocilizumab compared with adalimumab, with hazard ratios (HRs) ranging from 3.64 to 8.44. In contrast, seronegative women not using prednisone at initiation and seropositive women with greater disease burden and longer disease duration had a higher risk of non-response to golimumab compared with adalimumab: HRs 2.36 and 5.27, respectively. 152 Alten et al. applied ML models, particularly a gradient-boosting classifier, to predict 12-month abatacept retention in 5320 RA patients from the ACTION and ASCORE trials, achieving an AUC of 0.620 and an F1 score of 0.659. Key predictive factors identified through SHAP analysis included low BMI, ACPA positivity, low patient global assessment, younger age, and better functional status. 153
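One appeal of tree-derived rules such as the sarilumab rule described above is that they reduce to a transparent check. The sketch below encodes the rule exactly as stated in the text (anti-CCP positivity combined with CRP > 12.3 mg/L). The class and function names are hypothetical, and the example is illustrative only, not a clinical decision tool.

```python
from dataclasses import dataclass

# Threshold as quoted in the text; strictly greater than 12.3 mg/L.
CRP_THRESHOLD_MG_PER_L = 12.3


@dataclass
class BaselineProfile:
    """Hypothetical container for the two baseline variables used by the rule."""
    anti_ccp_positive: bool
    crp_mg_per_l: float


def meets_rule(profile):
    """True when the profile satisfies both conditions of the rule."""
    return profile.anti_ccp_positive and profile.crp_mg_per_l > CRP_THRESHOLD_MG_PER_L


# Invented example profiles.
examples = [
    BaselineProfile(anti_ccp_positive=True, crp_mg_per_l=20.0),
    BaselineProfile(anti_ccp_positive=True, crp_mg_per_l=12.3),
    BaselineProfile(anti_ccp_positive=False, crp_mg_per_l=30.0),
]
for profile in examples:
    print(profile, "->", meets_rule(profile))
```

A rule of this form can be audited line by line, in contrast to gradient-boosting models, which need post hoc tools such as SHAP to explain individual predictions.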
A recent scoping review by Benavent et al. 154 aimed to evaluate the application of AI in predicting treatment response to DMARDs in patients with RA. Out of 66 studies included, 32 specifically focused on RA. The review assessed various AI techniques, including ML models such as RnFrs, SVMs, NNs, and DL approaches. These methods analyzed diverse data types, including clinical parameters, imaging, genetic, and transcriptomic data. 154 Overall, the included models demonstrated moderate to high predictive performance, with reported AUC values typically ranging between 0.70 and 0.90 and some models exceeding 0.90. However, the review also highlights the need for further validation and standardization of models before widespread clinical adoption. 154 Messelink et al. performed text mining and feature weighting on real-world data of RA patients to identify those with D2T RA. They also developed a predictive model assessing the risk of developing D2T RA before starting the first biological or tsDMARD. It correctly identified 79% of the D2T cases, with an AUROC of 0.73. 155 Besides all these studies, a systematic review has been conducted to appraise the performance and transparency of ML models that predict RA treatment response. Among 29 analyzed studies, the adherence to reporting standards was moderate, as assessed by TRIPOD (42.9%–45.6%), and most of the studies had unclear or high risk of bias (79.3%). Although ML methods are increasingly applied in RA, the findings underline the need for improved transparency, calibration, and better handling of missing data to enhance the reliability of these models. 156
Drug development and repurposing
AI, particularly ML, may play a significant role in drug repurposing for RA by facilitating the identification of novel therapeutic candidates.157–161 Various ML techniques, including text mining, structure-based, signature-based, and network-based approaches, are employed to analyze vast datasets, extract patterns, and predict potential drug candidates.157,160,162–167 For instance, ML models are used to prioritize genes, predict drug-target interactions, and identify effective inhibitors against inflammatory targets, even for the production of virtual twin synovial joints.157,167–170 Integrating ML and NLP with conventional methodologies enhances the efficiency of the drug discovery process by rapidly identifying promising candidates and minimizing experimental costs. 171 Moreover, ML-based approaches improve the accuracy of predictions by combining multi-omics data, which is essential for addressing the complexity of RA pathogenesis. 157
Prabha et al. utilized ML to identify novel and potential TNFi, addressing the limitations of existing treatments for RA, such as cost, administration challenges, and side effects. The best performance among all the tested algorithms was exhibited by the RnFr model, which yielded an accuracy of 87.96% and a sensitivity of 86.17% for the combination of 1D, 2D, and fingerprint features; this is a novel application of ML in predicting TNFi and offers a cost-effective and efficient approach to drug discovery for RA. 172 Tariq et al. integrated a multi-omics approach combining transcriptomics and epigenomics data with bioinformatics and ML techniques to identify 18 novel multi-evidence genes (MEGs) associated with RA. Twelve of these MEGs, previously unlinked to RA, were found to be part of critical protein-protein interaction (PPI) networks and RA-related pathways, exploring the possibility of AI-driven analysis for discovering new therapeutic markers. 173
However, it should be underlined that effective implementation of AI requires comprehensive, connected datasets and validation at the individual patient level. 174
Assessment of co-existing conditions
RA is associated with several comorbidities, making the early identification of at-risk patients critical for preventing and managing these complications. ML and DL algorithms have been applied to identify and stratify comorbidities. Crowson et al. analyzed comorbidity patterns in RA patients using four clustering methods and compared them to non-RA individuals. Although RA patients had a higher overall comorbidity burden, the comorbidity patterns were similar across groups, and the instability of clustering methods underscored the need for caution when interpreting results from a single approach. 175 Solomon et al. used multiple ML algorithms (K-modes, K-means, regression-based, and hierarchical clustering) to cluster 24 comorbidities in 11,883 RA patients from the CorEvitas registry over 6 years. While ML-derived comorbidity clusters produced similarly strong predictive models for CDAI and health assessment questionnaire (HAQ)-DI as those using individual comorbidities, the clusters were often clinically difficult to interpret. 176
The complexity of the multimorbidity network in RA has only recently been described as involving a complex interplay between various organs and systems affected by systemic inflammation, which is crucial to consider within the network concept. 177 Key nodes in this network include cardiovascular diseases (CVDs), metabolic comorbidities, musculoskeletal system complications, infections, malignancies, mental health disorders, gastrointestinal and hepatobiliary disorders, and pulmonary complications. 177 These interrelated conditions complicate disease management and significantly impact prognosis and quality of life; therefore, a personalized and multidisciplinary approach is necessary. ML and DL approaches can elucidate the underlying patterns and possibly the key factors in understanding this phenomenon. Utilizing ML, England et al. identified patterns of multimorbidity among patients with RA related to cardiopulmonary, cardiometabolic, and mental health and chronic pain disorders. Compared to individuals without RA, patients with RA had significantly higher odds of these multimorbidity patterns, particularly concerning mental health and chronic pain disorders. 178
Cardiovascular and metabolic disease risk prediction
AI-based risk stratification may improve cardiovascular outcomes and decision-making in RA patients.179,180 Sun et al. used RNA sequencing data and ML approaches to identify shared biological mechanisms and biomarkers between RA and ischemic heart failure. Through PPI network analysis and scRNA-seq, five hub genes (CD2, CD3D, CCL5, IL7R, SPATA18) were identified. 181
ML algorithms developed by Wei et al. that used demographic, clinical, and laboratory data for predicting the incidence of coronary heart disease in RA patients achieved an AUC of 0.79 with an accuracy of 76%, thereby outperforming traditional CVD risk assessments such as the Framingham Risk Score. 182 Konstantonis et al. proposed an ML framework for the diagnosis of CVD in medium- to high-risk subjects using a Greek cohort with RA, diabetes mellitus, and/or arterial hypertension. The framework included 46 CVD risk factors and showed high accuracy (98.40%) and AUC (0.98) at initial and follow-up visits, outperforming classical CVD risk scores. 183 Feng et al. reported risk factors (elevated serum lipids, lipoproteins, and decreased Treg cells) via ML models (LASSO, RnFr, LR) for CVD in RA patients, and their three proposed nomograms achieved high predictive performance; however, external validation is needed. 184
To identify disease-specific biomarkers for RA with multiple metabolic diseases (RA_MD), Zhu et al. used a multi-omics approach (dysregulated lipid metabolism, including altered gut microbiota, metabolites, genes, proteins, and phosphoproteins) combined with the LASSO ML algorithm, achieving high diagnostic accuracy (AUC: 0.958). 185
Interstitial lung disease
Using AI, computer-assisted systems can probe pathogenesis and quantify, classify, and assess the prognosis of RA-associated interstitial lung disease (RA-ILD).186–189 Biomarkers that predict the occurrence, severity, and prognosis of RA-ILD are valuable because ILD progresses insidiously in the early stages and can be a major cause of mortality and morbidity. 190 Qin et al. 191 demonstrated that ML algorithms, together with KL-6, D-dimer, and tumor markers, can greatly aid RA-ILD identification. Shi et al. identified seven candidate genes linked to RA and idiopathic pulmonary fibrosis (IPF). TOP2A showed significant involvement in both conditions through the TGF-β/Smad signaling pathway. 192 Adding ML-based quantitative CT analysis to polygenic risk scores revealed similarities between IPF and RA-ILD, which may be used to stratify RA patients’ ILD risk. 193 Zhou et al. reported an ML-based risk prediction model for RA-ILD using data from 459 RA patients; key risk factors were advanced age, smoking, high DAS28 score, strong CCP positivity, and methotrexate use, while hormone therapy was protective. The model achieved an AUC of 0.8914, sensitivity of 74.5%, and specificity of 89.7%, although external validation is needed. 194 Humphries et al. applied DL-based data-driven texture analysis (DTA) on CT scans of 289 RA-ILD patients (primary cohort) and 50 patients (validation cohort), finding that higher DTA fibrosis scores were negatively correlated with lung function (forced vital capacity (FVC): rho = −0.55, −0.50; Diffusing Capacity of the Lungs for Carbon Monoxide (DLCO): rho = −0.67, −0.65; p < 0.001) and significantly associated with increased mortality risk (HR = 1.04–1.06; p < 0.05). Sequential increases in DTA fibrosis scores independently predicted mortality (HR = 1.04; 95% confidence interval (CI), 1.01–1.06; p = 0.003). 195
NLP applications can extract and standardize unstructured clinical information from EHR into Systematized Nomenclature of Medicine—Clinical Terms (SNOMED-CT) terminology, identifying and characterizing RA and RA-ILD. 196 England et al. developed an NLP tool to extract FVC values from EHR notes of 7485 RA-ILD patients. The tool captured 15,383 FVC values from 4982 patients, compared to 5911 values from 1844 patients obtained directly from pulmonary function test equipment. The tool demonstrated high accuracy (r = 0.94, precision = 0.87) and increased the capture of longitudinal FVC data by three-fold. 197
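The general idea of pulling FVC values out of free-text notes can be illustrated with a simple rule-based sketch. The regular expression, the function name, and the example notes are all hypothetical. This is not the tool developed by England et al., and real clinical text would need far more robust handling of phrasing, units, and negation.

```python
import re

# Hypothetical pattern: "FVC", up to 20 non-digit characters, a number, a unit.
FVC_PATTERN = re.compile(
    r"\bFVC\b[^0-9%\n]{0,20}?(\d+(?:\.\d+)?)\s*(L|liters?|%)",
    re.IGNORECASE,
)


def extract_fvc(note):
    """Return a list of (value, unit) tuples for FVC mentions in a note."""
    results = []
    for match in FVC_PATTERN.finditer(note):
        value = float(match.group(1))
        unit = "percent_predicted" if match.group(2) == "%" else "liters"
        results.append((value, unit))
    return results


# Invented example notes.
note_a = "PFTs today: FVC 2.85 L (72% predicted), DLCO 55%. Prior FVC: 3.10 L."
note_b = "FVC was 68 % of predicted; patient reports stable dyspnea."

print(extract_fvc(note_a))
print(extract_fvc(note_b))
```

Even this toy pattern shows why such tools require validation against a reference source: it captures only the first number after each FVC mention and would miss values reported in tables or with unfamiliar wording.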
Osteoporosis
Bone mineral density (BMD) assessment is valuable throughout the RA disease process, as reduced bone density may be a poor prognostic factor in early disease and may be a cause of morbidity in later phases. 198 ML algorithms can reveal common genetic and -omic backgrounds between osteoporosis (OP) and RA and predict the risk of OP in RA patients.199,200 Lo et al. 201 used ML techniques, including LASSO and RnFr, to find that ATXN2L and MMP14 were the key genes involved in progressing from RA to glucocorticoid-induced OP. Saito et al. evaluated the data from the KURAMA cohort (n = 302) and an external validation cohort (n = 32) regarding the correlation between the second metacarpal cortical index (2MCI) and BMD in RA patients. ML models incorporating 2MCI and clinical parameters demonstrated good predictive performance for detecting osteoporosis/osteopenia and BMD, with external validation confirming their generalizability. 202 Lee et al. reported ML models (including BMI, age, menopause, waist and hip circumferences, RA surgery, and monthly income) to predict OP in RA patients using the KORONA database, with logistic regression achieving the highest AUC (0.750) and XGBoost showing the highest accuracy (0.682); however, further validation is needed. 203
Mortality and other conditions
Surgery may be needed in patients with advanced articular deformities. Baxter et al. reported two extreme gradient boosting ML models to predict the likelihood of RA patients undergoing RA-related surgery and the type of surgery required, using clinical data from EHRs. The model predicting surgery occurrence achieved a high AUC of 0.90. 204
Lezcano-Valverde et al. developed and externally validated an RA mortality prediction model using the RSF based on clinical and demographic data from two independent Spanish cohorts (HCSC-RAC and PEARL). The model demonstrated acceptable prediction error (0.187 training, 0.233 validation), with age at diagnosis, erythrocyte sedimentation rate (ESR), and number of hospital admissions being the most predictive variables. Although calibration showed some overestimation of mortality risk, the RSF model successfully stratified patients into risk groups. 205 Using a K-nearest neighbor quantifier to analyze high-resolution CT scans from the COPDGene cohort, McDermott et al. 206 reported that RA was associated with a higher percentage of interstitial changes, and RA patients with emphysema above the 75th percentile had over fivefold higher mortality compared to non-RA participants (HR = 5.86; 95% CI, 3.75–9.13).
Important note to readers
Whether AI or classical statistics is used to generate prediction models, the generalizability of these models remains an important research question. The performance of these models should be evaluated in prospective designs across groups with varying ethnic and social backgrounds.
AI in the management of axSpA
axSpA is a chronic inflammatory disease primarily affecting the sacroiliac joints (SIJs) and spine, leading to pain and progressive structural damage. 207 Diagnosis can be challenging due to the disease’s variable presentation, and management is often complex, requiring careful monitoring of both inflammatory and structural progression. 207 Recently, AI, mainly through DL algorithms, has emerged as a promising approach for enhancing axSpA diagnosis, disease monitoring, and precision medicine (Table 3). This review synthesizes the role of AI across various imaging modalities, predictive modeling, and real-time patient monitoring to demonstrate how AI can improve axSpA care.
Table 3. AI applications in axSpA.
AI, artificial intelligence; ANN, artificial neural network; AUC, area under the curve; axSpA, axial spondyloarthritis; CDSS, clinical decision support system; CNN, convolutional neural network; CT, computed tomography; DL, deep learning; EHR, electronic health record; GEO, Gene Expression Omnibus; ML, machine learning; MRI, magnetic resonance imaging; NLP, natural language processing; NPV, negative predictive value; PBMC, peripheral blood mononuclear cell; PET, positron emission tomography; PPV, positive predictive value; RMSE, root mean square error; RnFr, random forest; SVM-RFE, support vector machine recursive feature elimination; TabNet, tabular neural network.
Predicting the risk and detecting axSpA
From a basic science perspective, Han et al. aimed to identify ankylosing spondylitis (AS)-specific mRNA biomarkers from whole blood using an ML-based feature selection method, support vector machine-recursive feature elimination (SVM-RFE), applied to the mRNA expression profile (GSE73754) from the GEO. ML identified 13 key mRNAs, with IL17RA, Sqstm1, Picalm, Eif4e, Srrt, Lrrfip1, Synj1, and Cxcr6 validated as significant for AS diagnosis. Notably, Cxcr6, IL17RA, and Lrrfip1 were associated with AS severity. 208 In another study, Alber et al. used single-cell CITE-seq technology and ML models to analyze peripheral blood mononuclear cells in AS patients, achieving an AUROC of >0.95 for AS classification. Overexpression of CD52, TNFSF10, IL-18Rα, and cytotoxic genes in specific cell subsets was identified. 209
Novel AI models are being developed to detect axSpA early, integrating various data sources to strengthen diagnostic accuracy. Jia et al. proposed an improved optimization algorithm, SCJAYA (Salp Chain JAYA Algorithm), which integrates the salp swarm’s foraging behavior with cooperative predation strategies to improve the diagnosis of AS. The binary version of SCJAYA was applied to feature selection in the bSCJAYA-FKNN (fuzzy K-nearest neighbor) classifier, which achieved 99.23% accuracy and 99.52% specificity on both AS-specific and public datasets. 210 Deodhar et al. and Zhang et al. developed models that use medical and pharmacy claims data and routine blood tests to identify high-risk individuals, while Ye et al. extended this approach by combining MRI findings with clinical risk factors in a predictive clinical-radiomics nomogram. These models draw on a wide range of biomarkers and suggest efficient strategies for the early diagnosis of axSpA. 211,212 Using an ML approach, Zhu et al. developed a diagnostic model for AS based on routine blood tests, liver function, and kidney function; seven factors were identified, including ESR, red blood cell count, mean platelet volume, albumin, aspartate transaminase, and creatinine. These were integrated into a nomogram with an AUC of 0.878 in the training cohort and 0.823 in the validation cohort. 213 To address the clinical heterogeneity of axSpA, Sun et al. identified two distinct patient subtypes with unsupervised ML. Their predictive model integrated biomarkers such as CRP, neutrophils, and monocytes and reached an AUC of 0.983. This stratification may help tailor treatment strategies to patient profiles. 214 In addition, Wen et al. used ML algorithms, including LASSO regression, SVM-RFE, and RnFr, to analyze gene expression profiles from the GEO database. They identified IL2RB and ZDHHC18 as potential diagnostic biomarkers for AS. The diagnostic model achieved AUC values exceeding 0.84 across multiple validation datasets. 215 From the same database, Li et al. found that pyroptosis-related genes may be potential biomarkers and drug targets for AS. 216
Kennedy et al. aimed to develop an ML model to predict AS using primary care health records from the Secure Anonymised Information Linkage databank. The model was trained on clinical, demographic, and laboratory data, including symptoms, comorbidities, medication use, diagnostic codes, and blood test results. Using decision trees built on principal component analysis, the model was trained on a cohort of 543 male and 250 female AS patients, along with 2900 male and 2900 female controls, with additional tests on 1559 AS patients and 4500 controls. The model showed high prediction accuracy for males, with a PPV of 76.69%, and for females, 78.3%. However, applying the model to a general population with a low prevalence of AS reduced the PPV to 0.33% for males and 0.25% for females. 217
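The drop in PPV described above follows directly from Bayes' theorem: for a fixed sensitivity and specificity, PPV falls as disease prevalence falls. The following is a minimal sketch of that relationship. The sensitivity and specificity values are hypothetical and are not taken from Kennedy et al., whose operating point is not reported here.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem.

    PPV = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point, chosen only for illustration.
SENS, SPEC = 0.80, 0.95

# The same classifier evaluated at decreasing disease prevalence.
for prev in (0.20, 0.05, 0.01, 0.001):
    print(f"prevalence={prev:.3f}  PPV={ppv(SENS, SPEC, prev):.3f}")
```

With these illustrative inputs, a PPV of 0.80 in an enriched case-control sample falls below 0.02 at a prevalence of 0.1%, which is why screening performance should be reported at the prevalence of the intended target population.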
NLP may aid in improving the classification of axSpA by using EHRs.218–220 Benavent et al. used NLP to extract clinical data from the EHRs of 4337 patients diagnosed with spondyloarthritis at a large hospital. The NLP system extracted information on demographics, SpA subtypes, comorbidities, and treatment patterns, identifying methotrexate and adalimumab as the most commonly used csDMARD and bDMARD, respectively. The technology was highly reliable, with F1 scores above 0.80 for data extraction. These findings illustrate the potential of NLP to enhance SpA patient profiling and improve clinical management by efficiently analyzing unstructured EHR data. 218 However, the value of NLP depends on what clinicians record, and documentation needs to improve for AI to be useful. In the SpAINET study, NLP was applied to the EHRs of axSpA and PsA patients treated with bDMARDs in Spain. 221 The findings suggest a critical gap in real-world data on the assessment and management of disease activity in both diseases. Notably, disease activity scores were very rarely recorded at the start of biologic therapy, pointing to the need for better clinical documentation practices. The SpAINET study underlines the potential of AI and sophisticated data analysis to address existing shortcomings in axSpA and PsA management and to support more active and specialized patient care. 221
An important problem in daily clinical practice is differentiating axSpA from other causes of chronic back pain. Redeker et al. reported a RnFr-based ML model to distinguish axSpA from non-axSpA in patients with chronic back pain using clinical data from 939 patients (659 axSpA, 280 non-axSpA). The model achieved high performance with an accuracy of 0.9234, sensitivity of 0.9586, specificity of 0.8438, and ROC-AUC of 0.9717, with HLA-B27, insidious onset of back pain, and SIJ erosions being the most important predictors. 222
Insights into imaging techniques for diagnosis
Radiography, or X-ray, remains a major modality for diagnosing axSpA due to its wide availability and affordability. 207 The modified New York criteria for diagnosing AS, one of the subtypes of axSpA, have traditionally relied on the presence of radiographic sacroiliitis, which is typically defined as at least grade 2 bilaterally or grade 3–4 unilaterally. Plain radiographs have several limitations, the most significant of which is their insensitivity early in the disease process. 207
AI models can enhance the sensitivity of radiographic analysis.13,103,223–225 A CNN model based on the ResNet architecture, trained to differentiate between normal and sacroiliitis-affected images, showed a very high AUC of 0.97 on a validation dataset, comparable to expert radiologist performance. 223 Another study applied transfer learning techniques to networks such as VGG-16, ResNet-101, and Inception-v3, achieving high sensitivity and specificity. 224 A semi-supervised DL model diagnosed AS from pelvic radiographs, reaching expert-level performance with only 10% labeled data. Validated on an independent test set of 982 images, the model reached 89.1% accuracy, 86.5% recall, and 85.9% precision, while interpretability analysis supported its clinical reliability. 226 Koo et al. developed a DL model to automatically grade vertebral corners on spinal radiographs for Modified Stoke Ankylosing Spondylitis Spinal Score (mSASSS) scoring in AS. The model achieved high accuracy (91.6%). 227 Similarly, Chen et al. proposed a DL model to automate mSASSS scoring using lateral X-ray images of the cervical and lumbar spine in patients with AS. The model was trained on data from 554 patients and reached an accuracy of 86.5% in detecting vertebral structural damage and scoring individual vertebral corners. Both models need further validation. 228 Furthermore, integrating anatomical landmark knowledge derived from the radiograph into the model enhanced its performance. 103
Ultrasound plays a role in assessing peripheral inflammation, including enthesitis and synovitis, which are common manifestations of SpA. Its limited use in axSpA diagnosis is attributed to its dependence on the interpreter’s expertise, which makes it difficult to standardize. An AI-driven regression model for calculating the Madrid Sonographic Enthesis Index, an indicator of enthesitis severity, suggests that AI may bring consistency to ultrasound interpretation, reducing variability and improving reliability in the clinical setting. 229 However, this study has not yet been published in full text, so its results should be interpreted cautiously. The role of ultrasound in axSpA remains complementary, and its wider application is pending more extensive validation studies.
CT imaging offers superior sensitivity and specificity compared to radiography, especially for identifying bone erosions, sclerosis, and ankylosis in the SIJs. However, the images produced by CT are complex and require expertise for interpretation. AI-driven models have further advanced CT-based axSpA analysis, allowing for a more detailed assessment of structural changes. Liu et al. 230 proposed a model that combines U-Net segmentation with radiomics, achieving an accuracy of 87.3% in diagnosing sacroiliitis from CT images, nearly 10% higher than that achieved with traditional methods. Castro-Zunti et al. 231 examined additional applications of DL using an InceptionV3 architecture for detecting sacroiliac erosions and structural damage, surpassing musculoskeletal radiologists in sensitivity and specificity. In a multi-center study, Van Den Berghe et al. trained a DL model to identify erosions and ankylosis on CT scans by concentrating on the cortical edges of the SIJs. Using Grad-CAM++ visualization, this model attained high sensitivity and specificity, showcasing AI’s potential to establish standardized, reproducible CT-based diagnostic processes in axSpA. 232
MRI provides unique insights into bone and soft tissue, which are imperative for the early diagnosis of axSpA, especially when radiographic findings are inconclusive. 207 Early inflammatory changes, such as bone marrow edema (BME), can be identified on MRI and are considered important markers for axSpA. 233 However, BME assessment is subject to significant intra- and inter-observer variability, highlighting the need for standardized approaches.
AI has significantly improved the detection of BME and/or structural lesions and synovitis on MRI.234–246 Recently, Nicolaes et al. evaluated a previously trained deep-learning algorithm for detecting inflammation in SIJ MRIs from 731 axSpA patients (326 nr-axSpA, 405 r-axSpA) across two trials (RAPID-axSpA, C-OPTIMISE). The algorithm, blinded to clinical data, achieved a sensitivity of 70%, specificity of 81%, PPV of 84%, NPV of 64%, Cohen’s kappa of 0.49, and absolute agreement of 74% compared to expert readings. 234 A DL model proposed and tested by Bordner et al. utilized MRI data to identify BME in SIJs, facilitating the prediction of active sacroiliitis according to the ASAS criteria in patients with chronic inflammatory back pain. The model demonstrated excellent performance, as evidenced by an AUC of up to 0.98 and a specificity of 100% in external validation using data from the DESIR cohort. 245 Tenorio et al. reported that radiomic features extracted from fat-suppressed MRI sequences (SPAIR and STIR) are significantly associated with active sacroiliitis and clinical markers of SpA. The models were validated against expert musculoskeletal radiologists, showing strong concordance. 247 Ozga et al. 248 created a model with a Dice coefficient of 0.98 for segmenting lesions, achieving near-perfect agreement with expert radiologists. Other studies identified active sacroiliitis using DL networks and combined models with the Spondyloarthritis Research Consortium of Canada (SPARCC) scoring system to ensure accurate grading. Zhang et al. merged radiomics from MRI with clinical data, achieving excellent diagnostic performance in distinguishing axSpA from mimics. For this purpose, they focused on structural markers of erosions, subchondral lesions, and sclerosis. Structural lesions such as erosions and subchondral abnormalities are crucial features of axSpA diagnosis on MRI but can be challenging to detect. 249
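Reader-agreement metrics of the kind quoted above (sensitivity, specificity, PPV, NPV, absolute agreement, and Cohen's kappa) can all be derived from a single 2×2 table comparing the algorithm with expert readings. The sketch below shows how they relate. The counts are hypothetical and are not taken from any of the cited studies.

```python
def agreement_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Summarize a 2x2 table of algorithm calls versus expert readings.

    tp/fn: expert-positive cases the algorithm called positive/negative.
    fp/tn: expert-negative cases the algorithm called positive/negative.
    """
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # absolute agreement
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "agreement": observed,
        "kappa": (observed - expected) / (1.0 - expected),
    }


# Hypothetical counts, for illustration only.
for name, value in agreement_metrics(tp=40, fp=10, fn=10, tn=40).items():
    print(f"{name}: {value:.2f}")
```

Because kappa subtracts chance agreement, it is always lower than absolute agreement whenever chance agreement is above zero, which is why a model can show 74% agreement but only moderate kappa.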
Radiomics and clinical data have effectively distinguished between axSpA and non-axSpA, and inflammatory and structural changes have been combined to enhance early diagnosis using AI models. 239 For instance, Zhang et al. developed and validated a prediction model for distinguishing axSpA from non-axSpA using SIJ-MRI imaging features and clinical risk factors, applied to 942 patients (707 axSpA, 235 non-axSpA). The combined clinical-imaging model using tabular neural network (TabNet) achieved the highest diagnostic performance with an AUC of 0.93, outperforming other ML models, with key predictors including joint space changes, erosions, HLA-B27 positivity, and CRP values. 250 AI tools may also provide benefits in diagnosing axSpA in HLA-B27-negative patients. Lu et al. reported NegSpA-AI, a DL tool using SI MRI and clinical data, which achieved AUCs of 0.878, 0.870, and 0.815 across internal, external, and prospective test sets, respectively. It significantly improved diagnostic accuracy, sensitivity, and specificity for junior radiologists and rheumatologists by up to 11.5%, 13.3%, and 20.6% in tests involving 454 HLA-B27-negative axSpA patients. 251 In addition to BME, AI applications may provide valuable assessments of structural lesions. Li et al. reported a UNet-based DL model for segmenting fat metaplasia on SIJ MRI and a classification model to distinguish axSpA from non-axSpA using data from 706 patients. The segmentation model achieved Dice Similarity Coefficients of 81.86% (internal) and 85.44% (external), while the classification model reached AUCs of 0.876 (internal) and 0.799 (external). Model assistance also improved radiologists’ performance. 252
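The Dice similarity coefficient used to evaluate the segmentation models above measures the overlap between a predicted mask and a reference mask: twice the intersection divided by the sum of the two mask sizes. A minimal sketch follows, assuming masks are flattened to lists of 0/1 pixel labels. The example masks are invented for illustration.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice similarity coefficient for two binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Convention assumed here: two empty masks count as perfect overlap.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy 6-pixel masks (hypothetical): 2 pixels overlap, 3 labeled in each.
predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
print(f"Dice = {dice(predicted, reference):.3f}")
```

A Dice value reported as a percentage (for example 81.86%) corresponds to the same quantity multiplied by 100.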
An attention-deep neural network has also assessed spinal inflammation in STIR MRI images of axSpA patients, achieving an AUC-ROC of 0.87, sensitivity of 0.80, and specificity of 0.88, comparable to radiologists’ assessments. 253 For hip involvement, integrating clinical data with MRI features like BME and effusion, AI models demonstrate expert-level accuracy, improving early detection and grading. 254 Additionally, AI models have employed denoising and intensity standardization algorithms to address heterogeneity in MRI protocols, improving diagnostic reliability across various imaging platforms. This harmonization is essential for enabling accurate, cross-center diagnostic comparisons and establishing a new standard for MRI in axSpA care.
Positron emission tomography (PET)-CT-based low-dose computed tomography (LDCT) may give insights into the pathogenesis of axSpA. Recently, van der Heijden developed and validated two automated segmentation methods using AI techniques (morphological operations and a multi-atlas-based approach) for detecting lesions in posterior spinal joints, SIJs, and discovertebral units in SpA patients using Na[18F]F PET/LDCT. The atlas-based method, employing ML for enhanced segmentation accuracy, achieved superior performance with a maximum AUC of 0.90, making it the preferred approach for PET-based lesion analysis. 255
Advanced models integrating MRI radiomics, other imaging techniques, clinical biomarkers, and predictive analytics represent a modern, comprehensive approach to managing axSpA. For instance, merging structural and inflammatory changes from MRI with clinical data enables better disease monitoring and classification, allowing for early detection and treatment planning. However, further validation in diverse patient populations is necessary for these AI-driven tools to become dependable components of routine care in axSpA.
Predicting radiographic progression, treatment response and follow-up
The potential of AI is not limited to imaging since predictive models showed promise in predicting the disease course and treatment strategies in axSpA.256–259 ML algorithms have pointed out prognostic factors, including baseline inflammatory markers and structural damage, which may help the clinician guide the treatment approach.
Recently, Dorfner et al. studied the role of anatomy-centered DL models for detecting radiographic sacroiliitis and predicting disease progression in axSpA using pelvic radiographs from four cohorts (total n = 2422). Comparing standard and anatomy-centered models, the latter achieved higher AUCs (0.899, 0.846, 0.957) and accuracies (0.821, 0.744, 0.906) across three test datasets. High-risk patients identified by the anatomy-centered model had an odds ratio of 2.16 for radiographic sacroiliitis progression within 2 years, indicating improved generalizability and predictive capability. 260 Koo et al. used the RnFr algorithm to analyze time-series clinical features to predict radiographic progression in AS. Their model yielded a mean accuracy of 73.73% and an AUC of 0.79, reflecting moderate predictive performance. 257 Baek et al. 258 developed an ML model using clinical data to predict radiographic progression in axSpA patients; the model performed better than the generalized linear or input model in predicting the MSA score (RMSE = 2.83). Joo et al. used agglomerative hierarchical clustering and ML models to classify 412 axSpA patients into three phenogroups with distinct clinical characteristics and radiographic progression rates. Phenogroup 2 (male smokers) had the worst progression, while Phenogroup 3 (uveitis-only) had the least. 259 However, Garofoli et al. 261 compared ML algorithms with traditional statistical models and found comparable predictive accuracies between the two approaches. This suggests that although AI offers promising tools, its superiority over conventional methods for predicting disease progression is not yet well established.
ML can be a valuable tool in predicting treatment responses for axSpA. Using baseline demographic and laboratory data, Lee et al. developed an ANN model to forecast early TNFi use in patients with AS. Compared to logistic regression, SVM, RnFr, and XGBoost models, the ANN model reached an AUC of 0.783. The strong predictors identified were CRP and ESR levels. 262 Fernández-Carballido et al. assessed sex differences and other factors influencing TNFi response in 969 axial SpA patients (315 females, 654 males) using statistical and AI-based analyses from the BIOBADASER registry. Analyses highlighted age at treatment initiation as the primary factor linked to poor TNFi response, especially when combined with female sex or cardiovascular risk factors. Females showed lower BASDAI50 responses and lesser reductions in ASDAS-CRP by the second year, with older age also contributing to unfavorable outcomes. 263
A recent scoping review by Benavent et al. 154 also evaluated the application of AI in predicting treatment response (anti-TNF and IL-17 blockers) in patients with SpA, including axial and peripheral subtypes. Various AI methods such as SVMs, RnFr, logistic regression, and DL models were employed, analyzing data from clinical, imaging (particularly MRI), and molecular sources. 154 Overall, the performance of AI models in SpA studies was promising, with reported AUC values generally ranging from 0.70 to 0.90, indicating moderate to high predictive accuracy. 154 However, the review emphasized that more robust validation in independent cohorts and improved data standardization are needed before these models can be reliably integrated into clinical practice. 154
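Since AUC is the metric most often quoted in this section, it helps to recall what it measures: the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The scores are invented for illustration, and the pairwise loop is chosen for clarity rather than efficiency.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of positive-negative pairs ranked correctly."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores for responders and non-responders.
responders = [0.9, 0.8, 0.4]
non_responders = [0.7, 0.3, 0.2]
print(f"AUC = {auc(responders, non_responders):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so values between 0.70 and 0.90 indicate that 70% to 90% of such pairs are ordered correctly.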
Studies on the extra-articular involvements, comorbidities, and long-term follow-up of axSpA are largely missing. Bioinformatic analysis using ML algorithms identified ST8SIA4 as a common pathogenic gene and lysosomal pathways as a common mechanism in AS and atherosclerosis. 264 Regarding cardiovascular risk assessment in axSpA, recent data suggested that the combination of QRISK3 and SCORE2 yielded the most accurate predictive performance. 265 Navarini et al. evaluated the performance of seven cardiovascular risk algorithms and three ML techniques for predicting CV events in AS patients. Among the traditional algorithms, SCORE and RRS performed best (C-statistic: 0.71 and 0.72), whereas RnFr was the best among the ML techniques tested, with an AUC of 0.73. Feature analysis identified CRP as the most important predictor, a different profile from traditional CV risk factors. 266 Although AI could enhance cardiovascular risk assessment by integrating clinical, imaging, and biomarker data into prediction models that go beyond traditional risk calculators, the current literature provides no solid data on this issue.
AI can support mortality risk assessment by analyzing complex, longitudinal datasets (including disease activity scores, comorbidities, treatment patterns, and imaging) to identify high-risk profiles with greater precision. This could facilitate early intervention strategies and better management plans to reduce long-term morbidity and mortality. However, there is currently no robust evidence on the use of AI in this setting.
Mobile applications and clinical decision support systems
Mobile applications may facilitate monitoring and enhancing physical activity 267,268 and mobility, 269,270 self-reporting of symptoms and disease activity, and real-time communication with healthcare providers. Gossec et al. applied ML to activity tracker data to predict patient-reported RA and axial SpA flares. Over 3 months, physical activity data (steps per minute) and weekly flare assessments from 155 patients were analyzed. The ML model exhibited high accuracy in detecting flares, with a sensitivity of 96%, specificity of 97%, PPV of 91%, and NPV of 99%. 112 A proof of concept was established in one study using a smartphone application to self-monitor disease activity in axSpA patients, showing how this might improve patient engagement and facilitate tight control strategies. 271
A CDSS integrates patient data, clinical guidelines, and predictive analytics to help providers make informed decisions. For instance, in axSpA, a CDSS could analyze patient-reported outcomes (PROs), imaging findings, and laboratory results to provide treatment recommendations for personalized care. Using an AI-enhanced CDSS might improve diagnostic accuracy, optimize treatment strategy, and reduce variability in clinical practice.
Looking ahead, AI can facilitate the management of axSpA by continuous patient monitoring and CDSS. These platforms may also promote patient engagement, leading to better adherence to treatment protocols and proactive health management.
Important note to readers
Whether AI or classical statistics is used to generate prediction models, the generalizability of these models remains an important research question. The performance of these models should be evaluated in prospective designs across groups with varying ethnic and social backgrounds.
AI in the management of PsA
This review of the existing literature suggests that AI applications in psoriatic disease primarily focus on psoriasis rather than PsA. AI applications have demonstrated progress in the following areas: recognition and differential diagnosis of psoriatic lesions, assessment of disease severity, prediction of complications and comorbidities, evaluation of treatment responses, and discovery of novel biomarkers. The application of AI to imaging techniques for PsA is preliminary, and the available evidence comes mainly from studies on RA. AI and its applications have been studied for the diagnosis, differential diagnosis, and remote screening of PsA, and for predicting the transition from psoriasis (PsO) to PsA by analyzing genetic, serologic, clinical, and multimodal data (Table 4).
Table 4. AI applications in PsA.
Abbreviations: AI, artificial intelligence; AUC, area under the curve; AutoML, automated machine learning; CNN, convolutional neural network; EHR, electronic health record; HAQ, health assessment questionnaire; HR, hazard ratio; ICCs, intraclass correlation coefficients; ML, machine learning; NLP, natural language processing; NPV, negative predictive value; PPV, positive predictive value; PsA, psoriatic arthritis; PsAID, psoriatic arthritis impact of disease; PsO, psoriasis; RnFr, random forest; XGBoost, extreme gradient boosting.
Diagnosis and progression from psoriasis to PsA
For screening patients presenting to primary care centers, Reed et al. aimed to generate and validate a smartphone application using ML algorithms to screen for different forms of hand arthritis, including PsA, across multiple rheumatology practices. The app used a CNN model trained on 1577 hand images and integrated survey data, achieving 95.2% accuracy, 76.9% precision, 90.9% recall, and 95.8% specificity for PsA detection; however, the results need further external validation. 272 McArdle et al. aimed to identify serum protein biomarkers that differentiate early inflammatory arthritis patients with PsA from those with RA using nano-LC-MS/MS, aptamer-based assays, and multiplexed antibody assays. ML analysis achieved an AUC of 0.94 for nano-LC-MS/MS, 0.69 for bead-based immunoassays, and 0.73 for aptamer-based analysis, while validation using RnFr models reached AUCs of 0.79 and 0.85, underlining the need for external validation. 273 Xu et al. 274 developed ML models for PsA diagnosis and progression risk prediction using multimodal clinical data from 3961 patients, including demographic, laboratory, clinical, and treatment-related variables. ML algorithms, particularly XGBoost and AdaBoost, were applied to integrate these diverse data sources, achieving an AUC of 0.87 for PsA diagnosis and 0.80 for predicting PsA progression risk. Although these findings demonstrate the potential of multimodal AI approaches, further validation using larger and more diverse datasets is necessary. 274 Love et al. developed an AI-based algorithm that combined structured data with unstructured clinical notes using NLP. This strategy identified 31 PsA-related predictors and achieved a PPV of 90%–93% with a sensitivity of 87%, outperforming the PPV of 57% obtained using a single PsA code alone. 275 The PredictAI™ ML tool was evaluated for the early identification of undiagnosed PsA patients using data from Maccabi Healthcare Service spanning 2008–2020. The model, developed using data from 1 to 3 years before the diagnosis of PsA, displayed 90% specificity within the psoriasis cohort, with sensitivities of 51% at 1 year and 38% at 4 years before diagnosis and a PPV of up to 36.1%. These results suggest that the model may be useful for the early identification of PsA, allowing timely interventions to prevent joint damage and enhance patient outcomes. 276 Recently, Rudge et al. developed an ML model to detect PsA early and identify important clinical indicators in primary care. Using EHRs from the Clinical Practice Research Datalink, models were created employing Bayesian Networks (BN) and RnFr algorithms. 277 A cohort of 122,330 psoriasis patients was examined, with 2460 developing PsA; the best BN model achieved an AUC of 0.823 and a Precision-Recall Area Under the Curve (PRAUC) of 0.221, while the RnFr model reached an AUC of 0.851 and a PRAUC of 0.261. The effectiveness of the models may be limited by incomplete primary care data, delayed PsA diagnosis, and challenges in model calibration, highlighting the need for further validation and refinement. 277
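The gap between the AUC and PRAUC values reported by Rudge et al. is easier to interpret against class imbalance. For a classifier that ranks at random, the expected PRAUC is approximately the proportion of positive cases, whereas the expected AUC is 0.5. The sketch below is a worked arithmetic example that assumes the evaluation prevalence equals the overall cohort proportion (2460 of 122,330); the original study's test-set prevalence may differ.

```python
def prauc_baseline(n_positive: int, n_total: int) -> float:
    """Approximate PRAUC of a random classifier: the positive-class share."""
    return n_positive / n_total


def prauc_lift(prauc: float, n_positive: int, n_total: int) -> float:
    """How many times a reported PRAUC exceeds the random baseline."""
    return prauc / prauc_baseline(n_positive, n_total)


# Cohort figures quoted in the text; applying them to the test set
# is an assumption made only for this illustration.
N_TOTAL, N_PSA = 122_330, 2_460
REPORTED_PRAUC = {"BN": 0.221, "RnFr": 0.261}

print(f"baseline PRAUC ~ {prauc_baseline(N_PSA, N_TOTAL):.4f}")
for model, value in REPORTED_PRAUC.items():
    print(f"{model}: PRAUC={value:.3f}, lift={prauc_lift(value, N_PSA, N_TOTAL):.1f}x")
```

Under this assumption, a PRAUC of 0.221 to 0.261 is roughly 11 to 13 times the random baseline of about 0.02, even though the absolute values look low next to AUCs above 0.82.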
A computational automated prediction tool for PsA in psoriasis patients has been developed. 278 The researchers defined nine new loci influencing psoriasis and its clinical subtypes by analyzing data from six cohorts comprising more than 7000 genotyped PsA and psoriasis patients. Integrating a molecular signature composed of 200 genetic markers resulted in an AUC of 0.82, reflecting good predictive performance. 278 Mulder et al. applied flow cytometry combined with ML in immune cell profiling to distinguish PsO from PsA. Their best RnFr classification model exhibited robust performance, with an AUC of 0.95. 279 Key differences included enrichment of differentiated CD4+ T-cells and decreases in memory T-cells and monocytes in the peripheral blood of PsA patients. Moreover, some immune subsets were associated with joint scores in PsA, suggesting a potential tool for early PsA diagnosis in PsO patients that could aid timely referral and treatment. 279 Jalali-Najafabadi et al. 280 developed feature selection and PsA risk prediction models using genetic data from 1462 PsA cases and 1132 cutaneous-only psoriasis cases, with validation in the UK Biobank dataset. Seven supervised ML methods were trained using stratified nested cross-validation, and the best model achieved an AUC of 0.61 for internal cross-validation, 0.57 for the internal holdout set, and 0.58 for the external dataset. While HLA_C_*06 was initially identified as the most informative genetic variant, mitigating confounding features revealed HLA-B27 as the most important. Still, individual features alone showed only moderate predictive accuracy, suggesting that integrating multiple HLA features improves prediction performance. 280 A CNN model using temporal diagnostic and prescription data from Taiwan’s National Health Insurance Research Database was assessed for predicting the 6-month risk of PsA in PsO patients. The overall AUC reached 0.70, with a sensitivity of 0.80 and an NPV of 0.93. The model therefore showed moderate performance in identifying PsO patients at higher risk of subsequently developing PsA. 281 Another study from Switzerland investigated the predictability of PsA in patients with psoriasis. Data from the Swiss Dermatology Network on Targeted Therapies were analyzed using gradient-boosted decision trees and mixed models, focusing on age, Psoriasis Area and Severity Index (PASI), physical well-being, and nail psoriasis severity. 282 Despite limitations due to modest sample size and generalizability problems, the models achieved AUROCs of up to 73.7%, indicating moderate predictive performance. 282
Disease activity assessment
Using ML models, Choksi et al. aimed to identify serum metabolomic markers associated with skin disease activity in PsA patients. Serum samples from 150 PsA patients were analyzed, and predictive models achieved an AUC of 0.813 for distinguishing low and high disease activity. Potential biomarkers identified include phospholipids, bile acids, and oxylipins, but further validation is required. 283 Another study analyzed serum metabolites associated with PsA disease activity using mass spectrometry and ML, achieving classification models with an AUC of up to 0.818. Key metabolites, such as lysophosphatidylcholine and sphingomyelin, were identified as potential biomarkers for PsA activity. 284 A recent study applied ML to explore the components of the DAS28-CRP score for patients with PsA. Tenderness in the right index finger metacarpophalangeal joint proved to be the most informative indicator for staging PsA activity. 285 In a multicenter observational study of 158 patients with recent-onset PsA, ML algorithms were used to identify key factors associated with a psoriatic arthritis impact of disease (PsAID) score of ≥4, indicating high disease impact. Higher HAQ scores, increased pain levels, lower educational attainment, and higher physical activity were significantly associated with elevated PsAID scores, with a mean accuracy above 85%. 286 The same study group identified predictors of severe disease, including high global pain, CRP levels, hypertension, gluteal/perianal psoriasis, and the need for synthetic DMARDs. ML models such as XGBoost and RnFr predicted severe cases with more than 80% accuracy, underscoring the need for rigorous management of pain, inflammation, and comorbidities in PsA. 287 As another part of the same study, Queiro et al. aimed to develop ML models to predict minimal disease activity (MDA) in the same patient population. A RnFr algorithm revealed the most predictive variables (global pain, PsAID score, patient global assessment of disease, and HAQ score) and achieved a prediction accuracy of 85.94% for MDA. 288
To assess articular activity remotely, Webster et al. aimed to validate a smartphone-based app (Psorcast) for assessing cutaneous and musculoskeletal signs of psoriatic disease using digital measures and ML models. The study included 104 participants (patients with PsA or PsO and healthy controls), and digital assessments were compared with clinical evaluations by rheumatologists and dermatologists. ML models analyzing hand photos achieved 76% accuracy in detecting nail PsO, and the Digital Jar Open test for physician-assessed upper extremity involvement (tender joint or enthesitis) reached an AUROC of 0.68. Although these results are acceptable for a proof-of-concept study, further validation in larger cohorts is necessary before clinical application. 289 Nail psoriasis, which affects around 50% of psoriasis patients, is difficult to quantify because of heterogeneous nail involvement, making manual scoring with the Nail Psoriasis Severity Index (NAPSI) impractical in clinical settings. Using BEiT, a transformer-based vision model, another group developed a neural network-based system for fully automated modified NAPSI scoring. The system achieved an AUC of 88% and a Pearson correlation of 90% with human annotations. This openly available system enables efficient, observer-independent evaluation of nail psoriasis severity in clinical use. 290 Ricardo et al. 291 compared NAPSI scores determined by a CNN with dermatologist-assigned scores for psoriatic fingernail images, assessing both performance and inter-reader agreement.
In a dataset of 240 images, the AI model achieved intraclass correlation coefficients (ICCs) of 0.81 for overall NAPSI, 0.75 for matrix NAPSI (NAPSIm), and 0.81 for bed NAPSI (NAPSIb), exceeding the agreement among dermatologists, whose ICCs were 0.43, 0.56, and 0.53, respectively. These results underline the potential of CNNs to enhance accuracy and reliability in NAPSI scoring, although limitations include the limited sample size and potential underrepresentation of racial/ethnic minorities. 291
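To make the agreement statistic concrete, the sketch below computes one common form of the ICC from a table of scores. It is an illustration under stated assumptions: the cited study's exact ICC variant is not specified here, so the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is assumed, and the function name `icc_2_1` and the toy scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    ratings: list of rows, one row per image, one column per rater.
    """
    n = len(ratings)
    k = len(ratings[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a rectangular table with >= 2 rows and >= 2 raters")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell).
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between images
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up scores: five images, each scored by two raters.
toy_scores = [[4, 5], [8, 7], [2, 2], [10, 9], [6, 6]]
print(round(icc_2_1(toy_scores), 3))
```

Because this form penalizes systematic offsets between raters as well as random disagreement, a rater who consistently scores higher than the others lowers the ICC even if the ranking of images is identical.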
Prediction of treatment response
Using AutoML, ML models were developed to predict the change in therapy, treatment response, and disease progression in psoriasis vulgaris and PsA with high accuracy (AUC up to 0.92). The main predictors included baseline PASI scores, change in pruritus, initial therapeutic agents, and psychological factors such as depression and anxiety. 292 In a multimodal risk prediction study, treatment response analysis revealed that IL-17 inhibitors were effective for severe psoriasis, whereas methotrexate showed limited efficacy in preventing PsA progression (HR = 0.139; 95% CI, 0.040–0.474; p-value = 0.00024). 274 Gottlieb et al. combined ML with evidence-based medicine to investigate baseline data from 2148 PsA patients. Applying Bayesian elastic net regression to 275 predictors, they identified variables associated with a good response to secukinumab: being anti-tumor necrosis factor-naïve, having been treated with one prior anti-tumor necrosis factor agent, not receiving methotrexate, having enthesitis at baseline, and having a shorter PsA disease duration. Such findings may support precision medicine strategies. 293 In a post hoc ML analysis of 10 clinical trials of secukinumab (FUTURE, MAXIMISE, and MEASURE), Baraliakos et al. found that PsA patients with certain characteristics (clusters) responded better to secukinumab 300 mg than to 150 mg. Improved response to the higher dose was observed particularly in patients with axial involvement, higher articular burden, or pronounced psoriasis. Specifically, clusters representing overweight patients with extensive joint involvement (knees, shoulders, elbows, wrists) and psoriasis, as well as those with axial manifestations and oligoarthritis, demonstrated significantly better improvements in tender and swollen joint counts when treated with secukinumab 300 mg. 294 Richette et al.
analyzed data from two phase III clinical trials (DISCOVER-1 and 2) of guselkumab in bio-naïve PsA patients to identify distinct PsA phenotypes using ML techniques. Non-negative matrix factorization was utilized to cluster 661 patients based on baseline clinical features, resulting in eight distinct clusters characterized by varying joint involvement, enthesitis, dactylitis, skin, and nail involvement. Guselkumab treatment improved MDA and Disease Activity index for PSoriatic Arthritis (DAPSA) scores across all clusters, with the best responses seen in clusters with high skin involvement and axial manifestations. However, clusters with small joint involvement showed lower initial responses, showing the heterogeneity of PsA and the need for patient-specific treatment approaches. 295
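For readers less familiar with DAPSA, the score referred to above is a simple sum of five components. The sketch below assumes the commonly cited definition (68 tender joint count, 66 swollen joint count, patient global assessment and patient pain on 0 to 10 scales, and CRP in mg/dL) together with commonly cited activity cutoffs. The function names are illustrative, and the code is not taken from the guselkumab analysis.

```python
def dapsa(tjc68, sjc66, patient_global, patient_pain, crp_mg_dl):
    """DAPSA as the plain sum of its five components (assumed definition).

    tjc68: tender joint count (0-68)
    sjc66: swollen joint count (0-66)
    patient_global, patient_pain: 0-10 scales
    crp_mg_dl: C-reactive protein in mg/dL
    """
    if not (0 <= tjc68 <= 68 and 0 <= sjc66 <= 66):
        raise ValueError("joint counts out of range")
    if not (0 <= patient_global <= 10 and 0 <= patient_pain <= 10):
        raise ValueError("patient-reported scales must be between 0 and 10")
    if crp_mg_dl < 0:
        raise ValueError("CRP cannot be negative")
    return tjc68 + sjc66 + patient_global + patient_pain + crp_mg_dl


def dapsa_category(score):
    """Map a DAPSA score to a disease activity state (assumed cutoffs)."""
    if score <= 4:
        return "remission"
    if score <= 14:
        return "low"
    if score <= 28:
        return "moderate"
    return "high"


# Hypothetical patient: 2 tender joints, 1 swollen joint, global 3, pain 2, CRP 0.5 mg/dL.
example_score = dapsa(2, 1, 3, 2, 0.5)
print(example_score, dapsa_category(example_score))
```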
Prediction of comorbidities
Although the association and impact of comorbidities in PsA have attracted considerable attention in recent years, studies of AI applications for the prediction, assessment, or follow-up of comorbidities in PsA are in their infancy. 296 Regarding cardiovascular risk assessment, recent evidence suggests that the combined use of QRISK3 and SCORE2 provided the best predictive accuracy. Although the literature on this topic is very scarce, Navarini et al. applied ML approaches, including SVMs and RnFr, to predict cardiovascular risk in PsA. Their best models outperformed the standard risk calculators, yielding AUC estimates between 0.76 and 0.85. 297
AI can improve mortality risk assessment by combining information from clinical findings, test results, and comorbid conditions. This approach may help clinicians identify vulnerable patients earlier and guide more targeted and effective care plans. However, this review did not identify any study about the use of AI in mortality prediction.
Important note to readers
Whether AI or classical statistics is used to generate prediction models, the generalizability of these models remains an important research question. The performance of these models should be evaluated in prospective designs across groups with varying ethnic and social backgrounds.
Possibilities and challenges
Integrating multimodal AI into healthcare is promising for enhancing patient outcomes and effectively allocating medical services.298–300 By concurrently analyzing diverse data sources—such as text, images, and patient records—AI and LLM systems can offer a more comprehensive understanding of patient health, leading to more accurate diagnoses and patient-specific treatment plans. 298 For instance, AI and LLM algorithms can process medical imaging alongside clinical notes to detect subtle patterns indicative of early disease stages, thereby facilitating timely interventions. 221 However, it is crucial to ensure that AI-generated reports undergo rigorous validation by medical professionals to maintain accuracy and reliability. 221 Additionally, AI’s capability to analyze social media content enables healthcare providers to gain insights into patient experiences and sentiments, potentially uncovering health trends and patient concerns that might remain unrecognized.301,302 Such insights can significantly enhance patient-centered approaches and inform strategies to improve engagement and satisfaction. 303 Furthermore, AI-driven referral systems can prioritize patient cases based on severity, optimizing resource allocation and expediting care delivery.304,305 Mobile applications equipped with AI can monitor disease activity through PROs, record real-time data that supports proactive disease management, and enhance the effectiveness of treatment strategies. 306
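As a purely illustrative sketch of the referral prioritization idea mentioned above, the Python snippet below orders incoming referrals by a model-derived severity score using a priority queue. The score itself, its 0 to 1 range, the patient identifiers, and the function name `triage_order` are all hypothetical; a real system would require clinical validation and human oversight.

```python
import heapq


def triage_order(referrals):
    """Return patient IDs ordered by descending severity.

    referrals: list of (patient_id, severity) tuples in arrival order, where
    severity is a hypothetical model output (higher = more urgent).
    Ties are broken by arrival order, so earlier referrals are seen first.
    """
    # heapq is a min-heap, so severity is negated to pop the most severe first.
    heap = [(-severity, arrival, patient_id)
            for arrival, (patient_id, severity) in enumerate(referrals)]
    heapq.heapify(heap)

    ordered = []
    while heap:
        _, _, patient_id = heapq.heappop(heap)
        ordered.append(patient_id)
    return ordered


# Made-up referrals with made-up severity scores.
incoming = [("A", 0.2), ("B", 0.9), ("C", 0.9), ("D", 0.5)]
queue = triage_order(incoming)
print(queue)
```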
Sentiment analysis can support patient monitoring by extracting emotional tone and symptom burden from unstructured clinical notes or patient-reported data, offering insight into subjective disease experience beyond standard measures.307,308 Sentiment features may enhance predictive models for disease flares, treatment adherence, or quality of life outcomes. 308
In addition to patient management, AI can improve patient selection by identifying individuals who are most likely to benefit from specific interventions based on multidimensional clinical, laboratory, and imaging data during randomized controlled trials. 309 AI can also assist in outcome assessment by automating disease activity scoring and imaging evaluations, thereby reducing interobserver variability and resource demands. 309 Additionally, AI-driven models may support adaptive trial designs by predicting treatment response trajectories, while digital twin technology can simulate individualized treatment outcomes, optimizing trial efficiency and personalization. 310
While exciting, advancements in AI raise many questions about using these systems and algorithms in almost every aspect of healthcare. A significant technical barrier is that most supervised models require large, high-quality datasets for training. 19 Small datasets often result in overfitting, and variations in data acquisition, coding, and patient populations across different settings diminish generalizability. External validation is critical but is rarely performed. 19 Data accuracy is vital; poor-quality data can lead to erroneous results and conclusions. Retrospective study designs dominate the field, limiting real-world applicability, whereas prospective trials are still scarce.33,311
AI models are also susceptible to unintended confounders, which compromise generalizability. 19 Furthermore, adversarial attacks, that is, intentional manipulations of input data, can significantly degrade model performance. In addition to common metrics such as AUC, accuracy, and F1 score, model evaluation should also consider cross-validation, statistical rigor, computational cost, data requirements, and model accessibility.33,311 While models often perform well by these metrics on research datasets, translating that performance to clinical settings remains challenging. Enhanced interpretability of the models, whether achieved by simplifying experimental design or by using improved interpretation techniques, can lead to greater trust and broader acceptance among healthcare professionals.19,311
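One reason a single metric is rarely sufficient can be shown with a small, made-up example. The sketch below (illustrative names and data, not drawn from any cited study) computes accuracy, precision, recall, and F1 score for a hypothetical classifier that never predicts a flare in a cohort where only 10% of patients flare.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary 0/1 labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Ratios with an empty denominator are reported as 0.0 by convention here.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical cohort: 10 of 100 patients flare; the model predicts "no flare" for all.
y_true = [1] * 10 + [0] * 90
y_pred = [0] * 100
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case the accuracy is high although no flare is detected, which is why recall and F1 score are usually reported alongside accuracy for imbalanced outcomes.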
Despite their promise, the use of LLMs in rheumatology faces several challenges. These include the risks of hallucinations (plausible but incorrect outputs), a lack of high-quality, domain-specific training datasets, ethical concerns related to transparency and bias, and the absence of clear regulatory and professional guidelines. Moreover, their general-purpose design may limit performance on specialized rheumatologic tasks unless fine-tuned on a specific topic. 15
Neural networks and other complex algorithms often lack transparency and are challenging to interpret, which reduces user trust. This lack of clarity in decision-making can also lead to inaccuracies, especially with novel datasets. 19 Ethical issues arise from algorithmic bias, such as racial disparities in healthcare predictions due to underrepresentation or misaligned metrics. 33 Additionally, private-sector interests may prioritize profit over quality care, potentially introducing biases in clinical decision-support systems. Protecting the privacy and ethical integrity of large-scale biological data is crucial.19,311 Discrimination against patients identified as high-risk may manifest as increased insurance costs. Data anonymity and medical confidentiality are vital to upholding patients’ rights and establishing trust in these technologies. 311 Addressing these concerns is essential for promoting the responsible integration of ML models into clinical practice, ensuring they are both effective and ethically compliant.
Everything discussed in this review has been compiled to illuminate the way forward. Advances in computer science, both present and future, have the potential to affect all facets of medicine, including rheumatology. In this context, umbrella organizations such as EULAR and ACR are responsible for ensuring data standardization and interpretability. The EU Artificial Intelligence Act, adopted in 2024, is the first comprehensive legal framework regulating the development and use of AI across the European Union. 312 The regulation entered into force in August 2024, with phased application of its obligations beginning in 2025 and extending into 2026 and beyond. It introduces a risk-based categorization system (unacceptable, high, limited, and minimal risk) and places stricter requirements, including robustness, data governance, and human oversight, on high-risk AI systems, particularly those used in healthcare. 312 Integrating regulated AI into medicine is poised to improve patient care, research, and drug development with ethical and moral considerations in mind.
Looking ahead, emerging multimodal AI, which combines data from various sources, offers unparalleled prospects for transforming the management of rheumatic diseases. By enabling more precise diagnostics, patient-specific therapeutic approaches, and enhanced patient follow-up, this technology illustrates what thoughtfully developed AI can offer. 313 Prioritizing ethical considerations, patient well-being, and interdisciplinary collaboration will be essential in realizing AI’s full potential to improve healthcare.
Conclusion
In conclusion, integrating AI into rheumatology represents a paradigm shift that can improve diagnosis, treatment, and patient follow-up in RA, axSpA, and PsA. AI applies ML methodologies to multimodal data, enabling unparalleled precision and personalization in managing complex rheumatic diseases. However, rigorous validation, ethical oversight, and interdisciplinary collaboration will be necessary to overcome technical and societal challenges and to implement this approach successfully. Transparency, equity, and patient-centered approaches are essential in building trust and realizing the game-changing potential of AI in healthcare. Balanced AI progress may be the key to re-designing medicine’s future with innovation and humanity at its core.
Footnotes
Acknowledgements
ChatGPT 4o (Plus) and Grammarly (Premium) were used to ensure grammatical accuracy and improve readability. After using these tools, the author reviewed and edited the content as needed and took full responsibility for the content of the publication. With special thanks to the other members of Turkish Society of Rheumatology Artificial Intelligence Study Group (Sedat Kiraz, Ertugrul Cagri Bolek, Hasan Satis, Hakan Babaoglu, Tugba Izci Duran and Yavuzalp Azizoglu) for their support.
