Abstract
Background
Telerehabilitation (TR) delivers rehabilitation services through digital and information technologies. Recent advances in artificial intelligence (AI) have introduced new opportunities for TR, particularly in remote monitoring and individualized treatment. This scoping review aims to examine and synthesize the current literature on the use of AI and markerless motion analysis (MMA) within TR for patients with neurological disorders, distinguishing between approaches focused on remote monitoring/assessment and those supporting AI-based TR platforms.
Methods
A scoping search conducted in March 2025 identified articles published in the last ten years in the following databases: PubMed, Embase, Scopus and Web of Science (WoS).
Results
The initial search retrieved 290 records. After removing 67 duplicates, the remaining records were screened. Following full-text assessment, only 10 studies were included, while 208 were excluded due to wrong population (n = 93), study design (n = 89), outcomes (n = 21), or language (n = 5). Overall, the evidence for both MMA-based remote monitoring/assessment and AI-supported TR interventions remains early-stage and heterogeneous across populations, outcomes, and set-ups.
Conclusion
AI applications in TR and remote monitoring for neurological disorders remain early stage and heterogeneous. While current platforms remain largely experimental, AI-based TR devices and metrics can offer objective, quantitative data to support personalized care, reinforcing the essential role of remote rehabilitation and monitoring in maintaining the patient–clinician connection. Therefore, integrating AI to promote continuity of rehabilitation beyond the clinic may provide a novel way to tailor treatment intensity, adapt exercises over time, and optimize follow-up in neurological rehabilitation.
Registration number
10.17605/OSF.IO/FB8TD
Keywords
Introduction
Telerehabilitation (TR) is a branch of telemedicine that delivers rehabilitation interventions through digital and information technologies. It encompasses a broad range of services, such as assessment, monitoring, prevention, intervention, supervision, education, consultation, and coaching, which can be provided remotely to patients and their caregivers (Brennan et al., 2010). This rehabilitation approach is widely considered for patients affected by neurological disorders, including Parkinson's disease (PD) (Goffredo et al., 2023), multiple sclerosis (MS) (Pagliari et al., 2025), and stroke (Calabrò et al., 2023; Federico et al., 2023). These neurological conditions are a common cause of motor and cognitive disabilities, resulting in a lower level of functional independence and poorer quality of life. In this context, rehabilitation is fundamental to promote global recovery according to each patient's needs. However, conventional face-to-face rehabilitation can pose challenges for patients and their caregivers. For example, geographic barriers, such as distance from rehabilitation centers, and high costs related to therapy and transport can result in low adherence to treatment. TR could represent a potential solution to these concerns (Maggio et al., 2020). Through digital platforms and/or gaming technologies, TR can provide therapeutic interventions remotely, improving accessibility and engagement for patients. On the one hand, TR serves as a facilitator by offering interactive and engaging experiences that can enhance patient outcomes. On the other hand, it also poses challenges, including technological constraints and differences in user adaptability. In addition, evaluating patients clinically in the TR modality can be complex because of the inherent limitations of remote assessment (Asgharzadeh Chamleh et al., 2025).
One of the primary issues is the self-performed nature of the assessments, since some items of the clinical scales can be difficult to execute accurately. For example, motor clinical scales such as the Fugl-Meyer assessment scale for the upper limb often require precise execution, which cannot be easily replicated in a remote setting. Several factors contribute to this, including low camera resolution and poor lighting conditions, which can limit accurate observation of patient movements (Asgharzadeh Chamleh et al., 2025). Gaining a clear understanding of these factors is essential to ensure that remote rehabilitation programmes effectively support patients from diverse backgrounds in achieving their rehabilitation goals. Several studies (Calabrò et al., 2023; Krzyzaniak et al., 2023; Muñoz-Tomás et al., 2023) have reported the feasibility and usability of different TR platforms in multiple neurological conditions, suggesting that innovative TR technologies (e.g., the VRRS, Khymeia, Padua, Italy) can achieve outcomes comparable to conventional physiotherapy in selected populations. In these contexts, TR has shown evidence of clinical effectiveness and, in some trials, non-inferiority to in-clinic rehabilitation for neurological disorders. For example, Gandolfi et al. reported that TR improved balance in people with PD compared with in-clinic training. Other randomized controlled trials have suggested that TR may be a feasible and effective option for post-stroke rehabilitation (Calabrò et al., 2023; Saygili et al., 2024; Toh et al., 2025). With the feasibility and clinical effectiveness of TR established across different neurological populations, an important next step is to explore how these interventions can be further optimized and individualized. In this context, recent progress in artificial intelligence (AI) has created novel opportunities for TR.
For instance, pose estimation (PE), which involves detecting the position and orientation of the body or an object from videos or images, can enhance the accuracy and quality of movement assessment in the TR modality (He et al., 2024). With 2D PE, physiotherapists can readily quantify patients’ movements and stay informed about changes in posture and movement, tracking recovery progress. In addition, 2D PE can be used to give feedback to patients during TR sessions, enabling them to understand and correct posture or movement issues. However, 2D PE is limited because it cannot capture certain details of movement. In this regard, 3D PE estimates the 3D position (the corresponding x, y, z coordinates) and orientation of joints (such as shoulders, elbows, wrists, hips, knees, and ankles) in human movement, allowing more precise motion detection (He et al., 2024). This method for detecting people's movements without the use of active or passive markers is called markerless motion analysis (MMA). Unlike other types of motion capture systems (e.g., marker-based motion capture), MMA avoids time-consuming marker placement and reduces equipment costs (Lam et al., 2023). MMA typically relies on a standard camera, such as an RGB or infrared camera, to record human movements. For higher precision in 3D motion tracking, multiple cameras can be positioned at different angles (Colyer et al., 2018). After video acquisition, PE algorithms process the footage through AI systems, including OpenPose (CMU-Perceptual-Computing-Lab/openpose, 2017/2025), MediaPipe (google-ai-edge/mediapipe, 2019/2025), or DeepLabCut (DeepLabCut, n.d.). These systems detect joints and limb positions by applying deep learning (DL) techniques, such as Convolutional Neural Networks (CNNs), to extract skeletal structures accurately.
This enables a more natural and realistic capture of human movement while relying on portable, cost-effective sensors rather than traditional marker-based multi-camera setups (Lam et al., 2023).
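To illustrate how 3D keypoints returned by a PE system can be turned into a clinically readable quantity, the following minimal sketch computes the angle at a joint from three 3D points. It is an assumption-laden illustration, not the method of any cited study: the function name `joint_angle_deg` and the shoulder, elbow, and wrist coordinates are hypothetical.

```python
import math


def joint_angle_deg(a, b, c):
    """Return the angle at joint b (in degrees) between segments b->a and b->c."""
    ba = [ai - bi for ai, bi in zip(a, b)]
    bc = [ci - bi for ci, bi in zip(c, b)]
    dot = sum(x * y for x, y in zip(ba, bc))
    norm_ba = math.sqrt(sum(x * x for x in ba))
    norm_bc = math.sqrt(sum(x * x for x in bc))
    if norm_ba == 0 or norm_bc == 0:
        raise ValueError("degenerate segment: two keypoints coincide")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_ba * norm_bc)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical keypoints (x, y, z) in metres, not taken from any study.
shoulder = (0.0, 1.4, 0.0)
elbow = (0.0, 1.1, 0.0)
wrist = (0.3, 1.1, 0.0)
elbow_angle = joint_angle_deg(shoulder, elbow, wrist)
```

In this toy configuration the upper arm is vertical and the forearm horizontal, so the computed elbow angle is a right angle. With real PE output, keypoint confidence and smoothing over frames would also need to be considered.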
Recent developments (Cotton et al., 2023; West et al., 2023) have further enhanced precision by employing dense key point models trained on multiple datasets, such as MeTRAbs, which improve tracking of critical regions like the torso and pelvis. In addition, new approaches use neural networks to generate smooth and anatomically consistent 3D trajectories from video, leading to more reliable inverse kinematic analysis. Together, these advances position MMA as a promising tool for clinical applications, enabling fast and detailed movement assessments in rehabilitation settings, even for patients with complex motor deficits or those using assistive
Moreover, AI has enabled the development of new tools not only for movement analysis but also for rehabilitation treatment. An AI-based training platform that integrates DL-based 3D human PE can deliver highly accurate feedback and guidance by capturing detailed movement patterns (Capecci et al., 2018, 2023). Through these systems, users can compare their own movements with synchronized reference models displayed on their devices, allowing them to detect deviations or irregularities and adjust their performance accordingly (Barzegar Khanghah et al., 2023). In this regard, He et al. (2024) reported in a randomized controlled trial of older adults with sarcopenia that a 3D PE-based TR programme achieved improvements in motor function and quality of life comparable to those observed with traditional rehabilitation, although these findings require confirmation in larger samples.
This scoping review aims to explore and synthesize the existing literature on the application of AI and MMA in the context of remote monitoring and TR of neurological disorders. Previous reviews were primarily focused on the use of AI in physical rehabilitation more broadly (Rasa, 2024; Sumner et al., 2023), considering different causes of injuries and not only neurological disorders. In addition, a recent bibliometric analysis was conducted on studies involving AI and robotics for stroke patients (Taşkaya & Taşkaya, 2026). Other reviews have examined markerless motion capture for clinical assessments and in healthy subjects (Lam et al., 2023; Pardell et al., 2024). To the best of our knowledge, no previous review has specifically focused on AI-based MMA systems that have been applied in home-based TR for adult patients with neurological disorders, with a dual emphasis on technical implementation and clinical applicability.
Specifically, this scoping review seeks to identify the AI-based and MMA methodologies that have been applied in clinical or home-based TR settings, summarize their reported performance in tracking and evaluating movement, describe their maturity in terms of usability and feasibility in real-world or near real-world environments, and examine their potential benefits and limitations for neurological rehabilitation.
Methods
This scoping review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) guidelines (Tricco et al., 2018) to enhance the transparency, completeness, reliability, and validity of the reported information (the PRISMA-ScR checklist is available in the supplementary material – S1). The protocol was registered in the Open Science Framework (OSF) (https://osf.io/dashboard): 10.17605/OSF.IO/FB8TD.
PICO Model
We defined our combination of search terms using a PICO model (population, intervention, comparison, outcome) (Eriksen & Frandsen, 2018). The population comprised patients with neurological disorders, such as stroke, Parkinson's disease, and multiple sclerosis. The intervention included all studies that explored, described, or applied AI to remotely monitor and treat patients affected by neurological disorders. The comparison concerned differences between AI models, including machine learning (ML) and DL algorithms. The outcomes encompassed contributions of AI technologies to assessment and rehabilitation treatments.
Eligibility Criteria, Information Sources and Search Strategy
A scoping search, started in March 2025, was conducted for all peer-reviewed articles published in the last ten years, using the following databases: PubMed, Embase, Scopus and WoS, which are the most widely used in medicine and bioengineering. This time frame was chosen to capture the growing interest and recent technological advances in MMA and AI-based TR, and to map the emerging body of literature in this field. The initial search strategy included combinations of the following keywords: “telerehabilitation,” “machine learning,” “markerless,” “artificial intelligence,” and “neurological disorders”. The search strings were adapted to each database's syntax, as reported in Table 1.
Table 1. Database Search Strategies and Keyword Strings.
We included all studies on the adult population (>18 years) affected by neurological disorders, with various diagnoses. Specifically, the inclusion criteria were: i) patients affected by neurological disorders; ii) use of AI and related technologies for remote monitoring and TR treatment; iii) written in the English language; and iv) published in a peer-reviewed journal.
Articles describing theoretical models, methodological approaches, algorithms, and basic technical descriptions were excluded. We also excluded: i) animal studies; ii) reviews; iii) studies involving children; and iv) case reports. These restrictions were intentional, as our focus was on AI and MMA methods that have been applied in TR or remote monitoring of patients with neurological disorders. In particular, purely theoretical or algorithm-development studies, often conducted on healthy volunteers or on generic motion datasets without a TR context in neurological patients, were considered beyond the scope of this clinically oriented scoping review.
The list of articles was refined for relevance, revised, and summarized, with key topics identified based on the inclusion and exclusion criteria. Given the limited literature available, various study designs were included in the qualitative synthesis: i) Randomized Controlled Trials (RCTs); ii) Observational studies; iii) Cross-sectional studies; iv) Case-control studies; and v) Cohort studies.
Selection of Sources of Evidence and Data Charting Process
Two independent reviewers (M.B. and G.L.) screened titles, abstracts, and full texts, applying predefined inclusion and exclusion criteria to minimize selection and publication bias. Disagreements were resolved by consensus. All search results were imported into an online database (Rayyan) (Ouzzani et al., 2016), where the reviewers independently assessed each study's relevance. Following the initial title and abstract screening, blinding was lifted, and any remaining disagreements regarding study inclusion were resolved through discussion.
In line with current guidance for scoping reviews, we did not perform a formal methodological quality or risk-of-bias assessment of the included studies. The primary aim of this work was to map and descriptively synthesize the emerging literature on AI-based TR and MMA in neurological rehabilitation, rather than to generate pooled effect estimates or compare the efficacy of specific interventions.
To contextualize potential sources of bias in the available evidence, we recorded key methodological descriptors for each study during data extraction (e.g., design, sample size, intervention setting, and type of comparison group). These characteristics are reported in the summary tables (see Tables 2 and 3).
Table 2. Summary of the Studies Reporting Markerless Applications for Remote Monitoring.
Legend: AI (Artificial Intelligence); PD (Parkinson's Disease); POs (Primary Outcomes); CFs (Control Factors); FV (Frequency Variability); TS (Total Score); ICC (Intraclass Correlation Coefficient); DTW (Dynamic Time Warping); IoT (Internet of Things); SVM (Support Vector Machine); CNN (Convolutional Neural Network); OS (Operating System); RAM (Random Access Memory); Hz (Hertz)
Table 3. Summary of the Studies Reporting AI-based TR Platforms and Applications.
Legend: AI (Artificial Intelligence); TR (Telerehabilitation); ARC (Augmented Rehabilitation Care); SUS (System Usability Scale); BaDI (Brief Aphasia Disability Index); 2MWT (2-Minute Walk Test); BFI (Brief Fatigue Inventory); BAI (Beck Anxiety Inventory); BDI (Beck Depression Inventory); EQ-5D (EuroQol-5 Dimension); PD (Parkinson's Disease); HY (Hoehn & Yahr); MoCA (Montreal Cognitive Assessment); SCI (Spinal Cord Injury); CNN (Convolutional Neural Network); LOSO (Leave-One-Subject-Out); OIESGP (Online Infinite Echo-State Gaussian Process); DTW (Dynamic Time Warping); DOF (Degrees of Freedom); sEMG (Surface Electromyography); IMU (Inertial Measurement Unit); HBR (Home-Based Rehabilitation); WMFT (Wolf Motor Function Test); ROM (Range of Motion); CG (Control Group).
Data Extraction and Data Items
Following full-text selection, data from the included studies were charted in a structured data sheet. The extracted information included: first author and year of publication, study aim, sample size, baseline characteristics, intervention setting, type of comparison group, type of clinical predictors (if included), type of ML/DL/AI algorithm used, results and performance metrics (e.g., accuracy, sensitivity, specificity, area under the curve – AUC), presence of interpretability techniques, and clinical implications.
Results
The initial search retrieved 290 records. After removing 67 duplicates, the remaining records underwent screening. Five articles were excluded based on title and abstract. In the second round of screening, the two reviewers assessed the full text of the remaining articles. We included 10 articles in the final analysis and excluded 208 articles for the following reasons: wrong population (n = 93), wrong study design (n = 89), wrong outcomes (n = 21) and non-English language (n = 5) (see Figure 1).

Figure 1. PRISMA flow-chart showing the study selection process.
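The counts reported for the selection process can be checked with simple arithmetic. The short sketch below recomputes each stage from the numbers stated in the text; the variable names are illustrative assumptions and are not part of the review protocol.

```python
# Counts as reported in the Results section.
retrieved = 290
duplicates = 67
excluded_title_abstract = 5
included = 10
exclusion_reasons = {
    "wrong population": 93,
    "wrong study design": 89,
    "wrong outcomes": 21,
    "non-English language": 5,
}

# Records remaining at each stage of the selection flow.
screened = retrieved - duplicates
full_text_assessed = screened - excluded_title_abstract
excluded_full_text = full_text_assessed - included

# The per-reason exclusions should add up to the full-text exclusions.
reasons_total = sum(exclusion_reasons.values())
```

Under these figures, the per-reason exclusion counts sum to the same total as the number of records excluded at the full-text stage, so the reported flow is internally consistent.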
Markerless Applications for Remote Monitoring
Among the selected evidence, 50% (5 out of 10) of the articles dealt with markerless and AI-based applications for remote monitoring in patients with neurological disorders. In these five studies, AI and MMA were used to capture human movements and gestures during TR sessions (Capecci et al., 2018; Dellepiane et al., 2025) or to assess movements more objectively in the TR modality (Hartman et al., 2022; Nucita et al., 2023). Another application was voice recognition for people with dysarthria (Mulfari et al., 2022) through DL techniques (see Table 2).
Nucita et al. used a ZED camera, which was placed in front of the subject to evaluate passive range of motion (PRoM) on the frontal plane, and to the patient's side to evaluate PRoM on the sagittal plane. The body segments associated with the assessed joint were passively and carefully mobilized to the limit of their range of motion, at which point a measurement was recorded. Each evaluation was conducted by the participant's primary caregivers, who passively moved the subject's joints in front of the 3D ZED camera. Each motor evaluation was carried out before and after the TR intervention. In particular, 3D ZED/OpenPose-based joint angle estimates showed high agreement with therapist-measured goniometric values (Pearson's r = 0.62–0.89; ICC = 0.78–0.92), supporting the validity of 3D markerless methods for joint mobility assessment. In contrast, 2D skeleton-based measurements exhibited only moderate agreement (Pearson's r = 0.45–0.77; ICC = 0.68–0.81), indicating lower accuracy than 3D but still acceptable precision for clinical or remote monitoring, and representing a feasible, low-cost alternative. The TR intervention consisted of three or four active exercises and two or three passive postures. The active exercises were performed in three thirty-minute sessions per week for the duration of the intervention (three months). These sessions were conducted by participants’ primary caregivers (who were together with the patients) under the live supervision of a therapist experienced in the motor treatment of people with Rett syndrome.
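As an illustration of the kind of agreement analysis described above, the sketch below computes Pearson's r between paired goniometric and markerless joint angle measurements. The data are hypothetical values invented for the example, the function name `pearson_r` is an assumption, and the ICC reported by the authors is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired joint angles in degrees (not data from the study).
goniometer = [90.0, 110.0, 75.0, 130.0, 100.0]
markerless = [88.0, 113.0, 79.0, 126.0, 97.0]
r = pearson_r(goniometer, markerless)
```

Correlation captures how closely the two methods co-vary but not systematic offsets between them, which is one reason agreement studies typically report an ICC alongside it.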
Similarly, Capecci et al. introduced a real-time monitoring system designed to assist clinicians in remotely evaluating exercise performance during home-based rehabilitation (HBR). In particular, they extracted specific kinematic features based on clinician indications to assess five motor tasks commonly used in axial disorder rehabilitation (trunk lateral tilt, arm lifting, trunk rotation, pelvis rotation, and squatting). These features were extracted using the Kinect v2 skeleton tracking system and processed to generate disaggregated performance scores based on a bell-shaped ranking function. The system was tested on 28 healthy individuals and 29 patients with neurological or orthopaedic conditions. Using a cross-sectional controlled design, the algorithm's scores were validated against blinded clinical evaluations via a structured questionnaire. The authors reported moderate to high correlations between Kinect-derived kinematic scores and clinician ratings (Spearman's r = 0.41–0.64; p < 0.05), further supporting the clinical relevance of markerless, remote assessments of motor performance and their ability to discriminate between healthy and pathological subjects.
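The idea of a bell-shaped ranking function can be sketched as follows. The exact function used by the authors is not given in this review, so the Gaussian form, the function name `bell_score`, and the target and tolerance values below are assumptions chosen only for illustration.

```python
import math


def bell_score(value, target, tolerance):
    """Map a kinematic feature to a score in (0, 1], peaking at the target.

    Assumed Gaussian shape: the score is 1.0 at the target and decays
    smoothly as the measured value moves away from it.
    """
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    z = (value - target) / tolerance
    return math.exp(-0.5 * z * z)


# Hypothetical trunk lateral tilt task: target 30 degrees, tolerance 10 degrees.
measured_tilts = (30.0, 20.0, 5.0)
scores = [bell_score(v, target=30.0, tolerance=10.0) for v in measured_tilts]
```

A function of this kind rewards executions close to the clinician-defined target and penalizes both under- and over-shooting, which is why it can be convenient for disaggregated, per-feature scoring.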
The Kinect v2 sensor was also used by Dellepiane et al. to recognize human movements during the Sit-to-Stand (STS) task. In particular, the motor exercises and instructions to the patient were delivered by the ReMoVES IoT system. Designed to integrate with traditional rehabilitation, ReMoVES operates during periods without therapist-guided activity, enabling information collection and access from various locations. The authors collected multidimensional kinematic parameters, finding that the stroke and elderly groups showed non-normal distributions for most parameters (p < 0.001), with some exceptions: flexion knee angles (normal in both patient groups), trunk extension and flexion angles (normal in the elderly group), and shoulder twist angle (normal in the stroke group). In addition, feature robustness was validated through correlation analysis of instantaneous right and left knee angles, showing that the controls and the elderly group had the highest correlation. Although the correlation was lower in the stroke group, the values remained high, suggesting preserved bilateral movement coordination, possibly due to good residual autonomy in participants. Moreover,
Unlike the other authors, Hartman et al. used accelerometer data in a novel analysis to remotely monitor the use of a dynamic arm support device by individuals with muscular dystrophy. The authors employed an ML algorithm (support vector machine – SVM) to evaluate the relationship between accelerometer measurements and functional tasks of the upper limb during the use of an actuated assistive device. The arm movements were registered on all planes, and they included: reaching forward, pushing backwards, reaching left, right, and diagonally in all directions. The amount of support given by the device can be adjusted by the user with a hand-held remote. Each participant completed the functional upper limb tasks while wearing the ActiGraph and being video recorded. In this study, accelerometry data were used to address two classification problems: determining whether the O540 dynamic arm support device was being used and distinguishing between successful and unsuccessful task attempts. The SVM helped to automatically identify the most informative features from the accelerometer signals, such as mean, standard deviation, and signal power, while minimizing model overfitting. SVM was chosen not only for its ability to model complex, nonlinear decision boundaries (using kernels like the radial basis function), but also because it performs well with smaller datasets, a key consideration given the rarity of Duchenne muscular dystrophy and the resulting limited sample size. The study further explored how variations in these key features related to task success. Classification of task outcomes was performed under three conditions: when the O540 was used, not used, and both combined. This allowed for an analysis of the device's effect on performance detection.
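The features named above (mean, standard deviation, and signal power) can be computed from windows of an accelerometer signal as in the following minimal sketch. The window size, step, signal values, and function names are assumptions made for the example, and the classifier itself (for instance an SVM with a radial basis function kernel) is not shown.

```python
import math


def window_features(samples):
    """Return mean, population standard deviation, and mean signal power."""
    n = len(samples)
    if n == 0:
        raise ValueError("window must contain at least one sample")
    mean = sum(samples) / n
    variance = sum((s - mean) ** 2 for s in samples) / n
    power = sum(s * s for s in samples) / n
    return {"mean": mean, "std": math.sqrt(variance), "power": power}


def sliding_windows(signal, size, step):
    """Split a signal into fixed-size windows, dropping any incomplete tail."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


# Hypothetical single-axis acceleration samples (arbitrary units).
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
features = [window_features(w) for w in sliding_windows(signal, size=4, step=4)]
```

Each feature dictionary would correspond to one row of the input matrix for a classifier. With small samples, as in rare conditions, keeping the feature set this compact is one way to limit overfitting.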
Mulfari et al. developed an automatic speech recognition algorithm for native Italian speakers with dysarthria, using an existing mobile app to collect audio data from users with speech disorders while they performed articulation exercises for speech therapy purposes. Using these data, a CNN was trained to spot a small number of keywords within atypical speech, following a speaker-dependent method. The CNN was trained for isolated keyword recognition using a supervised ML approach. Speech data consisted of 650 audio samples from six Italian native speakers with varying levels of dysarthria. Each speaker contributed 50 samples per keyword. The computational setup used for deep learning training and deployment on mobile devices included: i7 CPU, 16 GB RAM, GTX 1070 GPU. The results showed that personalized models trained with more data (mode30) achieved the highest accuracy, up to 98.6% in speaker-specific configurations. Global models also performed well, particularly in the mode30 setup (97.9%), indicating that increasing the number of training examples improves keyword recognition performance.
AI-Based Platforms Carrying out TR
The other 50% (5 out of 10) of collected evidence dealt with AI-based TR platforms. Some authors have conducted feasibility and usability studies on developing systems that incorporate AI algorithms to optimize the selection of exercises based on patient-specific difficulties (Capecci et al., 2023). Others have developed systems to support TR using devices equipped with ML algorithms (Bertomeu-Motos et al., 2023; Chae et al., 2020), where the systems are able to correct movements and support the safe guidance of rehabilitation sessions for patients (see Table 3).
In their 2023 study, Capecci et al. delivered TR sessions with an innovative device called ARC. The ARC is a TR solution that integrates multiple wearable sensors and a mobile device supported by AI algorithms. The system includes five inertial sensors (MetaMotionR + from MbientLab, San Francisco, CA, USA), a tablet equipped with a dedicated application, and a charging station. In its Home version, ARC enables patients to carry out their rehabilitation programme independently, providing simple instructions, video tutorials, and automated tracking of correctly executed repetitions. The innovative core of the device lies in its AI algorithm, which automatically counts the number of repetitions performed accurately. The algorithm processes input signals consisting of tri-axial accelerations and angular velocities, transmitted in real time via Bluetooth from three of the five inertial measurement unit (IMU) sensors worn by the patient during each session. The specific sensors used vary according to the type of exercise and the targeted body region. Before using the ARC device at home, each participant received a 30-min in-person training session on the software and hardware components, followed by hands-on practice with selected rehabilitation exercises. After the training, participants used the ARC system at home for four weeks, performing 45-min sessions five days per week and participating in at least one 30-min video call per week with the investigator. The ARC system enabled remote monitoring of exercise execution and adherence. Despite the high usability score achieved (77/100), seven participants (9% of the total sample, mostly individuals with PD) experienced difficulties in using or accepting the technology. Nevertheless, the system was found to be safe, and no adverse effects were reported during its use.
Chae et al. designed an HBR system that combines a smartwatch with a smartphone application, employing an ML algorithm to identify and track both the type and frequency of rehabilitation exercises. The system used off-the-shelf devices and custom applications, with a CNN to detect exercises. The authors compared detection accuracy across different data types (accelerometer, gyroscope, or both) and data sets (individual vs. total). The intervention focused on bilateral upper limb exercises for stroke patients with mild motor impairment, including shoulder flexion, wall push-ups, scapular activation, and towel slides. Bilateral training aimed to promote contralateral motor network reorganization via interhemispheric crosstalk, supporting upper limb recovery in chronic stroke.
For the ML model, the authors implemented a CNN with two convolutional layers containing 8 and 16 feature maps, followed by a fully connected layer with 32 nodes. They also tested two approaches: one model trained on personal datasets, and another trained on the combined dataset. Model performance was assessed using cross-validation. The most accurate model was built with personal data that combined accelerometer and gyroscope signals, achieving 99.80% accuracy (5590/5601). This outperformed the models trained with accelerometer data alone (98.13%, 5496/5601) or gyroscope data alone (96.07%, 5381/5601). In the comparative study, dropout rates at 12 weeks were 40% (4/10) in the control group and 22% (5/22) in the TR group, increasing at 18 weeks to 100% (10/10) and 45% (10/22), respectively.
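As a quick arithmetic check, the accuracies reported for the three sensor configurations can be recomputed from the stated counts of correctly detected samples. The function name and dictionary layout below are illustrative assumptions; only the counts come from the text.

```python
def accuracy_pct(correct, total):
    """Accuracy as a percentage of correctly classified samples."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("counts must satisfy 0 <= correct <= total, total > 0")
    return 100.0 * correct / total


# (correct, total) counts as reported for models trained on personal data.
reported_counts = {
    "accelerometer + gyroscope": (5590, 5601),
    "accelerometer only": (5496, 5601),
    "gyroscope only": (5381, 5601),
}

accuracies = {
    name: round(accuracy_pct(correct, total), 2)
    for name, (correct, total) in reported_counts.items()
}
best_configuration = max(accuracies, key=accuracies.get)
```

The recomputed values match the reported percentages to two decimal places, with the combined accelerometer and gyroscope input ranking highest.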
In the study by Bertomeu-Motos et al., the authors proposed a home-based AI system that guided patients in executing movements correctly. The system also decided on the next activity for the patient to perform, with the aim of keeping the patient motivated and willing to continue the therapy. It was also used to assess upper limb joint movements in patients with motor impairments. Motion trajectory results were derived by comparing the joint trajectories executed by patients with reference movements performed by clinicians. The system supports a wide range of neurorehabilitation tasks, from simple actions (e.g., touching the head) to complex ones (e.g., drawing geometric shapes in the air). Activity classification was obtained with a time-series classification model, the Online Infinite Echo-State Gaussian Process (OIESGP), which was trained on clinician-performed activities. Trajectory comparison used Dynamic Time Warping (DTW) to measure the similarity between the patient's and the clinician's joint trajectories. The authors defined correct execution of an upper limb task as a case in which the system both correctly classified the activity and reported a low DTW distance. This combination of activity classification and DTW distance computation was evaluated in three data scenarios: trajectories of seven joints, five joints, and eight sEMG signals (flexor and extensor muscles of the forearm). Due to poor accuracy in the sEMG-based scenario, the five-joint model was selected. Classification accuracy was 64% for a healthy subject but considerably lower for post-stroke patients (38% for patient 1 and 8.3% for patient 2). The system also supported adaptive therapy by suggesting subsequent exercises and evaluating movement quality to maintain patient motivation during HBR.
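The DTW comparison and the two-part correctness rule described above can be sketched as follows. This is a textbook DTW on one-dimensional sequences with an absolute-difference cost, not the authors' implementation: the single joint angle series, the activity label, and the distance threshold are hypothetical, and the OIESGP classifier is replaced by a plain label comparison.

```python
import math


def dtw_distance(seq_a, seq_b):
    """Classic dynamic time warping distance between two 1-D sequences."""
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(seq_a[i - 1] - seq_b[j - 1])
            # Extend the cheapest of the three admissible warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def is_correct_execution(predicted, expected, distance, max_distance):
    """Correct only if the activity matches AND the DTW distance is low."""
    return predicted == expected and distance <= max_distance


# Hypothetical elbow angle series in degrees; the patient starts more slowly.
reference = [0.0, 10.0, 20.0, 30.0, 20.0, 10.0, 0.0]
patient = [0.0, 0.0, 10.0, 20.0, 30.0, 20.0, 10.0, 0.0]
distance = dtw_distance(reference, patient)
correct = is_correct_execution("touch_head", "touch_head", distance, max_distance=5.0)
```

Because DTW allows elastic alignment in time, a movement with the same shape but a different pace yields a small distance, whereas a movement with a different amplitude or shape does not. Multi-joint trajectories would require a vector-valued cost in place of the absolute difference.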
Similarly, other researchers have introduced TR platforms with automated exercise guidance, a feature that is expected to substantially enhance rehabilitation outcomes. In this context, Barzegar Khanghah et al. (2023) designed and validated a vision-based biofeedback system capable of assessing the quality of exercises performed during TR sessions. In particular, the study developed an activity recognition model to automatically detect whether users performed rehabilitation exercises correctly. The model was trained using only “correctly executed” gestures, leveraging the Inflated 3D ConvNet (I3D) architecture by Carreira and Zisserman (2017), pre-trained on the Kinetics dataset. The I3D model is based on Inception-v1 with batch normalization, with its filters and pooling kernels inflated into 3D. The I3D model has achieved high accuracy on benchmark datasets (e.g., 97.9% on UCF-101, 96.9% on HMDB-51, and 74.1% on Kinetics with RGB data). The proposed system achieved average accuracy values of 90.57% ± 9.17% and 83.78% ± 7.63% using 10-Fold and Leave-One-Subject-Out (LOSO) cross-validation, respectively. In addition, the authors achieved average F1-scores of 71.78% ± 5.68% using 10-Fold and 60.64% ± 21.3% using LOSO validation. The proposed 3D-CNN successfully classified rehabilitation videos and provided feedback on exercise quality, enabling users to adjust their movement patterns.
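The difference between 10-Fold and LOSO validation lies in how samples are split. The sketch below generates LOSO folds from a list of per-sample subject identifiers; the identifiers and the function name `loso_splits` are hypothetical and the sketch does not reproduce the authors' pipeline.

```python
def loso_splits(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical subject label for each recorded exercise video.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(loso_splits(subject_ids))
```

Since every test fold contains only samples from a subject never seen during training, LOSO estimates performance on new users. This is a plausible reason why LOSO scores tend to be lower and more variable than 10-Fold scores, where samples from the same subject can appear on both sides of the split.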
Ramírez-Sanz et al. (Ramírez-Sanz et al., 2023) proposed a TR system consisting of three main components: i) a Jitsi-based server for real-time data transmission; ii) an affordable, patient-side device that connects to a television, and iii) an AI-processing server using Kafka, Spark, and Detectron2 for advanced computer vision tasks. A key element of the system is the real-time video processing module used to analyze patient movements during rehabilitation exercises. This module identifies the skeletal structure of the patient using a PE model. For each video frame, the model outputs a series of tensors, with the primary tensor containing 17 key points representing major body joints (e.g., wrists, elbows, shoulders, knees, ankles). This skeletal data can be used to track patient progress over time by analyzing joint angles and movement patterns throughout the rehabilitation process. To reduce false positives in person detection, particularly in the controlled scenario where only one person (the patient) is present and centrally located, a high confidence threshold (0.99) was set. This ensures that only the patient is detected and classified, enhancing accuracy. To address privacy concerns during system development and research, an anonymization step was integrated. The system uses the res10_300 × 300_ssd_iter_140000 Caffe model in OpenCV to detect facial regions, which are then blurred using a 3 × 3 Gaussian filter. This guarantees that patients cannot be identified in the recorded videos. In production deployments, this step may be skipped to reduce computational cost, as video processing would be fully automated and only accessible by the therapist.
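As an illustration of how such skeletal output could be turned into a trackable measure, the sketch below filters detections with a 0.99 confidence threshold and computes an elbow angle from three key points. It is a simplified stand-in, not the authors' pipeline: the plain-dictionary detection format, the helper names, and the example coordinates are hypothetical, and the key point order assumes the standard 17-point COCO layout.

```python
import math

# Standard COCO key point order (assumed here, not stated in the study).
COCO_KEYPOINTS = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]
IDX = {name: i for i, name in enumerate(COCO_KEYPOINTS)}

def select_patient(detections, min_score=0.99):
    """Keep the highest-scoring person detection above the threshold, if any."""
    kept = [d for d in detections if d["score"] >= min_score]
    return max(kept, key=lambda d: d["score"]) if kept else None

def joint_angle(a, b, c):
    """Angle in degrees at point b formed by the segments b-a and b-c (2D)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("coincident key points: angle undefined")
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def left_elbow_angle(keypoints):
    """Elbow flexion angle from shoulder, elbow, and wrist key points."""
    return joint_angle(
        keypoints[IDX["left_shoulder"]],
        keypoints[IDX["left_elbow"]],
        keypoints[IDX["left_wrist"]],
    )

# Made-up frame: one confident detection and one spurious low-score detection.
keypoints = [(0.0, 0.0)] * 17
keypoints[IDX["left_shoulder"]] = (100.0, 100.0)
keypoints[IDX["left_elbow"]] = (100.0, 200.0)
keypoints[IDX["left_wrist"]] = (200.0, 200.0)
detections = [
    {"score": 0.995, "keypoints": keypoints},
    {"score": 0.60, "keypoints": [(0.0, 0.0)] * 17},
]
patient = select_patient(detections)
elbow_deg = left_elbow_angle(patient["keypoints"])
```

Repeating this computation frame by frame gives a joint-angle time series from which range of motion or movement patterns can be followed across sessions.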
Across these studies, performance metrics related to classification or detection accuracy, such as F1-scores, should be interpreted as indicators of technical performance in recognizing or quantifying specific tasks or movement patterns. They do not in themselves demonstrate the clinical effectiveness of the underlying rehabilitation interventions.
Discussion
To the best of our knowledge, this is the first review to examine the role of AI, including ML, DL, and markerless applications, in the context of remote motor assessment and treatment. Specifically, we focused on TR applications and AI-based platforms capable of recording patients’ movements and exercises in home settings. We found that only a limited number of studies addressed these applications in the context of neurological disorders. In fact, many studies were excluded because they investigated TR combined with AI in healthy individuals (Abrar Ashraf et al., 2025; Clemente et al., 2024; Lam & Fong, 2024). For example, Clemente et al. (Clemente et al., 2024) explored the feasibility of a model for 3D PE from monocular 2D videos (MediaPipe Pose) in healthy subjects by comparing its performance to ground truth measurements. MediaPipe Pose was investigated in eight exercises typically performed in musculoskeletal physiotherapy sessions, where the ROM of the human joints was the evaluated parameter. The model achieved its best performance in key upper- and lower-limb exercises, supporting the potential of monocular 2D PE as a markerless, low-cost, and accessible tool for musculoskeletal TR. Interestingly, other authors (Abrar Ashraf et al., 2025) developed a novel TR platform for remote monitoring in elderly people that processes depth video frames using a multistage methodology. Their pipeline started with noise and floor removal, followed by 3D connected component labelling to identify the human subject and extract the human silhouette. Next, skeleton joint points are estimated, and features are extracted from both the joints and silhouette. These multimodal features are fused and input into a DL model for classification and correctness assessment. Advanced feature extraction techniques, including Synchrosqueezing Transform and Hilbert-Huang Transform, are employed to capture dynamic time-frequency characteristics of human actions. 
The proposed system classifies nine distinct exercises and assesses the correctness of movements. The authors reported an accuracy of 91% for exercise recognition and of 82% for movement-correctness assessment. In addition, we excluded other studies focusing on facial expression recognition, which were mainly conceptual, were conducted on healthy subjects, and were not TR-based applications (Ciraolo et al., 2024; Hadjar et al., 2025; Yolcu et al., 2019; Yoonesi et al., 2025). Other excluded studies focused on the analysis of clinical data using ML algorithms (Buscarini et al., 2025), whereas our review concentrated on AI-integrated TR technologies for motor assessment and treatment.
Considering major challenges in healthcare, such as population aging and the prevalence of chronic conditions, the integration of AI into TR may prove highly valuable, enabling continuous monitoring and treatment while providing low-cost, eco-friendly, and adaptable solutions for patients and their caregivers.
Markerless Applications for Remote Monitoring
Some of the included studies in this review used 3D PE models. In general, human PE is the process of locating and tracking the human body and its landmarks in images or video. This technique typically represents the human body as a skeletal model, an interconnected, tree-like framework consisting of key landmarks (such as joints and other crucial points) linked by segments representing body parts, as reported by some authors (Nucita et al., 2023). Commonly tracked joints include the shoulders, elbows, wrists, hips, knees, and ankles, while additional key points from the spine, hands, feet, and face can also be incorporated. This form of body representation relies on a relatively small number of parameters, making it highly effective for motion capture applications (Figure 2).

Figure 2. General pipeline for markerless motion analysis. (1) Preparation of the setup and definition of the task (e.g., a standardized test such as the 10-Meter Walk Test); (2) Acquisition of body landmarks during the motor task through camera recording; (3) Post-processing and selection of the most relevant landmarks; (4) Quantitative motion analysis based on the extracted landmarks.
Human PE is generally categorized into 2D and 3D approaches. While 2D PE identifies the spatial position of body landmarks on the x–y plane, 3D PE incorporates the z-axis as well, thus capturing depth. The essential distinction lies in the additional depth dimension provided by 3D models (Clemente et al., 2024). 3D PE methods can be further divided into monocular (single-view) and multi-view systems. Monocular methods employ only one fixed 2D camera, whereas multi-view approaches combine images from two or more cameras placed at different viewpoints to reconstruct the subject from several angles (Zheng et al., 2023). Another axis of PE classification relates to the number of individuals detected. PE algorithms can be designed either for single-person or multi-person detection. Single-person models extract the body landmarks of one subject per frame, while multi-person models must simultaneously detect and separate landmarks for several individuals, a process that is inherently more complex (Clemente et al., 2024). In the selected studies, TR sessions typically involved a single participant, making the single-person model the predominant approach in this clinical context. Most of the included studies used the Microsoft Kinect to assess movements in a markerless way. This system was validated by Metcalf et al. (Metcalf et al., 2013) to measure dynamic hand, finger, and thumb movements. The system reached an accuracy of 78% in landmark identification, with joint angle errors ranging between ±10–12° and generally low absolute errors. These results outperformed conventional manual assessments, indicating the potential of the system for home-based hand motion capture in telerehabilitation settings. However, the tracking accuracy of Kinect may be influenced by the subject's orientation and distance from the sensor, noise affecting depth data, the subject's body shape, and limitations in the PE algorithm (Wang et al., 2015). 
The accuracy of Kinect v2 may also be affected by motion speed, as its relatively low capture rate (30 fps) and local fluctuations in depth sensing during movement can introduce errors (Sarbolandi et al., 2015). In this context, Timmi et al. (Timmi et al., 2018) developed a novel tracking method using Kinect v2, employing custom-made coloured markers and computer vision techniques. The authors tested the accuracy of this approach relative to a conventional Vicon motion analysis system, performing a Bland–Altman analysis of agreement. In most conditions, the limits of agreement (LOA) for marker coordinates were within 10 mm, although accuracy tended to decrease as treadmill speed increased along the depth axis of the Kinect. For knee joint angles, LOA remained within −1.8° to 1.7° for flexion and −2.9° to 1.7° for adduction during fast walking. These findings indicate that the proposed method showed good consistency with a marker-based reference system across different gait speeds, supporting its use as a cost-effective motion analysis solution for selected biomechanical applications.
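For readers less familiar with Bland–Altman analysis, the following Python sketch shows how the bias and the 95% limits of agreement are commonly computed from paired measurements (bias ± 1.96 times the standard deviation of the differences). The function name and the paired knee-angle values are made up for illustration and are not data from Timmi et al.

```python
import statistics

def bland_altman(test_values, reference_values, z=1.96):
    """Return bias and limits of agreement for paired measurements."""
    if len(test_values) != len(reference_values) or len(test_values) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [t - r for t, r in zip(test_values, reference_values)]
    bias = statistics.mean(diffs)   # systematic difference between methods
    sd = statistics.stdev(diffs)    # sample SD of the differences
    return {"bias": bias, "loa_lower": bias - z * sd, "loa_upper": bias + z * sd}

# Made-up knee flexion angles in degrees (markerless vs. marker-based).
markerless_deg = [10.0, 12.0, 14.0, 16.0]
reference_deg = [9.0, 13.0, 13.0, 17.0]
agreement = bland_altman(markerless_deg, reference_deg)
```

A bias close to zero with narrow limits of agreement indicates that the two systems can be used interchangeably for the measured quantity, within the tolerance that is clinically acceptable.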
AI-Based Platforms Carrying out TR
The integration of AI-based platforms in a rehabilitation setting allows for tailoring physical exercises to the specific characteristics of each patient's condition. The growing demand for remote healthcare solutions has further accelerated the development of effective TR systems to support individuals recovering from chronic conditions. In this review, we identified evidence related to AI-based TR platforms; however, a large proportion of these platforms are still experimental, with research primarily addressing technical accuracy, feasibility, and usability aspects. In contrast, Chae et al. also investigated the effectiveness of the treatment delivered with an innovative AI-based TR system. Similarly, Capecci et al. (2023) clinically evaluated patients who received TR sessions with ARC, an AI-based platform. Despite the unsupervised nature of the TR sessions, the median patient adherence to the prescribed exercises was 80%. In addition, the authors reported significant improvements in walking function (2-Minute Walking Test), fatigue (Brief Fatigue Inventory), and quality of life (Euro-Quality of Life Questionnaire self-assessment-5 Dimension). Furthermore, no side effects were reported, suggesting that ARC is a feasible and well-tolerated option for HBR in people with PD.
Current trends in AI for rehabilitation emphasize the acquisition and analysis of extensive datasets generated by wearable sensors, ambient monitoring tools, and smart home technologies (Calabrò & Mojdehdehbaher, 2025; Celesti et al., 2024). By applying predictive analytics, these continuous data streams allow for dynamic, individualized adjustments to treatment and help anticipate risks like falls or progressive functional loss (Celesti et al., 2023, 2024). For example, Bertomeu-Motos et al. developed a system to intelligently guide post-stroke patients during exercise, according to the quality of the executed movement, for a self-managed rehabilitation platform at home. In addition, the system could offer a novel tool for clinical assessment of patient improvements throughout the rehabilitation therapy. This is particularly needed in the context of remote monitoring, where clinicians cannot examine patients directly and therefore need objective information about their status. For example, the TR system implemented by Ramírez-Sanz et al. allowed the personalization of the TR sessions through a real-time DL model for human PE (Detectron2), which tracked patients’ skeletal movements during therapy sessions. Beyond exercise monitoring, AI-enhanced TR systems can also track physiological parameters, sleep quality, and treatment compliance, thereby broadening the scope of patient management, as highlighted by Capecci et al. This can provide a comprehensive overview of the patient's overall health status.
Technical Gaps in Vision-Based AI and Markerless Systems
Regarding the technical gaps identified in the included evidence, we observed that the vision-based systems lack standardized and consistent procedures to validate PE accuracy and overall system performance. Classical computer-vision pose-estimation metrics (e.g., mean per-joint position error (MPJPE) and percentage of correct key points) are commonly used to quantify the accuracy of 2D and 3D PE algorithms. However, among the studies included in this review, only Nucita et al. provided a direct quantitative comparison between 3D and 2D PE, and this was based on correlations and intraclass correlation coefficients with goniometric joint measurements rather than on these standard pose-estimation metrics. Furthermore, most vision-based TR systems included in this review evaluated performance mainly at the level of the final task, for example, by reporting exercise-level classification accuracy or the degree of concordance with clinical rating scales, rather than by directly quantifying pose-estimation error. For example, in Kinect-based systems such as those by Capecci et al. (2018) and Dellepiane et al. (2025), the authors evaluated their methods by analysing the correlations between kinematic indices and clinical ratings or by modelling how measurement errors propagate to joint angles. Likewise, both depth-based and 2D CNN methodologies, such as the Kinect One depth-I3D system developed by Barzegar Khanghah et al. and the webcam-based Detectron2 pipeline created by Ramírez-Sanz et al., primarily evaluated performance through exercise classification accuracy. However, they did not offer quantitative error measures for individual joints.
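For reference, the two pose-estimation metrics mentioned above can be sketched in a few lines of Python. MPJPE is the mean Euclidean distance between predicted and ground-truth joint positions, while the percentage of correct key points (PCK) is the fraction of key points whose error falls within a tolerance. The function names and coordinates are made up; in addition, PCK is usually computed with a tolerance normalized to body size (e.g., torso or head length), whereas this simplified sketch assumes an absolute threshold.

```python
import math

def mpjpe(predicted, ground_truth):
    """Mean per-joint position error, in the units of the coordinates."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need matching, non-empty joint lists")
    errors = [math.dist(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)

def pck(predicted, ground_truth, threshold):
    """Fraction of key points whose error is within an absolute threshold."""
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("need matching, non-empty joint lists")
    hits = sum(math.dist(p, g) <= threshold for p, g in zip(predicted, ground_truth))
    return hits / len(predicted)

# Made-up 3D joint positions in millimetres for a single frame.
predicted = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 30.0), (100.0, 40.0, 0.0)]

frame_mpjpe = mpjpe(predicted, ground_truth)           # joint errors: 30 mm and 40 mm
frame_pck = pck(predicted, ground_truth, threshold=35.0)
```

Reporting such joint-level errors alongside exercise-level classification accuracy would make results from different vision-based TR systems directly comparable.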
System-level performance reporting was also limited. Apart from the system proposed by Ramírez-Sanz et al., which reported the time required to process each frame and the amount of data needed to sustain a 10-fps video stream, the vision-based systems did not provide any information on processing speed or use of computing resources. Consequently, the available information does not allow us to determine whether the proposed pipelines can be reliably deployed on standard home devices, mobile platforms, or in low-bandwidth settings. Beyond raw processing speed, the way AI-generated feedback is integrated into patient and therapist workflows was also only partially described. The majority of the studies included in this review refer to “real-time” or “online” feedback, as in the architecture proposed by Ramírez-Sanz et al. However, metrics regarding end-to-end latency or user–machine feedback delay (i.e., the time between a patient's movement, AI processing, and the delivery of corrective feedback to the user or the therapist) were rarely reported. As a result, the temporal characteristics of these TR systems, and their potential impact on rehabilitation responsiveness remain largely undocumented in the current literature.
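As a purely illustrative example of how end-to-end feedback latency could be reported, the sketch below summarizes the delay between the moment a frame is captured and the moment the corresponding feedback is shown. The log format, the function name, and the timestamps are hypothetical and do not come from any of the included studies; the 95th percentile uses the nearest-rank method.

```python
import math
import statistics

def latency_summary(capture_times, feedback_times):
    """Summarize per-event feedback latency in seconds.

    capture_times[i] is when frame i was captured; feedback_times[i] is when
    the feedback derived from that frame reached the user.
    """
    if len(capture_times) != len(feedback_times) or not capture_times:
        raise ValueError("need matching, non-empty timestamp lists")
    latencies = sorted(f - c for c, f in zip(capture_times, feedback_times))
    if latencies[0] < 0:
        raise ValueError("feedback cannot precede capture")
    rank = math.ceil(0.95 * len(latencies))  # nearest-rank 95th percentile
    return {
        "median_s": statistics.median(latencies),
        "p95_s": latencies[rank - 1],
        "max_s": latencies[-1],
    }

# Made-up timestamps in seconds.
capture = [0.0, 1.0, 2.0, 3.0]
feedback = [0.25, 1.5, 2.25, 3.75]
summary = latency_summary(capture, feedback)
```

Reporting a median together with an upper percentile, rather than a single average, shows both the typical delay and the worst delays a patient is likely to experience.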
From a reproducibility standpoint, hardware and camera-configuration details were often incomplete. While the type of sensor (e.g., Kinect v2, Kinect One, ZED stereo, webcam model) was usually specified, essential information such as camera calibration procedures, image resolution, frame rate, camera-to-subject geometry, and illumination control was rarely reported. Some vision-based studies (Barzegar Khanghah et al., 2023; Capecci et al., 2018; Dellepiane et al., 2025) mentioned the potential impact of lighting conditions but did not provide standardized protocols for environmental control, thereby making it difficult to compare the different experimental setups.
Across the included studies, the reporting of AI algorithms was often limited and rarely addressed comparative performance, hyperparameter optimization, or computational complexity systematically (see Supplementary Table 1). AI components were mainly used to support activity recognition, exercise-quality classification, or automatic speech recognition, and typically consisted of a single chosen model or hybrid rather than a family of alternatives evaluated under the same conditions. The algorithms ranged from conventional ML models (e.g., SVM, random forests, hidden semi-Markov models, Gaussian process–based classifiers) to DL architectures such as CNNs, CNN-LSTMs, 3D CNNs (I3D), and time-series CNNs for inertial sensor data.
Only a few studies reported some form of comparative analysis. For instance, Chae et al. compared CNN models trained on accelerometer-only, gyroscope-only, and combined accelerometer–gyroscope signals and compared personalized with population-level models. Personalized models using both modalities achieved the highest recognition accuracy in chronic stroke survivors. Similarly, Bertomeu-Motos et al. compared three input configurations (seven-joint, five-joint, and EMG-only trajectories) and selected the five-joint model as a compromise between robustness and performance, while the EMG-only configuration yielded poor accuracy, especially in post-stroke patients. In the vision-based domain, Ramírez-Sanz et al. empirically compared four COCO keypoint R-CNN variants implemented in Detectron2 and chose the keypoint_rcnn_R_50_FPN_3x model based on a favourable trade-off between loading and processing time on their big-data pipeline. Barzegar Khanghah et al. trained a depth-based I3D model to classify exercises as correctly or incorrectly executed and evaluated its performance using both 10-fold and LOSO cross-validation. These authors explicitly describe a hyperparameter optimization procedure, using grid search over batch size, learning rate, and number of epochs, together with dropout and early stopping.
Furthermore, the statistical validation of AI performance was generally based on point estimates (e.g., accuracy, F1-score), and confidence intervals or formal tests to compare alternative models or devices were rarely reported. Similarly, we noticed a lack of Bland–Altman analyses in the included evidence to assess agreement between AI-derived measures and reference instruments, which could help quantify bias and limits of agreement in clinical applications.
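To illustrate why point estimates alone can be misleading in small samples, the following Python sketch computes a Wilson score confidence interval for a classification accuracy. The Wilson interval is one common choice among several, and the function name and trial counts are hypothetical rather than taken from the included studies.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("require 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return centre - half, centre + half

# Same observed accuracy (90%) with made-up sample sizes of 10 and 100 trials.
low_small, high_small = wilson_interval(9, 10)
low_large, high_large = wilson_interval(90, 100)
```

With only 10 trials the interval is several times wider than with 100 trials, even though the observed accuracy is identical, which is why confidence intervals are informative for the small feasibility studies typical of this field.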
Benefits and Challenges for the Adoption in Clinical Practice
The evidence summarized in this review should still be regarded as preliminary, especially when considering its translation into everyday clinical practice. Although initial trials report promising outcomes, important questions persist, particularly about the sustainability of AI-based tools over time and their applicability across different clinical populations. To confirm the clinical value of AI-supported TR, future research will need to rely on broad, long-term studies involving heterogeneous patient groups to strengthen external validity. Although conventional rehabilitation has consistently proven effective, it often relies on intermittent monitoring, potentially missing subtle yet clinically meaningful changes in a patient's condition. In contrast, AI-assisted systems offer the potential for continuous, personalized monitoring and intervention. However, this transition brings substantial challenges. A critical challenge relates to digital inequities: differences in device availability, internet reliability, and digital skills may widen current gaps in healthcare access. These infrastructural limitations must be addressed to ensure equitable implementation of AI-driven rehabilitation systems. Moreover, adherence to the use of devices must also be considered. This aspect goes hand in hand with system usability and technology acceptance. However, few authors have investigated this important issue. Ongoing remote monitoring depends mainly on patients’ regular and consistent use of the technological devices provided. Formal usability questionnaires were reported in the ARC platform (Capecci et al., 2023), which achieved high System Usability Scale scores, and in the big-data architecture proposed by Ramírez-Sanz et al., where Telehealth Usability Questionnaire scores suggested good perceived ease of use and satisfaction. 
However, most other studies provided little or no insight into how patients perceived remote supervision, AI-generated feedback, or the overall burden of interacting with the technology. Beyond quantitative adherence rates, very few data are available on barriers and facilitators from the patient's point of view, such as fear of technology, perceived usefulness, changes in motivation, or the impact on autonomy in daily life. In this regard, providing a patient-centric and personalized approach can be an important factor in determining the success of TR and remote monitoring interventions in reducing acute care use (Srivastava et al., 2019). Another important aspect in this field is related to the patients’ training on how to use the device, which will likely also need to be personalized and, at times, repeated. For example, the TR session or remote monitoring intervention can be personalized by using individual data to determine alert thresholds (Thomas et al., 2021). In this regard, the caregiver's role is crucial: by acting as a co-therapist and being actively involved in the rehabilitation process, the caregiver can facilitate the patient's engagement in the use of TR devices. In this context, the way AI-generated feedback is integrated into the therapeutic workflow is also clinically relevant. In most available systems, it is not always clear whether feedback is delivered synchronously during the exercise or mainly through asynchronous reports that clinicians review after the session. From a clinical perspective, the timing and modality of feedback are likely to influence motor learning, patient engagement, and the possibility for therapists to correct compensatory strategies in real time. Future TR platforms should therefore not only ensure technical feasibility, but also explicitly consider how feedback delays, communication patterns, and supervision modes shape the therapeutic interaction. 
Furthermore, while advanced ML models excel in accuracy, they frequently lack clinical interpretability, an essential component of effective and transparent decision-making. This stands in contrast to conventional rehabilitation, where clinicians observe patients directly, instilling trust and ensuring clear clinical judgment.
From a clinical perspective, technical limitations (e.g., scarce use of pose-estimation metrics, limited reporting of processing speed and feedback latency, incomplete description of hardware settings and/or setups) have direct consequences for how AI-based TR systems can be used in practice. The lack of standardized pose-estimation metrics and detailed statistical validation makes it difficult to determine whether small changes in joint angles or movement quality reflect true clinical improvement, which in turn limits the use of these measures as reliable rehabilitation biomarkers or as decision-support tools at the individual patient level. Similarly, incomplete reporting of hardware configurations, processing speed, and feedback latency prevents clinicians from knowing whether the proposed systems can deliver feedback with sufficient temporal precision and robustness in typical home environments, where lighting, camera positioning, and connectivity are often suboptimal. Better characterizing these aspects will help ensure that the promising performance reported in feasibility studies can be translated into consistent support for day-to-day clinical decision-making. Achieving this will require close collaboration between engineers, computer scientists, and clinicians to design AI tools that combine technical sophistication with usability, affordability, and clinical practicality.
Moreover, current evidence suggests that AI-based MMA has primarily been administered to relatively stable patients with mild to moderate motor impairments who can perform structured tasks, such as reaching movements, sit-to-stand transitions, or simple upper-limb exercises, in a home environment. These include, for example, stroke survivors and people with Parkinson's disease (Bertomeu-Motos et al., 2023; Capecci et al., 2018; Dellepiane et al., 2025; Ramírez-Sanz et al., 2023), patients with fatigue-related deficits (Capecci et al., 2023), and other conditions with altered motor function (Hartman et al., 2022; Nucita et al., 2023). In acute or medically unstable patients, existing AI research has focused more on enhancing diagnosis and prognosis (e.g., by identifying key recovery factors and optimizing patient outcomes; Bonanno et al., 2025) rather than on unsupervised home-based TR, which may be unsafe or difficult to implement.
Regarding clinical outcomes, most studies have focused on task-specific performance metrics (e.g., movement smoothness, range of motion, number and quality of repetitions) and on correlations between AI-derived kinematic indices and standard clinical scales. These metrics could complement the traditional assessment by providing information on movement quality, training intensity, and adherence over time. Future work should clarify which digital biomarkers derived from MMA are most responsive to change and clinically meaningful for different neurological populations.
According to the International Classification of Functioning, Disability and Health framework (ICF), the clinical outcomes in the reviewed studies were primarily focused on activity-level measures, such as upper limb tests (e.g., WMFT) (Chae et al., 2020) and walking capacity tests (e.g., the 2-Minute Walk Test) (Capecci et al., 2023). Capecci et al. also incorporated outcomes related to participation and broader health status, such as quality-of-life indices (e.g., EQ-5D) and fatigue, anxiety, or depression scales. Environmental and personal factors were addressed more indirectly, mainly through usability questionnaires (e.g., SUS, TUQ), adherence metrics, and digital access constraints (Capecci et al., 2023; Chae et al., 2020; Ramírez-Sanz et al., 2023). In contrast, AI-derived kinematic metrics (e.g., movement smoothness, range of motion, number and quality of repetitions) could be linked to the body functions domain.
Finally, environmental conditions play a critical role in the reliability of markerless systems. Most vision-based setups were tested in controlled environments with sufficient lighting, limited occlusions, and predefined camera-to-subject distances and viewpoints. When translating these solutions to routine practice, whether in outpatient clinics or at home, it will be important to ensure adequate space, stable illumination, and simple positioning guidelines for cameras or sensors, as well as to provide clear instructions to patients and caregivers. Defining and standardizing these environmental requirements is likely to improve signal quality, reduce data loss, and enhance the robustness of AI-based MMA in real-world TR settings.
Ethical, Regulatory, and Educational Considerations
Integrating AI into rehabilitation requires rethinking and redefining the roles and responsibilities of healthcare professionals. Such a transition is not limited to learning how to use software but also demands a deeper move toward clinical reasoning guided by data. Despite the perceived benefits of AI in rehabilitation, there is a lack of understanding and uptake of AI among physical therapy professionals. This knowledge is crucial for the successful implementation of AI applications in the rehabilitation field, particularly given the increasing emergence of technological advancements that have the potential to enhance healthcare delivery. In the study by Shawli et al. (Shawli et al., 2024), the physiotherapists agreed that AI can reduce the workload, time, and effort required by physical therapists; however, they emphasized that AI will not replace their role. While reducing physical workload can enhance efficiency and job satisfaction among therapists, many participants expressed concerns about relying solely on AI for clinical decisions. Participants stressed that AI cannot adequately capture patients’ psychological and social dimensions, which remain fundamental to rehabilitation. Similar concerns were found in a study among radiation oncology professionals, where 77% agreed that human input remains essential for refining AI-driven decisions (Wong et al., 2021). Despite these concerns, some participants acknowledged AI's potential in delivering accurate diagnoses and treatment plans. This aligns with findings from studies of general practitioners, who view AI as a tool to enhance diagnostic accuracy and support clinicians in their roles (Alsobhi et al., 2022; Buck et al., 2022). 
However, in most of the current studies, the role of therapists is described only briefly, and there is little information on how clinicians are trained to interpret AI-derived metrics, understand their limitations (e.g., measurement error, latency, algorithmic uncertainty), and integrate them into their clinical reasoning. Future studies should therefore assess how clinicians interpret AI-based outputs and how these technologies can be appropriately incorporated into everyday rehabilitation practice.
Ethical and regulatory issues are also central. Relying on AI for personalized treatment recommendations raises concerns related to data bias, transparency of algorithms, patient autonomy, and responsibility for clinical decisions (Díaz-Rodríguez et al., 2023; Mennella et al., 2024). Strong data protection measures, such as end-to-end encryption, multi-factor authentication, secure data storage, and routine audits, are critical in reducing privacy threats and maintaining trust. To ensure equity and maintain patient confidence, AI-based TR platforms must be designed with fairness, openness, and accountability at their core.
The Human Element: Caregivers and the Therapeutic Relationship
Another underexplored aspect, addressed notably only by Nucita et al., is the role of the caregiver. Several studies highlight the caregiver's role as a co-therapist in TR, providing support to physiotherapists during both real-time and remote-delayed sessions (Calabrò et al., 2023; Dulawan et al., 2024; Sun et al., 2023). Their participation is essential for the success of home rehabilitation, particularly for patients who need help in performing exercises or maintaining adherence to prescribed protocols (Calabrò et al., 2023).
More broadly, the shift from traditional to AI-enhanced rehabilitation compels a deeper reflection on the very nature of “care” in digital environments. Technological proficiency alone cannot capture the nuanced dynamics of therapeutic practice and alliance. Core aspects such as the emotional bond and interpersonal relationship established in face-to-face therapy, especially the therapeutic alliance, remain difficult for AI systems to reproduce (Dolev & Zilcha-Mano, 2019; Kornhaber et al., 2016). Therefore, evaluations of AI-based TR must account not only for clinical outcomes but also for these intangible, yet deeply impactful, dimensions of care. Accordingly, incorporating patients’ active participation and constant feedback between patient and clinician is likely to improve engagement.
Limitations
The limitations of the included studies and of the scoping review methodology must be acknowledged. Many findings have limited generalizability due to small sample sizes. In addition, clinical characterization of the study populations was often restricted by incomplete reporting in the primary articles; diagnostic criteria, disease severity, and detailed demographic or clinical scale data were absent or only briefly described, thereby reducing the clinical interpretability and between-study comparability of the results. Moreover, several studies (Bertomeu-Motos et al., 2023; Hartman et al., 2022; Ramírez-Sanz et al., 2023) evaluated only methodological aspects, overlooking the potential effectiveness of the innovative approaches. Furthermore, most AI-based TR studies were small feasibility or pilot trials without a conventional rehabilitation control group, which prevents drawing firm conclusions about the relative efficiency, safety, and usability of AI-based interventions compared with standard care. With respect to our review, restricting the search to English-language publications may have led to the exclusion of relevant evidence, while the absence of statistical analyses prevented a quantitative appraisal of the literature. In line with the scoping review design, we did not perform formal statistical analyses to aggregate results across studies, as our primary aim was to map and qualitatively describe the range of AI-based TR and remote monitoring approaches. In addition, the marked heterogeneity of the included evidence, in terms of sample size and characteristics, study design, interventions, and technological equipment, indicates that a systematic review with meta-analysis would currently be difficult to conduct and of limited clinical interpretability. Consequently, this review provides a broad qualitative synthesis of the available evidence, offering meaningful insights into the role of AI technologies in the field of TR and remote monitoring.
Future TR studies using MMA and AI-based technologies should adopt more rigorous and standardized reporting of pose-estimation metrics, together with detailed descriptions of the hardware, the recording environment, the TR modality (synchronous or asynchronous), and feedback latency. In addition, future work should compare 2D and 3D pipelines within the same experimental setup and clinical population, allowing quantitative assessment of the trade-off between accuracy, robustness, computational cost, and deployability on low-cost devices. Benchmark datasets for TR scenarios, annotated with both clinical scores and ground-truth kinematics, would greatly facilitate such comparisons and accelerate progress toward standardized evaluation protocols.
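As an illustration of the kind of pose-estimation metric that could be reported against ground-truth kinematics, the following minimal Python sketch computes the mean per-joint position error (MPJPE), a metric commonly used in the pose-estimation literature. It is not drawn from any of the included studies: the function name, the data layout (a sequence of frames, each a sequence of joint coordinates), and the units are assumptions made for illustration only.

```python
import math
from typing import Sequence

# Hypothetical layout: one joint is (x, y) or (x, y, z), expressed in the
# same units (e.g., millimetres) for both the estimate and the reference.
Point = Sequence[float]
Frame = Sequence[Point]


def mpjpe(predicted: Sequence[Frame], reference: Sequence[Frame]) -> float:
    """Mean Euclidean distance between matched joints, over all frames."""
    if len(predicted) != len(reference):
        raise ValueError("frame counts differ")
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(predicted, reference):
        if len(pred_frame) != len(ref_frame):
            raise ValueError("joint counts differ within a frame")
        for pred_joint, ref_joint in zip(pred_frame, ref_frame):
            total += math.dist(pred_joint, ref_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


# Toy example: one frame, two joints; the second joint is off by 5 units.
estimated = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
ground_truth = [[(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]]
error = mpjpe(estimated, ground_truth)
```

Because the same function accepts 2D or 3D coordinates, a comparable error value could in principle be reported for both types of pipeline, provided that the reference data are expressed in the same coordinate space and units.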
For non-visual AI-based TR modalities, such as IMU-based, accelerometry-based, or speech-based systems, similar principles apply: reporting sensor calibration procedures, environmental conditions, and inference latency will be essential to ensure reproducibility and to enable fair comparisons between alternative hardware configurations and algorithms. Together, these improvements would address the gaps highlighted by our review and provide a more solid empirical basis for home-based neurorehabilitation using AI-based technologies.
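One possible way to report inference latency in a comparable form is sketched below in Python. This is an assumption-laden illustration rather than a method used in the included studies: the function names, the choice of the median and the nearest-rank 95th percentile as summary statistics, and the `infer` callable standing in for a model are all hypothetical.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def measure_latency_ms(infer: Callable[[object], object],
                       inputs: Iterable[object]) -> List[float]:
    """Time each call to `infer` and return per-input latencies in ms."""
    latencies = []
    for item in inputs:
        start = time.perf_counter()
        infer(item)  # hypothetical model call; its result is ignored here
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def summarize_latency(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarize latencies as sample count, median, and 95th percentile."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank percentile: ceil(0.95 * n), computed with integers
    # to avoid floating-point rounding at the boundary.
    rank = (95 * n + 99) // 100
    return {
        "n": n,
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[rank - 1],
    }


# Toy example with fixed values instead of a real model.
summary = summarize_latency([10.0, 20.0, 30.0, 40.0, 50.0])
```

Reporting a tail percentile alongside the median reflects the fact that occasional slow responses, rather than the typical case, are likely to be the ones that disrupt real-time feedback.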
Conclusion
In conclusion, research on AI and TR, including remote monitoring, remains at a preliminary stage, particularly regarding its application in patients with neurological disorders. This is related to the marked heterogeneity of patient populations, outcome measures, and extracted features. Furthermore, AI-based TR platforms and applications focused on physical exercise appear to be largely experimental when compared with those aimed at remote movement assessment. Despite these limitations, this review highlighted considerable potential for future development, both in terms of technological advancements and in relation to patient outcomes. In particular, AI-based platforms could be used to dynamically adapt the level and type of exercises in real time according to the patient's movement performance and difficulties, allowing a more finely tuned progression of training intensity and task complexity. In parallel, MMA in a TR setting could provide continuous, quantitative information on motor status, offering an objective complement to clinical scales that are often difficult to administer remotely and may be insensitive to subtle changes over time. Beyond innovation, it is important to emphasize that TR and remote monitoring have become essential, as they allow clinicians to maintain close contact with patients despite the absence of traditional face-to-face interaction. For these reasons, the integration of AI as a clinical support tool that provides quantitative and objective data could represent a novel approach to enhancing the personalization of care.
Supplemental Material
sj-docx-1-nre-10.1177_10538135261426537 - Supplemental material for Markerless Motion Analysis and AI-Based Platforms for Neurological Telerehabilitation: A Scoping Review
Supplemental material, sj-docx-1-nre-10.1177_10538135261426537 for Markerless Motion Analysis and AI-Based Platforms for Neurological Telerehabilitation: A Scoping Review by Mirjam Bonanno, Giovanni Lonia, Sepehr Mojdehdehbaher, Antonio Celesti and Rocco Salvatore Calabrò in NeuroRehabilitation
Supplemental Material
sj-docx-2-nre-10.1177_10538135261426537 - Supplemental material for Markerless Motion Analysis and AI-Based Platforms for Neurological Telerehabilitation: A Scoping Review
Supplemental material, sj-docx-2-nre-10.1177_10538135261426537 for Markerless Motion Analysis and AI-Based Platforms for Neurological Telerehabilitation: A Scoping Review by Mirjam Bonanno, Giovanni Lonia, Sepehr Mojdehdehbaher, Antonio Celesti and Rocco Salvatore Calabrò in NeuroRehabilitation
Footnotes
Acknowledgements
MB is a PhD student enrolled in the National PhD in Artificial Intelligence, XL cycle, course on Health and life sciences, organized by Università Campus Bio-Medico di Roma. GL is a PhD student enrolled in the National PhD in Artificial Intelligence, XL cycle, course on AI for society, organized by Università degli Studi di Pisa.
Informed Consent Statement
Not applicable.
Institutional Review Board Statement
Not applicable.
Author Contributions
Conceptualization, MB and RSC; methodology, MB, GL, and RSC; validation, all authors; investigation, MB, GL, and SM; resources, RSC; data curation, MB, GL, and SM; writing—original draft preparation, MB, GL, and SM; writing—review and editing, RSC and AC; visualization, SM and AC; supervision, RSC and AC; project administration, RSC; funding acquisition, RSC. All authors have read and agreed to the published version of the manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and publication of this article: This study was supported by Current Research Funds 2025 RRC-2025-23686388, Ministry of Health, Italy.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Not applicable.
Supplemental Material
Supplemental material for this article is available online.
References
