Abstract
Background:
Protecting health from childhood onward is a major challenge. Machine learning (ML) algorithms have been introduced in many fields of knowledge, but their use in education is still novel. This review aims to evaluate the effectiveness of ML applications on data recorded by inertial measurement units during the school-hours physical activity of children from preschool to secondary education. It also explores how ML is used to process and interpret these data for outcomes such as motor competence, physical activity intensity, sedentary behavior and academic/developmental indicators.
Methods:
Following PRISMA guidelines, we systematically searched the PubMed, Web of Science, SCOPUS, SPORTDiscus and ProQuest Central databases.
Results:
Thirteen studies met the inclusion criteria, covering preschool to secondary education settings across multiple countries. Methodological quality ranged from moderate to high (11–17/18 MINORS points). ML algorithms, mainly Random Forest, Support Vector Machines, Gradient Boosting and Convolutional Neural Networks, were successfully applied to classify or predict outcomes such as motor competence, physical activity intensity, sedentary behavior and developmental or academic indicators.
Conclusion:
Reported accuracies ranged from approximately 70% to 99%, demonstrating the strong potential of wearable sensor data combined with ML to objectively monitor and assess school-related physical activity.
Introduction
Physical activity is fundamental to human nature. It improves muscular and cardiorespiratory fitness, bone health and mental fitness while reducing the risk of heart disease, diabetes, hypertension, obesity and fractures. Insufficient physical activity is among the leading risk factors for mortality. The World Health Organization (WHO) therefore recommends that people of all ages engage in regular physical activity.1,2 However, children and adults worldwide struggle to meet the WHO guidelines.3,4
For children, physical activity benefits development in 3 major areas: motor skills, cognitive competency (such as creativity, attention and mental abilities) and social competency. 5 Therefore, monitoring children’s physical activity is valuable for understanding their physical and mental development, along with the risk factors that emerge with insufficient physical activity levels. 6 Given the seriousness of these risk factors, which include depressive symptoms and suicidal behavior, accurate assessment is crucial. 7 Subjective methods such as questionnaires and parental reports have historically been used, but have proven insufficiently reliable because of recall bias. 8 Such methods are also time-intensive, have poor generalizability and yield substantial misclassification errors in children.9-11
Information technology, such as accelerometers and other wearable sensors, collects objective and continuous data on physical activity. It has therefore been considered useful as an unbiased way of validating the subjective methods previously relied upon. 12 Wearable sensors have made it possible to monitor children’s physical activity throughout school hours, which make up a significant part of children’s daily life. Until recently, the technology was limited in its ability to differentiate between types of physical activity. However, machine learning (ML) and deep learning (DL) algorithms have been applied to data collected by wearable sensors in attempts to improve the practical usability of these instruments. This has automated the process, leading to dramatic time-efficiency gains, 13 and improved activity classification by detecting subtle differences in movement patterns.14-16 It has also enabled a more detailed analysis of different intensity levels of physical activity and their effect on health indicators. 17 The validity and reliability of ML-based wearable technology have been examined for assessing physical activity in preschool- and school-age youth. 18 However, a wide variety of physical activities are performed spontaneously in real-life settings, which can negatively affect the performance of ML algorithms. 19
Several reviews have examined the application of ML algorithms and DL approaches to children’s physical activity data from wearable technology, and have revealed significant progress within the field.20,21 This has provided the following benefits with regard to children’s physical activity: (1) real-time monitoring and automated alert systems, (2) personalized evaluation frameworks accounting for differences in age and gender, (3) informing intervention strategies by identifying sedentary patterns and (4) detection of complex movement patterns that are otherwise likely to be missed by traditional methods of analysis.22-24 However, previous systematic reviews have focused on early childhood 25 or specific clinical populations, 26 leaving a gap in understanding the full educational spectrum from preschool to secondary school.
Thus, this review aims to evaluate the effectiveness of ML applications on data recorded by information technology during the physical activity of children from preschool to secondary education in physical education, the classroom and school travel. Furthermore, the review aims to explore how ML is used to process and interpret these data for outcomes such as motor competence, physical activity intensity, sedentary behavior and academic/developmental indicators, and to identify methodological inconsistencies so that future methodological practice can be improved.
Materials and Methods
Experimental Approach to the Problem
This systematic review complied with existing standards for conducting systematic reviews in sport sciences 27 and the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. 28 The review methodology was designed to preserve methodological rigor and ensure thorough coverage of the pertinent literature. The systematic review was registered in PROSPERO (CRD420261285539).
Information Sources
Five databases (PubMed, Web of Science, SCOPUS, SPORTDiscus and ProQuest Central) were thoroughly searched. All literature published before October 8, 2025, was included in the search.
Search Strategy
To organize the search strategy and ensure methodical coverage of the pertinent literature, the PICO (Patient, Problem or Population – Intervention or Exposure – Comparison, Control or Comparator – Outcome[s]) framework was used. The authors were not blinded to journal names or manuscript authors, in order to preserve transparency. The search terms were carefully chosen to find all pertinent material on information technology and machine learning in educational settings. The final search string was:
(preschool OR kindergarten OR school* OR schoolchildren OR “primary education” OR “elementary education” OR “secondary education” OR “high school”) AND (“machine learning” OR “deep learning”) AND (exercise OR “Physical activity” OR “physical education” OR sport OR fitness OR aerobic OR “motor skill*” OR “motor competence”) AND (“inertial measurement unit*” OR gyroscope OR pedometer OR barometer OR “smart band*” OR smartwatch* OR acceleromet* OR wearable OR sensor*).
Eligibility Criteria
After entering the search string into the databases, the authors downloaded the title, authors, journal and date of every article returned by the search. The records were organized in an Excel file, duplicate articles were eliminated, and the remaining articles were assessed for eligibility. Articles that did not appear in the database search were added and marked in the Excel spreadsheet as “included from external sources” (Table 1).
Inclusion and Exclusion Criteria.
Data Extraction
An Excel spreadsheet created in accordance with the Cochrane Consumers and Communication Review Group’s data extraction template was used to ensure a consistent data extraction procedure. 14 The spreadsheet made it easier to systematically evaluate the inclusion and exclusion criteria for each of the chosen studies. Two authors independently conducted the extraction process (including manual removal of duplicates), checking titles/abstracts and full texts, with any disagreements resolved through discussion until consensus was reached. Full documentation was maintained for excluded articles, including specific reasons for exclusion. All data were systematically recorded and stored in the spreadsheet.
Assessment of Study Methodology
The methodological quality was assessed using the methodological index for non-randomized studies (MINORS). The MINORS scale is a checklist of 8 essential items, expanded to 12 items for comparative studies. In this review, 9 items were scored (maximum 18 points) because 3 items were not applicable (NA). Each item is scored from 0 to 2 according to the quality with which it is addressed.
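The scoring arithmetic described above can be sketched as follows. This is a minimal illustration rather than the procedure the authors used; the function name `minors_total` and the example item scores are hypothetical, and which 3 items were rated NA is not stated in the text.

```python
from typing import Optional, Sequence, Tuple


def minors_total(item_scores: Sequence[Optional[int]]) -> Tuple[int, int]:
    """Return (total, maximum) for one study's MINORS item scores.

    Each applicable item is scored 0, 1 or 2. None marks an item rated
    "not applicable" (NA); it is left out of both the total and the maximum.
    """
    applicable = [s for s in item_scores if s is not None]
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("MINORS item scores must be 0, 1 or 2")
    return sum(applicable), 2 * len(applicable)


# Hypothetical study: 12 items, 3 of them NA, so the maximum is 9 * 2 = 18.
example_scores = [2, 2, None, 2, 1, None, None, 0, 2, 2, 2, 2]
total, maximum = minors_total(example_scores)
print(f"{total}/{maximum}")  # prints 15/18
```

Treating NA items as excluded from the denominator, instead of scoring them 0, is what keeps the maximum at 18 points in this review.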
Results
Identification and Selection of Studies
After searching all databases (PubMed: 23; Web of Science: 27; ProQuest Central: 13; SCOPUS: 49; SPORTDiscus: 5; External sources: 2), 119 articles were checked, of which 60 were identified as duplicates at the initial stage. The authors then analyzed whether each of the remaining 59 articles met all inclusion criteria, which resulted in the elimination of 46 articles under exclusion criterion 1 (n = 10), exclusion criterion 2 (n = 26), exclusion criterion 4 (n = 2) and exclusion criterion 6 (n = 8). The remaining 13 articles were included in the qualitative synthesis of the systematic review (Figure 1).

Flow diagram of the study.
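The selection counts can be tallied as a quick consistency check. The sketch below only re-adds the per-source and per-criterion numbers reported in the text; the variable names are illustrative and not taken from the study.

```python
# Records retrieved per source, as reported in the text.
records_per_source = {
    "PubMed": 23,
    "Web of Science": 27,
    "ProQuest Central": 13,
    "SCOPUS": 49,
    "SPORTDiscus": 5,
    "External sources": 2,
}
duplicates = 60

# Articles excluded per exclusion criterion number, as reported in the text.
excluded_per_criterion = {1: 10, 2: 26, 4: 2, 6: 8}

identified = sum(records_per_source.values())
screened = identified - duplicates
excluded = sum(excluded_per_criterion.values())
included = screened - excluded

print(f"identified={identified}, screened={screened}, "
      f"excluded={excluded}, included={included}")
```

Deriving the number of included studies from the per-criterion counts makes it easy to spot a mismatch between the flow diagram and the running text.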
Quality Assessment
The methodological quality of the 13 included studies, assessed using the MINORS checklist, ranged from 11 to 17 out of 18 points, indicating generally moderate to high quality. Most studies clearly stated objectives, used appropriate designs and collected relevant data. However, several lacked prospective sample size estimation, adequate control groups or neutral evaluation procedures. Only a few studies achieved the highest methodological standards, while the majority demonstrated solid internal consistency but moderate external validity. Overall, the reviewed works show acceptable methodological rigor, supporting confidence in their reported findings (Table 2).
Methodological Assessment of the Included Studies.
Abbreviation: NA, not applicable. The MINORS checklist (2 = high quality; 1 = medium quality; 0 = low quality): Clearly defined objective (item 1); Inclusion of patients consecutively (item 2); Information collected retrospectively (item 3); Assessments adjusted to objective (item 4); Evaluations carried out in a neutral way (item 5); Follow-up phase consistent with the objective (item 6); Dropout rate during follow-up less than 5% (item 7); Prospective estimation of sample size (item 8); Adequate control group (item 9); Simultaneous groups (item 10); Homogeneous starting groups (item 11); and, appropriate statistical analysis (item 12).
Study Characteristics
Sample
The 13 included studies involved children and adolescents aged 3 to 18 years across diverse educational levels, from preschool to secondary school. Sample sizes varied widely, ranging from 14 participants in laboratory-based feasibility studies to over 1,700 schoolchildren in large-scale population analyses. Most samples were balanced by sex and recruited from schools in Europe, North America and Australia, ensuring a heterogeneous representation of educational and cultural contexts.
Data Collection Methods
Data were primarily gathered through inertial measurement units (IMUs) and related wearable sensors such as ActiGraph accelerometers, smart bands, gyroscopes and GPS loggers. Sampling frequencies ranged from 10 Hz to 110 Hz, with epochs typically set between 1 and 60 s depending on the target variable. Collected features included raw tri-axial acceleration, angular velocity, stride cadence, heart rate and derived metrics such as energy expenditure or motor competence indicators. Data were generally processed using standardized pipelines and exported for subsequent machine learning analysis.
Study Settings and Research Focus
Most investigations were conducted in naturalistic school or preschool environments, with some including laboratory validation phases. Activities analyzed ranged from classroom movement and playground behavior to school travel and structured motor skill assessments. The studies aimed to address diverse educational and health objectives, including classification of motor competence, detection of sedentary behavior, prediction of academic performance and identification of developmental or neurobehavioral patterns such as ADHD-related movement profiles.
Machine Learning Implementation
A variety of supervised and unsupervised machine learning algorithms were applied, including Random Forest, Support Vector Machines, Gradient Boosting, Neural Networks, Convolutional Neural Networks, k-means clustering and Self-Organising Maps. Reported model accuracies ranged from 70% to 99%, depending on the complexity of the task and data quality. Most studies emphasized feature engineering and validation procedures, while a few integrated deep learning frameworks for automated feature extraction. Collectively, these implementations demonstrate the growing feasibility of using machine learning to analyze sensor-based data for monitoring and enhancing schoolchildren’s physical activity and motor development (Table 3).
Main Characteristics and Findings of Machine Learning Applications Using Accelerometer Data in Schoolchildren.
Abbreviations: 20MSRT, 20 m shuttle run test; ADHD, attention deficit and hyperactivity disorder; BMI, body mass index; IMU, inertial measurement unit; LPA, light PA; ML, machine learning; MPA, moderate PA; MVPA, moderate-to-vigorous physical activity; PA, physical activity; SED, sedentary; SVM, support vector machine; TGMD-3, Test of Gross Motor Development 3; VPA, vigorous PA; zBMI, body-mass index expressed as a z-value.
Discussion
Traditional methods for physical activity assessment in educational settings are time-intensive, generalize poorly and produce substantial misclassification errors in children.36-38 Machine learning algorithms integrated with inertial measurement units offer automated and objective monitoring solutions. However, previous systematic reviews have examined only early childhood 25 or specific clinical populations, 39 leaving a gap in understanding machine learning applications across the full educational spectrum from preschool to secondary education. Therefore, this systematic review examined machine learning applications with inertial measurement units for assessing physical activity during school hours across all educational levels.
Thirteen studies demonstrated accuracies of 70% to 99% across diverse applications including motor competence assessment, activity classification, sedentary behavior detection and clinical screening. Random Forest emerged as the predominant algorithm, used in 7 studies, while Convolutional Neural Networks achieved 87.6% balanced accuracy for sedentary behavior detection and 87.5% to 93.75% accuracy in differentiating children with ADHD from controls. Machine learning approaches offered substantial advantages over traditional methods, including dramatic time efficiency gains (assessment time reduced from 15 to 2 min per child) and detection of subtle movement pattern differences not captured by intensity-based classifications (eg, reduced walking/running time in children with developmental coordination disorder despite comparable overall activity levels). However, methodological quality varied (11-17/18 MINORS points) with considerable heterogeneity in sampling frequencies, epoch lengths, sensor placements and validation protocols.
Motor Competence Assessment
Motor competence assessment via machine learning addresses the time-intensive nature of traditional evaluations, which require trained assessors and standardized protocols. 40 Brons et al 15 demonstrated 76% accuracy in predicting fine motor skills using sensor-augmented toys, reducing assessment time from 15 to 2 min per child. Similarly, Lander et al 30 achieved 80% to 100% accuracy across TGMD-3 skills using simplified 4-sensor IMU systems positioned on wrists and ankles, though limitations existed for detecting certain skill criteria such as arm positioning and object interactions. This accuracy-feasibility trade-off remains critical for school implementation, where comprehensive sensor arrays (eg, 17 IMUs) offer precision but lack practical scalability due to setup time and technical expertise requirements. 30 The integration of machine learning with consumer-grade wearables presents additional opportunities for large-scale implementation. Sulla-Torres et al 16 achieved 95% accuracy in males and 89% in females for motor competence classification using smart bands combined with Gradient Boosting algorithms. Similar consumer devices have demonstrated acceptable accuracy for step counting and distance measurement in adult populations. 41
The instrumentation of standardized motor competence tests represents a growing research area,25,42 with IMU-based systems having the potential to provide objective assessments while reducing assessor burden. Traditional assessments such as the Movement Assessment Battery for Children (MABC-2) require extensive training, 43 creating barriers to widespread implementation. However, most studies employed structured protocols that may not fully represent the complexity of naturalistic classroom and playground activities.30,34 Future development should prioritize algorithms that maintain acceptable accuracy with minimal sensor configurations while capturing ecologically valid movement patterns characteristic of children’s spontaneous play and structured physical education activities.16,25
Activity Classification and Intensity Prediction
Activity type classification revealed advantages over traditional cut-point methods, particularly for detecting subtle movement patterns obscured by intensity-based approaches.13,32,33 Christian et al 32 and Letts et al 33 employed validated Random Forest models achieving F-scores exceeding 80% for classifying sedentary, light and moderate-to-vigorous activities in preschoolers. Critically, Letts et al 33 demonstrated that children with developmental coordination disorder showed comparable overall activity intensity but significantly reduced walking and running time. These findings highlight machine learning’s capacity to reveal qualitative differences in movement patterns beyond quantitative intensity metrics. 44 Similarly, Kwon et al 13 demonstrated that Random Forest classification with wrist-worn accelerometers provided more accurate estimation of physical activity levels in preschoolers, revealing that U.S. preschoolers averaged only 28 min per day of MVPA versus the recommended 60 min. 45
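Because the studies report F-scores and balanced accuracy rather than plain accuracy alone, it helps to see how these metrics are computed. The sketch below is a minimal illustration on made-up labels, not data from any included study; the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_scores(y_true: Sequence[str],
                     y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Precision, recall and F-score (F1) for every class present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for c in sorted(set(y_true)):
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[c] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def balanced_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of the per-class recalls, so rare classes weigh as much as common ones."""
    scores = per_class_scores(y_true, y_pred)
    return sum(s["recall"] for s in scores.values()) / len(scores)


# Made-up epoch labels: sedentary (SED) dominates, MVPA is rare.
y_true = ["SED"] * 6 + ["LPA"] * 3 + ["MVPA"]
y_pred = ["SED"] * 6 + ["LPA", "LPA", "SED"] + ["LPA"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}, balanced={balanced_accuracy(y_true, y_pred):.2f}")
```

In this toy case plain accuracy is 0.80 while balanced accuracy is about 0.56, because the single MVPA epoch is missed. This is one reason class-aware metrics are informative for children's activity data, where vigorous activity is comparatively rare.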
Despite these advances, Li et al 10 reported only 70% overall accuracy using k-means clustering for wrist-worn accelerometer calibration in preschoolers, with challenges in moderate-to-vigorous activity classification. This reflects persistent difficulties with children’s naturally intermittent movement patterns. Also, the variability in cut-point estimates across different processing methodologies creates substantial challenges for cross-study comparisons and population-level surveillance.38,46 Machine learning approaches offer potential solutions to these methodological challenges by learning activity-specific features directly from data rather than relying on fixed intensity thresholds.13,32,33 Notably, Mendoza et al 35 achieved 99.9% accuracy in identifying cycling activity through combined accelerometer and GPS data, enabling precise measurement of active transport interventions. 47
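To illustrate how clustering can replace fixed intensity thresholds, the sketch below runs a one-dimensional k-means on synthetic epoch-level acceleration values and takes the midpoints between neighboring centroids as data-driven cut-points. It is an assumption-laden toy example, not the calibration procedure of Li et al; the function names, the initialization scheme and the data are all hypothetical.

```python
from typing import List, Sequence


def kmeans_1d(values: Sequence[float], k: int, iterations: int = 100) -> List[float]:
    """Lloyd's algorithm in one dimension; returns the centroids in ascending order."""
    ordered = sorted(values)
    if k < 1 or k > len(set(ordered)):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic start: spread the initial centroids evenly over the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters: List[List[float]] = [[] for _ in range(k)]
        for v in ordered:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def cut_points(centroids: Sequence[float]) -> List[float]:
    """Midpoints between adjacent centroids, usable as intensity thresholds."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


# Synthetic mean acceleration per epoch (arbitrary units), three loose groups.
epochs = [0.02, 0.03, 0.05, 0.04, 0.20, 0.25, 0.22, 0.80, 0.90, 0.85]
thresholds = cut_points(kmeans_1d(epochs, k=3))
print([round(t, 3) for t in thresholds])
```

With well-separated groups the thresholds fall cleanly between clusters. When moderate and vigorous epochs overlap, as is common in children's intermittent movement, the centroids sit close together and small shifts in the data move the cut-points, which is consistent with the difficulties reported for moderate-to-vigorous classification.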
Predictive and Clinical Applications
Beyond classification tasks, predictive applications demonstrated potential for early identification of children at risk for adverse developmental outcomes. Joensuu et al 11 predicted unfavorable future cardiorespiratory fitness in adolescents using Random Forest incorporating physical fitness, motor competence, adiposity, physical activity patterns, academic performance and psychosocial variables (AUC: 0.83 girls, 0.76 boys). While baseline cardiorespiratory fitness emerged as the strongest single predictor, the inclusion of multiple domains significantly enhanced predictive accuracy, supporting machine learning’s capacity to synthesize complex, multivariate data for risk stratification. 11 However, predicting future development proved substantially less accurate than classifying current status (AUC: 0.68 girls, 0.40 boys), 11 suggesting that longitudinal changes in physical fitness involve complex, potentially non-linear developmental processes that current machine learning approaches capture imperfectly. 48
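The AUC values quoted above can be read as the probability that a randomly chosen at-risk child receives a higher predicted risk than a randomly chosen child who is not at risk. The following sketch computes AUC from that definition on made-up scores; the function name and data are hypothetical and unrelated to the study by Joensuu et al.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative cases (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up predicted risks: 1 = unfavorable outcome, 0 = favorable outcome.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
print(round(roc_auc(labels, scores), 3))  # prints 0.889
```

An AUC of 0.5 corresponds to chance-level ranking, so a value below 0.5, such as the 0.40 reported for boys, means the model ranked cases worse than chance in that subgroup.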
Conversely, Froud et al 29 found that traditional linear regression outperformed machine learning methods (Random Forest, Support Vector Machines, k-Nearest Neighbors, Neural Networks) when predicting academic performance and quality of life from physical activity data, with machine learning models explaining virtually no variance in validation datasets (R² = 0%) compared to 22% to 24% for linear regression. This negative finding provides crucial evidence that machine learning does not universally surpass traditional approaches, particularly when relationships are approximately linear, sample sizes are modest, and missing data are prevalent. 49 Clinical screening applications also showed innovation. Muñoz-Organero et al 9 differentiated children with ADHD from controls through movement pattern analysis using Convolutional Neural Networks applied to acceleration images, achieving 87.5% accuracy with wrist sensors and 93.75% with ankle sensors. Importantly, medication altered movement patterns toward control-like profiles, suggesting potential applications in objective treatment monitoring beyond traditional behavioral rating scales. 9 Clark et al 34 further demonstrated the utility of unsupervised learning approaches, employing Self-Organising Maps and k-means clustering to identify 5 distinct movement behavior profiles in preschoolers, highlighting the potential for profiling approaches to shift focus from basic obesity monitoring to comprehensive assessment of “moving well.”
Methodological, Ethical Considerations and Limitations
ActiGraph accelerometers (predominantly GT3X+) dominated across studies,10,11,13,29,31-35 reflecting their established validity in pediatric research. However, considerable heterogeneity existed in sampling frequencies (10-110 Hz), epoch lengths (1-60 s) and wear locations (hip, wrist, ankle). Epoch length involves a critical trade-off: shorter windows (1-5 s) provide better temporal resolution for detecting brief activity bouts, while longer epochs (30-60 s) offer more stable classifications but risk missing short bursts of activity that contribute to daily energy expenditure. 40 This methodological variability, combined with the proliferation of device-specific and population-specific cut-points, creates substantial challenges for synthesizing evidence across studies.10,32,38 The absence of standardized protocols for school-based sensing represents a significant barrier to clinical translation and cross-study comparison. Future implementation would benefit from consensus guidelines addressing: (a) optimal sensor placement considering both measurement validity and student comfort; (b) minimum sampling frequencies and epoch lengths for different assessment objectives; (c) standardized calibration procedures across device types; (d) data processing pipelines and feature extraction methods; and (e) minimum training dataset requirements for algorithm development. Such standardization would facilitate multi-site validation studies, enable direct comparison of algorithmic performance and support the development of generalizable models deployable across diverse educational contexts.25,32
Consumer-grade wearables such as smart bands offer promising alternatives for large-scale implementation, with Sulla-Torres et al 16 demonstrating high classification accuracy using the Huawei Band 7, though previous validation studies have shown variable performance of consumer devices depending on the specific metrics and populations assessed. 50 Deep learning approaches, particularly Convolutional Neural Networks, demonstrated the capacity to automatically learn relevant features from raw acceleration signals, potentially reducing researcher bias. They also improved classification accuracy for complex behaviors such as sedentary patterns and postural transitions.9,31 However, these advantages must be balanced against increased computational requirements, larger training dataset needs and substantially reduced model interpretability. 51
Finally, several limitations characterize the current evidence base, including: (a) small, homogeneous samples limiting generalizability9,15,30; (b) predominance of cross-sectional designs precluding assessment of algorithm stability across child development9-11,13,15,16,29,31-35; (c) incomplete free-living validation with many studies relying on structured protocols 30 ; and (d) limited attention to model interpretability and identification of algorithmic biases.25,31 Also, the exclusion of conference papers may have limited coverage of recent algorithmic innovations, because many cutting-edge algorithms are first published in conference proceedings. Ethically, continuous sensor-based monitoring in schools raises concerns regarding informed consent, surveillance bias (children altering natural behavior when monitored), potential stigmatization from algorithmic classifications and equity if access concentrates in well-resourced schools. Data governance must address ownership, retention and protection against unauthorized access, particularly regarding commercial interests and algorithmic bias in homogeneous training samples.25,31,44,52
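The epoch-length trade-off described above can be made concrete with a small sketch. It averages a synthetic acceleration signal into non-overlapping epochs and counts how many epochs exceed an activity threshold. The signal, the 0.5 threshold and the function name `epoch_means` are hypothetical choices for illustration only and do not reproduce any included study's pipeline.

```python
from typing import List, Sequence


def epoch_means(samples: Sequence[float], sampling_hz: int, epoch_s: int) -> List[float]:
    """Average consecutive samples into non-overlapping epochs of epoch_s seconds.

    A trailing partial epoch is dropped.
    """
    per_epoch = sampling_hz * epoch_s
    n_epochs = len(samples) // per_epoch
    return [sum(samples[i * per_epoch:(i + 1) * per_epoch]) / per_epoch
            for i in range(n_epochs)]


SAMPLING_HZ = 10   # lowest sampling frequency reported across the studies
THRESHOLD = 0.5    # hypothetical "active" threshold, arbitrary units

# Synthetic 60 s recording: near rest (0.05) with one 3 s burst of activity (1.0).
signal = [0.05] * (SAMPLING_HZ * 60)
for i in range(SAMPLING_HZ * 20, SAMPLING_HZ * 23):
    signal[i] = 1.0

short_epochs = epoch_means(signal, SAMPLING_HZ, epoch_s=1)
long_epochs = epoch_means(signal, SAMPLING_HZ, epoch_s=60)

active_short = sum(v > THRESHOLD for v in short_epochs)
active_long = sum(v > THRESHOLD for v in long_epochs)
print(active_short, active_long)  # prints 3 0
```

With 1 s epochs the burst appears as three active epochs, whereas the single 60 s epoch averages it away (mean of roughly 0.1), which is how longer epochs can miss short bouts of activity.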
Practical Implementation and Future Directions
Practical implementation faces technical expertise requirements for sensor deployment, data management and algorithm implementation. Sulla-Torres et al 16 demonstrated feasibility through user-friendly mobile applications enabling educators to input student data and automatically classify motor competence. Time efficiency represents a key advantage: Brons et al 15 reduced assessment time from 15 to 2 min per child through automated scoring, enabling population-level screening previously prohibitive in resource-limited educational settings. Consumer-grade wearables such as smart bands offer cost advantages and improved user acceptance compared to research-grade accelerometers, though the latter provide superior data quality and have undergone more extensive validation procedures.40,50 The optimal balance between cost, accuracy and feasibility likely varies across educational contexts, assessment objectives and available resources, requiring careful consideration of specific implementation goals and constraints.25,30
For clinical practice, school nurses and primary care physicians could use brief sensor-based screenings to identify children requiring comprehensive evaluations and leverage qualitative movement pattern detection for earlier identification of developmental coordination disorder.15,33 High accuracy in differentiating ADHD movement patterns suggests potential for objective treatment monitoring. 9 Integration with consumer wearables could facilitate longitudinal monitoring between well-child visits, while predictive models enable proactive risk stratification. 11 However, clinicians should recognize these as screening rather than diagnostic tools. 43
Future research priorities include: (a) external validation across diverse populations and cultural contexts to assess generalizability 25 ; (b) development of standardized data collection and processing protocols to facilitate cross-study comparisons and model sharing 32 ; (c) longitudinal designs tracking algorithm performance stability as children develop11,34; (d) investigation of hybrid approaches combining traditional methods’ interpretability with deep learning’s performance29,31; and (e) integration of multiple sensor modalities (accelerometry, GPS and heart rate) through sensor fusion techniques. 35 Establishing data governance guidelines to ensure machine learning benefits children’s health rather than enabling surveillance represents a critical ethical responsibility.
Conclusions
This systematic review demonstrates that machine learning algorithms integrated with inertial measurement units successfully assess physical activity during school hours across preschool to secondary education. Thirteen studies were identified, demonstrating moderate to high methodological quality (11-17/18 MINORS points). Machine learning algorithms (predominantly Random Forest, Support Vector Machines, Gradient Boosting and Convolutional Neural Networks) achieved accuracies ranging from 70% to 99% across diverse applications. These applications included motor competence classification, physical activity intensity prediction, sedentary behavior detection and clinical screening for conditions such as ADHD. Machine learning approaches offered substantial advantages over traditional assessment methods, including time efficiency and capacity to detect subtle movement pattern differences. However, considerable methodological heterogeneity was observed across studies regarding sampling frequencies (10-110 Hz), epoch lengths (1-60 s), sensor placements and validation protocols. Overall, the evidence indicates strong potential for wearable sensor data combined with machine learning to objectively monitor and assess school-related physical activity and motor development.
Footnotes
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
