Abstract
Background
As technology evolves, the market for wearable physical activity monitors has expanded exponentially. As the user base of activity trackers grows, ensuring their accuracy and validity becomes increasingly crucial. However, research in this field remains limited.
Methods
This study evaluated the validity and accuracy of Fitbit in measuring step count and distance during standardised treadmill walking (5.5 km/h) for 30 minutes. Comparisons were made with the gold standard of manual step counting. ActiGraph data was collected and analysed simultaneously as a comparator.
Results
Thirty college students (16 males, 14 females) participated. Fitbit demonstrated excellent agreement with manually counted steps (intraclass correlation coefficient (ICC) = 0.91, 95% CI: 0.81–0.96, p < 0.001). Overall, Fitbit underestimated steps (mean absolute percentage error (MAPE) = 3.6 ± 0.03%) and distance travelled (MAPE = 10.5 ± 0.07%). Fitbit's distance measurement was more accurate in females (MAPE = 9.4 ± 0.07%) than in males (MAPE = 11.4 ± 0.06%). A Bland-Altman plot between Fitbit and manual count presented less than 1% limit of agreement range (286.6 to −83.7). In contrast, ActiGraph lacked agreement and accuracy in step measurement in a controlled setting. Notably, gender differences may affect the accuracy of distance travelled, but not of step counts, recorded by Fitbit.
Conclusions
Our findings underscore the high validity and moderate-to-high accuracy of the Fitbit Inspire 2 in measuring step count relative to manual counting within controlled settings among young, healthy adults. However, Fitbit displayed low accuracy in measuring distance travelled compared to actual distances.
Introduction
Physical activity (PA) is vital for maintaining and improving health. Because this is widely recognised, many people try to exercise regularly and seek objective methods to verify whether their efforts are sufficient or effective. As technology evolves, the market for wearable physical activity monitors, such as Fitbit (Fitbit Inc, San Francisco, California, USA), has expanded exponentially. In 2020, Fitbit reached 31 million active users, and the number is growing.1 The trend of using wearable activity trackers has enabled unprecedented access to tracking human movement in response to the emphasis on physical activity. Step count serves as a widely accepted and accessible metric to measure daily movement. Additionally, step data usage in clinical settings has gained acceptance among researchers and practitioners as a simple, valid, and reliable method to assess habitual physical activity patterns.2,3
As the user base of activity trackers grows, ensuring their accuracy and validity becomes increasingly crucial. However, research in this field remains limited. While some studies have validated various brands’ wireless activity monitors in measuring step counts and energy expenditure,4–7 those focusing on Fitbit have often had small sample sizes and were conducted under free-living conditions, so the accuracy of their estimates may be compromised. Additional controlled studies are needed, as highlighted in a systematic review on this topic.8 Furthermore, existing validity studies are predominantly based on athletes, adults, or community-dwelling older adults, with a gap in the literature related to their accuracy for young adult groups, a significant proportion of global users.9–11
This study assessed the criterion validity of the Fitbit Inspire 2 for measuring step count and distance travelled compared to a research-grade accelerometer (the ActiGraph GT3X+) and manual step counting among young, healthy adults in a controlled setting. We also performed a gender-specific analysis because of reported differences in gait parameters and postural stability between males and females,12,13 which we hypothesised could affect measurement accuracy or provide valuable insights. The objectives of this study were therefore: (1) to assess the validity of the Fitbit Inspire 2 tracker in measuring step count compared to a wrist-worn ActiGraph GT3X+ and manual counting at a standard walking speed (5.5 km/h), both overall and by gender, among young, healthy college students; and (2) to examine the agreement in distance travelled reported by the Fitbit Inspire 2 and the treadmill at a pre-determined speed and time with no incline.
Materials and methods
Participants
Thirty participants aged between 18 and 24 volunteered to participate after signing a consent form. They were recruited between March 2024 and April 2024. Individuals were included if they were physically capable of walking for 30 minutes on a treadmill and were undergraduate students at Trinity College Dublin (TCD). Exclusion criteria comprised age under 18 or over 24 years, injuries or conditions affecting physical performance, medical conditions contraindicating exercise (e.g. uncontrolled cardiac arrhythmia), pregnancy, known allergies causing skin irritation with activity monitor use, and inappropriate clothing. The study was approved by the Trinity College Dublin School of Medicine Research Ethics Committee (Reference No: 2943). Procedures followed during the study were in accordance with the Helsinki Declaration of 1975, as revised in 1983. Participants provided written informed consent to participate in this study.
Procedure
Upon arrival at the exercise laboratory, participants were briefed on the study’s objectives and procedures. Information on gender, age, height, weight, and dominant hand was then collected and recorded. Height and weight measurements were inputted into the Fitbit tracker, while all recorded data was programmed into the ActiGraph.
Participants wore both a Fitbit and an ActiGraph on the wrist of their dominant hand as per the manufacturer’s guidelines. Under the researcher’s instruction, they walked on a treadmill set at a constant speed (5.5 km/h) for 30 minutes. Start and finish times were recorded to the nearest 10-second increment (e.g. 16:32:10 or 17:15:50) to allow accurate data extraction, and the time walked was reported to participants every 5 minutes and for the last 10 seconds. Two researchers observed and counted the participants’ steps with a digital counter (TallyCount, available at https://tallycount.app/). The number of steps recorded by the Fitbit tracker was calculated as the difference between the start and end counts. Likewise, the distance travelled recorded by the Fitbit was determined by subtracting the initial distance from the final distance. The number of steps recorded by the ActiGraph was extracted in 10-second epochs using ActiLife 6 software. All data collected was pseudo-anonymised and securely stored in encrypted Excel files on the University’s approved Sharepoint folder for analysis.
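The extraction steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual workflow: the function names, the (timestamp, steps) epoch layout, and the sample readings are all assumptions made for the example, and the study itself read the Fitbit values directly and exported epochs with ActiLife 6.

```python
from datetime import datetime, timedelta


def fitbit_delta(start_value, end_value):
    """Steps (or distance) accrued in a session: end reading minus start reading."""
    if end_value < start_value:
        raise ValueError("end reading is smaller than start reading")
    return end_value - start_value


def actigraph_steps(epochs, start, end):
    """Sum steps over 10-second epochs whose start time falls in [start, end).

    `epochs` is a hypothetical list of (epoch_start, step_count) pairs,
    e.g. parsed from an exported epoch file.
    """
    return sum(steps for t, steps in epochs if start <= t < end)


# Illustrative data only: 30 minutes of 10-second epochs with 20 steps each,
# padded with one out-of-window epoch on either side.
start = datetime(2024, 3, 1, 16, 32, 10)
end = start + timedelta(minutes=30)
epochs = [(start + timedelta(seconds=10 * i), 20) for i in range(-1, 181)]

print(fitbit_delta(12450, 16020))           # 3570
print(actigraph_steps(epochs, start, end))  # 3600
```

Recording start and finish times on 10-second boundaries means the session window lines up with whole epochs, so no epoch has to be split between walking and non-walking time.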
Statistical analysis
All statistical procedures were performed using the Statistical Package for the Social Sciences, version 27.0 (SPSS v.27.0). The significance level was set at p < 0.05. Descriptive characteristics were presented as mean (standard deviation; SD) or median (interquartile range; IQR). The Shapiro-Wilk test was used to determine data normality.
The comparative analysis involving the Fitbit tracker, ActiGraph, and manual count was performed across all participants and by gender. The intraclass correlation coefficient (ICC) was used to assess correlation and agreement in steps between the devices and manual count. An ICC value of ≥0.9 implied excellent, 0.75–0.89 good, 0.5–0.74 moderate, and <0.5 poor agreement.14 To evaluate the accuracy of Fitbit, we calculated the mean absolute percentage error (MAPE) between the devices and manual count using the formula (absolute difference/observed steps) × 100%. MAPE values were determined for both the step counts and the distance travelled recorded by the Fitbit. We defined a MAPE of ±3% as a satisfactory level of measurement accuracy based on established standards for acceptable step count accuracy in controlled settings.15 Lastly, Bland-Altman plots were used to visually represent agreement and systematic differences between step counts from the Fitbit, ActiGraph, and manual count.
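As a worked illustration of the MAPE formula and the ICC bands above, the following Python sketch may help. It assumes that "observed steps" in the formula refers to the reference (manual) count, and it treats the ICC bands as contiguous (for example, a value of 0.895 falls in "good"). The function names and sample values are hypothetical and are not study data; the study's own analysis was run in SPSS.

```python
def mape(device, reference):
    """Mean absolute percentage error of device readings against a reference."""
    if len(device) != len(reference) or not device:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(d - r) / r * 100 for d, r in zip(device, reference)]
    return sum(errors) / len(errors)


def icc_band(icc):
    """Map an ICC value to the agreement bands described in the text."""
    if icc >= 0.9:
        return "excellent"
    if icc >= 0.75:
        return "good"
    if icc >= 0.5:
        return "moderate"
    return "poor"


# Illustrative values only, not study data.
manual = [3500, 3600, 3400]
fitbit = [3395, 3492, 3298]
print(round(mape(fitbit, manual), 1))  # 3.0
print(icc_band(0.91))                  # excellent
print(icc_band(-0.13))                 # poor
```

Because each participant's error is taken as an absolute value before averaging, MAPE describes the size of the error but not its direction; under- and overestimation are assessed separately from the signed differences.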
Results
Demographic and participant characteristics.
SD = standard deviation; n = number of participants; cm = centimetres; kg = kilogram.
Distribution of step counts for ActiGraph, Fitbit, and manual count.
Distance travelled analysis.
Intraclass correlation coefficient (ICC) and mean absolute percentage error (MAPE) in step counts and distance travelled among ActiGraph, Fitbit, and manual count.
Table 4 also presents the MAPE (±SD) of step counts recorded by Fitbit and ActiGraph. Compared to manual counting, the gold standard, Fitbit demonstrated higher accuracy (MAPE = 3.6 ± 0.03%) than ActiGraph (MAPE = 18.9 ± 0.2%) for all participants. Among males, Fitbit’s step count (MAPE = 2.9 ± 0.02%) met the threshold for satisfactory accuracy relative to manual counting (MAPE ±3%). However, MAPE values were higher (11.9 ± 0.2% to 27.9 ± 0.3%) when Fitbit was compared to ActiGraph, indicating greater discrepancy in step counts between the two devices. Additionally, females displayed higher MAPE values than males in all comparisons of step counts. Regarding distance travelled, Fitbit had a MAPE value of 10.5 ± 0.07% relative to actual distance travelled, with females (MAPE = 9.4 ± 0.07%) showing greater accuracy than males (MAPE = 11.4 ± 0.06%).
Figure 1 illustrates Bland-Altman plots to help visualise measurement discord between step count measures for all participants. Results from the Bland-Altman analysis for all participants and each gender are summarised in Table 5. Results showed all plots deviate more than 100 steps from zero and lack specific scatter patterns, suggesting systematic differences between measurements. Step counts also consistently differ across all measurement values. Fitbit’s step count relative to manual count in males exhibited the least bias (101 ± 94) with a narrower range of LoA (287 to −84). In contrast, ActiGraph’s step count in females displayed a substantially higher level of measurement discord (1041 ± 541) with a wider LoA range (2101 to −19). In general, test pairs involving females demonstrated greater bias compared to those involving males.
Bland-Altman plots displaying variance in step count output between manual counting with ActiGraph (a) and Fitbit (b). Scales are consistent between graphs to aid with comparison.
Results from Bland-Altman analysis for all participants combined and by gender.
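For readers unfamiliar with how the bias and limits of agreement (LoA) relate, a short Python sketch follows. It assumes the conventional definition of the 95% LoA as bias ± 1.96 × SD of the paired differences; the text does not state the multiplier or the direction of subtraction, so both are assumptions, and the function names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman(measure_a, measure_b):
    """Return (bias, lower LoA, upper LoA) for paired measurements.

    Bias is the mean of the paired differences (a - b); the 95% limits
    of agreement are bias ± 1.96 × SD of the differences.
    """
    diffs = [a - b for a, b in zip(measure_a, measure_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)
    return bias, bias - spread, bias + spread


def limits_from_summary(bias, sd):
    """95% limits of agreement from a reported bias and SD."""
    return bias - 1.96 * sd, bias + 1.96 * sd


# Rounded summary values quoted in the text (Fitbit vs manual count, males).
lower, upper = limits_from_summary(101, 94)
print(round(lower, 1), round(upper, 1))  # -83.2 285.2
```

Applied to the rounded values quoted above, this gives roughly −83 to 285 for Fitbit in males and −19 to 2101 for ActiGraph in females. The small gap from the reported 287 to −84 is presumably because the published limits were computed from unrounded values.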
Discussion
Validity of Fitbit Inspire 2 in step measurement
Our study’s objective was to determine the validity of the Fitbit Inspire 2 tracker in measuring step count, that is, the degree to which the recorded results represent the actual measurement. Our results showed that Fitbit had high validity among college students. Firstly, step counts measured by Fitbit showed an excellent correlation (ICC = 0.91, 95% CI: 0.81–0.96, p < 0.001) with manual counting, the gold standard in a controlled setting. This outcome is consistent with recent research,16 which observed a good correlation (ICC = 0.75, 95% CI: 0.53–0.87) in 30 adults during treadmill walking. In contrast, the agreement between manual step count and ActiGraph’s step count was considerably lower, indicating poor correlation. However, it is acknowledged that ActiGraph is not the gold standard in measuring step counts17 and was designed primarily for measuring physical activity over a number of hours. Bland-Altman analysis also revealed that ActiGraph exhibited high mean biases relative to manual counting, whereas Fitbit showed more clustered results and a lower level of measurement discord. Fitbit also demonstrated validity for measuring step count with limits of agreement (LoA) less than 1%, which is consistent with another study.18 Hence, these findings further confirm the high validity of Fitbit in step count, aligning with previous research.5,19
In terms of accuracy, only the recorded step count in males (MAPE = 2.93 ± 0.02%) met the satisfactory level of measuring accuracy based on the established standard (MAPE ± 3%) in controlled settings. 15 However, it is worth noting that the MAPE of Fitbit’s step count in females (4.29 ± 0.03%) and in all participants (3.56 ± 0.03%) approached the standard and remained below ±5%. As MAPE ≤5% is indicative of good accuracy in some studies, 20 it can be argued that Fitbit achieves a relatively accurate step count.
We observed an overall underestimation in step count by Fitbit across all subgroups, which is consistent with the findings of a systematic review:15 approximately 50% of the time, Fitbit devices provided accurate measures (within ±3%) of steps in controlled testing conditions, with a general tendency to underestimate steps. The reason for this underestimation is unclear,10 but it could be related to the device’s algorithm, which may be designed to attribute a certain amount of movement to non-exercise activity thermogenesis and discount it as ‘exercise’ per se. After all, the Fitbit is designed to be used in free-living conditions over long periods of time, where arm movements are also likely to result from activities such as gesticulation or food preparation.
Regarding gender differences, the Fitbit recorded more steps in females (3615.4 ± 226.2) than in males (3416.1 ± 120.8) over the same distance. This observation aligns with expectations, as females typically have a shorter stride length due to their height. 21 Importantly, the validity of Fitbit appears unaffected by gender as both subgroups demonstrated similar correlations, LoA, and MAPE relative to manual counting.
Validity of Fitbit Inspire 2 in distance measurement
Based on the MAPE ≤3% benchmark for measuring accuracy in controlled settings,15 Fitbit exhibited poor accuracy (10.47 ± 0.07%) in distance measurement, with males (11.43 ± 0.03%) slightly less accurate than females (9.38 ± 0.07%). This observation mirrors findings from Daniel et al.,22 who noted similar inconsistencies across various wrist-based devices in distance measurement. As with step count, an overall underestimation in the distance travelled was observed. Interestingly, the mean distance recorded by Fitbit in females (2.71 ± 0.33 km) was closer to the actual distance (2.75 km) than in males (2.46 ± 0.16 km). A plausible explanation is that Fitbit calculates distance walked from step length and the number of steps taken.23 If Fitbit employs an algorithm incorporating a default step length based on height and gender,24 the number of steps taken may exert a more pronounced effect on distance measurement. Hence, Fitbit may credit females with more distance because more steps are typically recorded in this group. Nevertheless, further research is needed to understand whether other biomechanical factors may explain variation in measurements during walking.
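The arithmetic behind this explanation can be sketched as follows. The example assumes the simple model suggested above (distance = steps × step length); Fitbit's actual algorithm is not documented in this paper, so the function names and the model itself are illustrative assumptions. The step and distance figures are the group means quoted in the text.

```python
def treadmill_distance_km(speed_kmh, minutes):
    """Distance covered at constant speed with no incline."""
    return speed_kmh * minutes / 60


def estimated_distance_km(steps, step_length_m):
    """Step-based distance estimate: steps × step length."""
    return steps * step_length_m / 1000


def implied_step_length_m(distance_km, steps):
    """Step length that would reproduce a reported distance."""
    return distance_km * 1000 / steps


actual = treadmill_distance_km(5.5, 30)
print(actual)  # 2.75

# Group means quoted in the text: Fitbit steps and Fitbit distance (km).
for group, steps, dist in [("females", 3615.4, 2.71), ("males", 3416.1, 2.46)]:
    print(group, round(implied_step_length_m(dist, steps), 2))
```

Under this model the group means imply a step length of roughly 0.75 m for females and 0.72 m for males. These are ratios of group means rather than per-participant values, so they are only indicative; for the same reason, the error of the mean distance (for example, 2.71 km against 2.75 km) is much smaller than the MAPE, which averages each participant's absolute error.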
Validity of ActiGraph GT3X+ in step measurement
While ActiGraph GT3X+ is well-validated for assessing physical activity in free-living conditions,25–28 Migueles et al.8 cautioned against relying on its accuracy and reliability in estimating step count, particularly in controlled settings. Although our study does not aim to evaluate the validity of ActiGraph in controlled settings, it is worth noting that the ActiGraph could yield misleading results if used as a reference instrument to validate Fitbit, especially for step counts.
Our study found that ActiGraph exhibited poor correlation (ICC = −0.13, 95% CI: −1.38 to 0.462) and poor accuracy (MAPE = 18.92 ± 0.15%) in step count compared to manual count for all participants in controlled settings. This finding aligns with those of Webber et al.,29 who reported a MAPE of 23.2%, and Hergenroeder et al.,30 who found a MAPE of 14.1%. However, contrary to our findings, a systematic review reported good validity of ActiGraph for step counting.31 The discrepancies in findings could be attributed to differences in study settings and methodologies. Feng et al.32 emphasised the importance of customising ActiGraph placement and algorithms to optimise measurement accuracy. Studies have indicated that ActiGraph records step counts more accurately when placed at the waist rather than the wrist.33 Moreover, while our study was conducted with the default setting (30 Hz), research suggests that using the low-frequency extension (LFE) filter during slow walking (4.25 km/h) enhances step count accuracy by establishing a lower threshold to capture slower movements.34 Further investigation is needed to elucidate the extent to which these factors contribute to measurement variations. However, results from this study indicate that a high degree of caution is required when comparing Fitbit and ActiGraph step count data.
Impact and application
Strong evidence highlights the value of promoting walking, particularly focusing on obtaining 10,000 steps per day. 35 Despite a tendency to underestimate steps, our findings revealed high validity for Fitbit in step counting, which may help reinforce individuals’ trust in using Fitbit for tracking steps and promoting physical activity. This device could also be used in studies or strategies aimed at improving population health through physical activity.
Limitations
Our study has several limitations, including a relatively small sample (n = 30) and limited generalisability due to the predominantly young and healthy adult participants (aged 18–24). However, this sample was chosen due to their known regular use of activity tracking devices. The self-selected recruitment method may introduce selection bias, as the students who volunteered may not fully represent all students or young adults. The validity and accuracy of the Fitbit tracker may therefore differ across different demographic groups, activity types, and between controlled and free-living conditions. Hence, it is crucial to accurately identify the target population and the specific type of activity when interpreting and applying the findings of this study. Readers should interpret these findings with caution, especially in clinical or uncontrolled activity contexts.
Lastly, the activity tracker market is experiencing rapid growth, with manufacturers continuously releasing newer and more accurate versions of devices. This study employed the Fitbit Inspire 2 (released September 2020) and ActiGraph GT3X+ (released June 2012), both of which are still widely in use. However, newer models have since been launched, including the Fitbit Inspire 3 (released September 2022) and ActiGraph GT9X (released February 2019).36,37 To ensure the most up-to-date information reaches the public, we recommend ongoing evaluation of validity across different activity tracker models.
Conclusion
Our findings underscore the high validity and moderate-to-high accuracy of the Fitbit Inspire 2 in measuring step count relative to manual counting within controlled settings among young, healthy adults. However, Fitbit displayed low accuracy in measuring distance travelled compared to actual distances. Both step count and distance data recorded by Fitbit tended to underestimate actual values. Notably, while gender differences did not significantly impact the validity of Fitbit in step counts, they did affect distance measurements, with females exhibiting higher accuracy than males. Given the evolving nature of the activity tracker market, future research could focus on evaluating the performance of newer models to ensure ongoing accuracy and relevance.
Footnotes
Acknowledgements
We thank our colleagues and participants for their involvement in this study.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
