Abstract
Fitness bands are widely available and assist with tracking the number of steps taken. However, for older people with slow gaits, shorter step widths and/or use of ambulatory devices, the accuracy of fitness bands for step counting has not been well studied. Using four commercially available fitness bands (Garmin Vivofit2™, Fitbit Flex™, Up3™ and Microsoft Band™), we studied 30 older people with varying ambulatory abilities. We videotaped participants walking and compared the videotaped step count with the fitness band counts. For only 5 of the 30 participants were all four bands accurate to within ±20 percent of the videotaped count. There was no relationship between walking speed and the accuracy of the fitness bands. For participants using walkers or walking sticks, none of the bands met the ±20 percent criterion. Accuracy was more variable for participants using canes. Fitness band manufacturers may need to tune their algorithms for use by older people.
Introduction
The importance of physical activity in health promotion and disease prevention is well recognized, with recent evidence that physical activity may attenuate the relationship between sedentary behavior, health outcomes and cognitive decline in older adults. 1,2 The explosion in the use of fitness bands has the potential to improve physical activity in older adults, 3 and older adults report that fitness bands are useful even 8 months after initiation. 4 In one study, older adults preferred commercial fitness bands to a simple pedometer. 3
Fitness bands use accelerometers, measuring movement on the x (forward-backward), y (side to side) and z (up and down) axes. Each company has a proprietary algorithm within its device to capture the movements associated with steps and, in some cases, movement during sleep. An actigraph is a more sophisticated instrument that uses an accelerometer but may also have other sensors and generally has a more precise measuring schema, including more frequent measures. For these reasons, some researchers have compared fitness bands to actigraphy with the actigraph being the “gold standard.” 5
However, because older adults may have a slower pace and a shorter step width, the accuracy of the step counts obtained by commercial fitness bands may be in question. 6 For the purposes of this study, accuracy is defined as the agreement between the step count from a device and the step count from another measure. Evenson et al., 7 in a systematic review of 22 studies, concluded that step count accuracy in field-based studies (vs laboratory-based studies) is variable when compared to accelerometers. Dominick et al. found that Fitbit™ underestimated sedentary and light activity in a younger sample (n = 19, ages: 18–37 years) in a natural setting. 8 Huang et al. found more underestimation at slower speeds, again in a young sample (n = 40; average age: 23 years), when comparing eight devices. 9 When compared to actigraphy, Rosenberger et al. 5 reported that all nine devices they tested had ~20 percent error rates for step counting in a younger population (mean age: 36 years, 40 participants) in a natural environment.
There are also questions regarding where the fitness band should be worn for accurate step counting: Simpson et al. 10 videotaped 42 participants walking at seven different speeds while wearing a Fitbit One™. They also compared devices worn on the waist with devices worn on the ankle, finding that the ankle-worn devices were more accurate.
As there is relatively little evidence on the accuracy of the step counts in older people (age: 65 years and above) verified with videotaping, the purpose of this study was to compare fitness band step counts with actual step counts in a range of community-dwelling older people, with testing done in the field rather than in a laboratory.
Methods
Sample and setting
A total of 30 older adults were recruited from a continuing care retirement community (CCRC) in northeast Ohio. Recruitment was done in conjunction with the CCRC department of rehabilitation, which publicized the study following institutional review board (IRB)-approved approaches. The CCRC provides independent, apartment-style living extending through skilled nursing facility and hospice care to more than 600 older people.
Procedure
Four fitness bands were purchased commercially: Garmin Vivofit2™, Fitbit Flex™, Up3™ and Microsoft Band. The fitness bands were chosen based on popularity as well as differences in use. For example, the Garmin Vivofit2 did not require recharging and the Microsoft Band did not require a tablet computer or smart phone for data capture. Because older people vary in their uses of technology like tablet computers and smart phones, a deliberate decision was made to include a variety of different devices. Participants were recruited following IRB approval and scheduled for the study. First, informed consent was obtained followed by demographic and health history information.
We then completed two falls risk assessments. The first was the Missouri Alliance for Home Care (MAHC) instrument, which has been used in home health care agencies in the United States as a standardized and validated falls risk assessment. The MAHC has 10 questions and provides a multidimensional assessment covering medication use, history of falls and incontinence. A score of 4 or higher indicates a risk for falls. There is evidence of construct and predictive validity 11 with suggestions that the cut score with optimal sensitivity and specificity is 6 (68.7% and 46.9%, respectively).
The second falls risk assessment was the Stopping Elderly Accidents, Deaths and Injuries (STEADI) screening tool developed by the Centers for Disease Control and Prevention (CDC). 12 This is a three-question screening tool developed by an expert panel and designed to be evidence-based and useful in clinical practice. The three questions focus on falls in the past year, feeling unsteady when standing or walking and worry about falling. Scores range from 0 (no risk) to 3, with any score of 1 or higher considered as a risk for falls. There is no reliability reported for the STEADI, although there is evidence of concurrent validity when compared to other falls risk assessments. 12
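The scoring rules for the two instruments can be illustrated with a short sketch. The cut scores are those stated above; the function names and the example scores are hypothetical and are not study data.

```python
# Cut scores as described in the text; all names here are illustrative.
MAHC_CUT_STANDARD = 4     # a score of 4 or higher indicates a risk for falls
MAHC_CUT_ALTERNATIVE = 6  # cut score suggested in the validity evidence
STEADI_CUT = 1            # any score of 1 or higher is considered at risk


def at_risk(score: int, cut: int) -> bool:
    """Return True when the score meets or exceeds the cut score."""
    return score >= cut


def proportion_at_risk(scores: list[int], cut: int) -> float:
    """Fraction of participants flagged at risk under a given cut score."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(at_risk(s, cut) for s in scores) / len(scores)


# Hypothetical MAHC scores for five participants (not study data).
example_mahc = [2, 4, 5, 6, 8]
print(proportion_at_risk(example_mahc, MAHC_CUT_STANDARD))     # 4 of 5 flagged
print(proportion_at_risk(example_mahc, MAHC_CUT_ALTERNATIVE))  # 2 of 5 flagged
```

The same group is flagged at quite different rates depending on the cut score chosen, which is why both MAHC cut scores are reported in the results.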
Using a table of random numbers from random.org, we randomly assigned the four fitness bands to the wrists of the participants, two on each wrist. The participants then walked at their normal pace on an indoor walking track while being video recorded. The step count shown on each device was recorded at the beginning and at the end of the walking period. The participants were requested to walk 240 feet. Following the walking portion of the study, the videos were independently reviewed and coded by two research assistants for the step count.
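As an illustration only, a random assignment of four bands to two wrists might be sketched as follows. The study used a table of random numbers from random.org; this sketch substitutes Python's pseudo-random generator as a stand-in, and the function name and seed are hypothetical.

```python
import random

BANDS = ["Garmin Vivofit2", "Fitbit Flex", "Up3", "Microsoft Band"]


def assign_bands(rng: random.Random) -> dict[str, list[str]]:
    """Shuffle the four bands and place two on each wrist."""
    order = rng.sample(BANDS, k=len(BANDS))  # random permutation, no repeats
    return {"left": order[:2], "right": order[2:]}


rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
assignment = assign_bands(rng)
print(assignment)
```

Each participant would receive a fresh call to `assign_bands`, so that no band is systematically worn on the same wrist.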
Analytically, following Rosenberger et al., 5 we used a ±20 percent accuracy criterion: a device was considered accurate when its step count was within ±20 percent of the videotaped step count. We also used Bland Altman plots to examine the agreement between the videotaped step count and the step counts obtained from the four devices. 13 Bland Altman plots are widely used to examine whether there is a systematic difference between two measures and whether the differences change with the magnitude of the measures (e.g. are there larger differences with higher step counts?). We wanted to determine whether there were differences based on the numbers of steps, as perhaps the fitness bands were more accurate with higher step counts.
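The ±20 percent criterion can be written out as a small calculation. This is a minimal sketch: the function names are hypothetical, the counts are invented for illustration, and treating a count exactly at 20 percent as accurate is an assumption, since the text does not say how the boundary was handled.

```python
def percent_error(device_steps: int, video_steps: int) -> float:
    """Signed error of the device count relative to the video count, in percent."""
    if video_steps <= 0:
        raise ValueError("video_steps must be positive")
    return 100.0 * (device_steps - video_steps) / video_steps


def within_tolerance(device_steps: int, video_steps: int, tol_pct: float = 20.0) -> bool:
    """True when the device count is within +/- tol_pct of the video count.

    Assumption: the boundary itself (exactly tol_pct) counts as accurate.
    """
    return abs(percent_error(device_steps, video_steps)) <= tol_pct


# Hypothetical counts for one participant (not study data).
video = 100
print(within_tolerance(85, video))  # 15 percent under-count: inside the range
print(within_tolerance(70, video))  # 30 percent under-count: outside the range
```

A negative value from `percent_error` corresponds to an under-count by the band.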
Results
There were 30 older adults who participated. The older adults were, on average, 80.6 years old (standard deviation (SD) = 7.66); most (n = 18; 60%) were female and all were White, consistent with the demographics of the CCRC. Participants had a wide range of diagnoses including post-cerebrovascular accident, post-hip or knee replacement, arthritis, hypertension, cardiac disease and others. Assistive ambulatory devices were used by 11 participants including walking sticks (n = 1), walker (n = 3), wheeled walker (n = 3) and canes (n = 4).
The average score on the MAHC was 4.33 (SD = 1.67) with 19 (63.3%) of the participants scoring 4 or higher. The range was from 2 to 8. With a MAHC cut score of 6, seven (23%) were designated at risk for falls. The mean score on the STEADI was 1.18 (SD = 0.84) with a range from 0 to 3; 24 (80%) of participants scored 1 or higher, indicating a risk for falls. The correlation between the two instruments was 0.14. The correlation of the step counts between the video raters was 0.99.
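For readers unfamiliar with the statistic, a correlation between two raters' step counts could be computed as sketched below. The text does not state which correlation coefficient was used, so Pearson's r is an assumption here, and the rater counts are invented for illustration.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical step counts coded by two raters (not study data).
rater_a = [100, 150, 200, 250]
rater_b = [101, 149, 202, 250]
print(round(pearson_r(rater_a, rater_b), 3))
```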
A total of 10 participants walked one 120-foot lap, one participant walked 1.5 laps and the remainder walked two laps. The correlation between the distance and speed of walking was 0.29. There was a device failure for the Fitbit Flex on one test day, when the fitness band failed to sync with the tablet computer, resulting in 25 observations for the Fitbit Flex. There was a device failure with the Microsoft Band for one participant, where the step count did not change during the walking episode (29 observations captured). There were three device failures for the Up3 (27 observations captured), again with a failure to sync with the tablet computer.
All four bands met the ±20 percent accuracy range compared to the verified step count by video for only five participants. By band, the Garmin Vivofit was accurate for 12 participants, the Fitbit Flex for 12 participants, the Microsoft Band for 12 participants and the Up3 for 11 participants.
We then examined the five participants for whom all four bands met the ±20 percent accuracy parameters, and there was no association between accuracy and walking time (50–109 s for the five participants).
By assistive device, those using walkers had no bands within the ±20 percent parameters, regardless of whether the walker was wheeled or not. The participant using walking sticks also had no bands within the ±20 percent parameters. There was more variety with canes: for one participant, one band was within the ±20 percent parameters (Microsoft Band); for two participants, the Fitbit Flex and Microsoft Band were within the ±20 percent parameters; and for one participant, all four bands were within the ±20 percent parameters. The least and most accurate participants had the fastest paces (69 and 84 s), while the others were 119 and 123 s. Thus, there appeared to be no impact from the pace on the accuracy for those with canes.
We then undertook Bland Altman analysis. 13 For the one-sample t-test, there were no significant differences. We then plotted the difference score by the mean score and found that there were proportional biases for all of the fitness bands; the differences were predominantly in the lower direction, suggesting under-counting by the bands (see Figures 1 to 4). Finally, we undertook linear regression with the difference score as the dependent variable and the mean as the independent variable. Each of the linear regressions had statistically significant coefficients, indicating proportional bias for all four of the bands.
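The steps of a Bland Altman analysis can be sketched as follows. This is an illustrative outline, not the analysis code used in the study: the function name, the device-minus-video sign convention, the conventional 1.96 SD limits of agreement and the example counts are all assumptions. The sketch returns the t statistic and the regression slope only; obtaining p-values would require a t distribution from a statistics package.

```python
import math
import statistics


def bland_altman(video: list[float], device: list[float]) -> dict[str, float]:
    """Summarize agreement between paired step counts."""
    if len(video) != len(device) or len(video) < 3:
        raise ValueError("need at least three paired observations")
    diffs = [d - v for v, d in zip(video, device)]   # device minus video
    means = [(d + v) / 2 for v, d in zip(video, device)]
    n = len(diffs)
    bias = statistics.mean(diffs)   # mean difference (fixed bias)
    sd = statistics.stdev(diffs)    # sample SD of the differences
    # One-sample t statistic for H0: mean difference = 0 (df = n - 1).
    t_stat = bias / (sd / math.sqrt(n)) if sd > 0 else float("nan")
    # Least-squares slope of difference on mean: a non-zero slope
    # suggests proportional bias.
    mean_of_means = statistics.mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    if sxx == 0:
        raise ValueError("all pair means are identical; slope is undefined")
    slope = sum((m - mean_of_means) * (d - bias)
                for m, d in zip(means, diffs)) / sxx
    return {
        "bias": bias,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
        "t_stat": t_stat,
        "slope": slope,
        "intercept": bias - slope * mean_of_means,
    }


# Hypothetical paired counts (not study data): the band under-counts,
# and the shortfall grows with the number of steps.
video_counts = [100, 150, 200, 250]
band_counts = [95, 135, 170, 205]
result = bland_altman(video_counts, band_counts)
print(result["bias"], result["slope"])
```

With this sign convention, a negative bias indicates under-counting by the band, and a negative slope indicates that the under-count grows as the step count increases.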

Figure 1. Bland Altman plot: Garmin Vivofit™.

Figure 2. Bland Altman plot: Fitbit Flex™.

Figure 3. Bland Altman plot: Microsoft Band™.

Figure 4. Bland Altman plot: Up3™.
Discussion
In summary, for only 5 of the 30 participants were all four bands accurate to within ±20 percent. There was no relationship between walking speed and accuracy. For participants using walkers or walking sticks, none of the bands met the ±20 percent criterion. Results for canes were more variable, with some bands meeting the ±20 percent criterion for two of the participants, but this, again, was not related to the time it took to walk the prescribed distance.
The finding that the devices did not accurately capture steps for persons using walkers is not surprising since the accelerometers used capture the motion of the wrists. As Simpson et al. 10 found, it may be preferable for persons using walkers to wear the devices on their ankles. Of note, however, there is nothing in the consumer materials for these fitness bands to indicate that persons with walkers may not want to wear the devices on their wrists.
Our findings are consistent with other field-based studies using younger participants 7,9 that found an under-count and also consistent with Rosenberger et al. 5 who found commercial bands were less accurate than actigraphy. Our results are not consistent with Storti et al. 6 who found accurate step counts within an older group; however, Storti et al. used different devices, which may explain the inconsistency in the findings.
The question arising from this study is the extent to which accuracy matters for step counting. This issue has not been addressed sufficiently in the scientific literature. However, the finding that the fitness bands in this study under-counted the steps suggests that the manufacturers may want to consider tailoring the devices for the older consumer or for consumers with slower paces.
A secondary finding concerns the two falls risk assessments, both of which indicate that a high proportion of these community-dwelling older people are at risk for falls. In the case of STEADI, 80 percent of the participants were identified at risk for falls, whereas the MAHC identified 63 percent at risk with a cut score of 4 and 23 percent with a cut score of 6. Because of the seriousness of falls, prospective research is needed on how well both instruments predict them. As well, the implications of telling an older person that he or she is at risk for falls when that may not actually be the case are unknown.
Future research efforts should focus on the accuracy of the commercial devices placed on the ankle for persons with walkers, both wheeled and non-wheeled, and those who use canes for accurate capture of steps. While we did not find any patterns between accuracy and speed of walking, we had a wide range of speeds, reflecting the heterogeneity of walking abilities in older people. As well, the impact of under-counted steps bears examination—does under-counting of steps in older people lead to them taking more steps to hit a benchmark or do they use the devices to attain and track their activity without benchmarking? The risk of over-doing physical activity is also unknown: do older people who use devices that under-count steps increase their step numbers and sustain injury or develop pain?
Our limitations are those of field studies. First, while we asked the participants to walk two laps, some were unable to do so. Thus, we had variability in the distances. Because this reflects the differences in abilities of older people, we consider this an acceptable limitation. Second, because we randomly assigned the devices to the wrists, regardless of dominant hand, we did not follow the manufacturers’ suggestions for how to apply the devices.
Conclusion
Physical activity is as important for older people as for other age groups. However, this study provides evidence that fitness bands used as step counters under-count steps in older people. The impact of under-counting is not clear, and future research is needed on the use of these devices by older people. Manufacturers of the devices may also need to consider fine-tuning their algorithms.
Footnotes
Acknowledgements
The author wishes to acknowledge David Schell, Miriam Pekarek, Lisa Mansour and Debora Erksa and the residents of Ohio Living Breckenridge Village.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
