Abstract
Background
Walking patterns in stroke survivors are highly heterogeneous, which poses a challenge in systematizing treatment prescriptions for walking rehabilitation interventions.
Objectives
We used bilateral spatiotemporal and force data during walking to create a multi-site research sample to: (1) identify clusters of walking behaviors in people post-stroke and neurotypical controls and (2) determine the generalizability of these walking clusters across different research sites. We hypothesized that participants post-stroke will have different walking impairments resulting in different clusters of walking behaviors, which are also different from control participants.
Methods
We gathered data from 81 post-stroke participants across 4 research sites and collected data from 31 control participants. Using sparse K-means clustering, we identified walking clusters based on 17 spatiotemporal and force variables. We analyzed the biomechanical features within each cluster to characterize cluster-specific walking behaviors. We also assessed the generalizability of the clusters using a leave-one-out approach.
Results
We identified 4 stroke clusters: a fast and asymmetric cluster, a moderate speed and asymmetric cluster, a slow cluster with frontal plane force asymmetries, and a slow and symmetric cluster. We also identified a moderate speed and symmetric gait cluster composed of controls and participants post-stroke. The moderate speed and asymmetric stroke cluster did not generalize across sites.
Conclusions
Although post-stroke walking patterns are heterogenous, these patterns can be systematically classified into distinct clusters based on spatiotemporal and force data. Future interventions could target the key features that characterize each cluster to increase the efficacy of interventions to improve mobility in people post-stroke.
Introduction
Walking patterns differ between individuals to the point that we can recognize people by how they walk. 1 These individual differences are more marked after a neurologic injury, such as stroke, 2 due to heterogeneity in stroke lesion type, size, location, and differences in recovery.3-6 These individual differences in walking patterns make systematizing treatment prescription for walking rehabilitation interventions a difficult clinical endeavor. The heterogeneity in walking patterns has been acknowledged by Knutsson and Richards 4 as early as 1979, stating that “therapy and training in the hemiparetic patient should preferably be adapted to the disturbance in each individual case.” In this study, 4 researchers used paretic electromyography (EMG) and walking kinematics to qualitatively identify 3 subgroups of abnormal muscle activation during walking post-stroke—early triceps surae activation, decreased activation of paretic musculature, and paretic muscle coactivation. Similarly, Olney and Richards 7 qualitatively identified different subgroups of walking impairments in the spatiotemporal, kinematic, and kinetic domains. Also, a seminal study by our co-authors* investigated paretic leg kinematics and used hierarchical clustering to identify 4 clusters of walking behaviors, distinguished by walking speed and paretic knee flexion angles at different points during the gait cycle. 5 While these studies identified individual differences in paretic leg function during walking, walking patterns post-stroke are defined not only by paretic leg function but also by non-paretic leg function. 8
To characterize functional differences between the paretic and non-paretic legs during walking, recent studies have used asymmetry measures in kinematics9-14 or forces.11,15-18 These asymmetry measures are compared to asymmetries observed in neurotypical controls to classify individuals post-stroke into subgroups with higher, comparable, or lower asymmetries than control participants.9-19 However, individuals might be classified as symmetric in 1 feature, for example step lengths, while remaining asymmetric in other features, for example joint kinematics. 19 Therefore, we reasoned that using an approach that allowed simultaneously assessing multiple measures of walking, we can quantitatively identify distinct subgroups of walking behaviors beyond those identified via a single asymmetry measure, which could inform more specific clinical rehabilitation targets.
Identifying subgroups of walking behaviors requires a heterogeneous sample that encompasses the different combinations of gait deviations observed in survivors of stroke. Achieving this heterogeneity necessitates a large sample size. However, studies in walking post-stroke typically collect small, single site samples: for example, a systematic review of 46 studies in walking post-stroke reported sample sizes between 8 and 39 participants 20 that lack the heterogeneity of walking behaviors observed post-stroke. Additionally, activity levels, socioeconomic, and ethnic disparities in post-stroke care across geographical locations influence the walking patterns of the samples engaged in research at different sites.21-25 Therefore, combining data across different sites increases sample size and heterogeneity of behaviors measured in research studies, which ultimately can improve the generalizability of research findings to the overall post-stroke population.
Here, we obtained measures derived from ground reaction forces (GRFs), which is the simplest data collected across research labs that use instrumented treadmills, to generate a multi-site sample and used a data-driven approach to identify subgroups of walking behaviors in people post-stroke and controls. We used sparse K-means clustering to obtain a subset of features that define walking clusters, and determined whether different levels of function and impairment distinguish these walking clusters, such that they are indicative of different walking subgroups. We determined whether the observed walking clusters are generalizable across research sites. We hypothesized that participants post-stroke will have different walking features resulting in different clusters of walking behaviors, which are also different from control participants.5,14,26 Our results could provide the basis for designing and testing targeted interventions aimed at improving walking quality in people post-stroke.
Methods
Data Curation
The lack of standardized protocols for collecting, processing, and analyzing walking data can limit researchers’ ability to combine walking data. 27 However, bilateral GRFs measured using instrumented treadmills are commonly collected across research sites. These GRFs result from muscles generating forces at each segment which then is applied to the ground during walking,28,29 providing insight into how each extremity contributes to the main objectives of walking, defined as shock absorption, stance stability, and forward progression. 29 Additionally, these GRFs can be used to derive spatiotemporal walking metrics such as step lengths, step times, and speed. 30 Thus, we gathered GRF data from individuals with chronic hemiparetic stroke walking at their self-selected speed. We gathered data from: Rancho Los Amigos (Rancho, N = 7 31 ), collected from an overground force plate at 2500 Hz (Kistler Instrument Corp., Amherst, NY, USA) for the paretic extremity only. University of Southern California (USC2018 N = 22 9 and USC2021 N = 23, 12 the first author reviewed all data to ensure no duplicates between studies): participants walked for 3 minutes on a Bertec Instrumented Treadmill (Columbus, OH, USA) that measured GRFs at 1000 Hz. Kennedy Krieger Institute/Johns Hopkins University School of Medicine (JHU, N = 10 19 ): participants walked for 5 minutes on a Motek Instrumented Treadmill (Amsterdam, The Netherlands) that recorded forces at 1000 Hz. Emory University (Em, N = 9 32 ): participants walked for 1 minute on a Bertec Instrumented Treadmill that measured GRFs at 1000 Hz. University of Pittsburgh (Pitt, N = 2133-35) participants walked between 150 and 320 strides on a Bertec Instrumented Treadmill that measured GRFs at 1000 Hz (Figure 1A). We received GRF data for each gait cycle, normalized to 100 points per gait cycle, filtered with a fourth order lowpass Butterworth filter with a 15 Hz cut-off frequency for JHU and with a 20 Hz cutoff for Pitt. Data from Emory were shared as raw data. We filtered data from Em and USC using a 20 Hz cut-off low-pass zero-lag digital Butterworth filter.

Data consolidation and quality assessments and demographics for the final sample. (A) Pipeline for participant inclusion. N indicates the sample size at each point for the total sample (bold) and the sample from each research site. The final sample size was N = 112 participants. (B) Time post-stroke onset in months across all research sites. Participants from Pitt were more chronic than those from USC2021 (P = .033). (C) Post-stroke participant lower extremity Fugl-Meyer score. The maximum score is 34 points. Participants from Pitt were less impaired than those from Em and USC. (D) Treadmill self-selected walking speed for all participants. Control self-selected speed was significantly higher than the average speed in the stroke group (P < .001). Control participants also walked at speed matched to that of a stroke participant of the same age and sex (control matched). Participants from JHU walked at a significantly greater speed than those from Em and both samples from USC (P < .050).
We also collected data for 32 age and gender-matched neurotypical control participants using a Bertec Instrumented Treadmill. We used the control data in the definition of the different walking clusters, to determine whether individuals post-stroke could be comparable to neurotypical adults walking at the reduced speed of people post-stroke, such that controls and participants post-stroke would belong to the same clusters. Control participants were age and sex matched to the participants in the USC2021 study. 12
Participants post-stroke held onto a front handrail during walking in all studies except USC2021 12 and Pitt.33-35 Control participants did not hold onto a handrail. The respective IRB approved all studies, and all participants provided written informed consent before testing. Data collection for neurotypical control participants was approved by the USC IRB number HS-19-00430. Data gathering was approved by the USC IRB number HS-19-00075. IRB-approved Data Use Agreements (DUA) were established between USC and each research institution. Data from Rancho were excluded as they were collected overground; all DUAs and data sharing manuals were first implemented with Rancho, which was vital in setting up this project. We accumulated GRF data for 92 participants post-stroke (Figure 1A). We compiled information about participant age, sex, time post-stroke in months, paresis, mass, treadmill walking speed, and lower extremity Fugl-Meyer score 36 (out of 34 points for the motor scale; Table 1).
Participant Demographics.
Descriptive statistics are presented as average ± standard deviation with the range in brackets.
Abbreviations: F, female; M: males; L: left; R: right; SS: self-selected.
Significant differences between participants post-stroke and controls (P < .050).
P < .050 significant difference in the frequency of males compared to females in the study sample.
Data Processing
From the GRF data collected across labs, we used custom code and derived 17 walking variables common across all labs. Variable definitions are presented in Table 2. We obtained averages across strides for all variables for each participant. Data processing was done using custom code written in Matlab 2021b (The MathWorks, Natick, MA, USA).
Group-Level Averages for Gait Variables in Participants Post-Stroke and Control Participants Walking at Matched Speeds.
Variables in bold were included in sparse clustering analyses. t-Tests corrected for multiple comparisons using the Benjamini and Hochberg false discovery rate (FDR). 37
Significant differences between the paretic and non-paretic extremity of post-stroke participants.
Significant differences between participants post-stroke and controls.
For data collected in neurotypical adults, leg dominance was defined as the leg they would use to kick a ball, which was the right leg for all participants. We compared the non-dominant leg of control participants to the paretic leg and the dominant leg to the non-paretic leg.
Stride-by-stride data in a subset of participants post-stroke and controls are available for download from the Stroke Initiative for Gait Data Evaluation database 38 hosted by the Archive of Data on Disability to Enable Policy and Research.
Statistical Analyses
Statistical analyses were done in RStudio, R version 4.1.2. Since multiple researchers collected data, identifying potential acquisition issues and quality assessment was done post hoc. To remove noisy data and outliers, we quantified the mean and standard deviation of all variables in our sample, including both participants post-stroke and controls and removed participants with datapoints outside of the mean ± 3 × standard deviation range (Supplemental Table 1). The final sample comprised 81 participants post-stroke and 31 controls for 112 participants (Figure 1A). For all analyses comparing participants post-stroke with neurotypical controls, we used data with neurotypical control participants walking at a matched speed to participants post-stroke.
Given that many walking variables are correlated due to the inherent coordination found in the walking pattern, the clusters found in the data can be accounted for by a small subset of features. Thus, we used sparse K-means clustering analyses, 39 which uses a Lasso penalty 40 to derive a sparse subset of features with non-zero predictors. We z-scored all variables before analyses, and set K = 5 clusters in agreement with prior work identifying 4 subgroups of participants post-stroke, 5 and an additional control group. We verified that K = 5 provided clusters that maximized the between clusters distance via the Krzanowski and Lai index. 41 Using the sparcl package in R, we chose the tuning parameter for the Lasso penalty, 40 which determines the number of non-zero predictors to use in our analyses that maximizes between cluster variance and minimizes within cluster variance. We ran 100 permutations in a search space between 1 and the square root of the number of candidate variables included in the sparse clustering analyses, that is, sqrt(17) and set the number of random starts to 100 to avoid finding a local minimum.
We assessed stability of the clusters identified in this study via the Jaccard similarity index.42-44 We resampled the 112 participants with replacement via bootstrap to obtain 10 000 new samples, and identified K = 5 clusters for each bootstrap iteration. We then measured the proportion of observations consistently assigned to the same cluster over each iteration; this proportion is the Jaccard index. A Jaccard below 0.6 indicates that the clusters are unstable, a Jaccard between 0.6 and 0.75 shows moderately stable clusters, between 0.75 and 0.85 stable clusters, and a Jaccard above 0.85 indicates highly stable clusters.
We used linear models with cluster as a categorical fixed effect to determine whether there were significant differences across clusters in participants’ biomechanical features, as well as participant demographics and Fugl-Meyer scores. We performed multiple comparisons using Tukey–Kramer adjusted critical values. We used the results from these multiple comparisons to characterize the cluster-specific walking behaviors, by identifying which features were significantly different across clusters. Within each cluster, we used t-tests with FDR corrected P-values for all spatiotemporal and force variables, to compare paretic values relative to non-paretic values to determine asymmetries in each cluster.
We assessed generalizability of each research site by determining the number of participants from each site that were included in each cluster. To assess whether the observed clusters can be generalized, we performed a leave-one-out sparse K-means clustering approach, leaving out data from each site simultaneously (Em, Pitt, JHU, USC2018, and USC2021). We mapped the clusters obtained in each iteration of the leave-one-out approach to the clusters using the full dataset. We interpret cluster generalizability based on its Jaccard 44 and the similarity of the descriptive walking and impairment measures within each cluster.
Results
Our final sample was composed of 112 participants (Figure 1; Tables 1 and 2; Supplemental Figure 1). At a group level, participants post-stroke showed significant differences in paretic swing (P = .007) and stance times (P = .005), medial and lateral GRFs (P < .001), braking (P = .032), and propulsion (P < .001) compared to the non-paretic extremity. Chronicity (P = .024), impairment measured via FM score (P = .001), and walking speed (P < .001) differed across sites (Figure 1B-D).
Sparse K-means clustering identified 8/17 features with non-zero weights from which 5 distinct clusters could be obtained (Figure 2, Supplemental Figures 2–5). The Krzanowski and Lai index (Supplemental Figure 2) was maximal for K = 5 clusters. Analyses of the clusters obtained showed different walking behaviors from those observed at a group level. The variables that maximized between cluster variance and minimized within cluster variance, as shown by the non-zero weights returned by the Lasso analyses, were paretic and non-paretic stance times, non-paretic propulsion, speed, paretic step length, paretic and non-paretic braking, and non-paretic step length (Figure 2). The data for the paretic extremity was combined with the non-dominant extremity in controls, and the data for the non-paretic extremity was combined with the dominant extremity in controls. We describe the obtained clusters next:

The sparse variables make up K = 5 clusters with distinct walking speeds and impairments. (A) Weights for each of the 17 candidate features for clustering. About 8/17 features had non-zero weights and were used in clustering analyses. (B) Individual observations within each cluster, plotted in discriminant component space, colored by cluster. Cluster 1 comprised 10 participants; 4 from JHU, 3 from Pitt, 2 from USC2018, and 1 from USC2021. Cluster 2, 22 participants; 11 controls, 4 from Pitt, 1 from USC2018, and 6 from USC2021. Cluster 3, 34 participants; 11 control participants, 1 from Em, 4 from JHU, 4 from Pitt, 7 from USC2018, and 7 from USC2021. Cluster 4 had 16 participants; 3 controls, 3 from Em, 5 from Pitt, 3 from USC2018, and 3 from USC2021. Cluster 5 had 30 participants; 6 controls, 4 from Em, 2 from JHU, 3 from Pitt, 7 from USC2018, and 7 from USC2021. (C) Participant speed (for participants post-stroke and controls) and impairment (for stroke participants) were measured via FM score across the different clusters. Only speed was used in the cluster definition. Solid horizontal lines indicate post-hoc significant differences between clusters (P < .050). The dashed horizontal line indicates group-level average speed across all participants and average Fugl-Meyer (FM) scores for all stroke participants compared with the cluster-level speed and FM. P, paretic; NP, non-paretic; St, stance; Sw, swing; Lat, lateral; Prop, propulsion; GRF, ground reaction force. (D) Participant impairment (for stroke participants).

Gait features for participants post-stroke and controls within each cluster compared to all other participants outside each cluster for a subset of sparse variables used in K-means clustering. C1, . . . C5 indicates participants included in the respective cluster number, and ~C1, . . . ~C5 indicates all participants not included in the respective cluster. Stance and swing times are expressed as a percentage of stride duration to account for differences in walking speed. Cluster 1: participants post-stroke with a fast-walking speed and asymmetric propulsion; Cluster 2: participants post-stroke and controls with moderate speed, short stance times, low propulsion, and symmetric steps. Cluster 3: participants post-stroke and controls with moderate speed, short stance times, and asymmetric forces; Cluster 4: participants post-stroke with a slow speed and frontal plane force asymmetries; and Cluster 5: post-stroke participants who walked slowly and symmetrically, with short swing times.
Stance times were a shorter percentage of the stride sample compared to all other clusters (Figure 3). Forces were highly asymmetric both for participants post-stroke and control participants in this cluster. The non-paretic peak lateral GRF was 62% greater than the paretic lateral GRF (P < .001), non-paretic propulsion was 62% greater than paretic propulsion (P < .001), and paretic braking was 28% greater than non-paretic braking (P = .001).

Stability and generalizability of clusters assessed via leave 1 out approach. (A) Jaccard stability index for each cluster for the entire dataset (horizontal black lines) and leaving out each sample. Colors indicate the sample left out to assess the Jaccard. The dashed gray line is the line above which clusters are considered stable. The Jaccard index is the proportion of observations consistently assigned to the same cluster over the bootstrap iterations. A Jaccard below 0.6 indicates that the clusters are unstable, a Jaccard between 0.6 and 0.75 shows moderately stable clusters, 0.75 and 0.85 stable clusters, and a Jaccard above 0.85 indicates highly stable clusters. (B) Average and standard deviation of the speed for each cluster for the entire dataset in black and the leave 1 out approach with colors indicating the sample left out from analyses.
Discussion
We identified 5 clusters of walking behaviors in a combined sample of people post-stroke and controls. Contrary to our hypothesis, we did not identify a cluster composed exclusively of control individuals: we identified a cluster with an equal number of control participants and participants post-stroke, the latter of whom seemed to have reduced walking impairment. We also identified clusters with mostly participants post-stroke and a few controls, which points to the fact that slow walking speeds can lead to aberrant gait patterns even in healthy controls. 46 The clusters obtained in our study point at different levels of function and impairment defining each cluster, which correspond to different walking subgroups post-stroke. We assessed the generalizability of clusters and observed that 4/5 clusters were generalizable across research sites. These clusters were: Cluster 1: a cluster of post-stroke participants with fast walking speed and asymmetric propulsion; Cluster 2: a cluster of controls and stroke participants walking with moderate speed and symmetric steps, with apparent impairments due to reduced speed, such as short stance times, low propulsion; Cluster 4: a cluster of participants post-stroke walking with a slow speed and asymmetric medial and lateral GRFs; and Cluster 5: a cluster of post-stroke participants who walked with a slow speed, short swing times and slow but symmetric steps. It may be possible to develop more personalized intervention targets by considering the cluster to which a given patient is assigned. For example, a post-stroke participant in the fast cluster can benefit from an intervention to increase paretic propulsion17,47-49; a post-stroke participant in the moderate symmetric cluster can benefit from an intervention to increase speed13,50-52; a post-stroke participant in the slow, frontal asymmetry cluster can benefit from interventions to reduce asymmetries in medial and lateral GRFs 53 ; and a post-stroke participant in the slow symmetric cluster can benefit from increasing speed, 54 swing times, and step lengths. The effects of these targeted interventions are still unknown, and future work will determine whether addressing cluster specific impairments will lead to a shift toward a control cluster, or a shift toward a different stroke cluster. Taken together, our results provide a preliminary cross-sectional analyses to estimate which subject-specific walking-related variables could be targeted in interventions to improve mobility and promote neuroplasticity in the long term. 55
Our findings complement those of our co-authors who showed speed to be the greatest determinant in allocating participants to specific clusters, 5 and previous work using speed alone to classify participants to ambulation categories. 6 Similar to Mulroy, 2003, we found 1 fast cluster of participants post-stroke, 1 moderate speed cluster, and 2 slow clusters. A recent study used paretic and non-paretic kinematic data in 36 stroke survivors and identified 6 distinct walking clusters 56 based on range of motion for the paretic and non-paretic side. Like our results, Kim et al. observed different types of impairment associated with the different clusters. A potential limitation of these 2 studies is that they comprise single-site datasets, which might be biased by geographical affordances. We complement the clusters described by Mulroy by providing insights into both paretic and non-paretic spatiotemporal characteristics and forces generated by each limb. One of our slow clusters had asymmetric mediolateral forces (Cluster 4), which corresponds to the slow extended cluster in Mulroy 2003: this cluster in Mulroy 2003 showed knee hyperextension in mid-stance, which limits pre-swing and swing knee flexion and toe clearance, leading to frontal plane compensations and asymmetric frontal plane forces, as observed in our study. Our other slow cluster, Cluster 5 also corresponds to the very slow velocity and excessive knee flexion cluster from Mulroy 2003. Thus, comparison of our clusters to those previously reported show consistencies of subgroups of walking patterns, and provide additional insights into non-paretic function in these subgroups.
The variables that had the largest effect on the between cluster variance, were paretic and non-paretic stance times, non-paretic propulsion, speed, paretic step length, paretic and non-paretic braking, and non-paretic step length. It is worth noting that none of these variables on their own are sufficient to identify the clusters that we observed, similar to what was concluded by Mulroy, 2003. For example, speed ranges overlapped for participants in Clusters 2 and 3 and for participants in Clusters 4 and 5, yet different stance times and forces were observed between clusters despite overlapping speeds, indicating that speed alone was not the only factor driving the between cluster differences. Thus, here we show that while speed influences many gait features, speed alone is not enough to classify post-stroke individuals in more specific subgroups of impairment.
Cluster 3, the moderate speed and asymmetric cluster was not stable or generalizable, consistent with the fact that the common impairments of stroke participants, and how they compare to control participants within Cluster 3 was less clear. For example, participants in Cluster 3 had highly asymmetric forces between the paretic and non-paretic extremity whereas control participants within the cluster were not asymmetric. Similarly, at a group level, participants within Cluster 3 were the only ones to show marked step length asymmetry with longer paretic steps. It could be the case that Cluster 3 is composed of participants for whom the measured spatiotemporal and kinetic variables included in analyses does not account for variability in the data. Future work will aim to use joint level kinematic, kinetic, or EMG measures to determine whether these more specific measures can detect evident differences in walking patterns in these individuals.
We observed differences in cluster specific speed and FM scores compared to group level averages in these variables. For example, the average FM score for all participants was 24.6 ± 5.6 points, yet as seen in Figure 2D, this average value does not align with any of the cluster-specific averages. We also observed that similar levels of impairment measured via overlapping FM scores between clusters could be associated with vastly different walking speeds: participants in Cluster 1 walked significantly faster than those in Clusters 2 and 3, despite no differences in FM scores. Clinically this might imply that participants in Clusters 2 and 3 have the capacity to walk at faster speeds, given their impairment. Note that these differences in behavior and capacity between clusters could inform clinical practice beyond group level averages.
Interestingly, our findings show that some commonly reported spatiotemporal impairments post-stroke are observed in only a subset of post-stroke participants.8,28 For example, we observed asymmetric stance times only for participants in the fast cluster, and asymmetries in step lengths only in Cluster 3, 11 as well as varying degrees of paretic propulsion that were not uniformly associated with walking speed or impairment. 57 Finally, and surprisingly, we did not observe any asymmetries in the peak vertical GRF among our participants. Thus, our results show in a sample of 81 participants post-stroke that spatiotemporal asymmetries are less pronounced than what has been shown in the seminal literature using smaller samples.
We observed significant differences in participant impairment and function across research sites. Consistent with prior literature, control participants’ self-selected walking speed was faster than post-stroke participants’. 7 Similarly, participants from Pitt had lower impairment compared to Emory and USC2018, while participants from JHU walked at faster speeds than those from Em, and USC. Additional information about activity levels and access to post-stroke care across geographical locations could provide information about the causes of the differences in impairment and function between sites. Given these differences, we assessed generalizability by assessing cluster stability when removing each experimental sample from our dataset. This approach assessed both generalizability of the cluster such that its definition did not depend on the experimental samples, as well as generalizability of the experimental samples such that if a sample needed its own cluster, it would imply that participants in that sample are distinct from all other participants. We did not find a cluster of participants post-stroke from a single research site. Additionally, 4/5 clusters were generalizable across research sites. However, we did find that some research sites did not contribute to specific clusters. For example, participants from JHU were not part of Cluster 4, and participants from Em were not part of the fast cluster, consistent with the significant differences in walking speeds observed between both samples. These results confirm that some samples may not encompass all different walking behaviors observed after stroke.
Limitations
There are multiple limitations in our study. First, this study used variables derived from gait analyses to identify clusters indicative of subgroups of walking impairment. The variables used in this study are easy to capture via GRFs and represent global measures of walking that relate to the coordinated action of multiple segments and joints. However, joint-level kinematic and kinetic measures might capture impairments that cannot be inferred from GRF data alone. Future research efforts should aim to establish standards for data collection that allow combining more complex data across sites, such as joint-level kinematics, kinetics, and EMG patterns. Second, we extracted peak force as the feature representing GRF data due to the ease to capture these measures. Future work can assess whether other GRF features, such as impulses, or GRFs during more specific points during the gait cycle provide more insight into the range of walking impairments in people post-stroke. Third, we were also limited in the sample size of control participants which was unbalanced and matched in speed to the USC2021 sample only. The sample size across study sites was also unbalanced, with samples from Em and JHU consisting of ~10 participants and the samples from USC and Pitt consisting of ~20 participants, providing different heterogeneity within each sample and different contribution of each site to the overall sample. Stability analyses show however, that changes in stability when removing each of the samples were not just due to sample size: removing the JHU sample which consisted of 10 participants with the highest speeds decreased cluster stability for most clusters, while removing the USC2018 sample did not change cluster stability uniformly. We interpret this to indicate that the characteristics of the samples had a greater influence on cluster stability than did sample size. Future multi-site studies can address these points. Fourth, participants in this study were all in the chronic phase of stroke recovery; future work should assess if these clusters are consistent or change across recovery phases. Fifth, data were collected with participants walking on a treadmill which may induce changes in walking patterns compared to overground walking. 58 However, some of our clusters were similar to those reported by Mulroy 2003, 5 indicating consistency of clusters on the treadmill compared to overground. In addition, the use of a handrail in some of the experimental protocols could have influenced participant’s walking patterns. 59 Future work could systematically assess whether participants are assigned to the same clusters measured during treadmill walking as to during overground walking. Finally, we used average metrics within participants, despite different patterns of stride-to-stride variance during post-stroke walking. 13 Future work in larger samples can include stride-to-stride variance as additional features to characterize post-stroke walking patterns.
Conclusions
We compiled and curated GRF data across multiple research sites in people post-stroke and controls. Using simple measures derived from GRFs, we identified 5 clusters of different walking behaviors. Four of these clusters captured walking subgroups that were generalizable across study sites. Our findings provide new information about how to classify the heterogeneity of gait patterns post-stroke. Identifying more specific types of walking impairment and different intervention targets for each subgroup can move the field of neurorehabilitation toward a precision medicine approach, 60 and improve the effectiveness of rehabilitation interventions.
Supplemental Material
sj-docx-1-nnr-10.1177_15459683231212864 – Supplemental material for Multi-Site Identification and Generalization of Clusters of Walking Behaviors in Individuals With Chronic Stroke and Neurotypical Controls
Supplemental material, sj-docx-1-nnr-10.1177_15459683231212864 for Multi-Site Identification and Generalization of Clusters of Walking Behaviors in Individuals With Chronic Stroke and Neurotypical Controls by Natalia Sánchez, Nicolas Schweighofer, Sara J. Mulroy, Ryan T. Roemmich, Trisha M. Kesar, Gelsy Torres-Oviedo, Beth E. Fisher, James M. Finley and Carolee J. Winstein in Neurorehabilitation and Neural Repair
Footnotes
Acknowledgements
We want to thank Chang Liu, PhD, Sungwoo Park, PhD, Tara Cornwell, Ryan Novotny, Catherine Yunis, Carly Sombric, PhD, Digna DeKam, PhD, Thu Nguyen, PhD, Justin Liu, Purnima Padmadaban, and all trainees who contributed to data collection at all study sites.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by: NIH NCATS grants nos. KL2TR001854 (N Sánchez), R03TR004248 (N Sánchez), and NIH NCMRR grant no. P2CHD065702 (N Sánchez); NIH NICHD grant no. R01-HD091184 (J.M. Finley), NIH NINDS grant no. R21 NS120274 (N. Schweighofer); NIH NIA grant no. R21 AG059184 (R. Roemmich); NIH NINCHD grant nos. R01 HD095975 and R21 HD095138 (T. Kesar).
Supplementary material for this article is available on the Neurorehabilitation & Neural Repair website along with the online version of this article.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
