Abstract
Background. Performance variability in individuals with aphasia is typically regarded as a nuisance factor complicating assessment and treatment. Objective. We present the alternative hypothesis that intraindividual variability represents a fundamental characteristic of an individual’s functioning and an important biomarker for therapeutic selection and prognosis. Methods. A total of 19 individuals with chronic aphasia participated in a 6-week trial of imitation-based speech therapy. We assessed improvement both on overall language functioning and repetition ability. Furthermore, we determined which pretreatment variables best predicted improvement on the repetition test. Results. Significant gains were made on the Western Aphasia Battery-Revised (WAB) Aphasia Quotient, Cortical Quotient, and 2 subtests as well as on a separate repetition test. Using stepwise regression, we found that pretreatment intraindividual variability was the only predictor of improvement in performance on the repetition test, with greater pretreatment variability predicting greater improvement. Furthermore, the degree of reduction in this variability over the course of treatment was positively correlated with the degree of improvement. Conclusions. Intraindividual variability may be indicative of potential for improvement on a given task, with more uniform performance suggesting functioning at or near peak potential.
Introduction
Therapeutic research in aphasia typically characterizes baseline and improved language skills in terms of mean scores on a specific task or assessment battery. Whereas this approach succeeds at capturing variability across individuals, it fails to capture such variability within individuals. Performance fluctuations within a single individual (intraindividual variability) are typically perceived as an inconvenient impediment to reaching the desired general conclusions about a new therapy. But treating intraindividual variability as a nuisance parameter or measurement error—for example, of the same magnitude and significance as interindividual variability 1 —may be giving up important and highly relevant information about the therapy. In fact, short-term performance inconsistency on a particular task may represent a characteristic feature of—and a metric to gauge—an individual’s functional status. Particularly in the context of large variability, mean performance may oversimplify the true nature of behavior and inadequately capture the range of ability, 2 obscuring insight into potential therapeutic benefit and outcome assessment on an individual basis.
The limited existing research on intraindividual variability in cognitive and perceptual-motor function in healthy aging 3-6 and dementia 7-9 suggests that increased intraindividual variability is associated with decreased performance. Yet other, seemingly contradictory, findings suggest that greater intraindividual variability has positive implications for acquiring skills with practice or training. For example, increased intraindividual variability in a cognitive or motor skill during learning precedes (and presages) mastery of that skill during development, 10 and in cognitive training of healthy older adults, the pretreatment degree of intraindividual variability predicts higher response accuracy and greater performance improvement. 11
These data indicate that performance variability may signal susceptibility to change and/or the potential for learning. Fluctuations representing adaptive variability 12 may be conceived of not as vulnerability, but as potential. Furthermore, distinguishing between adaptive and maladaptive variability may be key to understanding the significance of these measures in predicting future outcomes.
In the realm of stroke recovery, extensive investigation has addressed differences between individuals, yet there is little work studying performance variability within individuals (although important work has explored the role of attention in intraindividual variability in aphasia 13,14). In the recovery of language functions after stroke, intraindividual performance variability has not been investigated, either as a correlate of present functioning or as a predictor of posttreatment ability (but see Small et al 15 for a theoretical study). The implications are far-reaching. From a research standpoint, our knowledge of language recovery in aphasia is limited to mean scores and effect sizes using pooled standard deviations, thus neglecting individual parameters of variability. Such data may represent fundamentally incomplete metrics, substituting a crude numerical proxy for the more nuanced complexity of performance, thus profoundly affecting our understanding of recovery. From a clinical standpoint, such omission could have graver consequences, as the most desirable measure of rehabilitation success is a patient’s consistent performance in the real world, not maximal performance in the clinic or the possibility of good performance under ideal conditions.
We hypothesize that intraindividual variability on a language task is predictive of the ability of an individual with aphasia to improve mean performance on that task through training. We investigate this hypothesis in a clinical trial of an intensive, imitation-based aphasia therapy motivated by neurophysiological evidence.16,17 The therapy uses a computer interface to prompt repetition of words and phrases to engage a frontal-parietal motor cortical network involved in both observation and execution of speech. 18 A total of 19 patients with ischemic stroke received imitation-based therapy involving repetition of words and phrases. In this article, we report on an experiment testing the hypothesis that pretreatment intraindividual variability would predict therapeutic outcome.
Methods and Materials
Participants
A total of 19 native English speakers with aphasia following single, left-hemisphere ischemic stroke, confirmed by neurological examination and MRI, were recruited (age range 31-72 years; mean = 53.5 years; SD = 11.7 years; 4 female [21%]). All had sustained a single stroke 5 to 130 months prior to enrollment (mean = 41.6 months; SD = 42.9 months). Demographic and neurological information is provided in Supplemental Table 1. The institutional review boards of the University of Chicago and University of California, Irvine, approved the study. Consent was obtained according to the Declaration of Helsinki.
Experimental Summary
The intensive imitation-based therapy, administered 6 days per week for three 30-minute sessions each day, required participants to listen to words and phrases presented by 6 different speakers and to repeat them either once or multiple times. Half of the participants also saw a video of the speaker during the presentation. Because there was no statistically significant difference in any measure between those who saw the speaker and those who did not, all data have been aggregated for this report. Complete details about the IMITATE therapy system can be found in supplemental material and in earlier work.16,17
Over the 6-week therapeutic period (weeks 1-6 of the overall study), participants undertook the specialized speech therapy on a preprogrammed, dedicated laptop. Participants underwent 2 behavioral assessments (Western Aphasia Battery–Revised [WAB], repetition test) that were administered twice before and twice after therapy, with all evaluations 6 weeks apart (weeks −6, 0, 6, and 12). These measures were administered twice pretherapy to establish a stable baseline and twice posttherapy to establish immediate changes and maintenance. Figure S1 in supplemental material depicts the timing of these assessments relative to therapy.
Western Aphasia Battery
The WAB 19 was used as the primary outcome measure because it was anticipated that benefits of our imitation-based therapy would generalize to other domains of language.16,17,20 The WAB was administered at each of the 4 main behavioral assessment sessions by a speech-language pathologist (SLP) blinded to treatment group. We analyzed the WAB Aphasia Quotient (WAB-AQ), Cortical Quotient (WAB-CQ), and the 4 subcomponents of the WAB-AQ (spontaneous speech, auditory verbal comprehension, repetition, and naming and word finding). There was no significant difference in any of these measures between the 2 pretreatment sessions or between the 2 posttreatment sessions (P > .05 on 2-tailed paired t-tests).
Repetition Test
Tests of repetition accuracy were administered at all 4 pretreatment and posttreatment behavioral assessments. These were administered by an SLP using words and phrases randomly selected from the pool of IMITATE therapy stimuli. Repetition tests and WAB assessments were performed by different SLPs, each blinded to the other’s findings. One participant (participant 2) was excluded from this analysis because of missing data, leaving 18 participants.
Repetition test stimuli consisted of words and phrases of high difficulty, based on the level to which the patient was expected to advance. Each block of words contained 10 words. Each block of phrases contained 10 phrases of 2 to 6 words, depending on level. For both blocks, each word was scored on a 5-point scale (0 signifying no vocalization; 5 indicating accurate, prompt repetition). Scoring was performed once, offline, by a single SLP for all participants; therefore, reliability statistics are not reported. Performance on the Words and Phrases blocks was combined into a single repetition score for each time point (the mean of the Words and Phrases scores). There were no significant differences between week −6 and week 0 repetition scores, or between week 6 and week 12 repetition scores (P > .05 on 2-tailed paired t-tests).
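The combined repetition score can be sketched as follows. This is a minimal illustration under assumptions not stated in the text: each block's score is taken as the mean over every scored word in that block, the Words and Phrases block means are weighted equally, and percentages are expressed relative to the maximum item score of 5. The function names `block_mean` and `repetition_score` are hypothetical.

```python
MAX_ITEM_SCORE = 5  # 0 = no vocalization, 5 = accurate, prompt repetition


def block_mean(word_scores):
    """Mean score over every scored word in one block (assumed convention)."""
    if not word_scores:
        raise ValueError("block has no scored words")
    for s in word_scores:
        if not 0 <= s <= MAX_ITEM_SCORE:
            raise ValueError(f"score {s} is outside 0-{MAX_ITEM_SCORE}")
    return sum(word_scores) / len(word_scores)


def repetition_score(words_block, phrases_block):
    """Hypothetical combined repetition score, as a percentage of the maximum.

    words_block:   list of per-word scores (one block of 10 words).
    phrases_block: list of phrases, each a list of per-word scores (2-6 words).
    """
    phrase_words = [s for phrase in phrases_block for s in phrase]
    combined = (block_mean(words_block) + block_mean(phrase_words)) / 2
    return 100 * combined / MAX_ITEM_SCORE


print(repetition_score([5] * 10, [[3, 3], [3, 3, 3]]))  # 80.0
```

If the supplemental weighting differs (for example, averaging per phrase rather than per word), only `repetition_score` would need to change.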
Changes in Behavioral Performance
In all, 7 measures of language performance were studied: WAB-AQ, WAB-CQ, the 4 subcomponents of the WAB-AQ, and the score from the repetition test. For each measure, the pretreatment score was taken to be the mean of week −6 and week 0 scores (see supplemental material for details of weighting the repetition test), and the posttreatment score was taken to be the week 6 score. We did not use scores from week 12 (the fourth behavioral assessment) because 2 participants missed this assessment. Thus, for all measures, improvement is defined as the week 6 score minus the mean of the week −6 and week 0 scores.
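The improvement definition above reduces to a one-line calculation. The sketch below assumes an unweighted mean of the two pretherapy sessions; the repetition-test weighting described in the supplemental material is not reproduced, and the function names are hypothetical.

```python
def pretreatment_score(week_minus6, week0):
    """Pretreatment baseline: unweighted mean of the two pretherapy sessions."""
    return (week_minus6 + week0) / 2


def improvement(week_minus6, week0, week6):
    """Week 6 score minus the mean of the week -6 and week 0 scores.

    Week 12 is deliberately not used, matching the definition in the text.
    """
    return week6 - pretreatment_score(week_minus6, week0)


# Hypothetical scores for one participant on one measure
print(improvement(70.0, 72.0, 80.0))  # 9.0
```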
Pretreatment and posttreatment scores were compared using 2-tailed paired t-tests. All significance tests used α = .05. Because the WAB measures are nested (the 4 WAB-AQ subcomponents are used to compute the WAB-AQ and also contribute to the WAB-CQ), Bonferroni correction with n = 5 was applied to the repetition test and the 4 subcomponents of the WAB-AQ (spontaneous speech, auditory verbal comprehension, repetition, and naming and word finding).
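A minimal sketch of the two ingredients of this comparison: the paired t statistic on post minus pre differences, and the Bonferroni decision rule (significant if uncorrected P < .05/5). Converting t to a P value requires the t distribution (for instance, `scipy.stats.ttest_rel` returns both the statistic and the P value); the sketch leaves that step out so it stays dependency-free. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for the differences post - pre."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for a, b in zip(pre, post)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


def bonferroni_significant(p_values, n_comparisons=5, alpha=0.05):
    """Flag each uncorrected P value against the Bonferroni threshold alpha / n."""
    threshold = alpha / n_comparisons
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical uncorrected P values for two measures
print(bonferroni_significant({"measure A": 0.004, "measure B": 0.02}))
```

With n = 5, the corrected threshold is .01, so a measure with uncorrected P = .02 would not be counted as significant.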
Intraindividual Variability as Predictor of Improvement
The repetition test based on stimuli from the pool of IMITATE items was used to test directly the hypothesis that intraindividual variability in a language task is predictive of the ability of an individual with aphasia to improve performance on that task through training. This single measure was selected for 2 reasons: (1) there were 2 days on which the repetition test was administered at least twice (week 0 and week 6), allowing a robust assessment of individual variability before and after treatment, and (2) these stimuli were developed to be roughly equivalent in complexity, in contrast with the hierarchical ordering of increasing complexity on the WAB subtests. We did not pool data from weeks −6 and 0 when computing intraindividual variability, to avoid confounding variability on different time scales; our intraindividual variability scores measure performance variability within a given day only. We computed a repetition intraindividual variability measure by pooling the variances of the Words and Phrases blocks. Details can be found in supplemental material.
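Because the exact formula is given only in the supplemental material, the following is a minimal sketch under a stated assumption: each block's variance is the sample variance of its item scores, and block variances from the same day are pooled by their degrees of freedom (the textbook pooled-variance formula), reported as a standard deviation. Blocks from different days are never mixed, matching the within-day definition above. The function name is hypothetical.

```python
import math
from statistics import variance


def pooled_within_day_sd(blocks):
    """Pooled SD of item scores across all blocks given on ONE day.

    blocks: list of blocks (e.g., Words and Phrases blocks), each a list of
    item scores. Sample variances are weighted by degrees of freedom (n_i - 1);
    this is an assumed formula, not necessarily the study's exact measure.
    """
    if not blocks:
        raise ValueError("need at least one block")
    num, dof = 0.0, 0
    for block in blocks:
        if len(block) < 2:
            raise ValueError("each block needs at least 2 scored items")
        num += (len(block) - 1) * variance(block)
        dof += len(block) - 1
    return math.sqrt(num / dof)


# Hypothetical week 0 blocks for one participant
print(pooled_within_day_sd([[5, 4, 5, 3, 5], [2, 4, 3, 5, 1]]))
```

A participant who scores identically on every item yields 0.0, which corresponds to the "uniform performance" end of the spectrum discussed in the abstract.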
Our specific question was the extent to which pretreatment repetition intraindividual variability predicted improvement in repetition mean, which we determined by computing a Pearson correlation coefficient. We used week 0 and week 6 repetition mean scores as the pretreatment and posttreatment values, ignoring week −6 scores for consistency with intraindividual variability calculations. There was no significant difference between pretreatment repetition mean calculated with and without the week −6 repetition test (2-tailed paired t-test, P > .05).
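The Pearson correlation used here, together with the standard t statistic for testing r against zero (n − 2 degrees of freedom), can be sketched in plain Python. On Python 3.10 and later, `statistics.correlation` returns the same r; the function names below are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with n >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("|r| must be < 1")
    return r * math.sqrt((n - 2) / (1 - r * r))
```

In this study, x would hold each participant's week 0 intraindividual variability and y the week 0 to week 6 change in repetition mean.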
We used stepwise linear regression to identify those pretreatment variables that best predicted improvement. In addition to pretreatment repetition intraindividual variability, these variables included participant age, months poststroke onset (MPO), number of sessions completed (NSC), aphasia type (fluent vs nonfluent), and pretreatment repetition mean. NSC was tracked by automated video recording of patient participation during each session via the built-in laptop camera and then verified by review of these recordings. Stepwise regression was performed with the MATLAB stepwise function, using the default settings: a new predictor is selected if its regression coefficient would be significantly nonzero at the .05 level, and an existing predictor is removed if its coefficient is not significantly nonzero at the .10 level.
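The entry and removal rule described above can be sketched without third-party libraries, as below. This is not MATLAB's implementation. For a single term, a partial F test is equivalent to the t test on that term's coefficient, which is what this sketch uses (P values come from the regularized incomplete beta function). However, the order in which this loop checks additions and removals is our own simplification and may not reproduce MATLAB's step sequence in every case. All function names are hypothetical.

```python
import math


def _betacf(a, b, x, tiny=1e-300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        for aa in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                   -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def t_two_sided_p(t, df):
    """Two-sided P value of a t statistic: I_x(df/2, 1/2) with x = df/(df + t^2)."""
    x, a, b = df / (df + t * t), df / 2.0, 0.5
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def _inverse(m):
    """Gauss-Jordan inverse of a small square matrix, with partial pivoting."""
    k = len(m)
    aug = [list(row) + [float(i == j) for j in range(k)] for i, row in enumerate(m)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular design: predictors are collinear")
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def coefficient_pvalues(predictors, y):
    """OLS with an intercept; two-sided P value for each predictor's coefficient."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    if n <= k:
        raise ValueError("not enough observations for this many predictors")
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = _inverse(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    rss = sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    s2 = rss / (n - k)
    ses = [math.sqrt(s2 * inv[j][j]) for j in range(1, k)]
    return [0.0 if se == 0 else t_two_sided_p(b / se, n - k) for b, se in zip(beta[1:], ses)]


def stepwise(candidates, y, p_enter=0.05, p_remove=0.10, max_steps=50):
    """Forward-backward selection; candidates maps name -> values per participant."""
    selected = []
    for _ in range(max_steps):
        changed = False
        # Forward step: add the excluded candidate with the smallest P below p_enter.
        best, best_p = None, p_enter
        for name in (c for c in candidates if c not in selected):
            cols = [candidates[s] for s in selected + [name]]
            p = coefficient_pvalues(cols, y)[-1]
            if p < best_p:
                best, best_p = name, p
        if best is not None:
            selected.append(best)
            changed = True
        # Backward step: drop the included predictor with the largest P above p_remove.
        if selected:
            pvals = coefficient_pvalues([candidates[s] for s in selected], y)
            worst = max(range(len(selected)), key=lambda i: pvals[i])
            if pvals[worst] > p_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return selected
```

In this study's terms, `candidates` would map names such as age, MPO, NSC, aphasia type (coded 0/1), pretreatment mean, and pretreatment variability to per-participant values, and `y` would be the improvement in repetition mean.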
Results
Changes in Behavioral Performance
Statistically significant improvements were demonstrated in 5 of the 7 language measures assessed, after correction for multiple comparisons. Results of 2-tailed paired t-tests are summarized in Table 1 with uncorrected P values. As described in Methods, Bonferroni correction (n = 5) was applied to the repetition test and the 4 WAB-AQ subcomponents to determine significance. Significant improvement was measured for the repetition test, WAB-AQ, WAB-CQ, and 2 of the 4 WAB-AQ subcomponents (repetition, and naming and word finding). The 2 remaining subcomponents of the WAB-AQ (spontaneous speech, auditory verbal comprehension) did not demonstrate significant change.
Table 1. Performance Measures for All Participants.
Abbreviations: AQ, aphasia quotient; AVC, auditory verbal comprehension; CQ, cortical quotient; NWF, naming and word finding; Rep, repetition; SS, spontaneous speech; WAB, Western Aphasia Battery-Revised.
Significant after Bonferroni correction for 5 comparisons (P < .05/5).
Intraindividual Variability as Predictor of Improvement
In this section, only the repetition test results are used. In contrast to Table 1, pretreatment results are from week 0 only, for reasons explained above. Pretreatment repetition mean ranged from 20.5% to 99.5% (overall mean = 79.4%; SD = 18.8%). Improvement in repetition mean from pretreatment to posttreatment (week 0 to week 6) ranged from −3.8% to 16.5% (median = 5.3%; mean = 6.7%; SD = 5.7%), representing a mean improvement of 0.34 points on the 5-point scale used to rate the repetition performance.
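Assuming repetition percentages are expressed relative to the maximum item score of 5, the conversion of the mean improvement from percentage to scale points is:

```latex
\Delta_{\text{points}} = 0.067 \times 5 = 0.335 \approx 0.34
```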
Figure 1A shows pretreatment intraindividual variability versus improvement in repetition mean performance, and Figure 1B shows pretreatment repetition mean versus pretreatment intraindividual variability.

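Figure 1. Relationships between pretreatment intraindividual variability, pretreatment mean, and improvement in mean for the repetition test. (A) Pretreatment intraindividual variability versus improvement in repetition mean. (B) Pretreatment repetition mean versus pretreatment intraindividual variability. Participant 2 was excluded because of missing data, leaving 18 participants. Participants marked with black crosses are excluded from further analysis (see main text).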
We removed several participants from further analysis because of outlier status (participant 4) and possible ceiling effects (ie, participants near threshold pretreatment or posttherapy: participants 10, 11, 12, and 16), as detailed in supplemental material. For the remaining 13 participants, there is a positive correlation (r = 0.68; P = .01 uncorrected) between pretreatment intraindividual variability and improvement; that is, higher pretreatment intraindividual variability is associated with greater improvement. We then considered all the pretreatment variables listed earlier (age, MPO, NSC, aphasia type, pretreatment intraindividual variability, and pretreatment mean) as possible predictors of improvement in posttherapy repetition accuracy. In a stepwise regression, the selected model retained intraindividual variability as the only predictor of improvement (P = .01, as noted above). With all participants included, the relationship remains highly significant (P = .0001), with no additional predictors selected. Repeating this stepwise regression without variability as a candidate predictor resulted in the selection of no variables, whether for the entire group or with near-threshold participants excluded.
Finally, we examined intraindividual variability in repetition accuracy immediately posttreatment (week 6). This variability decreased significantly over the course of treatment (2-tailed paired t-test, P < .05), regardless of whether we consider all participants or exclude near-threshold participants (as detailed in supplemental material). Posttreatment intraindividual variability in repetition accuracy is positively correlated with change in repetition mean when all 18 participants are considered (Pearson’s r = 0.49; P = .04, 2-tailed t-test), but this correlation is no longer significant when participants who were near threshold either before or after therapy are excluded (r = 0.25; P = .41). The change in intraindividual variability in repetition accuracy over the course of treatment is negatively correlated with improvement; this correlation approaches significance when all participants are considered (r = −0.48; P = .053) and is significant when near-threshold participants are excluded (r = −0.57; P = .04; see Figure 2). This effect remains when we control for all the potential confounding variables (age, MPO, NSC, aphasia type, pretreatment mean, and pretreatment intraindividual variability; r = −0.69; P = .018). Put another way, reduction in intraindividual variability is positively correlated with improvement.
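One standard way to "control for" covariates in a correlation is the partial correlation: regress both variables on the covariates (plus an intercept) and correlate the residuals. The text does not name its method, so the sketch below assumes this is what the adjusted r reports. Here x would be the change in intraindividual variability, y the improvement in repetition mean, and the covariates the six listed variables. Function names are hypothetical.

```python
import math


def _solve(a, rhs):
    """Solve the square system a @ x = rhs by Gaussian elimination with pivoting."""
    k = len(rhs)
    m = [list(row) + [v] for row, v in zip(a, rhs)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: covariates are collinear")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        x[i] = (m[i][k] - sum(m[i][j] * x[j] for j in range(i + 1, k))) / m[i][i]
    return x


def residuals(values, covariates):
    """Residuals of an OLS fit of `values` on an intercept plus the covariates."""
    n = len(values)
    X = [[1.0] + [c[i] for c in covariates] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * v for row, v in zip(X, values)) for i in range(k)]
    beta = _solve(xtx, xty)
    return [v - sum(b * x for b, x in zip(beta, row)) for row, v in zip(X, values)]


def partial_r(x, y, covariates):
    """Partial correlation of x and y, controlling for the covariates.

    With an intercept in both fits the residuals have (numerically) zero mean,
    so their Pearson correlation reduces to the normalized dot product below.
    """
    rx, ry = residuals(x, covariates), residuals(y, covariates)
    sxy = sum(a * b for a, b in zip(rx, ry))
    return sxy / math.sqrt(sum(a * a for a in rx) * sum(b * b for b in ry))
```

With an empty covariate list, `partial_r` reduces to the ordinary Pearson correlation.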

Figure 2. Relationship between change in intraindividual variability and change in mean for the repetition test. Participants shown are as in Figure 1.
Discussion
The present study reports outcomes of a clinical trial of imitation-based therapy for chronic aphasia and explores a new hypothesis about the role of intraindividual variability in predicting benefit from language therapy following stroke. This study showed positive effects of the IMITATE system of repetition-based, computer-assisted speech/language therapy for patients with chronic aphasia. In particular, participants undergoing the therapy had statistically significant gains on composite language and cognitive measures on a standard test for aphasia (WAB) as well as on 2 repetition accuracy measures (the WAB repetition subtest and the separate repetition test). Significant gains were made over a relatively short treatment period (6 weeks) in participants who, in some cases, were more than a decade removed from their stroke. Future investigation will refine the IMITATE therapeutic protocol in view of the results from the current study, related research, 21 and other theoretical considerations.
Our analysis suggests that participants demonstrating higher levels of performance variability prior to therapy are likely to experience greater improvement over the course of treatment. Specifically, participants demonstrating greater intraindividual variability during repetition before therapy demonstrated greater improvement in repetition than those with lower intraindividual variability. Perhaps most interestingly, intraindividual variability declined over the course of treatment, and there was a significant correlation between performance improvement and intraindividual variability reduction.
The finding that intraindividual variability is a positive predictor of language improvement appears to conflict with existing literature on the relation between intraindividual variability and task performance in cognitive and perceptual-motor domains. In healthy aging and dementia, intraindividual variability has generally been negatively correlated with both short-term performance 22 and long-term variables 5 —including time until death. 23 On the other hand—and perhaps most relevant here—evidence also suggests that increased variability in a particular cognitive or motor domain may be associated with greater potential for change following training specific to that domain. 12
Correlation of task-specific variability with performance improvement has been attributed to influences of learning and strategy use in development. 24 During skill acquisition, changes occur in execution of strategies, even in the absence of changes in strategy selection. 25 These subtle changes may result in adaptive variability while learning specific tasks. Thus, as an individual achieves maximal potential on a task, variability decreases. In expert motor control, when an individual is performing a highly practiced skill at or near peak level, performance variability is reduced, and this consistency is reflected in precise activation of neural networks during motor planning. 26 Our finding that intraindividual variability decreased over the course of therapy provides further support for this proposition, especially given the significant correlation between improvement on the repetition test and intraindividual variability reduction. Within the limitations of their language impairment, our participants became more expert at the practiced task, thus demonstrating more consistent and more accurate performance. Although it is impossible to determine from the present study, it would be of great interest to explore whether such variability might continue to play a predictive role in the outcome of further therapy or with the introduction of new or more difficult tasks.
That increased variability has been found at dynamic periods of cognitive decline and development suggests, not surprisingly, that these transitions do not occur uniformly. It seems probable that such variability indicates a lack of system stability that is influenced by opposing tendencies. On one hand, in a progression toward overall decline, increased variability results as the valleys of performance drop more deeply; on the other, in the case of development, or recovery, the heightening of peaks is responsible for the observed fluctuation. In support of this, increasing latency for an individual’s slowest reaction times is related to increasing variability for older adults. 27 Nevertheless, cognitive enhancement can occur with training and stimulation programs, with functional gains reported in daily life despite increasing age. 28 As in development and recovery, when older participants realize increased potential, greater intraindividual variability is correlated with improved learning. 12
In the present study, it is possible that individuals demonstrating less variability are at or near an asymptote of their abilities, given their neurological status, the extent of lesion damage, and the degree to which they have already experienced recovery. Although it is generally accepted that individuals with aphasia encounter a plateau within the first year following stroke, 29 intraindividual variability may serve as a more sensitive, individualized measure of potential than time poststroke as well as an immediate and cost-efficient means of prediction.
The implications for language rehabilitation are potentially significant because predictors of response to aphasia treatment are presently limited. Specifically, it may be productive for clinicians to target skills in which patients demonstrate high performance variability prior to treatment rather than areas in which limited variability suggests a reduced capacity for gains with therapy. It may also be productive to periodically reassess patient performance on a variety of tasks to determine whether cycling through treatment goals, selected on the basis of variability as a proxy for potential, may be beneficial. However, such possibilities should be interpreted with caution because the present study represents a new avenue of inquiry, and little is yet known about how intraindividual variability changes over the course of recovery. Although the present analysis considered time postonset, those participating in this study were all at chronic aphasia stages. Therefore, there is no suggestion that findings would be identical or even similar in acute phases of recovery. Additionally, several measures that may affect variability in task performance were not included in our assessment, such as attention, mood, and fatigue. Future studies would benefit from operationalization and inclusion of these variables.
Further limitations of our study include the potential for practice effects, given the relatively short time over which these tests were administered. However, we believe that the lack of significant differences between the 2 pretherapy time points and the 2 posttherapy time points suggests that this is not a major confound for the present study. Although our inclusion of fluent versus nonfluent aphasia classifications did not indicate significant differences in benefit between these groups, there was not adequate power in our sample to address the differential effects of repetition therapy that may exist for different aphasia types. It is also worth noting that our imitation-based therapy was heavily dependent on motor processes, as was our repetition outcome measure. Therefore, it is not possible to definitively state that intraindividual variability would predict improvement on purely cognitively based tasks.
Although extrapolation from the present study to clinical guidelines would be premature, if the relationship between behavioral intraindividual variability and posttreatment performance withstands further exploration, it may suggest that those demonstrating higher levels of baseline variability are good candidates for intervention. Intraindividual variability, in this conception, could represent a measure of plastic potential, the extent to which an individual’s present neurological status is conducive to the kind of recovery or reorganization necessary to manifest improvement with practice and stimulation. However, individuals performing consistently at the same level may require different types of intervention if they are to realize enhanced function, and these patients may be better candidates for referral to clinical trials, pharmacology, or more invasive forms of treatment.
Footnotes
Acknowledgements
All speech and language evaluations were coordinated by Dr Leora Cherney at the Rehabilitation Institute of Chicago (RIC) and performed by her staff at RIC. The research staff at The University of Chicago included Blythe Buchholz and Robert Fowler, who helped coordinate the project, and Dan Rodney, who authored the IMITATE software. Dr Ana Solodkin supervised the drawing of lesion masks. The support of these individuals is gratefully acknowledged, as are the patients and families who generously participated in this research.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the National Institute on Deafness and Other Communication Disorders (NIDCD) of the National Institutes of Health (NIH) under Grants R01-DC007488 and R33-DC008638, the James S. McDonnell Foundation under a grant to the Brain Network Recovery Group (A. R. McIntosh, PI) and Mr William Rosing, Esq.
References
Supplementary Material
