Abstract
Introduction
An important problem in the field of sport-related concussion is the lack of a ‘gold-standard’ clinical assessment tool. Currently, the diagnosis relies heavily on self-reporting of symptoms and observation of clinical signs by medical professionals. To address this, our group has been motivated to develop objective measures of neurological impairment following concussion. Spatial working memory is an important aspect of cognitive function that might be impaired following concussion. In the present study, we measured spatial working memory using a robotic spatial span task. We first used intraclass correlation coefficients to assess test–retest reliability in 82 healthy athletes who underwent baseline testing across two athletic seasons. We then assessed spatial span performance relative to baseline in 47 athletes acutely following sport-related concussion, using a reliable change index with 80% confidence limits to define impairment on an individual basis.
Results
We found good test–retest reliability for the mean span (a measure of spatial working memory span length; intraclass correlation coefficient = 0.79), and moderate reliability for the response duration (time taken per spatial target; intraclass correlation coefficient = 0.64) in healthy athletes. However, only 19% of acutely concussed athletes showed evidence of impairment relative to baseline in mean span, and even fewer (9%) showed evidence of impairment in response duration. Analysis of serial position curves revealed primacy and recency effects for this task, but no group-level differences between concussed and healthy athletes. Analysis of specific types of errors showed a higher rate of substitution errors in the concussed group at baseline, suggesting possible malingering in a small number of athletes.
Conclusion
Overall, few athletes showed evidence of impaired spatial working memory acutely following concussion, suggesting either that spatial working memory is not commonly impaired acutely post-concussion, or that the present task is not sufficiently demanding.
Introduction
Sport-related concussion is a common injury sustained by athletes. Estimates based on data from the Centers for Disease Control and Prevention suggest that 1.6–3.8 million sport-related traumatic brain injuries occur each year in the United States. 1 While sport-specific concussion rates vary, contact and collision sports such as ice hockey, football, rugby and wrestling have incidence rates in the range of 0.2–18 concussions per 1000 athlete exposures or player game-hours, depending on the age, sport, practice or game setting, and level of play.2–6 In terms of cumulative incidence (the proportion of athletes who will sustain a concussion in a given season), estimates range from 3% to 10% for football, ice hockey and wrestling.6–8
A major challenge in the field of concussion remains the lack of gold-standard diagnostic criteria. Diagnosis and prognosis for concussion are based largely on subjective reporting of symptoms and observation by medical professionals of clinical signs such as forgetting instructions, answering questions slowly, balance and gait abnormalities, lack of coordination or abnormal neurological examinations.9–11 The lack of objective measures of impairment increases the likelihood that athletes may be misdiagnosed or return to sport prematurely, which can increase the risk of re-injury and poor outcomes.12–16
Our group has been using the KINARM robot (BKIN Technologies Ltd., Kingston, Ontario, Canada) to objectively measure subtle neurologic deficits following acute sport-related concussion17–19 and other traumatic brain injuries.20,21 A robotic device is one that is automated and programmable. This provides several advantages over more traditional assessment tools. For example, a task can be delivered in exactly the same way for every subject, and multiple aspects of task performance can be quantified precisely and accurately. Robotic technology as an assessment tool is objective, reliable and efficient,22,23 and can be used to measure multiple aspects of neurological function. The KINARM Standard Tests™ include assessments of sensory, motor and cognitive function.
Neurocognitive testing results suggest that impairment of cognitive function occurs in the acute stage post-concussion.24–27 A recent review of meta-analyses found that the reported effect sizes (Cohen’s d) of mild traumatic brain injury on neuropsychological function ranged from 0.06 to 0.61. 28 The Sport Concussion Assessment Tool (SCAT)29,30 includes a cognitive component assessing immediate and delayed verbal recall, concentration and working memory, with an effect size of impairment at 24 h post-concussion of d = 0.36. 31 Group-level impairments in working memory specifically have been described at both acute 32 and remote33,34 time points following concussion. Working memory might be particularly relevant for skilled sport performance. For example, Vestberg and colleagues 35 found a significant correlation between working memory performance and goals scored in elite youth soccer players. We were interested in assessing working memory performance in the visuospatial domain in concussed athletes due to the important parallels with skills necessary for sport performance, and the risks and challenges of returning to play if these skills are impaired. Group-level differences in spatial working memory have been described between healthy university students and those with a history of concussion with a large effect size (η 2 = 0.19). 36
Visuospatial working memory has been assessed most commonly using variations of the Corsi block-tapping task.37,38 In the original Corsi task, an experimenter tapped a series of blocks on a board, and the subject was instructed to repeat the sequence in the correct order. Although Corsi block-tapping and related tasks have been used extensively in research and clinical settings for nearly half a century, there remains significant variation between studies in specific task parameters, administration, and scoring, making assessment of the psychometric properties of these tasks challenging. 39 Nevertheless, Kane et al. 40 found high internal consistency (Cronbach’s α = 0.86) for a matrix span task, and correlations with other memory span tasks (operation span, word span, digit span, etc.) ranging from 0.40 to 0.65, suggesting that these span tasks measure a similar neuropsychological construct.
In the present study, we evaluated the reliability of a novel robotic spatial working memory task, based in part on the Corsi task, and similar to Kane et al.’s matrix span, 40 in healthy athletes across two athletic seasons. We then assessed spatial working memory performance relative to baseline in a separate group of athletes acutely following concussion. The spatial span is one of the KINARM Standard Tests™, and was run as one component of a larger robotic testing battery in our ongoing concussion study. 19 We hypothesized that a large percentage of acutely concussed athletes would show impairment in spatial span performance relative to their own baseline. To our knowledge, no prior studies have examined the KINARM spatial span task in any capacity (healthy normative performance or the effects of neurologic injury). Therefore, a secondary objective was to describe the patterns and serial positions of spatial span errors on this task in both healthy and concussed athletes to allow comparison with published findings for other memory span tasks.41,42
Materials and methods
Participants
Participants were recruited from national, provincial, and club-level sports teams to participate in a prospective study of sport-related concussion. Participants were included in the study if they could understand basic task instructions in English. Ages ranged from 13 to 31 years at the beginning of study participation. Participants were excluded from the study if they had peripheral or central nervous system disorders or ongoing musculoskeletal injuries affecting the upper extremity.
Two groups of subjects are presented in the current study: (1) a healthy control cohort: athletes who completed two baseline assessments in consecutive seasons on the spatial span task without sustaining a concussion between assessments and (2) a concussion cohort: athletes with a single baseline assessment who subsequently sustained a concussion resulting in an acute post-concussion state.
Both cohorts underwent testing at two time points. For the healthy control cohort, the time points are referred to as baseline 1 and baseline 2. For the concussion cohort, the time points are referred to as baseline 1 and acute concussion. Athletes sustaining a suspected concussion reported to the sport medicine clinic for assessment on the recommendation of their team athletic therapist or physiotherapist. The acute post-concussion state was diagnosed by a sport medicine physician (BWB) based on clinical experience and judgement after a comprehensive history and neurological assessment, as suggested by international consensus on concussion in sport.9,43,44 All reported concussions were sustained during sport. Collection of data for the acute concussion time point was scheduled as soon as possible (maximum 10 days) following the injury. Athletes were excluded from the study if they sustained a concussion but had complete resolution of all concussion signs and symptoms by the time of their acute concussion assessment (n = 4). The University of Calgary Conjoint Health Research Ethics Board approved all study protocols (Ethics ID: 23963), and participants provided written informed consent in accordance with the Declaration of Helsinki prior to study participation.
Spatial span task
Data were collected using a KINARM end-point robotic device (BKIN Technologies Ltd., Kingston, Ontario, Canada). The spatial span task is one of the KINARM Standard Tests™. Subjects stood on a platform and grasped the handle of a robotic arm with their dominant hand, allowing them to interact with an augmented reality environment presented on a horizontal screen (Figure 1(a)). Images on the screen appeared in the same plane as the handle, which was represented as a white circle with a radius of 0.5 cm. At the beginning of each trial, a start box appeared at the bottom of the screen and participants were instructed to reach into this box (Figure 1(b)). A virtual wall implemented by the robot prevented subjects from leaving the start box until the sequence presentation was complete. All possible target locations were displayed on the screen throughout the task as grey squares outlined in cyan on a black background (Figure 1(b)). There were 12 possible target locations arranged in a 3-row × 4-column grid. Targets were 3 cm × 3 cm, with 3 cm spaces between targets on all sides.

Spatial span task presentation. (a) Schematic representation of a subject interacting with the KINARM end-point robot during presentation and execution of the spatial span task. Subjects were standing and looking down at the horizontal subject display, with their hand position represented as a white circle on the screen, appearing in the same plane as the robot handle. (b) Example of subject display during the presentation of a single target in the sequence. Note that subjects could not see their arm during the task. (c) Temporal representation of target sequence presentation. Each target was on for 500 ms, with 250 ms between targets until sequence was complete. Subjects were then instructed to use the robotic handle to reproduce the sequence.
The task comprised two practice trials and 16 test trials. Sequence lengths started at four targets and increased by one target following correct trials or decreased by one target following incorrect trials. Each target in the sequence turned cyan for 500 ms, with 250 ms between sequential target presentations (Figure 1(c)). Following presentation of the entire sequence, the start box disappeared, and subjects were instructed to reproduce the sequence in the correct order by moving the handle into each target. Subjects received both haptic feedback (vibration through the handle) and visual feedback (target turning cyan) to indicate that a target had been selected.
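The one-up/one-down rule described above can be illustrated with a short Python sketch. This is not the KINARM implementation: the function name, the `is_correct` callback and the clamping bounds (`min_len`, `max_len`) are assumptions introduced for illustration, with the upper bound set to the 12 targets on the display.

```python
def run_staircase(is_correct, n_trials=16, start=4, min_len=1, max_len=12):
    """Return the sequence length presented on each test trial.

    is_correct: hypothetical callback taking a sequence length and returning
        True if the subject reproduced a sequence of that length correctly.
    min_len / max_len: assumed bounds (the source does not state them);
        12 is the number of target locations on the screen.
    """
    lengths = []
    length = start
    for _ in range(n_trials):
        lengths.append(length)
        # One target longer after a correct trial, one shorter after an error.
        step = 1 if is_correct(length) else -1
        length = min(max_len, max(min_len, length + step))
    return lengths


# A subject who always succeeds at length 4 and always fails at length 5
# alternates between the two lengths for all 16 test trials.
lengths = run_staircase(lambda n: n <= 4)
```

Under this rule, the presented lengths hover around the longest sequence a subject can reliably reproduce.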
Outcome measures
Overall task performance was assessed for the 16 test trials using two measures: mean span and response duration. Mean span was calculated as follows:
where correct represented the lengths of all correct sequences and incorrect represented the lengths of all incorrect sequences. For example, if a subject was always successful at sequence length 4 but always failed at sequence length 5, this method would give them a mean span of 4. The response duration was defined as the time taken per target, and was calculated as the total response time (end of target sequence presentation to the end of target selection by the subject for all trials), divided by the total number of targets presented.
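As a small worked illustration of the response duration definition, the Python sketch below divides the summed response time by the summed number of targets presented. The trial representation (a list of `(response_time_s, n_targets)` pairs) and the example values are hypothetical.

```python
def response_duration(trials):
    """Time taken per target, in seconds.

    trials: list of (response_time_s, n_targets) pairs, one per test trial,
        where response_time_s runs from the end of the sequence presentation
        to the end of target selection (hypothetical data layout).
    """
    total_time = sum(time_s for time_s, _ in trials)
    total_targets = sum(n_targets for _, n_targets in trials)
    return total_time / total_targets


# Three made-up trials: 15.3 s of responding over 13 presented targets.
example = response_duration([(5.0, 4), (6.0, 5), (4.3, 4)])
```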
An algorithm was developed to evaluate the frequency of different types of errors. Errors were defined as transpositions (two targets switching order within the sequence), relocations (a single target moving to a different position in the sequence), omissions (skipping a target), additions (inserting a target that was not in the original sequence) and substitutions (replacing a target with one that was not in the original sequence). The algorithm started with the first error in the sequence and classified it as one of the error types listed above, then corrected the error and repeated the process from the beginning of the sequence until there were no remaining errors in the sequence. Error rate was defined as the percent of all target presentations on which a given error type occurred. Each sequence consisted of multiple targets, so the total number of target presentations was equal to the sum of all sequence lengths presented.
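The iterative classify-and-correct procedure can be sketched in Python as follows. The specific decision rules are assumptions made for illustration and are not the authors' algorithm: the source does not specify, for example, how an addition is distinguished from a substitution (here, by whether the response is longer than the presented sequence). Because each correction leaves the prefix of the response matching the presented sequence, continuing from the current position is equivalent to restarting from the beginning.

```python
from collections import Counter


def classify_errors(presented, response):
    """Count error types by repeatedly fixing the first mismatch.

    The tie-breaking rules below are illustrative assumptions.
    """
    target = list(presented)
    resp = list(response)
    counts = Counter()
    i = 0
    while i < len(target) or i < len(resp):
        if i < len(target) and i < len(resp) and resp[i] == target[i]:
            i += 1  # position i is correct, move on
        elif i >= len(target):
            counts["addition"] += 1  # extra selection past the end
            del resp[i]
        elif i >= len(resp):
            counts["omission"] += 1  # response ended too early
            resp.append(target[i])
        elif resp[i] not in target[i:]:
            # Selected target is not among the remaining expected targets.
            if len(resp) > len(target):
                counts["addition"] += 1
                del resp[i]
            else:
                counts["substitution"] += 1
                resp[i] = target[i]
        elif target[i] not in resp[i:]:
            counts["omission"] += 1  # expected target was skipped
            resp.insert(i, target[i])
        else:
            j = resp.index(target[i], i)  # where the expected target ended up
            if j < len(target) and resp[i] == target[j]:
                counts["transposition"] += 1  # two targets swapped places
                resp[i], resp[j] = resp[j], resp[i]
            else:
                counts["relocation"] += 1  # one target moved
                resp.insert(i, resp.pop(j))
    return counts


def error_rates(counts, total_targets_presented):
    """Percent of all target presentations on which each error type occurred."""
    return {k: 100.0 * v / total_targets_presented for k, v in counts.items()}
```

For instance, with presented sequence `"ABCD"`, the response `"ACBD"` is counted as one transposition and `"ACD"` as one omission under these rules.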
Clinical assessment
Post-concussion symptom scale (PCSS) scores were assessed using the SCAT3. 43 This evaluation includes 22 possible symptoms (e.g. headache, dizziness, sensitivity to light, difficulty concentrating, fatigue, trouble falling asleep, irritability, etc.) rated on a Likert-type scale from 0 to 6 (0 = none; 1–2 = mild; 3–4 = moderate; 5–6 = severe). The maximum possible PCSS score is therefore 132. PCSS scores were categorized into four subgroups of increasing severity: PCSS 0 (score ≤ 5), PCSS 1 (score 6–22), PCSS 2 (score 23–88) and PCSS 3 (score > 88) based in part on the categories used by Chen et al. 45 PCSS data were missing for one subject. Athletes self-reported their histories (Hx) of concussion (with loss of consciousness (LOC), amnesia and time loss from sport also self-reported for each injury), migraine, neck pain, anxiety, depression and attention deficit hyperactivity disorder (ADHD). Any subject who reported a prior history of concussion was symptom-free and participating fully in unrestricted training or competition prior to inclusion in the study.
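The PCSS severity categories can be expressed as a simple lookup. The Python sketch below uses the cut-offs stated above; the function name and the range check are illustrative additions.

```python
def pcss_category(score):
    """Map a total PCSS score (0-132) to the severity subgroup 0-3."""
    if not 0 <= score <= 132:
        # 22 symptoms rated 0-6 give a maximum total of 132.
        raise ValueError("PCSS score must be between 0 and 132")
    if score <= 5:
        return 0  # PCSS 0: score <= 5
    if score <= 22:
        return 1  # PCSS 1: score 6-22
    if score <= 88:
        return 2  # PCSS 2: score 23-88
    return 3      # PCSS 3: score > 88
```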
Data analysis
All data analysis was performed in MATLAB (MathWorks, Natick, MA, USA) using built-in functions and custom-written code. Reliability from baseline 1 to baseline 2 in the healthy control cohort was evaluated using intra-class correlation coefficients (ICCs). We were interested in the consistency of the measures as opposed to the absolute agreement, since learning effects, which may produce systematic errors across tests, were accounted for elsewhere (see below). We therefore used a consistency type, 2-way random effects model for ICC calculation (type C-k), using a MATLAB ICC function available online 46 and based on the classifications of McGraw and Wong. 47 For descriptive purposes, we classified ICC values from 0.5 to 0.75 as moderate reliability and ICC values > 0.75 as good reliability. 48 In addition, Bland–Altman plots were inspected to visualize any systematic differences across tests, including possible practice effects. 49
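For readers unfamiliar with consistency-type ICCs, the following Python sketch computes the single-measure, ICC(C,1), and average-measure, ICC(C,k), forms from the two-way ANOVA mean squares, following the McGraw and Wong definitions. It is an illustration only and is not the MATLAB function used in the study; the function name and data layout (one row per athlete, one column per assessment) are assumptions.

```python
def icc_consistency(scores):
    """Return (ICC(C,1), ICC(C,k)) for an n-subjects x k-sessions table.

    scores: list of rows, one per subject, each with k session scores.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (subjects, sessions, residual).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))

    # Consistency forms ignore the session (column) variance, so a uniform
    # shift between sessions, such as a practice effect, does not lower them.
    single = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    average = (ms_rows - ms_err) / ms_rows
    return single, average
```

If every athlete improved by exactly the same amount between baselines, both coefficients would equal 1, which is why a consistency-type ICC suits data with systematic learning effects.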
For evaluating change from baseline test results following concussion, we used reliable change indices (RCIs). 50 The RCI approach normalizes the change in scores from baseline based on the variability and reliability of the measure in the healthy control population. RCI scores can then be compared to a z-distribution. We defined a clinically significant change from baseline performance as any score exceeding the 80% RCI limits (|RCI| ≥ 1.28) in the direction of impaired performance.
RCI calculations can also take into account practice effects observed in the population.51,52 Practice effects were defined as the mean change from baseline 1 to baseline 2 (μ2 − μ1) in the healthy control cohort if the 95% confidence interval of the mean change did not overlap zero. The RCI was then calculated as follows:
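As a rough illustration of a practice-adjusted RCI, the Python sketch below assumes one commonly used form: the observed change minus the practice effect, divided by a standard error of the difference derived from the baseline standard deviation and the test–retest reliability in the control cohort. The particular standard-error formula, the function names and the parameter names are assumptions and may differ from the calculation used in this study.

```python
import math


def reliable_change_index(score_1, score_2, sd_baseline, reliability,
                          practice_effect=0.0):
    """Practice-adjusted reliable change index (assumed form).

    sd_baseline: SD of baseline scores in the healthy control cohort.
    reliability: test-retest reliability (e.g. ICC) in the control cohort.
    practice_effect: mean change (mu2 - mu1) in controls, or 0 if none.
    """
    # Assumption: standard error of measurement and of the difference
    # computed from a single baseline SD.
    sem = sd_baseline * math.sqrt(1.0 - reliability)
    se_diff = math.sqrt(2.0) * sem
    return ((score_2 - score_1) - practice_effect) / se_diff


def is_impaired(rci, higher_is_better=True, z_crit=1.28):
    """Flag a change beyond the 80% RCI limit in the direction of impairment."""
    if higher_is_better:       # e.g. mean span: a drop indicates impairment
        return rci <= -z_crit
    return rci >= z_crit       # e.g. response duration: an increase does
```

For example, with `sd_baseline=1.0`, `reliability=0.5` and a practice effect of 0.3, a drop in mean span from 5.0 to 4.0 gives an RCI of about −1.3, which would be flagged as impaired.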
Age-related effects on task-performance were evaluated using least squares linear regressions of raw scores (baseline 1) and raw change scores (baseline 2 − baseline 1) in the healthy control cohort as a function of age. For parameters showing a significant relationship between age and raw score (p < 0.05), an age-adjusted score was calculated and used for subsequent reliability and RCI analyses. This adjusted score was calculated by subtracting the difference between the regression-predicted score and the sample mean score from the raw score. If there was a significant relationship between age and the raw change from baseline 1 to baseline 2, the practice effect in the RCI calculation was also adjusted for age in the same manner. In a secondary analysis, exact age matching was performed between cohorts. This was done based on age at the first assessment rounded down to the nearest year. Within each age year, athletes from each cohort were randomly matched to the other cohort, and those who could not be matched due to unequal age distributions were excluded from the age-matched analysis.
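The age adjustment described above can be sketched in Python as follows: a least squares line is fitted to the healthy control data, and each raw score is shifted by the difference between its regression-predicted value and the sample mean. Function and variable names are illustrative assumptions.

```python
def fit_age_regression(ages, scores):
    """Least squares fit of score on age; returns (slope, intercept, mean_score)."""
    n = len(ages)
    mean_age = sum(ages) / n
    mean_score = sum(scores) / n
    sxx = sum((a - mean_age) ** 2 for a in ages)
    sxy = sum((a - mean_age) * (s - mean_score) for a, s in zip(ages, scores))
    slope = sxy / sxx
    intercept = mean_score - slope * mean_age
    return slope, intercept, mean_score


def age_adjust(age, raw_score, slope, intercept, mean_score):
    """Raw score minus (regression-predicted score minus sample mean score)."""
    predicted = intercept + slope * age
    return raw_score - (predicted - mean_score)


# Made-up control data in which response duration falls by 0.05 s per year.
slope, intercept, mean_score = fit_age_regression([14, 16, 18], [1.3, 1.2, 1.1])
adjusted = age_adjust(14, 1.4, slope, intercept, mean_score)
```

In this made-up example, a 14-year-old with a raw response duration of 1.4 s is 0.1 s slower than predicted for that age, so the adjusted score sits 0.1 s above the sample mean of 1.2 s.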
To evaluate differences between groups, paired-sample/one-sample (paired data) or two-sample (unpaired data) t-tests were used for normally distributed parameters (see criteria above), while Wilcoxon signed rank (paired data) or rank sum (unpaired data) tests were used for non-normally distributed parameters. For consistency within the manuscript, summary data are presented as mean ± standard error of the mean for both normally and non-normally distributed parameters unless otherwise stated. Cohen’s d was used to evaluate the effect sizes of group differences and practice effects for normally distributed parameters, while η2 (eta squared) was used to evaluate effect sizes for non-normally distributed parameters. For descriptive purposes, effect sizes were considered small if d ≥ 0.2 or η2 ≥ 0.01; medium if d ≥ 0.5 or η2 ≥ 0.059; and large if d ≥ 0.8 or η2 ≥ 0.14.54–56
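The descriptive effect size thresholds can be applied programmatically. The Python sketch below computes Cohen’s d for two independent groups using a pooled standard deviation (an assumption; the source does not state which variant was used) and labels d or η2 values with the cut-offs given above. The label "negligible" for values below the small threshold is also an illustrative choice.

```python
import math


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD (assumed)."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd


def effect_size_label(value, kind="d"):
    """Descriptive label using the thresholds stated in the text."""
    small, medium, large = (0.2, 0.5, 0.8) if kind == "d" else (0.01, 0.059, 0.14)
    magnitude = abs(value)
    if magnitude >= large:
        return "large"
    if magnitude >= medium:
        return "medium"
    if magnitude >= small:
        return "small"
    return "negligible"  # label for sub-threshold values is an assumption
```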
Results
Recruitment of healthy control and concussion cohorts
The healthy control cohort consisted of 82 athletes who completed assessments in two consecutive seasons (baseline 1 and baseline 2) on the spatial span task without sustaining a concussion between assessments. The concussion cohort consisted of 47 athletes who were assessed at baseline 1, subsequently sustained a concussion resulting in an acute post-concussion state, and were reassessed at the acute concussion time point. The sport, sex and age of the healthy control and concussion cohorts are presented in Table 1, while the clinical characteristics of both groups are presented in Table 2. Within both groups, athletes with a self-reported prior history of concussion, defined as one or more concussions with complete symptom resolution and full unrestricted sport participation prior to participation in the study, were included (Table 2). There were no differences between groups in SCAT3 PCSS scores at baseline 1 (Table 2; W(47,81) = 3054.5; p = 1.00; η2 < 0.001). At the acute concussion time point, the concussed group had significantly higher PCSS scores relative to their own baseline 1 (Table 2; T(46) = 63.5; p < 0.001; η2 = 0.28) and relative to healthy controls at baseline 2 (Table 2; W(46,82) = 3833.5; p < 0.001; η2 = 0.42).
Participant sport, sex and age. Athletes were recruited from alpine skiing, ice hockey, sliding sports (luge, skeleton and bobsleigh), speed skating (short track) and wrestling.
Participant clinical characteristics. n = # reporting a positive history at baseline 1; SCAT PCSS: Post-concussion symptom scale on the sport concussion assessment tool (SCAT), version 3; ADHD: attention deficit hyperactivity disorder; LOC: loss of consciousness.
Relationship between demographic variables and spatial span performance in healthy control athletes
The healthy control group (n = 82) was used to evaluate the influence of age, sex and self-reported prior history of concussion on spatial span performance (Figure 2). Performance was measured through the mean span and the response duration (time taken per target). We evaluated the relationship between demographic variables and task performance at baseline 1 (Figure 2(a) and 2(c)), as well as any systematic changes in performance from baseline 1 to baseline 2 (practice effects; Figure 2(b) and 2(d)).

Demographic variables and task performance. (a) Scatter of mean span on baseline 1 for the healthy control cohort as a function of age. Box–whisker plots on the right represent the spread of scores (box: median and interquartile range; whiskers: 10th and 90th percentiles) for male vs. female athletes, and for those with or without a history of concussion prior to study participation. (b) Scatter of change in mean span from baseline 1 to baseline 2 as a function of age. (c) Scatter as in (a) for the response duration (time taken per target). Black line represents the linear regression fit: R2 = 0.067, F = 5.8, p = 0.02. (d) Scatter as in (b) for the change in response duration from baseline 1 to baseline 2. Black line represents the linear regression fit: R2 = 0.076, F = 6.6, p = 0.01.
There was no influence of age on mean span at baseline 1 (Figure 2(a); linear regression of mean span as a function of age: R2 = 0.001, F = 0.10, p = 0.75). Similarly, age did not affect the change in mean span from baseline 1 to baseline 2 (Figure 2(b); R2 = 0.009, F = 0.71, p = 0.40). There was, however, a small but significant relationship between age and response duration. While older athletes were faster at baseline 1 (Figure 2(c); R2 = 0.067, F = 5.8, p = 0.02), their response duration decreased less from baseline 1 to baseline 2 (Figure 2(d); R2 = 0.076, F = 6.6, p = 0.01). Due to the difference in age between the healthy control and concussion cohorts (Table 1; healthy control: 19.7 ± 0.6 years; concussion: 15.9 ± 0.4 years; t(127) = 4.4, p < 0.001, Cohen’s d = 0.85), we accounted for this age-related difference in response duration by calculating an age-adjusted response duration and an age-adjusted practice effect for subsequent reliability and reliable change analyses.
There were no differences in the healthy control cohort between male (n = 56) and female (n = 26) athletes on mean span for baseline 1 (Figure 2(a); males: 5.1 ± 0.1; females: 5.2 ± 0.2; t(80) = −0.47, p = 0.64, Cohen’s d = 0.11) or on the change in mean span from baseline 1 to baseline 2 (Figure 2(b); males: 0.34 ± 0.08; females: 0.20 ± 0.11; t(80) = 1.04, p = 0.30, Cohen’s d = 0.25). Similarly, male and female athletes did not differ in response duration for baseline 1 (Figure 2(c); males: 1.23 ± 0.02 s; females: 1.27 ± 0.03 s; W(26,56) = 1207, p = 0.20, η2 = 0.02) or in the change in response duration from baseline 1 to baseline 2 (Figure 2(d); males: −0.06 ± 0.02 s; females: −0.09 ± 0.03 s; W(26,56) = 944, p = 0.18, η2 = 0.02). We therefore grouped male and female athletes together in our analyses.
A self-reported history of concussion in the healthy control cohort prior to participation in the study did not affect mean span on baseline 1 (Figure 2(a); no Hx concussion, n = 28: 5.2 ± 0.1; Hx concussion, n = 54: 5.1 ± 0.1; t(80) = 0.44, p = 0.66, Cohen’s d = 0.10), response duration on baseline 1 (Figure 2(c); no Hx concussion: 1.24 ± 0.02 s; Hx concussion: 1.25 ± 0.02 s; W(28,54) = 1140, p = 0.83, η2 < 0.001), or the change in response duration from baseline 1 to baseline 2 (Figure 2(d); no Hx concussion: −0.06 ± 0.03 s; Hx concussion: −0.07 ± 0.02 s; W(28,54) = 1186, p = 0.82, η2 < 0.001). Athletes with no prior history of concussion showed a slightly greater improvement in mean span from baseline 1 to baseline 2 compared to those with a prior history of concussion (Figure 2(b); No Hx concussion: 0.5 ± 0.1; Hx concussion: 0.2 ± 0.1; t(80) = 2.00, p = 0.0484, Cohen’s d = 0.46).
Spatial span reliability in healthy controls
The mean span showed good test–retest reliability in the healthy control cohort (Figure 3(a), ICC = 0.79). The average mean span was 5.2 ± 0.1 for baseline 1 and 5.5 ± 0.1 for baseline 2. On average, mean spans increased from baseline 1 to baseline 2 by 0.3 ± 0.1 (mean ± 95% confidence interval; Figure 3(b)), suggesting that a small practice effect was present (Cohen’s d = 0.45).

Test–retest reliability of spatial span parameters. (a) Scatter of mean span for healthy controls for baseline 1 and baseline 2 (n = 82). Points on the dashed line would indicate perfect agreement between the two baselines. (b) Bland–Altman plot of mean span difference (baseline 2 − baseline 1) as a function of the average mean span from both baselines. Dashed line represents no difference. Solid lines represent mean ± 2 standard deviations (∼95% of the population) for the difference measure. Note that the mean is greater than zero, representing an increase in mean span on average from baseline 1 to baseline 2. (c) Scatter plot as in (a) for response duration. (d) Bland–Altman plot as in (b) for response duration. Note that the mean is less than zero, representing a decrease in time taken per target on average from baseline 1 to baseline 2.
The reliability of the age-adjusted response duration was moderate (Figure 3(c), ICC = 0.64). Athletes took an average of 1.24 ± 0.02 s per target at baseline 1 and 1.18 ± 0.02 s per target at baseline 2. The age-adjusted response duration decreased on average by 0.06 ± 0.03 s (mean ± 95% confidence interval; Figure 3(d)), suggesting that a small practice effect was also present for this parameter, with athletes getting faster from baseline 1 to baseline 2 (Cohen’s d = 0.44).
The concussion cohort did not differ from the healthy control cohort at baseline
There were no differences in task performance at baseline 1 between the concussion (n = 47) and healthy control (n = 82) cohorts. The mean span at baseline 1 was 5.2 ± 0.1 for both cohorts (t(127) = 0.37, p = 0.71, Cohen’s d = 0.07). The age-adjusted response duration at baseline 1 was 1.24 ± 0.02 s for healthy controls and 1.27 ± 0.02 s for the concussion cohort (W(47,82) = 3198, p = 0.49, η2 = 0.004). We performed a second analysis using an age-matched subgroup (n = 39 from each cohort, matched exactly for age in years at the first baseline), and there were also no differences in task performance between groups at baseline 1 (mean span: healthy control cohort 5.2 ± 0.1; concussion cohort 5.3 ± 0.1; paired t-test t(38) = 0.50, p = 0.62, Cohen’s d = 0.12; age-adjusted response duration: healthy control cohort 1.24 ± 0.02 s; concussion cohort 1.27 ± 0.03 s; Wilcoxon signed rank T = 322, p = 0.34, η2 = 0.01).
Effect of acute concussion on mean span and response duration
At an individual level, 19% of acutely concussed athletes (9/47) had mean span RCIs outside the 80% RCI limit in the direction of impaired performance (lower RCIs; Figure 4(b)). At a group level, a comparison of mean span RCIs in the concussed and healthy control cohorts showed no significant differences between groups in the full cohorts (healthy control cohort: 0.00 ± 0.12; concussed cohort: −0.28 ± 0.16; t(127) = 1.41, p = 0.16, Cohen’s d = 0.26). The same results were found for the exactly age-matched subgroup (healthy control cohort: −0.01 ± 0.16; concussed cohort: −0.26 ± 0.17; paired t-test t(38) = 0.98, p = 0.33, Cohen’s d = 0.24). Prior history of concussion did not significantly affect the acute concussion mean span RCI (no Hx concussion: RCI = −0.58 ± 0.26, n = 22; Hx concussion: RCI = −0.01 ± 0.17, n = 25; t(45) = 1.87, p = 0.07, Cohen’s d = 0.56).

Spatial span performance following sport-related concussion. (a) Scatter of mean span for athletes at baseline 1 and at the acute concussion time point (n = 47). Points falling on the dashed line would indicate perfect agreement between the two assessments. Circles represent athletes with no prior history of concussion (n = 22), while squares represent athletes with a history of concussion prior to the first baseline (n = 25). (b) RCI values for mean span at the acute concussion time point plotted as a function of mean span at baseline 1. Dashed line represents the expected average change. Solid lines represent the 80% RCI limits (±1.28). Athletes falling outside of these limits in the direction of worse performance (i.e. lower mean span) would be considered impaired (grey box). Right y-axis shows the conversion from RCI values to the raw change in mean span from baseline 1 to acute concussion. (c) As in (a) for age-adjusted response duration. (d) As in (b) for the age-adjusted response duration. Note that the direction of impaired performance is opposite for this parameter – an increase in the response duration would suggest impairment. Also note that the raw change values are approximate and represent averages because the practice effect was age-adjusted. The raw change values would be shifted down for younger athletes (larger practice effect, see Figure 2(d)) and shifted up for older athletes (smaller practice effect, see Figure 2(d)).
At an individual level, 9% of athletes (4/47) had age-adjusted response durations outside the 80% RCI limit in the direction of impaired performance (higher RCIs; Figure 4(d)). At a group level, there was no difference in RCI values for age-adjusted response duration between the concussed and healthy control cohorts (healthy control cohort: 0.00 ± 0.11; concussed cohort: 0.06 ± 0.14; t(127) = 0.31, p = 0.76, Cohen’s d = 0.06). The same results were found for the age-matched subgroup (healthy control cohort: 0.05 ± 0.14; concussed cohort: 0.03 ± 0.15; paired t-test: t(38) = 0.08, p = 0.93, Cohen’s d = 0.02). Prior history of concussion did not affect the acute concussion age-adjusted response duration RCI (no Hx concussion: RCI = −0.08 ± 0.25, n = 22; Hx concussion: RCI = 0.18 ± 0.15, n = 25; t(45) = 0.89, p = 0.37, Cohen’s d = 0.27).
Influence of symptom burden or timing post-concussion on task performance
We found no significant relationship between concussion symptom burden (PCSS) and mean span RCI (Figure 5(a); linear regression R2 = 0.0003; F = 0.013; p = 0.91) or age-adjusted response duration RCI (Figure 5(b); linear regression R2 = 0.08; F = 3.66; p = 0.06). Similarly, we found no relationship between days post-injury at testing and the mean span RCI (linear regression R2 = 0.01; F = 0.44; p = 0.51) or the age-adjusted response duration RCI (linear regression R2 = 0.0002; F = 0.011; p = 0.92).

Relationship between symptom severity and task performance. (a) Scatter of RCI for mean span at the acute concussion time point as a function of post-concussion symptom severity score (PCSS). Vertical dashed lines represent cut-off points for symptom severity categories used in Figure 4 (PCSS 0; PCSS 1; PCSS 2). Horizontal dashed lines represent the 80% RCI limits. Athletes falling outside of these limits in the direction of worse performance (decreased mean span) would be considered impaired (grey box). (b) As in (a) for response duration. RCI values have been flipped on the y-axis so that the direction of impairment is the same for both panels.
Effect of serial position
Visual inspection of the serial position curves for sequence lengths 4 to 8 on baseline 1 in the healthy control cohort (Figure 6(a)) revealed that accuracy at each serial position was affected by both the position within the sequence and the length of the sequence. There was a strong primacy effect for position 1 of the sequence for all sequence lengths. Athletes made correct selections at the first position over 85% of the time for all sequence lengths (length 4: 90 ± 2%, n = 82; length 5: 93 ± 1%, n = 82; length 6: 90 ± 1%, n = 81; length 7: 86 ± 3%, n = 69; length 8: 88 ± 4%, n = 39). In addition, there was a small recency effect for the last one to two positions in the sequence for sequence lengths greater than 4. Accuracy tended to decrease linearly with serial position until the final one to two positions in each sequence, which were associated with a reversal of the downward trend in accuracy, producing a bowed serial position curve. Finally, there was an effect of list length, such that accuracy was worse at equivalent serial positions for longer list lengths. These primacy, recency and sequence length effects were evident across both healthy controls and concussed cohorts (Figure 6(b)).

Serial position curve. (a) Correct selections for healthy controls at baseline 1 as a function of serial position in the sequence for sequence lengths from 4 to 8. Note the strong primacy effect for serial position 1 for all sequence lengths, as well as a recency effect for the last one to two positions in the sequence for longer sequence lengths. (b) Serial position curves for sequence lengths 5 to 8 for the healthy control cohort at baseline 1 and baseline 2, as well as for the concussion cohort at baseline 1 and at acute concussion.
Frequency of error subtypes
Errors in serial recall of spatial locations can be categorized into various subtypes (Figure 7(a)). Any combination of one or more errors on a given trial will result in an incorrect response. In the healthy control cohort at both baseline 1 and baseline 2, transposition errors were the most common, occurring on 4.9 ± 0.2% (baseline 1) and 4.7 ± 0.2% (baseline 2) of all targets presented. Note that although transposition errors involve two targets in the sequence, they were counted as a single error. Transposition errors were followed in frequency by substitution errors (3.6 ± 0.2% and 3.5 ± 0.2%), omission errors (2.4 ± 0.2% and 2.3 ± 0.2%), relocation errors (1.8 ± 0.2% and 1.9 ± 0.1%) and addition errors (0.9 ± 0.1% and 1.0 ± 0.1%).

Frequency of specific error types. (a) Frequency of specific types of errors for both cohorts at both time points. Note the increased rate of substitution errors in the concussion cohort at baseline 1. (b) Histograms of the frequency of different rates of substitution errors across both cohorts and time points. Bar colors for each cohort are consistent with (a), while history of concussion is indicated with hash marks. Note the outliers with a high rate of substitution errors in the concussion cohort at baseline 1.
In the healthy control cohort, error rates did not change for any error type from baseline 1 to baseline 2 (Figure 7(a); Wilcoxon signed rank tests; all p values > 0.32). However, for the concussed cohort, substitution error rates were different between the baseline and the acute concussion time points, with a higher rate of substitution errors occurring at baseline (Figure 7(a); baseline: 4.6 ± 0.4%; concussion: 3.5 ± 0.3%; T = 322, z = 2.39, p = 0.02, η2 = 0.06). No other error types changed from the baseline to the acute concussion time points (Wilcoxon signed rank tests, all p values > 0.14). Note that some error rate distributions fit our criteria for normality, but the majority did not and therefore for consistency all error rate analyses were non-parametric. Visual inspection of the distribution of substitution error frequencies in each cohort (Figure 7(b)) suggested that the increase in substitution error rate at baseline 1 for the concussed cohort was due to a longer tail in the distribution representing a small number of athletes with high substitution error rates. Concussion history did not significantly affect substitution error rate at any of the time points (Wilcoxon rank sum tests, all p values > 0.08).
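For readers unfamiliar with the paired non-parametric comparison used above, a minimal standard-library sketch of the Wilcoxon signed rank test with a normal approximation is given below. It is a textbook formulation and not necessarily the exact procedure of the statistical software used in the study: zero differences are dropped, ties share their average rank, and no tie or continuity correction is applied. The function name and the example values are hypothetical.

```python
from math import sqrt


def wilcoxon_signed_rank(before, after):
    """Wilcoxon signed rank test for paired samples (normal approximation).

    Returns (T, z), where T is the smaller of the positive and negative rank
    sums. Zero differences are dropped; tied absolute differences share their
    average rank. No tie or continuity correction is applied to z.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    t_stat = min(w_plus, w_minus)
    mean_t = n * (n + 1) / 4
    sd_t = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t_stat - mean_t) / sd_t
    return t_stat, z


# Hypothetical substitution error rates (%) for five athletes at two time points.
baseline_rates = [4.0, 6.5, 3.0, 9.0, 5.0]
acute_rates = [3.5, 4.0, 3.2, 5.0, 4.1]
t_stat, z = wilcoxon_signed_rank(baseline_rates, acute_rates)
print(t_stat, round(z, 2))
```

Because T is taken as the smaller rank sum, z is never positive in this sketch; software that reports the magnitude of z, or that bases it on the positive rank sum, can give the opposite sign for the same data.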
Discussion
In the present study, we characterized the reliability of performance on a spatial span task in healthy athletes, and used the performance parameters to quantify impairments in visuospatial working memory in athletes following sport-related concussion. Although we found good reliability for the mean span, and moderate reliability for the response duration, few athletes showed clinically significant changes in either parameter acutely following concussion. Furthermore, there was no influence of symptom severity or time since injury on spatial span performance at the acutely concussed time point.
We hypothesized that concussed athletes would show impaired spatial span performance based on work showing impairments in other working memory tasks (e.g. n-back task) 34 and visual working memory without a spatial component. 33 Yet, while some studies have found impaired spatial working memory in subjects with a history of concussion, 36 others have demonstrated no difference between concussed and healthy groups.41,57 Our investigation differed from these prior studies because we compared spatial span performance following acute concussion to individual baseline performance, allowing us to account for inter-individual differences in pre-injury performance. 58 It was therefore surprising that only 19% of acutely concussed athletes showed clinically significant impairments in mean span, and that there were no group-level differences between healthy and concussed cohorts in mean span or response duration RCIs.
The absence of group-level differences in spatial span performance between healthy and concussed cohorts in our study is in contrast to the findings of Chuah et al. 36 who found group-level spatial span differences with a large effect size (η2 = 0.19) between healthy university students and those with a history of concussion (0.5–6 years post-injury). There were some important differences between our studies that might account for these different results. First, their task assessed memory for location alone, as opposed to location and order of presentation in our study. Second, their subjects may have had more severe injuries than those in our cohort. While LOC is not necessarily associated with a more severe injury, 59 the fact that LOC was reported in 69% of their concussed cohort but only 16% of ours suggests potential sample differences that could account for differing results.
We found that athletes with a history of concussion did not improve as much on mean span from baseline 1 to baseline 2 compared to those with no history of concussion in the healthy control cohort. We did not find any other significant differences between athletes with or without a history of concussion. However, there was a trend (p = 0.07) towards athletes with no prior history of concussion having worse acute concussion mean span RCIs, with a moderate effect size (Cohen’s d = 0.56). Although this relationship is in the opposite direction to what might be expected, one explanation could be higher baseline performance for athletes without a history of concussion, which would make their acute concussion performance look worse in the RCI calculation. In fact, athletes without a history of concussion did have higher mean spans at baseline 1 than those with a history of concussion in both cohorts, although the differences were not significant.
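The dependence of the RCI on baseline performance can be illustrated with a short sketch. The formula below is one common practice-adjusted formulation of the reliable change index and is an assumption here, not necessarily the exact calculation used in this study; the standard deviation and the athlete scores are hypothetical, and the reliability value of 0.79 is the intraclass correlation coefficient reported for mean span.

```python
from math import sqrt
from statistics import NormalDist


def reliable_change_index(baseline, retest, sd_baseline, reliability,
                          practice_effect=0.0):
    """Practice-adjusted reliable change index (one common formulation)."""
    sem = sd_baseline * sqrt(1 - reliability)  # standard error of measurement
    se_diff = sqrt(2) * sem                    # standard error of the difference
    return (retest - baseline - practice_effect) / se_diff


# Two-sided 80% confidence limits leave 10% in each tail (z of about 1.28).
LIMIT_80 = NormalDist().inv_cdf(0.90)


def is_impaired(rci, lower_is_worse=True):
    """Flag scores beyond the 80% limit in the direction of worse performance."""
    return rci < -LIMIT_80 if lower_is_worse else rci > LIMIT_80


# Two hypothetical athletes with the same acute mean span but different baselines.
SD_BASELINE = 1.0   # hypothetical between-subject SD of mean span
RELIABILITY = 0.79  # ICC reported for mean span
rci_lower_baseline = reliable_change_index(6.0, 5.5, SD_BASELINE, RELIABILITY)
rci_higher_baseline = reliable_change_index(6.5, 5.5, SD_BASELINE, RELIABILITY)
print(is_impaired(rci_lower_baseline), is_impaired(rci_higher_baseline))
```

Under these assumptions, the athlete with the higher baseline falls outside the 80% limit while the athlete with the lower baseline does not, even though their acute scores are identical.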
One possible inference from our results is that sport-related concussion does not impair spatial working memory in most individuals. However, it is also possible that the spatial span task was not sufficiently difficult, and that concussed athletes had enough cognitive reserve to compensate for concussion-related changes in brain function. Indeed, several studies have shown evidence of altered neurophysiological function during working memory tasks, measured through functional magnetic resonance imaging, event-related potentials, or event-related spectral perturbations, in the absence of impaired task performance following concussion.60–64 As well, one study has found impaired auditory discrimination when combined with a spatial span task, but not when performed alone. 65
Despite our finding that few athletes showed impairments in spatial span performance acutely following concussion, some prior work has shown that this task might be a good test of malingering. Specifically, subjects instructed to malinger showed an increased rate of substitution errors on a computerized spatial span task. 41 Athletes might have an incentive to malinger on their baseline assessment so that they may return to play faster if they sustain a concussion. Interestingly, we did find a higher rate of substitution errors in the concussed cohort on their baseline test. However, it is not clear why the concussed cohort in our sample would be more likely than the healthy control cohort to malinger on their first assessment, prior to sustaining a concussion. We did not have sufficient representation across sports to examine whether there was any relationship between substitution error rate and sport. There were athletes with high substitution error rates at the first baseline (highest 10% calculated across both cohorts) representing most sports included in the study (hockey, speed skating, sliding and alpine).
Our motivation for the present study stemmed largely from the current lack of gold-standard assessment tools for sport-related concussion, and the reliance on self-report of symptoms and observation of clinical signs for diagnosis following concussion. Automated robotic testing has the potential to be objective, reliable, and efficient. However, the heterogeneity of impairments following concussion means that a testing battery must be multi-modal, spanning domains such as sensorimotor, cognitive, oculomotor and vestibular function. While we had hypothesized that the addition of a spatial working memory task would improve our KINARM robotic testing battery for acute sport-related concussion, we found that this task had limited clinical utility. However, there may be some value in this task for identification of malingering. Future work could also look at incorporating the spatial span task into a dual task to increase the task demands.
There were some limitations in the current study. First, there was heterogeneity in our sample in terms of age and sport, and the healthy control group were significantly older than the concussed group. This was likely due to a logistical consideration – that the older athletes were more likely to be competing internationally at the time of injury and therefore were less likely to be seen for assessment at the acute concussion time point. We tried to mitigate this limitation by calculating age-adjusted scores and age-adjusted practice effects for any parameters that showed significant effects of age. However, this assumes that the true effects of age are linear, which might not be the case across development. We also performed group-level analyses on an age-matched subgroup and found no differences. However, it might be useful for future studies with larger sample sizes to evaluate adolescents and adults as separate groups. Secondly, the time between tests was shorter for the concussed group than for the healthy control group, due to our study design of conducting baseline testing on a yearly basis. However, work from our group has found that test–retest reliability and practice effects for other KINARM tasks are comparable for short (∼2–3 months) and long (∼1 year) testing intervals. 19 A third limitation was that the symptom severity in our concussed group was quite low relative to some other published studies. For example, in a study that found impairments in attention and executive function in concussed adolescents, the average PCSS score was 43.0 at three days and 30.8 at seven days post-injury. 66 By comparison, our sample had an average PCSS score of 16.7 at the acute time point (3.5 ± 0.2 days post-injury). The wide range of reported symptom severity is an important consideration when interpreting results across diverse concussion studies. 
The relatively low symptom burden in our study could reflect a culture of increased awareness of concussion in the sports community and more conservative management of reporting suspected concussive injuries. However, we did not find a correlation between spatial span performance and symptom severity, suggesting that low symptom burden might not have affected our results.
In summary, we found a spatial span task to be reliable across seasons in a group of healthy athletes. However, the spatial span showed limited clinical utility in identifying impairment in the acute period following a sport-related concussion. There was some evidence that a small number of athletes in our sample may have been malingering on their baseline tests. It is unclear from our results whether acute concussion spares spatial working memory in a large percentage of athletes, or whether our spatial span task was not sufficiently difficult to detect subtle working memory impairments that do exist. Future work should resolve this question by looking at spatial working memory in more demanding conditions, such as dual tasks.
Footnotes
Acknowledgements
We would like to acknowledge the participating athletes, coaches and team therapists; Kerri Downer for contributing to data collection; Jennifer Semrau, Rachel Hawe and Helen Bretzke for helpful comments on analysis algorithms; and Duncan MacLean, Ian Brown and Mark Piitz for technical help with the KINARM robot.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: SHS is cofounder and chief scientific officer of BKIN Technologies Ltd., the company that commercializes the KINARM device used in this study. BWB previously held a contract with BKIN Technologies Ltd. that included the potential of a small future royalty for assistance with development of the KINARM end-point robot for use in acute sport concussion assessment. This contract expired prior to the data collection for the present study and did not result in any payment.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by Own the Podium Canada, the Canadian Academy of Sport and Exercise Medicine Research Fund, the Canadian Sport Institute Calgary and Jim Smith of Calgary, Alberta. TAW was funded by a Mitacs fellowship in partnership with Own the Podium Canada. CSM was funded by post-doctoral fellowships from the Canadian Institutes of Health Research and Alberta Innovates Health Solutions.
