Abstract
Background. Some noninvasive brain–computer interface (BCI) systems are currently available for people with locked-in syndrome (LIS), but none have incorporated a statistical language model during text generation. Objective. To begin to address the communication needs of individuals with LIS using a noninvasive BCI that combines rapid serial visual presentation (RSVP) of symbols with a unique classifier fusing electroencephalography (EEG) and a language model. Methods. The RSVP Keyboard was developed with several unique features. Individual letters are presented at 2.5 per second. Letters are classified as targets or nontargets from the EEG using machine learning that incorporates a language model for letter prediction via Bayesian fusion, so that each target needs to be presented only 1 to 4 times. Nine participants with LIS and 9 healthy controls were enrolled. After screening, participants first calibrated the system and then completed a series of balanced word generation mastery tasks designed with 5 incremental levels of difficulty; difficulty was increased by selecting phrases for which the utility of the language model decreased naturally. Results. Six participants with LIS and 9 controls completed the experiment. All 6 participants with LIS mastered spelling at level 1, and one of them achieved level 5. Six of 9 control participants achieved level 5. Conclusions. Individuals who have incomplete LIS may benefit from an EEG-based BCI system that relies on EEG classification and a statistical language model. Steps to further improve the system are discussed.
Keywords
Introduction
Locked-in syndrome (LIS) consists of tetraplegia and anarthria with preserved consciousness and has 3 levels of severity. Classical LIS describes individuals whose voluntary movement is limited to blinking and vertical eye movements. Incomplete LIS refers to individuals who demonstrate voluntary movement other than blinking or eye movement, and total LIS to those without any voluntary muscle function whatsoever.1,2 LIS etiologies include brainstem stroke, traumatic brain injury, and neurodegenerative conditions such as advanced amyotrophic lateral sclerosis.3 Incomplete LIS can be defined functionally as a condition in which individuals cannot consistently rely on oral motor speech or upper extremity function to meet environmental control or communication needs. Under this functional definition, these disabilities may also result from cerebral palsy, muscular dystrophy, multiple sclerosis, Parkinson’s disease, Parkinson’s plus syndromes, and brain tumors, in addition to the etiologies above. This substantially increases the number of individuals who fit within a definition of LIS and could benefit from a brain–computer interface (BCI), and it offers a broad perspective on their functional status for rehabilitation and medical management.
Ischemic stroke is the most common cause of classical LIS, which has a prevalence of 1 to 2 per million.4 Incomplete LIS, which includes additional diagnoses, has an uncertain but substantially greater prevalence. The usual age of onset of LIS ranges from 17 to 52 years.5-7 Younger patients have a better prognosis for survival, with more than 85% of individuals still living 10 years after onset.5,6 With advances in medical technology, life expectancy will likely increase.
Expressive communication (both speech and writing) is a significant challenge for individuals with LIS. People with classical LIS rely on blinking or eye movements to communicate via yes/no responses or partner-assisted communication methods, or to control a speech-generating device.3,8,9 Individuals who present with incomplete LIS may have additional options for gestural communication or alternative access to a speech-generating device.10,11 However, even these methods may be unreliable because of fatigue or variability in motor function,2 and those with degenerative conditions may transition to total LIS and lose the ability to communicate even through blinking or eye movements.12 Current efforts in assistive technology have resulted in new access methods for people with severe neuromuscular impairments.13,14 BCI is a promising option for people with LIS.
A brain–computer interface uses brain signals to provide a nonmotor communication channel for people with severely limited motor control. Considerable research effort is being invested in electroencephalography (EEG)-based BCIs, using both noninvasive scalp recordings and invasive electrocorticography, in humans as well as in animal models.15 Among noninvasive EEG-based BCI options, the most commonly used spelling interface is the P300 speller in BCI2000.16,17 The P300 response has been shown to be a reliable signal for controlling a BCI for a number of functions, including text generation.17 The P300 speller presents a grid of characters arranged in a 6 × 6 matrix. Rows and columns flash in random order; because the target cell lies at the intersection of one row and one column, any given flash contains the target with a probability of 1/6. The rare brightening of the target stimulus elicits a P300,18 which is identified by the computer program and interpreted as a “keystroke.”19 A second spelling application is the Berlin BCI, Hex-o-spell.20,21 In the usual configuration, a user focuses on 6 hexagonal fields surrounding a circle, each containing 5 letters or other symbols. To select a symbol, the user imagines directing a small arrow to the desired field. Successful imagination results in that field being chosen. All other hexagons are then cleared, and the 5 symbols of the selected hexagon are moved to a new set of 6 individual hexagons. The user then repeats the same procedure to select one symbol.20
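To make the row/column logic concrete, the following Python sketch simulates a 6 × 6 speller under simplifying assumptions. The grid labels, the score model (a fixed `p300_gain` added to flashes that contain the target, plus Gaussian noise), and the function names are hypothetical and are not taken from BCI2000. Each repetition flashes all 6 rows and 6 columns once, so 2 of the 12 flashes contain the target (probability 1/6), and the selected cell is the intersection of the best-scoring row and the best-scoring column.

```python
import numpy as np

# Illustrative 6 x 6 layout (hypothetical; real spellers vary).
GRID = np.array([list("ABCDEF"), list("GHIJKL"), list("MNOPQR"),
                 list("STUVWX"), list("YZ1234"), list("56789_")])

def simulate_flash_scores(target, n_repetitions, p300_gain=1.0, noise_sd=1.0, seed=0):
    """Accumulate simulated classifier scores for the 6 rows and 6 columns.

    Hypothetical model: a flash containing the target adds p300_gain to its
    score, and every flash adds Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    r, c = target
    row_scores, col_scores = np.zeros(6), np.zeros(6)
    for _ in range(n_repetitions):
        # Each repetition flashes all 12 lines (6 rows, then indices 6-11 = columns) in random order.
        for line in rng.permutation(12):
            score = rng.normal(0.0, noise_sd)
            if line < 6:
                row_scores[line] += score + p300_gain * (line == r)
            else:
                col_scores[line - 6] += score + p300_gain * (line - 6 == c)
    return row_scores, col_scores

def decode(row_scores, col_scores):
    """The selected cell is the intersection of the best row and the best column."""
    return str(GRID[int(np.argmax(row_scores)), int(np.argmax(col_scores))])

rows, cols = simulate_flash_scores(target=(2, 3), n_repetitions=15, p300_gain=3.0)
print(decode(rows, cols))
```

With a strong simulated P300 and 15 repetitions, the accumulated target row and column scores stand far above the others, so the decoded cell is the intended one ("P" in this layout).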
Current noninvasive BCIs allow people with LIS to access letters for communication and computer control. However, these systems do not integrate a language model with signal detection for letter selection, although some systems have used predictive spelling after the classifier has decided on a letter.22 Statistical language models assign probabilities to text. High-utility probabilistic language models can be estimated from a large sample of text in any language by counting how often letters occur in particular contexts. They are important components of many computer-based natural language applications, including speech recognition, machine translation, and optical character recognition. They are also used to make text entry more efficient in word processors and in text messaging on cell phones. These same statistical language models are now frequently used to speed up text entry in non-BCI communication devices for individuals with severe speech and language disorders.23
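As a minimal illustration of estimating a language model by counting, the sketch below builds a character n-gram model with add-alpha smoothing over the 26 letters plus “_” for space. The function name, the toy training text, and the smoothing choice are assumptions for illustration only; the model used in this study conditions on the previous 5 letters and was trained on a far larger corpus, and its smoothing method is not described here.

```python
from collections import Counter, defaultdict

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ_"  # 26 letters plus "_" for space

def train_char_ngram(text, order=3, alpha=0.01):
    """Return prob(context, char) for a character n-gram model (order >= 2).

    Counts how often each character follows each (order - 1)-character context
    and applies add-alpha smoothing so unseen characters keep nonzero probability.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    pad = "_" * (order - 1)            # treat the start of text as preceded by spaces
    padded = pad + text
    counts = defaultdict(Counter)
    for i in range(order - 1, len(padded)):
        counts[padded[i - order + 1:i]][padded[i]] += 1

    def prob(context, char):
        ctx = (pad + context)[-(order - 1):]   # keep only the last order-1 characters
        c = counts.get(ctx, Counter())
        return (c[char] + alpha) / (sum(c.values()) + alpha * len(ALPHABET))

    return prob

prob = train_char_ngram("THE_CAT_AND_THE_HAT_SAW_THE_DOG_")
print(prob("TH", "E"), prob("TH", "A"))
```

In this toy text "TH" is always followed by "E", so the model assigns "E" most of the probability mass in that context, and the distribution over the 27 symbols still sums to 1 in every context, including unseen ones.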
This article describes the development and implementation of the RSVP Keyboard, the first BCI device for people with LIS that tightly fuses a language model with an EEG classifier for effective and efficient spelling and expressive communication.
Methods
Participants
Participants with LIS were recruited through the ALS Center of Oregon, the ALS Association Oregon and SW Washington Chapter, and the outpatient Neurology and Augmentative and Alternative Communication clinics at Oregon Health & Science University (OHSU). Control participants were a convenience sample of people without disabilities. Participants with LIS met the following inclusion criteria: (1) diagnosis by a neurologist, (2) age between 18 and 75 years, (3) ability to participate in 1- to 3-hour experimental sessions, (4) literacy in English, (5) adequate vision and hearing, (6) speech that is understood less than 25% of the time (as assessed by the referring speech-language pathologist) or (7) severely reduced hand function for writing and/or typing, and (8) willingness to be videotaped for research purposes. Participants in the control group met criteria 2 through 5. All participants completed the RSVP screening protocol, which assessed the skills required to use the RSVP Keyboard.
The screening protocol covers history and participant/caregiver perception of current sensory abilities; performance on subtests from existing standardized assessment instruments for auditory comprehension, reading, and spelling; and performance on novel tasks developed to screen sustained visual attention, working memory, and the ability to perceive stimuli in all 4 visual quadrants.24,25 The screening protocol required only minimal movement responses and was completed by all participants with LIS.
This study was approved by the institutional review board at OHSU, and all participants provided informed consent. Participants with LIS, communicating via yes/no signals or other alternative means, authorized a relative or caregiver to sign the consent forms on their behalf.
Procedure
The experimental sessions were conducted in the residences of the participants with LIS and at OHSU for the nondisabled group. EEG was recorded using a 16-channel g.USBamp (g.tec, Graz, Austria) with active electrodes mounted in a cap at approximate International 10-20 system locations. The reference electrode was placed at TP10 and the ground at Fpz. The raw EEG was visually inspected for gross signal quality. The classifier analyzed a 500-ms window of EEG following each character presentation, which reduces the detection problem to a binary classification problem: deciding whether each stimulus is a target or a nontarget.26
Stimuli were presented in an RSVP (rapid serial visual presentation) paradigm.26 The stimuli consisted of the 26 letters plus “<” for deleting the previous character and “_” for space, and subtended a visual angle of 3.8°. Characters were presented one at a time at 2.5 per second, each remaining on screen for 400 ms. The stimuli were presented on an 18-inch laptop computer monitor positioned 75 cm from the participants.
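The reported visual angle and viewing distance fix the physical character size through the standard relation θ = 2·arctan(h / (2d)). The sketch below assumes that the 3.8° refers to character height (the text does not say which dimension), and it also checks that 2.5 characters per second gives a 400-ms onset-to-onset interval, matching the 400-ms on-screen duration so that characters follow one another without a blank gap. The function names are illustrative.

```python
import math

def size_for_angle_cm(angle_deg, distance_cm):
    """Physical size subtending angle_deg at distance_cm: h = 2 d tan(theta / 2)."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)

def angle_for_size_deg(size_cm, distance_cm):
    """Inverse relation: theta = 2 arctan(h / (2 d))."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

height_cm = size_for_angle_cm(3.8, 75)   # about 4.98 cm at 75 cm
onset_interval_ms = 1000 / 2.5           # 400 ms between character onsets
print(f"{height_cm:.2f} cm, {onset_interval_ms:.0f} ms")
```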
Participants first completed a calibration session, in which the classifier was trained on EEG recorded while they viewed 75 sequences of the 28 characters. Before each sequence, participants were briefly shown the target letter or character they were instructed to detect.
Mastery Task
After calibrating the RSVP Keyboard, participants performed the mastery task, which was designed to provide practice opportunities to improve user performance. Varying levels of contribution from the language model (described in detail below) were used to minimize errors and reduce frustration, encouraging further practice. During the task, participants were presented with a preselected set of balanced phrases27 one at a time and were asked to copy a target word from each phrase using the RSVP Keyboard. They were instructed to correct any errors by selecting “<”. The 28 characters were presented in 2 blocks of 14, with a fixed 1-second pause between blocks and between sequences. Characters were presented in random order, with the constraint that each of the 28 characters appeared once in every sequence. One to 4 sequences were presented for each classifier decision: a decision, with input from the language model, was made after a sequence once the probability of the most likely character reached a preset threshold, or after the fourth sequence. After classification, a decision screen displayed the character chosen by the classifier, followed by a 1-second pause before moving on to the next character.
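A minimal sketch of how one sequence could be assembled under the constraints described above: all 28 symbols appear exactly once in random order and are split into 2 blocks of 14. The function names, the use of Python's `random` module, and the assumption that each sequence includes 2 one-second pauses (one after each block) are illustrative choices, not details of the RSVP Keyboard software. Under these assumptions a sequence lasts about 13.2 s, so 1 to 4 sequences per decision correspond to roughly 13 to 53 s before the decision screen.

```python
import random

SYMBOLS = [chr(ord("A") + i) for i in range(26)] + ["<", "_"]  # letters, delete, space

def make_sequence(rng, block_size=14):
    """One sequence: every symbol exactly once, in random order, split into blocks."""
    order = SYMBOLS[:]            # copy so the master list is not reordered
    rng.shuffle(order)
    return [order[i:i + block_size] for i in range(0, len(order), block_size)]

def sequence_duration_s(n_symbols=28, rate_hz=2.5, n_blocks=2, pause_s=1.0):
    """Presentation time plus one pause after each block (assumed pause structure)."""
    return n_symbols / rate_hz + n_blocks * pause_s

rng = random.Random(1)
blocks = make_sequence(rng)
print([len(b) for b in blocks], sequence_duration_s())
```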
The program moved on to the next phrase when one of the following criteria was met: the target word was copied correctly, the participant had spent 10 minutes attempting to type the same word, or the number of presented sequences exceeded 8 times the number of letters in the target word. Each of the 5 mastery task levels included 3 sets of 3 phrases. A level was considered successfully completed if the participant accurately generated the target word for 2 of the 3 phrases in a set. If the participant did not successfully complete the first set on a level, he or she could attempt up to 2 more sets at that level. The mastery task continued until the participant completed the fifth level, failed to pass a level, or opted to end the session. Error rates were calculated using the total error rate formula.28
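The phrase-stopping rule, the level-passing rule, and the error metric can be written compactly. This Python sketch uses hypothetical function names. The total error rate is implemented in the widely used Soukoreff–MacKenzie form, (INF + IF) / (C + INF + IF) × 100, which we assume is the formula cited; C counts correct characters, INF incorrect characters left unfixed, and IF incorrect characters that were later fixed (here, via “<”).

```python
def phrase_finished(word_correct, elapsed_s, n_sequences, target_word):
    """True when the program should move on to the next phrase."""
    return (word_correct                              # target word copied correctly
            or elapsed_s >= 10 * 60                   # 10 minutes on the same word
            or n_sequences > 8 * len(target_word))    # sequence budget exceeded

def level_passed(set_results, needed=2, max_sets=3):
    """set_results: one list of 3 booleans per attempted set (word copied correctly?).
    A level is passed if any of up to 3 sets has at least 2 of its 3 words correct."""
    return any(sum(s) >= needed for s in set_results[:max_sets])

def total_error_rate(c, inf, fixed):
    """Total error rate in percent: (INF + IF) / (C + INF + IF) * 100."""
    total = c + inf + fixed
    return 100.0 * (inf + fixed) / total if total else 0.0

# "NOT" has 3 letters, so its budget is 8 * 3 = 24 sequences.
print(phrase_finished(False, 120, 25, "NOT"))
print(level_passed([[True, False, False], [True, True, False]]))
print(total_error_rate(c=8, inf=0, fixed=2))
```

Here the first set fails (1 of 3) but the second set passes (2 of 3), so the level counts as completed; 2 fixed errors among 10 entered characters give a total error rate of 20%.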
The EEG was sampled at 256 Hz and band-pass filtered from 2.5 to 44 Hz. A sequence was rejected as artifact-contaminated, and the trial was repeated, if the average amplitude of any channel exceeded 40 µV. The 500-ms segments from each channel following each character presentation were further processed. Principal component analysis was applied as a linear dimension reduction to remove (near-)zero-variance directions; this is essentially equivalent to applying a bank of eigenfilters to the EEG and downsampling their outputs to obtain features. Directions with a variance lower than 10⁻⁵ of the maximum variance were removed. The energy in the removed directions was not exactly zero, but it was assumed to be negligible compared with that of the higher energy components. After downsampling to 128 Hz, each channel's half-second window contains 64 time samples, and this reduction brought each channel from 64 to approximately 48 dimensions. The total number of features (dimensions) for each trial was therefore approximately 800 (16 channels × about 48). Finally, for each stimulus, the aggregate feature vector from all channels was projected onto a 1-dimensional score using a regularized discriminant analysis (RDA) classifier.29
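The per-channel dimension reduction can be sketched with NumPy as follows. The sketch assumes epochs that have already been band-pass filtered and downsampled to 128 Hz (64 samples per 500-ms window), keeps the principal directions whose variance is at least 10⁻⁵ of the largest, and concatenates the projected channels into one feature vector. The artifact check is one possible reading of the 40-µV rule (mean absolute amplitude per channel); that reading and all function names are assumptions.

```python
import numpy as np

def is_artifact(epoch_uv, limit_uv=40.0):
    """epoch_uv: (channels, samples) in microvolts. Assumed reading of the rule:
    flag the epoch if any channel's mean absolute amplitude exceeds limit_uv."""
    return bool(np.any(np.mean(np.abs(epoch_uv), axis=1) > limit_uv))

def fit_channel_pca(epochs, rel_tol=1e-5):
    """epochs: (trials, channels, samples). For each channel, return the principal
    directions whose variance is at least rel_tol times that channel's largest variance."""
    projections = []
    for ch in range(epochs.shape[1]):
        x = epochs[:, ch, :] - epochs[:, ch, :].mean(axis=0)
        cov = x.T @ x / (x.shape[0] - 1)          # samples x samples covariance
        eigval, eigvec = np.linalg.eigh(cov)      # eigenvalues in ascending order
        keep = eigval >= rel_tol * eigval.max()   # drop (near-)zero-variance directions
        projections.append(eigvec[:, keep])
    return projections

def extract_features(epochs, projections):
    """Project each channel onto its retained directions and concatenate channels."""
    return np.concatenate([epochs[:, ch, :] @ p for ch, p in enumerate(projections)], axis=1)

rng = np.random.default_rng(0)
epochs = rng.normal(size=(200, 2, 64))                                     # 200 trials, 2 channels
epochs[:, 1, :] = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 64))  # rank-10 channel
proj = fit_channel_pca(epochs)
features = extract_features(epochs, proj)
print([p.shape[1] for p in proj], features.shape)
```

In the synthetic example, the full-rank channel keeps all 64 directions while the rank-10 channel keeps only 10, mirroring how the study's channels shrank from 64 to about 48 dimensions.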
The signal statistics required for principal component analysis and RDA were learned during the calibration session described above, and the RDA model was fit to the calibration data. Classifier accuracy during calibration was estimated as the area under the receiver operating characteristic curve (AUC), which plots the true positive rate against the false positive rate, for target versus nontarget classification of the calibration data under 10-fold cross-validation.
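The RDA step and the cross-validated AUC can be sketched as below. The covariance regularization follows Friedman's RDA (shrinkage `lam` of each class scatter toward the pooled scatter, then `gamma` toward a scaled identity), and the 1-dimensional score is the log-likelihood ratio of the two Gaussian class models. The parameter values, function names, and the choice to pool held-out scores across folds before computing a single AUC are assumptions, not details reported here. The AUC is computed as the Mann–Whitney probability that a target outscores a nontarget.

```python
import numpy as np

def fit_rda(X, y, lam=0.5, gamma=0.1):
    """X: (n, p) features; y: (n,) labels, 1 = target, 0 = nontarget."""
    p = X.shape[1]
    stats = {}
    for k in (0, 1):
        Xk = X[y == k]
        mu = Xk.mean(axis=0)
        d = Xk - mu
        stats[k] = (mu, d.T @ d, len(Xk))                 # mean, scatter matrix, count
    S = stats[0][1] + stats[1][1]                         # pooled scatter
    N = stats[0][2] + stats[1][2]
    model = {}
    for k in (0, 1):
        mu, Sk, Nk = stats[k]
        cov = ((1 - lam) * Sk + lam * S) / ((1 - lam) * Nk + lam * N)
        cov = (1 - gamma) * cov + gamma * np.trace(cov) / p * np.eye(p)
        _, logdet = np.linalg.slogdet(cov)
        model[k] = (mu, np.linalg.inv(cov), logdet)
    return model

def rda_score(model, X):
    """1-D score: Gaussian log-likelihood of target minus that of nontarget."""
    def loglik(k):
        mu, inv, logdet = model[k]
        d = X - mu
        return -0.5 * (np.einsum("ij,jk,ik->i", d, inv, d) + logdet)
    return loglik(1) - loglik(0)

def auc(scores, y):
    """Area under the ROC curve via the Mann-Whitney statistic (ties count 1/2)."""
    pos, neg = scores[y == 1], scores[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def cross_validated_auc(X, y, n_folds=10, seed=0, **rda_params):
    """Score each held-out fold with a model fit on the other folds, then pool."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = np.empty(len(y))
    for test_idx in folds:
        train = np.setdiff1d(np.arange(len(y)), test_idx)
        model = fit_rda(X[train], y[train], **rda_params)
        scores[test_idx] = rda_score(model, X[test_idx])
    return auc(scores, y)

rng = np.random.default_rng(0)
y = np.r_[np.ones(100, dtype=int), np.zeros(300, dtype=int)]
X = rng.normal(size=(400, 5)) + 1.5 * y[:, None]   # targets shifted in every dimension
print(round(float(cross_validated_auc(X, y)), 3))
```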
Character-based language models were trained on a large sample of New York Times text from the English Gigaword corpus, following previously described methods.30 Briefly, the probability of each letter is conditioned on the previous 5 letters in the sentence. Using these models, experimental stimuli were generated with specific levels of predictability to provide increasing typing difficulty across the 5 levels of the mastery task. Target words at low levels are highly predictable given the previous symbols; higher levels include less predictable words. Specifically, in level 1, each letter of the target word is at least 5 times as likely as the next most probable letter; in level 2, target letters are at least 2 times as likely; in level 3, target letters are always the most likely; in level 4, target letters are never the most likely but always at least half as probable as the most likely letter; and in level 5, target letters are between 0.3 and 0.5 times as probable as the most likely letter. Although language models usually predict the first letter of a word less accurately than later letters, these particular words were chosen so that the language model worked comparably well for the first letter as for later letters. Using the model, many candidate stimuli meeting the mastery criteria were found in the New York Times corpus; these were then hand-filtered for linguistic variety and incorporated into natural-sounding phrases.
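The level criteria can be stated precisely in code. In this sketch, `dist` is a hypothetical dictionary giving the language model's probability for each candidate next symbol after the preceding 5 characters, and a word qualifies for a level only if every one of its letters meets that level's criterion. Treating the lower bounds as inclusive and the level 5 upper bound (0.5) as exclusive is our assumption, chosen so that levels 4 and 5 do not overlap; the function names are illustrative.

```python
def letter_meets_level(dist, target, level):
    """Check one target letter against the stated criterion for mastery level 1-5."""
    p = dist[target]
    p_other = max(q for s, q in dist.items() if s != target)  # best competing symbol
    if level == 1:
        return p >= 5 * p_other               # at least 5 times the next most likely
    if level == 2:
        return p >= 2 * p_other               # at least 2 times the next most likely
    if level == 3:
        return p > p_other                    # always the most likely
    ratio = p / max(dist.values())            # relative to the most likely symbol
    if level == 4:
        return p < p_other and ratio >= 0.5   # never most likely, at least half as probable
    if level == 5:
        return 0.3 <= ratio < 0.5
    raise ValueError("level must be between 1 and 5")

def word_meets_level(letter_dists, word, level):
    """letter_dists[i] is the distribution predicted before typing word[i]."""
    return all(letter_meets_level(d, ch, level) for d, ch in zip(letter_dists, word))

d1 = {"A": 0.7, "B": 0.2, "C": 0.1}
d2 = {"A": 0.4, "B": 0.3, "C": 0.3}
d3 = {"X": 0.5, "Y": 0.2, "Z": 0.3}
print([lvl for lvl in range(1, 6) if letter_meets_level(d1, "A", lvl)])
```

Because levels 1 to 3 are nested (a letter 5 times as likely as its competitor is also at least twice as likely and the most likely), a stricter level's stimuli also satisfy the looser criteria, whereas levels 4 and 5 describe letters the model does not favor.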
To improve character classification, evidence from the language model and from the EEG is tightly combined. After the EEG feature extraction step, RDA yields a score for each trial stimulus. From these scores, the likelihoods of the target and nontarget classes are estimated using kernel density estimation. The two sources of evidence are then combined probabilistically with a naïve Bayesian fusion, which assumes that, given each trial's class label (target or nontarget), the EEG evidence in each epoch is conditionally independent of the evidence in prior epochs and of the language model evidence. The probability of each symbol being the intended one is updated with the EEG evidence collected from each new repetition of the sequence. As more EEG evidence is collected, the effect of the language model becomes less prominent: even a target letter with a probability of 0.0001 according to the language model may be selected after EEG evidence from multiple sequences has been collected. The system selects the symbol with the maximum a posteriori probability once this probability exceeds a preset confidence threshold or when a preset maximum number of sequences is reached.26
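Under these conditional-independence assumptions, and because each symbol appears once per sequence, the posterior for a symbol is proportional to its language-model probability multiplied, over sequences, by the ratio of the target density to the nontarget density evaluated at that symbol's RDA score; the nontarget densities of all other symbols are common to every hypothesis and cancel. The NumPy sketch below implements this with a Gaussian kernel density estimate. The bandwidth rule (Silverman's), the threshold of 0.9, the simulated score distributions, and the function names are assumptions; the maximum of 4 sequences follows the Methods.

```python
import numpy as np

def gaussian_kde(samples, bandwidth=None):
    """1-D Gaussian kernel density estimate; Silverman's rule of thumb by default."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)
    norm = len(samples) * bandwidth * np.sqrt(2 * np.pi)

    def pdf(x):
        z = (np.asarray(x, dtype=float)[..., None] - samples) / bandwidth
        return np.exp(-0.5 * z ** 2).sum(axis=-1) / norm

    return pdf

def fuse(lm_prior, score_history, pdf_target, pdf_nontarget, eps=1e-300):
    """Posterior over symbols from a language-model prior and per-sequence RDA scores.
    lm_prior: (n_symbols,); score_history: (n_sequences, n_symbols)."""
    log_post = np.log(np.asarray(lm_prior, dtype=float) + eps)
    for scores in score_history:
        # Add the log likelihood ratio of target vs nontarget for each symbol's score.
        log_post = log_post + np.log(pdf_target(scores) + eps) - np.log(pdf_nontarget(scores) + eps)
    post = np.exp(log_post - log_post.max())   # subtract max for numerical stability
    return post / post.sum()

def decide(posterior, n_sequences, threshold=0.9, max_sequences=4):
    """MAP symbol index once confident enough or out of sequences; otherwise None."""
    best = int(np.argmax(posterior))
    if posterior[best] >= threshold or n_sequences >= max_sequences:
        return best
    return None

rng = np.random.default_rng(0)
pdf_t = gaussian_kde(rng.normal(2.0, 1.0, 500))    # simulated target calibration scores
pdf_n = gaussian_kde(rng.normal(-1.0, 1.0, 5000))  # simulated nontarget calibration scores

n_symbols, intended = 28, 5
prior = np.full(n_symbols, (1 - 1e-4) / (n_symbols - 1))
prior[intended] = 1e-4                              # language model finds it very unlikely
history = np.full((3, n_symbols), -1.0)
history[:, intended] = 2.5                          # EEG consistently favors the intended symbol
for t in range(len(history) + 1):
    post = fuse(prior, history[:t], pdf_t, pdf_n)
    print(t, round(float(post[intended]), 4), decide(post, t))
```

The simulation mirrors the point made above: a symbol the language model rates at 0.0001 still wins once a few sequences of consistent EEG evidence accumulate, while with no EEG evidence the posterior is just the language-model prior and no decision is made before the sequence limit.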
Results
Participants
Demographic information for the 9 participants with LIS and 9 healthy controls is provided in Table 1. There were no significant differences between the 2 groups in age, gender, or years of education. One participant with LIS did not pass the cognitive screening, because of either a lack of consistent motor responses or a minimally conscious state. Two participants with LIS passed the screening but did not perform the mastery task: one because of significant electrode problems and one because of hospitalization and a move to a new foster home (this person achieved AUCs of .79 and .88 on 2 calibration trials).
Table 1. Demographic Information on 9 Participants With Locked-In Syndrome (LIS) and 9 Healthy Controls.
Mann–Whitney test.
Chi-square test.
Years since LIS onset was calculated for each participant with LIS from the earliest time they had LIS, either classical or incomplete. The causes of LIS were amyotrophic lateral sclerosis (4), brainstem stroke (2), cerebral palsy (1), brainstem arteriovenous malformation (1), and Duchenne muscular dystrophy (1).
Mastery Task Performance
Table 2 presents results from the mastery task for the remaining 6 participants with LIS and 9 healthy controls. All participants completed at least the first level of the RSVP Keyboard mastery task. The number of successful participants decreased at higher mastery task levels. Higher AUC scores were required for success at higher mastery task levels. Six of 9 control participants completed all 5 mastery levels, compared with only 1 of 6 participants with LIS.
Table 2. Mastery Level Completion, AUC, and Total Error Rate for Participants With LIS and Control Groups.a
Abbreviations: AUC, area under the curve; LIS, locked-in syndrome.
Mean word frequency is given as two ratios separated by a slash: the number of occurrences of the target words divided by the total corpus size, and the number of occurrences of the target words with the same 5 preceding characters divided by the number of occurrences of those 5 preceding characters. AUC and total error rate include only participants who successfully completed a given level. Values are presented as mean (range). Examples of the 5 mastery levels, with the target word in quotation marks, are as follows: Level 1, I_DO_“NOT”_AGREE; Level 2, IN_NEW_“YORK”_CITY; Level 3, EAT_THREE_TIMES_A_“DAY”; Level 4, MY_PARENTS_“FIND”_ME_FUNNY; Level 5, THE_MAN_WITH_“WAVY”_EYEBROWS. The probability of letters in the target word ranges from at least 5 times as likely as the next most likely letter (level 1) to 0.3 times as likely as the most likely letter (level 5).
One participant with LIS completed levels 4 and 5 in two different sessions, which is why a range is shown even though there was only one successful participant. For all levels, the data include multiple sessions from the same participant.
On average, control participants had significantly higher maximum AUC scores than participants with LIS (Mann–Whitney P = .045) and tended to reach higher levels in the mastery task (P = .069). Although the number of sessions required to either complete level 5 or fail to pass a level varied from subject to subject (because of time constraints, individual performance, and/or fatigue), there was no significant difference between the 2 groups (P = .414). These results are displayed in Table 3.
Table 3. Highest Mastery Level Achieved and Number of Sessions.a
Abbreviations: AUC, area under the curve; LIS, locked-in syndrome.
Values are given as mean (range); P value (2-tailed Mann–Whitney test).
Several participants with LIS consistently achieved low AUC scores during calibration attempts. Two of these participants, both with significant spasticity, demonstrated frequent, uncontrolled movements of the facial and respiratory muscles that interfered with accurate EEG signal acquisition.
Discussion
For the first time, we have demonstrated the utility of fusing an EEG classifier with a statistical language model for BCI spelling by participants with LIS. Equipment organization, transport, and protocols were streamlined so that research assistants could set up the RSVP Keyboard easily within 45 minutes. The RSVP Keyboard presents one large letter at a time on the screen for 400 ms (or less), reducing the visual perceptual demands compared with more complex BCI displays. Across calibration and mastery tasks, EEG signals were recorded for up to 5 hours in participants’ homes. The mastery task is another unique feature: because of the strong contribution of the language model at the lower levels, it allows participants to use the BCI to spell words with minimal errors. This 5-level mastery exercise is suited both to functional training in BCI use and to experimental manipulation.
Nondisabled participants performed better with the BCI system than those with LIS, as has been observed previously.31 Some participants with LIS had low AUCs. Participants with LIS may have more difficulty with sustained attention, but other potential confounds could not be fully addressed with these small samples. Possible causes of low AUC include medications32,33 and additional electronic equipment (2 participants were on ventilators). Five of the 6 participants with LIS were taking at least one EEG-altering medication. However, one participant with LIS achieved the highest AUC of all participants despite being on a ventilator and taking 2 different antidepressants.
While the RSVP Keyboard is usable by a small subset of people with LIS in its current form, it has limitations that will be addressed in future versions. Current real-time techniques for artifact rejection or minimization are not ideal. Although independent component analysis is frequently used to subtract eye blink artifacts in off-line EEG analysis,34 doing so in real time is more difficult. Moreover, eye blinks need to be detected, not just subtracted, so that event-related potentials (ERPs) to stimuli presented during a blink are excluded from classification. Active electrodes were used, which tolerate poorer contact than the 5-kΩ impedance typically required for passive EEG electrode recordings. Electrode artifacts during the mastery task or spelling are also a potential problem because calibration used all electrodes. Electrode locations do not need to be measured exactly, because the classifier can use any electrode location and is partly specific to a participant and a recording session. Nevertheless, it is reasonable to keep electrode placement as consistent as possible across days within a participant, with the aim of substantially shortening the calibration time for each session. There were occasional problems with 60-Hz interference sources in people’s homes.
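One way to detect blinks in real time and exclude the affected stimuli, rather than subtracting the artifact, is sketched below. It is a hypothetical approach: the 100-µV threshold, the padding margin, the use of a single frontal channel, and the function names are all assumptions and are not part of the current RSVP Keyboard. Samples where the frontal signal exceeds the threshold are marked as blink samples (widened by a margin on each side), and any stimulus whose 500-ms post-stimulus window overlaps a marked sample is dropped from classification.

```python
import numpy as np

def blink_mask(frontal_uv, threshold_uv=100.0, pad_samples=25):
    """Mark samples where a frontal channel exceeds threshold_uv, widened by
    pad_samples on each side to cover blink onset and offset (hypothetical rule)."""
    above = np.abs(frontal_uv) > threshold_uv
    mask = above.copy()
    for shift in range(1, pad_samples + 1):
        mask[shift:] |= above[:-shift]     # extend forward in time
        mask[:-shift] |= above[shift:]     # extend backward in time
    return mask

def usable_stimuli(onsets, mask, fs=256, window_s=0.5):
    """Keep stimuli whose post-stimulus window fits in the data and has no blink samples."""
    n = int(round(window_s * fs))
    return [t for t in onsets if t + n <= len(mask) and not mask[t:t + n].any()]

signal = np.zeros(1000)
signal[300:320] = 200.0                    # simulated blink, 200 uV
mask = blink_mask(signal)
print(usable_stimuli([0, 100, 250, 400, 700], mask))
```

Only the stimulus at sample 250, whose window overlaps the padded blink, is discarded; the others are kept for classification.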
For this study, all 28 possible characters were presented in each sequence. The unique integration of a statistical language model has allowed a newer developmental version to present only a subset of the 28 characters: the more likely ones. The current stimulus rate of 2.5 Hz allowed novice users to learn the RSVP paradigm, but users familiar with the task have been able to go faster than 5 Hz. We are currently exploring the feasibility of a 5-Hz presentation rate in participants with LIS. The language model currently uses a standard corpus of written English, but it could be improved with other corpora and by individualizing the language model based on participants’ prior text-based communication via email or other communication devices.
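Presenting only the more likely characters shortens each sequence in proportion to the number of symbols shown. The sketch below uses a hypothetical top-k rule (the newer version's actual selection rule is not described here) and computes sequence durations without pauses: 28 symbols take 11.2 s at 2.5 Hz and 5.6 s at 5 Hz, whereas a 10-symbol subset at 5 Hz takes 2 s. The probabilities and function names are illustrative.

```python
def select_subset(probs, k=10):
    """Hypothetical rule: present only the k symbols with the highest current probability."""
    return sorted(probs, key=probs.get, reverse=True)[:k]

def sequence_time_s(n_symbols, rate_hz):
    """Duration of one sequence in seconds, excluding pauses."""
    return n_symbols / rate_hz

probs = {"E": 0.40, "A": 0.25, "O": 0.20, "_": 0.10, "<": 0.05}
print(select_subset(probs, k=3))
print(sequence_time_s(28, 2.5), sequence_time_s(28, 5.0), sequence_time_s(10, 5.0))
```

The trade-off is that a symbol left out of the subset cannot be selected in that sequence, so a rule like this relies on the language model ranking the intended character highly.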
Characters per minute were fairly low, as with all BCI systems. Notably, the correct characters per minute were not markedly different from those reported for other systems with very different letter presentation strategies. It would be useful in the future to design experiments that directly compare performance across presentation strategies with and without fused language models. In this experiment, no direct comparison with another system was performed.
Some problems arose from participants' significant neurological dysfunction. Two participants had uncontrolled movements that made EEG recording unreliable. This might be overcome with substantially improved participant-specific artifact detection and minimization procedures. It was occasionally difficult to position participants with LIS so that they could comfortably maintain their gaze on the laptop computer monitor. Both participants with LIS and healthy controls became fatigued during the task. It should be feasible to integrate physiological markers of decreased alertness or vigilance35 into the system and then schedule breaks or stimulate participants in other ways to increase alertness.
The RSVP Keyboard BCI has, for the first time, fused an EEG classifier with a statistical language model in real time for people with incomplete LIS, allowing spelling with only 1 to 4 presentations of each target letter. We plan to continue refining the methodology to increase the spelling rate and to allow its use by people who are completely locked in.
Footnotes
Acknowledgements
The authors acknowledge technical contributions from Shalini Purwar and Dr Kenneth Hild III.
Authors’ Note
The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health or the National Science Foundation.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by grants from the National Institutes of Health (NIH R01 DC009834) and the National Science Foundation (NSF IIS-0914808, NSF CNS-1136027, and NSF IIS-1149570).
