
Abstract

Recently, rapid serial visual presentation (RSVP), as a new event-related potential (ERP) paradigm, has become one of the most popular paradigms in electroencephalogram signal processing. Several approaches have been proposed to improve the performance of RSVP analysis. In brain–computer interface systems based on RSVP, approaches that do not depend on training specific parameters are essential. The participating teams proposed several effective training-free algorithm frameworks in the ERP competition of the BCI Controlled Robot Contest in World Robot Contest 2021. This paper discusses the effectiveness of various approaches in improving the performance of the system without requiring training and suggests how to apply these approaches in a practical system. First, appropriate preprocessing techniques greatly improve the results. Second, non-deep learning algorithms may be more stable than deep learning approaches. Third, ensemble learning can make the model more stable and robust.

Keywords

brain–computer interfaces; electroencephalogram; rapid serial visual presentation (RSVP); data imbalance; training-free

1 Introduction

A brain–computer interface (BCI) provides people with an alternative way to communicate with external devices: it directly measures the user’s brain activity and converts it into the corresponding signal [1]. The electroencephalogram (EEG) is the most commonly used input signal in BCIs because of its simplicity [2]. Brain activity can be measured by functional near-infrared spectroscopy, functional magnetic resonance imaging, and EEG [3]. Additionally, using EEG signals for affective interaction can make interfaces more intuitive [4].

P300 evoked potential [5], motor imagery [6], and rapid serial visual presentation (RSVP) [7] are widely used paradigms in BCIs. RSVP is a paradigm in which visual stimuli are displayed rapidly (2–20 Hz) in chronological order at a fixed position on the screen [8], and EEG recordings of subjects’ brain activities are collected throughout the experiment. Specifically, the RSVP response is an EEG signal induced by small-probability events: after stimulation, a positive peak appears in the EEG data. Additionally, the RSVP response is a time-domain response.

Figure 1 shows the workflow of RSVP used in the event-related potential (ERP) competition (training-free) of the World Robot Contest 2021 (WRC2021), meaning that each subject has an independent classifier.

Fig. 1

Workflow of RSVP.

First, the subjects are shown specific images at a high presentation rate on computer screens, and their brains generate the associated RSVP signals. After feature extraction, the contestants use their algorithms to give the corresponding prediction results. Finally, the organizing committee judges the prediction accuracy.

During each trial, subjects first had a preparation time of 2000 ms and were then shown a new picture on the screen every 100 ms. The images include target and non-target images. For this ERP competition, Table 1 presents the definition of the triggers; the target images are cars (label 1) and humans (label 2), whereas the non-target images are backgrounds (label 0). Additionally, a data imbalance problem exists: RSVP data contain many non-target images and only a small number of target images, with a ratio of around 15:1, which poses a great challenge for algorithm design.

Table 1

Definition of all triggers

Definition	Trigger
Non-target	0
Target 1	1
Target 2	2
Trial start	240
Trial end	241
Block start	242
Block end	243
System reservation	244–255

Many algorithms based on traditional machine learning have been proposed for single-trial RSVP EEG classification. Linear discriminant analysis (LDA) and Fisher LDA (FLD) have been widely used in these classical algorithms. For example, hierarchical discriminant component analysis (HDCA) and spatially weighted FLD principal component analysis (PCA; SWFP) are two popular classical algorithms.

Specifically, HDCA uses LDA to obtain spatial weighting vectors for different time periods to calculate the projections of the single-trial RSVP EEG data. The SWFP algorithm learns a spatio-temporal weight matrix through FLD to amplify discriminative components and then uses PCA to reduce the temporal dimension [9]. Most traditional methods are linear algorithms. Although the linear constraint makes traditional methods faster to train and more robust, it limits their classification performance [10].

Furthermore, many new convolutional neural network (CNN) models have been proposed for single-trial RSVP EEG classification, such as ShallowConvNet and DeepConvNet [11], which will be introduced in detail in the following sections. Shaheen et al. [12] proposed a deep belief net classifier to classify single-trial RSVP EEG data. Zang et al. [13] also proposed PLNet to classify single-trial EEG data. They reported that the network is more efficient than most deep learning methods.

In this paper, we introduce the algorithms used by the top-five teams (Hust-BCI, pikapika, Brainstorming, XDU_ERP, Mind Reader) in the finals of the ERP competition (training-free) of the WRC2021. For simplicity, the algorithms of the five teams will be called Algo-H (Hust-BCI), Algo-P (pikapika), Algo-B (Brainstorming), Algo-X (XDU_ERP), and Algo-M (Mind Reader). The five teams will be called Team-H (Hust-BCI), Team-P (pikapika), Team-B (Brainstorming), Team-X (XDU_ERP), and Team-M (Mind Reader). This paper aims to evaluate the performance of different algorithms in the training-free scenario.

2 Methods

In this section, we first introduce preprocessing approaches, such as filtering and Euclidean space alignment (EA) [14]. Then, we introduce two frameworks: only Team-H uses the non-deep learning framework, whereas the other teams use the deep learning framework. The non-deep learning framework mainly includes xDAWN [15], tangent space mapping [16], and logistic regression [17]. The deep learning frameworks apply networks such as EEGNet, DeepConvNet, and long short-term memory (LSTM) [18], which are commonly used in BCI. Finally, we introduce the training strategy in a training-free scenario and the flowcharts of the five teams.

2.1 Experimental arrangement

Figure 2 shows the detailed layout of RSVP. During each trial, subjects first had a preparation time of 2000 ms and were then shown a new picture on the screen every 100 ms. The images include target and non-target images. The experiment was approved by the institutional review board of Tsinghua University (NO. 20210032).

Fig. 2

RSVP in BCI competition.

For this ERP competition, Table 1 presents the definition of the triggers; the target images are cars (label 1) and humans (label 2), whereas the non-target images are backgrounds (label 0). Additionally, a data imbalance problem exists: RSVP data contain many non-target images and only a small number of target images, with a ratio of around 15:1, which becomes a great challenge for algorithm designs. In the computational aspect, a successful BCI system must address the following two issues. First, a successful BCI system must extract the event-specific signatures that characterize the brain signals specific to the target (or non-target) images embedded in the EEG recordings. This is often enforced by a training process, where the signatures are extracted from the training data whose event associations are already known. Second, a successful BCI system must effectively utilize the event-specific signatures to classify EEG recordings whose event association is unknown. This is often enforced by a classifier in the testing process.

Figure 3 shows the data collection process of the competition. There were four subjects in the competition. For each subject, the organizing committee collects three blocks with 10 trials in each block.

Fig. 3

Data collection process of the competition.

The following equation defines the competition’s evaluation metric, true positive rate (TPR). Here TP refers to the total number of the data with labels 1 and 2 that the model correctly predicts; FN refers to the total number of the data with labels 1 and 2 that the model wrongly predicts.

TPR= \frac{TP}{TP+FN}

The following equation defines the evaluation metric, false positive rate (FPR). Here, FP refers to the total number of the data with label 0 that the model wrongly predicts; TN refers to the total number of the data with label 0 that the model correctly predicts.

FPR= \frac{FP}{FP+TN}
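The two metrics can be computed directly from label arrays. The following is a minimal sketch, not the organizing committee's scoring code; the function name `tpr_fpr` is hypothetical, and it assumes that a target trial (label 1 or 2) counts as correctly predicted only when the predicted label matches exactly.

```python
import numpy as np

def tpr_fpr(y_true, y_pred):
    """Return (TPR, FPR) with labels 1 and 2 as positives and label 0 as negative.

    Assumption: a target trial is a true positive only if the predicted
    label equals the true label (1 or 2).
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true > 0                       # trials with labels 1 and 2
    neg = ~pos                             # trials with label 0
    tp = np.sum(pos & (y_pred == y_true))  # targets predicted correctly
    fn = np.sum(pos & (y_pred != y_true))  # targets predicted wrongly
    fp = np.sum(neg & (y_pred != 0))       # non-targets predicted wrongly
    tn = np.sum(neg & (y_pred == 0))       # non-targets predicted correctly
    return tp / (tp + fn), fp / (fp + tn)

# Toy example: four non-target and four target trials.
y_true = [0, 0, 0, 0, 1, 2, 1, 2]
y_pred = [0, 1, 0, 0, 1, 2, 0, 1]
tpr, fpr = tpr_fpr(y_true, y_pred)
print(tpr, fpr)
```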

A more specific explanation is shown in Table 2. Receiver operating characteristic (ROC) and area under the curve (AUC) are often used to evaluate the advantages and disadvantages of a binary classifier. In Fig. 4, the ordinate of the ROC curve is TPR, and the abscissa of ROC is FPR.

Table 2

Confusion matrix.

		True label
		1	0
Prediction label	1	True positive (TP)	False positive (FP)
	0	False negative (FN)	True negative (TN)

Fig. 4

AUC and ROC.

AUC is one of the main offline evaluation indicators used by classification models, especially binary classification models. There are two meanings of AUC. One is the traditional meaning of “area under ROC curve”, and it is the shaded part in Fig. 4.

The other is about the explanation of sorting ability. For example, the meaning of AUC of 0.7 can be roughly understood as follows: given a positive sample and a negative sample, in 70% of cases, the model scores the positive sample higher than the negative sample. It can be seen that under this explanation, we are only concerned about the score between positive and negative samples, while the specific score is irrelevant. The equation is given as follows:

AUC = P (P_{True} > P_{False})

Here, P_True is the probability of predicting the positive sample as 1; P_False refers to the probability of predicting the negative sample as 1.
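The ranking interpretation can be checked numerically by comparing every positive sample with every negative sample. The sketch below is illustrative only; `auc_by_ranking` is a hypothetical helper, and counting tied scores as half a win is a common convention assumed here rather than something stated above.

```python
import numpy as np

def auc_by_ranking(scores, labels):
    """Estimate AUC as the fraction of (positive, negative) pairs in which
    the positive sample receives the higher score (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]     # all pairwise score differences
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return wins / (pos.size * neg.size)

# Three positives and three negatives: 8 of the 9 pairs are ranked correctly.
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
labels = [1, 1, 1, 0, 0, 0]
print(auc_by_ranking(scores, labels))
```

Only the ordering of the scores matters: rescaling all scores by a positive constant leaves the result unchanged.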

2.2 Preprocessing approach

2.2.1 Filter

Among the five teams in the competition, only Team-B did not use filters.

Team-H did not use a bandpass filter. Instead, the team used infinite impulse response (IIR) filters to filter the data along one dimension, with filter parameters determined from previous training sets. IIR filters follow the input–output relationship:

y (k) + \sum_{i = 1}^{M} a_{i} y (k - i) = \sum_{i = 0}^{L} b_{i} x (k - i)

where x(k) and y(k) are the filter’s input and output, respectively, M (≥ L) is the filter order, and a_i and b_i are the filter coefficients. The transfer function of the IIR filter can be written in the following general form:

H (z) = \frac{B (z)}{A (z)} = \frac{\sum_{i = 0}^{L} b_{i} z^{- i}}{1 + \sum_{i = 1}^{M} a_{i} z^{- i}}
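The input–output relationship can be evaluated sample by sample by moving the feedback terms to the right-hand side. The following sketch is a direct, unoptimized implementation of that recursion; the function name `iir_filter` and the coefficients in the example are illustrative assumptions, not Team-H's actual filter.

```python
import numpy as np

def iir_filter(b, a, x):
    """Apply y(k) = sum_i b[i] x(k-i) - sum_i a[i-1] y(k-i).

    b holds b_0..b_L; a holds a_1..a_M (the leading coefficient of y(k) is 1).
    Samples before k = 0 are treated as zero.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k in range(len(x)):
        acc = 0.0
        for i, b_i in enumerate(b):             # feed-forward part
            if k - i >= 0:
                acc += b_i * x[k - i]
        for i, a_i in enumerate(a, start=1):    # feedback part
            if k - i >= 0:
                acc -= a_i * y[k - i]
        y[k] = acc
    return y

# First-order example: y(k) - 0.5 y(k-1) = 0.5 x(k), driven by a unit impulse.
print(iir_filter(b=[0.5], a=[-0.5], x=[1.0, 0.0, 0.0, 0.0]))
```

The impulse response never becomes exactly zero, which is the property that gives the infinite impulse response filter its name.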

Team-P applied notch filters at 50, 20, 10, 30, and 40 Hz in succession and, after removing baseline drift, applied an 8th-order Butterworth filter with a normalized upper cut-off frequency of 0.72 and a normalized lower cut-off frequency of 0.008 (both relative to the Nyquist frequency); the corresponding real passband is 1–90 Hz.

The relationship between the normalized frequency and the real frequency is given as follows:

f_{N} = \frac{2 f_{r}}{f_{s}}

Here, f_N is the frequency normalized by the Nyquist frequency (f_s / 2), f_r is the real frequency, and f_s is the sampling frequency (250 Hz in the competition).
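As a quick check of the cut-off values quoted in this section, the conversion can be written as two small helpers. The function names are hypothetical; the 250 Hz default is the sampling frequency stated above.

```python
def normalized_frequency(f_real_hz, fs_hz=250.0):
    """Convert a real frequency in Hz to a fraction of the Nyquist frequency."""
    return 2.0 * f_real_hz / fs_hz

def real_frequency(f_norm, fs_hz=250.0):
    """Convert a Nyquist-normalized frequency back to Hz."""
    return f_norm * fs_hz / 2.0

# Cut-offs reported by the teams at fs = 250 Hz.
for f_hz in (1.0, 40.0, 90.0):
    print(f_hz, "Hz ->", normalized_frequency(f_hz))
```

With a 250 Hz sampling frequency, 1 Hz, 40 Hz, and 90 Hz map to 0.008, 0.32, and 0.72, respectively, which matches the cut-off values reported by the teams.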

Team-B did not use a bandpass filter but used data standardization with the following equation:

\tilde{X} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}

Team-X used a third-order Butterworth filter with a normalized upper cut-off frequency of 0.32 and a normalized lower cut-off frequency of 0.008 (relative to the Nyquist frequency); the corresponding real passband is 1–40 Hz.

Team-M also used a third-order Butterworth filter with a normalized upper cut-off frequency of 0.32 and a normalized lower cut-off frequency of 0.008 (relative to the Nyquist frequency); the corresponding real passband is 1–40 Hz.

2.2.2 Euclidean space alignment

Zanini et al. [19] proposed Riemannian alignment (RA), which is a transfer learning approach in the Riemannian space. The performance of the classifier with RA can be improved using the auxiliary data of other subjects with only a few labeled trials.

Specifically, RA first calculates the covariance matrices of some resting trials, where the subjects keep still. Then, it calculates the Riemannian mean $\bar{R}$ of these matrices. It is then used as a reference matrix in RA to reduce the difference between subjects through the following equation:

{\tilde{\Sigma}}_{i} = {\bar{R}}^{- 1 / 2} \Sigma_{i} {\bar{R}}^{- 1 / 2}

where $\Sigma_{i}$ is the covariance matrix of the ith trial, and ${\tilde{\Sigma}}_{i}$ is the corresponding aligned covariance matrix.

Inspired by RA, He et al. proposed EA, which does not need any labeled data from the new subject and can make the data distributions from different subjects more similar. The idea has been widely used in transfer learning [20–24]. The approach is also based on a reference matrix $\bar{R}$. Assume a subject has n trials. Then,

\bar{R} = \frac{1}{n} \sum_{i = 1}^{n} X_{i} X_{i}^{T}

where $\bar{R}$ is the arithmetic mean of all covariance matrices from a subject. The alignment process is then obtained by

{\tilde{X}}_{i} = {\bar{R}}^{- 1 / 2} X_{i}

where $\bar{R}$ is the reference matrix of a subject. After the alignment, the mean covariance matrix of all n aligned trials is given by

\begin{array}{l} \frac{1}{n} \sum_{i = 1}^{n} {\tilde{X}}_{i} {\tilde{X}}_{i}^{T} = \frac{1}{n} \sum_{i = 1}^{n} {\bar{R}}^{- 1 / 2} X_{i} X_{i}^{T} {\bar{R}}^{- 1 / 2} \\ = {\bar{R}}^{- 1 / 2} (\frac{1}{n} \sum_{i = 1}^{n} X_{i} X_{i}^{T}) {\bar{R}}^{- 1 / 2} \\ = {\bar{R}}^{- 1 / 2} \bar{R} {\bar{R}}^{- 1 / 2} \\ = I \end{array}

Thus, the mean covariance matrices of all subjects are equal to the identity matrix after alignment. Therefore, the distributions of the covariance matrices from different subjects are more similar, which is very desirable in transfer learning.
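The alignment and its identity-mean property can be verified numerically. The following is a minimal NumPy sketch under the assumption that the trials of one subject are stored as an array of shape (n_trials, n_channels, n_samples); the helper names are hypothetical, and the inverse square root is computed by eigendecomposition.

```python
import numpy as np

def inv_sqrt_spd(R):
    """Inverse square root of a symmetric positive-definite matrix."""
    evals, evecs = np.linalg.eigh(R)
    return evecs @ np.diag(evals ** -0.5) @ evecs.T

def euclidean_alignment(trials):
    """Align one subject's trials, shape (n_trials, n_channels, n_samples)."""
    # Reference matrix: arithmetic mean of X_i X_i^T over all trials.
    R_bar = np.mean(trials @ np.transpose(trials, (0, 2, 1)), axis=0)
    # Left-multiply every trial by R_bar^{-1/2} (broadcast over trials).
    return inv_sqrt_spd(R_bar) @ trials

rng = np.random.default_rng(0)
trials = rng.standard_normal((20, 4, 50))          # synthetic data, not EEG
aligned = euclidean_alignment(trials)
mean_cov = np.mean(aligned @ np.transpose(aligned, (0, 2, 1)), axis=0)
print(np.allclose(mean_cov, np.eye(4)))
```

Because the reference matrix is computed from unlabeled trials only, the same function can be applied to a new subject without any labels.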

2.3 Non-deep learning framework

2.3.1 xDAWN

xDAWN is a spatial filtering approach that can find a transformation to improve the signal-to-noise ratio (SNR) of the ERP signal and reduce the dimension of the data [25].

The details of xDAWN are as follows: The EEG signal that contains the P300 component is $X \in ℝ^{n \times d}$, where n is the feature dimension (time samples), and d is the number of channels of the EEG signal. The purpose of xDAWN is to find the projections $W \in ℝ^{d \times f}$, where f is the number of filters for projection, so that the filtered data are $\tilde{X} = X W$.

The P300 response is assumed to be $A \in ℝ^{e \times d}$, where e is the length of the P300 component, and the noise $N \in ℝ^{n \times d}$ is assumed to follow a normal distribution. The positions of the P300 components in the recorded signal are encoded by the Toeplitz matrix $D \in ℝ^{n \times e}$. Therefore, the signal can be expressed as X = DA + N, and the filtered signal can be expressed as XW = DAW + NW. A can be estimated by least squares using the pseudoinverse as follows:

\hat{A} = \underset{A}{\arg \min} ∥ X - D A ∥_{2}^{2} = {(D^{T} D)}^{- 1} D^{T} X

The optimal filters W can be obtained by maximizing SNR using the following generalized Rayleigh quotient:

\hat{W} = \underset{W}{\arg \max} \frac{T r (W^{T} {\hat{A}}^{T} D^{T} D \hat{A} W)}{T r (W^{T} X^{T} X W)}
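The two estimation steps can be sketched with NumPy on synthetic data. This is an illustrative sketch, not Team-H's implementation: it assumes a design matrix D of shape n × e, takes the columns of W as the leading generalized eigenvectors of the signal and data covariance matrices (the usual way to solve such a Rayleigh quotient), and uses hypothetical function names throughout.

```python
import numpy as np

def toeplitz_design(n_samples, onsets, erp_len):
    """Build D (n_samples x erp_len) with D[t0 + k, k] = 1 for each onset t0."""
    D = np.zeros((n_samples, erp_len))
    for t0 in onsets:
        for k in range(erp_len):
            if t0 + k < n_samples:
                D[t0 + k, k] += 1.0
    return D

def xdawn_filters(X, D, n_filters):
    """Return spatial filters W (d x n_filters) and the ERP estimate A_hat."""
    A_hat = np.linalg.lstsq(D, X, rcond=None)[0]   # least-squares estimate of A
    S = A_hat.T @ D.T @ D @ A_hat                  # covariance of the evoked part
    C = X.T @ X                                    # covariance of the recorded data
    # Generalized eigenproblem S w = lambda C w, solved by whitening with C = L L^T.
    L_inv = np.linalg.inv(np.linalg.cholesky(C))
    evals, U = np.linalg.eigh(L_inv @ S @ L_inv.T)
    order = np.argsort(evals)[::-1][:n_filters]    # largest quotients first
    return L_inv.T @ U[:, order], A_hat

def snr(w, X, D, A_hat):
    """Rayleigh quotient of a single spatial filter w."""
    return (w @ A_hat.T @ D.T @ D @ A_hat @ w) / (w @ X.T @ X @ w)

rng = np.random.default_rng(0)
n, d, e = 1000, 4, 20
D = toeplitz_design(n, np.arange(50, 900, 100), e)
A_true = np.outer(np.hanning(e), rng.standard_normal(d))   # synthetic ERP
X = D @ A_true + 0.5 * rng.standard_normal((n, d))
W, A_hat = xdawn_filters(X, D, n_filters=2)
print(W.shape, snr(W[:, 0], X, D, A_hat))
```

The first column of W attains the largest single-filter quotient, so no other spatial direction yields a higher value of the SNR criterion on the same data.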

2.3.2 Tangent space mapping

Tangent space mapping maps a point on a Riemannian manifold into the Euclidean space, so that machine learning approaches in the Euclidean space can be applied.

First, the Riemannian mean R of the covariance matrices ${P_{i}} \in ℝ^{n \times n}$ of a group of signals is calculated. Then, each P_i is projected onto the tangent space at the mean point R by the following formula, where the dimension of the tangent space is m = n(n + 1)/2:

s_{i} = \mathrm{upper} [R^{- \frac{1}{2}} \log_{R} (P_{i}) R^{- \frac{1}{2}}]

where the upper(·) means that the upper triangular part of the symmetric matrix is retained and vectorized, the weight of diagonal elements is 1, and the weight of other elements is $\sqrt{2}$ .
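A minimal NumPy sketch of this mapping is given below. It relies on the identity R^{-1/2} log_R(P) R^{-1/2} = log(R^{-1/2} P R^{-1/2}), where log on the right-hand side is the ordinary matrix logarithm, and it takes the reference matrix R as an input rather than computing the Riemannian mean. The function names are hypothetical.

```python
import numpy as np

def _apply_to_eigenvalues(S, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    evals, evecs = np.linalg.eigh(S)
    return evecs @ np.diag(fun(evals)) @ evecs.T

def tangent_vector(P, R):
    """Map covariance matrix P to the tangent space at reference matrix R."""
    R_inv_sqrt = _apply_to_eigenvalues(R, lambda v: v ** -0.5)
    M = R_inv_sqrt @ P @ R_inv_sqrt
    M = 0.5 * (M + M.T)                       # remove numerical asymmetry
    T = _apply_to_eigenvalues(M, np.log)      # matrix logarithm
    rows, cols = np.triu_indices(P.shape[0])  # upper triangle, row by row
    weights = np.where(rows == cols, 1.0, np.sqrt(2.0))
    return weights * T[rows, cols]

# With R = I, the tangent vector is the weighted upper triangle of log(P).
P = np.diag([np.e, np.e ** 2, 1.0])
print(tangent_vector(P, np.eye(3)))
```

The weight √2 on the off-diagonal elements makes the Euclidean norm of the vector equal to the Frobenius norm of the symmetric matrix it was taken from, so distances are preserved by the vectorization.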

2.3.3 Logistic regression

Logistic regression is a machine learning approach for solving binary classification problems; it estimates the probability that a sample belongs to a class.

Logistic regression and linear regression are both generalized linear models. Logistic regression assumes that the dependent variable y follows the Bernoulli distribution, whereas linear regression assumes that the dependent variable y follows the Gaussian distribution. Therefore, logistic regression has many similarities with linear regression. Without the sigmoid activation function, the logistic regression model reduces to a linear regression.

In practice, linear regression is commonly used to fit the real data; the function is as follows:

h_{θ} (x) = θ_{0} + θ_{1} x

Here, x is the independent variable, and h_θ(x) is the dependent variable, describing the linear relationship between input and output. θ₀ and θ₁ are the parameters that need to be calculated. Additionally, h_θ(x) is called the hypothesis function.

However, many data do not obey a linear relationship. Thus, we use the sigmoid function, also known as the logistic function, to extend the hypothesis function. The sigmoid function introduces a nonlinear factor into logistic regression:

g (z) = \frac{1}{1 + e^{- z}}

where z is the independent variable, and g( z ) is the dependent variable. The function is shown in Fig. 5.

Fig. 5

Sigmoid function.

Then, the form of hypothesis function is given as follows:

h_{θ} (x) = g (θ^{T} x), where g (z) = \frac{1}{1 + e^{- z}}

Therefore,

h_{θ} (x) = \frac{1}{1 + e^{- θ^{T} x}}

where x is the input, and the parameter θ is what needs to be calculated. The hypothesis function h_θ( x ) describes the nonlinear relationship between the input and output.

A machine learning model limits the decision function to a certain set of conditions, determining the hypothesis space of the model. The assumptions made by the logistic regression model are as follows:

P (y = 1 ∣ x; θ) = g (θ^{T} x) = \frac{1}{1 + e^{- θ^{T} x}}

The cost function in logistic regression J(θ) is given by

J (θ) = - \frac{1}{m} {\sum_{i = 1}^{m} [y^{(i)} \ln h_{θ} (x^{(i)}) + (1 - y^{(i)}) \ln (1 - h_{θ} (x^{(i)}))]}

A better logistic regression model can be obtained by continuously optimizing the cost function.
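The hypothesis function and the cost function translate directly into a few lines of NumPy. The sketch below minimizes J(θ) with plain batch gradient descent, which is one possible optimizer assumed here for illustration; the learning rate, iteration count, and synthetic data are likewise assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """Cross-entropy cost J(theta); eps guards against log(0)."""
    h = sigmoid(X @ theta)
    eps = 1e-12
    return -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))

def fit_logistic(X, y, lr=0.1, n_iter=500):
    """Minimize J(theta) by batch gradient descent."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ theta) - y) / len(y)   # gradient of J
        theta -= lr * grad
    return theta

# Synthetic one-dimensional data with a bias column (not EEG features).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])
X = np.column_stack([np.ones_like(x), x])
y = np.concatenate([np.zeros(50), np.ones(50)])
theta = fit_logistic(X, y)
print(cost(np.zeros(2), X, y), cost(theta, X, y))
```

At θ = 0 every sample is assigned probability 0.5, so the cost starts at ln 2; it decreases as the parameters are fitted.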

2.4 Deep learning framework

2.4.1 LSTM

The LSTM network is a variant of a recurrent neural network (RNN). RNN can only have short-term memories because the gradient of the loss function decays exponentially with time (called the vanishing gradient problem). LSTM network combines short- and long-term memories through gate control, mitigating the vanishing gradient problem to a certain extent. It can learn long-term dependent information.

Figure 6 shows a schematic of RNN and LSTM. In the standard RNN, the repeated module has a very simple structure, such as a single tanh layer. LSTM has the same chain structure, but its repeated module differs: instead of a single neural network layer, it has four neural network layers, each with a specific purpose.

Fig. 6

RNN and LSTM schematics. (a) RNN. (b) LSTM.

LSTM can remove information from or add information to the cell state through gate structures, each of which contains a sigmoid neural network layer and a pointwise multiplication operation, as shown in Fig. 7.

Fig. 7

Gate structure of LSTM.

The sigmoid layer outputs a value between 0 and 1, describing how much of each component can pass through. Here, 0 stands for “let nothing through”, and 1 stands for “let everything through”. More specifically, LSTM has three gates to control the cell state. The first step in LSTM is to decide which information to forget from the cell state. This decision is made by a forget gate. The gate reads h_{t−1} and x_t and then outputs a value between 0 and 1 for each number in the cell state. Figure 8(a) shows the decision to forget information, given as follows:

f_{t} = σ (W_{f} [h_{t - 1}, x_{t}] + b_{f})

Fig. 8

Meaning of the different parts of LSTM. (a) Forgetting information of LSTM. (b) Updating information of LSTM. (c) Updating cell state of LSTM. (d) Output information of LSTM.

Figure 8(b) shows the information to be updated:

i_{t} = σ (W_{i} [h_{t - 1}, x_{t}] + b_{i})

{\tilde{C}}_{t} = \tanh (W_{C} [h_{t - 1}, x_{t}] + b_{C})

Figure 8(c) shows the update cell state:

C_{t} = f_{t} * C_{t - 1} + i_{t} * {\tilde{C}}_{t}

Figure 8(d) shows the output information:

o_{t} = σ (W_{o} [h_{t - 1}, x_{t}] + b_{o})

h_{t} = o_{t} * \tanh (C_{t})
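The gate equations describe one time step of the cell, which can be sketched in NumPy as follows. The parameter dictionary, its key names, and the random initialization are assumptions made for illustration; `*` denotes element-wise multiplication, as in the equations.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_hidden, seed=0):
    """Small random weights and zero biases for the four layers (illustrative)."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "i", "C", "o"):
        params["W_" + gate] = 0.1 * rng.standard_normal((n_hidden, n_hidden + n_in))
        params["b_" + gate] = np.zeros(n_hidden)
    return params

def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step following the forget, input, and output gate equations."""
    z = np.concatenate([h_prev, x_t])                         # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])          # forget gate
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])          # input gate
    C_tilde = np.tanh(params["W_C"] @ z + params["b_C"])      # candidate state
    C_t = f_t * C_prev + i_t * C_tilde                        # new cell state
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])          # output gate
    h_t = o_t * np.tanh(C_t)                                  # new hidden state
    return h_t, C_t

params = init_params(n_in=3, n_hidden=2)
h, C = np.zeros(2), np.zeros(2)
for x_t in np.random.default_rng(1).standard_normal((5, 3)):  # 5 time steps
    h, C = lstm_step(x_t, h, C, params)
print(h, C)
```

Because the cell state is updated additively through f_t and i_t rather than being fully rewritten at each step, information can persist across many time steps.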

Therefore, with the described architecture, LSTM can effectively process time series data, such as EEG signals or natural language processing (NLP) data. NLP studies communication between humans and computers in natural language and includes tasks such as machine translation, text classification, text semantic comparison, and speech recognition.

2.4.2 EEGNet

EEGNet is a compact CNN architecture for EEG-based BCIs that: (1) can be applied in several different BCI paradigms, (2) can be trained with small data, and (3) can produce useful EEG features.

Figures 9 and 10 show a full description of the EEGNet model. EEG trials have C channels and T time samples. Lawhern et al. fit the model using Adam optimizer and cross-entropy loss function [26].

Fig. 9

EEGNet architecture.

Fig. 10

Network layer structure of EEGNet.

The first part of the network is a temporal convolution, which learns frequency filters and can thus replace a hand-designed frequency filter. The second part is a depthwise convolution, which is connected to each feature map individually to learn frequency-specific spatial filters.

The third part is a separable convolution. It combines a depthwise convolution, which learns a temporal summary of each feature map individually, with a pointwise convolution, which learns how to optimally mix the feature maps. Figure 10 shows the full details of the network architecture.
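The idea behind the separable convolution can be illustrated in one dimension with NumPy. This is a simplified sketch and not the EEGNet layer itself: it omits batch normalization, activation, pooling, and dropout, and the kernels below are fixed by hand rather than learned.

```python
import numpy as np

def depthwise_conv1d(x, kernels):
    """Filter each feature map with its own temporal kernel ('valid' mode).

    x: (n_maps, T); kernels: (n_maps, K); returns (n_maps, T - K + 1).
    """
    n_maps, T = x.shape
    K = kernels.shape[1]
    out = np.empty((n_maps, T - K + 1))
    for m in range(n_maps):
        out[m] = np.correlate(x[m], kernels[m], mode="valid")
    return out

def pointwise_conv1d(x, mix):
    """Mix feature maps at every time point; mix: (n_out, n_maps)."""
    return mix @ x

def separable_conv1d(x, kernels, mix):
    return pointwise_conv1d(depthwise_conv1d(x, kernels), mix)

x = np.arange(10.0).reshape(2, 5)                 # two feature maps, five samples
kernels = np.array([[1.0, 1.0], [1.0, -1.0]])     # one temporal kernel per map
mix = np.array([[1.0, 1.0]])                      # one output map
print(separable_conv1d(x, kernels, mix))
```

For n_maps input maps, n_out output maps, and kernel length K, this factorization uses n_maps·K + n_out·n_maps weights instead of the n_out·n_maps·K weights of a full convolution, which is one reason the architecture stays compact.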

2.4.3 DeepConvNet

Schirrmeister et al. designed the DeepConvNet architecture inspired by successful architectures in computer vision, as described by Krizhevsky et al. [27].

DeepConvNet has four “Conv-Pool-blocks”. The first block is specially designed to handle EEG signals and has two convolution layers followed by max pooling. The second, third, and fourth blocks each consist of a convolution layer and a max pooling layer. The last layer is a dense softmax classification layer; the details are shown in Fig. 11. Figure 12 shows the parameters of DeepConvNet.

Fig. 11

Conv-Pool-block of DeepConvNet.

Fig. 12

Network layer structure of DeepConvNet.

In particular, using two layers in the first block implicitly regularizes the overall convolution by forcing a separation of the linear transformation into a combination of a temporal convolution and a spatial filter.

2.5 Training strategy in training-free scenario

Although participants cannot obtain training data from subjects in the competition, they can use data from subjects in the preliminary competition. There are six different subjects in the preliminary competition, and the competition organizer provides 21 blocks for each subject.

The shape of the EEG data is (X, C, T). Here, X is the number of trials, C is the number of channels, and T is the time point length of the EEG data.

In Algo-H, participants use the non-deep learning framework. They first segment the data into four time-point lengths of 125, 145, 165, and 185. Because RSVP data are imbalanced, with a ratio of label-0 data to label-1 and label-2 data of about 15:1, the idea of the OvR (One vs. Rest) multi-classifier is used. The OvR classification strategy trains N classifiers, each taking samples of one category as positive examples and samples of all other categories as negative examples. Here, the data with labels 1 and 2 are combined into one class, and the data with label 0 form a separate class. Two classifiers are therefore used for the multi-classification problem: the first separates 0 from (1, 2), and the second separates 1 from 2. With four time-point lengths, there are eight classifiers in total. Participants use xDAWN and tangent space mapping for feature extraction, and the classifier is logistic regression. Figure 13(a) shows the details of Algo-H.
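The two-stage labeling described for Algo-H can be sketched as follows. The sketch covers only label construction and decision logic; the classifiers themselves are omitted. How Algo-H fuses the outputs of the four time-point lengths is not specified here, so averaging the probabilities across windows is an assumption, as are the function names and the 0.5 threshold.

```python
import numpy as np

def make_stage_labels(y):
    """Build binary labels for the two classifiers from labels in {0, 1, 2}."""
    y = np.asarray(y)
    target_mask = y > 0
    stage1 = target_mask.astype(int)              # classifier 1: 0 vs (1, 2)
    stage2 = (y[target_mask] == 2).astype(int)    # classifier 2: 1 vs 2, targets only
    return stage1, target_mask, stage2

def combine_predictions(p_target, p_label2, threshold=0.5):
    """Fuse per-window probabilities, each of shape (n_windows, n_trials).

    p_target: probability that a trial is a target (label 1 or 2).
    p_label2: probability that a target trial has label 2.
    Assumption: the window outputs are averaged before thresholding.
    """
    p_t = np.mean(p_target, axis=0)
    p_2 = np.mean(p_label2, axis=0)
    return np.where(p_t < threshold, 0, np.where(p_2 < threshold, 1, 2))

stage1, mask, stage2 = make_stage_labels([0, 1, 2, 0, 2])
print(stage1, stage2)

p_target = np.array([[0.2, 0.9, 0.8], [0.4, 0.7, 0.6]])   # two windows, three trials
p_label2 = np.array([[0.9, 0.1, 0.7], [0.9, 0.3, 0.9]])
print(combine_predictions(p_target, p_label2))
```

Training the first classifier on 0 versus (1, 2) keeps all target trials in one positive class, which slightly eases the 15:1 imbalance compared with treating labels 1 and 2 as separate minority classes.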

Fig. 13

Flowcharts of the five algorithms. (a) Algo-H. (b) Algo-P. (c) Algo-B. (d) Algo-X. (e) Algo-M.

In Algo-P, the participants used the deep learning framework. First, they applied notch filters at 50, 20, 10, 30, and 40 Hz. Second, they removed baseline drift by detrending. Finally, they applied the eighth-order Butterworth filter. Furthermore, the participants combined three LSTM networks serially with the EEGNet network. Figure 13(b) shows the details of Algo-P.

In Algo-B, the participants used the deep learning framework. First, they used data standardization without any filter. Second, they reorganized the data by shuffling the channel order and reassembled the data into its original shape. Different subjects were trained and tested with EEGNet. Figure 13(c) shows the details of Algo-B.

In Algo-X, the participants used the deep learning framework. First, they applied the third-order Butterworth filter to the data. Then, they combined DeepConvNet with two EEGNet networks for training and testing. Figure 13(d) shows the details of Algo-X.

In Algo-M, the participants used the deep learning framework. First, they applied the third-order Butterworth filter. Then, they trained with EEGNet, which is similar to the idea of Algo-B. Figure 13(e) shows the details of Algo-M.

3 Results

There were four subjects in the finals, and each subject had three blocks of data. Tables 3, 4, and 5 present the mean results on the four subjects of the algorithms. We added HDCA and SWFP [28–30] as the baselines. Different subjects have different results due to different environments and technical levels.

Table 3

Results of all subjects (TPR).

TPR	Subject1	Subject2	Subject3	Subject4	Mean
Algo-H	0.3472	0.2515	0.3673	0.5551	0.3803
Algo-P	0.3408	0.2621	0.2687	0.4548	0.3316
Algo-B	0.3842	0.1989	0.1773	0.3997	0.2900
Algo-X	0.3646	0.2735	0.2993	0.1364	0.2685
Algo-M	0.2835	0.2765	0.2169	0.2868	0.2659
HDCA	0.2510	0.2320	0.2030	0.2540	0.2350
SWFP	0.2632	0.2539	0.2145	0.3019	0.2584

Table 4

Results of all subjects (FPR).

FPR	Subject1	Subject2	Subject3	Subject4	Mean
Algo-H	0.3601	0.3913	0.3572	0.3449	0.3634
Algo-P	0.3717	0.4023	0.3779	0.3651	0.3793
Algo-B	0.2850	0.2391	0.2776	0.2992	0.2752
Algo-X	0.4242	0.4039	0.4287	0.4371	0.4235
Algo-M	0.3332	0.3573	0.3268	0.3177	0.3338
HDCA	0.3310	0.3720	0.3530	0.3445	0.3501
SWFP	0.3448	0.3541	0.3454	0.3321	0.3446

Table 5

Results of all subjects (AUC).

AUC	Subject1	Subject2	Subject3	Subject4	Mean
Algo-H	0.8454	0.8415	0.8563	0.8651	0.8521
Algo-P	0.8406	0.8445	0.8487	0.8544	0.8471
Algo-B	0.8582	0.8376	0.8369	0.8493	0.8455
Algo-X	0.8353	0.8343	0.8553	0.8259	0.8377
Algo-M	0.8245	0.8355	0.8443	0.8254	0.8324
HDCA	0.8210	0.8327	0.8334	0.8149	0.8255
SWFP	0.8249	0.8348	0.8357	0.8270	0.8306

The results of Subject4 were much higher than those of Subject1, Subject2, and Subject3. Algo-H achieved the highest mean TPR (0.3803) among all algorithms. It also performed particularly well on subjects skilled with the instructions. In other words, the performance of Algo-H is more stable, suggesting that a non-deep learning framework may be suitable for this scenario.

Specific analysis is as follows: Too little training data may make the deep learning framework prone to overfitting, whereas the non-deep framework is naturally suitable for scenarios with small data. Additionally, RSVP is a paradigm with unbalanced samples, with many non-targets and few targets, which easily leads to an inaccurate classification boundary in the deep learning framework. By contrast, the non-deep learning framework based on OvR helps solve this problem because it classifies different tasks separately and improves accuracy. Nevertheless, the deep learning framework can also achieve the best accuracy through an appropriate feature extraction approach.

Algo-B and Algo-M perform better than the other algorithms on FPR. Specific analysis is as follows: Algo-B and Algo-M each use a single EEGNet, so they do not have an extreme bias toward predicting the target labels (label 1, label 2). By contrast, Algo-H and Algo-P use a large number of classifiers (OvR classifiers in Algo-H; EEGNet and three LSTM networks in Algo-P), and the competitors optimized these algorithms heavily for TPR, so the algorithms tend to predict the target labels 1 and 2.

Algo-H, Algo-P, and Algo-B perform better than the other algorithms on AUC. Specific analysis is as follows: Algo-H and Algo-P perform better on TPR, whereas Algo-B has such an advantage on FPR that its AUC is also good. In practice, keeping a balance between TPR and FPR is important to achieve a better AUC. The competitors may use AUC to optimize the algorithms in the future because AUC reflects the overall performance of the algorithm.

Figure 14 shows the TPR of all blocks. Some algorithms have good results on some subjects. However, algorithms that use more feature extraction and ensemble learning approaches, such as Algo-H and Algo-P, can ensure better performance across multiple subjects.

Fig. 14

TPR of all blocks.

4 Discussion

This paper introduces the algorithms used by the top-five teams in the finals of the WRC2021 ERP competition. These algorithms use several new approaches to improve the performance of ERP in the training-free scenario.

However, the finals alone cannot show whether these approaches are effective, and many models were pre-trained on preliminary data before the competition. Thus, we added an experiment based on data from the preliminary competition. In this experiment, Algo-H, Algo-P, and Algo-B perform well, as shown in Tables 6, 7, and 8.

Table 6

Results of the additional experiment on the data of preliminary competition (TPR).

TPR	Subject1	Subject2	Subject3	Subject4	Subject5	Subject6	Mean
Algo-H	0.4853	0.3789	0.5234	0.5351	0.2467	0.4653	0.4391
Algo-P	0.5078	0.3423	0.4567	0.5378	0.3089	0.4234	0.4295
Algo-B	0.4245	0.3768	0.4012	0.4876	0.2679	0.3881	0.3910
Algo-X	0.4312	0.3945	0.4235	0.5536	0.2684	0.3687	0.4067
Algo-M	0.4019	0.3765	0.4019	0.4534	0.2659	0.3784	0.3797
HDCA	0.4034	0.3435	0.3839	0.4039	0.2835	0.3245	0.3571
SWFP	0.4135	0.3535	0.4049	0.4236	0.2639	0.3765	0.3727

Table 7

Results of the additional experiment on the data of preliminary competition (FPR).

FPR	Subject1	Subject2	Subject3	Subject4	Subject5	Subject6	Mean
Algo-H	0.2678	0.2792	0.2655	0.2544	0.2847	0.2654	0.2695
Algo-P	0.2786	0.2877	0.2777	0.2699	0.2788	0.2833	0.2793
Algo-B	0.1653	0.1723	0.1784	0.1773	0.1888	0.1701	0.1754
Algo-X	0.3315	0.3279	0.3243	0.3231	0.3289	0.3274	0.3272
Algo-M	0.2821	0.2767	0.2829	0.2832	0.2757	0.2789	0.2799
HDCA	0.2932	0.3055	0.2977	0.2838	0.3029	0.2967	0.2966
SWFP	0.2732	0.2831	0.2715	0.2694	0.2743	0.2715	0.2738

Table 8

Results of the additional experiment on the data of preliminary competition (AUC).

AUC	Subject1	Subject2	Subject3	Subject4	Subject5	Subject6	Mean
Algo-H	0.8567	0.8521	0.8647	0.8653	0.8476	0.8583	0.8575
Algo-P	0.8577	0.8441	0.8554	0.8574	0.8433	0.8458	0.8506
Algo-B	0.8521	0.8468	0.8532	0.8578	0.8519	0.8485	0.8517
Algo-X	0.8422	0.8459	0.8441	0.8597	0.8387	0.8434	0.8457
Algo-M	0.8378	0.8368	0.8419	0.8397	0.8459	0.8410	0.8405
HDCA	0.8287	0.8264	0.8256	0.8376	0.8325	0.8241	0.8292
SWFP	0.8299	0.8287	0.8352	0.8396	0.8358	0.8235	0.8321

According to the results of the preliminary and final competitions, appropriate filtering approaches, data alignment, and feature extraction approaches can help the algorithm achieve better results; thus, determining the stability and generalization of the algorithm in training-free cross-subject scenarios. Algo-B and Algo-M use a single EEG as the classifier. Meanwhile, Algo-B uses the feature extraction approach to recognize the data; therefore, it has achieved better TPR on more subjects than Algo-M. However, whether the recombinant data adopted by Algo-B is reliable still needs more tests because it will reduce the algorithm performance on some subjects. Algo-H uses EA as a preprocessing approach, which is suitable for training-free scenarios.

To confirm the effectiveness of EA, we added an ablation experiment of EA with Algo-H (Tables 9, 10, and 11). Averaged over the four subjects, EA improved TPR and AUC by 3.31 and 0.56 percentage points, respectively, and reduced FPR by 1.11 percentage points.

Table 9

Results of the ablation experiment in Algo-H (TPR).

TPR	Subject1	Subject2	Subject3	Subject4	Mean
Algo-H	0.3472	0.2515	0.3673	0.5551	0.3803
Algo-H w/o EA	0.3019	0.2110	0.3540	0.5220	0.3472

Table 10

Results of the ablation experiment in Algo-H (FPR).

FPR	Subject1	Subject2	Subject3	Subject4	Mean
Algo-H	0.3519	0.3810	0.3540	0.3224	0.3523
Algo-H w/o EA	0.3601	0.3913	0.3572	0.3449	0.3634

Table 11

Results of the ablation experiment in Algo-H (AUC).

AUC	Subject1	Subject2	Subject3	Subject4	Mean
Algo-H	0.8487	0.8476	0.8623	0.8723	0.8577
Algo-H w/o EA	0.8454	0.8415	0.8563	0.8651	0.8521
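
The mean effect of EA can be recomputed from the per-subject values in Tables 9, 10, and 11. The short sketch below does this arithmetic; means are rounded to four decimals as in the tables, and the variable and function names are chosen here for illustration only.

```python
# Per-subject values copied from Tables 9-11 (Subject1..Subject4).
with_ea = {
    "TPR": [0.3472, 0.2515, 0.3673, 0.5551],
    "FPR": [0.3519, 0.3810, 0.3540, 0.3224],
    "AUC": [0.8487, 0.8476, 0.8623, 0.8723],
}
without_ea = {
    "TPR": [0.3019, 0.2110, 0.3540, 0.5220],
    "FPR": [0.3601, 0.3913, 0.3572, 0.3449],
    "AUC": [0.8454, 0.8415, 0.8563, 0.8651],
}


def table_mean(values):
    """Mean rounded to four decimals, matching the Mean column of the tables."""
    return round(sum(values) / len(values), 4)


def ea_gain(metric):
    """Change in the mean (with EA minus without EA), in percentage points."""
    return 100 * (table_mean(with_ea[metric]) - table_mean(without_ea[metric]))


for metric in ("TPR", "FPR", "AUC"):
    print(f"{metric}: {ea_gain(metric):+.2f} percentage points")
```

A positive value means the metric is higher with EA; for FPR, a negative value is the desired direction.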

In terms of the classifier, only Algo-H adopted a non-deep learning approach (logistic regression); the other four groups used neural networks. In the finals of the WRC2021 ERP competition, Algo-H achieved the best TPR on Subject3 (0.3673) and Subject4 (0.5551). It also achieved good TPR on Subject1 (0.3472) and Subject2 (0.2515). Meanwhile, Algo-B and Algo-P achieved the best TPR on Subject1 (0.3842) and Subject2 (0.2621), respectively.

Algo-B and Algo-M achieved low FPR on Subject2 (0.2391) and Subject3 (0.2776), respectively, while Algo-H (0.8521), Algo-P (0.8471), and Algo-B (0.8455) performed better in terms of AUC.
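
For readers comparing the three metrics, the sketch below shows how TPR, FPR, and AUC relate on a small, made-up set of labels and scores (1 marks a target trial). The data, the thresholds, and the function names are illustrative assumptions and are not taken from the competition.

```python
def tpr_fpr(labels, scores, threshold):
    """TPR and FPR when trials scoring at or above the threshold are called targets."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def auc(labels, scores):
    """Probability that a random target outscores a random non-target (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Imbalanced toy example: 2 targets among 8 trials.
labels = [1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.7, 0.05]

strict = tpr_fpr(labels, scores, threshold=0.5)    # fewer detections, fewer false alarms
lenient = tpr_fpr(labels, scores, threshold=0.25)  # more detections, more false alarms
area = auc(labels, scores)                         # independent of the threshold
```

Lowering the threshold raises TPR and FPR together, whereas AUC summarizes the ranking quality across all thresholds.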

Additionally, ensemble learning plays an important role in improving algorithm performance. Algo-H integrates results obtained with different time-point lengths, and Algo-P combines three LSTM networks with an EEGNet network; both have achieved good results.
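
One simple way to integrate models that use different time-point lengths is soft voting, that is, averaging their predicted target probabilities. The sketch below assumes this scheme for illustration; the base scorer (a logistic function of the mean amplitude in a window), its weights, and the window lengths are hypothetical stand-ins and do not describe the actual Algo-H or Algo-P models.

```python
import math


def make_window_scorer(n_samples, weight, bias):
    """Hypothetical base model: logistic score on the mean of the first n_samples."""
    def score(trial):
        window = trial[:n_samples]
        z = weight * (sum(window) / len(window)) + bias
        return 1.0 / (1.0 + math.exp(-z))
    return score


def soft_vote(scorers, trial):
    """Ensemble output: average of the base models' target probabilities."""
    probs = [scorer(trial) for scorer in scorers]
    return sum(probs) / len(probs)


# Three base models that each see a different number of time points.
scorers = [make_window_scorer(n, weight=2.0, bias=-1.0) for n in (4, 8, 12)]
trial = [0.1, 0.3, 0.9, 1.2, 1.0, 0.8, 0.4, 0.2, 0.1, 0.0, -0.1, 0.0]

individual = [scorer(trial) for scorer in scorers]
ensemble = soft_vote(scorers, trial)
```

The ensemble probability always lies between the lowest and highest individual outputs, which is one reason averaging tends to dampen the variability of any single model.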

5 Conclusion

The described algorithms provide some new ideas for dealing with training-free ERP scenarios. For ERP, EA and appropriate filtering approaches can improve algorithm performance, as indicated by the ablation experiment of Algo-H and the comparison of Algo-B and Algo-M. Additionally, the non-deep learning algorithm may be more stable than the deep learning approaches, and ensemble learning can make the model more stable and robust. Furthermore, combining non-deep learning and deep learning approaches may yield better performance. Keeping a balance between TPR and FPR remains essential for improving the model’s AUC. In future studies, we will evaluate all algorithms on more diverse datasets to verify the effectiveness of each approach. We will then propose a more comprehensive and practical training-free framework to improve the performance of ERP analysis.

Footnotes

Conflict of interests

All contributing authors report no conflict of interests in this work.

Funding

This research was supported by the National Key Research and Development Program of China (Grant No. 2021ZD0201303), the Technology Innovation Project of Hubei Province of China (Grant No. 2019AEA171), and the Hubei Province Funds for Distinguished Young Scholars (Grant No. 2020CFA050).

Authors’ contribution

Conception and design of the study, data acquisition and analysis, manuscript drafting and revising: Huanyu Wu. Reviewing and approving the final draft of the manuscript: Huanyu Wu and Dongrui Wu.
