Abstract
The emerging field of affective computing focuses on enhancing computers’ ability to understand and appropriately respond to people’s affective states in human-computer interactions, and has revealed significant potential for a wide spectrum of applications. Recently, electroencephalography (EEG)-based affective computing has gained increasing interest for its good balance between mechanistic exploration and real-world practical application. The present work reviews ten theoretical and operational challenges for existing affective computing research from an interdisciplinary perspective of information technology, psychology, and neuroscience. On the theoretical side, we suggest that researchers should be well aware of the limitations of the commonly used emotion models, and be cautious about the widely accepted assumptions on EEG-emotion relationships as well as the transferability of findings based on different research paradigms. On the practical side, we propose several operational recommendations for the challenges of data collection, feature extraction, model implementation, online system design, as well as the potential ethical issues. The present review is expected to contribute to an improved understanding of EEG-based affective computing and promote further applications.
1 Introduction
As one of the most fundamental mental processes of human beings, emotion plays a crucial role in people’s interactions with the outside world. The emerging field of affective computing is targeted at “computing that relates to, arises from, or deliberately influences emotions” [1], and mainly focuses on enhancing computers’ ability to understand and appropriately respond to people’s affective states in human-computer interactions. The core idea of affective computing is to decode human emotions by analyzing people’s behavioral and/or physiological responses using machine learning methods. The application scenarios of affective computing range from affective state monitoring for health/safety/marketing purposes to affect-sensitive human-computer interaction systems such as brain-computer interface (BCI) games and intelligent education systems (see more discussion in section 2.9).
The most frequently used data for affective computing can be categorized into people’s behavioral data (e.g., facial expression, body posture, voice, etc.) and physiological data (e.g., heart rate, skin temperature, galvanic skin response, etc.). The behavior-based approach focuses on the expression aspect of human emotions and lacks an in-depth understanding of the physiological basis. In addition, most behavioral expressions could be either spontaneous or deliberate, undermining the objectivity of behavior-based affective computing. The physiological-data-based approach is expected to overcome these issues, as it provides a direct measurement of human physiological activities that could hardly be concealed. Among all possible physiological recording techniques, electroencephalography (EEG) has gained increasing interest in recent years, with the number of publications per year increasing from about 300 in 2001 to about 4600 in 2018 (Google Scholar results with the keywords of ‘EEG’ and ‘affective computing’).
The EEG-based approach has a good balance between mechanistic explorations and real-world practical applications [2]. On the one hand, EEG data has rich spatial, temporal and spectral information about human affective experiences for investigating the underlying neural mechanisms. On the other hand, the EEG recording technique (especially with its latest development) is known for its high device portability and low running cost, which are fundamentally important for real-world application scenarios. More importantly, the EEG-based findings on the neural mechanisms of human affective experiences could be rapidly transferred from basic research to applications, with the help of the BCI technique that emphasizes real-time, individualized decoding of human states (affective states in the present case) [3, 4].
Nevertheless, theoretical and operational challenges remain to be addressed appropriately before moving toward real-world practices of EEG-based affective computing. In the current paper, we review and summarize ten critical challenges facing affective computing research from an interdisciplinary perspective that includes information technology, psychology, and neuroscience. Topics from affective computing theories, algorithms, and applications are covered. We also share our opinions and give advice to practitioners working in the field.
2 Challenges for EEG-based affective computing
2.1 Adopting a proper theoretical framework of emotion
Choosing a proper emotion model is the fundamental theoretical challenge for all affective computing studies. A common practice is to arbitrarily choose one from the mainstream emotion models proposed by psychologists (e.g., the basic emotion model proposed by Ekman [5], the circumplex model of affect proposed by Russell [6], etc.), with the theoretical frameworks underlying these models often largely neglected.
There are two major meta-theoretical perspectives for framing emotions: the categorical perspective and the dimensional perspective. The former assumes that emotions are categorically discrete, and complex emotions are the combinations of multiple basic emotions. For example, contempt is composed of anger and disgust, and disappointment comprises surprise and sadness. Although researchers have not reached a consensus on the number and specific categories of basic emotions, most tend to agree that humans have at least six basic emotions [5], which are anger, disgust, fear, sadness, surprise, and happiness. Contrary to the categorical perspective, the dimensional perspective holds that emotions are underlain by basic dimensions, and every emotion can be mapped into a specific position in the multi-dimensional emotion space. The most commonly used dimensional model in affective computing is the Valence-Arousal model proposed by Russell [6], which posits that valence (ranging from negative to positive) and arousal (ranging from calm to excited) are the two primary dimensions of human emotions. Other dimensions like dominance [7] and approach-withdrawal [8] have also been emphasized, but they are less used in affective computing studies. However, which of the two theoretical frameworks is closer to the nature of human emotions is a long-lasting and still ongoing debate [9].
It is worth noting that more recent research in the field of psychology and affective neuroscience has made some important amendments to the aforementioned emotion models. In particular, increasing attention has been drawn towards the positive side of human emotions. The emerging field of positive psychology has argued that the differences among positive emotions were understated in traditional emotion theories. For instance, while only “happiness” in the six basic emotions proposed by Ekman can be regarded as positive, a recent view proposed by Fredrickson included ten representative positive emotions (joy, gratitude, serenity, interest, hope, pride, amusement, inspiration, awe, and love) based on the experience frequency in people’s daily life [10]. There are emerging affective computing studies in support of these recent theoretical advances, demonstrating the possibility of real-time decoding of discrete positive emotions. Liu and colleagues reported that three discrete positive emotions (joy, amusement, tenderness) could be effectively differentiated using EEG [11]. Hu and colleagues found that both the EEG and hemodynamic responses (by functional near-infrared spectroscopy, fNIRS) to the ten positive emotions proposed by Fredrickson are recognizably different [12, 13]. Affective computing of fine-grained positive emotions would be especially beneficial for user experience evaluation of human-computer interactions, as most of the interaction designs are expected to bring users positive experiences.
It has been further suggested that positive and negative emotions have distinct functional roles: negative emotions are linked to fight-or-flight responses, but positive emotions are more likely to be associated with broadening and building social resources and other resources such as personal cognitive resources [14, 15]. Therefore, it might be an oversimplification to have a single valence dimension with negative and positive emotions placed at its two ends. Accordingly, there is convincing behavioral evidence showing that people can actually feel happy and sad at the same time. For example, people could feel both happy and sad when graduating from college [16], or watching bittersweet films like Life Is Beautiful [17]. The coactivation of positive and negative emotions, termed “mixed emotion”, has been suggested to be beneficial to health [18] and creativity [19]. In line with these findings, it has been proposed to have positive and negative emotions as two separate unipolar dimensions, rather than taking them as polar opposites [20, 21]. However, the EEG signature for mixed emotion remains elusive, and to the best of our knowledge, no studies in the field of affective computing have addressed this issue.
The recent trend emphasizing the positive side of emotions is illustrated in Fig. 1. In summary, we suggest researchers pay more attention to the theoretical frameworks underlying the emotion models used in their studies, and it would be helpful for the field of affective computing to update the methodologies with reference to the latest findings from psychology and affective neuroscience.

Fig. 1 Existing affective computing research has often taken positive and negative emotions as polar opposites and understated the diversity of positive emotions (Model 1), but recent advances suggest measuring positive and negative emotions as bivariate and emphasizing the diversity of positive emotions (Model 2).
2.2 Understanding the EEG representation of affective states
The rationale for EEG-based affective computing assumes that EEG signals can represent human emotions with sufficient accuracy and sensitivity. However, this assumption cannot always be taken for granted because the relationship between EEG signals and affective states could be very complicated. Cacioppo and colleagues [22] described four kinds of relationship between physiological responses and psychological elements: (1) one-to-one (one psychological element is associated with one and only one physiological signal), (2) one-to-many (one psychological element is associated with several physiological signals), (3) many-to-one (several psychological elements are associated with the same physiological signal), and (4) many-to-many (several psychological elements are associated with several physiological signals). The one-to-one relationship could provide an ideal theoretical basis for affective computing, but one-to-one relationships are difficult to validate and still rarely reported in existing literature. Nevertheless, people tend to interpret the existing findings in a “one-to-one” manner. For example, the asymmetry of frontal EEG activities was thought to reflect emotional valence in early studies [23], but later researchers argued that frontal EEG asymmetry varied with motivational direction rather than emotional valence [24, 25]. A recent review indicated that the psychological implication of frontal EEG asymmetry is still controversial [26], yet this EEG indicator is often taken for granted as an index of emotional valence in some affective computing studies, especially the application-oriented ones. Future studies should examine the validity of such “one-to-one” EEG indicators for emotions and interpret them with more caution.
Furthermore, the EEG-emotion relationships proposed in existing studies (whether one-to-one, one-to-many, many-to-one, or many-to-many) could also be challenged in terms of reliability. Specifically, the conclusions about EEG-emotion relationships found in previous literature are often unclear on: (1) whether they can be replicated consistently over time; (2) whether they can be applied to different populations; and (3) how far they can be generalized to varied situations. However, existing EEG-based affective computing studies rarely provide statements on the reliability of their conclusions. The variation of the experimental paradigms and the data analysis methods in different studies has made cross-study verification difficult. Nevertheless, some researchers have begun to compare the emotion recognition efficiency of different EEG features across different datasets [27], which is expected to contribute to a better understanding of the EEG representations of human emotions and also be beneficial to future affective computing applications. Besides, preregistered direct replication studies should also be encouraged to provide more direct evidence for the reliability of the EEG-emotion relationships proposed in previous studies [28, 29].
2.3 Bridging the gap between passive and active emotion elicitation methods
Emotion elicitation methods used in affective computing studies can be divided into two main categories: passive or perception-based elicitation, and active or expression-based elicitation [30]. In the passive elicitation methods, individuals passively perceive emotional stimuli such as images, music and videos designed to evoke specific affective states. The most prominent advantage of the passive elicitation methods is that the stimuli can be highly standardized, and people’s affective states can be well manipulated. The most commonly used datasets for standard emotion stimuli include the International Affective Picture System (IAPS) [31], International Affective Digital Sound library (IADS) [32], Affective Norms for English Words (ANEW) [33], and emotional video databases such as FilmStim [34], MAHNOB-HCI [35], DEAP [36], etc. (the latter two also have the corresponding behavioral and neurophysiological data). Although meta-analysis suggested that emotional videos (e.g., film clips) might be the most effective emotion-eliciting materials [37], there are still concerns about the ecological validity of video stimuli [30]. The emotional responses to films require the willing suspension of disbelief [38], and people’s previous viewing experience of, or familiarity with, the materials would also greatly impact the effectiveness of video stimuli.
In active elicitation methods, individuals are instructed to perform particular tasks that are designed to induce different affective states naturally. For example, individuals might be asked to recall their personal achievements to induce pride [39], participate in public speaking events to induce anxiety [40], or be provided with fake negative feedback to induce anger [41]. The most significant advantage of the active methods is that they are more naturalistic and similar to the emotional events occurring in the real world. They are also more efficient at inducing emotions that are difficult to induce with the passive methods, such as anger and guilt [42]. However, the active methods are more difficult to manipulate precisely, and there could be a wider variety in individuals’ emotional responses [30]. A further disadvantage for EEG-based affective computing is that the active emotion-inducing tasks are often accompanied by more artifacts in the EEG data (e.g., caused by unavoidable emotional expression-related motions).
In real-world daily life, people’s emotional changes derive from both the emotional stimuli they passively observe (e.g., finding a funny video on social media), and the emotional interactions they actively participate in (e.g., reposting the funny video to friends). These two kinds of emotion-inducing events often co-occur and interact in complex ways. However, existing affective computing studies have rarely included both types of emotion-inducing events, and the investigation of the transferability between affective computing systems based on different emotion elicitation methods is still very limited. To build more naturalistic affective computing systems that are compatible with both passively and actively evoked emotions, researchers should be more aware of the difference between these two emotion elicitation methods. Practically, it would be necessary to validate the performance of existing models built with one elicitation method on data collected with the other method. Alternatively, for building new models, it is suggested to include data from both passive and active elicitation paradigms for model training. It is always encouraged to validate affective computing models with as many types of data as possible, which is expected to help practitioners understand the application boundary of these affective computing models.
In addition, it should also be noted that there is always a trade-off between the ecological validity and experimental control of the emotion elicitation methods: a well-controlled experimental paradigm would facilitate data collection and analysis but possibly limit its generalizability, whereas a highly ecologically valid paradigm would better resemble real-life settings but bring more challenges to data analysis. Researchers should find the balance point according to the specific research purposes: if the possible application scenario is very limited (i.e., could be easily covered by the experiment design), having better experimental control is preferred; otherwise, ecological validity should be emphasized and more effort is expected to be needed to build the model.
2.4 Collecting emotion data in a convenient and reliable way
An EEG-based affective computing system starts with EEG signal acquisition. For many years, research-level devices with wet electrodes have been the main choice, providing high-quality data for evaluating the feasibility of individualized affective computing [12, 43] and serving as a benchmark for comparing different classification algorithms [36]. However, the time-consuming preparation procedure for wet-electrode caps (e.g., skin preparation, conductive gel application, etc.) and the high price of these devices make it difficult to transfer from laboratory demonstrations to real-world applications.
Recently, consumer-level headsets with high portability and reasonable prices have been coming to the market. With a small number of dry or water-based electrodes and carefully designed light-weight “helmets”, these devices can be set up quite easily, and the users do not need to wash their hair before or afterwards. While the signal quality may not be as good as that of their research-level counterparts, quite a number of recent studies have demonstrated the feasibility of using the consumer-level devices for affective computing with promising performances [11, 44–46]. Furthermore, the light-weight characteristics of these consumer-level devices can significantly improve user convenience (imagine wearing a 2-channel headband instead of a 64-channel wet-electrode cap), and the devices can therefore potentially be applied in substantially more real-world application scenarios, such as education, gaming and health care.
Despite the merits in application, consumer-level EEG devices still face obstacles. The first concern lies in long-term EEG recording with high fidelity. Water-based electrodes suffer from the drying of the salt water they are filled with, so the usable recording time is usually one or two hours. Dry electrodes need pressure to keep a firm contact with the scalp, which is likely to cause discomfort or even pain when worn for a long time [47, 48]. Consequently, interfaces which can offer robust and comfortable use are still lacking. Novel electrode materials (e.g., porous ceramic-based “semi-dry” electrode) or electrode placement (e.g., in ears or other non-hair areas) may provide a solution [49–52], yet their performance for emotion recognition needs further verification.
Another concern is the relatively low data quality. EEG data is vulnerable to artifacts even in well-controlled conditions, and the situation would be worse in real-world scenarios where the artifacts are expected to be more intense. In addition, consumer-level devices usually have a smaller number of channels. As many de-noising algorithms rely heavily on spatial information (e.g., independent component analysis, ICA), a reduced number of channels could have a severe influence on their performances. Although there are methods (e.g., empirical-mode decomposition, non-negative matrix factorization, etc.) that are applicable to single-channel data [53–55], a widely accepted, standard de-noising pipeline for data collected with commercial devices is still needed.
Before using consumer-level devices, it is suggested to evaluate their performance with standard testing procedures according to the research purpose. The most convenient way is to search the existing literature for studies using the same device, but caution must be taken when evaluating the quality of these studies. Alternatively, it is always recommended to record EEG signals with consumer-level devices and research-level devices in the same classical experimental paradigms (e.g., motor imagery, P300, and SSVEP paradigms for BCI applications [56]) in the same lab settings as the to-be-conducted study. Direct comparison of signal quality as well as critical experimental results could then be conducted.
2.5 Extracting robust features for affective computing
EEG-based affective computing relies on EEG features with sufficient discriminative power. The EEG features from the multi-channel EEG time series are usually decomposed into the temporal, spectral and spatial domains [57]. In the temporal domain, statistical information such as entropy, the fractal dimension and higher order crossings are frequently used [58, 59]. In the spectral domain, EEG signals are analyzed with respect to the classical frequency bands, i.e., δ band (1–3 Hz), θ band (4–7 Hz), α band (8–13 Hz), β band (14–30 Hz) and γ band (> 30 Hz) [60]. The most commonly used spectral features include power spectral density (PSD), differential entropy (DE), differential asymmetry (DASM), rational asymmetry (RASM) and differential caudality (DCAU), etc. [61–63]. Spatial domain features, in turn, can be easily combined with the temporal or spectral features by extracting temporal or spectral information from multichannel EEG data. Besides, spatial features could be further exploited by investigating the neural connectivity patterns between channels/electrodes. Emerging studies are exploring methods such as temporal correlation, spectral coherence, phase synchronization index, etc., as well as their corresponding graph-theory based analysis [64–66].
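As a rough illustration of the spectral features listed above, the following sketch computes band power, DE, DASM and RASM for one left/right channel pair. It assumes that DE is evaluated under a Gaussian assumption as 0.5 ln(2πe σ²) on the band-limited signal, and that DASM and RASM are the difference and the ratio of left and right DE values; the function names, the 45 Hz upper edge for the γ band and the synthetic test signals are illustrative choices, not taken from the cited studies.

```python
import numpy as np

# Band edges follow the text; the 45 Hz upper limit for gamma is an assumption.
BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 45)}


def band_power(x, fs, lo, hi):
    """Power of x within [lo, hi] Hz, summed from the periodogram."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    spec = np.abs(np.fft.rfft(x)) ** 2 / x.size ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    mask = (freqs >= lo) & (freqs <= hi)
    # Factor 2 folds in the negative frequencies; valid because the
    # bands exclude both DC and the Nyquist frequency (fs > 90 Hz here).
    return 2.0 * spec[mask].sum()


def differential_entropy(x, fs, band):
    """DE of the band-limited signal under a Gaussian assumption:
    0.5 * ln(2 * pi * e * variance)."""
    lo, hi = BANDS[band]
    return 0.5 * np.log(2.0 * np.pi * np.e * band_power(x, fs, lo, hi))


def asymmetry(left, right, fs, band):
    """DASM (difference) and RASM (ratio) for one left/right channel pair."""
    de_left = differential_entropy(left, fs, band)
    de_right = differential_entropy(right, fs, band)
    return de_left - de_right, de_left / de_right


# Synthetic check: a 10 Hz tone that is twice as large on the left channel.
fs = 128
t = np.arange(10 * fs) / fs
left = 2.0 * np.sin(2 * np.pi * 10 * t)
right = 1.0 * np.sin(2 * np.pi * 10 * t)
dasm, rasm = asymmetry(left, right, fs, "alpha")
print(dasm, rasm)
```

In practice the same functions would be applied per channel and per band to short sliding windows, and symmetric electrode pairs would be chosen according to the montage in use.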
As most of the application scenarios for affective computing include continuous and complex audiovisual stimulations, a recently developed approach called inter-subject correlation (ISC) may provide a new perspective for feature extraction. ISC was originally proposed for analyzing fMRI responses to natural visual stimuli [67, 68]. The key idea is to describe the neural responses by calculating the inter-subject correlations rather than searching for single-subject activations compared to a certain baseline. Compared to the classical single-subject features as reviewed above, this approach could effectively capture the neural dynamics in response to external stimuli while avoiding the challenging issue of defining discrete events from the complex and continuous stimulations. Recent EEG studies are showing promising performance from the inter-subject perspective [69]. Dmochowski and colleagues used a correlated component analysis method to extract the inter-subject neural correlations that reflected attention and emotion-modulated cortical processing [70]; synchronized EEG activities across a group of students in real-world classroom settings were reported to be related to attention and engagement [71–73]; Ding and colleagues directly investigated the predictive power of a series of inter-subject features for real-time affective states on the basis of the valence and arousal dimensions and found that inter-subject features had superior performance compared to the single-subject features [74].
Nevertheless, the robustness of the above-mentioned EEG features still needs further validation. While people working on computational methods can continue their pursuit of more advanced signal processing techniques, critical caveats require additional effort. Specifically, a careful experimental design with necessary control conditions would be of great help to clarify the possibly complicated relationship between the physiological responses and the psychological elements (as reviewed in 2.2), toward more specific EEG correlates for a certain affective state. Most affective computing studies, however, were conducted without considering possible confounding factors. For instance, people watching a “gratitude” video clip may also experience a high intensity of “hope” and “love” [12], making it improper to attribute the observed neural responses to a simple “gratitude” category. Therefore, researchers are advised to learn from psychologists how to improve their experimental designs so that the elicited EEG responses receive a more accurate affective definition. A better definition is expected to increase the robustness of the to-be-extracted features.
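To make the inter-subject idea concrete, here is a minimal sketch of a leave-one-out ISC: each subject's time course is correlated with the average time course of all other subjects. This is only one simple way to compute ISC and is not the correlated component analysis of [70] or the feature set of [74]; the function name and the synthetic data are hypothetical.

```python
import numpy as np


def leave_one_out_isc(data):
    """Leave-one-out inter-subject correlation.

    data: array of shape (n_subjects, n_samples), one feature time course
    (e.g., alpha power over time) per subject, aligned to the same stimulus.
    Returns one Pearson correlation per subject, computed between that
    subject and the mean of all remaining subjects.
    """
    data = np.asarray(data, dtype=float)
    n_subjects = data.shape[0]
    total = data.sum(axis=0)
    isc = np.empty(n_subjects)
    for i in range(n_subjects):
        others = (total - data[i]) / (n_subjects - 1)
        isc[i] = np.corrcoef(data[i], others)[0, 1]
    return isc


# Synthetic illustration: a stimulus-driven component shared by all
# subjects yields high ISC, whereas independent noise yields ISC near zero.
rng = np.random.default_rng(0)
shared = np.sin(np.linspace(0, 20 * np.pi, 1000))
coupled = shared + 0.5 * rng.standard_normal((5, 1000))
independent = rng.standard_normal((5, 1000))
print(leave_one_out_isc(coupled))
print(leave_one_out_isc(independent))
```

Because no event markers enter the computation, the measure can be applied to continuous stimuli as long as recordings are time-locked to the same material.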
2.6 Decoding affective state accurately and continuously
After selecting proper EEG features, the next common and important step for affective computing is to build machine learning classifiers for decoding affective states. All of the commonly seen machine learning algorithms have found their place in affective computing [75–78], including linear discriminant analysis (LDA), support vector machines (SVM), k-nearest neighbors (kNN), Naive Bayes (NB) classifiers and their extensions. More recently, researchers are beginning to explore the latest neural network algorithms for EEG-based affective state decoding. There are studies using autoencoder [79], deep belief networks (DBNs) [61], deep recursive neural network (RNN) [80], and convolutional neural network (CNN) [81]. Comparable or slightly improved performance was obtained, as compared to the classical classification methods. Depending on the experimental scenarios, most of the reported decoding accuracies were in the range of 70%–90% for classifying two or more discrete affective states.
Nevertheless, there is still a gap between the state-of-the-art progress and real-world applications. For one thing, most of the decoding algorithms were tested by assuming a stationary affective state during a relatively long period of time (e.g., usually > 10 s epochs with the same affective label in video-based paradigms [12, 36, 82]). This stationary assumption may not always hold, as affective experience can change rapidly on a time scale of seconds [83]. Rather, it is preferred to have affective labels that accurately reflect the continuous, dynamic affective experience as the gold standard for training the algorithms. However, obtaining such dynamic labels by subjective reports could be labor-intensive and time-consuming [74, 84]. Automatic tagging using internet-based crowdsourcing methods [85–87] or information from other modalities (e.g., video content analysis [88], facial expressions [89] and peripheral physiological responses [90] during video watching) could provide feasible alternatives. For another, the lack of standardized large-scale datasets has made performance evaluation across different methods difficult. To date, the theoretical framework and the corresponding affective stimuli have varied substantially from one study to another. Even when using the same algorithms, the EEG data preprocessing details and the key parameters of the algorithms were hardly identical [45]. Although efforts are being made to provide benchmark EEG datasets, such as DEAP, SEED, MAHNOB-HCI, DREAMER, ASCERTAIN, etc. [36, 76, 82, 91], they usually have a relatively small sample size, with 20–60 subjects and 15–40 affective stimuli. The small sample might hinder further algorithm development, especially for the advanced neural-network based methods.
As the collection of a large-scale dataset is not an easy job for a single research group, it is highly suggested to do it in a collaborative way with standardized stimuli and procedures shared across multiple groups, as researchers have done in other fields [92, 93]. Another suggestion is to collect data from modalities other than EEG, if possible, during dataset acquisition. Due to the richness of human emotional expressiveness, the fusion of information from other modalities (e.g., facial expression, peripheral physiological responses and eye movement) may lead to better recognition performance, and the analysis of the relationship among modalities could perhaps shed light on the nature of emotions as well [94].
2.7 Moving from offline to online affective computing
Towards real-world affective computing applications, many scenarios would ask for online affective computing with real-time outputs, such as human-computer interactions in general. However, most of the studies to date were conducted in an offline manner [95], and the performance of their online extensions remained to be explored. Moving from offline to online applications is more than a simple transfer of the offline feature extraction and decoding methods: The “real-time” feature needs special attention and brings new challenges.
First of all, online affective computing requires a timely output of the EEG data computation, imposing constraints on the computation speed. Therefore, while time-consuming and complex methods can be employed to achieve high classification accuracy in offline systems, the online affective computing pipeline, including pre-processing, feature extraction and classification, has to be carefully optimized toward the real-time need [95]. Simpler mathematical models which reduce the computational cost while maintaining adequate classification accuracy are preferable for online applications [96]. For example, while sharing the same goal to extract frequency-band power information, a time-domain method consumes only 15% of the computation time of FFT while offering similar accuracy [97]. Therefore, it would be desirable for researchers to consider the computation time as a primary outcome besides the accuracy and report computation time in their publications as well.
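The following sketch contrasts a time-domain and an FFT-based estimate of band power and measures the computation time of each, in the spirit of reporting computation time next to accuracy. The time-domain estimator here (a windowed-sinc FIR band-pass filter followed by averaging the squared output) is an assumed stand-in and not necessarily the method of [97]; the filter length, function names and test signal are illustrative, and the measured timings depend on the implementation and hardware.

```python
import time
import numpy as np


def bandpass_fir(fs, lo, hi, n_taps=257):
    """Windowed-sinc band-pass FIR: difference of two ideal low-pass
    responses, tapered by a Hamming window."""
    n = np.arange(n_taps) - (n_taps - 1) / 2

    def lowpass(fc):
        return (2.0 * fc / fs) * np.sinc(2.0 * fc / fs * n)

    return (lowpass(hi) - lowpass(lo)) * np.hamming(n_taps)


def band_power_time(x, taps):
    """Time-domain estimate: filter, then average the squared output."""
    y = np.convolve(x, taps, mode="valid")  # "valid" drops edge transients
    return float(np.mean(y ** 2))


def band_power_fft(x, fs, lo, hi):
    """Frequency-domain estimate: sum periodogram bins inside the band."""
    x = np.asarray(x, dtype=float)
    spec = np.abs(np.fft.rfft(x - x.mean())) ** 2 / x.size ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return float(2.0 * spec[(freqs >= lo) & (freqs <= hi)].sum())


fs = 128
t = np.arange(10 * fs) / fs
x = np.sin(2 * np.pi * 10 * t)   # unit-amplitude 10 Hz test tone
taps = bandpass_fir(fs, 8, 13)   # designed once, reused for every window

# Report the estimate and the wall-clock time of each method.
for name, estimate in [("time-domain", lambda: band_power_time(x, taps)),
                       ("fft", lambda: band_power_fft(x, fs, 8, 13))]:
    start = time.perf_counter()
    power = estimate()
    print(name, power, time.perf_counter() - start)
```

Both estimators should return values close to 0.5 for the test tone; a fair timing comparison would repeat each call many times on realistic window lengths and channel counts.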
Second, effective handling of online EEG artifacts is not an easy task. To maintain a continuous output of affective computing results, it is preferred to eliminate artifacts to obtain clean and continuous EEG data, rather than reject the artifact-contaminated EEG segments. Artifact elimination can be achieved by blind source separation (BSS) techniques that decompose EEG signals into true EEG sources and artifact sources, such as ICA, empirical mode decomposition (EMD), etc. [98–103]. By identifying and removing the artifact-like source signals and projecting the remaining sources back to the EEG signal space, artifact-free EEGs can be obtained. However, many of the decomposition algorithms are more suitable to be performed in a post-hoc manner because a large amount of data is usually required. Moreover, identification of the artifact-like sources would require expert experience, and it is still difficult to derive automatic yet powerful criteria.
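The decompose, remove and back-project steps described above can be sketched as follows. To stay dependency-free, the sketch swaps in principal component analysis (via SVD) for ICA, so it only separates artifacts that dominate the variance; it also assumes that an artifact reference trace (for instance an EOG channel) is available for flagging components, which is a hypothetical simplification of the expert judgment mentioned in the text. All names and the correlation threshold are illustrative.

```python
import numpy as np


def remove_artifact_components(eeg, reference, threshold=0.7):
    """Remove sources that correlate with an artifact reference.

    eeg: array (n_channels, n_samples).
    reference: array (n_samples,), hypothetical artifact reference trace.
    Returns the cleaned EEG and a boolean mask of the removed sources.
    """
    eeg = np.asarray(eeg, dtype=float)
    mean = eeg.mean(axis=1, keepdims=True)
    centered = eeg - mean
    # Decompose: centered = mixing @ sources (PCA via SVD, standing in for ICA).
    mixing, scale, vt = np.linalg.svd(centered, full_matrices=False)
    sources = scale[:, None] * vt
    # Flag sources whose time course follows the artifact reference.
    corr = np.array([abs(np.corrcoef(s, reference)[0, 1]) for s in sources])
    keep = corr < threshold
    # Project the remaining sources back to the channel space.
    cleaned = mixing[:, keep] @ sources[keep] + mean
    return cleaned, ~keep


# Synthetic illustration: blink-like pulses mixed into four channels.
rng = np.random.default_rng(0)
n_samples = 2000
neural = rng.standard_normal((4, n_samples))
blink = np.zeros(n_samples)
for onset in range(100, n_samples, 200):
    blink[onset:onset + 20] = 50.0
raw = neural + np.outer([1.0, 0.8, 0.3, 0.1], blink)
cleaned, removed = remove_artifact_components(raw, blink)
print(removed)
```

Note that this batch decomposition uses the whole recording at once, which is exactly the post-hoc character the text warns about; an online system would need a decomposition estimated beforehand or updated incrementally.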
Third, the non-stationary nature of EEG signals is usually neglected in classifier training, which could pose a fundamental limit for online applications [104]. Possible differences in the statistical properties of EEGs (e.g., amplitude range, spectral distribution) between offline and online sessions may substantially deteriorate the online classification performance of classifiers trained on offline data [105], especially for those with a long time interval between offline and online sessions [79]. Moreover, classification performance is likely to decrease if the online system is supposed to work for a long time period [2]. The non-stationary issue could be addressed by controlling the time information (e.g., used as a covariate) during classifier model training to explore time-stable emotion-related EEG features [106, 107]. Alternatively, instead of using a fixed classifier over a long time period, adaptive strategies could be employed to dynamically update the classifier parameters based on the statistical properties of incoming EEG data [105, 108]. While promising performance has been reported for affective computing [109, 110], a systematic investigation of the adaptive approach is necessary, together with feature extraction and decoding methods.
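One simple form of adaptation can be sketched as follows: the feature normalization placed in front of a fixed classifier is updated with exponentially weighted running statistics of the incoming data. This is a simplified relative of the adaptive classifiers in [105, 108], not a reproduction of them; the class name, the forgetting factor `alpha` and the simulated drift are assumptions made for illustration.

```python
import numpy as np


class AdaptiveNormalizer:
    """Z-scores incoming feature vectors using exponentially weighted
    running estimates of the mean and variance."""

    def __init__(self, n_features, alpha=0.05):
        self.alpha = alpha                 # forgetting factor (assumed value)
        self.mean = np.zeros(n_features)   # e.g., offline-session statistics
        self.var = np.ones(n_features)

    def update(self, x):
        """Update the running statistics with x and return normalized x."""
        x = np.asarray(x, dtype=float)
        delta = x - self.mean
        self.mean = self.mean + self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta ** 2)
        return (x - self.mean) / np.sqrt(self.var + 1e-12)


# Simulated online session whose feature distribution has drifted away
# from the offline statistics (mean 0, variance 1) to mean 5, std 2.
rng = np.random.default_rng(0)
normalizer = AdaptiveNormalizer(n_features=1)
stream = 5.0 + 2.0 * rng.standard_normal((2000, 1))
outputs = np.array([normalizer.update(x)[0] for x in stream])
print(normalizer.mean, outputs[-500:].mean())
```

A larger `alpha` tracks drift faster but yields noisier statistics, so the value would need tuning against the expected rate of change between and within sessions.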
2.8 Tackling individual difference to achieve model generalizability
Emotional experience is believed to be highly individualized [111, 112]. Accordingly, most EEG-based affective computing studies to date have been conducted in a subject-wise manner, using the affective labels from each subject to train individual-based computational models. However, such a practice severely limits large-scale application, as training and using individual-based models is time-consuming and resource-demanding. A model that is readily applicable to the general population would greatly increase the popularity of EEG-based affective computing.
Towards such a goal, efforts have been made to identify EEG features that are robust across subjects. Some recent studies have investigated cross-subject performance over a wide range of EEG features [107, 113–115]. As reported, features such as differential entropy, maximum power spectral frequency, and Shannon entropy of the gamma band shared similar patterns across subjects, and the Hjorth mobility parameter in the beta rhythm achieved better cross-subject performance than other features. Besides extracting features directly, domain adaptation, which projects features to subspaces to track patterns that are invariant across subjects, has also shown promising results [79, 116, 117]. Nevertheless, given the known high individual differences in emotional experience, these similar patterns across subjects may not be enough to support an affective computing model for practical application, as reflected by the considerable deterioration in performance compared with the corresponding individual-based models.
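For readers less familiar with two of the features named above, a minimal sketch of their usual definitions follows. It assumes the input has already been band-pass filtered to the band of interest, and it treats differential entropy under a Gaussian assumption, where it reduces to 0.5 ln(2πeσ²). The function names and test signals are hypothetical, and mobility is expressed per sample because a first difference is used in place of a time derivative.

```python
import math
from statistics import pvariance


def differential_entropy(band_signal):
    """Differential entropy of a band-limited signal, assuming it is Gaussian."""
    return 0.5 * math.log(2 * math.pi * math.e * pvariance(band_signal))


def hjorth_mobility(signal):
    """Hjorth mobility: sqrt(variance of the first difference / variance of the signal)."""
    diff = [b - a for a, b in zip(signal, signal[1:])]
    return math.sqrt(pvariance(diff) / pvariance(signal))


# Toy signals (assumed): 1 s of a 10 Hz and a 20 Hz sinusoid sampled at 200 Hz.
fs = 200
slow = [math.sin(2 * math.pi * 10 * n / fs) for n in range(fs)]
fast = [math.sin(2 * math.pi * 20 * n / fs) for n in range(fs)]

de_slow = differential_entropy(slow)
de_scaled = differential_entropy([2 * x for x in slow])
mob_slow = hjorth_mobility(slow)
mob_fast = hjorth_mobility(fast)
```

Doubling the amplitude raises the differential entropy by ln 2, so the feature is essentially a log-power measure, while mobility grows with the dominant frequency of the signal. Whether such features transfer across subjects is the empirical question raised in the text and is not something this sketch can show.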
Instead of pursuing features that are robust across subjects, some researchers are beginning to take a transfer learning approach to address the individual difference issue [118–121]. The key idea of this approach is to adapt a model for a new individual by selectively using the data or information from other subjects, so that the effect of individual differences can be alleviated. In a series of studies, improved performance was obtained compared with non-transfer methods. It is also believed that performance comparable to subject-dependent models can be achieved when the datasets are sufficiently large [122]. Therefore, large-scale datasets may be necessary to support the efficacy of the subject-transfer approach. Nevertheless, researchers should be aware that data in such a dataset need to be used selectively, since inter-subject variability might deteriorate the performance of the transferred models.
As a possible further improvement of the subject-transfer approach, it might be necessary to take the subject’s dispositional traits into consideration. Dispositional traits such as personality provide a comprehensive summary of individual differences in behavior and experience from a psychological perspective [123]. There is strong evidence suggesting that personality traits could greatly influence people’s perception of affective content. For example, positive emotional stimuli elicited stronger activations in extroverts, while neurotic people were more vulnerable to negative stimuli [124–127]; anger stimuli evoked greater anger in people with high trait anger than in those with low trait anger [128, 129]; and people’s trait emotion regulation could also influence how they process emotional information [130]. Therefore, tackling dispositional-trait-based individual differences may provide complementary information beyond the above-mentioned data-driven approaches towards a generalized affective computing solution. The relationship between people’s emotion responses and individual traits could be understood as:
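One way to write the two models is sketched here. The form is reconstructed only from the symbol definitions that follow, so it is an assumption rather than the authors' exact notation. In particular, the summation over n traits and the additive placement of the error term are assumed.

```latex
\begin{align}
E &= E_c + E_i + \epsilon \tag{1} \\
E &= E_c + \sum_{j=1}^{n} k_j T_j + \epsilon \tag{2}
\end{align}
```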
where E is one individual’s emotion response to certain stimuli, E_c represents the common emotion response shared by people, and E_i represents the unique emotion response of this individual. T_j represents the emotion responses caused by trait j, k_j represents trait j’s influence on emotion responses, and ε is the random error. While equation (1) reflects the general state-of-the-art conceptual model used in current affective computing studies, the emotion responses could be better modeled by introducing individual differences, as in equation (2). An overall affective computing scheme summarizing sections 2.5 to 2.8 is illustrated in Fig. 2.

Fig. 2 Affective computing towards individualized, real-time decoding.
2.9 Finding more application scenarios
EEG-based affective computing has revealed its potential in many application scenarios, such as automatic emotion tagging or affective retrieval of multimedia resources [131, 132], monitoring drivers’ fatigue or stress states [133–135], neurofeedback training for emotion regulation [136, 137], and interactive BCI games [138, 139]. In addition, EEG-based affective computing technology has also been applied in more traditional fields such as education. For example, Li and colleagues built an EEG-based affective computing system to assist distance education [140]. Mampusti and colleagues designed an EEG-based affective computing model targeted explicitly at academic affective states (boredom, confusion, engagement, and frustration) [141].
When seeking more possible application scenarios, researchers should be well aware that the terminology for the same concept (such as “emotion”) can vary across fields. For example, researchers might use “mood”, “affect”, “emotion”, or other words to refer to similar psychological phenomena, with subtle differences in the definitions and experimental manipulations. Finding common ground between these differences could help open up new application scenarios in more fields, while staying sensitive to the subtle differences and providing more targeted solutions for the differentiated needs could better integrate affective computing into these new application scenarios. For example, “mood” is often regarded as longer-lasting and less intense than “emotion”. Therefore, the use of “mood” could imply a special interest in affective states over a longer time scale, which might not be consistent with the current mainstream emotion recognition models and would require more tailored experimental designs. On the other hand, the same term used in different application scenarios could also emphasize different aspects or be manipulated with different methods; thus researchers should always be cautious about the context-dependency of the conclusions.
In addition, the limitations of EEG hardware (e.g., low signal-to-noise ratio, poor signal stability, and limited power supply time) should also be taken into consideration in real-world application scenarios. For now, the most suitable application scenarios for EEG-based affective computing systems are closed environments where people do not make intense movements (e.g., computer games, driver monitoring, and distance education). Moreover, there are also calls for further optimization of the existing hardware towards the specific purpose of affective computing. For example, future hardware may integrate the computing architectures into the chips while keeping power consumption to a minimum. This is expected to further increase the wearability, portability, and durability of the system, opening up more flexible application scenarios.
2.10 Taking ethical issues into consideration
Emotions, providing information about the most intimate motivational factors and reactions, are among the most private personal information [142]. Because it aims at decoding people’s affective states, affective computing has drawn increasing ethical concerns in recent years [143–145]. These ethical concerns are shared with other modern technologies, such as neurotechnology, artificial intelligence, and brain-computer interfaces [146]. For example, the Ethics Guidelines for Trustworthy AI proposed that data protection must be guaranteed throughout a system’s entire lifecycle [147]. Researchers in the BCI field have also proposed criteria to evaluate the implications of brain-reading techniques in general in terms of mental privacy, namely accuracy, reliability, informativity, concealability, and enforceability. The concealability and enforceability of brain-reading systems are particularly emphasized because of the potential threats to mental privacy and civil rights. These criteria are expected to help stakeholders orient themselves in the rapidly developing field of brain reading [148]. Although there are no universally recognized ethical rules for affective computing practice, ethical guidelines or criteria from the AI and BCI fields could still be a handy reference for affective computing researchers to evaluate their work. Besides, there are also ethical challenges unique to affective computing [143]. For example, the development of affective computing opens up the possibility of not only recognizing but also influencing or even manipulating an individual’s emotions. Because of the crucial role emotions play in decision-making, it is possible to influence one’s choices by manipulating their emotions, which, as commented in [142], ‘constitutes the ultimate breach of ethics and will never be acceptable to computer users’. Because of the potential power of affective computing [149], people in this field should be sensitive to these ethical issues as early as possible.
3 Conclusions
We reviewed ten challenges facing affective computing research, from both the theoretical and operational perspectives. We took an interdisciplinary approach by reviewing studies from fields such as information technology, psychology, and neuroscience, and Table 1 summarizes which fields are most promising to bring new breakthroughs to those ten challenges. On the theoretical side, we suggest that researchers should be well aware of the limitations of the commonly used emotion models, and be cautious about the widely accepted assumptions on EEG-emotion relationships as well as the transferability of findings based on different research paradigms. On the practical side, we propose several operational recommendations for the challenges of data collection, feature extraction, model implementation, and online system design, as well as the potential ethical issues. The present review is expected to contribute to an improved understanding of EEG-based affective computing and promote further applications.
Table 1 Summary of the ten challenges and the most related research fields.
Footnotes
Conflict of interests
All contributing authors have no conflict of interests.
Financial support
This work is supported by National Science Foundation of China under Grant U1736220, MOE (Ministry of Education China) Project of Humanities and Social Sciences (17YJA190017), National Social Science Foundation of China under Grant 17ZDA323, and National Key Research and Development Plan under Grant 2016YFB1001200.
