Abstract
The Japanese and Caucasian Brief Affect Recognition Task (JACBART) has been proposed as a standardized method for measuring people's ability to accurately categorize briefly presented images of facial expressions. However, the factors that impact performance in this task are not entirely understood. The current study sought to explore the role of the forward mask's duration (i.e., fixed vs. variable) in brief affect categorization across expressions of the six basic emotions (i.e., anger, disgust, fear, happiness, sadness, and surprise) and three presentation times (i.e., 17, 67, and 500 ms). Current findings do not demonstrate evidence that a variable duration forward mask negatively impacts brief affect categorization. However, efficiency and necessity thresholds were observed to vary across the expressions of emotion. Further exploration of the temporal dynamics of facial affect categorization will therefore require a consideration of these differences.
Individuals have an inherent faculty to voluntarily control the facial expressions they portray to others, which enables them to either simulate an emotion they do not feel, minimize or neutralize an emotion they do feel, or hide their true emotion with the simulation of another (Ekman, 1985/2009; Ekman & Friesen, 1969; Zuckerman et al., 1981). The portrayal of voluntary facial expressions can, for obvious reasons, be advantageous for the expresser when they intend to manipulate others (Zuckerman et al., 1981). However, in line with Darwin's (1872) Inhibition Hypothesis, this expression control is not absolute and involuntary leakage can unintentionally result in brief demonstrations—appearing for < 0.5 s—of an emotion the individual is trying to conceal (Ekman, 1985/2009; Ekman & Friesen, 1974; Haggard & Isaacs, 1966; Yan et al., 2013). These brief emotional leakages—often termed micro-expressions—are suggested to differ from typical involuntary expressions, which are otherwise known as macro-expressions and portrayed for between 0.5 and 4 s (Ekman, 2003; Hess & Kleck, 1990).
Although empirical data concerning the production of micro-expressions are still limited due to their nature (see Porter & ten Brinke, 2008 or Yan et al., 2013), there is a great deal of research exploring their categorization (e.g., Calvo & Lundqvist, 2008; Ekman & Friesen, 1974; Esteves & Öhman, 1993; Marsh et al., 2010; Matsumoto & Hwang, 2011; Matsumoto et al., 2000, 2014; Milders et al., 2008; Porter & ten Brinke, 2010; Russell et al., 2006, 2008; Shen et al., 2012). Observers are predictably found to have increased difficulty accurately recognizing these brief expressions—as compared to full-duration macro-expressions. Indeed, longer display times were found to benefit brief affect categorization (Calvo & Lundqvist, 2008; Kirouac & Doré, 1984).
There is, however, evidence (e.g., Marsh et al., 2010; Matsumoto & Hwang, 2011; Matsumoto et al., 2014; Porter & ten Brinke, 2010; Russell et al., 2006, 2008) that individuals are capable of improving their ability to efficiently recognize these brief expressions through training, which has important implications for high-stakes situations where accurate emotion recognition could save lives (e.g., see Matsumoto et al., 2014) or enhance clinical interventions (e.g., Marsh et al., 2010; Russell et al., 2006, 2008). As a result, although outside the scope of the current study, training programs have been developed that purport to increase a person's micro-expression recognition abilities; examples can be accessed through the Paul Ekman Group (https://www.paulekman.com/micro-expressions-training-tools/) or Matsumoto's Humintell (https://www.humintell.com/products-2/).
These training programs are of particular relevance to the current study because the Japanese and Caucasian Brief Affect Recognition Task (JACBART) was developed as a standardized measure of brief affect recognition pre- and post-training, as an assessment of training effectiveness. The JACBART proposes that a target facial expression of emotion should be presented for a brief amount of time between two neutral expressions of that target (i.e., a neutral expression of the same individual presenting the target expression is used as both the forward and backward masks). The observer is then typically asked to select which emotion the target expression portrayed (e.g., Hurley, 2012; Hurley et al., 2014; Matsumoto & Hwang, 2011; Matsumoto et al., 2000).
The introduction of the neutral expression after the target expression (i.e., a backward mask) with the JACBART—as compared to the Brief Affect Recognition Test (BART; see Ekman & Friesen, 1974)—was thought to serve an important function beyond simply increasing the measure's ecological validity. More specifically, it was thought this approach would reduce the likelihood that a brief target image would leave a residual afterimage in the early stages of the visual system, which could consequently facilitate post-offset processing of the target and lead to increased recognition rates (e.g., Bacon-Macé et al., 2005; Shen et al., 2012). Indeed, the duration (Esteves & Öhman, 1993; Macknik & Livingstone, 1998) and type (Loffler et al., 2005; Milders et al., 2008; Zhang et al., 2014) of backward mask have been observed to impact brief affect recognition thresholds. More specifically, a backward mask of at least 50 ms is suggested to significantly lower recognition rates (Esteves & Öhman, 1993; Macknik & Livingstone, 1998), and a neutral-face backward mask was found to add increased affect recognition difficulty as compared to a dynamic checkerboard backward mask (Milders et al., 2008; see, also, Loffler et al., 2005).
Despite the many studies demonstrating the impact of a backward mask on brief affect recognition, there is little research exploring the role of the forward mask (Esteves & Öhman, 1993; Macknik & Livingstone, 1998). Indeed, the JACBART commonly displays a forward mask for a fixed duration (e.g., Matsumoto & Hwang, 2011; Shen et al., 2012; Zhang et al., 2014) or a duration that varies systematically to obtain a fixed duration sum across the target and mask durations (e.g., Hurley, 2012; Hurley et al., 2014; Matsumoto et al., 2000). Furthermore, even with the latter approach, the forward mask varied by at most 65 ms within an experiment.
This presents a potential limitation for the JACBART because individuals have demonstrated an ability to devote their attention to specific timespans. For instance, primes with temporal information have been found to benefit target recognition, when compared to primes without temporal meaning (Coull & Nobre, 1998). Consequently, a fixed forward mask duration could facilitate emotion recognition and impact recognition thresholds for the JACBART by allowing participants to use the forward mask onset as a prime to predict the target onset.
The current study was therefore designed to explore the impact of forward mask duration variability on emotion categorization performances with brief expressions of emotion in a JACBART paradigm. Target expressions were presented for either 17, 67, or 500 ms (i.e., 1, 4, or 30 frames on a 60 Hz display monitor): 17 ms is the shortest time for which a visual stimulus can be presented on a standard 60 Hz monitor, 67 ms is the shortest time typically employed with the JACBART (e.g., Matsumoto et al., 2000), and 500 ms is thought to yield performances equivalent to unlimited viewing (Calvo & Lundqvist, 2008) and represents the threshold between micro-expressions and macro-expressions. The version of the JACBART employed by Matsumoto and Hwang (2011) was modified such that the stimuli appeared after either a fixed-duration (1,000 ms) or variable-duration (750–1,250 ms) forward mask. By then asking participants to categorize the six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) at the three target durations, the current study sought to explore forward mask variability's relationship with two thresholds: the necessity threshold, defined as the minimum presentation time needed to recognize facial expressions at a rate significantly above chance, and the efficiency threshold, defined as the minimum presentation time required for a brief facial expression to be recognized at a rate statistically equivalent to a macro-expression (i.e., a duration of 500 ms or more). It was hypothesized that the variable-duration forward mask would reduce a participant's ability to predict the onset of the target expression. This increased difficulty would be reflected in lower hit rates and greater erroneous response rates, and would consequently increase the necessity and efficiency thresholds. That is, both above-chance performance and ceiling-level performance would be expected to require longer target durations.
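The target durations above follow directly from the monitor's refresh rate. As a minimal sketch (the helper name `frames_to_ms` is an illustrative assumption, not part of the study's presentation script), the frame-to-millisecond conversion can be written as:

```python
REFRESH_HZ = 60  # refresh rate of the display monitor, as reported in the text


def frames_to_ms(frames: int, refresh_hz: float = REFRESH_HZ) -> float:
    """Duration in milliseconds of a whole number of display frames."""
    return frames * 1000.0 / refresh_hz


# The three target durations correspond to 1, 4, and 30 frames.
for frames in (1, 4, 30):
    print(f"{frames} frame(s) = {frames_to_ms(frames):.1f} ms")
```

Because a stimulus can only be shown for a whole number of frames, the nominal 17 and 67 ms durations are rounded values of one and four frames at 60 Hz.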
Two additional hypotheses were also formulated with respect to these categorization thresholds. First, based on research with fixed duration pre-target fixation screens (i.e., without a forward mask; Milders et al., 2008; Pessoa et al., 2005), the necessity threshold was hypothesized to vary across the six basic emotions. More specifically, it was predicted that 17 ms would be sufficient for the recognition of angry and happy expressions at rates above chance, while 67 ms would be needed for expressions of fear. For the remaining emotions, due to a lack of information in previous studies, it was unclear whether 17 ms would be sufficient to yield above-chance performances. Second, Calvo and Lundqvist (2008) suggest 67 ms may not be sufficient for categorization performances to meet the current efficiency threshold definition with any expressions of emotion. However, despite observing an interaction between display time and displayed emotion, they did not consider how display time could have varying effects across emotions, opting instead to collapse their data across emotion conditions. It was, therefore, hypothesized that the efficiency threshold would vary across expressions of emotion.
Methods
Participants
Thirty undergraduate students were recruited to participate in the current study. These individuals were randomly assigned to one of two forward mask groups: Fixed-Duration or Variable-Duration. There were 13 individuals in the Fixed-Duration group (5 men, 8 women; Mage = 18.38, SD = 0.77 years) and 17 in the Variable-Duration group (2 men, 15 women; Mage = 19.71, SD = 4.25 years). All participants were recruited through an undergraduate participant pool, the Integrated System of Participation in Research (ISPR), at the University of Ottawa. Participants completed the experiment individually for course credit in the Integrated Neurocognitive and Social Psychophysiology Interdisciplinary Research Environment (INSPIRE) lab at the University of Ottawa. Although they were tested in groups of up to four individuals, each completed the task on a separate computer shielded from the others by a carrel. All participants in a given session were placed in the same forward mask group.
Materials
E-Prime v2.3 (Psychology Software Tools, Sharpsburg, PA) was used to present a total of 216 trials on ViewSonic VT2405 monitors with a 60 Hz refresh rate and 1920 × 1080 pixel resolution. Stimuli consisted of a total of 24 target images taken from the Japanese and Caucasian Facial Expressions of Emotion (JACFEE; Matsumoto & Ekman, 1988) database. This included images of two men and two women Caucasian models each displaying facial expressions of all six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, and Surprise). In addition, the neutral face images of all four models were used as their respective target expression masks. All images were cropped such that the emotionally expressive faces would align with their neutral counterparts, which served to reduce the degree of apparent motion induced by the switch between the neutral mask images and the target images.
Procedure
A modified JACBART paradigm was used (Matsumoto et al., 2000; Matsumoto & Hwang, 2011), where a target expression was always presented between two neutral expressions portrayed by the same actor (see Figure 1). Each trial began with the display of a neutral facial expression for either 1,000 ms (Fixed-Duration group; see Matsumoto & Hwang, 2011) or a random duration between 750 and 1,250 ms (Variable-Duration group; with 17 ms increments). The target expression was then displayed for one of three set durations (17, 67, or 500 ms) and followed by the neutral expression image for 1,000 ms. The participant was then asked to indicate which expression was displayed by selecting from the seven options: “anger,” “disgust,” “fear,” “happiness,” “sadness,” “surprise,” or “neutral.” Participants were instructed to respond as quickly and accurately as possible, and a maximum of 10 s was given to respond. Any response outside this time was deemed to be unrepresentative of the current affect categorization task (see Hurley, 2012; Hurley et al., 2014). After the participant's response, a 500 ms blank screen inter-trial interval was presented.

Figure 1. The sequence of events presented within a single trial for both forward-mask condition groups (fixed: 1,000 ms; variable: 750–1,250 ms) and the target durations (17, 67, and 500 ms).
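The trial sequence can be sketched in frame units, since 750, 1,000, and 1,250 ms correspond to 45, 60, and 75 frames at 60 Hz. The following is a hedged illustration rather than the study's E-Prime script: the function names are hypothetical, and it assumes that each whole-frame mask duration in the variable range was equally likely, which the text does not state.

```python
import random

REFRESH_HZ = 60                  # monitor refresh rate reported in Materials
FIXED_MASK_FRAMES = 60           # 60 frames = 1,000 ms
VARIABLE_MASK_FRAMES = (45, 75)  # 45 to 75 frames = 750 to 1,250 ms
BACKWARD_MASK_FRAMES = 60        # 1,000 ms
TARGET_FRAMES = (1, 4, 30)       # about 17, 67, and 500 ms


def forward_mask_frames(group: str, rng: random.Random) -> int:
    """Forward mask length in frames for one trial."""
    if group == "fixed":
        return FIXED_MASK_FRAMES
    if group == "variable":
        low, high = VARIABLE_MASK_FRAMES
        # Assumption: every whole-frame duration in the range is equally likely.
        return rng.randint(low, high)
    raise ValueError(f"unknown group: {group!r}")


def trial_timeline(group: str, target_frames: int, rng: random.Random):
    """Return (event, duration in ms) pairs for the stimulus part of a trial."""
    frame_ms = 1000.0 / REFRESH_HZ
    return [
        ("forward mask (neutral)", forward_mask_frames(group, rng) * frame_ms),
        ("target expression", target_frames * frame_ms),
        ("backward mask (neutral)", BACKWARD_MASK_FRAMES * frame_ms),
    ]


rng = random.Random(2024)  # arbitrary seed, used only to make the sketch repeatable
for event, duration_ms in trial_timeline("variable", TARGET_FRAMES[1], rng):
    print(f"{event}: {duration_ms:.0f} ms")
```

The response window (up to 10 s) and the 500 ms inter-trial interval are omitted because their timing depends on the participant rather than on the design.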
All stimuli were presented as 600 by 625-pixel gray-scaled images in the center of the monitor. In addition, expressions of emotion were aligned with their neutral face counterparts using the dorsum of the nose, which offered a more stable baseline across the various expressions. There was a total of 216 trials, with three target presentation durations (17, 67, and 500 ms), six target expressions of emotion (Anger, Disgust, Fear, Happiness, Sadness, and Surprise), four models (2 men and 2 women), and three repetitions. Trials were presented in random order.
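The trial count follows from fully crossing the design factors: 3 durations × 6 emotions × 4 models × 3 repetitions = 216. A minimal sketch of building and shuffling such a trial list is given below; the model labels and the function name `build_trials` are hypothetical placeholders, and simple shuffling is assumed as the randomization method.

```python
import itertools
import random

DURATIONS_MS = (17, 67, 500)
EMOTIONS = ("anger", "disgust", "fear", "happiness", "sadness", "surprise")
MODELS = ("man_1", "man_2", "woman_1", "woman_2")  # hypothetical labels
REPETITIONS = 3


def build_trials(rng: random.Random):
    """Fully cross the design factors and return the trials in random order."""
    trials = [
        {"duration_ms": d, "emotion": e, "model": m, "repetition": r}
        for d, e, m, r in itertools.product(
            DURATIONS_MS, EMOTIONS, MODELS, range(1, REPETITIONS + 1)
        )
    ]
    rng.shuffle(trials)  # assumption: order randomized by simple shuffling
    return trials


trials = build_trials(random.Random(1))
print(len(trials))  # 3 * 6 * 4 * 3 trials
```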
Data Analyses
Using R (version 3.6.1) software, participant responses were transformed into binary values to compute hit rates—responses that aligned with an expression's expected label—and erroneous response rates—responses that did not align with an expression's expected label. That is, when calculating hit rates, a “1” was given when a response corresponded with the image's expected emotional label, while a “0” was given when it did not. Conversely, when calculating the six erroneous response rates for each target expression, a “1” was only given to an emotional response option when that response was used and it did not correspond with the image's expected emotional label, while a “0” was used for all other cases.
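The binary coding described above can be illustrated as follows. The original analysis was run in R; this Python sketch only mirrors the coding rule, and the function names are illustrative assumptions.

```python
RESPONSE_OPTIONS = (
    "anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral",
)


def code_hit(target: str, response: str) -> int:
    """1 if the response matches the expression's expected label, else 0."""
    return int(response == target)


def code_errors(target: str, response: str) -> dict:
    """One 0/1 indicator for each of the six erroneous response options.

    An option is coded 1 only when it was the response given and it does not
    match the expected label; every other case is coded 0.
    """
    return {
        option: int(option == response)
        for option in RESPONSE_OPTIONS
        if option != target
    }


def mean_rate(indicators) -> float:
    """Proportion of trials coded 1."""
    indicators = list(indicators)
    return sum(indicators) / len(indicators)


# Hypothetical responses to four presentations of an angry expression.
responses = ["anger", "neutral", "anger", "disgust"]
print(mean_rate(code_hit("anger", r) for r in responses))
print(mean_rate(code_errors("anger", r)["neutral"] for r in responses))
```

With seven response options and one correct label, each target expression has six possible erroneous responses, which is why six erroneous response rates are computed per target.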
To approximate the necessity threshold for each target expression, each group's hit rates were compared to chance (1/6 or 16.67%) using single-sample t-tests, accounting for target emotion and duration. Bonferroni adjustments were applied to p-values to account for the multiple comparisons. Next, to explore the efficiency threshold, hit rates were compared across conditions in a 6 (Target Emotion: Anger, Disgust, Fear, Happiness, Sadness, and Surprise) × 3 (Target Duration: 17, 67, and 500 ms) × 2 (Forward Mask: fixed-duration or variable-duration) mixed-design ANOVA. Greenhouse-Geisser corrections were applied where the assumption of sphericity was violated according to Mauchly's test, and Bonferroni adjustments were applied for all post-hoc comparisons.
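As a rough illustration of the comparison to chance, the sketch below computes a single-sample t statistic against 1/6 and applies a Bonferroni adjustment to a set of p-values. It uses only the Python standard library, so converting the t statistic to a p-value is left to a statistics package; the hit rates shown are made-up numbers, not data from the study.

```python
import math
import statistics

CHANCE = 1 / 6  # chance level used in the paper (16.67%)


def one_sample_t(hit_rates, mu: float = CHANCE):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(hit_rates)
    mean = statistics.mean(hit_rates)
    sd = statistics.stdev(hit_rates)  # sample SD (n - 1 denominator)
    t = (mean - mu) / (sd / math.sqrt(n))
    return t, n - 1


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical per-participant hit rates for one emotion-by-duration cell.
t, df = one_sample_t([0.25, 0.50, 0.42, 0.33, 0.58])
print(f"t({df}) = {t:.2f}")
print(bonferroni([0.004, 0.020, 0.300]))
```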
Task performance was further examined through an analysis of erroneous response rates. These were analyzed in two stages: (1) chi-squared goodness-of-fit tests were used to determine if participants used any particular erroneous response more than others (expected erroneous rate = [1 − hit rate]/6) and (2) where the chi-square test showed non-uniform use of erroneous response options, all erroneous response rates were compared to chance (16.67%) in single-sample t-tests. Bonferroni adjustments were applied to p-values to account for the multiple comparisons.
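The first stage can be sketched as a goodness-of-fit test against uniform use of the six erroneous options, which is equivalent to an expected rate of (1 − hit rate)/6 for each option. The sketch below assumes the test is run on response counts and uses the tabulated critical value for df = 5 at α = .05; the counts are invented for illustration.

```python
CHI2_CRIT_DF5_ALPHA05 = 11.070  # tabulated chi-square critical value, df = 5


def chi_square_uniform(error_counts) -> float:
    """Chi-square statistic testing uniform use of the erroneous options."""
    total = sum(error_counts)
    if total == 0:
        raise ValueError("no erroneous responses to test")
    expected = total / len(error_counts)  # same expected count for each option
    return sum((obs - expected) ** 2 / expected for obs in error_counts)


# Hypothetical counts for the six erroneous options of one target expression,
# with the first option (e.g., "neutral") chosen far more often than the rest.
counts = [30, 6, 6, 6, 6, 6]
statistic = chi_square_uniform(counts)
print(statistic, statistic > CHI2_CRIT_DF5_ALPHA05)
```

Only when this first test indicates non-uniform use would the individual erroneous response rates be compared to chance in the second stage.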
Results and Discussion
The JACBART was designed as a standardized approach for testing one's ability to recognize brief displays of emotion. This paradigm displays a target expression of emotion between presentations of two neutral expression images of the same actor. The neutral face image appearing before the target image is referred to as the forward mask, whereas the one appearing after is called the backward mask (Matsumoto et al., 2000). While there has been some work on the effects of the backward mask, there is currently little research exploring the influence of the forward mask on performance in the JACBART (Esteves & Öhman, 1993; Macknik & Livingstone, 1998). In addition, previous studies examining the categorization of briefly presented facial expression images have generally examined only a limited range of expressions (e.g., Esteves & Öhman, 1993; Milders et al., 2008; Pessoa et al., 2005) or have not analyzed differences across the expression categories (e.g., Calvo & Lundqvist, 2008).
With this in mind, the current study was designed to explore the role of the forward mask's duration, specifically the effects of changing from a fixed duration (1,000 ms) to a variable one (750–1,250 ms), on the categorization of briefly-presented facial expressions of emotion. In addition, it sought to explore the influence of the forward mask's duration on the necessity threshold—defined as the minimum target duration required to exceed chance—and efficiency threshold—defined as the point where target recognition is no longer benefitted by longer display times. To this end, using the JACBART design, three target expression durations—17, 67, and 500 ms—were used to present expressions of six basic emotions: Anger, Disgust, Fear, Happiness, Sadness, and Surprise. Mixed-design ANOVAs did not demonstrate a significant effect or interaction with the forward mask's duration (see Table 1). Conversely, a significant interaction was observed between target emotion and target duration. Results for each emotion will be discussed in turn.
Table 1. Mixed-design ANOVA examining hit rate differences across the forward mask conditions, target emotions, and target durations.
†p < .10; *p < .05; **p < .01; ***p < .001.
Anger Expressions
Our findings showed that with expressions of anger, in contrast to prior literature (Milders et al., 2008; Pessoa et al., 2005), 17 ms was not sufficient for expression categorization; hit rates were not observed to exceed chance with either the fixed-duration or variable-duration forward masks (see Tables 2 and 3). Indeed, angry expressions were often erroneously recognized as “neutral” with this target duration, regardless of the forward mask condition (see Tables 2 and 3). This disparity from the findings of Milders et al. (2008) was not initially hypothesized but may be due to differences in methodology. Specifically, Milders and colleagues used different expressor identities for the mask and target images, whereas the current design used the same expressor identity for both. As Milders et al. themselves show, the type of mask can impact performance. They observed that a neutral face mask increased the task difficulty relative to a dynamic checkerboard mask. Our results can be seen as an extension of theirs, showing that when a same-identity face is used as a forward mask, it reduces performance compared to a different-identity face. This in turn means that more time is required to recognize a facial expression when a same-identity face is used as a mask.
Table 2. Hit rates (italicized) and erroneous response rates for the fixed-duration forward mask group (n = 13), including chi-square goodness-of-fit tests (df = 5) and single-sample t-tests (chance = 16.67%).
***p < .001; **p < .01; *p < .05; †p < .10; one-tailed; Bonferroni adjusted.
Table 3. Hit rates (italicized) and erroneous response rates for the variable-duration forward mask group (n = 17), including chi-square goodness-of-fit tests (df = 5) and single-sample t-tests (chance = 16.67%).
***p < .001; **p < .01; *p < .05; †p < .10; one-tailed; Bonferroni adjusted.
Correct categorization of angry face images improved with durations longer than 17 ms (both p < .001). The “anger” response was used at a rate above chance with both the 67 and 500 ms displays of angry expressions regardless of the forward mask duration condition (see Tables 2 and 3). Also, there was no significant difference between the anger hit rates with these two longer target durations (p = .06; see Table 1 and Figure 2). Contrary to the findings of Calvo and Lundqvist (2008), this suggests anger may be efficiently recognized with as little as a 67 ms expression display time. However, inconsistent with prior literature (Gagnon et al., 2010; Gosselin et al., 1995; Jack et al., 2009), expressions of anger were not categorized with the “disgust” option at rates above chance.

Figure 2. Mean hit rates for the various target emotions (anger, disgust, fear, happiness, sadness, and surprise) and durations (17, 67, and 500 ms), with combined forward-mask condition groups (fixed and variable).
Disgust Expressions
With expressions of disgust, 17 ms was not always sufficient for correct categorization; the “disgust” response option was not used at a rate exceeding chance regardless of the forward mask condition. Indeed, participants often erroneously labeled these brief expressions as “neutral,” regardless of the forward mask condition (see Tables 2 and 3). Consistent with Calvo and Lundqvist (2008), longer display times consistently improved proper “disgust” response rates—regardless of the forward mask condition (all p < .001; see Table 1 and Figure 2). However, inconsistent with Calvo and Lundqvist (2008), 67 ms was sufficient for proper disgust categorization to exceed chance.
Fear Expressions
A 17 ms presentation duration was not sufficient for participants to categorize this expression as “fear” at a rate above chance (see Tables 2 and 3). Instead, this expression often resulted in an erroneous usage of the “neutral” response when presented with a variable-duration forward mask. Hit rates were once again observed to improve with longer expression display times (all p < .001; see Table 1 and Figure 2), and fear expressions were properly labeled at rates above chance with both the 67 and 500 ms display times. However, inconsistent with prior literature (Chamberland et al., 2017; Gagnon et al., 2010; Gosselin et al., 1995; Gosselin & Simard, 1999; Jack et al., 2009; Roy-Charland et al., 2014, 2015), fear expressions were not confused with “surprise” after corrections for multiple comparisons.
Happiness Expressions
Contrary to the expressions of emotion discussed above, but consistent with prior literature (Milders et al., 2008), 17 ms was sufficient for happy expressions to be accurately labeled with the “happy” response at a rate above chance. In addition, participants did not present any biases in their erroneous response options when presented with expressions of happiness (see Tables 2 and 3). Accurate “happy” response rates consistently improved as participants were presented with the happy expression for longer durations (all p ≤ .05; see Table 1 and Figure 2)—regardless of the forward mask condition.
Sadness Expressions
With the 17 ms display time, facial expressions of sadness were not accurately categorized (i.e., the “sadness” response option was not used) at rates above chance—regardless of the forward mask condition—and were, instead, erroneously labeled as “neutral” at rates above chance (see Tables 2 and 3). Findings show that these categorization rates improved with longer display times (all p < .001; see Table 1 and Figure 2), and the “sadness” label was correctly used at rates above chance with the 67 and 500 ms display durations. In addition, chi-square goodness-of-fit tests indicate there was no bias for any particular erroneous response option with the longer display time (500 ms; Tables 2 and 3).
Surprise Expressions
As with expressions of happiness, 17 ms was sufficient for the correct categorization of surprise. Indeed, expressions of surprise were recognized at rates above chance regardless of forward mask variability condition or display time. In addition, participants did not present a bias for any particular erroneous response (see Tables 2 and 3). Despite the observed high hit rates with this expression, the “surprise” response was still accurately used at greater rates with the 67 and 500 ms display times than the 17 ms display time (both p < .001). However, no significant difference was observed between the two longer display times (p > .99; see Figure 2).
Limitations and Future Directions
The current study was exploratory in nature with regard to the categorization thresholds and was not intended to determine precise measures of the defined thresholds. Instead, the purpose was to use these threshold measures to explore the effect of a variable-duration forward mask. Results nevertheless offer further insight and demonstrate how, in a JACBART paradigm, these thresholds can vary across facial expressions of emotion. Further research employing more presentations will be needed to determine precise stimulus duration thresholds.
Future research may want to further explore the role of a variable-duration forward mask with shorter target presentation times. However, as it is uncommon with the JACBART and biologically impossible for a micro-expression to be presented for such brief durations, said research is not entirely warranted. It should also be noted that, although the current design presented a range of forward mask variability not previously observed with the JACBART, further research is needed to determine if a wider range of variability would have a greater effect.
Conclusion
In sum, our findings suggest the forward mask manipulation did not significantly affect categorization rates. Nevertheless, the temporal dynamics of affect categorization were observed to vary across the six basic emotions. Specifically, although 17 ms was sufficient for the categorization of expressions of happiness and surprise, a minimum of 67 ms was necessary for expressions of anger, disgust, fear, and sadness—consequently suggesting the necessity threshold could be below 17 ms with expressions of happiness and surprise, but somewhere between 17 and 67 ms with expressions of anger, disgust, fear, and sadness. Finally, 67 ms was not sufficient for efficient brief affect categorization with expressions of disgust, fear, happiness, or sadness, and the efficiency threshold is suggested to fall somewhere between 67 and 500 ms. Conversely, a relatively shorter efficiency threshold is suggested to fall between 17 and 67 ms for expressions of anger and surprise—as 67 ms was sufficient for efficient categorization rates. These data show the necessity of considering which emotional expressions one is examining when considering the speed with which individuals can recognize them. In addition, they help to bracket the durations needed to categorize a range of common expression types.
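The bracketing logic summarized above can be expressed compactly: a threshold lies between the longest tested duration that failed a criterion and the shortest one that met it. The sketch below encodes the pass/fail pattern reported in this conclusion; the function name and data layout are illustrative assumptions, and the 500 ms condition is treated as meeting the efficiency criterion by definition because it serves as the reference duration.

```python
DURATIONS_MS = (17, 67, 500)


def bracket_threshold(durations_ms, criterion_met):
    """Return (lower, upper) bounds in ms for where a threshold falls.

    lower is None when the shortest tested duration already met the
    criterion; upper is None when no tested duration met it.
    """
    previous = None
    for duration, met in zip(durations_ms, criterion_met):
        if met:
            return (previous, duration)
        previous = duration
    return (previous, None)


# Above-chance categorization at 17, 67, and 500 ms (necessity criterion).
above_chance = {
    "anger": (False, True, True),
    "disgust": (False, True, True),
    "fear": (False, True, True),
    "happiness": (True, True, True),
    "sadness": (False, True, True),
    "surprise": (True, True, True),
}

# Hit rate not significantly different from 500 ms (efficiency criterion).
efficient = {
    "anger": (False, True, True),
    "disgust": (False, False, True),
    "fear": (False, False, True),
    "happiness": (False, False, True),
    "sadness": (False, False, True),
    "surprise": (False, True, True),
}

for emotion in above_chance:
    necessity = bracket_threshold(DURATIONS_MS, above_chance[emotion])
    efficiency = bracket_threshold(DURATIONS_MS, efficient[emotion])
    print(f"{emotion}: necessity {necessity}, efficiency {efficiency}")
```

With only three tested durations the brackets are necessarily coarse, which is consistent with the exploratory aim stated in the limitations.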
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Natural Sciences and Engineering Research Council of Canada (grant no. 2015-05067).
Supplemental Material
This paper does not include supplementary material/movies.
