Abstract
The classic Stroop task is very simple: you have to name the color of words printed on a page. If these words are color words (like “red” or “blue”) and the color a word denotes differs from the color it is printed in (say, “red” printed in blue), the reaction time increases significantly. My aim is to argue that the existing psychological explanations of the Stroop effect need to be supplemented. The Stroop effect is not exclusively about access to motor control. It is also, to a large extent, about interference in perceptual processing. To put it briefly, reading the color word triggers—laterally and automatically—visual imagery of the color and this interferes with the processing of the perceived color of the word. In other words, the Stroop effect is to a large extent a sensory phenomenon, and it has less to do with attention, conflict monitoring, or other higher-level phenomena.
Introduction
One of the most widely researched psychological phenomena of all time is the Stroop effect (see Stroop, 1935). The classic Stroop task is very simple: you have to name the color of words printed on a page. If these words are color words (like “red” or “blue”) and the color a word denotes differs from the color it is printed in (say, “red” printed in blue), the reaction time increases significantly.
What explains this odd difference? There are two major explanations, the first one dominant in the second half of the 20th century, the second dominant in the last 20 years. According to the first one, the Stroop effect is about attention capture. The linguistic stimulus captures our attention, and as a consequence, less attention remains for the processing of the color stimulus (see MacLeod, 1991 for a summary). According to the second one, the Stroop effect is about conflict monitoring and control: there are control mechanisms that detect the conflict between the linguistic and the color stimulus and they prioritize the processing of the language stimulus (Botvinick et al., 2001).
The attention account and the conflict monitoring account of the Stroop effect are very different inasmuch as the former gives a fully bottom-up explanation of the effect the semantic meaning of the word has on the processing of color, whereas the latter gives a top-down one. But they share an important premise, namely, that the Stroop effect is about access to motor control. Depending on whether the word “red” is printed in red or blue, our access to the motor control (of naming the color) is different and this explains the difference in our reaction time. This is clear enough in the attention account, but it is also what is behind the conflict monitoring account, where “conflict may be operationally defined as the simultaneous activation of incompatible representations […] e.g., representations of alternative responses” (Botvinick et al., 2001, p. 630).
My aim in this paper is to argue that the Stroop effect is not exclusively about access to motor control. It is also, to a large extent, about interference in perceptual processing. To put it briefly, reading the color word triggers—laterally and automatically—visual imagery of the color and this interferes with the processing of the perceived color of the word.
In section “Mental Imagery”, I outline the concept of mental imagery that is relevant in this discussion and in section “Language Processing and Mental Imagery”, I provide empirical evidence for the various ways in which language processing and mental imagery interact. In section “Back to the Stroop Effect”, I argue that these interactions provide a clear case for a very early perceptual interference from language processing to perceptual processing that explains some aspects of the Stroop effect in a much more straightforward manner than either the attention account or the conflict monitoring account could.
Mental Imagery
The term “mental imagery” was first consistently used in the early days of experimental psychology in the second half of the 19th century and while it has clearly made it to our ordinary language, the way psychologists and neuroscientists use the concept is not as an ordinary language category. Here is a representative definition from a review article on mental imagery in the journal Trends in Cognitive Sciences: “We use the term ‘mental imagery’ to refer to representations […] of sensory information without a direct external stimulus” (Pearson et al., 2015, p. 590; see also Nanay, 2015, 2018).
This definition captures the pre-theoretical notion of mental imagery, which we tend to have in mind when, for example, thinking about the experience of closing our eyes and visualizing an apple. That experience is a representation of sensory information without direct external stimulus. But the concept of mental imagery has a much wider scope than just the experience of visualizing.
First, mental imagery, like perception, can happen in all sense modalities. Mental imagery can be visual, but it can also be auditory, olfactory, gustatory, and tactile. Second, while visualizing an apple amounts to a voluntary use of mental imagery, there is also involuntary mental imagery, like flashbacks or earworms—annoying tunes that go through our head in spite of the fact that we really don’t want them to. Third, while in the case of visualizing, mental imagery is not accompanied by the feeling of presence—you’re not actually taking the apple to be in front of you—, some other forms of mental imagery may be accompanied by the feeling of presence, for example, in the case of lucid dreaming and in some forms of hallucinations (which are widely taken to be forms of mental imagery in psychiatry).
The definition I have been using is a negative definition. It defines mental imagery as (to rephrase a bit) sensory representation not triggered directly by sensory input. But it leaves open the question about what this sensory representation is triggered by (directly). In some cases, it is triggered by top-down processes, as in the case of closing your eyes and visualizing an apple. But in other cases, it is triggered laterally, by, for example, input in another sense modality. When you watch the TV muted, for example, your auditory representation (and often your salient auditory experience) is not directly triggered by the auditory input—there is no auditory input as the TV is muted. It is directly triggered by the visual input of the images on TV (Calvert et al., 1997; Hertrich et al., 2011; Nanay, 2018; Pekkola et al., 2005; Spence & Deroy, 2013).
It should be clear that while the definition of mental imagery I have been using does seem to capture the ordinary usage of the term, it also carves up mental phenomena somewhat differently. As we have seen, it allows for involuntary imagery. But it also allows for unconscious mental imagery as nothing in the definition says that the perceptual representation that is not triggered directly by sensory input must be a conscious representation.
We have an overwhelming amount of evidence that perception may be conscious or unconscious (e.g., Kouider & Dehaene, 2007). But if perceptual representations that are directly triggered by sensory input (i.e., perception) may be unconscious, then it would be arbitrary to posit that perceptual representations that are not directly triggered by sensory input (i.e., mental imagery) may not be. Further, some people report having no conscious mental imagery—these people are called aphantasics, and in the last two decades or so many experimental studies have been conducted to uncover the causes and nature of aphantasia (see, e.g., Zeman et al., 2007). And while aphantasia seems to be a non-monolithic phenomenon, where many different things can lead to the lack of conscious mental imagery, there is clear evidence that at least a subset of aphantasics, while reporting no conscious mental imagery at all, do have mental imagery in the sense of perceptual representation that is not directly triggered by sensory input. They have unconscious mental imagery (Nanay, 2021).
In short, mental imagery may be voluntary or involuntary and it may be conscious or unconscious. It is a scientifically respectable (and even publicly observable) category that is well suited to play a role in the explanation of psychological phenomena.
Language Processing and Mental Imagery
We now know that language processing is not completely detachable from mental imagery. Both generating linguistic utterances and hearing or reading them utilize mental imagery. Some of the empirical findings supporting these claims come from neuroimaging. Describing a scene relies on our ability to generate mental imagery—early cortical representations not directly triggered by sensory input (Mar, 2004; Zadbood et al., 2017). Even more importantly, hearing a description invariably triggers mental imagery—again, not necessarily conscious mental imagery, but early cortical representations not directly triggered by sensory input—and it is this imagistic representation that is remembered, not the words we heard (Zwaan, 2016; Zwaan & Radvansky, 1998).
We understand a fair amount about how this happens and, crucially, we know a lot about the ways in which linguistic labels change (and speed up) perceptual processes; we also know a fair amount about the time scale of this influence. The most important finding, both from EEG and from eye-tracking studies, is that linguistic labels influence shape recognition in less than 100 ms (Boutonnet & Lupyan, 2015; de Groot et al., 2016; Noorman et al., 2018—it should be acknowledged that in these experiments, the onset of the linguistic label preceded the onset of the shape to be recognized). This is roughly the time it takes for the stimulus to reach V4 (Zamarashkina et al., 2020)—that is, extremely fast (note that word recognition takes significantly longer, see Hauk et al., 2012).
Crucially, this less than 100 ms it takes for linguistic labels to influence shape recognition is much shorter than the time that would be needed for perceptual processing to reach all the way up to higher level representations and then trickle all the way down again to the primary visual cortex (see Lamme & Roelfsema, 2000; Thorpe et al., 1996 for the temporal unfolding of visual processing in unimodal cases and see Kringelbach et al., 2015 for a summary of the relative slowness of non-early cortical processing).
By way of comparison, amodal completion (the visual representation of occluded parts of perceived objects) is taken to be bottom-up or laterally influenced on the basis of timing studies, although it takes slightly longer than 100 ms. Amodal completion in the early cortices happens within 100–200 ms of retinal stimulation (Rauschenberger et al., 2001; Sekuler & Palmer, 1992—this is true even of complex visual stimuli, like faces, see Chen et al., 2009; see also Lerner et al., 2004; Rauschenberger et al., 2006; Yun et al., 2018 for detailed studies that track the (very quick) temporal unfolding of amodal completion in different parts of the visual cortex). If the 100–200 ms of amodal completion is explained in terms of lateral influence, then the less than 100 ms of the influence of linguistic labeling can also be explained in terms of lateral influence.
This means that linguistic processing and mental imagery interact at an extremely early stage of perceptual processing—by any account in early cortical processing.
Back to the Stroop Effect
My aim is to argue that in the light of these results about the relation between language processing and mental imagery, we have good reasons to hold that reading the color word triggers—laterally and automatically—visual imagery of the color and this interferes with the processing of the perceived color of the word and this is what explains the Stroop effect. In other words, the conflict between the color and the meaning of the word starts much earlier than motor control.
Here is an experiment that supports this hypothesis directly (there may also be some indirect support from findings about the Stroop effect for color-related words, like “sky” [for blue] and “fire” [for red]; see Dalrymple-Alford, 1972). A recent experiment shows that even if we control for all the attentional and other mechanisms that determine motor control, the activation patterns in V4—the part of the visual cortex that is responsible for color processing—would be difficult to explain unless we posit early sensory involvement in the Stroop effect (Purmann & Pollmann, 2015).
Given that V4 is devoted (mainly) to color processing, it is active throughout any color Stroop task. More generally, the involvement of V4 in the Stroop task is somewhat difficult to examine experimentally given that without the functioning of these regions, the effect goes away. So some tricks are required to gain any insight into exactly how early cortical color processing is involved in the Stroop task. The experiments in Purmann and Pollmann (2015) examined the ways in which the previous trial in a series of Stroop tasks influences the current trial. So the question they raised is how your early sensory cortices behave depending on the order of these trials. If you read the word “red” printed in blue, there is a conflict—it is an “incongruent trial.” If you read the word “blue” printed in blue, there is no conflict—it is referred to as a “congruent trial.”
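The trial-sequence logic of this design can be made concrete with a small sketch. The trial data, class, and function names here are purely illustrative, not taken from Purmann and Pollmann's materials:

```python
from dataclasses import dataclass

@dataclass
class StroopTrial:
    word: str  # the color word that is read (e.g., "red")
    ink: str   # the color it is printed in (e.g., "blue")

    @property
    def congruent(self) -> bool:
        # "blue" printed in blue: congruent; "red" printed in blue: incongruent
        return self.word == self.ink

def label_sequence(trials):
    """Pair each trial's congruency with that of the preceding trial,
    which is the comparison the trial-sequence analysis turns on."""
    labels, prev = [], None
    for t in trials:
        labels.append((t.congruent, prev.congruent if prev else None))
        prev = t
    return labels

trials = [StroopTrial("blue", "blue"),   # congruent
          StroopTrial("red", "blue"),    # incongruent, after a congruent trial
          StroopTrial("green", "red")]   # incongruent, after an incongruent trial
print(label_sequence(trials))  # [(True, None), (False, True), (False, False)]
```

Sorting trials by these pairs is what allows the analysis to compare V4 activity on incongruent trials that follow congruent versus incongruent trials.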
The question is whether early sensory processing differs depending on whether an incongruent trial was preceded by a congruent or by another incongruent trial. And what the results show is that activity in V4 is very different depending on whether the previous trial was congruent or incongruent. Interestingly, the same effect was not observed in language processing regions of the brain, only in V4. If we take the Stroop task to be about motor control, these results make little sense. But if, as I am suggesting, it is at least partly about sensory processing, these results are exactly what we should expect.
The color of the word activates V4 bottom up (that's perception). And the reading of the word activates V4 laterally and automatically (that's mental imagery). And the processing of the perceived color is slowed down because of the interference of the mental imagery. In short, the conflict between the color and the meaning of the word starts already in perceptual processing.
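This interference claim can be sketched as a toy evidence-accumulation model in which two signals converge on the same color-processing stage: a bottom-up signal from the ink color and a lateral imagery signal from the word. All parameter names and values below are hypothetical and chosen only for illustration; this is not a fit to any data:

```python
def toy_reaction_time(congruent: bool,
                      perception_rate: float = 1.0,
                      imagery_rate: float = 0.4,
                      threshold: float = 50.0) -> float:
    """Toy evidence accumulator for naming the ink color (arbitrary units).
    The laterally triggered imagery of the word's color adds to the evidence
    on congruent trials and competes with it on incongruent trials."""
    rate = perception_rate + (imagery_rate if congruent else -imagery_rate)
    return threshold / rate  # time steps needed to reach the naming threshold

print(toy_reaction_time(congruent=True))   # faster: 50 / 1.4
print(toy_reaction_time(congruent=False))  # slower: 50 / 0.6
```

The point of the sketch is only that if imagery and perception feed the same accumulator, slower responses on incongruent trials fall out of perceptual processing itself, with no appeal to response-level conflict.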
A word of caution about the scope of the claim I argued for in this paper. While the main findings of the Stroop effect can be explained in terms of the lateral and automatic activation of mental imagery, I don’t want to pretend that all aspects of the Stroop effect can be explained with the help of this explanatory scheme. For example, we know that subjects show greater interference on the first few trials in each block of testing than on subsequent trials in the series (Henik et al., 1997). Also, there is less interference on incongruent trials if they are frequent in comparison with congruent trials (Lindsay & Jacoby, 1994). I don’t think the appeal to the laterally and automatically triggered mental imagery will help us explain these findings.
Nonetheless, we can conclude that the Stroop effect is, at least partially, a sensory phenomenon, and it has less to do with attention, conflict monitoring, or other higher-level phenomena than previously supposed. While it may give us insights into the nature of attention and automaticity or into the intricacies of conflict monitoring and cognitive control, its theoretical import may be even more significant. In fact, the way language processing and perceptual processing interact in the case of the Stroop effect can open up new research directions both about early cortical sensory processing and about language processing, besides touching on some of the deepest (and earliest) philosophical questions about the relation between perception and language.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
