Abstract
This article presents a study investigating the role of simulation in second language learning in a virtual environment. Participants were asked to explore a virtual park while learning 15 new Czech verbs: action verbs describing movements performed with either the hand or the foot, and abstract verbs. This learning condition was compared with a baseline condition in which no movements (virtual or real) were allowed. The goal was to investigate whether the virtual action (performed with the feet) would promote or interfere with the learning of verbs describing actions performed with the same or a different effector. The number of verbs correctly remembered in a cued recall task was computed, along with reaction times and number of errors in a recognition task. Results show that simulation per se has no effect on verbal learning, but that the features of the virtual experience mediate it.
Introduction
The link between action and language has been widely acknowledged by the scientific community. During the last decade, several articles have reported the involvement of the motor (Buccino et al., 2005; Gerfo et al., 2008; Repetto, Colombo, Cipresso, & Riva, 2013) and premotor cortices (Boulenger, Hauk, & Pulvermuller, 2009; Hauk, Johnsrude, & Pulvermuller, 2004; Kemmerer, Castillo, Talavage, Patterson, & Wiley, 2008; Tettamanti et al., 2005; Willems, Labruna, D’Esposito, Ivry, & Casasanto, 2011) in language processing. Language learning, a cognitive ability that involves both language and memory, lends itself to investigation within this theoretical framework. In fact, foreign language learning has been shown to benefit from linking verbal and action information (see Macedonia & von Kriegstein, 2012, for an extensive review). How to investigate this link is an interesting methodological question. Verbal information is usually conveyed by words or sentences, whereas action information is typically conveyed by gestures. After reviewing the key points of traditional studies on action information and language learning, the present study will focus on the feasibility of using simulation as an alternative, innovative method to explore these same processes.
The impact of gestures on verbal memory has been studied for decades, as gestures are fundamental in second language acquisition. Engelkamp and Zimmer (1985), for example, reported that the recall of action words or sentences improves if, during the learning phase, the subjects pantomime the corresponding action. These results emerged from a comparison with a control condition in which subjects could only hear or read the action items. This “enactment effect” not only increased the number of items correctly remembered but also improved the accessibility of the memorized items, as shown with recognition tasks (Kronke, Mueller, Friederici, & Obrig, 2013; Masumoto et al., 2006; Noice & Noice, 2001). Neuroimaging studies provide further evidence. A functional magnetic resonance imaging (fMRI) study (Miyahara, Kitada, Sasaki, Okamoto, Tanabe, & Sadato, 2013) investigated the neural substrates underlying the effect of spontaneous verbal labeling when memorizing increasingly complex sequences of hand movements. Results showed that the use of verbal labels reduced neuronal activity in imitation-related regions, such as the left inferior frontal gyrus. Further confirming these findings, Straube and colleagues (2014), relying on behavioral and fMRI data, reported the activation of different neural substrates (left temporal pole and middle cingulate cortex vs. posterior thalamic structures and anterior and posterior cingulate cortices) for processing related and unrelated coverbal gestures, leading to enhanced memory performance.
Enriching the study phase with action information, to promote verbal learning, has been used effectively in foreign language learning, a field where verbal memory has a crucial role (Taleghani-Nikazm, 2008). Several researchers have pointed out how gestures increase recall and prevent decay when used to accompany foreign language words (Kelly, McDevitt, & Esch, 2009; Macedonia, 2003; Tellier, 2008). Interestingly, abstract words also profit from the use of enactment, as demonstrated by Macedonia and Knösche (2011).
The reason why enacted items are better remembered and retained is still under debate. Different explanations have been proposed for the enactment effect; as they are not mutually exclusive, they can be seen as mirroring different perspectives. Some authors (see Allen, 1995) refer to classical cognitive theories, such as the depth of processing principle (Craik & Tulving, 1975). According to this principle, the more deeply an item is processed (i.e., in terms of semantic features), the more likely it is to be recalled later, and the longer its memory trace will last. Hence, item recall should benefit from enactment during encoding, as enactment deepens the level of processing. Paivio’s dual coding theory (Paivio, 1971) has also been invoked as a mechanism underlying this effect (Tellier, 2008): Items that carry visual as well as verbal information are remembered more efficiently. In this case, gestures provide the second “code,” a motor trace that plays the same role as the visual code.
The hypothesis of the motor trace is also invoked by Macedonia, Muller, and Friederici (2011) to explain the learning of both concrete and abstract words during foreign language acquisition. According to the authors, “performing a gesture when learning a word . . . strengthens the connections to embodied features of the word that are contained in its semantic core representation. . . . in the case of abstract words such as adverbs, gesture constructs an arbitrary motor image from scratch that grounds abstract meaning in the learner’s body” (Macedonia & von Kriegstein, 2012, p. 398).
All of these positions share the idea that the advantage of enactment stems from an enrichment of the semantic representation.
However, other researchers have explored the effect of action on language processing within the theoretical framework of simulation. According to Barsalou (2008), “simulation is the re-enactment of perceptual, motor, and introspective states acquired during experience with the world, body, and mind” (p. 618). The effects of motor simulation have been widely investigated in behavioral experiments addressing different aspects of the interplay between language and action across linguistic processes (Bergen & Wheeler, 2010; Ditman, Brunye, Mahoney, & Taylor, 2010; Frak, Nazir, Goyette, Cohen, & Jeannerod, 2010; Papeo, Corradi-Dell’Acqua, & Rumiati, 2011; Rueschemeyer, Lindemann, van Rooij, van Dam, & Bekkering, 2010; Springer & Prinz, 2010; Taylor & Zwaan, 2008; Tseng & Bergen, 2005; Zwaan & Taylor, 2006). Yet, the direction of the effect of simulation is still unclear: Does it help or interfere with linguistic processing? The literature reports opposing results for comprehension and lexical decision tasks. Some authors (Myung, Blumstein, & Sedivy, 2006; Rueschemeyer et al., 2010) found a facilitation effect (faster reaction times [RTs]) when the action performed matched the one described by the verb. Others (Buccino et al., 2005) observed interference when the effector used to provide the answer and the one involved in the action word were the same.
Interestingly, to our knowledge, few articles have applied the concept of simulation to learning processes by investigating the role of overt action execution during learning. Paulus, Lindemann, and Bekkering (2009), for example, predicted that if acquiring functional information about an object requires a mental simulation of its use, then an overt motor interference during encoding should impair the acquisition of functional object knowledge by blocking motor simulation. To test this hypothesis, Paulus et al. constructed two sets of novel objects: Half of them related to the action of hearing, and the other half to the action of smelling (both actions being performed by manipulating the object with one hand). Participants were shown pictures of the objects and instructed to learn their functional properties verbally by repeating them aloud. The learning settings were systematically varied across four interference conditions during the encoding phase: no interference, hand interference (participants squeezed a soft ball while performing the verbal learning task), foot interference (participants pressed a soft ball with their feet while performing the verbal learning task), and attention interference (the concurrent task was an auditory oddball target detection task). As predicted, performance, assessed in a subsequent test phase, decreased significantly in the hand interference condition. In this condition, the actual movement performed during learning interfered with the spontaneous, covert motor simulation of functional object knowledge. The contribution of action information to conceptual processing had already been identified by Kiefer, Sim, Liebich, Hauk, and Tanaka (2007), who found that, during a categorization task on previously learned novel objects, early activation of frontal motor regions and later activation of occipitoparietal visuomotor regions occurred only when a pantomime of the objects’ features had been performed during the learning phase. However, these experiments deal with the learning of conceptual (i.e., functional) information, not with language learning per se, which was the focus of the gesture-enrichment research described previously.
Starting from this theoretical background, the aim of the present work is to investigate the role of motor simulation (related to the match or mismatch between the effector that executes the action and the one described by the action verb) during the acquisition of a foreign language. To this end, we implemented an experimental setting in which participants had to learn foreign verbs (hand-action, foot-action, and abstract verbs) with or without concomitant real and virtual motor tasks. All tasks were performed using virtual reality technology.
Virtual reality is a combination of technological devices that allows users to create, explore, and interact with 3-D environments. Typically, individuals entering a virtual environment feel part of this world and can interact with it almost as they would in the real world. The similarity of the virtual experience to the real world relies mostly on three features: sight, hearing, and interaction. In most cases, visual input is provided by means of a computer monitor or a head-mounted display (HMD). The HMD is a visualization helmet that conveys computer-generated images to both eyes, giving the illusion of depth in the surrounding space. Aural devices may be head-based, such as headphones, or stand-alone, such as speakers. The degree of interaction depends on multiple factors, probably the most influential being the software that manages the virtual interaction: The more users see their actions affecting the virtual world, the more immersed and engaged they feel. These features of the virtual experience affect the sense of presence perceived by the user. Presence is usually defined as the “sense of being there” in a scene depicted by a medium (Barfield, Zeltzer, Sheridan, & Slater, 1995). In other words, the more a virtual environment is able to elicit a “perceptual illusion of non-mediation” in the user (Lombard & Ditton, 1997), the more present the user will feel in the environment. The determinants of presence are multiple and refer, on the one hand, to the features of the environment and the level of interaction it allows (IJsselsteijn, de Ridder, Freeman, Avons, & Bouwhuis, 2001; IJsselsteijn & Riva, 2003) and, on the other hand, to the user’s characteristics (e.g., the tendency to use visual representations; Slater & Usoh, 1994).
In our experiment, the evaluation of the sense of presence was a critical issue: Some key aspects of presence, such as the sense of spatial presence, have been shown to correlate with the activation of different cortical regions, including motor and premotor areas (Baumgartner, Valko, Esslen, & Jancke, 2006). Hence, understanding the role of these areas in the semantic comprehension of action words (Repetto et al., 2013) is also a fundamental prerequisite for understanding how presence modulates linguistic processes.
Our main hypothesis is that, if simulating the action described by a verb is important for learning the verb’s meaning, then a concomitant action involving the same effector as the verb should modulate its recall.
More specifically, we hypothesized three scenarios:
Method
Participants
Forty-two volunteers (16 males and 26 females, age: range = 19-49 years,
Stimuli
Fifteen Czech verbs were selected: Five described actions performed with the hand (e.g., to draw), five described actions performed with the foot or leg (e.g., to jump), and five described intellectual or symbolic activities (e.g., to forget). The complete set of items is listed in Table 1.
The Complete Set of Items Included in the Experiment.
We chose Czech because, on the one hand, it is almost unknown in Italy (which avoids familiarity effects) and, on the other hand, its phonology is fairly accessible to Italian speakers. The three verb categories were matched for length and frequency according to the available database for spoken Italian (De Mauro, Mancini, Vedovelli, & Voghera, 1993). All the Czech verbs were recorded with an online voice synthesizer, and the corresponding Italian translations were recorded by a female human voice.
Each trial consisted of a Czech verb, followed by its Italian translation and then by a repetition of the same Czech verb, with a 1-s delay between consecutive items. The intertrial interval was 3 s. Figure 1 summarizes the trial composition and timing.

Trial composition and timing.
Five blocks were constructed and presented in random order. Each trial was presented only once per block, and the order of trials within each block was randomized. In total, the task thus included 75 trials (15 verbs × 5 blocks).
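The trial structure and block randomization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the presentation software used in the study: the item IDs and audio file names are hypothetical placeholders (the real items are listed in Table 1), and clip durations are not modeled because they were not reported. Only the 1-s gaps, the 3-s intertrial interval, and the 5 × 15 block structure come from the text.

```python
import random
from dataclasses import dataclass

# Hypothetical placeholder item IDs; the real Czech verbs are in Table 1.
VERBS = {
    "hand": [f"hand_{i}" for i in range(1, 6)],
    "foot": [f"foot_{i}" for i in range(1, 6)],
    "abstract": [f"abstract_{i}" for i in range(1, 6)],
}
N_BLOCKS = 5
WITHIN_TRIAL_GAP_S = 1.0  # silence between the three audio segments of a trial
INTERTRIAL_S = 3.0        # silence after the repeated Czech verb


@dataclass(frozen=True)
class Trial:
    block: int
    position: int
    verb: str
    category: str

    def events(self):
        """Czech audio, 1 s, Italian audio, 1 s, Czech audio, 3 s interval.

        File names are hypothetical; clip durations were not reported.
        """
        return [
            ("audio", f"{self.verb}_cs.wav"),
            ("silence", WITHIN_TRIAL_GAP_S),
            ("audio", f"{self.verb}_it.wav"),
            ("silence", WITHIN_TRIAL_GAP_S),
            ("audio", f"{self.verb}_cs.wav"),
            ("silence", INTERTRIAL_S),
        ]


def build_schedule(seed=None):
    """Return 5 blocks x 15 verbs = 75 trials, each block in a fresh random order."""
    rng = random.Random(seed)
    items = [(verb, cat) for cat, verbs in VERBS.items() for verb in verbs]
    trials = []
    for block in range(1, N_BLOCKS + 1):
        order = items[:]
        rng.shuffle(order)  # every verb appears exactly once per block
        for position, (verb, cat) in enumerate(order, start=1):
            trials.append(Trial(block, position, verb, cat))
    return trials


schedule = build_schedule(seed=42)
fixed_silence = sum(v for kind, v in schedule[0].events() if kind == "silence")
print(len(schedule), fixed_silence)
```

Because every block contains all 15 verbs, shuffling within blocks already yields the randomized presentation; the fixed silence per trial is 1 + 1 + 3 = 5 s, to which the (unreported) durations of the three audio clips would have to be added.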
Virtual Environment
The virtual environment was built with the freeware software NeuroVr2 (www.neurovr2.org) and represented a park on a sunny day. On entering the virtual park, participants started their exploration from a paved track, with the first-person point of view set as for an adult standing and ready to explore the park. Along the sides of the track, green grass covered the ground, and trees and shrubs enriched the area. In addition to natural features, the park contained many artifacts typically seen in a park, for example, benches, streetlamps, and bins. A picnic area and a playground were displayed. No human beings were present in the scene. Interaction with the environment (when required by the experimental condition) was controlled by manipulating the left knob of a joypad (Xbox 360) with the left thumb. The virtual environment was displayed through an HMD shaped like sunglasses (covering the eyes and resting on the ears).
Questionnaires
To measure the sense of presence, the ITC-Sense of Presence Inventory (ITC-SOPI) was employed (Lessiter, Freeman, Keogh, & Davidoff, 2001). This questionnaire was developed taking into account the key factors that predict the sense of presence; it focuses on the user’s experience of the media, both during and after the experience. It is based on four factors: Sense of Physical Space (Cronbach’s alpha = .94), Engagement (Cronbach’s alpha = .89), Ecological Validity (Cronbach’s alpha = .76), and Negative Effects (Cronbach’s alpha = .77).
Participants also completed the UsoImm77 questionnaire (Antonietti & Colombo, 1997). This questionnaire aims at investigating the spontaneous occurrence of visualization and mental images in everyday life activities. The questionnaire comprises 77 items: Each item typifies a situation in which people may experience mental images. Subjects are requested to rate, on a 5-point scale, how frequently the visualization process described in the item occurs for them. The items concern different mental functions (memorizing, recalling, problem solving, daydreaming), involve different kinds of mental images (static and dynamic, single and interactive, personal and impersonal, spontaneously elicited by external stimuli and intentionally constructed and processed by the subjects), have different content (objects, persons, places), and concern different situations (e.g., study activities, leisure time, and so on).
Procedure
Before the experimental session, volunteers were contacted by email and asked to complete the UsoImm77 questionnaire via an online form. The questionnaire had to be completed at least 1 day before the experimental session to prevent a priming effect on spontaneous imagery.
On the day of the experimental session, participants were welcomed into a quiet room by an experienced researcher. The materials used in the lab included a personal computer and the tools used to experience virtual reality (VR; joypad and HMD). These materials were arranged in front of the participant at a distance of approximately 50 cm.
First, participants put on the HMD and held the joypad while the researcher launched the practice session, which aimed to familiarize them with the environment and with the commands needed to interact with it. The experimental session then started. The main task was verbal learning of the verbs, which were presented auditorily. Participants were instructed to listen to the Czech verbs and try to remember as many items as possible. In addition, participants followed different instructions according to their experimental condition (as explained below).
Participants were randomly assigned to one of two experimental conditions: the Run condition or the Baseline condition. In the Run condition, participants performed the main task while exploring the park as if walking or running through it. The instructions stressed that they had to keep walking, in whatever direction, without stopping until the presentation of the verbs ended. The walk-like movement inside the park was produced by moving the left knob of the joypad with the left hand. In this condition, participants stood in front of the computer so that their body posture was coherent with the virtual walk. No real walking movements were allowed during the session.
In the Baseline condition, the participants sat in front of the computer and started the virtual experience as if they were seated on a bench. In front of them, the playground of the park was displayed. Participants were instructed to pay attention to the Czech verbs: No action within the environment was allowed, with the only exception being the visual exploration of the scene (by turning the head around). This condition served as a baseline measure of the verbal learning.
After the study phase (which lasted about 12 min), participants performed a cued recall task: The experimenter presented the Czech verbs auditorily, one at a time, and the participants had to orally provide the corresponding Italian translation. The number of verbs correctly remembered was recorded. Immediately after the cued recall task, participants performed a recognition task. They were instructed to listen to each Czech verb and to select, as quickly as possible, the correct one of two translations written on the left and right sides of the screen by pressing the corresponding left or right button on a button box. The correct translation was always paired with an incorrect but plausible translation (i.e., the translation of another presented verb), and it appeared equally often on the left and on the right side of the screen. RTs were recorded. At the end of the memory tasks, participants completed the ITC-SOPI questionnaire (Lessiter et al., 2001).
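One way to assemble and score the recognition task is sketched below. It is a hypothetical reconstruction rather than the study’s own procedure: the foil-selection rule (a randomly chosen other studied verb), the side-balancing scheme, and all names are assumptions, and with 15 items a balanced left/right placement can only mean an 8/7 split. The scoring function returns, per verb category, the two recognition measures analyzed below: the number of errors and the mean RT of correct responses.

```python
import random
from statistics import mean


def build_recognition_trials(translations, seed=None):
    """translations: Czech item ID -> (Italian translation, verb category).

    Each Czech verb is paired with its correct translation and a foil taken
    from the translation of another studied verb (hypothetical rule). The side
    of the correct option is balanced: the two sides differ by at most one.
    """
    rng = random.Random(seed)
    verbs = list(translations)
    rng.shuffle(verbs)
    sides = ["left" if i % 2 == 0 else "right" for i in range(len(verbs))]
    rng.shuffle(sides)
    trials = []
    for verb, side in zip(verbs, sides):
        correct, category = translations[verb]
        foil = translations[rng.choice([v for v in verbs if v != verb])][0]
        left, right = (correct, foil) if side == "left" else (foil, correct)
        trials.append({"verb": verb, "category": category,
                       "left": left, "right": right, "correct_side": side})
    return trials


def score_recognition(trials, responses):
    """responses: (pressed side, RT in ms) per trial, in presentation order.

    Returns, per verb category, the number of errors and the mean RT of
    correct responses (None if there were no correct responses).
    """
    tallies = {}
    for trial, (pressed, rt) in zip(trials, responses):
        tally = tallies.setdefault(trial["category"], {"errors": 0, "rts": []})
        if pressed == trial["correct_side"]:
            tally["rts"].append(rt)
        else:
            tally["errors"] += 1
    return {cat: {"errors": t["errors"],
                  "mean_rt": mean(t["rts"]) if t["rts"] else None}
            for cat, t in tallies.items()}


# Hypothetical placeholder items and translations (not the study's Table 1).
items = {f"{cat}_{i}": (f"it_{cat}_{i}", cat)
         for cat in ("hand", "foot", "abstract") for i in range(1, 6)}
trials = build_recognition_trials(items, seed=7)
# Simulated participant who always answers correctly in 600 ms.
result = score_recognition(trials, [(t["correct_side"], 600) for t in trials])
print(result["hand"])
```

RTs are averaged over correct responses only, matching the “RTs of verbs correctly recognized” dependent variable; errors are simply counted.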
Results
Statistical analyses were conducted on 40 participants: Two of the 42 volunteers showed either ceiling or floor effects in the cued recall task and were therefore excluded as outliers.
First, we tested the impact of the different virtual experiences on the dependent variables: the number of verbs correctly remembered in the cued recall task, the RTs of correctly recognized verbs, and the number of errors in the recognition task (see Table 2 for descriptive statistics).
Descriptive Statistics for All the Considered Variables.
We performed repeated-measures ANOVAs with Verb (three levels: hand, foot, abstract) as a within-subjects variable and Condition (two levels: baseline, run) as a between-subjects variable. For the number of items recalled, there was an effect of the type of Verb,
With respect to the recognition task, the number of errors was not influenced by the type of Verb,
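To show how a Verb × Condition design partitions variance, the following standard-library sketch computes a split-plot (mixed) ANOVA with one between-subjects and one within-subjects factor. It is an illustration, not the authors’ analysis pipeline: it applies no sphericity correction and reports F ratios with their degrees of freedom but no p-values (these could be obtained from an F distribution, e.g., SciPy’s `scipy.stats.f.sf`). The toy scores are invented and unrelated to the study’s data.

```python
from statistics import mean


def mixed_anova(data):
    """Split-plot ANOVA with one between- and one within-subjects factor.

    data maps each between-subjects group (e.g., "baseline", "run") to a list
    of subjects; each subject is a list of scores, one per within-subjects
    level (e.g., hand, foot, abstract), in the same order for everyone.
    Returns {effect: (sum of squares, df, F)}.
    """
    groups = list(data)
    a = len(groups)                  # between-subjects levels
    b = len(data[groups[0]][0])      # within-subjects levels
    subjects = [s for g in groups for s in data[g]]
    n = len(subjects)
    scores = [y for s in subjects for y in s]
    gm = mean(scores)                # grand mean

    ss_total = sum((y - gm) ** 2 for y in scores)
    ss_subjects = b * sum((mean(s) - gm) ** 2 for s in subjects)
    ss_between = sum(
        b * len(data[g]) * (mean([y for s in data[g] for y in s]) - gm) ** 2
        for g in groups
    )
    ss_within = n * sum((mean([s[j] for s in subjects]) - gm) ** 2 for j in range(b))
    ss_cells = sum(
        len(data[g]) * (mean([s[j] for s in data[g]]) - gm) ** 2
        for g in groups for j in range(b)
    )
    ss_inter = ss_cells - ss_between - ss_within
    ss_subj_err = ss_subjects - ss_between  # subjects-within-groups error term
    # Within-factor x subjects-within-groups error term.
    ss_resid = ss_total - ss_subjects - ss_within - ss_inter

    df_between, df_subj_err = a - 1, n - a
    df_within, df_inter = b - 1, (a - 1) * (b - 1)
    ms_resid = ss_resid / ((n - a) * (b - 1))
    return {
        "Condition": (ss_between, df_between,
                      (ss_between / df_between) / (ss_subj_err / df_subj_err)),
        "Verb": (ss_within, df_within, (ss_within / df_within) / ms_resid),
        "Verb x Condition": (ss_inter, df_inter, (ss_inter / df_inter) / ms_resid),
    }


# Invented toy scores: 2 subjects per group, 3 within-subjects levels.
toy = {"baseline": [[1, 2, 3], [3, 5, 4]], "run": [[2, 4, 6], [4, 6, 8]]}
for effect, (ss, df, f) in mixed_anova(toy).items():
    print(f"{effect}: SS = {ss:.2f}, df = {df}, F = {f:.2f}")
```

With these toy scores the sketch yields F(1, 2) = 2.00 for Condition, F(2, 4) = 31.00 for Verb, and F(2, 4) = 7.00 for the interaction; the numbers only demonstrate the computation.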
Next, the questionnaire scores were taken into account. First, we computed a MANCOVA with the responses (number of errors, response time, and cued recall performance) as dependent variables, Condition (baseline vs. run) as a fixed factor, and the ITC-SOPI subscales as covariates. We used the subscales as covariates because we were interested in whether and how the level of perceived presence could mediate participants’ overall responses.
The general model was significant for number of errors in recognizing the correct translation of hand-related verbs,
When examining the influence of specific covariates on our dependent variables, another interesting effect emerged. The subscale Eco-Valid had a significant influence,

The effect of the scale Ecological Validity.
Next, the same analysis was applied to the UsoImm77 questionnaire, but it revealed no significant effects for any variable, indicating that the individual tendency to use imagery did not influence the task.
Discussion
The present study aimed to investigate the role of motor simulation during foreign language learning. To achieve this goal, we used a virtual environment in which participants, while learning Czech verbs, had to move as if they were running. They produced this virtual run by manipulating a knob with their left hand. This procedure involved two kinds of action: a real one (the movement of the hand on the knob) and a virtual one (the movement of the feet needed to run). By comparing linguistic performance (in terms of learning) in this condition with that in the baseline condition (i.e., without any real or virtual action), we could assess whether simulation is involved in this process and which movement triggers it.
Overall, the results suggest that the simulation of an action performed with the same or a different effector does not play a role in second language learning. In fact, the number of items correctly recalled did not vary across conditions but depended only on the type of verb: Abstract verbs were more difficult to remember than concrete ones. This finding is not surprising, as the cognitive advantage of concrete over abstract words has been documented in several memory and language tasks (Nelson & Schreiber, 1992; Paivio, Walsh, & Bons, 1994).
Moreover, hand-action and foot-action verbs did not differ from each other, and the effect of verb type did not differ between conditions. Together, these results indicate that neither action (real or virtual) affected the learning of verbs describing actions performed with the same or a different effector. Consistently, the recognition task showed the same pattern: The words that had been better retained (hand- and foot-action verbs) were recognized more quickly, and the opposite was true for the less well remembered words (abstract verbs). By contrast, the number of errors in the recognition task did not appear to be influenced by any of the variables considered. One possible explanation is that the error rate depended not on the learning process but on other variables, possibly linked to the specific setting or environment.
The finding that simulation is apparently not involved in verbal learning is new and relevant. Data in the literature report an advantage for language learning when words or sentences are linked with gestures (Kelly et al., 2009; Macedonia & von Kriegstein, 2012; Tellier, 2008). However, action enrichment through gestures differs substantially from the typical paradigm used to test simulation. In the first case, the learner pairs a lexical item with a univocal pattern of movements, and the action-word pair is repeated over and over during the study phase. In the second case, a specific movement (virtual or real; in this study, the run and the manipulation of the knob, respectively) is performed throughout the study phase with one specific effector that either matches or does not match the one corresponding to the verb. Thus, there is no specific pairing between motion and meaning, only a generic sharing versus not sharing of the effector (Buccino et al., 2005; Repetto, Cipresso, & Riva, 2015). Moreover, in the present study, volunteers underwent a single learning session with a relatively small number of verb repetitions: This could account for the lack of recall differences between learning conditions for any type of verb (note that Macedonia et al., 2011, found that training does not always affect retention). However, while the gesture paradigm promotes the grounding of meaning in the learner’s bodily experience, the mere use of the same versus a different effector is not enough to establish a link between the lexical item and the action.
Yet, when learning focuses on conceptual knowledge, simulation has been reported to be involved in verbal learning tasks (Paulus et al., 2009). In that study, the learners were explicitly told to pay attention to the functional use of the objects, and these specific instructions may have led them to imagine the possible uses of each object. In this case, imagery rather than simulation was likely activated, and imagery relies on different cerebral networks (Willems, Toni, Hagoort, & Casasanto, 2010).
The absence of a simulation effect in our foreign-language-learning paradigm, in contrast with the well-established involvement of simulation in other linguistic tasks such as comprehension (Ditman et al., 2010; Frak et al., 2010; Tseng & Bergen, 2005; Zwaan & Taylor, 2006), suggests that simulation is a relatively “automatic” mechanism, activated during online processing, sometimes guided by context or attentional focus (Bergen & Wheeler, 2010; Taylor & Zwaan, 2008), but never accompanied by the conscious use of strategies. From this perspective, foreign language learning can be seen as a typical process in which individual strategies have a strong impact. In support of this interpretation, after the experimental session, participants spontaneously told the experimenter about the tricks they had used to recall as many verbs as possible in the final test (Lawson & Hogben, 1996). For this reason, the absence of a simulation effect in a language-learning task does not rule out the involvement of the motor system in this linguistic process.
The second, somewhat surprising, result concerns the effect of simulation in the recognition task. As discussed previously, the recognition measures do not seem to be influenced by the learning condition (with or without virtual or real movement); more interestingly, however, an effect arises from the presence components assessed by the ITC-SOPI questionnaire. Specifically, the number of errors on hand-action verbs in the recognition task seems to be predicted globally by the set of questionnaire subscales, with Engagement and Negative Effects being the most important predictors. Hand-action verbs are recognized more easily if acquired without an interfering movement (Baseline condition) when the learner experiences a high level of Engagement and a low level of Negative Effects.
Even more interesting is the effect on the number of errors for foot-action verbs: This measure appeared to be influenced specifically by Ecological Validity, that is, the tendency to perceive the environment as real. Errors decreased as this index increased; moreover, when the other subscales were controlled for, higher Ecological Validity predicted fewer errors in the Run condition than in the Baseline condition. This effect is compatible with a simulation account: In the Run condition, learners performed a virtual motion with their feet, and this action was simulated at exactly the same time as lexical access. Thus, the more the learner perceived the environment as real, the more effectively the virtual action shaped the cognitive representation of the verb, and the more he or she simulated the action during recognition. The foot-action simulation, in turn, facilitated lexical access to the verbs that shared the same effector.
Conclusion
The aim of the experiment was to extend knowledge about the mechanism of simulation and its cognitive effects. In particular, we tested whether this process occurs during second language learning, a linguistic task to which, to our knowledge, simulation had never been applied. The first important finding is that simulation per se is not sufficient to establish a relationship between words and actions during learning, resulting in a null effect on the number of items recalled (at least with a small number of items and repetitions).
Nevertheless, and perhaps more interestingly, our paradigm revealed a new and relevant finding: Simulation can be mediated by other perceptual and cognitive processes induced by the context, especially the sense of presence. In this respect, virtual reality gave us the opportunity to highlight the role of some factors linked to the specific experience. These factors appear to be related to the concept of presence (which is linked to the specific characteristics of a VR experience) and can promote or interfere with the simulation process, even after the virtual experience, as evidenced in the recognition task. Indeed, the sense of presence is known to be mediated by the egocentric perspective in the environment (Bae et al., 2012), which is typical of the virtual experience.
This result suggests two considerations. On the one hand, it indicates that simulation can occur when the lexical item must be accessed after being learned. On the other hand, it shows that the occurrence of simulation during this process is mediated in different ways by the different components of presence involved. The latter observation raises interesting questions for future research: Is it possible to modify the virtual environment to fit the parameters that promote simulation (according to the present findings, reducing Negative Effects and enhancing Ecological Validity and Engagement)? What happens when the environment is “optimized” in terms of presence? Could simulation also speed up word access (in the present study, RTs did not appear to be influenced)?
In the present study, the virtual environment had a very basic structure, and the virtual experience allowed a low level of interaction. We can hypothesize that implementing a virtual world that induces higher levels of presence could be a first step toward this goal.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research and/or authorship of this article.
