Effect of cognitive load on working memory updating and prose recall skills of interpreters

Abstract

Aims and objectives:

The aim was to study working memory updating (WMU) and prose recall of two spoken-language interpreter groups, consecutive and simultaneous. The main question was whether cognitive load affects performance and whether specific experience develops specific working memory (WM) processes, manifested as between-group differences under increasing cognitive load.

Design:

Two groups of expert interpreters, 18 simultaneous (SIs) and 17 consecutive (CIs), participated in two experiments. For WMU, the effect of group and level of cognitive load was studied with a category experiment. For prose recall, speech sequences of 13–44 words were used to study cognitive load.

Data and analyses:

WMU data include cognitive performance measures: number of correctly updated words (serial order, number of categories) and number and types of errors vis-à-vis cognitive load. Prose recall data were analysed with idea units, plot point scores and reproduction of metatextual categories.

Findings:

Results showed that cognitive load affects performance: updating was impaired in both groups in the condition ‘simultaneous translating and repeating’. SIs performed better than CIs, which was further confirmed by error analysis. In prose recall, the effect of cognitive load was obvious: recall of extra-long speech sequences was impaired in both groups. Results also confirmed that with speech sequences exceeding 30 words, informational density and cognitive load increase.

Originality:

This is the first study comparing WMU skills of SIs and CIs under varying cognitive load. It introduces a modified category WMU task with simultaneous translation, and new analysis methods for prose recall: plot point scores and reproduction of metatextual categories.

Significance:

A new contribution is that cognitive load affects WMU performance, less with SIs than with CIs. The experiments show that a well-targeted design can reveal differences between groups using different interpreting modes (SI vs. CI) that probably also exist in authentic interpreting situations.

Keywords

Cognitive load, working memory updating, prose recall, consecutive interpreter, simultaneous interpreter

Introduction

Interpreting is a demanding task ideally requiring a completed formal education with appropriate practical training, followed by continuous learning and training on the job. Even without dedicated practice, working as an interpreter improves the necessary cognitive, linguistic and social skills (see Russo, 2022, for a review). These could include, for instance, the appropriate use of memory and attention, working memory updating (WMU), rapid yet smooth turn-taking and change of language direction in dialogue and, of course, excellent skills in both working languages (cf. Tiselius, 2022, pp. 49–50, 53–54). The aim of this study was to examine the effect of cognitive load in relation to the experience in and share of work in the consecutive (CI) or the simultaneous mode (SI). In particular, we wanted to gain a new understanding of WMU and memory skills (prose recall) of experienced interpreters under varying cognitive load. Through well-targeted experiments, we aimed to import the real-world situations of increasing cognitive load while interpreting into controlled laboratory conditions (see, e.g., Ericsson & Williams, 2007).

In what follows, we will first describe the cognitive load¹ in simultaneous and consecutive interpreting and apply the concept to explain the differences between the two interpreting modes. The section after that explains these differences with the help of Cowan’s WM model and the tests that have been found to work best when studying the main executive functions (Friedman & Miyake, 2017) and ways of increasing the load during an updating experiment (Ecker et al., 2014). The section ends with a brief review of studies on updating with interpreters. The last section of the introduction deals with the differences between prose recall and listening span experiments used elsewhere to study the capacity of interpreters’ working memory (WM).

Different cognitive demands of simultaneous and consecutive interpreting

According to Pöchhacker (2004, reprinted 2009, p. 11; emphasis in the original), ‘Interpreting is a form of Translation in which a first and final rendition in another language is produced on the basis of a one-time presentation of an utterance in a source language’. In SI, the lag between incoming source text and spoken target text averages 2–3 seconds (see the review in Timarová et al., 2015). In CI, the target text is provided after the speaker pauses or completes the utterance (Colin & Morris, 1996).

The cognitive load in the two interpreting modes may be described using the Effort model by Gile (1997, 2008, 2016, 2021), developed especially for teaching and training conference interpreters. The model can, however, be successfully applied to describing dialogue situations, such as in community or court interpreting. In terms of the four efforts (Listening, including comprehension; Production; Memory; and Coordination), SI and CI differ especially in two cognitive processes: Memory and Coordination. SIs deliver the speaker’s message in the target language as it unfolds, and understanding the overall train of thought occurs by degrees, simultaneously with listening, speaking and coordinating these two. Thus, above all, the memory functions include the dividing of attention between several concurrent processes. In contrast, CIs listen to longer passages before interpreting, with a greater need to memorize and maintain in WM not just the details and the meaning of smaller segments but also the logic of the entire passage, in order to deliver both in the target language with appropriate accuracy. As to specifics of coordination, SIs have to continuously control all the other efforts, while the batch-type delivery in CI leads to less time pressure and a smaller coordination effort.

Thus, the listening and comprehension effort (Gile, 1997) differs significantly in the two modes. All interpreters need to learn to listen to the speaker in a different way than other listeners and to analyse the content while still listening (Seleskovitch & Lederer, 1989, Chapter II; Viljanmaa, 2020), but the time available for this is different depending on the interpreting mode. According to Gile (2008), cognitive load is especially high in SI, because the interpreter has to allocate his or her attention to several processes at once: listening to and comprehending the speaker (including memory functions), production (translating and speaking) of the target text, as well as monitoring and coordinating all of these processes.

In CI, by contrast, the cognitive load is probably somewhat lower, because fewer attention-demanding processes run concurrently and more time is therefore available for both the listening and the production efforts. Here, the cognitive load is dependent on the length of the passage to be interpreted, that is, on the need to take notes while listening. In dialogue interpreting, the least load is experienced when notes are not needed at all (interpreting sentence by sentence, called ‘short consecutive’). The load is also fairly low when just a few notes are needed. As an example, thanks to previous experience, an interpreter working with a familiar topic will know how much he or she will be able to remember, and only needs to write down crucial words that will best support recall. Depending on the length of the passage and personal aspects, even in this situation no notes may be needed. What is heard will be analysed and integrated with previous knowledge, and this will help recall and lessen the load even while interpreting longer passages [see also Tiselius and Englund Dimitrova (2023, p. 317) for the role of monitoring and turn-taking in managing cognitive load in dialogue interpreting].

The load for CIs is highest when they must take detailed notes while listening. This is necessary especially in so-called long consecutive, used for interpreting monologues in conferences, meetings and so on, as well as in dialogue interpreting in courts, for example, during a witness statement. But even in these cases, the length of interpreting experience appears to have a significant effect; see Andres (2002) for differences in note-taking between the group of experienced CIs graded as ‘gut’ and interpreting students. See also Ahrens and Orlando (2022, pp. 37–38) and Gile (2021, 6.1 Note-taking in consecutive), as well as Isolahti (2014, pp. 136, 155–158) for variation of speech sequence lengths in court interpreting.

In general, though, the perceived load depends on several other factors, such as the length of the interpreter’s experience, familiarity with the topic and the possibility to prepare in advance. In addition, many other burdening factors may be present: external interruptions, tiredness at the end of the day, problems with understanding the speaker and so on (see, e.g., Gile, 2021, 5.1 Problem triggers). This may lead to excessive load: some issues or details may not receive sufficient attention, resulting in saturation, as the interpreter is utilizing most of the processing capacity available (Gile, 2008). In this situation, the cognitive load can be exported (carried over) from a current process, such as understanding a particularly demanding source text, to another simultaneous process, such as producing the target text. At such moments, the cognitive load of producing the target text grows with the load imported from the process of understanding, resulting in proneness to errors, omissions or infelicities (EOIs; Gile, 2016).

To conclude, the main difference between simultaneous and consecutive interpreting is the degree of simultaneity of the different cognitive processes and the time allowed for each of them: listening to and analysing the source text, possibly taking notes, translating and speaking, as well as monitoring and coordinating concurrent processes. The differences in the number of processes may lead to differences in cognitive load, as well as in experiencing saturation.

WMU and interpreting

Gile’s Effort model is not a cognitive model, but many of its principles can be explained by Cowan’s (1988, 2001, 2022) WM model. In this model, WM consists of two components: focus of attention and activated part of long-term memory (aLTM). The capacity of the focus is limited, 3−5 units, and when more items or units are added, others may move away from the focus, although remaining active. Through different bindings, such as associations or linguistic components, the units may become quite large: in the case of a text, they can comprise a whole sentence or, in some cases, even several sentences.

Units not focused on or not actively attended to move into the aLTM, where features from different units may form new combinations (rapid new learning). These temporary new combinations will lose activation in 20−30 seconds if not reactivated (Cowan, 2001) but can also be further maintained as new memories if processed sufficiently in the focus of attention (Cowan, 2019). Many deactivated units can also be reactivated if needed later on, using linguistic and other relevant cues. In addition, rapid new learning often results in new aLTM material that can be further used for the task at hand (see next paragraph). In many cases, the material is also saved in LTM. Such rapidly learned material is, however, not always available afterwards, if the learning context is different from the context when it is retrieved again (Cowan & Morey, 2021).

Rapid new learning is probably utilized in interpreting on two occasions: when preparing for an interpreting assignment and during actual interpreting. Even when there has been relatively little time for preparing, new combinations, such as words with new associations or equivalents in the other language, will be automatically activated when mentioned by the speaker during interpreting. In addition, the new combinations learned during interpreting may form new concepts which the interpreter will need later, or even new skills useful in future interpreting assignments. The latter form of learning, called procedural memory, is automatic (Adams et al., 2018). By contrast, learning a new concept during interpreting is much harder, as it needs to be focused on in order to be permanently learned, and there is usually no time for this right then. See also Gile (2021) for language availability and gradual automation.

As to the efforts specified by Gile, they can also be explained by Cowan’s WM model. The three efforts requiring WM (listening, production and translation/memory processes) can be focused on at the same time (cf. the three to five items, mentioned above). In Cowan’s model, both monitoring and coordination² come under executive functions, which direct different items and processes to be focused on as needed. If listening needs more attention, focus is directed on it. In cases where still more attention is needed for one particular process, such as for memorizing or writing down details in the spoken text, be it numbers, names or lists of details (see Problem triggers in Gile, 2021), some of the items attended to may fall into aLTM where they are more vulnerable to interference. This may lead to errors in production (see EOIs in Gile, 2016).

As executive functions (EFs) are not explicitly explained in Cowan’s WM model, we used studies by Friedman and Miyake (2017) for the purpose. According to these scholars, EFs can be divided into three main functions: common EF, updating and shifting. Common EFs contribute to controlling the maintained information by inhibiting internal and external distractions. Although their use as regards goal maintenance and management varies according to situation, they are always active in interpreting, too.

Updating was considered the most important EF for our study, since WMU, essential for directing attention to relevant information and replacing outdated information (Ecker et al., 2014), seems indispensable for interpreting. In exploring WMU, three main processes can be included: retrieval, transformation and substitution (Ecker et al., 2010, 2014). As interpreters continuously update the content of WM through translating a message in one language into the equivalent message in another, that is, transforming and substituting WM content with new content, our main interest was directed to this process.

WMU seems to be a somewhat under-researched field in interpreting studies: we found only three relevant studies on the updating skills of professional spoken-language interpreters with a mean interpreting experience of 10 years or more, with a control group and enough participants for a reliable statistical power analysis. A 2-back experiment with letter stimuli by Van der Linden et al. (2018) found no differences in updating ability between SIs, foreign-language teachers and adult monolinguals. In an experiment by Hiltunen et al. (in progress), SIs outperformed CIs in updating the last words of categories (Keep track), although this experiment did not include transformation and substitution (translation). By contrast, Henrard and Van Daele (2017) found that compared to professional translators and non-linguistic experts, SIs were better at recalling groups of several letters in serial order after perceiving an unexpected signal. Meta-analyses by Hu and Fan (2021) also reported contradictory results on WMU.

Note, however, that the above-mentioned studies and other studies included in the meta-analysis may not be comparable with our study, as some of them do not have a similar control group and some use students as participants. Besides, many of the experiments may only measure updating ability to a minimal degree. As Friedman and Miyake (2017, 2.2) have stated, task impurity seems to be an unavoidable quality of EF tasks, because EFs involve controlling several lower-level processes at once. Thus, most of the experiments in the meta-analysis are not included in their factor analysis model of more pure updating experiments. In addition, n-back experiments are based on recognition, while interpreting relies, above all, on several active memory processes.

The same arguments apply to the meta-analysis by Nour et al. (2020), who found that SIs scored better in updating than comparison groups. Their analysis of updating experiments included n-back experiments as well as complex memory tasks. According to Ecker et al. (2010), however, performance on updating tasks can largely (but not totally) be explained on the basis of transformation and especially retrieval, which are known attributes of WM, while substitution, which is essential to interpreting and to our WMU experiment, cannot be so explained.

Understanding prose recall

Another under-researched aspect of interpreter memory is prose recall, which seems to have been studied only by Hiltunen and Vik (2017). They found that all groups of foreign-language professionals recruited for the study (SIs, CIs and foreign-language teachers) outperformed a non-linguistic group, especially in recall of long speech sequences (17–23 words).

Prose recall experiments could, to a certain degree, be compared to reading and listening span experiments with WM. According to meta-analyses by Mellinger and Hanson (2019) and Wen and Dong (2019), SIs seem to outperform control groups (teachers, student interpreters, bilingual and monolingual students) in many WM tasks, including reading span. For listening span, the results are less convincing: either there were no differences between professional SIs and student interpreters (Stavrakaki et al., 2012), or student interpreters outperformed professional SIs (Köpke & Nespoulous, 2006). Only in the study by Chmiel (2018) did experienced interpreters show better listening span scores than bilingual students. Reading or listening span studies on professional CIs appear not to exist.

In any case, comparing prose recall to reading/listening span experiments requires caution. Although reading span and reading comprehension, and listening span and listening comprehension, are highly correlated (see, e.g., Daneman & Carpenter, 1980), several other subskills must be reckoned with. For instance, our auditory prose recall experiment includes accurate listening, encoding the text plot and details, maintaining them in WM for a shorter or longer time, and recalling them at the end of each speech sequence. After this, WMU is also necessary: after repeating the speech sequence as accurately as possible, the participant can remove irrelevant information from the focus of attention, releasing capacity for material coming in next. In this sense, the prose recall experiment resembles CI.

In contrast, in reading or listening span experiments, some of the above subskills may be partly or completely unneeded. For instance, in the reading span experiment by Daneman and Carpenter (1980), the detailed recall of each sentence is not necessary: only recognition of the content, for instance, by recalling the last word of the sentence, is needed. This requires less effort for listening, encoding and maintaining the plot and details. Furthermore, the sentences in a listening span experiment (usually 9–16 words) are shorter than most speech sequences in our prose recall experiment, requiring lower memory effort. In addition, in reading/listening span experiments, the distracter sentences can be removed from the focus of attention (letting them decay further), freeing resources for maintaining the last words and attending to what comes next (see, e.g., Morey & Cowan, 2018). In prose recall (and in interpreting), almost all information may be needed later (the speaker may return to it) and needs to be kept active (or quickly reactivated) until the end of the task (speech).

In addition, updating during reading is far more complex than in tasks measuring updating alone, because reading requires the selection of important information (Butterfuss & Kendeou, 2018). In interpreting, however, as well as in our present prose recall experiment, the principal selective processes are managed by the speaker (who chooses what to say and how), not by the interpreters (or participants).

Research questions

So far, the WMU tasks used to study interpreters have not included transformation (Ecker et al., 2010). As interpreters continuously transform the message by translating it into another language, we included translation in our updating task. Thus, the task chosen for this study was the updating-specific Keep track task [see Friedman & Miyake, 2017, Figure 1(b)], using verbal categories and transformation (by Ecker et al., 2010). This Category experiment can be modified to vary the cognitive load and is suitable for our purposes as a well-targeted laboratory experiment (see Introduction, first passage). In the same way as interpreting, the experiment requires monitoring and coordination (see Tiselius & Englund Dimitrova, 2023).

As to prose recall, in the previous study by Hiltunen and Vik (2017), the longest speech sequences were 17 to 23 words. By manipulating the sequence length to 40 words or more, we wanted to explore the role of cognitive load in an experiment which would resemble the real-world conditions of CI as much as possible. It seems that the longer the speech sequence, the higher its content of essential information and the more it will generate cognitive load while being memorized (see Gile, 2021, Problem triggers, informational density). In dialogues, the speech sequence length usually does not exceed 60 words (Takeda, 2011). As Finnish words are longer than English words, taking more time to listen to and memorize, and as note-taking was not allowed in our experiment, we decided to keep the sequence length somewhat shorter than that.

As this was, to our knowledge, the first study using transformation in a WMU task and extra-long speech sequences in prose recall for two interpreter groups (SIs and CIs), defining precise hypotheses was quite hard. The Effort model by Gile does not include predictions of how much the interpreter’s cognitive load can increase before reaching saturation. Still, two cautious hypotheses were considered possible:

Compared to CIs, SIs must continuously divide their attention between several processes, so that this skill might be manifested by SIs outperforming CIs in the most stressful situations, that is, when translating the words heard in addition to updating (Task 2. Translate and repeat, see Exp. 1).

As the prose recall experiment used resembled the working conditions in CI without notes, it might favour CIs so that they can better recall extra-long speech sequences of 41 to 44 words (see Exp. 2).

General methods

Participants – personal background

Thirty-five interpreters (34 female, 1 male) from six localities volunteered for the experiments. Two criteria for expertise were used in the recruitment: educational level (BA or equivalent as a minimum) and at least 10 years of interpreting experience (Ericsson & Lehmann, 1996; for specifics on interpreters, see Obler, 2012).

As shown in Table 1, all CIs, with the exception of four, worked only in dialogue settings (see column Field of work, Public service and Court interpreting). The four exceptions, who reported occasionally working as conference interpreters, reported using SI on those occasions. Of the SIs, all worked in SI in conferences and mostly in CI when interpreting dialogues. None of the participants reported using CI to interpret monologues (long consecutive). In what follows, unless specifically mentioned, long consecutive is therefore excluded from the discussions (see also Bartłomiejczyk & Stachowiak-Szymczak, 2022, p. 19).

Table 1.

Participants.

Group	n	Mean age in years (SD; range)	Mean professional experience in years (SD; range)^a	Education level at time of experiments	Field of work^b	Mean working hours per month (SD; range) for the past year^c: consecutive mode	Mean working hours per month (SD; range) for the past year^c: simultaneous mode
Consecutive interpreters	17	52.9 (10.3; 35–67^b)	20.2 (10.4; 4–37)	B.A. or equiv. (6) M.A. (9) In addition: EMCI (1) Professional qualifications: court interpreter (3) public service interpreter (1)	Public service interpr. (10) court interpreter (7) Interpr. in business negotiations (4) Conference interpreter (4) Teacher (languages) (2) Translator (10) Secretary (2) Other (4)	24.3 (21.9; 1–80)	1.6 (2.4; 0–3.3)
Simultaneous interpreters	18	51.2 (7.8; 38–63)	22.6 (7.0; 12–37)	B.A. or equiv. (3) M.A. (12) In addition: EMCI (1) Specialization in Conference interpreting (2) Qualifications: public service interpreter (1) print interpreter (1)	Conference interpr. (11) Court interpreter (8) Public service interpr. (7) Interpr. in business negotiations (6) Teacher (languages) (6) Teacher (interpreting) (3) Translator (13) Tourist guide (4) Other (6)	13.7 (24.4; 0–14.4)	31.8 (36.0; 0.4–128.0)
Between-group difference		t(33) = 0.5, p = .6	t(32) = 0.1, p = .9			F(1,32) = 1.8, p = .2, ηp² = .05	F(1,32) = 11.9, p < .002, ηp² = .27

Note. Mean age, educational level, field of occupation, professional experience in years and mean working hours in consecutive and simultaneous mode, respectively.

^a Eight participants in the consecutive and three in the simultaneous interpreter group worked exclusively in the mode in question.

^b The question on field of work was as follows: What were your main fields of work? You may indicate more than one (tick as appropriate). The options were: Interpreting negotiations, community interpreting, conference interpreting or similar, court interpreting, translator and other fields (please specify). On average, each participant ticked two to three options. In both interpreter groups, two participants ticked only one field, and one interpreter in each group ticked five (see also the text for participants in section ‘General methods’).

^c Average working hours (per month). Most interpreters, such as freelancers and those working in public service interpreting centres, could state exactly the number of hours worked in SI and/or CI by checking their invoicing and then calculating the average. In only a few cases where the assignment included both SI and CI, the share of each was estimated as precisely as possible. These self-reported figures were used to calculate the percentages given in the text under participants. All participants worked actively as interpreters, at least periodically, regardless of age. To clarify the minimum numbers of working hours: at the time when the experiments were carried out, one or two of the participants worked as directors of public service interpreting centres or the like and had little time for actual interpreting.

In a background questionnaire, the interpreters reported the share of work in CI and/or SI during the preceding year. The figures were mainly based on invoicing, although, in some cases, the share of the two modes used during the same assignment could only be estimated. The reported figures were used to calculate the average monthly percentages of work in either mode for each participant.

As regards the division into groups, the main principle was to keep the CI group as ‘pure’ as possible. Thus, the CI group included 17 participants (16 female, 1 male) using the consecutive mode for an average of 92.7% of their interpreting hours (minimum value 74%). The remaining SI group consisted of 18 participants (all female), who interpreted simultaneously for an average of 75.2% of their interpreting hours (minimum value 37.5%). Eight participants in the consecutive and three in the SI group worked exclusively in the mode in question.
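The percentage calculation described above can be sketched as follows. This is a minimal illustration only: the function names and the example hours are hypothetical and are not the study's data, and the sketch assumes that each participant's share is computed from his or her own monthly hours before the group average is taken.

```python
from statistics import mean
from typing import List, Tuple


def mode_shares(ci_hours: float, si_hours: float) -> Tuple[float, float]:
    """Return (consecutive %, simultaneous %) of one participant's monthly hours."""
    total = ci_hours + si_hours
    if total <= 0:
        raise ValueError("participant reported no interpreting hours")
    return 100.0 * ci_hours / total, 100.0 * si_hours / total


def group_mean_shares(hours: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Average the per-participant percentages (not the pooled hours)."""
    shares = [mode_shares(ci, si) for ci, si in hours]
    return mean(s[0] for s in shares), mean(s[1] for s in shares)


# Hypothetical (consecutive, simultaneous) hours per month; NOT the study's data.
example_group = [(40.0, 0.0), (30.0, 10.0), (24.0, 8.0)]
ci_share, si_share = group_mean_shares(example_group)
print(f"mean consecutive share: {ci_share:.1f}%, mean simultaneous share: {si_share:.1f}%")
```

Note that an average of per-participant percentages generally differs from the ratio of group mean hours. For instance, the CI group means in Table 1 (24.3 and 1.6 hours) would give 24.3 / 25.9 ≈ 93.8%, which is close to, but not identical with, the reported 92.7%.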

The mean interpreting experience of the simultaneous group was 22.6 years (SD = 7.0), and that of the consecutive group was 20.2 years (SD = 10.4), with no significant between-group difference in age or professional experience. There was, however, a significant between-group difference in interpreting hours per month in the simultaneous mode, showing that the majority of SIs worked almost exclusively in SI, while of the CIs, only a few worked in SI, and then infrequently. By contrast, there was no between-group difference in interpreting hours per month in the consecutive mode, meaning that many SIs also worked in CI if needed. Thus, the CI group was almost homogeneous, but for the SI group, reaching total homogeneity was not possible (for details, see Table 1).

Table 2 presents the participants’ self-reported estimates of how many notes they take in CI (on a scale of 0–10; 0 = none, 10 = practically every word noted). The data indicated no between-group difference in the estimated note volume for medium (20 to 30 words) or long speech sequences (30 to 50 words). There was, however, a significant length effect between medium and long speech sequences (p < .001), showing that both groups needed more notes when the speech sequence length was 30 to 50 words. The group and sequence length interaction was not significant.
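The effect sizes reported in Tables 1 and 2 can be cross-checked against the F statistics with the standard relation ηp² = (F × df_effect) / (F × df_effect + df_error). The short sketch below applies this relation to the values printed in the tables. The function name is our own, and the sketch assumes that the reported ηp² values were rounded to two decimals.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error, reported partial eta squared) as printed in the tables.
reported = [
    ("Table 1, hours in consecutive mode", 1.8, 1, 32, 0.05),
    ("Table 1, hours in simultaneous mode", 11.9, 1, 32, 0.27),
    ("Table 2, between-group difference", 0.77, 1, 30, 0.03),
    ("Table 2, sequence length effect", 40.8, 1, 30, 0.58),
]

for label, f_value, df_effect, df_error, eta_reported in reported:
    eta = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported:.2f}")
```

Under these assumptions, the recomputed values agree with the reported effect sizes after rounding, which suggests that the tables are internally consistent.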

Table 2.

Participants’ self-reported estimate of the volume of notes taken during CI: medium speech sequences (20 to 30 words, less than 15 seconds) vs. long speech sequences (30 to 50 words, 20 to 35 seconds). Mean (SD), range.

Group	n	Volume of notes estimated by participants^a: medium speech sequences (20 to 30 words, less than 15 seconds), mean (SD); range^b	Volume of notes estimated by participants^a: long speech sequences (30 to 50 words, 20 to 35 seconds), mean (SD); range
Consecutive interpreters	17	1.82 (2.03); 0−8	5.00 (3.21); 0−10
Simultaneous interpreters	18	2.53 (2.58); 0−8	5.67 (2.52); 0−10
Total	35	2.16 (2.30); 0−8	5.31 (2.88); 0−10
Between-group difference			F(1,30) = 0.77, p = .39, ηp² = .03
Main seq. length effect			F(1,30) = 40.8, p < .001, ηp² = .58
Group and seq. length interaction			F(1,30) = 0.002, p = 1.0, ηp² = .00

^a The question was as follows: Estimate how much you take notes while interpreting consecutively: if the speech sequences are short (10 to 20 words, less than 15 seconds); if the speech sequences are long (30 to 50 words, 20 to 35 seconds). Scale: 0 = none, 10 = I note almost every word.

^b For the questionnaire, two criteria were used for the division between medium and long speech sequences. First, the cut-off point was based on the experiment in Hiltunen and Vik (2017), in which all language professionals were well able to recall speech sequences of up to 28 words. Second, the maximum length of long speech sequences was determined by the fact that the duration of activation in WM is about 30 seconds (Cowan, 2001), and by the finding by Isolahti (2014) that one participant was able to recall speech sequences of up to 60 words without taking many more notes than other participants, who worked with shorter sequences.

However, as regards medium speech sequences, the estimated volume of notes by experienced CIs was only about two-thirds of that reported by SIs (mean 1.82 vs. 2.53). This indicates that the majority of CIs were experienced interpreters who only need notes when interpreting longer sequences, and even then only seldom (see section ‘Different cognitive demands of simultaneous and consecutive interpreting’). There was, however, great individual variation: range 1–8 for CIs and 1–10 for SIs. By way of comparison, similar variation in interpreted speech sequence lengths was also seen in data from six court interpreters (Isolahti, 2014, pp. 136, 155–158): the lowest number of words per sequence was 1–34 (mean 14.19) and the highest 1–59 words (mean 18.61). All six interpreters took very few notes, writing down only names, numbers and dates.

Working in a language of limited diffusion

The first language of our participants, that is, Finnish, is a language of limited diffusion (LLD), with about 5.4 million native or bilingual speakers in Finland (Kotimaisten kielten keskus, 2025), and the participants’ background is in all respects representative of the situation of interpreters in countries whose official language is of limited diffusion, such as the Nordic and Baltic states. Interpreters are obliged to accept work regardless of topic and sometimes also to work in both modes to make a living. They may also end up working as translators, teachers and tourist guides, when interpreting assignments are not available. However, such other tasks are also useful in that they increase the interpreters’ general knowledge and may acquaint them with topics and vocabulary that are later useful in interpreting.

For these reasons, regardless of personal wishes or preferences, interpreters of LLDs are fairly rarely able to specialize in either CI or SI or in particular topics to the exclusion of others. The same reasons also make it difficult to recruit sufficient numbers of participants for experiments. This study, for example, was designed to only include native speakers of Finnish with a specified level of education and a sufficient length of experience in interpreting. In a small country, we needed to recruit all eligible interpreters that could be persuaded to participate. Many of those contacted had to refuse because pressure at work was too high to allow taking time off. In particular, those interpreting in immigrant languages (such as Arabic or Kurdish) were overworked as it was.

Our own experience, as well as discussions with Nordic colleagues, seems to point to the extreme rarity of monologues requiring long consecutive: only dinner speeches or welcoming addresses and so on are occasionally interpreted consecutively, and they tend to be on the short side. In other situations, SI is always used in conferences and multilingual meetings, while in court, simultaneous whispering is sometimes used. In dialogue settings, the interpreter (or at court, the president) may interrupt the speaker if needed, to avoid overtaxing the interpreter’s memory and note-taking capability (see also Tiselius, 2022, pp. 49–50, 53).

General procedure and materials

Each participant was tested individually in silent conditions. Since interpreters work on verbal spoken materials, verbal variants seemed preferable for both tasks (see also the meta-analysis by Wen & Dong, 2019). The stimuli were presented through headphones, and the spoken answers were recorded with a laptop computer, using the Presentation software.

The experiments were conducted in the following order for all participants: WMU and prose recall (Experiments 1 and 2). The experiment session lasted about 1 hour 15 minutes in all. To start with, each participant signed a confirmation that they had been informed of the ethical principles in the research. After each experiment, the participant filled in an inquiry on memory strategies used during the experiment, and in the case of WMU (Experiment 1) on how they had prepared. For the analysis of all subsequent between-group tests, the criterion for statistical significance was set at p = .05.

Experiment 1, WMU (keeping track)

Introduction

The updating experiment was anticipated to reveal differences in WMU between SIs and CIs. As SI requires the constant and quick updating of memory content, this ability might have developed more markedly in SIs than in CIs during their careers. The skill might be manifested especially in the most stressful situations, that is, when translating the words heard in addition to updating.

Materials and procedure

To test WMU, we chose the Keeping Track task by Yntema (1963), modified into an auditory variant and adapted to the Finnish context. In addition, the present experiment consisted of two tasks described below. In the experiment, exemplars (words) from six different categories (chemical elements; animals; languages; trees and bushes; garments and accessories; and currency units) were used. Short words were favoured in the selection, although due to the phonological structure of Finnish, they varied in length, as they also had to be sufficiently familiar to the participant, to allow an easy and rapid categorization during the task.

Each trial consisted of 15 words including words from all 6 categories, with one to five items from each category. The number of categories to be tracked in each trial varied from one to four, and two trials for each category size were presented in a randomized order in both tasks. The words were presented auditorily at the rate of 1 word/2.5 seconds. The order of presentation was fixed for all participants. The categories to be tracked in each trial were named auditorily before the presentation of the items for that trial. The participants could also check in a handout the categories for each trial throughout the experiment.

The experiment consisted of two tasks conducted in the same order for each participant. In the first part (Task 1: Repeat), the participant was to track the most recent word presented in each relevant category and maintain it in memory until another word in the same category was presented, updating and maintaining it until hearing the next item in that category or until the end of the trial, while ignoring all words in non-relevant categories. After the last item of each trial, the participants were to repeat aloud the last word presented in each category relevant for that trial. The second part (Task 2: Translate and repeat) differed from the first part in that there were two concurrent tasks. The participant was to silently translate each word, immediately after hearing it, into the target language chosen beforehand, while updating the words in the same way as in Part 1. After the trial, the participant was to repeat aloud the translated last words of the relevant categories. The target languages used by the participants in Task 2 were (number of CIs/SIs in parentheses): English (4/9), French (5/2), Swedish (2/3), Russian (3/1), German (1/1), Spanish (2/0) and Norwegian (0/2). The answers were recorded, with a subsequent transcription and analysis.
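As a minimal sketch of the tracking logic in Task 1 (not the software used in the experiment), the following Python fragment computes the word a participant should report for each relevant category and counts correct answers. The function names, the shortened six-word trial and the English example words are hypothetical; the real trials contained 15 Finnish words.

```python
from typing import Dict, List, Set, Tuple


def last_words(trial: List[Tuple[str, str]], relevant: Set[str]) -> Dict[str, str]:
    """Return the most recent word heard in each relevant category."""
    current: Dict[str, str] = {}
    for word, category in trial:
        if category in relevant:      # words in non-relevant categories are ignored
            current[category] = word  # a newer word replaces the one held so far
    return current


def score(trial: List[Tuple[str, str]], relevant: Set[str], answers: Set[str]) -> int:
    """Count how many target words appear among the spoken answers."""
    targets = set(last_words(trial, relevant).values())
    return len(targets & answers)


# Hypothetical shortened trial: (word, category) pairs in order of presentation
trial = [
    ("iron", "elements"), ("wolf", "animals"), ("euro", "currencies"),
    ("lynx", "animals"), ("gold", "elements"), ("pine", "trees"),
]
relevant = {"elements", "animals"}

print(last_words(trial, relevant))
print(score(trial, relevant, {"gold", "wolf"}))
```

In this illustration, answering 'wolf' instead of 'lynx' would count as recalling an earlier item of a requested category, so only one of the two answers is scored as correct.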

The instructions for the experiment with the names of categories (not with the stimulus words) were sent by e-mail to each participant 1−2 working days before the experiment. The participant could thus prepare for the experiment in much the same way as interpreters usually prepare for an assignment (see Gile, 2018).

Results

Advance preparation

The post-test strategy report showed that almost all participants had used the opportunity to prepare for approximately an hour (varying between 15 minutes and 3.5 hours). On average, SIs used more hours for preparation than CIs: 1.29 hours (SD = 0.35) versus 0.90 (SD = 0.35), respectively. SIs also evaluated the intensity of preparation as higher: 4.11 (SD = 2.12) vs. 3.37 (SD = 2.83) on a scale from 0 to 10 (0 – not at all; 10 – I did everything I was able to or could think of).

Those who had prepared very little or not at all mentioned pressure of work as the main reason. Many also mentioned that preparation was difficult without a text providing the word associations needed. The preparation strategies were mostly the same as for an interpreting assignment: making word lists and finding the equivalents in the target language on the Internet or in dictionaries and possibly rehearsing them after a while or the next day. However, almost all participants regardless of mode said that they prepare more intensively for work assignments, mostly using parallel texts. Only two CIs reported that they usually do not prepare for assignments at all.

Analysis

In addition to the difference in results between Task 1 and Task 2, meant to reveal the role of cognitive load in simple and dual-task conditions, the present analysis included several other aspects of load: serial position effect, number of categories effect, task difficulty (ways of lessening the burden, reported by participants in the strategy inquiry after the test), as well as errors in WMU.

Serial position effect

The fifteen words of each trial were divided into positions according to order of presentation as follows: primary positions³ (four first items), recency positions (four last items) and middle positions (items in-between). After this, the differences in the numbers of items recalled from each position were analysed with two Group (consecutive vs. simultaneous) × two Task (Repeat vs. Translate and repeat) × three Position (primary vs. middle vs. recency) analysis of variance (ANOVA).
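The position grouping described above can be sketched as follows. The function name is an illustrative assumption, and the boundaries follow the position ranges given in Table 3 (1−4, 5−11 and 12−15).

```python
def position_group(position: int) -> str:
    """Map a serial position (1-15) to the position factor used in the ANOVA."""
    if not 1 <= position <= 15:
        raise ValueError("each trial contained 15 words")
    if position <= 4:
        return "primary"   # four first items
    if position >= 12:
        return "recency"   # four last items
    return "middle"        # items in between (positions 5-11)


groups = [position_group(p) for p in range(1, 16)]
print(groups.count("primary"), groups.count("middle"), groups.count("recency"))
```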

The results indicated a significant between-group effect, showing that SIs performed better than CIs for all positions except for the first positions in the first task (Repeat). In addition, a significant difference between the tasks was found, but the task and group interaction did not reach significance, suggesting that the second task (Translate and repeat) was performed with less success than the first one (Repeat) by both groups. For the exact figures, see Table 3.

Table 3.

Results of updating experiment (Keep track).

Group	n	Number of last words of categories correctly updated and recalled from different positions, indicated as number of words from maximum possible (SD)
Group	n	Primary positions (1−4), from max two items	Middle positions (5−11), from max nine items	Recency positions (12−15), from max seven items	Total, from max 18 items
Task 1: Repeat
Consecutive interpreters	17	0.47 (0.21)	0.56 (0.22)	0.81 (0.19)	0.64 (0.13)
Simultaneous interpreters	18	0.47 (0.36)	0.62 (0.17)	0.89 (0.10)	0.71 (0.11)
Total	35	0.47 (0.30)	0.59 (0.19)	0.85 (0.15)	0.68 (0.12)
Task 2: Translate and repeat
Consecutive interpreters	17	0.58 (0.13)	0.24 (0.26)	0.71 (0.17)	0.59 (0.11)
Simultaneous interpreters	18	0.69 (0.17)	0.39 (0.32)	0.85 (0.10)	0.72 (0.11)
Total	35	0.63 (0.16)	0.31 (0.30)	0.78 (0.16)	0.66 (0.13)
Between-group difference					F(1,33) = 5.7, p = .02, ηp² = .15
Main task effect					F(1,33) = 5.0, p = .03, ηp² = .13
Group and task interaction					F(1,33) = 2.4, p = .1, ηp² = .07
Main position effect			F(1,33) = 6.4, p = .05*, ηp² = .16	F(1,33) = 69.5, p = .003*, ηp² = .69	F(2,66) = 54.0, p < .001, ηp² = .62
Group and position interaction			F(1,33) = 0.46, p = 1*, ηp² = .02	F(1,33) = 0.9, p = 1*, ηp² = .03	F(2,66) = 0.4, p = .6, ηp² = .01
Task and position interaction			F(1,33) = 44.6, p = .003*, ηp² = .57	F(1,33) = 15.2, p = .003*, ηp² = .32	F(2,66) = 23.4, p < .001, ηp² = .42
Task and position and group interaction			F(1,33) = 0.0, p = 1*, ηp² = .00	F(1,33) = 0.1, p = 1*, ηp² = .00	F(2,66) = .06, p = .9, ηp² = .00

Note. Number of last words of categories correctly updated by the group recalled from primary, middle and recency positions in the list presented: mean, standard deviation and total number of words from maximum possible, as well as ANOVA functions.

*Bonferroni-corrected value.

The difference between the positions was also significant, but there was no group and position interaction. Furthermore, the task and position interaction did reach significance, while the task, position and group interaction did not. The Bonferroni-corrected contrast analysis indicated that the main position effect was due to more items being updated and recalled especially from the four last positions (12 to 15, recency effect) in both tasks by both groups. In turn, the task and position interaction indicated that compared to the primacy effect, both middle and recency effects were larger in the first task (Repeat), but in the second task (Translate and repeat), only the recency effect was larger for both groups. The greater cognitive load of translating and repeating thus seems to affect most strongly the middle positions of the list.

Number of categories effect

To reveal possible differences in the cognitive load for WMU even more clearly, the number of categories to be tracked was analysed with a two Group (consecutive vs. simultaneous) × two Task (Repeat vs. Translate and repeat) × three Load (two vs. three vs. four categories) ANOVA. The results indicated a significant between-group effect, showing that SIs outperformed CIs for almost all numbers of categories, except when tracking two categories in the first task (Repeat). Still, there was no task effect, nor task and group interaction (for the exact figures, see Table 4).

Table 4.

Results of updating experiment by degree of difficulty (cognitive load).

Group	n	Number of last words of categories correctly updated and recalled, indicated as number of words from maximum possible (SD)
Group	n	Two categories	Three categories	Four categories	Total
Task 1: Repeat
Consecutive interpreters	17	0.88 (0.18)	0.76 (0.17)	0.43 (0.18)	0.64 (0.13)
Simultaneous interpreters	18	0.78 (0.26)	0.95 (0.16)	0.49 (0.19)	0.71 (0.11)
Total	35	0.83 (0.23)	0.86 (0.19)	0.46 (0.18)	0.68 (0.12)
Task 2: Translate and repeat
Consecutive interpreters	17	0.90 (0.15)	0.45 (0.15)	0.54 (0.19)	0.59 (0.11)
Simultaneous interpreters	18	0.94 (0.14)	0.63 (0.21)	0.66 (0.17)	0.72 (0.11)
Total	35	0.92 (0.15)	0.54 (0.20)	0.60 (0.19)	0.66 (0.13)
Between-group difference					F(1,33) = 5.9, p = .02, ηp² = .15
Main task effect					F(1,33) = 1.9, p = .2, ηp² = .05
Group and task interaction					F(1,33) = 2.7, p = .1, ηp² = .08
Main load effect			F(1,33) = 41.1, p = .003*, ηp² = .55	F(1,33) = 133.6, p = .003*, ηp² = .80	F(2,66) = 76.4, p < .001, ηp² = .70
Group and load interaction			F(1,33) = 15.0, p = .003*, ηp² = .31	F(1,33) = 3.6, p = 1*, ηp² = .10	F(2,66) = 7.3, p = .001, ηp² = .18
Task and load interaction			F(1,33) = 43.1, p = .003*, ηp² = .57	F(1,33) = 0.8, p = 1*, ηp² = .02	F(2,66) = 34.7, p < .001, ηp² = .51
Task and load and group interaction			F(1,33) = 1.7, p = .6*, ηp² = .05	F(1,33) = 0.6, p = 1*, ηp² = .02	F(2,66) = 0.9, p = .4, ηp² = .03

Note. Number of last words of categories correctly updated and recalled by the group and by the number of categories to be tracked: mean, standard deviation and ANOVA functions.

*Bonferroni-corrected value.

The effect of load, as well as load and group interaction, showed significance. The Bonferroni-corrected contrast analysis indicated that the main effect of cognitive load was due to a better updating ability when tracking three categories instead of two or four categories by both groups when only repeating, and two categories instead of three or four categories when both translating and repeating. In addition, the joint effect of load and group was based on the SIs excelling in tracking and updating three categories, especially in the first task (Repeat).

The task and load interaction reached significance, but the task, load and group interaction did not. The Bonferroni-corrected contrast analysis indicated that the joint effect of load and task was due to the fact that tracking and updating three categories was easier for both groups in the first task (Repeat) than in the second task (Translate and repeat).

Task difficulty

The effect of cognitive load was manifested in two more ways. As shown by the strategy reports, in the second task of simultaneous translating and updating, if not immediately able to find the equivalent word in the target language, nine CIs and six SIs used Finnish (= zero translation) or some other language more familiar to them for one to three recalled items each, a possibility that was also allowed in the instructions. Participants also mentioned other strategies for making the task easier: four participants in both groups reported not having translated each item as they came, especially while updating four categories.

Errors in WMU

The total error percentage was low for both groups: on average fewer than 13 per cent of items were erroneously recalled (see Table 5). The errors were classified on two principles. First, as errors in updating: either item or goal updating. The first type consists of recalling an earlier item in the requested category instead of the last one, and the second one of recalling items from a non-relevant category. Second, following the classification of errors in free recall by Unsworth and Engle (2007), the errors were sorted into two types: previous list intrusions (items from previous word lists, PLI) and extra-list intrusions (items not presented in any list, ELI).

Table 5.

Errors in working memory updating (Experiment 1).

Group	n	Percentages of errors of all recalled items
Group	n	Task 1: repeat, % (SD)	Task 2: translate and repeat, % (SD)
Consecutive interpreters	17	12.96 (5.12)	12.47 (5.83)
Simultaneous interpreters	18	9.66 (5.84)	7.96 (5.84)
Total	35	11.26 (5.67)	10.15 (6.19)
Between-group difference		F(1,33) = 5.3, p = .03, ηp² = .14
Main task effect		F(1,33) = 1.4, p = .2, ηp² = .04
Group and task interaction		F(1,33) = 0.4, p = .5, ηp² = .01

Note. Percentages of errors by task and by group of all recalled items: mean, standard deviation; between-group difference, task effect and task and group interaction.

The percentages of errors by type and by group out of all recalled items and between-group differences are presented in Table 6. In a two group (simultaneous vs. consecutive) × two task (Repeat vs. Translate and repeat) ANOVA, a significant between-group effect in errors was revealed, but no effect of task or task and group interaction, showing that SIs made fewer errors in both tasks. Furthermore, a multivariate analysis of variance (MANOVA) of error type revealed two significant between-group effects: CIs made more errors in item updating as well as more extra-list intrusions. For the other two error types, the between-group difference did not reach significance.

Table 6.

Errors in working memory updating (Experiment 1).

Group	n	Error type, percentages of all recalled items				Total, percentage of errors of all recalled items (SD)
		Error in item updating: previous item in category, % (SD)	Error in goal updating: previous category, % (SD)	Previous list intrusions (PLI), % (SD)	Extra-list intrusions (ELI), % (SD)
Consecutive interpreters	17	15.28 (5.59)	1.73 (3.28)	4.78 (4.35)	4.70 (5.03)	26.48 (9.54)
Simultaneous interpreters	18	10.75 (6.70)	2.73 (4.22)	2.92 (3.38)	1.22 (2.37)	17.62 (10.45)
Total	35	12.95 (6.67)	2.24 (3.77)	3.82 (3.93)	2.91 (4.22)	21.93 (10.85)
Between-group difference		F(1,33) = 4.4,p = .04, ηp² = .12	F(1,33) = 0.6,p = .44, ηp² = .02	F(1,33) = 2.0,p = .2, ηp² = .06	F(1,33) = 7.0,p = .01, ηp² = .18

Note. Percentages of errors by error type and by group of all recalled items: mean, standard deviation and between-group difference.

In addition, a chi-square test with post hoc cell contributions at the level of p < .05 was used to reveal if any error types were especially characteristic of either interpreter group. The observed frequencies, expected values,⁴ and residuals for the error categories are presented in Table 7. The results showed that there was a statistically significant difference in error distributions between CIs and SIs, χ²(2) = 31.9, p < .001. The post hoc cell contributions indicated that CIs made more extra-list intrusions than expected, while SIs made more errors in goal updating.

Table 7.

Errors in working memory updating (Experiment 1).

Error type	Consecutive interpreters n = 17	Simultaneous interpreters n = 18	Total
Errors in item updating
Observed frequency*	260a	194a	454
(Expected values)	266.4	187.6
Residual	−6.4	6.4
Errors in goal updating
Observed frequency**	29a	49b	78
(Expected values)	45.8	32.2
Residual	−16.8	16.8
Previous list intrusions (PLI)
Observed frequency**	81a	52a	133
(Expected values)	78.0	55.0
Residual	3.0	−3.0
Extra-list intrusions (ELI)
Observed frequency	80a	22b	102
(Expected values)	59.8	42.2
Residual	20.2	−20.2
Total	450	317	767
Pearson chi-square			p < .001

Note. *Observed frequencies of errors, expected values and residuals with total by group.

Each subscript letter denotes a subset of group categories whose column proportions do not differ significantly from each other at the .05 level; in other words, if adjacent columns for a group on the line ‘Observed frequencies’ include the letter ‘a’, the difference between the two groups is not significant, but if the letters for adjacent group columns are ‘a’ and ‘b’, there is a significant difference between the groups: χ²(2) = 31.9, p < .001.
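The expected values and residuals in Table 7 can be reproduced from the observed frequencies with the usual Pearson formula (expected = row total × column total / grand total). The following is a minimal sketch in plain Python, not the statistical software used for the original analysis; the variable names are illustrative assumptions.

```python
# Observed error frequencies from Table 7: (consecutive, simultaneous)
observed = {
    "item updating": (260, 194),
    "goal updating": (29, 49),
    "PLI": (81, 52),
    "ELI": (80, 22),
}

# Column totals per interpreter group and the grand total
column_totals = [sum(row[i] for row in observed.values()) for i in range(2)]
grand_total = sum(column_totals)

chi_square = 0.0
cells = {}
for error_type, row in observed.items():
    row_total = sum(row)
    # Expected frequency under independence of group and error type
    expected = [row_total * col / grand_total for col in column_totals]
    # Residual = observed minus expected
    residuals = [obs - exp for obs, exp in zip(row, expected)]
    chi_square += sum(res ** 2 / exp for res, exp in zip(residuals, expected))
    cells[error_type] = (expected, residuals)
    print(error_type,
          [round(e, 1) for e in expected],
          [round(r, 1) for r in residuals])

print("Pearson chi-square:", round(chi_square, 1))
```

With the Table 7 frequencies, the statistic comes to about 31.9, and the largest contributions come from the goal updating and extra-list intrusion rows.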

Discussion

The WMU experiment produced several interesting findings. When analysed by position of item presentation, both groups did worse in the dual-task situation of the second task (Translate and repeat). Obviously, translating the items into the target language (i.e., performing a transformation) while tracking the last items of categories was cognitively more demanding than the tracking task alone. In addition, tracking items presented at the end of a list was easier for both groups. This result corroborates a previous recency effect finding of updating in a running memory task by Ruiz et al. (2005). Interestingly, however, SIs seemed to outperform CIs at tracking items regardless of the order of presentation: their performance was better in both middle and recency positions. This result is partly in line with the findings by Hiltunen et al. (in progress) showing that SIs outperformed CIs in updating middle positions of the list.

However, the most important finding was that tracking the last items of categories was demanding for both groups: the cognitive load increased with more categories to be tracked. The dual-task situation in the second task (Translate and repeat) increased the cognitive load even further: compared to the first task of merely tracking the last items, the impact of the dual-task was seen in that fewer categories could be tracked successfully (three categories in the first task and only two categories in the second task).

Still, as expected (see section ‘Research questions’), compared to CIs, SIs were able to update and recall more of the last items of categories when the number of relevant categories was three, and this ability was obvious in both tasks. This seems to contradict the finding in the meta-analyses by Hu and Fan (2021) showing mixed effects in WMU, possibly explained by the different experiment designs. Perhaps the present category experiment corresponds better to the updating functions of interpreters than the 2-back experiment most frequently used. On the other hand, the new results corroborate the finding by Hiltunen et al. (in progress) that SIs are better at updating than CIs, especially when the cognitive load is growing but before reaching the saturation point (Gile, 1997, 2008).

The better performance by SIs was confirmed even further by error analysis. Compared to CIs, they made fewer errors in both tasks. Some between-group differences in error types could be found: compared to SIs, CIs made more extra-list intrusions but fewer errors in goal updating. The higher number of ELIs by CIs might be explained by a between-group difference in the time and intensity of preparation: SIs reported having used more time for it and having done it more intensively. Through looking for the most obvious exemplars in each category and finding equivalents for them in advance, SIs might have learned them at least partly (see rapid new learning in Cowan’s WM model, under Introduction, WMU and interpreting), and during the test, recognized them more quickly and so avoided extra items (ELIs) coming to mind. However, a similar difference seems not to apply to real interpreting situations, as, according to the strategy reports, almost all participants in both groups reported preparing for assignments with higher intensity.

The only error type which CIs were able to avoid better than SIs was errors in goal updating: SIs often failed to update the categories to be tracked in the next group of items. The finding does not corroborate the Wisconsin Card Sorting Test (WCST) results in Yudes et al. (2011), in which SIs showed fewer perseverative errors than monolingual or bilingual controls did. Perseverative errors occur when the participant continues sorting the cards according to the previous-category dimension despite feedback indicating that the response was wrong. The inability of SIs to avoid similar perseverative errors in our updating experiment may be due to no feedback being given.

To conclude, the main results of the WMU task were as follows: (1) The task of simultaneously translating and updating was more demanding for both groups than updating alone, and this was manifested in two measures: in updating and recall of middle positions of the list and number of categories to be tracked. (2) SIs outperformed CIs in two measures: ability to track more categories in both tasks and number of errors. One explanation for the latter finding could be that at work the CIs have not learned to divide their attention in the same way as SIs and are thus accustomed to a smaller cognitive load at work. As a result, they did not manage as well as SIs in the current WMU task, when the increased load required them to divide attention between updating and translation (for a more thorough analysis of the findings, see section ‘General discussion’).

Experiment 2, prose recall

Introduction

The prose recall experiment was anticipated to reveal specific differences in memory functions between SIs and CIs. As the working conditions in CI require encoding, compressing the message and keeping active longer speech sequences than in simultaneous interpreting, these abilities might have developed more markedly in CIs than in SIs during their careers. The prose recall experiment resembled the working conditions in CI: without notes, it might favour CIs so that they can better recall extra-long speech sequences of 41 to 44 words.

Participants in the prose recall experiment were the same as in the WMU experiment. However, the data of three participants (two CI and one SI) were not included in the analysis due to technical challenges and unsuccessful recordings.

Materials and procedure

The prose recall experiment was performed after the WMU experiment by all participants. The text selected was the column ‘What do bicycle, social media and stereotypes have in common? Do they all contribute to lazy thinking?’ (‘Mitä yhteistä on polkupyörällä, somella ja stereotypioilla? Ovatko ne kaikki laiskan ajattelun apuvälineitä?’; Heinonen, 2017) (see Supplemental Appendix A). The column was published in the daily Aamulehti, which has a very wide circulation. The text did not contain words, concepts or information which would have been difficult to understand by a native-language listener. The column heading (topic) was included in the recorded text.

The 383-word text was recorded, spoken by the same female voice as the words in the WMU experiment. After recording, the text was edited and presented with the Presentation software. The text was divided into fourteen speech sequences with enough time between them for spoken recall. The sequence length varied from 13 to 44 words (7.0 to 30.2 s). On average, assuming natural pauses between sequences, the text was spoken at 85 words/min.

The participants listened to the speech sequences through headphones and were instructed to recall each sequence aloud to the best of their ability. The importance of maintaining the message was emphasized (as is usual in interpreting), but the participants were also told that there would be extra points for details correctly recalled. Taking notes was not allowed, as we wanted to study the WM functions alone, rather than the ability to take notes along with memorizing the sequence.

Voice recording of the recalled sequences was automated using the microphone level thresholding built into the software. The recording of each sequence ended automatically if no voice was detected within 3 seconds. The instructions before the experiment mentioned this possibility, and contrary to usual interpreting practice, the participants were told to use an audible filler to begin their response in case they could not immediately start repeating the sequence, to prevent the recording from stopping after 3 seconds.

The maximum time set for recording depended on the length of each sequence on the basis of piloting (see Supplemental Appendix C for exact details), and in addition, the recording continued for two more seconds after the next sequence was presented. This allowed slower participants to finish the recalled sequence, although it is possible that the participant’s concentration was divided between their own speech and listening to the beginning of the next sequence. It was assumed that this would not disturb an experienced interpreter, as long as they had interpreted simultaneously at least occasionally. The participants were informed about this possibility in advance, and a few did make use of it. The next sequence started 0.5 seconds after the maximum recording time for the previous sequence.

Analysis

The participants’ recordings were transcribed into .doc format, and the statistical analyses were made using SPSS for Windows 21.0. The analysis was performed using two methods: recall of idea units and maintenance of text plot. ‘Idea unit’ refers to ‘a sentence or part of a sentence that expresses a complete idea which contains an actual or implicit verb and is usually a phrase-size unit’ (Mills et al., 1993, p. 289). The text was parsed into idea units by the main author (experimenter) in close cooperation with the second author of this paper, who herself has worked as an SI and a CI as well as in interpreter education and examinations. Two points were given for each correctly recalled idea unit, with a maximum score of 150 points for the entire text (see Supplemental Appendix A). No points were deducted for omissions, errors or incorrectly recalled idea units.
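The scoring rule for idea units can be illustrated with a short sketch. The function names and the example data are hypothetical; only the two-point rule, the absence of deductions and the 150-point maximum come from the description above, and the proportion corresponds to the 'points out of maximum possible' format used in reporting the results.

```python
from typing import List

POINTS_PER_UNIT = 2   # points for each correctly recalled idea unit
MAX_POINTS = 150      # maximum score for the entire text


def idea_unit_points(recalled: List[bool]) -> int:
    """Sum the points for one response; omissions and errors cost nothing."""
    return POINTS_PER_UNIT * sum(recalled)


def proportion_of_max(points: int, max_points: int = MAX_POINTS) -> float:
    """Express a score as a proportion of the maximum possible."""
    return points / max_points


# Hypothetical sequence with five idea units, three of them recalled correctly
sequence = [True, False, True, True, False]
print(idea_unit_points(sequence))

# Hypothetical whole-text score of 96 points
print(round(proportion_of_max(96), 2))
```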

In the second analysis, points were given for maintenance of the plot. The points were defined by the first and second authors, using the argumentation analysis by Kakkuri-Knuuttila and Halonen (2000, pp. 101−102). The point scores varied so that in principle, every part of the text important for understanding the whole was awarded 0.5–1 points (see Supplemental Appendix B for details). These could be described as, say, main argument, intermediate argument with reasonings or proposed solutions and arguments for them. The maximum score possible was 23.5 points.

Both the idea unit scores and plot scores for the text as recalled by the participants were determined by the first author in close collaboration with the second author as well as a research assistant, who has a bachelor’s degree in interpreting and translating. Any deviations in the assessment of two team members were discussed with the third member, but the final decision was made by the first author.

Results

The two measures, idea units and plot scores, were used to find out the effect of cognitive load in speech sequence length. We also analysed between-group differences, with the assumption that they could be explained by differences in cognitive load. With this in view, we carried out analyses on speech density, the effect of metatextual category on recall and weakly recalled speech sequences.

Speech sequence length

For the analysis, the speech sequences were grouped by length as follows: short (13 words, 7.0−19.7 seconds) and medium (20−27 words, 11.9−18.2 seconds) length; long (29−36 words, 19.9−25.5 seconds) and extra long (41−44 words, 28.1−30.2 seconds). As the text contained only two short sequences, the short and medium sequences were analysed together. The between-group differences in the recall of speech sequences of different lengths measured as idea unit scores and plot scores were analysed using a 2 (Group: consecutive vs. simultaneous) × 2 (measure: idea units vs. plot scores) × 3 (Sequence length: short/medium vs. long vs. extra long) ANOVA. The results showed no statistically significant between-group effect. There was, however, a significant main effect of measure, indicating that both interpreter groups got higher scores if measured by idea units than if measured by plot scores. The joint effect of group and measuring method was not significant (for details, see Table 8).

Table 8.

Results of prose recall (Exp. 2) in idea unit and plot maintenance scores for consecutive and simultaneous interpreters.

Speech sequence length in words and seconds	Points out of maximum possible, mean (SD)
Speech sequence length in words and seconds	Short and medium 13 and 20−27 words, 7.0−19.7 seconds	Long 29−36 words, 19.9−25.5 seconds	Extra long 41−44 words, 28.1−30.2 seconds	Total
Idea unit measure
Consecutive interpreters, n = 15	0.74 (0.11)	0.55 (0.13)	0.55 (0.15)	0.64 (0.12)
Simultaneous interpreters, n = 17	0.79 (0.09)	0.65 (0.12)	0.61 (0.13)	0.70 (0.09)
Total	0.77 (0.10)	0.60 (0.13)	0.59 (0.14)	0.67 (0.11)
Plot score measure
Consecutive interpreters, n = 15	0.74 (0.13)	0.43 (0.17)	0.52 (0.17)	0.59 (0.13)
Simultaneous interpreters, n = 17	0.79 (0.11)	0.50 (0.18)	0.51 (0.15)	0.63 (0.10)
Total	0.76 (0.12)	0.47 (0.17)	0.52 (0.15)	0.61 (0.11)
Main effect of group, p-value (ηp²)				F(1,30) = 1.9, p = .2, ηp² = .06
Main effect of measure, p-value (ηp²)				F(1,30) = 70.3, p < .001, ηp² = .70
Group & measure interaction, p-value (ηp²)				F(1,30) = 3.3, p = .08, ηp² = .10
Main effect of sequence length, p-value (ηp²)		F(1,30) = 144.2, p = .003*, ηp² = .83	F(1,30) = 80.6, p = .003*, ηp² = .73	F(2,60) = 68.4, p < .001, ηp² = .70
Group & sequence length interaction, p-value (ηp²)				F(1,30) = 0.9, p = .4, ηp² = .03
Measure & length interaction, p-value (ηp²)		F(1,30) = 31.6, p = .003*, ηp² = .51	F(1,30) = 11.1, p = .006*, ηp² = .27	F(2,60) = 14.1, p < .001, ηp² = .32
Measure, length and group interaction, p-value (ηp²)				F(2,60) = 1.0, p = .4, ηp² = .03

Note. Speech sequence length in words and seconds, points out of maximum possible for each length category (short and middle, long and extra long), mean and standard deviation.

*Bonferroni-corrected value.

There was a statistically significant main effect of sequence length, but the interaction of sequence length and group did not reach significance. The Bonferroni-corrected contrast analyses showed that the main effect of sequence length was due to the short-/medium-length sequences being recalled better than either long or extra-long sequences by both groups. Furthermore, the interaction of measure and sequence length was significant, but the three-way interaction of measure, sequence length and group was not. The Bonferroni-corrected contrast analyses indicated that the measure × sequence length interaction resulted from an especially weak recall of long sequences by both groups when measured in plot scores.
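The effect sizes reported in Table 8 can be cross-checked against the F-values, because for a single effect partial eta squared is recoverable from F and its degrees of freedom as ηp² = (F × df_effect) / (F × df_effect + df_error). The following is a minimal sketch of that check, not the authors' analysis script; the function name and the selection of rows are illustrative assumptions.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta squared), copied from Table 8
reported = [
    ("group", 1.9, 1, 30, 0.06),
    ("measure", 70.3, 1, 30, 0.70),
    ("group x measure", 3.3, 1, 30, 0.10),
    ("sequence length", 68.4, 2, 60, 0.70),
    ("measure x length", 14.1, 2, 60, 0.32),
]

for effect, f_value, df_effect, df_error, eta in reported:
    recomputed = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{effect}: recomputed {recomputed:.2f}, reported {eta:.2f}")
```

Small discrepancies in the second decimal are to be expected, since the F-values in the table are themselves rounded.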

Relationship between recorded speech sequence length and idea unit scores (speech density)

To reveal any between-group differences in the relationship between the lengths of recalled and recorded individual speech sequences and idea unit scores, the following classification was used. First, the length in milliseconds of each recorded speech sequence for all participants was determined. Then, the participants were divided into two groups for each speech sequence: those above or below the median length. Second, the midpoint of idea unit scores for each speech sequence was found, and the participants were divided into two groups: those scoring above and those below the midpoint. After that, the recorded length and score values were combined to form the following categories: (1) Succinct: short sequence – high idea unit score; (2) Iterative: long sequence – low idea unit score; (3) Sparse: short sequence – low idea unit score and (4) Verbose: long sequence – high idea unit score.
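The two-way split described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure actually run: "midpoint" is taken to mean halfway between the lowest and highest score, values exactly at a cut-off are counted as short or low, and the function name and the sample values are made up.

```python
from statistics import median


def classify(durations_ms, scores):
    """Label each participant's rendition of ONE speech sequence.

    durations_ms: recorded length of each rendition in milliseconds
    scores: idea unit score of each rendition, in the same order
    """
    length_cut = median(durations_ms)
    score_cut = (min(scores) + max(scores)) / 2  # assumed meaning of "midpoint"
    labels = []
    for duration, score in zip(durations_ms, scores):
        is_long = duration > length_cut  # assumption: ties count as short
        is_high = score > score_cut      # assumption: ties count as low
        if not is_long and is_high:
            labels.append("succinct")
        elif is_long and not is_high:
            labels.append("iterative")
        elif not is_long and not is_high:
            labels.append("sparse")
        else:
            labels.append("verbose")
    return labels


# Hypothetical values for four participants, for illustration only
durations = [12000, 15000, 21000, 26000]
scores = [0.9, 0.3, 0.2, 0.8]
print(classify(durations, scores))
# → ['succinct', 'sparse', 'iterative', 'verbose']
```

Running the classification once per speech sequence and tallying the labels by interpreter group yields a frequency table of the kind analysed next.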

A chi-square test with post hoc cell contributions at the level of p < .05 was used to see if any category was especially characteristic of either interpreter group (Table 9). The results showed a statistically significant difference in category distributions between CIs and SIs, χ²(3) = 14.1, p = .003. The post hoc cell contributions indicated that SIs more often produced succinct speech and CIs more often iterative speech than the other way round. There were no between-group differences in the other ‘speech quality’ categories.

Table 9.

Classes by speech sequence length and idea unit scores by group (1 – succinct, 2 – iterative, 3 – sparse and 4 – verbose).

Class type*	Consecutive interpreters n = 15	Simultaneous interpreters n = 17	Total
1. Succinct (‘napakka’)
Observed frequency**	45a	86b	131
(Expected values)	61.0	70.0
Residual	−16.0	16.0
2. Iterative (‘runsassanainen, tarkentava’)
Observed frequency**	34a	21b	55
(Expected values)	25.6	29.4
Residual	8.4	−8.4
3. Sparse (‘niukkasanainen’)
Observed frequency**	47a	45a	92
(Expected values)	42.8	49.2
Residual	4.2	−4.2
4. Verbose (‘vuolas’)
Observed frequency**	82a	87a	169
(Expected values)	78.6	90.4
Residual	3.4	−3.4
Total	208	239	447
Pearson chi-square			.003

Note. Observed frequencies, expected values and residuals with total by group.

*Categories: (1) Succinct: short sequence – high idea unit score; (2) Iterative: long sequence – low idea unit score; (3) Sparse: short sequence – low idea unit score; (4) Verbose: long sequence – high idea unit score.

**Each subscript letter denotes a subset of group categories whose column proportions do not differ significantly from each other at the .05 level; in other words, if adjacent columns for a group on the line ‘Observed frequency’ include the letter ‘a’, the difference between the two groups is not significant, but if the letters for adjacent group columns are ‘a’ and ‘b’, there is a significant difference between the groups.
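The expected values and residuals in Table 9 follow directly from the row and column totals. As a rough cross-check, the Pearson statistic can be recomputed from the observed frequencies alone; the sketch below uses plain Python, and its variable names are illustrative rather than taken from the authors' analysis. Because it works from unrounded expected values, the result should land close to, but not necessarily exactly on, the reported value of 14.1.

```python
# Observed frequencies from Table 9: (consecutive, simultaneous)
observed = {
    "succinct": (45, 86),
    "iterative": (34, 21),
    "sparse": (47, 45),
    "verbose": (82, 87),
}

col_totals = [sum(row[i] for row in observed.values()) for i in range(2)]
grand_total = sum(col_totals)

chi_square = 0.0
expected = {}
for label, row in observed.items():
    row_total = sum(row)
    # Expected count under independence: row total x column total / grand total
    expected[label] = tuple(row_total * col / grand_total for col in col_totals)
    chi_square += sum((o - e) ** 2 / e for o, e in zip(row, expected[label]))

for label, row in observed.items():
    exp_ci, exp_si = expected[label]
    print(f"{label}: expected {exp_ci:.1f} / {exp_si:.1f}, "
          f"residual {row[0] - exp_ci:+.1f} / {row[1] - exp_si:+.1f}")
print(f"Pearson chi-square = {chi_square:.2f}")
```

The two largest contributions come from the succinct and iterative rows, which matches the cells flagged with differing subscript letters in the table.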

Meta-analysis of prose text content

To analyse the content of the recalled prose text in greater detail, an argumentation analysis was used (see Supplemental Appendix B). For this, each segment awarding plot maintenance points was analysed to determine its rhetorical function in the text. The following metalabels were defined (number of exemplars of each category is in parentheses): Explanation (8), Illustration (6), Description (5), Counter-suggestion (4), Cause (3), Solution (2), Halo effect (2), Conclusion (2), Contextualization (2), Advantage (1), Connection (1) and Consequence (1). Note that the aim was not to develop a universal rhetorical categorization but simply to categorize the segments in this particular text. Other texts would quite probably yield other categories in addition to or instead of these.

A chi-square test with post hoc cell contributions at the level of p < .05 was used to reveal whether either interpreter group had noticeable difficulties in recalling specific types of prose sequences (Table 10). The results indicated no between-group difference for any category, χ²(11) = 3.09, p = 1.0, showing that regardless of metatextual category, the units were recalled equally well by both interpreter groups.

Table 10.

Categories determined by argumentation analysis in prose recall by group (1. Explanation, 2. Illustration, 3. Description, 4. Counter-suggestion, 5. Cause, 6. Solution, 7. Halo effect, 8. Conclusion, 9. Contextualization, 10. Advantage, 11. Connection and 12. Consequence).

Class type*	Consecutive interpreters n = 15	Simultaneous interpreters n = 17	Total
1. Explanation (8)
Observed frequency*	47a	51a	98
(Expected values)	44.6	53.4
Residual	2.4	−2.4
2. Illustration (6)
Observed frequency*	39a	43a	82
(Expected values)	37.3	44.7
Residual	1.7	−1.7
3. Description (5)
Observed frequency*	38a	47a	85
(Expected values)	38.7	46.3
Residual	−0.7	0.7
4. Counter-suggestion (4)
Observed frequency*	13a	22a	35
(Expected values)	15.9	19.1
Residual	−2.9	2.9
5. Cause (3)
Observed frequency*	17a	19a	36
(Expected values)	16.4	19.6
Residual	0.6	−0.6
6. Solution (2)
Observed frequency*	12a	15a	27
(Expected values)	12.3	14.7
Residual	−0.3	0.3
7. Halo effect (2)
Observed frequency*	13a	16a	29
(Expected values)	13.2	15.8
Residual	−0.2	0.2
8. Conclusion (2)
Observed frequency*	8a	11a	19
(Expected values)	8.6	10.4
Residual	−0.6	0.6
9. Contextualization (2)
Observed frequency*	10a	13a	23
(Expected values)	10.5	12.5
Residual	−0.5	0.5
10. Advantage (1)
Observed frequency*	6a	8a	14
(Expected values)	6.4	7.6
Residual	−0.4	0.4
11. Connection (1)
Observed frequency*	5a	7a	12
(Expected values)	5.5	6.5
Residual	−0.5	0.5
12. Consequence (1)
Observed frequency*	3a	1a	4
(Expected values)	1.8	2.2
Residual	1.2	−1.2
Total	211	253	464
Pearson chi-square			1.0

Note. Number of exemplars in each category is in parentheses. Observed frequencies, expected values and residuals with total by group.

Each subscript letter denotes a subset of group categories whose column proportions do not differ significantly from each other at the .05 level; in other words, if adjacent columns for a group on the line ‘Observed frequency’ include the letter ‘a’, the difference between the two groups is not significant, but if the letters for adjacent group columns are ‘a’ and ‘b’, there is a significant difference between the groups.

Weakly recalled metatextual categories

Weakly recalled metatextual categories were inspected separately. For this, metatextual categories in speech sequences in the lowest statistical quartile of plot maintenance scores for all participants ([max score − min score]: 4) were included, resulting in five out of 36 sequence occurrences (see Table 11 and Supplemental Appendix B for details).

Table 11.

Prose recall, weakly recalled metatextual categories (5 out of 36 = statistical quartile of mean plot maintenance scores of all participants: [max score – min score]: 4) (for details, see Supplemental Appendix B).

Speech sequence with length class metatextual label (number of words in parenthesis) with contents in Finnish and (in English)	Group, plot scores out of maximum (mean, SD)
	Consecutive interpreters, n = 15	Simultaneous interpreters n = 17	Total n = 32
SS5_medium (35): Consequence1: Oversimplified categorization: . . . yksinkertaistaa ajatteluamme ja vahvistaa entisestään yksiviivaista ja luokittelevaa ajattelua. – (. . . simplifies our thinking and further bolsters already simplistic and categorizing thinking.)	0.20 (0.25)	0.06 (0.17)	0.13 (0.22)
SS9_extra-long (41): Explanation4: Incidental factors: Miksi? Meihin kaikkiin – myös asiantuntijoihin – vaikuttavat satunnaiset ympäristötekijät. – (Why? All of us – including experts – are affected by incidental factors in our surroundings.)	0.23 (0.26)	0.12 (0.22)	0.17 (0.24)
SS9_extra-long (41): Counter-suggestion2: Tacit knowledge not always correct: Tiedon, joka ei aina pidä paikkaansa. – (That knowledge is not always correct.)	0.07 (0.18)	0.18 (0.25)	0.13 (0.22)
SS11_medium (35): Conclusion2: Thinking may expand OR contract: Voit kaventaa tai laajentaa ajatteluasi. – (You may broaden or restrict your thinking.)	0.17 (0.24)	0.21 (0.25)	0.19 (0.25)
SS13_short (20): Return to aids to quick action: Advantage2: In several contexts: Ajattelumme tarvitsee apuvälineitä nopeaan toimintaan niin somessa kuin sotessa. – (Our thinking needs aids to rapid action in both social media and health care.)	0.07 (0.18)	0.12 (0.22)	0.09 (0.20)
Total	0.15 (0.07)	0.14 (0.06)	0.14 (0.07)

Note. Speech sequence number with length class, label and contents. Plot scores out of maximum by group: mean and standard deviation.

The MANOVA for category occurrences, with p-values varying between .07 and .66, did not show a statistically significant between-group difference. However, as shown in Table 11, the plot maintenance scores related to the occurrences are quite low. The first of them, Consequence1 in SS5, was recalled especially weakly by SIs, and two others, Counter-suggestion2 in SS9 and Advantage2 in SS13, by CIs, with only a few participants receiving full points (means varying between 0.06 and 0.07).

Discussion

The results of the prose recall experiment indicated no statistically significant differences between CIs and SIs, whether measured in idea units or plot maintenance scores. There was, however, a significant main effect of speech sequence length, indicating that recalling long and extra-long speech sequences (29 words and more) is very demanding regardless of interpreting mode (CI or SI). So, contrary to our expectation, the ability to keep in memory long sequences (up to 44 words) was equal in both interpreter groups. In addition, as indicated by the detailed meta-analysis, both groups recalled all metatextual categories of the prose text equally well, and there were no metatextual categories not recalled at all, showing that the main message of the text was recalled well.

Note, however, that short and medium speech sequences (12−27 words, 7.0−18.2 seconds) were recalled best. This corresponds to the results in Hiltunen and Vik (2017), in which the speech sequences recalled best were the longest ones in that experiment, 20–27 words. It would thus appear that with the number of words in a speech sequence approaching 30, its information content and thus the cognitive load increase, making it more difficult to recall in detail without notes. This finding applies to both interpreter groups.

In addition, the relationship between recorded sequence length and idea unit scores indicated a statistically significant between-group difference in two of the four ‘speech quality’ categories: SIs more often produced succinct speech and CIs more often iterative speech than the other way round. A possible explanation might be the difference in time pressure (cognitive load) during interpreting between the two groups: SIs may be accustomed to using less time, and CIs more time, for producing the target text. If so, both groups behaved in this way even during the experiment. It is worth noting that the experiment design was close to the CI situation, except that taking notes was not allowed.

As to the weakly recalled metatextual occurrences, one occurrence in a medium-length (SS5), one in an extra-long (SS9) and one in a short (SS13) speech sequence were especially weakly recalled by either interpreter group (see Supplemental Appendix B and Table 11). As the sequence length varied greatly, length alone cannot be the explanation. Instead, it is perhaps the imported cognitive load (Gile, 2008) that caused weak recall and between-group differences. For example, the parts of extra-long sequence SS9 caused difficulties for both groups: Explanation4 for SIs and Counter-suggestion2 for CIs. This may be because the preceding SS8 was also long (36 words with several details). The saturation point may have been reached, resulting in omissions of details even in the following sequence (Gile, 2016). Note also that the combined duration of these two sequences (SS8 and SS9; see Supplemental Appendix C) was over 50 seconds, far beyond the usual 30 seconds’ scope that can be maintained in WM (or the 2–3 seconds lag between the source and target texts in SI).

Furthermore, it could be that the sequence length together with the detailed information in the preceding part of the sequence in SS5 (‘time pressure, need for rapid response and stress’) was what caused the weaker recall by SIs in that sequence. At work, SIs can usually deliver the interpretation more quickly after hearing the original and need not keep several details in mind at the same time.

Conclusions

The present study revealed several new results. The main findings seem to confirm the significance of cognitive load in both updating and prose recall experiments. In the first experiment, both interpreter groups had updating difficulties, especially in the dual-task situation of simultaneous translating and updating, although the task appeared to be somewhat easier for SIs than for CIs. The better performance by SIs was confirmed even further by error analysis, showing their more developed skills of coordinating several concurrent processes. In prose recall, the significance of cognitive load was obvious for both interpreter groups, especially in recall of extra-long speech sequences (41 to 44 words). It would thus appear that with the number of words in a speech sequence approaching 30, its information content and also the cognitive load increase, making it more difficult to recall in detail without notes. This finding applies to both interpreter groups. Consequently, only one between-group difference was found in prose recall: CIs produced more iterative speech than SIs did. Rather than by a more successful management of cognitive load, this difference is more probably explained by the fact that at work SIs are accustomed to avoiding wordiness because of time pressure and did so from habit during the experiment.

General discussion

The main purpose of the present study was to gain a better understanding of cognitive differences between two spoken-language interpreter groups, consecutive and simultaneous (CIs and SIs), as well as the role of cognitive load in explaining eventual differences. For this, two experiments modified for the purpose were used.

The WMU experiment revealed two basic differences: compared to CIs, SIs were better able to update the last words of categories in the dual-task situation of translating and updating and made fewer errors. In addition, a difference in error type was revealed: SIs made more errors in goal updating and CIs in extra-list intrusions.

In contrast, in the prose recall experiment, both groups performed equally well, with only one notable between-group difference found in the relationship between recorded sequence length and idea unit scores: on average, CIs produced more iterative speech than SIs. Nevertheless, the significance of cognitive load in prose recall was established for both groups. The results revealed that for both groups, speech sequences of 20–29 words are recalled best. If the sequence length exceeds this and taking notes is not possible for some reason, errors of several types are probable. Especially with an extra-long sequence immediately following a long sequence, the most abstract part of the sequence was omitted completely or recalled more weakly than other parts.

The better performance of SIs in category updating may be explained in several ways. It is possible that the experiment design with a dual-task format of simultaneous transformation (translation) and updating resembled more closely the divided-attention situation of simultaneous listening, translating, speech production and monitoring in SI. CIs do not habitually encounter similar simultaneity at work (see the introduction on Effort model by Gile, 2008). With less time pressure in the experiment and more time for translation and updating between the items, CIs might also have performed well. Thus, the sentence updating task developed by Fellman et al. (2018) might be better suited for testing the updating skills of CIs.

As to SIs making more errors in goal updating, the present updating experiment may mirror the updating needs of some interpreters more closely than those of others. In CI, especially in public service and court interpreting, there are often several speakers on both sides, often with power asymmetries (Rudvin, 2001), leading to continuous changes of speaker and language. In SI, by contrast (particularly in a conference setting), the speaker and the language do not change as frequently as the categories did in the present updating experiment. This suggests that at work SIs may not have to track changes in the situation (update the goal) to the same degree as CIs, which could explain why they made more errors of this type in our experiment.

Why, then, did both interpreter groups perform equally well in prose recall, although the experiment design favoured CIs? After all, in reading span experiments SIs can outperform many other linguist groups (see Mellinger & Hanson, 2019; Wen & Dong, 2019). Obviously, speech sequence length cannot explain this finding, as there were no between-group differences in the recall of long speech sequences.

The explanation could first be based on similar maintenance processes in SI and CI, which have been developed during numerous years of practice at work. This enabled both groups to recall the prose passages even without the possibility to take notes or to prepare for the task in advance, which is usual in most interpreting assignments. Because of time pressure, interpreters probably do not remove anything from the focus of attention, nor do they refresh or especially elaborate the material in WM (Morey & Cowan, 2018): the elaboration is done while listening to and understanding the message. In addition, the different linguistic bindings (prosodic, semantic, syntactic, etc.) in the spoken message help them to keep the message activated and, at least partly, focused. While some details and facts inevitably decay with time (see, e.g., Adams et al., 2018), many facts, especially relevant ones, can still be reactivated quickly when necessary. Thus, the between-group differences are probably based on the accumulation of cognitive load, more pronounced in SI and evidenced also by the present WMU experiment.

Increased cognitive load is also indicated by the fact that in the prose recall experiment, SIs produced more often succinct speech, possibly in an attempt to reduce the load. Instead, CIs were more prone to iterative speech, possibly to ensure that nothing was left out despite the increased load. Experiments on recall of words have also revealed that scanning the units presented to find any not yet mentioned (rapid mental scanning) can help to reactivate them (Cowan, 1992, p. 681) and thus allow more to be recalled. These differing strategies could result from the sharply different time constraints that SIs and CIs are used to.

Second, the updating processes of CIs and SIs could be different. According to Butterfuss and Kendeou (2018), updating in text comprehension alone is much more complex than updating of individual words (categories), not to mention other subskills needed in prose recall, such as maintaining the plot and producing the speech sequence fluently and as accurately as possible. It could be assumed that the continuous rehearsal in updating while interpreting coherent prose favoured both groups equally in prose recall, but not in the updating of categories.

To conclude, interpreters in both groups seem able to recall accurately an average of over 60% of the information in the whole text and more than half even in long and extra-long speech sequences, independent of the measure (idea units or plot maintenance scores). The analysis of recall of metatextual categories also showed that the main message as well as plenty of the details was recalled well even when the cognitive load was high, indicating a high level of plot maintenance. In addition, approximately the same average quantity of items was recalled accurately in the WMU experiment (66%). This is well in line with the results by Gerver (1976), according to which SIs were able to correctly produce over 85% of the source text. All of this is a convincing indication of the advanced WM skills of experienced professional interpreters regardless of interpreting mode.

Future directions

The findings of the present experiments should be treated with considerable caution. The sample size was quite small, and especially the SI group was far too heterogeneous, consisting of interpreters doing both SI and CI to varying degrees. Still, due to a medium effect size (for between-group differences, partial eta squared between .14 and .18), at least some of the results should be of value. In particular, the methods used, the verbal category updating experiment with translation to increase the cognitive load, and the prose recall with extra-long speech sequences seem to offer new perspectives for research on WM with different interpreter groups.

In any case, repeating these experiments is recommended, preferably in more widely used languages, such as English, German or French, to achieve a more ‘pure’ participant group in each mode. In addition, it would be interesting to screen the participants beforehand with an inquiry similar to the one we used after the test, to divide them into three groups: one SI group and two CI groups, namely those who usually take just a few notes while interpreting and those who usually use plenty of notes at work (especially for long consecutive). They could also be asked about situations where they either do or do not use plenty of notes. Above all, we recommend using complete sentences and transforming them in the updating experiment (Fellman et al., 2018; see also section ‘General discussion’), instead of non-related words. This wish was also expressed by the participants.

In addition, although our detailed meta-analysis in prose recall did not reveal statistically significant differences between the two interpreter groups, this analysis method proved very insightful. The different criteria used so far for prose recall evaluation, be it the number of individual words, propositions, idea units or even key expressions correctly recalled, have seemed to us inadequate for revealing the whole story of the key goal in interpreting: to deliver the whole message as precisely as possible. Sometimes, as for instance in a court of law, this may require reproducing all details in what has been said. In a less formal setting, especially when the speaker’s style is long-winded and possibly hard to understand, this could mean reproducing what is essential in the speech: what the speaker ended up saying but floundered along the way.

Consequently, understanding the level of detail appropriate to each interpreting situation requires a careful advance analysis of several situational factors and may well be an element of an interpreter’s expertise. If prose texts were analysed on the metalevel, as was done in the present experiment, it might be possible to get closer to evaluating what is important for the interpreter to recall in each situation. As, however, this metatextual method was used for the first time in a prose recall experiment with interpreters, the method itself should be thoroughly examined before being more extensively utilized. The present prose text was far too short and the participants far too few for generalization. Studies on the usefulness of the metatextual method using corpora of authentic speeches and their interpretations for different settings (conferences, court, health care, etc.) are to be recommended.

Still, in conclusion, the updating ability itself seems to be such an important skill in interpreting that more comprehensive tests are needed, designed especially to reveal its importance for several cognitive processes in different modes and settings.

Supplemental Material

Supplemental material, sj-docx-1-ijb-10.1177_13670069241302407 for Effect of cognitive load on working memory updating and prose recall skills of interpreters by Sinikka Hiltunen, Heli Mäntyranta, Gun-Viol Vik and Virpi Kalakoski in International Journal of Bilingualism

Footnotes

Acknowledgements

The authors thank the heads of the various institutions and organizations, as well as all colleagues, and, above all, the volunteer participants who contributed to this study. The authors are especially grateful to Tommi Makkonen, Lisa Uhle and Mari Tervaniemi for their collaboration in the preparation and conducting of experiments and checking the results, as well as for insightful comments on and corrections to the manuscript.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Sinikka Hiltunen

Supplemental material

Supplemental material for this article is available online.

Author biographies

Sinikka Hiltunen holds MA degrees both in translation and interpreting (University of Tampere) and in cognitive science (University of Helsinki). For over two decades, she worked as in-house translator and consecutive dialogue interpreter, mainly for manufacturing industries. She is an authorized translator (Russian–Finnish), has worked as a freelance lecturer, educator, and trainer on topics related to linguistics, psychology, and cognitive sciences, and for a couple of years, taught psychology at an upper secondary school. The publication of the article at hand enables her to start finalizing her doctoral thesis in cognitive science.

Heli Mäntyranta has an MA degree in translation and interpreting (University of Tampere). She is an authorized translator (Finnish–English–Finnish). After 12 years of teaching at the University of Tampere, she worked as a self-employed translator and conference interpreter. She also acted as the principal examiner at competence-based vocational examinations for community interpreters for several years from the inception of the examination system. Now semi-retired, her research interests include various aspects of the work and education of interpreters and authorized translators.

Gun-Viol Vik, PhD, has expertise in interpreting and bi- and multilingual practices in organizations. She also works as a professional conference interpreter.

Virpi Kalakoski, PhD, is a cognitive psychologist and a research manager at the Finnish Institute of Occupational Health. Her PhD research focused on the experimental study of expert memory. Her current projects apply cognitive psychology to workplace research, exploring topics such as cognitive demands at work, workplace interventions, and algorithmic management.

References

Adams E. J., Nguyen A. T., Cowan (2018). Theories of working memory: Differences in definitions, degree of modularity, role of attention, and purpose. Language, Speech, and Hearing Services in Schools, 49, 340–355.

Ahrens, Orlando (2022). Note-taking for consecutive conference interpreting. In Albl-Mikasa, Tiselius (Eds.), Routledge handbook of conference interpreting (pp. 34–48). Routledge, Taylor and Francis Group.

Andres (2002). Konsekutivdolmetschen und Notation. Peter Lang-Verlag.

Barrouillet, Bernardin, Camos (2004). Time constraints and resource sharing in adults’ working memory spans. Journal of Experimental Psychology: General, 133, 83–100.

Bartłomiejczyk, Stachowiak-Szymczak (2022). Modes of conference interpreting simultaneous and consecutive. In Albl-Mikasa, Tiselius (Eds.), Routledge handbook of conference interpreting (pp. 19–33). Routledge.

Butterfuss, Kendeou (2018). The role of executive functions in reading comprehension. Educational Psychology Review, 30, 801–826.

Chmiel (2018). In search of the working memory advantage in conference interpreting–training, experience and task effects. International Journal of Bilingualism, 22(3), 371–384.

Colin, Morris (1996). Interpreters and the legal process. Waterside Press.

Cowan (1988). Evolving conceptions of memory storage, selective attention, and their mutual constraints within the human information-processing system. Psychological Bulletin, 104(2), 163–191.

Cowan (1992). Verbal memory span and the timing of spoken recall. Journal of Memory and Language, 31, 668–684.

Cowan (2001). The magical number 4 in short-term memory: A reconsideration of mental storage capacity. Behavioral and Brain Sciences, 24(1), 87–114.

Cowan (2019). Short-term memory based on activated long-term memory: A review in response to Norris (2017) [Comment]. Psychological Bulletin, 145(8), 822–847.

Cowan (2022). Working memory development: A 50-year assessment of research and underlying theories. Cognition, 224, 105075.

Cowan, Morey C. C. (2021). An embedded-processes approach to working memory. In Logie R. H., Camos, Cowan (Eds.), Working memory state of the science (pp. 44–84). Oxford University Press.

Daneman, Carpenter P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19(4), 450–466.

Ecker U. K. H., Lewandowsky, Oberauer, Chee A. E. H. (2010). The components of working memory updating. An experimental decomposition and individual differences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(1), 170–189.

Ecker U. K. H., Oberauer, Lewandowsky (2014). Working memory updating involves item-specific removal. Journal of Memory and Language, 74, 1–15.

Ericsson K. A., Lehmann A. C. (1996). Expert and exceptional performance: Evidence of maximal adaptation to task constraints. Annual Review of Psychology, 47, 273–305.

Ericsson K. A., Williams A. M. (2007). Capturing naturally occurring superior performance in the laboratory: Translational research on expert performance. Journal of Experimental Psychology: Applied, 13(3), 115–123.

Fellman, Soveri, Viktorsson, Haga, Nylund, Johansson, Edman, von Renteln, Laine (2018). Selective updating of sentences: Introducing a new measure of verbal working memory. Applied Psycholinguistics, 39, 275–301.

Friedman N. P., Miyake (2017). Unity and diversity of executive functions: Individual differences as a window on cognitive structure. Cortex, 86, 186–204.

Gerver (1976). Empirical studies of simultaneous interpretation: A review and a model. In Brislin (Ed.), Translation: Application and research (pp. 165–207). Gardner Press.

Gile (1997). Conference interpreting as a cognitive management problem. In Danks J. H., Shreve G. M., Fountain S. B., McBeath M. K. (Eds.), Cognitive processes in translation and interpreting (pp. 196–214). Sage.

Gile (2008). Local cognitive load in simultaneous interpreting and its implications for empirical research. Forum, 6(2), 59–77.

Gile (2016, January 20). The Effort model and Gravitational model clarifications and updates [PPT presentation]. https://www.researchgate.net/publication/303249990_The_Effort_Models_-_Clarifications_and_update

Gile

(2018). Simultaneous interpreting. In Chan

(Ed.), An encyclopedia of practical translation and interpreting (pp. 531–561). The Chinese University Press.

27.

Gile

(2021). The Effort models in interpreting as a didactic construct [Preprint]. In Muñoz Martín

Sun

S. S.

(Eds.), Advances in cognitive translation studies (pp. 139–160). Springer Nature.

28.

Heinonen

(2017, September 3). Mitä yhteistä on polkupyörällä, somella ja stereotypioilla? Ovatko ne kaikki laiskan ajattelun apuvälineitä? [What do bicycle, social media and stereotypes have in common? Do they all contribute to lazy thinking?], Aamulehti, B6/Ihmiset. https://www.aamulehti.fi/uutiset/mita-yhteista-on-polkupyoralla-somella-ja-stereotypioilla-ovatko-ne-kaikki-laiskan-ajattelun-apuvalineita-200365382/

29.

Henrard

Van Daele

(2017). Different bilingual experiences might modulate executive tasks advantages: Comparative analysis between monolinguals, translators and interpreters. Frontiers in Psychology, 8, Article 1870.

30.

Hiltunen

Vik

G.-V.

(2017). Interpreters–experts in careful listening and efficient encoding? Findings of a prose recall test. International Journal of Bilingualism, 21(2), 194–212.

31.

Hiltunen

Vik

G.-V.

Mäntyranta

Kalakoski

(in progress). Do memory and updating skills of interpreters reflect particular features of long-term experience?

32.

Fan

(2021). The interpreter advantage in executive functions–a systematic review and meta-analysis. Forum for Linguistic Studies, 3(1), 131–161.

33.

Isolahti

(2014). Tulkkauksen tarkkuus rikosoikeudenkäynnissä–saavuttamaton ihanne [Accuracy of interpreting in criminal trials: An unachievable ideal] [Dissertation]. Tampere University, Tampere.

34.

Kakkuri-Knuuttila

M.-L.

Halonen

(2000). Argumentaatioanalyysi ja hyvän argumentin ehdot [Argumentation analysis and premises of a good argument]. In Kakkuri-Knuuttila

M.-L.

(Ed.), Argumentti ja kritiikki: Lukemisen, keskustelun ja vakuuttamisen taidot (pp. 60–113). Gaudeamus.

35.

Köpke

Nespoulous

(2006). Working memory in expert and novice interpreters. Interpreting, 8(1), 1–23.

36.

Kotimaisten kielten keskus. (2025, December 12). Kielitieto, Kielet, Suomi [On languages, Languages of Finland and language policy]. https://www.kotus.fi/kielitieto/kielet/suomi; https://en.kotus.fi/on-language/languages-of-finland-and-language-policy/

37.

Mellinger

Hanson

T. A.

(2019). Meta-analyses of simultaneous interpreting and working memory. Interpreting, 21(2), 165–195.

38.

Mills

C. B.

Diehl

V. A.

Birkmire

D. P.

Mou

(1993). Procedural text: Predictions of importance ratings and recall by models of reading comprehension. Discourse Processes, 16, 279–315.

39.

Morey

C. C.

Cowan

(2018). Can we distinguish three maintenance processes in working memory? Annals of the New York Academy of Sciences, 1424(1), 45–51.

40.

Nour

Struys

Woumans

Hollebeke

Stengers

(2020). An interpreter advantage in executive functions? A systematic review. Interpreting, 22(2), 163–186.

41.

Obler

L. K.

(2012). Conference interpreting as extreme language use. International Journal of Bilingualism, 16(2), 177–182.

42.

Pöchhacker

(2004, reprinted 2009). Introducing interpreting studies. Taylor & Francis.

43.

Riccardi

(2022). Strategies and capacity management in conference interpreting. In Albl-Mikasa

Tiselius

(Eds.), The Routledge handbook of conference interpreting (pp. 371–385). Routledge.

44.

Rudvin

(2001). Cross-cultural dynamics in community interpreting. Troubleshooting. In Hansen

Malmkjær

Gile

(Eds.), Claims, changes and challenges in translation studies, vol. 50, selected contributions from the EST Congress, Copenhagen 2001 (Vol. 1 in the EST subseries, pp. 271–284). Benjamins Translation Library.

45.

Ruiz

Elosúa

Lechuga

(2005). Old-fashioned responses in an updating memory task. The Quarterly Journal of Experimental Psychology Section A, 58(5), 887–908.

46.

Russo

(2022). Aptitude for conference interpreting. In Albl-Mikasa

Tiselius

(Eds.), The Routledge handbook of conference interpreting (pp. 307–320). Routledge.

47.

Seleskovitch

Lederer

(1989). Pédagogie Raisonnée de l’interprétation [Reasoned Pedagogy of Interpretation]. Didier Erudition; Publications Office of the European Union.

48.

Stavrakaki

Megari

Kosmidis

M. H.

Apostolidou

Takou

(2012). Working memory and verbal fluency in simultaneous interpreters. Journal of Clinical and Experimental Neuropsychology, 34(6), 624–633.

49.

Takeda

(2011). Revisiting the teaching of consecutive interpreting. Working paper presented at the Monterey FORUM, 9 April, Monterey, CA, USA.

50.

Timarová

Š.

Čeňková

Meylaerts

Hertog

Szmalec

Duyck

(2015). Simultaneous interpreting and working memory executive control. Interpreting, 16, 139–168.

51.

Tiselius

(2022). Conference and community interpreting commonalities and differences. In Mikkelson

Jourdenais

(Eds.), The Routledge handbook of conference interpreting (pp. 49–63). Routledge.

52.

Tiselius

Englund Dimitrova

(2023). Monitoring in dialogue interpreting. Cognitive and didactic perspectives. In Gavioli

Wadensjö

(Eds.), Routledge handbook of public service interpreting (pp. 309–324). Routledge.

53.

Unsworth

Engle

R. W.

(2007). The nature of individual differences in working memory capacity: Active maintenance in primary memory and controlled search from secondary memory. Psychological Review, 114(1), 104–132.

54.

Van der Linden

Van de Putte

Woumans

Duyck

Szmalec

(2018). Does extreme language control training improve cognitive control? A comparison of professional interpreters, L2 teachers and monolinguals. Frontiers in Psychology, 9, Article 1908.

55.

Viljanmaa

(2020). Professionelle Zuhörkompetenz und Zuhörfilter beim Dialogdolmetschen [Professional listening competence and listening filters in dialogue interpreting] [Dissertationsarbeit, Universität Tampere] [TransÜD, Arbeiten zur Theorie und Praxis des Übersetzens und Dolmetschens, Band 112]. Frank & Timme.

56.

Wen

Dong

(2019). How does interpreting experience enhance working memory and short-term memory: A meta-analysis. Journal of Cognitive Psychology, 31(8), 769–784.

57.

Yntema

D. B.

(1963). Keeping track of several things at once. Human Factors, 5, 7–17.

58.

Yudes

Macizo

Bajo

(2011). The influence of expertise in simultaneous interpreting on non-verbal executive processes. Frontiers in Psychology, 2, Article 309.

