Abstract
The value of case-based teaching, and of scaffolding in helping students solve the problems in case studies, has been widely acknowledged. However, few studies have mapped out how teachers practise scaffolding in naturally occurring case-based teaching. To address this knowledge gap, the present study uses multimodal interaction analysis to explore how two business English teachers practised scaffolding in their case-based teaching to support students’ problem-solving process. Employing a primarily qualitative method of video observation paired with teachers’ self-reflections on their teaching (while integrating a quantitative layer through targeted data analysis), the study reveals that both teachers made full use of eight communicative modes (spoken language, print, spatial position, movement, gesture, gaze, head movement and facial expression) to realise their scaffolding strategies; however, their scaffolding practice exerted distinct influences on students’ working memory processing during problem-solving. The findings indicate that, when scaffolding students’ problem-solving process in case-based teaching, teachers need to be aware of the significance of the appropriate choice and coordinated use of communicative modes. The study yields important implications for promoting the critical semiotic awareness of pre-service and in-service teachers across different subject disciplines, educational settings, and cultural contexts in their case-based teaching.
Introduction
Case-based teaching has been widely used in economics, social work, medicine, law, business English, and other disciplines (Yin, 2017). Written in the form of a story with a problem for students to solve through a series of tasks, case studies are intended to cultivate students’ abilities to analyse, discuss, evaluate, and make decisions, which are required for their future professional practice and sustainable development (Bonney, 2015; Nohria, 2021). However, the inherently authentic nature of case studies can also impose cognitive overload on students, since working memory is extremely limited in its capacity to deal with an unfamiliar problem (Meguerdichian et al., 2016). Hence, students need adaptive support (Sun et al., 2023) to solve problems in case studies. To this end, teachers tend to provide scaffolding on an ongoing basis according to their students’ progress in problem solving. Scaffolding refers to a process in which teachers interact socially with students to provide them with intellectual support so that they can function in their zone of proximal development (ZPD; Acosta-Gonzaga & Ramirez-Arellano, 2022). In this sense, teachers play a particularly important role in the scaffolding process (Sun et al., 2023). However, they often find it difficult to put scaffolding into practice in their classrooms (Esteban & Cañado, 2004).
Scaffolding consists of various kinds of “multimodal strategies that together act as critical mediating tools in students’ learning” (Sharpe, 2006, p. 212). Accordingly, skilful adoption of multimodal resources in the classroom is an instantiation of teachers’ pedagogical competence to scaffold students’ problem-solving process (Patel, 2024). Unfortunately, little classroom-based evidence exists (Chern & Cheeb, 2022), mainly because scaffolding, as a concept, cannot “easily be translated into a practical classroom context” (Sharpe, 2006, p. 212). To address this research lacuna, the present classroom-based study adopts multimodal interaction analysis (MIA) as an analytical framework to investigate how scaffolding is practised in case-based teaching to support students’ problem-solving process. MIA is “a holistic analytical framework that understands the multiple modes in (inter)action as all together building one system of communication” (Norris & Pirini, 2016, p. 24). It has been shown to allow for “an in-depth exploration of teacher-student interaction with a particular emphasis on the teacher” (Qin & Wang, 2021, p. 4).
The ultimate aim of this study is two-fold. On the one hand, we attempt to enrich the pedagogical understanding of how scaffolding is incorporated in the case-based classroom setting to support students’ problem-solving process. On the other hand, by analysing the incorporation of scaffolding in case-based teaching, we intend to promote pre-service and in-service teachers’ critical semiotic awareness (Kern, 2015), helping them improve their capacity to analyse, design and control their classroom scaffolding practice.
Information Processing in Working Memory and Scaffolding
According to cognitive load theory, students’ problem-solving process will be hampered when their working memory capacity is exceeded in a problem-solving task (Meguerdichian et al., 2016). Working memory provides the mechanism through which humans process information (Chai et al., 2018). In working memory, visual and auditory information is processed separately by the phonological loop and the visuospatial sketchpad of Baddeley and Hitch’s (1974) model (Mayer, 2002). The phonological loop is responsible for processing auditory information such as verbal information, while the visuospatial sketchpad is responsible for processing visual information. The processed information then moves to long-term memory, the vast store of information accumulated over one’s life. Working memory processes information either before it is stored in long-term memory or after it has been stored. Before moving to long-term memory, information must be rehearsed or linked to prior knowledge within working memory; otherwise, it will be lost from working memory (Mayer, 2002). After being stored in long-term memory, information first needs to be retrieved into working memory before it can be used to perform cognitive tasks (Liu et al., 2022).
Working memory is limited in capacity, and too much information entering working memory will lead to cognitive overload, resulting in loss of information (Cowan, 2013). Scaffolding helps regulate how information moves through working memory, which in turn reduces cognitive load (van Nooijen et al., 2024). Scaffolding involves social interaction between teachers and students, which is initiated by teachers. Interactional scaffolding can be planned to some extent, especially when teachers require responsive interaction (Kuiper et al., 2017). When planning, teachers focus on the use of discourse and metacognition (Mahan, 2022). Discourse means a process mediated by the communicative modes utilised in teachers’ interaction with their students (Kress, 2010). A communicative mode in this study is defined as “a socially and culturally given semiotic resource for making meaning” (Kress, 2010, p. 79); image, writing, layout, gesture, space, and facial expression are all examples of modes. Key scaffolding strategies revolving around discourse include repetition (echoing a student’s answer in class), revoicing (reformulating the student’s answer in academic language), elaboration (prompting the student to justify or amplify their answer; McNeil, 2012) and two types of questions, namely referential questions and display questions (Mahan, 2022). Unlike display questions, referential questions are those to which teachers do not know the answers that students will provide. These scaffolding strategies focus on the spoken language of teachers; however, high-quality support for students in the classroom can be achieved not only through the orchestration of spoken language but also through its combination with other communicative modes (Peng, 2019). Therefore, different communicative modes, including spoken language, are examined in the present study to articulate how teachers make use of discourse to help students with tasks in their case-based teaching.
Metacognition, or “learning to learn” (Coyle et al., 2010, p. 29), emphasises the way that teachers help students with tasks by drawing students’ attention to their own learning processes (Zumbach et al., 2020). Teachers can use process prompts (Zumbach et al., 2020), provide examples of tasks and discuss them with students (e.g. modelling) or directly teach meta-strategies to students (Grossman, 2015). Process prompts play an important role in scaffolding metacognition (Zumbach et al., 2020). They are “instructional measures integrated in the learning context which ask students to carry out specific metacognitive activities” (Bannert, 2004, p. 2).
To date, research on teachers’ scaffolding practice in case-based teaching has tended to emphasise the effect of scaffolding on students’ case-based learning (Tawfik et al., 2019; Zhou et al., 2025). For example, Chern and Cheeb (2022) found that scaffolding enabled the teacher of a Financial Accounting and Reporting course to facilitate students’ case-based learning. Their study shows the positive effect of scaffolding on students’ case-based learning; however, it focuses on the outcome of scaffolding rather than describing its process. In fact, for teachers the process is valued more than the outcome (Chen, 2009), because only when scaffolding practice is articulated can it be reflected on or fed back into their actual classrooms.
Multimodal Interaction Analysis
In a scaffolding process, teachers combine various communicative modes into multimodal ensembles to make meaning so as to manage their interaction with students. Interaction is “multimodal as it happens through speech, writing, gesture, image and space” (Archer, 2014, p. 189). Accordingly, investigating the way that different modes are assembled to construct a whole meaning is crucial for examining how scaffolding is exercised in case-based teaching to support students’ problem-solving process. In view of this, the multimodal interaction analysis (MIA) framework (Norris, 2004, 2019) was used in this study for the multimodal microanalysis of selected fragments of teachers’ interaction with students when they scaffolded students’ problem-solving process, so as to examine interaction more precisely and arrive at more nuanced descriptions.
MIA focuses on mediated interaction in a given context, that is, “how a variety of communicative modes are brought into and constitutive of social interaction” (Jewitt, 2014, p. 36). Accordingly, its unit of analysis is the mediated action that a social actor (the teacher in this study) performs with or through communicative modes (Scollon, 1998). Mediated action can be “further categorized into lower-level actions (LLAs) and higher-level actions (HLAs)” (Norris, 2004, p. 13). An LLA is the smallest interactional meaning unit that a social actor employs and that is mediated by a communicative mode. HLAs are made up of multiple chained LLAs and bracketed by an opening and a closing (Norris, 2004). In other words, the interplay of multiple chains of LLAs creates the HLAs analysed in the present study: the episodes of scaffolding. An episode of scaffolding is defined here as a fragment of multimodal teacher-student interaction in which a certain scaffolding strategy occurs or is prompted to scaffold students’ problem-solving process. Modal configuration (the hierarchical organisation of LLAs) allows the analysis of an HLA in terms of the chains of LLAs that constitute it and their relationships (Norris, 2004). In this process “the LLAs that are most important to the meaning produced are defined as most important to the construction of the HLA” (Norris & Pirini, 2016, p. 26).
Most studies on teacher-student interaction are concerned with the teacher’s spoken language (Kunioshi et al., 2015; Morell, 2015), with relatively less attention being paid to teachers’ use of combined communicative modes. The limited recent studies on the multimodal construction of teachers’ instructions (Morell, 2018; Querol-Julián, 2023) show that multimodal analysis can help reveal, in a close and interpretative way, how various communicative modes interplay in teachers’ interaction with students. Drawing on this approach, the present study attempts to explore how teachers practise scaffolding to assist students’ problem-solving process in case-based teaching by investigating the following three research questions:
What communicative modes do teachers combine into multimodal ensembles when they construct scaffolding strategies to assist students’ problem-solving process?
How do communicative modes interplay to realise teachers’ scaffolding strategies in terms of modal configuration?
To what extent, and in what manner, might teachers’ practice in implementing their scaffolding strategies contribute to students’ information processing in working memory?
Methodology
Research Settings and Samples
In this study, we focus on the naturally occurring scaffolding practices of business English teachers, since case studies particularly suit business English courses (Esteban & Cañado, 2004). The study reports on two case study lessons given by two business English teachers in their undergraduate courses: Business English Comprehensive Course and International Business Negotiation Course. Business English Comprehensive Course is an English language course designed to equip students with the specific language skills, vocabulary, and cultural understanding necessary for professional communication in business contexts. Case studies rank high in popularity among the course teachers owing to their ability to stimulate discussion and meaningful writing assignments (Ulrich, 2000), through which students can improve both their English proficiency and their business communication skills. Their focus “is on the underlying language activities” (Ulrich, 2000, p. 230). In comparison, International Business Negotiation Course teaches students how to conduct and manage negotiations between parties from different countries, and its case studies aim to help students learn strategies to avoid deadlock and conflict in business negotiations and to promote their negotiation capabilities (Page & Mukherjee, 2007). The present study chose these two case study lessons because they occupied the same point in their units: both were taught just before students produced their final outputs for the unit. The outputs were a simulated negotiation role play for the Negotiation Course and an essay on what makes a good piece of writing for the Comprehensive Course.
Episodes of scaffolding in these lessons composed the sample of the study. The classroom videos of the two lessons were watched repeatedly to identify scaffolding episodes during teacher-student interactions. Episodes of scaffolding were identified based on the following criteria: (1) an episode opens when the teacher pays full attention to scaffolding the problem-solving process through a certain scaffolding strategy and (2) it closes when full attention is again paid to scaffolding. Although the two lectures each lasted around 45 min, the study focused on the eight typical episodes of scaffolding developed in each class (see Table 1), whose total duration was the same in both classes, that is, 1,744 s. The average duration of the episodes was 218 s, the maximum 593 s, and the minimum 55 s.
Episodes of Scaffolding in Comprehensive Class and Negotiation Class.
Participants
This study is based in a 4-year business English bachelor programme at a Chinese public university. Two business English teachers were selected as participants for this study, one of whom is the first author. Their demographic information is shown in Table 2. For convenience of reference, the Comprehensive Course teacher is referred to here as CT and the Negotiation Course teacher as NT.
Demographic Information of Two Teachers.
Several similarities between the teachers make them comparable in terms of data analysis. Firstly, they both taught case studies, albeit in different business English teaching settings. Secondly, their lessons were both for sophomore business English majors, who had the same mixed-ability profile. Thirdly, they independently designed and practised different scaffolding with different time allocations, but both successfully involved students in a positive classroom context to solve the problem in the case studies. Lastly, they had both taught their students continuously for nearly 2 years and were fairly familiar with them.
Data Collection
Data were collected from the recorded videos of two practical classes, the teachers’ self-reflections on their teaching, and their PPT slides. To annotate communicative modes in the video clips, this study used the multimodal annotation software ELAN. Open coding in ELAN allowed us to identify instruction-giving sequences and to demarcate all sequences considered as HLAs (Norris, 2019). To annotate the data, the second author first clipped out the episodes of scaffolding and observed them many times to identify the features of different communicative modes.
The episodes of scaffolding were first annotated by exploring their scaffolding stages based on Sinclair and Coulthard’s (1975) classroom interaction pattern, which suits face-to-face classes (Querol-Julián, 2023). Additionally, given the important role that teachers’ waiting time plays in scaffolding higher cognitive level learning (Tobin, 1987), the period when the teacher was waiting for her students’ responses was also annotated. Accordingly, each episode of scaffolding was annotated hierarchically and sequentially into a four-stage pattern (initiation-waiting-response-feedback), starting with the teacher initiating the scaffolding process as planned. Thereafter, waiting time was allocated for teachers to help students engage in cognitive processing for an appropriate response, followed by students’ responses to the teacher’s initiation. Finally, the teacher provided feedback.
Next, the microstructure of each scaffolding stage was explored to present how many communicative modes were embodied at each stage and how these communicative modes interplayed to fulfil the scaffolding strategies of each teacher. According to Norris’ (2004) list of communicative modes for multimodal interaction and previous research into multimodal teaching (Qin & Wang, 2021), we annotated the modes of spoken language, print, spatial position, movement, gesture, gaze, head movement, and facial expression with ELAN (see Figures 1 and 2) based on a coding schema (see Table 3).

Screenshot of CT’s communicative modes on ELAN statistics.

Screenshots of NT’s communicative modes on ELAN statistics.
Coding Schema of Communicative Modes.
Note. SL = spoken language; SLP = spoken language prosody; P = print; PPS = print-PPT slides; PWB/WB = print-writing on the blackboard; AP = authoritative position; SP = supervisory position; IP = interactional position; MFCF = movement in the front centre of the classroom; MBR = movement between the rows; G = gesture; GIG = gesture-iconic gesture; GDG = gesture-deictic gesture; GMG = gesture-metaphoric gesture; GBG = gesture-beat gesture; GGAS = gaze-gaze at all the students; GGOS = gaze-gaze at one student; GGPP = gaze-gaze at PPT presentation; HM = head movement; HMDS = head movement-directional shift; HMHB = head movement-head beats; FE = facial expression; FEP = facial expression-positive; FEN = facial expression-negative.
To mitigate potential bias arising from the fact that the first author also served as one of the teacher-participants, the first author had no access to classroom videos or transcripts until the coding scheme had been fully developed and frozen, and the second author, who was responsible for methodology, software, and data curation, primarily carried out the data coding. To ensure inter-coder reliability in our coding process, the first author and the second author independently coded randomly selected episodes (about 20% of the total annotated time, stratified by class and by scaffolding stage) using the above-mentioned coding scheme (see Table 3). We achieved strong agreement, with a kappa value of .82 (Landis & Koch, 1977). After we had discussed and resolved minor discrepancies, the second author coded the remaining episodes.
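As a rough illustration of the agreement statistic reported above, Cohen’s kappa for two coders can be computed along the following lines. This is a minimal sketch only: it assumes the two coders’ annotations have already been aligned into comparable segments, the function name `cohen_kappa` is hypothetical, and the labels are invented for illustration and are not the study’s data.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same aligned segments."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of segments")
    n = len(coder_a)
    # Observed agreement: share of segments given the same label by both coders.
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: sum over labels of the product of the coders' marginal proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented labels using codes from the coding schema; NOT the study's data.
coder_1 = ["SL", "SL", "GDG", "GGAS", "PPS", "SL", "GDG", "GGAS", "PPS", "SL"]
coder_2 = ["SL", "SL", "GDG", "GGAS", "PPS", "SL", "GBG", "GGAS", "PPS", "SL"]
kappa = cohen_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this invented example the coders disagree on one of ten segments; the resulting value would then be read against the Landis and Koch (1977) benchmarks cited above.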
This study represents the language mode by spoken language and print. Print in the present study is regarded as an embodied mode, for it is employed to express thoughts (e.g. by creating a writing layout on the blackboard; Norris, 2004). Furthermore, the study focuses on the two teachers’ use of space through their positioning patterns and movement directionality in relation to their students, because teachers’ “use of space through positioning and movement is a significant semiotic resource for effective teaching” (Lim et al., 2012, p. 236). Teachers’ positioning pattern in the present study is divided into authoritative position (standing in front of or behind the teacher’s desk and in the front centre of the classroom), supervisory position (standing between the rows of the students’ desks without giving consultation), and interactional position (standing between the rows of the students’ desks to offer guidance), which correspond to the three types of space in the classroom proposed by Lim et al. (2012): authoritative space, supervisory space, and interactional space.
Before undertaking the multimodal interaction analysis, we first needed to establish a Tier system to organise annotation layers and map the full spectrum of communicative modes. In the present study, a Tier refers to a dedicated, hierarchical category in the ELAN software used to label distinct types of communicative modes. For instance, we created separate Tiers for “spoken language” (capturing verbal utterances), “gesture” (tracking manual movements), and “gaze” (recording eye contact shifts), with each Tier corresponding to one dimension of multimodal interaction. In this way, the ELAN annotation of communicative modes supplies an essential roadmap for the analysis. Specifically, based on the annotation statistics shown in Figures 1 and 2, we identified the combination pattern of communicative modes at each stage, with the aim of paving the way for analysing the strategies that emerge from multimodal choices (Jewitt et al., 2016). Then, we analysed the interplay of communicative modes in HLAs at each stage in terms of modal configuration, aiming to articulate how communicative modes work together to realise the scaffolding strategies of each teacher. Finally, drawing on the phonological loop and visuospatial sketchpad of Baddeley and Hitch’s (1974) working-memory model, we examined how, and to what degree, teachers’ scaffolding regulates the way learners process information in case studies. It should be noted that the study does not claim to measure actual working memory load; any inferences about cognitive mechanisms remain speculative and await validation through experimental methods.
Findings and Discussion
This section begins by outlining the descriptive findings on communicative modes, followed by an analysis, based on modal configuration, of how communicative modes work together to realise the scaffolding strategies of each teacher. Finally, we compare the two teachers’ scaffolding practices and tentatively, and necessarily speculatively, examine their influence on the information processing in working memory.
Combination of Communicative Modes into Multimodal Ensembles
Scaffolding in the two lessons unfolded sequentially through four stages: initiation, waiting, response and feedback. Generally speaking, the two teachers made full use of eight communicative modes, and there was no significant difference in the orchestration of modes at each stage of scaffolding in the two classes. However, they differed significantly in their performance of those communicative modes that are associated with the specific strategy to scaffold students’ problem-solving process, such as spoken language, print and movement (see Figure 3).

Orchestration of communicative modes in two teachers’ classes (See Tables 1, 2 and 3 in the Supplemental Material for the raw data).
Annotation statistics indicate that CT preferred to deliver instructional content to students through spoken language (annotation duration percentage, ADP = 30.8), while NT preferred print (ADP = 25.525+5). As for print, both teachers relied heavily on a computer-mediated PPT screen to present teaching content, since pictures, diagrams and videos can be easily embedded into a PPT presentation to help students understand the concepts or tasks in case studies, but NT (ADP = 5) spent much more time writing on the blackboard than CT (ADP = 1.4).
Regarding movement directionality, CT was observed to mainly move forward and backward in the front centre of the classroom (MFCF) at times (ADP = 12.3) since such a movement both highlights her authoritative space in the class and simultaneously facilitates interaction with the entire class. In contrast, NT displayed much more movement between the rows (MBR; ADP = 17), which is supposed to reduce interpersonal distance with her students and consequently facilitate their interaction.
The result in Figure 3 also shows that both teachers spent most of the lecture time in the authoritative position to teach and instruct. However, NT (ADP = 16.5) spent much more time in positioning herself between the rows of students’ desks (SP) than CT (ADP = 5.9). This result is consistent with the findings in the mode of movement.
Finally, CT made more gestural and head movements than NT, which is in line with the findings in the mode of spoken language.
Interplays of Communicative Modes to Realise Scaffolding Strategies
The scaffolding strategy resides in the integrated effects of the interplay of communicative modes (Kress et al., 2001). As such, the scaffolding strategy in the present study is the purpose of each HLA. Modal configuration “allows us to examine the hierarchical ordering of the LLAs that make up (or are produced by) an HLA” (Norris, 2019, p. 245; see Table 4).
Modal Configurations in the Multimodal Interactive Discourses of Two Teachers.
Note. NoA = number of annotations; AD = average duration; TAD = total annotation duration; ADP = annotation duration percentage.
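The four statistics defined in the note above could be derived from time-aligned annotations roughly as follows. This is a hedged sketch rather than the study’s actual procedure: the `Annotation` class and `tier_statistics` function are hypothetical names, the sample intervals are invented, and the denominator used for ADP (a reference duration supplied by the caller, such as the total duration of the analysed episodes) is an assumption, since the source does not state it.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    tier: str       # mode code from the coding schema, e.g. "SL" or "MBR"
    start_s: float  # annotation start time in seconds
    end_s: float    # annotation end time in seconds


def tier_statistics(annotations, reference_duration_s):
    """Return {tier: {"NoA", "AD", "TAD", "ADP"}} for a list of annotations."""
    if reference_duration_s <= 0:
        raise ValueError("reference duration must be positive")
    durations = {}
    for ann in annotations:
        durations.setdefault(ann.tier, []).append(ann.end_s - ann.start_s)
    stats = {}
    for tier, values in durations.items():
        tad = sum(values)  # TAD: total annotation duration
        stats[tier] = {
            "NoA": len(values),       # number of annotations
            "AD": tad / len(values),  # average duration per annotation
            "TAD": tad,
            # ASSUMPTION: ADP is TAD as a percentage of the reference duration.
            "ADP": 100 * tad / reference_duration_s,
        }
    return stats


# Invented intervals for illustration; NOT the study's data.
sample = [
    Annotation("SL", 0.0, 12.0),
    Annotation("SL", 20.0, 28.0),
    Annotation("MBR", 12.0, 20.0),
]
stats = tier_statistics(sample, reference_duration_s=40.0)
print(stats["SL"])
```

Keeping the reference duration as an explicit parameter makes the choice of denominator visible, so the same function can report percentages per stage, per episode, or per class.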
Scaffolding Strategies Revolved Around Discourse
According to Table 4, both teachers initiated all the scaffolding episodes with spoken language as the primary communicative mode, and a high volume of modal configurations involving print was employed alongside spoken language. This indicates that both teachers attempted to make full use of discourse by creating “message abundancy” (Gibbons, 2003, p. 259), “the notion of the message being received by the student in a variety of modes such as oral or written explanations or visual diagrams” (Sharpe, 2006, p. 3). According to Hammond and Gibbons (2005), teachers explicitly engage in message abundancy when selecting and sequencing tasks at the macro level of scaffolding, but we argue that message abundancy also occurs in their moment-by-moment interactions with students. In the present study, the oral questions, combined with the written explanations on the blackboard and visual diagrams in the PPT slides, served as instructional stimuli to convey the teaching content to students. CT spent considerably longer speaking during the initiation stage than NT (TAD = 367 and TAD = 300, respectively), although the difference in frequency was small (NoA = 48 and NoA = 45, respectively). Video observation shows that she spent most of her speaking time explaining and contextualising questions for students. The reason is that she mainly asked referential questions like “What is the general attitude reflected in literature toward business?” and “Do you think the general attitude is justified? Why?” According to CT, her questions were intended to create dynamic interaction with students and, more importantly, to help students gain comprehensible input and produce language output (Arifin, 2012). To this end, she used referential questions to “prompt students to comprehend and produce target language that reflects their own thinking” (McNeil, 2012, p. 396).
She assumed that these questions could be challenging for her students, since they lacked either the linguistic or the prior knowledge to understand or answer them (McNeil, 2012). Hence, she spent much time at the initiation stage explaining and contextualising the questions with the help of visual diagrams on the PPT slides (see Figure 4), making connections between students’ prior knowledge and new, unfamiliar information in the case so that they could think deeply about what they had read in the case and respond with logical evaluation in the target language. Even when waiting for students’ responses, she (NoA = 28, TAD = 144) spoke more often and for longer than NT (NoA = 20, TAD = 88). The oral utterances at the waiting stage, as CT reflected after class, were the contingent content support she provided to establish a cognitive focus for students to answer the questions, because she observed little reaction from the students.

Visual diagram on PPT slides in CT’s class.
Case studies in the International Business Negotiation Course tend to treat the comprehension of technical terms and jargon as the primary learning task. Consequently, NT preferred asking sequential questions that began with display questions like “How do four elements of supply chain services lift the service value of the Company?” These display questions explicitly engage students in the unfolding of the case background. Then, she coached students to further analyse the case with referential questions like “What are Company’s competitive advantages in the negotiation?” In this respect, questioning in NT’s class was mainly intended to “help students to develop internal procedures that aid in deep processing of the case text” (Ciardiello, 1998, p. 212). This was confirmed by NT’s reflection on the class: “All the questions in my class form a circle for instruction, leading students from superficial responses to deep discussion of the negotiation issues embodied in the case.” In line with her questioning strategies, NT emphasised the visual layout of the PPT presentation and the writing on the blackboard (see Figures 5 and 6) in order to direct students’ attention to the concepts (e.g. competitive advantage) that were semantically related to what had been elicited earlier and written on the board (e.g. LITF). Moreover, she (NoA = 45, TAD = 431) made more and longer movements between the rows at the waiting stage than CT (NoA = 40, TAD = 351). Unlike CT, who focused on spoken language and gaze (i.e. looking around the classroom), she tended to encourage immediate responses from students by moving rather than just speaking, in order to check students’ level of understanding in a timely manner.

NT’s PPT slide presentation.

NT’s writing on the blackboard.
When giving feedback, both teachers acted very responsively through smiling, nodding, repeating students’ responses, and writing students’ answers on the blackboard. Spoken language still accounted for the highest frequency; however, NT invested much more time in the modes of print (TAD = 364) and movement (TAD = 142) than CT (TAD = 213 and TAD = 64, respectively). The orchestration of these modes indicates two different strategies that the two teachers adopted in the feedback stage: NT focused on elaborating the responses, while CT revoiced the responses more frequently. After listening to a student’s response, NT repeated it first and then prompted the student to amplify it and reflect aloud on their thinking. For example, when students answered the question “What are the Company’s competitive advantages in the negotiation?” with the word “experience,” NT found their meaning not clear enough and stepped forward, asking a further question: “What kind of experience?” Students quickly responded with “professional” and NT wrote “professional experience” on the blackboard (see Figure 6). In this way, NT invited students to improve their responses instead of merely evaluating the answers, through which she “supported students in absorbing new information into existing schema as they work within their ZPD to gain new understanding” (Sharpe, 2006, p. 12). When reflecting after class, she attributed her elaborating strategy to her focus on leading students toward self-discovery in the problem-solving process. To facilitate the elaboration process, she moved toward the student who was answering the question in order to trigger more turns of exchange.
CT, on the other hand, repeated the response and reformulated it into academic language. For instance, when students were encouraged to answer the question “Do you think the general attitude is justified? Why,” one of them stood up with the answer “No, it is not justified because I think it is not so objective and some of the business is not so bad.” After listening, CT quickly reacted: “It is not objective, right? (followed by the student’s response “YEAH”) This kind of view is not objective since not all of businessmen are profit-oriented, right?” She acknowledged the answer first and relexicalised the student’s everyday words (e.g. bad) into more registrally appropriate ones (e.g. profit-oriented). In this way, CT’s revoicing built on what the student had contributed and was consequently semantically contingent upon it. In her view, semantically contingent speech can facilitate the development of her students’ English proficiency (Gibbons, 2003). To ensure that all students understood her, she looked around the classroom, repeating the word “profit-oriented” several times.
Scaffolding Strategies Targeting Metacognition
In case-based learning, metacognition is critical because students need to become aware of and evaluate their cognitive processes in solving problems (Çakiroglu & Betül, 2023; Zumbach et al., 2020). NT adopted more process prompts than CT to encourage students to know more about their own thinking. For example, after a student answered the question “How do four elements of supply chain services lift the service value of the company?,” she stepped forward and reacted with the prompt “Based on your answer, I have a second question. Do you think the supply chain services are one of the leverages of the company? Why?” As a synonym of “value,” the word “leverage” prompted the student to assess her own thinking, understanding, and perspectives in the previous answer, helping her monitor her ongoing thoughts.
In addition to spoken language, the two teachers, especially NT, paid much attention to the function of the print mode in conveying metacognitive knowledge. NT spent a long time (TAD = 498) on the print mode at the initial stage although she used it much less frequently than CT (NoA = 12 and NoA = 16, respectively). Video observation revealed that she particularly emphasised the layout of the PPT presentation and the writing on the blackboard, through which she visualised the way to identify the negotiation leverage (see Figures 5 and 6). In view of this, print, especially writing on the blackboard, was employed in NT’s class with the function of modelling aimed at scaffolding “the metacognitive activities of students together with the intention of direction maintenance” (van de Pol et al., 2010, p. 277). The modelling strategy with the print mode was especially obvious at her feedback stage (see Figure 6), where she showed the same frequency (NoA = 10) as CT but with much longer duration (TAD = 364 and TAD = 213, respectively). Figure 6 indicates that NT’s writing on the blackboard involved communicative exchanges between her and her students, through which she co-constructed with her students a cognitive schema to solve the problem by setting an example to illustrate how to read between the lines in the case. As NT reflected after class, her layout of writing on the blackboard was supposed to lead students to think better about the key information in the case and give them an example to follow when they needed to identify the negotiation leverage.
CT, on the other hand, used PPT slides frequently to suggest meta-strategies to help students solve problems. She used slides to project complicated process diagrams for students to use in analysing the case. For example, the diagram in Figure 4, as a visual support at the initial stage, recommended a cognitive schema for students to figure out the perspectives (e.g. economy, people, spiritual) from which to judge whether the author’s attitude is justified or not. PPT slides also played a significant role in CT’s “repair contingency” (Hammond & Gibbons, 2005, p. 26) in supporting students’ metacognition. Repair contingency refers to a situation in which teachers undertake repair work when they find students unable to master an important concept (Hammond & Gibbons, 2005). To CT, PPT slides can help students revisit the diagrams (Figure 4) whenever needed in class (Hammond, 2014). Accordingly, she presented and explained Figure 4 again when she found the students’ answers unsatisfactory. She also wrote the key information drawn from the case on the blackboard (e.g. The Enclosure Movement), making sure that students understood what she said.
Contribution of Scaffolding Practice to Information Processing in Working Memory
The above analysis shows that the two teachers emphasised the initial and feedback stages, where they both implemented scaffolding strategies to reduce the cognitive load on students. However, a detailed description of the interplay of communicative modes reveals that their distinct scaffolding practices could exert different influences on the way that students’ working memory handles new information.
At the initial and feedback stages, NT and CT both made full use of semiotic systems to create message abundancy and thus input auditory (spoken language) and visual (diagrams in PPT slides and writing on the blackboard) information to students’ working memory, especially when they provided a cognitive schema for problem solving. However, their approaches to the verbal and visual modes when implementing scaffolding strategies determined, to some extent, whether the information embodied in the modes would be maintained or lost in working memory. Take the cognitive schemas in Figures 4 and 6 as examples. CT constructed the schema first and then presented it (see Figure 4) on the PPT screen. To direct students’ attention to the visual information in the schema, she pointed to the diagram consistently. In this way, the attended visual information entered students’ working memory. To facilitate working memory processing, CT used a great amount of spoken language to provide contextual or explanatory information about the diagram. However, auditory and visual information are processed by separate systems within working memory, and these systems interfere with each other (Loomis et al., 2012). Hence, there are costs in information processing when students shift their attention from information of one modality to information of a different modality, which imposes, at least in theory, a greater load on working memory, and working memory in turn will lose some information. In the present study, the unsatisfactory response from CT’s student lends support to this explanation, although the support is necessarily speculative and awaits empirical testing. It indicates that the schema information in the diagram was incompletely stored in working memory and some descriptive traits were inadvertently ignored (e.g. dimension of “economy”). This finding is consistent with CT’s “repair contingency” at the feedback stage.
In contrast to CT, NT co-constructed the schema with students (see Figure 6) by “grouping relevant information into a coherent mental representation” (van Nooijen et al., 2024, p. 11). While NT was writing information on the blackboard, she had a rich dialogic exchange with students. Consequently, the visual and verbal modes would share working memory and the congruent information presented simultaneously would be processed faster and more accurately and accordingly be maintained longer in working memory (Maezawa & Kawahara, 2021; van Nooijen et al., 2024). In the present study, NT’s verbal communication along with writing on the blackboard enabled her to move through the visual information in a step-by-step manner, which may have allowed students to follow the schema construction process and consequently have adequate time to actively process the information on the blackboard. As a result, the schema information would be stored in her students’ working memory and move to their long-term memory. Video observation reveals that students performed well in the following discussion and presentation tasks.
Besides processing both auditory and visual information, students’ working memory also responded to verbal cues from teachers (van Nooijen et al., 2024). NT adopted a sequence of questions that required students to make intentional efforts to identify the information in case studies, which helped them reduce the information they attended to. Consequently, the amount of information entering their working memory would be controlled and cognitive load may have been at least partially reduced. Whereas NT reduced the information entering students’ working memory, CT scaffolded after the information had entered working memory. After asking referential questions that caused high cognitive load on students, she spent much time explaining and contextualising the questions to students. In this way, prior knowledge was retrieved from students’ long-term memory and information that was linked to the prior knowledge would be stored within their working memory. In view of this, the effect of CT’s scaffolding on working memory processing depends on the level of students’ prior knowledge. Unfortunately, their responses and CT’s repair contingency both indicate that their prior knowledge could be relatively low. One point worth noting is that the current study does not deeply explore how students responded to teachers’ scaffolding practice although such responses are “often a result of the meanings they perceive from their teachers’ embodied semiosis” (Lim, 2021, p. 2). This is because the primary focus of the study was to articulate how the two teachers implemented distinct scaffolding strategies through the interplay of communicative modes. Consequently, the findings chiefly represent teachers’ perspectives, which may be subject to social-desirability bias and cannot fully capture students’ cognitive and emotional experiences.
Future research could build upon this work by exploring how these scaffolding strategies engage students to participate in the process of problem-solving in their case-based learning.
Conclusion
By tracking two business English teachers’ scaffolding performance during the progression of problem-solving, the present study spotlights how interactional scaffolding is differentially deployed in case-based teaching. Notably, the success of case-based teaching largely depends on teachers, who are supposed to act as facilitators transferring the responsibility of learning over to students (Esteban & Cañado, 2004), a principle that transcends subject, educational, and cultural boundaries. For example, while teacher-student interaction patterns may vary culturally (e.g. more collective participation in collectivist contexts vs. individual contributions in individualist settings), the core logic of “adaptive multimodal scaffolding” remains applicable: teachers can adjust communicative modes to align with cultural interaction patterns while consistently emphasising the autonomy of students in the process of problem-solving. In view of this, teachers, regardless of their field of subject expertise, educational setting (e.g. K–12 and higher education) and cultural context, can benefit from this paper’s detailed articulation of how communicative modes may be effectively orchestrated when they scaffold students’ problem-solving process in case-based teaching.
The two teachers’ orchestrations of communicative modes are supposed to reduce the cognitive load by facilitating the information processing in students’ working memory. However, the considerable time CT spent on repair work at the feedback stage indicates that she was less aware than NT of the significance of the appropriate choice and coordinated use of communicative modes when she incorporated scaffolding to support students’ problem-solving process. Her repair work resulted, to a great extent, from the incomplete storage of the cognitive schema (see Figure 4) in students’ working memory. At the initial stage, she presented the cognitive schema intact on the PPT screen and discussed it immediately. This way of combining verbal and visual modes did not actually help students remember and understand the schema, which they needed to categorise the problem, choose the correct procedures to apply and regulate problem solving. In contrast, NT’s combination of verbal and non-verbal modes provided a larger space for her and her students to co-regulate problem solving where she assisted her students and her students performed with her assistance (W. Li & Zou, 2021). In this respect, CT is advised to use the animation feature of PPT to present the schema in a step-by-step manner (Armour et al., 2016) while explaining its animated components simultaneously. In this way, she can both control the information entering students’ working memory and produce congruent information from verbal and visual modes. From a cognitive load perspective, carefully calibrating teacher scaffolding is essential to keep students’ working memory from becoming overloaded (J. Li, 2025).
Hence, a pedagogical implication of this study for teachers, although it is limited by a small sample size, is the need to become aware that the effect of scaffolding is dependent not only on their choices of modes but also on “the inter-semiotic modal relationship established among their selected verbal and non-verbal modes” (Morell, 2018, p. 78). In this sense, the findings of this study may also be used by teacher educators to train pre-service and in-service teachers in the combined use of communicative modes.
The findings also show that the effect of scaffolding on students’ working memory processing is determined not only by the combination of communicative modes but also by other influential factors such as students’ prior knowledge. In this respect, the present study can only reach tentative findings because of its main methodological limitation: it did not include any student-perceived data. Further research is needed that analyses in detail how student engagement evolves throughout a whole scaffolding process within the constraints of curricula, timetables and available venues.
Supplemental Material
Supplemental material, sj-docx-1-sgo-10.1177_21582440251410652 for Scaffolding Students’ Problem-Solving Process in Case-Based Teaching: A Multimodal Analysis by Yun Jiang and Zheng Wang in SAGE Open
Footnotes
Ethical Considerations
The studies involving human participants were reviewed and approved by the institutional review board of the School of Foreign Languages, Jiangxia University (Certificate number: 2024003).
Consent to Participate
Informed consent was obtained from the participants. The authors report that this is original research and that no AI technology has been used in writing the paper.
Author Contributions
Yun Jiang and Zheng Wang shared equal contributions.
Both authors contributed to the article and approved the submitted version.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Fujian Federation of Humanities and Social Sciences Circles-Project of Linguistics under Grant number FJ2021B098.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data in this study will be made available on request.
Supplemental Material
Supplemental material for this article is available online.
References
