Abstract
This study explores how young bilingual (Swedish-Chinese) children (ages 3–5) exercise their agency in heritage language learning. The study draws on a multimodal interactional approach to video-ethnographic data from classroom interactions at a Chinese weekend school in Sweden, and the analysis combines Goffman’s concept of footing with Bakhtin’s approach to heteroglossia and multivoicedness. Particular focus is on how the children use reported speech, along with code-switching and embodied actions, to incorporate voices of absent parties (family members) in classroom discourse in order to assert epistemic and moral authority while strengthening peer alignments. Such multivocal, multilingual, and multimodal practices imply shifts in footing and participation frameworks from teacher-led to child-oriented. Simultaneously, the children’s heteroglossic practices orient to monolingual norms of Chinese as the preferred classroom language. The findings contribute to an understanding of young children’s collective agency in transforming heritage language learning situations into dialogically co-constructed heteroglossic practices.
Keywords
Introduction
In today’s globalized societies, a multitude of children are growing up in transnational families in which multilingualism and translanguaging are integral to their everyday life (cf. Blackledge and Creese, 2014; King and Lanza, 2019). These families often engage in conscious efforts to support their children’s bi/multilingual development and heritage language (HL) maintenance (He, 2013; King, 2016; Lanza and Wei, 2016). Language socialization studies have shown, for example, how bi/multilingual family members engage in everyday HL practices, in which competing ideologies around languages are shaped by family, peer, and institutional norms (cf. He, 2011; Ogiermann, 2013; Smith-Christmas, 2019). These studies underscore children’s agency in transforming language norms, demonstrating “the agency of language learners – their capacity for creativity, resistance, and even subversion” (Garrett, 2007: 235). However, to date limited attention has been paid to how young multilingual children engage with learning an HL outside the home in educational activities within a (pre)school environment (cf. Cekaite and Evaldsson, 2019).
This study addresses this gap by exploring how young children (3–5 years old) with bilingual (Swedish-Chinese) family backgrounds exercise their agency in a Chinese HL classroom for immigrants and their descendants (cf. He, 2013). We will show how these children use a wide range of semiotic resources, including different language varieties (Swedish and Chinese), discourses, and modalities (talk and embodied actions) to organize their classroom participation. Combining a multimodal interactional approach to Goffman’s notion of footing and participation (Goodwin, 2006; Goodwin and Goodwin, 2004) with Bakhtin’s (1981, 1986) dialogical approach to heteroglossia, we explore key aspects of micro-interactional and macro processes of young bilingual children’s participation and language learning in a multilingual setting (cf. Kyratzis and de León, 2019). Particular attention is paid to the ways in which the children use reported speech, referring to absent parties (family members), to establish forms of footing and alignments with voices, registers, and discourses from outside, in the classroom and hence with each other. Such heteroglossic practices, located and enacted within micro-interactional processes, are in turn linked to macro-level social tension and values of languages. We use Bakhtin’s notion of heteroglossia to further highlight how young children in interaction with others (here a teacher and peers) orient to the sociocultural meanings indexed by various language varieties and voices, rendering their own commentary on normatively defined codes and dominant ideologies, advocated and instantiated by adults (Kyratzis et al., 2010).
In the context of an HL school, young children with a complex bi/multilingual repertoire encounter an educational institution traditionally informed by a monolingual ideology, prioritizing a standardized language (Blackledge and Creese, 2014). In line with previous sociolinguistic research, we will show how even teachers working in monolingual settings (Jaspers, 2024) valorize children’s linguistic diversity as legitimate or complementary ways of expression, without losing sight of the socially valued HL. In this study, we will further explore how HL situations are transformed by young children into heteroglossic practices, in which multiple voices, discourses, and language varieties become crucial resources in the children’s HL learning.
We first develop our theoretical approach to Bakhtin’s (1981, 1986) concept of heteroglossia, embracing the multifaceted linguistic diversity and multivoicedness that in his view is inherent to living languages (cf. Blackledge and Creese, 2014). The empirical part draws on video-ethnographic research at a heritage Chinese language school with a pedagogy that approaches children’s heterogeneity as an important resource in the classroom context. Drawing on Goffman’s notion of footing as developed by Goodwin (2006; Goodwin and Goodwin, 2004), we use a multimodal interactional approach to explore how young children use reported speech, along with code-switching and embodied actions, involving voices, language varieties, and experiences from the outside as locally relevant resources in an HL classroom.
Dialogic approaches to heteroglossia and multivoicedness
Bakhtin’s (1981, 1986) concept of heteroglossia entails a dialogical approach to linguistic diversity and creativity that draws on the social, political, and cultural implications and tensions of language use in practice (cf. Bailey, 2007). Instead of conceptualizing languages as separate and stable units, it highlights the dialogical relationship between “processes of centralization and decentralization, of unification and disunification” (Bakhtin, 1981: 421; cf. Blackledge and Creese, 2014). The dynamics of heteroglossia entail social tensions between “centripetal” forces pulling toward a unitary standard language and the oppositional pull of “centrifugal” forces toward linguistic diversity (Bakhtin, 1981: 271–272).
In this study we use language socialization theory (Ochs and Schieffelin, 2011) as an approach to children’s peer language socialization (Goodwin and Kyratzis, 2011), combined with Goffman’s (1979, 1981) concept of footing, to explore how children in multilingual settings use patterns of code choice and code-switching to negotiate social alignments while appropriating diverse ways of speaking, which, according to Bakhtin (1981), are in tension with one another (cf. Kyratzis et al., 2010; Kyratzis and de León, 2019). In Goffman’s terms: “A change in footing implies a change in the alignment we take up to ourselves and the others present as expressed in the way we manage the production or reception of an utterance” (Goffman, 1981: 128). Code-switching can be used to negotiate shifts in alignment, “footing” (Goffman, 1979), or “participation frameworks” (Cromdal and Aronsson, 2000; Kyratzis et al., 2009), and “production format” (Cromdal and Aronsson, 2000). Yet a shift in footing also entails a larger range of phenomena, such as performed “stances”; that is, how participants “evaluate objects, position subjects (themselves and others), and align with other subjects, with respect to any salient dimension of the sociocultural field” (Du Bois, 2007: 173; Goodwin and Goodwin, 2004).
Bakhtin’s dialogical approach to heteroglossia not only implies an acknowledgment of linguistic diversity and pressure toward uniformity but also entails a commitment to the multivoicedness [raznogolosie] of language (1981: 263). The dynamics of dialogism are especially present in reported speech, in which one voice (the reporting voice) reports the utterance of another (the reported voice) (Volosinov, 1973). In this study, we use Goffman’s (1981) notion of footing to explore the layering of voices and structural embodied organization of the participation roles within reported speech, including the alignments participants take up toward the voices performed through speech, and hence with each other, in real-life encounters (Goodwin, 2006; Holt and Clift, 2006).
We will show how children use reported speech accompanied by embodied actions to orchestrate shifts in footing and participation frameworks in which voices of absent family members are dynamically oriented to in classroom interaction in order to gain speakership and display authority. In such instances, the children also draw upon their bilingual resources (Swedish and Chinese) to create shifts in footing and participation frameworks. Goodwin’s (2006) concept of “interactive footing” offers an important framework for studying how both talk and gestures as well as embodied actions are mutually performed in strips of reported speech by different actors (children and teacher) to take stances and display alignments toward actions of both present and absent parties.
Importantly, the young children in our study bring in experiences, language varieties, and norms from the outside (home) that contribute to any topic they are interested in, positioning themselves as active and knowing participants in an HL classroom. This openness to the “outside world” brings into the classroom different voices, registers, and discourses linked to social spheres, language practices, and experiences. Taken together, we will explore how young bilingual children exercise their collective agency in an HL classroom in ways that contribute to establishing the relevance and meaning of HL learning within their transnational community.
Research on children’s language creativity in multilingual settings
Prior research has highlighted the importance of children’s agency in shaping language learning and educational practices and policies at bilingual preschools (cf. Anatoli, 2024; Burdelski, 2010; Cekaite and Evaldsson, 2008, 2017; Puskás and Björk-Willén, 2017; Schwartz et al., 2022). For instance, in a comparative study Schwartz et al. (2022) found that preschool children actively create opportunities for second language (L2) learning by asking questions, suggesting ideas, challenging assertions, and resisting directives (see also Sairanen et al., 2022). These findings illustrate children’s ability to influence discourse and redefine participation norms and classroom structures. Waring (2011), in a study of English as a second language (ESL), further demonstrates that unsolicited learner initiatives, such as self-selecting turns and initiating new sequences, provide children with speakership in ways that shift classroom participation from teacher-centered to co-constructed spaces. Similarly, Wong (2023) examines how a young ESL learner uses multimodal and multilingual resources to mediate her willingness to enter classroom discourse by using L2 (see also Anatoli, 2024; Cekaite, 2017). These studies shed light on how multimodal resources accompanying the first language (L1) are essential for an L2 learner to sustain engagement in translingual environments.
From anthropological perspectives, Goffman’s notions of participation and participation frameworks have proven foundational in understanding how spatial and cultural configurations in multilingual educational settings afford or constrain children’s engagement (Erickson, 1982; Ochs et al., 2005). Classrooms in particular function as multiparty participation frameworks, with students not only positioned by teachers but also actively positioning themselves vis-à-vis peers and adults (Burdelski and Howard, 2020). Participation, then, is not simply about learning to use a particular language but is rather a socially and interactionally organized process in which students actively take part in positioning themselves and others, aligning with or resisting social norms, while navigating classroom hierarchies and language choices (Baquedano-López et al., 2005; Waring, 2011).
In bi/multilingual educational settings, creative language practices such as language play and repetition are key strategies to enhance participation and engagement (cf. Cekaite and Aronsson, 2004; Cekaite and Evaldsson, 2019). Teachers, for example, use repetition and revoicing to scaffold language learning, providing corrective feedback, clarification, and reinforcement on linguistic structures (Walsh, 2011; see also Baquedano-López et al., 2005). Beyond its role in structured language learning, the act of repetition and recycling constitutes a key aspect in children’s peer language socialization (Goodwin and Kyratzis, 2011). Duranti and Black (2011) similarly show that improvisation including repetition is a fundamental mechanism in language socialization, allowing children both to creatively exploit linguistic forms and to reproduce cultural practices. For example, Cekaite and Evaldsson (2019) highlight how young immigrant children (Swedish-Kurdish) in a multilingual preschool setting engage in multilingual peer play in which they exploit HL forms (their linguistic features, social values, and pragmatic uses) in ways that transgress boundaries between different language varieties. The children’s ludic heteroglossic practices, located and enacted within micro-interactional processes, are in turn linked to macro-level sociocultural values and tensions between languages. Similarly, Kyratzis (2010) shows how Mexican heritage children in a Spanish-English bilingual preschool moved in and out of play spaces, shifting voice (teacher talk, reading speech) and code to organize their participation and negotiate power asymmetries. The children used English to enact authoritative school voices, reflecting teachers’ practices, but retained Spanish for important changes in footing (Goffman, 1981).
These studies demonstrate how children, often in creative ways, exercise their agency by rendering their own commentaries on normatively defined codes and dominant ideologies, frequently advocated and instantiated by adults.
Our study contributes to prior research by exploring how young bilingual children engage with and navigate (predominantly monolingual) participation frameworks in a preschool HL learning setting. It shows how children position themselves as active and ratified bilingual speakers, rather than passive recipients of HL instruction, by creatively drawing on multiple linguistic, embodied, and interactional resources to shape their classroom participation.
Video-ethnographic design
The data presented here derive from an ethnographic study by the first author, based on video recordings (total 50 hours) of bi/multilingual children’s HL learning and socialization at a weekend language school. Participating children met the following selection criteria: (1) aged 3–5 years; (2) Swedish as their first language and Chinese as their HL; (3) regular attendees of the language school. The research was approved by the Swedish Ethical Review Authority (project number: 2023-03955-01).
The analysis is based on video-recorded classroom interactions between the children and their educator at the language school, but it is informed by ethnographic knowledge from interviews with the children’s parents and participant observations (field notes). Each class session lasted 90 minutes and was recorded using a single camera placed in a corner of the classroom to minimize disruption.
The heritage language school
This study was conducted at a community-based Chinese HL school located in a Swedish urban center. Adopting a Montessori-inspired pedagogical model, the school officially promotes a monolingual Chinese educational and cultural ideology, with instruction and assessments based on monolingual language standards (cf. Jaspers, 2024) to ensure children’s acquisition of their HL (cf. He, 2010). However, in practice, children alternate between Swedish and Chinese to engage with classroom activities. The teacher, a native Chinese speaker fluent in Swedish, supports this bilingual meaning-making while maintaining an overall focus on HL development.
Aligned with Sweden’s national policy on HL (also known as mother tongue, “modersmål”), children at preschools and schools are encouraged and entitled to maintain their HL alongside Swedish to foster linguistic diversity and cultural inclusion (Skolverket, 2018). The language school in this study, which is a private alternative, closely follows the national preschool curriculum, supporting linguistic diversities along with children’s agency. The Swedish preschool curriculum underscores children’s participation and agency in educational activities, recognizing children’s rights both to exercise their agency and autonomy and to contribute to educational activities. The policy is reflected in teachers’ local understandings of children’s classroom participation as collaborative, multilingual, and multimodal, positioning bilingual children as agentive participants in their HL learning.
The classroom and the participants
The primary participants are seven preschool children aged 3–5 years. All participants are of Chinese heritage and were born in Sweden to families with diverse socioeconomic backgrounds, who expect their children to learn Chinese. All the children had already begun preschool in Sweden, where Swedish is the primary language of instruction, and their exposure to Chinese, their HL, varied significantly depending on the family language practices:
The variation in home language practices contributed to differing levels of Chinese proficiency among the children, which in turn affected their participation and engagement in the HL classroom interactions.
In this immersion setting, the children engage in a variety of pre-designed classroom activities (e.g. teacher-led picture book reading, storytelling, drawing, dancing). The learning objectives set by the educators are aligned with the parents’ expectations, focusing primarily on developing the children’s speaking and listening skills in their HL. Each semester, a few themes are selected and revisited throughout the sessions. Activities are conducted mainly in Chinese, although Swedish is occasionally used to facilitate communication. The children and teacher typically sit in a circle during activities (see Figure 1). This spatial arrangement is common at Swedish preschools and serves as an interactional space for fostering inclusivity and encouraging the children’s participation.

Figure 1. Classroom setting.
For the teacher, this spatial small group arrangement of participants/bodies simplifies monitoring and managing the group, allowing for direct engagement with each individual child while maintaining attention on the collective dynamics of the group. The analyses will show how this bodily arrangement also allows children to be attentive to each other’s contributions and engage in supporting one another’s HL learning.
Methodological approach
This study adopts a video-ethnographic methodology informed by two complementary approaches, Language Socialization Theory (Ochs and Schieffelin, 2011) and Multimodal Conversation/Interaction Analysis (Goodwin, 2018), which both involve contextual embeddings of children’s language practices. The multimodal analytical framework is grounded in Goffman’s (1981) concepts of footing and participation as developed by Goodwin and Goodwin (2004). Expanding on this, we use Multimodal Conversation Analysis (Goodwin, 2018; Mondada, 2009) to account for the temporal and embodied dimensions of participation in interaction. Participation is understood as a fluid and evolving process, in which speakers and hearers dynamically shift roles and alignments in moments of interaction in relation to each other and to the broader activity at hand. This underscores the need to analyze embodied shifts in footing and how positions are continuously orchestrated and reconstructed in interaction in a processual manner through assembling embodied practices including speech, body positioning, gestures, facial expressions, gaze, or prosody (Goodwin, 2006, 2018). Social actions are analyzed as sequentially, temporally, and spatially organized phenomena (Goodwin, 2018; Mondada, 2018).
The data were transcribed using conventions from Multimodal Conversation Analysis (Jefferson, 2004; Mondada, 2018). The transcripts include filtered still images (#) extracted from the recordings to highlight multimodal configurations, such as affective stance-taking, body orientation, and embodied interaction (Goodwin, 2018). All transcription symbols are listed in Appendix 1. To protect participant anonymity, pseudonyms are used, and black-and-white filters are applied to all images to obscure facial features.
Exercising bilingual agency in a heritage language class
The following sections present analytical findings on how young bilingual children exercise their agency and shape their participation within a Chinese HL classroom. Across three episodes (six excerpts), we will show how the children refer to absent family members through reported speech to assert their own stances while strengthening peer alignments in classroom discourse. Combined, these excerpts highlight the multivocal, multilingual, and multimodal character of children’s collective agency in an HL classroom. Particular focus is on how the children, in strips of reported speech, shape the participation framework in the classroom through collaboratively performed actions that integrate personal, social, and pedagogical dimensions of HL learning.
Gaining speakership through reported speech of absent family members
Excerpts 1a and 1b focus on how the children gain speakership in the classroom, by referring to absent parties from the outside (cf. Moore, 2014). We will show how their use of reported speech not only shapes their agency but also reconfigures the participation framework from teacher-led to child-oriented. We explore the dialogical character (Bakhtin, 1981; Goodwin, 2006) of reported speech and (1) how the voices of absent parties are not only heard but become active parties in the classroom, as well as (2) how the multiplicity of voices opens up interactional spaces for other parties (teacher and peers) as well to align with and build upon the contributions, transforming the interaction into a multivocal, multilingual, and multimodal co-constructed discourse (Goodwin, 2006).
Excerpt 1a. My dad says that I am the little monkey
The first excerpt begins with the teacher introducing an animal-naming task in Chinese. In presenting the character cards, the teacher establishes an interactive participation framework (Goodwin, 2006) through her embodied and spatial orientation, which invites the children to engage in HL learning.
One of the children, Ines, immediately aligns verbally and affectively with the teacher’s language task, repeating the introduced topic of “little monkeys” in Chinese (line 2). Building on this alignment, Lin self-selects and seizes the opportunity to expand the interaction by drawing upon a familial voice (her father’s). Her use of reported speech, “my dad
Of interest here is also how Lin’s fluid code-switching into Swedish (“säger att”) embedded within the Chinese utterance draws on bilingual resourcefulness (Cromdal and Aronsson, 2000). The code-switching creates a contrast that functions as a discourse marker, segmenting the reported speech “says that” from her own report of her father’s statement (Goffman, 1981), allowing the speaker (here Lin) to foreground her voice and secure speakership. As Goffman’s (1979, 1981) concept of footing suggests, shifting between languages also enables shifts in perspectives and forms of involvement. As a contextualization cue (Gumperz, 1982), code-switching is an important interactional resource for maintaining conversational cohesion and for displaying alignments with co-participants and their talk (Kyratzis and de León, 2019).
Instead of requiring Lin to speak Chinese, the teacher subtly shifts language and introduces the Chinese equivalent in line 5, aligning with the production format in Lin’s bilingual (Swedish-Chinese) utterance. Thereby, the teacher uses her bilingual competence to orient the children toward Chinese as the normatively valued classroom language. The shift in footing allows the teacher not only to re-establish Chinese as the interactional code but also to move forward with the classroom task and align with the HL teaching preference (cf. Kairat and Kyratzis, 2025).
Co-constructing multivoicedness through reported speech and peer-alignments
The next excerpt, 1b, will show how the dialogical organization of the reported speech launched by Lin (in Excerpt 1a) functions as a conversational resource in the classroom for her peers as well. In what follows, Lin’s use of reported speech, “my dad says that I am . . .”, is retrospectively oriented to by Zoe (line 8), who aligns with the production format and incorporates a multiplicity of voices within her own utterance:
Excerpt 1b. Lin’s dad says that
The production format in Zoe’s self-selection in line 8 orients retrospectively to the production format in Lin’s reported talk, “my dad says that I am . . .” (Excerpt 1a, line 3) (Goodwin and Goodwin, 1987). The recycling of the format allows Zoe to both (1) incorporate the talk of a peer into her own utterance and (2) integrate her own experiences from home into the ongoing classroom discourse. As shown in Figure 2, the multivoicedness of the reported talk functions as a conversational resource, to align not only with the verbal format but also with the voices and structural organization of the participation roles – namely, the Animator-Author-Principal character format described by Goffman (1981) that exists in the format of each participant’s utterance.

Figure 2. Production format.
In terms of the participation roles offered by Goffman, Lin starts out as the Animator and as the reporting voice of the utterance of another (the reported voice; Volosinov, 1973). At the same time Lin is also the Author, constructing the utterance, referring to her own experience from home of what “my dad says.” Meanwhile the Principal, whose voice and stance are represented in the talk, is “Lin’s dad” (Goffman, 1981: 144–145). Goffman’s notion of production format allows us to explore how young children (here Lin) manage to navigate complex participant roles as Animator and Author to both represent experiences and display their own stances. Through reported speech, Lin is not merely reproducing her father’s voice but also using his words to exercise her agency and to position herself authoritatively within the classroom discourse. In doing so, she demonstrates epistemic access to home-based knowledge and projects her identity as a competent bilingual participant.
Zoe then ties to the format of Lin’s utterance to gain speakership and exercise her agency by integrating her own family experience into the classroom discourse, while reinforcing peer-to-peer engagement and affiliative alignments (Sairanen et al., 2022). Zoe’s word search (line 8) prompts the teacher to initiate a clarification (line 9) based on the current HL classroom task involving naming animals. The teacher-initiated repair sequence (Schegloff et al., 1977) momentarily shifts the narrative trajectory from a child-initiated agenda back to a teacher-led one. Of interest here is how Zoe corrects the teacher in line 10, rejecting the proposed candidate “fox” and realigning her statement with Lin’s original contribution (line 3). By repeating the word “monkey” in line 11 the teacher integrates Zoe’s contribution into the heritage learning as well; this shows how the children’s use of reported speech reinforces classroom discourse as a dynamic, collaboratively constructed process (Waring, 2011).
Invoking contrasting moral stances through reported speech
In the next excerpts (2a and 2b), we will show how the children negotiate and integrate contrasting moral perspectives through reported speech, making references to family members as moral authorities from outside. In Excerpt 2a, the teacher is reading a book titled “The Colorful Monster” to the children, introducing the yellow monster that represents joy (lines 1–2). The teacher’s enthusiastic repetition of “happy,” accompanied by expansive gestures, links candy to feelings of happiness (Goodwin, 2007). We will show how the affective stance taken by the teacher prompts the children to express both their emotional engagement and their moral evaluations. Our focus will be on how reported speech allows a child (Theo) to take a contrastive moral stance by referring to his experience from family life: “but my grandma, she said ‘no . . . cannot eat candy’” (line 4).
Excerpt 2a. My grandma said I cannot eat too much candy
The teacher’s positive affective stance is immediately acknowledged by Elisa and Leia (line 3), who align with her expansive gestures (line 2) by raising their arms, reinforcing a shared emotional orientation. It invites Theo to display a contrasting moral stance (line 4) in the format of reported speech, i.e. “my xx said that . . .”, through a self-selection. Like the peers’ self-selected initiatives in Excerpt 1, Theo invokes an absent family member (his grandmother) as an authority figure in the classroom. By animating her voice, Theo legitimates his oppositional stance claim, thereby shifting the footing (Goffman, 1981) from a teacher-directed trajectory to a child-oriented one. Thus, through reported speech, Theo asserts epistemic authority (Heritage, 2012), positioning himself as a knowledgeable participant (K+) rather than a passive recipient of the teacher’s agenda. Simultaneously, by referencing his grandmother’s prohibition against candy he adopts a moral stance, emphasizing moral obligations and authority.
By referring to his grandmother’s moral authority, Theo also manages to display his moral agency in the classroom. At the same time, he negotiates boundaries between different normative frameworks, effectively integrating home values into the classroom discourse. In response, the teacher revoices the strict prohibition into a less absolute version of “not eating too much candy” (line 5). Her revoicing balances between acknowledging the child’s agency and contrastive moral stance of “not eating candy” while steering attention back to the HL classroom task.
Mobilizing authoritative discourses and peer alignments
In what follows, we will explore how the contrastive moral stance taken by Theo through the use of reported speech opens up a space for another child, Lin, to affectively align with and expand on the moral discourse inferred from home. Theo’s persistent assertion in line 7 is displayed through hesitation markers (“emm”) and self-repair, which indicates his careful negotiation between his own moral stance and the teacher’s modified suggestion. As will be shown, it reinforces not only his own stance but also peer participation:
Excerpt 2b. Teeth
Notably, Theo’s moral stance claims are noticed and co-opted (Goodwin, 2018) by a peer, Lin, who now establishes some footing or alignment with the moral discourse performed by Theo and hence between the two of them. By code-switching into Swedish, “
Alongside this verbal stance, Lin performs embodied actions that highlight her own commentaries on the moral consequences of what Theo’s grandmother has said about not eating candy. Lin orients her body and directs her gaze toward the teacher, possibly seeking recognition for her contrastive moral stance. At this point, Theo also momentarily shifts his gaze toward Lin (line 9), signaling his attentiveness and potential uptake of her positive alignment with this stance. The teacher formally acknowledges Lin’s expanded reasoning, shifting her own stance to a more excited and affirmative one through her enthusiastic reformulation (line 10) in the preferred target (heritage) language. Rather than explicitly correcting Lin’s language choice (Swedish), the teacher subtly reformulates Lin’s prior utterance into Chinese through an embedded correction. In doing this, she both affirms the child’s agency and bilingual skills while pursuing her orientation toward HL development and maintenance. After securing the teacher’s acknowledgment Lin again code-switches, this time back into Chinese (line 11), thereby aligning with the HL form as the preferred language.
Lin’s nuanced shifts in participation – from passive listener to active contributor – are accomplished through multilingual and embodied resources. Her accompanying embodied action – pointing at her teeth – reinforces joint attention (Goodwin, 2007), further enriching and clarifying her argument for both her peers and the teacher (line 11). The teacher integrates Lin’s multimodal contribution into the HL learning, explicitly mirroring her embodied gesture while elaborating on her point about dental health (line 12). Simultaneously, Lin’s embodied actions emphasize her stance through multiple modalities and help secure collective attention.
Through their mutual embodied and verbal contributions, Theo and Lin show their agentive co-participation. Theo challenges the teacher’s framing by referring to an authority figure and invoking family norms to reposition himself within the participation framework, while Lin aligns with both Theo’s reported speech and the teacher’s reformulation, integrating multiple linguistic and embodied resources that both sustain the interaction and align with the moral stances taken by the teacher and the other child. The shifting roles – from dissenting to aligning, from listener to speaker, from L2 to L1 – highlight how bilingual children socialize each other and co-construct knowledge through transforming classroom discourse into a site of negotiation between personal and institutional perspectives.
Collaborative storytelling through indirect reported speech
As shown in the previous excerpts, the children draw on reported speech to exercise their agency and to extend topics introduced by the teacher, integrating their lived experiences into classroom discourse. In Excerpt 3, a peripheral farewell moment (one child saying goodbye to her mother) serves as an environmental trigger for Theo to report about an incident involving his absent father (lines 5–13). Theo’s unsolicited use of indirect reported speech develops into a collaborative storytelling event that shifts the footing from teacher-directed to child-centered participation and serves to enhance Theo’s HL learning.
Excerpt 3a. My dad saw a boat
Theo’s report in line 5, “my dad saw . . .”, is a retelling of something he has likely heard from a family member in the form of indirect reported speech framed in his own words, rather than as direct reported speech. The multivoicedness (Bakhtin, 1981) of Theo’s utterance is layered with both his own voice and an absent family member’s prior experience. By referencing his father’s actions, Theo appropriates an adult’s register (Paugh, 2019), borrowing its authority to grant him entitlement in the classroom, thereby positioning himself as a key participant. Furthermore, the code-mixing of the Swedish “bå” (boat) with the Chinese “chuan” (boat) creates a hybrid utterance that “blurs the boundaries” between the language varieties and positions him as bilingual (Kyratzis, 2010: 579). His emergent bilingualism highlights how HL learners mobilize diverse linguistic repertoires in monolingual settings (Cekaite and Evaldsson, 2019). This bilingual resourcefulness allows Theo both to navigate linguistic affiliations and to reinforce his interactional agency in the HL classroom. His rapid self-correction from Swedish to Chinese reveals his local sensitivity to the situated meaning and value of the HL. Thus, in reporting about his father’s experiences, Theo shifts footing (Goffman, 1981) from passive observer to active participant, indirectly claiming epistemic authority and reshaping the ongoing classroom interaction. In doing so, he also displays his linguistic competence and local sensitivity to the norm of using the HL, Chinese, in the classroom.
Extending collaborative storytelling through multimodal participation
In the next excerpt, 3b, which directly follows Excerpt 3a, we explore how Theo’s use of indirect reported speech also provides interactional resources to collectively expand on one another’s utterances and embodied actions into a collaborative storytelling event. Such skillful expansions of sociolinguistic and embodied resources are carefully assembled within utterances and selectively mirrored over sequences of turns (Goodwin, 2006). First the teacher expands on Theo’s initial telling, about “the boat”, by associating it with the lesson’s topic (“like the moon”) (line 6). Her talk is accompanied by embodied gestures (rocking and digging motions) that project Theo’s continued telling and create a shared engagement among the children. The continued excerpt will show how Theo, as an experienced bilingual, uses his proficiency in Chinese to participate in a collaboratively performed HL storytelling event with the support of the teacher.
Excerpt 3b. My dad saw a boat
Theo begins by correcting the teacher’s embodied expansion and revoicing of his initial telling (line 7). Clarifying that the boat is “like an excavator” (not the moon), Theo maintains his father’s original telling. His use of an agglutinating syntactic structure (“xiang yige . . . watuji”) demonstrates his epistemic agency and bilingual proficiency in Chinese as his HL. Theo further reinforces the linguistic meaning of his utterance through an embodied gesture, performing a digging motion (see fig. excavator) that vividly illustrates and emphasizes the concept of the excavator. The embodied clarification also underscores how talk accompanied by gestures serves as a crucial meaning-making resource for reconfiguring the participation framework from teacher-led to child-oriented.
The teacher aligns with Theo’s verbal clarification, reformulating the Chinese word with an added causal connection (line 8) and seeking Theo’s confirmation. Her verbal clarification is accompanied by a digging embodied gesture that aligns with the dialogical organization of Theo’s telling. The mutually performed embodied response work shows how young children’s telling of personal experiences involves “co-participants as reflexive actors” (Goodwin, 2007: 28) who mutually monitor one another’s actions (cf. Evaldsson and Abreu Fernandes, 2019). After a short pause, Theo affirms the teacher’s voicing with a nod (line 10), followed by the teacher’s subsequent evaluative token “okay” (line 11), signaling her validation and readiness to proceed. In response, Theo incrementally expands his telling by animating the boat’s movement on water (line 12), performing wave-like motions with his body (see fig. water). The teacher mutually orients to and expands on the emerging embodied structure of the telling in progress (Goodwin, 2007; line 13), anchoring Theo’s telling within the HL learning event while facilitating continued joint attention through embodied alignment, topic expansion, and negotiation of personal experiences (Burdelski and Howard, 2020; Goodwin, 2007).
The spontaneously performed collaborative storytelling event shows (i) the child’s active participation and bilingual agency, and (ii) how a mutually supported relationship between children and teacher co-creates a bilingual learning environment that honors personal experience while (iii) supporting children’s agency in HL learning as a socially situated, dialogic process.
Concluding discussion
Taken together, our analysis shows how young bilingual children exercise agency in a Chinese HL classroom by creatively reconfiguring participation frameworks through reported speech, embodied actions, code-switching, and peer alignments. Drawing on Bakhtin’s (1981) concept of heteroglossia and Goffman’s (1974, 1981) notion of footing (cf. Goodwin, 2006), we demonstrated how children use multivocal, multilingual, and multimodal resources to position themselves as knowledgeable participants, integrating personal experiences and multiple voices into classroom discourse.
Across the excerpts, children’s participation frequently emerged through unsolicited initiatives (Waring, 2011), particularly in the form of reported speech. These multivoiced contributions show how HL learning and teaching take place in “a dialogic multidirectional way” (Busch, 2014: 14), in which roles are situationally negotiated. The children’s reports about their experiences from family life not only reposition them as powerful speakers but also show how young learners can assume a teaching role. For example, Lin’s spontaneous use of reported speech, “my dad says that I am. . .”, and Zoe’s format-tying in Excerpts 1a and 1b shift the participation framework from a teacher-led IRE sequence (Mehan, 1979) toward a child-initiated, dialogically oriented framework, marking a shift in the speaker’s footing – from that of a responsive participant to an active co-creator of classroom discourse (Goffman, 1981; Goodwin and Kyratzis, 2011).
By making references to personal experiences from the outside, the children bring in authoritative voices and discourses from home (e.g., “my dad says . . .”, “my grandma said . . .”) to legitimate and justify their own contributions, reframing the classroom discourse. By appropriating adult voices and moral discourse from home, the children mobilize adult registers (Paugh, 2019), reframing their personal storytelling in ways that allow them to inhabit more authoritative participation roles. In these heteroglossic practices (Bakhtin, 1981), Swedish and Chinese interact within single utterances. This code-switching serves as an interactional resource to enhance both the children’s agency and classroom participation. In Excerpts 1 and 2 Lin code-switches to report about her personal experiences from home, while in Excerpt 3 Theo blends languages, as part of a word-searching process, to report about an experience his father had. These multilingual utterances mark shifts in footing and move between different roles – whether addressing the teacher, aligning with peers, or drawing on home-based knowledge. These shifts underscore the heteroglossic nature of HL learning and how children use diverse linguistic and embodied resources to navigate in a multilingual classroom (Cromdal and Aronsson, 2000; Cekaite and Evaldsson, 2019).
However, in order to understand the social dynamics of heteroglossia, we need to note that normativity and pressure toward uniformity are also part of children’s heteroglossic language use in multilingual settings (cf. Blackledge and Creese, 2014; see also Evaldsson, 2025). While children actively shape participation through their heteroglossic contributions, the teacher scaffolds engagement through revoicing, expansions, and reformulations that support their use of HL. Importantly, the teacher’s interactional and embodied sensitivity (e.g., gestures, gaze, affective alignments) supports the children’s verbal and nonverbal engagements in HL learning, ensuring their mutual participation. Rather than enforcing a monolingual norm, the teacher uses embedded corrections to create a space for “translanguaging” (Wei, 2018), opening up possibilities for children’s spontaneously performed heteroglossic practices of code-switching and code-mixing between Swedish and Chinese to display a range of actions in the classroom (cf. Cromdal and Aronsson, 2000).
In sum, this study offers theoretical and empirical insights into young children’s collective agency and language learning in multilingual classroom settings, by combining Bakhtin’s (1981) approach to heteroglossia with Goffman’s (1981) notions of footing and participation (as developed by Goodwin, 2006). It shows that children’s embodied, affective, and moral engagement in HL learning is a collectively performed practice that is shaped not only by the teacher’s agenda and the educational one but also by the children’s own lived experiences, bilingual competencies, and multimodal performances. Through talk, gesture, gaze, and affective displays, children display and negotiate their rights to speak and to collaborate with peers, while also navigating institutional monolingual norms and stratified linguistic values linked to learning an HL within a bi/multilingual classroom. In so doing, children draw on their sociolinguistic repertoires in their play and learning, transforming the classroom into a dialogic and heteroglossic space for appropriating an HL. By recognizing and supporting young children’s spontaneously performed contributions, educators can create inclusive, responsive, and linguistically rich learning environments that empower these students to meaningfully engage in their HL learning journey.
Acknowledgements
We would like to express our appreciation to the children and the teachers who participated in this study, the editors of the Special Issue, and the anonymous reviewers for providing constructive feedback and suggestions for improving the paper.
Ethical considerations
The study was approved by the Swedish Ethical Review Authority (project number: 2023-03955-01), ensuring that appropriate measures were taken to protect participants’ rights, confidentiality, and well-being.
Consent to participate
Written consent forms were collected from participants and children’s guardians prior to the start of the study. Children and teachers were informed about the voluntary nature of their involvement in the research process and their right to withdraw at any time without consequences.
Consent for publication
Consent for publication was obtained within the consent for participation. The data are anonymized. All names are pseudonyms. Still images filtered in black and white are used to preserve the participants’ confidentiality.
Author contributions
Zejia Xu: data collection, transcription, analysis, literature review, writing, revisions.
Ann-Carita Evaldsson: conceptualization, analysis, literature review, writing, revisions.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
All data are confidential and cannot be shared.
