Abstract
This study applies natural language processing and qualitative classroom video analyses to examine classroom discourse. Guided by hybridity theory, which emphasizes the benefits of blending everyday with academic language practices for expanding students’ opportunities to engage with disciplinary ideas, our study systematically identifies how teachers’ and students’ discursive resources operate in science classrooms. The NLP (namely, LIWC and topic modeling) results indicated that traditional academic discourses were prevalent; however, hybrid discourses were also evidenced in the presence of social language and everyday topics, as well as high language coordination between teachers and students. The qualitative findings further illustrated how teachers facilitated hybrid discourse spaces through open-ended questions that prompt disciplinary connections to contemporary events and students’ out-of-school experiences. We discuss the implications for future research and partnership with educators in service of supporting hybrid discourse in science classrooms.
Introduction
Science discourse is at the heart of students’ science learning. By asking questions, sharing connections to lived experiences, and generating evidence-based explanations, students make sense of the natural world (Bae et al., 2021b; Colley & Windschitl, 2016; Ford & Wargo, 2012). Unfortunately, these rich science talk opportunities are often absent in middle school classrooms, particularly for students who have historically been underrepresented in science (Bae et al., 2021a; Morgan et al., 2016). Science discourse has been the subject of several education reforms and research initiatives (e.g., National K12 Framework for Science Education, NRC, 2012), and scholars underscore the role of spoken discourse, which is inextricably shaped by the social context in which it is produced and received, in fostering scientific understanding (Cavagnetto & Hand, 2012; Cazden, 2001; Windschitl et al., 2020).
Science discourse can create equitable opportunities for every student to have a voice in the curriculum. Unfortunately, students continue to receive science education in “final form” or as a set of irrefutable facts, with little opportunity for active sense-making (Applebee et al., 2003; Bae et al., 2021a; Chinn et al., 2001; Duschl et al., 2007; Engle & Conant, 2002; Ford & Wargo, 2012; Windschitl et al., 2020). Further, the language of science involves specialized discursive conventions. Instruction, texts, and assessments laden with technical, academic language can create barriers to accessing core science ideas in the curriculum that students would be able to understand otherwise (Bang & Medin, 2010; Emdin et al., 2021; García & Solorza, 2021). Scholars have thus argued that while opportunities to engage in high-quality scientific discourse are important, traditional ways of doing this are insufficient for creating equitable learning environments (Brown & Ryoo, 2008; Lee et al., 2013; Rosebery et al., 1992).
Hybridity theory provides a valuable framework for exploring education spaces that recognize and leverage historically marginalized students’ home and community backgrounds as assets in classroom discussions. This theory highlights the concept of hybrid discourse spaces, where students’ funds of knowledge (FoK; Moll et al., 1992) and academic discourses merge to expand forms of engagement in classrooms (Barton & Tan, 2009; Gutiérrez et al., 1999; Moje et al., 2004). These hybrid spaces blend the social and cultural practices from students’ daily lives with academic environments to support relevance and deeper learning (see reviews by Hogg & Volman, 2020; Llopart et al., 2018).
In this study, we build on the primarily qualitative literature on hybrid discourse spaces by incorporating natural language processing (NLP), a computer-assisted method for automatically analyzing textual data (J. Liu & Cohen, 2021; Manning & Schütze, 1999). We apply NLP techniques to science classroom video data to explore patterns in spoken classroom talk. Although NLP is widely used in other industries, its application in educational research, particularly for analyzing spoken discourse in urban science classrooms through a social justice lens, remains relatively new (Hankour et al., 2024; Nee et al., 2021). We also conducted qualitative analyses of classroom videos to contextualize the descriptive discourse patterns revealed through NLP. This mixed-methods approach enables us to interpret the quantitative NLP findings within the specific instructional, social, and cultural contexts of the classrooms, offering novel insights into the nature of hybrid discourse spaces.
Hybrid Discourse Spaces
Core to the hybrid discourse framework is the premise that students bring valuable discursive resources for learning that should be integrated in formal learning spaces (Barton & Tan, 2009; Gutiérrez et al., 1999; Moje et al., 2004). These resources have been referred to as students’ funds of knowledge (FoK, González et al., 2005; Moll et al., 1992). Over the years, the concept of FoK has broadened, and at least seven sources of students’ FoK are identified in the literature, including students’ 1) homes, 2) communities, 3) peer groups, 4) popular culture, 5) interests, 6) intersectional identities, and 7) language practices and backgrounds (Jovés et al., 2015; Esteban-Guitart & Moll, 2014). Funds of knowledge–based discursive resources often have linguistic, thematic, and interactional features that differ from discourse genres that subscribe to academic ways of communicating in schools (Halliday, 1993; Schleppegrell, 2001). The linguistic features of academic versus the everyday FoK discourses are reviewed next.
Academic discourses reflect conventional practices commonly used in schools, characterized by formal language, structured expression, adherence to standardized rules, complex vocabulary and sentence constructions, layered clause arrangements, and an impersonal tone that positions authority with scholars and subject-area experts (Halliday, 1993; Jensen et al., 2021). This “language of schooling” stands in contrast to everyday spoken discourses, which typically use more general vocabulary, personal pronouns, and prosodic elements like intonation and pauses to convey meaning through loosely linked ideas and conversational rhythm (Schleppegrell, 2001). Further, within traditional academic discourses, teachers often maintain control over classroom talk, guiding interactions in a directive, rule-bound manner that limits spontaneous student contributions (Bennett, 2014; Carlone et al., 2011; Engle et al., 2014; Engle & Conant, 2002). Although academic talk may be connected to broader learning goals (e.g., developing content knowledge), students typically participate within narrowly defined boundaries shaped by institutional expectations (Aulls, 2002; Sandoval et al., 2021). Importantly, academic discourse conventions are not neutral; they originate from and uphold the dominant norms of white colonial patriarchy, influencing which voices and forms of knowledge are deemed valid (Bang et al., 2013; Flores & Rosa, 2023).
In contrast, the existing literature points to the potential of integrating students’ FoK and everyday discourses into the mainstream curriculum to create hybrid discourse spaces that support equitable access to science learning. Students from minority backgrounds often bring diverse linguistic experiences to school, such as fluency in a non-English language and/or dialectal differences (Craig et al., 2009; Gatlin-Nash et al., 2023; Hussar et al., 2020; Washington & Seidenberg, 2021; Wheeler, 2010). Additionally, the communicative styles typical of youth culture (or youth genres), such as banter and playfulness, are influenced by students’ affiliations with specific ethnolinguistic or dialect communities (Kamberelis & Wehunt, 2012; Varelas et al., 2002). These interactions often involve the use of distinct dialects like African American Vernacular English (AAVE), regional or heritage language phrases, culturally rooted humor, code-switching, and/or youth-specific slang (Anaya et al., 2018; García & Kleifgen, 2020).
Studies have documented how these language repertoires and cultural practices are highly relevant to doing science, such as the ability to combine two languages to make sense of scientific ideas, and drawing from home experiences to construct rich analogies and storylines that illustrate scientific phenomena (e.g., Brown & Spang, 2008; Calabrese-Barton & Tan, 2018; Emdin et al., 2021; Jensen et al., 2021; Warren et al., 2001). For example, case studies illustrate how students dramatize the moment-to-moment processes in real time to make sense of plant growth (Ogonowski, 2008; Rosebery et al., 2008), apply functional reasoning from everyday experiences of walking up and down hills to connect to concepts of forces and motion (Varelas & Pappas, 2006; Warren & Rosebery, 2008), and bring to bear knowledge of water availability, use, and waste in different parts of the world to explain conservation (Jean-François, 2008). Further, research suggests that creating opportunities for cultural distinctiveness, situating problems in community, and drawing upon examples or events that represent how minority students perceive and interpret their social encounters in and out of school are powerful entry points for engaging diverse students in science (Bae et al., 2022; Birmingham & Calabrese-Barton, 2014; Jensen et al., 2021; Rosebery & Warren, 2008).
However, the dominance of English and traditional academic discourse in U.S. schools often restricts students’ full linguistic potential (Anaya et al., 2018; Bedore & Peña, 2008; García & Kleifgen, 2020). Understanding how discursive variations serve as assets within classrooms is key to fostering inclusive learning environments (Edwards, 2014; Jensen et al., 2021; Marencin et al., 2024). Research has shown that when teachers appreciate and connect students’ ways of speaking with disciplinary science terms, students’ access to and understanding of complex scientific phenomena are improved (Brown & Spang, 2008; Brown & Ryoo, 2008; Varelas et al., 2002). For example, specific teacher talk moves, such as pairing accessible, everyday phrases (e.g., “backbone . . . skeleton in your back”) with the academic, scientific equivalent (e.g., “spine”), supported students’ ability to better understand and explain science concepts (Brown & Spang, 2008; Brown et al., 2010). Brown and Ryoo (2008) conducted an experimental study to compare three conditions: 1) science content presented in everyday language (e.g., “light”), 2) science content presented in canonical science terminology (e.g., “photons”), and 3) a control group. Results showed that following instruction, students in the everyday language condition demonstrated higher science learning gains on both multiple-choice and short-answer assessments. It is therefore important to support teachers in becoming familiar with their students’ language backgrounds and communication practices and in exploring ways to connect these with the curriculum.
Teachers also bring their own language practices as well as beliefs about their students’ language practices into the classroom. There is a large body of literature that shows that teachers’ identities, racial and ethnic backgrounds, and cultural values influence their approaches to instruction (Hedges, 2012; Redding, 2019). Because Standard American English (SAE) is privileged as the premier academic language, and our teaching force in the United States remains predominantly white and English-speaking, students who speak other dialects and/or languages that are unfamiliar to the teacher may be overlooked and sometimes even treated as nonacademic (Lee et al., 2013). For example, research has shown that students who do not always use SAE may be perceived as less fluent and are more likely to be graded poorly or even be misplaced into special education programs (Hendricks & Jimenez, 2021; Wheeler, 2010). These implicit and explicit forms of marginalization accumulate into persistent barriers for students from minoritized communities to engage in disciplinary discourse practices (Lee et al., 2013).
To date, although the theoretical principles of hybridity are well-established, much of the literature is characterized by qualitative ethnographic or case studies that apply time- and resource-intensive methodologies and thus are difficult to scale. Additionally, there is a need to examine the concepts of hybrid spaces for today’s youth who are navigating schooling within a fast-paced, technologically immersed, and more globally connected society. In this study, we explore the potential affordances of applying natural language processing (NLP) techniques to identify multiple features of classroom discourse that provide information about the discursive resources present in middle school science classroom talk. A brief overview of NLP in studies of classroom discourse follows.
Natural Language Processing (NLP) to Understand Classroom Discourse
Natural language processing provides sophisticated analytic tools for rapidly identifying patterns in human language that are time-consuming and less feasible to detect with manual coding approaches. It encompasses a range of techniques that enable machines to process and generate human language and overcomes many of the challenges of traditional approaches to analyzing classroom discourse by offering machine learning–driven analytics that focus on a multitude of fine-grained (e.g., momentary) interactions among classroom participants. These techniques allow for precise and automated examination of linguistic and sociolinguistic features that are labor-intensive to hand-code (Crossley et al., 2014; Dowell et al., 2021; J. Liu & Cohen, 2021; Liu & Sun, 2023; Lucy et al., 2020; McFarland et al., 2021).
Studies are increasingly demonstrating the reliability of NLP techniques for coding a wide range of textual data representing human language. For example, Crossley et al. (2014) used an “SiNLP” method to analyze student essays by way of multiple linguistic features within the text, such as text structure (number of words, sentences, paragraphs), vocabulary (unique word usage), and givenness (number of determiners and demonstratives). These features generally correspond with linguistic complexity and have been used in prior studies to evaluate writing quality. Their goal was to create an accessible and easy-to-use tool comparable to advanced NLP analyses and human scoring, and they found that their SiNLP tool produced an exact match to human scores of essay quality 61% of the time. This was a relatively early example of how powerful NLP can be for the analysis of textual data. Over the past decade, the application of NLP (e.g., lexicons, word embeddings, topic models) to text-based data in educational settings has grown substantially (McFarland et al., 2021). Some notable examples include studies that examine the depiction of historically marginalized groups in textbooks (Lucy et al., 2020), detecting student ideas while engaging in an adaptive dialog to guide knowledge integration (Gerard et al., 2025), students’ values in written essays following an affirmation intervention (Dowell et al., 2021), and the presence of equity-promoting behaviors in participants’ responses to digital equity–focused simulations (Littenberg-Tobias et al., 2021).
When examining the capabilities of NLP with real-time classroom data (e.g., talk, behaviors), the literature is considerably smaller than that on text-based analyses. However, there is a growing body of work that demonstrates the potential of NLP for capturing various aspects of classroom discourse (D. Wang et al., 2024). For example, J. Liu and Cohen (2021) applied various NLP techniques (e.g., dictionaries, topic modeling) to close to 1,000 transcripts (approximately 29,000 minutes) of elementary language arts classroom videos from the Measures of Effective Teaching project. The results shed light on various classroom discourse patterns related to teacher-student communication dynamics and pedagogical approaches in classroom communication. These included turn-taking (a parameter that reflects who has control over classroom conversations), targeting (use of “I” versus “You” statements that indicates authority), analytic versus social language (degree to which canonical academic versus everyday language is present), language coordination (degree to which teachers match their students’ speaking styles), allocation of time between academic and routine content, and questioning (frequency of open versus closed-ended questions posed by the teacher; J. Liu & Cohen, 2021). Taken together, the descriptive results revealed notable classroom discourse patterns; that is, classroom discourse across the 976 videos was predominantly teacher-directed, with teachers occupying approximately 85% of class time talking and higher use of hierarchical language that signaled authority (J. Liu & Cohen, 2021). Additionally, classroom language was more analytic in nature, with low percentages of open-ended questions and routine-focused language. However, high language coordination was found between teacher and students (synchrony in language styles), indicating that teachers are incorporating their students’ ways of speaking into their instruction and classroom discourse (J. Liu & Cohen, 2021).
Similarly, Datta et al. (2023) used BERT topic modeling to identify and label features of mathematics discourse in high school math classes. Four trained coders annotated and labeled 4,413 questions as expository, probing, procedural, or other. Four different BERT models (BERT-base, RoBERTa-base, Microsoft DeBERTa, and DistilBERT) were then applied to classify the data. Model accuracies were similar, ranging from 74% to 76% agreement with human coding. Findings provide evidence for using this NLP technique to provide teachers with rapid access to data about the types of questions they ask so that they can make informed decisions about asking questions for various purposes, such as promoting greater engagement and deeper thinking from their students.
Researchers have also recently applied these advanced NLP methods to real-time classroom contexts. Demszky et al. (2023) conducted a randomized controlled trial to evaluate the effectiveness of an automated feedback tool, M-Powering Teachers, which utilized NLP techniques to provide feedback on instructors’ uptake of student contributions in an online programming course. Their study involved 1,136 instructors and approximately 12,000 students. Instructors were assigned to one of two conditions: those in the treatment group received weekly email reminders about the feedback, whereas those in the control group had access to the feedback but received no reminders. Instructors who were prompted to check their feedback showed an increase in uptake of 13.2%, primarily through more sophisticated instructional strategies such as follow-up questioning, as they also asked 11.4% more questions. There were also some variations in response, with female, returning, and non-U.S. instructors showing greater improvements. Additionally, they found that students taught by instructors who received and checked the feedback completed more assignments and reported higher satisfaction with the course (Demszky et al., 2023).
As another example of how NLP can be leveraged to identify sociolinguistic patterns in the classroom, Hunkins et al. (2022) used random forest models to classify teacher language into eight different categories, representing supportive messaging and negative messaging. Codes for supportive messaging included public praise (explicit praising of students for their behavior), autonomy support (providing students with choices about their learning), learning mindset support (language that supports growth mindset, purpose and relevance, and belonging), and strategy suggestion (sharing tips and techniques with students), while the negative codes were public admonishment (expressing disapproval of student behavior), controlling language (language that limits student agency and autonomy), learning mindset undermining (language that undermines growth mindset, purpose and relevance, and belonging), and lack of strategy (teacher withholding or not offering tips or strategies after being asked for help). They conducted these analytics on 156 classroom videos from 73 total teachers and 1,400 total students. Notably, they examined both verbal and paraverbal features (spectral features called mel-frequency cepstral coefficients or MFCCs, which are commonly used in speech recognition software). Results showed that teachers were more likely to use admonishment than praise, but also more likely to use mindset supportive language than undermining language. When compared to student perceptions via survey, results showed that three of the eight codes were significantly correlated with student perceptions: admonishment, praise, and learning mindset support. Additionally, verbal features were much more effective and accurate predictors than paraverbal cues for all discourse features (Hunkins et al., 2022).
Researchers have also used NLP methods to examine how particular interactions shape student learning. For example, D’Angelo and Rajarathinam (2024) examined TA-student interactions in collaborative group discussions within an undergraduate engineering course. After collecting student-TA interaction data, they analyzed speech patterns (turn-taking, duration, and overlap) while manually coding video data for collaborative behaviors. Across 20 sessions with nine student groups, findings showed TAs primarily offered task-related support (e.g., concept explanations, scaffolding), with minimal focus on collaboration skills (only 6 of 873 interventions). Student participation varied widely, with some groups displaying equitable talk distribution while others were dominated by a few voices. Maximum turn duration (the longest a student talked in any given group) emerged as a key indicator of participation, with a longer maximum turn duration linked to greater contributions for any given student. Overlapping speech, which can indicate active engagement, was more frequent without TAs, suggesting their presence may have disrupted natural group dynamics; however, they also found that overlapping speech and total amount of talk were not linked, indicating that these variables may be more complex than previously assumed. TAs also intervened very often, meaning students may not have had time to process or act on the feedback provided, potentially hindering their ability to engage in deeper discussions.
To date, a small but growing body of research illustrates NLP’s promise for capturing important features of classroom discourse, such as teacher-student turn-taking, questioning strategies, equity-focused messaging, and language coordination. In this paper, we couple the application of many of these NLP techniques with qualitative video analyses to answer substantive questions regarding if and how students’ and disciplinary discursive resources are integrated within more academic and more hybrid discourse spaces.
Present Study
In this paper, we apply NLP techniques (namely, LIWC and topic modeling) to advance our understanding of hybrid classroom discourse in urban science classrooms. Using hybridity theory as our framework, NLP techniques are applied to provide detailed indicators of the discursive resources related to today’s youth’s FoK and disciplinary ideas present in classroom discussions. We then conducted qualitative video analyses to generate contextualized interpretations of the quantitative patterns identified from the NLP-generated science classroom discourse metrics. This approach leverages the advances in automated linguistic analysis of teacher and student-generated discourse data, while also applying traditional classroom video analysis approaches for interpreting complex social phenomena (Lucy et al., 2020).
Importantly, our goal was to identify and interpret fine-grained descriptives of discursive features that have potential for scale, shedding light on discourse patterns that either reinforce established dominant norms or offer historically marginalized students more inclusive access to science discourse. This application, coupled with qualitative analyses of key classroom events in classroom video data, will allow us to contextualize the descriptive NLP patterns within the specific contexts of the participating students, teachers, and schools of this study.
The following research questions are aligned with these goals, aimed at understanding features of hybridity in classroom discourse:
RQ1: What discursive resources are present in science classroom discourse? (NLP, Table 2)
1a. What is the proportion of teacher versus student spoken words?
1b. What is the proportion of open- versus closed-ended questions asked?
1c. What is the proportion of more canonical academic (e.g., analytic, scientific vocabulary) compared to more everyday (e.g., social) language?
1d. What is the degree of language coordination between teacher and students?
RQ2: How do academic and everyday discursive resources intersect in urban science classrooms to create hybrid spaces? (Qualitative classroom video analysis)
2a. Are there noticeable patterns in the co-occurrence of discursive resources in more academic versus more hybrid discourse spaces?
2b. What are the key contextual and activity features, as well as social dynamics, that characterized science hybrid discourse spaces?
Methodology
Design
We used an explanatory sequential mixed-methods design (Creswell & Plano Clark, 2018). First, in the quantitative strand, we applied NLP techniques to generate a comprehensive set of quantitative results that describe and characterize the features of and variation in science classroom discourse, with a focus on discursive resources. We identified patterns in these indicators (e.g., co-occurrences) to then qualitatively examine classroom videos for key events in which hybrid (versus academic and everyday) discourse spaces are present. In the qualitative strand, we analyzed key classroom events to contextualize the descriptive, quantitative indicators with attention to dynamics of urban science classrooms (RQ2).
Data
A total of seven science classroom videos from two school districts in the southeastern region of the United States were collected as a part of a multiyear middle school science partnership project. School districts A and B serve a student population who identify as male (51%, 50%), female (49%, 50%), White (50%, 12%), Black (26%, 69%), Hispanic/Latinx (16%, 15%), two or more races (5%, 2%), Asian/Pacific Islander (4%, 2%), and Native American (<1%, <1%), respectively. Videos were recorded with Swivl technology, which includes a camera that captures a wide-angle view of the teacher and students. The classroom teacher wore a wireless microphone, and audio recorders were placed on student tables to capture student voices. The videos ranged from 38.97 to 77.98 minutes in length (M = 58.09, SD = 14.44). The main analytic sample for this NLP study was the verbatim transcriptions of the videos, and the unit of analysis was marked by the beginning and end of each speaker’s turn (J. Liu & Cohen, 2021).
Analyses
NLP techniques to explore discursive resources in science classrooms
To answer RQ1, focused on documenting and describing the various discursive resources present in science classroom discourse, the following NLP techniques were applied: 1) LIWC to examine frequency of spoken words as well as analytic versus social language, 2) parts-of-speech tagging for identifying open and closed-ended questions, 3) language style matching, and 4) topic modeling to allocate academic versus everyday content talk. A description of the NLP techniques, the generated output, and examples of guiding questions to identify patterns and interpret findings are summarized in Table 1. LIWC is a text analysis tool that measures various features of written or spoken language (Pennebaker et al., 2015). In this study, we include summary variables that provide a broader, more abstract analysis related to the psychological and emotional aspects of language (e.g., analytic versus social language, language coordination; Pennebaker et al., 2015). The results are presented by the speaker (teacher versus student).
Table 1. Summary of NLP techniques to explore discursive resources in science classrooms
Frequency of teacher versus student spoken words
The relative frequency of teacher versus student spoken words provides a descriptive measure of teacher versus student control (or agency) in classroom discourse (e.g., Chinn et al., 2001; Z. Wang et al., 2013). LIWC software calculates word counts alongside the scores from its validated dictionaries.
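To make this measure concrete, the following minimal Python sketch computes each speaker role's share of spoken words from a list of speaker turns. It is an illustration only: the function name `word_share_by_role`, the example turns, and the whitespace tokenization are assumptions made here, and LIWC applies its own tokenization rules.

```python
from collections import Counter


def word_share_by_role(turns):
    """Return each role's share of all spoken words.

    `turns` is a list of (role, utterance) pairs, one pair per speaker turn.
    Words are approximated as whitespace-separated tokens (an assumption).
    """
    counts = Counter()
    for role, utterance in turns:
        counts[role] += len(utterance.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {role: n / total for role, n in counts.items()}


# Hypothetical turns, invented for illustration.
turns = [
    ("teacher", "What do you notice about the plant today"),
    ("student", "It got taller"),
    ("teacher", "Why do you think that happened"),
    ("student", "Because we moved it near the window"),
]
shares = word_share_by_role(turns)
print(shares)
```

In this toy exchange the teacher speaks 14 of the 24 words, so the teacher share is roughly 0.58.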
Closed vs. open questions
The act of asking questions is crucial in fostering meaningful science discourse. Both the quantity and quality of these questions are essential factors. A decrease in the overall number of questions posed by teachers indicates reduced control, and similarly, a larger number of questions posed by students indicates a shift in agency to students (Chinn et al., 2001). Additionally, the nature of these questions (i.e., open-ended, authentic questions that probe multiple student ideas) indicates a learning environment that shifts agency for science sense-making to students (Nystrand, 2006). In contrast, simple one-word (i.e., closed-ended) questions are typical of teacher-directed agency, related to classroom management, rhetorical expressions, or managing the flow of discussion (Applebee et al., 2003). To count the number of questions a teacher asks, we used regular expressions, a programming technique that automatically identifies specific text patterns, to locate question marks and extract their corresponding questions from each class transcript. To differentiate between open-ended questions and others, we needed to train our computer algorithm to recognize the features that set them apart. To accomplish this, two experienced raters with K–12 classroom backgrounds, who are also current education researchers, manually labeled 600 randomly chosen questions from the dataset. These labeled questions served as a “training” dataset. Because many factors can predict whether a question is open-ended or not, traditional regression-based prediction methods were not feasible due to the likelihood of having more variables (i.e., words) than observations. Therefore, we employed the Lasso method, a feature reduction regression approach designed for such situations. We fit a Lasso regression model with 10-fold cross-validation to the training data and used it to make predictions for the remaining questions.
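As an illustration of the question-extraction step, the sketch below uses a regular expression to pull question-mark-terminated sentences out of a single speaker turn. The specific pattern, the function name `extract_questions`, and the example turn are assumptions made for this illustration; the pattern used in the study is not reported here, and the Lasso classification step is not shown.

```python
import re

# One or more characters that are not sentence-ending punctuation,
# followed by a question mark. This pattern is an illustrative assumption.
QUESTION_RE = re.compile(r"[^.!?]+\?")


def extract_questions(utterance):
    """Return every question-mark-terminated sentence in one speaker turn."""
    return [m.group().strip() for m in QUESTION_RE.finditer(utterance)]


# Hypothetical teacher turn, invented for illustration.
turn = "Okay, eyes up here. What happens to the ice? Why do you think so? Good."
questions = extract_questions(turn)
print(len(questions), questions)
```

A pattern this simple treats every period as a sentence boundary, so abbreviations (e.g., "Dr.") would truncate a question; transcripts with such cases would need a more careful sentence splitter.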
Analytic versus social language
Teachers serve as language role models for their students. Research indicates that children’s language development is influenced by the adults they interact with, including both teachers and parents (Hassinger-Das et al., 2017; Song et al., 2014). We examined two aspects of teachers’ language usage to contrast analytical, logical, and consistent thinking with more intuitive, narrative, or social language. Analytical thinking encompasses words associated with formal, logical, and hierarchical thinking, such as prepositions (e.g., to, with, above), cognitive terms (e.g., cause, know, hence), and exclusive language (e.g., but, without, exclude). In contrast, social language pertains to human interaction and includes non-first-person-singular personal pronouns (e.g., we, us, you all) and verbs related to human interaction (e.g., sharing, talking). We expected to see more equal proportions of analytical and social language in hybrid discourse spaces. Linguistic inquiry and word count (LIWC) was used to compare the presence of formal, logical, and hierarchical thinking, as calculated via the analytical thinking summary variable, with the proportion of casual language used in human interactions, as represented by the social linguistic dimension (J. Liu & Cohen, 2021).
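The general idea behind dictionary-based measures can be sketched as the percentage of tokens that fall in a word list. The Python example below is a rough illustration under strong assumptions: the two word lists contain only the single-word examples named in the preceding paragraph, they are not the LIWC dictionaries, and LIWC's analytical thinking summary variable is computed differently from a simple category percentage.

```python
import re

# Toy word lists taken from the examples in the text.
# These are NOT the LIWC dictionaries (assumption for illustration only).
ANALYTIC = {"to", "with", "above", "cause", "know", "hence",
            "but", "without", "exclude"}
SOCIAL = {"we", "us", "sharing", "talking"}


def category_rates(text):
    """Return the percentage of tokens found in each toy word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n = len(tokens)
    if n == 0:
        return {"analytic": 0.0, "social": 0.0}
    return {
        "analytic": 100 * sum(t in ANALYTIC for t in tokens) / n,
        "social": 100 * sum(t in SOCIAL for t in tokens) / n,
    }


# Hypothetical utterance, invented for illustration.
rates = category_rates(
    "We know plants grow with light but we keep talking about water"
)
print(rates)
```

Here 3 of the 12 tokens match each list, so both rates are 25.0; this balance is the kind of pattern the paragraph above associates with hybrid discourse, although real scores depend on the full validated dictionaries.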
Academic vs. everyday topics
Using topic modeling, we gauged the proportion of teachers’ and students’ spoken language devoted to academic (e.g., scientific terms, behavioral management) vs. everyday (e.g., real-life experiences, colloquial expressions) talk to identify key classroom events that exemplify more traditional science discourse and those that exemplify hybrid discourse spaces. We expected that classrooms in which both academic and everyday discourses are present in more balanced proportions would illustrate hybrid discourse spaces. Specifically, bidirectional encoder representations for topic modeling (BERTopic) is a topic modeling method that uses both advanced language models and a statistical keyword approach to group similar utterances. It begins by converting sentences into numerical vectors using a compact sentence transformer model called “all-MiniLM-L6-v2” (a version of miniature language model), then reduces the dimensionality of these vectors using uniform manifold approximation and projection (UMAP) to better organize the data. It clusters related utterances using hierarchical density-based spatial clustering of applications with noise (HDBSCAN), and finally identifies representative keywords for each topic using maximal marginal relevance, which selects terms that are both relevant and diverse (Grootendorst, 2022). This unsupervised NLP technique scans the data (e.g., classroom transcripts) to detect word and phrase patterns, then clusters the word groups and similar expressions. The topics were then labeled according to common themes as they relate to academic versus everyday versus hybrid discursive exchanges.
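To make the final keyword-selection step more concrete, the following is a minimal sketch of maximal marginal relevance over toy vectors. It is not the BERTopic implementation: the two-dimensional vectors, the candidate words, and the `diversity` weighting are illustrative assumptions, whereas a real pipeline would use sentence-transformer embeddings.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr(topic_vec, candidates, top_n=2, diversity=0.5):
    """Greedily pick keywords that are relevant to the topic vector but not
    redundant with keywords already picked."""
    selected: list[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < top_n:

        def score(word: str) -> float:
            relevance = cosine(remaining[word], topic_vec)
            redundancy = max(
                (cosine(remaining[word], candidates[s]) for s in selected),
                default=0.0,
            )
            return (1 - diversity) * relevance - diversity * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy, hypothetical embeddings: "winter" and "winters" are near-duplicates.
topic = [1.0, 0.0]
words = {
    "winter": [1.0, 0.10],
    "winters": [1.0, 0.12],
    "weather": [0.6, 0.80],
}
print(mmr(topic, words, diversity=0.0))  # relevance only
print(mmr(topic, words, diversity=0.7))  # penalizes near-duplicates
```

With no diversity penalty the two near-duplicate terms are chosen; with a higher penalty the second slot goes to a less similar term, which is the behavior the "relevant and diverse" description refers to.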
Language coordination
Language coordination describes the similarity in how students and teachers speak. It is measured with the language style matching (LSM) formula, which quantifies the verbal mimicry between groups of people from the differences in their function word usage (Gonzales et al., 2010). Unlike content-specific words, function words are the filler words, such as pronouns, prepositions, and modifying verbs, that fit around the content of a sentence. Function words are helpful inputs for verbal mimicry because they tend to be unconscious words representing speaking style independent of the conversational content. Contexts with high levels of language coordination are associated with more cooperative environments; thus, we expected to see more coordination in hybrid discourse spaces. We used LIWC-22 to calculate the pairwise LSM between teachers and students.
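As an illustration of how an LSM score can be computed, the sketch below uses the commonly published per-category form, 1 − |p1 − p2| / (p1 + p2 + 0.0001), averaged across function-word categories, where p1 and p2 are the percentages of each speaker group’s words falling in a category. The small constant in the denominator and the three tiny word lists are assumptions for illustration; LIWC-22 relies on its own, much larger dictionaries and category set.

```python
import re
from statistics import mean

# Toy, hypothetical word lists standing in for LIWC function-word categories.
CATEGORIES = {
    "pronoun": {"i", "you", "we", "it", "they", "them", "us"},
    "preposition": {"to", "with", "on", "in", "of", "around", "about"},
    "article": {"a", "an", "the"},
}


def category_rates(text: str) -> dict[str, float]:
    """Percentage of tokens in `text` that fall in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {
        name: 100 * sum(token in vocab for token in tokens) / total
        for name, vocab in CATEGORIES.items()
    }


def lsm(text_a: str, text_b: str) -> float:
    """Mean per-category style-matching score; 1.0 means identical rates."""
    a, b = category_rates(text_a), category_rates(text_b)
    scores = [
        1 - abs(a[c] - b[c]) / (a[c] + b[c] + 0.0001) for c in CATEGORIES
    ]
    return mean(scores)


teacher = "The Earth rotates on what? We talked about it in the last class."
student = "It spins on the axis and it goes around the sun."
print(round(lsm(teacher, student), 2))
```

Scores near 1 indicate that the two speakers use function words at similar rates, which is how the reported coordination values are read.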
Classrooms are contested spaces where language is negotiated, and students who use nonstandard dialects and/or speak a non-English native language may face barriers that further compound systemic inequalities for marginalized students (Hendricks & Jimenez, 2021; Lyn, 2022; McClendon & Valenciano, 2018). Traditional discourse spaces tend to place a higher value on academic language, which is generally tied to linguistic norms present in SAE, leading to other ways of expression being viewed as informal or inaccurate (García & Solorza, 2021). Given the urban science classroom context of this study, considering African American Vernacular English (AAVE) and other nonstandard American English dialects is particularly relevant as research shows that teachers who share cultural and linguistic backgrounds with their students also tend to be perceived more positively by and elicit better academic outcomes from those students (Egalite, 2015; Lee et al., 2013; Redding, 2019).
Qualitative Analyses of Key Science Classroom Events Related to Hybrid Discourse Spaces
As part of a larger study, the transcripts were also human-coded to identify traditional (academic) versus everyday versus hybrid discourse spaces (Bae et al., 2022, 2024). Of note, we recognize that classroom discourses exist on a continuum where, in practice, students and teachers transition between discourses and any given segment of talk will often include interactions from various discourse registers (Schleppegrell, 2001). Therefore, discourse spaces here were identified based on which discursive resources were most prominent in that segment. For example, a segment was only coded as hybrid if it clearly combined formal science concepts (from the curriculum) and discursive resources drawn from students’ out-of-school experiences or community knowledge. In other words, hybrid segments required the explicit integration of academic content with experiential or non-school-based knowledge.
These discourse spaces represent structural (macro) features of equitable classroom discourse, characterized by groups of exchanges that occur within a defined lesson activity structure and center around a conceptual theme (Brown & Spang, 2008; Sandoval et al., 2021). To contextualize the NLP outputs, we examined the classroom videos to generate rich descriptions of how the discursive resources identified in the NLP analyses co-occurred in academic versus hybrid discourse spaces. More specifically, the NLP provided a “technical analysis” of descriptive patterns in teacher and student talk. Then, based on the discourse spaces identified in a prior study (Bae et al., 2022), the qualitative exploration of academic versus hybrid discourse spaces in the videos provides a critical analysis that accounts for social and cultural theoretical concepts from the hybridity framework, including teachers’ and students’ in- and out-of-school discursive repertoires and the classroom activity structures within which the finer-grained discourse patterns take place (Engeström, 1999).
NLP Results
Frequency of teacher vs. student spoken words
Results showed that the total number of words spoken across the recorded class periods was n = 27,596 for teachers and n = 11,356 for students. Thus, classroom talk was characterized by mostly teacher talk (70.85%) compared to student talk (29.15%). Results disaggregated by teacher are presented in Table 2, showing that the predominance of teacher-spoken words (ranging from 69.80% to 91.17%) was present in six of the seven classes. Class 2 was the exception, where the teacher’s share of spoken words (23.27%) was lower than the students’ (76.73%). Qualitative analyses showed that this was due to the prevalence of small group activities in the class 2 video.
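The overall talk shares follow directly from the two word totals. The short sketch below reproduces that arithmetic; the function name is a hypothetical helper, not part of the study’s code.

```python
def talk_shares(teacher_words: int, student_words: int) -> tuple[float, float]:
    """Percentage of all spoken words contributed by teachers and students."""
    total = teacher_words + student_words
    teacher_pct = round(100 * teacher_words / total, 2)
    student_pct = round(100 * student_words / total, 2)
    return teacher_pct, student_pct


# Word totals reported in the text: 27,596 teacher and 11,356 student words.
teacher_share, student_share = talk_shares(27_596, 11_356)
print(teacher_share, student_share)  # 70.85 29.15
```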
Frequencies of teacher versus student spoken words by class
Closed vs. open questions
There were generally more closed-ended (57.65%, n = 829) than open-ended (42.35%, n = 609) questions across the classrooms. The breakdown by teacher is presented in Table 3.
Descriptives of closed versus open-ended questions by teacher
The distribution of questions is also presented by discourse space in Table 4. Notably, whereas closed-ended questions (65.12%) were more prominent than open-ended questions (34.88%) in everyday spaces, the opposite trend was observed in hybrid spaces (30.06% closed, 69.94% open). In traditional discourse spaces, the distribution of closed (41.64%) to open (58.36%) questions was relatively equivalent.
Descriptives of closed versus open-ended questions by discourse space
Analytic versus social language
As shown in Figure 1, students had a higher composite score (converted into percentiles) for analytic language (M = 39.73, SD = 10.40) than teachers (M = 25.57, SD = 3.87), whereas teachers had a higher composite score for social language (M = 16.49, SD = 2.66) than students (M = 11.09, SD = 3.83). Across the seven classrooms, the analytic language scores ranged from 24.47 (SD = 36.36) to 55.41 (SD = 41.81) for students and from 19.30 (SD = 28.60) to 30.42 (SD = 34.41) for teachers. As detailed in our discussion, this result was initially counterintuitive but is likely explained by the higher number of overall teacher-spoken words.

Student versus teachers’ use of analytic and social language.
To further contextualize these patterns in student and teacher talk, the proportions of analytic and social language across the three discourse spaces (everyday, hybrid, academic) were examined (Figure 2). The analysis revealed only one statistically significant difference in teachers’ use of analytic language across the three spaces. Teachers used analytic language significantly more often in traditional spaces (mean difference = 3.85, p = .08) compared to hybrid spaces. There were no significant differences in teachers’ use of social language across the spaces. In contrast to teachers, students exhibited significant differences in language use across the spaces.

Student versus teachers’ use of analytic and social language across everyday, hybrid, and traditional.
Students used analytic language significantly more often in hybrid spaces compared to both traditional spaces (mean difference = 7.83, p < .01) and everyday spaces (mean difference = 18.27, p < .001). Additionally, students used analytic language more in traditional spaces compared to everyday spaces (mean difference = 10.45, p < .001). Interestingly, the pattern reversed for social language. Students used social language significantly more often in everyday spaces compared to traditional spaces (mean difference = 1.96, p = .10). This indicates that students differentiate their usage of analytic and social language by discourse space, whereas teachers use similar levels of analytic and social language across discourse spaces.
Science versus routine language
Results from the topic modeling analyses showed 10 and 13 unique topics for students and teachers, respectively, which were categorized into science versus routine talk (J. Liu & Cohen, 2021). For students, five of the topics were categorized as science-related, characterized by the following disciplinary words (428 total, frequencies indicated in parentheses): earth/time/surface/science/the (186), winter/spring/summer/seasons/weather (91), twist/twisting/twists/tie/how (53), helix/double/two/multiple/twin (50), and shade/picture/they/them/see (48). The five routine topics were characterized by the following words (1,488 total): no/nope/not/yes/yeah (459), doing/done/we/do/help (310), they/them/something/like/the (293), yeah/yep/yes/yup/sure (257), and life/side/back/move/need (169). For teachers, eight of the topics were categorized as science-related, characterized by the following disciplinary words (426 total): seasons/season/winter/autumn/summer (145), salt/water/sugar/sodium/drink (59), nuclear/neutrons/neutron/radioactive/radiation (58), atmosphere/altitude/air/ground/layers (45), periodic/atoms/elements/count/periods (36), dna/mutations/mutation/science/chemicals (30), slope/math/terrain/calculate/slides (28), and measure/experiment/data/instructions/trials (25). The five routine talk topics were characterized by the following words (1,329 total): back/okay/alright/help/you (322), question/talk/questions/answer/you (302), doing/what/the/make/use (273), done/homework/doing/next/finished (289), and doing/looking/look/okay/wait (143).
These patterns showed that teachers’ talk consists of more science topics compared to students (eight versus five topics), whereas the number of routine topics was the same for both teachers and students (five topics). The qualitative nature of the science topics was, in most cases, unique between teachers and students, except for common topics related to weather and DNA. Additionally, the disciplinary nature of topics was more apparent for teachers (e.g., solubility, atoms, periodic table, force and motion), whereas the topics for students were more domain general in nature (e.g., twists, pictures). Another notable pattern was that the frequency of science words for both teachers and students was lower (426 and 428, respectively) compared to the frequency of routine words (1,329 and 1,488 for teachers and students, respectively).
Language coordination
Language coordination scores ranged from .76 to .91. These values indicate a high degree of language style matching between the teachers and students.
Qualitative and Integrated Findings
The qualitative analyses and interpretations of the NLP results presented next focus on how academic and everyday discursive resources are present in hybrid discourse spaces, using academic discourse spaces as a comparison. The classrooms were largely characterized by academic discourse spaces; analysis from a prior study of the 1,445 coded segments showed that academic discourse spaces were the most frequently observed (M = 71.88%, SD = 10.79%), followed by hybrid spaces (M = 19.88%, SD = 11.28%), and then everyday discourse spaces, which were the least common (M = 8.24%, SD = 3.40%; Bae et al., 2022). Our qualitative analyses in this study focused on better understanding NLP results in relation to features of the recorded lessons (e.g., topic, objectives, key activities; Table 5) that support hybrid discourse.
Description of science lessons
Teacher versus student spoken words
In relation to the frequency of teacher versus student words spoken, NLP results showed that, in general, teachers spoke more words than students in the recorded lessons. We recognize that the number of spoken words is an indirect indicator of hybridity, with a greater balance between teacher and student talk in the science classroom signaling a more hybrid discourse environment. This balance reflects increased opportunities for students to publicly share their ideas (Snow, 2015) and a shift in authority toward students (Calabrese Barton & Tan, 2020). Qualitative analyses of academic versus hybrid discourse spaces illustrate how talk distribution commonly differs in this way (see Table 6 for an example from class 7).
Illustration of word frequency distribution between teachers and students in academic versus hybrid discourse spaces
Qualitatively, the traditional, teacher-dominant exchange within the academic discourse segment is characterized by initiate-respond-evaluate (IRE; Mehan, 1986) patterns that position the teacher as the arbiter of knowledge. Here, students only offer brief responses that are vague or factual. The hybrid discourse segment, in contrast, is more dialogic in nature, with the student engaging in deeper reasoning and asserting their noticing of a discrepancy between their own and others’ responses (e.g., “both or our answers were kinda different?”), as well as using everyday (e.g., “the way that I was like doing it”) and scientific (e.g., “more pressure”) language to provide an explanation for the discrepancy in results.
Use of everyday and academic discursive resources in classroom discussions
The qualitative analyses also contextualized how the everyday and academic discursive resources are presented in science classroom discourse. Specifically, findings helped explain the unexpectedly high proportion of analytic student talk identified in the NLP results, showing that opportunities for student talk in classrooms seem to be predominantly in the context of responding to teachers’ science-related questions in both academic and hybrid discourse spaces. As illustrated in the previous examples, even in hybrid spaces where students are expressing more authority and using a blend of everyday and scientific language, the discourse segments were prompted by a teacher’s question. However, as expected, topic modeling results showed that teachers’ talk was characterized by more scientific topics compared to students. The qualitative analyses of teacher and student talk segments provided insight into how teachers were engaging in scientific talk; teachers were often modeling the use of scientific vocabulary during instruction and connecting it to students’ everyday talk. For example, the following excerpt from class 1 illustrates how the teacher is guiding students’ use of scientific vocabulary:
What is something that we know about the Earth? It’s not flat.
The orbit of Earth.
Yes, orbit, revolving. What else? I want you to think a little bit more into what we just talked about about the Earth. It says that the Earth rotates or spins on what?
Its axis.
Its axis. Right. It’s tilted. The two things are what? Somebody repeat that for me. What are the two things? Give me one, Jonathan.
The tilt on its axis and the revolution around the sun.
As illustrated here, the teacher is frequently connecting scientific vocabulary to students’ prior knowledge and reinforcing the use of scientific terms (e.g., “orbit,” “revolving,” “rotate”) in the discussion.
Related to this finding, the qualitative analysis of open- and closed-ended questions posed showed that both open- and closed-ended questions were primarily posed by the teacher and analytic in nature. For example, from class 3: T open-ended question: “What observations did you make about your surface? What about the ball? Did you make any observations about the surface texture of the ball?” T closed-ended question: “So it’s losing an electron. Meaning that it’s going down to its second electron shell. Meaning that it’s full right?”
Further, open-ended questions were characterized by a variety of disciplinary prompts for student observations (e.g., “What more can we find?”), problem-solving using science ideas (e.g., “When we are tilted away from the sun, what season would we experience?”), real-world applications (e.g., “What TV shows do y’all know that use DNA?”), and critical thinking (e.g., “Why do you think it’s important to learn about African American scientists?”).
In contrast, the use of open- and closed-ended questions incorporating social or informal language was most commonly observed in behavior management and routine procedural interactions. These questions were less about advancing academic thinking and more focused on monitoring student readiness and maintaining classroom order. For example, questions like “You ready?” or “Do you need a pencil?” were typically used to gauge preparedness for an upcoming task, while remarks such as “Dude. How much dirt are you bringing into my room man?” or “Did I tell you to touch this?” were often part of informal behavioral redirections. Similarly, exasperated rhetorical questions like “Good Lord what is wrong with y’all today?” or reminders such as “Can y’all stay in your seats?” reflect teachers’ attempts to redirect student behavior using casual, socially resonant language. This pattern sheds light on the disproportionately high frequency of teacher social language identified in the NLP results, showing that social language often served functional, managerial purposes aligned with the language of schooling.
Questions that facilitate hybrid discourse spaces
Finally, although observed less frequently, we identified unique instances in which teachers used open- and closed-ended questions to create hybrid spaces in which popular culture references (e.g., “It looks like Twizzlers with gumballs,” “What in the picture gives you the decade of the 90s?”) and links to historical events (e.g., “What’s that big explosion they did to end World War II?”) were made to science topics being taught.
The following excerpt from class 4 illustrates how teacher questioning can create a hybrid discourse space that blends academic content with students’ cultural and social knowledge. In this exchange, the teacher begins by referencing prior instruction on the nature of science, positioning science not merely as a body of facts but as a human enterprise shaped by collaboration, communication, and global exchange:
So this one, even though it’s on your paper, in LS-1 we learned about the nature of science, right? And so we still have it posted in the front of the room, and we talked about it, and we said that science behaves in certain ways. Like it is a social thing, which is why we have to talk in science and that we share research all over, with scientists all over the country and different countries. And we try to explain nature with it. So, knowing these things about science, about how scientists act or behave, I want you guys to tell me why you think this will make it important for us to learn about African American scientists . . . why do you think it’s important to learn about African American scientists, to support the nature of science?
To know, what did they invent, and how to respect the earth.
Thank you for sharing. Oh, last question. Why do you think it’s important to learn about African American scientists?
Because it’s not a lot of African American scientists we know about . . . So, we need to learn about them.
By stating that “science behaves in certain ways” and emphasizing its social nature, the teacher disrupts a narrow, decontextualized view of science and encourages students to see it as a dynamic, relational practice. This framing sets the stage for deeper reflection on who participates in science and whose contributions are recognized. For example, the teacher’s follow-up question, “Why do you think it’s important to learn about African American scientists?” serves to further open the space for critical inquiry. The teacher invites students to engage in reasoning and value-based reflection. This move both legitimizes students’ voices and expands the scope of the science lesson to include socio-historical awareness that is reflected in the students’ responses; one student connects the topic to respect for the earth, while another points out the underrepresentation of African American scientists in mainstream narratives, signaling a developing consciousness about representation.
Taken together, the qualitative findings align with the extant literature on the wide range of purposes that questioning serves in science classrooms, from reinforcing ideas and checking for comprehension to promoting critical thinking and problem-solving (Chin & Osborne, 2008; Datta et al., 2023).
Diverse forms of language coordination
Finally, when qualitatively exploring instances of language coordination in the transcripts, we identified different approaches to discursive style matching, reflecting a nuanced form of relational pedagogy. These moments of discursive style matching, referred to here as language coordination, demonstrated that teachers were not simply using informal language but were intentionally or intuitively drawing upon the everyday linguistic repertoires of their students to build rapport, manage classroom dynamics, and bridge academic and everyday discourses.
Specifically, teachers were observed adopting elements of youth genres, regional dialects, and features of AAVE often in ways that reflected both the sociolinguistic makeup of the classroom and the teacher’s own cultural background. For instance, in School District 2, where four Black female teachers taught predominantly Black student populations, language coordination frequently involved the use of AAVE and culturally familiar terms of endearment (e.g., “baby,” “honey”) as part of emotionally attuned and relational classroom talk. In the excerpt from class 6, the teacher’s shift into AAVE (“Your mom get your report?” and “You improvin’ in my class”) not only eases the formality of the exchange but also signals care, familiarity, and a willingness to engage students on culturally resonant terms:
Your mom get your report?
She was like, the “needs improvements” need to come up.
How many “needs improvement” do you have? One?
Three.
Oh, that’s not bad! You just . . . You gotta improve. You improvin’ in my class.
This linguistic shift functions as a subtle affirmation of the student’s identity, recognizing him not just as a learner but as someone embedded in family and community life. By embedding care and encouragement in a culturally familiar register, the teacher creates a supportive and affirming space for the student.
In contrast, in School District 1, where three white teachers taught more racially and linguistically diverse classrooms, discursive coordination often took the form of referencing youth culture and humor in casual, playful exchanges. For example, in the following excerpt, a white male teacher from class 3 who uses a more casual youth genre is having a conversation with a couple of his male students:
Dude, I said we are not wearing earbuds. Would you guys like some music while we play?
No . . . Nuuh.
Can you put on some Young Boy?
Anybody else tired of the ads on YouTube?
Yes.
What I am going to say is that this is the third time I’m asking for the earbuds man. Yeah, I’m going to play some cheesy 80s music. It’s just good background, low level.
By using phrases like “Dude” and “cheesy 80s music,” and references to YouTube ads and rap artist Young Boy, the teacher is meeting students in their everyday discourse spaces, leveraging shared cultural references to defuse tension, redirect behavior, and maintain classroom cohesion. While these interactions were not explicitly instructional and seemed to occur primarily in the unplanned “in-between” spaces of the lesson, they served important social functions by fostering rapport, humanizing the teacher-student relationship, and signaling attentiveness to students’ interests and out-of-school (e.g., media) worlds.
Discussion
The aim of this study was to explore different (academic, hybrid) discourse spaces in middle school science classrooms. Using hybridity theory as our analytic framework, we applied NLP to examine key features of classroom discourse (word frequency, closed versus open-ended questions) and the everyday (social, routine) versus academic (analytic, scientific) discursive resources that were present in middle school science classrooms. The NLP-generated patterns were then contextualized via qualitative analyses of classroom videos to understand how teachers’ and students’ discursive resources operate in academic versus hybrid discourse spaces, accounting for lesson characteristics and social, interpersonal dynamics. Thus, while the NLP results illustrate through multiple descriptions what discursive resources were present, the qualitative analyses demonstrated how these discursive resources were (or were not) integrated in hybrid and academic discourse spaces. Our contribution lies in the application of NLP techniques to contemporary science discourse data from urban classrooms in the United States, coupled with traditional qualitative methods to contextualize machine-generated descriptions based on knowledge of the participants (teachers, students) and classroom environments.
The findings showed that traditional academic discourses were most prevalent, pointing to the longstanding need to create classroom discourse spaces that are more dialogic and where students’ FoK can be integrated with science curriculum (Bae et al., 2021b; Cazden, 2001; Patterson, 2019; Rodriguez, 2013; Stroupe, 2014; Varelas et al., 2015). Specifically, while the higher frequency of teacher (versus student) spoken words and prevalence of closed-ended questions point to a more traditional classroom, hybrid discourse spaces were also present, characterized by the presence of both academic (scientific) and everyday language, coordinated teacher-student talk, and more dialogic exchanges. Qualitative exploration of these NLP patterns in the classroom videos provides unique insights into how everyday and academic discursive resources operate in hybrid talk. We argue that even if minimal, the close examination of hybrid discourse spaces identified in this study is not trivial, as they represent examples of talk that go against the grain of schooling by positioning historically marginalized students as rightful classroom participants with knowledge and experiences that are valuable for learning science.
Hybrid Discourses in Science Classrooms
Our findings provide novel insights into characteristics of a hybrid discourse environment, with a focus on how teachers’ and students’ everyday and disciplinary discourses are integrated to expand science learning opportunities. Of note, our findings are meant to be illustrative rather than exemplary, as achieving transformative hybrid discourse spaces is an ambitious goal that requires coordinated shifts at multiple (e.g., cultural, political, curricular) levels that often go against convention. Additionally, it is important to acknowledge that everyday and academic languages are not dichotomous but rather are best represented on a continuum of discourse registers, reflecting the dynamic ways in which teachers and students navigate between school- and home-based ways of knowing in science learning. These points are discussed in more detail in relation to specific findings next.
Role of teacher facilitation and questioning in hybrid spaces
Although teachers spoke more frequently than students, this simple word count did not always diminish the presence of hybrid spaces. In fact, qualitative analyses revealed that teachers often played a pivotal role in bridging academic content with students’ everyday experiences. Compared to students, teachers made more consistent explicit connections between scientific concepts (academic, analytic language) and socially or culturally relevant contexts (social, everyday language). For instance, in one excerpt, a teacher introduces the science behind seasons by linking it to students’ lived geographic context: “Because the Earth is tilted, it is why we experience our seasons. In Virginia, we live in the northern hemisphere. What’s that line that cuts the world in half? What’s that called?” This kind of discourse exemplifies how teachers situate curricular content within students’ everyday frames of reference, playing an important role in modeling and scaffolding connections between curricular, mainstream science ideas and relevant examples from students’ lives (Rodriguez, 2013; Stroupe, 2014; Windschitl et al., 2020).
In particular, teacher questioning emerged as a key mechanism for fostering hybrid discourse. Although some teacher-directed questioning segments followed traditional, didactic patterns where teachers posed closed questions and evaluated student responses, others reflected a more dialogic approach, inviting elaboration, interpretation, and student reasoning. These dialogic exchanges were more characteristic of hybrid discourse episodes, suggesting that teacher-directed talk can open space for students to make meaningful connections between the curriculum and real-world phenomena. Although expanding student-led discourse remains an important goal, these findings affirm existing scholarship emphasizing the productive role of teacher facilitation in scaffolding scientific thinking and supporting students’ navigation between everyday and academic discourses (Cherbow & McNeill, 2022; Chin & Osborne, 2008; Colley & Windschitl, 2016; Datta et al., 2023; Windschitl et al., 2020).
Quantitative analyses further underscored the unique role of questioning in hybrid spaces. Across the three discourse spaces, the number of teacher-posed questions varied widely: 43 in everyday, 163 in hybrid, and 658 in academic discourse. Notably, the proportion of open-ended questions was highest in hybrid spaces (69.94%), compared to academic (58.36%) and everyday (34.88%) spaces. This suggests that hybrid discourse spaces were marked by more flexible, exploratory talk, often shaped by teachers’ use of open-ended prompts. Our qualitative findings offer further insight into the nature of these open-ended questions. In hybrid spaces, teachers frequently prompted students to draw on both scientific knowledge and everyday observations, explore real-world applications of science (e.g., the role of chemical reactions in warfare), and engage in creative problem-solving linked to innovations that addressed societal needs. In several cases, teachers also used open-ended questions to invite culturally relevant reflection, such as asking students to consider the historical contributions and exclusion of Black scientists or to imagine the personal experiences of scientists in particular historical moments. These questions not only expanded students’ conceptual understanding but also encouraged them to critically examine who participates in science and how scientific knowledge is shaped by cultural and historical contexts.
In summary, our findings support prior research on the dynamic and multifaceted role of teacher questioning in promoting student engagement in science learning (Chin & Osborne, 2008; Murphy et al., 2018). However, this study extends the literature by illustrating how teacher questioning can serve as a catalyst for hybrid discourse spaces where students are invited to integrate disciplinary reasoning with their cultural knowledge, lived experiences, and critical perspectives.
Coordinating teachers’ and students’ FoK in hybrid spaces
Another important contribution of this study is the systematic documentation of discursive resources from today’s youth in hybrid spaces that represent students from historically marginalized groups in science. This finding is both timely and significant, as it offers a contemporary and context-specific view of students’ FoK, which are inherently shaped by cultural, linguistic, social, and generational factors. As scholars have emphasized, FoK are not static; they emerge in response to students’ communities, interests, and identities (Ladson-Billings, 2000; Paris, 2012).
Situated in urban middle schools in the mid-Atlantic region of the United States, this study captured a rich array of student discourse practices that reflect their everyday realities and cultural repertoires. These included the use of youth genres such as slang; references to popular media like TV shows, movies, and music artists; fashion trends; and spontaneous code-switching between SAE and AAVE. Students also made frequent mention of culturally and geographically grounded experiences, such as birthday celebrations, seasonal changes, and trips to major northeastern cities, highlighting how regional identity and social life inform their classroom contributions. These findings align with a growing body of literature demonstrating that when educators recognize and integrate students’ FoK into science instruction, it fosters more meaningful engagement, deeper understanding, and sustained participation (Calabrese Barton & Tan, 2018; Rosebery et al., 2016; Warren et al., 2001). By surfacing specific, situated examples of everyday language and cultural references used by students in science classrooms, this study contributes to a much-needed resource base for science educators seeking to design more culturally responsive instruction. For practitioners in urban education settings, where cultural and linguistic diversity is often high, this kind of contextual knowledge is particularly valuable. It offers concrete ways to reimagine science learning as a space where students’ voices, communities, and cultural practices are not only welcomed but seen as integral to the learning process.
This study also contributes to the growing body of research on hybrid discourse spaces by examining how teachers and students co-construct these spaces through reciprocal interactions. Our findings show that hybrid spaces are not created solely through student contributions or teacher scaffolding but rather emerge through ongoing discursive negotiation where both parties draw from and respond to one another’s FoK.
One of the most salient features of the hybrid discourse spaces in our data was the linguistic diversity brought by students. This diversity was expressed through youth genres, regional and cultural dialects, and popular cultural references. In turn, teachers frequently engaged in language coordination, adapting their own discourse to resonate with students’ everyday discourse styles. Our NLP analyses showed that all participating teachers demonstrated high levels of language coordination, and our qualitative analyses revealed important nuances in how language coordination manifested across different classrooms. Specifically, we found that the nature of language coordination often varied by the racial and cultural alignment between teachers and students. For example, in classrooms where Black teachers taught predominantly Black students, language coordination frequently involved dialect matching and the use of culturally familiar expressions. In contrast, white teachers more often used youth genres, humor, or pop culture references as forms of engagement. These patterns suggest that teachers’ own identities and their cultural proximity to students shape not only whether they coordinate linguistically, but how they do so. This aligns with prior scholarship showing that racial and ethnic congruence between teachers and students is linked to improved behavioral, relational, and academic outcomes, in part due to shared cultural understandings and communication norms (Gay, 2002; Redding, 2019). The ways in which teachers attune to students’ linguistic practices (whether through dialect matching, shared humor, or topical references) warrant further investigation, particularly in relation to how such coordination fosters belonging, participation, and academic engagement in science classrooms.
Natural Language Processing in Education Research
Natural language processing technologies are advancing rapidly, offering promising new tools for educational researchers seeking to analyze language-rich data at scale. An often-cited affordance of NLP is its capacity to process and analyze vast quantities of textual data that would be prohibitively labor- and time-intensive to analyze using traditional human coding approaches (Crossley et al., 2014; J. Liu & Cohen, 2021). In this study, we demonstrate the value of integrating NLP-driven analytics with qualitative video analysis to capture broad patterns and context-specific nuances in science classroom discourse. By using NLP to generate descriptive outputs, such as the frequency of spoken words and the types of questions posed, we were able to reallocate human resources toward the interpretive work of analyzing how discourse unfolds in situ. This dual NLP and qualitative video analysis approach allowed us to maintain the depth of qualitative analysis while gaining efficiencies through automated language processing.
While our study highlights the powerful possibilities that NLP presents for analyzing classroom discourse, it also draws attention to the limitations of relying solely on machine-generated outputs. Despite the growing excitement around NLP in education research, several scholars have cautioned against early overreliance on these tools due to the field’s nascency and the inherent complexities of educational phenomena (Hunkins et al., 2022; Zawacki-Richter et al., 2019). NLP models, while increasingly sophisticated, are still limited by the quality, diversity, and representativeness of the training data on which they rely. As such, there is a risk that important nuances, especially those related to cultural, linguistic, and contextual variation, may be misrepresented or entirely overlooked. A particularly pressing concern is that many existing training datasets underrepresent the languages, dialects, and communicative practices of historically marginalized populations. As scholars have pointed out, this lack of representation can lead to biased outputs that not only fail to capture the richness of these students’ discursive contributions but may also inadvertently reinforce deficit narratives or stereotypes (Donnelly et al., 2017; Nee et al., 2021). In educational settings, where language use is deeply tied to identity, culture, and power, such limitations must be approached with caution and critical awareness. Many NLP applications in education are still in the early stages of validation and must be carefully tailored to specific populations and instructional contexts. These limitations highlight the importance of using NLP tools not as a stand-alone method but as part of a broader, theory-informed research approach.
Our study exemplifies this kind of integrative methodology. While NLP analyses helped us identify broad discourse patterns, the qualitative video analysis enabled a deeper, more accurate interpretation of those patterns in context. For example, an initial glance at word frequency data might suggest that teachers were overly dominant and didactic in classroom talk. However, the qualitative data revealed that many of these teacher-driven segments were hybrid discourse moments, where teachers used their talk to connect science content with students’ everyday lives and everyday discourses. This integrated, context-sensitive approach reflects calls in the literature to pair NLP analyses with educational theory and understanding of local classroom contexts, especially when analyzing the discourse of historically marginalized students (Lucy et al., 2020; Nee et al., 2021; Zawacki-Richter et al., 2019). We argue that the value of NLP in education lies not in replacing human interpretation but in enhancing it by offering scalable insights that can be meaningfully interpreted by scholars and educators.
Practical Implications
This study offers a broad overview of the range of NLP methods available to education researchers and practitioners, with a specific focus on their application in urban science classrooms. By situating machine-generated NLP results within real-world instructional contexts, we demonstrate how these tools can generate timely, actionable insights to support teachers in making rapid, responsive improvements to their practice.
A promising example of NLP in practice, as demonstrated in this study, is the ability to provide accessible and interpretable descriptive analytics that classroom teachers can use to reflect on and refine their discourse practices. Rather than relying solely on distal metrics or abstract evaluation frameworks, our NLP results are based on proximal data, resulting in feedback that is immediately relevant to the classroom and can inform ongoing instructional decisions. For example, NLP analyses that identify the ratio of analytic (academic) versus social (informal) language, or the frequency of academic versus everyday topics, offer science teachers a window into the linguistic landscape of their classroom talk. Such insights can help educators recognize which scientific vocabulary terms are being used regularly and which everyday terms students rely on to make sense of disciplinary concepts. This information can be used to bridge academic content with students’ everyday experiences, supporting the development of hybrid discourse spaces. Descriptives such as the frequency of teacher versus student spoken words also provide proximal feedback that can be used to reflect on participation patterns, inviting teachers to consider whether their classroom discourse is overly teacher-dominated (e.g., following traditional IRE patterns) or whether it encourages dialogic, student-centered engagement.
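To make the teacher-versus-student word frequency descriptive concrete, the following is a minimal sketch of how such a share could be computed from a speaker-labeled transcript. It is not the pipeline used in this study: the `word_share` function, the `(speaker, utterance)` transcript format, the whitespace-based word count, and the sample utterances are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def word_share(transcript):
    """Return each speaker role's share of all spoken words.

    `transcript` is a list of (speaker, utterance) pairs. Words are counted
    by splitting on whitespace, a simplifying assumption.
    """
    counts = Counter()
    for speaker, utterance in transcript:
        counts[speaker] += len(utterance.split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {speaker: n / total for speaker, n in counts.items()}


# Hypothetical excerpt, invented for illustration only.
sample = [
    ("teacher", "What do you notice about the reaction"),
    ("student", "It got hot"),
    ("teacher", "Why"),
]

for speaker, share in word_share(sample).items():
    print(f"{speaker}: {share:.0%} of spoken words")
```

A share heavily skewed toward the teacher would not by itself indicate didactic talk; as the findings above suggest, such descriptives are best read alongside the content of the exchanges.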
The findings from this study also shed light on the ways teachers and students are moving fluidly between discursive moments that are more academic and more hybrid in nature. We present specific high-leverage discourse moves, such as open-ended questions that prompt students to make connections to their lives and broader societal phenomena, as well as examples of language coordination that support hybrid talk in middle school classrooms for today’s youth. As educators are working to intentionally create more inclusive and relevant science learning environments for their students, our findings provide concrete examples of hybridity so that teachers can reflect on the discursive resources in their classrooms and consider ways to authentically integrate them in the service of students’ science learning.
Finally, systemic improvements to classroom interactions with students require teachers in job-alike teams to collaboratively study and revise their practice, whether for academically productive talk or pedagogies that welcome the lived experiences of minoritized youth (Agarwal & Sengupta-Irving, 2019; Jensen et al., 2021). It is important to acknowledge the organizational contexts within which teachers’ ability to create hybrid connections for their students can be supported or thwarted due to factors such as availability of resources, autonomy in instructional decisions, and assessment policies (Bae et al., 2025; Haverly et al., 2020). The data for this study were part of a larger multiyear study in which participating teachers work in same-content lesson study teams to collaboratively plan, teach, study, and improve their instruction (Bae et al., 2016; Lewis, 2015). Ongoing professional learning opportunities that are grounded in teachers’ classroom practice are a crucial component of achieving the broader goals for equitable classroom talk.
Conclusion
This study advances the field’s understanding of hybrid discourse in urban middle school science classrooms by integrating NLP techniques with qualitative video analysis. Our findings showed that although traditional academic discourse patterns dominated most classroom talk, meaningful instances of hybrid discourse, in which everyday and academic discursive resources were integrated, were evident in social language, language coordination, and open-ended questions. Qualitative analyses further illuminated how teachers facilitated hybrid discourses in relation to specific disciplinary activities, the nuances of language coordination across contexts, and the FoK of today’s youth present in hybrid discourses. In addition to contributing insights into the nature and enactment of hybrid discourse, this study also demonstrates the value of combining computational and theory-driven qualitative methods to examine classroom talk. Finally, we discuss the practical implications of our findings for providing educators with timely, interpretable feedback for fostering equitable engagement in science classroom discourse.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This material is based upon work supported by the National Science Foundation Grant #1845048.
Authors
CHRISTINE LEE BAE is an associate professor of educational psychology in the Department of Foundations at Virginia Commonwealth University, 1015 West Main Street Oliver Hall 4052, Richmond VA, 23284; email:
KAMIL HANKOUR is a PhD student of educational psychology in the Department of Foundations at Virginia Commonwealth University, 1015 West Main Street, Richmond VA, 23284; email:
KIMBERLY WILLIAMSON is the director of educational analytics at Lehigh University, 27 Memorial Drive, Bethlehem, PA 18015; email:
MORGAN DEBUSK-LANE is a computational social scientist and principal data scientist at Gallup, 901 F Street NW, Washington, D.C. 20004; email:
