Abstract
Researchers have explored artificial intelligence (AI) applications across educational contexts; however, meta-analyses focusing on students with disabilities (SWDs) are lacking. This study examined the overall effect of AI-based interventions on SWDs’ learning outcomes in 29 (quasi-)experimental studies conducted globally. We used cultural-historical activity theory (CHAT) to explore whether the effect was moderated by participant-, AI-, AI-SWD interaction-, intervention-, and methodology-related characteristics. Results indicated a medium effect (Hedges’ g = 0.588) of interventions delivered through robots, computer software, and intelligent virtual reality (VR) systems. No moderators were statistically significant. Nevertheless, this study contributes to a holistic understanding of the historical dimensions of AI applications for SWDs and offers critical theoretical implications for future investigations. We call for more rigorous research on AI that not only ensures accessibility but also gives SWDs opportunities to take an agentic role in participating in and contributing to AI-mediated learning activities.
The rapid development of artificial intelligence (AI) has drawn increasing attention from researchers across disciplines seeking to develop intelligent machines and apply them to various aspects of human society (Haenlein & Kaplan, 2019). Early enthusiasm was evident in a widely quoted statement by Herbert Simon, one of the founding fathers of AI, who argued that “their (machines that think, that learn, and that create) ability to do these things is going to increase rapidly until—in a visible future—the range of problems they can handle will be coextensive with the range to which the human mind has been applied” (Simon & Newell, 1958, p. 8). This enthusiasm continues as the number and capabilities of AI technologies have grown exponentially in recent years. As with any innovation, AI has sparked debates and concerns about its political, economic, ethical, and practical implications (Coeckelbergh, 2020). Whether exciting or concerning, AI has coexisted, and will likely continue to coexist, with humans and augment human capabilities in many areas (Hamid et al., 2017).
Alongside swift AI progress, educational researchers have investigated various AI tools and their applications for purposes such as student learning, instruction, assessment, and administration, in which AI was designed to perform humanlike tasks (Hwang et al., 2020). A growing number of literature reviews and meta-analyses have analyzed research efforts and the effects of AI-based interventions or instructional practices for different learner populations (e.g., García-Martínez et al., 2023; Zheng et al., 2023). However, far fewer literature reviews have examined research on AI for students with disabilities (SWDs), and no meta-analysis has examined the impact of AI applications on SWDs’ learning outcomes. To address this gap, this study serves as the first meta-analysis of the effects of AI-based interventions for SWDs in pre-K–12 educational settings. To provide a comprehensive understanding of the existing literature, we deliberately include varied types of AI tools applied to support students with a range of disabilities.
We employ cultural-historical activity theory (CHAT; Engeström, 1987) to guide our inquiry, given its versatile analytical lenses for elucidating intricate facets of AI-human interactions. Originating in Vygotsky’s (1978) sociocultural theory of human development and learning, CHAT has been extensively discussed and used by researchers across disciplines, including (special) education (e.g., Roth & Lee, 2007; Waitoller & Artiles, 2013), educational technology (e.g., Gibson et al., 2023), and human-computer interaction (e.g., Kaptelinin & Nardi, 2012). These researchers have used CHAT to explain dynamic interactions between humans and social contexts through culturally mediated and historically accumulated tools and practices (e.g., technology, language). Beyond its interdisciplinary reach, CHAT is of particular interest to this study because its theoretical foundations explicate sociocultural aspects of disability, thus offering an alternative to the deficit model of disability (Gindis, 1999). Accordingly, we construct CHAT-guided moderators to examine factors that may influence the effect of AI-based interventions for SWDs, such as the roles of AI, types of AI-SWD interactions, and involvement of other stakeholders (e.g., educators, peers, families). Specifically, four research questions (RQs) guide our inquiry:
RQ1: What is the overall effect of AI-based interventions on SWDs’ learning outcomes, and what are the average effects across outcome areas (e.g., academic performance, social-emotional skills)?
RQ2: What types and roles of AI are associated with the largest effects on student learning outcomes?
RQ3: What types of interactions among AI tools, SWDs, and other stakeholders are associated with the largest effects on student learning outcomes?
RQ4: To what extent do learner characteristics (e.g., disability status, age), AI-related factors (e.g., type, role), types of AI-SWD interactions, intervention characteristics (e.g., publication type, intervention sessions, fidelity), and study design quality moderate the effects of AI-based interventions on SWDs’ learning outcomes?
In the following sections, we describe AI definitions and techniques as well as their applications in (special) education, aiming to provide a historical perspective that contextualizes the narrative surrounding AI applications for SWDs. Then, we present an overview of CHAT and discuss its analytical lenses that guide the identification of moderators.
A Brief Overview of AI Definitions and Techniques
The birth of AI can be traced back to 1950, when Alan Turing published the paper “Computing Machinery and Intelligence,” in which he posed the question “Can machines think?” Later, in 1956, John McCarthy, an American computer scientist and cognitive scientist, coined the term “Artificial Intelligence” with his colleagues during a summer workshop at Dartmouth College (Russell & Norvig, 2010). Since then, AI research has developed into an interdisciplinary area of study, drawing on advances and insights from fields such as philosophy, mathematics, economics, neuroscience, psychology, computer engineering, and linguistics. Because of this interdisciplinary nature, what constitutes AI and how to define it continually shift across and within disciplines, rendering AI a fairly loose umbrella term that encompasses diverse concepts, methods, and technologies (Luckin et al., 2016).
In a seminal textbook on AI, Russell and Norvig (2010) discussed four approaches to defining AI: as the study of computer systems that think like humans, act like humans, think rationally, or act rationally. According to the authors, a system that thinks or acts humanly is measured by its fidelity to human performance (e.g., hypothesized or observed human thinking, cognition, and behaviors), whereas a rational system is measured by whether it thinks or acts in a “right,” or logical, way through mathematical and computational models. These different approaches have produced a wide variety of AI techniques that drive a machine to think and/or act.
One type of AI technique is natural language processing (NLP), which refers to a collection of computational techniques that enable a system to understand, synthesize, and generate human language, such as speech and text (Chowdhary, 2020). Researchers have applied NLP in speech recognition, handwritten character recognition, text-to-speech conversion, and machine translation systems since the early development of AI (Russell & Norvig, 2010). Modern-day applications of NLP are often seen in voice-activated technology, such as Google Assistant and Amazon Alexa. In addition to NLP, computer vision is another set of AI techniques that enable a machine to perceive objects, extract information from visual inputs (e.g., images, videos), and act based on those inputs (Forsyth & Ponce, 2002). Common applications of computer vision include facial emotion detection, action recognition, and multimedia analysis (i.e., analyzing the semantic meanings of a multimedia document; Yao et al., 2000).
Relatedly, knowledge representation and reasoning is a subarea of AI focused on representing information in a form that enables a machine to interpret its logic, reason with it, and act accordingly (Van Harmelen et al., 2008). This type of AI technique builds analytical models by explicitly programming known relationships, procedures, and decision logic into machines so that they can answer questions, draw conclusions, and solve problems (e.g., expert systems for medical diagnosis; Janiesch et al., 2021). Advances in these AI techniques have driven the growth of intelligent agents (either robotic or software agents) running on a computing architecture (e.g., robotic devices, computers). These intelligent agents can capture information from their environment through various sensors, process that information, and act upon it through actuators (Russell & Norvig, 2010).
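To make the idea of explicitly programmed decision logic concrete, the following is a minimal sketch of a forward-chaining rule engine in Python. It is an illustrative assumption, not a reconstruction of any system cited in this article: the `Rule` class, the `forward_chain` function, and the tutoring facts and rules are all hypothetical. Every relationship the system “knows” is written by hand as an IF-THEN rule, and the engine keeps firing rules whose premises are satisfied until no new conclusions appear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """An IF-THEN rule: if every premise is a known fact, add the conclusion."""
    premises: frozenset[str]
    conclusion: str


def forward_chain(facts: set[str], rules: list[Rule]) -> set[str]:
    """Fire satisfied rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# Hypothetical rules authored by a human expert (nothing here is learned from data).
RULES = [
    Rule(frozenset({"missed_item_1", "missed_item_2"}), "struggling_with_fractions"),
    Rule(frozenset({"struggling_with_fractions"}), "recommend_visual_fraction_model"),
    Rule(frozenset({"struggling_with_fractions", "prefers_audio"}), "recommend_narrated_example"),
]

if __name__ == "__main__":
    observed = {"missed_item_1", "missed_item_2", "prefers_audio"}
    derived = forward_chain(observed, RULES) - observed
    print(sorted(derived))
```

Because the rules are authored rather than learned, every conclusion can be traced back to the rules that produced it, but the system cannot handle situations its authors did not anticipate.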
In recent years, machine learning and deep learning have emerged as dominant subfields of AI that have significantly improved the humanlike cognitive capabilities of intelligent machines in processing data for complex problem-solving and decision-making. Specifically, machine learning enables machines to learn from large amounts of data to identify patterns, build analytical models, and perform associated tasks (e.g., generating predictions, answers, and recommendations) without explicit programming (Janiesch et al., 2021). As a subset of machine learning, deep learning uses multilayered artificial neural networks, loosely inspired by the neural networks of the human brain, to process unstructured data such as text, images, and audio (Goodfellow et al., 2016). For example, generative AI, the latest breakthrough in AI, which includes large language models such as ChatGPT, leverages advanced techniques, including deep learning, to learn from massive datasets and produce humanlike responses to users’ written queries (van Dis et al., 2023).
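In contrast to hand-written rules, machine learning infers a decision rule from labeled examples. The sketch below, a minimal perceptron in plain Python, illustrates this under simplifying assumptions: a perceptron is a single artificial neuron with a step activation (deep learning stacks many such units, typically with smooth activations, into layers), the four training examples encoding logical AND are hypothetical, and `train_perceptron` and `predict` are illustrative names rather than any library’s API. The programmer never specifies the weights; they are adjusted whenever the model misclassifies an example.

```python
def train_perceptron(examples, epochs=20, lr=1.0):
    """Learn weights and a bias from (features, label) pairs with labels +1 or -1."""
    n_features = len(examples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if predict(weights, bias, x) != y:
                # Shift the decision boundary toward the misclassified example.
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
                bias += lr * y
                mistakes += 1
        if mistakes == 0:  # every training example is now classified correctly
            break
    return weights, bias


def predict(weights, bias, x):
    """Return +1 if the weighted sum plus bias exceeds zero, otherwise -1."""
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else -1


# Hypothetical training data: logical AND, labeled -1 (false) or +1 (true).
DATA = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]

if __name__ == "__main__":
    w, b = train_perceptron(DATA)
    print(w, b, [predict(w, b, x) for x, _ in DATA])
```

With these four examples the weights converge after a few passes because the two classes are linearly separable; a single perceptron cannot learn patterns such as XOR, which is one motivation for the multilayer networks used in deep learning.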
The Emergence of AI in Education
Accumulating research on intelligent machines and the learning sciences has led to the emergence of the field of AI in education (AIED), which brings together researchers from education, the learning sciences, and other disciplines to develop and use various types of intelligent technologies to support human learning (Roll & Wylie, 2016). The AIED community defines AI in many ways (please see online supplementary Table S1 for multiple widely cited definitions of AI by researchers from computer science, business, and education). Most AI definitions in education align with those in other disciplines and highlight the intelligent, humanlike capabilities that an agent system possesses to perform tasks. Educational AI technologies can serve as powerful computational tools in any aspect of teaching, learning, assessment, and administration (Chiu et al., 2023). These tools can simulate humanlike intelligence, such as cognition, functional abilities, adaptive behaviors, and decision-making capabilities, to learn, perceive, and process information (Hwang et al., 2020). Previous research has shown that the AIED community has largely focused on creating machines or systems that simulate human one-on-one tutoring and has made significant advances toward this goal over the past three decades (Roll & Wylie, 2016). Examples of such technologies include intelligent tutoring systems (ITSs), adaptive learning systems, and robotics, which can provide one-on-one tutoring support for individual students as they complete specific activities (Popenici & Kerr, 2017).
From early on, educational researchers have discussed the role of AI in teaching-learning interactions between humans and machines simulating humans as teachers or students (Baker, 1994; Salomon et al., 1991). Salomon et al. (1991) argued that AI is more beneficial when working with humans than when working for them. They posited that an “intelligent partnership” between humans and intelligent technology could aid higher-order cognitive processing and support joint intellectual performance, thus exceeding what humans can achieve alone (Salomon et al., 1991, p. 2). Building an intelligent partnership involves what researchers have called “distributed intelligence” across humans and technologies (Pea, 1993, p. 47). In this sense, intelligence is not an attribute of individual minds that form and transform mental representations of symbols; instead, intelligence is manifest in directing an activity to achieve shared goals and shaping the necessary elements of that activity (Pea, 2004). These early discussions provided valuable insights into AI applications from a pedagogical perspective.
As AI has become increasingly sophisticated and ubiquitous, the distribution of intelligence among humans, technologies, and environments is being reshaped. The role of AI in assisting learning has also evolved along with its expanding capacity to process, analyze, and make decisions based on large amounts of data from diverse sources at ever-increasing speed (Gillani et al., 2023). Researchers have thus explored how the expanding capabilities of various types of AI technology could simulate complex teaching and learning scenarios (Chen et al., 2020; Chiu et al., 2023). For instance, immersive virtual reality (VR), conversational agents, and the Internet of Things (i.e., the networked interconnection of everyday objects through ubiquitous intelligent systems) could be applied to provide interactive, exploratory, and collaborative learning environments involving multiple humans and agents, in which students are guided to explore complex topics (Roll & Wylie, 2016). Drawing on accumulating research on the potential of AI applications, researchers have suggested that modern AI can play different roles in supporting learning and teaching, such as intelligent tutor, intelligent tutee, learning tool or partner, and policy-making advisor (see detailed descriptions in Hwang et al., 2020).
Previous Reviews of AIED
Previous reviews have provided insights into topics and trends in the use of AI in educational contexts, covering diverse foci, such as technical development and engineering aspects (Zhai et al., 2021), the state of application and theory gaps (Chen et al., 2020), and bibliometrics of AIED research (Kabudi et al., 2021). Some reviews explored AI applications in specific disciplines or contexts, such as mathematics and science (Papadopoulos et al., 2020), early childhood (Su & Yang, 2022), and higher education (Hinojo-Lucena et al., 2019). Several meta-analyses have examined the impacts of AI on student performance or perceptions across K–16 settings (García-Martínez et al., 2023; Zheng et al., 2023). In their meta-analysis of 25 group-design studies, García-Martínez and colleagues found positive impacts of AI on student learning, with most studies focused on science, technology, engineering, and mathematics (STEM) learning. They also found larger effects on student learning in secondary and university settings, where more types of AI technologies were applied, than in elementary schools. Similarly, Zheng et al. (2023) found a large effect of AI on student learning achievement (Hedges’ g = 0.812) and a small effect on student perception (g = 0.208) based on 24 synthesized articles.
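The effect sizes cited here are expressed as Hedges’ g, a standardized mean difference with a small-sample bias correction. As a minimal sketch, assuming a two-group (treatment vs. control) design that reports means, standard deviations, and sample sizes, the commonly used textbook formulas can be written as follows. The function name and example numbers are hypothetical, and this section does not state which computational variant the cited meta-analyses used.

```python
import math


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Return (g, variance of g) for a treatment-vs-control comparison."""
    df = n_t + n_c - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / sd_pooled          # Cohen's d
    j = 1 - 3 / (4 * df - 1)                   # small-sample correction factor J
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    return j * d, j**2 * var_d


if __name__ == "__main__":
    g, var_g = hedges_g(mean_t=10.0, mean_c=8.0, sd_t=2.0, sd_c=2.0, n_t=20, n_c=20)
    print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

For two hypothetical groups of 20 students with means of 10 and 8 and a common SD of 2, Cohen’s d is 1.0, and the correction factor J = 1 - 3/151 shrinks it to g ≈ 0.98; the correction matters most when samples are small.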
Special Education Technologies and AI
To a certain extent, advances in AI techniques (e.g., software) and technologies (e.g., hardware) have driven the application of technology in special education. For example, special education researchers have invested significant effort in using speech recognition software as assistive technology (AT) and examining its effects on learning outcomes for SWDs, especially students with learning disabilities (Morphy & Graham, 2012). Other technologies, such as text-to-speech systems and word processors (early foci of AI development), have been used as AT to improve the accessibility of instructional materials for SWDs (e.g., Brown & Cavalier, 1992). These technologies are usually programmed to decode and convert visual, audio, and audiovisual information, thus possessing humanlike capabilities to talk to or listen to users. These capabilities help increase, maintain, or improve individuals’ functional capabilities (e.g., visual, hearing, communication, cognitive, and motor skills). More recent advances, such as machine learning and deep learning, have enhanced AT’s capabilities in deciphering information from complex human communication, expressions, and gestures (Zdravkova, 2022). Integrating these advanced AI techniques into AT solutions enables machines to become more efficient and accurate when “talking to” or “listening to” SWDs.
Additionally, research on special education technologies focuses on providing support for individual learners (Woodward & Rieth, 1997), which corresponds with one of the foci of AIED. Many early computer-based expert systems developed by special education researchers embodied AIED researchers’ vision of using AI to personalize learning for individual learners. For example, researchers developed a variety of early expert systems to improve the proficiency and consistency of procedures for determining a student’s eligibility for special education services (e.g., Hofmeister & Lubke, 1988). These expert systems promoted distributed intelligence or expertise by involving multiple stakeholders, such as teachers and psychologists, in decision-making processes (Woodward & Rieth, 1997). Other examples include using computer systems to aid human experts in identifying target behaviors and selecting appropriate methods for changing such behaviors (e.g., Hofmeister & Lubke, 1988), tracking student progress toward Individualized Education Program (IEP) goals (e.g., Parry & Hofmeister, 1986), and modeling SWD learning through prompts (e.g., Gerber et al., 1994).
Previous Reviews of AI in Special Education
To our knowledge, there are no meta-analyses and only a limited number of literature reviews specifically focused on AI for SWDs. In one of two recent literature reviews, Hopcan et al. (2023) synthesized 29 studies and summarized trends in research on AI in special education. Their findings revealed that the largest number of studies investigated AI for students with autism spectrum disorder (ASD), followed by specific learning disability (SLD), attention-deficit/hyperactivity disorder (ADHD), intellectual and developmental disabilities (IDD), multiple disabilities, and other disabilities. Whereas AI research for students without disabilities has largely focused on improving STEM learning, most studies for SWDs focused on skill development, although the review did not specify the types of skills. Moreover, software-based applications were investigated more frequently than other AI methods (e.g., robotics).
In the other review, Barua et al. (2022) reviewed 26 studies on AI-enabled personalized assistive tools for students with neurodevelopmental disorders (e.g., IDD, SLD, ASD, mental health disorders). Consistent with Hopcan et al. (2023), most studies in this review focused on students with ASD, SLD (specifically dyslexia), and ADHD. For example, many researchers conducted robot-assisted instructional or therapeutic interventions to improve learning or social skills for students with ASD, whereas others developed intelligent AT (e.g., speech recognition software, adaptive systems) to support learning or behavioral skills for students with SLD or ADHD.
Although these reviews categorized AI applications for students with different types of disabilities, they paid far less attention to how theories guided AI applications and what factors (e.g., student characteristics, interactions among stakeholders) might influence such applications. A literature review by Tlili et al. (2020) addressed this gap by using CHAT to explore interactions among SWDs, robots, and stakeholders (e.g., families, professionals), as well as how these interactions supported target outcomes. In line with other reviews, robots were mostly used for students with ASD, followed by students with intellectual disability (ID) or physical disabilities. Most learning activities identified in this review were robotic programming (e.g., building a robot) and robotic play (e.g., robots providing multimodal interactions), followed by literacy development, exploration and ideation, kinesthetic tasks (e.g., robots engaging students in producing motions and gestures), and artistic creation. However, the effects of robot-assisted instruction on learning outcomes for SWDs remain unclear.
In sum, although researchers have investigated the increasingly humanlike intelligence of technologies, less attention has been paid to the dynamic, interactive, and agential roles that AI and humans can share in achieving desired educational goals and outcomes (Knox et al., 2019). The extent to which interactions with AI and other factors affect learning for SWDs remains underexplored. In this study, therefore, we apply CHAT to conceptualize interactions among SWDs, AI, and other stakeholders and to assess the impact of relevant factors as moderators of the effects of AI-based interventions for SWDs.
Theoretical Framework: Cultural-Historical Activity Theory
Owing to its analytic power, CHAT has been described as “one of the best-kept secrets of academia” (Engeström, 1993, p. 64), “Vygotsky’s neglected legacy” (Roth & Lee, 2007, p. 186), and “a unique theoretical framework for perhaps the most comprehensive, inclusive, and humane practice of special education in the 20th century” (Gindis, 1999, p. 333). Since its introduction into Anglo-Saxon academia (e.g., Cole & Engeström, 1993; Engeström, 1987), CHAT has been used as a metatheory explaining human activity in various contexts (Roth & Lee, 2007). From the CHAT perspective, human development and learning occur when individuals interact with their social contexts. These interactions are reciprocal and dialectical: individuals continually influence and are influenced by their social contexts, and the two thus mutually adapt to each other (Gindis, 1999; Smagorinsky et al., 2016). Many researchers posit that the dialectical relationship between individuals and social contexts highlighted by CHAT helps bridge the problematic divide between the individual, biological dimensions and the collective, sociocultural dimensions of human development and learning (e.g., Roth & Lee, 2007).
A core aspect of Vygotsky’s (1929) sociocultural theory is that individuals develop psychological functions (e.g., attention, memory, emotion, higher-order thinking) through internalization and externalization processes. Internalization involves individuals incorporating external activities and cultural values into internal cognitive and behavioral functions, whereas externalization involves outwardly expressing internal thinking and mental processes using physical and psychological tools (e.g., technology, language; Cole & Engeström, 1993). However, a relatively neglected point is that Vygotsky’s conceptions of disability and his empirical work with children with disabilities contributed substantially to the formation of his theory (Gindis, 1999). According to Vygotsky, two types of disabilities impact individuals’ developmental trajectories: primary disabilities (i.e., specific biological conditions that influence individual differences) and secondary disabilities (i.e., a sense of inferiority resulting from stigmatization and deficit views of individual differences; Smagorinsky et al., 2016). Vygotsky argued that the problem resides in the social implications of disability rather than in individuals’ biological differences; thus, it is imperative to adopt material and psychological tools to compensate for individuals’ biological differences and, more importantly, to facilitate their cultural development and promote their social participation (Gindis, 1999).
Vygotsky’s conceptions of reciprocal interactions between humans and social contexts through culturally mediated tools are often referred to as first-generation CHAT; however, he did not articulate activity as a unit of analysis for these interactions (Engeström & Sannino, 2021). Later, Vygotsky’s students A. N. Leont’ev and A. R. Luria expanded CHAT’s unit of analysis by incorporating more concrete factors explaining societal, historical, and cultural influences on human psychological functioning (Roth & Lee, 2007). This expanded framework constitutes second-generation CHAT, which highlights the collective dynamics of subject, object, community, division of labor, rules, and mediating tools through ongoing interactions (Engeström & Sannino, 2021). Engeström (1987) illustrated this extended framework through a well-known triangle representation. We adapt the diagram to discuss the core elements of second-generation CHAT as applied to AI-SWD interactions (see Figure 1).

Figure 1. AI-human interactions through the lens of second-generation cultural-historical activity theory (CHAT).
Through the lens of CHAT, an activity system is an evolving, complex structure of collective human agency as a whole (Roth & Lee, 2007). Within an activity system, the subject is an individual or group (e.g., SWDs; depicted in the blank rectangle at the right of Figure 1) who engages in an activity and acts on their own motives and goals. The object of an activity is the motive behind the subject’s participation in the activity; it exists first as a material entity acted on by the subject and second as a desired outcome pursued by the subject. The subject uses external tools, which can be either physical (e.g., AI tools; shown in the blank rectangle at the top of Figure 1) or symbolic (e.g., language, social signs, scientific models), to act on and consciously achieve the object. In our adaptation, the object is defined as SWDs using AI tools to engage in a learning task (indicated in the darker gray rectangle). The desired outcomes of the activity system are changes in students’ psychological functions manifested in knowledge, skills, or behaviors (e.g., improved learning outcomes; indicated in the circle). The top triangle illustrates the subject-object orientation, or interaction, mediated by AI tools.
How SWDs interact with AI tools (depicted through arrows between subject and object) to develop psychological functions is shaped by their biological factors (e.g., vision, hearing, cognition) and by the technological affordances of AI tools (e.g., capturing information, acting upon received information). These interactions are also influenced by cultural-historical factors, such as educators’ cultural beliefs about and perceptions of SWDs and AI tools, which develop at a particular point in time and are adapted over time. These factors further define, or are defined by, the community, division of labor, and rules (displayed in the lighter gray rectangles) that mediate subject-object interactions. The community is the social context within which subjects interact with other individuals (e.g., educators) who share the same object. Community-subject interactions are mediated by rules that explicitly and implicitly regulate the subject’s actions toward the object (e.g., instructional practices, cultural norms, research procedures, codes of ethics).
Additionally, the division of labor defines which tasks are performed by whom toward the object in collective actions (Roth, 2004). This can involve horizontally dividing roles, tasks, and responsibilities among community members at the same level of the activity system, and vertically dividing power, status, and resources in ways that reflect different levels of decision-making and control within a community (Engeström, 1987). Dividing tasks and responsibilities separates individual goals from collective goals, which may lead to contradictions or tensions when individuals with different voices negotiate tasks and shared goals (Roth & Lee, 2007). Negotiation among multiple voices, which are rooted in individuals’ differing roles, perspectives, or histories in communities of practice, has the potential to expand learning for individuals or communities through the exploration of new concepts and practices (Engeström, 1987). It is worth noting that CHAT has since undergone two further generations of theorizing and research. Descriptions of these generations are beyond the scope of this study; interested readers can refer to Engeström and Sannino (2021) for detailed discussions.
Theoretically Guided Considerations for Main Effects and Moderator Analyses
We adopt second-generation CHAT to analyze AI-based interventions for SWDs, although we acknowledge that the literature may lack comprehensive descriptions of the intricate and dynamic AI-SWD interactions that CHAT foregrounds. Our intention is to leverage CHAT’s theoretical perspectives to understand the existing evidence on AI applications for SWDs and to uncover underexplored areas regarding AI-SWD interactions. Researchers have employed this kind of retrospective analysis through CHAT to provide data-informed directions for future research on human-computer interaction (Kaptelinin & Nardi, 2012).
To address RQ1, we examine the main effect of SWDs using AI as a mediating tool to complete learning tasks (i.e., AI-mediated subject-object interaction; see Figure 1). We hypothesize that this subject-object interaction is influenced by elements related to learner characteristics, AI-related factors, rules, involvement of community members, and shared responsibilities among them. Through the lens of CHAT, these elements interdependently influence complex AI-mediated interactions. We address this complexity in two steps. First, we synthesize information pertinent to individual elements. Second, we use synthesized information related to AI (i.e., types and roles; RQ2) to discern potential patterns indicating how AI mediated interactions among elements (RQ3). We group CHAT-driven moderators and other moderators essential for a meta-analysis into participant, AI, AI-SWD-community interaction, intervention, and methodology characteristics (RQ4).
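Estimating a main effect for RQ1 amounts to pooling study-level effect sizes into a single weighted average. The sketch below shows one common approach, the DerSimonian-Laird random-effects estimator, which allows true effects to vary across studies. It is an assumption for illustration only: this section does not specify the estimator, software, or weighting scheme used in the analysis, the function name is hypothetical, and the example inputs are invented rather than drawn from the synthesized studies.

```python
import math


def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    effects:   per-study effect sizes (e.g., Hedges' g)
    variances: per-study sampling variances of those effect sizes
    Returns (pooled effect, standard error, tau^2).
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / v for v in variances]                                    # fixed-effect weights
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)                                       # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]                        # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, se, tau2


if __name__ == "__main__":
    # Hypothetical study-level results, not data from this meta-analysis.
    g_values = [0.35, 0.80, 0.52, 0.10, 0.95]
    g_variances = [0.04, 0.09, 0.05, 0.06, 0.12]
    est, se, tau2 = random_effects_pool(g_values, g_variances)
    print(f"pooled g = {est:.3f}, 95% CI [{est - 1.96 * se:.3f}, {est + 1.96 * se:.3f}], tau^2 = {tau2:.3f}")
```

When all studies report the same effect, Cochran’s Q falls below its degrees of freedom and τ² is truncated at zero, so the estimate reduces to an ordinary inverse-variance (fixed-effect) average.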
Participant Characteristics
The participant characteristics investigated in this study include students’ disability status, age, and gender, based on the hypothesis that different student characteristics would affect how students used AI tools to interact with others. Such information helps clarify the relation between the object (i.e., SWDs using AI tools) and the desired outcomes (i.e., learning outcomes) of an activity system. We did not examine participants’ race/ethnicity as a moderator because classifications and census definitions of race/ethnicity vary across countries, and such data might not be comparable across studies.
AI-Related Characteristics
We hypothesize that the different types of AI used as mediating tools, and the different roles they played in the interventions, have varying impacts on student learning (RQ2). Although not specifically focused on SWDs, previous meta-analyses of AIED research reported multiple moderating effects of AI tools. For example, Zheng et al. (2023) examined moderators such as the role of AI (e.g., intelligent tutor), area of application (e.g., tutoring), hardware (e.g., tablet computers), software (e.g., adaptive systems), and AI technique (e.g., NLP). Within these moderator categories, the largest effects were found when AI served as a policy advisor (g = 2.875), provided personalized recommendations (g = 1.084), functioned as an agent system (g = 2.059), and operated through NLP (g = 1.071). In another meta-analysis, García-Martínez et al. (2023) found that AI-based VR yielded a larger effect (g = 2.01) than other types of AI. Given the lack of such information for SWDs, it is critical to investigate which AI tools were applied, how they were used, and what effects they had on the learning outcomes of SWDs.
Interaction Among AI, SWDs, and Community
We hypothesize that AI tools mediate student interactions with other members of the learning community (e.g., teachers, families) differently, leading to differential impacts on learning outcomes (RQ3). According to CHAT, AI-SWD interactions are influenced by regulating rules, such as instructional guidance, research protocols, or cultural norms; these rules further delineate the division of labor among all members of the community. In their CHAT-based review of robot-assisted instruction for SWDs, Tlili et al. (2020) treated intervention procedures and performance measures as regulating rules and found that these procedures and measures varied greatly across studies. They also found that most studies included special education professionals (e.g., educators, therapists) as community members who, among other things, facilitated student learning processes and provided feedback on the use of robots, whereas only a few studies involved families and peers. In the present study, we synthesize information on intervention or instructional features as regulating rules, as well as information on community members and the places where interventions took place. Moreover, we categorize potential patterns in interactions among AI, SWDs, and community members and how stakeholders divide labor or responsibilities in these interactions. Detailed descriptions of how we coded this information are provided in the Methods section.
Intervention and Methodology Design Characteristics
In addition, the moderating effects of intervention characteristics, methodology design, and study quality provide important information about the conditions under which AI interventions were most effective. Because previous reviews offer little guidance on these variables, we examine them as potential moderators of AI interventions for SWDs in an exploratory manner. These moderators include intervention duration and number of sessions, treatment fidelity, publication year, publication type, and study quality.
Methods
To guide the search and inclusion processes, we considered a technology an AI tool if it aligned with one or more components of the AI definitions provided in online supplementary Table S1. According to the AI report released by the U.S. Department of Education (2023), two elements distinguish AI from conventional educational technology: "detecting patterns in data" and "automating decisions about instruction and other educational processes" (p. 1). These elements capture modern applications of AI technology well. To provide a comprehensive and historical understanding of AI applications in special education, we also drew on the four broad approaches to studying AI discussed in Russell and Norvig (2010), which define technologies that think like humans, act like humans, think rationally, or act rationally. Additionally, we referred to the AI techniques discussed in the Introduction, such as NLP, computer vision, speech recognition, robotics, and machine learning, to determine whether a technology satisfied the broad definition of AI.
Literature Search Procedures
Databases and Search Strategies
We followed a three-step search procedure to identify relevant studies. First, we searched ERIC, PsycInfo, and PubMed for peer-reviewed journal articles; IEEE Xplore for articles and papers in disciplines such as computer science, electrical engineering, and electronics; and ProQuest Dissertations & Theses A&I for unpublished dissertations. Second, we conducted ancestry searches of previously published syntheses (e.g., Hopcan et al., 2023; Tlili et al., 2020) to identify studies not captured by the online database search. Third, we hand-searched nine journals, including Computers & Education (C&E), British Journal of Educational Technology (BJET), Journal of Computer-Assisted Learning (JCAL), Computers & Education: Artificial Intelligence (CAEAI), International Artificial Intelligence in Education Society (IAIED), Journal of Special Education Technology (JSET), Exceptional Children (EC), and Remedial and Special Education (RASE). We selected these journals because they represent high-ranked journals in educational technology (i.e., C&E, BJET, JCAL), the journals most relevant to AIED (i.e., CAEAI, IAIED), the top journal in special education technology (i.e., JSET), and the leading journals in special education (i.e., EC, RASE), ensuring comprehensive coverage of research on AI for SWDs.
Search Terms
We used a Boolean approach to search for relevant literature. Boolean search uses operators (e.g., "AND," "OR") to define relationships between terms in a search string (e.g., "both terms," "either term"), enabling researchers to specify the scope of their query (Gusenbauer & Haddaway, 2020). Our search string had three parts: (a) AI-related terms, (b) SWD-related terms, and (c) outcome-related terms (see supplementary Table S2 for detailed search terms). First, we created a comprehensive list of terms to identify AI-based technologies by cross-referencing terms used in previous AIED research (e.g., Chen et al., 2020; Hwang et al., 2020), including "intelligent tut*" and "intelligent learning." We also incorporated technologies frequently discussed as AI-driven tools, such as "conversational agent" and "recommend* system." We did not include technologies such as "virtual reality," "game-based learning," and "mobile learning" as search terms because these technologies do not always operate on AI. Terms such as "AI" or "intelligent system" allowed us to identify studies of these technologies when they were driven by AI. For example, VR becomes "intelligent" when AI techniques enhance the ability of the virtual world to interact with and respond to user actions (Luckin et al., 2016).
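As a rough illustration of how the three parts combine, the sketch below joins alternative terms within each part with OR and joins the parts with AND. The term lists are a small subset drawn from this section (the full lists are in supplementary Table S2), and the helper names are hypothetical; exact quoting and wildcard syntax differ across ERIC, PsycInfo, PubMed, and the other databases.

```python
# Hypothetical subsets of the three term groups; see supplementary Table S2
# for the lists actually used.
AI_TERMS = ["intelligent tut*", "intelligent learning", "conversational agent", "recommend* system"]
SWD_TERMS = ["special education", "disabilit*", "disorder", "emotional disturbance", "autism"]
OUTCOME_TERMS = ["performance", "impact", "cogniti*", "skill"]


def or_block(terms):
    """Join alternative terms with OR: a record matching any one term qualifies."""
    # Multiword phrases are quoted so they are searched as phrases.
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(*groups):
    """Join term groups with AND: a record must match at least one term per group."""
    return " AND ".join(or_block(group) for group in groups)


query = build_query(AI_TERMS, SWD_TERMS, OUTCOME_TERMS)
print(query)
```

Nesting OR inside AND is what lets each part stay broad while the combined string stays narrow.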
Second, we applied SWD-related terms, including generic terms (e.g., "special education," "disabilit*," and "disorder") and specific terms based on the disability categories defined in the Individuals with Disabilities Education Act (IDEA), such as "emotional disturbance" and "autism." Truncated stems marked with an asterisk capture a range of related terms; for example, "disabilit*" captured terms such as "students with disabilities," "learning disabilities," and "intellectual disabilities." We included "mental retardation" because the term was not replaced by "intellectual disability" in U.S. federal statute until Rosa's Law in 2010. Third, based on Bloom's taxonomy of educational objectives (Krathwohl, 2002), we used outcome-related terms, such as "performance," "impact," "cogniti*," and "skill," to identify studies that investigated the impacts of AI-based interventions on SWDs' learning outcomes.
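The effect of a truncated stem can be approximated with a regular expression, as in the minimal sketch below. It assumes that right truncation means "zero or more additional word characters," which is the common convention but may not hold in every database; `stem_to_regex` is a hypothetical helper.

```python
import re


def stem_to_regex(stem):
    """Translate a search stem such as 'disabilit*' into a case-insensitive regex.

    A trailing asterisk allows any run of further word characters; a stem
    without an asterisk must match a whole word exactly.
    """
    body = re.escape(stem.rstrip("*"))
    suffix = r"\w*" if stem.endswith("*") else ""
    return re.compile(r"\b" + body + suffix + r"\b", re.IGNORECASE)


pattern = stem_to_regex("disabilit*")
phrases = ["students with disabilities", "learning disabilities",
           "intellectual disability", "disorder"]
print([p for p in phrases if pattern.search(p)])
```

Here "disorder" is not matched, which is why it appears as a separate generic term in the search string.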
Limiters for Initial Search
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2021; see Figure 2) guidelines to search, screen, and identify relevant studies. We applied multiple limiters to narrow the initial search: (1) the document was written in English; (2) the participants were students in PK–12 settings; and (3) the study was published between January 1994 and January 2023 (i.e., when the search was conducted). We chose 1994 as the starting year because, according to Roll and Wylie (2016), it marks the beginning of early AIED research.

Figure 2. PRISMA literature screening and identification procedure.
All search records were imported into EndNote, the reference management software used in this study to organize literature search results. After duplicates across databases were removed, the first author conducted first-level screening of the titles and abstracts of 12,279 documents and identified 301 records for further screening. The first and second authors then independently screened the titles and abstracts of these records and retrieved the full texts of 91 records, which they read and evaluated independently against the inclusion and exclusion criteria described in the next section. Interrater reliability (IRR), calculated as Agreements / (Agreements + Disagreements) × 100, was 97% for the screening process, and all discrepancies were resolved by consensus.
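The agreement index above can be computed directly from two coders' decisions. The sketch below assumes each coder's decisions are stored as parallel lists; the 100 records and their labels are invented for illustration and are not the study's screening data.

```python
def percent_agreement(coder_1, coder_2):
    """IRR = Agreements / (Agreements + Disagreements) x 100."""
    if len(coder_1) != len(coder_2):
        raise ValueError("both coders must rate the same records")
    agreements = sum(a == b for a, b in zip(coder_1, coder_2))
    disagreements = len(coder_1) - agreements
    return 100 * agreements / (agreements + disagreements)


# Invented example: the coders disagree on 3 of 100 records.
coder_1 = ["include"] * 30 + ["exclude"] * 70
coder_2 = ["include"] * 27 + ["exclude"] * 73
print(percent_agreement(coder_1, coder_2))  # 97.0
```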
Inclusion and Exclusion Criteria
To be eligible for inclusion, a study had to meet the following criteria. First, it reported intervention(s) delivered via or facilitated by AI tools whose descriptions aligned with the AI definitions adopted in the present study (see online supplementary Table S1). We defined these interventions as AI-based programs or instruction focused on improving learning experiences for SWDs. Second, the intervention(s) focused on students with an identified disability or disabilities, or the study reported disaggregated data for these students when students without disabilities in PK–12 settings were also included in the sample. Third, the study used a randomized controlled trial or quasi-experimental design. We excluded studies that used single case design (SCD) methods because these designs are considered to lack generalizability to a population (e.g., Huskens et al., 2013). Additionally, because statistical significance is not the primary basis for drawing conclusions or mitigating publication bias in SCD, the statistical analyses commonly used to examine selective reporting and publication bias in group designs may not be suitable for SCD (Ledford et al., 2023). Excluding SCD studies kept the statistical methods consistent and the synthesized findings coherent. Fourth, the study reported at least one educational outcome (e.g., academic performance, behavioral, social-emotional) for SWDs resulting from students using or interacting with the AI tools; that is, the study included at least one quantitative measure of a learning outcome. Fifth, the study reported the statistics (e.g., means, standard deviations) needed to calculate Hedges' g.
Applying these criteria to the 301 abstracts yielded 91 reports that potentially met the inclusion criteria. We then reviewed the full texts of these reports and excluded 61 of them. Five types of studies were excluded. First, some studies merely examined AI tools or techniques as technological infrastructure and did not evaluate effects on SWDs' learning outcomes (e.g., Yun et al., 2016). Second, some studies investigated the role of AI tools in classifying or diagnosing SWDs with a focus on accuracy, such as robot-assisted techniques for analyzing social behaviors in students with ASD or diagnosing child-robot interaction (e.g., Petric & Kovacic, 2020), machine learning for classifying children with IDD or SLD (e.g., Dutt et al., 2022), hand gesture recognition for identifying sign language (e.g., Al-Hammadi et al., 2020), and speech recognition for identifying language disorder in bilingual children (e.g., Ahuja et al., 2022). Third, some studies did not include SWDs as participants or did not provide disaggregated data for SWDs (e.g., Berrezueta-Guzman et al., 2021). Fourth, some studies investigated the effect of human-student interactions on how AI tools performed (e.g., robot learning) but did not assess the outcomes of students as learning partners (e.g., Boucenna et al., 2014). Fifth, some studies did not meet (quasi-)experimental design quality standards (e.g., Vallefuoco et al., 2022), which are discussed further in the Study Quality section.
This process yielded 26 relevant studies. To improve the accuracy of study identification, the first and second authors rescreened the title/abstract dataset for misclassifications using Rayyan, an AI-powered screening tool that uses machine learning to score references by their likelihood of inclusion (Ouzzani et al., 2016). The included articles served as training data to improve Rayyan's identification accuracy. This procedure identified four additional records that met the inclusion criteria. We also identified two eligible reports from previous reviews of AI for SWDs, whereas hand searches of the selected journals yielded no additional eligible studies. Of these 32 studies, three (i.e., Lam, 2018; Nguyen et al., 2021; Voss et al., 2019) did not provide sufficient information for calculating effect sizes. We contacted the authors to obtain the relevant information but received no replies, so we excluded these studies. The final sample comprised 29 studies reported in 24 peer-reviewed articles and 4 dissertations.
Coding Procedure
Guided by CHAT and the research questions, we developed a coding scheme consisting of six groups of codes: (a) learning outcomes, which resulted from SWDs using AI to pursue the object (subject-object interaction); (b) participant characteristics (subject); (c) AI-related characteristics (tool); (d) AI-SWD-community interaction-related characteristics; (e) intervention characteristics; and (f) methodology design characteristics (see Table 1 for definitions of codes and coding criteria). Given the exploratory nature of these moderators, we used an abductive approach that combines deductive and inductive coding strategies (Corbin & Strauss, 1990) for AI type, AI role, and interaction type. For example, we first deductively coded the type of AI in terms of the primary hardware and software that served as the technological infrastructure in each study. We then inductively coded technological functionalities to determine whether distinctive categories emerged from the reviewed studies. Information gleaned from this abductive process guided our categorization of AI types and roles.
Study Quality
We used the What Works Clearinghouse (WWC) group design standards, version 4.1 (What Works Clearinghouse [WWC], 2020), to evaluate the quality of each included study. Studies received the highest rating, Meets WWC Design Standards Without Reservations, if the study was an RCT with low sample attrition. Studies received the medium rating, Meets WWC Design Standards with Reservations, if the study was a QED or a high-attrition RCT that satisfied the WWC's baseline equivalence requirement (i.e., treatment and comparison groups were similar at baseline). Studies received the lowest rating, Does Not Meet WWC Group Design Standards, if the QED or high-attrition RCT failed to satisfy the baseline equivalence requirement. We applied the WWC's cautious (rather than optimistic) attrition assumptions to label each RCT as low or high attrition based on the combination of overall attrition (i.e., the percentage of the entire sample lost) and differential attrition (i.e., the percentage point difference in attrition rates between the intervention and comparison groups). See the WWC Standards Handbook 4.1 for details on the attrition model.
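The rating logic described above can be summarized as a small decision function. This is a sketch, not a WWC tool: the attrition rates follow the definitions given in the text, but whether a given pair of rates counts as high attrition must be read from the WWC attrition boundary under the cautious assumptions (Standards Handbook 4.1), which is not reproduced here, so `high_attrition` is passed in as an input. The function names are hypothetical.

```python
def attrition_rates(assigned_t, analyzed_t, assigned_c, analyzed_c):
    """Return (overall, differential) attrition in percentage points."""
    lost_t = assigned_t - analyzed_t
    lost_c = assigned_c - analyzed_c
    # Overall: share of the entire randomized sample missing from the analysis.
    overall = 100 * (lost_t + lost_c) / (assigned_t + assigned_c)
    # Differential: gap between the two groups' attrition rates.
    differential = abs(100 * lost_t / assigned_t - 100 * lost_c / assigned_c)
    return overall, differential


def wwc_rating(design, high_attrition, baseline_equivalent):
    """Map design ('RCT' or 'QED'), attrition status, and baseline equivalence to a rating."""
    if design == "RCT" and not high_attrition:
        return "Meets WWC Design Standards Without Reservations"
    # QEDs and high-attrition RCTs must demonstrate baseline equivalence.
    if baseline_equivalent:
        return "Meets WWC Design Standards with Reservations"
    return "Does Not Meet WWC Group Design Standards"


print(attrition_rates(20, 18, 20, 20))  # (5.0, 10.0)
print(wwc_rating("QED", high_attrition=False, baseline_equivalent=True))
```

Keeping the boundary lookup outside the function makes explicit that the cautious versus optimistic choice lives in that lookup, not in the rating rules themselves.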
Interrater Reliability
A consensus on the coding scheme was reached between the first and second authors before they coded independently. An average IRR of 92.1% was obtained across all coded variables, with 95.4% for participant characteristics, 86.21% for AI-related characteristics, 94.4% for intervention characteristics, and 93.1% for methodology design characteristics. The two coders resolved discrepancies through open discussion. Each coder first independently reviewed and reflected on their own codes; the coders then exchanged the rationales for their coding decisions to foster mutual understanding. This process ensured a thorough and collaborative resolution of coding disagreements. After all discrepancies were resolved, we reached 100% agreement on the coding of all included records.
Analytic Strategies
Effect Size Calculation
We extracted the means and standard deviations (SDs) of pre- and posttest learning outcomes and the sample sizes of the treatment and control groups in each study. We calculated standardized effect sizes by dividing the difference between the adjusted means of the treatment and control groups by the pooled SD. We then computed Hedges' g, which corrects for small-sample bias, to obtain unbiased effect size estimates. Random-effects meta-analysis and meta-regression models were fitted to evaluate the main and moderating effects of AI-driven interventions using R (R Core Team, 2013) and the R package robumeta (Fisher et al., 2017).
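The calculation can be sketched as follows for a single outcome. For simplicity, the sketch uses unadjusted posttest means and the usual pooled-SD formula, whereas the study used adjusted means. The small-sample correction J = 1 - 3/(4df - 1) and the variance formula are the standard textbook forms rather than details reported in the text, and the function name is hypothetical.

```python
import math


def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, variance of g) for a treatment-control comparison."""
    df = n_t + n_c - 2
    # Pooled SD: each group's variance weighted by its degrees of freedom.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / sd_pooled          # standardized mean difference
    j = 1 - 3 / (4 * df - 1)                   # small-sample correction factor
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return j * d, j ** 2 * var_d


g, var_g = hedges_g(12.0, 4.0, 10, 10.0, 4.0, 10)
print(round(g, 4), round(var_g, 4))
```

With 10 students per group, J shrinks d = 0.5 by roughly 4%, which shows why the correction matters for the small samples typical of special education studies.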
To address RQ1, we fitted an intercept-only random-effects model (i.e., without predictors) to calculate the overall average effect size of AI interventions for SWDs based on all estimated effect sizes. We used robust variance estimation (RVE; Hedges et al., 2010) within the random-effects models to account for presumed between-study heterogeneity in participant samples, AI-related characteristics, and interventions. The RVE technique also adjusts standard errors to account for dependence among effect sizes due to multiple correlated effects within studies (Pustejovsky & Tipton, 2022). To assess heterogeneity, we evaluated I² to gauge consistency among studies and τ², with corresponding prediction intervals, to describe the expected range of true intervention effects at the individual study level (J. P. Higgins & Thompson, 2002). To address RQs 2 and 3, we fitted separate intercept-only random-effects models for five categorical variables (disability status, learning outcome, AI type, AI role, and interaction type) to calculate the average effect size within each subgroup.
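To make the RVE idea concrete, the sketch below computes an intercept-only estimate with correlated-effects weights and a sandwich (cluster-robust) standard error, then an approximate prediction interval. It is deliberately simplified and is not robumeta's implementation. Here τ² is supplied rather than estimated from the data and the assumed ρ, no small-sample correction is applied to the standard error, and a normal critical value is used by default. All names and example values are hypothetical.

```python
import math
from collections import defaultdict


def rve_intercept(effects, tau2):
    """Pooled effect and robust SE from (study_id, g, var_g) triples.

    tau2 is treated as known; no small-sample correction is applied.
    """
    studies = defaultdict(list)
    for study_id, g, var_g in effects:
        studies[study_id].append((g, var_g))

    # Correlated-effects weight for every effect size in study j:
    # 1 / (k_j * (mean sampling variance in j + tau2)), so a study's total
    # weight does not grow just because it reports more effect sizes.
    weight = {}
    for sid, rows in studies.items():
        k = len(rows)
        mean_var = sum(v for _, v in rows) / k
        weight[sid] = 1 / (k * (mean_var + tau2))

    total_w = sum(weight[sid] * len(rows) for sid, rows in studies.items())
    beta = sum(weight[sid] * g for sid, rows in studies.items() for g, _ in rows) / total_w

    # Sandwich variance: residuals are summed within each study before
    # squaring, which tolerates unknown correlation among a study's effects.
    meat = sum((weight[sid] * sum(g - beta for g, _ in rows)) ** 2
               for sid, rows in studies.items())
    return beta, math.sqrt(meat) / total_w


def prediction_interval(beta, se, tau2, crit=1.96):
    """Approximate range of true effects: beta +/- crit * sqrt(tau2 + se^2)."""
    half_width = crit * math.sqrt(tau2 + se ** 2)
    return beta - half_width, beta + half_width


effects = [("A", 0.2, 0.1), ("A", 0.4, 0.1), ("B", 0.6, 0.1), ("B", 0.8, 0.1)]
beta, se = rve_intercept(effects, tau2=0.05)
print(round(beta, 3), round(se, 3), prediction_interval(beta, se, tau2=0.05))
```

The prediction interval is wider than the confidence interval because it adds τ² to the uncertainty about the mean, which is why a large τ² such as the one reported here implies substantial study-to-study variation in true effects.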
To address RQ4, we fitted weighted random-effects meta-regression models to estimate possible moderating effects that might explain sources of between-study heterogeneity. Effect sizes served as the outcome, and all moderators, including participant-, AI-, interaction-, intervention-, and methodology-related characteristics, were entered simultaneously into the meta-regression models to determine whether they significantly moderated the effect of AI interventions on SWDs' learning.
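A simplified version of such a meta-regression is sketched below. Categorical moderators are dummy-coded against a reference group (as in the Table 4 note, where instructional/learning tools are the reference for AI role), and coefficients are estimated by weighted least squares with random-effects weights 1/(v_i + τ²). Unlike the RVE models used in the study, the standard errors here are model-based and treat effect sizes as independent. τ² is assumed known, and the role labels and numbers are invented.

```python
import numpy as np


def dummy_code(labels, reference):
    """One 0/1 column per non-reference level, in sorted order."""
    levels = sorted(set(labels) - {reference})
    X = np.array([[1.0 if lab == lvl else 0.0 for lvl in levels] for lab in labels])
    return X, levels


def meta_regression(g, var_g, X, tau2):
    """WLS meta-regression; returns coefficients (intercept first) and model-based SEs."""
    g = np.asarray(g, dtype=float)
    X = np.column_stack([np.ones(len(g)), X])
    w = 1.0 / (np.asarray(var_g, dtype=float) + tau2)
    XtW = X.T * w                      # equals X.T @ diag(w)
    cov = np.linalg.inv(XtW @ X)
    return cov @ (XtW @ g), np.sqrt(np.diag(cov))


roles = ["instructional", "instructional", "social-emotional", "social-emotional",
         "teachable", "teachable"]
X, levels = dummy_code(roles, reference="instructional")
g = [0.4, 0.4, 0.5, 0.5, 1.1, 1.1]
coef, se = meta_regression(g, [0.05] * 6, X, tau2=0.1)
print(levels, np.round(coef, 3))
```

Each dummy coefficient is the estimated difference from the reference group, so the intercept is the reference-group mean effect.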
Data Preparation
Before the statistical analyses, we screened the data for missingness and for categorical moderators with small subgroup samples. Two categorical moderators (i.e., disability status and learning outcome) had small subgroups, so we collapsed categories within them to increase statistical power (Hedges & Pigott, 2004). For disability status, we collapsed "SLD" (k = 6), "IDD" (k = 3), and "Deafness" (k = 1) into "Disability other than ASD." For learning outcomes, we merged "daily life skills" (k = 1) into "other skills" (k = 2). Moreover, because most interventions (n = 28) provided step-based guidance and feedback, we did not identify distinctive categories of instructional practices as regulating rules. Likewise, most studies (n = 25) were conducted in special education or other non-general education settings, and the remaining four did not specify the setting. Given the lack of meaningful comparisons, we excluded these variables from the moderator analyses. Finally, because AI type and AI role were highly correlated, we excluded AI type from the moderator analyses to avoid multicollinearity.
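The category collapsing amounts to a simple relabeling map. The sketch below reproduces the disability-category study counts reported in this paper (19 ASD studies, i.e., 65.5% of 29, plus the 6 SLD, 3 IDD, and 1 deafness studies). The mapping dictionary and function are hypothetical illustrations of the recoding, not the authors' script.

```python
from collections import Counter

# Merges described in the text; every other label is kept unchanged.
COLLAPSE = {
    "SLD": "Disability other than ASD",
    "IDD": "Disability other than ASD",
    "Deafness": "Disability other than ASD",
    "daily life skills": "other skills",
}


def collapse(labels):
    """Relabel sparse categories with their merged category."""
    return [COLLAPSE.get(label, label) for label in labels]


disability = ["ASD"] * 19 + ["SLD"] * 6 + ["IDD"] * 3 + ["Deafness"] * 1
print(Counter(collapse(disability)))  # Counter({'ASD': 19, 'Disability other than ASD': 10})
```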
Sensitivity Analyses
We conducted sensitivity analyses to evaluate the robustness and reliability of the overall effect size estimate (Harwell & Maeda, 2008). We evaluated robustness by comparing the results of models with and without outliers. We also varied ρ, the assumed correlation among effect sizes within studies, from .00 to 1.00 to assess how sensitive the random-effects results were to this assumption. The overall effect size (g = 0.588) and variance estimate (τ² = 0.52) remained unchanged, suggesting that the results obtained under the default value (ρ = .80) were robust and reliable.
Publication Bias
We used a contour-enhanced funnel plot and Egger's regression test with RVE (Egger et al., 1997) to visually and statistically assess publication bias. The contour-enhanced funnel plot showed an asymmetric, skewed distribution of effect sizes (see Figure 3). Further, the regression test showed that the standard errors of the effect sizes significantly predicted the effect sizes, β = 2.837, df = 237, p < .001, indicating the presence of publication bias (Rodgers & Pustejovsky, 2021). We then applied the trim-and-fill method, which imputed 61 studies to adjust for publication bias. The adjusted effect size, g = 0.2694, 95% CI [0.108, 0.431], t = 3.28, p = .001, was smaller than the original estimate but remained statistically significant.
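The regression test can be sketched as a weighted regression of effect sizes on their standard errors, where a slope reliably different from zero signals funnel-plot asymmetry. This minimal version uses inverse-variance weights and treats every effect size as independent, so it omits the RVE adjustment the study applied. The function name and data are hypothetical.

```python
import numpy as np


def egger_regression(g, se):
    """Regress g on SE with weights 1/SE^2; return (slope, z statistic for slope)."""
    g = np.asarray(g, dtype=float)
    se = np.asarray(se, dtype=float)
    X = np.column_stack([np.ones_like(se), se])
    w = 1.0 / se ** 2
    XtW = X.T * w
    cov = np.linalg.inv(XtW @ X)  # model-based covariance with known sampling variances
    beta = cov @ (XtW @ g)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])


# Invented data in which larger standard errors go with larger effects.
se = [0.1, 0.2, 0.3, 0.4, 0.5]
g = [0.25, 0.55, 0.70, 1.00, 1.20]
slope, z = egger_regression(g, se)
print(round(slope, 3), round(z, 2))
```

A positive slope means that less precise (typically smaller) studies report larger effects, the pattern that trim-and-fill then attempts to correct.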

Figure 3. Contour-enhanced funnel plot.
Results
In this study, we analyzed 29 studies from 24 peer-reviewed articles and 4 dissertations published between January 1994 and January 2023 that reported (quasi-)experiments investigating the effects of AI-based interventions for SWDs. One dissertation included two studies with independent samples of participants, which were treated as separate studies. Descriptions of each study and full citations are provided in online supplementary Tables S3 and S4. In alignment with our theoretical framework, we integrated descriptive results (Table 2), the effects of AI-driven interventions across different groups (Table 3), and moderating effects (Table 4) to present the findings. Specifically, we report on four areas: (a) the overall effects of AI-based interventions on three identified learning outcomes of SWDs and participant-related characteristics (i.e., outcomes of subject-object interactions); (b) the effects of three types of AI tools and three types of AI roles; (c) the effects of two types of interactions among AI, SWDs, and community, as well as associated factors (e.g., setting, rule); and (d) other moderating effects.
Table 2. Descriptive statistics of learning outcome, participant, AI, interaction, intervention, and methodology characteristics
Note. ASD = autism spectrum disorder; SLD = specific learning disability; IDD = intellectual and developmental disabilities; RCT = randomized controlled trial; QED = quasi-experimental design. Studies were coded multiple times when information provided on community members fell under multiple categories.
No study was coded as focused on using AI tools to enhance students’ behavioral outcomes.
Table 3. The effect of AI-driven interventions on learning outcomes of students with disabilities
Note. k = number of effect sizes; 95% CI = lower bound and upper bound of the confidence interval; τ² = between-study variance. * = p-value < .05; ** = p-value < .001.
Table 4. Moderation analysis of the effect of AI interventions through the lens of CHAT
Note. ASD = autism spectrum disorder; SLD = specific learning disability; IDD = intellectual and developmental disabilities. All covariates and moderators were entered into one model. Subgroup comparisons within categorical moderators are all listed in the model. The second group in each comparison is the reference group (e.g., in social-emotional coaches vs. instructional/learning tools, instructional/learning tools is the reference group in the dummy coding of AI role). 95% CI = lower bound and upper bound of the confidence interval. The data related to age and interval are standardized.
Overall Effects of AI for SWDs: Outcomes of Subject-Object Interactions (RQ1)
To analyze the overall effect of AI interventions on SWDs' learning outcomes, we included 239 effect sizes from 41 independent samples, using RVE with the within-study effect size correlation specified as .80. The overall estimated effect of AI-based interventions for SWDs was statistically significant, g = 0.588, 95% CI [0.349, 0.826], p < .01. The true effects appeared heterogeneous across studies, with RVE-based τ² = 0.52 and I² = 76.17%. Most studies (65.5%) focused on social-emotional learning, followed by academic learning (24.1%), other skills (6.9%), and daily life skills (3.4%). The highest average effect size was found for the academic group (k = 80, g = 0.929, 95% CI [.402, 1.460], p < .001, τ² = 0.714). However, differences among the learning outcome subgroups were not statistically significant.
Participant-Related Characteristics
With respect to learner characteristics in the subject-object interaction, most interventions focused on students with ASD (65.5%), followed by students with SLD (20.7%), IDD (10.3%), and students who are deaf (3.4%). This distribution reflects the predominant use of robots to support social-emotional learning for students with ASD. However, interventions were more effective for students with SLD, IDD, or deafness (k = 89, g = 0.952, 95% CI [.548, 1.360], p < .001, τ² = 0.599) than for students with ASD (k = 150, g = 0.368, 95% CI [.075, .661], p < .001, τ² = 0.458), although the difference between these two groups was not statistically significant. Moreover, neither participant age nor gender had a moderating effect.
AI-Related Characteristics: AI Types and Roles (RQ2)
Types of AI
Given that multiple devices and software systems could be applied to form the technological infrastructure of an AI tool and enable the tool to execute multiple functions, we integrated information on hardware, software, and functions to synthesize AI types. This approach helped identify three distinctive types: robots, computer software, and intelligent VR systems. Across these subgroups, the largest effect was found in computer software (k = 74, g = 0.959, 95% CI [.258, 1.660], p < .001, τ² = 0.264), whereas the effect of intelligent VR systems was not statistically significant (g = 0.528, 95% CI [−.176, 1.23]). As indicated previously, we did not perform moderator analysis for the AI types to control for multicollinearity.
Twenty studies (69.0%) investigated the use of robots in such areas as therapeutic interventions for students with ASD. Most of these studies used humanoid robots equipped with various sensors (e.g., cameras, motion detectors, eye contact detectors) and image processing techniques, which allowed them to capture information during interactions with students. Many robots contained actuators (i.e., motors) that allowed their head, arms, or legs to produce motion related to an instructional task. Most robots were operated by humans using the Wizard of Oz method, except for a robot toy programmed to operate autonomously (e.g., Koch, 2018). From the student perspective, however, the robot interacted with them directly by providing instruction, guidance, prompts, or feedback. Overall, these robotic systems exhibited rudimentary humanlike functions, primarily physical movements, with limited humanlike thinking or reasoning capacities.
Six interventions (20.7%) were carried out through software systems operating on regular computers or tablets, including speech recognition, expert systems, and ITS. Three early studies (e.g., E. L. Higgins & Raskind, 2000) investigated the effects of speech recognition software on student learning outcomes, especially for students with SLD. Contemporary applications of this type of software integrated more advanced AI techniques (e.g., artificial neural networks; Felix et al., 2017), increasing the capacity of computers to identify patterns from data on student-technology interaction and respond to individual learning needs more efficiently. Moreover, we identified an early study on expert systems (Wilson, 1997) and a more recent study on ITS (Xin et al., 2017). The ITS examined in Xin et al. (2017) was designed to provide heuristic prompting (e.g., feedback, scaffolding) to engage students with SLD in constructing mathematical ideas. However, the researchers did not elaborate on techniques that made the tool “intelligent.”
Three studies (10.3%) focused on intelligent VR. Unlike traditional VR, these systems incorporated automation techniques, including computer vision (i.e., Lorenzo et al., 2016), automatic perspective correction and motion tracking (i.e., Ip et al., 2018), and the Internet of Things (Park et al., 2022), which could capture data from various sources in the environment. From a pedagogical perspective, these systems provided guidance, feedback, or adapted content based on individual responses to environmental stimuli.
Roles of AI
We identified three AI roles in facilitating learning for SWDs: providing support as social-emotional companions or coaches, functioning as instructional or learning tools, and acting as teachable agents. These roles are highly correlated with the three types of AI, but they do not map onto them one-to-one. The largest effect size was observed for teachable agents (k = 18, g = 1.100, 95% CI [.672, 1.52], p < .001, τ² = 0.264); however, the moderator analysis did not reveal statistically significant differences among the effects of the different AI roles. Social-emotional coaches or companions (n = 19) supported SWDs in developing social, emotional, or communicative skills. Instructional or learning tools (n = 8) supported students in accessing learning content or focusing on critical elements of learning. These tools include five of the six computer software interventions (i.e., ITS [n = 1], speech recognition [n = 4]), one intelligent VR system, and two robotic systems. In the teachable agent category (n = 2), AI tools received instruction from students who acted as teachers or tutors. The study by Wilson (1997) focused on instructing students who were deaf to program expert systems. Similarly, Brainin et al. (2022) engaged students with SLD in programming a robot toy to enhance their multisensory learning experiences.
Interactions Among AI, SWDs, and Community (RQ3)
Interaction Type
We identified two distinctive types of interactions among stakeholders: triadic (n = 17) and dyadic (n = 12). Triadic interactions involved active participation from three parties of an activity system: AI tools, students, and community members (e.g., interventionists, educators, families/guardians). Dyadic interactions involved, from the student perspective, active participation from two parties: AI tools and SWDs. Dyadic interaction (k = 97, g = 0.973, 95% CI [.511, 1.430], p < .001, τ² = 0.834) was more effective than triadic interaction (k = 142, g = 0.385, 95% CI [.114, .656], p < .001, τ² = 0.392), although the difference was not statistically significant. From the CHAT perspective, these interactions were constrained by the extent to which community members shared responsibilities.
In triadic interactions, AI tools or community members mediated SWDs' interactions with the other party. For example, community members provided prompts, guidance, or instruction for students through AI tools as co-instructors or co-therapists (e.g., Lorenzo et al., 2016; Marino et al., 2020). In the Marino et al. (2020) study, researchers designed and deployed a humanoid robot for a cognitive behavioral therapy intervention for students with ASD. During interventions, the partially controlled and partially autonomous robot acted as a co-therapist, taking turns with a therapist to provide emotional and communication cues and prompts for students. The therapist was able to determine the robot's behaviors (e.g., response modes for reinforcement) prior to each session and change these behaviors (e.g., movements, speech) in real time during the session. In other cases, community members received prompts or guidance from AI tools to provide feedback or instruction for students. For example, De Korte et al. (2020) designed a robot-assisted intervention wherein a robot, controlled by a therapist, modeled for parents (as community members) how to practice effective strategies for prompting social-communicative skills with their child. In such interactions, community members shared a certain degree of responsibility with AI for instructing or guiding SWDs through step-by-step learning activities. Interactions among stakeholders could also occur asynchronously. For example, the ITS examined in Xin et al. (2017) was embedded with an alert or hint mechanism that could prompt educators to support students when needed.
In dyadic interactions, SWDs interacted mainly with technology during learning or intervention sessions. Either the AI tool was controlled by human operators without the students’ awareness (e.g., the Wizard of Oz method; Pop et al., 2013), or it operated autonomously (e.g., Koch, 2018). In such cases, from the student perspective, AI tools bore the primary responsibility for instruction. In some instances, community members were involved in the intervention to provide ancillary support (e.g., determining content behind the scenes, solving technical problems) but did not provide instruction. In others, community members (e.g., family members) were present in the study setting to help students feel safe, to observe student behaviors, or to rate student learning. From the students’ perspective, these community members did not directly provide instruction or take part in the students’ interactions with AI to complete learning tasks.
Interaction Setting, Community Members, and Regulating Rules
Most interventions (n = 25) were conducted in special education settings (e.g., resource rooms, special schools), research centers, laboratories, and clinics; the remaining four studies did not specify their intervention settings. The community members identified across the studies were mainly researchers, therapists, educators, and families/caregivers. Most interventions (n = 28) focused on providing step-based instruction supported or delivered by AI. A few studies (e.g., Ip et al., 2018) investigated game scenarios or game-based learning. We nonetheless categorized these as step-based because, like the other interventions, their game activities followed predefined procedures and aimed to provide step-by-step guidance.
Other Moderating Effects
To answer RQ4, we conducted moderator analyses to assess whether the overall effect of AI-based interventions was influenced by relevant moderators (see Table 4). All analyses controlled for pretest outcomes. As indicated previously, results revealed no statistically significant differences across learning outcomes, participant characteristics, AI roles, and interaction types. Likewise, the other intervention- and methodology-related variables showed no statistically significant moderating effects. The publication-year moderator was of particular interest because of its relevance to CHAT’s historical dimension. Early studies of speech recognition software (conducted before 2013) corresponded to the early development phase of AI. In contrast, all included studies of robotic systems and intelligent VR systems were conducted after 2013 and thus represent more contemporary research efforts on AI for SWDs. Contrary to our expectations, more recent studies (conducted between 2014 and 2023) did not yield larger effects than earlier studies (conducted before 2013).
Discussion
The unprecedented attention currently paid to the potential and perils of AI in education warrants careful consideration by policymakers, researchers, and practitioners. This study sought to establish a foundational understanding of AI-driven interventions for SWDs. It focused on identifying their effects on student learning outcomes and on exploring potential moderators of these effects through the lens of CHAT. Results revealed a medium average effect (g = 0.588) on SWDs’ learning for interventions delivered through robots, computer software (i.e., speech recognition, ITS, expert systems), and intelligent VR systems. Most studies examined robot-assisted interventions for their effects on the social, emotional, or communicative skills of students with ASD. AI-enabled computer software benefited SWDs in areas such as enhanced accessibility and multisensory experiences. However, the effect of intelligent VR systems was not statistically significant. Furthermore, AI serving as a social-emotional coach exhibited a smaller effect than AI in the other two roles (i.e., instructional/learning tools, teachable agents), although the differences among these effects were not statistically significant.
Contrary to our hypotheses, we found no statistically significant moderators related to SWDs, AI type and role, AI-SWD-community interaction, intervention, or methodology characteristics. One possible explanation for these null moderating effects is variation across AI tools. Another is the small number of studies within some moderator categories, which may have limited the statistical power to detect moderating effects (Egger et al., 1997). Despite the null moderating effects, the retrospective analysis guided by CHAT allowed us to explore dynamic interactions among AI tools, SWDs, and community through historical and theoretical lenses. Next, we discuss these findings in relation to CHAT and offer implications for policy, research, and practice.
Evolving AI Applications for SWDs: Supervised Mediations for Subject-Object Interactions
Our CHAT-guided analysis revealed that AI had positive impacts on mediating subject-object interactions, in which SWDs (i.e., subjects) used various AI tools to participate in learning tasks to improve learning outcomes (i.e., objects). Understanding these impacts through Vygotsky’s theory of human learning and development helps us examine more deeply the complex, evolving, and dynamic subject-object interactions facilitated by historically and culturally accumulated tools (e.g., AI tools, language, symbols). The development and application of AI tools for SWDs are embedded with historical knowledge, cultural values, and community practices, which influence how SWDs perceive, think about, and interact with their learning environments. This high-level, CHAT-guided perspective suggests that future research should investigate not only the functionalities of various AI tools but also the cultural and value-based aspects embedded in AI design and development for SWDs.
From a historical perspective, the AI tools used to mediate subject-object interactions across the included studies spanned the last three decades. To some degree, they reflected the progressive development of AI, although early and recent AI applications did not differ significantly in their impacts on student learning. Speech recognition software (e.g., E. L. Higgins & Raskind, 2000), a product of early AI development, became a focal point in special education technology because it enhances accessibility and supports the functional capabilities of SWDs. Similarly, researchers have long discussed the role of expert systems in assisting problem-solving and decision-making (e.g., IEP, disability diagnosis; Woodward & Rieth, 1997). Despite these evolving functionalities, experimental studies examining expert systems and ITS as instructional tools for SWDs remain scarce. The same pattern holds for intelligent VR systems: both this study and previous reviews (e.g., Carreon et al., 2022) found few applications of such systems that leverage advanced techniques for SWDs.
Consistent with previous research (e.g., Knox et al., 2019), this study showed that social robots were effective in improving social-emotional skills for SWDs, mainly students with ASD. Notably, most of the robotic systems analyzed in this study were operated by humans, even though they might have appeared autonomous from the student’s perspective. Such human supervision appeared necessary not only because of limited technological infrastructure but also for ethical reasons. As researchers make significant strides in developing humanlike language, thinking, and reasoning capacities in robotic systems and other AI technologies, more teaching and learning tasks can be automated. Through the CHAT lens, these evolving technologies can function simultaneously as physical and psychological tools, interacting and communicating with SWDs through complex actions and language. Consequently, historical knowledge and cultural values about SWDs become more likely to be embedded, explicitly or implicitly, in the design and development of these tools. It is therefore even more critical to ensure safety and ethics when applying autonomous AI systems for SWDs and all learners. This involves keeping stakeholders (e.g., students, teachers, families, researchers) with diverse lived experiences and cultural values in the loop during the design, selection, deployment, and evaluation of AI systems (U.S. Department of Education, 2023).
From Access to Agency: Positioning SWDs as Active Agents in AI-Mediated Interactions
As discussed through the CHAT lens, SWDs exhibit unique learning needs influenced by biological, psychological, or sociocultural factors (e.g., vision, speech, cognition, behaviors, motivation, social engagement) when interacting with AI (see Figure 1). Most AI tools analyzed in this study demonstrated positive effects in augmenting one or more aspects of SWDs’ functional abilities essential for accessing learning content or engaging in tasks to develop cognitive, daily-life, or socioemotional skills (e.g., E. L. Higgins & Raskind, 2000; Pop et al., 2013). With its ever-evolving capabilities and increasingly ubiquitous applications, AI will continue to support SWDs in navigating learning environments and adapting to social contexts by augmenting their functional abilities across aspects related to biological or psychological factors.
From the CHAT perspective on dialectical interaction, SWDs not only adapt to the demands of external contexts but can also actively transform these contexts using tools or support from more competent others (Gindis, 1999). This mutual adaptation positions SWDs as active agents capable of using or creating tools to influence their social contexts and contribute to social practices (Bal et al., 2021). Despite the significance of this strengths-based approach to understanding and supporting SWDs (Vygotsky, 1978), our study uncovered a gap in research on how to support SWDs as active agents in their interactions with AI or community members. Only two studies (i.e., Brainin et al., 2022; Wilson, 1997) investigated AI as teachable agents in a way that enabled SWDs to take a relatively more agentic role in using tools to engage in learning. Although not focused on advanced AI techniques, these studies align with recent calls for more applications of teachable agents to enhance student agency in AI-human interactions (e.g., Hwang et al., 2020).
Additionally, our findings revealed that the design and application of AI tools that consider and enrich the cultural-historical experiences of SWDs remain underexplored. A critical direction for future research is to investigate the role of AI in supporting the unique learning needs of SWDs that intersect with (dis)ability, cultural background, language, social class, and other identity markers (Waitoller & Artiles, 2013). We recommend that researchers consider human factors at both the subject (i.e., SWDs) and community (i.e., stakeholders) levels in future explorations of the cultural-historical dimensions of AI-SWD interactions. For example, at the individual level, researchers may continue examining varied approaches to positioning SWDs as active agents who use AI tools to participate in learning activities in inclusive settings, contribute to learning communities, and create artifacts that reflect and share their cultural-historical experiences. These approaches, in turn, are shaped by factors at the community level, which the next section discusses in detail along with implications for future research and practice. Taken together, approaches to supporting SWDs in interacting with AI tools should always align with human-centered design principles that value and respect learning differences in biological, psychological, and historical-cultural aspects.
Rules and Voices: Distributing Responsibilities Among AI, SWDs, and Community
To explore the dynamics and collectivity of the AI-SWD activity system, we examined multiple variables related to community members and regulating rules that might influence AI-SWD interactions. First, we identified the presence of other community members (e.g., researchers, families, teachers, therapists) in AI-SWD interactions in many studies (n = 17). However, dyadic AI-SWD interactions and triadic AI-SWD-community interactions did not differ significantly in their effects on student learning. Although contrary to our hypothesis, this finding could be partially explained by the absence of distinctive instructional approaches to supporting SWDs in these interactions, as most studies (n = 28) focused on providing step-by-step guidance and feedback in controlled settings. Second, our results revealed a lack of research investigating SWDs’ participation in collective learning activities with peers in inclusive settings. This gap prevented us from exploring the more intricate roles AI might play in facilitating interactions between students with and without disabilities toward collective learning goals. These gaps point future research toward investigating the design and use of AI, guided by learning theories, to facilitate collective activities among SWDs, peers, and other community members. Along these lines, there is a need to develop a comprehensive understanding of effective strategies for distributing responsibilities and considering differing voices among AI, SWDs, and community in ways that enhance equity, inclusion, and belonging for SWDs.
Turning to community-level factors that shape the cultural-historical dimensions of AI-SWD interactions, the processes of designing, selecting, and using AI for and with SWDs involve decision-making and participation from multiple stakeholders (e.g., AI developers, school leaders, educators, families, peers). Although stakeholders vary in background, perspectives, and experiences, they should all consider individual differences in cultural values and historical experiences when designing and deploying AI. For example, AI developers should address algorithmic bias in AI design and development to avoid unintended impacts of AI tools on SWDs and other minoritized learners (Gupta et al., 2022). At the user end, there is a need to address negative social bias derived from the historical deficit model of disability when selecting and deploying AI. It is imperative to emphasize equity and inclusion in these processes and to focus on strengths-based approaches that support SWDs in actively interacting with AI serving as a learning partner, social-emotional coach, teachable tutee, or in other roles. Another direction for future AI research is to ensure the representation of SWDs and individuals from historically marginalized groups in STEM education and the STEM workforce, so that they can contribute their perspectives, voices, and cultural experiences to the design, deployment, and evaluation of AI tools.
The uncertainty surrounding expanding AI capabilities indicates that the design and application of any AI tool should adhere to rules that emphasize inclusivity, equity, transparency, ethical use, and data privacy for all learners (U.S. Department of Education, 2023), especially SWDs and learners from historically marginalized groups. These rules should include public education policies (e.g., the Individuals with Disabilities Education Act [IDEA]), data privacy and safety regulations (e.g., the Family Educational Rights & Privacy Act [FERPA], the Protection of Pupil Rights Amendment [PPRA], the General Data Protection Regulation [GDPR]), and considerations for AI ethics (e.g., Akgun & Greenhow, 2022). In December 2023, European Union institutions reached political agreement on the world’s first comprehensive AI legislation, the EU AI Act, which was formally adopted in 2024 to regulate the ethical, human-centric, and trustworthy design and development of AI. As more AI legislation is introduced, a fundamental principle for stakeholders with varying roles and responsibilities is to ensure equity, ethics, and safety in designing and deploying AI tools for SWDs to meet their unique learning needs associated with biological, psychological, and cultural-historical factors.
Limitations and Directions
Several limitations warrant interpreting the results of this study with caution. First, to search the relevant literature, we used ProQuest and databases widely used in education (i.e., ERIC, PsycInfo), medicine (i.e., PubMed), and computer science and engineering (i.e., IEEE Xplore). Although we considered the search comprehensive, it may not have captured all relevant studies: AI is an interdisciplinary area of study, and research on AI for SWDs could be published in outlets not indexed in these databases. For example, an extensive body of research has investigated the use and efficiency of AI-enhanced tools for screening or diagnosing various types of disabilities. We did not include those studies because this study focused on SWDs’ learning outcomes. Additionally, we excluded conference proceedings because conference proceeding abstracts provide limited information on study design, methods, and results (Scherer & Saldanha, 2019). Conference proceedings also often report interventions at an initial development phase and thus provide insufficient information on student learning outcomes (Rosmarakis et al., 2005). Furthermore, our selection was limited to documents written and published in English, which might have excluded relevant studies written in other languages.
Second, the moderators we investigated could not explain the high level of heterogeneity in our random-effects models. We anticipated this heterogeneity because the meta-analysis covered distinct types of AI, reflecting differing definitions of and approaches to studying AI as an interdisciplinary concept. However, heterogeneity, especially when high, is often confounded with publication bias (Borenstein et al., 2017), which was present in this study. It is therefore unclear whether heterogeneity or publication bias drove the variation in effect sizes. Thus, although the overall positive impact of AI-based interventions on SWDs’ learning appeared to be robust, the findings should be interpreted with caution.
Third, studies were unevenly distributed across the categories used for moderator analyses. This resulted in small sample sizes in certain subcategories (e.g., non-ASD disability categories, AI’s role as instructional/learning tools or teachable agents, learning outcomes) and limited statistical power to detect moderating effects. Accordingly, the statistically nonsignificant effects should not be interpreted as evidence of no relationship between the effect sizes of AI interventions and the moderators. As more interventions accumulate, researchers should continue to explore potential moderators, guided by theory, in future meta-analyses.
Conclusion
Research investigating AI applications for SWDs, especially how SWDs interact with AI tools and other stakeholders (e.g., educators, peers), remains underexplored in the existing literature. Our study serves as the first meta-analysis of AI for SWDs, and the results revealed an overall medium effect of AI-based interventions on SWDs’ learning outcomes. These interventions operated through social robots, computer software (i.e., speech recognition, intelligent tutoring, expert systems), and intelligent VR systems. Moreover, the AI tools functioned as social-emotional coaches, instructional/learning tools, and teachable agents. However, there is a lack of research that explores advanced AI techniques for SWDs or applies instructional practices that guide SWDs in participating in collective learning activities with peers, educators, or families. Guided by CHAT, we discussed historically, socially, and contextually co-constructed facets of interactions among SWDs, AI tools, and community members for future investigations. We recommend that researchers design and deploy AI that not only ensures accessibility but also promotes opportunities for SWDs to take an agentic role in participating in and contributing to AI-mediated individual and collective learning activities. To achieve this goal, there is a pressing need for stakeholders (e.g., educators, AI developers, policymakers) to embrace strengths-based attitudes, values, and cultural norms at both the individual and societal levels. Only then can AI tools be designed and deployed to meet SWDs’ diverse learning needs influenced by biological, psychological, and cultural-historical factors.
Supplemental Material
sj-docx-1-rer-10.3102_00346543241293424 – Supplemental material for Let’s CHAT About Artificial Intelligence for Students With Disabilities: A Systematic Literature Review and Meta-Analysis
Supplemental material, sj-docx-1-rer-10.3102_00346543241293424 for Let’s CHAT About Artificial Intelligence for Students With Disabilities: A Systematic Literature Review and Meta-Analysis by Ling Zhang, Richard Allen Carter, Yuting Liu and Peng Peng in Review of Educational Research
Authors
LING ZHANG is an assistant professor of special education in the College of Education at the University of Wyoming, 1000 E University Ave, Laramie, WY 82071, USA; email:
RICHARD ALLEN CARTER, JR., is the Ted S. Hasselbring Chair of Special Education Technology at Indiana University, 201 N. Rose Avenue, Bloomington, IN 47405, USA; email:
YUTING LIU is currently a doctoral student in the Department of Special Education at The University of Texas at Austin, 1 University Station, D5300, Austin, TX 78712, USA; email:
PENG PENG is an associate professor in the Department of Special Education at The University of Texas at Austin, 1912 Speedway, Stop D5000, Austin, TX 78712, USA; email:
References
