Abstract
Purpose
This article critically examines the enduring problems and emerging possibilities of educational research in light of rapid advances in artificial intelligence (AI). It seeks to understand why educational research has struggled to influence practice and policy meaningfully and explores how AI necessitates a fundamental rethinking of research purposes, methods, and epistemologies.
Design/Approach/Methods
The article adopts a conceptual and critical review approach, drawing on historical, philosophical, and methodological literature. It identifies and analyzes seven major problems in traditional educational research, including flaws in peer review, quantification bias, methodological fragmentation, overgeneralization, neglect of individual learner diversity, limited educational imagination, and narrow outcome measures. It then explores how AI technologies challenge and reshape core assumptions about knowledge production and educational inquiry.
Findings
Traditional educational research is constrained by outdated paradigms that emphasize generalizability, stability, and typicality at the expense of contextual sensitivity, individual variability, and imaginative possibilities. The rapid evolution of AI further undermines assumptions of stable treatments, linear causality, and human-centered cognition. AI opens new opportunities for participatory, iterative, and systems-oriented research, while also raising ethical and epistemological concerns that demand critical reflection.
Originality/Value
This article offers a timely and provocative analysis of the limitations of traditional educational research and articulates a vision for its rebirth in the age of AI. It contributes to the growing discourse on paradigm shifts in education by integrating critiques of research orthodoxy with emerging insights into AI-enabled learning. The article calls for methodological pluralism, ethical vigilance, and epistemological innovation, positioning researchers to better respond to the complex and evolving landscape of education in a post-AI world.
Keywords
In 2023, more than 2 million journal articles were published across 30,000 peer-reviewed journals (Zul, 2023). While the number of those articles that are about education or related fields is not published, the Survey of Earned Doctorates indicates that in the United States in 2022, 4,000 research doctoral degrees were awarded in Education versus a total of 57,596 in all fields. This represents 7% of all doctorates. If that figure is applied to the total number of published peer-reviewed journal articles, one might estimate 140,000 articles were published in the field of education. Yet, despite the massive amount of research, there does not seem to be a commensurate impact on education policy, practice, or any general improvement in student learning.
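The back-of-the-envelope estimate above can be reproduced directly. The figures come from the sources cited (Zul, 2023; the Survey of Earned Doctorates); the extrapolation itself is, as the text notes, only a rough proxy, sketched here for transparency:

```python
# Reproducing the rough estimate of education-related journal articles.
total_articles = 2_000_000   # peer-reviewed articles published in 2023 (Zul, 2023)
education_phds = 4_000       # US research doctorates in Education, 2022
all_phds = 57_596            # US research doctorates, all fields, 2022

education_share = education_phds / all_phds   # about 6.9%, rounded to 7% in the text
estimate = total_articles * education_share   # about 139,000 articles

# The text applies the rounded 7% figure, yielding the 140,000 cited above.
rounded_estimate = round(total_articles * 0.07)

print(f"{education_share:.1%} -> roughly {estimate:,.0f} articles")
```

The gap between roughly 139,000 (exact share) and 140,000 (rounded share) is immaterial to the argument: either way, the volume of publication dwarfs its apparent impact.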
The complaint about the lack of impact of research on policy and practice has been raised by policymakers, practitioners, and researchers alike, and there is a long tradition of examining the problems of educational research (Lagemann, 2002, 2008; Lagemann & Shulman, 1999). The debate over using randomized controlled trials (RCTs) to make education research more scientific (Berliner, 2002) was fierce, but the shift toward more quantitative studies using RCTs did not make education research more impactful. The complexity of educational outcomes and the various side effects of educational treatments, whether policies or practices, add further uncertainty to the validity, reliability, and generalizability of educational research (Zhao, 2017, 2018; Zhao & Beghetto, 2024).
The emergence and rapid development of generative artificial intelligence (AI) in recent years has created both challenges and opportunities for education research. AI could potentially redefine educational outcomes, organizations, and the entire ecosystem of schooling, resulting in a paradigm shift (Kuhn, 1962) in education (Zhao, 2024; Zhao & Zhong, 2024; Zhong & Zhao, 2025). Such a paradigm shift certainly calls for a different kind of research if new theories, practices, and policies are to emerge; it may also require new designs and models of research practice.
In this article, we discuss the problems and challenges of research in education. Our discussion covers problems in educational research that predate the recent availability of generative AI tools, with particular attention to the new challenges AI may introduce or reduce.
Problems in traditional research
Problem 1: Problems with peer review
Qualified reviewers, typically termed “peers,” evaluate manuscripts to uphold the quality and integrity of scientific publications. Journals and conferences within the educational domain have widely adopted this tradition, relying heavily on peer review to maintain scholarly standards. Peer review, though currently seen as essential to scientific endeavor, is a relatively new phenomenon. The work of Newton, Galileo, Einstein, Watson, and Crick did not receive peer review (Aczel et al., 2025) as modern peer review started in the mid-20th century, and the term itself was coined in the 1970s (Wills, 2024). Not surprisingly, for a topic of importance to researchers, there have been numerous articles identifying problems with peer review. Aczel et al. (2025) identified four categories of issues: (a) quality, (b) predatory journals, (c) biases, and (d) low reliability.
According to Aczel et al. (2025), concerns related to quality (with potential solutions in parentheses) include the lack of qualified reviewers (provide reviewer training), the lack of reviewers altogether (provide compensation), insufficient scrutiny (use more reviewers), and the need for reviewers with particular areas of specialization (sign the reviews so readers know which specializations were represented). Though listed separately, the concern about predatory journals is primarily a quality concern: such journals provide no review or only a rudimentary one.
Data indicate that over 70% of scholars decline review invitations, primarily because the manuscripts do not align with their specific areas of expertise. Additionally, about 42% of scholars feel overwhelmed by competing professional commitments, and approximately 39% report insufficient formal training in conducting peer reviews (HighWire, 2023).
Reviewer fatigue presents another critical factor contributing to the peer review shortage. A relatively small proportion of professionals in scientific research regularly undertake reviewing responsibilities, often leading to overcommitment. Although these scholars acknowledge peer review as a vital professional duty, prolonged periods of excessive reviewing responsibilities frequently result in burnout, reducing both willingness and effectiveness in the peer review process (Tropini et al., 2023).
Aczel et al. (2025, p. 2) identified that biases could relate to “authors, topics, methods, groups, institutions, countries, arguments, or ideas.” Perhaps amusingly, potential ways of addressing these biases ranged from double-blind reviews to the opposite—open, transparent, and even signed reviews.
Cultural biases and language barriers further complicate peer reviewing. With the dominance of English as the lingua franca of scientific communication, researchers from non-English speaking nations, such as China—which has rapidly grown as a prominent contributor to global scientific literature—often struggle to participate effectively in reviewing English-language submissions. Conversely, English-speaking researchers typically lack the linguistic skills and cultural familiarity necessary to adequately assess manuscripts submitted in languages such as Chinese (Grabarić Andonovski et al., 2019; Publons, 2018).
The shortage of appropriately qualified peers jeopardizes the fundamental purpose of peer review. Engaging unqualified reviewers increases the risk of rejecting meritorious papers or accepting substandard manuscripts. Even qualified reviewers may occasionally fail to recognize innovative or groundbreaking research, inadvertently rejecting work with potential transformative implications. Such outcomes highlight inherent limitations in the peer review process, questioning its efficacy as an optimal mechanism for scientific quality assurance and potentially impeding significant scientific advancements (Tennant & Ross-Hellauer, 2020).
Aczel et al. (2025) also highlighted low reliability: the limited agreement and frequent disagreement between reviewers (“focus revisions on points where reviewers agree”) and the lack of evidence for deciding how to improve peer review (do research on the peer review process itself). In a meta-analysis of 70 reliability coefficients from 48 studies, Bornmann et al. (2010) found a weighted mean inter-rater reliability of .34, with a corresponding Cohen's kappa of .17. Replicating the NIH peer review process, Pier et al. (2018) empaneled 43 researchers to review 25 actual NIH oncology grant submissions. They calculated intraclass correlations and Krippendorff's alpha and concluded, “there was no agreement among reviewers.”
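To make Cohen's kappa concrete, the sketch below computes it for two hypothetical reviewers rating the same ten manuscripts. The ratings are invented for illustration only; they are chosen so that the reviewers agree half the time, which turns out to be no better than chance:

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters with binary categories (1 = accept, 0 = reject)."""
    n = len(r1)
    # Observed agreement: share of manuscripts where the two raters match
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    # Expected chance agreement, from each rater's marginal acceptance rate
    p1, p2 = sum(r1) / n, sum(r2) / n
    expected = p1 * p2 + (1 - p1) * (1 - p2)
    return (observed - expected) / (1 - expected)

# Hypothetical accept/reject decisions on ten manuscripts
reviewer_a = [1, 1, 0, 1, 0, 1, 0, 0, 1, 1]
reviewer_b = [1, 0, 0, 1, 1, 0, 0, 1, 1, 0]

print(cohens_kappa(reviewer_a, reviewer_b))  # 0.0: agreement no better than chance
```

A kappa of zero despite 50% raw agreement shows why raw agreement rates overstate reliability, and why figures like the .17 kappa reported by Bornmann et al. (2010) are so troubling.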
Lotriet (2012) identified the slowness of the peer review process as a significant problem, and Huisman and Smits (2017) suggested this problem may be worse in some disciplines than others. Though the data are perhaps outdated, Karieva et al. (2002) reported an average of 572 days from initial submission to publication in conservation and applied ecology journals and 249 days in genetics and evolution journals. Huisman and Smits (2017) found that the peer review process in the social sciences (which included education but excluded psychology, which had its own category) took an average of 23 weeks, the second slowest (exceeded only by Economics and Business) among 10 disciplines. While electronic journals did better, the process remains slow and often tedious.
Thus, peer review, often considered the sine qua non of publishing academic research, is fraught with issues and needs some kind of reform. The recent growth of open access journals provides some relief in terms of speed of publication, but the lack of, or superficial, review of the content of open access work renders the process somewhat equivalent to simply posting manuscripts online.
Problem 2: Quantification without contextual representation is tyranny
Comedian Steven Wright was known for his one-liners, including, “43.7 per cent of all statistics are made up on the spot.” This joke simultaneously points out the power of seemingly precise numbers and warns that power can readily be misused or misinterpreted. Educational research faces this same problem, only we call it quantification bias!
Before considering the tyranny of false precision, it is worth considering the epistemological context behind educational research in general and quantitative research in particular. The dominant epistemological basis for research methodology, especially in education, is post-positivism. As stated by Karl Popper (1935, 1959), one of post-positivism's founders, “… all knowledge is provisional, conjectural, hypothetical—we can never finally prove our scientific theories, we can merely (provisionally) confirm or (conclusively) refute them …”
Much educational research, particularly quantitative studies favored by policymakers, heavily relies upon probability and statistical procedures to examine the effects of educational interventions. Gerd Gigerenzer, in his seminal critique, “Mindless Statistics,” addresses widespread misunderstandings and misapplications of null hypothesis significance testing (NHST), a practice pervasive within educational research (Gigerenzer, 2004). Gigerenzer argued that many researchers routinely misuse NHST by interpreting p-values as definitive proof of the truth or importance of findings. This misuse often involves emphasizing whether results achieve statistical significance, such as surpassing arbitrary thresholds like p < .05 or p < .01, while neglecting to consider whether the results carry practical educational meaning or relevance.
Confidence intervals present the same information that underlies NHST, but in a way more consistent with post-positivism. A confidence interval gives the range within which the true value is likely to lie. Better research and better statistical modeling can narrow the interval, but can never shrink it to a point. By making uncertainty visible, confidence intervals counteract quantification bias.
Researchers frequently overlook or misunderstand other important methodological considerations such as effect sizes, statistical power, and the necessity of replication to verify findings. Consequently, educational researchers frequently present inflated or exaggerated claims about treatment effects derived from studies that are statistically underpowered or otherwise based on relatively small samples.
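The divergence between statistical significance and practical relevance can be illustrated with a minimal sketch. All the summary numbers below (sample size, mean difference, standard deviation) are hypothetical; the point is that with a large enough sample, a practically trivial difference clears the p < .05 threshold while its standardized effect size remains negligible:

```python
import math

# Hypothetical two-group comparison of test scores
n = 50_000          # students per group
mean_diff = 0.4     # observed difference in test points
sd = 15.0           # test-score standard deviation

se = sd * math.sqrt(2 / n)   # standard error of the mean difference
z = mean_diff / se           # test statistic for the null of no difference
ci_low, ci_high = mean_diff - 1.96 * se, mean_diff + 1.96 * se
cohens_d = mean_diff / sd    # standardized effect size

print(f"z = {z:.2f}")                          # well beyond 1.96, so p < .05
print(f"95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
print(f"Cohen's d = {cohens_d:.3f}")           # under 0.03: a negligible effect
```

Here the result is “highly significant,” yet the effect is a fraction of a point on the test's scale. Reporting the confidence interval and effect size alongside the p-value makes that triviality immediately visible.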
The implications of these misuses extend beyond academia into educational policymaking, where statistical results are often treated as certain and universally applicable, despite their inherent probabilistic and context-dependent nature. Policymakers regularly advocate for so-called “evidence-based” educational reforms founded on generalized findings without adequately addressing the complexities of local variations, cultural differences, or the practical intricacies of implementation (Zhao, 2020). For instance, large-scale international assessments such as the Programme for International Student Assessment (PISA) frequently catalyze sweeping educational reforms driven by minimal statistical differences, disregarding substantial contextual diversity and practical implementation challenges.
This overreliance on the presentation of quantitative data is further symptomatic of the aforementioned quantification bias. Other manifestations of quantification bias in education include practices such as teacher evaluations grounded in value-added models (VAMs). These models apply statistical procedures to estimate teachers’ contributions to student learning outcomes, despite considerable instability and probabilistic uncertainty inherent in their calculations. Similarly, educational systems often overly emphasize changes in test scores, neglecting broader developmental or social outcomes critical to holistic student growth.
A fundamental limitation of probability-based methods in educational research lies in their inability to reliably predict or account for sufficient individual-level characteristics. While probabilistic approaches may reveal group-level trends, they fail to capture the individual differences inherent in student populations. Educational interventions demonstrating significant statistical effects at the group level may yield highly heterogeneous outcomes at the individual level; such interventions might be beneficial for some students, neutral for others, and potentially detrimental for yet others (Zhao, 2018; Zhao & Beghetto, 2024). This inherent diversity underscores the importance of cautious interpretation of statistical findings and suggests the need for more nuanced, individualized, and context-sensitive approaches in educational research and policy formulation. Sadly, as Trout (2002) observed, people tend to believe conclusions or explanations, not because they are accurate, but because they are intuitively satisfying.
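The group-versus-individual problem described above can be made concrete with a small simulation. The numbers are entirely synthetic, chosen only to illustrate the logic: an intervention with a clearly positive average effect can still leave a substantial minority of individual students worse off:

```python
import random

random.seed(42)  # fixed seed for reproducibility

# Simulate each student's individual treatment effect:
# positive on average (+2 points) but highly variable (sd = 5)
effects = [random.gauss(2, 5) for _ in range(10_000)]

mean_effect = sum(effects) / len(effects)
harmed = sum(e < 0 for e in effects) / len(effects)

print(f"average effect: {mean_effect:+.2f} points")   # near +2: looks beneficial
print(f"students worse off: {harmed:.0%}")            # roughly a third of students
```

A group-level analysis would report the intervention as effective, while about one student in three experiences a negative effect. Nothing in the average warns of this heterogeneity, which is precisely why individual-level variability deserves direct attention.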
Problem 3: The overblown research paradigm wars
Another ongoing challenge within educational research has been the persistent conflicts commonly referred to as the “paradigm wars,” a term first articulated by Gage (1989) to describe the deep-seated methodological divisions within the field. These conflicts have predominantly revolved around quantitative and qualitative research paradigms, each staunchly advocating for its methodological rigor and epistemological superiority while vigorously critiquing the other's perceived shortcomings (Gage, 1989). This methodological schism has deep historical roots and has profoundly shaped scholarly discourse, policy formulation, and research practices within education.
A notable legislative influence that intensified these methodological divisions was the enactment of the No Child Left Behind Act in 2002, which decisively favored quantitative methods, particularly randomized controlled trials (RCTs), by officially designating them as the “gold standard” for educational research (No Child Left Behind Act, 2011). This legislative endorsement significantly elevated quantitative methodologies, granting them privileged status in educational policy and funding decisions, which consequently marginalized qualitative research approaches.
However, qualitative methodologists and researchers advocating interpretive and critical paradigms resisted this marginalization vigorously. They offered compelling critiques of the limitations inherent in quantitative methodologies, arguing that the complexity of educational phenomena could not be fully captured by purely statistical analyses or controlled experiments (Biesta, 2007). McCloskey and Ziliak (2010), among others, underscored the critical epistemological and ethical shortcomings of the quantitative emphasis, particularly the over-reliance on statistical significance without sufficient consideration for practical relevance and contextual understanding.
The paradigm wars have unfortunately created lasting fractures within educational research, fragmenting the scholarly community into opposing methodological camps. These entrenched divisions have hindered meaningful dialogue, collaboration, and synthesis across research approaches, thereby impeding the advancement of educational knowledge and practice (Zhao, 2018). The adverse consequences of this fragmentation echo those observed in similarly polarized educational debates, such as the so-called reading and math wars. In these contentious disputes, entrenched ideological positions have often obscured nuanced, integrative solutions and perpetuated conflicts rather than resolving them (Ginsberg & Zhao, 2025; Zhao, 2024). Notably, other fields have struggled with the same controversies. In Political Science, for example, the American Political Science Association labeled this as Perestroika, concluding that the movement, “reminded the practitioners of formal and quantitative methods that qualitative methods and area studies also make contributions to political science research and teaching, and should not be undervalued in the profession” (Rigger, 2009).
For educational research to advance constructively and meaningfully, scholars must actively move beyond these entrenched methodological divides. Embracing methodological pluralism—an approach that acknowledges the complementary strengths and limitations inherent in both quantitative and qualitative paradigms—would allow researchers to more comprehensively understand complex educational phenomena. Such pluralism promotes critical methodological reflection, encourages integrative dialogue, and ultimately facilitates richer and more holistic research insights capable of effectively informing educational practice and policy.
Problem 4: Overgeneralizing across contexts
A critical issue in educational research is the presumption of uniformity in educational contexts. This assumption erroneously simplifies the inherent complexity and diversity present across educational environments. While it is undeniable that certain universal principles and commonalities exist, the variability among educational contexts—including cultural nuances, local socioeconomic conditions, political frameworks, pedagogical philosophies, governance structures, organizational practices, and teacher–student ratios—is substantial (Cohen & Spillane, 1992; Crossley & Watson, 2003). Despite this variability, educational researchers frequently generalize findings from single or limited contexts, assuming representativeness across diverse settings once statistical significance has been achieved (Berliner, 2002; Biesta, 2010).
This practice overlooks the profound differences that exist among schools and classrooms, which significantly shape educational processes and outcomes (Phillips & Schweisfurth, 2014). Such oversight risks imposing educational interventions and policies that may be effective in one context but detrimental or irrelevant in another, leading to questionable applicability and diminished effectiveness of educational reforms (Zhao, 2018). Therefore, acknowledging the diversity of educational contexts is crucial in research design and interpretation, as it enhances the ecological validity and contextual sensitivity of educational studies, thus better supporting context-appropriate educational innovations and improvements (Lincoln & Guba, 1985; Stake, 2005).
Additionally, educational contexts are dynamic and subject to change driven by policies, teacher initiatives, student agency, technological advancements, and community engagement. Consequently, no educational environment is permanently fixed; each can evolve over time through deliberate efforts and innovations (Cuban, 2013; Fullan, 2007). Therefore, educational researchers should not only recognize the existing variability but also anticipate and accommodate potential changes within these contexts. Embracing such an approach requires researchers to adopt flexible methodologies that consider future possibilities, fostering an environment conducive to adaptive and forward-looking educational research (Darling-Hammond, 2010; Yin, 2017).
In educational research, the prevailing assumption has traditionally been that educational treatments—whether policies or classroom practices—are sufficiently stable to yield insights that can be reliably generalized across different contexts and temporal frameworks. This assumption underpins much international comparative research and policy borrowing, exemplified by initiatives such as the PISA. PISA encourages policymakers and educators worldwide to replicate educational strategies from high-performing systems like Finland and Singapore, presuming the efficacy of these strategies to be both stable and transferable (OECD, 2016; Sellar & Lingard, 2013). However, such assumptions neglect critical contextual and historical nuances, given that the observed high achievement levels in countries such as Finland and Singapore resulted from specific policies and practices instituted decades earlier under significantly different societal, economic, and technological conditions (Sahlberg, 2015; Tan & Dimmock, 2014). Hence, borrowing these educational practices implicitly assumes educational contexts remain static, which is rarely the case (Zhao, 2020).
Problem 5: The neglect of individual diversity
The diversity of individual learners constitutes another critical yet often overlooked dimension in educational research. Learners are not uniform; rather, they exhibit unique innate capabilities, interests, cultural backgrounds, experiences, and aspirations. The concept of multiple intelligences proposed by Gardner (1983; Gardner & Hatch, 1989) underscores the varied intellectual potentials and strengths individuals possess. Furthermore, each learner's personality traits significantly shape their learning processes and outcomes (John et al., 2008). Reiss (2000, 2004) additionally emphasized the diverse motivations and desires that guide student engagement and achievement in educational settings.
Moreover, students’ backgrounds, including their family, community, and geographic environments, significantly influence their learning trajectories. These factors interact dynamically with inherent biological and psychological predispositions, contributing to distinct learning profiles, each characterized by specific strengths and challenges (Lewontin, 2001; Ridley, 2003). Hatch (1997) further emphasizes that when entering educational environments, every student brings unique expertise derived from their life experiences, albeit alongside specific limitations or challenges.
Given this inherent complexity, educational research that generalizes findings across diverse student populations can lead to misleading conclusions and practices. Educational interventions and policies deemed effective at the group level may fail dramatically when applied universally without consideration for individual variations. Consequently, contemporary educational research increasingly acknowledges the notion of learners as inherently twice-exceptional—gifted in some domains yet challenged in others—highlighting the necessity for individualized approaches in education (Foley Nicpon et al., 2011; Reis et al., 2014; Ronksley-Pavia, 2015; Trail, 2021; Zhao et al., 2022). This perspective necessitates a shift toward more nuanced educational practices, urging educators and policymakers alike to consider research findings within the context of individual learner profiles.
Problem 6: The typical vs. possible mindset
Educational research has traditionally gravitated toward identifying generalizable principles and universal laws that apply uniformly to diverse populations, including students, teachers, administrators, and families across varied educational contexts (Berliner, 2002; Shavelson & Towne, 2002). This focus on the typical and normative has facilitated the proliferation of standardized assessments, universal curricula, and policy prescriptions aimed at broad-scale applicability (Zhao, 2018). While such efforts have undoubtedly yielded significant insights into educational phenomena, they simultaneously constrain the conceptual boundaries of educational imagination by prioritizing uniformity over possibility.
The prevailing methodological paradigm in educational research, largely influenced by post-positivist traditions, emphasizes replicability, predictability, and generalizability (Phillips & Burbules, 2000). This paradigm inherently restricts the exploration of educational possibilities and diminishes attention to context-specific innovations that could inspire unique educational visions and practices. Consequently, research tends to overlook or undervalue educational experiments and scenarios that deviate from normative expectations, effectively marginalizing imaginative alternatives and potential transformations (Biesta, 2010).
However, the potentiality of education—what it might achieve or become for certain individuals, schools, or classrooms—is precisely where imagination and innovation reside (Greene, 1995). A robust educational imagination demands openness to diverse educational outcomes, acceptance of complexity, and willingness to embrace uncertainty and variability (Eisner, 2002). By expanding methodological and epistemological frameworks to include speculative, interpretive, and imaginative inquiries, educational research can begin to address not only what is typical or normative but also what is possible, aspirational, and transformative (Barone & Eisner, 2012).
Indeed, shifting the emphasis from studying the typical to envisioning the possible necessitates an expansion of research methods to encompass narrative inquiry, design-based research, and futures studies (Clandinin & Connelly, 2000; McKenney & Reeves, 2018; Slaughter, 2002). These approaches invite researchers to imagine innovative educational trajectories, explore contingent futures, and consider multiple educational possibilities rather than predetermined outcomes.
In sum, educational research must deliberately cultivate imagination by embracing methodological pluralism and conceptual flexibility. Only then can research effectively illuminate and inspire the diverse educational possibilities essential for addressing the evolving needs of learners and communities in a rapidly changing world.
Problem 7: Multiple and conflicting educational outcomes
Educational research frequently suffers from an overly narrow focus on singular, often quantitative outcomes, primarily academic test scores, while disregarding the multiplicity and complexity of educational outcomes. The long-standing paradigm conflicts, such as those exemplified in the reading and math wars, emerged precisely because researchers and policymakers fixated upon isolated measures of effectiveness, neglecting a broader spectrum of cognitive and non-cognitive, short-term and long-term educational results (Berliner & Glass, 2014; Biesta, 2009; Ravitch & Riggan, 2016). The propensity to prioritize standardized test scores as definitive evidence of educational success not only constrains understanding but also perpetuates contentious debates among competing educational methodologies (Koretz, 2017).
Zhao (2018, 2022) underscores that the exclusive focus on specific outcomes inevitably obscures other significant impacts—what he refers to as educational “side effects.” According to Zhao, education interventions, particularly those targeted toward enhancing standardized test performance, frequently produce unintended negative consequences, including diminished creativity, reduced student engagement, increased anxiety, and compromised socio-emotional well-being (Zhao, 2018; Zhao & Gearin, 2018). Zhao's critique aligns with broader scholarly consensus that robust educational evaluation must encompass diverse measures of student development, capturing cognitive as well as social-emotional, psychological, and ethical dimensions (Duckworth & Yeager, 2015; Heckman & Kautz, 2012).
The myopia inherent in outcome-driven educational research has resulted in fragmented and polarized discourse, exemplified vividly in the persistent “wars” around literacy and mathematics education. Such debates often pit advocates of phonics against those of whole language instruction, or procedural mathematics against conceptual understanding, each emphasizing favorable outcomes within their narrow frameworks (Hanford, 2019; Schoenfeld, 2004). Rarely do researchers engage with the possibility that their favored interventions may simultaneously produce detrimental effects in areas outside their immediate scope of evaluation. Comprehensive educational research, therefore, requires an intentional shift toward multidimensional assessments, fostering greater collaboration and dialogue among educational stakeholders, and mitigating the entrenched divisions perpetuated by reductive evaluations (Biesta, 2020; Zhao, 2018).
Ultimately, recognizing the complexity and multiplicity of educational outcomes is essential for moving beyond reductionist paradigms and facilitating more integrative, nuanced educational practices. Zhao's call to acknowledge and systematically investigate side effects as legitimate outcomes of educational interventions offers a critical methodological adjustment that could transform educational research and practice, reducing longstanding theoretical and methodological conflicts (Zhao, 2018, 2022).
Challenges of AI for research
Artificial intelligence (AI) has been around for decades, but the advent of large language models such as OpenAI's ChatGPT, together with other emerging technologies like virtual reality, is poised to dramatically influence workforce issues, economic systems, and practices in business, education, healthcare, and likely all fields, with individual lifestyles and entire societies affected worldwide. In education specifically, according to a report by the Center for Innovation, Design, and Digital Learning (2024), AI “has the potential to revolutionize teaching and learning through personalized education, administrative efficiency, and innovation …” (p. 1). In higher education, innovative teaching practices using AI are emerging across multiple fields (e.g., see Mollick & Mollick, 2024). Regarding research, the American Psychological Association (Huff, 2024) set out multiple ways AI can and potentially will support research, noting various perils that will require consideration. A recent study on the use of AI in engineering research concluded that “AI tools are making research faster, more accurate, and more collaborative, from literature discovery and data analysis to writing aid and collaboration” (Madanchian & Taherdoost, 2025, p. 9). AI provides researchers with significant computing capabilities and assistance with data processing and analysis (Papaspyridis, 2020). N. Dhawan and S. Batra (2021) reported that AI aids researchers in creating and analyzing surveys.
However, along with this potential, clear challenges confront researchers as AI emerges. AI is a dynamic space, marked by rapid changes and updates, concerns about hallucinated content, and an array of ethical issues. The issues that AI raises for educational research are examined next.
The crisis of stable treatments
The inherent complexity and context dependence of educational treatments have been substantially exacerbated by the rapid evolution of artificial intelligence (AI). AI's ongoing integration into educational environments continuously alters both the nature and efficacy of educational treatments. Teaching methods and educational interventions that incorporate AI tools are subject to rapid obsolescence, as advancements in AI technology frequently outpace the traditional research publication cycle (Luckin & Cukurova, 2019; Selwyn et al., 2023). As AI tools improve at a rapid pace, often on a monthly or even weekly basis, educational interventions involving AI become moving targets. For example, a study examining the efficacy of GPT-3-powered tutoring may be obsolete by the time GPT-4 or GPT-5 introduces significantly enhanced functionalities. Consequently, findings regarding AI-supported educational methods often become outdated even before dissemination within academic and practitioner communities.
This challenge echoes Zhao (2024), who argued that educational research must adapt to the reality that the subjects of study—AI tools—evolve faster than research cycles. Consequently, researchers must embrace more adaptive, real-time, and iterative forms of inquiry (Barab & Squire, 2004; Perrotta & Selwyn, 2020).
Rethinking educational aims
AI raises fundamental epistemological and curricular questions: What should students learn when machines can perform many cognitive tasks faster and better than humans? Echoing Herbert Spencer's question, “What knowledge is of most worth?” (Spencer, 1860), the AI era demands a new answer. Memorization and standard skill acquisition may lose relevance when AI tools provide real-time support.
Instead, education may need to emphasize creativity, ethical judgment, problem finding, and human–AI collaboration. These shifts compel researchers to investigate not merely how existing goals can be achieved more efficiently with AI, but how the goals themselves must be redefined (Mishra & Mehta, 2017; Zhao & Watterston, 2021).
Moreover, the incorporation of AI does not merely change the methods of instruction but also fundamentally reshapes the educational contexts themselves, influencing students’ learning behaviors, cognitive engagement, and social interactions (Zhai et al., 2021). The dynamic nature of these changes renders the traditional static view of educational research insufficient and poses unprecedented methodological and epistemological challenges. Researchers are compelled to adapt their methodologies, moving toward more flexible, responsive, and iterative research designs that better accommodate the rapid pace of technological innovation and its consequences in educational settings (Baker & Siemens, 2014; Williamson et al., 2020).
From causality to complexity
AI does not act as a discrete educational intervention; it functions as an ecosystemic shift. The effects of generative AI on learning are emergent, nonlinear, and deeply contextual (Zhao & Zhong, 2024; Zhong & Zhao, 2025). Therefore, the dominant paradigm of educational research—especially randomized controlled trials (RCTs)—may be ill-equipped to account for these dynamics.
Instead, educational research must draw from complexity science and systems thinking. Methods such as network ethnography, agent-based modeling, and design-based implementation research can capture the evolving nature of human–AI interactions in classrooms (Bar-Yam, 2004; Cobb et al., 2003).
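The agent-based modeling mentioned above can be illustrated with a minimal sketch. The behavioral rules below (peer influence plus an AI-assistance term with diminishing returns) are entirely hypothetical, chosen only to show how emergent, nonlinear classroom dynamics can be simulated rather than averaged away by a single treatment effect:

```python
import random

random.seed(42)  # fixed seed so the simulation is reproducible

class Student:
    """A student agent with a skill level and a propensity to consult an AI tutor."""
    def __init__(self):
        self.skill = random.uniform(0.2, 0.8)
        self.ai_reliance = random.uniform(0.0, 1.0)

    def step(self, peers):
        # Learning gain blends peer effects and AI assistance; heavy reliance
        # on the AI yields diminishing returns (an illustrative rule, not a finding).
        peer_avg = sum(p.skill for p in peers) / len(peers)
        ai_gain = 0.05 * self.ai_reliance * (1 - self.ai_reliance)
        self.skill = min(1.0, self.skill + 0.01 * (peer_avg - self.skill) + ai_gain)

def simulate(n_students=30, n_steps=50):
    """Run the classroom model and return the mean skill level at the end."""
    students = [Student() for _ in range(n_students)]
    for _ in range(n_steps):
        for s in students:
            s.step(students)
    return sum(s.skill for s in students) / n_students

print(f"Mean skill after simulation: {simulate():.3f}")
```

Even in this toy model, outcomes depend on the distribution of AI reliance across agents rather than on a uniform intervention effect, which is precisely the kind of emergence that randomized designs struggle to capture.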
Distributed cognition and AI as co-learners
The use of AI in learning contexts challenges traditional assumptions about the locus of knowledge. Instead of viewing learning as an individual cognitive process, AI integration reveals the distributed nature of cognition across humans and machines. This aligns with theoretical frameworks such as distributed cognition (Hollan et al., 2000) and human-in-the-loop systems (Amershi et al., 2019).
Educational research must begin to conceptualize students not just as independent learners but as participants in human–AI collaborative systems. These collaborations alter both the processes and products of learning, requiring new forms of assessment and observation (Luckin et al., 2016).
Ethical, equitable, and sociotechnical inquiry
The incorporation of AI into education is not neutral. It raises urgent questions about data privacy, algorithmic bias, access inequality, and surveillance. Historically marginalized communities may experience disproportionate harms or limited benefits from AI-driven education. Therefore, educational research must move beyond efficacy to ask critical questions about justice, inclusion, and power (Benjamin, 2019; Noble, 2018; Selwyn, 2019).
Critical theorists, feminist epistemologies, and sociotechnical frameworks provide valuable lenses to examine how AI is designed, deployed, and received in educational settings.
Toward participatory, generative research
Generative AI opens possibilities for democratizing research processes. Teachers and students can become co-researchers, using AI tools to generate data, reflect on practices, and design solutions. This calls for participatory design research and action research models that treat all actors in the learning ecosystem as knowledge producers (Bang & Vossoughi, 2016; Brydon-Miller et al., 2003; Ito et al., 2020).
A special case: The transformation of reviewing literature with AI
The practice of literature review in educational research is undergoing a fundamental transformation driven by the rapid advancement of artificial intelligence (AI). Traditionally, literature reviews have functioned as a critical foundation of scholarly inquiry, enabling researchers to contextualize their work within existing knowledge, identify theoretical and empirical gaps, and justify research questions. However, this process has historically been constrained by human limitations—bounded attention, cognitive overload, and the inefficiency of manual searching and screening (Boell & Cecez-Kecmanovic, 2015). AI, particularly in the form of natural language processing (NLP), large language models (LLMs), and machine learning algorithms, is now disrupting this paradigm by enabling a new form of “augmented reviewing” that combines human judgment with machine efficiency (Jovanović et al., 2021; Yin et al., 2022).
AI-enhanced tools are capable of navigating vast and ever-expanding bodies of academic literature with a speed and scale previously unattainable. Platforms such as Semantic Scholar, Scite, and Connected Papers use NLP to extract key topics, trace citation networks, and highlight seminal works in a given field. These tools allow researchers to visualize conceptual relationships, track the evolution of ideas, and identify underexplored areas more efficiently than through manual processes alone (Marshall & Wallace, 2019). Moreover, machine learning systems like ASReview employ active learning strategies to prioritize article screening in systematic reviews, reducing the number of articles researchers must read while maintaining high recall (van de Schoot et al., 2021). Such systems have been particularly valuable in educational research, where interdisciplinary overlaps and diverse methodological traditions often make comprehensive reviews labor-intensive and prone to omission.
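The prioritization logic behind such active-learning screening can be sketched in miniature. The word-overlap scorer below is a deliberately simplified stand-in for the machine-learning classifiers that systems like ASReview actually use; it illustrates only the core loop, in which articles most similar to known-relevant ones are surfaced for human screening first:

```python
# A minimal sketch of active-learning-style screening prioritization.
# The scoring model is a toy word-overlap heuristic, not ASReview's classifier.

def tokenize(text):
    return set(text.lower().split())

def relevance_score(abstract, relevant_labeled):
    """Score an abstract by its average Jaccard overlap with known-relevant articles."""
    words = tokenize(abstract)
    if not relevant_labeled:
        return 0.0
    overlaps = [len(words & tokenize(r)) / max(len(words | tokenize(r)), 1)
                for r in relevant_labeled]
    return sum(overlaps) / len(overlaps)

def screening_order(pool, relevant_labeled):
    """Rank the unlabeled pool so likely-relevant articles are screened first."""
    return sorted(pool, key=lambda a: relevance_score(a, relevant_labeled), reverse=True)

# Hypothetical titles for illustration only
relevant = ["AI tutoring improves student learning outcomes"]
pool = [
    "soil chemistry of coastal wetlands",
    "generative AI tutoring and learning outcomes in classrooms",
    "marketing strategies for retail chains",
]
print(screening_order(pool, relevant)[0])
```

In a real workflow, each newly screened article would be fed back into the model so the ranking improves iteratively, which is how high recall is maintained while reading fewer articles.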
In addition to streamlining discovery, generative AI tools such as ChatGPT, Elicit, and Scispace Copilot are increasingly used to assist with content synthesis, thematic analysis, and even paraphrasing of complex theoretical arguments. These tools can generate preliminary literature summaries, organize themes, and suggest potential frameworks for analysis, effectively acting as intelligent research assistants (Gilson et al., 2024; Lund et al., 2023). While they are not a replacement for scholarly interpretation, they can significantly reduce the cognitive and temporal load associated with large-scale reviews, particularly for novice researchers or interdisciplinary teams lacking deep domain familiarity.
Nevertheless, the integration of AI into literature reviewing also introduces a set of epistemological and ethical challenges that educational researchers must navigate carefully. First, there is the risk of algorithmic bias and opacity. AI tools may reinforce dominant narratives or exclude marginalized perspectives if trained on biased corpora or if their selection criteria remain inscrutable to users (Leitner et al., 2023). This is especially concerning in education, a field characterized by cultural, linguistic, and ideological diversity. Second, there is concern over the erosion of critical engagement. Literature reviews are not merely technical exercises but interpretive acts—sites where researchers make sense of, and engage critically with, bodies of knowledge (Biesta, 2020). Over-reliance on AI-generated summaries or categorizations may discourage deep reading and the kind of reflexive analysis that drives theoretical innovation and methodological advancement.
Moreover, the epistemic status of AI-assisted reviews is still evolving. Questions remain about authorship, attribution, and scholarly rigor when machines contribute to knowledge synthesis. For instance, how should researchers cite AI-generated insights? What standards should guide the validation of machine-curated bibliographies? As AI becomes more embedded in research workflows, professional guidelines and educational programs must evolve to address these issues. Training future educational researchers will require not only technical competence in using AI tools but also ethical literacy and critical awareness of their limitations and biases (Biesta, 2020; Gilson et al., 2024).
In sum, AI has undoubtedly transformed the practice of literature review in educational research—from discovery to synthesis—offering powerful tools to enhance efficiency, breadth, and even insight. However, its responsible use necessitates a reconceptualization of the review process as a human–AI partnership grounded in critical reflexivity, transparency, and epistemological care. As AI continues to evolve, it will be essential for the educational research community to shape its adoption in ways that enhance, rather than erode, the foundational values of scholarly inquiry.
AI, research design, and epistemological shifts in educational research
The integration of artificial intelligence (AI) into educational research is not only transforming methodological procedures such as literature reviews but also prompting a reexamination of research design and foundational epistemological assumptions. At its core, research design in education is guided by questions about what counts as knowledge, how it can be acquired, and how it should be interpreted. The rise of AI technologies—especially machine learning algorithms, data-driven predictive models, and generative systems—challenges traditional distinctions between qualitative and quantitative paradigms, introduces new forms of data, and compels researchers to rethink their roles in the knowledge production process (Knox, 2020; Williamson, 2021).
One of the most significant shifts is the increasing emphasis on data-intensive research design. AI tools have enabled educational researchers to access, process, and analyze vast, complex, and often real-time datasets, including learning management system logs, student interactions with educational software, biometric feedback, and more. This development has made possible new forms of learning analytics, predictive modeling, and real-time intervention design that move beyond static variables and retrospective analysis (Siemens & Baker, 2012; Slade & Prinsloo, 2013). These approaches often employ correlational or pattern-recognition logics rather than theory-driven hypotheses, raising concerns about the epistemological consequences of prioritizing prediction over explanation (Hoffmann, 2019).
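The correlational, pattern-recognition logic at issue can be made concrete with a small example: an analysis that detects a strong association between fabricated LMS login counts and course scores while offering no explanatory account of why the association holds:

```python
# A toy illustration of correlational learning analytics.
# The login counts and scores below are fabricated for illustration only.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sd_x = sum((x - mx) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

logins = [2, 5, 8, 3, 9, 7, 1, 6]          # hypothetical weekly LMS logins
scores = [55, 70, 88, 60, 90, 82, 50, 75]  # hypothetical final course scores

r = pearson(logins, scores)
print(f"login-score correlation: r = {r:.2f}")
# A strong r predicts achievement but does not explain it;
# this is the prediction-over-explanation concern noted above.
```

A predictive model built on such patterns may flag students for intervention accurately while remaining silent on mechanism, which is precisely the epistemological trade-off raised by Hoffmann (2019).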
Moreover, AI-enabled research often challenges the traditional notion of researcher as the sole or even primary agent of interpretation. In human-centered qualitative research, meaning is co-constructed through dialogue, context, and reflexivity. However, AI systems trained on large corpora can now autonomously generate themes, detect sentiment, or classify discourses—tasks that were previously the domain of interpretive researchers (Zawacki-Richter et al., 2019). While these tools can augment human analysis, they may also obscure the interpretive layers and contextual complexities that qualitative research traditionally illuminates. The danger lies not only in epistemic flattening, but also in what Biesta (2010) describes as the “learnification” of education—where empirical description replaces normative questions about what education ought to do and be.
The increasing use of generative AI in designing research instruments—such as surveys, prompts, and even interview protocols—further complicates the relationship between researcher intention and methodological control. While such tools can offer efficiency and scalability, they also raise concerns about the reproducibility, transparency, and contextual appropriateness of AI-generated research materials. Furthermore, when AI tools suggest hypotheses or research questions based on pattern detection in large-scale data, they shift the site of inquiry initiation from human imagination to algorithmic suggestion, potentially privileging patterns that are statistically significant but substantively trivial or ideologically problematic (Leitner et al., 2023).
These shifts necessitate an epistemological response from the educational research community. Scholars must interrogate the assumptions embedded within AI systems, such as their reliance on past data to predict future behavior, their translation of complex social phenomena into discrete and quantifiable features, and their potential to reinforce existing inequalities under the guise of neutrality (Noble, 2018; O’Neil, 2016). They must also attend to the emerging hybrid epistemologies—part computational, part interpretive—that AI-enabled research design entails. This includes rethinking validity, reliability, generalizability, and reflexivity in light of algorithmic mediation.
Ultimately, AI invites a reimagining of educational research not just in terms of tools and techniques, but as a field that must grapple with new forms of agency, knowledge, and responsibility. As Knox (2020) suggested, the involvement of AI in research compels a shift from research about education to research with intelligent systems. This reframing moves beyond methodological novelty and into the realm of philosophical inquiry, pushing educational researchers to reconsider the aims, ethics, and politics of their work in an age of increasingly autonomous and intelligent technologies.
Conclusion: Toward the rebirth of educational research
Educational research stands at a crossroads. The longstanding challenges—ranging from flawed peer review and quantification bias to overgeneralization, neglect of individual diversity, and the reduction of learning to narrowly defined outcomes—have limited the field's relevance and impact. Too often, research has prioritized what is typical and measurable over what is possible and meaningful. The paradigm wars between methodological camps have further fragmented the field, impeding synthesis and innovation. As a result, much of educational research has failed to respond to the complexities of actual learning environments and to support truly transformative educational change.
The emergence of artificial intelligence (AI) intensifies these challenges while simultaneously opening unprecedented opportunities. AI accelerates the obsolescence of stable treatments, challenges the validity of existing research designs, and raises profound questions about what students should learn in an age where machines can perform many cognitive tasks. Traditional research models—linear, slow, and often reductionist—are ill-suited for an era of rapid technological change and increasing complexity. In response, educational research must embrace new epistemologies and methodologies that are adaptive, participatory, and pluralistic. These include design-based, complexity-informed, and futures-oriented approaches that account for emergent systems and evolving human–AI relationships.
AI also compels a rethinking of the very nature of cognition, learning, and knowledge. As human and machine intelligences become increasingly entangled, research must reconceptualize learners not as isolated individuals but as part of distributed, dynamic systems. This shift requires new forms of inquiry and assessment, as well as a commitment to addressing the ethical, social, and political implications of AI in education. Moreover, generative AI opens possibilities for democratizing research by involving teachers and students as co-creators of knowledge and using AI tools to augment reflective practice and design.
In sum, the future of educational research must be reimagined. It cannot simply be an optimization of current practices but must represent a deeper transformation—an epistemological and methodological rebirth. To remain relevant and responsible in the age of AI, educational research must become more imaginative, inclusive, and responsive to complexity. It must aspire not only to explain what is but to envision what could be.
Footnotes
Contributorship
The authors contributed equally to this article.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
