Abstract
AI chatbots have emerged as innovative educational tools and have drawn increasing attention from educators and researchers in programming education. Although previous research has highlighted the potential of applying AI chatbots in programming education, empirical evidence on their overall effects on programming learning, and on the critical factors that influence those effects, remains scarce. To fill this gap, this study conducted a meta-analysis of 32 empirical studies published between 2015 and 2025 to estimate the overall effect size of AI chatbots on programming learning performance and to identify significant moderators. Based on robust variance estimation models, the results indicated a small-to-medium effect on posttest performance (g+ = 0.538, 95% CI [0.202, 0.873], p < .01) and a medium-to-large effect on practice performance (g+ = 0.650, 95% CI [0.330, 0.970], p < .001). Moderator analyses revealed that research design and the AI chatbot-to-student ratio significantly influenced posttest performance: true experimental designs yielded significantly larger effects than quasi-experimental designs, and a 1:1 chatbot-to-student ratio was substantially more effective than a 1:N ratio. These findings underscore the potential of AI chatbots in programming education and offer practical insights for integrating them into instructional design.
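For readers unfamiliar with how pooled effect sizes such as g+ are derived, the sketch below illustrates the general mechanics with a standard DerSimonian-Laird random-effects model. This is a deliberate simplification: the study itself uses robust variance estimation to handle dependent effect sizes, and the study summaries below are hypothetical, not data from the meta-analysis.

```python
import math

def hedges_g(m_treat, sd_treat, n_treat, m_ctrl, sd_ctrl, n_ctrl):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_treat + n_ctrl - 2
    s_pooled = math.sqrt(((n_treat - 1) * sd_treat**2 +
                          (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (m_treat - m_ctrl) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    g = j * d
    # Approximate sampling variance of g
    v = (n_treat + n_ctrl) / (n_treat * n_ctrl) + g**2 / (2 * (n_treat + n_ctrl))
    return g, v

def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate; simpler stand-in for the study's RVE models."""
    k = len(effects)
    w = [1 / v for v in variances]
    g_fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    q = sum(wi * (gi - g_fixed)**2 for wi, gi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance estimate
    w_star = [1 / (v + tau2) for v in variances]
    g_plus = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return g_plus, (g_plus - 1.96 * se, g_plus + 1.96 * se)

# Hypothetical study summaries: (mean, SD, n) for chatbot and control groups
studies = [
    ((78.0, 10.0, 30), (72.0, 11.0, 30)),
    ((81.0,  9.0, 45), (76.0, 10.0, 44)),
    ((70.0, 12.0, 25), (68.0, 12.5, 26)),
]
gs, vs = zip(*(hedges_g(*t, *c) for t, c in studies))
g_plus, ci = dersimonian_laird(list(gs), list(vs))
print(f"g+ = {g_plus:.3f}, 95% CI [{ci[0]:.3f}, {ci[1]:.3f}]")
```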
