Abstract
Technological development has led to the increasingly widespread use of Artificial Intelligence (AI) in various areas of life, including education. Particularly in the context of assessment, AI offers great opportunities, but also raises many concerns, as implementing new solutions also requires adequate procedures and tests. AI can be a significant aid, especially in testing, but there is still a lack of research that would allow for a clearer understanding of its potential. This study uses quantitative methodology to examine the patterns of usage and reception of GenAI-based tools and their future in classroom assessment, as well as to evaluate needs in terms of additional support. A total of 131 university lecturers from Algeria, Poland, and Türkiye completed the distributed survey. The research findings confirm that university lecturers are engaging with different GenAI tools in assessment; however, they are limited by many biases and fears. This engagement is clearly impacted by several unrelated systemic and individual factors, such as national and international factors, experience, and attitudes. The results provide important insights into the application of AI tools in the assessment process, which can help to achieve higher efficacy and provide direction for possible curricular modifications.
Introduction
The paper investigates university lecturers’ perspectives on the use of Generative Artificial Intelligence (GenAI) in assessment. Given the constantly evolving information technology resources available for processing, generative artificial intelligence remains a mechanism offering numerous applications in the selection and use of acquired data. However, how this potential can be used safely in the classroom remains relatively underexplored (Xia et al., 2024). Thus, this article seeks to address part of the existing research gap in the field of education.
In the paper, GenAI refers to a form of AI that produces an output such as text, audio, video, images, or 3D objects (Sengar et al., 2025). Tayade et al. (2024) add that “Generative AI encompasses artificial intelligence systems with the ability to create text, images, or various forms of media through the utilization of generative models” (p. 213). Through these patterns of work, GenAI introduces creative solutions, in contrast to the classification or prediction mainly performed by traditional AI (Tayade et al., 2024). AI excels in repetitive tasks, whereas GenAI can mimic human creativity and thereby personalise and analyse content (Sarumi & Heider, 2024). GenAI can be contextualised by entering a prompt, a sentence or two that describes what the user wants the tool to do, and based on that prompt GenAI generates a response. The GenAI tool interprets the prompt and infers various details from it using its training data, including responding to nuances and context. This makes GenAI teacher-like in the sense that it has wide, if imperfect, background knowledge and can provide immediate feedback on the learner’s parameters, misconceptions, and evaluation queries (Łodzikowski et al., 2023). At the same time, the unpredictability of what GenAI will produce and, at times, how haphazard that output might be compared to a human response make it risky, as one cannot simply ask in vague terms what it means to learn history, for example, and obtain a robust response (Aguado-García et al., 2025). GenAI can be likened to a conversation partner, as it creates a dialogue that involves asking questions and seeking clarifications to arrive at solutions (Susskind et al., 2024). In this sense, it bears resemblance to a chatbot (Akpan et al., 2025).
Its drawbacks may include providing verbose and incorrect responses with spurious references, an inability to edit previous outputs, and an inability to recall earlier queries or disagree with prior outputs (Kadel et al., 2024). However, GenAI can also be advantageous over other assessment tools: it provides immediate feedback, which opens avenues to support and enhance discussions; it generates new questions based on the learner’s preference and style of presentation; it provides immediate grading, with rubrics that can support the onscreen assessment experience; and it can be applied in online assessments and learning management systems through APIs and various plug-ins (Adiguzel et al., 2023; Irons & Elkington, 2022; Michel-Villarreal et al., 2023; Sari, 2023; Tapalova & Zhiyenbayeva, 2022). By proofreading and editing assignments upon request for grades via a prompt, GenAI can uphold academic integrity at the request of institutions and organisations. Introducing question pool generation and randomised selection can also spare teachers from writing question banks for quizzes and assignment tiers, a process that may easily be mechanised (Johnson et al., 2025; Moore, 2024). It is important to note that, because data continuously enters the model, there are pros and cons regarding potential biases and imprecise output; however, this behaviour mirrors human grading. Although the technical capabilities and limitations of GenAI tools are increasingly well documented, significantly less is known about how university lecturers understand these tools in the context of classroom-based assessment and how such understandings can shape the current implementation.
Current Trends in Higher Education Assessment
This section aims to provide insights into the main trends in assessment in higher education. GenAI technologies are transforming assessment designs in higher education to make them less susceptible to answer-generation technologies, address the need for revising assessment designs, and involve future-oriented approaches.
The emergence of GenAI tools marks a paradigm shift visible in assessment design (Khlaif et al., 2025; Ogunleye et al., 2024; Yan et al., 2024). The study conducted by Khlaif et al. (2025), based on group interviews with 155 teachers working at Palestinian universities, demonstrates that one of the visible trends in assessment is the implementation of AI-enhanced solutions in the continuous process of teaching and learning. According to participating teachers, assessment requires the integration of technology and strong learner-orientation, although many teachers do not know how to unify both aspects. Teachers were moderately positive about AI-enhanced solutions in their teaching practice, fearing an imbalance between human and computer-assisted learning and teaching.
Luo (2024), based on an analysis of university documentation from 20 selected institutions in North America, Europe, and Australia, also confirmed the need to treat AI not as an external reviewer but to incorporate its use into the curriculum through the integration of AI tools in assessment, giving learners a better understanding of its functioning (Aslanyan-rad, 2024).
The studies cited above show that teachers are important agents in the teaching process, so their perceptions of the creation of different AI-enhanced assessment tasks, shaped by their fears and caution but sometimes perhaps also by relief, are crucial.
These forms of assessment could be much richer than typical testing formats, or even prior computational forms like multiple-choice questions. They could also offer authentic, integrated assessments, including a variety of evaluations such as “who,” “what,” and “how.” This range of formats could easily accommodate both formative and summative assessments based on a variety of activities (Dimitriadou & Lanitis, 2023; Hobbins et al., 2022). However, the task of implementation itself still lies in the hands of teachers.
Nevertheless, there is limited evidence on how lecturers integrate GenAI into formative and summative assessment workflows, what practical or ethical concerns shape their decisions, and which contextual factors (institutional policies, digital literacy, assessment culture) influence their adoption (Luo, 2024).
GenAI in Assessment
GenAI can produce a variety of high-quality assessment tasks, thereby shaping new trends for the future of education, which is the topic discussed in this section.
At present, GenAI can easily create a plethora of low-level multiple-choice, fill-the-gap, and other short-answer questions with instantly computable responses (Moore, 2024). A more sophisticated range of medium- and high-demand and genre questions is on the horizon. This capability, previously unknown, already means that students can no longer cheat by sharing answers. Left unexamined and unaddressed, this may lead to many negative consequences and, most importantly, excessive amounts of unnecessary work. Luckily, the same AI could be used to evaluate these generated or imported assessment items in terms of their scholarship, fairness, and soundness. As a scripting tool, GenAI can offer educational designers unprecedented leverage to generate new activities, assessments, and intervention lessons, and then enrich or modify existing ones automatically. That could theoretically free the whole academic assessment design infrastructure of the preceding century and a half from biases that have been latent ever since the inception of standardised assessments in education (Lim et al., 2023).
The research conducted by Kooli and Yusuf (2025) provides promising results regarding the quality and validity of GenAI feedback (revealing only small differences between human and AI-generated corrections), which again confirms its potential and the need for further investigation. The novel assessment designs implemented with the support of GenAI tools can be generated, field-tested, and designed to leverage lab-like experiments on a large, global scale through cross-institutional collaboration; however, as the researchers underline, the capacities of GenAI assessment should be further investigated (Yan et al., 2023).
On the other hand, elaborate pedagogical designs involving GenAI tools post-assessment can be implemented in many disciplines, focusing on writing or design processes; these designs could be novel in the case of lost use.
Given the rapid expansion of GenAI and the new requirements and needs of higher education assessment, it is essential to examine how university lecturers understand, experience, and implement these capabilities in their everyday teaching practice.
Challenges of Implementing GenAI
Together with the opportunities for using GenAI in assessment, the challenges should also be confronted, including contextual and personal factors influencing their adoption. These factors are likely to influence whether lecturers choose to incorporate GenAI into assessment practices, but empirical studies documenting such influences are still emerging.
While the advent of GenAI promises to transform assessment methodologies, making great strides towards overcoming many limitations in assessment for learning and at scale (Łodzikowski et al., 2023), its implementation comes with new challenges. First, developing an accurate grading model for automatic assessment through GenAI techniques entails high development costs, as a massive amount of data needs to be collected and processed (Bandi et al., 2023; Fagbohun et al., 2024).
On top of that, additional development costs are incurred in implementing suitable human-like grading rubrics, which often require advanced AI training (Awadallah Alkouk & Khlaif, 2024; Kooli & Yusuf, 2025). Second, the implementation of GenAI systems requires constantly keeping up with new regulations and modern applications, as the field is evolving rapidly, which is usually a costly and demanding process. The advent and adoption of revolutionary technologies are often accompanied by an extensive backlash. Third, even though GenAI is user-friendly due to Natural Language Processing (NLP), it still requires users to have a basic understanding of how to phrase queries so that they produce appropriate outputs (Iorliam & Ingio, 2024). The increasing adoption and spread of GenAI may widen already existing accessibility gaps for rubric-based automatic assessment systems, and furthermore reflect and amplify unevenness in social dynamics. These include technical illiteracy hindering access to education, asymmetries in AI-guided teaching, and a lack of expertise leading to uncritically reliant systems. GenAI has brought a paradigm shift in teaching, learning, and assessment. Institutions have begun to use GenAI as an intelligent tutor to coach individual students by assisting them in exploring learning resources and connecting ideas through generative prompts (Mao et al., 2024; Preiksaitis & Rose, 2023). The rise of GenAI tools has significantly disrupted assessment design methodologies. Promising generative pre-trained transformer (GPT) models have demonstrated the ability to automate parts of assessment design, together with the teachers’ subject expertise (Moussa, 2024). For instance, GenAI tools can assist in generating complex multi-dimensional questions in a fraction of a second, tagging questions to their corresponding content topics, providing formative assessments, and more.
However, institutions are also grappling with significant challenges surrounding the implementation of GenAI in assessment contexts (Jongkind et al., 2025).
Research Gap
This study contextualises the important topic of the possible applications of AI in assessment. Xia et al. (2024) mention the existing gap, especially concerning the standardisation of the methodology used for the implementation of AI in assessment and the ongoing evaluation of diverse approaches. This leads to the conclusion that although AI has enormous potential in optimising educational processes, it still needs in-depth consideration and verification. Comparative research is particularly scarce in this regard. There is a lack of comparative studies verifying the use of generative artificial intelligence in the assessment process, as well as analyses of its impact on teachers and teaching practices (Khlaif et al., 2024). Analysing GenAI implementation for assessment in different cultural backgrounds might provide valuable insights into its potential and challenges. Very few studies have sought to capture the views of educators across different continents and policy landscapes, and even fewer have focused on countries outside dominant Anglophone contexts. By including participants from Algeria (Africa), Poland (Europe), and Türkiye (bridging Europe and Asia), this study seeks to provide pilot research and define the directions for further cross-cultural research. These countries were chosen not only for their contrasting educational policies and practices, but also for their underrepresentation in comparative research on AI in assessment. Through this design, the study contributes to a more inclusive and globally nuanced understanding of how teachers conceptualise and evaluate the use of GenAI in assessment, thereby informing ongoing discussions about equity, trust, and the standardisation of AI methodologies in education.
To examine the context described above, the authors formulated the following research questions, which focus on teachers’ perceptions of GenAI in assessment (RQ1), experiences in the authentic use of GenAI technology in teaching practice (RQ2), and factors that influence the implementation of GenAI-supported assessment solutions (RQ3).
How do university lecturers perceive the role and potential of Generative AI (GenAI) in transforming assessment practices within different cultural and educational settings?
In what ways do lecturers integrate GenAI tools into assessment design and feedback processes, and which individual or institutional factors influence the extent of their adoption across different cultural and educational contexts?
What challenges, ethical concerns, and standardisation needs do lecturers in Algeria, Poland, and Türkiye identify regarding the implementation of GenAI in assessment across different cultural and educational contexts?
Methodology
The group that makes use of GenAI tools in classroom assessment was analysed with respect to the patterns of usage (e.g., tools used, types of assessment performed with the help of GenAI), reception of AI-based tools (e.g., effectiveness and fairness of AI-based approach), and the future of AI use in classroom assessment (e.g., continuation of usage of GenAI tools, potential risks) as well as evaluation of needs in terms of additional support. The study follows the pragmatic research paradigm, focusing on the identification of the practices and challenges of GenAI-enhanced assessment in Algeria, Poland, and Türkiye.
The collected responses are based on the subjective views of the university lecturers on the effectiveness, fairness, and possible risks of GenAI tools for assessment. Simultaneously, through the inquiry, current practices and future perspectives are identified. Owing to the size and structure of the sample, the study, conducted in three countries, yielded data suitable for inferential statistical analysis, as well as data on the differences in perceptions of the participating lecturers. The instrument in this research was a survey designed by the authors of the paper and distributed via Google Forms between October 2024 and December 2024 (cf. Appendix 1).
Participants
A total of 131 participants completed the distributed survey. Participants came from Algeria, Poland, and Türkiye. The study involved lecturers teaching English or working in English departments at higher education institutions. The participants took part in the study voluntarily and were informed that they could withdraw from it at any time. Participants were also informed of the anonymous nature of the data collection, and the study was carried out in accordance with The Code of Ethics of the World Medical Association (Declaration of Helsinki) for experiments involving humans. The research was also assessed by the Ethical Committee at Burdur Mehmet Akif Ersoy University, Türkiye, and was approved under decision number GO 2025/933 (06.01.2025). Table 1 presents information about the age of participants with respect to their nationality. Each group was relatively evenly distributed. A purposive sampling strategy was employed to ensure that the sample was aligned with the research objectives.
Table 1. Number of Respondents by Country and Age.
This purposive selection is justified, as it directly serves the comparative aim of the study by building diversity into the design from the start. The rationale behind the purposive sampling was the cross-national comparability of the study and the professional experience in assessment of the participating lecturers (being an AI user was not an a priori condition for participation). To increase validity, selection criteria of geographical spread, balanced demographic representation, voluntary participation, and anonymity were introduced.
The smallest subgroup was 20- to 30-year-olds, consisting of 28 respondents, and the largest (31- to 40-year-olds) consisted of 38 people. It should be noted that a transformation on the age variable was performed. This is important for the purpose of conducting statistical inference (see Table 2).
Table 2. Number of Respondents by Country and Gender.
The distribution of the education of the participants is presented in Table 3. Here, too, the distribution is relatively equal. As for length of service, most of the respondents had between 5 and 10 years of experience. The second most numerous group was people with between 11 and 20 years of experience. A further 21% of the sample had more than 20 years of experience. In total, 69% of the respondents declared regular and frequent use of AI in their professional work. The vast majority of the respondents had used AI tools for classroom assessment. Non-users constituted only 31% of the sample. Considering how relatively new AI tools are, this indicates a significant interest in using this new technology in the academic environment.
Table 3. Results of Chi-square Tests for AI Use Across Demographic Variables.
Survey Data Collection Instrument
The analysis was performed using the R programming language and the RStudio programme. All variables present in the data set can be considered as either categorical or ordinal, even though some of them may seem like numerical data. This is because when analysing variables measured on the Likert scale, we cannot evaluate the actual difference or distance between grades – it can only be stated that a grade of 2 is greater than 1, but, unlike with numerical data, it cannot be assessed by how much. The analysis was divided into two parts: descriptive and inferential. In the descriptive part, counts and proportions of the analysed variables were presented.
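The descriptive step described above can be illustrated with a minimal sketch. The authors report using R; the Python version below is only an illustration, and the response values, the five-point scale, and the function name `describe_ordinal` are assumptions introduced here, not study data or the authors' code.

```python
from collections import Counter

# Hypothetical Likert responses (1 = lowest grade, 5 = highest); NOT study data.
responses = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]


def describe_ordinal(values, levels=range(1, 6)):
    """Return {level: (count, proportion)} for every level, including unused ones."""
    counts = Counter(values)
    n = len(values)
    return {lvl: (counts.get(lvl, 0), counts.get(lvl, 0) / n) for lvl in levels}


summary = describe_ordinal(responses)
for level, (count, share) in summary.items():
    # Only counts and shares are reported: distances between grades are undefined.
    print(f"grade {level}: n = {count}, share = {share:.0%}")
```

Reporting counts and proportions per grade, rather than a mean, respects the point made above that the distance between two Likert grades cannot be quantified.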
As all the variables are either categorical or ordinal, these two values provide a straightforward way of visualising the distributions. In the inferential part, statistical tests were performed. Ordinal variables were tested for differences in distributions by categories (e.g., the perception of fairness of assessment by gender). For such comparisons, the Kruskal–Wallis test was used, as all categorical variables have at least 3 levels, in which case the usual choice, the Mann–Whitney U test, is unfit for use (as it can be used in pairwise comparisons only). In addition, χ² tests of independence were performed any time the presence of a relationship between two nominal variables was to be assessed (e.g., GenAI use vs gender). When testing for a relationship between two ordinal variables, Spearman’s coefficient was calculated to estimate the strength of the relationship, and then its statistical significance was tested. The main hypothesis formulated in the paper is that university lecturers’ perceptions and use of GenAI tools in classroom-based assessment are influenced by national context, professional experience, and the availability of institutional support.
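To make the group comparison described above more concrete, the following sketch computes the tie-corrected Kruskal–Wallis H statistic from scratch. It is a Python illustration under stated assumptions and not the authors' R analysis: the group labels and ratings are hypothetical, and the closing p-value uses the asymptotic chi-square approximation, which for exactly three groups (2 degrees of freedom) reduces to exp(-H/2).

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict {group label: list of ratings}."""
    pooled = [v for ratings in groups.values() for v in ratings]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for ratings in groups.values():
        rank_sum = sum(ranks[start:start + len(ratings)])
        total += rank_sum ** 2 / len(ratings)
        start += len(ratings)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical fairness ratings for three groups; NOT study data.
ratings_by_group = {
    "group_a": [4, 5, 3, 4, 5],
    "group_b": [2, 3, 3, 4, 2],
    "group_c": [3, 4, 4, 5, 3],
}
h_stat = kruskal_wallis_h(ratings_by_group)
p_approx = math.exp(-h_stat / 2)  # chi-square survival function, valid for df = 2 only
print(f"H = {h_stat:.3f}, approximate p = {p_approx:.3f}")
```

The tie correction matters for Likert data, where many respondents share the same grade; without it the statistic would be understated.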
Results
This section is dedicated to a description of the results. It is divided into subsections corresponding to the research questions asked. First, the general frequency of use of GenAI tools is presented together with teachers’ views on their implementation. Second, the fields of implementation and the benefits of AI tools highlighted by teachers in their practice are discussed. Finally, barriers and risks are addressed. Based on the results of the test of independence, the cross-national relationships are also analysed in this section.
The Role and Potential of Generative AI (GenAI) in Transforming Assessment Practices (RQ1)
The teachers participating in the study unanimously pointed to the importance of generative AI in education. In particular, they highlighted numerous tools that can support assessment processes. Among the listed GenAI tools, the most frequently used one was AI-powered plagiarism detection, selected by over 76% of respondents. While this indicates a high reliance on AI in verifying originality, it represents only one narrow aspect of the assessment process. Tools that more directly support formative assessment, such as AI-assisted feedback generators or AI-supported language assessment systems, remain rarely used. This pattern suggests that lecturers are more willing to delegate administrative elements of assessment to AI, while maintaining human control over evaluative and pedagogically meaningful tasks such as feedback provision, rubric-based grading, or language performance assessment. Although plagiarism detection was the most common tool across countries, noticeable cross-country differences emerged in the adoption of more advanced assessment-related GenAI applications, reflecting varying institutional cultures and assessment policies.
The vast majority of respondents reported using AI tools frequently or very frequently. Only one respondent claimed never to use AI-based tools, although this contradicts their earlier statement indicating AI use (question 9).
Time saving was the strongest motivation for using GenAI. This indicates that lecturers predominantly view AI as a tool for reducing assessment workload, rather than as a means to improve assessment quality or student learning (Figure 1). The least frequently selected motivation was ensuring fairness and objectivity. This suggests that lecturers maintain strong confidence in their own assessment fairness and do not view GenAI as a tool that could meaningfully enhance reliability, mitigate bias, or support consistent grading across diverse student groups. Both the perception and the role of generative artificial intelligence (GenAI) vary depending on the country and educational context. These differences are particularly evident when comparing Türkiye and Poland, with Türkiye demonstrating the highest level of GenAI use, while Poland exhibits considerable caution toward these technologies. In Türkiye, GenAI tools are most commonly used for plagiarism detection. In contrast, respondents from Algeria primarily emphasise the use of GenAI tools for generating feedback that supports assessment processes, as well as for the automation of feedback.

Frequency of GenAI tools used in classroom assessment.
Integration of GenAI Tools into Assessment Design and Feedback Processes (RQ2)
The benefits of using AI tools indicated by respondents are directly linked to different stages of the assessment process. There are several interesting differences in the way teachers in different countries and educational settings integrate GenAI into the assessment process. Increased grading speed, particularly popular with participants in Türkiye, points to the perceived value of GenAI in alleviating the burden of work associated with summative assessment. In Poland, however, the most frequently indicated benefit was improved feedback quality, which suggests a stronger tendency to use GenAI to improve formative assessment practices. Interestingly, in Algeria, greater objectivity was highlighted alongside grading speed, meaning that the lecturers see GenAI as a means of making evaluative decisions fairer and more consistent. On the other hand, perceived objectivity was among the least chosen benefits in the two other countries. As indicated above, on average, the respondents view AI-based assessment as fair, with only six lecturers rating the fairness achieved through the tools as low or extremely low. In contrast to this assessment, the most widely reported impediments, including lack of trust in AI systems and algorithmic bias, are related to the quality and reliability of assessments. Lecturers use GenAI to generate initial and supplementary feedback (mainly in Türkiye), notably in large classes where human feedback is time-consuming and difficult to sustain. They believe that GenAI tools are effective at addressing surface-level features, that is, structure, language accuracy, and criterion-based comments; however, interpretive, developmental, and dialogic feedback is reserved for human intervention.
Challenges, Ethical Concerns, and Standardisation in the Use of GenAI for Assessment (RQ3)
While lecturers tend to evaluate AI-generated assessment outputs as fair, they simultaneously express significant doubts about the accuracy and trustworthiness of these systems. This discrepancy suggests a complex, and possibly unstable perception of GenAI in assessment contexts. Most respondents reported feeling comfortable using AI tools in assessment-related tasks, and none stated being very uncomfortable. Over 75% indicated that they are likely or very likely to continue using GenAI in future assessments.
A relationship was identified between frequency of use and the level of concern, namely that the more frequently generative AI was used (as observed in Türkiye), the fewer concerns it raised.
The reason for not using AI most frequently indicated by the respondents was the lack of trust in the accuracy or fairness of AI-based assessment (nearly 70% of non-users selected this reason). Among the other most important doubts, respondents indicated concerns about students’ privacy and data security, a preference for traditional assessment methods, and a lack of familiarity with AI tools. Interestingly, only 2% of lecturers claimed that their institution does not encourage the use of AI tools. One person added that in their field they prefer to rely on their own assessment, which might be understood either as a preference for traditional methods over AI-based systems or as a concern about data security.
The most commonly named reasons for not using AI among the participants of the survey are lack of trust in AI’s accuracy as well as the concerns about the privacy and security of the data. Among the participants, 16% agree that they prefer traditional methods of assessment and do not feel familiar with AI. Considering how relatively new AI tools are, this indicates a significant interest in using this new technology in the academic environment. While the Algerian and Polish subsamples are nearly identically distributed between AI users (constituting around two thirds of the sample) and non-users, the percentage of university lecturers using GenAI in assessment was higher in Türkiye (just below 74%).
The most frequently indicated risk of using AI tools in Algeria and Türkiye was over-reliance on GenAI for grading. In Poland, on the other hand, the loss of the personal connection between students and teachers was considered the main risk. It is also apparent that Polish lecturers have major concerns about data privacy. This might be because the citizens of the European Union have a great awareness of data privacy. The findings indicate the significant importance of the human element in assessment. On the one hand, GenAI-supported assessment can greatly accelerate and standardise the evaluation process; on the other hand, the responses reveal concerns about relying too heavily on GenAI alone, which may substantially affect human relationships in the learning assessment process. There were significant differences between countries in terms of what additional resources were needed. In Türkiye, training and workshops were the primary need. The same was quite important in Algeria, although the main required support was access to more reliable AI. Polish lecturers indicated peer support or mentoring most frequently. As for required university support, providing funding for AI tools was beyond doubt the most important aspect for lecturers from Türkiye and Algeria. In Poland, developing clear guidelines on AI use has the highest priority among lecturers, although a wider range of training opportunities and additional funds are also considered important.
Differences Across Countries
The differences in the distribution of AI users and non-users by country, age group, and job seniority group were tested using χ² tests of independence. At the assumed level of significance α = 0.05, no statistically significant differences were discovered, although the p-value of the test between age groups indicates that the results obtained in the sample for this variable are the least plausible under the hypothesis of no relationship between the age group and the use of AI. The test of independence was used to check for an association between the chosen variables. The results of the analyses show that the use of AI does not differ significantly between countries, age groups, or length of service groups, although in the case of age there is a slight trend suggesting that older people use AI less often, but this is not statistically confirmed (Table 3). Significant differences do appear in the perception of the risks associated with AI, which depends on the country, but not on age or length of service. Similarly, the motivations for using AI differ between countries. Spearman’s correlation analysis indicates that all the relationships examined between the assessment of ease of use, fairness, effectiveness, frequency of use, and willingness to continue using AI are positive, moderate or moderately strong, and statistically significant.
The strongest relationship is between the assessment of effectiveness and ease of use, suggesting that people who feel comfortable using AI tools are more likely to perceive them as effective. The weakest relationship was found between ease of use and frequency of use. Continued use of AI is most strongly associated with the assessment of the tool’s effectiveness, although ease of use and perceived fairness of assessment show a similar, moderately strong association. Overall, attitudes towards AI are consistent: the more users perceive the tool as convenient, fair, and effective, the more often they use it and the more willing they are to declare further use.
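The tests of independence described above can be illustrated with a minimal Python sketch. It computes Pearson’s chi-square statistic for a country-by-usage contingency table, which is one common test of independence; the source does not name the specific tests used, so this choice is an assumption. The counts are hypothetical and are not the study data, and the function names are illustrative. For a 3 × 2 table the test has 2 degrees of freedom, in which case the chi-square upper-tail probability has the closed form exp(−x/2).

```python
import math


def chi_square_independence(table):
    """Return Pearson's chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi2_upper_tail_df2(x):
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-x / 2)


# Hypothetical counts (NOT the study data).
# Rows: three countries; columns: [AI users, non-users].
table = [[30, 10], [28, 12], [32, 8]]

stat, df = chi_square_independence(table)
p_value = chi2_upper_tail_df2(stat)  # valid here because df == 2
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

With these hypothetical counts the p-value is well above α = 0.05, so the hypothesis of independence would not be rejected, which mirrors the kind of non-significant result reported for AI use by country.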
Factors Affecting Use of AI-Enhanced Tools in Assessment
In the Spearman’s correlation matrix (Table 4), the strongest relationship among pairs of ordinal variables is between the assessment of the effectiveness of AI tools and comfort in using AI. One plausible explanation is that the more comfortable a person is with AI-based tools, the better use they can make of them, and hence the more efficiently and effectively they use them, obtaining better results. This remains an interpretation, however: the correlation by itself does not establish direct causation between these two variables.
Table 4. Spearman’s Correlation Matrix.
Interestingly, the relationship between how comfortable a person is using AI and the frequency of using AI tools is the weakest in the sample (0.49, which is still considered a moderate correlation). As for the likelihood of continuing to use AI tools, the strongest relationship is with the opinion on the effectiveness of these tools. However, the differences between effectiveness, ease of use, and views on the fairness of assessment are minimal, and all three variables are moderately strongly correlated with the likelihood of continuing to use AI-based tools (Table 4). These results are in line with the intuition that people are more likely to continue using tools that they feel comfortable with and that they consider fair or effective. Although the strongest relationship was observed for the effectiveness component, the differences between the values of Spearman’s correlation coefficient are so small that, considering the sample size, this ordering cannot be extrapolated to the wider population. All of the relationships presented above are statistically significant (at the standard significance level α = 0.05), positive, and at least moderate.
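A minimal sketch of how a Spearman coefficient such as those in the matrix can be computed: the ratings are converted to ranks (tied ratings share the mean of their positions) and Pearson’s correlation is then applied to the ranks. The five-point ratings below are hypothetical, not the survey responses, and the function names are illustrative.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    counts = Counter(values)
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman's rho as Pearson's correlation of the average ranks.

    Assumes neither variable is constant (otherwise the denominator is zero).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return covariance / (spread_x * spread_y)


# Hypothetical 1-5 ratings from eight respondents (NOT the study data).
comfort = [1, 2, 2, 3, 4, 4, 5, 5]
effectiveness = [2, 1, 3, 3, 4, 5, 4, 5]

rho = spearman_rho(comfort, effectiveness)
print(f"Spearman's rho = {rho:.2f}")
```

Because Likert-type ratings produce many ties, the average-rank step matters: the shortcut formula based on squared rank differences is exact only when there are no ties.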
The Kruskal–Wallis tests also revealed differences among countries; only the statistically significant results are presented below, beginning with AI effectiveness (Table 5). In the accompanying box plots, the points outside the whiskers represent outliers, that is, values that can be considered extreme.
Table 5. Kruskal–Wallis Test Results (AI Effectiveness vs Country).
The analysis confirmed that the perceived effectiveness of GenAI use in assessment differs among respondents from different countries (Figure 2): the Kruskal–Wallis test showed statistically significant differences (H = 29.5, p < .001). Lecturers’ views on the effectiveness of GenAI implementation in assessment therefore depend on the country.
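The Kruskal–Wallis statistic can be sketched as follows, assuming the standard rank-based formula with a correction for ties (the source does not state which software or options were used). The ratings and function names are hypothetical. With three groups, H is compared with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(−H/2); applying this to the H = 29.5 reported above gives a p-value of roughly 4 × 10⁻⁷, in line with the reported significance.

```python
import math
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    counts = Counter(values)
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a list of samples.

    Assumes the pooled data are not all identical (the correction would be zero).
    """
    pooled = [value for group in groups for value in group]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum, start = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12 / (n * (n + 1)) * weighted_sum - 3 * (n + 1)
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    return h / (1 - tie_term / (n ** 3 - n))


def p_value_three_groups(h):
    """Chi-square upper tail with 2 degrees of freedom (three groups only)."""
    return math.exp(-h / 2)


# Hypothetical 1-5 effectiveness ratings per country (NOT the study data).
algeria = [3, 4, 4, 5, 3, 4]
poland = [2, 3, 2, 1, 3, 2]
turkiye = [4, 5, 5, 4, 5, 3]

h = kruskal_wallis_h([algeria, poland, turkiye])
print(f"H = {h:.2f}, p = {p_value_three_groups(h):.4f}")
print(f"p for the reported H = 29.5: {p_value_three_groups(29.5):.1e}")
```

The chi-square reference distribution is an approximation that is usually considered adequate for group sizes like those in a survey of 131 respondents; for very small groups an exact or permutation approach would be preferable.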

Figure 2. Distribution of AI effectiveness rating by country.
The Kruskal–Wallis test showed that representatives of the three countries participating in the study assess the fairness of AI use in very different ways (H = 24.4, p < .001) (Table 6). Additionally, a post hoc analysis revealed that perceptions in Poland differ from those in both Türkiye and Algeria, while the differences between Algeria and Türkiye are not statistically significant (Figure 3).
Table 6. Kruskal–Wallis Test Results (AI Fairness vs Country).

Figure 3. Distribution of AI fairness rating by country.
In terms of ease of use and comfort of use, the Kruskal–Wallis test also showed significant differences between countries (H = 30.3, p < .001) (Table 7 and Figure 4).
Table 7. Kruskal–Wallis Test Results (AI Use Comfort vs Country).

Figure 4. Distribution of AI ease of use rating by country.
For ease of use, the Dunn post-hoc test shows that the largest difference is between Türkiye and Poland (z = −5.45, p < .0001).
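Dunn’s pairwise comparison can be sketched in the same spirit. The z statistic divides the difference in mean ranks between two groups by its standard error, with a tie correction, and the two-sided p-value comes from the standard normal distribution. The data and names below are hypothetical, and no multiplicity adjustment is applied because the source does not state which one, if any, was used. As a check, a z of −5.45 as reported above corresponds to an unadjusted two-sided p of roughly 5 × 10⁻⁸.

```python
import math
from collections import Counter
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    counts = Counter(values)
    first_position = {}
    for position, value in enumerate(sorted(values), start=1):
        first_position.setdefault(value, position)
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def dunn_z(groups):
    """Return {(name_a, name_b): z} for every pair of groups (tie-corrected)."""
    names = list(groups)
    pooled = [value for name in names for value in groups[name]]
    n = len(pooled)
    ranks = average_ranks(pooled)
    mean_rank, start = {}, 0
    for name in names:
        size = len(groups[name])
        mean_rank[name] = sum(ranks[start:start + size]) / size
        start += size
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    base_variance = n * (n + 1) / 12 - tie_term / (12 * (n - 1))
    result = {}
    for a, b in combinations(names, 2):
        standard_error = math.sqrt(
            base_variance * (1 / len(groups[a]) + 1 / len(groups[b]))
        )
        result[(a, b)] = (mean_rank[a] - mean_rank[b]) / standard_error
    return result


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical 1-5 ease-of-use ratings per country (NOT the study data).
ratings = {
    "Algeria": [3, 4, 4, 5, 3, 4],
    "Poland": [2, 3, 2, 1, 3, 2],
    "Türkiye": [4, 5, 5, 4, 5, 3],
}

for pair, z in dunn_z(ratings).items():
    print(pair, f"z = {z:.2f}, unadjusted p = {two_sided_p(z):.4f}")
```

When several pairs are compared, the unadjusted p-values are typically corrected for multiple testing (for example, by multiplying each by the number of comparisons and capping the result at 1).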
The differences in the likelihood of continuation of using AI tools are also statistically significant (Table 8). Respondents from Türkiye are, on average, most likely to continue using AI, while Polish lecturers have less confidence in that regard (Figure 5). Post-hoc Dunn tests were conducted; detailed results are provided in Supplementary Tables S1–S4 (Appendix 1).
Table 8. Kruskal–Wallis Test Results (AI Continuation vs Country).

Figure 5. Distribution of AI use continuation likelihood rating by country.
Discussion
This study provides data and insights into the implementation of GenAI in classroom-based assessment in higher education, highlights the emerging challenges it poses, and identifies the advantages of its use for both teaching practices and student learning experiences across three national contexts: Algeria, Türkiye, and Poland. The main findings indicate that the participating teachers feel comfortable using GenAI and trust its fairness; however, they expressed doubts about its full accuracy. Among the most commonly named barriers to its use, lecturers mentioned concern for students’ privacy. An important observation from the survey results is that lecturers perceive GenAI less as a means of decreasing the workload connected with assessment than as a potential new way to improve the assessment process. The study revealed cross-national differences in the perception of GenAI’s efficiency, fairness, and ease of use, which might be a useful direction for further studies. Within this context, this section presents critical interpretations of the current study’s findings, derived from a mixed-methods survey and statistical analysis of 131 participants, in the light of the main hypothesis and the research questions, with direct relevance to the literature review. The investigation is guided by the main hypothesis, which posits that university lecturers’ perceptions and use of AI tools in classroom-based assessment are influenced by national context, professional experience, and the availability of institutional support. The findings support this hypothesis, providing empirical evidence of the complex interplay of these factors in shaping GenAI use within the higher education context.
The Role and Potential of Generative AI (GenAI) in Transforming Assessment Practices (RQ1)
The findings indicate that the level of GenAI adoption among lecturers is moderate to high, with a stronger preference among younger lecturers and higher acceptance in Türkiye than in Algeria and Poland. The adoption is neither universal nor indiscriminate. The most commonly selected tools were AI-based plagiarism checkers (76.7%), automated grading systems (54.4%), and AI-generated feedback (46.7%). These tools were regularly integrated into summative assessments such as essays, quizzes, and exams. These patterns corroborate recent research suggesting that the use of GenAI in higher education is more a matter of functional necessity than pedagogical disruption (Michel-Villarreal et al., 2023; Plate et al., 2024).
In addition, according to the findings, university lecturers believe that GenAI contributes most to assessment functions that can be clearly defined and automated, such as automated grading, standardised feedback generation, and plagiarism detection. Such applications are located mainly in formal summative assessment contexts, where scalability, consistency, and time efficiency are valued more highly. This is consistent with studies indicating that educators prefer to assign routine assessment tasks (e.g., grading multiple-choice questions) to AI systems while retaining evaluative authority (Adiguzel et al., 2023; Irons & Elkington, 2022; Tapalova & Zhiyenbayeva, 2022).
On the other hand, lecturers note limitations in the role of GenAI in formative and judgement-heavy assessment practices. While they do not discount the tool’s merits, including its ability to provide instant feedback, they express scepticism about its ability to adequately address the contextual, developmental, and dialogic dimensions of assessment for learning. This points towards broader concerns about the epistemic limitations of automated systems and is in line with previous research on the irreplaceability of human judgment in formative assessment (Khlaif et al., 2025; Łodzikowski et al., 2023).
According to the participants, the adoption of GenAI reshapes assessment practices selectively. This transformation can be achieved only under ethical governance, institutional support, and, most importantly, methodological transparency. Lecturers thus conceptualise transformation as conditional and incremental, without denying GenAI’s transformative potential. Finally, the observation of Xia et al. (2024) that uncertainty is the dominant barrier to the integration of GenAI is confirmed by the fact that the presence of clear institutional policies and standardised frameworks increases lecturers’ willingness to extend such integration. In terms of perception, role, and motivation of GenAI use for assessment, a clear polarisation of views can be observed among teachers from different countries. This may result from experience in using the tools, appropriate training, and possibly also from properly formulated rules for the operation and implementation of generative AI tools in education. The strongest correlation was observed between the perceived effectiveness of GenAI tools and comfort in using them. The biggest differences in this respect can be noted between Türkiye and Poland.
Integration of GenAI Tools into Assessment Design and Feedback Processes (RQ2)
The findings demonstrate that lecturers’ adoption of GenAI tools is selective. The tools are used to support assessment design and feedback processes that are scalable, routinised, and time-intensive. Lecturers indicate that they most commonly use GenAI tools for automated grading, plagiarism detection, and the generation of standardised feedback. These applications are found largely in summative assessment contexts, where lecturers prioritise efficiency, consistency, and workload management. Such use can be regarded as an instrumental approach to integration. This supports prior findings suggesting that lecturers are likely to use AI for tasks that are clearly structured and amenable to automation (Michel-Villarreal et al., 2023; Plate et al., 2024). Attitudes also differ regarding the choice of GenAI tools and their main modes of implementation. Turkish lecturers are more willing to trust automated assessment, whereas Polish and Algerian participants are more hesitant. In Algeria, the feedback process is more often supported by GenAI.
Notably, lecturers report that they use GenAI tools in designing assessments. In such adoption, GenAI is a design assistant rather than an autonomous decision-maker. Lecturers use GenAI tools to generate question pools, create and refine grading rubrics, or standardise assessment criteria, which enables them to retain control over content validity and alignment with learning outcomes. This reinforces the findings of Moore (2024) and Johnson et al. (2025), who point out the importance of preserving human oversight as GenAI’s role in streamlining assessment development grows. The findings also indicate hesitation on the lecturers’ part to rely on GenAI tools for authentic, complex, or discipline-specific assessment design (mainly in Poland and Algeria), reflecting ongoing concerns regarding epistemic depth and contextual understanding. This aligns with Luo’s (2024) findings.
At the individual level, lecturers’ engagement with GenAI is shaped by factors such as experience, digital literacy, and technological confidence. Lecturers with considerable experience tend to adopt GenAI tools in assessment cautiously and strategically, privileging reliability and assessment validity. Less experienced lecturers, by contrast, show openness to using GenAI for exploratory purposes, particularly in feedback generation. Notably, both groups share the view that human oversight is necessary, underscoring a professional commitment to pedagogical responsibility when using GenAI tools.
At the institutional level, lecturers working in universities that provide clearer AI-related guidelines, digital infrastructure, or assessment support mechanisms report more consistent and confident integration of GenAI tools. The situation differs in institutions where GenAI use policies are ambiguous or lacking: there, lecturers’ use of GenAI tools is restrained and occasionally limited to low-risk assessment functions. This finding supports Xia et al.’s (2024) findings on the importance of methodological standardisation. The lack of methodological standardisation and policy clarity remains a significant barrier to meaningful AI integration in assessment.
These findings emphasise the importance of developing clear institutional frameworks, ethical safeguards, and professional development opportunities to support informed and context-sensitive use of GenAI tools. The adoption of these tools in assessment thus depends less on individual willingness than on these institutional conditions. Additionally, lecturers favour human-in-the-loop models, in which GenAI tools boost but do not replace professional judgement. This aligns with literature advocating responsible and ethically grounded AI integration in education (Capraro et al., 2024; Parker et al., 2024). The ease and comfort of using GenAI tools for assessment is highest in Türkiye and lowest in Poland. The same pattern is observed regarding the likelihood of continuing to work with GenAI assessment tools. These observations strongly indicate a relationship between the geographical and social context and the use and implementation of GenAI solutions in assessment.
Challenges, Ethical Concerns, and Standardisation Needs in the Implementation of GenAI in Assessment (RQ3)
The findings indicate that lecturers consider the adoption of GenAI in assessment to be constrained by a complex set of interrelated systemic, ethical, and governance-related challenges rather than by technological limitations. Even though GenAI is valued for its ability to improve efficiency and scalability, lecturers consistently emphasise concerns related to trust, accountability, and consistency, suggesting a cautious approach to adopting it across educational contexts and levels. Lecturers identified the limited transparency and reliability of GenAI-generated assessment outputs as a central challenge. They express concerns over the opacity of algorithmic decision-making and the difficulty of verifying the validity of automated grading and feedback. These concerns are particularly pronounced in high-stakes assessment contexts, where lecturers retain legal and professional responsibility for evaluation outcomes. This finding confirms prior research noting the risks associated with algorithmic bias and the lack of explainability in AI-driven assessment systems (Bandi et al., 2023; Kooli & Yusuf, 2025).
Lecturers perceive ethical issues revolving around academic integrity and equity. They note that GenAI tools are powerful at detecting problems such as plagiarism and are useful in providing consistent grading. However, normalising AI-mediated assessment poses a paradoxical risk in environments where students have differing access to GenAI tools and digital infrastructure, or different levels of AI literacy. Such disparities are keenly observed across cultural and cross-national divides, lending credence to concerns that unregulated AI adoption may exacerbate existing educational disparities rather than achieve fairness (Capraro et al., 2024; Parker et al., 2024).
In addition, based on the lecturers’ views, contextual and cultural variability is a major challenge where there are no clear institutional policies or national guidelines concerning the use of GenAI in assessment. Uncertainty about what is acceptable further restricts lecturers’ willingness to harness GenAI for anything beyond low-risk uses that augment their work. Conversely, with clearer policy frameworks and technological support, institutions adopt GenAI with more confidence, though still within limits. This finding, which points to institutional regulation as a mediating factor in ethical and pedagogical decision-making, aligns with Xia et al.’s (2024) argument for context-sensitive governance and implementation of AI.
A recurring concern in all contexts was the lack of standardised assessment frameworks for GenAI integration. Lecturers highlight the absence of methodological frameworks, shared criteria, validated rubrics, and procedural guidelines as a serious impediment to achieving consistent and trustworthy assessment practices (Khlaif et al., 2024; Luo, 2024).
Overall, the lecturers conceptualise the challenges of GenAI adoption as essentially institutional and ethical, and as best addressed through governance efforts rather than scattered technological interventions. Lecturers consistently recommend that human-in-the-loop (HITL) assessment models be backed by clear institutional policies, professional development, and standards across contexts.
Conclusion
The research findings consolidate the idea that university lecturers in Poland, Türkiye, and Algeria engage with GenAI-enhanced tools in assessment. This multi-dimensional engagement is clearly impacted by several unrelated systemic and individual factors, i.e., national and international factors, experience, and attitudes. Lecturers’ adoption of GenAI in assessment is at a moderate level, especially in summative assessment tasks such as plagiarism detection, automated grading, and feedback. Lecturers also hold that GenAI tools help reduce assessment loads, give effective feedback within limited time, and enable faster feedback loops in large-scale classroom contexts. However, lecturers insist that the appropriate integration of GenAI in assessment, coupled with the values of fairness, accuracy, and academic integrity, faces several barriers, such as pedagogical conservatism, ethical ambiguity, and institutional uncertainty. Accordingly, the application of GenAI in assessment contexts necessitates not only the accessibility of the tools and relevant infrastructure but also the development of trust, transparency, lecturer agency, strong institutional policy, clear guidelines, continuous technical support, and continuous professional development, all of which should be provided by institutions and stakeholders. Additionally, lecturers highly value the nuanced, relational, and contextual aspects of assessment, dimensions that are not replicated when GenAI is used. GenAI tools should therefore be used to support assessment.
The relatively small sample size and the fact that the study was limited to a survey mean that it can only be considered a pilot study. However, the results obtained indicate interesting correlations, such as those between frequency of use and willingness to continue using GenAI, or between comfort in using GenAI tools and the likelihood of perceiving them as effective. These directions are worth further investigation.
The collected findings may contribute to a better understanding of how and why teachers use GenAI for assessment, or why they are resistant to it. Beyond that, the results might be helpful in preparing pre-service teachers for the conscious application of GenAI in assessment and/or in formulating practical rules that support in-service teachers’ use of GenAI tools in assessment.
Supplemental Material
sj-docx-1-sgo-10.1177_21582440261454817 – Supplemental material for Generative AI and Classroom Assessment: Insights into Higher Education Practices from Algeria, Poland, and Türkiye
Supplemental material, sj-docx-1-sgo-10.1177_21582440261454817 for Generative AI and Classroom Assessment: Insights into Higher Education Practices from Algeria, Poland, and Türkiye by Joanna Kic-Drgas, Asma Rahmani and Ferit Kılıçkaya in SAGE Open
Footnotes
Appendix 1
Survey on AI Use in Classroom Assessment by University Language Lecturers
Participation declined
Thank you for your time. You have elected not to participate in the survey.
Section 2: AI Use in Classroom Assessment
Reasons for not using AI in assessments
AI Use in Assessment
Perceptions and Experiences with AI in Assessment
Rating scale: 1 (Less) to 5 (Much more effective)
Rating scale: 1 (Strongly Disagree) to 5 (Strongly Agree)
Rating scale: 1 (Least Comfortable) to 5 (Very comfortable)
Future of AI in Classroom Assessment
Rating scale: 1 (Least likely) to 5 (Very likely)
Acknowledgements
The manuscript was revised and language-edited with the support of digital writing tools, including Grammarly and Claude, to improve its clarity, readability, and overall coherence. The authors retain full responsibility for the manuscript's content, interpretations, and any remaining errors. The third author gratefully acknowledges the support of the Polish National Agency for Academic Exchange (NAWA) through the Ulam NAWA Programme.
Ethical Considerations
The study was carefully designed to adhere to ethical standards for research involving human participants, including the Declaration of Helsinki. All procedures were non-invasive and posed no physical, psychological, or social risk. Participation was entirely voluntary, with the right to withdraw at any time without consequence. The research was also reviewed and approved by the Ethical Committee at Burdur Mehmet Akif Ersoy University in Türkiye (decision number GO 2025/933, 06.01.2025).
Consent to Participate
Informed consent was obtained before data collection through a detailed consent form outlining the study's purpose, procedures, potential risks and benefits, and data protection measures. All data were anonymised, and identifiable information was removed during analysis and reporting.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The authors will receive financial support from Adam Mickiewicz University for the open-access publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
Data sharing is possible upon request.
Supplemental Material
Supplemental material for this article is available online.
References
