Abstract
Calligraphy is a treasure of traditional Chinese culture and is ubiquitous in China. Because of its unique aesthetic characteristics, the main difficulty in teaching calligraphy lies in assessing calligraphy works. The major problems in calligraphy assessment are its low time efficiency and students’ low engagement in the assessment process. As an active learning method, peer assessment (PA) has been widely applied in education. This research explores the application of PA to the calligraphy class. With appropriate training, elementary school students can act rationally as an assessing group. Based on the students’ scores, this study further uses clustering analysis to select the most reasonable score as the final score. A comparison of the students’ scores with the teachers’ scores shows that students’ assessments of calligraphy writing can produce results close to the teachers’ scores. The significance of the study lies in enhancing the time efficiency of calligraphy assessment and maintaining learners’ high levels of engagement in the assessment process. The results show that PA is an effective and accurate method for assessing calligraphy writing, which helps students master the key points of calligraphy marking and plays an integral role in guiding their future calligraphy practice.
Plain Language Summary
Purpose: This study shows that using PA to assess calligraphy writing can promote students’ calligraphy practice and their understanding of calligraphy. PA in calligraphy writing assessment helps students develop critical judgment skills and become more involved in the calligraphy course. Methods: Students are actively engaged in learning calligraphy writing and assess each other’s writing on 3 factors (structure, strokes and handwriting) composed of 10 dimensions. Based on the students’ scores, this study uses K-means clustering analysis to obtain the most reasonable score as the final score, so that the assessment results are more accurate. Most students are satisfied with their PA scores and find the assessment suggestions helpful. Conclusions: The results suggest that PA for calligraphy writing assessment can promote students’ calligraphy learning, cultivate their calligraphy writing skills and help them grasp the main points of calligraphy assessment by making marking judgments on other students’ writing and providing specific feedback. The significance of the study lies in enhancing the time efficiency of calligraphy assessment and maintaining learners’ high levels of engagement in the assessment process. Implications: The Cluster Analysis-Based Peer Assessment learning model can be applied to other art learning that requires subjective assessment. Limitations: In future studies, students’ whole writing process could be recorded on video so that their brush grip and stroke order can be further analyzed.
Introduction
As one of the treasures of Chinese culture, calligraphy has attracted increasing attention. Calligraphy is a unique art form with a history of over 3,000 years. The Chinese have emphasized calligraphy since ancient times, as reflected in many aspects of life, such as pub and restaurant signs, lantern riddles and artistic decoration. Calligraphy has long played a critical role in celebrating the Chinese Spring Festival. In recent years, standardized, mass-produced Chinese New Year couplets have gradually been replaced by classical couplets handwritten with brush and ink.
The Ministry of Education of China encourages elementary and secondary schools to place more emphasis on calligraphy education. The “Compulsory Education Fine Arts Curriculum Standards” promulgated by the Ministry stipulate that students should be able to “appreciate calligraphy works and understand its characteristics” (MOE, 2011). The “Guiding Outlines for Elementary and Secondary Schools Calligraphy Education” issued by the Ministry set out specific requirements for calligraphy education in elementary and secondary schools. The Outlines also stipulate that the basic teaching content of calligraphy education comprises Chinese character literacy and handwriting instruction, that the basic goal is to improve students’ ability to write Chinese characters, and that the basic approach is handwriting practice, appropriately integrated with calligraphy aesthetics and calligraphy culture where possible (MOE, 2013). As importance has been attached to cultivating calligraphy skills, schools in China have begun to offer calligraphy courses.
Learning calligraphy is not only conducive to cultivating students’ patience, but also to developing reflective and enterprising spirit in students. Students who can write Chinese characters correctly and in a neat and tidy manner generally have a serious study attitude. Therefore, through long-term calligraphy education, students can form good behavior habits and character traits, such as concentration, diligence and perseverance.
To improve their calligraphy skills, students have to complete calligraphy assignments every week. Consequently, teachers must evaluate a large number of assignments in a timely manner, which makes it difficult for them to provide accurate and effective evaluation. Given this heavy workload, teachers’ subjective evaluative comments are reduced to a bare grade or rank. Because students do not understand the evaluation criteria, it is difficult for them to make targeted revisions to their calligraphy assignments.
As an active learning method, peer assessment (PA) has been widely applied in higher education (Bouzidi & Jaillet, 2009; Jones & Alcock, 2014; Ketonen et al., 2020; Lin, 2016; Panadero & Alqassab, 2019; Planas-Llado et al., 2014; Reinholz, 2016; Seifert & Feliks, 2019). Several studies have also applied PA in secondary education (Tsivitanidou et al., 2011). However, PA has rarely been used in primary education. In this study, elementary students are trained in the calligraphy assessment criteria so that they can participate actively in calligraphy assessment via PA, which helps them understand the key points while practicing calligraphy and become more involved in the calligraphy course. The innovation of this study is to change the traditional evaluation method of calligraphy writing so that students become raters. When PA is applied to calligraphy assessment, elementary school students are able to participate actively as raters. Moreover, based on the students’ scores, this study further adopts the most reasonable score identified by K-means cluster analysis as the final score. Comparing the students’ final scores with the scores of experienced teachers, we find that the students’ calligraphy assessment results are close to those of seasoned teachers. The significance of the study lies in enhancing the time efficiency of calligraphy assessment and maintaining learners’ high levels of engagement in the assessment process. After learning the marking criteria, students can make targeted revisions to their calligraphy writing in accordance with the detailed peer assessment.
Our study therefore aims to examine the effects of PA on students’ calligraphy and to explore students’ perceptions of PA. It is guided by the following research questions:
What is the relationship between students’ calligraphy scores given by the teachers and by peers?
Is there a difference between students’ calligraphy scores given by the teachers and by peers?
How do students perceive PA in calligraphy writing?
Related Works
The objective of calligraphy education is to cultivate the basic brush-writing skills of elementary and secondary school students and help them understand and appreciate the art of calligraphy. First and foremost, it improves students’ ability to write Chinese characters with a brush. Second, it develops students’ aesthetic taste, shapes their personality and cultivates their cultural pursuits. In short, it promotes the all-round development of students.
Chinese Calligraphy
Calligraphy education, which teaches students to write Chinese characters with a brush, is carried out in various countries and regions, such as China, Japan, South Korea and many other countries in Southeast Asia. The popularization of calligraphy in Japan coincided with the introduction of Buddhism there: people began writing Buddhist scriptures in Chinese characters with a brush in the sixth century. The methods of making brushes and ink were introduced to Japan in the seventh century, which catalyzed the development of calligraphy in Japan, and by the late seventh century a distinctive calligraphic art had emerged there.
To this day, calligraphy remains popular in Japan. Shop signboards written with brush and ink are widely used, and calligraphy courses are offered in compulsory education. Students can learn calligraphy not only in schools but also in intensive classes at various training schools. Beyond promoting calligraphy in schools, large-scale calligraphy competitions are held regularly; for instance, the annual New Year calligraphy competition contributes to the popularization of calligraphy.
In China, calligraphy education has been emphasized since ancient times, and both the aristocracy and the common people have valued calligraphy. Calligraphy occupies an important position among the four traditional Chinese high-culture arts, namely Guqin, Go, calligraphy and painting.
The “Compulsory Education Chinese Curriculum Standards” set out specific requirements for calligraphy education: master the basic strokes and common radicals of Chinese characters; write Chinese characters with a pen in the correct stroke order, paying special attention to character form and appreciating the aesthetic beauty of characters; form the good habit of writing with correct posture and in a neat and tidy way; and write regular script with a pen at a certain speed and regular script with a brush in a beautiful style (MOE, 2011).
Currently, some scholars are more concerned with the aesthetic value and cultural connotations of calligraphy. Peng and Geng (2013) explored iconicity in the metrical structure and the cultural value of artistic Chinese calligraphy in line with Charles S. Peirce’s theory. Calligraphy practice using new media was a useful remediation in the digital era, offering inventive deviations from the traditional mode (Vermeeren, 2018). Other researchers have focused on the psychological effects of calligraphy. Yang et al. (2010) proposed that calligraphy offers a promising approach to improving the health of cancer patients. Chu et al. (2018) discussed the curative effect of Chinese calligraphy on neuropsychiatric symptoms. Zhang et al. (2021) showed the positive role of calligraphy practice, proposing that copying pleasant calligraphy could decrease aggression compared with copying neutral calligraphy. The discussion above shows that calligraphy is currently studied mainly as a cultural symbol and for its therapeutic effects, but there is little research on how to motivate students and engage them in calligraphy learning, especially on effective modes of calligraphy teaching at the elementary level.
Peer Assessment
Peer assessment falls into two categories: anonymous and non-anonymous. Scholars have examined the effects of anonymous and non-anonymous assessment and the combination of PA with self-assessment. In higher education, anonymous PA appears to contribute to more critical peer feedback, increased self-perceived social effects and enhanced performance (Panadero & Alqassab, 2019). Seifert and Feliks (2019) examined student-teachers’ attitudes toward improving their assessment skills and the quality of their assignments, and found that students benefited significantly from self-assessment and anonymous PA. Although anonymous assessment offers some benefits in PA, some scholars, such as Lin (2016), argue that non-anonymous assessment can also facilitate PA effectively; Facebook, with its non-anonymous feature, was regarded as a convenient tool for performing peer assessment (Lin, 2016).
Some researchers have delineated the interrelationship between PA and self-assessment. Reinholz (2016) proposed a model describing how peer assessment supports self-assessment and applied it to three activity structures to analyze their potential to support learning by promoting self-assessment. Current research on combining PA with self-assessment mainly focuses on specific scientific fields. Combining PA with self-assessment gave this assessment method better validity when applied to exams in the exact sciences, specifically calculations, mathematical reasoning, short algorithms and the drafting of short texts (Bouzidi & Jaillet, 2009).
Some studies have focused on students’ perceptions of and emotional responses to PA. Students perceived peer assessment as both a motivating and a recommended methodology that facilitates learning at different levels (Planas-Llado et al., 2014). Cheng et al. (2014) analyzed students’ emotional responses and participation in an online peer assessment activity using both quantitative and qualitative methods. Secondary school students had positive attitudes toward unsupported reciprocal PA and intended to implement unmediated PA again (Tsivitanidou et al., 2011). The literature above suggests that maintaining a positive attitude toward PA can enhance its effectiveness in terms of learning outcomes and engagement.
Some researchers have applied PA to subject teaching. Jones and Alcock (2014) investigated whether students could reliably and validly assess their peers’ conceptual understanding of advanced mathematics. Jaime et al. (2016) combined three teaching methods (project-based learning, spiral learning and peer assessment) in a computer science project management course and found that the combination produced different results depending on the quality of the students. Wang et al. (2021) examined how the two roles in peer assessment (assessor and assessee) contribute to students’ perceptions of science learning in mobile technology-supported collaborative learning (MSCL). Ketonen et al. (2020) examined which factors affected students when PA was implemented in the early stage of physics studies, in the context of conducting and reporting experiments. PA appears to be more applicable to the teaching of specific subjects because the assessment content and objectives are more concrete.
Some scholars have focused on the relationship between students’ and teachers’ ratings of tasks. Russell et al. (2017) provided evidence of effective PA practice using calibrated peer review (CPR) and also demonstrated variation in students’ ratings of mid- and lower-quality writing regardless of reviewers’ competency in CPR. Li et al. (2016) performed a meta-analysis of empirical studies published since 1999 to examine the agreement between peer and teacher ratings and the factors that might influence this agreement. Li et al. (2020) subsequently performed a meta-regression analysis of the factors likely to influence the PA effect and found that the most crucial factor is rater training; a few other variables (such as rating criteria, rating format and frequency of PA) showed noticeable, though not statistically significant, effects. Students therefore need training before they assess. Their understanding and mastery of the assessment criteria determine whether their ratings are accurate, so training and rating are two critical factors that strongly influence the effect of PA.
Online Peer Assessment in Calligraphy Courses
In this study, we select 136 Grade 5 elementary school students enrolled in calligraphy classes. Grade 5 students have learned a large number of Chinese characters and how to write them, so with some training they are able to participate in PA in calligraphy class. These fifth-grade students come mainly from three primary schools in Beijing. They are around 11 years old; 72 are boys and 64 are girls.
The students mark and provide feedback on four consecutive tasks. Each task is marked anonymously by all students using an online marking system, wenjuanxing (available at https://www.wjx.cn/). Wenjuanxing is an online platform that can be installed on smartphones or PCs and can be used free of charge to conduct surveys, administer questionnaires, organize online tests, provide evaluations and elicit opinions, which makes it suitable for online PA. In the calligraphy courses, teachers send the calligraphy marking link to students, who then open it and submit their marks. The same marking tasks are also independently marked by five experienced teachers to provide an expert reference against which the peer assessment scores can be measured.
To decide which calligraphy competences are most important for students, we interview five experienced calligraphy teachers, each with at least 9 years of calligraphy teaching experience. Analysis of the interviews reveals the crucial competency items, from which we develop three scales: structure, stroke and handwriting. The assessment scale for calligraphy writing therefore examines these three aspects.
In accordance with the learning objectives of the calligraphy course, we propose marking criteria (3 factors composed of 10 dimensions) for students’ calligraphy writing, namely the assessment scale for students’ calligraphy writing shown in Appendix I. The structure sub-scale encompasses three dimensions: the symmetry, regularity and smoothness of the structure. The stroke sub-scale includes five dimensions: stroke shape, length, weight, position and angle. The common strokes of Chinese characters are shown in Figure 1. The handwriting sub-scale comprises two dimensions: the correctness and the continuity of handwriting. Students are actively engaged in learning calligraphy writing; they assess other learners’ calligraphy on these 3 sub-scales and 10 dimensions in accordance with the assessment scale and submit their scores via the free online platform wenjuanxing.
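A minimal sketch of how the 3-factor, 10-dimension scale might be encoded for data entry and checking. The Python names (`RUBRIC`, `DIMENSIONS`, `validate_scores`, `factor_means`) are hypothetical, and the dimension labels are paraphrased from the text above; Appendix I remains the authoritative definition of the scale.

```python
# Hypothetical encoding of the scale: 3 factors, 10 dimensions in total.
RUBRIC = {
    "structure": ["symmetry", "regularity", "smoothness"],
    "stroke": ["shape", "length", "weight", "position", "angle"],
    "handwriting": ["correctness", "continuity"],
}

DIMENSIONS = [dim for dims in RUBRIC.values() for dim in dims]
MAX_SCORE = 100  # each dimension is scored out of 100


def validate_scores(scores: dict[str, float]) -> None:
    """Raise ValueError unless every dimension has a score in [0, MAX_SCORE]."""
    missing = set(DIMENSIONS) - scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    for dim in DIMENSIONS:
        if not 0 <= scores[dim] <= MAX_SCORE:
            raise ValueError(f"{dim} out of range: {scores[dim]}")


def factor_means(scores: dict[str, float]) -> dict[str, float]:
    """Average the dimension scores within each of the three factors."""
    validate_scores(scores)
    return {
        factor: sum(scores[d] for d in dims) / len(dims)
        for factor, dims in RUBRIC.items()
    }
```

For example, a writing scored 80 on every dimension except 50 for symmetry would have a structure-factor mean of 70 and stroke and handwriting means of 80.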

Common Strokes of Chinese Characters. (a) left-falling stroke & turning (撇折). (b) lying hook (卧钩). (c) vertical curved hook (竖弯钩). (d) vertical hook (竖钩). (e) rising stroke (提). (f) vertical rising stroke (竖提). (g) horizontal turning hook (横折钩). (h) horizontal turning curved hook (横折弯钩). (i) left-falling stroke (撇). (j) horizontal stroke (横). (k) curved hook (弯钩). (l) horizontal hook (横钩). (m) right-falling stroke (捺). (n) left-falling stroke & dot (撇点). (o) dot (点).
Twenty typical Chinese characters are selected and, according to their writing structures, divided into four types of tasks: Task 1 (characters with a left, middle and right structure), Task 2 (characters with an upper and lower structure), Task 3 (characters with a left and right structure) and Task 4 (single-element characters). Students and teachers assess them separately. The Chinese characters examined in the four tasks are shown in Table 1.
Four Tasks for Calligraphy Assessment.
Three factors (structure, stroke and handwriting) composed of 10 dimensions are used to assess calligraphy writing, and each dimension is scored out of 100. Five experienced calligraphy teachers grade the students’ writing on the 10 dimensions of each Chinese character. The mean of the five teachers’ scores is calculated and used as the yardstick for PA.
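The teacher yardstick can be illustrated with a short sketch, assuming each teacher’s marks for one character are stored as a list of 10 dimension scores. `teacher_reference` is a hypothetical helper: it averages across the five teachers for each dimension and then averages the 10 dimensions into a single score on the 100-point scale, mirroring how the final score is described as the average of the 10 dimensions.

```python
from statistics import mean


def teacher_reference(teacher_scores: list[list[float]]) -> tuple[list[float], float]:
    """teacher_scores[t][d] is teacher t's score on dimension d for one character.

    Returns (per-dimension mean across teachers, overall mean of those values).
    """
    if not teacher_scores:
        raise ValueError("need at least one teacher")
    n_dims = len(teacher_scores[0])
    if any(len(row) != n_dims for row in teacher_scores):
        raise ValueError("every teacher must score the same dimensions")
    # Average over teachers, dimension by dimension.
    per_dim = [mean(row[d] for row in teacher_scores) for d in range(n_dims)]
    # Overall reference score: average of the per-dimension means.
    return per_dim, mean(per_dim)
```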
Assessment Procedure of Students’ Writings
This peer assessment process is divided into three separate stages, as shown in Figure 2.

The Flowchart of Peer Marking and Teacher Marking.
First, we familiarize students with the marking criteria (3 categories composed of 10 dimensions) so that they understand and master them. Second, students assess the calligraphy writing on the wenjuanxing online platform. Finally, clustering analysis is performed on the students’ assessment scores on the 10 dimensions.
The goal of cluster analysis is to divide a set of data points into groups so that points within a group are similar to one another and differ from points in other groups. Before assessing the calligraphy writing, students are trained in the assessment criteria. Even so, compared with experienced teachers, student raters will inevitably give some abnormal scores, and with 136 students participating in grading, the likelihood of abnormal scores is high. We therefore use cluster analysis to filter out the influence of abnormal values, such as exceedingly high or exceedingly low scores, so that the final scores are closer to the experienced teachers’ scores.
Because of its power and simplicity, K-means (MacQueen, 1967) is one of the most widely used clustering algorithms, so we apply K-means clustering to the students’ scores on the 10 dimensions. The average of the resulting scores on the 10 dimensions is taken as the final score; because each dimension is scored out of 100, the final score is also on a 100-point scale.
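The K-means step can be sketched in plain Python with Lloyd’s algorithm, which alternates an assignment step and a center-update step until assignments stop changing. This is an illustrative implementation, not the one used in the study: the paper does not state its software, initialization or stopping rule, so the random initialization with a fixed `seed`, the `max_iter` cap and the rule of keeping a center unchanged when its cluster empties are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k=3, max_iter=100, seed=0):
    """Lloyd's algorithm: returns (centers, labels) for the given points.

    points: list of equal-length score vectors (e.g. one 10-dimensional
    vector per student).
    """
    if len(points) < k:
        raise ValueError("need at least k points")
    rng = random.Random(seed)
    # Assumed initialization: k input points sampled without replacement.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged, so the centers are stable
        labels = new_labels
        # Update step: move each center to the mean of its member points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```

With the study’s data, `points` would hold 136 ten-dimensional vectors and `k=3`. K-means can converge to different local optima depending on initialization, so in practice it is common to run several seeds and keep the solution with the lowest within-cluster sum of squares.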
Training of Students in Terms of Peer Assessment Criteria
As proposed by Li et al. (2020), the most critical factor of PA is rater training. Before students embark on assessing the calligraphy writings, we train them in terms of the assessment criteria. Raters’ familiarization with the assessment criteria can help students give appropriate scores. Moreover, it can also facilitate students’ understanding of the key points while practicing calligraphy.
PA generally refers to students judging their peers’ work against assessment criteria, so students first need to understand how the selected characters should be written. We therefore show students teaching videos that present the pronunciation, collocation and structure of the selected Chinese characters. Most importantly, students learn how to write each character on mizige (Figure 3), a special practice sheet printed with square lattices used for writing Chinese characters. After watching the videos, students write the characters on mizige, which gives them a deeper understanding of the crucial points in writing the selected characters. After students finish writing, the teachers explain the criteria for assessing the writing. The ten dimensions are independent of one another, and each is scored out of 100.

Mizige for practicing calligraphy.
K-means Clustering Analysis of Students’ Assessment
Because there are many students with uneven calligraphy abilities, we use K-means clustering analysis to filter out the influence of abnormal values. The center of the cluster containing the most cases is taken as the final result of the students’ assessment.
After the students assess the writing, we collect the assessment data. All students in the class participate in the assessment. Each student acts as an independent, intelligent rater: although individual students are not calligraphy specialists, the group as a whole can bring its collective intelligence into full play. Through K-means clustering analysis, the center of the cluster containing the most cases is taken as the final score, because it represents the assessment given by the majority of students.
Each student’s scores on the 10 dimensions are represented as a 10-dimensional vector; with 136 students participating, we obtain 136 ten-dimensional vectors. Clustering is performed with the K-means algorithm, with K = 3 in our experiment.
Taking the Chinese character “guo” (国) as an example, cluster analysis of the students’ ratings yields three cluster center points, as shown in Table 2.
Character “Guo” Cluster Center Points Results.
After clustering the student scores for the character “guo,” the number of samples in each cluster is obtained, as shown in Figure 4. Following the rule that the cluster with the largest number of samples is used as the final clustering result, the center values of the 10 dimensions in cluster 3 are taken as the final peer assessment scores for the student’s writing.
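The selection rule above can be sketched as follows, assuming the clustering step returns a list of K center vectors and one cluster label per student. `final_score_from_clusters` is a hypothetical helper; its tie-breaking (the cluster encountered first in `labels` wins) is an assumption, since the text does not say how equally sized clusters would be handled.

```python
from collections import Counter
from statistics import mean


def final_score_from_clusters(centers, labels):
    """Return (cluster index, its center, final score) for the largest cluster.

    centers: list of K center vectors, one value per dimension
    labels:  one cluster index per student score vector
    """
    if not labels:
        raise ValueError("no labels to count")
    counts = Counter(labels)
    # most_common orders equal counts by first occurrence (assumed tie rule).
    largest, _ = counts.most_common(1)[0]
    center = centers[largest]
    # Final score: average of the center's per-dimension values (100-point scale).
    return largest, center, mean(center)


# Illustrative values only (not the data behind Table 2 or Figure 4).
centers = [[60.0] * 10, [95.0] * 10, [85.0] * 10]
labels = [2] * 70 + [0] * 20 + [1] * 46  # 136 students in total
print(final_score_from_clusters(centers, labels)[2])  # 85.0
```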

Number of samples in each cluster of Chinese character “guo.”
Results and Discussion
Before grading calligraphy writing, students take a calligraphy course in which they receive basic training in the structure of the characters, the shape, length and weight of the strokes, and the specific position of each stroke on the mizige. Before grading, the teachers explain the evaluation criteria, and the students then evaluate the calligraphy writing on a total of 10 dimensions in 3 categories. Grade 5 elementary school students are selected for this evaluation. After completing the evaluation of the four tasks, the students complete a questionnaire, and 30 students are selected for interviews. The scores of the 10 dimensions are averaged to obtain the final score.
Peer and teacher marking are compared from 3 aspects, namely, an analysis of the total score, the details of each score awarded, and comments based on the marking criteria. Scores within individual groups are also analyzed to find out the consistency of group marking. In addition to the analysis of both peers’ and teachers’ marking, questionnaires and interviews are used to ascertain how satisfied students are with their scores, together with their opinions regarding the score and the feedback they received.
Table 3 summarizes the statistics for teachers’ and peers’ scores of calligraphy quality, based on 136 students’ assessments of the four tasks. For each task, the teacher score is the average of the scores given by the five teachers, and the peer score is obtained by clustering the scores given by the 136 students.
Descriptive Statistics of Teachers’ and Peers’ Scores in Four Tasks (n = 136).
In this experiment, the average peer score is higher than the average teacher score by 20.56, 11.84, 12.19 and 8.25 points in tasks 1, 2, 3 and 4, respectively.
The peer marking and teacher marking in task 1 show the biggest difference in their scores, because task 1 (left-center-right structure Chinese characters) is rather complicated, and it is not easy for students to judge the writing quality of this type of Chinese characters.
The difference between peer and teacher scores in tasks 2 and 3 is smaller than in task 1, and the differences for tasks 2 and 3 are relatively close to each other. This is because the structures of the characters in task 2 (upper and lower structure) and task 3 (left and right structure) are simpler than those in task 1.
The difference between peer and teacher scores in task 4 is smaller than in tasks 1, 2 and 3. This is because task 4 (single-element characters) has the simplest structure of all the tasks, which makes its writing quality comparatively easy to evaluate.
The standard deviation of students’ grading of task 1 is 19.32, which indicates that students’ grasp of the grading standard of task 1 is not accurate, and there are many deviations. The standard deviation of teacher scores for task 1 is 14.48, indicating that even experienced teachers have deviations in their scores for the more complex left-center-right structure.
The standard deviations of students’ scores for task 2 and task 3 are 15.96 and 15.54, respectively, lower than for task 1, indicating that students’ scores for task 2 (upper and lower structure) and task 3 (left and right structure) are more accurate than those for task 1. The standard deviation of students’ scores for task 4 is 7.15, showing that for the simple single-element structure the students’ scores are relatively uniform and their grasp of the marking criteria for task 4 is relatively accurate.
Relationship Between Scores Given by the Teachers and Scores by Peers With Clustering
The relationship between the scores given by the teachers and those given by peers is represented by the correlation coefficient: if changes in peer scores are associated with changes in teacher scores, the two sets of scores are correlated. Pearson’s correlation coefficient measures the linear correlation between two variables, here peers’ and teachers’ scores, and takes values between -1 and 1.
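A from-scratch sketch of the Pearson coefficient, computed as the centered cross-product sum divided by the square root of the product of the two centered sums of squares, which is equivalent to cov(x, y) / (s_x s_y). The function name `pearson_r` is hypothetical. The usage lines show that a constant upward shift in peer scores leaves r unchanged, which is how a high correlation can coexist with peers scoring systematically higher than teachers.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


teacher = [60, 70, 80]
peer = [t + 10 for t in teacher]  # peers uniformly 10 points higher
print(pearson_r(teacher, peer))  # 1.0
```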
The Pearson correlation coefficients in all four tasks are positive and substantial (r > .60, two-tailed p < .0005), so the positive correlations are statistically significant at the p < .01 level. The strongest positive relationship between peers’ and teachers’ scores is in task 4 (r = .88, n = 136, p < .0005). Therefore, when teachers give a high score on a task, peers also tend to give a high score.
However, the correlation coefficient in task 1 (r = .63, n = 136, p < .0005) is the lowest of the four tasks. This may be because task 1 is the most difficult task; in addition, in the initial stage of peer evaluation, students have not yet fully grasped the marking criteria. After completing the subsequent tasks, students have a better command of the marking criteria, and the corresponding correlation coefficients gradually increase.
Difference Between Scores Given by the Teachers and Scores by Peers With Clustering
In the previous section, the Pearson correlation coefficients for the four tasks show a strong correlation between the scores given by teachers and by peers, but the average peer scores are higher than the average teacher scores in all four tasks: 87.38 versus 66.82 (task 1), 80.63 versus 68.79 (task 2), 81.38 versus 69.19 (task 3) and 76.28 versus 68.13 (task 4). This section therefore focuses on the difference between the scores given by teachers and by peers.
The paired-samples t-test tests whether the mean of the differences between two related samples is zero, assuming the differences are approximately normally distributed. We use it to test the differences between the scores given by the teachers and by peers.
The purpose of this test is to decide whether to accept or reject the null hypothesis H0 below.
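The paired test can be sketched from its definition: the statistic is t = d̄ / (s_d / √n) on the per-task differences d_i = peer_i - teacher_i, with n - 1 degrees of freedom. `paired_t` is a hypothetical helper that stops at the statistic; turning it into a two-tailed p-value needs the t distribution, which SciPy provides through `scipy.stats.ttest_rel` (not used here, to keep the sketch dependency-free). The mean difference d̄ equals the peer mean minus the teacher mean.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(peer, teacher):
    """Paired-samples t statistic on d_i = peer_i - teacher_i.

    Returns (mean difference, t statistic, degrees of freedom).
    """
    if len(peer) != len(teacher) or len(peer) < 2:
        raise ValueError("need two paired samples of equal length, n >= 2")
    diffs = [p - t for p, t in zip(peer, teacher)]
    d_bar = mean(diffs)  # equals mean(peer) - mean(teacher)
    s_d = stdev(diffs)   # sample standard deviation of the differences
    if s_d == 0:
        raise ValueError("t is undefined when all differences are equal")
    t_stat = d_bar / (s_d / sqrt(len(diffs)))
    return d_bar, t_stat, len(diffs) - 1
```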
Hypotheses
H0: the mean of the clustered peer scores does not differ from the mean of the teachers’ scores
H1: the mean of the clustered peer scores differs from the mean of the teachers’ scores
The differences between the mean scores of the four pairs are 20.52, 11.84, 12.19, and 8.25, respectively, and the corresponding two-tailed significance values are 0.0031, 0.0024, 0.0017, and 0.0012. Because all four values are below .05, H0 is rejected: the clustered peer scores differ significantly from the teachers’ scores in every task.
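The paired-samples t-test works on the per-work differences between peer and teacher scores: t is the mean difference divided by its standard error. The sketch below is a minimal illustration under stated assumptions: the function name and sample scores are hypothetical, and the p-value uses a large-sample normal approximation (reasonable for n = 136, rough for small samples) because Python’s standard library has no t distribution; `scipy.stats.ttest_rel` gives the exact test.

```python
import math
import statistics


def paired_t_test(peer, teacher):
    """Paired-samples t-test on the differences peer[i] - teacher[i].

    Returns (mean difference, t statistic, approximate two-tailed p-value).
    The p-value uses a normal approximation to the t distribution with
    n - 1 degrees of freedom, which is adequate only for large samples.
    """
    if len(peer) != len(teacher) or len(peer) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    diffs = [p - t for p, t in zip(peer, teacher)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_d / (sd_d / math.sqrt(n))
    # Two-tailed p under a standard normal: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p_approx = math.erfc(abs(t) / math.sqrt(2))
    return mean_d, t, p_approx


# Hypothetical peer and teacher scores for four pieces of work.
mean_d, t, p = paired_t_test([80, 85, 90, 75], [70, 74, 81, 66])
print(mean_d, round(t, 2))
```

The decision rule matches the one applied above: H0 is rejected when the two-tailed p-value falls below .05.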
A likely reason for the difference between the teachers’ and the students’ evaluations is that students and teachers understand the marking criteria of calligraphy differently. Students tend to regard writing whose Chinese characters basically meet the standards as a fairly good job. Teachers, in addition to the basic writing standards, also attach importance to the cultural connotation of Chinese characters when judging calligraphy: they emphasize whether students understand the implications of the characters and whether the calligraphy embodies their implied meaning and cultural connotation.
Marking Criteria
To find out which marking criteria the peers can mark as accurately as the teachers, this section analyzes the correlations between the peers’ and the teachers’ scores in detail. Teachers and peers marked each task by answering the same questions in the marking criteria, which relate to calligraphy quality (i.e., structure, stroke and handwriting). The marking criteria for tasks 1, 2, 3, and 4 consist of 10 questions (see Table 4). The Pearson correlation coefficient quantifies the strength and direction of the linear relationship between two continuous variables; here it is computed between the teachers’ and the peers’ scores on the 136 individual pieces of work for each marking criterion.
Correlations Between Teachers’ and Peers’ Scores for Each Marking Criterion (n = 136).
Correlation is significant at the .01 level (two-tailed).
Correlation is significant at the .05 level (two-tailed).
Task 1
The strongest positive relationship between the teachers’ and the peers’ scores (at a high level of statistical significance) is for marking criterion 2 (r = .893, n = 136, p < .0005), and the weakest is for marking criterion 8 (r = .362, n = 136, p < .0005). Therefore, marking criterion 2, “Is the overall structure regular”, is the easiest for students to apply, whereas marking criterion 8, “Is the stroke angle sensible”, yields the least accurate marking. This is probably because task 1 assesses Chinese characters with a left-center-right structure, which are the most complicated of the four tasks in terms of structure and strokes, so students cannot yet fully grasp how to mark them.
Task 2
The strongest positive relationship between the teachers’ and the peers’ scores (at a high level of statistical significance) is for marking criterion 5 (r = .712, n = 136, p < .0005), and the weakest is for marking criterion 4 (r = .467, n = 136, p < .0005).
Therefore, marking criterion 5, “Is the stroke length accurate”, is the easiest for students to apply, whereas marking criterion 4, “Is the stroke shape reasonable”, yields the least accurate marking. This is probably because task 2 assesses Chinese characters with an upper-lower structure, which are the most complicated of the four tasks in terms of character shape, so students also find it difficult to fully grasp how to mark this task.
Most of the correlations are higher than .50, which means that for most criteria the peers’ and the teachers’ scores have a strong positive relationship at a high level of statistical significance. However, students and teachers hold different views on marking criterion 4. Some students believe that the shape of upper-lower structure characters is not important in Chinese calligraphy and therefore award higher scores. The teachers, on the contrary, hold that an unreasonable shape in upper-lower structure characters seriously affects the artistry of Chinese calligraphy and therefore award lower scores.
Task 3
The strongest positive relationship between the teachers’ and the peers’ scores (at a high level of statistical significance) is for marking criterion 7 (r = .852, n = 136, p < .0005), and the weakest is for criterion 3 (r = .473, n = 136, p < .005). Therefore, marking criterion 7, “Is the position of the stroke proper”, is the easiest for students to apply, whereas marking criterion 3, “Is the overall structure smooth”, yields the least accurate marking. This is probably because task 3 assesses Chinese characters with a left-right structure, which are the most complicated of the four tasks in terms of character evenness, so students cannot yet fully grasp how to mark this task.
Task 4
The strongest positive relationship between the teachers’ and the peers’ scores (at a high level of statistical significance) is for marking criterion 1 (r = .796, n = 136, p < .0005), and the weakest is for criterion 10 (r = .232, n = 136, p < .015). Therefore, marking criterion 1, “Is the overall structure symmetrical”, is the easiest for students to apply, whereas marking criterion 10, “Is the writing linked-up”, yields the least accurate marking. Task 4 assesses simple characters, which make up a relatively small proportion of all Chinese characters; this may explain why students and teachers differ considerably in their marking of such tasks.
Within the three evaluation dimensions of structure, stroke and handwriting, some criteria are subjective and others are objective. Therefore, when more complex characters are assessed, teachers’ and students’ ratings may differ even within the same category.
The subjective and objective questions are summarized in Table 5. Subjective questions outnumber objective ones, accounting for 60% (6 of the 10 questions), which further underlines how difficult it is to grade calligraphy properly.
Objective and Subjective Question Type in Marking Criteria.
Table 6 groups the correlations between peers’ and teachers’ scores for the 10 marking criteria into strong, medium and low positive relationships. Most correlations across the four tasks are high, indicating a strong positive relationship between peers’ and teachers’ scores; in tasks 2 and 3, in particular, there is no low correlation. As Table 6 shows, marking criterion 5, “Is the stroke length accurate”, is marked accurately by students in every task, whereas marking criteria 7, “Is the position of the stroke proper”, and 8, “Is the stroke angle sensible”, fall in the low-correlation group in tasks 1 and 4.
Correlation Coefficient for 10 Marking Criteria From the Four Tasks.
It can also be seen from Table 6 that, across the four tasks, marking criteria 1, 2, 5 and 9 fall only in the high- or medium-correlation groups, never in the low one. This indicates a strong positive relationship between peers’ and teachers’ scores on objective criteria and suggests that students find objective criteria comparatively easy to master. By contrast, marking criteria 7, 8 and 10 show low correlations in tasks 1 and 4 but high or medium correlations in tasks 2 and 3. This suggests that students struggle to apply subjective criteria in difficult tasks (tasks 1 and 4) but can apply them accurately in easier tasks (tasks 2 and 3).
Therefore, the agreement between students’ and teachers’ scores on a given criterion depends on the type of marking criterion (objective or subjective) and on the difficulty of the task.
Questionnaire and Interview Analysis
After grading each piece of calligraphy writing, students look up, for each scoring criterion, the band in the scale in Appendix 1 that corresponds to the score awarded, and thereby obtain the matching comment on that piece of writing.
This section discusses the students’ opinions on assessing calligraphy writing and on how useful the comments from the scale are, through the following questions.
Are the comments from the scale helpful?
The results in Table 7 suggest that most students regard the comments from the scale, with their suggestions for improving calligraphy skills, as useful and helpful when they start their next task. In the questionnaire, the majority of students (80.9%) chose “useful” or “most of them are useful”; similarly, in the interviews, most students (70%) chose “useful” or “most of them are useful”. Based on the feedback from the evaluation, students are able to tackle the problems in their calligraphy writing in subsequent practice, and by taking this feedback into account they can gradually improve their calligraphy writing skills and proficiency.
Responses From Questionnaire and Interview on Usefulness of Comments.
Representative student quotations are displayed as follows:
Are students satisfied with scores from peer assessment?
Students award three sets of scores in peer assessment (structure, stroke and handwriting). The questionnaire results indicate that 81.6% of students (111 of 136) are satisfied with the scores they received from peer assessment. However, some students are not satisfied with the scores awarded by peers, for the following reasons.
Do students feel comfortable while assigning scores?
The questionnaire results indicate that 102 students (75%) felt comfortable when assigning scores, but 34 students did not. The main reason seems to be that these students felt they did not have a good command of many subjective evaluation criteria. Although the teacher explained each marking criterion in detail before grading, the students still found it difficult to give a reasonably objective score based on these subjective standards.
In the interviews, 18 of 30 students said that they felt comfortable when assigning scores. Three students thought the grades were reasonable but could not fully accept some subjective evaluations. Six students thought the grades were unreasonable, believing that the subjective evaluation criteria could not be fully grasped by everyone, so the evaluation results were not objective. The remaining three students were unsure: the grades seemed reasonable at first, but on second thought they did not consider them reasonable.
The interviews also show that, although students cannot understand calligraphy as well as the teachers and cannot evaluate the specific problems of a piece of calligraphy writing from a professional point of view, after training they can master the dimensions of evaluation and evaluate writing from different perspectives. During the peer assessment process, students get to know the marking criteria and learn what kinds of problems they are likely to encounter in their own calligraphy writing. With these refined marking criteria, students can identify problems in their own writing and then tackle them in a targeted manner.
Anonymous evaluation helps students overcome anxiety about evaluating their peers’ work and develop their assessment skills. The research indicates that online, anonymous assessment can reduce students’ tension when evaluating their peers’ work. It also suggests that peer assessment motivates students to think, thereby enhancing learning and critical thinking, and that it encourages them to construct, examine and analyze their own work before submitting it.
Discussion
This research explores the impact of peer assessment on Chinese calligraphy learners and the attitudes of students towards peer assessment.
The most critical factor influencing the effect of peer assessment is rater training, which has a large influence on the effect size of peer assessment (Li et al., 2020). In our study, students are trained in the assessment criteria before they assess the calligraphy writing, since familiarity with the criteria helps raters assess effectively and properly. Li et al. (2020) also suggested that computer-mediated peer assessment can bring about greater learning gains than traditional paper-based peer assessment. In our study, students assess the calligraphy writing via an online platform, and clustering analysis is then performed on their assessment scores.
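The exact clustering rule is not spelled out in this section. As a minimal sketch under an explicit assumption, the code below clusters the peer ratings of one piece of work with one-dimensional k-means (Lloyd’s algorithm) and takes the centroid of the most populated cluster as the final score, so that a few outlying ratings do not pull the result the way they pull a plain mean. The selection rule, the choice k = 2, the deterministic initialization and the function names are hypothetical illustrations, not the authors’ published procedure.

```python
import statistics


def kmeans_1d(scores, k=2, max_iter=100):
    """Lloyd's k-means on one-dimensional data; returns (centroids, labels)."""
    if not scores:
        raise ValueError("need at least one score")
    distinct = sorted(set(scores))
    k = min(k, len(distinct))  # cannot have more clusters than distinct values
    # Deterministic start (an assumption): centroids spread evenly over distinct values.
    step = max(k - 1, 1)
    centroids = [float(distinct[i * (len(distinct) - 1) // step]) for i in range(k)]

    def assign(cs):
        # Index of the nearest centroid for every score.
        return [min(range(k), key=lambda j: abs(s - cs[j])) for s in scores]

    for _ in range(max_iter):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [s for s, lab in zip(scores, labels) if lab == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(statistics.fmean(members) if members else centroids[j])
        if updated == centroids:  # converged: assignments no longer move centroids
            break
        centroids = updated
    return centroids, assign(centroids)


def clustered_final_score(scores, k=2):
    """Hypothetical rule: final score = centroid of the most populated cluster."""
    centroids, labels = kmeans_1d(scores, k)
    sizes = [labels.count(j) for j in range(len(centroids))]
    return centroids[sizes.index(max(sizes))]  # ties go to the lower centroid


ratings = [85, 86, 84, 85, 40]  # hypothetical peer ratings of one piece of work
print(clustered_final_score(ratings))  # 85.0
print(statistics.fmean(ratings))  # 76.0
```

In this toy example the outlying rating of 40 forms its own cluster, so the final score is the centroid of the four agreeing ratings (85.0) rather than the plain mean (76.0).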
The experimental results align with the findings of Liu and Li (2014), who provided evidence of the validity of students’ ratings of peer work when sufficient training has been provided. Our study shows that teachers’ scores and students’ PA scores diverge for Task 1 (left-center-right structure Chinese characters); we also noticed that even experienced teachers hold divided opinions on this difficult character structure. Students’ PA scores for Task 2 (upper and lower structure Chinese characters) and Task 3 (left and right structure Chinese characters) are more consistent with those of the teachers. Because Task 4 (simple characters) has the simplest structure of all Chinese characters, the difference between students’ PA scores and teachers’ scores is smaller in Task 4 than in Tasks 1, 2, and 3. Thus, with the exception of the most difficult character structure, students’ PA scores are close to the teachers’ ratings, and we conclude that students’ PA can effectively assess most character structures in calligraphy writing.
Anonymity is another important factor in peer assessment: it makes students comfortable assigning marks because they can concentrate exclusively on the work being assessed (Sitthiworachart & Joy, 2008). The anonymous online PA scale for calligraphy courses proposed in this paper comprises three factors (structure, stroke and handwriting) composed of 10 dimensions, and it assesses students’ calligraphy writing more comprehensively than traditional teacher evaluation. Students can therefore understand in detail what they should attend to in future calligraphy practice. Meanwhile, the evaluation process is fast and can offer students instant feedback.
Previous research shows that incorporating specifically planned strategies which foster a positive attitude towards peer assessment improves its effectiveness and students’ engagement (Wing & Yu, 2021). The students believe that peer assessment makes them more involved in calligraphy courses. They no longer finish their calligraphy writing and then wait a long time for the teacher’s comments; instead, they take part in the grading process, which enhances their understanding of calligraphy. They consider anonymous marking beneficial in several ways. After submitting their assignments, they do not have to worry about being unfairly marked down by others. As raters, they have no reason to maliciously lower their classmates’ scores or to deliberately give high scores to close friends. Likewise, they need not worry that classmates will resent the low scores they give.
Casey et al. (2011) found that the majority of students enjoyed the assessing process and that peer assessment facilitates and strengthens student engagement. During the PA process of calligraphy writing, students participate in grading and learn the calligraphy marking criteria, which enhances their understanding of calligraphy writing. By contrast, students remain passive in the traditional teacher-evaluation framework: they simply receive the teacher’s responses, with little or no chance to discuss the specific feedback with the instructor. As a result, students are more involved in calligraphy courses through PA and show high levels of engagement in the assessing process.
Conclusions
This study shows that using PA in calligraphy writing assessment can promote students’ calligraphy practice and their understanding of calligraphy. PA in calligraphy writing assessment helps students develop critical judgment skills and become more involved in the calligraphy course. At the same time, anonymous marking helps ensure that students do not deliberately give high scores to good friends or maliciously lower the scores of other classmates. Qualitative and quantitative analyses suggest that PA is an effective and accurate assessment method for calligraphy courses. The majority of students are satisfied with their PA scores and find the assessment suggestions helpful. These results suggest that PA for calligraphy writing assessment can be used to promote students’ calligraphy learning, cultivate their calligraphy writing skills and help them grasp the main points of calligraphy assessment by making marking judgments on other students’ writing and providing specific feedback. To summarize, the cluster analysis-based peer assessment for calligraphy classes reported here is a novel approach with which students demonstrated better understanding, higher engagement and more consistent calligraphy writing skills.
One limitation of this study is that the calligraphy examples analyzed consist exclusively of Chinese characters. Future studies could integrate calligraphy education for Western alphabetic scripts and Chinese characters to form a unified calligraphy education model, so that further research covers ideographs and alphabets alike.
The assessment of students’ calligraphy writing has two aspects. On the one hand, it assesses the overall effect of the calligraphy writing, which is usually done once the work is completed. On the other hand, the student’s brush grip and stroke order during the writing process are also important factors in assessing calligraphy. This study mainly focuses on assessing the overall effect of completed calligraphy writing through anonymous peer assessment. Future studies could therefore record students’ whole writing process on video and further analyze their brush grip gestures and stroke writing order, so that the assessment of the writing process can be incorporated into calligraphy assessment and calligraphy courses can be promoted more comprehensively.
Appendix 1
Elementary School Students’ Calligraphy Writing Assessment Scale.
| Assessment dimension (10 dimensions) | Score 80–100 | Score 60–80 | Score under 60 |
|---|---|---|---|
| Symmetrical structure | Structure is symmetrical | Structure is basically symmetrical | Structure is asymmetrical |
| Regular structure | Structure is regular | Structure is basically regular | Structure is irregular |
| Smooth structure | Structure is smooth | Structure is basically smooth | Structure is not smooth |
| Stroke shape | Stroke shape is reasonable | Stroke shape is stiff | Stroke shape is improper |
| Stroke length | Stroke length is accurate | Stroke length is basically accurate | Stroke length is random |
| Stroke weight | Stroke weight is appropriate | Stroke weight is indistinct | Stroke weight is inappropriate |
| Stroke position | Stroke position is proper | Stroke position is basically proper | Stroke position is disorderly |
| Stroke angle | Stroke angle is sensible | Stroke angle is basically sensible | Stroke angle is random |
| Correct handwriting | Writing is correct | Writing is basically correct | Writing is not correct |
| Linked-up handwriting | Writing is linked-up | Writing is basically linked-up | Writing is not linked-up |
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The work reported in this paper is supported by the National New Liberal Arts Research and Reform Practice Project under Grant No. 2021180001.
Ethical Approval
The participants were informed that the data obtained from the study would be used only for research and that they had the right to withdraw from the study at any time. The researchers ensured data security, and the data were stored securely on the computer used by the researchers.
Data Availability Statement
Data sharing is not applicable to this article, as no datasets were generated or analyzed during the current study.
