Abstract
Testing can do more than just determine what a student knows; it can aid the learning process, a phenomenon known as the testing effect. There is a growing trend for students to create and share self-assessment questions in their subject, as advocated by the contributing-student pedagogy (CSP). For subjects with large enrolments, this process can be facilitated by educational technology. PeerWise is an example of such technology. It is free, web-based software that allows students to author, share, answer, and provide feedback on multiple-choice quizzes in a collaborative and constructivist fashion. While it is popular, it is unclear to what degree it facilitates student learning. To evaluate its effectiveness, we introduced PeerWise into a second-year psychology subject and measured the extent to which it increased scores in the final exam. We found that PeerWise significantly increased exam scores and was therefore a useful learning aid.
Introduction
In higher education, assessments can be either summative or formative (Boud, 2000; Nicol & Macfarlane-Dick, 2007). Summative assessments are designed to evaluate what students know. They are typically administered at the conclusion of the subject or course and usually provide quantitative feedback, allowing students to be ranked relative to their peers (Boud, 2000; Nicol & Macfarlane-Dick, 2007). Conversely, formative assessments are primarily designed to aid the learning process (Black & Wiliam, 2009; Nicol & Macfarlane-Dick, 2007). They are typically used to indicate shortcomings in a student's knowledge in order to guide subsequent learning (Black & Wiliam, 2009; Nicol & Macfarlane-Dick, 2007). In particular, they can help students identify which of their preconceptions need to be refined (Smith, diSessa, & Roschelle, 1993). Students do not start a subject as a blank slate; rather, they bring to it a range of knowledge and preconceptions. Crucially, these preconceptions are not necessarily misconceptions (Clement, Brown, & Zietsman, 1989). Rather, they often represent an unrefined or oversimplified understanding of a topic that can serve as a useful starting point for subsequent learning (Clement et al., 1989). Formative assessments help to identify in which ways these preconceptions require refinement and can be used throughout a subject to help students progressively build their understanding. They can also give feedback on how effective a student's learning strategies have been and can motivate the student to learn more in the future (Pastotter, Schicker, Niedernhuber, & Bauml, 2011).
While formative assessment has been identified as a critical practice for enabling student progress, it inevitably exists within a broader pedagogical approach. Recent innovations emphasize a more interactive role for students. Whereas in the traditional teaching model the instructor prepares and administers learning resources, in the contributing-student pedagogy (CSP), the students share responsibility for creating learning resources (Hamer et al., 2008; Hamer, Sheard, Purchase, & Luxton-Reilly, 2012). This pedagogy is based on the philosophy that students learn best when they are actively engaged in helping to create their own learning resources, and draws on constructivist (Hamer et al., 2008) and socio-cultural constructivist (Ben-Ari, 2001) theories of learning. CSP activities will often involve students creating assessments for their peers, sharing solutions and feedback with each other, and sometimes reviewing the work of their classmates (Vygotsky, 1978). While this may be achieved using only pen and paper, in subjects with larger enrolments, CSP will usually rely heavily on computer-based technologies to administer and provide feedback on formative assessment tasks.
PeerWise (http://peerwise.cs.auckland.ac.nz) is an example of web-based software that allows students to create and share formative self-assessments in a convenient manner (Denny, Hamer, Luxton-Reilly, & Purchase, 2008; Denny, Luxton-Reilly, & Hamer, 2008a). As such it facilitates CSP. It is free and widely used. In brief, it works as follows (Luxton-Reilly & Denny, 2010): The lecturer creates a PeerWise website and gives the students access to it. The students can then create, share, and answer multiple-choice questions (MCQs). When a question is created, the author of the question designates which of the potential answers is correct and provides a written explanation to act as feedback. Once the question has been attempted, the answerer receives this feedback and then has the opportunity to rate the question based on its difficulty and quality, as well as provide general comments. These comments can form the basis of an online discussion where the veracity of the question can be debated with other students, including the author of the question. This allows useful questions to be identified and permits those questions whose answers are potentially incorrect or ambiguous to be flagged. The correct answer can then be provided by other students. A central aim of PeerWise is to guide subsequent learning by the student. By revealing in which areas the student’s knowledge is incomplete or otherwise needs refinement, PeerWise helps the student focus their learning more effectively. It is in this sense that we regarded PeerWise as formative (Black & Wiliam, 2009) and well aligned with CSP.
Testing is a powerful means of improving learning (Roediger & Karpicke, 2006a, 2006b). Indeed, there is growing evidence that it is one of the most effective ways of aiding student learning (Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013). For example, taking a test has been shown to improve retention more than spending an equivalent amount of time restudying the material, even when test performance is poor and no feedback is given (Roediger & Karpicke, 2006a). This phenomenon is known as the testing effect.
While some studies have indeed shown that using PeerWise is correlated with higher marks in exams taken at the end of the subject (Bates, Galloway, & McBride, 2012; Bottomley & Denny, 2011), these studies did not attempt to control for student aptitude. Here we are using the term “aptitude” to refer to how easily and quickly a student can learn the subject matter. In particular, if two students are equally motivated, use the same revision strategy, and spend the same amount of time learning, the one with the greater aptitude will learn more. When determining how effective PeerWise is as a learning aid, it is necessary to control for student aptitude. Specifically, it could be that students with a greater aptitude for the subject tend to use PeerWise more. Since these students would also tend to perform better in the exam taken at the end of the subject, this could give rise to a spurious correlation between PeerWise usage and exam performance, giving the impression that PeerWise contributes more to learning than it really does (Luxton-Reilly, 2012).
The most rigorous way to address this concern would be to randomly divide students into cohorts and allow only some cohorts access to PeerWise. Student aptitude could then be decoupled from PeerWise usage. However, this is generally considered unethical as it would mean denying some students in a subject a potentially beneficial learning aid while allowing other students in the same subject access to it. As a compromise option, Humpage (2014) adopted a quasi-experimental design. She conducted a 6-year study of a sociology subject where PeerWise was made available for only two of those years. Comparing student outcomes across all 6 years, she concluded that there was little evidence that the introduction of PeerWise was associated with any improvement in student performance.
McQueen, Shields, Finnegan, Higham, and Simmen (2014) argued that PeerWise is helpful for some students. They studied the usage of PeerWise in a second-year genetics subject over a period of 3 years. For each year, they divided their students into quartiles based on their performance in a prior genetics subject. Within each quartile, they performed a median split to identify students with high PeerWise activity (HPA) and low PeerWise activity (LPA). They reported mixed findings: in the majority of their comparisons (7/12), the HPA group did not significantly outperform the LPA group; in the remaining comparisons, they found a 4–6% improvement in the overall marks for the subject. However, it is unclear to what extent even this increase can be attributed to PeerWise. Because they operationalized student aptitude as performance on a prior subject, their measure may not have fully captured aptitude for the subject in which PeerWise was used, leaving open the possibility that residual differences in aptitude, rather than PeerWise itself, drove the improvement.
There are several reasons why students with greater aptitude for the subject material might be more likely to engage with PeerWise. For example, such students might derive more satisfaction from learning, so might readily engage with all learning aids. Additionally, these students would be more likely to get the answers correct, which might encourage them to remain engaged with PeerWise. Since aptitude is likely to be correlated with exam performance, if the students with greater aptitude tend to engage more with PeerWise, this would cause PeerWise usage to be correlated with exam performance even if PeerWise itself were not an effective learning aid. Therefore, to test whether PeerWise is effective, it is necessary to control for student aptitude. While correlation can never prove causation (Aldrich, 1995), we can better test whether participation in PeerWise improves final exam scores by using partial correlations to control for student aptitude.
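The confound described above can be illustrated with a small hypothetical simulation (none of these numbers come from the study): aptitude drives both PeerWise usage and exam scores, so usage and exam scores correlate strongly even though usage has no direct effect here, and the partial correlation that removes aptitude falls to near zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
aptitude = rng.normal(0, 1, n)
usage = 50 + 20 * aptitude + rng.normal(0, 10, n)  # usage driven by aptitude
exam = 60 + 8 * aptitude + rng.normal(0, 4, n)     # exam driven by aptitude only

# Raw correlation between usage and exam is spuriously high.
raw_r = np.corrcoef(usage, exam)[0, 1]

def residuals(y, x):
    """Residuals of y after regressing out x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation: correlate what is left of usage and exam
# once aptitude has been regressed out of both.
partial_r = np.corrcoef(residuals(usage, aptitude),
                        residuals(exam, aptitude))[0, 1]
print(f"raw r = {raw_r:.2f}, partial r = {partial_r:.2f}")
```

In this simulation the raw correlation is around .8 while the partial correlation hovers near zero, which is exactly the pattern a purely aptitude-driven association would produce.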
There were a number of different ways students could participate in PeerWise: they could write questions, comment on questions, and answer questions. Our initial plan was to investigate to what degree each way of utilizing PeerWise increased final exam scores. However, as we discuss later, relatively few students either wrote or commented on questions, so we decided to quantify PeerWise usage solely as the number of questions a student answered.
In summary, different studies have come to conflicting conclusions as to whether PeerWise facilitates learning. While one multi-year study concluded that PeerWise does not improve student performance (Humpage, 2014), another study found that it might do so for some students (McQueen et al., 2014). However, for the latter study, it is unclear to what extent this improvement is due to the students with greater aptitude for the subject matter tending to use PeerWise more.
The purpose of our study was to answer the following question: To what extent does utilizing PeerWise by answering questions increase final test performance? To address this question, we needed to measure the degree to which PeerWise facilitates learning while controlling for student aptitude. We measured student aptitude by measuring performance on two assignments. Crucially, these assignments were part of the same subject as the one to which PeerWise was applied (Luxton-Reilly, 2012). This allowed us to better discount student aptitude when measuring the degree to which PeerWise facilitates student learning.
Method
Participants
To assess the effect of answering questions on PeerWise on final exam performance, we introduced it into a second-year psychology subject,
Materials
Each student completed two essays, answered one or more questions on the PeerWise website, and sat the final exam. Thus, for each student, we had the marks for the two essays, the number of questions they authored, answered, and commented on in PeerWise, and their score in the final exam.
Procedure
We created a PeerWise site for the subject, which was made available to students at the start of the second week of the semester. For ethical reasons, it was made available to all students enrolled in the subject. Because we were unsure whether PeerWise would be an effective learning aid, participation was not mandatory. However, students were encouraged to use it on a number of occasions.
The two essays were marked using a points-based marking scheme. Essentially, there was a list of criteria that each essay needed to satisfy and a number of points was assigned to each criterion. For example, some of the criteria covered the key concepts that each essay needed to explain. Other criteria evaluated how well students followed APA formatting guidelines. Double marking of a subset (approximately 10%) of the essays ensured that all essays were being marked in a consistent manner. The essays were on a topic that was covered only in the tutorials and was not the focus of any of the lectures. As the PeerWise questions were focused solely on the lecture content and not on the tutorial content, the PeerWise questions did not relate to the essays. As such, the essay marks represent a useful, independent measure of student aptitude for the subject in question.
In the final exam, there were 120 MCQs. These questions covered the lecture content, so covered the same material as the PeerWise questions. However, because the PeerWise questions were constructed by students, they were written independently of the final exam.
Design and Analysis
As discussed above, it is possible that the students with greater aptitude for the subject matter might be more likely to use PeerWise. Since these students are likely to do better in the final exam, this may produce a positive correlation between exam scores and participation in PeerWise, even if PeerWise were not an effective learning aid (Luxton-Reilly, 2012). Following the lead of Luxton-Reilly (2012), we used performance on the two assignments during the subject to give a measure of each student's subject-specific aptitude.
For each subject, we therefore measured the partial correlation between the level of participation in PeerWise and the student’s exam mark, controlling for the student’s subject-specific aptitude (as measured by their separate marks on the two assignments). We also performed a linear regression where we attempted to predict students’ exam scores based on their two essay scores and their level of participation in PeerWise. This regression allowed us to determine the degree to which PeerWise participation affected the final exam score, separate from student aptitude.
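The two analyses can be sketched as follows on synthetic data (the study does not report its analysis software, and all variable values below are illustrative assumptions, not the study's data): a partial correlation between PeerWise usage and exam mark controlling for both essay marks, and an ordinary least squares regression predicting the exam mark from the two essay marks and usage.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 387  # matches the reported number of students
essay1 = rng.normal(70, 8, n)
essay2 = rng.normal(68, 8, n)
usage = rng.integers(1, 176, n).astype(float)  # questions answered (illustrative)
exam = 0.4 * essay1 + 0.4 * essay2 + 0.04 * usage + rng.normal(0, 5, n)

def residualize(y, controls):
    """Residuals of y after regressing out the control columns (plus intercept)."""
    X = np.column_stack([np.ones(len(y))] + controls)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

# Partial correlation: correlate the parts of usage and exam
# that the two essay marks cannot explain.
partial_r = np.corrcoef(residualize(usage, [essay1, essay2]),
                        residualize(exam, [essay1, essay2]))[0, 1]

# OLS regression: exam ~ intercept + essay1 + essay2 + usage.
X = np.column_stack([np.ones(n), essay1, essay2, usage])
coef, *_ = np.linalg.lstsq(X, exam, rcond=None)
print(f"partial r = {partial_r:.2f}, usage coefficient = {coef[3]:.3f}")
```

The usage coefficient recovered by the regression is the quantity of interest: the predicted change in exam score per additional question answered, holding the essay marks (the aptitude proxy) constant.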
Results
In total, 176 questions were written and 134 comments were made for this subject. Out of the 387 students who completed all assessments and participated in PeerWise, only 28 wrote questions. Most of these students did not write many questions, whereas a small proportion wrote a large number. Consequently, for these students, the modal number of contributed questions was two whereas the mean was 6.3. Forty-three of the 387 students commented on at least one question, and all 387 students answered at least one question, with each answering, on average, 92.2 questions. Given the relatively low number of students authoring questions or writing comments, we did not include these two measures in our analysis. This, unfortunately, meant that we could not quantitatively investigate the extent to which creating or commenting on MCQs aids learning and affects exam performance.
Figure 1 shows the marks for the two assignments and the exam as a function of whether or not the student participated in PeerWise. For the two assignments, students who participated in PeerWise scored higher than those who did not.

Figure 1. The mean assignment marks as a function of whether the student participated (“yes”) or did not participate (“no”) in PeerWise. Error bars represent the standard error of the mean.
Table 1. The coefficients of the regression fit. “Number of Questions Answered” denotes how many PeerWise MCQs were answered.
Discussion
Self-assessments can aid learning, and it is becoming increasingly common for students to create and share these assessments, as advocated by the CSP (Luxton-Reilly, 2012). PeerWise is an online resource that can facilitate this process (Denny, Hamer, et al., 2008; Denny, Luxton-Reilly, et al., 2008a). Although it has proven popular with students (Denny, Luxton-Reilly, & Hamer, 2008b), it was unclear whether it increases student learning and, if so, whether the extent to which it increases student learning justifies the time students spend on it. The purpose of our study was to address this concern. We found that the number of questions that a student answered in PeerWise was positively correlated with their final exam score even when subject-specific student aptitude was controlled for. A linear regression revealed that each question answered increased the final exam score by 0.04%. This means that if a student were to answer all 176 questions that were posted on PeerWise, then the predicted increase in their final exam score would be 7.0%. Given that we would expect students to take approximately three hours to answer these questions (based on the fact that in the final exam they are required to answer 120 questions in two hours), this would appear to be an efficient use of their time.
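The figures quoted above can be checked with simple arithmetic (a back-of-the-envelope calculation, not part of the study's analysis):

```python
# Regression coefficient reported above: percentage points gained per question answered.
coef_per_question = 0.04
total_questions = 176
predicted_gain = coef_per_question * total_questions  # in percentage points

# Time estimate: the final exam required 120 MCQs in two hours (one minute each).
minutes_per_question = 120 / 120
hours_needed = total_questions * minutes_per_question / 60

print(f"predicted gain: {predicted_gain:.1f}%, time: {hours_needed:.1f} h")
```

Answering all 176 questions thus predicts roughly a 7.0 percentage-point gain for roughly three hours of work, which is the trade-off the paragraph above describes.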
Our results add to the growing literature that PeerWise is an effective learning aid. A concern with much of this literature was that it did not control for student aptitude. A notable exception to this trend was a study by McQueen et al. (2014) who quantified student aptitude as performance on a prior subject. Like us, they concluded that PeerWise was an effective tool for formative assessment, at least for some students. Combined with our findings, this suggests that regardless of whether student aptitude is defined as performance on a prior subject or performance on non-MCQ assessment tasks in the current subject, one can still find evidence that usage of PeerWise increases final exam scores, controlling for student aptitude.
The McQueen et al. (2014) study investigated the effectiveness of PeerWise in the context of a second-year genetics subject. In contrast, we investigated its effectiveness in the context of a second-year psychology subject. The fact that both studies found PeerWise to be an effective tool for formative assessment suggests that the finding is likely to generalize to other second-year science subjects. However, it is possible that PeerWise may not be as effective in other contexts. For example, Humpage (2014) concluded that there was no compelling evidence that PeerWise was an effective learning aid for a first-year sociology subject.
As noted earlier, PeerWise relies on the testing effect and there is evidence that the testing effect may not apply to more complex subject matter (Van Gog & Sweller, 2015). Van Gog and Sweller found that testing was most effective as a learning aid for material containing distinct elements, where the individual elements could be learned independently and without reference to the other elements. They described such material as having low element interactivity. The subject that was the focus of the present study,
Other subjects may contain material with much higher element interactivity. In particular, the subject for which Humpage (2014) investigated the effectiveness of PeerWise,
So far, we have been agnostic as to the mechanism by which PeerWise aids learning. It could be that improved learning occurred only for the items that were tested, what Roediger and Karpicke (2006a) call “direct” testing effects. If true, then training with PeerWise would increase performance for those questions in the final exam that happened to be similar to or identical to questions in PeerWise (recall that the students who wrote the questions for PeerWise had no knowledge of the questions that would appear on the final exam). However, it has also been reported that memory for untested items may improve as well, what is often described as an “indirect” testing effect (Roediger & Karpicke, 2006a). It has been suggested that indirect testing effects are caused by participation in the test activating other related information, thereby causing the participant to review that information as well (Roediger & Karpicke, 2006a). However, testing can also increase learning for unrelated information that is studied after the test (Pastotter et al., 2011).
In conclusion, we have found that PeerWise is an effective learning aid in a second-year psychology subject. Not only did PeerWise participation result in a statistically significant improvement in exam scores, but the effect size was large enough to make its usage worthwhile for the students. This supports previous work that has also found PeerWise to be a useful learning aid (McQueen et al., 2014). However, it is possible that these results will only hold for subject matter with low element interactivity (Pastotter et al., 2011; Van Gog & Sweller, 2015) and may not be appropriate for material that has higher element interactivity (Humpage, 2014).
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by a Learning Teaching Initiative grant from the University of Melbourne (no grant number).
