
Abstract

This study explains how economics instructors can incorporate pretest and posttest assessments into their economics courses to make testing and related teaching more dynamic and interactive with students. Such assessments allow the correct and incorrect responses to pretest and posttest items to be used by instructors to create four learning scores that reveal different pathways of student learning—positive, retained, negative, and zero. Learning scores provide more insights into learning progression among students during an instructional unit or a course than is possible with the typical posttest only assessment. The study explains how learning scores can be used by economics instructors to analyze classroom or group learning, to diagnose individual student understandings or misunderstandings, and assess how each test item contributes to student learning. It discusses the practical applications and extensions of learning score analysis that make it realistic for economics instructors to use and addresses measurement concerns.

JEL Codes: A2, A22

Keywords

learning scores, economics instruction, testing, assessment, economic education

Introduction

The distinction between a stock and a flow is fundamental to measurement in economics. For example, the amount of capital investment is a stock measured at a point in time whereas economic growth is a flow measured over time. The stock-flow distinction also is important to research and instruction in economic education (Siegfried & Fels, 1979).¹ A learning outcome, such as economic understanding or achievement, is a stock that is measured after the completion of instruction. A learning progression is a flow that assesses what has been learned over time, which requires testing before and after instruction.²

Most tests given in economics courses are stock or summative assessments as they are administered to students after instruction. Economics teachers then use these posttest results in combination with data from other course requirements to assign course grades. Stock assessments, however, provide little information about learning progression during a course as little is known about the starting point for students before a course or an instructional unit begins. Flow assessments, by contrast, are powerful because they collect more data on what students apparently understand or do not understand before and after instruction, and those before and after data can be used to assess learning progression.

This study explains how economics instructors can incorporate flow or formative assessments into their economics courses using learning scores that are derived from response patterns to pretest and posttest items. They reveal four pathways of learning—positive, retained, negative, and zero. The four learning scores, either separately or together, provide more insights into learning progression among students during a course than is possible with a stock assessment administered only after the completion of economics instruction. Although learning scores were originally developed and used in regression research studies as alternative measures of student outcomes (Emerson & English, 2016; Happ et al., 2016; Walstad & Wagner, 2016), what have been overlooked are the practical applications of learning scores for course assessments and economics teaching, and they are the focus of this study.

What follows is an explanation of how learning scores can be used by economics instructors to improve testing and learning in economics courses and make them more dynamic and interactive. The first section introduces the idea of a measurement paradox with a total test score and explains how the four learning scores from a flow assessment offer learning insights hidden within total test scores. The second section provides a practical example of learning score analysis using data from a principles of economics course. It explains the process for how an economics instructor can use learning scores to: (1) analyze classroom or group learning; (2) diagnose individual student understandings or misunderstandings; and (3) assess how each item on a test contributes to student learning. The third section broadens the potential applications of learning scores by relaxing conditions related to the equivalence of matched items, the frequency of testing, the use of testing formats beyond multiple-choice, and the test conditions. The fourth section considers measurement issues, such as motivation or guessing, and offers a practical perspective for considering such concerns. The concluding section presents further implications for course testing and economics instruction.

A Measurement Paradox Revealed by Learning Scores

Economics instructors often administer a multiple-choice (MC) test to students in their principles of economics courses as part of a course assessment (Walstad & Miller, 2016; Watts & Schaur, 2011).³ Student responses to each MC item are scored as correct or incorrect and then the correct item scores are summed to create a total test score for each student. This scoring process, however, creates a measurement paradox. It arises because students with the same total score may have different sets of correct responses to the test items. Although from a total score perspective, the learning outcome of students with the same score is the same, the total score masks individual differences in the learning progression of students. The differences are revealed with four learning scores extracted from the correct and incorrect item responses when a MC test is administered as a pretest and a posttest (Walstad & Wagner, 2016).⁴

Table 1 provides a definition for each learning score. The first pattern is giving incorrect answers to items on the pretest and giving correct answers to the same items on the posttest. It indicates an improvement in student understanding, so this learning outcome is categorized as positive learning (PL). The second pattern is giving correct answers to items on the pretest and the posttest. It is classified as retained learning (RL) as students maintained their understanding of test content from pretest to posttest. Both PL and RL are desired learning outcomes from a teaching or instructional perspective as they either indicate an increase in understanding or at least its maintenance.

Table 1.

Learning Scores Defined.

Learning scores	Pretest item	Posttest item	Learning score calculations
Positive learning (PL)	Incorrect	Correct	Sum number of items with PL pattern
Retained learning (RL)	Correct	Correct	Sum number of items with RL pattern
Negative learning (NL)	Correct	Incorrect	Sum number of items with NL pattern
Zero learning (ZL)	Incorrect	Incorrect	Sum number of items with ZL pattern

The other two response patterns provide information about what students apparently did not learn or understand. With the third pattern, students give correct answers to items on the pretest but then give incorrect answers on the posttest. It indicates a possible loss in understanding from pretest to posttest, so in contrast to previously described PL and RL outcomes, it suggests negative learning (NL). The fourth pattern shows items with incorrect pretest and posttest responses, and accordingly is labeled zero learning (ZL).

The four response patterns for items can be used to construct four learning scores (PL, RL, NL, and ZL) for each student by assigning a 1 for a correct response or a 0 for an incorrect response to the same items on the pretest and posttest and then summing the scores across all test items. This disaggregation of the total test score creates a set of PL, RL, NL, and ZL scores that represent different pathways of students’ learning over time related to the test item content.⁵
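
The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `learning_scores` and the six-item response vectors are hypothetical, and items are assumed to be matched one-to-one between pretest and posttest and already scored 0 or 1.

```python
from collections import Counter

# Map each (pretest, posttest) item pattern to its learning score label (Table 1).
PATTERNS = {(0, 1): "PL", (1, 1): "RL", (1, 0): "NL", (0, 0): "ZL"}

def learning_scores(pretest, posttest):
    """Count PL, RL, NL, and ZL items for one student from 0/1 item scores."""
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must cover the same matched items")
    counts = Counter(PATTERNS[(pre, post)] for pre, post in zip(pretest, posttest))
    return {name: counts.get(name, 0) for name in ("PL", "RL", "NL", "ZL")}

# Hypothetical responses for one student on six matched items.
pre = [0, 1, 1, 0, 0, 1]
post = [1, 1, 0, 0, 1, 1]
scores = learning_scores(pre, post)
print(scores)
```

In this made-up case the student has two PL items, two RL items, one NL item, and one ZL item, and the four counts sum to the number of items.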

These learning scores can be used to provide insights about their influence on a total score, whether it is a pretest, posttest, or difference score. As shown in Table 2, each total score is composed of two learning scores. The posttest score is RL + PL as it shows that students answered the test items correctly on the posttest regardless of the correct or incorrect responses students gave to those items on the pretest. The pretest score is RL + NL as students answered the item correctly on the pretest but later gave a mixture of correct (RL) and incorrect (NL) responses to those same items at the posttest. A difference score can be expressed in terms of learning scores as it is simply the subtraction of the pretest (NL + RL) from the posttest (PL + RL), which reduces to (PL – NL).

Table 2.

Relationship Between Total Test Scores and Learning Scores.

Total score	In terms of learning scores
Posttest score	Positive learning (PL) + Retained learning (RL)
Pretest score	Negative learning (NL) + Retained learning (RL)
Difference score	Positive learning (PL) – Negative learning (NL)
All test items	PL + RL + NL + ZL

The learning scores show trade-offs for assessing learning progression. The total number of test items equals the scores for PL + RL + NL + ZL. This additive relationship means that each one affects the others. For the posttest, the number correct is PL + RL and the number incorrect is NL + ZL. For any given level of NL + ZL, a larger RL means less PL and vice versa. A similar trade-off is evident in the pretest, but in this case, the number correct is RL + NL and the number incorrect is PL + ZL. For any given level of PL + ZL, the more RL, the less NL and vice versa.
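
The relationships in Table 2 and the trade-offs just described can be written compactly. The symbol $N$ for the total number of test items is notation introduced here for convenience and does not appear in the original tables.

```latex
\begin{align}
\text{Posttest score}   &= PL + RL \\
\text{Pretest score}    &= NL + RL \\
\text{Difference score} &= (PL + RL) - (NL + RL) = PL - NL \\
N &= PL + RL + NL + ZL \\
PL &= N - (NL + ZL) - RL \quad \text{(posttest trade-off: holding } NL + ZL \text{ fixed, more } RL \text{ means less } PL\text{)} \\
NL &= N - (PL + ZL) - RL \quad \text{(pretest trade-off: holding } PL + ZL \text{ fixed, more } RL \text{ means less } NL\text{)}
\end{align}
```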

In general, what the relationships show is that both RL and ZL will have a major influence on the amount of PL shown by students at the posttest.⁶ If RL is extremely high as a percentage of all test items, the test is too easy as students give correct answers to most items on the posttest and pretest. By contrast, if ZL is extremely high as a percentage of all test items, the test is too hard as students give incorrect answers to most items on the posttest and pretest. PL falls between these extremes as incorrect answers are given on the pretest, but correct answers are given on the posttest. Thus, for a test to show higher levels of PL requires more moderate levels of RL and ZL.

Learning score analysis also reveals a major problem with using a difference score (PL – NL). It requires an assumption that the two learning scores (PL and NL) are the same constructs with opposite meanings. NL, however, might be more affected by motivation, guessing, or other measurement issues such as confusing test items. PL alone can serve as a better indicator of the improvement in student understanding as it is not confounded by the subtraction of NL. Thus, it is better practice to consider each learning score separately and not manipulate them to create a total score with questionable interpretations.⁷

A Practical Example: A Re-take Exam

The pedagogical uses for learning scores are numerous and the analysis is relatively easy to conduct as explained with the following example. In a principles of economics course, a MC exam was administered to students to assess their understanding of economic concepts at the completion of each of the three units taught in spring 2020. At the end of the semester, students had the option to re-take one unit exam that had been administered during the semester. A higher score on the re-take exam would replace the score on the unit exam in the calculation of student grades. Students who chose to do a re-take exam were typically ones who showed poorer understanding of course content on the unit exam and would likely benefit most from a second chance to show economic understanding with a re-take exam.

Each of the 60 items on the re-take-exam was written to match a corresponding item on the unit exam. Although the paired test items were not identical, each covered the same economic content and numerical items differed only in the numbers included for the calculations. The item characteristics, therefore, made it possible to conduct a learning score analysis. In essence, student responses to items on the unit exam served as the pretest and student responses on the re-take exam served as the posttest. In preparation for the re-take exam, students were encouraged to go online and review the test items and their answers from the unit exam as the re-take items would be similar in content.

A learning score analysis was conducted with the unit exam and re-take exam. The analysis involves producing three types of valuable assessment information for instructors. The first one is the average scores for the group of students who took the unit and re-take exams. It helps instructors to understand how learning progression and learning outcomes vary for the group. The second one is the individual scores for students. It provides instructors with insights about individual differences in exam scores and the measurement paradox. The third one is item scores that show instructors the different learning patterns across test items in the exams. What follows is a brief explanation of each type of information from a learning score analysis with data from the unit and re-take exams.

Analysis of Group Averages

Table 3 presents the sample averages for the total and learning scores for the students on the unit exam (hereafter also pretest) and re-take exam (hereafter also posttest). The pretest average of 39.5 shows that the unit exam was difficult as students, on average, answered only 66% of items correctly. Giving these students an opportunity to show an improvement in their learning on a re-take exam would help them achieve a better course grade. At the posttest, the group average is 50.5, or 84% correct, which was a major improvement.

Table 3.

Scores on Re-Take and Unit Exams (60 Items; 48 Students).

Variables	Mean	Std. Dev	In percent
Posttest score (re-take exam)	50.521	6.377	84.20
Pretest score (unit exam)	39.500	7.921	65.83
Posttest−pretest	11.021	7.052	18.37
Retained learning (RL)	34.896	9.238	58.16
Positive learning (PL)	15.625	5.978	26.04
Negative learning (NL)	4.604	3.420	7.67
Zero learning (ZL)	4.875	3.972	8.13

The average learning scores are of interest as they show their contribution or noncontribution to the posttest score. The posttest average can be divided into two learning components, RL (34.9) and PL (15.6), which means that the posttest score is 69% RL and 31% PL. A high RL would be expected with a re-take exam as students have previously been taught and tested on this economic content. Nevertheless, 31% of the posttest score reflects substantial positive learning by students as many incorrect answers given on the pretest were converted to correct answers to items on the posttest.

The difference score (11.02) is the gap between its two learning components, PL (15.63) and NL (4.60). The calculation shows that the total gain is reduced by a total loss in understanding of 30% [(4.60/15.63) x 100]. As previously discussed, the meaning and outcomes for PL and NL are best considered separately in a learning score analysis as PL and NL can be different constructs and not two sides of the same construct. The PL average alone shows a substantial improvement in understanding by students on the test (+15.63). That learning progression shown by the PL score should be focused on by economics instructors and not a difference score with a questionable interpretation.
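
The arithmetic behind these group-level statements can be checked directly from the means in Table 3. The short sketch below uses only the four learning score means reported there; the variable names are illustrative.

```python
# Group means from Table 3 (60 items, 48 students).
rl, pl, nl, zl = 34.896, 15.625, 4.604, 4.875

posttest = pl + rl     # posttest mean rebuilt from its two components
pretest = nl + rl      # pretest mean rebuilt from its two components
difference = pl - nl   # difference score as PL minus NL
total_items = pl + rl + nl + zl

rl_share = 100 * rl / posttest   # percent of the posttest mean that is RL
pl_share = 100 * pl / posttest   # percent of the posttest mean that is PL
nl_offset = 100 * nl / pl        # NL as a percent of PL

print(f"posttest={posttest:.3f}, pretest={pretest:.3f}, difference={difference:.3f}")
print(f"RL share={rl_share:.1f}%, PL share={pl_share:.1f}%, NL/PL={nl_offset:.1f}%")
```

The components reproduce the posttest, pretest, and difference means in Table 3, and the four means sum to the 60 test items. The NL-to-PL ratio comes out at about 29.5%, which the text reports as 30%.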

Analysis of Individual Students

The total and learning scores for students can be used to describe a measurement paradox that students with the same total scores on the posttest have different underlying learning scores, RL + PL. Table 4 ranks students from highest to lowest posttest scores. The results show considerable variation in the PL and RL scores even for students with the same posttest score, which suggests a measurement paradox. For example, five students (# 15, 16, 17, 18, and 19 in Table 4) all have the same posttest score of 55. Their respective PL scores are 24, 15, 15, 25, and 20. Their respective RL scores are 31, 40, 40, 30, and 35. Students #16 and #17 have lower PL scores and higher RL scores than students #15, #18, and #19. Focusing only on the posttest score masks the substantial increase in PL shown by students #15, #18, and #19 compared with students #16 and #17, whose posttest scores are heavily influenced by RL.

Table 4.

Total and Learning Scores for Students: Re-Take and Unit Exams.

1	2	3	4	5	6	7	8
Students	Posttest (re-take)	Pretest (unit exam)	Difference (post-pre)	Retained learning	Positive learning	Negative learning	Zero learning
1	60	43	17	43	17	0	0
2	60	35	25	35	25	0	0
3	60	51	9	51	9	0	0
4	57	51	6	49	8	2	1
5	57	48	9	47	10	1	2
6	57	45	12	45	12	0	3
7	57	39	18	38	19	1	2
8	57	53	4	52	5	1	2
9	57	49	8	47	10	2	1
10	57	47	10	46	11	1	2
11	57	52	5	50	7	2	1
12	56	34	22	32	24	2	2
13	56	36	20	35	21	1	3
14	56	51	5	49	7	2	2
15	55	33	22	31	24	2	3
16	55	43	12	40	15	3	2
17	55	43	12	40	15	3	2
18	55	33	22	30	25	3	2
19	55	38	17	35	20	3	2
20	54	34	20	32	22	2	4
21	53	40	13	34	19	6	1
22	52	53	−1	45	7	8	0
23	52	45	7	41	11	4	4
24	52	40	12	35	17	5	3
25	51	35	16	32	19	3	6
26	50	36	14	31	19	5	5
27	50	44	6	39	11	5	5
28	50	41	9	36	14	5	5
29	50	49	1	43	7	6	4
30	49	48	1	41	8	7	4
31	49	37	12	33	16	4	7
32	48	33	15	26	22	7	5
33	47	46	1	37	10	9	4
34	47	32	15	25	22	7	6
35	47	20	27	17	30	3	10
36	46	41	5	32	14	9	5
37	45	28	17	23	22	5	10
38	44	45	−1	34	10	11	5
39	44	32	12	26	18	6	10
40	43	28	15	21	22	7	10
41	43	32	11	23	20	9	8
42	43	41	2	33	10	8	9
43	43	33	10	28	15	5	12
44	41	35	6	28	13	7	12
45	41	34	7	22	19	12	7
46	39	28	11	24	15	4	17
47	39	25	14	18	21	7	14
48	34	37	−3	21	13	16	10

This learning score analysis should be useful for economics instructors to diagnose what economic content each student apparently learned (PL), already understood and retained (RL), apparently understood at the pretest but not at the posttest (NL), and did not show to have learned (ZL). The caution for economics instructors is that comparisons of learning scores across students must be carefully interpreted as they will differ based on each student’s responses. For example, two students (#12 and #15) have high PL scores of 24 on the re-take exam, but which test items show the PL response pattern will differ by student. Learning scores, however, provide indicators that are useful to economics teachers for providing individual feedback to students and to better target current or future course instruction to enhance student learning.
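
The measurement paradox among the five students with a posttest score of 55 can be reproduced from their rows in Table 4. The sketch below copies those rows and reports how much of each identical posttest score comes from PL; the tuple layout and variable names are illustrative choices, not part of the original analysis.

```python
# (student, RL, PL, NL, ZL) copied from Table 4 for students 15 to 19.
rows = [
    (15, 31, 24, 2, 3),
    (16, 40, 15, 3, 2),
    (17, 40, 15, 3, 2),
    (18, 30, 25, 3, 2),
    (19, 35, 20, 3, 2),
]

summary = []
for student, rl, pl, nl, zl in rows:
    posttest = rl + pl                  # posttest score = RL + PL
    pl_share = 100 * pl / posttest      # percent of the posttest score that is PL
    summary.append((student, posttest, pl_share))
    print(f"student {student}: posttest={posttest}, PL share={pl_share:.1f}%")
```

All five students have the same posttest score, yet the PL share of that score ranges from roughly 27% (students 16 and 17) to roughly 45% (student 18).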

Analysis of Test Items

Table 5 shows the extensive variation in the learning response patterns across the re-take items that was not evident in the group averages. The test items are ranked from highest to lowest based on their proportion correct on the posttest. The average proportion on the posttest is .842, but it ranges across items from .979 to .521 (17 items in the .9 range, 25 items in the .8 range, and 18 items in the .7 range or lower). In learning score analysis, the lowest ranked items should be investigated further for potential content problems or to re-direct future instruction to help students learn this content.

Table 5.

Item Results for Total Scores and Learning Scores on Re-Take and Unit Exams (in Proportions).

1	2	3	4	5	6	7	8
Item	Posttest (re-take)	Pretest (unit)	Difference (post-pre)	Retained learning	Positive learning	Negative learning	Zero learning
Average	.842	.658	.184	.582	.260	.077	.081
1	.979	.813	.167	.792	.188	.021	.000
12	.979	.500	.479	.500	.479	.000	.021
17	.979	.750	.229	.750	.229	.000	.021
30	.979	.854	.125	.854	.125	.000	.021
32	.979	.896	.083	.875	.104	.021	.000
35	.979	.813	.167	.813	.167	.000	.021
48	.979	.979	.000	.958	.021	.021	.000
49	.979	.833	.146	.813	.167	.021	.000
5	.958	.896	.063	.875	.083	.021	.021
7	.958	.625	.333	.604	.354	.021	.021
14	.958	.729	.229	.729	.229	.000	.042
46	.958	.583	.375	.583	.375	.000	.042
34	.938	.667	.271	.646	.292	.021	.042
4	.917	.813	.104	.771	.146	.042	.042
8	.917	.604	.313	.604	.313	.000	.083
23	.917	.646	.271	.646	.271	.000	.083
42	.917	.625	.292	.583	.333	.042	.042
9	.896	.729	.167	.646	.250	.083	.021
28	.896	.625	.271	.583	.313	.042	.063
41	.896	.604	.292	.542	.354	.063	.042
21	.875	.542	.333	.479	.396	.063	.063
38	.875	.792	.083	.688	.188	.104	.021
44	.875	.563	.313	.479	.396	.083	.042
52	.875	.563	.313	.500	.375	.063	.063
53	.875	.750	.125	.646	.229	.104	.021
6	.854	.958	−.104	.813	.042	.146	.000
13	.854	.813	.042	.729	.125	.083	.063
24	.854	.813	.042	.729	.125	.083	.063
50	.854	.604	.250	.521	.333	.083	.062
55	.854	.688	.167	.604	.250	.083	.063
59	.854	.542	.313	.438	.417	.104	.042
60	.854	.688	.167	.604	.250	.083	.063
16	.833	.979	−.146	.813	.021	.167	.000
31	.833	.646	.188	.583	.250	.063	.104
36	.833	.563	.271	.521	.313	.042	.125
3	.813	.604	.208	.500	.313	.104	.083
10	.813	.875	−.063	.708	.104	.167	.021
22	.813	.500	.313	.458	.354	.042	.146
26	.813	.813	.000	.708	.104	.104	.083
27	.813	.625	.188	.521	.292	.104	.083
43	.813	.667	.146	.583	.229	.083	.104
47	.813	.750	.063	.646	.167	.104	.083
18	.792	.583	.208	.458	.333	.125	.083
19	.792	.729	.063	.667	.125	.063	.146
45	.792	.521	.271	.396	.396	.125	.083
51	.792	.521	.271	.438	.354	.083	.125
56	.792	.563	.229	.458	.333	.104	.104
20	.771	.646	.125	.542	.229	.104	.125
11	.750	.792	−.042	.625	.125	.167	.083
33	.750	.667	.083	.542	.208	.125	.125
37	.750	.646	.104	.563	.188	.083	.167
39	.750	.521	.229	.396	.354	.125	.125
2	.729	.688	.042	.542	.188	.146	.125
15	.729	.563	.167	.375	.354	.188	.083
29	.708	.250	.458	.208	.500	.042	.250
54	.688	.083	.604	.083	.604	.000	.313
25	.646	.542	.104	.417	.229	.125	.229
58	.646	.458	.188	.292	.354	.167	.188
57	.625	.354	.271	.208	.417	.146	.229
40	.521	.458	.063	.250	.271	.208	.271

The item proportion data can be used to calculate the percentage of correct responses that PL accounts for on the posttest [(column 6/column 2) x 100]. It averages 31%, but across items varies from a low of 4.9% (item 6) to a high of 87.8% (item 54), indicating that some items elicit more PL responses than others. Similarly, the percentage that RL constitutes of correct responses for the posttest score [(column 5/column 2) x 100] averages 69%, but it too ranges substantially across items.

The item data from the learning score analysis can be studied from the perspective of incorrect rather than correct responses. In Table 5, the proportion incorrect for items on the posttest is the combination of NL + ZL (column 7 + column 8). The results from the learning score analysis show that incorrect responses average 15.8% across posttest items. Although this average is about equally split between ZL (8.1%) and NL (7.7%), there is extensive variation in the split across items.
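
The item-level calculations described above (the PL and RL shares of correct posttest responses, and the proportion incorrect as NL + ZL) can be sketched as follows. Only three rows from Table 5 are copied in for illustration, and the function name `posttest_shares` is hypothetical.

```python
# Proportions copied from Table 5: item -> (posttest, RL, PL, NL, ZL).
items = {
    6: (0.854, 0.813, 0.042, 0.146, 0.000),
    40: (0.521, 0.250, 0.271, 0.208, 0.271),
    54: (0.688, 0.083, 0.604, 0.000, 0.313),
}

def posttest_shares(posttest, rl, pl, nl, zl):
    """Percent of correct posttest responses from PL and RL, plus proportion incorrect."""
    return {
        "pl_pct": round(100 * pl / posttest, 1),   # (column 6 / column 2) x 100
        "rl_pct": round(100 * rl / posttest, 1),   # (column 5 / column 2) x 100
        "incorrect": round(nl + zl, 3),            # column 7 + column 8
    }

results = {item: posttest_shares(*values) for item, values in items.items()}
for item, shares in results.items():
    print(item, shares)
```

For items 6 and 54 the PL percentages match the 4.9% and 87.8% quoted in the text. Item 40, the lowest-ranked item, shows nearly half of posttest responses incorrect, split between NL (.208) and ZL (.271).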

Other Applications and Extensions

Several conditions for learning score analysis can be relaxed to increase its practical applications and pedagogical value for economics instruction. These modifications are worth discussing because the purpose of the analysis is not to grade students at the end of a course or unit of instruction. Instead, it is to give instructors information on which concepts students did or did not demonstrate an understanding of. Instructors can use this information to give feedback to students about their learning. In addition, learning scores give economics instructors insights about contributions to students’ learning from test items that can be used to evaluate scores from current tests or to improve future assessments.

The first condition to relax is that students do not have to take the exact same items on the pretest and posttest. The reason for this less restrictive stance is that the major purpose of a test item is to assess whether students understand an economic concept. If test items can be matched for content coverage and are of similar perceived difficulty, but are not the exact same items, then they can be used to create two equivalent forms of a test, one as a pretest and the other as a posttest. Subtle differences in the matched test items are likely to have minimal influence on pretest and posttest scores. Evidence to support this assertion comes from the Test of Economic Literacy (Walstad et al., 2013, p. 304). It has two forms that were constructed using items matched only for their conceptual content and perceived level of difficulty. The average scores for the two forms show insignificant differences. The basic point is that if students understand an economic concept, they should be able to correctly answer similar items testing understanding of that same economic concept (as shown in the re-take example).

The second condition to relax is that a pretest is only given at the beginning of a course and a posttest only at its end. In fact, for the learning score analysis to be most useful for students and instructors, a pretest and posttest should be administered multiple times during a course, such as before and after each major unit of instruction within a course. In contrast to the total score that students usually receive after an exam, their learning scores indicate those items and economic concepts that they had difficulty understanding. The use of learning scores can help students identify economic misconceptions early in an economics course that affect their learning later in the course or in subsequent economics courses.

Consider the situation where an instructor divides a course into three units of instruction. At the beginning of each unit, the instructor administers a pretest and at the end a posttest with matched items. The pretest results can be shared with students to indicate what they need to know by the time of the posttest. The pretest results may also help the instructor identify those concepts being taught that have been shown either to be difficult or easy for students. At the end of the unit, the posttest is administered and the learning score analysis can be conducted to identify what concepts individual students understand and provide students with individual feedback. Similarly, for the other two units of instruction during a course, a pretest and posttest are also administered. This multiple testing procedure provides instructors with ample test data to use for course instruction and equips them with test information to provide individual guidance to students to support their learning.⁸

The results from the pretest and posttest administered for each instructional unit also are valuable for preparing a final comprehensive exam if it is administered. Some of the items selected for that exam can be ones on unit tests that showed the highest levels of NL or ZL to check that students have mastered those items that many students responded to incorrectly on unit tests. If instead of giving a final exam, an instructor allows students to re-take a unit exam, and have the new score replace the old score if it is higher, then the individual results from the unit exams will be useful for students in preparing for a re-take exam on the same content (as shown in the re-take example).

The third condition to relax is the test format. So far it has been assumed the learning score analysis only applies to a MC test. More generally, learning score analysis applies to any test for which there are multiple test items that can be binarily scored, such as correct or incorrect, pass or fail, meets the threshold or does not meet it, or some other zero and one scoring scheme. For example, an instructor could administer a test composed of multiple items that require computations and written short answers. If the instructor can binarily score each computation or short-answer item on the test, then the learning score analysis can be applied. The number of test items could be minimal, but more items would enrich the learning score analysis. The pair of items on the pretest and posttest could be the same or they could be slightly different and matched for content coverage.
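
One way to apply this idea to short-answer or computation items is to convert rubric points to a 0/1 score with a cutoff and then classify each matched item pair. The sketch below is only an illustration: the 70% threshold, the five-point rubric, and the point values are assumptions made up for the example, not recommendations from the study.

```python
# Map each (pretest, posttest) binary pattern to its learning score label.
PATTERNS = {(0, 1): "PL", (1, 1): "RL", (1, 0): "NL", (0, 0): "ZL"}

def binarize(points, max_points, threshold=0.7):
    """Return 1 if the earned share of points meets the (assumed) threshold, else 0."""
    return 1 if points / max_points >= threshold else 0

# Hypothetical rubric points for one student on four matched short-answer items.
max_points = 5
pre_points = [1, 4, 2, 5]
post_points = [4, 5, 1, 3]

labels = [
    PATTERNS[(binarize(pre, max_points), binarize(post, max_points))]
    for pre, post in zip(pre_points, post_points)
]
print(labels)
```

With these made-up points the four items are classified as PL, RL, ZL, and NL, respectively. A different threshold would change the classification, so the cutoff should be chosen and reported by the instructor.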

A fourth condition to relax is that a learning score analysis does not necessarily require many students, many test items, or much time. The student sample size can be small, perhaps as few as 15 students. In fact, most principles of economics courses, and even intermediate economics courses, contain enough students to conduct such analyses. In addition, smaller samples of students make it easier for instructors to report results to a class of students or to provide specific feedback to individual students.⁹ Also, the number of test items for a learning score analysis can be as few as 10 or 15 depending on the test format (quiz or exam) and the time available for testing. Students often like regular feedback and a learning score analysis with a pretest and posttest can provide it.

Measurement Issues

Several measurement issues can affect the interpretation of learning scores. The first one is that learning scores do not explain why students selected the responses to pretest and posttest items. For example, motivational factors such as student effort or interest in the subject matter could affect pretest and posttest responses (Allgood et al., 2015). Student preconceptions may also cause confusion (Busom et al., 2017). More positively, responses could be influenced by instruction or assignments that improve student understanding (Balaban et al., 2016; Miller & Schmidt, 2021). Although learning scores only show the pattern of what students select on test items, they open new possibilities for research on the possible reasons for student outcomes (Emerson & English, 2016; Happ et al., 2016; Schmidt et al., 2020; Walstad & Wagner, 2016). Many reasons are worth investigating if data are available to probe the minds of students and discover the specific reasons for selecting their item responses.¹⁰

For instance, self-confidence in acquired economic knowledge and guessing are likely to influence how students answer pretest and posttest items. Students could feel less confident and guess more on the pretest than the posttest, feel more confident and guess more on the posttest than the pretest, or do a mixture of the two, making it especially difficult to evaluate their separate or combined influence on test scores. Although these concerns are valid, no widely accepted standard exists for detecting or correcting for guessing and self-confidence on MC tests. Any proposed method depends on the available test data, testing circumstances, and assumptions about student responses to items (e.g., Bush, 2015; de Ayala, 2022; Smith & Wagner, 2018). In economics courses, however, most instructors do not adjust student test scores for guessing or self-confidence as it is not practical to do so, and it will differ for each student. Thus, in practice the influence of guessing or self-confidence may be a minor cost that most instructors are willing to bear given the benefits of using MC tests for testing understanding of economic concepts.¹¹

Another measurement concern is test-retest or memory effects if the same MC items are administered as a pretest and posttest. Over a short period (one week), research on memory effects with university students finds mixed results (positive and negative) on student knowledge (Fazio et al., 2010; Roediger & Marsh, 2005), indicating that it may not be known what the exact effects will be. Over longer periods (many weeks or a semester), however, memory effects are less of an issue as students are not likely to remember test item content (Adams & Wieman, 2011; Happ et al., 2016; Schmidt et al., 2020).

None of the above measurement issues invalidates the use of learning scores. They also are not unique to learning scores as they may affect total test scores used by economics instructors for grades. The general point is that given course constraints and the practicalities of teaching, it may not be possible for an economics instructor to adjust test scores for students in an economics course to account for such measurement issues even if there were strong consensus on what should be done. This work on measurement issues is best left to test or psychometric specialists and other researchers. What learning scores do for economics instructors is provide them with timely information about the pattern of student responses to test items. Probing the reasons for those responses should stimulate conversations between students and instructors. Those conversations should give instructors the opportunity to provide helpful insights to students about their current achievement and what they still need to learn.¹²

Implications and Conclusion

Course testing in economics needs to be a dynamic and formative process that measures the flow of learning rather than a static event that only measures the stock of understanding. Learning scores make that goal achievable by giving economics instructors useful data about individual students or a student group to increase economic understanding and make teaching more efficient and satisfying. The results from the pretest cue students as to what they are expected to learn. The posttest results can be analyzed to identify the items on which students show learning and the items on which they need the most improvement. This test information can be used by students to better prepare for assessments and by instructors to better target instruction.

Learning scores make these desirable outcomes possible, or at least give instructors more practical tools from testing to achieve them. Learning scores enable economics instructors to know how much change in individual or group scores is positive learning (PL) (incorrect to correct), retained learning (RL) (correct to correct), negative learning (NL) (correct to incorrect), or zero learning (ZL) (incorrect to incorrect). Economics instructors can use this analysis to find out how much of a posttest score is PL or RL, and how much of the pretest score is RL or NL.
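The four categories can be computed directly from matched pretest and posttest responses. The following is a minimal sketch, assuming each student's answers are already scored as correct or incorrect on the same items; the function name `learning_scores` and the five-item data are hypothetical and serve only as illustration.

```python
from collections import Counter

# Map each (pretest correct, posttest correct) pair to a learning category.
CATEGORIES = {
    (False, True): "PL",   # incorrect -> correct (positive learning)
    (True, True): "RL",    # correct -> correct (retained learning)
    (True, False): "NL",   # correct -> incorrect (negative learning)
    (False, False): "ZL",  # incorrect -> incorrect (zero learning)
}


def learning_scores(pretest, posttest):
    """Return the share of items in each learning category for one student.

    pretest, posttest: equal-length lists of booleans (True = correct),
    ordered so that position i refers to the same item on both tests.
    """
    if len(pretest) != len(posttest):
        raise ValueError("pretest and posttest must cover the same items")
    if len(pretest) == 0:
        raise ValueError("at least one item is required")
    counts = Counter(
        CATEGORIES[(bool(before), bool(after))]
        for before, after in zip(pretest, posttest)
    )
    n_items = len(pretest)
    return {name: counts[name] / n_items for name in ("PL", "RL", "NL", "ZL")}


# Hypothetical responses of one student on a five-item test (assumed data).
pre = [True, False, False, True, False]
post = [True, True, False, False, True]

scores = learning_scores(pre, post)
print(scores)
# The posttest share correct equals PL + RL; the pretest share equals RL + NL.
print("posttest:", scores["PL"] + scores["RL"])
print("pretest:", scores["RL"] + scores["NL"])
```

Averaging these per-student shares across a class gives the group-level learning scores.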

Calculating and understanding learning scores makes it possible to extract insights about student understanding or achievement that would not be evident from total test scores. For example, consider the measurement paradox in which students with the same total test score correctly answer different sets of test items. If only the total test score is considered, students with the same total score are viewed as showing the same learning or achievement. Learning scores, however, show that students with the same total score are not the same because the total test score is the combination of RL and PL. This relationship means that some students will show more RL on the total test because they already knew the content at the pretest, while other students will show more PL because they improved their understanding of course content since the pretest.
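The accounting behind this paradox can be written out as a short sketch. The identities follow from the four category definitions when each learning score is expressed as a share of test items; the two students and their values on a 20-item test are hypothetical and not taken from the study data.

```latex
% Learning scores as shares of items, so PL + RL + NL + ZL = 1.
\begin{align*}
\text{Posttest} &= RL + PL \\
\text{Pretest}  &= RL + NL \\
\text{Posttest} - \text{Pretest} &= PL - NL
\end{align*}
% Hypothetical 20-item test: both students score 16/20 = 0.80 on the posttest.
\begin{align*}
\text{Student A: } & RL = 0.60,\; PL = 0.20 \;\Rightarrow\; 0.60 + 0.20 = 0.80 \\
\text{Student B: } & RL = 0.20,\; PL = 0.60 \;\Rightarrow\; 0.20 + 0.60 = 0.80
\end{align*}
```

Under these assumed values, Student A already knew most of the tested content at the pretest, whereas most of Student B's posttest score reflects new learning.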

Incorrect responses can also be studied from a learning score perspective. Students or test items with high NL can be studied to discover why correct answers were given on the pretest and incorrect answers on the posttest. Perhaps students guessed, misunderstood the concept, or faced a poor test item. In addition, the incorrect responses from pretest to posttest can be analyzed for what students consistently do not understand (ZL). Perhaps these items are too difficult for students, or an instructor needs to do a better job of teaching the economic concepts tested by these items.

Instructors can use course averages from learning scores on a posttest to improve future tests. The averages can show a higher or lower mix of RL and PL on a posttest, so instructors will have to decide what mix they want to see on that test in the future. A learning score analysis of the test items reveals the extent to which they measure PL and RL, and this information can be useful in selecting items to keep or delete. If the purpose of the test is to measure the increase in learning, then items with more PL would be desired. If the purpose is to assess retention over time, then more RL items would be desired.
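An item-level analysis of this kind can be sketched in a few lines. The example below is illustrative only: it assumes a small class with scored responses stored as rows of 0/1 values, and the function name `item_learning_scores`, the data, and the 0.25 NL cutoff used for flagging are all hypothetical choices rather than recommendations from the study.

```python
def item_learning_scores(pre_matrix, post_matrix):
    """Return per-item shares of PL, RL, NL, and ZL across students.

    pre_matrix, post_matrix: lists of rows, one row per student and one
    entry per item (truthy = correct), with students and items in the
    same order on both tests.
    """
    n_students = len(pre_matrix)
    if n_students == 0 or n_students != len(post_matrix):
        raise ValueError("both tests need the same non-empty set of students")
    n_items = len(pre_matrix[0])
    results = []
    for j in range(n_items):
        counts = {"PL": 0, "RL": 0, "NL": 0, "ZL": 0}
        for pre_row, post_row in zip(pre_matrix, post_matrix):
            before, after = bool(pre_row[j]), bool(post_row[j])
            if before and after:
                counts["RL"] += 1   # correct -> correct
            elif after:
                counts["PL"] += 1   # incorrect -> correct
            elif before:
                counts["NL"] += 1   # correct -> incorrect
            else:
                counts["ZL"] += 1   # incorrect -> incorrect
        results.append({k: v / n_students for k, v in counts.items()})
    return results


# Hypothetical class of four students on a three-item test (assumed data).
pre = [[1, 0, 1], [0, 0, 1], [1, 0, 0], [0, 0, 1]]
post = [[1, 1, 0], [1, 1, 0], [1, 0, 0], [1, 1, 1]]

items = item_learning_scores(pre, post)
for number, shares in enumerate(items, start=1):
    print(f"Item {number}: {shares}")

# Flag items whose NL share exceeds an assumed cutoff for closer review.
NL_CUTOFF = 0.25
flagged = [n for n, s in enumerate(items, start=1) if s["NL"] > NL_CUTOFF]
print("Items to review for negative learning:", flagged)
```

In this assumed data, the second item is dominated by PL and would suit a test meant to measure gains in learning, while the third item shows enough NL to merit a closer look at its wording or the related instruction.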

This study suggests that there needs to be more interaction between testing in economics courses and economics instruction. They should not be viewed as separate and discrete activities. Learning scores facilitate that interaction by giving economics instructors more insights about the learning progression of students and their learning outcomes.

Footnotes

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

William B. Walstad

Notes

Author Biographies

William B. Walstad is Professor of Economics, Emeritus, in the Department of Economics at the University of Nebraska-Lincoln. Among his research interests are testing and measurement in economics and personal finance and the analysis of outcomes from economic education and financial education.

Olga Zlatkin-Troitschanskaia is Professor and Chair of Business and Economics Education at Johannes Gutenberg University in Mainz, Germany. Her research interests include business and economics education, educational measurement, assessment of competencies in higher education, international comparative studies in higher education, and critical online reasoning.

References

Adams, W. K., & Wieman, C. E. (2011). Development and validation of instruments to measure learning of expert-like thinking. International Journal of Science Education, 33(9), 1289–1312. https://doi.org/10.1080/09500693.2010.512369

Allgood, Walstad, W. B., & Siegfried, J. J. (2015). Research on teaching economics to undergraduates. Journal of Economic Literature, 53(2), 285–325. https://doi.org/10.1257/jel.53.2.285

Alonzo, A. C., & Gotwals, A. W. (2012). Learning progression in science: Current challenges and future directions. Sense Publishers. https://doi.org/10.1007/978-94-6091-824-7

Asarta, C. J., Chambers, R. G., & Harter (2021). Teaching methods in undergraduate introductory economics courses: Results from a sixth national quinquennial survey. American Economist, 66(1), 18–28. https://doi.org/10.1177/0569434520974658

Balaban, R. A., Gilleskie, D. B., & Tran (2016). A quantitative evaluation of the flipped classroom in a large lecture principles of economics course. The Journal of Economic Education, 47(4), 269–287. https://doi.org/10.1080/00220485.2016.1213679

Brückner, Schneider, Zlatkin-Troitschanskaia, & Drachsler (2020a). Epistemic network analyses of economics students’ graph understanding: An eye-tracking study. Sensors, 20(23), 6908. https://doi.org/10.3390/s20236908

Brückner, Zlatkin-Troitschanskaia, Küchemann, Klein, & Kuhn (2020b). Changes in students’ understanding of and visual attention on digitally represented graphs across two domains in higher education: A post-replication study. Frontiers in Psychology, 11, 2090. https://doi.org/10.3389/fpsyg.2020.02090

Bush (2015). Reducing the need for guesswork in multiple choice tests. Assessment and Evaluation in Higher Education, 42(2), 218–231. https://doi.org/10.1080/02602938.2014.902192

Busom, Lopez-Mayan, & Panadés (2017). Students’ persistent preconceptions and learning economic principles. Journal of Economic Education, 48(2), 74–92. https://doi.org/10.1080/00220485.2017.1285735

Daro, Mosher, F. A., & Corcoran (2011). Learning trajectories in mathematics: A foundation for standards, curriculum, assessment, and instruction. Consortium for Policy Research in Education. https://doi.org/10.12698/cpre.2011.rr68

de Ayala, R. J. (2022). The theory and practice of item response theory (2nd ed.). Guilford Press. https://www.guilford.com/books/The-Theory-and-Practice-of-Item-Response-Theory/R-de-Ayala/9781462547753

Duncan, R. G., & Hmelo-Silver, C. E. (2009). Learning progressions: Aligning curriculum, instruction, and assessment. Journal of Research in Science Teaching, 46(6), 606–609. https://doi.org/10.1002/tea.20316

Emerson, & English, L. K. (2016). Classroom experiments: Teaching specific topics or promoting the economic way of thinking? Journal of Economic Education, 47(4), 288–299. https://doi.org/10.1080/00220485.2016.1213684

Fazio, L. K., Agarwal, P. K., Marsh, E. J., & Roediger, H. L. (2010). Memorial consequences of multiple-choice testing on immediate and delayed tests. Memory & Cognition, 38(4), 407–418. https://doi.org/10.3758/mc.38.4.407

Happ, Zlatkin-Troitschanskaia, & Schmidt (2016). An analysis of economic learning among undergraduates in introductory economics courses in Germany. Journal of Economic Education, 47(4), 300–310. https://doi.org/10.1080/00220485.2016.1213686

Lasry, Guillemette, & Mazur (2014). Two steps forward, one step back. Nature Physics, 10(6), 402–403. https://doi.org/10.1038/nphys2988

Miller, L. A., & Schmidt, J. R. (2021). The effects of online assignments and weekly deadlines on student outcomes in a macroeconomics course. American Economist, 66(1), 46–60. https://doi.org/10.1177/0569434520968250

Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(5), 1155–1159. https://doi.org/10.1037/0278-7393.31.5.1155

Saunders, & Powers, J. R. (1995). Reallocating content coverage in principles of microeconomics to increase student learning. American Economic Review, 85(2), 339–342. https://www.jstor.org/stable/2117944

Schmidt, Zlatkin-Troitschanskaia, & Walstad, W. B. (2020). IRT modeling of decomposed student learning patterns in higher education economics. In Zlatkin-Troitschanskaia (Ed.), Frontiers and advances in Positive Learning in the Age of Information (PLATO) (pp. 237–251). Springer. https://doi.org/10.1007/978-3-030-26578-6_17

Siegfried, J. J., & Fels (1979). Research on teaching college economics: A survey. Journal of Economic Literature, 17(3), 923–969.

Smith, B. O., & Wagner (2018). Adjusting for guessing and applying a statistical test to the disaggregation of value-added learning scores. Journal of Economic Education, 49(4), 307–323. https://doi.org/10.1080/00220485.2018.1500959

Walstad, W. B., & Miller, L. A. (2016). What’s in a grade? Grading policies and practices in principles of economics. Journal of Economic Education, 47(4), 338–350. https://doi.org/10.1080/00220485.2016.1213683

Walstad, W. B., Rebeck, & Butters, R. B. (2013). The test of economic literacy: Development and results. Journal of Economic Education, 44(3), 298–309. https://doi.org/10.1080/00220485.2013.795462

Walstad, W. B., & Wagner (2016). The disaggregation of value-added test scores to assess learning outcomes in economics courses. Journal of Economic Education, 47(2), 121–131. https://doi.org/10.1080/00220485.2016.1146104

Watts, & Schaur (2011). Teaching and assessment methods in undergraduate economics: A fourth national quinquennial survey. Journal of Economic Education, 42(3), 294–309. https://doi.org/10.1080/00220485.2011.581956

Learning Scores and Economics Instruction
