Introduction
The articles in this issue are largely the result of the first Education Frontiers Forum held on the campus of East China Normal University in October 2018. The theme of the forum was Large-Scale Assessment (LSA): Problems and Prospects.
The purpose of the forum was not to discuss the technical issues of LSAs or assessment. Instead, it was to debate and discuss the use (or misuse) of LSAs, the role of LSAs in education, and the capacity and the limitations of LSAs to assess what matters in education at a philosophical level. In other words, we were more interested in discussing what LSA cannot do than what it can. We were also more concerned about the negative consequences of LSAs in education than their positive impact.
The impetus for the forum’s focus on LSAs came from the realization that education around the globe has come to a critical moment and that more LSAs are likely to be used to drive educational changes. Education systems around the world are all facing the challenges brought about by technological changes. There has been a growing consensus that education needs to change in order to cultivate human capacities fitting for the future (Barber, Donnelly, & Rizvi, 2012; Schleicher, 2018; Schwab, 2015; Trilling & Fadel, 2009; Wagner, 2008, 2012; World Economic Forum, 2016; Zhao, 2012). LSAs have been enlisted as a powerful tool to drive the change. In the U.S., for example, the No Child Left Behind Act used LSAs as a primary policy tool in its national attempt to hold schools and educators accountable for closing the so-called achievement gap (Hess, 2011; No Child Left Behind Act of 2001, 2002; Zhao, 2009). The Programme for International Student Assessment (PISA) was developed by the Organisation for Economic Co-operation and Development with the purpose of provoking and driving education reforms around the world (Baird et al., 2016; Schleicher, 2018).
Similarly, China has engaged in multiple waves of efforts to reform its National College Entrance Exam, the Gaokao, over the past few decades in order to induce changes in its education system, including reducing academic pressure and cultivating diverse and creative talents (Zhao, 2014; Zhao & Wang, 2018). Likewise, South Korea has made significant changes to the use of LSAs for college admissions for the purpose of making its education less stressful and more supportive of the talents needed in the 21st century (Zhao, 2015; Zhao & Wang, 2018). The College Board, the owner of one of the most influential LSAs for college admissions, the SAT, has also introduced changes to the test (Landy, 2014).
There are more LSAs on the horizon. PISA has plans to expand its enterprise to include assessments for 5-year-olds, known as the “baby PISA” (Pence, 2016), and assessments of creativity, collaboration, and global competence. Different countries have also expressed increasing interest in developing assessments for the so-called 21st-century skills or soft skills such as entrepreneurial thinking, creativity, collaboration, growth mindset, and so on (Duckworth & Yeager, 2015; Zhao, 2016a). It is thus important to examine the potential negative impacts of LSAs.
Presentations and discussions at the forum as well as our literature review of LSAs in preparation for the forum resulted in the articles included in this issue. While opinions diverge, the consensus is that LSAs are extremely powerful in education. And that tremendous power should be kept in mind and respected in decisions about the uses and development of LSAs in the future.
The power of LSAs
LSAs wield tremendous power in education. LSAs used for international comparisons such as PISA affect educational policies and practices, fuel furor in the media, pressure politicians and education leaders, and stir up emotions in the public globally (Baird et al., 2016; Bieber & Martens, 2011; Pereyra, Kotthoff, & Cowen, 2011; Schleicher, 2018; Zhao, 2016b). LSAs used for accountability such as state accountability assessments in the U.S. command the attention of the public and media, dictate curriculum and pedagogy, change the lives of educators (including sending some to prison), and affect the fate of schools (Jennings & Bearak, 2014; McMurrer, 2007; Nichols & Berliner, 2007; Reback, Rockoff, & Schwartz, 2011; Wong, Wing, & Martin, 2016). LSAs used for selection such as the National College Entrance Exam in China determine the fate of millions of youth each year, direct teaching and learning in and outside schools, and define the culture of education (Cheng, 2011; Choi & Park, 2013; Ho, 2003; Zhao, 2014).
The power of LSAs derives from a number of sources. First, as the late Harvard University biologist and historian of science Stephen Jay Gould noted, there is a human tendency “to convert abstract concepts into entities” (Gould, 1996, p. 56). That is, human beings desire to translate abstract ideas into concrete entities. Assessments are essentially a way to convert abstract ideas such as mental abilities and creativity into something tangible. Once the conversion is complete, the “tangible thing” is taken for the abstract concept itself. Hence, an IQ score is perceived as the state of one’s mental ability.
Second, as also pointed out by Gould, humans have a propensity for “ordering complex variation as a gradual ascending scale” (Gould, 1996, p. 56). We love ranking. We cannot resist the urge to sort people, to know who is better and who is worse, and to award different fates to people based on their relative positions on a ranking spectrum. To rank, we need a tool to reduce the complex variations into a single number. And again, as soon as this number is arrived at, it becomes accepted as a valid indicator of whatever it is said to represent. People are then told to accept their ranking.
Third, in addition to and because of these two natural human fallacies, LSAs carry significant material consequences. They are used to define and judge human capacities for future success. The judgment is then used to award resources and opportunities such as enrollment in elite colleges, which is presumed to lead to better jobs and lives. Thus, students work hard to score well on college admissions tests.
LSAs are also used to judge the degree to which educators and schools help people acquire the assessed capacities for future success. Such judgment is the basis for rewarding or punishing educators with reputation, money, and job conditions. As a result, educators are pressured to do whatever it takes to help their students do well on LSAs. Such judgment is also passed on to politicians and government officials, who are held responsible for the quality of entire education systems. Thus, politicians and government officials are under the control of LSAs.
The impact of LSAs on society
The psychological acceptance of LSAs as valid measures of human capacities and the material and political consequences society has attached to LSAs make LSAs an extremely powerful tool to shape human societies in two significant ways. First, LSAs have a significant impact on the educational experiences of children because they guide the spending of educational resources in a society. They command students, teachers, parents, and politicians to focus their energy, time, and money on what is tested. As a result, curriculum is narrowed to what is tested. Teaching and learning are fixated on what is tested. What is not tested is excluded. The students in schools today are the citizens and leaders of societies tomorrow. So what they experience in schools today ultimately affects what kind of citizens human society will have in the future.
Second, LSAs decide what talents are cultivated and what talents are suppressed or left to wither on their own. Human beings are different as a result of the process of nature via nurture (Ridley, 2003). They have a jagged profile of abilities (Rose, 2016), with strengths and weaknesses in different domains. While the whole spectrum of human talents is valuable and social prosperity relies on diverse talents (Chua, 2007; Pink, 2006; Zhao, 2018, 2019), LSAs can only measure a limited number of abilities. As a result, individuals who happen to have strengths in the areas LSAs measure receive more resources and opportunities and become more valuable in society, while those with talents that are not measured are deemed failures and consequently suppressed or ignored. Children are assessed in various ways using LSAs and the results often influence, directly or indirectly, what educational opportunities they will receive. For example, some children may be placed in extended remedial classes to improve their reading skills because an LSA shows that they are not as proficient as others. As a result, these children lose time and opportunity to experience other possibilities such as math, art, music, or sports, which means they may never be able to discover and develop their talents and passions. Furthermore, some children are placed in well-resourced schools and others excluded on the basis of their test scores, which means that the fate of some children is sealed from a very young age.
Consequently, LSAs determine the composition of the citizenry of a society in terms of talents. For instance, the Chinese imperial exam, one of the oldest LSAs, resulted in a society dominated by scholar-officials who were literary experts rooted in the Confucian tradition. Individuals talented in and interested in other areas such as technology and science were largely ignored (Zhao, 2014). This had serious consequences, one of which is that the empire lacked the talent pool to initiate the Industrial Revolution despite the fact that it had the economic conditions almost 200 years earlier than Britain (Lin, 2006).
In summary, LSAs have the power to significantly affect individuals. They influence the opportunities individuals receive and the experiences they can have in education. Consequently, they play a considerable role in determining the fate of individuals by allocating different opportunities to different individuals according to their performance on various LSAs. Moreover, LSAs have the power to significantly affect human societies by affecting the makeup of talents in a given society.
Respect the power of LSAs
With great power comes great responsibility. Power can be equally constructive and destructive. LSAs, like any powerful agent, can be extremely effective in bringing about desirable outcomes while also doing great damage. Thus, the great power LSAs wield over education should be respected and used responsibly.
To use LSAs responsibly, we need to keep in mind what LSAs cannot measure. The great power of LSAs gives them an oversized influence on education and thus what they measure becomes what matters to policymakers, educators, students, parents, and the general public. But there are many things that LSAs do not, and cannot, measure. What they do not and cannot measure is often neglected, if not outright suppressed or rejected. But it is possible that what is not measured is more important for societies and individuals.
First, LSAs cannot measure what they are not intended to measure. This seemingly obvious fact is very often overlooked in education. Although LSAs, even the most valid LSAs, only measure what they are designed to measure, their results are often overgeneralized beyond what they measure. For example, the results of PISA have been generalized to reflect the quality of educational systems although they simply reflect how well 15-year-olds in different systems perform on the PISA assessments in math, reading, and science. Results of college admissions exams such as China’s Gaokao and the SAT in the U.S. are interpreted as an indication of one’s readiness for college, although college readiness includes a lot more than test scores (Conley & McGaughy, 2012). As a result, they have been found to be unreliable predictors of success in college (Burton & Ramist, 2001; National Association for College Admission Counseling, 2008).
Second, LSAs cannot measure the unknown. LSAs can only measure what is known. Thus, it is impossible to design tests for unknown constructs. For example, IQ was not measured before the construct of IQ was invented. Creativity was not measured before there was the idea of creativity. Global competency was not measured until the concept was developed (Zhao, 2016a). There are certainly constructs that have not been discovered or developed in education and these unknown constructs may be more important than the known ones. Research has shown that what has been measured cannot accurately predict the success of individuals and societies (Baker, 2007; Levin, 2012; Tienken, 2008; Zhao, 2009, 2018). There must be some other important human qualities at work (Brunello & Schlotter, 2010; Duckworth & Yeager, 2015).
Third, LSAs cannot measure exceptionality. The Danish Nobel Prize-winning physicist Niels Bohr pointed out the nature of measurement: the entity measured cannot be divorced from the measuring instrument. In other words, “mathematical ability, indeed any ability, is not an intrinsic property of the individual; rather, it’s a joint property of the individual and the measuring instrument” (Morrison, 2013). As such, LSAs, even within the domain they are intended to measure, can only measure abilities within the designed limits and thus cannot measure abilities that fall outside. For example, a mathematician who takes a third-grade math test can get a perfect score, but her mathematical ability is certainly beyond that of a third grader who achieves the same perfect score. If Albert Einstein took a high school physics test, the best he could do is a perfect score, but we cannot say that his ability in physics is the same as that of a high school student who also scored perfectly.
Fourth, LSAs cannot measure dynamic, fluid, ill-defined, and context-dependent constructs. LSAs may be able to assess abilities that are relatively stable and well-defined, but many constructs of human abilities are ill-defined, fluid, dynamic, and context-dependent. For example, creativity, critical thinking, communication, global competency, entrepreneurship, and many other so-called 21st-century skills are all ill-defined in that they all have different definitions. They are also very fluid and dynamic in that they change a lot depending on the individual and contexts; one can be creative in one area but not in others, for example. They are also often context-dependent and culturally bound. For instance, what is considered creative in one context may not be in another, and what are viewed as effective communication skills in some cultures may be considered ineffective or even counterproductive in others.
Fifth, LSAs cannot measure the uniqueness of individuals. LSAs are a common measure of some constructs (abilities) applied to a group of individuals. They can only judge some aspects of an individual’s capabilities, but cannot assess all aspects of what a person can do, no matter how many LSAs we develop. Moreover, an individual’s capability is a unique combination of abilities, personalities, and interests. In other words, the capability is more than a simple addition of different components: a person’s capability for college success, for instance, is not a simple addition of that person’s abilities in math, language, science, and creativity and his or her personality. Every individual has a jagged profile of abilities (Rose, 2016).
In summary, what LSAs can measure is limited to a narrow spectrum of what matters to the success of individuals and societies. As such, LSAs are incapable of measuring the broad spectrum of valuable education outcomes and the unique combination of talents, knowledge, skills, and personal qualities of individual human beings. But because of the power of LSAs, what they measure often becomes all that matters in education, thus misguiding policies and practices in education, resulting in a host of damaging side effects discussed in an article in this issue.
LSAs are not going away. What we need to do is to remember the saying: what can be counted may not count and what counts may not be counted.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
