Sage Journals: Discover world-class research

Abstract

This piece looks at indicators and data models in countries outside China to see if such approaches may contribute to the accountability and quality drives in higher education in China. It notes the policy movement in China from rapid student number expansion to more emphasis on quality and social purpose as desirable characteristics for its universities and colleges. The article remarks that establishing legibility to the center to enable effective but lighter-touch surveillance of higher education institutions is difficult in large mass sectors without a major commitment of resources and bureaucratic intervention which may not be desired by all stakeholders. Rather, it is suggested that utilizing good data and indicators may be one means of overcoming the difficulties in balancing central public control with the increased autonomy of universities in China.

The article distinguishes between static data and dynamic data and goes on to consider the relevance of student posts on social media as an accurate guide to the student experience, a key component of the many institutional attempts in China to capture this dimension of quality.

Overall, the chapter debates the extent to which an over-reliance on indicators and data may decontextualize the rich experiences and nuances found in the learning and teaching processes. There is the danger of over-simplification and the “exclusion of narrative” necessary for a full understanding of the knowledge process. Rather, the author supports a ‘variable geometry’ of approach to quality assessment and other forms of higher education accountability. This would seek to utilize both external data and indicators alongside a peer review methodology in which human opinion and assessment remains important.

Keywords

indicators data quality accountability evaluation media

1 Introduction: China and the Growth of a Mass Higher Education Sector

Higher education in China has grown at an extraordinary pace in recent decades. Between 1998 and 2015 the China Ministry of Education stated that total enrollment of college students rose from 3.4 million to 37 million, and the gross enrollment ratio reached 40 per cent (China Ministry of Education, 2017). Yet, as Jinghuan Shi et al. note, a focus on quantity is rapidly giving way to one based on quality and accountability, not least as a restructuring of the economy is leading to a slower growth rate: “the emphasis of Chinese higher education development has been shifting in recent years from quantitative expansion to quality improvement” (2018, p. 371). The external evaluation system under construction at the national level is being complemented by a number of internal quality assurance mechanisms within individual institutions. Parents and taxpayers, and government and students, are asking for increased accountability from institutions to provide assurance that value for money is being achieved. There is a special focus on student surveys and participation as methods for indicating and measuring both levels of quality but also quality improvement. “Governing the quality has become a new challenge in China’s higher education and answering the questions requires multi-dimensional thinking” (ibid., p. 373). Perhaps just as importantly, “most of the projects, if not all, allow for domestic and international comparisons by employing and constructing comparable benchmarks, indicators and test scores” (ibid., p. 378).

As a result the very notion of quality is expanding, with an original emphasis on globally indexed scientific papers (and world-class research rankings) now being accompanied by indices of how universities and colleges may contribute more intensively to communities, and general prosperity and wellbeing, including through innovation. “Rankings based on employability produced by third parties are helping to open up the black box of Chinese higher learning” (p. 375), and this development helps constitute a focus on outcomes, particularly graduates’ labor market performance. As higher education institutions have gained increased autonomy from direct control and supervision from the state, the task for central decision-makers is how to hold so many institutions to account without expending large amounts of additional resources and resorting to high levels of bureaucratic supervision and intrusion. Legibility at the center through the cultivation of data and the construction of appropriate indicators of quality is one means by which this may be achieved. What do countries outside China have to offer that may be worth exploring?

2 Emerging Technologies in the Practices of Evaluation

An emerging technology in the practice of evaluation, and governance more generally, in higher education is the use of numbers and indicators. Such has been its recent rise that it competes with a more traditional and established methodology of evaluation, that associated with peer review or assessment by colleagues sharing similar higher education “spaces.” Numbers and metrics, however, tend to be used to apply assessment from outside the academy, normally governmentally-endorsed, and purporting to be acting more on behalf of students and the wider public interest. Research assessment exercises are one such increasing example in recent years, but in the UK we have also seen the development of a student outcomes or teaching excellence framework. These methodologies challenge the essential self-governance of traditional forms of evaluation in universities and colleges.

Various aggregations of indicators, such as indexes, rankings, and composites are included in the broad definition. Often such indicators are ‘mashed together’ compilations drawn from different data sources (Davis et al., 2012). The publishers of such indicators (say, of research excellence, or social diversity) often claim that they are more “objective” than peer review, taking out the errors of subjectivity, and relying more on hard information and data. Yet, the originators of number-driven indicators and evaluations often have to make choices of which particular indicators to use, the levels of weightings and similar algorithms to employ, devise methods to overcome double-counting of essentially the same thing, and find ways to overcome data unavailability. Incentivizing data supply may involve offering significant rewards to organizations. In China, the Shanghai Stock Exchange encourages companies to file annual Corporate Social Responsibility (CSR) reports and to show the development of CSR strategies by allocating priority election into the prestigious Shanghai Corporate Governance Sector (Sarfaty, 2015, p. 113).

Consequently, subjectivity is not entirely overcome through the use of data for evaluative purposes and, in the case of algorithms, may be well-hidden, and not really transparent and open to accountability. The challenge is for institutions to engage constructively with data-base assessors and to bring the same levels of scrutiny and, where necessary, contestation that applies generally in peer review. This is not always easy and most data and its modelling for evaluation is drawn from institutions themselves, mainly because it can be very expensive and time-consuming to start a data collection system from scratch. This means that there is always the danger of institutions “massaging” their data in order to achieve higher evaluations, and this “gaming,” it is argued, is enhanced if too much of the evaluation technologies, especially around weightings and algorithms, are made transparent to institutions.

3 Advantages and Disadvantages of Evaluating by Numbers

Generally the increased use of data and indicators for evaluation aims at increased simplification, not least for intending students. Although policies in the UK, for example, aim to make students into “informed consumers” it is not clear in many commercial sectors, never mind higher education, how much information and choice consumers really want. Data overload may present real problems and consequently consumers, including students, like reliable guides and a clutch of a few but trusted markers of quality and choice. As a result, students place greater confidence in rankings as a form of simplification than do the professional insiders, such as academics, who are more concerned with nuance, context and narrative. Students are more likely to believe in simplified metrics than are academics.

Indicators, such as teaching excellence or research assessment outcomes, respond to demands for (and receptivity to) numerical, rank-ordered and comparable data—that is, to the desire to make visible or legible that which is difficult to know publicly. Such indicators seek to turn an activity (such as classroom teaching) and its outcomes that are rather secretive and invisible, into something more transparent, accountable, and capable of being evaluated. Evaluators use a simplification methodology for compiling indicators, such as the means for aggregating data from many sources. This means excluding data not deemed to be reliable or representative. There should be therefore a set of justifications as to why the organization and simplification of the data really does denote the essence of the phenomenon under evaluation, such as teaching excellence. Although considerable controversy often attends these claims to simplified representation contained in indicators, it is the very reductionism that underlies their appeal to consumers and policymakers. They are intended to be convenient and easy to understand.

Numerical data appears to possess technical neutrality and objectivity, in comparison with the more complicated and nuanced accounts provided by those working in universities and colleges. Davis et al. (2012, p. 8) note

They are often numerical representations of complex phenome intended to render these simple and more comparable with other complex phenomena, which have also been represented numerically. As such, ‘indicators can conceal nuances and restrict contestation by displacing subjective decision-making with the appearance of hard data’ (Fischer 2012, p. 217). Of course, controversies over the use of data-driven indicators of assessment often focus on the ‘exclusion of narrative’ (Espeland 2015), or the stripping away of context and meaning for the numeric representations.

In a sense, the stark comparisons of indicators, excluding ambiguity, but often using flawed data or data collected for other purposes, “absorbs uncertainty” in the sense that underlying nuances and contestations are swept under the carpet or remain largely invisible. Consequently, it is not surprising that academics in universities and colleges remain suspicious of entirely data-driven evaluation methodologies (although this varies by discipline, scientists and engineers, for example, being rather more accepting than sociologists or those in arts subjects).

The thinning practices associated with indicators and their underlying data, in a sense, reflect similar patterns within the authority structures of the organizations tasked with collecting and submitting data on themselves that is used by external compilers to evaluate their performances. That is, information is initially gathered on the “front line” and is then progressively moved upward and is increasingly subject to processes of data amenability and parsing, so that by the time data reaches the executive suite, it has become fit for external consumption and authorized for submission to external data-gatherers. This process of upward editing removes nuance, ambiguity and incompatibilities and data appears to be far more robust than it actually is. Data moves upwards along the pathways of hierarchically arranged authority.

In some cases, such as those seeking to represent risk within an organization, for example, representations such as risk registers, risk maps, and similar artefacts are one-step further removed from reality. As a contingent or future possibility of an adverse outcome, risk requires artefacts such as risk registers to make it real and subject to managerial intermediation. Often the aim is to secure auditability rather than risk management as such, but data trails aid a process of at least formal post-hoc accountability.

Indicators are not simply neutral or technical objects, however, but also contain statements or models of standards against which performance or conduct is measured. In conceptualizing an indicator initially, an underlying explanation for change is assumed. This involves a theory, the construction of categories for measurement, and modes of data analysis. That is, the indicators are based on premises on what is good and what is bad, and how this is measured. The indicator effectively defines standards by the assemblage of specific criteria and measurements.

The indicators also allow the governance of diverse and diffuse entities to be exercised without costly and burdensome interventions and evaluations more directly and continuously by armies of inspectors. They allow “governing at a distance” by generating legibility to the center provided by numeric and similar often-simplified representations. In a globalizing and increasingly populous and diversifying world regulating many units (often splinters of larger agglomerates) is an increasingly difficult task for evaluators. Power and authority tends to gravitate to a world of devolved or delegated authority but within larger and more transparent systems of accountability made possible by data and similar indicators. Atomism is overcome as large networks of more independent agents produce the data necessary for their monitoring by external evaluators.

As a result, certainly in the UK, data-driven methodologies tend to exist side by side with peer assessments or narratives. The Research Assessment exercises, for example, although moving towards increased use of data in recent years, such as measuring the social impact of research, continues to use peer review as a key component of its methodology. The recently-introduced Teaching Excellence and Students Outcomes Framework in the UK also uses a combination of data assessment and expert peer evaluation drawing on institutional ‘narratives’ in the respective submissions (King, 2018). Moreover, there is evidence that the increased data made available in these exercises can produce an overload of numbers for assessors who tend to fall back more on peer review of narrative submissions. Essentially, we are finding what we might call a “variable geometry” at work where data approaches are combined with peer judgements but in varying degrees according to subjects and the amount of data collected.

Nonetheless, the use of metrics and similar numbers in evaluation and governance is not unproblematic and many call for ethical considerations to be given greater weight in collection and application of data. In the UK, for example, the Wilsdon Report (2015) argues for the use of “responsible metrics,” which is understood in the following dimensions:

Robustness, basing metrics on the best possible data in terms of accuracy and scope

Humility, recognising that quantitative evaluation should support, but not supplant, qualitative, expert assessment

Transparency, keeping data collection and analytical processes open and transparent, so that those being evaluated can test and verify the results

Diversity, accounting for variation by field, and using a range of indicators to reflect and support a plurality

Reflexivity, recognising and anticipating the systemic and potential effects of indicators, and updating them in response

A further problem in recent years in both higher education and other sectors is that regulators and government bodies impose quite heavy data collection demands on institutions. To what extent such information gathering is necessary is often unclear. However, uncoordinated requests, often for the same data, can create a large burden for university administrations. The best regulators, particularly in government and the commercial sector, increasingly have introduced forms of “data governance” within their organizations to counter such trends. The aim is that officials must secure appropriate authorization from a data governance committee that ensures that the information request is necessary and does not replicate previous or similar requests.

4 Key Data

In the practice of managing universities, three types of data have significance. First, static data, which refers to the traditional storage of finance, administration and other records. Second, dynamic data, which is associated particularly with student learning and the digital classroom where the “fingerprints” of learning and teaching are associated (such as how often a student goes to the library, which parts of the curriculum do students have particular problems with and what they are, and so on). Thirdly, and increasingly, social media is an increasing preoccupation of universities as universities and colleges seek to get a grip on what their students are saying about their student experiences. Posts online to university websites, such as Facebook and Twitter, provide the opportunity to directly harvest data on the student experience. In China, surveys focused on the student experience and increasingly also on student outcomes, have become a marked feature of internal universities’ quality assurance systems (Jinghuan Shi, Yan Luo, Wen Wen and Fei Guo 2018). Below we look more closely at the potential of social media to provide instruments of quality control and improvement.

The distinction between static and dynamic data is becoming reflected in indicator systems, such as for performance measurement and the assessment of quality. Alexander C. McCormick and Jillian Kintze (2018), for example, distinguish between quality as state and quality as process. In their view a comprehensive system of indicators for evaluating the quality of undergraduate education should consider two dimensions of quality: the static dimension, which examines performance at a moment in time, and the dynamic dimension, which takes into account the imperative for higher education institutions to use performance information to guide improvement initiatives’ (p. 41).

Thus, most performance-indicator systems are based on proxy measures of performance and tend to ignore a more dynamic performance-improvement system. Instead they argue for a focus on signals and artifacts as measures of evidence-informed improvement—“verifiable products of a range of performative acts that signify a genuine commitment to evidence-informed improvement: communicating evidence, action plans, implementation, and loop closing” (p. 42).

5 Social Media

5.1 Overview

One method for overcoming both data overload on institutions, and the potential “gaming” of data by universities and colleges to present themselves in the best light, is to obtain the data in another way, one which is more direct and lacks the intermediary interventions by institutions on student experience assessment, for example, that has been found in some jurisdictions. One method is to use social media data. A recent research project at the London School of Economics and Political Science (LSE) for the UK Quality Assurance Agency (QAA) by Griffiths, King and Leaver (QAA 2018), underlines the potential for such approaches.

Over the last two decades, the UK Government policy for higher education in England has focused on introducing more market-like characteristics. This places students at the heart of a consumer-led system, where the exercise of better-informed student choice of universities is intended to drive up competition, quality, and learning innovation. An important requirement for the new regulatory system is that data on the student experience (viewed as a proxy for “quality”) should be timely, robust, low-burden and representative of student views and experiences.

One alternative source of information that may overcome many of the longstanding issues with regulatory data is the unsolicited feedback of service users. Such data can be gathered in near real time, reports at various stages of the student experience (rather than simply at the end of the final year of an undergraduate course, as with the UK National Student Survey), and bypasses institutions’ administrative systems.

The research by Griffiths et al. explores whether the ever-growing volume of online student feedback can provide insight into the quality of higher education provision. Over 200,000 reviews of their respective institutions by students posted online were examined and a collective-judgement score comprising a time-limited moving average of the review scores for each provider was derived. This collective-judgement score had a positive correlation with Annual Provider Review (APR) outcomes, Teaching Excellence Framework (TEF) scores, and the overall satisfaction scores from the National Student Survey (NSS).

For example, on the day 60 days prior to the announcement of the TEF rankings for institutions in 2017, the average collective-judgement score of providers subsequently judged to be “bronze” (lower tier) was lower than that for providers subsequently judged to be “silver,” which was in turn lower than that of providers subsequently judged to be “gold.”

Similarly, the average collective-judgement score on the final data submission date for the APR was lower for those providers that required an action plan—demonstrating the need for improvement—than those that did not.

Finally, there was a positive (albeit weak) correlation between a provider’s collective-judgement score on the day the NSS survey opens, and their overall satisfaction score.

There is a significant volume of student feedback online. For this research, over 210,000 reviews were gathered for the 165 higher education institutions, 211 further education colleges and twelve alternative (private) providers considered in scope. These reviews came from three sources: Facebook; whatuni.com ; and StudentCrowd.com . This may, however, be only a fraction of the data available. Over the course of twelve months over 2.6 million tweets mentioned providers’ main twitter accounts. It is estimated that around 2 per cent of these relate to the student experience, which translate into a further 90,000 tweets a year, even before departments, career services, student unions and other accounts are considered.

Some data sources require more work than others, however, and significant investment in resources required to identify and score relevant tweets amongst the millions available. However, twitter data will be included in future research. Nonetheless, in the UK, the significant volume of reviews available online is a relatively new phenomenon but has grown substantially since 2012.

We should note that we look at within-year student satisfaction, the scores are most positive at the start and at the end of the academic year, but with a small lift halfway through the year. We attribute the higher scores at each end of the academic year to student excitement and relief, and the associated optimism and biases, at starting and finishing an academic year.

5.2 Student Reviews

What uni.com had the greatest number of reviews with over 121,000. The majority of reviews detail not only the provider but also courses and lecturers.

Only 13 of the 378 providers considered in the research did not have a Facebook page. Of the 365 that did, 231 had the “Reviews” function enabled giving people the chance to rate them and optimally leave a comment. Further to provider-level pages, the researchers systematically searched for pages relating to departments, schools, institutes, faculties, student unions and career services at each provider. While provider-level reviews accounted for almost 54,000 of the 73,424 reviews on Facebook, just fewer than 20,000 reviews were identified at the sub-provider level.

Unlike the whatuni.com and StudentCrowd.com data, the Facebook data suffered from what appeared to be irrelevant comments that the researchers attempted to clean from the data. These reviews were accompanied by text from people who: were looking for accommodation or flat mates, had never visited the provider but were looking for advice, were promoting their businesses, or were advertising jobs. Nonetheless, the majority of reviews on Facebook appeared to be from current or recent students, although there were also reviews from parents having taken their children to open days, job applicants, former staff and alumni from decades ago. With an average review score of 4.33 (maximum of 5.00), Facebook had the most positive reviews of the three data sources.

StudentCrowd.com reviews cover providers, specific courses, and student accommodation, and offer more in-depth ratings on such topics as campus facilities and the quality of WiFi and the Internet. With an average overall review score of 4.08, these reviews were more negative than those in the other two, although still overwhelmingly positive.

5.3 Results

The overall finding of this research is that providers’ collective judgement scores are positively associated with: Annual Provider Review (APR) outcomes, an exercise by the higher education regulator (Office for Students) which considers existing data that is used in a “contextualized and rounded way” with judgements on quality and standards as reached through peer review; the Teaching Excellence and Student Outcomes Framework (TEF) which judges against submitted institutional data and contextual narratives whether teaching quality has demonstrated “gold,” “silver” or “bronze” rankings; and a National Student Survey (NSS) primarily aimed at final year undergraduate student and covers a number of aspects of the student experience, including “overall satisfaction.”

Whilst these positive correlations between providers’ collective-judgement scores is found for each of these data sources, the collective-judgement score created by combining all the reviews proves an even more effective predictor than the individual data sources when it comes to APR and TEF outcomes, although this was not the case with NSS scores. Thus, a poor collective-judgement score does not guarantee a provider will perform badly on other quality measures. However, a provider with a poor collective-judgement score has greater probability of performing poorly on other quality measures. Likewise, a provider with a good collective-judgement score is not guaranteed to be performing well on other quality measures, but has a greater likelihood of doing so.

6 Conclusion

This research has explored whether the use of online student reviews can provide insight into the quality of higher education provision. After considering over 200,000 reviews and calculating a collective-judgement score from them, we found a positive correlation between the collective judgement of students and APR, TEF and NSS outcomes, that is, with other methodologies for evaluating university quality. The ‘wisdom of crowds’ phenomenon (Surowiecki, 2004), which articulates that when our often-flawed individual assessments are aggregated in the right way, our collective assessment is often highly accurate, seems to be borne out by this research. As a result, the use of the collective judgement of students could have a significant benefit for institutions, regulators, and students.

However, is not clear as yet the extent, or if at all, these three entities are willing to accept student feedback of the kind used here as a valuable and helpful tool for improving quality, the student experience, or organizational learning. Moreover, given such willingness (and its undoubted variability and distribution in sectors), we need to explore which advisory and consultancy programs could be designed for institutions and others that would ensure more organizational attention to student feedback data and how it is best used for improvement in meeting both internal and external objectives.

This research is only an introduction and beginning of research on the use of the collective judgement of students. There is far more to be explored, including but not limited to:

assessing whether students and providers accept the monitoring of student feedback as an oversight or as a quality improvement instrument

Considering whether more can be done to encourage student feedback in underrepresented institutions

Examining Twitter and other sources of data

Considering the ability of student feedback to identify department or course-level quality and/or the different dimensions of quality; and

Sector-wide thematic analysis of student feedback to identify the key issues that are of concern to students.

The good news is that as these questions are explored, the volume of student feedback will continue to grow. As it does so, the more accurate the algorithms developed to classify the data and derive meaning from it will become.

References

China Ministry of Education ( 2017 ). Statistical Yearbook of Chinese Education. Accessed from: www.moe.edu.cn/s78/A03/moe_560/jytjsj_2017/qg.

Davis

K. E.

, Fisher

, Kingsbury

B.,

and Merry

S. E.

(Eds.). ( 2012 ). Governance by indicators: Global power through classification and rankings. Oxford: Oxford University Press.

Espeland

( 2015 ). Narrating numbers. In Rottenburg

, Merry

S. E.

, Park

S. J.

and Mugler

(Eds.), The World of Indicators: The making of governmental knowledge through quantification, (pp. 56-75). Cambridge: Cambridge University Press.

Griffiths

, King

, and Leaver

( 2018 ). The wisdom of students: Monitoring quality through student reviews. Gloucester, UK: Quality Assurance Agency.

King

( 2018 ). Governance and power through indicators: The UK Higher Education Teaching Excellence and Student Outcome Framework. In Weingarten

, Hicks

and Kaufman

(Eds.), Assessing quality in postsecondary education: International perspectives, (pp. 67-84). Montreal: McGill-Queen’s University Press.

McCormick

A. M.

and Kinzie

( 2018 ). Quality as both state and process: Implications for performance-indicator systems. In H. Weingarten

Hicks

, M. and Kaufman

(Eds.), Assessing quality in postsecondary education: International perspectives, (pp. 27-48). Montreal: McGill-Queens University Press.

Sarfaty

G. A.

( 2015 ). Measuring corporate accountability through global indicators. In Merry

S. E.

, Davis

K. E.

, and Kingsbury

(Eds.), The quiet power of indicators: Measuring governance, corruption, and the rule of law (pp. 103-132). Cambridge: Cambridge University Press.

Shi

, Luo

, Wen

and Guo

( 2018 ). Governing quality in a transforming higher education system: the Case of China. In Hazelkorn

, Coates

and McCormick

A. C.

(Eds.), Quality, performance, and accountability in higher education (pp. 371-381). Cheltenham, UK: Elgar.

Surowiecki

( 2004 ). The wisdom of crowds. London: Little Brown.

10.

Wilson

et al. ( 2015 ). The metric tide: Report of the independent review of the role of metrics in research assessment and management. Bristol: HEFCE.

Evaluating Higher Education by Indicators

Abstract

Keywords

1 Introduction: China and the Growth of a Mass Higher Education Sector

2 Emerging Technologies in the Practices of Evaluation

3 Advantages and Disadvantages of Evaluating by Numbers

4 Key Data

5 Social Media

5.1 Overview

5.2 Student Reviews

5.3 Results

6 Conclusion

References