National testing in education in France: Statisation, rationalisation and politicisation

Abstract

In this paper, we analyse the trajectory of the French testing policy in education since 1973. Given its statistical tradition and its ability to produce its own evaluation tools, France may be regarded as an interesting case to interrogate the capacity of national educational systems to meet international standards of testing. Anchored in a perspective of sociology of public action, we show that the development of testing in France is the outcome of specific policy configurations that themselves depend on various types of factors. Using materials drawn from four qualitative research studies on testing and evaluation, we argue that this policy trajectory can be interpreted as a statisation process in which state administrations and political leaders both increased their power over society and imposed their categories and own interests on policy actors. This statisation led to a rationalisation and a politicisation of testing. In France, the development of testing did not lead to a deep transformation of governance patterns: rather, it merged into traditional modes of regulation of education and confirmed them to some extent. Testing is thus an interesting way to study the propensity of the French education system to redefine global problems according to domestic stakes.

Keywords

Low-stakes accountability, testing, France, policy trajectory, state, governance

Introduction

Testing of pupils’ achievement, be it developed at the national level or not, is very often studied through an international or a global approach. In that perspective, various theories have been provided to explain both the emergence of testing and its political meaning. Testing is alternatively regarded as an emblematic measure of the implementation of an evidence-based policy in education (e.g. Wiseman, 2010), as a key device to promote a “new [neoliberal] global educational order” (Laval and Weber, 2002) or as one policy potentially paving the way for post-bureaucratic modes of governance of education systems (Maroy, 2012). For others, testing is one of the numerous forms taken by a new governance of education through numbers (Felouzis and Hanhart, 2011), through measurement policies (Normand, 2011) or through statistical data and quality assurance (Ozga et al., 2011). More recently still, some have even theorised the emergence of a new global panopticism through testing (Lingard et al., 2013) and the development of a new “global testing culture” in education (Smith, 2016).

These works are very helpful to highlight common trends and study globalisation or Europeanisation in education, to overcome methodological nationalism, to compare different national experiences and learn more from each of them through comparison, to understand the reasons why a culture of evaluation may emerge (or not) in some countries and, above all, to interpret the political significance of this new governing mode through testing.

Nevertheless, these works do not really allow the researcher to understand the forms taken by testing in specific national policies, the trajectory of testing in various domestic policy contexts and the domestic processes of legitimation of testing. Yet, these three aspects are essential to correctly interpret the political meaning of testing growth, to compare countries meaningfully and to understand the potential globalisation at work. For instance, several works illustrated that testing development increased central national state power. Nevertheless, this trend will not have the same political meaning in traditionally decentralised states, such as England or the USA, in which it corresponds to a major change of public governance, as in traditionally centralised systems such as France. Thus, the simple occurrence of testing in a growing number of countries is not sufficient proof of the emergence of a common global testing culture.

That is why this article provides a framework in order to rethink these three aspects. It studies the trajectory of the French national testing policy of pupils in primary and secondary education, or more precisely the devices organising the diverse standardised evaluations of pupils’ achievement (SEPA) that were implemented in different periods and how these SEPA were legitimised. As far as SEPA are concerned, France is an interesting most-likely case (Lijphart, 1971). It is indeed a country with a strong tradition of public statistics (Desrosières, 1998), which has developed its own tools since 1973, which tried to export its model in Europe and in Africa, especially in the 1990s, and which is often described in formal Eurydice international comparisons of evaluation devices as a country with various evaluation tools. Thus, it can be expected that France meets international standards of testing; if it does not, this will be meaningful for our discussion on globalisation through testing. As will be explained in the first two sections, this trajectory of testing in France is understood as the outcome of specific policy configurations that enable its development to a greater or lesser extent and that are studied on the basis of qualitative materials drawn from four research studies carried out since 2004.

Our argument is that this trajectory corresponds to a statisation process (Payre and Pollet, 2013) in which state administrations and political leaders both increased their power in society and imposed their categories and own interests on policy actors in education. This statisation took different forms according to the periods and the policy configurations at work but, overall, it led both to a rationalisation and a politicisation of testing, the two trends being sometimes compatible, sometimes not. Consequently, testing policy did not lead in France to a transformation of governance patterns but it rather merged into traditional modes of regulation of education and confirmed them to some extent. Testing is thus an interesting way to study the propensity of the French education system to redefine global problems according to domestic stakes.

After presenting our theoretical framework and our methodology, we distinguish five policy configurations from 1973 to 2017 and illustrate the kind of statisation at work in each period. We then discuss the political meaning of these trends in the conclusion.

Theoretical framework

In this article, we study the trajectory of the French testing policy as a succession of policy configuration outcomes. This approach inscribes itself in a policy sociology (Ozga, 1987) or a sociology of public action (Lascoumes and Le Galès, 2007) in education (van Zanten, 2011) that tries to understand how and why education policies are conceived, implemented and evaluated and what effects they produce.

A trajectory can be defined as a movement in time and space. It has been conceived and conceptualised differently from one scientific discipline to another. In social sciences, for instance, it often designates a succession of social positions occupied by a person in his/her social life, for instance within the school system (school trajectory, or career, of a pupil for instance) or on the housing market (residential trajectories of families). In an international issue that we coordinated on the trajectory of school inspections in Europe, several authors used this notion to question the surprising durability of inspections in a context of deep changes in the governance of education (Pons, 2014). They often used neo-institutionalist theoretical tools to conceptualise this notion of trajectory (like those of path dependency or gradual changes), but this perspective was not unique since others regarded the trajectory of different inspectorates as the result of specific strategies of internationalisation or as the outcome of a professional legitimation process implemented by inspectors. As far as policy trajectory specifically is concerned, Stephen Ball was among the first authors to insist on the necessity to “capture the dynamics of policy across and between levels” and to study “the ways in which policies evolve, change and decay through time and space and their incoherence” (Ball, 1997: 266). This study can be conducted on the basis of various theoretical frameworks. In a recent comparison of the trajectory of the accountability policies in French and Quebec education to which we contributed, this policy trajectory was conceptualised as the combination of three processes: path dependence on early choices, policy bricolage and translation of transnational imperatives in domestic contexts (Maroy and Pons, 2019). 
Our approach is close to these last two contributions and conceives the trajectory of the French testing policy as the outcome of a succession of specific policy configurations.

Policy configuration refers here to the notion of configuration developed by Norbert Elias (1978) and designates a set of factors within the policy process that produces specific interdependencies between policy actors. One major analytical challenge of this approach is to identify the factors that shape these interdependencies. We distinguish four types of factors following our previous work using this notion (Buisson-Fenet and Pons, 2014). The first one is political and refers mainly to the political making of public policies (here mainly the evaluation policy itself). The second is institutional. By defining the formal rules of the game, institutional designs strongly predetermine the roles and routines of policy actors, their margins and spaces of liberty and their degree and form of coordination. Here, it will mainly refer to the institutionalisation of SEPA and the institutional properties of evaluators. The third is professional: the way in which policy actors define their job (professional identity), embody it in specific activities (professional skills and competencies) and struggle for it (professional legitimation) plays an important role in their involvement in the policy process. The last is cognitive and designates all the representations, ideas and pieces of knowledge that policy actors may share, and that may influence their intervention in a policy process.

As illustrated below, these policy configurations all shape a movement of statisation of the SEPA, of their purposes and of the methods, the tools and the pieces of knowledge produced through them. Statisation (“étatisation” in French) can be defined as the institutionalisation of state power, the state being regarded as a centralised, differentiated, institutionalised, autonomous and sovereign political system (Badie and Birnbaum, 1983). At this general level, the notion of statisation is close to the so-called “state building” often quoted in international literature and thus to the numerous theses that have tried to explain the emergence of the state form. It is also close to several historical works that stressed the globalisation of the “school form” (“forme scolaire”) and the increasing power of the state over education (Meyer and Ramirez, 2000; Vincent, 1994). Yet, as Michel Offerlé (1997) argued, analysing statisation does not only mean analysing how the state penetrates society but also the statisation of the state itself, that is to say the process of bureaucratisation of its own activity.

Three kinds of works contributed to the analysis of this double process (Payre and Pollet, 2013). The first focuses on the historical origins of some local initiatives to determine their influence on state policies and understand the drivers of statisation. This approach was particularly developed to analyse the appropriation and the generalisation by the state of local initiatives in urban policies. Nevertheless, this approach is not relevant in the case of SEPA since the development of public statistics in education in France was mainly top down and came from the initiatives of central administrations. The second approach of statisation, inspired by Max Weber, studies the bureaucratisation of the society by the state on the basis of an empirical sociology that stresses both the “inventive hesitations” (“errements inventifs”) of state agents and their work of delimitation of publicness, of stateness and of the frontier between the centre and its peripheries (Offerlé, 1997). Alternatively, this sociology may focus on the role of policy networks taking place between state administrations and private actors or on the role of myths, speeches and government sciences in the definition of state representations. We propose to call this kind of statisation an institutional statisation. The last approach regards statisation as the outcome of the imposition of specific policy categories that participate in the definition of policy problems requiring state intervention and that contribute to make the public targeted by this intervention exist. The best example of this double process is probably provided by the category of “unemployment”. We will talk about a cognitive statisation to designate this process even if cognitive processes are also at work in the institutional statisation.

To summarise, studying statisation implies seeing the state as plunged in a perpetual process of redefinition of the frontiers of its legitimate sphere of action, as an actor in perpetual interaction with non-state actors and processes. This process can be more or less favoured by processes of politicisation. Politicisation occurs when policy devices – such as the SEPA – are more and more linked with the implementation of a specific political offer coming from political actors, such as parties. In axiological terms, it implies increasing debates on the values and the goals of these devices. According to the context, this politicisation can restrict the development of the SEPA or, in contrast, favour their rationalisation. The latter can be formal – in that case it consists of cleaning up the device and improving its internal coherence at the risk of its simplification – or material when policy makers try to increase its empirical relevance (Benamouzig, 2005).

Methodology

This article is based on materials drawn from four different qualitative research studies conducted since 2004 and mobilised here in a synthetic perspective. The first one is a survey of scientific literature about standards and standardised tests done with Nathalie Mons. This survey was an opportunity to cover the English-speaking literature on standards, to interview French policy makers about SEPA and to compare four French-speaking education systems (Mons and Pons, 2006). The second study is a four-year PhD research on policy evaluation in France that allowed us to collect many public and non-public documents about SEPA and the personal archives of one statistician and to conduct intensive semi-structured interviews with people from the DEPP¹ (n = 32) and their interlocutors (n = 66) (Pons, 2010a). The third study was conducted in 2010 and focused on the implementation of new SEPA in primary education. It was based on additional interviews (n = 7), on document analysis and on the exploitation of a small dataset of dispatches (n = 69) from a press agency that specialised in education (Pons, 2010b). It was updated in 2012 with a new dataset of 70 dispatches. The last research study consisted of comparing policies of accountability in France and in Quebec² (Maroy and Pons, 2019). It was an opportunity to confirm, through interviews and document analysis, the slowdown of the French testing policy between 2012 and 2017 and to send a short questionnaire to regional statistical services of the ministry to check the degree of implementation of SEPA at the level of the “académies”.³

Testing policy as a succession of policy configurations

In this section,⁴ we present the five successive configurations that have characterised the trajectory of French testing policy since 1973. For each of them, we synthesise the main factors of interdependence between the actors in accordance with our theoretical framework. These elements show a regular process of statisation that has taken various empirical forms over time and that we present in the final section.

Internalising academic competencies (1973–1988)

While the first initiatives in intelligence testing in France date back to the early 20th century, the 1970s is a relevant starting point for our analysis. In 1973, the central office of statistics of the French ministry of education received a new mandate of evaluation. The year after, an office specifically devoted to “pedagogical assessments” was created. Its head, a young administrator who had recently graduated from the prestigious National College of Administration (ENA), was given full liberty to organise its activities. For one year, he and a statistician undertook a kind of “scientific tourism” in France and in Europe (especially in Belgium, England and Switzerland) to learn from scholars’ experiences in testing. This enabled them to recruit a research officer and other administrators without specific technical skills in testing, to launch a collaboration with some specialists from the French National Institute of Pedagogical Research and to implement with them the first national test of pupils in French and in mathematics between 1974 and 1976.

This new orientation of the ministry is the consequence of four main interconnected factors. The first refers to the successive transformations of lower secondary education that have taken place since 1944 and that culminated in 1975 with the creation of a single school system. Many observers wanted to know if this new system and the new curriculum that it implied would improve pupils’ achievement. The second is the development of several works in psychology and docimology that have shown the biases of teachers’ assessments and that have regularly pleaded for alternative forms of assessment since the 1930s. The third is the implementation from 1968 in all French ministries of a new form of budget planning that invited public administrations to evaluate ex ante the effects of their decisions before programming them. Even if this operation, called BCR (Budgetary Choices’ Rationalisation), was limited in the education sector, it allowed the Ministry of Education to recruit a research engineer and to develop its statistical power. The last factor is the political opportunism of the statisticians themselves, whose head managed to convince the minister that national tests would provide interesting results to appreciate the effects of his reforms and that this operation was possible and not so costly, since the office had already been running a panel of pupils in secondary education since 1972.

Despite the negative reactions coming from some of the influential general inspectors, who criticised this quantitative approach to assessment, a second national test in primary education was implemented in 1979. In the 1980s, the SEPA were more frequent but their periodicity, their purposes and the level and discipline that they tested were never stabilised (see Table 1). Yet, two kinds of assessment rapidly emerged: “diagnosis assessments” and “record assessments”.⁵ The diagnosis assessments are exhaustive, relatively easy and implemented at the beginning of a learning cycle to allow teachers to identify rapidly pupils with difficulties. They are conceived as a tool for professionals in schools. The record assessments are sample-based, summative and conceived for political leaders to provide them with a first measure of the outcomes of pupils and through them of the whole school system. These evaluations were not compulsory and were intermittent. They were conceived primarily to support the implementation of new curricula in the 1980s or to give political leaders specific feedback on their policies (this was the case, for instance, for the assessments of early-learning studies).

Table 1.

The standardised evaluations of pupils’ achievement from 1980 to 1989.

Years	ISCED levels	Types of assessment	Disciplines
1980	2	Diagnosis	French and mathematics
1981	1	Diagnosis	French and mathematics
1981	1	Record	French and mathematics, early-learning studies
1982	2	Record	French, mathematics, English, German, physics, natural sciences, disciplinary behaviours
1983	1	Record	French, mathematics, early-learning studies
1984	2	Record	French, mathematics, English, German, sciences, history and geography, disciplinary behaviours
1986	2	Record	French, mathematics, disciplinary behaviours
1986	3	Record	French, mathematics, English, German, history and geography, social sciences, sciences physiques, physics, technology, sport, disciplinary behaviours, oral expression
1987	1	Record	Reading, writing
	3	Record	French
	3	Record	History and geography
1988	2	Record	Not mentioned
	2	Record	Not mentioned
	2	Record	Economic culture
	3	Record	English
1989	1	Record	Reading
	1	Diagnosis	French and mathematics
	2	Diagnosis	French and mathematics
	3	Record	Economic culture

Source: Levasseur (1996).

ISCED: international standard classification of education.

This configuration had some advantages. It allowed professionals to improve their mutual knowledge on pupils without having to cope with administrative consequences. Gathering different people within pluralist groups whose task was to conceive the tests favoured communication between professionals that hardly knew each other at that time (such as inspectors and statisticians, for instance) or whose communication was rarely freed from hierarchical relations (such as between teachers and inspectors). This configuration also favoured spontaneous appropriations of the SEPA logics by professionals in the académies even if this appropriation took various directions.

Nevertheless, this configuration raised several issues. Methodologically, it caused a fragmentation of the data available and it required statisticians to reconstitute samples each time. This exposed them to various forms of criticism, such as the impossibility of having a longitudinal approach, the lack of control of some contextual explanatory variables or, in contrast, the regular condemnations of their figures-based approach to education. Politically, since the testing tools were not strongly institutionalised, it made statisticians dependent on political leaders whose interest in this approach was uneven from one minister to another. While Alain Savary (1981–1984) wanted to have regular feedback on pupils’ achievement, Jean-Pierre Chevènement (1984–1986) was less interested in this approach and René Monory (1986–1988) had a strategic use of testing, that is, using it to influence teachers’ practices by circumventing the traditional institutional dialogue with unions. This paved the way for important decisions, such as the end of the scientific collaboration with the researchers who represented France in the Association for the Evaluation of Educational Achievement and the withdrawal of scholars from the SEPA devices.

This first period was characterised by a double statisation process. The latter was first cognitive since the ministry progressively internalised skills and competencies in pupils’ assessment that were initially developed beyond it. This internalisation started from “scientific tourism” and continued with an official collaboration with researchers for the conception of the tests, which stopped in 1984 when the ministry decided to reproduce its own tools alone. This statisation was also institutional, since the “inventive hesitations” of the founders of the office of evaluation resulted in an extension of the state frontier that integrated SEPA. This double process was accompanied by a low politicisation of the testing policy and a low rationalisation of the SEPA themselves.

The promises and pitfalls of systematisation (1989–1997)

The situation rapidly changed in 1989. In March, within a whole policy debate on the so-called growing illiteracy of pupils, the minister Lionel Jospin announced the systematisation of diagnosis assessments in CE2 (year 8) and 6ème (year 11) for the following year. This announcement came after several weeks of negotiation with teachers’ unions to prepare the future bill of 1989, whose last title was devoted to evaluation. During these negotiations, teachers’ unions reiterated the importance they attached to pedagogical liberty and stressed their fear of seeing the SEPA become evaluation tools of teachers themselves. They won three concessions from the ministry: the absence of any legal obligation for teachers to consider the results of the SEPA, the non-publication of these results per school to avoid parents’ consumerism and the priority given to diagnosis assessments and to the SEPA conceived to support teachers’ practices and not evaluate or sanction them.

Despite these limitations, the systematisation of the “CE2-6ème assessments” device and the Act of 1989 considerably changed the configuration at work. While the former period was characterised by organisational hesitations, personal initiatives and the possibility for opponents of the SEPA to ignore them, from 1989, the latter took a major place in the ministerial schedule and were clearly integrated in the whole evaluation policy. Massive training sessions on the SEPA were organised at various policy levels. Several workshops and professional symposia were organised locally, and their proceedings were sometimes published by regional pedagogical centres. The SEPA became one of the key policy tools promoted by the head of the DEPP of that time, Claude Thélot, whose ambition was to transform this department into an institutional interface between research, the administration, the media and political leaders. For him, diagnosis assessments were a key channel of dissemination of a new “culture of evaluation” within the bureaucratic French education system. In 1992, a diagnosis assessment in “seconde” (upper secondary education, year 15) was launched. Two years later, a first evaluation of the reception of the SEPA in schools done by the DEPP revealed that 93% of teachers in CE2 and 6ème used the SEPA in discussions with parents, 82% to update their teaching and 67% to make their pedagogy evolve. The DEPP started to present the SEPA as a whole “observatory of pupils’ achievement” (Levasseur, 1996) and Claude Thélot even conceptualised on their basis a specific vision of evaluation that must produce the “mirror effect” (Pons, 2010a). According to this theory, evaluation can rarely provide professionals with strong and stable causal relations because education processes are too complex.
Consequently, it can only confront professionals with the outcomes of their choices and invite them, once they agree to look in the mirror of the evaluation study, to find for themselves, as professionals, the solutions to improve the results of their pupils. This theory deeply influenced the French model of evaluation policy that is sometimes presented in typologies of accountability policies as a reflexive (Dupriez and Mons, 2011) or a soft one (Maroy and Voisin, 2014).

Nevertheless, the priority given to diagnosis assessments had its own limits. Since they were exhaustive, these assessments were costly in material (production and mass printing of testing documents, transport contracts, delivery of specific computer equipment to enter data in schools) and in time (persons entirely devoted to supervisory tasks in the department). Their heavy logistics never allowed the DEPP to send back the results to schools before November. This was too late for teachers, who used other means to identify pupils’ difficulties. In addition, this costly priority meant relegating the record assessments to the background, whereas they were supposed to be a key steering tool. This led some policy makers at various policy levels to use diagnosis assessments as quality indicators even if they were not conceived for that purpose.

During this second period, the statisation process continued. Institutionally, it consisted of the penetration into pupils’ schooling of a new state device, the aim of which was to evaluate pupils and position them vis-à-vis others in their class, in their académie and in the whole nation. Cognitively, a specific dichotomy was imposed – diagnosis assessments versus record assessments – whereas it did not really correspond to the scientific literature. This movement was favoured by an increasing politicisation of the testing policy, now integrated in a whole evaluation policy, and resulted in a formal and material rationalisation of the SEPA through the “CE2-6ème assessment” device.

The uncertainties of politicisation (1997–2007)

Claude Allègre’s ministry (1997–2000) clearly broke this dynamic. The minister reorganised the department, publicly criticised the data that the latter produced – for instance on the proportion of under-achieving pupils in reading, which he considered underestimated – and defended another model of evaluation, which was more external and academic based. This break introduced much uncertainty on statistical production. While the diagnosis assessments continued, the record ones were clearly slowed down during the period, even if two emblematic assessments were conducted in 1997 and 1999.⁶

Moreover, several works have questioned the regulatory power of diagnosis assessments in the 2000s. While a second DEPP study conducted in 2005 concluded that the majority of teachers still declared that these assessments contributed to make their professional practice evolve – thanks to the aggregation of general categories proposed in the questionnaire – an ethnographic research study in three académies showed that teachers had many difficulties integrating these tests in their pedagogy and in their communication with parents (Derouet and Normand, 2003). On the basis of a several-year research study in the Lille académie, Lise Demailly (2003) highlighted a resistance of teachers towards these tests, which they justified by corporatist reasons (fear of being evaluated themselves through these tests and losing their pedagogical autonomy), by general values (the refusal of technologies coming from the private sector and leading to unfair evaluations) and by a global distrust towards statistics.

As far as head teachers were concerned, various works illustrated that while they recognised that outcomes-based steering was now part of their mission and that these tests might increase their possible influence on teachers, they also strongly criticised the bureaucracy that they implied and their problematic insertion in the everyday professional relations with teachers and parents (Barrère, 2009). Hence, the integration of these tests in school projects was very uneven from one school to another (Verdière, 2001). Agnès van Zanten (2001) even argued that when these tests were used, it was not to support a pedagogical project but rather to maintain a good reputation (in the case of top performing schools) or to fabricate a good image (in the case of unfairly stigmatised schools). Concerning local authorities, while some municipalities actively used diagnosis assessments as an observation tool to bypass the traditional educational experts and assert themselves as legitimate interlocutors on pedagogical issues (Dutercq, 2000; van Zanten, 2001), their use remained very uneven from one territory and one policy level to another. Lastly, concerning académies, these tests were sometimes used as steering indicators, whereas they were not conceived for that purpose and this instrumentalisation was criticised both by the DEPP and by the general inspectors (IGEN-IGAENR, 2005).

Consequently, the 2000s were characterised by many political uncertainties concerning the future of the SEPA, which were illustrated both by the numerous reorganisations of existing tools and by the inflation of new short-lived devices. In 2001, for instance, the diagnosis assessment in “seconde” (year 15) was abandoned and a new one was launched in “5ème” (year 12). A new test of that kind in CE1 (year 7) was piloted in 2004–2005 within a global policy programme for reading. It was piloted again the year after, officially to better capture pupils’ difficulties, and it was generalised in 2006–2007 before being abandoned in 2008. The test in CM2 (year 10) launched in 2005 in the continuation of a new education act was stopped that year too. Lastly, the former tests in CE2 (year 8) were abandoned in 2006.

Concerning record assessments, a new cycle entitled CEDRE⁷ was implemented in 2003 to evaluate pupils’ competencies at the end of primary and secondary education in various disciplines. Based on national samples, these assessments used a methodology close to that of international comparisons and took better account of international research in psychometrics. Nevertheless, they were not conceived to inform, at the same time, some of the fields expected in the new budgetary documents introduced in 2006 by the LOLF,⁸ so that the ministry had to create other ad hoc sample-based tests to fill in these documents. Trialled in 2005–2006, these tests were generalised in 2007, initially as a temporary device.

The 1997–2007 period was thus characterised both by a strong politicisation of the testing policy – visible for instance in the uncertainties introduced by Allègre’s ministry and in the inflation of new devices in the years after – and by a weak rationalisation of the numerous SEPA available, whose lifetimes were generally short. This double movement paved the way for an ambivalent statisation process, in which the SEPA devices were more and more defined according to the state’s (ever-changing) needs, but with a decreasing regulatory power over professionals and bureaucrats.

Political instrumentalisations (2008–2012)

The instability of the SEPA continued between 2008 and 2012 because policy actors instrumentalised these tools politically. In November 2007, the minister Xavier Darcos (2007–2009) announced the creation of two national exhaustive tests in CE1 (year 7) and CM2 (year 10). These new tests were expected to measure pupils’ achievement, to provide parents with feedback on their child, to serve as an indicator of teaching efficiency and to estimate the efficiency of the school system. Initially, they were supposed to solve various problems of the past. First, for the first time they were explicitly linked with an ongoing curricular reform (whereas before, tests were conceived a posteriori). Second, they would improve the statistical database in primary education, which had met several difficulties in the past (strikes by primary school leaders who refused to fill in statistical documents, contestation of a former intrusive database on pupils, etc.), and thereby improve the LOLF indicators. Lastly, they were expected to bridge the gap between the former record assessments, which were conceived to steer the system but only at the national level, and the diagnosis assessments, which were exhaustive and so enabled statistical analyses at various levels, but which had not been designed for steering.

These new tests introduced three main breaks in the testing policy trajectory. Firstly, they signalled the end of the former emblematic “CE2-6ème assessments” design since the responsibility for implementing the tests in “6ème” was transferred to the académies in 2008. Secondly, they constituted a new form of test – neither diagnostic nor summative – whose conception was far from traditional psychometric canons and closer to the immediate political needs of the ministry. Symbolically, their design was not entrusted to the DEPP but to another central department (the DGESCO), whose mission is to implement the ministry’s policy. Thirdly, they provoked for the first time a heated and lasting controversy within the education field between the ministry and its partners, which went on until the next presidential elections (Dutercq and Lanéelle, 2013; Pons, 2010b).

This controversy developed in three main stages. From October 2008 to May 2011, it was essentially fuelled by methodological considerations. These new tests were criticised for being created discretionarily by the ministry cabinet, announced in haste without any preliminary consultation of professional organisations, conceived according to overly binary marking logics (pupils either know or do not) and implemented at a poor time in the school year. For instance, the test in CM2 (year 10) was supposed to be organised in February, which is too late to help teachers tackle the pupils’ difficulties identified by the test and too early to constitute a summative assessment of the whole school year. From May 2011 to December 2011, these methodological elements did not disappear, but opponents combined them with criticisms of other SEPA devices. The government’s announcement that another exhaustive test in “5ème” (year 12) would be trialled in September 2012 was received by teachers’ unions as proof that the ministry had not taken their earlier methodological criticisms into account. In September 2011, the High Council of Education published a report in which it strongly criticised the evaluation tools that were used to measure the degree of pupils’ competencies in the LOLF documents. These criticisms found a new echo from January 2012 with the official opening of the presidential campaign. While the Right in power regularly confirmed its policy and even published a circular organising the next school year several days before the first round of the elections, the Left rapidly reassured teachers by confirming that it would abrogate the CE1–CM2 assessment device after the election.

Consequently, between 2008 and 2012, the politicisation of the testing policy clearly increased: the interventions of political leaders no longer merely created uncertainty about the future of the existing SEPA, as before, but also took the form of explicit attempts to instrumentalise these tools for specific political and institutional purposes. As a result, the rationalisation of the SEPA devices was particularly weak and, in a sense, even negative. The trajectory of the testing policy can still be regarded as a statisation process if we also include in this process the resistance that state initiatives can provoke.

Political withdrawal and technical stabilisation (2012–2017)

From 2012, the testing policy was clearly put in the background by political leaders. The position of the new minister Vincent Peillon (2012–2014), who announced in a press conference in August 2012 that he was abrogating the tests in CE1 and CM2 and who reintroduced the distinction between diagnosis and record assessments, did not pave the way for the implementation of a new device. The wide consultation organised by the Left after the presidential elections, entitled “the Refoundation”, whose aim was to prepare the Act of 2013, gave rise to few discussions on the SEPA, except to reassert that the choices made in the former period were poor ones. Significantly, the new national council for the evaluation of the school system (CNESCO) created by this Act did not specifically address this question during the quinquennium and no new ad hoc SEPA device was created.

This political cooling allowed the DEPP to improve the stability and the cumulativeness of its evaluation tools. New sample-based assessments were conceived in 2014 to improve the evaluation of pupils’ basic competencies in order to better inform the LOLF budgetary documents. The CEDRE device was consolidated, with a regular record assessment each year in a specific discipline. This allowed the DEPP to develop diachronic assessment systems. The latter rest not only on LOLF indicators and CEDRE assessments, but also on the repetition of the 1987 survey that we mentioned above and on the exploitation of specific panels of pupils who are periodically tested. These initiatives were also opportunities to improve the methodological foundations of the SEPA and to better incorporate the latest findings of international research, an issue illustrated by the DEPP (Rocher and Simonis-Sueur, 2015).

Nevertheless, this political cooling did not favour the development of a testing policy at policy levels other than the national one, for instance in the académies. Our collective research on the accountability policies in three major académies confirmed that the only success indicators that these regional authorities consider for their governance or their institutional dialogue with the central administration are the success rates for national exams. The short questionnaire that we sent to the statistical departments of 30 académies confirmed that the implementation of regional tests, as in the académie of Caen at the 6ème level since 2011, is an exception. Most académies simply do not implement such tests, even if some of them sometimes conceive ad hoc studies on specific topics.

In that context, it remains difficult to appreciate the effects of this testing policy orientation on the actors of the school system. Another study on the French policy debate in education confirms that the LOLF is very little debated and that this debate tends to focus on budgetary considerations and not on the question of evaluation (Pons, 2017). The CEDRE device seems to be relatively consensual, which does not prevent policy actors from instrumentalising it for the sake of their own policy stances.

This period of low politicisation of the testing policy undoubtedly favoured a strong rationalisation of the SEPA devices but also a reduction of their importance at various levels. This led to a narrow statisation in which the testing policy is essentially defined according to the immediate cognitive and institutional needs of the state central power.

Testing as a statisation process

Finally, the trajectory of the French testing policy can be regarded as a succession of five different policy configurations, which are synthesised in Table 2. Although these configurations encouraged the politicisation of this policy and the rationalisation of the SEPA devices to varying degrees, they all traced a statisation process that took different forms in different periods. It started with the internalisation of technical competencies that had been developed outside the state sphere. It continued with the imposition of specific categories and processes, then with an increasing integration of the state’s own needs into the conception and objectives of the SEPA, then with the contestation of these tools and, finally, with their concentration on the needs of the central state alone.

Table 2. Policy configurations: a final synoptic view.

Periods	1973–1988	1989–1997	1997–2007	2008–2012	2012–2017
Dimensions of the configuration
Political	Limited interests from political leaders	Development of an evaluation policy (mirror effect)	Allègre’s breakdown + political hesitations	Instrumentalisations + controversy + elections	Political cooling
Institutional	Low institutionalisation	Reflexive accountability	Inflation of devices	Burst of the devices	No more reflexive accountability
Cognitive	Scientific tourism + internalisation of scientific skills + distinction diagnosis/record assessments	“CE2-6ème assessment” device	“CE2-6ème assessment” device + short-lived devices	Methodological controversy	Stabilisation of existing devices
Professional	Conflicts between statisticians, inspectors and scholars + uneven investment from an actor to another	New “evaluation culture” versus unions’ claims	Ambivalent uses at all levels	Strong oppositions by school professionals	Lower investment by professionals + scientific consolidation of the DEPP
Degree of politicisation	Low	Intermediary	High	Very high	Low
Degree of rationalisation	Low	High	Low	Low	High
Forms of statisation	Internalisation of an external competency	Imposition of categories	Statisation of the state	Resistances to state initiatives	Focalisation on central state needs

Nevertheless, this statisation process does not imply that only one logic of accountability was at stake during the whole period. Soft accountability is relatively constant: the SEPA were never linked in France with hard institutional, financial or administrative consequences. Yet, this logic has been progressively challenged since the mid-2000s by the LOLF and by the Right’s policy between 2007 and 2012. In addition, reflexive accountability through the SEPA was central for two decades (the 1980s and 1990s) but not really before and far less after. It is even surprising to notice that this trajectory has ended, in recent years, in a situation in which there is neither reflexive accountability (since the SEPA are defined essentially for the purposes of the central power) nor, strictly speaking, soft accountability (since the SEPA constitute some indicators of the LOLF).

Learning from long-term policy trajectories

We chose to study France because it was a priori a most likely case of development of a testing policy. In contrast, the analysis of the long-term trajectory of the testing policy in this school system shows that, in the end, this policy deviates in several ways from international standards: it is implemented irregularly over time according to the configurations and the unequal interest of political leaders; it is not always linked to an overall policy of accountability; and it does not inscribe itself easily into a linear scheme of methodological rationalisation that would see the tools implemented moving continuously closer to the latest advances in international research (only certain assessment tools are included in this scheme, in particular “record assessments”). What theoretical lessons can we draw from the refutation of this highly probable case through a long-term policy trajectory analysis? We propose three as a final theoretical discussion (rather than a conclusion).

This trajectory analysis makes it possible to take much more account of the historicity of public action, that is, both the capacity of history to influence current policy choices and the selective mobilisation of past experiences by the actors in the policy process to impose their vision of the problems to be resolved (Laborier and Trom, 2003). This consideration invites the researcher to show a salutary critical distance from the many “turns” that are sometimes associated in the international literature with the implementation of tests, whether international or not (performance, comparative, quality or topological turns, for instance).⁹ A testing policy is inevitably part of a long history, a trajectory that conditions the possibility and the very forms of not only its implementation, but also its effects. In the case of France, contrary to a rapid inference that the implementation of tests would be a sign of the country’s alignment with a global (testing) culture, of its inevitable imprisonment in a global panopticism or of a neo-liberalisation of schools, the trajectory of the testing policy, like that of accountability in this country more generally, remains a neostatist one, leading to a strengthening of the power of the state (Maroy and Pons, 2019).

Moreover, the entry through the successive configurations makes it possible to grasp, on the theoretical as well as the methodological level, the construction of local sites that are more or less context generative when they are exposed to global imperatives (Lingard, 2006). The idea here is not to enclose France in the image of the Gallic village still resisting the invading forces of globalisation, but rather to show that the succession of configurations studied in this article produces a strongly generative political and institutional context, conducive to a vernacular globalisation of the French school system.

This last remark does not mean either that the globalisation or Europeanisation of educational policies at work through the development of testing, and the networks of experts who carry them, have no effect on French testing policy. The surveys show several examples of one-off borrowings: the “record assessments” have gradually been aligned with the item response models used in international surveys, the Right justified the new assessments implemented in primary education in 2008 by France’s poor results in international surveys, etc. However, the long-term trajectory approach invites us to grasp the nature of this effect in greater detail. In the French case, exposure to a new world testing culture did not result in a profound transformation of the modes of governance of the school system (for example, more focused on student performance), but it did provide technical and political opportunities, among others, that actors caught up in domestic policy configurations could seize (or not, or in different ways) depending on the context.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

Author biography

Xavier Pons is associate professor at the University of Eastern Paris Créteil (UPEC), member of the Interdisciplinary Department of Political Studies – Hannah Arendt Institute (LIPHA), and associate researcher at the Center for social change in Sciences Po Paris. Member of several international comparative research projects since 2006, he works mainly on the transformations of the governance of education systems in France and in Europe, especially through evaluation, with a special focus on the role of policy tools, professional groups, knowledge and discourses in the policy process. He also works on the shaping of policy debates and the role of media in the fabrication of education policy problems.

References

Badie and Birnbaum (1983) The Sociology of the State. Chicago: The University of Chicago Press.

Ball (1997) Policy Sociology and critical social research. A personal review of recent education policy and policy research. British Educational Research Journal 23(3): 257–274.

Barrère (2009) Les directions d’établissement scolaire à l’épreuve de l’évaluation locale. Carrefours de l’Education 28: 199–214.

Benamouzig (2005) La Santé au Miroir de l’Économie. Paris: PUF.

Buisson-Fenet and Pons (2014) School Evaluation Policies and Educating States. Bruxelles: P.I.E. Peter Lang.

Demailly (2003) L’évaluation de l’action éducative comme apprentissage et négociation. Revue Française de Pédagogie 142: 114–127.

Derouet J-L and Normand (2003) Le Développement d’une Culture de l’Evaluation dans l’Éducation Nationale: Comment les Enseignants Utilisent-ils les Résultats des Évaluations Nationales? Lyon: INRP.

Desrosières (1998) The Politics of Large Numbers: A History of Statistical Reasoning. Cambridge, MA: Harvard University Press.

Dupriez and Mons (2011) Introduction. Les politiques d’accountability. Du changement institutionnel aux transformations locales. Education Comparée 5: 7–16.

Dutercq (2000) Politiques Éducatives et Évaluations: Querelles de Territoires. Paris: PUF.

Dutercq and Lanéelle (2013) La dispute autour des évaluations des élèves dans l’enseignement français du premier degré. Sociologie 1(4): 43–62.

Elias (1978) What is Sociology? New York: Columbia University Press.

Felouzis and Hanhart (eds) (2011) Gouverner l’Éducation par les Nombres? Usages, Débats et Controverses. Bruxelles: De Boeck.

IGEN-IGAENR (2005) L’usage des Outils de Pilotage Élaborés par les Académies ou Mis à Leur Disposition. Paris: MEN.

Laborier and Trom (2003) Historicités de l’action publique. Paris: PUF.

Lascoumes and Le Galès (2007) Sociologie de l’Action Publique. Paris: Nathan.

Laval and Weber (dir.) (2002) Le Nouvel Ordre Éducatif Mondial. Paris: Ed. Nouveaux regards.

Levasseur (1996) L’évaluation nationale des acquis des élèves. Revue Internationale D’éducation 11: 101–114.

Lijphart (1971) Comparative politics and the comparative method. The American Political Science Review 65(3): 682–693.

Lingard (2006) Globalisation, the research imagination and deparochialising the study of education. Globalisation, Societies and Education 4(2): 287–302.

Lingard, Martino and Rezai-Rashti (2013) Testing regimes, accountabilities and education policy: Commensurate global and national developments. Journal of Education Policy 28(5): 539–556.

Maroy (2012) Towards post-bureaucratic modes of governance: A European perspective. In: Steiner-Khamsi and Waldow (eds) Policy Borrowing and Lending in Education. London: Routledge.

Maroy and Pons (2019) Accountability Policies in Education. A Comparative and Multilevel Analysis in France and Quebec. Dordrecht: Springer.

Maroy and Voisin (2014) Une typologie des politiques d’accountability en éducation: l’Incidence de l’instrumentation et des théories de la regulation. Éducation Comparée 11: 31–57.

Meyer and Ramirez (2000) The world institutionalization of education. Origins and implications. In: Schriewer (ed.) Discourse Formation in Comparative Education. Francfort: Peter Lang.

Mons and Pons (2006) Les Standards en Éducation dans le Monde Francophone: Une Analyse Comparative. Neuchâtel: IRDP.

Normand (2011) Gouverner la Réussite Scolaire. Une Arithmétique Politique des Inégalités. Bern: Peter Lang/ENS de Lyon.

Offerlé (1997) Etatisations. Introduction. Genèses 28: 1–3.

Ozga (1987) Studying education policy through the lives of policy makers. In: Barton and Walker (eds) Changing Policies, Changing Teachers. Milton Keynes: Open University Press, pp.138–150.

Ozga, Dahler-Larsen, Segerholm, et al. (eds) (2011) Fabricating Quality in Education: Data and Governance in Europe. London: Routledge.

Payre and Pollet (2013) Socio-histoire de l’Action Publique. Paris: La Découverte.

Pons (2010a) Evaluer l’Action Éducative. Paris: PUF.

Pons (2010b) L’urgence des évaluations CE1-CM2: Réflexions sur une méthode de gouvernement. Administration et Éducation 125(1): 15–21.

Pons (2012) Quarante ans d’évaluation ministérielle des acquis des élèves en France: Complexification et politisation. Politiques Sociales et Familiales 110: 9–18.

Pons (2014) Les trajectoires des inspections scolaires en Europe: Analyses comparatives. Revue française de pédagogie 186: 5–10.

Pons (2017) Débat Public et Action Publique en Éducation en France dans les Années 2000. Une Sociologie des Configurations de Dicibilité. Créteil: Habilitation à Diriger des Recherches de Sociologie Université Paris-Est.

Rocher and Simonis-Sueur (coord.) (2015) Evaluation des acquis: Principes, méthodologie, résultats. Education et Formations 86–87: 1–312.

Smith (ed.) (2016) The Global Testing Culture. Oxford: Symposium Books.

Van Zanten (2001) Le rôle des évaluations dans les stratégies concurrentielles des établissements et dans les stratégies de choix des parents en France et en Grande-Bretagne. In: Demailly (ed.) L’évaluation des Politiques Éducatives. Bruxelles: De Boeck.

Van Zanten (2011) Les Politiques d’Éducation. Paris: PUF.

Verdière (2001) Les pratiques d’évaluation du travail d’enseignement. PhD Thesis. Lille I University.

Vincent (ed.) (1994) L’éducation Prisonnière de la Forme Scolaire. Lyon: PUL.

Wiseman (2010) The uses of evidence for educational policymaking: Global contexts and international trends. In: Green LAJ and Kelly (eds) What Counts as Evidence and Equity? Review of Research in Education. New York: AERA.