Abstract
Whereas the European Commission officially intends to periodically evaluate all major European Union legislation in force, in practice it only evaluates a minority of major regulations and directives. This article tries to explain the variation in the initiation of such ex-post legislative evaluations by the Commission with the help of two theoretical motives: an enforcement motive and a strategic motive. Based on two novel datasets and binary logistic regression analysis, the results show that the type and complexity of the legislation, the presence of an evaluation clause and the evaluation capacity of the responsible Directorates-General enhance the chances of evaluation. These findings indicate that ex-post legislative evaluations are at least partly driven by the Commission's need to enforce legislation.
Introduction
The European Union (EU) is often described as a ‘regulatory state’ due to the important role of legislation in the European policy process (Majone, 1999: 1). A marked feature of the European legislative process is the centrality of one supranational executive actor: the European Commission. The Commission has a number of crucial tasks related to European legislation. Firstly, it is responsible for the development and formulation of legislative proposals (Schmidt and Wonka, 2013: 2). Secondly, it produces delegated and implementing acts (McCormick, 2015: 169–172). Thirdly, in its role of ‘guardian of the European treaties’, the Commission is responsible for monitoring and enforcing national compliance with European legislation (McCormick, 2015: 169–172; Schmidt and Wonka, 2013: 2).
These three tasks of the Commission in the EU's legislative process have received ample academic scrutiny (e.g. Kassim et al., 2013; Schmidt and Wonka, 2013; Wille, 2013). Conversely, the literature has hardly touched upon a fourth key task of the Commission, which is to conduct ex-post evaluations that assess the functioning and effectiveness of European legislation. So far, such ex-post legislative (EPL) evaluations have mostly been neglected by scholars (but see Fitzpatrick, 2012; Mastenbroek et al., 2016; Zwaan et al., 2016), which is all the more surprising given both their theoretical importance and their growing role in the Commission.
Theoretically speaking, EPL evaluations may fulfil two important functions in legislative processes. Firstly, by recommending how the implementation of legislation can be improved and/or how legislation can be amended to increase its effectiveness, EPL evaluations are a potential tool for decision-makers to improve their policies (Fitzpatrick, 2012: 479; Vedung, 1997: 109). Secondly, EPL evaluations can be used to judge the performance of the actors that implement legislation, thus holding them accountable for their actions (Coglianese, 2012: 11; Vedung, 1997: 102–108).
Over the years the Commission has increasingly recognized the importance of EPL evaluations. It first emphasized the role of such evaluations in legislative improvement and accountability relationships in 2000, after which it started to make its procedures for EPL evaluations more systematic in 2007 (European Commission, 2007: 3–4; Fitzpatrick, 2012: 478). Since 2010 the Commission has also stressed the importance of EPL evaluations for judging the suitability of entire regulatory frameworks (so-called ‘fitness checks’) (European Commission, 2010: 5). Furthermore, from 2012 onwards it has given EPL evaluations a central place in its REFIT programme, which aims to identify and remove superfluous rules (European Commission, 2012: 4).
In 2015 the Commission published new guidelines that outline the methods, follow-up procedures and institutional responsibilities for carrying out EPL evaluations (European Commission, 2015). In principle, all the Commission's EPL evaluations must use some form of stakeholder consultation to map the views of those actors that are directly affected by European legislation (European Commission, 2015: 299–336). Aside from this, EPL evaluations can use different combinations of methods, such as expert interviews, document analysis and quantitative modelling (European Commission, 2015: 337–414; Fitzpatrick, 2012: 490–497).
Concerning the follow-up of EPL evaluations, the Commission is supposed to produce an action plan based on the main recommendations of each evaluation to ensure that its results feed back into the ‘regulatory cycle’ (European Commission, 2015: 297–298). Existing research has shown that the extent to which this happens varies in practice, such that about half of the ex-ante evaluations (impact assessments) attached to proposals for legislative amendments make use of information from EPL evaluations when available (van Golen and van Voorst, 2016: 388).
Concerning the institutional responsibility for EPL evaluations, the Commission's guidelines specify that such evaluations are the responsibility of the Directorates-General (DGs), with a coordinating role for the Commission's Secretariat-General (SG) (European Commission, 2015: 257; Stern, 2009: 70–71). EPL evaluations are usually based on reports written by external consultants to enhance their independence, but when this is more practical the whole evaluation process may also be conducted internally (European Commission, 2015: 282–289).
Importantly, since 2007 the Commission's guidelines also prescribe that both financial and legislative activities must be evaluated periodically, in proportion to their allocated resources and expected impact (European Commission, 2007: 22; 2015: 257). In reality, however, not all important EU legislation is evaluated. Academic research has shown an initiation ratio of 33% for major EU regulations and directives from the period 2000 to 2002 (Mastenbroek et al., 2016: 1338). The European Commission (2013: 13) has produced similar figures: in 2013, 29% of all important EU regulations had been evaluated, a further 13% were being evaluated at that moment and 19% had a future evaluation planned. No numbers were provided for directives.
These figures show that the Commission is apparently selective in which legislation it evaluates, for reasons that the institution itself does not explain. This finding is problematic because an evaluation system is only credible if its procedures for initiating evaluations are systematic and transparent (Organisation for Economic Co-operation and Development (OECD), 2015: 120). If this is not the case, legislative quality may diminish in policy areas that are evaluated less frequently (OECD, 2015: 120) and/or the image could arise that the Commission decides what legislation to evaluate based on political considerations (Radaelli and Meuwese, 2010: 146). This, in turn, could harm the credibility of evaluations in the eyes of the legislator and other actors (Poptcheva, 2013: 4).
Therefore, this article looks into the question of what drives the initiation of EPL evaluations by the Commission. In other words, why does the Commission evaluate some pieces of law while it does not evaluate others? By answering this question, we not only seek to shed light on the unexplored topic of EPL evaluations in the EU, but also aim to further explore the motives that drive the Commission's behaviour (Boswell, 2008: 472; Franchino, 2007: 11; Hartlapp et al., 2015: 1; Radaelli, 1999: 760–762; Wille, 2010: 1098–1100).
Two potential motives (not) to initiate an evaluation are studied in this paper, each of which is linked to a specific theoretical image of the Commission. The first motive, which is in line with the image of the Commission as the ‘guardian of the European treaties’, is the effective enforcement of EU legislation. Since EPL evaluations are a potential tool to check how legislation is implemented by the member states (European Commission, 2015: 296; Stame, 2008: 124), we can expect that legislation for which the chances of non-compliance are higher is more likely to be evaluated. The second motive is the strategic protection of competences, which is in line with the image of the Commission as a political actor (Boswell, 2008: 472; Hartlapp et al., 2015: 1; Majone, 2005: 65). Following this logic, we would expect that the Commission refrains from evaluating legislation if this could result in a reduction of its powers.
The hypotheses flowing from these two motives are tested with the help of two datasets, the first containing all major EU legislation from 2000 to 2004 and the second containing all EPL evaluations conducted by the Commission during 2000–2014. With these data, we are able to draw conclusions about the Commission’s decisions to evaluate European legislation over a 15-year period. The 10-year gap between our datasets is needed to give the Commission enough time to evaluate, thus avoiding any bias in our data in favour of legislation that was evaluated sooner. Binary logistic regression was used for the analysis.
Our results show that EU legislation is more likely to be evaluated if it is a directive rather than a regulation and if it is more complex, which is in line with the enforcement motive. Both of our control variables – the presence of evaluation clauses and the amount of evaluation capacity of the DG to which a piece of law belongs – also provide significant explanations. However, we did not find evidence that the strategic protection of competences explains the Commission's initiation of EPL evaluations.
Theoretical framework
Whereas evaluation-related topics are frequently discussed in the academic literature, there is no comprehensive approach to explaining why organizations decide to evaluate or not (Mastenbroek et al., 2016: 1343; Pattyn, 2014: 351). Therefore, this article develops such an approach in the context of the EU, building on two potential motives for the Commission: an enforcement motive and a strategic motive. These motives are closely linked to ongoing academic debates about the nature of the Commission (Boswell, 2008: 472; Franchino, 2007: 11; Hartlapp et al., 2015: 1; Radaelli, 1999: 760–762; Wille, 2010: 1098).
Enforcement motive
In its role of ‘guardian of the European treaties’, the Commission has the task to monitor and enforce member state compliance with EU legislation (Schmidt and Wonka, 2013: 2). EPL evaluations are potentially useful for this purpose, as they can collect and present information about how rules are implemented in practice (Coglianese, 2012: 11). This, in turn, makes EPL evaluations useful to hold those actors responsible for the implementation of legislation accountable (Vedung, 1997: 102). Therefore, EPL evaluations are a potential tool for the Commission to detect non-compliance by the member states and to address such non-compliance via enforcement measures (European Commission, 2015: 292; Stame, 2008: 124).
The role of EPL evaluations in enforcing European legislation is also evident from earlier research about this topic. Mastenbroek et al. (2016: 1339) found that out of 216 EPL evaluations conducted or outsourced by the Commission between 2000 and 2012, 79% assessed the processes of legislative implementation, enforcement and/or compliance. Zhelyazkova et al. (2016: 833) found EPL evaluations to be the most detailed source of information about the compliance of member states with 24 directives of interest.
Those EPL evaluations that study member state compliance often assess the legal implementation of directives by systematically comparing national transposition measures, while they tend to assess the practical implementation of European legislation via surveys and interviews among stakeholders. In some cases, infringement data are also used as a source (Smith, 2015: 92–93). EPL evaluations that address member state compliance also tend to include recommendations for the Commission. Often these recommendations focus on ‘soft’ measures like increased monitoring, sharing best practices or publishing guidelines for national implementing authorities, but evaluations may also recommend the Commission to launch infringement procedures (Mastenbroek et al., 2016).
If the Commission can use EPL evaluations for enforcement purposes, we can expect that the chances that an evaluation is initiated are higher for pieces of law where there is a greater need to scrutinize the member states. In other words, we can expect that the chances that an evaluation is initiated are higher for legislation that offers more opportunities for non-compliance.
Three specific variables may be important in this regard. Firstly, the type of legislation may affect the chances of non-compliance. Directives offer the member states more discretion than regulations because they need to be transposed into national law (Treib, 2014: 6). In turn, this discretion offers the member states more opportunities to delay or prevent implementation (Kaeding, 2006: 232; König and Mäder, 2014: 247; Mastenbroek, 2003: 372; Steunenberg and Rhinard, 2010: 495; Treib, 2014: 6). Therefore, we expect directives to be more likely to be evaluated than regulations (Stame, 2008: 124).
H1: Directives are more likely to be evaluated than regulations.
Secondly, the complexity of legislation can affect the chances of non-compliance. Since the European legislative process includes multiple veto players – notably the Commission, the Council and the EP – decision making often produces compromises that are laid down in long and ambiguous texts (Häge, 2007: 307–308; Hofmann, 2013: 99). Such complexity offers member states more leeway for interpretation, and, therefore, makes it more difficult to establish whether they are complying with legislation or not (Kaeding, 2006: 242; König and Mäder, 2014: 253–254; Mastenbroek, 2003: 376; Steunenberg and Rhinard, 2010: 501). This in turn can be expected to increase the chance that legislation is evaluated by the Commission.
H2: The more complex a piece of law, the higher its chances of being evaluated.
Thirdly, the political sensitivity of legislation may affect the chances of non-compliance. The more controversial a regulation or directive, the more likely it is that some member states who opposed it during the legislative process will not implement it correctly (Mastenbroek, 2003: 376). Since the Council represents the member states, politicization in the Council is especially likely to increase the chances of non-compliance (Treib, 2014: 14), and, therefore, the chances that an evaluation is initiated.
H3: The more politicized a piece of law was in the Council, the more likely it is to be evaluated.
Strategic motive
The hypotheses presented above are in line with the image of the Commission as the ‘guardian of the European treaties’. However, in recent years scholars have increasingly viewed the Commission as an actor that not only fulfils the tasks that the member states have delegated to it (such as enforcing European legislation), but also strategically pursues its own preferences (Franchino, 2007: 11; Hartlapp et al., 2015: 1–14; Wille, 2010: 1099). According to this political view of the Commission, the institution has a perpetual interest in protecting its competences, as without these competences it would not be able to achieve any of its (temporary) political aims (Hartlapp et al., 2015: 1; Majone, 2005: 65; Pollack, 2008: 9).
The Commission has been shown to deal strategically with ex-ante evaluations of legislation (impact assessments) (Poptcheva, 2013: 4; Torriti, 2010: 1065) and expert knowledge in general (Boswell, 2008: 472), so we can expect strategic considerations to play a role in decisions about the initiation of EPL evaluations as well. Ex-post evaluations are not just neutral instruments that can be used to stay informed about policy implementation, but also potential strategic tools that can strengthen or weaken the positions of actors (Bovens et al., 2008: 320; Schwartz, 1998: 295; Vedung, 1997: 111). As evaluations suggest changes to existing arrangements, they are inherently advantageous to some actors and disadvantageous to others (Bovens et al., 2008: 320; Weiss, 1993: 95–98). Negative evaluations can be particularly disadvantageous to actors that are responsible for delivering policies, as such evaluations may lead to demands to roll back their competences or to put them under closer supervision (Vedung, 1997: 102–108). This, in turn, may be an incentive for such actors to avoid evaluations that may have negative consequences (Schwartz, 1998: 295; Weiss, 1993: 95).
Therefore, we can expect the Commission to be reluctant to initiate EPL evaluations in situations where the results of such evaluations could be harmful to its interests. The Commission's better regulation agenda officially endorses the idea that EU legislation should be significantly amended or even repealed if an evaluation shows that it has no added value (European Commission, 2012: 3; 2013: 1; 2015: 254). In reality, however, we can expect that the Commission wants to avoid such situations to protect its competences (Majone, 2005: 65). In other words, we expect the chances that a piece of law is evaluated to be lower if the potential evaluation is more likely to be used to argue for significant amendments to the law.
Involvement of the European Parliament (EP) in decision making decreases the chances of significant amendments and is therefore expected to increase the chances that an evaluation is initiated. The reason for this is that the EP provides an extra veto player that can block amendments (Häge, 2007: 307; Hofmann, 2013: 102). As a majority of EP members generally supports further European integration (Pollack, 2008: 9), it can also be expected that the EP will usually oppose reducing the competences of supranational institutions like the Commission.
H4: Pieces of law that can only be amended with the approval of the European Parliament are more likely to be evaluated than pieces of law that can be amended without the approval of the European Parliament.
The voting procedure in the Council is also expected to influence the chances of legislative amendments. If unanimity is required in the Council it is significantly harder to change legislation, as it is difficult to make all member states agree on a proposal (Häge, 2007: 308). We therefore expect the chances that an evaluation is initiated to be higher when the Council applies unanimity voting, as compared to when it applies qualified majority voting (QMV).
H5: Pieces of law decided upon by unanimity in the Council are more likely to be evaluated than pieces of law decided upon by qualified majority voting.
Control variables
Aside from the two theoretical explanations described above, this research controls for two other potential explanations for decisions to initiate EPL evaluations. The first control variable is the presence of an evaluation clause. Many EU regulations and directives contain provisions that oblige the Commission to evaluate them after a number of years, which are usually inserted by the Council and the EP to ensure that they will stay informed about the legislation (Summa and Toulemonde, 2002: 410). We can expect legislation containing an evaluation clause to be evaluated more often than legislation without such a clause.
The second control variable is the evaluation capacity of the responsible DG. In this context, evaluation capacity is defined as the presence of sufficient means and processes to ensure that evaluation is an ongoing practice in an organization (Nielsen et al., 2011: 325). Evaluation capacity includes the presence of organizational structures and procedures that support evaluations, the presence of sufficient financial and human capital to evaluate and the presence of proper (methodological) tools to conduct evaluations (Nielsen et al., 2011: 326–327). Since evaluation capacity varies primarily between the DGs of the Commission (Van Voorst, 2017: 25), we expect that legislation under the responsibility of DGs with higher evaluation capacity is more likely to be evaluated than other legislation.
We could also expect the Commission to be more likely to initiate an EPL evaluation if an ex-ante evaluation (impact assessment) of the same legislation was carried out, as impact assessments often contain a section prescribing that legislation should be evaluated ex-post (European Commission, 2015: 246–251). However, this variable cannot be studied in this research because the Commission's system for impact assessments was only set up in 2002–2003 (European Commission, 2007: 4).
Methods and data
Data collection
Although multiple datasets of EU legislation already exist (e.g. Hofmann, 2013: 102; Treib, 2014: 27), none of them suited the specific aims of our research. Therefore, we created two datasets for the task at hand, one containing major European legislation and one containing EPL evaluations (also see Mastenbroek et al., 2016).
The dataset of legislation covers the years 2000–2004. This period was chosen to give the Commission sufficient time to evaluate. While academic literature indicates that legislation is usually evaluated after about five years (Eijlander and Voermans, 2000: 355) and evaluation clauses in EU legislation also tend to give the Commission five years or less to evaluate, we decided to double this period to avoid concluding that any legislation has not been evaluated while an evaluation was in fact still upcoming. Therefore, our dataset of legislation stops at the end of 2004, but it should be emphasized that our article concerns decisions to initiate EPL evaluations over a period of 15 years (2000–2014), which is further explained by the description of our second dataset below.
Because the Commission follows the logic that the resources spent on an evaluation must be proportionate to the importance of a measure (European Commission, 2007: 22; 2015: 255–256), minor EU legislation does not have to be evaluated (European Commission, 2015: 253). Therefore, our dataset of legislation only includes major regulations and directives. We excluded all delegated and implementing acts, 1 which are generally considered less important than primary legislation (Franchino, 2007: 80), as well as all rectifications, amendments and secondary Council legislation. Because of the explicit link between evaluations and improving the effects of legislation on European citizens and companies (European Commission, 2007: 3; 2012: 2; 2013: 1–2), we also excluded legislation without direct relevance for national actors. This includes legislation that only addresses EU institutions or foreign countries. Together, the selection criteria led to a dataset of 277 major directives and regulations adopted in the period 2000–2004. Our dataset of evaluations (see below) contains only eight evaluations of legislation that we did not consider ‘major’ (2% of all evaluations), indicating that our selection criteria were fairly appropriate.
To assess the initiation ratio of evaluations, our dependent variable, we extracted information from a second dataset. This dataset contains 313 EPL evaluations of regulations, directives and treaty articles conducted or outsourced by the Commission between 2000 and 2014 (Mastenbroek et al., 2016: 1334–1335). 2 Evaluations completed before 2000 were omitted because of a lack of data, and evaluations merely studying prescriptions for foreign countries and EU institutions were excluded for the same reasons as discussed above. We also discarded those evaluation reports that merely summarize other evaluations.
The evaluations were gathered from different sources: the Commission's multi-annual evaluation overview (2010), the Commission's search engine for evaluations, 3 the Commission's work programmes, 4 EU bookshop, 5 annexes to the Commission's financial reports, 6 and lists of evaluations found on the websites of DGs. We checked our data using an existing list of evaluations produced by expertise centre Eureval, by running Google searches for evaluations of all major legislation adopted between 1996 and 2010, by searching for background documents of legislation in Eur-lex, 7 and by discussing our data-gathering method with the SG (for a further description of the dataset of EPL evaluations, see Mastenbroek et al., 2016: 1334–1335).
Operationalization
Starting with the enforcement motive, the type of legislation (hypothesis 1) was measured as a dichotomous variable (directive or regulation). The complexity of legislation (hypothesis 2) was measured through its number of recitals, as more complex legislation generally requires a larger number of explanations (Franchino, 2000: 74; Kaeding, 2006: 236; Steunenberg and Rhinard, 2010: 501; Treib, 2014: 26). Politicization in the Council (hypothesis 3) was measured by determining if a legislative proposal was on the Council's agenda as a B-point, as B-points represent the topics that are actively debated at the political level (Häge, 2007: 303; Hofmann, 2013: 126; König, 2008: 149).
Concerning the strategic motive, involvement of the European Parliament (hypothesis 4) was measured by looking at the formal procedure used to enact the legislation as stated by Eur-lex. In case of the ordinary legislative procedure (former codecision and cooperation procedures) this involvement was considered high, while in case of the consultation procedure it was considered low (Häge, 2007: 316). The voting procedure in the Council (hypothesis 5) was also measured as a dichotomous variable (QMV or unanimity) using Eur-lex.
Concerning the control variables, we searched each piece of law using specific keywords 8 to establish the presence of an evaluation clause (yes/no). We also checked the last five articles of each regulation or directive, as this is the most common place for evaluation clauses. Concerning evaluation capacity, we measured 12 indicators derived from a model developed by Nielsen et al. (2011: 326–330) via interviews with the European Commission (Van Voorst, 2017: 29–31). However, only the presence of a specialized evaluation (sub-)unit (yes/no) and the presence of evaluation guidelines (yes/no) could be established per DG per year and were, therefore, useful as indicators. For legislation that has been evaluated, the data used concern the year when the evaluation was published. For legislation that has not been evaluated, we assumed that this decision was made five years after the legislation was published (the modal value of the time between publication dates of legislation and publication dates of evaluations is six years in our dataset, from which we subtracted one year as evaluations usually take that long to conduct) and determined the scores for evaluation capacity accordingly. However, because this assumption of five years is somewhat arbitrary we also experimented with other time periods, which affected our results to some extent. 9
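The timing rule for assigning evaluation capacity scores can be sketched as a small function. This is a minimal illustration rather than the code used for the analysis: the function name, the argument names and the example years are hypothetical, and only the five-year default lag is taken from the text.

```python
from typing import Optional

# Assumed default: decision not to evaluate is taken five years after publication
# (modal gap of six years minus roughly one year needed to conduct an evaluation).
ASSUMED_DECISION_LAG_YEARS = 5


def capacity_reference_year(
    legislation_year: int,
    evaluation_year: Optional[int],
    lag_years: int = ASSUMED_DECISION_LAG_YEARS,
) -> int:
    """Return the year whose DG evaluation capacity scores are attached to a law.

    Evaluated legislation uses the publication year of its evaluation;
    non-evaluated legislation uses the publication year of the law plus the lag.
    """
    if evaluation_year is not None:
        return evaluation_year
    return legislation_year + lag_years


# Hypothetical examples: a 2002 law evaluated in 2009, and a 2002 law never evaluated.
print(capacity_reference_year(2002, 2009))  # 2009
print(capacity_reference_year(2002, None))  # 2007
```

Passing a different `lag_years` value mirrors the robustness check of experimenting with other time periods.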
The operationalization of all our variables is summarized in the table in the Online appendix. Because of the binary nature of our dependent variable, logistic regression was used for the analysis. The variables belonging to the two motives to evaluate were entered as blocs to allow for comparisons between the models.
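As a compact restatement of this specification, the full model can be written as follows. The variable labels are illustrative shorthand that does not appear in the original operationalization, and the grouping into blocs simply follows the two motives and the controls; which blocs enter which reported model is not specified here.

```latex
\begin{align*}
\operatorname{logit}\Pr(Y_i = 1)
  &= \beta_0
   + \underbrace{\beta_1\,\mathrm{Directive}_i + \beta_2\,\mathrm{Recitals}_i + \beta_3\,\mathrm{BPoint}_i}_{\text{enforcement bloc (H1--H3)}} \\
  &\quad
   + \underbrace{\beta_4\,\mathrm{EP}_i + \beta_5\,\mathrm{Unanimity}_i}_{\text{strategic bloc (H4--H5)}}
   + \underbrace{\beta_6\,\mathrm{Clause}_i + \beta_7\,\mathrm{Unit}_i + \beta_8\,\mathrm{Guidelines}_i}_{\text{controls}}, \\
\mathrm{OR}_k &= e^{\beta_k}.
\end{align*}
```

Here $Y_i = 1$ if piece of law $i$ has been evaluated at least once, and $\mathrm{OR}_k$ is the odds ratio associated with a one-unit increase in predictor $k$.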
Results
Out of the 277 major regulations and directives in our dataset, 116 have been evaluated ex-post. This is an initiation ratio of 41.9%, meaning that about six out of 10 major pieces of EU law from 2000 to 2004 have not (yet) been evaluated by the Commission. This initiation ratio is higher than the 33% found during earlier research about major legislation from 2000 to 2002 (Mastenbroek et al., 2016: 1338), indicating that legislation published during 2003–2004 was evaluated more often than older legislation. This is a sign that the proportion of legislation evaluated by the Commission may be increasing over time.
A few pieces of law in our dataset were evaluated multiple times over the years: 15 pieces of law were evaluated twice, four pieces of law were evaluated thrice and two pieces of law were evaluated four times. Due to the binary nature of our dependent variable, these pieces of law with more than one evaluation have no special impact on our analysis: they were simply coded as 1. Their number was also too low to conduct an additional analysis of the number of times that a piece of law was evaluated.
Figure 1 below depicts the initiation ratio per DG. The three DGs with the highest initiation ratios are DG Eurostat (71.4%), DG Competition (66.0%) and DG Internal Market (65.2%). DG Trade and DG Economic and Financial Affairs have not evaluated their few major pieces of law from 2000 to 2004 at all; besides these, the three DGs with the lowest initiation ratios are DG Energy (28.6%), DG Home Affairs (28.0%), and DG Agriculture (20.8%). The variation among DGs is included in the analysis through the evaluation capacity variables; the data do not suggest other patterns concerning the size or policy areas of the DGs that warrant investigation.
Figure 1. Initiation ratio per DG. Note: AG: agriculture; CM: competition; CN: communications and technology; DG: Directorates-General; EC: economic and financial affairs; EM: employment; ER: energy; ES: Eurostat; ET: enterprise and industry; EV: environment; HO: home affairs; JU: justice; ME: maritime affairs; MK: internal market; MV: transport; SA: health and consumers; TA: taxation; TR: trade. Some DGs have merged and/or changed their names since 2014.
Table 1. Results of the logistic regression.
AIC: Akaike information criterion.
Starting with the type of legislation, the first variable belonging to the enforcement model, Table 1 shows that the odds of being evaluated are about 2.05 times higher for directives than for regulations. In terms of predicted probabilities – which are easier to interpret than odds ratios – the chances of an evaluation taking place are 14.0 percentage points higher for directives than for regulations, if all other variables are kept at their observed values. In terms of descriptive statistics, out of the 141 major directives in our dataset, 51.8% have been evaluated; out of the 136 major regulations, only 31.6% have been evaluated. In line with hypothesis 1, these findings indicate that the Commission prioritizes evaluating directives over regulations.
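The relation between an odds ratio and a change in probability can be made concrete with a short sketch. It is an illustration only: the function name is hypothetical, and the baselines used are simply the raw share of evaluated regulations (31.6%) and an arbitrary 10%, not quantities taken from the fitted model. Because the reported difference in predicted probabilities averages over the observed values of all other variables, the output is not expected to reproduce that figure.

```python
def shift_probability(baseline_p: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    baseline_odds = baseline_p / (1.0 - baseline_p)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


ODDS_RATIO_DIRECTIVE = 2.05  # odds ratio for directives reported in the text

# The same odds ratio implies different probability changes at different baselines.
for baseline in (0.316, 0.10):  # 0.10 is an arbitrary illustrative baseline
    shifted = shift_probability(baseline, ODDS_RATIO_DIRECTIVE)
    print(f"baseline {baseline:.3f} -> {shifted:.3f} (change {shifted - baseline:+.3f})")
```

The point of the sketch is that an odds ratio has no single probability equivalent, which is why predicted probabilities are reported alongside the odds ratios.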
The complexity of legislation, the second variable belonging to the enforcement model, also significantly increases the chances of an evaluation occurring. For every extra recital, the odds of a piece of law being evaluated increase by about 3%. Figure 2 below presents the effect of this variable in terms of predicted probabilities. The figure shows that the chances of an evaluation occurring increase from about 0.3 to 0.7 as the number of recitals grows, with an average growth in predicted probability of 0.06% per recital, if all other variables are kept at their observed values. These findings are in line with hypothesis 2.
Figure 2. Effect of legislative complexity on the probability of an evaluation occurring.
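Because the effect of complexity is expressed per recital, it compounds multiplicatively on the odds scale. The following sketch illustrates this under strong simplifying assumptions: it takes the rounded per-recital odds ratio of 1.03 at face value and holds all other variables fixed, so it does not reproduce the figure, which averages over observed values. The function names are hypothetical.

```python
import math

PER_RECITAL_ODDS_RATIO = 1.03  # "about 3%" per extra recital, as reported in the text


def odds_multiplier(per_unit_odds_ratio: float, units: float) -> float:
    """Factor by which the odds change after `units` additional recitals."""
    return per_unit_odds_ratio ** units


def units_needed(p_start: float, p_end: float, per_unit_odds_ratio: float) -> float:
    """Number of extra recitals needed to move from p_start to p_end, other things fixed."""
    odds_start = p_start / (1.0 - p_start)
    odds_end = p_end / (1.0 - p_end)
    return math.log(odds_end / odds_start) / math.log(per_unit_odds_ratio)


print(round(odds_multiplier(PER_RECITAL_ODDS_RATIO, 10), 3))    # ten extra recitals
print(round(units_needed(0.3, 0.7, PER_RECITAL_ODDS_RATIO), 1))  # recitals from 0.3 to 0.7
```

Under these assumptions, ten extra recitals raise the odds by roughly a third, and moving from a probability of 0.3 to 0.7 corresponds to somewhat under 60 additional recitals.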
Politicization in the Council, the third variable belonging to the enforcement model, which was measured by whether the legislative proposal appeared as a B-point on the Council's agenda, is not significant. Accordingly, we reject hypothesis 3.
Turning to the political variables, the results in Table 1 show that neither the involvement of the EP nor the voting procedure in the Council provides a significant explanation for variation in the initiation of EPL evaluations. This means that hypotheses 4 and 5 are rejected. These results indicate that the chances of a piece of law being significantly amended do not affect the Commission's decision to evaluate it or not.
Conversely, both control variables turn out to be significant. Table 1 shows that the odds of a piece of law being evaluated become about 4.69 times higher if an evaluation clause is present as compared to legislation without such a clause. In terms of predicted probabilities, the chances of an evaluation taking place are 30.9 percentage points higher for legislation with an evaluation clause than for legislation without such a clause, if all other variables are kept at their observed values.
Despite the significance of this variable, it should be noted that only 92 out of 165 pieces of law with an evaluation clause (55.8%) were evaluated, while 24 out of 112 pieces of law without such a clause (21.4%) were evaluated as well. The first number shows that the Commission only complied with a little more than half of the evaluation clauses inserted in major legislation from 2000 to 2004, indicating that the presence of such clauses is not a guarantee that legislation will be evaluated. The numbers also show that the presence of evaluation clauses only explains a part of the variation in the initiation of EPL evaluations.
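The counts in this paragraph also allow a crude, unadjusted odds ratio to be computed for comparison with the adjusted estimate from Table 1. This is a minimal sketch: the helper name `crude_odds_ratio` is hypothetical, and the result ignores all other variables in the regression model, so it is not expected to match the adjusted figure exactly.

```python
def crude_odds_ratio(events_a, total_a, events_b, total_b):
    """Unadjusted odds ratio of group A relative to group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Counts reported in the text: 92 of 165 with a clause, 24 of 112 without.
clause_or = crude_odds_ratio(92, 165, 24, 112)
share_with_clause = 100 * 92 / 165
share_without_clause = 100 * 24 / 112

print(round(clause_or, 2))
print(round(share_with_clause, 1), round(share_without_clause, 1))
```

The crude ratio comes out close to the adjusted odds ratio of about 4.69, which suggests that the other variables in the model change the estimated effect of evaluation clauses only slightly.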
Table 1 also shows that both indicators for evaluation capacity are significant. The odds of a piece of law being evaluated are about 2.59 times higher for legislation of a DG with an evaluation unit than for legislation of a DG without such a unit, and 2.26 times higher for legislation of a DG that has evaluation guidelines than for legislation of a DG without such guidelines. In terms of predicted probabilities, the probability of an evaluation taking place is 18.3 percentage points higher in the first case and 15.9 percentage points higher in the second case, if all other variables are kept at their observed values.
When interpreting these results, however, it should be noted that high evaluation capacity may be a consequence as well as a cause of evaluation-related activities. For example, it is possible that DGs that initiate more EPL evaluations also invest more in evaluation guidelines to support such evaluative activities. It should also be noted that the results concerning evaluation capacity somewhat depend on our assumptions about the number of years after which it was decided not to evaluate certain legislation (as explained in our methodology section and Note 9). Therefore, more research is needed to establish the exact effect of evaluation capacity on the initiation of EPL evaluations in the EU.
Conclusion
This article has sought to describe and explain the variance in the initiation of EPL evaluations by the European Commission. Although the Commission officially endorses EPL evaluations (European Commission, 2007: 3; 2013: 11; 2015: 296), little was known about how systematically the institution conducts such evaluations in practice (but see Mastenbroek et al., 2016). This study aimed to shed light on this underexplored topic by developing a theoretical approach based on two motives for the Commission to evaluate – an enforcement and a strategic motive – while controlling for other potential explanations. We tested these explanations with the help of binary logistic regression, based on two self-developed datasets.
The results show that less than half of all major EU legislation from 2000 to 2004 (41.9%) was evaluated. However, the proportion of evaluated legislation has increased over time. Only a small proportion of the major legislation was evaluated more than once.
Concerning the enforcement motive, our results suggest that the odds of being evaluated are significantly higher for directives than for regulations, and that these odds also increase significantly as legislation becomes more complex. This indicates that the Commission prioritizes evaluating legislation for which the chances of non-compliance are relatively high, and that evaluations may at least partly be initiated to scrutinize member state implementation. Concerning the strategic motive, however, we did not find any significant results. This indicates that the risk of EU legislation being significantly amended does not affect its odds of being evaluated.
Two control variables also turned out to be significant. Firstly, the odds of legislation being evaluated increase significantly if that legislation contains an evaluation clause. However, our data also revealed that the Commission only complies with such clauses in about half of all cases. Secondly, the evaluation capacity of the DG that is responsible for the legislation significantly increases the odds of that legislation being evaluated.
In conclusion, our analysis indicates that the initiation of EPL evaluations by the Commission is best explained by a mix of its need to enforce EU legislation towards the member states, its formal obligations to evaluate and its evaluation capacity. However, these conclusions should be viewed in the light of two possible limitations of this research. Firstly, the quantitative nature of our study required us to use indicators that could be measured efficiently for a large number of cases. Some of these indicators may not entirely cover the abstract concepts that they are supposed to represent, such as evaluation capacity and politicization. Therefore, to sustain the conclusions of this article, a follow-up case study with more sophisticated indicators would be useful.
A second limitation of this study is its time period. As explained above, EPL evaluations may be conducted a decade or more after a piece of law enters into force, so we could not yet assess the extent to which legislation from after 2004 has been evaluated without risking a bias in our data. Whereas our dataset of 277 major regulations and directives (initiated by seventeen DGs) is so broad that our findings are probably not affected by any particular political choice made from 2000 to 2004, it still seems worthwhile to repeat this research in the future to assess to what extent post-2004 legislation is evaluated.
Two other possibilities for future research stand out. Firstly, since this article showed that the Commission does not always comply with evaluation clauses, a follow-up study about the reasons for this seems worthwhile. Secondly, it could be examined to what extent the factors presented in this study also explain variance in the quality of EPL evaluations, as this is another important characteristic of a proper evaluation system and previous research has shown that the quality of the Commission's EPL evaluations varies greatly (Mastenbroek et al., 2016: 1340–1341).
Footnotes
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplementary material is available for this article online.
