Abstract
This study investigates the impact of an extended school week in 30 primary schools on track recommendations, an achievement-related outcome, at the school level. The intervention is part of a major urban policy in one of the most disadvantaged areas in the Netherlands. Register data are used to examine changes in track recommendations between 2010 and 2020. Control schools are selected through genetic matching based on educational and demographic variables, after which comparative interrupted time series estimate the causal effect. The findings show that, on average, track recommendations in the intervention schools did not structurally increase compared to the control schools. We critically discuss how these results might relate to implementation barriers such as the partial top-down design, program quality and coordination, and teacher shortages. We contribute to the literature by showing how difficulties with program implementation are likely to produce null findings.
Introduction
The extent to which a prolonged school week enhances educational outcomes is debated (Granger, 2008). While meta-analyses of extended school time interventions demonstrate that they have small positive effects on academic outcomes, on average, there is significant variation in effect sizes and significance levels (Apsler, 2009; Dietrichson et al., 2017; Durlak et al., 2010; Feldman & Matjasko, 2005; Kraft & Novicoff, 2024; Lauer et al., 2006; Lester et al., 2020; Patall et al., 2010; Roth et al., 2010). More research is therefore needed on the conditions under which extended school time leads to better educational outcomes. This study provides an evaluation of the National Program Rotterdam South (Nationaal Programma Rotterdam Zuid: NPRZ), a high-profile public policy aimed at raising the socioeconomic level in one of the poorest areas in the Netherlands (Custers et al., 2023; Municipality of Rotterdam, 2011). A main program component involves extending the school week in 30 primary schools, with the expectation that more time in school will lead to higher track recommendations. Track recommendations, which are given at the end of primary school, determine which track level students will follow in secondary school and are therefore an indicator of academic ability.
Our study contributes to the literature on extended school time in two ways. First, the literature is dominated by studies from the United States, providing little insight into how the effects of extended school time vary across educational systems and urban contexts. These effects may diverge due to different curricula, policies, and state support (Long, 2014). Several studies on extended school time have been conducted in countries other than the United States, such as Chile (e.g., Bellei, 2009), Germany (e.g., Steinmann et al., 2019), Italy (e.g., Battistin & Meroni, 2016), and Switzerland (e.g., Schuepbach, 2015), but only a few studies have investigated extended school time in the Dutch context (Leuven et al., 2010; Meyer & Van Klaveren, 2013).
Second, we show how the wider policy context and issues of implementation potentially affect program effectiveness. Implementing an extended school week in 30 schools has proven challenging owing to the complex configuration of the educational field (Custers et al., 2023; Dol et al., 2019; Kruiter et al., 2020; van der Sluis et al., 2017; Waslander et al., 2016). The NPRZ is an ambitious program, but its partial top-down design has not been unequivocally welcomed by lower-level actors such as teachers and parents. Schools also have relatively much autonomy regarding the content of the extended school time, which may affect uniformity within the program. Furthermore, the program has operated in the context of an emerging national teacher crisis (Inspectorate of Education, 2019) and it has been difficult to attract qualified staff for the extra hours (Kruiter et al., 2020). Such insights on how the educational field functions contribute to understanding how factors outside schools, or system aspects (Hopkins et al., 2014; Spillane et al., 2019), relate to school effectiveness and what barriers to successful program implementation might exist. In addition, we illustrate that comparing Dutch schools in research is a meticulous exercise, as schools are subject to various policies and interventions (Dronkers, 1995; Karsten, 1994; Ladd & Fiske, 2011; Waslander et al., 2016).
The first goal of this study is to discuss how the complexity of the Dutch educational landscape hampers effective implementation of the NPRZ policy, and the second goal is to conduct an analysis of the program’s effectiveness regarding track recommendations. The remainder of the article is structured as follows. The literature review presents key findings on how extended school time contributes to academic outcomes, a brief explanation of track recommendations in the Dutch educational system, and a discussion of the implementation of the NPRZ policy. Next, the research design is introduced after which the matching and main analyses are presented. In the final section we discuss the results, how they relate to the program’s implementation, and the challenges for studying such comprehensive programs.
Key Findings on the Effects of Extended School Time
Studies on the effectiveness of extended school time have various foci, including comparing differences in school time between educational systems, modeling policy changes, and evaluating specific interventions. Reviews focus on extensions of the school day or year (e.g., Patall et al., 2010), after-school programs (e.g., Durlak et al., 2010), summer programs (e.g., Lauer et al., 2006), and extracurricular activities (e.g., Feldman & Matjasko, 2005). These terms overlap in what activities they designate, although there are also clear differences (e.g., summer programs and after-school programs taking place at different moments during the year). We mainly discuss findings from evaluations and intervention studies of prolonged school days and after-school programs. Extended school time is defined here as extra time for organized activities such as learning, social events, culture, or sports in school or school-related settings.
Reviews of extended school time interventions do not provide a coherent view of their effects on academic outcomes. Most reviews report small to moderate effect sizes (Durlak et al., 2010; Kraft & Novicoff, 2024; Lauer et al., 2006; Lester et al., 2020; Patall et al., 2010), but others indicate almost no significant effects (Dietrichson et al., 2017; Roth et al., 2010). The varying conclusions may be due to differences in inclusion criteria for relevant studies and methodologies used. Published studies find higher effect sizes than unpublished studies, and review studies provide different views on how study quality is related to effect size (e.g., Lauer et al., 2006; Lester et al., 2020).
In general, however, it can be observed that most programs, on average, seem to have little or zero net-effects, whereas only some are quite successful (Granger, 2008). More time in school, especially time focused on instruction and learning, can lead to higher achievement, but this also depends on how much time students already spend in school. If the school day is already long, the marginal returns of extra hours diminish (Kraft & Novicoff, 2024). Since there is a large variety in program effectiveness, reviews discuss mechanisms to understand why certain programs are more effective than others (e.g., Feldman & Matjasko, 2005). Durlak et al. (2010, p. 295) group effective intervention elements under the acronym SAFE (Sequenced, Active, Focused, and Explicit). Successful programs have a step-by-step approach, emphasize active learning, dedicate specific time to learning, and have clear goals. In line with the SAFE-features the quality of instruction is crucial, which is determined by factors such as staff training, evidence-based teaching methods, content tailored to student skill levels, and alignment with the school curriculum (Fashola, 1998; Smith et al., 2010). Extended school time with smaller group sizes and higher involvement of tutors also leads to better results as students receive more personal attention (Dietrichson et al., 2017).
Furthermore, the breadth of activities during extended school hours is relevant for academic achievement. Research suggests that programs incorporating more structured educational activities positively correlate with improved educational outcomes (Durlak et al., 2010; Feldman & Matjasko, 2005). However, there is ongoing debate regarding the academic impact of combining academic and social activities during extended school hours. Combining different activities may serve to increase student motivation by providing a harmonious blend of academic challenge and leisure. Students also have opportunities to cultivate their interests and skills in areas beyond academics (Bohnert et al., 2010; Lauer et al., 2006).
The National Program Rotterdam South
The NPRZ is a landmark urban policy in the Netherlands. It targets the Rotterdam South district, a densely populated area of approximately 200,000 inhabitants and one of the most socioeconomically disadvantaged areas in the country (Custers et al., 2023). The policy was initiated in the early 2010s, following a government commission’s findings that the levels of disadvantage in Rotterdam South were of “unDutch” proportions and that previous social policies had been insufficient in addressing issues of poverty (Municipality of Rotterdam, 2011). A long-term program spanning two decades was therefore developed, which involves collaboration between multiple stakeholders, including the national government, local authorities, educational institutions, housing corporations, and employers. The program aims to stimulate socioeconomic progress in the area by leveraging significant national funding. The main goal of the NPRZ is that Rotterdam South should acquire the same socioeconomic standing as the four largest cities in the Netherlands (i.e., Amsterdam, Rotterdam, The Hague, and Utrecht) through investments in education, housing, and the labor market.
The Harlem Children’s Zone (HCZ) serves as an inspiration for the NPRZ (Dol et al., 2019). The HCZ has been successful in improving academic outcomes among poor and mostly black children (Dobbie & Fryer, 2011), although such urban charter schools have also been criticized for limiting the development of higher-level skills among students (Golann, 2015). The HCZ follows a “pipeline” model in which children and their families receive high-quality services from birth to college. School and community programs focus on extending the school day and year, attracting high-quality teachers, offering different support services, and fostering a culture of achievement. In the spirit of the HCZ, the NPRZ seeks to mimic the “pipeline” model to foster a new generation of residents with higher educational degrees and higher-skilled jobs (Municipality of Rotterdam, 2011; National Program Rotterdam South [NPRZ], 2012). In this way, together with attracting middle and higher classes to the area through newly built housing (see Custers & Engbersen, 2022), the NPRZ aims to achieve its goal of raising the socioeconomic status of Rotterdam South.
The CZ and the Extended School Week
The NPRZ has created a Children’s Zone (CZ) in seven so-called “focus” neighborhoods because here social problems and disadvantage would accumulate the most. The choice for these neighborhoods was mostly based on socioeconomic indicators that demonstrate they have lower scores, on average, than the whole of Rotterdam South and Rotterdam. The seminal document reports a list of 19 indicators—including indicators such as the level of unemployment, single parent households, dropouts, low educated parents, and average test results—that is used to argue why intervention in this specific area is needed (Municipality of Rotterdam, 2011). The choice for including schools in the focus neighborhoods resulted from discussion during the educational round table (see further below) and with other stakeholders in the NPRZ (Kruiter et al., 2020; Municipality of Rotterdam, 2011). However, it has been questioned whether the focus neighborhoods are truly distinctive (Custers et al., 2023), since their socioeconomic profile resembles other disadvantaged neighborhoods in Rotterdam (Custers & Engbersen, 2022) and the Netherlands (The Netherlands Institute for Social Research, 2019). Rotterdam South does traditionally have a strong stigma, but this applies to the whole district and not the seven focus neighborhoods in particular. Moreover, other disadvantaged neighborhoods in Rotterdam also carry a stigma (e.g., Burgers & Kloosterman, 1996). Thus, the schools in NPRZ CZ do not seem to differ from other schools outside the CZ in some unobservable way, which further enables comparison between schools inside and outside the CZ.
Multiple interventions have been introduced to replicate the HCZ model, such as investment in preschooling, neighborhood care teams, professionalization of teachers and school management, and an extended school week (NPRZ, 2012). The NPRZ CZ thus consists of a bundle of interventions, although it is unclear for some interventions to what degree they are implemented and how this differs from similar interventions in other parts of the city (Rotterdam Court of Audit, 2016). The two main interventions in the educational pillar include the extended school week and additional support by neighborhood care teams, where the former centers on school activities and the latter on care at home that is provided through school contacts (NPRZ, 2022). Considering school achievement in primary education, the extended school week is a primary priority, as evidenced not only by the program itself (e.g., NPRZ, 2022) but also by the attention given in monitoring studies (e.g., Dol et al., 2019; Kruiter et al., 2020). About 18 million Euros was allocated annually to the CZ in the period 2019 to 2022 (Municipality of Rotterdam, 2019), of which 11 million Euros was spent on the extended school week (Municipality of Rotterdam, 2020). In addition, in the media the implementation of the extended school week is frequently presented as the main policy achievement of the NPRZ CZ, which has also sparked off debate on its content and effectiveness (e.g., Venema, 2019). In sum, there are multiple interrelated interventions in place in the NPRZ CZ. Since the extended school week is the main intervention for improving educational outcomes in primary education, we will further focus on this intervention.
The official goals of the extended school week are that it should contribute to good educational results, broad education, and social-emotional development of children (Kruiter et al., 2020, p. 7). At the start of the program, educational results (i.e., higher achievement levels) were the main aim (e.g., NPRZ, 2012), but over time the goals have been formulated more broadly (e.g., NPRZ, 2023). For the extended school week, the 30 primary schools in the CZ provide 6 to 10 hr extra schooling per week. Since the typical school week consists of 22 to 25 hr of schooling, this extension entails a substantial increase in the length of the school week (cf. Kraft & Novicoff, 2024). The NPRZ aimed for 10 hr from the start, but in practice most schools offered 6 hr in the first years of the program. Most schools have administered the full 10 hr since 2019 (Kruiter et al., 2020). The extended school week started in different years across schools. However, at least 23 schools had started the program in the school year 2013/2014 (Rotterdam Court of Audit, 2016). In the starting year the extended school week was implemented in grade six, after which the intervention was gradually extended to lower grades in the following years.
During the extra hours schools provide a wide range of activities, including lessons in mathematics, language, culture, arts, sports, lifestyle, and nature. Some hours involve learning core subjects such as mathematics and language, while other hours contain leisure or activities targeted at personal and socio-emotional development. For instance, a “typical” school day might have one additional hour during the break in which external staff entertains students so that regular teachers have some rest or more time to prepare lessons. Another extra hour might then take place after school in which students receive homework guidance. The implementation differs between schools because they more or less freely choose the timing and content of the extra hours. Yet, in practice all schools adopt a mix of cognitive learning and broader development (Kruiter et al., 2020). Almost all activities in the extra hours are provided by external organizations. Schools receive funding to employ organizations that offer school services and accordingly choose which program aligns with their own organization and curriculum. Finally, the NPRZ requires that participation in the extended school week is mandatory to all students (Kruiter et al., 2020).
The intervention cannot be strictly classified as extended school time (e.g., more time for instruction by teachers) or an after-school program (e.g., extracurricular activities after school), since schools are flexible in how they use the extra hours. In some schools there might be more time for instruction because some lessons are provided by additional staff, while other schools might opt for leisure activities after school. This can also vary within schools between different days. Unfortunately, it is not possible to keep track of all the choices schools make in this regard.
Difficulties with Implementation
Implementation plays a crucial role in understanding why some programs are more effective than others. The literature on implementation identifies multiple factors that influence how implementation affects program outcomes (e.g., Durlak & DuPre, 2008). Although we cannot review all these factors given the scope of this paper, it is relevant to discuss the implementation of the NPRZ intervention.
Implementing the extended school week in 30 primary schools has been complicated due to the governance of the educational field (Dol et al., 2019; Kruiter et al., 2020; Rotterdam Court of Audit, 2016). The Dutch educational system is one of the most decentralized and complex systems in the world, owing to the diverse landscape of public and private schools and the involvement of numerous other institutions and organizations (Waslander et al., 2016). Schools and school boards, being legal authorities that represent groups of schools based on common denomination, are subjected to national legislation and regulations regarding funding, inspection, and examinations. School boards are also partly responsible for implementing legislation and regulations. In general, however, they have a high level of autonomy, meaning they decide on issues such as teaching materials, the appointment of teachers, and the structure of the school week (Karsten, 1994). This autonomy entails that governments cannot implement policy programs without the approval and cooperation of school boards.
In addition, in the Dutch educational system parents have free school choice. Religion was a decisive factor in school choice, but nowadays other factors such as location, school quality, and composition are more important (Denessen et al., 2005). Thus, schools must take parental preferences into account in shaping their education. Parental choice has therefore resulted in strong competition between schools to attract students because schools are funded based on student numbers (Dronkers, 1995).
Against this background, the NPRZ had to get multiple school boards on board to create a CZ in the focus neighborhoods. In the spirit of the Dutch neo-corporatist tradition, an “educational round table” was created, comprising several stakeholders (school boards, the NPRZ bureau, municipal policy advisors, program coordinators, and educational providers). School directors, teachers, and parents are not structurally part of this round table. The round table frequently meets to discuss the extended school week intervention. During these meetings it was decided that schools should provide 10 hr of extra schooling (which have been fully realized since 2019) but that schools can more or less freely choose how they spend these hours (Kruiter et al., 2020). Some schools, however, feel that the program has been forced upon them by the NPRZ bureau and their school boards: “Schools have experienced the implementation of the extended school week as an accomplished fact that could not be influenced. This seems to point toward a missing link between the round table on the one hand, where concepts and plans are discussed, and schools on the other hand [own translation]” (Kruiter et al., 2020, p. 10). Dol et al. (2019) further note that even though schools have much discretion in how they fill in the extra hours, they objected to the strict prescription of 10 hr. The extended school week thus has a partial top-down design, since a group of administrators and policymakers initiated the intervention and decided on the number of hours. On the other hand, schools have considerable autonomy concerning the content and practicalities of the extra hours. In sum, the design of the program created tension from the start.
Several problems emerged during implementation. First and foremost, in the past decade a national teacher “crisis” has developed in the Netherlands, leading to an overall shortage of available teachers (Den Brok et al., 2017; Inspectorate of Education, 2019). The teacher shortage in Rotterdam South was estimated at 212 FTEs in 2022 (NPRZ, 2023). The consequences of this teacher shortage manifest at schools with high shares of migrants (Inspectorate of Education, 2019), of which many are in the NPRZ CZ (Custers et al., 2023). Since the extended school week requires a lot of coordination and administration, it has been perceived as an extra “burden” for some schools because they were already dealing with staffing issues. It has been suggested that the working environment in CZ schools is a reason for some teachers to leave, although this has not been thoroughly researched (Kruiter et al., 2020).
Second, because most schools do not have the capacity to organize the extra hours, external educational providers carry out this task. There is, however, considerable variation between providers in terms of costs and quality of services provided. Schools note that the staff of external providers does not always possess sufficient pedagogical and teaching skills (Kruiter et al., 2020). In addition, the coordination between regular staff and external staff causes issues sometimes, for instance when children misbehave in the extra hours and school teachers have to deal with it afterward. As a response to such issues, “coalition supporters” were hired to smoothen the coordination even though this introduces another actor. The proliferation of involved actors and the variable quality of external staff thus hampers effective implementation.
Finally, from the start parents have not been actively involved in the design and implementation of the program. Some parents therefore do not understand its necessity or complain about its mandatory character (Dol et al., 2019; Kruiter et al., 2020). There is also variation between schools in how strictly the mandatory participation is enforced (Kruiter et al., 2020). The lack of parental involvement can be problematic, as it negatively affects children’s motivation to actively participate (cf. Weitzman et al., 2008).
Track Recommendations
The Dutch educational system is a track allocation system that includes two major transitions. The first transition is from primary to secondary education, whereas the second one is from secondary to tertiary education. The first transition in particular is considered pivotal in the student’s educational career. Students transfer from primary to secondary school after sixth grade; tracking thus occurs at a relatively early age (around age 12). Secondary school consists of four tracks for which students receive a track recommendation. These four tracks include: practical education (PRO), pre-vocational education that includes four levels of differentiation (VMBO), senior general secondary education (HAVO), and pre-university education (VWO). The tracks largely determine what level of education students eventually obtain (Tolsma & Wolbers, 2010). For instance, the VWO track gives access to university while with the VMBO track students enter secondary vocational education. Thus, track recommendations strongly relate to later educational outcomes and are therefore an important aspect of the educational career.
The track recommendation reflects the teacher’s opinion of the student’s expected future achievement level, that is, which track is most suitable given a student’s aptitude (de Boer et al., 2010). Track allocation is based on a combination of test scores on a standardized national exam and the teacher’s view of the student. Up to 2014 the exam was first administered followed by the teacher’s track recommendation. Since then, the teacher provides a preliminary recommendation that can be adjusted based on test performance. This policy change gives teachers more autonomy in the final track recommendation, which ultimately decides which track students will follow in secondary school. Although the teacher’s track recommendation is an indication of the student’s abilities, it is also prone to the teacher’s subjectivity (Batruch et al., 2023). We will discuss the implications of using track recommendations as the dependent variable in the final section.
Research Design
No evaluation study was planned to test the effectiveness of the extended school time intervention. A monitor keeps track of educational outcomes in the NPRZ CZ, but it cannot separate the impact of the intervention from confounding factors (Boom et al., 2022). The literature, however, offers several avenues on how the effect of the program can be estimated. One feasible research approach is creating a matched school-level dataset, after which short comparative interrupted time series (CITS) can be performed. Such a design retrospectively investigates school-level interventions with register data at relatively low cost (Hallberg et al., 2018, 2020; Stuart, 2007). Within-study comparisons have indicated that, when properly conducted, quasi-experimental designs using CITS are valid alternatives to experimental designs such as a randomized controlled trial (e.g., Hallberg et al., 2020; St. Clair et al., 2016).
The research strategy proposed by Hallberg et al. (2018) was adopted as follows. We investigate student cohorts in their eighth and final year of primary school (i.e., grade 6). Track recommendations are used to calculate the average level of recommendations at the school level. The System of social statistical datasets (SSD) contains data on the track recommendations along with other information on students’ characteristics, such as ethnicity, parental education, and household income (Bakker et al., 2014). These data are aggregated to create a school-level longitudinal dataset for all schools in the Netherlands including cohorts in three preintervention years (2010/2011–2012/2013) and seven postintervention years (2013/2014–2019/2020).
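To make the logic of a comparative interrupted time series concrete, a generic linear-baseline specification is sketched below. This is a textbook-style illustration, not the model estimated in this study: the symbols, the linear pre-trend, and the single post-intervention level shift are all assumptions made for exposition, and the actual specification may differ (for example, by including the control variables or year-specific effects).

```latex
% Illustrative CITS specification (assumed notation, not taken from the study).
% Y_{st}   : average ISLED track recommendation of school s in cohort t
% NPRZ_s   : 1 for intervention schools, 0 for matched control schools
% Time_t   : cohort year, centered at the last preintervention year
% Post_t   : 1 for cohorts from 2013/2014 onward, 0 otherwise
\begin{equation}
Y_{st} = \beta_0 + \beta_1\,\mathrm{Time}_t + \beta_2\,\mathrm{NPRZ}_s
       + \beta_3\,(\mathrm{NPRZ}_s \times \mathrm{Time}_t)
       + \beta_4\,\mathrm{Post}_t
       + \beta_5\,(\mathrm{NPRZ}_s \times \mathrm{Post}_t)
       + \varepsilon_{st}
\end{equation}
```

In this sketch, $\beta_4$ captures the post-intervention deviation from the baseline trend in control schools, and $\beta_5$ captures how much more (or less) the intervention schools deviate from their own baseline trend, which is the quantity a CITS design interprets as the program effect.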
The complete dataset includes 20 schools from the NPRZ CZ and 5,145 potential control schools. Not all schools could be identified due to data availability. The estimated treatment effect thus only covers a selection of schools, that is, 20 out of 27 eligible schools for this study. 1 Initial analysis, however, showed that this selection includes schools from different denominations and neighborhoods, thereby being representative of all NPRZ CZ schools. 2 The NPRZ organization confirmed that these schools had started with the extended school week in 2013/2014. Furthermore, different estimates are provided based on data availability.
Data and Method
Statistics Netherlands manages the SSD that includes data from administrative sources in the Netherlands, such as the population register, educational institutions, and the tax authority. Under strict legal and ethical restrictions, scholars can access individual-level pseudonymized data to conduct scientific research. Statistics Netherlands checks research output to prevent privacy violations. Below we describe the school-level variables and SSD files.
Track recommendation—the dependent variable is the school’s average ISLED score (the International Standard Level of Education) (SSD file: INSCHRWPOTAB). We classified the track recommendations into four categories, which were subsequently transformed into ISLED scores. This transformation into a continuous scale of 0 to 100 increases its interpretability in an international context (Schroder & Ganzeboom, 2014). The following scores are included: lower pre-vocational education (LPV) (PRO/VMBO basis-kader, ISLED = 29.34), upper pre-vocational education (UPV) (VMBO gemengd-theoretisch, ISLED = 45.27), senior general secondary education (SGS) (HAVO, ISLED = 62.3), and pre-university education (VWO, ISLED = 71.92). When students receive a double recommendation, the highest one is chosen (cf. Boom et al., 2022).
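The construction of the dependent variable can be illustrated with a minimal Python sketch. The ISLED scores and the rule for double recommendations are taken from the description above; the function names and the example students are hypothetical and serve only as an illustration.

```python
from statistics import mean

# ISLED scores per track-recommendation category, as listed in the text.
ISLED = {"LPV": 29.34, "UPV": 45.27, "SGS": 62.3, "VWO": 71.92}


def student_score(recommendations):
    """ISLED score for one student.

    `recommendations` is a tuple of category codes; for a double
    recommendation the highest score is used.
    """
    return max(ISLED[code] for code in recommendations)


def school_average(students):
    """Average ISLED score over all students in one school cohort."""
    return mean(student_score(recs) for recs in students)


# Hypothetical cohort of four students (not real data).
example_school = [("LPV",), ("UPV", "SGS"), ("VWO",), ("UPV",)]
avg = school_average(example_school)
print(round(avg, 4))
```

The student with the double recommendation (UPV/SGS) contributes the SGS score of 62.3, so the school average here is the mean of 29.34, 62.3, 71.92, and 45.27.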
We also considered comparing schools on standardized test scores, but a policy change in 2014/2015 enabled schools to use standardized tests from providers other than CITO, the main test provider until then. As most schools in the NPRZ CZ adopted other tests, we are unable to compare test scores over time.
Parental education—parental education is based on the highest level of obtained education by either of the parents (SSD file: HOOGSTEOPLTAB). We match on the school’s share of higher educated parents (higher professional or university education), lower educated parents (lower than vocational level MBO-2), and missing values on this variable. Missing data on the level of education tends to be high in the SSD, especially in earlier cohorts.
Household income—the disposable standardized household income is divided into percentile scores following the national income distribution (SSD files: IHI/INHATAB). The school’s household income level is the average percentile score.
Household wealth—a variable similar to household income, but then based on the household wealth (SSD file: VEHTAB). Wealth position reflects the assets of a household minus the level of debt.
School size—school size is the number of pupils in grade six at a particular school. The data only allow school size, not class size, to be calculated. Schools may have multiple classes in grade six.
School retention—retention is the number of years that children were enrolled at the same school (SSD file: INSCHRWPOTAB). The school retention rate is the average number of years children spend at a school.
Ethnicity—the ethnicity of children follows the Dutch state classification, meaning that children have a migration background when they or one of their parents is born abroad (SSD file: GBAPERSOONTAB). We match on the school’s share of native students, Surinamese students, Turkish students, and Moroccan students.
Denomination—the SSD provides a list of 27 school denominations (SSD file: INSCHRWPOTAB). These denominations were recoded to four categories: Public, Catholic, Protestant, and Other.
In addition, two variables are not used for matching but are included as control variables in the CITS. First, around 2015 many Syrian refugees fled to the Netherlands. Refugees or new migrants tend to have lower educational results because they may suffer from trauma, have a gap in their educational career, and lack a support network, which in turn affects the school’s level of track recommendations (Golsteyn et al., 2024). We therefore control for the share of Syrian students (SSD file: GBAPERSOONTAB). Second, track recommendations were slightly influenced by some of the new tests introduced in 2014 (Jacobs et al., 2024). We control here for the type of test used in every school year (SSD file: INSCHRWPOTAB).
Dataset Construction
The SSD consists of separate files and includes individual-level data from various sources. We combine these files using pseudonymized individual identifiers. Some loss of data occurred because individuals could not be matched between files, but this share was very small (0.5% of the target population). Some missing values were present for household income (0.7%) and household wealth (0.5%); the affected cases were excluded through listwise deletion.
A more substantial loss of data occurred for two different reasons. First, we identify intervention schools in the SSD through unique school identifiers and neighborhood location. Not all schools could, however, be matched to their respective neighborhoods. In addition, over time schools can move location, close or merge with other schools. Consequently, their identifiers often change or are absent from the data, making it difficult to identify schools across years. This resulted in a loss of seven intervention schools, leaving 20 schools with the same identifier for the period of study. For the potential control group, the number of unique school identifiers dropped from 7,145 to 5,947. Note, however, that this does not mean that as many schools were excluded. Schools can have multiple identifiers across different years, meaning that, for instance, two excluded school identifiers are associated with one school.
Second, for the cohorts 2010/2011 and 2011/2012 there was considerable missing data on the dependent variable (8.5% and 5.5%, respectively), alongside smaller amounts of missing data for other years. Inspection of the data showed that these missing values were mainly concentrated in a subgroup of schools. For instance, more than 70% of all individual-level missing data originated from schools that had more than 50% missing values on the dependent variable. This observation indicates that some schools did not provide adequate data for these cohorts to the Inspectorate of Education. Therefore, a threshold was chosen whereby schools should have at least 85% valid scores on the dependent variable in all school years. The 20 NPRZ CZ schools met this threshold for 2011/2012, but five schools had too many missing values for 2010/2011. We therefore decided to construct two separate datasets based on data availability. The first dataset includes 20 NPRZ CZ schools and 5,145 potential control schools, and has valid data for the period 2011/2012 to 2019/2020. The second dataset includes 15 NPRZ CZ schools and 4,700 potential control schools for the period 2010/2011 to 2019/2020. In this step, 802 schools (first dataset) and 1,252 schools (second dataset) were deleted. Analysis of demographic variables shows that there are some small differences between the schools deleted in this step and the schools that remain, but overall there seems to be no selection bias here (see Appendix). Hence, the first dataset incorporates more schools but fewer pre-trend datapoints, whereas the second dataset has fewer schools but more pre-trend datapoints (cf. Hallberg et al., 2018). We thereby provide different estimates for the program’s effectiveness.
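The 85% validity threshold, and the way the choice of starting year changes which schools are retained, can be sketched as follows. The record layout, the function names, and the toy counts are hypothetical; treating a school with no pupils in a required year as failing the threshold is also an assumption of this sketch.

```python
def schools_meeting_threshold(records, years, min_valid=0.85):
    """Return the schools with at least `min_valid` valid scores in every year.

    records: iterable of (school_id, year, score) where score is None if missing.
    """
    counts = {}  # (school_id, year) -> [number of valid scores, number of pupils]
    for school_id, year, score in records:
        tally = counts.setdefault((school_id, year), [0, 0])
        tally[1] += 1
        if score is not None:
            tally[0] += 1

    kept = set()
    for school_id in {school for school, _ in counts}:
        passes = True
        for year in years:
            valid, total = counts.get((school_id, year), (0, 0))
            # Assumption: a year without any pupils counts as failing.
            if total == 0 or valid / total < min_valid:
                passes = False
                break
        if passes:
            kept.add(school_id)
    return kept


def make_cohort(school_id, year, n_valid, n_missing):
    """Build toy pupil records for one school and school year."""
    return [(school_id, year, 50.0)] * n_valid + [(school_id, year, None)] * n_missing


# Toy data: school B has many missing scores in the earliest year only.
records = (
    make_cohort("A", 2010, 18, 2) + make_cohort("A", 2011, 20, 0)
    + make_cohort("B", 2010, 10, 10) + make_cohort("B", 2011, 20, 0)
)

longer_window = schools_meeting_threshold(records, years=[2010, 2011])
shorter_window = schools_meeting_threshold(records, years=[2011])
print(sorted(longer_window), sorted(shorter_window))
```

In this toy example the longer window keeps fewer schools, which mirrors the trade-off between the two datasets: more pre-trend datapoints come at the cost of fewer schools.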
Analytical Strategy
The CITS design serves to estimate the causal effect of an intervention, whereby the treatment effect is the difference in outcomes between the intervention and control group. Two key analytic decisions must be made in its application (Hallberg et al., 2018, 2020). First, a suitable control group is needed that matches the intervention group as closely as possible before the intervention takes place. In an ideal situation, the extended school week would be randomly assigned to schools within a group of eligible candidates, but this did not happen here. This study therefore adopts a quasi-experimental approach in which the control group is selected on observed characteristics, but schools may still differ on unobserved characteristics.
We chose covariates for matching based on whether they potentially relate to both the treatment assignment and the outcome. First, we match on socioeconomic and ethnic variables because educational outcomes are often lower among disadvantaged and minority groups, which is also the reason why schools with high shares of these groups are targeted for interventions (e.g., Dietrichson et al., 2017). This is also the case in the NPRZ, where the level of neighborhood disadvantage was the main driver behind the selection of schools. Next to the socioeconomic variables (parental education, income, and wealth), we selected four ethnic groups because they are the largest ones (thereby strongly influencing track recommendations at the school level), there is considerable variation between these groups in educational attainment (e.g., Custers et al., 2023), and the number of associated variables was considered parsimonious for the matching procedure. In addition, even though retention rate, school size, and denomination were not directly relevant in selecting the intervention schools, previous research has shown that they may influence the outcome (e.g., Custers et al., 2023; Leithwood & Jantzi, 2009).
The best covariate balance was achieved by applying genetic matching, which uses a search algorithm that iteratively checks and improves covariate balance. It is a generalization of propensity score (PS) and Mahalanobis distance (MD) matching (Diamond & Sekhon, 2013). Covariate balance was assessed by checking the standardized mean differences (SMDs), the variance ratios (VRs), and the improvements in these statistics between the unmatched and matched samples. The matching was performed using the “MatchIt” package in R, version 4.5.3 (Ho et al., 2011). We used one-to-one nearest neighbor matching without replacement, and the “pop.size” argument was set to 10,000 for the algorithm to reach optimal values regarding the scaling factors. For the first dataset we matched on all variables for separate years except the propensity score and denomination, because the latter is constant (26 variables in total). However, for the second dataset this would result in too many variables for matching. We therefore decided to average all continuous variables over the three pre-trend years, except for the ISLED scores. Overall, the matching procedure led to a good covariate balance between the intervention and control group. The matching results can be found in the Appendix.
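The two balance statistics mentioned above can be illustrated with a short sketch. It is a generic illustration in Python, not the output of the R package used in the study. Standardizing by the pooled standard deviation is an assumption of this sketch (software may instead standardize by the standard deviation of the intervention group), and all values are made up.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Difference in means divided by the pooled standard deviation (assumed choice)."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


def variance_ratio(treated, control):
    """Ratio of sample variances; values near 1 indicate similar spread."""
    return variance(treated) / variance(control)


# Toy school-level covariate, e.g. the share (%) of lower-educated parents.
treated = [30, 35, 40, 45]
unmatched_controls = [10, 12, 15, 11]
matched_controls = [31, 34, 41, 44]

smd_before = standardized_mean_difference(treated, unmatched_controls)
smd_after = standardized_mean_difference(treated, matched_controls)
vr_after = variance_ratio(treated, matched_controls)
print(round(smd_before, 2), round(smd_after, 2), round(vr_after, 2))
```

A good match shows SMDs close to zero and variance ratios close to one, together with a clear improvement relative to the unmatched sample.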
The second decision involves choosing the right model to estimate the treatment effect, which hinges on the functional form of the preintervention trend (Hallberg et al., 2018). The school year 2013/2014 is taken as the starting year of the intervention. Visual inspection shows that the trends of the intervention and control group are parallel (see Figures 2 and 3), which gives some support to the parallel trends assumption (Angrist & Pischke, 2009). Under this assumption, it is recommended to use the baseline mean model (Hallberg et al., 2018; St. Clair et al., 2016). The simple baseline mean model is defined as follows:
Where
In addition, we include a second model with control variables (i.e., all independent variables described in the data section) for two reasons. First, this model controls for any residual differences in covariate balance that are not covered by the matching process. Second, intervention and control schools may develop in different ways in the postintervention period, for instance when a group of schools attracts more higher educated parents due to gentrification in the neighborhood (cf. Custers & Engbersen, 2022). The treatment effect may be difficult to estimate when it is conflated with differential population changes in schools. Controlling for school characteristics thus accounts for different school trajectories on the observed variables in the postintervention period that might affect the outcome. In the Appendix we show these trajectories for the school characteristics, indicating they are similar for the intervention and control schools.
The baseline mean model with controls has the following definition:
This model is an extension of Equation 1, including
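As a rough numerical illustration of the baseline mean logic used in this section, the sketch below compares each group's postintervention outcomes with its own preintervention mean and then takes the difference between groups. It works on group averages only, without controls or standard errors, so it is a descriptive stand-in and not the regression estimated in the study. The function name, the year coding (school years labeled by their starting calendar year), and all values are assumptions.

```python
from statistics import mean


def baseline_mean_effects(treated, control, first_post_year):
    """Year-specific deviations from the pre-period mean, treated minus control.

    treated, control: dicts mapping year -> average outcome of the group.
    first_post_year: first year counted as postintervention.
    """
    pre_treated = mean(v for year, v in treated.items() if year < first_post_year)
    pre_control = mean(v for year, v in control.items() if year < first_post_year)

    effects = {}
    for year in sorted(treated):
        if year >= first_post_year:
            change_treated = treated[year] - pre_treated
            change_control = control[year] - pre_control
            effects[year] = change_treated - change_control
    return effects


# Toy group averages (hypothetical scores, not results from the study).
treated = {2011: 40.0, 2012: 42.0, 2013: 41.0, 2014: 43.0}
control = {2011: 41.0, 2012: 43.0, 2013: 43.0, 2014: 43.0}

effects = baseline_mean_effects(treated, control, first_post_year=2013)
print(effects)
```

Effects that fluctuate around zero across the postintervention years, as in this toy example, would point to no structural improvement relative to the control group.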
Results
We briefly describe some key characteristics of the intervention schools (N = 20) and how they relate to the potential control group (N = 5,145), indicating the differences between the CZ schools and the wider population of schools in the Netherlands. Figure 1 shows that, on average, the intervention schools in the first dataset have lower ISLED scores in the preintervention period. For instance, in 2013 the ISLED score is 7.3 points lower in the intervention group, which is approximately half of the difference between the lowest track (LPV) and the second lowest track (UPV). In addition, the intervention schools score considerably lower on parental education, household income, and wealth. For example, in the intervention schools 37.9% of the parents are lower educated (not accounting for missing values), whereas this is 12.3% for the potential control schools. School size and retention rate do not strongly differ between the two groups. Finally, the share of native students is much higher in the potential control group (79.9%) than in the intervention group (10.1%). The share of Turkish students is especially high in the intervention group (31.3%). The same demographics for the second dataset are reported in the Appendix.

Figure 1. Mean scores for the potential control (N = 5,145) and intervention (N = 20) group in the first dataset (independent variables averaged for 2011/2012–2012/2013).
Figure 2 includes the CITS results from the first dataset, showing the average track recommendation for both the control and intervention groups during the period of study. Because of the matching procedure, the scores for both groups are very similar for 2011/2012 to 2012/2013, indicating the schools developed evenly before the intervention started. After the start of the intervention and up to 2014/2015 intervention schools perform slightly worse on average, whereas in 2015/2016 and 2018/2019 they perform slightly better. The overall trend, however, indicates that average track recommendations for the intervention and control schools do not substantially differ after the start of the intervention.

Figure 2. Average ISLED score per school year for control and intervention schools in the first dataset, before and after program introduction (start intervention: 2013/2014).
The simple baseline mean model (Model 1a) confirms that intervention schools do not structurally perform better after the intervention (see Table 1). No significant effects are found for the postintervention years. It could, however, be that the intervention schools developed differently than control schools in the postintervention period, for instance through attracting other kinds of student populations or changing the type of standardized test. Model 2a therefore includes all control variables to account for such possible influences. Although the effects change to some extent, they remain statistically non-significant and no substantial differences are observed.
Table 1. Unstandardized Effects of Short Interrupted Time Series (Linear Models) With ISLED Score as Dependent Variable.
*p < .05, **p < .01, ***p < .001 (two-tailed).
Results from the second dataset are comparable to those from the first dataset. The trends in the preintervention years are similar for both groups (Figure 3). The track recommendations of the intervention group are lower than those of the control group in some postintervention years (2014/2015 and 2017/2018). Thus, compared to the control group, track recommendations in the intervention group have not structurally improved on average since the start of the extended school week.

Figure 3. Average ISLED score per school year for control and intervention schools in the second dataset, before and after program introduction (start intervention: 2013/2014).
Model 1b shows that the two groups do not structurally deviate from each other, as the effects for the postintervention period are not significant (Table 1). Adding the control variables in Model 2b leads to some changes in effects, but no evidence of a lasting intervention effect can be detected. The effects for this smaller group of schools thus also suggest that the intervention did not affect track recommendations.
Several robustness checks were performed to examine to what extent the presented results depend on the matching and model choices. These checks confirm that the extended school week, on average, does not seem to affect track recommendations (see Appendix).
Conclusion and Discussion
This study had two main objectives: first, to discuss the implementation of an extended school week intervention within the complex Dutch policy context, and second, to estimate the intervention’s impact on track recommendations. On average, the results show that the extended school week had no effect on track recommendations. In the 7 years following the start of the intervention, the average track recommendation in 20 primary schools did not substantially deviate from that of the 20 matched control schools. The same result was found for a smaller subsample of 15 intervention schools and 15 matched control schools. These findings were corroborated through various robustness checks. We emphasize, however, that outcomes may differ between and within schools, for instance as the consequence of unobserved nonrandom individual-level (self-)selection into schools. Nevertheless, the results indicate that the ambitious program has had limited impact at the school level.
We provide three possible explanations. First, the program may lack focus in its goals and, in turn, may include too few structured activities that stimulate better educational results. The literature on extended school time shows that programs with stronger SAFE features are more likely to achieve higher academic outcomes (Durlak et al., 2010), but the NPRZ CZ intervention only partly adheres to these features.³ Most schools have a broad mix of activities to fill the extra hours, including homework lessons, curricular activities, and extracurricular activities in sports, culture, nature, and lifestyle (Kruiter et al., 2020). The goals of the NPRZ CZ are also loosely defined, including gains in educational results, broad education, and socioemotional development, and these goals moreover changed over time. The program does not clearly elaborate in what ways the extended school time activities are expected to contribute to achieving these goals. Thus, even though some studies indicate how a greater breadth of activities can lead to better academic outcomes (see Bohnert et al., 2010), it seems that the combination of loosely defined goals and the proliferation of activities during the extended school time is at odds with what is known about effective program features.
Second, the intervention lacks several elements that are needed for effective implementation, including participation and the quality of staff. Although the NPRZ states that participation is mandatory for all children, in practice it seems that many schools are quite flexible regarding participation (Dol et al., 2019; Kruiter et al., 2020). Studies have, however, shown that programs with higher rates of participation, and particularly higher engagement, are more effective (e.g., Bohnert et al., 2010; Feldman & Matjasko, 2005). A lack of participation may therefore offset the potential benefits of the extended school week. Furthermore, it is generally acknowledged that the staff of external providers have fewer didactic and pedagogical skills than the school staff (Kruiter et al., 2020), whereas having qualified staff is an important condition to achieve considerable impact (Fashola, 1998; Smith et al., 2010).
Finally, the partial top-down design of the program and characteristics of the educational field have complicated implementation. The NPRZ convinced different school boards to establish a CZ in Rotterdam South while these boards historically consider each other competitors in attracting students (Dol et al., 2019; cf. Dronkers, 1995). Organizing cooperation between these school boards thus took considerable effort, which materialized into a round table where the outline of the intervention was discussed among the NPRZ bureau, school board directors, and policy advisors. However, lower-level actors such as parents, teachers, and school directors have only been minimally involved in the design of the program, which has decreased support for the program (Dol et al., 2019). Although schools have considerable autonomy in how they fill in the extra hours, the top-down approach has led to tensions among several actors. These tensions are particularly visible in discussions regarding the prescription of 10 hours, which are perceived as having been forced upon the schools (Kruiter et al., 2020).
In addition, since the start of the program a national crisis involving teacher shortages has developed that has particularly affected schools with disadvantaged students (Inspectorate of Education, 2019). Schools in the NPRZ CZ therefore experience difficulties with attracting qualified teachers. The intervention, on the other hand, demands substantial coordination between different actors, which can complicate the working environment for teachers because they must deal with issues that might occur during the extra hours. In an unforeseen manner, the extended school week might thus negatively affect the ability of schools to hire and retain teachers (cf. Dol et al., 2019; Kruiter et al., 2020). In short, the complex dynamics within the educational field, together with the program’s top-down approach, may have affected successful implementation and school operations.
This study thus illustrates that, next to considering organizational factors within schools, system aspects also potentially explain why some programs are (in)effective (Durlak & DuPre, 2008; Hopkins et al., 2014). It seems that the intervention started with a lack of ownership among lower-level actors and there was little mutual adaptation between higher-level and lower-level actors. In addition, translating a concept like the HCZ to the Dutch context faces many obstacles (cf. Spillane et al., 2019), such as dealing with parental free choice of education, getting the cooperation of school boards, and attracting qualified teachers for disadvantaged schools.
We conclude by discussing some limitations. First, the estimates in this study reflect the average effects for all schools in the program. Schools have, however, some degree of autonomy in how they fill in the extra hours, meaning that intervention effects might differ between schools and groups of students. Future research could therefore focus more on effects for individual schools, for instance through applying synthetic control methods (Abadie, 2021). Moreover, our analyses concentrated on the school level, but student-level factors could also play a role.
Second, our models depend on multiple assumptions. One common assumption is that the intervention is tested against control schools without interventions. This assumption is, however, unrealistic in the Dutch educational context because all disadvantaged schools receive additional resources through national policies (Ladd & Fiske, 2011). The estimates in this study thus reflect the effect of the extended school week against unknown programs in control schools. Another assumption is that intervention and control schools do not differ on non-observed variables that relate to both the outcome and treatment assignment. Although we match on multiple variables, some variables that potentially relate to both the outcome and treatment assignment might be non-observed.
Third, track recommendations were the dependent variable. Although this measure is an indication of the student’s learning potential and ability, it can also be biased due to the teacher’s subjectivity. We suspect, however, that the influence of this bias on our results is low. Track recommendations are often biased against students with lower socioeconomic status or non-native backgrounds, resulting from stereotypes or parental expectations (Batruch et al., 2023; de Boer et al., 2010). Since we match on these variables, we expect that the level of bias is similar between intervention and control schools and therefore does not substantially affect the differences in track recommendations between these two groups. Unfortunately, we could not examine standardized test scores due to data availability and we were unable to include measures on program implementation or participation that might explain why results might differ between schools (e.g., Durlak & DuPre, 2008).
To conclude, this study exemplifies how to investigate school-level interventions using longitudinal register data (Hallberg et al., 2018, 2020; Stuart, 2007). By discussing various aspects of the program’s design and implementation, we have shown how program effectiveness can be better understood within the larger policy context. The study suggests that both design (e.g., the top-down approach) and system features (e.g., the national teacher crisis) hampered the potential impact of the extended school week on track recommendations. We think that future studies can benefit from these insights, by not only considering program features within schools but also relevant processes outside schools.
Footnotes
Acknowledgements
I would like to thank the two anonymous reviewers for their useful feedback and the editor for the support during the review process. I am also grateful to all those involved in the NPRZ who were willing to discuss the program.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The research for this article was supported by an ODISSEI Microdata Access Grant (MAG).
Supplemental Material
Supplemental material for this article is available online.
Notes
Author biography
Gijs Custers is an Assistant Professor at the department of Law, Society, and Crime, at the Erasmus University Rotterdam. His work focuses on urban inequalities regarding social class, education, and social policy.
References
