Abstract
Quantifying the impact of teaching quality on pupil learning, and understanding what teacher characteristics or practices are likely to improve student achievement, are pressing research questions in all countries. Empirical evidence also needs to be context specific since different education systems are likely to have different facilitators and barriers to good teaching. Existing evidence, largely from the US, suggests a number of strong research designs that enable researchers to model the impact of teaching on pupil achievement. However, operationalising these models in more resource-constrained contexts is challenging. In this paper we describe our attempt to model the impact of teachers and their practices on pupil achievement using the quantitative data generated for this research (household and school surveys with a teacher survey and an attempt to assess teacher knowledge). We describe the challenges when trying to implement this approach in the Indian and Pakistani contexts and the methodological adaptations needed. We reflect on the strengths and weaknesses of our approach. We note that existing literature tends to provide relatively minimal descriptions of the specific research design and instruments used to model teacher quality and hence provides a partial picture of methodological considerations. In this paper we contribute a detailed and frank account of developing a workable research design and the challenges we encountered.
Introduction
As part of the Sustainable Development Goal agenda, governments across the world have committed to providing inclusive and equitable high-quality education for all children (United Nations, 2015). Yet, despite great progress in getting more children into school over the past decade, education quality remains a serious concern. In many countries, children – particularly those from the most disadvantaged backgrounds – are still experiencing poor quality education which limits their chances of fulfilling their learning potential (UNESCO, 2014; World Bank, 2018). Given this, there is an urgent need for robust research to understand why learning levels are so low and more specifically why learning is so unevenly distributed within countries. Existing literature points to teachers as the most crucial institutional input into a child’s educational experience (Hanushek and Woessmann, 2011; Nonoyama-Tarumi and Willms, 2014). Hence unequal access to good teaching is likely to be a key route by which inequalities in learning arise.
In order to identify the extent to which the quality of teaching is at the heart of improving learning equitably, and what features of teaching are most effective, we need to be sure of the robustness of the evidence base. Specifically, we need to consider the methodological concerns and context specific challenges faced when measuring quality education in low and lower-middle income countries.
Existing literature on the topic of teacher quality in low and lower-middle income contexts (as well as more generally in the field of education) often provides cursory information on the instruments and methods used, thus providing a partial picture of the methodological considerations. In this paper we summarise previous research that has aimed to generate a picture of quality teaching through adopting quantitative methods. 1 We make a contribution to the literature by describing the conceptual, methodological and practical challenges of measuring teaching quality in these contexts.
In addition to the existing literature on the topic, this paper draws on experience from our Economic and Social Research Council/Department for International Development funded Teaching Effectively All Children (TEACh) project which focuses on the role of teaching quality in explaining low levels of learning in India and Pakistan. 2 Particular issues raised in previous literature with regard to teaching quality in India and Pakistan include: teachers who lack basic subject knowledge themselves; inadequate teacher training; insufficient focus on children from poor backgrounds; weak incentives and poor governance; low motivation; and high levels of teacher absenteeism (Bau and Das, 2017; Bennell and Akyeampong, 2007; Kingdon et al., 2014; Moon, 2013; UNESCO, 2014; Westbrook et al., 2013). What is less understood is the reasons for variation in teaching quality across India and Pakistan and the extent to which it explains some of the variation we see in pupil achievement. To fill this gap in the literature, our research seeks to measure teaching quality in two locations in India and Pakistan (the state of Haryana and province of Punjab respectively) and the extent to which marginalised groups of students access high quality teaching in these contexts.
We start by detailing how teacher effectiveness has been measured in the existing literature before describing the specific country context for the research and the challenges this poses. We then describe how we measure a) student learning as one measure of teacher effectiveness and b) teacher characteristics and behaviours as another. An important aspect of our approach is that we also attempt to capture teacher knowledge directly. We discuss the implications of our approach for research in this field and conclude.
Measuring teacher effectiveness
The past few decades have seen a burgeoning interest in identifying which factors – individual, family and school – are the most critical determinants for raising pupil attainment. As a result, a robust and strong evidence base from developed countries has now emerged that indicates teacher effectiveness to be a critical determinant of student achievement. Hanushek (2011: 467) goes as far as to say that no other attribute of schools has as much influence on student achievement as teacher effectiveness. Estimates from the United States suggest that a difference of one standard deviation in teacher effectiveness can yield between 10 and 20% of a standard deviation change in pupil achievement (Hanushek, 2011; Hanushek and Woessmann, 2011). Whether these estimates apply in low and lower-middle income country contexts is, of course, an empirical question. In Table 1, we summarise relevant studies of teacher effectiveness which use quantitative methods and have been conducted in the context in which we are working, namely India and Pakistan. These have been selected as the focus as they are both contexts where studies recognise the persistence of low levels in basic competencies of literacy and numeracy and wide inequalities in learning. The existing evidence seems to confirm the potential importance of teacher quality for student achievement (Azam and Kingdon, 2015; Bau and Das, 2017; De Talancé, 2017).
Table 1. Studies examining teacher effectiveness and its relationship with student outcomes, India and Pakistan.
Despite agreement on the important role of teachers, measuring teacher quality is notoriously difficult. This is because teacher quality encompasses an immense range of competencies, skills, motivations, behaviours and attitudes, many of which cannot be easily observed. Additionally, the interaction of these factors and the nature of the relationships that teachers maintain with their students remain invisible to those outside the classroom in which the teaching is taking place. Yet identifying the characteristics of a ‘good quality teacher’ is not only important for recruitment but also for developing the skills and competencies of those already within the workforce. In addition to this, the key empirical challenge in identifying a ‘teacher effect’ is the potential non-random matching of students to schools and, within schools, to particular teachers. These empirical problems plague researchers from more developed countries as well. However, better availability of linked and administrative data sets allows researchers in developed countries to overcome these challenges in a far more robust manner than those from developing countries.
In a narrow sense, teacher quality can be defined as a ‘teacher’s ability to produce growth in student achievement’ (Eide et al., 2004). Even with an arguably very narrow definition of teacher effectiveness which focuses exclusively on test score gains, identification of the specific characteristics and practices of teachers that contribute most towards improving pupil achievement has proved problematic. Whether focusing on test score outcomes or wider outcomes from education, the methodological challenge of isolating the impact of the teacher per se is considerable.
Teacher quality and student achievement
The quantitative literature that has attempted to measure teacher quality has generally adopted one of three approaches. The first approach calculates teacher quality as the value added in test scores that can be attributed to a specific teacher. These estimates of teacher effectiveness have generally been derived from what is known as a teacher fixed effect model of student achievement gain (sometimes described as a fixed effect value-added model). This method requires information on the test score gains of different students taught by the same teacher. The resulting ‘total teacher effect’ enables the researcher to define a good teacher as one who consistently produces high achievement gains for pupils. This approach, in estimating the total effect of the teacher on test scores, does not require identification of specific teacher characteristics that generate good student learning. In developing country contexts, where data constraints can prevent very sophisticated analyses, value-added measures are usually estimated by assessing children at the beginning and the end of a specified grade. The TEACh project has adopted a similar approach. Whilst value-added models provide one methodological approach to measuring teacher effectiveness, they are not free from criticism: Andrabi et al. (2011) argue that restricted value-added estimates that are based on longitudinal data with unobserved student heterogeneity and measurement error may sometimes be worse than naïve cross-sectional ordinary least squares estimates.
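To make the logic of the first approach concrete, the following is a minimal sketch of the simplest possible version: each teacher's effect is taken as the mean test score gain of their pupils, relative to the mean gain across all pupils. It deliberately omits pupil covariates, measurement-error corrections and the other refinements used in published value-added models. The function name, the record layout and the scores are all hypothetical illustrations, not the TEACh project's data or code.

```python
from collections import defaultdict
from statistics import mean


def teacher_value_added(records):
    """Return each teacher's mean pupil gain minus the overall mean pupil gain.

    records: iterable of (teacher_id, baseline_score, endline_score) tuples,
    one per pupil. This layout is an assumption made for illustration.
    """
    gains_by_teacher = defaultdict(list)
    for teacher_id, baseline, endline in records:
        # Gain = score at end of grade minus score at start of grade.
        gains_by_teacher[teacher_id].append(endline - baseline)

    # Overall mean is taken across pupils, not across teachers.
    all_gains = [g for gains in gains_by_teacher.values() for g in gains]
    overall_mean = mean(all_gains)

    return {t: mean(g) - overall_mean for t, g in gains_by_teacher.items()}


# Made-up scores for two hypothetical teachers with two pupils each.
example = [
    ("T1", 10, 16), ("T1", 12, 20),
    ("T2", 11, 13), ("T2", 9, 13),
]
effects = teacher_value_added(example)
```

In this toy example the pupils of teacher T1 gain more than average and those of T2 less, so T1 receives a positive estimated effect and T2 a negative one. With non-random assignment of pupils to teachers, such raw differences cannot be read as causal.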
The second approach uses an educational production function which relates measurable teacher characteristics, and a range of other school and family inputs (such as resource inputs), to pupil achievement. For example, studies have used this approach to examine the correlation between student test scores and a teacher’s number of years of experience. A third approach measures the impact of an intervention, for example one targeting teacher training or inputs.
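As an illustration only, a generic linear form of an educational production function can be written as follows. The notation is an assumption introduced here for exposition and is not taken from any of the studies cited.

```latex
A_{ijs} = \beta_0 + T_j'\beta_1 + S_s'\beta_2 + F_i'\beta_3 + \varepsilon_{ijs}
```

Here $A_{ijs}$ is the achievement of pupil $i$ taught by teacher $j$ in school $s$, $T_j$ is a vector of measurable teacher characteristics (such as years of experience), $S_s$ is a vector of school inputs, $F_i$ is a vector of family inputs and $\varepsilon_{ijs}$ is an error term. The coefficients in $\beta_1$ are the quantities of interest under this approach.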
In the Pakistani and Indian contexts, such models are becoming more widely used (see, for example, studies in Table 1: Azam and Kingdon, 2015; Bau and Das, 2017; De Talancé, 2017). The results from these kinds of model suggest that teachers vary substantially in their effectiveness. However, these studies confirm that many of the standard teacher characteristics such as certification, training and experience level do not seem to matter much to pupil achievement (e.g. Bau and Das, 2017; De Talancé, 2017). As these resumé characteristics often underpin teacher compensation policies, these findings are controversial and widely debated. This implies that whilst we know that teachers matter to student achievement, identifying what makes for a good teacher, a priori, is difficult.
More recent research in the developing world has tried to identify the underpinning characteristics of teaching practice that make one teacher more effective than another. Studies focusing on India and Pakistan have, for example, found that factors such as the social distance between the teacher and the students, teachers’ opinions, attitudes and perceptions, and indeed classroom practices, matter more for student achievement than teachers’ observed resumé characteristics (Aslam and Kingdon, 2011; Rawal and Kingdon, 2010; Rawal et al., 2013). A number of studies have found that the nature of the contract that the teacher is on and whether the teacher was locally recruited also predict teacher quality (Atherton and Kingdon, 2010; De Talancé, 2017; Goyal and Pandey, 2013; Muralidharan and Sundararaman, 2011, 2013). Teachers on contracts that are more closely linked to performance appear to be more effective and, more specifically, lower paid, less well-trained contract teachers appear no less effective than more qualified, better trained and higher paid civil service teachers on permanent contracts (e.g. Muralidharan and Sundararaman, 2013).
A methodological issue that is often ignored is that pupils are not randomly allocated to teachers. In some contexts, the low achieving and most challenging students might have less effective teachers, particularly if more senior teachers prefer to teach higher achieving pupils. Equally, the most talented teachers may end up in the most socio-economically advantaged schools, teaching students who are likely to make more rapid academic progress, thereby leaving less effective teachers teaching pupils in more disadvantaged schools. A teacher could be incorrectly categorised as highly ‘effective’ merely because s/he happens to be assigned to a group of pupils who are more motivated or able. This teacher could also be more effective at ‘getting assigned’ certain types of students, further exacerbating this bias. Additionally, as many teachers tend to be assigned teaching based on their previous performance as a teacher, this initial non-random sorting could lead to further non-random sorting into classrooms. This makes it difficult to separate the effect of the teacher on pupils’ test scores from the effect of pupils’ often unobserved characteristics (e.g. socio-economic disadvantage or the nature of their behavioural problems). The literature has attempted to allow for these ‘selection effects’ using a variety of methodologies such as instrumental variables, panel data and randomised experiments (e.g. Glewwe and Kremer, 2006; Hanushek et al., 2005; Kingdon and Teal, 2010; Lavy, 2002; see also Table 1). However, most estimates of teacher effectiveness rely simply on allowing for differences in students’ prior level of achievement as a way of dealing with this issue. This assumption has some empirical support in the literature though, since non-randomised studies often provide similar estimates of teacher effects as randomised studies – as long as prior student achievement, student characteristics and teacher variables are sufficiently controlled for (Burgess, 2015).
In summary, an increasing body of global evidence on teacher effectiveness has found that teachers are differentially effective in producing student achievement; what they do and how they do it is of crucial importance for student outcomes. However, this cannot be predicted by teachers’ formal qualifications or other aspects of observable teacher characteristics (Bruns et al., 2016). By contrast, the contractual arrangements for teachers and measures of teacher effort do predict student achievement in some contexts.
Classroom observation approaches to measuring teacher quality
Given the difficulty in identifying the specific (a priori) characteristics of effective teachers, another branch of research has gone in a somewhat different direction. This work has focused on observing teachers within their classrooms to unpack what happens within the black box of teaching. These classroom observations have then been linked to student learning and other outcomes to provide evidence on teacher quality.
A large-scale and renowned study in the US – Measures of Effective Teaching – found that by observing teachers within their classrooms and using a variety of observation rubrics, it is possible to estimate differential teacher effectiveness (Kane and Staiger, 2012). 3 The project compared five different approaches to classroom observations (including Classroom Assessment Scoring System – CLASS – discussed in the next paragraph) and found all five observation instruments to be positively associated with student achievement gains. Further, combining these teacher observation measures with data on student achievement gains improves the predictability of these measures in identifying more effective teachers. This suggests that researchers seeking to identify high quality teaching, and indeed evaluation systems trying to do the same, might need to combine teacher observation, teacher self-report data and student outcomes, rather than using observation or test score measures alone.
Studies using the CLASS instrument suggest that students taught by more ‘effective’ teachers (as measured by CLASS scores) have higher learning gains and better outcomes in relation to behaviour and self-regulation (Howes et al., 2008; Grossman et al., 2010). In the developing country context, such instruments have been used to a much more limited extent. However, Araujo et al. (2016) use the CLASS instrument in Ecuador to estimate teacher effectiveness. Based on a random assignment of students to teachers, Araujo et al. find that an increase in teachers’ classroom quality (measured using the CLASS observation instrument) resulted in higher student test scores in language, maths, and executive function.
Bruns et al. (2016), comparing CLASS with another widely-used observation instrument (Stallings), find that when using them within the same developing country context (Chile), both instruments produce consistent assessments of teachers’ effectiveness in managing their classrooms and ensuring student learning.
A review of teacher observation literature in low and middle income countries has sought to identify opportunities to systematise observations to monitor quality at both the school and system levels (Pouezevara et al., 2016). The authors conclude that, of the admittedly limited research that attempts to relate observational measures of teaching practice to student learning outcomes, most studies have demonstrated a positive and significant association between them. However, they note that few studies actually attempt to unpack the complex relationship between specific classroom practices and student learning. They also highlight the technical and practical issues that constrain researchers from conducting classroom observations at scale as a means to obtain information on teaching practice. In addition to these limitations, classroom observations have also been found to be costly to administer.
The use of classroom observations to examine teacher effectiveness in the specific contexts of India and Pakistan has been very limited. A 2014 study in Madhya Pradesh and Tamil Nadu, India, found correlations between teacher classroom practices and student outcomes and provided some insights for policy in areas of pre-service and in-service training, as well as pedagogy (Dundar et al., 2014). Specifically, the study found that, whilst teachers spend a substantial time on instructional activities, this was not reflected in positive student learning outcomes. Their observations also suggested that teachers were not addressing the different learning and ability levels of students in the classroom, which is particularly important when considering teaching quality for children from disadvantaged backgrounds. The SchoolTells initiative in India and Pakistan also suggested that classroom observations can provide useful insights into the relationship between teachers’ classroom practices, teacher effectiveness and student outcomes (Rawal and Kingdon, 2010; Rawal et al., 2013). Recent research using classroom observations in Indian classrooms suggests that the majority of classroom interactions are characterised by teacher-centred activities and rote learning (e.g. Sankar and Linden, 2014). While these studies pay attention to the overall proportion of children engaged in the learning activities, as with other studies, they do not examine patterns in (dis)engagement or the inclusion and opportunities for disadvantaged children more specifically.
Similarly, Singh and Sarkar (2012) examine the effect of teaching quality on children’s test scores in India and use classroom observations in their analysis. They find that whilst standard characteristics of teachers like experience, gender, content knowledge and subject specialisation do not have any significant influence on children’s learning outcomes, teaching practices such as regularity in checking homework and factors such as the proximity of the teacher’s residence to the school and attitude towards the children, as well as teachers’ perceptions of their schools, have emerged as important determinants of students’ test scores. The authors state that ‘it is what the teacher “believes and does” in the classroom that has the maximum impact on children’s learning outcomes’ (ii). However, it must be noted that whilst this research did involve classroom observations, limited data on actual classroom practices was collected. The only two areas examined related to teaching style (lecture-style, group work, etc.) and whether the teacher provided feedback through marking: the study was less informative on which other classroom practices and teaching processes are more effective.
The move towards the use of classroom observation data in quantitative research on teacher quality is welcome. However, there are important limitations to their use as currently designed. Disadvantages of existing structured observations that have been identified include low levels of inter-rater reliability, an issue that necessitates observations being carried out by multiple assessors in order to reduce biased results (Darling and Hammond, 2012; Kane and Staiger, 2012). This leads to significant financial and time costs, and even then, their reliability is not guaranteed. Sampling error, which denotes variation between different observations, and measurement error, which refers to inaccuracy relating to what is measured in teaching, are further problems which can arise. Validity concerns may also occur when evaluators with different agendas focus on different factors in their evaluation (Cortez Ochoa et al., 2018).
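Inter-rater reliability can be quantified in several ways. One commonly used statistic for two observers assigning categorical codes is Cohen's kappa, which compares observed agreement with the agreement expected by chance. The sketch below is an illustration under that assumption; the studies cited above do not necessarily use this statistic, and the category labels and codes are made up.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters coding the same sequence of observations."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")

    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (observed - expected) / (1 - expected)


# Hypothetical codes given by two observers to the same four classroom snapshots.
observer_1 = ["instruction", "instruction", "management", "off_task"]
observer_2 = ["instruction", "management", "management", "off_task"]
kappa = cohens_kappa(observer_1, observer_2)
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance. In this toy example the observers agree on three of four snapshots, but kappa is noticeably lower than 0.75 once chance agreement is removed.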
A further problem relates to the limitations of the data that are collected through the existing instruments. The Stallings method is limited to four categories (academic activities/instruction; classroom management; teacher off-task; students off-task) with a predefined set of indicators within each category for determining teacher performance. Though this feature is efficient in the sense that it is curriculum and language neutral and minimal observer discretion is required (Bruns et al., 2016), much is lost in terms of contextual difference between schools, subjects and teachers’ pedagogical approaches. Whilst the suitability of this method for larger scale studies has been suggested, it has been found ‘too crude to be used for individual teacher performance evaluations’, as well as ineffective for capturing ‘teachers’ ability to deliver high quality instruction and support students emotionally’ (Bruns et al., 2016: 28). Moreover, of particular relevance to our study, existing approaches do not sufficiently identify how teachers tackle diversity within their classrooms.
Another potential problem is that in low and lower-middle income countries, classroom observation is a fairly new phenomenon. In Punjab, Pakistan, teachers are used to being observed by district teacher educators. However, where teachers are not used to being observed or where observation is linked to assessment of teacher performance, this may pose difficulties for research. Initial concerns about observer bias can be mitigated to some degree by assuring teachers that the research will in no way impact their assessments or rankings. A final practical issue on observational approaches is that the average student-teacher ratio in schools can be high in these contexts. In some of the classrooms in our sample, it reached 50 to 1. Classrooms of this kind tend to be crowded and often loud, making it challenging for classroom observations that aim to document time spent on specific activities at any one point in time. 4
Our approach to measuring teaching quality as described in this paper builds on this existing literature but, for the reasons outlined above, does not include structured classroom observation as part of the quantitative data analysis. 5 Before discussing the individual elements of our proposed approach, we start by documenting the challenges of the context in which we are researching.
The context
There are some key contextual factors that need to be highlighted as they impact the design of our research. First, it is well documented that learning levels are low in both India and Pakistan. Further, the extent of the variation in children’s learning and achievement is considerable. According to recent data from the Annual Status of Education Report (ASER) in rural Pakistan, around half of poor girls are not in school and many of these girls have never attended school, whereas the vast majority of rich boys and girls are in school. In India, by contrast, most children are in school, regardless of their background (Alcott and Rose, 2015). While patterns of educational access vary between India and Pakistan, in both there are wide disparities in achievement levels across students with different background characteristics. ASER data show large differences in achievement levels between richer and poorer states in both rural India and Pakistan, and even within the richest states, poor students perform worse on assessments than their richer counterparts (Alcott and Rose, 2015; Alcott and Rose, 2017; Aslam, 2012). The main implication of this for research on teaching quality and learning in these contexts is that learning assessments need to be able to identify both very low levels of achievement and also accommodate considerable variation.
The scale and diversity of the education systems in many countries present issues with respect to measuring teaching quality. For example, in India and Pakistan the scale of the public-sector teaching workforce is enormous. The province of Punjab in Pakistan employs close to 135,000 teachers in the government sector, and added another 80,000 in 2017 alone. There is also considerable variation in the conditions of work facing different teachers in different parts of both Pakistan and India, as well as variation in pay. For example, government teachers in Pakistan can be paid anywhere between PKR 9860 (GBP 98) and PKR 109,000 (GBP 1090). Designing the research to adequately capture the heterogeneity of the teacher workforce is essential.
In terms of the diversity of schools, many low and lower-middle income countries have a hybrid education system, with a large private sector (predominantly charging relatively low fees) existing alongside state-run schools. This is true in both India and Pakistan. However, despite the fast and significant growth in private schools, the government sector remains the main provider in both countries. Further, government schools in both countries also cater to the most socio-economically disadvantaged populations. Since our research is assessing the teaching experienced by low socio-economic status children, a focus on government schools is essential.
Teaching quality is known to be poor in both government and private schools in both India and Pakistan as evidenced by research investigating issues such as teacher competence, subject-knowledge, efficacy of training, recruitment and deployment, motivation and absenteeism (Dundar et al., 2014; Ilm Ideas, 2014; Andrabi et al., 2015). In an effort to improve teaching in government schools, Pakistan raised the minimum academic qualification requirements for teachers, and instituted continuous professional development programmes. Despite a spate of reforms, the policy perception is that teaching in government schools in India and Pakistan has not improved. Robust evidence on this is limited. This too was a pressing reason to have a focus in our research on government schools. However, this does mean that we need to understand the school choices made by different students, since this will potentially cause selection bias in our models if those choosing to attend the local government school are very different from students attending other schools. To understand this choice, we needed to collect data from households, irrespective of which particular school the child is enrolled in.
The contexts also present some very specific issues that affect the research design. To estimate a teacher fixed effects model, it is necessary to match the achievement gains of students to one particular teacher. In practice, this proves quite difficult to implement as a design in countries such as India and Pakistan. Information from schools suggests that the same teacher does not always teach a particular class. A class may have multiple teachers at one point in time, schools may have a high turnover of teachers through the year and different teachers may teach different subjects even in primary school. For our study, we managed to get schools to confirm teaching assignments and teacher in-year moves (to different classes or schools). This enabled us to be clearer about whether we could attribute the learning gain to a particular teacher. Over and above the research challenges this poses, there is, of course, a larger policy question arising from this issue. If the government is to link teacher incentives to student performance, reliability of data on teacher class and school assignment needs to be ensured.
More generally, the information held on students and schools in low and lower-middle income countries such as India and Pakistan can be highly variable in quantity and quality. A key feature of our study design is that we need to collect data both from schools on student learning and from households on the socio-economic circumstances of the child. We then need to bring these two sources of data together. With patchy class lists and incomplete data from schools, this requires some creative solutions for matching children in schools to those in our household survey. Even when full names are available, matching could still be challenging. It proved important to use as many family identifiers as possible, including father’s name, mother’s name, number of siblings and parents’ occupation. Further, in the areas where we were working in Pakistan, teachers tended to know the communities. In our Indian districts, teachers did not always live in the communities and knew less about their students and their families. All this affects the quality of the data produced, in common with other large-scale approaches to data collection of this kind.
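The general idea of matching on several family identifiers, while tolerating spelling variation in names, can be sketched as follows. This is an illustration of one possible approach and not a description of the procedure actually used in the project. The field names, the similarity threshold and the example records are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical identifier fields shared by the household and school records.
FIELDS = ("child_name", "father_name", "mother_name")


def similarity(a, b):
    """String similarity between 0 and 1, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_score(household_rec, school_rec):
    """Average similarity over the identifier fields present in both records."""
    scores = [
        similarity(household_rec[f], school_rec[f])
        for f in FIELDS
        if household_rec.get(f) and school_rec.get(f)
    ]
    return sum(scores) / len(scores) if scores else 0.0


def best_match(household_rec, school_recs, threshold=0.85):
    """Return the best-scoring school record, or None if none reaches threshold."""
    scored = [(match_score(household_rec, s), s) for s in school_recs]
    if not scored:
        return None
    score, record = max(scored, key=lambda pair: pair[0])
    return record if score >= threshold else None


# Made-up records: the school list has a spelling variant and a missing field.
household = {"child_name": "Ayesha Khan", "father_name": "Imran Khan",
             "mother_name": "Sana Khan"}
school_list = [
    {"child_name": "Aysha Khan", "father_name": "Imran Khan", "mother_name": ""},
    {"child_name": "Ayesha Khan", "father_name": "Tariq Mehmood", "mother_name": ""},
]
matched = best_match(household, school_list)
```

In this toy example the second school record has an identical child name but a different father's name, so the first record is preferred despite its spelling variant. This illustrates why relying on the child's name alone can be misleading. In practice, any such automated score would need to be checked manually, for example with the help of teachers who know the community.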
Measuring student learning
As noted above, one method of measuring teacher quality is to assess the learning gain of students. 6 To do this we measure the learning gain of all children aged 8 to 12 within the households we surveyed in each village. We chose the age range of 8 to 12 years as we would expect these children to have had some exposure to schooling. In both countries, this age range relates to school grades 3 to 5, assuming children start on time and progress smoothly through the system.
Assessments of the children’s learning in mathematics and language (Urdu and Hindi) used measures which had been used previously in the Indian and Pakistani contexts for a similar age range to those in our study. Specifically, we used instruments based on ASER and Young Lives numeracy and literacy tests. Our team had experience of using secondary data from these sources for other purposes so were familiar with the characteristics of the tests. Using both assessments allowed us to collect discrete (ASER) and continuous (Young Lives) data that could be used for different purposes in our analysis. 7 Importantly, the combination of measures was intended to ensure a distribution of learning levels for different groups of children, recognising that ASER might focus in particular on lower levels of learning hence the need to complement this with some more challenging questions from the Young Lives instrument. 8 We were also particularly keen to include children from diverse backgrounds, including those with disabilities, as far as we could in the assessment process. The combination of ASER, which is administered verbally, and Young Lives, which is administered both verbally and using paper and pen, facilitated this.
Children were first tested in their own households. Specifically, we surveyed around 1000 households in 30 villages in each of Haryana, India, and Punjab, Pakistan. This resulted in a sample of 1241 and 1600 children from households in India and Pakistan, respectively, for whom we have assessment data (Figure 1). The collection of data from households was important to ensure we had information on all children, regardless of whether they were in school or not. It also enabled us to collect detailed information on the households, including measures of wealth to assess a child’s socioeconomic background and other characteristics, as well as the attitudes of mothers towards their children’s schooling experience.

Figure 1. Baseline sample of children assessed in households and schools in Haryana, India, and Punjab, Pakistan.
Given our particular interest in identifying the extent to which learning gains can be associated with particular school and teacher characteristics, children were also assessed using the same tools within selected government schools in each village. We have assessment data for approximately 2071 and 2125 children in schools in India and Pakistan respectively. All children in grades 3 to 5 took the Young Lives paper and pen test, with a sample of these children (15 per school) also completing the ASER verbal assessment test. This provides us with measures of learning gain for a large sample of students in grades 3 to 5. We also have rich household and school information for some of the children. Despite the challenges, it was possible to match more than 400 of the children from the household survey with the school survey in both countries, and hence we have particularly rich information on these children, their households, their schools and their particular teachers.
Amongst those in school, we tested the children using the same assessments around ten months apart, namely at the beginning and end of the school year. In Pakistan, 90% of the children from the baseline school survey were identified at the end line school survey, and were present in school. Approximately 150 children were absent, and 202 had dropped out over the course of the year (recorded as not enrolled anymore).
We made very small adjustments to the tests in the second round to avoid the possibility of copying or rote learning responses from the original instruments. These two rounds of data provided us with information on test score gains from these assessments over the school year.
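The gain measure described here amounts to differencing end-line and baseline scores for each child observed in both rounds. The following is a minimal sketch under stated assumptions: the data layout (a mapping from a hypothetical child identifier to a test score) is ours for illustration, and children who were absent or had dropped out at end line are simply set aside, which is only one possible treatment and could bias estimates if attrition is not random.

```python
def learning_gains(baseline, endline):
    """Gain = end-line score minus baseline score, for children seen in both rounds.

    `baseline` and `endline` map a hypothetical child_id to a test score.
    Returns (gains, attrited_ids).
    """
    gains = {cid: endline[cid] - score
             for cid, score in baseline.items() if cid in endline}
    attrited = sorted(cid for cid in baseline if cid not in endline)
    return gains, attrited

def retention_rate(baseline, endline):
    """Share of baseline children who were re-assessed at end line."""
    if not baseline:
        return 0.0
    return sum(cid in endline for cid in baseline) / len(baseline)
```

Reporting the retention rate alongside the gains makes it explicit how many baseline children contribute to the gain estimates.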
Measuring the characteristics and behaviours of teachers
Teacher survey
To capture information about the teaching that each child in our sample experienced, detailed teacher questionnaires were administered at baseline and end line in our sample schools. We illustrate our approach in this section with some findings from Pakistan. In Pakistan, 190 teachers in total were surveyed at baseline. A longer questionnaire was then administered during the end line to all 96 teachers who were teaching grades 3, 4 and 5, since this was the age group for which we had student test score data. A questionnaire was also administered to head teachers who were doing some teaching in grades 3 to 5 (37 at baseline, 33 at end line). Overall, about 75% of the teachers and head teachers in the baseline could be identified easily in the end line survey. The remaining 25% were new appointments. Contrary to expectations from previous research, teacher absenteeism was not a challenge in our Pakistani sample at the time of the survey: 8 of 190 teachers were absent on the first day during the baseline surveys at the beginning of the school year, but could be interviewed when the teams returned on the second day. In the end line survey conducted towards the end of the school year, 9 of 146 were absent, but found on the second day.
Teachers were asked about their characteristics, attitudes and practices, recognising that the previous literature has clearly indicated that teachers’ experience and qualifications alone are insufficient to explain variation in students’ learning outcomes. The survey we developed was therefore designed to capture measures of: teachers’ expectations and beliefs (about the students and their own competencies); aspects of the school environment; the quality and nature of leadership in the school; resource levels and teacher satisfaction with their jobs. We drew on existing instruments such as those used by SchoolTELLS, Young Lives and Learning While You Teach, which have been used in the Indian and Pakistani contexts to look at school and teacher effectiveness. Our instrument focused on aspects that relate to teaching of children from diverse backgrounds. Appropriate questions were not readily available in existing tools, so we developed questions based on our experience of working in these contexts. A summary of the contents of the survey is given in Table 2. We do not discuss each question included in the survey here but we do highlight the reasoning behind key elements.
Table 2. Summary of teacher questionnaires.
We included a set of questions on teachers’ education (initial education, pre-service training, etc.). In the Indian and Pakistani contexts, these questions are particularly important given the diversity of teacher education routes, and the policy discourse on underqualified teachers. In our sample, 80% of teachers in Pakistan had bachelor degrees or above. There is, however, a range of pre-service degrees that teachers can attain to qualify for teaching in both countries. In Pakistan, for example, there is a requirement for teachers to have a Bachelor of Education degree but the course can range from one year to four (about one third of the sample had the one-year degree and 60% had acquired their pre-service qualification through distance learning). Further, a number of private and government universities and colleges are officially recognised as certificate awarding bodies for teachers in Punjab, and the perception is that they vary in quality. Securing granular information about the nature of the person’s teacher training was important, not least to determine whether there is indeed variation in student learning across teachers who have different types of training.
Given our focus on equity, we collected detailed information on the pre-service training teachers have received on teaching low achieving learners, children with disabilities, and children in multi-lingual and multi-grade classrooms. A very large proportion of primary schools in India and Pakistan are multi-grade environments (Aslam and Rawal, 2015), and the preparedness of teachers to handle this is likely to make a significant difference to the quality of instruction.
Teacher experience is also important in this context. This is partly because the minimum standards for teachers have changed dramatically in recent years. As discussed, Pakistan changed the minimum educational requirement from a high school certificate to a bachelor degree in 2004. Teachers hired before and after this change are therefore quite different in educational terms and hence combining information on years of experience and the nature of the training received is likely to be important.
On teacher attitudes, we included questions from established instruments. We used the Attitudes to Inclusion scale (AIS) (Sharma and Jacobs, 2016), which measures teachers’ attitudes to student diversity with a particular focus on children with special educational needs. It defines inclusion as educating students with diverse learning needs in classrooms alongside their peers, with necessary additional support. Standard AIS questions were adapted for the Pakistani and Indian context. Our instrument elicited teachers’ beliefs about children’s ability to learn and how that related to poverty, gender and disability. We also asked teachers about their own abilities to handle challenging situations. Questions on job satisfaction were also included, particularly satisfaction with extrinsic and intrinsic factors, including salary, school environment and infrastructure, community relations and relations with the school leader and colleagues.
The survey instrument also includes a number of questions about teachers’ practices and strategies. These include teachers’ self-report on issues ranging from what their typical mathematics or language lesson might contain, through to what their typical day looks like and the extent to which children with special needs and low achievement are identified and supported. On the issue of supporting specific groups of children, teachers were presented with a range of different practices and asked to select those that they use in their classrooms. Given that over-testing has emerged as a concern in both India and Pakistan, teachers were also asked about the frequency of assessment and how it is used to improve student learning.
Whilst we have drawn on existing instruments and literature to develop our survey, we are mindful that simply asking teachers about their practices may be insufficient. We therefore also collected qualitative data from a subset of schools to triangulate findings and present a more in-depth picture of the processes within classrooms. In the qualitative component, we focused on (a) teacher practices in the classroom and (b) teachers’ beliefs and motivation with respect to teaching children from diverse backgrounds, paying particular attention to children with disabilities. We felt this was important given that these children are often most excluded during the teaching process, but can be difficult to capture through quantitative surveys given very small sample sizes (although we did attempt to do so by using established approaches to identify them in our household survey). In order to capture data on teacher practices, we employed classroom observations, while insights into their motivations and beliefs were collected through semi-structured interviews.
Measuring teacher knowledge
Thus far, we have described our use of student test scores, a teacher report survey and classroom observations to try to determine the quality of the teaching in the classrooms in our sample. However, one of the most direct ways of measuring a teacher’s skill level (and hence potentially one aspect of her quality) is through assessing the teacher’s own knowledge directly, i.e. in a test. Glewwe et al. (2011) found in their review of the literature that whilst very few school and teacher variables have significant effects on learning outcomes, one of the teacher variables that does positively and significantly determine student learning is a teacher’s knowledge of the subject they teach. Teacher competence when proxied by teacher test scores (rather than by their educational qualifications and experience) strongly ‘supports the common-sense notion that teachers who better understand the subjects they teach are better at improving their students learning’ (Glewwe et al., 2011: 22). More recent literature also supports this view (Altinok, 2013; Chudgar, 2013; Metzler and Woessmann, 2012; Mulkeen, 2013).
We therefore collected data that would allow us to identify the correlation between good teacher knowledge in a specific subject and students’ academic performance in the same subject. The nature of the sample (with many rural schools having the same teacher for all subjects) limits the possibility that parents have chosen a specific teacher for their child based on this teacher’s specific knowledge in any given subject, and reduces the chance of bias arising from non-random classroom assignment more generally. These data are potentially important, not least because they are novel in the contexts in which we are working and may present ways forward for policymakers in terms of implementing interventions to improve teacher subject knowledge. In an ideal situation, newly recruited teachers should enter the profession with adequate knowledge of the subjects they are intending to teach. However, previous research has suggested this may not be true in India and Pakistan for all teachers (Dundar et al., 2014). Therefore, evidence on the value of subject knowledge for student learning, and hence the potential for in-service training to play a crucial role in filling such gaps, is especially useful.
Although there are clear advantages to collecting information on teachers’ subject knowledge, doing this in contexts such as India and Pakistan is particularly challenging. For example, in Punjab, Pakistan, while teachers are tested monthly by provincial authorities and tests are not high stakes (in that salaries and promotions are not linked with teacher knowledge tests), teachers are very reluctant to take tests. Refusal rates are high when teachers are asked directly to ‘take exams’ or answer questions. An alternative approach is to gauge teachers’ subject knowledge indirectly by observing them undertaking a common teaching task, namely marking and correcting student tests. Since marking and correcting work is part of the job, teachers respond far more positively to requests to do this, and from observing their marking and corrections we can gauge their own level of knowledge.
For our study, only those teachers who reported teaching the subject in question were asked to mark tests. Each teacher was provided with one student test for the subject they were teaching. If a teacher taught both mathematics and Urdu/Hindi, then they were asked to mark one of each. They were asked to mark tests in the following way: correct answers were simply checked; for answers they would mark as incorrect, teachers were required to provide workings and an answer. All students in grades 3, 4 and 5 were administered the same test, so regardless of grade, teachers marked the same sample test. The sample test selected for the teacher to mark was the one with the most questions attempted and the most wrong answers.
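The selection and marking steps above can be sketched in code. This is an illustration only, not the study's actual procedure. We assume that the two selection criteria are applied in order (most questions attempted first, with ties broken by the most wrong answers), and the dictionary keys and the accuracy measure are hypothetical. The sketch also ignores the requirement that teachers supply workings and a corrected answer for items they mark as incorrect, which would need to be scored separately.

```python
def select_test_for_marking(scripts):
    """Pick the script with the most attempted questions, ties broken by most wrong answers.

    Each script is a dict with hypothetical keys 'answers' (one entry per
    question, None if not attempted) and 'key' (the correct answers).
    """
    def attempted(s):
        return sum(a is not None for a in s["answers"])

    def wrong(s):
        return sum(a is not None and a != k
                   for a, k in zip(s["answers"], s["key"]))

    return max(scripts, key=lambda s: (attempted(s), wrong(s)))

def marking_accuracy(script, teacher_marks):
    """Share of attempted answers that the teacher judged correctly.

    `teacher_marks[i]` is True if the teacher ticked answer i as correct.
    Returns None when the script has no attempted answers.
    """
    judged = [(a == k) == m
              for a, k, m in zip(script["answers"], script["key"], teacher_marks)
              if a is not None]
    return sum(judged) / len(judged) if judged else None
```

Choosing a script with many attempted but incorrect answers maximises the number of items on which the teacher has to demonstrate their own knowledge, which is presumably why such a script is preferred.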
Using a similar approach, SchoolTELLS assessments of teachers in literacy and numeracy in the states of Bihar and Uttar Pradesh, India (Kingdon et al., 2008) showed extremely low levels of teacher competence, with teachers scoring 47.2% in maths and 64.9% in language assessments at grade 5 curriculum level. In Pakistan, SchoolTELLS data (Atherton and Kingdon, 2010) were collected in Punjab, with teachers scoring 69.5% and 73.9% in language and maths respectively, illustrating that teachers’ content knowledge was higher than their pupils’, as one would expect (Dundar et al., 2014). These (and our) studies indicate that it is important to identify whether teachers have mastered their subject knowledge, as weak subject knowledge can be associated with low pupil attainment. If their subject knowledge is poor, it is important to identify the reasons for this and what can be done to address it for policy purposes. In many cases, teachers are themselves affected by the low quality of the education system from which they have graduated, so it is key to find ways to tackle this potentially vicious cycle of low achievement.
Discussion and conclusions
The quantitative data collection approaches outlined in this paper are intended to enable us to identify the within and across school differences in the quality of teaching in selected schools in India and Pakistan. The data will enable estimation of the value added to pupils’ test scores by different schools, and we can then relate these estimates to the characteristics of teachers. Further, the teacher survey and classroom observations should enable us to explore how inequalities in learning across different types of student relate to teacher characteristics, attitudes and behaviours. These data are therefore important from a policy perspective since they can inform thinking on the causes of low levels of learning among some students and answer questions such as: is the variation in pupil achievement largely due to higher achieving children being clustered in “good” schools, or are there considerable differences in achievement levels across pupils within the same school? In this article, we seek to make perhaps an unusual contribution to the literature by providing a detailed account of the reasoning behind and practical difficulties encountered when trying to research the impact of teaching quality on student learning. Our aim is not only to illuminate our own study but also to provide a resource which other researchers might draw upon when embarking on this kind of research.
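The question of whether variation in achievement lies mainly between or within schools can be examined descriptively with a simple sums-of-squares decomposition. The sketch below is an assumption-laden illustration rather than the value-added model itself: it takes a mapping from a hypothetical school identifier to a list of pupil scores (or gains) and returns the share of total variation that lies between schools. A full analysis would more likely use a multilevel model with pupil and household controls.

```python
from statistics import fmean

def between_school_share(scores_by_school):
    """Fraction of total variation in scores that lies between schools (0 to 1).

    `scores_by_school` maps a school identifier to a list of pupil scores.
    Uses the identity: total sum of squares = between + within.
    """
    all_scores = [x for scores in scores_by_school.values() for x in scores]
    if not all_scores:
        return 0.0
    grand_mean = fmean(all_scores)
    # Between-school component: school means around the grand mean, weighted by size.
    between = sum(len(s) * (fmean(s) - grand_mean) ** 2
                  for s in scores_by_school.values() if s)
    total = sum((x - grand_mean) ** 2 for x in all_scores)
    return between / total if total else 0.0
```

A share close to 1 would indicate that pupils differ mainly because of the school they attend, while a share close to 0 would indicate that most of the variation is among pupils within the same school.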
We have documented the nature of the data collection needed to measure teacher quality effectively at scale. What can this account tell us about how to undertake this kind of research in the future? First, the sheer scale of data collection means that inevitably small and lower cost projects will tend to be missing parts of the story. For example, studies may have teacher survey information but lack test score data or vice versa. We also note that whilst value added measures of student learning are central to any serious investigation of teaching quality, they are not sufficient to provide insights into the sources of variation in quality that are observed. Hence collecting rich information on teachers’ characteristics, attitudes and behaviours is essential if we are to develop our understanding of what factors predict high pupil value added and variation in teaching quality, and hence how we can actually improve teaching.
In the contexts in which we are working, there is also a distinct lack of high-quality administrative data. This too increases the data collection requirements for individual studies. Whilst the data collected from classroom observations and semi-structured interviews are incredibly rich, we do need to recognise the high cost of such data collection methods. If studies always develop their own survey instruments, which are not necessarily comparable with those used in other studies, there is no possibility of building a cumulative body of evidence and combining studies to get a better picture of what is happening in schools over time. One suggestion therefore is that studies embarking on this kind of work should endeavour to use instruments similar to those that have gone before them, including items that can be compared (and for this reason we are publishing both our protocols and our data). In that way, an evidence base can be built from comparable measures used over time.
Acknowledgements
Comments made by participants at the 2017 United Kingdom Forum for International Education and Training (UKFIET) conference improved the paper immeasurably. We would also like to thank Faisal Bari, Anuradha De, Meera Samson, Nidhi Singal, and other members of the Teaching Effectively All Children (TEACh) team for their invaluable contributions to debates that have contributed to the arguments in this paper.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Economic and Social Research Council and the Department for International Development under the Raising Learning Outcomes programme (grant number ES/M005445/1).
