Abstract
Not knowing the population size is a common problem in data-limited contexts. Drawing on work in Sierra Leone, this short take outlines a four-step solution to this problem: (1) estimate the population size using expert interviews; (2) verify estimates using interviews with participants sampled; (3) triangulate using secondary data; and (4) reconfirm using focus group discussions.
Introduction
Good survey design includes probability sampling so that results can be generalized to a population, but this requires a sampling frame from which the sample can be drawn, which is a challenge in many developing countries with weak information systems and data management practices (Harris 2022a; Lupu and Michelitch 2018). Drawing on research conducted in Sierra Leone in 2017, I outline a mixed methods approach that combines interviews, secondary administrative data, and focus group discussions to generate and verify an estimate of the population size.
The Research Context
The research project sought to understand the effects of a large and ever-present development sector on the labor market of aid-dependent low-income countries. Sierra Leone, a small, low-income, West African developing country, was used as a case study. Research methods included a survey of university finalists before they entered the labor market (with embedded lab-in-field experiments), focus group discussions with a subset of survey participants, and interviews with private sector employers, government representatives, non-governmental organizations, and donor organizations (see Harris [2020] for a comprehensive presentation of the methodology).
Survey data collection took place at the main University of Sierra Leone campus in the capital, Freetown, in August 2017. The research used a stratified random sample to ensure representativeness across faculties of study and gender. The first step of the sampling process was to acquire a sampling frame listing all registered final-year students. Such a list was not available from the university registrar or departmental heads. Lack of centralized information systems has long been an issue in tertiary institutions in Sierra Leone (World Bank 2013:25) and remains a challenge. University officials possessed incomplete lists of registered students, as the costly burden of registration fees often deters official student registration. Students can attend lectures and seminars and write examinations without being officially registered. After writing final examinations, students settle outstanding fees to access their transcripts and degree certificates. Relying on the list of officially registered students would have biased the estimated population size downward and skewed the sample toward students who were financially better off or held scholarships.
Using Mixed Methods to Estimate the Target Population: A Step-by-Step Guide
Estimate Using Expert Interviews
Expert interviews were conducted with university officials, including the Registrar and departmental heads. These interviews explored implicit knowledge (Döringer 2021) and highlighted issues with obtaining a sampling frame. They also facilitated progress toward a solution. Departmental heads identified student representatives for each degree course, who routinely meet and interact with students. Course representatives were contacted and asked to estimate the number of students attending or actively participating in their respective courses. Initial numbers for sampling were calculated based on estimates from these interviews.
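The text does not say how the initial sampling numbers were derived from the representatives' estimates. As an illustration only, the sketch below assumes proportional allocation across degree courses with largest-remainder rounding. The function name `allocate_proportional`, the course labels, and the cohort sizes are hypothetical placeholders, not data from the study.

```python
def allocate_proportional(cohort_estimates, total_sample):
    """Split total_sample across strata in proportion to estimated cohort size.

    Assumption (not stated in the source): proportional allocation, with
    largest-remainder rounding so the allocations sum to total_sample.
    """
    population = sum(cohort_estimates.values())
    if population <= 0:
        raise ValueError("cohort estimates must sum to a positive number")
    # Fractional number of respondents owed to each stratum.
    exact = {c: total_sample * n / population for c, n in cohort_estimates.items()}
    # Round down first (values are non-negative, so int() is a floor).
    alloc = {c: int(x) for c, x in exact.items()}
    shortfall = total_sample - sum(alloc.values())
    # Give the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - alloc[c], reverse=True)
    for c in by_remainder[:shortfall]:
        alloc[c] += 1
    return alloc


# Hypothetical cohort sizes reported by course representatives.
estimates = {"Course A": 40, "Course B": 120, "Course C": 240}
allocation = allocate_proportional(estimates, total_sample=100)
```

With these placeholder figures each course receives a quarter of its estimated cohort size, so any error in a representative's estimate carries over directly into the number sampled from that course.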
Verify Estimates Using Interviews with Participants Sampled
Though informative, relying on a single student representative's estimate per degree program could lead to inaccuracies. After a week of administering the survey and sampling using estimates from interviews (Step 1), three sampled students from each degree course were randomly identified, contacted, and asked to estimate the size of their degree cohort. If an estimate differed markedly from the other estimates (by a margin of 10%), a fourth student was called, and so forth. Table 1 provides estimates for a selection of degree courses. The first observation is that estimates were more precise for smaller cohorts like Accounting and Civil Engineering. For larger courses, like Political Science and Sociology/Social Work, an outlier was more likely, and almost all large courses required four estimates. The average of the three closest estimates was calculated per course, and these averages were summed to approximate the overall population. These new estimates were used to sample in the second and final week of the survey.
Table 1. Population Estimate by Degree Course.
Source: Author collected data.
*Highlighted cells indicate an outlier that was excluded from the calculation.
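The verification rule in Step 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions: the text does not say what the 10% margin is measured against, so the sketch measures deviation from the median of the estimates, and it reads "the three closest estimates" as the three values with the smallest spread. The function names and all numbers are hypothetical, not taken from Table 1.

```python
from statistics import mean, median


def needs_another_estimate(estimates, margin=0.10):
    """True if any estimate deviates from the median by more than `margin`.

    Assumption: the 10% margin is relative to the median of the estimates.
    """
    mid = median(estimates)
    return any(abs(e - mid) > margin * mid for e in estimates)


def three_closest(estimates):
    """Return the three estimates with the smallest spread (max minus min)."""
    if len(estimates) < 3:
        raise ValueError("need at least three estimates")
    ordered = sorted(estimates)
    # In sorted order, the tightest group of three is always consecutive.
    windows = [ordered[i:i + 3] for i in range(len(ordered) - 2)]
    return min(windows, key=lambda w: w[2] - w[0])


def cohort_estimate(estimates):
    """Average of the three closest estimates for one degree course."""
    return mean(three_closest(estimates))


# Hypothetical student estimates for a small and a large course.
small_course = [48, 50, 52]           # three estimates agree
large_course = [300, 310, 420, 305]   # 420 triggered a fourth call

population = cohort_estimate(small_course) + cohort_estimate(large_course)
```

In this made-up example the small course needs no further calls, while the first three estimates for the large course include an outlier, so a fourth estimate is collected and the outlier is dropped from the average.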
Triangulate Using Secondary Quantitative Data
Secondary administrative data from the University of Sierra Leone’s graduation booklets, media reports, and a national survey of tertiary institutions were then used to compare and evaluate the validity of estimates. Comparing secondary and primary quantitative data for consistency has been used to overcome similar issues in other developing countries like Ethiopia (Harris 2022a). First, the estimate for each degree course was compared to graduation numbers from the university’s graduation booklets for the past five years, where the course still existed. The graduation numbers could not be used as the main source for the research because the West African Ebola outbreak disrupted the 2014–2016 period. Second, estimates were triangulated with local media reports on graduation numbers, drawing on the three main daily newspapers. Third, the relative shares by faculty were compared to the then-most recent estimate (World Bank 2013). As shown in Table 2, shares were broadly similar, and where differences exist, they reflect reported declines in Science, Technology, Engineering, and Mathematics (STEM) courses and an expansion in Social Sciences (Harris 2021).
Table 2. Sample Proportions Versus National Shares.
Source: Modified from Harris (2022b:18).
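The third check, comparing relative shares by faculty against a reference source, amounts to simple arithmetic. The sketch below is illustrative only: the faculty labels, counts, and reference shares are hypothetical placeholders rather than the values in Table 2, and `share_gaps` is a made-up helper name.

```python
def shares(counts):
    """Convert counts per faculty into proportions of the total."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def share_gaps(estimated_counts, reference_shares):
    """Percentage-point gap (estimate minus reference) for each faculty."""
    est = shares(estimated_counts)
    return {k: round(100 * (est[k] - reference_shares[k]), 1) for k in est}


# Hypothetical estimated cohort sizes and reference shares by faculty.
estimated_counts = {"Faculty A": 500, "Faculty B": 300, "Faculty C": 200}
reference_shares = {"Faculty A": 0.45, "Faculty B": 0.30, "Faculty C": 0.25}

gaps = share_gaps(estimated_counts, reference_shares)
```

A positive gap means the faculty is larger in the primary estimates than in the reference source. Whether a given gap is acceptable is a judgment call that depends on known trends, such as the shifts between STEM and Social Sciences noted above.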
Reconfirm Using Focus Group Discussions
Using focus groups to triangulate in mixed methods projects is common in developing countries (Bamberger et al. 2010; Harris 2022a). Following the survey, a subset of 36 respondents (from the sample of 392) was invited to participate in focus group discussions for the main research topic. Twenty-nine of those invited attended. The focus groups included discussions on student experience (among other things) and, related to this, the “adequacy of resources per student.” Here, participants were asked to estimate the size of their cohort. These final checks could not influence the sample selection but were used to highlight any significant deviations from the estimates obtained in Steps 1–3. None were found. Had there been notable differences, sampling weights would have been adjusted during the data analysis phase (Levy and Lemeshow 2013).
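The text does not describe how sampling weights would have been adjusted. One common approach, assumed here purely for illustration, is to recompute the stratum expansion weight (estimated population divided by number sampled) with the revised population estimate. The helper name `stratum_weights` and all figures are hypothetical.

```python
def stratum_weights(population_by_stratum, sample_by_stratum):
    """Expansion weight per respondent in each stratum: N_h / n_h.

    Assumption: a standard stratified design weight; the source does not
    specify the adjustment it would have used.
    """
    return {
        h: population_by_stratum[h] / sample_by_stratum[h]
        for h in sample_by_stratum
    }


# Hypothetical numbers of respondents sampled per course.
sampled = {"Course A": 10, "Course B": 30}

# Population estimates used at sampling time (hypothetical).
initial_population = {"Course A": 40, "Course B": 120}
# Revised estimates, had a later check suggested Course B was larger.
revised_population = {"Course A": 40, "Course B": 150}

initial_weights = stratum_weights(initial_population, sampled)
revised_weights = stratum_weights(revised_population, sampled)
```

In this made-up case each Course B respondent would stand for five students rather than four after the revision, which corrects the under-representation of that course at the analysis stage without changing who was sampled.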
Conclusion
This short take has outlined a mixed methods approach for overcoming challenges with estimating the target population from which to sample. The method can be applied when three conditions hold: the population and sampling frames overlap (so the population need not be dissected into multiple sampling frames), secondary data are available for triangulation, and time and administrative costs allow for the collection of qualitative data for triangulation.
Although data collection occurred in a university setting in Sierra Leone, lessons can be extended to similar research contexts. For example, an interesting research area in international development involves migrants. Here, interviews with government officials, aid workers, and migrant representatives can be used to obtain initial estimates, which can then be compared with secondary data (from the United Nations, say), and reconfirmed during qualitative fieldwork. Similar techniques can be used for research in rural areas, where there is high research interest (e.g., in health) but poor administrative data.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the International Growth Centre (39408).
