Abstract

The paper presents a methodology to generate experimental small area estimates (SAE) of poverty in four West African countries: Chad, Guinea, Mali, and Niger. Due to the absence of recent census data in the four countries, household level survey data are integrated with grid-level geospatial data, which are used as covariates in model-based estimation. Leveraging geospatial data enables reporting of poverty estimates more frequently at disaggregated administrative levels and makes estimation feasible in areas for which survey data are not available. The paper leverages the availability of a recent census in Burkina Faso for evaluation purposes. Estimates obtained with the same survey instruments and candidate geospatial covariates as the other four countries are compared against estimates obtained using recent census data and an empirical best predictor under a unit level model. For Burkina Faso, estimates obtained using geospatial data are highly correlated with the census-based ones in sampled areas but only moderately correlated in non-sampled areas. The results demonstrate that in the absence of recent census data, small area estimation with publicly available geospatial covariates is feasible, can lead to large efficiency improvements compared to direct estimation, and can improve the timeliness of small area estimates.

Keywords

headcount poverty rates, nested error regression model, geospatial covariates, empirical best predictor

1. Introduction

This paper presents the methodology to generate experimental small area estimates (SAE) of poverty in four West African countries: Chad, Guinea, Mali, and Niger, as well as an evaluation exercise using data from another country in the region, Burkina Faso. SAE is a statistical method used to improve survey estimates by integrating survey data with geographically comprehensive auxiliary data (covariates) typically derived from census, administrative, remote sensing, or mobile phone data. Data integration is achieved with the use of statistical models to produce estimates at disaggregated geographic levels that are more accurate and precise than estimates that rely only on direct use of the survey data. More disaggregated estimates are key for a better understanding of how to target interventions for the poorest areas as well as for monitoring the impact of such interventions.

Table 1 illustrates the issues with obtaining poverty estimates at disaggregated geographic levels solely from survey data in the countries under study in this paper. The estimated coefficient of variation (cve) is a common measure used to judge the statistical precision of an estimate. Countries often adopt a maximum threshold for the mean or median cves of the estimates that can be reported, which in practice usually ranges from 0.15 to 0.3. For the countries of focus in this paper, the most recent survey estimates of poverty can be obtained from the 2018 round of the Enquête Harmonisée sur les Conditions de Vie des Ménages (EHCVM), which is available at the regional level for each country. This survey is the main output from the Harmonized Surveys on Household Living Conditions Program of the World Bank and the West Africa Economic and Monetary Union (WAEMU) Commission, which resulted in ten countries (eight WAEMU members plus Guinea and Chad) collecting household data and constructing household welfare using methodologies that were highly harmonized across all the countries and updated in line with international best practice.

Table 1.

Statistics of Poverty Estimates in the Focus Countries.

Country	Burkina Faso	Chad	Guinea	Mali	Niger
Year of most recent census	2018	2009	2014	2009	2012
Regions
Number of regions (population)	13	22	8	9	8
Number of regions (sample)	13	23	8	9	8
Median cve of sample estimates of headcount poverty rates for regions	0.114	0.123	0.069	0.085	0.079
Target area
Name of target area	Commune	Department	Subprefecture	Commune	Commune
Number of target areas (population)	351	112	343	704	266
Number of target areas (sample)	234	99	251	244	228
Median cve of sample estimates of headcount poverty rates for target areas	0.435	0.271	0.370	0.415	0.425

Note. Sample estimates are obtained from the subsample of households with valid GPS coordinates in the 2018 round of the EHCVM in each country. Variance estimates were obtained using an approximation to the variance of the Horvitz-Thompson estimator implemented in the R sae package (Molina and Marhuenda 2015), which assumes that the second order inclusion probabilities are the product of first order inclusion probabilities. The cves in this table are used only to illustrate the infeasibility of obtaining reliable direct estimates at the required level of geography. Departments in Chad are defined using an unofficial shapefile provided by the National Institute of Statistics, Economic, and Demographic Studies (INSEED).

Using EHCVM data, the median cve of regional direct estimates produced by the Horvitz-Thompson estimator ranges from 0.07 to 0.12, which is typically within the acceptable range for publication. However, when we examine direct estimates of the poverty rates for the set of target administrative areas, which are one or two levels below the region, the estimates are too imprecise to publish. The target geography in Chad is the latest unofficial definition of departments provided to us by the National Institute of Statistics, while in Guinea it is the Subprefectures, and in Mali and Niger it is Communes. At these levels, the median cve of the direct survey estimates reported in Table 1 exceeds the 0.3 threshold for each country except for Chad, where it is 0.27. In addition, not all the target areas are covered by the surveys, making direct estimation impossible for these areas. While this is the case in all countries, unsampled areas are particularly prevalent in Mali, where less than 40% of the target areas are in the sample.
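The figures quoted in this paragraph can be re-derived from Table 1 with a few lines of arithmetic. The following is a minimal sketch in Python; the counts and median cves are copied from Table 1, the 0.3 threshold is the upper end of the range mentioned in the text, and the helper names (`cve`, `summarize`) are illustrative and not part of any package used in the paper.

```python
import math

# (target areas in the population, target areas in the sample), from Table 1
target_areas = {
    "Burkina Faso": (351, 234),
    "Chad": (112, 99),
    "Guinea": (343, 251),
    "Mali": (704, 244),
    "Niger": (266, 228),
}

# Median cve of direct headcount poverty estimates for target areas, from Table 1
median_cve = {
    "Burkina Faso": 0.435,
    "Chad": 0.271,
    "Guinea": 0.370,
    "Mali": 0.415,
    "Niger": 0.425,
}

CVE_THRESHOLD = 0.3  # upper end of the 0.15 to 0.3 range cited in the text


def cve(estimate, variance):
    """Estimated coefficient of variation: standard error divided by the estimate."""
    return math.sqrt(variance) / estimate


def summarize(areas, cves, threshold):
    """Share of target areas that are sampled, and whether the median cve is too high."""
    summary = {}
    for country, (n_population, n_sample) in areas.items():
        summary[country] = {
            "share_sampled": n_sample / n_population,
            "exceeds_threshold": cves[country] > threshold,
        }
    return summary


summary = summarize(target_areas, median_cve, CVE_THRESHOLD)
for country, row in summary.items():
    print(f"{country}: {row['share_sampled']:.1%} of target areas sampled, "
          f"median cve above threshold: {row['exceeds_threshold']}")
```

Under these inputs Mali is the only country with fewer than 40% of its target areas in the sample, and Chad is the only country whose median cve stays below 0.3.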

Typically, small area estimation applications combine survey data with covariates from census (or other population) data. However, except for Burkina Faso, the last time a census was conducted in these countries was between 2009 and 2014. Using out-of-date census data to update small area estimates can lead to biased estimates, for example if the distribution of the census covariates used for prediction has changed over time. This is an issue that is often not discussed in applied poverty mapping work. Literature on approaches to update poverty estimates in the intercensal period includes Isidro et al. (2016), Koebe et al. (2022), and Arias-Salazar (2023). In this paper we rely on contemporaneous geospatial covariates, as first illustrated by Battese et al. (1988; see also Nguyen 2012), to produce small area poverty estimates in countries that lack recent census data.

Advances in processing of geospatial data and the richness of geospatial data sources make their use as auxiliary information in small area models appealing. Newhouse (2024) summarizes recent literature on the use of geospatial data for small area estimation of wealth and poverty. Jean et al. (2016), Yeh et al. (2020), and Chi et al. (2022) show that satellite data are predictive of wealth indices. The present paper utilizes a method commonly used in small area estimation based on the empirical best predictor (EBP) under a nested error regression model (also referred to as mixed model; Molina and Rao 2010). When applied to predicting headcount poverty rates using geospatial covariates, this method has yielded predictions that are highly correlated with up-to-date census-based estimates in Mexico, Sri Lanka, and Tanzania (Masaki et al. 2022; Newhouse et al. 2022). The methodology we use in this paper deviates from the official approach endorsed by the World Bank’s Poverty Global Practice, as described in Corral et al. (2022), which is based on the EBP under a unit (household) level mixed model, and census micro-data as covariates (referred to as census-EBP). The main difference, besides the use of geospatial (instead of population census) covariates, is that our modeling approach utilizes only grid cell covariates, but the outcome is still modeled at the unit (household) level. This is why sometimes this latter model is referred to as the unit context model.

We explore the use of the unit context model in Chad, Guinea, Mali, and Niger, which lack recent census data. We further leverage the availability of recent census data for a fifth country in West Africa, Burkina Faso, to conduct an evaluation exercise. The evaluation exercise compares estimates of headcount poverty rates obtained with a unit level model and the empirical best predictor using census covariates, with poverty rates obtained using the empirical best predictor under the unit context model with geospatial covariates.

An alternative approach to small area estimation using geospatial covariates is to use an area level model (Fay and Herriot 1979), in which case both the direct estimates of poverty rates and the geospatial covariates are aggregated at the target area level. Hence, in the evaluation exercise presented in Section 4 we also produce estimates under a Fay-Herriot model as a way of providing additional evidence about the validity of the estimates produced under the unit context model.

Using geospatial data instead of census data in SAE and the unit context model has been criticized in recent literature (e.g., Corral et al. 2021). This is due to the possible introduction of omitted variable bias (relative to the unit level model) resulting from the aggregation of the geospatial covariates. Although a detailed discussion of this issue is beyond the scope of the current paper, being cognizant of the potential impact of using the unit context model on small area estimates is important.

First, the bias that has been reported in the literature is relative to an assumed gold-standard unit (household) level model and the availability of up-to-date household level census micro-data. It is our view that if recent census data are available, the census-EBP method should be preferred. We argue, however, that in the absence of recent census data, the use of geospatial covariates may constitute a valid alternative for providing up-to-date small area estimates until data from the next census becomes available. Second, we have observed that the extent of bias in the unit context model depends on the method used to account for sample weights. In this paper, weights are incorporated following Guadarrama et al. (2018). This weighting procedure was implemented in a way that adjusts the estimates of the regression coefficients and random effects to account for sample weights but uses the unweighted REML estimators for the variance components. As shown below, this can cause significant differences in small area estimates which are larger for models with lower predictive power, which is typically the case with unit context models. In Section 4 we explore both weighted and unweighted versions of the unit context model to assess how this impacts the estimates. Third, noting that aggregation is unavoidable due to the way geospatial data are processed, it is worth mentioning that the geographic level at which geospatial covariates are processed and linked to survey data (grid size) impacts the estimation. Because of this and because geospatial covariates can only act as proxies for the kind of variables typically used to model income (or consumption), it is reasonable to assume that the unit context model may show lower levels of predictive power and higher uncertainty than the unit (household) level model. 
However, because the estimators of interest are aggregations of individual level predictions, it is not obvious that the lower predictive power and higher uncertainty will substantially reduce the quality of the small area estimates obtained by using the unit context model. Finally, as is the case with any model-based method, model building, variable selection, and residual diagnostics are critical. The data analyst can try to mitigate the impact of aggregation by processing the geospatial data at the finest feasible spatial level to maximize the effective sample size. However, this may increase the risk of observing outliers in the geospatial data. The use of transformations may help make the data more consistent with the assumptions that the functional form is linear, and the error terms are distributed normally. As always, the use of model-diagnostics is crucial.

In addition, Corral et al. (2021) report concerns with the estimated measures of uncertainty under a unit context model. From our perspective, if the model assumptions are satisfied, a parametric bootstrap MSE estimator will provide a valid estimator of the uncertainty under the assumed model. Since the true data generating process is unknown, we cannot know a priori the extent to which the model assumptions are violated, regardless of the type of model assumed. In Section 4, we present results from Burkina Faso comparing coverage rates derived from the parametric bootstrap under the unit context model, treating census-based estimates as truth, with those from other estimators. For sampled areas, coverage rates under the unit context model are slightly below those from direct estimates and slightly above those obtained from an area level model, indicating that the estimated measures of uncertainty obtained through the parametric bootstrap are reasonable in this case.
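To make the notion of a coverage rate concrete, the following is a minimal Python sketch. It assumes normal-theory intervals of the form estimate ± z·√MSE with z = 1.96 (a nominal 95% level, which is an assumption here and not a value stated in this section), and it treats a benchmark value per area as truth, in the way the census-based estimates are treated in the evaluation. The numbers are invented for illustration only.

```python
import math


def coverage_rate(estimates, mses, benchmarks, z=1.96):
    """Share of areas whose interval estimate +/- z * sqrt(MSE) contains the benchmark."""
    covered = 0
    for estimate, mse, benchmark in zip(estimates, mses, benchmarks):
        half_width = z * math.sqrt(mse)
        if estimate - half_width <= benchmark <= estimate + half_width:
            covered += 1
    return covered / len(estimates)


# Invented poverty rates for four areas (not results from the paper)
estimates = [0.40, 0.55, 0.30, 0.62]   # model-based estimates
mses = [0.0025, 0.0016, 0.0009, 0.0036]  # estimated MSEs, e.g. from a bootstrap
benchmarks = [0.45, 0.50, 0.40, 0.60]  # values treated as truth

rate = coverage_rate(estimates, mses, benchmarks)
print(rate)  # 0.75: the third area's interval misses its benchmark
```

A coverage rate well below the nominal level would suggest that the MSE estimates understate the uncertainty, or that the point estimates are biased relative to the benchmark.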

In summary, we prefer to avoid making definitive statements about whether the unit context model works well or poorly. We instead posit that in the absence of a recent census, a unit context model with geospatial data may be considered as an alternative to the use of outdated census data. The presence of a recent census in Burkina Faso provides a valuable opportunity for evaluating this method. As with every SAE application, the performance of different methods will depend on the country context and the characteristics of the available survey and auxiliary data they are applied to. Evaluations of the estimates therefore remain of paramount importance.

The paper is organized as follows. Section 2 describes the data sources and the process of integrating geospatial and survey data. Section 3 presents the core of the small area methodology, model selection and assessment, small area estimation, mean squared error estimation, and measures to assess the small area estimates for all countries of focus in this paper. Section 4 presents an evaluation exercise using recent census and survey data in Burkina Faso. This allows us to compare small area estimates produced with geospatial covariates to small area estimates produced using covariate information from census micro-data. The results of the evaluation exercise add new insights to the body of literature on the use of geospatial data in small area estimation and motivate the use of the unit context model with geospatial data in the four remaining countries that lack up-to-date census data. Section 5 presents experimental point and uncertainty estimates for all countries using the unit context model. The paper concludes with a summary of the main findings and areas for further research.

2. Data Sources and Geospatial Data Integration

In this paper, we use geospatial covariates because, as shown in Table 1, the most recent censuses in the four focus countries were conducted in 2014 in Guinea, 2012 in Niger, and in 2009 in Chad and Mali. If more recent census data existed, using these data would be the preferred option. For example, several variables routinely collected in censuses such as household size, education, and sector of employment have been shown to be highly predictive of household welfare. Estimates based on recent census data are expected to be more accurate and precise than estimates based on geospatial data, which is often only available at an aggregated level (see e.g., Corral et al. 2021).

In this paper, however, we avoid using household level predictors in the model because information for the same predictors from a recent census is not available. Using old census data can be problematic because it is not guaranteed to capture developments since the last census, especially in countries impacted by rapid changes. Interpreting the estimates as if these arise from the census year requires assuming that the distribution of the census predictors, as well as their relationship to poverty, has not changed over time. This is a particular concern in countries such as those under study in this paper, which have among the highest fertility rates in the world and, in addition, have suffered from recent conflict and climate shocks which likely affected the geographic distribution of poverty and the geographic distribution of the population. Alternative sources of administrative data, such as health, land, or other administrative records, can also be useful sources of auxiliary data for small area estimation. However, these were not possible to obtain, and would not necessarily be commonly available for all four countries. We therefore decided to use publicly available, up-to-date geospatial data as covariates in small area models. The full list of candidate geospatial covariates, together with a brief description of each, is included in Table A1.

To estimate the model, we use survey data from the 2018 EHCVM surveys in the focus countries. The process of integrating the geospatial covariates with the survey data in each country is as follows. First, we process the covariates on a gridded shapefile with square grid cells of size 1 km² covering the totality of the country. Then, each household in the survey is matched to a grid cell using the centroid of the Enumeration Area in which the household is located. For each country it was observed that in the 2018 EHCVM surveys, geocoordinates were not available for a small share of households (representing less than 7% of all surveyed households in all cases). We dropped these households from the data. A detailed description of the differences between the full sample and the portion with available geocoordinates that was used in the analysis is presented in Table A2.
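The matching step can be sketched as follows in Python. This is an illustration under simplifying assumptions, not the processing pipeline used for the paper: coordinates are assumed to be already projected to metres, the grid is assumed to start at a known origin, and the enumeration area identifiers and household records are hypothetical. In practice this step would typically be carried out with GIS software.

```python
import math


def grid_cell_id(x_m, y_m, origin_x=0.0, origin_y=0.0, cell_size_m=1000.0):
    """Return the (column, row) index of the 1 km x 1 km cell containing a projected point."""
    column = math.floor((x_m - origin_x) / cell_size_m)
    row = math.floor((y_m - origin_y) / cell_size_m)
    return column, row


# Hypothetical enumeration area (EA) centroids, in projected metres
ea_centroids = {
    "EA-001": (1530.0, 2999.0),
    "EA-002": (1999.9, 2000.0),
    "EA-003": (4200.0, 150.0),
}

# Hypothetical households; "ea" is None when geocoordinates are unavailable
households = [
    {"hh": "h1", "ea": "EA-001"},
    {"hh": "h2", "ea": "EA-001"},
    {"hh": "h3", "ea": "EA-002"},
    {"hh": "h4", "ea": None},
]

matched = []
for household in households:
    if household["ea"] is None:
        continue  # households without geocoordinates are dropped, as in the text
    x_m, y_m = ea_centroids[household["ea"]]
    matched.append({**household, "cell": grid_cell_id(x_m, y_m)})

print(matched)
```

Because the match uses the enumeration area centroid, all households in the same enumeration area receive the same grid cell and therefore the same grid-level covariates.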

Figures 1a and 1b illustrate the use of grid cells and the creation of geospatial zonal statistics. Figure 1a shows the gridded cells in Conakry, Guinea. Figure 1b shows the value of the average radiance of nighttime lights across grid cells in the same area. The lighter grid cells have higher values of nightlights, while the darker cells have lower values. For each grid cell, we calculated the average feature value from the raster data. In addition to these grid cell-level indicators, we also calculate mean values of each indicator at the target area level to include as predictors in small area models. Including these contextual variables at the target area level as additional covariates helps improve the predictive performance of the model.

Figure 1.

(a) Grid in Conakry, Guinea and (b) average radiance of nighttime lights in Conakry, Guinea.
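The two levels of zonal statistics described above, grid cell means and unweighted target area means of those grid values, can be sketched in Python as follows. The pixel values, grid identifiers, and the grid-to-area lookup are invented for illustration; real rasters would be handled with a geospatial library rather than plain lists.

```python
from statistics import mean

# Hypothetical raster pixels: (grid cell id, nighttime lights radiance)
pixels = [
    ("g1", 2.0), ("g1", 4.0),
    ("g2", 10.0), ("g2", 14.0),
    ("g3", 0.0), ("g3", 1.0),
]

# Hypothetical lookup from grid cell to target area
cell_to_area = {"g1": "A", "g2": "A", "g3": "B"}


def grid_means(pixel_values):
    """Average raster value within each grid cell (the grid-level covariate)."""
    by_cell = {}
    for cell, value in pixel_values:
        by_cell.setdefault(cell, []).append(value)
    return {cell: mean(values) for cell, values in by_cell.items()}


def area_means(cell_means, lookup):
    """Unweighted average of the grid-level values within each target area."""
    by_area = {}
    for cell, value in cell_means.items():
        by_area.setdefault(lookup[cell], []).append(value)
    return {area: mean(values) for area, values in by_area.items()}


x_grid = grid_means(pixels)
x_area = area_means(x_grid, cell_to_area)
print(x_grid)  # {'g1': 3.0, 'g2': 12.0, 'g3': 0.5}
print(x_area)  # {'A': 7.5, 'B': 0.5}
```

Each household would then carry both the value of its own grid cell and the mean for the target area in which that cell lies.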

3. Small Area Estimation Methodology

In this section we present a summary of the small area methodologies we use to estimate headcount poverty rates at the level of the target area in the five countries of interest. We use a version of the Empirical Best Predictor (EBP; Battese et al. 1988; Jiang and Lahiri 2006; Molina and Rao 2010; Tzavidis et al. 2018) under the unit context nested error regression model with households as the unit of analysis and covariates defined by zonal statistics of geospatial variables at grid cell level (centroid of enumeration areas within target areas). Our methodology is similar to the one used by Masaki et al. (2022), who use small area estimation to estimate non-monetary poverty indicators in Tanzania and Sri Lanka with geospatial covariates, and Newhouse et al. (2022), who apply similar techniques with geospatial covariates to estimate monetary poverty in Mexico. Van der Weide et al. (2022) also examine the performance of poverty estimates with geospatial covariates in Malawi but use a spatial error model with sub-area level estimates of poverty rates as the outcome, and geospatial zonal statistics as covariates.

As mentioned in previous sections, when the census and survey data are collected from around the same time, using household level census covariates is generally preferred, because the census tends to contain richer auxiliary information than geospatial data. When census data are sufficiently old, however, using cluster level covariate aggregates taken from the old census can generate more accurate estimates than using old census household level covariates (Lange et al. 2021). None of these variations of covariate use, however, reflect any changes in the distribution of the census covariates since the last census. Because, except for Burkina Faso, the census data in the focus countries of this paper are not up to date, we explore the use of more current geospatial data as covariates instead of old census data.

We opt for a household level model of welfare over a grid cell level model of poverty rates because it utilizes more detailed information about the distribution of the welfare variable, and it is easier to interpret. In addition, defining the grid cell level poverty rate as the outcome to be estimated, as in the case of an area level model, requires accounting for the corresponding sampling variability, which may be challenging at such a small level of aggregation. We also prefer a household level model to an area level model because the former allows for the use of auxiliary data at the grid cell level rather than at the target area level, which can improve the accuracy and precision of the estimates, as demonstrated in Masaki et al. (2022) and Newhouse et al. (2022).

We model the household log per capita consumption as a linear function of a subset of geospatial covariates selected through Lasso. The procedure is described in detail in Appendix B. The model equation takes the form:

$$\ln Y_{ragh} = X_{rag} β_{1} + X_{ra} β_{2} + D_{r} β_{3} + ν_{a} + ϵ_{ragh} \tag{1}$$

where $\ln Y_{ragh}$ represents the log per capita consumption of household h, for which the centroid of their survey enumeration area falls in grid g within target area a and region r. This value of consumption has been spatially deflated using estimated local prices. $X_{rag}$ represents the vector of grid cell geospatial zonal statistics, and $X_{ra}$ represents the vector of unweighted averages of the geospatial variables at the target area level. $D_{r}$ represents a set of regional dummy variables, $ν_{a}$ is a random effect specified at the target area level with $ν_{a} \sim N (0, σ_{v}^{2})$, and $ϵ_{ragh}$ is a household-specific error term with $ϵ_{ragh} \sim N (0, σ_{ϵ}^{2})$. Survey weights are incorporated into model estimation following Guadarrama et al. (2018), as described in Skarke et al. (2021). A recent paper by Cho et al. (2024) presents optimal predictors for general parameters under an informative sampling design. Implementing this methodology with the data from the focus countries is a useful area for future research.

The EBP works by repeatedly simulating synthetic populations $\ln Y_{ragh}$ under model Equation (1) using the expected value of what is unobserved, after conditioning on what is observed in the sample. Under the assumed linear mixed model, this expectation has a closed form. Having fit model Equation (1), the expected log household per capita consumption for each household in the population is computed as follows:

$$\ln Y_{ragh}^{(l)} = X_{rag} \hat{β}_{1} + X_{ra} \hat{β}_{2} + D_{r} \hat{β}_{3} + \hat{ν}_{a} + v_{a}^{(l)} + ϵ_{ragh}^{(l)}, \quad l = 1, \ldots, L$$

where ${\hat{ν}}_{a}$ is the random effect predicted with the sample data, $v_{a}^{(l)}$ is generated from $N (0, {\hat{σ}}_{v}^{2} (1 - {\hat{γ}}_{a}))$ , $ϵ_{ragh}^{(l)}$ is drawn from $N (0, {\hat{σ}}_{ε}^{2})$ and ${\hat{γ}}_{a}$ is the area-specific shrinkage factor that depends on the estimated variance components and the area sample sizes (Molina and Rao 2010). For each simulated synthetic population, the target area-specific parameter, the headcount poverty rate, is computed using the simulated values of the welfare variable (per capita consumption) and the official national poverty lines for each country. This procedure is repeated L = 100 times and the final estimated poverty rates for each area correspond to the average across the 100 simulations.
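The Monte Carlo procedure for a single target area can be sketched in Python as follows. This is a simplified illustration, not the povmap implementation: it ignores survey weights, takes the fitted linear predictors and variance components as given, and uses the standard unweighted shrinkage factor $\hat{γ}_{a} = \hat{σ}_{v}^{2} / (\hat{σ}_{v}^{2} + \hat{σ}_{ε}^{2} / n_{a})$, which is an assumption here since the weighted version used in the paper depends on effective sample sizes. All input values are invented.

```python
import math
import random


def shrinkage_factor(sigma_v2, sigma_e2, n_a):
    """Standard unweighted shrinkage factor; zero for areas with no sampled households."""
    if n_a == 0:
        return 0.0
    return sigma_v2 / (sigma_v2 + sigma_e2 / n_a)


def mc_headcount(linear_predictors, nu_hat, sigma_v2, sigma_e2, n_a,
                 poverty_line, L=100, seed=1):
    """Average headcount poverty rate over L simulated populations for one area.

    linear_predictors holds the fitted fixed part (X beta-hat) for each population unit.
    """
    rng = random.Random(seed)
    gamma_a = shrinkage_factor(sigma_v2, sigma_e2, n_a)
    sd_v = math.sqrt(sigma_v2 * (1.0 - gamma_a))
    sd_e = math.sqrt(sigma_e2)
    log_z = math.log(poverty_line)
    replicate_rates = []
    for _ in range(L):
        v_l = rng.gauss(0.0, sd_v)  # one area-level draw per simulated population
        poor = 0
        for xb in linear_predictors:
            simulated_log_welfare = xb + nu_hat + v_l + rng.gauss(0.0, sd_e)
            if simulated_log_welfare < log_z:
                poor += 1
        replicate_rates.append(poor / len(linear_predictors))
    return sum(replicate_rates) / L


rate = mc_headcount([12.0, 12.2, 12.5, 12.9], nu_hat=0.05, sigma_v2=0.04,
                    sigma_e2=0.25, n_a=12, poverty_line=math.exp(12.4))
print(rate)
```

For an area with no sampled households, the sketch sets the shrinkage factor to zero, so the area-level draw uses the full random effect variance and `nu_hat` would be passed as zero.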

In this paper we implement a version of the EBP that calculates the expected value of headcount poverty given the estimated model parameters, for each population unit (which in this case is a grid). Under the assumed linear mixed model, this expectation has a closed form. Having fit model Equation (1), the expected poverty rate for each grid is computed as follows:

$$\hat{P}_{rag} = Φ\left(\frac{\ln Z - X_{rag} \hat{β}_{1} - X_{ra} \hat{β}_{2} - D_{r} \hat{β}_{3} - \hat{ν}_{a}}{\sqrt{\hat{σ}_{v}^{2} (1 - \hat{γ}_{a}) + \hat{σ}_{ε}^{2}}}\right),$$

where $Φ$ is the standard normal cumulative distribution function, $\ln Z$ is the natural logarithm of the poverty line, and ${\hat{β}}_{1}$ , ${\hat{β}}_{2}$ , ${\hat{β}}_{3}$ , ${\hat{σ}}_{v}^{2}$ , ${\hat{σ}}_{ε}^{2}$ , and ${\hat{ν}}_{a}$ are estimated model parameters, and ${\hat{γ}}_{a}$ is the area-specific shrinkage factor that depends on the estimated variance components and the effective area sample size. The estimated target-area headcount poverty rate, ${\hat{P}}_{ra}$ , is computed by taking a weighted average of the grid-level poverty estimates in each area, with the grid population estimates from WorldPop playing the role of aggregation weights. Estimation is implemented using a modified version of the povmap package (Edochie et al. 2024) in R (code is available on the development branch of the package at: https://github.com/SSA-Statistical-Team-Projects/povmap). The povmap package is itself a modified version of the emdi package (Kreutzmann et al. 2019) that allows for aggregation weights when aggregating across population units (grids in this case). The two versions of implementing the EBP lead to the same estimates for a large number of Monte-Carlo iterations. To verify this, we compare poverty headcount and MSE estimates obtained using the traditional method with $L = 100$ Monte-Carlo replications with those obtained using the expected value approach for one focus country, and report the results in Appendix D. Estimates of the mean squared error (MSE) of the small area estimates are calculated using parametric bootstrap under model Equation (1) (González-Manteiga et al. 2008) as implemented in the povmap package. MSE estimation adjusts for the fact that the population data we use contain only one observation per grid, while the actual population contains multiple households per grid. An empirical best predictor under the two-fold version of the nested error regression model is also available (Marhuenda et al. 2017). 
The two-fold version of the EBP was used to produce official small area estimates of poverty rates in Burkina Faso using the latest census in the country. These estimates are used as part of the sensitivity analysis in Section 4.
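The expected value version of the predictor and the population-weighted aggregation can be sketched in Python as follows. The sketch is illustrative only and is not the povmap code. It standardizes by the standard deviation, that is, the square root of the conditional variance $\hat{σ}_{v}^{2}(1 - \hat{γ}_{a}) + \hat{σ}_{ε}^{2}$, as required for a normal probability. The fitted values, shrinkage factor, and grid populations are invented, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def grid_poverty_rate(linear_predictor, nu_hat, sigma_v2, sigma_e2, gamma_a, poverty_line):
    """Probability that log welfare in a grid falls below the log poverty line."""
    conditional_sd = math.sqrt(sigma_v2 * (1.0 - gamma_a) + sigma_e2)
    standardized = (math.log(poverty_line) - linear_predictor - nu_hat) / conditional_sd
    return NormalDist().cdf(standardized)


def area_poverty_rate(grid_rates, grid_populations):
    """Population-weighted average of grid-level poverty rates within a target area."""
    total_population = sum(grid_populations)
    weighted_sum = sum(rate * pop for rate, pop in zip(grid_rates, grid_populations))
    return weighted_sum / total_population


# Invented fitted linear predictors and populations for three grids in one area
linear_predictors = [12.0, 12.4, 12.9]
grid_populations = [300, 100, 50]  # e.g. gridded population counts used as weights
poverty_line = math.exp(12.4)

grid_rates = [
    grid_poverty_rate(xb, nu_hat=0.0, sigma_v2=0.04, sigma_e2=0.25,
                      gamma_a=0.6, poverty_line=poverty_line)
    for xb in linear_predictors
]
print(grid_rates)
print(area_poverty_rate(grid_rates, grid_populations))
```

A grid whose predicted log welfare equals the log poverty line receives a poverty rate of exactly 0.5, which is a convenient sanity check on any implementation.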

An alternative approach to small area estimation with geospatial covariates is to model the poverty rates directly using an area level Fay-Herriot (FH) model (Fay and Herriot 1979). In this case, both the direct estimates of poverty rates, denoted by ${\hat{p}}_{a}^{dir}$ , and the geospatial covariates, denoted by $X_{ra}$ , are aggregated at the target area level. The FH model consists of two stages: the sampling model and the linking model. The combination of both stages results in an area level linear mixed model denoted by

$$\hat{p}_{a}^{dir} = X_{ra} β_{1} + D_{r} β_{2} + ν_{a} + ϵ_{a}$$

where $X_{ra}$ represents the vector of unweighted averages of the geospatial variables selected for the model at the target area level, $D_{r}$ represents a set of regional dummy variables, $ν_{a}$ is a random effect specified at the target area level with $ν_{a} \sim N (0, σ_{v}^{2})$, and $ϵ_{a}$ is the sampling error with $ϵ_{a} \sim N (0, σ_{ϵ_{a}}^{2})$. The sampling variance $σ_{ϵ_{a}}^{2}$ is estimated under the sampling design and is assumed to be known. The variance component of the random effect is estimated by maximum likelihood methods (e.g., the adjusted maximum-likelihood of Li and Lahiri 2010) to guarantee positive variance estimates. The MSE of the estimator under the FH model can be obtained by analytic solutions (e.g., Prasad and Rao 1990) or by bootstrap techniques (González-Manteiga et al. 2008). In this paper the FH model is estimated using the fayherriot command in Stata (Halbmeier et al. 2019) with no transformation. The routine works similarly to the way the FH model is estimated in the emdi and povmap packages (Harmening et al. 2023) in R. Transformed versions of the FH models, using for example the arcsin transformation, are also available and can be considered when modeling proportions. Transformed FH models can also be estimated by using the emdi and povmap R packages.
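For intuition, the point estimate under an untransformed FH model takes the familiar composite form: a weighted combination of the direct estimate and the regression-synthetic prediction, with the weight given by the ratio of the random effect variance to the total variance. The Python sketch below illustrates this textbook form with invented inputs. It is not the internals of the fayherriot command or of the R packages, and the function name is hypothetical.

```python
def fh_point_estimate(direct, synthetic, sigma_v2, sampling_variance):
    """Composite FH estimate: gamma * direct + (1 - gamma) * synthetic.

    synthetic is the regression prediction X_ra * beta1-hat + D_r * beta2-hat.
    For an unsampled area (direct is None) only the synthetic part is available.
    """
    if direct is None:
        return synthetic
    gamma_a = sigma_v2 / (sigma_v2 + sampling_variance)
    return gamma_a * direct + (1.0 - gamma_a) * synthetic


# Invented inputs: a noisy direct estimate is pulled toward the model prediction
precise = fh_point_estimate(direct=0.50, synthetic=0.30, sigma_v2=0.01,
                            sampling_variance=0.0025)
noisy = fh_point_estimate(direct=0.50, synthetic=0.30, sigma_v2=0.01,
                          sampling_variance=0.04)
print(precise, noisy)  # approximately 0.46 and 0.34
```

The larger the sampling variance of the direct estimate, the more weight is placed on the synthetic component, which is the mechanism behind the efficiency gains of area level models in small samples.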

3.1. Model Selection and Assessment

The geospatial data listed in Table A1 were used to construct averages of zonal statistics both at the grid cell level and the target area level which are used as covariates in model Equation (1). In addition, we include dummy variables at the region level. For model selection we use Lasso to select a set of predictor variables while avoiding overfitting. Estimation of the Lasso penalty parameter is implemented by minimizing the Bayesian Information Criterion (Zhang et al. 2010). The regional dummies are unpenalized and therefore are guaranteed to be selected in the model. Details are provided in Appendix B.
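A generic way to write this selection step is given below. It is a sketch of a common textbook formulation and not necessarily the exact criterion used in Appendix B or in Zhang et al. (2010). The symbols $y_{i}$ (log per capita consumption of household $i$), $x_{i}$ (candidate geospatial covariates), $d_{i}$ (regional dummies), $δ$ (unpenalized dummy coefficients), $λ$ (penalty parameter), and $\mathrm{df}_{λ}$ (number of nonzero coefficients) are notation introduced here for illustration.

```latex
% Lasso with unpenalized regional dummies: only the geospatial coefficients are shrunk
(\hat{\beta}_{\lambda}, \hat{\delta}_{\lambda})
  = \arg\min_{\beta,\,\delta}\;
    \frac{1}{2n} \sum_{i=1}^{n} \left( y_{i} - x_{i}^{\top}\beta - d_{i}^{\top}\delta \right)^{2}
    + \lambda \sum_{j} \lvert \beta_{j} \rvert

% Penalty parameter chosen by minimizing a Gaussian BIC over a grid of lambda values
\mathrm{BIC}(\lambda)
  = n \ln\!\left( \frac{\mathrm{RSS}_{\lambda}}{n} \right) + \mathrm{df}_{\lambda} \ln n ,
\qquad
\hat{\lambda} = \arg\min_{\lambda}\, \mathrm{BIC}(\lambda)
```

Because $δ$ does not enter the penalty term, the regional dummies are never shrunk to zero, which is why they are guaranteed to remain in the selected model.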

Broadly, the signs and patterns of the coefficients of the unit context model reflect a positive association between welfare and population and building density, and a negative association between welfare and remoteness, as proxied by agricultural production and a high prevalence of grassland and shrubland. Fitting model Equation (1) in the five countries under consideration leads to R² values ranging from .19 in Chad to .32 in Niger. This range is consistent with similar applications in other contexts. In similar household level models with aggregation of geospatial covariates at similar spatial scales, the R² was .30 in Tanzania and .27 in Sri Lanka when predicting per capita consumption, and .13 in Mexico when predicting per capita income. Geospatial variables do not vary within grid cells and therefore can only explain variation in welfare across enumeration areas. However, the R² is not necessarily the most accurate measure of the benefit of incorporating auxiliary data, as small area estimates based on models with weaker predictors can also be of acceptable quality. Overall, the R² values in the focus countries indicate that the geospatial variables measured at grid cell level (enumeration areas) are moderately predictive of variation in household per capita consumption and can potentially lead to acceptable small area estimates.

Table 2 presents model residual diagnostics under model Equation (1). The error terms appear to be reasonably normal as judged from the skewness and kurtosis, though less so for the unit level error term in Mali. Figure 2 presents quantile-quantile plots for the unit and area estimated model residuals for all five countries. Overall, the results show that the log-transformed model provides a reasonable approximation to the normality of the model error terms. Additional model and residual diagnostics are presented in Appendix C.

Table 2.

Model Residual Diagnostics.

Country	Unit level error term		Area effect		Model R ²
	Skewness	Kurtosis	Skewness	Kurtosis	Marginal	Conditional
Burkina Faso	0.476	3.866	0.194	4.718	.278	.392
Chad	0.400	3.477	−0.299	3.408	.190	.222
Guinea	0.132	3.310	0.067	4.021	.272	.387
Mali	0.559	4.123	−0.093	7.325	.257	.330
Niger	0.434	3.644	0.364	4.916	.317	.365
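Diagnostics of the kind reported in Table 2 can be computed from estimated residuals with simple moment formulas. The following is a minimal sketch, assuming the moment-based definitions under which a normal distribution has skewness 0 and kurtosis 3. The paper does not state which estimator was used, and the function name and the simulated residuals are illustrative only.

```python
import random


def skewness_kurtosis(residuals):
    """Return moment-based skewness and (non-excess) kurtosis.

    Under normality the expected values are roughly 0 and 3.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    # Central moments of order 2, 3 and 4.
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Simulated stand-in for estimated residuals (not data from the paper).
random.seed(0)
simulated = [random.gauss(0.0, 1.0) for _ in range(20000)]
sim_skew, sim_kurt = skewness_kurtosis(simulated)

# A small symmetric example that can be checked by hand.
toy_skew, toy_kurt = skewness_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Values of kurtosis well above 3, such as the area effect for Mali in Table 2, point to heavier tails than the normal distribution.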

Figure 2.

Quantile-quantile plots of unit level error terms and area random effects: (a) Burkina Faso, (b) Chad, (c) Guinea, (d) Mali, and (e) Niger.

4. Evaluation Exercise: Comparison of Geospatial and Census-Based Estimates of Headcount Poverty in Burkina Faso

Before presenting estimates of headcount poverty rates for the four focus countries that lack recent census data, we conduct a sensitivity analysis with data from Burkina Faso. The availability of a recent census in Burkina Faso creates an opportunity to assess the estimates produced with geospatial covariates and the unit context model against officially adopted EBP census-based estimates as described below.

Burkina Faso’s National Institute of Statistics and Demography carried out a census in 2018 which was utilized by the Burkina Faso poverty team of the World Bank to generate small area estimates of poverty for Communes using the EBP census methodology (Molina and Rao 2010) under a two-fold nested error regression model (Marhuenda et al. 2017). Because the census and the survey data are from a similar period, the small area estimates using census auxiliary information are considered the gold standard. Comparing the census-based estimates with estimates produced using geospatial covariates offers an appropriate testing ground for assessing the extent of discrepancies between census-based and geospatial-based estimates. This framework can also be used to compare the estimates produced by different models.

For the purposes of the evaluation exercise, we treat the census-based EBP estimates as the gold standard. The census-based estimates are compared to: (a) small area estimates under both survey weighted and unweighted versions of model Equation (1) with the outcome defined at household level and the geospatial covariates defined at grid cell level; (b) small area estimates under an area level (Fay-Herriot) model, with geospatial covariates aggregated at the target area level; and (c) small area estimates under a grid cell level model where both the outcome and the geospatial covariates are defined at grid cell level. It is important to note that the survey data was collected from the same harmonized survey instrument as the survey data for the other four countries we consider in this paper.

Figure 3 and Table 3 summarize the results of these comparisons. Across all Communes in Burkina Faso, the correlation between the estimates under the household level model with geospatial covariates and those derived under the household level model with census covariates is high, at .799. However, this correlation differs markedly between in-sample and out-of-sample Communes. For the 234 Communes included in the sample, which comprise 84% of the population of Burkina Faso according to WorldPop estimates, the correlation between the geospatial-based and census-based estimates is .879. In contrast, the correlation for the 117 non-sampled Communes is .457. The in-sample correlation is similar to findings from other contexts (Masaki et al. 2022; Newhouse et al. 2022; Van der Weide et al. 2022). The correlation for out-of-sample areas, meanwhile, is substantially lower than the out-of-sample correlation of .7 reported between geospatial and census-based estimates in Mexico (Newhouse et al. 2022). This may be explained by differences in the nature of the geospatial covariates used in Mexico, which could lead to better out-of-sample predictions, as well as by differences in the country context. The lower out-of-sample correlation may also arise because non-sampled Communes differ from sampled Communes: they are more remote and are not covered by the survey. The household level, grid cell level, and area level geospatial models all benefit from conditioning on the same household survey data that were used for producing the census-based estimates, which makes the census and geospatial estimates (household and area level) more consistent with each other in sampled areas. For out-of-sample areas, on the other hand, prediction is based purely on grid cell aggregated covariates that may not be as predictive of poverty as household census covariates.

Table 3.

Comparison of Commune Poverty Estimates for Different Estimation Methods in Burkina Faso, by Sampled and Non-Sampled Communes.

	Communes
Characteristic	Sampled	Non-sampled	All
Number of Communes	234	117	351
Share of population (%)	83.2	16.8	100
Correlation with census-based estimates
Household level model with geospatial covariates (with Guadarrama et al. (2018) weights)	0.879	0.457	0.799
Household level model with geospatial covariates (unweighted)	0.880	0.478	0.807
Grid cell level model with geospatial covariates	0.823	0.529	0.767
Area level model with geospatial covariates	0.754	0.499	0.685
Direct estimates with survey weights	0.837	N/A	N/A
Average estimated MSE across Communes
Household level model with geospatial covariates (with Guadarrama et al. (2018) weights)	0.007	0.023	0.013
Household level model with geospatial covariates (unweighted)	0.006	0.023	0.012
Grid cell level model with geospatial covariates	0.015	0.029	0.020
Area level model with geospatial covariates	0.015	0.025	0.018
Direct estimates with survey weights	0.047	N/A	N/A
Coverage rate relative to census-based estimates
Household level model with geospatial covariates (with Guadarrama et al. (2018) weights; %)	89.7	97.4	92.3
Household level model with geospatial covariates (unweighted; %)	86.3	95.7	89.5
Grid cell level model with geospatial covariates (%)	96.2	97.4	96.6
Area level model with geospatial covariates (%)	86.8	87.2	86.9
Direct estimates with survey weights (%)	91.0	N/A	N/A
Average estimated headcount poverty rate across Communes
Census-based estimates (%)	45.7	52.6	48.0
Household level model with geospatial covariates (with Guadarrama et al. (2018) weights; %)	49.3	54.1	50.9
Household level model with geospatial covariates (unweighted; %)	44.0	47.6	45.2
Grid cell level model with geospatial covariates (%)	48.5	52.7	49.9
Area level model with geospatial covariates (%)	39.1	41.8	40.0
Direct estimates with WorldPop weights (%)	48.2	N/A	N/A
Direct estimates with survey weights (%)	46.8	N/A	N/A

Figure 3.

Census-based versus geospatial-based estimates under the unit context model for sampled and non-sampled Communes in Burkina Faso. Red points represent areas (Communes) included in the sample survey, while blue points represent Communes not included in the sample survey.

Looking at the MSE estimates in Table 3, estimates from the household level models have lower MSEs on average compared to the estimates under the grid cell level and area level models. A further comparison between the estimates produced with geospatial covariates and the census-based ones is to compute coverage rates by treating the assumed gold standard census-based estimates as the truth. The coverage rate is the share of Communes for which a 95% normal confidence interval for headcount poverty, defined as the estimate ±1.96 times the square root of the estimated MSE, contains the census-based estimate. Overall, the coverage rate for estimates under the weighted household level model with geospatial covariates is 92.3%. Of course, this is not an ideal test because the census-based estimates are themselves estimates, derived from the same sample data as the geospatial based estimates. Nonetheless, the high coverage rate alleviates concerns about the validity of the estimates produced under the unit context model.
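The coverage rate described above can be sketched in a few lines. This is an illustration only: the function name and the four-area input values are hypothetical and are not taken from the Burkina Faso data.

```python
import math


def coverage_rate(estimates, mses, benchmarks, z=1.96):
    """Share of areas whose normal interval contains the benchmark.

    The interval is estimate +/- z * sqrt(estimated MSE), and the
    benchmark plays the role of the census-based estimate.
    """
    covered = 0
    for est, mse, bench in zip(estimates, mses, benchmarks):
        half_width = z * math.sqrt(mse)
        if est - half_width <= bench <= est + half_width:
            covered += 1
    return covered / len(estimates)


# Hypothetical headcount poverty rates for four areas.
geospatial = [0.50, 0.30, 0.70, 0.45]   # geospatial model-based estimates
est_mse = [0.007, 0.004, 0.010, 0.002]  # their estimated MSEs
census = [0.46, 0.48, 0.65, 0.44]       # benchmark census-based estimates

rate = coverage_rate(geospatial, est_mse, census)
print(f"coverage rate: {rate:.2f}")
```

In this toy example the second area is the only one whose interval misses the benchmark, so three of the four areas are covered.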

Recent research suggests that machine learning methods that allow for more flexible functional forms can improve small area prediction (Krennmair and Schmid 2022; Merfeld and Newhouse 2023). Exploring whether the use of machine learning methods improves prediction for out-of-sample areas is an area of current research focus. The performance for out-of-sample prediction will depend on the focus country and how well geospatial data predict poverty. Therefore, out-of-sample estimates under the unit context model in the four countries that lack recent census data should be interpreted with great caution and are likely to change when the next round of census-based estimates becomes available.

5. Assessment of Experimental SAE Estimates of Head Count Poverty in Burkina Faso, Chad, Guinea, Mali, and Niger

Having compared geospatial and census-based small area estimates of head count poverty in Burkina Faso, this section describes the generation of experimental small area estimates of head count poverty for all five countries. Estimates are produced using model Equation (1) with geospatial covariates as described in Table A1. Because the estimates we produce are experimental and not official, the results we present do not identify target areas in the focus countries. Model-based estimates under model Equation (1) with geospatial covariates are compared to direct estimates both at target area and at aggregate, regional levels. In addition, MSE estimates of the model-based estimates are compared to the estimated variances of the direct estimates.

Figure 4 shows the relationship between the EBP estimates under model (1) and the direct estimates at the target area level. In general, model-based estimates are strongly correlated with direct estimates and exhibit less variation than direct estimates, as one would expect due to the impact of shrinkage.

Figure 4.

Direct estimates versus model-based (under the unit context model with weighting following Guadarrama et al. (2018)) estimates at target area level.

Figure 5 shows the relationship between the EBP estimates under the weighted and unweighted versions of model Equation (1). The results show that weighted model-based estimates are systematically higher than the unweighted estimates, while the unweighted estimates are closer to the direct estimates. This may be due to the approach to weighting taken by the Guadarrama et al. (2018) method. In future research it will be interesting to compare the current estimates against estimates obtained by using other weighting methods, including the method that accounts for informative sampling proposed by Cho et al. (2024). For the remainder of this section, we will use the term model-based estimates to refer to those obtained under the unit context model using the weighting proposed in Guadarrama et al. (2018).

Figure 5.

Unweighted model-based estimates versus weighted model-based estimates (under the unit context model with weighting following Guadarrama et al. (2018)) at target area level.

Table 4 presents the median, over target areas, of the estimated coefficient of variation (cve) for the model-based and the direct estimates. Our preferred measure of uncertainty for the direct estimates is based on the Horvitz-Thompson approximation, calculated using the R package sae, with the sum of the sample weights for each area used to approximate the domain size.

Table 4.

Median cve for Direct and Model-Based Estimates.

Country	Burkina Faso	Chad	Guinea	Mali	Niger
Direct survey estimates
Sampled areas	0.435	0.271	0.370	0.415	0.425
Model-based estimates
Sampled areas	0.167	0.115	0.121	0.164	0.155
Non-sampled areas	0.2084	0.268	0.220	0.229	0.229
Median percentage reduction in cve in sampled areas	62.8	59.7	68.3	60.6	64.0

The results in Table 4 show a large reduction in the median cve of the model-based estimates relative to direct estimates. Figure 6 shows reductions in the cve for all but a few target areas. The large efficiency gains from the use of model-based estimates are possibly moderately overestimated. In real data evaluations (e.g., Masaki et al. 2022; Newhouse et al. 2022), coverage rates of confidence intervals produced by using parametric bootstrap MSE estimates are somewhat below the nominal 95%. Nevertheless, even considering this, we expect the model-based estimates to be more efficient than direct estimates.
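One detail of Table 4 is worth illustrating: its last row appears to report the median of the area-level percentage reductions, which need not equal the reduction implied by the two medians. The sketch below uses the medians from Table 4 for the second quantity and a set of hypothetical area-level cve values to show how the two can differ. The three-area inputs and variable names are assumptions made for illustration.

```python
from statistics import median

# Median cve in sampled areas, copied from Table 4.
direct_median = {"Burkina Faso": 0.435, "Chad": 0.271, "Guinea": 0.370,
                 "Mali": 0.415, "Niger": 0.425}
model_median = {"Burkina Faso": 0.167, "Chad": 0.115, "Guinea": 0.121,
                "Mali": 0.164, "Niger": 0.155}

# Percentage reduction implied by the two medians (not the reported row).
reduction_of_medians = {
    country: 100.0 * (1.0 - model_median[country] / direct_median[country])
    for country in direct_median
}

# Hypothetical area-level cve values for three areas.
direct_cve = [0.50, 0.40, 0.30]
model_cve = [0.20, 0.10, 0.24]

# Area-by-area percentage reductions, then their median.
area_reductions = [100.0 * (1.0 - m / d) for m, d in zip(model_cve, direct_cve)]
median_of_reductions = median(area_reductions)

# Reduction computed from the medians of the same hypothetical values.
toy_reduction_of_medians = 100.0 * (1.0 - median(model_cve) / median(direct_cve))

for country, value in reduction_of_medians.items():
    print(f"{country}: {value:.1f}%")
```

For Burkina Faso, for example, the medians imply a reduction of about 61.6%, slightly below the 62.8% reported as the median of the area-level reductions.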

Figure 6.

Cve for direct (Horvitz-Thompson approximation) versus cve for model-based estimates for sampled areas by country.

As a further comparison between model-based and direct estimates, we consider a goodness of fit statistic at the target area level, following Brown et al. (2001). The statistic is based on computing Z scores defined as follows,

$$Z_{a} = \frac{\hat{p}_{a}^{ebp} - \hat{p}_{a}^{dir}}{\sqrt{\widehat{MSE}_{ebp,a} + \widehat{VAR}_{direct,a}}} \tag{2}$$

where $\hat{p}_{a}^{dir}$ and $\hat{p}_{a}^{ebp}$ are, respectively, the direct estimate and the model-based estimate under model Equation (1) of the headcount poverty rate for area $a$, and the denominator combines the estimated mean squared error of the EBP estimate and the estimated variance of the direct estimate. The Z scores are useful for assessing the magnitude of the difference between the direct and model-based estimates relative to the corresponding uncertainty estimates, and whether the differences, taken collectively over all areas, are statistically significant. The Wald test statistic is defined by

$$W = \sum_{a} Z_{a}^{2}, \tag{3}$$

where W is compared against a chi-squared distribution with degrees of freedom equal to the number of areas. A value of W below the 95% threshold, that is, the 95th percentile of this distribution, implies a p-value above .05, indicating that the differences are not statistically significant. Table 5 presents the p-value for each country when using the EBP estimates under the weighted version of the unit context model. For Burkina Faso, Chad, and Mali we do not find statistically significant differences between the model-based and direct estimates. However, this is not the case for Guinea and Niger. Meanwhile, Table 6 presents the p-value for each country when using the EBP estimates under the unweighted version of the unit context model. In this case all p-values exceed .05, indicating that the differences between the direct and EBP estimates are not statistically significant at the 5% level. These results, along with the average poverty estimates reported in Table 3, show the substantial impact that weighting can have on model-based estimates and highlight the need for further research on weighting methods.
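The goodness of fit test in Equations (2) and (3) can be sketched as follows. To keep the example free of external dependencies, the chi-squared threshold and p-value are computed with the Wilson-Hilferty normal approximation rather than an exact quantile function; this is a substitution made here for illustration, not the computation used in the paper. The function names and the two-area toy inputs are hypothetical, while the test statistics and degrees of freedom passed in at the end are those reported in Table 5.

```python
import math
from statistics import NormalDist


def z_scores(ebp, direct, mse_ebp, var_direct):
    """Equation (2): standardized differences between EBP and direct estimates."""
    return [(pe - pd) / math.sqrt(m + v)
            for pe, pd, m, v in zip(ebp, direct, mse_ebp, var_direct)]


def wald_statistic(z):
    """Equation (3): sum of squared Z scores over areas."""
    return sum(value ** 2 for value in z)


def chi2_threshold(df, prob=0.95):
    """Approximate chi-squared quantile (Wilson-Hilferty approximation)."""
    c = 2.0 / (9.0 * df)
    z = NormalDist().inv_cdf(prob)
    return df * (1.0 - c + z * math.sqrt(c)) ** 3


def chi2_p_value(w, df):
    """Approximate upper-tail p-value (Wilson-Hilferty approximation)."""
    c = 2.0 / (9.0 * df)
    z = ((w / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 1.0 - NormalDist().cdf(z)


# Toy example with two hypothetical areas.
toy_z = z_scores(ebp=[0.40, 0.55], direct=[0.50, 0.55],
                 mse_ebp=[0.0036, 0.0020], var_direct=[0.0064, 0.0100])
toy_w = wald_statistic(toy_z)

# Test statistics and degrees of freedom as reported in Table 5.
threshold_bf = chi2_threshold(234)
p_bf = chi2_p_value(251.7, 234)
threshold_guinea = chi2_threshold(251)
p_guinea = chi2_p_value(299.9, 251)

print(f"Burkina Faso: threshold {threshold_bf:.1f}, p-value {p_bf:.2f}")
print(f"Guinea: threshold {threshold_guinea:.1f}, p-value {p_guinea:.3f}")
```

With the large degrees of freedom in Tables 5 and 6, the approximation is close to the tabulated thresholds; where an exact chi-squared quantile function is available, it is the more natural choice.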

Table 5.

Results from Applying the Goodness of Fit Test Equation (3) in the Five Countries (Weighted Version of the Unit Context Model).

Country	Test statistic (W)	95% Threshold	Degrees of freedom	p-Value
Burkina Faso	251.7	270.7	234	.2
Chad	94.4	123.2	99	.61
Guinea	299.9	289.0	251	.02
Mali	260.4	281.4	244	.23
Niger	271.8	264.2	228	.025

Table 6.

Results from Applying the Goodness of Fit Test Equation (3) in the Five Countries (Unweighted Version of the Unit Context Model).

Country	Test statistic (W)	95% Threshold	Degrees of freedom	p-Value
Burkina Faso	231.1	270.7	234	.54
Chad	87.3	123.2	99	.79
Guinea	246.4	289.0	251	.57
Mali	280.6	281.4	244	.054
Niger	216.5	264.2	228	.70

Turning now to comparisons at more aggregate levels, Figures 7 to 9 compare the EBP geospatial estimates with direct estimates at the regional level. Figure 8 shows the same results as Figure 7 but excludes out of sample areas when aggregating the EBP estimates at the regional level. We decided to explore this latter approach considering the lower correlation between unit context model estimates and the unit level model estimates for out of sample areas in Burkina Faso. Both Figures 7 and 8 aggregate the model-based estimates using WorldPop weights, while the direct estimates are aggregated using survey weights. Figure 9 remedies this inconsistency by using survey weights when aggregating the model-based estimates from target areas to regions. Overall, the results show that the model-based estimates are aligned well with direct estimates at the regional level. Some discrepancies are to be expected, however, because model-based estimates are affected by shrinkage.
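The effect of the choice of aggregation weights and of excluding out-of-sample areas can be illustrated with a small sketch. The regional headcount rate is computed as a weighted average of target area rates. All numbers and names below are hypothetical and serve only to show how the three aggregation variants behind Figures 7 to 9 can differ.

```python
def aggregate_to_region(area_rates, area_weights):
    """Weighted average of area headcount poverty rates within one region."""
    total_weight = sum(area_weights)
    return sum(r * w for r, w in zip(area_rates, area_weights)) / total_weight


# Hypothetical region with three target areas; the third is not sampled.
model_rates = [0.60, 0.40, 0.20]            # model-based estimates
worldpop_weights = [1000.0, 3000.0, 6000.0]  # population counts (hypothetical)
survey_weights = [2000.0, 3000.0]            # sums of survey weights, sampled areas only
sampled = [True, True, False]

# Variant 1: all areas, population weights.
all_areas_worldpop = aggregate_to_region(model_rates, worldpop_weights)

# Variant 2: sampled areas only, population weights.
rates_sampled = [r for r, s in zip(model_rates, sampled) if s]
worldpop_sampled = [w for w, s in zip(worldpop_weights, sampled) if s]
sampled_worldpop = aggregate_to_region(rates_sampled, worldpop_sampled)

# Variant 3: sampled areas only, survey weights.
sampled_survey = aggregate_to_region(rates_sampled, survey_weights)

print(all_areas_worldpop, sampled_worldpop, sampled_survey)
```

In this toy region the excluded area is both populous and less poor, so dropping it raises the regional rate noticeably, which is why the treatment of out-of-sample areas matters when comparing against direct regional estimates.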

Figure 7.

Small area estimates versus direct estimates at regional level (using WorldPop weights for aggregation).

Figure 8.

Small area estimates versus direct estimates at regional level (including only in sample target areas and using WorldPop weights for aggregation).

Figure 9.

Small area estimates versus direct estimates at regional level (including only in sample target areas and using survey weights for aggregation).

6. Conclusions

This paper describes the methodology used for producing experimental small area estimates of headcount poverty rates in five West African countries. In four of these countries, no recent census data are available, and zonal statistics from geospatial data sources are used instead. The use of model-based estimation with geospatial covariates offers a pragmatic approach for producing interim model-based estimates, because the alternative of using old census data carries risks, especially if the distribution of census variables has changed over the intercensal period.

The presence of a recent census in Burkina Faso provided a valuable opportunity to evaluate the results of different models against “gold standard” census-based estimates. In sampled areas, the estimates produced by the unit context model track the census-based estimates well and have lower MSEs than direct estimates. Across all areas, the correlation between the geospatial-based estimates and the census-based estimates is high, but this correlation is much higher in sampled than in non-sampled areas. Models specified at the household level generated estimates that were moderately more accurate than those specified at the grid cell level, because the greater variation in per capita consumption allowed for the automated selection of a richer model. Both sets of estimates had lower MSEs than estimates under a model specified at the area level, which we think is due to the use of more granular auxiliary data.

Overall, the estimates for the countries without census data show large efficiency gains compared to direct estimates. In particular, the median cve in sampled areas is reduced by between approximately 59% and 68%. The five countries that are the focus of this paper are neighbors and share many economic and social characteristics. Furthermore, all of them implemented highly harmonized surveys concurrently, and the set of geospatial variables available for model selection is identical. However, one cannot be certain that the results for Burkina Faso generalize to the other four West African countries, as there are important differences to take into account. Burkina Faso is facing a significant internal displacement crisis, affecting about 10% of the population, but hosts far fewer refugees than Niger or Chad. Burkina Faso and Guinea lack the large areas of mainly uninhabited desert that characterize Mali, Niger, and Chad. Nonetheless, the relatively low correlation between the geospatial estimates and the census estimates in non-sampled areas observed in Burkina Faso raises the prospect that the estimates for these areas could change significantly when upcoming censuses are collected and combined with survey data to produce updated poverty maps. Given the scarcity of evidence on out-of-sample prediction accuracy in the literature, we recommend treating the out-of-sample estimates in the remaining four countries with a high degree of caution.

There are several additional avenues for further research to inform these types of data integration efforts. These include additional empirical work to validate both point and uncertainty estimates against estimates using recent census data. Zonal statistics derived from geospatial data can be highly correlated. Initial results from current research indicate that the approach used for geospatial data processing and model building with geospatial data can affect the quality of the estimates produced. Model estimation also matters: it should be possible to improve upon existing methods for estimating mixed models with sampling weights. In addition, exploring the use of machine learning methods to capture complex relationships could improve estimation, especially for out-of-sample areas. Despite room for further improvement, model-based estimates of the type calculated in this paper can provide interim estimates of the spatial distribution of poverty with acceptable uncertainty measures that cannot be obtained with survey data alone.

Footnotes

Appendix A

Appendix B

Appendix C

Appendix D

Acknowledgements

We thank Olivier Dupriez, Jed Friedman, Haishan Fu, Craig Hammer, Johannes Hoogeveen, Gabrela Inchauste, Talip Kilic, Johan Mistiaen, Pierella Paci, and Nobuo Yoshida for their support.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was partially supported by the Knowledge for Change Program’s phase IV programmatic research project “Understanding Trends in Sub-National Differences in Economic Well-Being in Low and Middle-Income Countries” Program. The work of Tzavidis, Schmid, and Luna is supported by the Data and Evidence to End Extreme Poverty (DEEP) research programme. DEEP is a consortium of the Universities of Cornell, Copenhagen, and Southampton led by Oxford Policy Management, in partnership with the World Bank - Development Data Group and funded by the UK Foreign, Commonwealth and Development Office. The work of Tzavidis is also supported by the UKRI-ESRC strategic research grant ES/X014150/1 for “Survey data collection methods collaboration: securing the future of social surveys”, known as Survey Futures. Survey Futures is a collaboration of twelve organisations, benefitting from additional support from the Office for National Statistics and the ESRC National Centre for Research Methods. Further information can be found at www.surveyfutures.net.

ORCID iD

Angela Luna Hernandez

Received: September 2023

Accepted: August 2024

References

Arias-Salazar. 2023. “Small Area Estimates of Poverty Incidence in Costa Rica Under a Structure Preserving Estimation (SPREE) Approach.” Journal of Official Statistics 39 (4): 435–58. DOI: https://doi.org/10.2478/jos-2023-0021.

Battese, G. E., R. M. Harter, and W. A. Fuller. 1988. “An Error-Components Model for Prediction of County Crop Areas Using Survey and Satellite Data.” Journal of the American Statistical Association 83 (401): 28–36. DOI: https://doi.org/10.2307/2288915.

Brown, Chambers, Heady, and Heasman. 2001. “Evaluation of Small Area Estimation Methods: An Application to Unemployment Estimates from the UK LFS.” Proceedings of Statistics Canada Symposium 2001. https://www150.statcan.gc.ca/n1/en/pub/11-522-x/2001001/session6/6247-eng.pdf?st=sU49E_nE.

Chi, Fang, Chatterjee, and J. E. Blumenstock. 2022. “Microestimates of Wealth for All Low-and Middle-Income Countries.” Proceedings of the National Academy of Sciences of the United States of America 119 (3): e2113658119. DOI: https://doi.org/10.1073/pnas.2113658119.

Cho, Guadarrama-Sanz, Molina, Eideh, and Berg. 2024. “Optimal Predictors of General Small Area Parameters Under an Informative Sample Design Using Parametric Sample Distribution Models.” Journal of Survey Statistics and Methodology. Published electronically March 28, 2024. DOI: https://doi.org/10.1093/jssam/smae007.

Corral, Cojocaru, Segovia, and Molina. 2022. Guidelines for Poverty Mapping. Washington, DC: World Bank. DOI: https://doi.org/10.1596/37728.

Corral, Himelein, McGee, and Molina. 2021. “A Map of the Poor or a Poor Map?” Mathematics 9 (21): 2780. DOI: https://doi.org/10.3390/math9212780.

Edochie, Newhouse, Würz, and Schmid. 2024. “Povmap: Extensions to the ‘emdi’ Package.” R Package Version 1.0.1. DOI: https://doi.org/10.32614/cran.package.povmap.

Fay, R. E., and R. A. Herriot. 1979. “Estimation of Income for Small Places: An Application of James-Stein Procedures to Census Data.” Journal of the American Statistical Association 74: 269–77. DOI: https://doi.org/10.2307/2286322.

González-Manteiga, M. J. Lombardía, Molina, Morales, and Santamaría. 2008. “Bootstrap Mean Squared Error of a Small-Area EBLUP.” Journal of Statistical Computation and Simulation 78 (5): 443–62. DOI: https://doi.org/10.1080/00949650601141811.

Guadarrama, Molina, and J. N. K. Rao. 2018. “Small Area Estimation of General Parameters Under Complex Sampling Designs.” Computational Statistics & Data Analysis 121: 20–40. DOI: https://doi.org/10.1016/j.csda.2017.11.007.

Halbmeier, A. K. Kreutzmann, Schmid, and Schröder. 2019. “The Fayherriot Command for Estimating Small-Area Indicators.” The Stata Journal 19 (3): 626–44. DOI: https://doi.org/10.1177/1536867x19874238.

Harmening, A. K. Kreutzmann, Schmidt, Salvati, and Schmid. 2023. “A Framework for Producing Small Area Estimates Based on Area-Level Models in R.” The R Journal 15 (1): 316–41. DOI: https://doi.org/10.32614/rj-2023-039.

Isidro, Haslett, and Jones. 2016. “Extended Structure Preserving Estimation (ESPREE) for Updating Small Area Estimates of Poverty.” Annals of Applied Statistics 10 (1): 451–76. DOI: https://doi.org/10.1214/15-aoas900.

Jean, Burke, Xie, W. M. Davis, D. B. Lobell, and Ermon. 2016. “Combining Satellite Imagery and Machine Learning to Predict Poverty.” Science 353 (6301): 790–4. DOI: https://doi.org/10.1126/science.aaf7894.

Jiang and Lahiri. 2006. “Mixed Model Prediction and Small Area Estimation.” Test 15 (1): 1–96. DOI: https://doi.org/10.1007/bf02595419.

Koebe, Arias-Salazar, Rojas-Perilla, and Schmid. 2022. “Intercensal Updating Using Structure-Preserving Methods and Satellite Imagery.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 185: 170–96. DOI: https://doi.org/10.1111/rssa.12802.

Krennmair and Schmid. 2022. “Flexible Domain Prediction Using Mixed Effects Random Forests.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 71 (5): 1865–94. DOI: https://doi.org/10.1111/rssc.12600.

Kreutzmann, A. K., Pannier, Rojas-Perilla, Schmid, Templ, and Tzavidis. 2019. “The R Package emdi for Estimating and Mapping Regionally Disaggregated Indicators.” Journal of Statistical Software 91: 1–33. DOI: https://doi.org/10.18637/jss.v091.i07.

Lange, U. J. Pape, and Pütz. 2021. “Small Area Estimation of Poverty Under Structural Change.” Review of Income and Wealth 68: 264–81. DOI: https://doi.org/10.1111/roiw.12558.

Lahiri. 2010. “An Adjusted Maximum Likelihood Method for Solving Small Area Estimation Problems.” Journal of Multivariate Analysis 101 (4): 882–92. DOI: https://doi.org/10.1016/j.jmva.2009.10.009.

Marhuenda, Molina, Morales, and J. N. K. Rao. 2017. “Poverty Mapping in Small Areas Under a Twofold Nested Error Regression Model.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 180 (4): 1111–36. DOI: https://doi.org/10.1111/rssa.12306.

Masaki, Newhouse, A. R. Silwal, Bedada, and Engstrom. 2022. “Small Area Estimation of Non-Monetary Poverty with Geospatial Data.” Statistical Journal of the IAOS 38 (3): 1035–51. DOI: https://doi.org/10.3233/sji-210902.

Merfeld, J. D., and Newhouse. 2023. “Improving Estimates of Mean Welfare and Uncertainty in Developing Countries.” Policy Research Working Paper No. 10348. The World Bank. DOI: https://doi.org/10.1596/1813-9450-10348.

Molina and Marhuenda. 2015. “sae: An R Package for Small Area Estimation.” The R Journal 7 (1): 81–98. DOI: https://doi.org/10.32614/rj-2015-007.

Molina and J. N. K. Rao. 2010. “Small Area Estimation of Poverty Indicators.” Canadian Journal of Statistics 38 (3): 369–85. DOI: https://doi.org/10.1002/cjs.10051.

Newhouse. 2024. “Small Area Estimation of Poverty and Wealth Using Geospatial Data: What Have We Learned So Far?” Calcutta Statistical Association Bulletin 76 (1): 7–32. DOI: https://doi.org/10.1177/00080683231198591.

Newhouse, Merfeld, A. P. Ramakrishnan, Swartz, and Lahiri. 2022. “Small Area Estimation of Monetary Poverty in Mexico Using Satellite Imagery and Machine Learning.” World Bank Policy Research Paper No. 10175. DOI: https://doi.org/10.1596/1813-9450-10175.

Nguyen, V. C. 2012. “A Method to Update Poverty Maps.” Journal of Development Studies 48 (12): 1844–63. DOI: https://doi.org/10.1080/00220388.2012.682983.

Prasad and J. N. K. Rao. 1990. “The Estimation of the Mean Squared Error of Small Area Estimators.” Journal of the American Statistical Association 85: 163–71. DOI: https://doi.org/10.2307/2289539.

Skarke, A. K. Kreutzmann, and Würz. 2021. “Extensions to the ebp Function in the R Package emdi: Additional Data-Driven Transformations and Empirical Best Prediction Under Informative Sampling.” https://cran.r-project.org/web/packages/emdi/vignettes/vignette_ebp2.pdf.

Tzavidis, L. C. Zhang, Luna, Schmid, and Rojas-Perilla. 2018. “From Start to Finish: A Framework for the Production of Small Area Official Statistics.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 181 (4): 927–79. DOI: https://doi.org/10.1111/rssa.12364.

Van der Weide, Blankenspoor, Elbers, and Lanjouw. 2022. “How Accurate is a Poverty Map Based on Remote Sensing? An Application to Malawi.” World Bank Policy Research Paper No. 10171. DOI: https://doi.org/10.1596/1813-9450-10171.

Yeh, Perez, Driscoll, Azzari, Tang, Lobell, Ermon, and Burke. 2020. “Using Publicly Available Satellite Imagery and Deep Learning to Understand Economic Well-Being in Africa.” Nature Communications 11 (1): 2583. DOI: https://doi.org/10.1038/s41467-020-16185-w.

Zhang and C. L. Tsai. 2010. “Regularization Parameter Selections via Generalized Information Criterion.” Journal of the American Statistical Association 105 (489): 312–23. DOI: https://doi.org/10.1198/jasa.2009.tm08013.

Small Area Estimation of Poverty in Four West African Countries by Integrating Survey and Geospatial Data
