Discussion

Abstract

1. Introduction

My views do not entirely coincide with the author’s, principally, I suspect, because I am less optimistic. Nevertheless, although for reasons I will detail I do not share all of its conclusions, I think the paper provides a comprehensive approach to the topic.

As the author notes, geospatial data are typically obtained from satellites, mobile phones or internet activity.

Despite a historical aversion among official statisticians to statistical models, central government statistical agencies internationally, under resource pressure and increasing demand, have been considering less expensive data sources and greater use of statistical modelling. Geospatial data and more extensive statistical models are becoming increasingly important components of these developments. Although I am more cautious, some even hope for fully automated data collection, data cleaning, creation of suitable predictor variables and model fitting to produce good local estimates and maps. There are 17 Sustainable Development Goals (SDGs) and over 200 subcategories (for a list of SDGs, see, for example, https://sustainabledevelopment.un.org/content/documents/11803Official-List-of-Proposed-SDG-Indicators.pdf). Not all require small area estimation (SAE), but in most countries many do, and for producing such a large collection of small area estimates (SAEs), such an automated process would be the dream of official statisticians internationally.

Although the use of geospatial data to produce local estimates through statistical modelling is not new (see, for example, Battese et al.^[1]), the data now available are increasingly fine-grained spatially. These more extensive geospatial data offer the opportunity to produce local estimates of poverty and related measures such as food security and undernutrition, especially in countries where difficult national circumstances hamper conventional data collection via surveys and censuses. Even where recent survey and census data exist, satellite data can be used as a supplement in more conventional small area estimation models, especially since (at least once the satellite is in orbit) data collection costs are relatively low and data can be collected relatively frequently.

Despite its potential, the benefit of using satellite data is nuanced. There remain central issues that cannot be resolved simply by assuming that the variables collected from satellites (or derived from them), or increasingly sophisticated methodology, will necessarily yield local-level estimates of poverty or related measures that are useful for resource allocation, for every measure of interest.

There are also conceptual issues. For example, satellite data are almost always available at area level. Even if this is a finer level than that for which local estimates are required, such data give no indication of household or individual variation within areas and carry no identifiable or linkable household information at unit level. So, when satellite data are part of standard small area models, they can be used directly only in area-level SAE models, not at unit level in unit-level SAE models. For unit-level SAE models, satellite data can be used only to provide contextual variables at an aggregated level (see Haslett^[2]). As stated in the abstract of Dr Newhouse’s paper, such contextual information is best used in models at the finest level available.
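A minimal sketch of how area-level satellite values might be turned into contextual variables for a unit-level model. It assumes that pixel values have already been tagged with the small area they fall in. The variable names (`night_lights`, `ctx_night_lights`), the area codes and the values are all hypothetical. The point it illustrates is that every household in an area receives the same contextual value, so the satellite data add no within-area variation.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical pixel-level satellite values, each tagged with its small area.
pixels = [
    {"area": "A", "night_lights": 1.0},
    {"area": "A", "night_lights": 3.0},
    {"area": "B", "night_lights": 10.0},
    {"area": "B", "night_lights": 20.0},
]

# Hypothetical unit-level household records. The satellite data carry no
# household identifier, so the only possible link is the area code.
households = [
    {"hh_id": 1, "area": "A", "log_consumption": 4.1},
    {"hh_id": 2, "area": "A", "log_consumption": 4.4},
    {"hh_id": 3, "area": "B", "log_consumption": 5.0},
]

def area_means(rows, key, value):
    """Average a pixel-level variable within each area."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {area: mean(vals) for area, vals in groups.items()}

contextual = area_means(pixels, "area", "night_lights")

# Attach the area mean to every household in that area: households in the
# same area all receive an identical contextual value.
for hh in households:
    hh["ctx_night_lights"] = contextual[hh["area"]]

print(households)
```

In a real application, the aggregation would be to the finest geography that can be linked to the survey and census, for example enumeration areas, rather than to the small areas themselves.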

As noted in Section 2 of the paper, data collected via mobile phones also have coverage issues. These are particularly important because the poorest are least likely to be phone subscribers. There are also complications due to low or very low response rates when subscribers are surveyed via their mobile phones (cf. Aiken et al.^[3]). Web-based data collection often has similar selection bias issues.

2. Satellite Data

The focus of the paper is on satellite data, so this also forms the central part of my further comments.

But first, an observation: much of the material in the paper supports an inductive argument. In essence, the argument for using satellite data instead of census data as a source of predictor variables rests on an international collection of reports. These cover various types of analysis and data sources, each from a country where a satellite-based model could be compared with some other type of model. But in a real-world situation where only satellite-type data are available, and a model is fitted and used to produce local estimates, no such comparison is possible. The utility of the results then rests on the argument that something similar worked reasonably well at another time or in another country. Conclusions reached in this way, without cross-validation and field verification, are dangerous when they are used to allocate scarce resources. As noted by Tzavidis et al.^[4] and reported in the paper:

It is important to be careful before inferring general results from particular model-based simulations. P16

As Dr Newhouse mentions, satellite data can have advantages:

Satellite indicators have a few key advantages over mobile phones and internet activity, however, including the public availability of a large number of indicators, in many cases derived from publicly available imagery provided by the Sentinel 2 and Landsat satellites. Proprietary high-resolution satellite imagery–from companies such as Maxar, Planet, Airbus, and others–can also either be used directly as an input into deep learning models, or as inputs to derive interpretable features such as building footprints, roads, and vehicles. P3

However, the choice of appropriate satellite-based predictor variables is crucial, and mobile phones and internet activity are not the only alternative sources. Even derived satellite-based variables can be of limited use as predictors. For example, ‘distance to a power source’ can be misleading for a family or community living directly under power lines that carry 11,000 volts and are therefore inaccessible to them. Likewise, straight-line distance to markets can be misleading in highly mountainous areas, such as Nepal, or where there are major river systems, as in Cambodia.
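The gap between straight-line and travel distance can be illustrated with a toy grid. In this sketch, a hypothetical river blocks movement except at a single bridge. Shortest-path length on the grid (Dijkstra's algorithm with unit step cost) stands in for travel distance. The grid, the locations and the function name are all hypothetical; a real analysis would use road networks or terrain-based cost surfaces.

```python
import heapq
import math

# Hypothetical 7 x 7 grid: '.' is passable land, '#' is a river that can
# only be crossed at the bridge 'B'.
GRID = [
    ".......",
    ".......",
    ".......",
    "###B###",
    ".......",
    ".......",
    ".......",
]

def travel_distance(grid, start, goal):
    """Shortest 4-neighbour path length (unit step cost) avoiding '#' cells."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return d
        if d > dist[(r, c)]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                nd = d + 1
                if nd < dist.get((nr, nc), math.inf):
                    dist[(nr, nc)] = nd
                    heapq.heappush(queue, (nd, (nr, nc)))
    return math.inf  # goal unreachable

home, market = (2, 0), (4, 0)  # (row, column) on opposite banks
straight_line = math.dist(home, market)
by_land = travel_distance(GRID, home, market)
print(f"straight line: {straight_line}, travel: {by_land}")
```

Here the market is two cells away in a straight line, but reaching it means detouring via the bridge. A predictor built from straight-line distance would therefore rank this household as much closer to the market than it really is.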

Using deep learning instead to produce predictor variables also has disadvantages (again, sophistication of technique should not be the core aim), especially since:

Those that either use deep learning or tree-based machine learning either ignore or may not properly estimate or evaluate uncertainty, and many of the papers ignore the well-established statistical literature on small area estimation. P12

Data sources vary in quality. Satellite data can also vary, for example, in frequency of collection and in data quality owing to weather conditions, season and pixelation. Simply having many predictor variables may not be enough. This is particularly so where the target variable for which SAEs are wanted (e.g., diarrhoea in children under five years of age) is not well related to satellite imagery except at a relatively high level of area aggregation; cf. the point made by Dr Newhouse that:

[Satellite data] offer access to several climate-related variables as well as a host of predictive features such as night-time lights, land classification, year of switch from pervious to impervious surface, estimates of net primary production, cell phone placement, a wide variety of climate and temperature variables, pollution estimates from the Sentinel 5-P satellite, a variety of soil quality measures, and countless other geospatial indicators. P3

Using complex deep learning (P13) as a black box to derive predictor variables has particular pitfalls, especially for the uninitiated modeller. It also makes it difficult to explain and justify the basis for the models to non-modellers (especially in a public meeting presenting the results of a country report on sets of SAEs).

These complications with the use of satellite data for predictor variables are, of course, more pronounced if satellites are the only data source used for prediction.

When satellite data provide the target variable itself rather than predictors, a further question arises: is the satellite-based target variable equivalent to the one previously used, which was based on sample surveys and census predictions? This topic is discussed in the next section.

3. Choice of Target Measure

The choice of the measure of interest, or target variable, for SAE is important. As the author notes (P5) when discussing satellite data that provide the target rather than only predictors, there are ‘key differences between wealth and predicted per capita consumption’. This again indicates two things that can strongly influence how useful satellite data are for replicating non-satellite-based target variables for SAE: the choice of the variable to be predicted at local level, and differences (across a variety of countries) between what purport to be similar measures.

Dr Newhouse mentions that:

Meta has also publicly released the Relative Wealth Index, based on the pioneering work of Chi et al.^[5] P3

Chi et al.^[5] support the use of the Relative Wealth Index based on satellite data as an alternative or even as a substitute for more established measures, but others are more wary. See, for example, the criticism in Sartirano et al.^[6]:

In this work, we aim to understand how the frontier-data derived index can be used to inform anti-poverty programs in Indonesia and the Asia Pacific region. First, we unveil key features that affect the comparison between the traditional and non-traditional sources, such as the publishing time and authority and the granularity of the spatial aggregation of the data. Second, to provide operational input, we hypothesize how a re-distribution of resources based on the [Relative Wealth Index] RWI map would impact a current social program, the Social Protection Card (KPS) of Indonesia and assess impact. In this hypothetical scenario, we estimate the percentage of Indonesians eligible for the program, which would have been incorrectly excluded from a social protection payment had the RWI been used in place of the survey-based wealth index. The exclusion error in that case would be 32.82%. Within the context of the KPS program targeting, we noted significant differences between the RWI map’s predictions and the SUSENAS ground truth index estimates.
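An exclusion error of the kind reported by Sartirano et al. compares two eligibility classifications: one from the survey-based index, treated as the truth, and one from the proxy. The sketch below computes it, together with an inclusion error under one common convention, on hypothetical toy data. The function name is hypothetical, and the sketch is unweighted. A population figure such as the 32.82% quoted above would need survey weights and household sizes.

```python
def targeting_errors(eligible_truth, eligible_proxy):
    """Exclusion and inclusion error rates of a proxy-based eligibility rule.

    exclusion error: share of truly eligible units that the proxy excludes
    inclusion error: share of proxy-selected units that are not truly eligible
    (one common convention; definitions vary between programmes)
    """
    pairs = list(zip(eligible_truth, eligible_proxy))
    n_eligible = sum(1 for truth, _ in pairs if truth)
    n_selected = sum(1 for _, proxy in pairs if proxy)
    excluded = sum(1 for truth, proxy in pairs if truth and not proxy)
    wrongly_included = sum(1 for truth, proxy in pairs if proxy and not truth)
    return excluded / n_eligible, wrongly_included / n_selected

# Hypothetical households: survey-based eligibility versus an RWI-style proxy.
survey_eligible = [True, True, True, True, False, False, False, False, False, False]
proxy_eligible = [True, True, False, False, True, False, False, False, False, False]
exclusion, inclusion = targeting_errors(survey_eligible, proxy_eligible)
print(f"exclusion error {exclusion:.1%}, inclusion error {inclusion:.1%}")
```

In this toy example, half of the truly eligible households would be excluded even though the two classifications agree on 7 of the 10 households. This is why agreement rates alone can understate targeting errors.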

As Dr Newhouse also notes (P4; cf. Xu et al.^[7] and Galimberti et al.^[8]):

[T]he Jean et al. (2016)^[9] model was trained on night-time lights, accuracy declined precipitously when the model attempted to predict within the lower portion of the per capita consumption distribution. In the African countries considered, most poor households live in dark rural areas, and a model trained to predict only night-time lights cannot distinguish welfare levels among them.

Linking new satellite-based target variables to current poverty measures is difficult. New target measures will inevitably have their own problems, especially with the classification and misclassification of the poor (again, see, for example, Sartirano et al.^[6]). The availability of satellite data and other less expensive data sources may eventually lead to the adoption of alternative measures, even if these prove not to be equivalent. For example, the UN World Food Programme uses its Consolidated Approach for Reporting Indicators of Food Security (CARI); see https://www.wfp.org/publications/consolidated-approach-reporting-indicators-food-security-cari-guidelines. Data collection costs are high, however, and the use of alternative data collection methods is leading to consideration of rCARI, where ‘r’ indicates ‘remote’; see, for example, https://docs.wfp.org/api/documents/WFP-0000151075/download/. Studies on whether the two measures are sufficiently close to equivalent are ongoing.

4. SAE Models and Modelling

One type of model does not fit all. Methodology needs to depend on the type of data available and the variable of interest.

Household or sub-area models are usually preferable to area-level models when sub-area data are available. Section 3(b), P14

See, for example, Das and Haslett^[10] for further discussion of the choice of model type for unit-level models of poverty. Different methods also have different strengths and weaknesses, for example, in whether they require area-level data or unit-level and cluster data. The choice and availability of suitable predictor variables and of clean data are at least as important as the choice of model type. For satellite data, obtaining clean data is simpler and cheaper.

It is impossible to develop a general rule about the relative accuracy of different models, because their relative accuracy depends on the nature of the data. P5

Satellite data may be useful for modelling, but what it is compared with is important:

In general, several studies suggest that small area estimates generated by combining survey and geospatial data are more accurate than those based solely on survey data, sometimes by significant margins. P6

The same improvement in accuracy holds when a survey and a census are used, because a census (like satellite data) gives complete coverage of a country. So the benefit does not depend on the additional data coming from satellites. Nevertheless, satellite data do have one advantage that censuses do not: new satellite data can be collected and used for updates much more frequently than a decennial or even a quinquennial census allows. As a counterbalance, as mentioned in the paper (P7):

[R]ecent census data remain the gold standard for auxiliary data for small area estimation.

As also noted in the paper (P15, P8), contemporaneous survey and census or satellite data are required, so that it is clear which period the results apply to.

What is being compared is also important:

[A]t least when predicting poverty rates at higher levels such as subdistricts, the evidence so far indicates that model-based estimates based on geospatial indicators are more accurate than direct survey estimates. This implies that the benefits of reducing sampling error by supplementing survey data with model-based predictions derived from geospatial indicators outweighs the introduction of model bias. [P6-7]

However, local estimates need to be more accurate than direct survey estimates to be of any use at all. Outperforming direct estimates is therefore a minimum requirement rather than evidence of adequacy, and the argument quoted above is not entirely logical. What matters more fundamentally is how much information the satellite data contain and how well the chosen model uses that information.
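The trade-off between sampling error and model information can be seen in the standard area-level (Fay-Herriot) composite estimator. This is a sketch of that textbook estimator, not of any method in the paper. It weights the direct estimate by gamma = sigma2_u / (sigma2_u + psi). Here, psi is the sampling variance of the direct estimate, and sigma2_u is the between-area variance that the covariates (for example, satellite indicators) leave unexplained. The function name and numbers are hypothetical. The sketch treats the variance components as known and the model as correct, which is exactly the assumption that evidence from other countries cannot guarantee.

```python
def fay_herriot_composite(direct, synthetic, psi, sigma2_u):
    """Composite (EBLUP-type) estimate for one small area under an area-level model.

    direct    -- direct survey estimate for the area
    synthetic -- regression prediction x_d' beta from (e.g. satellite) covariates
    psi       -- sampling variance of the direct estimate (treated as known)
    sigma2_u  -- between-area variance left unexplained by the covariates
    """
    gamma = sigma2_u / (sigma2_u + psi)        # weight given to the direct estimate
    estimate = gamma * direct + (1.0 - gamma) * synthetic
    g1 = gamma * psi                           # leading MSE term, valid only if the model is correct
    return gamma, estimate, g1

# Hypothetical area: direct poverty rate 0.40 (sampling variance 0.01) and a
# synthetic prediction of 0.30 from satellite covariates.
for sigma2_u in (0.04, 0.0025):
    gamma, est, g1 = fay_herriot_composite(0.40, 0.30, 0.01, sigma2_u)
    print(f"sigma2_u={sigma2_u}: gamma={gamma:.2f}, estimate={est:.3f}, g1={g1:.4f}")
```

The more informative the covariates (the smaller sigma2_u), the more weight the synthetic part receives and the smaller the nominal MSE relative to psi. Any model bias, however, is not captured by g1.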

Apart from the data source, there remains the question of which model or type of model is best. The claim is made that:

EBP models because they incorporate a conditional random effect that conditions on the sample, which is effectively used as a prior estimate. This distinguishes EBP models from two other popular alternatives: M-quantile (Chambers and Tzavidis, 2006^[11]) and ELL (Elbers et al., 2003^[12]). P14

However, despite the word ‘best’ in its name, the benefits of EBP (Empirical Best Prediction; Molina and Rao, 201^[13]) over ELL (Elbers, Lanjouw and Lanjouw, 2003^[12]) or M-quantile methods (and vice versa) are far from universal. As detailed in Haslett^[14] and Das and Haslett^[10] for unit-level models, much depends on the type of data available. In their original forms, both EBP and ELL are based on linear mixed models. From these models, best linear unbiased predictions (BLUPs) (Haslett and Puntanen^[15]), or more strictly empirical best linear unbiased predictions (EBLUPs) (Haslett and Welsh^[16]), are generally derived and used to create SAEs. As an aside, EBP uses EBLUPs conditional on the sample for the sampled small areas. For unsampled areas, however, it can use only empirical best linear unbiased estimation based on the fixed effects alone, which is why accuracy is generally better for sampled areas.
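The difference between sampled and unsampled areas can be sketched with the linear nested-error unit-level model, for a single covariate. For a sampled area, the predicted mean adds a shrunken random effect, gamma_d * (sample mean residual), with gamma_d = sigma2_u / (sigma2_u + sigma2_e / n_d). For an unsampled area, only the fixed-effects (synthetic) part is available. This is a simplification for a linear mean that ignores the finite-population correction; for non-linear indicators such as poverty rates, EBP instead simulates from the conditional distribution. The function name and all values are hypothetical.

```python
def area_mean_prediction(pop_xbar, beta0, beta1, sigma2_u, sigma2_e,
                         sample_ybar=None, sample_xbar=None, n=0):
    """Predicted area mean under a nested-error model with one covariate.

    pop_xbar    -- population (census) mean of the covariate in the area
    sample_ybar -- sample mean of the outcome in the area (sampled areas only)
    sample_xbar -- sample mean of the covariate in the area (sampled areas only)
    n           -- number of sampled units in the area (0 for unsampled areas)
    """
    synthetic = beta0 + beta1 * pop_xbar
    if n == 0:
        return synthetic  # unsampled area: fixed effects only
    gamma = sigma2_u / (sigma2_u + sigma2_e / n)
    u_hat = gamma * (sample_ybar - (beta0 + beta1 * sample_xbar))  # shrunken area effect
    return synthetic + u_hat

sampled = area_mean_prediction(1.5, 1.0, 2.0, 0.5, 4.0,
                               sample_ybar=4.6, sample_xbar=1.5, n=8)
unsampled = area_mean_prediction(1.5, 1.0, 2.0, 0.5, 4.0)
print(sampled, unsampled)
```

With n = 8, gamma is 0.5, so half of the observed area residual (0.6) is retained for the sampled area. The unsampled area, with identical covariates, falls back to the synthetic value.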

In terms of mixed models, the principal difference between ELL and EBP is that ELL has no area-level random effect and EBP has no cluster-level random effect. One consequence is that, because it does not include a small-area-level random effect, ELL requires very good contextual variables (averages of census variables, preferably at subcluster level). ELL is particularly useful when full unit record census data are available, including survey cluster membership for every census household, with sampled clusters matched by ID between survey and census. EBP includes no survey cluster random effect at all, so it is more useful when cluster-level IDs linking survey to census are unavailable. EBP also generally uses a more limited set of auxiliary variables (and, as a result, a more limited model). This is because, owing to confidentiality restrictions, its ‘census’ data are often constructed only from categorical variables in a few cross-tabulations. Such ‘census’ data then contain only categorical variables, each of which is identical for many observations, rather than the full set of unit-level variables available from the unit records of a complete census. M-quantile methods are good for estimating non-linear functions such as quantiles and distribution functions. They do not need to specify small-area or cluster-level variation explicitly, because they put aside boundary issues when developing the model. Nevertheless, M-quantile methods may not perform as well as ELL when there are good contextual variables and between-cluster variation exceeds between-area variation (Das and Haslett^[10]).

Data transformations are mentioned in the paper:

The assumption that the stochastic error terms are distributed normally necessitates transforming the dependent variable (Tzavidis et al.^[4]). For household models, a log functional form has often traditionally been used, dating back to Elbers et al. (2003)^[12]. P15

However, greater clarity is needed on this issue, whatever the transformation or its complexity, because ‘stochastic error terms are distributed normally’ is rather different from the predicted variable itself being distributed normally. Also, stochastic error terms are more often distributed normally at higher levels of aggregation, especially in unit-level models. It is finding models that limit variation at these higher levels (for example, through the use of contextual variables) that usually improves the accuracy of estimates. So distributional assumptions for random effects are not necessarily the principal issue when fitting models. This is especially true for unit-level models, and at the initial stage of fitting provisional models while searching for ‘good’ candidates.
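The distinction between a normal error term and a normal outcome can be shown with a small simulation. Data are generated from a log-linear model with normal errors but a strongly skewed covariate. The transformed outcome log(y) is then itself clearly skewed, while the regression residuals are close to symmetric. Checking the normality of log(y) alone would therefore wrongly suggest the transformation had failed. The data-generating values are hypothetical, and skewness is used here as a simple stand-in for a fuller normality check.

```python
import numpy as np

def skewness(z):
    """Sample skewness: third central moment divided by the cubed standard deviation."""
    d = z - z.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(0)
n = 20_000
x = rng.exponential(1.0, n)        # hypothetical, strongly skewed covariate
e = rng.normal(0.0, 0.3, n)        # normal error term on the log scale
y = np.exp(1.0 + 0.8 * x + e)      # e.g. per capita consumption
log_y = np.log(y)

slope, intercept = np.polyfit(x, log_y, 1)   # polyfit returns highest power first
resid = log_y - (intercept + slope * x)

print(f"skewness of log(y):        {skewness(log_y):.2f}")
print(f"skewness of the residuals: {skewness(resid):.2f}")
```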

The use of the specialized sample survey software needed for analyses that properly incorporate the survey structure can often be delayed until two things have happened: initial models have been checked, and the most important predictor variables and their interactions have been found, at least provisionally. For linear models, these initial models may, for simplicity, use stepwise selection of regressors. Using contextual variables as fixed effects to reduce variation in random effects may have technical ramifications (Haslett et al.^[17]). In SAE modelling, however, it is the predictions rather than the parameter estimates that matter, so these issues are minor. Similarly, because they are usually swamped by aggregation or averaging, outliers in unit-level data (at the individual or household level) are much less of an issue than outliers in area-level models.

Whether to use one model or several for a given variable of interest is an important issue, especially given the potential for bias:

The importance of training data raises the question of whether estimating different models for different geographies, such as urban and rural areas, improves prediction accuracy. Estimating separate models may improve the accuracy of the estimates by better accounting for heterogeneity across regions. However, these models also utilize less training data, which reduces the richness of the prediction model in the typical case when the sample is used to select or tune models. P10

It is a misconception that models must either be fitted to the entire country ignoring region, or be fitted separately to parts of it. These are the extremes, but between them lies a plethora of options. The simplest intermediate option uses data from the whole country together with a set of regional indicator variables. If a predictor variable has different effects in different parts of the country, its interaction with the regional indicators will be significant and can be added to the overall model. In the extreme, every variable can have an interaction with region added to a model fitted to the national data. In terms of SAE predictions, this is almost exactly equivalent to a separate model for every region. The use of separate models for different provinces or large areas, and the potential for regional bias that it creates, are discussed in more detail later in the context of Bangladesh.
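The claim that a fully interacted national model reproduces separate regional models can be checked for ordinary least squares point predictions. In the sketch below, the data and regional coefficients are simulated and hypothetical. The two sets of fitted values coincide because the fully interacted design spans the same columns as the stacked regional designs. What differs is everything beyond the point predictions: the pooled model assumes a common residual variance, and in mixed models the random-effect structure can also differ. This is why the equivalence is only 'almost exact' for SAEs. Dropping interactions that are not needed gives the intermediate options described above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
region = rng.integers(0, 3, n)     # three hypothetical regions: 0, 1, 2
x = rng.normal(size=n)
intercepts = np.array([1.0, 2.0, 0.5])
slopes = np.array([0.5, -1.0, 2.0])
y = intercepts[region] + slopes[region] * x + rng.normal(0.0, 0.3, n)

# National model: intercept, indicators for regions 1 and 2 (region 0 is the
# baseline), x, and x interacted with each non-baseline regional indicator.
d1 = (region == 1).astype(float)
d2 = (region == 2).astype(float)
X = np.column_stack([np.ones(n), d1, d2, x, x * d1, x * d2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pooled_pred = X @ beta

# Separate model for each region.
separate_pred = np.empty(n)
for r in range(3):
    mask = region == r
    Xr = np.column_stack([np.ones(mask.sum()), x[mask]])
    br, *_ = np.linalg.lstsq(Xr, y[mask], rcond=None)
    separate_pred[mask] = Xr @ br

print(np.max(np.abs(pooled_pred - separate_pred)))  # numerically zero
```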

There are also tree-based methods for SAE, such as classification and regression trees (see, for example, Bilton et al.^[18, 19]). Even when a tree-based model is not the final choice, explicit tree-based methods are useful for finding interaction terms, which can then be used in linear mixed models. Otherwise, important interactions are often very difficult to find among a vast collection of candidates. These findings provide some limited support for Dr Newhouse’s claim that:

Tree-based machine learning methods appear to predict more accurately than linear models. P18

See also P1, 10, 11, 18. Whether using satellite data or not, using tree-based random forests or other black box methods requires considerable care and testing, however, including field validation. As noted in the paper:

Krennmair et al.^[20] proposes a mixed effect random forest model, which is tested in a design-based simulation using household-level covariates in Mexico. The results demonstrate the benefits of applying machine learning methods over traditional linear models. P6

See also P19. However, this conclusion is again inductive, here based on simulation and data from one country only.

Further, Dr Newhouse notes:

Merfeld and Newhouse^[21] evaluates small area estimates of an asset index for four countries: Madagascar, Malawi, Mozambique, and Sri Lanka. In addition, the paper evaluates small area estimates of poverty for Malawi obtained using publicly available geospatial auxiliary data. This study compares linear EBP models with three different types of machine learning models: Extreme Gradient Boosting, Boosted Regression Forests, and Cubist regression. Unlike Krennmair et al.^[20], the machine learning models do not include a conditional random effect. Despite that, the results indicate that all three machine learning methods generate substantially more accurate estimates than the linear EBP model, particularly out of sample. P6

This conclusion may be stronger than the evidence since, as indicated in the paper (P6), the results are not completely consistent across countries.

As Dr Newhouse also notes (P10), out-of-sample predictions can be strongly affected if models fitted to sample survey data are not adjusted for the survey design. This is well known for linear models. Bilton et al.^[18, 19] discuss how such survey design adjustments can be made for classification and regression trees. These methods are also applicable, at least in principle, to random forests. As noted in the paper, without such sample design adjustments:
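For linear models, the simplest design adjustment is to weight each sampled unit by the inverse of its inclusion probability. The sketch below simulates informative sampling on a hypothetical population: units with positive errors are four times as likely to be sampled. Under this design, unweighted least squares overstates the intercept, while the design-weighted fit approximately recovers it. This illustrates the linear-model case only; it is not the tree-based adjustment of Bilton et al., and all values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
x = rng.normal(size=N)
e = rng.normal(size=N)
y = 1.0 + 2.0 * x + e                      # population intercept 1, slope 2

# Informative design: inclusion probability depends on the error term.
pi = np.where(e > 0, 0.02, 0.005)
sampled = rng.random(N) < pi
xs, ys, ws = x[sampled], y[sampled], 1.0 / pi[sampled]   # design weights 1/pi

X = np.column_stack([np.ones(xs.size), xs])
beta_unweighted, *_ = np.linalg.lstsq(X, ys, rcond=None)

# Weighted least squares: scale rows by the square root of the design weight.
sw = np.sqrt(ws)
beta_weighted, *_ = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)

print("unweighted intercept:", round(beta_unweighted[0], 3))
print("weighted intercept:  ", round(beta_weighted[0], 3))
```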

[P]redictions for out-of-sample areas based on publicly available geospatial indicators should be treated with great caution. P11

5. Assessment of SAE Model Fit

Model fit assessment via

R^{2} = 1 - \frac{\sum_{i} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}} \qquad (5.1)

is suggested by Dr Newhouse, where y_i is the true value for small area i, ŷ_i is its estimate and ȳ is the mean of the y_i. Note, as an aside, that (5.1) is not equal to (and should not be confused with) R² from a regression model fitted to unit record data. That R² can be low while the model still produces sufficiently accurate SAEs once predictions for households or individuals are aggregated to small area level. Moreover, (5.1) assumes that the true y_i are known, which is never the case in real-world applications (if they were, no modelling would be required). Instead, R² is more usually used to estimate the squared correlation at small area level between the SAEs from two different models, neither of which produces the true values. Even using direct estimates from household surveys (as Jean et al.^[9] do) only compares one estimate with another. The direct survey estimate for a small subpopulation, although useful, can be very inaccurate, which is why model fitting is needed to produce ŷ_i in the first place. In other countries, especially those tabulated in Table 1 of Dr Newhouse’s paper (Comparison of accuracy across different sources), satellite-based estimates are compared with other estimates. Typically, two sets of modelled SAEs are compared with one another, rather than a set of SAEs being compared with the actual small area values. So even the correlations (or squared correlations) need to be interpreted carefully, because neither set of estimates used for each country in that table is necessarily correct.
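The difference between (5.1) and a squared correlation is easy to see numerically. In this sketch, the hypothetical estimates are uniformly 0.1 too high. The squared correlation is then 1, while (5.1) penalises the bias. In general, (5.1) also changes if the roles of the two sets of estimates are swapped, because its denominator is computed from whichever set is treated as y. The function names are hypothetical.

```python
import numpy as np

def r2_eq51(y, y_hat):
    """R^2 as in (5.1): one minus residual sum of squares over total sum of squares of y."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def squared_correlation(a, b):
    """Squared Pearson correlation between two sets of area-level values."""
    return np.corrcoef(a, b)[0, 1] ** 2

y = np.array([0.2, 0.3, 0.4, 0.5])   # hypothetical area-level poverty rates
y_hat = y + 0.1                      # estimates that are uniformly 0.1 too high
print(r2_eq51(y, y_hat))             # approximately 0.2
print(squared_correlation(y, y_hat)) # approximately 1.0
```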

Table 1.

Classification Probabilities for Two Sets of SAEs with Identical Symmetric Distributions, Zero Correlation and a Cutoff at the Joint Mean.

SAE Measure 1 (rows) / SAE Measure 2 (columns)	Above cutoff	Below cutoff
Above cutoff	0.25	0.25
Below cutoff	0.25	0.25

Further, as noted in the paper:

Differences in the extent of noise present in the training data, as well as the reference evaluation measure, will not necessarily affect the ranking of different types of models within the same context. But it explains much of the wide variation in R² observed in Table 1 across different studies and underscores the benefits of evaluation studies that compare different methods in the same context, using the same reference measure. P7

A core question, largely unaddressed in Dr Newhouse’s paper, is what a particular area-level correlation or R² between two different sets of SAEs of the same target measure implies for the risk of inadequate resource allocation at area or household level. This is especially important for essential resources such as food. So the question of whether R² is a good measure of the similarity of two sets of SAEs is broader and warrants further exploration, as does the interpretation of R² when one of the SAE measures is the true one.

In practice, R² is an overall measure used to indicate how well two different sets of SAEs match on what may be equivalent measures of the same underlying concept. Even when neither set of SAEs is the truth, such an overall measure can be useful. But SAEs are not used in an overall way; they are used to allocate resources to particular small areas. Often, small areas above a pre-specified cutoff receive resources and those below it do not. Alternatively, based on the ranking of the SAEs, a specified number of the most deprived small areas receive resources.

So it is useful to explore, at least in a preliminary way, how R² for two SAE measures of the same target variable relates to the probability that the two measures classify a small area differently relative to the same cutoff. Here, the two measures purport to be equivalent estimates of the same target. A different classification means that one set of SAEs places a small area above the pre-specified cutoff and the other places it below, or vice versa.

To focus discussion (and being optimistic), suppose the two sets of SAEs are similar enough to have the same mean and variance (taken over the small areas), and that the distribution of both sets of SAEs is symmetric. For simplicity, the cutoff for both sets is set at their shared mean. The extent to which this is a realistic setup will be discussed further below.

Assuming the correlation is not (undesirably) negative, consider the two extreme situations: one where the correlation between the two sets of SAEs is zero (Table 1) and the other where it is one (Table 2). Both tables give the probabilities that small areas are classified

on both measures as above the cutoff;

on measure 1 as above, and on measure 2 as below, the cutoff;

on measure 1 as below, and on measure 2 as above, the cutoff;

on both measures as below the cutoff.

Table 2.

Classification Probabilities for Two Sets of SAEs with Identical Symmetric Distributions, a Correlation of One and a Cutoff at the Joint Mean.

SAE Measure 1 (rows) / SAE Measure 2 (columns)	Above cutoff	Below cutoff
Above cutoff	0.5	0.0
Below cutoff	0.0	0.5

When the correlation between the two sets of SAEs is zero (and, as for the bivariate normal distribution, zero correlation implies independence), the cutoff at the mean divides the joint probability space equally into the four quadrants (Table 1). So the probability that a small area is classified differently (i.e., above the cutoff on one measure and below it on the other) is 0.5. When the correlation is one, no small area can be classified differently. The probability that a small area is classified above the cutoff on both measures is then 0.5, as is the probability that it is classified below the cutoff on both (Table 2). So, when the correlation is one, the probability that a small area is classified differently is zero. As the correlation increases from zero to one, the probability that a small area is above the cutoff on one measure and below it on the other decreases.

A simulation using a bivariate normal distribution, with correlations between 0 and 1 (where the squared correlation gives R²), yields the results in Figure 1.
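A simulation of this kind can be sketched as follows. This is a minimal reconstruction under the stated assumptions (two standard normal sets of SAEs with correlation rho and a common cutoff), not the code used to produce Figure 1. The function name, number of simulated areas and seed are arbitrary choices. The optional `cutoff_quantile` argument shows how the same sketch could explore cutoffs other than the mean. The second returned value gives the share classified differently among areas that are below the cutoff on at least one measure.

```python
import numpy as np
from statistics import NormalDist

def misclassification(rho, cutoff_quantile=0.5, n_areas=200_000, seed=0):
    """Monte Carlo share of small areas classified differently by two SAE measures.

    Both measures are standard normal with correlation rho, and the same cutoff
    (a quantile of the common distribution) is applied to each. Returns
    (share of all areas, share among areas below the cutoff on either measure).
    """
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_areas)
    z2 = rho * z1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_areas)
    cutoff = NormalDist().inv_cdf(cutoff_quantile)
    below1, below2 = z1 < cutoff, z2 < cutoff
    differ = below1 != below2
    either = below1 | below2
    return differ.mean(), differ.sum() / either.sum()

for rho in (0.0, 0.5, 0.8, 0.9, 0.95, 1.0):
    overall, among_below = misclassification(rho)
    print(f"rho={rho:.2f}  R2={rho**2:.2f}  differ={overall:.3f}  "
          f"among below-cutoff on either={among_below:.3f}")
```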

Figure 1.

Per Cent Misclassified by Correlation with 95 per Cent CIs for Bivariate Normal Distribution.

What is apparent from Figure 1 is that even comparatively high values of correlation correspond to rather high probabilities of different classifications above and below the cutoff for the two sets of SAEs. For example, even for correlation as high as 0.95 (corresponding to R² of 0.9), 10 per cent of small areas are classified differently. For a correlation of 0.9 (corresponding to R² of approximately 0.8), nearly 20 per cent of small areas are classified differently. The probability of the two SAE measures leading to different classifications is considerably higher for most of the R² values for countries in Table 1 in Dr Newhouse’s paper.

When one set of measures gives the true SAEs but the other has been used for resource allocation, it is possible to distinguish further between two kinds of error: small areas that should have received support but did not, and those that should not have received support but did. Under the assumptions of this preliminary simulation, these two kinds of error are equally likely. The more usual situation is that neither measure gives the true SAEs, and the percentages in Figure 1 represent the percentage of small areas that are classified differently by the two sets of SAE measures.

Of course, this simulation is just a preliminary exploration. It could be extended, even analytically. A more complete simulation or analysis would use a range of cutoffs as well as different distributions (including different means and variances) for the area-level SAEs.
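In the special case behind Figure 1 (bivariate normal SAEs and a cutoff at the common mean), the analytical extension has a closed form. The classical result for the bivariate normal quadrant probability (Sheppard's formula) gives P(both above the mean) = 1/4 + arcsin(rho)/(2π), and by symmetry the same holds for both below. The probability that the two measures classify a small area differently is therefore 1 − 2[1/4 + arcsin(rho)/(2π)] = arccos(rho)/π. The sketch simply evaluates this expression, and the function name is hypothetical. Other cutoffs or non-normal distributions would need the bivariate normal distribution function, or simulation, instead.

```python
import math

def misclassification_probability(rho):
    """P(two standard bivariate normal SAEs with correlation rho fall on opposite
    sides of their common mean) = 1 - 2 * P(both above) = arccos(rho) / pi."""
    return math.acos(rho) / math.pi

for rho in (0.0, 0.5, 0.8, 0.9, 0.95, 1.0):
    print(f"rho={rho:.2f}  R2={rho**2:.2f}  "
          f"P(classified differently)={misclassification_probability(rho):.3f}")
```

At rho = 0 this gives 0.5 and at rho = 1 it gives 0, matching Tables 1 and 2. At rho = 0.95 it gives about 0.10, consistent with the 10 per cent read from Figure 1.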

But even in this preliminary form, Figure 1 gives grounds for pause, especially since setting the cutoff at or near the mean is not unrealistic. For example, in Indonesia in 2020, the estimated proportion of households whose food expenditure exceeded 65 per cent of total expenditure was 31 per cent (0.31). In Nepal, in 2019, one in three children (32 per cent) under the age of five years were stunted, and in 2020, 45 per cent of households were without formal financial services.

So, regrettably, the conclusion from Table 1 across countries must be that, where two sets of SAE measures have been produced for what is claimed to be the same indicator, one based on a sample survey plus satellite measures and the other on a sample survey plus census data, the agreement between them is insufficient to justify using SAEs based only on survey and satellite data as a standard substitute for more conventional SAEs based on survey and census data. Of course, the overall percentage of areas classified differently would be lower if the cutoff were below the mean, but it would not necessarily be lower as a percentage of those areas classified as poor on either SAE measure, which is the usual focus (i.e., excluding areas classified as above the cutoff on both measures).
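Extending the exploration to a range of cutoffs, and checking the point about areas classified as poor on either measure, can be sketched under the same illustrative bivariate normal assumptions. Here ρ = 0.9 is chosen only for illustration, and "poor" means below the cutoff. The cutoff is placed at a chosen quantile, so that share of areas is poor on each measure. Under these assumptions, lowering the cutoff reduces the overall share classified differently but raises that share among areas classified as poor on either measure.

```python
import random
from statistics import NormalDist

def discordance_by_cutoff(rho, share_poor, n=100_000, seed=7):
    """Two standard-normal SAE measures with correlation rho (illustrative).

    The cutoff is the `share_poor` quantile, so that fraction of areas is
    'poor' (below the cutoff) on each measure. Returns (share of all areas
    classified differently, that count as a share of areas poor on either).
    """
    cutoff = NormalDist().inv_cdf(share_poor)
    rng = random.Random(seed)
    discordant = poor_on_either = 0
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        b = rho * a + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        poor_a, poor_b = a < cutoff, b < cutoff
        if poor_a or poor_b:
            poor_on_either += 1
            if poor_a != poor_b:   # poor on exactly one measure
                discordant += 1
    return discordant / n, discordant / poor_on_either

for share in (0.5, 0.3, 0.16):
    overall, among_poor = discordance_by_cutoff(0.9, share)
    print(f"cutoff at {share:.0%} quantile: overall={overall:.3f}, "
          f"among poor on either={among_poor:.3f}")
```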

Except for North-eastern Tanzania, which uses a simulation, the Steele et al.^[22] study in Bangladesh has the highest correlation in Table 1. The claim there that the correlation between the satellite-based upazila estimates in Steele et al.^[22] and the SAE estimates using the Bangladesh HIES 2011 survey and 2010 population census is 0.95 may need clarification. The census-based SAEs had problems, apparent at divisional boundaries, arising from the use of separate ELL models for each of Bangladesh’s six divisions and exacerbated by the need to use a census subsample in place of a full census (see World Bank, Bangladesh Bureau of Statistics, World Food Programme^[23]). The data sets used in Steele et al.^[22] are not contemporaneous, covering a period of 3 years or more. There is also the usual open question of how a survey of mobile phone customers can provide unbiased SAEs for poverty measures when mobile phone ownership is connected to personal and family wealth. The principal aspects of the modelling used in Steele et al.^[22] are that boundaries for local areas were based on Voronoi polygons (which, put simply, assign every point on a map to its closest mobile phone tower), that spatial modelling used initial estimates allocated to the centroid of each mobile phone tower area, and that estimation was computed using integrated nested Laplace approximations (INLA). The use of centroids as a substitute for actual location data contrasts with disease mapping, where the actual locations of individual outbreaks are known. Placing each local area estimate at its centroid introduces measurement error. The Voronoi polygons used may amalgamate subareas that are rather different (such as in Chittagong, where the areas run east to west, but the mountains run from north to south). Spatial smoothing can also disguise estimation and modelling problems by making adjacent areas more similar.

In this context, the fact that the correlation between the two sets of SAEs is so high in Bangladesh may warrant further study.
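The nearest-tower rule that defines Voronoi polygons, and the displacement introduced by representing each polygon by a single centroid, can be sketched as follows. The tower coordinates and the uniformly scattered household locations are hypothetical, and the centroid of the assigned households stands in for the geometric centroid of each cell. This is an illustration of the idea, not the procedure used by Steele et al.

```python
import math
import random

def nearest_tower(point, towers):
    """Index of the tower closest to `point` (Euclidean distance).
    Assigning every point this way partitions the map into Voronoi cells."""
    return min(range(len(towers)), key=lambda i: math.dist(point, towers[i]))

def centroid(points):
    """Mean x and mean y of a list of (x, y) points."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

rng = random.Random(1)
towers = [(2.0, 2.0), (8.0, 3.0), (5.0, 8.0)]  # hypothetical tower locations
households = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(1000)]  # hypothetical

cells = {i: [] for i in range(len(towers))}
for h in households:
    cells[nearest_tower(h, towers)].append(h)

for i, members in cells.items():
    c = centroid(members)
    # Average distance between actual locations and the single point that represents them
    mean_disp = sum(math.dist(h, c) for h in members) / len(members)
    print(f"tower {i}: {len(members)} households, centroid=({c[0]:.2f}, {c[1]:.2f}), "
          f"mean distance to centroid={mean_disp:.2f}")
```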

Further, as Dr Newhouse notes, even a high correlation may leave some small areas with estimates that still have large errors:

Van der Weide et al. (2022)^[24] generate small area estimates of monetary poverty in Malawi for Traditional Authorities by combining survey data with publicly available geospatial features. Like Steele et al.^[22] their study uses a Bayesian structural equation model that accounts for spatial correlation across areas and validates the prediction against census-based estimates. It finds a correlation between the geospatial and census estimates above 0.9, although some individual target areas show substantial discrepancies between the census and geospatial-based estimates due to differences in the data used for prediction. P6
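That a high overall correlation can coexist with large errors for individual areas can be illustrated with purely hypothetical numbers, not the Malawi data. The sketch uses 200 areas with standard-normal "census" SAEs and "geospatial" SAEs that add small noise. Five areas are then given a large, area-specific error of 1.5 standard deviations.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(3)
n_areas = 200
census = [rng.gauss(0.0, 1.0) for _ in range(n_areas)]   # hypothetical census-based SAEs
geo = [c + rng.gauss(0.0, 0.2) for c in census]           # hypothetical geospatial SAEs
for i in range(5):                                        # a few areas with large errors
    geo[i] += 1.5 if i % 2 == 0 else -1.5

errors = [g - c for g, c in zip(geo, census)]
r = pearson(census, geo)
worst = max(abs(e) for e in errors)
print(f"correlation={r:.3f}, largest absolute error={worst:.2f} (in SD units)")
```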

The choice of accuracy measures needs careful consideration. Each measure can be useful as a diagnostic, but none alone should determine the ‘best’ SAE model. For example, a fixed relative standard error (RSE) criterion applied to all SAEs can be a poor criterion for a proportion or percentage that is not small, because the associated standard errors (SE) can then be very large. SE, like RSE, does not incorporate bias. Even root mean square error (RMSE) and root mean square error of prediction (RMSEP), although they incorporate certain types of bias, do not necessarily incorporate all types of model error. Most accuracy measures assume the model is correct.
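Both points can be sketched numerically, assuming a hypothetical RSE threshold of 20 per cent. Because RSE = SE / p, a fixed RSE threshold allows an SE proportional to the proportion itself, so larger proportions pass with much wider intervals. RMSE combines the SE with the bias, but only with the bias that is actually supplied to it.

```python
import math

def max_se_passing_rse(p, rse_threshold=0.2):
    """Largest SE that still passes a fixed RSE threshold (RSE = SE / p).
    The 0.2 default is a hypothetical threshold chosen for illustration."""
    return rse_threshold * p

def rmse(se, bias):
    """RMSE = sqrt(SE**2 + bias**2). It reflects only the bias supplied;
    model error that has not been quantified is not included."""
    return math.hypot(se, bias)

for p in (0.05, 0.2, 0.5):
    se = max_se_passing_rse(p)
    print(f"p={p:.2f}: SE up to {se:.3f}, "
          f"95% interval roughly {p - 1.96 * se:.2f} to {p + 1.96 * se:.2f}")

print(f"SE=0.02, bias=0.05 -> RMSE={rmse(0.02, 0.05):.4f}")
```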

It is also necessary in small area estimation to recognize that the estimates are not the true values and that confidence bounds and credible intervals for estimates carry an inherent risk. For example, if a 95 per cent level is used for each SAE, then on average 1 in 20 small areas will have estimates for which the true value is not contained within the 95 per cent limits. When there are 7,000 small areas (e.g., as for subdistricts in Indonesia), this means the true value will be outside the 95 per cent limits for about 350 small areas. Modelling adds a further risk: the actual coverage of confidence or credible bounds may differ substantially from their nominal coverage, which could increase the number of true SAEs that fall outside their stated ranges. All this necessitates ground checking and validation, especially for those small areas that do not conform to local expectations. As with other data sources, using satellite data for SAE modelling does not circumvent this requirement.
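The arithmetic behind the figure of 350, and how a shortfall in actual coverage would change it, is sketched below. The binomial standard deviation assumes independent areas, which is an idealisation, since SAEs from a common model are typically correlated. The 90 per cent actual coverage is a hypothetical value, not an estimate for any real model.

```python
import math

def expected_misses(n_areas, coverage):
    """Expected number of areas whose true value lies outside its interval."""
    return n_areas * (1.0 - coverage)

def misses_sd(n_areas, coverage):
    """Binomial SD of that count, assuming (unrealistically) independent areas."""
    return math.sqrt(n_areas * coverage * (1.0 - coverage))

n = 7000  # roughly the number of subdistricts in Indonesia, as in the text
print(f"nominal 95%: about {expected_misses(n, 0.95):.0f} misses "
      f"(SD about {misses_sd(n, 0.95):.0f})")
# Hypothetical: modelling error reduces actual coverage to 90 per cent
print(f"actual 90%: about {expected_misses(n, 0.90):.0f} misses")
```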

6. Conclusion

For small area estimation, satellite data is:

1. Best for predicting variables directly related to ground cover, such as crops and energy use:

[A] variety of data pertaining to agriculture and food security are posted online through FAO’s hand-in-hand geospatial platform, which contains information on food security, crops and vegetation. Recent subnational estimates of crop type and yield estimates are only available for a few countries at this time, but coverage will likely expand significantly in the coming years. P3-4

2. Almost certainly useful as candidate auxiliary variables for predicting certain types of expenditure poverty (of the type usually measured by detailed expenditure and income sample surveys). Possibly useful directly as a source of the target variable, but a change in the poverty measure (which would likely prove not to be equivalent to that from income and expenditure surveys) may be required.

3. Not so useful for food security based on household food availability, or for providing information on variation between households within an area. So, satellite data is only partially useful for estimating the prevalence or numbers of undernourished people, although (with good modelling) it may have the advantage of providing SAE-type predictions of future food need from satellite imagery of crops, where crops can be distinguished from other types of ground cover.

4. Not particularly useful for predicting health-related SAE measures: stunting, underweight and wasting (and especially diarrhoea) in children under five.

Nevertheless, there are grounds for cautious optimism that the increasing sophistication and detail in satellite data will improve SAEs and help determine the best type of satellite-based predictors for each type of SAE measure. However, satellite data alone will never be a panacea for SAE modelling. If the inexpensive data are of poor quality, there seems to be no way to rescue the situation short of getting better data. Sophistication of statistical method and availability of inexpensive data are not sufficient to guarantee the usefulness of results.

In the final analysis, the real cost is not the cost of data collection and model fitting, but the considerably greater cost and consequences if scarce resources are not well allocated.

Acknowledgements

The author thanks the referee and editor for useful comments. He also thanks Dr Newhouse for providing a paper that is a relatively non-technical summary of the use of geospatial data linked with sample survey data to produce local, model-based estimates for target variables not present in the geospatial data itself, and of using satellite data to produce alternative target measures. His summary is especially useful because the literature is extensive and relevant publications do not all appear in statistical or in subject-matter journals. It is difficult to be familiar with the extraordinary breadth of all the relevant material.

References

1. Battese, Harter, Fuller. An error-components model for prediction of county crop areas using survey and satellite data. J Am Stat Assoc 1988; 83: 28–36.

2. Haslett. Small area estimation using both survey and census unit record data: Links, alternatives and the central roles of regression and contextual variables. In: M. Pratesi (ed) Analysis of Poverty Data by Small Area Estimation, 18. New York: Wiley, 2016, pp. 327–348.

3. Aiken, Bellue, Karlan. Machine learning and phone data can improve targeting of humanitarian aid. Nature 2022; 603: 864–870. https://doi.org/10.1038/s41586-022-04484-9

4. Tzavidis, Zhang L-C, Luna-Hernandez. From start to finish: A framework for the production of small area official statistics. J R Stat Soc (A): Stat Soc 2018; 181(4): 927–979. DOI: 10.1111/rssa.12364

5. Chi, Fang, Chatterjee. Microestimates of wealth for all low- and middle-income countries. PNAS 2022; 119(3): e2113658119. DOI: 10.1073/pnas.2113658119

6. Sartirano, Kalimeri, Cattuto. Strengths and limitations of relative wealth indices derived from big data in Indonesia. Front Big Data 2023; 6: 1054156. DOI: 10.3389/fdata.2023.1054156

7. …, Song, Li. Combining night time lights in prediction of poverty incidence at the county level. Appl Geogr 2021; 135: 102552.

8. Galimberti, Pichler, Pleninger. Measuring inequality using geospatial data. The World Bank Econ Rev 2023; 37(4): 549–569. DOI: 10.1093/wber/lhad026

9. Jean, Burke, Xie, et al. Combining satellite imagery and machine learning to predict poverty. Science 2016; 353: 790–794.

10. Das, Haslett. A comparison of methods for poverty estimation in developing countries. Int Stat Rev 2019; 87(2): 368–392. DOI: 10.1111/insr.12314

11. Chambers, Tzavidis. M-quantile models for small area estimation. Biometrika 2006; 93: 255–268.

12. Elbers, Lanjouw, Lanjouw. Micro-level estimation of poverty and inequality. Econometrica 2003; 71: 355–364.

13. Molina, Rao JNK. Small area estimation of poverty indicators. Can J Stat 2010; 38: 369–385.

14. Haslett. Small area estimation of poverty using the ELL/PovMap method and its alternatives. In: G. Betti and A. Lemmi (eds) Poverty and Social Exclusion: New Methods of Analysis, 12. London: Routledge, 2013, pp. 224–245.

15. Haslett, Puntanen. Best linear unbiased prediction (BLUP). Wiley StatsRef: Statistics Reference Online. 2017. DOI: 10.1002/9781118445112.stat08120

16. Haslett, Welsh AH. EBLUP: empirical best linear unbiased estimation. Wiley StatsRef: Statistics Reference Online. 2019. DOI: 10.1002/9781118445112.stat08180

17. Haslett, Isotalo, Markiewicz. Upper bounds for the Euclidean distances between the BLUEs under the partitioned linear fixed model and the corresponding mixed model. In: Bapat RB, Karantha MP, Kirkland SJ, Neogy SK, Pati, Puntanen (eds) Applied Linear Algebra, Probability and Statistics: A Volume in Honour of C. R. Rao and Arbind K. Lal, 3. Berlin: Springer, 2023, pp. 27–44.

18. Bilton, Jones, Ganesh, et al. Classification trees for poverty mapping. Comput Stat Data Anal 2017; 115: 53–66.

19. Bilton, Jones, Ganesh. Regression trees for poverty mapping. Aust N Z J Stat 2020; 62(4): 426–443. DOI: 10.1111/anzs.12312

20. Krennmair, Wurz, Schmid. Tree-based machine learning in small area estimation. The Survey Statistician 2022; 86: 22–31.

21. Merfeld, Newhouse DL. Improving estimates of mean welfare and uncertainty in developing countries. Policy Research Working Paper Series 10348, The World Bank, 2023. https://ideas.repec.org/p/wbk/wbrwps/10348.html

22. Steele, Sundsøy, Pezzulo, et al. Mapping poverty using mobile phone and satellite data. J R Soc Interface 2017; 14(127): 20160690.

23. World Bank, Bangladesh Bureau of Statistics, World Food Programme. Poverty maps of Bangladesh 2010: Key Findings. WB, BBS & WFP, 2014, p. 20. https://openknowledge.worldbank.org/entities/publication/868c75f5-3cd3-5797-86b0-c4282de06402

24. Van der Weide, Blankespoor, Elbers, et al. How accurate is a poverty map based on remote sensing data? World Bank Policy Research Working Paper no. 10171, 2022.