Abstract
Understanding the impact of scale and zonation is critical for accurately assessing population health in relation to air quality and demographic data. Using existing census geographies, we analyse spatial clustering and statistical associations across different census scales, focussing on vulnerable sociodemographic groups and temporal exposure patterns. This study aims to assess how different spatial resolutions affect the strength and interpretation of associations between air pollution, sociodemographic variables, and self-reported health. It also evaluates whether finer-scale, health-need-based zoning provides a more accurate basis for public health analysis. Exposure to higher PM2.5 concentrations and having an underlying disability or long-term illness show robust associations with poor health (p < 0.001 at all census scales). However, as spatial resolution becomes coarser, demographic variables lose explanatory power and statistical significance, underscoring the risk of ecological fallacy and misinterpretation when relying on aggregated data, and supporting the need for scale-sensitive approaches in population health studies. Multiple linear regression analyses demonstrate that overall model fit strengthens at coarser scales, with high R-squared values at ward level indicating potential overfitting. Whilst a stronger model fit is observed at higher levels of aggregation, this may mask within-area heterogeneity and obscure critical local disparities. Our findings suggest that effective public health policies benefit from granular and contextually aligned zoning strategies, which enhance the accuracy and relevance of health assessments. The study highlights the value of fine-scale, health-need-based geographic units in capturing the spatial nuances of population health and provides evidence supporting their use in equitable resource allocation and intervention design.
The findings provide a framework for evaluating environmental and demographic factors at appropriate geographic scales to support targeted, equitable health interventions.
Introduction
Human exposure to potentially harmful air pollution is recognised as being highly irregular due to the spatiotemporal variability of atmospheric pollutants within urban environments (Vardoulakis, 2009). However, human populations also vary spatiotemporally within urban areas (e.g. Kwan, 2013; Martin et al., 2015; Smith et al., 2016). Ambient measures of air quality are commonly used in cities, especially in developed countries, where people spend up to 90% of their time indoors. However, indoor air quality can be equally or even more detrimental than street-level air quality (Santiago et al., 2022; Vardoulakis, 2009). It is now well established that ubiquitous pollutants such as NO2 and PM2.5 are a global public health issue (Mannocci et al., 2019; Meng et al., 2022; Riches et al., 2022; WHO, 2021). Research shows that NO2 and PM2.5 adversely affect human health for given exposure concentrations and durations (e.g. Feng et al., 2016; Hesterberg et al., 2009; Huangfu and Atkinson, 2020; Khaniabadi et al., 2017). Many major cities, including London, have seen improvements in air quality when observed at the city scale. An example of this is visualised through the 2024 ‘air quality stripes’ graphics (National Centre for Atmospheric Science, 2024). However, here we intend to examine small scale variability, given that sociodemographic characteristics are highly clustered along with population density. The World Health Organisation (WHO) sets out a recommended air quality guideline (AQG) level for harmful atmospheric pollutants, preceded by a series of interim targets. Its latest (2021) recommendation sets the annual mean AQG level at 5 and 10 µg/m3 for PM2.5 and NO2, respectively (WHO, 2021). The more manageable interim targets are intended to promote a gradual shift from high to lower concentrations. Interim Target 3 is 15 and 20 µg/m3 for PM2.5 and NO2, respectively (WHO, 2021).
Social components of vulnerability and scale
Some population subgroups are disproportionately impacted by, or more susceptible to, environmental disamenities such as poor air quality (Maantay, 2007). The WHO identifies the following future research needs to address policy-relevant questions on population vulnerability to poor air quality: (i) define sensitive population subgroups by socioeconomic status, and (ii) assess multiple sources of exposure in different locations (e.g. home and workplace) as well as time-activity patterns (WHO, 2021). The application of GIS is indispensable in measuring the fundamental concepts of environmental injustice and resolving associations with social vulnerability (Ballas et al., 2017). Cutter’s (1996) hazards of place model, used to consider vulnerability to environmental hazards, sets geographic context and scale as key investigative elements. It recognises that the choice (and scale) of areal units used to contain sociodemographic data also poses a potential issue for vulnerability analyses. Furthermore, the development of robust GIS-based and population research agendas will be key in addressing these ambitions.
Spatial scale plays a critical role in shaping observed relationships between variables in geographical analysis. Flowerdew (2011) explains that as data are aggregated into larger units, local variation is smoothed out, potentially weakening or altering the significance of explanatory variables. For example, variables that exhibit strong associations with health outcomes at a fine-scale (e.g. census Output Area) may appear less significant at coarser scales (e.g. Wards), not because the relationships are absent, but because spatial averaging obscures them. This scale-induced distortion can lead to the ecological fallacy, wherein inferences about individual-level behaviour or outcomes are incorrectly drawn from aggregate-level data. Recognising and adjusting for these effects is essential in public health studies to avoid misinterpreting spatial patterns and misallocating resources.
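The smoothing effect described above can be illustrated with a minimal sketch. The values below are hypothetical (they are not census data), and the grouping of consecutive units into equal-sized coarser units is an assumption made purely for illustration.

```python
from statistics import mean, pvariance

def aggregate_means(values, group_size):
    """Average consecutive fine-scale values into coarser units of equal size."""
    return [mean(values[i:i + group_size])
            for i in range(0, len(values), group_size)]

# Hypothetical percentages of poor health in eight fine-scale units.
fine = [2.0, 9.0, 3.0, 10.0, 4.0, 8.0, 1.0, 11.0]

# Aggregate into two coarser units of four fine-scale units each.
coarse = aggregate_means(fine, 4)

print("fine-scale variance:  ", pvariance(fine))
print("coarse-scale variance:", pvariance(coarse))
```

In this contrived case both coarse units have the same mean, so all of the local variation present at the fine scale disappears after aggregation.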
Spatial units still matter
It has long been recognised that urban-based environmental risk assessments have an optimum spatial scale and resolution for analysis, as well as being constrained by the resolution of the data available (McMaster et al., 1997). Furthermore, the impact of the areal units used in GIS analyses can have a profound effect on the statistical analyses of outputs through the modifiable areal unit problem (MAUP) (Openshaw, 1984) or the modifiable spatiotemporal unit problem (MSTUP) (Martin et al., 2015). Potential disputes in the relationships between NO2 and health can, in part, be attributed to the selection of inappropriate spatial units for analysis (Parenteau and Sawada, 2011). Flowerdew et al. (2008) propose three guidelines for health study research design that rely on areal units: (i) the zonal system used to constrain analyses should not be assumed or taken for granted; (ii) consideration should be given to the choice of zonal system used and its appropriateness for representing the underlying data; and (iii) an ability to experiment with different ways of aggregating data to examine the potential for any size and zonation effects. The ability to resolve local scale spatial patterns at a higher level is more likely if the data are spatially autocorrelated (Flowerdew, 2011).
The importance and concept of scale in urban settings have drawn interest across various disciplines, both in theory and practice (Lengyel et al., 2023). The choice and aggregation of areal units (the scale and zonation problem), and the subsequent need for care to be taken in their use in statistical analyses, have long been noted (e.g. Alvanides et al., 2001; Openshaw, 1977). Certain explanatory variables for population health have previously been shown to vary in their significance with changes in scale (Flowerdew, 2011; Flowerdew et al., 2008; Parenteau and Sawada, 2011). Using areal data to infer processes at different levels, particularly individual or household levels, is well known to risk ecological fallacy (Robinson, 1950) and to pose significant analytical challenges (Wrigley et al., 1996). The specific pattern of spatial dependence (autocorrelation) may be unknown or unique to a particular location. Furthermore, within a nested geography (e.g. census areal units), the pattern may be a result of interactions at lower levels (Atkinson and Tate, 2000).
Using London as an example, our analyses explore the relationship between urban air quality and associated health and sociodemographic characteristics, whilst examining the choice of areal units used. In addition, we identify and utilise relevant small area sociodemographic characteristics to inform a spatially constrained multivariate clustering approach. In adopting this approach, we build on principles outlined by Flowerdew (2011) and Flowerdew et al. (2008), focussing on social health data research, appropriate areal units, and spatial autocorrelation. It has long been noted that census output geographies will inevitably be a compromise across users’ requirements and the requirement to maintain confidentiality (Rees and Martin, 2002). A challenge for social researchers occurs when using areal units that are arbitrarily defined without consideration for the underlying sociodemographic patterns.
In the context of large urban areas expanding (e.g. London) or introducing (e.g. New York City, January 2025) vehicle emission control zones, we examine the environmental justice implications of poor air quality on resident populations. The rest of this paper is structured as follows. First, the methodological approach using small area census data and GIS is outlined, with a worked example provided using English 2021 Census data for London, UK, coupled with high-resolution air quality measures. Secondly, a regression model is built to examine the impact of the zonal scale on the possible explanatory variables identified. Thirdly, consideration is given to the effect of temporal activity patterns, comparing the usually resident population (often called the census night-time population) with the workday population estimate, and to zone design. Finally, a discussion is presented that will inform future health research concerning air quality, population exposure, and the choice of spatial units used. This is of the utmost importance in designing robust measures to examine the urban air quality public health concern, as well as to critique policy-based research.
Methodology
Data sources and variables derived for analysis.
Geospatial and weighted multiple linear regression analyses were conducted using ArcGIS Pro v. 3.1.0 and R v. 4.4.2 (Pile of Leaves, October 2024). To investigate the question of scale, this study utilised a range of English (2021) Census nested hierarchical geographies within London, UK (Figure 1). These comprised the following areal units:
• Output Areas (OAs)
• Lower layer Super Output Areas (LSOAs)
• Middle layer Super Output Areas (MSOAs)
• Wards
An overview of 2021 Census geography and 2022 electoral wards illustrated for the City of Westminster, London.

Health and selected sociodemographic variables were derived from each of these geographies (Table 1). These variables represent the usually resident population on the census night. In addition, selected census outputs are also produced for the workday population (Table 1, variable suffix ‘_wd’). The workday population is an estimate of the population during the working day. It includes all usual residents in England and Wales who are either in employment in the area or not in employment but live in the area. All the variables selected are available in the full range of geographies investigated.
The London Atmospheric Emissions Inventory (LAEI) 2019 represents ground level annual mean concentrations of key pollutants at the closest reference date to the 2021 Census. It covers the whole of Greater London at 20 m resolution (Greater London Authority, 2024). ‘Summarise Raster Within’ (ArcGIS Pro) was used to derive the annual mean LAEI NO2 and PM2.5 concentrations by each of the census geographies listed.
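The zonal averaging step described above can be sketched as follows. This is not the ArcGIS Pro implementation: it assumes, for illustration only, that each raster cell has already been assigned to a single zone via a zone-identifier grid of the same shape, and the function name, zone labels, and concentration values are all hypothetical.

```python
from collections import defaultdict

def zonal_mean(values, zones):
    """Mean of raster cell values per zone; both grids must have the same shape."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if value is None:  # skip NoData cells
                continue
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}

# Hypothetical annual mean PM2.5 cells (µg/m3) and the zone each cell falls in.
pm25 = [[10.0, 12.0, 14.0],
        [11.0, None, 16.0]]
zone_ids = [["A", "A", "B"],
            ["A", "B", "B"]]

result = zonal_mean(pm25, zone_ids)
print(result)  # {'A': 11.0, 'B': 15.0}
```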
The 2021 Census workday population is available as a bulk download from NOMIS (2024). It is published for the same census output geographies as the usually resident population. For this study, we pre-processed these data in R using tools from the dplyr package to convert them from long to wide format for data linking and analyses. At the time of writing, and unlike the 2011 Census, a Workplace Zone output geography has not yet been published or linked for 2021 Census data.
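The long-to-wide conversion can be sketched generically as below. The study performed this step in R; the version here is only an illustration of the reshaping logic, and the column names (`area`, `variable`, `value`), area codes, and values are hypothetical.

```python
def long_to_wide(rows, id_key, name_key, value_key):
    """Pivot long records (one row per area and variable) to one row per area."""
    wide = {}
    for row in rows:
        # Create the area's record on first sight, then add one column per variable.
        record = wide.setdefault(row[id_key], {id_key: row[id_key]})
        record[row[name_key]] = row[value_key]
    return list(wide.values())

long_rows = [
    {"area": "OA_1", "variable": "pcsec_wd", "value": 12.5},
    {"area": "OA_1", "variable": "pco65_wd", "value": 8.0},
    {"area": "OA_2", "variable": "pcsec_wd", "value": 20.0},
    {"area": "OA_2", "variable": "pco65_wd", "value": 5.5},
]

wide_rows = long_to_wide(long_rows, "area", "variable", "value")
for row in wide_rows:
    print(row)
```

Each output row then carries one column per variable, which is the shape needed to join the workday variables to the geography identifiers.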
An alternative set of wards was developed to contrast with the official ward geography, referred to here as ‘alternative wards’. We used the spatially constrained multivariate clustering tool in ArcGIS Pro (ESRI, 2024) to construct health-need-based aggregations from Output Areas (OAs). This algorithm employs a region-growing method that balances multivariate similarity with spatial contiguity. Specifically, the clustering procedure aims to minimise within-cluster heterogeneity by optimising the multivariate Euclidean distance across selected sociodemographic variables, whilst enforcing spatial adjacency through contiguity constraints (‘contiguity edges only’), meaning only OAs sharing a boundary can be grouped.
Using the ‘Summarise Within’ tool, we calculated the mean number of OAs within existing London Wards (mean = 57, maximum = 102). Using OAs as building blocks, ‘alternative wards’ were created using the percentage of the population with bad health (pcbad) as the clustering variable. The minimum number of OAs per ‘alternative ward’ was set at 20, and the number of clusters at 704 (the same as the number of real Wards). The spatial constraint parameter was set to ‘contiguity edges only’, ensuring that each cluster contains only contiguous polygon (OA) features that share an edge. The OAs forming each cluster were dissolved to form the new ‘alternative wards’. As these ‘alternative wards’ are nested geographies built from OAs, sociodemographic variables (Table 1) were re-aggregated from their constituent OAs using R.
Mathematically, the method iteratively builds clusters by merging spatial units that minimise the increase in overall within-cluster variance, an approach based on constrained agglomerative clustering (Guo, 2008). This process can also be understood within the framework of minimum spanning trees, which help efficiently structure the search space for regionalisation problems under spatial and statistical constraints (Assunção et al., 2006). This approach is appropriate for this study as it supports the generation of alternative zonal systems that are both spatially coherent and responsive to underlying health-related needs, which is crucial for accurate spatial epidemiological analysis. This process adhered to the principles of size, quality, and attribute similarity outlined by Alvanides et al. (2002) and the recommendation by Flowerdew (2011) to ensure spatial autocorrelation at lower levels (e.g. OA) is observed at higher levels (e.g. wards).
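The general idea of contiguity-constrained agglomerative clustering can be sketched as below. This is a simplified greedy illustration, not the ArcGIS Pro algorithm: it uses a single attribute, omits the minimum cluster size constraint, and the unit names, values, and adjacency structure are hypothetical.

```python
def sse(values):
    """Within-cluster sum of squared deviations from the cluster mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def constrained_clusters(values, neighbours, k):
    """Greedily merge contiguous clusters until only k remain.

    values:     dict of unit -> attribute value (e.g. pcbad)
    neighbours: dict of unit -> set of units sharing an edge
    """
    clusters = {u: {u} for u in values}  # cluster id -> member units
    while len(clusters) > k:
        best = None
        ids = sorted(clusters)
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                # Contiguity constraint: some member of a must border b.
                if not any(neighbours[u] & clusters[b] for u in clusters[a]):
                    continue
                merged = [values[u] for u in clusters[a] | clusters[b]]
                # Increase in within-cluster variation caused by this merge.
                cost = (sse(merged)
                        - sse([values[u] for u in clusters[a]])
                        - sse([values[u] for u in clusters[b]]))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        if best is None:  # no contiguous pair left to merge
            break
        _, a, b = best
        clusters[a] |= clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())

# Five hypothetical units in a row (A-B-C-D-E) with a poor-health percentage each.
pcbad = {"A": 2.0, "B": 3.0, "C": 9.0, "D": 10.0, "E": 3.0}
edges = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"},
         "D": {"C", "E"}, "E": {"D"}}

two_zones = constrained_clusters(pcbad, edges, 2)
print(two_zones)  # [['A', 'B'], ['C', 'D', 'E']]
```

Unit E has a value similar to A and B but is not adjacent to them, so the contiguity constraint places it with C and D. This illustrates the trade-off between attribute similarity and spatial coherence.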
Multicollinearity diagnostics for the MLR (OA level).
The independence of residuals was also assessed. As the OA-level model was weighted, the Durbin-Watson test could not be applied directly. Instead, residual autocorrelation was assessed using a residuals versus fitted values plot and the autocorrelation function (ACF). The residuals displayed no systematic pattern, and the ACF plot showed no significant autocorrelation across lags, supporting the assumption of independence.
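A sample autocorrelation function of the kind referred to above can be sketched as follows. The residual values are hypothetical, the residuals are taken in the order supplied, and the bound shown is the common large-sample approximation for a white-noise series rather than anything specific to this study.

```python
def acf(residuals, max_lag):
    """Sample autocorrelation of residuals for lags 0..max_lag."""
    n = len(residuals)
    m = sum(residuals) / n
    centred = [r - m for r in residuals]
    denom = sum(c * c for c in centred)
    return [sum(centred[t] * centred[t + lag] for t in range(n - lag)) / denom
            for lag in range(max_lag + 1)]

def significance_bound(n):
    """Approximate 95% bound for a white-noise series: +/- 1.96 / sqrt(n)."""
    return 1.96 / n ** 0.5

# Hypothetical residuals with an obvious alternating pattern.
example = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
correlations = acf(example, 2)
print(correlations, significance_bound(len(example)))
```

Lags whose autocorrelation falls outside the bound would suggest that the residuals are not independent. In this contrived example the lag-1 value is strongly negative.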
The normality of residuals was evaluated via Q-Q plots and the Shapiro-Wilk test, applied to a random sample of 5000 residuals due to the large sample size. The test indicated a significant deviation from normality (W = 0.963, p < 0.001). However, with samples of this size the test is sensitive to even minor departures, and visual inspection of the Q-Q plot showed that the residuals closely followed the diagonal line, with only slight deviations at the tails, suggesting no severe departures from normality. This supports the decision to proceed with regression analysis, particularly given the large sample size and the applicability of the Central Limit Theorem in this context. The regression model is specified as follows, where β0 is the y-intercept and βx represents the regression coefficients:
To compare geographies using the usually resident versus the workday population, a reduced regression model (Equation (2)) was applied. This model used a smaller set of predictors, as fewer sociodemographic variables are published for the workday population. The regression was weighted by the workday population:
The regression analysis (Equation (1)) was repeated for the ‘alternative wards’ to provide a comparator. Composite census and other variables providing indicators of deprivation were not used in this study due to the possible confounding effect. These composite indicators (e.g. household census counts in dimensions of poverty) often incorporate dependent variables or explanatory variables, which could bias the analysis.
Whilst Geographically Weighted Regression (GWR) is a valuable method for capturing spatial non-stationarity in relationships between variables (Brunsdon et al., 1998), this study employed MLR to support direct comparison of model outcomes across varying spatial units. The primary objective was to assess how the relationships between sociodemographic and air quality indicators and population health outcomes are affected by changes in spatial scale and zoning. MLR, as a global model, assumes a consistent relationship across the study area, making it particularly well-suited for comparing regression coefficients across geographies such as OAs, LSOAs, and Wards. This is critical when investigating the MAUP and the effects of spatial aggregation, as emphasised by Flowerdew (2011), who highlights the importance of maintaining comparability across spatial units when analysing census or health data. Therefore, whilst GWR is more sensitive to local variations, MLR provides a more interpretable and scale-consistent approach for the purposes of this research.
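The population-weighted regressions described in this section can be illustrated with a generic weighted least squares sketch. This is not the study's model specification: it uses a single hypothetical predictor, hypothetical population weights, and solves the weighted normal equations directly rather than calling a statistics package.

```python
def weighted_least_squares(X, y, w):
    """Solve (X'WX) b = X'Wy; rows of X hold predictors without the intercept."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Build the augmented normal-equation matrix [X'WX | X'Wy].
    A = [[sum(wi * r[j] * r[k] for r, wi in zip(rows, w)) for k in range(p)]
         + [sum(wi * r[j] * yi for r, yi, wi in zip(rows, y, w))]
         for j in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]

# Hypothetical areal units: PM2.5 concentration, % poor health, population weight.
pm25 = [[8.0], [10.0], [12.0], [14.0]]
pcbad = [5.0, 6.0, 7.0, 8.0]          # constructed as 1 + 0.5 * PM2.5
population = [100.0, 200.0, 150.0, 50.0]

coefficients = weighted_least_squares(pm25, pcbad, population)
print(coefficients)  # intercept, then slope
```

Because the same global coefficients are estimated at every geography, the fitted values can be compared directly between, for example, OA and Ward models, which is the property the comparison across scales relies on.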
Results
The spatial distribution of the percentage of the population with bad or very bad health, as well as annual mean ground level concentrations of NO2 and PM2.5, is shown in Figure 2. In terms of air quality, there are notable relationships with the transportation network, particularly around Heathrow Airport in the west (Figure 2(a)), along the A406 North Circular Road (Figure 2(a) and (b)), and within central London. Highly localised concentrations can also be caused by other emission sources (e.g. other industry and industrial kitchen ventilation). Visual inspection reveals limited discernible patterns in the percentage of the population reporting bad or very bad health at the OA level (Figure 2(c)). However, the Ward level overview (Figure 2(d)) suggests lower poor health incidence in the southwest of London, with worse outcomes tending to be seen in the centre and north. At the small area level (Figure 2(c)), 99.05% of OAs have a mean annual concentration of NO2 greater than the WHO (2021) Interim Target 3 limit (>20 µg/m3), and 0.07% of OAs exceed the Interim Target 3 PM2.5 concentration (>15 µg/m3). We did not find populations with vulnerable sociodemographic characteristics (Table 1) to be skewed towards OAs exceeding these limits. This is to be expected as OAs are purposely designed with statistical disclosure control in mind. Consequently, they adhere to minimum population thresholds with a zonation that aims to support social homogeneity (Cockings et al., 2011; Martin et al., 2001).
Annual mean ground level air quality concentrations (2019, 20 m resolution) across Greater London for NO2 (a) and PM2.5 (b). Percentage of the 2021 Census population with bad or very bad health at Output Area (c) and Ward (d) level.
Descriptive statistics for the variables used by census geography (London). All percentages of the usually resident and workday (‘_wd’ suffix) 2021 Census population (except for air quality, which is given in concentration).
Model results and comparison across different census and administrative geographies for the usually resident population.
Significance: p < 0.001 (***), p < 0.01 (**), p < 0.05 (*).
Across all geographic levels, the concentration of PM2.5 (aqpm25), alongside the percentage of the population in lower socioeconomic occupation (pcsec), aged over 65 (pco65), economically inactive due to being long-term sick or disabled (pclts), or considered to have a disability (pcdis), remain significant (p < 0.001) explanatory variables (Table 4). At the OA and LSOA level, the concentration of PM2.5 has the second greatest impact on overall health outcome of the variables tested. The percentage of the population economically inactive due to being long-term sick or disabled (pcdis) has the greatest association at these geographies. At the coarser scales this ordering reverses, with the concentration of PM2.5 showing the greatest impact on overall population bad health.
Model results and comparison across different census output geographies for the usually resident and workday population.
Significance: p < 0.001 (***), p < 0.01 (**), p < 0.05 (*).
The output from the spatially constrained multivariate clustering analysis is shown in Figure 3, visualising the percentage of the population with bad or very bad health. This figure compares the real wards with the generated ‘alternative wards’, with ward boundaries delineated to provide context for comparison. When Output Areas (OAs) were created in 2003 from the 2001 Census data, they nested exactly within the ward boundaries of that time. Since then, numerous ward and parish boundary changes have occurred, resulting in splits of some OAs (ONS, 2020). The ‘alternative wards’ produced in this study adhere to the nested approach, comprising 2021 Census OAs. These ‘alternative wards’ consist of the same average number of OAs and the same overall count (704) as the real wards. However, the ‘alternative wards’ (Figure 3(b)) provide a clearer visual interpretation of the spatial distribution of poor population health. Clusters of bad health are more easily identifiable, particularly in the northern and southern parts of the city, along with their associated magnitudes.
Percentage of the 2021 Census population with bad or very bad health at the Ward level (a) and ‘Alternative Ward’ (b) level, with unit boundary detail added.
Model results and comparison for the ‘alternative wards’.
Significance: p < 0.001 (***), p < 0.01 (**), p < 0.05 (*).
The spatial aggregation effect, which allows better matching of lower-level geographies with a more autocorrelated larger level (Flowerdew, 2011), was also examined. The ‘alternative wards’ (Figure 3(b)) provided a clearer visual comparison with the underlying OA spatial distribution of bad health (Figure 2(c)) than the real wards (Figure 3(a)). In terms of regression coefficients, the ‘alternative wards’ (Table 6) better match the underlying OAs (compared to the real wards) in the non-white (pcnwht) and aged over 65 (pco65) explanatory variables.
Discussion
In this study, we have explored the impact of scale and zonation with respect to population health and air quality when dealing with small area census sociodemographic data. This exploration is motivated by the research need to identify vulnerable sociodemographic groups and assess exposure based on different time-activity patterns (WHO, 2021). We have previously noted that social homogeneity is a desirable design principle for statistical disclosure control in census output geographies (e.g. Martin et al., 2001). Therefore, we have focused on the spatial clustering of existing small-area geographies as well as the statistical association across different scales. Importantly, we respect the principle of not publishing or creating alternative small area census output geographies which are unsafe due to differencing, due to statistical disclosure control concerns (Duke-Williams and Rees, 1998). Therefore, we have examined sociodemographic associations with air quality statistically as well as by spatially clustering existing OAs based on common characteristics. Within the context of this study, we did not intend to or see the merit in creating additional small-area geographies. Instead, the principal outputs concern the re-aggregation of existing geographies and the assessment of the significance of association across all scales.
Effect of scale and zonation, and model performance
Consistent with the MAUP literature, we found that relationships between variables tend to become stronger at coarser scales (Flowerdew, 2011; Manley, 2021; Parenteau and Sawada, 2011). In our analyses, explanatory models for population health (Equation (1), Tables 4 and 5) show a good to strong fit (R2 0.7 to 0.9), which increases as scales become coarser. The higher values at the Ward level (R2 > 0.9) could indicate potential model overfitting. The closeness of the R2 values to their respective adjusted R2 values nonetheless indicates a good model fit. Adjusted R2 is particularly useful in multiple regression models, where the goal is to assess model fit without inflating the apparent explanatory power as more predictors are added.
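For reference, the standard adjusted R2 calculation can be sketched as below. The number of predictors and the larger unit count are hypothetical values chosen for illustration; only the count of 704 wards comes from this study.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (excluding intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Same R2 and number of predictors, different numbers of areal units.
ward_like = adjusted_r2(0.90, n=704, p=9)     # 704 units, as for the Wards
oa_like = adjusted_r2(0.90, n=25000, p=9)     # hypothetical fine-scale count

print(round(ward_like, 4), round(oa_like, 4))
```

With only a handful of predictors and several hundred units or more, the penalty is small, so R2 and adjusted R2 are expected to be close at every scale considered here.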
The consistently low p-values (<0.001) for the high F-statistics across all scales (Tables 4-6) indicate a highly significant relationship between the chosen explanatory variables and population health (dependent variable, pcbad), suggesting the model fits better than a baseline with no predictors. However, the model fit decreases as geographic units for analysis become larger (from OA to Ward, Table 4). Scale also affects the significance of some of the explanatory variables. For example, the percentages of the population living in social rented accommodation (pcsoc) and non-white (pcnwht) are highly significant at OA and LSOA level (p < 0.001 or 0.01) but become less significant (pcnwht) or not significant at all (pcsoc) at the MSOA and Ward level (Table 4), although these explanatory variables have negligible regression coefficients at all scales examined. Other variables are statistically significant at all scales assessed. These include the concentration of PM2.5 (aqpm25), the population aged under 16 (pcu16) and over 65 (pco65), considered to have a disability (pcdis), and economically inactive due to a disability (pcemploy).
Considerations for population health
We observed that exposure to PM2.5, alongside having a disability and unemployment due to long-term sickness, has the greatest association with the population reporting bad health in this study. The association with PM2.5 concentration in the ‘alternative wards’ model is reduced in magnitude compared with the real ward regression; however, the significance and impact of the percentage of the population economically inactive due to being long-term sick or disabled (pclts), considered disabled (pcdis), aged under 16 (pcu16) or over 65 (pco65) increase. This is likely a result of closer sociodemographic grouping arising from spatial clustering by bad health. Designing zones that balance granularity with meaningful aggregation helps retain essential demographic variations whilst mitigating MAUP effects (Alvanides et al., 2002). Aligning zones with social and economic boundaries provides a more accurate picture of community needs and helps in developing culturally and contextually appropriate policies.
It is important to acknowledge a key limitation in the workday population model: due to data constraints, variables such as pclts (long-term sickness or disability) and pcdis (self-reported disability) were excluded, despite their demonstrated significance in the full resident population model. Their omission raises the possibility of omitted variable bias, where effects of included variables, such as PM2.5, may be underestimated due to the absence of key confounders. Therefore, whilst PM2.5 was still found to be significantly associated with poor health in several workday models, the magnitude of this relationship may be conservative. This limitation should caution against overinterpreting the apparent reduction in PM2.5 impact in the workday analysis.
We have also explored model variation based on temporal activity patterns. Previous work has empirically quantified population fluctuations based on diurnal patterns in hazardous locations (e.g. Smith et al., 2015; Smith et al., 2016). In our analysis, we observed a significant association between PM2.5 exposure and poor health in the workday population, although the magnitude of this relationship may be understated due to the exclusion of key sociodemographic predictors (Table 5). However, having a routine occupation or never worked/long-term unemployed (pcsec_wd) or age (pco65_wd) had the largest significant association with bad health for the workday population. This subgroup may have reduced economic mobility and, therefore, be less associated with changes in exposure to poor air quality.
Air pollutants have been consistently associated with poor health outcomes and mortality (Hvidtfeldt et al., 2019). Adverse associations have also been found for even short-term exposure, although it is suggested that additional longitudinal studies are needed (Ronaldson et al., 2022; Samoli et al., 2016). Another study explored the possible association between deprivation and poor air quality, but other factors, such as daily population movements, also need to be considered (Namdeo and Stringer, 2008). We have developed an agenda to compare the effect of scale versus underlying sociodemographic characteristics for an example study area. This granular insight supports targeted public health interventions and policies that are more effective and equitable, addressing the unique needs of smaller communities.
Targeted healthcare service placement could be adopted by local authorities, who could use alternative geographies to site community health clinics or mobile health units more accurately in areas with disproportionately high rates of poor health outcomes. The alternative geographies could also inform more responsive and equitable deployment of air pollution mitigation measures (e.g. low emissions zones, tree planting, or traffic management plans) in communities with higher health vulnerability.
Limitations and constraints
The census general health measure is a person’s own assessment of the general state of their health. This assessment is not based on any specified period of time. However, this study represents a health assessment at a known point in time, which can be examined for association with other temporally relevant air quality and sociodemographic variables. A full set of census characteristics is not yet published across all geographies for the workday population. Furthermore, at the time of writing, a decision to publish a new version of census workplace zones for England and Wales has not been taken whilst the impacts of COVID-19 on working behaviours are assessed (ONS, 2025). The results are interpreted as associations rather than causal relationships, acknowledging the limitations of cross-sectional data and the possibility that unmeasured confounding factors may influence observed patterns between sociodemographic variables, air quality, and health outcomes. However, using findings from this study, we set a research agenda that identifies the magnitude and impact of considering associations across these two time-specific populations.
This study captures associations based on a single-year snapshot of air quality and health data, reflecting short-term exposure. Whilst useful for identifying spatial disparities, this approach does not capture the cumulative or long-term health effects of air pollution. Future research incorporating longitudinal or time-series air quality measurements could be better suited to exploring the chronic impacts of environmental exposures on population health. Future research using mobility data could also identify locations with higher than average churn in the usually resident population.
Conclusions
Whilst more spatially flexible models, such as GWR, can highlight localised variation, this study used weighted MLR analyses to maintain consistency across spatial scales and zonation schemes. This allowed for direct comparison of sociodemographic and air quality predictors of health outcomes, which is crucial when evaluating spatial aggregation effects and the MAUP (Brunsdon et al., 1998; Flowerdew, 2011). By ensuring comparability across geographies, this approach supports more equitable policy design, allowing interventions to be guided by consistent, interpretable evidence that is sensitive to scale but not distorted by it.
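The comparison of model fit across zonation schemes can be illustrated with a minimal sketch. It is not the analysis code used in this study: it assumes a single predictor rather than the full set of MLR covariates, population counts as regression weights, and entirely hypothetical values for six fine-scale areas grouped into three coarser zones. The function names `weighted_fit` and `aggregate` are introduced here for illustration only.

```python
from collections import defaultdict


def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, r2)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    r2 = sxy ** 2 / (sxx * syy)  # weighted R-squared for one predictor
    return a, b, r2


def aggregate(x, y, w, zone):
    """Population-weighted means of x and y per zone, with summed weights."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for xi, yi, wi, z in zip(x, y, w, zone):
        t = totals[z]
        t[0] += wi * xi
        t[1] += wi * yi
        t[2] += wi
    zones = sorted(totals)
    xs = [totals[z][0] / totals[z][2] for z in zones]
    ys = [totals[z][1] / totals[z][2] for z in zones]
    ws = [totals[z][2] for z in zones]
    return xs, ys, ws


# Hypothetical fine-scale areas (illustrative values, not study data).
pm25 = [8.0, 9.0, 10.0, 11.0, 12.0, 13.0]      # annual mean PM2.5
poor_health = [4.0, 6.0, 5.0, 7.0, 6.5, 8.5]   # % reporting poor health
population = [100, 200, 150, 250, 100, 200]    # regression weights
zone = ["A", "A", "B", "B", "C", "C"]          # one possible coarser zonation

# Fit the same weighted model at both scales so the results are comparable.
_, slope_fine, r2_fine = weighted_fit(pm25, poor_health, population)
xa, ya, wa = aggregate(pm25, poor_health, population, zone)
_, slope_coarse, r2_coarse = weighted_fit(xa, ya, wa)

print(f"fine scale:   slope={slope_fine:.3f}, R2={r2_fine:.3f}")
print(f"coarse scale: slope={slope_coarse:.3f}, R2={r2_coarse:.3f}")
```

Because the model specification and weighting are held fixed, any difference between the two fits arises from the aggregation alone. Changing the `zone` assignment while keeping the number of zones constant would illustrate the zonation component of the MAUP in the same way.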
Our analyses have presented a range of associations between selected population health, sociodemographic characteristics, and urban air quality, which address global WHO public health research priorities. In doing so, we have investigated the impact of zonation and choice of areal units on the significance of explanatory factors. The principal focus is on the impact of spatial scale, which was found to mask important associations at coarser levels of aggregation. We also found possible model overfitting when using variables at the ward level or greater, which merits further investigation. Researchers should use caution when interpreting air quality and associated sociodemographic regression analyses based on the ward level. Using census-derived variables in this study, we have found that increased exposure to PM2.5, having a disability, or being unemployed due to a long-term illness have the strongest associations with population health; these are significant across all scales and have the largest regression coefficients. These findings highlight the value of constructing alternative geographies based on health-need clustering, rather than relying solely on existing administrative boundaries. The use of health-need-based aggregations, such as the ‘alternative wards’ developed from Output Areas, provided not only stronger model performance but also clearer alignment with underlying spatial distributions of poor health. By more effectively capturing localised sociodemographic patterns and health disparities, these custom geographies offer a more sensitive and targeted spatial framework for public health analysis. This approach may support better-informed interventions, allowing policymakers to identify and prioritise areas with the greatest need more accurately than traditional administrative units permit.
Footnotes
Acknowledgements
The authors would like to thank the anonymous peer reviewers for their thoughtful and constructive comments, which significantly improved the clarity and quality of this manuscript. Their careful review and detailed suggestions were greatly appreciated. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Ethical considerations
There are no human participants in this article and informed consent is not required.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data that support the findings of this study are publicly available and used under open licence agreements. Contains Office for National Statistics and other data sources which are licensed under the Open Government Licence v.3.0. Contains OS data © Crown copyright and database right 2025.
