Abstract
Estimating education indicators for small geographic areas, such as specific districts, poses challenges due to the limited sample sizes available in these regions. These limitations can result in high variance among direct estimators and create data gaps that impede the development of targeted educational policies. This study employs the Small Area Estimation (SAE) method to provide accurate estimates of educational attainment at the district level in South Africa. We utilised a unit-level logit mixed model to predict the prevalence of different education levels, incorporating data from the 2023 General Household Survey alongside auxiliary variables from the 2022 Population Census. Our methodology applies Empirical Best Predictors (EBPs) within a random effects framework to refine estimates for districts with small sample sizes. Our analysis reveals significant disparities in educational levels between metropolitan and non-metropolitan districts. For instance, districts DC24, DC33, DC15, and DC27 exhibit the highest prevalence of individuals with no education. In contrast, urban districts such as NMA, CPT, EKU, ETH, JHB, and BUF demonstrate the lowest prevalence, all below 11%. These robust, model-based estimates provide policymakers with a reliable foundation for resource allocation and planning, highlighting that SAE is an essential tool for transforming limited survey data into actionable insights for monitoring educational equity and advancing national development goals.
Keywords
Statement of significance
Estimates at the national and provincial levels may not accurately reflect the unique needs and circumstances of various subnational regions, leading to the misallocation of resources. This research significantly contributes to Small Area Estimation (SAE) by providing estimates and maps of education levels—ranging from no education to pre-primary, primary, secondary, and post-secondary—at unplanned domains, such as district municipalities, using a logit mixed model. Although the methodology for Empirical Best Predictors (EBPs) in unit-level models is well-documented, its application in real-world contexts, particularly in sub-Saharan Africa (e.g., South Africa), remains limited. This research addresses that gap by utilising established models and offering a practical application for binary outcome variables commonly encountered in social science, economics, health sciences, and other fields, where direct survey estimates are often unreliable due to small sample sizes. The resulting fine-grained maps of education levels not only highlight inequalities but also inform policies aimed at skills development and resource allocation. Additionally, the framework for estimating proportions can be adapted for use in other areas, such as poverty mapping, public health, and epidemiology, to support evidence-based policymaking.
Introduction
Access to high-quality education for everyone is a key component of the global development agenda. It is widely recognised that education plays a crucial role in promoting social and economic mobility. 1 Higher levels of educational attainment, measured by the number of years of schooling completed, correlate strongly with both physical and mental health. Individuals with more education generally experience lower rates of chronic diseases and tend to live longer, regardless of their country or population group.2–5 Each additional year of education is associated with lower mortality rates and better self-rated health, with tertiary education having the largest effects on life expectancy and health behaviours.2,3,5,6 Education also plays a crucial role in mitigating health-risk behaviours, including smoking and obesity.2,7
Higher educational attainment, particularly among women, is associated with lower fertility rates and delayed childbearing, leading to healthier families and greater economic stability. 2 Furthermore, obtaining a higher education significantly improves job prospects, increases income levels, and boosts overall economic productivity. As a result, education serves as a vital catalyst for economic growth and poverty alleviation.2,5
Schools play a significant role in shaping both health and education, which, in turn, influence the overall well-being and economic productivity of entire populations. 8 Research indicates that education is a vital social determinant of health and well-being, influencing access to opportunities, employment, and intergenerational mobility.4,9,10 However, the advantages of education are not equally distributed; marginalised and vulnerable groups often see lower economic and health returns from additional schooling.4,10,11 Addressing these educational and learning gaps is therefore crucial for reducing disparities in health and longevity within societies.2,9,12
In South Africa, educational attainment mirrors that of other low- and middle-income countries but is marked by significant inequalities based on race, socioeconomic status, geography, and disability. Key factors contributing to these disparities include the historical legacy of apartheid, which has entrenched racial and socioeconomic divisions, leading to Black African populations, particularly in rural areas, accessing lower-quality schooling. Even decades after apartheid, these gaps persist, affecting both access to and the quality of education.13–16 Additionally, the country has maintained a dual school system, in which about 25% of students attend well-structured, well-resourced schools that primarily served the former white population. At the same time, approximately 75% are enrolled in under-resourced, low-performing schools that serve predominantly Black and less privileged populations.14,15 Geographic and socioeconomic disparities are also evident, as learners from rural areas and lower-income households exhibit lower attendance and attainment rates than their urban counterparts.14,15,17 Furthermore, racial and gender-based disparities continue to exist, with White and Indian students achieving higher educational outcomes than Black and Coloured students, contributing to significant gaps in higher education representation and employment opportunities.16,18 Lastly, children from single-parent households and those with disabilities face additional challenges that exacerbate existing inequalities in educational attainment. 19
This study investigates educational levels in South Africa with two primary objectives. The first is to conduct a detailed district-level analysis of five education categories (no education, pre-primary, primary, secondary, and post-secondary) to provide a nuanced understanding of their prevalence across specific regions. The second is to examine the unequal distribution of educational levels throughout the country and so highlight geographic disparities that necessitate targeted interventions. To fulfil these objectives, the research employs unit-level logit mixed models, drawing on methodologies proposed by Jiang and Lahiri, 20 Jiang, 21 and Hobza and Morales. 22 This approach integrates data from the 2023 General Household Survey (GHS) with auxiliary variables from the 2022 population census through unit-level Small Area Estimation (SAE) techniques. Binomial logit mixed models were selected because they are generalised linear mixed models (GLMMs) suited to dichotomous or count variables, and their random effects account for between-domain variability that the auxiliary variables do not explain. Finally, the study uses empirical best predictors (EBPs) to estimate weighted sums of probabilities under the unit-level logit mixed models, with model parameters estimated by maximum likelihood (ML) using the Laplace approximation.
This research substantially advances the understanding of educational attainment in South Africa through several key contributions. Primarily, it employs a robust SAE methodology that integrates 2023 GHS data with auxiliary variables derived from the 2022 population census. The primary objective is to enhance the accuracy of education level estimates–covering categories such as no education, pre-primary, primary, secondary, and post-secondary–at the district level. This methodological approach effectively addresses the critical issue of unreliable education estimates, which often arise from high sampling variability inherent in direct survey results, thereby enhancing data granularity without necessitating additional field data collection.
Moreover, the study applies this SAE technique to generate the first high-resolution educational estimates for South Africa’s 52 districts, producing invaluable sub-national insights that extend beyond mere national and provincial averages. This granularity is essential for informing actionable policy decisions, as it reveals spatial disparities in educational attainment between metropolitan and non-metropolitan regions and identifies priority districts for targeted interventions. Such information is crucial for crafting effective policies, directing resources toward specific population groups, and monitoring existing educational programs. By aligning these findings with governmental initiatives, this research establishes a solid evidence base for targeted policymaking that promotes equitable growth.
In the realm of official statistics, National Statistical Offices (NSOs) such as Statistics South Africa are increasingly under pressure to produce disaggregated indicators for smaller geographic areas. These indicators are crucial for monitoring progress towards national development goals and the Sustainable Development Goals (SDGs).23–28 This challenge primarily arises from two factors:
SAE provides a sound approach for creating local-level indicators from limited survey data. However, the effectiveness of this method depends heavily on the choice of technique, the availability of auxiliary data, and the results of model diagnostics, all of which significantly affect reliability.26,27,29,30 When executed properly, integrating classical SAE extensions can help address both measurement issues and non-sampling challenges. Research shows that unit-level and area-level SAE models often produce more accurate small-area estimates than direct survey methods, especially when informative auxiliary covariates are applied.25,26,30 This study effectively illustrates the value of SAE techniques in generating trustworthy district-level education indicators, thus supporting the goals of official statistics and meeting the needs of evidence-based policymaking.
The article is structured as follows: Section 2 describes the data sources employed in the study, with particular emphasis on the 2023 General Household Survey (GHS) and the 2022 population census. Section 3 presents the unit-level logit mixed models, covering the maximum likelihood (ML) Laplace approximation, empirical best prediction (EBP), and the mean squared error (MSE) of the EBP. Section 4 presents our findings, followed by a discussion in Section 5. Finally, Section 6 offers concluding remarks.
Data sources
The territory of South Africa is divided into nine provinces, each further subdivided into district municipalities, local municipalities, and wards. This study focuses on the 52 district municipalities of South Africa as the geographic level of analysis.
The primary concept behind SAE methods is to use a statistical model that leverages the relationship between the variable of interest (e.g., different education levels) and covariates for which population information is available. This approach aims to enhance the precision of direct estimates. The analyses presented in this study rely on two main data sources. The first source is the 2023 GHS data, collected annually by Statistics South Africa (Stats SA). This survey provides a comprehensive overview of households and individuals, including their living conditions, access to services, and social dynamics throughout the country. The dataset captures information on individuals with varying levels of education, including those with no education, pre-primary, primary, secondary, and post-secondary qualifications, totalling approximately 67,583 observations. The estimates derived from the GHS are considered reliable at both national and provincial levels. However, at the sub-national level, estimates may exhibit significant sampling variability due to smaller sample sizes, potentially compromising the reliability of the results. This study examines several primary outcome variables derived from the 2023 GHS. These include binary indicators for each education category: no_education (1/0 for “No Education”), pre_primary (1/0 for “Pre-Primary” education), primary (1/0 for “Primary” education), secondary (1/0 for “Secondary” education), and post_secondary (1/0 for “Post-Secondary” education). These indicators are assessed at the unit level, with the primary objective of estimating education categories at the district level.
We model educational attainment through separate binary indicators for each category–no education, pre-primary, primary, secondary, and post-secondary–rather than as a single five-level categorical outcome. This approach allows us to utilise unit-level logit mixed models within an SAE framework, providing directly interpretable and comparable prevalence estimates for each category across various districts. Furthermore, it mitigates the distributional and computational complexities typically associated with multinomial mixed models while providing flexibility for modelling category-specific covariates.
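The coding of one five-level attainment variable into five separate 1/0 indicators can be sketched as follows. This is a minimal illustration only: the indicator names follow those listed above, but the function name and the assumption that the raw survey value is already one of these five labels are hypothetical.

```python
# Indicator names as listed in the text; the raw GHS coding may differ (assumption).
LEVELS = ["no_education", "pre_primary", "primary", "secondary", "post_secondary"]


def to_indicators(level):
    """Return one 1/0 indicator per education category for a single person."""
    if level not in LEVELS:
        raise ValueError(f"unknown education level: {level!r}")
    return {name: int(name == level) for name in LEVELS}


# Example: a respondent whose highest level is secondary education.
row = to_indicators("secondary")
```

Each indicator is then the response of its own logit mixed model. One consequence of fitting the five models separately is that the five predicted proportions for a district are not constrained to sum exactly to one.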
The second data source employed in this study is the 2022 population census, which supplies essential auxiliary variables for the models. The South African census is an invaluable and comprehensive resource, providing insights into a broad spectrum of socio-economic, demographic, and educational characteristics, as well as the migration status of individuals at a detailed level. This study includes covariates that are available in both the GHS and the population census and are significantly correlated with the target variables. Consequently, the selected auxiliary variables comprise: (i) individuals’ ages in years, (ii) their gender, and (iii) the type of geography (urban or rural).
Methodology
The unit-level logit mixed model
Consider a group of domain-specific random effects
where
Deriving the marginal likelihood of the unit-level logit mixed model is complex, but it can be approximated using the Laplace approximation, as implemented in the glmer function of the R package lme4. This method is more effective with larger domain sizes (
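For orientation, a unit-level logit mixed model of the kind described by Hobza and Morales is commonly written as below. The notation ($y_{dj}$ for the binary outcome of unit $j$ in domain $d$, $\mathbf{x}_{dj}$ for its covariates, $\boldsymbol{\beta}$ for the fixed effects, $\phi$ for the random-effect standard deviation, $v_d$ for the standardised random effect, $s_d$ for the sample in domain $d$, and $f$ for the standard normal density) is assumed here for illustration and may differ from the symbols used in this article.

```latex
\begin{align}
v_d &\overset{\text{iid}}{\sim} N(0, 1), \qquad d = 1, \dots, D, \\
y_{dj} \mid v_d &\sim \text{Bernoulli}(p_{dj}), \qquad j = 1, \dots, N_d, \\
\operatorname{logit}(p_{dj}) &= \log\frac{p_{dj}}{1 - p_{dj}}
  = \mathbf{x}_{dj}^{\top}\boldsymbol{\beta} + \phi\, v_d, \\
L_d(\boldsymbol{\beta}, \phi) &= \int_{-\infty}^{\infty}
  \prod_{j \in s_d} p_{dj}^{\,y_{dj}} (1 - p_{dj})^{1 - y_{dj}} \, f(v_d) \, dv_d .
\end{align}
```

The integral in the last line has no closed form, which is why an approximation such as the Laplace method is needed to maximise the likelihood.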
EBP under logit mixed models
EBPs provide an optimal approach for estimating domain-level characteristics using unit-level logit mixed models. Consider a finite population divided into
The optimal predictor of the domain proportion
The unit-level conditional expectations can be expressed as:
The EBP approach combines direct estimates with model-based predictions, yielding more reliable estimates for small sample sizes. By leveraging both survey data and auxiliary information, EBPs enhance domain-level estimates, making them valuable in SAE where direct estimates may be unreliable.
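The EBP of a domain proportion can be sketched numerically as follows, assuming a model of the form logit(p) = x'beta + phi * v with v ~ N(0, 1) and fitted values of beta and phi already available. The function names, the argument layout, and the use of plain Monte Carlo integration over the random effect are illustrative assumptions, not the implementation used in this study.

```python
import math
import random


def expit(z):
    """Numerically stable inverse logit."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ebp_proportion(y_s, x_s, x_r, beta, phi, n_draws=2000, seed=1):
    """Monte Carlo EBP of one domain's proportion.

    y_s, x_s: 0/1 outcomes and covariate rows of the sampled units.
    x_r: covariate rows of the non-sampled units (e.g. from the census).
    """
    rng = random.Random(seed)
    eta_s = [sum(b * x for b, x in zip(beta, row)) for row in x_s]
    eta_r = [sum(b * x for b, x in zip(beta, row)) for row in x_r]
    draws = [rng.gauss(0.0, 1.0) for _ in range(n_draws)]

    # Log-likelihood of the observed sample given each random-effect draw.
    log_w = []
    for v in draws:
        lw = 0.0
        for y, eta in zip(y_s, eta_s):
            p = min(max(expit(eta + phi * v), 1e-12), 1.0 - 1e-12)
            lw += math.log(p if y == 1 else 1.0 - p)
        log_w.append(lw)

    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_w)
    w = [math.exp(lw - top) for lw in log_w]
    w_sum = sum(w)

    # Conditional expectation of each non-sampled unit's probability.
    pred_r = [
        sum(wl * expit(eta + phi * v) for wl, v in zip(w, draws)) / w_sum
        for eta in eta_r
    ]
    # Observed outcomes for sampled units, predictions for the rest.
    return (sum(y_s) + sum(pred_r)) / (len(y_s) + len(x_r))


# Toy domain: intercept-only covariates, made-up parameter values.
estimate = ebp_proportion([1, 0, 0], [[1.0]] * 3, [[1.0]] * 7, beta=[-1.0], phi=0.5)
```

Observed outcomes are used directly for sampled units, while each non-sampled unit contributes its conditional expected probability given the domain's sample, which is what lets the estimate borrow strength from the model when the domain sample is small.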
Accurate uncertainty quantification is essential for small area estimates. Estimating the MSE of the EBP under logit mixed models requires specialised approaches because the prediction error is difficult to decompose analytically. The MSE of the EBP for the domain proportion is therefore estimated with a parametric bootstrap in four steps: (i) estimate the model parameters; (ii) generate bootstrap populations from the fitted model; (iii) from each bootstrap population, take samples that match the original design and compute the EBPs; and (iv) calculate the bootstrap MSE estimator.
In the education-level application, this bootstrap method successfully addresses the binary nature of the outcome variables and provides relevant measures of uncertainty. The MSE estimates are essential for comparing the precision of EBPs across districts and for assessing whether differences in education-level estimates are statistically significant.
The parametric bootstrap has been shown to be effective in simulation studies of logit mixed models, yielding nearly unbiased MSE estimates when the model assumptions hold. However, caution should be exercised with very small sample sizes, as this can lead to high variability in bootstrap results. This approach ensures that uncertainty in parameter estimates and in random-effect predictions is accounted for, making the EBP method reliable for planning educational policy and allocating resources.
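The logic of the parametric bootstrap MSE estimator can be sketched as below. Everything in the toy setup is hypothetical: the parameter values, the two domains and their sizes, and above all the predictor, which is a plain sample mean standing in for the step that would refit the mixed model and recompute the EBPs in each replicate.

```python
import math
import random


def bootstrap_mse(gen_population, draw_sample, predictor, n_boot=200, seed=1):
    """Average squared gap between predicted and true bootstrap proportions."""
    rng = random.Random(seed)
    total = {}
    for _ in range(n_boot):
        population = gen_population(rng)          # domain -> list of 0/1 outcomes
        truth = {d: sum(ys) / len(ys) for d, ys in population.items()}
        sample = draw_sample(population, rng)     # domain -> sampled outcomes
        estimate = predictor(sample)              # domain -> predicted proportion
        for d in truth:
            total[d] = total.get(d, 0.0) + (estimate[d] - truth[d]) ** 2
    return {d: s / n_boot for d, s in total.items()}


# Hypothetical fitted values and (population size, sample size) per domain.
BETA0, PHI = -1.5, 0.4
SIZES = {"A": (200, 10), "B": (200, 50)}


def gen_population(rng):
    """Simulate outcomes from logit(p) = BETA0 + PHI * v_d, v_d ~ N(0, 1)."""
    population = {}
    for d, (n_pop, _) in SIZES.items():
        v = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-(BETA0 + PHI * v)))
        population[d] = [1 if rng.random() < p else 0 for _ in range(n_pop)]
    return population


def draw_sample(population, rng):
    """Simple random sample of the original size within each domain."""
    return {d: rng.sample(population[d], SIZES[d][1]) for d in population}


def sample_mean(sample):
    """Stand-in predictor; the real procedure would refit the model and compute EBPs."""
    return {d: sum(ys) / len(ys) for d, ys in sample.items()}


mse = bootstrap_mse(gen_population, draw_sample, sample_mean)
```

Because each replicate generates a new population with known true proportions, the squared prediction error can be observed directly and averaged. Domain "B", with the larger sample, should come out with the smaller MSE.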
The methodology is designed for the production of official statistics and aims to improve design-based survey estimates, especially for small domains, through model-based estimation. From an official statistics perspective, this proposed methodology can be easily integrated into standard production workflows. The model is calibrated using publicly available survey data, while predictions are generated using census data, which is generally accessible to NSOs. To provide essential measures of uncertainty for publication, a parametric bootstrap method is used to estimate MSE. Unlike traditional design-based estimators, the model-based empirical best predictor (EBP) approach yields stable estimates even in domains with few sample observations, which is often critical for official reporting.
Figure 1 illustrates the workflow of the SAE framework utilised to generate district-level education indicators. The process commences by integrating survey data from the 2023 GHS with auxiliary information from the 2022 Population Census to ensure consistency at the district level. A unit-level logit mixed model is then specified and estimated to incorporate both fixed and random effects. The EBP method is used to derive area-level estimates. Subsequently, uncertainty is quantified using parametric bootstrap techniques, followed by thorough model diagnostics. The workflow culminates in the production of reliable estimates and confidence intervals, which are visually represented through maps and tables to deliver insights pertinent to policy-making.

Figure 1. The workflow for the SAE framework, designed to produce district-level educational attainment indicators, consists of several key steps. Initially, it integrates survey data from the 2023 GHS with supplementary information obtained from the 2022 Population Census. The application of a unit-level logit mixed model for estimation follows this integration. Subsequently, EBP is conducted, along with uncertainty estimation through parametric bootstrap methods. The process includes a diagnostic evaluation to ensure precision, and the findings are ultimately disseminated through maps and tables.
Results
This section presents the results of unit-level logistic mixed models within the SAE framework. The analysis is based on data from two sources: the 2023 GHS and the 2022 Population Census, focusing on educational attainment in South Africa. The models estimate the likelihood of individuals being classified into five education levels: No Education, Pre-Primary, Primary, Secondary, and Post-Secondary. The auxiliary variables (demographic predictors) included in the analysis are common to both the 2023 GHS and the 2022 Population Census, such as age, gender, and urban residence. These demographic indicators were selected because they are significantly correlated with the five target variables, specifically the proportions of individuals in each education level. To address geographic clustering, random intercepts were included for 52 district municipalities.
Table 1 illustrates the weighted percentage distribution of educational attainment by gender and geographic region, as reported in the 2023 GHS. Notable urban-rural differences are evident for both men and women. In urban areas, secondary education is the most prevalent level of attainment, accounting for approximately 59% among both females and males. At the same time, post-secondary education levels are higher in urban areas than in rural areas. Conversely, rural regions exhibit greater proportions of individuals with no education or only primary education. Gender disparities within the same geographic context are relatively minor when contrasted with the significant differences observed between urban and rural areas.
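A weighted percentage of this kind is typically a ratio of weighted totals. The sketch below assumes each respondent carries a survey weight and a 1/0 indicator for the category of interest; the function name and the toy values are hypothetical and do not reproduce the GHS weighting procedure.

```python
def weighted_percentage(indicators, weights):
    """Weighted share (in %) of units whose indicator equals 1."""
    if len(indicators) != len(weights):
        raise ValueError("indicators and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return 100.0 * sum(w * y for w, y in zip(weights, indicators)) / total


# Toy example: three respondents with made-up weights.
share = weighted_percentage([1, 0, 1], [2.0, 1.0, 1.0])
```

Applied within a gender-by-geography cell, the same ratio gives one entry of a table such as Table 1. Applied within a district, it gives the direct estimate whose variance becomes large when the district sample is small.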
Table 1. Weighted percentage distribution of educational attainment by gender and type of geography (urban/rural), 2023 GHS, South Africa.
Table 2 illustrates the fixed effects coefficients and standard errors (in parentheses). Additionally, it provides the estimated standard deviation of the district-level random effects (
Table 2. Parameter estimates from unit-level logit mixed models for educational attainment categories in South Africa.
Notes: Standard errors are in parentheses below the coefficient estimates.
Significance codes:
Models estimated using a unit-level logistic mixed model with random intercepts for 52 district municipalities (
Table 3 shows the top 10 district municipalities, based on EBP estimates, for each education category (no education, pre-primary, primary, secondary, and post-secondary) across South Africa’s districts. For example, in KwaZulu-Natal province, several districts have a high prevalence of individuals with no formal education. These include the uMzinyathi District Municipality (DC24), uMkhanyakude District Municipality (DC27), Zululand District Municipality (DC26), and King Cetshwayo District Municipality (DC28, formerly uThungulu). Similarly, in Limpopo province, the Mopani District Municipality (DC33), the Greater Sekhukhune District Municipality (DC47), and the Vhembe District Municipality (DC34) also show high proportions of individuals with no formal education. In the Eastern Cape province, the OR Tambo District Municipality (DC15) and Alfred Nzo District Municipality (DC44) exhibit a similar pattern.
The district-wise estimates of the prevalence of no education, generated by the EBP method using a unit-level logit mixed model, range from a minimum of 10.19% (95% CI: 9.12%, 11.27%) in the NMA district municipality to a maximum of 19.49% (95% CI: 16.61%, 22.33%) in the DC24 district municipality. For secondary education, the estimates vary from 40.07% (95% CI: 39.44%, 42.30%) in the DC15 district municipality to 64.65% (95% CI: 62.20%, 66.00%) in the JHB district municipality. Regarding post-secondary education, the prevalence ranges from 2.25% (95% CI: 1.37%, 3.12%) in the DC44 district municipality to 10.34% (95% CI: 8.28%, 12.39%) in the DC4 district municipality.
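Most of the intervals quoted above are roughly symmetric around the point estimate, which is what a normal-approximation interval built from the estimated MSE would produce. The sketch below shows that construction; it is an assumption about how such intervals are commonly formed, not a confirmed description of the procedure used here, and the function name and inputs are hypothetical.

```python
import math


def wald_ci(estimate, mse, z=1.96):
    """Normal-approximation interval for a proportion, clipped to [0, 1]."""
    if mse < 0:
        raise ValueError("mse must be non-negative")
    half_width = z * math.sqrt(mse)
    return max(0.0, estimate - half_width), min(1.0, estimate + half_width)


# Toy example: an estimated proportion of 0.10 with an estimated MSE of 0.0001.
lower, upper = wald_ci(0.10, 0.0001)
```

Under this construction a smaller MSE translates directly into a narrower interval.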
The district-wise plot illustrating the 95% confidence intervals (CIs) generated by both direct and EBP estimates is presented in Figure 2. This demonstrates that the 95% CIs for direct estimates are wider than those for the model-based estimates produced by the unit-level logit mixed model.

Figure 2. District-wise 95% confidence interval plots for the direct estimates versus EBP estimates for Individuals.
Table 3. Top 10 district municipalities by EBP estimates.
Inferences derived from model-based estimates, specifically those from the unit-level logit mixed model, depend on the distributions defined by the assumed model, underscoring the importance of model validation for reliable model-based estimation. 29 Harmening et al. 34 emphasise that reliable model-based estimates should align with the direct estimates while being more precise than them, because the model draws on auxiliary variables.
To assess the reliability of our estimates, we compared MSEs, as illustrated in Figure 3, which presents the MSE estimates of both the direct and model-based estimators with districts sorted by sample size. The MSEs of both estimators tend to decrease as the sample size increases. The line plots show minimal differences between the two across many districts, particularly when the sample size is large. However, model-based estimates are more precise than direct estimates when the sample size is small. In summary, in districts with larger sample sizes and small direct-estimate MSEs, the model-based estimates align more closely with the direct estimates.

Figure 3. District-wise mean-squared error estimates plotted for direct estimates and model-based estimates under a unit-level logit mixed model, sorted by sample size.
Figure 4 presents a dispersion graph comparing the coefficients of variation (CV) of the direct estimator under the sampling design with those of the model-based estimator derived from the unit-level logit mixed model. Consistent with the patterns observed in the MSEs, the CVs of the EBPs tend to be lower than those of the direct estimators. Furthermore, as the sample size increases, the CVs of both estimators approach equality.
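For a model-based estimate, the CV is conventionally the root MSE expressed as a percentage of the point estimate. The sketch below assumes that definition; the function name and toy values are hypothetical.

```python
import math


def cv_percent(estimate, mse):
    """Coefficient of variation (%) = 100 * sqrt(MSE) / estimate."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    if mse < 0:
        raise ValueError("mse must be non-negative")
    return 100.0 * math.sqrt(mse) / estimate


# Toy example: estimated proportion 0.10 with estimated MSE 0.0001.
cv = cv_percent(0.10, 0.0001)
```

Because the CV divides by the estimate, categories with low prevalence, such as post-secondary education in some districts, can show high CVs even when the MSE itself is small.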

Figure 4. District-wise coefficients of variation (%) plotted for direct estimates and model-based estimates under a unit-level logit mixed model, sorted by sample size.
The scatter plots in Figure 5 illustrate the comparison between EBP and direct estimates for each education category. The presence of the 45° reference line facilitates easy assessment of agreement between the two estimates. In most panels, the tight clustering around this line indicates strong concordance, while a handful of points with greater colour intensity highlight areas with more substantial differences. Overall, the plots effectively demonstrate the expected shrinkage of EBP estimates toward the regression fit, offering a clear visual validation of the model’s performance across various domains.

Figure 5. Scatterplots of EBP estimates with corresponding direct estimates across the 52 districts for five distinct educational categories. The dashed 45° line illustrates the benchmark of perfect agreement between the EBP and direct estimates. Point colour reflects the magnitude of the absolute difference between these two types of estimates, with warmer colours signifying larger discrepancies.
The analysis reinforces the findings by showing that the model-based estimates from the EBP methods closely align with direct survey estimates. Additionally, the CV values, MSE values and 95% confidence intervals indicate that the model-based estimates are reliable and more stable than their direct survey counterparts. Given this evidence, we recommend that stakeholders utilise EBP estimates for effective policy planning and implementation to address educational disparities in South Africa. The inclusion of random intercepts for district municipalities reveals significant differences in educational outcomes across districts, with the random-effect standard deviation (
The results presented in Table 2 illustrate a nuanced relationship between age and the various categories of educational attainment. This study indicates that older individuals are generally less likely to have no education or to have completed only pre-primary education. Instead, they are more likely to attain secondary and post-secondary education.
Furthermore, the regression results in Table 2 point to noteworthy gender disparities in educational attainment in South Africa. Notably, being male is associated with a considerably lower likelihood of post-secondary attainment. As noted by Msimango and Motala, 35 more women than men are graduating at the undergraduate level, signalling a substantial advancement in women’s empowerment. Furthermore, women tend to reap greater benefits from education; each additional year of schooling increases their earnings by 21.2%, compared to 18.1% for men. 18 Despite these advancements, significant gender imbalances persist in fields such as science, engineering, and technology, where men are overrepresented. Women also remain underrepresented in academic leadership roles and among doctoral graduates.35,36
Estimates across the districts of South Africa reveal significant disparities in educational attainment within the population. For instance, 10.19% of individuals in Nelson Mandela Bay (NMA) lack any form of education, whereas this figure rises to 19.49% in district DC24. Districts in KwaZulu-Natal, such as DC24, DC27, DC26, and DC28, show a particularly high proportion of individuals with no formal education, and in Limpopo, districts DC33, DC47, and DC34 also have high concentrations of those without formal education. Similarly, DC15 and DC44 in the Eastern Cape reflect comparable trends. Numerous studies indicate that rural areas in KwaZulu-Natal, Limpopo, and the Eastern Cape consistently experience some of the highest rates of educational deprivation in the country, evidenced by elevated numbers of individuals with no formal schooling or low educational attainment.37–39
The percentage of individuals who have completed secondary education varies significantly across different district municipalities. For instance, only 40.07% of the population in DC15 has achieved this level of education, whereas Johannesburg (JHB) boasts a substantially higher completion rate of 64.65%. Major metropolitan areas such as Johannesburg, Ekurhuleni, Cape Town, eThekwini, Nelson Mandela Bay, and Tshwane report the highest percentages of individuals with secondary education. In contrast, district municipalities such as DC39, DC15, DC44, DC24, and DC33 have the lowest completion rates, all below 43%. This underscores the pronounced disparities in educational attainment among South Africa’s 52 district municipalities. Figure 6 illustrates this spatial inequality by showing the estimated proportions of individuals at various education levels using the EBP method. The prevalence of post-secondary education varies widely, ranging from 2.25% in DC44 (Alfred Nzo District Municipality in the Eastern Cape Province) to 10.34% in DC4 (Garden Route District Municipality in the Western Cape Province). Generally, metropolitan districts offer superior educational resources, including better infrastructure, more schools, and supportive local policies compared to rural areas, resulting in higher attainment levels. Privileged communities in these urban settings tend to leverage these advantages effectively, while marginalised groups face substantial challenges and achieve poorer outcomes.40,41 This situation is aggravated by spatial mismatches, where quality education and job opportunities are inequitably distributed, deepening existing inequalities. 41 Despite the advantages of metro districts over non-metro districts, South Africa continues to struggle with low completion rates: only 41% finish lower secondary education, 28% complete upper secondary, and a mere 5% obtain tertiary education.40,42

Figure 6. Spatial distribution of EBP estimates for educational attainment categories across the 52 district municipalities of South Africa. The maps show the estimated proportions of adults with No education, Pre-primary education, Primary education, Secondary education, and Post-secondary education.
The results of this study indicate that living in urban areas is strongly associated with higher educational attainment, particularly at the post-secondary level. Urban residents are significantly more likely to pursue advanced studies, aligning with findings from previous research that highlights the persistent learning gap between urban and rural areas in sub-Saharan Africa, including South Africa. This gap is largely attributed to differences in student characteristics and the resources available in schools. 43
Urban students consistently outperform their rural peers because of better infrastructure, improved resources, and greater access to qualified teachers. In South Africa, this advantage translates into higher educational attainment and greater returns on education for students in urban districts than for their rural counterparts.44,45
Individuals in urban districts enjoy more advanced educational systems, facilities, and resources, as well as better access to information and communication technology (ICT) and superior skill development opportunities, which are often lacking in rural and underprivileged districts. 46 However, there is also a disparity within urban areas, as children from informal settlements and low-income households face significant barriers to educational access and achievement despite the overall advantages associated with urban living. 44
Table 3 presents the districts with the highest proportions of individuals who hold post-secondary education. These include DC1, DC4, and DC5 in the Western Cape; DC33 in KwaZulu-Natal; DC45 and DC48 in North West; BUF and NMA in the Eastern Cape; and EKU and TSH in Gauteng. These findings indicate that geographic location plays a crucial role in access to post-secondary education: students in rural areas and informal settlements have significantly lower access than their peers in urban areas.13,18
District-level estimates are crucial for targeting non-formal education initiatives and establishing rural learning centres. This study reveals that the prevalence of individuals with no education ranges from 10.19% in the NMA municipality to 19.49% in the DC24 municipality. High ‘No education’ rates are often found in districts facing significant deprivation and rural isolation, such as DC15 and DC44 in the Eastern Cape, as well as several in KwaZulu-Natal and Limpopo. To effectively address these issues, policies should focus on improving the quality of education in historically disadvantaged schools by enhancing infrastructure, strengthening teacher support, and reforming the curriculum. Additionally, promoting adult literacy and foundational education programs is essential for achieving Sustainable Development Goal 4.6, which emphasises literacy and numeracy for all youth and adults.
District-level estimates of secondary education completion rates can identify areas with high early school-leaving rates in South Africa. Major cities, including Johannesburg, Ekurhuleni, Cape Town, eThekwini, Nelson Mandela Bay, and Tshwane, have the highest rates, while municipalities such as DC39, DC15, DC44, DC24, and DC33 show rates below 43%. This variation highlights urbanisation and inequality issues, which are critical for shaping policies that enhance skills development, youth employment, and gender equity. These insights can inform bursary programs and initiatives that aim to facilitate smooth transitions from school to work, as well as align TVET programs with industry needs.
Post-secondary education rates in South Africa vary significantly, ranging from 2.25% in the Alfred Nzo District (Eastern Cape) to 10.34% in the Garden Route District (Western Cape). Metropolitan districts generally offer better educational resources, leading to higher attainment levels. This education is vital for developing future leaders, innovators, and skilled professionals, as a country’s research and technological advancement capabilities rely on its post-secondary graduates. To improve this category of education, key policy recommendations include strengthening the South African Research Chairs Initiative (SARChI) to retain research excellence at public universities and enhancing innovation hubs, such as the University of the Witwatersrand and the Tshwane Innovation Hub. Additionally, reforming student loan systems, such as the National Student Financial Aid Scheme (NSFAS), is crucial for making higher education more accessible to low-income students.
This study reveals significant disparities in educational attainment across South Africa’s districts. Generally, individuals in urban areas achieve better educational outcomes than those in rural areas. To effectively address these educational inequalities, stakeholders should focus on the districts with the highest prevalence of no education and use model-based estimates derived from unit-level SAE methods for strategic planning and implementation. It is essential to ensure an equitable distribution of education funding, teacher resources, and infrastructure development to bridge the gap between urban and rural districts.
The results underscore the potential of SAE methods to enhance official statistics by providing timely, reliable, disaggregated indicators, particularly when survey sample sizes are inadequate for direct estimation at lower administrative levels (i.e., the district municipality level). This framework is not limited to educational indicators; it can also be applied to any binary outcome of interest to official statisticians, such as poverty, unemployment, or health insurance coverage. By adopting SAE methods, statistical agencies can fulfil their commitment to leaving no one behind while delivering high-quality small-domain estimates without incurring additional data-collection costs.
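To make the portability of the framework concrete, the following is a minimal sketch of how an area proportion for a generic binary outcome could be predicted under a unit-level logit mixed model. It is not the authors' implementation. It assumes that the fixed effects (beta) and the random-effect standard deviation (sigma_u) have already been estimated elsewhere, and it approximates the best predictor by weighting Monte Carlo draws of the area effect by the likelihood of the sampled responses. The function name ebp_area_proportion, all parameter values, and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def ebp_area_proportion(y_s, X_s, X_r, beta, sigma_u, n_draws=5000, seed=0):
    """Monte Carlo approximation of the best predictor of one area's proportion.

    y_s, X_s : binary responses and covariates of the sampled units in the area
    X_r      : covariates of the non-sampled units (e.g. from census records)
    beta, sigma_u : model parameters, ASSUMED to be already estimated
    """
    rng = np.random.default_rng(seed)
    # Draw the area random effect from its assumed N(0, sigma_u^2) distribution.
    u = rng.normal(0.0, sigma_u, size=n_draws)

    # Log-likelihood of the sampled responses under each draw (stable form).
    z_s = (X_s @ beta)[:, None] + u[None, :]
    loglik = (-y_s[:, None] * np.logaddexp(0.0, -z_s)
              - (1.0 - y_s[:, None]) * np.logaddexp(0.0, z_s)).sum(axis=0)

    # Normalised weights approximate the conditional distribution of u given y_s.
    w = np.exp(loglik - loglik.max())
    w /= w.sum()

    # Predicted probability for each non-sampled unit, averaged over the draws.
    z_r = (X_r @ beta)[:, None] + u[None, :]
    pred_r = expit(z_r) @ w

    # Observed successes plus predicted successes, over the area population size.
    n_total = len(y_s) + len(X_r)
    return (y_s.sum() + pred_r.sum()) / n_total

# Hypothetical example: one area with 20 sampled and 500 non-sampled units.
rng = np.random.default_rng(1)
beta = np.array([-1.5, 0.8])   # hypothetical fitted fixed effects
sigma_u = 0.5                  # hypothetical random-effect standard deviation
X_s = np.column_stack([np.ones(20), rng.integers(0, 2, 20)])
y_s = rng.binomial(1, expit(X_s @ beta))
X_r = np.column_stack([np.ones(500), rng.integers(0, 2, 500)])
estimate = ebp_area_proportion(y_s, X_s, X_r, beta, sigma_u)
print(f"Predicted area proportion: {estimate:.3f}")
```

In this sketch, an area with no sampled units receives equal weights for all draws, so the prediction falls back on the fixed effects and the auxiliary data alone. In practice, the parameters would be estimated from the survey data and the uncertainty of each prediction would be assessed separately, for example with a bootstrap.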
Footnotes
Acknowledgments
The authors would like to express their gratitude to Statistics South Africa for supplying the datasets, particularly the General Household Survey (GHS) and the 2022 Population Census datasets, which were vital for this research.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research did not receive funding from any specific agency. The University of Johannesburg (South Africa) provided essential material support for conducting the study.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Use of generative AI and AI-assisted technologies in writing
The authors used Grammarly Premium, provided by their institution, to improve the quality of their manuscript. They also utilised AI tools like Consensus and SciSpace for literature searches. After refining the content, the authors took full responsibility for its publication.
Data Availability Statement
No new data were generated during this study. This research involved a secondary analysis of existing datasets that are publicly available at:
