Abstract
Subnational Population Estimates (SPE) in Latin America are useful for implementing new public policies in subnational areas that are affected by internal armed conflicts or are difficult to access. In this work, we propose to combine a Population Base Statistical Register (PBSR) and the Official Population Projections (OPP) using a Bayesian approach to produce SPE. Our proposed procedures can be used to compute SPE of the population size either in absolute numbers or as percentages, denoted SPE (%). However, we focus on SPE (%) because of some data restrictions and to ensure data confidentiality. In this article, the PBSR is constructed using multiple administrative sources with registers from the health, education, and vital statistics systems, tax registration, and, more importantly, the registers of the victims of the current internal armed conflict in Colombia. We also propose new fast Markov chain Monte Carlo algorithms to produce SPE (%) using data augmentation procedures to address the complications caused by the resulting joint posterior containing gamma functions. We implement our proposal to compute SPE (%) by age and sex groups in the municipality of Jamundí in Colombia, which is currently affected by poverty, forced displacement, and the internal armed conflict, and evaluate the accuracy against a Population Census.
1. Introduction
The estimation of the composition of a population by sex and age groups is important for a variety of planning purposes, for instance, to propose public policies in education, health, family planning, and housing development, among other aspects (Swanson and Tayman 2012). To compare multiple distributions by sex and age groups and to provide some data confidentiality, the Colombia National Statistical Office (NSO), called the National Administrative Department of Statistics (DANE, by its Spanish acronym), typically presents population pyramids in official reports using frequencies expressed as percentages of a specific total population. This allows official statisticians to compare the evolution of population structures (e.g., constrictive, expansive, and stationary) across time periods and to compare the population structures of countries of different sizes (Sundbarg 1900; Yusuf et al. 2014). For instance, expansive pyramids are commonly observed in many municipalities of Colombia where birth rates are high and life expectancy is short.
The number of possible administrative registers from different sources in Colombia and the Latin American region is large (ECLAC 2020). More importantly, those registers may be useful to produce SPE in Latin American countries where long intercensal periods and the incompleteness of the number of registered births, deaths, and migrants are common issues.
There is a large number of Bayesian models based on the cohort-component method (Alexander and Alkema 2022; Clark et al. 2012; Raftery et al. 2012, 2014; Wiśniowski et al. 2012). Our proposed Bayesian model is, however, closely related to and inspired by the Bayesian models of J. R. Bryant and Graham (2013, 2015) and J. Bryant and Zhang (2018). Importantly, most of the research combining administrative sources and Bayesian methods to produce SPE has been focused on Europe and the United States. For instance, J. R. Bryant and Graham (2013) propose a Bayesian framework with administrative data to provide estimates for the New Zealand population. The same approach is implemented by Toti et al. (2017) to obtain regional population estimates based on Italian administrative data. Another approach was proposed by Daponte et al. (1997) to estimate the Iraqi Kurdish population by incorporating uncertainty about the characteristics of the population via prior elicitation.
In this work, instead of using the registers from multiple administrative data sources to compute the SPE (%), we consider the Population Base Statistical Register (PBSR), which includes administrative registers of the victims of the internal armed conflict in Colombia. By a PBSR we mean an administrative register obtained by integrating registers from multiple heterogeneous administrative sources. We describe in the next section the steps to construct the PBSR in Colombia following the guidelines in Wallgren and Wallgren (2014, 2022). Population statistical registers based on administrative data similar to the PBSR in Colombia have been developed by other statistical offices in high-income countries. However, to the best of our knowledge, this is the first attempt to build a PBSR in a Latin American country.
Our proposal is motivated by the need of the Colombia National Statistical Office (NSO) to produce more realistic SPE (%) in municipalities where surveys and censuses are difficult to implement because of the current armed conflict and internal forced displacement (Lezama and May 2018; Méndez-Giraldo et al. 2023). More specifically, we propose a Bayesian procedure using the PBSR and the Official Population Projections (OPP; DANE 2017) to compute more reliable SPE (%) by age and sex groups in Colombia. Our proposed procedures are useful for computing the SPE of the population size or SPE (%). However, SPE (%) are produced in this work instead of absolute numbers because of some data restrictions imposed by the Colombia NSO and to ensure some data confidentiality. Importantly, population pyramids constructed with SPE (%) allow official statisticians in the Colombia NSO: (1) to compare the evolution of population structures among municipalities despite the long intercensal periods, (2) to detect some issues in population dynamics such as forced displacement, and (3) to implement public policies in education and improve poverty indicators.
In addition to the proposed technical procedures, we develop efficient algorithms to overcome the complications caused by the resulting joint posterior containing gamma functions to produce SPE under our proposed Bayesian models. We illustrate that the proposed algorithms can produce SPE in a few seconds and may be useful for official statisticians and other practitioners in NSOs. We believe our work can be extended to other countries in the Latin American region where inconsistencies, duplicates, and incompleteness of the administrative sources are common issues (ECLAC 2020).
The most representative subnational regional levels in Colombia are departments, which are subdivided into municipalities. The municipality of Jamundí is located in the Valle del Cauca department, where several armed groups have imposed restrictions on the mobility of people within the territory, making surveys and censuses difficult to implement (Arbeláez-Ruiz 2022; Lezama and May 2018; Ríos et al. 2023). Also, the current armed conflict in Colombia has led to the forced displacement and confinement of residents and has increased the number of deaths (OIM 2019); hence, obtaining the true population structure at the municipality level is a challenging task. We consider confidential administrative registers from the municipality of Jamundí in Colombia to build our PBSR, where the administrative sources are produced by different institutions such as the Colombia NSO, the Ministry of Health, the Ministry of Education, the Unit for Victims, tax registration, and health-promoting entities.
To compare the accuracy of the SPE (%) produced by the proposed Bayesian models with the accuracy of the SPE (%) produced with the OPP and PBSR, we consider the Experimental Population Census (EPC) conducted in the municipality of Jamundí in 2016. More importantly, we illustrate that combining the OPP and PBSR using the proposed Bayesian models leads to more accurate SPE (%) for some age and sex groups than those produced with the OPP or the PBSR alone.
The rest of the paper is organized as follows. In the next section, we describe the construction of the proposed PBSR and present the method to compute the OPP at subnational levels in Colombia. In Section 3, we present the proposed Bayesian models to produce SPE (%) by age and sex groups. Additionally, in Supplemental Section B we develop the proposed Markov chain Monte Carlo (MCMC) computational algorithms for the proposed models, and we evaluate their computational cost in Supplemental Section C. In Section 4, we evaluate the performance of the proposed Bayesian procedures to obtain SPE (%) by age and sex groups, considering the SPE (%) produced by the PBSR and OPP and comparing those with the percentages obtained in the EPC. Supplemental Section D presents some additional results of the EPC implemented in the municipality of Jamundí in 2016. The conclusions and future work are discussed in Section 5. Finally, Supplemental Section E contains the description of the R code and the files needed to replicate the results in this paper using synthetic data.
2. The Population Base Statistical Register and Official Population Projections to Produce SPE
Statistical population registers are useful to generate subnational official statistics as was recently discussed in UNSC (2022). As part of the 2030 agenda for sustainable development (United Nations 2018) the use of administrative data has been identified to be one of the potential avenues to monitor the Sustainable Development Goals (SDGs). In this section, we present the construction of a PBSR in Colombia which can be useful to generate subnational estimates by age and sex groups and other important estimates of key demographic indicators. To this end, we will consider the proposed procedures in Wallgren and Wallgren (2014, 2022) to build the PBSR.
Administrative data are usually provided by the health, education, and vital statistics systems, but given the particularities of each country in the Latin American region, some other administrative sources can be considered. For instance, due to the internal armed conflict in Colombia, there is a specific administrative register of the victims called the Single Register of Victims (RUV, by its Spanish acronym). The RUV provides information about the demographic characteristics of the victims of the current armed conflict and the number of forced displacements due to this conflict. Another source of information is the single dataset of health affiliates (BDUA, by its Spanish acronym) compiled by the Ministry of Health in Colombia. The records provided by the RUV and BDUA may be useful for producing SPE (%) by age and sex groups in Colombia. However, as pointed out by Bakker and Daas (2012), there are several challenges in using administrative data for official statistics. In the specific statistical system of Colombia, when all administrative records from different data sources are considered, duplicates and incompleteness may lead to unreliable SPE (%).
Population projections by age and sex are useful for public policies in the municipalities of Colombia. Some important methodologies to obtain population projections under a Bayesian approach with their corresponding uncertainty at the country level have been proposed by Raftery et al. (2014) and Azose and Raftery (2015) and, more recently, at the subnational level by Yu et al. (2023). Since our proposed procedure combines the PBSR and the OPP to compute SPE (%) we also describe in this section the method to compute the OPP at the municipality level implemented by the Colombia NSO.
2.1. The Construction of the PBSR
As is discussed by Wallgren and Wallgren (2014, 2022) an administrative register is maintained to store observations (records) on all objects to be administered, and the administrative process requires that all objects can be identified. In the specific case of Colombia, an administrative register in the administrative system contains for instance records of individuals given by name, address, date of birth, place of birth, date of death, place of death, identity number, and the type of national identity. The type of national identity and the identity number are unique within the Colombian national administrative system and will be used in the deterministic matching of the records in different registers.
As pointed out by Wallgren and Wallgren (2022), access to the microdata and the privacy and confidentiality of administrative records are two important challenges to be addressed by NSOs to build a PBSR. Commonly, Ministries and other national authorities make available only the aggregated administrative data. This is the case in Latin American countries where traditionally administrative registers are protected because of some confidentiality policies. In Colombia, ministries and other national authorities have their own policies to share the information, and the Colombia NSO does not have full access to the different administrative registers. However, as part of the new National Statistical System (SEN, by its Spanish acronym) (DANE 2024) in Colombia, this project uses data from an important cooperation of different institutions in Colombia where confidential administrative registers were shared to build an experimental PBSR in the municipality of Jamundí.
Population statistical registers similar to the PBSR have been constructed by other statistical offices in high-income countries. The Nordic countries have been among the pioneers in the use of administrative data sources for official statistics (Bakker and Daas 2012; UN 2007). The official data agency in New Zealand (Stats NZ) has produced several experimental population estimates at the national (Stats NZ 2016) and subnational (Stats NZ 2017) levels and by ethnic groups (Stats NZ 2018) using an Integrated Data Infrastructure (IDI). As with the PBSR, these population estimates are obtained from linked administrative data, in this case held in the IDI. Another integrated administrative source closely related to the PBSR in Colombia is the Statistical Population Dataset (SPD) produced by the United Kingdom Office for National Statistics (UK ONS 2021a). The UK ONS found that the administrative-based population estimates using the SPD are close to the Census 2021-based mid-year estimates (UK ONS 2021b). Other NSOs using administrative sources to produce experimental official statistics include the Italian National Institute of Statistics (Cerroni et al. 2014) and the United States Census Bureau (US Census Bureau 2023).
To build the PBSR in Jamundí we follow the procedures in Wallgren and Wallgren (2014, 2022). More specifically, in Subsections 2.1.1 to 2.1.3, the PBSR is constructed by making an inventory of administrative sources, describing the variables with different functions in the administrative system, verifying the usability of the administrative registers, and standardizing some variables in the PBSR. Those procedures should be implemented before the integration of the administrative registers described in Subsection 2.1.4, so that a register system is created first. As pointed out by Wallgren and Wallgren (2014), this register system allows practitioners not only to ensure that the administrative registers can be integrated but also to make the PBSR useful for statistical purposes.
2.1.1. Creating an Inventory of Potential Sources
The Inventory of Potential Administrative Sources (IPAS) was created following the SEN (DANE 2024) in Colombia. This inventory was implemented by specialists in public health, economics, demography, and statistics from ministries, the Colombian NSO, and other national institutions.
The IPAS for the municipality of Jamundí is illustrated in Figure 1. To construct the PBSR we use the records of heterogeneous administrative data sources from the IPAS. Those data sources contain administrative registers from ministries and other national institutions which include the Ministry of Health, Ministry of Education, National Planning Department, Health Promoting Entities, Vital Statistics in the Colombia NSO, Unit for Victims, and Single Tax Register which are part of the SEN (DANE 2024).

Figure 1. Inventory of Potential Administrative Sources. Representation of the inclusion of the different administrative sources from the health, vital statistics, and education systems, the tax registration, and the victims of the current armed conflict in Colombia to build the PBSR.
2.1.2. Variables with Different Functions in the Administrative System
In the second step, we defined the variables to be included in the PBSR by considering identifying variables, communication variables, demographic variables, and the new variables in the PBSR (Wallgren and Wallgren 2022). For Colombian citizens, there are three types of national identity, each with a specific number of numerical digits depending on age; a citizenship card is issued to persons reaching the age of eighteen. Foreigners also have an identity document with a specific number of digits. The type of national identity and the specific identity number are two pieces of information that serve as unique identifiers within the national administrative system in Colombia. As we describe in Subsection 2.1.4, we use those unique identifiers for the deterministic linkage of the objects in different registers. The communication variables include the given name, surname, address, phone number, and municipality of residence. The demographic variables include gender, age, and date of death. The new variables in the PBSR record the status of individuals in each administrative source (dead, emigrant, active, or duplicate) and the name of each administrative source in letters, for example, BDUA or RUV.
2.1.3. Usability of Administrative Registers and Standardized Variables in the Register System
As pointed out by Wallgren and Wallgren (2014), to build the PBSR we need to evaluate the usability of the administrative registers. To this end, the records of the administrative registers are evaluated using a set of quality indicators: (a) verifying the number of digits of the national identity, (b) excluding the records of individuals aged 110 or older, following the population structure obtained in the Population Census 2005 for Jamundí, (c) keeping only the records of individuals whose last names have more than two characters, (d) confirming that the age implied by the date of birth equals the reported age, and (e) excluding the records in which the type of national identity and/or the identity number is missing.
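The quality indicators (a) to (e) can be read as a record-level filter. The following Python sketch is only an illustration under stated assumptions: the field names, the `Record` class, and in particular the digit lengths in `ASSUMED_ID_DIGITS` are hypothetical, since the text does not list the valid number of digits for each type of national identity.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Optional

# Hypothetical digit lengths per identity type (not given in the text).
ASSUMED_ID_DIGITS: Dict[str, range] = {"CC": range(6, 11), "TI": range(10, 12)}


@dataclass
class Record:
    id_type: Optional[str]
    id_number: Optional[str]
    surname: str
    birth_date: date
    reported_age: int


def age_on(birth_date: date, ref_date: date) -> int:
    """Age in completed years at the reference date."""
    had_birthday = (ref_date.month, ref_date.day) >= (birth_date.month, birth_date.day)
    return ref_date.year - birth_date.year - (0 if had_birthday else 1)


def passes_quality_checks(rec: Record, ref_date: date) -> bool:
    # (e) the identity type and number must both be present
    if not rec.id_type or not rec.id_number:
        return False
    # (a) the number of digits must be valid for the identity type
    allowed = ASSUMED_ID_DIGITS.get(rec.id_type)
    if allowed is None or not rec.id_number.isdigit() or len(rec.id_number) not in allowed:
        return False
    age = age_on(rec.birth_date, ref_date)
    # (b) exclude individuals aged 110 or older
    if age >= 110:
        return False
    # (c) the last name must have more than two characters
    if len(rec.surname.strip()) <= 2:
        return False
    # (d) the age implied by the birth date must equal the reported age
    return age == rec.reported_age
```

A register would then be filtered with something like `[r for r in register if passes_quality_checks(r, reference_date)]`, where the reference date is whatever date the reported ages refer to.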
In addition to the identifying variables described in the last section, we include two standardized variables to match the records of administrative registers to be included in the PBSR. The first standardized variable is obtained by applying a phonetic transformation to the given name and surname in the administrative registers. To illustrate, consider Table 1, where the names Yohanna and Joana and the surnames Gonzales and Gonzalez have the same phonetic transformation in the Spanish language.
Table 1. Illustration of the Phonetic Transformation for the Registers of the First Standardized Variable.
The second standardized variable considers the alphabet in Spanish with twenty-eight letters. The observations in this variable are defined as a sequence of fifty-six digits (two digits per letter) recording the number of times each letter appears in the given name and surname of a record. For instance, for the name and surname bereniceheredia this sequence is given by
where the first two digits 01 refer to the number of times the letter a appears in the sequence of letters. The steps in Sections 2.1.1 and 2.1.2 and this section to create the register system are illustrated in Figure 2.
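To make the two standardized variables concrete, the Python sketch below gives one possible implementation. It is not the authors' transformation: the phonetic rules (drop the silent h, merge y with j and z with s, collapse doubled letters) are hypothetical and chosen only so that the examples in the text coincide, and `ASSUMED_ALPHABET` is the modern 27-letter Spanish alphabet, because the text does not list its twenty-eight letters. With this assumption the letter-count key has 54 digits rather than 56.

```python
# Assumed alphabet: the modern 27-letter Spanish alphabet (the paper uses 28 letters).
ASSUMED_ALPHABET = "abcdefghijklmnñopqrstuvwxyz"
_ACCENTS = str.maketrans("áéíóúü", "aeiouu")


def _normalize(name: str) -> str:
    """Lowercase, strip accents, and keep only letters of the assumed alphabet."""
    lowered = name.lower().translate(_ACCENTS)
    return "".join(ch for ch in lowered if ch in ASSUMED_ALPHABET)


def phonetic_key(name: str) -> str:
    """Toy phonetic transformation (hypothetical rules, for illustration only)."""
    s = _normalize(name).replace("h", "").replace("y", "j").replace("z", "s")
    collapsed = []
    for ch in s:
        if not collapsed or collapsed[-1] != ch:  # collapse doubled letters
            collapsed.append(ch)
    return "".join(collapsed)


def letter_count_key(full_name: str, alphabet: str = ASSUMED_ALPHABET) -> str:
    """Two digits per letter of the alphabet: how often it occurs in the name."""
    s = _normalize(full_name)
    return "".join(f"{min(s.count(ch), 99):02d}" for ch in alphabet)
```

Under these rules `phonetic_key("Yohanna")` and `phonetic_key("Joana")` coincide, as do the keys for Gonzales and Gonzalez, and `letter_count_key("bereniceheredia")` starts with `01` because the letter a appears once.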

Figure 2. Procedure to create the register system implemented for each administrative
2.1.4. Matching the Records of Administrative Registers
The matching of records of the administrative registers in the PBSR is carried out by using the identifying variables (the type of national identity and the identity number) as matching keys. More specifically, two records of individuals are matched if the same type of national identity and identity number exist in both registers. The integration of the administrative registers in the municipality of Jamundí is illustrated in Figure 3. In the first step, we consider a tentative PBSR. Due to the large number of records provided by the SISBEN (the System for the Identification of Potential Beneficiaries of Social Programs), we use this administrative source as our first tentative PBSR. Next, we consider the administrative records provided by SIMAT (the Student Enrollment System for Basic and Middle Education) and compare those with the records in the SISBEN, as illustrated in Figure 3.

Figure 3. The integration of registers to build the PBSR in the municipality of Jamundí in Colombia.
The following step compares the type of national identity and the identity number. If there is a match of the records in SISBEN and SIMAT we improve the corresponding records (e.g., the given name) by considering the information of both data sources. On the other hand, if there is not a match between records we consider the observation in the first standardized variable, the birth date, and gender to make a possible link between the administrative registers. If there is a match between the records then we improve the records in the administrative registers, for example, the type of national identity and the identity number. In contrast, if there is not a match between records we consider the observation in the second standardized variable as our matching key to implement the linkage. If there is a match we improve the other administrative records and, if there is not a match we include this administrative record and the corresponding new individual in the PBSR. As a result of implementing this procedure, we obtained a new tentative PBSR that considers the linkage of the administrative records from SISBEN and SIMAT. We implement the same steps illustrated in Figure 3 by including the other administrative sources consecutively in the following order: Births (only for females), RUT, RUV, Nueva EPS, Sanitas EPS, and BDUA, until the PBSR is constructed.
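The three-stage linkage described above can be sketched in Python as follows. This is a simplified illustration, not the production procedure: the dictionary field names (`id_type`, `id_number`, `phonetic_key`, `birth_date`, `gender`, `letter_key`) are hypothetical, and "improving" a matched record is assumed here to mean filling in fields that are empty in the tentative PBSR, which the text does not spell out.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def find_match(pbsr: List[Record], rec: Record) -> Optional[Record]:
    """Return the PBSR record linked to `rec`, or None if no stage matches."""
    # Stage 1: deterministic match on identity type and identity number.
    if rec.get("id_type") and rec.get("id_number"):
        for cand in pbsr:
            if (cand.get("id_type"), cand.get("id_number")) == (rec["id_type"], rec["id_number"]):
                return cand
    # Stage 2: first standardized variable (phonetic key), birth date, and gender.
    stage2 = ("phonetic_key", "birth_date", "gender")
    if all(rec.get(k) for k in stage2):
        for cand in pbsr:
            if all(cand.get(k) == rec[k] for k in stage2):
                return cand
    # Stage 3: second standardized variable (letter-count key).
    if rec.get("letter_key"):
        for cand in pbsr:
            if cand.get("letter_key") == rec["letter_key"]:
                return cand
    return None


def integrate(pbsr: List[Record], source: List[Record]) -> List[Record]:
    """Fold one administrative source into the tentative PBSR."""
    for rec in source:
        match = find_match(pbsr, rec)
        if match is None:
            pbsr.append(dict(rec))  # new individual enters the PBSR
        else:
            for key, value in rec.items():
                if value and not match.get(key):  # assumed rule: fill empty fields only
                    match[key] = value
    return pbsr
```

Calling `integrate` once per source, in the order listed in the text, mirrors the consecutive construction of the tentative PBSR. The linear scans keep the sketch short; a register of realistic size would index the keys in dictionaries instead.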
2.2. Official Population Projections at the Municipality Level in Colombia
Population projections are closely related to one of the requirements of the SDGs, which states that “the availability of timely, reliable and high-quality data disaggregated by income groups, gender, age, race, ethnicity, migration status, disability, geographic location and other characteristics relevant to national contexts should be significantly increased,” with particular reference to Latin American countries. For some of the large subnational levels (e.g., departments and the most populated cities), the Colombia NSO and some Latin American countries (CELADE 2017) use the cohort-component method (Caselli et al. 2005; Cavenaghi 2012; Wattelar 2006) to compute the OPP (DANE 2017). For instance, the method can be implemented at the departmental level by considering the following expression,
with
where
3. The Bayesian Approach for Population Estimates Using Multiple Integrated Administrative Registers
As we described in the last section, the method to compute the OPP at the municipality level does not use information provided by other ministries or institutions, specifically the administrative sources described in the IPAS in Subsection 2.1.1. Therefore, combining the OPP with the SPE produced by the PBSR has the potential to produce more accurate SPE. In this section, we describe our proposed Bayesian procedure to obtain SPE of the population size by sex and age groups, illustrated in Figure 4, which is inspired by the previous proposals of J. R. Bryant and Graham (2013, 2015) and J. Bryant and Zhang (2018).

Figure 4. Bayesian procedure to estimate the population size parameters given by
The first part of the proposed Bayesian approach described in Subsection 3.1 considers a Gamma/Poisson Bayesian model where we model the OPP by age and sex groups,
3.1. The Gamma/Poisson (GP) Bayesian Model for the OPP
The likelihood of the OPP is given by
where
To set the hyperparameters in Equation (4) we use
To complete the Bayesian procedure displayed in Figure 4 we consider in Subsection 3.2 the proposed Bayesian Hierarchical Model for the multiple administrative sources and in Subsection 3.3 the prior specification for the population size parameter
3.2. The Hierarchical Dirichlet Multinomial (HDM) Bayesian Model for the Administrative Sources
Consider a vector of counts
where
where
3.3. The Prior Distribution for the Population Size Parameters
We consider independent weakly informative priors for the population-size parameters
where
3.4. Posterior Distribution and Computations
The joint posterior distribution of
where
We believe our proposed computational framework can be implemented in statistical offices in Latin America where the number of possible administrative records may be large and producing reports of SPE is imperative. The resulting conditional distributions using Equation (8) for the model parameters and the details of the computations and the proposed MCMC algorithms are given in Supplemental Section B. We evaluate the computational cost of the proposed MCMC algorithms in Supplemental Section C.
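The full conditionals and the MCMC algorithms themselves are given in the supplement and are not reproduced here. As a much simpler illustration of how posterior draws of group sizes translate into SPE (%), the Python sketch below uses only a conjugate Gamma/Poisson update: if a projected count y follows a Poisson distribution with mean theta, and theta has a Gamma prior with shape a and rate b, then the posterior of theta is Gamma with shape a + y and rate b + 1. The notation, the hyperparameter values, and the function names are assumptions made for this sketch, and it ignores the Dirichlet/Multinomial part of the proposed models entirely.

```python
import random
from typing import List, Sequence


def posterior_share_draws(
    opp_counts: Sequence[int],
    a: float = 1.0,      # assumed prior shape
    b: float = 0.001,    # assumed prior rate
    n_draws: int = 2000,
    seed: int = 1,
) -> List[List[float]]:
    """Draw group sizes from Gamma(a + y, rate b + 1) and convert to percentages."""
    rng = random.Random(seed)
    scale = 1.0 / (b + 1.0)  # random.gammavariate takes (shape, scale)
    draws = []
    for _ in range(n_draws):
        theta = [rng.gammavariate(a + y, scale) for y in opp_counts]
        total = sum(theta)
        draws.append([100.0 * t / total for t in theta])
    return draws


def posterior_mean(draws: List[List[float]]) -> List[float]:
    """Average each group's percentage over the posterior draws."""
    n = len(draws)
    return [sum(d[i] for d in draws) / n for i in range(len(draws[0]))]
```

For example, `posterior_mean(posterior_share_draws([5000, 3000, 2000]))` gives percentages close to 50, 30, and 20, and the spread of the draws gives a rough sense of the uncertainty attached to each percentage.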
4. Subnational Population Estimates by Age and Sex Groups in Jamundí and Comparisons with the PBSR, OPP, and EPC
In this section, we implement the proposed Bayesian models, the OPP, and the PBSR to compute SPE (%) by age and sex groups in the municipality of Jamundí in Colombia. This municipality is home to different indigenous ethnic groups but, as mentioned, has been highly affected by the current armed conflict and forced displacement in Colombia (Estrada 2010). We evaluate the performance of the proposed Bayesian procedure to produce the SPE (%) by age and sex groups in the municipality of Jamundí and compare them with the subnational population percentages obtained with the records in the EPC. To this end, we describe the EPC in Subsection 4.1 and some important results in the construction of the PBSR in Subsection 4.2. The implementation of the different methods is given in Subsection 4.3. Importantly, the four approaches in Subsection 4.3 can help practitioners in NSOs in Latin America produce SPE (%), each with some specific benefits and limitations. Typically, the OPP is the method implemented by NSOs to update SPE (%). However, as mentioned, the incompleteness of administrative sources in the vital statistics system and long intercensal periods are significant limitations in developing countries for producing reliable SPE (%) based on the OPP.
The PBSR and combining the PBSR and OPP in the Gamma/Poisson-Dirichlet/Multinomial model allow practitioners to potentially produce more accurate SPE (%). However, these two procedures assume that the NSO can produce a PBSR where matching records from administrative registers is required. Unfortunately, ethical and privacy restrictions on NSOs matching registers and the extensive work of producing a PBSR make it difficult to produce timely SPE in NSOs.
Therefore, using the OPP and records (including duplicates) from multiple administrative registers in the Gamma/Poisson-Hierarchical Dirichlet/Multinomial model is an important avenue for practitioners to produce SPE. This procedure may be implemented in NSOs in Latin America where: (1) the number of administrative sources is large, (2) implementing a matching of records from heterogeneous administrative registers is complicated due to limited access to microdata and the privacy and confidentiality of administrative sources, and (3) producing official statistical reports of SPE is imperative. As discussed in Subsection 4.3, there is a significant loss of accuracy in producing SPE (%) when the PBSR is not considered in the Bayesian model. However, in general, SPE (%) obtained with the Gamma/Poisson-Hierarchical Dirichlet/Multinomial model are more accurate than those obtained with PBSR or OPP.
4.1. The EPC 2016
The EPC in Jamundí was carried out between July 24 and August 27, 2016. The EPC used a mixed enumeration method, combining traditional direct interviews by enumerators using paper questionnaires and electronic questionnaires through Computer-Assisted Personal Interviewing (CAPI). For the first time in Colombia, an enumeration method called the e-Census system was also implemented. The e-Census system in the EPC followed the guidelines proposed by the Korea NSO (UNECE 2009) which included an Internet survey where respondents completed the census questionnaire. Importantly, respondents of the EPC had the option to respond through the e-Census or via traditional enumeration or CAPI. One of the main goals of the EPC was to evaluate the e-Census system and other pre-census activities, such as mapping, recruitment, and questionnaire development, to be implemented in the National Population Census 2018.
Since Jamundí is a municipality heavily affected by the ongoing armed conflict in Colombia and is difficult to access, the EPC was particularly useful in providing updated demographic information on racial and ethnic groups by the urban-rural classification in Colombia. However, the EPC encountered two significant challenges: achieving only 90% coverage in the urban and rural areas and a low response rate of approximately 0.6% to the e-Census (DANE 2019). Unfortunately, no additional methods such as a post-enumeration survey were used to measure the coverage of respondents in the EPC. Therefore, we cannot provide a measure of the uncertainty of the SPE produced by the EPC, and the subnational population sizes in absolute numbers via the EPC may not be accurate. However, because the EPC was conducted in 90% of urban and rural areas, we expect it to capture the population structure by age and sex groups using the subnational population percentages produced by the EPC. Therefore, we consider those percentages to evaluate the SPE (%) produced with our proposed models and with the PBSR and OPP. More details about some of the most important results of the EPC are given in Supplemental Section D.
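The comparison against the EPC works on percentages rather than absolute counts. The Python sketch below shows one way to carry it out; the group labels in the usage note are hypothetical, and the accuracy measure (mean absolute difference in percentage points) is an assumption made for illustration, not necessarily the measure used in this paper.

```python
from typing import Dict


def to_percentages(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert counts by age-sex group into percentages of the total."""
    total = sum(counts.values())
    return {group: 100.0 * value / total for group, value in counts.items()}


def mean_absolute_difference(estimate: Dict[str, float], benchmark: Dict[str, float]) -> float:
    """Mean absolute difference, in percentage points, over the benchmark's groups."""
    return sum(abs(estimate[g] - benchmark[g]) for g in benchmark) / len(benchmark)
```

For instance, with hypothetical counts, `mean_absolute_difference(to_percentages({"F 0-4": 120, "M 0-4": 80}), to_percentages({"F 0-4": 1, "M 0-4": 1}))` compares an estimated structure of 60% and 40% with a benchmark of 50% and 50%, giving a difference of 10 percentage points.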
4.2. Important Results in the Construction of the PBSR
We consider the records obtained by implementing the procedures described in Subsections 2.1.1 to 2.1.4 to illustrate some important features of the administrative sources and the PBSR. Specifically, Supplemental Figures A.1 to A.3 illustrate the population pyramids considering the different administrative records of each data register and compare their structure with the EPC 2016. Table 2 contains the number of records according to emigrants, duplicates, deaths, and the resulting number of records obtained by matching the records of the administrative registers. More specifically, the population structures provided by the records of the number of victims of the armed conflict and the health system are displayed in the plots of Supplemental Figure A.1. The plots in Supplemental Figure A.2 display the population structures of the records provided by the birth certificates and education system. Supplemental Figure A.3 illustrates the population pyramids given by the Health Promoting Entities and Tax and Customs National Authority. Each population pyramid displays the strengths of the respective administrative register.
Table 2. Emigrants (Internal Migration of Inhabitants from Jamundí to Other Municipalities in Colombia and International Migration), Duplicates, Deaths, and the Number of Records in Each Register.
According to the RUV, a considerable number of victims of the current armed conflict can be found in the fifteen to twenty-nine year group. This registry of victims captures population structures for intermediate age groups but it is not robust for estimating structures at early ages as illustrated in Supplemental Figure A.1. Importantly, according to Supplemental Figure A.1, the BDUA captures more information about males, and the SISBEN captures more information related to females. Birth certificates issued by the Colombia NSO provided records of the mother which are important for the estimation of women of childbearing age as is displayed in Supplemental Figure A.2. Supplemental Figure A.2 shows that the population of the Student Enrollment System for Basic and Middle Education (SIMAT, by its Spanish acronym) is concentrated in the five to nineteen year group and it is almost the same for males and females. Unfortunately in Colombia, a child might be registered but never receive a birth certificate, and perhaps this is the reason for having few records provided by birth certificates as is shown in Table 2.
Finally, the information from health and tax providers is important for estimating the relative weights of the population at older and younger ages, as shown in Supplemental Figure A.3. The patterns of mortality rates given in Supplemental Figure A.4 display high levels of mortality among males in the municipality of Jamundí in 2016. Importantly, the unusually high mortality rates for children under five years and males aged five to nine years in Supplemental Figure A.4 are perhaps associated with the deaths of children from acute malnutrition discussed in ASIS (2017). Supplemental Figure A.5 shows that the estimates of deaths and international migration are consistent with the population dynamics in Colombia: a higher number of deaths at older ages and high levels of migration at intermediate ages. The number of duplicates in the administrative sources decreases as age increases, and therefore we expect the PBSR to perform better when computing the SPE for those age groups where the number of duplicates is small, as we will discuss in the next section.
4.3. Subnational Population Estimates in Jamundí
To compute SPE (%) by age and sex groups in Jamundí we consider the four approaches discussed in this article: (i) The PBSR, (ii) The OPP, (iii) The Gamma/Poisson-Dirichlet/Multinomial (GP-DM) model with the OPP and the records of individuals in the PBSR and, (iv) The Gamma/Poisson-Hierarchical Dirichlet/Multinomial (GP-HDM) model with the OPP and the records (including the duplicates) of individuals from the multiple administrative registers. We also consider the SPE (%) produced by EPC to compare those SPE (%) produced by the four alternatives, (i) to (iv). We simulated two chains with
We evaluated the running time of the proposed Supplemental Algorithms 1 and 2 in Supplemental Section C. To summarize, Supplemental Table C.1 shows that the computational cost of Supplemental Algorithm 2 is larger than that of Supplemental Algorithm 1, and Supplemental Table C.2 shows that both algorithms produced similar results.
To estimate the posterior mean of the SPE (%) we average the
where

Population pyramids. (a) The OPP (top-left), (b) The PBSR (top-right), (c) The posterior mean of the SPE (%) obtained with the Gamma/Poisson-Dirichlet/Multinomial (GP-DM) model (bottom-left), and (d) The posterior mean of the SPE (%) obtained with the Gamma/Poisson-Hierarchical Dirichlet/Multinomial (GP-HDM) model (bottom-right). The black lines illustrate the EPC 2016 implemented in the municipality of Jamundí.
However, when the OPP or the PBSR is not able to capture the population structure, the proposed Bayesian models can potentially lead to inaccurate SPE (%). Figure 5 illustrates this limitation in the five to nineteen year age group, where neither the OPP nor the PBSR is accurate according to the EPC. More specifically, the zero to nineteen year age group is overrepresented under the OPP, perhaps due to a decline in fertility rates over the period 2005 to 2016, as we discuss in Supplemental Section D. In addition, Figure 5 shows that the PBSR has a greater proportion aged five to nineteen years than the EPC, possibly due to the large number of duplicate records in those age groups, as displayed in Supplemental Figure A.5.
According to Figure 5, the PBSR and the EPC provide similar proportions for the under-five population. However, the GP-DM model is not able to estimate adequately the population structure for the zero to four year age group because of the larger proportions produced by the OPP. The population dynamics did not behave as expected in the OPP, perhaps because the decline in fertility appeared earlier and was more accelerated than usual over the intercensal period 2005 to 2016, as we discuss in Supplemental Section D. This may be improved by including additional administrative sources, for instance the administrative registers of vaccines in the under-five age group, which are produced by the Ministry of Health in Colombia.
To compare the SPE (%) obtained with the four approaches we use a measure commonly employed to evaluate the coverage of Population Censuses, called the Absolute Percentage Difference (APD), given by
where
Table 3 summarizes the APD by age group for males, females, and both sexes. According to Table 3, the proposed GP-DM and GP-HDM models produce posterior SPE (%) with APDs below 20% and in most cases below 15%. In contrast, the APDs produced with the OPP for the 80+ age group are extremely large, as are those obtained at early ages. For males, females, and both sexes, the PBSR produces APDs larger than 15% in the ten to fourteen and fifteen to nineteen year age groups, where a substantial number of duplicate records was found, as illustrated in Supplemental Figure A.5. Therefore, when only the OPP or the PBSR is considered to produce SPE (%) in official reports, the conclusions about the population structure may be inaccurate. The proposed Bayesian models, on the other hand, generally yield smaller APDs than those obtained with the OPP or the PBSR alone. We also found that the GP-HDM model produces smaller APDs in the 0 to 4 and 80+ age groups than the GP-DM model. This characteristic is also illustrated in Figure 5 and is perhaps because the multiple administrative registers describe the population structure for those age groups better than the integrated PBSR. In general, the GP-DM model produces more accurate results than the GP-HDM model according to the APD values. However, except for the 0 to 4 and 80+ age groups, the absolute difference between the APDs produced by the GP-DM and GP-HDM models is smaller than 2%, that is,
Absolute Percentage Differences for Males, Females, and Both Sexes and According to Age Groups Using the Different Methods: PBSR, OPP, and the Posterior Mean of the SPE (%) Obtained with the Gamma/Poisson-Dirichlet/Multinomial (GP-DM) Model and Gamma/Poisson-Hierarchical Dirichlet/Multinomial (GP-HDM) Model.
Averages of Absolute Percentage Differences for Males, Females, and Both Sexes Using the Different Methods: PBSR, OPP, and the Posterior Mean of the SPE (%) Obtained with the Gamma/Poisson-Dirichlet/Multinomial (GP-DM) Model and Gamma/Poisson-Hierarchical Dirichlet/Multinomial (GP-HDM) Model.
where
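As a rough illustration of the APD comparison discussed above, the following minimal Python sketch assumes the common definition of an absolute percentage difference, namely the absolute gap between an estimated share and the census share, divided by the census share and multiplied by 100. The exact expression used in this article may differ, and the function name, the dictionary layout, and all numeric values are hypothetical and are not taken from Table 3.

```python
def absolute_percentage_difference(estimate, reference):
    """APD (%) of an estimated share relative to a reference (census) share.

    Assumed definition: 100 * |estimate - reference| / reference.
    """
    if reference == 0:
        raise ValueError("reference share must be non-zero")
    return 100.0 * abs(estimate - reference) / reference


# Hypothetical population shares (%) for three age groups.
# These are illustrative numbers only, not results from the article.
census_share = {"0-4": 8.0, "5-9": 8.5, "80+": 1.6}
model_share = {"0-4": 8.8, "5-9": 8.5, "80+": 1.2}

# APD per age group, then the simple average across age groups.
apd = {
    group: absolute_percentage_difference(model_share[group], census_share[group])
    for group in census_share
}
mean_apd = sum(apd.values()) / len(apd)

for group, value in apd.items():
    print(f"{group}: APD = {value:.1f}%")
print(f"average APD = {mean_apd:.1f}%")
```

Under this assumed definition, a method whose share matches the census exactly has an APD of zero for that age group, and averaging the APDs across age groups gives a single summary per method and sex of the kind reported in the table of averages.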
To illustrate the uncertainty of the SPE (%) obtained with the GP-DM and GP-HDM models we show the marginal posterior distributions of the SPE (%) for females (left) and males (right) in the plots of Figure 6. According to the shapes of the posterior distributions illustrated in Figure 6, the uncertainty of the SPE (%) by age and sex groups obtained with the GP-DM and GP-HDM models is similar. The marginal posterior distributions of the SPE (%) under the GP-HDM model in the 0 to 4 and 80+ year age groups are closer to the subnational population percentages given by the EPC. In general, however, the marginal posterior distributions of the SPE (%) under the GP-DM model are closer to the subnational population percentages produced by the EPC.

Posterior distribution of SPE (%) by age groups for females (left) and males (right) obtained with the Gamma/Poisson-Dirichlet/Multinomial (GP-DM) model and Gamma/Poisson-Hierarchical Dirichlet/Multinomial (GP-HDM) model. The red points illustrate the Experimental Population Census (EPC) 2016.
Finally, to illustrate other important demographic indicators computed using the posterior samples of
where

Sex ratio for all age groups. The histograms display the posterior distribution of the sex ratio obtained with the Gamma/Poisson-Dirichlet/Multinomial (GP-DM) model and Gamma/Poisson-Hierarchical Dirichlet/Multinomial (GP-HDM) model. The vertical lines illustrate the OPP (blue), EPC (black), and PBSR (red).
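The idea of deriving a demographic indicator from posterior samples can be sketched as follows. This minimal Python example assumes that the sex ratio is defined as the number of males per 100 females and that each posterior draw stores the male and female shares by age group. Both the definition and the data layout are assumptions made for illustration, and the three draws below are hypothetical values, not output of the GP-DM or GP-HDM models.

```python
import statistics


def sex_ratio(male_shares, female_shares):
    """Males per 100 females, computed from age-specific shares (assumed definition)."""
    return 100.0 * sum(male_shares) / sum(female_shares)


# Hypothetical posterior draws of the SPE (%) by sex for three age groups.
# In practice there would be one entry per retained MCMC iteration.
posterior_draws = [
    {"male": [4.0, 20.0, 24.0], "female": [4.0, 21.0, 27.0]},
    {"male": [5.0, 20.0, 25.0], "female": [5.0, 20.0, 25.0]},
    {"male": [4.0, 19.0, 22.0], "female": [5.0, 22.0, 28.0]},
]

# Applying the indicator to every draw yields draws from its posterior distribution.
ratio_draws = [sex_ratio(d["male"], d["female"]) for d in posterior_draws]
posterior_mean_ratio = statistics.mean(ratio_draws)

print("posterior draws of the sex ratio:", [round(r, 1) for r in ratio_draws])
print("posterior mean of the sex ratio:", round(posterior_mean_ratio, 1))
```

Because the indicator is evaluated draw by draw, a histogram of `ratio_draws` plays the role of the posterior distribution shown in the figure, and point values from the OPP, EPC, or PBSR can be overlaid as vertical lines for comparison.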
5. Concluding Remarks and Future Work
In this work, we propose Bayesian procedures that combine a Population Base Statistical Register (PBSR) and the Official Population Projections (OPP) to produce SPE of the population size in percentage (SPE (%)). We use confidential administrative registers from the municipality of Jamundí in Colombia to build our PBSR, where the administrative sources are produced by different institutions such as the Colombia NSO, the Ministry of Health, the Ministry of Education, the unit for victims, tax registration, and health-promoting entities. We implement our proposal to estimate SPE (%) in the multiethnic municipality of Jamundí, which includes indigenous communities and has been highly affected by the current Colombian armed conflict. We found that our proposal produced SPE (%) that are more accurate than those produced by the OPP and the PBSR for both females and males. We also proposed faster MCMC algorithms to implement the proposed Bayesian procedures, which allow practitioners or official statisticians to produce SPE (%) in a few seconds.
As future work, we could consider integrating more administrative registers for the age groups where the PBSR was not able to produce accurate SPE (%). For instance, because the early-age population is underrepresented in the PBSR, we could explore the inclusion of a new administrative register based on the vaccine information of the under-five population compiled by the Ministry of Health in Colombia. Another interesting avenue of future research is the implementation of the proposed framework at the national level and for other subnational areas that are in conflict or difficult to access. Finally, our proposed Bayesian procedure described in Section 3.2 assigns equal weight to all data sources across the entire age-sex distribution. We could therefore explore a Bayesian procedure that incorporates additional prior information about the reliability of the SPE produced by age and sex using each administrative source, together with a sensitivity analysis.
Supplemental Material
sj-docx-1-jof-10.1177_0282423X241293749 – Supplemental material for A Bayesian Approach to Produce Subnational Population Estimates Using a Population Base Statistical Register
Supplemental material, sj-docx-1-jof-10.1177_0282423X241293749 for A Bayesian Approach to Produce Subnational Population Estimates Using a Population Base Statistical Register by Jairo Fúquene-Patiño, Andryu Mendoza, Cesar Cristancho and Mariana Ospina in Journal of Official Statistics
Footnotes
Acknowledgements
We sincerely thank the two referees, the associate editor, and the editor for their constructive feedback. Their insightful comments during the review rounds have greatly improved our paper.
Authors’ Note
The views expressed on statistical, methodological, technical, or operational issues are those of the authors and not necessarily those of UC Davis or the Colombia NSO (DANE).
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Jairo Fúquene-Patiño thanks the Campos Scholar initiative, UC Davis, for the financial support of this project.
Supplemental Material
Supplemental material for this article is available online.
Received: May 2023
Accepted: September 2024
References
