Abstract
Growing numbers of countries are including ethnoracial questions on their national censuses, spawning new scholarship on the politics of state classification and ethnoracial stratification. However, these literatures have generally not focused on how alignment or misalignment between state and popular conceptualizations of ethnoracial categories affects official measurements, including population size and ethnoracial inequality. The authors leverage a quasi-natural experiment on state-popular alignment in Mexico by drawing on three recent government surveys, which, for the first time in the nation’s history, sought to measure black identification yet defined blackness in divergent ways. The authors find that questions that define blackness in cultural terms (which misalign with popular conceptions of blackness) produce substantially smaller population estimates and considerably less black disadvantage than a noncultural (racial origins) question. This article bridges the literatures on the politics of ethnoracial classification and stratification and produces new empirical and theoretical insights into the study of ethnoracial measurement and inequality.
To recognize ethnoracial identities, assess minority populations, and address discrimination, nation-states are increasingly adding ethnoracial questions to their censuses, surveys, and administrative systems (Loveman 2014, 2021; Morning 2008). This has spawned new scholarship on how states construct or make “race” (Marx 1998) and racial categories via official classifications (Bailey, Fialho, and Loveman 2018; Kertzer and Arel 2002; Loveman 2014; Nobles 2000). However, this literature has paid little attention to the degree of alignment (level of congruity) between state and popular definitions of ethnoracial categories, and how (mis)alignment may affect statistical portraits of the nation.
Exploiting a quasi-natural experiment, we examine how well various unprecedented state measures of black identification reflect popular understandings of blackness at the national level, and the potential statistical consequences of alignment or misalignment on black population size and inequality. Important in their own right, official statistics also have broader implications, including their potential to shape public policy, identity formation, ethnoracial politics, and even the mitigation or reproduction of inequality itself.
State categories can function to sustain stratification systems (Menjívar 2023). Nevertheless, scholarship on ethnoracial stratification often does not approach the study of inequality through a constructivist lens (for exceptions, see Paredes 2018; Shiao 2019; Telles, Flores, and Urrea-Giraldo 2015; Villarreal 2014). Instead, scholars frequently default to whichever ethnoracial variables are available in data, or design questions that mirror official questions (see Howell and Emerson 2017; James 2001; Saperstein, Penner, and Light 2013; Strmic-Pawl, Brandon, and Steve 2018). This is problematic when official classification systems do not reflect social realities, especially given that public policies are based heavily on official assessments of a country’s sociodemographic composition.
Self-identification has become the global standard for the collection of government ethnoracial data. However, there is no universal method for how to ask about ethnoracial identification (Morning 2008; United Nations 2017). The ways in which states frame ethnoracial questions (e.g., via references to race, culture, or ancestry) are often rooted in historical context, national ideology, and global power relationships (Simon and Piché 2012). The United States, for example, has included a race question on its census since 1790, as racial division has been central to U.S. ideology and practice. The 2020 Census asked, “What is your race?” and provided categorical options which, according to the U.S. Census Bureau, “reflect a social definition of race recognized in this country” (U.S. Census Bureau 2022). Because of the country’s long history with official racial classification and the expectation that there is popular recognition of the term race, the question does not define race or list criteria for choosing a racial category (U.S. Census Bureau 2020). By contrast, in many Latin American countries, official ethnoracial categorization is much newer and governments often establish membership criteria.
Mexico’s National Institute of Statistics and Geography (Instituto Nacional de Estadística y Geografía [INEGI]) collected census data on the country’s afrodescendant or “black” population, whose presence dates back to the colonial era, for the first time in the 2015 Intercensal Survey (Encuesta Intercensal [EIC]) and again in the 2020 Census. Even though empirical research shows that popular understandings of blackness in Mexico are based overwhelmingly on phenotype (Flores and Sue n.d.; INEGI 2019; Resano Pérez 2015; Solís, Güémez Graniel, and Sue 2021; Sue 2023), both surveys applied cultural criteria for black classification, following the long-standing practice for data collection on Mexico’s indigenous population. Hence, a cultural conceptualization may represent misalignment between state and popular understandings of blackness, potentially explaining the surprising finding of socioeconomic advantage of black Mexicans using these data sources (e.g., Torre-Cantalapiedra and Sánchez-Soto 2019).
The Mexican case provides a rare quasi-natural experiment to assess the implications of distinct state conceptualizations of blackness. Despite its use of cultural criteria in 2015 and 2020, another INEGI survey, fielded in 2016, used an alternative question framing based on racial origins. We leverage this timing to assess differences in black population size and ethnoracial socioeconomic inequality attributable to question framing, while considering potential confounders. We estimate the contemporary Mexican black population to range from 2.3 million to 3.6 million, and its socioeconomic status (SES) from slightly advantaged to clearly disadvantaged, largely depending on question framing. To our knowledge, this is the first study to assess the implications of different state constructions on ethnoracial categorical inequality using multiple official measures of self-identification in a short span of time.
Theoretical Framework
The Politics of Ethnoracial Classification
Since the middle of the twentieth century, social scientists have increasingly understood “race” to be a social construction, compared with earlier biological connotations. The shift to constructivist thinking about race has involved an emphasis on categorization and classification (Brubaker, Loveman and Stamatov 2004; Monk 2022). Official classification systems, and censuses in particular, have been of interest to scholars as they reflect the symbolic power of the modern state (Brubaker et al. 2004); nation-states can reify, modify, and create ethnoracial categories and understandings of such categories through their censuses. Moreover, census data are generally “unrivaled in their credibility as a source of knowledge about the conditions and characteristics of national populations” (Loveman 2014:30) and provide the primary justification for instituting public policies (Loveman 2021; Menjívar 2023; Nobles 2000).
Scholars have convincingly shown that, far from being “objective” tools used to capture “objective” social realities, official categorization systems are highly political and ideological in nature (Kertzer and Arel 2002; Loveman 2014; Menjívar 2023; Mora 2014; Nobles 2000; Paschel 2016; Simon and Piché 2012). Questions over whether ethnoracial categories should be included in censuses, which categories should be included or excluded, and what criteria should be used to define categorical membership, are contested political processes. Politics and ideology shape each of these decisions, determining which categories are deemed legitimate, how they are framed, and which demarcate important social divisions. However, scholarship on the politics of ethnoracial classification has focused primarily on the process behind the development of official categorization systems (e.g., Loveman 2014; Mora 2014; Nobles 2000), with less attention paid to their statistical consequences, particularly stratification outcomes, which also have political implications.
The Measurement of Ethnoracial Stratification and Inequality
Most studies of ethnoracial stratification and inequality treat ethnoracial categories as relatively fixed, either in philosophy or practice. In contrast, a social constructivist perspective assumes that the way in which ethnoracial categories are measured can affect estimates of ethnoracial inequality. Yet even among those who view race as a social construction, general scholarly practice separates the study of racial fluidity from the study of ethnoracial inequality (James 2001; Saperstein and Penner 2012). This precludes the systematic assessment of how categorical malleability affects inequality (Howell and Emerson 2017). This is particularly problematic in settings where official measurement of a category is new and less is known about a population.
State-level processes of data production may not reflect popular understandings and self-classification practices. Therefore, it is important to investigate the meanings that individuals give to social categories and forms of difference (Flores and Sue n.d.; Monk 2022) and how those meanings align with official measures of race and ethnicity (Hitlin, Brown, and Elder 2007; Schachter, Flores, and Maghbouleh 2021). Given that state and popular conceptualizations are interdependent (see Bailey and Fialho 2018; Bailey et al. 2018; Emigh, Riley, and Ahmed 2016; Loveman 2021; Mora 2014) and often in flux, it is important to assess alignment at key theoretical junctures, such as the introduction of new ethnoracial categories, as is the case of the black category in Mexico.
Theoretically, we build on Bailey et al.’s (2018) concept of “dimension alignment,” which they use to assess the effects of Brazilian affirmative action policies on self-identification, by examining the overlap between lay identification and government classification schemes before and after policy implementation. Whereas these authors use “alignment” to understand lay identification practices, we examine the degree of government-popular alignment vis-à-vis definitions of ethnoracial categories, to capture its influence on official population size and inequality measurements. We also build on prior empirical research which has shown how estimates of ethnoracial population statistics can vary depending on question wording and the measure being used (e.g., Bailey, Loveman, and Muniz 2013; Bailey, Saperstein, and Penner 2014; Flores, Vignau-Loría, and Martínez Casas 2023; Howell and Emerson 2017; Loveman, Muniz, and Bailey 2012; Paredes 2018; Shiao 2019; Telles and PERLA 2014; Telles et al. 2015). These studies have been based largely on small-scale and unofficial surveys (see Villarreal 2014 for an exception), which are generally too small to generate reliable assessments of ethnoracial variation for smaller minority populations. Likely for this reason, the emphasis has mostly been on question wording effects on population size instead of inequality. Moreover, unofficial surveys are not ideal for addressing the role of the state in the production of ethnoracial statistics because they are not instruments of the state.
In addition to the way in which ethnoracial questions and categories are framed, various factors such as national and local ideologies, the status of an ethnoracial category, and cultural or identification revitalization can affect how people choose to classify themselves (Flores et al. 2023; Paredes 2018; Telles and Torche 2018). Consequently, these factors affect estimates of population size and, to the extent that people of different social and economic strata are differentially affected, of inequality. Lower status individuals socially viewed as black may opt out of this category to avoid stigma and discrimination, or as a strategy for upward mobility. In contrast, higher-status individuals may disproportionately opt into the category to reclaim a stigmatized label, as is the case of mulatto classification in the Dominican Republic and the black category in Brazil (Telles and Paschel 2014), as well as the indigenous category in the United States (Eschbach, Supple, and Snipp 1998) and Mexico (Telles and Torche 2018).
The Measurement of Ethnoracial Identification Across Latin American Censuses
Across the Americas, there is great variation in how ethnoracial categories are constructed, with census questions asking about ancestry, customs, identity, group membership, physical appearance, race, and language (Loveman 2014). Surrounding the turn of the twentieth century, many Latin American countries gathered “racial” statistics but, by midcentury, direct questions using racial criteria were dropped from most censuses and were replaced with measures of culture, such as language, food, and clothing (Loveman 2014). A major consequence of this shift was that many Latin American censuses stopped collecting data on afrodescendants. Only in the past few decades have most Latin American states introduced or reintroduced measures to enumerate their black populations (Loveman 2014).
States do not collect large-scale ethnoracial data very often, and the many years between government censuses preclude the isolation of question wording effects. For example, when the 1993 Colombian Census asked people if they belonged to a black community, 1.5 percent answered affirmatively, but in 2005, when asked if they considered themselves black or mulatto on the basis of cultural or physical features, 10.5 percent answered affirmatively (Paschel 2016:133). Similarly, the black population in Costa Rica increased from 2.0 percent to 7.8 percent between 2000 and 2011, possibly driven by a similar change in question wording. Because there was a significant time gap between these censuses, it is hard to identify the source of the shift: it could have been propelled by demographic change, by fluctuations in ethnoracial categorical meanings, or by changes to the census questions themselves.
We are not aware of any country other than Mexico that has used distinct measures of ethnoracial classification across multiple official surveys in a short span of time. Countries that have used multiple measures over time have sometimes bundled highly distinct criteria into a single question. For example, the 2005 Colombian Census asked, “According to your culture, people, or physical traits . . .,” making it impossible to disentangle the effects of the three distinct ethnoracial criteria invoked in the question. Whereas Flores et al. (2023) examined the effect of changes to the Mexican census question on indigenous identification between 2000 and 2010 using a survey experiment, such data are exceedingly rare. Our quasi-natural experiment is particularly relevant because of the absence of sufficiently powered experimental data on Mexico’s black population and because it is based on official data.
Regarding the relationship between ethnoracial measurement and inequality, Loveman et al. (2012) drew on a Brazilian survey to measure the effects of a binary (black/white) system versus a three-tiered system with a “mixed race” category. They found that racial inequality is greater when the binary system is used. Whereas these authors focused on categorical options, in this study we examine both categorical options and question framing. We also expand on the work of Villarreal (2014), who used the 2010 Mexican Census to examine how (proxy) indigenous self-identification versus classification on the basis of indigenous language proficiency affects the estimated size and SES of the children of indigenous parents. He found that using linguistic criteria reduced estimates of indigenous population size and produced higher levels of socioeconomic disadvantage. Our study contributes to these conversations, addresses some of their limitations, and extends them to the black population.
The Mexican Context
The Role of Mexican Ideology in Official Ethnoracial Classification
Mexico’s particular history and racial ideology have shaped its conceptualizations of ethnoracial categories and approaches to ethnoracial data collection. Following the end of the Mexican Revolution (ca. 1921), the national ideology of mestizaje emphasized the Spanish and indigenous contributions to the Mexican nation while mostly ignoring its African contribution, even though colonial Mexico was the destination for at least 200,000 enslaved Africans (Aguirre Beltrán 1944). Mestizaje glorified the country’s race mixture and proclaimed the eventual integration of Mexico’s indigenous population, resulting in a superior mestizo race, while also assuming the near disappearance of blacks (Sue 2013). Although race-based thinking undergirded the ideology, national leaders shortly after began downplaying notions of “race,” instead using language and other cultural markers to understand national diversity (Flores and Sue n.d.; Martínez Casas et al. 2014; Saldívar 2014, 2018).
The emphasis on culture and avoidance of race is reflected in the nation’s census practices. A “race” question only appeared once, on the 1921 Census, and included the categories of Indian, Mixed, White, other, and “foreigners without distinction.” The question was dropped in 1930, consistent with other Latin American censuses during that time (Loveman 2014). Officials announced that, in lieu of a race question, two additional language questions would be added to acquire “precise knowledge of the process of national integration” (Loveman 2014:227). Subsequent censuses included additional questions about “material culture,” reflecting national interests about ethnicity, development and cultural progress (Loveman 2014). The ideological centering of mestizaje and culture resulted in the symbolic and statistical erasure of Mexico’s black population (Loveman 2014; Sue 2013).
Recent Changes in Ethnoracial Classification in Mexico
The official statistical invisibility of black Mexicans ended in 2015 with the inclusion of a black self-identification question on the EIC, a major mid-decade survey used to assess population size and several other important topics. This shift occurred largely in response to pressure from international organizations, academics, government institutions, and domestic black movement organizations (Delgado 2021; Loveman 2014).
The decision of whether to count an ethnoracial population often gains prominence over the equally important decision of how to count (Simon and Piché 2012). However, as we show, decisions about “how” have important implications for ethnoracial statistics. In the case of Mexico, once the decision to introduce a black question in 2015 was made, debate surfaced regarding the formulation of the question (see Torre-Cantalapiedra 2018). Various actors including census officials, national and international experts, and federal, state, and municipal representatives of civil society organizations came together to discuss the issue (Loveman 2021; Resano Pérez 2015; Ruiz Ramírez 2014; Saldívar and Moreno Figueroa 2016). As part of these conversations, INEGI conducted extensive pilot tests of multiple versions of a black identification question. However, all of INEGI’s pilot questions used a cultural framing, despite low levels of question comprehension (Ruiz Ramírez 2014). This suggests that the cultural framing was not up for debate, reflecting the entrenched nature of ethnic or cultural understandings of difference in Mexico.
Popular Conceptions of Blackness in Mexico
Scholars have found that everyday understandings of what it means to be black in Mexico are based predominantly on phenotype (Flores and Sue n.d.; INEGI 2019; Resano Pérez 2015; Solís et al. 2021; Sue 2023), and, to a lesser degree, family origins (Flores and Sue 2023; Solís et al. 2021). The vast majority of black-identified Mexicans do not identify as such on the basis of cultural characteristics, including in regions with high concentrations of self-identified blacks (Resano Pérez 2015). Only 16 percent of black-identified respondents in INEGI pilot tests conducted in the states of Jalisco and Oaxaca reported identifying as black on the basis of their “culture” and 11 percent on the basis of their “traditions and customs.” In contrast, 53 percent reported identifying as black on the basis of their skin color or other physical characteristics (INEGI 2019). Similarly, a 2019 nationally-representative survey showed that 59 percent of black-identified Mexicans identified as such on the basis of their “physical characteristics,” compared with 45 percent for their “family origins” and only 16 percent on the basis of their “culture, history and customs” (Solís et al. 2021). 1
Nonblacks in Mexico offer similar definitions of blackness (Flores and Sue 2023; Sue 2013, 2023; Vaughn 2001), conceptualizations which can provide the basis for discrimination (Schachter et al. 2021). Based on an open-ended question about the meanings of ethnoracial categories embedded in a 2021 nationally-representative survey, Flores and Sue (n.d.) found that 64 percent of Mexicans defined blackness in “racial” terms. Of this group, a little over half referenced physical characteristics (mainly skin tone), 8 percent referenced family or African origins, and 13 percent explicitly used the term race to define blackness. This contrasts starkly with the 2 percent of respondents who defined blackness in cultural terms.
On the basis of these studies, we posit that categorical criteria that emphasize cultural aspects of blackness strongly misalign with popular conceptualizations. Although there is no direct evidence regarding popular understandings of a “racial origins” framing (as used in the Módulo de Movilidad Social Intergeneracional [MMSI], or Intergenerational Mobility Module Survey, discussed later), the very low resonance of a cultural connotation of blackness among Mexicans, coupled with a high resonance of phenotype (which could possibly translate into a “racial” understanding of blackness) and fairly high resonance of “family origins,” strongly suggests that a (noncultural) “racial origins” framing is better aligned with popular understandings of blackness compared with cultural framings.
A Quasi-Natural Experiment in State-Popular Alignment: Cultural versus Noncultural Framings
INEGI chose to define blackness in the 2015 EIC and 2020 Census in cultural terms, which is consistent with its long-standing approach to gathering data on “ethnic” groups (Saldívar 2014) and with the recommendations of some international organizations (Delgado 2021). However, the 2016 MMSI of the National Household Survey diverged from the cultural framework by framing its question in terms of “racial origins.” Although the MMSI was also administered by INEGI, the process behind the development of this ethnoracial question was distinct from that of the EIC and Census. The latter surveys were designed by the census branch of INEGI (in consultation with other state entities) while the MMSI was carried out by a different branch. Importantly, the MMSI question was designed by a Mexican social stratification scholar, modeling it after items from the Project on Ethnicity and Race in Latin America, a study of ethnoracial classification and inequality (Telles and PERLA 2014; P. Solís, MMSI principal investigator, personal communication).
We exploit this quasi-natural experiment on question wording (moving from a strong cultural framing, to a noncultural one, and back to a largely cultural framing) to help identify question wording effects on population size and ethnoracial inequality. Given that question wording varies not only because of question framing, and given the presence of other important confounders, we supplement these data with those from additional nationally-representative unofficial surveys to better isolate potential question framing impacts.
Ethnoracial Stratification in Mexico Based on Official Data and Other Evidence
Although there were no official data on black Mexicans prior to 2015, earlier evidence demonstrates that black Mexicans suffer discrimination (CONAPRED 2011a, 2011b; Cruz Carretero 1989; Sue 2009, 2013, 2023; Vaughn 2001; Velázquez and Iturralde Nieto 2016) and strongly suggests black marginalization (CONAPRED 2011a, 2011b; Cruz Carretero 1989; Flores Dávila and Lézé 2007; Velázquez and Iturralde Nieto 2016). A few scholars have conducted analyses on black inequality using the 2015 EIC, with seemingly mixed results. Torre-Cantalapiedra and Sánchez-Soto (2019) found a small black advantage in educational and occupational attainment using bivariate statistics and models controlling for other characteristics. However, black identification could be endogenous to attainment, which Villarreal and Bailey (2020) examined, also using the EIC. Leveraging state-level variation in a government campaign to promote awareness of the black identification question as an instrumental variable, they showed that a lack of black disadvantage from more conventional models treating ethnoracial identification as exogenous reverses when treating it as endogenous. Villarreal and Bailey argued that the newness of the black question may make it particularly susceptible to selection effects related to the endogeneity between SES and black self-identification.
Although Villarreal and Bailey’s (2020) analyses provide significant insight into discussions of ethnoracial inequality in Mexico, their findings could be specific to the EIC’s (or the Census’s) cultural measure of blackness. Our research thus extends this conversation by examining the degree to which surveys conducted a short time after the campaign (in particular the EIC and MMSI, fielded roughly 18 months apart) yield different estimates of ethnoracial inequality according to question wording. These analyses are further bolstered by our addition of the 2020 Census, fielded after both the EIC and MMSI and reverting to a largely cultural framing. Although we do not assume that a noncultural question is free from endogeneity, our findings suggest that endogeneity may be much less severe in estimates of black SES using a “racial origins” question because of potentially better alignment.
The Study Hypotheses
On the basis of the notion that government definitions of blackness which emphasize culture are highly misaligned with popular understandings of blackness, we test the following hypotheses:
Hypothesis 1 (H1): Questions that frame black identification in cultural terms will yield a lower percentage of people identifying as black compared with questions that use a noncultural (racial origins) framing.
Hypothesis 2 (H2): Questions that frame black identification in cultural terms will produce lower levels of black disadvantage compared with questions that use a noncultural (racial origins) framing. 2
Figure 1 further illustrates the way in which we theorize alignment with regard to the specific surveys used in our analyses.

Figure 1. Position of official survey questions on the basis of presumed degree of (mis)alignment between question framings and popular conceptions of blackness.
Data and Methods
Data Sources
As previously discussed, we use three official surveys conducted by Mexico’s INEGI between 2015 and 2020: (1) the EIC, which sampled 6.1 million households from March 2 to 27, 2015, designed to be representative nationally, for five different locality sizes, each of the 32 states, and for all municipalities (INEGI 2015b); (2) the MMSI, collected from July to December of 2016 and designed to be representative at national, state, and urban and rural levels, with one adult aged 25 to 64 years randomly selected in each sampled household, for a total of 25,500 individuals (INEGI 2016); and (3) the 2020 Census long-form survey, mainly conducted from March 2 to 27, which included 4 million households and was designed to be representative at national, state, and municipal levels, as well as for different locality sizes within each state (INEGI 2020). 3
The surveys varied in the ethnoracial question and response categories they used. In the EIC, INEGI used a cultural framing of blackness: “In accordance with your culture, history, and traditions [italics added], do you consider yourself black, meaning afromexican or afrodescendant?” If the respondent simply answered “yes,” they were assigned the response category “yes.” However, if they gave a positive response but also verbalized mixture, they were assigned the response category “yes, in part” (INEGI 2015a). Roughly 30 percent of all blacks were classified as “in part.” Because there are very small differences in the profiles of those classified as “yes” versus “yes, in part” (not shown), except for region of residence, 4 our EIC black sample includes persons in both categories. 5 The 2020 Census maintained a cultural framing of blackness, but also included ancestry as an additional criterion or restriction for membership: “Given your ancestors and in accordance with your customs and traditions [italics added], do you consider yourself afromexican, black or afrodescendant?” The response options for both questions were simply “yes” or “no.” 6 In notable contrast to the EIC and Census, the 2016 MMSI used a single question on ethnoracial identification. The MMSI adopted a “racial origins” framing and offered five mutually-exclusive options: “In our country there are people of multiple racial origins [italics added]. Do you consider yourself a . . . [read options] black or mulatto/indigenous/mestizo/white/other person?” 7 Although question wording differences extend beyond framing, extensive consideration of the questions’ various elements led us to conclude that our findings are due largely to question framing. We discuss these and other potential artifacts next.
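The coding rule described above, under which each survey's responses are collapsed into a single black/nonblack indicator, can be sketched as follows. This is a minimal illustration and not the authors' code: the survey keys and response strings are hypothetical labels chosen for readability, and the actual INEGI variable codes are not given in the text.

```python
# Hypothetical survey keys and response labels; the real INEGI codes
# are not reported in the article.
BLACK_RESPONSES = {
    "EIC2015": {"yes", "yes, in part"},   # both categories pooled, as in the text
    "CENSUS2020": {"yes"},                # cultural framing plus ancestry criterion
    "MMSI2016": {"black or mulatto"},     # one of five mutually exclusive options
}


def is_black(survey: str, response: str) -> bool:
    """Return True if the response counts as black identification in that survey."""
    if survey not in BLACK_RESPONSES:
        raise ValueError(f"unknown survey: {survey}")
    return response.strip().lower() in BLACK_RESPONSES[survey]
```

Keeping the rules in one lookup table makes it easy to rerun the analysis with a stricter definition, for example by dropping "yes, in part" from the EIC set.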
Analytical Strategy
Our basic analytic approach was to compare two quantities across surveys: (1) the share of individuals identifying as black, to assess possible impacts of question framing on black population size, and (2) the degree of black socioeconomic advantage or disadvantage relative to nonblack, nonindigenous Mexicans, to gauge the likely impact of question framing on socioeconomic inequality. We measured SES with schooling, as a measure of early-life chances and current earning potential, and with a normalized index of household amenities/assets, commonly used in developing countries to approximate household wealth (e.g., Telles and Torche 2018). 8
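To make the two comparisons concrete, the sketch below computes a survey-weighted share of black identification and a weighted black versus reference-group gap in an outcome such as years of schooling. It is an illustration under stated assumptions: the field names (`black`, `indigenous`, `weight`, `schooling`) and the toy records are hypothetical, and the functions do not reproduce the authors' estimation code or any published estimate.

```python
def weighted_share(rows, flag, weight="weight"):
    """Weighted share of respondents for whom `flag` is true."""
    total = sum(r[weight] for r in rows)
    return sum(r[weight] for r in rows if r[flag]) / total


def weighted_mean(rows, value, weight="weight"):
    """Weighted mean of `value` over the given rows."""
    total = sum(r[weight] for r in rows)
    return sum(r[weight] * r[value] for r in rows) / total


def black_gap(rows, value):
    """Black mean minus the nonblack, nonindigenous reference mean."""
    black = [r for r in rows if r["black"]]
    reference = [r for r in rows if not r["black"] and not r["indigenous"]]
    return weighted_mean(black, value) - weighted_mean(reference, value)


# Toy records, invented purely for illustration.
toy = [
    {"black": True, "indigenous": False, "schooling": 6, "weight": 2},
    {"black": False, "indigenous": False, "schooling": 10, "weight": 6},
    {"black": False, "indigenous": True, "schooling": 5, "weight": 2},
]
share = weighted_share(toy, "black")  # 2 of 10 weighted respondents
gap = black_gap(toy, "schooling")     # negative values indicate black disadvantage
```

Running the same two functions on each survey, with that survey's own black indicator, yields the cross-survey comparison the paragraph describes.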
Although the quasi-natural experimental design of INEGI fielding cultural questions in 2015 and 2020 and a noncultural question in between these measurements drastically reduces the possibility that many confounders could be affecting our results, it does not eliminate them ipso facto. In the following discussion, we provide a brief summary of all potential artifacts, confounders, and alternative explanations we considered and the outcome of these examinations, with additional details provided in the appendices. Table 1 presents each of the confounders we considered, dividing them into those unrelated versus related to the measurement of ethnoracial identification itself, and summarizing their potential and likely impact. When appropriate, we indicate the relevant appendix in which each issue is discussed more fully.
Table 1. Summary of Potential Confounders to Observed Cross-Survey Differences in Black Population Size and Inequality.
Note: EIC = Encuesta Intercensal (Intercensal Survey); INEGI = Instituto Nacional de Estadística y Geografía (National Institute of Statistics and Geography); MMSI = Módulo de Movilidad Social Intergeneracional (Intergenerational Mobility Module); SES = socioeconomic status.
Confounders Unrelated to Question Format
To reduce the possibility that the EIC and Census could be covering different populations than the MMSI, as summarized in row 1 of Table 1, we took steps to make the three samples and measures as comparable as possible. In addition to restricting all analyses to individuals 25 to 64 years of age, the range available in the MMSI, and weighting our estimates, we assessed potential differences in survey coverage by comparing the sociodemographic profile of all respondents ages 25 to 64 across surveys. We found the sociodemographic composition of the three samples to be remarkably similar, with two important exceptions. First, as hypothesized, the percentage of people identifying with a particular ethnoracial category varied considerably, especially between the EIC/Census and the MMSI (we discuss comparability between measures below). Second, we found some relatively small differences in three socioeconomic measures, which we attributed entirely to differences in question comparability; our analyses were robust to excluding these measures. Given this, we discarded differences in survey coverage as a confounder (also see Appendix A).
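As a rough illustration of the sample restriction and weighting described above, the following Python sketch computes a weighted share of respondents in one category among those ages 25 to 64. The record layout, the field names (`age`, `weight`, `category`), and the toy values are assumptions made for illustration; they are not drawn from the EIC, Census, or MMSI microdata.

```python
def weighted_share(records, category, min_age=25, max_age=64):
    """Weighted share of respondents in `category`, restricted to an age range."""
    in_range = [r for r in records if min_age <= r["age"] <= max_age]
    total = sum(r["weight"] for r in in_range)
    if total <= 0:
        raise ValueError("no weighted respondents in the age range")
    hits = sum(r["weight"] for r in in_range if r["category"] == category)
    return hits / total


# Toy records (hypothetical values, not survey data).
toy_sample = [
    {"age": 30, "weight": 2.0, "category": "black"},
    {"age": 45, "weight": 5.0, "category": "nonblack_nonindigenous"},
    {"age": 60, "weight": 3.0, "category": "indigenous"},
    {"age": 70, "weight": 4.0, "category": "black"},  # outside 25-64, excluded
]

share_black = weighted_share(toy_sample, "black")
print(f"{share_black:.2f}")
```

Applying the same age bounds and the same estimator to every survey is what makes the resulting shares comparable; the survey weights themselves would come from each survey's documentation.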
Related to composition, to avoid confounding socioeconomic differences between ethnoracial categories with sociodemographic or geographic differences in these populations, our models for schooling and household assets and amenities control for gender, age, urban or rural residence, and state-level fixed effects. Because ethnoracial differences in SES could partly be the result of processes operating within urban-rural or regional contexts, estimates controlling for these factors are perhaps conservative, assuming that misalignment between popular and state definitions of blackness is particularly likely in rural areas (see Appendix A) or specific states (see note 1).
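The adjusted comparison described above can be sketched as an ordinary least squares regression in which the coefficient on a black indicator is read as the schooling gap relative to the nonblack, nonindigenous reference group, net of controls. The sketch below is a minimal, unweighted version in plain Python. It omits the state fixed effects for brevity, and the variable names, the toy data, and the "true" coefficients are all assumptions chosen so the result can be checked. It is not the authors' estimation code.

```python
def ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical coefficients: intercept, black, indigenous, female, age, urban.
TRUE_BETA = [9.0, -1.5, -2.0, 0.3, -0.05, 1.0]

rows, outcome = [], []
for black, indigenous in ((0, 0), (1, 0), (0, 1)):
    for female in (0, 1):
        for urban in (0, 1):
            for age in (30, 45, 60):
                x = [1.0, black, indigenous, female, age, urban]
                rows.append(x)
                # Noise-free toy outcome so the fit recovers TRUE_BETA.
                outcome.append(sum(b * v for b, v in zip(TRUE_BETA, x)))

beta = ols(rows, outcome)
black_gap = beta[1]  # gap relative to the nonblack, nonindigenous reference group
```

In practice one would use a statistics package that also handles survey weights, fixed effects, and standard errors; the point here is only how the group indicator and the controls enter the same design matrix.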
As summarized in row 2 of Table 1, we also discarded differential demographic growth for black versus nonblack categories as an explanation for the observed patterns. Differential demographic growth between black and nonblack Mexicans over the period in question would have to be implausibly high to account for even a sizable minority of the difference between estimates. Likewise, the observed temporal pattern in the share of Mexicans identifying as black (highest in the 2016 MMSI, followed by the 2015 EIC, and then the 2020 Census, with all differences statistically significant) further reduces the likelihood that differential demographic growth had an important role.
In contrast to sheer demographic growth, it is quite plausible that black identification could have increased during the period, most notably due to awareness campaigns and conversations around blackness surrounding the EIC and Census (Villarreal and Bailey 2020). As noted in row 3 of Table 1, to assess potential “identification flux” we estimated changes in the share of Mexican adults 25 to 64 years of age identifying as black over time using a string of cross-sections from the LAPOP Mexico project, which asked identical questions on ethnoracial identification (using a neutral framing) in five different nationally-representative surveys fielded roughly every two years between 2010 and 2019 (for details, see LAPOP 2019). Our analyses suggest that black identification increased moderately during part of the 2010s, enough to warrant further analysis and, as described later, adjust our estimates (see Appendix A for more on this assessment).
To ameliorate the confounding effect of identification flux, we took the estimated increase in black identification during the period using LAPOP to project and back-project the share of individuals that would have identified as black with the EIC and Census questions to match the timing of the MMSI, thus correcting our population size estimates for increases in black identification (see Appendix A for details). Unlike our approach to population estimates, we did not adjust inequality estimates for the increase in black identification for two reasons. First, we found identification flux to be a less important contributor than question format, particularly for people with lower schooling levels (see Appendix A, Table A2). Second, we found very similar socioeconomic profiles of black-identified individuals in the EIC and Census (i.e., prior to and after the period of observed increase in black identification) (see “Findings”). As such, socioeconomic differences in black identification underlying estimates of inequality are most likely due to question format, not socially differentiated increases in black identification. Lack of differences in the education profile of black individuals in 2015 versus 2020 also suggests that neither coverage nor ethnoracially differentiated shifts in the educational attainment of black Mexicans for other reasons (e.g., cohort differences in schooling and differences in the estimation of years of schooling across surveys) produced the observed differences with the MMSI.
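The logic of projecting and back-projecting shares to a common reference year can be sketched as follows. This is a simplified illustration that assumes a constant absolute change per year in the share identifying as black; the actual adjustment is described in Appendix A and may differ. The function name, the annual change, and the input shares are hypothetical values, not estimates from LAPOP, the EIC, or the Census.

```python
def project_share(share, from_year, to_year, annual_change):
    """Shift a percentage share between years, assuming a constant
    absolute change per year (in percentage points)."""
    return share + annual_change * (to_year - from_year)


REFERENCE_YEAR = 2016          # timing of the survey used as the benchmark
ASSUMED_ANNUAL_CHANGE = 0.1    # hypothetical: +0.1 percentage points per year

# Hypothetical shares observed one year before and four years after the benchmark.
earlier_projected = project_share(3.0, 2015, REFERENCE_YEAR, ASSUMED_ANNUAL_CHANGE)
later_backprojected = project_share(4.0, 2020, REFERENCE_YEAR, ASSUMED_ANNUAL_CHANGE)

print(round(earlier_projected, 2), round(later_backprojected, 2))
```

When identification is rising, the earlier measurement is adjusted upward and the later one downward, so any gap that remains relative to the benchmark survey cannot be attributed to the trend alone.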
Differences in Question Format Other Than Question Framing
The measurement of ethnoracial identification differed across surveys in ways that could be due to question framing and/or non-framing-related aspects of question format. To explore this, we began by creating three comparable categories in each survey: nonblack nonindigenous, black, and indigenous. 9 We also listwise-deleted all “don’t know” and did-not-respond options in an attempt to be consistent with the 2020 Census, which did not include this option.
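A minimal sketch of this harmonization step, for a single-question format like the MMSI, might look like the following. The response codes, the mapping of "other" to the nonblack, nonindigenous category, and the helper name are assumptions made for illustration; the handling of respondents who answer yes to both the black and indigenous questions in the EIC and Census is not covered here.

```python
# Hypothetical response codes for a single multiple-choice question.
MMSI_TO_COMMON = {
    "black_or_mulatto": "black",
    "indigenous": "indigenous",
    "mestizo": "nonblack_nonindigenous",
    "white": "nonblack_nonindigenous",
    "other": "nonblack_nonindigenous",  # assumption, not confirmed by the source
}
MISSING = {"dont_know", "no_response"}


def harmonize(responses):
    """Collapse raw responses into three categories, dropping missing answers."""
    kept = []
    for response in responses:
        if response in MISSING:
            continue  # listwise deletion of "don't know" and nonresponse
        kept.append(MMSI_TO_COMMON[response])  # KeyError flags an unexpected code
    return kept


raw = ["black_or_mulatto", "mestizo", "dont_know", "white",
       "indigenous", "no_response", "other"]
clean = harmonize(raw)
```

Raising an error on unrecognized codes, rather than silently dropping them, keeps the deletion rule limited to the two missing-data categories named in the text.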
Next, as summarized in the bottom half of Table 1, we considered a series of confounders related to question format. First is the issue of self versus proxy responses (see row 4). In the MMSI, all questions were asked only of, and are only about, the informant. In contrast, consistent with census practices, the EIC and Census asked a main informant about the ethnoracial identification of every household member, including themselves. Although we were able to restrict analyses to the main informant in the EIC, the public-release census microdata did not provide a main informant identifier. Instead, INEGI provided us with basic tables comparing the percentage of Mexican adults ages 25 to 64 years identifying as black between the main informant and proxy responses. These tables showed no differences in identification overall, or by important sociodemographic categories.
Next, we considered a series of differences in question format across surveys, other than framing (see rows 5 and 6 of Table 1 and Appendix B). In particular, response option formats in the EIC and Census—separate yes-or-no questions that allow both black and indigenous identification versus one single-response multiple-choice question for the MMSI—could be confounding our interpretation of the results. However, on the basis of analyses of other ancillary data and the literature, we discarded these as potential artifacts because, if anything, any impact would have amplified our estimates of cross-survey differences and thus further strengthened our argument about alignment. Similarly, although the EIC/Census and the MMSI also exhibited differences in some of the ethnoracial terms used to refer to black individuals (negro, afrodescendiente, and afromexicano in the EIC/Census versus negro and mulato in the MMSI), our examination of the literature suggested that negro, which is present in all three surveys, is the most recognized term by far, and one generally tied to noncultural conceptions of blackness.
Findings
Alignment Effects on Black Population Size
The unshaded bars in Figure 2A show large and significant differences in black identification across the three surveys: 1.9 percent of Mexicans ages 25 to 64 years self-identified as black with the 2015 EIC cultural “yes,” “yes, in part,” or “no” question, compared with 2.2 percent for the 2020 Census culture and ancestry yes-or-no question and 2.9 percent for the 2016 MMSI racial origins question with multiple, mutually exclusive response options (all differences statistically significant at p < .001). Extrapolated to the entire Mexican population, these percentages amount to 2.3 million people in the 2015 EIC, 2.7 million in the 2020 Census, and 3.6 million in the 2016 MMSI.

Figure 2. Actual and projected estimates of the percentage of Mexican adults (25–64 years of age) who self-identify as black by question framing: (A) actual estimates and (B) estimates projected to 2016.
As previously discussed, these figures need to be adjusted for the rise in black identification during the period, projecting and back-projecting the shares of individuals identifying as black in the 2015 EIC and 2020 Census to match the timing of the 2016 MMSI. These projections, shown in Figure 2B, are still indicative of large question wording effects. The adjusted EIC and Census estimates are 2.2 percent and 1.9 percent, respectively, both substantially and statistically significantly different from the 2.9 percent in the MMSI. Using only the contribution attributable to question framing (see Appendix A), we estimate that between 506,382 and 848,592 additional Mexicans would have self-identified as black in the EIC if asked the racial origins instead of culture question. 10 Likewise, we estimate that an additional 1,148,929 to 1,550,953 Mexicans would have identified as black using a racial origins question, instead of a culture and ancestry question in the 2020 Census. This is a substantial number and is consistent with Hypothesis 1. Also consistent with our hypotheses (see note 2), adding an ancestry restriction to the cultural question in 2020 reduced the percent black by approximately a third in relative terms, or 166,825 people.
Alignment Effects and Ethnoracial Inequality
We present estimates of ethnoracial differences from ordinary least squares models for schooling in Figure 3 and for the household assets and amenities index in Figure 4, adjusting for all controls previously discussed and listed at the bottom of the figures (see Appendix C for full models). Consistent with Hypothesis 2, we found stark differences in the SES of those identifying as black according to question framing. Relative to nonblack, nonindigenous individuals, the cultural question in the EIC and the culture and ancestry question in the Census produced moderate black advantages in schooling (0.4–0.45 years; see Figure 3) and slight advantages in household wealth (0.09–0.13 standard deviations; Figure 4). In sharp contrast, the racial origins question resulted in black disadvantage in both outcomes (−1.6 years and −0.2 standard deviations; see Figures 3 and 4, respectively). The differences between the coefficients for the racial origins and cultural framings are 2 years for schooling and 0.3 standard deviations for household wealth, which we deem large and moderate, respectively. 11
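The cross-framing gaps reported above follow directly from subtracting the racial origins coefficient from each end of the cultural-framing range. The short sketch below reproduces that arithmetic using only the rounded coefficients quoted in the text; the function and variable names are illustrative.

```python
def gap_range(cultural_range, racial_origins_value):
    """Difference between cultural-framing and racial-origins coefficients,
    computed at both ends of the cultural-framing range."""
    return tuple(round(c - racial_origins_value, 2) for c in cultural_range)


# Rounded coefficients as quoted in the text.
schooling_gap = gap_range((0.40, 0.45), -1.6)   # years of schooling
wealth_gap = gap_range((0.09, 0.13), -0.2)      # standard deviations

print(schooling_gap, wealth_gap)
```

The results, roughly 2.0 to 2.05 years and 0.29 to 0.33 standard deviations, match the approximately 2 years and 0.3 standard deviations reported in the text.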

Figure 3. Predicted difference in years of schooling of black population relative to the nonindigenous, nonblack population by question framing.

Figure 4. Predicted difference in household amenities (z score) of black population relative to the nonindigenous, nonblack population by question framing.
Finally, we found that the black socioeconomic advantage observed in the culture and ancestry Census question is smaller than the black advantage observed in the purely cultural question in the EIC. Although this is inconsistent with Hypothesis 2 (note 2), these differences are statistically significant only for the household asset index (p < .005) and not for educational attainment (p > .10).
Discussion
Leveraging variation in definitions of blackness across three Mexican surveys fielded over a short time span, we illustrate how states’ framing of ethnoracial questions can substantially affect official population size and inequality estimates. Our empirical findings are consistent with some prior research, particularly on how question wording influences population size estimates. Moreover, our analytic framework resonates with a recent focus on state-popular alignment (e.g., Bailey et al. 2018; Flores et al. 2023; Villarreal 2014). However, in this article we centralize and directly theorize the potential significance of state-popular alignment on population size and inequality estimates for black Mexicans, a newly enumerated population. Methodologically, we consider numerous possible confounders, assessing the contributions of question framing versus key artifacts, and discuss multiple statistical consequences of states’ conceptualizations of ethnoracial categories.
Prior research suggests a racial origins framing would better align with current understandings of blackness (which emphasize phenotype and family origins) compared with cultural framings. Consistent with this, the racial origins question not only produced a larger black population than cultural framings (Hypothesis 1), but also a radically different socioeconomic portrait (Hypothesis 2). Both the EIC and Census cultural questions estimate blacks to have a slight socioeconomic advantage relative to nonindigenous, nonblack Mexicans. In sharp contrast, the MMSI’s racial origins framing produces a clear and sizable black socioeconomic disadvantage.
Given ample evidence of antiblack discrimination in Mexico (CONAPRED 2011a, 2011b; Cruz Carretero 1989; Sue 2013, 2023; Vaughn 2001; Velázquez and Iturralde Nieto 2016), the EIC and Census findings of black socioeconomic advantage are surprising. Our analyses reveal that questions on black identification with cultural formats “select out” persons of lower SES (see also Villarreal and Bailey 2020). Although socioeconomically-advantaged individuals seem to identify as black at similar rates using either cultural or racial origins formats (results not shown), socioeconomically-disadvantaged individuals were more likely to “opt out” of the EIC and Census cultural questions, but “opt into” the MMSI’s racial origins question. As a result, the lower SES selectivity of the MMSI also yields a larger black population estimate.
Although ancestry is likely better aligned with popular conceptions of blackness than culture (see Flores and Sue n.d.; Solís et al. 2021; but note that these studies discuss “family origins,” which may or may not hold the same connotation as “ancestry” at the popular level), adding ancestry to cultural criteria, as in the 2020 Census, further narrows the definition of blackness. Therefore, compared with the exclusively cultural framing in the EIC, we anticipated that a culture and ancestry frame in the Census would produce a slightly lower population size and slightly less black disadvantage. While we saw slightly higher black identification in the Census measure relative to the EIC, given the rise in black identification in that time period (estimated with a separate battery of questions with consistent wording), the census category should have been even larger than its observed value. As such, the addition of ancestry as a restriction very likely contributed negatively to the difference with the EIC (see Table A2 in Appendix A), consistent with our expectations. Likewise, our findings suggest that this additional criterion hardly affects black SES, with only slightly lower SES compared with a more purely cultural framing. Our findings could indicate either that, as long as a misaligned concept is present, additional criteria do not have much of an effect, or that the combination of a more aligned concept with a less aligned one may average out to an intermediate degree of alignment between the two.
Regardless, lower-educated individuals were less likely to self-identify as black in the Census or EIC compared with the MMSI. These findings build on work by Villarreal and Bailey (2020), who found that black identification in the EIC is endogenous to men’s earnings potential. Their findings suggest that black identification (using cultural questions) may disproportionately appeal to higher SES individuals and/or have little resonance among lower-SES individuals. Our findings further clarify that the endogeneity identified by Villarreal and Bailey is more likely of the latter variety (i.e., affecting lower-educated persons). This could be at least partly driven by cultural framings, and apply to men and women: black identification vis-à-vis racial origins appears to be more inclusive of individuals across gender (not shown) and socioeconomic strata. Moreover, we find a much better fit of noncultural over cultural framings for measuring the black population and assessing black disadvantage in the Mexican context (see Saldívar 2014; Torre-Cantalapiedra 2018).
Popular understandings of ethnoracial categories (and thus the degree of alignment with state categories) are not always consistent across all sectors of a population. These inconsistencies are likely explained, in part, by differential levels of exposure to state ethnoracial discourse and ideology. For example, a question from the 2021 Race Ethnicity and Migration survey (Flores and Sue n.d.) (see Appendix B for a description) asked respondents if they had encountered any of the consciousness-raising campaigns surrounding the new EIC and census black identification questions. Only 15 percent of respondents reported receiving this messaging and they were disproportionately higher-educated individuals. Especially with the emergence of new ethnoracial ideologies and/or categories in Latin America, higher-educated individuals, who are likely more aware of state discourses, may align their identification accordingly. If there is a stronger degree of alignment between state conceptualizations and conceptions of blackness held by individuals with higher SES, compared with those in less privileged positions, this is likely to result in an underestimation of disadvantage.
Relatedly, although there are important regional differences in popular conceptions of blackness, these differences should not affect our general conclusions. Although cultural understandings of blackness appear to be slightly more common at the popular level in the southern and Gulf regions than in other parts of the country, racialized understandings of blackness are still more relevant than cultural ones in these regions (see note 1). Furthermore, our analyses by region (not shown) reveal there is a larger difference in black population size and inequality estimates between the cultural questions in the EIC and Census and the racial origins question in the MMSI in the southern and Gulf regions.
Future research could further disentangle local and regional variation in alignment and inequality and explore the relative influence of various aspects of question wording (including response option formats and terminology for ethnoracial groups), extending our alignment theory to other aspects of ethnoracial measurement. Moreover, while we relied on available data, more in-depth data are needed to understand how people conceptualize ethnoracial categories and how these conceptualizations interface with ethnoracial survey questions. Additionally, more systematic data are needed to assess the degree to which understandings of ethnoracial categories may vary by SES, gender, region, and other relevant categories. Finally, although beyond the scope of this paper, our findings on the consequences of (mis)alignment may have implications for research on other social categorical systems, especially those which are also undergoing change in official classification systems, such as gender orientation (see Westbrook and Saperstein 2015). We hope that future research will explore these issues.
Conclusions
Nation-states across the globe are increasingly opting to collect ethnoracial data. These measures provide an unprecedented glimpse into previously invisible categories and populations, informing governments and their citizens about the size and status of newly enumerated groups. The statistical portraits arising from these new data have significant social and political consequences: they can inform public policy, be leveraged by social movements and social justice advocacy groups in the fight for minority rights, or they can be coopted by political groups to squelch claims of disadvantage and the need for redistributive policies. In addition to recent data collection trends, many societies are experiencing significant ethnoracial fluidity, including countries with historically rigid ethnoracial classification systems, such as the United States (Saperstein and Penner 2012). The confluence of these trends makes the theoretical and empirical examination of how the degree of alignment between popular and state definitions of ethnoracial categories affects ethnoracial statistics both important and timely.
Scholarship on the politics of ethnoracial classification has placed much emphasis on the development of ethnoracial classification systems and their political consequences. However, there has been little regard for the statistical implications of state definitions of ethnoracial categories, especially in terms of inequality. Likewise, stratification scholarship, particularly in the United States, has tended to treat ethnoracial data as “objective” or fixed, rarely exploring how ethnoracial categorical constructions can operate as mechanisms in the production of inequality. In this research, we examined how the “constructed nature of state-created categories . . . and their misalignments provides a powerful lens” for understanding the creation and reproduction of inequality (Menjívar 2023).
We used the Mexican case to show how state definitions of race, and their degree of alignment with popular conceptions of ethnoracial categories, are important mechanisms driving statistical constructions of population size and ethnoracial inequality. Our analyses revealed that measuring blackness in Mexico may require a noncultural framework to align with current popular understandings of blackness. A racial origins frame yielded a larger estimate of Mexico’s black population and portrayed the population as highly disadvantaged, whereas a cultural frame returned a smaller population size estimate and portrayed blacks as socioeconomically equal or even better off than the Mexican majority.
Although recent literature emphasizes how stratification outcomes can vary across measure type or dimension (Bailey et al. 2013, 2014; Paredes 2018; Telles et al. 2015; Villarreal 2014), we have shown how, even within self-identification—a single measure and the likely standard for official ethnoracial measurement for the foreseeable future—stratification outcomes can vary widely (see also Shiao 2019). In the case of Mexico, although our analysis clearly indicates that the use of a misaligned cultural concept is problematic, a question for future research would be what self-identification measure(s) best aligns with popular conceptions of blackness (e.g., phenotype, ancestry, family origins, or other classification “cues”; see Monk 2022 and Schachter et al. 2021). Finally, it is important to note that we are not suggesting that a higher degree of alignment will always produce estimates that most accurately reflect lived realities, as contextual nuances shape interactions between state and societal categorization systems. The extent to which this is the case is a question for future research. Regardless, we believe that scholars of inequality and policymakers should be more attuned to potential effects of alignment when interpreting and disseminating ethnoracial statistics.
Advocates of previously unrecognized (and often marginalized) populations, such as the Middle Eastern and North African category in the United States, often center their demands on recognition on national censuses. However, our findings highlight how the inclusion of a category is just a first step and may not translate into equality (Menjívar 2023). Saldívar (2014, 2018) warned that emphases on culture and origins can hinder understandings of racism and a focus on racial injustice, a warning not limited to the Mexican case. In Brazil and Colombia, state conceptualizations of blackness as cultural conflict with social movement emphases on racial inequality (Paschel 2016).
The importance of aligning official and popular conceptualization schemes for capturing social hierarchies can be seen in other contexts. In the United States, the misalignment between state framings and popular identification and categorization practices is clearly seen in the case of Hispanics. The U.S. census uses two separate questions to capture data on race and ethnicity: a Hispanic origin question and a question on race (without a Hispanic category). Research has shown that the way Hispanics often think of their identities is at odds with this ethnoracial classification system (Hitlin et al. 2007; James 2001; Telles 2018). Instead, as the Census Bureau’s own research shows, a single race question with a Hispanic category better aligns with how Hispanics prefer to identify (Telles 2018). Moreover, it has been argued that a single ethnoracial “pentagon” paradigm—White, Black, Hispanic, Asian, and Native American—most closely aligns with everyday practices of categorization and social treatment (Hollinger 2006) and has been shown to best capture inequality (Howell and Emerson 2017).
Our findings also provide a cautionary tale for the global diffusion of ethnoracial discourse and policies (FitzGerald and Cook-Martín 2014; Loveman 2014; Telles and PERLA 2014). This diffusion process is clearly implicated in the politics and decisions surrounding the conceptualization of blackness in Mexico. Mexico adopted recommendations by international organizations regarding the measurement of the black population (Delgado 2021), despite pilot data from its own census bureau showing a cultural question caused a great deal of confusion (INEGI 2019). This reveals how internationally recommended models and measurements of race and ethnicity can be at odds with popular understandings in particular national contexts and lead to unintended and even pernicious consequences. State decisions about which ethnoracial models to adopt can profoundly affect official estimates of population size and inequality. These statistics, in turn, inform and shape state priorities and policies to remedy discrimination and inequality and can determine the rightful beneficiaries of state resources and support.
Supplemental Material
Supplemental material, sj-docx-1-srd-10.1177_23780231231217821 for Black Disadvantage or Advantage? Misalignment between State and Popular Understandings of Blackness in Mexico by Christina A. Sue, Fernando Riosmena and Edward Telles in Socius
Footnotes
Acknowledgements
We thank David Cook-Martín, Juan Delgado, René Flores, Andrés Villarreal, and the anonymous reviewers for their comments on prior versions of this article. This research benefited from administrative, research, and/or computing support through the University of Colorado Population Center (CUPC) funded by Eunice Kennedy Shriver National Institute of Child Health & Human Development of the National Institutes of Health (P2CHD066613).
Correction (June 2024):
This article has been updated to correct Appendix citations in Table 1 and on pg. 9.
