Networks Beyond Categories: A Computational Approach to Examining Gender Homophily

Abstract

Social networks literature has explored homophily, the tendency to associate with similar others, as a critical boundary-making process contributing to segregated networks along the lines of identities. Yet, social network research generally conceptualizes identities as sociodemographic categories and seldom considers the inherently continuous and heterogeneous nature of differences. Drawing upon the infracategorical model of inequality, this study demonstrates that a computational approach – combining machine learning and exponential random graph models (ERGMs) – can capture the role of categorical conformity in network structures. Through a case study of gender segregation in friendships, this study presents a workflow for developing a machine-learning-based gender conformity measure and applying it to guide the social network analysis of cultural matching. Results show that adolescents with similar gender conformity are more likely to form friendships, net of homophily based on categorical gender and other controls, and homophily by gender conformity mediates homophily by categorical gender. The study concludes by discussing the limitations of this computational approach and its unique strengths in enhancing theories on categories, boundaries, and stratification.

Keywords

Machine learning, social networks, segregation, ERGM, categories

The investigation of social networks and boundaries has a long-standing history of exploring the reasons behind the segregation of individuals based on socially constructed identities, including gender, race, ethnicity, education, and occupation. The extant literature has demonstrated that homophily, which refers to the propensity to associate with similar others, is one of the most pervasive and robust mechanisms (Jackson, Rogers, and Zenou 2017; McPherson, Smith-Lovin, and Cook 2001). Empirical analyses predominantly revealed that individuals utilize various sociodemographic categories to evaluate similarity and structure their network connections.

The emphasis on shared sociodemographic categories in social network research can be attributed, at least in part, to the accessibility of categorical data and the enduring interest among social scientists in categorical distinctions (Monk 2022). As Tilly (2005, 72) points out, categorical difference is fundamental to human nature and inequality: “[l]arge, significant inequalities among human beings correspond mainly to categorical differences such as black/white, male/female, citizen/foreigner, or Muslim/Jew rather than to individual differences in attributes or propensities.” However, in recent years, a growing body of scholarship has emerged, advocating for an alternative approach to the study of differences. For instance, Monk (2022, 20) suggests shifting our focus “from membership in categories to the cues of categories (whether in the signification of categories along continua or their role in feature-based stereotyping beyond explicit categorization, perceived categorical typicality, and membership in subcategories).” Applying Monk's theory to social network research raises an important yet underexplored question: to what extent do cues of categories contribute to perceived similarity and shape network structures, above and beyond the inclination of individuals to segregate themselves based on sociodemographic categories?

This article examines how categorical cues shape social networks by focusing on categorical conformity, a key instance of cues of categories and identity performance. To estimate homophily based on categorical conformity, this study introduces a novel analytical framework that combines two state-of-the-art computational methods – formal analyses of culture and exponential random graph models (ERGMs). On the one hand, formal analyses of culture suggest that computational methods, such as machine learning, can yield a useful measurement for categorical conformity with survey-based techniques (Foster 2018; Gondal 2022; Mohr et al. 2020; So and Roland 2020; So, Long, and Zhu 2019). On the other hand, ERGMs are robust statistical network models capable of distinguishing categorical conformity from other factors influencing observed network structures (Snijders 2011). This framework contributes to theories on categories and boundaries by operationalizing Monk's infracategorical model of inequality within a network analysis context. It also extends social network research by using machine learning to guide ERGMs. Substantively, the analysis contributes to a long tradition of research on “cultural matching,” which explores how cultural practices and patterns shape social networks (Edelmann and Vaisey 2014; Lizardo 2006; Rivera 2012; Vaisey and Lizardo 2010; Wimmer and Lewis 2010).

To illustrate this computational approach, I examine gender homophily in adolescent friendships, using data from the National Longitudinal Study of Adolescent to Adult Health (Add Health). Gender homophily, the tendency for individuals to form ties with others of the same gender, provides a pertinent example of how categorization and cues of categories are embedded in networks. While the prevalence of gender homophily is well-established, the theory of gender performance suggests that individuals who enact gender in similar ways may also form connections (Bettie 2003; Pascoe 2012). However, this argument is often assessed qualitatively. To provide quantitative evidence, this study integrates a machine-learning-based gender conformity measure in ERGMs, empirically exemplifying the methodological state of the art. While the focus is on the case of gender and networks, this computational approach can be adapted to investigate various cues of categories and their impacts on intergroup boundaries, which ultimately contribute to inequality and stratification.

The remainder of this article is structured as follows. First, this article reviews social network research, focusing on homophily effects in statistical network models and recent findings on cultural matching. Next, this article discusses an infracategorical model of inequality developed by Monk (2022) and how machine learning can aid in quantifying gender conformity. Following an introduction to the Add Health sample and analytic strategy, this article presents results from a machine-learning-based analysis of gender conformity and a social network analysis of friendships incorporating categorical gender and gender conformity via ERGMs. In conclusion, this article discusses the substantive findings, theoretical and methodological implications for social network analysis, limitations of the study, and potential directions for future research that investigates cues of categories and boundaries through computational operationalization.

Homophily, Categories, and Cultural Matching

For a considerable period, sociologists have endeavored to elucidate the underlying factors contributing to the segregation of social networks along the lines of diverse social identities (Blau and Schwartz 1984; Fischer 1982; Hofstra et al. 2017; Newman 2003). Among the numerous proposed explanations, the “homophily principle” posits that the segregation of networks results from individuals’ propensity to associate with others who share similar characteristics. Empirical evidence demonstrates that this affinity extends across various dimensions, encompassing sociodemographic identities, attitudes, and behaviors (McPherson, Smith-Lovin, and Cook 2001).

There are at least two types of homophily (Kossinets and Watts 2009). The first type, referred to as “induced homophily” or “shared foci,” pertains to the formation of relationships influenced by common external factors, such as working in the same location or participating in similar clubs (Feld 1981, 1982). The second type is “choice homophily,” which stems from personal preferences. In real-world contexts, it can be difficult to distinguish between the effects of induced and choice homophily due to the complex interplay between the two: individual preferences can contribute to the sorting of individuals into shared environments, while these shared environments can further reinforce those preferences.

The statistical modeling of social networks, exemplified by ERGMs, offers a robust methodology capable of distinguishing choice homophily from induced homophily and other mechanisms (Robins et al. 2007; Snijders 2011; Snijders, van de Bunt, and Steglich 2010). By employing these modeling techniques, researchers can incorporate homophily parameters based on shared foci or sociodemographic identities and subsequently execute algorithms to determine which parameter is statistically significant while controlling for the others. Furthermore, these modeling techniques can account for higher-order contextual network processes, such as triadic closure (i.e., the principle of “a friend of a friend is a friend”) (Goodreau, Kitts, and Morris 2009).
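For readers less familiar with ERGMs, the general form of the model can be written as below. The notation is the conventional one from the ERGM literature rather than notation taken from this article: $y$ is the observed network, $g(y)$ is a vector of network statistics (for example, an edge count, homophily terms, and triadic closure terms), $\theta$ is the coefficient vector, $\kappa(\theta)$ is the normalizing constant, and $y^{+}_{ij}$ and $y^{-}_{ij}$ denote the network with the tie between $i$ and $j$ present and absent, respectively.

```latex
\[
P(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
\]
\[
\mathrm{logit}\, P(Y_{ij} = 1 \mid y_{-ij}, \theta) = \theta^{\top} \delta_{ij}(y),
\qquad
\delta_{ij}(y) = g(y^{+}_{ij}) - g(y^{-}_{ij})
\]
```

Under this form, each coefficient can be read as the change in the conditional log-odds of a tie associated with a one-unit change in the corresponding statistic, holding the rest of the network fixed. This is what allows one homophily parameter to be assessed while controlling for the others.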

The most prevalent approach to estimating homophily effects is to use categorical attributes, such as racial or gender categories. In practical terms, the majority of network data is collected via survey-based methodologies, wherein respondents are customarily asked to identify the social categories to which they or their acquaintances belong (Goodreau, Kitts, and Morris 2009; Hofstra et al. 2017; McPherson, Smith-Lovin, and Cook 2001; Smith, McPherson, and Smith-Lovin 2014). However, conceptualizing identities as demographic categories brackets the extent to which people perform gender or racial identity in ways that fit normative expectations regarding these identities. In other words, this measurement approach ignores variation within social groups and fails to theorize the role of the cultural meanings surrounding identity in network structures (McLean 2017).

Conversely, a growing body of research demonstrates that cultural meanings can often precede the formation of networks and social interactions (Edelmann and Vaisey 2014; Gondal and McLean 2013; Lewis and Kaufman 2018; Lizardo 2006; Mohr et al. 2020; Pachucki and Breiger 2010; Rivera 2012; Vaisey and Lizardo 2010; Wimmer and Lewis 2010). Drawing upon Bourdieu's ([1984]) theory, this line of inquiry focuses on how tastes, a form of social class performance, shape network structures and frequently operationalizes culture as continuous characteristics rather than discrete categories. For example, Lizardo (2006) employed unidimensional scaling techniques to quantify tastes and discovered that the tendency to consume highbrow or popular culture is associated with an increased number of strong or weak ties. Also, some studies incorporate behavioral or attitudinal attributes, such as music consumption, church attendance frequency, or anti-immigrant sentiments, that are based on surveys or observations to examine homophily by shared culture, or “cultural matching” (Cheadle and Schwadel 2012; Lewis and Kaufman 2018; Oosterhoff, Poppler, and Palmer 2022). However, while these studies provide evidence supporting the causal role of culture in networks, they do not propose a general framework to theorize homophily beyond sociodemographic categories. Recently, a growing body of research has proposed a new theoretical framework, the infracategorical model of inequality, which shifts the focus from categories to the cues that signal categorical membership (Monk 2022). Below, I discuss how Monk's (2022) infracategorical model of inequality can be extended to homophily research and social network analysis.

An Infracategorical Model of Inequality in Social Network Analysis

In his review of quantitative studies on inequality and stratification, Monk (2022) posits that a prevalent approach to examining inequalities is what he terms the “standard model of inequality.” This theoretical model, like conventional homophily research, emphasizes the institutionalization of categorical distinctions in shaping social interactions and life outcomes. Despite its widespread use in inequality research, Monk critiques this model for its tendency to “bracket out the fundamentally continuous, gradational, and subcategorical nature of categorization” (Monk 2022, 5). As an alternative, Monk (2022, 3) posits a novel theoretical model, termed the “infracategorical model of inequality” (ICMI), which seeks to “disaggregate difference by shifting the focus from membership in categories to the cues of categories, subcategories, and perceived typicality.” In other words, “what matters are not the superordinate categories themselves, but the various properties (or cues) associated with categories (i.e., intensions) and the mental representations, abstract ideas (i.e., concepts), and stereotypes triggered by these properties” (Monk 2022, 14).

There are three potential approaches for quantifying categorical cues by utilizing ICMI in social network analysis. The first approach is to examine identification via a continuous scale. For instance, Leszczensky and Pink (2019) used a mean index derived from survey items to gauge the strength of students’ attachment to their ethnic group (e.g., “I feel strongly attached to Germans/people from my family's country of origin”). They subsequently examined how ego's and alter's ethnic identification moderates ethnic homophily by ethnic categories. Within the context of gender research, some sociologists have developed survey questions to measure individuals’ gender identification by requesting that they self-identify on a pair of femininity and masculinity scales (Hart et al. 2019; Magliozzi, Saperstein, and Westbrook 2016). Although no current homophily research employs such gender identification measures, it is conceivable that a more robust gender identification could augment the perceived similarity of categorical gender.

The second approach is perceived typicality. As Monk (2022, 9) posits, typicality is an important basis for differentiation because individuals within the same categories “vary in how typical or atypical they are in terms of the characteristics they possess that were used to categorize them.” Perceived typicality is frequently assessed by others, or in surveys, by interviewers. An example of this is perceived skin tone. In social network analysis, Villalta (2022) utilized an ordinal skin color scale in ERGMs and found skin color homophily net of homophily by racial categories and other controls. Measures of perceived gender typicality have also been developed by psychologists (Deaux and Lewis 1984; Green, Ashmore, and Manzi, Jr 2005), though they have not yet been commonly applied in network studies.

The third approach is identity performance. Like perceived typicality, identity performance considers the heterogeneity in performing categories. However, unlike perceived typicality, which measures typicality by how others perceive the ego's categorical membership, identity performance centers on the embodied behaviors and attitudes individuals present in their daily lives. Gender performance serves as an example (West and Zimmerman 1987). Several ethnographic studies have indicated that gender performance can influence the formation of social ties. For example, Pascoe (2012) observed that in a high school setting, a boy who exhibits emotional sensitivity, non-competitiveness, physical weakness, or an inability or unwillingness to assert dominance over girls may experience social isolation from his male peers.

This study assesses an important instance of identity performance, categorical conformity. According to Pascoe's ethnographic observations, what shapes the formation of social ties is not gender performance itself but whether people's gender performance conforms to gender categories. This is also one instance of the core categorical cues. As Monk (2022, 16) argues, “[r]ethinking categorical conformity and nonconformity (gender and ethnoracial) as forms of typicality (e.g., prototypicality and atypicality) is important to comprehensively render how difference relates to inequality.” Yet, unlike gender identification or perceived gender typicality measures that might be available in quantitative formats, gender performance is often evaluated in qualitative research because it is revealed in everyday interactions. One alternative method is to develop a proxy measure of gender performance, which uses self-reported attitudes or behaviors where the responses show gender differences in survey data. Although it does not measure the actual behavior, it presents a subjective sense of behaviors (i.e., people consider how they present themselves) and is also aligned with the underlying conformity to gender categories. Next, I will discuss how machine learning can be used to measure gender conformity.

Machine-Learning-Based Gender Conformity

As a quantitative method, machine learning has been increasingly applied to elucidate the encoding and representation of cultural meaning within social and textual data, blurring the boundaries between qualitative and quantitative research. As Nelson (2020, 101539) argues, “[w]hile machine learning is currently being developed to augment inferential statistics, the mathematical assumptions of machine learning—both unsupervised and supervised approaches—are, I claim, better equipped for use in the type of inductive, exploratory, and contextual research traditionally conducted using qualitative methods.” Various machine learning methods, such as word embedding, have been employed to identify and analyze cultural objects (Arseniev-Koehler and Foster 2022; Nelson 2021). Besides their application to textual data (Arseniev-Koehler and Foster 2022; Knight 2022; Kozlowski, Taddy, and Evans 2019; Nelson 2021; Voyer et al. 2022; Zhou 2022), machine learning methods have also been extensively applied to survey data (Boutyline and Vaisey 2017; Brensinger and Sotoudeh 2022; DellaPosta 2020; Goldberg 2011).

Supervised machine learning techniques are instrumental in assessing gender conformity from self-reported attitudes or behaviors. These techniques encompass a range of computational models designed to identify patterns in labeled data inductively, facilitating tasks such as classification. One example is to train a unidimensional machine learning algorithm with writing components, termed textual features, and author labels to detect which novels were written by white and black authors (Long and So 2015; So and Roland 2020; So, Long, and Zhu 2019). Parallel to the distinctive writing patterns related to racial differences, these techniques have the potential to learn gender conformity by detecting patterns in attitudinal and behavioral survey responses that show gender differences.

However, two theoretical and methodological issues should be addressed when applying supervised machine learning to measure gender conformity. First, what aspects of gender does machine learning measure? As So and Roland (2020, 63) suggest, a machine is “a relational, not ontological, thinker.” Consequently, machine learning models do not directly learn gender itself; rather, they serve to measure the extent to which individuals conform to normative expectations associated with gender categories. This makes machine learning a valuable tool for measuring gender conformity, as gender conformity refers to the degree to which individuals’ behaviors or attitudes align with socially prescribed gender norms and categories.

Second, what outcomes of machine learning should be used? Although machine learning often starts with nominal categories, So and Roland (2020, 63) contend that “native to the machine itself are methods to test the integrity of those initial categories and to explore their potential contingency or tenuousness.” This contingency and tenuousness can be quantified as a predicted probability. For example, So and Roland found that while most white and black authors are distinguishable, some works by white authors have a lower predicted probability of white authorship, while some works by black authors have a higher probability of white authorship. Instead of dismissing these deviations as errors or mistakes in conventional classification tasks, So and Roland analyzed these anomalies and found them to be informative, as these deviations reflect the inherent contingency and negotiation of socially constructed racial categories in writing patterns.

In the context of these extensions, the analysis of gender conformity grounded in machine learning (hereafter referred to as ML-based gender conformity) aligns with Monk's ICMI on multiple dimensions. First, this approach underscores the inherently continuous and heterogeneous nature of differences, as opposed to distinct, nominal differences (i.e., categorical gender). Second, it employs an inductive analytical methodology, which “foreground[s] the various properties (or markers) and aggregates of individuals who probabilistically may or may not be easier to class together socially or politically given the various properties or markers they have or are perceived to have” (Monk 2022, 14). Additionally, it is consistent with the interaction model of gender, which suggests that gendered attitudes and behaviors (i.e., femininity and masculinity) are context-dependent (Deaux and Major 1987). Third, while the complexity of gender performance may not be fully encapsulated in the resultant parsimonious, unidimensional classifier, it is nonetheless valuable in evaluating conformity to gender categories (i.e., gender norms). In this regard, ML-based gender conformity aligns with ICMI by demonstrating the diversity of ways in which individuals may conform to or deviate from their gender categories.

To obtain ML-based gender conformity, the procedure for training and evaluating supervised machine learning algorithms is illustrated in Figure 1. During the learning stage, a training sample (which can be split into training and validation sets) is generated using individuals’ survey responses as the input and their self-identified gender category as the output.⁶ Then, an iterative learning process ensues via supervised machine learning algorithms to develop classifiers, forecasting the extent to which an individual's performance aligns with gender categories and using a validation set to tune the hyperparameters.⁷ A critical step in this stage involves validating the classifier with a focus on the predicted probabilities. Technically, the area under the receiver operating characteristic curve (AUC) is typically employed as an evaluation metric for assessing the classifier's performance on the test set (Grimmer, Roberts, and Stewart 2022).⁸ A higher AUC signifies that the classifier effectively captures the underlying conformity to gender norms. Upon obtaining the classifier, the prediction stage entails applying the classifier to the analytic sample to obtain the predicted probabilities. A lower predicted probability indicates that some respondents may exhibit atypical gender performance, making the classifier unable to assign gender categories confidently. These scores are subsequently employed to estimate homophily effects in ERGMs, which will be discussed later.
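To make the evaluation metric concrete, the following is a minimal sketch of how an AUC can be computed from predicted probabilities. It is not the article's code: the function name and the toy data are hypothetical, and the pairwise definition is used only because it is easy to read. The AUC equals the share of (man, woman) pairs in which the man receives the higher predicted probability of being classified as a man, with ties counted as one half.

```python
def auc_from_probabilities(labels, probs):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive category, 0 for the negative category.
    probs:  predicted probability of the positive category.
    Tied pairs count as one half.
    """
    positives = [p for y, p in zip(labels, probs) if y == 1]
    negatives = [p for y, p in zip(labels, probs) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one case from each category")
    wins = 0.0
    for p_pos in positives:
        for p_neg in negatives:
            if p_pos > p_neg:
                wins += 1.0
            elif p_pos == p_neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = man, 0 = woman; probs = predicted probability of "man".
toy_labels = [1, 1, 1, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
toy_auc = auc_from_probabilities(toy_labels, toy_probs)  # 8 of 9 pairs ranked correctly
```

In the toy data, the third respondent (a man scored 0.4) and the fourth (a woman scored 0.6) are the kind of cases that a conventional classification task would treat as errors and that this approach treats as informative. The pairwise loop is quadratic in the sample size, so a rank-based implementation would be preferable for a sample as large as the one used here.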

Figure 1.

Flow chart in computational methods. Note. — The key outputs are highlighted in gray color.

It is noteworthy that despite not using a machine-learning framework, psychologists have developed an analogous approach known as “gender diagnosticity” (Fleming, Harris, and Halpern 2017; Lippa and Connelly 1990). This approach uses existing survey items in Add Health that show gender differences and runs logistic regressions to obtain predicted probabilities. An expanding corpus of empirical research encompassing a wide range of topics, including alcohol and marijuana use, academic accomplishment, weight control, and social mobility, has substantiated the practical value of predicted probabilities (Domingue et al. 2019; Mahalik et al. 2015; Nagata et al. 2020; Shakya et al. 2019; Weber et al. 2019; Yavorsky and Buchmann 2019). Mittleman (2022), in particular, associated this approach with machine learning and found that it can explain the academic gender gap. The ML-based gender conformity presented here expands upon this approach by offering a more flexible analytical framework, which, as shown in the sensitivity analyses, can incorporate more complex machine learning algorithms. More importantly, this framework connects this methodology to the ICMI and guides the social network analysis of cultural matching.

When training supervised machine learning algorithms, a crucial decision is selecting the appropriate survey responses to use as input data. The selection can be guided by theoretical principles or empirical data. While it is possible to include all survey questions in the training data, this article used the 21 gendered behavioral and attitudinal variables identified by the gender diagnosticity method mentioned above (Table A1 in the Appendix shows these variables). In addition to using the same data source (Add Health), there are two additional methodological reasons to include these variables. First, utilizing solely the survey questions that demonstrate gender differences enables the exclusion of irrelevant data, which has the potential to impact the quality of the machine learning models. Second, predicted probabilities associated with these variables have been verified in the context of test-retest reliability and content validity (Fleming, Harris, and Halpern 2017).⁹ In this regard, the classifier's performance aligns with traditional scale development metrics. One limitation of these variables is that they capture only some aspects of gender performance. Their limitations will be covered in the Discussion section.

Next, this article exemplifies the utility of this ML-based gender conformity through a case study of adolescent friendships, thereby providing a more comprehensive understanding of gender homophily. While acknowledging the potential for social relationships to shape gender conformity, this article primarily focuses on how gender conformity can elucidate segregated friendship networks through the lens of homophily.

Data and Methods

Data and Sample

The primary data used for the ML-based gender conformity measure are derived from the Wave 1 in-home survey component of Add Health, whereas the adolescent friendship nomination data is drawn from the Wave 1 in-home friendship surveys. Add Health is a longitudinal study involving a nationally representative sample of adolescents in grades 7–12 in the United States (Harris 2013). Data from Wave 1 was collected between 1994 and 1995, encompassing in-school surveys and in-home interviews conducted one year after the in-school surveys. This study focuses on the first wave of Add Health, though in the sensitivity analyses, I included Wave 2 networks to test the associations between gender conformity and changes in network ties between Waves 1 and 2. The social network analysis focuses on Add Health's saturated sample, which includes 16 schools with comprehensive sociometric network data. This data is fundamental for network analysis and has been successfully employed in prior network studies (Flashman 2012; Haynie 2001; Schaefer, Kornienko, and Fox 2011).

Although these data are not recent, I argue they remain useful, as Add Health is one of the few large-scale social surveys that include both survey and network data as well as gender performance survey items. This data thus enables the systematic examination of the relationships between gender conformity and friendship segregation (Goodreau, Kitts, and Morris 2009; McFarland et al. 2014; Moody 2001). There is undoubtedly a conservative bias, as the diversity in gender performance during the 1990s may be less than what would be observed if more recent data were used. Nevertheless, if individuals exhibiting heterogeneous gender performance can still be identified in the 1990s, it suggests that the method developed in this study would likely reveal a greater diversity in gender performance in more recent data.

Training a Gender Performance Classifier

To develop the ML-based gender conformity measure, this study follows the model-building process illustrated in Figure 1. The measure of gender performance was derived from Wave 1 in-home interview data.¹⁰ Figure 2 presents a summary of the sample flowchart. Generally, constructing a machine learning model requires training/validation/test data, and the analytic sample should not be used during the learning phase (Murphy 2012; Salganik et al. 2020). Accordingly, I retained the sample with the complete data on the survey items pertaining to gender performance from the Wave 1 in-home sample. This step resulted in a reduction in sample size from 20,745 to 20,150. Then, I divided the sample into two portions: training/validation/testing samples (N = 16,623) and the analytic sample, where the latter includes respondents from 14 schools in the Add Health saturated sample (N = 3,527). I excluded two schools because one is a special education school, and the other has no responses to the self-identified gender question from the students who responded to friendship nomination questions.¹¹
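The sample arithmetic and the separation between the learning and analytic samples can be sketched as follows. This is an illustration rather than the article's code: the sample sizes come from the text, whereas the function name, the field names (`respondent_id`, `school_id`), the school identifiers, and the assumption that the split is made purely by school membership are hypothetical.

```python
# Sample sizes reported in the text.
N_WAVE1_IN_HOME = 20745  # Wave 1 in-home sample
N_COMPLETE = 20150       # complete data on the gender performance items
N_LEARNING = 16623       # training/validation/test sample
N_ANALYTIC = 3527        # analytic sample (14 saturated-sample schools)

# The two portions should add up to the complete-case sample.
assert N_LEARNING + N_ANALYTIC == N_COMPLETE
N_DROPPED = N_WAVE1_IN_HOME - N_COMPLETE  # cases lost to incomplete items


def split_learning_and_analytic(records, analytic_schools):
    """Send a record to the analytic sample if its school is in
    `analytic_schools`; otherwise send it to the learning sample."""
    learning, analytic = [], []
    for record in records:
        if record["school_id"] in analytic_schools:
            analytic.append(record)
        else:
            learning.append(record)
    return learning, analytic


# Hypothetical toy records; all identifiers are made up for illustration.
toy_records = [
    {"respondent_id": 1, "school_id": "S01"},
    {"respondent_id": 2, "school_id": "S02"},
    {"respondent_id": 3, "school_id": "S01"},
    {"respondent_id": 4, "school_id": "S03"},
]
toy_learning, toy_analytic = split_learning_and_analytic(toy_records, {"S01"})
```

Because every record lands in exactly one list, no respondent in the analytic sample can contribute to training, validation, or testing of the classifier.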

Figure 2.

Sample description flow chart.

The next step is to train the machine learning model. First, the training/validation/test data are split into the “training/validation” and “test” datasets. Both datasets are split almost evenly between male and female adolescents. Then, I used the “training/validation” dataset and applied lasso-based penalized logistic regression algorithms (Murphy 2012). Compared with more complex algorithms, regression algorithms allow a more straightforward examination of how each attribute contributes to the classification. In the sensitivity analyses, I will discuss two alternative machine learning algorithms – decision tree and support vector machine – and compare their results. All variables, commonly referred to as “features” within machine learning frameworks, were standardized prior to classifier training. After training on the training/validation dataset, the test dataset serves as an out-of-sample evaluation. In this study, the areas under the curve (AUC) for the training/validation and test datasets are 0.851 and 0.848, respectively. As discussed earlier, this metric is crucial in assessing machine learning models, and a test-set AUC of roughly 0.85 is notably high compared to other machine learning models utilized in social survey data (Salganik et al. 2020). The resultant classifier is then employed to predict gender conformity in the analytic sample.
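The training step can be illustrated with a small, self-contained sketch of lasso-penalized logistic regression on standardized features. Several things here are assumptions rather than details from the article: the text does not say which software or solver was used, so proximal gradient descent is swapped in because it fits in a few lines; the penalty value is arbitrary instead of being tuned on a validation set; and the toy data and all function names are hypothetical.

```python
import math


def standardize(rows):
    """Rescale every column to mean 0 and standard deviation 1."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    return math.exp(z) / (1.0 + math.exp(z))


def soft_threshold(value, cutoff):
    """Proximal step for the L1 penalty: shrink toward 0, clip small values to 0."""
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0


def predict_proba(X, intercept, coefs):
    """Predicted probability of the category coded 1 for each row of X."""
    return [sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) for row in X]


def fit_lasso_logistic(X, y, penalty=0.05, step=0.5, n_iter=500):
    """L1-penalized logistic regression via proximal gradient descent.
    The intercept is left unpenalized."""
    n, k = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * k
    for _ in range(n_iter):
        resid = [p - t for p, t in zip(predict_proba(X, intercept, coefs), y)]
        grads = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(k)]
        intercept -= step * sum(resid) / n
        coefs = [soft_threshold(c - step * g, step * penalty)
                 for c, g in zip(coefs, grads)]
    return intercept, coefs


# Hypothetical toy data (not Add Health): label 1 = man, 0 = woman.
toy_y = [1 if i % 2 == 0 else 0 for i in range(40)]
toy_X = [[2.0 * toy_y[i] + 0.6 * ((i % 5) - 2),  # item that differs by gender
          1.0 if (i // 2) % 2 else -1.0]         # item unrelated to gender
         for i in range(40)]
toy_Z = standardize(toy_X)
toy_intercept, toy_coefs = fit_lasso_logistic(toy_Z, toy_y)
toy_scores = predict_proba(toy_Z, toy_intercept, toy_coefs)
```

In this toy example the lasso penalty keeps a positive coefficient on the item that differs by gender and sets the coefficient on the unrelated item to zero, which is the sense in which a penalized regression makes attribute contributions easy to inspect. The values in `toy_scores` play the role of the predicted probabilities that serve as the gender conformity measure.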

Establishing Friendship Networks

For friendship network data, this study used the in-home friendship nomination files. The network data collection employed name generator questions in Wave 1 (Adams 2019). Adolescents were presented with a roster of other students attending their school and asked to identify up to five male and five female friends. The upper limit on friendship nominations affected few students (more discussion about this limitation can be found in Moody 2001, 690). Adopting the union approach (Adams 2019, 34), this study considers friendship ties to be present if either student reports the tie as such (i.e., undirected ties). This approach is commonly used when network data are collected via self-reports (Brewer 2000). To ensure that all participants in the analytic sample have both network and covariate data, students who did not complete in-home interviews were excluded. This resulted in a final sample size of 3,527. Appendix Table A2 presents the descriptive characteristics of the final analytic sample across 14 schools.
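The union approach described above can be sketched in a few lines. The function name and the toy nominations are hypothetical, and dropping self-nominations is an assumption added for the sketch rather than a step reported in the text.

```python
def union_ties(nominations):
    """Convert directed nominations (nominator, nominee) into undirected ties.

    A tie is present if either student nominates the other. Each tie is
    stored as a frozenset, so (A, B) and (B, A) collapse into one tie.
    """
    ties = set()
    for nominator, nominee in nominations:
        if nominator != nominee:  # assumption: ignore self-nominations
            ties.add(frozenset((nominator, nominee)))
    return ties


# Hypothetical toy nominations: A and B nominate each other, A nominates C,
# and C does not reciprocate.
toy_nominations = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "D")]
toy_ties = union_ties(toy_nominations)
```

The reciprocated A–B nomination and the unreciprocated A–C nomination each yield exactly one undirected tie, which is what distinguishes the union approach from an intersection rule that would keep only A–B.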

Variables in the Statistical Modeling of Social Networks

Tie formation. — The primary outcome is whether students form friendship ties, a binary variable at the dyadic level (having a tie or not). In statistical network models, the outcome is the probability of a friendship tie forming between two students.

Gender conformity. — This measure uses predicted probabilities from the gender performance classifier, whose values range from 0 (i.e., the classifier has 100 percent confidence that the adolescent should be classified as a woman) to 1 (i.e., the classifier has 100 percent confidence that the adolescent should be classified as a man). To facilitate the interpretation of the findings, this study employs the terms “women typicality” and “men typicality.” Specifically, a score approaching 1 indicates that the respondent exhibits a higher level of men typicality, whereas a score approaching 0 denotes a higher level of women typicality. A score approaching 0.5 indicates that the classifier is more uncertain about whether the respondent should be classified as a man or a woman, which means more nonconformity or more ambiguous typicality.

Similar gender conformity. — I estimated the similarity of gender conformity between two adolescents based on the absolute difference in the nodal attribute (i.e., the absdiff term in ERGMs). Because a larger absolute difference means that two adolescents are less similar, a negative absdiff coefficient indicates homophily. Therefore, for ease of interpretation, the coefficient sign was reversed in the presentation of the results, so that positive coefficients indicate a higher likelihood of a tie when two adolescents have similar gender conformity.

Controls. — Several covariates and contextual network effects are included to discern the effects of similar gender conformity. Because categorical gender may be associated with gender conformity and networks, it is necessary to include categorical gender to examine the unique role of gender conformity. The interviewer-defined gender category was used as it is the most prevalent gender measure in extant surveys (Kalmijn 2002) and network research (Goodreau, Kitts, and Morris 2009; Moody 2001). This categorical gender variable was coded as a binary variable, wherein “man” is assigned a value of 1 and “woman” is assigned a value of 0. Additionally, age and socioeconomic status are included as covariates as they may influence friendship tie formation. Socioeconomic status was measured as the maximum level of education attained by either parent. The coding scheme for parents’ education is as follows: “less than high school,” “high school graduate,” “some college,” “at least a bachelor's degree,” and “missing/don’t know.” Although race is an important covariate in network ties and may be correlated with gender conformity and friendship ties (Bettie 2003), this study chose not to include it in the main analyses because most schools are predominantly racially homogeneous. In the sensitivity analyses section, I included racial homophily in ERGMs on a subset of schools that exhibited sufficient variation in racial composition.

In addition to adjusting for homophily effects by other sociodemographic attributes, induced homophily, or homophily by shared foci (Feld 1981, 1982; Kossinets and Watts 2009), should also be considered. As discussed previously, homophily by shared foci and homophily by personal preferences can reinforce each other. Therefore, statistical network models should incorporate both types of homophily effects to discern the influence of homophily on gender conformity. This article measured the number of extracurricular activities shared by dyads as the key shared foci attribute. It is important to note that some aspects of gender performance also involve shared activities (e.g., adolescents may meet others on the sports field if they participate in the same sport). However, shared extracurricular activities are a more direct measure of the influence of opportunity structures, and this influence has been confirmed in prior studies on friendship networks (Schaefer, Haas, and Bishop 2012).

Another factor that should be adjusted for in social network analysis is the high-order network contextual effects or endogenous network effects. Following network research (An 2022; Goodreau, Kitts, and Morris 2009; McMillan 2019), the three most important endogenous network effects were included: friendship skew, open triads, and closed triads. Friendship skew accounts for the friendship ties concentrated among a few popular adolescents. Open triads capture the tendency for three adolescents to form a two-path friendship (e.g., a is a friend with b, b is a friend with c; a and c may not be friends), while closed triads refer to friendships between three people forming a triangle, or triadic closure (e.g., a is a friend with b, b is a friend with c; a is a friend with c). Note that because open triads are prerequisites for closed triads and are often used as a control for estimating triadic closure (Goodreau 2007; Snijders et al. 2006), the focus of the high-order network effects in the current study will be friendship skew and closed triads.
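As a rough illustration of the triad configurations described above, the sketch below counts open and closed triads in a small undirected network. It uses raw counts for clarity, which is a simplification: the models in this study rely on geometrically weighted versions of these statistics. The function name and the toy network are hypothetical.

```python
from itertools import combinations


def triad_counts(nodes, edges):
    """Count two-paths, closed triads (triangles), and open triads."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Every pair of neighbours of a node forms a two-path centred on that node.
    two_paths = sum(len(nb) * (len(nb) - 1) // 2 for nb in adj.values())
    triangles = sum(
        1
        for a, b, c in combinations(sorted(nodes), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    # Each triangle contains three two-paths, so remove them to get open triads.
    open_triads = two_paths - 3 * triangles
    return {"two_paths": two_paths, "closed": triangles, "open": open_triads}


# Hypothetical network: a, b, c are mutual friends; d is a friend of c only.
toy_counts = triad_counts(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")],
)
```

In the toy network, a-b-c is the single closed triad, while a-c-d and b-c-d are open triads because d is not tied to a or b.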

Analytical Strategies

The main analysis consists of two steps. Firstly, an examination of fourteen schools’ network tie formation processes was conducted using ERGMs. ERGMs constitute a family of logit models that calculate the likelihood of ties (Robins et al. 2007; Wasserman et al. 1994). Unlike traditional logit models, ERGMs facilitate the modeling of higher-order dependencies, which are often prevalent in observed relational data (for example, a friend of a friend is also a friend). Additionally, ERGMs enable the estimation of network effects and the assessment of their statistical significance while adjusting for other factors (Robins et al. 2007). The general form of specifying the probability of the entire network can be expressed as follows:

P(Y = y \mid X) = \frac{\exp\left(\theta^{\prime} g(y)\right)}{k(\theta)},

(1)

where Y is the random variable for the state of the entire network, g(y) is a vector of model statistics for the realization of y conditional on the matrix of covariates X. θ is the vector of coefficients for model statistics, and k(θ) is a normalizing constant.
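For very small networks, the normalizing constant k(θ) can be computed exactly by enumerating every possible graph, which makes the general form above tangible. The sketch below does this for three nodes with two model statistics, the number of edges and the number of triangles. The coefficient values are hypothetical, and this brute-force enumeration is only a teaching device: it is infeasible for networks of realistic size and is not how the models in this study were estimated.

```python
from itertools import combinations, product
from math import exp


def all_graphs(n):
    """Yield every undirected graph on n nodes as a frozenset of dyads."""
    dyads = list(combinations(range(n), 2))
    for states in product([0, 1], repeat=len(dyads)):
        yield frozenset(d for d, s in zip(dyads, states) if s)


def model_statistics(graph, n):
    """g(y): (number of edges, number of triangles)."""
    triangles = sum(
        1
        for a, b, c in combinations(range(n), 3)
        if {(a, b), (a, c), (b, c)} <= graph
    )
    return (len(graph), triangles)


def ergm_probabilities(theta, n):
    """P(Y = y) = exp(theta' g(y)) / k(theta) for every graph y on n nodes."""
    weights = {
        g: exp(sum(t * s for t, s in zip(theta, model_statistics(g, n))))
        for g in all_graphs(n)
    }
    k = sum(weights.values())  # normalizing constant k(theta)
    return {g: w / k for g, w in weights.items()}


# Hypothetical coefficients: sparse network (-1.0) with a taste for closure (+0.5).
toy_theta = (-1.0, 0.5)
toy_probs = ergm_probabilities(toy_theta, 3)
```

With three nodes there are only eight possible graphs, so the probabilities can be inspected one by one; setting both coefficients to zero makes all eight equally likely.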

ERGMs are particularly suitable for investigating friendship networks, as they can simultaneously examine three types of statistics. The first is the covariate effect, or sociality, which represents the extent to which individuals with specific attributes form network ties. For example, boys may establish more social ties than girls. The second is homophily, which is the focus of this article. The third is the endogenous network effects, such as triadic closure (Goodreau, Kitts, and Morris 2009). Since all these effects are pertinent factors in the formation of friendship ties, ERGMs enable us to test the hypothesized relationship between gender conformity and friendship ties, net of other factors. In this study, the ERGM model specification can be expressed as follows:

\begin{aligned} \log\left(\frac{P(Y_{ij} = 1 \mid Y_{-ij})}{P(Y_{ij} = 0 \mid Y_{-ij})}\right) = {} & \beta_{0} (\text{Edges}_{ij}) + \beta_{1} (\text{Friendship skew}_{ij}) + \beta_{2} (\text{Open triads}_{ij}) \\ & + \beta_{3} (\text{Closed triads}_{ij}) + \beta_{4} (\text{Gender sociality}_{ij}) + \beta_{5} (\text{Age sociality}_{ij}) \\ & + \beta_{6} (\text{Gender conformity sociality}_{ij}) + \beta_{7} (\text{Same categorical gender}_{ij}) \\ & + \beta_{8} (\text{Same socioeconomic status}_{ij}) + \beta_{9} (\text{Similar age}_{ij}) \\ & + \beta_{10} (\text{Similar gender conformity}_{ij}) \\ & + \beta_{11} (\text{Common extracurricular activities}_{ij}) \end{aligned}

(2)

where the outcome is the probability of developing a friendship tie between two students i and j, conditional on all dyads other than $Y_{ij}$. The model includes an “edge” term, which refers to the change statistic for network density and is sometimes described as analogous to the intercept in traditional regression models (McMillan 2019). Three endogenous network effects (friendship skew, open triads, and closed triads) are captured using geometrically weighted change statistics. All covariates enter as both sociality and homophily effects, except that socioeconomic status sociality is omitted due to multicollinearity (Duxbury 2021a). Each coefficient represents the change in the log-odds of a friendship tie, conditional upon all other factors. To diagnose multicollinearity, an ERGM-based variance inflation factor (VIF) was calculated for each coefficient (Duxbury 2021a).¹² A goodness-of-fit examination (see the goodness-of-fit plots in Figures A1-A3 in the Appendix) shows that the networks simulated from the models reasonably capture the characteristics of the observed networks in the 14 schools.

This study used maximum pseudolikelihood estimation (MPLE) to estimate ERGMs. While Markov Chain Monte Carlo MLE (MCMCMLE) is considered preferable in cases of network dependency structures, it presents challenges regarding model convergence and may not always surpass the efficiency of MPLE. As van Duijn, Gile, and Handcock (2009) indicated, while MPLE generally yields smaller standard errors for endogenous network effects, it tends to overestimate the standard errors associated with attribute effects. Therefore, MPLE serves as a conservative test for homophily (An 2022). Following suggestions from the literature on enhancing MPLE performance (van Duijn, Gile, and Handcock 2009), this study employed the bias-corrected pseudo-likelihood estimator (MBLE).

After completing the estimation of ERGMs for each school, this study summarized the coefficients using meta-analysis to provide an overarching picture of the results (An 2015). This meta-analysis approach has been successfully applied to aggregating outcomes derived from ERGMs across multiple networks (McFarland et al. 2014; McMillan 2019; Smith et al. 2016). This study used the univariate random-effect meta-analysis method (An 2015; Snijders and Baerveldt 2003) and the residual (restricted) maximum likelihood to estimate the parameters (Thompson and Sharp 1999). An earlier attempt also used multivariate meta-analysis, which can aggregate all ERGM coefficients jointly and capture covariation between coefficients. Unfortunately, similar to the findings of Smith et al. (2016), the multivariate meta-analysis did not converge due partly to a lack of power. This study thus presented the results of univariate meta-analysis. Moreover, in order to accommodate the heterogeneity between networks and mitigate scaling bias (Duxbury and Wertsching 2023), the ERGM coefficients were transformed into the average marginal effects (AMEs), which can be interpreted as predicted tie probabilities and then summarized in the meta-analysis.
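The logic of pooling school-level estimates can be sketched as follows. Note the substitution: for brevity the example uses the closed-form DerSimonian-Laird estimator of the between-school variance rather than the restricted maximum likelihood estimator used in this study, so it illustrates the general idea of random-effects pooling and Cochran's Q, not the exact procedure. The function name and the three school-level estimates are hypothetical.

```python
def random_effects_pool(estimates, std_errors):
    """Univariate random-effects pooling (DerSimonian-Laird, not REML)."""
    k = len(estimates)
    w = [1.0 / se ** 2 for se in std_errors]            # inverse-variance weights
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-school variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    pooled_se = (1.0 / sum(w_star)) ** 0.5
    return {"pooled": pooled, "se": pooled_se, "Q": q, "tau2": tau2}


# Hypothetical school-level coefficients and standard errors (not study data).
toy_pool = random_effects_pool([0.0, 0.2, 0.4], [0.1, 0.1, 0.1])
```

When the estimates differ more than their standard errors would suggest, the between-school variance is positive and the pooled standard error widens accordingly.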

To gain a deeper understanding of the relationships between homophily by categorical gender and gender conformity, this study conducted a mediated moderation analysis, following the marginal effects framework suggested by Duxbury (2021b). This study focuses on gender conformity as a mediator of categorical gender, and not the other way around, because I hypothesize that two people from the same gender category are more likely to have similar gender conformity and, in turn, more likely to develop a friendship. To test this hypothesis, the ERGMs were estimated with and without homophily by gender conformity. Then, the coefficients were transformed into AMEs and compared to calculate the mediation effects. In this framework, homophily by categorical gender is treated as an interaction (the main effect is gender sociality), and the analysis shows to what extent homophily by gender conformity mediates the effect of homophily by categorical gender, net of the main effect for categorical gender. All relevant code is shown in Figure B1 in the Appendix.¹³

Results

ML-Based Gender Conformity

One advantage of using regression-based algorithms to train a machine learning model is that we can intuitively inspect the relationships between individual variables and the resultant predictions. Figure 3 illustrates the relative importance of survey items to gender classification. Higher scores denote a greater contribution to the classifier in terms of differentiating between men and women. Triangles mark items that predict men typicality, while circles mark items that predict women typicality. The three most distinctive items are the frequency of crying, playing an active sport, and playing video or computer games, all of which may reflect the pressure on young boys to prove their masculinity and are consistent with communication research showing young girls are less involved in video games due to gender role stereotyping (Hartmann and Klimmt 2006). Other survey items, such as self-perceived weight status (circle) and getting in serious physical fights (triangle), show relatively modest associations, which may imply differential ideal weight and boldness for girls and boys (Fletcher 2014; Haynie, Steffensmeier, and Bell 2007). The least distinctive features include wearing seatbelts and going with your gut feeling for decisions, which are relatively ambiguous in their association with gender categories.

Figure 3.

Variable importance in penalized logistic regression. Note. — Circle: negative coefficients (higher values refer to greater women typicality); triangle: positive coefficients (higher values refer to greater men typicality).

Table 1 shows the descriptive statistics of the analytic sample. As expected, boys have higher men typicality, and girls have lower men typicality. Meanwhile, socioeconomic status, race, and age are comparable between boys and girls, indicating that the difference in gender performance is not because the classifier was trained on two very different groups. The standard deviations of gender conformity are both large, indicating that some boys and girls may be misclassified as the opposite gender categories. As So and Roland (2020) argue, this does not necessarily imply errors but rather suggests variations in conformity with respect to established gender categories or gender norms.

Table 1.

Descriptive Statistics.

Variables	All (N = 3,527)		Categorical gender: Boys (N = 1,806)		Categorical gender: Girls (N = 1,721)
	Mean	SD	Mean	SD	Mean	SD
Gender conformity	0.47	0.29	0.65	0.23	0.28	0.22
Age (years)	16.66	1.53	16.78	1.5	16.53	1.54
	%	N	%	N	%	N
Socioeconomic status (parents’ education)
Less than high school	3.3	118	2.7	48	4.1	70
High school graduate	21.1	743	21.7	391	20.5	352
Some college	12.1	426	11	198	13.2	228
At least a bachelor's degree	24.6	868	24.5	443	24.7	425
Missing/don’t know	38.9	1372	40.2	726	37.5	646
Self-identified race
White (non-Hispanic)	48.7	1716	48.7	879	48.6	837
Black (non-Hispanic)	15.6	550	15.2	272	16.2	278
Hispanic	20.2	712	19.9	359	20.5	353
Other race (non-Hispanic)	15.6	549	16.4	296	14.7	253
Categorical gender (girls)	49	1721

Figure 4 shows a histogram of gender conformity among boys and girls. It shows that most boys and girls adhered to the gender norms, as evidenced by the skewed nature of the histograms toward each end of the spectrum. Similarly, the histogram also indicates the presence of men and women who possess comparatively low or high values in comparison to their same-gender peers. This observation suggests that certain boys and girls adopted gender performances that deviate from the norms. This finding corroborates key insights of gender theories and ethnographic studies, which assert that individuals have consistently engaged in negotiation and resistance (Bettie 2003; Deutsch 2007; Pascoe 2012). One interesting pattern is that compared to girls, more boys’ gender performance is atypical. This finding contrasts with recent qualitative research showing that girls enjoy greater freedom in gender performance than boys (Kane 2006). While a comprehensive explanation of this pattern requires additional data and is, therefore, beyond the scope of this article, it is important to acknowledge that stereotypes pertaining to girls and women have undergone more rapid change in recent decades within the context of the United States than those on boys and men (Eagly et al. 2020). Accordingly, it is plausible that girls might not possess the same level of freedom to engage in atypical gender enactment during the 1990s as they do today. The descriptive statistics of survey items support this idea, as boys have greater variations than girls among 13 out of 21 items (Table A1 in the Appendix).

Figure 4.

Histogram of gender conformity by categorical gender. Note. — A: boys; B: girls. For both panels, being close to 1 indicates more men typicality, whereas being close to 0 indicates more women typicality.

So far, the descriptive data suggests that we can use this ML-based measure to capture gender conformity. How about the relationships between adolescents’ gender conformity and social ties? To grasp the association, a friendship network from school number 7 (Sunshine High School) is plotted in Figure 5. On the left side is the friendship network colored by categorical gender, and on the right is the same network colored by gender conformity. Among adolescents who have the same categorical gender, there are several small clusters of individuals who exhibit similar gender conformity, which are located in the lower-right portion of the graph. Additionally, some individuals belonging to different categorical genders have similar gender conformity. This pattern is also observed in the middle-left portion of the graph, particularly among peripheral pairs or open triads. While homophily by categorical gender is a defining characteristic of many adolescent friendships, Figure 5 demonstrates that friendships based on similar gender conformity are also prevalent and can occur among peers of the same or different categorical gender.

Figure 5.

Friendship network at Sunshine High School at Wave 1 by (A) categorical gender and (B) gender conformity. For the purpose of this illustration, gender conformity scores are divided into five equal groups from the analytic sample.

Homophily in Gender Conformity in Friendship Ties

In Sunshine High School, the friendship network suggests that adolescent friendships are clustered by gender conformity. Still, this association might be spurious, as various network processes may confound such a clustering pattern. The subsequent step is to statistically evaluate the association between gender conformity and the formation of ties.

Table 2 displays the meta-analysis findings of ERGMs. First, the control variables associated with endogenous network processes align with prior research (An 2022; Goodreau, Kitts, and Morris 2009; McFarland et al. 2014; McMillan 2019). Overall, adolescents exhibit skewed degree distributions, wherein only a small number of adolescents possess many friendship ties. Adolescents also form more closed triads than would be expected under a random process.¹⁴ Conversely, the sociality coefficients are not statistically significant, indicating an absence of evidence to suggest that adolescents who are male, older, or exhibit more men typicality are more likely to establish friendships with others.

Table 2.

Mean Coefficients of ERGMs on Friendship Networks (Log-Odds).

Variable	B		SE	VIF	Q	AME		AME SE
Edge (Density)	−1.82	**	0.54
Endogenous network effects
Friendship skew	−0.86	***	0.20	[1.1–2.3]	0.00	−0.018	***	0.005
Closed triads	0.85	***	0.10	[1.0–1.4]	0.00	0.020	***	0.003
Open triads	−0.21	***	0.03	[1.1–2.4]	0.00	−0.007	***	0.002
Sociality effects
Categorical gender (boys)	0.03		0.02	[1.3–3.0]	0.46	0.0001		0.0001
Age	−0.02		0.01	[1.0–1.6]	0.00	−0.0003		0.0002
Gender conformity	−0.11		0.09	[1.4–2.9]	0.00	−0.0007	***	0.0001
Homophily effects
Same categorical gender	0.26	***	0.05	[1.0–1.6]	0.03	0.0007	***	0.0001
Same socioeconomic status	0.15	***	0.04	[1.0–1.1]	0.23	0.002	*	0.001
Similar age†	0.70	***	0.02	[1.0–1.2]	0.16	0.012	***	0.001
Similar gender conformity†	0.20	***	0.06	[1.0–1.6]	0.58	0.0002	**	0.0001
Shared foci effects
Common extracurricular activities	0.34	***	0.04	[1.0–1.3]	0.00	0.008	***	0.002

Note. — B = estimated average ERGM coefficients weighted by their variance, SE = standard error of the estimated average ERGM coefficients, VIF = range of variance inflation factors across schools ([min–max]), Q = p values from Cochran's Q test of between-school homogeneity in ERGM coefficients, AME = average marginal effects, AME SE = standard error of average marginal effects. AME and AME SE are shown to 3 or 4 decimal places, depending on the size of the values.

† The sign of the coefficient for age and gender conformity has been reversed to facilitate interpretation. The AME is the second difference calculated by comparing the AMEs between the mean and one standard deviation of gender conformity or age.

*p < .05, **p < .01, ***p < .001 (two-sided).

The formation of friendships is also influenced by several homophilous attractors, including categorical gender, age, and socioeconomic status. First, the estimated coefficient for categorical gender indicates that belonging to the same gender category corresponds to a 30 percent increase in the estimated odds of a friendship (Odds ratio: $e^{0.26}$ = 1.30; p < 0.001). Second, sharing the same socioeconomic status increases the estimated odds of a friendship by 16 percent (Odds ratio: $e^{0.15}$ = 1.16; p < 0.001), while adolescents of the same age have roughly twice the estimated odds of a friendship compared to those with a one-year age difference (Odds ratio: $e^{0.70}$ = 2.01; p < 0.001).

Due to the importance of opportunity structures or induced homophily, the ERGMs were also adjusted for the number of shared extracurricular activities as a shared foci effect. The positive and significant coefficient of common extracurricular activities suggests that adolescents tend to form friendships when they participate in more shared extracurricular activities. The estimated coefficient indicates that each additional shared extracurricular activity corresponds to a 40 percent increase in the estimated odds of a friendship (Odds ratio: $e^{0.34}$ = 1.40; p < 0.001) (Hunter et al. 2008).
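The conversion from log-odds coefficients to odds ratios used in this section is a simple exponentiation. The short sketch below applies it to the homophily and shared foci coefficients reported in Table 2; the variable names are illustrative.

```python
from math import exp

# Mean log-odds coefficients taken from Table 2.
coefficients = {
    "same categorical gender": 0.26,
    "same socioeconomic status": 0.15,
    "similar age": 0.70,
    "common extracurricular activities": 0.34,
}

# Odds ratio = exp(coefficient); percent change in odds = 100 * (odds ratio - 1).
odds_ratios = {name: exp(b) for name, b in coefficients.items()}
percent_change = {name: 100 * (ratio - 1) for name, ratio in odds_ratios.items()}

for name in coefficients:
    print(f"{name}: OR = {odds_ratios[name]:.2f}, "
          f"change in odds = {percent_change[name]:.0f}%")
```

An odds ratio of 1.30 therefore means the odds are 1.30 times as large, that is, 30 percent higher, rather than an increase of 1.3 units.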

With respect to the key variable of interest, the coefficient estimate of similar gender conformity is positive and statistically significant (Odds ratio: $e^{0.20}$ = 1.22; p < 0.001). This indicates that a one-unit decrease in the absolute difference in gender conformity corresponds to a 22 percent increase in the estimated odds of a friendship, net of homophily by categorical gender and other controls (Hunter et al. 2008). To examine the substantive significance of homophily by gender conformity, I transformed the coefficients into predicted probabilities. Consider a hypothetical dyad composed of two nodes that are both men, the same age (16 years old), and of the same socioeconomic status, with no club co-participation, in which one shows more men typicality (e.g., score = 1) and the other is more atypical (e.g., score = 0.5); all endogenous statistics are fixed at 0 to simplify the calculation. According to Table 2, the resultant predicted tie probability is 0.095, whereas the predicted tie probability is 0.098 if the same hypothetical dyad has similar gender conformity (e.g., scores = 0.9 and 1).¹⁵
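The predicted probabilities for the hypothetical dyad can be approximated from the Table 2 coefficients. The sketch below rests on assumptions that are not spelled out in the text: that each sociality statistic is the sum of the two students' values, that categorical homophily is a 0/1 match indicator, and that similarity enters as the negative absolute difference (because the reported signs are reversed). The function name and the socioeconomic category label are hypothetical. Under these assumptions the results round to the reported values, although the exact calculation behind them may differ.

```python
from math import exp

# Mean coefficients from Table 2 (similarity signs already reversed there).
B = {
    "edges": -1.82,
    "gender_sociality": 0.03,
    "age_sociality": -0.02,
    "conformity_sociality": -0.11,
    "same_gender": 0.26,
    "same_ses": 0.15,
    "similar_age": 0.70,
    "similar_conformity": 0.20,
    "common_activities": 0.34,
}


def tie_probability(ego, alter, shared_activities=0):
    """Predicted tie probability with all endogenous statistics fixed at 0."""
    eta = B["edges"]
    # ASSUMPTION: sociality statistics are the sum of the two students' values.
    eta += B["gender_sociality"] * (ego["man"] + alter["man"])
    eta += B["age_sociality"] * (ego["age"] + alter["age"])
    eta += B["conformity_sociality"] * (ego["conformity"] + alter["conformity"])
    # Categorical homophily: 1 if the two students match, 0 otherwise.
    eta += B["same_gender"] * (ego["man"] == alter["man"])
    eta += B["same_ses"] * (ego["ses"] == alter["ses"])
    # Similarity: reversed-sign coefficient times the negative absolute difference.
    eta -= B["similar_age"] * abs(ego["age"] - alter["age"])
    eta -= B["similar_conformity"] * abs(ego["conformity"] - alter["conformity"])
    eta += B["common_activities"] * shared_activities
    return 1.0 / (1.0 + exp(-eta))  # inverse logit


# Hypothetical students matching the dyad described in the text.
typical = {"man": 1, "age": 16, "ses": "some college", "conformity": 1.0}
atypical = dict(typical, conformity=0.5)
near_typical = dict(typical, conformity=0.9)

p_dissimilar = tie_probability(typical, atypical)     # scores 1 and 0.5
p_similar = tie_probability(typical, near_typical)    # scores 1 and 0.9
print(round(p_dissimilar, 3), round(p_similar, 3))
```

Note that changing one student's score alters both the similarity term and the gender conformity sociality term, so the gap between the two probabilities reflects both.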

Because the meta-analysis of ERGMs may suffer from scaling bias, I also adopted the average marginal effects framework (Duxbury 2021b) and transformed coefficients into average marginal effects (AMEs). Table 2 shows that the effects that were statistically significant as coefficients remain so as AMEs; the one change is that the AME of gender conformity sociality became statistically significant (AME = −0.0007, p < 0.001). The AME of homophily by gender conformity was calculated by comparing two scenarios: (1) one individual has the average score, while the other's score is one standard deviation below the average, and (2) both individuals in the dyad have the average gender conformity score. This comparison assesses the impact of a one-standard-deviation difference in gender conformity on the probability of a tie forming between two individuals. Table 2 shows that removing this difference increases the tie probability by 0.02 percentage points (AME = 0.0002), an effect that is statistically significant. Because AMEs are comparable across effects, the magnitude of homophily based on gender conformity is of a similar order to that of homophily by categorical gender (a 0.07 percentage point increase in tie probability).

In light of potential concerns regarding multicollinearity issues, particularly between categorical gender and gender conformity, I applied Duxbury's (2021a) method to calculate the ERGM-based variance inflation factor (VIF) for all variables. As can be seen in Table 2, the majority of VIF scores are below 4, suggesting that there is no severe concern about the effects of multicollinearity. Finally, to test whether the effects of homophily by gender conformity depend on school contexts, I ran Cochran's Q test to examine if there is between-school heterogeneity (i.e., Q column). The p-value for similar gender conformity is 0.58, suggesting that homophily based on gender conformity is indistinguishable across schools.

Mediation Analysis

A subsequent analysis concerns the extent to which the homophily effect in gender conformity mediates the homophily effect in categorical gender. In other words, can friendships between individuals of the same gender categories be attributed to similar gender conformity? Table 3 presents the findings of a mediated moderation analysis. Since this analysis is based on average marginal effects, the values should be interpreted as predicted tie probabilities. The total effect in the interaction effect column refers to the change in predicted tie probabilities when alters’ categorical gender is changed from a mismatch to a match (i.e., homophily is present) in the model that does not adjust for homophily by gender conformity. The direct effect in the interaction effect column denotes the change in predicted tie probabilities when homophily by categorical gender is present in the model adjusting for homophily by gender conformity. The mediation effect refers to the difference between the total and direct effects. Besides examining the mediation for homophily in categorical gender, Table 3 includes mediation for age homophily as a comparison. Theoretically, the mediation effect should only operate between gender conformity and categorical gender. The results reveal a statistically significant mediation effect for homophily by categorical gender (change in tie probability×10 = 0.001, p = 0.04) and a null mediation effect for age homophily (change in tie probability×10 = 0.000, p = 0.12).

Table 3.

Mediated Moderation Analysis with Homophily by Gender Conformity.

	Interaction Effect		Mediation Effect
	Total	Direct
Same categorical gender	0.010 *** (.002)	0.007 *** (.001)	0.001 * (.001)
Similar age	0.064 (.037)	0.063 (.037)	0.000 (.000)

Note. — Total interaction effect = estimated changes in AME (average marginal effect) when alters’ categorical gender is changed from a mismatch to a match in the restricted models (excluding homophily by gender conformity) weighted by their delta standard errors (multiplied by 10 to facilitate interpretation). Direct interaction effect = estimated changes in AME when alters’ categorical gender is changed from a mismatch to a match in the full models (including homophily by gender conformity) weighted by their delta standard errors (multiplied by 10 to facilitate interpretation). Delta Standard Errors in Parentheses. All AMEs are derived from a meta-analysis of ERGMs in 14 schools. In age models, the mediation effect may be estimated for each age. To simplify the result, the change when moving from the minimum to ego mean was reported.

*p < .05, **p < .01, ***p < .001 (two-sided).

Taken together, the findings provide evidence that homophily by gender conformity is an important factor in friendship formation. Adolescents tend to form friendships with individuals who have similar gender conformity. Notably, this effect exists above and beyond homophily by categorical gender and mediates homophily by categorical gender. Regarding criterion validity, the findings demonstrate that the computational approach effectively captures characteristics associated with gender conformity. This is evidenced by its ability to predict whether adolescents applied gender conformity when forming social connections, as documented in prior qualitative research (Bettie 2003; Pascoe 2012).

Sensitivity Analyses

This study conducted sensitivity analyses to examine the robustness of the results. As mentioned earlier, the machine learning approach allows for a more flexible analytical framework that can incorporate complex machine learning algorithms. To examine whether the homophily effect by gender conformity is sensitive to the selected algorithm, I replicated the same analyses using two alternative algorithms – decision tree and support vector machine – which are commonly used in classification tasks. The AUC of the test dataset using the decision tree algorithm is 0.791, which is lower than that of the penalized logistic regression algorithm (AUC of the test dataset = 0.848). By contrast, the AUC of the test dataset using the support vector machine algorithm is 0.851, slightly higher than that of the logistic regression. The histograms of gender conformity among boys and girls by decision tree and support vector machine algorithms further show that the decision tree algorithm provides a worse measure, as the distributions for boys and girls overlap more (Figure A7 in the Appendix). While the decision tree algorithm shows more heterogeneity in gender conformity, I argue that it introduces more noise and may bias the estimation of the homophily effect by gender conformity.

The results from the ERGMs, presented in Tables A3 and A4 in the Appendix, support this argument. On the one hand, the homophily effect by gender conformity is indistinguishable from zero when the decision tree algorithm was used (coefficient = −0.05, p-value = 0.62, AME = 0.0001, p-value = 0.58). On the other hand, the model using the support vector machine algorithm shows a statistically significant homophily effect by gender conformity (coefficient = 0.17, p-value = 0.004, AME = 0.0002, p-value = 0.004), which is consistent with the main analyses using the logistic regression. This finding indicates that a more effective algorithm can better identify the impact of homophily based on gender conformity.

Second, I examined whether the main results apply to changes in friendship ties using longitudinal networks. The analyses used the same gender conformity measure obtained in Wave 1, while modeling Wave 2 networks using temporal ERGMs (TERGMs). TERGM is an extension of ERGM that additionally includes the lagged networks (i.e., Wave 1 networks). More details about TERGMs can be found in Appendix B. Tables B1 and B2 in the Appendix show the results of TERGMs, using MPLE and bootstrap-MPLE (Cranmer, Desmarais, and Morgan 2020), respectively. In terms of statistical significance, TERGMs reproduced statistically significant coefficients of similar gender conformity. However, the AMEs became statistically nonsignificant. Because Cochran's Q test in Table B1 is statistically significant (p = 0.03) and the strength of gender conformity may vary by geographical context (Kazyak 2012), I ran a meta-regression for AMEs, including the metropolitan location as the independent variable. Results show that urban settings have the largest AMEs for similar gender conformity compared to suburban and rural settings (Table B3 in the Appendix). In other words, students who share gender conformity may be more likely to form friendships between Waves 1 and 2 if they attend schools in urban areas. The mediation analysis was also applied to longitudinal networks. Consistent with Table 3, Table B4 in the Appendix shows a mediation effect for homophily by categorical gender but a null effect for age homophily.

Third, I investigated whether similar gender conformity works differently for boys and girls. Table C1 in the Appendix shows the results from ERGMs, including the interaction effect between homophily by categorical gender and gender conformity, whereas Table C2 shows the results from TERGMs. The coefficient from the ERGMs suggests that in addition to the main effect of similar gender conformity, the interaction effect between similar gender conformity and both students being boys is statistically significant and positive, though the AME is not significant. However, neither the coefficient nor the AME from the TERGMs was statistically significant. In other words, only one out of four parameters suggests that the homophily effect based on gender conformity varies across gender categories.

Fourth, because a few students at Wave 1 were asked to nominate only one male and one female friend (see Schaefer, Haas and Bishop 2012), I included a dummy variable indicating which version of the question the adolescent received, where 0 refers to the full version (i.e., five male and five female friends) and 1 refers to the truncated version (i.e., one male and one female friend). I then included this variable in ERGMs, excluding one school where all students received the truncated version (in the remaining 13 schools, the share of students who received the truncated version ranges from 4% to 98%). Results suggest that the coefficient for similar gender conformity was statistically significant, though the AME of similar gender conformity became marginally significant (p = 0.08) (Table D1 in the Appendix). Fifth, I ran an ERGM that treats friendships as directed ties to examine whether the findings depend on the directionality of ties (but see Kitts and Leal 2021 on the assumption of directionality). Because the ties are directed, this model also included reciprocity as an additional endogenous network effect. Results show that the directed-tie models replicated the statistically significant coefficients for homophily based on gender conformity, while the AME became statistically nonsignificant (Table E1 in the Appendix).

This study also conducted two robustness checks to assess the sensitivity of the findings to alternative model specifications. First, to examine the potential influence of sociality and homophily variables for racial groups, the analysis was repeated in cross-sectional networks of four schools and longitudinal networks of three schools, where racial effects could be estimated. The results indicate that the coefficients of similar gender conformity are similar to those in the main analyses (Table F1 in the Appendix), though the AMEs became statistically nonsignificant, partly because of the smaller sample size. Second, a methodological concern when using penalized logistic regression algorithms is the weighting strategy (Hindman 2015). The primary models account for weighting via the lasso method. To assess the robustness of the findings to this weighting specification, this study employed another common approach, the ridge method, which yielded similar results: both the coefficients and the AMEs of similar gender conformity were statistically significant (Table G1 in the Appendix). As in Table B3, the AME for similar gender conformity in TERGMs is also largest for students attending schools in urban areas (results available upon request).
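To make the contrast between the lasso and ridge penalties concrete, the sketch below fits a logistic regression with either an L1 (lasso) or L2 (ridge) penalty using plain proximal gradient descent. It is an illustration under stated assumptions rather than the estimation routine used for the gender conformity measure: the toy data, the penalty strength `lam`, the learning rate, and the names `fit_penalized_logit`, `predict_proba`, and `cell` are all hypothetical, and an established library would be preferable in practice.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_penalized_logit(X, y, penalty, lam, lr=0.5, n_iter=2000):
    """Fit logistic regression with an 'l1' (lasso) or 'l2' (ridge) penalty.

    Uses (proximal) gradient descent on the mean log-loss; the intercept is
    left unpenalized. Returns (intercept, list of coefficients).
    """
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad0, grad = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            linear = intercept + sum(c * v for c, v in zip(coefs, xi))
            err = sigmoid(linear) - yi
            grad0 += err / n
            for j in range(p):
                grad[j] += err * xi[j] / n
        intercept -= lr * grad0
        for j in range(p):
            if penalty == "l2":
                # Ridge: gradient step on loss + (lam / 2) * coef^2.
                coefs[j] -= lr * (grad[j] + lam * coefs[j])
            else:
                # Lasso: gradient step followed by soft-thresholding.
                step = coefs[j] - lr * grad[j]
                coefs[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return intercept, coefs


def predict_proba(intercept, coefs, x):
    """Predicted probability of the positive class for one observation."""
    return sigmoid(intercept + sum(c * v for c, v in zip(coefs, x)))


def cell(x0, x1, n_ones, size=8):
    """Hypothetical toy data: `size` rows sharing features, `n_ones` labeled 1."""
    return [((x0, x1), 1 if i < n_ones else 0) for i in range(size)]


# Feature 0 is strongly related to the label; feature 1 only weakly.
toy = cell(1, 1, 7) + cell(1, -1, 6) + cell(-1, 1, 2) + cell(-1, -1, 1)
X = [list(features) for features, _ in toy]
y = [label for _, label in toy]

b0_lasso, b_lasso = fit_penalized_logit(X, y, "l1", lam=0.1)
b0_ridge, b_ridge = fit_penalized_logit(X, y, "l2", lam=0.1)
print("lasso coefficients:", b_lasso)
print("ridge coefficients:", b_ridge)
print("ridge P(y=1 | x=[1, 1]):", predict_proba(b0_ridge, b_ridge, [1, 1]))
```

In this toy setting the lasso sets the weakly related feature's coefficient exactly to zero, whereas the ridge shrinks it toward zero without eliminating it. This difference in how the two penalties weight predictors is why comparing them is a reasonable sensitivity check for a measure built from predicted probabilities.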

To conclude, the sensitivity analyses explored alternative machine learning algorithms, longitudinal network analysis, interaction effects, survey implementation error, the directionality of ties, omitted variable bias, and alternative weighting methods. They indicate that the findings for homophily based on gender conformity are robust, with the exception that the AMEs appear to be attenuated by a smaller sample size or the directionality of ties. In addition, the AMEs from the longitudinal network analyses suggest that the strength of homophily by gender conformity in the change of ties between Waves 1 and 2 may be associated with metropolitan location.

Discussion

This article offers an analytical framework for scholars to combine two well-developed computational methods – machine learning and exponential random graph models – with a focus on networks and categories. Drawing on the infracategorical model of inequality (Monk 2022), this study investigates the extent to which an instance of categorical cues – conforming to categorical membership – is associated with the formation of social ties. Specifically, this study used gender segregation in adolescent networks as an empirical case and examined the degree to which people establish friendships based on shared gender conformity, above and beyond the tendency to cluster based on categorical gender. Results from ERGMs reveal that adolescents who exhibit similar gender conformity are more likely to form friendships independent of their categorical gender, shared foci, and endogenous network processes. Also, the homophily effect based on gender conformity can mediate the homophily effect by categorical gender. By empirically exemplifying best practices for guiding ERGMs with machine-learning-based gender conformity, this study demonstrates how computational methods can facilitate a shift in focus from categorical membership to heterogeneity in categorical conformity.

This article makes theoretical contributions to the field of culture and networks. First, it extends the “cultural matching” argument, asserting that culture is not solely a product of networks but also a catalyst for the formation of social ties (Edelmann and Vaisey 2014; Lewis and Kaufman 2018; Lizardo 2006; McLean 2017; Rivera 2012; Vaisey and Lizardo 2010). Through computational sociological analyses, this paper demonstrates that individuals rely on their alignment with categories when developing social relationships. While examining the coevolution of gender conformity and network ties is not feasible in ERGMs, this paper effectively illustrates that the cultural meanings associated with identities, or categorical cues, significantly impact network structures. Such analyses can be extended to other identity categories, such as race, ethnicity, sexuality, and class, yielding a deeper understanding of how the transformation from cultural resources (e.g., typicality) to social resources reinforces (or undermines) the structures of network segregation.

This paper also presents contributions to the formal analysis of culture. Computational methods are increasingly being applied to the sociological analysis of cultural meaning, beliefs, and networks due to the rise of machine learning and other computationally intensive methods (Arseniev-Koehler and Foster 2022; Boutyline and Vaisey 2017; Goldberg 2011; Knight 2022; Kozlowski, Taddy and Evans 2019; Snijders 2011; Voyer et al. 2022; Zhou 2022). This article expands upon this line of research by applying machine learning techniques to the study of identity (Long and So 2015; So and Roland 2020; So, Long and Zhu 2019). By using predicted probabilities obtained from machine learning, the method presented in this study offers a more suitable computational approach to capture the inherently continuous and heterogeneous nature of differences. Its combination with statistical network models further confirms the theoretical insights of categorical conformity in network structures while adjusting for other important mechanisms.
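As a rough illustration of how predicted probabilities can be turned into a continuous, dyad-level quantity, the sketch below derives a conformity score from a classifier's predicted probability and compares pairs of students. The scoring rule (the predicted probability of the respondent's own reported category), the use of an absolute difference as the dissimilarity between two students, and all names and values are assumptions made for illustration; they are not taken from the study's code and may differ from its exact specification.

```python
def conformity(prob_boy, reported_gender):
    """Predicted probability of the category the respondent reports."""
    if reported_gender not in ("boy", "girl"):
        raise ValueError("reported_gender must be 'boy' or 'girl'")
    return prob_boy if reported_gender == "boy" else 1.0 - prob_boy


def conformity_gap(score_i, score_j):
    """Absolute difference between two conformity scores (0 = identical)."""
    return abs(score_i - score_j)


# Hypothetical students: (classifier's predicted P(boy), reported gender).
students = {"A": (0.92, "boy"), "B": (0.85, "boy"), "C": (0.40, "girl")}
scores = {name: conformity(p, g) for name, (p, g) in students.items()}

for i, j in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(i, j, round(conformity_gap(scores[i], scores[j]), 2))
```

Because the score is continuous, two students in the same gender category can still differ substantially, and two students in different categories can be close. A dyadic covariate of this kind can enter a network model alongside a categorical same-gender term.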

This paper contributes to the expanding body of scholarly work on measuring gender. Recently, gender scholarship has initiated a discussion regarding the limitations of measuring gender as a categorical attribute in quantitative research (Hart et al. 2019; Westbrook and Saperstein 2015). Consequently, a gradation measure of gender performance has been increasingly used across various disciplines, including education (Mittleman 2022; Yavorsky and Buchmann 2019), public health (Cislaghi et al. 2022), and economics (Domingue et al. 2019). This study draws upon this burgeoning research and explores gender conformity. While the method does not directly measure actual behaviors, the responses from forced-choice surveys may be equally suited to studying the culture-action link, as suggested by the dual-process model of culture (Vaisey 2009). Moreover, such measurement can be informed by more complicated machine learning algorithms, making the gender conformity measure more flexible than the gender diagnosticity approach. The analytical framework can also be generalized to different social networks and populations, ultimately aiding in the examination of broader social processes of gender segregation and gender inequality.

Finally, the findings extend theories of categories and boundaries in inequality and stratification research (Monk 2022). By analyzing the interplay between gender categories, gender conformity, and network ties, this study sheds light on the role of categorical cues in shaping intergroup boundaries through a computational lens. I argue that categorical conformity, an instance of cues of categories, is a key factor within the infracategorical model of inequality and can mediate the association between social categories and the production of inequalities. By integrating machine learning and inferential network analysis, we can investigate how different categorical cues, such as self-identification, perceived typicality, and categorical conformity, contribute to exacerbating or mitigating inequalities by shaping social relationships. Future work can extend the machine-learning-based categorical conformity to explore various forms of inequality, such as health disparities, educational assortative marriage, and earnings inequality.

Limitations, Strengths, and Future Work

This study has several important limitations and unique strengths. First, although the gender performance classifier achieves a high accuracy of 84%, indicating its potential to distinguish gender categories effectively, it does not encompass all proxy measures of behaviors and attitudes pertinent to gender performance. In other words, the classifier's performance is constrained by the availability of relevant survey questions. However, one strength of this classifier is that its workflow can be applied to different survey data or even text data. For instance, we can gather interview or ethnographic data that describe the performance of identities and train machine learning models accordingly. Future research can explore various data collection methods to compare and contrast different aspects of categorical conformity.

Second, the gender conformity measure used in the present study is derived from a U.S. adolescent sample dating back to the 1990s. Consequently, it cannot be assumed that the classifier trained using this sample can be directly applied to other populations. However, this computational approach enables cross-population comparisons concerning categorical conformity and its effects on network structures. Also, gender categories can be multifaceted, such as when collecting information about lesbian, gay, bisexual, and transgender individuals (Logan 2013). While the computational approach is not a substitute for qualitative research that provides detailed insights into how individuals perform their identities in specific contexts, it facilitates large-sample, cross-situational, and comparative studies. Moreover, it extends existing social network analysis to include categorical cues beyond simple categories.

Third, individuals’ categorical conformity, including gender conformity, can undergo temporal shifts or vary across birth cohorts. Just as the main findings suggest that conformity can foster relationships, peers may in turn influence the strength of conformity. However, ERGMs are unable to assess the effects of peers on gender conformity. Future research should use alternative statistical network models, such as stochastic actor-oriented models (SAOMs), to explore the coevolution of networks and gender conformity. Access to more recent longitudinal network data will also facilitate the examination of conformity and networks across cohorts. It is plausible that homophily effects based on categorical gender and gender conformity may diminish over time (Smith, McPherson and Smith-Lovin 2014). However, for certain categories, like race, the influence of categorical homophily may intensify as individuals use these labels more frequently to differentiate between in-groups and out-groups. This cohort difference illustrates that homophily based on categorical conformity or other cues of categories is a complex issue that warrants further investigation.

Fourth, while the coefficient of similar gender conformity in TERGMs aligns with the one in ERGMs, the AME is not statistically significant. Additionally, the meta-regression suggests that the strength of homophily effects based on gender conformity varies across different geographic contexts. Given the robustness of AMEs in meta-analyses (Duxbury and Wertsching 2023), this finding is noteworthy. Future research can investigate how homophily effects based on gender conformity relate to urban or rural contexts (Kazyak 2012) and, more broadly, how the coevolution of networks and categorical conformity varies across contexts.

Conclusion

The present study expands the literature concerning cues of categories, homophily, and the formal analysis of culture by combining machine learning and ERGMs to investigate whether individuals use their gender conformity to establish friendships. A workflow procedure for training and evaluating this gender conformity measure has been delineated. By examining the connections between categorical gender, gender conformity, and adolescent friendship networks, the findings indicate that individuals tend to form friendships with those who have similar gender conformity, transcending the binary-based gender classifications. These results corroborate the theoretical insights and qualitative findings regarding gender conformity, while also highlighting the importance of the continuous and heterogeneous nature of social difference in network structures, extending beyond mere categories.

Research Ethics

Procedures for data access to restricted Add Health data and analysis were implemented as approved by the Institutional Review Board and in agreement with the sensitive data security plan approved by Add Health study administrators.

Supplemental Material

Supplemental material, sj-docx-1-smr-10.1177_00491241251321152 for Networks Beyond Categories: A Computational Approach to Examining Gender Homophily by Chen-Shuo Hong in Sociological Methods & Research

Footnotes

Data and Code Availability Statement

This research uses data from Add Health, funded by grant P01 HD31921 (Harris) from the Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), with cooperative funding from 23 other federal agencies and foundations. Add Health is currently directed by Robert A. Hummer and funded by the National Institute on Aging cooperative agreements U01 AG071448 (Hummer) and U01 AG071450 (Aiello and Hummer) at the University of North Carolina at Chapel Hill. Add Health was designed by J. Richard Udry, Peter S. Bearman, and Kathleen Mullan Harris at the University of North Carolina at Chapel Hill. For access to the Add Health data files, please refer to the Add Health website (http://www.cpc.unc.edu/addhealth). I include code in the Supplemental Information to bolster transparency and point interested readers to the full code on the Open Science Framework (OSF; ).

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author received no financial support for the research, authorship, and/or publication of this article.

Author Biography

Chen-Shuo Hong is an Assistant Professor of Sociology at National Taiwan University. His research studies the interplay between culture and network structure to analyze the intergroup ties that contribute to the creation of inequality, using quantitative and computational methods. His work has appeared in Social Science Research, Social Networks, and Social Science & Medicine.

References

1. Adams Jimi. 2019. Gathering Social Network Data. Los Angeles, CA: Sage.

2. An Weihua. 2015. “Multilevel Meta Network Analysis with Application to Studying Network Dynamics of Network Interventions.” Social Networks 43:48–56.

3. An Weihua. 2022. “Friendship Network Formation in Chinese Middle Schools: Patterns of Inequality and Homophily.” Social Networks 68:218–28.

4. Arseniev-Koehler Alina, Foster Jacob G. 2022. “Machine Learning as a Model for Cultural Learning: Teaching an Algorithm What It Means to Be Fat.” Sociological Methods & Research 51(4):1484–539.

5. Bettie. 2003. Women Without Class: Girls, Race, and Identity. Berkeley, CA: University of California Press.

6. Blau Peter M., Schwartz Joseph E. 1984. Crosscutting Social Circles: Testing a Macrostructural Theory of Intergroup Relations. Orlando, FL: Free Press.

7. Bourdieu Pierre. 1984. Distinction: A Social Critique of the Judgement of Taste. Cambridge, MA: Harvard University Press.

8. Boutyline Andrei, Vaisey Stephen. 2017. “Belief Network Analysis: A Relational Approach to Understanding the Structure of Attitudes.” American Journal of Sociology 122(5):1371–447.

9. Brensinger Jordan, Sotoudeh Ramina. 2022. “Party, Race, and Neutrality: Investigating the Interdependence of Attitudes toward Social Groups.” American Sociological Review 87(6):1049–93.

10. Brewer Devon D. 2000. “Forgetting in the Recall-Based Elicitation of Personal and Social Networks.” Social Networks 22(1):29–43.

11. Cheadle Jacob E., Schwadel Philip. 2012. “The ‘Friendship Dynamics of Religion,’ or the ‘Religious Dynamics of Friendship’? A Social Network Analysis of Adolescents Who Attend Small Schools.” Social Science Research 41(5):1198–212.

12. Cislaghi Beniamino, Weber Ann M., Shakya Holly B., Abdalla Safa, Bhatia Amiya, Domingue Benjamin W., Mejía-Guevara Iván, Stark Lindsay, Seff Ilana, Richter Linda M., Baptista Menezes Ana Maria, Victora Cesar G., Darmstadt Gary L. 2022. “Innovative Methods to Analyse the Impact of Gender Norms on Adolescent Health Using Global Health Survey Data.” Social Science & Medicine 293:114652.

13. Cooper Harris, Hedges Larry V., Valentine Jeffrey C. 2009. Handbook of Research Synthesis and Meta-Analysis. New York, NY: Russell Sage Foundation.

14. Cranmer S. J., Desmarais B. A., Morgan J. W. 2020. Inferential Network Analysis. Cambridge, UK: Cambridge University Press.

15. Deaux Kay, Lewis Laurie L. 1984. “Structure of Gender Stereotypes: Interrelationships among Components and Gender Label.” Journal of Personality and Social Psychology 46(5):991–1004.

16. Deaux Kay, Major Brenda. 1987. “Putting Gender into Context: An Interactive Model of Gender-Related Behavior.” Psychological Review 94(3):369–89.

17. DellaPosta Daniel. 2020. “Pluralistic Collapse: The ‘Oil Spill’ Model of Mass Opinion Polarization.” American Sociological Review 85(3):507–36.

18. Deutsch Francine M. 2007. “Undoing Gender.” Gender & Society 21(1):106–27.

19. Domingue Benjamin W., Cislaghi Beniamino, Nagata Jason M., Shakya Holly B., Weber Ann M., Boardman Jason D., Darmstadt Gary L., Harris Kathleen Mullan. 2019. “Implications of Gendered Behaviour and Contexts for Social Mobility in the USA: A Nationally Representative Observational Study.” The Lancet Planetary Health 3(10):e420–e28.

20. Duxbury Scott W. 2021a. “Diagnosing Multicollinearity in Exponential Random Graph Models.” Sociological Methods & Research 50(2):491–530.

21. Duxbury Scott W. 2021b. “The Problem of Scaling in Exponential Random Graph Models.” Sociological Methods & Research 52(2):764–802.

22. Duxbury Scott W., Wertsching Jenna. 2023. “Scaling Bias in Pooled Exponential Random Graph Models.” Social Networks 74:19–30.

23. Eagly Alice H., Nater Christa, Miller David I., Kaufmann Michèle, Sczesny Sabine. 2020. “Gender Stereotypes Have Changed: A Cross-Temporal Meta-Analysis of US Public Opinion Polls from 1946 to 2018.” American Psychologist 75(3):301–15.

24. Edelmann Achim, Vaisey Stephen. 2014. “Cultural Resources and Cultural Distinction in Networks.” Poetics 46:22–37.

25. Feld Scott L. 1981. “The Focused Organization of Social Ties.” American Journal of Sociology 86(5):1015–35.

26. Feld Scott L. 1982. “Social Structural Determinants of Similarity among Associates.” American Sociological Review 47(6):797–801.

27. Fischer Claude S. 1982. To Dwell among Friends: Personal Networks in Town and City. Chicago, IL: University of Chicago Press.

28. Flashman Jennifer. 2012. “Academic Achievement and Its Impact on Friend Dynamics.” Sociology of Education 85(1):61–80.

29. Fleming Paul J., Harris Kathleen Mullan, Halpern Carolyn Tucker. 2017. “Description and Evaluation of a Measurement Technique for Assessment of Performing Gender.” Sex Roles 76(11):731–46.

30. Fletcher Jason M. 2014. “The Interplay between Gender, Race and Weight Status: Self Perceptions and Social Consequences.” Economics & Human Biology 14:79–91.

31. Foster Jacob G. 2018. “Culture and Computation: Steps to a Probably Approximately Correct Theory of Culture.” Poetics 68:144–54.

32. Goldberg Amir. 2011. “Mapping Shared Understandings Using Relational Class Analysis: The Case of the Cultural Omnivore Reexamined.” American Journal of Sociology 116(5):1397–436.

33. Gondal Neha. 2022. “Multiplexity as a Lens to Investigate the Cultural Meanings of Interpersonal Ties.” Social Networks 68:209–17.

34. Gondal Neha, McLean Paul D. 2013. “Linking Tie-Meaning with Network Structure: Variable Connotations of Personal Lending in a Multiple-Network Ecology.” Poetics 41(2):122–50.

35. Goodreau Steven M. 2007. “Advances in Exponential Random Graph (P*) Models Applied to a Large Social Network.” Social Networks 29(2):231–48.

36. Goodreau Steven M., Kitts James A., Morris Martina. 2009. “Birds of a Feather, or Friend of a Friend? Using Exponential Random Graph Models to Investigate Adolescent Social Networks.” Demography 46(1):103–25.

37. Green Raymond J., Ashmore Richard D., Manzi Robert, Jr. 2005. “The Structure of Gender Type Perception: Testing the Elaboration, Encapsulation, and Evaluation Framework.” Social Cognition 23(5):429–64.

38. Grimmer, Roberts M. E., Stewart B. M. 2022. Text as Data: A New Framework for Machine Learning and the Social Sciences. Princeton, NJ: Princeton University Press.

39. Hanneke Steve, Fu Wenjie, Xing Eric P. 2010. “Discrete Temporal Models of Social Networks.” Electronic Journal of Statistics 4:585–605. doi:10.1214/09-EJS548

40. Harris Kathleen Mullan. 2013. The Add Health Study: Design and Accomplishments. Chapel Hill: Carolina Population Center, University of North Carolina at Chapel Hill.

41. Hart Chloe Grace, Saperstein Aliya, Magliozzi Devon, Westbrook Laurel. 2019. “Gender and Health: Beyond Binary Categorical Measurement.” Journal of Health and Social Behavior 60(1):101–18.

42. Hartmann Tilo, Klimmt Christoph. 2006. “Gender and Computer Games: Exploring Females’ Dislikes.” Journal of Computer-Mediated Communication 11(4):910–31.

43. Haynie Dana L. 2001. “Delinquent Peers Revisited: Does Network Structure Matter?” American Journal of Sociology 106(4):1013–57.

44. Haynie Dana L., Steffensmeier Darrell, Bell Kerryn E. 2007. “Gender and Serious Violence: Untangling the Role of Friendship Sex Composition and Peer Violence.” Youth Violence and Juvenile Justice 5(3):235–53.

45. Hindman Matthew. 2015. “Building Better Models: Prediction, Replication, and Machine Learning in the Social Sciences.” The ANNALS of the American Academy of Political and Social Science 659(1):48–62.

46. Hofstra Bas, Corten Rense, van Tubergen Frank, Ellison Nicole B. 2017. “Sources of Segregation in Social Networks: A Novel Approach Using Facebook.” American Sociological Review 82(3):625–56.

47. Hunter David R., Handcock Mark S., Butts Carter T., Goodreau Steven M., Morris Martina. 2008. “ergm: A Package to Fit, Simulate and Diagnose Exponential-Family Models for Networks.” Journal of Statistical Software 24(3):nihpa54860–nihpa60. doi:10.18637/jss.v024.i03

48. Jackson Matthew O., Rogers Brian W., Zenou Yves. 2017. “The Economic Consequences of Social-Network Structure.” Journal of Economic Literature 55(1):49–95.

49. Kalmijn Matthijs. 2002. “Sex Segregation of Friendship Networks. Individual and Structural Determinants of Having Cross-Sex Friends.” European Sociological Review 18(1):101–17.

50. Kane Emily W. 2006. “No Way My Boys Are Going to Be Like That!” Gender & Society 20(2):149–176.

51. Kazyak Emily. 2012. “Midwest or Lesbian? Gender, Rurality, and Sexuality.” Gender & Society 26(6):825–48.

52. Kitts James A., Leal Diego F. 2021. “What Is(n’t) a Friend? Dimensions of the Friendship Concept among Adolescents.” Social Networks 66:161–70.

53. Knight Carly. 2022. “When Corporations Are People: Agent Talk and the Development of Organizational Actorhood, 1890–1934.” Sociological Methods & Research 51(4):1634–1680.

54. Kossinets Gueorgi, Watts Duncan J. 2009. “Origins of Homophily in an Evolving Social Network.” American Journal of Sociology 115(2):405–50.

55. Kozlowski Austin C., Taddy Matt, Evans James A. 2019. “The Geometry of Culture: Analyzing the Meanings of Class through Word Embeddings.” American Sociological Review 84(5):905–49.

56. Leszczensky Lars, Pink Sebastian. 2019. “What Drives Ethnic Homophily? A Relational Approach on How Ethnic Identification Moderates Preferences for Same-Ethnic Friends.” American Sociological Review 84(3):394–419.

57. Levy Michael A. 2016. “gwdegree: Improving Interpretation of Geometrically Weighted Degree Estimates in Exponential Random Graph Models.” Journal of Open Source Software 1(3):36. doi:10.21105/joss.00036

58. Lewis Kevin, Kaufman Jason. 2018. “The Conversion of Cultural Tastes into Social Network Ties.” American Journal of Sociology 123(6):1684–742.

59. Lippa Richard, Connelly Sharon. 1990. “Gender Diagnosticity: A New Bayesian Approach to Gender-Related Individual Differences.” Journal of Personality and Social Psychology 59(5):1051–65.

60. Lizardo Omar. 2006. “How Cultural Tastes Shape Personal Networks.” American Sociological Review 71(5):778–807.

61. Logan Laura S. 2013. “Status Homophily, Sexual Identity, and Lesbian Social Ties.” Journal of Homosexuality 60(10):1494–519.

62. Long Hoyt, So Richard Jean. 2015. “Literary Pattern Recognition: Modernism between Close Reading and Machine Learning.” Critical Inquiry 42(2):235–67. doi:10.1086/684353

63. Magliozzi Devon, Saperstein Aliya, Westbrook Laurel. 2016. “Scaling Up: Representing Gender Diversity in Survey Research.” Socius 2:2378023116664352.

64. Mahalik James R., Lombardi Caitlin McPherran, Sims Jacqueline, Coley Rebekah Levine, Lynch Alicia Doyle. 2015. “Gender, Male-Typicality, and Social Norms Predicting Adolescent Alcohol Intoxication and Marijuana Use.” Social Science & Medicine 143:71–80.

65. McFarland Daniel A., Moody James, Diehl David, Smith Jeffrey A., Thomas Reuben J. 2014. “Network Ecology and Adolescent Social Structure.” American Sociological Review 79(6):1088–121.

66. McLean Paul Douglas. 2017. Culture in Networks. Malden, MA: Polity.

67. McMillan Cassie. 2019. “Tied Together: Adolescent Friendship Networks, Immigrant Status, and Health Outcomes.” Demography 56(3):1075–103.

68. McPherson Miller, Smith-Lovin Lynn, Cook James M. 2001. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology 27(1):415–44.

69. Mittleman Joel. 2022. “Intersecting the Academic Gender Gap: The Education of Lesbian, Gay, and Bisexual America.” American Sociological Review 87(2):303–35.

70. Mohr J. W., Bail C. A., Frye, Lena J. C., Lizardo, McDonnell T. E., Mische, Tavory, Wherry F. F. 2020. Measuring Culture. New York, NY: Columbia University Press.

71. Monk Ellis P. 2022. “Inequality without Groups: Contemporary Theories of Categories, Intersectional Typicality, and the Disaggregation of Difference.” Sociological Theory 40(1):3–27.

72. Moody James. 2001. “Race, School Integration, and Friendship Segregation in America.” American Journal of Sociology 107(3):679–716.

73. Mullainathan Sendhil, Spiess Jann. 2017. “Machine Learning: An Applied Econometric Approach.” Journal of Economic Perspectives 31(2):87–106.

74. Murphy Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press.

75. Nagata Jason M., Domingue Benjamin W., Darmstadt Gary L., Weber Ann M., Meausoone Valerie, Cislaghi Beniamino, Shakya Holly B. 2020. “Gender Norms and Weight Control Behaviors in U.S. Adolescents: A Prospective Cohort Study (1994–2002).” Journal of Adolescent Health 66(1):S34–41.

76. Nelson Laura K. 2020. “Computational Grounded Theory: A Methodological Framework.” Sociological Methods & Research 49(1):3–42.

77. Nelson Laura K. 2021. “Leveraging the Alignment between Machine Learning and Intersectionality: Using Word Embeddings to Measure Intersectional Experiences of the Nineteenth Century U.S. South.” Poetics 88:101539.

78. Newman M. E. J. 2003. “Mixing Patterns in Networks.” Physical Review E 67(2):026126.

79. Oosterhoff Benjamin, Poppler Ashleigh, Palmer Cara A. 2022. “Early Adolescents Demonstrate Peer-Network Homophily in Political Attitudes and Values.” Psychological Science 33(6):874–88.

80. Pachucki Mark A., Breiger Ronald L. 2010. “Cultural Holes: Beyond Relationality in Social Networks and Culture.” Annual Review of Sociology 36(1):205–24.

81. Pascoe C. J. 2012. Dude, You're a Fag: Masculinity and Sexuality in High School. Berkeley, CA: University of California Press.

82. Rivera Lauren A. 2012. “Hiring as Cultural Matching.” American Sociological Review 77(6):999–1022.

83. Robins Garry, Pattison Pip, Kalish Yuval, Lusher Dean. 2007. “An Introduction to Exponential Random Graph (P*) Models for Social Networks.” Social Networks 29(2):173–91.

84. Salganik Matthew J., et al. 2020. “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.” Proceedings of the National Academy of Sciences 117(15):8398–403.

85. Schaefer David R., Haas Steven A., Bishop Nicholas J. 2012. “A Dynamic Model of US Adolescents’ Smoking and Friendship Networks.” American Journal of Public Health 102(6):e12–8.

86. Schaefer David R., Kornienko Olga, Fox Andrew M. 2011. “Misery Does Not Love Company: Network Selection Mechanisms and Depression Homophily.” American Sociological Review 76(5):764–85.

87. Shakya Holly B., Domingue Ben, Nagata Jason M., Cislaghi Beniamino, Weber Ann, Darmstadt Gary L. 2019. “Adolescent Gender Norms and Adult Health Outcomes in the USA: A Prospective Cohort Study.” The Lancet Child & Adolescent Health 3(8):529–38.

88. Smith Sanne, McFarland Daniel A., Van Tubergen Frank, Maas Ineke. 2016. “Ethnic Composition and Friendship Segregation: Differential Effects for Adolescent Natives and Immigrants.” American Journal of Sociology 121(4):1223–72.

89. Smith Jeffrey A., McPherson Miller, Smith-Lovin Lynn. 2014. “Social Distance in the United States: Sex, Race, Religion, Age, and Education Homophily among Confidants, 1985 to 2004.” American Sociological Review 79(3):432–56.

90. Snijders Tom A. B. 2011. “Statistical Models for Social Networks.” Annual Review of Sociology 37(1):131–53.

91. Snijders Tom A. B., Baerveldt Chris. 2003. “A Multilevel Network Study of the Effects of Delinquent Behavior on Friendship Evolution.” Journal of Mathematical Sociology 27(2–3):123–51.

92. Snijders Tom A. B., Pattison Philippa E., Robins Garry L., Handcock Mark S. 2006. “New Specifications for Exponential Random Graph Models.” Sociological Methodology 36(1):99–153.

93.

Snijders

Tom A. B.

van de Bunt

Gerhard G.

Steglich

Christian E. G.

. 2010. “Introduction to Stochastic Actor-Based Models for Network Dynamics.” Social Networks 32(1):44–60.

94. So, Richard Jean, Hoyt Long, and Yuancheng Zhu. 2019. “Race, Writing, and Computation: Racial Difference and the US Novel, 1880-2000.” Journal of Cultural Analytics 3(2):11057.

95. So, Richard Jean, and Edwin Roland. 2020. “Race and Distant Reading.” PMLA/Publications of the Modern Language Association of America 135(1):59–73.

96. Thompson, Simon G., and Stephen J. Sharp. 1999. “Explaining Heterogeneity in Meta-Analysis: A Comparison of Methods.” Statistics in Medicine 18(20):2693–708.

97. Tilly, Charles. 2005. Identities, Boundaries, and Social Ties. Boulder, CO: Paradigm.

98. Vaisey, Stephen. 2009. “Motivation and Justification: A Dual-Process Model of Culture in Action.” American Journal of Sociology 114(6):1675–715.

99. Vaisey, Stephen, and Omar Lizardo. 2010. “Can Cultural Worldviews Influence Network Composition?” Social Forces 88(4):1595–618.

100. van Duijn, Marijtje A. J., Krista J. Gile, and Mark S. Handcock. 2009. “A Framework for the Comparison of Maximum Pseudo-Likelihood and Maximum Likelihood Estimation of Exponential Family Random Graph Models.” Social Networks 31(1):52–62.

101. Villalta, Sara Ivethe. 2022. “Masked Intersectional Inequalities Among Adolescents: Skin Tone Measurement, Skin Color Homophily in Adolescent Friendship Networks, and Skin Color Stratification in Educational Contexts.” Ph.D. Dissertation, University of California, Irvine.

102. Voyer, Andrea, Zachary D. Kline, Madison Danton, and Tatiana Volkova. 2022. “From Strange to Normal: Computational Approaches to Examining Immigrant Incorporation through Shifts in the Mainstream.” Sociological Methods & Research 51(4):1540–79.

103. Wasserman, Stanley, and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge, UK: Cambridge University Press.

104. Weber, Ann M., Beniamino Cislaghi, Valerie Meausoone, Safa Abdalla, Iván Mejía-Guevara, Pooja Loftus, Emma Hallgren, Ilana Seff, Lindsay Stark, Cesar G. Victora, Romina Buffarini, Aluísio J. D. Barros, Benjamin W. Domingue, Devika Bhushan, Ribhav Gupta, Jason M. Nagata, Holly B. Shakya, Linda M. Richter, Shane A. Norris, Thoai D. Ngo, Sophia Chae, Nicole Haberland, Katharine McCarthy, Mark R. Cullen, Gary L. Darmstadt, Margaret Eleanor Greene, Sarah Hawkes, Lori Heise, Sarah Henry, Jody Heymann, Jeni Klugman, Ruth Levine, Anita Raj, and Geeta Rao Gupta. 2019. “Gender Norms and Health: Insights from Global Survey Data.” The Lancet 393(10189):2455–68.

105. West, Candace, and Don H. Zimmerman. 1987. “Doing Gender.” Gender & Society 1(2):125–51.

106. Westbrook, Laurel, and Aliya Saperstein. 2015. “New Categories Are Not Enough: Rethinking the Measurement of Sex and Gender in Social Surveys.” Gender & Society 29(4):534–60.

107. Wimmer, Andreas, and Kevin Lewis. 2010. “Beyond and Below Racial Homophily: ERGMs of a Friendship Network Documented on Facebook.” American Journal of Sociology 116(2):583–642.

108. Yavorsky, Jill E., and Claudia Buchmann. 2019. “Gender Typicality and Academic Achievement among American High School Students.” Sociological Science 6(25):661–83.

109. Zhou, Di. 2022. “The Elements of Cultural Power: Novelty, Emotion, Status, and Cultural Capital.” American Sociological Review 87(5):750–81.