
Abstract

Status is central to understanding collaborative behavior, yet it is often difficult to measure in cultural fields where perceived standings are only partially observable. This study develops a scalable supervised machine learning approach to infer directed deference in collaboration networks using a partially observed status hierarchy derived from a ritualized site of status conferral (a televised competition series). Drawing on a longitudinal “featuring” network of more than 3,000 South Korean hip-hop artists, we train a classifier to learn how differences in status-relevant characteristics map onto observed deference patterns and then use it to estimate preferential attachment across all collaboration dyads. The resulting measure aligns closely with external expert assessments of artists’ relative standing. Applying this metric to streaming performance data, we show that collaboration improves listener engagement and that its effect varies nonlinearly with status distance: artists benefit both from partnering with higher-status collaborators and from featuring emerging talents.

Keywords

status, preferential attachment, collaboration networks, audience reception, machine learning

Introduction

Ever since Max Weber conceptualized status as originating from social evaluations, a fundamental sociological insight has been that actors are differently positioned in the social structure and that rewards are largely a function of status position (Gould 2002; Simmel 1950; Weber 1922). Status accumulates through acts of deference among market participants and is typically measured by an actor's affiliation with credited sources or third-party standings—a perceived social construct correlated with but distinct from intrinsic quality (Podolny 1993; Sauder, Lynn, and Podolny 2012). Empirical research has established the unique role of status in shaping perceptions of the actor and thus the flow of payments, resources, and opportunities available to them (Benjamin and Podolny 1999; Ertug and Castellucci 2013; Kovács and Sharkey 2014; Podolny 1993; Podolny and Phillips 1996; Rossman, Esparza, and Bonacich 2010; Rossman and Schilke 2014; Stuart 1998). The basic argument is that status functions as a heuristic for audiences to assess quickly the quality of products and behaviors and as a warranty for actors to reduce coordination costs in forming exchange relations with other partners.

Yet, status remains difficult to measure, particularly in fields without solid formal rankings, affiliations, or award systems. In such contexts, researchers often lack a full sampling frame for perceived relative standings, making it difficult to observe status hierarchies and identify deference. This article addresses a core methodological challenge: when status is partially observed, how can researchers quantify directed deference among collaborators? Work in the sociology of science, for example, has shown that junior scholars benefit from the prestige of senior collaborators or advisors through co-authorship and academic lineage (Azoulay, Graff Zivin, and Wang 2010; Li and Agha 2015; Maliniak, Powers, and Walter 2013; Mcdowell and Smith 1992; Merton 1968; Zuckerman 1977), demonstrating that status can be relationally transmitted and strategically leveraged. However, these studies typically rely on well-documented institutional hierarchies and formalized career paths. In contrast, cultural producers operate in informal networks where collaboration itself may signal deference, endorsement, or aspiration, but where status positions are harder to observe.

We propose a supervised machine learning (ML) strategy for recovering status hierarchies and measuring directed deference in a collaboration network. Our empirical setting is the South Korean hip-hop scene, where collaboration—typically in the form of “featuring” other artists on tracks—is widespread, and artists’ performance is clearly indexed as collaborative or individual. These featuring ties often reflect implicit status judgments, as artists selectively invite collaborators whose perceived standing may enhance or complement their own. To capture these dynamics, we leverage a partially observed status hierarchy derived from a long-running televised hip-hop competition structure—a ritualized site of status conferral that has played a central role in shaping public recognition and career trajectories in the field. Even with a partial status hierarchy, ML can help learn actors’ differences in quality and credential covariates—strong correlates with status positions—across perceived (labeled) status levels. Once trained with balanced data, the model can be used to predict preferential attachment in featuring dyads whose characteristics were likewise observed, but status levels were not perceived.

Our research design, combined with fine-grained two-year longitudinal data from the country's largest music streaming platform, allows us to theorize preferential attachment in collaborative cultural production and test its performance consequences. We find that our preferential-attachment measure, scaled from status conferrals qualitatively observed in a competitive media format, closely corresponds to external expert ratings from rappers, critics, and fans collected through a status position survey. Using this novel metric, we test whether collaboration enhances artists’ ability to attract user likes and how this effect varies with the degree of preferential attachment. Contrary to the usual expectation that artists benefit from piggybacking on a higher-status collaborator's fame, we find a U-shaped effect of preferential attachment on performance: not only featuring a higher-status artist but also “picking up” emerging talents elicits positive listener ratings. Our scalable measurement approach, which combines digital trace data, inference from influential media-based status conferrals, and a collaboration network, offers a framework for revisiting organizational and cultural questions about strategic collaboration that have been constrained by traditional data limitations.

Theoretical Background

How Status Distinctions Have Been Identified in Research

Status is broadly understood as the position in a social hierarchy resulting from accumulated acts of deference (Goode 1978; Podolny and Phillips 1996; Sauder et al. 2012; Whyte 1943). In this respect, status can be understood as the stock corresponding to the flow of deference. Joel Podolny's status-based model of market competition argues that a producer's status shapes its competitive advantages and constraints (Podolny 1993). The model conceptualizes status as a market signal. That is, status reflects how market participants perceive a producer's offerings, with varying levels of deference attached to competitors, rather than an objective measure of product quality.

What are the acts of deference that give rise to status distinctions? Actors rely on affiliation and arbiters. First, actors acquire status by affiliating with credentials of status-valued resources. Associations and affiliations are observable indicators of attachment or “gestures of approval.” For instance, Benjamin and Podolny (1999) measure status by investigating how California wineries affiliate with appellations. By deferring to a particular appellation, wineries signal that their wine is of a quality consistent with the wine of other producers who also affiliate with that same appellation. A winery that continually affiliates with high-status regions will have higher status than a winery that persistently affiliates with low-status regions. They measure status ordering by Bonacich's centrality of cross-regional citations, which indicates an appellation's status position relative to that of other appellations. This cross-regional citation behavior is claimed to be correlated with, yet distinct from, wineries’ quality measured by expert ratings from magazines (Benjamin and Podolny 1999). Likewise, scholars have focused on affiliations manifested in exchange relations, including patent citations in the semiconductor industry (Podolny and Stuart 1995; Stuart 1998), painters’ reputations inferred by the centrality of art galleries in the co-exhibition network (Fraiberger et al. 2018), and PhD exchanges among academic departments (Burris 2004). Researchers often leverage status-ordered documents, such as investment banks in the tombstone advertisements of security offerings (Podolny 1993, 1994; Podolny and Phillips 1996), and star powers measured by centrality in the network of screen credits (Rossman et al. 2010).

Second, actors’ deference is often granted by third-party judgments. Arbiters making critique, review, or quality evaluations are usually outsiders to the status hierarchy of focal actors; for example, restaurants rely on Michelin, and universities on the QS ranking, and so on. An influential study by Rao (1994) examines how certification contests shape organizational status, demonstrating that third-party assessments—such as Moody's ratings for insurance firms, Michelin and AAA rankings for restaurants, and J.D. Power's evaluations of automobiles—play a pivotal role in shaping public perceptions. These external evaluations influence how organizations are positioned in status hierarchies, ultimately affecting their likelihood of survival in competitive markets. These forms of external arbiters often materialize deference by the conferral of awards. Scholars have captured the effect of status distinctions by examining prestigious book prizes (Kovács and Sharkey 2014), the Oscar awards in the film industry (Perretti and Negro 2006; Rossman et al. 2010; Rossman and Schilke 2014), Grammy awards and nominations (McMillan 2022; Negro, Kovács, and Carroll 2022), and awards of museums and galleries in the contemporary art field (Ertug et al. 2016).

We extend this line of research by addressing two lacunae. The first concerns empirical scope. How can we identify the acts of deference (and thereby status hierarchy) when sources of status are not fully available for all actors? When status is measured by affiliations, the empirical scope has been limited to special cases where a “sampling frame” is known, typically “written” directories or credentials from a population of organizations. But this is often not the case, particularly when we scale down the analytic level from organizations to individuals. Entrepreneurs do not necessarily register themselves on a list, and their behaviors seen as deference are relatively more implicit and unobservable. When status is measured by deference granted by arbiters, external evaluation systems or award systems are inherently left-censored: they are selective toward successful actors, especially in a winner-take-all market. Further, award winners in popular culture are too few: the discrete classification of awardees and non-awardees might mask the gradation of status differences between actors, not to mention that the standards for awards sometimes decouple from standards of fame in the public (e.g., aesthetics vs. popularity).

The second research gap pertains to actors’ status-building processes by direct collaboration. While prior studies have documented how actors cite or affiliate with status-valued resources (Stuart 1998), and how team composition affects recognition (Rossman et al. 2010), these approaches often treat collaboration as a proxy for deference rather than a mechanism of status transfer in its own right. Yet collaboration can itself be a strategic act of deference—where the joint product signals endorsement, competence, or prestige. Co-authorship in academia exemplifies this: junior scholars gain visibility and credibility by working with elite mentors (Merton 1968; Zuckerman 1977), and their scholarly impact often depends on the status of their collaborators (Maliniak et al. 2013; Mcdowell and Smith 1992). The death of a prominent scientist has been shown to reduce the productivity and citation impact of junior collaborators (Azoulay et al. 2010), while co-authorship with high-status figures increases the likelihood of grant success and publication (Li and Agha 2015). These studies demonstrate that status can be relationally transmitted and strategically leveraged in institutionalized academic settings. However, less attention has been paid to how similar processes unfold in cultural fields where status is informal, fluid, and negotiated through collaborative outcomes. Our study extends this literature by examining how status hierarchies are enacted and reproduced through collaboration in contexts where formal markers are absent, and deference must be inferred from relational patterns.

An ML Application for Dyadic Deference in Collaboration

We propose applying supervised ML to address the problems of limited observability and collaboration as acts of deference (Figure 1). This research design presumes that researchers partially observed a status hierarchy from a representative arbiter and that an actor, i, invites another, j, to co-perform, moving across the ordered levels of status (Figure 1A). The key empirical focus rests on the direction of the deference dyad created by the invite on a co-product: upward or downward (Y_i). Y_i is straightforward for the actors, i^L, whose status is labeled in the external arbiter, but is not straightforward for other actors, i^U, whose status is not labeled that way. Without the ML research design, one might simply rely on observable proxies, such as quality indicators and differences between actors, and label i^U that way. However, this approach would overlook the essence of status as social evaluations, which are somewhat distinct from quality and require a certain degree of consensus pronounced in the field.

Figure 1.

Research design applying supervised machine learning to classify the direction of dyadic deference in collaboration from a partially observed status hierarchy.

A core innovation is to leverage supervised machine learning to recover the direction of deference of a collaboration tie for i^U by learning how the direction of deference was determined as a function of the difference in quality and credential covariates, X, between all i^L and j^L. More intuitively, if we know the quality and credential characteristics for the population of actors (both i^L and i^U) and that quality correlates with status, as the literature suggests, why not predict the direction of deference from what the model has learned about how large a difference in each covariate of the dyad is needed for a tie to be classified as upward or downward? In other words, we can amplify researcher coding on deference in collaboration (Lundberg, Brand, and Jeon 2022).
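As a minimal sketch of the data layout this design implies, the snippet below builds one row per featuring tie: the covariate differences between the invited artist j and the inviter i, plus a direction label when both artists appear in the observed hierarchy. The artist names, the two covariates (log cumulative likes and years active), and the choice to leave same-level dyads unlabeled are illustrative assumptions, not details taken from the study.

```python
# Ordered status levels from the arbiter (higher number = higher status).
STATUS_RANK = {"Unsuccessful": 0, "Qualifier": 1, "Judge": 2}

def dyad_features(x_i, x_j):
    """Covariate differences X_j - X_i between invited artist j and inviter i."""
    return [b - a for a, b in zip(x_i, x_j)]

def deference_label(status_i, status_j):
    """Return 1 for an upward tie (j outranks i), 0 for a downward tie.

    Returns None when either artist is unlabeled or both share a level
    (an assumption made here for illustration).
    """
    if status_i is None or status_j is None:
        return None
    r_i, r_j = STATUS_RANK[status_i], STATUS_RANK[status_j]
    if r_i == r_j:
        return None
    return 1 if r_j > r_i else 0

# Hypothetical artists: (status level or None, [log cumulative likes, years active])
artists = {
    "A": ("Judge", [11.2, 12.0]),
    "B": ("Qualifier", [8.5, 4.0]),
    "C": (None, [6.1, 2.0]),
}
featuring_ties = [("B", "A"), ("A", "B"), ("C", "A")]  # (inviter i, invited j)

rows = []
for i, j in featuring_ties:
    (s_i, x_i), (s_j, x_j) = artists[i], artists[j]
    rows.append((i, j, dyad_features(x_i, x_j), deference_label(s_i, s_j)))

labeled = [r for r in rows if r[3] is not None]    # usable for training
unlabeled = [r for r in rows if r[3] is None]      # direction to be predicted
```

Rows in `labeled` correspond to the i^L dyads and rows in `unlabeled` to the i^U dyads whose direction a trained model would estimate.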

One can use ML algorithms such as random forests to train and obtain the most accurate model, $\hat{f}(X_i^L, X_j^L)$, predicting Y_i (Figure 1B). For co-products whose direction of deference is not labeled, the trained model $\hat{f}$ can be used to estimate and classify i^U's deference behavior based on X_i^U and X_j^U (Figure 1C). It can be represented by the probability $\hat{p}_l$ of the soft voting process in the random forest algorithm. The probability representation allows us to capture cases where actors with similar statuses collaborate, which are hard to classify into the binary Y_i (i.e., $\hat{p}_l$ ≈ .5). This approach is valid if the training set and test set share similar distributions of covariates (Weiss, Khoshgoftaar, and Wang 2016).
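To make the soft-voting idea concrete, here is a toy stand-in written in plain Python: an ensemble of one-split trees ("stumps"), each fit on a bootstrap sample of labeled dyads, whose per-tree probabilities are averaged. This is a deliberately simplified sketch, not the study's model; a real analysis would more likely use a full random forest from a library such as scikit-learn. The feature values, the number of trees, and all function names are hypothetical.

```python
import random

def fit_stump(X, y):
    """Fit a one-split tree; return (feature, threshold, p_left, p_right).

    p_left and p_right are the shares of upward ties (y == 1) on each side.
    """
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[k] for k, row in enumerate(X) if row[f] <= t]
            right = [y[k] for k, row in enumerate(X) if row[f] > t]
            errors = (min(sum(left), len(left) - sum(left))
                      + min(sum(right), len(right) - sum(right)))
            if best is None or errors < best[0]:
                best = (errors, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:  # no feature varies in this bootstrap sample
        share = sum(y) / len(y)
        return (0, float("-inf"), share, share)
    return best[1:]

def fit_ensemble(X, y, n_trees=200, seed=0):
    """Fit each stump on a bootstrap resample of the labeled dyads."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        ensemble.append(fit_stump([X[k] for k in idx], [y[k] for k in idx]))
    return ensemble

def predict_upward_proba(ensemble, row):
    """Soft vote: average the per-tree probability that the tie is upward."""
    total = 0.0
    for f, t, p_left, p_right in ensemble:
        total += p_left if row[f] <= t else p_right
    return total / len(ensemble)

# Hypothetical labeled dyads: [difference in log likes, difference in years active]
X_labeled = [[2.7, 8.0], [1.9, 5.0], [3.1, 6.0], [0.6, 2.0],
             [-2.7, -8.0], [-1.9, -5.0], [-3.1, -6.0], [-0.6, -2.0]]
y_labeled = [1, 1, 1, 1, 0, 0, 0, 0]  # balanced: 1 = upward, 0 = downward

ensemble = fit_ensemble(X_labeled, y_labeled)
p_up = predict_upward_proba(ensemble, [2.5, 6.0])      # invited artist far ahead
p_mid = predict_upward_proba(ensemble, [0.0, 0.0])     # near-equal artists
p_down = predict_upward_proba(ensemble, [-2.5, -6.0])  # invited artist far behind
```

In this toy setup, a tie between near-equal artists receives a probability that falls between the clearly upward and clearly downward cases, which is the property that makes the continuous score more informative than a hard binary label.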

The predicted deference in collaboration captures a new dimension of the behavioral effects of status. We use the term “preferential attachment” to conceptualize how status shapes deference in collaboration. Preferential attachment in network science suggests that the more connected a node is, the more likely it is to receive new links (Barabási and Albert 1999). For instance, when choosing between two webpages, one with twice as many links as the other, about twice as many people link to the more connected page. Likewise, actors are more likely to invite and collaborate with those positioned higher in the status system because of their visibility. High-status actors would often be chosen as referents with more connections with peers and are subsequently preferred by these peers to collaborate again—a relational process that leads to accumulated advantage or the Matthew effect (DiPrete and Eirich 2006; Merton 1968).
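The webpage example can be written out using the linear attachment rule of the Barabási and Albert model, in which the probability that a new link attaches to node i is proportional to its current degree k_i:

$$
\Pi(i) = \frac{k_i}{\sum_j k_j}
$$

For two pages a and b with k_a = 2 k_b, the ratio of attachment probabilities is therefore

$$
\frac{\Pi(a)}{\Pi(b)} = \frac{k_a}{k_b} = 2 .
$$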

Our claim is that preferential attachment in collaboration creates advantages that are distinct from the advantages rooted in one's own status position. While the status effect means that the individual position occupied in the status hierarchy leads to differential returns on performance (Baum and Oliver 1992; Ertug and Castellucci 2013; Podolny 1993; Podolny, Stuart, and Hannan 1996; Rossman et al. 2010; Stuart 1998; Stuart and Ding 2006), the preferential attachment effect suggests that gains pertaining to relative status in a collaboration tie lead to differential returns on performance. Even with the same status position and comparable opportunities to collaborate, the direction of deference (collaborator's relative status) can have different consequences for actors’ performance because audiences may perceive such deference patterns as signals that enhance or undermine the actor's status position. Ties can provide informational cues to audiences about the underlying quality of one or both of the market actors (Podolny 2001).

With regard to the preferential attachment effect, we test two contrasting hypotheses. The first hypothesis is that preferential attachment will positively affect the actor's rewards in the market (H1). A primary mechanism is signaling: ties to high-status actors can serve as signals of quality or identity, especially when such attributes are difficult to assess (Spence 1973). A lower-status actor engaging a more established one on his or her product may signal to audiences that the actor's quality is good enough for the (relatively) higher-status actor to collaborate and willingly endorse it. Such an “upward” tie may attract the invited actor's existing market base, leading to greater returns for the focal actor. Thus, the farther an actor climbs up the status ladder by collaborating with a higher-status actor, the greater the advantages in performance. Conversely, downward-status ties can potentially harm the reputation of the focal actor. Firm-level studies have reported a fear of associating with less prominent firms (Podolny 2008), and disadvantages have been found among university sports teams associated with lower-status teams (Washington and Zajac 2005).

The second hypothesis is that preferential attachment will manifest as a U-shaped relationship with the actor's rewards in the market (H2). A primary mechanism is visibility, which status distinctions potentially create. Social network research has long observed that similar actors tend to interact (Blau 1977; Homans 1950; McPherson, Smith-Lovin, and Cook 2001). Status homophily likely drives who collaborates with whom, as that helps reduce risks and costs, which makes the combination of high- and low-status actors unusual. How would this unusual combination be perceived by audiences? Note that, in H1, we characterize downward status ties as harmful to one's own status. However, it is also possible that audiences interpret the unusual collaboration as actors “picking up” newly entered collaborators whose quality is comparable but not sufficiently exposed to them. This can occur when parts of the field's status hierarchy are fluid rather than fixed, partly the reason why new star actors can sometimes emerge because audiences are open to a certain degree. Thus, H2 is different from H1 in that there will be advantages at both ends of the scale of preferential attachment. In other words, the collaboration will help the least when the direction of deference is ambiguous.
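One way to see how the two hypotheses differ is to write them against a common functional form. The quadratic specification below is an illustrative assumption rather than the model estimated in the study; y denotes an artist's reward and $\hat{p}$ the predicted probability that a tie is upward, centered at the ambiguous value .5:

$$
y = \beta_0 + \beta_1 (\hat{p} - 0.5) + \beta_2 (\hat{p} - 0.5)^2 + \varepsilon
$$

Under H1, rewards rise monotonically with preferential attachment, so $\beta_1 > 0$ and $\beta_2 \approx 0$. Under H2, the curve is convex, so $\beta_2 > 0$. Setting the derivative $\beta_1 + 2\beta_2(\hat{p} - 0.5)$ to zero gives the turning point

$$
\hat{p}^{*} = 0.5 - \frac{\beta_1}{2\beta_2},
$$

which must lie inside (0, 1) for advantages to appear at both ends of the scale, and which sits near .5 when collaboration helps the least for ambiguous ties.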

Empirical Context

We implement the ML deference classifier and test the preferential attachment effect using a hip-hop collaboration network in South Korea. The nature of hip-hop collaboration provides a unique opportunity to assess the deference relationship in a co-product wherein actors invite or are invited by alters, and its effect on audiences’ responses. Hip-hop artists do not write or perform music in social isolation. Instead, the genre's proliferation has come with both informal and formal networks of support, rivalry, and collaboration (Gienapp, Kruckenberg, and Burghardt 2021; Halgin, Borgatti, and Huang 2020; McMillan 2022; Smith 2006). Hip-hop artists often publicly feature others on their songs; it is quite common to see “feat.” in the song titles of hip-hop music tracks. For instance, Dr. Dre's full-length album “2001,” released in 1999, features other artists on 21 of its 22 tracks. The variety of featured rappers adds different flavors to songs on the same album, so much so that it arguably raises the overall quality of the album and perhaps keeps listeners playing through all 22 tracks without boredom. Still, audiences clearly recognize that the cultural product (<2001>) belongs to an individual artist (Dr. Dre) and that he invited other artists to collaborate. It is also likely that audiences conjecture about the social relationships between Dr. Dre and the featured artists.

The featuring network of hip-hop music raises interesting sociological questions. For young artists, the opportunity to feature on the famous Dr. Dre's album would mean making a name for themselves, thereby jumpstarting their careers and rising to commercial prominence. Dr. Dre might have involved a more established artist in his tracks as an act of deference and perhaps performed better than if he had performed alone. Artists with no fame, unlike Dr. Dre, would try to release tracks that feature already-established artists (preferential attachment) and score a “hit.” All these speculations involve an implicit status hierarchy that actors in the field commonly have in mind. Our theoretical hypotheses above now turn into an empirical question: How do artists’ status positions shape who features whom in collaboration, and how does preferential attachment in collaboration shape differences in audience reception?

While it is always challenging for researchers to identify hierarchical status positions, the hip-hop scene of Korea¹ offers an analytical advantage for undertaking that task. We leverage the archive of a widely recognized TV audition program series (“Show Me The Money,” SMTM hereafter). A competition reality show open to the public (about 25,000 applicants on average), SMTM has grown in popularity from its first season in 2012 to its eleventh season in 2022, and it is credited with increasing the South Korean public's interest in hip-hop. The qualifiers of an SMTM season are given opportunities to work with their team producers/judges and release songs, many of which become wide hits with a “chart-in.” Most independent hip-hop artists deem SMTM a primary entry point to the music market. The program's judges in each season (about eight artists) are carefully selected and recruited by the program's staff, based on the fame and deference that fans and fellow artists accord them. The judges hold full authority to qualify applicants and produce individual or collaborative tracks for them. The qualifiers eventually make a name for themselves with the public audience and typically carry that honor down the road.

While the format of SMTM varies slightly from season to season, contestants on the show compete in different challenges through several knock-out rounds until only one rapper is left; the winner is awarded ₩100,000,000 (≈ $72,487). A season of SMTM begins by announcing (typically) four teams of multiple “producers” cast by the MNET broadcasting team (Figure 2A). The producers are experienced, popular rappers who have already achieved significant success and are thus able to assess and offer advice to participants. They play an “examiner” role in screening out thousands of participants performing a cappella in the first round, held in a large indoor arena. They then exercise full authority to advance or eliminate contestants based on solo rap performances or rap battles in the second and third rounds.

Figure 2.

Competition settings of Show Me The Money (SMTM).

To help understand the competition structure, we provide some scenes of the solo-rap contest in Figure 2B and C. A contestant (the one wearing a white shirt in the middle of the circled stage) raps with a self-chosen beat in a one-minute span. The screen below the producers’ seating initially displays “PASS” but can turn to “FAIL” if they decide to reject the contestant. If the contestant ends up finishing it with at least one PASS, then he or she advances to the next round. It partly resembles a journal review process in which the paper under review is rejected if, in this case, all four reviewers give a “fail.” As soon as the four FAIL signs pop up, the beat stops immediately, and the unsuccessful contestant will be surrounded by flames (while rapping) and pulled down to the bottom floor (Figure 2C).

The setting of the program is sociologically interesting as it appears to be a spatial (vertical) representation of status-generating or status-reinforcing processes. Indeed, contestants often express respect and honor to producers while answering questions before performing, although such talks might not factor in for the judgment of producers (they want to be objective and opt out when the contestant belongs to the same crew/label as them). We contend that actors and audiences witnessing all these processes may perceive and embody a status hierarchy between producers (judges), qualifiers, and the unsuccessful.

Team selection is a central feature of the show. Producers recruit team members from participants who have passed multiple preliminary competition rounds, including the aforementioned one-minute solo rapping contest. This stage of team selection often serves as a ritual where judges give deference to participants—sometimes even competing with one another and pleading to secure them for their teams. In this way, selected participants are clearly distinguished from other unqualified ones. Selected team members enjoy several privileges, such as access to catchy beats, top-tier production for their SMTM tracks on their competition stages onward, and performance opportunities with extensive media coverage throughout the competition. Consequently, we define the Qualifier as the subgroup of participants chosen to join a producer's team.

The Qualifiers then start producing songs under the guidance of or in collaboration with their producers. The quality of songs and artists’ live performances, assessed by a diverse set of judges including an “external” group of rappers, university hip-hop associations, or general public audiences, is a major factor in advancing through further stages to the final. These songs are released on the music streaming platforms “on the fly” as a season of SMTM progresses.² The popular show attracts a much larger viewership than the usual fan base of hip-hop music; therefore, the Qualifiers can make a name for themselves with the public and enjoy (otherwise impossible) high returns with some songs that become wide hits with a “chart-in.” That is the main incentive for many “underground” rappers to regard the show as the field's standard (recognition from rappers with authority) and to participate in it.

We conceptualize SMTM's competition structure as a site of ritualized status conferral, where symbolic acts of deference, judgment, and selection publicly enact and reinforce hierarchical distinctions in the Korean hip-hop field. Drawing on Goffman's notion of interaction rituals (Goffman 1967) and Collins’ theory of ritual chains (Collins 2004), we treat these performances not as isolated gestures but as components of a patterned social process through which recognition, symbolic capital, and emotional energy accumulate. As Collins writes, “successful rituals generate emotional energy in the form of confidence, enthusiasm, and initiative” (Collins 2004:48)—repeated symbolic interactions such as public praise, deference, and performance validation contribute to the visible concentration of status among contestants and producers. The flaming descent of rejected contestants, the vertical spatial arrangement of producers and performers, and the emotionally charged team selection scenes all serve as ritual mechanisms that encode and reproduce status hierarchies. These field-native status signals offer a culturally grounded alternative to conventional status proxies, enabling scalable inference while remaining embedded in the symbolic logic of the field.

The decade-long history of SMTM is then a partially observed but credible arbiter for us—the Judges, Qualifiers, and Unsuccessful participants of SMTM together comprise the three levels of a status hierarchy. With its advantages over other short-lived spin-off competition programs (see Table S1 in Supplemental Material 1.1), the SMTM archive will serve as a main status source for us to implement the ML workflow for deference in collaboration.

Recovering a stratified deference structure with the use of the field's major competition allows us to consider both status and quality at scale. There are a handful of sociological studies that examined status and collaboration in the hip-hop industry. But their status measurements rely on past demonstrations of quality, such as historical album sales (Halgin et al. 2020) or rare achievements like a Grammy (McMillan 2022). These studies are also limited in the coverage of samples, focusing on a subset of rappers who have released “diss songs” (Halgin et al. 2020) or elite rappers who made Billboard's year-end “Hot 100” R&B/hip hop charts (McMillan 2022). This article advances this body of research by leveraging digital trace data (a music streaming platform) to collect longitudinally the performance of songs produced by a population of hip-hop artists and thus examine status dynamics at positions and career stages as general as possible.

Methods

This study devises a novel research design that recovers deference relationships in a collaboration network of individual cultural producers. Achieving this aim is relevant for many popular culture fields since status determinants conventionally used (organizational affiliation and awards) are typically either unobserved or established only for elites in the winner-take-all market (thereby, most of the actors are “unlabeled”).

Data Sources for Artists’ Quality and Performance

The main data source will be Korean hip-hop artists’ longitudinal performances and the metadata of their songs from the largest music platform in Korea—Melon (https://www.melon.com). As a South Korean online music store and music streaming service introduced in 2004, Melon defeated Apple Music, Spotify, and YouTube Music to become the most famous music platform in Korea and achieved success with over 28 million users. Melon offers a tab in which users can navigate new domestic releases by the album's genres (ballad, dance, rap/hip-hop, R&B/soul, indie, rock/metal, and trot). The genre is chosen by the artist. To include the population of active, current hip-hop artists as completely as possible beyond a paucity of elites, we first created a list of Korean hip-hop artists who have released any song in the rap/hip-hop tab in the last two years (5,862 artists). In the next two-year period from mid-2021 to mid-2023, we conducted weekly web-scraping on these hip-hop artists’ Melon pages that include past and updated information about their released songs. Our focal dataset includes 3,694 artists who have released at least one new song since the beginning of our data collection and thus are eligible for analysis.

The main empirical interest of our study lies in examining how artists of varying quality perform in terms of audience recognition on the Melon streaming platform. The platform's core indicator of audience recognition is how often listeners liked an artist's songs: the “like” counts³ displayed for each song on Melon. A logged-in user (whose identity has been verified) can like a song only once, and can later cancel the like.⁴ Our weekly scraping tracked the like counts of both “old” and “new” songs, that is, songs released before and after the beginning of our study. We use this indicator to measure artists’ performance and quality. Our main outcome, artist-level performance, is constructed from the like counts of the new songs (released during our two-year study period) within fixed time windows after release. Fixing time windows enables us to evaluate the recognition of songs produced by artists of varying quality over the same length of time. This strategy also helps us address artists’ irregular release schedules during the two-year study period (e.g., the as-of-now performance of a song released just a week ago is not comparable to that of another song released a year ago) and account for the fact that listeners’ attention to released songs decays over time.
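
The fixed-window idea can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `likes_within_window`, the snapshot layout (one cumulative like count per weekly scrape), and the like counts themselves are hypothetical.

```python
from datetime import date, timedelta

def likes_within_window(release, snapshots, weeks=8):
    """Like count at the last weekly snapshot taken within `weeks` weeks
    of release, or None if the window has not been fully observed yet.

    snapshots: list of (scrape_date, cumulative_like_count) tuples.
    """
    cutoff = release + timedelta(weeks=weeks)
    # A song whose latest scrape precedes the cutoff is not yet comparable.
    if not snapshots or max(d for d, _ in snapshots) < cutoff:
        return None
    inside = [(d, n) for d, n in snapshots if release <= d <= cutoff]
    if not inside:
        return None
    return max(inside)[1]  # count at the latest in-window snapshot

# Hypothetical song scraped weekly for ten weeks after release.
release = date(2022, 7, 27)
counts = [0, 10, 18, 25, 30, 34, 37, 39, 40, 41]
snaps = [(release + timedelta(weeks=w), c) for w, c in enumerate(counts)]

print(likes_within_window(release, snaps))      # 40 (count at week 8)
print(likes_within_window(release, snaps[:2]))  # None (only one week observed)
```

Returning `None` for songs that have not yet been observed for the full window keeps a week-old release from being compared with a year-old one.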

While performance corresponds to the immediate success of an artist's current cultural product (i.e., current recognition flow), we treat an artist's quality as past recognition stock, assuming that people cumulatively retrieve an artist's track record when evaluating their overall caliber. We measure quality as the cumulative like counts that the artist received for all old songs, with no lower bound on release date before the study period. Benjamin and Podolny (1999) similarly measured a winery's quality using the past performance (ratings) of its wines. In the streaming era, platform metrics have become key indicators of not just an artist's reach but also their creative quality, signifying how deeply an artist's work resonates with audiences. For example, magazines and news articles often highlight Spotify streaming counts to emphasize an artist's success or cultural impact. A notable instance is the coverage of Billie Eilish's rise to fame, which frequently cited her cumulative Spotify milestones, such as surpassing billions of streams for her hit “bad guy.”⁵

We also collected the metadata of more than 40,000 albums and that of more than 200,000 songs from Melon and K Hip Hop Wiki (https://khiphop.fandom.com/). The meta information includes active years, the total number of released songs and albums, crew membership, label information, award history, and SMTM participation. The K Hip Hop Wiki provides rich information regarding the history of SMTM participation. We supplement this information by hand-coding upon the information extensively recorded on the SMTM pages of a Korean wiki source (Namuwiki⁶). In Melon, artists and songs are well indexed by a unique identifier that Melon assigns. Those identifiers are publicly accessible by researchers. This indexed nature allows us to complete the key task of identifying collaborated songs and, precisely, the connection of musicians by the “featuring” behavior.

An ML Classifier for Dyadic Deference in Collaboration

A key methodological challenge is to tell the direction of deference among artists who collaborate on a song. As illustrated above, we devise an ML classifier that learns from an observed arbiter of the field's status hierarchies and predicts individuals’ status differences in the population's collaboration dyads. That is, given a collaboration dyad of artists whose deference direction cutting across perceived status levels of the competition show is known (labeled), one can train a model to learn the extent to which differences in quality and credential characteristics in the dyad should exist to classify it as upward or downward. This model can then be used to predict deference direction for collaboration dyads whose quality and credential characteristics were likewise observed, but status levels were not perceived (unlabeled).

Figure 3 presents the overall architecture of deference predictions between labeled and unlabeled data. The supervised ML design begins with the construction of pairs (i, j) of artists from collaborative songs, alongside their individual quality covariates, as the primary input. Deference direction (outcome labels/target outputs Y_i for D^l) is labeled based on the ternary status levels of a decade-long archive of the annual competition show (SMTM). Model training utilizes a random forest algorithm to learn the labeled deference direction, Y_i, as a function of the dyadic covariate difference, C_i,j. The primary output is the predicted deference on the unlabeled artist pairs, p_i,k. Below, we describe these procedures in detail.

Figure 3.

The supervised Machine Learning (ML) architecture.

Labeling deference direction. The archival data on the history of SMTM participants gives us the field's status standings. Each artist i holds one of three status levels, status_i^l ∈ {judge, qualifier, unsuccessful}, ordered judge > qualifier > unsuccessful, for which 18, 161, and 226 artists were identified, respectively (N = 405). Unlike other singing reality competition TV series (e.g., The Voice) that aim to find unsigned talent from public auditions, it is rare to find (unknown) amateurs among the SMTM qualifiers because the intensity and difficulty of the prior competition rounds filter them out. This is why we were able to identify almost all SMTM qualifiers on Melon and retrieve their past and current quality records. We were also able to identify a good number of unsuccessful participants in recent seasons because SMTM required applicants to post their intentions on social media, tagging “SMTM.” We acknowledge that this feature of the show may have led us to pick up relatively more contemporary artists in the unsuccessful category. The labeled data D^l contained the collaborated songs of these 405 artists. We excluded competition tracks from the show (i.e., songs released as part of SMTM episodes) from the training set since the collaborations in these instances are orchestrated by the program rather than reflecting genuine artistic deference.

Given a featured artist j in D^l on i's song k (song_i,k ∈ Song_i), the combination of status_i and status_j assigns a discrete indicator of i's direction of deference:

Y_{i}^{song_{k}} = \begin{cases} \text{upward}, & status_{i} < status_{j} \\ \text{downward}, & status_{i} > status_{j} \end{cases}

(1)

We label the former as preferential attachment (PA) and the latter as pick-up (PU). Collaborated songs within the same status category, that is, (judge, judge), (qualifier, qualifier), and (unsuccessful, unsuccessful), are not treated as “status homophily” and are excluded from the training set because whether status_i and status_j are equivalent was not part of the pronounced judgments of the external arbiter (SMTM). This leaves a total of 777 songs (n_pa = 457, n_pu = 320) in D^l. Labeling equivalent deference would reduce accuracy and external validity (see additional reasoning and performance analysis in Supplement Material 1.5). The unlabeled data D^u is the complement set of D^l: the 24,009 collaborative songs in which at least one of the constituent artists was not recognizable in terms of the field's status standing (i.e., non-SMTM participants). We offer a concrete example of how we label collaborative songs according to status levels identified in SMTM archives in Figure S1 in Supplement Material 1.2.
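
The labeling rule in equation (1), together with the exclusion of same-status dyads, can be sketched as below. The function name `label_dyad` and the numeric ranks are illustrative assumptions; only the ordering judge > qualifier > unsuccessful comes from the text.

```python
# Ordered SMTM status levels from the text: judge > qualifier > unsuccessful.
# The integer ranks are an illustrative encoding of that ordering.
STATUS_RANK = {"unsuccessful": 0, "qualifier": 1, "judge": 2}

def label_dyad(status_main, status_featured):
    """Label the main artist's deference direction toward the featured artist.

    Returns "PA" (upward), "PU" (downward), or None when no label is
    assigned: same status level, or an artist outside the SMTM archive.
    """
    if status_main not in STATUS_RANK or status_featured not in STATUS_RANK:
        return None  # at least one artist has no observed status level
    diff = STATUS_RANK[status_featured] - STATUS_RANK[status_main]
    if diff > 0:
        return "PA"  # main artist features a higher-status artist
    if diff < 0:
        return "PU"  # main artist features a lower-status artist
    return None      # equal status: left out of the training set

print(label_dyad("qualifier", "judge"))     # PA
print(label_dyad("judge", "unsuccessful"))  # PU
print(label_dyad("judge", "judge"))         # None
```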

We do not further differentiate status levels within the qualifiers (recruited SMTM team members). Given the program's structure, competition rounds after team formation often rely on non-individual assessments—typically team versus team—driven by entertainment needs. As a result, evaluating individual distinctions based on team performance may not be entirely accurate. While ultimate season winners across eleven seasons could be considered distinct from the rest of the qualifiers, they form a very small subset of six; some winners later served as Judges in subsequent seasons and are already categorized as such in our data. Consequently, even if the additional “Winner” level were introduced, only a small number of collaboration dyads would fall into that category, resulting in minimal impact. More substantively, winners and the rest of the qualifiers show only marginal differences in productivity, quality, and overall performance. This may reflect the fact that, while winners enjoy material and symbolic rewards in the moment, their overall airtime in the program is not significantly greater, given that the competition features many participants throughout its entirety. During an expert interview with a former SMTM star (Hanhae), he confessed that surviving until the team selection stage and releasing a track is the goal for many rappers, and that there is not much difference between achieving this and going further to be the ultimate winner of a season (see Supplemental Material 1.4).

Covariates and feature engineering. All artists, whether SMTM participants or not, have a column vector of quality covariates, X_i. These include:

Quality (past cumulative like counts): Quality covariate is the cumulative like counts that the artist received for all the old songs a week before a focal new song of an artist was released. Again, our assumption is that, in popular culture, a musician's quality is determined by the degree to which their songs were recognized and accepted by listeners.

Active years: Career years since the release of his or her first song.

Productivity: The number of new songs and albums released in the study period.

Award history: A discrete covariate indicating whether the artist has won an award.

Recognized crew membership: A discrete covariate indicating whether the artist has ever belonged to one or more crews recognized in K Hip Hop Wiki.

Recognized label/distributor company membership: A discrete covariate indicating whether the artist has ever been part of a label/distributor company recognized in K Hip Hop Wiki.

Top 100 songs: The number of songs that have appeared in the weekly Melon Top 100 chart since 2015, retrieved from a digital archive (https://guyso.me/).

We want to be assured that the hierarchical representation we derive from the SMTM series indeed reflects the differences in artists’ characteristics indicated by these covariates. In Figure S2 in Supplemental Material 1.3, we present the distribution of the covariates for the 405 artists who were judges, qualifiers, and unsuccessful participants in SMTM. Generally, artists in higher status positions have been active in the field longer, are better received by listeners, are more productive, and have won more awards.

Machine learning requires that the labeled data is identically distributed with the unlabeled data. As such, an important assumption of our ML application is that the characteristics of collaborations between artists who produced the collaboration songs in the labeled data are not different from those of the unlabeled data. The core quantity of our study is the dyadic covariate difference c_i,j = f(X_i, X_j) between artists i and j. Similarly to prior work (Huang et al. 2007; Pan et al. 2011), our feature engineering focused on seeking balanced scales among 24 different kinds of features that compute the absolute and relative differences and other necessary factors. In the end, we selected the ten most balanced features that helped us achieve the highest predictive accuracy (see formulas and scaling details in Supplemental Material 1.6). We discuss covariate balance later in the Results section.

When more than one artist was invited to feature (e.g., Main Artist (i) – Song Title [feat. Artist A, Artist B, Artist C]), we create multiple dyads between i and artists A, B, and C. Then, we label Y_i (SMTM collaboration dyads) or predict the degree of preferential attachment, $\hat{p}_{i,k}$, for each of the three dyads. With the concept of preferential attachment, we are theoretically interested in the overall upward or downward direction from the standpoint of the main artist. Averaging $\hat{p}_{i,k}$ over the three dyads then represents the overall tendency of preferential attachment of artist i for that song. In addition, the ordering of A, B, and C is usually not meaningful because the convention among artists is to list them in order of appearance in a song (so that listeners can easily follow who raps in which part), not by length of contribution or prestige. Hence, we do not weight these same-song dyads.⁷
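
A minimal sketch of this dyad expansion and unweighted averaging follows. The helper names and the three predicted probabilities are hypothetical and serve only to show the arithmetic.

```python
def song_dyads(main_artist, featured_artists):
    """One (main, featured) dyad per featured artist, in credit order."""
    return [(main_artist, f) for f in featured_artists]

def song_level_score(dyad_scores):
    """Unweighted mean of the dyad-level scores for one song."""
    return sum(dyad_scores) / len(dyad_scores)

dyads = song_dyads("Main Artist", ["Artist A", "Artist B", "Artist C"])

# Hypothetical predicted probabilities of upward deference for each dyad.
scores = {"Artist A": 0.8, "Artist B": 0.5, "Artist C": 0.2}
overall = song_level_score([scores[f] for _, f in dyads])
print(len(dyads), round(overall, 3))  # 3 0.5
```

Because the credit order carries no status meaning, every dyad enters the mean with equal weight.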

Model training. The outcome variable for the supervised learning model is a binary label indicating the direction of deference in a collaboration dyad (Y_i^song_k): “preferential attachment” (PA) if the main artist features a higher-status artist, and “pick-up” (PU) if they feature a lower-status artist, comparing their ternary status levels in SMTM. Of the 777 labeled collaboration dyads, 457 (58.8%) were classified as PA and 320 (41.2%) as PU. We estimate a model Y_i^song_k = $\hat{f}$ (X_i, X_j) in D^l and use this model to predict the direction of deference in collaboration for D^u whose artists were not part of the audition program, but with observed X_i and X_j just like those in D^l. Following a common convention in supervised learning (Murphy 2012), we randomly split the labeled dataset of 777 dyads into 80 percent for training and 20 percent for testing. This split was performed at the dyad level, independent of class labels. The original class distribution—∼59 percent PA and 41 percent PU—was preserved across both sets. Table 1 presents the outcome distribution with a breakdown of training, test, and prediction sets.

Table 1.

Outcome Distributions for Supervised Learning.

	Collabo Songs		PA	PU
Labeled (D^l)	777 (3%)	Training	370 (81%)	251 (78%)
		Test	87 (19%)	69 (22%)
Unlabeled (D^u) (Prediction set)	24,009 (97%)
Total	24,786 (100%)		457 (100%)	320 (100%)

Abbreviations: PA = preferential attachment; PU = pick-up.
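
The cell counts in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes that the 20 percent test share is rounded up, which reproduces the reported 156 test dyads; the rounding rule is an assumption, not something the text states.

```python
import math

n_labeled = 777

# An 80/20 split of 777 dyads: rounding the test share up gives the
# 156 test dyads and 621 training dyads reported in Table 1.
n_test = math.ceil(0.2 * n_labeled)
n_train = n_labeled - n_test

# Share of PA dyads in each partition, from Table 1's cell counts.
table1 = {"train": {"PA": 370, "PU": 251}, "test": {"PA": 87, "PU": 69}}
pa_share = {part: c["PA"] / (c["PA"] + c["PU"]) for part, c in table1.items()}

print(n_train, n_test)                                   # 621 156
print({k: round(v, 3) for k, v in pa_share.items()})     # {'train': 0.596, 'test': 0.558}
```

By these counts the PA share is about 60 percent in the training set and about 56 percent in the test set.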

This supervised learning procedure is implemented with a random forest algorithm. $\hat{f}$ (X_i, X_j) is a set of estimated betas (β) on parameters for the dyadic difference in covariates between collaborators. A random forest is an ensemble learning method that aggregates multiple decision tree classifiers trained on different sub-samples of a dataset. It combines their predictions through a weighted average to enhance predictive accuracy and mitigate potential overfitting issues. Substantively, these estimated betas indicate, given a collaborated song dyad between SMTM participants who took different roles (e.g., a judge and qualifier), how much of a difference per individual quality and credential covariates (e.g., the difference in total like counts received in Melon between the judge and the qualifier) of the song dyad should exist to be classified either PA or PU behavior on average. This estimated model serves as a classifier on the prediction set.

We apply one-hot encoding to represent the dyadic differences in covariates when the focal covariate is discrete. These covariates include award history, prestigious crew membership, and prestigious label/management membership. To prevent overfitting, we tuned the maximum depth of the random forest classifier using a grid search over depths 5–8. Depth 7 yielded the highest accuracy and F1 score on the test set, indicating optimal generalization. We summarize the parameters used in our random forest algorithm in Table 2.

Table 2.

Parameters Used in a Random Forest Algorithm.

Parameters	Number/function
Number of estimators	100
Criterion	Gini impurity
Max depth	7
Min split samples	2

Deference prediction. The random forest algorithm consists of multiple decision trees, in which each decision tree makes a prediction based on part of the covariates of the input data, and the final prediction of the random forest algorithm is obtained by the voting (or averaging) on all decision trees’ predictions. To combine the predictions, soft voting calculates the average probability of each class and then declares the winner having the highest weighted probability. We use the output of soft voting to assign a probability representation, p_i,k, to the direction of deference from i to j of each collaboration song, song_i,k, of D^u. From this, we create our primary artist-level independent variable, preferential attachment:

PA_{i} = \frac{\sum_{k=1}^{K} p_{i,k}}{K},

(2)

where K is the number of collaborated songs that i has released in the study period. PA_i indicates artist's average direction of deference in collaboration. This quantity gives a continuous scale that signifies the tendency of fully associating with a lower-status artist (PA_i = 0), a higher-status artist (PA_i = 1), or an artist of similar status (PA_i = 0.5) across Song_i.
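
Soft voting can be illustrated with a small sketch. It assumes an unweighted mean of per-tree class probabilities, which is one common way to implement soft voting; the three "trees" and their probabilities are hypothetical.

```python
def soft_vote(tree_probs):
    """Average per-tree class probabilities and pick the most probable class.

    tree_probs: one dict per tree, e.g. {"PA": 0.7, "PU": 0.3}.
    Returns (mean probability of PA, winning class label).
    """
    n = len(tree_probs)
    mean = {c: sum(t[c] for t in tree_probs) / n for c in ("PA", "PU")}
    winner = max(mean, key=mean.get)
    return mean["PA"], winner

# Three hypothetical trees voting on one collaboration dyad.
p_pa, winner = soft_vote([
    {"PA": 0.9, "PU": 0.1},
    {"PA": 0.4, "PU": 0.6},
    {"PA": 0.8, "PU": 0.2},
])
print(round(p_pa, 3), winner)  # 0.7 PA
```

The averaged PA probability, rather than the winning label, is what plays the role of p_i,k: it keeps the strength of the predicted direction instead of collapsing it to a binary class.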

To help understand the resulting outputs, we provide a concrete example of an artist's $PA$ score (see Table 3). Zico, a renowned rapper with chart-topping hits since debuting in 2011,⁸ released six new songs during our study period. Our ML workflow outputs a preferential attachment score $\hat{p}_{i,k}$ for each of his three collaborative songs. $\hat{p}_{i,k}$ is the average class probability from soft voting in the random forest algorithm, ranging from 0 (fully downward) to 1 (fully upward). Zico engaged three different artists: Zior Park, Homies, and Changmo. Zior Park is an early-career, emerging artist who released his first studio album in 2020. Zico's directed deference to Zior Park was estimated at .118, that is, strongly downward. Zico's preferential attachment scores for the ties to Homies and Changmo are relatively higher because the difference in features between Zico and these two artists is not as large as that between Zico and Zior Park. Our independent variable, $PA$, takes the average of $\hat{p}_{i,k}$ estimated for one's new collaborative songs. Zico's $PA$ is .286, meaning he tended to feature artists whose status was relatively lower than his.

Table 3.

An Example of Preferential Attachment (PA) Scores Estimated for an Artist.

No.	Date	Artist	Song Title	$\hat{p}_{i,k}$	Likes (8 weeks)
1	July 27, 2022	ZICO	Nocturnal animals (Feat. Zior Park)	.118	3,329
2	September 06, 2022	ZICO	New Thing (Feat. Homies)	.341	122,688
3	July 27, 2022	ZICO	Trash Talk (Feat. CHANGMO)	.398	5,210
4	July 27, 2022	ZICO	Freak	–	17,866
5	July 27, 2022	ZICO	OMZ freestyle	–	2,421
6	June 08, 2023	ZICO	Smoke (Sleep Mix)	–	1,527
				$PA = .286$
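
The artist-level score in Table 3 follows directly from equation (2). The sketch below uses the three song-level scores reported in the table; the function name `pa_score` is illustrative.

```python
# Song-level scores for ZICO's three collaborative songs, from Table 3.
p_hat = {
    "Nocturnal animals (Feat. Zior Park)": 0.118,
    "New Thing (Feat. Homies)": 0.341,
    "Trash Talk (Feat. CHANGMO)": 0.398,
}

def pa_score(scores):
    """Equation (2): mean of p_{i,k} over an artist's K collaborative songs."""
    return sum(scores) / len(scores)

pa_zico = pa_score(list(p_hat.values()))
print(round(pa_zico, 3))  # 0.286
```

The three songs without a featured artist carry no score and do not enter the average, so K = 3 here.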

Our weekly web-scraping produced time-stamped data, which allowed us to use the most up-to-date information about artists when estimating $\hat{p}_{i,k}$. Our ML workflow inputs artists’ characteristics a week before each song's release (the second column of Table 3), assuming that audiences are sensitive to artists’ current status positions and popularity, which could change even during our study period. It is also interesting to note that, in Zico's case, collaborative songs received more likes on average than solo songs. This is consistent with the positive collaboration effect that we report below.

ML Performance

Prediction performance. How accurately does the trained model predict the labeled deference of song dyads in the test set, that is, the downward or upward deference across the three ordered status levels (judges, qualifiers, and unsuccessful participants of SMTM)? The model was trained on 621 dyads and evaluated on a held-out test set of 156 dyads. We report the accuracy metrics of different methods in Table 4. Without the research design devised in this study, one could just naïvely compare the quality characteristics (artists’ total like counts received) of a collaborative song dyad and classify the direction of deference, as in equation (1) (Naïve Method in Table 4). The naïve method uses a single threshold on the difference in logged quality between i and j of a song dyad, chosen to minimize the error rate in the training dataset. Our supervised learning model (random forest) outperforms the naïve method in correctly predicting the labeled downward or upward deference of collaboration between the artists, with accuracy and macro F1 scores above .900. Alternatively, one can use the emergent clustering pattern from an unsupervised learning method (K-means in Table 4), but it underperforms the supervised ML.⁹ Random forest also outperforms logistic regression, naïve Bayes, and SVM; compared with each of the five other methods, the random forest model cuts the error rate by roughly 42 percent (relative to the naïve method) to 73 percent (relative to naïve Bayes). Despite the natural imbalance of the 60/40 class ratio (PA vs. PU) in the labeled data, the classifier achieved high accuracy and balanced F1 scores, indicating robust performance across both classes.
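
The single-threshold baseline can be sketched as follows. The sign convention (featured minus main artist, with larger differences predicted as PA), the candidate cut points, and the toy data are all assumptions made for illustration; the text specifies only that one threshold on the logged quality difference is chosen to minimize training error.

```python
def fit_threshold(diffs, labels):
    """Pick the cut on the quality difference that minimizes training error.

    diffs:  featured-minus-main logged quality for each training dyad.
    labels: "PA" or "PU" for each dyad.
    A dyad is predicted "PA" when its difference exceeds the threshold.
    """
    values = sorted(set(diffs))
    # Candidate cuts: below the minimum, midpoints, above the maximum.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    def errors(t):
        return sum((d > t) != (y == "PA") for d, y in zip(diffs, labels))

    return min(candidates, key=errors)

# Hypothetical training dyads (not the study's data).
diffs = [-2.0, -1.5, -0.2, 0.3, 1.1, 2.4]
labels = ["PU", "PU", "PU", "PA", "PA", "PA"]
threshold = fit_threshold(diffs, labels)
predicted = ["PA" if d > threshold else "PU" for d in diffs]
print(predicted == labels)  # True
```

A rule of this kind uses one covariate and one cut point, which is why it serves as the reference point against which the multi-covariate classifiers are compared.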

Table 4.

Prediction Performance From Naïve Classification, Supervised, and Unsupervised Machine Learning (ML) Algorithms.

Method/Metrics	Random Forest	Naive Method	K Means	Logistic Regression	Naive Bayes	SVM
Confusion Matrix	$[\begin{matrix} 81 & 6 \\ 8 & 61 \end{matrix}]$	$[\begin{matrix} 76 & 11 \\ 13 & 56 \end{matrix}]$	$[\begin{matrix} 76 & 11 \\ 15 & 54 \end{matrix}]$	$[\begin{matrix} 77 & 10 \\ 17 & 52 \end{matrix}]$	$[\begin{matrix} 82 & 5 \\ 47 & 22 \end{matrix}]$	$[\begin{matrix} 81 & 6 \\ 20 & 49 \end{matrix}]$
Accuracy	0.9103	0.8462	0.8333	0.8269	0.6667	0.8333
F1 Score (Macro)	0.9088	0.8436	0.8300	0.8224	0.6088	0.8260

Note: Naïve method refers to a crude classification by quality (past cumulative like counts) between artists i and j on a collaboration song dyad. For the K-means method, we cluster the training dataset into two groups by their standardized features and then decide the belonging of the samples in the test set by the distance between the samples and the center of the two groups. For the support vector machine (SVM) classifier, we used a radial basis function (RBF) kernel, which is well-suited for non-linear classification tasks and tends to generalize better than polynomial or sigmoid kernels in high-dimensional settings.
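
The accuracy and macro F1 rows of Table 4 can be recomputed from the confusion matrices. In the sketch below the rows are taken to be true classes and the columns predicted classes; that orientation is an assumption, but both metrics are unchanged if the matrix is transposed.

```python
def accuracy_and_macro_f1(cm):
    """Accuracy and macro-averaged F1 for a 2x2 confusion matrix
    [[a, b], [c, d]], where a and d are the correctly classified counts."""
    (a, b), (c, d) = cm
    accuracy = (a + d) / (a + b + c + d)
    f1_first = 2 * a / (2 * a + b + c)   # F1 = 2TP / (2TP + FP + FN)
    f1_second = 2 * d / (2 * d + b + c)
    return accuracy, (f1_first + f1_second) / 2

# Confusion matrices as printed in Table 4.
table4 = {
    "Random Forest": [[81, 6], [8, 61]],
    "Naive Method": [[76, 11], [13, 56]],
    "K Means": [[76, 11], [15, 54]],
    "Logistic Regression": [[77, 10], [17, 52]],
    "Naive Bayes": [[82, 5], [47, 22]],
    "SVM": [[81, 6], [20, 49]],
}

results = {
    method: tuple(round(v, 4) for v in accuracy_and_macro_f1(cm))
    for method, cm in table4.items()
}
print(results["Random Forest"])  # (0.9103, 0.9088)
```

Macro averaging gives the PA and PU classes equal weight, so a classifier cannot score well by favoring the larger class.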

Covariate importance. Given the estimated model, which covariates were crucial in classifying PA and PU of a collaboration song dyad? Covariate importance in tree-based models identifies the covariates chosen most often by the algorithm (the proportion of splits made on the variable of interest), yielding insight into the strongest determinants of the structure of the trees in the forest. In other words, by comparing the importance of the different covariates, we can understand which covariates matter more in deciding the status behaviors between artists of interest. The first column in Table 5 shows the relative importance scores of the covariates we used. The scores suggest that the difference between artists’ listener-evaluated quality (past cumulative number of likes), on various scales, is most meaningful in recovering the deference direction in featuring behavior. The dyadic differences stemming from prestigious membership (label, crew, and distributor company) appear to be of trivial importance.

Table 5.

Covariate Importance in the Prediction of Deference on Collaboration Song Dyads and Covariate Balance Test Results of Dyadic Difference c_i,j Between Labeled and Unlabeled Collaboration Pairs.

Feature	Importance	Diff.^a	Lower 95%	Upper 95%	Test Stat.^a	p-value	Effect Size^b	Magnitude^a
Quality (logged)	.208	0.320	0.178	0.463	4.403	<.001	0.085	Negligible
Quality	.173	26,288	−7,647	60,223	1.519	.129	0.027	Negligible
Quality (relative)	.156	0.006	−0.023	0.035	0.399	.690	0.009	Negligible
Quality (average)	.094	739	13.69	1465	1.997	.046	0.015	Negligible
Productivity	.073	0.017	−0.008	0.041	1.346	.178	0.026	Negligible
Active years	.061	−0.214	−0.393	−0.035	−2.342	.019	−0.019	Negligible
Top 100 songs	.071	−0.278	−0.517	−0.039	−2.281	.023	−0.047	Negligible
Recognized label	.023	0.111			123	<.001	0.069	Negligible
Recognized distributor	.005	0.226			522	<.001	0.142	Small
Recognized crew	.039	−0.215			611	<.001	0.153	Small
Award history	.098	−0.153			596	<.001	0.151	Small

^a t-test for numeric features and a chi-squared independence test for categorical features. Difference = Mean or proportion (labeled) – Mean or proportion (unlabeled). For the sake of simplicity of the proportion test, we use the proportion of concordant pairs (e.g., the main artist and featuring artist being both or neither award winners).

^b Beyond p-values indicating whether a difference exists, we evaluate the magnitude of the difference, using Hedges' g (a corrected version of Cohen's d to account for unequal sample sizes between two samples) for numeric features and Cramér's V for categorical features. Magnitude is judged qualitatively against commonly used thresholds (Hedges' g: .2 [small], .5 [medium], .8 [large]; Cramér's V: .1 [small], .3 [medium], .5 [large]).

Covariate balance. We find that labeled and unlabeled data in our study are sufficiently balanced to conduct training and prediction tasks. Dyads are the unit of our directed deference prediction. In Table 5, we confirm that the dyadic differences on most quality features, at varying scales, are similar between collaborative song dyads among SMTM artists, D^l (training set + test set), and collaborative song dyads in which the main or a featured artist was not part of SMTM, D^u (prediction set). We conducted t-tests for continuous covariates and chi-square tests for discrete covariates (see Table 5 and detailed formulas in Supplemental Material 1.6). For the quality covariates that ranked high on covariate importance, the differences between labeled and unlabeled sets were either not statistically significant or negligible in magnitude. More substantively, the degree of disparity in quality one may expect from a collaboration song, for example, between an SMTM judge and an SMTM qualifier, also exists in a collaborative song between un- or partially-labeled artists. Dyadic differences did appear for the membership variables, but these played less important roles in labeling PA and PU. Our data also exhibit a good individual-level balance between SMTM and non-SMTM participants (see the discussion of Figure S3 in Supplemental Material 1.7).
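
The two effect-size measures named in the note to Table 5 can be sketched in a few lines. The formulas are the standard textbook versions, assumed here rather than taken from Supplemental Material 1.6; the inputs in the example are hypothetical, and only the thresholds come from the table note.

```python
import math

def hedges_g(mean1, sd1, n1, mean2, sd2, n2):
    """Standardized mean difference with the usual small-sample correction."""
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (mean1 - mean2) / pooled_sd            # Cohen's d
    correction = 1 - 3 / (4 * (n1 + n2) - 9)   # approximate bias correction
    return d * correction

def cramers_v(chi2, n, n_rows=2, n_cols=2):
    """Cramér's V from a chi-squared statistic on an n_rows x n_cols table."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))

def magnitude(value, small, medium, large):
    """Qualitative label using the thresholds given in the table note."""
    v = abs(value)
    if v >= large:
        return "Large"
    if v >= medium:
        return "Medium"
    if v >= small:
        return "Small"
    return "Negligible"

# Hypothetical inputs, chosen only to exercise the functions.
g = hedges_g(1.0, 1.0, 50, 0.0, 1.0, 50)
v = cramers_v(chi2=100.0, n=10_000)
print(round(g, 3), magnitude(g, 0.2, 0.5, 0.8))  # 0.992 Large
print(round(v, 3), magnitude(v, 0.1, 0.3, 0.5))  # 0.1 Small
```

With very large samples, a test statistic can be highly significant while the effect size stays small, which is why Table 5 reports both.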

Results

External Validation of the Preferential Attachment Measurement

Our ML model rates the extent of deference between the pair of artists who produced an unlabeled collaboration song (p_i,k for song_i,k of D^u), which then leads to our artist-level preferential attachment score, PA_i. One should ask: How close is the ML-rated p_i,k to the status differences that humans would have rated? We validate our key measurement by having field experts rate a sample of collaborator pairs and assessing the similarity between expert responses and p_i,k.

We used a stratified random sampling scheme encompassing four levels of quality strata to select 100 balanced and recognizable collaboration dyads in D^u (Table S4 and detailed description in Supplemental Material 2.1). The resulting PA scores of the sampled collaboration pairs were evenly distributed, as shown in Figure S4, Supplemental Material, allowing external raters to assess a wide variety of deference cases. Then, we developed a “status position survey” (Figure S5, Supplemental Material). This survey presented the names of two artists (actual collaborators on a song) and posed the question: “Which of the two artists do you think is more popular than the other?” Respondents could choose one of the following answers: (i) Artist A is much more popular (coded as 1), (ii) Artist A is a bit more popular than Artist B (coded as 0.5), (iii) they are similar in popularity (coded as 0), (iv) Artist B is a bit more popular than Artist A (coded as −0.5), (v) Artist B is much more popular (coded as −1), or (vi) I don’t know either rapper (coded as missing). The survey was completed by two established rappers (Deepflow and Hanhae), a music columnist (Bong Hyun Kim), three avid fans representing different generations, and a large language model (GPT-4o). In one run, we restricted GPT-4o's ability to search for information online by using an AI tool called Monica. We offer rater information in Supplemental Material 2.1.
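
The response coding and its comparison with model output can be sketched as follows. Several details are assumptions: the short answer keys, the use of a Pearson correlation, and the orientation of the model scores so that higher values favor Artist A. The ratings and scores are hypothetical.

```python
import math

# Numeric codes from the survey description; None marks "don't know".
RESPONSE_CODES = {
    "A much more": 1.0,
    "A a bit more": 0.5,
    "similar": 0.0,
    "B a bit more": -0.5,
    "B much more": -1.0,
    "don't know": None,
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def rating_correlation(responses, model_scores):
    """Correlate one rater's coded answers with model scores, skipping
    dyads the rater marked as unknown."""
    pairs = [
        (RESPONSE_CODES[r], s)
        for r, s in zip(responses, model_scores)
        if RESPONSE_CODES[r] is not None
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)

# Hypothetical rater answers and model scores for six sampled dyads.
responses = ["A much more", "A a bit more", "similar",
             "don't know", "B a bit more", "B much more"]
scores = [0.9, 0.7, 0.5, 0.4, 0.3, 0.1]
r, n_used = rating_correlation(responses, scores)
print(round(r, 3), n_used)  # 1.0 5
```

Dropping the "don't know" dyads rater by rater means each correlation is based on a different number of dyads.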

We observe a high level of reliability in field expert ratings, alongside a strong correlation between the outputs of our ML model and expert responses (see Table 6). As the response scale is numeric and ordinal in nature (but with meaningful intervals), we employed the intraclass correlation coefficient (ICC) for assessing inter-rater reliability. The overall ICC for “human” ratings was .798 [.723, .864], while ICC values among field professionals and fans were .820 [.748, .877] and .754 [.646, .839], respectively—all exceeding the conventional threshold of .750 (95% confidence intervals in brackets). It suggests that there exists a status hierarchy among sampled artists that expert raters agree upon. Turning to the correlation between our ML model output and expert responses, we identified a striking level of agreement in the evaluation of artist status comparisons. With the exception of the ratings generated by network-disabled GPT-4o, all correlation values exceeded .600. These findings demonstrate that our ML measure of preferential attachment between collaborators closely mirrors the perceptions of rappers, critics, and fans, thereby providing robust evidence for its external validity.

Table 6.

External Validation Results: Inter-Rater Reliability (ICC) and Correlation Between Expert Ratings on Dyadic Deference and ML-Rated Preferential Attachment.

Rater	ICC	Correlation	Don’t Know
Deepflow (artist)	.820	.693	23
Hanhae (artist)		.687	18
Bong Hyun Kim (critic)		.695	12
Fan A (age: 40s)	.754	.793	23
Fan B (age: 30s)		.617	12
Fan C (age: 20s)		.733	38
GPT-4o (no network)	.765	.524	30
GPT-4o	.765	.656	25

Featuring Behavior and Status Advantage in Hip-Hop Music Collaboration

Descriptive statistics. In Table 7, we report the characteristics of our focal artists. On our performance measure, they received a median of 40 likes over 8 weeks, on a median of five songs in two albums newly released during our two-year study period. The focal artists have been active for 2.41 years on average since the release of their first song; the majority are at early career stages, without membership in labels or crews prestigious enough to be listed in K Hip Hop Wiki. Notably, collaboration by “featuring” is common and frequent, with 71.2 percent of our focal artists having released one or more collaborated songs in the study period. For these artists (n = 2,630), we calculated our independent variable, the average probability that a collaborated song is classified as PA rather than PU.¹⁰ Its mean is .627, meaning that featuring a higher-status artist on one's songs is more prevalent than featuring a lower-status artist (see also the histogram in Figure S6A, Supplemental Material). We also created a song-quantity version of the PA variable (number of PA songs and PU songs), applying a 25 percent cutoff on p_i,k (PA: 75% to 100%; PU: 0% to 25%). In the Supplemental Material, we checked and found a moderate level of dependency on quality: downward and upward deference ties are general and not created exclusively by those at either end of the quality distribution (see Figure S6B, Supplemental Material). A correlation matrix of individual-level variables, with a discussion of multicollinearity, is presented in Table S5, Supplemental Material 2.3.
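
The song-quantity version of the variable can be sketched as a simple count. Treating the band edges as inclusive is an assumption; the first set of scores is taken from Table 3 and the second is hypothetical.

```python
def count_pa_pu(scores, cutoff=0.25):
    """Count songs in the top and bottom probability bands.

    PA songs: p >= 1 - cutoff (75% to 100%).
    PU songs: p <= cutoff (0% to 25%).
    Songs in between count toward neither.
    """
    n_pa = sum(p >= 1 - cutoff for p in scores)
    n_pu = sum(p <= cutoff for p in scores)
    return n_pa, n_pu

# ZICO's three collaborative songs from Table 3.
print(count_pa_pu([0.118, 0.341, 0.398]))  # (0, 1)

# Hypothetical artist with four collaborative songs.
print(count_pa_pu([0.9, 0.8, 0.5, 0.1]))   # (2, 1)
```

Songs with scores between the two bands are left uncounted, so the two counts need not sum to an artist's number of collaborative songs.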

Table 7.

Descriptive Statistics for Focal Artists.

Variable	N	Mean	Median	Sd	Min	Max
Performance	3,694	4,530	40	39,283	0	1,204,730
Past performance (quality)	3,694	39,967	96	536,018	0	28,382,604
Active years	3,694	2.41	1.40	3.27	0	28.7
Recognized distributor	3,694	0.595	1	0.491	0	1
Recognized label	3,694	0.273	0	0.445	0	1
Recognized crew	3,694	0.063	0	0.243	0	1
SMTM participation	3,694	0.074	0	0.325	0	3
Award Winner	3,694	0.023	0	0.149	0	1
Number of songs	3,694	9.42	5	22.8	1	585
Number of albums	3,694	3.41	2	4.39	1	88
Number of SMTM songs	3,694	0.066	0	0.659	0	10
Number of top 100 songs	3,694	0.255	0	3.19	0	161
Collaboration (dummy)	3,694	0.712	1	0.453	0	1
Number of collabo songs	3,694	3.49	1	5.91	0	103
Preferential Attachment	2,630	0.627	0.621	0.178	0	1
Number of PA songs	2,630	1.37	0	2.60	0	29
Number of PU songs	2,630	0.424	0	2.78	0	98

Abbreviations: SMTM = Show Me The Money; PA = preferential attachment; PU = pick-up.

Who features whom on a hip-hop collaboration song? Figure 4 provides a descriptive comparison of characteristics between the main artist (under whose name a song was released) and a collaborating artist (“featured” in the title of a song) in a song dyad. For active years (the first panel), we found that collaboration is more common among younger artists: a young artist featuring another young artist (1–3 years of career) appears to be the most prevalent dyad type. We found slightly heterogeneous patterns for quality and productivity. Unlike the plot of active years, the contour plots show roughly four distinct “peaks.” This means that assortative mixing does not always hold for featuring dyads: a hip-hop artist also invites artists whose quality and productivity differ from their own (off-diagonal peaks).

Figure 4.

The distribution of the characteristics of main artists and their featured artists on songs.

Collaboration and preferential attachment effect. Given that artists are heterogeneous in terms of whom they engage as a featuring artist, it is essential to examine the relationship between preferential attachment in collaboration and performance. The key dependent variable, artist's performance (P_i), is the logged sum of like counts that an artist's new songs received from users over the 8 weeks following their release on Melon.com in the study period. We chose the eight-week window because it provides adequate audience exposure while ensuring synchronicity with the timing of the independent variable measurement (at the time of release) and reasonable observability during our two-year data collection period. In Supplemental Material 2.4, we confirmed that the mean like count for a song tends to increase after release but follows a concave trajectory (Figure S8A, Supplemental Material 2.4), indicating that attention to a song diminishes over time. This suggests that, for example, the choice between the 6-week and 8-week windows matters little because the like count differentials become small. The choice matters even less as time progresses (Figure S8B, Supplemental Material 2.4). Levene's tests examining variance differences across weeks provide no evidence to reject the null hypothesis of variance homogeneity.

Our goal here is to examine whether collaboration helps increase artists’ performance in receiving users’ likes on new songs and, if so, how this collaboration effect on performance varies by the extent of preferential attachment in collaboration. We begin by modeling the descriptive quantity, P_i, using conventional ordinary least squares (OLS) regression in a blockwise manner. This approach helps understand the functioning of control variables and establishes a baseline for how collaborative contexts explain additional variation in artists’ performance.

There are five models (Table 8). Model 1 includes control variables, including artists’ individual quality (past cumulative recognition: logged sum of like counts received from users on their old songs before the study period), affiliation, and productivity patterns. As expected, artists’ acceptance expressed by listeners’ likes exhibits a strong consistency with listeners’ prior ratings. In line with extant research in other contexts, hip-hop artists with previous awards or membership in crews and labels that are known enough to be indexed at K Hip Hop Wiki secured more likes than those without such affiliations or third-party evaluations. One slightly counterintuitive finding is that artists’ tenure (active years) is negatively associated with performance. This may reflect the fact that the music-streaming market is largely driven by young consumers who are generally in favor of young musicians. These baseline findings stand while we account for the individual quantity of products (the number of released songs and albums)—a sort of “denominator” of our dependent variable.

Table 8.

OLS Regression Models Predicting Artists’ Performance by Collaboration and Preferential Attachment in Featuring Behavior.

	Dependent Variable: Logged 8-Week User Engagement (Melon Likes)
	Model 1	Model 2	Model 3	Model 4	Model 5
Intercept	1.046*** (0.056)	1.145*** (0.056)	1.125*** (0.055)	1.152*** (0.055)	1.142*** (0.055)
Active years	−0.149*** (0.009)	−0.146*** (0.008)	−0.151*** (0.008)	−0.148*** (0.008)	−0.151*** (0.008)
Prior quality (logged)	0.427*** (0.011)	0.418*** (0.011)	0.427*** (0.011)	0.419*** (0.010)	0.424*** (0.010)
No. of albums	0.103*** (0.007)	0.087*** (0.007)	0.094*** (0.007)	0.090*** (0.007)	0.094*** (0.007)
No. of songs	−0.005*** (0.001)	−0.006*** (0.001)	−0.007*** (0.001)	−0.007*** (0.001)	−0.007*** (0.001)
No. of SMTM songs	0.442*** (0.038)	0.435*** (0.037)	0.465*** (0.036)	0.451*** (0.036)	0.468*** (0.036)
No. of top 100 songs	0.031*** (0.008)	0.033*** (0.008)	0.033*** (0.007)	0.033*** (0.007)	0.033*** (0.007)
Recognized distributor	0.186*** (0.056)	0.172** (0.055)	0.131* (0.053)	0.138** (0.054)	0.122* (0.053)
Recognized label	0.570*** (0.061)	0.523*** (0.061)	0.487*** (0.059)	0.483*** (0.059)	0.467*** (0.059)
Recognized crew	0.831*** (0.111)	0.743*** (0.110)	0.735*** (0.108)	0.726*** (0.108)	0.719*** (0.108)
Award winner	1.327*** (0.176)	1.369*** (0.173)	1.586*** (0.173)	1.624*** (0.171)	1.628*** (0.172)
SMTM participation	0.477*** (0.078)	0.441*** (0.077)	0.384*** (0.075)	0.433*** (0.075)	0.386*** (0.075)
Collaboration (dummy)	0.809*** (0.054)	0.611*** (0.056)	0.717** (0.232)	0.584*** (0.055)	1.007*** (0.246)
No. of collabo songs		0.051*** (0.005)	0.054*** (0.004)	0.016** (0.006)	0.037*** (0.006)
Average PA			−2.744*** (0.726)		−3.074*** (0.775)
Average PA (squared)			3.783*** (0.575)		3.544*** (0.617)
No. of PA songs				0.153*** (0.013)	0.087*** (0.015)
No. of PU songs				0.005 (0.012)	−0.013 (0.012)
No. of observations	3,694
R²	0.651	0.663	0.680	0.677	0.684
Adjusted R²	0.650	0.662	0.679	0.676	0.682
F-statistic	573.211	556.781	521.698	513.658	467.866

Note: The dependent variable is the logged 8-week user engagement (Melon likes) for newly released songs in the study period. For observations with valid PA values (i.e., those who had a collaborative song), the variance inflation factor values for all independent variables, when mean-centered, are all under 10. The condition number, when scaled, is 21.36, which is below the rule-of-thumb threshold of 30.

Abbreviations: SMTM = Show Me The Money; PA = preferential attachment; PU = pick-up.

*** p < .001; ** p < .01; * p < .05.

In Model 2, we investigate the association between collaboration and artists’ performance. Releasing songs that feature one or more other artists, captured by the coefficient for the dichotomous variable indicating the release of collaborative songs, makes a significantly positive difference in artists’ performance in receiving users’ likes on new songs. Yet the factual outcome P_i does not provide information about what an artist's performance would have been in the absence of collaboration. To ask the causal question of whether collaboration helps increase artists’ performance, we define a unit-specific causal quantity (Lundberg, Johnson, and Stewart 2021) based on the potential outcomes P_i(t) under assignment to treatment value t: the difference between the like counts each artist would realize if they released a collaborative song in the study period, denoted P_i(1), and if they did not, denoted P_i(0), averaged over the n Korean hip-hop artists:

\frac{1}{n} \sum_{i=1}^{n} \left( P_{i}(1) - P_{i}(0) \right). \qquad (3)

In this counterfactual framework (Imbens and Rubin 2015), we define two potential outcomes for each artist: one if they have a collaboration song (treatment) and one if they don't (control) in the study period. These outcomes express the average treatment effect (ATE) of collaboration on performance (like counts), which is the difference in expected outcomes between the treatment and control groups. This theoretical estimand indicates the average difference in the potential outcome each artist i would realize if i had collaborated versus not. However, since we only observe one of these outcomes for each musician, the unobserved outcome needs to be estimated.

Our empirical estimand, P_i(t), is derived by an imputation estimator (Rubin 1974) that calculates the average difference in the OLS regression predictions when we recode i as a collaborator versus as a non-collaborator in the study period, while holding other covariates constant. This strategy helps estimate the causal effect of collaboration on performance while controlling for observed confounders. The results suggest that the ATE of collaboration remains significantly greater than zero, as displayed in Figure 5 (left panel). More substantively, the magnitude of the ATE estimate (.754) indicates that, holding all else equal, artists who released a collaboration song received approximately e^.754 ≈ 2.13 times as many likes on their songs as those who did not collaborate during the study window. This reflects a 113 percent increase in positive audience engagement, a substantial uplift in the context of streaming platforms.
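The logic of the imputation estimator can be sketched as follows. This is an illustration on synthetic data, not our estimation code: the function `imputation_ate`, the single covariate `quality`, and the simulated effect size (set to .754 only to mirror the reported estimate) are all assumptions made for the example. The sketch fits an OLS model, predicts each unit's outcome twice (recoded as collaborator and as non-collaborator), and averages the difference.

```python
import numpy as np


def imputation_ate(covariates, treat, y):
    """Average difference in OLS predictions with treatment recoded to 1 versus 0."""
    n = len(y)
    design = np.column_stack([np.ones(n), treat, covariates])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Impute both potential outcomes for every unit, holding covariates fixed.
    as_treated = np.column_stack([np.ones(n), np.ones(n), covariates])
    as_control = np.column_stack([np.ones(n), np.zeros(n), covariates])
    return float(np.mean(as_treated @ beta - as_control @ beta))


# Synthetic data (hypothetical): collaboration is more likely for higher quality.
rng = np.random.default_rng(0)
n = 5000
quality = rng.normal(size=n)
collab = (rng.random(n) < 0.5 + 0.1 * np.tanh(quality)).astype(float)
log_likes = 1.0 + 0.754 * collab + 0.4 * quality + rng.normal(scale=0.5, size=n)

ate = imputation_ate(quality.reshape(-1, 1), collab, log_likes)
# On a logged outcome, exp(ATE) is the multiplicative change in the raw outcome.
ratio = float(np.exp(ate))
print(round(ate, 3), round(ratio, 2))
```

In a purely additive specification such as this one, the imputed average difference coincides with the OLS coefficient on the treatment indicator; the two diverge only when the model includes interactions with treatment. Exponentiating converts the log-scale effect into a ratio, which is how an estimate of .754 translates into roughly 2.13 times as many likes.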

Figure 5.

Average treatment effects of releasing collaborated song(s) on performance.

Does the treatment effect of releasing a collaborative song vary across artists of different levels of quality (proxied by past recognition stock)? Within the potential outcomes framework, we employ a non-parametric stratification approach (i.e., without any assumptions about the functional form) to obtain P_i(t). This approach involves dividing the data into strata based on quality levels, estimating the treatment effect separately within each stratum, and aggregating these estimates weighted by stratum sizes to derive the overall effect. By eschewing parametric assumptions, the stratification estimator provides a more robust estimate of the causal effect.
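A minimal sketch of such a stratification estimator is given below, assuming strata are defined by fixed quality cutpoints. The function `stratified_ate`, the cutpoint, and the toy values are hypothetical; in particular, skipping strata that lack either treated or control units is a choice made for this example rather than a description of our implementation.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean


def stratified_ate(quality, treated, outcome, cutpoints):
    """Size-weighted average of within-stratum treated-minus-control mean differences.

    Returns (overall effect, {stratum index: stratum-specific effect}).
    """
    strata = defaultdict(lambda: {0: [], 1: []})
    for q, t, y in zip(quality, treated, outcome):
        strata[bisect_right(cutpoints, q)][t].append(y)

    effects, weighted_sum, total = {}, 0.0, 0
    for index, groups in sorted(strata.items()):
        if not groups[0] or not groups[1]:
            continue  # no treated/control overlap in this stratum (example-only rule)
        size = len(groups[0]) + len(groups[1])
        effects[index] = mean(groups[1]) - mean(groups[0])
        weighted_sum += size * effects[index]
        total += size
    return weighted_sum / total, effects


# Toy data: four low-quality artists and three high-quality artists.
quality = [1, 2, 3, 4, 20, 30, 40]
treated = [1, 1, 0, 0, 1, 0, 0]
outcome = [3, 5, 1, 3, 11, 7, 9]
overall, by_stratum = stratified_ate(quality, treated, outcome, cutpoints=[10])
print(overall, by_stratum)
```

The stratum-specific effects are what a plot of heterogeneity across quality levels would display, while the size-weighted average recovers a single overall effect without assuming a functional form for how quality enters the outcome.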

The stratification estimates reveal that the performance return from releasing collaborative songs does not substantially differ across artists with varying levels of past quality stock (Figure 5, right panel). The conditional mean collaboration effects are positive and consistent across quality strata. Interestingly, the variability of the treatment effect tends to be greater among high-quality artists: they exhibit more extreme deviations, achieving either substantially higher or lower returns from collaboration relative to other strata. We speculate that this variability may reflect the influence of a stronger individual reputation.

Having descriptively and causally established that featuring behavior yields positive performance returns for Korean hip-hop artists, we turn to the question: Does the status difference between collaborators matter? In Models 3 to 5 of the main regression (Table 8), we explore our preferential attachment hypothesis by adding two sets of variables: the average tendency of artists (average PA_i) with its squared term, and the number of songs classified as either PA_i (PA ≥ .75) or PU_i (PA ≤ .25) by our random forest ML algorithm. These sets are included separately (Models 3 and 4) and jointly (Model 5). Both the linear and squared terms of PA_i are statistically significantly different from zero, indicating a non-linear effect of the average tendency of preferential attachment on performance.¹¹ To capture this non-linear pattern more accurately, we also employ a more flexible machine-learning-based generalized additive model (GAM), which fits a smooth term for preferential attachment using cubic splines (edf = 7.63; F-statistic = 16.45; p-value < 2 × 10⁻¹⁶).

We visualize the effect of preferential attachment through predicted curves from the OLS (red) and GAM (blue) estimates (see Figure 6). Both models suggest that higher like counts occur at either end of the PA_i scale, indicating a U-shaped relationship between preferential attachment and an artist's performance. This finding supports our H2: status differences are helpful even when one co-produces a song with a relatively lower-status artist, whereas collaboration may yield the least benefit when the status difference between collaborators is ambiguous. This result is robust to the removal of a subset of high-quality artists who might have had higher baseline chances of working with a lower-status artist and who happened to obtain higher returns on performance (see Figure S7 and a discussion in Supplemental Material 2.2). The GAM adds further nuance by revealing an asymmetric effect: a steeper uptick in performance for downward ties (the low average PA observations) relative to upward ties. This suggests that audience curiosity sparked by downward deference may generate stronger engagement than status signaling from upward deference.
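For the OLS specification, the location of the bottom of the U can be read directly off the Table 8 coefficients: a quadratic b1·PA + b2·PA² with b2 > 0 reaches its minimum at PA* = −b1 / (2·b2). The short sketch below applies this to the Average PA terms in Models 3 and 5; the helper name `turning_point` is ours for illustration, and the calculation concerns only the quadratic OLS terms, not the GAM smooth.

```python
def turning_point(b_linear, b_squared):
    """PA value at which b_linear * PA + b_squared * PA**2 has zero slope."""
    return -b_linear / (2.0 * b_squared)


# Coefficients on Average PA and Average PA (squared) as reported in Table 8.
TABLE8 = {"Model 3": (-2.744, 3.783), "Model 5": (-3.074, 3.544)}

turning_points = {name: turning_point(b1, b2) for name, (b1, b2) in TABLE8.items()}
for name, pa_star in turning_points.items():
    # A positive squared coefficient means the turning point is a minimum.
    print(f"{name}: predicted likes are lowest near average PA = {pa_star:.2f}")
```

Both turning points fall inside the unit interval (roughly .36 and .43), which is what a U-shape over the observed PA range requires: the fitted curve declines from the downward-tie end, bottoms out at intermediate values, and rises again toward the upward-tie end.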

Figure 6.

U-shaped effect of preferential attachment on performance.

Population-Level Preferential Attachment in the Featuring Behavior

Our descriptive and explanatory model findings thus far suggest that (i) co-releases of songs among hip-hop artists are general; (ii) preferential attachment is prevalent in collaboration; and (iii) featuring behavior and status differences in collaboration create advantages in receiving more likes from listeners on a streaming platform. If so, do artists increasingly engage a higher-status artist on their songs over time, leading to a stratification of featuring invites?

From a different angle, preferential attachment is conventionally operationalized as the tendency that the more connected a node is, the more likely it is to receive new ties (Barabási and Albert 1999). Receiving a tie in a music collaboration network means that an artist is invited to feature on a song. We therefore count each artist's featuring frequency to measure “featuring indegree” (Lee and Li 2025).

We first examine the total featuring indegree of our 3,694 focal artists. Being featured on another artist's song seems to be highly selective. More than half of these hip-hop artists (2,040 [55.2%]) have never been featured on another artist's song. Further, even among the rest (1,654 artists), who were featured at least once on a fellow artist's tune, the distribution of featuring indegree is highly skewed. Figure 7A presents the cumulative distribution function of artists’ featuring indegree. It shows that featuring behavior follows a scaling law characteristic of complex networks. We fit log-normal (blue) and power-law (red) curves and investigated the goodness of fit of each. The log-normal distribution fits the featuring indegree data better, as the bootstrap hypothesis test (Clauset, Shalizi, and Newman 2009) cannot reject the hypothesis (p-value: .361). Total featuring indegree thus appears to be distributed similarly to quantities that prior research has found to follow a log-normal distribution, such as the income of 97 to 99 percent of the population (Clementi and Gallegati 2005) and citation counts of journal articles and patents (Sheridan and Onodera 2018). Therefore, who gets invited to co-produce songs is neither random nor uniform; instead, artists seem to associate preferentially with a few “star” artists whose quality, fame, or expertise may help promote their own songs.
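To illustrate what comparing the two candidate distributions involves, the sketch below fits both by maximum likelihood and compares their log-likelihoods on the same observations. It is a simplified stand-in under stated assumptions: it uses continuous approximations (featuring indegree is an integer count), takes the smallest observation as the power-law lower bound, runs on synthetic data, and does not reproduce the bootstrap goodness-of-fit test. The function names are hypothetical.

```python
import math
import random


def lognormal_loglik(xs):
    """MLE fit of a log-normal; returns (log-likelihood, mu, sigma)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    const = math.log(sigma * math.sqrt(2 * math.pi))
    loglik = sum(-v - const - (v - mu) ** 2 / (2 * sigma ** 2) for v in logs)
    return loglik, mu, sigma


def powerlaw_loglik(xs, xmin):
    """MLE fit of a continuous power law on x >= xmin; returns (log-likelihood, alpha)."""
    tail = [x for x in xs if x >= xmin]
    n = len(tail)
    log_ratio_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1 + n / log_ratio_sum
    loglik = n * math.log(alpha - 1) - n * math.log(xmin) - alpha * log_ratio_sum
    return loglik, alpha


# Synthetic positive values drawn from a log-normal (hypothetical, not our data).
random.seed(0)
sample = [random.lognormvariate(1.0, 1.0) for _ in range(2000)]

ll_lognormal, mu_hat, sigma_hat = lognormal_loglik(sample)
ll_powerlaw, alpha_hat = powerlaw_loglik(sample, xmin=min(sample))
print("log-normal preferred:", ll_lognormal > ll_powerlaw)
```

A higher log-likelihood for one candidate indicates a better fit on the observed values but is not by itself a significance test; a formal comparison would add a likelihood-ratio or bootstrap procedure of the kind cited above.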

Figure 7.

The distribution and temporal pattern of featuring indegree (the number of times an artist features in a song released by a different artist).

Having examined the static picture of preferential attachment, we further investigate whether the inequality of featuring invites between individuals grows over time. We focus on subsets of artists who debuted at a similar time and count their featuring indegree in one-year intervals. This allows us to zoom in on the degree to which the popular get more popular and the unpopular get more unpopular over the years. Figure 7B illustrates the dynamic pattern of the release of featured songs by artists who debuted 3–4 years ago (left column) and 5–8 years ago (right column).

The upper panel in Figure 7B shows the propensity to be featured (i.e., a binary indicator of whether an artist had at least one featured song over a one-year period) in a focal year T, and we found that it depends strongly on the previous year's featuring frequency. Early-career artists active for 3–4 years who had a featuring indegree of 5 or more two years ago have an almost 100 percent propensity to be featured on one or more songs in the most recent year. For those with a featuring indegree of 1, the propensity decreases to below 50 percent in the following year. The same tendency is observed for the other set of artists, who have survived longer in the field and have thus been relatively more active collaborators (5–8 years since debut): the stratification of featuring indegree at t–4 materializes through the differential likelihood of getting featured again at t–3, t–2, and t–1.

The lower panel in Figure 7B shows the same time-wise analysis on a count term of featuring indegree, focusing on the degree to which non-isolates¹² with varying featuring indegree in the previous year turn popular or unpopular in getting invited next year. As expected, the Gini coefficients of featuring indegree for both subsets of artists increase over time (see Table S7, Supplemental Material 2.5). We can infer from the line charts that two-sided processes drive the increased inequality: (i) those who were relatively unpopular (i.e., low featuring indegree) “fade out” next year; and (ii) those who were highly popular tend not to increase the frequency of being featured but maintain a certain level of activities.
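For readers who wish to compute a comparable inequality measure, a minimal sketch of the Gini coefficient on a vector of featuring indegrees follows. It uses the standard rank-based formula on sorted values; the function name `gini` and the yearly counts are hypothetical and are not the values behind Table S7.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfect equality)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over ascending values, with ranks starting at 1.
    weighted = sum((2 * rank - n - 1) * x for rank, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Hypothetical featuring indegrees for the same four artists in successive years.
indegree_by_year = {
    "t-3": [1, 1, 1, 1],
    "t-2": [0, 1, 1, 2],
    "t-1": [0, 0, 0, 4],
}
gini_by_year = {year: gini(counts) for year, counts in indegree_by_year.items()}
print(gini_by_year)
```

In this toy sequence the total number of invitations stays constant while they concentrate on one artist, so the coefficient rises from 0 toward its maximum for four artists, which is the pattern an increasing Gini over time describes.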

Qualitative Evidence: How Rappers Make Collaboration Decisions in Practice

Our empirical focus has been on status distinctions between hip-hop artists manifested in the ties formed through featured songs. To validate our findings, we should ask about the data-generating process: Did hip-hop artists indeed coordinate songs by themselves, with attention to status positions? With this question in mind, we conducted in-depth expert interviews with several “veteran” hip-hop artists. Deepflow, a 41-year-old rapper who served as an SMTM judge and ranks among the highest in total featuring indegree in our data, described the communication channel through which artists reach out with featuring requests as follows:

Usually, it's social media and personal contact. There are sometimes contacts through the label, but I don't think it's common among us. The reason is … at the end of the day, it is the rappers cookin’ it. But if companies’ talks took place before and we meet thereafter in the studio, which feels pretty unnatural and awkward—we can't advance to the next steps. Plus, it's easier to say no to the requests received by the label as that doesn’t hurt it personally.

Thus, according to the interviewee, artists themselves usually connect with other artists for collaboration. It is interesting to see that indirect contact through companies is not effective because artists do not feel guilty about rejecting the request. Thus, it is suggested that the hip-hop collaboration network we observed is unlikely to be an artifact created by organizational-level interventions.

We also asked him about how he makes a decision given the personally received invites to feature:

Q: We assume you receive lots of invites, particularly from junior rappers. How do you handle them, and by what criteria do you accept the invites?

A: It's quite common to receive DMs from non-acquaintances. It's like a mailbox full of unread messages. But I can’t do every song. I use the “preview” function for screening. If that's too amateur level, I just pass by them. But I open and try listening to the tune if I remember names on Soundcloud or something. It's like “hmm, I know this guy.” Then I reply and consider doing a feature.

Deepflow's account can be read as describing actors who attempt to “preferentially attach” to a high-status actor to produce a collaborative cultural product. In our interpretation, status standings play a “screening” role from the standpoint of the invited artist, which serves to shortlist the many invites. The veteran rapper relies on the reputation or fame of the message senders: the “names” he comes to remember. Hip-hop musicians routinely browse Soundcloud to listen to unreleased tracks by other (young) musicians. It could also well be that Deepflow is attracted to a rising star whom his fellow musicians and the listening public talk about, even without directly listening to their music. Whatever the underlying process, it seems clear that perceived status provides a ready rubric for actors to quickly assess the quality of other actors in a market.

Yet it is worth noting that, while status is crucial for initial screening, once shortlisted, a variety of factors appear to affect the decision to accept the featuring invitation. For example, the interviewee tended to view featuring as an opportunity to expand his style of rapping:

I prefer to do styles that I haven't done before. That is, I usually have a certain role that people want me to rap in a song, just like film actors get cast in certain roles … like, I usually get cast in a lot of tracks where they want me to spit some serious hip-hop stuff over a really hard boom-bap beat. At a certain point, I kind of say no to that, like, I’ve done this too many times and I’m done talking about this. So, I tend to be more excited when it feels like something I haven't done before.

Hip-hop artists are embedded in a close-knit community. Loyalty and bonds occasionally kick in:

When there's a friend that I’m so close to and I want to support … and they're planning an album anyway and they need my spot—that's their grand plan. I don't want to ruin that. You know, it's not my discography; it's their music anyway. Then it goes like I’ll be used as a session; so even if the beat isn't my favorite, I'll just give it a shot.

Taken together, Deepflow's firsthand account offers a vivid look at how collaboration decisions unfold in practice. His description suggests that the collaboration network we analyze largely reflects artists’ own status-attentive choices rather than organizational gatekeeping. He screens invitations through reputational cues, relies on informal community knowledge, and uses perceived status as a quick heuristic for assessing quality. Yet once initial screening occurs, creative fit, stylistic exploration, and personal ties also shape his decisions. This qualitative evidence supports our interpretation of preferential attachment as a meaningful status process operating in the field.

Discussion

Status remains difficult to measure because it depends on perceived relative standings among actors. This article developed a measurement strategy for capturing deference in collaborative settings where cultural producers operate within a status hierarchy. The main methodological challenge we identified in the literature was: how can researchers quantify status distances when no complete sampling of perceived standings exists? Our approach shows that supervised machine learning can help learn and recover actors’ differences in quality and credential characteristics—strong correlates of status positions—across partially perceived (labelled) status levels.

We applied this strategy to South Korean hip-hop collaborations by using a decade-long archive of an influential annual TV competition series as a partially observed status arbiter. This approach draws on qualitative insight into the ritualized displays of deference embedded in the show's competition-based hierarchy and translates that understanding into a scalable analytic framework for status inference. For collaboration pairs whose deference direction (PA or PU) across the series’ pronounced status levels was known, we trained a model to learn how differences in quality and credential characteristics map onto upward or downward deference. This model was then used to predict PA or PU for collaboration pairs whose characteristics were observed but whose status levels were not. The resulting probability measure aligned closely with external expert ratings collected through a status position survey.

We used this ML-based preferential attachment measure to test how artists’ collaboration and preferential attachment shape user engagement on South Korea's largest streaming platform. Using imputation and non-parametric stratification estimators in a potential outcomes framework, we estimated the average treatment effect of collaboration and found clear causal evidence that featuring behavior boosts performance. Preferential attachment produced a U-shaped effect: artists benefited both from featuring higher-status partners and from engaging emerging talents, while collaborations with ambiguous status differences yielded the weakest returns. Finally, our population-level analysis showed increasing stratification of featuring invitations over time.

These findings shed light on a new dimension of status processes in cultural fields: Preferential attachment in collaboration. This contrasts with prior research that mostly focused on one's own status positions (Baum and Oliver 1992; Ertug and Castellucci 2013; Podolny 1993; Podolny et al. 1996; Rossman et al. 2010; Stuart 1998; Stuart and Ding 2006). Their measurement strategies typically targeted how actors affiliate with status credentials and/or rely on standings from external arbiters. Our approach instead describes the processes in which actors directly climb up the status ladder by inviting other actors and producing a collaborative performance. Rossman and colleagues, among a very few exceptions, examined status-based peer effects in a collaborative context, linking team spillover (Hollywood actors/actresses having teamed up with a prior Oscar winner) to the likelihood of winning an Oscar award (Rossman et al. 2010). While the concept of team spillover captures only “accidental” upward ties with a high-status actor,¹³ our work illuminates actors’ directed deference—a more “purposive” social engagement that can go in both upward and downward directions in the continuum of preferential attachment. Interestingly, our results add that downward ties can indeed be beneficial for musicians’ performance. This contrasts with prior firm-level studies reporting disadvantages associated with downward between-firm associations (Podolny 2008; Washington and Zajac 2005). We speculate that the U-shaped effect is a function of the impact which the unusual combination of status positions in the featuring behavior makes on audiences.

We contribute to status research by conceptualizing ritualized competitions as culturally embedded sources of status signals. We demonstrate that symbolic acts of deference—performed, witnessed, and socially recognized within a competitive media format—can serve as field-native indicators of hierarchical positioning. Our case of SMTM illustrates how structured interactions among producers, qualifiers, and unsuccessful contestants enact a publicly legible status order grounded in the cultural logic of Korean hip-hop. Integrating this qualitative insight with a scalable analytic strategy, we infer status hierarchies from ritualized performances collected in digital trace data. Strikingly, the resultant measures of status and preferential attachment closely align with external expert surveys, validating the interpretive power of our approach. This conceptualization expands the scope of status research by showing how ritual operates not merely as a symbolic expression, but as a patterned social process through which status is actively produced, negotiated, and recognized in collaborative cultural production (Collins 2004; Goffman 1967).

While our study focuses on the South Korean hip-hop scene, our ML-based status measurement strategy could be broadly extensible to other national contexts. In settings where full-field sampling is not feasible, researchers often focus on relatively elite or publicly visible producers, and status hierarchies within these samples remain analytically meaningful. This strategy can be adapted to such cases by leveraging domain-specific supervision signals—such as competition outcomes, streaming metrics, award recognitions, or performance billing orders—to infer status and model deference. For example, in the domain of hip-hop in the United States, televised competitions like The Rap Game and Rhythm + Flow offer structured rankings that could be used to train models of status ordering. In the Puerto Rican reggaeton scene, status may be inferred from collaborative music videos, venue lineups, and institutional recognition through award shows. More broadly, this approach offers a scalable alternative to studies that rely on sparse or binary status proxies, such as Grammy wins (McMillan 2022) or historical album sales (Halgin et al. 2020), by enabling richer modeling of status hierarchies through partial but interpretable signals embedded in digital trace data.

Beyond national contexts, our strategy can also be applied to other domains where multiple actors purposively co-produce a product labeled in a meaningful order of contributions. Academic co-authorship networks offer a natural parallel: first authors, secondary authors, and corresponding authors with different quality characteristics exert different status signals (Jalali, Introne, and Soundarajan 2023; Katz and Martin 1997; Lee and Bozeman 2005; Li, Liao, and Yen 2013; Moody 2004). The sociology of science has long examined how collaboration with high-status scholars shapes visibility, productivity, and career trajectories (Azoulay et al. 2010; Li and Agha 2015; Maliniak et al. 2013; Mcdowell and Smith 1992; Merton 1968; Zuckerman 1977). Our research can complement this tradition by offering a scalable strategy for inferring status hierarchies in settings where formal markers are incomplete or perceived status data are partially available.

Likewise, our workflow—which combines digital trace data (digitally recorded performance), supervised learning through media-based status identifiers (publicly mediated status indicators), and a collaboration network (collaborative products)—can be used to revisit the problems of organizational and cultural strategic collaboration. These questions have often been investigated with a limited data scope inherent in traditional research designs. In this respect, our study serves as yet another example of social science benefiting from machine learning that helps researchers amplify coding and inference (Lundberg et al. 2022; Molina and Garip 2019). That said, successful adaptation of supervised learning approaches depends on the availability and quality of labeled data, as well as the representativeness of the unlabeled population. Researchers should carefully assess whether supervision signals reflect meaningful status distinctions in the target context and whether collaboration data are sufficiently structured to support inference. In cases where labeled data are sparse or drawn from a different cultural domain, techniques such as domain adaptation may be necessary to mitigate distributional shifts and ensure model validity. This generalizability, when approached with such considerations in mind, opens new avenues for studying strategic collaboration and cultural stratification across diverse fields.

This study is not without limitations. First, the observational nature of our study limits the ability to establish a fully causal relationship between collaboration, preferential attachment, and performance advantages. Although we estimated a counterfactual intervention (collaboration) in addition to the repeated measurement design that controls for the artists’ characteristics before and after the study period, our findings remain constrained by observable factors. Unobserved potential confounders, such as marketing efforts, non-quality-oriented audience engagement (i.e., diss and controversy-driven attention), fan base overlap among collaborators, and external trends in the music industry, are not fully captured in our model. Therefore, the estimated ATE of collaboration and its variation according to preferential attachment should be interpreted within the boundaries of our data collection and methodological approach. Second, future research can benefit from incorporating aesthetic measurements, for example, sound and lyrics characteristics (Berg 2022; Kim and Askin 2024; Negro et al. 2022; Nie 2021). Why does hip-hop music collaboration create advantages? We provided a status-based explanation, but it would also be fruitful to investigate whether collaboration leads to qualitative innovations in sound and narratives. Third, while our study examined hierarchical status signals manifested in a televised competition program, future work could further explore how different forms of visibility (positive, negative, or absent) contribute to status in mediated performance settings. In our field setting, even failing to advance to later stages still confers an advantage in terms of visibility (Andrejevic 2020; Gamson 1994; Marwick and boyd 2011).¹⁴ It would be interesting to examine a setting where the status benefit of visibility depends heavily on its valence (i.e., where negative portrayal lowers status).

Collaboration has long been a common form of cultural production and entrepreneurial innovation. Actors are often incentivized to showcase the participating parties explicitly to audiences, and the featuring behavior of hip-hop entrepreneurs is a case in point. Our case study offers one empirical research design for investigating status processes embedded in collaborative outcomes. We hope our work paves the way for further research that deepens our understanding of acts of deference manifested in collaboration.

Supplemental Material

Supplemental material, sj-pdf-1-smr-10.1177_00491241261420812 for A Machine Learning Approach to Preferential Attachment and Status Advantage in a Hip-Hop Collaboration Network by Jaemin Lee and Yujie Li in Sociological Methods & Research

Footnotes

Acknowledgments

Authors contributed equally. We thank the SMR Associate Editor, Brandon Stewart, and the three anonymous reviewers for their constructive critique and helpful suggestions. Jeong Woo Ham and Eddy Park provided excellent research assistance. We are also indebted to field professionals—rappers Deepflow and Hanhae, cultural critic Bong-hyun Kim, and television producer Hak-min Kim—whose in-depth interviews, as well as their longstanding commitment to the culture of Korean hip-hop, shaped our understanding of the field and inspired the initial motivation for this study.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

This work was supported by the Chinese University of Hong Kong.

Preregistration Statement

This study was not preregistered. The analyses, models, and validation procedures were developed iteratively during the research process, and no preregistration protocol exists for this project.

ORCID iDs

Jaemin Lee

Yujie Li

Data Availability Statement

All data, code, and replication materials used in this study are permanently available in the Open Science Framework repository (Lee and Li 2026), and they reproduce the full set of analyses reported in the main text of the article.

Supplemental Material

Supplemental material for this article is available online.

Notes

Author Biographies

Jaemin Lee is an assistant professor of sociology at The Chinese University of Hong Kong, specializing in social networks, mathematical sociology, and political sociology. He uses computational and network-based approaches to study how structural mechanisms—including hierarchy, cohesion, peer influence, and boundary-making—shape political, cultural, and organizational processes.

Yujie Li is an assistant professor of computational social science at The Chinese University of Hong Kong, Shenzhen. His research interests include computational social science and social networks.

References

Andrejevic, Mark. 2020. Reality TV: The Work of Being Watched. Lanham, MD: Rowman & Littlefield Publishers.

Azoulay, Pierre, Joshua S. Graff Zivin, and Jialan Wang. 2010. “Superstar Extinction.” The Quarterly Journal of Economics 125(2):549–89.

Barabási, Albert-László, and Réka Albert. 1999. “Emergence of Scaling in Random Networks.” Science 286(5439):509–12. doi:10.1126/science.286.5439.509

Baum, Joel A. C., and Christine Oliver. 1992. “Institutional Embeddedness and the Dynamics of Organizational Populations.” American Sociological Review 57(4):540–59. doi:10.2307/2096100

Benjamin, Beth A., and Joel M. Podolny. 1999. “Status, Quality, and Social Order in the California Wine Industry.” Administrative Science Quarterly 44(3):563–89. doi:10.2307/2666962

Berg, Justin M. 2022. “One-Hit Wonders Versus Hit Makers: Sustaining Success in Creative Industries.” Administrative Science Quarterly 67(3):630–73. doi:10.1177/00018392221083650

Blau, Peter M. 1977. Inequality and Heterogeneity: A Primitive Theory of Social Structure. Orlando, FL: Free Press.

Burris, Val. 2004. “The Academic Caste System: Prestige Hierarchies in PhD Exchange Networks.” American Sociological Review 69(2):239–64. doi:10.1177/000312240406900205

Clauset, Aaron, Cosma Rohilla Shalizi, and M. E. J. Newman. 2009. “Power-Law Distributions in Empirical Data.” SIAM Review 51(4):661–703.

Clementi, F., and M. Gallegati. 2005. “Power Law Tails in the Italian Personal Income Distribution.” Physica A: Statistical Mechanics and Its Applications 350(2):427–38. doi:10.1016/j.physa.2004.11.038

Collins, Randall. 2004. Interaction Ritual Chains. Princeton: Princeton University Press.

DiPrete, Thomas A., and Gregory M. Eirich. 2006. “Cumulative Advantage as a Mechanism for Inequality: A Review of Theoretical and Empirical Developments.” Annual Review of Sociology 32(1):271–97. doi:10.1146/annurev.soc.32.061604.123127

Ertug, Gokhan, and Fabrizio Castellucci. 2013. “Getting What You Need: How Reputation and Status Affect Team Performance, Hiring, and Salaries in the NBA.” Academy of Management Journal 56(2):407–31. doi:10.5465/amj.2010.1084

Ertug, Gokhan, Tamar Yogev, Yonghoon G. Lee, and Peter Hedström. 2016. “The Art of Representation: How Audience-Specific Reputations Affect Success in the Contemporary Art Field.” Academy of Management Journal 59(1):113–34. doi:10.5465/amj.2013.0621

Fraiberger, Samuel P., Roberta Sinatra, Magnus Resch, Christoph Riedl, and Albert-László Barabási. 2018. “Quantifying Reputation and Success in Art.” Science 362(6416):825–29. doi:10.1126/science.aau7224

Gamson, Joshua. 1994. Claims to Fame: Celebrity in Contemporary America. Berkeley, CA: University of California Press.

Gienapp, Lukas, Clara Kruckenberg, and Manuel Burghardt. 2021. “Topological Properties of Music Collaboration Networks: The Case of Jazz and Hip Hop.” Digital Humanities Quarterly 15(1):1–30.

Goffman, Erving. 1967. Interaction Ritual: Essays on Face-to-Face Behavior. New York: Pantheon Books.

Goode, William J. 1978. The Celebration of Heroes: Prestige as a Control System. Berkeley: University of California Press.

Gould, Roger V. 2002. “The Origins of Status Hierarchies: A Formal Theory and Empirical Test.” American Journal of Sociology 107(5):1143–78. doi:10.1086/341744

Halgin, Daniel S., Stephen P. Borgatti, and Zhi Huang. 2020. “Prismatic Effects of Negative Ties.” Social Networks 60:26–33. doi:10.1016/j.socnet.2019.07.004

Homans, George. 1950. The Human Group. New York: Harcourt, Brace & World.

Huang, Jiayuan, Alexander J. Smola, Arthur Gretton, Karsten M. Borgwardt, and Bernhard Schölkopf. 2007. “Correcting Sample Selection Bias by Unlabeled Data.” Pp. 601–8 in Advances in Neural Information Processing Systems, Vol. 19. Cambridge, MA: MIT Press.

Imbens, Guido W., and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. New York, NY: Cambridge University Press.

Jalali, Zeinab S., Josh Introne, and Sucheta Soundarajan. 2023. “Social Stratification in Networks: Insights From Co-Authorship Networks.” Journal of the Royal Society Interface 20(198):20220555. doi:10.1098/rsif.2022.0555

Katz, J. Sylvan, and Ben R. Martin. 1997. “What Is Research Collaboration?” Research Policy 26(1):1–18. doi:10.1016/S0048-7333(96)00917-1

Kim, Khwan, and Noah Askin. 2024. “Feature-Based Structures of Opportunity: Genre Innovation in the American Popular Music Industry, 1958 to 2016.” American Sociological Review 89(3):542–83. doi:10.1177/00031224241246271

Kovács, Balázs, and Amanda J. Sharkey. 2014. “The Paradox of Publicity: How Awards Can Negatively Affect the Evaluation of Quality.” Administrative Science Quarterly 59(1):1–33. doi:10.1177/0001839214523602

Lee, Jaemin, and Yujie Li. 2025. “Activity Constraints as a Mechanism for Non-scale-free Social Networks.” Pp. 305–15 in Complex Networks and Their Applications XIII, edited by H. Cherifi, M. Donduran, L. M. Rocha, C. Cherifi, and O. Varol. Cham: Springer Nature Switzerland. doi:10.1007/978-3-031-82431-9_25

Lee, Jaemin, and Yujie Li. 2026. “Machine Learning and Preferential Attachment in Hip-hop Networks: Replication Materials.” Open Science Framework. https://osf.io/3ew8h

Lee, Sooho, and Barry Bozeman. 2005. “The Impact of Research Collaboration on Scientific Productivity.” Social Studies of Science 35(5):673–702. doi:10.1177/0306312705052359

Li, Danielle, and Leila Agha. 2015. “Big Names or Big Ideas: Do Peer-Review Panels Select the Best Science Proposals?” Science 348(6233):434–38. doi:10.1126/science.aaa0185

Li, Eldon Y., Chien Hsiang Liao, and Hsiuju Rebecca Yen. 2013. “Co-Authorship Networks and Research Impact: A Social Capital Perspective.” Research Policy 42(9):1515–30. doi:10.1016/j.respol.2013.06.012

Lundberg, Ian, Jennie E. Brand, and Nanum Jeon. 2022. “Researcher Reasoning Meets Computational Capacity: Machine Learning for Social Science.” Social Science Research 108:102807. doi:10.1016/j.ssresearch.2022.102807

Lundberg, Ian, Rebecca Johnson, and Brandon M. Stewart. 2021. “What Is Your Estimand? Defining the Target Quantity Connects Statistical Evidence to Theory.” American Sociological Review 86(3):532–65. doi:10.1177/00031224211004187

Maliniak, Daniel, Ryan Powers, and Barbara F. Walter. 2013. “The Gender Citation Gap in International Relations.” International Organization 67(4):889–922. doi:10.1017/S0020818313000209

Marwick, Alice, and danah boyd. 2011. “To See and Be Seen: Celebrity Practice on Twitter.” Convergence 17(2):139–58. doi:10.1177/1354856510394539

McDowell, John M., and Janet Kiholm Smith. 1992. “The Effect of Gender-Sorting on Propensity to Coauthor: Implications for Academic Promotion.” Economic Inquiry 30(1):68–82. doi:10.1111/j.1465-7295.1992.tb01536.x

McMillan, Cassie. 2022. “‘Who Run the World?’ Gender and the Social Network of R&B/Hip Hop Collaboration From 2012 to 2020.” Applied Network Science 7(69):1–20. doi:10.1007/s41109-022-00485-9

McPherson, J. Miller, Lynn Smith-Lovin, and James M. Cook. 2001. “Birds of a Feather: Homophily in Social Networks.” Annual Review of Sociology 27:415–44.

Merton, Robert K. 1968. “The Matthew Effect in Science.” Science 159(3810):56–63. doi:10.1126/science.159.3810.56

Molina, Mario, and Filiz Garip. 2019. “Machine Learning for Sociology.” Annual Review of Sociology 45(1):27–45. doi:10.1146/annurev-soc-073117-041106

Moody, James. 2004. “The Structure of a Social Science Collaboration Network: Disciplinary Cohesion from 1963 to 1999.” American Sociological Review 69(2):213–38.

Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. Cambridge, MA: MIT Press.

Negro, Giacomo, Balázs Kovács, and Glenn R. Carroll. 2022. “What’s Next? Artists’ Music After Grammy Awards.” American Sociological Review 87(4):644–74. doi:10.1177/00031224221103257

Nie, Ke. 2021. “Disperse and Preserve the Perverse: Computing How Hip-Hop Censorship Changed Popular Music Genres in China.” Poetics 88:101590. doi:10.1016/j.poetic.2021.101590

Pan, Sinno Jialin, Ivor W. Tsang, James T. Kwok, and Qiang Yang. 2011. “Domain Adaptation via Transfer Component Analysis.” IEEE Transactions on Neural Networks 22(2):199–210. doi:10.1109/TNN.2010.2091281

Perretti, Fabrizio, and Giacomo Negro. 2006. “Filling Empty Seats: How Status and Organizational Hierarchies Affect Exploration Versus Exploitation in Team Design.” Academy of Management Journal 49(4):759–77. doi:10.5465/amj.2006.22083032

Podolny, Joel M. 1993. “A Status-Based Model of Market Competition.” American Journal of Sociology 98(4):829–72.

Podolny, Joel M. 1994. “Market Uncertainty and the Social Character of Economic Exchange.” Administrative Science Quarterly 39(3):458–83.

Podolny, Joel M. 2001. “Networks as the Pipes and Prisms of the Market.” American Journal of Sociology 107(1):33–60. doi:10.1086/323038

Podolny, Joel M. 2008. “Resurrecting Images From the Past? Comment on Reagans and Zuckerman.” Industrial and Corporate Change 17(5):971–77. doi:10.1093/icc/dtn035

Podolny, Joel M., and Damon J. Phillips. 1996. “The Dynamics of Organizational Status.” Industrial and Corporate Change 5(2):453–72.

Podolny, Joel M., and Toby E. Stuart. 1995. “A Role-Based Ecology of Technological Change.” American Journal of Sociology 100(5):1224–60.

Podolny, Joel M., Toby E. Stuart, and Michael T. Hannan. 1996. “Networks, Knowledge, and Niches: Competition in the Worldwide Semiconductor Industry, 1984-1991.” American Journal of Sociology 102(3):659–89. doi:10.1086/230994

Rao, Hayagreeva. 1994. “The Social Construction of Reputation: Certification Contests, Legitimation, and the Survival of Organizations in the American Automobile Industry: 1895-1912.” Strategic Management Journal 15:29–44.

Rossman, Gabriel, Nicole Esparza, and Phillip Bonacich. 2010. “I’d Like to Thank the Academy, Team Spillovers, and Network Centrality.” American Sociological Review 75(1):31–51. doi:10.1177/0003122409359164

Rossman, Gabriel, and Oliver Schilke. 2014. “Close, But No Cigar: The Bimodal Rewards to Prize-Seeking.” American Sociological Review 79(1):86–108. doi:10.1177/0003122413516342

Rubin, Donald B. 1974. “Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies.” Journal of Educational Psychology 66(5):688–701. doi:10.1037/h0037350

Sauder, Michael, Freda Lynn, and Joel M. Podolny. 2012. “Status: Insights from Organizational Sociology.” Annual Review of Sociology 38:267–83. doi:10.1146/annurev-soc-071811-145503

Sheridan, Paul, and Taku Onodera. 2018. “A Preferential Attachment Paradox: How Preferential Attachment Combines With Growth to Produce Networks With Log-Normal In-Degree Distributions.” Scientific Reports 8(1):2811. doi:10.1038/s41598-018-21133-2

Simmel, Georg. 1950. The Sociology of Georg Simmel. Edited by K. H. Wolff. New York, NY: Free Press.

Smith, Reginald D. 2006. “The Network of Collaboration Among Rappers and Its Community Structure.” Journal of Statistical Mechanics: Theory and Experiment 2006(02):P02006. doi:10.1088/1742-5468/2006/02/P02006

Spence, Michael. 1973. “Job Market Signaling.” The Quarterly Journal of Economics 87(3):355–74. doi:10.2307/1882010

Stuart, Toby E. 1998. “Network Positions and Propensities to Collaborate: An Investigation of Strategic Alliance Formation in a High-Technology Industry.” Administrative Science Quarterly 43(3):668–98. doi:10.2307/2393679

Stuart, Toby E., and Waverly W. Ding. 2006. “When Do Scientists Become Entrepreneurs? The Social Structural Antecedents of Commercial Activity in the Academic Life Sciences.” American Journal of Sociology 112(1):97–144. doi:10.1086/502691

Washington, Marvin, and Edward J. Zajac. 2005. “Status Evolution and Competition: Theory and Evidence.” Academy of Management Journal 48(2):282–96. doi:10.5465/amj.2005.16928408

Weber, Max. 1922. Wirtschaft und Gesellschaft: Grundriss der verstehenden Soziologie. Tübingen: Mohr.

Weiss, Karl, Taghi M. Khoshgoftaar, and DingDing Wang. 2016. “A Survey of Transfer Learning.” Journal of Big Data 3(1):9. doi:10.1186/s40537-016-0043-6

Whyte, William F. 1943. Street Corner Society: The Social Structure of an Italian Slum. Chicago: University of Chicago Press.

Zuckerman, Harriet. 1977. Scientific Elite: Nobel Laureates in the United States. New York: Free Press.