Abstract
We offer a new perspective on how cultural markets are structured and the conditions under which innovations are more likely to emerge. We argue that in addition to organization- and producer-level factors, product features—the locus of marketplace interaction between producers and consumers—also structure markets. The aggregated distribution of product features helps producers gauge where to differentiate or conform and when consumers may be more receptive to the kind of novelty that spawns new genres, our measure of innovation. We test our arguments with a unique dataset comprising the nearly 25,000 songs that appeared on the Billboard Hot 100 chart from 1958 to 2016, using computational methods to capture and analyze the aesthetic (sonic) and semantic (lyrical) features of each song and, consequently, the market for popular music. Results reveal that new genres are more likely to appear following markets that can be characterized as diverse along one feature dimension while homogenous along the other. We then connect specific configurations of feature distributions to subsequent song novelty before linking the aesthetic and semantic novelty of individual songs to genre emergence. We replicate our findings using industry-wide data and conclude with implications for the study of markets and innovation.
Dating back at least as far as Schumpeter (1939, 1950), scholars have been interested in the relationship between market structure and innovation. Specifically, which configurations and concentrations of (typically large) firms in a market give rise to increased or diminished rates of innovation? Within sociology, there is a particular interest in the nature of this relationship in the creative industries (e.g., music, art, and film) due to their culture-generating role (Caves 2000) and the acknowledgment that they “depend on continuous innovation” (Jones et al. 2016:751). Indeed, work from both the production of culture perspective (Peterson and Berger 1975) and resource partitioning theory (Carroll 1985) has explored the effects of market concentration on innovation in various creative industries.
However, despite a common interest in the forces that drive product differentiation and conformity, these two streams of research generally reach opposite conclusions. The former, building on Schumpeter’s (1950) premise, initially argued that a high concentration of large firms dampens innovation by reducing producer entrepreneurship (DiMaggio 1977; Dowd and Blyler 2002; Hirsch 1972; Peterson and Berger 1975). In contrast, the latter argues that a high concentration of large firms creates opportunities that increase foundings of smaller, specialist producers who, in turn, create novel product categories like craft beers and new film genres (Carroll and Swaminathan 2000; Mezias and Mezias 2000). Although the production of culture perspective has continually added nuance to its conceptualization of market structure and its consequences for innovation (e.g., Burnett 1992b; de Laat 2014; Lopes 1992), the two bodies of work generally differ in their perspectives and predictions (see Dowd 2004:1446).
One possible reason for the differing predictions is that across both streams of research, conceptions of market structure have historically focused on the producers of innovation (i.e., firms and individual creators) and their characteristics. Yet work in the realm of popular music shows that different conceptions of market structure (e.g., looking at the degree of firms’ decentralized decision-making in the market) negate much of the evidence connecting firm concentration to innovation (Burnett 1992b, 1993; Dowd 2004; Lopes 1992). Furthermore, the observation that market concentration, decentralized decision-making, and rates of innovation have all waxed and waned over the course of music history (Peterson and Berger 1996) has prompted scholars to posit that other features may structure markets (e.g., de Laat 2014). Following in this vein and acknowledging recent research pointing to the need to connect “the social, cultural, and material poles of innovation” (Cao, Chen, and Evans 2022, emphasis added; see also Wohl 2022), we build on the existing market structure research and propose adding the products themselves to the equation. Specifically, we want to understand how product features structure a market and promote or inhibit innovation in cultural industries.
We focus on product features because they are the locus of marketplace interaction between producers—the individual and organizational actors involved in the creation of a cultural product and its presentation to the market¹—and consumers, the end-users of these products. Product features are particularly important for understanding new products because they are the “details” that “invoke the public’s familiarity with [existing] technical artifacts and social structures” as a means of introducing and making comprehensible something new (Hargadon and Douglas 2001:477). Said differently, consumers use the features of new products to understand and situate them among existing products (Tversky 1977), leading to their adoption or rejection, whereas producers use features to distinguish their products and make them appeal to consumers. Therefore, the features of a cultural product (e.g., the ingredients in a dish [Rao, Monin, and Durand 2003] or the colors and styles used in clothing [Godart and Galunic 2019]) represent an intersection between consumer preferences and producer decisions.
It follows that aggregating the features of the available products in a market reveals an overall distribution: the landscape of consumer tastes and producer opportunity spaces. This feature distribution can be observed via regularly collected, updated, and interpreted information about market activity, referred to as market information regimes (Anand and Peterson 2000; Zanella, Cillo, and Verona 2022). In this article, we focus on two product dimensions, which we call aesthetic and semantic features (cf. Castañer and Campos 2002; Peterson and Beal 2001), and we propose that the distribution of these features provides another means of capturing the structure of a market that we can use to examine when and where innovation may emerge.
To answer our main research question—how product features structure a market and promote or inhibit innovation—we set out three objectives. Our first objective is to provide evidence that the feature-based market structure captured by dynamic market information regimes helps indicate when innovation in the form of new genres is more likely to emerge, above and beyond the market- and producer-level factors that characterize previous research. As innovation can be defined as the successful implementation of creative output (Anderson, Potočnik, and Zhou 2014), and novelty is a necessary component of creativity (Amabile 1988), the appearance of a novel category (genre) is very much a sign of a successful innovation (van Venrooij 2015).² This is especially true when that appearance is in the mainstream. We argue that the emergence of new genres is more likely when the feature-based market structure is mismatched, meaning that between the two feature dimensions (i.e., aesthetic and semantic), one is more homogenous, and the other is more diverse.
Our second objective is to link these feature-based structural forces to an outcome of the cultural production process: the novelty of individual products. To draw theoretical and empirical connections from the market-level structure to individual product creation, we first classify markets into four different conditions based on the degree of diversity along each of the two feature dimensions (aesthetic and semantic): both diverse, both homogenous, and two conditions where one feature is diverse, and the other is homogenous. We then examine whether product novelty differs across these market conditions. As above, we highlight market-level feature configurations where one dimension is homogenous and the other is diverse as critical indicators of greater novelty in the individual products that subsequently appear in the market.
Our third objective is to “zoom in” on individual products to explore how their feature novelty plays a role in signaling innovation. More specifically, we ask whether and how a product’s novelty within or across feature dimensions can be a catalyst for its becoming the originator of a ground-breaking category. In other words, does a product’s feature novelty make it more likely to be the progenitor of a new genre, and what composition of novelty matters the most? To answer this question, we examine the aesthetic and semantic features of individual products and distinguish between two types of genre emergence: genre innovation, which reflects ground-breaking novelty that results in fundamentally new categories or new combinations of existing categories, and genre evolution, which also creates or recombines categories but without departing from the boundaries of a parent genre (i.e., new subgenres).³ We hypothesize that both aesthetic and semantic novelty will be necessary for genre innovation, but that one or the other will be sufficient to generate genre evolution.
We explore our proposed phenomena in the American popular music market. The recent availability of fine-grained, song-level data in the form of sonic attributes and lyrical content means we can look in detail at the features of products in the music market in ways that have not historically been possible. We focus primarily on the Billboard Hot 100 charts, which capture the 100 most popular songs in the United States each week, from 1958 to 2016. These charts have been widely used to study the connection between market structure and innovation (Burnett 1992b; Dowd 2002; Dowd and Blyler 2002; Lena 2006; Lopes 1992; Peterson and Berger 1975), making them an ideal setting for our analyses.
Since the middle of the twentieth century, the Billboard charts have been the primary source for making sense of what was happening in the U.S. music industry (Anand and Peterson 2000). Even with increasing quantities of music released and made accessible over the course of our period of study, what happens on the Hot 100 chart still has an important signaling effect as a market information regime (Shi 2022; Zanella et al. 2022). Music industry stakeholders have limited capacity and cannot observe what is going on in the entire population, so while many niche charts exist (e.g., country, rock & alternative, R&B), the Hot 100 remains a focal point for “field participants to structure their beliefs about the success or failure of particular recordings, artists, and sub-genres” (Anand and Peterson 2000:281). The Hot 100 also generates the lion’s share of field-wide media attention, which drives additional focus back to the chart and the genres that appear on it (see van Venrooij 2015:121). Our argument is that for something to truly be considered an innovation, it must achieve some degree of popular success such that it shifts a market’s categorical structuring, making the Hot 100 an appropriate context in which to observe such phenomena.
However, despite its regular use in research on diversity and innovation, the Hot 100 is not without its issues. For example, it focuses only on the United States; prior to the introduction of SoundScan in 1991, it may not have accurately accounted for some genres; and it only captures a small, mainstream segment of the music market.⁴ To address some of these concerns and test the robustness of our findings, we also collected a secondary dataset of over 1.5 million tracks, capturing a wide swath of the available music released during the same period as our Hot 100 data.
Our results suggest that product features provide another means of structuring markets, and they can offer clues as to when and where innovation is more likely to subsequently take place. We show that a market’s potential for innovation varies with the degree of homogeneity and diversity along the semantic and aesthetic feature dimensions, even after controlling for other market- and producer-level factors likely to influence rates of innovation. We also draw a connection between market structure and product novelty, providing evidence of a mechanism connecting macro to micro market dynamics. Finally, we link product-level novelty back to innovation by showing that highly differentiated songs (i.e., those that are novel along both feature dimensions) are more likely to signal genre innovation, whereas songs that are novel along only one feature dimension are more likely to signal genre evolution. Taken together, these results reflect a cycle of innovation in the music market—one that connects market-level phenomena to cultural producers’ output (i.e., the products) and back to market states. We therefore add a new layer to the long-running debate on market concentration and innovation: the role of product features as a market-structuring force.
Theory
Market Information Regimes and Product Features in Creative Industries
In competitive fields like creative industries, organizations derive critical resources from their market (Chang and Chen 2020; Edelman and Yli-Renko 2010), but they must first find ways to make sense of the ambiguity and uncertainty typically surrounding markets (Salancik and Pfeffer 1978). To do so, it is critical to construct field-wide information regimes that help market actors “make sense” of the goings-on in the market. Anand and Peterson (2000:270) define these as “socially-constructed information regimes that compile reports about the market.” By including “regularly updated information about market activity” (Anand and Peterson 2000:271), such information regimes create a common source of relevant information around which market actors can orient their understanding of the market.
In the context of the music industry, Billboard’s weekly charts create such a market information regime (Anand and Peterson 2000). Until recently, when the availability of daily charts provided by digital streaming services made weekly charts slightly less important, the commercial music industry relied heavily on Billboard’s chart data. In the absence of reliable, widely available market-level data, these charts embody a central mechanism of market dynamics, serving as short-hand descriptions of the competitive landscape in the industry, allowing record companies and musicians to continuously monitor each other and each other’s products (White 1981)—as well as consumer demand—vis-à-vis their relative commercial success.
The charts provide a means for the producers and creators of music to spot opportunities for creativity and innovation, which are determined by specific configurations of cultural and material elements (Godart, Seong, and Phillips 2020). In the realm of commercial music, those elements are song features—sonic attributes and lyrics. Artists create, and label executives select and promote, songs that vary along these feature dimensions in attempts to conform to or differentiate from the existing competitive landscape; consumers face choices about whether to listen to music that mirrors current trends or diverges from them. The Billboard charts provide regular snapshots of the industry from which producers and consumers make sense of the current musical landscape and then make choices. The features of the 100 most popular songs in a given week also shape the market context in which new genres arise.
New Genres as Innovations
Initially created as a means of segmenting the music market for the purposes of targeted advertising on radio (Negus 1999; Peterson 1990), genres have become the central categorizing force in the music industry (DiMaggio 1987; Frith 1996; Lena and Peterson 2008; McLeod 2001). As music’s classification system, genres are a touchpoint for artists, consumers, critics, record labels, and audiences, with each party engaging each other in an ongoing conversation about the expectations, norms, boundaries, and commercial or artistic value of each genre (Lena 2012; Lena and Peterson 2008). Genres are highly contested spaces: the boundaries are debated and reinforced by various actors in the industry, each of whom is seeking to identify and expand or preserve what is “allowed” within a given genre (Becker 1982). As such, genres serve multiple purposes, helping to structure the music industry while also determining the appropriate sounds, behaviors, and identities for artists and audiences (Askin and Mol 2018; Becker 1982; DiMaggio 1987; Hsu and Hannan 2005; Phillips and Owens 2004). Genres create markets (Caves 2000; Peterson 1990) and generate status hierarchies (Bourdieu 1983, 1984). That genres go beyond merely dictating sounds and styles is reflective of their economic and social power. Frith (1996:76) points out their particularly critical role, writing that “genre is a way of defining music in its market, or alternatively, the market in its music.”
Critically, when analyzing music industry dynamics, we must acknowledge that genres are not merely musical categories; they are a manifestation of deep-rooted power dynamics influenced by race, gender, and industrial frameworks. For example, from the mid-1920s to the mid-1940s, music was specifically aimed at different racial, ethnic, regional, and class groups. During this span, the correspondence between musical and social categories was obvious, most notably in the industry-wide consolidation around the “race” and “hillbilly” genres, driven by major labels’ splitting of music along racial lines (Roy 2004). In the mid-1940s, these categories became known as “rhythm & blues” and “country” music, respectively, and they began to meld, with associated genres fusing and audiences mixing (Dowd 2003; Dowd and Blyler 2002; Roy 2004). Yet social categories and their respective power dynamics continue to play a significant role in shaping genre categorization, market structure, and the success of new genres and their pioneers. Within the hip-hop genre, for instance, lighter-skinned artists often achieve greater chart success, showcasing how race, intertwined with prevailing societal beauty norms, correlates with genre representation and success (Laybourn 2018). Intermediaries, such as critics, also demonstrate biases in categorizing musicians based on race, further perpetuating racial disparities in the music industry. Black artists are less likely to be categorized as boundary spanners, making it more difficult to be identified as an artist who recombines existing genres into something new (van Venrooij, Miller, and Schmutz 2022). We integrate racial and organizational dynamics into our analyses by considering a broad range of artist demographics and their overall representation in the market.
Because of the central position genres hold within the music industry, we propose that the appearance of a new genre represents an innovation: something substantial enough to create a new market, generate economic activity, and change the organizing hierarchy in the industry (Dowd 2004; Lopes 1992; McLeod 2001). The mainstream appearance of a new genre—although a “discrete event” that is by no means guaranteed during the process of nascent genre development—is one of the “key events that ‘inaugurate’ the form at the field level as a new form and mark an important moment in its institutionalization” (van Venrooij 2015:110). Peterson and Beal (2001:233) note that three of the key processes involved in genre formation are determining the sound and style of what is deemed “acceptable” for the specific genre, understanding appropriate lyrical themes, and finding a target audience (see also Lena 2006). Naming the genre is a fourth key process, and it indicates a genre has created sufficient internal consistency and external differentiation to distinguish it from other genres (cf. Rosch 1978). Summing up, this work suggests innovation takes place when a genre receives a name that identifies distinct sonic and lyrical themes, and it finds or creates an appropriate market.
Importantly, the genre structure of the industry goes beyond “top-level” genres (e.g., “rock,” “jazz,” “country”) and includes hundreds of subgenres—variations on existing themes or recombinations of existing styles into “streams” (Ennis 1992; see also McLeod 2001)—that are nested underneath the top-level genres (Lena 2006). New subgenres also represent important emergence, albeit on a less grand scale. To that end, we examine the two levels on which genres emerge: new top-level genres (as well as new combinations of existing genres), which represent innovations that shift the music market, and new subgenres (as well as new combinations of existing subgenres) that are more evolutionary but still represent significant structural and cultural developments.
Recall that we set out three objectives in order to answer our main research question about how product features may structure markets and promote or inhibit innovation: (1) demonstrating that product features provide a means of structuring markets and then drawing a connection between the aggregate feature-based market structure and subsequent innovation rates; (2) showing how certain configurations of the feature-based distributions are more likely to precede more novel products in the market; and (3) determining the extent of feature novelty that is more likely to lead to genre innovation versus genre evolution. We split each into its own “study,” with its own theorizing and methodological approach. We connect them theoretically following the introduction to Study 1.
Study 1: Product Features and Genre Innovation
We begin with the assertion that two feature dimensions are especially relevant to the emergence of innovation in cultural markets—story or narrative (semantics) and form (aesthetics). Castañer and Campos (2002) show that cultural producers generate novelty by differentiating along these dimensions (cf. Peterson and Beal 2001). A key assumption here is that an array of cultural products’ features can be reduced to, and reasonably captured by, two dimensions. For instance, in the film industry, plots and screenplays are semantic features, whereas cinematography, sound effects, and casts are aesthetic features. In popular music, lyrics constitute semantic features,⁵ and sonic attributes (e.g., tempo, key, timbre, mode) represent aesthetic features (Carey 1969; Hirsch 1971; Roy and Dowd 2010). When combined, the semantic and aesthetic features create a two-dimensional space within which cultural products can be positioned.
As a critical market information regime, the Billboard charts provide a dynamic context for observing the distributions of our two dimensions of interest over time. We argue that the configuration of the distributions of semantic and aesthetic features can serve as the macro structure of opportunities for both product success (what is currently popular? [see, e.g., Askin and Mauskapf 2017]) and innovation (where is there untapped potential?). We focus on the latter question. Below, we define three possible configurations of market-level feature distributions based on the combination of homogeneity and diversity within each dimension (semantic and aesthetic): homogeneity in both, diversity in both, and homogeneity in one and diversity in the other. Drawing on competing predictions from research on differentiation and conformity, we theorize why the three different structures may present different levels of opportunity for innovation.
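To make the three configurations concrete, the following is a minimal sketch in Python of how a single market snapshot (e.g., one week of charting songs) might be sorted into them. It is an illustration only and not the measurement approach used in our analyses: the function names, the use of mean pairwise Euclidean distance as a diversity proxy, and the cutoff values separating "diverse" from "homogeneous" are all assumptions introduced here.

```python
from itertools import combinations
from math import dist
from statistics import mean


def diversity(feature_vectors):
    """Mean pairwise Euclidean distance among feature vectors.

    Assumed proxy for feature diversity; requires at least two vectors.
    """
    pairs = combinations(feature_vectors, 2)
    return mean(dist(a, b) for a, b in pairs)


def classify_market(aesthetic_div, semantic_div, aesthetic_cut, semantic_cut):
    """Label a market by its diversity along each feature dimension.

    The cutoffs are hypothetical (e.g., a historical median of each score).
    """
    aesthetic_diverse = aesthetic_div > aesthetic_cut
    semantic_diverse = semantic_div > semantic_cut
    if aesthetic_diverse and semantic_diverse:
        return "diverse in both"
    if not aesthetic_diverse and not semantic_diverse:
        return "homogeneous in both"
    return "mismatched"
```

Under these assumptions, a market with low aesthetic diversity and high semantic diversity (or the reverse) is labeled "mismatched," which is the configuration at the center of our argument.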
Differentiation–Conformity Tension
Research on differentiation in the creative industries suggests that product homogeneity in a market implies two conditions that may be conducive to innovation: (1) a high likelihood of unsated demand for novel, distinctive products, and (2) an existing hegemonic structure (organizational or product) that provides opportunities for new categories to differentiate from (Anand and Croidieu 2015; Anand and Peterson 2000; Carroll and Swaminathan 2000; Swaminathan 1995). Producers, sensing these conditions and market receptivity, will attempt to capitalize in the form of new organizations or novel products that can give rise to new categories.
Regarding unsated demand, the example of the popular music industry reveals that periods of high song homogeneity are followed by bursts of innovative sounds: in 1955, the rise of rock-n-roll was facilitated by pent-up demand from the 1940s and 1950s, when the commercial music market was dominated by just a few genres and record labels (Peterson and Berger 1975). In the realm of film, innovative genres are more likely to arise when big film studios dominate the market and focus on mainstream genres but fail to serve audiences’ diverse tastes (Mezias and Mezias 2000). This unsated demand spurs the founding of small, specialized firms that increase the creation of new film genres.
Regarding hegemonic structure in a market, which indicates clarity around the organizations that have the most power or the products that are the most popular, research suggests that such conditions indicate a likelihood of greater product homogeneity. This homogeneity enables innovation in the form of new categories and genres because there is an existing frame of reference against which producers can more easily position a new category (Anand and Croidieu 2015; Lena 2012; van Venrooij 2015). This ecological approach explains category emergence across a range of creative industries. For example, various craft and specialty beer categories emerged in the U.S. brewing industry as a reaction to the homogenized offerings of the major breweries. Microbrewers used homogeneity in product features (i.e., a skewed feature distribution) as an opportunity structure providing guidance on where and how to innovate to meet consumer demand (Carroll and Swaminathan 2000). Similarly, new labels by smaller wineries in California emerged as they confronted factory-produced wine (Anand and Croidieu 2015; Swaminathan 1995); nouvelle cuisine arose as a challenge to classical French gastronomy (Rao et al. 2003); and modern architects fought over what differentiated modern architecture from traditional revivalist architecture (Jones et al. 2012). In summary, homogeneity in the marketplace suggests an opportunity for subsequent innovation.
However, when the market is instead characterized by feature diversity, it can be difficult to pinpoint a dominant market position. Such conditions make it challenging for producers to know whom to confront and where to differentiate. Furthermore, it may mean consumers have enough variety to satisfy varying tastes. Hence, this stream of research on differentiation suggests that feature diversity provides little information regarding where and how to innovate, only that such diversity is welcome. The implication is that subsequent rates of innovation are likely to remain flat.
Conversely, work on conformity and legitimacy speaks to a competing prediction. In this view, feature homogeneity indicates that only a small number of legitimate schemas or feature configurations are guaranteed to gain acceptance and attention from audiences. Producers are pressured to conform around the taken-for-granted templates to gain legitimacy in hopes of increasing the likelihood of survival (Deephouse 1999; Greenwood and Suddaby 2006; Meyer and Rowan 1977; Scott 2003). Homogeneity thus suggests isomorphic pressure will drive producers to model their products after a “hit-making” schema rather than innovate. Those who fail to conform to the legitimate formulas are likely to be penalized and may not gain attention (DiMaggio and Powell 1983; Zuckerman 1999). Feature diversity, on the other hand, can be read as signaling more flexible legitimacy standards, as audiences will accept many different product features. From the perspective of conformity and legitimacy, when there is greater diversity in the market, attempts at innovation are less likely to be ignored or penalized (Phillips and Zuckerman 2001; Zuckerman 1999; Zuckerman and Kim 2003). Future innovation rates should rise. Taken together, work on differentiation, emphasizing the agentic behavior of producers, posits that innovation thrives under homogeneity, whereas research on conformity, focusing on the structural influence of market receptivity, suggests diversity is more conducive to innovation.
As discussed, two feature dimensions can combine into three different market configurations: homogeneity across both aesthetic and semantic features, diversity across both feature dimensions, and homogeneity along one dimension and diversity along the other. Regarding the first configuration, building on the differentiation and conformity literatures just discussed, we theorize that despite the potential benefits of capitalizing on unsated demand provided by differentiation, a homogeneous cultural market in both dimensions is unlikely to spur innovation. In markets where the reception of novel products is highly uncertain (Bielby and Bielby 1994; Leahey, Beckman, and Stanko 2017), and the market holds outsized rewards for successes (Rosen 1981), the risk of illegitimacy and its consequences is simply too great (Hsu 2006a, 2006b; Zuckerman and Kim 2003). We expect producers in such markets will play it safe and generally follow existing trends (see Figure 1a).

Figure 1a. Market Homogeneity in Both Feature Dimensions
Similarly, although diversity across both feature dimensions implies relatively weaker isomorphic pressure from the market, we anticipate innovation in such a context will be less likely due to difficulties in determining where and how to differentiate (see Figure 1b). Said differently, when the market is already highly differentiated, it will be hard to differentiate enough to generate innovation.

Figure 1b. Market Diversity in Both Feature Dimensions
As for the third configuration, mismatched feature distributions, in which there is homogeneity along one dimension and diversity along the other, the tension between differentiation and conformity suggests the following: Cultural producers seeking to innovate should prefer market conditions with a powerful, hegemonic reference point that provides something against which they can differentiate. Simultaneously, they should prefer some degree of diversity in the market because it indicates the market is more accepting of newness, thereby lowering the stakes associated with innovation and signaling that divergence is not only tolerated but potentially celebrated. Therefore, producers can aim to strike a balance, tailoring their innovations to stand apart from the homogeneous baseline while simultaneously capitalizing on the audience’s demonstrated appetite for uniqueness. This leads us to argue that the market will see higher subsequent levels of innovation when there is a mismatched distribution landscape (see Figure 1c). A priori, we are agnostic about which dimension should be in each condition. Stated formally:
Hypothesis 1: Rates of genre innovation will be higher following markets in which there is a mismatch between the distributions of semantic and aesthetic features—that is, one is homogenous and the other is diverse.

Figure 1c. Homogeneity/Diversity Combination: Mismatched Feature Dimensions
Connecting Market-Level Feature Distributions and Innovation in Cultural Markets
We have thus far argued that the composition of market-level feature distributions will influence rates of innovation in that market. However, for structural (i.e., macro) conditions to influence market-level outcomes, there must be micro-level changes that reflect reactions to those conditions that can ultimately be aggregated up to the subsequent outcomes (see Coleman 1986). We therefore use Coleman’s (1986, 1994) “boat” or “bathtub” model and follow recent work that operationalizes it empirically as a means of theorizing these macro-micro links—the dynamics between social structure and individual or organizational behavior (e.g., Conti, Kacperczyk, and Valentini 2021).
As depicted in Figure 2, which summarizes the theoretical pathways we propose, we argue that the composition of macro-level feature distributions will point to subsequent rates of innovation in the market (Study 1). What underlies this relationship is that, first, certain combinations of feature distributions signal market conditions that are more amenable or conducive to novelty, which will catalyze producers to create and disseminate music with novel features (macro-to-micro link 1; Study 2). This generation of novelty at the product level, in turn, enhances the likelihood that a song becomes a progenitor of a new genre or subgenre, depending on whether that novelty reflects innovative or evolutionary differentiation (micro-to-micro link 2; Study 3) and there is sufficient audience convergence around that novelty (Boone et al. 2012; Lena 2012; van Venrooij 2015). Finally, when we observe these innovative activities across the micro-level behaviors and aggregate them, we witness a transformation in the social structure, reflected in a measurable change in the macro-level rates of innovation (micro-to-macro link 3). In summary, our “boat” model draws a systemic connection between the distribution of features within a cultural market and its subsequent innovation rate, mediated by the actions and receptivity of individual market participants.

Figure 2. Theoretical Model Using Coleman’s Boat
Study 2: Feature Distributions and Product Novelty
Our model posits that diversity in one dimension and homogeneity in the other is most likely to result in higher market-level rates of innovation. This hypothesis consists of two mechanisms. First, producers will be more likely to create—and audiences will be more amenable to—more novel products, which will see their presence in the market increase following mismatched feature distributions. Second, those more novel, differentiated products will be more likely to become the progenitor product of a new category—that is, an innovation.
The second mechanism requires further theorizing that will be addressed in the next section. In terms of the first mechanism, we follow the same logic as Hypothesis 1, except at the producer rather than the market level. Artists and label executives will view mismatched distribution landscapes (i.e., either diversity–homogeneity or homogeneity–diversity) as indicating greater consumer receptivity toward novelty and differentiation than aligned landscapes (i.e., diversity–diversity or homogeneity–homogeneity). Accordingly, we expect product-level differentiation vis-à-vis a comparison set of recently released products—our operationalization of novelty—to be higher when recent market conditions reveal a mismatched distribution landscape.
Hypothesis 2: Mismatched feature distributions are more likely to precede greater cultural product novelty than are aligned feature distributions.
Study 3: Feature Differentiation, Genre Innovation, and Genre Evolution
Finally, we turn our focus to individual products and their features. Product features are crucial because innovation demands novelty (Becker 1982), and novelty comes from, among other things, the ideological and aesthetic distinctiveness of product features (Jones et al. 2012). Here, we are interested in drawing the connection between producers creating a novel product and that product’s likelihood of finding an audience large enough to register as an innovation. More specifically, if our interest above was in the range of sounds and lyrics presenting opportunities for differentiated music, our interest here is in determining whether products on the more novel end of those dimensions are more likely to become innovations.
Innovation, however, is not a uniform construct: it can be decomposed into different varieties (see note 3). Indeed, extensive feature differentiation is the start of genre innovation (Lampel, Lant, and Shamsie 2000), but such de novo category creation is a rare event, representing the advent of an entirely new set of institutional arrangements. Once innovation in the form of the creation and naming of a new category takes place, the legitimacy conferred shifts the attention of producers and audiences (Anand and Croidieu 2015; Croidieu, Rüling, and Boutinot 2016; Lamont and Molnár 2002; Lena and Peterson 2008). Subsequently, rather than fighting for category-level recognition, producers focus on more individualistic identity claims, differentiating and competing for audience attention within the new category. This, in turn, spawns new products that do not disrupt but sustain the new category via evolution (Beverland 2006; Carroll and Wheaton 2009; Peterson 2005). In summary, genre innovation breeds a number of subgenres under a parent category (see McLeod 2001), which we call genre evolution, and which emphasizes processes of institutional exploration within an established category structure. Genre evolution occurs when feature differentiation does not violate the broader logic of a parent genre yet creates a new subgenre that extends its parent genre.
Two examples highlight the distinctions between genre innovation and evolution. First, in examining the emergence of recorded jazz in the 1920s, Phillips and Owens (2004) show that smaller record labels that had the will and capability to differentiate dramatically (in this case, because the dominant labels would not do so for fear of race-related backlash) were more likely to generate genre innovation (e.g., recording a new genre like radical jazz). This was despite the risk of an illegitimacy penalty. 6 In contrast, firms that aimed to create novel sounds but feared an illegitimacy penalty (or worse) were more likely to produce new subgenres, which contributed to the evolution of jazz more generally. Hip-hop presents a second example. While hip-hop had to revolutionize the way its music was made and heard in order to establish itself as a genre in the 1970s, dozens of subgenres have emerged within hip-hop by delicately differentiating their core message (lyrics) or musical style (sonics) (Alridge 2005; Alridge and Stewart 2005).
These examples suggest there is value in contrasting trailblazing songs with songs whose contribution is smaller in scope, extending existing genres by creating subgenres. We therefore examine the effect of a song’s differentiation along each feature dimension, as well as their interaction, on genre innovation and genre evolution. 7
Hypothesis 3a (innovation): Genre innovation, represented by the appearance of a new parent genre (or new combination of parent genres), is more likely when a song’s lyrical novelty and sonic novelty are both high.
Hypothesis 3b (evolution): Genre evolution, represented by the appearance of a new subgenre (or new combination of subgenres), is more likely when either the lyrical novelty or the sonic novelty of a song is high, but not when both are.
Data and Methods
Continuing in the tradition of the production of culture’s explorations of diversity and innovation in the music industry, we rely on the weekly Billboard Hot 100 chart, which began publishing week-to-week rankings of popular songs on August 4, 1958. Our data run from that date through March 26, 2016. The Billboard charts are among the most reliable and accurate barometers of success and popularity in the U.S. music industry (Lopes 1992) and have been widely used by sociologists, management scholars, and musicologists (Berg 2022; Burnett 1992a, 1996; Dowd 2000, 2004; Lee 2004; Lena 2006; Lopes 1992; Peterson and Berger 1975; Shi 2022; Watson and Anand 2006; Zanella et al. 2022). We use 22,319 songs’ weekly chart appearances, which add up to 276,846 song-weeks, from 1958 to 2016. 8
Using these songs as the foundation, we collected detailed data on the songs, their artists, and the firms that produced them. First, we capitalized on advances in music information retrieval, a field at the intersection of machine learning, computer science, and musicology, to capture 11 sonic features about each song. Table 1 describes the 11 features we collected. These include standard musicological attributes like “tempo” and “key” as well as more algorithmically-derived features like emotional “valence” and “danceability,” which aim to better capture the subjective sonic nature of a song (for more detail, see Askin and Mauskapf 2017). These data, initially generated by algorithms created by a company called The Echo Nest, now help power Spotify’s recommendation engines.
Table 1. The Echo Nest / Spotify Sonic Features
Second, we collected lyric data from various websites like MetroLyrics.com, lyrics.com, and Genius.com. 9 Third, we used Wikipedia, The Encyclopedia of Pop, Rock & Soul, recording labels’ official websites, Allmusic.com (an online music encyclopedia), and Discogs.com (an online recorded music database and marketplace) to gather artist demographic information like gender, race, and a group indicator (i.e., whether a musician is a solo artist or group), as well as organizational information about firms (i.e., record labels) like organizational structure, firm age, whether founded by a musician, and so on.
Finally, we collected every song’s primary genre—Discogs has 15 genres (e.g., rock, R&B, rap, pop), 14 of which appear in our data—and subgenre (258, including art rock, Chicago soul, jazz funk, and so on). Genres and subgenres in Discogs are user-generated, but the options for each are limited, as are the number of genres (three) and subgenres (six) that can be attributed, specifically to avoid confusion presented by “excess styles.” Genre and subgenre attributions are community-approved, meaning any new release entered into the Discogs database has to receive sufficient supporting votes from the community to get listed on the site. 10 The fan-based nature of the genre attributions likely reflects industry designations of genre, meaning there is likely to be strong validity to the genres and subgenres associated with each song, but it also offers greater freedom and less corporate influence than would genre data obtained from a tightly-controlled music encyclopedia like Allmusic. Discogs is a common source of genre data in music industry studies (Askin and Mauskapf 2017; Shi 2022; van Venrooij 2015; van Venrooij and Schmutz 2018).
Given the different levels of analyses required for our three studies, we constructed three datasets. Across all studies, we also perform robustness checks with an expanded dataset capturing much of the recorded and released music in the United States during the same years covered by our Hot 100 data (see Appendix A for more details).
Study 1: Product Features and Genre Innovation
Dependent Variable
Rate of genre innovation
Our first dependent variable is the percentage of songs in innovative genres appearing on the Hot 100 chart in each time window. 11 We count two types of genre creation as innovative—that is, the successful implementation of novelty. The first type, the conventional way of measuring innovation in studies of popular music (Coase 1979; Lopes 1992; Negus 1999; Peterson and Berger 1975), is the simple appearance of a song in a novel genre: for instance, the first time a hip-hop song appears on the chart. The second type of innovation takes its intuition from work on recombination (Fleming 2001; Hargadon and Sutton 1997; Schumpeter 1934) and is defined as the appearance of a song with a novel combination of existing genres. Songs, like films (Hsu 2006b), can be categorized with multiple genre denominations. If a song is labeled as both hip-hop and electronica, for example, and that combination of genres has not previously appeared, we also count it as an innovation.
As we do not have an exhaustive history of genres, it is difficult to determine whether a song’s genre is new and thus innovative in the early years of the Billboard chart. For example, although several jazz songs show up in the very first week of the Hot 100, we know jazz had long been in the popular realm (see Phillips 2013). Accordingly, genres that already existed in the early period serve as a baseline group instead of being included in our analyses. We only treat genres as new if they do not appear within the first eight weeks of data. Furthermore, while songs whose genre(s) appear on the charts for the first time are deemed innovations, we do not believe that only one such song is innovative. Other songs in the same new genre(s) were likely written or recorded around the same time and may have taken slightly longer to be released or to reach the chart. Following others who have studied innovation in music (de Laat 2014), we count a song as an innovation if it appears within one year of the genre’s (or genre combination’s) first appearance on the charts.
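The coding rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used: the function name `flag_innovations`, its parameters, and the input layout are hypothetical, and it makes the simplifying assumption that a song's full set of genre labels is the unit of novelty, so that a new single genre and a new combination of genres are handled by the same lookup.

```python
from datetime import date, timedelta


def flag_innovations(songs, baseline_weeks=8, grace_days=365):
    """Mark which songs count as genre innovations.

    songs: list of (first_chart_date, genres) tuples, where genres is an
    iterable of genre labels (hypothetical input layout).
    Assumption: the whole label set is the unit of novelty, so a single
    genre and a genre combination are both stored as a frozenset.
    """
    songs = [(d, frozenset(g)) for d, g in songs]
    chart_start = min(d for d, _ in songs)
    baseline_end = chart_start + timedelta(weeks=baseline_weeks)

    # Earliest chart date observed for every genre set.
    first_seen = {}
    for d, g in songs:
        if g not in first_seen or d < first_seen[g]:
            first_seen[g] = d

    flags = []
    for d, g in songs:
        debut = first_seen[g]
        # Genre sets that debut inside the baseline period are never "new".
        is_new = debut >= baseline_end
        # Songs arriving within one year of the debut also count.
        in_grace_period = (d - debut).days <= grace_days
        flags.append(is_new and in_grace_period)
    return flags
```

Under these assumptions, a genre set that debuts in the opening weeks of the chart is never flagged, while every song sharing a later-debuting genre set is flagged for one year after that debut.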
To calculate the rate of genre innovation in the field, we take the ratio of the number of songs in novel genres to the number of total songs over 52-week periods, but sliding 26 weeks at a time (meaning there are 26 weeks of overlap in consecutive 52-week windows, see the Analytic Strategy section). 12 In all our models, we include a lagged version of this dependent variable to account for autocorrelation. In our robustness checks, we replace this dependent variable with one measured at the population level (i.e., the wider music industry, not just the Hot 100) to see if any association between the feature landscape and genre innovation persists across the mainstream market of the Hot 100 and the industry in general.
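As a rough sketch of the sliding-window calculation, the following Python function computes the share of distinct charting songs flagged as innovations in each 52-week window, advancing 26 weeks at a time. The function name, the 0-based week index, and the `(song_id, week, flag)` input layout are assumptions made for illustration; the source does not specify how song-weeks are de-duplicated within a window, so counting each song once per window is also an assumption.

```python
def rolling_innovation_rate(song_weeks, n_weeks, window=52, step=26):
    """Rate of genre innovation in overlapping windows.

    song_weeks: iterable of (song_id, week_index, is_innovation) tuples,
    one per song-week on the chart (hypothetical layout, weeks 0-based).
    Returns a list of (window_start_week, rate); rate is None when no
    song charts in the window.
    """
    song_weeks = list(song_weeks)
    rates = []
    for start in range(0, n_weeks - window + 1, step):
        end = start + window  # exclusive upper bound
        # Assumption: each song counts once per window, however many
        # weeks it spends on the chart inside that window.
        songs_in_window = {}
        for song_id, week, flag in song_weeks:
            if start <= week < end:
                songs_in_window[song_id] = flag
        if songs_in_window:
            rate = sum(songs_in_window.values()) / len(songs_in_window)
        else:
            rate = None
        rates.append((start, rate))
    return rates
```

With the default parameters, consecutive windows start at weeks 0, 26, 52, and so on, so each window shares half of its weeks with the one before it.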
Independent Variables
Market-level semantic (lyrical) homogeneity
To capture the level of semantic homogeneity in the popular music market, we encoded the lyrics of each song into a single vector using the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, a Natural Language Processing (NLP) model developed by Google (Devlin et al. 2018). BERT is widely used to extract features of text and represent them numerically via word or sentence-embedding vectors. What makes BERT particularly powerful is its use of contextual word embeddings: whereas Word2Vec and older NLP techniques produce the same word embedding for a word regardless of context, BERT takes context into account, providing more accurate feature representations and thus improving model accuracy. BERT “looks” in both directions (left and right surroundings of a target word) in a sentence simultaneously, using the full context to predict the target word. That is, unlike previous models, it takes both the previous and next tokens (i.e., words) into account at the same time; even the most advanced pre-BERT models did not condition on both directions simultaneously. BERT is pre-trained by randomly masking some words in sentences and then predicting those masked words, while also predicting whether one sentence follows another. In doing so, BERT models the association between two sentences, a feature that allows BERT to outperform other state-of-the-art algorithms (Yoshimura et al. 2019; Young et al. 2018; Zhang et al. 2019).
Effectively, BERT is a model that understands how to represent text. For our purposes, we use the BERT large uncased pre-trained model (24 layers, 1,024-dimension, 16 heads, 340M parameters) from BERT-as-service (Xiao 2018). We feed it lyrics, consisting of a sequence of words, and it scans to the left and right of each word simultaneously to produce a 1,024-dimension vector representation of each word. Next, we average the vectors for all words in a focal song to produce a single vector, yielding an averaged 1,024-dimension vector for each song. We then collect all songs in each 52-week period and average their song-level vectors to create a single vector that represents the market’s lyrical vector for the time window. Finally, we calculate the cosine similarity between each song’s vector and the market-level vector for its 52-week period. This similarity measure reflects how similar each song is to the market-level centroid: the mean of these cosine similarities for all the songs in a given market represents the semantic homogeneity in the market for that period. Concretely, the average cosine similarity is high (i.e., the market is semantically homogeneous) if most of the songs are concentrated around the market-level centroid; it is low if most of the songs are widely dispersed, indicating semantic diversity.
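The centroid-and-cosine step can be illustrated with a short, dependency-free Python sketch. It takes song-level vectors as given (the embedding step itself is not reproduced here), and the function names `cosine`, `centroid`, and `market_homogeneity` are hypothetical labels chosen for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def market_homogeneity(song_vectors):
    """Mean cosine similarity between each song and the market centroid.

    song_vectors: one vector per song in a single 52-week window.
    Higher values mean songs cluster around the centroid (homogeneity);
    lower values mean they are dispersed (diversity).
    """
    market_vector = centroid(song_vectors)
    similarities = [cosine(v, market_vector) for v in song_vectors]
    return sum(similarities) / len(similarities)
```

The same `centroid` helper could also stand in for the earlier averaging step, in which word-level vectors are averaged into a single song-level vector. Note that cosine similarity compares direction only, so two vectors that differ in length but point the same way are treated as identical.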
Market-level aesthetic (sonic) homogeneity
To measure aesthetic homogeneity, we begin with the 11 sonic attributes from Spotify that represent each song’s aesthetic features. The attributes are captured numerically: most range between 0 and 1, and some are in different scales, such as tempo (e.g., 120 beats per minute) and key (e.g., C#, which is 1 on a 0 to 11 numeric key scale). To vectorize each song, we normalize all sonic attributes to the same 0 to 1 range and integrate them into a single, 11-dimension sonic vector. We then calculate the market-level aesthetic homogeneity similarly to semantic homogeneity. We collect all songs in a given 52-week period and average their song-level aesthetic vectors to create a single vector that represents the entire market’s aesthetic vector for the time window. Next, we calculate the cosine similarity between each song’s vector and the average market-level vector for its time period. This cosine similarity score reflects each song’s similarity to the market-level centroid; the mean of the cosine similarities for all the contemporaneous songs represents the aesthetic homogeneity in the market for that period. Concretely, the average cosine similarity is high (i.e., the market is aesthetically homogeneous) for a 52-week period if songs are concentrated around the market-level centroid. It is low if songs are widely dispersed, indicating aesthetic diversity.
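The text does not state which rescaling was applied to put the sonic attributes on a common 0 to 1 range. The sketch below assumes simple min-max scaling computed across the songs supplied, which is only one plausible reading; the function names and the dictionary-based input layout are likewise hypothetical.

```python
def min_max_scale(values):
    """Rescale a list of numbers to the 0-1 range (assumed method)."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant attribute carries no information; map it to 0.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


def sonic_vectors(songs, attributes):
    """Build one scaled sonic vector per song.

    songs: list of dicts mapping attribute name to raw value
    (e.g. tempo in beats per minute, key on a 0-11 scale).
    attributes: ordered list of attribute names to include.
    """
    # Scale each attribute across all songs, then regroup by song.
    scaled_columns = [min_max_scale([song[a] for song in songs]) for a in attributes]
    return [list(row) for row in zip(*scaled_columns)]
```

The resulting vectors would then go through the same averaging and cosine-similarity steps that the paragraph describes for the market-level measure.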
Control Variables
Lagged rate of genre innovation (lagged DV)
Whether a technological breakthrough or a sociological change, innovation is path dependent (Inglehart and Baker 2000; Tavassoli and Karlsson 2015). This means current innovation can be explained by past innovation, raising the possibility of autocorrelation. We therefore control for the rate of genre innovation in the prior period (i.e., the lagged DV).
Market concentration
Our analyses include three variables to control for the effects of market-level firm composition that research suggests may affect product innovation. First, following the tradition in the production of culture literature (Alexander 1996; de Laat 2014; Dowd 2004; Dowd and Blyler 2002; Lee 2004; Peterson and Anand 2004), we compute and include the Herfindahl index—the sum of the squared market shares—among parent record labels represented on the Hot 100 in a given period to control for market concentration.
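For readers less familiar with the index, a minimal Python sketch follows. It assumes that a parent label's market share is its share of charting songs in the period; the source does not specify the basis for market share, and the function name `herfindahl` is hypothetical.

```python
from collections import Counter


def herfindahl(parent_labels):
    """Herfindahl index: the sum of squared market shares.

    parent_labels: one parent-label name per charting song in the period.
    Assumption: market share is measured as share of charting songs.
    """
    counts = Counter(parent_labels)
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())
```

The index equals 1 when a single parent label accounts for every song and approaches 0 as songs are spread evenly over many labels.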
Prevalence of independent firms in the market
Second, we use a measure that captures the prevalence of independent firms: the ratio of independent record labels in the market. Prior literature suggests innovation is more likely in markets with small specialized organizations among the population (Baum and Mezias 1992; Carroll and Swaminathan 2000; Mezias and Mezias 2000). We therefore include the ratio of the number of independent (“indie”) labels with songs on the charts to the total number of record labels placing songs on the charts in a given period.
Prevalence of decentralized organizational structure in the market
The above variable captures the pervasiveness of indie labels, but it does not consider whether those indie firms were purely independent, such that the label is responsible for all processes associated with the creation and distribution of a record. This includes producing, distributing, marketing, and copyrighting, among other responsibilities. One label does not necessarily handle everything. Some labels may be partially independent, responsible for producing an album but leaving the work of distribution, marketing, and copyright to a parent label. In the debate about market concentration and innovation, scholars have shown that the expansion of decentralized production in the popular music industry (i.e., major labels using subsidiary labels to release certain content) effectively reduced the negative influence of market concentration on creating new music (Burnett 1992b; Dowd 2004; Lopes 1992). Big firms provide their subsidiaries with financial and marketing support, as well as sufficient discretion, enabling them to generate novel cultural products.
Examples of decentralized labels include acquired indie labels (e.g., Motown, which was an indie label until it was acquired by MCA Records in 1988 and kept releasing albums under the Motown name); indie labels with a major label distribution deal (e.g., Cash Money, which was run independently until a $30 million distribution deal with Universal Records in 1998); and original imprints established as subsidiaries by a larger parent firm (e.g., Polydor, which was created in 1954 as an imprint of Deutsche Grammophon). To account for the potential influence of this diffusion of decentralization, we build on Dowd’s (2004) approach to measure decentralization. Because label decentralization can vary over time, as in the case of Motown or Cash Money, we collected information on organizational changes such as mergers, acquisitions, and distribution deals from labels’ official websites, news articles, and Wikipedia. These data allow us to track any events that indicate the decentralization of a focal label. Concretely, in our data, Motown is coded as an indie label until its acquisition in 1988 and then as a decentralized firm from 1988 onward. We code 1 if a subsidiary or independent label is owned or distributed by a major label, deeming that major label and any subsidiaries to be “decentralized.” We then include in our models the ratio of songs produced by decentralized firms to the total count of songs on the chart in a given period.
Genre homogeneity in the market
To avoid conflating genre presence and our feature dimensions (e.g., rock songs that all sound similar may be heavily overrepresented on the chart in a given period), we include controls for genre homogeneity. First, we count the number of songs in each genre in each period. To turn the counts into a single market-level homogeneity score, we use Teachman’s entropy index (Teachman 1980), originally developed by Shannon (1948). Following Harrison and Klein (2007), we chose Teachman’s entropy because our theoretical specification of homogeneity (or, inversely, diversity) is closely related to how they define variety. Entropy is at its maximum when all outcomes are equally likely or, in terms of genre variety, when each genre contributes the same number of songs in a given period. Inverting this, our homogeneity measure is (1 – normalized Teachman’s entropy); genre homogeneity is high when the market is dominated by a few primary genres. To create this measure, we first calculated the entropy score, $E$, reflecting genre diversity in each period, measured as
$$E = -\sum_{i} P_i \ln P_i$$
where $P_i$ is the proportion of genre $i$’s occurrences in each market. We then normalized that score to put it on a 0 to 1 scale and subtracted it from 1, returning a homogeneity score. A market is highly genre homogeneous when this value is closer to 1, and highly diverse when closer to 0. Our models include this homogeneity variable for primary genres and subgenres.
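The homogeneity score (one minus normalized entropy) can be sketched as follows. The text does not say how the entropy was normalized; dividing by the natural log of the number of categories, which is the maximum attainable entropy, is an assumption here. The function name and the optional `n_categories` parameter are hypothetical.

```python
import math
from collections import Counter


def genre_homogeneity(genres, n_categories=None):
    """1 minus normalized Shannon/Teachman entropy of genre shares.

    genres: one genre label per song in the period.
    n_categories: number of possible genres; defaults to the number
    observed. Assumption: entropy is normalized by ln(n_categories).
    """
    counts = Counter(genres)
    total = sum(counts.values())
    k = n_categories if n_categories is not None else len(counts)
    if k <= 1:
        return 1.0  # a single possible genre is fully homogeneous
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return 1.0 - entropy / math.log(k)
```

A period in which every song belongs to one genre scores 1, and a period with songs spread evenly across all categories scores 0.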
We next consider demographic factors of the musicians and bands on the Hot 100, as they are likely to exert influence on the emergence of new musical genres. For example, research suggests there is a relationship between inequality and creativity (Godart et al. 2020; Storper and Scott 2008), pointing to an alignment between social distinctions (e.g., race, gender, and class) and musical differentiation (e.g., creating new genres) (Roy and Dowd 2010). Race and gender also play a role in the genre attributions “assigned” to artists (Peterson 1997; Phillips and Kim 2009; Schmutz 2009; Singerman 1999) and whether artists are permitted to span categories (van Venrooij et al. 2022), both of which will affect the likelihood of being among the first artists to be identified as representing a new genre. Considering the power dynamics at play in the music industry, it is possible that minorities could be a source of innovation (Phillips and Owens 2004) or be restricted from innovation opportunities (van Venrooij et al. 2022). Furthermore, work on the relationship between innovation and group dynamics suggests innovation is more likely to come from teams than from individuals, as collaborative efforts allow for specialization and the efficient division of labor (Guimerà et al. 2005; Uzzi et al. 2013). To account for these factors, we introduce the following three control variables.
Female ratio
The female ratio captures the proportion of female artists, whether solo or within a group, relative to the total number of musicians on the Hot 100 during a specific time window. In our calculation of the female ratio, we classify a focal musician as female if she is either a solo female artist or part of a female-dominant group, excluding cases where group composition is equally split between male and female members (e.g., The Mamas & The Papas).
Non-White ratio
The non-White ratio represents the proportion of non-White artists relative to the total number of musicians on the same charts. We classify a focal musician as non-White if they are either a non-White solo artist or a non-White group. Cases where a group includes an equal number of White and non-White members (e.g., B.B. King and Eric Clapton for the album “Riding with the King,” released in 2000) are excluded from this computation.
Group ratio
The group ratio reflects the ratio of non-solo artists to the total number of musicians within the Hot 100 in each period.
Thus far, our control variables are constructed at the level of the Hot 100 chart. However, the emergence of innovative genres that ultimately land on the charts might be catalyzed not just by market characteristics and information on the Hot 100, the most popular marketplace, but also by dynamics playing out across the broader music industry. To address this potential influence, we collected every song from each of the record labels that put at least one song on the Hot 100 during the 59 years covered by our data, plus their subsidiary and imprint labels. This amounted to 1.53 million songs, for which we collected additional data from Discogs, Spotify, and Genius. We ultimately added roughly 374,000 songs (due to data availability) recorded in the U.S. market during our observation period. With these data, we added five controls (see Appendix A for details about the additional data and variable construction).
Release density in population
First, we compute the number of all releases—whether an album or a single—produced each year to address the density-dependence effect of the labels on the new market entries, which may in turn affect the emergence of new genres (Dowd 2004; van Venrooij 2015).
Rate of genre innovation in population
Second, similar to our creation of the rate of genre innovation on the Hot 100, we measure the rate of genre innovation at the population level to control for the possibility that genre innovation on the Hot 100 is associated with genre innovation in the population.
Aesthetic and semantic homogeneity in population
Third and fourth, like the aesthetic and semantic homogeneity measures created for the songs on the Hot 100, we measure the population-level homogeneity across our two dimensions by using all the songs available to control for the potential effect of the population-level feature landscape on genre innovation on the Hot 100.
Decade dummies
Finally, to capture heterogeneity in the ways music is produced and consumed due to time-related trends like technological advances and cultural evolution, we coded dummy variables for seven decades from the 1950s to the 2010s (de Laat 2014; see also Phillips and Owens 2004). There are numerous ways to account for temporal differences in the history of the music industry, but our primary analyses use these decade dummies because this is arguably the most common way—outside of a yearly approach, which we cannot use in light of the way our analyses are constructed—that music fans and followers of the charts conceive of eras (see, e.g., Whitburn 2008).
As noted, the history of the Billboard charts and the broader music industry can be divided in other ways. For example, the algorithm for determining songs’ positions on the Hot 100 has changed several times over the chart’s history (Anand and Peterson 2000), potentially influencing how new songs and genres appear. Moreover, the most popular format for personal music consumption (e.g., cassette, CD, MP3) has similarly changed over the 59 years covered by our data (RIAA 2023). Such changes, reflecting shifts from an industry that was driven by singles, then by albums, then again by singles—and that has gone through multiple format eras—could also influence which songs and genres rise to prominence. Therefore, in our robustness checks, we replace decade dummies with Billboard policy dummies and popular format dummies to see if our results are sensitive to different definitions of distinctive periods in the history of popular music (see Appendix B for details). In the online supplement, Tables S1a and S1b report descriptive statistics and correlation coefficients of the variables for Study 1, and Figure S1 shows the longitudinal trends of our control variables.
Analytic Strategy
Because our dependent variable, rate of genre innovation, is a proportion bounded between 0 and 1, ordinary least squares estimation may be biased and inconsistent. Instead, we use a fractional probit model to test our hypothesis (Papke and Wooldridge 1996). We use Stata’s “fractional response regression” (fracreg) command with a probit link and robust standard errors, which is a standard modeling approach for handling fractional dependent variables (Adegbesan and Higgins 2011). Estimations using fractional logit models and OLS models provide similar results.
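For reference, the fractional probit specification of Papke and Wooldridge (1996) can be written compactly as below. The notation is ours and is introduced only for illustration: $y_t$ is the rate of genre innovation in window $t$, $\mathbf{x}_t$ collects the predictors and controls entering the model for that window, and $\Phi$ is the standard normal cumulative distribution function. Estimation maximizes the Bernoulli quasi-log-likelihood.

```latex
\mathbb{E}\left[\, y_t \mid \mathbf{x}_t \,\right]
  = \Phi\!\left(\mathbf{x}_t^{\top}\boldsymbol{\beta}\right),
\qquad
\hat{\boldsymbol{\beta}}
  = \arg\max_{\boldsymbol{\beta}} \sum_{t}
    \Big[\, y_t \ln \Phi\!\left(\mathbf{x}_t^{\top}\boldsymbol{\beta}\right)
    + (1 - y_t) \ln\!\Big(1 - \Phi\!\left(\mathbf{x}_t^{\top}\boldsymbol{\beta}\right)\Big) \Big]
```

Because $\Phi$ maps any linear index into the unit interval, fitted rates cannot fall below 0 or above 1, which a linear model does not guarantee. Robust standard errors are used because the Bernoulli variance assumption need not hold for a proportion.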
Our hypothesis is that the market-level distribution of semantics and aesthetics in time period t – α has an effect on genre innovation(s) in the market in time period t. We set α to 2: in the American popular music industry, recording contracts between artists and recording companies have normally contained a clause on the “traditional album production cycle” that requires the contracted artists to release a new album every 12 to 18 months (Negus 2011:42). Because this indicates a 1- to 1.5-year period to produce new songs, it is reasonable to assume producers (label executives and the musicians themselves) would spend roughly this amount of time to incorporate their read of the market-level feature distribution into the production and release of new songs. Also, as discussed in note 12, the lagging scheme is based on rough time frames in the music industry. Our use of a t – 2 timeframe for the independent variables and our application of a 26-week sliding window to the analysis are illustrated in Figure S2 in the online supplement. This rolling window approach is common practice in management research, particularly when main variables are aggregated at the market level over consecutive time periods (e.g., Matusik and Fitza 2012; Rietveld and Ploog 2022; Rietveld, Schilling, and Bellavitis 2019).
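The timing implied by this scheme can be made concrete with a small Python sketch. It assumes that windows are indexed by their 26-week step, so that window t covers weeks 26t up to (but not including) 26t + 52; this indexing and the function names are assumptions made for illustration and may differ from the layout shown in Figure S2.

```python
def window_bounds(t, window=52, step=26):
    """Return (start_week, end_week) for window t, end exclusive.

    Assumption: windows are indexed by their step, starting at week 0.
    """
    start = t * step
    return start, start + window


def predictor_window(t, lag=2, window=52, step=26):
    """Bounds of the window supplying the predictors for outcome window t.

    Returns None for the earliest windows, which have no lagged
    counterpart and therefore cannot enter the estimation sample.
    """
    if t < lag:
        return None
    return window_bounds(t - lag, window, step)
```

Under this indexing, a lag of two 26-week steps means the predictor window ends exactly where the outcome window begins, so the homogeneity measures and the innovation rate they are used to predict never draw on the same weeks.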
We standardize our independent variables—semantic and aesthetic homogeneity—to better interpret the results of their interaction and reduce collinearity (Dalal and Zickar 2012). A test of the variance inflation factors returned no significant problems.
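A brief sketch of the standardization and interaction step follows. It uses the population standard deviation, which is an assumption (the text does not say which variant was used), and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers (population standard deviation assumed)."""
    center, spread = mean(values), pstdev(values)
    return [(v - center) / spread for v in values]


def interaction_terms(semantic, aesthetic):
    """Product of standardized semantic and aesthetic homogeneity.

    Both inputs hold one value per time window, in the same order.
    """
    z_semantic = standardize(semantic)
    z_aesthetic = standardize(aesthetic)
    return [s * a for s, a in zip(z_semantic, z_aesthetic)]
```

After standardization, the product is positive when both dimensions sit on the same side of their means (both homogeneous or both diverse) and negative when they sit on opposite sides, which is the mismatched configuration of interest.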
Results
Table 2 highlights the key results of our primary fractional probit models (Table S2 in the online supplement includes all variables, model specifications, and robustness checks). Our main interest is in the interaction term between aesthetic and semantic homogeneity in Model 3, which tests our first hypothesis. Although neither independent variable is individually a significant predictor of genre innovation, their interaction is strongly negative (p < 0.001), indicating that genre innovation is more likely to occur when there was previously homogeneity along one feature dimension and diversity along the other. This is consistent with our first hypothesis.
Table 2. Select Results from Fractional Probit Models for Rate of Genre Innovations (t) in Study 1
Note: Continuous variables are standardized except for the lagged DV. Robust standard errors are in parentheses. Full results appear in Table S2 in the online supplement.
*p < 0.05; **p < 0.01; ***p < 0.001 (two-tailed tests).
In terms of our control variables of interest, subgenre homogeneity shows a negative coefficient across these three models, suggesting the expansion of downstream genre scenes represented by increased subgenre diversity may subsequently promote the creation of new niches for a primary genre (Carroll and Swaminathan 2000; Mezias and Mezias 2000; van Venrooij 2015). Additionally, the consistently negative coefficient on the non-White ratio means a decline in the number of non-White musicians relative to their White counterparts in the market may precede genre innovation, suggesting genre innovation is more likely to emerge when the market was previously characterized by greater racial imbalance (i.e., when the market is highly dominated by White musicians). Similarly, the negative coefficients on group ratio indicate that genre innovation is more likely when the prior mainstream chart was dominated by more solo artists than groups.
Figure 3 plots the marginal average effects on genre innovation based on Model 3. It shows that the conditional mean of the rate of genre innovation is low when aesthetic homogeneity and semantic homogeneity are both one or two standard deviations above or below the mean—that is, when they are both relatively homogeneous (the right side of the dashed line) or both relatively diverse (the left side of the solid line). By contrast, the rate of genre innovation is high when aesthetic homogeneity is high but semantic homogeneity is low, indicating a semantically diverse market (right side of the solid line). The inverse also appears to be true: a semantically homogeneous but aesthetically diverse market is similarly likely to see subsequent market-level innovation (the left side of the dashed line).

Figure 3. Predicted Marginal Effect of Aesthetic (Sonic) Homogeneity at Different Levels of Semantic (Lyrical) Homogeneity from Model 3 in Table 2
To see if one scenario is more likely than the other to lead to genre innovation, we visualize our data in Figure 4, which provides descriptive evidence that the combination of lyrical diversity and sonic homogeneity is more likely to precede innovation. Concretely, in the heatmaps, we divide the market landscapes into 25 conditions based on the intensity of aesthetic and semantic homogeneity (from very homogenous to very diverse across each dimension) and enter the rate (Figure 4) and count (Figure S3 in the online supplement) of genre innovation that follows each market condition in each cell. Across both the Hot 100 and the broader music industry, it is clear that genre innovation follows market situations where semantics are diverse but aesthetics are homogeneous, and not the other way around. We explore this dynamic in greater detail in Study 2.

Figure 4. Heatmaps Showing Innovation Rates in the Hot 100 and Population across 25 Market Conditions Based on the Intensity of Aesthetic and Semantic Homogeneity
Robustness Checks
We assess the robustness of the above results by running the following additional models in Table S2 in the online supplement. In Models 4 and 5, we replace our decade dummies with our two additional temporal measures, Billboard policy dummies and popular format dummies. In Models 6 through 10, the dependent variable is replaced by the rate of genre innovation measured at the population level while keeping the same set of covariates as our primary Hot 100 models. The negative coefficients of the interaction terms between aesthetic homogeneity and semantic homogeneity in Models 4 and 5 (p < 0.001) suggest our results are not sensitive to the different ways we account for time across our observation periods. Furthermore, goodness-of-fit measures suggest the decade dummies are best suited for the Hot 100 models, and the popular format dummies are best for the broader industry, although the differences between the two are small.
The consistent pattern of results when modeling our effects at the population level, although of smaller magnitude, indicates the feature landscape captured by the market information regime (i.e., Billboard Hot 100) may be relevant not only to genre innovation in the mainstream market but also to the entire industry. Conversely, the interaction effects of the feature distributions in the wider population are either not significant or weak and in the same direction as the Hot 100 feature distribution interaction. This further suggests that because the industry is so vast, the charts may provide a centralized source of information about market features and receptivity for many industry players regardless of their active participation in this niche. Finally, the results generally persist when we consider only population-level variables (see Table S3 in the online supplement). Appendix C provides two qualitative examples of our phenomenon.
Study 2: Feature Distributions and Product Novelty
Having established a connection between genre innovation and product feature distributions in the market for popular music, we move next to the question of whether novel songs are more likely to be produced and released under specific market conditions.
Dependent Variable
Song-level (composite) novelty at t
Song novelty is calculated in two steps. We first calculate a song’s novelty score along each feature dimension, semantic and aesthetic, separately. Then, we standardize the two novelty scores to place them on the same scale and sum them. We call this summed novelty variable composite song novelty. We elaborate on each step.
First, using the song-level BERT-generated lyrical vectors, we compute each song’s semantic novelty by comparing its vector to those of the other songs on the charts over the prior five years. For example, the semantic novelty of a song released in 1983 captures how distant that song’s lyrics are from the centroid of lyrical themes of the songs released between 1978 and 1982 (i.e., it does not include contemporaneous songs and we only account for each song once, even if it appeared on the chart for multiple weeks). We calculate this value by first creating an average 1,024-dimension vector from the comparison set, then subtracting the cosine similarity between the focal song’s vector and that mean vector from one. The higher that novelty score, the more distinctive a song’s lyrics, indicating its themes are different from the themes recently popular on the charts. We do not measure lyrical novelty for songs released during the first four years in our data, although those songs are used to calculate lyrical novelty for songs charting in 1963. Additionally, we ran the analysis with a 2-year window and a 10-year window for robustness checks; results are consistent.
Second, using the 11-attribute sonic vectors from Spotify that we use in Study 1, we apply the same approach to aesthetic novelty. We compute a focal song’s aesthetic novelty relative to the songs on the charts over the prior five years. A song’s aesthetic novelty is high (i.e., it “sounds” different) if a song has a highly distinctive sonic vector distant from its predecessors’ sonic feature centroid. We then standardize and sum the two variables, semantic novelty and aesthetic novelty, yielding composite song novelty.
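The two-step calculation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the toy three-dimensional vectors are hypothetical (the actual lyrical vectors have 1,024 dimensions and the sonic vectors 11), and the use of the population standard deviation when standardizing is an assumption because the text does not specify it.

```python
import math
from statistics import mean, pstdev


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [mean(dimension) for dimension in zip(*vectors)]


def cosine_novelty(focal, comparison_set):
    """One minus the cosine similarity between a focal song's vector and
    the centroid of the comparison set (songs from the prior five years)."""
    center = centroid(comparison_set)
    dot = sum(a * b for a, b in zip(focal, center))
    norms = math.sqrt(sum(a * a for a in focal)) * math.sqrt(sum(b * b for b in center))
    return 1.0 - dot / norms  # assumes neither vector is all zeros


def zscores(values):
    """Standardize values (population standard deviation; an assumption)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_novelty(semantic_scores, aesthetic_scores):
    """Standardize each novelty dimension, then sum them song by song."""
    return [s + a for s, a in zip(zscores(semantic_scores), zscores(aesthetic_scores))]


# Toy comparison set standing in for songs charting in the prior five years.
prior_songs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
near = cosine_novelty([0.5, 0.5, 0.0], prior_songs)  # points along the centroid
far = cosine_novelty([0.0, 0.0, 1.0], prior_songs)   # orthogonal to the centroid
```

In this toy case `near` is approximately 0 and `far` is 1, matching the intuition that a song far from the recent centroid scores as more novel.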
Independent Variables
Market-level feature configurations at t – 2
Our first argument suggested that market-level feature distributions can provide an opportunity for cultural producers—artists, record producers, and labels—to differentiate their output. Based on this theorizing, we use the (standardized) market-level homogeneity variables created for Study 1 and split them at their mean (i.e., 0). 13 Doing so creates four market conditions based on high or low semantic and aesthetic homogeneity: homogeneous semantics and aesthetics, diverse semantics and aesthetics, diverse semantics and homogeneous aesthetics, and homogeneous semantics and diverse aesthetics. We assign each song to one of the four conditions, based on the market two years prior to a song’s release. As a result, 3,597 songs are assigned to markets where both features are homogenous, 5,866 songs to markets where both are diverse, 5,400 songs to markets with diverse semantics and homogenous aesthetics, and 6,711 to markets with homogenous semantics and diverse aesthetics.
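The mean split and lagged assignment described above can be illustrated with a short sketch. The function names, the example scores, and the choice to key markets by calendar year are hypothetical simplifications (the source builds its market measures over 26-week windows), and the treatment of a score of exactly 0 is an assumption.

```python
def market_condition(semantic_z, aesthetic_z):
    """Label a market by splitting two standardized homogeneity scores at 0.

    Scores above 0 (the mean) count as homogeneous, the rest as diverse.
    How a score of exactly 0 is handled is an assumption of this sketch.
    """
    semantics = "homogeneous" if semantic_z > 0 else "diverse"
    aesthetics = "homogeneous" if aesthetic_z > 0 else "diverse"
    return f"{semantics} semantics, {aesthetics} aesthetics"


def condition_for_song(release_year, market_scores, lag=2):
    """Assign a song the condition of the market `lag` years before release."""
    semantic_z, aesthetic_z = market_scores[release_year - lag]
    return market_condition(semantic_z, aesthetic_z)


# Hypothetical standardized scores keyed by year: (semantic, aesthetic).
market_scores = {1980: (-0.8, 1.1), 1981: (0.4, 0.6)}
label = condition_for_song(1982, market_scores)
# label == "diverse semantics, homogeneous aesthetics"
```

A song released in 1982 is matched to the 1980 market, which in this made-up example is lyrically diverse and sonically homogeneous.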
Control Variables
For this set of analyses, we include several covariates constructed at four different levels: firm, musician, population, and Billboard Hot 100. First, we create six variables to account for the characteristics of the record labels responsible for producing new music. To do so, we referred to various archival sources—Wikipedia, labels’ official websites, news outlets, and trade magazines. The first five of these are dummy variables intended to control for the many ways labels are founded and run, each of which plays a role in determining what gets produced.
First, we indicate whether a label is either independent (1) or major (0), and then whether it is decentralized (1, otherwise 0). Next, in line with prior literature that suggests collaborative organizational structures can enhance innovative performance (Walter, Auer, and Ritter 2006; Zhou and Li 2008), we delineate labels based on their adoption of one of the following organizational forms—spin-offs, joint ventures, or partnerships. We assign a 1 if a label falls within one of these categories, otherwise 0. Given the variations in the degree of creative control and artistic autonomy offered to musicians themselves that can affect creativity in the production of cultural products (Godart et al. 2020), we also indicate whether a label was founded by a performer (1, otherwise 0). Additionally, we introduce a variable that indicates whether a label was founded by a producer (1 if yes, 0 if no), acknowledging the distinct role that producers may play in shaping the music creation process. The last firm-level variable is firm age, a continuous variable that reflects organizational maturity, represented numerically by the number of years since the firm’s establishment.
Second, as in Study 1, we control for factors pertaining to the musicians and bands themselves that may influence the music they can produce and its novelty. We include a series of dummies to denote the gender and race of the creator(s) of each song, and whether they are a solo artist or a group. This amounts to five dummy variables: gender (female) for solo artists and groups in which female members outnumber male members; gender (mixed), for groups with equal male and female representation; race (non-White) for non-White solo artists and groups with a majority of non-White members; race (mixed) for groups with equal numbers of White and non-White members; and group for distinguishing between solo artists and bands. Gender (male), race (White), and solo artist are the omitted categories as reference groups. At the musician level, we also include a control, first entry to Hot 100, for whether a song is an artist’s or band’s first song on the charts (de Laat 2014; Dowd 2004), and, in light of the importance of lyrics in our analyses, an indicator for instrumental songs.
Third, we again control for population-level variables: release density in population, genre innovation rate in population, aesthetic homogeneity in population, and semantic homogeneity in population. The operationalization of these variables is the same as in Study 1.
Fourth, we control for seven chart-level variables from Study 1 that likely influence the music that is produced and released: genre innovation rate, firm concentration (HHI), primary genre homogeneity, subgenre homogeneity, female artist ratio, non-White artist ratio, and group ratio.
Finally, we control for each song’s primary genre and the era in which it was released. It is possible that, due to genre-specific characteristics, certain genres are more likely to create fine-grained distinctions that lead to a greater likelihood of putting out novel songs. Taking pop as our baseline genre, we include the following 14 parent genres as control variables: rock, folk-world-country, funk & soul, hip-hop, jazz, blues, electronic, Latin, children, stage & screen, reggae, classical, brass & military, and non-music. We also include controls for time periods: decade dummies in our main models and Billboard policy dummies and popular format dummies in robustness checks.
Analytic Strategy
To test Hypothesis 2, we ran pooled, cross-sectional OLS regressions for our dependent variable, composite song novelty, on the 19,442 songs for which we have complete data. The covariates at the firm or musician level were all measured in year t (i.e., the same year as DV is measured), and those at the population or Hot 100 level were all measured in year t – 2. We use heteroskedasticity-robust standard errors in the analysis. Tables S4a and S4b in the online supplement report descriptive statistics and correlation coefficients for the main variables, except for categorical variables. A test of the variance inflation factors reveals no significant collinearity concerns.
Results
We hypothesized that song novelty would be greater when the prior market was characterized by mismatched feature distributions (i.e., one feature was homogenous and the other was diverse) than when feature distributions were aligned. However, based on the results from Study 1 revealed in Figure 4, we further expect that a market characterized by lyrical diversity and sonic homogeneity will give rise to the most novel songs. Figure 5 depicts the standardized results of our primary model (Model 14 in Table S5 in the online supplement, which reports the full results of OLS regressions testing this hypothesis). Our prediction is supported: we find a significant, positive relationship between a lyrically diverse but sonically homogeneous market at t – 2 and higher song novelty at t, in comparison with the reference, a prior market where both feature dimensions are homogeneous. From these results, we not only find support for Hypothesis 2, but we find further evidence in support of Hypothesis 1: novelty is most likely in markets with a mix of diversity and homogeneity.

Figure 5. Selected Standardized Coefficients from OLS Model 14 Predicting Composite Song Novelty (at t) in Study 2
Robustness Checks
A similar pattern (greater subsequent novelty when the charts are semantically diverse but aesthetically homogenous) remains when we use our different demarcations of time (Models 15 and 16 in Table S5 in the online supplement). However, in these models, we also find that markets where both features are diverse give rise to subsequent novelty, although not to the same extent as the diverse-homogenous market. We explore this further at the population level, estimating similar regression models with the larger, more general dataset (Table S6a in the online supplement includes descriptive statistics for these analyses). Unlike the Hot 100 data, for which we hand-coded some variables, this population-level dataset (nearly 380,000 songs) is too large to hand-code. Hence, firm- and musician-level variables are omitted in these models. Nonetheless, the results show patterns consistent with the main findings: with our decade-level controls, we find the same positive results for lyrically diverse but sonically homogeneous markets, but with other measures of time, markets where both feature dimensions are diverse are also correlated with greater song novelty (for full results, see Table S7 in the online supplement).
Study 3: Feature Differentiation, Genre Innovation, and Genre Evolution
In our final study, we explore the relationship between individual products’ feature differentiation and genre innovation or evolution. Whereas in Study 2, we created a composite novelty measure based on two product-level features—the sum of semantic and aesthetic novelty—here we keep the two dimensions separate, creating a novelty variable for each to test Hypotheses 3a and 3b.
Dependent Variables
Genre innovation (song-level)
We created a dummy variable, genre innovation, to indicate songs that meet our criteria for representing a genre innovation. A song is coded 1 if it is categorized in a genre (or genre combination) that appeared on the charts for the first time within the 52 weeks prior to that song’s debut, 0 otherwise.
Genre evolution (song-level)
We created another dummy variable, genre evolution, to indicate songs that meet our criteria for representing genre evolution. A song is coded 1 if it is categorized in a subgenre (or combination of subgenres) that appeared on the charts for the first time within the 52 weeks prior to that song’s debut, 0 otherwise.
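The coding rule shared by both dummy variables can be sketched as below. This is a hedged illustration rather than the authors' procedure: the function name and sample records are hypothetical, and counting a category's very first song as falling inside the window is an assumption. Passing genre labels yields the genre innovation dummy; passing subgenre labels yields the genre evolution dummy.

```python
from datetime import date, timedelta


def flag_new_categories(songs, window_weeks=52):
    """Return 1 for each song whose category first charted within
    `window_weeks` before (or on) that song's debut, else 0.

    `songs` is a list of (debut_date, category) pairs. Treating the
    category's first song as inside the window is an assumption.
    """
    first_seen = {}
    for debut, category in songs:
        if category not in first_seen or debut < first_seen[category]:
            first_seen[category] = debut
    window = timedelta(weeks=window_weeks)
    return [1 if debut - first_seen[category] <= window else 0
            for debut, category in songs]


# Hypothetical chart entries: (debut date, genre or genre combination).
songs = [
    (date(1980, 1, 5), "rock"),
    (date(1980, 6, 1), "rock"),     # 148 days after the category's first entry
    (date(1981, 6, 1), "rock"),     # more than 52 weeks after it
    (date(1981, 6, 1), "hip-hop"),  # first entry in its category
]
flags = flag_new_categories(songs)  # [1, 1, 0, 1]
```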
Independent Variables
Our two independent variables are the two components of the (composite) song novelty variable from Study 2. We are therefore assessing the independent effect of each song’s semantic novelty and aesthetic novelty, as well as their interaction, on genre innovation and genre evolution.
Control Variables
We include several control variables from Study 2: independent firm dummy; decentralized major dummy; spin-off, joint venture, or partnership dummy; founded by performer dummy; founded by producer dummy; firm age; first entry to Hot 100 dummy; gender dummies; race dummies; group dummy; instrumental song dummy; and parent genre dummies. 14 In addition, we include the 11 sonic attributes to control for potential effects of the individual sonic qualities of a song, as well as artist tenure on the Hot 100 to account for artist incumbency (de Laat 2014; Dowd 2004; Lopes 1992). Year dummies are included in all models to account for time-specific effects on our dependent variables.
Analytic Strategy
To test Hypotheses 3a and 3b, which have binary outcome variables, we ran a series of binary logit regression analyses. We use heteroskedasticity-robust standard errors in both analyses. Tables S8a and S8b in the online supplement report descriptive statistics and correlation coefficients for the main variables in Study 3. Our independent variables—semantic novelty and aesthetic novelty—are log-transformed to correct for their right-skewed distribution and then standardized to make the coefficients and their interaction terms more interpretable. A test of the variance inflation factors reveals no significant collinearity concerns.
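The preprocessing of the two novelty variables can be sketched as follows. The function names and sample values are hypothetical, the natural logarithm and population standard deviation are assumptions (the text specifies neither), and the sketch assumes strictly positive novelty scores because the logarithm of zero is undefined.

```python
import math
from statistics import mean, pstdev


def log_standardize(values):
    """Log-transform right-skewed scores, then convert them to z-scores.

    Assumes every value is strictly positive.
    """
    logged = [math.log(v) for v in values]
    m, s = mean(logged), pstdev(logged)
    return [(x - m) / s for x in logged]


def interaction(semantic_z, aesthetic_z):
    """Song-level product of the two standardized novelty variables."""
    return [s * a for s, a in zip(semantic_z, aesthetic_z)]


# Hypothetical raw novelty scores for three songs.
semantic = log_standardize([0.02, 0.05, 0.40])
aesthetic = log_standardize([0.10, 0.03, 0.25])
product = interaction(semantic, aesthetic)
```

With both variables centered at zero, each main-effect coefficient describes the association at the mean of the other variable, which is what makes the interaction term easier to read.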
Results
Table 3 reports the key results of logistic regressions modeling the likelihood of genre innovation and genre evolution (fully specified models and results are in Tables S9 and S10 in the online supplement). We include both independent variables and their interaction in Model 24 and find that, in combination, the two measures of novelty are a powerful positive indicator of genre innovation. When a song is both semantically and aesthetically novel, it is more likely to signal the arrival of a new genre or genre combination. To better understand the interaction effect of the two novelty variables, we include a margins plot in Figure 6. The effect of a song’s semantic novelty on the likelihood of that song signifying a new genre increases substantially when its aesthetic novelty is also high. This is reflected by the solid line in the figure. These results are supportive of Hypothesis 3a, which suggests innovation is the result of products that are highly novel across both feature dimensions. Conversely, when semantic novelty is low (dashed line), the aesthetic novelty of a song has no influence on whether a song is likely to indicate the development of a new genre.
Table 3. Select Results from Logit Models for Genre Innovation and Genre Evolution in Study 3
Note: Robust standard errors are in parentheses. Full results are in Tables S9 and S10 in the online supplement.
*p < 0.05; **p < 0.01; ***p < 0.001 (two-tailed tests).

Figure 6. Predicted Likelihood of Genre Innovation as a Function of Semantic (Lyrical)
Results also suggest that “new” artists are more likely to be innovative: artists’ first songs on the charts are more likely to be the progenitors of new genres than are songs from more established charting artists. As far as artist demographics, we find that female artists are more likely than male artists to produce songs that become genre innovations, as are racially mixed groups (compared to White groups and solo artists). Across all models in Table S9 in the online supplement, the effects for the musician-level covariates remain consistently significant; firm-level controls, however, appear to have little effect on new genres appearing. This may suggest that cultural innovation is more often due to small, cohesive project teams that are tightly built around a focal musician or producer (Faulkner and Anderson 1987; Hsu 2006b), rather than the larger firms that help create and fund those teams.
In contrast to Hypothesis 3a, Hypothesis 3b, which we test in Model 29, predicts that the main effects of semantic novelty and aesthetic novelty on genre evolution are positive and that their interaction has no effect. The results are consistent with this hypothesis: novelty in each dimension is associated with an increased likelihood of a song becoming the progenitor of a subgenre, but their interaction is not significant. As in the models for genre innovation, few of the firm-level variables appear to influence genre evolution.
However, in terms of musician-level controls, first entry to Hot 100 is positive, again indicating a new-entrant effect on cultural evolution. Otherwise, the musician-level variables provide contrast between genre innovation and evolution. Female artists and racially mixed groups, each positively associated with genre innovation, appear to have a nonsignificant and negative relationship with genre evolution, respectively. Overall, gender and race dummies across the models for genre innovation and evolution, whether merely directional or statistically significant, suggest musicians from socially underrepresented backgrounds are more likely to innovate a new genre than are their overrepresented counterparts, while White and/or male musicians are more likely to evolve within an existing genre. Moreover, the coefficients on the group dummy across the models for genre innovation and evolution suggest solo artists disrupt and groups develop music genres, as in the field of science and technology (Wu, Wang, and Evans 2019).
Robustness Checks
To examine the robustness of our results for Hypotheses 3a and 3b, we ran additional logit models using the population-level data. As in the robustness checks for Study 2, we do not have the same set of variables as in the Hot 100 models: the size of the dataset (more than 400,000 observations) makes it impossible to hand-code detailed information for covariates at the firm and musician level. However, the key dependent variables (i.e., genre innovation in population and genre evolution in population), the key independent variables (i.e., aesthetic novelty, semantic novelty, and their interaction), and all the other covariates we were able to include (individual sonic feature variables, year dummies, and parent genre dummies) are measured in the same manner as in the chart-level models in Tables S9 and S10 in the online supplement. The results of logit models using the population data are reported in Table S12 in the online supplement. Results of the population-level models further support Hypotheses 3a and 3b. As on the charts, songs high in both aesthetic and semantic novelty are more likely to become early songs in new genres (innovation), and high aesthetic or semantic novelty (but not both) is sufficient for a song to signal a genre evolution.
Discussion
In this article, we add to an ongoing debate about the nature of market structure and its relationship to innovation (Alexander 1996; Bettis and Hitt 1995; de Laat 2014; Dowd 2004; Lopes 1992; Peterson and Berger 1975; Peterson and Kern 1996; Turner, Mitchell, and Bettis 2010), the persistence of which suggests that no one has satisfactorily resolved the question of which market conditions are most conducive to innovation. Focusing on the American popular music industry, we argue that the debate should include product features, both in aggregate and at the individual product level. Product features have been largely overlooked, as most research in this area focuses on the market actors that produce or consume the products, not the products themselves. We suggest the feature data across two key dimensions, aesthetic (sonic) and semantic (lyrical), is captured by market information regimes that reveal consumers’ tastes and producers’ choices as well as opportunities. Drawing on intuition and conflicting expectations from institutional theory, resource partitioning theory, and work on innovation in creative industries, we theorize and find support for the claim that mismatched market configurations, characterized by feature diversity along one dimension and homogeneity along the other, are most conducive to subsequent innovation.
To examine how product features structure a market and promote innovation, we conduct three studies, unpacking the potential mechanisms that connect the market-level feature configurations to producer output, which is the initial source of any innovative or evolutionary change in a field. We first analyze 113 market information regimes over 59 years, careful to include the factors that have thus far been regarded as key contributors to subsequent innovation: firm concentration in the market, the ratio of independent firms, the extent of organizational decentralization, and so on. In line with research suggesting that firm concentration and decentralized structure may have less to do with innovation than previously believed, or at least that the relationships vary over time (Burnett 1992b; de Laat 2014; Dowd 2004; Lopes 1992), we find little influence of each of these firm- and market-level factors in our primary analyses. However, when we consider the distribution of product features, we find they play a role in genre innovation: specifically, we discover a negative interaction between aesthetic and semantic homogeneity, suggesting a mismatched market configuration is most conducive to the emergence of innovation. Regarding the question of whether differentiation or conformity will drive subsequent innovation, the answer appears to be “both.” Producers and consumers appear to be most inclined to welcome novelty in the form of new genres when the market provides a clear (likely aesthetic) dominant sound from which a new song can differentiate, as well as sufficient (likely semantic) diversity to signal openness to new themes. In addition to this novel finding, we also see the result as further evidence of the relevance of features to producers deciding what to create and put into the market (Rosa et al. 1999; White 1981).
Our second set of analyses refines our understanding of market-level feature configurations and their relationship to product novelty. We find that mismatched feature distributions in the market—specifically, diverse semantics coupled with homogeneous aesthetics—tend to precede periods characterized by greater song novelty, more so than markets with aligned or inversely mismatched configurations (homogeneous semantics with diverse aesthetics). Interestingly, markets with dual diversity (diverse in both semantics and aesthetics) also appear to foster song novelty, but to a lesser extent. The fact that aesthetic diversity and homogeneity are both features of markets that precede greater song novelty suggests semantic diversity may be the more critical driver of song novelty.
This implies one of two phenomena about lyrics and lyrical themes. On one hand, it may be that when there is high lyrical diversity, we are picking up early signals of a fracturing of the existing genre structure of the market, just before it results in the appearance of an entirely new genre (see van Venrooij 2015). On the other hand, the relationship may be less about lyrical diversity leading to greater novelty and more about lyrical homogeneity acting as a barrier to subsequent novelty. Were this the case, lyrical homogeneity might be a sign of a conformity-inducing mechanism keeping more novel songs off the charts. In fact, this inference aligns with research showing that lyrical content often mirrors broader social dynamics: songs released during the COVID-19 pandemic possessed more negative emotional content (Putter, Krause, and North 2022); unemployment rates are correlated with heightened lyrical anger (Qiu et al. 2021); and in more challenging socioeconomic times, songs with profound, comforting, and romantic themes are more likely to gain prominence (Pettijohn and Sacco 2009) (see also Appendix D). If social dynamics drive similarity in lyrical themes, and songs that speak to those dynamics are rewarded, then subsequent (lyrical) novelty is likely to be diminished—a dynamic that would show up in a positive coefficient on semantic diversity. Exploring the details of the lyrical diversity–song novelty connection is a compelling area for future research. More generally, the Study 2 results suggest that an understanding of market conditions across both feature dimensions is important for artists and labels who seek popular success with novel songs.
Finally, in Study 3, we show that differing degrees of novelty along aesthetic and semantic features contribute to songs’ likelihood of signaling genre innovation or evolution. Novelty in both feature dimensions signals a song’s higher chance of being among the first in a new genre or genre combination, whereas novelty in one dimension is sufficient for a new subgenre or subgenre combination. That these results hold using a broader set of songs from the wider music industry suggests our variables are capturing something meaningful about the production and consumption of music, and not just at the level of the Hot 100 chart. They also suggest the social implications of cultural evolution (i.e., new genres, social categories, and market configurations) are reflected in the products themselves. While the emergence and diffusion of genres have historically been deeply intertwined with racial, gender, and organizational dynamics, as well as the strategic efforts of record labels, our findings add a layer to this understanding. They reveal that, alongside these social and industrial influences, the features of the products themselves also play a pivotal role in the genesis and shaping of new musical genres. Although likely unsurprising to anyone who creates or consumes music, our results underscore the need for a more multifaceted academic view of genre innovation and evolution.
In addition to the exploration of mechanisms potentially responsible for our findings, our study makes several other contributions to the literature. First, although we are not the first to suggest that product features play a role in the structuring and evolution of a cultural market, the addition of the fine-grained data we use represents a theoretical and an empirical contribution to the ongoing conversation. This type of data adds another dimension, alongside organizational concentration and the degree of decentralization, requiring consideration when conceptualizing market structure. Many markets will have more than two relevant feature dimensions (more on this below), but analyses of any product-driven market can benefit from the inclusion of feature data. Cultural markets (e.g., wine, cuisine, television, fashion) may be the most obvious for collecting and including this kind of data, but the relevant features of technological products (see Khessina and Carroll 2008) and other consumer products (see Rosa et al. 1999) are easily found and have already been shown to be relevant to organizational research. More broadly, our study highlights the value of incorporating product features and their evolution into the organizational theorizing around innovation. Our findings underscore the idea that structural and feature-based forces should not be fully decoupled from the creative producers themselves, nor from their market.
Our study also contributes to the examination of gender, race, and power in the music industry. Taking a high-level view of our findings reveals that groups of underrepresented minorities (female artists and mixed-race artists/groups) are more likely to introduce new genres and genre combinations (Study 3), and that new genres are more likely to appear following periods of racial homogeneity (i.e., when the non-White ratio is lower; Study 1). Taken together, these results paint a picture of chart dominance by White male artists setting the stage for subsequent genre innovation, often coming from non-White-male artists and groups (cf. Dowd and Blyler 2002; Dowd, Liddle, and Blyler 2005). Yet mixed-race groups are less likely to introduce genre evolutions (cf. van Venrooij et al. 2022), perhaps reflecting more powerful record labels’ moving in once a genre is established and keeping mixed-race groups from pushing internal boundaries (see Phillips and Owens 2004). Furthermore, the songs that become progenitors of new genres and subgenres tend to come from artists who have not previously appeared on the charts (cf. de Laat 2014; Lopes 1992; Peterson and Berger 1975). Despite findings that speak to these earlier explorations of who is “allowed” to innovate and under what conditions, we do not find evidence of indie labels being more likely to be sources of innovation, nor is the degree of organizational decentralization in the industry a contributor to greater or lower levels of innovation. Taken together, these results suggest innovation may have more to do with performers and their products than with the structure of the organizations behind them.
Our study also incorporates a variety of computational techniques for content analysis. In addition to the features themselves, which were calculated using audio analysis algorithms, we use a neural network text-encoding algorithm, enabling us to separately analyze the core feature dimensions for each product, and to do so at scale. Others have used NLP techniques on song lyrics (Berger and Packard 2018; Nie 2021), but to the best of our knowledge, we are the first to do so at this scale and scope, with the intent of better understanding how lyrical conformity and differentiation contribute to genre formation and evolution. The kind of data and analyses we use contribute to a growing trend in the social sciences, where “digital trace” data (data created and made available as a function of interacting with digital tools and platforms) are being used to examine culture, broadly defined (e.g., Mohr et al. 2020), as well as music more specifically (Askin and Mauskapf 2017; Negro, Kovács, and Carroll 2022; Nie 2021). The explosion in available data capturing extensive metadata and algorithmically captured feature data, plus improvements in the accuracy of capturing lyrics and tools for analyzing them, means it is now much easier to understand trends and dynamics across cultural industries at scale, over time. Such techniques and computational advances allow us to propose new explanations that were previously not testable; this sets the stage for future research to dig deeper into the content and structure, and causes and consequences, of market dynamics.
As with all studies, ours is not without its limitations and boundary conditions. The first relates to the bias inherent in the use of Billboard’s Hot 100 chart. A focus on only the Hot 100 and the labels that put songs on this mainstream chart surely means we do not account for many genre and subgenre innovations within the market for music, including many that alter the market’s categorical structuring. However, we maintain that a new genre’s appearance on the charts is a clear signal of an innovation, and we add support for our findings by showing similar patterns in a much broader set of songs. Thus, although we cannot wholly exclude the possibility that we neglect earlier progenitors of new genres because of their relative lack of popularity or the new genre’s comparatively niche appeal, we believe our findings (1) are robust beyond the limited chart and (2) can be tested in future research that covers an even more comprehensive range of songs and producers, including those that could not make it to the chart (cf. Negro et al. 2022).
A second limitation pertains to the feature dimensions we use and the generalizability of our findings. First, we do not include anything beyond basic demographic information in terms of the artists themselves. Musicians can play with novelty and creativity in ways we are unable to account for: personal image, visual presentation, and performance style, to name a few. There are currently no easy means for quantifying these, nor are they directly related to the features of the songs themselves. Qualitative ethnographies are necessary to test how these kinds of attributes contribute to assessments of novelty, but they fall outside the scope of this study.
Second, although the identity cues and themes in lyrics and the stylistic characteristics contained in songs’ sonic fingerprints cover the cultural content contained by a piece of music, the same typology may not neatly apply to products outside cultural markets. Even within other cultural markets, when it comes to distinguishing feature dimensions, the field of commercial music is arguably among the simplest to analyze because the core features of popular songs can be easily divided into lyrics and sounds. However, in some markets (e.g., film), many more features exist, and in other markets (e.g., literature), perhaps one feature dimension may be enough. In others still, features can be very hard to measure due to intrinsic ambiguity (e.g., painting). Yet despite these concerns, advances in computational techniques are making data generated by the digital decomposition of cultural objects like paintings and images (e.g., Banerjee, Cole, and Ingram 2023), fashion (e.g., Godart and Galunic 2019), and even films (e.g., Harrison, Carlsen, and Škerlavaj 2019) into their constituent parts more accessible. Armed with these advances and the ability to access this kind of data for vast quantities of cultural objects, we believe future research can further test the role of feature dimensions on opportunities for innovating. Our working hypothesis is that some kind of mismatch between diversity and homogeneity across dimensions—regardless of how many there are—will be the most conducive to producers generating novel products and to markets welcoming more innovation.
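The working hypothesis above can be made concrete with a small illustrative sketch. It is not the measure used in the studies reported here: the diversity score (mean pairwise cosine distance), the `mismatch` index, and the toy vectors are all assumptions introduced only to show how a diverse-versus-homogeneous configuration could be summarized across any number of feature dimensions.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def diversity(vectors):
    """Mean pairwise cosine distance (assumed proxy): near 0 means homogeneous."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def mismatch(dimensions):
    """Hypothetical index: gap between the most and least diverse dimensions."""
    scores = {name: diversity(vecs) for name, vecs in dimensions.items()}
    return max(scores.values()) - min(scores.values()), scores


# Toy market of three products on two dimensions (made-up numbers).
toy_market = {
    "sonic": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],    # vectors point in different directions
    "lyrical": [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]],  # vectors all point the same way
}
gap, scores = mismatch(toy_market)
print(scores, gap)
```

In this toy market the sonic dimension is diverse while the lyrical dimension is homogeneous, so the gap is large. Because `mismatch` accepts any number of named dimensions, the same sketch would extend to markets with more than two feature dimensions.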
A third limitation of our study is related to concerns about the dimensionality of the data needed to best capture the products in a given industry. Our own data are neither as symmetrical as we would like them to be, nor as nuanced. Our sonic feature data are sophisticated—they power the recommendation engines used by Spotify—but they still significantly flatten a song’s complexity. Eleven summary features are inherently reductive, simplifying music in such a way that it is not possible to take the features we use in our analyses for a given song and reverse engineer that song. Moreover, although BERT provides a more detailed analysis of semantic content, it is similarly not feasible to take the lyrical vectors for a song and reconstruct the actual words sung. But what is lost in depth and precision is made up in scale: the inherent reductiveness of computational techniques used to capture cultural features allows us to compare tens of thousands (or more) of these objects. We believe the tradeoffs are worthwhile.
Finally, while our analytic and theoretical approaches suggest a causal relationship, we do not test our results in a way that can definitively identify causality. Many factors beyond market structure surely play a role in the rise of innovation, and we cannot categorically say we have ruled them out in favor of the role played by product features. Along these lines, our use of different ways of accounting for temporal changes in the music industry, including shifts in the methodology used to create the Hot 100, reveals varying results. The implications of these differing results as a function of how time is captured in the industry are outside the scope of this article, but we do believe the intersection of features and temporal dynamics in the music industry is compelling territory for future research. Furthermore, while we position the emergence of new genres as being situated between the decisions of artists and label executives and the preferences of consumers, we cannot speak specifically to the drivers of these phenomena, although we are sure both sides of the market are deeply involved. To that end, future research should further examine the questions we raise in two ways. First, we should qualitatively explore whether and how organizations, producers, and consumers process market information in the form of feature dimensions and incorporate it into their decisions about what to produce next or which songs they like. Second, we should take a more explicitly causal quantitative approach to see how producers respond when there are notable changes to the market information regime such that it is clear the configuration of features has changed.
Overall, our study makes important contributions to the theoretical and empirical analyses of the role of market structure in innovation. One of the early articles to explore these dynamics is titled “Cycles in Symbol Production: The Case of Popular Music” (Peterson and Berger 1975), and cycles remain an important aspect of this line of research (see Peterson 1997). We provide evidence that these cycles extend to the features of the products in a market as well: innovation rates cycle alongside feature distributions (see Figure S4 in the online supplement). We believe the framework we created invites scholars interested in the emergence of innovation to consider more seriously the product features of the markets and industries they study. We suspect that practitioners and consumers already do; now they have support for their intuition.
Supplemental Material
Supplemental material, sj-pdf-1-asr-10.1177_00031224241246271 for Feature-Based Structures of Opportunity: Genre Innovation in the American Popular Music Industry, 1958 to 2016 by Khwan Kim and Noah Askin in American Sociological Review
Acknowledgements
For their valuable ideas, comments, and support, we are grateful to Matt Bothner, Charlie Galunic, Martin Gargiulo, Frédéric Godart, Henrich Greve, Steve Mezias, Damon Phillips, and Henning Piezunka. Thanks are also due to the attendees of the 2022 Creative Industries Conference in Amsterdam. Finally, we are grateful to editor Lynn Selhat for her constructive feedback and editing prowess. The current and previous editorial teams at ASR provided valuable guidance and support, and three anonymous reviewers were among the most helpful we have ever encountered. We extend our sincere thanks to them all.
