Abstract
This study explores one potential mechanism contributing to the persistent underrepresentation of women in film by considering whether movie critics reward or penalize films with an independent female presence. Drawing on a sample of widely distributed movies from 2000 to 2009 (n = 975), we test whether films that pass the Bechdel Test (two or more named women speak to each other about something other than a man) have higher or lower Metacritic scores net of control variables, including arthouse production label, genre, production budget, the presence of a top star, and sequel status. The results indicate that the mere inclusion or absence of an independent female presence has no effect on a film’s composite critical evaluation. These findings suggest that while critical reviews are not a major factor contributing to women’s exclusion from film, movie critics as a whole do not advocate for films with an independent female presence.
Introduction
“It’s the mission of art to free our minds, and the task of criticism to figure out what to do with that freedom,” writes New York Times film critic A. O. Scott (2016). Critical review plays an important role in popular culture, both anticipating and shaping audience preferences. In their work as “surrogate consumers” (Hirsch 1972), critics must use a range of sometimes conflicting criteria, including highbrow artistic criteria, commercial criteria, and speculative criteria (Fowdur, Kadiyali, and Prince 2012). Few studies have examined how representations of members of minority groups may affect critics’ judgments. In this study, we examine the effect of gendered content on movie reviewers’ critical appraisal.
Women are significantly underrepresented in film (Lauzen 2015; Murphy 2015; Shor et al. 2015). In previous research using the same data set used here, Lindner, Lindquist, and Arnold (2015) showed that films with an independent female presence earn less at the box office not because audiences dislike them but because they receive fewer resources in the production stage. The current research examines a different outcome: not box office revenue but critical appraisal by movie reviewers. A 2009 study documenting racial bias in newspaper movie reviews argued that “a logical extension [of the research] would be to investigate critical bias in favor or against other minority groups including female leads” (Fowdur et al. 2012:22). This study does just that by asking whether films that have an independent female presence fare better or worse with critics.
Drawing on a sample of the most widely distributed films annually from 2000 through 2009 (n = 975), we link a content analysis using the Bechdel Test (see note 1), a measure of whether a film has an independent female presence, with Metacritic scores while controlling for other factors such as arthouse production label, genre, production budget, the presence of a top star, and sequel status. In doing so, our research contributes to sociological theory about both mechanisms of gender inequality within the culture industry and heuristics used by critical gatekeepers to judge art.
We begin by briefly reviewing existing literature on the influence of critics, their position within the social hierarchy, and the criteria they use to make assessments about films. We continue by discussing the methods used to code and analyze the films in the sample. Then, we review the results of a linear regression analysis of the Metacritic scores on whether the film passed the Bechdel Test and a number of control variables. We conclude with a discussion of the implications of these findings for the processes by which women are persistently underrepresented in film.
Representing Women in Film
While it is beyond dispute that women are underrepresented on the silver screen, representation itself is multifaceted. Erigha (2015:79) argues that there are three types of representation within the film industry: numerical representation (“a social group’s presence or absence on-screen”), centrality of representation (whether a social group is included at the core or the margins of the industry), and quality of representation (whether members of a social group have the chance to play “multi-dimensional, multi-faceted roles”). A numerical approach to studying women’s representation might establish, for example, the proportion of all characters appearing on screen in a set of films in a given year who are women. In 2017, the Geena Davis Institute and Google developed a tool to analyze video for “a character’s gender . . . how long each actor spoke, and were on-screen” (Google 2017). Using it, they found that women had only 36 percent of the screen time and only 35 percent of the speaking time in the 100 top-grossing films of 2014–2016.
Though numerical representation provides us with an important baseline, not all roles, lines, or screen time are equally important to a movie. Centrality of representation approaches consider whether the character is named or whether the actor receives top billing. Quality of representation measures lead us to consider whether the character is a stereotypical portrayal or one with more complexity.
The Bechdel Test (whether a film has at least two named women who talk to each other about something other than a man) aims to capture elements of all three types of representation. It requires low numerical representation (at least two women) but specifies some degree of centrality (they have to be named characters). By requiring that the women talk to each other about something other than a man, it establishes a minimal level of quality of representation (i.e., that the women are depicted as having lives independent of men).
It is quite possible that critics respond more positively to one very high-quality character played by a woman in an otherwise all-male cast than to strong numerical representation of women in fairly superficial, stereotypical roles. Alternatively, critics may respond to particular narratives about gender (e.g., women can overcome obstacles through personal resilience) rather than types of representation in the cast.
For their study of racial representation and movie reviews, Fowdur et al. (2012) chose a measure that blends a numerical and a centrality of representation approach: the proportion of black actors out of the top five billed actors. As an initial attempt to study the link between gender representation and movie reviews, we use the Bechdel Test as a means of drawing on three types of representation. In doing so, we may miss effects that would appear with measures that isolate a single type of representation or capture greater variation in numerical or quality of representation. Future research ought to link other measures of gender representation to movie reviews.
The Work of Critics
Movie critics, like cultural producers, operate within a broader system of social relationships and cultural meanings. The cultural diamond first developed by Griswold (2012) is a model that envisions a set of links between four points in a diamond: the cultural object, the creator, the recipient, and the wider social world. While recipients (moviegoers) interpret the cultural object (movie) directly, the film’s impact on audiences is also mediated through movie critics. When Griswold’s model is applied to film, movie critics act as part of the wider social world, influencing the decisions of creators, the cultural objects themselves, and, of course, recipients’ perceptions of the cultural objects (Griswold 2012).
Likewise, Hirsch (1972:649) has argued that film critics act as “surrogate consumers” who work to anticipate and represent their audiences’ tastes. Such theorizing is supported by many empirical studies that have shown that net of other factors, films with better critical appraisal earn more at the box office (King 2007; Krishnamurthy 2011; Lindner et al. 2015). Studios seek positive reviews and use critics’ quotes to legitimize their films as part of their marketing to the public (Baumann 2007). For these reasons, both creators (the filmmakers and studios) and receivers (moviegoers) find movie critics highly valuable (Fowdur et al. 2012; Krishnamurthy 2011).
Several studies have shown that movie critics have many different ways of making judgments about films. In particular, film reviewers can take on two separate roles: the “influencer role,” in which critics act as opinion leaders who steer consumers’ movie selections in early weeks, and the “predictor role,” where critics’ ratings capture characteristics that appeal to their audience rather than influence audience preference (Fowdur et al. 2012; Reinstein and Snyder 2005). In some ways, these two roles are at odds. As predictors, critics consider “commercial criteria” that are likely to make the movie popular such as the presence of popular stars, familiar plot lines, and engaging special effects (Fowdur et al. 2012). As influencers, they take into account a set of artistic criteria, including filmmaking techniques, symbolic meanings, and naturalistic performances by actors.
Holbrook (1999:147) explains that movie critics, who sit at the top of a “cultural hierarchy,” tend to “rail against the evils of commercialism,” assuming that financial success indicates “low-brow” taste. High in cultural capital, movie critics operate as intellectual gatekeepers, giving the highest ratings to films they see as having artistic merit. Critics tend to prefer films with sustained study of nuanced characters, complex treatments of socio-political themes, and auteur directorial sensibilities. Movie reviewers may grudgingly inform readers that they—the masses—will enjoy the latest blockbuster action flick but reserve their most lavish praise for quiet, thoughtful arthouse films. As New York Times film critic A. O. Scott (2016) wrote in partial jest, “I am critic. A scold, a snob, a paid hack intent on . . . spoiling the fun of the public.”
Given their education and relatively high cultural capital, we might suspect that movie critics would be welcoming of diverse representations on the silver screen, including depictions of women, people of color, low-income characters, and so on. However, some existing evidence suggests otherwise. In their study of racial bias in newspaper movie reviews, Fowdur et al. (2012) found that movies with black leads and white supporting casts received reviews that were 6 percent lower than movies with other racial configurations. While their study offers evidence that movies featuring people of color suffer a penalty with critics, as the authors observe, there is no comparable study assessing how the presence of women in film affects critical appraisals.
The current research explores whether films with an independent female presence (as measured by the Bechdel Test) suffer a penalty with critics in the same way as movies with black leads and white supporting casts. If indeed films with an independent female presence fare poorly with critics, there would likely be significant financial consequences for such movies. Fowdur et al. (2012) found that movies with black leads and white supporting casts earned 4 percent less at the box office than they would have absent racial bias in critics’ reviews. Likewise, any penalty for movies featuring women in critical appraisal would be a mechanism in reproducing the underrepresentation of women in film.
If the results of the current research mirror the results of the Fowdur et al. (2012) study, we would expect Bechdel movies to receive worse reviews from critics net of other factors. On the other hand, it is also possible that movie critics, high in cultural capital, value forms of diversity and reward films with an independent female presence. Additionally, the genres, creative teams, and audiences for movies featuring women and films featuring black actors differ substantially, so the two kinds of films may be received quite differently by critics. Although there is no systematic evidence yet to support a conclusion one way or the other, we suspect that critics may see “artistic merit” in diverse representation through the inclusion of an independent female presence; thus, we hypothesize:
Hypothesis 1: All things being equal, movies with an independent female presence (as measured by the Bechdel Test) will receive greater critical acclaim than movies without one.
Methods
This study links the gendered content of films with their critical appraisal. Using the Internet Movie Database (IMDB), we selected as our sample the 100 movies with the widest distribution in the United States (as measured by total number of screens) in each year between 2000 and 2009 (for a total of 1,000 films). After excluding documentaries, the sampling frame included 997 films. Of these, we were able to procure full information for 975 movies.
The unit of analysis for this study is the movie. Our analytic strategy was to examine the relationship between the Bechdel Test and Metacritic score on the bivariate level by using a kernel density plot followed by two linear regression analyses to isolate the independent effect of each of the independent variables on the dependent variable.
Dependent Variable
The dependent variable for this study is critical appraisal as measured by Metacritic score. While movie ratings themselves are fundamentally subjective, Metacritic scores offer one of the best measures of critical consensus in film and have been used in several academic inquiries (Joshi et al. 2010; King 2007). Metacritic converts movie reviews into scores on a 100-point scale (e.g., a movie that receives 3 out of 4 stars would receive a Metacritic score of 75). Quantifying each review not only allows users to compare reviewers easily but also allows Metacritic to produce a composite rating for each film. The score is produced using a weighted average that gives more influence to some critics and publications due to their perceived quality and prestige within the field. Unlike some other movie compiler sites, which use only a binary rating of “fresh” or “rotten,” Metacritic’s composite scores range from 0 to 100, capturing more subtle distinctions between films. According to Metacritic, scores from 81 to 100 are “Universally Acclaimed,” 61 to 80 are “Generally Favorable Reviews,” 40 to 60 are “Mixed or Average Reviews,” 20 to 39 are “Generally Unfavorable Reviews,” and 0 to 19 indicates “Overwhelming Dislike.” The scores in our sample range from 9 to 98. Complete descriptive statistics are reported in Table 1.
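The rescaling and banding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the critic weights are invented because the text does not report them, and rounding the composite to a whole number before banding is an assumption of this sketch rather than a documented Metacritic rule.

```python
def to_100_point(rating, scale_max):
    """Rescale one review rating (e.g. 3 out of 4 stars) to a 0-100 score."""
    return 100.0 * rating / scale_max


def weighted_composite(scores, weights):
    """Weighted average of per-review scores, rounded to a whole number.

    The weights stand in for critic prestige; their values are hypothetical.
    """
    total = sum(s * w for s, w in zip(scores, weights))
    return round(total / sum(weights))


def metacritic_band(score):
    """Map a whole-number composite score to the bands quoted in the text."""
    if score >= 81:
        return "Universally Acclaimed"
    if score >= 61:
        return "Generally Favorable Reviews"
    if score >= 40:
        return "Mixed or Average Reviews"
    if score >= 20:
        return "Generally Unfavorable Reviews"
    return "Overwhelming Dislike"


# Three hypothetical reviews, each on a different native scale.
reviews = [to_100_point(3, 4), to_100_point(2, 5), to_100_point(80, 100)]
weights = [1.5, 1.0, 1.0]  # invented for illustration only
composite = weighted_composite(reviews, weights)
print(composite, metacritic_band(composite))  # 66 Generally Favorable Reviews
```

Because the first review carries extra weight, the composite (66) sits above the unweighted mean of the three scores (65).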
Table 1. Descriptive Statistics for All Measures.
Independent Variables
The central independent variable in this study is the Bechdel Test (whether a movie features two or more named women who speak to each other about something other than a man). The Bechdel Test is frequently cited in critical studies of literature, in gender studies (Anthropy 2012; Power 2009; Thompson and Armato 2012), in popular analyses (Hickey 2014; Sharma and Sender 2014), and in our previous quantitative work (Lindner et al. 2015). As previously noted, the Bechdel Test blends three types of representation but offers a relatively low bar for each. Nonetheless, even with the Bechdel Test’s minimal requirements, many movies do not pass. With any more rigorous measure of quality of gender representation, passing would be so rare as to limit variation to the point where meaningful analysis would not be possible. Indeed, the low bar of merely including women with lives independent of men is precisely what makes the Bechdel Test a useful measure. The Bechdel Test variable was coded by the researchers and undergraduate students who volunteered to code movies. For complete details on the data collection and coding process, see Lindner et al. (2015).
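The coding rule can be expressed as a small function. The sketch below assumes that the three criteria are nested (a conversation presupposes two named women, and its topic presupposes a conversation), which lets it derive both the binary pass/fail variable and a 0 to 3 count of criteria met. The function and argument names are hypothetical, and this is not the coders' actual instrument.

```python
def bechdel_ordinal(two_named_women, talk_to_each_other, not_about_a_man):
    """Count how many nested Bechdel criteria a film meets (0 to 3).

    Counting stops at the first unmet criterion, on the assumption that
    each criterion only makes sense once the previous one is satisfied.
    """
    score = 0
    for criterion in (two_named_women, talk_to_each_other, not_about_a_man):
        if not criterion:
            break
        score += 1
    return score


def passes_bechdel(two_named_women, talk_to_each_other, not_about_a_man):
    """Binary measure: 1 if all three criteria are met, otherwise 0."""
    met = bechdel_ordinal(two_named_women, talk_to_each_other, not_about_a_man)
    return 1 if met == 3 else 0


# A hypothetical film in which two named women talk, but only about a man.
print(bechdel_ordinal(True, True, False))  # 2
print(passes_bechdel(True, True, False))   # 0
```

Keeping the count and the binary indicator in separate functions makes it easy to rerun an analysis with either version of the measure.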
To test for the possibility of either mediating or suppressor effects, this study controlled for several variables that past research has shown affect critical appraisal. In particular, we controlled for production budget (in millions of dollars), whether the film is a sequel (sequel = 1, others = 0), whether it features a star from the Forbes Celebrity 100 list (movie with a star = 1, other = 0), whether the film was distributed by an “arthouse” label (arthouse = 1, others = 0; see note 2), and several genre categories (animated, comedy, drama, and horror; genre = 1, other = 0). Production budget data were drawn from BoxOfficeMojo.com and several reputable industry periodicals. We used IMDB to determine whether the film was a sequel and whether it included a star from that year’s Forbes Celebrity 100 list. Finally, films were classified into genres using Metacritic’s categories. Genre categories were nonexclusive (e.g., a movie could be both animated and a comedy).
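The dummy coding described above might look like the following sketch. The record layout, field names, and example film are all hypothetical and are used only to show how nonexclusive genre indicators and the budget rescaling fit together; they are not drawn from the study's data files.

```python
GENRES = ("animated", "comedy", "drama", "horror")


def code_film(film):
    """Turn one raw film record into the 0/1 and numeric regression inputs."""
    row = {
        "bechdel": int(film["passes_bechdel"]),
        # Budget is expressed in millions of dollars, as in the text.
        "budget_millions": film["budget_usd"] / 1_000_000,
        "sequel": int(film["is_sequel"]),
        "star": int(film["has_forbes_star"]),
        "arthouse": int(film["arthouse_label"]),
    }
    # Genres are nonexclusive, so each gets its own independent indicator.
    for genre in GENRES:
        row[genre] = int(genre in film["genres"])
    return row


# An invented animated comedy, for illustration only.
example = {
    "passes_bechdel": True,
    "budget_usd": 45_000_000,
    "is_sequel": False,
    "has_forbes_star": True,
    "arthouse_label": False,
    "genres": {"animated", "comedy"},
}
print(code_film(example))
```

Because the genre indicators are independent, a film can score 1 on several of them at once, or 0 on all four if it falls outside the listed categories.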
Results
As we have shown in previous research using these data, 57 percent of the hundred most widely distributed films in the U.S. market each year between 2000 and 2009 do not pass the Bechdel Test. Movies passing the Bechdel Test earn less at the box office because they tend to have smaller production budgets, a key predictor of box office success (Lindner et al. 2015). In the current research, we ask whether Bechdel movies also suffer a penalty or gain an advantage with movie critics who act as surrogate consumers for potential movie audiences. We hypothesized that all things being equal, movies with an independent female presence (as measured by the Bechdel Test) will receive more critical acclaim (as measured by composite Metacritic scores) than movies without one (Hypothesis 1). Our results suggest that an independent female presence in a film does not significantly affect its critical appraisal. Therefore, we reject Hypothesis 1.
Figure 1 is a kernel density plot (see note 3) comparing the Metacritic scores of movies that pass the test and those that do not. The two distributions are quite similar: the mean for movies that pass is 51.39, and the mean for those that fail is 53.04.

Figure 1. Density plot of Metacritic score by Bechdel Test.
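For readers unfamiliar with kernel density plots, the following sketch shows how such a curve can be computed. It assumes a Gaussian kernel and an arbitrary fixed bandwidth of 8 points; the text does not state which kernel or bandwidth was used, and the scores below are invented for illustration rather than taken from the study's data.

```python
import math


def gaussian_kde(values, bandwidth):
    """Return a density function built from one Gaussian kernel per value."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum the kernels centered on each observed score.
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density


# Invented Metacritic scores (not the study's data).
scores_pass = [34, 41, 48, 52, 55, 63, 70]
scores_fail = [30, 44, 50, 53, 58, 66, 77]

kde_pass = gaussian_kde(scores_pass, bandwidth=8.0)
kde_fail = gaussian_kde(scores_fail, bandwidth=8.0)

# Evaluate both curves on a coarse grid of scores from 0 to 100.
for x in range(0, 101, 20):
    print(x, round(kde_pass(x), 4), round(kde_fail(x), 4))
```

Plotting the two curves on the same axes gives a smoothed counterpart to overlaid histograms. A larger bandwidth yields smoother curves, and a smaller one reveals more local detail.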
Turning to the regression analyses in Table 2, there is no significant effect of passing the Bechdel Test on critical appraisal at the bivariate level (p > .05; see note 4). To allow for a potential suppressor effect, in Model 2 we controlled for a range of factors, including whether the movie is a sequel, whether it has a top star, its genre, whether it was released on an arthouse label, and its production budget. While several of the control variables are statistically significant (most notably the large positive effect of being an animated film), net of other factors, the Bechdel Test still had no significant effect on Metacritic score. These findings strongly suggest that the mere inclusion or absence of an independent female presence has no effect on a film’s composite critical evaluation.
Table 2. Linear Regression Analysis of Metacritic Score.
Note: n = 975.
**p < .01. ***p < .001.
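The logic of the bivariate model can be illustrated with a short sketch. It assumes ordinary least squares with a single 0/1 predictor, in which case the slope equals the difference between the two group means and the intercept equals the mean of the group coded 0. The scores below are invented, and the function names are hypothetical.

```python
def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def bivariate_ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Invented data: 1 = passes the Bechdel Test, 0 = does not.
passes = [1, 1, 1, 0, 0, 0, 0]
scores = [48, 55, 51, 60, 47, 52, 53]

intercept, slope = bivariate_ols(passes, scores)
mean_pass = mean([s for p, s in zip(passes, scores) if p == 1])
mean_fail = mean([s for p, s in zip(passes, scores) if p == 0])

print(round(intercept, 2), round(slope, 2))
print(round(mean_pass - mean_fail, 2))  # matches the slope
```

By the same identity, if Model 1 is an ordinary least squares regression on the pass/fail indicator alone, fitted to the same films summarized in Figure 1, its coefficient would equal the difference in the reported group means, 51.39 − 53.04 = −1.65. The values reported in Table 2 remain authoritative.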
Discussion
To better understand the causes behind the persistent underrepresentation of women in film, we must isolate where gender inequalities occur in the cultural diamond. Do they occur “upstream” as studios make decisions about funding and casting? Do they occur “downstream” with audience preference for films featuring men? Or do they occur through gender bias among movie critics who influence audiences? This study explores the last of these questions, and the results suggest the answer is no. Previous research has shown that Bechdel movies earn less at the box office primarily because they are granted smaller production budgets upstream, not because audiences reject them downstream (Lindner et al. 2015). The current study shows that movies that pass the Bechdel Test do not receive either better or worse reviews than films without an independent female presence.
Unlike the inquiry into racial bias in movie reviews conducted by Fowdur et al. (2012), our study finds no comparable gender bias. There is neither a penalty nor a benefit among movie critics for movies that represent interactions among multiple women. It seems clear that representation of diversity is not consciously on critics’ radar as a feature to reward.
Perhaps unsurprisingly, factors such as genre, being on an arthouse distribution label, and production budget do affect critical reviews (as measured by Metacritic scores). The preferences for arthouse labels and dramas over horror movies suggest that movie critics adopt the “influencer role,” selecting on artistic merit and attempting to sway audiences. On the other hand, animated movies and those with bigger budgets also receive better reviews, potentially indicating that reviewers also pay attention to commercial criteria, as previous research has shown (Reinstein and Snyder 2005).
There are a few shortcomings related to the measure of gender representation worth noting. As previously noted, though the Bechdel Test has been used in previous studies and offers a rudimentary measure of whether women are depicted as having lives independent of men, it is not a measure of whether the depiction of women is positive or negative, stereotypical or counter-stereotypical. Moreover, unlike studies by Fowdur et al. (2012) and Lauzen (2015), it does not take into account various configurations of men and women in supporting and leading roles. It may be that there is a penalty when leading women are surrounded by supporting men or, alternatively, that movie critics reward strong, counter-stereotypic female characters regardless of whether the movie passes the Bechdel Test. Future research should pursue these questions by examining how films with counter-stereotypic depictions of women and various gender configurations of leading and supporting actors perform with critics, though such films are so rare that it might be difficult to assemble a sample large enough for quantitative analysis.
In conclusion, this study contributes to literatures on both gender inequality within the cultural industry and the work of movie critics as cultural producers by offering an initial inquiry into the impact of gender representations in film on critics’ appraisals. Our results indicate that film critics as a whole do not weigh representations of women heavily in making their evaluations of movies. At the same time, gender bias in movie reviews does not appear to be an important factor contributing to the underrepresentation of women in film.
Footnotes
Acknowledgements
The authors thank Melissa Lindquist and Julie Arnold for their contributions to data collection.
1
The Bechdel Test, created by graphic artist Alison Bechdel, originated in a comic strip, Dykes to Watch Out For. For a film to pass the Bechdel Test, it must meet the following criteria: It must have at least two women with names, they must talk to each other, and in the conversation, they must talk about something other than a man (Bechdel Test Movie List 2010). As in previous research (Lindner, Lindquist, and Arnold 2015), we use the Bechdel Test as a measure of whether a movie has an independent female presence.
2
In the past, many movies with artistic themes and techniques were made primarily by independent production companies, and this gave rise to the term indie movies. Today, major studios frequently produce and distribute arthouse movies, releasing the movies on a subsidiary label dedicated to “indie films” (e.g., Focus Features, a division of NBC Universal). Thus, we use the term arthouse label to be inclusive of both artistic movies produced by majors on an arthouse label and those released by truly independent studios. An analysis not displayed here using a measure of independent production companies produced no significant effects.
3
A kernel density plot places a kernel (a smooth, peaked function) at each observed value of Metacritic score and sums these kernels to estimate the shape of the distribution. It can be interpreted in much the same way as a histogram.
4
In analyses not presented here, we used an ordinal measure of the Bechdel Test (ranging from 0 to 3) capturing which of the parts of the Test the film passed. These analyses produced nearly identical results.
