Abstract
Research has consistently found that organizational evaluations produce gendered outcomes. We advance understanding of this inequality by examining how multistage evaluations—a common organizational design feature—shape gendered evaluative disparities. We integrate and extend research on evaluations, status, and inequality to theorize how gendered outcomes can vary between stages within a single, unified evaluation process—what we term “stakes-driven gender inequality.” Our theoretical framework centers on a ubiquitous feature of organizational multistage evaluation processes that is missing from prior theoretical accounts: escalating stakes across stages via increasingly binding commitments and rising cost of wrong selections. We conceptualize two stylized stages within the same evaluation process: a shortlisting stage (a preliminary selection of candidates for further consideration) and a winners stage (in which final, binding selections are made). Using data from a large multistage startup competition, we test our theory and find that female-led startups are 18.7 percent more likely than male-led startups to be selected in the shortlisting stage but 30.7 percent less likely to be selected in the winners stage. Mechanism tests in the shortlisting stage are most consistent with female-led startups being assessed as higher quality, whereas in the winners stage, we find evidence consistent with gendered performance expectations and evaluators’ risk aversion.
Evaluation processes shape the distribution of opportunities and resources across the economy and society, determining access to startup funding, jobs, scientific grants, and more (Merton, 1968; Podolny, 2005; Stuart et al., 1999; Wang, 2016). Although these processes are presented as meritocratic (Alon & Tienda, 2007; Castilla, 2008; McNamee & Miller, 2009), research consistently finds that women receive less favorable outcomes in evaluations than comparable men (Botelho & Abraham, 2017; Brooks et al., 2014; Castilla, 2008; Leung & Koppman, 2018; Miller et al., 2023). Understanding the conditions under which such evaluative disparities arise is therefore central to organizational research (Botelho et al., 2025; Fernandez & Campero, 2017; Ridgeway, 2011, 2014; Rivera & Tilcsik, 2019).
Organizational evaluation processes typically have a multistage structure in which a broad candidate pool is progressively winnowed through sequential evaluation stages to one or more final selections. For example, hiring proceeds from résumé screens to various interviews and, ultimately, to a job offer. Despite multistage evaluations’ ubiquity and an expansive literature on gender and evaluations, few theoretical accounts have addressed how multistage structures shape gender differences in evaluative outcomes (see Abraham et al., 2024 for a discussion; e.g., Bol et al., 2022; Botelho & Abraham, 2017; Fernandez & Fernandez-Mateo, 2006). This gap is notable because extensive research examines gender differences in single evaluation stages (e.g., interviews) in isolation from the broader process in which they are embedded (e.g., hiring), and the resulting findings are in tension. Some studies find little or no gender difference in initial stages, such as callbacks for interviews (e.g., Botelho & Chang, 2023; Weisshaar et al., 2024), whereas studies of later stages (e.g., hiring, funding) document significant disadvantages for women (e.g., Brooks et al., 2014; Kanze et al., 2018; Witteman et al., 2019). Taken together, these patterns suggest that a multistage perspective is necessary to understand the conditions under which gendered evaluative outcomes occur.
Organizations use multistage evaluation structures to manage cognitive constraints and risk. Consistent with bounded rationality, decomposing complex judgments into iterative steps, such as stages, makes evaluation more tractable (Newell & Simon, 1972; Simon, 1955). From a risk perspective, a single-shot evaluation concentrates the entire burden of making the right selection on one moment. Multistage designs spread that risk: False positives are relatively more tolerable in early stages because initially selected candidates will be re-evaluated and, thus, can be filtered out in later stages if need be, while more-binding and costlier commitments are deferred to later stages. Therefore, as stages progress, the stakes escalate as evaluators move closer to making binding commitments and the cost of a wrong selection rises. We ask, how do escalating stakes across evaluation stages shape the relationship between gender and evaluative outcomes?
We address this question by developing a theoretical framework that specifies how gender differences in evaluative outcomes vary across stages of a single, unified evaluation process. To illustrate this, we conceptualize two stylized stages within a unified evaluation process that are common in organizations: the shortlisting and winners stages. In the shortlisting stage, evaluators make less-binding, lower-stakes selections (e.g., inviting candidates for interviews, sending grant proposals out for review) that will be revisited at a later stage before more-substantial commitments are made. Shortlisting outcomes are meaningful because initial selection keeps candidates in contention to potentially receive the process’s ultimate allocation and because early rejections can shape the pursuit of similar opportunities later (Brands & Fernandez-Mateo, 2017; Fernandez-Mateo et al., 2023). In the winners stage, evaluators make more-binding commitments by making decisions about candidates that are more difficult to reverse. Here, evaluators allocate the ultimate resource that the evaluation process is designed to distribute (e.g., job offers, startup funding). The winners stage thus functions as a consecration event in which those selected must be seen as worthy and fit for the process’s ultimate recognition (Bourdieu, 1984, 1993; Lamont, 2012), further heightening the cost of false positives, namely the risk of selecting candidates who do not meet the evaluation’s objective (e.g., an unqualified hire).
Our framework considers how escalating stakes across stages shape the mechanisms that influence gender differences in evaluative outcomes. We posit three mechanisms in the shortlisting stage that may contribute to a higher likelihood of selection for women than for men. First, evaluators may be more open to exploratory selections in this stage, potentially favoring underrepresented candidates such as women (Jackson, 2023; Knudsen & Levinthal, 2007; Leung, 2018; Levinthal & March, 1993; Li et al., 2020; March, 1991). Second, women at the initial stage of a competitive evaluation process may be higher quality than their male peers—or perceived as such—by having overcome more-stringent prior hurdles to reach that point (Foschi, 1996, 2000). Third, diversity and inclusion goals are least costly to enact when assessments are non-binding (Kalev et al., 2006; Solow et al., 2011).
We posit two mechanisms in the winners stage that may contribute to a lower likelihood of selection for women. First, under these higher-stakes conditions, evaluators face a stronger imperative to resolve uncertainty about candidates’ expected performance and select those they expect to be most successful. This imperative may heighten their use of gender-based performance expectations relative to the shortlisting stage (Berger, 1977; Correll et al., 2007; Correll & Benard, 2006; Ridgeway, 2011). These expectations may be held by evaluators themselves or may reflect evaluators’ anticipation that other key audiences (e.g., customers) will hold these expectations; either way, they may hinder women’s outcomes (Abraham, 2020; Fernandez-Mateo, 2009). Second, evaluators may act with greater risk aversion and avoid candidates they perceive as non-prototypical, a perception that often applies to women (Boudreau et al., 2016; Eagly & Karau, 2002). Overall, this framework clarifies how stage-to-stage escalations in stakes, through increasingly binding commitments and escalating costs of false positives, shape the relationship between gender and evaluative outcomes.
We test this framework using detailed data from an annual, multistage pitch competition at a leading U.S. accelerator for innovation-driven startups. Over nine years (2013 to 2021), 1,274 unique evaluators assessed 2,960 unique startups. The evaluation process includes a shortlisting stage and a winners stage, which differ in relative stakes but operate within a single, unified evaluation process. In the shortlisting stage, startups are selected to receive educational programming and to remain in contention to be named winners. In the winners stage, the remaining startups are re-evaluated to determine which will receive final commitments of funding (up to $100,000) and be publicly recognized as official winners before valuable external stakeholders (e.g., investors), substantially raising the cost of wrong selections relative to the shortlisting stage. This competition offers key methodological strengths for testing our framework: Evaluators in each stage come from similar backgrounds and are randomly assigned to startups, limiting evaluator-specific differences, and each stage uses identical evaluative criteria and formats, allowing us to isolate how gendered outcomes vary between stages of escalating stakes independent of procedural variation.
We observe evidence consistent with stakes-driven gender inequality. In the shortlisting stage, female-led startups were 18.7 percent (6.2 percentage points) more likely than male-led startups to be selected. In the winners stage, however, they were 30.7 percent (18.9 percentage points) less likely to be selected. We then tested each theorized stage-specific mechanism. In the shortlisting stage, we found patterns most consistent with female-led startups being assessed as higher quality than male-led startups. In the winners stage, we observed patterns consistent with evaluators’ use of gendered performance expectations and risk aversion contributing to female-led startups’ lower selection rates. We also examined a series of non-stakes-based alternative explanations, for which we did not find support.
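The relative and absolute figures for each stage are linked by simple arithmetic: a relative difference equals the percentage-point gap divided by the comparison group’s selection rate. The minimal Python sketch below assumes that the relative figures are expressed against male-led startups’ selection rate and backs out the selection rates implied by the rounded numbers above. These implied rates are illustrative back-of-the-envelope values, not estimates reported in this study; if the reported gaps come from covariate-adjusted models, the raw rates may differ. The function name `implied_rates` is hypothetical.

```python
def implied_rates(relative_gap: float, pp_gap: float) -> tuple[float, float]:
    """Back out selection rates (in percent) implied by a relative gap and a
    percentage-point gap, ASSUMING the relative gap is measured against the
    male-led selection rate, i.e., relative_gap = pp_gap / male_rate."""
    male_rate = pp_gap / relative_gap   # implied male-led (baseline) rate, in percent
    female_rate = male_rate + pp_gap    # female-led rate = baseline + absolute gap
    return male_rate, female_rate


# Rounded figures from the text: relative gap as a fraction, absolute gap in points.
stages = {
    "shortlisting": (0.187, 6.2),
    "winners": (-0.307, -18.9),
}

for stage, (rel, pp) in stages.items():
    male, female = implied_rates(rel, pp)
    print(f"{stage}: male-led ~{male:.1f}%, female-led ~{female:.1f}%")
```

Under these assumptions, the sketch prints roughly 33.2% vs. 39.4% for the shortlisting stage and 61.6% vs. 42.7% for the winners stage. It also shows why the winners-stage gap is larger in percentage points: the same kind of relative difference translates into more points when the baseline selection rate is higher.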
Our study makes several contributions to organizational research on evaluations, inequality, and entrepreneurship. We theorize about a consequential design feature of organizational evaluations that current theoretical frameworks do not discuss. By advancing a stakes-based account of how gendered outcomes can vary across a multistage evaluation process, we help to clarify how gender differences shift with more-binding commitments and rising costs of making wrong selections. We also outline and test a series of stage-specific mechanisms, offering a roadmap for future research on this critical phenomenon. Our findings help to reconcile the mixed evidence in the literature on when gendered evaluative outcomes arise; we illustrate why analyses of single stages within the broader evaluation process can misstate inequality, and thereby highlight the importance of considering a stage’s position within a larger multistage structure.
Gender Differences in Evaluative Outcomes
Evaluation processes are used to select candidates for valuable opportunities under substantial uncertainty regarding quality. As such, evaluators often rely on noisy, imperfect signals to infer quality (Podolny, 1993; Spence, 1973). Gender frequently emerges as one such proxy. Although evaluation processes are typically described as meritocratic and intended to identify the most-qualified candidates (Alon & Tienda, 2007; Castilla, 2008; Castilla & Benard, 2010; McNamee & Miller, 2009; Scully, 2000), research consistently finds that women receive less-favorable evaluations than similarly qualified men, even when objective qualifications (e.g., human capital, past performance) are held constant (BenYishay et al., 2020; Botelho & Abraham, 2017; Foschi, 2000; Galperin et al., 2020; Smith-Doerr et al., 2019).
Several status-based theories help to explain these disparities. Status characteristics theory argues that widely shared cultural beliefs link demographic groups to perceived competence and potential, systematically producing inequality in evaluations. Evaluators often (implicitly) associate men with higher competence and potential, leading to worse evaluations for equally qualified women (Berger, 1977; Ridgeway, 2011). This reliance on gender has been documented across organizational contexts, including peer review (Card et al., 2020), contract employment (Fernandez-Mateo, 2009), promotion decisions (Castilla, 2008), and startup funding (Kanze et al., 2020).
Role congruity theory further emphasizes that gendered beliefs about fitness for certain positions shape evaluations (Eagly & Karau, 2002). Evaluations of the same candidate can shift with context, with women being viewed as more or less suited for certain roles, such as leadership roles or those in historically male-dominated fields (Biernat & Vescio, 2002). Complementing this perspective, double standards theory argues that evaluators impose stricter performance thresholds on women, requiring higher demonstrated competence for equivalent recognition (Botelho & Abraham, 2017; Foschi, 1996, 2000; Heilman, 1983; Kanze et al., 2020).
Gender Inequality and Startup Evaluations
Our context is the evaluation of innovation-driven startups (see Botelho et al., 2026 for a discussion of startup type)—an essential component of venture creation in which gender inequality is pervasive and has substantial economic and social implications (Coleman & Robb, 2009; Gompers & Wang, 2017; Huang et al., 2021; Kanze et al., 2018). Evaluation processes in this sector allocate critical resources (e.g., funding, recognition) that help startups overcome the “liability of newness” (Ruef et al., 2003; Stinchcombe, 1965, p. 149; Stuart et al., 1999). Startups thus rely heavily on external evaluators—accelerators, angel investors, grant programs, and venture capitalists (henceforth “startup evaluators”)—to access these resources.
Startup evaluators aim to identify the ventures with the greatest potential. However, doing so involves substantial uncertainty because innovation-driven startups pursue novel, untested, and often risky ideas (Scott et al., 2019; Shane & Cable, 2002; Stinchcombe, 1965; Stuart et al., 1999). Best illustrating this uncertainty is the fact that most innovation-driven startups fail, even those backed by experienced evaluators (Botelho & Chang, 2023; Hall & Woodward, 2010). Further, entrepreneurship research has not identified clear proxies for venture quality that consistently predict startup success (Gompers et al., 2020; Howell, 2020; Kerr et al., 2014).
This uncertainty creates fertile ground for gender-based disparities in evaluative outcomes. Evidence consistently shows that female entrepreneurs receive less-favorable evaluations than comparable male entrepreneurs (Coleman & Robb, 2009; Ewens & Townsend, 2020; Huang et al., 2021), even when the underlying startup idea is held constant (Brooks et al., 2014). These patterns align with status characteristics theory, double standards theory, and role congruity theory, as discussed above, with each emphasizing how deeply ingrained cultural beliefs can systematically hinder women, particularly in male-dominated fields like entrepreneurship (Botelho et al., 2024; Calder-Wang & Gompers, 2021; Ewens & Townsend, 2020).
Multistage Evaluations and Gendered Outcomes
Researchers have recently called attention to the need to understand how the structure of evaluation processes shapes gendered outcomes (see Abraham et al., 2024 for a discussion), including features such as rating scales (Botelho et al., 2025; Rivera & Tilcsik, 2019), the availability of prior ratings (Botelho, 2024; Muchnik et al., 2013), status awards (Azoulay et al., 2014; Botelho & Gertsberg, 2021), and question framing (Campbell & Hahl, 2022; Kanze et al., 2018; Miller et al., 2023; Sarabi & Lehmann, 2024). For example, Botelho and colleagues (2025) found that when an online services platform replaced its five-star rating scale with a thumbs up/down scale, race-based rating disparities disappeared, eliminating the accompanying income gap.
The multistage format of many organizational evaluation processes, in which sequential assessments (i.e., stages) progressively winnow a pool of candidates to final selections, is a common feature with clear potential to shape gendered outcomes. This format is ubiquitous in hiring decisions (Fernandez & Fernandez-Mateo, 2006; Fernandez & Weinberg, 1997; Moss-Racusin et al., 2012), scientific peer review (Bol et al., 2022; Fini et al., 2022; Li, 2017), school admissions (Castilla & Poskanzer, 2022; Stevens, 2007), professional awards (Bowers & Prato, 2018), and entrepreneurship (Huang et al., 2021). In startup investing, for example, venture capitalists typically first select a group of startups to invite to pitch and later re-evaluate that subset to determine which startups will receive funding.
An understanding of multistage evaluations may help to reconcile seemingly mixed findings on gendered evaluative outcomes from studies that examine single stages in isolation. Some studies of initial stages (e.g., callbacks of job applicants for interviews) find parity or even more-favorable outcomes for women (Bertrand & Mullainathan, 2004; Botelho & Chang, 2023; Li et al., 2020; Weisshaar et al., 2024; Williams & Ceci, 2015); however, others find worse outcomes for women in evaluation processes’ final stages (e.g., selecting interviewed candidates for job offers, allocating funding) (Brooks et al., 2014; Huang et al., 2021; Kanze et al., 2020; Leung, 2018). While status-based research shows that gender can shape evaluative outcomes (Berger, 1977; Correll et al., 2007; Correll & Benard, 2006; Ridgeway, 2011), we posit that this disparity can shift with stage-specific conditions.
Evidence from hiring and entrepreneurship provides suggestive support that the same candidate attribute can be weighed differently across stages of the same evaluation process (Fernandez-Mateo & Fernandez, 2016). For example, elite education credentials increase a job applicant’s likelihood of being invited for an interview (Rivera & Tilcsik, 2016), but other signals of high capability (e.g., greater managerial responsibility) can hinder candidates’ likelihood of being hired by calling into question their commitment to the role (Galperin et al., 2020). Relatedly, social ties between entrepreneurs and investors facilitate pitch invitations by reducing investors’ search costs but have little bearing on subsequent funding allocation because these ties do not offer new information at that point (Wang, 2016).
Botelho and Abraham (2017) provided a notable exception to most research’s single-stage focus by examining gender-based double standards across two sequential stages of a multistage evaluation process on a digital investment platform. In that study, investment professionals first chose whether to click on, and thus review in detail, peer investment recommendations from a large set. Once they clicked, they could then anonymously rate the recommendation. Recommendations made by women received less attention in the first stage, which was driven by high search costs (i.e., the volume of recommendations to sift through), but no gender differences were observed in the second stage, highlighting that gendered outcomes can vary within the same multistage evaluation process. We extend this research by theorizing that, as stages progress, relative escalations in stakes (via more binding commitments and higher false-positive costs) can systematically shape the relationship between gender and evaluative outcomes within the same evaluation process.
Stakes and Differences in Gendered Outcomes Across Stages
To understand how gender differences may vary between stages of an organizational evaluation process, we first consider why organizations often use multistage evaluations. Multistage designs make complex assessments under uncertainty more tractable and spread risk. First, consistent with bounded rationality, breaking difficult assessments into iterative stages makes evaluations more manageable (Newell & Simon, 1972; Simon, 1955). Second, and central to our theoretical framework, staging permits provisional selections in initial stages (e.g., inviting job applicants for an interview) that will be re-evaluated later, thus deferring final, binding commitments that are more difficult to reverse to later stages. In other words, commitments and the cost of wrong selections rise as stages progress.
In early stages, evaluators select candidates to be re-evaluated in a subsequent stage rather than allocating the resource that the evaluation process is designed to distribute (e.g., funding, a job offer). This makes the cost of selecting a low-quality candidate (i.e., a false positive) relatively lower because selected candidates can be screened out in a later stage. In the context of startup evaluations, a false positive is the selection of an unsuccessful startup, such as one that fails quickly or does not provide a strong return.
By contrast, the final stage requires a more binding commitment to selected candidates. For example, inviting a job candidate for an interview is a less binding commitment than making a hire because the outcome of the former decision can more easily be reversed. Screening out an interviewed candidate is less difficult than firing someone after they are hired. Accordingly, false positives are less costly when commitments are less binding. Thus, the stakes of the evaluation escalate as stages progress and evaluators move closer to making binding commitments to allocate the evaluation process’s ultimate outputs, significantly increasing the cost of false positives.
Our theoretical framework centers on how these relative, stage-to-stage escalations in stakes shape gendered evaluative outcomes. We consider two stylized stages common to organizational evaluation processes, which we refer to as the “shortlisting” and “winners” stages. Both stages are part of the same unified evaluation process but occur at different points within that process. Shortlisting represents an initial evaluation stage in which evaluators make selections that they know will be revisited before any final commitments are made. In the winners stage, evaluators are aware that they are making final, binding commitments to selected candidates.
Next, we theorize how, as stages progress, escalating stakes create stage-specific conditions that shape gendered outcomes, a pattern we term “stakes-driven gender inequality.”
Stakes-Driven Gender Inequality: Shortlisting Stage
We expect that in the lower-stakes shortlisting stage, female-led startups will be more likely than male-led startups to be selected. We theorize three mechanisms that may contribute to this pattern. First, evaluators may cast a wider net when they know that selections will be revisited, increasing willingness to consider less-conventional candidates like female-led startups. Second, female-led startups in the shortlisting stage may be higher quality—or perceived as such—by having surmounted relatively greater hurdles and double standards before reaching a competitive evaluation process. Third, diversity initiatives such as affirmative action are least costly to enact when selections are reversible.
Innovation research suggests that evaluators typically adopt exploratory approaches in initial assessments in order to cultivate a wide set of options for later consideration under greater scrutiny, which preserves flexibility and avoids screening out strong candidates too early (Knudsen & Levinthal, 2007; Levinthal & March, 1993; March, 1991). This casting-a-wide-net logic may encourage evaluators to include less-traditional candidates to a greater extent when they know those candidates will be re-evaluated later. Thus, the shortlisting stage’s lower stakes may encourage evaluators to take more chances on unconventional candidates. This prioritization of exploration may lead to higher selection rates for female-led startups, which are underrepresented (Gompers & Wang, 2017) and may be seen as less-prototypical candidates (Thébaud, 2015) relative to male-led startups.
Double standards theory implies a second mechanism that could produce higher selection rates for women in the shortlisting stage. Women often face greater scrutiny than similar men to obtain the same resources (Botelho & Abraham, 2017; Foschi, 1996, 2000). Evidence from entrepreneurship is consistent with this dynamic: Female founders often surmount greater hurdles than male founders to achieve the same outcomes (Brooks et al., 2014; Huang et al., 2021). Consequently, in the competitive context of entrepreneurship, female-led startups that persist may be stress-tested to a greater extent than male-led counterparts across interactions with customers, suppliers, investors, and hiring markets. Accordingly, among candidates who reach the initial point of evaluation (the shortlisting stage), female-led startups may be perceived as higher quality on average, contributing to higher selection rates.
Diversity initiatives, such as affirmative action, present a third mechanism that may also lead to higher selection rates for female-led startups in the shortlisting stage. Here, we use “affirmative action” to refer to formal or informal practices that proactively seek to advance underrepresented candidates, which can lower the selection threshold for such candidates. Organizations often adopt diversity-oriented goals to increase the representation of underrepresented groups, with varying success (Kalev et al., 2006). These initiatives are sometimes codified, such as requiring a certain level of diversity (Celis et al., 2021; Solow et al., 2011), and sometimes are implemented in lower-cost, symbolic ways (Edelman, 1992). Under the shortlisting stage’s lower stakes, such initiatives may be less costly to enact, especially with the knowledge that candidates will be re-evaluated in a later stage. Given women’s underrepresentation in entrepreneurship, affirmative action that prioritizes female-led startups may contribute to a higher likelihood of selection in the shortlisting stage.
Considering these ideas together, we theorize that in the lower-stakes shortlisting stage, before commitments become binding and false-positive costs rise, three mechanisms may advantage female-led (vs. male-led) startups: exploratory selection, higher assessed quality, and diversity-oriented practices, yielding the following hypothesis:
Stakes-Driven Gender Inequality: Winners Stage
By contrast, we expect that female-led startups will be less likely than male-led startups to be selected in the winners stage, in which stakes escalate significantly relative to the shortlisting stage as evaluators must make binding commitments, increasing the cost of false positives. Because selections will not be revisited in a later stage, evaluators in the winners stage make final commitments to the candidates they forecast as most likely to succeed and most worthy of being consecrated as the winners of the overall evaluation process (Bourdieu, 1984, 1993; Lamont, 2012). We theorize two mechanisms that can contribute to this pattern. First, gendered performance expectations may now play a relatively greater role than in lower-stakes stages. Second, evaluators may exhibit greater risk aversion as the cost of a wrong selection rises, favoring male-led startups as more-prototypical candidates.
Research consistently finds that women fare worse in consequential evaluations that resemble our conceptualization of the winners stage, such as in hiring (e.g., Luo & Zhang, 2022), pay setting (e.g., Sterling & Fernandez, 2018), and startup funding (e.g., Kanze et al., 2020). Research on shifting standards further suggests that evaluators weigh gender differentially depending on the output of their evaluations (Biernat & Vescio, 2002), suggesting that the relationship between gender and evaluative outcomes can shift as stakes escalate.
As evaluators make final, binding selections that culminate the evaluation process in the winners stage, this stage functions as a consecration event that confers the social identity of a winner to selected candidates, requiring that they be seen as worthy of this ultimate recognition (Bourdieu, 1984, 1993; Lamont, 2012). Worthy candidates must be broadly seen as excellent and high potential, which are traits more readily attributed to men than women (Berger, 1977; Correll et al., 2007; Correll & Benard, 2006; Ridgeway, 2011). These dynamics likely intensify in innovation-driven entrepreneurship in which there is significant uncertainty about quality (Botelho et al., 2026; Hall & Woodward, 2010). Thus, under the winners stage’s higher-stakes conditions, gendered expectations of future performance are most likely to shape evaluators’ assessments, contributing to lower selection rates for female-led than male-led startups.
Notably, even if evaluators do not personally hold these expectations, they may anticipate that downstream audiences (e.g., investors, customers, partners) hold them and that these expectations will hinder female-led startups’ performance. Evaluators may then discount female-led startups’ chances of success because they expect these startups to face greater market resistance (Abraham, 2020; Fernandez-Mateo, 2009). For example, Abraham (2020) found that even people who are equally likely to contract with male- and female-owned small businesses are less likely to refer women-owned businesses to others, anticipating that those others prefer connections with men. More generally, when selecting candidates for a role, people often favor candidates with demographic characteristics that they believe others will associate with that role (Correll et al., 2017). Thus, in high-stakes evaluations requiring a final forecast of performance, evaluators may hold higher-order expectations (e.g., asking, “how will the market react?”) that are independent of their own beliefs and that contribute to a lower selection rate for female-led startups.
Risk aversion could also contribute to lower selection rates for women in the winners stage. As the cost of a wrong selection rises, evaluators may adopt more conservative approaches and re-weight toward candidates with more-conventional profiles (Boudreau et al., 2016; Eagly & Karau, 2002). For example, Boudreau and colleagues (2016) showed that evaluators in innovation contests favored proposals that resembled those that were previously successful. Our argument complements this logic. Since selected candidates will not be re-evaluated after the winners stage, evaluators may act with greater risk aversion and favor candidates that they perceive to be more conventional—here, male-led startups. Consistent with this perspective, executive search professionals cite perceived risk as a reason not to advance women, whom their clients are likely to see as unconventional (Fernandez-Mateo & Fernandez, 2016). This can contribute to lower selection rates for female-led startups, which are likely to be seen as less-traditional candidates in entrepreneurship (Hofstra et al., 2020).
Taking these ideas together, we theorize that in the higher-stakes winners stage, when selections are final and the cost of making the wrong selection is highest, two mechanisms (gendered performance expectations and risk aversion) may contribute to a lower likelihood of selection for female-led startups than male-led startups, yielding the following hypothesis:
Boundary Conditions
We expect our theoretical framework to be most applicable to evaluation processes within the following scope conditions. First, we consider processes in which stakes rise from the shortlisting stage to the winners stage. We do not assume that the shortlisting stage entails no commitments or false-positive costs, only that both are greater in the winners stage. Likewise, we make no claims about the absolute importance of shortlisting outcomes, only about their relative bindingness and false-positive costs compared to the winners stage in the same evaluation process. Second, we intentionally abstract from other features of evaluation processes (e.g., whether the same evaluators are present across stages) because these can vary widely. Instead, we focus on how evaluators’ assessments vary with the stakes of a given stage. Third, we assume that all stages are part of one finite, unified evaluation process that culminates in the allocation of a predefined ultimate resource (e.g., startup funding), although stages may allocate intermediate resources along the way (e.g., mentoring). We assume that the candidate pool is sequentially winnowed as stages progress, that each stage evaluates only the candidates selected in the preceding stage, and that no new candidates are added mid-process.
Methods
Research Context: Multistage Pitch Competition for Innovative Startups
We tested our theoretical framework using data from an annual, multistage pitch competition organized by a leading entrepreneurship accelerator in the United States (one of the largest in the world), which we refer to by the pseudonym “New Venture Accelerator” (NVA). Our analysis covers competitions from 2013 to 2021. The competition’s ultimate goal is to identify a small group of startups to receive funding and to be publicly named that year’s winners in NVA’s high-profile communications to key stakeholders (e.g., investors). To reach that outcome, startups were evaluated across sequential stages, with the pool progressively winnowed in each stage. In the shortlisting stage, startups were selected for admission to the accelerator, receiving educational curricula and remaining in contention to potentially be named winners. In the winners stage, the remaining startups were re-evaluated to determine which would ultimately be named winners. Thus, as startups moved from the shortlisting stage to the winners stage, they moved closer to becoming the winners of the overall evaluation process. Accordingly, evaluators’ commitments became more binding, and the cost of false positives increased substantially.
Accelerators catalyze startup growth by providing mentorship, strategic guidance, and funding (Cohen et al., 2019; Karp, 2023; Miller et al., 2024). NVA targets innovation-driven, high-tech startups across industries worldwide. To be competitive, startups must show evidence of market traction, such as paying customers. In our sample, startups averaged 5.2 non-founder employees, and 91 percent had at least one employee before taking part in the shortlisting stage, demonstrating meaningful operational maturity.
Details of NVA’s Multistage Evaluation Process
In the first evaluation stage (shortlisting), startups pitched to a panel of judges (described below), who evaluated them for admission to NVA’s accelerator, using a predefined, standardized rubric. 2 Selected startups then received preliminary resources—educational programming, mentorship, and networking opportunities—and, most important, remained in contention to potentially be named winners. Of the 2,960 startups evaluated during our study period, 1,050 (35.5 percent) were selected in the shortlisting stage.
A few months later, the remaining startups were re-evaluated in the pre-winners stage, which used the same format, to determine which would advance to the winners stage. All educational programming ceased immediately before this stage. Of the 1,050 remaining startups (those admitted in shortlisting), 237 advanced (22.6 percent of admitted startups and 8.0 percent of the original pool).
About two weeks after the pre-winners stage (16.3 days on average), the remaining startups were re-evaluated once more in the final evaluation stage (winners stage), which again used the same format as the prior stages. The winners stage carried the highest stakes of the evaluation process: Evaluators made final, binding selections to allocate the process’s ultimate resources, namely up to $100,000 in equity-free funding and prominent recognition through NVA’s communications to key stakeholders. A committee of NVA staff allocated the limited funding available based on the competition’s results. Winners that scored more highly were allocated more funding and were presented in more prominent positions in official communications of the results. Funding was a one-time grant: NVA took no equity and maintained no long-term governance relationships with winners. 3 One hundred thirty-three startups were named winners: 56.1 percent of the 237 startups evaluated in the winners stage (finalists) and 4.5 percent of all participants.
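The counts reported for the shortlisting, pre-winners, and winners stages imply two kinds of selection rates: one conditional on reaching each stage and one relative to the original pool. The short sketch below recomputes both from the reported counts. The function name and stage labels are ours, for illustration only.

```python
# Stage counts reported for NVA's 2013-2021 competitions.
STAGE_COUNTS = [
    ("evaluated in shortlisting", 2960),
    ("admitted in shortlisting", 1050),
    ("advanced to winners stage", 237),
    ("named winners", 133),
]


def selection_funnel(stages):
    """Return (label, count, % of previous stage, % of original pool) per stage."""
    pool = stages[0][1]
    previous = pool
    rows = []
    for label, count in stages:
        rows.append((label, count,
                     round(100 * count / previous, 1),
                     round(100 * count / pool, 1)))
        previous = count
    return rows


for label, count, pct_prev, pct_pool in selection_funnel(STAGE_COUNTS):
    print(f"{label:27s} {count:5d} {pct_prev:5.1f}% {pct_pool:5.1f}%")
```

The conditional rates (35.5, 22.6, and 56.1 percent) show that the final stage is the least selective step when measured against its own entrants. It is still the step at which commitments become binding.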
Although being selected benefited startups in each stage, evaluators made far more binding commitments in the winners stage, substantially raising the cost of false positives. In this setting, a false positive meant selecting an unsuccessful or low-quality startup, since NVA aims to allocate resources to the highest-potential ideas. A low-quality startup is one that failed quickly, did not make a substantial impact, or was seen by other stakeholders as substandard. Because NVA’s prestige depended on external stakeholders’ perceptions of the winners, mistakenly selecting a low-quality startup as a winner carried reputational costs. A lower-quality startup selected in shortlisting could still be screened out before NVA allocated it limited funding or made a reputational commitment to it. By contrast, winners-stage selections were final: Allocated funding was irrevocable, and NVA featured winners prominently in external communications that could not be retracted. Thus, the cost of mistakenly selecting a low-quality startup was much greater in the winners stage than in the shortlisting stage.
Figure 1 summarizes the evaluation process. All stages were clearly positioned in official materials as parts of one unified process with the purpose of identifying winners. Consistent with our theoretical focus, our primary analyses compared gender differences in outcomes between the shortlisting and winners stages, which entailed the largest between-stage escalation in stakes in the process. 4 Judges understood that selections in the shortlisting (and pre-winners) stages would be re-evaluated later, whereas winners-stage selections were definitive and binding.

Figure 1. NVA’s Multistage Evaluation Process
Key empirical features of NVA’s evaluation process
NVA’s multistage evaluation process is well suited to test our theory of how escalating stakes across stages shape gender differences in evaluative outcomes. NVA’s process offers two crucial methodological strengths: Each startup was evaluated by multiple, qualified judges who were randomly assigned in each stage, and the evaluation format and criteria remained constant across stages. These features increase confidence that any between-stage differences in gendered outcomes reflect the escalating stakes rather than procedural variations.
Judges were randomly assigned to evaluate startups within their industry. The judges brought substantial professional experience to the task, typically as venture capitalists, angel investors, former entrepreneurs, or startup-focused professionals (e.g., consultants). Thus, the dynamics of their evaluations likely resemble those of expert startup evaluators in other contexts. Each startup was independently assessed by an average of 5.8 judges per stage, and each judge evaluated about 6.3 startups per stage. NVA refreshed the judge pool between stages, so startups were not evaluated by the same judge twice. Judges did not receive startups’ materials or scores from prior stages or any information learned by NVA during the accelerator program, which supported comparability between stages. We verified adherence to these procedures. 5
Given the large judge pool (averaging 220 per year), NVA used an overseeing committee to synthesize judges’ independent evaluations into selections, adjusting only for judges’ historical strictness. This committee also determined how much funding each winning startup was allocated based on their scores. Importantly, the committee did not introduce its own quality assessments or evaluate startups directly; final selections were based only on the aggregation of judges’ independent evaluations. Consistent with this procedure, judges’ individual evaluations strongly correlated with selection outcomes in each stage (see Online Appendix A).
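How NVA's committee adjusted for judges' historical strictness is not described here. As one hypothetical illustration, a judge's strictness could be measured as how far that judge's past ratings fell below the average past rating. Each current rating would then be shifted back by that amount before averaging across a startup's judges. The adjustment rule and all names below are assumptions, not NVA's procedure.

```python
from statistics import mean


def judge_strictness(history):
    """Hypothetical strictness score per judge.

    history maps a judge id to that judge's past ratings (0-5 scale).
    Strictness = overall mean of all past ratings minus the judge's own mean,
    so a positive value means the judge historically rated below average.
    """
    overall = mean(r for ratings in history.values() for r in ratings)
    return {judge: overall - mean(ratings) for judge, ratings in history.items()}


def adjusted_startup_score(current_ratings, strictness):
    """Average one startup's ratings after adding back each judge's strictness.

    current_ratings maps a judge id to the rating that judge gave this startup.
    Judges with no history are left unadjusted (strictness 0.0).
    """
    return mean(rating + strictness.get(judge, 0.0)
                for judge, rating in current_ratings.items())


history = {"j1": [2, 3, 2], "j2": [4, 4, 5], "j3": [3, 3, 4]}
strictness = judge_strictness(history)
print(adjusted_startup_score({"j1": 3, "j2": 4, "j3": 3}, strictness))
```

The sketch is consistent with the stated constraint that the committee introduced no quality assessments of its own. The only inputs are judges' current and past ratings.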
The evaluation format was constant across stages. In each stage, startups were randomly grouped with industry peers and randomly assigned to judges from the same industry. Each startup delivered a 10- to 15-minute pitch in each stage, followed by a brief question-and-answer session. Founders then exited and judges discussed the pitch. While judges could deliberate, they were not required to reach a consensus, and the ratings data are consistent with this. Each judge then recorded their rating of the startup’s overall quality on a 0 to 5 scale, indicating whether it should be selected: 0 (“definitely don’t recommend”), 1 (“strongly don’t recommend”), 2 (“don’t recommend”), 3 (“recommend”), 4 (“strongly recommend”), and 5 (“definitely recommend”). Judges could not see one another’s ratings.
Judges received identical instructions at every stage. They were also given the same text explicitly specifying the criteria by which to evaluate startups: likelihood of future success and growth potential, criteria consistent with those used in other startup evaluations, such as venture capital (Dushnitsky & Sarkar, 2022). Evaluators were instructed to select the highest-quality startups in each stage and were not given any competing objectives, such as balancing their selections across industries. Finally, judges evaluated startups using the same standardized six-attribute rubric (Table 1) in all stages. NVA communicated a general commitment to support diverse founders and required judges to view the same brief unconscious-bias video before each stage. No stage-specific instructions regarding gender, diversity, or other topics were provided.
Table 1. Attributes on Evaluation Rubric Provided to Judges
Since NVA’s instructions and rubrics were identical across stages, differences in formal guidance were unlikely to drive between-stage differences in gendered outcomes. It remains possible, however, that evaluators interpreted or weighted the same criteria differently at different stages in ways that affected outcomes. We view such between-stage shifts in evaluators’ application of criteria or standards as consistent with our theory. As commitment bindingness and false-positive costs rise, evaluators may place more emphasis on traits that they associate (or expect other key market actors to associate) with success, whether gender itself or traits more strongly associated with one gender. Absent any change in formal guidance, such a dynamic would represent evaluative gender differences produced by between-stage escalations in stakes. Taken together, NVA’s consistent instructions, procedures, and criteria across stages provided a strong empirical setting to test our theoretical framework. They also strengthened our ability to attribute between-stage differences in gendered outcomes to escalations in relative stakes rather than to shifts in guidance or organizational practices.
Data
Our dataset covers all startups that participated in NVA’s pitch competitions from 2013 to 2021. In total, we analyzed 23,893 independent evaluations of 2,960 distinct startups by 1,274 unique judges. We supplemented these data with contextual information from external sources (e.g., PitchBook) and startups’ written applications to participate. We also integrated judges’ demographics and evaluative histories to assess whether judge-level factors influenced outcomes.
Measures
Evaluation stage outcomes
Our primary dependent variables measure evaluative outcomes at each stage. For the shortlisting stage, Admitted takes the value of 1 for startups selected for admission and 0 otherwise. For the winners stage, Winner takes the value of 1 if a startup was selected as a winner (and allocated funding) and 0 otherwise.
Founder’s gender
Our primary explanatory variable indicates whether a startup was female- or male-led. Judges were not given founders’ demographics and could infer gender only from the person giving the pitch. To re-create this data-generating process, independent coders hand-collected the founders’ gender using publicly available images (e.g., LinkedIn profiles, startups’ websites). Gender was hand-coded for 98.4 percent of founders overall and 100 percent of founders in the winners stage. For the remaining founders, we used pronouns listed in PitchBook biographies and, if needed, name-based inference. The results are robust when restricted to founders whose gender was hand-coded from photographs (see Online Appendix B).
For team-founded startups, we classified a startup as female- or male-led based on the gender of the individual designated as the “primary contact,” typically the CEO, who led the pitch and interacted most with judges. To assess robustness to this coding, we re-estimated our main analyses using the percentage of women on the founding team (Online Appendix B), and the results were robust.
Control variables
We included controls at the startup, evaluation process, and judge levels that may influence evaluative outcomes. At the startup level, we controlled for location, with Based outside U.S. taking the value of 1 if the startup was located outside the U.S. and 0 otherwise, 6 and we controlled for industry-fixed effects. 7
Although startup quality is fundamentally unobservable, we took multiple steps to account for potential between-startup quality differences. We incorporated two empirical proxies for startup quality, while acknowledging that no proxy perfectly captures quality. First, Previously raised funding, collected from PitchBook, indicates whether a startup raised funding before the competition. 8 Second, we incorporated judges’ scores of startups’ written applications to participate in the competition. Each startup submitted a written application covering its idea, team, and accomplishments, which was scored on a 0 to 5 scale by 4 to 6 judges (5.5 on average) who were distinct from the judges in the pitch competition. Written application score represents each startup’s average score on this application. 9 These scores were not provided to judges at any evaluation stage and thus represent a measure of quality external to judges’ subsequent evaluations in the competition.
At the evaluation level, we controlled for Place in pitch order, defined as the startup’s position in its group’s pitch sequence (e.g., first to pitch = 1). Even though pitch order was randomly assigned, judges may evaluate startups differently depending on sequence (Bian et al., 2022). We also controlled for Competition quality, the average rating received by the other startups in a focal startup’s group at the previous evaluation stage. For the shortlisting stage, this measure was computed from written application scores. Models also include year fixed effects.
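Because Competition quality excludes the focal startup, it is a leave-one-out group mean. The minimal sketch below assumes that each startup's prior-stage score has already been computed: its Written application score for the shortlisting stage, and its average rating for later stages. The tuple layout and function name are hypothetical.

```python
from collections import defaultdict


def competition_quality(startups):
    """Leave-one-out mean of prior-stage scores within each pitch group.

    startups: list of (startup_id, pitch_group, prior_score) tuples, where
    prior_score is the score from the previous stage (hypothetical layout).
    Returns {startup_id: mean prior_score of the other startups in its group},
    with None for a startup that is alone in its group.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, group, score in startups:
        totals[group] += score
        counts[group] += 1

    quality = {}
    for startup_id, group, score in startups:
        others = counts[group] - 1
        quality[startup_id] = (totals[group] - score) / others if others else None
    return quality


groups = [("a", "g1", 3.0), ("b", "g1", 4.0), ("c", "g1", 5.0), ("d", "g2", 2.0)]
print(competition_quality(groups))
```

Subtracting the focal score from the group total keeps the computation to two passes over the data, rather than recomputing each group mean once per startup.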
At the judge level, we controlled for the judge panel’s gender composition (Percentage female judges), coded algorithmically based on names, using genderize.io. We also included Average judge experience, the mean number of prior NVA cohorts in which each judge in the group had participated. Some specifications include pitch-group fixed effects and judge fixed effects, which we explain in the next section.
Empirical Analysis
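We estimated Ordinary Least Squares (OLS) models to examine the relationship between stage-specific evaluative outcomes (Admitted, Winner) and founders’ gender (Female-led). 10 Our main regression specification takes the following form:

$$E_{it} = \beta_0 + \beta_1 F_i + \mathbf{U}_i\boldsymbol{\gamma} + \mathbf{X}_{it}\boldsymbol{\delta} + \psi + \lambda_t + \varepsilon_{it}$$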
where i indexes startups evaluated in year t (the unit of analysis). E is the binary evaluative outcome of interest (Admitted, Winner) for startup i; F takes the value of 1 for female-led startups and 0 for male-led startups. U is a vector of time-invariant startup-level controls, and X is a vector of pitch-group-level controls (see Control Variables section). ψ includes fixed effects for the startup’s industry, and λ includes year-level fixed effects; ε is the error term.
We took several steps to account for between-judge variation. In startup-level models (Admitted, Winner), we clustered standard errors at the pitch-group level because each startup’s outcome aggregates input from all judges assigned to its group. Some specifications also include pitch-group fixed effects, which account for systematic differences across evaluation groups and interdependence among judges within a group. For analyses that model individual judges’ evaluations, we estimated startup-judge-level specifications with judge fixed effects and clustered standard errors by judge.
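The sketch below estimates a single-regressor linear probability model to show how pitch-group fixed effects and pitch-group clustering work together. By the Frisch-Waugh-Lovell theorem, including group fixed effects is equivalent to demeaning the outcome and the regressor within each group. The standard error uses the basic cluster-robust (sandwich) formula without finite-sample corrections. This is not the authors' estimation code: it omits their other controls, and the toy data are made up.

```python
from collections import defaultdict
from math import sqrt


def within_demean(values, groups):
    """Subtract each group's mean from its members (the fixed-effects transform)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_ols_clustered(y, x, groups):
    """OLS slope of y on x with group fixed effects and group-clustered SE.

    y: outcomes (e.g., a 0/1 Admitted indicator); x: a single regressor
    (e.g., a 0/1 Female-led indicator); groups: pitch-group labels, used both
    for the fixed effects and as the clustering variable.
    No other controls and no finite-sample correction are applied.
    """
    y_t = within_demean(y, groups)
    x_t = within_demean(x, groups)
    sxx = sum(xi * xi for xi in x_t)
    if sxx == 0:
        raise ValueError("x does not vary within any group")
    beta = sum(xi * yi for xi, yi in zip(x_t, y_t)) / sxx

    # Cluster-robust (sandwich) variance: sum over groups of the squared
    # within-group score sum_i x_i * e_i, divided by sxx squared.
    scores = defaultdict(float)
    for xi, yi, g in zip(x_t, y_t, groups):
        scores[g] += xi * (yi - beta * xi)
    variance = sum(s * s for s in scores.values()) / sxx ** 2
    return beta, sqrt(variance)


admitted = [1, 0, 0, 0, 1, 1]
female_led = [1, 0, 1, 0, 1, 0]
pitch_group = ["a", "a", "b", "b", "c", "c"]
print(fe_ols_clustered(admitted, female_led, pitch_group))
```

Groups in which x does not vary contribute nothing to the slope once fixed effects are included. Real analyses would use many more clusters than this toy example and would typically rely on an established regression package.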
Results
First, we present descriptive statistics. Second, we show evidence consistent with stakes-driven gender inequality between stages. Third, we examine the theorized mechanisms that may contribute to these differences, and finally, we assess non-stakes-based alternatives.
Table 2 reports summary statistics for startups by founders’ gender. Female- and male-led startups are largely similar on observables. Consistent with prior research, female-led startups are less likely to have raised funding before applying to NVA, both among those competing in the shortlisting stage (p < 0.10) and among those competing in the winners stage. Female-led startups have slightly higher Written application scores among participants in the shortlisting stage, but this difference disappears among participants in the winners stage. Reflecting the randomized design of NVA’s evaluation process, female- and male-led startups were evaluated under comparable conditions, appearing in similar pitch-order positions and facing similar competition quality. Female-led startups pitched in groups with a higher share of female judges in the shortlisting stage but not in the winners stage.
Table 2. Descriptive Statistics of Female-Led and Male-Led Startups by Evaluation Stage
Note: All reported tests are t-tests.
Gender Differences in Evaluative Outcomes Between Stages
Figure 2 provides initial evidence of stage-dependent gender differences in evaluative outcomes. Consistent with Hypothesis 1, female-led startups are 6.2 percentage points (p = 0.002) more likely than male-led startups to be admitted in the shortlisting stage, an 18.7 percent increase over male-led startups’ 33.1 percent admission rate. Consistent with Hypothesis 2, at the higher-stakes winners stage, female-led startups are 18.9 percentage points (p = 0.008) less likely to be named winners, a 30.7 percent decrease from male-led startups’ 61.5 percent winner rate.
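The relative effects reported here and in Table 3 divide the percentage-point gap by the male-led base rate. The quick check below uses a helper whose name is ours.

```python
def relative_change(diff_pp, base_rate_pct):
    """Express a percentage-point gap as a percent change relative to a base rate."""
    return 100 * diff_pp / base_rate_pct


# Shortlisting: +6.2 pp over male-led startups' 33.1% admission rate.
print(round(relative_change(6.2, 33.1), 1))  # → 18.7
# Winners stage: -18.9 pp relative to male-led startups' 61.5% winner rate.
print(round(relative_change(-18.9, 61.5), 1))  # → -30.7
```

The same arithmetic reproduces the model-based figures. For example, 5.1 points over 33.1 percent is 15.4 percent, and 18.3 points under 61.5 percent is 29.8 percent.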

Figure 2. Selection Rates by Founders’ Gender in Shortlisting and Winners Stages
Table 3 presents a more rigorous analysis of these relationships. Model 1, controlling only for year- and industry-fixed effects, estimates that female-led startups are 6.1 percentage points (or 18.4 percent) more likely than male-led startups to be admitted. Adding controls (Model 2) yields an estimate of 5.1 percentage points (15.4 percent). Results are robust to adding pitch-group fixed effects (Model 3). Taken together, these results support Hypothesis 1: Female-led startups are more likely than male-led startups to be selected in the shortlisting stage.
Table 3. Regression Estimates of Gender Differences in Likelihood of Being Admitted and Named Winner
*p < .05; **p < .01; ***p < .001
All models are OLS regressions with standard errors clustered at the pitch-group level.
We next examine the winners stage, using the same model progression. Model 4 (Table 3), including only year- and industry-fixed effects, estimates that female-led startups are 16.7 percentage points (or 27.2 percent) less likely than male-led startups to be named winners. Model 5, with control variables, estimates that female-led startups are 18.3 percentage points (29.8 percent) less likely to be named winners. Results are robust to adding pitch-group fixed effects (Model 6). Taken together, these results support Hypothesis 2: Female-led startups are less likely than male-led startups to be selected in the winners stage.
Overall, the fully specified models indicate that female-led startups are 15.4 percent more likely than male-led startups to be admitted in the shortlisting stage (Table 3, Model 2) but 29.8 percent less likely to be selected as winners in the higher-stakes winners stage (Table 3, Model 5). Consistent with our theory, we observe stage-contingent outcomes that vary with stakes. Female-led startups are more likely than male-led startups to be selected in the lower-stakes shortlisting stage but less likely to be selected in the higher-stakes winners stage.
Stakes-Driven Mechanisms of Between-Stage Variation in Gendered Outcomes
As we outline in the sections below, we tested potential mechanisms related to stakes that we theorized may contribute to these stage-dependent gender differences. Table 4 summarizes each mechanism’s conceptual logic and empirical test. We then examine other, non-stakes-based alternatives in the section Alternatives: Non-Stakes-Based Explanations for Between-Stage Variation in Gendered Outcomes (see also Table 6).
Table 4. Summary of Stakes-Driven Mechanism Tests
Stakes-Driven Mechanisms for Female-Led Startups’ Higher Selection Rates in the Shortlisting Stage
Does casting a wide net contribute to female-led startups’ higher selection rates in the shortlisting stage?
We theorized that the shortlisting stage’s lower stakes could enable evaluators to broaden their selections to include less-conventional candidates. We first tested this possibility by assessing whether female-led startups in the shortlisting stage were more unconventional. To do so, we used a text-based measure of how differentiated each startup was from its competition, Conceptual distance (following Carlson, 2023). Each startup’s application included a one-paragraph description. We converted these descriptions into numerical embeddings that capture semantic context (e.g., distinguishing between a “riverbank” and a “financial bank”) and calculated Conceptual distance as the average cosine distance between each startup and all other startups in the same industry-year. We then standardized this measure within industry-year, so that a value of 0 indicates average differentiation and higher values indicate greater differentiation from peer startups. Figure 3 compares the distributions of Conceptual distance for female- and male-led startups in the shortlisting stage. We do not observe evidence that female-led startups are more unconventional. Kolmogorov-Smirnov (p = 0.284) and Mann-Whitney (p = 0.300) tests indicate no significant distributional differences, and mean values are similar (0.021 vs. –0.007, p = 0.684).
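The distance computation can be sketched as follows. The sketch assumes that description embeddings have already been produced by some text-embedding model; which model the authors used is not stated here. It also assumes the population standard deviation for the within-cohort standardization. All names are illustrative.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, pstdev


def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def conceptual_distance(embeddings, cohorts):
    """Mean cosine distance to same-cohort peers, z-scored within each cohort.

    embeddings: {startup_id: vector}; cohorts: {startup_id: (industry, year)}.
    A startup with no peers gets a raw distance of 0.0; a cohort with no
    variation gets standardized values of 0.0.
    """
    members = defaultdict(list)
    for sid, cohort in cohorts.items():
        members[cohort].append(sid)

    raw = {}
    for sids in members.values():
        for sid in sids:
            peers = [p for p in sids if p != sid]
            raw[sid] = (mean(cosine_distance(embeddings[sid], embeddings[p])
                             for p in peers) if peers else 0.0)

    standardized = {}
    for sids in members.values():
        values = [raw[s] for s in sids]
        mu, sd = mean(values), pstdev(values)
        for s in sids:
            standardized[s] = (raw[s] - mu) / sd if sd > 0 else 0.0
    return standardized
```

For example, in a cohort with vectors (1, 0), (1, 0), and (0, 1), the third startup is the most differentiated. It receives a positive standardized value, and the two identical startups receive equal negative values.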

Figure 3. Conceptual Distance from Competitors in Shortlisting Stage by Founders’ Gender
We then tested whether returns to unconventionality vary by gender (i.e., whether the likelihood of selection increases with Conceptual distance more for female- or male-led startups). Although Conceptual distance is weakly associated with selection (p = 0.095) on average, the interaction between Female-led startup and Conceptual distance (Table 5, Model 1) does not suggest that unconventionality leads to higher selection rates for female-led startups (p = 0.496). Taken together, these analyses do not reveal patterns consistent with a casting-a-wide-net mechanism driving female-led startups’ higher likelihood of selection in the shortlisting stage.
Table 5. Tests of Stakes-Driven Mechanisms for Between-Stage Differences in Gendered Outcomes
*p < .05; **p < .01; ***p < .001
All models are OLS regressions. Models 2, 6, and 7 cluster standard errors at the judge level; Models 1, 4, 5, and 8 cluster standard errors at the pitch-group level. Models 1 and 2 are estimated in the shortlisting stage; Model 3 among admitted startups that did not win; Models 4, 5, and 9 among winners; and Models 6, 7, and 8 in the winners stage.
Do perceptions that female-led startups are higher quality contribute to their higher selection rates in the shortlisting stage?
We theorized that female-led startups may be assessed as higher quality in the shortlisting stage, consistent with double standards theory (Botelho & Abraham, 2017). Since all startups that reach the shortlisting stage have demonstrated signs of viability, female-led startups may have cleared higher prior hurdles to reach that point (Brooks et al., 2014). We tested this by comparing quality assessments of female- and male-led startups in the shortlisting stage on two different scoring instruments.
First, we analyzed judge-assigned ratings of overall quality. Each judge independently rated each startup on a 0–5 scale (Quality rating). Figure 4 (Panel A) shows distributions of these ratings by founders’ gender in the shortlisting stage. 11 Female-led startups are less likely than male-led startups to receive the lowest scores (1 or 2) and more likely to receive the highest scores (4 or 5). They are 43.5 percent more likely to receive a 5 (16.5 percent vs. 11.5 percent). Model 2 (Table 5) estimates this relationship with controls; female-led startups received 0.134-point higher Quality ratings than male-led startups did on average, suggesting higher perceived quality at shortlisting.

Figure 4. Gender Differences in Quality Ratings
Judges also scored startups from 1 to 10 on each attribute on NVA’s rubric (Table 1). Figure 5 shows estimated gender differences in ratings on each attribute from regressions using the same specification as in Model 2, Table 5. In the shortlisting stage (left bars), female-led startups were rated higher than male-led startups on Growth, Customer Acquisition, Competitors/Cooperators, and Team. They were rated similarly on Financials and slightly (but not significantly) lower on Regulation/IP. Overall, these analyses are consistent with a mechanism in which female-led startups are evaluated as higher quality than male-led startups in the shortlisting stage, contributing to higher selection rates.

Figure 5. Gender Differences in Specific Attribute Ratings Across Stages
Does affirmative action contribute to female-led startups’ higher selection rates in the shortlisting stage?
We theorized that female-led startups may have higher selection rates in the shortlisting stage through affirmative action. If affirmative action were used, NVA would have lowered the quality threshold necessary for selection for female-led startups relative to male-led startups in order to advance women candidates, which could lead to higher selection rates for female-led startups. We tested this possibility in two ways.
First, if affirmative action had lowered the selection threshold for female-led startups relative to male-led startups, we would expect a disproportionate share of the lowest-scoring admitted startups (i.e., those nearest to the selection margin) to be female-led. We assessed this possibility by comparing the distributions of Quality ratings assigned to female- and male-led startups in the shortlisting stage among only admitted startups, shown in Figure 6. We do not observe a concentration of female-led startups among the lowest scorers. Female-led startups make up 30.1 percent of all admitted startups but 28.8 percent of the lowest-scoring 5 percent, 25.5 percent of the bottom 10 percent, 24.3 percent of the bottom 20 percent, and 28.0 percent of the bottom 30 percent. Conversely, female-led startups are more likely than male-led startups to be very highly rated and represent 36.6 percent of the top 10 percent. These patterns are inconsistent with a lowered selection threshold for female-led startups.
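The percentile comparison can be sketched as a function of admitted startups' Quality ratings. How the authors handled ties at a cutoff is not stated, so the input-order tie-breaking below is an assumption, as are the function names. Under a lowered threshold, the female-led share in the bottom slices would exceed the overall female-led share.

```python
def female_share_in_bottom(admits, percent):
    """Share of female-led startups among the lowest-rated `percent` of admits.

    admits: list of (quality_rating, is_female_led) pairs for admitted startups.
    percent: integer in 1..100; the slice size is rounded up to at least one.
    Ties at the cutoff keep input order (Python's sort is stable), an assumption.
    """
    ranked = sorted(admits, key=lambda pair: pair[0])
    k = max(1, (percent * len(ranked) + 99) // 100)  # integer ceiling
    bottom = ranked[:k]
    return sum(1 for _, female in bottom if female) / k


def female_share_in_top(admits, percent):
    """Same comparison for the highest-rated `percent` of admits."""
    flipped = [(-rating, female) for rating, female in admits]
    return female_share_in_bottom(flipped, percent)


example = [(4.2, True), (3.1, False), (3.4, True), (4.8, False), (3.0, False)]
overall = sum(female for _, female in example) / len(example)
for pct in (20, 40):
    print(pct, female_share_in_bottom(example, pct), overall)
```

Integer arithmetic for the slice size avoids floating-point cases in which, for example, 0.1 × 30 evaluates to slightly more than 3 and a ceiling would wrongly add a fourth startup.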

Figure 6. Shortlisting Stage Scores by Founders’ Gender Among Admitted Startups
However, affirmative action may operate via judges systematically giving female-led startups higher scores, which could mask a disproportionate concentration of female-led startups among the lowest-quality admits in the previous analysis. We therefore investigated whether female-led startups are disproportionately among the lowest-quality admits by examining an external proxy for quality: long-term survival. If lower-quality female-led startups were advanced through affirmative action, we would expect female-led ventures just over the cutoff for selection in shortlisting to survive for shorter periods than comparable male-led ventures. Similarly, if female-led startups were systematically given higher scores, we would expect them to survive for shorter periods than male-led startups that received similar scores. To measure survival, we tracked how long each startup’s website remained active from the competition date through September 2025. We used ping data from the Wayback Machine and assumed that website shutdown typically coincides with business closure. Months to website closure is the number of months a site remained active after the competition; a site was coded as “closed” after 12 consecutive months without ping data. 12
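The closure rule can be sketched as follows. The sketch assumes that captures have already been retrieved and converted to month offsets from the competition; retrieval from the Wayback Machine is not shown. It also treats a site still active at the September 2025 cutoff as right-censored. These handling choices are assumptions rather than the authors' documented procedure.

```python
def months_to_closure(ping_months, months_observed, gap=12):
    """Months a website stayed active after the competition, and whether it closed.

    ping_months: month offsets (0 = competition month) with at least one capture.
    months_observed: offset of the last observable month (the data cutoff).
    gap: consecutive capture-free months that mark a site as closed.
    Returns (months_active, closed); closed=False means the site was still
    active at the cutoff (right-censored), so months_active is a lower bound.
    """
    months = sorted({m for m in ping_months if 0 <= m <= months_observed})
    if not months:
        raise ValueError("no captures between the competition and the cutoff")
    # Closed at the last capture preceding a run of `gap` empty months.
    for current, following in zip(months, months[1:]):
        if following - current - 1 >= gap:
            return current, True
    last = months[-1]
    if months_observed - last >= gap:
        return last, True
    return months_observed, False


print(months_to_closure([0, 2, 5, 9], months_observed=60))
print(months_to_closure([0, 6, 12, 18, 24], months_observed=30))
```

The first site has no captures for 51 months after month 9, so it counts as closed at month 9. The second site is still being captured near the cutoff, so it is censored at month 30.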
Figure 7 shows gender differences in Months to website closure across progressively larger subsets of admitted startups with the lowest Quality ratings in the shortlisting stage (i.e., those just over the cutoff for selection). For example, the 5 percent increment shows gender differences among the 5 percent lowest-scoring admitted startups by Quality ratings. We excluded winners since winning could increase survival. However, results are robust when we included winners (Online Appendix E). Across intervals, we do not find evidence of lower survival rates for female-led startups among those just over the selection cutoff and beyond. Thus, we do not find evidence that female-led startups were disproportionately represented among the lowest-quality admitted startups. Model 3 (Table 5) tests (with controls) the relationship between female-led startups and Months to website closure on average for the total sample of non-winner admits. We do not find evidence suggestive of gendered survival rates.

Figure 7. Gender Differences in Time to Website Closure Among Non-Winning Startups Near Selection Margin in Shortlisting Stage
Together, these results provide no evidence that affirmative action contributed to female-led startups’ higher selection rates in the shortlisting stage. As shown in Online Appendix F, we also examined the possibility that affirmative action in the shortlisting stage created quality differences among startups competing in the winners stage. Once again, we do not find evidence consistent with this process.
In summary, we do not find evidence that either casting a wide net (i.e., exploratory selection) or affirmative action drove female-led startups’ higher selection rates in the shortlisting stage. Instead, we observe patterns more consistent with female-led startups being assessed as higher quality in the shortlisting stage.
Stakes-Driven Mechanisms for Female-Led Startups’ Lower Selection Rates in the Winners Stage
Do gendered performance expectations contribute to female-led startups’ lower selection rates in the winners stage?
We theorized that gender-based performance expectations, whether held by evaluators themselves or anticipated from other market actors, would play a larger role under the winners stage’s higher-stakes conditions. We first tested this mechanism by examining gender differences in the funding allocated to winners. As noted above, winners could receive up to $100,000, depending on their ratings by judges. Suppose judges’ gendered performance expectations contributed to female-led startups’ lower likelihood of winning when judges made final, binding commitments. We would then expect evaluators to make lesser commitments to female-led winners by allocating them less funding. Model 4 (Table 5) estimates gender differences in funding allocated and indicates that female-led winners receive 32.4 percent less funding than male-led winners. In non-logged terms, this corresponds to about $18,900 less. We find similar results when controlling for Quality ratings assigned at the winners stage (Model 5), suggesting that lesser commitments are made to female-led startups even when perceptions of quality are similar.
We further examined this mechanism by estimating whether female-led startups are assessed as lower quality in the winners stage. Despite being scored more highly (and being more represented among the highest scorers) in shortlisting, female-led startups receive lower Quality ratings in the winners stage (Figure 4, Panel B). Female-led startups are 42.3 percent less likely than male-led startups to receive a score of 5 (10.9 percent vs. 18.9 percent). Model 6 (Table 5) estimates this relationship with controls and judge fixed effects and finds that female-led startups are assigned Quality ratings that are 0.266 points lower than those assigned to male-led startups. Model 7 shows that this result is robust when we controlled for a startup’s average Quality ratings from the shortlisting stage (which judges at the winners stage did not observe), suggesting that female-led startups are assessed as lower quality in the winners stage even relative to male-led startups that were rated similarly at the earlier stage. This result is consistent with our theorizing that gendered expectations play a larger role when stakes are highest.
We extended the analysis to rubric attributes (Table 1). As detailed above, we estimated gender differences in each attribute rating. Figure 5 shows that although female-led startups are rated higher on most attributes in the shortlisting stage, they are rated substantially lower on every attribute in the winners stage. These differences persist when we controlled for each startup's scores on each attribute in the shortlisting stage (Online Appendix G), providing further evidence consistent with gendered expectations playing a larger role in the higher-stakes winners stage.
Does evaluators’ risk aversion contribute to female-led startups’ lower selection rates in the winners stage?
We theorized that evaluators may be more risk averse in the winners stage and avoid less-conventional candidates such as women in entrepreneurship. We examined this mechanism by testing whether gender differences in the winners stage are larger among startups with more unconventional (and thus likely perceived as riskier) ideas, interacting Female-led startup with Conceptual distance (Model 8, Table 5). While unconventionality is positively associated with selection for male-led startups (positive coefficient on Conceptual distance), female-led startups do not realize a similar return. A one-standard-deviation increase in Conceptual distance corresponds to an additional 12.4-percentage-point decrease in the likelihood that female-led startups are named winners, relative to male-led peers (p = 0.089). This finding provides suggestive evidence consistent with risk aversion as a potential driving mechanism of female-led startups' lower selection rates in the winners stage.
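In a linear probability model with an interaction term, the female-minus-male gap in the probability of winning varies with Conceptual distance. The sketch below assumes Conceptual distance is standardized to mean 0 and standard deviation 1 and omits the controls and fixed effects; it computes that gap at several values of the standardized measure. Only the interaction value (−0.124) comes from the text; the main effect and the raw scores are hypothetical placeholders, not estimates from Table 5.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale raw Conceptual distance scores to mean 0, standard deviation 1
    (sample standard deviation; an assumption about the authors' scaling)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def female_gap(b_female: float, b_interaction: float, z: float) -> float:
    """Female-minus-male difference in Pr(win) at standardized distance z for
    y = b0 + b_female*F + b_distance*z + b_interaction*(F*z) + controls."""
    return b_female + b_interaction * z

B_FEMALE = -0.15        # hypothetical main effect, for illustration only
B_INTERACTION = -0.124  # reported: additional 12.4 pp decrease per SD

# Hypothetical raw scores whose standardized values are -1, 0, and 1.
for z in standardize([1.0, 2.0, 3.0]):
    print(z, round(female_gap(B_FEMALE, B_INTERACTION, z), 3))
```

Under this specification the gap widens by the interaction coefficient for each standard deviation of Conceptual distance, which is how the 12.4-percentage-point figure should be read.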
Outcome-Based Test
Finally, we conducted an outcome-based test to further probe both mechanisms (Becker, 1993; Hebert, 2025). In this test, we estimated long-term survival rates among only startups that were named winners. A finding that female-led winners do not have different survival rates than male-led winners would provide further evidence that evaluators acted with risk aversion toward female-led startups in the winners stage rather than discriminating against those ventures, since female-led startups would have been less likely to be selected despite having long-term outcomes comparable to those of male-led startups. Model 9 (Table 5) does not find statistically significant gender differences in survival rates among startups that were named winners. While this result should be interpreted cautiously, given that the estimated coefficient on Female-led startup is positive (though not statistically significant) and the model's sample size is necessarily small, we observe further evidence consistent with evaluators' risk aversion.
Overall, we find suggestive evidence for both mechanisms. First, we observe patterns consistent with gendered performance expectations playing a larger role when evaluators make binding commitments with higher false-positive costs. Second, we observe evidence consistent with risk aversion contributing to female-led startups’ lower selection rates.
Alternatives: Non-Stakes-Based Explanations for Between-Stage Variation in Gendered Outcomes
We now examine a series of alternative explanations outside the theorized stakes-based mechanisms that could generate lower selection rates for female-led startups in the winners stage relative to the shortlisting stage. Table 6 summarizes each alternative’s conceptual logic and empirical test.
Summary of Alternative Mechanism Tests
Do gender differences in between-stage quality improvements contribute to gendered outcomes in the winners stage?
Male-led startups could improve more than female-led startups between stages, yielding better outcomes in the winners stage relative to the shortlisting stage. Gendered improvement patterns could arise if mentors preferentially support men, if male-led startups network more effectively (Howell & Nanda, 2023), or if they incorporate feedback more between stages (Miller et al., 2023).
We used the pre-winners stage to test this possibility. The pre-winners stage, which mirrors the shortlisting and winners stages in format and rubric, is used to select which startups will pitch in the winners stage. The time between the pre-winners and winners stages is quite short (16.3 days on average), and all accelerator resources stop at this point. For gender differences in between-stage improvements to drive the results, men would need to make revisions that improve their outcomes at a greater rate than women do.
Although we expect startups to attempt to improve between evaluation stages, gender differences in such improvements between the pre-winners and winners stages are unlikely, for a few reasons. First, because accelerator programming is not provided in this time frame, there is no opportunity for men to differentially take advantage of accelerator resources. Second, there are only about two weeks to make changes between these stages. Third, it seems counterintuitive that startups would make substantial changes after having received the positive signal of being selected to proceed to the winners stage based on the pitch they used in the pre-winners stage. Thus, differences in gendered outcomes between the pre-winners and winners stages would provide evidence that the main results are not fully attributable to startup-level improvements.
Model 1 (Table 7), using the same specification as Models 2 and 5 in Table 3, does not find gender differences in the likelihood of selection in the pre-winners stage (p = 0.303). However, as shown in Table 3 (Model 5), female-led startups are less likely to be selected in the higher-stakes winners stage. Because this gap emerges between two closely spaced stages with no accelerator programming in between, the pattern does not provide evidence that gender differences in improvements drive between-stage differences in evaluative outcomes.
Tests of Alternative Explanations
†p < .10; *p < .05; **p < .01; ***p < .001
All models are OLS regressions. Models 1 and 2 cluster standard errors at the pitch group level. Models 4 and 5 cluster standard errors at the individual judge level. Model 1 is estimated in the pre-winners stage, Models 2 and 5 are in the winners stage, and Models 3 and 4 are in the shortlisting stage.
Model 2 (Table 7) further examines the assumption that the time between the pre-winners and winners stages does not facilitate gender differences in quality improvements that shape evaluative outcomes. Due to random fluctuations in available room space, competition schedules, and judges' schedules, there is some variation among startups in how many days elapse between the pre-winners and winners stages. If men use this time to improve to a greater extent than women do, gender differences in the winners stage should be wider when there is more time between stages, implying a negative interaction. The interaction between Female-led startup and Days between pre-winners and winners is instead positive and not statistically significant, and thus does not provide evidence for this explanation. Together, these tests do not provide evidence that gender differences in between-stage quality improvements drive differences in gendered evaluative outcomes.
Do gender differences in idea variance contribute to gendered outcomes in the winners stage?
If male-led startups pitch higher-variance ideas in the shortlisting stage, they could receive lower average ratings in shortlisting, yet a subset of male-led startups could be at the top of the distribution in the winners stage. We tested this by estimating gender differences in the variance of shortlisting-stage Quality ratings assigned to each startup (Table 7, Model 3). We do not find evidence consistent with male-led startups having higher variance.
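The variance test asks how dispersed judges' shortlisting-stage Quality ratings are for each startup. The sketch below computes each startup's sample variance across judges and averages it by founder gender. It is only the raw, unadjusted analogue of regressing per-startup variance on Female-led startup: it omits the controls in Table 7, and the data and helper name are hypothetical.

```python
from statistics import mean, variance

# Hypothetical toy data: startup -> (female_led, shortlisting Quality ratings by judge)
RATINGS = {
    "s1": (True,  [4, 5, 4, 5]),
    "s2": (False, [3, 5, 2, 4]),
    "s3": (True,  [3, 3, 4, 3]),
    "s4": (False, [5, 4, 4, 5]),
}

def mean_rating_variance(data, female_led: bool) -> float:
    """Average, across startups in one gender group, of the sample variance
    of the ratings each startup received from its judges."""
    per_startup = [variance(r) for f, r in data.values()
                   if f == female_led and len(r) > 1]
    return mean(per_startup)

print(round(mean_rating_variance(RATINGS, True), 3))   # female-led
print(round(mean_rating_variance(RATINGS, False), 3))  # male-led
```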
Does truncation of the distribution of male-led startups in the shortlisting stage contribute to gendered outcomes in the winners stage?
We also considered whether male-led startups are disproportionately clustered at the very top of the shortlisting stage quality distribution. Given that Quality ratings are capped at the maximum score of 5, it is possible that among the startups that score a 5, male-led startups would be rated higher on an unconstrained scale. Such truncation could allow a small set of exceptional, male-led startups to drive higher selection rates in the winners stage. However, female-led startups are, in fact, more likely to receive the highest possible rating in the shortlisting stage (0.9 percent of female-led startups received a perfect rating across all judges, compared to 0.4 percent of male-led startups). Thus, our findings are not consistent with truncated ratings for male-led startups driving the main results.
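The truncation check counts startups whose shortlisting-stage ratings sit at the ceiling for every judge. Below is a minimal sketch of computing the share of each gender group with a perfect rating across all judges. It uses hypothetical toy data and a hypothetical helper name.

```python
MAX_SCORE = 5  # Quality ratings are capped at 5

# Hypothetical toy data: (female_led, shortlisting Quality ratings by judge)
STARTUPS = [
    (True,  [5, 5, 5, 5]),
    (True,  [5, 4, 5, 5]),
    (False, [5, 5, 4, 5]),
    (False, [3, 4, 4, 3]),
    (True,  [4, 4, 3, 4]),
    (False, [5, 5, 5, 4]),
]

def share_perfect(startups, female_led: bool, max_score: int = MAX_SCORE) -> float:
    """Percent of startups in a gender group rated max_score by every judge."""
    group = [r for f, r in startups if f == female_led]
    perfect = sum(1 for r in group if r and all(x == max_score for x in r))
    return 100.0 * perfect / len(group)

print(round(share_perfect(STARTUPS, True), 1))   # female-led
print(round(share_perfect(STARTUPS, False), 1))  # male-led
```

Because the cap binds, any ordering among ceiling-rated startups is unobservable. The comparison therefore looks at which startups reach the ceiling rather than at mean ratings.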
Does gender homophily contribute to gendered outcomes in either stage?
Female judges could rate female-led startups more favorably than male-led startups, potentially contributing to between-stage differences. Models 4 and 5 (Table 7) estimate the interaction between judge gender and founder gender in the shortlisting and winners stages, respectively. We do not find evidence that female judges evaluate female-led startups differently than male-led startups in either stage.
Implications for Startup Outcomes
We next consider how stakes-driven gender inequality may contribute to the gender gap in entrepreneurship. After observing our findings, a natural question is, how many female-led startups went unfunded because gender differences emerge at the highest-stakes winners stage? To answer this question, we estimated counterfactual “winners” that would have won had earlier-stage evaluations been used. For each year, we selected the top-scoring startups from the shortlisting and pre-winners stages, choosing the same number as the actual winners. This created a hypothetical winners set that would have been funded had final selections been based on earlier-stage evaluations made under lower stakes (see Online Appendix H for details).
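The counterfactual procedure can be sketched as a per-year top-n selection, where n is the number of actual winners that year. The full details are in Online Appendix H. The choices made here are assumptions for illustration: each startup has a single earlier-stage score, and ties are broken by startup ID. The data and function names are hypothetical.

```python
from collections import defaultdict

def counterfactual_winners(scores, n_winners_by_year):
    """Select, within each year, the top-n startups by an earlier-stage score,
    where n equals that year's number of actual winners.

    scores: iterable of (year, startup_id, female_led, earlier_stage_score)
    n_winners_by_year: dict mapping year -> number of actual winners
    Ties are broken by startup_id, an arbitrary assumption for this sketch.
    """
    by_year = defaultdict(list)
    for year, sid, female, score in scores:
        by_year[year].append((score, sid, female))
    selected = []
    for year, n in n_winners_by_year.items():
        ranked = sorted(by_year[year], key=lambda t: (-t[0], t[1]))
        selected.extend((year, sid, female) for _, sid, female in ranked[:n])
    return selected

def female_share(selected):
    """Percent of selected startups that are female-led."""
    return 100.0 * sum(1 for _, _, female in selected if female) / len(selected)

# Hypothetical toy data, not from the study.
SCORES = [
    (2019, "a", True, 4.6), (2019, "b", False, 4.2),
    (2019, "c", True, 3.9), (2019, "d", False, 4.4),
    (2020, "e", False, 4.8), (2020, "f", True, 4.7), (2020, "g", False, 3.5),
]
winners = counterfactual_winners(SCORES, {2019: 2, 2020: 1})
print([sid for _, sid, _ in winners])
print(round(female_share(winners), 1))
```

Holding the number of winners fixed per year means the counterfactual changes only who is selected, not how many. Any shift in gender composition therefore reflects the earlier-stage rankings alone.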
Figure 8 compares the gender compositions of the actual and counterfactual winners. If shortlisting stage evaluations were used, 39.7 percent of the winners would be female-led; pre-winners stage evaluations yield 25.0 percent. By contrast, 21.8 percent of the actual winners were female-led. These differences illustrate how stakes-driven gender inequality translates into lost funding opportunities for female-led startups relative to outcomes from earlier, less-binding evaluations.

Gender Composition of Winners and Would-Be Winners Based on Pre-Winners Stage
Discussion
Evaluation processes structure how opportunities and resources are allocated across the economy and society. Organizational scholars have documented persistent gender differences in evaluative outcomes, with women often receiving less-favorable assessments across contexts (Berger, 1977; Botelho & Abraham, 2017; Brooks et al., 2014; Fernandez & Campero, 2017; Ridgeway, 2011, 2014). Redressing these differences requires deeper theoretical and empirical understanding of the conditions under which gender inequalities arise and persist.
We integrate and extend research on evaluations, status, and inequality by theorizing how gendered outcomes can vary between stages within a single, unified evaluation process—what we term “stakes-driven gender inequality.” Our framework centers on a ubiquitous feature of organizational multistage evaluation processes that is missing from prior theoretical accounts: escalating stakes across stages via increasingly binding commitments and rising cost of wrong selections. In the shortlisting stage, selections are provisional and will be re-evaluated in a later stage, making false-positive costs relatively lower. In the winners stage—the process’s consecration event—evaluators make final, binding selections, increasing the cost of false positives.
Testing our theory using detailed data from an annual, multistage pitch competition in which innovation-driven entrepreneurs compete for funding and valuable public recognition, we find evidence consistent with stakes-driven gender inequality. In the shortlisting stage, female-led startups are 6.2 percentage points (an 18.7 percent increase) more likely than male-led startups to be admitted. We also find evidence consistent with female-led startups being perceived as higher quality at this stage, but we do not find evidence that allows us to rule in exploratory search or affirmative action as a primary mechanism. At the higher-stakes winners stage, female-led startups are 18.9 percentage points less likely than male-led startups to be named winners (a 30.7 percent decrease). We find evidence consistent with gendered performance expectations and risk aversion as contributing mechanisms to this pattern.
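The main results are reported both in percentage points and as percent changes. If the percent figures are relative to the male-led selection rate (an assumption; the text does not say), the two numbers together imply that rate. Because the reported figures are rounded, the implied rates are approximate: roughly 33 percent in shortlisting and roughly 62 percent in the winners stage. The helper name below is hypothetical.

```python
def implied_reference_rate(pp_gap: float, pct_gap: float) -> float:
    """Reference-group rate (in percent) implied by a gap expressed both in
    percentage points and as a percent of the reference-group rate."""
    return pp_gap / (pct_gap / 100.0)

shortlisting = implied_reference_rate(6.2, 18.7)  # roughly 33 percent
winners = implied_reference_rate(-18.9, -30.7)    # roughly 62 percent
print(round(shortlisting, 1), round(winners, 1))
```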
Although our setting offers several empirical advantages that enable multiple, triangulating mechanism tests, we cannot causally rule in or rule out specific mechanisms. However, we examined a series of mechanisms that are, based on extant theory, most likely to contribute to our results, in order to assess which are consistently supported and to offer a roadmap of potential mechanisms for future research to examine. Given the ubiquity of multistage evaluations, further testing these mechanisms across settings will be valuable. This line of inquiry will benefit from designs that leverage exogenous shifts in evaluative conditions—such as changes in bindingness, false-positive costs, or available information—paired with behavioral measures to test, refine, and adjudicate among mechanisms. Those mechanisms for which we do not find evidence in our context may nonetheless be consequential elsewhere, particularly when bindingness, false-positive costs, and information structures differ.
Our central contribution is to show how gendered outcomes can vary between stages of different stakes within a unified evaluation process. We foreground stakes, through increasingly binding commitments and escalating cost of wrong selections, as a consequential but undertheorized feature of organizational evaluation processes that can shape stage-dependent gendered outcomes. Although the literature on evaluations and inequality is vast, most theory and evidence consider only a single evaluation stage within a larger multistage process. Our framework, instead, integrates multiple stages, tracing how gendered outcomes shift from provisional stages to final, binding assessments. By doing so, we offer a roadmap for understanding multistage evaluation processes, a core organizational phenomenon.
We answer calls to incorporate the structure of evaluation processes into theorizing (Abraham et al., 2024; Botelho, 2024; Botelho et al., 2025; Rivera & Tilcsik, 2019), by highlighting a common yet overlooked feature of such processes: increasing stakes as stages progress. In doing so, we build on the limited research that has examined multistage evaluation processes. Botelho and Abraham (2017) showed that gender gaps can attenuate as evaluations progress when evaluators gain more thorough information about the recommendation they are evaluating. Our framework builds on and complements this work by adding a stakes-based theoretical lens. In their setting, investment professionals were less likely to examine women’s stock recommendations in detail than men’s; however, once the recommendations were examined, there were no gender differences in anonymous ratings of those recommendations. In their case, the first stage carries a more binding commitment and higher false-positive costs, since evaluators risk spending their limited time on a low-value recommendation, whereas the anonymous rating stage is virtually costless. Consistent with our perspective, the most pronounced gender difference appears in the higher-stakes stage.
Note also that the core construct theorized to drive gender differences in Botelho and Abraham’s study, formal changes in available information as stages progress, is absent in our setting. At NVA, judges are refreshed between stages and do not receive any information from prior stages or within-accelerator events. This feature is empirically useful, as we are able to isolate gender differences between stages with escalating stakes from changes in formally available information. This feature also makes the shortlisting- and winners-stage evaluations more directly comparable. However, this condition is relatively uncommon, and we view information changes as an important feature of many organizational evaluation processes; for example, hiring processes sometimes add more complex skills assessments as stages progress, or schedule repeat interviews with the same evaluator. Such changes could mitigate gender differences in the winners stage by reducing reliance on gendered expectations. Understanding how information flows interact with stage-to-stage shifts in stakes is an important area for future research.
While we are able to rule out formal changes in the information available to evaluators between stages in this startup accelerator setting, it remains possible that evaluators breached protocol and shared information about candidates through informal conversations. We view this as unlikely because NVA cultivates a diverse set of evaluators drawn from many organizations, and thus evaluators would not have known the identities of the other judges or had substantial avenues for informal conversations. Moreover, evaluators were not aware of which startups they would judge until the session began and thus could not ask peers in advance for their impressions of those startups. Nonetheless, such interactions remain possible. Relatedly, although NVA maintained a rule that judges could not evaluate startups with which they had prior interactions, and founders did not know which judges would score them until entering the room, some founders could have shared information with judges behind the scenes. If male founders did so to a greater extent, such as by sharing future plans that judges found compelling, this could shape the results. However, we see no theoretical reason to expect that gender differences in the use of alternative communication channels would increase between the shortlisting and winners stages. Thus, to the extent that judges responded more strongly to such information in the winners stage, this would still represent the use of gendered expectations to forecast future startup performance in the higher-stakes stage.
Our stakes-based, multistage framework also helps to reconcile seemingly mixed findings in prior research. Some studies find minimal gender differences, or even better evaluations for women, in initial stages (e.g., Bertrand & Mullainathan, 2004; Botelho & Chang, 2023; Gornall & Strebulaev, 2023; Leung, 2018; Weisshaar et al., 2024; Williams & Ceci, 2015), whereas others find that women fare worse in later stages that allocate the process’s ultimate outcome (Brooks et al., 2014; Castilla, 2008; Ewens & Townsend, 2020; Kanze et al., 2020). By explicitly incorporating multistage structure and stage-to-stage escalations in bindingness and false-positive costs into theory, we can better identify when differences are most likely to emerge.
Relatedly, the stakes-driven theory presented here can also inform research on single-stage evaluations. Understanding an evaluation's bindingness and false-positive costs can help scholars anticipate when inequality is more or less likely to occur. While we focus on multistage evaluations, future research may examine the degree to which the stakes-driven inequality concept presented here is portable to other settings. When a single-stage assessment is relatively low stakes (e.g., selecting a supplier for a small purchase or crowdfunding on Kickstarter), evaluators may be better able to tolerate potential false positives. In these cases, parity or even higher assessments for women may arise. Conversely, when a single-stage assessment requires a binding commitment or carries substantial false-positive costs (e.g., signing a multiyear supplier contract or approving a loan application), gendered performance expectations and risk aversion are more likely to shape assessments, and women may be less likely to be selected.
We may also observe stakes-driven variation in gendered outcomes in distinct evaluation processes. For example, when making larger funding decisions that they perceive as carrying greater risk, investors may be less apt to consider female-led startups, or hiring managers may advance female candidates to a greater extent for jobs that they see as less consequential or lower in the organizational hierarchy. Stakes, conceptualized as commitment bindingness and false-positive costs, are likely present in single-stage evaluations as well, even if in a less continuous form, and thus may shape gendered outcomes there too.
We also contribute to growing research on how structural features of evaluation processes shape inequality. Prior studies show that candidate-pool composition (e.g., Leung & Koppman, 2018), rating scales (e.g., Botelho et al., 2025; Rivera & Tilcsik, 2019), and question design (e.g., Miller et al., 2023) affect evaluative differences. Given how widely evaluation structures vary, it is essential to consider how design choices interact with escalating stakes. We provide evidence that differences by candidate characteristics (here, gender) can vary systematically across stages as stakes escalate. Other structural features (e.g., rating scales, assessment criteria, selection thresholds) may also shift across stages; future research should examine how such within-process changes interact with stage-dependent conditions to generate stage-contingent outcomes. For example, we find that female-led startups are less likely to receive the highest score (5) in the winners stage, despite being more likely to receive this score in the shortlisting stage, which suggests that recent findings on gender and top score assignments may have stage-specific contingencies (Rivera & Tilcsik, 2019). A structural feature’s influence may therefore depend on the stage, a finding that helps to refine theoretical accounts of evaluative inequality and further reconciles findings from single-stage studies.
Prior research on gender, evaluations, and competition underscores why outcomes at each stage—not only the final stage—matter. Intermediate-stage outcomes send signals about venture viability, shaping entrepreneurs’ pursuit of ideas (Howell, 2021) and impacting gender differences in labor supply and perseverance. Evidence indicates that women may opt out of certain career opportunities after receiving adverse early signals in evaluative processes, which alters who remains to compete and the composition of candidate pools (Fernandez-Mateo et al., 2023). For example, women rejected early in hiring processes are less likely to pursue similar roles in the future (Brands & Fernandez-Mateo, 2017). Accordingly, shortlisting stages carry consequences beyond determining who advances: Early, lower-stakes selections influence pipeline dynamics, whereas later, higher-stakes selections impose binding, public commitments at the process’s ultimate point of selection. Moreover, reaching the winners stage and not being selected may be especially discouraging, both for affected individuals and onlookers who observe limited representation at the most consequential stage, potentially reducing subsequent entry or persistence by women. These supply-side responses may interact with false-positive costs to magnify stakes-driven gender inequality, an avenue for future research.
More broadly, we contribute to research on the gender gap in entrepreneurship (Brooks et al., 2014; Coleman & Robb, 2009; Ewens & Townsend, 2020; Huang et al., 2021; Kanze et al., 2020; Miller et al., 2023; Snellman & Solal, 2023). Prior research shows that evaluators tend to favor ventures that fit with established patterns of success (Boudreau et al., 2016; Hofstra et al., 2020). As women are historically underrepresented in entrepreneurship, female-led startups may be perceived as less traditional and, thus, associated with worse expectations of future performance and success. We extend this perspective by showing that structural characteristics of evaluation processes—here, escalating stakes—correlate with female-led startups’ evaluative outcomes. Estimates of lower evaluations for women in the winners stage persist even when we compared female- and male-led startups that received similar quality ratings in the shortlisting stage, suggesting shifts in evaluators’ application of existing standards (Biernat & Vescio, 2002). Thus, we help to clarify when and where gender differences are most likely to emerge in entrepreneurial evaluations.
We note several limitations. First, our empirical context, innovation-driven entrepreneurship, is historically male-dominated; thus, our insights may be most applicable to other fields in which women are also underrepresented. Second, our framework assumes substantial uncertainty about candidate quality. Such conditions are common across contexts, but they are also the conditions under which evaluators may rely most heavily on available signals, including gender; our insights may therefore apply less where candidate quality is easier to assess. Third, after the pitch, NVA judges briefly discussed the startup but ultimately entered independent evaluations. Gender may factor differently when groups must reach consensus. In those settings, gender could emerge as an orienting device for perceived quality. Future research can explore how different decision-making structures moderate stage-contingent outcomes.
In our setting, judges are instructed to evaluate startups by the same criteria in every stage, which helps to isolate between-stage differences in stakes from formal changes in criteria or guidance. We cannot, however, fully observe how judges interpret and weigh criteria. As stakes increase, judges may prioritize some attributes (e.g., experience, credentials) over others, potentially prioritizing traditionally male-associated qualities. Although such patterns are consistent with our theory that gendered outcomes vary with stakes, future research could illuminate these microprocesses by directly measuring how criteria are weighted across stages. We observe preliminary evidence that gender differences vary across attributes: Gender gaps appear largest on attributes characterized by greater uncertainty (e.g., growth potential) and are more limited on more concrete, measurable attributes (e.g., financials, regulation/IP), though this warrants further study. Relatedly, evaluators and candidates may adjust behavior between stages. Although we do not find evidence of gendered differences in startup-level improvements between stages, candidates may respond differently to changing evaluative conditions. Embedding instruments to capture such behavioral adjustments would further clarify potential mechanisms.
Our aim is to inform and guide, not foreclose, future research on multistage evaluations. Despite the ubiquity of such evaluations in organizational life, systematic research on multistage processes remains scarce. Future studies can specify the conditions under which gender differences emerge or intensify as evaluators move from provisional shortlisting stages to binding winners stages, and future research can apply this framework to other characteristics such as social class, political partisanship, or race (Botelho et al., 2025; Castilla, 2008; Kim et al., 2024; Pager et al., 2009; Rheinhardt et al., 2024; Rivera & Tilcsik, 2016). We also urge scholars to interpret single-stage results in light of where a stage sits on the bindingness-and-error-cost continuum. Snapshots from single-stage evaluations can potentially be misleading about the differences that arise and which mechanisms plausibly operate. Our sample, for example, could have yielded three separate single-stage studies—higher likelihood of selection for female-led startups in the shortlisting stage, parity in the pre-winners stage, and lower likelihood in the winners stage—each implying a different theory and set of operative mechanisms. Considered together, however, the stages reveal a broader pattern: Escalating stakes jointly shape outcomes within one unified evaluation process.
Finally, we theorize and examine a finite multistage evaluation process that culminates in the allocation of a predefined resource (e.g., a job offer, funding). Organizations also run serial evaluations over time, such as post-hire promotion and bonus decisions, that are conceptually separate. While such sequences fall outside the scope of this study, future research should examine inter-process spillovers. Outcomes from one evaluation process (e.g., prior promotion) may influence subsequent evaluations (e.g., subsequent career opportunities), particularly if relative stakes vary. In our setting, judges were refreshed between stages and lacked access to earlier-stage information. In other contexts, however, evaluators often observe prior assessments (e.g., past performance reviews). Such visibility could amplify inequality through peer-deference dynamics or anchoring, especially when later assessments are binding and false-positive costs are high.
Our findings also offer practical implications for organizational leaders and policymakers. Seemingly neutral design choices, such as adopting a multistage structure, can systematically shape who advances and who is consecrated as a winner. We are not advocating a removal of stages. Rather, to promote fairer and more effective evaluations, leaders should attend to stage-specific stakes, especially in binding stages, and audit whether the winners stage disproportionately affects particular groups without merit-based justification. Attending to how the evaluative stakes in each stage shape final outcomes may help to mitigate disparities while facilitating resource allocation to the best candidates.
Supplemental Material
sj-pdf-1-asq-10.1177_00018392261442931 – Supplemental material for Getting in the Door vs. Winning It All: How Gendered Outcomes Change Across Evaluation Stages in Entrepreneurship
Supplemental material, sj-pdf-1-asq-10.1177_00018392261442931 for Getting in the Door vs. Winning It All: How Gendered Outcomes Change Across Evaluation Stages in Entrepreneurship by Tristan L. Botelho and Ethan J. Poskanzer in Administrative Science Quarterly
Footnotes
Acknowledgements
We thank Nicole Francis and Ray Jin for their helpful research assistance. We would also like to thank Jim Baron, Judy Chevalier, Sarah Kaplan, Kelly Shue, George Ward, and Kate Weisshaar, as well as seminar participants at the Academy of Management; Brookings Institution; Inequality and Evaluations Conference; Oxford Residence for Entrepreneurship, Oxford University; Stanford University; Strategic Management Society; University of California, Santa Barbara; University of Colorado, Boulder; Colorado Junior Faculty Consortium; Warwick Business School; Wharton People & Organizations Conference; and Yale University, for their feedback and suggestions. Ethan Poskanzer gratefully acknowledges financial support provided by the Frank Schiff Professorship and the Leeds School of Business.
Data Availability Statement
We are unable to make the data for this study publicly available in order to protect the privacy of the accelerator that generously shared the data for academic purposes.
2. Each startup submitted a written application that described its idea, team, and accomplishments. These applications were scored by four to six (average = 5.5) randomly assigned judges to determine which startups would pitch in the shortlisting stage. The judges of these written applications were different from the judges who evaluated in the pitch competition.
3. NVA also did not invest in startups after the process concluded.
4. We analyzed the pre-winners stage as a temporally close comparison to the winners stage, in which commitments are slightly less binding and false-positive costs are lower than in the winners stage, to further test our theory that gendered outcomes are a function of evaluative stakes and to explore potential alternative explanations. See the section Alternatives: Non-Stakes-Based Explanations for Between-Stage Variation in Gendered Outcomes.
5. The same judge evaluated the same startup in both the shortlisting and winners stages in only one instance out of 1,528. Judges were also barred from evaluating startups with which they had prior interaction. Some judges acted as mentors in the program but were then prohibited from judging any startups that they mentored. We analyzed a random subsample of the data in which information on startup–mentor interactions was available and found no instances of a judge evaluating a startup that they had mentored.
6. The results are also robust to more fine-grained measures of startup location. We repeated the analyses using country fixed effects and region (equivalent to a state or province) fixed effects and found similar results.
7. Startups self-reported their industry from the following choices: General, Energy/Clean Tech, Healthcare/Life Sciences, High Technology, and Social Impact.
8. Startups not listed as having raised funding in PitchBook were coded with a value of 0 for this variable.
9. This variable is unavailable for 3.1 percent of startups in the sample. In those cases, we imputed the within-year mean value of Written application score. Results are robust when these observations are excluded.
11
12. The data collection process counted website redirects, whereby a startup redirected its old domain to a new one (e.g., after rebranding), as continued activity.
Authors’ Biographies
References