Abstract
(Hetero)sexual double standards (SDS) entail that different sexual behaviors are appropriate for men and women. This meta-analysis (k = 99; N = 123,343) tested predictions of evolutionary and biosocial theories regarding the existence of SDS in social cognitions. Databases were searched for studies examining attitudes or stereotypes regarding the sexual behaviors of men versus women. Studies assessing differences in evaluations, or expectations, of men’s and women’s sexual behavior yielded evidence for traditional SDS (d = 0.25). For men, frequent sexual activity was more expected, and evaluated more positively, than for women. Studies using Likert-type-scale questionnaires did not yield evidence of SDS (combined M = −0.09). Effects were moderated by level of gender equality in the country in which the study was conducted, SDS operationalization (attitudes vs. stereotypes), questionnaire type, and sexual behavior type. Results are consistent with a hybrid model in which both evolutionary and sociocultural factors contribute to SDS.
Women and men are often held to different standards of appropriate behavior (Foschi, 2000; Prentice & Carranza, 2002). For example, women are penalized more than men for self-promoting behavior (Rudman, 1998) and for speaking in a direct and dominant manner (Carli et al., 1995). Conversely, men are penalized more than women for passivity (Costrich et al., 1975) and modest behavior (Moss-Racusin et al., 2010). Such “backlash” effects (Rudman & Fairchild, 2004), whereby men and women receive sanctions for violating social standards for their behavior, can have far-reaching negative consequences for individuals and society.
The present analysis focuses on (hetero)sexual double standards (SDS), in which different sexual behaviors are expected of, and valued for, men and women (Emmerink, Vanwesenbeeck, et al., 2016; Zaikman & Marks, 2017). Traditionally, men/boys are expected to be sexually active, dominant, and the initiators of (hetero)sexual activity, whereas women/girls are expected to be sexually reactive, submissive, and passive. Moreover, men have traditionally been granted more sexual freedom than women. As a consequence, men and women can be treated differently for the same sexual behaviors. For example, slut-shaming is experienced by 50% of girls, compared with 20% of boys (Hill & Kearl, 2011).
Furthermore, traditional SDS have been associated with gender differences in sexual coercion and violence (Shen et al., 2012), as well as in sexual pleasure and orgasm (Kiefer et al., 2006; Sanchez et al., 2012). SDS have also been associated with gender differences in sexual risk behavior, specifically with more sexual partners for men and more reluctance among women to request or insist on condom use (Lefkowitz et al., 2014). Traditional SDS have further been related to other societal problems, such as homophobia, sexism, and gender inequality (Zaikman & Marks, 2014; Zaikman, Marks, et al., 2016).
However, research is inconsistent about the continued existence and extent of SDS (for narrative reviews, see Bordini & Sperb, 2013; Crawford & Popp, 2003; Zaikman & Marks, 2017), which might be due, among other reasons, to differences between studies in the conceptualization and measurement of SDS. Because of the negative implications of SDS for men and women, it is important to establish whether SDS still exist. This undertaking requires probing whether conclusions about their existence depend on the type of sexual behavior assessed, or on how SDS are measured or conceptualized. Therefore, we conducted a meta-analysis to examine whether SDS are present in society, and which measures and conceptualizations yield evidence for their existence and which do not. We define the existence of SDS as the degree to which people have internalized SDS in their own social cognitions (i.e., stereotypes, attitudes; Greenwald et al., 2002).
Theoretical Perspectives on SDS
In our work, we draw on several distinct yet sometimes overlapping theoretical frameworks to make predictions about the existence and moderators of SDS. Our goal was to provide a broad theoretical overview of the conditions under which SDS would be present. First, evolutionary theory (Buss & Schmitt, 1993; Trivers, 1972) and biosocial theory (Wood & Eagly, 2002, 2012) assume the existence of SDS and make predictions about moderators of the strength of SDS, specifically with regard to the behavioral specificity of SDS, cultural differences, and historical change. Second, we use the theoretical framework of male and female control theory, which builds on premises of evolutionary and biosocial theory, to make predictions about gender differences in SDS. Third, we employ the gender-intensification hypothesis, which is similar to biosocial theory in its focus on gender roles, to predict age differences in SDS. The specific predictions we derived from each perspective are primarily based on our interpretation of the theories. The original theorists did not necessarily specify these concrete predictions, but we believe that they logically follow from their core propositions.
Theoretical perspectives on the existence of SDS
Evolutionary theory
Evolutionary theories, and especially gender differences in parental investment and reproductive strategies, provide rationales for the differential expectations and evaluations of men’s and women’s sexual behavior. Regarding parental investment, women biologically invest more in their children than men do (e.g., egg cells are more costly than sperm cells, 9-month pregnancy, delivery; Trivers, 1972). Due to the lower parental investment of men compared with women, there is a high degree of competition among males for female mates. In this context, being highly dominant and assertive sexually is likely to increase mating success for men (Buss & Schmitt, 1993). In addition, men benefit more than women from having frequent sex with multiple partners, as this increases the likelihood of passing their genes on to the next generation. In contrast, the higher parental investment by women makes them more selective in choosing mates and more likely to withhold sex until they can be confident that the partner can provide resources for their children (Oliver & Hyde, 1993) and is willing to assist in raising them (Wiederman & Allgeier, 1992). These evolutionary processes are assumed to unconsciously influence how we view the sexual behavior of others and ourselves. More specifically, it has been suggested that physical or behavioral traits that indicate reproductive fitness elicit positive evaluations, and traits that indicate a lack of fitness elicit negative evaluations (Milhausen & Herold, 2001).
Biosocial theory
According to biosocial theory (Wood & Eagly, 2002, 2012), different norms for the behavior of men and women arise from societies’ division into gender roles: the female role of homemaker and the male role of economic provider. These different roles emerged in part from biological differences between men and women, with men being physically stronger, and women investing more in childbearing and nursing. As such, this theory integrates evolutionary processes related to parental investment and sexual strategies, although the division of gender roles is viewed as the most proximal cause of gender differences. Traditionally, the male role is characterized by competence, independence, assertiveness, power, and leadership, whereas the female role is characterized by submissiveness, kindness, consideration, helping, nurturing, and caring. People are expected to behave according to their gender roles, and behavior that adheres to gender roles elicits positive evaluations, whereas behavior that violates gender roles elicits negative evaluations (Gaunt, 2012). Because gender roles are social constructions, socialization processes such as observational learning, reward, and punishment are important for learning what constitutes appropriate sexual behavior for men and women (Bussey & Bandura, 1999). These processes influence not only men’s and women’s own sexual behavior, but also their cognitions about SDS.
Applied to sexual behavior, the power difference in gender roles means that society expects men to be sexually agentic, that is, dominant, powerful, and assertive, and rewards men for such behaviors. In contrast, society expects women to be sexually communal, that is, submissive, passive, and reactive in sexual relationships, and accordingly rewards women for such behaviors. Most previous research on gender differences in sexual agency has focused on initiation patterns, with men initiating sex more frequently than women (Dworkin & O’Sullivan, 2005). Other examples of sexually agentic behaviors are perpetration of interpersonal sexual violence, willingness to engage in casual sexual relations (Mosher & Danoff-Burg, 2005), and being the director of sex as well as the teacher and expert in sex (Schwartz & Rutter, 2000). Examples of women’s more sexually communal behaviors are associating sex with complying and submitting, not communicating one’s own desires (Fetterolf & Sanchez, 2015; Kiefer et al., 2006), and consenting to unwanted sexual activities in relationships (O’Sullivan & Allgeier, 1998).
Theoretical perspectives on moderators of SDS
Gender differences in SDS
Male and female control theory (Baumeister & Twenge, 2002) also provides relevant predictions with regard to SDS. This theoretical framework integrates both evolutionary and sociocultural (i.e., feminist) perspectives to explain the suppression of female sexuality in general, and gender differences in SDS more specifically. According to male control theory, SDS can be viewed as a male privilege that men want to keep in place. Advantages for men that arise from SDS, and from the associated sexual dominance of men and suppression of female sexuality, are improved certainty about paternity (Buss, 1994), reduced male insecurity, and prevention of social chaos caused by widespread, indiscriminate sex (Baumeister & Twenge, 2002; Hyde & DeLamater, 1997). As such, SDS are part of a patriarchal system that is created by and for men, and that suppresses women. Men are more invested in patriarchy than women, and therefore men might also be more supportive of SDS (Rudman et al., 2013). In addition, women are less accepting than men of social hierarchies that subordinate women (Lee et al., 2011), such as traditional gender roles.
In contrast, female control theory (Baumeister & Twenge, 2002) proposes that female sexuality is a more valuable resource than male sexuality, because gender differences in sexual desire lead to a higher demand for female sexuality. As a consequence, SDS in which female sexuality is suppressed have advantages for women, because women can trade highly valued sexual favors for favors from men that are valued less, such as economic provision, monogamous relationships, and parental investment. There is, however, less empirical support for female control theory than for male control theory (Petersen & Hyde, 2010; Rudman et al., 2013). Because male control theory proposes that SDS constitute a form of male privilege that men want to control and keep in place, we predicted that men would be more likely than women to hold traditional SDS.
Behavioral specificity of SDS
An important question is whether the existence of SDS is behavior specific. In the current meta-analysis, we defined sexual behavior as any behavior involving oral sex and/or sexual intercourse (vaginal/anal), because too few studies examined SDS in other behaviors, such as petting, kissing, or manual stimulation of the genitalia. Previous research has examined the existence of SDS in a myriad of sexual behaviors, ranging from premarital sex in committed relationships (e.g., Sprecher et al., 2013), having sex outside committed relationships (i.e., casual sex; e.g., Penhollow et al., 2017), having sex for the first time at the age of 16 or younger (i.e., early sexual debut; e.g., Sprecher et al., 1987), sexual infidelity or having a sexual affair (e.g., Haavio-Mannila & Kontula, 2003), and being highly sexually active (e.g., having many rather than few sexual partners, engaging in threesomes, having multiple sexual partners at the same time) versus being less sexually active (e.g., Sprecher et al., 1991), to sexual coercion between persons who are in a power or age hierarchy, for example, a teacher and a student (e.g., Dollar et al., 2004).
Even though evolutionary theory and biosocial theory have often been pitted against each other in the literature, there is accumulating evidence for hybrid models explaining gender differences in sexuality from the interplay between evolutionary predispositions and sociocultural pressures (Lippa, 2009). For example, the relative power of evolutionary and biosocial theory to explain gender differences may vary depending on the behavior under consideration (Cross et al., 2013; Lippa, 2009). In terms of reproductive fitness, men would benefit more than women from having frequent casual sex with many partners, having an early sexual debut, and having sex with other persons during a committed relationship (i.e., sexual infidelity; Buss & Schmitt, 1993; Petersen & Hyde, 2010). For men, engaging in these behaviors is likely to increase the success of passing genes on to the next generation, whereas for women, refraining from or postponing these behaviors is likely to be a more successful reproductive strategy because of their higher parental investment. Therefore, based on evolutionary theory, we expected SDS to be most prevalent for these specific behaviors and less prevalent for other sexual behaviors, such as premarital sex in committed relationships or sexual coercion. Note that these predictions are not limited to sexual behavior that actually results in reproduction, because condoms and contraceptives can prevent pregnancy; rather, evolutionary processes can be viewed as general tendencies underlying sexual behavior (Zaikman & Marks, 2017).
Biosocial theory (Wood & Eagly, 2002, 2012) would predict that SDS are most prevalent in sexual encounters where there is a power/status difference between men and women, and less so in other sexual behaviors (e.g., early sexual debut, casual sex, sexual infidelity, premarital sex in committed relationships, high sexual activity level; see also Zaikman & Marks, 2017). In patriarchal societies, men hold more power than women, which is assumed to underlie gender differences in sexual behavior and attitudes (Fugère et al., 2008; Petersen & Hyde, 2010). In such a context, society grants sexual agency more to men than to women. More specifically, SDS may serve to justify male sexual coercion toward women (Krahé et al., 2000; Warner, 2000), possibly because men exerting power over women in a sexual context fits with their agentic role. In addition, male victims of sexual coercion or rape might be perceived as powerless and unwilling to have sex, which violates men’s (hetero)sexual agentic gender role (Weis, 2010). Double standards with regard to sexual coercion in power/age hierarchies have most often been studied in person perception studies. In such studies, participants evaluate a perpetrator (e.g., higher status or older) and/or a victim (e.g., lower status or younger) in a hypothetical sexual context. Based on biosocial theory, we expected the strongest SDS in the context of sexual coercion/power hierarchy, with male perpetrators being evaluated less negatively or penalized less (e.g., deserving less punishment, judged to be less exploitative) than female perpetrators, and male victims being evaluated more negatively (e.g., condemned more, suffering more reputational damage) than female victims.
Yet, an alternative hypothesis is also possible on the basis of the societal norm that men need to protect women because women are more vulnerable. It has been argued that the traditional male gender role therefore also encompasses chivalry norms (Eagly, 1987). In the context of sexual coercion in power/age hierarchies, male perpetrators violate the chivalry aspect of their gender role. Therefore, people might penalize male perpetrators more than female perpetrators. This expectation is consistent with research on violence in general, showing that people evaluate violence by a man against a woman more negatively than violence by a woman against a man (e.g., Felson & Feld, 2009).
Cross-cultural differences and changes over time
Evolutionary theories, and specifically the perspective of obligate sex differences (i.e., persistent and relatively uniform psychological sex differences across cultures due to hormonal or genetic effects), would predict no differences in SDS between countries. If these double standards evolved from adaptive gender differences in reproductive strategies, they would be universal and should be visible in all countries (Schmitt, 2015). However, according to the emergently moderated perspective, cross-cultural variation in sex differences results from moderating factors in the local ecology, such as religion and gender equality. Level of gender equality is particularly relevant in the context of SDS. Yet, the emergently moderated perspective does not yield a testable hypothesis about whether increasing levels of gender equality would suppress or accentuate gender differences in the norms for sexual behavior. Therefore, we tested only the prediction derived from the obligate sex differences perspective.
With regard to cross-cultural differences, biosocial theory would argue that gender roles are a product of culture (Wood & Eagly, 2002, 2012) and thus that SDS might differ between countries. Two measures have been developed to quantify the level of gender equality in countries across the world: the gender inequality index (United Nations Development Program, 2017) and the global gender gap score (World Economic Forum, 2017). Data from these measures show that Scandinavian and Western European countries generally have the smallest gender gap in the world and that North American countries have a somewhat bigger gender gap. Latin American and Asian societies have intermediate levels of gender inequality. The largest gender inequality is found in Middle Eastern and North African societies. Biosocial theory would predict that lower gender equality scores of countries on these measures are associated with more traditional SDS.
Regarding changes over time, evolutionary theory would not predict changes in SDS over the last 60 years, because evolutionary changes are generally slow. However, biosocial theory would predict that SDS are less traditional in recent studies than in older studies. In recent decades, the division of gender roles has become less strict in most modern Western societies (Eagly & Wood, 1999), which according to biosocial theory would lead to less differentiation in the norms for the sexual behavior of men and women (Wood & Eagly, 2002, 2012). Moreover, gender equality has increased in most Western societies over the decades (Inglehart & Norris, 2003).
Age differences in SDS
Neither evolutionary theory nor biosocial theory makes direct predictions with regard to age differences in the existence of SDS. Yet, pressures to conform to gender roles increase with child age and might be highest in adolescence (Basow & Rubin, 1999). According to the gender-intensification hypothesis, this might be because adolescence is a period in which boys and girls become increasingly different as a result of the convergence of biological, social, and cognitive changes (Hill & Lynch, 1983). Also, major developments in sexuality take place in adolescence (DeLamater & Friedrich, 2002), making sexuality a highly salient issue on which adolescents evaluate each other (Kreager et al., 2016). Therefore, we expected SDS to be more prevalent in adolescent samples than in adult samples.
SDS: Previous Findings
Reiss (1964) conducted the first systematic study of SDS in the 1960s, indicating that more sexual permissiveness was granted to men than to women. Since then, dozens of studies have been published on SDS, albeit with inconsistent results. Several studies did not find clear evidence of SDS (e.g., Gentry, 1998; O’Sullivan, 1995), whereas others clearly demonstrated the existence of traditional SDS (e.g., Marks, 2008; Marks & Fraley, 2007). Some recent studies even found evidence for a reversed double standard (e.g., Howell et al., 2011; Zaikman, Vogel, et al., 2016), in which women were evaluated more positively than men for high sexual activity (Milhausen & Herold, 1999).
Several narrative reviews tried to summarize the inconsistent body of research (Bordini & Sperb, 2013; Crawford & Popp, 2003; Fugère et al., 2008). Crawford and Popp (2003) concluded that traditional double standards for some sexual behaviors still exist, for example, for initiating sex, casual sex, sex at an early age, and having many sexual partners, but that for other sexual behaviors a double standard is no longer present, for example, for sex before marriage. Fugère et al. (2008) concluded that in some studies men held more traditional SDS than women and that SDS might be more traditional in non-U.S. samples (Russian, Japanese) compared with U.S. samples. Similar to Crawford and Popp (2003), Bordini and Sperb (2013) concluded that premarital sex and casual sex are accepted for both men and women in Western cultures, whereas a double standard still exists for other sexual behaviors, such as being highly sexually active or having a high number of sexual partners. From these reviews, we can conclude that the following moderators appear to play a role in the existence of SDS: sexual behavior type, gender, and cultural background.
In addition, two meta-analyses examined gender differences in sexuality, encompassing 30 specific sexual behaviors and attitudes (Oliver & Hyde, 1993; Petersen & Hyde, 2010). These meta-analyses also assessed gender differences in SDS attitudes, but did not report on the overall existence of SDS across men and women. Men reported more traditional SDS than women in the meta-analysis of studies conducted between 1993 and 2007 (Petersen & Hyde, 2010), whereas women reported more traditional SDS than men in the meta-analysis of studies conducted between 1974 and 1993 (Oliver & Hyde, 1993). Each meta-analysis included only seven studies on SDS, which might be because the search terms were not specific enough for SDS. Moreover, the small number of included studies assessing SDS precluded a robust examination of moderators.
Finally, Zaikman and Marks (2017) recently conducted a theory-based narrative review on SDS, describing evidence for hypotheses based on evolutionary theory, biosocial theory, and cognitive social learning theory. First, they did not identify any evidence for evolutionary theory’s proposition that SDS would be most evident for sexual behaviors that could lead to reproduction. This hypothesis could not be tested in the current meta-analysis, because too few studies examined SDS in sexual behaviors that cannot lead to reproduction, such as petting or kissing. Second, they presented evidence for the prediction of biosocial theory that SDS are more evident when there are power differences between men and women. Third, they identified evidence that SDS are more prevalent in cultures characterized by higher levels of gender inequality and that SDS have become more egalitarian over time. These findings were in line with biosocial theory, but not with evolutionary theory. Regarding predictions of cognitive social learning theory, they identified evidence for the role of traditional gender-role socialization and a high level of sexual experience in the existence of SDS.
Conceptualization and Measurement of SDS
Inconsistencies in previous research on SDS could be due to differences in conceptualization, measurement, and study design.
Conceptualization
Most previous research has conceptualized SDS in terms of attitudes: people’s differential evaluation of the sexual behaviors of men versus women. Yet, research using this conceptualization has provided inconsistent evidence of SDS. Therefore, Milhausen and Herold (2001) proposed a reconceptualization of SDS. They distinguished between people’s personal acceptance of SDS (e.g., “I would think badly of a man/woman who had protected sexual intercourse with a woman/man he or she was not emotionally committed to”) and people’s knowledge of the existence of SDS in society (e.g., “Who do you think has more sexual freedom today?”). Using this reconceptualization, they found that most people still believe SDS exist at a societal level, but that on a personal level most people hold egalitarian standards. The distinction between personal attitudes and more generally shared social expectations parallels the common distinction in social psychology between knowledge of cultural stereotypes (i.e., a socially shared set of expectations about a certain social group) and personal attitudes (i.e., a negative or positive evaluation of a certain social group or of that group’s behavior; Greenwald et al., 2002).
Because SDS are grounded in negative evaluations of those who behave in ways that violate social sexual standards as well as in stereotyped beliefs about gender (Lai & Hynie, 2011), in the current meta-analysis we assessed SDS in terms of individuals’ attitudes, that is, their personal evaluation of the sexual behavior of men versus women, and in terms of their stereotypes, that is, their expectations about the sexual behavior of men and women. Although, as explained below, we were not able to distinguish between personal stereotypical beliefs and knowledge of cultural stereotypes in our analysis, we nonetheless expected that the broad distinction between attitudes and stereotypes might be important. Consider, for example, findings indicating that in children as well as adults, the content of gender stereotypes has not changed over time, whereas gender attitudes have become more egalitarian (Ruble, 1983; Signorella et al., 1993). Thus, studies conceptualizing SDS in terms of stereotypes (e.g., participants’ responses to questions such as “Who has more sexual freedom?”) might be more likely to yield evidence for a double standard than studies using the attitude conceptualization (e.g., participants’ differential evaluation of either a male or a female target in a friends-with-(sexual)-benefits scenario) or studies conceptualizing SDS as a combination of stereotypes and attitudes (e.g., aggregating participants’ agreement with attitude statements such as “It is worse for a woman to sleep around than it is for a man” with stereotype statements such as “It is expected that a woman is less sexually experienced than her partner.”).
Measurement type and study design
Dual-process models of cognition propose that social cognitions can be present at both an explicit and an implicit level (Gawronski & Creighton, 2013). Explicit cognitions are overtly expressed ideas that are under conscious control and are therefore especially prone to socially desirable responding (Greenwald et al., 2009). Self-report questionnaires of stereotypes and attitudes tap into explicit cognitions. Implicit cognitions, on the other hand, are thought to be relatively inaccessible to conscious awareness, are elicited unintentionally, require few cognitive resources, and cannot be stopped voluntarily (Gawronski & Bodenhausen, 2006). They are most often assessed with implicit association tests (IATs) that measure the strength of automatic associations between concepts (e.g., men, women, Black people, gay people) and attributes (e.g., career, family, angry, good, bad). An important implication of dual-process models for social cognitions about controversial subjects, such as gender, is that people might report more egalitarian cognitions on explicit self-report measures than are suggested by their responses on implicit measures, because the explicit format enables them to better control the expression of potentially socially undesirable cognitions (Gawronski & Bodenhausen, 2006). Research has indeed shown that implicitly assessed gender stereotypes were more traditional than explicitly assessed gender stereotypes (Endendijk et al., 2013).
In the SDS literature, studies have often used self-report questionnaires resulting in a composite score indicating explicit SDS cognitions (e.g., Caron et al., 1993). Questionnaires differ, however, in whether they assess participants’ agreement with statements that are indicative of a traditional double standard (e.g., “It is up to the man to initiate sex.”; double standard scale [DSS] of Caron et al., 1993), or participants’ evaluation of parallel items about men’s and women’s sexual behavior (e.g., “A [girl/boy] who has sex on the first date is easy”; SDS scale of Muehlenhard & Quackenbush, 1998; premarital sexual permissiveness scale by Reiss, 1964). Implicit measures have hardly been used to assess SDS (for exceptions, see Marks, 2008; Sakaluk & Milhausen, 2012), and findings regarding differences between implicit and explicit SDS have been inconsistent. For example, Marks (2008) found evidence of double standards for person evaluation under divided attention (i.e., implicit condition), but not for person evaluation under full attention (i.e., explicit condition). In contrast, Sakaluk and Milhausen (2012) found that men held more traditional SDS than women on an explicit self-report questionnaire, but on an IAT men held egalitarian standards, whereas women demonstrated reversed double standards.
Another commonly used measurement type, which could be considered a relatively implicit measure of SDS, is an experimental task in which participants evaluate a vignette or scenario that describes the sexual behavior of a hypothetical male and/or female (e.g., Barron, 2010). Such tasks are considered less explicit methodologies for measuring SDS (Jonason & Marks, 2009; Reid et al., 2011; Weaver et al., 2013). For instance, in a between-subjects design, researchers randomly assign vignettes/scenarios to participants, who are generally unaware of the other vignettes presented to other participants. In a within-subjects design, researchers administer vignettes/scenarios to participants in a counterbalanced order. Such procedures make it less obvious than self-report questionnaires do that the participant’s SDS cognitions are being assessed. Because of the less explicit nature of the assessment, social desirability may play a less important role in experimental designs using vignettes than in studies using self-report questionnaires (Greenwald et al., 2009).
In addition, between-subjects designs are likely more implicit in nature than within-subjects designs. In between-subjects designs, participants evaluate the sexual behavior of either a man or a woman, whereas in within-subjects designs, participants evaluate the sexual behavior of both genders, which likely makes the focus on gender differences more explicit. In previous studies, scholars also suspected that social desirability and demand characteristics (whereby participants form an interpretation of the experiment’s purpose and unconsciously change their responses to fit that interpretation) play a larger role in within-subjects research on SDS than in between-subjects research (Marks & Fraley, 2005; Milhausen & Herold, 2001). Thus, evidence of SDS is less likely to be found in studies using relatively explicit measures or designs than in studies using relatively implicit measures or designs. More specifically, studies using Likert-type-scale questionnaires are less likely to yield evidence for SDS than studies assessing the differential evaluation of men and women engaging in the same sexual behavior, or studies using IATs or similar reaction-time measures. Similarly, between-subjects designs are more likely to yield evidence for SDS than within-subjects designs.
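To illustrate how a between-subjects vignette study can be expressed as a standardized effect size (the metric d reported in the abstract), the sketch below computes Cohen's d from the mean ratings of a male-target group and a female-target group, using the pooled standard deviation, together with the small-sample Hedges correction and the usual large-sample variance of d. The sign convention (positive = the male target is evaluated more positively for the same high-activity behavior, i.e., a traditional SDS) is an assumption for illustration; the function and class names are hypothetical, and this is not presented as the coding procedure actually used in this meta-analysis. For behaviors indicating low sexual activity, the sign would need to be reversed so that positive values still denote a traditional SDS.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupStats:
    """Summary statistics for one target-gender condition (hypothetical)."""
    mean: float  # mean evaluation of the target (higher = more positive)
    sd: float    # standard deviation of the evaluations
    n: int       # number of participants who rated this target


def pooled_sd(a: GroupStats, b: GroupStats) -> float:
    """Pooled within-group standard deviation of two independent groups."""
    num = (a.n - 1) * a.sd ** 2 + (b.n - 1) * b.sd ** 2
    return math.sqrt(num / (a.n + b.n - 2))


def sds_effect_size(male_target: GroupStats, female_target: GroupStats):
    """Return (d, g, var_d) for a between-subjects vignette design.

    Assumed convention: positive values mean the male target was
    evaluated more positively than the female target (traditional SDS).
    """
    d = (male_target.mean - female_target.mean) / pooled_sd(male_target, female_target)
    n_total = male_target.n + female_target.n
    # Hedges' small-sample correction factor J.
    j = 1 - 3 / (4 * n_total - 9)
    # Large-sample approximation of the sampling variance of d.
    var_d = n_total / (male_target.n * female_target.n) + d ** 2 / (2 * n_total)
    return d, j * d, var_d


if __name__ == "__main__":
    # Hypothetical ratings on a 7-point scale, 50 participants per condition.
    d, g, var_d = sds_effect_size(GroupStats(4.0, 1.0, 50), GroupStats(3.5, 1.0, 50))
    print(f"d = {d:.3f}, g = {g:.3f}, var(d) = {var_d:.5f}")
```

With these illustrative inputs, d is 0.5 and the corrected g is slightly smaller. Within-subjects designs would additionally require the correlation between the paired ratings, which this between-subjects sketch deliberately omits.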
Current Study
The current meta-analysis of SDS tested the following grand hypotheses based on evolutionary theory (Buss & Schmitt, 1993; Trivers, 1972) and biosocial theory (Wood & Eagly, 2002, 2012): (a) people expect behaviors associated with high sexual activity more from men than from women, and behaviors associated with low sexual activity more from women than from men; (b) people evaluate highly sexually active men more positively (or less negatively) than highly sexually active women, and low sexually active women more positively (or less negatively) than low sexually active men.
In addition, we tested several hypotheses regarding specific moderators. Both evolutionary theory and biosocial theory propose that SDS are behavior specific, with evolutionary theory predicting SDS to be most evident for having frequent casual sex with many partners, having an early sexual debut, and sexual infidelity, whereas biosocial theory predicts SDS to be most evident for sexual coercion in a power/age hierarchy. Biosocial theory further predicts differences in SDS between countries and changes in SDS over time, whereas evolutionary theory does not. The hypotheses based on evolutionary theory and biosocial theory are similar to those presented in a recent narrative review of the theories relevant to the study of SDS (Zaikman & Marks, 2017).
Regarding the demographic moderator age, the gender-intensification hypothesis predicts that studies with adolescent samples are more likely to yield evidence for SDS than studies with adult samples. In terms of gender differences, male control theory proposes that men would be more likely than women to hold traditional SDS-cognitions. With regard to measurement moderators of SDS, dual-process models of social cognition predict that studies using relatively explicit measures or designs (e.g., self-reports, within-subjects designs) yield less evidence for SDS than studies using relatively implicit measures or designs (e.g., IATs, vignettes, between-subjects designs). Last, studies conceptualizing SDS as stereotypes may be more likely to yield evidence for SDS than studies conceptualizing SDS as personal attitudes, or as a combination of stereotypes and personal attitudes.
The current meta-analysis extends previous narrative and meta-analytic reviews in the following ways: (a) it examines the presence and strength of SDS from 1960 until 2019; (b) it examines the effects of moderators related to sample, sexual behavior type, measurement and design, and publication year; and (c) it takes a theory-based meta-analytic approach. Such a meta-analysis is currently lacking, yet it is essential for designing better and more rigorous future SDS research, and it has the potential to enhance understanding of the etiology and functioning of SDS (Zaikman & Marks, 2017).
Method
Literature Search
We followed the PRISMA guidelines for conducting and reporting the current meta-analysis (Moher et al., 2009; see Supplemental Appendix A). The current meta-analysis was preregistered (see Supplemental Appendix B). We used three search methods to identify eligible studies up to March 1, 2019. First, we searched the electronic databases Scopus, ERIC, PsycINFO, Online Contents, and PiCarta for empirical, peer-reviewed articles using the following search terms: ((premarital AND sex* AND standard*) OR (sex* AND double AND standard*) OR (sex* AND permissive* AND gender AND attitude*)). These search terms are similar to those used in previous systematic narrative reviews on SDS (Bordini & Sperb, 2013; Crawford & Popp, 2003), and we verified that they retrieved all articles included in previous narrative reviews on SDS. Second, we searched the reference lists of relevant narrative reviews on SDS and of meta-analyses on gender differences in sexuality. Third, we searched the reference lists of the articles and dissertations that met our inclusion criteria for further eligible studies. We applied a very broad strategy in this reference search, including all articles that mentioned any of our search terms in the title. The database search and reference list search together yielded 1,364 hits. Figure 1 depicts the flowchart of the literature search.
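The Boolean database query above can be illustrated with a small matcher. This is a minimal sketch, not how the listed databases actually search: it assumes that a trailing `*` is a simple prefix wildcard applied to the lowercase words of a single title or abstract, and the helper names `matches_query` and `term_matches` are hypothetical. Real search engines differ in how they handle fields, stemming, and hyphenation.

```python
import re

# The Boolean query from the text, written as an OR of AND-clauses.
# A trailing "*" is treated as a simple prefix wildcard (an assumption).
QUERY = [
    ["premarital", "sex*", "standard*"],
    ["sex*", "double", "standard*"],
    ["sex*", "permissive*", "gender", "attitude*"],
]


def term_matches(term, words):
    """True if a query term matches any word (prefix match when the term ends in '*')."""
    if term.endswith("*"):
        stem = term[:-1]
        return any(word.startswith(stem) for word in words)
    return term in words


def matches_query(text, query=QUERY):
    """Evaluate the query against one title or abstract (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    # A record is a hit if every term of at least one clause is present.
    return any(all(term_matches(term, words) for term in clause) for clause in query)


if __name__ == "__main__":
    titles = [
        "The sexual double standard in adolescence",
        "Premarital sexual standards in the United States",
        "Gender differences in permissiveness",
    ]
    for title in titles:
        print(matches_query(title), title)
```

A prefix wildcard such as `sex*` also matches words like "sexism", which is one reason a broad query returns many records that still have to be screened by hand.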

Flow chart of literature search process.
Inclusion Criteria
To be included in the meta-analysis, studies had to assess participants’ cognitions about SDS in terms of attitudes (one’s personal evaluation of the sexual behavior of men versus women) and/or stereotypes (a socially shared set of expectations about the sexual behavior of men and women). Examples of personal attitude conceptualizations of SDS are (a) participants’ differential evaluation of either a male or a female target in a friends-with-(sexual)benefits scenario; or (b) participants’ differential agreement with statements about men’s and women’s sexual behavior (“A [girl/guy] who has sex on the first date is ‘easy’”). Examples of stereotype conceptualizations of SDS are (a) participants’ differential expectations about the percentage of males and the percentage of females who have had sex by the age of 18; (b) participants’ level of agreement with statements such as “A girl is usually looking for a man to marry, but a boy is usually looking for sex” or “Women who are sexually experienced with multiple partners are usually not respected as much as men who are sexually experienced”; and (c) participants’ responses to questions such as “Who generally has more sexual freedom?” Studies examining differences in the actual sexual behavior of men and women were, thus, not included. Importantly, half of the studies assessing stereotype aspects of SDS conceptualized stereotypes as personal beliefs (e.g., personal expectations about the percentage of males and females engaging in a certain sexual behavior). The other half conceptualized SDS stereotypes as collective beliefs (e.g., “Who generally has more sexual freedom?”). Because there were too few studies examining stereotypes to treat these two stereotype aspects as separate moderator categories, they were grouped together.
Following recommendations by Crawford and Popp (2003), we excluded studies from this meta-analysis when they employed a design that confounded participant gender and target gender (e.g., participants evaluating only the sexual behavior of same- or opposite-gender targets). Such a design makes it impossible to draw conclusions about the existence of SDS. We did not set any restrictions with regard to the language of the paper, as long as an English abstract was available for screening purposes. During the full-text screening phase, papers written in languages other than English (eight Spanish, two Portuguese) were translated by native speakers of Spanish or Portuguese. Of the included publications, two were published in Spanish and one in Portuguese.
When a study lacked sufficient information to be included (k = 11), we emailed the corresponding authors to request additional information (e.g., sample information, information about measures, effect size). However, none of these studies could be included, because the contacted authors either no longer had access to the data or never responded.
We determined the level of agreement between the first author and a research assistant on the inclusion of studies for a random subset of 100 studies, oversampling included studies. Agreement between the coders was satisfactory (agreement 94%, kappa = .84). The coders discussed any disagreements until they reached consensus. After the reliability assessment, the first author screened the remaining articles. The studies included in the meta-analyses are presented in Tables 1 and 2.
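The two agreement statistics reported above (percentage agreement and Cohen's kappa) can be computed from two coders' screening decisions as in the following sketch. The function names and toy decisions are hypothetical, and the text does not state which software was used for this step.

```python
def percent_agreement(coder_a, coder_b):
    """Share of items on which both coders made the same decision."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("coders must rate the same, non-empty set of items")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    p_observed = percent_agreement(coder_a, coder_b)
    coder_a, coder_b = list(coder_a), list(coder_b)
    n = len(coder_a)
    categories = set(coder_a) | set(coder_b)
    # Chance agreement: product of the two coders' marginal proportions, summed over categories.
    p_expected = sum(
        (coder_a.count(c) / n) * (coder_b.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        return float("nan")  # undefined when both coders use one and the same category
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical include (1) / exclude (0) decisions for eight records.
    a = [1, 1, 0, 0, 1, 0, 0, 0]
    b = [1, 0, 0, 0, 1, 0, 0, 1]
    print(percent_agreement(a, b), round(cohens_kappa(a, b), 3))
```

Kappa is lower than raw agreement whenever one category dominates, because much of the raw agreement would then be expected by chance alone.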
Studies Included in Meta-Analysis on Differential Evaluation/Expectation of Men’s and Women’s Sexual Behavior.
Note. Numbers under questionnaire refer to double standard scale of Caron (1), sexual double standard scale of Muehlenhard (2), personal acceptance of double standard scale by Milhausen (3), scale for the assessment of sexual double standards in youth by Emmerink (4), premarital sexual standards scale by Reiss (5). H = high educational level; M = mixed; O = other; MO = mixed/other; A = Asia(n); F = Finnish; R = Russian; Est = Estonian; Mex = Mexican; US = United States; W = White/Caucasian; B = Black; LH = Latino/a/Hispanic; H = Hispanic; EA = East-Asian; Q = questionnaire; V = vignettes/scenarios; S = stereotypes; AT = attitudes; PSLA = premarital sex in love/with affection; PSE = premarital sex engaged; PSU = premarital sex unspecified; CS = casual sex; IA = infidelity/affair; CP = coercion/sex in power hierarchy; ACT = sexual activity; ESD = early sexual debut; J = journal publication; D = dissertation; C = correlational; L = longitudinal; WS = within-subjects design; BS = between-subjects design.
Studies Included in Meta-Analysis on SDS-Cognitions Assessed With Likert-type-Scale Questionnaires.
Note. Numbers under questionnaire refer to double standard scale of Caron (1), sexual double standard scale of Muehlenhard (2), personal acceptance of double standard scale by Milhausen (3), scale for the assessment of sexual double standards in youth by Emmerink (4), premarital sexual standards scale by Reiss (5). H = high educational level; M = mixed; O = other; MO = mixed/other; A = Asia(n); UK = United Kingdom; US = United States; W = White/Caucasian; B = Black; LH = Latino/a/Hispanic; H = Hispanic; EA = East-Asian; Q = questionnaire; S = stereotypes; AT = attitudes; PSLA = premarital sex in love/with affection; PSU = premarital sex unspecified; CS = casual sex; ACT = sexual activity; J = journal publication; D = dissertation; C = correlational; L = longitudinal; WS = within-subjects design.
Data Extraction
We coded the following moderators (see Supplemental Appendix C for further explanations of the categories): regarding measurement and design moderators, we coded SDS conceptualization (stereotype, attitude, combination), the measurement type (questionnaire [i.e., explicit], vignettes/scenarios [i.e., relatively implicit], other), questionnaire type (DSS of Caron et al., 1993; SDS scale of Muehlenhard & Quackenbush, 1998; premarital sexual permissiveness scale by Reiss, 1964; other), design of study (cross-sectional, within-subjects design, between-subjects design), and sexual behavior type assessed (being a perpetrator of sex in power/age hierarchy, being a victim of sex in power/age hierarchy, casual premarital sex, infidelity/affair, level of sexual activity, premarital sex when in love or with affection, premarital sex when engaged, premarital sex unspecified, early sexual debut, other/mixed). Questionnaires, study designs, or sexual behavior types other than the ones mentioned above were too uncommon to form a separate category for moderator analyses.
Regarding sample characteristics, we coded participants’ gender, the level of gender equality in the country in which the study was conducted, and participants’ mean age at the time of the assessment (categorical; adolescence 12–18 years, college aged/emerging adults 19–25 years, adults > 25 years). Regarding the level of gender equality, we averaged the country’s most recent score on two global gender equality measures: the gender inequality index (reverse-coded; United Nations Development Program, 2017) and the global gender gap score (World Economic Forum, 2017). Because these indices have existed only for about 10 years, we could not compute the country’s level of gender equality for the year in which each study was conducted. We coded age categorically rather than continuously, because more studies provided data on this categorical measure of age than on the continuous measure. We also coded sample size in order to weight the effect sizes. As the majority of the studies examined White/Caucasian college samples with a primarily heterosexual orientation (see Tables 1 and 2), ethnicity, educational level, and sexual orientation could not be taken into account as moderators. Last, we coded year of publication (continuous).
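A minimal sketch of the country-level gender equality score, assuming that both indices lie on a 0 to 1 scale and that reverse-coding the gender inequality index (GII, where higher values mean more inequality) means taking 1 − GII. After that step, higher values indicate more equality, as they do for the global gender gap score (where 1 means parity). The exact reverse-coding rule is not stated in the text, and the function name is hypothetical.

```python
def gender_equality_score(gii, gggi):
    """Average of the reverse-coded GII and the global gender gap score (GGGI).

    Assumption: both indices lie in [0, 1]; GII is reverse-coded as 1 - GII so that,
    like the GGGI, higher values mean more gender equality.
    """
    for name, value in (("gii", gii), ("gggi", gggi)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return ((1.0 - gii) + gggi) / 2.0


if __name__ == "__main__":
    # Hypothetical index values for two countries.
    print(round(gender_equality_score(0.05, 0.82), 3))
    print(round(gender_equality_score(0.40, 0.65), 3))
```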
As a check for coder reliability, the first author and a research assistant each coded the same set of 20 publications. Agreement between the coders was satisfactory for both the moderator and outcome variables (kappas for categorical variables between .71 and 1.00, average .91; agreement between 80% and 100%, average 93%; intraclass correlations for continuous variables between .97 and 1.00, average .99). The coders resolved any disagreements by discussion. After the reliability assessment, the first author coded the remaining articles but consulted one or more of the other authors in cases of doubt.
Meta-Analytic Procedures
We conducted two separate meta-analyses, one meta-analysis for studies that presented SDS as two different scores for the evaluation/expectation of the sexual behavior of men versus women, and a second meta-analysis for studies that examined SDS with Likert-type-scale questionnaires. The two types of studies could not be combined in one meta-analysis, because the point estimates derived from these studies (i.e., standardized mean difference in the first meta-analysis, versus a mean of one group at one timepoint in the second meta-analysis) cannot be analyzed together (Borenstein et al., 2009).
For the first meta-analysis, we calculated the standardized mean difference (d) for each study. We used data in the following forms, in hierarchical order: (a) means and standard deviations for participants’ expectations/evaluations of the sexual behavior of men and women (or the mean difference together with the p or t value); (b) correlations between target gender and participants’ evaluation/expectation of a target’s sexual behavior; (c) p values. We gave a positive sign to effect sizes indicating a difference in evaluation of the sexual behavior of men and women that was in line with traditional SDS (e.g., a more positive or less negative evaluation of males compared with females for high sexual activity), and a negative sign to differences that were not in line with traditional SDS (e.g., a more positive or less negative evaluation of females compared with males for high sexual activity). According to Cohen (1977), d = 0.20 is a small effect, d = 0.50 a medium-sized effect, and d = 0.80 a large effect.
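The first two conversions in the hierarchy above can be sketched with the standard formulas for independent groups (Borenstein et al., 2009): d from means and standard deviations via the pooled SD, and d from a correlation r via d = 2r/√(1 − r²). This is an illustrative sketch only. The authors computed effect sizes in CMA, within-subjects designs additionally require the correlation between the repeated measures (not handled here), and the `high_activity` flag is a hypothetical simplification of the sign rule described in the text.

```python
import math


def d_from_means(mean_men, mean_women, sd_men, sd_women, n_men, n_women):
    """d for independent groups: (male-target mean - female-target mean) / pooled SD."""
    pooled_var = ((n_men - 1) * sd_men**2 + (n_women - 1) * sd_women**2) / (
        n_men + n_women - 2
    )
    return (mean_men - mean_women) / math.sqrt(pooled_var)


def d_from_r(r):
    """Convert r (target gender coded so that r > 0 favors male targets) to d."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 2.0 * r / math.sqrt(1.0 - r**2)


def sds_signed(d_men_minus_women, high_activity):
    """Give d a positive sign when it is in line with traditional SDS.

    Hypothetical simplification: for high-activity behaviors a male advantage is
    traditional; for low-activity (passive) behaviors a female advantage is.
    """
    return d_men_minus_women if high_activity else -d_men_minus_women


if __name__ == "__main__":
    # Hypothetical vignette study: male vs. female target rated after the same behavior.
    d = d_from_means(4.1, 3.6, 1.2, 1.3, 60, 58)
    print(round(sds_signed(d, high_activity=True), 3), round(d_from_r(0.2), 3))
```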
For the second meta-analysis, we used the rescaled group means and standard deviations of each study to test whether the combined mean differed from zero in either the positive or the negative direction. Negative scores represented reversed SDS (e.g., expecting high sexual activity more from women than from men), scores around zero represented egalitarian sexual standards (e.g., no difference in the expectation or evaluation of the sexual activity of men and women), and positive scores represented traditional SDS (e.g., expecting high sexual activity more from men than from women). Because the included studies used many different scales, we rescaled each group mean to a common scale ranging from −1 to +1 using min-max normalization, a linear form of normalization (Han & Kamber, 2006; for applications of min-max normalization, see Bandura, 2008; van Zanten et al., 2014). The following formulas were used:
$$x_{\text{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
and
$$x' = 2\,x_{\text{norm}} - 1,$$
where $x$ is a group mean on the original scale, $x_{\min}$ and $x_{\max}$ are the lowest and highest possible scores on that scale, and $x'$ is the rescaled group mean on the −1 to +1 scale.
When a scale is spread unequally over positive and negative values (e.g., the SDS scale [SDSS], which ranges from −30 to 48), the above solution may inadvertently shift the egalitarian zero point and reverse the sign of small positive group means (e.g., a group mean of 0 becomes −0.23). To preserve the original spread of positive and negative values, we adapted the function for rescaling SDSS scores as follows:
$$x' = \frac{x}{\max\left(\lvert x_{\min} \rvert, \lvert x_{\max} \rvert\right)} = \frac{x}{48}.$$
Using this function, a score of −30 becomes −0.625, a score of 0 remains 0, and a score of +48 becomes +1.
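The two rescaling rules can be sketched as follows. The general rule is read here as a linear map of a scale's lowest and highest possible scores onto −1 and +1, which reproduces the text's example that an SDSS mean of 0 becomes about −0.23. The adapted rule divides by the larger absolute endpoint so that 0 stays 0. Rescaling standard deviations by the same linear factor is our assumption, and all function names are hypothetical.

```python
def rescale_min_max(x, scale_min, scale_max):
    """Linearly map a score from [scale_min, scale_max] onto [-1, +1]."""
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    x_norm = (x - scale_min) / (scale_max - scale_min)  # first onto [0, 1]
    return 2.0 * x_norm - 1.0                           # then onto [-1, +1]


def rescale_keep_zero(x, scale_min, scale_max):
    """Adapted rule for scales spread unequally around 0 (e.g., SDSS, -30 to 48): 0 stays 0."""
    return x / max(abs(scale_min), abs(scale_max))


def rescale_sd_min_max(sd, scale_min, scale_max):
    """SD under the general rule: a linear map multiplies SDs by its slope (assumption)."""
    return sd * 2.0 / (scale_max - scale_min)


if __name__ == "__main__":
    for score in (-30, 0, 48):
        print(score, round(rescale_min_max(score, -30, 48), 3), rescale_keep_zero(score, -30, 48))
```

Under the general rule an SDSS mean of 0 becomes −0.231, and any raw mean between 0 and 9 turns negative; under the adapted rule the sign of every raw mean is preserved.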
Statistical Analyses
As a considerable number of studies reported on multiple samples or multiple SDS-aspects (see Tables 1 and 2), we applied a multilevel random-effects model (Cheung, 2014; Van den Noortgate et al., 2013). See Figure 2 for the multilevel hierarchical structure employed in these meta-analyses. Such a model accounts for the dependency between effect sizes that come from the same study, because it partitions the sources of variance: variance between studies, variance between samples from the same study, variance between effect sizes from the same study, and sampling variance (Cheung, 2014; Van den Noortgate et al., 2013). Furthermore, multilevel models can include all available effect sizes within studies, which maximizes statistical power. In addition, these models can test moderators of differences in outcomes at both the within-study level and the between-study or between-sample level. We performed the analyses with the metafor package (Viechtbauer, 2015) for the R environment (Version 3.5.2; R Development Core Team, 2013), using the step-wise procedure described by Assink and Wibbelink (2016). We used restricted maximum likelihood estimation to estimate the parameters. Significance tests and moderator analyses were performed with random-effects models, which are more conservative than fixed-effect models when the true effect sizes may differ across studies (Borenstein et al., 2009). Individual effect sizes within each study were computed using the Comprehensive Meta-Analysis program (CMA, v3; Borenstein et al., 2005).

Multilevel hierarchical structure employed in the meta-analyses.
For the moderator analyses, we centered each continuous variable around its mean and converted each categorical variable with k categories into k − 1 dummy variables through binary coding. We tested single moderators first, followed by a multiple-moderator model including all significant single moderators. We checked for outlying effect sizes and sample sizes separately for the two meta-analyses. Values with z-scores below −3.29 or above 3.29 were considered outliers (Tabachnick & Fidell, 2012).
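The moderator coding and the outlier rule above can be sketched as follows. This is a minimal illustration, assuming that z-scores are computed against the mean and sample standard deviation of all values within one meta-analysis; the function names and toy data are hypothetical.

```python
import statistics


def center(values):
    """Mean-center a continuous moderator."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def dummy_code(labels, reference):
    """Turn a k-category moderator into k - 1 binary columns (reference category omitted)."""
    levels = sorted(set(labels) - {reference})
    return {level: [1 if lab == level else 0 for lab in labels] for level in levels}


def flag_outliers(values, cutoff=3.29):
    """Flag values whose z-score is below -cutoff or above +cutoff."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD; needs at least two values
    if sd == 0:
        return [False] * len(values)
    return [abs((v - mean) / sd) > cutoff for v in values]


if __name__ == "__main__":
    effect_sizes = [0.0] * 19 + [1.0]  # hypothetical: one clearly deviating effect size
    print(flag_outliers(effect_sizes).count(True))
    print(dummy_code(["attitude", "stereotype", "combination", "attitude"], "attitude"))
```

With 20 values, the single deviating effect size has a z-score of about 4.25, which exceeds the 3.29 cutoff.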
In case of nonsignificant overall effects, we used the two one-sided tests (TOST) procedure for equivalence testing (Lakens, 2017), with equivalence bounds of d = [−0.10, 0.10] as an indication of trivial gender differences (Hyde, 2005). In the TOST procedure, a significant p value indicates statistical equivalence. We used likelihood ratio tests to test for significant between-study and within-study heterogeneity (σ²) (Raudenbush & Bryk, 2002). We also computed 95% credibility intervals (CVs) for the overall effect as an indication of the range of true population effects (Borenstein et al., 2009; see Supplemental Appendix D, Tables 4 and 5, for individual CVs in moderator analyses). CVs also provide an additional metric of heterogeneity in effect sizes. In addition, I² statistics indicate the proportion of the variation in observed effects that is due to variation in true effects (Borenstein et al., 2017). If there was evidence of heterogeneity in effect sizes (at one or more of the three levels), we conducted moderator analyses, but only when at least two of the moderator categories contained at least four samples each (Bakermans-Kranenburg et al., 2003). For moderator models, an omnibus test of the fixed model parameters tests the null hypothesis that the group mean effect sizes are equal. To control Type I error rates, we applied the Knapp and Hartung (2003) adjustment. To further control for Type I errors associated with multiple moderator tests, we included the significant moderators in one multiple-moderator model. Multiple-moderator models also test the relative importance of each significant moderator and take into account possible correlations between moderators. As a proxy of R² change, we computed the proportional reduction in the variance components between the model without moderators and the multiple-moderator model.
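The heterogeneity summaries above can be sketched for a three-level model under stated assumptions. I² per level is computed here as that level's variance component divided by the sum of all components plus a "typical" sampling variance, (k − 1)Σw / ((Σw)² − Σw²) with w = 1/v. This formula is commonly used for three-level I², but the text does not give it. The credibility interval is taken as the estimate ± 1.96 √(Σσ² + SE²). With the overall results reported for the first meta-analysis (d = 0.25, σ² = 0.046 and 0.068, and an SE of about 0.038 backed out of the 95% CI), this form gives approximately [−0.42, 0.92], which matches the reported CV. All function names are hypothetical.

```python
import math


def typical_sampling_variance(sampling_variances):
    """'Typical' sampling variance: (k - 1) * sum(w) / (sum(w)^2 - sum(w^2)), w = 1 / v."""
    if len(sampling_variances) < 2:
        raise ValueError("need at least two sampling variances")
    w = [1.0 / v for v in sampling_variances]
    k = len(w)
    sum_w = sum(w)
    sum_w2 = sum(x * x for x in w)
    return (k - 1) * sum_w / (sum_w**2 - sum_w2)


def i2_by_level(sigma2, sampling_variances):
    """Percentage of total variance at each level, plus the share due to sampling error."""
    v_typical = typical_sampling_variance(sampling_variances)
    total = sum(sigma2.values()) + v_typical
    shares = {level: 100.0 * s2 / total for level, s2 in sigma2.items()}
    shares["sampling"] = 100.0 * v_typical / total
    return shares


def credibility_interval(estimate, se, sigma2, z=1.959964):
    """95% interval for true effects: estimate +/- z * sqrt(sum of variance components + SE^2)."""
    half_width = z * math.sqrt(sum(sigma2.values()) + se**2)
    return estimate - half_width, estimate + half_width


def variance_reduction(sigma2_without, sigma2_with):
    """Proportional reduction in a variance component after adding moderators (R^2 proxy)."""
    return (sigma2_without - sigma2_with) / sigma2_without


if __name__ == "__main__":
    # Overall model of the first meta-analysis; the SE (about 0.038) is backed out of the 95% CI.
    lo, hi = credibility_interval(0.25, 0.0383, {"between_studies": 0.046, "within_samples": 0.068})
    print(round(lo, 2), round(hi, 2))
```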
Publication Bias
Smaller studies with nonsignificant results or with effect sizes in the nonhypothesized direction are less likely to be published, whereas large studies are more likely to be published even when they report small, nonsignificant, or nonhypothesized effects, because large studies are generally deemed more trustworthy. This problem is known as publication bias (Rosenthal, 1995). Publication bias is problematic for meta-analyses, because it can lead to an overestimation of the true effect size (Borenstein et al., 2009). One way to detect publication bias is to test for asymmetry in funnel plots. A funnel plot plots each study’s effect size against its standard error (usually expressed as 1/SE, or precision). We examined funnel plot asymmetry with Egger’s regression test, which regresses the standard normal deviate (the effect size divided by its standard error) on its precision; an intercept that differs significantly from zero indicates asymmetry (Egger et al., 1997). When this test was statistically significant, we applied the trim and fill method, which estimates the number of studies that have no symmetric counterpart on the other side of the funnel (Duval & Tweedie, 2000a, 2000b).
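Egger's test as described above can be sketched as an ordinary least-squares regression of each effect size divided by its SE on its precision (1/SE), with the intercept as the asymmetry estimate. This is the classic single-level version. The text reports t(275) for 277 effect sizes from a multilevel analysis in metafor, which may be set up differently, so the sketch illustrates the idea only. The function name and toy data are hypothetical, and p values (which need a t distribution) are omitted to keep the sketch dependency-free.

```python
import math


def egger_test(effects, ses):
    """Classic Egger test: OLS of effect/SE on 1/SE; the intercept measures asymmetry."""
    if len(effects) != len(ses) or len(effects) < 3:
        raise ValueError("need at least three effect sizes with matching SEs")
    y = [e / s for e, s in zip(effects, ses)]  # standard normal deviates
    x = [1.0 / s for s in ses]                  # precision
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("precisions must vary across studies")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = n - 2
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / df * (1.0 / n + x_bar**2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return {"intercept": intercept, "slope": slope, "se": se_intercept, "t": t, "df": df}


if __name__ == "__main__":
    ses = [0.05, 0.10, 0.15, 0.20, 0.30]
    effects = [0.22, 0.30, 0.28, 0.41, 0.45]  # hypothetical: smaller studies show larger effects
    print(egger_test(effects, ses))
```

In a symmetric funnel, the standard normal deviate is roughly proportional to precision, so the fitted line passes close to the origin; small studies with inflated effects push the intercept away from zero.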
Results
The meta-analysis conducted on studies assessing SDS from differential evaluation or expectation of men’s and women’s sexual behavior included 52 studies, reporting on 116 independent samples, and 277 effect sizes. The studies reported on a total of N = 71,442 participants. The meta-analysis conducted on studies assessing SDS with Likert-type scale questionnaires included 47 studies, reporting on 85 independent samples, and 129 effect sizes. The studies reported on a total of N = 51,901 participants.
Overall Effects
The combined effect size for the difference in evaluation or expectation of men’s and women’s sexual behavior was significant, but small (d = 0.25, 95% confidence interval [CI] [0.18, 0.33], 95% CV [−0.42, 0.92], p < .001, k = 52, independent samples = 116, effect sizes = 277). The effect size was positive, indicating that behaviors associated with high sexual activity were expected more and evaluated more positively (or less negatively) in men compared with women, and behaviors associated with sexual passivity were expected more and evaluated more positively (or less negatively) in women compared with men. We found significant variation between studies, Level 3: σ² = 0.046, χ²(1) = 11.33, p < .001; I² = 38.52, as well as significant variation between effect sizes within studies, Level 1: σ² = 0.068, χ²(1) = 1,334.88, p < .0001; I² = 57.16, but no variation between independent samples from the same studies, Level 2: σ² < 0.001, χ²(1) = 0.00, p = 1.00; I² < 0.001.
The combined mean for SDS assessed with Likert-type-scale questionnaires was not significantly different from 0 (M = −0.09, SE = 0.05, 95% CI [−0.19, 0.01], 95% CV [−0.84, 0.67], p = .084, k = 47, independent samples = 85, effect sizes = 129). The effect was also not statistically equivalent (Z = 0.20, p = .42, 90% CI [−0.18, −0.01]). This indicated that on Likert-type-scale questionnaires participants demonstrated no evidence of SDS. We found significant variation between studies, Level 3: σ² = 0.081, χ²(1) = 29.99, p < .0001; I² = 56.47, as well as significant variation between effect sizes within studies, Level 1: σ² = 0.063, χ²(1) = 11,134.09, p < .0001; I² = 43.50, but no variation between independent samples from the same studies, Level 2: σ² < 0.001, χ²(1) = 0.00, p = 1.00; I² < 0.001.
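The equivalence test reported above can be reproduced with a z-based TOST, assuming the test uses the normal approximation with the reported combined mean and SE. Testing M = −0.09 (SE = 0.05) against the lower bound of −0.10 gives Z = (−0.09 + 0.10)/0.05 = 0.20 and a one-sided p of about .42, which matches the reported values. Because equivalence requires rejecting both one-sided nulls, the TOST p value is the larger of the two. The function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tost_z(estimate, se, lower, upper):
    """Two one-sided z-tests of an estimate against equivalence bounds [lower, upper]."""
    z_lower = (estimate - lower) / se  # H0: true effect <= lower bound
    z_upper = (estimate - upper) / se  # H0: true effect >= upper bound
    p_lower = 1.0 - normal_cdf(z_lower)
    p_upper = normal_cdf(z_upper)
    # Equivalence requires rejecting both nulls, so the TOST p value is the larger one.
    return z_lower, z_upper, max(p_lower, p_upper)


if __name__ == "__main__":
    z_lo, z_hi, p = tost_z(-0.09, 0.05, -0.10, 0.10)
    print(round(z_lo, 2), round(p, 2))  # prints: 0.2 0.42
```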
For both meta-analyses, Egger’s regression test indicated no evidence of publication bias (meta-analysis on difference scores: estimate = 0.743, SE = 0.763, 95% CI [−0.76, 2.25], t(275) = 0.97, p = .33; meta-analysis on Likert-type-scale questionnaires: estimate = 9.47, SE = 5.34, 95% CI [−1.09, 20.03], t(128) = 1.78, p = .078).
Single Moderator Analyses
Tables 3 and 4 display the results of the single moderator analyses for studies examining the difference in evaluation or expectation of men’s and women’s sexual behavior and for studies using Likert-type-scale questionnaires to assess SDS, respectively.
Meta-Analytic Results of Moderators of the Differential Evaluation and Expectation of the Sexual Behavior of Men and Women.
Note. Moderators or moderator categories other than the ones presented above were not examined, because fewer than two of the moderator categories consisted of at least four studies each. #k = number of studies; #IS = number of independent samples; #ES = number of effect sizes; d = standardized mean difference; CI = 95% confidence interval; SDS = sexual double standard.
For continuous predictors, the mean effect size d indicates the mean effect size of a participant with an average value on the corresponding predictor. b Variance was examined at the following levels: 1 = variance within samples, that is, between effect sizes from the same sample, 2 = variance within studies, that is, between samples from the same study, 3 = variance between studies. As there was zero variance at the second level in the overall model, this level was not presented in this table. c Reference category is coercion perpetrator, results of pairwise comparisons between other sexual behavior types can be found in Supplemental Appendix D (Table 4).
p < .10. *p < .05. **p < .01.
Meta-Analytic Results of Moderators of Endorsement of Sexual Double Standards Assessed With Likert-Type-Scale Questionnaires.
Note. Moderators or moderator categories other than the ones presented above were not examined, because fewer than two of the moderator categories consisted of at least four studies each. #k = number of studies; #IS = number of independent samples; #ES = number of effect sizes; d = standardized mean difference; CI = 95% confidence interval; SDS = sexual double standard(s); DSS = double standard scale; SDSS = sexual double standard scale.
For continuous predictors, the mean effect size d indicates the mean effect size of a participant with an average value on the corresponding predictor. b Variance was examined at the following levels: 1 = variance within samples, that is, between effect sizes from the same sample, 2 = variance within studies, that is, between samples from the same study, 3 = variance between studies. As there was zero variance at the second level in the overall model, this level was not presented in this table. c Attitudes and other cognitions do not differ significantly from each other: β = −0.067, SE = 0.09, t = −0.729, 95% CI [−0.249; 0.115], p = .468. d SDSS questionnaire and other questionnaires do not differ significantly from each other: β = −0.053, SE = 0.09, t = −0.568, 95% CI [−0.239; 0.133], p = .571.
p < .05. **p < .01.
For the meta-analysis on studies examining the difference in evaluation or expectation of men’s and women’s sexual behavior, SDS conceptualization was a significant moderator. SDS were more traditional (i.e., significantly higher effect size with positive sign) in studies that assessed stereotypes than in studies that assessed attitudes. Sexual behavior type was also a significant moderator. Pairwise comparisons showed that the highest effect size (i.e., most traditional SDS) was found for being a victim of sexual coercion, followed by casual sex and having an early sexual debut. The effect sizes for these behaviors were small to moderate. We found significantly lower combined effect sizes for sexual infidelity, level of sexual activity, other/mixed sexual behavior types, premarital sex (when in love or with affection, when engaged, or relationship status unspecified), and being a perpetrator of sexual coercion. The effect sizes for these behaviors were negligible to small. The following moderators were not significant: study design, measurement type, participant gender, participant age, level of gender equality in the country where the study was conducted, and publication year.
For the meta-analysis on studies using Likert-type-scale questionnaires to assess SDS, SDS conceptualization was a significant moderator. SDS were more traditional (i.e., significantly higher combined mean with positive sign) in studies that assessed stereotypes than in studies that assessed attitudes or a combination of cognitions. For studies conceptualizing the SDS as a combination of stereotypes and attitudes, a significant reversed double standard was found as indicated by a negative combined mean. Questionnaire type was also a significant moderator. Pairwise comparisons showed that studies using the DSS of Caron yielded a significant reversed double standard (i.e., negative combined mean). However, we found significantly more traditional SDS in studies using the SDS scale of Muehlenhard or in studies using other questionnaires. Level of gender equality in the country in which the study was conducted was a significant moderator, indicating that higher levels of gender equality were associated with less traditional SDS in the direction of a reversed double standard. Publication year was a significant moderator, indicating that a more recent publication year was associated with less traditional SDS in the direction of a reversed double standard. The following moderators were not significant: participant gender and participant age.
Multiple-Moderator Analyses Including All Significant Moderators
Tables 5 and 6 display the results of the multiple-moderator models for studies examining the evaluation and expectation of the sexual behavior of men and women and for studies examining SDS with Likert-type-scale questionnaires, respectively. SDS conceptualization was still a significant moderator in both meta-analyses. Sexual behavior type remained a significant moderator in the meta-analysis on the evaluation and expectation of the sexual behavior of men and women. Questionnaire type and gender equality in the country in which the study was conducted were still significant moderators in the meta-analysis on studies using Likert-type-scale questionnaires, but publication year was no longer significant. Effects were in the same direction as in the single moderator analyses. In both multiple-moderator models, significant variance remained to be explained at both the between-study and within-study level. Yet, in the model for studies examining differential evaluation of men and women, adding the moderators led to a 32% reduction in the variance within samples. In the model for studies using Likert-type-scale questionnaires, adding the moderators led to a 10% reduction in the variance within samples and a 49% reduction in the variance between studies.
Results for the Multiple-Moderator Model for Studies Examining Differential Evaluation and Expectation of the Sexual Behavior of Men and Women (#k = 52, #IS = 116, #ES = 277).
Note. #k = number of studies; #IS = number of independent samples; #ES = number of effect sizes; CI = confidence interval; SDS = sexual double standard.
Variance was examined at the following levels: 1 = variance within samples, that is, between effect sizes from the same sample, 2 = variance within studies, that is, between samples from the same study, 3 = variance between studies. As there was zero variance at the second level in the overall model, this level was not presented in this table.
p < .10. *p < .05. **p < .01.
Results for the Multiple-Moderator Model for Studies Examining Endorsement of Sexual Double Standards With Likert-Type-Scale Questionnaires (#k = 47, #IS = 85, #ES = 129).
Note. #k = number of studies; #IS = number of independent samples; #ES = number of effect sizes; CI = confidence interval; SDS = sexual double standard; DSS = double standard scale.
“Other questionnaire” was a redundant predictor (r = −1.0 with “social cognition other”) and therefore dropped from the model. b Variance was examined at the following levels: 1 = variance within samples, that is, between effect sizes from the same sample, 2 = variance within studies, that is, between samples from the same study, 3 = variance between studies. As there was zero variance at the second level in the overall model, this level was not presented in this table.
*p < .05. **p < .01.
Outliers
In the first meta-analysis, we detected five outlying effect sizes (four positive, one negative; Conley et al., 2013, Study 1b and Study 2b; Marks, 2008, implicit condition, 17 partners; Weaver et al., 2013) and three studies with outlying sample sizes (Do & Fu, 2010; Marks & Fraley, 2005; Soller & Haynie, 2017). In the second meta-analysis, we did not detect outlying effect sizes, but one study had an outlying sample size (Kettrey, 2016; female sample). We conducted analyses with and without the studies with outlying effect sizes. Outlying sample sizes were winsorized: each was replaced by the highest nonoutlying sample size plus the difference between the highest and second-highest nonoutlying sample sizes. Results did not differ between analyses including or excluding outlying effect sizes or winsorized sample sizes.
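The winsorizing rule for outlying sample sizes can be written as a short function. This is a minimal sketch under stated assumptions: the outlier flags are taken as given (the detection criterion is not shown in this section), outliers are assumed to lie above the nonoutlying values, and every flagged sample size receives the same replacement value. The function name and the example numbers are hypothetical.

```python
def winsorize_sample_sizes(sizes: list[int], is_outlier: list[bool]) -> list[int]:
    """Replace flagged (high) outlying sample sizes with
    highest nonoutlying N + (highest - second-highest nonoutlying N)."""
    if len(sizes) != len(is_outlier):
        raise ValueError("sizes and is_outlier must have the same length")
    # Nonoutlying sample sizes, largest first.
    kept = sorted((n for n, out in zip(sizes, is_outlier) if not out), reverse=True)
    if len(kept) < 2:
        raise ValueError("need at least two nonoutlying sample sizes")
    highest, second_highest = kept[0], kept[1]
    replacement = highest + (highest - second_highest)
    # Assumption: all flagged values receive the same replacement value.
    return [replacement if out else n for n, out in zip(sizes, is_outlier)]


sizes = [120, 150, 200, 260, 5000]          # hypothetical sample sizes
flags = [False, False, False, False, True]  # hypothetical outlier flags
print(winsorize_sample_sizes(sizes, flags))  # -> [120, 150, 200, 260, 320]
```

Pulling the outlier in to just above the largest regular study keeps it the largest sample (preserving its rank) while preventing it from dominating sample-size-based weights.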
Discussion
In line with evolutionary theory (Buss & Schmitt, 1993; Trivers, 1972) and biosocial theory (Wood & Eagly, 2002, 2012), this meta-analysis demonstrated clear evidence for traditional SDS in studies assessing differences in people’s evaluation, or expectation, of men’s and women’s sexual behavior, although the effect was small. People expected behaviors associated with high sexual activity more from men than from women, and behaviors associated with low sexual activity more from women than from men. Similarly, people evaluated highly sexually active men more positively (or less negatively) than highly sexually active women, and women with low sexual activity more positively (or less negatively) than men with low sexual activity. In contrast, the overall set of studies using Likert-type-scale questionnaires to assess SDS did not yield evidence of SDS.
Several moderators were significant in one or both sets of studies. First, the existence of traditional SDS was behavior specific. Second, stereotypes about SDS were more traditional than attitudes about SDS. Third, studies using the “sexual double standard scale” (SDSS; Muehlenhard & Quackenbush, 1998) reported more traditional SDS than studies using the “double standard scale” (DSS; Caron et al., 1993), which demonstrated reversed SDS. Fourth, higher levels of gender equality in a country were associated with less traditional SDS. Participant gender and age, publication year, and study design were not significant moderators.
Behavioral Specificity of SDS
Regarding sexual behavior types, we found the strongest evidence of SDS for being a victim of sexual coercion, followed by casual sex and having an early sexual debut. SDS were less evident for sexual infidelity, level of sexual activity, other/mixed sexual behavior types, premarital sex, and being a perpetrator of sexual coercion. The findings for coercion and sexual encounters within a power or age hierarchy were partly in line with the prediction from biosocial theory that SDS would be most prevalent in sexual encounters involving a power/status difference between men and women (Wood & Eagly, 2002, 2012). However, we found double standards only for victims of sexual coercion, not for perpetrators. That we did not find differences in the evaluation of male and female perpetrators might be because both violate gender role expectations: men violate chivalry norms, and women violate communal expectations. Thus, male and female perpetrators might have been evaluated equally negatively for their gender-role-inconsistent behavior. Moreover, the double standards for victims of sexual coercion found in person perception studies indicate that sexual behavior within the context of a power hierarchy is evaluated more negatively (or less positively) for female victims (e.g., more condemned, more perceived damage to reputation) than for male victims (e.g., a “positive” experience that peers will evaluate as cool; Zaikman & Marks, 2017). Thus, girls might be blamed for being victims of sexual coercion (Weis, 2009), whereas boys’ experiences of sexual coercion might be trivialized (Weis, 2010). This is inconsistent with the idea that male victims of sexual coercion or rape might be perceived as powerless and unwilling to have sex, which would violate men’s agentic (hetero)sexual gender role (Weis, 2010).
Because only a few studies examined the evaluation of both perpetrator and victim, replication of the existence and direction of SDS for coercion victims is necessary in future studies.
The finding that engaging in casual sex and having an early sexual debut were more expected and rewarded in men than in women fits partly with predictions from evolutionary theory. In terms of reproductive fitness, men would benefit more than women from having casual sex and from having sex at an early age (Buss & Schmitt, 1993; Petersen & Hyde, 2010). However, similar benefits would have been expected for sexual infidelity and for high sexual activity with numerous partners, yet less traditional SDS were applied to those sexual behaviors. The same was true for other sexual behaviors, such as premarital sex when engaged or in love. Our findings are in line with previous narrative reviews concluding that premarital sex in particular has become accepted for both men and women (Bordini & Sperb, 2013; Crawford & Popp, 2003).
Cross-Cultural Differences in SDS
In line with predictions from biosocial theory (Wood & Eagly, 2002, 2012), and not with evolutionary theory’s perspective of obligate sex differences, SDS were less traditional in countries with higher levels of gender equality. According to biosocial theory, in cultures with larger differences between the gender roles of men and women, men have more power than women, which translates into traditional SDS (Wood & Eagly, 2002, 2012). However, level of gender equality was a significant moderator only in the meta-analysis on studies using Likert-type-scale questionnaires, not in the meta-analysis on differential evaluation and expectation of the sexual behavior of men and women. This might be because there was less variation in level of gender equality in the latter meta-analysis, as most of its studies were conducted in the United States. The effect, albeit nonsignificant, was in the same direction as in the meta-analysis on Likert-type scales.
Changes in SDS Over Time
In line with evolutionary theory, and not with biosocial theory, the time period in which the study was conducted was no longer significant when controlling for other moderators, indicating that traditional SDS have existed for decades and are still present. This finding could indicate that stable gender differences in reproductive strategies underlie SDS. Also, it appears that even though gender roles have become less strict in most modern Western societies (Eagly & Wood, 1999), this did not lead to less differentiation in the norms for the sexual behavior of men and women (Wood & Eagly, 2002, 2012). Possibly, egalitarian gender roles take longer to permeate the bedroom than other domains of life, such as the workplace, because sexuality is very much a private matter. Furthermore, the content of SDS may have changed over time: most older studies focused on double standards in premarital sex in different relationship types, whereas newer studies more often focused on double standards in casual sex. Thus, changes in gender roles over time might be reflected only in changes in the behavioral specificity of SDS.
Gender Differences in SDS
Regarding gender, we did not find differences between men and women in their cognitions about SDS. In light of male control theory and female control theory (Baumeister & Twenge, 2002), these findings could indicate that male control and female control contribute equally to the existence of SDS. This means that SDS might provide evolutionary and sociocultural advantages to both genders, giving both men and women a motive to control (i.e., enforce) them. Advantages for men that arise from SDS could be improved certainty about paternity (Buss, 1994), patriarchal power over women, prevention of sexual chaos, and reduced male insecurity (Hyde & DeLamater, 1997). The advantages of SDS for women are the high value of sexual favors that they can trade for lower valued favors from men, such as economic provision, monogamous relationships, and parental investment.
Age Differences in SDS
Regarding participants’ age, we did not find support for the predictions of the gender-intensification hypothesis (Hill & Lynch, 1983). It appears that adolescence is not necessarily a period characterized by increased gender role pressure and intensification of people’s social cognitions about gender. However, most studies were conducted with highly educated college samples, mostly consisting of emerging adults. The relatively small number of studies conducted with adolescents and adults may have decreased the power to detect effects of age on SDS.
Implications for Evolutionary Theory and Biosocial Theory
In sum, some of the above findings are in line with evolutionary theory (Buss & Schmitt, 1993; Trivers, 1972), whereas others are in line with biosocial theory (Wood & Eagly, 2002, 2012). This converges with the findings of a recent theory-based narrative review, which demonstrated some support for predictions of both evolutionary theory and biosocial theory about the behavioral specificity of SDS, and for predictions of biosocial theory about cultural differences in SDS (Zaikman & Marks, 2017). Each theory suggests a different mechanism underlying SDS, but these mechanisms might be intertwined. We therefore propose that a hybrid model explaining SDS from the interplay between biological predispositions and sociocultural pressures is most appropriate (Lippa, 2009). According to biosocial theory, different norms for the behavior of men and women may have arisen from a societal division into gender roles that expects men to be assertive, dominant, and powerful, and women to be submissive, caring, and kind (Wood & Eagly, 2002, 2012). However, this division into gender roles may itself have a biological or evolutionary origin (Wood & Eagly, 2012), because gender differences in adaptive reproductive strategies lead people to view (sexual) behaviors in men and women differently. Also, how well evolutionary processes and sociocultural gender role pressures explain SDS appears to depend on the sexual behavior or context under consideration. Gender roles may have more predictive power in sexual contexts characterized by power/status differences, whereas evolutionary processes might play a larger role in sexual behaviors that increase reproductive success.
Conceptualization and Measurement of SDS
We also looked at the effects of moderators related to conceptualization and measurement of SDS. Regarding SDS conceptualization, effect sizes were significant for both stereotypes and personal attitudes. This suggests that both stereotyped beliefs about the sexual behavior of men and women and people’s personal attitudes in response to sexual behavior that violates expectancies are underlying SDS. Yet, traditional SDS were more prevalent in collective or personal expectations about the sexual behavior of men and women (i.e., stereotypes) than in people’s personal evaluation of the sexual behavior of men and women (i.e., attitudes). This finding is in line with the idea that people can have knowledge of collectively shared stereotypes with regard to SDS or personal stereotypical expectations about the sexual behavior of men and women, although they do not apply these stereotypes personally when evaluating other people’s sexual behavior (Milhausen & Herold, 2001; Signorella et al., 1993). It has been argued that knowledge of collective stereotypes is strong, stable, and does not depend on one’s experience with other people, but on culturally shared and generalized social beliefs (López-Sáez & Lisbona, 2009). Indeed, research in children as well as adults showed that content of collective gender stereotypes has not changed over time, whereas gender attitudes did become more egalitarian (e.g., Ruble, 1983; Signorella et al., 1993).
However, our findings with regard to social cognition type need to be interpreted with caution, because the vast majority of studies examined personal SDS attitudes or a mix of stereotypes and attitudes. In studies examining a combination of stereotypes and attitudes, evidence for a reversed double standard was found, a finding that is difficult to disentangle because of the muddled operationalizations of SDS in these studies. Furthermore, in the small number of studies examining stereotypes, it was not possible to distinguish between descriptive and prescriptive aspects, or between personal stereotypes and knowledge of collective stereotypes. Yet, these distinctions are important for future research. For example, knowledge of collectively shared stereotypes is less predictive of one’s own behavior toward men and women than personal stereotypes (Stangor & Schaller, 1996). Furthermore, prescriptive stereotypes (e.g., perceptions of how men and women should behave sexually) might be particularly relevant in the context of SDS as they have been associated with negative evaluations and backlash for people who behave in stereotype-inconsistent ways (Burgess & Borgida, 1999). Indeed, gender stereotypes in general are highly prescriptive in nature (Prentice & Carranza, 2002) and more predictive of people’s personal evaluation of men and women (i.e., attitudes) than descriptive stereotypes (Gill, 2004).
As expected from dual-process models of social cognition (Gawronski & Creighton, 2013), studies using explicit Likert-type-scale questionnaires did not yield evidence for traditional SDS, whereas studies using more implicit within- or between-subjects designs did. The Likert-type-scale questionnaires often include items such as “It’s worse for a woman to sleep around than it is for a man,” in which male and female sexual behavior is explicitly contrasted (Muehlenhard & Quackenbush, 1998). Therefore, in studies using such questionnaires it might have been clearer to participants that their personal cognitions about SDS were being assessed, leading to socially desirable responding (Greenwald et al., 2009). In between- and within-subjects designs, the focus on SDS is more implicit than in explicit self-report questionnaires. In a between-subjects design, researchers assess cognitions about women’s and men’s sexual behavior with separate items or vignettes that they randomly assign to participants, who are generally unaware of the vignettes presented to other participants. In a within-subjects design, researchers administer separate vignettes or items about women’s and men’s sexual behavior to participants in counterbalanced order (Jonason & Marks, 2009; Reid et al., 2011; Weaver et al., 2013). Thus, this finding suggests that traditional SDS might be present only at a more implicit level. Previous research indeed showed that implicit assessments are less prone to socially desirable responding (Gawronski & Bodenhausen, 2006) and more likely to reveal traditional gendered cognitions (Endendijk et al., 2013).
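The two designs described above differ in what each participant sees. The sketch below contrasts balanced random assignment of a single target vignette (between-subjects) with presentation of both vignettes in counterbalanced order (within-subjects). The function names, condition labels, and the simple alternating scheme are illustrative assumptions, not the procedures used in the cited studies.

```python
import random

# Hypothetical condition labels for vignettes describing the same sexual behavior.
TARGETS = ("female target", "male target")


def assign_between(n_participants: int, seed: int = 0) -> list[str]:
    """Between-subjects: each participant sees ONE target vignette.
    Conditions are balanced (group sizes differ by at most 1) and shuffled."""
    conditions = [TARGETS[i % 2] for i in range(n_participants)]
    random.Random(seed).shuffle(conditions)
    return conditions


def assign_within(n_participants: int, seed: int = 0) -> list[tuple[str, str]]:
    """Within-subjects: each participant sees BOTH target vignettes; half see
    the female target first and half the male target first (counterbalancing)."""
    orders = [TARGETS if i % 2 == 0 else TARGETS[::-1] for i in range(n_participants)]
    random.Random(seed).shuffle(orders)
    return orders


print(assign_between(4))
print(assign_within(4))
```

In the between-subjects version no participant can compare the two targets, which is why the focus on SDS stays implicit; in the within-subjects version counterbalancing keeps presentation order from being confounded with target gender.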
However, SDS did not differ between studies using between- or within-subjects designs, or between studies using extensive vignettes/scenarios and studies using questionnaires with separate items about the sexual behavior of men and women. This indicates that social desirability and demand characteristics might not necessarily play a larger role in within-subject research on SDS than in between-subject research (Marks & Fraley, 2005; Milhausen & Herold, 2001). Also, this finding suggests that study designs with only a slightly less explicit focus on SDS (i.e., not contrasting male and female sexual behavior in the same items) can yield evidence for the existence of traditional SDS. This argument is consistent with one study that specifically examined differences between implicit (i.e., under divided attention) and explicit (i.e., under full attention) SDS-cognitions, showing that traditional SDS were present only at an implicit level (Marks, 2008). However, between-subjects designs, like the study by Marks (2008), have been criticized for measuring single standards (because there is no comparison with how an individual would rate another target) instead of double standards (i.e., contrasting the evaluation of male vs. female targets; Crawford & Popp, 2003). Therefore, IATs might be a fruitful way to examine SDS at an implicit, within-subjects level (see, for example, Sakaluk & Milhausen, 2012).
Our findings regarding questionnaire type indicated that questionnaires differ in the extent to which they yield evidence for SDS, which might also explain the inconsistent findings across studies using these methods. Studies using the DSS (Caron et al., 1993) reported reversed double standards, whereas studies using the SDSS (Muehlenhard & Quackenbush, 1998) reported more traditional double standards, which might be explained by differences in the content and scoring of the questionnaires. In the DSS, all items but one are formulated in the direction of a traditional double standard (e.g., “It is up to the man to initiate sex.”), and participants answer the items on a scale ranging from strongly agree to strongly disagree. Such a questionnaire design cannot distinguish between people with reversed and egalitarian sexual standards, because both groups will (strongly) disagree with the traditional items. Therefore, we cannot be sure whether the negative combined mean found in studies using the DSS reflects reversed double standards or an egalitarian view of the sexual behavior of men and women. In contrast, the SDSS consists of 20 items occurring in pairs, with parallel items about men’s and women’s sexual behavior (e.g., “A [girl/boy] who has sex on the first date is easy”). In addition, six items contrast men’s and women’s sexual behavior, with some items formulated in the direction of traditional SDS (e.g., “A man should be more sexually experienced than his wife.”) and others formulated in an egalitarian way (e.g., “A woman’s having casual sex is just as acceptable to me as a man’s having casual sex.”). Participants answer all items on a scale ranging from disagree strongly to agree strongly. Difference scores are computed for the 10 pairs of male and female items, and the scores on the six contrast items are added to these difference scores.
The design of the SDSS makes it possible to assess a more complete range, from reversed to traditional double standards, than the DSS does. However, the SDSS score range is asymmetrical (−30 to 48). Thus, the more traditional double standards in studies using the SDSS might have been an artifact of the possible range of scores.
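The asymmetry of the SDSS range follows from its scoring. The sketch below assumes that each item is scored from 0 (disagree strongly) to 3 (agree strongly), that each pair’s difference is taken as the female item minus the male item (so positive values indicate a traditional standard), and that the six contrast items are keyed so that higher scores are more traditional (egalitarian items reverse-scored). These scoring details are assumptions rather than quotations from the scale, but under them the arithmetic reproduces the −30 to 48 range: the 10 difference scores span −30 to 30, while the six contrast items can only add 0 to 18.

```python
# Assumed 4-point scale: 0 = disagree strongly, 3 = agree strongly.
ITEM_MIN, ITEM_MAX = 0, 3
N_PAIRS, N_CONTRAST = 10, 6  # 10 female/male item pairs (20 items) + 6 contrast items


def reverse_score(score: int) -> int:
    """Reverse-key an item (assumed for egalitarian-worded contrast items)."""
    return ITEM_MIN + ITEM_MAX - score


def sdss_total(pairs: list[tuple[int, int]], contrast: list[int]) -> int:
    """Sum of (female - male) pair differences plus traditional-keyed contrast scores."""
    if len(pairs) != N_PAIRS or len(contrast) != N_CONTRAST:
        raise ValueError("expected 10 item pairs and 6 contrast items")
    return sum(female - male for female, male in pairs) + sum(contrast)


def sdss_range() -> tuple[int, int]:
    """Lowest and highest possible totals under the assumed scoring."""
    lowest = N_PAIRS * (ITEM_MIN - ITEM_MAX) + N_CONTRAST * ITEM_MIN
    highest = N_PAIRS * (ITEM_MAX - ITEM_MIN) + N_CONTRAST * ITEM_MAX
    return lowest, highest


print(sdss_range())  # -> (-30, 48)

# Hypothetical egalitarian respondent: same answer within each pair, rejects the
# (assumed) three traditional contrast items, fully endorses the three egalitarian ones.
egalitarian = sdss_total([(1, 1)] * 10, [0, 0, 0] + [reverse_score(3)] * 3)
print(egalitarian)  # -> 0
```

Because the egalitarian point sits at 0, the total can move 30 points toward reversed standards but 48 toward traditional ones; scoring only the 10 item pairs would give the symmetrical −30 to 30 range.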
Limitations and Future Directions
Some limitations of this meta-analytic study need to be addressed. First, the available body of quantitative research on SDS is highly homogeneous in terms of participant age, ethnicity, and educational level. According to biosocial theory, these factors are important in the social construction of gender roles, and more specifically for the social construction of SDS (Wood & Eagly, 2002, 2012). Therefore, future studies should examine SDS in more diverse samples in terms of ethnicity, age, and educational level.
Second, almost all studies included in this meta-analysis measured SDS in a relatively explicit way, with self-report questionnaires, even though implicit measures, such as IATs or priming tasks, are less prone to socially desirable responding than explicit measures of stereotypes and are often better predictors of behavior (Gawronski & Bodenhausen, 2006). Thus, researchers should make more use of implicit tasks to assess SDS. Relatedly, previous research has used many different conceptualizations of SDS, sometimes combining attitudinal and stereotypical aspects within one questionnaire. We advise future researchers to be more theory-driven in their conceptualization, operationalization, and predictions regarding SDS. For example, dual-process models (Gawronski & Creighton, 2013) or social cognition frameworks (e.g., Greenwald et al., 2002) could be used to further conceptualize different aspects of people’s SDS-cognitions, that is, implicit and explicit aspects, attitudes, stereotypes, knowledge of stereotypes, prescriptive versus descriptive aspects, and personal versus collective aspects. New measures need to be developed and validated before the interplay between different double standard components can be examined.
Furthermore, some studies assessed SDS with questionnaires that did not distinguish between people with reversed and egalitarian sexual standards. With such questionnaires, it is impossible to study predictors of individual differences in SDS-cognitions. Researchers who want to use a questionnaire in future studies on SDS should use questionnaires with symmetrical scales that assess the complete range of SDS, from reversed to traditional (e.g., the 20 paired items of the SDSS; Muehlenhard & Quackenbush, 1998), or develop new questionnaires that can assess this complete range.
Last, most studies included in this meta-analysis focused on SDS in behaviors associated with high sexual activity, and only a few studies specifically examined behaviors associated with low sexual activity. Yet further study of differences in the strength of traditional SDS between behaviors associated with high sexual activity (more male-typical) and behaviors associated with low sexual activity (more female-typical) is important. Such research can test whether boundaries for male-typical (sexual) behavior are stricter than those for female-typical behavior (Hort et al., 1990). Also, research on how people acquire traditional SDS-cognitions is now essential for designing interventions that foster egalitarian sexual standards and sexual equality for men and women.
Conclusion
The current meta-analysis demonstrated that people on average still clearly have traditional cognitions about SDS, in particular with regard to men and women having casual sex, having sex for the first time at an early age, and general sexual activity level. We also found clear evidence of traditional SDS in within- or between-subject experimental studies assessing differences in the evaluation, or expectation, of men’s and women’s sexual behavior. Nevertheless, SDS were less traditional in countries with higher levels of gender equality. This meta-analysis further demonstrated that both evolutionary theory and biosocial theory provide relevant and testable predictions with regard to the existence of SDS. It appears that a hybrid model including both evolutionary processes related to gender differences in parental investment and sexual strategies, as well as the societal division in gender roles can best explain double standards for the sexual behavior of men and women. This meta-analysis also demonstrated the relevance of dual-process models of social cognition (Gawronski & Creighton, 2013) for the measurement and conceptualization of SDS. Therefore, we call for more research on the interplay between evolutionary and sociocultural processes underlying SDS, using implicit as well as explicit conceptualizations and measures that are able to assess the entire range of double standards, from reversed to traditional.
Supplemental Material
Appendix_A_PRISMA_2009_checklist_R3 – Supplemental material for He is a Stud, She is a Slut! A Meta-Analysis on the Continued Existence of Sexual Double Standards
Supplemental material, Appendix_A_PRISMA_2009_checklist_R3 for He is a Stud, She is a Slut! A Meta-Analysis on the Continued Existence of Sexual Double Standards by Joyce J. Endendijk, Anneloes L. van Baar and Maja Deković in Personality and Social Psychology Review
Appendix_B_preregistration – Supplemental material for He is a Stud, She is a Slut! A Meta-Analysis on the Continued Existence of Sexual Double Standards
Appendix_C_CODING_SYSTEM_R1 – Supplemental material for He is a Stud, She is a Slut! A Meta-Analysis on the Continued Existence of Sexual Double Standards
Appendix_D_R2 – Supplemental material for He is a Stud, She is a Slut! A Meta-Analysis on the Continued Existence of Sexual Double Standards
Acknowledgements
The authors would like to thank Joyce ter Heide and Anne Smit for their assistance with screening and coding the articles for this meta-analysis.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material is available online with this article.
References
