Abstract
This paper provides a critical review of the Short Dark Triad (SD3), a 27-item self-report measure developed to assess the personality constructs of Machiavellianism, psychopathy, and narcissism. Widely cited and commonly used in psychological research, the SD3 offers a quick means of measuring these three socially aversive constructs. This review draws together and discusses existing literature on the psychometric properties of the SD3, with a focus on reliability and validity. A structured database search informed this review to ensure breadth and transparency. While the measure demonstrates relatively good internal consistency and test–retest reliability, concerns are raised in relation to its validity—particularly content validity and the conceptual underpinnings of the original three-factor model. Evidence from studies conducted in both Western and non-Western contexts suggests that alternative models (e.g., bi-factor structures, a single general factor, or overlapping constructs on a continuum) may better capture the relationships among the constructs. Limitations in the availability and applicability of normative data are also discussed. This review highlights implications for both research and practice, raising questions about the robustness of the SD3 and recommending caution in its use and interpretation, especially in applied or clinical contexts.
Keywords
Introduction
The Dark Triad, the “new kid on the personality psychology block” (Jonason et al., 2014, p. 117), is a personality theory. Coined by Paulhus and Williams (2002), it concerns three dark personality constructs – narcissism, Machiavellianism and psychopathy – that share malicious features and a “callous-manipulative interpersonal style” (Brinkley et al., 2001, p. 1025). Narcissism is defined by grandiosity, self-esteem and self-interest; Machiavellianism by exploitation, absence of morality and egotism; and psychopathy by callousness, remorselessness and antisocial/impulsive behaviour (Paulhus & Williams, 2002).
Researchers have examined how the Dark Triad construct is linked to broader personality models, particularly the Five-Factor Model (FFM; also known as the Big Five; McCrae & Costa, 1987). Research shows that all three constructs are generally associated with low agreeableness, which has been considered a common dispositional root (Jakobwitz & Egan, 2006; Paulhus & Williams, 2002). However, differential patterns also emerge across constructs. Psychopathy is strongly associated with low conscientiousness and low neuroticism, reflecting impulsivity and emotional detachment (Paulhus & Williams, 2002). In contrast, Machiavellianism has shown positive associations with conscientiousness, suggesting that successful manipulation may require calculated planning and impulse control (Muris et al., 2017). Narcissism, particularly in its grandiose form, is positively related to extraversion and agency-related traits such as leadership and assertiveness, distinguishing it from the other two constructs (Jonason et al., 2014; Raskin & Terry, 1988).
The HEXACO model (Ashton & Lee, 2007), which adds Honesty-Humility as a sixth factor, has also been proposed as a useful framework for understanding dark constructs. Machiavellianism and psychopathy in particular have been strongly predicted by low scores on Honesty-Humility (de Vries & van Kampen, 2010), reinforcing the role of manipulative and exploitative tendencies in their structure. These broader associations provide important conceptual grounding for the Dark Triad and inform interpretation of SD3 scores within both research and applied contexts. While some recent work, such as the D Factor model (Moshagen et al., 2018), challenges the notion that antagonism is the core of the Dark Triad, a discussion of this is beyond the scope of the current review.
The three constructs are conceptually distinct, but the Dark Triad encourages researchers to study them together, given their common dark and malevolent features and “callous core that encourages interpersonal manipulation” (Jones & Paulhus, 2014, p. 3). The theoretical roots of the Dark Triad have been debated, and McHoskey et al. (1998) argued that when studied alone, narcissism, Machiavellianism and psychopathy overlap, and the factors display significant intercorrelation. Paulhus and McHoskey debated this at an APA conference in 2002, inspiring a growing body of research (Paulhus & Williams, 2002). Jones and Paulhus (2014) recognised that the theoretical distinctions between the constructs had become intertwined, so they proposed the theoretically-loyal Short Dark Triad (SD3) psychometric.
Previously, each of the Dark Triad constructs was assessed using its own psychometric: the Self-Report Psychopathy Scale (SRP-III; Williams et al., 2003; a 64-item self-report measure with four subscales: interpersonal manipulation, callous affect, erratic lifestyle, and antisocial behaviour), the standard measure of Machiavellianism (MACH-IV; Christie & Geis, 1970; a 20-item self-report measure which produces a total Machiavellianism score), and the Narcissistic Personality Inventory (NPI; Raskin & Hall, 1979) or Narcissistic Personality Inventory-40 (NPI-40; Raskin & Terry, 1988; a 40-item self-report measure with items derived from the DSM-III criteria for narcissistic personality disorder, with subscales for leadership/authority, grandiose exhibitionism and entitlement/exploitativeness; the measure is not intended for diagnostic purposes and instead measures normal and subclinical narcissism). Many researchers felt deterred by the combined time demands of these measures, so the Dirty Dozen (DD) was created to examine the Dark Triad constructs in tandem (Jonason & Webster, 2010). This short measure contains four questions for each personality construct; however, several studies have raised concerns about its reliability and validity (Jones & Paulhus, 2014). Vize et al. (2019) argued that its high reliabilities were achieved through repetitive wording, and that cross-correlations were often stronger than the convergent correlations. Because personality constructs are considered persistent across the life span, accurate measurement of the Dark Triad requires a well-constructed scale with sound psychometric properties. As with all psychometric measures, reliability, validity and norms are essential qualities to consider.
Body of Work
To account for the DD’s shortcomings, Jones and Paulhus (2014) devised the SD3 to fit into a wider research base. High scores in each Dark Triad construct denote an increased likelihood of engaging in criminality, creating social distress, and causing workplace problems (Azizli et al., 2016; O’Boyle et al., 2012). Recent research has also highlighted the substantial economic burden associated with psychopathy; for example, Gatner et al. (2023) note that societal costs associated with behaviours commonly linked to this construct are estimated to be in the billions of dollars annually in the United States.
Adolescents who scored higher in Dark Triad constructs presented with an increased likelihood of interpersonal violence (Ali, 2020). Moreover, Lyons and Jonason (2015) found that self-confessed thieves scored higher on these constructs. Other research has linked the Dark Triad constructs to internet trolling/cyberstalking, cybercrime, extremism (Harrison et al., 2016; Lopes & Yu, 2017; Pavlović & Wertag, 2021), and COVID-19 rule breaking (Gogola et al., 2021; Konc et al., 2022; Zajenkowski et al., 2020). Although the SD3 is widely used within research, studies demonstrating empirical evaluations of applied utility remain relatively limited.
Since Jones and Paulhus (2014) proposed the SD3, hundreds of academic articles adopting the methodology have been published (Dragostinov & Mõttus, 2022). In terms of psychometric properties, most studies have focused on assessing SD3 language translations. In addition, while previous meta-analyses (e.g., Muris et al., 2017; Vize et al., 2019) have examined the Dark Triad constructs in relation to their intercorrelations, gender differences, broader personality correlates, and methodological concerns such as partialing procedures, this review offers a distinct and complementary contribution by focusing specifically on the psychometric and conceptual adequacy of the Short Dark Triad (SD3) measure itself. In particular, it provides details of content validity and model fit across cultural contexts and incorporates a wide range of more recent international studies not included in earlier reviews. Crucially, this review also emphasises the practical relevance of the SD3, considering its use not only in academic research but also in applied settings such as clinical and forensic contexts. In doing so, it contributes to a more comprehensive understanding of the measure’s strengths and limitations, with the aim of informing its future use across both research and practice.
In summary, the following review aims to examine the psychometric properties of the SD3 in order to provide practitioners and researchers with insight regarding the extent to which scores can be relied upon. An overview of the development of the SD3 will be provided, followed by an overview of the characteristics of the tool. Literature regarding the psychometric properties of the measure will then be drawn together, that is, the reliability, validity and norms. The review will conclude by discussing the utility of the tool in research and practice.
Short Dark Triad Development
The SD3 was created using an item pool designed to examine the classic Dark Triad construct (Jones & Paulhus, 2014). Jones and Paulhus (2014) provided background context regarding the theoretical underpinnings of the constructs; for example, they described impulsivity as the difference between Machiavellianism and psychopathy, whereby those with higher psychopathic traits exhibit more impulsive behaviours, while those with higher Machiavellian traits tend to be more strategic. Furthermore, the three key elements of Machiavellianism are manipulativeness, a strategic calculating orientation and callous affect, while narcissism is a conflict between grandiose identity and insecurity (Wu et al., 2019). Jones and Paulhus (2014) agreed that only the grandiose variant would be represented in the Dark Triad, given the lack of a theoretical link between narcissistic vulnerability/insecurity and darkness. Therefore, the SD3 does not consider vulnerable narcissism.
First, Jones and Paulhus (2014) saturated items, ensuring the conceptual facets were maintained by creating a large pool of 41 items. They reduced the items, avoiding mixing of the conceptual facets of each construct (Jones & Paulhus, 2014). Factor analyses found 14 items did not adequately saturate the factors, or saturated multiple factors, and eventually 13 items related to narcissism and Machiavellianism and 15 to psychopathy (Jones & Paulhus, 2014). Factor analysis and cross-validation resulted in 27 items, nine for each personality construct (Jones & Paulhus, 2014). A three-factor model was found to be optimal when conducting Exploratory Structural Equation Modelling (ESEM) to identify the smallest number of dimensions needed to explain the covariation among variables. The model showed good fit indices: Root Mean Square Error of Approximation (RMSEA) = .04, Comparative Fit Index (CFI) = .93, and Tucker–Lewis Index (TLI) = .91. RMSEA is an absolute fit index that estimates the size of errors produced by a model (Xia & Yang, 2018), while CFI and TLI are incremental fit indices that assess how well the hypothesised model fits relative to a baseline model (Jones & Paulhus, 2014; Xia & Yang, 2018). Jones and Paulhus (2014) also conducted a Confirmatory Factor Analysis (CFA) using 279 US respondents (M age = 30.7, SD = 10.9), reporting that the model was an overall good fit and that items loaded appropriately (i.e., >.3), albeit with slightly weaker fit than the original (RMSEA = .07, CFI = .82, TLI = .80). The factor loadings (i.e., the correlation between the item and the factor) indicated a moderate correlation of >.3 between the items and personality constructs (Tavakol & Wetzel, 2020). However, research has stipulated that a CFI and TLI of .90 or higher are acceptable indicators of good fit, which is not the case for the CFA results (Hu & Bentler, 1999; Kim et al., 2016).
Given the relatively small factor loadings and CFI (for the CFA only), it is questionable whether the model is actually a good fit, and whether the psychometric is based on a sound theoretical model. Nevertheless, the ESEM provided a better fit for this initial evaluation, indicating the final items in the psychometric were suitable for each personality construct (Jones & Paulhus, 2014).
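The comparison between the two sets of fit indices can be illustrated with a minimal sketch. It assumes only the values reported above and the CFI/TLI cut-off of .90 cited in the text; the variable and function names are hypothetical, and RMSEA is stored but not evaluated because no RMSEA cut-off is stated here.

```python
# Fit indices as reported in the text for the ESEM and CFA models.
REPORTED_FIT = {
    "ESEM": {"RMSEA": 0.04, "CFI": 0.93, "TLI": 0.91},
    "CFA": {"RMSEA": 0.07, "CFI": 0.82, "TLI": 0.80},
}

# Cut-off cited in the text: a CFI and TLI of .90 or higher.
INCREMENTAL_CUTOFF = 0.90


def meets_incremental_cutoff(fit, cutoff=INCREMENTAL_CUTOFF):
    """Return True when both CFI and TLI reach the cut-off."""
    return fit["CFI"] >= cutoff and fit["TLI"] >= cutoff


# Evaluate each model against the cited cut-off.
fit_summary = {
    name: meets_incremental_cutoff(fit) for name, fit in REPORTED_FIT.items()
}
print(fit_summary)
```

Under these assumptions, the ESEM values reach the cited cut-off whereas the CFA values do not, which mirrors the concern raised above.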
Tool Characteristics
The SD3 contains 27 statements, nine for each personality construct (Jones & Paulhus, 2014). In their initial development research involving 1,063 students and community members across four studies, Jones and Paulhus (2014) reported that the SD3 took an average of 147 seconds to complete. While no manual is available, Jones and Paulhus’s (2014) study provided a detailed overview of the tool. The items are scored on a five-point Likert scale, which ranges from disagree to agree and includes a neutral option (Jones & Paulhus, 2014). For scoring, the responses for each construct are added together and the mean is calculated.
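The scoring rule described above can be sketched as follows. This is an illustration only: the responses are made up, the function names are hypothetical, and the position of the reverse-keyed item is an assumption for demonstration rather than the published scoring key.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert scale


def reverse_score(response):
    """Flip a response on the 1-5 scale (1 <-> 5, 2 <-> 4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - response


def subscale_mean(responses, reverse_keyed=()):
    """Mean of one construct's item responses.

    responses: integers from 1 to 5, one per item.
    reverse_keyed: zero-based positions of reverse-keyed items
        (hypothetical positions; consult the published key in practice).
    """
    if any(not SCALE_MIN <= r <= SCALE_MAX for r in responses):
        raise ValueError("responses must lie between 1 and 5")
    adjusted = [
        reverse_score(r) if i in reverse_keyed else r
        for i, r in enumerate(responses)
    ]
    return mean(adjusted)


# Made-up responses to the nine items of one construct.
example = [4, 3, 5, 2, 4, 3, 1, 4, 4]
score = subscale_mean(example, reverse_keyed={6})
print(round(score, 2))
```

Each respondent would receive three such means, one per construct.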
For a psychometric test to be considered robust, it ought to be constructed as an interval scale (Kline, 1986). Neither Jones and Paulhus (2014) nor other published research has formally determined the SD3’s measurement level. Likert scales are subject to much debate regarding whether they are ordinal or interval level, with many researchers assuming Likert scales to be interval (Wu & Leung, 2017). Strictly speaking, Likert scales yield ordinal data, given the responses have directionality but the intervals cannot be assumed equal (Wu & Leung, 2017). Research regarding the SD3 has classified the data as being interval level in order to use parametric tests (Wu & Leung, 2017). However, Likert scales present at a lower level of measurement than what is ideally required for statistical analysis. This suggests that caution should be taken in considering the results of SD3 studies and the interpretation of parametric data used in the context of an ordinal style scale.
The SD3 is a self-report measure, completed by the respondent (Demetriou et al., 2015). Individuals may therefore falsify their responses to portray positive characteristics (Demetriou et al., 2015). While some personality psychometrics (e.g., the Millon Clinical Multiaxial Inventory [MCMI-IV; Millon et al., 2015]) include questions and statistical methods to assess desirability, the SD3 does not, and Jones and Paulhus (2014) have advised discretion among researchers/clinicians.
Conceptual Issues
Since the development of the SD3, research has not only examined its psychometric properties but has also raised concerns regarding the conceptualisation of the Dark Triad constructs. Before reviewing the reliability and validity of the SD3, it is important to consider these conceptual issues, as they have implications for how the measure is interpreted and validated. Specifically, questions have been raised about how the SD3 defines and operationalises Machiavellianism, psychopathy, and narcissism, with concerns about whether these constructs are accurately represented within the measure.
One of the key criticisms within the Dark Triad literature pertains to the measurement of Machiavellianism. Research has consistently shown that most measures of Machiavellianism, including those incorporated in the SD3, fail to align with theoretical descriptions of the construct. Miller et al. (2017) suggest that constructs associated more closely with primary psychopathy, such as impulsivity and antisocial tendencies, are often captured in assessments of Machiavellianism—suggesting that measures of psychopathy and Machiavellianism may not be assessing distinct constructs. This conceptual drift raises concerns about whether the SD3’s Machiavellianism subscale is truly capturing the intended construct or whether it is simply measuring a secondary dimension of psychopathy. This issue is compounded by factor-analytic findings, which often reveal strong correlations between the SD3’s psychopathy and Machiavellianism subscales (Watts et al., 2017), calling into question the assumption that each Dark Triad construct is meaningfully distinct.
A further conceptual issue relates to the assumption that the Dark Triad constructs are unidimensional, as reflected in the SD3’s structure. While the SD3 was developed to provide a concise, single-factor measure of each Dark Triad construct, emerging research indicates that these constructs are best conceptualised as multidimensional rather than singular constructs (Rose et al., 2023). Rose et al. (2023) argue that treating psychopathy, narcissism, and Machiavellianism as unidimensional constructs ignores important subcomponents, such as grandiose versus vulnerable narcissism or primary versus secondary psychopathy. Their findings suggest that the SD3’s approach oversimplifies the complexity of these constructs, potentially leading to misclassification and reduced predictive accuracy.
Similarly, Naor-Ziv et al. (2022) examined the Dark Triad construct within a multidimensional personality framework and found that Machiavellianism and psychopathy tend to cluster closely together, further challenging the assumption that they are distinct constructs. Their findings reinforce concerns that measures such as the SD3 may not adequately capture the nuanced structure of these constructs and that a more multidimensional approach may be necessary to improve construct validity.
Kajonius et al. (2016) critiqued the validity of Dark Triad measures, focusing primarily on the Dark Triad Dirty Dozen (DTDD). They highlighted concerns about the factor structure and argued that Machiavellianism and psychopathy tend to collapse into a bi-factor model, rather than existing as distinct constructs. However, their study did not explicitly advocate for a multidimensional representation of the Dark Triad construct in the way that Rose et al. (2023) and Naor-Ziv et al. (2022) do.
More recently, Knitter et al. (2025) conducted a meta-analysis centred on the SD3, examining its external associations with outcomes such as aggression and intelligence. Although their work acknowledges structural and validity concerns, it does not provide an in-depth psychometric critique of the SD3’s internal structure or conceptual clarity. Similarly, Welsh et al. (2024) conducted a comprehensive systematic review of dark personality measures - including the SD3 - using COSMIN criteria to assess psychometric development across 64 instruments. While their review offers broad comparative insights, it does not explore in detail the SD3’s theoretical underpinnings or evaluate its reliability and validity across different populations.
In contrast, the present review offers a targeted and critical appraisal of the SD3. By concentrating on a single, widely used measure, this review integrates conceptual, psychometric, and cross-cultural evidence to examine how well the SD3 captures the construct it purports to measure. This depth of focus enables a more nuanced understanding of the measure’s strengths, limitations, and suitability for diverse research and applied contexts.
These conceptual concerns provide essential context for interpreting the SD3’s psychometric performance. Evaluating a tool’s reliability and validity requires clarity about what it aims to measure and how effectively it does so. The following sections critically examine key aspects of the SD3’s psychometric properties, including internal consistency, structural validity, convergent and discriminant validity, and cross-cultural applicability.
Method
This review adopted a narrative approach to synthesising the psychometric and conceptual properties of the Short Dark Triad (SD3) measure. To enhance transparency and ensure broad coverage of the literature, a systematic review methodology was adopted in the identification and selection of studies.
A structured search was conducted across three major academic databases: PsycINFO, Web of Science, and Scopus. Searches were conducted using the following combination of terms: (“Short Dark Triad” OR “SD3”) AND (“valid*” OR “reliab*” OR “psychometric” OR “factor structure” OR “measure*” OR “norms” OR “translation*” OR “cross-cultural” OR “test-retest” OR “internal consistenc*” OR “construct valid*”).
Search strings were translated into each database’s native syntax. The search was limited to peer-reviewed journal articles published in English and covered the period from January 1, 2014 to May 28, 2025. Additional relevant studies were identified through backward citation searching of key empirical and review papers.
Regarding criteria for inclusion, studies were included if they reported empirical findings involving the SD3 as a measurement tool and examined at least one psychometric property, such as reliability (e.g., internal consistency or test–retest), validity (e.g., construct, convergent/discriminant, criterion), factor structure, or model fit. This included studies that used the SD3 in different populations, settings, languages, or countries, where an aim was to evaluate its psychometric performance in those contexts. In addition, studies were included where they provided conceptual critique relevant to the measure’s structure or theoretical underpinnings.
Studies were excluded if they focused solely on correlates of the Dark Triad construct without evaluating the SD3, employed alternative brief measures without comparison to the SD3, or were non-empirical commentaries, book chapters, or non-English publications. See Figure 1 for the PRISMA Flow Diagram.
Figure 1. PRISMA Flow Diagram
This approach allowed for a comprehensive yet focused synthesis of the available evidence regarding the SD3’s measurement properties and utility across different populations and cultural contexts. The following section synthesises the literature on the SD3’s reliability and validity, with reference to these conceptual concerns where relevant.
Reliability
Reliability is “the trustworthiness/consistency of a measure” (APA, 2023b, para. 1). Alternate form reliability is the degree to which two different versions of the same test correlate with one another (APA, 2023a). This was not assessed as there is only one SD3 version. Moreover, there is no direct literature regarding inter-rater reliability (the consistency of the test conducted by different people), which is understandable given the SD3 is a self-report measure (Price, 2015). The inability to examine these forms of reliability should not be considered a weakness; rather, it is a consequence of the SD3’s design. Research has instead focused on internal consistency and test-retest reliability.
Internal Consistency
Internal consistency reflects the extent to which the SD3’s individual items measure the same domain/construct (Price, 2015). Cronbach’s Alpha examines how closely items are related as a group by providing an internal consistency estimate using correlations (Cronbach, 1951). A Cronbach’s Alpha above .7 is recommended; specifically, α = .90 to α = .99 is excellent/strong, α = .80 to α = .89 is good, α = .70 to α = .79 is fair, and α = .00 to α = .69 is considered poor (Cronbach, 1951). However, it is important to interpret such thresholds with caution, particularly for multifaceted constructs such as narcissism, Machiavellianism, and psychopathy, which may not exhibit high internal consistency due to their conceptual breadth. Clark and Watson (1995) emphasise that very high alpha values can sometimes indicate item redundancy or narrow construct coverage, while moderately lower values may reflect a more comprehensive representation of the construct. This nuance is particularly relevant when interpreting both Cronbach’s α and McDonald’s ω, as the latter, while often preferred, can also be influenced by factor structure and item composition. Despite these caveats, internal consistency remains a commonly reported indicator of reliability across SD3 studies.
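For readers less familiar with the statistic, Cronbach’s Alpha can be computed from raw item responses as α = (k / (k − 1)) × (1 − Σ item variances / variance of total scores), where k is the number of items. The following is a minimal sketch using the standard formula and the descriptive bands listed above; the function names and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table.

    rows: list of respondents, each a list of k item scores.
    """
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def label_alpha(alpha):
    """Descriptive bands as listed in the text."""
    if alpha >= 0.90:
        return "excellent/strong"
    if alpha >= 0.80:
        return "good"
    if alpha >= 0.70:
        return "fair"
    return "poor"


# Made-up responses: four respondents, two items.
demo = [[1, 2], [2, 1], [3, 4], [4, 3]]
alpha = cronbach_alpha(demo)
print(round(alpha, 2), label_alpha(alpha))
```

As the caveats above suggest, the label is a convention rather than a verdict; a lower value for a broad construct need not indicate poor measurement.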
Table 1. Cronbach’s Alpha Across a Selection of SD3 Studies
For example, a cross-national study by Denovan et al. (2024) reported reliability estimates for the SD3 in adult community samples from the UK, Canada, and Russia. Cronbach’s alphas ranged from .68 to .84 for Machiavellianism (UK α = .70, Canada α = .84, Russia α = .68), .67 to .72 for narcissism (UK α = .72, Canada α = .67, Russia α = .68), and .70 to .75 for psychopathy (UK α = .75, Canada α = .75, Russia α = .70). Consistent with the original validation study (Jones & Paulhus, 2014), narcissism continued to show comparatively weaker internal consistency across all samples. While some alphas approach acceptable thresholds, the study concluded that reliability across subscales was inadequate overall. This reinforces ongoing concerns regarding the internal consistency of the SD3 - particularly for narcissism - and supports calls for refinement of the measure.
Gamache et al. (2018) suggested the narcissism factor had low internal consistency (α = .64) because one item had a higher endorsement level owing to its French wording, which underplayed entitlement/social dominance. Somma et al. (2020), not included in Table 1 because Cronbach’s Alpha was presented as a range, found internal consistency was lower among Italian adolescents (α = .67–.69) than among Italian university students (α = .77–.84). This suggests the Italian SD3 translation may not be appropriate for adolescents; given that research suggests personality constructs develop up to 25 years, using an adult version raises ethical concerns (Van Dijk et al., 2020). However, in contrast to the findings of Somma et al. (2020), Francis and Crea (2021) reported lower internal consistency across all three subscales of the SD3 in their study of priests and religious brothers and sisters in Italy. Even after item reduction aimed at improving reliability, internal consistency remained low (Machiavellianism: α = .63; Narcissism: α = .58; Psychopathy: α = .62). The authors concluded that the SD3 may not be appropriate for use with this population.
More recently, Abilleira et al. (2024) investigated internal consistency in a large sample of over 1,600 Spanish adolescents. They adapted the scale specifically for this population (SD3-A), removed reverse-coded items, and conducted confirmatory factor analysis, which demonstrated excellent model fit. The final 18-item version yielded strong internal consistency overall (α = .85), though subscale reliabilities varied: fair for psychopathy (α = .77), and questionable for Machiavellianism (α = .69) and narcissism (α = .65). These findings reflect the continued challenge of achieving satisfactory reliability at the trait level, even when the scale is modified for younger populations. Nonetheless, the large sample and adaptation efforts suggest that targeted revision may be a step in the right direction for improving SD3 reliability in adolescents.
Both Atari and Chegeni (2016) and Chegeni and Atari (2017) found poor/questionable internal consistency across all constructs among an Iranian population. The former found acceptable internal consistency (i.e., α > .7) only after deletion of six items (Atari & Chegeni, 2016; Cronbach, 1951). Given the removal of so many items in the development of the Iranian version, it is questionable whether the construct/content proposed by Jones and Paulhus (2014) is valid cross-culturally, and whether the remaining 21 items provide accurate measurement of the construct.
McDonald’s Omega is considered a better way to estimate internal consistency as it requires error loadings from Confirmatory Factor Analysis (CFA; Hayes & Coutts, 2020). The cut-off for Omega is consistent with Cronbach’s Alpha (ω > .7). Persson et al. (2017), in a sample of N = 1,487 (M = 33.3, SD = 11.6), found the SD3 Omega total was ω = .90, suggesting only 10% random error variance. Bonfá-Araujo et al. (2021) found a slightly lower SD3 Omega total (ω = .86) than Persson et al. (2017), but emphasised this was unusually high, indicating a large general factor accounting for item variance. Klimczak and Turska (2020) found questionable SD3 Omega internal consistency among N = 45 adolescents (ω = .66 narcissism, ω = .71 psychopathy, ω = .67 Machiavellianism), calling into question the SD3’s reliability among adolescents. Researchers appear eager to use the SD3 with a younger population, but available reliability results suggest this may be unwise.
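To show how Omega draws on factor loadings, the sketch below computes ω for a single-factor model as (Σλ)² / ((Σλ)² + Σ(1 − λ²)). It assumes standardised loadings and uncorrelated residuals, so it is not the omega total from the multi-factor models used in the studies cited above; the loadings and function name are hypothetical.

```python
def mcdonald_omega(loadings):
    """Omega for a single-factor model with standardised loadings.

    Assumes uncorrelated residuals, so each item's error variance
    is 1 - loading**2.
    """
    if any(not -1 <= lam <= 1 for lam in loadings):
        raise ValueError("standardised loadings must lie between -1 and 1")
    common = sum(loadings) ** 2                   # variance due to the factor
    error = sum(1 - lam ** 2 for lam in loadings)  # summed residual variance
    return common / (common + error)


# Hypothetical loadings for a nine-item subscale.
demo_loadings = [0.5] * 9
omega = mcdonald_omega(demo_loadings)
print(round(omega, 2))
```

With nine items each loading at .5, ω is .75, which shows how even moderate loadings can exceed the .7 cut-off once enough items are combined.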
Taken together, while several studies report α or ω values below conventional thresholds - particularly for narcissism - these figures should be interpreted with caution. Internal consistency estimates can be influenced by item heterogeneity, translation effects, and population differences, and lower values do not always reflect poor measurement. Nonetheless, as shown in Table 1, there is a consistent pattern of weaker reliability estimates for the narcissism subscale across diverse samples. This pattern supports calls for further item refinement and suggests that the SD3’s narcissism items may not adequately capture the breadth or complexity of the construct as conceptualised. As Clark and Watson (1995) highlight, measures of multifaceted constructs may produce lower internal consistency not because of psychometric weakness, but due to the broad content coverage required to validly assess such constructs.
Another method used to assess internal consistency is inter-item correlation (Cohen, 2022). The ideal inter-item correlation range is r = .2–.4, while higher correlations suggest too much inter-relatedness (Piedmont, 2014). Szabó et al. (2021), using the Hungarian SD3 translation with N = 663 employees (M = 37.4, SD = 11.4), found acceptable correlations between all constructs: Machiavellianism and narcissism (r = .15, p < .001), Machiavellianism and psychopathy (r = .38, p < .001), and psychopathy and narcissism (r = .40, p < .001). In a re-examination study, Jones and Paulhus (2017) also found a good level of inter-item correlation between Machiavellianism and narcissism (r = .29, p < .001) and between narcissism and psychopathy (r = .33, p < .001). However, concerns were raised regarding Machiavellianism and psychopathy (r = .51, p < .001), suggesting significant inter-item correlation (Jones & Paulhus, 2017), consistent with Zhang et al.’s (2020) finding (r = .58, p < .01). The inter-correlations found in these studies potentially provide evidence for a bi-factor model (see content validity).
Test-Retest Reliability
Test-retest reliability examines correlations between test scores over time (Matheson, 2019). A test with acceptable test-retest reliability shows little change in scores over time. This is essential for the SD3 given that personality is considered persistent/pervasive (Millon et al., 2015). There is no standard interval for determining test-retest reliability, although intervals for many standardised tests range from 2 to 12 weeks (Fawns-Ritchie & Deary, 2020). The most common way to measure test-retest reliability is using Pearson’s r correlational analysis; a minimum correlation of r = .70 is advised (Glen, 2020). Malesza et al. (2017) examined test-retest reliability of the German translation of the SD3 (N = 221) over four weeks. They found r = .81, p < .001, 95% CI [.80–.82] for Machiavellianism, r = .83, p < .001, 95% CI [.81–.85] for psychopathy, and r = .74, p < .001, 95% CI [.73–.75] for narcissism, indicating acceptable test-retest stability of the measure. Ermis et al. (2018), in testing a Turkish translation of the SD3 (N = 327 students) two weeks apart, found higher test-retest correlations for Machiavellianism (r = .92, p < .001), narcissism (r = .89, p < .001), and psychopathy (r = .91, p < .001). However, Dragostinov and Mõttus (2022; N = 509) found questionable test-retest results for psychopathy (r = .68, p < .001) and Machiavellianism (r = .61, p < .001) over 12 days.
It appears that most published correlation coefficients exceed the accepted .70 level, and this is evident across different samples and time intervals. Nevertheless, further research is needed to assess the SD3 over longer periods of time; this is particularly relevant given that personality is considered persistent and pervasive.
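As a minimal illustration of the calculation described above, the following Python sketch computes Pearson’s r between two administrations of a subscale and checks the result against the r = .70 guideline (Glen, 2020). The scores are hypothetical and the function names are illustrative; neither is drawn from the cited studies.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def meets_retest_threshold(r, minimum=0.70):
    """True if r reaches the advised minimum for test-retest reliability."""
    return r >= minimum


# Hypothetical subscale means for six respondents at two time points
time_1 = [2.1, 3.4, 2.8, 4.0, 1.9, 3.1]
time_2 = [2.3, 3.2, 2.9, 3.8, 2.2, 3.0]

r = pearson_r(time_1, time_2)
print(round(r, 2), meets_retest_threshold(r))
```

Applied to the coefficients reported above, the same check would pass r = .81 and r = .74 but flag r = .68 and r = .61 as falling below the guideline.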
Validity
Validity examines whether a test measures the characteristic as intended (HR-Guide, 2022). Five validity types were considered: face, criterion (predictive/concurrent), incremental, content, and construct (convergent/divergent).
Face Validity
Face validity refers to the extent to which a test appears effective in terms of its stated aims (Price, 2015). Although no studies have directly assessed the face validity of the SD3, it is generally considered appropriate for measuring the Dark Triad as a composite construct - largely because, as a whole, it appears face-relevant. Often, face validity serves as a precursor to more rigorous forms of validation, assuming it has been adequately established (Jones & Paulhus, 2014). However, this assumption may be premature, as face validity at the level of individual constructs remains contentious. Given the conceptual overlap and subtle distinctions among psychopathy, narcissism, and Machiavellianism, it is not clear that individual items unambiguously reflect their intended constructs. For example, the psychopathy subscale appears to lack items that clearly tap into impulsivity and emotional instability - features that are central to psychopathy in broader models. Instead, the SD3 psychopathy items, such as “I can be mean to others,” align more closely with sadistic tendencies. Similarly, the distinctions between narcissism and Machiavellianism may not be apparent based solely on item wording, raising concerns about whether respondents - or even trained observers - could accurately identify which items correspond to which construct if the items were randomised and scoring instructions withheld. Similarities between Machiavellianism and psychopathy are discussed in later sections and also call into question the initial face validity of these subscales.
Criterion Validity
Criterion validity is the extent to which a test measures the outcome it was designed for (West & Beckman, 2018). There are two subtypes of criterion validity: predictive and concurrent.
Predictive Validity
Predictive validity is the degree to which a test score predicts behaviour at a later time point (Clemens et al., 2018). No research has focused specifically on predictive validity of the SD3. Some authors have claimed the SD3 demonstrates predictive validity; however, outcomes were not assessed over time, and such assertions may reflect difficulty in differentiating statistical prediction (i.e., multiple regression) from predictive validity. For example, Dinić and Wertag (2018) claimed to demonstrate predictive validity by examining the SD3 and aggression using the Serbian Reactive-Proactive Questionnaire (Dinić & Raine, 2017; N = 632, M = 30.4, SD = 12.4), finding SD3 psychopathy was the best predictor of reactive and proactive aggression (men, R² = .25, p < .001, R² = .36, p < .001; women, R² = .15, p < .05, R² = .22, p < .05). Both results demonstrate stronger relations with proactive than reactive aggression, consistent with theoretical underpinnings related to remorse/guilt/empathy and instrumental offending (Dinić & Wertag, 2018). Although multiple regression provides evidence of prediction, this was not assessed over time, so it likely demonstrates convergent validity rather than predictive validity.
Concurrent Validity
Concurrent validity is the extent to which a psychometric compares to a related measure at the same time point (West & Beckman, 2018). Most studies have compared the SD3 to its predecessor, the DD (Jonason & Webster, 2010). For concurrent validity the correlation should be as high as possible (around r > .9); however, correlations of r > .75 are acceptable (Lin & Yao, 2014). The DD, a brief 12-item questionnaire, examines the Dark Triad, and was created, like the SD3, to negate the need for long psychometrics (Jonason & Webster, 2010). However, the DD has been subject to significant criticism due to poor validity, despite good overall reliability (Kajonius et al., 2016). Nevertheless, researchers consider it an important measure by which to assess the SD3’s concurrent validity.
In the original study, Jones and Paulhus (2014) found correlations between SD3 and DD scales ranging from r = .46, p < .01 to r = .56, p < .01. Jones and Paulhus (2014) suggested this demonstrated concurrent validity, although the correlations were low given that the SD3 was intended as an improvement on the DD. Similarly, Maples et al. (2014) found the DD and SD3 subscales ranged from r = .54, p < .001 for narcissism to r = .65, p < .001 for psychopathy (also evidencing convergent validity correlations). Moreover, Egan et al. (2014) examined happiness and wellbeing (N = 840), finding correlations between the SD3 and DD scales: Machiavellianism (r = .38, p < .001), narcissism (r = .50, p < .001) and psychopathy (r = .59, p < .001). Egan et al. (2014) concluded both measures were comparable. Later, Geng et al. (2015) compared the DD and SD3, finding modest correlations for narcissism (r = .45, p < .01) and psychopathy (r = .47, p < .01). Arguably, these are low correlation levels. Geng et al. (2015) reported that this is normal among personality measures, yet suggested it implies the SD3 and DD assess dissimilar constructs. These results suggest only moderate convergence with the DD, which may appear suboptimal when considered against Lin and Yao’s (2014) recommended thresholds for concurrent validity. However, it is important to note that the SD3 was designed to improve upon the DD’s factorial ambiguity and to offer clearer differentiation between the three Dark Triad constructs. Therefore, some degree of divergence may reflect conceptual refinement rather than measurement failure. Nonetheless, the variability in correlations raises valid questions about the appropriateness of using the DD as a benchmark for concurrent validity.
Incremental Validity
Given the difficulties using the DD to assess concurrent validity, researchers have examined incremental validity (i.e., whether a new psychometric increases predictive ability beyond an existing measure; Sackett & Lievens, 2008). Incremental validity assesses whether a predictor can explain an outcome, above all other predictors, usually using hierarchical regression to measure the prediction of one criterion after another (Sackett & Lievens, 2008). Gamache et al. (2018), in developing a French SD3 (N = 405, M = 31.0, SD = 12.0), compared SD3 and DD measures to predict psychopathy on the Expanded-Levenson Self-Report Psychopathy Scale (Levenson et al., 1995). Using regression analysis, Gamache et al. (2018) reported the SD3 Machiavellianism demonstrated better incremental validity than DD Machiavellianism and showed better coverage through its associations with callousness (z = 4.10, p < .05), egocentricity (z = 2.87, p < .05), and unempathetic features (z = 3.78, p < .05), with similar results found for psychopathy. The authors concluded the SD3 is the most informed choice for assessment of impulsivity and antisocial tendency, however cautioned that the DD provides a better alternative to examine lack of empathy.
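The logic of incremental validity can be illustrated with a two-step hierarchical regression, in which the gain in explained variance (ΔR²) is computed when a second predictor is entered after the first. The sketch below uses the standard formula for R² with two predictors, expressed in zero-order correlations. The correlation values are hypothetical, the function names are illustrative, and the example is not a reproduction of the analysis reported by Gamache et al. (2018).

```python
def r2_two_predictors(r_y1, r_y2, r_12):
    """R^2 for a criterion regressed on two predictors, from zero-order correlations.

    r_y1, r_y2: correlations of each predictor with the criterion
    r_12: correlation between the two predictors
    """
    if abs(r_12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


def delta_r2(r_y1, r_y2, r_12):
    """Gain in R^2 when predictor 2 is entered after predictor 1."""
    return r2_two_predictors(r_y1, r_y2, r_12) - r_y1 ** 2


# Hypothetical values: step 1 = existing measure, step 2 = newer measure
gain = delta_r2(r_y1=0.50, r_y2=0.60, r_12=0.55)
print(round(gain, 3))
```

A ΔR² near zero would indicate that the second measure adds little beyond the first, which is the pattern incremental validity analyses are designed to detect.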
Content Validity and Model Fit
Content validity is the extent to which a measure represents all facets of a construct and whether the instrument and items reflect theoretical/empirical research (Dixon & Johnston, 2019). Factor analysis can identify whether the SD3 corresponds to the Dark Triad construct (Dixon & Johnston, 2019). SD3 content validity determines whether Jones and Paulhus’s (2014) three-factor model fits across samples using CFA, Exploratory Factor Analysis (EFA) or ESEM. In terms of model criteria for these, a significant model fit is required (χ2), and for CFA, ideally a CFI > .95 for excellent fit, or CFI > .90 for good fit, and similar TLI criteria (Hu & Bentler, 1999; Kim et al., 2016). Moreover, an RMSEA as close to zero as possible (or < .06) suggests the model (i.e., of Dark Triad constructs) is an acceptable fit (Hu & Bentler, 1999; Kim et al., 2016).
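The cut-offs described above can be expressed as a simple decision rule. The following Python sketch encodes the CFI and RMSEA criteria as stated in this section (Hu & Bentler, 1999; Kim et al., 2016); the function names and the example values are illustrative assumptions, and in practice fit indices are judged together rather than mechanically.

```python
def classify_cfi(cfi):
    """Label a CFI value using the cut-offs stated in the text (> .95, > .90)."""
    if cfi > 0.95:
        return "excellent"
    if cfi > 0.90:
        return "good"
    return "below threshold"


def rmsea_acceptable(rmsea, cutoff=0.06):
    """True if RMSEA falls below the stated cut-off."""
    return rmsea < cutoff


# Illustrative (CFI, RMSEA) pairs for two competing models
models = {"model A": (0.98, 0.04), "model B": (0.86, 0.07)}
for name, (cfi, rmsea) in models.items():
    print(name, classify_cfi(cfi), rmsea_acceptable(rmsea))
```

Because both cut-offs are strict inequalities, a model reporting exactly CFI = .90 or RMSEA = .06 would not meet the criterion under this reading.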
Several studies have replicated Jones and Paulhus’s (2014) three-factor model. Malesza et al. (2017), using CFA in the German SD3 translation (N = 598, M = 27.3, SD = 7.5), found Maximum Likelihood indicated a CFA three-factor model was a good fit, χ2 (102) = 206.24, p > .05, RMSEA = .04, CFI = .98, TLI = .99 (CI not provided), compared to a tested one-factor model. Moreover, Pineda et al. (2020), developing a Spanish version (N = 454, M = 34.6, SD = 11.1), found a CFA three-factor model fitted well, and items loaded convincingly onto their hypothesised factor, χ2 (321) = 574.83, p < .001, CFI = .93, RMSEA = .04, 90% CI [.04, .05] (TLI not provided), although this model was compared to the DD model rather than other types of models (e.g., bi-factor models), so comparison is relatively limited. Nevertheless, it appears the Spanish, German and English versions of the SD3 evidence content validity.
However, Pechorro et al. (2019) found CFA only supported a three-factor model after two items per dimension were removed, similar to Atari and Chegeni (2016) with seven items removed overall. Atari and Chegeni (2016) related this to cultural norms, for example “being acquainted with important people” (question 14; p. 116), is not dark but pertinent for economic survival in Iran. Similar item-removal patterns have been seen across studies (Čopková & Šafár, 2021; Salessi & Omar, 2018), questioning content validity, and whether studies fulfil Jones and Paulhus’s (2014) theoretical stance given content removal.
Several studies suggest content validity of a three-factor model is poor. Rogoza and Cieciuch (2017) proposed a bi-factor model including narcissistic grandiosity and “Dark Dyad” (i.e., psychopathy/Machiavellianism subsumed; p. 760), theorising that Machiavellianism was an aspect of psychopathy. Persson et al. (2017) found an EFA bi-factor SD3 was acceptable among a large sample (N = 1,487, M = 33.3, SD = 11.7), χ2 (273) = 1,794.55, p < .001, RMSEA = .06 (90% CI [.06, .06]), although it was evident the Machiavellianism items contributed more to the general factor than the narcissism items. The authors concluded that a bi-factor model was more appropriate, and that Machiavellianism and psychopathy should be subsumed as one factor. In testing this, the bi-factor CFA with two specific factors (N = 17,740 participants) yielded χ2 [297] = 16,651.80, p < .001, CFI = .97, TLI = .97, RMSEA = .08, CI [.08, .08], which fitted better than other tested models (according to the CFI, TLI and RMSEA criteria; Hu & Bentler, 1999; Kim et al., 2016; Persson et al., 2017). Gamache et al. (2018; N = 405, M = 31.0, SD = 12.0) found a bi-factor ESEM had the best fit, given the highest CFI and TLI and lowest RMSEA (χ2 = 426.67 (227), p < .05, CFI = .96, TLI = .94, RMSEA = .05, CI [.04, .05]) compared to the bi-factor CFA model (χ2 = 655.15 (273), p < .001, CFI = .92, TLI = .90, RMSEA = .06, CI [.05, .06]), and orthogonal three-factor ESEM (χ2 = 553.26 (250), p < .001, CFI = .93, TLI = .91, RMSEA = .06, CI [.05, .06]). All items loaded onto a global Dark Triad factor, in particular raising concerns about the psychopathy coverage, which may suggest psychopathy’s items assess characteristics shared by all three constructs (Gamache et al., 2018). Klimczak and Turska (2020), in testing Jones and Paulhus’s (2014) original three-factor classic model, demonstrated a strong correlation between psychopathy and Machiavellianism (r = .94, p < .001) among Polish youths (N = 405; M = 14.4, SD = 1.1).
They tested a bi-factor Dark Dyad model with psychopathy and Machiavellianism as two dimensions which best fitted the data, while also showing that Persson et al.’s (2017) model was not the best fit. There appears overlap between SD3 constructs, unfortunately a common issue for multifaceted constructs.
A bi-factor ESEM model was found to be the best-fitting model in the Chinese SD3 translation (N = 507, M = 20.5, SD = 1.1). Zhang et al.’s (2020) bi-factor ESEM (with one general factor and the three personality constructs as separate factors) fitted better (χ2 [249] = 450.73, p < .001, CFI = .93, TLI = .90, RMSEA = .04, CI [.03, .05]) than two other tested factor models (correlated three-factor CFA and orthogonal three-factor ESEM models), which demonstrated lower CFIs and TLIs and higher RMSEAs across two different samples. Given the specific bi-factor model design (i.e., whereby SD3 items were indicative of both one general factor and the specific three personality constructs), this is more similar to the original version, as opposed to models whereby psychopathy and Machiavellianism have been subsumed as one factor. However, in Zhang et al.’s (2020) study, some narcissism factor loadings were low, but this likely related to the collectivist sample, with similar results to Park et al. (2021) in Korea. Moreover, Bonfá-Araujo et al. (2021), when testing a Brazilian sample (N = 1,965, M = 25.5, SD = 9.0) using CFA, found the three-factor model was poor (χ2 (321) = 3,589.98, p < .001, RMSEA = .07, CFI = .86, TLI = .85), but the bi-factor model (with one general factor and the personality constructs) yielded an improvement (χ2 (294) = 2,714.41, p < .001, RMSEA = .07, CFI = .90, TLI = .88) according to the aforementioned criteria (Hu & Bentler, 1999; Kim et al., 2016). Like Zhang et al. (2020), Bonfá-Araujo et al. (2021) concluded a general underlying factor was likely present, which could evidence a continuum of constructs rather than separate construct entities. Moreover, Machiavellianism and psychopathy correlations were the highest (r = .60, p < .001), indicating the existence of a large general factor (Bonfá-Araujo et al., 2021).
They therefore suggested the construct might be better conceptualised on a spectrum of narcissism-Machiavellianism-psychopathy, and future research ought to consider a unidimensional model. These findings align with emerging frameworks such as the Dark Core model (Moshagen et al., 2018), which conceptualises malevolent constructs as facets of a single antagonistic dimension, thereby supporting both bifactor and unidimensional approaches. Moreover, the newly developed SD4 (Paulhus et al., 2021), which includes 12 items per construct and introduces everyday sadism as a fourth construct, was designed to improve psychometric precision and address the SD3’s limitations in construct differentiation and structural validity.
Content validity evidence converges towards a potential bi-factor model (i.e., Dark Dyad) with Machiavellianism and psychopathy subsumed, and a separate narcissism factor. Such findings were also reflected in Vize et al.’s (2018) meta-analysis of the Dark Triad, whereby the psychopathy and Machiavellianism nomological networks overlapped substantially, to the extent they were nearly indistinguishable, and conclusions suggested that Machiavellianism was better understood as a measure of psychopathy. Theoretically, Machiavellianism and psychopathy could be the dark core, while narcissism contains brighter, rather than conceptually dark, characteristics, querying whether flawed theoretical concepts underpin the SD3 (Szabó et al., 2021). Similarly, Muris et al. (2017), using a meta-analytic method, argued the Dark Triad constructs were not sufficiently distinct and that the SD3 is too simple a measure to capture such malevolent and complex constructs. Other evidence suggests a continuum of the three constructs, given an underlying dark construct (also a bi-factor model) found among studies (e.g., Bonfá-Araujo et al., 2021). Taken together, evidence converges towards a bi-factor model, although disagreement exists about the specific conceptualisation and structure of the bi-factors. Underlying the aforementioned model fit concerns, themes in research continually highlight that the SD3 may be built on a flawed theoretical conceptualisation of the Dark Triad (Muris et al., 2017; Vize et al., 2018).
Construct Validity
Construct validity concerns a psychometric measure’s ability to measure its intended construct and whether the test correlates with theory (Pelz, 2022). Types of construct validity include convergent and divergent. Convergent validity is the ability of a measure to correspond to other measures, demonstrated by positive correlations between measures of related constructs (Krabbe, 2017). Carlson and Herdman’s (2010) criteria suggest acceptable convergent validity is achieved if the correlation coefficient is r > .50, ideally r > .70. Contrastingly, divergent validity is how much a test deviates from another measure whose underlying construct is different, i.e., no/negative correlation suggests good divergent validity (Krabbe, 2017).
Convergent and Divergent Validity With Long Dark Triad Psychometrics
Jones and Paulhus (2014) originally tested convergent validity against standard measures of each of the Dark Triad constructs: MACH-IV (Christie & Geis, 1970), SRP-III (Williams et al., 2003), and NPI-40 (Raskin & Terry, 1988). Among N = 230 adults, Jones and Paulhus (2014) found each subscale correlated well with these tools, for example, positive correlations between SD3 Machiavellianism and MACH-IV (r = .68, p < .01), SRP-III and SD3 psychopathy (r = .78, p < .01), and NPI-40 and SD3 narcissism (r = .70, p < .01). They claimed the SD3 subscales are appropriately distinct and comparable to related tools, with acceptable correlations (i.e., r > .50; Carlson & Herdman, 2010; Jones & Paulhus, 2014). Malesza et al. (2017) compared the SD3 subscales with the same psychometrics (N = 384, M = 23.0, SD = 1.8). The SD3 showed correlations ranging from r = .28, p < .01 (SRP-III antisocial behaviour facet with SD3 narcissism) to r = .71, p < .01 (SRP-III manipulation facet with SD3 psychopathy), providing evidence of the convergent validity between the variables (Malesza et al., 2017). Overall, these studies show convergent/divergent validity for the SD3 scales with longer psychometrics, albeit some with lower correlations than Carlson and Herdman’s (2010) criteria (r > .50), such as Malesza et al.’s (2017) result for the SRP-III antisocial behaviour facet and SD3 narcissism.
Convergent and Divergent Validity With the Big Five
Studies have focused on comparing the SD3 with personality measures such as the Big Five (Goldberg, 1993). O’Boyle et al.’s (2014) meta-analysis found SD3 narcissism in general correlated positively with extraversion and negatively with agreeableness, concurring with theoretical underpinnings that, from a Big Five perspective, individuals high in narcissism are considered “disagreeable extraverts” (Paulhus, 2001, p. 228). Moreover, SD3 Machiavellianism and psychopathy correlated negatively with Big Five agreeableness and conscientiousness, suggesting evidence of divergent validity (O’Boyle et al., 2014). However, as previously described conceptually, it would be expected that Machiavellianism would demonstrate less divergent validity with conscientiousness than psychopathy, suggesting a potential lack of distinctness between Machiavellianism and psychopathy. Vize et al. (2018) highlighted this issue by showing that Machiavellianism, as measured by the SD3, correlates with low conscientiousness to a similar extent as psychopathy. They argued that this undermines the conceptual distinctiveness of Machiavellianism, particularly since higher conscientiousness has traditionally been seen as a distinguishing feature.
Čopková and Šafár (2021) examined convergent/divergent validity of the Slovak SD3 using the NEO-Five Factor Inventory (NEO-FFI; N = 333, M = 26.5, SD = 11.4; Costa & McCrae, 1992). In considering their results, using Spearman’s Rho, SD3 psychopathy was “slightly” (p. 659) negatively correlated with openness (r = −.19, p < .01) and conscientiousness (r = −.31, p < .01), and had a “strong” (p. 659) significant negative relationship with agreeableness (r = −.62, p < .01; Čopková & Šafár, 2021). Psychopathy’s negative correlation with agreeableness confirms theory suggesting a typical characteristic of someone who scores highly on psychopathy is coldness in interpersonal relationships (Čopková & Šafár, 2021). Considering further results, Čopková and Šafár (2021) found no relationship between narcissism and conscientiousness (r = .03, p > .05), which was to be expected given that they are theoretically dissimilar, demonstrating divergent validity (Carlson & Herdman, 2010).
Convergent and Divergent Validity With HEXACO
Given the Big Five’s theoretical limitations (e.g., factor analysis over-reliance, construct broadness), some researchers have opted to use the HEXACO (Big Five with “honesty-humility”; Ashton & Lee, 2007). In a Chinese sample, Zhang et al. (2020) found correlations between the SD3 and HEXACO: Machiavellianism significantly correlated negatively with all HEXACO honesty-humility facets, with values ranging from r = −.30, p < .01 for modesty to r = −.48, p < .01 for sincerity, demonstrating divergent validity. Moreover, Pailing et al. (2014) found convincing convergent/divergent results (N = 159; M = 29.3, SD = 11.1): positive significant Spearman Rho correlations between SD3 narcissism and HEXACO extraversion (r = .49, p < .01), while other SD3 constructs did not significantly correlate (divergent). Although Pailing et al. (2014) claimed evidence of convergent validity for SD3 narcissism and HEXACO extraversion, according to Carlson and Herdman’s (2010) criteria (r > .50) the positive correlation is not quite within the acceptable range to demonstrate this.
Convergent and Divergent Validity With Other Measures
Some researchers have opted to examine the convergent/divergent validity using other measures. For example, Pechorro et al. (2019) examined convergent/divergent validity among 412 adolescent offenders (M = 13.2, SD = 1.4) using an array of psychometrics considered theoretically linked to the SD3 (DD, Jonason & Webster, 2010; Self-Report Delinquency Scale, SRD, Elliott et al., 1985; Rosenberg Self-Esteem Scale, RSES, Rosenberg, 1965; Brief Self-Control Scale, BSCS, Tangney et al., 2004). Pechorro et al.’s (2019) results suggest “good” (p. 281) convergent validity between the DD and the SD3 (Machiavellianism and DD r (men) = .50, p < .001 and r (women) = .51, p < .001; narcissism and DD r (men) = .61, p < .001 and r (women) = .57, p < .001; psychopathy and DD r (men) = .53, p < .001 and r (women) = .46, p < .001). Similar findings were reported between the SD3 and SRD scale (Machiavellianism and SRD r (men) = .42, p < .001 and r (women) = .26, p < .001; narcissism and SRD r (men) = .46, p < .001 and r (women) = .18, p < .05; psychopathy and SRD r (men) = .43, p < .001 and r (women) = .22, p < .001). Conversely, they found null/negative correlations between SD3 Machiavellianism, narcissism and psychopathy and the RSES and BSCS (i.e., null/low negative correlations: Machiavellianism and RSES r (men) = .20, p > .05 and r (women) = .04, p > .05; narcissism and RSES r (men) = −.01, p > .05 and r (women) = .03, p > .05; psychopathy and RSES r (men) = .05, p > .05 and r (women) = −.10, p > .05; Machiavellianism and BSCS r (men) = −.18, p < .05 and r (women) = −.16, p < .05; narcissism and BSCS r (men) = −.06, p > .05 and r (women) = −.08, p > .05; psychopathy and BSCS r (men) = −.16, p < .05 and r (women) = −.15, p < .05), which demonstrate divergent validity (Pechorro et al., 2019). Pechorro et al. (2019) concluded the latter results corroborate theoretical underpinnings: self-esteem is likely low for all SD3 constructs, which are typically characterised by low self-control.
The results overall suggest good convergent/divergent validity with theoretically-linked measures (Pechorro et al., 2019).
Further support for the SD3 is offered by Francis and Crea (2021), who found that emotionality was significantly positively correlated with Machiavellianism and subclinical psychopathy, but negatively correlated with all three Bright Triad traits (emotional intelligence, purpose in life, and intrinsic religiosity). Extraversion was unrelated to the Dark Triad traits but showed positive associations with the Bright Triad traits. Age and sex differences also emerged in expected directions, with older participants reporting lower levels of Machiavellianism and men scoring higher on narcissism, lending some support to the construct validity of the SD3.
However, it could be argued that construct validity is not comprehensively established in this study. The authors did not conduct confirmatory factor analysis to evaluate the SD3’s underlying factor structure, nor did they provide an in-depth assessment of criterion-related validity. Moreover, the need to remove poorly performing items and the low internal consistency coefficients (all α < .7) raise concerns about both the conceptual breadth and measurement stability of the SD3 in this context. These limitations potentially weaken the case for strong construct validity within this specific population.
Factor Structure and Dimensionality
Construct validity also encompasses the extent to which a scale’s structure aligns with theoretical expectations. Multiple studies have raised concerns about the SD3’s structural validity, particularly regarding the distinctiveness of Machiavellianism and psychopathy (e.g., Gamache et al., 2018; Persson et al., 2017). Denovan et al. (2024) provide particularly robust evidence for this issue, using ESEM and Rasch analysis across three national samples (UK, Canada, and Russia). Their analyses supported a three-factor bifactor model, indicating the presence of a general “dark” factor alongside the three specific traits. However, they noted strong intercorrelations between the Machiavellianism and psychopathy factors, raising questions about the empirical distinctiveness of these traits — a finding consistent with earlier critiques (e.g., Vize et al., 2018).
Siddiqi et al. (2020) similarly questioned the SD3’s structural validity in a large Indian sample. Confirmatory factor analysis showed poor model fit for the original three-factor structure (CFI = 0.569, GFI = 0.888, AGFI = 0.869; CMIN/df = 4.365), falling below recommended thresholds. Exploratory factor analysis instead yielded a two-factor structure, with Machiavellianism and psychopathy loading onto a shared “dark dyad” factor, while narcissism emerged as a distinct construct. A bifactor model further supported this distinction, indicating that Machiavellianism and psychopathy reflected a general dark personality core, but narcissism did not. Reliability estimates were modest (dark personality: CR = 0.68, AVE = 0.18; Dark Dyad: CR = 0.12, AVE = 0.06; narcissism: CR = 0.46, AVE = 0.18). While the dark dyad factor was significantly associated with higher social dominance, lower morality, and lower empathy, narcissism showed weak or non-significant links with these outcomes. Overall, the findings suggest that the SD3 does not clearly distinguish all three traits in this cultural context, with narcissism emerging as a comparatively less maladaptive and more conceptually distinct dimension.
Moreover, Denovan et al. (2024) identified Differential Item Functioning (DIF) across countries, suggesting that some SD3 items may not operate equivalently across cultural contexts. These results highlight the need for ongoing refinement of the SD3 to ensure both structural validity and cross-cultural applicability. While the overall model fit was encouraging, the study ultimately concluded that the scale’s dimensionality may not be fully adequate, particularly in terms of discriminating Machiavellianism from psychopathy.
Norms
Norms, established through control groups, are essential for test interpretation at individual/group level (Ruble, 2017). Although Jones and Paulhus (2014) presented means regarding the sample’s scores on each Dark Triad construct, they did not provide normative data. Norms are only available to view when an individual has personally taken the SD3 online, either at https://www.psytoolkit.org/survey-library/short-dark-triad.html or https://openpsychometrics.org/tests/SD3/. The former, PsyToolKit (2021), claims the SD3 normative data on their webpage was sourced from Professor Paulhus’ website; however, the webpage link provided does not work. In accessing Professor Paulhus’ website (Paulhus, n.d.) via Google, only the Short Dark Tetrad (SD4; Paulhus et al., 2021) is available, with no information regarding an SD3 normative sample. Therefore, it is unclear where the information provided by PsyToolKit (2021) originated; however, it is claimed that for Machiavellianism a score >3.86 is outside of normal range (M = 3.1, SD = 0.76), for narcissism a score >3.68 is outside of normal range (M = 2.8, SD = 0.88), and for psychopathy a score >3.40 is outside of normal range (M = 2.4, SD = 1.0).
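A simple arithmetic check shows that the cut-offs quoted above coincide with the reported mean plus one standard deviation for each construct. This is an observation from the published figures only; whether PsyToolKit intended this rule is an assumption, and the function name below is illustrative. The final line estimates the share of respondents who would exceed such a cut-off if scores were normally distributed, which has not been established for the SD3.

```python
from statistics import NormalDist

# (mean, SD, quoted cut-off) as given in the text for each construct
reported = {
    "Machiavellianism": (3.1, 0.76, 3.86),
    "narcissism": (2.8, 0.88, 3.68),
    "psychopathy": (2.4, 1.0, 3.40),
}


def upper_cutoff(mean, sd, k=1.0):
    """Score lying k standard deviations above the mean (k = 1 is an assumption)."""
    return mean + k * sd


for construct, (mean, sd, cutoff) in reported.items():
    computed = upper_cutoff(mean, sd)
    matches = abs(computed - cutoff) < 0.005  # tolerance for rounding
    print(construct, round(computed, 2), matches)

# Proportion above mean + 1 SD under an assumed normal distribution
share_above = 1 - NormalDist().cdf(1.0)
print(round(share_above, 3))
```

If the one-standard-deviation reading is correct, roughly one in six respondents would be labelled as outside the normal range under normality, which reinforces the need for caution when interpreting these thresholds.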
Regarding the norms available on the OpenPsychometrics (2021) website, OpenPsychometrics reported being aware the SD3 did not have a normative sample, and claimed this was because most research was conducted using convenience samples rather than control groups. Therefore, OpenPsychometrics (2021) reported they developed SD3 norms by producing a statistical estimate of a normative sample across several studies. They used Google Scholar to search for studies to include, and claimed they chose 10 studies based on “quality” (para. 8), those that reported the mean and standard deviation, and had a “large sample size” (para. 8), while “studies with similar samples to those already collected were skipped” (para. 8). OpenPsychometrics (2021) claimed that the “weighted average of the samples” (para. 9) scores of the 10 included studies produced the following norms: Machiavellianism M = 2.96, SD = 0.65; Narcissism M = 2.97, SD = 0.61; Psychopathy M = 2.09, SD = 0.63. However, caution must be taken when interpreting the data available on the OpenPsychometrics (2021) webpage, given that the Butler (2015) and Adler (2017) studies are both unpublished theses. Baughman et al.’s (2012) study was also dated before Jones and Paulhus (2014) published their original SD3 study. Moreover, OpenPsychometrics (2021) comment on their webpage that “the estimated distribution for US adults [presented in a graph on the webpage] is probably not very good because so far research has only been done with convenience samples and the assumptions of the model are not fit particularly well” (para. 3). Given these concerns, and as Jones and Paulhus (2014) did not provide norms in their original study, and percentiles were not provided for the above-mentioned data, limitations certainly arise regarding the efficacy of the PsyToolKit (2021) and OpenPsychometrics (2021) norms claims.
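To make the idea of a “weighted average of the samples” concrete, the sketch below computes a sample-size-weighted mean across studies. The exact weighting scheme used by OpenPsychometrics (2021) is not described in detail, so weighting by sample size is an assumption here, and the study values are hypothetical rather than taken from the 10 included studies.

```python
def weighted_mean(means, sample_sizes):
    """Mean of study means, each weighted by its sample size."""
    if len(means) != len(sample_sizes) or not means:
        raise ValueError("need one sample size per study mean")
    total_n = sum(sample_sizes)
    if total_n <= 0:
        raise ValueError("sample sizes must sum to a positive number")
    return sum(m * n for m, n in zip(means, sample_sizes)) / total_n


# Hypothetical subscale means from two studies of different size
study_means = [2.0, 3.0]
study_ns = [100, 300]

print(weighted_mean(study_means, study_ns))  # prints 2.75
```

The larger study dominates the estimate, so any bias in a large convenience sample carries through to the resulting norm.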
Culture/Gender Norms
In considering norms, it is a limitation that available data from PsyToolKit (2021) and OpenPsychometrics (2021) are not separated by gender. Jones and Paulhus (2014) acknowledged that men scored significantly higher on all SD3 constructs in alignment with evolutionary/social theory; however, supporting research has been relatively mixed, partially due to unequal sex ratios within samples (Burtăverde et al., 2023). Nevertheless, Pineda et al. (2020) found males scored higher on all constructs (t [452] = 4.31, p < .001) as expected, mirroring Malesza et al.’s (2017) study (t [479] = 14.17, p < .01) and Zhang et al.’s (2020) Chinese SD3. Pechorro et al. (2019) found male adolescents scored higher on all constructs: Machiavellianism (males M = 17.9, females M = 16.0); narcissism (males M = 18.6, females M = 17.3); psychopathy (males M = 14.0, females M = 11.5). In considering gender/cultural differences, Burtăverde et al. (2023), using factor analysis among a Romanian population (N = 4,134, M = 29.0, SD = 11.4), found that some Machiavellianism factors in the CFA model did not load for both men and women and that, for the ESEM model, some poor loadings were evident in the women’s sample. However, in their final model tested, Burtăverde et al. (2023) concluded that there was “strict invariance across gender” (p. 11; that is, the model met certain statistical criteria that provided evidence that the same construct was being measured and interpreted the same way for both genders). Burtăverde et al. (2023) found the SD3 was problematic to use in developing countries where individuals scored higher on the Dark Triad constructs due to competitive living/work environments. It is concerning that gender/age/culture norms are unaccounted for; this is an area to which future research can contribute.
Such research can heighten the understanding of personality structures across protected characteristics and provide baseline information to enhance accurate result comparisons within certain populations. This will ultimately improve the accuracy of research using the SD3 by ensuring that test results are applicable to population norms.
Conclusions
The rationale of the SD3 put forward by Jones and Paulhus (2014) was clear: to address the limitations of the DD by providing distinct definitions of each construct, ensuring theoretical underpinnings were incorporated into the three-factor model. Within the Dark Triad field, the SD3 has been used extensively within research, as demonstrated by citation numbers.
For reliability, the SD3 holds acceptable internal consistency among studies. However, a lower than expected Cronbach’s alpha was found for narcissism, and issues of low internal consistency among adolescents were reported. Research considering inter-rater and parallel-forms reliabilities is unavailable; however, this is not necessarily a weakness and instead relates to test design.
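For readers less familiar with how internal consistency is quantified, the following is a minimal sketch of the standard Cronbach’s alpha calculation for a single subscale. The function name `cronbach_alpha` and the response data are hypothetical and are not drawn from any SD3 study; the sketch also assumes that any reverse-keyed items have already been recoded.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one subscale.

    item_scores: list of respondents, each a list of k item scores.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])
    # Transpose so that each tuple holds all responses to one item.
    items = list(zip(*item_scores))
    sum_item_vars = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)


# Hypothetical responses from five people to a four-item subscale (1-5 scale).
hypothetical_responses = [
    [4, 3, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 4],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
]
print(f"alpha = {cronbach_alpha(hypothetical_responses):.2f}")
```

Alpha rises with both the average inter-item correlation and the number of items, so brief subscales such as the nine-item SD3 scales can be expected to yield lower values than longer construct-specific measures, all else being equal.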
Although SD3 reliability appears positive, a differing picture emerges for validity. Concurrent validity has been assessed by comparing the SD3 and DD, with relatively low correlations found between the two measures (Jonason & Webster, 2010; Jones & Paulhus, 2014). A better picture was presented for incremental validity; however, further research is needed in this area. Limitations arose due to a lack of knowledge regarding predictive validity. Longitudinal data, measuring the Dark Triad against subsequent diagnoses/criminal behaviour, are required to provide evidence of predictive validity and confirmation of theoretical underpinnings. Researchers claim good convergent/divergent correlations (construct validity) between the SD3 and longer Dark Triad psychometrics and personality tests; however, many correlations were below the recommended cut-off, demonstrating potential over-optimism among researchers, perhaps due to a personal investment in their SD3 translation (Glen, 2020). Given the Big Five limitations, further convergent/divergent research between the SD3 and personality psychometrics is needed using a medical model (e.g., ICD-11, DSM-5; American Psychiatric Association, 2013; WHO, 2019). Issues were noted regarding the SD3’s content validity: a bi-factor model better fits the data, and high inter-item correlations between Machiavellianism and psychopathy suggest that these constructs overlap and ought to be subsumed, while narcissism is a distinct, brighter construct. Some have proposed the Dark Triad as a spectrum (Bonfá-Araujo et al., 2021).
Issues arose in the lack of published norms (Kline, 1986). Cultural norms are important to establish, given the significant body of research regarding SD3 translations. It appears that the SD3 may have fallen victim to a Westernised view of personality, and without standardised cultural norms it is difficult to assess the applicability of Dark Triad constructs in non-Western countries.
Although further research regarding norms and predictive validity would be beneficial, it is recommended that rather than continuing to assess the SD3’s psychometric properties and creating language translations, future research should focus on the development of a new Dark Triad psychometric, addressing the aforementioned psychometric shortcomings. Moreover, research is required on the newly developed SD4 (Dark Tetrad, including sadism; Paulhus et al., 2021); as yet it is unclear as to whether the SD4 accounts for the SD3’s limitations. In addition, as highlighted in the introduction, recent research has raised fundamental concerns regarding the conceptualisation of the Dark Triad constructs (e.g., Naor-Ziv et al., 2022; Rose et al., 2023), further suggesting that a reassessment of these constructs is needed. Promising developments include the HEXACO-based Machiavellianism scales (Marcus et al., 2023), the Five-Factor Model Antagonistic Triad Measure (Rose et al., 2023), and the newly developed Facet-level Dark Triad Scale (Martindale et al., 2022), which each address key conceptual and structural limitations of earlier measures. These emerging tools highlight the importance of revisiting the theoretical foundations of the Dark Triad and developing psychometric instruments that better reflect its multidimensional nature.
In summary, the SD3’s merits are potentially outweighed by the reliability and validity limitations presented in this review. For use within research and clinical/forensic contexts, the SD3 likely does not provide an accurate, theoretically grounded reflection of the Dark Triad constructs, even as a brief screening measure. It is recommended that researchers and practitioners exercise caution should the SD3 be used, while clinical judgement and further psychometric testing are certainly required in the interpretation and differentiation of each construct, in line with medical models of personality. Moreover, narcissism would benefit from being examined separately, given the view that it is a brighter, distinct construct compared to Machiavellianism and psychopathy.
Until conceptually sound alternatives are developed and empirically validated, researchers may, where appropriate, wish to draw on the original, construct-specific measures to assess individual constructs. While these tools (e.g., the MACH-IV; Christie & Geis, 1970; the SRP-III; Williams et al., 2003; and the NPI or NPI-40; Raskin & Hall, 1979; Raskin & Terry, 1988) are themselves subject to critique and require further meta-analytic evaluation, they may offer more detailed coverage of their respective constructs than the brief SD3. However, given the increasing evidence of construct overlap and theoretical ambiguity within the Dark Triad, researchers should ensure that their choice of measure is informed by a clearly defined conceptual rationale.
Supplemental Material
Supplemental Material - A Critical Review of the Short Dark Triad (SD3)
Supplemental Material for A Critical Review of the Short Dark Triad (SD3) by Qiyangfan Feng, Louise Latham and Zoe Stephenson in Personality Science
Footnotes
Author Note
Not applicable.
Acknowledgements
Not applicable.
Author Contributions
Conceptualisation (Dr. Louise Latham); Literature Search and Data Analysis (Dr. Louise Latham; Dr. Zoe Stephenson); Draft (Dr. Louise Latham); Critical Revision (Dr. Zoe Stephenson; Dr. Louise Latham). The paper was submitted to the SAGE-track system.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical Consideration
This is a review manuscript. No ethical approval or informed consent was required.
Data Accessibility Statement
Not applicable.
Supplemental Material
Supplemental material for this article is available online. Depending on the article type, these usually include a Transparency Checklist, a Transparent Peer Review File, and optional materials from the authors.
Notes
Not applicable.
References
