Abstract
With a recent surge of research on narcissism, narcissism questionnaires are increasingly being translated and applied in various countries. The measurement invariance of an instrument across countries is a precondition for being able to compare scores across countries. We investigated the cross-cultural measurement invariance of three narcissism questionnaires (Brief Pathological Narcissism Inventory [B-PNI], Narcissistic Personality Inventory [NPI], and Narcissistic Admiration and Rivalry Questionnaire [NARQ]) and mean-level differences across samples from the United States (
Keywords
In recent years, there has been an increased interest in studying narcissism, which is justified by findings that narcissism predicts important outcomes such as counterproductive work behaviors (Penney & Spector, 2002), being unemployed (Leckelt et al., 2019), achieving a leadership position, and getting divorced (Wetzel et al., 2019). Narcissism questionnaires are increasingly being applied in languages and countries other than those in which they originated. For example, the Narcissistic Admiration and Rivalry Questionnaire (NARQ; Back et al., 2013) was developed in German and English and has now been translated and validated in several other languages including Polish (Rogoza et al., 2016), Spanish (Doroszuk et al., 2020), and Italian (Vecchione et al., 2018). However, in most cases, instruments are adapted to other languages without checking the measurement invariance of the translated version with the original. Testing for measurement invariance is essential to ensure that the same construct is being measured and that different versions of the instrument function the same way. Measurement invariance is highly relevant to the generalizability of research on personality traits. For example, if the narcissism facet admiration does not have the same meaning for, say, Americans and Germans, research on narcissism conducted with Americans cannot be generalized to Germans. In the same way, if Germans respond to certain narcissism items differently than Americans, despite having the same latent trait level, comparisons across the two groups need to take these differences into account to still be accurate. Measurement noninvariance can occur even between similar cultures speaking the same language (Doroszuk et al., 2020) as well as between ethnicities within one country (Wetzel et al., 2017).
The goal of this study was to test the measurement invariance of three popular narcissism questionnaires across three countries, the United States, the United Kingdom, and Germany.
Establishing Measurement Invariance
There are many reasons why measures might not work equivalently across countries and cultures. The most critical reason for inequivalence would be that the construct does not exist in the same form in different cultures (i.e., a lack of conceptual equivalence). In addition to issues of conceptual equivalence, heterogeneity across countries in terms of the quality of item translations, the relevance of concepts expressed in items, and whether culturally specific knowledge is needed to fill out the items, may affect the measurement properties of the instrument.
Analyzing the equivalence of measures is a straightforward enterprise. In these analyses, it is first investigated whether the factor structure (number of factors and the pattern of salient and nonsalient loadings) is equivalent across countries. If this is the case, further restrictions representing stronger degrees of equivalence can be tested. Second, by constraining the factor loadings to equality across countries, it can be tested whether the items relate to the trait in the same way in the different countries. Third, by constraining the item intercepts to equality across countries, it can be tested whether the observed means conditional on the trait level are the same across countries. Fourth, by constraining the items’ residual variances to equality across countries, it can be tested whether the amount of variance in the items not accounted for by the trait is the same across countries. Invariance in the general factor structure is referred to as configural invariance, invariance in factor loadings is referred to as metric invariance, invariance in factor loadings and item intercepts is referred to as scalar invariance, and invariance in factor loadings, item intercepts, and residual variances is referred to as strict invariance (Meredith, 1993). More detailed information on the process of testing for measurement invariance can be found in Vandenberg and Lance (2000) and Widaman and Reise (1997). Often, full invariance (i.e., invariance for all items) does not hold. In this case, equality constraints can be relaxed for the noninvariant parameters and partial invariance can be achieved (Byrne et al., 1989; Steenkamp & Baumgartner, 1998). Unbiased estimates of the latent mean differences between the countries can then be obtained from the final (partial) invariance model when there are few noninvariant items relative to the number of invariant items (Guenole & Brown, 2014).
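The nested constraints described above can be written compactly for a linear factor model. The following is a sketch in standard notation that does not appear in the source; the symbols are assumptions introduced here: for respondent i in country g, x is the vector of item responses, τ the item intercepts, Λ the factor loadings, η the latent traits, and Θ the residual covariance matrix.

```latex
\begin{align*}
\mathbf{x}_{ig} &= \boldsymbol{\tau}_g + \boldsymbol{\Lambda}_g \boldsymbol{\eta}_{ig} + \boldsymbol{\varepsilon}_{ig},
  \qquad \operatorname{Cov}(\boldsymbol{\varepsilon}_{ig}) = \boldsymbol{\Theta}_g \\
\text{configural:} &\quad \text{same pattern of zero and nonzero entries in } \boldsymbol{\Lambda}_g \text{ for all } g \\
\text{metric:} &\quad \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda} \\
\text{scalar:} &\quad \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda},\ \boldsymbol{\tau}_g = \boldsymbol{\tau} \\
\text{strict:} &\quad \boldsymbol{\Lambda}_g = \boldsymbol{\Lambda},\ \boldsymbol{\tau}_g = \boldsymbol{\tau},\ \boldsymbol{\Theta}_g = \boldsymbol{\Theta}
\end{align*}
```

Under partial invariance, the equality constraint is dropped only for the individual elements of Λ or τ that are noninvariant, while all remaining elements stay constrained across countries.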
Some prior research has examined the measurement equivalence of specific narcissism measures across several countries. For example, Leckelt et al. (2018) found that metric invariance held across German and combined U.S. and U.K. samples in a short version of the NARQ (NARQ-S), though they did not test for scalar invariance. Doroszuk et al. (2020) found partial scalar invariance between Spain, Chile, and Colombia for the Spanish NARQ. Zemojtel-Piotrowska et al. (2018) investigated measurement invariance across samples from the United Kingdom, Japan, and Poland in a short version of the Narcissistic Personality Inventory (NPI), the NPI-13 (Gentile et al., 2013), but their scalar model did not converge, which according to additional analyses may have been due to noninvariant items on the entitlement/exploitativeness facet. Meisel et al. (2016) found that metric invariance did not hold for the 40-item NPI between U.S. and Chinese university students. Thus, prior research on the measurement equivalence of narcissism has been somewhat unsystematic as different forms of invariance have not been consistently tested across countries (e.g., several studies only investigated metric, but not scalar, invariance). Furthermore, previous studies only applied a pass/fail decision rule for whether measurement invariance existed and did not allow for partial invariance or consider the effect size of the noninvariance.
Cross-Cultural Differences in Narcissism
Americans tend to be perceived as more narcissistic than people from other countries (Campbell et al., 2010; Miller et al., 2015). For example, in a study on perceptions of national character, Miller et al. (2015) found that people from Basque Country, England, China, and Turkey rated Americans as more narcissistic than members of their own world region. It is unclear whether these perceptions are accurate or whether they are just stereotypes stemming from the portrayal of Americans in movies and the media (e.g., celebrities). Previous research on self-reported narcissism partly supports the perception that people in the United States are more narcissistic than people from other countries. For example, Foster et al. (2003) compared scores on the Narcissistic Personality Inventory (NPI; Raskin & Hall, 1979; Raskin & Terry, 1988) across five world regions and found that participants from the United States reported the highest NPI scores, followed by Europe, Canada, Asia, and the Middle East. Jonason et al. (2017) found that Americans showed higher narcissism scores than participants from Australia, Russia, Hungary, Brazil, and Japan, with Japanese participants showing the lowest narcissism scores. In contrast, Fukunishi et al. (1996) conducted an analysis of variance of NPI scores between students from the United States, Japan, and China and found the highest scores for Chinese students. While some studies appear to confirm the perspective that people in the United States are more narcissistic than people in other countries, results are still inconclusive. In addition, these previous studies failed to consider a fundamental issue necessary for making valid and comprehensive comparisons: establishing whether the narcissism measure was being used equivalently across countries.
Thus, in the present study, we investigated whether Americans on average are more narcissistic than individuals from two other Western countries, the United Kingdom and Germany, after establishing measurement invariance.
Differentiating Distinct Aspects of Narcissism
Existing research on cross-cultural differences in narcissism has focused on single measures of narcissism, which fails to reflect the often complex and multifaceted ways in which narcissism is currently defined and operationalized. Research on narcissism has emerged out of at least two traditions in psychology reflected in clinical understandings of the concept (Pincus et al., 2009) and more normal-range, personality-trait based conceptualizations of narcissism (Back et al., 2013; Robins et al., 2001). These different perspectives have led to a number of different questionnaires with heterogeneous narcissistic content. More recent research across traditions and measurement instruments shows that a three-dimensional distinction of agentic, antagonistic, and neurotic narcissism is more appropriate and allows one to disentangle functionally distinct aspects of narcissism with different correlates and outcomes (Back, 2018; Back & Morf, in press; Crowe et al., 2019; Krizan & Herlache, 2018; Miller et al., 2016). Here, we focus on three questionnaires that collectively capture all relevant narcissistic aspects: the NPI, the NARQ, and the Pathological Narcissism Inventory (PNI).
The NPI was developed to assess the
The PNI (Pincus et al., 2009) was designed to assess clinical forms of narcissism and therefore contains content more directly related to distress. Its vulnerability domain contains facets with neurotic content such as contingent self-esteem and hiding the self, while its grandiosity domain is more mixed and contains a facet with mostly antagonistic content (exploitativeness), a facet with mostly agentic content (grandiose fantasies), and a facet with mostly communal content (self-sacrificing self-enhancement). To provide comprehensive coverage of the different forms and operationalizations of narcissism, we investigated measurement invariance and mean differences across three countries on these three different narcissism measures.
The Present Study
In our study, we aimed at examining whether three narcissism questionnaires were equivalent across the United States, the United Kingdom, and Germany. If at least partial scalar measurement invariance was established, we additionally investigated whether perceptions of Americans as more narcissistic than natives of other countries are true by comparing Americans’ mean levels with those of people from the United Kingdom and Germany. We analyzed data from the Brief Pathological Narcissism Inventory (B-PNI; Schoenleber et al., 2015), the NPI (Raskin & Hall, 1979; Raskin & Terry, 1988), and the NARQ (Back et al., 2013), allowing us to capture potential differences in agentic, antagonistic, and neurotic aspects of narcissism. The data were collected in five countries (the United States, the United Kingdom, Germany, Italy, and Poland), but because only the U.S., U.K., and German samples filled out all three questionnaires, we focus our analysis on these three countries. In the supplementary online material (SOM; https://osf.io/53amg/), we additionally report analyses of measurement invariance and mean differences for samples from Italy (NPI and NARQ) and Poland (B-PNI and NARQ). Based on prior research we expected to find varying levels of measurement invariance and comparability of these measures across the United States, United Kingdom, and Germany.
Method
Samples
For all samples, only participants aged between 18 and 50 years were included in the analyses to make the age distributions across countries more similar. The purpose of this was to avoid confounding potential noninvariance across countries with noninvariance due to age or cohort (Wetzel et al., 2017).
German Sample
The German sample consisted of 925 participants (72% female) who filled out an online survey. Their mean age was 26.33 (
U.S. and U.K. Samples
The U.S. and U.K. samples were collected together in an online survey. The survey was hosted on www.yourpersonality.net and was available for people from all over the world. There was no specific recruitment strategy, but anyone who searched the Internet for personality tests or narcissism could have come across this website. For the purposes of this study, we only extracted data from participants who reported that their country of residence was the United States or the United Kingdom. The U.S. sample originally consisted of 2,954 participants aged between 18 and 50 years. We removed 210 participants who had participated more than once and 280 participants who failed one or both instructed response items. The final U.S. sample therefore consisted of 2,464 participants (76% female,
Descriptive Statistics for the Three Samples.
Measures
Mean scores, standard deviations, and omega reliabilities for facet scores by questionnaire and sample are shown in Table S1 in SOM 1. Table S2 shows observed score correlations among all facets for the U.S. sample. Descriptive statistics for the Italian and Polish samples, which were used for supplementary analyses, can be found in SOM 2.
B-PNI
The B-PNI (Schoenleber et al., 2015) was developed as a short version of the PNI (Pincus et al., 2009), which assesses pathological narcissism. The B-PNI retained the facet structure of the PNI, thus distinguishing between the subscales
NPI
The NPI (Raskin & Hall, 1979; Raskin & Terry, 1988; in German by Schütz et al., 2004) assesses grandiose, nonpathological narcissism with 40 item pairs. Each item pair consists of a narcissistic option (e.g., “I like to look at myself in the mirror”) and a nonnarcissistic option (e.g., “I am not particularly interested in looking at myself in the mirror”). Participants are instructed to select the option out of the pair that best describes their feelings and beliefs. The factor structure of the NPI has been a subject of debate. Here, we use the factor structure obtained by a Thurstonian item response analysis of NPI data, which takes the dependencies between items presented in a pair into account. This analysis yielded three facets (Wetzel, Roberts, et al., 2016):
NARQ
The NARQ assesses grandiose narcissism on two dimensions: admiration and rivalry.
In the U.S. and U.K. samples, the order in which the three questionnaires were presented was randomized. In the German sample, all participants filled out the questionnaires in the order NARQ, NPI, PNI. The data for all samples are available from https://osf.io/hbuqz/.
Analyses
We tested for cross-country measurement invariance in multigroup item response models. For the NPI, the underlying model was the Thurstonian item response model (Brown & Maydeu-Olivares, 2011), which is a two-parameter logistic (2PL) model for forced-choice data that takes the dependencies between the items presented in a pair into account. For the B-PNI and NARQ, we used the graded response model (Samejima, 1969), which is a 2PL model for data from ordered rating scales. In the graded response model, the probability of endorsing a certain response category or the ones above it is parameterized with thresholds and there are one fewer thresholds than response categories for each item (e.g., five thresholds for a 6-point rating scale as in the B-PNI and NARQ). We started with a fully constrained strict invariance model with factor loadings, item thresholds, and residual variances constrained to equality across countries. We implemented strict invariance (instead of scalar invariance) because allowing residual variances to vary across countries can obfuscate noninvariance in loadings and thresholds (Lubke & Dolan, 2003). Noninvariance was determined using the classification system developed by Educational Testing Service, which categorizes items into no or negligible, small to moderate, and moderate to large noninvariance (Zieky, 1993). Transformed into the metric of item response models, the cutoffs for moderate to large noninvariance are 0.25 for factor loadings and 0.375 for thresholds (Wetzel et al., 2017). We iteratively freed parameters (loadings and thresholds) with moderate to large noninvariance in the order of the size of their modification indices. That is, in the fully constrained model we determined which one of the parameters with moderate to large noninvariance had the largest modification index.
We then estimated the first partial invariance model in which we freely estimated this parameter in the country in which it showed noninvariance while constraining it to equality across the other two countries. Next, we determined which one of the remaining parameters with moderate to large noninvariance now had the largest modification index and freed it for the second partial invariance model, and so on. The advantage of this procedure for testing measurement invariance is that it is based on an effect size criterion for noninvariance, rather than significance testing (Wetzel et al., 2017). Therefore, large sample sizes and differing sample sizes across countries should not affect the results.
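The iterative freeing procedure can be sketched as a simple loop. This is a minimal illustration, not the authors' implementation: `fit` stands in for re-estimating the multigroup model, `toy_fit` and all parameter names and values are made up, and treating a difference exactly at the cutoff as flagged (`>=`) is an assumption. The sketch also ignores which country a parameter is freed in.

```python
from typing import Callable, Dict, List, NamedTuple, Set

# Cutoffs for moderate-to-large noninvariance as stated in the text.
CUTOFFS = {"loading": 0.25, "threshold": 0.375}


class ParamDiag(NamedTuple):
    kind: str          # "loading" or "threshold"
    difference: float  # size of the cross-country difference (effect size)
    mod_index: float   # modification index in the current model


def free_iteratively(
    fit: Callable[[Set[str]], Dict[str, ParamDiag]],
) -> List[str]:
    """Free flagged parameters one at a time, largest modification index first.

    `fit` is a hypothetical stand-in for estimating the model: given the set
    of already freed parameters, it returns diagnostics for every parameter
    that is still constrained to equality.
    """
    freed: List[str] = []
    while True:
        diagnostics = fit(set(freed))
        flagged = {
            name: d
            for name, d in diagnostics.items()
            if abs(d.difference) >= CUTOFFS[d.kind]  # assumption: inclusive cutoff
        }
        if not flagged:
            return freed
        # Free only the flagged parameter with the largest modification index,
        # then re-estimate before looking at the remaining ones.
        worst = max(flagged, key=lambda name: flagged[name].mod_index)
        freed.append(worst)


def toy_fit(freed: Set[str]) -> Dict[str, ParamDiag]:
    """Made-up diagnostics; a real refit would also update these values."""
    table = {
        "item8_loading": ParamDiag("loading", -0.62, 41.0),
        "item18_threshold2": ParamDiag("threshold", 0.50, 77.5),
        "item3_threshold1": ParamDiag("threshold", 0.10, 90.0),  # large MI, small effect
        "item5_loading": ParamDiag("loading", 0.05, 2.0),
    }
    return {name: d for name, d in table.items() if name not in freed}


order = free_iteratively(toy_fit)
print(order)
```

In the toy example, the parameter with the largest modification index overall is never freed because its difference falls below the effect size cutoff, which is the point of combining the two criteria.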
All analyses were conducted using unweighted least squares with mean- and variance-corrected Satorra–Bentler goodness-of-fit tests (denoted ULSMV in M
Results
In the following, we report the results of our measurement invariance analyses across the United States, the United Kingdom, and Germany as well as mean differences between participants from these countries. Results from the analyses including Italy and Poland are available in SOM 2.
B-PNI
We first checked configural invariance using the theoretical factor structure from Schoenleber et al. (2015) across the United States, the United Kingdom, and Germany. The fit of the configural model was good according to the root mean square error of approximation (RMSEA = 0.035, 90% confidence interval [CI] [0.033, 0.037]) and the standardized root mean square residual (SRMR = 0.041) and acceptable according to the comparative fit index (CFI = 0.941) and the Tucker–Lewis index (TLI = 0.932). The pattern of factor loadings indicated that the items loaded strongly on the facet they belonged to in all countries (see Table S3 in SOM 1).
The measurement invariance analyses of the B-PNI resulted in six noninvariant loadings for Germany while all loadings were invariant between the United States and the United Kingdom (see Table S4 in SOM 1). The largest difference in factor loadings occurred for Item 28 on the facet
In sum, the United States and the United Kingdom largely showed full measurement invariance on the B-PNI. There were some violations of measurement invariance between Germany and the United States and the United Kingdom, but partial measurement invariance was achieved, allowing us to investigate mean differences across countries.
The latent mean differences between the United States and the other two countries derived from the final partial measurement invariance model showed that the United States and the United Kingdom did not show mean-level differences on most of the B-PNI’s facets with the exception of self-sacrificing self-enhancement, which was lower in the United Kingdom,

Latent mean differences on the B-PNI facets exploitativeness (EXPL), self-sacrificing self-enhancement (SSSE), grandiose fantasy (GRFA), contingent self-esteem (COSE), hiding the self (HIDE), devaluing (DEVA), and entitlement rage (ENTI) between the United States, the United Kingdom, and Germany (D).
Latent Mean Differences and Effect Sizes for the Brief Pathological Narcissism Inventory.
We also checked whether there were notable fluctuations in the estimates of mean differences over the course of the measurement invariance models from strict invariance to the final partial invariance model. As Figures S1 to S7 in SOM 1 show, this was not the case. For facets with no noninvariant parameters such as self-sacrificing self-enhancement, the estimate stayed the same over the course of all models (Figure S2 in SOM 1). For facets with some noninvariant parameters such as exploitativeness for the German sample, the estimate changed when the noninvariance was taken into account by freeing parameters, resulting in an adjusted estimate of the mean difference. In the example of the mean difference between the United States and Germany on exploitativeness, the estimate from the strict invariance model would have underestimated the mean difference (
Finally, as an additional validity check, we compared the correlations between the B-PNI facets and age and gender across countries, both for the full invariance model (not adjusted for noninvariance) and the final partial invariance model (adjusted for noninvariance). The correlations were very similar across countries and models (see Table S6 in SOM 1). For example, self-sacrificing self-enhancement correlated −0.22 with age in the United States, −0.27 in the United Kingdom, and −0.23 in Germany in the final partial invariance model.
Thus, regarding the B-PNI facets, we found that Americans on average scored higher than individuals from the United Kingdom and Germany on the agentic facet self-sacrificing self-enhancement. Americans also scored higher than Germans on facets with neurotic content (hiding the self, contingent self-esteem).
At the level of the higher-order domains grandiosity and vulnerability, a similar number of parameters was noninvariant (3 loadings and 37 thresholds) as at the facet level, again mostly between Germany and the combined U.S. and U.K. samples (see Table S1 in SOM 3). Latent mean differences on grandiosity and vulnerability reflected those found for the facets and showed that Americans on average scored higher than participants from the United Kingdom and Germany on grandiosity. Americans also on average scored higher than Germans on vulnerability (see Figure S1 in SOM 3).
NPI
For the NPI, the configural invariance model showed a good fit according to the RMSEA, 0.021, 90% CI [0.020, 0.023], a just acceptable fit according to the SRMR (0.080), and a below acceptable fit according to the CFI (0.892) and the TLI (0.883). The pattern of factor loadings indicated that there were some items that showed poor factor loadings for all countries and that some factor loadings differed strongly across countries (see Table S7 in SOM 1). For example, the standardized factor loading on the vanity item 20 (“I try not to be a show off” vs. “I am apt to show off if I get the chance”) was 0.43 in the U.S. sample, 0.14 in the U.K. sample, and 0.22 in the German sample. This indicates that there might be differences in the factor structure across countries. Nevertheless, since the model overall fit acceptably, and considering that the factor structure of the NPI has been a disputed topic even in homogeneous samples (Ackerman et al., 2011; Emmons, 1984), we proceeded with the measurement invariance analyses.
The measurement invariance analyses indicated that 21 parameters were noninvariant (8 loadings and 13 [12%] thresholds). With two exceptions, these noninvariant parameters pertained to the German sample, indicating that the United States and the United Kingdom were largely invariant while Germany required its own loading or threshold on a number of items. For example, Item 4 on vanity (“When people compliment me I sometimes get embarrassed” vs. “I know that I am good because everybody keeps telling me so”) had an unstandardized loading of 1.51 in the combined U.S. and U.K. group and a lower loading in Germany (0.89). Thus, for eight NPI items, the relationship between the items and the underlying trait was different for the German sample compared with the combined U.S. and U.K. samples. With respect to the thresholds, a number of items also differed between Germany and the combined U.S. and U.K. samples. For example, Item 1 on leadership (“I have a natural talent for influencing people” vs. “I am not good at influencing people”) had a threshold of −0.35 in the combined U.S. and U.K. samples, but a threshold of −0.96 in the German sample. This indicates that—conditional on the trait level—German participants were more likely to choose the narcissistic option (“I have a natural talent for influencing people”) compared with U.S. and U.K. participants. In fact, 73% of the German participants selected the narcissistic option compared with 59% in the United States and 52% in the United Kingdom. Table S8 in SOM 1 contains a list of the noninvariant NPI items for each country. In sum, for the United States and the United Kingdom, the NPI functioned largely equivalently. For Germany, there were some violations of measurement invariance.
In the final partial invariance model, mean differences relative to the United States as the reference group were found for the United Kingdom on leadership and vanity, with U.K. participants on average scoring lower than U.S. participants,

Latent mean differences on the Narcissistic Personality Inventory facets leadership, vanity, and entitlement between the United States (US), the United Kingdom (UK), and Germany (D).
Latent Mean Differences and Effect Sizes for the Narcissistic Personality Inventory.
At the level of overall narcissism, fewer loadings were noninvariant (3 instead of 8). Furthermore, 15 thresholds were noninvariant, 13 of those for Germany (see Table S2 in SOM 3). The latent mean differences on overall narcissism indicated a higher mean for Germany compared with the United States and the United Kingdom (see Figure S2 in SOM 3).
NARQ
The fit of the configural invariance model with the factor structure from Back et al. (2013) was acceptable to good (RMSEA = 0.047, 90% CI [0.044, 0.049], SRMR = 0.051, CFI = 0.919, TLI = 0.905). All items showed substantial factor loadings on their respective facet in all countries (see Table S9 in SOM 1).
The measurement invariance analyses revealed 29 noninvariant parameters, all of them for the German sample, while the U.S. and U.K. samples were fully invariant (see Table S10 in SOM 1). Of these 29 noninvariant parameters, three were factor loadings. For example, Item 8 (“I deserve to be seen as a great personality”) was more strongly related to the trait admiration for U.S. and U.K. participants (unstandardized loading 1.67) than for participants from Germany (unstandardized loading 1.05). Of the 270 thresholds (5 thresholds for each of the 18 items times 3 countries), 26 (10%) were noninvariant for the German sample. Eleven items in total were affected (five on admiration and six on rivalry), though only for five of these items three thresholds or more were noninvariant. For example, four out of the five thresholds of Item 18 on admiration (“Mostly, I am very adept at dealing with other people”) were noninvariant for the German sample. These thresholds all had higher values in the German sample compared with the combined U.S. and U.K. samples, indicating that Germans needed a higher trait level to have the same probability of endorsing a certain response category as people from the other countries. All thresholds of Item 14 (“Other people are worth nothing”) were noninvariant for the German sample, but this item had a very skewed response distribution in all samples (fewer than 15% of the participants responded in Categories 4, 5, or 6), which may have affected the estimation of the thresholds.
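To illustrate why higher thresholds imply that a higher trait level is needed for the same response probabilities, the following sketch computes category probabilities in a graded response model. It assumes a logistic link and the form loading × θ − threshold; the loading and threshold values are hypothetical and are not estimates from this study.

```python
import math
from typing import List


def grm_cumulative(theta: float, loading: float, thresholds: List[float]) -> List[float]:
    """P(response in category k or above) for k = 2..K (logistic link assumed)."""
    return [1.0 / (1.0 + math.exp(-(loading * theta - tau))) for tau in thresholds]


def grm_category_probs(theta: float, loading: float, thresholds: List[float]) -> List[float]:
    """Probabilities of the K = len(thresholds) + 1 ordered response categories."""
    cum = [1.0] + grm_cumulative(theta, loading, thresholds) + [0.0]
    # Each category probability is the difference of adjacent cumulative ones.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Hypothetical parameters for one 6-point item (five thresholds).
loading = 1.2
thresholds_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
# The same item with every threshold shifted upward by 0.5 in a second group.
thresholds_b = [tau + 0.5 for tau in thresholds_a]

theta = 0.0  # identical latent trait level in both groups
probs_a = grm_category_probs(theta, loading, thresholds_a)
probs_b = grm_category_probs(theta, loading, thresholds_b)

# Probability of responding in the upper three categories (4, 5, or 6).
upper_a = sum(probs_a[3:])
upper_b = sum(probs_b[3:])
print(round(upper_a, 3), round(upper_b, 3))
```

At the same trait level, the group with the higher thresholds has a lower probability of responding in the upper categories, so it would need a higher trait level to reach the same endorsement probabilities.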
In sum, the NARQ was equivalent between the U.S. and U.K. samples. The German sample differed in their endorsement probabilities of some response categories on some items, but partial invariance with the U.S. and U.K. samples existed.
Figure 3 and Table 4 show latent mean differences on admiration and rivalry from the final partial invariance model. The U.K. and German participants had lower average trait levels than U.S. participants on admiration,

Latent mean differences on the Narcissistic Admiration and Rivalry Questionnaire facets admiration and rivalry between the United States (US), the United Kingdom (UK), and Germany (D).
Latent Mean Differences and Effect Sizes for the Narcissistic Admiration and Rivalry Questionnaire.
At the level of overall narcissism, the same number of noninvariant parameters was found (29). Twenty-eight of these pertained to thresholds (see Table S3 in SOM 3). According to the final partial invariance model for the NARQ, Americans scored higher on overall narcissism than individuals from the United Kingdom and Germany (see Figure S3 in SOM 3).
Discussion
In this study, we investigated the measurement invariance of three narcissism questionnaires (B-PNI, NPI, NARQ) across the United States, the United Kingdom, and Germany. The three narcissism questionnaires functioned mostly equivalently in the U.S. and U.K. samples, indicating that mean comparisons between these two countries could be drawn without qualifications. Comparisons between the United States and Germany required adjustments for noninvariance. In the following, we first discuss the measurement invariance or lack thereof of the B-PNI, NPI, and NARQ across countries. Second, we discuss potential reasons for noninvariance in a cross-cultural context and the implications for translating and adapting measures. Third, we discuss the mean differences we found and the practical implications of measurement invariance for interpreting mean differences before noting some limitations of our study and future directions.
Measurement Invariance of the B-PNI, NPI, and NARQ Across Countries
We found the fewest violations of measurement invariance between the United States, the United Kingdom, and Germany in the B-PNI (7.5% of factor loadings and thresholds), followed by the NARQ (9%). The NPI showed the largest number of violations (13.3%), many of which pertained to factor loadings. In fact, the NPI was the sole questionnaire with a preponderance of metric nonequivalent items. This is a more severe violation of measurement invariance than differences in item thresholds because it may indicate that items are relevant to a trait in one country, but not another (Huang et al., 1997). For example, the item pair “I insist on getting the respect that is due me” versus “I usually get the respect I deserve” loaded higher in the United States/the United Kingdom than in Germany. Thus, this item was less relevant to the entitlement facet in the German sample compared with the U.S. and U.K. samples.
In contrast to the NPI, in the B-PNI and the NARQ, we mainly found violations of measurement invariance at the level of the thresholds. Almost all of the noninvariant thresholds pertained to the German sample. This may indicate that the items measure aspects of the trait that are expressed to different degrees in Germany compared with the United States and the United Kingdom, leading to different probabilities of endorsing the items. For example, U.S. participants had a higher probability of endorsing a category stating agreement compared with German participants on Item 18 on admiration in the NARQ (“Mostly, I am very adept at dealing with other people”). Despite the noninvariance in loadings and thresholds found in all questionnaires, partial invariance across the United States, the United Kingdom, and Germany could be established for all traits.
Potential Reasons for Noninvariance
Why did some items show noninvariance across countries? According to
Sometimes it is necessary to adapt the item content because the item content is not relevant in a different country or culture, it contains a cultural reference, it contains an expression that cannot be translated literally, or it contains an idiom (International Test Commission, 2017). For example, the narcissistic response option for Item 16 on the NPI is “I can read people like a book.” This idiom may not exist in other languages or the same meaning might be expressed with a different analogy. In German, “to be an open book for someone” is a common expression, but “reading someone like a book” is not. In the German translation, this item was adapted to “I can read in others like in a book” (
Mean Differences in Narcissism
Are Americans more narcissistic than people from other countries? Previous research using observed scores indicated that this was the case, with Americans, for example, scoring higher than Europeans on the NPI total score (Foster et al., 2003). According to our analyses, Americans consistently scored higher than participants from the United Kingdom and Germany on facets capturing agentic narcissistic content, such as self-sacrificing self-enhancement (B-PNI), leadership (NPI), and admiration (NARQ). Americans also scored higher than Germans, but not individuals from the United Kingdom, on B-PNI and NARQ facets capturing antagonistic and neurotic content (hiding the self, contingent self-esteem, rivalry). These results might be explained by differences between the countries on Hofstede’s (2001) cultural dimension of individualism (vs. collectivism), arguably the most relevant cultural dimension to narcissism. According to Hofstede (2001), the United States had a score of 91 on individualism, while Great Britain had a score of 89 and Germany (the former West) only had a score of 67. Thus, values such as looking after oneself rather than relying on a group play a more important role in the United States compared with other countries and this might foster narcissistic tendencies. Results for two of the NPI facets, vanity and entitlement, were less consistent with this overall pattern with Germans scoring higher than Americans on these two facets. This is especially puzzling for B-PNI-entitlement rage and NPI-entitlement, which showed opposite results (
Practical Implications of Measurement Invariance for the Interpretation of Mean Scores
Our analyses also illustrate the importance of taking measurement noninvariance into account when interpreting mean scores and comparing them across countries: For those facets on which a number of items were noninvariant, observed means underestimated or overestimated the differences between countries. For example, for the B-PNI facet self-sacrificing self-enhancement, the observed mean difference was smaller than the latent mean difference from the final partial invariance model, whereas for the facet hiding the self, the observed mean difference was larger than the latent mean difference. Since there is no way of knowing in which direction the bias will go, researchers relying on observed means may draw incorrect conclusions about cross-country differences. Therefore, the measurement invariance of the instrument across countries should be investigated prior to drawing mean comparisons and potential noninvariance should be controlled for.
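The direction of this bias can be illustrated with a deliberately simplified sketch. It assumes a linear one-factor model with continuous items, in which the expected item score is an intercept plus a loading times the latent mean; this is a simplification of the ordinal models with thresholds used in our analyses. All values and names (`intercepts_ref`, `latent_mean_focal`, `observed_difference`) are hypothetical and not taken from our results.

```python
from statistics import mean


def expected_scale_mean(intercepts, loadings, latent_mean):
    """Expected mean of the item average under a linear one-factor model,
    where E[x_j] = intercept_j + loading_j * latent_mean."""
    return mean(t + l * latent_mean for t, l in zip(intercepts, loadings))


# Hypothetical values for illustration (not estimates from the study).
loadings = [1.0, 1.0, 1.0, 1.0]        # equal loadings in both groups
intercepts_ref = [3.0, 3.0, 3.0, 3.0]  # reference group intercepts
latent_mean_ref = 0.0
latent_mean_focal = -0.4               # true latent difference (ref - focal) = 0.4


def observed_difference(intercepts_focal):
    """Expected observed mean difference (reference minus focal group)."""
    return (expected_scale_mean(intercepts_ref, loadings, latent_mean_ref)
            - expected_scale_mean(intercepts_focal, loadings, latent_mean_focal))


# Fully invariant items: the observed difference matches the latent difference.
invariant = observed_difference([3.0, 3.0, 3.0, 3.0])
# Two items are harder to endorse in the focal group: difference is overestimated.
overestimated = observed_difference([3.0, 3.0, 2.6, 2.6])
# Two items are easier to endorse in the focal group: difference is underestimated.
underestimated = observed_difference([3.0, 3.0, 3.4, 3.4])

print(round(invariant, 2), round(overestimated, 2), round(underestimated, 2))
```

With unit loadings, the invariant case recovers the latent difference exactly, whereas the two noninvariant cases move the observed difference in opposite directions even though the latent difference is unchanged.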
Limitations and Future Directions
Even though we tested for measurement invariance and controlled for noninvariance in the partial invariance models, we should be cautious in interpreting these mean differences in narcissism facets because there are other potential reasons that could have played a role in addition to true cross-country differences. We made the age distribution of the samples more similar by restricting the age range from 18 to 50 years, but, since mean-level changes in narcissism occur from young adulthood to middle age (Wetzel et al., 2019) and cross-cohort measurement noninvariance has been found for the NPI (Wetzel et al., 2017), it is still possible that differences between age groups/cohorts may have been confounded with differences between countries. Nevertheless, correlations between narcissism facets and age and gender were similar across countries, both for unadjusted trait estimates and adjusted trait estimates (taking measurement noninvariance into account). In addition, method biases such as differences in using the response scales (e.g., acquiescence, extreme response style) could have influenced the results though the impact of response styles appears to be less severe than is often assumed (Plieninger, 2017; Wetzel, Böhnke, et al., 2016). Last, taking measurement invariance into account when investigating mean differences across countries cannot control for the reference-group effect (Heine et al., 2008; Mottus et al., 2012).
Our samples were from mostly Western countries. Thus, future research could examine the equivalence of narcissism questionnaires and mean differences in narcissism in more diverse countries, including countries from Asia and Africa. With more diverse countries, different patterns of results might emerge with respect to the different components of narcissism (e.g., more individualistic countries scoring higher than more collectivistic countries on agentic aspects of narcissism). Our samples were all nonclinical samples. Since the B-PNI was developed for the assessment of pathological narcissism, it would be interesting to investigate the cross-cultural measurement invariance of the B-PNI in clinical samples and to include other questionnaires such as the Narcissistic Vulnerability Scale (Crowe et al., 2018).
Conclusion
Questionnaires are translated and adapted to other cultures with the intent to minimize or eliminate cultural differences in responding to the items. A test of measurement invariance can be used to check whether that goal was achieved. Configural invariance overall supports the use of the B-PNI and NARQ in the countries investigated here, though it is questionable for the NPI. All questionnaires showed some noninvariance across countries, indicating that caution needs to be exercised when investigating and interpreting mean differences. In line with stereotypical perceptions, we found that individuals from the United States on average scored higher on agentic facets of narcissism than individuals from the United Kingdom and Germany. For antagonistic and neurotic facets, there were largely no differences between the United States and the United Kingdom. The United States showed higher means than Germany on some, but not all, facets with antagonistic and neurotic content.
Supplemental Material
Supplemental material, Cross-cultural_MI_SOM_1_20190712_an for Measurement Invariance of Three Narcissism Questionnaires Across the United States, the United Kingdom, and Germany by Eunike Wetzel, Felix J. Lang, Mitja D. Back, Michele Vecchione, Radoslaw Rogoza and Brent W. Roberts in Assessment
Supplemental material, Cross-cultural_MI_SOM_2_20190718_an for Measurement Invariance of Three Narcissism Questionnaires Across the United States, the United Kingdom, and Germany by Eunike Wetzel, Felix J. Lang, Mitja D. Back, Michele Vecchione, Radoslaw Rogoza and Brent W. Roberts in Assessment
Supplemental material, Cross-cultural_MI_SOM_3_an for Measurement Invariance of Three Narcissism Questionnaires Across the United States, the United Kingdom, and Germany by Eunike Wetzel, Felix J. Lang, Mitja D. Back, Michele Vecchione, Radoslaw Rogoza and Brent W. Roberts in Assessment
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by a grant from the Mentoring Program of the Zukunftskolleg at the University of Konstanz awarded to Eunike Wetzel and a grant from the National Science Center Poland awarded to Radoslaw Rogoza (2014/14/M/HS6/00919).
