Abstract

Personality self- and informant-reports have been ascribed complementary value based on the asymmetric knowledge of the two perspectives. However, this study is the first to investigate what personality (item) content is reflected in the shared and unique components in multi-rater personality judgments. In two large data sets (Sample 1: 664 targets/1,615 informants; Sample 2: 478 targets/1,434 informants), we used latent variable models to separate judgments into variance that is shared across targets and informants (the Trait factor), unique to self-reports (Identity), and unique to informant-reports (Reputation). Then, we predicted the personality items’ loadings for each factor from the items’ content. This included items’ affective, behavioral, cognitive, or desire-related content, observability and evaluativeness, and centrality to identity or reputation. We found that Trait consensus was generally promoted by items reflecting observable, behavioral, but also affective content. Unique self-perceptions were captured especially by cognitions and non-observable content. Evaluativeness had inconsistent effects across samples. Similarly, unique informant-views reflected different content across samples. Both may depend on the types of informants or the available item sample. These insights build the foundation for leveraging the power of multi-rater perspectives on personality for advancing theory and measurement across different perspectives.

Plain language summary

Someone’s personality can be judged from two different perspectives: by the person themselves or by someone else (e.g., a friend). Self-judgments and judgments made by others show some agreement, but it typically is moderate: That is, there are some things about our personality that everyone agrees on, but there are also things that only we think about ourselves (our identity) or that only others think about us (our reputation). This may have different reasons: For example, other people can only observe what we do and say, whereas we also know what we are thinking and feeling. In this study, we wanted to find out what content makes up the different perspectives of the self and others on someone’s personality. In two samples, people judged their own personality and were also judged by 1–3 people from their social networks (informants) by answering personality questionnaires. The samples, respectively, included 664/478 people judging themselves and 1,615/1,434 informants. We found that self and others agreed more on questions describing personality traits that are observable from the outside or describe behaviors (e.g., “talks a lot”), but also emotions (e.g., “often feels blue”). For traits that cannot be easily observed, especially if they concern how we think (e.g., “believes they are better than others”), people tend to have more unique self-perceptions. There was no clear pattern for informants’ unique views: The content of a person’s reputation may depend on the type of informant (e.g., friend, coworker, or stranger). Understanding the shared and unique components in personality self- and informant-judgments is important for different reasons: Researchers may develop personality questionnaires specifically for self- or informant-judgments rather than assigning everyone the same questions in spite of their differentiated insights. It also helps decide whose perspective is relevant in a given context and how to integrate their value.

Keywords

personality, person perception, trait, reputation, identity

Introduction

Personality research has a long history of relying heavily on self-report questionnaires to assess personality traits (Robins et al., 2007; Vazire, 2006). Informant-reports, though having their own tradition in personality psychology, have mostly been used by researchers to bolster the findings of self-reports (e.g., Hofstee, 1994; McCrae & Costa, 1982; Norman, 1963). That is, the perspective of others has generally been treated as an alternate and interchangeable assessment method. In recent decades, however, discussion has increasingly turned toward understanding the unique and complementary value of informant-reports. In particular, self- and informant-reports may be uniquely suited to provide insights into different traits due to their asymmetrical access to different target information (e.g., Vazire, 2010; Vazire & Mehl, 2008). For instance, informants may be better poised for describing outwardly observable behavioral manifestations of a target’s personality, whereas targets may be better poised to evaluate patterns of internal affect and cognition. This multi-rater research not only furthers our understanding of personality judgments but also has implications for their application. For example, these insights position self- and informant-reports as appropriate predictors for different criteria that are aligned with their different perspectives.

Despite the possibilities offered by more recent multi-rater research, personality psychologists have not yet incorporated its insights when building personality assessments. Most typically, questionnaires are developed using self-reports, and informant-reports are created by simply rephrasing items from the first to the third person (McCrae, 1994; McCrae & Weiss, 2007). Such approaches do little to capitalize on the differentiated perspectives of self- and informant-reports (McCrae, 1994; Olino & Klein, 2015). Capitalizing on them could include, for example, considering that self-reports may be better poised to answer items covering content that is less easily observable by outside perceivers (e.g., “trusts others” vs. “talks a lot”). Informant-reports, on the other hand, may offer more of a bird’s-eye perspective on highly evaluative compared to more neutral item content (e.g., “completes tasks successfully” vs. “avoids crowds”). Additionally, researchers and practitioners are often interested in representing the discrepant views offered by self- and informant-reports (e.g., Kluemper et al., 2015). For example, this may be relevant in clinical assessments: a clinician may report on different questions in a structured interview than the client self-reports on in a questionnaire, and discrepancies in the perspectives of self-, clinician-, and informant-reports may be part of a diagnosis. In some fields such as child clinical psychology, practitioners may be relatively more used to integrating multi-rater perspectives and including informant-reports. Such integration has, however, not been adopted in other applied settings or in personality psychology research at large. In addition, there are currently no systematic ways for assessing and scoring the shared and unique insights of self- and informant-reports contained in personality questionnaires (which could include developing questionnaires specifically for multi-rater applications). This, however, could be a promising approach to future personality assessment.

Integrating multi-rater perspectives into personality assessments requires answers to a more fundamental question: what personality insights are (a) shared by targets and informants, (b) held solely by the target, and (c) held only by informants? Personality research has acquired substantial knowledge on individual parts of this picture. This includes knowledge about factors that drive the moderate self-other alignment in person judgments that has been found and factors that are unique to either perspective. Yet, the unique and shared insights held by targets and informants have never been simultaneously disentangled from one another at the item level to see what type of personality content each reflects. That is, there is currently no evidence on what unique insights are contained in self-reports (or informant-reports) beyond those shared across self and informants. Understanding the content composition of the shared and unique components in multi-rater judgments will inform which perspective is relevant in a given context and how it can be best assessed. Moreover, building such a knowledge base can substantially advance our theoretical understanding of self- and other-perspectives as well as multi-rater assessment across all subfields of psychology.

This study aims to build the foundation for optimizing the measurement and harnessing the power of multi-rater perspectives in personality. To this end, we build on a recently introduced integrative approach to disentangle the shared and unique components in multi-rater judgments. Specifically, the Trait-Reputation-Identity (TRI) Model (McAbee & Connelly, 2016) uses latent variable models to separate multi-rater judgments into variance that is (1) shared across targets and informants (the Trait factor), (2) unique to informant-reports (Reputation), and (3) unique to self-reports (Identity). We apply this model to two relatively large samples of multi-rater personality judgments. Then, we evaluate what personality insights each component reflects by analyzing the content of items that contribute most to a given factor, examining items’ (1) affective/behavioral/cognitive/desire-related content, (2) observability and evaluativeness, and (3) centrality to identity and reputation. This is informed by the combined insights from personality and person perception research on impression formation and self-other differences. The following sections discuss the theoretical prerequisites in more detail, explaining what we know about self-other differences in judgment formation, how multi-rater perspectives have been traditionally viewed, and how the TRI Model can work as a framework to disentangle unique and shared insights in multi-rater judgments.

Self-other differences in personality judgments

“From inside” and “from outside” reflect the two qualitatively different points of view from which a person can assess someone’s personality. One core finding of personality psychology is that measures solicited from these two perspectives—self- and informant-reports, respectively—usually align meaningfully but moderately (Connelly & Ones, 2010; Connolly et al., 2007; Kenny & West, 2010). That is, although a target and their informants share some understanding of the target person’s underlying traits, the target may also have unique self-perceptions, and others may hold views about the target’s personality independent of the target (Letzring & Funder, 2021; Vazire, 2010; Vazire & Mehl, 2008).

To understand how and why self-other differences in personality judgments arise, it is helpful to picture the process of judgment formation (e.g., Brunswik, 1956; Funder, 1995; Kenny, 1994). When making personality judgments, perceivers typically rate the target on a set of items that capture different traits (e.g., the Big Five; Goldberg, 1990). To make such a judgment, the perceiver first needs some information about the target. For example, the perceiver may draw on their observations of the target that span across different situations. There, the perceiver gains access to cues such as different behaviors that the target displays or the target’s physical appearance (Kenny, 1994). Perceivers may then form a judgment from that information by assigning meaning to the cues in regard to the trait they have to judge and integrating them (Kenny, 1994). Notably, this model applies to both self- and informant-reports (i.e., both are subsumed under the term perceiver). For example, Penny may have observed her classmate Tina in many different situations in school. When she is asked to judge Tina’s friendliness, Penny thinks of Tina often smiling at her and greeting everybody she meets as positive indicators. Similarly, when Penny is asked to judge her own sociability, she may think of how often she meets with friends or how exhausted she feels after social gatherings. We focus on three important reasons why self-other differences may occur during this process: (1) self and informant have access to different information; (2) self and informant are affected by different evaluative biases; and (3) self and informant have different goals when providing judgments.

Information availability

Perceivers can only directly use the information about the target that is available to them, which differs systematically depending on who the perceiver is (Funder, 1995; Vazire, 2010). Specifically, anyone but the target is an outside observer, which means the information available to them is limited to cues which the target emits while the perceiver is present. Moreover, these cues have to be observable, that is, perceivable by a present outside observer (i.e., “availability”; Funder, 1995; Furr, 2009). In person perception, this includes two types of information: (1) the physical appearance of the target and their associated belongings (e.g., their clothes) and (2) behaviors displayed by the target in situations available to the observer (Kenny, 1994). Behaviors in the narrow sense include all verbal expressions and movements, and they are by definition observable (Baumert et al., 2017; Furr, 2009). As external manifestations of personality, behaviors are considered the most important source of information for person judgments by outside observers (Funder & Sneed, 1993).

However, personality is not just behavior but someone’s characteristic pattern of affect, behavior, cognition, and desire (ABCD; Baumert et al., 2017; Wilt & Revelle, 2015). In short, affect refers to how people feel, cognition to how people think, and desire to what people want (Wilt & Revelle, 2015). Thus, personality comprises internal traits that become observable only if the target manifests them behaviorally (e.g., by talking about feeling anxious or displaying a tense facial expression; Furr & Funder, 2007). The levels of observability can also vary within each ABCD-component. For example, some affective traits may be linked to various behaviors that manifest in many situations (e.g., anxiety expressed as pacing, sweating, and irritability), whereas others manifest rarely (e.g., individuals concealing depressive thoughts and feelings from others for decades). One relevant aspect here is also the level and context of acquaintance between target and informant. Specifically, close acquaintances may have observed the target in more and more intimate situations where they witnessed manifestations of thoughts or emotions otherwise kept internally by the target (e.g., Colvin & Funder, 1991; Funder, 1995; Vazire, 2010).

The target, on the other hand, is the one person with privileged (i.e., direct) access to a host of self-information including their own thoughts, feelings, and desires (Osberg & Shrauger, 1990; Vazire, 2010). Targets are also the only individuals to witness their personality in every situational context. Thus, they are able to observe, and even control, many of their behaviors. Compared to informants, however, targets do not have a full visual of their body. This limited salience may lead targets to be less aware of some of their own behaviors compared to simultaneous internal processes (DePaulo, 1992; Vazire, 2010). Subtle behaviors (e.g., minor movements/facial expressions) may even elude the target’s awareness.

In summary, there is a knowledge asymmetry where self and informants have access to different types of information: Informants rely on observable behaviors, whereas the self has privileged access to their own unobservable emotions, thoughts, and desires in addition to observing their own behaviors. Accordingly, self-other agreement should be higher for observable information because it is accessible to both the self and informants (Kenny, 1994; Vazire, 2010). Similarly, self-other agreement should be higher for behavioral than for affective, cognitive, or desire-related traits. This expectation is supported by previous research: Higher self-other agreement has been found for personality domains deemed to be more observable (e.g., Funder & Dobroth, 1987; John & Robins, 1993; Watson et al., 2000) and to reflect primarily behavior (e.g., Conscientiousness and Extraversion) rather than internal patterns (e.g., Neuroticism and Openness; e.g., Connelly & Ones, 2010; Pytlik Zillig et al., 2002). Thus, we expect the shared perspective in multi-rater personality judgments to reflect observable and behavioral content. On the flip side, there is limited knowledge about what is reflected in the unique perspectives once they are separated from shared insights. We may reasonably expect unique self-views to center around unobservable content to which targets have exclusive access. The access of informants to observable content, however, may lead to different outcomes. On the one hand, we anticipate that informants draw heavily on more observable information, which could result in more unique views on observable traits. On the other hand, we expect observable items to generate especially high self-informant agreement, and it is unclear whether and how much variance then remains for informant uniqueness. This could result in null or negative relationships between informant uniqueness and observability.

Evaluative biases

If information is available to the perceivers, the next question is if and how they will use it to make their judgment. In this stage of cue utilization, perceivers may detect and utilize available trait information (Brunswik, 1956; Funder, 1995). Beyond being impartial perceivers of cues, however, individuals are susceptible to evaluative biases that impact how trait information is detected and utilized (Funder, 1995; Vazire, 2010). Specifically, perceivers may have an overall attitude toward the target that can range from positive to negative (Leising et al., 2015). Notably, this is true for both informant-reports of another person and perceivers’ self-reports. Personality traits (and their associated items) vary in their social desirability, and traits that are especially desirable (e.g., “intelligent”) or especially undesirable (e.g., “immoral”) are likely to elicit more evaluative bias than traits with more neutral social desirability (e.g., “quiet”; e.g., Edwards, 1953; John & Robins, 1993; Leising et al., 2015). Positively biased perceivers, for example, will endorse positive traits more and negative traits less (Borkenau et al., 2009; Leising et al., 2015). As highly evaluative (i.e., very positive/negative) items will be more strongly affected by these biases than more neutral items, “non-evaluativeness” is a property associated with greater judgmental accuracy.

Self-reports of personality are widely regarded as reflecting biases geared toward creating or maintaining a positive self-image. In particular, targets can self-enhance or self-protect by interpreting traits in a self-serving way, externalizing negative outcomes, or choosing self-serving comparisons (e.g., Alicke & Sedikides, 2009; Dunning, 1999; John & Robins, 1994). Notably, however, targets still vary considerably in their tendency to hold a more negative or positive self-image (Leising et al., 2015). Informants have sometimes been considered valuable based on the believed ability to judge evaluative information more objectively with their outsider perspective. However, this certainly does not mean informants are unaffected by evaluative biases: Close acquaintances, especially, will typically tend to like their targets which is reflected in the positivity of their ratings; strangers may take a comparably more negative or neutral stance (Allik et al., 2010; Kim et al., 2019; Vazire & Mehl, 2008). And even regardless of the target, perceivers have been shown to bring different levels of global positivity/negativity to their ratings (i.e., perceiver effects; e.g., Heynicke et al., 2022; Rau et al., 2021). The key point here is that if self and informants are affected by different evaluative biases, or by similar biases but in differing degrees or ways, they will agree more on non-evaluative information because it reflects those biases less (Vazire, 2010). Accordingly, previous studies tend to show lower self-other agreement for personality dimensions considered to be more evaluative (e.g., Agreeableness; Connelly & Ones, 2010; John & Robins, 1993).

Apart from the (unconscious) evaluative biases of perceivers, the evaluative nature of traits may also fuel more conscious impression management efforts by the target. Socioanalytic theory, for example, states that targets may aim to create a certain impression in their self-reports (Hogan, 1996; Hogan & Blickle, 2013). Moreover, targets may use impression management tactics in the information they make available to outside observers (Leary & Kowalski, 1990; Schlenker, 1980) by enhancing or limiting the manifestation of certain behaviors, thereby exacerbating the informational asymmetry based on its evaluative nature. Notably, both unconscious evaluative biases and conscious efforts to control the display of evaluative information would lead to less consensus on evaluative traits.

In the present study, we focus on social desirability as one prominent bias in personality judgments. Other biases may affect judgments beyond that: For example, targets may tend to maintain a consistent perception of themselves by ignoring inconsistent cues or focusing on intentions rather than behaviors (e.g., Funder, 1999; Sadler & Woody, 2003; Swann, 1981). This would affect which cues are translated at all into self-reports.

Perspective-dependent trait centrality

Besides availability or evaluativeness, traits may also differ in how central (i.e., relevant) their content and its implications are to targets and informants, which may in part depend on the different goals the different perceivers pursue when making judgments. Funder (1995) proposes the main purpose of judging others’ personalities is to predict the target’s future behavior. This allows perceivers to make decisions about their future with the target such as whether they want to continue to interact with the target. Following this perspective, informants should be interested in gaining and reporting accurate impressions of the target, but especially of traits that could affect the informant directly (e.g., traits with clear interpersonal consequences; Leising et al., 2014). Whereas accurately gauging the target and establishing an impression of them may be especially important for less intimately acquainted informants, well-acquainted informants may also be motivated by other goals. For example, informants may accentuate what they value about the target (i.e., why they have chosen to stay acquainted) which may in part depend on the informant’s own identity as well (see below). That is, in either case, there may be traits that informants consider to be more central to evaluating another person, and targets changing on those traits would strongly affect how the informants perceive them. Thus, we would expect more unique informant-views on traits that are more central to a target’s reputation.

In providing self-reports, targets may also have the general desire to appraise themselves accurately (e.g., Strube et al., 1986) and to convey their self-perceptions to others (Paulhus & Vazire, 2007). Following socioanalytic theory, targets may use self-reports to achieve a specific self-portrayal and thus reputation with others (Hogan, 1996; Hogan & Blickle, 2013). In any case, some traits will be more important to the target because a change on them would fundamentally alter their (intended) self-portrait, whereas other traits play a peripheral role to identity (Lee et al., 2009; Thielmann et al., 2020, 2023). We would thus expect more unique self-views on traits relevant to the target’s identity.

Summary

Differences in what type of personality insights the shared and unique perspectives of multi-rater judgments reflect may arise for three reasons. (1) The self and informants have access to different types of personality information: Informants rely on observable behaviors emitted by the target, whereas the target has direct access to their (unobservable) affect, cognitions, and desires. (2) The self and informants tend to be affected by different evaluative biases, which are likely to cause their ratings to diverge. (3) Different traits may be central to the self and to informants because of their implications.

These reasons may lead to differences in multi-rater perspectives related to the following characteristics of personality information: its ABCD-content, observability, evaluativeness, importance for the target’s perspective, and importance for outside observers. These characteristics are reflected by the person-descriptive items used to capture traits in personality questionnaires (Letzring et al., 2021). For example, an item may reflect a highly evaluative and observable behavior (e.g., “insults people”) or a less observable cognition (e.g., “has a vivid imagination”). Previous research has shown that such item characteristics can be rated with high reliability by relatively few study participants (Leising et al., 2014; Letzring et al., 2021; Wilt & Revelle, 2015).

Approaches to multi-rater perspectives on personality

The robust finding of the overlapping but distinctive perspectives of self- and informant-reports along with the myriad factors that may affect them have fueled long-standing debates among researchers on how traits can best be assessed and conceptualized. Broadly, three distinct perspectives have traditionally pervaded this discussion, each with different measurement implications.

First, some scholars argue that either self- or informant-reports reflect what they understand personality to be and thus should be the preferred reference point (e.g., Hofstee, 1994; Hogan, 1998; McCrae & Costa, 1982; Osberg & Shrauger, 1990). Under such formulations, either self- or informant-reports would be the more authoritative assessment. Second, some argue that more accurate personality measures come from averaging scores across self- and informant-reports to maximize the variance accounted for by the overlap in perceptions (e.g., Letzring & Funder, 2021). Although this will reduce the error variance associated with individual raters, it also treats any unique perspective of self- and informant-reports as error that will increasingly cancel out the more raters are averaged across. The underlying rationale is the same as when creating a scale score from multiple items: Averaging across multiple items produces a more reliable representation of a construct of interest. Item-specific variance is considered error and is increasingly averaged out as more items are included. Third, more contemporary scholars argue that self- and informant-reports contain asymmetries in knowledge about traits such that a preference for self- versus informant-reports should depend on which trait is being assessed and which criteria are being predicted (Vazire, 2010; Vazire & Mehl, 2008). For example, Carlson et al. (2013) found that personality self-reports were better predictors of internalizing personality disorders, but informant-reports were better predictors of externalizing personality disorders.
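
The averaging rationale behind the second approach can be made concrete with the Spearman-Brown formula, a standard psychometric result. The following is a minimal sketch that assumes parallel raters with a common pairwise correlation; the function name and the example correlation of 0.30 are illustrative assumptions, not values from this study.

```python
def composite_reliability(n_raters: int, mean_interrater_r: float) -> float:
    """Spearman-Brown: reliability of the mean rating across n parallel raters.

    mean_interrater_r is the (assumed common) correlation between any two raters.
    """
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    return (n_raters * mean_interrater_r) / (1 + (n_raters - 1) * mean_interrater_r)


# Hypothetical inter-rater correlation, chosen only for illustration.
for k in (1, 2, 3, 10):
    print(k, round(composite_reliability(k, 0.30), 3))
```

As more raters are averaged, the share of rater-specific variance in the composite shrinks. This is why the averaging approach implicitly treats each rater's unique perspective as error rather than as potentially meaningful information.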

Arguably, however, the value of multi-rater perspectives hinges on knowing what personality insights each perspective—shared and unique—contains. This cannot be achieved with these three approaches but requires simultaneously considering both the shared and unique perspectives across self- and informant-reports. This fourth approach is closely related to the knowledge asymmetries perspective but extends it in such a way that it becomes possible to disentangle the unique and shared perspectives from one another allowing us to subsequently examine what content they respectively reflect. The most recent and extensive elaboration of this approach is the TRI Model developed by McAbee and Connelly (2016). The TRI Model presents an extension and fusion of existing traditions, integrating insights from person perception, classic trait personality, social cognition, and socioanalytic theory. The model builds on prior work such as the Johari window (Luft & Ingham, 1955) and the Self-Other Knowledge Asymmetry Model (Vazire, 2010). It presents an analytical framework that enables the empirical disentanglement of the shared and unique perspectives in multi-rater personality judgments and is thus the missing component to test our assumptions.

The TRI Model (see Figure 1) defines three different perspectives or areas of personality insights that can exist for a given target person. First, the Trait factor contains knowledge that the self and others share. This is closest to a consensus perspective in person perception that is thought to include real trait variance but can also be affected by shared biases (McAbee & Connelly, 2016). We want to emphasize here the distinction between actual trait variance which usually refers to valid variance (i.e., actual differences between people) and the Trait factor as a label from the TRI Model referring to shared self-informant views which may include both valid and biased components. Second, Identity contains the target’s unique self-perception, that is, self-insights that only the target holds and does not share with others. This could be hidden trait knowledge as well as beliefs distorted by self-held biases. Third, Reputations contain insights about the target that are shared by others independent of the target. Finally, there may also be insights that only a given informant holds (i.e., observer uniqueness). Like the other factors, unique informant-views may contain both valid trait knowledge as well as distorted beliefs.

Figure 1.

The Trait-Reputation-Identity Model by McAbee and Connelly (2016). Note. First-level observer factors (Observers A−C) capture observer uniqueness. Residual correlations in the model are not depicted for ease of visualization.

The TRI Model is especially valuable as a methodological framework. Using latent variable modeling, it enables the extraction of Traits, Reputations, and Identity as latent factors by separating the variance of multi-rater judgments into shared and unique portions. This approach has proven to provide valuable insights in studies that have employed the TRI Model thus far (e.g., shedding light on the origins of gender differences in personality perceptions or the predictive value of reputations for job-related outcomes; Connelly et al., 2022; McAbee & Connelly, 2016).
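
One way to picture this separation is the following simplified sketch. It is not the exact specification of McAbee and Connelly (2016): the notation is assumed here for illustration, items are taken to be standardized, and all latent factors are taken to be mutually uncorrelated with unit variance. For item $i$, the self-report and the report of informant $k$ could be written as:

```latex
\begin{aligned}
y^{\text{self}}_{i} &= \lambda^{T}_{i,\text{self}}\, T + \lambda^{I}_{i}\, I + \varepsilon^{\text{self}}_{i}, \\
y^{\text{inf}_k}_{i} &= \lambda^{T}_{i,\text{inf}}\, T + \lambda^{R}_{i}\, R + \lambda^{U}_{i}\, U_{k} + \varepsilon^{\text{inf}_k}_{i}, \\
\operatorname{Var}\!\left(y^{\text{self}}_{i}\right) &= \left(\lambda^{T}_{i,\text{self}}\right)^{2} + \left(\lambda^{I}_{i}\right)^{2} + \operatorname{Var}\!\left(\varepsilon^{\text{self}}_{i}\right).
\end{aligned}
```

Here $T$, $I$, and $R$ stand for the Trait, Identity, and Reputation factors, and $U_{k}$ is an assumed term for the uniqueness of observer $k$. Under these assumptions, the squared loadings indicate how much of an item's variance each perspective accounts for, which is why item loadings can serve as outcomes in subsequent analyses.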

The present study

The goal of this study is to investigate what defines Traits, Reputations, and Identity (i.e., the shared and unique contributions) in multi-rater personality judgments. More specifically, we want to investigate what personality insights each perspective reflects: (1) How do Traits, Reputations, and Identity differentially reflect content components of personality—namely, affect, behavior, cognition, and desire? (2) How do Traits, Reputations, and Identity reflect information depending on its observability and evaluativeness? (3) Do the respective unique contributions of the Identity and Reputation factors reflect information depending on its deemed importance for the target themselves versus for an outside observer?

We preregistered our expectations based on the rationale presented above (https://osf.io/gqmsf); they are as follows:

The Trait factor captures shared variance across self- and informant-judgments. Thus, Traits should reflect information that is available to and used similarly by self and informants.

T1: Information is more strongly reflected in the Trait factor the higher its behavioral component is. The Trait factor reflects behavior more than affect, cognition, or desire.

T2: The Trait factor reflects observable information and non-evaluative information more.

The Reputation factor captures variance unique to informant-judgments. Thus, Reputations reflect information that is either exclusively available to and/or used by informants or that is used differently by informants than by the self.

R1: Information is more strongly reflected in the Reputation factor the higher its behavioral component is. Reputation reflects behavior more than affect, cognition, or desire.

R2: Reputation reflects evaluative information more.

R3: Reputation reflects information deemed more important for an outside observer.

The Identity factor captures variance unique to self-judgments. Thus, Identity reflects information that is either exclusively available to and/or used by the self or that is used differently by the self than by informants.

I1: Information is more strongly reflected in the Identity factor the higher its affective, cognitive, or desire-related component is. Identity reflects affect, cognition, and desire more than behavior.

I2: Identity reflects items low in observability and items high in evaluativeness more.

I3: Identity reflects information deemed more important for the self.

We do not define expectations for the connection of observability with Reputation and treat this as an open research question. This is based on the above-mentioned considerations that while informants should draw more heavily on observable information, we cannot predict whether this will generate informant-rating variance beyond Trait consensus which should be driven by observability. Additionally, the item characteristics are likely not fully independent from one another (e.g., behavioral items may also be more observable). Thus, we also explore whether their effects overlap.

Method

Testing our hypotheses requires the following steps: First, we use the TRI Model to construct latent variable models specifying Trait, Reputation, and Identity factors in samples of personality self- and informant-reports. This results in factor loadings for each item on each perspective (Trait, Reputation, and Identity). Second, we use multilevel regression to predict the factor loadings from the items’ characteristics (e.g., their ABCD-content). Our primary preregistered analysis (https://osf.io/gqmsf) performs these steps on a large multi-rater data set used for the first time in the present study that stems from the Scarborough Integrative Perspectives on Personality Project. Any deviations from the preregistration are explicitly stated. We then replicate the procedure with an existing public data set from the Eugene-Springfield Community Sample (Goldberg, 1999; Goldberg et al., 2006). We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study. All supplements, materials, data, and code referenced in the following can be found on the Open Science Framework (OSF) at https://osf.io/wq3jm.

Sample 1: Scarborough Integrative Perspectives on Personality Project (SIP³)

The SIP³ is a large-scale multi-year data collection effort. Specifically, between 2016 and 2022, cohorts of first-year students from the co-operative management program at the University of Toronto Scarborough were invited to partake in an online self-assessment and subsequently invite 3–6 people from their own social networks to provide informant assessments to receive personality feedback as part of their first-year coursework (REB #32440). In addition, in 2018, management co-op students beyond their first year were recruited as paid research participants to complete the same multi-rater personality inventory (along with an extended set of assessments not analyzed in the current study; REB #36048). Informants were primarily friends (86%) or other personal acquaintances (8%, e.g., significant others). A small portion of informants were professional acquaintances (5%), which were omitted from the present study. Common across all cohorts are personality self- and informant-ratings on the IPIP-NEO-120 (Johnson, 2014).

Measures

Targets and informants, respectively, filled out first- and third-person versions of the IPIP-NEO-120 (Johnson, 2014) which assesses the Big Five personality domains, each with six facets assessed with four items. Responses were given on a 5-point Likert scale (very inaccurate to very accurate). Beyond providing demographic information, targets and informants filled out other measures irrelevant for the present study. A complete overview of all collected variables can be found in the codebook on the OSF.

Participants

We merged the data sets from all cohorts and applied the following general exclusion criteria: (1) Targets were excluded if they agreed to participate but did not share their ratings with the team to be used for research. To eliminate careless responders, we excluded participants who (2) completed the entire assessment procedure in less than 3 minutes, (3) had 50% or more missing values in the IPIP-NEO-120 ratings, or (4) had no or minimal variance in the IPIP-NEO-120 ratings (i.e., items were presented in two blocks of 60 items, exclusion was warranted if variance was <.10 in either of the blocks). (5) We excluded duplicates, meaning targets or informants who had participated again in a second cohort keeping only the original ratings. (6) We aimed to exclude falsified informant entries by checking for multiple (i.e., more than two) responses coming from the same IP address. (7) We excluded informants that after the previous criteria remained without a target as well as targets without any informants. (8) Last, for targets with more than three informants, three were randomly selected. The three informants per target were randomly assigned to the roles of Observers 1−3 for the purpose of estimating TRI Models (see below). For targets with fewer than three informants, vacant informant roles were included as missing values.
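The careless-responding screens (criteria 2 to 4) and the random selection of informants (criterion 8) can be sketched as follows. This is a minimal illustration and not the authors' code, which is on the OSF. The function names, the use of the sample variance, the `None` coding for missing values, and the fixed seed are all assumptions made for the example.

```python
import random
from statistics import variance

def is_careless(ratings, minutes, block_size=60):
    """Flag a respondent under criteria (2)-(4); `ratings` uses None for missing."""
    if minutes < 3:  # (2) assessment completed in under 3 minutes
        return True
    if sum(r is None for r in ratings) / len(ratings) >= 0.5:  # (3) 50% or more missing
        return True
    for start in range(0, len(ratings), block_size):  # (4) checked per 60-item block
        block = [r for r in ratings[start:start + block_size] if r is not None]
        # Assumption: sample variance; the text does not say which estimator was used.
        if len(block) < 2 or variance(block) < 0.10:
            return True
    return False

def pick_informants(informant_ids, k=3, seed=0):
    """Criterion (8): keep at most k informants, in random order (Observers 1..k)."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    chosen = rng.sample(informant_ids, min(k, len(informant_ids)))
    return chosen + [None] * (k - len(chosen))  # vacant roles remain missing
```

A respondent who gives the same answer to every item has zero variance in both blocks and is flagged. A target with a single informant keeps that informant in the first role and has two missing roles.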

The final sample used in the present study includes 664 targets and 1,615 informants; 409 targets had three informants, 133 targets had two, and the remaining 122 targets had one. This total sample size was determined by a combination of methodological concerns (i.e., surpassing N = 500 to ensure model convergence and minimize bias in variance decomposition with bifactor models; Bader et al., 2022) and practical concerns (i.e., the onset of the coronavirus pandemic reducing participation rates). Targets were primarily female (69%), were on average 18.71 years old (SD = 1.26), and reflected a diversity of racial and ethnic backgrounds (60% East/Southeast Asian, 25% South Asian, 9% White, 4% Arab/Middle Eastern, 3% Black, 1% Latin American, <1% Indigenous/Aboriginal, 3% Multiracial, 7% Other; respondents could check multiple options, so percentages do not sum to 100%). Of the 1,615 informants, targets categorized 1,471 as friends and 144 as other personal acquaintances. Further demographic information is included in Supplement A on the OSF.

Sample 2: Eugene-Springfield Community Sample (ESCS)

To investigate the robustness of our findings, we replicated our analyses with a different multi-rater data set of personality judgments. Specifically, we used a subsample of the ESCS, a longitudinal data collection effort that started in 1993 recruiting participants from available lists of homeowners who completed a host of psychological measures. The relevant subsample consisted of 478 targets who provided self-reports on the Big Five Inventory (BFI; Benet-Martínez & John, 1998; John & Srivastava, 1999) in 1998 and three informants per target from their respective social networks (N = 1,434) who rated their target on third-person versions of the BFI. The BFI assesses the Big Five domains with 44 items total (8−10 per domain). Responses were given on a 5-point Likert scale (extremely inaccurate to extremely accurate). The target sample consisted of 64.9% males with a mean age of 49.24 years (SD = 12.76). In the informant sample, 62.2% were female and the mean age was 48.06 years (SD = 18.04). They reported informants to be significant others to the target (2.4%), spouses (21.8%), friends (28.3%), coworkers (11.5%), relatives (28.3%), or acquaintances (1.2%); 7.6% selected “other” or did not report their status.

Analytic approach

Estimation of TRI Models

In a first step, we constructed latent variable models with confirmatory factor analysis specifying Trait, Reputation, and Identity factors in accordance with the procedures in McAbee and Connelly (2016). The TRI Model is a bifactor structure (see Figure 1) in which each item loads on the Trait factor and one rater factor (Identity for self-reports or Reputation for informant-reports); informant-specific Reputation factors also load on a higher-order Reputation factor. Consistent with McAbee and Connelly (2016), Reputation and Identity were kept orthogonal to help stabilize the model and to preserve the Trait factor as representing consensus between self- and informant-reports. We correlated residual variances for like items across raters. All latent factor variances were fixed to 1, and informants were modeled as interchangeable (i.e., factor loadings, intercepts, residual variances, and residual covariances for like items/factors were constrained to equality across informants, and model fit estimates were adjusted using Interchangeable-Saturated Models; Olsen & Kenny, 2006). Missing values were handled using robust maximum likelihood estimation. To aid interpretability when encountering common problems during model estimation, we refit models following the a priori strategies of (a) constraining factor loadings to be in a consistent direction and (b) fixing residual variances to zero for any items with initially negative residual variances. Latent variable models were fit using Mplus (version 8.7; Muthén & Muthén, 1998−2017). Analysis code is available on the OSF. In Sample 1, we fit a separate TRI Model for each of the 30 IPIP-NEO-120 facets (4 items per facet).¹ In Sample 2, we drew on analyses reported in McAbee and Connelly (2016) that fit separate TRI Models for each Big Five domain (8–10 items per domain). 
Fit statistics for the TRI Models can be found in Supplement C for Sample 1 and in Table 1 of the original paper for Sample 2 (McAbee & Connelly, 2016, p. 576). Indices of close fit were in similar ranges for both samples, largely within typical benchmarks showing moderate to strong fit (e.g., Hu & Bentler, 1999; Kline, 2015).
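For orientation, the measurement structure described above can be written out for item $j$ and target $p$. This is a sketch based on the verbal description, not the authors' Mplus specification. The symbols $\nu$, $\lambda$, $\gamma$, $\varepsilon$, and $\zeta$ are notation chosen for this illustration, and reading $\zeta$ as the informant-specific residual of the first-level Reputation factor is an assumption.

```latex
\begin{aligned}
x^{S}_{jp}   &= \nu^{S}_{j} + \lambda^{T,S}_{j}\, T_{p} + \lambda^{I}_{j}\, I_{p} + \varepsilon^{S}_{jp}
  && \text{(self-report)} \\
x^{O_k}_{jp} &= \nu^{O}_{j} + \lambda^{T,O}_{j}\, T_{p} + \lambda^{R}_{j}\, R_{kp} + \varepsilon^{O_k}_{jp}
  && \text{(informant } k = 1, 2, 3\text{)} \\
R_{kp}       &= \gamma\, R_{p} + \zeta_{kp}
  && \text{(higher-order Reputation)} \\
\operatorname{Cov}(I_{p}, R_{p}) &= 0
\end{aligned}
```

The informant equations carry no $k$ subscript on $\nu^{O}_{j}$, $\lambda^{T,O}_{j}$, or $\lambda^{R}_{j}$ because informants are treated as interchangeable. The loadings $\lambda^{T,S}_{j}$, $\lambda^{T,O}_{j}$, $\lambda^{R}_{j}$, and $\lambda^{I}_{j}$ are the quantities predicted in the second analysis step.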

Item content and characteristics

Of central interest to this study are the factor loadings of the items on each of the three latent factors Trait, (first-level) Reputation, and Identity. We aimed to predict items’ loadings from their (1) ABCD-content, (2) observability and evaluativeness, and (3) item’s importance for Identity and Reputation (available for Sample 1 only). To this end, we assembled these item characteristics for the IPIP-NEO-120 (Sample 1) and for the Big Five Inventory (Sample 2). Table 1 shows example items for the different characteristics from both measures.

Table 1.

Example Items for Different Characteristics.

Characteristic (scale range)		S1: IPIP-NEO-120 example items	S2: BFI example items
Observability (1–7)	High	Yells at people (6.00); radiates joy (6.00)	Is full of energy (6.20); is outgoing, sociable (6.11)
Observability (1–7)	Low	Does not like poetry (2.20); doesn’t understand people who get emotional (2.92)	Has few artistic interests (3.92); is ingenious, a deep thinker (4.22)
Evaluativeness (0–3)	High	Keeps his/her promises (2.77); uses others for his/her own ends (2.73)	Makes plans and follows through with them (2.10); is considerate and kind to almost everyone (2.08)
Evaluativeness (0–3)	Low	Does not like poetry (0.55); prefers to stick with things that he/she knows (0.55)	Tends to be quiet (0.06); prefers work that is routine (0.08)
Reputation Centrality (1–7)	High	Keeps his/her promises (6.37); loves to help others (6.33)
Reputation Centrality (1–7)	Low	Does not like poetry (2.48); does not enjoy going to art museums (2.78)
Identity Centrality (1–7)	High	Works hard (6.16); is concerned about others (6.12)
Identity Centrality (1–7)	Low	Does not like poetry (2.44); does not enjoy going to art museums (2.60)
A		Often feels blue; feels comfortable with him-/herself	Can be moody; gets nervous easily
B		Leaves a mess in his/her room; is always on the go	Is talkative; is a reliable worker
C		Has difficulty understanding abstract ideas; believes that he/she is better than others	Is curious about many different things; is inventive
D		Seeks adventure; cheats to get ahead	Values artistic, aesthetic experiences; prefers work that is routine

Note. For observability, evaluativeness, reputation centrality, and identity centrality, the two items with the highest/lowest mean scores (shown in parentheses) are presented. For ABCD, two representative items from each category were selected. S1: Sample 1; S2: Sample 2.

Sample 1

For Sample 1, we drew ratings of affect, behavior, cognition, and desire for the IPIP-NEO-120 from Wilt and Revelle (2015). In their study, participants indicated the proportions (in %) to which each item (from longer IPIP-versions) contained affective, behavioral, cognitive, or desire-related content. The reliability of these ratings ranged between .78 and .88 for the four ABCD-components (i.e., reliability of average raters indicated by ICC(3,6) following Shrout & Fleiss, 1979). We used the ABCD-ratings as categorical variables: An item with the highest percentage in affect (compared to B/C/D) is considered an A-item, and so on. The other option of using them as continuous variables where each item is associated with four percentages (for A/B/C/D, respectively) was only available for Sample 1 and is reported in Supplement D.
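The categorical coding described above amounts to picking the component with the largest percentage and then dummy-coding behavioral versus non-behavioral items. The sketch below illustrates this. The function names and the example percentages are hypothetical, and how ties were resolved is not stated in the text (here the first category listed wins).

```python
def dominant_category(percentages):
    """Return the ABCD component with the highest rated percentage.

    `percentages` maps "A", "B", "C", "D" to mean rated proportions (in %).
    Tie handling is an assumption: max() keeps the first maximum it meets.
    """
    return max(percentages, key=percentages.get)

def behavior_dummy(category):
    """Dummy code used in the preregistered models: 1 = behavior, 0 = A, C, or D."""
    return 1 if category == "B" else 0

# Hypothetical ratings for one item (not taken from Wilt and Revelle, 2015).
item = {"A": 10.0, "B": 70.0, "C": 15.0, "D": 5.0}
category = dominant_category(item)  # "B"
dummy = behavior_dummy(category)    # 1
```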

To obtain ratings of observability, evaluativeness, and items’ centrality to reputation/identity, we collected ratings from MTurk participants who were randomly assigned to rate one of the four characteristics for all IPIP-NEO-120 items. That is, they were asked to indicate for each item (1) how easy it would be to observe for an outside observer, (2) how positive (desirable) it would make a person look (i.e., social desirability from which evaluativeness is derived; see below), (3) how important each statement is for the understanding of one’s own identity, or (4) how important each statement is for a target’s reputation. Observability and social desirability have been successfully assessed for other instruments in previous studies; we adapted instructions based on John and Robins (1993). Responses were made on 7-point Likert scales (extremely difficult to extremely easy for observability and extremely undesirable to extremely desirable for social desirability). Evaluativeness is calculated from social desirability: An item’s evaluativeness ignores the positive-negative polarity and is given by the absolute deviation of the item’s social desirability score from the scale midpoint (i.e., 4 = neither undesirable nor desirable). For the trait centrality dimensions, we referred to instructions used by Thielmann et al. (2023). Ratings were made on 7-point Likert scales from very unimportant to very important.
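The derivation of evaluativeness from social desirability can be illustrated in a few lines. This is a sketch, not the authors' code. The function name and the example values are hypothetical; the midpoint of 4 and the resulting 0 to 3 range follow from the 7-point scale.

```python
def evaluativeness(mean_social_desirability, scale_midpoint=4.0):
    """Distance of an item's mean desirability rating from the neutral midpoint.

    Polarity is ignored: highly desirable and highly undesirable items
    both count as highly evaluative.
    """
    return abs(mean_social_desirability - scale_midpoint)

# Hypothetical mean desirability ratings on the 1-7 scale.
print(evaluativeness(6.5))  # 2.5 (very desirable)
print(evaluativeness(1.5))  # 2.5 (very undesirable)
print(evaluativeness(4.0))  # 0.0 (neutral)
```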

Per dimension, we aimed to collect 30−35 complete responses. Even expecting to exclude some participants (e.g., due to careless responding), previous studies have shown this to be more than enough for highly reliable ratings of item characteristics (e.g., Condon et al., 2020; Leising et al., 2014). Participants were required to live in North America and speak English. They received 4 USD for completing the assessment. Of 123 approved responses, we excluded participants that completed the survey in less than 3 minutes or showed clear indication of being fake (i.e., multiple responses from the same IP with the same nonsensical response to an open-ended question). The final sample comprised 99 participants (22−27 per characteristic) with a mean age of 39.99 years (SD = 11.16). Self-reported gender was female in 40 cases and male in 58 cases (one preferred not to say). Instructions, sample information, descriptive statistics, and the average ratings for all items can be found on the OSF.

Sample 2

For Sample 2, the ABCD-content of BFI items was rated by three of the authors of the present study, who are experts in personality and psychological measurement. Each rater separately assigned one content category to each item (κ = .85). In cases of disagreement, the final categorization was decided via discussion.

Ratings of observability and evaluativeness for BFI items were taken from a study by Huelsnitz et al. (2020) where a sample of 123 undergraduates provided ratings on 7-point Likert scales (not at all observable to extremely observable for observability and not at all desirable to extremely desirable for social desirability). Evaluativeness was again calculated from social desirability as described above.

Prediction of factor loadings

We used multilevel modeling to predict items’ factor loadings on the TRI factors from different item characteristics. Separate mixed effects models were calculated for each of the three latent factors. In these models, items are clustered within Big Five facets for Sample 1 (instead of domains as TRI Models had been estimated for each facet) and within Big Five domains for Sample 2. Additionally, for the Trait factor, the rating source (self vs. informant) was clustered within items. All continuous predictors were Z-scored. To examine our predictions, we constructed regression models with the following item-level predictors (in different models): (1) The categorical ABCD-ratings (dummy-coded to compare behavioral vs. non-behavioral items), (2) observability and evaluativeness, and (3) importance for one’s identity and importance for a target’s reputation (for Sample 1 only). To illustrate, below is the equation of the model which predicts factor loadings on the Identity factor testing whether behavioral items load higher than non-behavioral:

Identity Model I1b: ABCD-Content (Dummy-Coded B vs. ACD)

Level 1	$Y_{id} = \beta_{0d} + \beta_{1d} \times B_{i} + r_{id}$
Level 2	$\begin{array}{l} \beta_{0d} = \gamma_{00} + \gamma_{01} \times \mathrm{BigFive}_{\mathrm{facet/domain}} + u_{0d} \\ \beta_{1d} = \gamma_{10} \end{array}$
Where	$Y_{id} =$ factor loading of item $i$ belonging to Big Five domain/facet $d$; $r_{id} =$ random residuals for level 1; $u_{0d} =$ random residuals for level 2
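Substituting the Level 2 equations into the Level 1 equation gives a single combined expression. It is shown here only as a reading aid, using the same symbols as above:

```latex
Y_{id} = \gamma_{00}
       + \gamma_{01} \times \mathrm{BigFive}_{\mathrm{facet/domain}}
       + \gamma_{10} \times B_{i}
       + u_{0d} + r_{id}
```

In this form, $\gamma_{10}$ is the fixed effect of the behavioral dummy, corresponding to the row "ABCD (B vs. ACD)" in the results tables. The slope has no random component because $\beta_{1d} = \gamma_{10}$, so only the intercept varies across facets or domains through $u_{0d}$.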

All model equations can be found in the preregistration. Additionally, we explored models containing multiple predictors (e.g., ABCD-content as well as observability and evaluativeness) to gauge whether their effects are independent or not. Multilevel regression analyses were performed using R (version 4.1.1; R Core Team, 2021) and the lme4-package (Bates et al., 2015). Analysis code is available on the OSF.

Results

Estimation of TRI Models

Figure 2 shows the proportion of explained variance in the TRI Models for the 30 facets in the IPIP-NEO-120 in Sample 1 that can be attributed to Trait, Reputation, and Identity. The equivalent figure for Sample 2 (comprising the Big Five domains in the BFI) is found in Figure 2 of the original paper (McAbee & Connelly, 2016, p. 577). In Sample 1, Trait variance ranged from 17 to 58%, Identity from 7 to 44%, Observer Uniqueness from 18 to 44%, and Reputation from 0 to 11%. In Sample 2, ranges were similar for Trait (34−69%) and Identity (8−39%), but somewhat lower for Observer Uniqueness (2−27%) and higher for Reputation (8−19%). Also note that in Sample 1, we re-estimated two TRI Models (for the Openness facets “Artistic Interests” and “Adventurousness”) without the second-level Reputation factor as they originally failed to converge with first-level Reputation factors not loading onto the higher-level factor. Therefore, Reputation variance is 0% for those facets in Figure 2. The remaining models only required constraints in line with a priori strategies (i.e., constraining factor loadings to positivity or fixing residual variances to zero²) or no modifications at all.

Figure 2.

Proportion of Explained Variance Accounted for by Trait, Reputation, and Identity Factors.

The smaller Reputation variance may reflect the overall moderate inter-informant agreement in Sample 1 (see Supplement B for descriptive statistics for the personality ratings). In this sample, targets were first-year university students asked to solicit informants from their own network which included an array of contexts (e.g., peers from university and hometown friends) and from individuals with whom they may not have developed a cogent reputation that spans across informants. Accordingly, consensus between informants generally was not markedly stronger than self-informant consensus. We return to the pattern of lower Reputation variance in Sample 1 in the Discussion section.

Prediction of factor loadings

Next, we predicted the standardized factor loadings derived from the above model estimation from the items’ characteristics. The results for our preregistered models are found in Table 2 for predicting Trait, first-level Reputation, and Identity factor loadings, respectively. Specifically, in the following we focus on our fixed effects for which Table 2 displays the regression weights: A fixed effects estimate (b) for a continuous predictor (i.e., observability, evaluativeness, and reputation/identity centrality) signifies that if an item has a characteristic (e.g., observability) one standard deviation above the mean, its factor loading would be increased by b. For dummy-coded predictors such as B versus ACD, the fixed effect regression coefficient (b) is the change in factor loading for a B- versus an ACD-item. We include confidence intervals for each regression coefficient to help gauge estimation precision. Note that these intervals are based on the sampling of items rather than the sampling of individuals. That is, our dependent variable is an item’s loading on a given factor and the “sample size” for these analyses is based on the number of items in the inventories (IPIP-NEO-120 = 120 and BFI = 44) rather than the number of targets in the sample. In the following, we discuss the main patterns emerging from these analyses.
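The interpretation of a fixed effect for a Z-scored predictor can be made concrete with a small calculation. The sketch below uses the Sample 1 Identity estimates from Model 2 in Table 2 (intercept = .34, observability b = −.06) and holds evaluativeness at its mean. The function names are hypothetical, and the use of the sample standard deviation for Z-scoring is an assumption.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize item characteristics (assumption: sample SD is used)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def predicted_loading(intercept, b, z):
    """Model-implied loading for an item z standard deviations from the mean."""
    return intercept + b * z

# Sample 1, Identity factor, Model 2 (Table 2); other predictors at their mean.
at_mean = predicted_loading(0.34, -0.06, 0.0)
one_sd_above = predicted_loading(0.34, -0.06, 1.0)
print(round(at_mean, 2), round(one_sd_above, 2))  # 0.34 0.28
```

An item one standard deviation above the mean in observability is therefore expected to load about .06 lower on Identity than an item of average observability.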

Table 2.

Multilevel Regression Results for Predicting Trait, First-Level Reputation, and Identity Factor Loadings.

Model		Trait		Reputation (first-level)		Identity
Model		S1: IPIP-NEO-120	S2: BFI	S1: IPIP-NEO-120	S2: BFI	S1: IPIP-NEO-120	S2: BFI
Model 1: ABCD-Content	Fixed effects	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]
	Intercept	.25 [.21, .29]	.30 [.19, .40]	.53 [.48, .57]	.52 [.46, .58]	.38 [.32, .45]	.40 [.30, .50]
	Source (Self vs. Informant)	.24 [.20, .28]	.16 [.11, .22]
	ABCD (B vs. ACD)	−.01 [−.06, .05]	.04 [−.10, .18]	.08 [.01, .14]	.01 [−.06, .08]	−.10 [−.20, .00]	−.15 [−.28, −.01]
	Source × ABCD	.08 [.01, .14]	.07 [−.01, .16]
	Random effects	SD	SD	SD	SD	SD	SD
	Intercept item	.06	.16
	Intercept B5 Facet/Domain	.05	.08	.06	.05	.00	.06
	Residual	.13	.10	.16	.09	.28	.19
Model 2: Observability and Evaluativeness	Fixed effects	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]
	Intercept	.25 [.22, .28]	.31 [.25, .38]	.56 [.53, .59]	.53 [.47, .58]	.34 [.29, .39]	.33 [.23, .43]
	Source (Self vs. Informant)	.27 [.24, .30]	.20 [.16, .24]
	Observability	.02 [−.01, .05]	.03 [−.03, .10]	.02 [−.01, .05]	−.02 [−.06, .03]	−.06 [−.11, −.01]	−.06 [−.15, .02]
	Evaluativeness	−.02 [−.05, .01]	−.06 [−.12, .00]	.06 [.03, .09]	−.01 [−.05, .02]	−.02 [−.07, .03]	.08 [.01, .15]
	Source × Observability	.03 [.00^a, .06]	.05 [.00^a, .09]
	Source × Evaluativeness	.06 [.03, .09]	−.03 [−.07, .02]
	Random effects	SD	SD	SD	SD	SD	SD
	Intercept item	.06	.15
	Intercept B5 Facet/Domain	.04	.04	.00	.06	.00	.10
	Residual	.12	.10	.16	.09	.27	.18
Model 3: Centrality	Fixed effects			Est. [95% CI]		Est. [95% CI]
	Intercept			.56 [.53, .59]		.34 [.29, .39]
	Reputation centrality			.03 [.00^a, .06]
	Identity centrality					.00 [−.05, .05]
	Random effects			SD		SD
	Intercept B5 Facet/Domain			.03		.00
	Residual			.17		.28

Note. S1 = Sample 1, S2 = Sample 2. Source was dummy-coded as 0 = informant and 1 = self. ABCD-categorization was dummy-coded as 0 = affect, cognition, or desires and 1 = behaviors. Confidence intervals are based on the item sample sizes (i.e., IPIP-NEO-120 = 120 and BFI = 44).

^aCI limit <.01 and >.001.

Trait

To examine how item properties affect self-informant consensus, we predicted the items’ Trait factor loadings from the items’ characteristics. We also included main and interaction effects of the rating source, that is, whether the rating stemmed from the target or informants (dummy-coded as self = 1, informant = 0). Thus, the effects for the Trait factor loadings in Table 2 can be interpreted as follows: (1) The main effect of an item-level predictor (e.g., observability) shows the effect of that predictor for the Trait loadings of informants’ ratings (the dummy-coded reference category). (2) The main effect plus the interaction effect with the rating source (e.g., observability plus source × observability) shows the effect of that predictor (i.e., observability) in the Trait loadings of targets’ self-ratings. Note that diverging from our preregistration, we excluded the random item-level variation of the source-effect as the inclusion of this effect left no residual variance in several cases leading to model non-convergence.

Overall, we found that behavioral items had higher Trait loadings than non-behavioral items for self-reports (Sample 1: b_ABCD + b_Source×ABCD = .07; Sample 2: b_ABCD + b_Source×ABCD = .11), but this effect was diminished and not discernible from 0 for informant-report Trait loadings (b_ABCD = −.01 and .04 in Samples 1 and 2, respectively). This was similar for observability: Items with higher observability had higher Trait loadings for self-reports (Sample 1: b_{Observability} + b_{Source×Observability} = .05; Sample 2: b_{Observability} + b_{Source×Observability} = .08) but much less so for informant-report loadings (b_{Observability} = .02 and .03 in Samples 1 and 2, respectively). The effect of evaluativeness was partly inconsistent across our two samples. For informant-ratings, items with higher evaluativeness had somewhat lower Trait loadings (b_{Evaluativeness} = −.02 and −.06 in Samples 1 and 2, respectively). For BFI self-ratings in Sample 2, this effect was even stronger (b_{Evaluativeness} + b_{Source×Evaluativeness} = −.09), but for IPIP-NEO-120 self-ratings in Sample 1 it reversed such that more evaluative items were associated with slightly higher Trait loadings (b_{Evaluativeness} + b_{Source×Evaluativeness} = .04). In summary, we found that behavioral items and observable items tended to promote self-informant agreement. Evaluativeness was mostly associated with less Trait consensus but actually promoted consensus for Sample 1 self-ratings. One explanation may lie in unique features of the different questionnaires such that evaluative items are associated with certain ABCD-content (see post-hoc exploration). Another possibility may be features of the informants who may, for example, be partially biased in a way that aligns with the target’s self-perception, thus promoting agreement.
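The self-report effects quoted above are sums of a main effect and its interaction with rating source. The sketch below reproduces that arithmetic from the Table 2 estimates. The function and variable names are hypothetical, and the sums are point estimates only, with no confidence intervals.

```python
def simple_slopes(b_main, b_interaction):
    """Effect of an item characteristic on Trait loadings by rating source.

    Source is dummy-coded 0 = informant, 1 = self, so the main effect applies
    to informant loadings and main + interaction applies to self loadings.
    """
    return {"informant": b_main, "self": b_main + b_interaction}

# (main effect, Source x predictor interaction) taken from Table 2.
table2 = {
    ("S1", "ABCD"): (-0.01, 0.08),
    ("S2", "ABCD"): (0.04, 0.07),
    ("S1", "Observability"): (0.02, 0.03),
    ("S2", "Observability"): (0.03, 0.05),
    ("S1", "Evaluativeness"): (-0.02, 0.06),
    ("S2", "Evaluativeness"): (-0.06, -0.03),
}

self_effects = {key: round(simple_slopes(*vals)["self"], 2)
                for key, vals in table2.items()}
for key, value in self_effects.items():
    print(key, value)
```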

Reputation

To examine how item characteristics relate to informants’ distinct reputation perceptions, we regressed first-level (rater-specific) Reputation factor loadings on sets of item characteristics. The results were overall less clear: Behavioral items had higher loadings for Sample 1 (b_ABCD = .08) but not for Sample 2 (b_ABCD = .01). The effects of observability were small and in opposite directions across samples (b_{Observability} = .02 and −.02 for Samples 1 and 2, respectively) with the confidence intervals of both effects including zero, suggesting that observability had little relation to Reputation saturation. Evaluativeness had a moderate positive effect in Sample 1 (b_{Evaluativeness} = .06) but a near-zero effect in Sample 2 (b_{Evaluativeness} = −.01). Finally, the reputational centrality of items was only slightly related to the magnitude of first-level Reputation factor loadings (b_RCentrality = .03).³ On the whole, Sample 1 showed more unique informant impressions for behavioral and for evaluative items in ratings on the IPIP-NEO-120, whereas no clear pattern emerged for BFI-ratings in Sample 2.

Identity

For loadings on the Identity factor, negative effects emerged for behavioral items (b_ABCD = −.10 and −.15 for Samples 1 and 2, respectively) and for observability (b_{Observability} = −.06 in each sample). Thus, consistent with expectations, targets had more unique self-views for non-behavioral items and for less observable items. Results for evaluativeness were inconsistent: There were more unique self-views on more evaluative items in Sample 2 (b_{Evaluativeness} = .08) as predicted but not in Sample 1 (b_{Evaluativeness} = −.02). Finally, ratings of items’ centrality to identity were unrelated to their factor loadings on Identity (b_ICentrality = .00).

Post-hoc exploration

The results of our preregistered analyses highlighted two points to follow up on post hoc. First, drawing on existing theory and in pursuit of statistical parsimony, our multilevel regression models did not differentiate between affective, cognitive, and desire-related items. However, it is possible that these three types of items have distinctive effects. Therefore, we examined the ABCD-separated mean factor loadings on Trait, first-level Reputation, and Identity factors (see Table 3) and entered ABCD-content as a four-tiered factor into regression models (using behavioral items as the reference category; see Table 4). Second, we had investigated ABCD-content separately from observability and evaluativeness. However, these dimensions may not be independent from one another. Specifically, we found that, descriptively, the relation of ABCD-content with observability and evaluativeness differed between the two questionnaires (see Supplement E). In the IPIP-NEO-120, for example, behavioral items were also rated as the most evaluative on average. In the BFI, affective items were associated with the highest mean evaluativeness. Thus, to disentangle these effects, we decided to exploratorily enter observability and evaluativeness into one common regression model with the above-mentioned ABCD-factor. Results are shown in Table 4.

Table 3.

ABCD-Separated Mean Standardized Factor Loadings (SD).

	S1: IPIP-NEO-120				S2: BFI
	i	T	R	I	i	T	R	I
A	46	.40 (.19)	.54 (.19)	.33 (.28)	9	.47 (.19)	.60 (.11)	.36 (.24)
B	49	.40 (.21)	.60 (.11)	.28 (.26)	20	.46 (.22)	.51 (.09)	.23 (.17)
C	22	.31 (.19)	.51 (.20)	.50 (.30)	12	.34 (.25)	.52 (.09)	.46 (.24)
D	3	.46 (.20)	.41 (.36)	.27 (.08)	3	.19 (.04)	.44 (.15)	.39 (.15)

Note. i shows the number of items for a given ABCD-category. S1 = Sample 1, S2 = Sample 2. T = Trait, R = first-level Reputation, I = Identity. For Trait, the average across all Trait loadings (self and informant) is shown, and separated loadings are shown in Supplement F.

Table 4.

Exploratory Multilevel Regression Results for Predicting Trait, First-Level Reputation, and Identity Factor Loadings.

Model		Trait		Reputation (first-level)		Identity
Model		S1: IPIP-NEO-120	S2: BFI	S1: IPIP-NEO-120	S2: BFI	S1: IPIP-NEO-120	S2: BFI
Separated ABCD-Content	Fixed effects	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]
	Intercept	.27 [.23, .31]	.36 [.27, .44]	.60 [.55, .65]	.52 [.47, .57]	.28 [.21, .36]	.24 [.15, .34]
	Source (Self vs. Informant)	.27 [.24, .30]	.20 [.15, .24]
	A	−.01 [−.06, .04]	.01 [−.14, .16]	−.06 [−.14, .01]	.06 [−.02, .15]	.05 [−.06, .16]	.13 [−.04, .30]
	C	−.10 [−.15, −.04]	−.11 [−.24, .03]	−.08 [−.17, .00]	.00 [−.08, .07]	.21 [.07, .35]	.20 [.05, .35]
	D	.06 [−.07, .18]	−.26 [−.47, −.04]	−.19 [−.38, .00]	−.08 [−.20, .03]	−.01 [−.33, .30]	.13 [−.11, .37]
	Random effects	SD	SD	SD	SD	SD	SD
	Intercept item	.05	.16
	Intercept B5 Facet/Domain	.04	.03	.05	.02	.00	.04
	Residual	.13	.10	.16	.09	.27	.19
Separated ABCD-Content, Observability, and Evaluativeness	Fixed effects	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]	Est. [95% CI]
	Intercept	.26 [.22, .30]	.33 [.25, .41]	.58 [.54, .63]	.53 [.48, .59]	.30 [.22, .38]	.28 [.16, .40]
	Source (Self vs. Informant)	.27 [.24, .30]	.20 [.15, .24]
	A	.00 [−.05, .05]	.07 [−.05, .20]	−.04 [−.10, .03]	.06 [−.04, .15]	.03 [−.08, .14]	.09 [−.12, .29]
	C	−.07 [−.13, −.01]	−.06 [−.20, .09]	−.04 [−.12, .05]	−.04 [−.12, .05]	.16 [.02, .31]	.10 [−.09, .28]
	D	.07 [−.05, .19]	−.29 [−.49, −.08]	−.15 [−.33, .04]	−.13 [−.25, −.01]	−.04 [−.35, .28]	.09 [−.16, .34]
	Observability	.03 [.01, .05]	.04 [−.03, .10]	.01 [−.02, .04]	−.03 [−.07, .01]	−.04 [−.09, .01]	−.05 [−.13, .04]
	Evaluativeness	.01 [−.01, .03]	−.09 [−.14, −.04]	.06 [.03, .09]	−.02 [−.05, .02]	−.01 [−.06, .04]	.07 [.00, .14]
	Random effects	SD	SD	SD	SD	SD	SD
	Intercept item	.03	.14
	Intercept B5 Facet/Domain	.04	.00	.00	.03	.00	.08
	Residual	.13	.10	.16	.08	.27	.18

Note. S1 = Sample 1, S2 = Sample 2. Source was dummy-coded as 0 = informant and 1 = self. ABCD-categorization was entered as a four-tiered factor with “behaviors” as the reference category. Confidence intervals are based on the item sample sizes (i.e., IPIP-NEO120 = 120 and BFI = 44).

^aCI limit <.01 and >.001.

The ABCD-separated mean factor loadings in Table 3 show a descriptive pattern that is consistent across the two samples. For the Trait factor, affective and behavioral items show similar mean loadings (S1: $\bar{\lambda}_{A}$ = .40 and $\bar{\lambda}_{B}$ = .40; S2: $\bar{\lambda}_{A}$ = .47 and $\bar{\lambda}_{B}$ = .46), whereas cognitive items load visibly lower ($\bar{\lambda}_{C}$ = .31 and .34 for Samples 1 and 2, respectively). In contrast, for the Identity factor, affective items have somewhat higher loadings than behavioral items on average (S1: $\bar{\lambda}_{A}$ = .33 and $\bar{\lambda}_{B}$ = .28; S2: $\bar{\lambda}_{A}$ = .36 and $\bar{\lambda}_{B}$ = .23), but cognitive items load markedly higher ($\bar{\lambda}_{C}$ = .50 and .46 for Samples 1 and 2, respectively). We refrain from drawing conclusions for the desire-related items in these analyses because of their small representation in the item pools (i.e., only three items in each questionnaire were classified as desire-related). The regression models in Table 4 test and support this descriptive pattern. For parsimony of the exploratory models, we removed the interaction effects with rating source (models including these interactions are available in Supplement G). Cognitive items seemed to especially draw unique self-views ($b_{C}$ = .21 and .20 for Samples 1 and 2, respectively) and less self-informant agreement ($b_{C}$ = −.10 and −.11 for Samples 1 and 2, respectively). In contrast, affective items seemed to promote agreement comparable to that of behavioral items ($b_{A}$ = −.01 and .01 for Samples 1 and 2, respectively).

Finally, we sought to disentangle the effects of ABCD-content from observability and evaluativeness. When observability and evaluativeness were additionally entered into the regression model, some effects of ABCD-content, observability, and evaluativeness became somewhat weaker and others somewhat stronger. However, the overall patterns were still consistent with previous results: For example, affective items had similar loadings for the Trait factor as behavioral items, whereas cognitive items drew especially unique self-views and less self-informant agreement. Observability estimates were positive for Trait loadings and negative for Identity loadings in both samples, although several of the confidence intervals included zero (see Table 4). The effects of evaluativeness and those for Reputation loadings remained inconsistent across samples. Overall, this exploration suggests that ABCD-content and observability/evaluativeness contribute overlapping but non-redundant information about which items are associated with consensual versus discrepant personality perspectives.
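As a rough reading aid for the second model in Table 4, the sketch below flags which observability and evaluativeness effects have 95% confidence intervals that exclude zero. The interval limits are the rounded values printed in the table, and the names `CONFIDENCE_INTERVALS` and `excludes_zero` are hypothetical. Because the limits are rounded, an interval printed with a limit of .00 is treated here as including zero, which may not match the unrounded result.

```python
# 95% CIs copied from the second model in Table 4.
# Keys: (predictor, factor, sample); values: (lower limit, upper limit).
CONFIDENCE_INTERVALS = {
    ("Observability", "Trait", "S1"): (0.01, 0.05),
    ("Observability", "Trait", "S2"): (-0.03, 0.10),
    ("Observability", "Reputation", "S1"): (-0.02, 0.04),
    ("Observability", "Reputation", "S2"): (-0.07, 0.01),
    ("Observability", "Identity", "S1"): (-0.09, 0.01),
    ("Observability", "Identity", "S2"): (-0.13, 0.04),
    ("Evaluativeness", "Trait", "S1"): (-0.01, 0.03),
    ("Evaluativeness", "Trait", "S2"): (-0.14, -0.04),
    ("Evaluativeness", "Reputation", "S1"): (0.03, 0.09),
    ("Evaluativeness", "Reputation", "S2"): (-0.05, 0.02),
    ("Evaluativeness", "Identity", "S1"): (-0.06, 0.04),
    ("Evaluativeness", "Identity", "S2"): (0.00, 0.14),
}


def excludes_zero(lower, upper):
    """True when both CI limits lie strictly on the same side of zero."""
    return lower > 0 or upper < 0


flagged = [key for key, (low, high) in CONFIDENCE_INTERVALS.items()
           if excludes_zero(low, high)]
for predictor, factor, sample in flagged:
    print(predictor, factor, sample)
```

With the rounded limits, this screen flags observability for Trait loadings in Sample 1 and evaluativeness for Trait loadings in Sample 2 and for Reputation loadings in Sample 1.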

Discussion

Although personality psychology has a long history of using self- and informant-reports, there is a lack of knowledge on how to actually integrate and leverage the personality insights and knowledge asymmetries from these perspectives. The present study shows how disentangling and understanding the insights held by the different perspectives may build a foundation toward that goal. Specifically, in two large multi-rater data sets, we examined how item characteristics relate to the shared (Trait factor loadings) and unique (Reputation and Identity factor loadings) perspectives across self and others.

The findings from this study support several general conclusions. First, we saw that the ABCD personality content reflected in items was important. At first glance, behavioral items seemed to promote Trait consensus, though they were much more important for the target’s than the informants’ Trait factor loadings. However, we found that contrasting behavioral with non-behavioral content obfuscated a more nuanced picture where affective and cognitive items had rather specific effects. In fact, items assessing cognitions generally produced the weakest Trait consensus between self and informant, whereas the agreement on affective content was comparable to that on behaviors. Conversely, cognitive items were among the strongest markers for Identity and behavioral items were among the weakest.

These results may be grounded in different reasons. First, they may in part be explained by the type of informants in the studies. Specifically, informants were relatively intimately acquainted with targets (i.e., friends, family, and significant others), which likely afforded them more privileged access to targets’ feelings (e.g., Colvin & Funder, 1991; Vazire, 2010). Although intimacy would seemingly also afford access to cognitions, it may be that informants attend more closely to emotional expressions than cognitions as anchor points when forming trait perceptions. Alternatively, even at increased intimacy, cognitions may be more rarely expressed in behavior than emotions—or their expression may be more controlled by the target (and therefore less aligned with the target’s internal state). That is, we may deem both emotions and cognitions to be primarily internal processes, yet emotions may more readily and easily bleed into behaviors including subtle and even unconscious expressions such as posture or facial expressions, whereas cognitions have to be expressed verbally. In any case, targets’ thoughts appear to be relatively private and constitute markers for navigating the inner world of a person’s unique self-views. Regarding desire items, we could not draw clear conclusions: Both questionnaires contained only three items categorized as such and results were thus accompanied by large standard errors. The underrepresentation of desire items in Big Five measures has been noted before by Wilt and Revelle (2015). If personality researchers believe that desires represent an important component of the personality space, future measures should include a greater proportion of desire-relevant content to elucidate the role of desires in the formation of trait judgments across rating sources.

Second, as hypothesized, more observable items generally were associated with stronger loadings on the Trait factor and weaker loadings on the Identity factor. This finding corresponds to long-standing person perception theory emphasizing the centrality of trait information being contextually relevant and perceptually available for reaching consensus (Funder, 1995; Kenny, 1994). The importance of observability was more pronounced for loadings of targets’ self-reports on the Trait factor than for informant-reports. In contrast, items low in observability generally produced especially strong loadings for self-reports on the Identity factor, suggesting that Identity factors largely emphasize more private, low visibility trait content.

Our results for evaluativeness were less consistent between samples. Though there was some evidence suggesting more evaluative items had weaker loadings on the Trait factor, moderation by source and the impact of evaluativeness on Reputation and Identity factors did not produce replicable patterns between the two samples. One source of these differences may lie in the relation of evaluativeness and social desirability in the two different measures (see also Supplement E): In the BFI, these two dimensions were positively correlated (r = .38, p = .012) because many of the most evaluative items were desirable (i.e., positive) items. In the IPIP-NEO-120, these dimensions were not significantly correlated (r = −.05, p = .57) and evaluative items included both positive and negative items. If social desirability (i.e., the degree of positivity) and not merely evaluativeness affects (targets’) responses to these items, this may in part explain why for the BFI-ratings in Sample 2, evaluativeness promoted unique self-views and led to less Trait consensus, whereas this was not the case in Sample 1.

Next, we also examined the notion that some items may assess content that is more central to targets’ identity or reputation. However, item centrality ratings were generally unrelated to loadings on the respective Identity and Reputation factors. Thus, how central an item is for a person’s identity or reputation may represent more of an individual difference. Accordingly, identity/reputation centrality may be better studied with analytic approaches equipped to study idiographic elements of consensus and discrepancy in personality perceptions (e.g., Biesanz, 2010).

It is notable that in contrast to Trait and Identity, the effects for predicting Reputation factor loadings were less consistent and clear. In Sample 1, for instance, informants had more unique views on behavioral than non-behavioral items as well as on more evaluative items. In Sample 2, there were more unique informant-views on affective content specifically as well as less observable content, but no effect of evaluativeness. It is difficult to establish whether that stems from differences in the measures or informants (see limitations) or is a property of informants’ unique insights being less content-specific.

Finally, one subtler but important finding was the strong effect of rater uniqueness (both in the strength of the Identity factor and in the observer uniqueness factors). For most facets, uniqueness accounted for between 25% and 35% of variance (see Figure 2), reflecting nontrivial discrepancies in how individual raters viewed a target despite the presence of generally stronger Trait factors. Thus, although there is meaningful consensus across raters, measuring a target’s personality from a single rater will always afford an incomplete picture that is mired in rater method variance. This is in line with Connelly and Ones’ (2010) findings that personality interrater reliabilities are meaningful but modest, and it highlights the importance of adopting multi-rater measures of personality (much in the same way that researchers may aggregate multiple items to assess a trait with improved reliability).
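The aggregation analogy can be illustrated with the Spearman-Brown formula, which gives the reliability of an average over several parallel measurements. This is a textbook sketch rather than part of the present analyses: it assumes interchangeable raters, which the TRI Model does not require, and the single-rater value of .40 is a hypothetical placeholder, not an estimate from this study.

```python
def spearman_brown(single_rater_reliability, n_raters):
    """Reliability of the mean of n parallel raters (Spearman-Brown formula)."""
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-rater reliability of .40, aggregated over 1 to 3 raters.
for k in (1, 2, 3):
    print(k, round(spearman_brown(0.40, k), 2))
```

Under these assumptions, averaging three raters raises the composite reliability from .40 to about .67, which mirrors the gain obtained by aggregating items within a scale.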

In addition, it was a notable divergence from past TRI Model research (Connelly et al., 2022; McAbee & Connelly, 2016) that variance accounted for by the second-order Reputation factor was relatively weak across most facets in Sample 1, suggesting that raters’ unique perspectives do not converge strongly beyond the Trait factor. On one hand, this weak Reputation variance may be a product of this particular sample: Informants were generally friends and classmates, and since targets were only first-year university students, a meaningful reputation may not have emerged (yet) within this particular context. On the other hand, it may be that the present study’s TRI modeling of facets instead of factors produced relatively weaker Reputation variance. Indeed, it may be that Reputations exist at a much more general level (e.g., a reputation for agreeableness) rather than at the level of more specific traits (e.g., a reputation for sympathy). Regardless of whether observer uniquenesses coalesce into a distinct reputation or diverge (beyond the Trait factor), these findings underscore the value added by multi-rater measures.

Implications

These results have important implications for studying person perception, integrating multi-rater perspectives, and improving personality measurement. The person perception literature has typically emphasized how accuracy is facilitated by high observability and inhibited by high evaluativeness (e.g., Funder, 1995; John & Robins, 1993; Kenny, 1994), suggesting that behavioral manifestations of neutrally desirable personality traits should produce the strongest consensus. Finding that items’ Trait factor loadings were strong for not only behaviors but also for affect suggests that either (acquainted) informants may be similarly attuned to some more internal trait tendencies or that affective traits may be less internal than we typically believe. By contrast, items assessing cognitions produced the least consensus between self and informants, strongly defining Identity factors and reflecting elements of traits relatively unknown to others. The informants in these studies, who were primarily friends and family members, may have been particularly attuned to affective trait content in targets, though it is notable that the benefits of intimacy for consensus did not extend to cognitions. Accordingly, person perception researchers may fruitfully further explore what makes cognitions so differentiated.

Our findings also have clear implications for how personality measures can be created to leverage the knowledge asymmetries between self- and informant-raters. That is, inventory creators can specifically orient their instruments toward assessing Trait, Reputation, or Identity components of personality dimensions. Traits, for example, may be best assessed by writing a mix of affective and behavioral items that are outwardly observable to others. Writing items about associated cognitions may provide specifically useful anchor points in assessing Identity for a given personality dimension. Notably, this requires creators to explicitly consider these content dimensions, especially the ABCD-content, when designing their instruments. Currently, most instruments assessing the Big Five are compiled without such considerations and thus are unbalanced in that regard (Jackson et al., 2010; Wilt & Revelle, 2015). For example, Conscientiousness items in both questionnaires used here were largely behavioral, which has been found for Big Five instruments in general (Pytlik Zillig et al., 2002). An interest in assessing the unique self-views on this dimension could be aided by including related cognitions. Our results were less pronounced and somewhat inconsistent for Reputation. Gearing instruments toward assessing unique informant-views will require additional research, and researchers may also need to consider the specific type of informants used. In Sample 1, Reputation factors seemed to be defined by evaluative items that were related to behaviors. However, these effects did not replicate in Sample 2, which showed substantial unique views on affective items in particular. One final point for personality assessments is that consciously including evaluative items can be valuable if the goal is to capture unique views, which is distinct from existing efforts to make items non-evaluative to reduce social desirability in responses (e.g., Bäckström et al., 2009).

Limitations and future directions

The following potential limitations should be considered in interpreting the present findings: First, our samples of informants comprised individuals who were (1) relatively or even well acquainted with their respective target and (2) nominated by the target. Thus, these informants may have had more opportunities to gain insight into the targets’ feelings and related inner states and may also have had generally positive attitudes toward the target. Both knowing and liking can affect the pattern of self-other agreement (e.g., Leising et al., 2010; Wessels et al., 2020). Additionally, the samples included primarily informants from personal contexts, which still comprised quite different groups (e.g., university peers, longtime friends, and relatives). Future studies may thus consider and systematically vary (1) the level of acquaintance and (2) the contexts of informants to see how they define Reputation.

Second, our findings may also be influenced by the particularities of the two personality inventories used (i.e., IPIP-NEO-120 and BFI). For example, both questionnaires contained very few desire items, making it difficult to draw any conclusions for this content component. Future research may aim to explicitly write and include such items. Because desires are included in definitions of personality, it would be important to understand how they are perceived by the self and by others. Additionally, we must consider the overall sample size of items included in inventories: The BFI consists of just 44 items, limiting the sample size in the respective regression models. The IPIP-NEO-120, on the other hand, offers a greater range in the different item characteristics with its 120 items. At the same time, however, its facet structure made it difficult to estimate Trait, Reputation, and Identity factors in the first place. Even at the facet level, not all TRI Models converged, and the corresponding Trait, Reputation, and Identity factors often reflected facet-content. This may in part stem from the lack of internal consistency for some of the facets (see Supplement B). For future investigations in this line, personality inventories should be selected with consideration for both (1) their overall item number and (2) the degree to which their items actually assess one unidimensional facet or domain.

Finally, our central analyses focus on the prediction of a vector of factor loadings, which have been shown to require large sample sizes to be estimated reliably. For example, Hirschfeld et al. (2014) showed that, for some personality inventory items, factor loadings (in exploratory factor analysis) can be unstable even with more than 500 or 1,000 participants. We do use two of the largest multi-rater data sets that TRI modeling can be applied to: With their target sample sizes of 664 and 478, they are in a range where many items do stabilize. Additionally, the Reputation-relevant loadings are based on the larger numbers of informants, which should make them more stable. However, collecting larger multi-rater samples and increasing item sample sizes will be essential to further progress in future studies.

Conclusion

Across the last few decades, personality psychologists have considered the asymmetric and complementary value of self- and informant-reports. However, this was the first study to investigate what personality content is reflected in the disentangled shared and unique components of multi-rater judgments. Our findings suggest that observable, behavioral, but interestingly also affective content may be well suited to assess consensus, whereas targets’ unique self-perceptions are captured especially by cognitive content. These insights build the necessary prerequisite to successfully leverage multi-rater perspectives in personality judgments. Specifically, we invite researchers and practitioners not only to consider which perspectives are important for their purpose in line with what content they reflect but also to develop future personality measures explicitly with the aim of capturing those insights. This could be an important step in advancing the goal-directedness and specificity of personality assessments.

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Preparation of this manuscript was supported by funding granted to Brian S. Connelly from the Canada Research Chairs Program (Title: Canada Research Chair on Integrative Perspectives on Personality, File number: 233149).

Open science statement

Supplements, study materials, analysis code, the item characteristics data set for the IPIP-NEO-120, and data sets necessary for the multilevel regression procedures can be accessed at https://osf.io/wq3jm. The preregistration is available at https://osf.io/gqmsf. The SIP³ (i.e., Sample 1) data set necessary for TRI Model estimation containing individual ratings is not publicly available because this was explicitly assured to participants in the consent form. However, the data set was uploaded to a private component on the OSF and will be made available to interested researchers upon request. Information and data for the Eugene-Springfield Community Sample is available at .

ORCID iDs

Anne Wiedenroth

Brian S. Connelly

Notes

References

1. Alicke, M. D., & Sedikides (2009). Self-enhancement and self-protection: What they are and what they do. European Review of Social Psychology, 20(1), 1–48. https://doi.org/10.1080/10463280802613866

2. Allik, Realo, Mõttus, Borkenau, Kuppens, & Hřebíčková (2010). How people see others is different from how people see themselves: A replicable pattern across cultures. Journal of Personality and Social Psychology, 99(5), 870–882. https://doi.org/10.1037/a0020963

3. Bäckström, Björklund, & Larsson, M. R. (2009). Five-factor inventories have a major general factor related to social desirability which can be reduced by framing items neutrally. Journal of Research in Personality, 43(3), 335–344. https://doi.org/10.1016/j.jrp.2008.12.013

4. Bader, Jobst, L. J., & Moshagen (2022). Sample size requirements for bifactor models. Structural Equation Modeling: A Multidisciplinary Journal, 29(5), 772–783. https://doi.org/10.1080/10705511.2021.2019587

5. Bates, Mächler, Bolker, & Walker (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01

6. Baumert, Schmitt, Perugini, Johnson, Blum, Borkenau, Costantini, Denissen, J. J. A., Fleeson, Grafton, Jayawickreme, Kurzius, MacLeod, Miller, L. C., Read, S. J., Roberts, Robinson, M. D., Wood, Wrzus, & Mõttus (2017). Integrating personality structure, personality process, and personality development. European Journal of Personality, 31(5), 503–528. https://doi.org/10.1002/per.2115

7. Benet-Martínez & John, O. P. (1998). Los Cincos Grandes across cultures and ethnic groups: Multitrait-multimethod analyses of the Big Five in Spanish and English. Journal of Personality and Social Psychology, 75(3), 729–750. https://doi.org/10.1037/0022-3514.75.3.729

8. Biesanz, J. C. (2010). The social accuracy model of interpersonal perception: Assessing individual differences in perceptive and expressive accuracy. Multivariate Behavioral Research, 45(5), 853–885. https://doi.org/10.1080/00273171.2010.519262

9. Borkenau, Zaltauskas, & Leising (2009). More may be better, but there may be too much. Optimal trait level and self-enhancement bias. Journal of Personality, 77(3), 825–858. https://doi.org/10.1111/j.1467-6494.2009.00566.x

10. Brunswik (1956). Perception and the representative design of psychological experiments. University of California Press.

11. Carlson, E. N., Vazire, & Oltmanns, T. F. (2013). Self-other knowledge asymmetries in personality pathology. Journal of Personality, 81(2), 155–170. https://doi.org/10.1111/j.1467-6494.2012.00794.x

12. Colvin, C. R., & Funder, D. C. (1991). Predicting personality and behavior: A boundary on the acquaintanceship effect. Journal of Personality and Social Psychology, 60(6), 884–894. https://doi.org/10.1037/0022-3514.60.6.884

13. Condon, D. M., Wood, Mõttus, Booth, Costantini, Greiff, Johnson, Lukaszewski, Murray, Revelle, Wright, A. G. C., Ziegler, & Zimmermann (2020). Bottom up construction of a personality taxonomy. European Journal of Psychological Assessment, 36(6), 923–934. https://doi.org/10.1027/1015-5759/a000626

14. Connelly, B. S., McAbee, S. T., Oh, I.-S., Jung, & Jung, C.-W. (2022). A multirater perspective on personality and performance: An empirical examination of the trait–reputation–identity model. Journal of Applied Psychology, 107(8), 1352–1368. https://doi.org/10.1037/apl0000732

15. Connelly, B. S., & Ones, D. S. (2010). An other perspective on personality: Meta-analytic integration of observers' accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122. https://doi.org/10.1037/a0021212

16. Connolly, J. J., Kavanagh, E. J., & Viswesvaran (2007). The convergent validity between self and observer ratings of personality: A meta-analytic review. International Journal of Selection and Assessment, 15(1), 110–117. https://doi.org/10.1111/j.1468-2389.2007.00371.x

17. DePaulo, B. M. (1992). Nonverbal behavior and self-presentation. Psychological Bulletin, 111(2), 203–243. https://doi.org/10.1037/0033-2909.111.2.203

18. Dunning (1999). A newer look: Motivated social cognition and the schematic representation of social concepts. Psychological Inquiry, 10(1), 1–11. https://doi.org/10.1207/s15327965pli1001_1

19. Edwards, A. L. (1953). The relationship between the judged desirability of a trait and the probability that the trait will be endorsed. Journal of Applied Psychology, 37(2), 90–93. https://doi.org/10.1037/h0058073

20. Farooq (2022). Heywood cases: Possible causes and solutions. International Journal of Data Analysis Techniques and Strategies, 14(1), 79–88. https://doi.org/10.1504/IJDATS.2022.121506

21. Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological Review, 102(4), 652–670. https://doi.org/10.1037/0033-295X.102.4.652

22. Funder, D. C. (1999). Personality judgment: A realistic approach to person perception. Academic Press.

23. Funder, D. C., & Dobroth, K. M. (1987). Differences between traits: Properties associated with interjudge agreement. Journal of Personality and Social Psychology, 52(2), 409–418. https://doi.org/10.1037/0022-3514.52.2.409

24. Funder, D. C., & Sneed, C. D. (1993). Behavioral manifestations of personality: An ecological approach to judgmental accuracy. Journal of Personality and Social Psychology, 64(3), 479–490. https://doi.org/10.1037/0022-3514.64.3.479

25. Furr, R. M. (2009). Personality psychology as a truly behavioural science. European Journal of Personality, 23(5), 369–401. https://doi.org/10.1002/per.724

26. Furr, R. M., & Funder, D. C. (2007). Behavioral observation. In Robins, R. W., Fraley, R. C., & Krueger, R. F. (Eds.), Handbook of research methods in personality psychology (pp. 273–291). Guilford Press.

27. Goldberg, L. R. (1990). An alternative “description of personality”: The Big-Five factor structure. Journal of Personality and Social Psychology, 59(6), 1216–1229. https://doi.org/10.1037/0022-3514.59.6.1216

28. Goldberg, L. R. (1999). A broad-bandwidth, public domain, personality inventory measuring the lower-level facets of several five-factor models. In Mervielde, Deary, De Fruyt, & Ostendorf (Eds.), Personality psychology in Europe (Vol. 7, pp. 7–28). Tilburg University Press.

29. Goldberg, L. R., Johnson, J. A., Eber, H. W., Hogan, Ashton, M. C., Cloninger, C. R., & Gough, H. G. (2006). The International Personality Item Pool and the future of public-domain personality measures. Journal of Research in Personality, 40(1), 84–96. https://doi.org/10.1016/j.jrp.2005.08.007

30. Heynicke, Rau, Leising, Wessels, N. M., & Wiedenroth (2022). Perceiver effects in person perception reflect acquiescence, positivity, and trait-specific content: Evidence from a large-scale replication study. Social Psychological and Personality Science, 13(4), 839–848. https://doi.org/10.1177/19485506211039101

31. Hirschfeld, Brachel, R. v., & Thielsch (2014). Selecting items for Big Five questionnaires: At what sample size do factor loadings stabilize? Journal of Research in Personality, 53, 54–63. https://doi.org/10.1016/j.jrp.2014.08.003

32. Hofstee, W. K. (1994). Who should own the definition of personality? European Journal of Personality, 8(3), 149–162. https://doi.org/10.1002/per.2410080302

33. Hogan (1996). A socioanalytic interpretation of the Five-Factor Model. In Wiggins, J. S. (Ed.), The five-factor model of personality (pp. 163–179). Guilford.

34. Hogan (1998). Reinventing personality. Journal of Social and Clinical Psychology, 17(1), 1–10. https://doi.org/10.1521/jscp.1998.17.1.1

35. Hogan & Blickle (2013). Socioanalytic theory. In Christiansen, N. D., & Tett, R. P. (Eds.), Handbook of personality at work (pp. 53–70). Routledge.

36. Hu, L.-t., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118

37. Huelsnitz, C. O., Neel, & Human, L. J. (2020). Accuracy in perceptions of fundamental social motives: Comparisons to perceptions of Big Five traits and associations with friendship quality. Personality and Social Psychology Bulletin, 46(1), 3–19. https://doi.org/10.1177/0146167219838546

38. Jackson, J. J., Wood, Bogg, Walton, K. E., Harms, P. D., & Roberts, B. W. (2010). What do conscientious people do? Development and validation of the behavioral indicators of conscientiousness (BIC). Journal of Research in Personality, 44(4), 501–511. https://doi.org/10.1016/j.jrp.2010.06.005

39. John, O. P., & Robins, R. W. (1993). Determinants of interjudge agreement on personality traits: The Big Five domains, observability, evaluativeness, and the unique perspective of the self. Journal of Personality, 61(4), 521–551. https://doi.org/10.1111/j.1467-6494.1993.tb00781.x

40. John, O. P., & Robins, R. W. (1994). Accuracy and bias in self-perception: Individual differences in self-enhancement and the role of narcissism. Journal of Personality and Social Psychology, 66(1), 206–219. https://doi.org/10.1037/0022-3514.66.1.206

41. John, O. P., & Srivastava (1999). The Big Five Trait taxonomy: History, measurement, and theoretical perspectives. In Pervin, L. A., & John, O. P. (Eds.), Handbook of personality: Theory and research (2nd ed., pp. 102–138). Guilford Press.

42. Johnson, J. A. (2014). Measuring thirty facets of the five factor model with a 120-item public domain inventory: Development of the IPIP-NEO-120. Journal of Research in Personality, 51, 78–89. https://doi.org/10.1016/j.jrp.2014.05.003

43. Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. Guilford Press.

44. Kenny, D. A., & West, T. V. (2010). Similarity and agreement in self- and other perception: A meta-analysis. Personality and Social Psychology Review, 14(2), 196–213. https://doi.org/10.1177/1088868309353414

45. Kim, Di Domenico, S. I., & Connelly, B. S. (2019). Self–other agreement in personality reports: A meta-analytic comparison of self- and informant-report means. Psychological Science, 30(1), 129–138. https://doi.org/10.1177/0956797618810000

46. Kline, R. B. (2015). Principles and practice of structural equation modeling (4th ed.). Guilford Publications.

47. Kluemper, D. H., McLarty, B. D., & Bing, M. N. (2015). Acquaintance ratings of the Big Five personality traits: Incremental validity beyond and interactive effects with self-reports in the prediction of workplace deviance. Journal of Applied Psychology, 100(1), 237–248. https://doi.org/10.1037/a0037810

48. Leary, M. R., & Kowalski, R. M. (1990). Impression management: A literature review and two-component model. Psychological Bulletin, 107(1), 34–47. https://doi.org/10.1037/0033-2909.107.1.34

49. Lee, Ashton, M. C., Pozzebon, J. A., Visser, B. A., Bourdage, J. S., & Ogunfowora (2009). Similarity and assumed similarity in personality reports of well-acquainted persons. Journal of Personality and Social Psychology, 96(2), 460–472. https://doi.org/10.1037/a0014059

50.

Leising

Erbs

Fritz

(2010). The letter of recommendation effect in informant ratings of personality. Journal of Personality and Social Psychology, 98(4), 668–682. https://doi.org/10.1037/a0018771

51.

Leising

Scharloth

Lohse

Wood

(2014). What types of terms do people use when describing an individual’s personality? Psychological Science, 25(9), 1787–1794. https://doi.org/10.1177/0956797614541285

52.

Leising

Scherbaum

Locke

K. D.

Zimmermann

(2015). A model of “substance” and “evaluation” in person judgments. Journal of Research in Personality, 57, 61–71. https://doi.org/10.1016/j.jrp.2015.04.002

53.

Letzring

T. D.

Funder

D. C.

(2021). The realistic accuracy model. In Letzring

T. D.

Spain

J. S.

(Eds.), The Oxford handbook of accurate personality judgment (pp. 9–22). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190912529.013.2

54.

Letzring

T. D.

Murphy

N. A.

Allik

Beer

Zimmermann

Leising

(2021). The judgment of personality: An overview of current empirical research findings. Personality Science, 2, 1–20. https://doi.org/10.5964/ps.6043

55.

Luft

Ingham

(1955). The Johari window: A graphic model of interpersonal awareness. In Paper presented at the proceedings of the western training laboratory in group development, Los Angeles, CA. UCLA Extension Office.

56.

McAbee

S. T.

Connelly

B. S.

(2016). A multi-rater framework for studying personality: The trait-reputation-identity model. Psychological Review, 123(5), 569–591. https://doi.org/10.1037/rev0000035

57.

McCrae

R. R.

(1994). The counterpoint of personality assessment: Self-reports and observer ratings. Assessment, 1(2), 159–172. https://doi.org/10.1177/1073191194001002006

58.

McCrae

R. R.

Costa

P. T.

(1982). Self-concept and the stability of personality: Cross-sectional comparisons of self-reports and ratings. Journal of Personality and Social Psychology, 43(6), 1282–1292. https://doi.org/10.1037/0022-3514.43.6.1282

59.

McCrae

R. R.

Weiss

(2007). Observer ratings of personality. In Robins

R. W.

Fraley

R. C.

Krueger

R. F.

(Eds.), Handbook of research methods in personality psychology (pp. 259–272). The Guilford Press.

60.

Muthén

L. K.

Muthén

B. O.

(1998−2017). Mplus user’s guide. 8th ed. Muthén & Muthén.

61.

Norman

W. T.

(1963). Toward an adequate taxonomy of personality attributes: Replicated factors structure in peer nomination personality ratings. Journal of Abnormal and Social Psychology, 66(6), 574–583. https://doi.org/10.1037/h0040291

62.

Olino

T. M.

Klein

D. N.

(2015). Psychometric comparison of self- and informant-reports of personality. Assessment, 22(6), 655–664. https://doi.org/10.1177/1073191114567942

63.

Olsen

J. A.

Kenny

D. A.

(2006). Structural equation modeling with interchangeable dyads. Psychological Methods, 11(2), 127–141. https://doi.org/10.1037/1082-989X.11.2.127

64.

Osberg

T. M.

Shrauger

J. S.

(1990). The role of self-prediction in psychological assessment. In Butcher

J. N.

Spielberger

C. D.

(Eds.), Advances in personality assessment (Vol. 8, pp. 97–120): Lawrence Erlbaum.

65.

Paulhus

D. L.

Vazire

(2007). The self-report method. In Robins

R. W.

Fraley

R. C.

Krueger

R. F.

(Eds.), Handbook of research methods in personality psychology (pp. 224–239). The Guildford Press.

66.

Pytlik Zillig

L. M. P.

Hemenover

S. H.

Dienstbier

R. A.

(2002). What do we assess when we assess a Big 5 trait? A content analysis of the affective, behavioral, and cognitive processes represented in Big 5 personality inventories. Personality and Social Psychology Bulletin, 28(6), 847–858. https://doi.org/10.1177/0146167202289013

67.

Rau

Carlson

E. N.

Back

M. D.

Barranti

Gebauer

J. E.

Human

L. J.

Leising

Nestler

(2021). What is the structure of perceiver effects? On the importance of global positivity and trait-specificity across personality domains and judgment contexts. Journal of Personality and Social Psychology, 120(3), 745–764. https://doi.org/10.1037/pspp0000278, https://psycnet.apa.org/doi/10.1037/pspp0000278

68.

R Core Team . (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing. URL. https://www.R-project.org/

69.

Robins

R. W.

Tracy

J. L.

Sherman

J. W.

(2007). What kinds of methods do personality psychologists use? A survey of journal editors and editorial board members. In Robins

R. W.

Fraley

R. C.

Krueger

R. F.

(Eds.), Handbook of research methods in personality psychology (pp. 673–678). The Guilford Press.

70.

Sadler

Woody

(2003). Is who you are who you're talking to? Interpersonal style and complementarily in mixed-sex interactions. Journal of Personality and Social Psychology, 84(1), 80–96. https://doi.org/10.1037/0022-3514.84.1.80

71.

Schlenker

B. R.

(1980). Impression management: The self-concept, social identity, and interpersonal relations. Brooks/Cole.

72.

Shrout

P. E.

Fleiss

J. L.

(1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420–428. https://doi.org/10.1037/0033-2909.86.2.420

73.

Strube

M. J.

Lott

C. L.

Lê-Xuân-Hy

G. M.

Oxenberg

Deichmann

A. K.

(1986). Self-evaluation of abilities: Accurate self-assessment versus biased self-enhancement. Journal of Personality and Social Psychology, 51(1), 16–25. https://doi.org/10.1037/0022-3514.51.1.16

74.

Swann

W. B.

Jr. (1981). Self-verification processes: How we sustain our self-conceptions. Journal of Experimental Social Psychology, 17(4), 351–372. https://doi.org/10.1016/0022-1031(81)90043-3

75.

Thielmann

Hilbig

B. E.

Zettler

(2020). Seeing me, seeing you: Testing competing accounts of assumed similarity in personality judgments. Journal of Personality and Social Psychology, 118(1), 172–198. https://doi.org/10.1037/pspp0000222

76.

Thielmann

Rau

Locke

K. D.

(2023). Trait-specificity versus global positivity: A critical test of alternative sources of assumed similarity in personality judgments. Journal of Personality and Social Psychology, 124(4), 828–847. https://doi.org/10.1037/pspp0000420

77.

Vazire

(2006). Informant reports: A cheap, fast, and easy method for personality assessment. Journal of Research in Personality, 40(5), 472–481. https://doi.org/10.1016/j.jrp.2005.03.003

78.

Vazire

(2010). Who knows what about a person? The self–other knowledge asymmetry (SOKA) model. Journal of Personality and Social Psychology, 98(2), 281–300. https://doi.org/10.1037/a0017908

79.

Vazire

Mehl

M. R.

(2008). Knowing me, knowing you: The accuracy and unique predictive validity of self-ratings and other-ratings of daily behavior. Journal of Personality and Social Psychology, 95(5), 1202–1216. https://doi.org/10.1037/a0013314

80.

Watson

Hubbard

Wiese

(2000). Self–other agreement in personality and affectivity: The role of acquaintanceship, trait visibility, and assumed similarity. Journal of Personality and Social Psychology, 78(3), 546–558. https://doi.org/10.1037//0022-3514.78.3.546, https://psycnet.apa.org/doi/10.1037/0022-3514.78.3.546

81.

Wessels

N. M.

Zimmermann

Biesanz

J. C.

Leising

(2020). Differential associations of knowing and liking with accuracy and positivity bias in person perception. Journal of Personality and Social Psychology, 118(1), 149–171. https://doi.org/10.1037/pspp0000218

82.

Wilt

Revelle

(2015). Affect, behavior, cognition and desire in the Big Five: An analysis of item content and structure. European Journal of Personality, 29(4), 478–497. https://doi.org/10.1002/per.2002
