Abstract

We present the first part of a fully parametrized mathematical model of person judgment, in an attempt to streamline and better organize theory in this research area. This first part focuses on the emergence of substantive information (“cues”) about target persons that perceivers may use in forming judgments. It incorporates time as a continuous variable, and accounts for the effects of targets, situations and their interactions. It also accounts for randomness overall, between-target differences in randomness, and overlap between observation intervals. We discuss how all of this influences the rank-order similarity of targets’ substance levels in any two observation intervals, enabling predictions regarding retest reliability, inter-rater agreement, cross-situational consistency, and predictive validity. We also explain how the model relates to existing theories, and to key concepts from the person judgment literature.

Keywords

model, formalisation, personality, situation, trait

Introduction

In this paper, we present the first part of an integrative mathematical model of person judgment. Our first aim is to help streamline and better organize the vast literature in this field, by connecting previously unconnected strands of theory with one another, and by formalizing some key concepts that, so far, have mainly featured in the literature in “narrative” (i.e., natural language) form. Our second aim is to demonstrate that, despite being fairly challenging, such formalization is actually possible and holds great potential for making personality research a more cumulative science. Among the gains to be expected from stricter formalization are a higher level of exactness and a lower level of conceptual redundancy.

In terms of content, most of the parameters in the model are not new. What is new is the strict mathematical form in which we present them. In fact, we believe that many of our colleagues in the field at least implicitly subscribe to a model that is quite similar to what we present here. What we hope to be able to add is a greater level of explicitness, conceptual clarity and parsimony.

We present this model in two different forms: The present paper introduces and explains the core components of the model in a manner that should be accessible to most readers with a background in personality research. Throughout, we try to elucidate how the model’s parameters relate to key concepts from the personality literature. In addition, we offer a full mathematical account as an online Technical Supplement (Leising & Schilling, 2024). This latter version is more abstract, goes into greater mathematical detail, and only occasionally touches on specific concepts from the personality literature. Its purpose is to demonstrate that the analysis overall is mathematically sound and that all components of the model are connected within a coherent framework, leaving no room for ambiguities in terms of meaning. To make it as easy as possible to read the paper and the Technical Supplement side-by-side, we reference mathematical formulas by the same consecutive numbering in both documents. However, most of the more specific formulas from the Technical Supplement do not appear in the present paper. The Technical Supplement was peer-reviewed and approved alongside the present paper.

Most psychologists are only just beginning to discover and fully appreciate the advantages of formal scientific models over the “narrative” ones that still dominate the field: Formal models are able to capture greater complexity with greater exactness. They are better specified and thus more readily falsifiable. They enable a clearer organization of how research findings relate to one another, and an easier identification of contradictions, circularity, redundancy, specification gaps, and compatibility between different models (Borgstede & Eggert, 2023a, 2023b; Leising et al., 2022a; Rodgers, 2010). For all of these reasons, formal models at least have the potential to be more conducive to cumulative knowledge growth than narrative ones, which is why a stricter formalization of psychological theory has been called for many times (e.g., Glöckner & Betsch, 2011; Oberauer & Lewandowsky, 2019; Robinaugh et al., 2021; Smaldino, 2020).

There have been several prior attempts at formalizing theoretical thinking on person judgment processes. Among the most ambitious and successful of these are David Kenny’s Weighted Average (1994) and PERSON (2004, 2019) models. Brunswik’s (1956) Lens Model has also been highly influential as a conceptual framework in this field (Back & Nestler, 2016; Funder, 1995). This paper presents the first part of a model that is supposed to jointly capture most of the key components from these earlier models and integrate them in a coherent, formalized fashion.

Note that this first part of the model only deals with the substantive bases of person judgments which are called “cues” in the Lens Model. However, the cues in the model do not necessarily relate to some joint, underlying cause variable. They simply represent “the facts” about targets that perceivers may or may not base their judgments on. Modelling what perceivers do with that information will be the subject of a follow-up paper.

If person judgments were simply a reflection of substantive reality, then the first part of the model could be directly used to make a large variety of exact predictions about judgments. But research has shown that judgments do reflect a number of strong influences apart from substantive reality (Baird et al., 2017; Heynicke et al., 2022; Rau et al., 2021; Wetzel et al., 2016). For example, judgments express the perceivers’ formal response styles (e.g., using extreme values vs. moderate values on response scales) and the perceivers’ attitudes toward the targets (for an overview, see Leising et al., under review). Also, person-descriptive items differ from one another in many systematic ways that may interact with perceiver-, target- and dyad properties in shaping judgments (e.g., Leising et al., 2014; Wessels et al., 2020). All of these influences may attenuate the influence of the actual substance on judgments. To obtain a reasonably complete theoretical account of person judgment processes, one needs to cover both their substantive basis (part 1) and the ways in which perceivers make use of the substance in deriving their judgments (part 2).

We account for the emergence of cues in a way that is compatible with Classical Test Theory (CTT; Novick, 1966; Lord & Novick, 1968). The model goes beyond CTT, however, in that it decomposes the unobservable “true” (or better: noise-free) score into three sources of variance: target, situation, and their interaction. Thus, it captures the key components of psychology’s person-situation-debate (Kenrick & Funder, 1988; Mischel, 1968; Vazire & Sherman, 2017) and also overlaps significantly with Latent-State-Trait Theory (LST; Hintz et al., 2019; Steyer et al., 1992; Steyer et al., 1999; Steyer et al., 2015). A brief model comparison is provided further down.

We would like to encourage our readers to use the current parametrization in their own work, in order to foster the emergence of a more cumulative personality science (in which different researchers describe the same things using the same labels). We also hope to be able to showcase the enormous amount of complexity permeating this field of study, which remains hidden as long as proper formalization is avoided, and thus illustrate a likely reason why proper formalization has been avoided so far by most researchers working on these issues. Ultimately, all of this is intended to help increase the field’s appreciation for this crucial yet neglected kind of work.

Going through the model components step by step

Notation

This first part of the model accounts for the emergence of cues on which perceivers may base their judgments of targets. We translate the concept of a cue into the average level of some substance variable for a given interval. By “substance”, we simply mean an actual characteristic that the target has – at least temporarily – irrespective of whether or how that characteristic is ever perceived or used in making a judgment.

In line with CTT, we model a target’s level on a substance variable as comprising a systematic component ( $θ$ ) and random noise ( $W$ ). The systematic component is modelled as depending on the target person ( $τ$ ), the situational circumstances ( $σ$ ) at the given point in time ( $t$ ), and the interaction between target and situation ( $τ \times σ$ ). We also assume that targets may differ in how much random noise their substance levels typically contain ( $γ (τ)$ ).

As a general rule, we use a tilde (as in $\tilde{θ}$ ) to indicate all kinds of operations (i.e., averaging, but also taking variances and covariances) in which the variation is across targets ( $τ$ ) (i.e., the targets are the source of variation). We use a bar (as in $\bar{S}$ ) to symbolize averaging over time ( $t$ ). In most of what follows, such averaging is bound to a specific interval (as in $\bar{S} (τ, I, ω)$ , which stands for a target’s observable (but noisy) substance level in the time-interval $I$ ). In fact, this is the primary outcome of the current modelling effort – a cue level. The $ω$ in the argument represents the presence of random influences.

The different types of operations may also be combined, as in $\tilde{\bar{θ}} (I)$ which stands for the average target’s noise-free substance level across interval $I$ . Once a parameter is subjected to such an operation, the component of its argument across which the operation takes place necessarily vanishes from its argument. For example, the argument of $\tilde{\bar{θ}} (I)$ does not contain $τ$ anymore because it is the outcome of averaging $\bar{θ} (τ, I)$ across $τ$ .

The model’s components

Table 1 lists all model parameters along with their respective meanings. To facilitate comprehension, we introduce a specific example: Action Unit 12 (AU12) activation. In Ekman and Friesen’s (1978) Facial Action Coding System, activation of AU12 (mainly reflecting activation of the Musculus zygomaticus major) causes the “cheek dimples” that are commonly interpreted as “smiling”. In the present context, however, it is very important to keep in mind that we are only interested in the actual, substantive level of activation of this part of the face; whether this activation is actually perceived and possibly even interpreted (e.g., as a “smile”) by anyone is irrelevant for now.

Table 1.

Model parameters and their meanings.

Parameter	Meaning
$t$	Time
$τ$	Target
$π$	Perceiver
$σ (t)$	Situation at time t
$ω$	Unsystematic randomness
$I_{\max}$	Standard interval
$I$	Interval of observation
$| I |$	Length of interval $I$
$θ (τ, σ (t))$	Noiseless score of target $τ$ at time $t$
$\tilde{θ} (σ (t))$	The average target’s noiseless score for situation $σ (t)$ at time $t$
$\bar{θ} (τ, I)$	Noiseless score of target $τ$ for the interval $I$
$\tilde{\bar{θ}} (I) = \bar{\tilde{θ}} (I)$	Grand mean of the noiseless substance variable across $I$
$∆ \tilde{θ} (σ (t), I)$	$\tilde{θ} (σ (t)) - \tilde{\bar{θ}} (I)$ ; situation effect if $I = I_{\max}$
$∆ \bar{θ} (τ, I)$	$\bar{θ} (τ, I) - \tilde{\bar{θ}} (I)$ ; target effect if $I = I_{\max}$
$∆ θ (τ \times σ (t), I)$	$θ (τ, σ (t)) - ∆ \bar{θ} (τ, I) - ∆ \tilde{θ} (σ (t), I) - \tilde{\bar{θ}} (I)$ ; target by situation interaction effect if $I = I_{\max}$
$W^{τ} (t, ω)$	Cumulative noise level up to time $t$ , modelled by a Brownian motion with mean zero and variance $γ^{2} (τ) t$
$γ (τ) \sqrt{t}$	Standard deviation of the noise at time $t$ for target $τ$
$d S (τ, t, ω)$	Rate of the (noisy) substance level of target $τ$ at time $t$
${d W}^{τ} (t, ω)$	Rate of the random error for target $τ$ at time $t$ ; “white noise”
$\bar{S} (τ, I, ω)$	Observable cue level of target $τ$ for the interval $I$
$\tilde{X}, \tilde{var} (X), \tilde{cov} (X, Y)$	Mean, variance and covariance taken only in targets ( $τ$ )
$E (X), Var (X), Cov (X, Y)$	Mean, variance and covariance taken only in the noise ( $ω$ )
$\tilde{E} (X), V \tilde{ar} (X), C \tilde{ov} (X, Y), C \tilde{orr} (X, Y)$	Mean, variance, covariance and correlation taken in targets ( $τ$ ) and noise ( $ω$ ) simultaneously
$\frac{| I \cap I^{'} |}{| I | \cdot | I^{'} |}$	Relative overlap between intervals $I$ and $I^{'}$
$\frac{1}{N} \sum_{τ = 1}^{N} γ^{2} (τ)$	Noisiness of the substance levels in the $τ$ -population

In Table 1, we do list the π parameter, which denotes the perceiver. However, specific perceivers will not play any role in the present analysis but only become relevant later on.

In accordance with CTT, we distinguish between the observable level ( $S$ ) of a substance variable and that variable’s “true” level ( $θ$ ). The former always contains at least some perturbations caused by random noise whereas the latter does not. Because the label “true” sometimes leads to misunderstandings regarding the meaning of the $θ$ parameter, we will call it “noise-free” instead.

Figure 1 illustrates some of the key concepts. We look at the AU12 activations of two target persons as they pass through a number of different situations. The first target ( $τ$ ) is Tessa, the second target ( $τ^{'}$ ) is Trudy. Both Tessa and Trudy are university professors. In Figure 1, Tessa’s AU12 activation levels are displayed in blue whereas Trudy’s are displayed in red. The time ( $t$ ) axis is divided into five intervals ( $I_{1}$ to $I_{5}$ ) schematically representing five situations. For simplicity, we assume that the situation remains the same within each of these intervals (i.e., we have “one-situation intervals”). The first interval is the morning, when both Tessa and Trudy get ready to go to work. The second interval is a lecture that each of them is giving. The third interval is when both of them go to their offices to read a difficult research paper. The fourth interval is another lecture. The fifth interval is a lunch break. Note that the model treats the targets’ substance levels as being independent. For example, the two targets may have identical schedules, but still live and work in different cities.

Figure 1.

Two targets’ ( $τ$ and $τ^{'}$ ) noise-free substance levels (blue: $θ (τ, σ (t))$ ; red: $θ (τ^{'}, σ (t))$ ) in five intervals ( $I_{1}$ to $I_{5}$ ), displayed as colored solid lines. For simplicity, each interval reflects the influence of just one situation ( $σ (t)$ ). The average target’s noise-free level in each interval ( $\tilde{θ} (σ (t))$ ) is represented by the solid black lines. Random changes in substance levels from moment to moment (blue: $d S (τ, σ (t), ω)$ ; red: $d S (τ^{'}, σ (t), ω)$ ) are represented by the wiggly scatter around the noise-free levels.

The solid lines in Figure 1 represent the noise-free substance levels of the two targets, and of the average target, in each of the five intervals. They are captured by the parameters $θ (τ, σ (t))$ for Tessa (blue), $θ (τ^{'}, σ (t))$ for Trudy (red), and $\tilde{θ} (σ (t))$ for the average target (black). Note that all of these expressions depend directly on $σ (t)$ , and not on the respective interval. This is because the only thing that matters about an interval is the situational circumstances applying to it. For example, Intervals 2 and 4 are identical in terms of situation (“lecture”), and thus produce the exact same noise-free scores for each individual target, and for the average target.

In Figure 1 we see that Trudy ( $τ^{'}$ , red) has a higher noise-free level of AU12 activation than Tessa ( $τ$ , blue) in the morning ( $I_{1}$ ). This reverses, however, when they both give their first lecture ( $I_{2}$ ). Obviously, lectures are situations in which Tessa’s AU12 gets into overdrive, whereas Trudy’s level of AU12 activation remains basically unchanged compared to when she was at home in the morning. The reading situation ( $I_{3}$ ) brings a drastic change: Trudy’s AU12 activation goes down a bit, but not remotely as much as Tessa’s, whose face now shows almost no AU12 activation anymore. During the second lecture ( $I_{4}$ ), the pattern is the exact same as in Interval 2. Finally, the pattern during the lunch break ( $I_{5}$ ) largely resembles that from the first interval, only with both Tessa and Trudy showing slightly lower AU12 activation.

One of the key tenets of personality psychology is that target persons may differ from one another “in general”, that is, across some kind of longer, relevant time frame (e.g., their entire lifespan). This time frame is represented by another parameter ( $I_{\max}$ ) which does not appear in Figure 1. It stands for the reality that one is trying to model, and from which one is drawing more limited time samples when gathering data. Most importantly, $I_{\max}$ is assumed to contain the typical setup of situations, including their typical lengths, of the reality that one is interested in. Each target has an individual time-average on the substance variable for this hypothetical interval ( $\bar{θ} (τ, I_{\max})$ for Tessa and $\bar{θ} (τ^{'}, I_{\max})$ for Trudy) and there is also an average across targets (“Grand Mean”) for this interval ( $\tilde{\bar{θ}} (I_{\max})$ ). To avoid clutter, these parameters are not included in Figure 1. They would be displayed as straight lines lying parallel to the x-axis, with the Grand Mean lying exactly in between the individual means of the two targets. These parameters will come into play when we describe the observable cue level as a sum of various “effects” (see next section).

Up to this point, we were only talking about the targets’ noise-free substance levels, which may never actually be observed. In Figure 1, the wiggly scatter around the noise-free levels represents random changes of the targets’ substance levels from moment to moment. The model parameters for these would be $d S (τ, σ (t), ω)$ for Tessa (blue) and $d S (τ^{'}, σ (t), ω)$ for Trudy (red). The arguments here contain an $ω$ , which stands for the presence of a random influence. The use of $d S$ in this context implies maximum unpredictability (see below). Note that, despite the noise being equally unpredictable for the two targets, its variance is obviously larger for Trudy than it is for Tessa, as indicated by the different amplitude of the respective scatters. In the model, the target-specific variance of the noise is captured by the parameter $γ^{2} (τ)$ for Tessa (blue) and $γ^{2} (τ^{'})$ for Trudy (red).

The key explananda of part 1 of the model are the targets’ average observable levels on the substance variable for a given interval (“cues”). In the model, they are represented by the parameter $\bar{S} (τ, I, ω)$ (see next section). Both Tessa and Trudy would have an average observable level of AU12 activation for each of the five intervals (i.e., ten such levels altogether). To avoid clutter, we do not display these average levels separately in Figure 1 but introduce them in detail in the next section. Just as in CTT, the targets’ average substance levels are assumed to be the sum of their respective noise-free levels and random noise.

Putting the components together

The full mathematical account of the model may be found in the Technical Supplement (Leising & Schilling, 2024). In the following, we will only highlight those parts of it that are of immediate importance for personality research and that connect most easily to established concepts in this field.

Let us repeat:

$\tilde{\bar{θ}} (I_{\max})$ is the noise-free Grand Mean of the substance variable for interval $I_{\max}$ .

$\tilde{θ} (σ (t))$ is the noise-free mean of the substance variable across targets for the situation at time $t$ .

$\bar{θ} (τ, I_{\max})$ is the noise-free time-mean of the substance variable for target $τ$ and interval $I_{\max}$ .

$θ (τ, σ (t))$ is the noise-free level of the substance variable for target $τ$ at time $t$ .

The respective parameters for the other target ( $τ^{'}$ ) would be defined accordingly. We now introduce the following differences, which will become important later on, when we start interpreting covariances:

A target-effect ( $∆ \bar{θ} (τ, I_{\max}) = \bar{θ} (τ, I_{\max}) - \tilde{\bar{θ}} (I_{\max})$ ) is the difference between a given target’s noise-free time-mean across $I_{\max}$ and the respective average of all targets’ noise-free time-means. In our example, it would be (e.g.) the difference between Tessa’s noise-free AU12 activation across $I_{\max}$ and the average of Tessa’s and Trudy’s noise-free AU12 activation across $I_{\max}$ . There would be exactly two such target-effects in our example, one for Tessa and one for Trudy. Generally speaking, some targets do certain things more often than the average target does them. This is a basic assumption underlying much of trait-focused personality psychology.

A situation-effect ( $∆ \tilde{θ} (σ (t), I_{\max}) = \tilde{θ} (σ (t)) - \tilde{\bar{θ}} (I_{\max})$ ) is the difference between the average target’s noise-free substance level for the situation present at time $t$ and the average target’s noise-free substance level across $I_{\max}$ . In our example, this would be the difference between the average target’s noise-free AU12 activation in a given situation (e.g., lecture) and the average target’s noise-free AU12 activation across $I_{\max}$ . There would be exactly five such situation-effects in our example, one for each of the one-situation intervals ( $I_{1}$ to $I_{5}$ ). Generally speaking, some situations evoke a reaction from the average person that is different from how the average person generally reacts. This is a basic assumption underlying much of experimental social psychology.

An interaction-effect ( $∆ θ (τ \times σ (t), I_{\max}) = θ (τ, σ (t)) - ∆ \bar{θ} (τ, I_{\max}) - ∆ \tilde{θ} (σ (t), I_{\max}) - \tilde{\bar{θ}} (I_{\max})$ ) is what remains when subtracting the relevant target-effect, the relevant situation-effect and the grand mean from a given target’s momentary noise-free substance level at time $t$ . In our example, interaction-effects are responsible for the fact that $θ (τ, σ (t))$ and $θ (τ^{'}, σ (t))$ are not always equally far apart from $\tilde{θ} (σ (t))$ , and that their rank-order sometimes even reverses (e.g., during a lecture, Tessa’s AU12 activation is higher than Trudy’s, whereas the opposite is the case while the two of them are reading). Interaction-effects such as these are at the core of personality theories in which personality is conceptualized in terms of the overall pattern of how a target reacts to specific situations (most notably: Mischel & Shoda, 1995). Note that, for brevity, we will call this type of effect “interaction-effect” from here on. In our example, there would be exactly (2 targets by 4 different situations =) eight such effects.
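To make the three kinds of effects more tangible, the following minimal sketch computes them for the Tessa and Trudy example. The numerical AU12 levels are invented for illustration and only loosely mimic the pattern described for Figure 1. The sketch also assumes, purely for convenience, that $I_{\max}$ consists of exactly the five one-situation intervals and that all of them have the same length. The names `THETA`, `SITUATIONS`, `LENGTHS`, `time_mean` and `decompose` are hypothetical and do not appear in the Technical Supplement.

```python
# Hypothetical noise-free AU12 levels theta(tau, sigma(t)); invented values.
THETA = {
    "Tessa": {"morning": 0.4, "lecture": 0.9, "reading": 0.1, "break": 0.3},
    "Trudy": {"morning": 0.6, "lecture": 0.6, "reading": 0.5, "break": 0.5},
}
# Assumption: I_max is made up of I_1..I_5, each of length 1.
SITUATIONS = ["morning", "lecture", "reading", "lecture", "break"]
LENGTHS = [1.0, 1.0, 1.0, 1.0, 1.0]


def time_mean(values, lengths):
    """Length-weighted average over the intervals that make up I_max."""
    return sum(v * w for v, w in zip(values, lengths)) / sum(lengths)


def decompose(theta, situations, lengths):
    """Return grand mean, target-, situation- and interaction-effects."""
    targets = list(theta)
    distinct = list(dict.fromkeys(situations))  # the 4 different situations
    # Time-mean of each target across I_max.
    target_mean = {
        k: time_mean([theta[k][s] for s in situations], lengths) for k in targets
    }
    # Grand mean: average of the targets' time-means.
    grand_mean = sum(target_mean.values()) / len(targets)
    # Average target's level in each situation.
    situation_mean = {
        s: sum(theta[k][s] for k in targets) / len(targets) for s in distinct
    }
    target_effect = {k: target_mean[k] - grand_mean for k in targets}
    situation_effect = {s: situation_mean[s] - grand_mean for s in distinct}
    # What remains after removing both main effects and the grand mean.
    interaction = {
        (k, s): theta[k][s] - target_effect[k] - situation_effect[s] - grand_mean
        for k in targets
        for s in distinct
    }
    return grand_mean, target_effect, situation_effect, interaction


grand_mean, target_effect, situation_effect, interaction = decompose(
    THETA, SITUATIONS, LENGTHS
)
print("grand mean:", round(grand_mean, 3))
for key, value in interaction.items():
    print(key, round(value, 3))
```

With these made-up numbers, the lecture situation yields a positive interaction-effect for Tessa and a negative one of equal size for Trudy, which corresponds to the rank-order reversal described above.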

The microscopic model

Formula (1) defines what we call the “microscopic model”. The term implies that this formula contains all the relevant components in their most basic form and may not be reduced any further.

$$d S (τ, t, ω) = θ (τ, σ (t)) \, d t + d W^{τ} (t, ω) \tag{1}$$

Here, $d S (τ, t, ω)$ stands for the change rate in a target’s substance level at time $t$ and $d W^{τ} (t, ω)$ stands for the random component in that change rate. To model this random noise, we borrow from mathematics, where Brownian Motion is the standard solution for this issue (Schilling, 2021). The basic idea therein is to have maximum unpredictability of the noise, which means that the noise at any point in time may not predict the noise at any neighboring point in time, regardless of how small the time difference between these two points is. By integrating across an interval comprising several points ( $s$ ) in time (between zero and $t$ ), one obtains a concrete value for the current level of noise. That value is:

$$W^{τ} (t, ω) = \int_{0}^{t} d W^{τ} (s, ω) \tag{9}$$

Figure 2 in the Technical Supplement further illustrates what the concept of Brownian Motion is about. Most of this is omitted here. In the present context, one only needs to know that $W^{τ} (t, ω)$ has an expected value of zero (exactly as in CTT) and a target-specific variance of $γ^{2} (τ) t$ . Note that the model does account for differences in how noisy the substance levels of individual targets are. This is the reason why there is a $τ$ superscript accompanying parameter $W$ . In our example (see Figure 1), Trudy ( $τ^{'}$ , red) obviously has a noisier $W$ process (i.e., a larger $γ$ ) than Tessa ( $τ$ , blue), as indicated by the greater scatter of Trudy’s momentary AU12 activation changes around her respective noise-free levels.

Figure 2.

Upper half: the function $σ (t)$ giving the situation at time t, with the same “one-situation intervals” ( $I_{1}$ to $I_{5}$ ) as in Figure 1. M = Morning, L = Lecture, R = Reading, B = Break. Lower half: two partially overlapping observation intervals ( $I$ and $I^{'}$ ) cutting across the one-situation intervals from the upper half, and their overlapping interval ( $I \cap I^{'}$ ).

The variance ( $γ^{2} (τ) t$ ) of W increases with time, because the longer such a random process goes on, the more room there will be for the sum of all random changes to deviate from zero, in either direction. Note, however, that this concerns the variance of the cumulative noise only – the noise at any given point in time has the same instantaneous variance everywhere. Also, the increase of the cumulative noise will be kept in check when we average across intervals in the next step.
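The growth of the cumulative noise variance can be checked with a small simulation. The sketch below approximates Brownian Motion by summing many independent Gaussian increments, which is a standard discretization and not the construction used in the Technical Supplement. The value of `gamma`, the number of steps and paths, and the function name `simulate_cumulative_noise` are arbitrary choices made for illustration.

```python
import random
import statistics


def simulate_cumulative_noise(gamma, t, n_steps, rng):
    """One draw of W(t): sum of independent increments with variance gamma^2 * dt."""
    dt = t / n_steps
    sd = gamma * dt ** 0.5
    return sum(rng.gauss(0.0, sd) for _ in range(n_steps))


rng = random.Random(12345)  # fixed seed so that the run is reproducible
gamma = 0.5                 # hypothetical target-specific noise parameter
n_paths = 4000

# Empirical variance of the cumulative noise at t = 1 and at t = 4.
var_short = statistics.pvariance(
    [simulate_cumulative_noise(gamma, 1.0, 50, rng) for _ in range(n_paths)]
)
var_long = statistics.pvariance(
    [simulate_cumulative_noise(gamma, 4.0, 50, rng) for _ in range(n_paths)]
)

print("t = 1: simulated", round(var_short, 3), "theoretical", gamma ** 2 * 1.0)
print("t = 4: simulated", round(var_long, 3), "theoretical", gamma ** 2 * 4.0)
```

Up to sampling error, the simulated variances should lie close to $γ^{2} t$ , that is, the variance at $t = 4$ should be about four times as large as the variance at $t = 1$ .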

Observable substance levels (Cues)

In analyzing actual person judgment data, one will never have to use the microscopic model because person judgments in reality are never based on infinitely small observation intervals. Rather, the substantive base for any person judgment is an average, observable substance level for a given interval ( $I$ ) with length $| I |$ . By integrating Formula (1) over this interval, one obtains the observable cue level that is of key concern in the present paper:

$$\bar{S} (τ, I, ω) = \frac{1}{| I |} \int_{I} θ (τ, σ (t)) \, d t + \frac{1}{| I |} \int_{I} d W^{τ} (t, ω) \tag{10}$$

Recall that the bar above the $S$ on the left-hand side means that this is a mean of the variable $S$ over time. Formula (10) describes the average substance level (e.g., AU12 activation) that a given target (e.g., Tessa) has in any given interval ( $I$ ). If the substance variable was AU12 activation and the interval was the first lecture ( $I_{2}$ ), then $\bar{S} (τ, I_{2}, ω)$ would be Tessa’s average AU12 activation level during the lecture. Her students may base their judgments of how much Tessa was “smiling” (or of how “happy” Tessa seemed) during the lecture on that information. Note, however, that the same formula applies when abandoning the assumption that the situation stays the same across the entire interval. This will become relevant further below.

Due to the assumed omnipresence of random noise, a target’s observable substance levels ( $\bar{S} (τ, I, ω)$ ) will differ somewhat from their noise-free substance levels for the same interval ( $\bar{θ} (τ, I)$ ). However, the longer the observation interval becomes, the more this noise will cancel out via integration, and the more the targets’ observable substance levels will resemble their noise-free ones. In our example, a student watching Trudy give a lecture for 10 minutes would obtain a less noisy observation of her typical AU12 activation in such situations than would a student observing her for 1 minute only.
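The 1-minute versus 10-minute comparison can be illustrated numerically. The sketch below relies on a standard property of Brownian Motion: its increment over an interval of length $| I |$ is normally distributed with mean zero and variance $γ^{2} (τ) | I |$ , so that the noise part of the cue in Formula (10) has variance $γ^{2} (τ) / | I |$ . The noise-free level, the value of $γ$ , the use of minutes as the time unit, and the function name `observed_cue` are assumptions made for illustration only.

```python
import random
import statistics


def observed_cue(theta, gamma, length, rng):
    """Cue level for a one-situation interval: noise-free level plus averaged noise."""
    # Brownian increment over the interval: N(0, gamma^2 * length).
    noise_increment = rng.gauss(0.0, gamma * length ** 0.5)
    # Dividing by the interval length turns the increment into a time-average.
    return theta + noise_increment / length


rng = random.Random(2024)  # fixed seed so that the run is reproducible
theta_lecture = 0.6        # hypothetical noise-free AU12 level during a lecture
gamma = 0.3                # hypothetical noise parameter
n_draws = 5000

cues_1min = [observed_cue(theta_lecture, gamma, 1.0, rng) for _ in range(n_draws)]
cues_10min = [observed_cue(theta_lecture, gamma, 10.0, rng) for _ in range(n_draws)]

var_1min = statistics.pvariance(cues_1min)
var_10min = statistics.pvariance(cues_10min)

print("1 minute:   simulated", round(var_1min, 4), "theoretical", gamma ** 2 / 1.0)
print("10 minutes: simulated", round(var_10min, 4), "theoretical", gamma ** 2 / 10.0)
```

Under these assumptions, the cue based on 10 minutes of observation scatters around the same noise-free level, but with roughly one tenth of the noise variance.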

Substantive similarity

In person judgment research, we are often interested in how similar different judgments of the same target persons are. This similarity is typically assessed in terms of correlation coefficients where the target persons are the cases (e.g., Kenny, 1994). However, the same metric may be used for assessing the factor structure of a set of items, the internal consistency of a psychometric scale, or the predictive validity of some kind of measurement (e.g., of academic potential) with regard to some other kind of measurement (e.g., of academic achievement). In terms of the present model, a common basis for all of these is the substantive rank-order similarity of two different sets of observations, that is, the extent to which the substance levels of the targets in two observation intervals ( $I$ and $I^{'}$ ) resemble each other. We will assess this similarity in terms of a correlation, as well (see Formula (74) below). But before we discuss this, we first have to talk about interval overlap.

Kenny (1994) introduced overlap as a component of his Weighted Average Model. He uses his model to explain consensus, that is, the level of agreement between different perceivers judging the same targets on the same item. Accordingly, Kenny conceptualizes overlap as the extent to which different perceivers base their judgments on the same versus different information about the targets. Obviously, when this overlap becomes larger, the substantive base for the perceivers’ judgments is shared to a greater extent, which should lead to greater inter-rater agreement.

Kenny addresses the topic in terms of the proportion of “acts” that two perceivers have seen a target engage in, as opposed to acts that only one of them has observed. However, this approach entails the problem of defining what an “act” is (e.g., when it begins and when it ends). In contrast, we define overlap more directly in terms of observation intervals, thus circumventing the need for demarcating distinct acts. Note again that the term “observation interval” does not imply that any observation actually takes place. We are still dealing with substance only.

Observation intervals ( $I$ and $I^{'}$ ) may have any length, cut across any number of different situations (i.e., we now lift the “one situation per interval” constraint) and also have any degree of overlap with one another. Figure 2 illustrates this. The figure uses the exact same time axis as Figure 1 above. The one-situation-intervals ( $I_{1}$ to $I_{5}$ ) are also the same as before. However, this time the Y-axis does not represent the targets’ substance levels (as in Figure 1) but the respective situational circumstances that are present for each of these intervals (M = Morning, L = Lecture, R = Reading, B = Break). These are on a nominal scale and could thus be rearranged freely along the y-axis.

The lower half of Figure 2 displays two observation intervals ( $I$ and $I^{'}$ ) that cut freely across the one-situation intervals used in the upper half: Observation interval $I$ starts approximately in the middle of interval $I_{1}$ and ranges from there into not quite the middle of interval $I_{3}$ . Observation interval $I^{'}$ starts approximately in the middle of interval $I_{2}$ and ranges from there to a point close to the middle of interval $I_{5}$ . The overlapping interval created by this will be called $I \cap I^{'}$ from here on. Using Formula (10) and the data from Figure 1, we could now compute Tessa’s and Trudy’s average AU12 activation levels for each of these three intervals. Afterwards, we could attempt to determine how similar the rank-order of their average AU12 activation levels is in (e.g.) intervals $I$ and $I^{'}$ . This is what we will deal with next.

In the model, all observation intervals (e.g., $I$ and $I^{'}$ ) are contained in interval $I_{\max}$ and may have any degree of overlap with one another (e.g., one may be completely included in the other). This is explained in more detail in the Technical Supplement (Formulas 61 and 62). For our purposes here, we only need the length of the shared observation interval ( $| I \cap I^{'} |$ ) and the product of the lengths of the two original observation intervals ( $| I | \cdot | I^{'} |$ ). Their ratio is used as a measure of overlap and features in Formula (73), which describes substantive rank-order similarity as a covariance.

\tilde{Cov} (\bar{S} (I), \bar{S} (I^{'})) = \tilde{cov} (\bar{θ} (I), \bar{θ} (I^{'})) + \left[ \frac{1}{N} \sum_{τ = 1}^{N} γ^{2} (τ) \right] \cdot \frac{| I \cap I^{'} |}{| I | \cdot | I^{'} |}

(73)

To understand the implications of this formula, one needs to understand the difference between $\tilde{Cov} (\bar{S} (I), \bar{S} (I^{'}))$ and $\tilde{cov} (\bar{θ} (I), \bar{θ} (I^{'}))$ (see Technical Supplement, Section 2). Both of these covariances assess the rank-order similarity of the targets’ substance levels in intervals $I$ and $I^{'}$ . However, the first covariance reflects both the systematic and the random variation in the data whereas the second covariance “sees” only the systematic variation. As long as there is no overlap ( $| I \cap I^{'} | = 0$ ), the first covariance is identical to the second. However, to the extent that there is overlap, the average target’s random noise variance ( $\frac{1}{N} \sum_{τ = 1}^{N} γ^{2} (τ)$ ) will add to the first covariance because it features in both intervals (the $N$ in the formula is the number of targets that we average across).

Applied to our example, this means that the covariance between Tessa’s and Trudy’s average AU12 activation levels in intervals $I$ and $I^{'}$ will increase as soon as the two intervals overlap (i.e., $| I \cap I^{'} | > 0$ ). This is because the same random perturbations affecting Tessa’s and Trudy’s respective substance levels in the overlapping interval ( $I \cap I^{'}$ ) will contribute to their average substance levels in both intervals ( $I$ and $I^{'}$ ).
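The role of the overlap term in Formula (73) can be illustrated with a minimal Python sketch. It assumes that each interval is given as a (start, end) pair on a common time axis; the function names and all numerical values are hypothetical and chosen for illustration only, not taken from Figure 1.

```python
def interval_length(interval):
    """Length |I| of an interval given as (start, end)."""
    start, end = interval
    return max(0.0, end - start)


def overlap_length(a, b):
    """Length of the shared interval; 0 if the intervals are disjoint."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def overlap_factor(a, b):
    """Overlap measure from Formula (73): shared length over product of lengths."""
    return overlap_length(a, b) / (interval_length(a) * interval_length(b))


def noisy_covariance(cov_theta, gamma_sq, a, b):
    """Right-hand side of Formula (73).

    cov_theta: covariance of the noise-free interval averages (systematic part)
    gamma_sq:  one noise variance parameter per target (hypothetical values)
    """
    mean_gamma_sq = sum(gamma_sq) / len(gamma_sq)
    return cov_theta + mean_gamma_sq * overlap_factor(a, b)


# Hypothetical inputs: two targets, systematic covariance of 0.5.
gamma_sq = [1.0, 3.0]
disjoint = noisy_covariance(0.5, gamma_sq, (0.0, 2.0), (3.0, 5.0))
overlapping = noisy_covariance(0.5, gamma_sq, (0.0, 4.0), (2.0, 6.0))
print(disjoint, overlapping)
```

Without overlap, the function returns the systematic covariance unchanged; with overlap, the average noise variance enters in proportion to the overlap measure.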

To ensure comparability (e.g., across studies investigating different substance variables), we recommend using a normalized version of Formula (73) in which everything is divided by the product of the standard deviations of the noisy substance levels in the two intervals. This leads to Formula (74):

\tilde{Corr} (\bar{S} (I), \bar{S} (I^{'})) = \frac{\tilde{cov} (\bar{θ} (I), \bar{θ} (I^{'})) + \left[ \frac{1}{N} \sum_{τ = 1}^{N} γ^{2} (τ) \right] \cdot \frac{| I \cap I^{'} |}{| I | \cdot | I^{'} |}}{\sqrt{\tilde{Var} (\bar{S} (I))} \cdot \sqrt{\tilde{Var} (\bar{S} (I^{'}))}}

(74)

As long as there is no overlap, the denominator on the right-hand side will be larger than the numerator to the extent that targets’ substance levels do contain noise. This is because the denominator “sees” this random variation whereas the numerator does not. However, with longer intervals, this difference between numerator and denominator will become smaller, because the noise will cancel out more. The potential for this to happen will decrease with overlap. When the overlap is perfect, the substantive rank-order similarity will be one.
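The behavior of Formula (74) described here can be sketched numerically. The sketch below rests on an assumption that we add for illustration: the variance of the noisy averages in an interval is obtained by applying Formula (73) to that interval paired with itself, so that the overlap measure becomes $1 / | I |$ . The function names and all input values are hypothetical.

```python
import math


def overlap_factor(a, b):
    """Shared length of two (start, end) intervals over the product of their lengths."""
    shared = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return shared / ((a[1] - a[0]) * (b[1] - b[0]))


def noisy_correlation(cov_theta, var_theta_a, var_theta_b, mean_gamma_sq, a, b):
    """Formula (74) for two intervals a and b.

    cov_theta:     systematic (noise-free) covariance between the two intervals
    var_theta_a/b: systematic variance within each interval
    mean_gamma_sq: average noise variance parameter across targets
    """
    # Assumption: noisy variance = Formula (73) with both arguments set to the same interval.
    var_s_a = var_theta_a + mean_gamma_sq * overlap_factor(a, a)
    var_s_b = var_theta_b + mean_gamma_sq * overlap_factor(b, b)
    cov_s = cov_theta + mean_gamma_sq * overlap_factor(a, b)
    return cov_s / math.sqrt(var_s_a * var_s_b)


# Hypothetical case: identical systematic components in both intervals.
short_disjoint = noisy_correlation(1.0, 1.0, 1.0, 2.0, (0.0, 1.0), (2.0, 3.0))
long_disjoint = noisy_correlation(1.0, 1.0, 1.0, 2.0, (0.0, 4.0), (5.0, 9.0))
partial_overlap = noisy_correlation(1.0, 1.0, 1.0, 2.0, (0.0, 4.0), (2.0, 6.0))
perfect_overlap = noisy_correlation(1.0, 1.0, 1.0, 2.0, (0.0, 4.0), (0.0, 4.0))
print(short_disjoint, long_disjoint, partial_overlap, perfect_overlap)
```

In this hypothetical setting, the correlation rises from short to long disjoint intervals, rises further with partial overlap, and reaches one when the two intervals coincide.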

Up to this point, we only discussed the role that noise may play in substantive similarity, given different levels of overlap. We will now devote some more attention to the component of the two formulas (73) and (74) that represents all the systematic influences ( $\tilde{cov} (\bar{θ} (I), \bar{θ} (I^{'}))$ ). This is the covariance of $\bar{θ} (τ, I)$ and $\bar{θ} (τ, I^{'})$ , taken in $τ$ .

In order to understand the role of this component, it is helpful to split these parameters as follows:

\bar{θ} (τ, I) = [\bar{θ} (τ, I) - \bar{θ} (τ, I_{\max})] + [\bar{θ} (τ, I_{\max}) - \tilde{\bar{θ}} (I_{\max})] + \tilde{\bar{θ}} (I_{\max})

(75)

Recall that $\tilde{\bar{θ}} (I_{\max})$ is the Grand Mean of the substance variable. The difference $\bar{θ} (τ, I_{\max}) - \tilde{\bar{θ}} (I_{\max})$ equals the aforementioned target-effect ( $∆ \bar{θ} (τ, I_{\max})$ ). For brevity, this will now be shortened to $∆ \bar{θ}$ . The difference $\bar{θ} (τ, I) - \bar{θ} (τ, I_{\max})$ contains the target-specific time average of all situation- and interaction-effects in interval $I$ . For brevity, this will now be shortened to $δ \bar{θ} (I)$ . Note that $τ$ vanishes from the arguments of the covariance because the covariance is taken over $τ$ (as signaled by the tilde). The constant Grand Mean drops out as well, because adding a constant does not change a covariance.

\tilde{c o v} (\bar{θ} (I), \bar{θ} (I^{'}))

(76)

= \tilde{c o v} (∆ \bar{θ} + δ \bar{θ} (I), ∆ \bar{θ} + δ \bar{θ} (I^{'}))

(77)

= \tilde{c o v} (∆ \bar{θ}, ∆ \bar{θ}) + \tilde{c o v} (δ \bar{θ} (I), δ \bar{θ} (I^{'})) + \tilde{c o v} (∆ \bar{θ}, δ \bar{θ} (I^{'})) + \tilde{c o v} (δ \bar{θ} (I), ∆ \bar{θ})

(78)

= \tilde{v a r} (∆ \bar{θ}) + \tilde{c o v} (δ \bar{θ} (I), δ \bar{θ} (I^{'})) + \tilde{c o v} (∆ \bar{θ}, δ \bar{θ} (I^{'})) + \tilde{c o v} (δ \bar{θ} (I), ∆ \bar{θ})

(79)

Thus, the overall target-covariance of the noise-free scores in the two observation intervals ( $\tilde{cov} (\bar{θ} (I), \bar{θ} (I^{'}))$ ) reflects several independent influences: The first contribution comes from the variance of the target-effects ( $\tilde{var} (∆ \bar{θ})$ ). The more Trudy and Tessa generally differ in their AU12 activation levels (across $I_{\max}$ ), the greater substantive similarity will be.

The second contribution comes from the target-covariance of the time-averages of the sum of all situation- and interaction-effects in the two intervals ( $\tilde{cov} (δ \bar{θ} (I), δ \bar{θ} (I^{'}))$ ). Note, however, that the situation-effects are the same for all targets. Thus, they do affect $δ \bar{θ} (τ, I)$ and $δ \bar{θ} (τ, I^{'})$ , but not their covariance in $τ$ . Accordingly, $\tilde{cov} (δ \bar{θ} (I), δ \bar{θ} (I^{'}))$ will only reflect the interaction-effects. The sums of all interaction-effects in each interval ( $I$ and $I^{'}$ ) will covary more strongly the more similar the respective situational line-ups in the two intervals are (situational or $σ$ -similarity; Sherman et al., 2010; see part 5 of the Technical Supplement). For example, if the interval $I$ includes the first lecture and the interval $I^{'}$ includes the second lecture, substantive similarity should increase. However, substantive similarity will also be high when the situations in the two intervals are different but produce similar outcomes in terms of the targets’ substance levels.

The third and fourth contributions come from the covariances between the target-effects and the interaction-effects in each interval: $\tilde{c o v} (∆ \bar{θ}, δ \bar{θ} (I^{'}))$ and $\tilde{c o v} (δ \bar{θ} (I), ∆ \bar{θ})$ . This means that substantive rank-order similarity will be higher to the extent that the situations present in the observation intervals amplify the existing overall differences between the targets (i.e., target-effects), via target by situation interaction.
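The four-term decomposition in Formulas (76) to (79) can be checked numerically. The following sketch uses made-up effect values for four hypothetical targets and assumes a population-style covariance (division by the number of targets); the Grand Mean is omitted because a constant does not affect a covariance.

```python
def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Covariance across targets, dividing by the number of targets."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


# Hypothetical values, one entry per target.
target_effect = [-1.0, 0.0, 0.5, 0.5]   # target-effects (Delta theta-bar)
delta_i = [0.2, -0.1, 0.3, -0.4]        # situation- plus interaction-effects in I
delta_i_prime = [0.1, 0.0, 0.2, -0.3]   # situation- plus interaction-effects in I'

# Noise-free interval averages without the Grand Mean, as in Formula (77).
theta_i = [t + d for t, d in zip(target_effect, delta_i)]
theta_i_prime = [t + d for t, d in zip(target_effect, delta_i_prime)]

left_side = cov(theta_i, theta_i_prime)

# The four contributions from Formula (79).
contributions = {
    "target variance": cov(target_effect, target_effect),
    "interaction covariance": cov(delta_i, delta_i_prime),
    "target with delta(I')": cov(target_effect, delta_i_prime),
    "delta(I) with target": cov(delta_i, target_effect),
}
right_side = sum(contributions.values())

for name, value in contributions.items():
    print(name, value)
print(left_side, right_side)
```

Both sides agree up to floating-point error, which reflects the bilinearity of the covariance that underlies the step from Formula (77) to Formula (78).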

Applicability to typical research questions

Formula (74) is the primary outcome of our current analyses. It enables predictions of the correlation between the targets’ observable substance levels in any two intervals. A few more examples as to why this is relevant will be given in this section.

Substantive similarity is one of the factors contributing to the similarity of judgments. Thus, Formula (74) may be used to make predictions about all sorts of associations between different types of judgments, including inter-rater agreement. Note, however, that we still ignore all additional influences on people’s self- and other-judgments (such as perceiver attitudes), which will be the subject of a follow-up paper. When making predictions about judgments based solely on substantive similarity, one essentially treats all these other influences as noise. Note further that the same formula may be used to make predictions about the similarity of targets’ substance levels when no judgment takes place (e.g., about reaction times measured in two different intervals).

Retest reliability

A proper test of retest reliability would require repeated assessments of the same substance variable in the same targets, under identical situational circumstances, with the same interval lengths, but with no overlap between intervals. Formula (74) tells us that under these circumstances, substantive similarity will equal the proportion of systematic variance in the overall variance, which is exactly how reliability is commonly defined. Note that the systematic variance does not only include the variance of the target-effects but also the three other covariances from Formula (79). Longer intervals will mean higher reliability, because with longer intervals noise is going to cancel out more. However, the size of this effect will depend on how noisy the data is to begin with.
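As a worked special case, consider the retest scenario just described. This is a sketch under assumptions that we state explicitly here: no overlap, equal interval lengths ( $| I | = | I^{'} |$ ), identical situational line-ups so that each target's noise-free average is the same in both intervals, and the variance of the noisy averages obtained by applying Formula (73) to an interval paired with itself. The symbol $\bar{γ}^{2}$ is introduced here only as shorthand for the average noise variance.

```latex
\tilde{Corr} (\bar{S} (I), \bar{S} (I^{'}))
  = \frac{\tilde{var} (\bar{θ} (I))}{\tilde{var} (\bar{θ} (I)) + \bar{γ}^{2} / | I |},
\qquad
\bar{γ}^{2} = \frac{1}{N} \sum_{τ = 1}^{N} γ^{2} (τ)
```

Under these assumptions, the expression is the share of systematic variance in the overall variance. It approaches one as $| I |$ grows, and it equals one for any interval length when $\bar{γ}^{2} = 0$ .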

Cross-situational consistency

Cross-situational consistency is of particular interest to many personality psychologists because “personality” may be conceptualized as the extent to which behavioral differences between people persist across varying circumstances. This view basically equates personality with target-effects. However, the analysis just presented tells us that estimates of cross-situational consistency will also reflect additional influences: the covariance of the respective interaction-effects ( $\tilde{c o v} (δ \bar{θ} (I), δ \bar{θ} (I^{'}))$ ), and the two covariances between interaction-effects and target-effects ( $\tilde{c o v} (∆ \bar{θ}, δ \bar{θ} (I^{'}))$ and $\tilde{c o v} (δ \bar{θ} (I), ∆ \bar{θ})$ ). It might even be argued that these four influences together constitute what “personality” is about.

Inter-rater agreement

One reason why different judgments of the same targets may agree with one another is that different judges saw the targets behave similarly. According to the analysis just presented, there are several possible reasons for this to happen, none of which are mutually exclusive: First, target-variance may be large. Second, there may be overlap, and thus shared noise. Third, there may be similar interaction-effects in the two observation intervals. Fourth, the situations in which the targets are observed in the two intervals may amplify the existing target-variance. Note that this means there may be good inter-rater agreement based entirely on shared noise.

Validity

The term “validity” is used in a fairly unsystematic fashion throughout the psychometrics literature (e.g., Zachar & Jablensky, 2015). Here, we use it to denote associations between the targets’ observable substance levels in some (“predictor”) interval ( $I$ ) and their observable substance levels in some other (“criterion”) interval ( $I^{'}$ ), which is the interval of greater interest. Typically, the latter interval is larger, situationally more diverse, and located later on the timeline, but both intervals are samples from $I_{\max}$ . With this research question, any overlap between $I$ and $I^{'}$ would be undesirable because it would introduce “predictor-criterion-contamination”.

For example, one may use Trudy’s and Tessa’s average AU12 activation levels in the morning ( $I_{1}$ ) to predict their average AU12 activation levels in the three following intervals ( $I_{2}$ to $I_{4}$ ). We would expect predictive validity to be larger to the extent that there is target-variance, and to the extent that the situations present in the predictor and criterion interval amplify the differences between targets. We would also expect it to increase with the lengths of either interval, because longer observations should cancel out more of the noise.

With regard to interaction-effects, it gets a bit more tricky: If the goal is to forecast stable differences between targets in the criterion interval from their behavior in the predictor interval, then the situational setup in either interval should not elicit too much idiosyncratic behavior (i.e., rare interaction-effects). This is because such behavior would contribute to the variance but not the covariance between observations in the two intervals, which in turn would harm validity. In order to mitigate the possibly detrimental effect of such idiosyncrasies, one should rather use a relatively diverse situational setup, especially in the (usually longer) criterion interval (e.g., Borkenau et al., 2004; Wiedenroth & Leising, 2020). This way, interaction-effects should average out to some extent.

An alternative is to use situational setups for the predictor and criterion intervals that are highly similar, even if they are idiosyncratic. According to the model, doing so should lead to high validity coefficients as well. However, this should then better be interpreted in terms of the predictor interval being a valid representation of the circumstances that one is actually interested in.

Comparison with other models

Kenny’s (1994) weighted average model

Our model covers much of the same ground that is also covered by David Kenny’s (1994) Weighted Average Model (WAM; see also the closely related PERSON model by Kenny, 2004, 2019). Specifically, versions of the acquaintance (n), overlap (q) and consistency (r1) parameters from Kenny’s WAM feature in our model, as well: Using our terminology, one would represent acquaintance as $| I |$ , overlap as $\frac{| I \cap I^{'} |}{| I | \cdot | I^{'} |}$ and cross-situational consistency in terms of Formula (74), with the additional specification that both intervals contain just one situation, and these situations are different from one another. The remaining WAM parameters (see Kenny, 1994; pages 63–65) are not addressed in the current article, because they all involve contributions attributable specifically to perceivers. These will be covered in a follow-up article.

In several respects, however, the model that we present here goes beyond WAM in terms of precision and/or generality and/or scope. We only have room here to highlight some of these:

First, we incorporate time as a continuous variable. This enables us to account for both overlap and a core aspect of “acquaintance” (i.e., interval length) objectively and in a more precise fashion than is possible when the targets’ continuous response streams have to be segmented into separate “acts”. Second, our model (specifically the part of it on which the present paper focuses) incorporates a fully formalized account of phenomena at the level of the substance, independent of whether this substance is ever used by perceivers to inform their judgments. It therefore lends itself to establishing a more direct connection to the branch of personality research that uses objective (i.e., non-judgment) data (e.g., Stachl et al., 2021), and even to making exact predictions in that realm. Third, our model does not distinguish between “categorical” information (e.g., biological sex, height, skin color, age) and “behavioral” information but accounts for all different types of substantive information about targets the same way. What is called “categorical information” in Kenny’s WAM would just be substance variables that are relatively stable across time (i.e., variables with little noise, situation-effects, and interaction-effects). Fourth, our model explicitly incorporates target-effects, situation-effects, and interaction-effects. Thus, it establishes a direct connection to key concepts from the so-called “person-situation debate” (Mischel, 1968; Kenrick & Funder, 1988). Some of these connections are discussed in more detail in the next section.

Latent state-trait theory

The present model also has significant overlap with Latent-State-Trait-Theory (LST; Steyer et al., 1992; Steyer et al., 1999; Steyer et al., 2015). Most important, both the current model and LST focus on explaining the level of a variable that is measured at some occasion and assume that person-effects, situation-effects, interaction-effects and random noise contribute to this measured level. In fact, what is called the “latent state” in LST is largely identical in meaning to the $θ (τ, σ (t))$ parameter in our model.

Whereas earlier models within the LST framework did not systematically disentangle situation- and interaction-effects, a recent model by Geiser et al. (2015) does precisely that. Even more recent models (e.g., Koch et al., 2023) do consider time as a continuous factor, but only between measurements, not within the measurement interval itself. While our model was developed independently of LST – we started from Kenny’s (1994) WAM and Brunswik’s (1956) Lens Model – the parallels are obvious and may actually be viewed as encouraging, because independent strands of theory development converge so well.

There are also a few important differences, however: First, the model we present does systematically decompose the (“latent”) noise-free substance levels into target, situation and interaction effects while also accounting for time as a continuous variable. We are not aware of any LST model so far that combines both of these properties. However, in our model, it is the length of the observation intervals rather than the spacing between them that matters. The spacing between observations will only become relevant once it is assumed that the systematic components of what is being measured (especially target- and interaction-effects) may vary over time. The present model may of course be amended in this way, though.

Second, our model allows for any composition of an observation interval in terms of situations whereas the LST models that we are aware of only handle (repeated) assessments of targets in the same situation as opposed to another situation. This feature enables our model to capture situation similarity as a gradual phenomenon (see below). Third, our model expressly covers overlap. Fourth, our model does account for target-specific levels of noisiness.

Brunswik’s (1956) lens model

The present paper describes the part of our model that captures the emergence of a target’s average substance level for a given interval ( $\bar{S} (τ, I, ω)$ ). As we pointed out above, this level plays the same role conceptually as a “cue” does in Brunswik’s (1956) Lens Model. In a follow-up paper, we will talk about how perceivers may use such average substance levels in judging targets.

Probably the main purpose of Brunswik’s model is to describe how (“proximal”) cues may be expressions of (“distal”) causes, and how judgments based on cues may be used to infer such causes. The cue levels whose emergence we model here are perfectly suited for such analyses. Specifically, if the targets’ levels on different substance variables are found to correlate with one another, this may be due to variation in a third variable that causally affects each of these substance variables. Before this possibility may be seriously considered, however, one has to rule out the alternative possibility that the levels of the different substance variables influence each other directly.

The correlations between the targets’ cue levels and their levels of an underlying cause variable – if it exists – are often called “cue validities” in analyses based on the Lens Model. What the present paper contributes to the understanding of cue validities is that there may be two relevant sources of attenuation involved in this regard: random noise, and target by situation interactions. The influence of both is likely to be reduced with longer observation intervals.

Relationships to key concepts in the personality literature

In the remainder of this paper, we will briefly touch on how some key concepts from the personality literature may either be expressed in terms of the model we introduced so far, or added to it as extensions. Our main goal in this is to showcase how formalization allows for a more precise specification of theoretical concepts that, so far, have mostly been described in natural language.

Situation characteristics

Our treatment of situations so far has been fairly simplistic, as we represented them with a single, nominally scaled variable ( $σ (t)$ ). For many research purposes, this may actually be sufficient. For example, many studies in social psychology use experimental designs that, mathematically, have the same properties. However, in certain cases it may be necessary to represent situations multidimensionally (e.g., Rauthmann et al., 2014) in terms of several properties ( $σ_{1} (t)$ , $σ_{2} (t)$ etc.) that may vary continuously (and even randomly) over time, and with varying degrees of independence from one another.

Person-, situation- and interaction-effects

Obviously, our model is very closely connected to key concepts from the “person-situation debate” in psychology (Kenrick & Funder, 1988; Mischel, 1968). In fact, the $∆ \bar{θ} (τ, I_{\max})$ parameter is the person-effect (better: “target-effect”, because perceivers, which will come to interest us only later, are persons, too), the $∆ \tilde{θ} (σ (t), I_{\max})$ parameter is the situation effect, and the $∆ θ (τ \times σ (t), I_{\max})$ parameter is the target by situation interaction-effect. Comparing the proportions of the overall substance variance that are accounted for by each of these parameters will be directly informative regarding the relative influence of persons, situations, and their interactions. This, however, requires controlling for random error, which may only be separated from the other parameters by having the same targets go through the same setup of situations a second time. Note that, whereas our model is applicable to both experimental and naturalistic designs, it does not explicitly account for differences in the sets of situations that individual targets typically encounter (“situation contact”; Rauthmann, Sherman, Nave & Funder, 2015).
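The comparison of variance proportions mentioned here can be sketched for a small table of noise-free substance levels. The sketch assumes that every target is observed in every situation and that all situations have equal duration; the numbers and function names are hypothetical, and random noise is left out entirely.

```python
def decompose(theta):
    """Split theta[target][situation] into grand mean, target-, situation- and interaction-effects.

    Assumes a complete table in which all situations last equally long.
    """
    n_targets, n_situations = len(theta), len(theta[0])
    grand = sum(sum(row) for row in theta) / (n_targets * n_situations)
    target_eff = [sum(row) / n_situations - grand for row in theta]
    situation_eff = [
        sum(theta[t][s] for t in range(n_targets)) / n_targets - grand
        for s in range(n_situations)
    ]
    interaction_eff = [
        [theta[t][s] - grand - target_eff[t] - situation_eff[s] for s in range(n_situations)]
        for t in range(n_targets)
    ]
    return grand, target_eff, situation_eff, interaction_eff


def variance_shares(theta):
    """Proportion of the noise-free variance attributable to each kind of effect."""
    _, target_eff, situation_eff, interaction_eff = decompose(theta)
    var_target = sum(e * e for e in target_eff) / len(target_eff)
    var_situation = sum(e * e for e in situation_eff) / len(situation_eff)
    cells = [e for row in interaction_eff for e in row]
    var_interaction = sum(e * e for e in cells) / len(cells)
    total = var_target + var_situation + var_interaction
    return {
        "target": var_target / total,
        "situation": var_situation / total,
        "interaction": var_interaction / total,
    }


# Hypothetical noise-free levels: two targets (rows) in two situations (columns).
theta = [[1.0, 3.0],
         [3.0, 9.0]]
shares = variance_shares(theta)
print(shares)
```

With observed (noisy) data, the interaction share computed this way would also absorb random error, which is why the separation described in the text requires a repeated run through the same situations.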

Situation similarity

Our model does capture the idea that the situations in which targets are observed may be more or less similar to one another, and that this similarity may affect the substantive similarity of the targets’ behavior (Sherman et al., 2010). As we conceptualize it, situation similarity is a property of a pair of observation intervals that are being compared in this regard. If one describes the situation using a single nominal scale (as we did in the present paper), situation similarity may simply be assessed as the percentage of the two observation intervals in which $σ (t)$ has the same value.
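One possible reading of this percentage measure is sketched below. It is our own illustrative interpretation, not a definition taken from the Technical Supplement: for each interval, we compute the share of time spent in each situation, and then sum the shares that the two intervals have in common. The timeline, the situation labels, and the function names are hypothetical.

```python
def situation_shares(interval, timeline):
    """Share of an interval's duration spent in each situation.

    timeline: list of (start, end, label) segments with a constant situation each.
    """
    start, end = interval
    shares = {}
    for seg_start, seg_end, label in timeline:
        covered = max(0.0, min(end, seg_end) - max(start, seg_start))
        if covered > 0:
            shares[label] = shares.get(label, 0.0) + covered / (end - start)
    return shares


def situation_similarity(a, b, timeline):
    """Sum of the time shares that two intervals have in common (0 to 1)."""
    shares_a = situation_shares(a, timeline)
    shares_b = situation_shares(b, timeline)
    return sum(min(share, shares_b.get(label, 0.0)) for label, share in shares_a.items())


# Hypothetical timeline (M = Morning, L = Lecture, R = Reading, B = Break).
timeline = [(0, 1, "M"), (1, 2, "L"), (2, 3, "R"), (3, 4, "B"), (4, 5, "L")]

same = situation_similarity((1, 2), (4, 5), timeline)     # both intervals are all lecture
partial = situation_similarity((0, 2), (3, 5), timeline)  # half of each is lecture
none = situation_similarity((0, 1), (2, 3), timeline)     # no situation in common
print(same, partial, none)
```

This reading does not require the two intervals to have the same length, because it compares proportions of time rather than aligned time points.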

Intraindividual variability

For decades, psychologists have been interested in how stable people’s behavior and experiences are across time and situations (Vazire & Sherman, 2017), including the extent to which there are systematic differences between targets in this stability (e.g., Fleeson, 2001). Note again that this concerns the targets’ momentary substance levels, which are the primary focus of the present paper. Research has often used person judgments as a stand-in for these substance levels, but that approach may fall short of its purpose because the variability of interest may not be disentangled from the perceivers’ variability in responding to items (e.g., Baird et al., 2017). Our model also clarifies that there are two possible sources of substantive intra-individual variability (i.e., target-specific noise and interaction-effects) that may and should probably be studied separately, because otherwise one would lump together two components with entirely different meanings.

Personality

Our model connects to several different ways in which the term “personality” is commonly used. The first approach is to simply think of personality in terms of stable differences between targets. Using this approach, a target’s personality may be described for every individual substance variable, as the target-effect. The second approach focuses on interaction-effects instead. Here, a target’s personality is described in terms of the pattern of if-then relationships between specific situational circumstances and the target’s respective response tendencies. This is the approach proposed by Mischel and Shoda (1995), whose concept of a “personality signature” basically comprises a given target’s set of interaction-effects across a number of different situations. Finally, the term “personality” may be used to denote measurable or entirely hypothetical substance variables that causally account for co-variation among cues. This “reflective” idea of personality (Schmittmann et al., 2013) informs most personality researchers’ use of Brunswik’s (1956) Lens Model, and is also consistent with Funder’s (1995) conceptualization. Note that, in contrast, Kenny’s (1994, 2004, 2019) approach is purely “formative” in nature, in that it is only concerned with how perceivers use the information that they receive about targets. In his PERSON model (Kenny, 2004), personality (P) is the judgment that the average perceiver would make if he or she had access to all the relevant cue information.

Strong versus weak persons/situations

The person-situation literature prominently features the idea that both situations and targets may be distinguished from one another in how “strong” versus “weak” they are (e.g., Schmitt et al., 2013). Note that this terminology is not meant to imply any evaluation (e.g., strong being somehow “better” than weak), but only to compare situations with one another, and targets with one another, in terms of how pervasive their respective influences are on substance levels. There is no binding convention as to how these terms are to be used. We suggest the following (Leising & Müller-Plath, 2009): A target ( $τ$ ) would be considered “strong” (“weak”) to the extent that – averaging across time – her noise-free, momentary substance levels ( $θ (τ, I)$ ) resemble her noise-free substance level overall ( $\bar{θ} (τ, I_{\max})$ ) more (less) than they resemble the average target’s noise-free substance level in the same situation ( $\tilde{θ} (σ (t))$ ). Using this approach, Trudy ( $τ^{'}$ , red) is a stronger target than Tessa ( $τ$ , blue) (see Figure 1). Strong (weak) situations may be defined analogously.

Good versus bad targets

“Good targets” are targets that are easy to judge (Funder, 1995). According to our model, there are actually two ways in which one may be a good target. First, being a good target could mean that there is little random variation in a target’s behavior stream. Using this understanding, Tessa ( $τ$ , blue) would be a better target than Trudy ( $τ^{'}$ , red) because Tessa’s $γ^{2} (τ)$ parameter is smaller and thus her observable substance levels in each individual situation will resemble her noise-free levels in those situations more (see Figure 1). Second, being a good target may also be interpreted in terms of how easy it is to infer a target’s noise-free substance level overall. Using this understanding, Trudy would be the better target because her noise-free substance levels in individual situations deviate less strongly from her overall substance level. Note that the latter understanding (being true to one’s overall level) is basically identical to what it means to be a “strong person” (see above).

Good versus bad information

Good Information is one of the key moderators of judgment accuracy in Funder’s (1995) Realistic Accuracy Model. However, Funder himself states that the term covers at least two largely distinct concepts: First, information will become “better” with longer observation intervals. This will be the case because longer intervals will result in more cancelling out of random noise, but only to the extent that the substance levels actually are noisy. Longer intervals (which likely, but not necessarily, go along with a more diverse setup of situations) may also be advantageous because interaction-effects will cancel out more. Second, some situations may be more “diagnostic” than others (Borkenau et al., 2004) in that the targets’ substance levels in them predict the same targets’ substance levels in the criterion interval particularly well.

Acquaintance

The extent to which perceivers know their targets has played a prominent role in the personality judgment literature (e.g., as a predictor of inter-rater agreement; Biesanz et al., 2007; Funder & Colvin, 1988). Interval length ( $| I |$ ) clearly represents an important aspect of this concept, as greater acquaintance will usually imply having spent more time observing a target. Given some level of random noise, this should lead to greater reliability and validity. However, the acquaintance concept is ambiguous, because being better acquainted with a target may also mean (a) being able to observe the target in different (e.g., more intimate) situations, and (b) being able to communicate more with the target (e.g., about his or her subjective experiences) (Kenny, 1994).

Limitations

The model as we present it here uses a number of constraints, in order to keep complexity within manageable bounds: First, we assume that the noise-free substance components are stable over time. That is, target-effects do not change, situation-effects do not change, and interaction-effects do not change, either. In reality, however, it is quite likely that changes in all of these parameters do occur. This may be accounted for by adding more parameters to the model (e.g., ones assessing overall developmental trends with age, and/or ones assessing decreasing predictability). There are now variants of LST that do account for this additional layer of complexity (Koch et al., 2023).

Second, our model (like CTT and LST) operates on the assumption that substance variables have unlimited variability without lower or upper bounds. In reality, however, most of the respective continua do have such bounds and this may have important consequences for model accuracy. Most important, target and situation effects will come to limit each other’s influences under these more realistic assumptions (Blum & Schmitt, 2018; Schmitt et al., 2013). For example, when a situation already induces very high levels of anxiety, target-effects can do little to increase that level even further. The model may be amended to reflect these considerations, but this will make subsequent derivations more complicated. The question of how much model complexity is actually needed in this regard is ultimately an empirical one.

Third, in the present paper, we introduced our model using a single substance variable. However, the model may just as well be applied to several substance variables at once, since $S$ may well be a vector, rather than a scalar. This is relevant, for example, with regard to the analysis of whole profiles of personality judgments (i.e., judgments of targets by perceivers on many different items; Biesanz, 2010; Furr, 2008) and will be addressed in a follow-up paper.

Conclusion and outlook

We hope that the presentation so far has shown just how much complexity needs to be accounted for when trying to conceptualize person judgment processes in a strict, formal way. Despite this somewhat daunting complexity, we also hope that the presentation has shown how much clarity and parsimony may be gained by attempting a stricter formalization: A single model comprising a relatively limited set of parameters may go a long way in expressing all sorts of important but previously unconnected theoretical ideas, and connecting them in a seamless manner. The same model enables a large number of exact predictions pertaining to actual behavior (substance) and – by extension – to judgments of behavior. Note again, however, that using this first part of the model to make predictions about judgment data means ignoring all of the additional influences that are also reflected by such data (e.g., response styles and perceiver attitudes).

We explicitly recommend using the terminology and parametrization presented here in theoretical writing about person judgment. Our hope is that doing so will help improve conceptual clarity and thus the efficiency of scientific communication in this field. Attaining the same level of precision with the more traditional “narrative” writing seems impossible to us.

Of course, it is not necessary to operationalize each and every parameter in every empirical study. But a model like the one presented here may nevertheless be useful because it explicates all the factors that are likely to have an influence on the data. This enables researchers to plan their studies more systematically (e.g., even just by thinking about influences that would previously have been overlooked, or by letting the factors they do not measure vary randomly). Doing so may then lead to more valid (e.g., representative) conclusions.

In order to motivate more psychologists to get involved with this crucial but highly demanding kind of academic work, it may be necessary to find ways of incentivizing it better (e.g., Leising et al., 2022a, 2022b). It will also be helpful to turn to colleagues who already possess the required expertise, invite them to collaborate, and learn from them.

Supplemental Material

Supplemental Material - A mathematical model of person judgment

Supplemental Material for A mathematical model of person judgment by Daniel Leising and René L Schilling in Personality Science.

Footnotes

Author note

Not applicable.

Acknowledgements

Not applicable.

Author contributions

Daniel Leising: Conceptualization; Formal analysis; Writing – original draft; Writing – review & editing.

René L. Schilling: Conceptualization; Formal analysis; Writing – review & editing.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

Corrections (May 2025):

This article has been updated with clarifications to the affiliations, Acknowledgments, and Author contributions, as well as with minor grammatical corrections in the text.

ORCID iD

Daniel Leising

Data accessibility statement

Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Supplemental material

Supplemental material for this article is available online. Depending on the article type, these usually include a Transparency Checklist, a Transparent Peer Review File, and optional materials from the authors.

Note

Not applicable.

References

Back, M. D., & Nestler (2016). Accuracy of judging personality. In Hall, J. A., Mast, M. S., & West, T. V. (Eds.), The social psychology of perceiving others accurately (pp. 98–124). Cambridge University Press. https://doi.org/10.1017/CBO9781316181959.005

Baird, B. M., Lucas, R. E., & Donnellan, M. B. (2017). The role of response styles in the assessment of intraindividual personality variability. Journal of Research in Personality, 69, 170–179. https://doi.org/10.1016/j.jrp.2016.06.015

Biesanz, J. C. (2010). The Social Accuracy Model of interpersonal perception: Assessing individual differences in perceptive and expressive accuracy. Multivariate Behavioral Research, 45(5), 853–885. https://doi.org/10.1080/00273171.2010.519262

Biesanz, J. C., West, S. G., & Millevoi (2007). What do you learn about someone over time? The relationship between length of acquaintance and consensus and self-other agreement in judgments of personality. Journal of Personality and Social Psychology, 92(1), 119–135. https://doi.org/10.1037/0022-3514.92.1.119

Blum, G. S., & Schmitt (2018). The nonlinear interaction of person and situation (NIPS) model and its values for a psychology of situations. In Funder, D. C., Rauthmann, J. F., & Sherman, R. A. (Eds.), Oxford handbook of psychological situations. Oxford University Press.

Borgstede & Eggert (2023a). Squaring the circle: From latent variables to theory-based measurement. Theory & Psychology, 33(1), 118–137. https://doi.org/10.1177/09593543221127985

Borgstede & Eggert (2023b). Meaningful measurement requires substantive formal theory. Theory & Psychology, 33(1), 153–159. https://doi.org/10.1177/09593543221139811

Borkenau, Mauer, Riemann, Spinath, F. M., & Angleitner (2004). Thin slices of behavior as cues of personality and intelligence. Journal of Personality and Social Psychology, 86(4), 599–614. https://doi.org/10.1037/0022-3514.86.4.599

Brunswik (1956). Perception and the representative design of psychological experiments. University of California Press.

Ekman & Friesen, W. V. (1978). Facial action coding system: A technique for the measurement of facial movement. Consulting Psychologists Press.

Fleeson (2001). Toward a structure- and process-integrated view of personality: Traits as density distribution of states. Journal of Personality and Social Psychology, 80(6), 1011–1027. https://doi.org/10.1037/0022-3514.80.6.1011

Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach. Psychological Review, 102(4), 652–670. https://doi.org/10.1037/0033-295X.102.4.652

Funder, D. C., & Colvin, C. R. (1988). Friends and strangers: Acquaintanceship, agreement, and the accuracy of personality judgment. Journal of Personality and Social Psychology, 55(1), 149–158. https://doi.org/10.1037/0022-3514.55.1.149

Furr, R. M. (2008). A framework for profile similarity: Integrating similarity, normativeness, and distinctiveness. Journal of Personality, 76(5), 1267–1316. https://doi.org/10.1111/j.1467-6494.2008.00521.x

Geiser, Litson, Bishop, Keller, B. T., Burns, G. L., Servera, & Shiffman (2015). Analyzing person, situation and person × situation interaction effects: Latent state-trait models for the combination of random and fixed situations. Psychological Methods, 20(2), 165–192. https://doi.org/10.1037/met0000026

Glöckner & Betsch (2011). The empirical content of theories in judgment and decision making: Shortcomings and remedies. Judgment and Decision Making, 6(8), 711–721. https://doi.org/10.1017/s1930297500004149

Heynicke, Rau, Leising, Wessels, & Wiedenroth (2022). Perceiver effects in person perception reflect acquiescence, positivity, and trait-specific content: Evidence from a large-scale replication study. Social Psychological and Personality Science, 13(4), 839–848. https://doi.org/10.1177/19485506211039101

Hintz, Geiser, & Shiffman (2019). A latent state–trait model for analyzing states, traits, situations, method effects, and their interactions. Journal of Personality, 87(3), 434–454. https://doi.org/10.1111/jopy.12400

Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. The Guilford Press.

Kenny, D. A. (2004). PERSON: A general model of interpersonal perception. Personality and Social Psychology Review, 8(3), 265–280. https://doi.org/10.1207/s15327957pspr0803_3

Kenny, D. A. (2019). Interpersonal perception: The foundation of social relationships. The Guilford Press.

Kenrick, D. T., & Funder, D. C. (1988). Profiting from controversy: Lessons from the person-situation debate. American Psychologist, 43(1), 23–34. https://doi.org/10.1037/0003-066X.43.1.23

Koch, Voelkle, M. C., & Driver, C. C. (2023). Analyzing longitudinal multirater data with individually varying time intervals. Structural Equation Modeling: A Multidisciplinary Journal, 30(1), 86–104. https://doi.org/10.1080/10705511.2022.2096612

Leising & Müller-Plath (2009). Person-situation integration in research on personality problems. Journal of Research in Personality, 43(2), 218–227. https://doi.org/10.1016/j.jrp.2009.01.017

Leising, Scharloth, Lohse, & Wood (2014). What types of terms do people use when describing an individual’s personality? Psychological Science, 25(9), 1787–1794. https://doi.org/10.1177/0956797614541285

Leising & Schilling, R. L. (2024, October 4). Mathematical model of person judgment: Part 1 (cue emergence). https://doi.org/10.31234/osf.io/myua4

Leising, Thielmann, Glöckner, Gärtner, & Schönbrodt, F. D. (2022a). Ten steps toward a better personality science – how quality may be rewarded more in research evaluation. Personality Science, 3(1), 1–44. https://doi.org/10.5964/ps.6029

Leising, Thielmann, Glöckner, Gärtner, & Schönbrodt (2022b). Ten steps toward a better personality science – a rejoinder to the comments. Personality Science, 3(1), 1–15. https://doi.org/10.5964/ps.7961

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley Publishing Company.

Mischel (1968). Personality and assessment. Wiley.

Mischel & Shoda (1995). A cognitive-affective system theory of personality: Reconceptualizing situations, dispositions, dynamics, and invariance in personality structure. Psychological Review, 102(2), 246–268. https://doi.org/10.1037/0033-295X.102.2.246

Novick, M. R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3(1), 1–18. https://doi.org/10.1016/0022-2496(66)90002-2

Oberauer & Lewandowsky (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5), 1596–1618. https://doi.org/10.3758/s13423-019-01645-2

Rau, Carlson, E. N., Back, M. D., Barranti, Gebauer, J. E., Human, L. J., Leising, & Nestler (2021). What is the structure of perceiver effects? On the importance of global positivity and trait-specificity across personality domains and judgment contexts. Journal of Personality and Social Psychology, 120(3), 745–764. https://doi.org/10.1037/pspp0000278

Rauthmann, J. F., Gallardo-Pujol, Guillaume, E. M., Todd, Nave, C. S., Sherman, R. A., Ziegler, Jones, A. B., & Funder, D. C. (2014). The situational eight DIAMONDS: A taxonomy of major dimensions of situation characteristics. Journal of Personality and Social Psychology, 107(4), 677–718. https://doi.org/10.1037/a0037250

Rauthmann, J. F., Sherman, R. A., & Funder, D. C. (2015). Principles of situation research: Towards a better understanding of psychological situations. European Journal of Personality, 29(3), 363–381. https://doi.org/10.1002/per.1994

Robinaugh, D. J., Haslbeck, J. M. B., Ryan, Fried, E. I., & Waldorp, L. J. (2021). Invisible hands and fine calipers: A call to use formal theory as a toolkit for theory construction. Perspectives on Psychological Science, 16(4), 725–743. https://doi.org/10.1177/1745691620974697

Rodgers, J. L. (2010). The epistemology of mathematical and statistical modeling: A quiet methodological revolution. American Psychologist, 65(1), 1–12. https://doi.org/10.1037/a0018326

Schilling, R. L. (2021). Brownian motion (3rd ed.). De Gruyter.

Schmitt, Gollwitzer, Baumert, Blum, Gschwendner, Hofmann, & Rothmund (2013). Proposal of a nonlinear interaction of person and situation (NIPS) model. Frontiers in Psychology, 4, 499. https://doi.org/10.3389/fpsyg.2013.00499

Schmittmann, V. D., Cramer, A. O. J., Waldorp, L. J., Epskamp, Kievit, R. A., & Borsboom (2013). Deconstructing the construct: A network perspective on psychological phenomena. New Ideas in Psychology, 31(1), 43–53. https://doi.org/10.1016/j.newideapsych.2011.02.007

Sherman, R. A., Nave, C. S., & Funder, D. C. (2010). Situational similarity and personality predict behavioral consistency. Journal of Personality and Social Psychology, 99(2), 330–343. https://doi.org/10.1037/a0019796

Smaldino, P. E. (2020). How to translate a verbal theory into a formal model. Social Psychology, 51(4), 207–218. https://doi.org/10.1027/1864-9335/a000425

Stachl, Boyd, R. L., Horstmann, K. T., Khambatta, Matz, S. C., & Harari, G. M. (2021). Computational personality assessment. Personality Science, 2(1), 1–22. https://doi.org/10.5964/ps.6115

Steyer, Ferring, & Schmitt, M. J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8(2), 79–98.

Steyer, Mayer, Geiser, & Cole, D. A. (2015). A theory of states and traits – revised. Annual Review of Clinical Psychology, 11(1), 71–98. https://doi.org/10.1146/annurev-clinpsy-032813-153719

Steyer, Schmitt, & Eid (1999). Latent state–trait theory and research in personality and individual differences. European Journal of Personality, 13(5), 389–408. https://doi.org/10.1002/(SICI)1099-0984(199909/10)13:5<389::AID-PER361>3.0.CO;2-A

Vazire & Sherman, R. A. (2017). Introduction to the special issue on within-person variability in personality. Journal of Research in Personality, 69, 1–3. https://doi.org/10.1016/j.jrp.2017.07.004

Wessels, N. M., Zimmermann, Biesanz, J. C., & Leising (2020). Differential associations of knowing and liking with accuracy and positivity bias in person perception. Journal of Personality and Social Psychology, 118(1), 149–171. https://doi.org/10.1037/pspp0000218

Wetzel, Lüdtke, Zettler, & Böhnke, J. R. (2016). The stability of extreme response style and acquiescence over 8 years. Assessment, 23(3), 279–291. https://doi.org/10.1177/1073191115583714

Wiedenroth & Leising (2020). The more the better – but more of which? Information quantity and shared meaning as predictors of consistency and accuracy in person judgment. Journal of Research in Personality, 87, Article 103968. https://doi.org/10.1016/j.jrp.2020.103968

Zachar & Jablensky (2015). Introduction: The concept of validation on psychiatry and psychology. In Zachar, Stoyanov, D. S., Aragona, & Jablensky (Eds.), Alternative perspectives on psychiatric validation (pp. 3–24). Oxford University Press.

