Sage Journals: Discover world-class research

Abstract

This article reconceptualizes reliability as a theorem derived from the projection geometry of Hilbert space rather than an assumption of classical test theory. Within this framework, the true score is defined as the conditional expectation $E (X ∣ G)$, the orthogonal projection of the observed score onto the subspace of random variables measurable with respect to the σ-algebra $G$ of the latent variable. Reliability, expressed as $Rel (X) = Var [E (X ∣ G)] / Var (X)$, quantifies the efficiency of this projection: it is the squared cosine between the centered observed score and its true-score projection. This formulation unifies reliability with regression $R^{2}$, factor-analytic communality, and predictive accuracy in stochastic models. The operator-theoretic perspective clarifies that measurement error lies in the orthogonal complement of the projection and that reliability reflects the alignment between observed and latent scores. Numerical examples and measure-theoretic proofs illustrate the framework’s generality. The approach provides a rigorous mathematical foundation for reliability, connecting psychometric theory with modern statistical and geometric analysis.

Keywords

reliability; Hilbert space; conditional expectation; projection theorem; psychometrics; operator theory

In practice, researchers estimate reliability using test–retest, alternate-forms, or rater-agreement designs, although single-administration designs remain the most common. Despite the ubiquity of reliability estimation, definitions of the true score are rarely made explicit, obscuring differences among competing formulations of test theory and even the fact that several formulations exist. Confusion persists in textbooks and empirical studies regarding true scores, error scores, and the reliability of measurement data. Although several authors have addressed these ambiguities (e.g., Kroc & Zumbo, 2020; Raykov & Marcoulides, 2011; Sijtsma, 2009; Sijtsma & Pfadt, 2021; Zumbo & Kroc, 2019; Zumbo & Rupp, 2004), they continue to hinder both the interpretation of reliability and the selection of appropriate indices. Among book-length treatments, Raykov and Marcoulides (2011, Chapter 5) provide one of the clearest accounts of these misconceptions.

Reliability is a foundational concept in psychometric theory, yet its formal definition remains fragmented across different frameworks. Classical test theory (CTT) defines reliability as the ratio of true-score variance to observed-score variance. Although widely used, this definition is usually stated without a rigorous mathematical derivation and is treated as an assumption rather than a theorem.

Recent developments in operator-theoretic test theory provide a modern foundation for reliability grounded in Hilbert space geometry. Within this framework, the true score is the orthogonal projection of the observed score onto the subspace of random variables measurable with respect to the latent construct. Reliability then emerges as the squared cosine between the centered observed and true scores or, equivalently, as the Rayleigh quotient associated with the projection operator.

This article extends Zimmerman and Zumbo’s (2001) operator-theoretic approach by establishing reliability as a corollary of the projection theorem in $L^{2}$ . It demonstrates that the conditional expectation operator provides a rigorous realization of the true-score function and unifies reliability with related concepts such as regression $R^{2}$ , factor-analytic communality, and predictability in dynamic models.

Beyond its mathematical reformulation, this framework clarifies the interpretation of reliability in both theoretical and applied contexts. It underscores that reliability is not merely a property of test scores but an intrinsic feature of the projection geometry linking observed and true scores.

Purpose and Structure of the Paper

It has been nearly 60 years since Zimmerman initiated this line of inquiry into test theory, which culminated in contemporary operator-theoretic test theory (e.g., Williams & Zimmerman, 1977; Zimmerman, 1969a, 1969b, 1969c, 1970, 1972, 1975, 1976; Zimmerman et al., 1968), much of it published in this journal. In light of this history, the present article revisits, extends, and refines foundational psychometric concepts of reliability for contemporary use within an operator-theoretic formalization of test theory. These concepts recur throughout the exposition, their meaning enriched as methodological and theoretical components are introduced; the repetition of key terms reflects their evolving role within the developing framework.

More specifically, this article formalizes reliability as a theorem within an operator-theoretic framework, establishing conditional expectation as the orthogonal projection in $L^{2}$ that defines the true-score function. This reformulation situates CTT within the geometry of Hilbert space, providing a unified basis for reliability, regression $R^{2}$, factor-analytic communality, and predictability in stochastic processes.

The article proceeds as follows. The section Preliminary Remarks: Definitions and Levels of Abstraction in Mental Test Theory sets the stage by providing key definitions and tracing the development of test theory through progressively more abstract formalisms. The section Hilbert Space Framework and Conditional Expectation introduces the Hilbert space foundation and demonstrates that conditional expectation functions as an orthogonal projection operator. The section Variance Decomposition and Reliability as Projection develops the variance decomposition central to CTT and shows how reliability emerges as the squared cosine between true and observed scores. The section Extensions to Regression, Factor Models, and Time Series generalizes this result across models that share the projection structure. The section Numerical Illustrations provides examples that clarify the computational and interpretive aspects of the theory. The section Implications for Estimation and Measurement discusses the conceptual consequences for psychometric practice. The final section, Conclusion and Future Directions, highlights potential extensions of operator-theoretic methods in measurement theory.

Technical details appear in two appendices. Appendix A presents a worked numerical example, and Appendix B provides a measure-theoretic construction of conditional expectation using the Radon–Nikodým theorem.

Preliminary Remarks: Definitions and Levels of Abstraction in Mental Test Theory

The development of mental test theory reflects a progression from concrete score formulations to increasingly abstract mathematical representations. Early models emphasized observable scores and empirical reliability coefficients, whereas later frameworks expressed reliability and validity in algebraic and probabilistic terms. Contemporary formulations extend these ideas into geometric and operator-theoretic spaces, where random variables are treated as elements of $L^{2}$ and expectations as projection operators. This hierarchy of abstraction clarifies that CTT, regression, and factor analysis all arise as special cases of a broader mathematical structure governing measurement, variance decomposition, and prediction.

Early contributions by Spearman (1904) and Yule established the foundation, later synthesized by Gulliksen (1950). However, these early approaches lacked formal rigor. A major advance came with the formalization of CTT by Guttman (1945), Cronbach et al. (1963), Novick (1966), Rozeboom (1966), and Lord and Novick (1968). These authors explicitly defined observed, true, and error scores as random variables with well-specified properties expressed through variances, covariances, and correlations. As Raykov and Marcoulides (2011) noted, CTT dominated test development across the educational, behavioral, and social sciences for most of the 20th century, though not without controversy.

Gulliksen (1950), Novick (1966), Novick and Lewis (1967), and Lord and Novick (1968) codified reliability within this axiomatic framework. By contrast, Zimmerman (1975), Steyer (1988, 1989), and Zimmerman and Zumbo (2001) introduced measure-theoretic and operator-theoretic alternatives that treat reliability as a provable consequence of conditional expectation and orthogonal projection in Hilbert space. Conditional expectation is central in both probability and statistics (e.g., Athreya & Lahiri, 2006; Billingsley, 1995; Durrett, 2010; Steyer & Nagel, 2017) and plays an equally central role in psychometric theory (Steyer, 1988, 1989; Zimmerman, 1975; Zimmerman & Zumbo, 2001). As Kroc and Zumbo (2020, p. 6) observed, conditional expectation determines the measurement error structure that specifies which sample units are exchangeable across models. Zimmerman (1975) advanced a measure-theoretic definition of the true score as a conditional expectation, extending earlier treatments by Gulliksen (1950) and Novick (1966). Zimmerman and Zumbo (2001) subsequently formalized this approach in an operator-theoretic framework that extends beyond CTT.

Definitions of Reliability

Although Zimmerman (1975), Steyer (1988, 1989), and Zimmerman and Zumbo (2001) provide mathematically rigorous formalizations of test theory, they do not invalidate earlier formulations such as those of Novick (1966) or Lord and Novick (1968). Rather, they extend and complement them, showing how distinct mathematical frameworks—axiomatic, measure-theoretic, and operator-theoretic—yield different but coherent perspectives on test theory and reliability. Across these perspectives, reliability has been conceptualized as replication, as axioms, as variance decomposition, or as projection. Coefficients such as Cronbach’s (1951) $α$ , Armor’s (1974) $θ$ , or McDonald’s (1999) $ω$ are not inherently measures of reliability but descriptive summaries of variance in a covariance structure. Their interpretation as reliability requires a theoretical framework—whether CTT, latent-variable models, or Hilbert-space formulations—that defines true and error variance. Without such grounding, these coefficients are descriptive only; within theory, they become measures of reliability tied to specific assumptions. Reliability is therefore best understood not as a single mathematical entity but as a psychometric construct whose meaning depends on the theoretical assumptions that define it.

Definition of CTT

Following Zimmerman and Zumbo (2001), Zumbo and Kroc (2019), and Kroc and Zumbo (2020), CTT is not an empirical model but a formal structure defined by

X = T + E, where T = E [X ∣ σ (f)],

where $X$ denotes the observed score, $T$ the true score (the conditional expectation of $X$ given the $σ$-algebra $σ (f)$), $E = X - T$ the error score, $f$ the assignment-to-individual function, and $σ (f)$ the $σ$-algebra generated by $f$.

Mathematical Consequences (Not Assumptions)

From this definition, two properties follow immediately:

E [E] = 0, and Cov (T, E) = 0 .

CTT is thus definitional rather than assumptive. What the classical literature sometimes describes as “axioms” (mean-zero error, uncorrelated true and error scores) are, in fact, theorems derived from the formal definition above.
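To see that these are consequences rather than assumptions, the following Python sketch assumes a hypothetical design in which each of 200 persons is measured five times and the empirical distribution over (person, replication) pairs plays the role of $P$. Under that assumption, $E [X ∣ σ (f)]$ reduces to each person's mean score, and both properties hold exactly in the sample (up to floating-point error), because the residuals sum to zero within every person:

```python
import numpy as np


def empirical_true_score(x, person):
    """Empirical E[X | sigma(f)]: each score is replaced by its person's mean."""
    sums = np.bincount(person, weights=x)
    counts = np.bincount(person)
    return (sums / counts)[person]


rng = np.random.default_rng(0)
n_persons, n_reps = 200, 5                        # hypothetical design
person = np.repeat(np.arange(n_persons), n_reps)  # plays the role of f
mu = rng.normal(50.0, 10.0, size=n_persons)       # hypothetical person-level means
x = mu[person] + rng.normal(0.0, 5.0, size=person.size)

t = empirical_true_score(x, person)  # true score T
e = x - t                            # error score E = X - T

mean_error = e.mean()
cov_te = np.mean((t - t.mean()) * (e - e.mean()))
print(f"E[E] = {mean_error:.1e}, Cov(T, E) = {cov_te:.1e}")
```

No distributional assumption about the noise is used; changing the noise distribution leaves both quantities at zero.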

Definitions of Reliability in CTT

In CTT, reliability is defined as the ratio of true-score variance to total variance:

Rel (X) = \frac{Var (T)}{Var (X)} .

This canonical formulation expresses the efficiency of measurement as the ratio of signal (true variance) to total variance. It emphasizes that reliability is a population-level property, not an individual one, and depends on how true and error scores are formally defined within a given measurement framework.

Equivalently,

Rel (X) = 1 - \frac{Var (E)}{Var (X)} .

This definition corresponds to the squared correlation between observed and true scores:

Rel (X) = ρ_{XT}^{2},

where $ρ_{XT}$ denotes the correlation between $X$ and $T$ . Historical notations, such as the parallel-test coefficient $ρ_{X X^{'}}$ , reflected repeated-testing paradigms. However, modern measure-theoretic and operator-theoretic formulations generalize this concept without assuming parallel forms or repeated measures. Reliability is thus model-dependent rather than test-dependent; its meaning derives from how true and error scores are defined within each framework.
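A small exact check, using Python's standard `fractions` module, illustrates that the three expressions coincide. The population (six equally likely outcomes shared by three persons) and the scores are hypothetical; the true score is computed as the probability-weighted mean within each person's block, which is the conditional expectation for a σ-algebra generated by a finite partition:

```python
from fractions import Fraction

# Hypothetical finite population: six equally likely outcomes, three persons.
person = [0, 0, 1, 1, 2, 2]
x = [Fraction(v) for v in (8, 12, 14, 18, 19, 21)]
p = [Fraction(1, 6)] * 6


def expect(values):
    return sum(pi * v for pi, v in zip(p, values))


def cov(a, b):
    ma, mb = expect(a), expect(b)
    return expect([(ai - ma) * (bi - mb) for ai, bi in zip(a, b)])


# True score: probability-weighted mean of X within each person's block.
block_mean = {}
for g in set(person):
    idx = [i for i, k in enumerate(person) if k == g]
    block_mean[g] = sum(p[i] * x[i] for i in idx) / sum(p[i] for i in idx)
t = [block_mean[g] for g in person]
e = [xi - ti for xi, ti in zip(x, t)]

rel_ratio = cov(t, t) / cov(x, x)                     # Var(T) / Var(X)
rel_error = 1 - cov(e, e) / cov(x, x)                 # 1 - Var(E) / Var(X)
rel_corr2 = cov(x, t) ** 2 / (cov(x, x) * cov(t, t))  # squared correlation of X and T
print(rel_ratio, rel_error, rel_corr2)  # → 152/179 152/179 152/179
```

Exact rational arithmetic is used so that the agreement is visibly an identity rather than a floating-point coincidence.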

Equating reliability with internal consistency or a single coefficient obscures its theoretical basis. Measurement precision is real, but frameworks such as CTT (e.g., Raykov & Marcoulides, 2011), generalizability theory (Brennan, 2001; Webb et al., 2006), operator-theoretic models (Zimmerman & Zumbo, 2001), and latent state–trait approaches (Steyer et al., 2015; Steyer & Schmitt, 1990) conceptualize it differently. No single formula captures reliability in full.

To clarify these distinctions, Table 1 contrasts three major conceptualizations of reliability in psychometrics—axiomatic, conditional expectation, and operator-theoretic. Although measure theory provides a rigorous probabilistic foundation, it remains unfamiliar to many researchers outside mathematics. This likely explains why Novick (1966) and Lord and Novick (1968) incorporated measure-theoretic ideas implicitly, preserving mathematical integrity while maintaining accessibility. Zimmerman (1975) made these foundations explicit, defining true scores as conditional expectations and extending psychometric constructs into the language of measure-theoretic probability. Steyer (1988, 1989) further developed this approach by defining $E (Y ∣ X)$ as a function mapping values of $X$ to expected values of $Y$ , aligning with the measure-theoretic definition of conditional expectation as a random variable satisfying integrability conditions. His framework elegantly connects abstract probability theory with applied modeling in psychology.

Table 1.

Conceptualizations of Reliability in Classical Test Theory (CTT)

How reliability and test theory are defined	Key features	Definition of reliability	Strengths	Limitations
Axiomatically (Gulliksen, 1950; Lord & Novick, 1968; Novick, 1966)	Assumes decomposition $X = T + E$ with $T ⊥ E$	Variance ratio: $Rel (X) = \frac{Var (T)}{Var (X)}$	Simple, intuitive; widely taught and applied	Axioms taken as primitive; provides little justification for why reliability equals variance ratio
Conditional Expectation (Steyer, 1988, 1989; Zimmerman, 1975, 1989)	Defines true score as conditional expectation: $T = E [X ∣ G]$	Squared cosine of angle between $X$ and $T$	Provides conceptual clarity	Abstract; less familiar to applied researchers
Operator-Theoretic Test Theory (Zimmerman & Zumbo, 2001; Zumbo, in press; Zumbo & Kroc, 2019)	Models measurement as linear operators in Hilbert space; uses the Rayleigh quotient and Rayleigh–Ritz methods	Reliability as a Rayleigh quotient of variance components	Derives reliability as a theorem of Hilbert space geometry; frames reliability as a precise mathematical object; connects to optimization and variational analysis	Requires advanced mathematics; functional estimation is emphasized; relies on bounding approaches and careful treatment of estimands, estimators, and estimates ( $α, θ, ω$ )

Note. CTT = Classical test theory. This table contrasts the axiomatic, conditional expectation, and operator-theoretic conceptualizations of reliability, highlighting their mathematical formulations, strengths, and limitations.

Building on Zimmerman’s (1975) foundation, Zimmerman and Zumbo (2001) introduced an operator-theoretic formalization that situates test theory within Hilbert space. In this framework, observed scores are vectors in a Hilbert space, and measurement processes are represented as linear operators acting on those vectors. True and error scores correspond to orthogonal projections onto complementary subspaces, ensuring that the decomposition $X = T + E$ follows from geometric principles rather than axiomatic postulates. Reliability, in turn, is represented as a Rayleigh quotient that captures the efficiency of projecting observed scores onto the true-score subspace. This operator-theoretic view unifies reliability with broader mathematical tools such as eigenvalue analysis, the Rayleigh–Ritz principle, and variational methods central to functional analysis.

Table 2 summarizes the main contrasts between CTT and operator-theoretic test theory. While CTT is rooted in variance decomposition and correlations, the operator-theoretic framework reframes reliability, error, and latent structure in terms of orthogonal projections and linear operators. This unifying framework connects CTT with modern approaches such as item response theory and generalizability theory, offering greater explanatory power for understanding variability in test performance. Operator-theoretic test theory is therefore best viewed as a distinct theoretical framework rather than an extension of CTT.

Table 2.

Comparison of Classical Test Theory (CTT) and Operator-Theoretic Test Theory (OTTT)

Feature	Classical test theory (CTT)	Operator-theoretic test theory
Core concept	Observed score = True score + Error	Test scores as random variables in Hilbert spaces, analyzed via operators
Mathematical basis	Variance decomposition, correlations, and covariances	Built on measure theory (probability spaces), Hilbert spaces (geometric structure), and functional analysis (operators on function spaces)
Representation of scores	Random variables with statistical properties	Vectors in Hilbert space; relations captured by inner products and linear operators
Error modeling	Simplified additive error term	Orthogonal decompositions of subspaces; error modeled via projection operators
Applications	Reliability estimation, validity analysis, and test construction	Unification of CTT, IRT, and advanced error models within a single abstract framework that identifies the estimand and informs reliability estimation, measurement validity analysis, and test construction
Explanatory power	Limited to variance-based interpretations	Richer explanations of score variability, situational influences, and latent constructs through operator actions
Level of abstraction	Moderate (statistical)	High (functional-analytic, operator-theoretic, geometric)

Note. CTT = classical test theory; OTTT = operator-theoretic test theory; IRT = item response theory.

Operator-theoretic test theory rests on measure-theoretic foundations. Measure theory provides the probabilistic framework for defining expectations, variances, and covariances, while Hilbert space supplies the geometric setting in which operator theory represents and analyzes reliability. Within this geometry, random variables are represented as vectors, and projections formalize the decomposition of test scores into true and error components. The need for this formalism has been recognized by Zimmerman (1975) and Zimmerman and Zumbo (2001), and further extended by Steyer and Schmitt (1990) and Kroc and Zumbo (2020) in their discussions of measurement error and exchangeability.

The next section formalizes this structure by introducing the Hilbert space framework and defining conditional expectation as the orthogonal projection that underlies reliability.

Hilbert Space Framework and Conditional Expectation

This section introduces the mathematical foundation of the operator-theoretic approach. Let $L^{2} (Ω, F, P)$ denote the Hilbert space of all square-integrable random variables defined on a probability space $(Ω, F, P)$ , equipped with the inner product

〈 X, Y 〉 = E (XY)

and norm

∥ X ∥ = \sqrt{E (X^{2})} .

In this space, random variables correspond to vectors, and expectations correspond to projections onto subspaces of constant random variables.
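The claim that expectation is a projection onto constants can be checked numerically on a finite probability space. In this sketch the five outcomes and their probabilities are hypothetical: `inner` implements $〈 X, Y 〉 = E (XY)$, the coefficient of the projection onto the constant random variable 1 is $E [X]$, and a grid search confirms that no other constant has a smaller mean squared error:

```python
import numpy as np

# Hypothetical finite probability space: five outcomes and their probabilities.
p = np.array([0.1, 0.2, 0.3, 0.25, 0.15])
x = np.array([3.0, 5.0, 4.0, 8.0, 6.0])


def inner(a, b):
    """<A, B> = E(AB) on the finite space."""
    return float(np.sum(p * a * b))


def norm(a):
    return float(np.sqrt(inner(a, a)))


one = np.ones_like(x)
# Projection of x onto span{1}: (<x, 1> / <1, 1>) * 1, i.e., the constant E[X].
c_star = inner(x, one) / inner(one, one)
residual = x - c_star * one

# Mean squared error E[(X - c)^2] over a grid of constants centered at c_star.
grid = np.linspace(c_star - 2.0, c_star + 2.0, 401)
mse = np.array([norm(x - c * one) ** 2 for c in grid])
c_grid = float(grid[np.argmin(mse)])

print(c_star, inner(residual, one), c_grid)
```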

The set $L^{2} (Ω, G, P)$ of square-integrable random variables measurable with respect to a sub-$σ$-algebra $G \subseteq F$ is a closed subspace of $L^{2} (Ω, F, P)$. The conditional expectation of $X$ given $G$, denoted $E (X ∣ G)$, is the unique (up to almost-sure equality) element of $L^{2} (Ω, G, P)$ satisfying

〈 X - E (X ∣ G), Z 〉 = 0, \quad \forall Z \in L^{2} (Ω, G, P) .

Thus, conditional expectation functions as the orthogonal projection of $X$ onto $L^{2} (Ω, G, P)$. This geometric interpretation immediately implies that $E (X ∣ G)$ minimizes the mean squared error among all square-integrable $G$-measurable random variables:

E (X ∣ G) = \arg \min_{Z \in L^{2} (Ω, G, P)} E [(X - Z)^{2}] .
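On a finite probability space whose sub-$σ$-algebra $G$ is generated by a partition, $E (X ∣ G)$ is the probability-weighted mean of $X$ over each block. The sketch below (hypothetical probabilities, scores, and partition) checks both characterizations: the residual is orthogonal to every block indicator, and these indicators span the $G$-measurable random variables; randomly drawn $G$-measurable competitors never achieve a smaller mean squared error:

```python
import numpy as np

rng = np.random.default_rng(1)
p = np.array([0.1, 0.15, 0.05, 0.2, 0.3, 0.2])  # hypothetical probabilities
x = np.array([2.0, 4.0, 1.0, 7.0, 5.0, 9.0])    # observed score on each outcome
block = np.array([0, 0, 0, 1, 1, 2])             # partition that generates G


def cond_exp(values, block, p):
    """E[values | G] for G generated by a finite partition: weighted block means."""
    out = np.empty_like(values, dtype=float)
    for b in np.unique(block):
        m = block == b
        out[m] = np.sum(p[m] * values[m]) / np.sum(p[m])
    return out


t = cond_exp(x, block, p)
resid = x - t

# Orthogonality: <X - E(X|G), Z> = 0 for each block indicator Z (these span L^2(G)).
for b in np.unique(block):
    z = (block == b).astype(float)
    assert abs(np.sum(p * resid * z)) < 1e-12

# Best approximation: random G-measurable competitors never beat E(X|G).
mse_t = np.sum(p * (x - t) ** 2)
for _ in range(1000):
    z = rng.normal(0.0, 5.0, size=3)[block]  # constant on each block, so G-measurable
    assert np.sum(p * (x - z) ** 2) >= mse_t

print(t, mse_t)
```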

In the context of test theory, let $X$ denote an observed score and $T = E (X ∣ G)$ represent the true score, where $G$ encodes information about the latent domain, attribute, or construct. The residual $E = X - T$ is orthogonal to $T$ , that is, $E (TE) = 0$ . This orthogonality embodies the fundamental decomposition of CTT, expressed geometrically as

X = T + E, T ⊥ E .

The Hilbert space formalism thus provides a rigorous foundation for the test-score decomposition: the true score is the projection of the observed score onto the subspace of random variables measurable with respect to the σ-algebra representing the latent variable, and the error term lies in the orthogonal complement of that subspace. This formulation replaces the traditional assumption that true and error scores are uncorrelated with a theorem arising from projection geometry.

A measure-theoretic construction of conditional expectation is provided in Appendix B.

Variance Decomposition and Reliability as Projection

The operator-theoretic framework allows the reliability of observed scores to be derived directly from Hilbert space geometry. Let $X$ denote an observed score and $T = E (X ∣ G)$ the corresponding true score, where $G$ represents the σ-algebra encoding the latent variable. From the orthogonal decomposition

X = T + E, E (TE) = 0,

the total variance of $X$ decomposes as

Var (X) = Var (T) + Var (E) .

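The reliability of $X$ in its canonical form is defined as the proportion of the variance in $X$ attributable to its true-score component. Assume without loss of generality that $X$ is centered, $E [X] = 0$, so that $E [T] = 0$ as well and inner products coincide with covariances. Because $T$ is the orthogonal projection of $X$ onto the subspace of true scores, their inner product satisfies

〈 X, T 〉 = 〈 T, T 〉 = Var (T) .

Substituting this identity, together with $〈 X, X 〉 = Var (X)$, into $Rel (X) = Var (T) / Var (X)$ yields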

Rel (X) = \frac{{〈 X, T 〉}^{2}}{〈 X, X 〉 〈 T, T 〉} .

This form expresses reliability as the squared cosine of the angle between the centered observed score $X$ and its projection $T$ in the Hilbert space $L^{2}$. Reliability, therefore, measures projection efficiency, that is, the extent to which the observed score aligns with its true-score projection. When $X$ lies entirely in the true-score subspace, $Rel (X) = 1$; when the centered $X$ is orthogonal to that subspace, $Rel (X) = 0$. Intermediate values indicate partial projection, analogous to regression $R^{2}$ and factor-analytic communality.

Thus, reliability arises naturally from projection geometry. It is not an assumed property of test scores but a theorem describing how an observed variable relates to its conditional expectation.
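The variance decomposition and the squared-cosine form can be verified on a hypothetical eight-outcome space with four persons. The sketch centers both $X$ and $T$ before computing the cosine, because the identity $〈 X, T 〉 = Var (T)$ holds for centered scores; with these particular numbers the reliability is $171/199$, approximately 0.859:

```python
import numpy as np

p = np.full(8, 1 / 8)                                   # hypothetical uniform space
block = np.array([0, 0, 1, 1, 2, 2, 3, 3])              # four persons, two outcomes each
x = np.array([3.0, 5.0, 6.0, 10.0, 9.0, 11.0, 14.0, 12.0])
one = np.ones_like(x)


def inner(a, b):
    return float(np.sum(p * a * b))


def cond_exp(values):
    """E[values | G]: probability-weighted mean within each block."""
    out = np.empty_like(values)
    for b in np.unique(block):
        m = block == b
        out[m] = np.sum(p[m] * values[m]) / np.sum(p[m])
    return out


t = cond_exp(x)
e = x - t
xc = x - inner(x, one)   # centered observed score
tc = t - inner(t, one)   # centered true score (E[T] = E[X])

var_x, var_t, var_e = inner(xc, xc), inner(tc, tc), inner(e, e)
cos_theta = inner(xc, tc) / np.sqrt(var_x * var_t)

print(var_x, var_t + var_e)           # Var(X) = Var(T) + Var(E)
print(cos_theta ** 2, var_t / var_x)  # squared cosine equals the variance ratio
```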

Mathematical Structure of Operator-Theoretic Test Theory

This section develops the mathematical structure of operator-theoretic test theory rather than estimation or inference. Recall that operator-theoretic formulations of test theory build on Hilbert space methods, which themselves rest on measure-theoretic foundations: measure theory formalizes expectation, variance, and covariance, whereas Hilbert space geometry, through operator theory, provides the analytic framework for representing and studying reliability.

Let $(Ω, F, P)$ be a probability space, where $X$, $T$, and $E$ are real-valued random variables, and $f : Ω \to Φ$ maps elements of $Ω$ to individuals in the population $Φ$. For $φ \in Φ$, $X (f^{- 1} (φ))$ denotes all possible measurement outcomes for that individual. For a measurable function $A : Ω \to Λ$, where $Λ$ carries a $σ$-algebra $\mathcal{L}$, the $σ$-algebra generated by $A$ is

σ (A) := \{A^{- 1} (S) : S \in \mathcal{L}\} .

In psychometric terms, this framework formalizes how individuals in a population are linked to their possible observed scores through the probability structure.
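For finite spaces, the generated $σ$-algebra can be enumerated directly. In this sketch the outcome space, the person labels, and the assumption that the label space carries its power set as $σ$-algebra are all hypothetical; the point is that every event in $σ (A)$ is a union of whole person-blocks, so conditioning on $σ (f)$ cannot distinguish outcomes that belong to the same individual:

```python
from itertools import chain, combinations

omega = range(6)
# Hypothetical assignment function A: outcome -> person label.
A = {0: "ann", 1: "ann", 2: "bo", 3: "bo", 4: "cy", 5: "cy"}
labels = sorted(set(A.values()))


def powerset(items):
    """All subsets of a finite collection (the assumed sigma-algebra on the label space)."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


# sigma(A) = {A^{-1}(S) : S a measurable set of labels}
sigma_A = {frozenset(w for w in omega if A[w] in set(S)) for S in powerset(labels)}

for event in sorted(sigma_A, key=lambda s: (len(s), sorted(s))):
    print(sorted(event))
```

With three persons the sketch yields 2³ = 8 events; a singleton such as {0} is not among them, because outcomes 0 and 1 belong to the same person.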

Definition of Operator-Theoretic Test Theory

Definition (True Score)

Let $(Ω, F, P)$ be a probability space, and let $X \in L^{2} (Ω, F, P)$ be the observed score. For a sub- $σ$ -algebra $G \subseteq F$ , the true score of $X$ with respect to $G$ is defined as the conditional expectation

T : = E [X | G]

which coincides with the orthogonal projection

T = P_{G} X

where $P_{G}$ is the projection operator from $L^{2} (Ω, F, P)$ onto $L^{2} (Ω, G, P)$ .

The conditional expectation is an operator that maps $X \in L^{2} (Ω, F, P)$ to a $G$ -measurable random variable. In Hilbert space terms, this operator is the orthogonal projection

P_{G} : L^{2} (Ω, F, P) \to L^{2} (Ω, G, P) .

Thus, $G$ defines the information set, and the conditional expectation isolates the systematic component of the observed score. In operator-theoretic formulations, $G$ generates the true-score subspace $L^{2} (Ω, G, P)$ of $L^{2} (Ω, F, P)$, and the true score $P_{G} X$ is the orthogonal projection of $X$ onto this subspace of systematic variation.
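On a finite space, $P_{G}$ can be written as a matrix, which makes its operator properties easy to inspect. The following sketch (hypothetical probabilities and partition) builds the matrix whose action replaces each coordinate by the probability-weighted mean of its block. It then checks that the matrix is idempotent, that it is self-adjoint with respect to $〈 X, Y 〉 = E (XY)$, and that its rank equals the number of blocks, that is, the dimension of the true-score subspace:

```python
import numpy as np

p = np.array([0.1, 0.15, 0.05, 0.2, 0.3, 0.2])  # hypothetical probabilities
block = np.array([0, 0, 0, 1, 1, 2])             # partition that generates G

same = (block[:, None] == block[None, :]).astype(float)  # same[i, j] = 1 if i, j share a block
block_prob = same @ p                                     # probability of the block containing i
P = same * p[None, :] / block_prob[:, None]               # (P x)_i = weighted block mean of x

W = np.diag(p)  # Gram matrix of <a, b> = E(ab); P is self-adjoint iff W @ P is symmetric

x = np.array([2.0, 4.0, 1.0, 7.0, 5.0, 9.0])
print(P @ x)                           # the true score T = P_G X
print(np.allclose(P @ P, P))           # idempotent: projecting twice changes nothing
print(np.allclose(W @ P, (W @ P).T))   # self-adjoint in L^2(Omega, F, P)
print(np.linalg.matrix_rank(P))        # dimension of L^2(Omega, G, P): one per block
```

Note that `P` is not a symmetric matrix in the ordinary sense when the probabilities are unequal; self-adjointness refers to the probability-weighted inner product.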

In Zimmerman and Zumbo’s framework, a sub-$σ$-algebra $G \subseteq F$ defines the systematic component of observed scores. Intuitively, $G$ collects all measurable events representing the latent attribute of interest (e.g., ability, proficiency, trait level). Within the ecological framework (Zumbo, 2023), however, $G$ also collects all informative, explanatory features or events that support the explanatory modeling of test score variation. Conditioning on $G$ holds constant person-level and, possibly, explanatory ecological-level factors, isolating the predictable portion of measurement. In this way, test theory rigorously defines variance decomposition and reliability in alignment with the intended latent variable and interpretation.

Within Zimmerman and Zumbo’s operator formalization, $G$ is a sub-$σ$-algebra of $F$ that specifies which variance components are treated as systematic. By itself, $G$ is not an operator but a collection of measurable events representing person-, situation-, ecological-, or contextual-level factors relevant to the construct and measured latent variables (Anastasi, 1983; Steyer et al., 1992; Zumbo, 2007, 2009, 2015, 2017, 2023; Zumbo et al., 2015). This ecological perspective emphasizes in vivo psychometric practices over routine, in vitro procedures (Zumbo, 2015). In my view, as described in Zumbo (2015), an in vivo (as opposed to in vitro) perspective treats context not as a nuisance that “distorts the picture” but as something that informs and shapes the attributes; that is, one cannot extract the context. Different test theory frameworks define $G$ in distinct ways.

Definition (Error Score)

The error score is the residual $E = X - T$ , with

E [E | G] = 0 .

From a psychometric perspective, $G$ encodes the relevant sources of systematic variation. It defines the true-score subspace of $L^{2} (Ω, F, P)$ , with reliability later formalized as the Rayleigh quotient of this projection.

Across frameworks, the choice of $G$ encodes theoretical commitments by distinguishing variance attributed to the construct (systematic) from variance assigned to error (unsystematic) or, equivalently, construct-relevant from construct-irrelevant variance.

Reliability as Projection Norm

Under the operator-theoretic view, the reliability of $X$ is defined not by internal consistency (e.g., $α$) or factor loadings (e.g., $ω$) but through the projection of the observed score onto a structure-preserving subspace of the Hilbert space, as the proportion of its variance explained by the projection onto $G$:

Rel (X) = \frac{∥ P_{G} X ∥^{2}}{∥ X ∥^{2}} = \cos^{2} (θ),

where $∥ \cdot ∥$ is the Hilbert space norm, $X$ is taken to be centered so that squared norms are variances, and $θ$ is the angle between $X$ and its projection $T$.

Equivalently, reliability is the proportion of observed variance explained by the projection onto the true-score subspace:

Rel (X) = \frac{∥ T ∥^{2}}{∥ X ∥^{2}}, \quad T = P_{G} X .

In Hilbert space terms, this definition is a Rayleigh quotient.

Reliability as a Squared Cosine

Reliability can be expressed geometrically using inner products, angles, and Pythagoras’s theorem in Hilbert space, where the cosine of the angle between two nonzero vectors is

\cos (θ) = \frac{〈 X, T 〉}{∥ X ∥ ∥ T ∥} .

Because $T$ is the projection of $X$ , we have $〈 X, T 〉 = ∥ T ∥^{2}$ . Thus,

\cos^{2} (θ) = \frac{∥ T ∥^{4}}{∥ X ∥^{2} ∥ T ∥^{2}} = \frac{∥ T ∥^{2}}{∥ X ∥^{2}} = Rel (X) .

Therefore, geometrically, reliability is the squared cosine of the angle between $X$ and $T$ :

Rel (X) = \cos^{2} (θ),

where $θ$ is the angle between the observed score vector $X$ and its projection $T$ .

In plain language, reliability reflects how close the observed score vector lies to the true score subspace. A small angle ( $θ$ ) means most of $X$ is aligned with $T$ , so reliability is high; a large angle means more error, so reliability is low.

Although reliability has been defined in various ways within test theory, and for most purposes it matters little whether a given algebraic expression is treated as a definition or as a theorem, defining it as the ratio of true-score variance to observed-score variance has an important advantage: this formulation applies to every observed score with nonzero variance. In contrast, the squared correlation is undefined when the true-score variance equals zero. To connect these definitions, the reliability coefficient is the proportion of variance in the observed score $X$ explained by its projection onto the true score subspace, which equals $\cos^{2} (θ)$, where, as depicted in Figure 1, $θ$ is the angle between $X$ and its projection $T$ in Hilbert space.
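The practical difference between the two definitions shows up in the degenerate case in which every person has the same true score. In this hypothetical example the variance ratio is simply 0, whereas the sample correlation is 0/0 and NumPy returns `nan`:

```python
import numpy as np

# Hypothetical degenerate case: every person has the same true score, so Var(T) = 0.
t = np.full(6, 10.0)
e = np.array([-1.0, 1.0, -2.0, 2.0, -0.5, 0.5])
x = t + e

rel_ratio = np.var(t) / np.var(x)   # variance-ratio definition: well defined, equals 0
with np.errstate(invalid="ignore", divide="ignore"):
    rho = np.corrcoef(x, t)[0, 1]   # correlation is 0/0 here, so NumPy returns nan

print(rel_ratio, rho ** 2)
```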

Figure 1.

Orthogonal Projection of Observed Score $X$ onto the True Score Space $G$ .

Reliability as Projection

The projection theorem states that any random variable $X \in L^{2}$ can be decomposed uniquely into two orthogonal components: its projection onto a closed subspace and the residual. In the context of test theory, the subspace is the information space $G$, the projection is the true score $T$, and the residual is the error $E$, where

X = T + E, with E [TE] = 0 .

The key insight is that reliability is about the quality of this projection. If most of the length of $X$ is captured in its projection $T$ , then reliability is high. If much of $X$ lies orthogonal to the projection, then reliability is low.

Reliability Follows as a Corollary of Projection Geometry in $L^{2}$

The metric interpretation of reliability follows directly from the geometric fact that conditional expectation is an orthogonal projection in $L^{2}$ , the Hilbert space of square-integrable random variables. This result, often called the projection theorem, provides the functional-analytic foundation for test theory.

Conditional expectation is not only a statistical concept but also admits a functional-analytic interpretation. The projection theorem in Hilbert spaces states that every element $X \in L^{2}$ can be uniquely decomposed into a projection onto a closed subspace and an orthogonal residual. Applied to test theory, this result ensures that the true score is the projection (conditional expectation) and the error score is the orthogonal remainder, so the defining properties of reliability follow as theorems rather than assumptions. Explicitly, for a closed subspace $M \subset L^{2}$ and any $X \in L^{2}$, there exists a unique $Y \in M$ such that

∥ X - Y ∥ = \min_{Z \in M} ∥ X - Z ∥ .

Taking $M = L^{2} (Ω, G, P)$, the subspace of square-integrable, $G$-measurable random variables, this minimizing element $Y$ is exactly the conditional expectation $E [X | G]$. This is the functional-analytic backbone of the claim that reliability follows as a corollary of projection geometry in $L^{2}$: (a) the projection theorem ensures the existence and uniqueness of the projection, and (b) the measure-theoretic probability framework identifies the projection with conditional expectation.
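The minimizing property can be checked in one line from orthogonality. The following worked step is the standard argument, written in the notation above rather than reproduced from Zimmerman and Zumbo (2001). With $Y = E [X | G]$ and any $Z \in L^{2} (Ω, G, P)$, the cross term vanishes because $Y - Z$ is itself $G$-measurable and $X - Y$ is orthogonal to every such variable; choosing the constant $Z = E [X]$ then recovers the CTT variance decomposition:

```latex
\begin{aligned}
\|X - Z\|^{2} &= \|(X - Y) + (Y - Z)\|^{2} \\
              &= \|X - Y\|^{2} + 2\,\langle X - Y,\; Y - Z \rangle + \|Y - Z\|^{2} \\
              &= \|X - Y\|^{2} + \|Y - Z\|^{2} \;\ge\; \|X - Y\|^{2},
\end{aligned}
\qquad Y = E[X \mid G],\; Z \in L^{2}(\Omega, G, P).

% Constants are G-measurable, so take Z = E[X], with Y = T and X - Y = E:
\operatorname{Var}(X) = \|X - E[X]\|^{2} = \|X - T\|^{2} + \|T - E[X]\|^{2}
                     = \operatorname{Var}(E) + \operatorname{Var}(T).
```

Equality in the first display holds only when $Z = Y$ almost surely, which is the uniqueness part of the projection theorem.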

Reliability as a Rayleigh Quotient

Combining the operator-norm interpretation with the Rayleigh-quotient perspective, reliability can be interpreted as the squared operator norm of the projection $T$ restricted to the one-dimensional subspace spanned by the centered observed score $X - E[X]$. Formally,

Rel (X) = \frac{∥ T (X - E [X]) ∥^{2}}{∥ X - E [X] ∥^{2}} = ∥ T ∣_{span (X - E [X])} ∥^{2} .

As Zimmerman and Zumbo (2001) note, $T$ is an idempotent, self-adjoint operator in $L^{2}$, so its operator norm equals 1 (for any nonzero projection); the restricted norm, and hence reliability, therefore lies between 0 and 1. The square root of the reliability coefficient, $σ(TX)/σ(X)$, is the cosine of the angle between $X - E[X]$ and its orthogonal projection $T(X - E[X])$. Reliability therefore quantifies the efficiency of this projection: how closely the observed score lies to the true-score subspace.

Equivalently, reliability can be expressed as a Rayleigh quotient of the operator $T$ with respect to the random variable $X - E [X]$ :

Rel (X) = \frac{〈 T (X - E [X]), X - E [X] 〉}{〈 X - E [X], X - E [X] 〉} .

This form highlights reliability as an intrinsic spectral property of the projection operator. The two expressions agree because $T$ is self-adjoint and idempotent: for any $v \in L^{2}$, $〈 Tv, v 〉 = 〈 T^{2} v, v 〉 = 〈 Tv, Tv 〉 = ∥ Tv ∥^{2}$. In functional analysis, the Rayleigh quotient measures how much of a vector’s energy lies in the direction preserved by a self-adjoint operator. When applied to test theory, it expresses the proportion of observed-score variance that remains under the action of the true-score projection. Reliability equals one if and only if $T(X - E[X]) = X - E[X]$, that is, $X$ is $G$-measurable (perfect projection onto the true-score subspace), and equals zero if and only if $T(X - E[X]) = 0$, that is, $E[X | G] = E[X]$ (the centered score is orthogonal to the true-score subspace).
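A finite-dimensional analogue makes the agreement between the squared-norm form and the Rayleigh-quotient form concrete. In the Python sketch below (an illustration, not part of the original development), centered sample vectors stand in for elements of $L^{2}$ with the empirical inner product (the $1/n$ factor cancels in every ratio), and the hypothetical vectors passed to `orthonormal_basis` span the "true-score" subspace. The Rayleigh quotient, the squared-norm ratio, and the squared cosine of the angle between $v$ and its projection should all coincide and lie in $[0, 1]$.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def orthonormal_basis(vectors):
    """Gram-Schmidt orthonormalization (vectors assumed linearly independent)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def project(v, basis):
    """Orthogonal projection of v onto span(basis)."""
    out = [0.0] * len(v)
    for q in basis:
        c = dot(v, q)
        out = [o + c * qi for o, qi in zip(out, q)]
    return out


# Hypothetical data: observed score x and two vectors spanning the subspace.
x = center([2.0, -1.0, 0.5, 3.0, -2.5, 1.0])
basis = orthonormal_basis([center([1, 0, 1, 2, -1, 0]), center([0, 1, -1, 1, 0, 2])])
px = project(x, basis)

rayleigh = dot(px, x) / dot(x, x)       # <Pv, v> / <v, v>
norm_ratio = dot(px, px) / dot(x, x)    # ||Pv||^2 / ||v||^2
cos_angle = dot(x, px) / math.sqrt(dot(x, x) * dot(px, px))
idempotent = all(abs(a - b) < 1e-12 for a, b in zip(project(px, basis), px))

print(round(rayleigh, 6), round(norm_ratio, 6), round(cos_angle ** 2, 6), idempotent)
```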

In functional analysis, the Rayleigh quotient plays a central role in connecting operator theory, geometry, and optimization. For a self-adjoint (Hermitian) linear operator $T$ on a Hilbert space $H$ , the Rayleigh quotient of a nonzero vector $x \in H$ is defined as

R_{T} (x) = \frac{〈 Tx, x 〉}{〈 x, x 〉} .

This expression measures how the operator $T$ “acts” along the direction of $x$. In quantum mechanics, $R_{T}(x)$ is the expected value of the observable $T$ in the state $x$; in probabilistic contexts where $T$ is an orthogonal projection and $x$ is a centered random variable, it is the ratio of explained to total variance.

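For a bounded self-adjoint operator (in finite dimensions, a symmetric or Hermitian matrix), the Rayleigh quotient satisfies

λ_{\min} \leq R_{T} (x) \leq λ_{\max},

where $λ_{\min}$ and $λ_{\max}$ are the infimum and supremum of the spectrum of $T$ (its smallest and largest eigenvalues in finite dimensions). Equality at either bound holds exactly when $x$ is an eigenvector for the corresponding extreme eigenvalue, that is, when $Tx = λ_{\min} x$ or $Tx = λ_{\max} x$.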
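The bounds and the equality condition can be illustrated with a $2 \times 2$ symmetric matrix, whose eigenvalues have a closed form. In this Python sketch, the matrix `A` is a hypothetical example, and `P` is the orthogonal projection onto the first coordinate axis, whose spectrum $\{0, 1\}$ gives the interval $[0, 1]$ in which reliability must lie.

```python
import math


def rayleigh(A, x):
    """R_A(x) = <Ax, x> / <x, x> for a 2x2 matrix A and nonzero x."""
    ax = (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])
    return (ax[0] * x[0] + ax[1] * x[1]) / (x[0] ** 2 + x[1] ** 2)


def eigenvalues_2x2(A):
    """Closed-form eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    a, b, c = A[0][0], A[0][1], A[1][1]
    mid = (a + c) / 2
    rad = math.hypot((a - c) / 2, b)
    return mid - rad, mid + rad


A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical symmetric matrix
lam_min, lam_max = eigenvalues_2x2(A)

# Rayleigh quotients along 180 directions of the unit circle.
angles = [k * math.pi / 180 for k in range(180)]
quotients = [rayleigh(A, (math.cos(t), math.sin(t))) for t in angles]

# (b, lam_max - a) solves (A - lam_max I) v = 0 when b != 0.
v_max = (A[0][1], lam_max - A[0][0])

P = [[1.0, 0.0], [0.0, 0.0]]  # orthogonal projection onto the first axis
print(round(lam_min, 4), round(lam_max, 4), eigenvalues_2x2(P))
```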

Thus, the Rayleigh quotient defines an optimization problem—the Rayleigh–Ritz variational principle—whose solutions identify the principal directions in which the operator acts as a pure scaling transformation. This variational characterization carries directly into test theory. Let $T$ denote the true-score projection operator acting on an observed-score random variable $X$ in $L^{2}$ , the Hilbert space of square-integrable random variables.

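Then, assuming without loss of generality that $X$ is centered, $E[X] = 0$ (otherwise replace $X$ by $X - E[X]$), the reliability of $X$ can be expressed as the Rayleigh quotient

Rel (X) = \frac{〈 TX, X 〉}{〈 X, X 〉} .

For centered $X$, $〈 TX, X 〉 = Cov(T, X)$ and $〈 X, X 〉 = Var(X)$. Because the error is orthogonal to the true score, $Cov(T, X) = Cov(T, T + E) = Var(T) + Cov(T, E) = Var(T)$, so this expression yields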

Rel (X) = \frac{Var (T)}{Var (X)} = \cos^{2} (θ_{X, T}),

showing that reliability is the squared cosine of the angle between the observed and true scores.

Equivalently, the square root of reliability,

\sqrt{Rel (X)} = \frac{σ (TX)}{σ (X)},

is the ratio of the length of the projection of $X - E[X]$ onto the subspace of true scores to the length of $X - E[X]$ itself. This ratio lies between 0 and 1, consistent with the fact that the operator norm of any orthogonal projection in Hilbert space is at most one (exactly one for a nonzero projection).

From this perspective, a reliable score is “close” to the subspace of true scores—its projection retains nearly the same length as the original vector. If the projection is much shorter, the observed and true-score vectors are nearly orthogonal, and reliability approaches zero.

Thus, reliability can be viewed as a Rayleigh quotient quantifying the proportion of the observed score’s energy (variance) that is retained by its projection onto the true-score subspace.

This interpretation situates reliability within the same mathematical framework that governs principal components, eigenvalue problems, and variational formulations in physics and engineering. Reliability, in this view, is not merely a descriptive coefficient but a theorem of Hilbert space geometry—a scalar functional summarizing the action of the true-score operator on the space of observed scores.

Implications for Applied Psychometrics

In applied psychometrics, reliability coefficients such as Cronbach’s (1951) $α$, Armor’s (1974) $θ$, and McDonald’s (1999) $ω$ serve as empirical estimators or bounds of the theoretical reliability defined above. Each corresponds to a specific choice of how the projection operator $T$ is approximated or constrained in practice. For example, $α$ equals reliability when items are essentially tau-equivalent (equal true-score covariances and uncorrelated errors), effectively approximating the projection within a restricted subspace defined by item homogeneity. Coefficient $ω$, by contrast, corresponds to the projection implied by the factor loadings of a one-factor (congeneric) model. In this sense, these indices estimate the Rayleigh quotient $Rel(X)$ under different structural assumptions about the true-score operator. For centered $X$, the theoretical reliability is defined as

Rel (X) = \frac{〈 TX, X 〉}{〈 X, X 〉},

which represents an ideal, model-free quantity, while the classical coefficients provide realizable, data-dependent approximations that are contingent on test design and measurement model. The distinction underscores that applied reliability indices are not reliability itself but estimators of the Rayleigh quotient within specific psychometric frameworks.

Reliability as a Spectral Property of the True-Score Operator

From the standpoint of functional analysis, the Rayleigh quotient

Rel (X) = \frac{〈 TX, X 〉}{〈 X, X 〉}

defines, for centered $X$, the proportion of the squared length of $X$ preserved by the projection operator $T$ onto the true-score subspace. When $T$ is a self-adjoint, positive, bounded linear operator on a Hilbert space $H = L^{2}$, the Rayleigh quotient lies between the smallest and largest points of the spectrum of $T$ (0 and 1 for an orthogonal projection). Thus, reliability inherits the spectral structure of the true-score operator.

Geometrically, if $X$ aligns perfectly with an eigenfunction of $T$ associated with the largest eigenvalue (equal to 1 for an orthogonal projection), then $TX = X$ and reliability equals one. Conversely, if $X$ lies entirely in the orthogonal complement of the true-score subspace, then $TX = 0$ and reliability is zero. Between these extremes, the reliability coefficient quantifies how much of $X$ ’s variance is captured by the action of $T$ .

This spectral view links reliability to fundamental concepts in operator theory and variational analysis. In the same way that the Rayleigh–Ritz method characterizes eigenvalues as extrema of Rayleigh quotients, reliability represents an extremal property of test scores under projection. The coefficient, therefore, has a dual interpretation: algebraically, as a variance ratio, and geometrically, as the squared cosine between $X$ and $TX$ . Conceptually, it expresses the “efficiency” with which the operator $T$ captures the true component of the observed score.

Connections to Estimation and Measurement Error

The spectral view of reliability clarifies why common estimators, such as coefficients $α$, $θ$, and $ω$, yield only bounds or approximations to the theoretical reliability defined by the Rayleigh quotient. In practice, reliability is not computed directly from the true-score operator $T$, which is unobservable, but is estimated from finite samples and covariance structures that approximate its action.

In this framework, each estimator corresponds to a different projection operator that captures only part of the true-score variance. For example, coefficient $α$ assumes equal covariances among items, effectively constraining $T$ to an equicorrelation subspace. Coefficient $ω$ generalizes this by weighting items through a latent factor model, producing a projection closer to the true operator but still dependent on model specification. These estimators therefore measure the efficiency of surrogate projections relative to $T$ , explaining why they serve as lower or upper bounds rather than exact values.
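The bounding behavior of $α$ can be checked against the population covariance matrix implied by a one-factor model. The Python sketch below assumes a congeneric model with standardized items, uncorrelated errors, and hypothetical loadings; under these assumptions the reliability of the unweighted sum score equals $ω = (\sum λ_{j})^{2} / [(\sum λ_{j})^{2} + \sum ψ_{j}]$, with $ψ_{j} = 1 - λ_{j}^{2}$. With unequal loadings, $α$ falls below $ω$; with equal loadings (essential tau-equivalence), the two coincide.

```python
def implied_covariance(loadings):
    """Sigma = lambda lambda' + diag(psi) for standardized items (psi = 1 - lambda^2)."""
    k = len(loadings)
    return [
        [loadings[i] * loadings[j] + ((1.0 - loadings[i] ** 2) if i == j else 0.0)
         for j in range(k)]
        for i in range(k)
    ]


def coefficient_alpha(cov):
    """Cronbach's alpha computed from an item covariance matrix."""
    k = len(cov)
    total_var = sum(sum(row) for row in cov)       # Var of the sum score, 1' Sigma 1
    item_var = sum(cov[i][i] for i in range(k))    # trace(Sigma)
    return k / (k - 1) * (1.0 - item_var / total_var)


def coefficient_omega(loadings):
    """Model-implied reliability of the unweighted sum under the one-factor model."""
    true_var = sum(loadings) ** 2
    error_var = sum(1.0 - lam ** 2 for lam in loadings)
    return true_var / (true_var + error_var)


congeneric = [0.9, 0.7, 0.5, 0.3]   # hypothetical unequal loadings
tau_equivalent = [0.6] * 4          # hypothetical equal loadings

alpha_cong = coefficient_alpha(implied_covariance(congeneric))
omega_cong = coefficient_omega(congeneric)
alpha_tau = coefficient_alpha(implied_covariance(tau_equivalent))
omega_tau = coefficient_omega(tau_equivalent)

print(round(alpha_cong, 4), round(omega_cong, 4))  # alpha below omega
print(round(alpha_tau, 4), round(omega_tau, 4))    # alpha equals omega
```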

From the perspective of operator theory, estimation error arises because the empirical covariance operator $\hat{Σ}$ approximates, but does not equal, the population covariance operator $Σ$ . Small perturbations in $\hat{Σ}$ alter the corresponding Rayleigh quotient, yielding variability in the estimated reliability. This observation parallels results in perturbation theory for self-adjoint operators, where eigenvalues (and thus Rayleigh quotients) depend continuously on the operator under mild conditions.
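Sampling variability in an estimated coefficient can be seen by simulation. The Python sketch below, which is illustrative only, draws repeated samples of $n = 200$ respondents from a hypothetical one-factor model with standardized items and uncorrelated errors, computes sample $α$ from each sample, and compares the spread of the estimates with the population value implied by the model. The seed, sample size, number of replications, and loadings are arbitrary assumptions.

```python
import random
import statistics


def simulate_scores(loadings, n, rng):
    """n response vectors from X_j = lambda_j * F + U_j with Var(X_j) = 1."""
    rows = []
    for _ in range(n):
        f = rng.gauss(0.0, 1.0)
        rows.append([lam * f + rng.gauss(0.0, (1.0 - lam ** 2) ** 0.5) for lam in loadings])
    return rows


def sample_alpha(rows):
    """Coefficient alpha from sample item variances and sum-score variance."""
    k = len(rows[0])
    item_vars = [statistics.variance([r[j] for r in rows]) for j in range(k)]
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


LOADINGS = [0.9, 0.7, 0.5, 0.3]  # hypothetical loadings
k = len(LOADINGS)
population_total = sum(LOADINGS) ** 2 + sum(1.0 - lam ** 2 for lam in LOADINGS)
population_alpha = k / (k - 1) * (1.0 - k / population_total)  # trace(Sigma) = k

rng = random.Random(2024)
estimates = [sample_alpha(simulate_scores(LOADINGS, 200, rng)) for _ in range(200)]

print(round(population_alpha, 4),
      round(statistics.mean(estimates), 4),
      round(statistics.stdev(estimates), 4))
```

The spread of `estimates` reflects how perturbations of the sample covariance matrix propagate to the coefficient; it shrinks as the sample size grows.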

In sum, reliability coefficients computed from data quantify the empirical projection efficiency of the measurement process. Their accuracy depends on how closely the assumed model approximates the true-score operator that governs the underlying Hilbert-space geometry of the measurement system.

Geometric and Statistical Unification

As we see in Zimmerman and Zumbo (2001) and Zumbo (2023), the projection interpretation reveals that reliability, regression $R^{2}$ , and factor communality are not separate concepts but distinct expressions of a single geometric principle: each represents the squared cosine between a variable and its orthogonal projection in Hilbert space. In regression, $R^{2}$ is the squared cosine between the centered dependent variable and its projection onto the subspace spanned by the predictors. In factor analysis, communality measures the squared cosine between an observed variable and its projection onto the latent-factor subspace. In CTT, reliability is the squared cosine between an observed score and its projection onto the subspace of true scores.

Formally, all three quantities share the structure

Eff (X, S) = \frac{∥ P_{S} X ∥^{2}}{∥ X ∥^{2}},

where $P_{S}$ denotes the orthogonal projection operator onto a closed subspace $S \subset L^{2}$ and $X$ is centered. The choice of subspace (predictor space, factor space, or true-score space) determines the interpretation. In each case, the projection $P_{S} X$ is the element of $S$ closest to $X$ in mean square, so the residual $X - P_{S} X$ is orthogonal to the space of predictors, factors, or true scores.

This unification highlights the deep mathematical continuity across measurement models: reliability, $R^{2}$, and communality are all Rayleigh quotients of orthogonal projections onto different subspaces. Consequently, the geometric framework not only generalizes CTT but also situates psychometric reliability within the broader domain of statistical estimation and functional analysis.

Summary, Axioms, and Theorems in Operator-Theoretic Test Theory

Thus, reliability can be understood simultaneously as (a) a geometric measure of alignment between $X$ and $T$, (b) a probabilistic expectation, and (c) a projection-based functional defined within Hilbert space, describing the systematic component of observed scores relative to a σ-algebra of measurable events that represents the latent attribute of interest (e.g., ability, proficiency, trait level).

As Zumbo (in press) shows, this formalization yields four insights.

First, reliability quantifies the proportion of observed variance attributable to systematic variance, situating it as a measure of alignment between $X$ and its projection $T$ .

Second, the geometric interpretation clarifies that reliability equals one when the observed score lies entirely in the true-score subspace (no error) and approaches zero as it becomes orthogonal to that subspace (all error), paralleling signal-to-noise ratios in PCA and signal processing (Zimmerman & Zumbo, 2001).

Third, the conditional expectation $E [X | G]$ is the best $L^{2}$ -approximation of $X$ , linking psychometric definitions of systematic variance to the probability-theoretic structure of sub- $σ$ -algebras.

Fourth, the classical theory of reliability, rooted in CTT, traditionally depends on assumptions such as parallel forms, tau-equivalence, and equal weighting of items. These assumptions, while convenient for analytic tractability, impose strict limitations on the generalizability and interpretability of reliability coefficients such as Cronbach’s $α$, McDonald’s $ω$, or Armor’s $θ$.

By contrast, the Hilbert space and operator-theoretic formulation of reliability, grounded in the tools of projection geometry and conditional expectation, transcends these limitations. In its Rayleigh-quotient form, reliability is defined geometrically as the squared cosine of the angle between the centered $X$ and $T$. Equivalently, it is the proportion of total observed variance attributable to the systematic, or true-score, component. This development leads to a foundational shift in how reliability is conceived.

Terminological Clarification

At several points, we clarified that coefficients such as $α$ , $θ$ , or $ω$ are not inherently reliability measures—they become so only within a model that defines true and error variance. Conditional reliability was positioned as an extension of this point: the same coefficient can mean different things depending on whether reliability is considered globally or locally (conditional on predictors, factors, or grouping variables).

Having established reliability as a projection coefficient, we now generalize the same geometric structure to broader statistical models that also embody projection operators.

Extensions to Regression, Factor Models, and Time Series

The geometric structure of reliability extends naturally to regression, factor-analytic models, and time series. In each case, prediction or estimation can be interpreted as an orthogonal projection in the Hilbert space $L^{2}$. This section demonstrates that the reliability of a score, the coefficient of determination in regression, factor communality, and time-series predictability share a common mathematical structure: each involves projecting an observed variable onto a subspace defined by predictors, factors, or past values. Reliability is the squared cosine of the angle between the centered variable and this projection.

Regression: Reliability as $R^{2}$

As noted by Zumbo (2007) and Zimmerman and Zumbo (2001), the geometric approach also shows why reliability is analogous to the familiar $R^{2}$ in regression. Reliability is the proportion of variance in $X$ explained by its projection onto the true score space, just as $R^{2}$ is the proportion of variance in a dependent variable explained by regressors.

Consider a regression model in which we predict an observed score $X$ from a set of predictors $Z$ . The fitted values are

\hat{X} = E [X ∣ Z] .

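This is exactly the conditional expectation, and thus the orthogonal projection of $X$ onto $L^{2}(σ(Z))$, the subspace of square-integrable functions of $Z$. When $E[X ∣ Z]$ is linear in $Z$ (e.g., when $X$ and $Z$ are jointly normal), this coincides with the projection onto the linear span of $Z$ and a constant, as computed by ordinary least squares. The residual is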

E = X - \hat{X}, with E [\hat{X} E] = 0 .

The regression coefficient of determination is

R^{2} = \frac{Var (\hat{X})}{Var (X)} .

This has the same form as reliability:

Rel (X) = \frac{Var (T)}{Var (X)} .

Thus, as Zumbo (2007) states, reliability is simply the $R^{2}$ from projecting $X$ onto the information set $G$. In regression, $G$ is the σ-algebra generated by the observed predictors; in test theory, $G$ corresponds to the latent variable. To see this, let $Y$ denote a criterion variable and let $\hat{Y} = E(Y ∣ G)$ be the best mean-square predictor based on the information encoded by the σ-algebra $G$. Because $\hat{Y}$ is the orthogonal projection of $Y$ onto the subspace of $G$-measurable functions, $Y = \hat{Y} + ε$ with $E(\hat{Y} ε) = 0$ and $E(ε) = 0$. The variance decomposition follows directly: $Var(Y) = Var(\hat{Y}) + Var(ε)$. The proportion of explained variance is therefore

R^{2} = \frac{Var (\hat{Y})}{Var (Y)} .

Comparing this with the definition of reliability shows that

R^{2} = Rel (Y),

when $Y$ is treated as the observed score and its conditional expectation $\hat{Y}$ as the true score relative to the predictive subspace. Thus, $R^{2}$ is a special case of reliability, defined through projection onto the subspace of square-integrable functions of the predictors.
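The equivalence of the projection and residual forms of $R^{2}$ can be checked on a small hypothetical dataset. The Python sketch below fits ordinary least squares with one predictor, which projects $Y$ onto the span of a constant and $Z$; this is the linear projection and agrees with $E[Y ∣ Z]$ only when the regression function is linear, an assumption of the illustration.

```python
import statistics

# Hypothetical data: predictor z and criterion y.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 3.7, 5.2, 5.1, 6.8, 7.0, 8.4]

mz, my = statistics.fmean(z), statistics.fmean(y)
szz = sum((a - mz) ** 2 for a in z)
syy = sum((b - my) ** 2 for b in y)
szy = sum((a - mz) * (b - my) for a, b in zip(z, y))

slope = szy / szz
intercept = my - slope * mz
y_hat = [intercept + slope * a for a in z]      # projection onto span{1, z}
resid = [b - h for b, h in zip(y, y_hat)]       # orthogonal residual

r2_projection = statistics.pvariance(y_hat) / statistics.pvariance(y)  # Var(Y_hat)/Var(Y)
r2_residual = 1.0 - sum(e ** 2 for e in resid) / syy                   # 1 - SSE/SST
r2_correlation = szy ** 2 / (szz * syy)                                # squared correlation
orthogonality = sum((h - my) * e for h, e in zip(y_hat, resid))        # <Y_hat - mean, residual>

print(round(r2_projection, 6), round(r2_residual, 6), round(r2_correlation, 6))
```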

Factor-Analytic Communality as Projection

The same principle applies to factor analysis. Let $X$ represent an observed variable, and let

\hat{X} = E (X ∣ F)

denote its projection onto the subspace spanned by the common factors $F$ . Then

X = \hat{X} + U, E (\hat{X} U) = 0 .

The communality of $X$ is given by

h^{2} = \frac{Var (\hat{X})}{Var (X)} = Rel (X),

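showing that communality is also a measure of projection efficiency, with $Rel(X)$ here taken relative to the factor subspace. Because the unique component $U$ may contain reliable specific variance as well as random error, communality need not equal reliability relative to the full true-score subspace. When factors perfectly reproduce $X$, the projection is exact and $h^{2} = 1$. When factors are uninformative, $h^{2} = 0$.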

Across regression, factor analysis, and test theory, reliability shares a unified geometric interpretation: it quantifies how well an observed variable aligns with its projection onto a subspace representing systematic variation. This projection-based view links psychometric reliability to a broad family of statistical concepts grounded in Hilbert space geometry.

Time Series: Reliability as Predictability

A third connection arises in time-series analysis. Consider an autoregressive model of order 1 (AR(1)):

X_{t} = ϕ X_{t - 1} + ϵ_{t}, \quad ϵ_{t} \sim \text{i.i.d.} (0, σ^{2}) .

Because $ϵ_{t}$ has mean zero and is independent of $X_{t - 1}$, the one-step predictor of $X_{t}$ from the past is

T_{t} = E [X_{t} ∣ X_{t - 1}] = ϕ X_{t - 1} .

The reliability of $X_{t}$ relative to its predictor is

Rel (X_{t}) = \frac{Var (T_{t})}{Var (X_{t})} = \frac{ϕ^{2} Var (X_{t - 1})}{Var (X_{t})} .

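In the stationary AR(1) case ($|ϕ| < 1$), the variance does not depend on $t$:

Var (X_{t}) = Var (X_{t - 1}) = \frac{σ^{2}}{1 - ϕ^{2}} .

Because the variances in the numerator and denominator are equal, reliability simplifies to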

Rel (X_{t}) = ϕ^{2} .

Thus, the absolute value of the autoregressive coefficient, $|ϕ|$, is exactly the square root of reliability. Strong persistence ($|ϕ|$ near 1) corresponds to high reliability; weak persistence ($|ϕ|$ near 0) corresponds to low reliability.
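A short simulation illustrates the result. In the Python sketch below (illustrative; the seed, series length, burn-in, Gaussian innovations, and the values $ϕ = \pm 0.7$ are arbitrary assumptions), the squared lag-1 sample correlation estimates $Var(T_{t}) / Var(X_{t})$. It should be close to $ϕ^{2} = 0.49$ for both signs of $ϕ$, which is why the square root of reliability is $|ϕ|$ rather than $ϕ$.

```python
import random


def simulate_ar1(phi, sigma, n, rng, burn_in=500):
    """Simulate X_t = phi * X_{t-1} + eps_t with Gaussian eps_t; drop a burn-in."""
    x, out = 0.0, []
    for t in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sigma)
        if t >= burn_in:
            out.append(x)
    return out


def lag1_reliability(series):
    """Squared lag-1 correlation, an estimate of Var(T_t) / Var(X_t)."""
    past, present = series[:-1], series[1:]
    mp, mc = sum(past) / len(past), sum(present) / len(present)
    cov = sum((a - mp) * (b - mc) for a, b in zip(past, present))
    vp = sum((a - mp) ** 2 for a in past)
    vc = sum((b - mc) ** 2 for b in present)
    return cov ** 2 / (vp * vc)


rng = random.Random(7)
rel_pos = lag1_reliability(simulate_ar1(0.7, 1.0, 20000, rng))
rel_neg = lag1_reliability(simulate_ar1(-0.7, 1.0, 20000, rng))
print(round(rel_pos, 3), round(rel_neg, 3), 0.7 ** 2)
```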

Practical Implications

Recasting CTT in operator-theoretic terms has direct consequences for applied test development and interpretation. By treating the true score as a projection rather than an unobservable latent variable, practitioners can frame reliability and error not as assumptions but as intrinsic mathematical consequences of the model. This shift clarifies long-standing misconceptions about the nature of measurement error. It emphasizes that reliability is a geometric property of score alignment rather than a fixed attribute of a test. In practice, this perspective provides a more rigorous foundation for evaluating test quality, comparing measurement models, and integrating CTT with modern psychometric frameworks such as item response theory. It also enhances interpretability by showing how test scores can be decomposed into systematic and error components through projections in Hilbert space, offering both conceptual clarity and methodological precision in real-world testing scenarios.

Integrative Implications

Taken together, the operator-theoretic reformulation of CTT unifies its conceptual foundations with practical applications, offering a coherent framework that bridges theory and practice.

Numerical Illustrations

This section provides numerical examples to illustrate the projection-based definition of reliability. Each example demonstrates how $Rel (X)$ can be computed directly from the geometry of the Hilbert space representation.

Example: Reliability in a Two-Variable System

Consider a standardized observed score $X$, with $Var(X) = 1$, whose correlation with its true-score projection $T$ is $r_{XT} = 0.80$. From the geometric definition, reliability equals the squared cosine of the angle between $X$ and $T$:

Rel (X) = r_{XT}^{2} = (0.80)^{2} = 0.64 .

This value indicates that 64% of the variance in $X$ is attributable to its true-score component $T$, and 36% represents error variance. Because $Cov(X, T) = Var(T)$, the correlation is $r_{XT} = σ(T)/σ(X)$, so $Var(T) = r_{XT}^{2} Var(X) = 0.64$. The same value is therefore obtained from the variance ratio:

Rel (X) = \frac{Var (T)}{Var (X)} = \frac{0.64}{1.00} = 0.64 .

The geometric representation clarifies that reliability depends solely on the alignment between $X$ and its projection $T$ . When the correlation between them increases, the projection angle decreases, and $Rel (X)$ approaches 1.
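The two routes to the value 0.64 can be compared in a simulation. The Python sketch below assumes, for illustration, a normal true score with $Var(T) = 0.64$ and an independent normal error with $Var(E) = 0.36$, so that $Var(X) = 1$; the seed and sample size are arbitrary. Both the squared sample correlation $r_{XT}^{2}$ and the sample variance ratio should be close to 0.64 (they differ slightly in any finite sample because the sample covariance of $T$ and $E$ is not exactly zero).

```python
import random

rng = random.Random(11)
n = 50_000
T = [rng.gauss(0.0, 0.8) for _ in range(n)]   # true scores, Var(T) = 0.64
X = [t + rng.gauss(0.0, 0.6) for t in T]      # observed = true + error, Var(E) = 0.36

mt, mx = sum(T) / n, sum(X) / n
var_t = sum((t - mt) ** 2 for t in T) / n
var_x = sum((x - mx) ** 2 for x in X) / n
cov_tx = sum((t - mt) * (x - mx) for t, x in zip(T, X)) / n

rel_from_correlation = cov_tx ** 2 / (var_t * var_x)   # r_XT^2
rel_from_variances = var_t / var_x                     # Var(T) / Var(X)
print(round(rel_from_correlation, 3), round(rel_from_variances, 3))
```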

Operator-Theoretic Computation

Suppose the true score is generated by the conditional expectation $T = E(X ∣ Z)$, where $Z$ represents a latent variable measured by an auxiliary variable. Let $X = 0.8 Z + ε$, with $E(ε ∣ Z) = 0$ (for example, $ε$ independent of $Z$ with mean zero) and $Var(Z) = Var(ε) = 1$, so that $T = 0.8 Z$. Then

Var (X) = 0.8^{2} (1) + 1 = 1.64, \quad Var (T) = 0.8^{2} (1) = 0.64 .

Therefore,

Rel (X) = \frac{Var (T)}{Var (X)} = \frac{0.64}{1.64} \approx 0.39 .

This result follows directly from the orthogonal projection theorem: $T$ is the best mean-square predictor of $X$ given $Z$ . Reliability quantifies how much of the total variance in $X$ lies within the subspace spanned by $Z$ .
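The arithmetic above can be reproduced directly. The Python sketch below computes the exact value $0.8^{2} / (0.8^{2} + 1)$ and, assuming for illustration that $Z$ and $ε$ are independent standard normal variables, estimates $E(X ∣ Z)$ by a least-squares regression of $X$ on $Z$ (adequate here because the conditional mean is linear) before comparing the estimated variance ratio with the exact value. The seed and sample size are arbitrary.

```python
import random

LOADING = 0.8
exact = LOADING ** 2 / (LOADING ** 2 + 1.0)   # Var(T) / Var(X) = 0.64 / 1.64

rng = random.Random(3)
n = 50_000
Z = [rng.gauss(0.0, 1.0) for _ in range(n)]
X = [LOADING * z + rng.gauss(0.0, 1.0) for z in Z]

mz, mx = sum(Z) / n, sum(X) / n
szz = sum((z - mz) ** 2 for z in Z)
szx = sum((z - mz) * (x - mx) for z, x in zip(Z, X))
slope = szx / szz                               # estimates the 0.8 in E(X | Z) = 0.8 Z
T_hat = [mx + slope * (z - mz) for z in Z]      # fitted conditional mean

var_t_hat = sum((t - mx) ** 2 for t in T_hat) / n
var_x = sum((x - mx) ** 2 for x in X) / n
estimated = var_t_hat / var_x
print(round(exact, 4), round(estimated, 4))
```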

These examples show that reliability can be derived geometrically, statistically, or computationally from the same principle of orthogonal projection. The projection-based approach provides a unified interpretation of reliability across CTT, regression, and factor models. A worked example is presented in Appendix A.

Implications for Estimation and Measurement

The projection-based definition of reliability reframes several aspects of psychometric estimation and interpretation. Because $Rel (X)$ is defined geometrically, it depends only on the relationship between the observed variable $X$ and its projection $T = E (X ∣ G)$ . This formulation separates the mathematical structure of reliability from any specific measurement model or estimation procedure.

Reliability Estimation and Sampling

Traditional reliability coefficients, such as Cronbach’s (1951) coefficient alpha, congeneric reliability, or hierarchical omega (McDonald, 1999), can be interpreted as sample-based estimates of $Rel (X)$ . Each estimates the ratio of true-score to total-score variance under particular model assumptions. The operator-theoretic framework clarifies that these indices approximate the same underlying projection relationship:

Rel (X) = \frac{Var [E (X ∣ G)]}{Var (X)} .

This equation emphasizes that reliability estimation involves approximating the conditional expectation operator using observed data. Sampling variability affects the empirical estimate of $E (X ∣ G)$ , but the theoretical quantity remains invariant as a property of the population distribution.

Conceptual and Interpretive Implications

Viewing reliability as a projection theorem unifies several concepts that are often treated separately. Reliability, regression $R^{2}$ , and factor communality all represent the same projection efficiency within different subspaces of $L^{2}$ . This interpretation emphasizes that reliability is not merely a property of test items or forms but of the mapping between observed and latent variables.

The geometric formulation also clarifies the meaning of “measurement error.” Error is the orthogonal complement of the projection that defines the true score, and its variance quantifies the portion of $X$ unaligned with the latent variable. This view generalizes across model families—whether the projection subspace is defined by latent factors, predictors, or measurement occasions.

Finally, the projection perspective integrates classical reliability with modern statistical frameworks. In Bayesian estimation, for example, the posterior mean $E(X ∣ \text{data})$ is itself a projection in $L^{2}$: it is the orthogonal projection of $X$ onto the square-integrable functions of the data, and its mean-square optimality parallels the definition of $Rel(X)$. Thus, reliability can be understood as the squared correlation between a variable and its posterior mean, linking measurement theory with predictive modeling.
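A conjugate normal model gives a closed form for this correspondence. In the Python sketch below, the prior variance `TAU2`, error variance `SIGMA2`, seed, and sample size are hypothetical assumptions: a latent quantity `theta` has a normal prior with variance $τ^{2}$, the datum is `theta` plus independent normal error with variance $σ^{2}$, and the posterior mean shrinks the datum by the factor $τ^{2} / (τ^{2} + σ^{2})$. Both the variance ratio of the posterior mean to `theta` and their squared correlation should be close to that factor.

```python
import random

TAU2, SIGMA2 = 1.0, 0.5625         # hypothetical prior and error variances
shrink = TAU2 / (TAU2 + SIGMA2)    # posterior-mean weight, here 0.64

rng = random.Random(5)
n = 50_000
theta = [rng.gauss(0.0, TAU2 ** 0.5) for _ in range(n)]
data = [t + rng.gauss(0.0, SIGMA2 ** 0.5) for t in theta]
post_mean = [shrink * d for d in data]   # E(theta | data) in the normal-normal model


def var(v):
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / len(v)


def corr2(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    c = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return c * c / (var(a) * var(b))


variance_ratio = var(post_mean) / var(theta)   # Var[E(theta | data)] / Var(theta)
squared_corr = corr2(theta, post_mean)         # squared correlation with the posterior mean
print(round(shrink, 4), round(variance_ratio, 3), round(squared_corr, 3))
```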

Overall, the operator-theoretic view strengthens the conceptual foundation of reliability by embedding it in a general geometric theory of estimation. This perspective allows psychometricians to interpret reliability not as an assumption or coefficient but as a theorem describing the structure of expectation, variance, and projection in measurement.

Conclusion and Future Directions

This article has reframed reliability as a theorem rather than an assumption of CTT. By situating reliability within the geometry of Hilbert space, the analysis demonstrates that the true score is the orthogonal projection of the observed score onto the subspace defined by the latent variable. Reliability, expressed as $Rel (X) = Var [E (X ∣ G)] / Var (X)$ , quantifies the efficiency of this projection and unifies several statistical concepts—regression $R^{2}$ , factor communality, and predictive accuracy—under a single operator-theoretic framework.

Conceptually, this reformulation clarifies that reliability is not an empirical artifact of a specific test or model but a mathematical property of expectation and variance in $L^{2}$ . This geometric perspective highlights that the decomposition $X = T + E$ is not an assumption about data but a theorem about orthogonal projections. Consequently, measurement error corresponds to the orthogonal complement of the latent subspace, and reliability quantifies the proportion of variance aligned with that subspace.

Methodologically, this approach provides a foundation for unifying psychometric estimation with modern predictive modeling. The same projection principles that define $Rel (X)$ govern regression prediction, factor analysis, and Bayesian estimation. This shared structure suggests that reliability can be extended naturally to nonlinear, multilevel, and Bayesian models through their respective conditional expectation operators.

Future research can explore these extensions systematically. Potential directions include defining reliability for generalized linear and nonparametric models, developing operator-based reliability estimators, and examining how projection geometry interacts with measurement invariance and latent-variable identification.

By interpreting reliability as a corollary of the projection theorem, this article provides a rigorous and unifying mathematical basis for measurement theory. It connects classical psychometric constructs with modern statistical theory, reinforcing the view that reliability reflects not merely consistency in measurement but the geometry of expectation itself.

New Avenues for Psychometric Research

On Quantifiers and the Estimand When Selecting Reliability Coefficients

As Zumbo and Kroc (2019) emphasized, a central issue in test theory is whether an algebraic expression is taken as a definition or as a theorem. An advantage of defining reliability as the ratio of true-score variance to observed-score variance is that this definition applies to all observed scores with nonzero variance. In contrast, the squared correlation is undefined if the true-score variance is zero.

More generally, Zimmerman and Zumbo (2001) introduced an operator-theoretic formulation of CTT. In their approach, the measurement process is expressed as a collection of linear operators acting on a Hilbert space of true-score vectors. Within this framework, the true score and error score correspond naturally to projection operators on the Hilbert space. Once this identification is made, geometric concepts such as distance, length, angle, and orthogonality have direct implications for test theory. Zimmerman and Zumbo further showed that reliability itself can be understood as a mathematical object defined through projection, thereby situating it as an inherent feature of the operator-theoretic structure.

It is this mathematical object—the conventional CTT reliability—that Zumbo and his colleagues refer to as theoretical reliability. The qualifier theoretical is appropriate because this psychometric construct arises from the abstract structure of the Hilbert space and is not formally estimated in routine psychometric practice. Instead, commonly used reliability coefficients (e.g., Cronbach’s α) are best understood as bounding, from below, the theoretical reliability (see Zumbo, 1999).

As Zumbo and Kroc (2019) note, although the psychometric literature frequently uses the phrase “to estimate the reliability,” the term “estimate” is somewhat misleading. From a strictly formal perspective, coefficients such as Cronbach’s α may be regarded as biased estimators of theoretical reliability. However, this is not the sense in which practitioners typically use the term. It is therefore clearer to say that one may measure or quantify reliability through statistical procedures and measurement designs that yield bounds on the theoretical value. These procedures include repeated, structured data-collection strategies combined with measurement models such as parallel forms, tau-equivalence, or essential tau-equivalence. Approaches such as the empirical copula method (Bonanomi et al., 2015; Zumbo, in press) further illustrate how empirical strategies can be used to bound reliability.

Because theoretical reliability is formally defined as a ratio of variance components at the population level, a given numerical value of reliability may correspond to multiple possible combinations of true-score variance and error-score variance. To address this, psychometricians typically impose bounds on the error component of the variance ratio. In doing so, they define a quantifier of reliability through (a) the choice of how error is bounded (e.g., internal consistency, interrater variation, or test–retest variation), (b) the design of the measurement experiment, and (c) the choice of estimator. Different estimators naturally yield different properties of the resulting sample values, and which properties are most desirable depends on the analytic objectives.

Finally, much of the confusion surrounding reliability arises from the failure to distinguish among estimators, estimates, and estimands. As emphasized in statistics, an estimator is a formula or function applied to data; an estimate is the numerical value obtained from applying the estimator to a sample; and the estimand is the underlying quantity the estimator is designed to capture. Clarifying these distinctions is crucial to resolving persistent ambiguities in psychometric discussions of reliability.

For example, reliability emerges as a Rayleigh quotient of a projection operator, the quantity that Zumbo (1999) and colleagues (e.g., Gadermann et al., 2012; Liu et al., 2009; Zumbo et al., 2007) term theoretical reliability: the estimand. Unlike other formulations, in which reliability remains conceptually vague, the Rayleigh quotient defines it as a precise mathematical object grounded in Hilbert space.

Bounded Interpretations and the Unification of Diverse Reliability Coefficients

Zimmerman and Zumbo’s operator-theoretic perspective, viewing test theory through the lens of Hilbert spaces and operator theory, helped to open the line of research that the present work extends. Their recognition that reliability possesses a fundamental Rayleigh quotient structure was a key mathematical insight, enabling current developments around bounded interpretations and the unification of diverse reliability coefficients. Notably, the operator-theoretic foundation laid in Zimmerman and Zumbo’s developments provides the rigorous mathematical infrastructure that underlies recent advances on the quantification–estimation distinction, the bounded nature of reliability measures, and the reinterpretation of coefficients such as Armor’s (1974) coefficient $θ$ . More broadly, the fact that this framework naturally explains phenomena that previously appeared ad hoc—such as why reliability coefficients behave as bounds or how seemingly disparate coefficients are related—underscores the power of approaching psychometric concepts through the right mathematical abstractions.

Conceptual Foundations

Zumbo and Kroc’s (2019) and Zumbo’s (in press) description of the estimand/estimator/estimate framework for reliability emphasizes the need to separate what we want to know (theoretical concept or psychometric construct) from how we calculate it (procedure) and what we get (numerical value). Ignoring these distinctions risks reducing reliability theory to numeric manipulation without grounding in meaning. The central point is that conceptual clarity in reliability theory requires respecting the hierarchy estimand → estimator → estimate. For example, Zumbo (in press) builds on the Rayleigh-quotient reconceptualization of theoretical reliability within geometric test theory to introduce polychoric and copula estimators of composite reliability for ordered categorical scores. That work establishes Armor’s (1974) coefficient theta (θ) as a distinct psychometric index, independent of Cronbach’s (1951) coefficient alpha (α), and demonstrates its computation and interpretation with empirical data, positioning θ as a robust alternative for estimating composite reliability.

Quantification Versus Statistical Estimation

Quantification is about defining what reliability means as a psychometric construct and how it connects to the measurement operation. Statistical estimation is about using formulas to approximate this construct from data, subject to sampling variability. Confusion arises when applied researchers leap straight to estimation (e.g., “Cronbach’s α = .85”) without clarifying quantification, that is, which psychometric construct the number actually represents. The lesson is that quantification must precede estimation. Without it, reliability coefficients are interpreted as ends in themselves rather than as bounded representations of theoretical reliability as defined in operator-theoretic test theory.
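A minimal Python sketch can make the estimand/estimator/estimate hierarchy concrete. It assumes a hypothetical one-factor population with uncorrelated errors and factor variance 1 (the loadings and error variances are illustrative, not taken from this article), treats the reliability of the unit-weighted sum as the estimand, uses sample coefficient α as one possible estimator, and treats each value the estimator returns on a simulated sample as an estimate. The estimand stays fixed while the estimates vary from sample to sample.

```python
import numpy as np

# Hypothetical one-factor (congeneric) population, used only for illustration.
LOADINGS = np.array([0.8, 0.7, 0.6, 0.5])
ERROR_VARS = np.array([0.36, 0.51, 0.64, 0.75])

def estimand_reliability(loadings, error_vars):
    """Estimand: reliability of the unit-weighted sum, a function of population parameters only."""
    true_var = loadings.sum() ** 2
    return true_var / (true_var + error_vars.sum())

def alpha_estimator(data):
    """Estimator: a rule mapping an n x k data matrix to a number (sample coefficient alpha)."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def simulate(n, rng):
    """Draw n respondents from the hypothetical population."""
    factor = rng.standard_normal(n)
    errors = rng.standard_normal((n, LOADINGS.size)) * np.sqrt(ERROR_VARS)
    return factor[:, None] * LOADINGS + errors

rng = np.random.default_rng(2024)
rho = estimand_reliability(LOADINGS, ERROR_VARS)
# Estimates: the same estimator applied to five independent samples of size 200.
estimates = [alpha_estimator(simulate(200, rng)) for _ in range(5)]
print(f"estimand = {rho:.3f}")
print("estimates:", np.round(estimates, 3))
```

Because the simulated items have unequal loadings, the estimates should scatter around population α, which sits slightly below the estimand; that gap is a matter of quantification, not of sampling.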

Reliability as a Bounded Object

Reliability coefficients (α, ω, θ, GLB, etc.) are best understood as bounds on a more fundamental psychometric construct, theoretical reliability. For example, coefficient α is a lower bound to reliability when errors are uncorrelated and equals reliability under essential tau-equivalence, whereas coefficient θ is not a “weighted α” but a distinct estimator of reliability as a Rayleigh quotient (Zumbo, in press). Other coefficients provide different bounding relationships depending on their assumptions. The insight provided by operator-theoretic test theory is that, instead of asking which coefficient is best, we should ask which bound is most meaningful for the theoretical psychometric construct and the measurement context.
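The bounding relationship for α can be checked at the population level, with no sampling involved. The sketch below assumes a one-factor model with uncorrelated errors and factor variance 1 (all numbers are hypothetical); it computes α from the model-implied covariance matrix and compares it with the reliability of the unit-weighted sum. With unequal loadings α falls strictly below reliability; with equal loadings (essential tau-equivalence, error variances free to differ) the two coincide.

```python
import numpy as np

def one_factor_cov(loadings, error_vars):
    """Model-implied covariance: Sigma = lambda lambda' + diag(psi)."""
    return np.outer(loadings, loadings) + np.diag(error_vars)

def population_alpha(cov):
    """Coefficient alpha computed from a population covariance matrix."""
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

def population_reliability(loadings, error_vars):
    """Reliability of the unit-weighted sum under the one-factor model."""
    true_var = loadings.sum() ** 2
    return true_var / (true_var + error_vars.sum())

# Hypothetical congeneric items (unequal loadings) and tau-equivalent items (equal loadings).
cong_l, cong_e = np.array([0.9, 0.7, 0.5, 0.3]), np.array([0.19, 0.51, 0.75, 0.91])
tau_l, tau_e = np.full(4, 0.6), np.array([0.64, 0.50, 0.70, 0.40])

for label, lam, psi in [("congeneric", cong_l, cong_e), ("tau-equivalent", tau_l, tau_e)]:
    a = population_alpha(one_factor_cov(lam, psi))
    r = population_reliability(lam, psi)
    print(f"{label:15s} alpha = {a:.4f}  reliability = {r:.4f}")
```

The gap closes exactly when all loadings are equal, because α ≤ reliability reduces to the Cauchy–Schwarz inequality $(\sum_i λ_i)^{2} \le k \sum_i λ_i^{2}$.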

Operator-Theoretic and Geometric Reframing

In Hilbert space, reliability is naturally expressed as a Rayleigh quotient:

$$\mathrm{Rel}(X) = \frac{\lVert P_{G} X \rVert^{2}}{\lVert X \rVert^{2}} = \cos^{2}\theta,$$

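where $P_{G}$ is the orthogonal projection onto the subspace of $G$-measurable random variables in $L^{2}$ (so that $P_{G} X = E(X ∣ G)$), $X$ is centered so that squared norms are variances, and $θ$ is the angle between the observed and true score vectors. This reveals reliability as a geometric object rather than a mysterious psychometric construct. The Courant–Fischer theorem explains why coefficients provide bounds: Rayleigh quotients lie between the minimum and maximum eigenvalues of the operator, and because the eigenvalues of a projection are 0 and 1, $0 \le \mathrm{Rel}(X) \le 1$. The insight provided by operator-theoretic test theory is that boundedness is intrinsic to the geometry of measurement, not an artifact of any single coefficient.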
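A finite-dimensional analogue may help here. In the Python sketch below, an orthogonal projection matrix onto a randomly chosen three-dimensional subspace of ℝ⁵⁰ stands in for $P_{G}$ (an illustrative assumption; the true-score projection acts on $L^{2}$, not on ℝ⁵⁰). The sketch checks that the projection's eigenvalues are 0 and 1 and that its Rayleigh quotient equals both the squared-norm ratio and the squared cosine.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative stand-in for P_G: orthogonal projection of R^50 onto a random 3-dimensional subspace.
n, d = 50, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))  # orthonormal basis of the subspace (n x d)
P = Q @ Q.T                                      # symmetric and idempotent

def rayleigh_quotient(A, x):
    """R_A(x) = <Ax, x> / <x, x> for a symmetric matrix A."""
    return (x @ A @ x) / (x @ x)

x = rng.standard_normal(n)            # plays the role of a centered observed score
px = P @ x                            # plays the role of its true-score projection
ratio = (px @ px) / (x @ x)           # ||Px||^2 / ||x||^2
cos_sq = (x @ px) ** 2 / ((x @ x) * (px @ px))
eigvals = np.linalg.eigvalsh(P)       # eigenvalues of a projection are 0 and 1

print("smallest / largest eigenvalue:", round(eigvals.min(), 8), round(eigvals.max(), 8))
print("Rayleigh quotient:", rayleigh_quotient(P, x))
print("norm ratio:", ratio, " cos^2:", cos_sq)
```

Because $P$ is symmetric and idempotent, $x^{\top} P x = \lVert P x \rVert^{2}$, which is why all three quantities agree.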

Practical Implications

Reporting reliability should move from “reliability = .85” to statements like:

“Internal consistency (coefficient α, a lower bound to reliability when errors are uncorrelated) estimated at .85 [CI].” Such a statement makes explicit (a) the quantifier (what aspect of reliability is bounded), (b) the estimator (the formula used), (c) the estimate (the numerical value and its uncertainty), and (d) the assumptions behind the bound. Operator-theoretic test theory provides the following insight: transparent reporting reframes reliability as a bounded, assumption-dependent estimate of a deeper theoretical psychometric construct, with the Rayleigh quotient as the estimand.
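One way to attach the “[CI]” to an estimate is a percentile bootstrap over respondents. The Python sketch below is an assumption-laden illustration rather than a recommended procedure: the data are simulated from a hypothetical one-factor model, the estimator is sample coefficient α, and the interval is a plain percentile bootstrap (other interval methods exist). The printed line reports the estimator, the estimate, and its uncertainty together.

```python
import numpy as np

def coefficient_alpha(data):
    """Sample coefficient alpha for an n x k matrix of item scores."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1)
    total_var = data.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def percentile_bootstrap_ci(data, stat, n_boot=2000, level=0.95, seed=0):
    """Resample respondents (rows) with replacement; return the percentile interval for stat."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    boots = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        boots[b] = stat(data[idx])
    tail = (1 - level) / 2 * 100
    lo, hi = np.percentile(boots, [tail, 100 - tail])
    return lo, hi

# Hypothetical data: 300 respondents, 5 items driven by one common factor.
rng = np.random.default_rng(11)
factor = rng.standard_normal(300)
items = 0.7 * factor[:, None] + 0.7 * rng.standard_normal((300, 5))

est = coefficient_alpha(items)
lo, hi = percentile_bootstrap_ci(items, coefficient_alpha)
print(f"Coefficient alpha = {est:.2f}, 95% percentile bootstrap CI [{lo:.2f}, {hi:.2f}]")
```

The assumptions behind the bound (here, uncorrelated errors) still have to be stated in words; the interval only quantifies sampling uncertainty.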

Integrative Contribution

The resulting approach provides both mathematical rigor, by embedding reliability in operator theory and Rayleigh quotient geometry, and practical guidance, by reframing reliability coefficients as bound-selecting tools rather than definitive measures. It offers a unified foundation that could reshape knowledge mobilization and pedagogy (clarity on estimand, estimator, and estimate) as well as practice (bound-sensitive reporting), while also opening paths for theoretical generalization (new operator-based reliability quantifiers).

Reinterpretation of True Scores in Operator-Theoretic Test Theory

The alternative interpretation of true scores builds on conditioning not only on the person but also on the situation in which measurement occurs. In this reinterpretation, the true score is defined as the conditional expectation,

$$T = E\bigl(X \mid \sigma(f)\bigr),$$

where $f$ is an assignment-to-individuals function and $σ(f)$ is the $σ$-algebra it generates. This formulation emphasizes that test performance must be understood as the performance of a person-in-situation, capturing the contexts relevant to the attribute or phenomenon of interest. Such an interpretation aligns with Zumbo’s (2023) explanation-focused view of validity, which situates measurement within ecology, context, and the many ways of being human, including embodied and distributed cognition.
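To make $T = E(X ∣ σ(f))$ concrete, the Python sketch below takes a hypothetical design in which $f$ assigns each observation to a (person, situation) cell and several observations fall in each cell. Under the empirical distribution, the conditional expectation given $σ(f)$ is the cell mean, and its defining property can be checked directly: the residual $X - T$ is orthogonal to every $σ(f)$-measurable variable $h(f)$. The variance components and design sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical design: 40 persons, each observed in 3 situations, 10 observations per cell.
persons = np.repeat(np.arange(40), 30)
situations = np.tile(np.repeat(np.arange(3), 10), 40)
person_effect = rng.normal(0, 1.0, 40)[persons]
situation_effect = rng.normal(0, 0.5, (40, 3))[persons, situations]  # person-by-situation effect
X = 50 + person_effect + situation_effect + rng.normal(0, 0.8, persons.size)

# f assigns each observation to its (person, situation) cell; sigma(f) is generated by these cells.
f = persons * 3 + situations

def conditional_expectation(x, labels):
    """Empirical E(x | sigma(labels)): replace each value by the mean of its cell."""
    cell_means = np.bincount(labels, weights=x) / np.bincount(labels)
    return cell_means[labels]

T = conditional_expectation(X, f)
E = X - T
# Defining property: E is orthogonal to every sigma(f)-measurable variable h(f).
h = rng.standard_normal(f.max() + 1)[f]  # an arbitrary function of the cell label
print("mean of E * h(f):", np.mean(E * h))
```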

In short, operator-theoretic test theory provides a framework for reinterpreting the familiar decomposition $X = T + E$. Whereas conventional CTT locates the true score solely within the individual, the operator-theoretic approach highlights the centrality of context, recognizing that cognition is always situated when a respondent encounters a test item, task, or survey question.

By contrast, the conventional interpretation defines the true score as a property of the test-taker alone. This view is grounded in the classical formulations of Guttman (1945, 1953), Lord and Novick (1968), and Novick (1966), where the true score was defined as the expectation of an individual’s observed scores over infinitely many independent (memoryless) replications of a test. Lord and Novick’s notion of the “propensity distribution” formalized this as the hypothetical distribution of observed scores under repeated administrations, with the test-taker’s memory wiped clean between trials. In this view, within-person variation in observed scores arises solely from measurement error, given the repeated-measures metaphor.
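For contrast, the classical propensity-distribution definition can be mimicked by simulation. The sketch assumes, purely for illustration, normally distributed errors around a fixed true score for one hypothetical test-taker, with every replication independent of the others (the “memory wiped clean” idealization). The mean of the replications approaches the true score only as the number of replications grows, which is exactly the hypothetical experiment the classical definition relies on.

```python
import numpy as np

rng = np.random.default_rng(1968)

# Hypothetical propensity distribution for one test-taker (values are illustrative only).
TAU, SIGMA = 24.0, 3.0

def replicate(n_reps):
    """Observed scores from n_reps independent, memoryless administrations."""
    return TAU + SIGMA * rng.standard_normal(n_reps)

for n_reps in (10, 1_000, 100_000):
    scores = replicate(n_reps)
    print(f"{n_reps:>7} replications: mean observed score = {scores.mean():.3f}")
```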

Without the measure-theoretic framework that is part of operator-theoretic test theory, one must rely on the metaphor of “wiping the test-taker’s memory clean between replications,” which explains why Lord and Novick and others described the true score in this way. This metaphor also accounts for why many treatments of CTT frame it as a repeated-measures assessment experiment; it is embedded in the very definition of the true score. Moreover, this repeated-measures view helps explain why some writers portray CTT as imposing immutable outcome variables, why simple difference scores are often dismissed as poor indicators of change (Zumbo, 1999), and why the replication metaphor has been described as a metaphor run amok.

In contrast, the operator-theoretic reinterpretation defines the true score as the conditional expectation of $X$ over all possible outcomes of the measurement process for a given test-taker or survey respondent. This shift avoids the limitations of the replication metaphor and situates the true score within the full range of contexts and responses that characterize the measurement process. In doing so, it aligns with an explanation-focused view of validity (Zumbo, 2023), which emphasizes the person-in-situation perspective and the role of context, ecology, and distributed cognition in test performance.

Thus, the operator-theoretic reinterpretation reframes the true score not as an abstract property of the individual under hypothetical replications, but as an ecologically grounded psychometric construct that reflects the individual’s performance across the full range of contexts in which measurement occurs (Zumbo, 2023).

True Score as Conditional Expectation Over All Possible Outcomes for a Respondent, Not Repeated Testing

The operator-theoretic reinterpretation avoids these limitations by conditioning directly on the test-taker or respondent in the measurement process, without needing to invoke the concept of a memoryless random variable. Instead of invoking hypothetical replications, the true score is understood as the conditional expectation over all possible outcomes of $X$ for that individual. This reframing shifts the emphasis from abstract repetition to lived measurement contexts, offering a richer and more ecologically valid account of test performance.

Within the operator-theoretic formalization, $G$ is a sub-$σ$-algebra of $F$ that specifies which variance components are treated as systematic. It bears repeating that the $σ$-algebra $G$ is not itself an operator but a structured collection of measurable events that encode person-, situational-, ecological-, or contextual-level factors pertinent to the latent construct being assessed (Anastasi, 1983; Steyer et al., 1992; Zumbo, 2007, 2009, 2017, 2023; Zumbo et al., 2015). This ecological perspective emphasizes in vivo psychometric practices over routine, in vitro procedures (Zumbo, 2015). In what I refer to as the in vivo view (as opposed to in vitro; Zumbo, 2015), the context is not a nuisance that “distorts the picture” but instead informs and shapes the attributes; that is, the context cannot be extracted. Different measurement processes define $G$ in distinct ways, and across these processes the choice of $G$ encodes theoretical commitments by distinguishing variance attributed to the latent construct, domain, or attribute of interest (systematic) from variance assigned to error (unsystematic), that is, construct-relevant versus construct-irrelevant variance.
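The claim that the choice of $G$ decides what counts as systematic can be illustrated numerically. In the Python sketch below, which uses a hypothetical person × situation design with made-up variance components, the same observed scores are projected onto two σ-algebras: one generated by the person alone and a finer one generated by the (person, situation) pair. Because the finer σ-algebra contains the coarser one, its projection can only retain more variance, so person-in-situation variance moves from “error” to “systematic” and the reliability ratio rises.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical design: 200 persons x 4 situations x 5 observations per cell.
n_p, n_s, n_r = 200, 4, 5
person = np.repeat(np.arange(n_p), n_s * n_r)
situation = np.tile(np.repeat(np.arange(n_s), n_r), n_p)
X = (rng.normal(0, 1.0, n_p)[person]                       # person component
     + rng.normal(0, 0.6, (n_p, n_s))[person, situation]   # person-in-situation component
     + rng.normal(0, 0.7, person.size))                    # residual component

def cond_exp(x, labels):
    """Empirical E(x | sigma(labels)): cell means broadcast back to observations."""
    return (np.bincount(labels, weights=x) / np.bincount(labels))[labels]

def reliability(x, labels):
    """Var[E(X | G)] / Var(X) for the sigma-algebra G generated by labels."""
    return cond_exp(x, labels).var() / x.var()

rel_person = reliability(X, person)                                # G = sigma(person)
rel_person_situation = reliability(X, person * n_s + situation)    # finer G = sigma(person, situation)
print(f"G = sigma(person):            {rel_person:.3f}")
print(f"G = sigma(person, situation): {rel_person_situation:.3f}")
```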

In summary, in CTT the true score is a fixed property of the individual, defined by infinite replications. In the operator-theoretic reinterpretation, by contrast, the true score characterizes the person-in-situation (Anastasi, 1983; Steyer et al., 1992) and is defined by conditional expectation, consistent with ecological validity and explanation-focused measurement (Zumbo, 2023).

Conclusion and Implications

The geometric view of reliability as projection provides both theoretical clarity and practical insight. Instead of being treated as an axiom of CTT, reliability emerges as a theorem: a necessary consequence of orthogonal projection in Hilbert space. In this section, we expand on the conceptual, knowledge mobilization, practical, and methodological implications of this framework.

Conceptual Clarity

CTT often presents reliability in a somewhat ad hoc fashion, with formulas justified by tradition rather than derivation. By situating reliability within the projection theorem, we see that:

Reliability is not a psychometric convention but a mathematical necessity.

The decomposition $X = T + E$ with $T ⊥ E$ is guaranteed by Hilbert space geometry.

Reliability is precisely the squared cosine of the angle between the centered $X$ and $T$, a universal measure of projection efficiency.
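The last two statements in the list above can be checked on simulated data. The Python sketch assumes a hypothetical latent variable with 50 levels, takes $G$ to be the σ-algebra it generates, and computes $T = E(X ∣ G)$ as level means under the empirical distribution. It then checks that $T$ and $E = X - T$ are uncorrelated, that the variances add, and that $\mathrm{Var}(T)/\mathrm{Var}(X)$ equals the squared cosine between the centered $X$ and $T$ (that is, their squared correlation).

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical latent variable with 50 levels; G = sigma(latent); 8 observations per level.
latent = np.repeat(np.arange(50), 8)
X = rng.normal(0, 1.0, 50)[latent] + rng.normal(0, 0.8, latent.size)

T = (np.bincount(latent, weights=X) / np.bincount(latent))[latent]  # T = E(X | G)
E = X - T

def cov(a, b):
    """Covariance (ddof=0) under the empirical distribution."""
    return np.mean((a - a.mean()) * (b - b.mean()))

rel = T.var() / X.var()
cos_sq = cov(X, T) ** 2 / (X.var() * T.var())  # squared cosine of centered X and T
print("Cov(T, E) =", cov(T, E))
print("Var(X) - Var(T) - Var(E) =", X.var() - T.var() - E.var())
print("Rel(X) =", rel, " cos^2 =", cos_sq)
```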

This reframing shifts reliability from being a peculiar artifact of test theory to being a manifestation of a general structure underlying regression, factor analysis, and time-series forecasting.

By grounding reliability in the projection theorem, we elevate it from a psychometric convention to a universal mathematical structure. This has several broader impacts:

It builds a bridge between psychometrics and statistics, situating test theory alongside regression and time series as part of the same projection framework.

It clarifies to practitioners that reliability is not arbitrary but necessary, derived from geometry.

It empowers educators to teach reliability visually and intuitively, enhancing accessibility for students.

Ultimately, this reframing strengthens the foundations of measurement theory, both conceptually and pedagogically.

Practical Implications for Test Design

The projection view clarifies several design principles:

Increasing reliability means aligning observed items more closely with the latent trait (i.e., increasing loadings in factor models).

Sources of unreliability correspond to residual variance—anything orthogonal to the trait of interest.

Item selection can be seen as choosing variables with high cosine alignment to the latent axis.

Thus, test construction becomes the task of engineering vectors that project efficiently onto the latent subspace.
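These design principles can be sketched with a one-factor model. The Python example below assumes standardized items (so each loading is the cosine between an item and the latent axis and the error variance is $1 - λ^{2}$) and a hypothetical pool of eight items; it compares the reliability of a four-item composite built from the best-aligned items with one built from the worst-aligned items. This is a toy selection rule for illustration, not a complete item-selection procedure.

```python
import numpy as np

def omega(loadings, error_vars):
    """Reliability of the unit-weighted sum under a one-factor model (factor variance 1)."""
    true_var = np.sum(loadings) ** 2
    return true_var / (true_var + np.sum(error_vars))

# Hypothetical pool of standardized loadings (cosines between items and the latent axis).
pool = np.array([0.82, 0.45, 0.74, 0.30, 0.68, 0.55, 0.79, 0.38])
order = np.argsort(pool)[::-1]                 # items sorted from best to worst aligned
best4, worst4 = pool[order[:4]], pool[order[-4:]]

for label, lam in [("highest-loading 4", best4), ("lowest-loading 4", worst4)]:
    print(f"{label}: omega = {omega(lam, 1 - lam**2):.3f}")
```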

Footnotes

Appendix A

Appendix B

Declaration of Conflicting Interests

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Canada Research Chairs program (AWD-016179 UBCEDUCA 2020) and the UBC Paragon Research Initiative to Bruno Zumbo (Award Number: AWD-024645 UBCEDUCA 2023), and the UBC Distinguished University Scholar program.

ORCID iD

Bruno D. Zumbo

References

Anastasi (1983). Traits, states, and situations: A comprehensive view. In Wainer & Messick (Eds.), Principals of modern psychological measurement (pp. 345–356). Erlbaum.

Armor, D. J. (1974). Theta reliability and factor scaling. In Costner (Ed.), Sociological methodology 1973–1974 (pp. 17–50). Jossey-Bass.

Athreya, K. B., & Lahiri, S. N. (2006). Measure theory and probability theory. Springer.

Billingsley (1995). Probability and measure (3rd ed.). Wiley.

Bonanomi, Cantaluppi, Ruscone, M. N., & Osmetti (2015). A new estimator of Zumbo’s ordinal alpha: A copula approach. Quality & Quantity, 49, 941–953. https://doi.org/10.1007/s11135-014-0114-8

Brennan, R. L. (2001). Generalizability theory. Springer-Verlag. https://doi.org/10.1007/978-1-4757-3456-0

Cronbach, L. J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297–334. https://doi.org/10.1007/BF02310555

Cronbach, L. J., Rajaratnam, & Gleser, G. C. (1963). Theory of generalizability: A liberalization of reliability theory. British Journal of Statistical Psychology, 16(2), 137–163. https://doi.org/10.1111/j.2044-8317.1963.tb00206.x

Durrett (2010). Probability: Theory and examples. Cambridge University Press.

Gadermann, A. M., Guhn, & Zumbo, B. D. (2012). Estimating ordinal reliability for Likert-type and ordinal item response data: A conceptual, empirical, and practical guide. Practical Assessment, Research & Evaluation, 17, 3. https://pareonline.net/pdf/v17n3.pdf

Gulliksen (1950). Theory of mental tests. Wiley.

Guttman (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255–282. https://doi.org/10.1007/BF02288892

Guttman (1953). A special review of Harold Gulliksen, Theory of mental tests. Psychometrika, 18, 123–130.

Kroc & Zumbo, B. D. (2020). A transdisciplinary view of measurement error models and the variations of X = T + E. Journal of Mathematical Psychology, 98, Article 102372. https://doi.org/10.1016/j.jmp.2020.102372

Liu, A. D., & Zumbo, B. D. (2009). The impact of outliers on Cronbach’s coefficient alpha estimate of reliability: Ordinal/rating scale item responses. Educational and Psychological Measurement, 70(1), 5–21. https://doi.org/10.1177/0013164409344548

Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Addison-Wesley.

McDonald, R. P. (1999). Test theory: A unified treatment. Lawrence Erlbaum Associates.

Novick, M. R. (1966). The axioms and principal results of classical test theory. Journal of Mathematical Psychology, 3(1), 1–18. https://doi.org/10.1016/0022-2496(66)90002-2

Novick, M. R., & Lewis (1967). Coefficient alpha and the reliability of composite measurements. Psychometrika, 32(1), 1–13. https://doi.org/10.1007/BF02289400

Raykov & Marcoulides, G. A. (2011). Introduction to psychometric theory. Routledge. https://doi.org/10.4324/9780203841624

Rozeboom, W. W. (1966). Foundations of the theory of prediction. Dorsey Press.

Sijtsma (2009). On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika, 74(1), 107–120. https://doi.org/10.1007/s11336-008-9101-0

Sijtsma & Pfadt, J. M. (2021). Part II: On the use, the misuse, and the very limited usefulness of Cronbach’s alpha: Discussing lower bounds and correlated errors. Psychometrika, 86(4), 843–860. https://doi.org/10.1007/s11336-021-09789-8

Spearman (1904). “General intelligence,” objectively determined and measured. The American Journal of Psychology, 15(2), 201–293. https://doi.org/10.2307/1412107

Steyer (1988). Conditional expectations: An introduction to the concept and its applications in empirical sciences. Methodika, 2, 53–78.

Steyer (1989). Models of classical psychometric test theory as stochastic measurement models: Representation, uniqueness, meaningfulness, identifiability, and testability. Methodika, 3, 25–60.

Steyer, Ferring, & Schmitt, M. J. (1992). States and traits in psychological assessment. European Journal of Psychological Assessment, 8(2), 79–98.

Steyer, Mayer, Geiser, & Cole, D. A. (2015). A theory of states and traits—revised. Annual Review of Clinical Psychology, 11, 71–98. https://doi.org/10.1146/annurev-clinpsy-032813-153719

Steyer & Nagel (2017). Probability and conditional expectation: Fundamentals for the empirical sciences. John Wiley.

Steyer & Schmitt, M. J. (1990). Latent state-trait models in attitude research. Quality & Quantity, 24(4), 427–445. https://doi.org/10.1007/BF00152014

Webb, N. M., Shavelson, R. J., & Haertel, E. H. (2006). Reliability coefficients and generalizability theory. Handbook of Statistics, 26, 81–124. https://doi.org/10.1016/S0169-7161(06)26004-8

Williams, R. H., & Zimmerman, D. W. (1977). The reliability of difference scores when errors are correlated. Educational and Psychological Measurement, 37(3), 679–689. https://doi.org/10.1177/001316447703700310

Zimmerman, D. W. (1969a). An item sampling model for the reliability of composite tests. Educational and Psychological Measurement, 29(1), 49–59. https://doi.org/10.1177/001316446902900103

Zimmerman, D. W. (1969b). A simplified probability model of error of measurement. Psychological Reports, 25(1), 175–186. https://doi.org/10.2466/pr0.1969.25.1.175

Zimmerman, D. W. (1969c). Test reliability and parameters of observed score distributions. The Journal of Experimental Education, 37(3), 92–96. http://www.jstor.org/stable/20157043

Zimmerman, D. W. (1970). Variability of test scores and the split-half reliability coefficient. Educational and Psychological Measurement, 30(2), 259–266. https://doi.org/10.1177/001316447003000205

Zimmerman, D. W. (1972). Test reliability and the Kuder-Richardson formulas: Derivation from probability theory. Educational and Psychological Measurement, 32(4), 939–954. https://doi.org/10.1177/001316447203200408

Zimmerman, D. W. (1975). Probability spaces, Hilbert spaces, and the axioms of test theory. Psychometrika, 40(3), 395–412. https://doi.org/10.1007/BF02291765

Zimmerman, D. W. (1976). Test theory with minimal assumptions. Educational and Psychological Measurement, 36(1), 85–96. https://doi.org/10.1177/001316447603600107

Zimmerman, D. W., Williams, R. H., & Burkheimer, G. J. (1968). Dependence of test reliability upon heterogeneity of individual and group score distributions. Educational and Psychological Measurement, 28(1), 41–46. https://doi.org/10.1177/001316446802800104

Zimmerman, D. W., & Zumbo, B. D. (2001). The geometry of probability, statistics, and test theory. International Journal of Testing, 1(3–4), 283–303. https://doi.org/10.1080/15305058.2001.9669476

Zumbo, B. D. (1999). A glance at coefficient alpha with an eye towards robustness studies: Some mathematical notes and a simulation model (Paper No. ESQBS-99-1). University of Northern British Columbia, Edgeworth Laboratory for Quantitative Behavioural Science.

Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C. R. Rao & Sinharay (Eds.), Handbook of statistics (Vol. 26, pp. 45–79). Elsevier Science B.V. https://doi.org/10.1016/S0169-7161(06)26003-6

Zumbo, B. D. (2009). Validity as contextualized and pragmatic explanation, and its implications for validation practice. In R. W. Lissitz (Ed.), The concept of validity: Revisions, new directions, and applications (pp. 65–82). Information Age Publishing.

Zumbo, B. D. (2015, November). Consequences, side effects and the ecology of testing: Keys to considering assessment “in vivo” [Plenary address]. Annual Meeting of the Association for Educational Assessment—Europe (AEA-Europe), Glasgow, Scotland. https://youtu.be/0L6Lr2BzuSQ

Zumbo, B. D. (2017). Trending away from routine procedures, toward an ecologically informed in vivo view of validation practices. Measurement: Interdisciplinary Research and Perspectives, 15(3–4), 137–139. https://doi.org/10.1080/15366367.2017.1404367

Zumbo, B. D. (2023). A dialectic on validity: Explanation-focused and the many ways of being human. International Journal of Assessment Tools in Education, 10, 1–96. https://doi.org/10.21449/ijate.1406304

Zumbo, B. D., Gadermann, A. M., & Zeisser (2007). Ordinal versions of coefficient alpha and theta for Likert response scales. Journal of Modern Applied Statistical Methods, 6, 21–29. https://digitalcommons.wayne.edu/jmasm/vol6/iss1/4/

Zumbo, B. D., & Kroc (2019). A measurement is a choice and Stevens’ scales of measurement do not help make it: A response to Chalmers. Educational and Psychological Measurement, 79(6), 1184–1197. https://doi.org/10.1177/0013164419844305

Zumbo, B. D., Liu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K. (2015). A methodology for Zumbo’s third generation DIF Analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136–151. https://doi.org/10.1080/15434303.2014.972559

Zumbo, B. D., & Rupp, A. A. (2004). Responsible modeling of measurement data for appropriate inferences: Important advances in reliability and validity theory. In Kaplan (Ed.), The SAGE handbook of quantitative methodology for the social sciences (pp. 74–93). Sage. https://doi.org/10.4135/9781412986311.n4

Zumbo, B. D. (in press). Theta reliability and new estimators beyond Alpha’s shadow. In Sinharay (Ed.), Encyclopedia of measurement in the social sciences. Elsevier.

Reliability as Projection in Operator-Theoretic Test Theory: Conditional Expectation, Hilbert Space Geometry, and Implications for Psychometric Practice
