Abstract
Psychometrics, the science sustaining psychological test construction and use, has consistently ignored logical criticism of its foundations, even though science as a cognitive enterprise requires criticism. This accumulating corpus of criticisms is generally unacknowledged in histories of the discipline, in textbooks, and in course curricula. Why this critical resource goes unused is thus a mystery. Consideration of three critical offerings from before 1950 makes clear that by mid-century, psychometric methods, consistently marketed as instruments of scientific measurement, were bereft of evidence supporting that boast. A solution to this mystery is proposed.
Introduction
Those practicing psychometrics (including psychologists, educationalists, sociologists, and others) have failed to support claims to measure psychological attributes (such as intellectual abilities, personality traits, social attitudes) by neglecting the following crucial step: they have failed to investigate whether such attributes actually possess quantitative structure (e.g., Michell 1990, 2025a). Commonly used quantitative psychometric theories (factor analytic theories, item response theories, etc.) characterise psychological attributes as measurable quantities. However, since there is no logical necessity that attributes must be quantitative, the issue of whether any given one is quantitative is an empirical matter and, so, claims to measure psychological attributes (in the sense of measure required by quantitative theories) are empty when devoid of empirical evidence supporting quantitative structure. While some might conclude that highlighting this fundamental failure represents an anti-psychometric stance, the opposite is the case: it is pro-psychometrics. It aims to strengthen psychometrics’ culture of criticism (criticism being a necessary engine driving science forward) and to ensure that psychometrics conforms to the traditions of quantitative science. Within these traditions, as present in, say, physical science, claims to measure are underwritten by evidence of quantitative structure in the relevant attributes. Those who claim to measure psychological attributes while ignoring this issue of evidence leave psychometrics exposed to potentially debilitating criticism. Therefore, the aim of highlighting this failure is to advance psychometrics as a science. To this end, not only have criticisms been made but avenues via which relevant evidence may be obtained have been explored and relevant empirical investigations undertaken (e.g., Kyngdon 2006; Kyngdon and Richards 2007; Michell 1994).
Significantly, since the inception of modern psychometrics, the issue of whether psychological attributes are quantitative has been raised by an extensive array of critically minded psychologists, philosophers, and educationalists. My aim in this paper is: first, to consider three critical contributions formulated during the first half of the twentieth century highlighting psychometrics’ evidential deficit, which show that by mid-century this deficiency was clearly evident to any who cared to look; and, second, to propose an explanation for the failure to meet the empirical challenges required by this deficit if psychometrics is to realise its long-standing ambition to become a “quantitative rational science” (Heiser and Hubert 2016, 1175).
The Chorus of Published Psychometric Criticism
Psychologists are unaware that an undercurrent of criticism has always attended psychometrics. To call it a chorus may seem misleading because critics rarely sing simultaneously, although often singing the same song. They are a diachronic, not a synchronic, chorus. Those I have located include the following: (Adams 1931; Berka 1983; Blinkhorn 1997; Boring 1920; Borsboom 2005; Briggs 2022; Brown 1934; Essex and Smythe 1999; Fillmore-Patrick 2025; Garrison 2009; Goldstein and Wood 1989; Gould 1981; Grice et al. 2012; Heine and Heene 2025; Johnson 1936; Kyngdon 2011; Lumsden 1976; Maraun 1998; McCormack 1922; McGrane and Maul 2020; Nash 1990; Reese 1943; Schönemann 1994; Smith 1938; Sutcliffe 1986; Thomson 1916; Trendler 2009; Vautier et al. 2012; Wilson 1928).
These critiques have had little effect and are repressed in histories of the discipline (e.g., Jones and Thissen 2007). Even Derek Briggs (2022), whose history highlights controversies, discusses few of the above. For all the difference they made, most might never have been penned. Logical criticisms are unsung in authoritative mainstream textbooks (such as Lord and Novick 1968 or McDonald 1999) and have minimal institutional recognition in course curricula. While largely a corpus incognito, they are in reality an unacknowledged treasure trove. Given their pertinence, their neglect by a cognitive enterprise presents a mystery. In the next sections, I consider three very different critiques to illustrate their range and significance. Then a solution to the mystery is proposed.
The critiques considered here are as follows: first, Thomson’s critique of Spearman’s two factor theory of intellectual ability, which showed that a non-quantitative alternative hypothesis accounted for relevant data as successfully as Spearman’s quantitative theory; second, Boring’s critique of the psychometric practice of using the normal distribution to infer supposedly quantitative measures from test score data, which showed that this practice begs the question of whether psychological attributes are quantitative; and third, Smith’s criticism that measurement in the desired sense requires evidence of quantitative structure. These critiques show that by the middle of the twentieth century, psychometrics’ evidential deficit was clear, implying that claims to measure psychological attributes are unsubstantiated. The fact that those engaged in psychometrics have failed to investigate the issue of quantitative structure indicates a significant scientific failure, and such a lapse in standard scientific processes requires explanation.
Sir Godfrey Thomson’s Non-Quantitative Alternative
Thomson challenged Charles Spearman’s two-factor theory of abilities, a still influential theory. What Thomson specifically challenged was a premise of this theory, one that became an axiom of the psychometric paradigm. Spearman’s theory emerged from factor analysis (Michell 2023a; Spearman 1904) and he thought that by factor analysing correlation coefficients between cognitive tests, measures of general ability or g could be obtained. This overestimated factor analysis’s logical reach. Based upon numerical data (generally, covariance indices between total test scores), factor analysis necessarily yields numbers, but concluding that these measure mental abilities is invalid.
Long before Spearman, Galton (1869) dreamt of measuring “natural ability,” ignoring the fact that mental abilities are never experienced as quantitative attributes (Michell 2022). He thought psychology’s destiny lay in emulating physics’ quantitative trajectory, enshrining this conviction in the term “psychometry” (Galton 1879, 149), coined for his number-generating methods. Spearman thought factor analysis realised Galton’s dream.
Thomson, on the other hand, saw its limitations and queried Spearman’s conclusions. This critical stance reflected Thomson’s background: he had received his PhD under Ferdinand Braun, “the great wireless telegraphy expert” (Thomson 1952, 280), at Strasburg in 1906, also attending lectures on Einstein’s theory of relativity, as good an introduction as any to the role of criticism in science. Returning to Britain to teach psychology, he encountered Spearman’s work. Unlike Spearman, who considered his theory a “Copernican Revolution” (Spearman 1927a, 325), Thomson proposed ingenious counterexamples demonstrating, “Professor Spearman has drawn over-hasty conclusions” (Thomson 1921, vi).
Without detailing these counterexamples (see Bartholomew et al. 2009; Briggs 2022), Thomson’s proposals culminated in his “sampling theory of ability” (Thomson 1919). “Let us suppose,” he speculated, the mind, in carrying out any activity such as a mental test, has two levels at which it can operate. The elements of activity at the lower level are entirely specific; but those at the higher level are such that they may come into play in more than one kind of activity, in more than one mental test. These elements are assumed to be additive like dice, and each to act on the “all or none” principle, not being in fact further divisible. (Thomson 1919, 341)
The salient point is his invocation of the “all or none” principle. According to him, our brains contain elements (“bonds” [Thomson 1935, 89]), which are switched “on” or “off,” each subserving, in the case of higher-level bonds, many different kinds of cognitive activities or, in the case of lower-level ones, only specific kinds of tasks. Attempting an ability test item recruits a random sample of elements of both levels, producing a response (either correct or incorrect depending upon the elements activated) and thus contributes to a person’s total test score. Thomson’s alternative accounted for relevant data as well as Spearman’s theory, but crucially, without postulating quantitative attributes (such as g), relying only upon discrete, all-or-nothing processes. If true, the processes underlying intellectual performance would not be quantitative attributes and Spearman’s presupposition of underlying quantitatively structured abilities would be false.
That a non-quantitative theory explains the pattern of correlation coefficients as well as a quantitative one is unsurprising. Covariance indices, upon which factor analyses are based, are calculated from total test scores, which in turn derive from the ordered series of correct or incorrect answers given to test items. On each test, a person’s ordered series of correct or incorrect responses (that person’s response pattern) is the original data, total scores being derivatives thereof. Because the classification of responses as “correct” or “incorrect” is qualitative, the original data underlying factor analysis are qualitative, not quantitative. Explaining qualitative data does not necessitate postulating quantitative causes. That is, specifying a person’s cognitive resources (e.g., their knowledge states, skills, and strategies) is sufficient to explain response patterns. Behind these qualitative causes lie deeper causes, but there is no reason to believe that these are quantitative either: explaining qualitative cognitive resources no more requires postulating quantitative causes than does explaining qualitative response patterns. While logically possible, Spearman’s quantitative theory violates Occam’s razor, adding unnecessary complexity (i.e., the complexity of continuous quantitative attributes versus the simplicity of finite discrete qualitative attributes).
Spearman resisted seeing this, leading Thomson to lament, “the nature of my disagreement has almost always been misunderstood” (Thomson 1946, 1). Spearman’s blindness derived from his misinterpretation of “factor.” Prior to factor analysis, in scientific parlance, “factor” meant “any one of a plurality of causes or conditions which together determine a thing or event” (Baldwin 1901, 368). Spearman, in line with this but also presuming that the numerical outputs of factor analysis measure causal factors, used “factor” to describe both those outputs and underlying causal factors (i.e., intellectual abilities), believing they are the same.
Consequently, when Thomson admitted that factor analysing performances on cognitive tests entails more or less g, with the qualification that “g is interpreted as a mathematical entity only” (i.e., as a numerical output) “and judgment is suspended as to whether it is anything more than that” (Thomson 1939, 240), Spearman misconstrued it as an “endorsement of g” (the causal factor) (1946, 121), which it was not. Spearman’s identification of g (the product of factor analysis) with g (the measure of the causal factor), in the false belief that “like all measurements anywhere, [it] is primarily not any concrete thing but only a value or magnitude” (Spearman 1927b, 75), blinded him to Thomson’s distinction. Thomson knew these two concepts are logically distinct and that, without supplementary evidence, equating them is unwarranted. Possessing a viable alternative theory, he accepted the reality of numerical g, the artefact of factor analysis, but was agnostic about g, the measure of Spearman’s hypothesised quantitative causal factor. Spearman, on the other hand, saw g as like a card viewed from two sides: one side portraying a measure obtained via factor analysis; the other, a causally efficacious attribute, “noegenesis” (1931, 408). Thus, convinced he had measured g, he mistook Thomson’s acceptance of mathematical g as endorsement of his theory.
Because factor analysis was seen as a methodological godsend, Spearman’s blind-spot influenced psychometrics and Thomson’s perspicacious critique was given short shrift. For example, Guilford’s Psychometric Methods (1936 [1954]), a leading textbook of the next generation, curtly complained that there was “little likelihood of demonstrating experimentally the existence of the elements hypothesized” (476), without noting that exactly the same difficulty plagued factor-analytic theories and missing the point that Thomson’s alternative was proposed primarily to demonstrate the logical possibility of non-quantitative alternatives. From then on, Thomson’s sampling theory had negligible impact upon mainstream psychometrics.
Had psychometrics been a normal science, the validity of Thomson’s critique would have been accepted and the range of theories considered widened to include qualitative alternatives. Thomson showed that the presupposition that the psychological causes underlying intellectual performance (i.e., abilities) are quantitative attributes was an over-hasty conclusion and that tests of quantitative theories (like Spearman’s) against non-quantitative alternatives were required. But psychometrics was not to be deflected from its quantitative trajectory, a legacy Galton bequeathed, and Thomson, “Once ranked with the top names in intelligence … alone goes almost unmentioned” (Deary et al. 2010, 96). The heroic effort by Bartholomew et al. (2009) to give Thomson’s alternative “a new lease of life” (567) produced little recognition that psychometrics’ quantitative trajectory is not necessarily nature’s path.
Edwin Garrigues Boring and the Normal Law of Error
E. G. Boring, remembered now as historian, was an experimental psychologist, who oversaw intelligence testing of recruits in World War I. Familiar with the quantity objection to psychophysics (i.e., psychophysical measurement is impossible because sensory intensities are not quantitative [Titchener 1905]) and having a degree in electrical engineering, he queried psychometrics’ claims to measure intelligence.
While endorsing the quantitative imperative (“We hardly recognize a subject as scientific if measurement is not one of its tools” [Boring 1929, 286]), he thought that without experimental evidence psychometrics could not progress beyond merely collecting frequencies (i.e., total test scores). In a paper containing his slogan, “intelligence is what the tests test” (Boring 1923, 35) (said to anticipate Bridgman’s [1927] operationism [e.g., Rogers 1992; Mills 1992] and still misunderstood [e.g., van der Maas et al. 2014]), he sought to redirect psychometrics, not towards operationism but towards bridging the gap between test scores and the theoretical concept of intelligence.
Operationists, confusing what is measured with how it is measured (Michell 1990), identify the meaning of concepts with the operations used to measure them, which would imply that the operation of testing defines intelligence. That was not Boring’s view and his slogan is no more operationist than “blood pressure is what sphygmomanometers test.” He claimed that “intelligence as a measurable capacity must at the start be defined as the capacity to do well in an intelligence test” (Boring 1923, 35), construing intelligence as a yet-to-be-discovered capacity causing intellectual performance. Even as late as 1933, Boring still viewed Spearman’s two-factor theory sympathetically, thinking of intelligence as a complex property, the character of which remained to be discovered. Indeed, this is what his 1923 paper actually treated: it concerned, not operationism, but what would now be called intelligence test validation without using that term (“validity” having only just entered psychometrics’ official lexicon [Courtis et al. 1921; Michell 2009]). His intention was not to define intelligence operationally, but to indicate where intelligence might be found amongst the hidden causes of test performance.
The same spirit pervaded his 1920 paper on the normal law of error, which is my main focus. He noted, “Galton in the Hereditary Genius applied the normal law to mental differences and, using it a priori, worked from frequencies of natural ability to a scale of equal intervals of ability” (Boring 1920, 11–12). What was Galton’s reason for this unwarranted a priori manoeuvre? Referring to Quetelet’s observed distributions of certain physical features of “Frenchmen” and “Scotchmen,” which approximated the normal curve, Galton wrote, Now, if this be the case with stature, then it will be true as regards every other physical feature—as circumference of the head, size of brain, weight of grey matter, number of brain fibres, &c.; and thence, by a step on which no physiologist will hesitate, as regards mental capacity. (Galton 1869, 31–32)
The claim that no physiologist would hesitate to infer the form of the distribution of mental capacity from observed distributions of physical features was hyperbole, like his claim that the normal law of error “would have been personified by the Greeks and deified, if they had known of it” (Galton 1889, 66). Galton presumed (1) that mental capacity is an attribute possessing a mathematically continuous distributional form (thereby presupposing that it is quantitative) and (2) that the normal curve is a Platonic ideal holding sway above the flux, and concluded that mental capacity must be normally distributed. Both premises lacked evidence, so his conclusion was unwarranted.
However, Galton was idolised within psychometrics and, to this day, the normal curve is treasured as a philosophers’ stone supposedly able to magically transform qualitative observations into quantitative measures. Yet, as Boring noted, it yields only “a precision of result that is an artefact” (1920, 33), there being “no alchemy of probabilities that will change ignorance into knowledge” (1920, 3).
Boring reasoned this way: on the one hand, there are those (i.e., mainstream psychologists) who, first, transform the observed distribution of test scores to a normal distribution and, second, from that normalised distribution claim to determine the unit of measurement for the relevant attribute; and on the other hand, there are those (i.e., critics of the mainstream) who would, first, establish a unit of measurement (presumably by discovering experimentally the quantitative structure of the relevant attribute, if indeed it is quantitative) and, second, using that unit of measurement, determine the form of the distribution. He recognised that only the latter path gets “the necessary scientific order” (Boring 1920, 30) right because knowledge (of the distributional form) can only be wrought out of knowledge (of the unit of measurement). The former route puts the cart before the horse because “It is wrongly supposed that knowledge could somehow be wrought out of ignorance” (Boring 1920, 33).
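Boring’s point can be made concrete with a small sketch (the scores are hypothetical and `normalize` is an illustrative function, not any standard routine): normalising by rank through the inverse normal distribution function returns identical “equal-interval” values for any order-preserving transformation of the raw scores, so the resulting metric comes from the assumed curve, not from the data.

```python
from statistics import NormalDist

def normalize(scores):
    """Map scores to z-values by rank: mid-rank percentile fed through
    the inverse normal CDF, as in classical test-score normalisation."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    z = [0.0] * n
    for rank, i in enumerate(order):
        z[i] = NormalDist().inv_cdf((rank + 0.5) / n)
    return z

raw = [3, 7, 8, 12, 20, 21, 40]        # hypothetical raw test scores
squashed = [s ** 0.5 for s in raw]     # non-linear transforms that
stretched = [s ** 3 for s in raw]      # preserve only the ordering

# All three yield exactly the same "equal-interval" measures.
print(normalize(raw) == normalize(squashed) == normalize(stretched))  # True
```

The procedure is sensitive only to rank order; whatever interval structure the output appears to have is imposed by the assumed normal curve, which is precisely Boring’s charge.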
Boring’s critique prompted a retort from Truman Lee Kelley (“Thorndike’s pupil and Stanford’s copy of Karl Pearson, perhaps now America’s leading psychologist-statistician,” as Boring [1929, 528] described him). Kelley’s “early training was in mathematics” (Flanagan 1961, 343) and he also endorsed the quantitative imperative, writing, “in the field of psychology, if a designation of some trait or capacity of mental life, is to be given serious consideration, it must be such as to reveal itself as a measurable difference in conduct” (Kelley 1928, 3). While for Boring, psychology’s scientific reputation required exposing psychometrics’ flawed foundation, for Kelley, psychometrics’ scientific reputation required countering Boring’s critique. Attempting to strike Boring’s jugular, Kelley dismissed the claim that psychometrics lacks fixed units of measurement, observing that there is an arbitrariness about choosing units, not just that involving the familiar linear transformation between, say, centimetres and inches, but something more radical, involving non-linear transformations: one could have a science of physical phenomena in which the units were such that the scale of time intervals was the square of the present intervals measured in seconds, and in which the length scale was logarithmic as compared with the present scale in centimeters, etc. …. (Kelley 1923, 418)
From this he concluded, “choice of the unit is purely a question of utility” (418), which would imply unrestricted choice. Whatever the truth of Kelley’s claim regarding non-linear transformations of physical measures (an issue dealt with in detail in Krantz et al. 1971; Michell 1993), his argument fails because it begs the question, presuming that test scores are measurements of quantitative attributes. Test scores are frequencies (numbers of correct responses), not necessarily measures in the desired sense; when transformed non-linearly, they are neither frequencies nor, as far as is known, measures. Importantly, as already noted, frequencies do not require quantitative attributes for their explanation; hence, it is unwarranted to treat them as measures of anything.
Nonetheless, Kelley’s question-begging response was accepted at face value in psychometrics. For example, Stevens, who, incidentally had attended lectures by Kelley as a student (Stevens 1974), and, who, from mid-century onwards, was psychology’s measurement theory guru (Michell 2002), declared, “the assumption of normality has the advocacy of a certain pragmatic usefulness in the measurement of many human traits” (Stevens 1951, 28); and a leading psychometric spokesman pronounced, A good argument can be made that there are no “real” or “correct” intervals for any measurement scale, but rather that the intervals are established as a matter of convention. … The issue is one of which calibration of intervals will prove most useful in the long run. (Nunnally 1970, 21)
The delusion that measurement is attained by assuming desired distributional forms thus became the accepted route to psychometrics’ preferred quantitative destination.
The role of the “normal law of error” (and approximations thereto) as devices for conjuring metrics out of thin air continues unabated in psychometrics, not through normalising test score distributions (the dominant practice in Boring’s day) but by incorporating error distributions into the fabric of probabilistic item response models (see Michell 2004, 2014; Sutcliffe 1986) in such a way that what is most wanted (measurement) is wrought from what is least known (the hypothesised form of the error distribution) by mere postulation. As a past president of the Psychometric Society acknowledged regarding “measurement” so fabricated, “its metric—not only the origin and unit of measurement, but its entire calibration—is not given by data and generally must be imposed by the model” (McDonald 2013, 123). Boring was right: psychometric measurements are illusory and he concluded, “We are left then with the rank-orders of our psychological quantities … and it is with these rank-orders that we must deal” (1920, 33), a judgment too frugal for most psychologists, but in fact too generous given the actual structure of test data.
Bunnie Othanel Smith and the Issue of Quantitative Structure
Smith (1938) appraised psychometrics from the perspective of N. R. Campbell’s (1920, 1928) measurement theory. Evaluations from this perspective already existed (Johnson 1936; McGregor 1935), but Smith’s was more thorough. Campbell, a physicist, had defended a version of the representational theory of measurement, according to which measurement is defined as the assignment of numerals to represent attributes other than number, in virtue of laws governing the structure of these attributes, and he distinguished so-called fundamental from derived measurement. For fundamental measurement to be possible, these laws must show that attributes are ordered and involve a physical operation analogous to numerical addition, the paradigm case being length, in which the operation of physical addition involves conjoining rigid straight rods end to end linearly. Representing attributes numerically enables relationships between them to be expressed as numerical laws, which in turn sometimes enables systems of numerical constants to be identified and thus allows derived measurement, the paradigm case being measurement of the density of substances, identified as constant ratios of mass to volume. While Campbell believed that his theory of measurement accommodated physical measurement, it seemed unlikely that such a conception fits so-called “measurement” in psychometrics. Thus, it was important to raise doubts about psychologists’ claims.
Campbell’s approach to discovering whether measurement is possible in areas of scientific investigation contrasted with the psychometric approach, which derived from Galton’s position on how to subject “the qualities of life and mental processes to mathematical treatment” (Smith 1938, 33). According to Galton, psychometrics is “the art of imposing measurement and number upon operations of the mind” (Galton 1879, 149; emphasis added). The crucial contrast here is between discovering something in nature and imposing something upon nature. Smith, following Campbell, recognised that the possibility of measuring an attribute hinges upon discovering that it possesses quantitative structure. This contrasts with the psychometric approach according to which measurement is a matter of imposing number-generating operations upon the relevant attribute, no matter its structure.
Smith’s book appeared in 1938, on the eve of World War II, and coincided with publication of the Ferguson committee reports (Ferguson et al. 1938, 1940) into psychophysical measurement, a committee dominated by Campbell. These raised doubts regarding attempts at psychophysical measurement. Stevens, whose research area was psychophysics, mulled over them (see his previously unpublished 1939 paper [Stevens 2006; Marks 2006]) and his emphatic repudiation of Campbell (Stevens 1946) came immediately after the War (see Michell 1999). Stevens’s attempted redefinition of measurement was quickly accepted by psychologists and had the effect of quashing logical criticisms of psychometrics for three decades. Smith’s defence of Campbell stood little chance of an audience, let alone a fair hearing, alongside the comforts Stevens’s message seemed to promise, and it garnered only one review in a psychology journal (Cureton 1939), which failed to grasp the import of his critical message.
Smith (1938) saw that measurement begins with “a search for a special kind of structure” (57) in the attributes one aspires to measure and since the “only way in which a particular structure of any character of nature can be ascertained is by careful observation and experimentation” (61), measurement can only be achieved by making the required observations and performing the necessary experiments. These must test whether the relevant attribute is ordered and possesses additive structure.
There are limitations to Smith’s presentation. He failed to acknowledge that even in physics observational and experimental data rarely meet conditions for order and additive structure exactly, and that therefore one is looking for signs of quantitative structure in messy data (Michell 2007). And while he recognised that evidence for additive structure may involve identifying very different kinds of empirical operations in different attributes, his treatment of indirect forms of evidence for additive structure, such as the constancies in ratios of mass to volume (supporting the hypothesis that density is quantitative) and the regularities in the way temperature varies with volume and pressure (supporting the hypothesis that temperature is quantitative), is sketchy.
He was unaware of the back door to evidence of additive structure provided by the theory of conjoint measurement (only introduced to psychology two and a half decades later by Luce and Tukey 1964), although the basic ideas were present (see the historical note in Krantz et al. [1971, 259]). While he referenced Nagel (1930), had he noted Nagel’s references, he might have benefited from Hölder’s (1901) specification of axioms for difference measurement, which anticipate conjoint measurement (Michell and Ernst 1997). Nonetheless, these limitations do not detract from his insistence upon observation and experiment to support (or refute) the hypothesis that psychological attributes possess quantitative structure, which is where attempts at measurement in psychometrics consistently fall short.
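To indicate the kind of empirical test that conjoint measurement later supplied, here is a minimal sketch (with invented data) of the Luce–Tukey double-cancellation condition on a 3×3 table of ordered observations: a table with additive row-plus-column structure satisfies the condition, while a suitably chosen table violates it, showing that the hypothesis of additive structure has testable empirical content.

```python
def double_cancellation(P):
    """Luce-Tukey double cancellation on a 3x3 table P, where larger
    entries indicate more of the combined effect: if P[1][0] >= P[0][1]
    and P[2][1] >= P[1][2], then P[2][0] >= P[0][2] must hold."""
    if P[1][0] >= P[0][1] and P[2][1] >= P[1][2]:
        return P[2][0] >= P[0][2]
    return True  # antecedent not met: condition is vacuously satisfied

# Additive structure (entry = row effect + column effect) passes.
additive = [[r + c for c in (0, 3, 6)] for r in (1, 4, 9)]

# An invented table whose ordering cannot be additively represented.
nonadditive = [[1, 4, 6],
               [5, 2, 4],
               [3, 5, 9]]

print(double_cancellation(additive), double_cancellation(nonadditive))  # True False
```

Nothing in this test requires numerical measurements as input, only orderings of joint effects, which is why conjoint measurement offers a back door to evidence of quantitative structure of the sort unavailable within Smith’s Campbellian framework.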
The Critical Situation at Mid-Century
By the mid-twentieth century, Thomson had demonstrated that non-quantitative theories are a viable alternative to mainstream quantitative theories, which entails doubts regarding the presumed superiority of quantitative theories; Boring had demonstrated that using off-the-shelf probability distributions as measurement mechanisms begs the question of whether mainstream psychometric claims to measure psychological attributes are true; and Smith had insisted that evidence for quantitative structure could only be obtained by observational and experimental research, which entails doubts about whether psychological attributes are quantitative.
Instead of working to resolve these doubts, psychologists continued to presume that their tests measure psychological attributes. This has been a perpetual stance: for example, at the birth of the Psychometric Society, its journal, Psychometrika (founded in 1936), was dedicated to developing “a quantitative rational science” (Heiser and Hubert 2016, 1175) and the cover description for a recent set of Essays on Contemporary Psychometrics (van der Ark et al. 2023) still described psychometrics as “the science devoted to the advancement of quantitative measurement practices in psychology, education and the social sciences.” Such a stance takes for granted that abilities, etc., are quantitative attributes; alternative possibilities were not systematically investigated, and probability distributions were recruited as scaling devices without testing their veracity. In a normal science, evidential gaps stimulate remedial investigations. Not so psychometrics: measurement of psychological attributes using psychometric theories and methods was taken to be a fait accompli. As a widely used methodological resource put the matter: “Most behavioural and social science data are ordinal. However, through certain scaling methods and assumptions, it can be considered as interval scale data” (Kerlinger and Lee 1964 [2000], 639).
The result was a disconnect between the mathematical inventiveness of its quantitative theories and the character of the phenomena underlying psychological test performance, a disconnect captured in a remark of a past president of the Psychometric Society: “I keep thinking of psychometrics as being part of statistics, not so much ‘psycho’” (as quoted in Wijsen and Borsboom 2021, 335). A century earlier, Boring had warned of just this disconnect: “it is senseless to seek in the logical processes of mathematical elaboration a psychologically significant precision that was not present in the psychological setting of the problem” (1920, 33), but the warning immediately fell on deaf ears. The presupposition that psychological attributes are quantitative was elevated to axiomatic status, constraining psychologists to impose upon the phenomena of test performance a level of mathematical complexity not known to be present in the psychological setting of testing.
As a consequence, psychometrics is neither a quantitative science (because relevant psychological attributes have not been shown to possess quantitative structure) nor a rational science (because valid criticisms are ignored) and to the extent that it is science, it is pathological (Michell 2000, 2008) because these attitudes subvert the scientific aim of finding the causes of test performance. And yet, despite that, psychometrics always was, is now, and apparently ever will be a thriving technology (as measured, say, by marketing success). However, because this success involves presenting tests as instruments of scientific measurement, that is, as something they are not, psychometrics is a myth-based technology.
Psychometrics: Myth-Based Technology
Psychometrics always had technological aspirations. Long before psychological tests were invented, Galton looked forward to the day when a system of competitive examination for girls, as well as for youths, had been so developed as to embrace every important quality of mind and body, and where a considerable sum was yearly allotted to the endowment of such marriages as promised to yield children who would grow into eminent servants of the State. (Galton 1865, 165)
Psychometrics came into being to solve this eugenic “problem” of selecting couples adjudged fit to procreate. In step with Galton, Spearman announced, “the eugenic problem from which we set out has reached a definite solution in the theory of ‘two factors’” (1914, 229), thinking factor analysis enabled measurement “of a minimum index to qualify … above all, for the right to have offspring” (Hart and Spearman 1912, 79). 11
Psychometrics did not invent the conviction that psychological attributes are quantitative, but it did invent the technology of testing. The presupposition that mental attributes are quantitative predates psychometrics by more than two millennia, its most influential expression coming from Plato, who entertained the idea that our salvation depends upon finding the right measures of pleasure and pain. This quantitative presupposition was part of a continuous intellectual tradition enduring through to the British utilitarians, Galton’s contemporaries (Michell 2023b). Strange as it seems to modern minds, no measurement technology emerged from this tradition, whereas such a technology was pivotal for psychometrics.
There is no need to trace this technology’s history because its achievements are known: the work of Binet and Simon in identifying children needing special education, thereby creating the prototypes for future intelligence tests; the work of Goddard in adapting these tests to US conditions and, similarly, of Burt in Britain; the work of Terman in recasting intelligence tests as instruments for measuring IQs; of Yerkes in devising group tests for army recruits during World War I; and further innovations by Thurstone (Jones and Thissen [2007] summarise this history). The primary focus of psychometrics was always technological, and substantive psychological theories played second fiddle to theories of test construction. Furthermore, the latter are not treated as raising empirical questions for investigation, but as providing answers enabling test construction. The presupposition that psychological attributes are measurable quantities is an axiom of this technology. As a result, psychometrics is guided by a myth-based technological paradigm.
Transferring Kuhn’s paradigm concept from the philosophy of science to technology studies, Dosi (1982, 152) defined a technological paradigm as a “‘model’ and a ‘pattern’ of solution of selected technological problems, based on selected principles derived from natural sciences and on selected material technologies.” Clearly, he had technologies utilising natural science in mind, but not all technologies are so based. For example, horoscopes are constructed using non-scientific, astrological principles, and the construction of tests as instruments of psychological measurement has always preferred wishful thinking to scientific investigation. Hence, Dosi’s definition is broadened here by replacing his phrase “based on selected principles derived from natural science” with “based on selected pragmatic principles.” This change serves to include not only the paradigm guiding horoscopes but also that guiding psychometrics and it indicates an important difference between scientific and technological paradigms.
Scientific paradigms track truth, in the sense that the aim of science is to discover the structure and ways of working of the systems under investigation; however, the focus of technological paradigms is different. Originally, construction of technological artefacts may have been guided by the belief that the procedures used deliver products fit for special uses. However, once technologies were mass-produced in market economies, the focus shifted. From then on, technological paradigms tracked markets. In market economies, marketing success trumps truth and in disciplines like psychology, where substantive theories and outcomes of interventions can be ambiguous, marketing success seems to provide a more tangible criterion than does scientific success. Hence, packaging tests as instruments of scientific measurement is a marketing strategy, not a truthful assessment.
Thus, psychometrics is the scene of conflict between two paradigms, scientific and technological. According to its technological paradigm, packaging tests as scientific measurement devices is a profitable marketing strategy, which logical critiques directly threaten. According to its scientific paradigm, packaging tests as scientific measurement devices misrepresents them. However, in psychometrics, “When strict logic conflicts with practical utility, it is utility that usually wins, as it probably should” (Ebel and Frisbie 1965 [1991], 31). With this use of “should” the mystery of the missing corpus is solved: psychometrics’ technological paradigm holds the reins and logical critiques are bad for business. 12
The reign of its technological paradigm over the discipline was tested in the 1950s when the suggestion was made to recast the use of testing technology as actuarial prediction (Meehl 1956) and to replace the rhetoric of scientific measurement by that of decision theory (Cronbach and Gleser 1957). Both suggestions fitted psychometric practice better than did the rhetoric of scientific measurement; nonetheless, that rhetoric prevailed, its marketing superiority being obvious. The dominance of the myth-based technological paradigm was confirmed.
In the present cultural environment this dominance shows no sign of abating. For the past century, Western society, seized by “metric fixation” (Muller 2018, 17), has deified numerical indices because of a naïve trust in numerical data (naïve because numbers offer no more security than anything else). Exploiting the fact that quantification is “a technology of distance” 13 (Porter 1995, ix), these indices enable the exclusion of those adjudged unworthy of receiving various social opportunities by cynically claiming to measure relevant human attributes (cynical because decision-makers falsely assume the authority of scientific measurement). With its dazzling statistical models producing its endless array of questionable metrics, psychometrics represents the gold standard of the modern metric fad.
Nonetheless, the fact that the hidden critical corpus has steadily accumulated means that some psychologists march to the drumbeat of the scientific paradigm and decline to endorse the presupposition that tests measure psychological attributes. Opposing the mainstream, they believe that “We should not wish nature to accommodate itself to what seems better ordered and disposed to us; we should rather accommodate our own intellect to what nature has made, in the certainty that this is the best and only way” (from a letter of Galileo to Prince Cesi, 30 June, 1612; Galluzzi 2014 [2017], 114).
Given its social prominence, psychometrics’ pathological deviation from its scientific paradigm requires a sociological explanation. Ken Richardson has identified one, but not the only, social cause: In effect, then, Galton’s aim, and that of his followers, became simply an attempt to reproduce an existing set of ranks (social class) in another, the test scores and pretend that the latter is a measure of something else. This is, and remains, the fundamental strategy of the intelligence-testing movement, and this gloss over the fundamentals of fully scientific measurement is what has dogged it throughout this century. Of course, the quantitative nature of the test helped to create the impression of scientific measurement. (Richardson 2000, 27)
However, what this particular cause implies is that those individuals affected, especially those deemed unworthy to receive social opportunities, have an interest in knowing not only that tests are not instruments of scientific measurement, but also that psychometrics ignores the critiques showing this, while at the same time reaping social rewards by presenting its technology as something it is not known to be. In this sense, and not inappropriately, the renegade philosopher R. G. Collingwood thought it a “fashionable scientific fraud” (1939, 95; see Michell 2020a).
Upshot
Given human fallibility, criticism is a necessary corrective assisting scientific progress. However, because critics question cherished convictions, those motivated to serve science need to be aware of the tactics used to deflect criticism. Psychometrics provides ready examples.
First, Spearman attempted to discredit Thomson’s critique by mocking what he called “Thomson’s unfortunate induration of thought” (1927a, 325). This was an ad hominem response, attacking the critic, not the critique. Whatever Thomson’s state of mind, the only important thing about his critique, from a scientific point of view, is its logical relationship to Spearman’s theory. Second, Kelley’s attempted rebuttal of Boring’s critique is a case of petitio principii (that is, begging the question). Kelley’s starting point was that “our mental tests measure something” (1929, 86), which presumes the very thing Boring was questioning. Third, the desultory reaction to Smith’s critique amounted to turning a blind eye to criticism. This marked a transition from an earlier reliance upon specious arguments, which at least recognised the existence of critiques, to the post-Stevens stance of wilful blindness, which treated critiques as non-existent.
In part this was a consequence of methodological triumphalism. Modern psychology, belittling the armchair methodology of the earlier philosophical psychology, based itself upon the conviction that physics’ success was due to the methods of experiment and measurement, and that these methods were therefore necessary in psychology. This presumption was fervently embraced, and the aping of these methods seemed triumph enough for psychologists. There was little recognition that, as with all methods, scientific success is contingent upon specific empirical conditions in the phenomena investigated. In the case of measurement, the central condition necessary for scientific success is that the relevant attributes possess quantitative structure. In the absence of this, claiming measurement is not just misleading (in the sense of presenting a method as something it is not known to be); it is scientifically stultifying.
However, psychometrics is such a successful technology that its defects as a science escape notice, except by those contributing to the critical corpus. By the mid-twentieth century, psychometrics’ deficiencies were exposed. The fact that few attended to them displays psychometrics’ woeful critical culture and the wider culture’s metric fixation. Cocooned in a psychometric culture craving measurements and a wider culture craving convenient metrics, a discipline marketing a flourishing technology, premised upon the belief that its “measurements” are key to its scientific image, and this image key to its marketing success, has no incentive to aim higher.
Footnotes
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
