Abstract
Limits of the Numerical calls for the (re)contextualization of the numerical in the social domain and emphasizes that using quantitative data has epistemic and practical/moral considerations that may not align. In this review essay, we evaluate these claims using a case study, viz. the personalized, clinical experience sampling method (ESM) in mental health care. This case study (1) nuances claims made in Limits of the Numerical regarding the generality and non-contextuality of numerical data, and (2) highlights two additional dimensions to the numerical that have been underexposed in the book (i.e., reactivity and Ballung concepts).
1. Introduction
Numerical data takes up a prominent position in the social domain and is frequently taken as decisive in personal and policy decisions. In recent years, however, society has begun to reevaluate its relationship to the numerical. Climate skepticism, for example, demonstrates the fragility of our “trust in numbers” (and our trust in the experts using them), and the introduction of big data in policy, education, and health care has raised new questions regarding knowledge and expertise (Davies 2017). Limits of the Numerical, edited by Christopher Newfield, Anna Alexandrova, and Stephen John (2022), fits into this development. It shows that numerical data “are both more influential and more fragile than before - both more and less authoritative, harder to contest and more contested” (Newfield, Alexandrova, and John 2022, 11). The book presents a timely account of how numbers “might and should work in our social and political lives” (Newfield, Alexandrova, and John 2022, 9). Its authors reflect on the role of quantification in such areas as climate science, higher education, and well-being from different perspectives, including philosophy, history, anthropology, economics, and cultural studies. This review essay highlights the book’s relevance for philosophers of the social sciences and explores its main theses in light of the recent personalized measurement trend in mental health care.
The starting point of Limits of the Numerical is the so-called “Original Critique” of quantification of social phenomena, articulated by historians of science and philosophers of public administration (e.g., Desrosières 1998; Espeland and Sauder 2016; Espeland and Stevens 1998; Hacking 1990, 1995; Nirenberg and Nirenberg 2021; Poovey 1998; Porter 1992, 1994, 1995; Power 1997; Shore and Wright 2015). The Original Critique challenged the objectivity and value-free nature of quantification in the social domain, arguing that numbers presuppose categories that encode specific (dominant) worldviews, histories, and goals. More specifically, the Original Critique claimed that the infiltration of numbers in public administration coincides with the erasure of contextual, informal, and qualitative knowledge. Limits of the Numerical builds upon the Original Critique by presenting a more nuanced perspective on (social) quantification. While the authors agree that values shape how and why we quantify, they move forward from earlier debates by addressing how we can bring the numerical in “epistemic parity” with non-numerical, qualitative information (2022, 18, 63–66). Thus, the authors acknowledge that numbers can do useful work in the social world, for instance by mobilizing people and exposing injustices (i.e., statactivism). Using numbers brings with it epistemic and moral/practical risks as well as benefits; the book explores this trade-off. As such, the authors find that the use of numbers in the social domain should be judged on a case-by-case basis.
Limits of the Numerical delivers the above nuanced critique through a deep-dive into different research themes. In the first part of the book, Elizabeth Chatterjee (Chapter 1) and Christopher Newfield (Chapter 2) explore the relationship between quantification, expertise, and populism in British and US politics. These chapters demonstrate how “quantocratic” politicians have used numbers in a deterministic and decontextualized manner in policy reform, and how this has led to an erosion of trust (for instance by “overruling” personal experience). The second part of the book addresses how qualitative information can strengthen or contextualize quantitative information, and vice versa. Heather Steffen (Chapter 3) discusses how narratives strengthen the quantitative culture of higher education audits, Trenholme Junghans (Chapter 4) explores how qualitative knowledge is regaining terrain in the evaluation of drugs for rare diseases, and Laura Mandell (Chapter 5) addresses how the digital humanities can support the interpretation of literary case histories. The third part of the book addresses moral issues related to the use of numbers in the social domain. Stephen John (Chapter 6) explores whether epistemically “bad” numbers can effect positive social or personal change in the case of dietary advice, and Gabriele Badano (Chapter 7) uses a political philosophy perspective to discuss numbers’ role in political decision-making, specifically in relation to public justification. The final part of the book addresses potential trade-offs between epistemic and practical/moral considerations when using quantification in the context of well-being (Anna Alexandrova and Ramandeep Singh, Chapter 8) and extreme weather attribution (Greg Lusk, Chapter 9). Finally, Aashish Mehta and Christopher Newfield (Chapter 10) discuss how the humanities and economics can bridge their views on the benefits of higher education, exploring their respective positions on quantification.
In short, all chapters call for the (re)contextualization of the numerical in the social domain and for the weighing of epistemic and practical/moral considerations when using numbers in the social context.
Limits of the Numerical deserves special attention from philosophers of the social sciences. The book emphasizes the importance of philosophical analyses for reflecting on the quantification of social phenomena: “it is only by stepping outside quantitative frameworks that we can get a fuller appreciation of what those frameworks do, how they do it, and whether they do it badly or well” (Newfield, Alexandrova, and John 2022, 15). However, the book also urges philosophers of social science, and philosophers of measurement more specifically, to take qualitative, contextual information seriously. For instance, Alexandrova and Singh (Chapter 8) show that philosophers of measurement who study social phenomena like well-being should be willing to wear both “the hat of a philosopher-scientist” and that of a sociologist (2022, 198). Discussing measurements of well-being, they argue that philosophers of science should consider both the validity of the measurement they study and “the rhetorical and pragmatic role this number plays in politics, governance, and public debate” (2022, 198). Lusk (Chapter 9) emphasizes the dual perspectival nature of quantitative measures: numbers are obtained (or collected or created) from a specific vantage point but used by social actors with a potentially different vantage point. Whereas researchers would ideally like these vantage points to align, this is not necessarily the case. Limits of the Numerical thus adds to recent debates in philosophy of measurement that emphasize the practical/moral side of measurement (cf. Alexandrova 2017; Alexandrova and Fabian 2022).
In the remainder of this review essay, we will explore the main premises of Limits of the Numerical in light of the recent personalized measurement trend in mental health care. Numerical data takes up an increasingly prominent position in mental health care. In recent years, therapists and clinical researchers have expressed interest in data science, hoping that this reliance on the numerical will improve clinical practice (Russ et al. 2018; Rutledge, Chekroud, and Huys 2019). One prominent example that has received only limited attention from philosophers of measurement and social science is the suggested use of the experience sampling method (ESM) in mental health care, that is, clinical ESM. In what follows, we will use this case to evaluate and expand on the “limits of the numerical” posed in the book. First, we argue that the “limits of the numerical” that the book presents also apply to this case study. Afterwards, we explore how clinical ESM nuances some of the claims made in Limits of the Numerical. Finally, we show that clinical ESM highlights two dimensions of the numerical that have been underexposed in Limits of the Numerical.
2. Experience Sampling Method
ESM is a structured self-report technique that assesses an individual’s momentary states, for example, their thoughts, feelings, activities, and the company they are in at the time of reporting. ESM allows individuals to monitor their experiences several times per day, usually for multiple days or weeks, generally on their smartphones (Shiffman, Stone, and Hufford 2008). ESM is used in psychological research to test hypotheses that cross-sectional methods cannot address (for reviews, see Myin-Germeys et al. 2009, 2018). Here, we focus on the suggested use of personalized ESM data in mental health care, that is, clinical ESM. Clinical ESM is not intended as a screening tool (as is the case for self-report questionnaires such as the Patient Health Questionnaire-9, PHQ-9; Kroenke and Spitzer 2002), nor for interpersonal or inter-group comparisons. Rather, clinical ESM investigates how factors that influence someone’s mental well-being develop and co-occur over time. More specifically, clinical ESM generates questions that serve as a starting point for clinical dialogue (Epskamp et al. 2018; Wichers et al. 2019; von Klipstein et al. 2020), improves case conceptualization (von Klipstein et al. 2020), and might even provide treatment targets (David et al. 2018; Rubel et al. 2018). In addition, the co-construction of ESM items could strengthen collaboration between the client and the therapist (von Klipstein et al. 2020), and filling out ESM questionnaires may stimulate the client’s awareness, reflection, and insight (Bos et al. 2019; Kramer et al. 2014). Accordingly, various pilot studies suggest that the use of clinical ESM can benefit therapeutic practice (e.g., Bak et al. 2016; Frumkin et al. 2021; Kroeze et al. 2017; von Klipstein et al. 2023a). In short, clinical ESM could serve both an epistemic and a practical purpose in mental health care.
To what extent do the claims in Limits of the Numerical apply to clinical ESM? In line with the book’s main premise, measurement decisions in clinical ESM are neither straightforward nor value-free. In what follows, we will show how each of the three steps necessary for obtaining numerical data in clinical ESM measurement (variable selection, item selection, and response selection) is driven by both epistemic and practical/moral considerations. Let us now discuss these three steps in more detail.
2.1. Variable Selection
Clinical ESM is based on the assumption that a person’s mental health is influenced by both their mental states and their context (i.e., their activities, environmental factors, and social factors). This means that clinical ESM is not bound to a pre-set list of variables (as is the case in clinical questionnaires). Instead, it can include different variables that are considered important for an individual’s mental health. As such, the client and their therapist must decide what variables to include in that client’s personalized ESM questionnaire. Variable selection will therefore depend on the specific problems and questions that the client and their therapist are interested in, and is ideally done in close collaboration (Bos et al. 2019; von Klipstein et al. 2020). For instance, the personalized, clinical ESM questionnaire developed by von Klipstein et al. (2023a) includes variables ranging from “rumination” to “physical discomfort,” “having company,” and “gaming.” However, variable selection is constrained by both practical and epistemic considerations. For instance, the therapist and the client could decide to limit the number of included variables to make the ESM questionnaire less burdensome (e.g., Myin-Germeys et al. 2018). Moreover, variable selection may be constrained by the specific data models one would like to use to analyze and summarize the data (for reflections on the constraints of personalized network models, see de Boer et al. 2022; von Klipstein et al. 2020).
2.2. Item Selection
After the variables of interest have been selected, they must be transformed into items, that is, statements or questions that the client should answer. The content of an item (i.e., its conceptualization, cf. Cartwright and Runhardt 2014) can differ depending on whether the client and the therapist are interested in that client’s current experiences (e.g., “I have a sense of belonging”) or their recent experiences (e.g., “Since the last beep, I have felt a sense of belonging”). However, after making this decision, there are still multiple ways to transform variables into items. For instance, the variable “feeling alone” can be conceptualized as “I feel like an outsider” or “I would prefer to have company.” So, the final conceptualization that the therapist and the client decide upon is influenced by epistemic as well as practical considerations.
2.3. Response Selection
Once the client and the therapist have decided on the right item, they must still decide what type of response the item requires (akin to the representation stage of measurement in Cartwright and Runhardt (2014) and Bradburn, Cartwright, and Fuller (2017)). Most commonly, clients must answer an item on a numerical scale. Indeed, most items in the ESM Item Repository, an open-access repository of existing ESM items (Kirtley et al. 2023), use an ordinal Likert scale (e.g., “I feel anxious,” 1 = not at all, 7 = very much). The repository also includes items that require nominal responses (e.g., “Are you alone?” yes/no), or answers that are not on a scale at all, but are open-ended instead (e.g., “Think about the most positive event of today. What was it?”). An item can require different types of responses: response selection that meshes with the item is rarely straightforward and will be driven by more than just epistemic considerations.
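The three selection steps can be summarized as a small data structure. The following is a minimal illustrative sketch in Python and does not describe any existing ESM software: the class name `ESMItem`, its fields, and the `validate` helper are hypothetical, and only the example wordings are taken from the items quoted above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class ESMItem:
    """One questionnaire item; all names here are hypothetical."""
    variable: str                               # step 1: variable selection
    wording: str                                # step 2: item selection
    response_type: str                          # step 3: "likert", "nominal", or "open"
    scale: Optional[Tuple[int, int]] = None     # bounds, only for Likert items
    options: Optional[Tuple[str, ...]] = None   # categories, only for nominal items

    def validate(self, answer: Union[int, str]) -> bool:
        """Return True if the answer fits the chosen response type."""
        if self.response_type == "likert":
            low, high = self.scale
            is_int = isinstance(answer, int) and not isinstance(answer, bool)
            return is_int and low <= answer <= high
        if self.response_type == "nominal":
            return answer in self.options
        if self.response_type == "open":
            return isinstance(answer, str) and answer.strip() != ""
        raise ValueError(f"unknown response type: {self.response_type}")


# The same questionnaire can mix all three response types.
anxious = ESMItem("anxiety", "I feel anxious", "likert", scale=(1, 7))
alone = ESMItem("being alone", "Are you alone?", "nominal", options=("yes", "no"))
event = ESMItem(
    "positive event",
    "Think about the most positive event of today. What was it?",
    "open",
)
```

The sketch makes visible that each numerical answer is only interpretable relative to three prior choices (variable, wording, and response format), none of which is fixed by the data themselves.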
So, the main claims regarding the value-ladenness of quantification that Limits of the Numerical presents are also visible in clinical ESM: the ESM data obtained will depend on both epistemic and practical decisions made in variable, item, and response selection. However, clinical ESM also brings new insights into (the limits of) the numerical to the fore. In the next section, we will use clinical ESM to nuance some of the statements made in Limits of the Numerical regarding the generality and non-contextuality of numerical data.
3. Personalized and Contextualized Quantification
A main criticism presented in Limits of the Numerical is that exclusively using numerical data causes us to lose sight of individual nuances and personal context. To illustrate, consider the criticisms that quantification “[s]implifies, accompanied by loss of interpretive complexity, local context, and qualitative experience” (Newfield, Alexandrova, and John 2022, 13) and “[s]ilences and causes loss of political agency when numbers are used to bypass vernacular, standpoint, or subaltern knowledges” (2022, 13). We argue that the clinical ESM case nuances this criticism in two ways.
First, clinical ESM highlights that the “depersonalization” that can accompany numerical data is not necessarily the result of the data’s numerical nature (it is not the use of numbers that depersonalizes data). Rather, it arises out of the averaging that generally accompanies the numerical. Glossing over or bypassing individual differences within a study population is a wider criticism that many philosophers of science (cf. Steel 2008 for an introduction) and psychometricians (e.g., Molenaar and Campbell 2009) have previously raised. Clinical ESM is thought to address this worry by providing “personalized” numerical data based on self-reports that are not averaged over different people.
Second, clinical ESM demonstrates that the numerical can take local context into account. That is, (1) clinical ESM data is collected in an ecologically valid manner (i.e., the data is collected in the real world rather than in a laboratory setting), (2) clinical ESM items include questions relating to the individual’s context, and (3) clinical ESM pilot studies show that the quantitative can be embedded directly in a qualitative setting: ESM-based data models are discussed and questioned in the clinical context.
Of course, this does not mean that there is no risk of numerical data getting excessive weight from either the client or the therapist. In fact, von Klipstein et al. (2020) state that therapists should be trained in using and interpreting ESM-based data models, because “[t]he collection of ‘objective’ data, ... create[s] an appearance of objectivity that is not supported by firm evidence” (6). Nonetheless, clinical ESM presents a case that stands a good chance of reaching “epistemic parity” between the qualitative and the quantitative.
4. Additional Limits of the Numerical
In addition to the nuances discussed above, clinical ESM introduces new limits of the numerical which could be fruitful additions to those presented in the book. There are two limits that we consider especially salient for philosophers of social science and measurement, and which will form the remainder of this review: the reactivity of quantitative measurements, and the consequences of using Ballung concepts for measurement.
4.1. Reactivity
Reactivity refers to situations in which an individual changes their attitudes and behavior in response to being measured. The influence of quantification on behavior has been addressed in the book, for instance in the context of spurious precision. John (Chapter 6) defines spuriously precise numbers as “numbers that are justifiable, given our evidence, theories, and conceptual frameworks, but where other numbers would be equally justifiable” (2022, 145). John’s main example is the UK health advice slogan urging people to eat “five fruit and veg a day.” The number five is spuriously precise in this context; after all, eating six portions of fruit and vegetables a day may be equally justifiable. However, John argues that spurious precision is not necessarily problematic: in this specific case, it simplifies and provides a guide for action. Spurious precision highlights that numbers (such as the number five in the “five fruit and veg a day” slogan) can affect our behavior. However, reactivity broadens this insight by showing how having to put one’s behaviors, thoughts, feelings, and context on a numerical scale may influence both a person’s behavior and attitudes about themselves.
To illustrate, let us imagine a scenario in which a client uses ESM to explore, among other things, their sleep quality. They have to answer the question “Did you sleep well?” on a daily basis, using a Likert scale from 1 (“not good”) to 5 (“very good”). At the start of the ESM trajectory, they rate their sleep as “average” (3) on nights when they lie awake for two hours. Their reason for doing so is that they have always been a poor sleeper, so to them, lying awake for two hours is just an average night. After filling out the ESM questionnaire for a couple of weeks, they discuss the results with their therapist. They are shown an ESM-based data model, for instance, a partial correlation network, that demonstrates that their sleep quality is positively correlated with concentration problems and anxiety. This prompts the therapist to discuss the role of the client’s sleep in their mood. What kind of influences could the measurement and subsequent conversation have on the client? Consider three scenarios. In scenario #1, the client takes active measures to improve their sleep hygiene and in turn scores higher on average on the sleep quality item. In scenario #2, the client re-evaluates how they grade their sleep quality: instead of grading their sleep based on how they sleep on average, they grade their sleep based on whether they feel rested the next morning. Finally, in scenario #3, the client changes what it means for them to sleep well. They may realize that the grade they give to their sleep quality should not only be determined by whether they woke up at night, but also by whether they went to bed later than average. This example illustrates how giving answers regarding one’s behavior on a numerical scale may subsequently influence one’s behavior and attitudes.
Reactivity is frequently discussed in relation to ESM, because the intensity and frequency of ESM assessment may increase the chance of reactivity taking place (see Eisele et al. 2023 for a recent review and empirical study). However, does reactivity always constitute a limit to the numerical? Reactivity is usually framed negatively, namely as a risk to ESM’s ecological validity (e.g., Myin-Germeys et al. 2009; Palmier-Claus et al. 2011; Wray, Merrill, and Monti 2014). But, in line with John’s analysis, we may ask: does it have to be? To this end, we can explicate different types of reactivity that could take place in psychological measurement (cf. Runhardt 2021). Let us go back to the example. Scenario #1 provides an example of alpha reactivity, that is, a change in the numerical outcome of a measurement due to the individual being measured (Golembiewski, Billingsley, and Yeager 1975; Runhardt 2021). This measurement can still be ecologically valid: the question is interpreted in the same way as before, only the client’s behavior has changed. However, both scenarios #2 and #3 show that the numerical can change how the client thinks about their sleep quality in response to being measured. This type of reactivity is referred to as gamma reactivity (Runhardt 2021; cf. McClimans et al. 2013; Fabian 2022), that is, a change in the numerical outcome of a measurement due to a change in how the phenomenon being measured is characterized. Gamma reactivity could influence the ecological validity of ESM: because the client has “recalibrated” the Likert scale and/or reconceptualized sleep quality, we do not know whether the client’s sleeping patterns really improved or worsened over time. So, as in John’s analysis of spurious precision, the potential harm of reactivity should be decided on a case-by-case basis.
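The difference between the two types of reactivity can be made concrete with a toy calculation. The sketch below is purely illustrative and is not an ESM analysis method: the function name `score_sleep`, the idea of personal cutoffs on hours spent awake, and all numbers are assumptions invented for this example. It shows that a reported score can change because behavior changes (alpha), because the client's use of the scale changes (gamma), or can stay the same while both change at once.

```python
from typing import Sequence


def score_sleep(hours_awake: float, cutoffs: Sequence[float]) -> int:
    """Map hours spent awake at night to a 1-5 sleep score.

    `cutoffs` are four hypothetical personal thresholds in descending order;
    the score is 1 plus the number of thresholds the night stays below.
    """
    return 1 + sum(hours_awake < c for c in cutoffs)


# Hypothetical cutoffs of a self-described poor sleeper at the start.
initial_cutoffs = (4.0, 3.0, 1.5, 0.5)
# Hypothetical stricter cutoffs after the client rethinks what good sleep is.
recalibrated_cutoffs = (3.0, 1.5, 0.75, 0.25)

# Start of the trajectory: two hours awake counts as an "average" night.
baseline = score_sleep(2.0, initial_cutoffs)

# Alpha reactivity: sleep itself improves, the scale is used as before.
alpha = score_sleep(1.0, initial_cutoffs)

# Gamma reactivity: sleep is unchanged, but the scale is recalibrated.
gamma = score_sleep(2.0, recalibrated_cutoffs)

# Both at once: a real improvement is hidden by the stricter scale.
masked = score_sleep(1.0, recalibrated_cutoffs)

print(baseline, alpha, gamma, masked)
```

Under these assumed numbers, the score rises in the alpha case, falls in the gamma case although sleep is unchanged, and stays at its baseline value in the combined case although sleep improved. The scores alone therefore cannot tell us which scenario occurred.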
This example shows how clinical ESM broadens the issue of spurious precision already present in the book, by showing us other ways in which the use of (numerical) measurement may affect the individual being measured (from their behavior and attitudes to the scales they use and the concepts they use to think about themselves). While Limits of the Numerical provides a starting point for the study of reactive measurement, it is by no means the final word on the issue. In the next section, we will discuss a second issue for which this is the case: ESM’s decision to utilize Ballung concepts in its questionnaire items.
4.2. Ballung Concepts
Ballung concepts are conceptualizations of phenomena that are vague and can be interpreted in multiple ways (cf. Cartwright and Runhardt 2014). We see various instances of Ballung concepts in clinical ESM. For instance, von Klipstein et al. (2023a) included the following item in their personalized clinical ESM questionnaire: “Since the previous beep, I avoided something.” This item can be interpreted in different ways. One may say “yes” to this question if one has avoided a confrontation, but also if one avoided working on a cognitively challenging task at work. Why, then, do ESM researchers utilize Ballung concepts in their items? One reason might be exactly what the authors of Limits of the Numerical have picked up: the criticism that quantification can obscure “multidimensionality and heterogeneity of values” (Newfield, Alexandrova, and John 2022, 13). Indeed, this specific item was included in collaboration with the client; they were able to give their own personal, meaningful interpretation of the conceptualization. So, using Ballung concepts (such as “avoiding something”) to personalize items places the burden of conceptualization (and, to some extent, representation) on the client.
While using Ballung concepts in ESM could answer the criticism of quantification that individual narratives and values are often ignored in measurement (Newfield, Alexandrova, and John 2022, 13), it brings with it a host of new issues, of which we will highlight two here. First, the decision to use Ballung concepts means we cannot know whether some data collected for a given individual is at all comparable to that of other individuals, let alone whole populations. Thus, an overemphasis on personalization brings with it issues of external validity. However, given our focus on the N = 1 cases of clinical practice, this problem can be circumvented. The second problem is more pressing, however: the decision to measure in ESM with Ballung concepts means that without further analysis, we cannot be sure a numerical value means the same thing at different times. This latter problem is also recognized by psychometricians. For instance, von Klipstein, Stadel, et al. (2023) explore the lack of context-specificity in personalized ESM measures, also referred to as the “contextual black box” (Mestdagh and Dejonckheere 2021). Take the variable “being in company.” Not every type of company is the same: one may get along better with some people than with others. Hence, whether one enjoys being in someone’s company will depend strongly on who that person is. Researchers have suggested that this lack of context-specificity can be overcome through both personalization and further contextualization. For instance, response items could be personalized (e.g., selecting the type of company rather than just yes/no), and extra contextual information can be provided by including more open-ended response options (von Klipstein et al. 2023b). So, the use of Ballung concepts in clinical ESM may serve a purpose, but may also introduce new limits of the numerical.
5. Conclusion
Limits of the Numerical provides a thorough, nuanced, and timely account of the role of quantification in the social domain that offers relevant insights for philosophers of the social sciences. The case studies presented in the book demonstrate the epistemic and practical/moral benefits and limits of numbers in the social domain, and how we can make efforts to (re)contextualize the numerical. This review essay showed that the issues that the book addresses can also shed light on the use of the numerical in other domains, such as mental health care. By exploring how ESM data is obtained and used in the clinical context, we showed that numbers can be both personalized and contextualized, which nuances some of the claims made in Limits of the Numerical. The hopeful message that Limits of the Numerical wants to bring across is that bridges between the quantitative and the qualitative can be made; clinical ESM demonstrates this possibility. In addition, we showed that clinical ESM highlights dimensions of the numerical that have been underexposed in the book. All of this suggests that Limits of the Numerical can serve as a starting point for future analysis of the limits, and benefits, of the use of numbers in the social domain.
Acknowledgements
We would like to thank the participants of the 2023 Annual Conference of the European Network for the Philosophy of Social Sciences, and our co-panelist Chris Newfield in particular, for the fruitful discussion of Limits of the Numerical during our book symposium.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Rosa Runhardt’s contributions to this article were supported by the Spanish Ministry of Science and Innovation under grant PID2021-125936NB-I00, “Evidence and Mechanisms in the Social Sciences” (EviSoc).
