Abstract
The main question of this paper is how people may come to agree in their interpersonal comparisons of well-being. These comparisons are important in social ethics and for policy purposes. The paper first examines grounds for convergence in easy cases. It then turns to a more difficult case of low convergence in order to explore a way to increase it. For this, concepts from the empirical subjective well-being literature are used: life satisfaction and vignettes. Ideas of John Harsanyi and Serge Kolm thereby receive a new look.
Introduction
The comparability of well-being between people is important for questions of social ethics and economic policy. Policy makers must often decide how to distribute scarce resources, and the way in which alternatives are going to impact well-being for different groups of citizens will then be one of the major criteria. The central question of this paper is how different comparers (or spectators) might agree in making their comparisons. How could there be convergence among them? After all, even if various spectators are able to make interpersonal comparisons, that says nothing about whether they will agree in their estimates. This seems especially difficult if the spectators are cut from different cloth. This paper examines grounds for convergence and ways to increase it. To this end, it uses desire satisfaction as the concept of well-being and makes the initial assumption that spectators compare by means of empathy.
My general argumentative strategy is to start with relatively easy cases of comparability and then work toward more difficult cases. The easy cases serve to explore the possibilities of convergence. The productive elements carry over to the more difficult cases, but they are not sufficient there. The more difficult cases reveal sources of divergence, which call for dampening.
The structure of the paper is as follows. Section Background: Harsanyi and Kolm gives some background. Section One to one addresses how empathy functions for one spectator examining just one person. Here I shall make use of recent work on folk psychology. Section One to two, and two to two extends this to one spectator looking at two people and then to two spectators looking at the same two people. This works by adding a similarity metric to the folk psychology. Section Many to many, life satisfaction introduces a more difficult case with more spectators and more people being compared: a group of councilors in a municipality who must decide whether to build a museum or a sports facility for the inhabitants. To reduce sources of divergence, firstly “life satisfaction,” from the empirical literature on subjective well-being (SWB), is proposed as a proxy for desire satisfaction. Secondly, section Vignettes discusses a procedure from the same empirical literature to calibrate scales, viz. anchoring vignettes. Thirdly, section Redesigning vignettes extends the vignette approach in order to reduce the specific problem that different spectators might be using their own (and varying) ideas and values when comparing people. Section Comparing with two other approaches: equivalent income and extended preferences briefly compares the proposal with these two alternatives. Section Conclusion concludes.
Background: Harsanyi and Kolm
There is no doubt about the fact that people do make, or at least attempt to make, interpersonal comparisons of utility, both in the sense of comparing different persons’ total satisfaction and in the sense of comparing increments or decrements in different persons’ satisfaction. Harsanyi (1955: 316, 317)
In this early (1955) article, the economist and philosopher John Harsanyi argues that utility in terms of satisfaction can be compared between people. The current paper takes a similar line. More precisely, I will understand well-being as psychological desire satisfaction, a substantive mental state. Hence I will not understand well-being in terms of formal preference satisfaction, as is done in much of modern economics.
What must be compared, according to Harsanyi, are alternatives as experienced by individuals. So one must not compare a museum with a sports facility per se but, say, a museum for Joe and a sports facility for Zoe. He calls these “extended alternatives.” A spectator can then form a utility function over these extended alternatives and possibly assign higher utility to the {sports complex for Zoe} than the {museum for Joe}. This spectator may for example derive more satisfaction from the museum herself but observe that Zoe and Joe have different dispositions, tastes and ideas than herself. By knowing the underlying nature and nurture factors that shape another person's level of satisfaction one can in principle infer any person's utility function. This is Harsanyi's “similarity principle” (1979: 301): (..) the principle that, given the basic similarity in human nature (i.e. the fundamental psychological laws governing human behavior and human attitudes), it is reasonable to assume that different people will show very similar psychological reactions to any given objective situation, and will derive much the same utility or disutility from it — once proper allowances have been made for any empirically observed differences in their biological make-ups, in their educational and cultural backgrounds, and more generally, in their past life histories.
Differences between individuals can thus find proper expression in utility functions over extended alternatives. People differ in their appreciation of cultural goods and thus we should not compare {museum} with {sports complex} but {museum for Zoe}, {museum for Joe}, {sports complex for Zoe}, and {sports complex for Joe}.
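One way to picture a utility function over extended alternatives is as a table keyed by (good, person) pairs rather than by goods alone. The following Python sketch is purely illustrative: the type name, the helper function, and all numerical values are assumptions made for the example, not estimates taken from Harsanyi or from this paper.

```python
from typing import Dict, Tuple

# An extended alternative pairs a good with the person who experiences it.
ExtendedAlternative = Tuple[str, str]

# Hypothetical satisfaction levels assigned by one spectator (illustrative numbers only).
extended_utility: Dict[ExtendedAlternative, float] = {
    ("museum", "Joe"): 0.6,
    ("museum", "Zoe"): 0.2,
    ("sports complex", "Joe"): 0.3,
    ("sports complex", "Zoe"): 0.8,
}


def higher_utility(a: ExtendedAlternative, b: ExtendedAlternative) -> ExtendedAlternative:
    """Return the extended alternative to which the spectator assigns higher utility."""
    return a if extended_utility[a] >= extended_utility[b] else b


# The spectator compares {sports complex for Zoe} with {museum for Joe}.
winner = higher_utility(("sports complex", "Zoe"), ("museum", "Joe"))
```

With these assumed values the spectator ranks {sports complex for Zoe} above {museum for Joe}, whatever her own taste for museums may be.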
Harsanyi not only claimed that utility (as psychological satisfaction) can be compared between people, but also that, given enough information, there will be a high degree of convergence between spectators. You and I (and anybody else) will form the same extended utility function, given the same information about Zoe and Joe and given that we use the same general psychological laws, so that we can speak of just one impartial spectator (reminiscent of Adam Smith's Theory of Moral Sentiments). Utility functions over a set of goods may differ between people, but there will be just one extended utility function over such a set of goods.
As said above, in the current paper, we will interpret satisfaction in the substantive way, as a mental state. Now in Harsanyi's work there are also places where he talks about interpersonal comparability of utility in the formal way of preference satisfaction. Apparently, he sometimes thought that spectators could also construct a universal extended preference ordering, given enough information. 1
(..) each individual's preferences will be determined by the same general causal variables. Thus the differences we can observe between different people's preferences can be predicted, at least in principle, from differences in these causal variables, such as differences in their biological inheritance, in their past life histories, and in their current environmental conditions. (1977: 58)
If two persons have preferences which appear to differ, there is a reason for this, there is something which makes them different from each other. Let us place this ‘something’ within the object of the preferences which we are considering (..). 2
It has, however, proven to be very difficult (if not impossible) to arrive at these conclusions through the framework of the modern preference satisfaction theory of well-being. 3
This paper does not add to this critical literature but explores what is possible if we leave this framework and use desire satisfaction as the concept of well-being. A number of critics of Harsanyi-like proposals have remarked that such a route will not encounter the same difficulties, but few have started to explore this possibility concretely. A notable recent exception is an article by Barrett (2019) that investigates the prospects of desire satisfaction. It considers three ways in which desire strength might be measured: by means of motivational strength, pleasure and pain, and reinforcement learning. If these factors can be objectively understood and measured, then we could compare desires across people by comparing desires in those underlying terms. Barrett (2019: 235) admits that this approach requires more work: Of course, whether this strategy is ultimately viable turns on various further questions in neuroscience (what is the physical basis of motivational strength?) and the philosophy of mind (can we identify motivational strength with its physical basis?). Further work, perhaps of an interdisciplinary nature, is therefore needed before anything more definitive can be said.
This paper agrees with Barrett's approach in exploring desire satisfaction for interpersonal comparisons of well-being. Life satisfaction is, however, an already established concept in the empirical well-being literature. That does not mean it is perfect, but it does contain sufficient ingredients to develop it further, or so it seems. It would be interesting to see if ideas like Barrett's can be combined with SWB concepts. Some work has already been done in this direction, viz. studies researching possible links between a variety of physiological factors and introspective SWB judgments. 4 It is, however, beyond the scope of this paper to explore this.
Another discussion item in the literature, which we do engage with, is that Harsanyi and Kolm think that all spectators will indeed agree and arrive at the same utility function. But why would that be? The point is raised by John Broome, who imagines a comparison between an academic who cherishes doing research and a financial analyst who holds material wealth dear. Broome asks why he himself, also being an academic, and now as a spectator, would make the same comparison as the financial analyst would make. Would not the cause of their different personal utility functions also make their extended functions different? Kolm seems to think that, by treating the causes that act upon me as objects of my preference, I can somehow withdraw myself from their influence. But I cannot escape from my own causal situation. (Broome, 1998: 37)
Broome has convincingly demonstrated that this does not work for the formal preferences account. But this does not imply that the idea of building agreement among spectators by means of causal information about differences cannot work if we go beyond this account and use a substantive notion of well-being (like life satisfaction).
Harsanyi and Kolm had ideas and arguments which remain promising, when put in a new jacket. I refer to these ideas as the paper develops: the idea of general psychological laws, the similarity postulate, and the idea of reducing divergence by means of tracing causal differences. A sub-goal of this paper is to argue that a revised combination of these ideas delivers convergence among spectators.
One to one
Harsanyi (1982: 50) said: “[t]he basic intellectual operation in (…) interpersonal comparisons is imaginative empathy.” To estimate how other people are doing, in their circumstances, one must exercise empathy. One must put oneself in their shoes, as the saying goes. I will assume that this is indeed the basic mechanism and that desire satisfaction is the central concept of well-being herein (to be complicated in section Many to many, life satisfaction). Now how does empathy work? I propose to investigate this by means of the recent literature on “folk psychology.”
In daily life, we constantly try to make sense of what other people are doing, how they perceive the world, what they want and what they find important. Successful interaction often requires this. What does this folk psychology amount to? The dominant view in this literature, until recently, was that this psychology basically consists of ascribing beliefs and desires to other people.
When one tries to understand and predict another person, one formulates a set of hypotheses about this person in terms of beliefs and desires. The contents of these hypotheses are not fixed; they can always be updated when new information comes in. What one keeps constant, however, as long as the other person is conceived of as a fellow rational being, are some structuring (or organizing) notions, namely that the person is striving toward the satisfaction of her desires, that her beliefs serve to track the truth, and that her beliefs and desires are largely consistent. 5 The notion of desire satisfaction is thereby already contained (albeit implicitly) in folk psychology. Considering an individual acting in a certain context, e.g. work or family matters, we assume that she makes choices that optimize her satisfaction, given the constraints that she is under.
A recent development in this literature paints a richer picture of folk psychology. 6 Beliefs and desires are still there, as are the structuring notions of truth tracking, satisfaction, and consistency, but they are now supplemented with other mental ascriptions like intentions, knowledge, emotions, stereotypes, and character traits. Traditional belief-desire theorists may argue that these other mental states can all be eventually reduced to beliefs and desires, and this may be true in some sense, but the point is that people actually use them as a further structuring device. In the vast sea of possible beliefs and desires that someone could hold in a given situation, these further ascriptions narrow down the feasible options and help to assign provisional weights. Perhaps it is true that to predict a specific piece of behavior one simply needs some final set of desires and beliefs that are causally going to produce the behavior, but to get a grasp on this final set we often need other desires and beliefs which we efficiently filter by means of ideas about someone's knowledge, intentions, stereotypes, character traits, and so on.
For the purposes of this paper, I now focus on stereotypes and character traits, because these are going to be of help in addressing the main question of this paper: how different spectators can find agreement in their comparisons. Stereotyping by means of gender, age, profession, clothing, social class, education, hairdo, and whatnot, offers an initial clue about a person's ideas and goals. Seeing a thin and bookish type strolling by, one may predict that she is on her way to the bookshop at the end of the street, not the fast food restaurant next to it. Some stereotypes are very general. Most of us suppose that almost any human being values health, a house, some good friends, a bit of money, leisure time, etc. One could say that “objective list” accounts of well-being catalogue such stereotypes but here only in a hypothetical fashion, since any particular individual can deviate from such lists. Almost everybody wants a house to live in, but mathematician Paul Erdős did not, living out of a suitcase and travelling around seminars, conferences and co-authors. Almost everybody wants a daily decent dinner but neurologist Oliver Sacks did not care much for decades and ate sardines out of a can each evening, standing at the sink. Such exceptions do not refute the practicality of the stereotypes. They do mean, however, that any stereotype can be updated on the basis of further information about a particular person.
Ascribing character traits also provides us with a set of helpful filters. We learn about typical behavioral dispositions and what a person finds important. Character traits also tell us how someone perceives the environment, sometimes even what affordances it provides. Looking at a street plan of a new city, Joe sees architectural analogies to explore while Zoe sees tracks to run, or a possible parcourse. Character traits also tell us what someone's most important values are, what they care deeply about and are committed to—e.g. the family dinner next week, some tournament, a conference, the garden.
This is how I propose to understand what Harsanyi called “general psychological laws.” They make up this general folk psychological structure of beliefs, desires, stereotypes and character traits, truth tracking, optimization, and internal consistency—with stereotypes providing provisional empirical content of a general nature and character traits of a more specific individual nature.
For understanding other people, it is not required to somehow experience or simulate their psychological states oneself. Introspection is not necessary. Adler (2014) discusses “the psychological strategy” as a way to escape the problem of comparing people when they are very different from oneself (he calls it “the essential attribute problem”). 7 The psychological strategy, according to Adler, would be to imagine not people's characteristics and their circumstances but directly try to experience how they would feel, see, and think. He criticizes this strategy for imagining impossible states of affairs when the subjects of comparison are very different. My approach here is different in two ways. Firstly, it circumvents introspection: one infers hypotheses about other people's mental states on the basis of behavioral evidence, stereotypes, imagined character traits, and consistency. These elements together make up a model. They need not be routed through one's own psychology. 8 Secondly, instead of starting with hard cases, I suggest starting with easy ones.
As said above, a notion of desire satisfaction is one of the structuring assumptions of folk psychology. This means that we constantly make implicit and provisional relative well-being estimations in the form of desire satisfaction when we interpret other people. Diana is offered a better paid job in a new city, far away. Knowing her, Wang Shu thinks she will decline, since she finds her circle of friends more important than the money. Her well-being would decrease in this new job. An estimation of someone's well-being involves filling in the requisite psychological and situational details for the situation at hand. (But does everybody always use the notion of desire satisfaction when evaluating other people's well-being? In section Many to many, life satisfaction, we will complicate this picture).
One to two, and two to two
To compare well-being between people requires that a spectator has more information than just people's orderings of their own well-being states. The spectator also needs to know the size of the differences between these states, i.e. the strength, or cardinality, of the measure. Furthermore, one must use a univocal scale. If there are several scales corresponding to the well-being measures of different individuals, then in order to make a valid comparison these scales must be reduced to one, i.e. a spectator must normalize the several scales.
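As a minimal sketch of what normalizing several scales could look like, the following Python snippet maps each person's reported score onto a shared 0 to 1 interval, using the endpoints of that person's own scale. Min-max rescaling is only one assumed choice for illustration; the paper does not commit to a specific normalization rule, and the function name, endpoints, and scores are hypothetical.

```python
def normalize(score: float, low: float, high: float) -> float:
    """Map a score from a person-specific scale [low, high] onto a common 0-1 scale."""
    if high <= low:
        raise ValueError("scale must satisfy high > low")
    return (score - low) / (high - low)


# Hypothetical: P reports on a 0-10 scale, Q reports on a 1-7 scale.
p_common = normalize(8.0, 0.0, 10.0)
q_common = normalize(4.0, 1.0, 7.0)

# Only after normalization does a direct comparison of levels make sense.
p_higher_than_q = p_common > q_common
```

Once both scores live on one scale, differences between states can be compared as well as levels, which is what the cardinality requirement asks for.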
In our daily lives, we compare between people by ascribing just a few specific desires and individual characteristics to different people while holding many of them constant. This works well enough for easy cases. Harsanyi (1987: 1) asks: “Suppose I am left with a ticket to a Mozart concert I am unable to attend and decide to give it to one of my closest friends. Which friend should I actually give it to?” This is an easy case because Harsanyi now only needs to consider cultural appetite and apparently knows that one friend clearly has a stronger interest in Mozart concerts, and thus a stronger desire. How could Harsanyi know such a thing? Well, presumably because he knows about his friend's relevant individual characteristics. During their interactions, Harsanyi has probably learnt that one friend talks more often about classical music, spends more time visiting classical concerts, is more willing to travel a large distance for them, and to pay a higher price. Perhaps this friend also shows more enthusiasm verbally and non-verbally. A collection of such gradual estimates underpins the idea that desires can be compared in terms of their strength, i.e. cardinality.
Normalization is not required in cases like this, because the point of departure for the comparison is already a univocal scale. This is because the default idea is that people are similar in enough aspects. One assumes that the other person would also talk about classical music, spend quite some time visiting concerts, travel large distances, and pay for costly tickets, were he to have the same strong desire for a Mozart ticket. One assumes that the two friends would make the same tradeoffs, given a certain desire. Conversely, talking only very sporadically about classical music and never visiting such concerts generally indicate low interest, i.e. that someone's desire for a Mozart ticket is not strong. This is Harsanyi's similarity postulate. The same behavior between individuals without further discernible bodily and mental differences designates the same level of well-being.
Binmore (1994: 225) raised an objection against the similarity postulate. He argues that very similar people in very similar circumstances can still have very different evaluations of certain things. For example, P always decides differently than his twin Q when faced with the choice between X and Y. It may in fact be the case that deep down in P's and Q's histories, there has been some causal condition at a certain point in time provoking this difference, but that was just a flicker of some sort, not really something that would be called an individual ‘characteristic’. A tiny causal flicker in someone's early personal history, for example, could make a large difference. For this reason, Binmore finds it necessary to invoke a mechanism that reduces divergence.
Notice that this problem of the causal flicker can only be acute to the extent that an individual can find something satisfactory without this having any bearing on how we commonly characterize people. For example, we must imagine that somebody values museums and books significantly higher than a very similar person in the same circumstances who prefers outdoor activities and sports, while this has nothing to do with age, physical condition, or being a type that enjoys sport or not. Is that plausible? If such a causal factor really makes a non-trivial difference, then there will be more traces, more systematic influences on behavior and personality.
It seems to me that the causal flicker problem is especially germane for (many scholars’ favorite examples of) things like wine and ice-cream. Kindred people may establish structural differences in what they deem satisfying when they order at the counter but goods such as wine and ice-cream are hardly ever the subject of a practical interpersonal comparison, for, say, a local government to decide where to invest public money. Wine and ice-cream flavors are paradigmatic items of subjective taste but quite unrelated to the kind of person one is. So note that the goods that may first come to mind, when considering this objection, are often actually the least relevant. So the problem of the causal flicker is itself limited. But I agree with Binmore that divergence will occur (albeit caused in other ways) and that it must be reduced. This is addressed in sections Many to many, life satisfaction–Redesigning vignettes.
Let us now continue with the similarity postulate. Differences between people only make sense against a background of similarities. (Davidson (1986: 208) argued that the similarity postulate is not an empirical claim nor a methodological one but a necessary condition of correct interpretation.) 9 When there is more distance between participants and spectators, stereotypes take over the role of individual characteristics. But still, a spectator will assume that people are alike in at least some respects. Aren’t we all featherless bipeds? Again, almost everybody wants a house with a roof, some close friends, a partner, a steady income, and a daily nutritious meal. Also, without evidence to the contrary, we assume that we weigh these things in the same way, that we make the same tradeoffs. Such notions provide a first conjecture, a basis to get a comparison started. Starting from such a basis, one can modify it in the light of new information (under constraints of consistency, truth tracking, and desire satisfaction).
Learning, for example, that P likes money more than Q, one modifies a couple of other desires in the wake of this: P is probably more materialistic, more concerned about having a big house, and so on. Or, returning to Harsanyi's two friends, suppose that one friend is a more exuberant person while the other friend is more temperate. The exuberant friend shows more enthusiasm not only for the ticket but generally. This is a reason to inflate the scale somewhat of the more temperate friend—calibrating, say, his “Hmm, yes, okay” with the other's “Yes! Terrific!!!” response to the Mozart ticket. Revising the similarity metric in daily interactions is a matter of the same folk psychology, or what Harsanyi calls our general psychological laws.
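The calibration of the two friends' responses can be sketched numerically. The Python snippet below expresses each friend's reaction to the ticket relative to that friend's own typical level of enthusiasm. The 0 to 10 expressiveness ratings, the subtraction of a personal baseline, and the function name are all assumptions made for illustration; the text only says that the temperate friend's scale should be inflated "somewhat".

```python
from statistics import mean
from typing import Sequence


def calibrated(response: float, usual_responses: Sequence[float]) -> float:
    """Express a response relative to the person's own typical enthusiasm."""
    return response - mean(usual_responses)


# Hypothetical enthusiasm ratings on an arbitrary 0-10 expressiveness scale.
exuberant_usual = [7.0, 8.0, 9.0]   # enthusiastic about most things
temperate_usual = [2.0, 3.0, 4.0]   # reserved about most things

# Raw reactions to the Mozart ticket: "Yes! Terrific!!!" versus "Hmm, yes, okay".
exuberant_ticket = calibrated(9.0, exuberant_usual)
temperate_ticket = calibrated(5.0, temperate_usual)
```

Under these assumed numbers the temperate friend's muted reply sits further above his baseline than the exuberant friend's outburst sits above hers, so the raw display of enthusiasm alone would be a misleading guide to desire strength.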
Let's now involve a second spectator, in the Mozart case, alongside Harsanyi. Now if this second spectator is also quite close and just as well acquainted with the two friends, it seems fairly obvious that she will draw the same conclusion as Harsanyi did. Given the same information about the two and the application of the same folk psychology—driving large distances indicates strong desire etc.—she would also give the ticket to the classical music lover. Two spectators can easily make the same interpersonal well-being comparisons when they access the same data and use the same folk psychology.
Many to many, life satisfaction
Stereotypes and other generally held psychological ideas in a society also provide some common basis for a multitude of people making interpersonal comparisons. A body of ideas in a society about how humans are in general, and certain types more specifically, furnishes a degree of convergence among spectators. Of course, with most groups of spectators there will be less agreement than among circles of friends and acquaintances, but there will still be a fair number of cases that are relatively easy to decide within a population, cases that will produce agreement in interpersonal comparisons of well-being.
Let us now move toward more difficult cases. Suppose a local government has to decide to invest in one of two equally costly projects, a new museum or a new sports complex, and that well-being impact is going to be the major criterion in the decision. The municipality does not have the time and the money to conduct a survey among its inhabitants, so that the group of councilors must make interpersonal well-being comparisons themselves. A quick and informal examination shows disagreement: some councilors think that the average citizen will benefit more from a new museum while others think that a new sports complex will produce more well-being in the city. Now how could this divergence among the spectators/councilors be reasonably reduced?
Firstly we must ask: what could be sources of divergence? Four such sources can be distinguished: (a) spectators use different concepts of well-being; (b) they use different scales; (c) they use their scales in different ways; and (d) they use different characterizations of the people they are evaluating (for example, because they project their own values and desires upon them). Regarding (a) and (b) I propose to use the measurement concept of “life satisfaction” with a corresponding scale—which is discussed below. Regarding (c) and (d) I propose the use of anchoring vignettes and their possible redesign—which is the subject matter of section Vignettes.
There could be a lack of agreement among different spectators because they are using different conceptions of well-being. I have been assuming until here that most people would make intuitive well-being judgments in terms of desire satisfaction but it is now time to complicate this picture. Some councilors might, for example, be a kind of perfectionist. They judge the museum as best because they find that citizens’ well-being comes down to leading an excellent cultural and art-filled life. These councilors would in fact not put themselves in the shoes of their citizens at all since their idea of well-being is actually not subjective but objective. But it is of course possible that some spectators share such an idea. Or perhaps some councilors have a hedonistic outlook and judge the sports facility best because they believe that pleasure-pain sums are higher around sports fields than in museums.
To avoid such confusion at the level of the group of spectators and get them on the same page I propose the concept of “life satisfaction.” With this measure SWB researchers directly aim to measure psychological states (instead of deriving them from something else, e.g. choice behavior or physiological states): they ask how satisfied people are with their lives in general or in regard to a specific domain. A typical question for global life satisfaction is: “At this moment, how globally satisfied are you with your life? Indicate this on a 1-7 scale.” For specific domains, e.g. work: “At this moment, how satisfied are you with your job? Indicate this on a 1-7 scale.” Hence the scale is a numeric scale (often) from 1 to 7, with the numbers standing for: 1 = very dissatisfied, 2 = dissatisfied, 3 = slightly dissatisfied, 4 = neutral, 5 = slightly satisfied, 6 = satisfied, 7 = very satisfied.
I propose life satisfaction as a measure of well-being for four reasons. Firstly, we obviously need a concept of SWB, not objective well-being. (This is our starting point. For an account of objective well-being the problem of interpersonal comparability does not even arise.)
Secondly, difference in life satisfaction impact seems an adequate way of comparing policy alternatives like the museum and the sports facility. To illustrate, one could ask citizens to score their current life in the city (the status quo), and what their life satisfaction would be were a new museum added or a sports facility. The idea is that relative life satisfaction judgments of (X) and (X + a) are then reduced to a desire satisfaction judgment of (a), and thus that life satisfaction can serve as a proxy for desire satisfaction. 10 As long as (a), the alternative of interest, is sufficiently foregrounded in the description of the comparandum, this seems a fairly plausible presupposition to make. It is quite common, anyway, to examine people's attitudes in indirect ways in empirical social science, because explicit attitudes (like stated desires or preferences) can be unreliable.
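The comparison of (X) with (X + a) can be illustrated with a small calculation. The Python sketch below takes hypothetical 1 to 7 life satisfaction scores for three citizens under the status quo and under each alternative, and computes the average per-person gain. All scores are invented for the example, and averaging gains across citizens is an assumed aggregation rule, not one the paper prescribes.

```python
from statistics import mean
from typing import List

# Hypothetical 1-7 life satisfaction scores for the same three citizens.
status_quo: List[int] = [4, 5, 3]    # (X): current life in the city
with_museum: List[int] = [5, 5, 3]   # (X + museum)
with_sports: List[int] = [4, 6, 5]   # (X + sports facility)


def mean_gain(baseline: List[int], scenario: List[int]) -> float:
    """Average per-person change in life satisfaction relative to the baseline."""
    gains = [s - b for b, s in zip(baseline, scenario)]
    return mean(gains)


museum_gain = mean_gain(status_quo, with_museum)
sports_gain = mean_gain(status_quo, with_sports)
preferred = "sports facility" if sports_gain > museum_gain else "museum"
```

The per-person differences stand in for the desire satisfaction attached to the added alternative (a), which is the sense in which life satisfaction serves as a proxy here.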
Thirdly, within the field of empirical SWB research “life satisfaction” is one of the most established concepts. A few decades of research has brought a wealth of empirical results and methodological refinement. There is a broad variety of good correlations with other SWB measures like affect measures (that probe emotional responses), and physical and mental health indices. The measure is fairly stable across time and situations. There are robust associations with circumstances like income, family situation, and employment. 11 And the measure is pragmatic, fairly accessible for respondents and spectators.
Fourthly, for the application of anchoring vignettes below (in sections Vignettes and Redesigning vignettes), we will need a measure of SWB that is based on self-reports, which has the same meaning for all observers, and allows for reasonably smooth third person empathy.
These four reasons make life satisfaction fitting, but they do not single out this concept as uniquely appropriate. So it is not the only game in town. Section Comparing with two other approaches: equivalent income and extended preferences compares with two other current approaches, one of which also appears to be quite suitable.
The concept of life satisfaction is not uncontroversial. Let us now discuss some critical points. To begin with, Haybron (2011, 2020) and Hausman (2015) have asked whether it is not too mind-boggling to ask people to evaluate a whole life in one go. Aggregating satisfaction across all the parts and aspects of a life seems too demanding. But various empirical studies show a good fit between global life satisfaction and satisfaction with domains like work, family, health, or finances (e.g. Busseri and Mise, 2020; Rojas, 2006). This would be puzzling if the global life question was answered whimsically. Oishi et al. (1999) and Hsieh (2012) show that most respondents have a reasonably good idea of how to answer global life satisfaction questions. They make a rough estimation by attending to what they find the most important parts. And note again that all the strong correlations of life satisfaction with other measures and with personal and material circumstances would otherwise be a mystery. Payne and Schimmack (2020) demonstrate that agreement between self-reports and informant reports (e.g. by parents or partners) supports the contention that life satisfaction is based on important life domains. This does, of course, not mean that questionnaires cannot be improved and that the SWB results always offer a valid reflection of the variable of interest. 12 For some studies, it might be helpful to guide respondents more explicitly toward their most important values. 13 But the idea that life satisfaction does not relate to what people find important appears to be ill-founded.
Another issue is that some early studies have shown that life satisfaction questions are answered differently due to irrelevant details like the weather conditions of that moment, whether the waiting room was tidy or not, or to the order of question items. These studies are often cited and they have also given rise to the development of more ecologically embedded measures. 14 Currently, however, the situation has changed in the field since quite a few of these often cited results were simply not replicated in later studies. Furthermore, many such situational effects disappear when multiple items are used instead of just one, or when buffer items are added to the list. 15 Through the years, it is clear that the empirical life satisfaction methods have become more sophisticated.
What about the life satisfaction scale? In the SWB literature, most scholars simply assume cardinality for life satisfaction, without further discussion. But let us here consider this property, firstly intuitively. One basic consideration is that it makes sense to compare differences in strength for a person, for example, that the difference in terms of work satisfaction between Zahra's current job at t = 3 and the job she had at t = 2 is clearly bigger than the difference between job t = 2 and job t = 1. It seems meaningful to say that the last change in jobs made her much happier than the change before that. Or that her new house yields a bigger improvement than her first move between student apartments. (This possibility of measuring differences gives an interval scale.)
It also seems meaningful to say that there is a neutral point in between positively satisfying and negatively dissatisfying goods or events. One can judge that one's life is going well, that one is positively well off, but also that one is doing badly, that one's life is depressing and has negative value overall. One can also say that a specific alternative does not positively or negatively contribute to one's life satisfaction, for instance that a new museum leaves one cold. And it seems reasonable that a neutral judgment can result from a positive factor canceling out a negative one, e.g. that one's positive satisfaction from a nice working environment is exactly offset by dissatisfying working hours. (These arguments support a ratio scale.)
Empirical support for the assumption of the cardinality of life satisfaction is that the quantitative data correlate well with other SWB data, e.g. with how often people smile, their blood pressure, occurrences of depression, etc. These intercorrelations would be surprising if the data only conveyed ordinal information. More specifically, Kristofferson (2017), by means of a mathematical technique called conjoint measurement, recently made a case for the cardinality of life satisfaction data by demonstrating a good linear fit with a certain validated cardinal mental health index. 16
Lastly, presenting people with a numerical scale in advance—as I am proposing here—must have some disciplinary effect. Van Praag (1991) has shown that people interpret a sequence of context-free verbal labels like “very bad, bad, not bad, not good, good, very good” as having equal intervals. 17 Numeric measurement scales for life satisfaction of course strengthen this effect. They push for a cardinal grasp of this SWB measure. 18
Life satisfaction data are collected by directly asking people about it. But the folk psychology that we discussed in section Background: Harsanyi and Kolm—with beliefs, desires, and stereotypes and character traits as filters—is still at work. Folk psychology carries over to life satisfaction when this measure is applied. When the councilors at the municipality decide that a museum and a sports complex are the relevant options to consider, they must first have had thoughts and ideas on what are generally desirable goods for their citizens. Something like a museum is a desirable option, a dunghill is not. How these options are described and presented to the citizens will also draw on a common stock of ideas. Only certain aspects of the capacity and functionality of the buildings are mentioned, not their capacity to cast shadows, the height of the entrances, or the color of the banisters.
Thus life satisfaction and a corresponding 1–7 scale provide steps for reducing divergence; they get the councilors on the same page. But might the councilors not vary in their use of this common scale? Perhaps people of different cultures, backgrounds or personalities use the extremes and the middle of the scale in different ways. We will address this problem by means of anchoring vignettes in the next section.
Vignettes
Angelini et al. (2013) use the vignette method to calibrate life satisfaction scores from citizens from 10 European countries. Kapteyn et al. (2010) use it to compare how life satisfaction varies with income between people from the U.S. and people from the Netherlands. There are many more examples. The application of the vignette method in the field of SWB is blossoming. To see how it works let's firstly look at a seminal work by political scientists King et al. (2004).
Vignettes are short descriptions about individuals in specific circumstances that pertain to the variable of interest. King et al. (2004) apply the vignette method to examine the comparability of survey responses to “visual acuity” between Chinese and Slovakians. To calibrate, King et al. used eight separate vignettes, like this:
[Angela] needs glasses to read newsprint (and to thread a needle). She can recognize people's faces and pick out details in pictures from across 10 meters quite distinctly. She has no problem seeing in dim light.
Respondents are asked to evaluate vignette descriptions and these scores are then used to calibrate the self-reports. Judging by the self-reports, visual acuity is very similar in China and Slovakia. The two groups score the vignettes quite differently, however. The Chinese tend to find that Angela's visual condition is less impaired than the Slovakians find it to be. A typical Chinese person evaluates Angela's situation as not too bad (mild), while a Slovakian labels her eyesight in the moderate region. Since Angela's vision just is what it is, it can serve as an anchor. The Chinese and the Slovakians are looking at the same thing (somebody's vision) but they interpret the scale items differently. By using two or more vignettes one can stretch one scale like a rubber band and match it with the other one. This results in moving the Slovakian self-reports up in relation to the Chinese. Then it appears that visual acuity is worse in China. 19
The presupposition here is “vignette equivalence,” namely that the variable exists and is perceived by all respondents in the same way. Angela's visual acuity has a specific value, no matter the respondent who happens to judge it. Different people from different parts of the world may give different scores but Angela's vision just is what it is—it is essentially the same for everybody. Therefore, it can serve to calibrate self-reports.
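The rubber-band image can be made concrete with a small sketch. It assumes that each group's mean ratings of the same vignettes are available and that the mapping between the anchors is piecewise linear. The function name `rescale` and all numbers are hypothetical illustrations, and this is not the estimator used by King et al. (2004).

```python
def rescale(score, own_anchors, ref_anchors):
    """Map a score from one group's use of the scale onto a reference group's.

    own_anchors[i] and ref_anchors[i] are the ratings the two groups gave to
    the same vignette i. Between (and beyond) anchors the mapping is linear.
    """
    pairs = sorted(zip(own_anchors, ref_anchors))
    if len(pairs) < 2:
        raise ValueError("at least two vignettes are needed to stretch a scale")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    if any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("own_anchors must be distinct")
    # Pick the segment containing the score; the first and last segments are
    # extended outward for scores that fall outside the anchored range.
    i = 0
    while i < len(xs) - 2 and score > xs[i + 1]:
        i += 1
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    return ys[i] + (score - xs[i]) * slope


# Hypothetical mean vignette ratings on a 1-5 difficulty scale (higher = worse).
lenient_group = [1.0, 3.0]   # ratings of vignettes V1 and V2
strict_group = [2.0, 5.0]    # the same vignettes rated by the reference group

# The same raw self-report of 2.0 means something different in each group.
calibrated = rescale(2.0, lenient_group, strict_group)
print(calibrated)  # 3.5
```

On these made-up numbers, a self-report of 2.0 from the lenient group corresponds to 3.5 on the strict group's use of the scale. This is the kind of shift that can reverse a comparison based on raw self-reports alone.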
Now for an estimate of life satisfaction one would need information about a person's material circumstances combined with that person's individual characteristics. Here are two examples of vignettes from a seminal paper in the life satisfaction literature “Comparing Life Satisfaction” by Kapteyn et al. (2009). These vignettes are meant to investigate global life satisfaction:
Global 5: (Name) is 25 years old and recently married, no children. He/she works about 35 h per week and makes xxx (half the median, median, twice the median). He/she works out regularly and on vacations he/she makes long hikes in the mountains with his/her husband. His/her job is satisfying, though a bit dull sometimes. He/she feels she does not have a lot of control over his/her job, but it is a very secure job.
Global 7: (Name) is 75 years old and a widow. His/her pension benefits are xxx (half the median, median, twice the median). He/she owns the house he/she lives in and has a large circle of friends. He/she plays bridge twice a week and goes on vacation regularly with some friends. Lately he/she has been suffering from arthritis, which makes work in the house and garden painful.
The way in which Kapteyn et al. have constructed these vignettes seems clear. They have put in information about general circumstances, like income and wealth, social relations, autonomy, and health. The intuitive (folk psychological) ideas are presumably that most of us value these things, so that more of them will yield a higher satisfaction score and less of them a lower score. These are broad stereotypes. Then follows some information that is characteristic of the particular person of the vignette. From Global 5, we learn that the person likes to hike in the mountains and to exercise in general and from Global 7 that he or she is a bridge player. This may imply, for example, that an investment in improving the network of hiking trails will make a substantial well-being impact for the hiker while it will hardly make a difference for the bridge player.
So the idea is that a vignette person, just like a real person, has a certain life satisfaction given the way she is and the circumstances that she is in. There could be different estimates coming from different spectators from two different countries, for example, but still, there is just one estimate that is correct. It is the same as with visual acuity. Someone's life satisfaction just is what it is. Kapteyn et al. (2010: 86) on this: “A vignette question describes the satisfaction of a hypothetical person and then asks the respondent to evaluate the satisfaction of that person on the same scale (..). Since the vignette descriptions are the same in the two countries, the vignette persons in the two countries have the same actual life satisfaction or happiness.”
Of course the descriptions could be too brief. This means that more information should be added. Ultimately vignettes can be tested (and designed) by means of a real counterpart, an existing person. Then, there would be an objective life satisfaction report underpinning the vignette. The report would of course be subjective as it depends on a particular individual's experience but it is objective in the sense that different observers can in principle make the same observation about this individual. In the literature, this is called the requirement of response consistency for vignettes: Person X's self-reports must be the same as Person X's evaluations of a vignette with person Z, with Z ≈ X (i.e. a person Z who is alike in all relevant aspects to person X). 20
Vignettes make for a second step to reduce divergence among spectators. By means of vignettes our councilors can correct for variation in their use of the scale. Nevertheless, this does not necessarily yield agreement because different councilors can characterize the citizens in diverging ways, for example, because they project their own values and desires upon them. In other words, the requirement of vignette equivalence, that the relevant variables are perceived by all respondents in the same way, can be violated. The next section deals with this issue.
Redesigning vignettes
For this, we need to look into empirical research on vignette equivalence. In this section I argue that if violations of vignette equivalence can be causally traced then this causal information might be of use in revising the vignettes. To get a quick idea, here are two examples. On the basis of interviews Su et al. (2017) found that a large number of their respondents (youngsters in rural China) had difficulty imagining distances in terms of meters and thus the researchers subsequently rewrote the vignettes by means of nonnumeric phrasings: “In the cafetaria, Xiao Wang can clearly recognize students sitting at his table, but not at the next table;” and: “From the last row in the classroom, Xiao Wang can clearly recognize his teacher but not the small written text on the blackboard.” Hanna Grol-Prokopczyk (2018) studied the performance of twelve general health anchoring vignettes. A remarkable violation of vignette equivalence occurs with the following vignette, Grol-Prokopczyk (2018: 57):
[Name] feels exhausted several days a week. S/he has trouble bending, lifting, and climbing stairs, and every day experiences pain that limits many of his/her daily activities. In the past year, [name] spent a few nights in a hospital, and over a week in bed due to illness.
Grol-Prokopczyk found that people from higher income groups and with more education perceive this health state as substantially worse than people with lower income and less education. She relates this to the fact that this vignette is the only one in a series to mention a hospital stay and then discusses research that has shown that for patients (in the USA) of lower socioeconomic status hospital care is often better than clinical care while the reverse is true for patients of higher socioeconomic status. The latter group makes more use of clinical care for minor problems and therefore associates a hospital stay with graver health issues. The author did not consider how to rewrite the vignette, but based on her findings it seems obvious that “spending time in a hospital” must be replaced by wording that is less prone to different interpretations, perhaps something like “had to get medical treatment” or “had to consult a doctor”. 21
As far as I can see, in the SWB literature, the idea of revising vignettes on the basis of causal factors that explain the violations of equivalence, and subsequently seeing whether the new vignettes solve the violation, has not yet been systematically explored. Thus this seems relatively untrodden territory. But the research by Su et al. (2017) and Grol-Prokopczyk (2018) suggests how it could work: disagreement among spectators might be repaired by tracing it to certain causal factors and then using this causal information to redesign the object of disagreement. So let us now apply this to our case of the municipality and the councilors trying to assess the well-being impact of a new sports complex and a new museum. Recall that the councilors had a round of life satisfaction vignette evaluations to correct for possible differential use of the scale among them. Suppose that they were presented with the following vignettes (among others):
P is 35 years old, lives together with a partner and two young children. S/he earns a median income. S/he lives in a medium large city of 100,000 residents. The city has a movie theater and a swimming pool.
Q is 35 years old, lives together with a partner and two young children. S/he earns a median income. S/he lives in a medium large city of 100,000 residents. The city has a movie theater, a swimming pool and a museum.
R is 35 years old, lives together with a partner and two young children. S/he earns a median income. S/he lives in a medium large city of 100,000 residents. The city has a movie theater, a swimming pool and a sports facility.
The vignette with P resembles the status quo, the one with Q adds the museum and the one with R adds the sports facility. It was this round that resulted in disagreement, remember: one part of the group of councilors estimates higher life satisfaction among the citizens for the vignette with the museum while the other part thinks that the vignette with the sports facility will bring more of this. Now suppose that further inquiry brings a certain pattern to light. Councilors who regularly visit the movie theater gave a higher score to the vignette with the museum whereas councilors who regularly make use of the swimming pool did that for the vignette with the sports facility. (Here is a parallel with John Broome's example of the academic and the financial analyst.) 22
It appears that the two groups of councilors differ in how they value certain circumstances because they are cut from different cloth. This information can be used to revise the vignettes. For example, like this:
J1 is 35 years old, lives together with a partner and two young children. S/he earns a median income. S/he lives in a medium large city of 100,000 residents. S/he is a cultural type. The city has a movie theater, a swimming pool and a museum.
J2 is 35 years old, lives together with a partner and two young children. S/he earns a median income. S/he lives in a medium large city of 100,000 residents. S/he is a cultural type. The city has a movie theater, a swimming pool and a sports facility.
Z1 is 35 years old, lives together with a partner and two young children. S/he earns a median income. S/he lives in a medium large city of 100,000 residents. S/he is a sportive type. The city has a movie theater, a swimming pool and a museum.
Z2 is 35 years old, lives together with a partner and two young children. S/he earns a median income. S/he lives in a medium large city of 100,000 residents. S/he is a sportive type. The city has a movie theater, a swimming pool and a sports facility.
Suppose that this produces convergence among the councilors: everybody agrees that the life satisfaction scores are: for J1 = 8, for J2 = 7, for Z1 = 7, and for Z2 = 8. This would mean that all councilors now find, irrespective of their own make up, history and standpoint, that cultural types like J1 and J2 (and Joe, remember from section Introduction) and sportive types like Z1 and Z2 (and Zoe from section Introduction) would experience the same increase in life satisfaction were their favorite alternative chosen. The municipality now has a reason to evaluate the two alternatives as being on a par in terms of individual well-being, and perhaps favor one because there is a larger proportion of the population that would be pleased with it or let some other value decide the issue (or flip a coin).
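The bookkeeping behind this comparison can be sketched in a few lines. The second-round scores are the ones supposed above. The first-round numbers, the councilor labels, and the helper names `spread` and `mean_gain` are hypothetical, since the text only states that the two groups of councilors ranked Q and R in opposite ways.

```python
from statistics import mean

# Hypothetical first-round scores: the text only says that the two groups
# ranked Q (museum) and R (sports facility) in opposite ways.
first_round = {
    "movie-goer A": {"Q": 8, "R": 7},
    "movie-goer B": {"Q": 8, "R": 7},
    "swimmer C": {"Q": 7, "R": 8},
    "swimmer D": {"Q": 7, "R": 8},
}

# Second-round scores from the text: every councilor gives the same numbers.
second_round = {
    name: {"J1": 8, "J2": 7, "Z1": 7, "Z2": 8} for name in first_round
}


def spread(scores_by_councilor):
    """Range (max minus min) of the councilors' scores for each vignette."""
    vignettes = next(iter(scores_by_councilor.values())).keys()
    return {
        v: max(s[v] for s in scores_by_councilor.values())
        - min(s[v] for s in scores_by_councilor.values())
        for v in vignettes
    }


def mean_gain(scores_by_councilor, with_favorite, without_favorite):
    """Average score difference between two vignettes across councilors."""
    return mean(
        s[with_favorite] - s[without_favorite]
        for s in scores_by_councilor.values()
    )


print(spread(first_round))    # {'Q': 1, 'R': 1}
print(spread(second_round))   # {'J1': 0, 'J2': 0, 'Z1': 0, 'Z2': 0}

museum_gain_cultural = mean_gain(second_round, "J1", "J2")
sports_gain_sportive = mean_gain(second_round, "Z2", "Z1")
print(museum_gain_cultural == sports_gain_sportive)  # True
```

A spread of zero on every redesigned vignette is what convergence amounts to here. The equal gains for the two types correspond to the judgment that the alternatives are on a par in terms of individual well-being.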
Of course it can also happen that spectators agree that one alternative is better. What would be some plausible ways in which this might happen? Here we can think of factors like duration, further effect, and the presence of substitutes. 23 It makes, for example, a difference if Zoe spent, say, fifteen hours a week at the new sports complex while Joe visited the new museum perhaps just twice a year. Another way could be that Zoe's time spent at the sports facility is somehow more important than Joe's time in the museum, for example because Zoe is a professional athlete. She makes a living out of her sport while Joe would only visit the museum for fun. Again another way could consist of the existence of substitutes. Zoe and Joe could spend equal amounts of leisure time but if Joe has a good second museum nearby while Zoe can only jump up and down on her small balcony, then it may be plausible to say that Zoe is much more benefited by the sports complex than Joe by the new museum.
Here again, note that the design and redesign of vignettes is done partly on the basis of folk psychology; in this case, commonly held ideas about sportive types and cultural types, duration, work vs leisure, and which alternatives count as close substitutes. But equally important, some elements of folk psychology have now become corrected. In the first round of vignette evaluation, a number of our councilors projected too many of their own characteristics (or used the wrong stereotypes) onto the citizens. This fact led to a revision of the vignettes that produced agreement in the second round. Hence in our case the design and redesign is done on the basis of folk psychology and scientific (SWB) psychology together.
Life satisfaction is an objective measure but it is important to see that just supplying a wealth of information to those in the role of making the interpersonal comparisons, the spectators, is not enough, because it is not on target. To decrease divergence, a causal relationship must be traced, which then indicates what the relevant information is.
The procedure begins with establishing a causal relationship with an individual characteristic of the spectator(s). However, not just any causal relationship will do. Suppose that the difference in scores can be causally traced to some neurological state, say denser synaptic connectivity in a certain brain area. A subset of spectators has been identified as having more synaptic connections per volume in a specific region of the brain and it is found that this correlates with higher scores for the cultural vignette. It is difficult to see how such information can usefully be incorporated into a vignette description. That is, we would not expect that adding this information is going to make any difference to the question of how it might foster more convergence among spectators. Causal information, I suggest, should be included if it makes a systematic difference which can be formulated in intentional language that appropriately relates to the possible goals and plans of the individuals concerned.
It is not guaranteed that this procedure will dampen divergence among spectators. It is an empirical matter whether disagreement can be usefully traced to how spectators perceive individuals’ well-being and whether a specific redesign of a vignette will be successful. Moreover, there could be cases of false convergence, possibly based on an incorrect stereotype. Imagine our group of councilors now consisting of only cultural types. They would underestimate the well-being impact of the sports facility on people like Zoe. Including sportive types in the council then corrects this bias of the stereotype. 24 Thus to avoid such false convergence, it is best to have a group of spectators who can reflect the desires and interests of the citizens: the ideal of representative democracy.
Another problem is that some citizens might have desires that are unwelcome, e.g. because they are malevolent. Or some citizens could have interests that are somehow not expressed, e.g. because these people have adapted to their dire circumstances. Malevolent and adaptive desires require correction, a ‘laundering’ on the basis of other values or other well-being measures. These are problems for any SWB approach. Further discussion would take us too far beyond the scope of the current paper. Suffice it to say that in our case, this would be a responsibility of the governors and the councilors of the municipality. Hence spectators in this role must not only take care to reflect people's actual desires, but keep a critical eye open at the same time.
This paper has examined some factors and mechanisms that may contribute to increasing convergence. But that does not mean that it will always work. It would be interesting to investigate further what could be other systematic factors hampering or limiting convergence among spectators.
Let us now return to John Harsanyi and Serge Kolm. They argued that interpersonal comparisons of well-being can be made on the assumption that we know enough about individuals’ characteristics and their circumstances. John Broome objected that a spectator “cannot escape her own causal situation” when trying to make a preference ordering over extended alternatives. From an exchange between Broome and Kolm in the journal Social Choice and Welfare it appears that Kolm does not go along with Broome's target of critique, the modern preference approach. 25 Kolm does not talk about formal preference satisfaction but about psychological satisfaction or ‘happiness’. So that partly settles the issue; apparently Kolm and Broome had different conceptions of well-being in mind.
From their discussion it does not become clear, however, what Kolm thinks about Broome's point that one cannot escape one's own values while making a comparison. The present paper proposes ways to loosen the grip of these values. Firstly, given the approach outside of ordinalism, a spectator no longer needs to construct her own preference ordering over (extended) alternatives. Secondly, spectators instead directly compare life satisfaction with an a priori common scale and on the basis of broadly accepted and revisable ideas, not just their own (possibly idiosyncratic) values.
Comparing with two other approaches: equivalent income and extended preferences
This section compares the present approach for making interpersonal comparisons of well-being with two other approaches: equivalent income and extended preferences. Let's start with the equivalent income method. It is developed by Marc Fleurbaey and co-authors (e.g. Decancq et al., 2015; Fleurbaey, 2016; Fleurbaey and Blanchet, 2013). The equivalent income approach modifies regular income as an indication of well-being by also taking into account the non-income dimensions of life. The approach includes health, social relations, job satisfaction, and so on, by means of asking people about their willingness to pay for improvements along these dimensions. Take Elias, for example, who earns 50.000 per year and is in a moderate health condition. We ask Elias what income would make him equally satisfied were he in a state of perfect health. He says: 40.000. Nina also earns 50.000, and has the same health condition. When we ask her, she answers: 48.000. So Elias appears to value his health more than Nina does. They may have the same real incomes and health conditions, but they are not equally well off. Elias's current well-being is lower than Nina's, as indicated by the equivalent income measure.
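The arithmetic of the Elias and Nina example can be spelled out in a minimal sketch. The figures are those of the example, while the class `Person` and the attribute `willingness_to_pay` are illustrative names and do not come from Fleurbaey's own formal apparatus.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    income: int              # actual yearly income
    equivalent_income: int   # stated income that, with perfect health,
                             # would leave the person equally satisfied

    @property
    def willingness_to_pay(self) -> int:
        """Income the person would give up to reach the reference health state."""
        return self.income - self.equivalent_income


# Figures from the example in the text (written there as 50.000, 40.000, 48.000).
people = [Person("Elias", 50_000, 40_000), Person("Nina", 50_000, 48_000)]

# Rank from worst off to best off by equivalent income, not by actual income.
ranking = sorted(people, key=lambda p: p.equivalent_income)
for p in ranking:
    print(p.name, p.equivalent_income, p.willingness_to_pay)
```

Elias would give up 10,000 for perfect health and Nina 2,000. With identical incomes and health conditions, the ranking by equivalent income therefore places Elias below Nina.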
The approach has been criticized for the arbitrariness of the reference values. What health state should one, for example, compare with? Fleurbaey and Blanchet (2013: ch. 4) reply that the fact that the impartial spectator as a social planner (in our case, a local government) decides and arbitrates this, is exactly a good thing, and that choosing reference values must obviously be done on the basis of sound reasons. The same goes for the vignette method. With vignettes the social planner focusses the comparison by selecting dimensions and choosing reference values. She decides, for example, the relevant reference income, the family situation, and which public provisions enter the comparison. 26
Equivalent income is cognitively a more demanding method of measuring well-being than life satisfaction, as Fleurbaey admits (2016: 466). This issue is exacerbated when we ask spectators to compare between people, as with our municipality councilors comparing the citizens. Because then the spectators must imagine how various people would make tradeoffs between their income and immaterial aspects of their lives. But here, just like with the life satisfaction approach, vignettes can be used to simplify by constraining the variables.
Given a set of reference dimensions and values that are the same for all spectators, somebody's equivalent income has objective meaning. Nina's equivalent income is 48.000, no matter who is observing her. As the same holds in principle for vignette persons (through response consistency), the method of causally tracing differences between the spectators’ estimations and then solving them by means of redesigning the vignettes can also be fruitfully applied to the equivalent income approach.
The extended preferences approach, developed by Matthew Adler and colleagues (e.g. Adler, 2014, 2016, 2022), is in the Harsanyi tradition in that it takes people's preferences, as seen by an impartial spectator, as the point of departure. This spectator cardinalizes through the von Neumann-Morgenstern (VNM) method and normalizes by choosing a high level and a low level that are suitably the same for everybody (like perfect health and death, which are often used in health economics).
How can a specific spectator with certain characteristics imagine and compare lives of people with very different characteristics? We briefly discussed this issue in section One to one. Adler calls it the “essential attribute problem.” Adler (2014) explains that full shareability of characteristics is not necessary for a spectator with sympathy. Sympathy is a caring attitude someone has towards other people. Hence it is not the experiences of other people that a spectator somehow must re-experience and feel, it is just the spectator's own caring evaluations about other people's lives that are to be consistently ordered. But which parts and aspects of all these people's lives should the sympathetic observer exactly attend to? This, Adler says, is “an exceedingly difficult question” (2016: 487). This difficulty is also dependent on how the spectator trades tractability against realism. More realism comes with higher complexity, which means less tractability (Adler, 2022: 43).
In the present paper, we have discussed folk psychology and vignettes as ways to increase tractability. Adler's sympathetic spectators will also use folk psychology to ascribe preferences on the basis of broad stereotypes and available information about individual character traits, all open to revision. They can also use vignettes as tractable life descriptions. But there is no underlying preference ordering with objective values that are the same for all spectators. Thus on Adler's approach vignettes would not be anchoring vignettes. Therefore, the vignettes procedure for tracing and dissolving distortions seems less fitting in the extended preferences approach.
The VNM method is obviously cognitively more demanding than both the life satisfaction and the equivalent income approach—even more so for a spectator who tries to imagine how people compare lotteries. Adler defends the VNM way of cardinalizing by saying that a complete measure of well-being should incorporate how people deal with risks. This is true but perhaps there could be other ways for this.
To sum up, for spectators, life satisfaction seems the most convenient way of comparing well-being between people. 27 Vignettes can be used in all three methods for constructing tractable and relevant comparanda. The vignette procedure for increasing convergence seems most appropriate for the life satisfaction and equivalent income methods.
Conclusion
Interpersonal comparisons of well-being are routinely made in daily human interaction. For this, I have assumed, we commonly use the notion of desire satisfaction in combination with a structure of belief and desire ascriptions, a set of adaptable filters in the form of stereotypes and character attributions, and consistency requirements. (This would belong to Harsanyi's general psychological laws.) Insofar as the content of the stereotypes and the form of the consistency requirements are indeed general, this structure provides some common basis for making comparisons. It furnishes us with a common scale by default. (Here Harsanyi's similarity postulate comes into play.)
For more complex interpersonal comparisons of well-being, assistance from social science and philosophy is needed. I have suggested the concept of “life satisfaction” as a proxy and a corresponding a priori common scale. Convergence among spectators can be expected to some extent in a given culture, since stereotypes in a sense define a culture, but of course a high grade of convergence cannot be guaranteed for any concrete issue of comparison. There is a good chance to increase convergence, however, by means of anchoring vignettes. These are meant to align different interpretations of scale items. Furthermore, they can illuminate systematic deviations in how people understand each other's circumstances. Redesigning the vignettes on this basis can then improve comparability. (This echoes Harsanyi's and Kolm's idea of using the causes of differences as information to update the well-being function.)
Convergence among spectators can thus be increased by using a unifying concept of satisfaction with an a priori scale, and a procedure for identifying causal information that can subsequently be used to complement or modify the stereotypes of folk psychology.
Footnotes
Acknowledgements
Many thanks to Daniel Bracker, Lieven Decock, Govert den Hartogh, Geertjan Holtrop, Basil Nyaku, Rik Peels, Chris Ranalli, René van Woudenberg, and two anonymous reviewers for their very helpful comments.
Declaration of conflicting interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
