Abstract
The purpose of this study is to investigate people’s image of income distribution and its difference by social position from data collected during a 2015 Japanese survey (SSP 2015) by applying Bayesian statistical analytical models. The income distribution image denotes the perceived and estimated income distribution of the individual and is supposed to be a basis of subjective belief on the features of society, including societal average income. In this study, the latent income distribution images were estimated from the observed variable of average income image. Furthermore, differences in income distribution image by social position were analyzed using Bayesian hierarchical models. The differences in income distribution image by age cohort and household income class were examined in terms of the mean (expected value) and the Gini inequality coefficient of the distribution image. It was found that although the distribution image tends to underestimate the average income level and overestimate inequality, the income distribution image could be an incomplete reflection of the income distribution characteristics of the reference group.
In the field of sociology, many scholars have focused on cognition of social aspects and phenomena, as well as the formation of beliefs or society image, as a basis of rational actions or choices of actors. This is especially true in cognitive sociology (Zerubavel, 1997) and in analytical sociological theories such as the DBO (desire-belief-opportunity) theory (Hedström, 2005), or in the cognitivist model for rational choice (Boudon, 1996, 1998).
Among these social cognitive features, I will focus on people’s image of income distribution in this study. The income distribution image denotes the income distribution as perceived and estimated by the individual. It is supposed to be formed from an understanding of one’s own income status and that of others, and to be a basis of subjective belief on the features of society, including societal average income. This income distribution image is important for policy making, as it represents people’s assessment of the state of inequality in society and, consequently, forms a basis of people’s choice of redistribution policies and regimes (Cruces, Perez-Truglia, & Tetaz, 2013).
Several attempts have been made to capture income distribution image. Among them, Norton and Ariely (2011) directly examined Americans’ image of the distribution of wealth in the United States, as well as their preference for ideal distribution through an online survey. The authors clearly showed that respondents’ wealth distribution image shows greater equality than the actual distribution.
In line with Norton and Ariely (2011), Cruces et al. (2013) further examine the biased perception of income distribution. In particular, they examine how individuals form biased perceptions of their own relative position in income distribution and how these perceptions affect redistribution preferences. From a household survey in Argentina, they found that there are systematic biases in individuals’ evaluations of their relative position in income distribution, which can be partly explained by the extrapolation of information from endogenous reference groups.
Related to the studies that directly explore the perception of income distribution, there is a series of studies exploring the perceived legitimacy or inequality of income distribution (e.g., Amiel & Cowell, 1999; Caricati, 2017; Cowell & Cruces, 2004).
I examine current Japanese income distribution image in this study, not directly from the respondents’ estimated income distribution per se, like Norton and Ariely (2011), but indirectly from respondents’ estimated average income collected by a nationwide random sampling interview survey. I will employ a Bayesian statistical model to analyze people’s image of income distribution from average income image data.
Recently, Bayesian statistics and its applications have been developed in many fields, and there has been an increasing number of applications in the social sciences, including sociology (Jeliazkov & Yang, 2014; Western, 1999, 2001). This “Bayesian renaissance” is mainly attributable to the recent remarkable development of computational Markov chain Monte Carlo (MCMC) methods, which enable us to bypass the mathematical derivation of posterior distributions that can be hard to obtain analytically.
The virtues of Bayesian statistics vis-à-vis conventional frequentist statistics can be summarized as follows: (a) a natural model of inference that resembles human inference, in which a prior belief, represented as a prior parameter distribution, meets empirical data and becomes a more concrete posterior belief, represented as a posterior distribution; (b) flexible model construction that enables us to express multilevel uncertainty with hierarchical models; and (c) the possibility of connection to formal theoretical models of belief formation as a Bayesian learning process. Based on these virtues, I argue that Bayesian statistics is more suitable than frequentist statistics for social psychological studies of distribution images, because it enables us to construct flexible models that express the uncertainty of an image through parameter distributions, and it allows the mechanisms of image formation behind the image distribution to be investigated as a Bayesian inference or learning process (Breen, 1999; Breen & García-Peñalosa, 2002).
Thus, this study aims to investigate people’s image of income distribution and its difference by social position using observed data from a 2015 Japanese survey, by applying Bayesian statistical analytical models. The remainder of this article is structured as follows: first, the method is discussed in terms of data and variables. Thereafter, three different Bayesian models for examining income distribution image are discussed. For each model, the assumptions are outlined and the results are discussed. Finally, the conclusion provides perspective to the results and outlines future work.
Method
Data
The data used for the analysis are from the Stratification and Social Psychology Project Survey (SSP 2015), which is a Japanese national sampling survey of class identity, social images, and other related attitudes toward social inequality and social stratification. The survey was conducted between January and June 2015. The sampling procedure was a three-stage stratified random sampling, and the sampling list was the Japanese electoral roll and the basic resident registration. Questionnaires were distributed to 8,309 male and female participants aged 20 to 64 years in 450 locations, and the mode of data collection was face-to-face interviews with computer (tablet-type device) assistance (Computer Assisted Personal Interview [CAPI]). In total, there were 3,573 valid responses. The survey was conducted with the respondents’ informed consent, and anonymity was retained throughout the survey process. The official response rate was 43.03%.
Variables
The variable of income distribution image per se is not available in the SSP 2015 data. Instead, the variable of average income image is available, that is, a respondent’s estimation of the average income of the same generation as the respondent. The actual question for average income image is “how much do you think the average annual income of people your age is?” Respondents were then asked to select a corresponding income class as the answer. In the following analysis, I treat the average income image variable as a continuous variable using the median value of the income class.
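As a minimal illustration of this coding step, the sketch below converts a response class into a single value in units of ten thousand yen. It assumes that the response classes listed in the Appendix apply to this question and that the "median value of the income class" is the midpoint of the class bounds. The names `BOUNDS_YEN` and `class_midpoint_man_yen` are hypothetical, only a few classes are shown, and the lower bound of zero for class 2 is an assumption. The text does not say how the open-ended top class was coded, so the sketch declines to code it.

```python
# Hypothetical lookup: response class code -> (lower, upper) bounds in yen.
# Only a few classes from the Appendix are listed; None marks an open-ended class.
BOUNDS_YEN = {
    2: (0, 250_000),            # "Less than 250,000" (lower bound 0 is an assumption)
    3: (250_000, 500_000),
    13: (4_000_000, 4_500_000),
    14: (4_500_000, 5_500_000),
    39: (100_000_000, None),    # "100,000,000 or higher"
}


def class_midpoint_man_yen(code):
    """Return the class midpoint in units of ten thousand yen."""
    lower, upper = BOUNDS_YEN[code]
    if upper is None:
        # The treatment of the open-ended class is not stated in the text.
        raise ValueError("open-ended class has no midpoint without a further assumption")
    return (lower + upper) / 2 / 10_000


if __name__ == "__main__":
    for code in (3, 13, 14):
        print(code, class_midpoint_man_yen(code))
```

Under this coding, class 13 maps to 425, which matches the reported median of the observed variable.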
It can be assumed that the average income image is derived from the latent and unobserved income distribution image of the individual. I will analyze the variation of income distribution image and its differentiations from observed average income image by applying Bayesian hierarchical models. Theoretically, the difference in income distribution image by social position may result from the differences in respondents’ social experience and social interaction with others by their social position.
Figure 1 shows the distribution of average income image. The median is 425 ten thousand Japanese yen (4.25 million yen), the mean is 437 ten thousand Japanese yen, and the standard deviation is 215 ten thousand Japanese yen.

Figure 1. Histogram of average income image (ten thousand Japanese yen).
Model 1: Overall Shared Income Distribution Image
Model Definition
First, I introduce a simple prototype model with the assumption of an overall shared income distribution image. I assume that each respondent’s average income image (which is directly observed in the survey) derives from a sample mean of the respondents’ latent and unobserved income distribution image. The latter is assumed to follow a lognormal distribution with parameters μ and σ.
The reasons for assuming a lognormal distribution as the income distribution image are as follows. First, it has been theoretically and empirically claimed that the actual income distribution closely approximates the lognormal distribution, especially when excluding the higher income group (Clementi & Gallegati, 2005; Gibrat, 1931; Hamada, 2004; Pestieau & Possen, 1979; Sargan, 1957). Second, the lognormal distribution is easy to handle parametrically for obtaining the indices of the features of distribution, such as the mean (expected value) and the Gini coefficient. Although the actual income distribution image may have various forms beyond the parametric assumption, I employ the assumption of lognormal distribution as the first approximation.
For simplicity of the model, let us assume that people properly estimate the societal average income from unbiased information of their income distribution image.
If each value
where
For the sake of simplicity in the MCMC simulation, it is also assumed that the sample size
With reference to the method of setting of prior distributions by Kruschke (2015), I assume that μ is normally distributed as
Figure 2 is the graphical representation of Model 1, where gray circle nodes indicate observed continuous variables (image of average income in this model), double circle nodes indicate generative continuous variables (parameters of normal distribution of average income image), single circle nodes indicate latent continuous variables with prior distribution (parameters of shared latent income distribution image with the shape of lognormal distribution), and square nodes indicate latent discrete variables (assumed sample size from latent income distribution image). Figure 3 shows the outline of Model 1.

Figure 2. Graphical model of Model 1.

Figure 3. Outline of Model 1.
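The core idea of Model 1 can be sketched by simulation. If an average income image is the mean of n draws from a Lognormal(μ, σ) image, the central limit theorem suggests that it is approximately normal, with mean exp(μ + σ²/2) and standard deviation sqrt((exp(σ²) − 1) exp(2μ + σ²) / n). The code below is only a sketch under that assumption. The values of μ, σ, and n are hypothetical and are not the estimates of this study, and the function names are introduced here for illustration.

```python
import math
import random
import statistics


def clt_params(mu, sigma, n):
    """Normal approximation to the mean of n Lognormal(mu, sigma) draws."""
    mean = math.exp(mu + sigma ** 2 / 2)
    var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, math.sqrt(var / n)


def simulate_average_images(mu, sigma, n, respondents, seed=0):
    """Each simulated respondent reports the mean of n draws from the shared image."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.lognormvariate(mu, sigma) for _ in range(n))
        for _ in range(respondents)
    ]


if __name__ == "__main__":
    mu, sigma, n = 6.0, 0.5, 30  # hypothetical values, for illustration only
    approx_mean, approx_sd = clt_params(mu, sigma, n)
    images = simulate_average_images(mu, sigma, n, respondents=5000)
    print("CLT mean / sd:      ", approx_mean, approx_sd)
    print("simulated mean / sd:", statistics.fmean(images), statistics.stdev(images))
```

The simulated mean and standard deviation should lie close to the normal approximation. The approximation becomes rougher when n is small or σ is large, because the lognormal distribution is then strongly skewed.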
Result of the MCMC Estimation
I employed Stan 2.13.1 (Stan Development Team, 2016b) for the MCMC simulation programming to estimate the posterior distributions of parameters μ and σ, and RStan 2.13.2 (Stan Development Team, 2016a) for implementation in R. I ran four chains of 5,000 iterations each, the first 1,000 of which were discarded as burn-in samples. The thinning interval was set to one, yielding 16,000 sampled points of the posterior distribution.
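The number of retained draws follows directly from these sampler settings. The small check below uses a hypothetical helper, `retained_draws`, and assumes that the burn-in iterations are discarded before thinning.

```python
def retained_draws(chains, iterations, warmup, thin=1):
    """Posterior draws kept after discarding warmup and thinning each chain."""
    per_chain = (iterations - warmup) // thin  # integer division is an assumption
    return chains * per_chain


if __name__ == "__main__":
    # Settings reported in the text: 4 chains, 5,000 iterations, 1,000 burn-in, thin = 1.
    print(retained_draws(chains=4, iterations=5000, warmup=1000, thin=1))  # 16000
```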
Table 1 shows the summary of the MCMC estimation of posterior distributions for Model 1. Gelman–Rubin MCMC convergence statistic (
Table 1. Summary of MCMC Estimation (Model 1).
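I monitored the Gini coefficient of income distribution image as a transformed parameter. The Gini coefficient of the lognormal income distribution is parametrically obtained by the following formula (Aitchison & Brown, 1957):

$$G = 2\Phi\left(\frac{\sigma}{\sqrt{2}}\right) - 1,$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.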
Figure 4 shows the shared income distribution image predicted by the posterior distributions of the parameters of the lognormal distribution; it comprises an overlay of 100 plots whose parameters were randomly resampled from the MCMC sample. Figure 5 comprises the histogram of the observed average income image, overlaid by 100 randomly resampled posterior predictive distributions.

Figure 4. Predicted shared income distribution image (ten thousand Japanese yen).

Figure 5. Data with posterior predictive distribution (ten thousand Japanese yen).
Let us compare the estimated shared income distribution image with the actual household income distribution in Japan at that time. According to the Comprehensive Survey of Living Conditions (CSLC) 2015, the actual average household income in 2015 is 541.9 ten thousand Japanese yen, and the value of the Gini coefficient is 0.401. The estimated shared income distribution image tends to underestimate wealth (average income) and substantially overestimate inequality (the Gini coefficient) in the society.
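To make this comparison concrete, the sketch below evaluates the two summary indices of a lognormal distribution: the mean, exp(μ + σ²/2), and the Gini coefficient, 2Φ(σ/√2) − 1, where Φ is the standard normal distribution function. It also inverts both formulas to find the lognormal parameters that would reproduce the CSLC 2015 figures. This inversion is an illustration only and not part of the analysis reported here, since the actual income distribution need not be lognormal. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def lognormal_mean(mu, sigma):
    """Expected value of Lognormal(mu, sigma)."""
    return math.exp(mu + sigma ** 2 / 2)


def lognormal_gini(sigma):
    """Gini coefficient of a lognormal distribution; it depends on sigma only."""
    return 2 * _STD_NORMAL.cdf(sigma / math.sqrt(2)) - 1


def sigma_from_gini(gini):
    """Invert the Gini formula: the sigma that yields a given Gini coefficient."""
    return math.sqrt(2) * _STD_NORMAL.inv_cdf((gini + 1) / 2)


def mu_from_mean(mean, sigma):
    """Invert the mean formula for a given sigma."""
    return math.log(mean) - sigma ** 2 / 2


if __name__ == "__main__":
    # CSLC 2015 figures quoted in the text (mean in ten thousand yen).
    sigma_ref = sigma_from_gini(0.401)
    mu_ref = mu_from_mean(541.9, sigma_ref)
    print("reference sigma, mu:", sigma_ref, mu_ref)
    print("round trip:", lognormal_mean(mu_ref, sigma_ref), lognormal_gini(sigma_ref))
```

Because the Gini coefficient of a lognormal distribution is a function of σ alone, any posterior draw of σ maps directly to a draw of the Gini coefficient, which is what allows it to be monitored as a transformed parameter.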
Model 2: Difference of Shared Income Distribution Image Among Age Cohorts
Model Definition
Although Model 1 is a relatively simple prototype model in which all respondents hold the same image of income distribution, it can be theoretically assumed that people hold different images, based on their different social experiences. Based on sociological theories of reference groups (e.g., Hyman, 1942; Merton, 1957), it can be assumed that people form their image of income distribution mainly by integrating information on incomes of their reference group members with whom they interact daily. Thus, the image is an incomplete reflection of the income distribution of the reference group. The latter is assumed to be selected based on either geographic proximity or similarity in terms of attributes and socioeconomic status (Singer, 1981). Hence, images can differ with social position.
As the scope of reference is clearly set on age in the actual question of average income image in this survey, the age cohort is the first to be considered as a category that causes differences in the income distribution image. In the actual analysis, I created a nominal variable of age cohort, consisting of categories of 10-year duration between 20 and 59 years of age, such as 20s (aged 20-29 years) and 30s (aged 30-39 years), and the 60s category from 60 to 64 years old. Furthermore, samples were divided by gender because typical features of an age cohort within the reference group could differ by gender, especially in Japan, where gender roles and rules still strongly constrain people’s life courses (Kano, 2015).
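The construction of the age cohort variable can be sketched as follows. The function name `age_cohort` is hypothetical, and the sketch assumes that age is recorded in whole years within the survey range of 20 to 64.

```python
def age_cohort(age):
    """Map an age in whole years (20-64) to its cohort label, e.g. 34 -> '30s'."""
    if not 20 <= age <= 64:
        raise ValueError("age outside the survey range of 20 to 64 years")
    decade = min(age // 10 * 10, 60)  # the 60s cohort covers ages 60-64 only
    return f"{decade}s"


if __name__ == "__main__":
    for age in (20, 29, 30, 59, 60, 64):
        print(age, age_cohort(age))
```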
In the analytical model, the difference of shared income distribution image among each category is described as the difference in parameters of the latent income lognormal distribution, μ and σ. Besides, I assume that the parameter μ is predicted by a linear equation, that each parameter of the equation has its distributions, and that the parameter σ obeys a gamma distribution. This model resembles the ANOVA model in frequentist statistics (Kruschke, 2015). However, it is more flexible, as we can estimate not only differences of
The average income image held by the
and its distribution is
The parameter
In Equation 1,
where
Figure 6 comprises the graphical representation of Model 2, and Figure 7 represents its outline.

Figure 6. Graphical model of Model 2.

Figure 7. Outline of Model 2.
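The structure described above can be illustrated with a generative sketch. Each category has its own μ, given by a baseline plus a category deflection, and its own σ drawn from a gamma distribution. The average income images in a category are then drawn from the normal approximation to a sample mean of n lognormal draws. This is only a sketch of an ANOVA-like structure. The sum-to-zero centring of the deflections, the normal distribution for the deflections, all hyperparameter values, and the function name `simulate_categories` are assumptions made for illustration, not the specification used in this study.

```python
import math
import random


def simulate_categories(categories, beta0, tau, gamma_shape, gamma_scale,
                        n, per_category, seed=0):
    """Generate category-specific (mu, sigma) and simulated average income images."""
    rng = random.Random(seed)
    raw = [rng.gauss(0.0, tau) for _ in categories]
    centre = sum(raw) / len(raw)  # centre the deflections so that they sum to zero
    result = {}
    for name, deflection in zip(categories, raw):
        mu = beta0 + (deflection - centre)                   # linear predictor for mu
        sigma = rng.gammavariate(gamma_shape, gamma_scale)   # gamma-distributed sigma
        mean = math.exp(mu + sigma ** 2 / 2)
        sd = math.sqrt((math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2) / n)
        images = [rng.gauss(mean, sd) for _ in range(per_category)]
        result[name] = {"mu": mu, "sigma": sigma, "images": images}
    return result


if __name__ == "__main__":
    cohorts = ["20s", "30s", "40s", "50s", "60s"]
    sim = simulate_categories(cohorts, beta0=6.0, tau=0.2, gamma_shape=9.0,
                              gamma_scale=0.1, n=30, per_category=100)
    for name, values in sim.items():
        print(name, round(values["mu"], 3), round(values["sigma"], 3))
```

In estimation the direction is reversed: the category-specific μ and σ are inferred from the observed average income images, and indices such as the mean and the Gini coefficient are computed from their posterior draws.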
Results of the MCMC Estimation
As for Model 2, I also ran four chains of 5,000 iterations each, the first 1,000 of which were discarded as burn-in samples. The thinning interval was again set to one, generating 16,000 samples of the posterior distribution.
As for the male sample,
Table 2. Summary of MCMC Estimation (Model 2, Male Sample).
Model 2 with the female sample can also be regarded as being converged. Table 3 shows posterior distributions of the parameters determining different income distribution images for categories.
Table 3. Summary of MCMC Estimation (Model 2, Female Sample).
The differences of the means and the Gini coefficients of lognormally shaped income distribution images in age cohorts are shown in Figures 8 and 9, respectively. For both male and female samples, the mean of the distribution image increases as the age cohort rises until the 50s and then decreases in the 60s. As for the Gini coefficient, although the values are relatively high, there is a

Figure 8. Mean of shared income distribution image (median and 95% credible interval).

Figure 9. Gini coefficient of shared income distribution image (median and 95% credible interval).
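Summaries of this kind (a posterior median with a 95% credible interval) can be computed from MCMC draws as sketched below. Each draw of (μ, σ) is transformed into the mean and the Gini coefficient of the lognormal image, and the transformed draws are then summarized by their median and their 2.5% and 97.5% quantiles. The draws used here are synthetic stand-ins generated from hypothetical values, not the posterior samples of this study, and the function names are introduced for illustration.

```python
import math
import random
from statistics import NormalDist, median, quantiles

_STD_NORMAL = NormalDist()


def summarize(draws):
    """Median and central 95% interval (2.5% and 97.5% quantiles) of the draws."""
    cuts = quantiles(draws, n=40, method="inclusive")  # cut points at 2.5%, 5%, ..., 97.5%
    return median(draws), cuts[0], cuts[-1]


def derived_summaries(mu_draws, sigma_draws):
    """Summaries of the lognormal mean and Gini coefficient implied by each draw."""
    means = [math.exp(m + s ** 2 / 2) for m, s in zip(mu_draws, sigma_draws)]
    ginis = [2 * _STD_NORMAL.cdf(s / math.sqrt(2)) - 1 for s in sigma_draws]
    return {"mean": summarize(means), "gini": summarize(ginis)}


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic stand-ins for posterior draws of one category (hypothetical values).
    mu_draws = [rng.gauss(6.0, 0.02) for _ in range(4000)]
    sigma_draws = [abs(rng.gauss(0.9, 0.02)) for _ in range(4000)]
    for name, (mid, low, high) in derived_summaries(mu_draws, sigma_draws).items():
        print(name, mid, low, high)
```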
Finally, I compared the estimated shared income distribution image of each age cohort with the actual household income distribution of the age cohort. Table 4 shows the actual average household income and the Gini coefficient of each age cohort in 2015 from CSLC 2015 and the SSP 2015 data. Each estimated shared income distribution image tends to underestimate wealth (average income) and overestimate inequality (the Gini coefficient) in the society; still, these images approximately reflect tendencies in difference among age cohorts.
Table 4. Actual Average Household Income and the Gini Coefficient in Age Cohort.
Model 3: Difference of Shared Income Distribution Image Among Income Classes
We can apply another categorical variable or a set of categorical variables to this Bayesian hierarchical model. Here, I present another model that uses actual household income class, which consists of four income categories divided at the quartile points. The categories are below 375 ten thousand yen (c1), from 375 to 600 (c2), from 600 to 800 (c3), and above 800 (c4). The structure of the model and the procedure of MCMC sampling are the same as in Model 2 (see Figures 6 and 7).
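The assignment of respondents to income classes can be sketched as below. The text does not state how values exactly on a cut point are handled, so the sketch assumes that each class includes its lower bound; `income_class` is a hypothetical name.

```python
from bisect import bisect_right

# Quartile cut points in units of ten thousand yen, as given in the text.
CUT_POINTS = (375, 600, 800)
LABELS = ("c1", "c2", "c3", "c4")


def income_class(household_income):
    """Return the income class label; lower bounds are inclusive by assumption."""
    return LABELS[bisect_right(CUT_POINTS, household_income)]


if __name__ == "__main__":
    for income in (200, 375, 599, 600, 800, 1200):
        print(income, income_class(income))
```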
Table 5 shows the result of the estimation of posterior distributions of parameters determining different income distribution images for income classes, and Figures 10 and 11 show differences in the means and the Gini coefficients of lognormally shaped income distribution images in income classes. There is an explicit linear relationship between the mean of shared income distribution image and income class, that is, the mean of income distribution image increases markedly as the income class increases. In contrast, the Gini coefficient of income distribution image decreases slightly as the income class increases. An explanation of such trends of the mean may be directly derived from reference group theory, which states that people tend to form their image by comparisons with others close to their socioeconomic status (Merton, 1957). The trend of the Gini coefficient may be related to the narrowness of the reference scope in higher income classes.
Table 5. Summary of MCMC Estimation (Model 3).

Figure 10. Mean of shared income distribution image (median and 95% credible interval).

Figure 11. Gini coefficient of shared income distribution image (median and 95% credible interval).
Conclusion
Thus far, we have investigated people’s image of income distribution and its difference by social position from data collected during a 2015 Japanese survey, by applying Bayesian statistical analytical models.
The study concludes that the distribution image tends to underestimate the average income level and overestimate inequality.
The fact that people tend to underestimate the average income level implies a possibility that people tend to overlook the existence of higher income earners when imagining the income distribution of their reference group. If so, this study’s assumptions of lognormal distribution as the income distribution image and random sampling from the image should be reconsidered, and if necessary, revised in future studies. Besides, the assumption of lognormal distribution would be the main cause of the overall increase in the values of the Gini coefficient. Hence, the focus should only be on the relative differences among the values of the Gini coefficient in this study.
Despite the effects of the distributional assumption on the results of analyses, in general, the income distribution image could be seen as an incomplete reflection of the income distribution characteristics of the reference group.
I would like to highlight some implications for an actual redistribution policy. As a matter of principle, the opinion of people based on their subjective evaluations of current distributional situations should be respected in policy making and assessment. However, the result of this study implies that the image of income distribution varies according to the scope of the reference group. Besides, it implies that income distribution image would be biased by overlooking higher income earners. Therefore, these properties of income distribution image should be carefully considered in policy making.
Once again, I would like to stress several advantages of adopting the Bayesian model to study distribution image. First, we can make a strict assumption of latent image in the Bayesian hierarchical model. Second, we can extract some latent information through the flexible model. In this case, we extracted information about inequality of the distribution image in terms of the Gini coefficient.
Some future tasks remain, based on limitations of this study. First, the assumption of lognormal distribution as latent income distribution image should be reconsidered as per the lessons of this study. Second, as the models are relatively simplistic, future studies should develop more empirically realistic and complex models to explore causal mechanisms in the formation of societal images. Third, the Bayesian model of images presented in this article should be verified by direct observation of the images. Finally, a connection should be made to formal theoretical models for a comprehensive study of image and social cognition.
Appendix
Full Detail of the Questions for Average Income Image and Household Income.
| Item | Question wording or response option |
|---|---|
| Average income image | How much do you think the average annual income of people your age is? Please choose one of the following. |
| Household income | What was the total income of your household (all people living together as a family unit), before taxes, for the past year? Please include all casual income and extra income such as annual pension, dividends on stock shares, etc. (Please choose single answer) |
| 1 | None |
| 2 | Less than ¥250,000 |
| 3 | ¥250,000 or higher but less than ¥500,000 |
| 4 | ¥500,000 or higher but less than ¥750,000 |
| 5 | ¥750,000 or higher but less than ¥1,000,000 |
| 6 | ¥1,000,000 or higher but less than ¥1,250,000 |
| 7 | ¥1,250,000 or higher but less than ¥1,500,000 |
| 8 | ¥1,500,000 or higher but less than ¥2,000,000 |
| 9 | ¥2,000,000 or higher but less than ¥2,500,000 |
| 10 | ¥2,500,000 or higher but less than ¥3,000,000 |
| 11 | ¥3,000,000 or higher but less than ¥3,500,000 |
| 12 | ¥3,500,000 or higher but less than ¥4,000,000 |
| 13 | ¥4,000,000 or higher but less than ¥4,500,000 |
| 14 | Approximately ¥5,000,000 (¥4,500,000 or higher but less than ¥5,500,000) |
| 15 | Approximately ¥6,000,000 (¥5,500,000 or higher but less than ¥6,500,000) |
| 16 | Approximately ¥7,000,000 (¥6,500,000 or higher but less than ¥7,500,000) |
| 17 | Approximately ¥8,000,000 (¥7,500,000 or higher but less than ¥8,500,000) |
| 18 | Approximately ¥9,000,000 (¥8,500,000 or higher but less than ¥9,500,000) |
| 19 | Approximately ¥10,000,000 (¥9,500,000 or higher but less than ¥10,500,000) |
| 20 | Approximately ¥11,000,000 (¥10,500,000 or higher but less than ¥11,500,000) |
| 21 | Approximately ¥12,000,000 (¥11,500,000 or higher but less than ¥12,500,000) |
| 22 | Approximately ¥13,000,000 (¥12,500,000 or higher but less than ¥13,500,000) |
| 23 | Approximately ¥14,000,000 (¥13,500,000 or higher but less than ¥14,500,000) |
| 24 | Approximately ¥15,000,000 (¥14,500,000 or higher but less than ¥15,500,000) |
| 25 | Approximately ¥16,000,000 (¥15,500,000 or higher but less than ¥16,500,000) |
| 26 | Approximately ¥17,000,000 (¥16,500,000 or higher but less than ¥17,500,000) |
| 27 | Approximately ¥18,000,000 (¥17,500,000 or higher but less than ¥18,500,000) |
| 28 | Approximately ¥19,000,000 (¥18,500,000 or higher but less than ¥19,500,000) |
| 29 | Approximately ¥20,000,000 (¥19,500,000 or higher but less than ¥20,500,000) |
| 30 | ¥20,500,000 or higher but less than ¥25,000,000 |
| 31 | ¥25,000,000 or higher but less than ¥30,000,000 |
| 32 | ¥30,000,000 or higher but less than ¥40,000,000 |
| 33 | ¥40,000,000 or higher but less than ¥50,000,000 |
| 34 | ¥50,000,000 or higher but less than ¥60,000,000 |
| 35 | ¥60,000,000 or higher but less than ¥70,000,000 |
| 36 | ¥70,000,000 or higher but less than ¥80,000,000 |
| 37 | ¥80,000,000 or higher but less than ¥90,000,000 |
| 38 | ¥90,000,000 or higher but less than ¥100,000,000 |
| 39 | ¥100,000,000 or higher |
Acknowledgements
The author thanks the Stratification and Social Psychology (SSP) Project for the permission to use the SSP 2015 survey. The author also thanks the editor, Jinxian Wang, and two anonymous referees for their helpful comments that improved the article. The author is also grateful for comments on the study made by Toru Kikkawa, Hiroshi Hamada, Yoshimichi Sato, and Gianluca Manzo.
Author’s Note
Atsushi Ishida is now at Kwansei Gakuin University, Japan.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: JSPS KAKENHI Grant Numbers 16H02045, 15K13080.
