
Abstract

Frequency of behaviors or amounts of variables of interest are essential topics in many surveys. The use of heuristics might cause rounded answers, resulting in the increased occurrence of end-digits (called heaping or digit-preference). For web surveys (or CASI), we propose using a conditional prompt as input validation if digits indicating heaping are entered. We report an experiment, where respondents in an online access panel (n = 2,590) were randomly assigned to one of three groups: (1) no input validation; (2) conditional input validation if rounding was presumed; and (3) input validation every time a numerical value was entered. Conditional input validation reduces heaping for variables with high proportions of heaped values. Unconditional input validation seems to be less effective.

Introduction

Blair and Burton (1986, 1987) were the first to extensively review the problems associated with asking for the frequency of certain behaviors from respondents in an open-ended question. Respondents tend not to answer such questions in as much detail as intended by researchers. Instead, the answers often are multiples of certain numbers, mostly five or 10. Such behavior on the part of the respondent is termed heaping (also called digit-preference).

Vaske and Beaman (2006) provided an overview of heaping in surveys and its implications. For instance, response heaps occur not only at 0 and 5 but potentially at any number, and the gaps between response heaps increase with a growing range of responses (Vaske and Beaman 2006:286–91). Heaping has long been recognized as a problem in demography (evident in census counts), where questions asking for a person’s age often yield nonuniformly distributed terminal digits (Hobbs 2004). Furthermore, Wang and Heitjan (2008) showed that heaping could lead to biased descriptive statistics by distorting the underlying distribution, and it can affect inferences.

Heaping might be seen as the consequence of not counting each and every occurrence of the behavior to be reported. Problems due to heaping can be approached in two ways: either by preventing heaping during data collection or by correcting data that have already been collected. Most of the literature is dedicated to attempts to correct available data (see, e.g., Allen et al. 2017; Bar and Lillard 2012; Crawford et al. 2015; Wolff and Augustin 2003; Zinn and Würbach 2015). For example, Zinn and Würbach (2015) used a zero-inflated log-normal distribution for self-reported income. In contrast, attempts at preventing heaping during data collection have been rare. One of the few examples is Becker and Diop-Sidibé (2003), who proposed using a calendar to reduce heaping when recording durations.

Tourangeau et al. (2004:235) list three causes for rounding: (1) imprecise encoding of the information in memory; (2) indeterminacy in the underlying quantities (e.g., the actual price of a house is unknown until it is being sold); and (3) the burden of retrieving numerous specific pieces of information.¹ From the perspective of a survey designer, only the last mechanism can be used to reduce heaping since the other two are properties of the memory encoding or correspond to the true state of nature. In some contexts, the subjectively perceived burden can be reduced by increasing the motivation of respondents, for example, by emphasizing the importance of the answer (Cannell et al. 1981:404). Therefore, to increase motivation, prompts have been used, for example, to reduce nonresponse (DeRouvray and Couper 2002) or improve the quality of the given answer (Conrad et al. 2005).²

This article reports on a randomized experiment, where two different input validation prompts are used to reduce heaping during data collection in a general population web survey (n = 2,590). Respondents are either (1) not prompted; (2) prompted if rounding was detected; or (3) prompted every time a numerical answer was given. The conditional feedback on rounding behavior could be seen as an external motivation to increase the cognitive effort. Therefore, an input validation prompt should reduce heaping.

Methods

The proportions of rounded answers are compared between groups using a test for differences in proportions. The test statistic is

\[ z = \frac{\hat{\pi}_{1} - \hat{\pi}_{2}}{\sqrt{\hat{\pi}_{p} (1 - \hat{\pi}_{p}) \left( \frac{1}{n_{1}} + \frac{1}{n_{2}} \right)}} \tag{1} \]

with

\[ \hat{\pi}_{p} = \frac{x_{1} + x_{2}}{n_{1} + n_{2}} \tag{2} \]

where π̂₁ and π̂₂ are the observed proportions of rounded answers within the two experimental groups, π̂ₚ is the pooled proportion across both groups, n₁ and n₂ are the numbers of responses per group, and x₁ and x₂ are the numbers of persons with rounded answers (see, e.g., Agresti et al. 2017:478f.). One-sided tests were used.
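As a minimal sketch of equations (1) and (2), the pooled two-proportion test statistic can be computed as follows. The function name and the counts in the usage line are hypothetical and chosen only for illustration; they are not the per-group counts of the study.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two proportions.

    x1, x2: number of rounded answers in group 1 and group 2
    n1, n2: number of responses in group 1 and group 2
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Equation (2): pooled proportion across both groups.
    p_pooled = (x1 + x2) / (n1 + n2)
    # Equation (1): difference divided by its pooled standard error.
    standard_error = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical counts: 400 of 1,000 rounded answers in a prompted group
# versus 275 of 500 in an unprompted group.
z = two_proportion_z(400, 1000, 275, 500)
print(f"z = {z:.2f}")
```

A negative value indicates a lower proportion of rounded answers in the first group than in the second.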

The statistical effect size for the difference between two proportions (p₁ and p₂) is usually evaluated using Cohen’s h which is defined as (Gleser and Olkin 2009:364)

\[ h = 2 \arcsin \sqrt{p_{1}} - 2 \arcsin \sqrt{p_{2}} \tag{3} \]
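Equation (3) is straightforward to evaluate numerically. The following sketch (the function name is an assumption made for illustration) applies it to the rounded proportions for clothing expenses in Table 2, with 0.40 under conditional input validation and 0.55 without input validation.

```python
import math


def cohens_h(p1, p2):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Rounded proportions from Table 2 (expenses): conditional versus no prompt.
h_expenses = cohens_h(0.40, 0.55)
print(f"h = {h_expenses:.2f}")
```

Because the inputs are proportions rounded to two decimals, results computed this way may differ slightly from values based on the unrounded data.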

Data

Results reported here are based on an experiment using all panel members of an academically managed non-probability online panel (WiSo Panel, Göritz 2014). As with many other non-probability online surveys (Callegaro et al. 2015), this panel has used different online recruitment methods such as banner ads, search engines, online networks, and newsletters. In addition, panel members are encouraged to recruit further panel members (Crutzen and Göritz 2012:196). Of course, non-probability samples should not be used for point estimates for a population (Baker et al. 2010), but estimating experimental effects is widely considered legitimate within bounds given by the heterogeneity of the sampled population (Kohler et al. 2019:166).

The resulting sample (n = 2,590) consisted of 60.8% women and 39.2% men; ages ranged from 15 to 89 years (with a mean of 44). A comparison of the distribution of educational level in the sample and the German census is shown in Table 1. As in many other access panels, the educational level of the respondents is biased in favor of more educated respondents (McCutcheon et al. 2014:115).³

Table 1.

Educational Level in the Sample (Unweighted, in Percent). Census Data for People above 14 Years of Age. The Sample Contains a Larger Proportion of Highly Educated People than the General Public.

Educational level	Sample	Census
No degree	1.2	7.1
Nine years of school	11.4	35.8
Vocational qualification	28.1	28.7
Higher education entry qualification	31.2	12.7
University degree	26.1	13.6
Doctorate	2.0	1.2

The survey contained questions on:

(1) the number of alcoholic beverages per week,

(2) the number of cigarettes smoked per day,

(3) kilometers driven by car per week,

(4) expenses in euros for clothing in the previous month,

(5) the number of orders of electronic items in the previous 12 months.⁴

The questions were presented in random order. The respondents were (disproportionately) randomly assigned to one of three groups:

(1) no input validation (n = 653, ≈25%),

(2) conditional input validation, that is, a prompt whenever an answer given to a question was probably rounded (n = 1,293, ≈50%),

(3) input validation with a prompt for every question (unconditional input validation) if a number larger than zero was entered (n = 644, ≈25%).
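A disproportionate random assignment of this kind could be sketched as below. The group labels, the seed, and the use of Python's `random.choices` are assumptions for illustration; the survey software and its actual randomization procedure are not described in the text.

```python
import random
from collections import Counter

# Hypothetical labels for the three experimental groups.
GROUPS = ("none", "conditional", "unconditional")
# Approximate assignment probabilities reported in the text.
WEIGHTS = (0.25, 0.50, 0.25)


def assign_group(rng):
    """Draw one experimental group with unequal assignment probabilities."""
    return rng.choices(GROUPS, weights=WEIGHTS, k=1)[0]


# Simulate the assignment of 2,590 respondents with an arbitrary seed.
rng = random.Random(2590)
counts = Counter(assign_group(rng) for _ in range(2590))
print(counts)
```

With independent draws, realized group sizes vary around the target shares, which is consistent with the approximate percentages given above.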

For the conditional input validation, an answer was seen as rounded if it was a multiple of five for alcohol consumption, cigarettes, and order of electronic items. Additionally, for alcohol consumption, a weekly reoccurrence (multiple of seven) and for order of electronic items a monthly reoccurrence (multiple of 12) was considered a rounded answer. For kilometers driven and clothing expenses, an answer was seen as rounded if it was a multiple of 50.

The input validation prompt read: “If you guessed this answer, please can you give a precise answer” (translated from German).⁵
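The rules above can be summarized in a short sketch. Variable keys, group labels, and function names are hypothetical, and treating zero as not rounded under the conditional rule is an assumption, since the text states the larger-than-zero condition only for the unconditional group.

```python
PROMPT_TEXT = "If you guessed this answer, please can you give a precise answer"

# Multiples that mark an answer as presumably rounded, per variable.
ROUNDING_MULTIPLES = {
    "alcohol": (5, 7),      # multiples of five, or weekly reoccurrence
    "cigarettes": (5,),
    "orders": (5, 12),      # multiples of five, or monthly reoccurrence
    "km": (50,),
    "expenses": (50,),
}


def is_presumed_rounded(variable, value):
    """Return True if the answer is a positive multiple of a heaping digit."""
    if value <= 0:  # assumption: zero is never treated as rounded
        return False
    return any(value % m == 0 for m in ROUNDING_MULTIPLES[variable])


def should_prompt(group, variable, value):
    """Decide whether the input validation prompt is shown."""
    if group == "none":
        return False
    if group == "unconditional":
        return value > 0
    if group == "conditional":
        return is_presumed_rounded(variable, value)
    raise ValueError(f"unknown group: {group}")


if should_prompt("conditional", "alcohol", 14):
    print(PROMPT_TEXT)
```

For example, 14 alcoholic beverages per week triggers the conditional prompt (a multiple of seven), whereas 14 cigarettes per day does not.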

After editing for incomplete, zero, and implausible responses, 2,590 cases with 1,594 responses for alcohol consumption, 775 responses for cigarette counts, 2,190 responses for kilometers driven by car, 1,987 responses for clothing expenses, and 1,411 responses for orders of electronic items remained.⁶

Results

Table 2 and Figure 1 show the proportion of rounded responses per variable. Between 14 and 71% of all responses were rounded (considering a multiplier of five). As expected, for clothing expenses (in euros) and kilometers driven by car, pronounced heaping was found for multiples of 50. Furthermore, for cigarettes, large proportions of rounded responses are seen for multiples of five. For the other two variables of interest (beverages and electronic orders), heaping could be observed neither for multiples of five nor for weekly or monthly reoccurrences. Given that 75% of all respondents reported numbers less than five (alcohol) or four (electronic orders), it is likely that, for these rare occurrences, the respondents actually recalled the individual episodes.

Table 2.

Proportions of Responses of Multiples for Each Input Validation Group: Conditional (C), Unconditional (UC), No Input Validation (No). Conditional Input Validation Shows Lower Proportions of Heaped Responses than No, as Well as Unconditional Input Validation.

Heaping on	Orders				Alcohol				Cigarettes
Multiplier of	No	C	UC	Total	No	C	UC	Total	No	C	UC	Total
5	0.15	0.12	0.17	0.14	0.23	0.19	0.23	0.21	0.75	0.67	0.73	0.71
7					0.05	0.06	0.08	0.06
12	0.01	0.01	0.01	0.01

Heaping on	Km				Expenses
Multiplier of	No	C	UC	Total	No	C	UC	Total
50	0.59	0.51	0.52	0.53	0.55	0.40	0.47	0.45

Figure 1.

Proportions of rounded answers (five for electronic orders, alcoholic beverages, and cigarettes; 50 for driven kilometers, and expenses). Conditional input validation always yields lower proportions than no or unconditional input validation.

The experimental results (see Table 2) correspond to the theoretically expected order of the amount of heaping. For four variables of interest, the largest proportion of rounded answers is found in the group with no input validation (the exception is electronic orders, where the largest proportion is found in the unconditional prompting group).

For all five variables, conditional input validation resulted in lower proportions of rounded answers than in the other two groups. In contrast, the unconditional reminder resulted in a decrease for only three variables. Furthermore, for all five variables, the decrease is smaller for the unconditional prompt than for the conditional prompt.

These effects of the experimental conditions on differences in the proportions of rounded answers were tested by pairwise comparisons (no prompt versus conditional prompting, no prompt versus unconditional prompt; see Table 3).

Table 3.

Z Statistics: Pairwise Comparison of Proportions of Rounded Answers of Experimental Groups (Unweighted; One-sided). Variables Are Ordered According to Median Values for all Respondents. All Observed Differences Correspond to Theoretical Expectations: Prompting Reduces Heaping, Larger Reductions Are Observed for Conditional Prompting. For Both Experimental Conditions, the Size of the Effect Increases from Top to Bottom (i.e., with Increasing Median Value of the Variable).

	Conditional versus no prompt	Unconditional versus no prompt
Orders	−1.12	0.81
Alcohol	−1.64	−0.16
Cigarettes	−2.02*	−0.52
Km	−2.96**	−2.13*
Expenses	−5.65**	−2.62**

*p < 0.05, **p < 0.01.

For three variables (cigarettes, kilometers, and expenses), the decrease in heaping was significant for the conditional prompt. For the unconditional prompt, the decrease was significant for two variables (kilometers and expenses). As can be seen in Figure 1, the three variables that showed a significant reduction in heaping for conditional prompts are those variables with the largest amount of heaping in all experimental groups.⁷
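The significance markers in Table 3 can be checked with a short sketch that converts each z statistic into a one-sided p-value. It assumes that the alternative hypothesis is a lower proportion of rounded answers in the prompted group, so the lower tail of the standard normal distribution is used; function names are hypothetical.

```python
import math


def one_sided_p(z):
    """Lower-tail p-value P(Z <= z) for a standard normal Z."""
    return 0.5 * math.erfc(-z / math.sqrt(2))


def significance_marker(p):
    """Marker convention of Table 3: * for p < 0.05, ** for p < 0.01."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# z statistics from Table 3: (conditional, unconditional) versus no prompt.
TABLE_3 = {
    "Orders": (-1.12, 0.81),
    "Alcohol": (-1.64, -0.16),
    "Cigarettes": (-2.02, -0.52),
    "Km": (-2.96, -2.13),
    "Expenses": (-5.65, -2.62),
}

for variable, z_values in TABLE_3.items():
    cells = [f"{z:.2f}{significance_marker(one_sided_p(z))}" for z in z_values]
    print(variable, *cells)
```

Under this assumption, the z statistic of −1.64 for alcohol corresponds to a p-value marginally above 0.05, which agrees with the absence of a marker in the table.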

The literature repeatedly reported more heaping with increasing magnitude of the response (Vaske and Beaman 2006:291). Due to the high skewness of the answers to many behavioral frequency questions, we considered the median response given as a better indicator for the magnitude than the range or the maximum. Therefore, we ordered the variables according to the median value of the variable of interest (disregarding experimental groups in the computation of the median). For both experimental conditions, the size of the experimental effect increases with an increasing median value of the variable (note the decreasing value of the test statistic in Table 3 from top to bottom in both columns). Thus, prompting seems to have a larger effect if higher values of response variables are to be expected.

The effect sizes for the variables of interest are shown in Table 4. The proposed conditional prompt yields an average absolute effect of 0.17. Cohen (1988:184) considered values of h about 0.2 as small effects. However, conditional prompts resulted in an absolute decrease in the proportion of rounded answers of 3 percentage points (orders) to 15 percentage points (expenses).
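The summary figures in this paragraph can be retraced from the published tables. The sketch below assumes that the average effect is the mean of the absolute values of Cohen's h for the conditional prompt (Table 4) and that the absolute decreases are differences between the rounded proportions in Table 2; variable names are hypothetical.

```python
# Cohen's h, conditional versus no prompt (Table 4).
H_CONDITIONAL = {
    "Orders": -0.09,
    "Alcohol": -0.10,
    "Cigarettes": -0.18,
    "Km": -0.16,
    "Expenses": -0.30,
}

# Proportions of rounded answers (Table 2): (no prompt, conditional prompt).
PROPORTIONS = {
    "Orders": (0.15, 0.12),
    "Alcohol": (0.23, 0.19),
    "Cigarettes": (0.75, 0.67),
    "Km": (0.59, 0.51),
    "Expenses": (0.55, 0.40),
}

# Mean absolute effect size across the five variables.
mean_abs_h = sum(abs(h) for h in H_CONDITIONAL.values()) / len(H_CONDITIONAL)

# Absolute decrease in percentage points for each variable.
decrease_pp = {
    variable: round((no_prompt - conditional) * 100)
    for variable, (no_prompt, conditional) in PROPORTIONS.items()
}

print(f"mean |h| = {mean_abs_h:.2f}")
print(decrease_pp)
```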

Table 4.

Effect Size (Cohen’s h) for the Difference between Experimental Groups and Control Group.

	Conditional versus no prompt	Unconditional versus no prompt
Orders	−0.09	0.05
Alcohol	−0.10	0.00
Cigarettes	−0.18	−0.05
Km	−0.16	−0.14
Expenses	−0.30	−0.16

In summary, for all variables, the conditional prompt resulted in a larger decrease than the unconditional prompt, and the difference between the two prompts increases with the magnitude of the response. Therefore, given the small additional costs due to the conditional prompt, we would recommend using conditional prompts in behavioral frequency questions in web surveys.

Discussion

Behavioral frequency questions and questions asking for numerical estimates tend to produce multi-modal frequency distributions due to heaping. Therefore, statistical modeling might be more difficult and might even result in biased estimates (Wang and Heitjan 2008).

Although most of the literature focuses on correction methods after the data collection, here we proposed a method to reduce heaping during the data collection. The conditional prompt after presumed rounded answers significantly reduced the amount of heaping in all three variables with severe rounding. Conditional input validation reduces heaping for variables with high proportions of heaped values. Unconditional input validation seems to be less effective.

This diminished effect of repetitive and unreliable warnings and alarms might be similar to the long-known “cry wolf” effect in engineering alarm systems (Bliss et al. 1995; Breznitz 1984). If the goal of the alarm is perceived as important and the alarm has a high positive predictive value, the cry wolf effect is not to be expected. However, if neither is present, the cry wolf effect will be seen (Johnson et al. 2017). Therefore, the success of the input validation will depend on whether the respondent task is perceived as important and whether the prompt is perceived as legitimate. Later research should examine whether this mechanism is actually the cause of the observed differences in the effectiveness of the prompts.

However, prompting seems to reduce heaping and larger reductions have been observed for conditional prompting. Therefore, a prompt motivated by a presumed rounded answer might be the best design option for reducing heaping in numerical answers to behavioral frequency questions.

The results reported here are based on behavioral frequency questions. Whether this effect also applies to other questions that might lead to rounding (Holbrook et al. 2014), such as personal characteristic questions, questions about age at the time of an event, questions about percentages, or feeling-thermometers, remains to be studied.

A limitation of this between-subjects experiment is due to the software used for implementation. The program does not allow the storage of the initial answers after a changed response following a prompt. In further research, the initial answer should be recorded and compared to later corrections. In addition, at least a subsample of respondents correcting their previous answer should receive a follow-up question asking whether the updated answer is an accurate response or merely a measure to prevent further prompts. A further limitation of this study is common in research on heaping: The lack of external validation data, particularly for behavioral frequency questions, makes the evaluation of measures to reduce heaping difficult. The mentioned follow-up question might partially compensate for the absence of external validation data to evaluate the correctness of the given response. Furthermore, in a sequence of numerical questions, the optimal time spacing of conditional prompts needs to be determined.

Another limitation is due to the non-probability sampling. However, most web surveys (Baker et al. 2010) are based on non-probability sampling methods. The panel used here holds participants from all walks of life who were recruited from a variety of sources. Although the treatment effect in a general population survey might be different (Kohler et al. 2019), it is unlikely that the effect will not be detected in a general population survey. Whether a given non-probability sample is fit for the purpose of the study depends on the purpose and the sampling methods used (Baker et al. 2013). The statistical conditions necessary for the generalization of experimental results are described by Kohler et al. (2019). An overestimated treatment effect would be due to a high correlation between the selection probability for the web survey and the size of the motivational effect of the prompt. Since we found no effects of education, gender, and age on the size of the effect, we see no evidence for an overestimation due to non-probability sampling. Therefore, we consider the sample as fit for the purpose of this study.

Finally, replications of the experiment in other general population surveys seem to be necessary.

Footnotes

Acknowledgments

We thank the four anonymous reviewers for the detailed reviews and helpful comments to improve this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Notes

References

Agresti, Franklin, Klingenberg. 2017. Statistics: The art and science of learning from data. 4th ed. Harlow, London: Pearson.

Allen C. M., Griffith S. D., Shiffman, Heitjan D. F. 2017. Proximity and gravity: Modeling heaped self-reports. Statistics in Medicine 36:3200–15.

Baker, Blumberg S. J., Brick J. M., Couper M. P., Courtright, Dennis J. M., Dillman, Frankel M. R., Garland, Groves R. M., Kennedy, Krosnick, Lavrakas P. J., Lee, Link, Piekarski, Rao, Thomas R. K., Zahs. 2010. AAPOR report on online panels. Public Opinion Quarterly 74:711–81.

Baker, Brick J. M., Bates N. A., Battaglia, Couper M. P., Dever J. A., Gile K. J., Tourangeau. 2013. Summary report of the AAPOR task force on non-probability sampling. Journal of Survey Statistics and Methodology 1:90–105.

Bar H. Y., Lillard D. R. 2012. Accounting for heaping in retrospectively reported event data—A mixture-model approach. Statistics in Medicine 31:3347–65.

Becker, Diop-Sidibé. 2003. Does use of the calendar in surveys reduce heaping? Studies in Family Planning 34:127–32.

Bethlehem, Biffignandi. 2012. Handbook of web surveys. Hoboken, NJ: Wiley.

Blair, Burton. 1986. Processes used in the formulation of behavioral frequency reports in surveys. In Proceedings of the Section on Survey Research. American Statistical Association, 481–87. http://www.asasrms.org/Proceedings/papers/1986_090.pdf (accessed May 16, 2022).

Blair, Burton. 1987. Cognitive processes used by survey respondents to answer behavioral frequency questions. Journal of Consumer Research 14:280–88.

Bliss J. P., Gilson R. D., Deaton J. E. 1995. Human probability matching behaviour in response to alarms of varying reliability. Ergonomics 38:2300–12.

Breznitz. 1984. Cry wolf: The psychology of false alarms. Hillsdale, NJ: Lawrence Erlbaum.

Callegaro, Manfreda K. L., Vehovar. 2015. Web survey methodology. Los Angeles: Sage.

Cannell C. F., Miller P. V., Oksenberg. 1981. Research on interviewing techniques. Sociological Methodology 12:389–437.

Cohen. 1988. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.

Conrad F. G., Couper M. P., Tourangeau, Galesic. 2005. Interactive feedback can improve the quality of responses in web surveys. In Proceedings of the survey research methods section. ASA, 3835–40. http://www.asasrms.org/Proceedings/y2005/files/JSM2005-000938.pdf (accessed May 16, 2022).

Crawford F. W., Weiss R. E., Suchard M. A. 2015. Sex, lies, and self-reported counts: Bayesian mixture models for heaping in longitudinal count data via birth–death processes. Annals of Applied Statistics 9:572–96.

Crutzen, Göritz A. S. 2012. Public awareness and practical knowledge regarding hepatitis A, B, and C: A two-country survey. Journal of Infection and Public Health 5:195–98.

DeRouvray, Couper M. P. 2002. Designing a strategy for reducing “No Opinion” responses in web-based surveys. Social Science Computer Review 20:3–9.

Gideon, Helppie-McFall, Hsu J. W. 2017. Heaping at round numbers on financial questions: The role of satisficing. Survey Research Methods 11:189–214.

Gleser L. J., Olkin. 2009. Stochastically dependent effect sizes. In The handbook of research synthesis and meta-analysis. 2nd ed., eds. Cooper, Hedges L. V., Valentine J. C., 357–76. New York: Russell Sage.

Göritz A. S. 2014. Determinants of the starting rate and the completion rate in online panel studies. In Online panel research: A data quality perspective, eds. Callegaro, Baker, Bethlehem, Göritz A. S., Krosnick J. A., Lavrakas P. J., 154–70. Chichester, UK: Wiley.

Hobbs. 2004. Age and sex composition. In The methods and materials of demography. 2nd ed., eds. Siegel J. S., Swanson D. A., 125–73. London: Elsevier.

Holbrook A. L., Anand, Johnson T. P., Cho Y. I., Shavitt, Chávez, Weiner. 2014. Response heaping in interviewer-administered surveys: Is it really a form of satisficing? Public Opinion Quarterly 78:591–633.

Johnson K. R., Hagadorn J. I., Sink D. W. 2017. Alarm safety and alarm fatigue. Clinics in Perinatology 44:713–28.

Kohler, Kreuter, Stuart E. A. 2019. Nonprobability sampling and causal analysis. Annual Review of Statistics and Its Application 6:149–72.

Krosnick J. A. 1991. Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology 5:213–36.

McCutcheon A. L., Rao, Kaminska. 2014. The untold story of multi-mode (online and mail) consumer panels: From optimal recruitment to retention and attrition. In Online panel research: A data quality perspective, eds. Callegaro, Baker, Bethlehem, Göritz A. S., Krosnick J. A., Lavrakas P. J., 104–26. Chichester, UK: Wiley.

Peytchev, Crawford. 2005. A typology of real-time validations in web-based surveys. Social Science Computer Review 23:235–49.

Roberts, Gilbert, Allum, Eisner. 2019. Satisficing in surveys: A systematic review of the literature. Public Opinion Quarterly 83:598–626.

Tourangeau, Rips L. J., Rasinski. 2004. The psychology of survey response. Cambridge: Cambridge University Press.

Turner, Sturgis, Martin. 2015. Can response latencies be used to detect survey satisficing on cognitively demanding questions? Journal of Survey Statistics and Methodology 3:89–108.

Vaske J. J., Beaman. 2006. Lessons learned in detecting and correcting response heaping: Conceptual, methodological, and empirical observations. Human Dimensions of Wildlife 11:285–96.

Wang, Heitjan D. F. 2008. Modeling heaping in self-reported cigarette counts. Statistics in Medicine 27:3789–804.

Wolff, Augustin. 2003. Heaping and its consequences for duration analysis: A simulation study. Advances in Statistical Analysis 87:59–86.

Zinn, Würbach. 2015. A statistical approach to address the problem of heaping in self-reported income data. Journal of Applied Statistics 43:682–703.

Conditional Pop-up Reminders Reduce Incidence of Rounding in Web Surveys
