Abstract
How to display questions that are part of a battery in self-administered surveys is an important decision. Battery items may be displayed in a grid in a mail or computer web survey, but are often displayed as individual items on mobile devices. Although past research has compared grids to item-by-item displays in computer and mobile web surveys, almost no work has compared these displays in mail surveys. Additionally, many web survey templates use wide rectangular buttons to select response options in individual items in a mobile-optimized design, different from the standard round answer space format typically used in mail surveys. In this study, we experimentally test grid versus item-by-item displays and round radio buttons versus wide rectangular buttons for battery items in a probability-based general population mixed-mode mail + web survey of adults in Nebraska. Consistent with past research, we find that item-by-item displays reduce straightlining rates compared to grid designs. We also find that respondents are less likely to select the last two response categories in the item-by-item displays than in the grid displays. Accounting for respondent characteristics, smartphone and computer web respondents have higher item nonresponse rates than mail respondents, and web respondents have lower straightlining rates than mail respondents. Reassuringly, there is no difference in data quality outcomes across radio button versus wide button formats. These findings replicate past research showing that item-by-item displays reduce straightlining but may shift answer categories, and they suggest that questionnaire designers can combine round radio button answer spaces in mail surveys with wide buttons in web surveys for battery items with little difference in data quality.
Introduction
Self-administered surveys are increasingly used to gather information by survey and market researchers (Olson et al., 2021). Mixed-mode “web-push” designs that combine mail and web allow people without easy home Internet access to be included in the study, overcoming a coverage problem of web-only surveys (Dillman et al., 2014). Achieving similar measurement quality across modes, however, may require making the design of the questionnaire as similar as possible across modes, an approach known as unified mode design. Moreover, because web surveys can be completed on computers or mobile devices, surveys with a web component are also mixed-device surveys (Link et al., 2014; Peterson et al., 2017), and thus unified device design may also be needed. Mixing both modes and devices makes achieving unified designs more difficult, especially for certain question types.
Battery items, in which multiple items are asked under a common question stem and set of response options, can be difficult to design for mixed-mode/mixed-device surveys. In mail surveys and in web surveys completed by computer (hereafter “computer web surveys”), these items are typically visually displayed in a grid format. However, survey researchers could decide to display them in an item-by-item format, repeating the response options with each subitem. For instance, a researcher may implement an item-by-item approach to maintain a unified design when using mobile optimization of the web survey. Mobile optimization often defaults to displaying battery items in the item-by-item format to avoid making respondents scroll across response options (Revilla & Couper, 2018). Yet item-by-item formats have their own challenges. In mail surveys, item-by-item formats take more space than grid items, increasing survey length and possibly survey costs. In web surveys, item-by-item formats require vertical scrolling and may separate the question stem from the subitems if the list is long. Thus, researchers may be tempted to break with unified design and provide battery items in the grid format for those answering by mail or by computer web but in an item-by-item format for those answering by mobile web.
Furthermore, mobile optimization often changes the layout of the inputs for item-by-item displays from a mail-centric set of radio buttons (displayed as small circles) to a web-centric set of wide buttons (each response option is fully encompassed in a rectangular clickable box) (Antoun et al., 2020). While the wide buttons can be used in both computer and mobile web surveys, they are quite different from the answer spaces conventionally used in mail surveys, creating another potential deviation from unified design.
Despite the growing presence of web-push mixed-mode surveys, surprisingly little work has examined whether design features from mobile optimization need to be brought into both computer web and mail surveys to achieve a unified design. Such research requires a careful comparison of grid versus item-by-item displays and answer space designs in computer web, mobile web, and mail surveys. Although considerable work has compared battery item designs in web surveys, only one study we are aware of has incorporated mail (and then only the grid format) into experiments (Kim et al., 2019), and none has examined whether the wide rectangular buttons used in web questionnaires can be transported to a mail questionnaire in place of the more common round radio button format. Thus, in this paper, we address the following research questions:
1. Do measures of data quality for battery items differ across item-by-item versus grid formats?
2. Do measures of data quality for battery items differ across mail and web modes and across web devices?
3. Do measures of data quality for battery items differ depending on whether the display format uses mobile-optimized wide buttons or non-optimized radio buttons?
Background
Grids versus item-by-item displays
Item batteries are a series of items that have a common question stem and response options. Questions that form a scale are often asked as item batteries, but not all batteries are part of a scale. In self-administered surveys, the primary visual design decision is whether to display batteries as a group in a grid or to display each item separately, called an item-by-item design (Couper et al., 2013, 2017; Dillman et al., 2014; Revilla et al., 2017; Vehovar et al., 2022).
Two theoretical frameworks inform how grids may influence the quality of survey answers. The first is a visual design perspective. Drawing on the Gestalt principles of proximity and common region (Dillman et al., 2014; Smyth & Olson, 2020) and the “near means related” heuristic (Tourangeau et al., 2004), a set of items that are displayed physically together (proximate) and within a defined space (common region) will be perceived as both physically and conceptually grouped. As a result, individuals are likely to answer them similarly, resulting in possible nondifferentiation of answers (Krosnick, 1991). By breaking the proximate placement and removing a common region for the question, an item-by-item format may interrupt the perception of connections between items compared to a grid format, allowing respondents to process, interpret, and answer each item individually. Because each item is displayed and processed individually rather than being “lost” in the grid, the item-by-item display may reduce item nonresponse rates. That is, the perception of more connectedness created by the grid format and less connectedness in item-by-item displays suggests that answers to items displayed in a grid format will be lower quality than those displayed in an item-by-item design. In particular, questions displayed in a grid may be more likely to be skipped, straightlined (the same response provided), and answered with more nondifferentiation (similar responses provided) than those in item-by-item displays.
The second is a survey burden perspective, which predicts that satisficing is more likely to occur on the more burdensome format (Roßmann et al., 2017). Some have argued that grid designs are inherently burdensome (Peytchev, 2009). In contrast, others have argued that displaying questions in a grid may make survey questions easier and more efficient to answer because the respondent can quickly learn the set of candidate response options (Couper et al., 2001; Debell et al., 2019; Krosnick, 1991; Roßmann et al., 2017; Toepoel et al., 2009). Item-by-item designs add visual length to self-administered surveys because each question must be displayed separately. Survey length is a common proxy for survey burden (Bradburn, 1978). Thus, a survey with questions displayed in an item-by-item format may be perceived as more burdensome than the same questions displayed in a grid because of the added length due to separate presentation for each item and repeated display of the response options.
Table 1. Summary of nondifferentiation findings from studies of grid versus item-by-item visual displays of battery questions.
Grids across modes and devices
Web and mail surveys both rely on visual communication and may yield few mode differences in responses under a unified mode design. Even with a common design, however, respondents may be differentially motivated to answer questions across these modes. For instance, long lists of questions in a mail survey may make it easier for respondents to skip an item, either inadvertently or on purpose. The length of a mail survey is also immediately visible, unlike in web surveys that typically display only a few questions at a time, possibly increasing the perceived burden for mail respondents. Additionally, web surveys can more actively encourage answering questions, providing error messages if respondents leave questions blank or guidance on which rows or columns the respondent has answered.
Furthermore, completing a survey on a mobile device may be a difficult task (Couper & Peterson, 2017; Keusch & Yan, 2017; Peytchev & Hill, 2010; Struminskaya et al., 2015). Grids may pose particular problems on mobile devices because of their considerably smaller screens. Smaller screens may lead to more inadvertent selection of responses (the “fat finger” problem) or require more horizontal and vertical scrolling in grids or long sets of items, decreasing the probability that response options hidden at the end of a grid are selected (Couper & Peterson, 2017; Peytchev & Hill, 2010).
Yet mode comparisons in general population samples have found few differences in overall substantive answers between web and mail surveys, with differences primarily observed in sample composition and in Internet access and usage (see reviews in Fowler Jr. et al., 2019; Suzer-Gurtekin et al., 2019; Tourangeau et al., 2013). Previous research has found higher item nonresponse rates on mail surveys than computer web surveys (Israel & Lamm, 2012; Lesser et al., 2012; Marken et al., 2018; Messer et al., 2012; Millar & Dillman, 2012), but has not found consistent differences in item nonresponse rates or in other indicators of data quality for questionnaires completed on mobile devices versus computers after accounting for possible selection into using a particular device (e.g., Keusch & Yan, 2017; Krebs & Höhne, 2020; Lugtig & Toepoel, 2016; Lynn & Kaminska, 2012; Mavletova, 2013; Revilla & Couper, 2018; Sommer et al., 2017; Toepoel & Lugtig, 2014). Response distributions also tend to be similar across computer and mobile device questionnaires (Clement et al., 2020; Keusch & Yan, 2017; Tourangeau et al., 2017).
The data quality outcome that has received perhaps the most attention across devices for battery items is nondifferentiation or straightlining (Table 1). Existing studies vary notably in how they operationalize nondifferentiation, the number of item sets examined, and the population of interest, and few studies have examined this design feature in a non-panel probability-based sample of the general population. Somewhat surprisingly, there is only one study of which we are aware that compares responses across battery designs on a computer web questionnaire to those obtained from a mail questionnaire, although it excludes an item-by-item condition in the mail (Kim et al., 2019). Additionally, it is unclear whether the studies conducted on smartphones used wide buttons or radio buttons on the item-by-item displays because screenshots are not uniformly provided (Dale & Walsoe, 2020). Nevertheless, the common pattern is that grid displays have higher levels of nondifferentiation than item-by-item displays, with mixed results across web devices.
Wide versus radio buttons
“Mobile optimization” approaches are common when conducting web surveys. In a mobile-optimized questionnaire, wide rectangular buttons often replace smaller radio buttons for the single answer item-by-item questions presumably to give respondents a bigger target to touch to register their response (Antoun et al., 2020; Mavletova et al., 2018). Few studies have examined whether wide buttons versus radio buttons in web surveys affect answers, and those that have find inconsistent differences in the distribution of responses (Antoun et al., 2020; Dale & Walsoe, 2020). While mobile optimization can often work well in web surveys because the same design can be used for those answering on a computer and mobile device (i.e., unified device design), it can be challenging to achieve unified mode design between mobile-optimized web surveys and mail surveys because wide answer buttons are highly unconventional in mail surveys. If the wide-button design is emulated in a mail survey to achieve unified mode design with a mobile-optimized web survey, it is unclear whether respondents to the mail survey will know how to answer since the wide answer spaces lack a distinct space to check or fill in on paper, possibly increasing respondent burden (Figure 1). Respondents in a mail survey may be more likely to skip items displayed with wide buttons (which become shaded rectangles in a mail survey) compared to the same items displayed with small answer space circles, although this design feature has not been empirically evaluated in mail surveys. Thus, we will also examine whether data quality differs when we display item-by-item designs with radio buttons versus wide buttons across modes and devices.
Figure 1. Display of wide buttons for item-by-item questions in mobile web, computer web, and mail questionnaires.
Table 2. Summary of hypotheses.
Data and methods
We use the 2017 Community Values and Opinions in Nebraska Survey (CVONS) in which a simple random sample of 10,000 Nebraska households was selected from an address-based frame, and one adult (age 19+) with the next birthday was selected from the household. CVONS was fielded from March 8 to May 18, 2017, by the Bureau of Sociological Research at the University of Nebraska-Lincoln (AAPOR Response Rate 2 = 28%, n = 2801 respondents). CVONS used a sequential mixed-mode design, offering the web survey URL in the first two postal mail contacts and both the web survey URL and a mail questionnaire in the final mailing.
All addresses were randomly assigned to a question design condition (grid or item-by-item, n = 5000 in each condition), fully crossed with the visual format conditions (radio button or wide answer space, n = 5000 in each condition; n = 2500 in each combination). Respondents remained in their assigned condition across modes or devices selected to complete the questionnaire. Consistent with most mobile-optimized web survey software displays for item-by-item formats (Hu, 2020; Mockovak, 2018; Revilla & Couper, 2018), vertical response options were used for the item-by-item display in all modes and devices (Figures 1 and 2). Mail questionnaires in both formats were 12 pages long with the same questions on each questionnaire page across conditions.
Figure 2. Examples of radio button grid versus item-by-item displays in mail, computer web, and mobile web modes and devices.
The answer space format conditions on the web were achieved through two different Qualtrics templates. In the “minimal” template, the answer spaces for both grid and item-by-item questions were radio buttons (small circles) with a black outline and white fill when unselected that turned dark grey when selected (see Figure 2). In contrast, the “Qualtrics 2014” template enclosed each response option in the item-by-item questions in a wide button (Figure 1) that displayed as grey but turned red when selected. The grid items in this template used red-rimmed circles instead of black-rimmed circles. In the computer web mode, both templates highlighted grid rows in light grey when the cursor hovered over them. Versions of the mail questionnaire were formatted to match each web template.
The grid versus item-by-item experiment was conducted on three attitudinal batteries and one behavioral battery (see Appendix A for full wording). The first battery (B12) on citizenship and community attitudes contained six items with a five-point fully-labeled scale from strongly agree to strongly disagree, all positive valence (I consider myself a good citizen). The second battery (B24) measured perceptions of safety with five items that were both positive (I feel safe where I live) and negative (I worry about becoming a victim of a crime) valence and five fully-labeled response options ranging from never to always. The third battery (B27) measured perceptions of fairness of the criminal justice system (Treatment of people accused of committing a crime) with four-point fully-labeled response options ranging from very fair to not at all fair. The behavioral battery (B15) measured time demands with a five-point fully-labeled response scale from never to always; this battery contained items that were positive valence (You were able to do almost everything you needed to do) and negative valence (You had too little time to perform daily tasks).
Dependent variables
The first dependent variable is item nonresponse. We examine item nonresponse by summing the number of unanswered items within each battery. The total number of items asked in the battery (constant across respondents) is included as an exposure variable. The average item nonresponse rate across the batteries is 2.6% (n = 2705).
We then examine two measures of nondifferentiation. The first measure examines straightlining: identical responses for every item within a battery were coded as 1, and responses that used at least one different answer category were coded as 0. Respondents who skipped items in the battery were coded as straightlining if at least two answers were provided and all used the same answer category. The average straightlining rate across the batteries is 11.1% (n = 2657 answered at least two questions in at least one battery). The second measure of nondifferentiation is the respondent-level standard deviation (also referred to as within-subject inter-response variability) of answers to the J items within the battery:

$$ SD_{ij} = \sqrt{\frac{1}{J_{ij}-1}\sum_{k=1}^{J_{ij}}\left(y_{ijk}-\bar{y}_{ij}\right)^{2}} $$

where $y_{ijk}$ is respondent $j$'s answer to item $k$ in battery $i$, $\bar{y}_{ij}$ is the mean of those answers, and $J_{ij}$ is the number of items the respondent answered. Smaller standard deviations indicate greater nondifferentiation.
The final dependent variables come from the responses themselves. First, we calculate the total number of selections of the first answer category within each battery for each respondent. Then we examine the total number of selections of the last two response options in the batteries with five scale points (B12, B15, B24) and the last response option in the battery with four scale points (B27). These are the answer categories most likely to be hidden in a grid format on smartphones. The total number of questions answered in each battery (which varies across respondents) serves as an exposure variable in the models below to model the rate of selecting these response categories. The average rate of selecting the first response option across the batteries is 15.2% (n = 2657 answered at least one item in at least one battery), and the average rate of selecting the last two response options is 21.8%.
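To make the construction of these dependent variables concrete, the sketch below computes all four indicators for a single battery from a respondents-by-items response matrix. It is written in Python for illustration only (the analyses themselves were conducted in Stata), and the data and column names are simulated and hypothetical.

```python
# Illustrative construction of the data quality indicators for one battery.
# Simulated data; column names are hypothetical, not the survey's actual names.
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_resp, n_items, scale_max = 100, 6, 5  # e.g., battery B12: six 5-point items
answers = pd.DataFrame(
    rng.integers(1, scale_max + 1, size=(n_resp, n_items)).astype(float),
    columns=[f"b12_{k}" for k in range(1, n_items + 1)],
)
answers.iloc[rng.random(n_resp) < 0.05, 0] = np.nan  # some item nonresponse

# Item nonresponse count per respondent; battery size is the exposure.
n_missing = answers.isna().sum(axis=1)
n_answered = n_items - n_missing

# Straightlining: all answered items identical, with at least two answered.
straightline = (answers.nunique(axis=1) == 1) & (n_answered >= 2)

# Nondifferentiation: within-respondent SD of answers (NaNs are skipped).
sd_answers = answers.std(axis=1, ddof=1)

# Counts of the first and of the last two response options across answered items.
n_first = (answers == 1).sum(axis=1)
n_last_two = answers.isin([scale_max - 1, scale_max]).sum(axis=1)
```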
Independent variables
The first independent variable is the format of the battery items – grid (52%) or item-by-item (48%).
The second independent variable is an indicator of the self-selected mode or device used to complete the survey. The completion mode – mail (28%) or web (72%) – and web device – smartphone (21% of web respondents), tablet (8%), or computer (71%) – were identified from sample tracking forms and, for web respondents, from paradata on operating system, browser, and screen size.
The final independent variable is whether the survey format used radio buttons or wide buttons. Overall, 50% of the respondents answered using the radio buttons format and 50% using the wide buttons format.
Control variables
To account for self-selection into device, the models control for respondent sex, age, education, race, and where the respondent reported responding to the survey (at home or at work/elsewhere). The question and visual format conditions were also fully crossed with three cover letter conditions in which framing the participation request varied; we also control for the cover letter text experiment conditions and include fixed effects for the four batteries to account for differences in content. Respondent-level characteristics did not differ across the grid versus item-by-item format (Appendix Table B; all p > .15).
Analyses
We estimate regression models using Stata 17.1 to examine whether the data quality outcomes vary across grid versus item-by-item format, mode or device, and wide versus radio buttons. Our experimental design factors were assigned at the respondent level, not at the question level; that is, there is no within-respondent variation in the key independent variables. Thus, we use population-average models accounting for clustering of the four batteries within respondents (McNeish et al., 2017).
For the count variables (items missing; selecting the first or last response options), we estimate Poisson models (xtpoisson) with exposure variables of the number of questions in the battery (items missing) and the number of items answered (first/last response options). For the continuous nondifferentiation outcome, we use GEE models (xtgee) with batteries nested within respondents. For the dichotomous straightlining outcome, we use logistic regression models (xtlogit). We estimate each population-average model with robust standard errors and an exchangeable within-respondent correlation (using the pa and vce(robust) options). For instance, to test the effect of web versus mail modes, let batteries be denoted by $i = 1, \ldots, 4$ and respondents by $j = 1, \ldots, n$; with link function $g$, we estimate a model of the form

$$ g\left(E\left[Y_{ij}\right]\right) = \beta_{0} + \beta_{1}\,\mathrm{Web}_{j} + \mathbf{X}_{j}^{\prime}\boldsymbol{\gamma} + \boldsymbol{\delta}^{\prime}\mathbf{B}_{i}, $$

where $\mathrm{Web}_{j}$ indicates that respondent $j$ completed the survey on the web, $\mathbf{X}_{j}$ contains the respondent-level control variables, and $\mathbf{B}_{i}$ is the set of battery fixed effects.
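As a loose illustration of this specification, the sketch below fits a population-average Poisson model with an exchangeable within-respondent correlation and robust standard errors using Python's statsmodels, an analogue of the Stata xtpoisson command with the pa, vce(robust), and exposure() options described above. The long-format data and variable names are simulated and hypothetical.

```python
# Population-average (GEE) Poisson model with exchangeable within-respondent
# correlation and robust SEs -- an analogue of Stata's
#   xtpoisson n_missing i.web i.battery ..., pa vce(robust) exposure(n_items)
# Simulated long data (one row per respondent x battery); names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_resp = 300
long = pd.DataFrame({
    "resp_id": np.repeat(np.arange(n_resp), 4),       # 4 batteries per respondent
    "battery": np.tile([1, 2, 3, 4], n_resp),
    "web": np.repeat(rng.integers(0, 2, n_resp), 4),  # respondent-level mode flag
    "n_items": np.tile([6, 5, 5, 4], n_resp),         # battery size (exposure)
})
long["n_missing"] = rng.poisson(0.1, len(long))       # unanswered items per battery

model = smf.gee(
    "n_missing ~ web + C(battery)",                   # controls omitted for brevity
    groups="resp_id",
    data=long,
    family=sm.families.Poisson(),
    cov_struct=sm.cov_struct.Exchangeable(),
    exposure=long["n_items"],
)
print(model.fit().summary())  # GEE reports robust (sandwich) SEs by default
```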
We then test whether the effects of the grid versus item-by-item display vary by mode or device and by radio versus wide buttons using two-way interaction effects; we also examine the three-way interaction between the grid versus item-by-item display, mode or device, and radio versus wide buttons. Because we estimate six interaction effects for each outcome, we use a Bonferroni correction (p = .05/6 = .008) to indicate statistical significance for an interaction term. We estimate all models in two sets: one including the mode indicator (mail vs. web) and one including the device indicators (mail, smartphone, tablet, computer). We also estimated models for the item-by-item display condition alone to isolate the effects of radio versus wide buttons; all conclusions were identical to those from the interaction effects models.
In all analyses, missing values for the predictor variables were multiply imputed 20 times using sequential regression imputation with Stata’s ice command. We account for multiple imputation using the mi estimate commands. We account for the nonresponse adjustment weight by incorporating the grand-mean centered survey weight as a predictor variable (Snijders & Bosker, 2012).
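A loose Python analogue of this imputation-and-pooling workflow, using statsmodels' chained-equations MICE, is sketched below. The variables are hypothetical, a simple OLS model stands in for the population-average models above, and the weight is grand-mean centered and entered as a predictor as described.

```python
# Sketch of chained-equations multiple imputation with pooled estimates
# (a loose analogue of Stata's ice + mi estimate). Data and variable names
# are simulated and hypothetical; OLS replaces the population-average models
# for brevity.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.imputation import mice

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "nondiff_sd": rng.normal(size=500),        # outcome: respondent-level SD
    "age": rng.normal(50, 15, 500),            # covariate with missing values
    "weight": rng.gamma(2.0, 1.0, 500),        # nonresponse adjustment weight
})
df.loc[rng.random(500) < 0.1, "age"] = np.nan  # induce missingness to impute

# Grand-mean center the survey weight and include it as a predictor.
df["weight_c"] = df["weight"] - df["weight"].mean()

imp = mice.MICEData(df)                        # sequential regression imputations
analysis = mice.MICE("nondiff_sd ~ age + weight_c", sm.OLS, imp)
results = analysis.fit(n_burnin=10, n_imputations=20)  # pooled via Rubin's rules
print(results.summary())
```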
Results
Grid versus item-by-item designs
Table 3. Population-average model coefficients and standard errors predicting five data quality outcomes.
Note. *p < .05, **p < .01, ***p < .001, ****p < .0001. All models control for cover letter experiment, sex, education, age, where the respondent completed the survey, race/ethnicity, the sampling weight, and the battery items. Full models are in Appendix Tables C.1 and C.2.
Modes and devices
In contrast to past research, item nonresponse rates are significantly higher in the combined web devices than in the mail survey (H1b, Model 1, Table 3, p = .026). When we examine the web devices separately, consistent with our hypothesis, smartphone respondents (H1b, Model 2, Table 3, p = .007) and computer respondents (p = .044) have higher item nonresponse rates than mail respondents, but there is no difference between tablet and mail respondents (p = .40). Additionally, straightlining rates are lower in the combined web mode than in the mail mode (p = .02); looking at the web devices separately, computer web respondents have lower straightlining rates than mail respondents (p = .029), but there is no difference for smartphone (p = .28) or tablet (p = .09) respondents, inconsistent with hypothesis H2b. Inconsistent with our hypotheses, there is no overall mode or device difference in nondifferentiation (H3b, Model 1 mode: p = .12; Model 2 device: p = .38), selection of the last two response options (H4b, mode: p = .55; device: p = .76), or selection of the first response option (H5b, p = .94).
Radio buttons versus wide buttons
There is no overall difference in the measurement error indicators between respondents who received the radio button design and those who received the wide button design (H1c–H5c, p > .20 for all models).
Interaction effects
There are no statistically significant two-way interaction effects between mail versus overall web and the grid versus item-by-item format or radio versus wide buttons on any of the measurement outcomes (p > .08 for all interaction effects, Appendix Table C.3). No statistically significant interaction effects are present for the mail and individual web devices and the grid versus item-by-item format or radio versus wide buttons on any outcome other than nondifferentiation (overall p = .007 and p = .006). There is slightly more nondifferentiation (smaller standard deviation of answers) for those who answered on tablets with item-by-item displays and with wide buttons than with grids or with radio buttons (Appendix Figure C.1), but no difference in nondifferentiation in item-by-item and grid designs for those who answered via mail, web computer, or smartphone. None of the three-way interactions are statistically different from zero.
Discussion
Researchers often display sets of questions with a common question stem and common response options in grids, but mobile optimization may display these grid questions as individual items, undermining attempts to achieve unified designs. We examined whether these two methods of displaying battery items, and the use of wide versus radio buttons, affected answer quality across mail and web devices. We found that item-by-item displays yielded lower straightlining rates than the same items displayed in a grid, suggesting an increase in data quality for item-by-item displays, consistent with a visual grouping hypothesis. Respondents were also less likely to select the last two response options in an item-by-item display, suggesting that they may not have fully processed the response options. Future research should explore whether these differences emerge because respondents are less likely to process the response options in the item-by-item format or because they are anchored to the visually grouped endpoints in the grid format.
It is reassuring that there were few differences between the mail mode and the web devices on the outcomes examined here. Unlike past research – conducted largely before smartphones were used for web surveys – we found higher item nonresponse rates among web respondents than among mail respondents, and especially high item nonresponse rates for smartphone respondents, consistent with the hypothesis that responding on smartphones is burdensome. Even though web respondents were more likely to leave an item missing, they were less likely to straightline than mail respondents when they did answer, inconsistent with the single other study that compared straightlining across web and mail modes (Kim et al., 2019). The higher levels of straightlining among mail respondents were not concentrated in a single response category, as mail respondents' rates of selecting the first or last response options did not differ from any of the web devices. The general lack of differences in data quality outcomes between the two self-administered modes confirms that these modes can produce very similar results under a unified mode design.
As web survey software templates increasingly use wide buttons as a default, we are also reassured that there are no differences in any of the data quality outcomes across the radio buttons and wide buttons for the items examined here. This was true overall and for the item-by-item condition alone. Thus, researchers conducting mixed-mode surveys who want to use a wide button default in their web survey and radio button-style answer spaces in their mail survey are unlikely to see notable shifts in the distribution of answers, at least for battery items.
Many previous studies have examined the design of battery items in self-administered surveys, largely showing that item-by-item displays yield lower levels of straightlining than the same items displayed in grids. This work replicates and extends past work in two ways – comparing web survey respondents with mail survey respondents and comparing the display of the response categories for the item-by-item questions as radio buttons versus wide buttons. Nevertheless, more work can be done. Although we accounted for multiple characteristics of the respondents, we did not examine whether the effects of the experimental (grid vs. item-by-item; wide vs. radio buttons) or observational (mode and device) independent variables varied across characteristics of the respondents. To the extent that one design feature or device is more burdensome, then the challenges of answering may be exacerbated for persons with lower cognitive abilities. Additionally, although the items examined here were part of a battery, they were not necessarily designed to create a scale. As such, we did not examine interitem correlations or latent variable models. Future work could examine these measurement outcomes where appropriate. Finally, although we looked at four batteries with vastly different topics and response scales, all batteries had only four or five response options and a limited number of items. We also looked at three attitudinal batteries and one behavioral battery. Future work could examine different topics, more response options, and more scale items.
Decisions to display a battery in a grid or not, to have multiple modes of data collection, or to permit mobile optimization on web displays are important. The analyses presented here indicate that researchers may see little effect of these decisions on the quality of answers, at least on these measurement indicators. Consistent quality of responses across design features is critical when developing unified mode designs.
Acknowledgements
Earlier versions of this paper were presented at the 2018 Midwest Association for Public Opinion Research annual meeting and at the 2019 European Survey Research Association Conference.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Data collection for this study was funded by the UNL Office of Research and Economic Development. The analysis was partially funded by Cooperative Agreement USDA-NASS 58-AEU-5-0023, supported by the National Science Foundation National Center for Science and Engineering Statistics.
Data availability statement
Replication information is available at https://osf.io/tuyfg/?view_only=460ac82e67554c7a947de2d8923b6eca (Olson, 2023).
Supplemental Material
Supplemental material for this article is available online.
References
