Sage Journals: Discover world-class research

Abstract

Probes are follow-ups to survey questions used to gain insights on respondents’ understanding of and responses to these questions. They are usually administered as open-ended questions, primarily in the context of questionnaire pretesting. Due to the decreased cost of data collection for open-ended questions in web surveys, researchers have argued for embedding more open-ended probes in large-scale web surveys. However, there are concerns that this may cause reactivity and impact survey data. The study presents a randomized experiment in which identical survey questions were run with and without open-ended probes. Embedding open-ended probes resulted in higher levels of survey break off, as well as increased backtracking and answer changes to previous questions. In most cases, there was no impact of open-ended probes on the cognitive processing of and response to survey questions. Implications for embedding open-ended probes into web surveys are discussed.

Keywords

web probing open-ended questions reactivity response quality cognitive pretest

Introduction

In recent years, open-ended questions have experienced a renaissance (Neuert et al. 2021), particularly in the context of web surveys (Smyth et al. 2009), as the cost and effort of data collection (Gavras and Höhne 2020; Revilla and Couper 2019) and coding of responses (Schonlau and Couper 2016) are much decreased due to technological development. Open-ended narrative questions are considered the “classic open-ended question […], in which respondents are invited to articulate their response using their own words” (Couper et al. 2011). Probes are a specific type of open-ended narrative question that directly relate to a forgoing closed survey question (Behr et al. 2012b; Schuman 1966) and can be used to assess the validity and even cross-cultural comparability of survey questions (Meitinger 2018). They are frequently used at the stage of question development and cognitive pretesting for the purpose of question evaluation, for instance, to examine whether a term in the survey question is understood in the way intended by the researcher or to gain insights on why respondents chose a response option (Collins 2015; Miller et al. 2014). The implementation of open-ended probes in large-scale surveys is less common, though researchers have repeatedly argued for this to clarify reasons for a response (Schuman 1966), gain insights on reasons for lack of measurement invariance (Meitinger 2017), or even to encourage more truthful answers (Couper 2013; Singer and Couper 2017).

Next to these described benefits, there are concerns that open-ended probes impact surrounding survey questions. Recent studies that embedded open-ended probes in web surveys indicated an increase in survey break offs and item nonresponse (Luebker 2021) as well as slight shifts in response behavior (Couper 2013; Fowler and Willis 2020). If embedding open-ended probes affects the response behavior to web survey questions, the comparability of survey questions asked with and without open-ended probes may be compromised. This would affect settings such as the one proposed by Schuman (1966), in which probes are asked to a random subsample within a survey, or longitudinal analysis of panel data, if probes are implemented in some, but not all waves. Moreover, the validity of insights gained from web probing responses depends on respondents understanding and answering the survey questions in the same way regardless of whether or not they receive probing questions.

The present article sets out to examine the effects of open-ended probes on web survey questions. The background section begins with an overview of previous studies that examined the effect of open-ended probes on closed survey questions and points to current research gaps. Next, the differences between open-ended and closed questions in terms of burden, cognitive processing and response behavior are summarized, and the notion of measurement reactivity in surveys is introduced. From this, the research questions and hypotheses are derived. A between-subject experiment is reported which assessed the effects of embedding open-ended probes on the processing of and response to closed web survey questions. The experiment examined survey break off, backtracking, and answer changes to previous survey questions, as well as response times, nonresponse, and response behavior to successive survey questions. Finally, the benefits and potential adverse effects of embedding open-ended probes into web surveys are discussed.

Background

Previous Research on the Impact of Open-Ended Probes on Survey Response

In the realm of web surveys, three studies have examined the impact of open-ended probes on closed survey questions. Luebker (2021) examined the effect of embedding an open-ended probe on survey break off and item nonresponse to a closed opinion question. He found that a probe displayed on the same survey page as the question it pertained to increased survey break off by 0.6 and item nonresponse by more than 25 percentage points. When using a paging design—that is displaying the probe on a separate survey page—there was a stronger impact on survey break off of 1.4 percentage points, but no effect on item nonresponse as compared to inserting no probe. It must be noted that the probe in this experiment more resembled an open-ended text field at the end of a closed survey question than a typical open-ended narrative probe and was worded in a strongly nonmandatory manner (“If you like, you can add some bullet points to your response.”).

Couper (2013) reported the results of two experiments that inserted open-ended probes into a ten-item scale on attitudes towards immigrants in a probability panel in the Netherlands. In the first experiment, respondents were presented with one mandatory open-ended probe after each item using a paging design. In the second experiment, respondents were presented with an optional open-ended probe on the same screen as the respective closed survey item. In both experiments, there was a small but significant difference in the overall means of the item battery between the experimental condition with probes and the control group, with respondents reporting lower levels of prejudice in the condition with probes. In the experiment using the paging design, this effect occurred as of the second-shown survey item.

Finally, a study by Fowler and Willis (2020) compared responses to survey questions depending on probe placement. Respondents answered nine closed items on perceptions of neighborhood walkability, such as the presence of sidewalks, trails, or paths. On one condition, they received four open-ended probes on the survey page directly after the item battery. In the other condition, respondents were presented with the probes retrospectively at the end of the survey, that is, with several unrelated survey questions in between. Results showed a small but significant effect of probe placement on the mean walkability score, with respondents who received the probes directly after the survey questions reporting slightly enhanced perceptions of walkability. It must be noted that the study was not a strictly randomized experiment, as the condition that included probes directly after the survey items was fielded three weeks before the condition with retrospective probes, and the sample was not representative of the U.S. population in terms of demographics.

In sum, previous studies lend support for the notion that open-ended probes impact whether a respondent continues the survey, answers survey questions, and how they answer them. However, the studies also raise many questions. Regarding survey break off and item nonresponse, the effect sizes found by Luebker (2021) merit further examination. The study only examined one survey question, and the probe was rather atypical. Effects may vary across both probe and question types. Regarding response behavior to the survey questions, Couper (2013) found that using a paging design influenced response behavior to subsequent items (i.e., as of the second-shown item), whereas Fowler and Willis (2020) found an effect on their item battery despite presenting their probes on a separate page after the survey questions. A possible explanation for this effect could be that respondents in Fowler and Willis’ study backtracked to the previous survey page and changed their answers; however, the study provides no details on this. Most importantly, the reasons for the shift in the overall means found in both studies remain unclear. Couper (2013) had assumed that open-ended text fields in which respondents could justify their responses would reduce the threat of sensitive questions and lead to an increase in socially undesirable answers; however, the shift in response behavior indicated the opposite effect. Moreover, in both studies, the effects on the overall means were rather small. Possibly, effects on response behavior can be better examined using other measures, such as indicators of response quality or response styles. However, there is currently no framework to predict such effects.

Cognitive Processing of Open-Ended and Closed Questions

The following section draws on literature on open-ended probes in the context of cognitive interviewing and web probing, as well as open-ended narrative questions in general. Other types of open-ended (such as numeric) questions or probes with closed response options, also known as targeted embedded probes (Scanlon 2019, 2020), are not considered.

The process of survey response optimally consists of several cognitive steps (Tourangeau, Rips, and Rasinski 2000). Respondents must interpret the pragmatic meaning of a survey question. They embark on an information retrieval process, which is truncated when respondents have gathered enough information to form a judgment of sufficient certainty. The relevance of the accessible information is assessed, and an internal judgment is formed. This is then adjusted to the response format of the survey question.

For closed questions, the available response options may contribute to construing the meaning of a question (Schwarz et al. 1988) and impact the perceived relevance of the retrieved information. For open-ended questions, neither question interpretation nor the assessment of which accessible information is relevant to form a judgment is guided—and potentially limited—by predefined response options. However, open response formats also bear the risk that respondents deem aspects irrelevant if they consider them self-evident, or that information retrieval is truncated before relevant information is retrieved (Tourangeau et al. 2014).

The differences in these processes impact whether and how respondents answer open-ended and closed questions. In general, the respondent tasks associated with open-ended questions are considered more demanding and burdensome (Krosnick 1999; Tourangeau and Rasinski 1988). In line with this, a higher number of open-ended questions in a survey is associated with an increased likelihood of survey break off (Galesic 2006), and inserting multiple open-ended questions on one page has a particularly strong effect (Peytchev 2009). The study by Luebker (2021) confirmed the negative impact on survey break off for open-ended probes, in particular when the probe is presented on a separate survey page. Moreover, open-ended questions result in higher levels of item nonresponse than corresponding closed questions (Reja et al. 2003; Zuell, Menold, and Körber 2015), particularly among lower educated respondents (Andrews 2005; Miller and Lambert 2014; Schmidt, Gummer, and Roßmann 2020; Scholz and Zuell 2012; Zuell and Scholz 2015). These findings have been confirmed for open-ended as compared to closed probes (Neuert, Meitinger, and Behr 2021).

Differences can also be found in response distributions between open-ended and closed survey questions (Reja et al. 2003) and probes (Neuert, Meitinger, and Behr 2021). Responses not included in a closed format are unlikely to be named by respondents, even when an open-ended “other” field is included. On the other hand, any given opinion, theme, or topic is less likely to be volunteered in an open response format than when it must simply be “recognized” in a closed question (Bradburn 1983).

Open-ended probes are more directed than other open-ended questions as they directly pertain to the preceding closed survey question (Foddy 1998; Neuert, Meitinger, and Behr 2021; Silber, Zuell, and Kuehnel 2020). In the context of web probing, three types of probes are mainly employed. Comprehension probes ask about the understanding of a term used in the survey question. In category selection probes respondents are requested to explain why they chose their response option. Specific probes encourage respondents to provide additional information on a particular detail of the item (Behr et al. 2012a, 2012b; Meitinger, Braun, and Behr 2018). These probe types ask respondents to focus on different aspects of their survey responses. Due to the differences in the response process between open-ended and closed questions, probes may lead respondents to consider additional or different aspects of a question than they did while answering it, or respondents may come to a different evaluation as to which retrieved information should be relevant to their judgment. Moreover, simply the process of repeated thinking about survey questions may impact how a respondent answers it. This could lead to an interaction of probing and survey questions. The following section describes the impact that questions within a survey can have on each other, known as measurement reactivity, and applies it to probing questions.

Measurement Reactivity in Surveys

The notion that examining a phenomenon can alter the phenomenon itself is discussed in many areas of research, from physics to behavioral psychology. In survey research, the notion of measurement reactivity was examined in a series of experiments using personality measures. Knowles et al. (1992) argued that thinking about questions has consequences for question construal and that increased reflection on a topic makes a certain interpretation more salient, leading to a polarization of judgment. They postulated that later items within a measure (or items in a repeated measurement) show more extreme, but also more reliable and consistent responses. To examine this, the order of multi-item measures was randomized (Knowles 1988), with later items showing higher reliability and more extreme answers. Importantly, there was generally no visible effect on the mean value of these items. The studies demonstrated that increased reflection about survey questions influences both cognitive processing and response to survey items and that these effects must not (necessarily) be visible by a simple comparison of means.

Whether respondents’ verbalized reflection on survey questions causes reactivity has been subject to debate since the dawn of cognitive testing. The early standard of verbal protocols required respondents to think aloud while answering a survey question (Ericsson and Simon 1980, 1993). This was criticized by researchers as potentially increasing the effort required to create a response (Willis 1994, 2005), especially after an experimental study demonstrated that think-aloud protocols impact task accuracy and response times for some tasks (Russo, Johnson, and Stephens 1989). The debate gave rise to the use of probing questions, which are administered after the respondent has completed the survey question. Beatty and Willis (2007) argued that using probes in cognitive interviews may be less likely to cause reactivity than employing the think-aloud technique. However, other researchers argued that probes may likewise lead to invalid or reactive reports (Conrad, Blair, and Tracy 1999) by interfering with the natural flow of the survey interview (Beatty 2004), and a recent meta-analysis supported the notion that directive probing can impact task accuracy (Fox, Ericsson, and Best 2011). A further study showed some indication of increased respondent motivation through verbal probing, but remained inconclusive (Sudman, Bradburn, and Schwarz 1996).

Research Questions and Hypotheses

This study aims to enhance our understanding of the impact of open-ended probes on survey responses. I differentiate between effects on the survey in terms of survey completion, effects on the questions being probed, and effects on subsequent questions.

RQ1: Does embedding open-ended probes into web surveys impact survey break off?

RQ2: Does embedding open-ended probes into web surveys impact the survey questions the probes pertain to?

RQ3: Does embedding open-ended probes into web surveys impact subsequent survey questions?

Based on the findings from previous studies and literature on open-ended questions and measurement reactivity, I put forward several hypotheses. The strongest possible adverse effect of an open-ended question occurs if a respondent chooses to discontinue the survey. The sum of past research indicates that adding open-ended probes to a web survey results in higher levels of survey break off (Galesic 2006; Luebker 2021; Peytchev 2009).

H1: Embedding open-ended probes into web surveys increases survey break off.

Embedding open-ended probes may impact the survey questions they relate to, either if probes are presented alongside the survey question on the same page (as in some of the experimental conditions in Couper 2013; Luebker 2021), or if respondents have the possibility to return to previous questions in a paging design (which would explain the effects found by Fowler and Willis 2020). A probe may cause respondents to reconsider their interpretation of a survey question, access other information, or include other information in their judgment. Previous research has indicated that reverse question order effects may arise when respondents have the possibility to return to previous questions (Sudman, Bradburn, and Schwarz 1996), meaning that subsequent questions can influence responses to previous ones (Bishop et al. 1988; Schwarz and Hippler 1995; Schwarz et al. 1991). Therefore, I hypothesize that embedding open-ended probes leads to an increase in backtracking and changing one's answer to previous survey questions:

H2a: Embedding open-ended probes into web surveys increases backtracking to previous survey questions.

H2b: Embedding open-ended probes into web surveys increases answer changes to previous survey questions.

Next to the effects on the survey questions they relate to, open-ended probes may impact how respondents process and answer subsequent questions. The following hypotheses rest on the assumption that probes cause respondents to reflect on their previous survey responses, and that respondents process survey questions more deeply when they are expecting these questions to be followed by probes. Knowles (1988) demonstrated that increased thinking about questions leads to judgment polarization and more consistent responses.

Regarding cognitive effort, response times are considered “the most important means for investigating hypotheses about mental processing” (Yan and Tourangeau 2008). Findings from think-aloud and verbal probing (Fox, Ericsson, and Best 2011; Russo, Johnson, and Stephens 1989) indicate that response times increase in interviewer-administered settings for some questions. Therefore, I hypothesize that response times to closed survey questions increase when open-ended probes are embedded:

H3: Embedding open-ended probes into web surveys increases response times to subsequent survey questions.

Unfortunately, previous studies did not report on nonresponse (Couper 2013; Fowler and Willis 2020) or only examined one question (Luebker 2021). However, if embedding open-ended probes causes respondents to reflect on survey questions more deeply and this leads to judgment polarization, it can be assumed that nonresponse decreases for subsequent items:

H4: Embedding open-ended probes into web surveys decreases nonresponse for subsequent survey questions.

Previous research has found effects of embedding open-ended probes on the mean sum score of multi-item measures (Couper 2013; Fowler and Willis 2020), but the effect size was small and the direction could not be predicted. Knowles et al. (1992) argued that increased thinking about answers impacts response behavior but that this must not necessarily be visible in the form of a mean shift. Rather, increased thinking about questions leads to judgment polarization and more consistent responses. Therefore, I hypothesize that embedding open-ended probes does not (consistently) impact means. Instead, differences become visible in the form of increased extreme responding and nondifferentiation:

H5a: Embedding open-ended probes into web surveys does not impact mean scores for subsequent survey questions.

H5b: Embedding open-ended probes into web surveys increases extreme responding for subsequent survey questions.

H5c: Embedding open-ended probes into web surveys increases non-differentiation for subsequent survey questions.

Method

An experiment was designed with the aim of comparing closed survey questions that were either accompanied by open-ended probes or not. Respondents received six survey pages with closed attitude questions. A between-subject design was used, in which respondents were randomly assigned to experimental condition (A) which embedded open-ended probes between the survey pages with closed questions, or condition (B) which contained only the closed questions.

Survey and Probing Questions

The closed survey questions presented a mix of single- and multi-item measures using common constructs in social science research, such as political attitudes, personality, and well-being. Measures that have been accompanied by open-ended probes in other studies were chosen when possible. Multi-item measures were presented in a grid format on one survey page. The exact wording of the closed survey questions and the open-ended probes from condition (A) can be found in Online Appendix Table A1.

The first closed survey question was a single-item measure of left–right orientation (Q1), which is considered a “central element of political science research” (Zuell and Scholz 2015:28). Left–right orientation is implemented in several general population surveys, such as the German General Social Survey (GESIS 2020) or the German Longitudinal Election Study (GLES 2019). The question has repeatedly been complemented by open-ended probes to gain additional insights into the meaning of left and right (Bauer et al. 2017; Fuchs and Klingemann 1989, 1990; Scholz and Zuell 2012; Zuell and Scholz 2015). It was succeeded by another political construct, the two-item short scale on political cynicism (Q2) (Aichholzer and Kritzinger 2016), which captures respondents’ general trust in politicians’ honesty. It was developed for the Austrian National Election Study (Kritzinger et al. 2014) and has been implemented by other researchers since (Prochazka 2020). The third question was the six-item short scale for the Gamma factor of social desirability (Q3) (Kemper et al. 2014). Social desirability responding is the tendency to give overly positive self-descriptions (Paulhus 2002). The construct's Gamma factor is implemented into questionnaires to control whether self-reports may be biased by social desirability responding (Nießen et al. 2019). Survey research has indicated reactivity in personality measures (Knowles 1988; Knowles et al. 1992). General life satisfaction (Q4) and relationship satisfaction (Q5) (Beierlein et al. 2014; Schwarz, Strack, and Mai 1991) were implemented as further single-item measures. They have been followed by open-ended probes in the past to analyze determinants of self-reported satisfaction (Edwards and Lopez 2006). Furthermore, past research has repeatedly demonstrated that the communicative context of these questions impacts how the items are answered and how strongly they relate to one another (Schuman and Presser 1981; Smith 1982). The measure of intergenerational social support (Q6) consists of six items on family support (Gerlitz 2014; Legewie et al. 2007). All questions included the nonsubstantive response option “I don’t want to answer” except the personality measure (Q3).

In experimental condition (A), each closed question was followed by one open-ended probe. For the multi-item inventories (Q2, Q3, and Q6), respondents were randomly presented with a comprehension or category selection probe. The single-item measures on life and relationship satisfaction (Q4 and Q5) were each followed by a specific probe. The question on left–right orientation (Q1) was followed by two probes on the understanding of the terms “left” and “right” as in previous studies (Zuell and Scholz 2015). To keep the survey setting identical across conditions and to be able to attribute survey break offs to the probing situation, the probes were not announced on the welcome page of the survey. Instead, the first probe was introduced with the words “We would like to receive more information on the previous question.” Probes were presented using a paging design, with the survey question being repeated on the page with the probe. In addition, the selected survey response was repeated for category selection and specific probes.

Web Survey

An online survey was carried out with a nonprobability sample between November 20 and December 2, 2020, with the panel provider Respondi AG. The sample included quotas to depict the German online population in terms of gender (male, female),¹ age (18–29, 30–39, 40–49, 50–59, and 60 or more years), education (low, medium, and high) and region (former East and West Germany). Respondents were randomly assigned to experimental condition (A) or (B). There were no significant differences regarding demographics or devices used between experimental groups (see Online Appendix Table A2 for the sample composition). The web survey included several experiments, which were randomized independently of each other. The reported study was placed towards the beginning of the survey after the screening and quota questions and three short scales. No open-ended questions were implemented before the experiment.

The Universal Client-Side Paradata script by Kaczmirek and Neubarth (2007) was implemented to ensure a more exact measure of response latency (Yan and Tourangeau 2008) and collect questionnaire navigation data (Callegaro, Manfreda, and Vehovar 2015). The script records response behavior sequentially so that the resulting string variables enable coding backtracking to previous survey pages and answer changes to items. In accordance with both legal and ethical research standards (ADM et al. 2021; Kunz et al. 2020), respondents were informed about the collection and use of client-side paradata on the welcome page of the survey.

Of the 9,731 panelists invited to participate in the survey, 2,441 started the survey and of those, 241 broke off before completing it (before, during, or after the reported experiment), resulting in 2,200 completed questionnaires. This leads to a participation rate of 22.6 percent (American Association for Public Opinion Research 2016) and a break off rate of 9.9 percent (Callegaro and DiSogra 2008). Respondents received €1.50 for survey completion. About a quarter (27.1%; n = 597) of respondents filled out the survey using a mobile device. Average survey completion was 21.0 min (median: 17.3; n = 1,096) for respondents in condition (A) and 15.9 min (median: 12.4; n = 1,046) for respondents in condition (B).

Probe Response Quality and Content

Prior to examining the survey responses, probe response quality and content were analyzed. Low probe response quality may point to poorly designed probing questions. Probe response content was examined to gain insights into the respondents’ cognitive process of survey response and the quality of the survey questions. To ascertain probe response quality, the share of nonsubstantive responses was determined. Responses were coded as non-substantive when respondents left the probe empty, entered random characters, typed a “don’t know” answer or an explicit refusal, or other nonintelligible or noncodable content (Behr et al. 2012a). Between 12.2 percent and 21.6 percent of respondents in condition (A) gave nonsubstantive responses to the probing questions (see Table 1), which coincides with previous web probing studies (Behr et al. 2017; Meitinger and Behr 2016). To examine probe response content, the substantive probe responses were subjected to cognitive coding (Willis 2015) to determine whether the probe responses hinted at issues in respondents' cognitive process of survey response. This approach is also known as an error perspective as it gives insights into possible reasons behind measurement errors (Meitinger and Behr 2016). Errors or problems may occur at the stages of question comprehension, information retrieval, judgment, or response formatting (Willis, Schechter, and Whitaker 1999). For all but one question, error codes were detected for under 5 percent of respondents (see Table 1). Reported problems included misinterpreting words central to the question (i.e., the terms “left” or “right” in Q1) or mismatches between survey and probe responses (i.e., a low score of political cynicism in Q2, but probe responses indicating very low trust in politicians). Question Q5 on relationship satisfaction showed an unusually high share of error coding, with 25.5 percent of responses pointing to difficulties with judgment and/or response formatting. Almost all of these respondents reported that they were missing an answer to indicate that they were currently not in a relationship. A complete list of reported problems is available from the author on request.

Table 1.

Probe Response Quality and Content (Basis: Condition A, N = 1,096).

	Nonsubstantive response	Error detected	Error related to…
	Nonsubstantive response	Error detected	Question comprehension/Information recall	Judgment/Response formatting
Q1^a	21.6% (237)	4.7% (51)	3.7% (41)	1.0% (11)
Q2	13.0% (143)	4.7% (51)	2.1% (23)	2.6% (29)
Q3	13.8% (151)	1.9% (21)	0.5% (6)	1.4% (15)
Q4	12.2% (134)	0.9% (10)	0.2% (2)	0.7% (8)
Q5	15.5% (170)	26.3% (288)	0.7% (8)	25.5% (280)
Q6	19.3% (212)	3.4% (37)	2.2% (24)	1.2% (13)

For Q1, responses were coded as nonsubstantive when both probes contained nonsubstantive content; an error was coded when at least one probe response contained the error code.

Dependent Measures and Data Analysis

All dependent measures were compared across the two experimental conditions (1 = condition (A) open-ended probes; 0 = condition (B) without probes).

Survey break off, backtracking, and answer changes

Break offs that occurred during the reported experiment, that is on the pages containing the closed survey questions (Q1–Q6) or the open-ended probes (P1a/b-P6) were included in the analysis. Backtracking was recorded for respondents when the client-side paradata string recorded more than one page visit. Multiple page visits were coded into a binary variable for each respondent on page level (1 = backtracking to survey page; 0 = no backtracking to survey page) and aggregated across all six closed survey questions (1 = backtracking to at least one survey page; 0 = no backtracking to any survey pages). Likewise, the prevalence of answer changes after backtracking was coded on page level (Heerwegh 2003, 2011) (1 = answer change after backtracking; 0 = no answer change/no backtracking) and aggregated across all six closed survey questions (1 = answer change after backtracking on at least one survey page; 0 = no answer change after backtracking for any closed survey questions).

Chi-square tests of independence were used to examine whether the share of survey break offs, backtracking, and answer changes after backtracking differed between conditions. For backtracking and subsequent answer changes, analyses were additionally carried out on page level, with Bonferroni adjusted alpha levels for multiple comparisons.

Response times

Client-side response times were captured for each survey page. Response time data is positively skewed and subject to outliers, which makes decisions about outlier definition, handling, and potential transformation of response times prior to analysis of utmost importance (Kunz and Hadler 2020). A variety of response time outlier definitions exist (Matjašič, Vehovar, and Manfreda 2018). Detecting outliers based on the mean and standard deviation remains a common procedure (Yan and Olson 2013), but has been criticized, as the mean value is in turn influenced by outliers. Researchers have increasingly recommended using median-based outlier definitions (Höhne and Schlosser 2018). Employing 2.5 times the median absolute deviation is considered the most robust outlier threshold (Leys et al. 2013), and applied in the present study. This method led to between 9 percent and 12 percent of response times being identified as outliers. Response time outliers were omitted from response time analysis, as were all instances in which a survey page was visited more than once (backtracking).

To test the third hypothesis, a multivariate analysis of covariance (MANCOVA) was applied to the adjusted response times as of the second survey question (Q2–Q6), with the experimental condition as the main predictor. The model included age, education, and device used (PC/laptop or mobile) as covariates, as it was not possible to include baseline reading speed, which otherwise accounts for much of the variance within response times (Couper and Kreuter 2013; Lenzner, Kaczmirek, and Lenzner 2010; Yan and Tourangeau 2008). Because outlier exclusion as described above leads to a sizeable decrease in available cases for a MANCOVA, separate analyses of covariances (ANCOVAs) were run for each survey page (thus, only excluding outliers page-wise).

The robustness of response time analyses should be tested by applying different outlier detection methods (Revilla and Couper 2018). The alternative approach used excluded observations beyond the upper and lower one percentile from analysis (Yan and Tourangeau 2008). A MANCOVA and separate ANCOVAs were run using the same predictors. All analyses led to the same results.

Response behavior

Survey response

T-tests were used to test for differences between conditions. For the single-item measures (Q4 and Q5), the scale values coincide with the items’ raw values. For the multi-item measures, scoring was carried out according to the instruments’ documentation. For the two-item measure on political cynicism (Q2), the second item was recoded so that the scale direction was the same across items, with higher values indicating higher cynicism. The score is the sum of both items divided by the number of items (Aichholzer and Kritzinger 2016). For the social desirability responding measure (Q3), the two factors exaggerating positive qualities (PQ+) and minimizing negative qualities (NQ−) were calculated separately using the sum score, again dividing by the number of items (Kemper et al. 2014). For the intergenerational social support inventory (Q6), the fourth item was recoded so that the scale direction remained the same across items, and principal components factor analysis was conducted on the six items with oblique rotation (Direct Oblimin). The Kaiser-Meyer-Olkin measure of sampling adequacy was 0.65, and Bartlett's test of sphericity reached statistical significance (χ²(15) = 1,947.06, p < .001). The analysis identified two factors with an Eigenvalue greater than 1, explaining 58.84 percent of the variance. All items loaded above .60. The two factors could be attributed to support provided by older to younger generations (F1) and support provided by younger to older generations (F2).

Item nonresponse

Item nonresponse can occur in the form of (1) skipping an item entirely, that is, when respondents produce missing data by leaving an item blank, or (2) by choosing a nonsubstantive response option, such as “I don’t want to answer” or “I don’t know” (Cornesse and Blom 2020). Skipping items is the more precise indicator of satisficing and low response quality, as choosing a nonsubstantive response option at the least requires reading and choosing the respective response option (Schuman and Presser 1981). Because these types of nonresponse may occur for different reasons, they may be impacted differently by embedding open-ended probes and are analyzed separately.

Chi-square tests of independence were used to examine whether the frequency of skipping items or choosing nonsubstantive response options differed between conditions. Analyses were additionally carried out on page level (for Q2–Q6), with Bonferroni-adjusted alpha levels for multiple comparisons. For multi-item inventories (Q2, Q3, and Q6), nonresponse was aggregated across all items. The item battery on social desirability responding (Q3) did not to include a non-substantive response option and is not included in this analysis.

Nondifferentiation.

Nondifferentiation, also referred to as straightlining, was examined for the two multi-item batteries Q3 and Q6. A dichotomous and a metric measure of straightlining were calculated. The dichotomous measure indicated whether a respondent chose the identical response option for all items within one battery. The second measure “mean root of pairs” was chosen to “capture variations in the choice of answers in a battery” (Kim et al. 2019). It is calculated by producing a temporary index by computing the mean of the root of the absolute differences between all pairs of items in a battery and then rescaling the temporary index to range from 0, indicating least straightlining, to 1, indicating most straightlining (Chang and Krosnick 2009). To examine whether nondifferentiation differed between conditions, chi-square tests of independence were used for the absolute and T-tests for the metric measure of nondifferentiation.

Extreme responding

Extreme responding was defined as choosing either endpoint of a response scale (apart from non-substantive response options). Chi-square tests of independence were carried out on item level with Bonferroni-adjusted alpha levels for multiple comparisons.

All analyses were carried out using IBM SPSS Statistics Version 24.0.

Results

Effect on Survey Break Off

The first hypothesis postulated that embedding open-ended probes into a web survey increases survey break off. Seventy-nine respondents dropped out of the survey during the reported experiment. Most of these break offs (82.3%; n = 65) occurred in condition (A) with open ended probes. This difference was significant (n = 2,279; χ²₍₁₎ = 32.153; p < .001); therefore hypothesis 1 can be confirmed.

More than half of the break offs in condition (A) (n = 44) occurred during the first two survey questions (Q1 and Q2) or their respective probes (P1a, P1b, and P2). Respondents who broke off the survey during the reported experiment were significantly more likely to have lower education (low/medium: n = 59 vs. high: n = 20; χ²₍₁₎ = 7.567; p = .006) and be women (women: n = 55 vs. men: n = 24; χ²₍₁₎ = 12.914; p < .001), consistent with previous research (Roßmann, Blumenstiel, and Steinbrecher 2015) that respondents who break off surveys systematically differ from those who complete them.

Effect on Preceding Survey Questions

The second hypothesis stated that embedding open-ended probes into a web survey increases backtracking (H2a) and subsequent answer changes (H2b) to previous survey questions. In total, 6.1% (n = 134) of respondents backtracked to a previous survey page with closed questions. Of these, three quarters (76.1%; n = 102) were assigned to condition (A) with open-ended probes. A chi-square test of independence confirmed the significant difference between conditions (χ²₍₁₎ = 39.483; p < .001). Backtracking to the first-shown question (Q1) occurred most often (n = 46). Analysis of question level showed the same tendency for all six questions (see Table 2). Differences between conditions were significant for Q1, Q3, and Q4 based on Bonferroni-adjusted alpha levels of .0083 per test (.05/6).

Table 2.

Backtracking to Previous Survey Pages and Answer Changes After Backtracking on the Question Level.

		Condition (A): Open-ended probes	Condition (B): Without probes	Chi-square test of independence
	Valid n	% (n)	% (n)	χ²_(df); p
Backtracking to previous survey question
Q1	2,197	3.3% (36)	0.9% (10)	χ²₍₁₎ = 15.180; p < .001
Q2	2,196	1.4% (15)	0.6% (7)	χ²₍₁₎ = 2.998; p = .083
Q3	2,196	2.0% (22)	0.5% (5)	χ²₍₁₎ = 10.963; p = .001
Q4	2,196	1.3% (14)	0.1% (1)	χ²₍₁₎ = 11.440; p = .001
Q5	2,196	1.3% (14)	0.5% (6)	χ²₍₁₎ = 3.289; p = .070
Q6	2,196	1.4% (15)	0.6% (7)	χ²₍₁₎ = 2.998; p = .083
Answer change after backtracking
Q1	2,197	0.7% (8)	0.4% (4)	χ²₍₁₎ = 1.370; p = .242
Q2	2,196	0.5% (5)	0.4% (4)	χ²₍₁₎ = 0.119; p = .730
Q3	2,196	0.5% (5)	0.1% (1)	χ²₍₁₎ = 2.703; p = .100
Q4	2,196	0.4% (4)	0.1% (1)	χ²₍₁₎ = 1.826; p = .177
Q5	2,196	0.8% (9)	0.1% (1)	χ²₍₁₎ = 6.488; p = .011
Q6	2,196	0.5% (5)	0.2% (2)	χ²₍₁₎ = 1.312; p = .252

Answer changes after backtracking were carried out by 2.0% of respondents (n = 45). Of these, 71.1% (n = 32) were respondents in condition (A) with open-ended probes. A Chi-square test of independence confirmed that the difference between conditions was significant (χ²₍₁₎ = 8.332; p = .004), though there were no significant differences in question level when using the Bonferroni adjusted alpha levels (see Table 2). It should be noted that across both conditions, a similar share of respondents who returned to a previous survey page decided to change their answers. Thus, while embedding open-ended probes increases backtracking and subsequently answer changes, it does not increase the likelihood of backtrackers to change their survey response. In summary, hypothesis 2 can be confirmed.

Effects on Subsequent Survey Questions

Response times

The third hypothesis stated that embedding open-ended probes into a web survey increases cognitive effort and thus response time invested in subsequent questions. To examine this, the response times to Q2–Q6 between conditions were examined using MANCOVA. After outlier exclusion, 1,509 cases remained for analysis (condition A: n = 737; condition B: n = 772). There was a significant main effect of condition on overall response time (Wilks-Lambda = 0.98; F_(5,1500) = 6.01; p < .001; η² = .02). Age and device used were significant covariates, whereas education was not.² Table 3 depicts the mean response times and separate ANCOVAs run for each survey question. Results showed that response times differed significantly for only one of the examined questions: respondents who were presented with open-ended probes spent significantly longer on average (7.4 s) to respond to the question on relationship satisfaction (Q5) than respondents in condition (B) (6.7 s). Thus, based on the result of the MANCOVA, hypothesis 3 can be confirmed; however, the effect only seems to apply in specific situations.

Table 3.

Analysis of Covariances (ANCOVAs) of Response Times.

	Q2	Q3	Q4	Q5	Q6
n	1,999	1,977	1,954	1,971	2,012
Condition (A)
Mean response times (std dev)	16.7 (7.5)	40.9 (17.9)	8.0 (3.4)	7.4 (3.3)	45.8 (21.5)
Condition (B)
Mean response times (std dev)	16.5 (7.5)	40.5 (18.1)	8.1 (3.7)	6.7 (3.2)	46.9 (23.0)
F	1.12	1.73	0.15	30.68	0.45
p	0.29	0.19	0.70	0.00	0.50
Partial η²	0.00	0.00	0.00	0.02	0.00

Nonresponse

The fourth hypothesis postulated that nonresponse is lower in the condition with open-ended probes. Table 4 shows the occurrence of skipping items and choosing non-substantive response options on question level. Across Q2-Q6, 8% (n = 175) of respondents skipped at least one item. A chi-square test of independence showed no significant difference between conditions (χ²₍₁₎ = 0.251; p = .616) across the five questions. Question level tests using Bonferroni adjusted alpha levels of .01 (.05/5) confirmed this, though based on the global alpha level, more respondents left the question on relationship satisfaction (Q5) unanswered in condition (B) without probes.

Table 4.

Nonresponse.

	Condition (A): Open-ended probes	Condition (B): Without probes	Chi-square test of independence
Question	% (n)	% (n)	χ²_(df); p
Item skipping
Q2	0.7% (8)	1.3% (14)	χ²₍₁₎ = 1.61; p = .205
Q3	4.3% (47)	3.9% (43)	χ²₍₁₎ = 0.22; p = .641
Q4	0.1% (1)	0.1% (1)	χ²₍₁₎ = 0.00; p = .996
Q5	0.4% (4)	1.3% (14)	χ²₍₁₎ = 5.53; p = .019
Q6	2.6% (28)	2.5% (28)	χ²₍₁₎ = 0.00; p = .978
Nonsubstantive response
Q2	4.7% (52)	5.2% (57)	χ²₍₁₎ = 0.21; p = .651
Q4	2.1% (23)	0.8% (9)	χ²₍₁₎ = 6.32; p = .012
Q5	16.0% (175)	13.3% (147)	χ²₍₁₎ = 3.10; p = .078
Q6	6.8% (75)	6.3% (69)	χ²₍₁₎ = 0.32; p = .574

Respondents had the option to choose a nonsubstantive answer (“I don’t want to answer”) for all questions except Q3. In total, 22.7% (n = 499) of respondents chose a nonsubstantive response to at least one item. A chi-square test of independence showed a significant difference between conditions (χ²₍₁₎ = 3.905; p = .048). However, contrary to the hypothesis, more respondents in condition (A) with open-ended probes chose a nonsubstantive response (24.5%; n = 268) than in condition (B) (20.9%; n = 231). This difference remained significant on the question level for the question on general life satisfaction (Q4) based on the Bonferroni adjusted alpha level of .0125 (.05/4). Thus, hypothesis 4 could not be confirmed.

Mean scores

In line with hypothesis 5a, there were no significant differences between means for any of the single-item or multi-item measures (see Table 5).

Table 5.

Mean Scores.

Question	Range	Condition (A): Open-ended probesmean (std)	Condition (B): Without probesmean (std)	T_(df); p
Q1	1 (left) to 10 (right)	5.30 (1.78)	5.26 (1.86)	T₍₁₉₁₈₎ = .524; p = .600
Q2	1 (low) to 5 (high)	3.63 (.86)	3.66 (.88)	T₍₂₀₆₇₎ = −.631; p = .528
Q3: PQ+	1 (low) to 5 (high)	3.58 (.70)	3.53 (.77)	T₍₂₁₂₆₎ = 1.617; p = .106
Q3: NQ−	1 (low) to 5 (high)	2.04 (.82)	2.03 (.81)	T₍₂₁₅₄₎ = .376; p = .707
Q4	1 to 11 (extremely satisfied)	7.19 (2.19)	7.03 (2.35)	T₍₂₁₆₃₎ = 1.580; p = .114
Q5	1 to 11 (extremely satisfied)	7.90 (3.19)	7.69 (2.91)	T₍₁₈₃₀₎ = 1.464; p = .143
Q6: F1	−2.79 (disagree) to 3.15 (agree)	−.02 (1.01)	.02 (.99)	T₍₁₉₉₇₎ = −.830; p = .406
Q6: F2	−2.55 (disagree) to 2.87 (agree)	−.01 (1.00)	.01 (1.00)	T₍₁₉₉₇₎ = −.501; p = .616

Extreme responding

According to hypothesis 5b, more extreme responding should occur in condition (A) with open-ended probes. Chi-square tests were conducted for each of the eight probed items of Q2–Q6, using Bonferroni adjusted alpha levels of .00625 (.05/8). Table 6 shows the share of extreme responding across conditions on item level. Extreme responding was more likely for the single-item measure of relationship satisfaction (Q5). In line with the hypothesis, significantly more respondents reported that they were extremely satisfied or unsatisfied with their current relationship in condition (A) that included open-ended probes (37.4%, n = 343) than when no probes were embedded (25.7%; n = 242) (χ²₍₁₎ = 29.73; p < .001). There were no further significant differences between conditions. Thus, similar to the findings regarding response times, hypothesis 5b can be confirmed for one question only, that being the question on relationship satisfaction.

Table 6.

Extreme Responding (Probed Items Only).

		Condition (A): Open-ended probes	Condition (B): Without probes	Chi-square test of independence
Question	Valid n	% (n)	% (n)	χ²_(df); p
Q2 (item 1)	2,113	24.4% (259)	27.5% (290)	χ²₍₁₎ = 2.65; p = .104
Q2 (item 2)	2,093	20.6% (216)	22.0% (230)	χ²₍₁₎ = .61; p = .435
Q3 (item 1)	2,186	31.9% (347)	31.4% (345)	χ²₍₁₎ = .04; p = .835
Q3 (item 5)	2,185	54.1% (589)	53.0% (581)	χ²₍₁₎ = .30; p = .582
Q4	2,166	5.7% (61)	6.5% (71)	χ²₍₁₎ = .61; p = .437
Q5	1,860	37.4% (343)	25.7% (242)	χ²₍₁₎ = 29.73; p = .000
Q6 (item 2)	2,136	22.6% (240)	24.5% (263)	χ²₍₁₎ = 1.06; p = .304
Q6 (item 4)	2,097	22.4% (234)	23.6% (249)	χ²₍₁₎ = .42; p = .518

Nondifferentiation

Hypothesis 5c assumed higher levels of nondifferentiation among respondents in condition (A) with open-ended probes. This was examined for the two multi-item batteries Q3 and Q6. However, neither the absolute nor the metric measure using the mean root of pairs showed significant differences between conditions (see Table 7). Therefore, an impact on nondifferentiation cannot be confirmed.

Table 7.

Nondifferentiation.

	Condition (A): Open-ended probes	Condition (B): Without probes	Significance level
Absolute nondifferentiation	% (n)	% (n)
Q3	3.3% (36)	4.1% (45)	χ²₍₁₎ = .971; p = .324
Q6	10.6% (116)	9.3% (103)	χ²₍₁₎ = .965; p = .326
Mean root of pairs	Mean (std)	Mean (std)
Q3	.335 (.193)	.338 (.203)	T₍₂₁₉₈₎ = −.435; p = .664
Q6	.483 (.242)	.473 (.234)	T₍₂₁₉₈₎ = 1.046; p = .295

Discussion and Conclusion

The purpose of the presented research was to determine whether and in which ways embedding open-ended probes into web surveys impacts the process of responding to closed survey questions. In doing so, it took a different perspective than many current studies in the area of open-ended questions, which examine contextual effects on the response quality to open-ended questions and probes.³ The study differentiated between the effects of open-ended probes on survey completion, on the survey questions the probes pertain to, and on subsequent survey questions. To this end, a randomized web survey experiment was carried out, in which closed survey questions were fielded with and without open-ended probes using a paging design. Inserting open-ended probes increased survey break off and impacted the survey questions the probes pertained to in the form of increased backtracking and answer changes. Effects on subsequent questions occurred in single cases (see Table 8 for an overview of the hypotheses and results).

Table 8.

Overview of Hypotheses and Results.

Hypotheses:		Result
Embedding open-ended probes into web surveys…
Impact on survey break off
H1	… increases survey break off	Confirmed
Impact on preceding survey questions
H2a	… increases backtracking	Confirmed
H2b	… increases answer changes	Confirmed
Impact on subsequent survey questions
H3	… increases response times	Confirmed for one question only
H4	… decreases nonresponse in terms of item skipping and choosing nonsubstantive response options	Not confirmed
H5a	… does not impact mean scores	Confirmed
H5b	… increases extreme responding	Confirmed for one question only
H5c	… increases nondifferentiation	Not confirmed

The majority of break offs occurred in the condition with open-ended probes, particularly after the first- and second-shown probe. The open-ended probes were not announced at the beginning of the survey, which may have contributed to the increase in break offs for the first-shown probes (rather than at the announcement that probes will be asked). Importantly, respondents with lower education and women were more likely to break off the survey. Although embedding open-ended probes did not lead to an unusually high level of survey break off, survey researchers should consider this potential nonresponse bias when implementing open-ended probes in web surveys.

Embedding open-ended probes significantly increased backtracking and answer changes to previous survey questions. In the present study, 9 percent of respondents who received open-ended probes returned to a previous question, while only 3 percent of respondents did this in the condition without probes. Across both conditions, about one of three respondents who backtracked changed their response to the preceding survey question. Like survey break off, backtracking and answer changes to previous questions occurred most often in response to the first open-ended probes.

Asking open-ended probes did not impact subsequent questions for the most part. There were no significant effects on mean scores, item skipping or non-differentiation for any of the examined questions. For four of the five examined questions, there were no significant effects on response time, choosing a nonsubstantive response option or extreme responding. This indicates that, in most cases, the cognitive processing of survey questions and subsequent web survey data are not impacted by inserting open-ended probes. However, there are notable exceptions.

Respondents were significantly more likely to choose a nonsubstantive response option to the single-item measure of life satisfaction (Q4) in the condition with open-ended probes. Moreover, respondents took significantly longer to answer and were more likely to give an extreme response to the question on relationship satisfaction (Q5) in the condition with open-ended probes.

There are several possible explanations for these findings. First, the responses to the open-ended probes indicated that the question on relationship satisfaction was flawed because it lacked a response category to indicate that one was currently not in a relationship. By the time respondents in condition (A) reached the survey question on relationship satisfaction, they were certainly expecting an open-ended probe to follow. Possibly, respondents lacking a suitable response option dealt with this irritation differently when they were expecting to be able to explain their response in an open-ended text field than when they were not expecting this. The majority of respondents who indicated that they were not in a relationship in the probe either chose the available non-substantive response option (“I do not want to answer this question”) or an extreme response option (i.e., “very unsatisfied”), while only one respondent in condition (A) chose to skip the question.⁴

However, alternative explanations should be considered. Perhaps, the effects of embedding open-ended probes on the response to subsequent closed survey questions are more likely to occur when there is a close connection to the preceding survey and probing questions. In the present study, the question on relationship satisfaction was directly preceded by the closely related construct on life satisfaction (Schwarz, Strack, and Mai 1991).

Probing techniques and probe design vary strongly, as do the survey questions they pertain to, and generalizing the results of any given study to all settings is not possible. In the present study, the effect of open-ended probing on the response time and behavior of only one of the examined questions highlights that future research should establish which question, probe, and respondent characteristics determine when open-ended probes impact surrounding survey questions.

Thus, the present study has several limitations which point the way to future research. For instance, it could be that certain probe types, such as category selection probing, increase the likelihood of respondents backtracking and changing their answers more than other probes that do not prompt respondents to reconsider their survey response. Different spacing designs should be examined to understand whether asking probes after (almost) each survey question leads to other effects than spacing probes throughout the survey or only inserting one (random) probe. Future research should include other question types, such as behavior and factual questions. Finally, future studies should employ further measures of data quality, such as test–retest reliability (Knowles et al. 1992).

In summary, embedding open-ended probes can increase survey break off, backtracking, and answer changes to previous questions, though fortunately, none of these outcomes occurred very often in the present study. Survey researchers may omit a back button to prevent effects on previous questions in practice. Of course, effects on response behavior to subsequent survey questions cannot be prevented technically; however, for this study such effects were seldom, and there is no reason to assume a worrisome impact on data collection. More than ever, Schuman's (1966) suggestion to ask single open-ended probes to a subsample of a survey seems a timely and pragmatic compromise in order to control for rare effects of open-ended probes on response behavior while gaining insights into respondents’ thought processes and thereby validating survey responses.

Supplemental Material

sj-docx-1-smr-10.1177_00491241231176846 - Supplemental material for The Effects of Open-Ended Probes on Closed Survey Questions in Web Surveys

Supplemental material, sj-docx-1-smr-10.1177_00491241231176846 for The Effects of Open-Ended Probes on Closed Survey Questions in Web Surveys by Patricia Hadler in Sociological Methods & Research

Footnotes

Author's Note

The quantitative data set of this study and analysis file is available under the following link: Hadler, Patricia (2023): Hadler 2023 SMR_Effect of openended probes_Analysis.sps. figshare. Dataset. . The answers to the open-ended questions are not publicly available due to them containing information that could compromise participant privacy.

Declaration of Conflicting Interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Patricia Hadler

Supplemental Material

Supplemental material for this article is available online.

Notes

Author Biography

Patricia Hadler studied communication sciences, psychology, and French literature at Johannes Gutenberg-Universität Mainz (Magister Artium 2009). She is a PhD candidate in psychology at the University of Mannheim and a researcher at the GESIS pretest lab. Her research focus is cognitive pretesting methodology.

References

ADM, ASI, BVM and DGOF. 2021. “Richtlinie für Online-Befragungen.” (https://www.adm-ev.de/wp-content/uploads/2021/03/RL-Online-2021-neu.pdf).

Aichholzer

Julian

Kritzinger

Sylvia

. 2016. “Kurzskala politischer Zynismus (KPZ).” Zusammenstellung sozialwissenschaftlicher Items und Skalen (ZIS) (https://zis.gesis.org/skala/Aichholzer-Kritzinger-Kurzskala-politischer-Zynismus-(KPZ)).

American Association for Public Opinion Research. 2016. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. AAPOR. (https://aapor.org/wp-content/uploads/2022/11/Standard-Definitions20169theditionfinal.pdf)

Andrews

Mark

. 2005. Who Is Being Heard? Response Bias in Open-Ended Responses in a Large Government Employee Survey. Florida: Miami Beach (http://www.asasrms.org/Proceedings/y2005/files/JSM2005-000924.pdf).

Bauer

Paul C.

Barbera

Pablo

Ackermann

Kathrin

Venetz

Aaron

. 2017. “Is the Left-Right Scale a Valid Measure of Ideology? Individual-Level Variation in Associations With ‘‘Left’’ and ‘‘Right’’ and Left-Right Self-Placement.” Political Behavior 39:553‐83. doi:10.1007/s11109-016-9368-2.

Beatty

Paul C

. 2004 “The Dynamics of Cognitive Interviewing.” pp. 45‐66 in Methods for Testing and Evaluating Survey Questionnaires, edited by Presser

Rothgeb

J. M.

Couper

M. P.

Lessler

J. T.

Martin

Singer

Groves

R. M.

Kalton

Rao

J. N. K.

Schwarz

Skinner

. Hoboken, NJ, USA: John Wiley & Sons, Inc.

Beatty

Paul C.

Willis

Gordon B.

. 2007. “Research Synthesis. The Practice of Cognitive Interviewing.” Public Opinion Quarterly 71(2):287‐311. doi:10.1093/poq/nfm006.

Behr

Dorothée

Braun

Michael

Kaczmirek

Lars

Bandilla

Wolfgang

. 2012a. “Testing the Validity of Gender Ideology Items by Implementing Probing Questions in Web Surveys.” Field Methods 25(2):124‐41. doi:10.1177/1525822X12462525.

Behr

Dorothée

Kaczmirek

Lars

Bandilla

Wolfgang

Braun

Michael

. 2012b. “Asking Probing Questions in Web Surveys: Which Factors Have an Impact on the Quality of Responses?.” Social Science Computer Review 30(4):487‐98. doi:10.1177/0894439311435305.

10.

Behr

Dorothée

Meitinger

Katharina

Braun

Michael

Kaczmirek

Lars

. 2017. Web Probing – Implementing Probing Techniques From Cognitive Interviewing in Web Surveys With the Goal to Assess the Validity of Survey Questions. Mannheim: GESIS – Leibniz-Institute for the Social Sciences.

11.

Beierlein

Constanze

Kovaleva

Anastassyia

László

Kemper

Christoph J.

Rammstedt

Beatrice

. 2014. Kurzskala zur Erfassung der Allgemeinen Lebenszufriedenheit (L-1) . doi:10.6102/zis229.

12.

Bishop

George F.

Hippler

Hans-Jürgen

Schwarz

Norbert

Strack

Fritz

. 1988. “A Comparison of Response Effects in Self-Administered and Telephone Surveys.” pp. 321‐40 in Telephone Survey Methodology, edited by Groves

R. M.

Biemer

Lyberg

Massey

J. T.

Nicholls

W. L.

Waksberg

. New York: Wiley.

13.

Bradburn

Norman M.

1983. “Response Effects.” pp. 289‐328 in Handbook of Survey Research, edited by Rossi

Wright

Anderson

. New York: Academic Press.

14.

Callegaro

Mario

DiSogra

Charles

. 2008. “Computing Response Metrics for Online Panels.” Public Opinion Quarterly 72(5):1008‐32. doi:10.1093/poq/nfn065.

15.

Callegaro

Mario

Manfreda

Katja Lozar

Vehovar

Vasja

. 2015. Web Survey Methodology. Los Angeles: Sage Publications.

16.

Chang

Linchiat

Krosnick

Jon A.

. 2009. “National Surveys via Rdd Telephone Interviewing Versus the Internet. Comparing Sample Representativeness and Response Quality.” Public Opinion Quarterly 73(4):641‐78. doi:10.1093/poq/nfp075.

17.

Collins

Debbie

, ed. 2015. Cognitive Interviewing Practice. London: Sage Publications.

18.

Conrad

Frederick G.

Blair

Johnny

Tracy

Elena

. 1999. Verbal Reports Are Data! A Theoretical Approach to Cognitive Interviews. Washington, DC (https://www.bls.gov/osmr/research-papers/1999/st990240.htm).

19.

Cornesse

Carina

Blom

Annelies G.

. 2020. “Response Quality in Nonprobability and Probability-Based Online Panels.” Sociological Methods & Research 21(1):online first. doi:10.1177/0049124120914940.

20.

Couper

Mick P.

2013. “Research Note: Reducing the Threat of Sensitive Questions in Online Surveys?” Survey Methods: Insights from the Field: 1‐9. doi:10.13094/SMIF-2013-00008.

21.

Couper

Mick P.

Kennedy

Courtney

Conrad

Frederick G.

Tourangeau

Roger

. 2011. “Designing Input Fields for Non-Narrative Open-Ended Responses in Web Surveys.” Journal of Official Statistics 27(1):65‐85.

22.

Couper

Mick P.

Kreuter

Frauke

. 2013. “Using Paradata to Explore Item Level Response Times in Surveys.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 176(1):271‐86. doi:10.1111/j.1467-985X.2012.01041.x.

23.

Edwards

Lisa M.

Lopez

Shane J.

. 2006. “Perceived Family Support, Acculturation, and Life Satisfaction in Mexican American Youth: a Mixed-Methods Exploration.” Journal of Counseling Psychology 53(3):279‐87.

24.

Ericsson

K. Anders

Simon

Herbert A.

. 1980. “Verbal Reports as Data.” Psychological Review 87(3):215‐51. doi:10.1037/0033-295X.87.3.215.

25.

Ericsson

K. Anders

Simon

Herbert A.

. 1993. Protocol Analysis: Verbal Reports as Data, 2nd ed. Cambridge, Massachusetts: MIT Press.

26.

Foddy

William.

1998. “An Empirical Evaluation of In-Depth Probes Used to Pretest Survey Questions.” Sociological Methods & Research 27(1):103‐33.

27.

Fowler

Stephanie L.

Willis

Gordon B.

. 2020. “The Practice of Cognitive Interviewing Through Web Probing.” pp. 451‐69 in Advances in Questionnaire Design, Development, Evaluation and Testing, edited by Beatty

Collins

Kaye

Padilla

J. L.

Willis

G. B.

Wilmot

. Hoboken, New Jersey: Wiley.

28.

Fox

Mark C.

Ericsson

K. Anders

Best

Ryan

. 2011. “Do Procedures for Verbal Reporting of Thinking Have to be Reactive? A Meta-Analysis and Recommendations for Best Reporting Methods.” Psychological Bulletin 137(2):316‐44. doi:10.1037/a0021663.

29.

Fuchs

Dieter

Klingemann

Hans-Dieter

. 1989. “Das Links-Rechts-Schema als politischer Code: ein interkultureller Vergleich auf inhaltsanalytischer Grundlage.” pp. 484‐98 in Kultur und Gesellschaft. Verhandlungen des 24. Deutschen Soziologentags, des 11. Österreichischen Soziologentags und des 8. Kongresses der Schweizerischen Gesellschaft für Soziologie in Zürich 1988, edited by Haller

Hoffmann-Nowotny

H.-J.

Zapf

. Frankfurt/Main: Campus-Verlag.

30.

Fuchs

Dieter

Klingemann

Hans-Dieter

. 1990. “The Left-Right Schema.” pp. 203‐34 in De Gruyter Studies on North America, vol. 5, Continuities in Political Action. A Longitudinal Study of Political Orientations in Three Western Democracies, edited by Jennings

M. K.

van Deth

J. W.

. Berlin, Boston: DE GRUYTER.

31.

Galesic

Mirta.

2006. “Dropouts on the Web: Effects of Interest and Burden Experienced During an Online Survey.” Journal of Official Statistics 22(2):313‐28.

32.

Gavras

Konstantin

Höhne

Jan K.

. 2020. “Evaluating Political Parties: Criterion Validity of Open Questions With Requests for Text and Voice Answers.” International Journal of Social Research Methodology 37(3):1‐7. doi:10.1080/13645579.2020.1860279.

33.

Gerlitz

Jean-Yves.

2014. Intergenerationale familiale Verpflichtung. Zusammenstellung sozialwissenschaftlicher Items und Skalen (ZIS) (https://zis.gesis.org/skala/Gerlitz-Intergenerationale-familiale-Verpflichtung). doi: 10.6102/zis212.

34.

GESIS. 2020. Allgemeine Bevölkerungsumfrage der Sozialwissenschaften ALLBUS – Kumulation 1980-2018: ZA5274 Datenfile Version 1.0.0. Cologne: GESIS Data Archive.

35.

GLES. 2019. Pre-election Cross Section (GLES 2017). Cologne: GESIS Data Archive.

36.

Hadler

Patricia.

2021. “Question Order Effects in Cross-Cultural Web Probing – Pretesting Behavior and Attitude Questions.” Social Science Computer Review 39(6):1292-1312. doi:10.1177/0894439321992779.

37.

Heerwegh

Dirk.

2003. “Explaining Response Latencies and Changing Answers Using Client-Side Paradata From a Web Survey.” Social Science Computer Review 21(3):360‐73. doi:10.1177/0894439303253985.

38.

Heerwegh

Dirk.

2011. “Internet Survey Paradata.” pp. 325‐48 in Social and Behavioral Research and the Internet. Advances in Applied Methods and Research Strategies, edited by Das

Ester

Kaczmirek

. Florence: Routledge.

39.

Höhne

Jan K.

Schlosser

Stephan

. 2018. “Investigating the Adequacy of Response Time Outlier Definitions in Computer-Based Web Surveys Using Paradata SurveyFocus.” Social Science Computer Review 36(3):369‐78. doi:10.1177/0894439317710450.

40.

Kaczmirek

Lars

Neubarth

Wolfgang

. 2007. “Nicht-reaktive Datenerhebung: Teilnahmeverhalten bei Befragungen mit Paradaten evaluieren.” pp. 293‐311 in Online-Forschung 2007: Grundlagen und Fallstudien, edited by DGOF (Deutsche Gesellschaft für Online-Forschung) e.V. Köln: Herbert von Halem Verlag.

41.

Kemper

Christoph J.

Beierlein

Constanze

Bensch

Doreen

Kovaleva

Anastassiya

Rammstedt

Beatrice

. 2014. Eine Kurzskala zur Erfassung des Gamma-Faktors sozial erwünschten Antwortverhaltens: Die Kurzskala Soziale Erwünschtheit-Gamma (KSE-G). 2012|25. Köln: GESIS (https://www.gesis.org/fileadmin/kurzskalen/working_papers/KSE_G_Workingpaper.pdf).

42.

Kim

Yujin

Dykema

Jennifer

Stevenson

John

Black

Penny

Moberg

D. Paul

. 2019. “Straightlining. Overview of Measurement, Comparison of Indicators, and Effects in Mail–Web Mixed-Mode Surveys.” Social Science Computer Review 37(2):214‐33. doi:10.1177/0894439317752406.

43.

Knowles

Eric S.

1988. “Item Context Effects on Personality Scales. Measuring Changes the Measure.” Journal of Personality and Social Psychology 55(2):312‐20. doi:10.1037/0022-3514.55.2.312.

44.

Knowles

Eric S.

Coker

Michelle C.

Cook

Deborah A.

Diercks

Steven R.

Irwin

Mary E.

Lundeen

Edward J.

Neville

John W.

Sibicky

Mark E.

. 1992. “Order Effects Within Personality Measures.” pp. 221‐36 in Context Effects in Social and Psychological Research, edited by Schwarz

Sudman

. New York: Springer.

45.

Kritzinger

Sylvia

Johann

David

Aichholzer

Julian

Glinitzer

Konstantin

Glantschnigg

Christian

Thomas

Kathrin

Wagner

Markus

Zeglovits

Eva

. 2014. AUTNES Rolling-Cross-Section Panel Study 2013: ZA5857 Data File Version 2.0.0. Cologne: GESIS Data Archive.

46.

Krosnick

Jon A.

1999. “Survey Research.” Annual Review of Psychology 50:537‐67.

47.

Kunz

Tanja

Beuthner

Christoph

Hadler

Patricia

Roßmann

Joss

Schaurer

Ines

. 2020. Informing About Web Paradata Collection and Use. Mannheim, Germany: GESIS – Leibniz Institute for the Social Sciences.

48.

Kunz

Tanja

Hadler

Patricia

. 2020. Web Paradata in Survey Research. Mannheim, Germany: GESIS – Leibniz Institute for the Social Sciences.

49.

Kunz

Tanja

Quoß

Franziska

Gummer

Tobias

. 2020. “Using Placeholder Text in Narrative Open-Ended Questions in Web Surveys.” Journal of Survey Statistics and Methodology (Online First):1‐21. doi:10.1093/jssam/smaa039.

50.

Legewie

Gerlitz

Jean-Yves

Mühleck

Scheller

Schrenker

. 2007. Dokumentation des International Social Justice Projekt 2006 für Deutschland. Vol. 118. Berlin: Humboldt Universität zu Berlin.

51.

Lenzner

Timo

Kaczmirek

Lars

Lenzner

Alwine

. 2010. “Cognitive Burden of Survey Questions and Response Times: A Psycholinguistic Experiment.” Applied Cognitive Psychology 24(7):1003‐20. doi:10.1002/acp.1602.

52.

Leys

Christophe

Ley

Christophe

Klein

Olivier

Bernard

Philippe

Licata

Laurent

. 2013. “Detecting Outliers: Do Not Use Standard Deviation Around the Mean, Use Absolute Deviation Around the Median.” Journal of Experimental Social Psychology 49(4):764‐66. doi:10.1016/j.jesp.2013.03.013.

53.

Luebker

Malte.

2021. “How Much is a Box? The Hidden Cost of Adding an Open-Ended Probe to an Online Survey.” methods, data, analyses 15(1):7‐42. doi:10.12758/mda.2020.09.

54.

Matjašič

Miha

Vehovar

Vasja

Manfreda

Katja Lozar

. 2018. “Web Survey Paradata on Response Time Outliers: a Systematic Literature Review.” Metodološki Zvezki 15(1):23‐41.

55.

Meitinger

Katharina.

2017. “Necessary but Insufficient: Why Measurement Invariance Tests Need Online Probing as a Complementary Tool.” Public Opinion Quarterly 81(2):447‐72. doi:10.1093/poq/nfx009.

56.

Meitinger

Katharina.

2018. “What Does the General National Pride Item Measure? Insights From Web Probing.” International Journal of Comparative Sociology 59(5-6):428‐50. doi:10.1177/0020715218805793.

57.

Meitinger

Katharina

Behr

Dorothée

. 2016. “Comparing Cognitive Interviewing and Online Probing: Do They Find Similar Results?.” Field Methods 28(4):363‐80. doi:10.1177/1525822X15625866.

58.

Meitinger

Katharina

Braun

Michael

Behr

Dorothée

. 2018. “Sequence Matters in Web Probing: The Impact of the Order of Probes on Response Quality, Motivation of Respondents, and Answer Content.” Survey Research Methods 12(2):103‐20. doi:10.18148/srm/2018.v12i2.7219.

59.

Meitinger

Katharina

Kunz

Tanja

. 2022. “Visual Design and Cognition in List-Style Open-Ended Questions in Web Probing.” Sociological Methods & Research 28(1): Online first. doi:10.1177/00491241221077241.

60.

Miller

Angie L.

Lambert

Amber D.

. 2014. “Open-Ended Survey Questions: Item Nonresponse Nightmare or Qualitative Data Dream?” Survey Practice 7(5):1‐11. doi:10.29115/SP-2014-0024.

61.

Miller

Kristen

Willson

Stephanie

Chepp

Valerie

Padilla

José-Luis

, eds. 2014. Cognitive Interviewing Methodology. Hoboken, NJ, USA: John Wiley & Sons, Inc.

62.

Neuert

Cornelia E.

Meitinger

Katharina

Behr

Dorothée

. 2021. “Open-Ended Versus Closed Probes. Assessing Different Formats of Web Probing.” Sociological Methods & Research 35(2):1‐35. doi:10.1177/00491241211031271.

63.

Neuert

Cornelia E.

Meitinger

Katharina

Behr

Dorothée

Schonlau

Matthias

. 2021. “The Use of Open-Ended Questions in Surveys.” methods, data, analyses (Special Issue) 15(1):3-6.

64.

Nießen

Désirée

Partsch

Melanie V.

Kemper

Christoph J.

Rammstedt

Beatrice

. 2019. “An English-Language Adaptation of the Social Desirability–Gamma Short Scale (KSE-G).” Measurement Instruments for the Social Sciences 1(2):1‐10. doi:10.1186/s42409-018-0005-1.

65.

Paulhus

Delroy L.

2002. “Socially Desirable Responding. The Evolution of a Construct.” pp. 49‐69 in The Role of Constructs in Psychological and Educational Measurement, edited by Braun

H. I.

Jackson

D. N.

Wiley

D. E.

. Mahwah, NJ: Erlbaum.

66.

Peytchev

Andy.

2009. “Survey Breakoff.” Public Opinion Quarterly 73(1):74‐97. doi:10.1093/poq/nfp014.

67.

Prochazka

Fabian.

2020. Vertrauen in Journalismus unter Online-Bedingungen: Zum Einfluss von Personenmerkmalen, Qualitätswahrnehmungen und Nachrichtennutzung. Wiesbaden: Springer VS.

68.

Reja

Ursa

Manfreda

Katja Lozar

Hlebec

Valentina

Vehovar

Vasja

. 2003. “Open-Ended vs. Close-Ended Questions in Web Questionnaires.” Metodološki Zvezki 19:159‐77.

69.

Revilla

Melanie

Couper

Mick P.

. 2018. “Comparing Grids With Vertical and Horizontal Item-by-Item Formats for PCs and Smartphones.” Social Science Computer Review 36(3):349‐68. doi:10.1177/0894439317715626.

70.

Revilla

Melanie

Couper

Mick P.

. 2019. “Improving the Use of Voice Recording in a Smartphone Survey.” Social Science Computer Review 39:(6):1159-78. doi:10.1177/0894439319888708.

71.

Roßmann

Joss

Blumenstiel

Jan E.

Steinbrecher

Markus

. 2015. “Why Do Respondents Break Off Web Surveys and Does it Matter? Results From Four Follow-Up Surveys.” International Journal of Public Opinion Research 27(2):289‐302. doi:10.1093/ijpor/edv030.

72.

Russo

J. Edward

Johnson

Eric J.

Stephens

Debra L.

. 1989. “The Validity of Verbal Protocols.” Memory & Cognition 17(6):759‐69. doi:10.3758/BF03202637.

73.

Scanlon

Paul J.

2019. “The Effects of Embedding Closed-Ended Cognitive Probes in a Web Survey on Survey Response.” Field Methods 31(4):328‐43. doi:10.1177/1525822X19871546.

74.

Scanlon

Paul J.

2020. “Using Targeted Embedded Probes to Quantify Cognitive Interviewing Findings.” pp. 427‐49 in Advances in Questionnaire Design, Development, Evaluation and Testing, edited by Beatty

Collins

Kaye

Padilla

J. L.

Willis

G. B.

Wilmot

. Hoboken, New Jersey: Wiley.

75.

Schmidt

Katharina

Gummer

Tobias

Roßmann

Joss

. 2020. “Effects of Respondent and Survey Characteristics on the Response Quality to an Open-Ended Attitude Question in Web Surveys.” methods, data, analyses 14(1):3‐34. doi:10.12758/mda.2019.05.

76.

Scholz

Evi

Zuell

Cornelia

. 2012. “Item non-Response in Open-Ended Questions: Who Does not Answer on the Meaning of Left and Right?” Social Science Research 41(6):1415‐28. doi:10.1016/j.ssresearch.2012.07.006.

77.

Schonlau

Matthias

Couper

Mick P.

. 2016. “Semi-Automated Categorization of Open-Ended Questions.” Survey Research Methods 10(2):143‐52. doi:10.18148/srm/2016.v10i2.6213.

78.

Schuman

Howard.

1966. “The Random Probe: a Technique for Evaluating the Validity of Closed Questions.” American Sociological Review 32(2):218‐22.

79.

Schuman

Howard

Presser

Stanley

. 1981. Questions and Answers in Attitude Surveys: Experiments on Question Form, Wording, and Context. New York: Academic Press.

80.

Schwarz

Norbert

Hippler

Hans-J.

. 1995. “Subsequent Questions May Influence Answers to Preceding Questions in Mail Surveys.” Public Opinion Quarterly 59(1):93‐7. doi:10.1086/269460.

81.

Schwarz

Norbert

Strack

Fritz

Hippler

Hans-J.

Bishop

George

. 1991. “The Impact of Administration Mode on Response Effects in Survey Measurement.” Applied Cognitive Psychology 5(3):193‐212. doi:10.1002/acp.2350050304.

82.

Schwarz

Norbert

Strack

Fritz

Mai

Hans-Peter

. 1991. “Assimilation and Contrast Effects in Part-Whole Question Sequences: a Conversational Logic Analysis.” Public Opinion Quarterly 55(1):3‐23. https://nbn-resolving.org/urn:nbn:de:0168-ssoar-67184

83.

Schwarz

Norbert

Strack

Fritz

Müller

Gesine

Chassein

Brigitte

. 1988. “The Range of Response Alternatives May Determine the Meaning of the Question. Further Evidence on Informative Functions of Response Alternatives.” Social Cognition 6(2):107‐17. doi:10.1521/soco.1988.6.2.107.

84.

Silber

Henning

Zuell

Cornelia

Kuehnel

Steffen-M.

. 2020. “What Can We Learn From Open Questions in Surveys? A Case Study on Non-Voting Reported in the 2013 German Longitudinal Election Study.” Methodology (European Journal of Research Methods for the Behavioral and Social Sciences) 16(1):41‐58. doi:10.5964/meth.2801.

85.

Singer

Eleanor

Couper

Mick P.

. 2017. “Some Methodological Uses of Responses to Open Questions and Other Verbatim Comments in Quantitative Surveys.” methods, data, analyses 11(2):115‐34. doi:10.12758/mda.2017.01.

86.

Smith

Tom W.

1982. Conditional Order Effects. GSS Technical Report no. 13. Chicago: NORC (https://gss.norc.org/Documents/reports/methodological-reports/MR020.pdf).

87.

Smyth

Jolene D.

Dillman

Don A.

Christian

Leah M.

Mcbride

Mallory

. 2009. “Open-Ended Questions in Web Surveys: Can Increasing the Size of Answer Boxes and Providing Extra Verbal Instructions Improve Response Quality?” Public Opinion Quarterly 73(2):325‐37. doi:10.1093/poq/nfp029.

88.

Sudman

Seymour

Bradburn

Norman M.

Schwarz

Norbert

. 1996. Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco, Calif.: Jossey-Bass.

89.

Tourangeau

Roger

Conrad

Frederick G.

Couper

Mick P.

Cong

. 2014. “The Effects of Providing Examples in Survey Questions.” Public Opinion Quarterly 78(1):100‐25. doi:10.1093/poq/nft083.

90.

Tourangeau

Roger

Rasinski

Kenneth A.

. 1988. “Cognitive Processes Underlying Context Effects in Attitude Measurement.” Psychological Bulletin 103:299‐314.

91.

Tourangeau

Roger

Rips

Lance J.

Rasinski

Kenneth A.

. 2000. The Psychology of Survey Response. Cambridge: Cambridge University Press.

92.

Willis

Gordon B.

1994. “Cognitive Interviewing and Questionnaire Design: a Training Manual.” Cognitive Methods Staff Working Paper Series 7:1‐56.

93.

Willis

Gordon B.

2005. Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks: Sage.

94.

Willis

Gordon B.

2015. Analysis of the Cognitive Interview in Questionnaire Design. Oxford: Oxford University Press.

95.

Willis

Gordon B.

Schechter

Susan

Whitaker

Karen

. 1999. “A Comparison of Cognitive Interviewing, Expert Review, and Behavior Coding: What Do They Tell Us?” (http://www.asasrms.org/Proceedings/papers/1999_006.pdf).

96.

Yan

Ting

Olson

Kristen

. 2013. “Analyzing Paradata to Investigate Measurement Error.” Pp. 73‐95 in Improving Surveys with Paradata. Analytic Uses of Process Information, edited by Kreuter

. Hoboken, New Jersey: John Wiley & Sons, Inc.

97.

Yan

Ting

Tourangeau

Roger

. 2008. “Fast Times and Easy Questions. The Effects of Age, Experience and Question Complexity on Web Survey Response Times.” Applied Cognitive Psychology 22(1):51‐68. doi:10.1002/acp.1331.

98.

Zuell

Cornelia

Menold

Natalja

Körber

Sabine

. 2015. “The Influence of the Answer Box Size on Item Nonresponse to Open-Ended Questions in a Web Survey.” Social Science Computer Review 33(1):115‐22. doi:10.1177/0894439314528091.

99.

Zuell

Cornelia

Scholz

Evi

. 2015. “Who is Willing to Answer Open-Ended Questions on the Meaning of Left and Right?” Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 127(1):26‐42. doi:10.1177/0759106315582199.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

0.05 MB