Sage Journals: Discover world-class research

Abstract

The popularity of self-report surveys brings to light the issue of insufficient effort responding (IER), where participants lack motivation to respond thoughtfully, thereby compromising data quality. A prevalent form of IER, known as straightlining, occurs when participants consistently select the same response for multiple items, leading to potentially biased outcomes. Typically, researchers apply a uniform threshold criterion across all response options (e.g., nine consecutive identical answers in any option are considered problematic). Considering the diverse or near-normal distribution patterns in questionnaire responses, employing varied thresholds for different options may be more appropriate. This study utilized both simulated data and real, large-scale international survey data to examine the likelihood of straightlining as IER across various options. The findings revealed that legitimate straightlining (non-IER) was more common at the extremes than in the middle options. Moreover, the study highlights the significant risks of incorrectly categorizing non-IER straightlining as IER without corroborating evidence. In conclusion, our results question the conventional approaches to eliminating straightlining, including (a) the application of a uniform cutoff value across different options or adopting stricter criteria for middle options and (b) the direct removal of straightlining as IER without additional verification.

Keywords

insufficient effort responding; careless responding; satisficing; straightlining; long-string

1. Introduction

Insufficient effort responding (IER) poses a significant threat to data quality in Likert scales (Arias et al. 2020; Meade and Craig 2012). Straightlining, a common manifestation of IER in Likert scales, occurs when participants select identical responses for consecutive items to conserve cognitive effort, regardless of the content and direction of the items (Arias et al. 2020; Vriesema and Gehlbach 2021). Although many researchers have emphasized the necessity of identifying and removing straightlining responses before conducting further analyses (e.g., Dunn et al. 2018; Kim et al. 2018; Meade and Craig 2012), there is little guidance on how to address straightlining across different options (e.g., whether nine consecutive “4”s are as problematic as nine consecutive “3”s). Therefore, the present research examined theoretical distributions under various hypothetical conditions to recommend strategies for establishing appropriate cutoff values for straightlining across different options.

1.1. Insufficient Effort Responding and Straightlining

Likert scales are commonly used to collect data in psychological, educational, and other social science research (Bucher and Sand 2022; Houtenville et al. 2021). However, because surveys have no direct consequences for respondents, some of them may answer carelessly, that is, in an IER manner. They may invest little effort because of inattentiveness, exhaustion, or speeding (Curran 2016; Hong et al. 2020; Huang et al. 2015; Tawa 2021). IER is closely related to “satisficing,” which refers to suboptimal decision-making strategies that conserve respondents’ cognitive energy (Simon 1957). When participants are not motivated to respond carefully, they may simply provide some response rather than respond to the specific content of the items (Krosnick 1991). As one common form of IER or satisficing, straightlining conserves mental energy: respondents endorse the status quo by applying one satisfactory answer to consecutive items in a scale (Yan 2008). The problem is exacerbated when respondents answer a series of questions on a single scale quickly, as they are more likely to overlook reverse-worded items or items with varied content (Krosnick 1991).

Straightlining is especially prevalent for questions in grid format, in which all items share the same response options and are arranged as a matrix (Debell et al. 2021; Mavletova et al. 2018; Roßmann et al. 2018). Items in grid format can produce a carryover effect: participants tend to give similar responses when they notice similarities across items (Dillman et al. 2014).

Identifying and removing straightlining responses is essential; otherwise, straightlining distorts estimates of psychometric properties such as inter-item correlations, coefficient alpha, and component structure (Arias et al. 2020; DeSimone et al. 2018; Yan 2008). DeSimone et al. (2018) showed that straightlining had a more pronounced impact on data quality than random responding, another type of IER.
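To illustrate how unremoved straightlining can distort reliability estimates, the sketch below computes coefficient alpha for hypothetical data: nine unrelated items answered at random by 1,000 respondents, and then the same data with 200 straightliners appended. The data, seed, and sample sizes are illustrative assumptions, not values from the cited studies; the point is only that straightliners add shared variance to the sum score and thereby inflate alpha.

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Coefficient alpha for a respondents-by-items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the sum score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)


rng = np.random.default_rng(2024)
n_items, n_options = 9, 4

# Hypothetical attentive data: unrelated items, options drawn uniformly at random.
clean = rng.integers(1, n_options + 1, size=(1000, n_items))

# Hypothetical straightliners: each picks one option and repeats it on every item.
straightliners = rng.integers(1, n_options + 1, size=(200, 1)).repeat(n_items, axis=1)

contaminated = np.vstack([clean, straightliners])
print(f"alpha without straightliners: {cronbach_alpha(clean):.2f}")
print(f"alpha with straightliners:    {cronbach_alpha(contaminated):.2f}")
```

Because the clean items are independent, their alpha should be near zero, so any sizeable alpha in the contaminated data comes from the straightliners alone.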

1.2. How Many Consecutive Identical Responses Should be Considered as IER?

As a direct and useful indicator, straightlining is widely used to detect potential IER (e.g., Huang et al. 2012; Meade and Craig 2012; Vriesema and Gehlbach 2021; Zhang and Conrad 2014). In practice, participants are flagged as IER when their run of identical consecutive responses reaches a preset cutoff value (Costa and McCrae 2008; Dunn et al. 2018; Hong et al. 2020; Meade and Craig 2012; Steedle et al. 2019). However, there is no consensus on whether a single cutoff value should be used for all options or different values for different options.
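The uniform-cutoff rule described above can be sketched in a few lines. The helper names `longest_string` and `flag_uniform` are hypothetical, and the sketch assumes a respondent is flagged when the longest run reaches the cutoff (≥); studies differ on whether ≥ or > is used.

```python
from itertools import groupby


def longest_string(responses):
    """Return (option, length) of the longest run of identical consecutive answers.

    Ties go to the run that appears first; an empty list gives (None, 0).
    """
    best_option, best_len = None, 0
    for option, run in groupby(responses):
        run_len = sum(1 for _ in run)
        if run_len > best_len:
            best_option, best_len = option, run_len
    return best_option, best_len


def flag_uniform(responses, cutoff):
    """Flag a respondent whose longest run reaches `cutoff`, whatever the option."""
    _, length = longest_string(responses)
    return length >= cutoff


print(longest_string([2, 3, 3, 3, 1, 1]))                    # (3, 3)
print(flag_uniform([4, 4, 4, 4, 4, 4, 4, 4, 4], cutoff=9))   # True
print(flag_uniform([3, 3, 3, 3, 3, 3, 3, 3, 2], cutoff=9))   # False
```

Note that `flag_uniform` ignores which option forms the run; that is exactly the property questioned in the rest of this paper.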

Some researchers conveniently set one cutoff value for straightlining without considering the position of options (DeSimone et al. 2018; Dunn et al. 2018; Meade and Craig 2012; Vriesema and Gehlbach 2021). That is, these researchers assumed that respondents were equally likely to produce long runs on every option, whereas others assumed that this likelihood varied across options. Consistent with the latter view, Costa and McCrae (2008) flagged participants as IER when they gave at least six consecutive “strongly disagree,” nine “disagree,” ten “neither agree nor disagree,” fourteen “agree,” or nine “strongly agree” responses in the Minnesota Multiphasic Personality Inventory (MMPI) scale. Johnson (2005) proposed a scree-like method (placing the cutoff at the sudden drop in a scree plot of the frequencies of the longest response strings) and selected 9, 9, 8, 11, and 9 as the cutoff values for Options 1 to 5, respectively. Subsequent research also adopted this scree-like strategy to determine option-specific cutoff values (Hong et al. 2020; Huang et al. 2012). Nevertheless, none of these researchers explained why different options received different cutoff values, nor did they provide theoretical or empirical justification for their choices.
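A minimal sketch of the two option-specific approaches above, assuming a 5-point scale coded 1 (“strongly disagree”) to 5 (“strongly agree”). The first function applies per-option cutoffs like those reported for Costa and McCrae (2008). The second is a hypothetical operationalization of Johnson’s (2005) scree-like idea that takes the “sudden drop” to be the largest ratio between the frequencies of consecutive run lengths; Johnson’s exact rule may differ, and the frequencies in the usage example are invented.

```python
from itertools import groupby

# Per-option cutoffs as reported above for Costa and McCrae (2008).
COSTA_MCCRAE_CUTOFFS = {1: 6, 2: 9, 3: 10, 4: 14, 5: 9}


def longest_run_per_option(responses):
    """Map each chosen option to its longest run of consecutive choices."""
    runs = {}
    for option, run in groupby(responses):
        runs[option] = max(runs.get(option, 0), sum(1 for _ in run))
    return runs


def flag_by_option(responses, cutoffs):
    """Flag when any option's longest run reaches that option's own cutoff."""
    runs = longest_run_per_option(responses)
    return any(runs.get(option, 0) >= cut for option, cut in cutoffs.items())


def scree_like_cutoff(freq):
    """Hypothetical scree-like rule for one option.

    freq[i] is the number of respondents whose longest run on the option has
    length i + 1. Returns the run length just after the sharpest relative
    drop in frequency; adding 1 to each count avoids division by zero.
    """
    ratios = [(freq[i] + 1) / (freq[i + 1] + 1) for i in range(len(freq) - 1)]
    i_max = max(range(len(ratios)), key=ratios.__getitem__)
    return i_max + 2  # ratio i compares run lengths i + 1 and i + 2


print(flag_by_option([1] * 6 + [3] * 4, COSTA_MCCRAE_CUTOFFS))   # True: six 1s
print(flag_by_option([4] * 13 + [2], COSTA_MCCRAE_CUTOFFS))      # False: 13 < 14
print(scree_like_cutoff([500, 400, 300, 200, 150, 100, 60, 50, 5, 3]))  # 9
```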

In sum, there is no consensus about whether straightlining should have different cutoff values for different options, and the theoretical rationale for either choice has received little research attention.

1.3. IER Likelihood in Straightlining

The assumption that researchers can eliminate straightlining IER using uniform cutoff values across various options rests on the belief that a specific option neither increases nor decreases the likelihood of straightlining. However, this assumption should not be accepted until two related research questions are addressed: (a) If it is assumed that most items approximate a normal distribution with more frequent responses in the middle than at the extremes, should we consider a longer straightlining IER cutoff for the central options? (b) Is the incidence of straightlining as IER equally probable across all options in actual data?

A significant challenge in applying a single cutoff value for different options when detecting straightlining is the participants’ tendency to prefer specific options, which deviates from a random selection of each option with equal probability. Indeed, Schulz et al. (2012) observed participants adopting specific patterns even when asked to choose numbers randomly. Additionally, Keusch and Yang (2018) found a propensity for respondents to select options on the lower end of a scale. Consequently, if participants naturally show longer sequences of the same response for particular options, it is crucial to recalibrate the straightlining cutoff for those options. Analogously, if non-IER participants tend to agree more often than disagree, consecutive agreements should be expected to occur more frequently than disagreements, suggesting that the cutoff for identifying agreeing IERs should be higher than for disagreeing.
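The preference argument can be made concrete with a deliberately simplified calculation. If each answer were an independent draw in which an option is chosen with probability p, a run of k identical answers on that option would occur with probability p^k, so small preferences are strongly amplified. Independence is an assumption made only for this illustration; real items are correlated, which, as Section 1.5 discusses, can change the picture. With the 15%/35%/35%/15% frequencies later used in Study 1, a nine-item run of a middle option would be (0.35/0.15)^9 ≈ 2,050 times as likely as one of an end option under independence.

```python
def run_probability_independent(p: float, k: int) -> float:
    """P(k consecutive identical answers on an option chosen with prob. p),
    assuming answers are independent across items (illustration only)."""
    return p ** k


freqs = {"Option 1": 0.15, "Option 2": 0.35, "Option 3": 0.35, "Option 4": 0.15}
for option, p in freqs.items():
    print(f"{option}: {run_probability_independent(p, 9):.2e}")

ratio = run_probability_independent(0.35, 9) / run_probability_independent(0.15, 9)
print(f"middle vs. end: {ratio:.0f} times as likely")  # 2050
```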

Moreover, because straightlining encompasses both IER and non-IER responses (Reuning and Plutzer 2020), it is vital to evaluate how often straightlining on each option reflects IER versus non-IER. If straightlining on a particular option is primarily due to IER, such patterns can be eliminated with minimal risk of removing valid responses. Conversely, if straightlining predominantly reflects valid responses, treating it as IER carries a significant risk of discarding valid data. Therefore, addressing straightlining calls for a customized approach, with flagging criteria tailored to each option.

1.4. Differences due to Position of Options: Middle Versus Two Ends

Respondents are expected to show a central tendency bias when answering questionnaires, that is, to select middle options more often and extreme options less often. Shaw et al. (2000) found that individuals tend to favor middle options over extreme ones when answering a series of similar items. In our study, we explored how the location of a response option (in the middle or at either end of the scale) influences respondents’ propensity to straightline. Because the options on Likert scales may be ordered in either ascending or descending direction, we simplified our approach by assuming a symmetrical distribution and compared only the end options with the middle options. In this study, “middle options” are all options that do not occupy the outermost positions of the scale. For example, on a 4-point scale (1–4), the middle options are 2 and 3; on a 7-point scale (1–7), they are 2 to 6.

1.5. Mathematical Formulation for Straightlining in Valid Responses (Non-IER)

Following related research (Reuning and Plutzer 2020), the response of respondent i to question j, $Response_{ij}$, can be expressed mathematically as:

$$Response_{ij} = K \quad \text{if} \quad cutoff_{K-1} < \lambda_j \eta_i \leq cutoff_K \tag{1}$$

where K is the chosen value from the range of response options for question j (e.g., 1, 2, 3, or 4 on a 4-point scale), $\eta_i$ is respondent i’s true score on the trait of interest (the latent variable), $\lambda_j$ is the factor loading of item j on the latent variable and represents how well item j reflects it, and $cutoff_{K-1}$ and $cutoff_K$ are the upper thresholds of the latent variable for Option K−1 and Option K, respectively, so that Option K covers the interval between them. The cutoffs are reflected in the frequency with which each option is chosen.
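Equation (1) amounts to cutting a continuous latent response at fixed thresholds. A minimal sketch, assuming (as is common in simulations but not stated in Equation (1)) that the latent response, $\lambda_j \eta_i$ plus an error term, is standard normal, so the cutoffs can be recovered from target option frequencies with the inverse normal CDF. The function names are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate
from statistics import NormalDist


def thresholds_from_frequencies(freqs):
    """Interior cutoffs on a N(0, 1) latent response that reproduce the target
    option frequencies; the outer bounds are -inf and +inf."""
    cumulative = list(accumulate(freqs))[:-1]   # drop the final cumulative 1.0
    return [NormalDist().inv_cdf(c) for c in cumulative]


def to_option(latent, cutoffs):
    """Equation (1): return K such that cutoff_{K-1} < latent <= cutoff_K."""
    return bisect_left(cutoffs, latent) + 1


cutoffs = thresholds_from_frequencies([0.15, 0.35, 0.35, 0.15])
print([round(c, 3) for c in cutoffs])   # approximately [-1.036, 0.0, 1.036]
print(to_option(0.7 * 1.2, cutoffs))    # a latent value of 0.84 falls in Option 3
```

`bisect_left` assigns a value exactly equal to a cutoff to the lower option, matching the “≤” in Equation (1).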

Straightlining from Item j to Item j + n can be expressed mathematically as:

$$Response_j = \begin{bmatrix} Response_{1j} \\ Response_{2j} \\ \vdots \\ Response_{ij} \end{bmatrix}, \qquad Response_j = Response_{j+1} = Response_{j+2} = \cdots = Response_{j+n} \tag{2}$$

$Response_j$ ranges from the lowest to the highest scale point. According to Equation (1), when the cutoff values are fixed, the response to Item j depends on the factor loading $\lambda_j$ and the respondent’s true score $\eta_i$. Because straightlining is the behavior of a single respondent, whose latent variable $\eta_i$ is constant across items, only the factor loadings affect the incidence of straightlining in valid responses. If the factor loadings are 1 for all items (i.e., correlations among items are 1), an effortful respondent is expected to choose the same option from the first item to the last, and the frequency of straightlining on each option depends only on the frequency distribution of the options. In practice, factor loadings in reliable surveys are fairly high (e.g., if two items each load .7 on the same factor, their correlation is .7 × .7 = .49), so the response to one item is strongly related to the response to the next. However, the options at the two ends are bounded by floor and ceiling effects, unlike the middle options: a respondent with an extreme true score stays in the end option however far the latent response moves beyond the outermost cutoff, whereas a middle option covers a bounded interval that measurement error can push the response out of in either direction. The probability of choosing an identical answer for adjacent items may therefore vary between middle and extreme options, producing runs of identical responses of different lengths across options.
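The floor/ceiling argument can be checked numerically. The sketch below computes, for one option, the probability that a non-IER respondent chooses that option on all n items. It assumes a standard one-factor model that Equation (1) leaves implicit: latent response $\lambda \eta_i + \varepsilon_{ij}$ with $\eta_i \sim N(0, 1)$ and independent errors $\varepsilon_{ij} \sim N(0, 1 - \lambda^2)$, so that two items loading .7 correlate .49. Given $\eta_i$, items are independent, so the run probability is the average of $p_K(\eta)^n$ over the true-score distribution; it is approximated here on a grid whose settings are arbitrary choices. Under these assumptions, with λ = .7 and the 15/35/35/15 frequencies, the end options have a clearly higher nine-item run probability than the middle options, whereas with λ = .4 the middle options are higher. This is the same direction reported in Study 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def straightline_probability(cutoffs, loading, option, n_items=9, grid=4001, width=8.0):
    """P(all n_items answers equal `option`) for a non-IER respondent.

    Assumes latent_ij = loading * eta_i + e_ij, eta_i ~ N(0, 1), and
    independent e_ij ~ N(0, 1 - loading**2), with 0 <= loading < 1.
    """
    sd = math.sqrt(1.0 - loading ** 2)
    bounds = [-math.inf, *cutoffs, math.inf]
    lower, upper = bounds[option - 1], bounds[option]
    step = 2 * width / (grid - 1)
    total = 0.0
    for g in range(grid):                      # Riemann sum over eta
        eta = -width + g * step
        mean = loading * eta
        # p_K(eta): chance that one item lands in `option` given the true score
        p_k = STD_NORMAL.cdf((upper - mean) / sd) - STD_NORMAL.cdf((lower - mean) / sd)
        total += p_k ** n_items * STD_NORMAL.pdf(eta) * step
    return total


cutoffs = [STD_NORMAL.inv_cdf(c) for c in (0.15, 0.50, 0.85)]
for loading in (0.7, 0.4):
    probs = [straightline_probability(cutoffs, loading, k) for k in range(1, 5)]
    print(loading, [f"{100_000 * p:.0f}" for p in probs], "per 100,000")
```

The key quantity is $p_K(\eta)$: for an end option it approaches 1 for sufficiently extreme true scores, while for a middle option it stays below 1 whenever the error variance is positive, so its ninth power is much smaller.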

1.6. The Present Research

In the context of Likert scales, no guidelines address the detection of straightlining with respect to the position of options. The present research examined whether the position of an option (extreme ends vs. middle) influences the incidence of straightlining and, consequently, how criteria for flagging straightlining should be set for different options. To this end, we conducted three studies.

Study 1 calculated and compared the incidence of straightlining for each option using simulated non-IER data. It provided a controlled environment for identifying the natural incidence of straightlining on each option without the influence of IER.

Study 2 assessed the effectiveness of applying a uniform cutoff value for identifying straightlining without regard to option positions. During this stage, IER was deliberately added to the simulated non-IER responses, allowing investigation into the risk of misidentifying straightlining for different options.

Study 3 sought to corroborate the simulation findings with real survey data. Specifically, PISA data were analyzed to assess the prevalence of IER among straightliners on each response option, with the aim of strengthening the conclusions drawn from the first two studies.

The findings from these studies could provide recommendations for researchers on the application of cutoff values for straightlining, taking into account the potential impact of the option’s position on the scale.

2. Study 1: Estimating Valid Straightlining (Non-IER) at Different Options

Although the chance is small and most researchers neglect it, non-IER participants may also select a long string of identical consecutive responses by coincidence. To evaluate this possibility, we calculated the frequency of straightlining within valid (non-IER) responses to establish baseline incidences for each option. This study assessed the incidence of straightlining on different options using simulated valid responses and investigated how this incidence varied with different attributes of the scales.

2.1. Design of the Simulation

In the simulation, each response can be influenced by the factor loading $λ$, the distribution of the true score $η$, and the cutoff for each option (see Equation (1)). The true score $η$ of each respondent was drawn from a standard normal distribution, and responses were generated based on three factors: item loadings $λ$, frequency distributions of options, and the number of options. We set two levels for each of these three factors (common vs. contrast). The “common” level employed parameters modeled on a large-scale international survey to represent a typical scenario encountered in many situations; the “contrast” level used distinct values to explore the impact of the three manipulated factors. Specifically, the “common” level was based on the characteristics of the Programme for International Student Assessment (PISA) 2018. PISA evaluated fifteen-year-old students’ cognitive abilities (in Reading, Mathematics, and Science) and, through questionnaires, various attitudes and learning-related constructs across seventy-nine economies (OECD 2019). The PISA 2018 questionnaires included twenty-three indexed scales, whose items had high loadings (average .72, with 60% of loadings above .70) and offered four response options, ordered in either ascending or descending value. Because PISA predominantly uses 4-point scales, and our preliminary simulations showed consistent results for 4- and 5-point scales, for simplicity we report the 4-point scale results in this study.

It was posited that the frequency distribution of these options approximated a normal curve, with middle options such as “agree” being more commonly selected than extremes like “completely agree.” Therefore, the “common” level was defined with the following characteristics: item loading of .7, four response options in ascending order (where larger values indicate a stronger trait), and a normal distribution pattern for the options. To explore the effects of these three manipulated factors, we set the “contrast” level at: an item loading of .4 (compared to .7 at the “common” level), a 7-point scale (in contrast to the “common” 4-point scale), and a uniform distribution (in contrast to the normal distribution at the “common” level), resulting in a 2 × 2 × 2 factorial design.

Specifically, for 4-point scales, the normal distribution yielded frequencies of 15%, 35%, 35%, and 15% for Options 1 through 4, respectively, whereas the uniform distribution produced 25% for each of the four options. For 7-point scales, the normal distribution resulted in frequencies of 5%, 10%, 20%, 30%, 20%, 10%, and 5% for Options 1 through 7, respectively, while the uniform distribution yielded approximately 14% for each of the seven options. We simulated responses to a nine-item scale for each of the eight conditions (2 × 2 × 2), generating 100,000 simulated respondents per condition.

We computed and compared the frequency of straightlining for each option in the eight conditions. Specifically, for each option and each respondent, we recorded the longest run of consecutive choices of that option (ranging from 0 to 9; e.g., seven consecutive choices of Option 1). We then counted the respondents whose longest run on each option had a given length (0–9). For instance, for straightlining of length eight on the first option (“1, 1, 1, 1, 1, 1, 1, 1”), we counted the participants whose longest run of “1” was exactly eight (excluding those with nine). For all eight conditions, the counts of runs of length 1 to 9 on each option are shown in Tables A1 and A2.
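The counting procedure can be sketched with NumPy for the “common” condition. Data generation assumes the same one-factor model used for Equation (1) above (standard normal true scores, error variance 1 − λ², thresholds derived from the target option frequencies). The seed, the use of `np.searchsorted` to apply Equation (1), and the function names are illustrative choices, not necessarily those used to produce Tables A1 and A2 or Figure 1.

```python
from statistics import NormalDist

import numpy as np


def simulate_responses(n, n_items, loading, freqs, rng):
    """Non-IER responses (options 1..K) from an assumed one-factor model."""
    cutoffs = np.array([NormalDist().inv_cdf(c) for c in np.cumsum(freqs)[:-1]])
    eta = rng.standard_normal(n)                                  # true scores
    errors = rng.standard_normal((n, n_items)) * np.sqrt(1 - loading ** 2)
    latent = loading * eta[:, None] + errors
    return np.searchsorted(cutoffs, latent, side="left") + 1     # Equation (1)


def longest_runs(responses, n_options):
    """best[i, k - 1] = respondent i's longest run of option k (0 if unused)."""
    n, n_items = responses.shape
    best = np.zeros((n, n_options), dtype=int)
    run = np.ones(n, dtype=int)
    rows = np.arange(n)
    for j in range(n_items):
        if j > 0:
            same = responses[:, j] == responses[:, j - 1]
            run = np.where(same, run + 1, 1)
        cols = responses[:, j] - 1
        best[rows, cols] = np.maximum(best[rows, cols], run)
    return best


rng = np.random.default_rng(7)
resp = simulate_responses(100_000, 9, 0.7, [0.15, 0.35, 0.35, 0.15], rng)
runs = longest_runs(resp, n_options=4)

# counts[k, L] = respondents whose longest run of option k + 1 is exactly L (0-9).
counts = np.array([np.bincount(runs[:, k], minlength=10) for k in range(4)])
print("nine-item straightlining per option:", counts[:, 9])
```

Counting the exact longest run per option, rather than any run of at least L, mirrors the “excluding nine” convention described above.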

Because our interest was in identifying IER straightlining, we focused on the longest pattern, nine consecutive identical responses (Figure 1). By comparing the likelihood of this pattern across options among non-IER participants, we could establish a baseline of straightlining for each option on which to base recommendations for IER deletion.

Figure 1.

Number and proportion of straightlining at different options for 4-point scale (top) and 7-point scale (bottom).

2.2. Results

In most conditions, the longest straightlining (selecting an identical option on all nine items) occurred more frequently at the first and last options than at the middle options, as shown by the longer bar segments for the end options in Figure 1 (the lightest and darkest segments). Because all our data were non-IER, treating all of the longest straightlining as IER would wrongly flag more valid straightlining on the extreme options than on the middle options.

In the condition with low item loadings (.4) and a normal frequency distribution, we observed higher incidences of straightlining on the middle options than at the ends. However, the total amount of straightlining across all options was negligible in this condition: only 65 of 100,000 respondents straightlined on the 4-point Likert scale, and only 3 on the 7-point Likert scale. Overall, therefore, straightlining in non-IER data tended to be more prevalent at the end options than at the middle options.

Comparing the two frequency distributions (normal vs. uniform) showed that the uniform distribution produced more straightlining at the two ends and less in the middle: the bars for the uniform-distribution conditions in Figure 1 are always dominated by the lightest and darkest segments (the end options). This reflects the comparatively larger share of responses at the two ends under the uniform distribution than under the normal distribution.

Notably, when the number of options increased from 4 to 7 (Figure 1, top vs. bottom graph) or the item loading decreased from .7 to .4, the total amount of straightlining decreased, because identical answers become less likely when there are more response options or when items are less correlated with each other. Thus, fewer options (4-point) and higher item loadings (.7) led to more straightlining in non-IER responses.

3. Study 2: Consequences of Using the Uniform Cutoff Values for Straightlining in Different Options

Study 2 examined the potential risks of applying a uniform cutoff value for straightlining across different options. Specifically, we introduced instances of straightlining that were actually IER into the pool of valid responses and assessed the rate of erroneous removal (non-IER straightlining) at each option.

3.1. Design of Simulation

By adding IER straightlining responses to non-IER responses, we examined the consequences of correct and incorrect removal when all of the longest straightlining responses were removed as IER. We used the typical condition (a 4-point scale, item loadings of .7, and normally distributed responses) to show the consequences of using the same cutoff value for all options.

Three kinds of straightlining data were simulated as true IER and added separately to the non-IER responses: (a) straightlining with a normal distribution across options (15%, 35%, 35%, and 15% for Options 1–4, respectively), (b) straightlining with a uniform distribution (25% for each option), and (c) straightlining with a close-to-U-shaped distribution (42%, 8%, 8%, and 42%), which may approximate the empirical distribution observed in PISA data. Two IER proportions (3% and 15%) were added, each in one of two forms: (a) equal numbers of straightlining responses of seven, eight, and nine items (e.g., for 3% IER, 1% of responses each with seven, eight, and nine identical consecutive answers), or (b) straightlining on all nine items only (e.g., 3%). The purpose was to examine the consequences of our IER removal strategy under different conditions. In total, 3 IER distributions × 2 IER proportions × 2 straightlining lengths (7–9 vs. 9 only) = 12 sets of IER responses were added to the non-IER data set.

Using the known IER status, we assessed the precision of applying a uniform cutoff value (flagging straightlining of at least 7 or at least 9 identical responses, matching the two levels of straightlining length). Precision was defined as the proportion of respondents flagged as IER who were truly IER (true positives divided by all flagged respondents); higher values are better.
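Precision as defined here can be computed per option once each flagged respondent’s option and true IER status are known. A minimal sketch; `precision_by_option` is a hypothetical helper. As a worked check, the true IER counts implied by the design (3,000 IER respondents at 3% IER, split 15/35/35/15 across options) divided by the N Flag values in Table 1 reproduce the precisions in the “Straightlining 9” column for the 3%, normal-distribution condition.

```python
def precision_by_option(flag_options, is_ier):
    """Precision = truly IER among flagged / all flagged, per option.

    flag_options[i] is the option on which respondent i was flagged
    (None if not flagged); is_ier[i] is respondent i's true IER status.
    """
    flagged, hits = {}, {}
    for option, ier in zip(flag_options, is_ier):
        if option is None:
            continue
        flagged[option] = flagged.get(option, 0) + 1
        hits[option] = hits.get(option, 0) + int(ier)
    return {option: hits[option] / flagged[option] for option in sorted(flagged)}


# Worked check: 3% IER, normal distribution, "Straightlining 9" column of Table 1.
true_ier = [round(3_000 * share) for share in (0.15, 0.35, 0.35, 0.15)]  # 450, 1050, 1050, 450
n_flag = [1_246, 1_199, 1_188, 1_214]
print([f"{100 * t / f:.2f}" for t, f in zip(true_ier, n_flag)])
# ['36.12', '87.57', '88.38', '37.07']
```

The same IER count is divided by a larger N Flag at the end options because more non-IER respondents produce nine-item runs there, which is what lowers their precision.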

3.2. Results

Simulation results revealed the drawbacks of indiscriminately applying the same cutoff value to eliminate straightlining responses across the four options. Effective elimination is characterized by high precision, meaning few incorrect removals of non-IER responses relative to correct removals of IER responses. Under most scenarios (Table 1), the options at the two extremes showed lower precision in detecting straightlining IER than the middle options when the uniform cutoff was used. For example, with normally distributed IER and 3% nine-item straightlining added as IER, precision was only 36% to 37% for the extreme options, far below the 88% for the middle options. Consequently, using the same cutoff across options carried a higher risk of incorrectly removing non-IER responses at the extremes than in the middle. Incorrect removal (low precision) was more widespread when the IER proportion was low: as the share of IER in the data set increased from 3% to 15% (Table 1), precision improved for all options. Furthermore, including shorter IER runs, such as seven and eight identical consecutive responses (columns “Straightlining 7, 8, 9” in Table 1), reduced precision for all options, because the lower cutoff also flags more non-IER respondents. This general pattern changed only when IER straightlining followed a U-shaped distribution (Table 1): under these conditions, precision was similar or even higher for the extreme options than for the middle options.

Table 1.

Precision Rates in Using the Same Cutoff Values for Options at Two Ends and at the Middle.

	Straightlining 9		Straightlining 7, 8, 9
Simulations	N Flag	Precision (%)	N Flag	Precision (%)
3% IER (N = 3,000)
Normal distribution
Option 1	1,246	36.12	1,858	24.22
Option 2	1,199	87.57	1,998	52.55
Option 3	1,188	88.38	1,990	52.76
Option 4	1,214	37.07	1,957	22.99
Uniform distribution
Option 1	1,546	48.51	2,158	34.75
Option 2	899	83.43	1,698	44.17
Option 3	888	84.46	1,690	44.38
Option 4	1,514	49.54	2,257	33.23
U-Shape distribution
Option 1	2,056	61.28	2,668	47.23
Option 2	389	61.70	1,188	20.20
Option 3	378	63.49	1,180	20.34
Option 4	2,024	62.25	2,767	45.54
15% IER (N = 15,000)
Normal distribution
Option 1	2,941	76.50	3,485	64.56
Option 2	5,385	97.49	6,087	86.25
Option 3	5,366	97.84	6,076	86.41
Option 4	2,929	76.82	3,578	62.88
Uniform distribution
Option 1	4,441	84.44	4,985	75.23
Option 2	3,885	96.53	4,587	81.75
Option 3	3,866	97.00	4,576	81.95
Option 4	4,429	84.67	5,078	73.85
U-Shape distribution
Option 1	6,991	90.12	7,535	83.61
Option 2	1,335	89.89	2,037	58.91
Option 3	1,316	91.19	2,026	59.23
Option 4	6,979	90.27	7,628	82.59

Note. N Flag = number of participants flagged as IER; Options 1, 4 = options at the two ends; Options 2, 3 = options in the middle.

In summary, the simulations demonstrated that the proportion of IER, the frequency distribution of IER straightlining, and the length of straightlining all affect the precision with which suspect straightlining responses are identified on each option. Because these results rest on simulated data, it is important to validate them with empirical data, which was the objective of Study 3.

4. Study 3: IER Likelihood in Straightlining Across Options in Empirical Data

Although the two simulation studies established baselines of straightlining across options and showed the potential problems of using a uniform cutoff value to detect IER straightlining, these issues also need to be examined and interpreted with authentic empirical data. Study 3 explored how the likelihood of IER among straightliners varied across options in data from the large-scale international survey PISA 2015. The 2015 cycle was chosen because it contains relatively long scales.

4.1. Method

From the PISA 2015 assessment, we used five 4-point Likert scales, each comprising at least eight items, to examine the association between IER likelihood and straightlining across options (OECD 2017). Details of the scales are provided in Table A3. The five scales were:

inquiry-based science teaching and learning practices (ST098; e.g., “Students are given opportunities to explain their ideas”),

family property (ST012; e.g., “How many televisions at your home”),

students’ dispositions for collaborative problem solving (ST082; e.g., “I am a good listener”),

science self-efficacy (ST129; e.g., “Recognize the science question that underlies a newspaper report on a health issue”), and

science activities (ST146; e.g., “How often do you do these things? Watch TV programs about <broad science>”).

In total, responses from 5,713 students in the USA were used. Attention-check items are widely used to label IER in real data (Arias et al. 2020; Kim et al. 2018; Nichols and Edlund 2020). Following earlier research showing that response time is strongly correlated with IER straightlining (Zhang and Conrad 2014), we classified respondents as IER (coded 1) or non-IER (coded 0) according to their response time and their answers to the attention-check item ST121 in the PISA student questionnaire. Students were classified as IER when (a) their average response time on the target scale was one second or less, or (b) they answered ST121 incorrectly (e.g., participants were told “<NAME 1> gives up easily when confronted with a problem and is often not prepared for his classes”; participants were classified as IER (inattentive) if they “totally agreed” or “agreed” that “<NAME 1> is motivated”). We then counted the IER and non-IER respondents among those showing the longest straightlining on each option. Admittedly, not all students with short response times were straightliners (they could show another type of IER), and not all straightliners responded unreasonably fast; nevertheless, we expected the two indicators to be related closely enough to triangulate IER.
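The two-rule classification can be sketched as follows. The record fields, the per-item reading of “average response time,” and the ST121 response coding (1 = strongly disagree to 4 = totally agree) are assumptions for illustration; actual PISA variable names and codes should be taken from the PISA 2015 codebook.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    # Hypothetical fields; real PISA variable names and codes differ.
    mean_seconds_per_item: float  # average response time on the target scale
    name1_is_motivated: int       # ST121 rating, assumed 1 = strongly disagree ... 4 = totally agree


def ier_code(r: Respondent, max_seconds: float = 1.0) -> int:
    """Return 1 (IER) or 0 (non-IER) using the two rules described above."""
    too_fast = r.mean_seconds_per_item <= max_seconds
    # The vignette describes <NAME 1> as giving up easily, so agreeing
    # that <NAME 1> "is motivated" (assumed codes 3 or 4) counts as inattentive.
    failed_check = r.name1_is_motivated >= 3
    return int(too_fast or failed_check)


def ier_share(straightliner_codes):
    """Proportion of IER among one option's straightliners (None if there are none)."""
    if not straightliner_codes:
        return None
    return sum(straightliner_codes) / len(straightliner_codes)


print(ier_code(Respondent(0.8, 1)), ier_code(Respondent(4.2, 2)), ier_code(Respondent(3.0, 4)))
# 1 0 1
```

Returning `None` for an option with no straightliners corresponds to the “—” entries in Table 2.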

4.2. Results

Comparing respondents classified by their answers on ST121, we found that those flagged as potential IER were more prone to give nonsensical answers or to skip open-ended questions than their non-flagged counterparts. This difference was statistically significant (t = 8.752, p < .001), indicating that respondents who answered ST121 incorrectly were less motivated to complete the questionnaire thoroughly. Based on the flags derived from ST121 and response time, the proportion of IER among straightliners differed across options in the five scales. For instance, straightliners on the second option of ST012 were 100% IER (2 of 2), whereas only 15.38% (6 of 39) of those on the last option (the right end) were IER (Table 2). Therefore, a cutoff value of 8 might be effective for eliminating straightlining on the second option but less appropriate for the fourth option. In essence, students marking “4, 4, 4, 4, 4, 4, 4, 4” on ST012 should not be assumed to have the same likelihood of IER as those marking “2, 2, 2, 2, 2, 2, 2, 2” on this scale, because even attentive respondents might consistently choose the last option on this scale.

Table 2.

Proportions of IER Straightlining for Each Option.

Item	All Flag	IER	IER proportion (%)
ST098
Option 1	231	35	15.15
Option 2	162	34	20.99
Option 3	130	24	18.46
Option 4	45	10	22.22
ST012
Option 1	7	2	28.57
Option 2	2	2	100.00
Option 3	0	0	—
Option 4	39	6	15.38
ST082
Option 1	8	5	62.50
Option 2	9	5	55.56
Option 3	807	75	9.29
Option 4	180	22	12.22
ST129
Option 1	349	22	6.30
Option 2	611	58	9.49
Option 3	201	40	19.90
Option 4	96	21	21.88
ST146
Option 1	62	13	20.97
Option 2	66	20	30.30
Option 3	301	58	19.27
Option 4	1,159	69	5.95

Unexpectedly, the IER proportions among straightliners were small for most scales. Apart from ST012 and ST082, no option in any scale had more than 31% IER among its straightliners. In ST146, classifying the longest straightlining on all options as IER would correctly identify and remove only 160 IER participants, about 10% of the 1,588 participants flagged for straightlining (summed across options in Table 2).
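The ST146 figures follow directly from Table 2. The short computation below sums the flagged and IER counts across options and recomputes the per-option proportions.

```python
# Table 2, ST146: option -> (all flagged, flagged as IER)
st146 = {1: (62, 13), 2: (66, 20), 3: (301, 58), 4: (1_159, 69)}

total_flagged = sum(flagged for flagged, _ in st146.values())
total_ier = sum(ier for _, ier in st146.values())
print(total_flagged, total_ier, f"{100 * total_ier / total_flagged:.1f}%")  # 1588 160 10.1%

for option, (flagged, ier) in st146.items():
    print(f"Option {option}: {100 * ier / flagged:.2f}% IER")
```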

5. Discussion

5.1. Inappropriateness of Using a Uniform Cutoff Value for Straightlining Across Options

Our findings raise important questions about the common practice of applying a uniform cutoff value for detecting straightlining as IER. The data from this study suggest that straightlining is not uniformly distributed across response options. The empirical PISA data corroborated this pattern, indicating that such behavior is not random but is related to the position of the response options. For example, in scale ST012, a run of the extreme option “1, 1, 1, 1, 1, 1, 1, 1” was less likely to indicate IER, with a precision of 28.57%. In contrast, every respondent who gave a run of the middle option “2, 2, 2, 2, 2, 2, 2, 2” was flagged as IER (a precision of 100%, although this is based on only two cases). Using an identical cutoff value ignores the fact that the risk of removing valid rather than invalid straightlining depends on option position.

The implication is clear: a one-size-fits-all approach to cleaning straightlining, in which the same cutoff is applied to every option, may wrongly exclude valid data and potentially skew the results of a study. Researchers must consider the position of options when establishing cutoff values for straightlining and account for the inherent differences across options in how often straightlining arises. This could involve developing option-specific cutoff values or adopting more sophisticated statistical techniques to distinguish IER from non-IER. Doing so can improve the accuracy of data cleaning and enhance the validity of research findings.
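
One way to operationalize option-specific cutoffs is to compute, for each respondent, the longest run of consecutive identical answers separately for each option, and then compare that run with the option's own cutoff. The Python sketch below assumes answers are coded 1 to k in item order. The function names and the cutoff values in the example are hypothetical placeholders, not values derived from this study.

```python
from typing import Dict, Optional, Sequence


def longest_runs_by_option(responses: Sequence[int]) -> Dict[int, int]:
    """Longest run of consecutive identical answers, tracked separately for each option."""
    runs: Dict[int, int] = {}
    previous: Optional[int] = None
    length = 0
    for answer in responses:
        length = length + 1 if answer == previous else 1
        previous = answer
        runs[answer] = max(runs.get(answer, 0), length)
    return runs


def flag_straightlining(responses: Sequence[int], cutoffs: Dict[int, int]) -> bool:
    """True if any option's longest run reaches that option's own cutoff."""
    runs = longest_runs_by_option(responses)
    return any(length >= cutoffs[option] for option, length in runs.items() if option in cutoffs)


# Hypothetical cutoffs for an 8-item, 4-point scale: a uniform rule versus a rule that
# requires longer runs (i.e., is stricter) on the extreme options than on the middle ones.
uniform = {1: 6, 2: 6, 3: 6, 4: 6}
option_specific = {1: 8, 2: 6, 3: 6, 4: 8}

extreme_run = [4, 4, 4, 4, 4, 4, 4, 3]  # seven "4"s
middle_run = [1, 2, 2, 2, 2, 2, 2, 3]   # six "2"s
print(flag_straightlining(extreme_run, uniform), flag_straightlining(extreme_run, option_specific))
print(flag_straightlining(middle_run, uniform), flag_straightlining(middle_run, option_specific))
```

The uniform rule flags both respondents. The option-specific rule flags only the middle-option run, which matches the direction argued for in this section.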

5.2. Against Common Sense—Valid Straightlining More Often at Two-End Options

Conventional practice sets a stricter cutoff value for straightlining on middle options, on the assumption that valid straightlining is more prevalent there. Our simulation of non-IER data (Study 1) showed otherwise. "Legitimate" straightlining occurred more frequently at the two-end options than at the middle options, even when respondents selected each option with equal probability (a uniform distribution). This pattern was more pronounced for scales with higher item loadings and fewer options.
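
The pattern can be illustrated with a small Monte Carlo sketch. This is not a reproduction of the Study 1 design. The one-factor model, the loading of 0.7, the 8 items, the 4 options, and the equal-probability thresholds are all illustrative assumptions. The sketch counts attentive (non-IER) simulated respondents who give the same answer to every item, broken down by option.

```python
import bisect
import random
from statistics import NormalDist
from typing import List


def simulate_full_straightlining(n_resp: int = 20000, n_items: int = 8, n_options: int = 4,
                                 loading: float = 0.7, seed: int = 1) -> List[int]:
    """Count simulated non-IER respondents who give the same answer to every item, by option.

    Assumes a one-factor model with equal loadings. The latent item responses are standard
    normal, and thresholds sit at normal quantiles so that every option has the same
    marginal probability (a uniform distribution of options).
    """
    rng = random.Random(seed)
    nd = NormalDist()
    thresholds = [nd.inv_cdf(k / n_options) for k in range(1, n_options)]
    residual_sd = (1 - loading ** 2) ** 0.5
    counts = [0] * n_options  # counts[0] is option 1, counts[-1] is the last option
    for _ in range(n_resp):
        trait = rng.gauss(0.0, 1.0)
        # The 0-based option is the number of thresholds at or below the latent response.
        answers = {bisect.bisect(thresholds, loading * trait + residual_sd * rng.gauss(0.0, 1.0))
                   for _ in range(n_items)}
        if len(answers) == 1:
            counts[answers.pop()] += 1
    return counts


print(simulate_full_straightlining())
```

With these settings, the first and last counts should be much larger than the two middle counts. Lowering `loading` or raising `n_options` should generally make full straightlining on the end options rarer.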

Although some researchers recognize the need to use different cutoff values for different options, in applied empirical research they usually set stricter thresholds for middle options than for the two ends. Here, "stricter" means that more consecutive identical answers are required before a response is considered IER. For instance, Johnson (2005) proposed a scree-like test to determine cutoff values for straightlining across options and chose 6, 9, 10, 14, and 9 for the response categories from “very inaccurate” to “very accurate.” Similarly, Costa and McCrae (2008) adopted thresholds of 6, 9, 10, 14, and 9 for the five points of the Revised NEO Personality Inventory (NEO PI-R), while Huang et al. (2012) used 7, 7, 12, 10, and 8 for this scale. All of them set stricter cutoff values for the middle option than for the two-end options. This is the opposite of what our simulation results imply, namely that straightlining is more frequent at the two ends.

Our simulated data showed more frequent straightlining at the two-end options and less frequent straightlining at the middle options. This suggests that these studies may have wrongly removed a substantial amount of non-IER data showing extreme straightlining. Results with the PISA data likewise suggested that more non-IER straightlining responses would be removed at the two ends for most of the scales (except ST082 and ST129; Table 2, with scale items in Table A3).

These results, which run counter to researchers’ expectations, are likely due to high item loadings in the questionnaire scales. A well-constructed scale has large item loadings (e.g., .7), reflecting high inter-correlations among its items. Careful non-IER respondents are therefore expected to answer consistently across items. On a 4-point scale, a consistent respondent who chooses the first option (“1”) on one item will likely stay on the first option or move to the adjacent one. Because an end option has a neighbor on only one side, the next answer is very likely to be “1” or “2.” In contrast, a consistent respondent starting with the middle option “2” may choose “1,” “2,” or “3” on the next item. Thus, even when options at the two ends are selected less often than the middle options, straightlining at the two ends can occur more frequently than middle straightlining. Notably, our study examined scales with relatively few points (e.g., 4 points), so the end options started with reasonably large percentages compared with a scale with more points (e.g., 7 points). Relatively long straightlining at the two ends is therefore more likely on a 4-point scale than on a 7-point scale (Figure 1).
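
The same mechanism can also be written as a one-dimensional integral. The following is our own illustrative formulation, which need not match the one in Section 1.5. Assume a one-factor model with loading l, residual SD s = sqrt(1 − l²), and equal-probability thresholds t_c. Then the probability that all k answers fall on option c is ∫ [Φ((t_c − l·η)/s) − Φ((t_{c−1} − l·η)/s)]^k φ(η) dη. The Python sketch below evaluates this with a simple midpoint rule for 4- and 7-point scales. All parameter values are assumptions.

```python
from statistics import NormalDist
from typing import List


def full_straightlining_probs(n_options: int, n_items: int = 8, loading: float = 0.7,
                              step: float = 0.005) -> List[float]:
    """P(all n_items answers equal option c) for each option c under a one-factor model.

    Integrates [Phi((t_c - l*eta)/s) - Phi((t_{c-1} - l*eta)/s)]**n_items * phi(eta)
    over eta in [-8, 8] with a midpoint rule, where s = sqrt(1 - l**2) and the
    thresholds t_c give every option the same marginal probability.
    """
    nd = NormalDist()
    taus = [float("-inf")] + [nd.inv_cdf(k / n_options) for k in range(1, n_options)] + [float("inf")]
    sd = (1 - loading ** 2) ** 0.5
    probs = [0.0] * n_options
    n_steps = round(16 / step)
    for i in range(n_steps):
        eta = -8 + (i + 0.5) * step  # midpoint of the i-th slice
        weight = nd.pdf(eta) * step
        for c in range(n_options):
            p_option = nd.cdf((taus[c + 1] - loading * eta) / sd) - nd.cdf((taus[c] - loading * eta) / sd)
            probs[c] += weight * p_option ** n_items
    return probs


for k in (4, 7):
    print(k, [f"{p:.5f}" for p in full_straightlining_probs(k)])
```

In both lists the end options should dominate the middle ones. The first entry for 7 points should be smaller than the first entry for 4 points, which is consistent with the argument above.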

Although the method has been used in many studies (e.g., Hong et al. 2020; Huang et al. 2012; Steedle et al. 2019), we do not recommend the scree-like test for determining cutoff values for straightlining across options. The scree-like test assumes that changes in the frequency of straightlining reflect changes in the severity of IER. However, our simulation results show that scale characteristics (the number of options, item loadings, and the distribution of responses across options) can induce more non-IER straightlining, especially for options at the two ends. It is therefore uncertain whether a sudden change in straightlining frequency is caused by IER or by non-IER straightlining, which makes the scree-like test dubious.

5.3. Problems in Wrong Removal of Straightlining

Although IER can manifest as straightlining, not all straightlining is IER. Treating every straightliner as IER is a typical case of affirming the consequent. Monte Carlo simulations have also shown that non-IER straightlining increases as the quality of survey questions improves, that is, as item factor loadings increase (Reuning and Plutzer 2020). Consistent with the simulated data, the PISA data showed different patterns across options. The proportion of IER among straightliners differed across the four options, and relatively little of the longest straightlining was confirmed as IER. For ST012 and ST082, deleting the longest straightlining on some options (e.g., the second option for ST012 and the first option for ST082) was acceptable. However, for ST098, ST129, and ST146, the longest straightlining on every option included many non-IER respondents. Roughly 70% or more of these straightliners may be non-IER, which underscores the risk of wrongly removing non-IER data when the strictest cutoff value for straightlining is used on some scales.
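
Bayes' rule makes the "affirming the consequent" point concrete. Even if IER respondents often straightline, a straightliner need not be IER once valid straightlining becomes common. The sketch below uses made-up input probabilities purely for illustration. None of them are estimates from this study or from Reuning and Plutzer (2020).

```python
def straightlining_precision(p_sl_given_ier: float, p_sl_given_valid: float, ier_rate: float) -> float:
    """P(IER | straightlining) by Bayes' rule.

    p_sl_given_ier:   probability that an IER respondent straightlines
    p_sl_given_valid: probability that an attentive respondent straightlines
    ier_rate:         share of IER respondents in the sample
    """
    true_hits = p_sl_given_ier * ier_rate
    false_hits = p_sl_given_valid * (1 - ier_rate)
    return true_hits / (true_hits + false_hits)


# Hypothetical inputs: identical IER behavior, but more valid straightlining
# (as with higher item loadings) in the second case.
print(round(straightlining_precision(0.5, 0.01, 0.05), 2))  # 0.72
print(round(straightlining_precision(0.5, 0.05, 0.05), 2))  # 0.34
```

Raising only the rate of valid straightlining roughly halves the share of straightliners who are actually IER, even though IER respondents behave identically in both cases.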

Some careful non-IER respondents may still provide straightlining responses on some scales, so we cannot assume that all straightliners are IER. Our simulation studies showed that some conditions (e.g., high item loadings and fewer options) produce more non-IER straightlining. Consider the extreme case of a scale whose items all have the maximum loading of 1 and only two options each. The items then measure exactly the same thing, so attentive respondents are likely to tick the same option on every item. Straightlining becomes frequent because the item content overlaps heavily and there are only two options to choose from. Thus, for scales with high item loadings and fewer options, straightlining should not be taken as IER without other converging evidence.

5.4. Recommendations and Limitations

Based on our research findings, we propose three practical recommendations for survey researchers:

Use variable cutoff values for straightlining based on the response option. Different response options warrant different straightlining cutoff values. Extreme options, where non-IER straightlining is more likely to occur, call for more stringent criteria (i.e., longer runs required before flagging).

Do not assume that all straightliners are IER, particularly on scales with high positive item loadings and few response options (e.g., dichotomous scales), where much straightlining may be valid. Conversely, straightlining is more indicative of IER on scales with a broader range of response options (e.g., 7-point scales) that also have either low item loadings or negatively worded items (which load highly in the negative direction).

Improve the detection of IER by considering a broader set of indicators in addition to the straightlining criterion. This multifaceted approach helps distinguish genuine IER from response patterns that only look careless. These indicators may include:

Response Time: Monitor abnormally rapid responding, which may indicate a lack of thoughtful consideration. Bowling et al. (2021) suggested flagging respondents who spend less than two seconds per item on a page as potential IER.

Infrequent Items: These are items that attentive respondents rarely endorse, such as “I have been to every country in the world” (Meade and Craig 2012). Unusually high agreement with these items is indicative of IER.

Instructed Response Items: Insert specific items that require respondents to follow instructions, in order to confirm their attentiveness. For example, an item such as “Please answer ‘Somewhat disagree’ for this question” (Nichols and Edlund 2020) checks respondents’ compliance and attention.

Self-report Engagement: Deploy single-item self-assessments at the end of the survey to gauge respondents’ effort and attention, such as “I put forth ____ effort toward this study” or “In your honest opinion, should we use your data in our analyses for this study?” (Meade and Craig 2012). These self-reports can provide valuable insight into respondents’ engagement with the study.
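
The indicators above can be combined so that straightlining alone never removes a respondent. The sketch below shows one hypothetical way to do this. The `Respondent` fields, the "at least one corroborating indicator" rule, and treating any endorsed infrequency item as a flag are our assumptions. The two-seconds-per-item default follows the Bowling et al. (2021) guideline cited above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Respondent:
    straightlined: bool           # e.g., from an option-specific straightlining rule
    page_seconds: float           # time spent on one survey page
    items_on_page: int
    infrequency_endorsed: int     # number of infrequency items endorsed
    failed_instructed_item: bool  # e.g., did not answer "Somewhat disagree" when instructed
    says_use_my_data: bool        # self-report: "should we use your data?" answered yes


def corroborating_indicators(r: Respondent, seconds_per_item: float = 2.0) -> List[str]:
    """List the non-straightlining IER indicators this respondent triggers."""
    indicators = []
    if r.page_seconds / r.items_on_page < seconds_per_item:
        indicators.append("speeding")
    if r.infrequency_endorsed > 0:
        indicators.append("infrequency")
    if r.failed_instructed_item:
        indicators.append("instructed item")
    if not r.says_use_my_data:
        indicators.append("self-report")
    return indicators


def classify(r: Respondent, min_corroborating: int = 1) -> str:
    """Treat straightlining as IER only when enough other indicators agree."""
    if not r.straightlined:
        return "no straightlining"
    if len(corroborating_indicators(r)) >= min_corroborating:
        return "likely IER"
    return "straightlining only: keep or review"


attentive = Respondent(True, 60.0, 8, 0, False, True)
careless = Respondent(True, 10.0, 8, 1, True, True)
print(classify(attentive), "|", classify(careless))
```

Which indicators to combine, and how many of them must agree, remains a judgment call for each study.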

The findings from this research add valuable insights into straightlining in survey research, but they come with limitations. First, the IER detected in Study 3 may represent only a subset of the total IER in the dataset. Because no method can guarantee the detection of all types of IER, some IER responses inevitably slip through unnoticed. In Study 3, IER was identified from responses to an attention-check item and from response times. Although informative, this approach might not capture every type of IER. It may therefore have missed some instances and underestimated the prevalence and impact of IER-related straightlining in the dataset. Future research should incorporate a more extensive set of IER indicators, which would allow a more comprehensive assessment and a deeper understanding of straightlining behavior in survey responses.

The second limitation of this study is the lack of concrete guidance on establishing cutoff values for straightlining across different response options. Although the research sheds light on how the position of options can influence straightlining, it stops short of providing definitive cutoff points for each option. Our results indicate that straightlining cutoffs should be context-specific, reflecting the range of response options and the nuances of the survey context, which makes universal cutoff values for straightlining infeasible. Moreover, the study did not consider the potential effects of survey design elements such as question order randomization and the inclusion of both positively and negatively worded items. These factors are known to influence both the manifestation of straightlining and its detection (Robie et al. 2022; Vriesema and Gehlbach 2021). Future research on mitigating straightlining should therefore take these survey design aspects into account.

Finally, further investigation is needed into how exclusion based on straightlining affects the representativeness of survey data and data quality, including validity and reliability, across demographic groups. By refining the criteria for detecting straightlining and examining its impact on survey outcomes, researchers can enhance the utility and accuracy of survey data in capturing the nuances of human responses.


Appendix

Table A3.

Target Scales Analyzed from PISA 2015 in Study 3.

Item	Content
ST012Q01TA	How many in your home: Televisions
ST012Q02TA	How many in your home: Cars
ST012Q03TA	How many in your home: Rooms with a bath or shower
ST012Q05NA	How many in your home: <Cell phones> with Internet access (e.g., smartphones)
ST012Q06NA	How many in your home: Computers (desktop computer, portable laptop, or notebook)
ST012Q07NA	How many in your home: <Tablet computers> (e.g., <iPad®>, <BlackBerry® PlayBook™>)
ST012Q08NA	How many in your home: E-book readers (e.g., <Kindle™>, <Kobo>, <Bookeen>)
ST012Q09NA	How many in your home: Musical instruments (e.g., guitar, piano)
ST082Q01NA	To what extent do you disagree or agree about yourself? I prefer working as part of a team to working alone.
ST082Q02NA	To what extent do you disagree or agree about yourself? I am a good listener.
ST082Q03NA	To what extent do you disagree or agree about yourself? I enjoy seeing my classmates be successful.
ST082Q08NA	To what extent do you disagree or agree about yourself? I take into account what others are interested in.
ST082Q09NA	To what extent do you disagree or agree about yourself? I find that teams make better decisions than individuals.
ST082Q12NA	To what extent do you disagree or agree about yourself? I enjoy considering different perspectives.
ST082Q13NA	To what extent do you disagree or agree about yourself? I find that teamwork raises my own efficiency.
ST082Q14NA	To what extent do you disagree or agree about yourself? I enjoy cooperating with peers.
ST098Q01TA	When learning <school science>? Students are given opportunities to explain their ideas.
ST098Q02TA	When learning <school science>? Students spend time in the laboratory doing practical experiments.
ST098Q03NA	When learning <school science>? Students are required to argue about science questions.
ST098Q05TA	When learning <school science>? Students are asked to draw conclusions from an experiment they have conducted.
ST098Q06TA	When learning <school science>? The teacher explains how a <school science> idea can be applied
ST098Q07TA	When learning <school science>? Students are allowed to design their own experiments.
ST098Q08NA	When learning <school science>? There is a class debate about investigations.
ST098Q09TA	When learning <school science>? The teacher clearly explains the relevance of <broad science> concepts to our lives.
ST098Q10NA	When learning <school science>? Students are asked to do an investigation to test ideas.
ST129Q01TA	Recognize the science question that underlies a newspaper report on a health issue.
ST129Q02TA	Explain why earthquakes occur more frequently in some areas than in others.
ST129Q03TA	Describe the role of antibiotics in the treatment of disease.
ST129Q04TA	Identify the science question associated with the disposal of garbage.
ST129Q05TA	Predict how changes to an environment will affect the survival of certain species.
ST129Q06TA	Interpret the scientific information provided on the labeling of food items.
ST129Q07TA	Discuss how new evidence can lead you to change your understanding about the possibility of life on Mars.
ST129Q08TA	Identify the better of two explanations for the formation of acid rain.
ST146Q01TA	How often do you do these things? Watch TV programs about <broad science>
ST146Q02TA	How often do you do these things? Borrow or buy books on <broad science> topics
ST146Q03TA	How often do you do these things? Visit web sites about <broad science> topics
ST146Q04TA	How often do you do these things? Read <broad science> magazines or science articles in newspapers
ST146Q05TA	How often do you do these things? Attend a <science club>
ST146Q06NA	How often do you do these things? Simulate natural phenomena in computer programs/virtual labs
ST146Q07NA	How often do you do these things? Simulate technical processes in computer programs/virtual labs
ST146Q08NA	How often do you do these things? Visit web sites of ecology organizations
ST146Q09NA	How often do you do these things? Follow news via blogs and microblogging

Funding

The author(s) declared that they received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Melissa Dan Wang

Data Availability

Data are available on PISA’s official website.

Received: May 2023

Accepted: June 2024

References

Arias, V. B., L. E. Garrido, Jenaro, Martínez-Molina, and Arias. 2020. “A Little Garbage In, Lots of Garbage Out: Assessing the Impact of Careless Responding in Personality Survey Data.” Behavior Research Methods 52 (6): 2489–505. DOI: https://doi.org/10.3758/s13428-020-01401-8.

Bowling, N. A., J. L. Huang, C. K. Brower, and C. B. Bragg. 2021. “The Quick and the Careless: The Construct Validity of Page Time as a Measure of Insufficient Effort Responding to Surveys.” Organizational Research Methods 26 (2): 323–52. DOI: https://doi.org/10.1177/10944281211056520.

Bucher and Sand. 2022. “Exploring the Feasibility of Recruiting Respondents and Collecting Web Data via Smartphone: A Case Study of Text-To-Web Recruitment for a General Population Survey in Germany.” Journal of Survey Statistics and Methodology 10 (4): 886–97. DOI: https://doi.org/10.1093/jssam/smab006.

Costa, P. T., and R. R. McCrae. 2008. “The Revised NEO Personality Inventory (NEO-PI-R).” In The SAGE Handbook of Personality Theory and Assessment, Vol. 2. Personality Measurement and Testing, edited by G. J. Boyle, Matthews, and D. H. Saklofske, 179–98. Thousand Oaks, CA: Sage. https://psycnet.apa.org/record/2008-14475-009.

Curran, P. G. 2016. “Methods for the Detection of Carelessly Invalid Responses in Survey Data.” Journal of Experimental Social Psychology 66: 4–19. DOI: https://doi.org/10.1016/j.jesp.2015.07.006.

Debell, Wilson, Jackman, and Figueroa. 2021. “Optimal Response Formats for Online Surveys: Branch, Grid, or Single Item?” Journal of Survey Statistics and Methodology 9 (1): 1–24. DOI: https://doi.org/10.1093/jssam/smz039.

DeSimone, J. A., A. J. DeSimone, P. D. Harms, and Wood. 2018. “The Differential Impacts of Two Forms of Insufficient Effort Responding.” Applied Psychology 67 (2): 309–338. DOI: https://doi.org/10.1111/apps.12117.

Dillman, D. A., J. D. Smyth, and L. M. Christian. 2014. Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method. Hoboken, NJ: John Wiley & Sons. https://books.google.com.hk/books?id=fhQNBAAAQBAJ&printsec=frontcover&hl=zh-CN&source=gbs_ge_summary_r&cad=0#v=onepage&q&f=false (accessed May 2023).

Dunn, A. M., E. D. Heggestad, L. R. Shanock, and Theilgard. 2018. “Intra-Individual Response Variability as an Indicator of Insufficient Effort Responding: Comparison to Other Indicators and Relationships with Individual Differences.” Journal of Business and Psychology 33 (1): 105–121. DOI: https://doi.org/10.1007/s10869-016-9479-0.

Hong, J. T. Steedle, and Cheng. 2020. “Methods of Detecting Insufficient Effort Responding: Comparisons and Practical Recommendations.” Educational and Psychological Measurement 80 (2): 312–45. DOI: https://doi.org/10.1177/0013164419865316.

Houtenville, A. J., K. G. Phillips, and Sundar. 2021. “Usefulness of Internet Surveys to Identify People with Disabilities: A Cautionary Tale.” Journal of Survey Statistics and Methodology 9 (2): 285–308. DOI: https://doi.org/10.1093/jssam/smaa045.

Huang, J. L., P. G. Curran, Keeney, E. M. Poposki, and R. P. DeShon. 2012. “Detecting and Deterring Insufficient Effort Responding to Surveys.” Journal of Business and Psychology 27 (1): 99–114. DOI: https://doi.org/10.1007/s10869-011-9231-8.

Huang, J. L., Liu, and N. A. Bowling. 2015. “Insufficient Effort Responding: Examining an Insidious Confound in Survey Data.” Journal of Applied Psychology 100 (3): 828–45. DOI: https://doi.org/10.1037/a0038510.

Johnson, J. A. 2005. “Ascertaining the Validity of Individual Protocols from Web-Based Personality Inventories.” Journal of Research in Personality 39 (1): 103–129. DOI: https://doi.org/10.1016/j.jrp.2004.09.009.

Keusch and Yang. 2018. “Is Satisficing Responsible for Response Order Effects in Rating Scale Questions?” Survey Research Methods 12: 259–70. DOI: https://doi.org/10.18148/srm/2018.v12i3.7263.

Kim, D. S., C. J. McCabe, B. L. Yamasaki, K. A. Louie, and K. M. King. 2018. “Detecting Random Responders with Infrequency Scales Using an Error-balancing Threshold.” Behavior Research Methods 50 (5): 1960–70. DOI: https://doi.org/10.3758/s13428-017-0964-9.

Krosnick, J. A. 1991. “Response Strategies for Coping with the Cognitive Demands of Attitude Measures in Surveys.” Applied Cognitive Psychology 5 (3): 213–36. DOI: https://doi.org/10.1002/acp.2350050305.

Mavletova, M. P. Couper, and Lebedev. 2018. “Grid and Item-by-Item Formats in PC and Mobile Web Surveys.” Social Science Computer Review 36 (6): 647–68. DOI: https://doi.org/10.1177/0894439317735307.

Meade, A. W., and S. B. Craig. 2012. “Identifying Careless Responses in Survey Data.” Psychological Methods 17 (3): 437–55. DOI: https://doi.org/10.1037/a0028085.

Nichols, A. L., and J. E. Edlund. 2020. “Why Don’t We Care More About Carelessness? Understanding the Causes and Consequences of Careless Participants.” International Journal of Social Research Methodology 23 (6): 625–38. DOI: https://doi.org/10.1080/13645579.2020.1719618.

OECD. 2017. PISA 2015 Technical Report. PISA, OECD Publishing. https://www.oecd.org/pisa/data/2015-technical-report/PISA2015_TechRep_Final.pdf (accessed May 2023).

OECD. 2019. PISA 2018 Results (Volume I): What Students Know and Can Do. PISA, OECD Publishing. https://doi.org/10.1787/5f07c754-en (accessed May 2023).

Reuning and Plutzer. 2020. “Valid vs. Invalid Straightlining: The Complex Relationship Between Straightlining and Data Quality.” Survey Research Methods 14: 439–59. DOI: https://doi.org/10.18148/srm/2020.v14i5.7641.

Robie, A. W. Meade, S. D. Risavy, and Rasheed. 2022. “Effects of Response Option Order on Likert-Type Psychometric Properties and Reactions.” Educational and Psychological Measurement 82 (6): 1107–129. DOI: https://doi.org/10.1177/00131644211069406.

Roßmann, Gummer, and Silber. 2018. “Mitigating Satisficing in Cognitively Demanding Grid Questions: Evidence from Two Web-Based Experiments.” Journal of Survey Statistics and Methodology 6 (3): 376–400. DOI: https://doi.org/10.1093/jssam/smx020.

Schulz, M.-A., Schmalbach, Brugger, and Witt. 2012. “Analysing Humanly Generated Random Number Sequences: A Pattern-Based Approach.” PLoS One 7 (7): e41531. DOI: https://doi.org/10.1371/journal.pone.0041531.

Shaw, J. I., J. E. Bergen, C. A. Brown, and M. E. Gallagher. 2000. “Centrality Preferences in Choices Among Similar Options.” The Journal of General Psychology 127 (2): 157–64. DOI: https://doi.org/10.1080/00221300009598575.

Simon, H. A. 1957. Models of Man, Social and Rational. New York, NY: Wiley. https://lib.ugent.be/en/catalog/rug01:002048535 (accessed May 2023).

Steedle, J. T., Hong, and Cheng. 2019. “The Effects of Inattentive Responding on Construct Validity Evidence When Measuring Social–Emotional Learning Competencies.” Educational Measurement: Issues and Practice 38 (2): 101–111. DOI: https://doi.org/10.1111/emip.12256.

Tawa. 2021. “The Response Entropy Index: Comparative Assessment of Performance and Cultural Bias Across Indices of Careless Responding.” Survey Research Methods 15: 299–325. DOI: https://doi.org/10.18148/srm/2021.v15i3.7832.

Vriesema, C. C., and Gehlbach. 2021. “Assessing Survey Satisficing: The Impact of Unmotivated Questionnaire Responding on Data Quality.” Educational Researcher 50 (9): 618–27. DOI: https://doi.org/10.3102/0013189x211040054.

Yan. 2008. “Nondifferentiation.” In Encyclopedia of Survey Research Methods, edited by P. J. Lavrakas, 520–1. Thousand Oaks, CA: Sage. DOI: https://doi.org/10.4135/9781412963947 (accessed May 2023).

Zhang and F. G. Conrad. 2014. “Speeding in Web Surveys: The Tendency to Answer Very Fast and Its Association with Straightlining.” Survey Research Methods 8: 127–35. DOI: https://doi.org/10.18148/srm/2014.v8i2.5453.

In Likert Scale, Is Ticking Options Consecutively at Two Ends Equally Problematic as Ticking in the Middle?
