
Abstract

The speed–accuracy tradeoff (SAT), where increased response speed often leads to decreased accuracy, is well established in experimental psychology. However, its implications for psychological assessments, especially in high-stakes settings, remain less understood. This study presents an experimental approach to investigate the SAT within a high-stakes spatial ability assessment. By manipulating instructions in a within-subjects design to induce speed variations in a large sample (N = 1,305) of applicants for an air traffic controller training program, we demonstrate the feasibility of manipulating working speed. Our findings confirm the presence of the SAT for most participants, suggesting that traditional ability scores may not fully reflect performance in high-stakes assessments. Importantly, we observed individual differences in the SAT, challenging the assumption of uniform SAT functions across test takers. These results highlight the complexity of interpreting high-stakes assessment outcomes and the influence of test conditions on performance dynamics. This study offers a valuable addition to the methodological toolkit for assessing the intraindividual relationship between speed and accuracy in psychological testing (including SAT research), providing a controlled approach while acknowledging the need to address potential confounders. Future research may apply this method across various cognitive domains, populations, and testing contexts to deepen our understanding of the SAT’s broader implications for psychological measurement.

Keywords

speed–accuracy tradeoff, experiment, psychological testing, spatial ability, high-stakes, latent change

Introduction

It is well known from experimental studies using simple perceptual tasks that participants’ response accuracy decreases when response speed increases (i.e., when test takers spend less time on a given task), a phenomenon widely recognized as the speed–accuracy tradeoff (SAT; for an overview, see Heitz, 2014). The SAT has also been reported in studies of psychological assessments (Davison et al., 2012; Goldhammer et al., 2017; Goldhammer & Kroehne, 2014; Mutak et al., 2024; Semmes et al., 2011). Thus, the actual performance in a test needs to be described by both the effective speed (i.e., the speed level a test taker chooses) and the effective ability (i.e., the ability level a test taker exhibits with the chosen level of speed; Dennis & Evans, 1996; Goldhammer & Klein Entink, 2011). Understanding the SAT in psychological assessments can deepen our understanding of individual differences in cognitive performance, thereby improving the validity of test results (e.g., Lohman, 1989; Pohl et al., 2021; van der Linden, 2009).

The SAT plays a vital role in psychological assessments. To illustrate this, consider the SAT of three fictional test takers (Figure 1). In the figure, we depict an SAT with the commonly assumed inverse monotonic intraindividual relationship between speed and accuracy (Goldhammer, 2015; Heitz, 2014; Thurstone, 1937). In the example in Figure 1, test takers differ with regard to their maximum level of ability (capability; that is, ability given infinite time on responding to an item; that is, the ability level to which the curve converges with a decrease in effective speed) as well as the rate of trading ability for an increase in effective speed (i.e., the gradient at a given point of the SAT curve; Goldhammer, 2015; Pohl et al., 2021; Wickelgren, 1977). The curves represent every possible combination of ability and speed that the test taker could adopt. As can be seen in Figure 1, Test Taker 1 (blue) has a higher capability than Test Takers 2 (yellow) and 3 (pink). Also, given any speed level, Test Taker 1 outperforms both other test takers in terms of ability. Comparing Test Takers 2 and 3 with each other, Test Taker 2 outperforms Test Taker 3 for lower effective speed, while for medium to high speed levels, Test Taker 3 outperforms Test Taker 2. Usually in assessments, not the whole SAT curve, but only one point of the SAT curve is assessed for a given person. Test takers also do not choose the same speed level, but differ in the effective speed. In the depicted hypothetical scenario, the effective speed and ability level of each person is marked with a dot. Assessing only effective ability, we would conclude that Test Taker 2 is the most able one and Test Taker 1 has the lowest ability. This comparison, however, might not fully capture the nuances of cognitive performance, as Test Taker 1, despite performing worse than Test Takers 2 and 3 in terms of effective ability, works at a much faster pace. 
Even if Test Takers 2 and 3 chose the same speed level, a single observation would not reveal that the ranking of their abilities depends on the chosen speed level. A more comprehensive understanding of the differences in performance among these three test takers would require knowledge of their individual SAT curves.

Figure 1

Hypothetical SAT Curves for Three Test Takers With Varying Capabilities

In this study, we present an experimental approach for assessing the SAT in psychological assessments of cognitive abilities (i.e., spatial ability).

Approaches for Investigating Intraindividual Speed–Accuracy Relationships in Psychological Assessments

Previous research on the within-person relationship of speed and accuracy relied on one of three strategies: (a) nonstationarity of ability and speed within a test, (b) external measures, and (c) experimental manipulations of speed.

Nonstationarity

Ample evidence suggests that test takers change their working speed during test administration (Dennis & Evans, 1996; Maris & van der Maas, 2012). Some research on intraindividual dependencies of ability and speed makes use of this to infer different values on the SAT curve of a person. Different approaches have been proposed that model how changes in response speed coincide with changes in accuracy (for an extensive overview, see De Boeck & Jeon, 2019). Most prominently, local dependency models have been proposed to account for the possibility that item responses and response times may depend on each other, even after controlling for the latent traits of speed and ability (under the assumption of stationarity). If test takers change their working speed during a test, this might lead to local dependencies between items, especially if the change in speed affects response accuracy. This can be used to infer the intraindividual relation of ability and speed (Bolsinova et al., 2017; De Boeck et al., 2017). Research employing local dependency models has shown that the relationship between response speed and accuracy varies with the difficulty of the cognitive tasks (Bolsinova et al., 2017; De Boeck et al., 2017; Goldhammer et al., 2014). Specifically, higher accuracy is associated with faster responses in easy tasks, while for more difficult tasks, slower responses correspond to higher accuracy.

Domingue et al. (2022) also used a residual-based approach. They reanalyzed 29 data sets from cognitive assessments and looked at how response times can explain residual responses. They reported inconsistent results across the different data sets; positive as well as negative dependencies were found. The authors concluded that next to motivational mechanisms, differences in task designs (e.g., the existence of time limits for the whole test) as well as the cognitive domain of interest (e.g., working memory) impacted the relationship. Interestingly, across all data sets, test takers who performed worse in cognitive tasks also showed higher variation in response speed.

Recently, Mutak et al. (2024) introduced a statistical approach extending the hierarchical model by van der Linden (2007) capturing changes in speed and ability during test administration through a latent growth term. By applying the model to Programme for International Student Assessment (PISA) test data, the authors confirmed the expected negative intraindividual relationship between changes in speed and ability, indicative of the SAT. They also identified instances in which this relationship does not exist and discussed the influence of confounding variables such as concentration or motivation.

External Measures

To estimate intraindividual relationships of ability and speed, Ranger et al. (2021) relied on external measures of proficiency. They used the Elo score of chess players as an external measure of capability and grouped test takers into groups with similar Elo scores. Assuming that persons with a similar capability have a similar SAT curve, they analyzed the relationship of response time and accuracy from a chess test within each subgroup. They argued that approaches relying on nonstationarity may be confounded by other factors such as concentration, persistence, or guessing. For instance, higher levels of concentration could result in both higher effective ability and higher working speed, while persistence could have a beneficial effect on response accuracy and decrease working speed. This would explain the inconsistent results of previous studies. They proposed various forms of the estimated SAT curve, each adjusted for different confounding factors. All of their estimated SAT curves showed a consistent positive relationship between ability and speed within the subgroups of persons with similar capability: Spending more time responding to an item relates to lower response accuracy. The authors concluded that other aspects, such as differences in concentration, may also be present and confound the interpretation of these curves as representations of intraindividual SAT curves.

Experimental Manipulations of Speed

Experimental manipulations of speed have mainly been performed in cognitive psychology on simple perceptual tasks (Heitz, 2014). Forcing participants to adapt their speed of responding to a task is commonly achieved by either introducing time limits on tasks (Goldhammer et al., 2024; Ratcliff & Rouder, 2000; Van Zandt et al., 2000) or by instructions that emphasize or even incentivize speed over accuracy or vice versa (Hale, 1969; Howell & Kreidler, 1964).

As pointed out by De Boeck et al. (2017), only very few studies exist that employ experimental manipulations of working speed in psychological assessments. Some studies have experimentally varied the speed of the test takers by introducing conditions with and without item time limits. These studies were either not directly concerned with investigating the SAT (Semmes et al., 2011) or investigated the SAT in speed tests with simple tasks, for which items are nearly always solved correctly if enough time is available (similar to perceptual tasks from experimental psychology; Goldhammer et al., 2017, 2024; Goldhammer & Kroehne, 2014). Also, while introducing time limits on the item level allows for a precise manipulation of time, it may result in other unwanted response processes such as item omission or guessing (Pohl & von Davier, 2018).

Semmes et al. (2011) conducted a study that applied an experimental manipulation of speed to tests with complex cognitive tasks (i.e., power tests, in which responding takes from several seconds up to minutes). Specifically, they administered items of a numerical reasoning test to participants in two conditions: one with a set time limit for each item (experimenter-paced) and another in which participants could take as long as they needed (self-paced). The primary objective of the study was not to investigate the SAT but to understand how introducing time limits might change the underlying factor structure of the test. Their analysis revealed that when time limits were imposed, performance did not depend on accuracy (numerical ability) alone. Instead, the best-fitting model included a random effect of administration condition that captured a second dimension, a speed factor. In terms of the SAT, the authors concluded that individuals vary in the degree to which ability decreases as they increase their speed. However, the study ignored response times and modeled only response data.

The data set by Semmes et al. (2011) was reanalyzed by Davison et al. (2012), who compared statistical models in which self-paced response times were also incorporated in a latent speed factor. Interestingly, they interpreted the speed dimension as a measure of how well someone maintains their self-paced numerical reasoning under time constraints. High speed scores indicated minimal performance drops under time limits, while low scores signaled a significant decline. On average, speed levels increased in the time-limit condition while, at the same time, ability levels decreased, in line with the SAT. In addition, De Boeck et al. (2017) analyzed the data set to explore the relationship between speed and accuracy, examining speed’s main effects on accuracy and item interactions across conditions with and without time constraints, and between fast and slow responses. This approach, based on findings that time constraints might assess different traits, aimed to identify measurement invariance violations and, further, to correlate item difficulty discrepancies, thereby assessing effects both within and across conditions. Although increased response speed was associated with higher response accuracy within conditions, a decrease in ability was found on average between conditions when test takers were prompted to answer faster. The latter finding aligns with research on simple perceptual tasks and supports the existence of an SAT in psychological assessments.

Previous experimental studies have primarily relied on conceptualizing response speed using observed variables, analyzing median (log) response times at both item and person levels and examining residuals from these averages. This method overlooks the potential insights gained from using latent variables for response times, which could clarify the effects of item properties (e.g., time intensities) and individual test taker characteristics (e.g., working speed). A latent variable approach allows for the adjustment of response times based on each item’s time requirements, crucial for power tests where item difficulty and time demands vary significantly (Marianti et al., 2014; Mutak et al., 2024; van der Linden, 2006, 2009; van der Linden & Guo, 2008).

Aim of the Study

The SAT is a well-studied phenomenon in simple perceptual decision tasks (e.g., lexical discrimination), which are designed so that nearly all responses would be correct given enough time. However, the nature of the SAT in more complex aptitude and competency tests, which demand greater cognitive effort and advanced problem-solving skills, is not yet well understood. Most previous studies investigating the intraindividual relationship of ability and speed in psychological assessments rely on nonstationarity. However, these approaches can only investigate the SAT within the scope of speed changes within a test. These speed changes are often confounded by other factors, such as concentration or motivation. Studies on the SAT relying on external measures operate on the strong assumption that individuals with similar capabilities have the same SAT curve. While experimental studies, which allow the SAT to be better isolated from other factors affecting the within-person relationship between speed and accuracy, are often used in cognitive psychology, they are hardly used in psychological assessments of complex tasks. We aim to contribute to the understanding of the SAT in psychological assessments by combining an experimental approach with a psychometric model.

Almost all existing research on the SAT in psychological assessment has focused on low-stakes assessments, and little is known about the SAT in high-stakes assessments. However, the response process may differ in high-stakes assessments, which may in turn affect the SAT or its investigation. In our study, we therefore focus on investigating the SAT in a high-stakes assessment setting.

In this study, we investigated the following research questions:

Research Question 1: What is the direction and magnitude of the intraindividual relationship between response speed and accuracy in psychological testing?

Research Question 2: Are there individual differences in the SAT among test takers?

Research Question 3: Do these findings generalize across different tests?

Method

We leveraged unpublished data from a high-stakes assessment designed for selecting trainee air traffic controllers (ATCOs) in Austria, measuring spatial abilities, which are of paramount interest in the air traffic control profession (e.g., Rathje et al., 2004; Soldatov et al., 2018).¹

Participants

Between August 2011 and January 2015, a total of 1,305 participants took part in an annual process of selecting trainee ATCOs in Austria. Of the participants, 29.27% were female and 70.73% were male; they were aged 17 to 55 years (M = 21.89, SD = 3.74). The age distribution is highly skewed, with 97.1% of the persons being between age 17 and 30, 2.6% between age 31 and 40, and 0.3% above 40. The highest level of education for 88.51% of individuals was high school completion, whereas 9.27% had already obtained a university degree.

Measures

Spatial ability was measured by two different computerized tests, the Endless Loops Test (ELT; Gittler & Arendasy, 2003) and the Three-Dimensional Cubes Test (3DC; Gittler, 1990). Each test consists of 20 Rasch-scaled items. For a short description, psychometric properties and item examples see the Supplementary Materials.

Design

We experimentally manipulated speed in a within-subject design by two instructions. In the first condition, the self-paced condition (SPC), the test takers were instructed to solve the items “as accurate as possible.” In the second condition, the time-pressured condition (TPC), the participants were instructed to solve the items “as fast and as accurate as possible.” No time limits were imposed in either condition.

First, the ELT test was administered to all participants, first under the SPC and then under the TPC. Then, there was a short break in which participants answered cognitively undemanding questions of a job-related interest questionnaire. After the questionnaire, the 3DC test was administered to the participants, first under the SPC and then under the TPC. Note that due to the high-stakes nature of the selection process for trainee ATCOs, the order of the tests and conditions was fixed rather than counterbalanced. This was to ensure that every candidate experienced the same sequence, maintaining fairness and consistency across the selection process.

For assigning items to each of the conditions, each cognitive ability test was split into two parts: Eight items were assigned to the SPC and 12 items to the TPC. Furthermore, the items within the two sets (SPC and TPC) were administered in a fixed order, with difficulty levels mixed throughout the sequence rather than arranged from easiest to hardest. This rationale is in line with the original test construction principles of the ELT and 3DC (Gittler, 1990; Gittler & Arendasy, 2003).

Analyses

Data Preparation

To allow for comparisons of latent variables between the two conditions, we obtained item parameters of all items from other studies, in which the two tests were administered under SPCs in a low-stakes setting (e.g., Arrer, 1992; Gittler, 2000). In the analyses of the data of this study, we did not estimate item parameters, but fixed them to the values of the pre-studies.

From the 20 administered items of the ELT, 17 items (7 in SPC, 10 in TPC) were used in the analyses, as (a) the first item in each condition needed to be treated as a hidden warmup item to ensure Rasch homogeneity as stated by the test author (Gittler, 1990, 2000) and (b) the final item of the TPC was excluded from scoring as its response time data were not available in the low-stakes assessment studies that we reanalyzed for item calibration. From the 20 administered items of the 3DC, the two hidden warmup items were excluded from scoring, resulting in 18 items (7 in SPC, 11 in TPC) that were used in the analyses.

To control for random guessing and aberrant responding, we excluded fast responses based on a visual inspection of multimodality in distributions of log-transformed response times on the item level (Kroehne et al., 2019; Wise, 2017; Wise & DeMars, 2006). Overall, 490 responses and response times out of 23,490 (2.09%) were excluded by this procedure. In the analyses, we treated these values as missing values.
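The exclusion step can be sketched as follows. This is a minimal illustration in Python, not the code used in the study; the function name, the item labels, and the cut-offs are hypothetical, because the study derived its cut-offs from a visual inspection of each item's response time distribution.

```python
from typing import Dict, List, Optional, Tuple

# One record per response: (item label, scored response, response time in seconds)
Record = Tuple[str, Optional[int], Optional[float]]


def mask_fast_responses(records: List[Record],
                        thresholds: Dict[str, float]) -> List[Record]:
    """Set response and response time to None when the time is below the item's cut-off."""
    cleaned = []
    for item, response, seconds in records:
        cutoff = thresholds.get(item)
        if seconds is not None and cutoff is not None and seconds < cutoff:
            cleaned.append((item, None, None))  # treated as missing in the analyses
        else:
            cleaned.append((item, response, seconds))
    return cleaned


# Hypothetical cut-offs (in seconds) and responses of one test taker.
thresholds = {"item_02": 3.0, "item_03": 4.5}
records = [("item_02", 1, 1.2), ("item_03", 0, 20.4)]
cleaned = mask_fast_responses(records, thresholds)
excluded_share = sum(r[1] is None for r in cleaned) / len(cleaned)
print(cleaned, excluded_share)
```

Masking both the response and the response time keeps the two data sources aligned, which matters because the model uses them jointly.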

Model Specification

In our analyses, we made use of both responses and response times for each item. For analyzing the data, we combined the hierarchical modeling framework by van der Linden (2007) for speed and ability with a between-item multidimensional approach (Adams et al., 1997). For each condition, a latent speed and ability variable is modeled representing the test taker’s effective speed and ability levels across items, respectively. The model that was applied separately to each test is depicted in Figure 2.²

Figure 2

Between-Item Multidimensional Extension of the Hierarchical Modeling Framework by van der Linden (2007)

The model consists of two levels: The first-level model describes the relationship of the model parameters with the observed data, while the second level describes the joint distribution of the model parameters.

First-Level Model

Regarding the modeling of item responses, we rely on the Rasch model (Rasch, 1960), which has been shown to well depict the ELT and the 3DC items (Gittler, 1990; Gittler & Arendasy, 2003). We specify a separate ability dimension for each condition:

$$\operatorname{logit}\bigl(p(x_{ji} = 1)\bigr) = \bigl((1 - q_{i})\,\theta_{j1} + q_{i}\,\theta_{j2}\bigr) - \beta_{i},$$

where $\operatorname{logit}(x) = \log(x / (1 - x))$, $\theta_{j1}$ and $\theta_{j2}$ represent the latent ability of person $j$ in the SPC and in the TPC, respectively, and $\beta_{i}$ corresponds to the difficulty of item $i$. We introduce an indicator variable $q_{i}$ that is set to 0 if item $i$ is part of the SPC and to 1 if item $i$ is part of the TPC.
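As a small numerical illustration of the response model, the following Python sketch evaluates the success probability for one person and one item. The function name and all parameter values are hypothetical and chosen only to show how the indicator $q_{i}$ selects the condition-specific ability.

```python
import math


def rasch_probability(theta_spc: float, theta_tpc: float, beta: float, q: int) -> float:
    """P(x_ji = 1) under the Rasch model with a condition-specific ability."""
    # q = 0 selects the self-paced ability, q = 1 the time-pressured ability.
    theta = (1 - q) * theta_spc + q * theta_tpc
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


# Hypothetical person: ability 0.8 in the SPC and 0.3 in the TPC; item difficulty 0.3.
p_spc = rasch_probability(0.8, 0.3, beta=0.3, q=0)
p_tpc = rasch_probability(0.8, 0.3, beta=0.3, q=1)
print(round(p_spc, 3), round(p_tpc, 3))
```

Because the item difficulty is the same in both calls, any difference between the two probabilities comes only from the difference between the two abilities.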

Regarding the modeling of response times, we assume that log-transformed response times follow a normal distribution, as suggested by van der Linden (2006). As for ability, we assume a separate speed dimension for each condition (Zhan et al., 2020, 2021):

$$\ln(t_{ji}) \sim N\bigl(b_{i} - ((1 - q_{i})\,\tau_{j1} + q_{i}\,\tau_{j2}),\; \sigma_{i}^{2}\bigr),$$

where $\ln(t_{ji})$ denotes the observed log-transformed response time of person $j$ on item $i$, $b_{i}$ represents the time intensity of item $i$, and $\tau_{j1}$ and $\tau_{j2}$ correspond to the latent speed of person $j$ in the respective condition. $\sigma_{i}^{2}$ is an item-specific variance parameter.
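The response time model can be illustrated in the same way. The Python sketch below computes the expected log response time and the normal log-density of an observed log response time; the function names and the numerical values are hypothetical. Note that the density refers to $\ln(t_{ji})$, not to the raw response time.

```python
import math


def log_rt_mean(b: float, tau_spc: float, tau_tpc: float, q: int) -> float:
    """Expected log response time: time intensity minus the condition-specific speed."""
    tau = (1 - q) * tau_spc + q * tau_tpc
    return b - tau


def log_rt_logpdf(log_t: float, b: float, tau_spc: float, tau_tpc: float,
                  q: int, sigma: float) -> float:
    """Normal log-density of an observed log response time (sigma is the item-level SD)."""
    z = (log_t - log_rt_mean(b, tau_spc, tau_tpc, q)) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


# Hypothetical item with time intensity 3.0 and a person who is faster in the TPC.
mean_spc = log_rt_mean(3.0, tau_spc=0.1, tau_tpc=0.5, q=0)
mean_tpc = log_rt_mean(3.0, tau_spc=0.1, tau_tpc=0.5, q=1)
print(math.exp(mean_spc), math.exp(mean_tpc))  # median response times in seconds
```

A higher latent speed lowers the expected log response time, so the same item is answered faster in the condition with the larger speed parameter.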

To evaluate the impact of a change in speed on effective ability for each person, latent change scores in both ability and speed across the two conditions were derived within the model as $\theta_{\Delta} = \theta_{2} - \theta_{1}$ and $\tau_{\Delta} = \tau_{2} - \tau_{1}$. Change scores are used to investigate both trends in within-person changes in ability and speed levels across the two conditions and the intraindividual relationship of effective speed and effective ability. To ensure comparability of person abilities and person speed across the two conditions, we fixed the item parameters $\beta_{i}$ and $b_{i}$ to values obtained from the prestudies. In accordance with van der Linden (2007), we assume (a) stationarity of person variables within conditions and (b) conditional independence of responses and response times.

Second-Level Model

On the second level, we specify the joint distribution of the model parameters. Person parameters are assumed to follow a multivariate normal distribution with mean vector:

$$\mu_{P} = (\mu_{\theta_{1}}, \mu_{\theta_{2}}, \mu_{\tau_{1}}, \mu_{\tau_{2}}),$$

and variance-covariance matrix

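$$\Sigma_{P} = \begin{pmatrix}
\sigma_{\theta_{1}}^{2} & \sigma_{\theta_{1}\theta_{2}} & \sigma_{\theta_{1}\tau_{1}} & \sigma_{\theta_{1}\tau_{2}} \\
\sigma_{\theta_{1}\theta_{2}} & \sigma_{\theta_{2}}^{2} & \sigma_{\theta_{2}\tau_{1}} & \sigma_{\theta_{2}\tau_{2}} \\
\sigma_{\theta_{1}\tau_{1}} & \sigma_{\theta_{2}\tau_{1}} & \sigma_{\tau_{1}}^{2} & \sigma_{\tau_{1}\tau_{2}} \\
\sigma_{\theta_{1}\tau_{2}} & \sigma_{\theta_{2}\tau_{2}} & \sigma_{\tau_{1}\tau_{2}} & \sigma_{\tau_{2}}^{2}
\end{pmatrix}.$$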

The sampling distribution of the model can be written as

$$f(x, t; \theta, \tau) = \prod_{j=1}^{J} \prod_{i=1}^{I} \underbrace{f(x_{ji}; \theta_{j1}, \theta_{j2}, \beta_{i})\, f(t_{ji}; \tau_{j1}, \tau_{j2}, b_{i}, \sigma_{i})}_{\text{first-level models}}\; \underbrace{f(\theta_{j1}, \theta_{j2}, \tau_{j1}, \tau_{j2}; \mu_{P}, \Sigma_{P})}_{\text{second-level model}},$$

where $f (\cdot)$ represents probability density functions. Note that in contrast to van der Linden (2007), we do not incorporate a second-level model for the joint distribution of item parameters as their values are treated as known (as in Klein Entink et al., 2009). As the item parameters are fixed, no further parameter restrictions need to be imposed for model identification.

Parameter Estimation

Data preparation, data analysis, and model evaluation were carried out using the programming language Julia version 1.9.3 (Bezanson et al., 2017). The model was fitted separately to each of the cognitive tests. Bayesian parameter estimation for the proposed model was conducted using Stan version 2.31 (Carpenter et al., 2017) and its command-line interface CmdStan. To sample from the posterior, Stan utilizes an adaptive form of the Hamiltonian Monte Carlo (HMC; Neal, 2011) algorithm, the No-U-Turn Sampler (Hoffman & Gelman, 2014). Priors used in the analysis are given in the Supplementary Materials. We ran four Markov chain Monte Carlo (MCMC) chains, each consisting of 12,000 iterations, of which the first 8,000 were discarded as burn-in. Model parameters were summarized using expected a posteriori (EAP) estimates, accompanied by 90% highest posterior density (HPD) intervals. The Stan code used for model specification can be found in the Supplementary Materials.

The latent change scores $\theta_{\Delta}$ and $\tau_{\Delta}$, along with their 90% HPD intervals, were estimated directly from the draws of the MCMC chains: for every draw, we subtracted the SPC ability and speed parameters from the corresponding TPC parameters. These calculations, as well as further correlational analyses between latent change scores and the latent variables of the two conditions, SPC and TPC, were performed using Julia.
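The draw-wise computation of a change score can be sketched as follows. This is a hedged Python illustration rather than the Julia code used in the study; the function names are hypothetical, the draws are simulated, and the HPD interval is computed as the shortest window of sorted draws, which assumes a unimodal posterior.

```python
import math
import random


def hpd_interval(draws, mass=0.90):
    """Shortest interval that contains `mass` of the draws (unimodal posterior assumed)."""
    ordered = sorted(draws)
    n = len(ordered)
    k = max(1, math.ceil(mass * n - 1e-9))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[start], ordered[start + k - 1]


def change_score_summary(draws_spc, draws_tpc, mass=0.90):
    """EAP and HPD interval of the change score, computed draw by draw (TPC minus SPC)."""
    delta = [tpc - spc for spc, tpc in zip(draws_spc, draws_tpc)]
    eap = sum(delta) / len(delta)
    return eap, hpd_interval(delta, mass)


# Simulated posterior draws of one person's speed in the two conditions (hypothetical).
rng = random.Random(1)
tau_spc = [rng.gauss(0.1, 0.2) for _ in range(4000)]
tau_tpc = [rng.gauss(0.5, 0.2) for _ in range(4000)]
eap, (lower, upper) = change_score_summary(tau_spc, tau_tpc)
print(round(eap, 2), round(lower, 2), round(upper, 2))
```

Subtracting within each draw, rather than subtracting two point estimates, carries the posterior uncertainty of both parameters into the interval of the change score.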

Convergence and Model Fit

To ensure the robustness of our model parameters, we assessed convergence using the $\hat{R}$ statistic, with values close to 1 indicating convergence (Gelman et al., 2013; Gelman & Rubin, 1992). Following recommendations by Vehtari et al. (2021), we adopted a more stringent threshold of 1.01 to confirm convergence. In addition, we monitored the effective sample size (ESS) for each parameter, considering ESS values above 400 as indicative of reliable statistical estimates (Vehtari et al., 2021; Zitzmann & Hecht, 2019).
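To make the convergence criterion concrete, the following Python sketch computes a plain split-$\hat{R}$ for a single parameter. It is a simplified illustration, not the rank-normalized version recommended by Vehtari et al. (2021) and not the routine used by Stan; the function name and the simulated chains are hypothetical.

```python
import random
from statistics import mean, variance


def split_rhat(chains):
    """Plain split-R-hat for one parameter; `chains` holds equally long lists of draws."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = len(halves[0])
    within = mean(variance(h) for h in halves)            # mean within-chain variance
    between = n * variance([mean(h) for h in halves])     # between-chain variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


# Four simulated chains that mix well, and a copy in which one chain is shifted.
rng = random.Random(0)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = mixed[:3] + [[x + 3.0 for x in mixed[3]]]
print(split_rhat(mixed) < 1.01, split_rhat(stuck) < 1.01)
```

Splitting each chain in half lets the statistic also detect chains that drift over time, not only chains that disagree with each other.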

For model fit evaluation, we used both graphical and numerical posterior predictive model checks (PPMCs; Gelman et al., 2013), flagging posterior predictive p-values (PPP-values) outside the .025 to .975 range as indicative of potential model-data misfit (Sinharay et al., 2006). We also applied Bayesian residual analysis for the lognormal response time model, focusing on the uniform distribution of PPP-values and the alignment of item curves with the identity line as indicators of fit (van der Linden & Guo, 2008). Using Yen’s Q3 statistic, we assessed whether the assumptions of local independence and unidimensionality hold in the data. Detailed descriptions of these methods and additional analyses are available in the Supplementary Materials.
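The flagging rule for PPP-values can be sketched in a few lines of Python. The sketch assumes that, for every posterior draw, a discrepancy measure has already been computed once for the observed data and once for data replicated from the model; the function names and the numbers are hypothetical.

```python
def ppp_value(observed, replicated):
    """Share of posterior draws whose replicated discrepancy is at least the observed one."""
    exceed = sum(1 for obs, rep in zip(observed, replicated) if rep >= obs)
    return exceed / len(observed)


def flags_misfit(ppp, lower=0.025, upper=0.975):
    """Flag PPP-values outside the .025 to .975 range."""
    return ppp < lower or ppp > upper


# Hypothetical discrepancies for four posterior draws.
observed = [1.0, 1.0, 1.0, 1.0]
replicated = [0.4, 2.1, 1.7, 0.9]
ppp = ppp_value(observed, replicated)
print(ppp, flags_misfit(ppp))
```

A PPP-value near .5 means that the observed discrepancy is typical of what the model predicts, whereas values near 0 or 1 indicate that the model systematically under- or overpredicts it.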

Manipulation Check

We evaluated whether the instructions had the desired impact on test takers’ speed and ability, that is, whether speed increased and ability decreased from SPC to TPC, at both the group and the individual level. On the group level, we evaluated whether the average difference in effective speed between TPC and SPC $(\tau_{\Delta} = \tau_{2} - \tau_{1})$ was greater than zero. To do so, we calculated the mean difference in speed as Cohen’s $d_{z}$ for paired samples (Cohen, 1988). We evaluated its significance by examining the region of practical equivalence (ROPE) using the full ROPE procedure (Kruschke, 2010, 2014; Kruschke et al., 2012). As suggested by Kruschke and Liddell (2018), this region was set to the range of values with an absolute value below $0.1$ for every latent speed change parameter.³ To investigate the impact of instruction on effective ability, we examined the mean latent changes in ability. This approach is analogous to the one used for mean differences in latent speed, but with the hypothesis that $\theta_{\Delta} = \theta_{2} - \theta_{1}$ would be less than zero.
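The group-level check can be illustrated with a short Python sketch. It assumes that Cohen's $d_{z}$ is computed within each posterior draw from the person-level change scores and that the share of draws inside the ROPE is then reported; this per-draw procedure, the function names, and the numbers are assumptions made for illustration.

```python
from statistics import mean, stdev


def cohens_dz(changes):
    """Cohen's d_z for paired samples: mean change divided by the SD of the changes."""
    return mean(changes) / stdev(changes)


def share_in_rope(effect_draws, half_width=0.1):
    """Share of posterior draws of the effect size that fall inside (-half_width, half_width)."""
    inside = sum(1 for d in effect_draws if abs(d) < half_width)
    return inside / len(effect_draws)


# Hypothetical example: three posterior draws, each with the speed changes of four persons.
draws = [
    [0.3, 0.5, 0.4, 0.2],
    [0.4, 0.6, 0.3, 0.3],
    [0.2, 0.5, 0.5, 0.4],
]
dz_draws = [cohens_dz(d) for d in draws]
print(share_in_rope(dz_draws))
```

If no draw falls inside the ROPE, the effect is treated as practically different from zero; if all draws fall inside, it is treated as practically equivalent to zero.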

To investigate the impact of the instruction on speed at the individual level, we calculated the proportion of participants who exhibited a significant increase in effective speed from SPC to TPC. An individual change score was considered significant if the EAP estimate of the latent speed change, $\hat{\tau}_{\Delta}$, exceeded half the standard deviation of the difference in effective speed between TPC and SPC, that is, $0.5 \cdot \sigma_{\tau_{\Delta}}$. This cut-off for the individual change score, reflecting the lower limit of a moderate effect size according to Cohen (1988), aligns with the concept of minimally important difference (MID).⁴ Thus, it serves as a rough benchmark for meaningful change, ensuring that identified changes are both statistically significant and practically important. The same measure was also applied to individual changes in effective ability.
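The individual-level criterion amounts to a simple threshold rule, sketched below in Python. As an assumption made for illustration, the standard deviation of the change is estimated here from the EAP change scores themselves; the function name and the data are hypothetical.

```python
from statistics import stdev


def share_meaningful_increase(eap_changes, fraction=0.5):
    """Share of persons whose EAP change exceeds `fraction` times the SD of the changes."""
    cutoff = fraction * stdev(eap_changes)
    return sum(1 for change in eap_changes if change > cutoff) / len(eap_changes)


# Hypothetical EAP estimates of the latent speed change for four test takers.
eap_changes = [0.0, 0.1, 0.9, 1.0]
print(share_meaningful_increase(eap_changes))
```

For ability, where a decrease is expected, the same rule would be applied to the negative of the change scores.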

Investigating the SAT

To depict the different intraindividual relations of ability and speed, we report the number of test takers within each of the four possible patterns of change in ability and speed across conditions: (i) increasing speed and decreasing ability; (ii) increasing both speed and ability; (iii) decreasing speed and increasing ability; and (iv) decreasing both speed and ability. Note that Patterns (i) and (ii) align with the manipulation, while Patterns (iii) and (iv) indicate that the manipulation did not work. Patterns (i) and (iii) are consistent with the SAT, while Patterns (ii) and (iv) are not. An increase or decrease in person parameters was determined based on the individual’s scores on the latent variables (not considering significance).⁵ For each pattern, we report the percentage of test takers as well as summary statistics of the estimated EAPs of latent changes (mean, standard deviation).
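The four patterns can be assigned with a simple sign rule, as in the following Python sketch. The function name and the data are hypothetical, and the handling of changes that are exactly zero is an arbitrary convention chosen for illustration.

```python
from collections import Counter


def change_pattern(delta_speed: float, delta_ability: float) -> str:
    """Assign one of the four patterns from the signs of the two change scores."""
    if delta_speed > 0:
        # Speed increased: pattern (i) if ability decreased, otherwise pattern (ii).
        return "i" if delta_ability < 0 else "ii"
    # Speed did not increase: pattern (iii) if ability increased, otherwise pattern (iv).
    return "iii" if delta_ability > 0 else "iv"


# Hypothetical (speed change, ability change) pairs for five test takers.
changes = [(0.4, -0.3), (0.5, 0.2), (-0.1, 0.3), (-0.2, -0.1), (0.6, -0.5)]
counts = Counter(change_pattern(speed, ability) for speed, ability in changes)
shares = {pattern: counts[pattern] / len(changes) for pattern in ("i", "ii", "iii", "iv")}
print(shares)
```

Patterns (i) and (iii) have change scores of opposite sign and are therefore the ones consistent with the SAT.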

We further investigated the SAT by describing the simultaneous change in ability and speed across conditions with two-dimensional change vectors, with ability depicted on the y-axis and speed depicted on the x-axis. The vectors are constructed by connecting the ability and speed scores of both conditions for each test taker. To facilitate a more meaningful comparison between changes in ability and speed, we scaled the change in ability and the change in speed to unit variance. We focused on the magnitude $r$ and the angle $\angle$ of the vectors. We calculated the magnitude of each change vector using the Euclidean norm and the angle of each change vector in degrees with

$$\angle = \operatorname{atan2}(\Delta\theta, \Delta\tau) \cdot \frac{180}{\pi},$$

where $\operatorname{atan2}$ is the 2-argument arc-tangent function, and $\Delta\theta$ and $\Delta\tau$ are the individual change scores in ability and speed across the conditions, respectively. The change vectors show the directional shifts in performance, with their magnitude indicating the overall extent of change: a larger magnitude reflects a larger change in ability and/or speed. The angle of these vectors, measured in degrees, conveys the intraindividual trade-off between speed and ability. Specifically, a vector pointing directly upward ($90^{\circ}$) would suggest an increase in ability with no change in speed, while a vector pointing rightward ($0^{\circ}$) indicates an increase in speed without a loss in ability. Individuals in Group (i) will have an angle ranging from $0^{\circ}$ to $-90^{\circ}$, those in Group (ii) from $0^{\circ}$ to $90^{\circ}$, in Group (iii) from $90^{\circ}$ to $180^{\circ}$, and in Group (iv) from $-90^{\circ}$ to $-180^{\circ}$.⁶ An angle of $-90^{\circ}$ or $90^{\circ}$ signifies a strong trade-off between ability and speed, whereas an angle of $0^{\circ}$ or $180^{\circ}$ suggests no trade-off between these factors.
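A change vector can be computed with the Python standard library, as sketched below. The function name, the scaling arguments, and the example values are hypothetical; `math.degrees` converts radians to degrees by multiplying by $180 / \pi$.

```python
import math


def change_vector(delta_speed: float, delta_ability: float,
                  sd_speed: float = 1.0, sd_ability: float = 1.0):
    """Magnitude and angle (in degrees) of the change vector after scaling to unit variance."""
    x = delta_speed / sd_speed      # speed change on the x-axis
    y = delta_ability / sd_ability  # ability change on the y-axis
    magnitude = math.hypot(x, y)    # Euclidean norm
    angle = math.degrees(math.atan2(y, x))
    return magnitude, angle


# Hypothetical test taker who speeds up by one unit and loses one unit of ability.
magnitude, angle = change_vector(1.0, -1.0)
print(round(magnitude, 3), round(angle, 1))
```

With ability as the first argument of `atan2` and speed as the second, the angle is measured from the positive speed axis, so this test taker falls in Group (i).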

On the group level, we evaluated the correlation between the change in ability and the change in speed. A negative correlation would support the existence of the SAT.
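As a descriptive illustration of this group-level check, the sketch below computes a Pearson correlation between change scores. It assumes point estimates of the individual changes are available as plain lists; the correlation reported in the study is a model parameter with a posterior distribution, so this is only an approximation of the idea, and the function name `pearson` is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences of change scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only: larger speed gains paired with larger ability losses.
delta_speed = [0.2, 0.4, 0.6]
delta_ability = [-0.1, -0.3, -0.5]
rho = pearson(delta_speed, delta_ability)  # negative, as expected under the SAT
```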

Results

In the following, we will present the results of the analyses, first for the ELT and then for the 3DC.

Endless Loops Test

Figure 3 shows the frequency distributions of log-transformed response times for ELT items in both conditions. We observed no signs of multimodality regarding very fast responses, and as such, we did not exclude any fast responses.

Figure 3

Frequency Distributions of Log-Transformed Response Times on ELT Items (Split by Condition)

Convergence and Model Fit

The model estimation successfully converged ( $\hat{R} < 1.01$ for all parameters), with all parameters estimated with adequate precision $(ESS >> 400)$ . PPMCs indicated a satisfactory model fit, with the Rasch model accurately reflecting the distribution of observed sum scores under both SPC and TPC. PPP values for the chi-square discrepancy measure were within acceptable ranges, suggesting good model fit. Yen’s Q3 statistic identified minor potential violations of local independence in a few item combinations, but these did not significantly impact the overall unidimensionality of the scales. The lognormal response time model also showed satisfactory fit, despite slight overprediction of faster responses. One item from the TPC was systematically faster and was excluded from further analysis due to its deviation and impact on local independence. For detailed results, see the Supplementary Materials.

Manipulation Check

On the group level, as expected, mean latent speed increased from $μ_{τ_{1}} = 0.07$ $[0.06, 0.09]$ in the SPC to $μ_{τ_{2}} = 0.44$ $[0.42, 0.45]$ in the TPC. On average, latent speed was $d_{z} (τ) = 0.69$ $[0.65, 0.73]$ standard deviations larger in the TPC than in the SPC ( $0 %$ of the posterior is below $| 0.1 |$ ). This corresponds to a medium effect (Cohen, 1988).
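For readers unfamiliar with $d_{z}$, the following sketch shows the conventional within-subjects effect size, computed as the mean of the paired differences divided by their standard deviation. This is an assumption about the general form of the statistic; in the study it is derived from the posterior of the latent change model, not from raw scores, and the function name `d_z` is hypothetical.

```python
from statistics import mean, stdev


def d_z(scores_spc, scores_tpc):
    """Standardized mean change for paired scores (TPC minus SPC)."""
    diffs = [tpc - spc for spc, tpc in zip(scores_spc, scores_tpc)]
    return mean(diffs) / stdev(diffs)  # sample SD of the differences


# Illustrative values only, not data from the study.
effect = d_z([0.0, 0.1, 0.2], [0.3, 0.5, 0.7])
```

A positive value indicates that scores are higher in the TPC than in the SPC.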

An increase in speed occurred for almost all test takers. Most test takers were significantly influenced by the TPC in terms of latent speed, with $76.17 %$ of test takers exhibiting a change in effective speed greater than one-half standard deviations. Specifically, test takers who were slow in the SPC showed large increases in latent speed from SPC to TPC ( $ρ (τ_{1}, τ_{Δ}) = - . 51$ $[- . 54, - . 48]$ ). Only one test taker did not show an increase in latent speed when transitioning from SPC to TPC. For this individual, the working speed remained virtually unchanged across both conditions, with a negligible difference in latent speed. The results support the interpretation that the manipulation indeed resulted in an increase in speed for almost all test takers.

Investigating the Speed–Ability Tradeoff

The posterior means of person parameter means, variances, and correlations are presented in Table 1. Notably, there are large correlations between ability levels, $ρ (θ_{1}, θ_{2}) = . 78$ $[. 63, . 97]$ , and between speed levels, $ρ (τ_{1}, τ_{2}) = . 52$ $[. 49, . 55]$ , across both instructional conditions. This suggests that (a) test takers who are more able in the SPC also perform better in the TPC, and (b) test takers who respond faster in the SPC also respond faster in the TPC.

Table 1

ELT: Person Parameter Means, Variances, and Correlations

	$μ$	$σ^{2}$	$θ_{1}$	$θ_{2}$	$τ_{1}$
$θ_{1}$	0.74	0.70
	[0.68, 0.80]	[0.61, 0.80]
$θ_{2}$	0.42	0.73	.78
	[0.37, 0.47]	[0.65, 0.82]	[.63, .97]
$τ_{1}$	0.07	0.29	–.02	–.02
	[0.06, 0.09]	[0.28, 0.30]	[–.07, .04]	[–.07, .03]
$τ_{2}$	0.44	0.28	–.05	–.10	.52
	[0.42, 0.45]	[0.26, 0.29]	[−.10, −.01]	[−.15, −.04]	[.49, .55]

Note. $90 %$ highest density intervals are given in square brackets. $θ$ = ability; $τ$ = speed; Index 1 refers to self-paced condition; Index 2 refers to time-pressured condition.

As test takers transitioned from SPC to TPC, as expected, their latent ability tended to decrease. Mean latent ability estimates decreased from SPC to TPC with $d_{z} (θ) = 0.67$ $[0.39, 1.37]$ standard deviations. This corresponds to a significant medium effect (Cohen, 1988) with $0 %$ of the posterior being lower than $| 0.1 |$ .

On the group level, there is a nonsignificant weak relation of intraindividual change in speed and change in ability ( $ρ (θ_{Δ}, τ_{Δ}) = - . 10$ $[- . 50, . 15]$ ), indicating large interindividual differences in the intraindividual relation of speed and ability.

On the individual level, all test takers exhibited a lower ability in the TPC than in the SPC. These results suggest that, for every test taker but one, speed is higher in the TPC (see “Manipulation Check” section) and, at the same time, ability is lower. This aligns well with what we would expect in the presence of an SAT. Notably, $73.79 %$ of test takers showed a decrease in ability exceeding one-half standard deviations, further substantiating the impact of time pressure on performance and underscoring the influence of the SAT across individuals.

Figure 4 shows simultaneous changes in latent ability and latent speed for each test taker. In Figure 4A, the individual speed and ability levels under each of the two conditions are depicted by two-dimensional change vectors. Figure 4B illustrates the magnitude $r$ of change (that is, the length of the change vector) as well as the angle ∠ of the change (that is, the slope of the change vector). The greater the magnitude, the greater the change in speed, ability, or both. For a magnitude of 0, there is no change in speed or ability. For an angle of 0, there is no change in ability at all and for an angle of −90°, there is no change in speed. As was discussed above, all test takers showed a change in ability and speed, and thus had a magnitude larger than zero. For all but one test taker, an increase in speed is related to a decrease in ability (the angle is within $- 90^{°} < ∠ < 0^{°}$ ). The amount of decrease in ability for an increase in speed differs across test takers $(SD (∠) = 10 . 77^{°})$ . This finding supports the idea that the SAT curves are heterogeneous across individuals. There is a relationship between magnitude and angle $(Cor (r, ∠) = . 58)$ . Test takers with a larger magnitude (i.e., change in person parameters) tend to show a lower trade-off. This may indicate that individuals chose to increase their speed in proportion to the decrease in ability they were willing to trade off, aiming not to lower performance too much.

Figure 4

ELT: Changes in Latent Ability and Latent Speed Across Conditions on the Individual Level. Panel A: Points Indicate Values in Self-Paced Condition, While Stars Indicate Values in Time-Pressured Condition. The Magnitude of Latent Speed Changes Is Color-Coded, Ranging From Very Small Changes in Yellow (Light) to Very Strong Changes in Black (Dark); Panel B: Polar Plot Showing the Magnitude and Angle of Individual Change Vectors With Unit-Variance Scaled Speed and Ability Changes

Three-Dimensional Cubes Test

As shown in Figure 5, we identified multimodality in frequency distributions of log-transformed response times on 3DC items in both conditions. In total, this anomaly appeared in $2.09 %$ of all item responses, with 307 test takers $(25.06 %)$ showing fast responses on at least one item. To control for rapid guessing, we excluded fast responses (and their respective response times) with response times below 4 s from further analyses (in line with Kroehne et al., 2019; Wise, 2017; Wise & DeMars, 2006).
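The exclusion rule can be sketched as a simple filter. The record layout `(person_id, item_id, correct, rt_seconds)` and the function name `drop_rapid_responses` are assumptions made for illustration; only the 4 s threshold is taken from the text.

```python
THRESHOLD_SECONDS = 4.0  # responses faster than this are treated as rapid guesses


def drop_rapid_responses(responses, threshold=THRESHOLD_SECONDS):
    """Split off responses with response times below the threshold.

    Returns the kept responses, the set of persons with at least one fast
    response, and the percentage of item responses that were dropped.
    """
    kept = [r for r in responses if r[3] >= threshold]
    flagged_persons = {r[0] for r in responses if r[3] < threshold}
    share_dropped = 100.0 * (len(responses) - len(kept)) / len(responses)
    return kept, flagged_persons, share_dropped


# Illustrative records only: (person_id, item_id, correct, rt_seconds).
records = [
    ("p1", "i1", 1, 12.3),
    ("p1", "i2", 0, 2.1),
    ("p2", "i1", 1, 4.0),
    ("p3", "i1", 0, 3.9),
]
kept, flagged, share = drop_rapid_responses(records)
```

Both the response and its response time are dropped together because each record carries the two values, matching the description that fast responses and their respective response times were excluded.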

Figure 5

Frequency Distributions of Log-Transformed Response Times on 3DC Items (Split by Condition)

Convergence and Model Fit

The model estimation converged successfully ( $\hat{R} < 1.01$ for all parameters), with adequately precise parameter estimations $(ESS >> 400)$ . PPMCs showed a generally satisfactory model fit, although there were minor indications of local dependence violations and a slight overestimation of fast responses. The Rasch model accurately captured the distribution of observed sum scores across both conditions, with PPP values indicating satisfactory fit. Minor potential violations of local independence were identified for a few item combinations, but these did not significantly impact the overall scale unidimensionality, given the small residual correlations and the comprehensive item analysis. The lognormal response time model also demonstrated a satisfactory fit, with PPP values supporting the model’s robustness despite the slight overprediction of faster responses. For detailed model fit results, see Supplementary Materials.

Manipulation Check

Mean latent speed estimates increased from $μ_{τ_{1}} = 0.06$ $[0.04, 0.08]$ in the SPC to $μ_{τ_{2}} = 0.44$ $[0.43, 0.46]$ in the TPC. Speed is significantly increased on average by $d_{z} (τ) = 0.83$ $[0.72, 0.94]$ ( $0 %$ of the posterior is below $| 0.1 |$ ). This corresponds to a large effect (Cohen, 1988). Furthermore, test takers with lower working speed in the SPC exhibited stronger increases in latent speed when transitioning from SPC to TPC ( $ρ (τ_{1}, τ_{Δ}) = - . 53$ $[- . 58, - . 47]$ ).

The vast majority of test takers $(97.55 %)$ raised their speed levels in response to the speed-emphasized instruction, confirming that the instruction indeed resulted in the requested behavior for nearly every test taker. Most of these test takers were even strongly influenced by the TPC in terms of latent speed, as $79.26 %$ exhibited effective speed changes larger than one-half standard deviations. In contrast, only $0.03 %$ of the test takers who lowered their speed levels from SPC to TPC exhibited effective speed changes larger than one-half standard deviations.

Investigating the Speed–Ability Tradeoff

The posterior means of person parameter means, variances, and correlations are presented in Table 2. Large correlations were observed both for ability levels across conditions, $ρ (θ_{1}, θ_{2}) = . 68$ $[. 64, . 72]$ , and for speed levels, $ρ (τ_{1}, τ_{2}) = . 60$ $[. 51, . 68]$ , across conditions. These correlations indicate that (a) individuals who perform well in the SPC tend to also excel in the TPC, and (b) individuals who respond more quickly in the SPC generally maintain faster response times in the TPC.

Table 2

3DC: Person Parameter Means, Variances, and Correlations

	$μ$	$σ^{2}$	$θ_{1}$	$θ_{2}$	$τ_{1}$
$θ_{1}$	0.56	1.21
	[0.48, 0.64]	[1.13, 1.29]
$θ_{2}$	–0.02	1.14	.68
	[–0.09, 0.05]	[1.07, 1.21]	[.64, .72]
$τ_{1}$	0.06	0.28	–.25	–.01
	[0.04, 0.08]	[0.26, 0.30]	[−.29, −.21]	[−.05, .02]
$τ_{2}$	0.44	0.24	–.00	–.34	.60
	[0.43, 0.46]	[0.22, 0.26]	[−.04, .03]	[−.38, −.29]	[.51, .68]

Note. $90 %$ highest density intervals are given in square brackets. $θ$ = ability; $τ$ = speed; Index 1 refers to self-paced condition; Index 2 refers to the time-pressured condition.

Mean latent ability estimates significantly decreased from SPC to TPC by $d_{z} (θ) = 0.67$ $[0.59, 0.76]$ standard deviations ( $0 %$ of the posterior is lower than $| 0.1 |$ ). The majority of test takers $(92.03 %)$ exhibited a decrease in ability when transitioning from SPC to TPC, with $68.28 %$ experiencing a decrease in ability exceeding one-half standard deviations. Conversely, only $0.05 %$ demonstrated an increase in ability beyond one-half standard deviations. This pattern of results strongly supports the presence of the SAT, where increases in speed led to a notable decrease in the accuracy for the vast majority of test takers. In addition, change in ability was strongly (negatively) related to change in speed ( $ρ (θ_{Δ}, τ_{Δ}) = - . 79$ $[- . 98, - . 63]$ ), supporting the notion of homogeneity of the intraindividual relationship of ability and speed across persons.

We found three groups of test takers: (a) test takers who behaved as expected, that is, they increased speed and decreased ability in the TPC as compared with the SPC ( $92.03 %$ of the test takers, see Panel C in Figure 6); (b) test takers who increased speed according to the instruction but at the same time also showed an increased ability ( $5.52 %$ of the test takers, see Panel B in Figure 6); and (c) test takers who may not have complied with the instruction, decreased speed, and, as expected under the SAT, increased ability ( $2.45 %$ of the test takers, see Panel A in Figure 6). No test takers decreased both speed and ability.

Figure 6

3DC: Changes in Latent Ability and Latent Speed Across Conditions on the Individual Level. Panel A Depicts Those With Increased Ability and Decreased Speed, Panel B Shows Increased Ability and Speed, While Panel C Illustrates Increased Speed With Decreased Ability

According to Panel A in Figure 6, $2.45 %$ $(n = 32)$ of test takers may not have complied with the instructions as they showed just very small changes in speed $(d_{z} ({\hat{τ}}_{Δ}) = 0.14)$ and moderate ones in ability $(d_{z} ({\hat{θ}}_{Δ}) = 0.38)$ . In this group, the slight decrease in speed resulted in a relatively large increase in ability (angles were close to $108^{°}$ ). This effect varied across persons with $SD (∠) = 11 . 64^{°}$ .

The $5.52 %$ $(n = 72)$ of test takers who—as instructed—increased their speed but also increased ability (Panel B in Figure 6), showed generally only very small changes in both speed ( $d_{z} ({\hat{τ}}_{Δ}) = 0.18$ ) and ability $(d_{z} ({\hat{θ}}_{Δ}) = 0.13)$ . The direction of the change was opposite of what one would expect under the existence of the SAT (average angle being $35^{°}$ ) with high variation across persons $(SD (∠) = 29 . 06^{°})$ ; however, the magnitude of change was overall very small.

The largest group of test takers ( $92.03 %$ ; $n = 1,201$ ) behaved as instructed and showed a pattern that one would expect under the SAT: They increased their speed and decreased ability. This group showed the highest overall change in speed $(d_{z} ({\hat{τ}}_{Δ}) = 0.89)$ and ability $(d_{z} ({\hat{θ}}_{Δ}) = 0.75)$ . The decrease in ability for an increase in speed was as expected in the presence of the SAT $(M (∠) = - 37^{°})$ . Individuals were rather homogeneous in their angle $(SD (∠) = 8 . 99^{°})$ , indicating more homogeneous SAT curves across persons in this group. Different from the ELT, in the 3DC in this group, a negative relationship was observed between the magnitude and angle of change $(Cor (∠, r) = - . 56)$ . This suggests that test takers with a larger magnitude (i.e., change in person parameters) tended to have a greater SAT.

The majority, that is, $94.48 %$ of the test takers (those in Panel A and C) showed patterns that align with the SAT. Within this group, SAT seems to be homogeneous for all test takers (similar angle). Some $(5.52 %)$ of the test takers showed patterns contradicting the SAT. However, changes in speed and ability were very small for this group.
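The reported shares can be reproduced from the group sizes given above ( $n = 32$ , $n = 72$ , and $n = 1,201$ ). The sketch below is a plain arithmetic check; the function name `group_shares` and the panel labels used as dictionary keys are illustrative choices.

```python
def group_shares(counts):
    """Return the total sample size and each group's share in percent (2 decimals)."""
    total = sum(counts.values())
    shares = {panel: round(100.0 * n / total, 2) for panel, n in counts.items()}
    return total, shares


# Group sizes as reported for Panels A, B, and C of Figure 6.
counts = {"A": 32, "B": 72, "C": 1201}
total, shares = group_shares(counts)

# Panels A and C together are the SAT-consistent patterns.
sat_consistent = round(100.0 * (counts["A"] + counts["C"]) / total, 2)
```

The three groups add up to the full sample of 1,305 test takers, and Panels A and C together yield the $94.48 %$ stated above.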

Discussion

In this article, we presented an experimental approach to the investigation of the SAT in psychological assessments and investigated the occurrence of the SAT in two spatial ability tests within a high-stakes setting. The results showed that it is indeed possible to increase working speed by instructions. About 99.92% and 98% of the test takers increased their speed from the SPC to the TPC in the ELT and the 3DC, respectively. Most of the test takers (99.92% in the ELT and 94.48% in the 3DC) showed patterns aligning with the existence of an SAT. While the amount of decrease in ability for a given increase in speed was rather homogeneous in the 3DC, it varied considerably in the ELT, indicating similar SAT curves across persons in the 3DC and a heterogeneous SAT in the ELT. While the SAT seemed to be present for most test takers, a small subgroup (5.52%) in the 3DC showed patterns contradicting the SAT; they gained in both speed and ability under time pressure. This outcome may suggest that time pressure has positive motivational effects for some test takers. Aligning with this notion, previous research posited that mild time constraints could enhance performance (i.e., reading comprehension; Walczyk et al., 1999), further indicating that the impact of time pressure on performance varies among test takers.

This research introduces an additional method to the toolbox of approaches for investigating the SAT. As compared with previous approaches, our approach has different strengths and limitations. Unlike approaches such as those by Domingue et al. (2022) or Mutak et al. (2024), which require nonstationarity within test conditions to investigate SAT, our method does not rely on this assumption. By deliberately manipulating speed levels within the test via small instructional changes, our approach ensures changes in speed, facilitating the investigation of the SAT. Our approach also does not depend on external measures of proficiency or on the assumption of uniformity in SAT functions across groups of test takers such as in Ranger et al. (2021). However, compared with previous methods, especially those relying on nonstationarity, our experimental approach is more time-intensive as it involves administering the same test under both speeded and unspeeded conditions. We also note that this approach assumes that items from the same test, whether presented under speeded or unspeeded conditions, are comparable and measured on the same scale. This requires that precalibrated item parameters remain valid across different testing conditions.

Our approach only captures two points on the SAT curve and, as such, limits our understanding of its functional form. Previous studies have modeled nonlinear relations between speed and ability, but these models often relied on specific assumptions, such as similar SAT curves across groups of persons (Ranger et al., 2021), the need for nonstationarity (Domingue et al., 2022), or opposite functional forms between conditions of low and high accuracy (Kang et al., 2022).

In our study, we implemented a within-person design and manipulated the instructions for the cognitive test to explore individual differences in the SAT. As with all previous approaches, the results on the intraindividual relationship may be impacted by confounding variables, such as ordering effects (Hambleton & Traub, 1974), position effects (Debeer & Janssen, 2013; Kanopka & Domingue, 2022; Weirich et al., 2014), learning, concentration, or fatigue effects (Ranger et al., 2021). While the fixed sequence of tests and conditions was necessary to ensure fairness in this high-stakes selection setting, it limited our ability to control for potential confounders. For instance, variations in concentration or fatigue throughout the study, compounded by the order of the instructional conditions, could systematically affect both ability and speed, potentially biasing the results. Consequently, a decrease in ability might result from an increase in speed, but it could also stem from reduced concentration. However, it is important to note that the administration time for the two spatial ability tests within the high-stakes assessment of our study was relatively short: The 3DC took an average of 17 min, compared with a mean of 8.5 min for the ELT (excluding test instructions and practice items).

Some variables may pose fewer problems in our study compared with previous research. Given the high-stakes nature of our testing environment, it is reasonable to assume that all participants maintained high motivation throughout the assessment, thereby reducing the likelihood of motivation as a confounding factor. This may differ in low-stakes assessments. Furthermore, our experimental design required all test takers to increase their speed, thus partially controlling for person-level confounders. Nevertheless, the chosen speed within each condition could still be influenced by individual-level confounders.

In an ideal research setup, potential confounding variables could be more effectively controlled. This would allow for a clearer interpretation of how different instructions influence cognitive performance across various settings. In future research, one may aim at assessing such possible confounders and control for them. In fact, a preregistered study is currently investigating potential confounding effects like motivation and concentration on the intraindividual relationship of speed and ability (Much et al., 2023).

In our study, we focused on a specific application, that is, investigating young, primarily male trainee ATCO applicants in Austria, and assessing spatial abilities. While this provides detailed insights into this context and provides further knowledge on the SAT in cognitive testing, it may not reflect the complexities in other cognitive areas or among different populations. Research has shown that age differences in speed, ability, and also their trade-offs exist for mental rotation tasks (Berg et al., 1982; Debelak et al., 2014; Hertzog et al., 1993; Linn & Petersen, 1985; Voyer et al., 1995; Zhao et al., 2019). Domingue et al. (2022) have found both positive and negative intraindividual relationships between accuracy and response time in cognitive tasks and highlighted the importance of task design. As such, our study can be seen as adding another piece to the previous results.

While the majority of previous research investigated SAT in low-stakes assessments, this study provides results from high-stakes assessments. Specifically in line with results from low-stakes settings, most participants displayed a decrease in ability with an increase in speed, aligning with traditional SAT expectations. Notably, the high-stakes nature of our tests may have influenced participants to adhere more strictly to the testing conditions, possibly due to the increased pressure and significant consequences associated with their performance outcomes. How well the proposed approach for investigating the SAT would work in low-stakes assessments also depends on how much test takers adhere to the instructions. Previous research (Hertzog et al., 1993; Lerche & Voss, 2017; Nietfeld & Bosma, 2003) has shown positive results. As such, we are optimistic that this approach would also be applicable to low-stakes assessments. This, however, needs to be further investigated in future research.

In future studies, it would be beneficial to directly compare our method applied to a high-stakes environment with results obtained from psychological assessments conducted in low-stakes settings using the same procedural framework. This comparative analysis would help delineate how test stakes influence the manifestation of the SAT and could reveal important variations in cognitive performance dynamics under different settings. Such studies would not only validate the robustness of our method across different testing conditions but also enhance our understanding of the contextual factors that affect test outcomes in psychological assessments.

Another avenue for future research involves exploring which timing conditions are most appropriate for specific diagnostic questions. Findings from Goldhammer et al. (2024) indicate that timed conditions might provide more diagnostic value by revealing how well individuals perform under pressure. For instance, a student’s ability to recognize words rapidly under timed conditions could give more insight into their reading efficiency, particularly with unfamiliar words. This suggests that the choice of a speed level may heavily impact the interpretation of ability scores.

These broader findings indicate that our study, while shedding light on the SAT in spatial ability, is part of a larger, more complex picture. Applying our methodology across various psychological tests, methodological approaches, or populations may add to the generalizability of the results or identify boundary conditions for our claims. That said, our research underscores that the SAT exists in tests and emphasizes the need to account for it in assessments. If we do not consider the SAT appropriately, we might misinterpret how people perform on cognitive tasks (Lohman, 1989; Pohl et al., 2021; van der Linden, 2009).

Supplemental Material

Supplemental material, sj-pdf-1-epm-10.1177_00131644241271309 for Assessing the Speed–Accuracy Tradeoff in Psychological Testing Using Experimental Manipulations by Tobias Alfers, Georg Gittler, Esther Ulitzsch and Steffi Pohl in Educational and Psychological Measurement

Footnotes

Acknowledgements

The authors would like to thank the High-Performance Computing Service of Freie Universität Berlin for computing time (Bennett 2020).

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the German Research Foundation (Deutsche Forschungsgemeinschaft DFG) under Grant PO 1655/4-1. Open Access Funding provided by Freie Universität Berlin.

References

Adams

R. J.

Wilson

Wang

W.-C.

(1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1–23. https://doi.org/10.1177/0146621697211001

Arrer

(1992). Unterschied zwischen Computervorgabe und Papier-Bleistift Vorgabe des Dreidimensionalen Würfeltests [Difference between computer-based and paper-pencil based administrations of the three-dimensional cubes test] [Unpublished master’s thesis]. University of Vienna.

Berg

Hertzog

Hunt

(1982). Age differences in the speed of mental rotation. Developmental Psychology, 18, 95–107. https://doi.org/10.1037/0012-1649.18.1.95

Bezanson

Edelman

Karpinski

Shah

V. B.

(2017). Julia: A fresh approach to numerical computing. SIAM Review, 59, 65–98. https://doi.org/10.1137/141000671

Bolsinova

De Boeck

Tijmstra

(2017). Modelling conditional dependence between response time and accuracy. Psychometrika, 82, 1126–1148. https://doi.org/10.1007/s11336-016-9537-6

Carpenter

Gelman

Hoffman

M. D.

Lee

Goodrich

Betancourt

Brubaker

Guo

Riddell

(2017). Stan: A probabilistic programming language. Journal of Statistical Software, 76, 1–32. https://doi.org/10.18637/jss.v076.i01

Cohen

(1988). Statistical power analysis for the behavioral sciences. Routledge. https://doi.org/10.4324/9780203771587

Davison

M. L.

Semmes

Huang

C. N.

(2012). On the reliability and validity of a numerical reasoning speed dimension derived from response times collected in computerized testing. Educational and Psychological Measurement, 72, 245–263. https://doi.org/10.1177/0013164411408412

Debeer

Janssen

(2013). Modeling item-position effects within an IRT framework. Journal of Educational Measurement, 50, 164–185. https://doi.org/10.1111/jedm.12009

10.

Debelak

Gittler

Arendasy

(2014). On gender differences in mental rotation processing speed. Learning and Individual Differences, 29, 8–17. https://doi.org/10.1016/j.lindif.2013.10.003

11.

De Boeck

Chen

Davison

(2017). Spontaneous and imposed speed of cognitive test responses. British Journal of Mathematical and Statistical Psychology, 70, 225–237. https://doi.org/10.1111/bmsp.12094

12.

De Boeck

Jeon

(2019). An overview of models for response times and processes in cognitive tests. Frontiers in Psychology, 10, Article e00102. https://doi.org/10.3389/fpsyg.2019.00102

13.

Dennis

Evans

J. S. B. T.

(1996). The speed-error trade-off problem in psychometric testing. British Journal of Psychology, 87, 105–129. https://doi.org/10.1111/j.2044-8295.1996.tb02579.x

14.

Domingue

B. W.

Kanopka

Stenhaug

Sulik

M. J.

Beverly

Brinkhuis

Circi

Faul

Liao

MacCandliss

Obradović

Piech

Porter

Soland

Weeks

Wise

S. L.

Yeatman

, & Project iLEAD Consortium. (2022). Speed–accuracy trade-off? Not so fast: Marginal changes in speed have inconsistent relationships with accuracy in real-world settings. Journal of Educational and Behavioral Statistics, 47, 576–602. https://doi.org/10.3102/10769986221099906

15.

Gelman

Carlin

J. B.

Stern

H. S.

Dunson

D. B.

Vehtari

Rubin

D. B.

(2013). Bayesian data analysis. CRC Press.

16.

Gelman

Rubin

D. B.

(1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472. https://doi.org/10.1214/ss/1177011136

17.

Gittler

(1990). Dreidimensionaler Würfeltest 3DW: Ein Rasch-skalierter Test zur Messung des räumlichen Vorstellungsvermögens [Three-dimensional cubes test DW: A Rasch-scaled test of spatial ability]. Beltz Test GmbH.

18.

Gittler

(2000). Manual zum A3DW (Adaptiver Dreidimensionaler Würfeltest) [Manual to the ADW (adaptive three-dimensional cubes test)]. Schuhfried.

19.

Gittler

Arendasy

(2003). Endlosschleifen: Psychometrische Grundlagen des Aufgabentyps Ep [Endless loops: Psychometric fundamentals of the item type Ep]. Diagnostica, 49, 164–175. https://doi.org/10.1026//0012-1924.49.4.164

20.

Goldhammer

(2015). Measuring ability, speed, or both? Challenges, psychometric solutions, and what can be gained from experimental control. Measurement: Interdisciplinary Research and Perspectives, 13, 133–164. https://doi.org/10.1080/15366367.2015.1100020

21.

Goldhammer

Klein Entink

R. H.

(2011). Speed of reasoning and its relation to reasoning ability. Intelligence, 39, 108–119. https://doi.org/10.1016/j.intell.2011.02.001

22.

Goldhammer

Kroehne

(2014). Controlling individuals’ time spent on task in speeded performance measures. Applied Psychological Measurement, 38, 255–267. https://doi.org/10.1177/0146621613517164

23.

Goldhammer

Kroehne

Hahnel

Naumann

De Boeck

(2024). Does timed testing affect the interpretation of efficiency scores? A GLMM analysis of reading components. Journal of Educational Measurement. Advance online publication. https://doi.org/10.1111/jedm.12393

24.

Goldhammer

Naumann

Stelter

Tóth

Rölke

Klieme

(2014). The time on task effect in reading and problem solving is moderated by task difficulty and skill: Insights from a computer-based large-scale assessment. Journal of Educational Psychology, 106, 608–626. https://doi.org/10.1037/a0034716

25.

Goldhammer

Steinwascher

M. A.

Kroehne

Naumann

(2017). Modelling individual response time effects between and within experimental speed conditions: A GLMM approach for speeded tests. British Journal of Mathematical and Statistical Psychology, 70, 238–256. https://doi.org/10.1111/bmsp.12099

26.

Hale

D. J.

(1969). Speed-error tradeoff in a three-choice serial reaction task. Journal of Experimental Psychology, 81, 428–435. https://doi.org/10.1037/h0027892

27.

Hambleton

R. K.

Traub

R. E.

(1974). The effects of item order on test performance and stress. The Journal of Experimental Education, 43, 40–46. https://doi.org/10.1080/00220973.1974.10806302

28.

Heitz

R. P.

(2014). The speed-accuracy tradeoff: History, physiology, methodology, and behavior. Frontiers in Neuroscience, 8, Article e00150. https://doi.org/10.3389/fnins.2014.00150

29.

Hertzog

Vernon

M. C.

Rypma

(1993). Age differences in mental rotation task performance: The influence of speed/accuracy tradeoffs. Journal of Gerontology, 48, 150–156. https://doi.org/10.1093/geronj/48.3.p150

30.

Hoffman

M. D.

Gelman

(2014). The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593–1623.

31.

Howell

W. C.

Kreidler

D. L.

(1964). Instructional sets and subjective criterion levels in a complex information-processing task. Journal of Experimental Psychology, 68, 612–614. https://doi.org/10.1037/h0047862

32.

Kang

De Boeck

Ratcliff

(2022). Modeling conditional dependence of response accuracy and response time with the diffusion item response theory model. Psychometrika, 87, 725–748. https://doi.org/10.1007/s11336-021-09819-5

33.

Kanopka & Domingue (2022). A position sensitive IRT mixture model. PsyArXiv. https://doi.org/10.31234/osf.io/hn2p5

34.

Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74, 21–48. https://doi.org/10.1007/s11336-008-9075-y

35.

Kroehne, Buchholz, & Goldhammer (2019, April 4–8). Detecting carelessly invalid effort responses in item sets using item-level response times [Conference presentation]. NCME Annual Meeting 2019, Toronto, Ontario, Canada.

36.

Kruschke, J. K. (2010). What to believe: Bayesian methods for data analysis. Trends in Cognitive Sciences, 14, 293–300. https://doi.org/10.1016/j.tics.2010.05.001

37.

Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press.

38.

Kruschke, J. K., Aguinis, & Joo (2012). The time has come: Bayesian methods for data analysis in the organizational sciences. Organizational Research Methods, 15, 722–752.

39.

Kruschke, J. K., & Liddell, T. M. (2018). The Bayesian new statistics: Hypothesis testing, estimation, meta-analysis, and power analysis from a Bayesian perspective. Psychonomic Bulletin & Review, 25, 178–206. https://doi.org/10.3758/s13423-016-1221-4

40.

Lerche & Voss (2017). Experimental validation of the diffusion model based on a slow response time paradigm. Psychological Research, 83, 1194–1209. https://doi.org/10.1007/s00426-017-0945-8

41.

Linn, M. C., & Petersen, A. C. (1985). Emergence and characterization of sex differences in spatial ability: A meta-analysis. Child Development, 56, 1479–1498. https://doi.org/10.2307/1130467

42.

Lohman, D. F. (1989). Individual differences in errors and latencies on cognitive tasks. Learning and Individual Differences, 1, 179–202. https://doi.org/10.1016/1041-6080(89)90002-2

43.

Marianti, Fox, J.-P., Avetisyan, Veldkamp, B. P., & Tijmstra (2014). Testing for aberrant behavior in response time modeling. Journal of Educational and Behavioral Statistics, 39, 426–451. https://doi.org/10.3102/1076998614559412

44.

Maris & van der Maas (2012). Speed-accuracy response models: Scoring rules based on response time and accuracy. Psychometrika, 77, 615–633. https://doi.org/10.1007/s11336-012-9288-y

45.

Much, Mutak, Pohl, & Ranger (2023). Modeling speed-ability trade-off and test-taking persistence: Parameter validation for two psychometric models (Unpublished pre-registration).

46.

Mutak, Krause, Ulitzsch, Much, Ranger, & Pohl (2024). Modeling the intraindividual relation of ability and speed within a test. Journal of Educational Measurement. Advance online publication. https://doi.org/10.1111/jedm.12391

47.

Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Brooks, Gelman, Jones, & Meng, X.-L. (Eds.), Handbook of Markov Chain Monte Carlo (pp. 116–162). Chapman & Hall/CRC Press.

48.

Nietfeld & Bosma (2003). Examining the self-regulation of impulsive and reflective response styles on academic tasks. Journal of Research in Personality, 37, 118–140. https://doi.org/10.1016/s0092-6566(02)00564-0

49.

Norman, G. R., Sloan, J. A., & Wyrwich, K. W. (2003). Interpretation of changes in health-related quality of life: The remarkable universality of half a standard deviation. Medical Care, 41, 582–592. https://doi.org/10.1097/01.mlr.0000062554.74615.4c

50.

Pohl, Ulitzsch, & von Davier (2021). Reframing rankings in educational assessments. Science, 372, 338–340. https://doi.org/10.1126/science.abd3300

51.

Pohl & von Davier (2018). Commentary: On the importance of the speed-ability trade-off when dealing with not reached items. Frontiers in Psychology, 9, Article e01988. https://doi.org/10.3389/fpsyg.2018.01988

52.

Ranger, Kuhn, J.-T., & Pohl (2021). Effects of motivation on the accuracy and speed of responding in tests: The speed-accuracy tradeoff revisited. Measurement: Interdisciplinary Research and Perspectives, 19, 15–38. https://doi.org/10.1080/15366367.2020.1750934

53.

Rasch (1960). Probabilistic models for some intelligence and attainment tests. Danish Institute for Educational Research.

54.

Ratcliff & Rouder, J. N. (2000). A diffusion model account of masking in two-choice letter identification. Journal of Experimental Psychology: Human Perception and Performance, 26, 127–140. https://doi.org/10.1037/0096-1523.26.1.127

55.

Rathje, Golany, & Eißfeldt (2004). Pan-European selection test battery for air traffic control applicants. In Goeters, K.-M. (Ed.), Aviation psychology: Practice and research (pp. 171–201). Ashgate.

56.

Semmes & Davison, M. L. (2011). Modeling individual differences in numerical reasoning speed as a random effect of response time limits. Applied Psychological Measurement, 35, 433–446. https://doi.org/10.1177/0146621611407305

57.

Sinharay, Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30, 298–321. https://doi.org/10.1177/0146621605285517

58.

Soldatov, S. K., Zasyad’ko, K. I., Bogomolov, A. V., Vonarshenko, A. P., & Solomka, A. V. (2018). Professionally important skills of air traffic controllers. Human Physiology, 44, 775–778. https://doi.org/10.1134/s0362119718070150

59.

Thurstone, L. L. (1937). Ability, motivation, and speed. Psychometrika, 2, 249–254. https://doi.org/10.1007/bf02287896

60.

van der Linden, W. J. (2006). A lognormal model for response times on test items. Journal of Educational and Behavioral Statistics, 31, 181–204. https://doi.org/10.3102/10769986031002181

61.

van der Linden, W. J. (2007). A hierarchical framework for modeling speed and accuracy on test items. Psychometrika, 72, 287–308. https://doi.org/10.1007/s11336-006-1478-z

62.

van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46, 247–272. https://doi.org/10.1111/j.1745-3984.2009.00080.x

63.

van der Linden, W. J., & Guo (2008). Bayesian procedures for identifying aberrant response-time patterns in adaptive testing. Psychometrika, 73, 365–384. https://doi.org/10.1007/s11336-007-9046-8

64.

Van Zandt, Colonius, & Proctor, R. W. (2000). A comparison of two response time models applied to perceptual matching. Psychonomic Bulletin & Review, 7, 208–256. https://doi.org/10.3758/bf03212980

65.

Vehtari, Gelman, Simpson, Carpenter, & Bürkner, P.-C. (2021). Rank-normalization, folding, and localization: An improved R̂ for assessing convergence of MCMC (with discussion). Bayesian Analysis, 16, 667–718. https://doi.org/10.1214/20-ba1221

66.

Voyer & Bryden, M. P. (1995). Magnitude of sex differences in spatial abilities: A meta-analysis and consideration of critical variables. Psychological Bulletin, 117, 250–270. https://doi.org/10.1037/0033-2909.117.2.250

67.

Walczyk, J. J., Kelly, K. E., Meche, S. D., & Braud (1999). Time limitations enhance reading comprehension. Contemporary Educational Psychology, 24, 156–165. https://doi.org/10.1006/ceps.1998.0992

68.

Weirich, Hecht, & Böhme (2014). Modeling item position effects using generalized linear mixed models. Applied Psychological Measurement, 38, 535–548. https://doi.org/10.1177/0146621614534955

69.

Wickelgren, W. A. (1977). Speed-accuracy tradeoff and information processing dynamics. Acta Psychologica, 41, 67–85. https://doi.org/10.1016/0001-6918(77)90012-9

70.

Wise, S. L. (2017). Rapid-guessing behavior: Its identification, interpretation, and implications. Educational Measurement: Issues and Practice, 36, 52–61. https://doi.org/10.1111/emip.12165

71.

Wise, S. L., & DeMars, C. E. (2006). An application of item response time: The effort-moderated IRT model. Journal of Educational Measurement, 43, 19–38. https://doi.org/10.1111/j.1745-3984.2006.00002.x

72.

Zhan, Hong, & Man (2020). The multidimensional log-normal response time model: An exploration of the multidimensionality of latent processing speed. Acta Psychologica Sinica, 52, 1132–1142. https://doi.org/10.3724/sp.j.1041.2020.01132

73.

Zhan, Jiao, Man, & Wang, W.-C. (2021). Variable speed across dimensions of ability in the joint model for responses and response times. Frontiers in Psychology, 12, Article e469196. https://doi.org/10.3389/fpsyg.2021.469196

74.

Zhao, Gherri, & Della Sala (2019). Age effects in mental rotation are due to the use of a different strategy. Aging, Neuropsychology, and Cognition, 27, 471–488. https://doi.org/10.1080/13825585.2019.1632255

75.

Zitzmann & Hecht (2019). Going beyond convergence in Bayesian estimation: Why precision matters too and how to assess it. Structural Equation Modeling: A Multidisciplinary Journal, 26, 646–661. https://doi.org/10.1080/10705511.2018.1545232

Supplementary Material

Please find the following supplemental material available below.
