Abstract
Objective
The present study investigated how pupil size and heart rate variability (HRV) can contribute to the prediction of operator performance. We illustrate how focusing on mental effort as the conceptual link between physiological measures and task performance can align relevant empirical findings across research domains.
Background
Physiological measures are often treated as indicators of operators’ mental state. Thereby, they could enable a continuous and unobtrusive assessment of operators’ current ability to perform the task.
Method
Fifty participants performed a process monitoring task consisting of ten 9-minute task blocks. Blocks alternated between low and high task demands, and the last two blocks introduced a task reward manipulation. We measured response times as the primary performance indicator, pupil size and HRV as physiological measures, and mental fatigue, task engagement, and perceived effort as subjective ratings.
Results
Both increased pupil size and increased HRV significantly predicted better task performance. However, the underlying associations between physiological measures and performance were influenced by task demands and time on task. Pupil size, but not HRV, results were consistent with subjective ratings.
Conclusion
The empirical findings suggest that, by capturing variance in operators’ mental effort, physiological measures, specifically pupil size, can contribute to the prediction of task performance. Their predictive value is limited by confounding effects that alter the amount of effort required to achieve a given level of performance.
Application
The outlined conceptual approach and empirical results can guide study designs and performance prediction models that examine physiological measures as the basis for dynamic operator assistance.
Introduction
Extensive research has examined the reliability of physiological measures in estimating the mental state of operators engaged in supervisory control. To date, studies have primarily focused on demonstrating that physiological measures are sensitive to changes in task characteristics (Pütz et al., 2024; see also Bafna & Hansen, 2021; Charles & Nixon, 2019; Csathó et al., 2023; Tao et al., 2019) and can discriminate operator states (Ding et al., 2020; Tjolleng et al., 2017; Wilson & Russell, 2003, 2007). Physiological measures could thus provide a continuous and unobtrusive assessment of operators’ mental state, even when adverse changes in operator state have yet to manifest themselves in performance deficits (Sharples & Megaw, 2015). With growing empirical support, researchers have proposed expanding research on physiological measures to explore whether assessing operators’ state could be the basis for predicting operator performance (G. Hancock et al., 2021; Longo et al., 2022; Pütz et al., 2024). This prospect holds appeal from a theoretical and an applied perspective.
From a theoretical perspective, researchers can treat the operator’s performance as an individual-specific benchmark for physiological measures. Doing so allows them to account for the moderating role of inter-individual differences (i.e., human characteristics) on the relationship between task characteristics and the operator’s mental state (see Figure 1). This moderating influence is neglected when mapping physiological responses directly to changes in task characteristics across individuals, which illustrates the advantage of individual-based compared to group-based analyses (see, e.g., Wilson & Russell, 2007). From an applied perspective, using physiological measures to continuously assess the operator’s ability to perform the task provides the basis for dynamic operator assistance (Aricò et al., 2016; Di Flumeri et al., 2019; Freeman et al., 2004; Prinzel et al., 2003; Wilson & Russell, 2007). This offers a solution to the pitfalls of supervisory control, where task demands can fluctuate from passive monitoring under normal conditions to time-critical decision making in the event of system failures (Endsley, 2017; Sheridan, 2021).
Figure 1. Abstract conceptual model of the role of operators’ mental state.
Using physiological measures as predictors of task performance relies on establishing reliable associations between them. However, this can be challenging as different research domains have linked the same physiological responses to different operator states, leading to conflicting implications for their association with performance. This challenge is particularly evident when studies that focus on either
The Role of Mental Effort
With mental effort, we refer to engaging in a task by investing mental resources in service of instrumental behavior (Gendolla & Richter, 2010; Gendolla & Wright, 2009). Thus, we define mental effort in terms of information processing rather than subjective terms (Hockey, 1997; Shenhav et al., 2017). In this sense, mental effort mediates between (1) task characteristics and individual information-processing capacity (i.e., human characteristics) and (2) information-processing fidelity, reflected in task performance (Shenhav et al., 2017). On the one hand, this definition links mental effort to task engagement, which in turn has been defined as the “commitment to effort” (Matthews, 2021, p. 3). On the other hand, it differentiates mental effort from the subjective experience of perceiving a task as effortful. Distinguishing these meanings of the term
Research on mental fatigue has examined performance declines that result from the prolonged execution of mental tasks. As a key driver of this effect, researchers have identified a decrease in task engagement over time on task (Matthews, 2016, 2021; Matthews et al., 2014, 2017; Reinerman et al., 2006), that is, a decreased commitment to invest mental effort (Hockey, 2011; van der Linden, 2011). This decrease has been attributed to the depletion of mental resources through effort exertion (Baumeister et al., 2007; Warm et al., 2008), a diminishing cost-benefit ratio of performing the task (Boksem & Tops, 2008; Kurzban et al., 2013), and mind-wandering (Smallwood & Schooler, 2006). Notably, the decrease in mental effort invested in the task is often contrasted by an increase in the perceived effort of task execution (Neigel et al., 2020; Warm et al., 2008). On a physiological level, mental fatigue has been associated with decreases in pupil size and task-evoked pupillary responses (e.g., Hopstaken et al., 2015a, 2016; McIntire et al., 2014) and increases in HRV (e.g., Karthikeyan et al., 2022; Matuz et al., 2021; Melo et al., 2017). Thus, research on physiological indicators of mental fatigue has mostly gathered evidence associating impaired task performance with lower mental effort, smaller pupil size, and higher HRV.
Some researchers have examined the role of task engagement and mental effort in mental fatigue by manipulating task reward. They reasoned that increased motivation should counteract the effects of mental fatigue by facilitating task re-engagement and increased mental effort. Indeed, studies have shown that increasing task reward can lead to both retention (Herlambang et al., 2019) and recovery of task performance (Boksem et al., 2006; Hopstaken et al., 2015a, 2015b) as well as reduce the frequency of attentional lapses (Massar et al., 2016, 2019). On a physiological level, increasing task reward has been connected to increases in pupil size and task-evoked pupillary responses (e.g., Herlambang et al., 2019; Hopstaken et al., 2015a, 2015b, 2016), while the evidence on HRV remains limited, lacking conclusive findings (Herlambang et al., 2019). Thus, studies on the effect of task reward on mental fatigue support the aforementioned associations, linking better task performance to higher mental effort and larger pupil size.
Whereas research on mental fatigue often focuses on how mental effort varies over time on task, research on mental workload focuses primarily on how task demands affect operators’ mental effort. The core assumption is that humans cope with higher demands via additional mental effort (Kahneman, 1973; Shenhav et al., 2017), whereby workload refers to the ratio between invested effort and effort capacity (Longo et al., 2022; Young et al., 2015). In this context, there is usually no distinction between invested mental effort and perceived effort, as more demanding tasks require higher levels of information processing and are also perceived as more effortful. Consistent with the mental fatigue literature, the increases in mental effort are typically associated with increased pupil size and decreased HRV (see Charles & Nixon, 2019; Pütz et al., 2024; Tao et al., 2019). However, most studies also find that increasing task demands can impair task performance as the increased demands are not fully compensated by increased mental effort. As a result, the large body of research on physiological indicators of mental workload has mostly found associations of impaired task performance with higher mental effort, larger pupil size, and lower HRV.
To summarize, research on mental fatigue and mental workload usually finds consistent associations of pupil size and HRV with mental effort but diverging associations of pupil size and HRV with task performance. We propose a synthesis of these findings in Figure 2, which includes the three task characteristics: task demands, time on task, and task reward. All three affect mental effort, which mediates between task characteristics and task performance. Unlike the other two, task demands have a direct effect on task performance by altering the level of mental effort required to achieve a certain level of performance. For example, if an individual invests the same effort despite an increase in task demands, performance will be impaired. Thus, both task demands and mental effort determine performance. Mental effort is associated with physiological responses such as pupil dilation and HRV reduction, which are related to improved task performance due to their common antecedent.
Figure 2. Specified conceptual model of the role of operators’ mental effort.
The Present Study
Given the outlined interdependencies, making reliable predictions of task performance requires information about both task demands and mental effort. Therefore, physiological measures might contribute to performance prediction by (partially) accounting for variance in task performance induced by changes in mental effort. In the present study, we tested this assumption in a process monitoring task. We manipulated task demands and task reward in addition to the progression of time on task to induce variance in participants’ mental effort. We investigated whether the variance in mental effort created covariance in physiological measures and task performance when accounting for the level of task demands. To this end, we first examined whether the three task characteristics showed the expected direct effects on performance and physiological and subjective measures (mental fatigue, task engagement, perceived effort) across participants. This analysis aimed to check the plausibility of mental effort as a viable link between physiological measures and task performance. Second, in our main analysis, we analyzed the intra-individual associations of pupil size and HRV with task performance to estimate their predictive value. Thereby, we tested our research hypothesis
Method
Participants
Fifty participants (28 women and 22 men;
Experimental Task
A simulated process monitoring task was developed in the Unity game engine (see Figure 3). Participants had to monitor a three-by-five grid of gauges that each indicated the continuous fluctuation of a simulated process parameter around a central value (cf. Shi & Rothrock, 2022; Yang & Kim, 2019). Participants were instructed to detect critical system events, which were defined as one process parameter reaching the lower or upper scale limit, and respond as fast as possible by clicking on the alarm button below the associated gauge. After a correct response, a confirmation marker was presented next to the respective gauge until the end of the event, which lasted 7 s each.
Figure 3. Interface of the process monitoring task.
For each participant, a 1 Hz time series of parameter values was sampled for each gauge. Parameter values could remain constant, increase, or decrease between successive timestamps. As the deviation of values from the center increased, the probability of further deviation decreased. Parameter values could not reach the scale limits outside of preselected timestamps. With a preselected event timestamp approaching, the respective parameter value was set to move towards the nearer of the two scale limits. At runtime, values were linearly interpolated between successive timestamps of the sampled time series to display continuous value transitions.
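The runtime interpolation described above can be sketched as follows. This is a minimal illustration, not the authors' Unity implementation: the function name `interpolate_value` and the clamping behavior at the ends of the series are assumptions made for the example.

```python
import math


def interpolate_value(samples, t):
    """Linearly interpolate a 1 Hz series at time t (in seconds).

    samples[i] is the sampled parameter value at second i.
    Times outside the sampled range are clamped (an assumption).
    """
    if not samples:
        raise ValueError("samples must not be empty")
    if len(samples) == 1:
        return float(samples[0])
    # Clamp t to the range covered by the sampled series.
    t = min(max(t, 0.0), len(samples) - 1)
    # Index of the sample at or before t (kept below the last index).
    i = min(int(math.floor(t)), len(samples) - 2)
    frac = t - i
    return samples[i] + (samples[i + 1] - samples[i]) * frac


# Example: value shown on frame 72 of a 144 Hz display, i.e. t = 0.5 s.
series = [0.0, 1.0, 0.5]
value = interpolate_value(series, 72 / 144)
```

With a fixed frame rate of 144 Hz, each rendered frame would query the series at `frame_index / 144`, so the gauge moves smoothly between the 1 Hz samples.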
Two task demand levels were implemented. In the low task demand condition, value changes were fixed to one-twelfth of the scale per second, while in the high task demand condition, half of the value changes spanned one-sixth of the scale per second. Therefore, the two task demand levels differed in the consistency and maximum speed of value changes, that is, temporal uncertainty (Szalma & Claypoole, 2019). This made detecting gradual transitions of parameter values towards the scale limits more challenging. The high demand condition also resulted in larger average deviations of parameter values from the center of the scale. These manipulations were established in a pretest to ensure distinct task demand levels and minimize ceiling effects in task performance.
The task demand level alternated between the ten 9-minute task blocks, with half of the participants starting in the low and half in the high task demand condition. The rate of critical system events was set to three events per minute, that is, 27 events per block across all gauges. The timing of events was randomized for each participant and block, with no overlap of events and a minimum offset of 3 s between events. For all participants, the 270 events were evenly distributed among the 15 gauges to minimize systematic differences in gaze positions, which might have affected pupil size estimations. Successive events could not be indicated by the same gauge.
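One way to generate event timings that satisfy these constraints is sketched below. The article does not describe the randomization procedure, so this is an assumed approach: the function name `schedule_event_starts` is hypothetical, the 3 s offset is interpreted as the gap between the end of one event and the start of the next, and event onsets are restricted to whole seconds. Gauge assignment is not covered.

```python
import random


def schedule_event_starts(block_s=540, n_events=27, event_s=7,
                          min_gap_s=3, rng=None):
    """Return sorted event start times (s) within one block.

    Events do not overlap and are separated by at least min_gap_s
    between the end of one event and the start of the next.
    """
    rng = rng or random.Random()
    # Time that is not needed for events and mandatory gaps.
    slack = block_s - n_events * event_s - (n_events - 1) * min_gap_s
    if slack < 0:
        raise ValueError("events do not fit into the block")
    # Sorted random offsets spread the slack across the block.
    offsets = sorted(rng.randint(0, slack) for _ in range(n_events))
    # Shifting the i-th offset by i * (event + gap) enforces the spacing.
    return [o + i * (event_s + min_gap_s) for i, o in enumerate(offsets)]


starts = schedule_event_starts(rng=random.Random(1))
```

With the default values, 27 events of 7 s plus 26 gaps of 3 s occupy 267 s, which leaves 273 s of slack to distribute randomly within the 540 s block.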
For blocks 9 and 10, a task performance reward was introduced. Participants were told that they would earn points for each response. The maximum number of points was 10, which decreased by 1 point for every 500 ms of response time to a minimum of 1. The earned point value was displayed for 1 s following the response. Participants were instructed that they could earn a bonus of 5 € if they earned more points than the average participant in a fictitious prestudy. In fact, all participants received the bonus at the end of the study. The reward blocks were placed at the end (cf. Hopstaken et al., 2015a, 2015b, 2016) so that the expected increase in motivation could be separated from mental fatigue effects over time on task in statistical analyses.
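The point rule can be expressed as a small function. This is a sketch under one assumption that the article does not specify: a point is deducted each time a full 500 ms has elapsed, so responses faster than 500 ms earn the full 10 points. The function name `points_for_response` is hypothetical.

```python
def points_for_response(rt_ms):
    """Points earned for a response time given in milliseconds.

    Assumes one point is lost per completed 500 ms interval,
    starting from 10 points and never dropping below 1.
    """
    if rt_ms < 0:
        raise ValueError("response time must be non-negative")
    return max(1, 10 - int(rt_ms // 500))


# Example: a response after 1.2 s has completed two 500 ms intervals.
example_points = points_for_response(1200)
```

Under this reading, the minimum of 1 point is reached at a response time of 4.5 s, well within the 7 s event duration.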
Apparatus
Participants were seated at a desk in a lit testing room. The desk was flanked by partitions that blocked participants’ view of the rest of the lab space and the experimenter, who remained in the room during the experiment to check data recording. The experimental task was presented on a 27 in. IPS monitor with a resolution of 2,560 × 1,440 pixels at a distance of 70 cm from the participants, with the gauges occupying a 25 × 32 cm area in the center of the screen. The monitor refresh rate was set to 144 Hz, matching the fixed frame rate of the task application. To interact with the application, participants used a standard computer mouse.
Participants’ pupil size was measured by recording their pupil diameters at 60 Hz using an FX3 remote eye tracker running EyeWorks version 3.21 by EyeTracking. Ambient lighting conditions were kept constant across participants. In addition, participants wore a chest strap attached to a Movesense Medical single-channel electrocardiography (ECG) sensor, which has been successfully validated against a conventional 12-channel ECG sensor (Rogers et al., 2022). ECG data were collected at 512 Hz and transmitted via Bluetooth to a smartphone running the Movesense Showcase app version 1.1.
Measures
Performance measures
Response times were used as the primary performance measure. Failure to respond before the end of an event was labeled a miss. The long event duration of 7 s was intended to capture most of the variance in response times. Thus, misses were only considered as a secondary performance measure. To compare effect sizes in the statistical analyses, performance measures were aggregated at the block level by calculating median response times (RT) and miss rates (MR) per participant and block.
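The block-level aggregation can be illustrated with a short sketch. The data layout (tuples of participant, block, and response time, with `None` marking a miss) and the function name `aggregate_blocks` are assumptions made for the example. The sketch also assumes that the median response time is computed over detected events only, which the article does not state explicitly.

```python
from collections import defaultdict
from statistics import median


def aggregate_blocks(trials):
    """Compute median RT and miss rate per (participant, block).

    trials: iterable of (participant, block, rt), where rt is the
    response time in seconds or None if the event was missed.
    """
    grouped = defaultdict(list)
    for participant, block, rt in trials:
        grouped[(participant, block)].append(rt)

    summary = {}
    for key, rts in grouped.items():
        hits = [rt for rt in rts if rt is not None]
        summary[key] = {
            # Median over detected events only (assumption).
            "median_rt": median(hits) if hits else None,
            # Share of events without a response before event end.
            "miss_rate": (len(rts) - len(hits)) / len(rts),
        }
    return summary


example = aggregate_blocks([
    ("p01", 1, 1.2), ("p01", 1, 2.0), ("p01", 1, None), ("p01", 1, 1.6),
])
```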
Physiological measures
Using the Pupil Diameter Analyzer (Kret & Sjak-Shie, 2019) of the PhysioData Toolbox version 0.6.3 (Sjak-Shie, 2022), we preprocessed raw pupil diameter data with the following sequential steps. Lower and upper cut-off values were set to 1.5 mm and 9 mm, respectively. Isolated data clusters were removed if they had durations of less than 50 ms and were separated from other clusters by more than 40 ms. Datapoints with a median absolute deviation (
Raw ECG data were preprocessed using Kubios HRV Premium version 3.5 (Tarvainen et al., 2014). Beat detection was followed by noise detection (set to “Medium”), artifact correction (Lipponen & Tarvainen, 2019), and the removal of nonstationary trends in the time series (Tarvainen et al., 2002). The resulting data were used to calculate the square root of the mean squared differences between successive RR intervals (RMSSD) as HRV indicator per participant and block. As a reference, we also report participants’ heart rate (HR) per block as a secondary ECG measure.
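The RMSSD definition given above can be written out directly. The sketch below only illustrates the formula on already cleaned RR intervals. It does not reproduce the beat detection, artifact correction, or detrending performed in Kubios, and the function name `rmssd` is chosen for the example.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive RR interval differences.

    rr_ms: sequence of RR intervals in milliseconds, in temporal order.
    """
    if len(rr_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    # Differences between each interval and its successor.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Example: successive differences of 10, -20, and 10 ms.
value = rmssd([800, 810, 790, 800])
```

In the example, the squared differences are 100, 400, and 100, so the result is the square root of their mean, 200.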
Subjective measures
Three subjective ratings were collected after each block as references for physiological measures. Mental fatigue (MF) and task engagement (TE) were assessed using single items on a scale from 0 (
Procedure
Participants were asked not to consume caffeine or nicotine for 4 hours, and alcohol for 12 hours, prior to the study. Upon arrival, participants handed over their smartphones and wristwatches to minimize external distractions. Then, they received written information about the study and provided signed informed consent. They were also informed that they would receive instructions on how to earn a bonus of 5 € later in the experiment. This was followed by preparation for physiological measurements, including eye makeup removal for eye-tracking. Finally, participants received written instructions for the experimental task and performed a 2-minute practice block. The duration of the practice block was examined in the pretest to achieve sufficient stabilization of task performance.
After answering participant questions, the experimenter initiated the experiment and the participants performed the ten task blocks. Participants were instructed to use the gaps between blocks to answer the subjective measure items only, and not to rest. Following block 8, they received short written instructions about the task reward condition. As a result, the average time between blocks 8 and 9 was about 25 s longer (
Data Analysis
After screening the performance data for outliers and physiological data for data quality, the main analysis was divided into two steps. First, to establish an overview of the direct effects of the included experimental manipulations, linear mixed models (LMM) were fitted to examine the effect of the three independent variables: task demands (low vs. high), time on task (1–10), and task reward (no reward vs. reward) on the two performance, three physiological, and three subjective measures. All models included interaction terms for task demands with the other two independent variables. Second, to test our hypothesis on the predictive value of physiological measures for task performance, PD and HRV were added sequentially to a baseline LMM of RT while controlling for the level of task demands. The likelihood ratios of the model steps were assessed to determine whether the physiological predictors added significant predictive value.
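The likelihood ratio comparison between two nested model steps can be sketched as follows. This is an illustration of the general test and not the authors' R code: the function name and the log-likelihood values are hypothetical, and the sketch covers only the case where one predictor is added, so that the test statistic is referred to a chi-square distribution with one degree of freedom. It also assumes that both models were fitted by maximum likelihood.

```python
import math


def likelihood_ratio_test_df1(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns the test statistic and its p-value under a chi-square
    distribution with 1 degree of freedom.
    """
    # Twice the gain in log-likelihood from the added predictor.
    stat = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For df = 1, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods of a baseline model and a model
# with one additional physiological predictor.
stat, p_value = likelihood_ratio_test_df1(-100.0, -98.0)
```

A gain of 2 log-likelihood units gives a statistic of 4, which corresponds to a p-value of about .046 for one added parameter.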
Model steps were also compared using the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). All LMMs included random intercepts for participants and were fitted with the R (version 4.3.1) package
Results
Data Check
We removed blocks from further analysis if they had an MR of 2
Manipulation Check
Performance measures
Figure 4 presents mean RT and MR. For RT, there was a significant effect of task demands, with longer RT in the high demand condition, and of task reward, with shorter RT in the reward blocks (see Table 1). The task reward effect was significantly larger in the high task demand condition. The effect of time on task was not significant. For MR, only the main effect of task reward was significant, with fewer misses when task reward was added. All other effects were nonsignificant.
Figure 4. Results for the performance measures.
Table 1. Linear Mixed Models for Performance Measures.
Physiological measures
Figure 5 presents mean-centered PD, RMSSD, and HR. PD showed a significant effect of time on task, with a decrease of PD over time on task, and a significant effect of task reward, with larger PD in the reward blocks (see Table 2). The other effects were nonsignificant. For RMSSD, the analysis also yielded significant effects for time on task and task reward. RMSSD increased over time and increased further when task reward was added. The remaining effects yielded nonsignificant results. HR showed only a significant effect of task reward, decreasing as reward was added.
Figure 5. Results for the physiological measures.
Table 2. Linear Mixed Models for Physiological Measures.
Subjective measures
Figure 6 presents the mean subjective ratings for MF, TE, and PE. Statistical analyses yielded congruent results across the three variables, with all showing significant effects of time on task and task reward, but no significant effect of task demands or either interaction term (see Table 3). Both MF and PE increased with time on task and decreased when task reward was added. TE showed the opposite effects, decreasing with time on task and increasing in the reward blocks.
Figure 6. Results for the subjective measures.
Table 3. Linear Mixed Models for Subjective Measures.
Hypothesis Test
Hypothesis Test: Linear Mixed Model for the Prediction of Median Response Time.
Post-Hoc Analysis
Post-Hoc: Linear Mixed Model for the Prediction of Median Response Time (Blocks 1–8).
Post-Hoc: Linear Mixed Model for the Prediction of Median Response Time (Blocks 7–10).
Multilevel Correlations
Multilevel Correlations Between Measures.
Discussion
The data supported our research hypothesis that pupil size and HRV are significant predictors of task performance, that is, response times. Nonetheless, the question remains as to whether they are reliable predictors. Post-hoc analyses revealed nuances of their predictive value, namely, that the associations of physiological measures with performance depended on the data subset. We discuss the implications of these findings by first examining the convergence of the physiological and subjective measures as indicators of participants’ mental effort, followed by illustrating how the task characteristics might have influenced the link between mental effort and task performance.
Indicators of Mental Effort
The associated trends in pupil size and subjective measures support the established literature, which suggests that pupil size can be an effective index of mental effort (Kahneman, 1973; van der Wel & van Steenbergen, 2018). Blocks with increased pupil size were associated with reports of higher task engagement and accompanied by lower mental fatigue. Specifically, the three measures indicated that spending more time on task decreased the investment of mental effort, and task reward increased the investment of mental effort. Consistent with previous research, the two task characteristics also induced a dissociation between
Unlike pupil size, HRV results were inconsistent with subjective measures and prior expectations, as the time on task effect and the task reward effect were in the same direction. Although this pattern of effects is consistent with response times and the inferential analysis indicated a significant association, the observed increase in HRV with task reward casts doubt on the reliability of this finding. HRV is usually expected to decrease rather than increase with higher motivation (Herlambang et al., 2019, 2021). Thus, it seems likely that placing the reward blocks at the end of the experiment confounded the task reward effect with the usual increase in HRV over time on task (Csathó et al., 2023). As the combination of the experimental design and the data prevents distinguishing these effects, the present study provides less clear evidence for HRV compared to pupil size. Accordingly, we base the further discussion of participants’ mental effort on the converging results of pupil size and subjective ratings.
Effort and Performance
In contrast to the physiological and subjective measures, task performance differed between the two task demand levels, with performance impaired at higher task demands. This suggests that participants did not cope with higher demands via investing more mental effort, that is, higher task engagement, and, thus, could not maintain their performance level. This rationale is plausible in the investigated monitoring task where participants could opt to maintain their effort level as the speed of process parameter variations changed even if they (un)willingly compromised their response latency to critical events. Hence, the analysis supported a direct effect of task demands on task performance but not a mediation through mental effort (see Figure 7). The absence of a task demand effect on mental effort prevented effort and performance from
Figure 7. Updated conceptual model of the role of operators’ mental effort.
Task performance did not show a significant decrease over time on task, despite the decrease in mental effort indicated by decreasing pupil size and task engagement. Therefore, time on task also induced insensitivity between the variables of interest, with performance being
Finally, task reward showed the expected effect, as the respective increase in mental effort, indicated by larger pupil size and higher subjective task engagement, was
Research Recommendations.
Limitations
In this article, we have argued for mental effort as a pragmatic solution for linking physiological measures and task performance. We have shown how this approach integrates relevant theory and empirical evidence, and in the discussion section we have illustrated how it can be used to interpret unreliable associations between these variables. However, these interpretations should be treated with caution. Our experiment yielded a complex array of associative, nonassociative, and dissociative patterns between the examined variables, some of which deviated from our a priori assumptions. For example, we did not find reliable associations between pupil size and task performance outside of the task reward blocks. While this observation can be explained within an effort-based account, these explanations rely on post-hoc rationalizations (see differences between Figures 7 and 2) that require further empirical investigation and validation.
In addition, there are explanatory approaches in the literature for associations of physiological measures with task performance other than mental effort. For example, physiological measures have been used as indicators of a general arousal state that correlates with operators’ stress. Following this approach, the association between pupil size and task performance in the task reward blocks could be interpreted as participants being in a more performance-conducive arousal state. Here, the present study cannot provide definitive evidence for or against the potential, partially overlapping conceptual accounts. In fact, making such distinctions is hampered by the need to adhere to observational analyses when examining associations between physiological measures and performance, as neither can be directly manipulated as part of an experimental design. Furthermore, the theoretical nature of the psychological constructs and the indirect relationship of the physiological responses with cognitive states and processes make establishing one-to-one relations difficult or even conceptually unlikely. Thus, only the accumulation of evidence in future empirical research can conclusively answer the question of which conceptual account is most beneficial for describing and predicting associations between physiological measures and task performance in the search for physiological performance predictors.
Regarding our statistical analyses, we used a comparatively long time interval to aggregate physiological data. We did so to compare time intervals with the same number of critical events that we could match to per-block subjective ratings and to conduct joint analyses for pupil size and HRV, as the latter requires longer measurement intervals for reliable estimates. This approach allowed us to investigate overarching effects in associations between physiological measures and task performance to examine their value in assessing operators’ current ability to perform the task. That said, optimizing the length of the time interval provides further opportunities to get more precise estimates of physiological measures’ predictive potential. Moreover, we opted for linear relationships in statistical modeling as they were most suitable for the obtained data. However, researchers should also consider the possibility of nonlinear associations (e.g., van den Brink et al., 2016), especially in the investigation of operator overload.
Conclusion
Based on previous empirical findings, we have outlined how mental effort can serve as the necessary conceptual link between physiological measures and task performance, allowing for consistent interpretations across different domains of human factors research. On this basis, the present study indicated that physiological measures, specifically pupil size, can make a meaningful contribution to the prediction of task performance by capturing performance changes induced by variations in mental effort. However, the empirical findings also highlight the need to account for confounding effects that alter the association between effort and performance. This will be necessary to make reliable progress in establishing physiological performance predictors and using them for dynamic operator assistance.
Key Points
• Pupil size effectively captured changes in mental effort, indicating decreases over time on task and increases with the addition of task reward. • HRV results were inconclusive, as the effects of time on task and task reward were confounded and did not match subjective ratings. • Both pupil size and HRV significantly contributed to the prediction of task performance. • Task demands and time on task introduced confounding effects on the link between mental effort and task performance.
Supplemental Material
Supplemental Material - Physiological Predictors of Operator Performance: The Role of Mental Effort and its Link to Task Performance
Supplemental Material for Physiological Predictors of Operator Performance: The Role of Mental Effort and its Link to Task Performance by Sebastian Pütz, Alexander Mertens, Lewis L. Chuang, and Verena Nitsch in Human Factors.
Footnotes
Acknowledgments
The authors would like to thank their research assistants Annika Laura Felter, Mirlinda Hajdari, and Manuel Krebs for their support in conducting the study.
Author Contributions
Sebastian Pütz: Conceptualization, Formal analysis, Investigation, Methodology, Software, Visualization, Writing – original draft. Alexander Mertens: Funding acquisition, Project administration, Resources, Writing – review & editing. Lewis Chuang: Supervision, Writing – review & editing. Verena Nitsch: Resources, Supervision, Writing – review & editing.
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2023 Internet of Production – 390621612.
