Abstract
A model of arousal dynamics is applied to predict objective performance and subjective sleepiness measures, including lapses and reaction time on a visual Performance Vigilance Test (vPVT), performance on a mathematical addition task (ADD), and the Karolinska Sleepiness Scale (KSS). The arousal dynamics model comprises a physiologically based flip-flop switch between the wake- and sleep-active neuronal populations and a dynamic circadian oscillator, thus allowing prediction of sleep propensity. Published group-level experimental constant routine (CR) and forced desynchrony (FD) data are used to calibrate the model to predict performance and sleepiness. Only the studies using dim light (<15 lux) during alertness measurements and controlling for sleep and entrainment before the start of the protocol are selected for modeling. This is done to avoid the direct alerting effects of light and effects of prior sleep debt and circadian misalignment on the data. The results show that a linear combination of circadian and homeostatic drives is sufficient to predict dynamics of a variety of sleepiness and performance measures during CR and FD protocols, with sleep-wake cycles ranging from 20 to 42.85 h and a 2:1 wake-to-sleep ratio. New metrics relating model outputs to performance and sleepiness data are developed and tested against group average outcomes from 7 (vPVT lapses), 5 (ADD), and 8 (KSS) experimental protocols, showing good quantitative and qualitative agreement with the data (root mean squared error of 0.38, 0.19, and 0.35, respectively). The weights of the homeostatic and circadian effects are found to be different between the measures, with KSS having stronger homeostatic influence compared with the objective measures of performance.
Using FD data in addition to CR data allows us to challenge the model in conditions of both acute sleep deprivation and structured circadian misalignment, ensuring that the role of the circadian and homeostatic drives in performance is properly captured.
In the modern 24/7 society, reduced alertness due to irregular work hours and insufficient sleep has become a major safety and productivity issue in occupational settings, on the roads, and for general health and well-being (Rajaratnam et al., 2013; Rajaratnam and Arendt, 2001). A physically based mathematical model that could predict alertness in a wide variety of complicated operational settings would be of high practical utility. Alertness is affected by multiple physiological and environmental factors, including prior sleep history, chronic and acute sleep restriction, circadian phase, light exposure, use of stimulants and sedatives, work schedules, and the task being performed. So far, none of the existing models account for the complex interaction among these factors. In this study we enable prediction of the most commonly used subjective sleepiness and objective performance measures in our arousal dynamics model and lay the groundwork for incorporation of these various additional factors.
Multiple models have been developed to predict alertness, sleepiness, and performance (Achermann and Borbély, 1994; Akerstedt et al., 2008; Akerstedt and Folkard, 1995, 1997; Belyavin and Spencer, 2004; Borbély, 1982; Daan et al., 1984; Folkard et al., 1999; Fulcher et al., 2010; Hursh et al., 2004; Ingre et al., 2014; Jewett and Kronauer, 1999; Johnson et al., 2004; McCauley et al., 2013; Phillips et al., 2017; Rajdev et al., 2013; Ratcliff and Van Dongen, 2011; St. Hilaire et al., 2017). Collectively, these models have been applied to predict group-level subjective sleepiness and objective performance during acute sleep deprivation, chronic sleep restriction, jet-lag, and some types of shiftwork. Despite these advances, there are several limitations: Most of the above models (1) focused on only one experimentally observed measure of alertness, which limits future testing to studies reporting that measure, and (2) used data sets with high illuminance of different levels (often >150 lux) to fit alertness parameters but did not correct for those differences. The use of experimental data obtained under different light conditions is undesirable for model development because light has been shown to directly increase alertness independent of its phase-shifting effects (Cajochen et al., 2000; Chang et al., 2013; Phipps-Nelson et al., 2003; Rahman et al., 2014; Rüger et al., 2006); thus, light must be controlled during alertness measurements if models are to be correctly calibrated.
In this study we apply our model of arousal dynamics that accounts for sleep propensity (Postnova et al., 2016) to predict sleepiness and performance and to account for the above limitations. In particular, (1) our arousal dynamics model predicts sleep times as calibrated and tested against laboratory sleep deprivation and forced desynchrony data and does not require them as an input; (2) we develop and test a variety of alertness measures against 10 published experimental studies reporting sleepiness as well as performance during controlled constant routine (CR) and forced desynchrony (FD) protocols; and (3) we select only studies that use dim light exposure (<15 lux) during wakefulness to avoid the direct alerting effects of light. We focus on FD and CR data, because FD data allow us to investigate the effects of the circadian and homeostatic drives separately on performance and sleepiness, whereas CR data demonstrate their combined effect at one particular phase relationship (Lockley et al., 2008).
Methods
A schematic of our model of arousal dynamics is shown in Figure 1. It incorporates mutually inhibitory interactions between the monoaminergic (MA) wake-active neuronal populations and the sleep-active ventrolateral preoptic nucleus (VLPO) (Phillips and Robinson, 2007; Saper et al., 2010). Activity of these populations is regulated by the homeostatic (H) and circadian (C) drives, which collectively introduce a sleep drive to the VLPO depending on the time awake and circadian phase. The phase of the drive C, in turn, depends on the prior light exposure according to the human phase- and dose-response curves (Forger et al., 1999; Jewett and Kronauer, 1999; Kronauer et al., 1999; St. Hilaire et al., 2007, 2012). The model has been used previously to examine the effects of shiftwork, sleep deprivation, caffeine, and other conditions (Fulcher et al., 2010; Phillips et al., 2011; Postnova et al., 2012, 2013; Puckeridge et al., 2011; Robinson et al., 2015) and was recently improved to account for the circadian profile of sleep propensity (Postnova et al., 2016). The model and methods for simulation of experimental protocols have been described previously in great detail (Phillips et al., 2011; Postnova et al., 2014, 2016) but we provide all equations and parameter values in Table 1 for completeness.

Figure 1. Schematic of the key model components. Illuminance of the environmental light, I, is gated by the sleep-wake state, S, and processed by the photic drive, P, which simulates the activation-deactivation dynamics of photoreceptors in the eye. Nonphotic drive, Dn, is modulated by S and, together with P, affects the circadian drive, C, which is represented by a nonlinear combination of the circadian variables X and Y. Drive C suppresses the activity of the sleep-active neurons in the ventrolateral preoptic nucleus (VLPO), reducing its mean voltage Vv and mean firing rate Qv. The VLPO has a mutually inhibitory connection with the wake-active monoaminergic nuclei (MA), thus constituting the so-called sleep-wake switch. Activation of the MA (wake) leads to increase of the homeostatic drive H, which together with the drive C provides for a total sleep drive to the VLPO, which controls the switch between the arousal states. When sleep deprivation is simulated, a wake effort, W, is introduced to keep MA in a wake state with Vm ≥ VWE. For further detail, see Table 1 and Postnova et al. (2016).
Table 1. Model equations and parameter values.
For further detail, see Figure 1 and Postnova et al. (2016).
At the start of each simulation, the model is in an entrained state with sleep occurring between 2330 and 0800 h. Ambient light conditions for entrainment are 250 lux between 0800 h and 2000 h, 40 lux outside of these hours if awake, and 0 lux during sleep.
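The entrainment light profile described above can be written as a small lookup. This is a minimal sketch only: the function names are hypothetical, the treatment of the boundary times (0800 h and 2000 h) is an assumption, and in the model itself sleep timing is an output rather than a fixed window.

```python
def entrainment_lux(clock_h: float, asleep: bool) -> float:
    """Ambient illuminance (lux) during the entrainment period.

    250 lux between 0800 and 2000 h, 40 lux outside these hours if awake,
    0 lux during sleep. Boundary handling is an assumption.
    """
    if asleep:
        return 0.0
    h = clock_h % 24.0
    return 250.0 if 8.0 <= h < 20.0 else 40.0


def in_entrained_sleep_window(clock_h: float) -> bool:
    """True inside the entrained sleep window, 2330 to 0800 h.

    Treating the window as fixed is an illustration only; the arousal
    dynamics model predicts sleep times instead of taking them as input.
    """
    h = clock_h % 24.0
    return h >= 23.5 or h < 8.0
```

For example, `entrainment_lux(21.0, asleep=False)` returns the 40-lux evening level, and any clock time with `asleep=True` returns 0 lux.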
Studies Modeled
From the experimental literature on alertness, sleepiness, and performance, we select laboratory sleep studies that described alertness dynamics in healthy participants either during acute sleep deprivation (CR) or during FD protocols and that (1) used dim light levels below 15 photopic lux during wakefulness to avoid direct alerting effects of light; (2) reported results for visual Performance Vigilance Test (vPVT) and/or Karolinska Sleepiness Scale (KSS) among other measures (vPVT and KSS were selected because they were most commonly reported in literature); (3) measured performance and sleepiness multiple times during wakefulness with protocol duration and sampling rate sufficient to illustrate the underlying circadian dynamics (typically <4 h step size for ≥24 h protocols); and (4) controlled for sleep and circadian entrainment via prestudy screening of sleep and measurement of the circadian phase at the baseline to reduce the potential effects of circadian misalignment and sleep debt on alertness. For FD protocols, we choose only those that had a 2:1 wake-to-sleep ratio to minimize effects of chronic sleep restriction because these are not yet incorporated into our model. We also do not include studies reporting results for different chronotypes, as this is beyond the aim of the current work. The resulting list of studies is shown in Table 2 with the corresponding objective and subjective measures of performance and sleepiness that were reported and are being predicted in this study.
Table 2. Experimental studies used for model fitting and testing.
ADD = mathematical addition task; CRdur = duration of constant routine; DSST = digit symbol substitution task; KSS = Karolinska Sleepiness Scale; N = number of participants; T = duration of the sleep-wake cycles on forced desynchrony protocols; vPVTL = number of lapses on a visual Performance Vigilance Test; vPVTRT = reaction time on a visual Performance Vigilance Test. ✓ indicates measures reported in a given study; ✓ m and ✓ b indicate studies where the measure was reported as a deviation either from the study mean or from the baseline mean, respectively. Data sets that were used for fitting of the default parameter set are indicated with asterisks.
The CR studies in Table 2 reported dynamics over time awake in conditions of dim light and constant posture for up to 40 h of wakefulness (Chua et al., 2012; Ftouni et al., 2013; Jung et al., 2011; Shekleton et al., 2013). These reflect the interplay of the circadian and homeostatic drives in their control of sleepiness and performance at one phase relationship. Conversely, FD studies are designed to uncouple the C and H drives to study their effects separately and their interactions at all phase relationships. This is done by introducing sleep-wake cycles of durations (T) that are sufficiently different from 24 h to prevent circadian entrainment (i.e., outside the range of entrainment) and have sleep scheduled at different circadian phases. Studies considered here had T = 20 h (Silva et al., 2010; Wyatt et al., 1999), T = 28 h (Burke et al., 2015; Lee et al., 2009), and T = 42.85 h (Grady et al., 2010; Wyatt et al., 2004). These protocols lasted for about a month to collect a sufficient amount of data. Waking time was spent in dim light, with a typical illuminance of 3.3 lux and never above 15 lux. Performance and sleepiness measures were collected at 1- to 3-h intervals during scheduled wake episodes.
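To make the FD schedules concrete, the sketch below lays out wake and sleep episodes for a given cycle length T with the 2:1 wake-to-sleep ratio used in the selected studies. The function name and the convention that each cycle starts with the wake episode are assumptions made for illustration.

```python
def fd_schedule(T_h: float, n_cycles: int, wake_fraction: float = 2.0 / 3.0):
    """Return a list of (wake_start, sleep_start, sleep_end) tuples in hours.

    A 2:1 wake-to-sleep ratio means two thirds of every cycle is scheduled
    wake. Starting each cycle with wake is an assumption of this sketch.
    """
    wake_h = T_h * wake_fraction
    schedule = []
    for k in range(n_cycles):
        start = k * T_h
        schedule.append((start, start + wake_h, start + T_h))
    return schedule
```

For T = 28 h this gives about 18.67 h of scheduled wake and 9.33 h of scheduled sleep per cycle, and for T = 20 h about 13.33 h and 6.67 h.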
To extract the data values from figures in publications, a Java-based shareware program was used (DataThief III; http://datathief.org/). This allowed for digital estimation of data values and enabled quantitative comparison with simulated data.
Additional Processing of Experimental Data
Three measures of performance and sleepiness were most commonly reported across the above studies. (1) The first measure entailed objective measurements derived from reaction time on a visual Performance Vigilance Test (vPVTRT). All studies used a 10-min vPVT test, and the derived measurements included total number of lapses (vPVTL) during the test (a lapse is counted when vPVTRT is >500 ms), mean and median vPVTRT, and mean fastest and slowest 10% vPVTRT. (2) The second measure consisted of number of correct answers on cognitive tasks such as a mathematical addition task (ADD) and digit symbol substitution task (DSST). Duration of these tests varied across the studies from 2 to 4 min for ADD and from 1.5 to 3 min for DSST. (3) The third measure was the subjective score on the KSS, which ranges from 1 = “Extremely alert” to 9 = “Extremely sleepy, fighting sleep.”
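The vPVT-derived measures listed above can be computed from the reaction times of one test session roughly as follows. This is a sketch, not the scoring procedure of any particular study: the function name, the dictionary keys, and the rounding used to pick the fastest and slowest 10% of trials are assumptions.

```python
from statistics import mean, median

LAPSE_THRESHOLD_MS = 500.0  # a lapse is a reaction time above 500 ms


def pvt_summary(rt_ms):
    """Summarize one vPVT session from a list of reaction times (ms)."""
    rts = sorted(rt_ms)
    k = max(1, round(0.10 * len(rts)))  # size of the 10% tails (assumed rounding)
    return {
        "lapses": sum(1 for rt in rts if rt > LAPSE_THRESHOLD_MS),
        "mean_rt": mean(rts),
        "median_rt": median(rts),
        "fastest10_mean": mean(rts[:k]),
        "slowest10_mean": mean(rts[-k:]),
    }
```

Note that a reaction time of exactly 500 ms is not counted as a lapse here, following the ">500 ms" definition in the text.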
Most experimental studies reported absolute (nonrelative) values for vPVT and KSS, with some reporting relative (e.g., to baseline or study mean) and/or otherwise transformed data. Some types of transformations, such as logarithmic transformation for vPVTRT or square root transformation for vPVTL (e.g., Anderson et al. 2013; Ftouni et al. 2013), cannot be recalculated back to absolute values from published group mean data because the transformations were done on individual data before averaging. Thus, these types of transformed data cannot be compared with nontransformed data and are not used here. We aim to predict absolute values of vPVT and KSS, so only nontransformed data are used for model calibration for these measures, while data reported as relative to baseline/study mean are used for additional testing.
Unlike vPVT and KSS, both ADD and DSST have a learning component; that is, the number of correct responses is affected not only by alertness and test duration but also by how many times the test was repeated, which varies significantly across the protocols. As a result, these data are not directly comparable between the studies. Thus, to enable comparison and modeling of ADD and DSST measures, we (1) recalculate all experimental ADD and DSST data as a mean number of correct tasks per 1 min and (2) calculate deviation from study mean, which minimizes learning effects when the data are averaged according to time after wake or to circadian phase. These processed cognitive measures, ΔADDm and ΔDSSTm, demonstrate the overall trend of changes in performance depending on time since wake or the circadian phase but not the actual values.
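The two processing steps for ADD and DSST can be sketched as below: counts are first converted to correct responses per minute, and the study mean is then subtracted. The function name is hypothetical, and the sketch assumes a single test duration per study.

```python
def deviation_from_study_mean(correct_counts, test_duration_min):
    """Convert raw counts to per-minute rates and subtract the study mean.

    correct_counts: number of correct responses at each reported time point.
    test_duration_min: duration of one test in minutes (assumed constant
    within a study).
    """
    per_min = [c / test_duration_min for c in correct_counts]
    study_mean = sum(per_min) / len(per_min)
    return [x - study_mean for x in per_min]
```

By construction the returned values average to zero, so they carry the trend across time awake or circadian phase but not the absolute performance level.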
Additional processing of experimental data includes removal of data that might be affected by sleep inertia. Most of the studies did not provide enough data during the first 3.5 h after awakening, so sleep inertia is not modeled here, and the first 3.5 h of data are removed from all time-after-wake data sets to avoid the potential effects of sleep inertia on the fitted model parameters (Jewett et al., 1999).
Processing of Simulated Data
We simulate each of the experimental protocols in Table 2 with the arousal dynamics model. The outputs used to model sleepiness and performance measures are processed in the same way as reported in the respective CR and FD experiments. In particular, to align with experimental data in CR studies, values of the model variables are selected at the same times as data reported in experiments. In FD protocols for data reported versus time awake, each model variable is averaged across all wake episodes to obtain averaged time series versus time awake for that variable, and the data points at times reported in the experiments are selected. For the circadian phase dependency, the variable values observed during wakefulness are averaged within 60° circadian bins relative to core body temperature minimum (CBTmin) for all studies except Wyatt et al. (2004), where the phase is reported relative to melatonin peak (MELpeak). These processed variables are then used to predict performance and sleepiness.
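The circadian-phase averaging can be sketched as follows. The function name is hypothetical, and centering the bins on multiples of 60° (so that the bin labeled 0° spans -30° to 30° relative to CBTmin) is an assumption; the source states only the 60° bin width.

```python
def bin_by_circadian_phase(phases_deg, values, bin_width_deg=60.0):
    """Average values within circadian-phase bins.

    phases_deg: circadian phase of each wake sample, in degrees relative
    to the phase marker (CBTmin or melatonin peak).
    Returns {bin_center_deg: mean_value} for bins that contain data.
    Bins are centered on multiples of bin_width_deg (an assumption).
    """
    n_bins = int(round(360.0 / bin_width_deg))
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for phase, value in zip(phases_deg, values):
        shifted = (phase + bin_width_deg / 2.0) % 360.0
        idx = int(shifted // bin_width_deg) % n_bins  # guard against rounding to 360
        sums[idx] += value
        counts[idx] += 1
    return {
        i * bin_width_deg: sums[i] / counts[i]
        for i in range(n_bins)
        if counts[i] > 0
    }
```

The same function would apply to both simulated variables and digitized experimental data, so that the two are averaged in the same way before comparison.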
Modeling Performance and Sleepiness
We use multivariate linear regression to predict temporal dynamics of performance and sleepiness measures from variables of the arousal dynamics model. A general form for linear regression is

$$A = \Theta_0 + \sum_{j=1}^{n} \Theta_j F_j \tag{18}$$
where A is the desired alertness measure, n is the number of input variables, and F is the matrix of input variables with each variable represented by a vector of values Fj. Equation 18 is directly applied for prediction of KSS and vPVT measures. Prediction of ΔADDm and ΔDSSTm requires adjustment, however, as deviations from the study mean rather than absolute values are predicted to avoid the learning effect. In this case, Equation 18 is modified as

$$A_m = \sum_{j=1}^{n} \Theta_j F_{j,m} \tag{19}$$
with m denoting deviation from the study mean. Equation 19 has one fewer parameter than Equation 18 because the bias term is zero.
Calibration and Resulting Default Parameter Set
To determine the default parameter set, Θ, that should be used in future predictions of performance and sleepiness where no information except a protocol is available, we calibrate the model by fitting it to all suitable data at the same time. The studies used for fitting the default parameter set for different measures are indicated with an asterisk in Table 2. The vector of parameters Θ that gives the best fit to experimental observations E is found by solving the normal equation:

$$\Theta = (F^{T} F)^{-1} F^{T} E \tag{20}$$
For the fitting of deviations from the mean, this is represented by

$$\Theta = (F_m^{T} F_m)^{-1} F_m^{T} E_m \tag{21}$$
The same formula is used for fitting deviations from the baseline mean (b). In Table 2, the studies reporting deviation from the study and baseline means are indicated with m and b, respectively.
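A least-squares fit via the normal equation can be sketched in plain Python as below. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the bias term Θ0 is handled by prepending a column of ones to F, and the small linear system is solved by Gaussian elimination instead of an explicit matrix inverse.

```python
def solve_linear(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    A = [row[:] + [b[i]] for i, row in enumerate(M)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system: input variables are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


def fit_normal_equation(inputs, E, bias=True):
    """Least-squares parameters from (F^T F) Theta = F^T E.

    inputs: one row per data point, each row [F_1, ..., F_n].
    E: observed values, one per data point.
    bias=True prepends a column of ones so that Theta[0] is the bias term;
    bias=False suits deviation-from-mean data, where the bias is zero.
    """
    F = [([1.0] if bias else []) + [float(v) for v in row] for row in inputs]
    p = len(F[0])
    FtF = [[sum(r[i] * r[j] for r in F) for j in range(p)] for i in range(p)]
    FtE = [sum(r[i] * e for r, e in zip(F, E)) for i in range(p)]
    return solve_linear(FtF, FtE)
```

To fit several studies at once, their rows would simply be stacked into one `inputs` list and one `E` list before calling `fit_normal_equation`.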
We use leave-one-out cross-validation to estimate how well this default model would generalize to an independent data set for each of the sleepiness and performance measures. To do so, we iteratively choose one test study for each of the modeled measures and fit the model in Equations 18 or 19 to the remaining data. The goodness of fit is then calculated for the test data using normalized root mean squared error (NRMSE), that is, the root mean squared error divided by the range of the experimental data:

$$\mathrm{NRMSE} = \frac{1}{E_{\max} - E_{\min}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} (A_i - E_i)^2} \tag{22}$$
where N is the number of data points. The final prediction error, 〈NRMSE〉, was calculated as the mean of the errors (NRMSE) across all test data sets. We also calculated prediction error including studies reporting data relative to mean or baseline, 〈NRMSEmb〉. This is the same as 〈NRMSE〉 for relative measures (ΔADDm and ΔDSSTm) but different for vPVT and KSS, because for them the relative data were not used in calibrating the default parameter set.
The goodness of fit is best when NRMSE = 0, indicating that predicted data fully coincide with experimental data (A = E). NRMSE > 1 indicates that on average, the difference between predictions and data is larger than the experimental data range. Overall, lower NRMSE indicates better agreement between the model and the data.
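The error measure and the leave-one-out procedure can be sketched as follows. The NRMSE is normalized by the range of the experimental data, as implied by the interpretation of NRMSE > 1 above. The `fit` and `predict` callables are hypothetical placeholders for whatever regression is being validated.

```python
import math


def nrmse(predicted, observed):
    """Root mean squared error divided by the range of the observed data."""
    n = len(observed)
    rmse = math.sqrt(sum((a - e) ** 2 for a, e in zip(predicted, observed)) / n)
    data_range = max(observed) - min(observed)  # must be nonzero
    return rmse / data_range


def leave_one_out(studies, fit, predict):
    """Leave-one-study-out cross-validation.

    studies: list of (inputs, observed) pairs, one pair per study.
    fit(train_studies) -> theta, where train_studies excludes the test study.
    predict(theta, inputs) -> list of predicted values.
    Returns (mean NRMSE across test studies, list of per-study NRMSE).
    """
    errors = []
    for k, (test_inputs, test_obs) in enumerate(studies):
        train = [s for i, s in enumerate(studies) if i != k]
        theta = fit(train)
        errors.append(nrmse(predict(theta, test_inputs), test_obs))
    return sum(errors) / len(errors), errors
```

A study whose observed values are all identical would give a zero range and a division by zero, so such a case would need separate handling.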
Results
Of the measures reported in the experimental studies in Table 2, we focus on vPVTL, ΔADDm, and KSS, because they had the most available data. We also demonstrate default parameter sets and prediction error for the remaining measures, ΔDSSTm and vPVTRT, but note that those should be revisited when more data are available.
Since both the homeostatic and circadian drives are known to affect alertness measures, we test different homeostatic and circadian input variable combinations. The physiological mechanisms of various alertness measures are still not known. Thus, it is important to test different input combinations to (1) examine whether the same inputs can predict different outputs (e.g., both objective and subjective measures) and (2) identify the smallest number of inputs allowing for a good quantitative fit. Circadian variables include C, X, Y, and W (wake effort), and the homeostatic variables are H and W. Note that the wake effort, W (not to be confused with the Process W describing sleep inertia in Akerstedt and Folkard, 1995), is affected by both H and C and thus appears in both sets. We find that (1) linear combinations with 2 input variables (one homeostatic and one circadian) demonstrate similar 〈NRMSE〉 to combinations with 3 or more input variables, even though the latter have more fitting parameters; (2) use of W alone, as done by Fulcher et al. (2010), provides a good overall fit but underpredicts circadian influence during prolonged wakefulness; (3) most of the 2-variable combinations demonstrate similar prediction error, but the combinations H,X and H,Y show worse overall performance, with an average of 33% higher 〈NRMSE〉 across the sleepiness and performance measures; (4) the remaining five 2-variable combinations all show good predictions, with mean 〈NRMSE〉 across vPVTL, ΔADDm, and KSS of 0.37 ± 0.06. As such, there is not enough evidence to clearly select one of these linear combinations over the others on the basis of their performance on the available data set.
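The 2-variable candidates can be enumerated as one homeostatic input paired with one circadian input. The pairing rule below is an assumption about how the combinations were formed; it yields seven pairs, and five remain once H,X and H,Y are set aside, which is consistent with the count given in the text.

```python
from itertools import product

HOMEOSTATIC = ("H", "W")           # W is influenced by both drives
CIRCADIAN = ("C", "X", "Y", "W")


def two_variable_combinations():
    """All (homeostatic, circadian) input pairs with two distinct variables."""
    pairs = []
    for h, c in product(HOMEOSTATIC, CIRCADIAN):
        if h == c:
            continue  # W paired with itself is a single input, not a pair
        pairs.append((h, c))
    return pairs


def remaining_after_exclusion(pairs, excluded=(("H", "X"), ("H", "Y"))):
    """Drop the pairs reported as performing worse."""
    return [p for p in pairs if p not in excluded]
```

Each pair would then be fitted and cross-validated in the same way, so that the resulting 〈NRMSE〉 values are directly comparable.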
We show detailed results for a linear combination of H and C, because this (1) allows us to study the ratio of the homeostatic and circadian influence on performance and sleepiness directly and (2) is commonly used in other models based on the two-process concept. Equation 18 for prediction of sleepiness/performance measures, A, is thus modified as

$$A = \Theta_0 + \Theta_H H + \Theta_C C \tag{23}$$
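Evaluating the linear combination of H and C for a simulated time series is a one-line computation, sketched below together with the ratio of the two weights. The function names are hypothetical, and any numbers passed in are placeholders, not the calibrated values.

```python
def predict_measure(H, C, theta_0, theta_H, theta_C):
    """Predict a sleepiness or performance measure from the two drives.

    H, C: sequences of homeostatic and circadian drive values sampled at
    the same time points. The theta arguments are the fitted weights.
    """
    return [theta_0 + theta_H * h + theta_C * c for h, c in zip(H, C)]


def drive_ratio(theta_H, theta_C):
    """Relative weighting of homeostatic to circadian drive, |theta_H / theta_C|."""
    return abs(theta_H / theta_C)
```

For deviation-from-mean measures the same function applies with `theta_0 = 0` and with H and C expressed as deviations from their means.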
Calibrated Model and Default Parameter Set
Table 3 shows the parameters Θ0, ΘH, and ΘC for the different performance and sleepiness measures obtained by simultaneously fitting the linear combination of H and C produced by simulation of the experimental protocols with the arousal dynamics model to all eligible data (see Methods). These are the default parameters that should be used for prediction of performance and sleepiness when there is no prior knowledge of participants’ dynamics. The prediction errors 〈NRMSE〉 and 〈NRMSEmb〉 in Table 3 show how well the model would be expected to perform against an independent data set as calculated with a standard leave-one-out cross-validation approach and, in the case of vPVT and KSS, addition of independent testing data sets reporting values relative to study and baseline mean. As seen from the table, the prediction error for ΔADDm (〈NRMSE〉 = 0.19) is lower than for vPVTL or KSS (0.4 and 0.23, respectively). This is explained by subtraction of the study mean from the ADD values, which minimizes individuals’ inherent variation in ADD levels and learning rates, leaving only the dynamic response to the protocols. The error calculated across all available data for vPVTL and KSS, including the testing data sets reported as relative to study/baseline mean (〈NRMSEmb〉), however, indicates improved prediction for vPVTL and worsened prediction for KSS (0.38 and 0.35, respectively).
Table 3. Default parameters and model prediction error.
Ratio |ΘH/ΘC| shows relative weightings of the homeostatic and circadian drives in each measure. 〈NRMSE〉 is the model prediction error obtained with leave-one-out cross-validation approach using all eligible data. 〈NRMSEmb〉 is the model prediction error accounting for data reported as the deviation from the study mean and from baseline (if available) where the nontransformed value was modeled. The values shown in parentheses correspond to the fitting case excluding the Shekleton et al. (2013) study from the data set.
The higher 〈NRMSE〉 for vPVTL compared with KSS is largely explained by the differences in the PVT data reported in the study by Shekleton et al. (2013) compared with the others as demonstrated in Figure 2. Conversely, KSS values reported by Shekleton et al. are similar to the other studies. The protocols and measurements in these studies are nearly identical, so the differences in vPVT are expected to be mainly due to individual variability and small numbers of subjects in all studies (n < 20). To demonstrate this, we also fit the model and run cross-validation for all studies excluding the Shekleton et al. data, which leads to a lower 〈NRMSE〉 = 0.31 for vPVTL but a slightly higher 〈NRMSE〉 = 0.28 for KSS.

Figure 2. Model predictions for the number of visual PVT lapses (vPVTL) compared with experimental data obtained during constant routine (CR) and forced desynchrony (FD) protocols. Model predictions with the default parameters are shown with solid black line for CR and filled black circles for FD protocols. Best model fits to individual studies are shown in gray and data are shown with open symbols. (a) vPVTL versus time awake during CR. Experimental data are from Jung et al. (2011) (○), Shekleton et al. (2013) (□), and Chua et al. (2012) (∇). (b) vPVTL dynamics on T = 20 h protocol by Silva et al. (2010). (c) dynamics of vPVTL relative to study mean on T = 20 h protocol by Wyatt et al. (1999). (d) vPVTL dynamics on T = 28 h protocol by Lee et al. (2009) and (e) on T = 42.85 h protocol by Grady et al. (2010). Left panels show data versus time awake. Right panels show data versus circadian phase relative to CBT minimum, which are averaged within 60° circadian bins.
To estimate how good the obtained 〈NRMSE〉 values are, we also calculate NRMSE between experimental studies to estimate variability in experimental data. We find that NRMSE comparing experimental data on similar protocols ranges from 0.103 for ΔADDm to 0.87 for vPVTL and 0.419 for KSS. This shows that our model prediction error falls in the range of between-study variability.
The goodness of fit obtained with the default parameter set for the specific studies is shown in Table 4 and the dynamics are shown in Figures 2 through 4. In addition, we demonstrate the dynamics and goodness of fit for best fits to individual studies, where Equation 23 is fitted to each data set separately, to examine variability in fitted parameters among the studies in Figure 5 and compare to the default model.
Table 4. Goodness of fit for default parameters compared with best study fit in specific studies.
The table shows comparison of goodness of fit in individual studies using default Θ and best study fit (Θstudy) parameters. For calculation of default NRMSE, the parameters in Table 3 were used for each of the corresponding measures and are the same for all studies. Best study fit NRMSE is found when the model is fitted to each specific study separately, so each study and sleepiness/performance measure has its own parameter set, Θstudy. For FD protocols, the best study fit is found across combined time since waking and circadian data. Best study fit NRMSE is calculated separately for time since waking and circadian phase data to illustrate how well the homeostatic and circadian dynamics are reproduced.
Prediction of Objective Performance—vPVTL
Comparison of the model dynamics with experimental data for vPVTL is shown in Figure 2, with the goodness of fit to specific studies reported in Table 4. For vPVTL, the default parameter set Θ0, ΘH, ΘC in Table 3 is calibrated by fitting to data from 3 CR and 3 FD studies shown in Figure 2 (a, b, d, e). The FD study by Wyatt et al. (1999) in Figure 2c is used only for testing because it reported values relative to study mean. We demonstrate the dynamics of the default model (shown in black) along with the best study fit (gray) obtained by fitting to the specific studies separately.
On a CR protocol in Figure 2a, the default model (black line) predicts vPVTL dynamics that fall between the experimentally reported values due to large variability in the experimental data. The high number of lapses reported in the study by Shekleton et al. compared with others leads to increased homeostatic influence in the default parameter set. Individual study fits, however, show very good qualitative and quantitative agreement with all three CR data sets. A low number of lapses is observed during the first 16 h after awakening, and the wake maintenance zone (WMZ) is prominent in the individual fits for studies by Chua et al. (2012) and Jung et al. (2011). The worst vPVTL performance (23 lapses) on the protocol is observed at 24.5 to 27 h after awakening (~0830-1100 h with 0800 h wake time). The performance subsequently improves as time awake is increased due to circadian influence and reaches 19 lapses at 37.8 h after waking (~2150 h).
The model simulations of the FD protocols in Figure 2 (b-e) reproduced the decrease in performance (increase in vPVTL) as a function of time since wake, with the default parameter set providing good agreement across 2 out of 4 studies. The best study fit parameters allow for a near-identical match to most data with mean NRMSE of 0.17, demonstrating that the observed difference for the default set is likely due to individual variability between study participants.
The model, likewise, reproduces the dynamics versus circadian phase with maximum vPVTL (worst performance) predicted at CBTmin (0° or 0600 h) and minimum at 240 circadian degrees (2200 h). Interestingly, most of the experimental studies show slightly higher vPVTL at 60° (1000 h) than at 0°. However, due to averaging of the experimental data over 4-h intervals (60° bins) the exact position of the maximum and minimum of performance in the data cannot be estimated. The observed difference in the position of vPVTL maximum in the model and data in Figure 2 (b-e) can be related to differences in timing of CBTmin because the data are plotted relative to CBTmin. In the model, timing of CBTmin is adjusted based on Dijk et al. (1997) (see Postnova et al., 2016) and, thus, appears at 0600 h. In the experimental studies considered here, CBTmin was not reported, so it can be different from that used in the model.
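The clock times quoted alongside circadian phases follow from a simple conversion: in the entrained case 360° corresponds to 24 h, so 15° corresponds to 1 h, counted from CBTmin at 0600 h. A minimal sketch, with a hypothetical function name:

```python
def phase_to_clock_h(phase_deg: float, cbtmin_clock_h: float = 6.0) -> float:
    """Clock time (hours) for a circadian phase given relative to CBTmin.

    Assumes the entrained case, where one circadian cycle spans 24 h and
    CBTmin falls at cbtmin_clock_h (0600 h in the model).
    """
    return (cbtmin_clock_h + (phase_deg % 360.0) / 15.0) % 24.0
```

With the model's CBTmin at 0600 h, 0° maps to 0600 h, 60° to 1000 h, and 240° to 2200 h. A different CBTmin timing in an experimental cohort would shift all of these clock times by the same amount.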
Prediction of Objective Performance with Learning Effect—ΔADDm
Predictions for ADD deviations from the study mean (i.e., number of correct answers on mathematical tasks per minute relative to study mean) are shown in Figure 3 and demonstrate very good agreement with experimental studies for both default and best study fit cases, as confirmed by low NRMSE in Table 4. Here the default parameter set was calibrated using all available data because we predict ADD relative to study mean rather than actual values (due to learning effect).

Figure 3. Model predictions for number of correct ADD tasks per minute relative to the protocol mean versus experimental data on FD protocols. Model predictions with the default parameters are shown with filled black circles, best fit to individual studies are shown with filled gray circles, and data are shown with open circles. In all simulations, left panels show data versus time awake. Right panels show data versus circadian phase relative to CBT minimum, which are averaged within 60° circadian bins. (a) Dynamics of ADD per minute relative to study mean on T = 20 h protocol by Silva et al. (2010) (the only circadian data available); (b) on T = 20 h protocol by Wyatt et al. (1999); (c) on T = 28 h protocol by Burke et al. (2015); (d) on T = 28 h protocol by Lee et al. (2009); (e) on T = 42.85 h protocol by Grady et al. (2010).
As expected, ΔADDm performance decreases with time awake on all FD protocols. Dependence on circadian phase in the right panels of Figure 3 shows a profile similar to that of vPVTL, with best performance at 240° (2200 h in the entrained case) and worst around 0°. Notably, the worst ADD performance in the experimental data has been reported at both 0° (Fig. 3b) and 60° (Fig. 3, c and e), with some studies demonstrating similar values at these circadian phases (Fig. 3, a and d).
Overall, quantitative agreement for ΔADDm is better than for other measures as demonstrated with low NRMSE in Table 4 and explained above for the default parameter set. This is further confirmed by nearly identical predictions for default and best study fit parameter sets (the gray markers in Fig. 3 mostly overlap with the black ones).
Prediction of Subjective Sleepiness—KSS
Comparison of the model KSS dynamics with experimental data is shown in Figure 4 for both CR (Fig. 4a) and FD (Fig. 4, b-f) protocols with the quantitative goodness of fit to specific studies in Table 4. The default parameters were identified using 3 CR and 2 FD studies (Fig. 4, a, b, d). The studies in Figure 4 (c, e, and f) were used only for testing as they report KSS relative to study (n = 1) and baseline (n = 2) mean.

Model predictions for KSS compared with experimental data obtained during constant routine (CR) and forced desynchrony (FD) protocols. Model predictions with the default parameters are shown with solid black line for CR and filled black circles for FD. Best model fits to individual studies are shown in gray, and data are shown with open symbols. (a) Predicted and experimental KSS versus time since wake during CR. Experimental data are from Ftouni et al. (2013) (○), Shekleton et al. (2013) (□), and Chua et al. (2012) (∇). (b) KSS dynamics on T = 20 h protocol by Silva et al. (2010); (c) dynamics of KSS relative to study mean on T = 20 h protocol by Wyatt et al. (1999); (d) KSS dynamics on T = 28 h protocol by Burke et al. (2015); (e) dynamics of KSS relative to baseline mean on T = 42.85 h protocol by Wyatt et al. (2004); and (f) KSS relative to baseline mean on T = 42.85 h protocol by Grady et al. (2010). Right panels show data versus circadian phase relative to melatonin peak in Figure 4e and to CBT minimum otherwise.
In the CR case (Fig. 4a), the model predicts a gradual increase of KSS (increased sleepiness) during normal wake without the WMZ in any of the predictions, which is consistent with experimental observations. KSS increases abruptly after about 16 h of wakefulness (2400 h with 0800 h wake time) and reaches highest sleepiness on the protocol (KSS = 6.9) at 26.1 h after awakening (i.e., around 1000 h). Sleepiness decreases afterward due to circadian influence and reaches a local minimum around 36.5 h after awakening (~2030 h). The key features of KSS dynamics are also reproduced in FD protocols in Figure 4 (b-f). KSS increases with time awake, in qualitative agreement with the data as seen in the left panels. When plotted against the circadian phase (right panels), KSS shows circadian variation with minimum sleepiness at 240° (2200 h) relative to CBTmin and maximum sleepiness at 0° (0600 h).
As seen from the low NRMSE values in Table 4 and Figure 4, the model shows good agreement with CR and most of the FD studies. However, the default model prediction for ΔKSSb for FD data in Wyatt et al. (2004) (Fig. 4e) and Grady et al. (2010) (Fig. 4f) shows a systematic error with the model predictions shifted to higher values, indicating the likely lower baseline KSS levels in the model compared with both data sets. Notably, the results in the two experiments do not align well with each other either (NRMSE between these studies is 0.419), demonstrating systematically higher values in the study by Grady et al. (2010) (by ~1 KSS point) despite its identical protocol to Wyatt et al. (2004). Individual fits to these studies, however, show very good alignment with the data as seen by significantly lower NRMSE in Table 4, indicating that the observed differences are likely due to individual variability and not due to differences in the underlying mechanisms.
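To illustrate how a constant offset between two data sets shows up in this kind of error metric, the following minimal sketch computes RMSE, a normalized RMSE, and the mean bias. The normalization by the range of the observed data is an assumption made for illustration (the normalization used for Table 4 is defined elsewhere in the paper), and all values are hypothetical.

```python
import numpy as np


def rmse(predicted, observed):
    """Root mean squared error between two equally sized series."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))


def nrmse(predicted, observed):
    """RMSE divided by the range of the observed data (assumed normalization)."""
    observed = np.asarray(observed, dtype=float)
    data_range = float(observed.max() - observed.min())
    if data_range == 0.0:
        raise ValueError("observed data have zero range; NRMSE is undefined")
    return rmse(predicted, observed) / data_range


def mean_bias(predicted, observed):
    """Mean signed difference; positive when predictions sit above the data."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(predicted - observed))


# Hypothetical relative-KSS values: predictions shifted up by 1 point.
observed = np.array([0.0, 0.5, 1.0, 2.0])
predicted = observed + 1.0

error_raw = nrmse(predicted, observed)
bias = mean_bias(predicted, observed)
# Removing the constant offset leaves only the shape mismatch (none here).
error_unbiased = nrmse(predicted - bias, observed)
```

In this toy case the whole error comes from the offset, so it vanishes once the bias is removed; with real data, the residual after bias removal indicates how well the shape of the dynamics is captured.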
Parameter Scatter in Individual Studies
Fitting the model to individual studies, as done above for the best study fits, allows us to investigate variability of the fitting parameters across these studies. Such variability is expected due to interindividual variability, which affects group averages reported in the papers, especially when the number of subjects is small, which is the case for most CR and FD experiments.
Figure 5 demonstrates the variability in parameter pairs of

Scatter plots of best study fit parameters. (a) Best Θstudy fits for studies reporting visual performance lapses, vPVTL; (b) best Θstudy fits for ΔADDm; and (c) best Θstudy fits for KSS. Pairs of parameters
Discussion
We demonstrated application of our arousal dynamics model to predict objective performance and subjective sleepiness during acute sleep deprivation and structured circadian misalignment. We have calibrated the model for dynamic prediction of widely used experimental measures: lapses on a visual performance vigilance test (vPVTL), correct additions on a mathematical addition task (ADD), and a subjective sleepiness score on the Karolinska Sleepiness Scale (KSS). We have also determined initial estimates for the default parameter sets to predict other vPVT measures (e.g., reaction time) and performance on a digit symbol substitution task (DSST), which can be further improved when more data become available. Other measures (e.g., ocular measures of drowsiness) can now also be implemented following the steps presented here. Prediction of a variety of measures allows for a broader validation of the model against experimental studies, which was limited in prior models because they focused on one or the other of these measures.
We find that all of the considered performance and sleepiness measures can be predicted using a linear combination of the same two model variables, the homeostatic and circadian drives, which is in line with earlier studies showing strong correlation between the measures (Bermudez et al., 2016). This confirms involvement of the same core mechanisms responsible for the alertness-related dynamics on a variety of tasks, even though there are differences in the specific physiological systems involved (e.g., muscle reaction for vPVT and subjective estimates for KSS). Furthermore, we have also found that almost every 2-variable combination of the arousal dynamics model variables reflecting homeostatic (H, W) and circadian (C, X, Y, W) dynamics is able to predict sleepiness and performance measures fairly well. Prediction of sleepiness and performance measures using wake effort alone, as done by Fulcher et al. (2010), is also successful in the improved model, but the fit is better for models with two variables.
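As a minimal sketch of what a linear combination of the two drives means in practice, the example below fits an intercept and two weights by ordinary least squares. The form measure ≈ Θ0 + ΘH·H + ΘC·C, the intercept name Θ0, and the synthetic H and C time courses are assumptions for illustration only; they are not the arousal dynamics model's equations or the fitting procedure used for the default parameters.

```python
import numpy as np


def fit_linear_combination(H, C, measure):
    """Least-squares fit of measure ~ theta0 + thetaH * H + thetaC * C.

    Returns the array (theta0, thetaH, thetaC).
    """
    H = np.asarray(H, dtype=float)
    C = np.asarray(C, dtype=float)
    y = np.asarray(measure, dtype=float)
    design = np.column_stack([np.ones_like(H), H, C])
    theta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return theta


# Hypothetical stand-ins for the model drives over 40 h of wakefulness.
t = np.linspace(0.0, 40.0, 81)                   # hours awake
H = 1.0 - np.exp(-t / 20.0)                      # saturating homeostatic build-up (assumed shape)
C = np.cos(2.0 * np.pi * (t - 10.0) / 24.0)      # 24 h circadian oscillation (assumed shape)

# Synthetic, noise-free "sleepiness" built from known weights.
true_theta = np.array([3.0, 4.0, -1.0])
synthetic_measure = true_theta[0] + true_theta[1] * H + true_theta[2] * C

theta = fit_linear_combination(H, C, synthetic_measure)
ratio = abs(theta[1] / theta[2])                 # |ΘH/ΘC| for this synthetic measure
```

Because the synthetic data are noise-free, the fit recovers the weights used to generate them; the ratio of the fitted weights is the quantity compared between measures in the discussion of homeostatic versus circadian influence.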
The model predicts that different measures require different ratios between the homeostatic and circadian drives (|ΘH/ΘC|). Based on the default parameters, the impact of the homeostatic drive is predicted to be stronger in KSS than in the objective measures, as shown in Table 3. Examination of the H/C contribution in specific studies, however, does not fully align with the prediction based on combined data. Instead, the ΘH, ΘC pairs are scattered around the equity line as shown in Figure 5, so more data are needed to clarify the dependencies.
The ratio between the homeostatic and circadian drives is again different in their effect on sleep times and sleep propensity, with |ΘH/ΘC| = 2 found to reproduce sleep latency dynamics on forced desynchrony protocols (Postnova et al., 2016). Interestingly, the original model of Phillips et al. (2010, 2011) implied |ΘH/ΘC| = 0.17 for sleep-wake cycles, which is significantly lower than any of the ratios obtained here or by Postnova et al. (2016) and implied a strongly dominating circadian drive. The present findings with the model calibrated on data suggest a more balanced influence of both the circadian and homeostatic drives on alertness and sleep than found in earlier model versions.
As demonstrated with the prediction error in Table 3, the model shows good agreement with experimental data for KSS, vPVTL, and ΔADDm obtained during constant routine protocols of up to 40 h of wakefulness and during forced desynchrony protocols with different durations of sleep-wake cycles, T = 20, 28, and 42.85 h. The default parameter set fitted across available studies allows for future predictions of mean group performance and sleepiness levels when no prior knowledge about subjects’ dynamics is available. Fitting to specific study protocols, as shown with the best study fits in Figures 2 through 4, is useful to improve predictions when alertness data are available on one protocol for a subject or group of subjects and predictions are needed for another protocol. Note that the model can be applied for both group and individual predictions without changes in the model structure, but parameters need to be adjusted.
We have observed significant variability in the experimental data obtained on similar protocols, the most prominent example being the vPVT data in the Shekleton et al. study compared with those in the Chua et al. and Jung et al. studies. As seen in Figure 2a, these differences are not obvious during baseline wakefulness but become apparent during extended wake beyond 16 h, demonstrating vulnerability to sleep deprivation. Assuming that there are no unreported differences in the protocols, this indicates that prediction of performance for new groups of people or individuals can only be indicative of the overall dynamics, while quantitative predictions would require prior knowledge about the individuals in order to adjust the model parameters (e.g., their chronotype, age, and/or sex). This further supports the importance of studies investigating baseline markers of vulnerability to sleep deprivation, such as that of Patanaik et al. (2015), and of future research aimed at individual predictions.
The performance and sleepiness measures in the model were developed and calibrated using constant routine and forced desynchrony protocols with 2:1 wake-to-sleep ratio. The model can thus be applied to predict these measures under conditions of acute sleep deprivation, nonchronic sleep restriction, and circadian misalignment where sleep is not externally restricted. The use of the experimental studies using dim light (<15 lux) for parameter calibration results in a model that accounts for the core mechanisms while minimizing environmental influences. This will enable us to properly simulate the direct alerting effects of light in future work (Cajochen et al., 2000; Chang et al., 2013; Phipps-Nelson et al., 2003; Rahman et al., 2014; Rüger et al., 2006). According to the alertness dose-response curve to light reported by Cajochen et al. (2000), there is no significant effect of light on alertness for illuminances of up to ~60 lux (photopic), so the current model is applicable under conditions of 0 to 60 lux but would overpredict sleepiness (underpredict performance) under higher light levels.
In summary, the strengths of this model compared with most other models of alertness are (1) it uses only dim-light data to avoid unaccounted environmental influences on alertness; (2) it is calibrated against a larger number of experimental studies; (3) it predicts multiple performance and sleepiness measures at once, allowing for broader application and validation; and (4) it predicts sleep and sleep propensity profile rather than using sleep times as an input. The main weakness of the current model version is that it underpredicts sleepiness and performance decrements under conditions of chronic sleep restriction, so further improvements are needed.
This model thus presents a foundation for prediction of alertness in more complicated settings, such as those with variable lighting levels and chronic variable sleep restriction. Both the alerting effects of light and chronic sleep restriction require changes to the dynamics of the homeostatic drive, which could be incorporated at different levels: for example, on the lower level of potential mechanisms (e.g., Phillips et al., 2017) or higher functional level (e.g., McCauley et al., 2013; Rajdev et al., 2013). In any case, the modifications are likely to be nonlinear and, in the case of light, may also affect the dynamics under normal sleep-wake cycles with ambient lighting. It is important to ensure, however, that an improved model is “backward compatible”: that is, in addition to the new phenomena, it is still able to reproduce the dynamics of sleep propensity as presented by Postnova et al. (2016) and alertness under dim lighting and during acute sleep deprivation as presented here.
Footnotes
Acknowledgements
This research was supported by the Australian Government via the Cooperative Research Center for Alertness, Safety and Productivity; the Australian Research Council Center of Excellence for Integrative Brain Function (ARC Center of Excellence Grant CE140100007); and the Australian Research Council Laureate Fellowship Grant (FL140100025).
Conflict of Interest Statement
S.P. serves as a theme leader, S.W.L. as a program leader, and P.A.R. as a chief investigator in the Cooperative Research Center for Alertness, Safety, and Productivity. S.P., S.W.L., and P.A.R. have no additional conflicts of interest related to the research or results reported in this paper. In the interests of full disclosure, commercial interests from the last 3 years (2014-2017) are listed below. S.W.L. has received consulting fees from the Atlanta Falcons, Atlanta Hawks, Carbon Limiting Technologies Ltd. on behalf of PhotoStar LED, Perceptive Advisors, Serrado Capital, and SlingshOT Insights and has current consulting contracts with Akili Interactive, Consumer Sleep Solutions, Delos Living LLC, Headwaters Inc., Hintsa Performance AG, Light Cognitive, Mental Workout, Pegasus Capital Advisors LP, PlanLED, OpTerra Energy Services Inc., and Wyle Integrated Science and Engineering. S.W.L. has received unrestricted equipment gifts from Biological Illuminations LLC, Bionetics Corporation, and F. LUX Software LLC; has equity in iSLEEP, Pty; has received advance author payment and/or royalties from Oxford University Press; has received honoraria plus travel, accommodation, and/or meals for invited seminars, conference presentations, or teaching from BHP Billiton, Estee Lauder, Lightfair, Informa Exhibitions (USGBC), and Teague; has received travel, accommodation, and/or meals only (no honoraria) for invited seminars, conference presentations, or teaching from FASEB, Hintsa Performance AG, Lightfair, and USGBC. S.W.L. has completed investigator-initiated research grants from Biological Illumination LLC and Vanda Pharmaceuticals Inc. and has an ongoing investigator-initiated grant from F. Lux Software LLC; completed service agreements from Rio Tinto Iron Ore and Vanda Pharmaceuticals Inc.; and completed three sponsor-initiated clinical research contracts from Vanda Pharmaceuticals Inc. S.W.L. holds a process patent for “Systems and Methods for Determining and/or Controlling Sleep Quality,” which is assigned to the Brigham and Women’s Hospital per hospital policy. S.W.L. has also served as a paid expert on behalf of several public bodies on arbitrations related to sleep, light, circadian rhythms, and/or work hours for the City of Brantford, Canada, and legal proceedings related to light, sleep, and health.
