Abstract
Functional magnetic resonance imaging time-series are conventionally processed by linear modelling the evoked response as the convolution of the experimental conditions with a stereotyped hemodynamic response function (HRF). However, the neural signal in response to a stimulus can vary according to task, brain region, and subject-specific conditions. Moreover, HRF shape has been suggested to carry physiological information. The BOLD signal across a range of sensorial and cognitive tasks was fitted using a sine series expansion, and modelled signals were deconvolved, thus giving rise to a task-specific deconvolved HRF (dHRF), which was characterized in terms of amplitude, latency, time-to-peak and full-width at half maximum for each task. We found that the BOLD response shape changes not only across activated regions and tasks, but also across subjects despite the age homogeneity of the cohort. Largest variabilities were observed in mean amplitude and latency across tasks and regions, while time-to-peak and full width at half maximum were relatively more consistent. Additionally, the dHRF was found to deviate from canonicity in several brain regions. Our results suggest that the choice of a standard, uniform HRF may be not optimal for all fMRI analyses and may lead to model misspecifications and statistical bias.
Introduction
Functional Magnetic Resonance Imaging (fMRI) is an indirect measure of neuronal activity which leverages on the neurovascular coupling. The neuronal-driven vascular response gives rise to local changes of paramagnetic deoxyhaemoglobin concentration which can be detected with the Blood Oxygenation Level Dependent (BOLD) contrast. 1 Neurovascular coupling is a complex and highly variable phenomenon which can be even disrupted in certain cases, for example in brain tumours, 2 further complicating the study of neuronal activity.
Different approaches exist to infer brain activation from the BOLD response, including independent component analysis (ICA), 3 affinity propagation clustering, 4 or expansion in basis functions to model the hemodynamic response function (HRF); however, it is common practice to derive a statistical predictor by linear modelling the evoked BOLD response as the convolution of the known experimental conditions (onset, intensity, and duration of stimuli) with a stereotyped HRF. The shape and features of the HRF, such as its magnitude, latency, and duration, have been reported to carry information on the underlying neuronal activity 5 and depend on the interaction between complex neuronal, glial, and vascular events that lead from a stimulus to the measured BOLD response.6,7
Considering the entire chain of events induced by a stimulation, a nonlinear transfer between stimulus and response can take place at least at two steps: in the transition from the stimulus to the evoked neuronal activity, and in the transition from neural activity and vascular/metabolic response (i.e., neurovascular and neurometabolic coupling). 6 Indeed, there is evidence for non-linearity in both mechanisms.8 –10 Therefore, characterizing the HRF as the impulse response function of a linear system can introduce errors. However, an overall linear coupling between neural activity and BOLD response is generally assumed; 11 hence, an accurate estimate of the HRF is crucial for BOLD fMRI. A realistic modelling of the task-evoked response helps to prevent false positive and false negative results and statistical power loss in functional studies.12 –14
Neural activity in response to a stimulus is task-dependent, region-dependent, and subject-dependent and may change over time as a result of habituation or other neurophysiological processes.15 –18 Similarly, the hemodynamic response varies in amplitude, timing, and shape.16,17,19 –24 The extent of this variation, as well as its possible sources, are current topics of research, with heterogeneous results. There are, for example, clear indications of age-related variations of the hemodynamic response.25 –27 At a single-subject level, the hemodynamic response in primary visual and motor cortices was shown to be highly reproducible in amplitude and timing, 19 while a study by Gonzalez-Castillo and colleagues has shown that, under visual stimulation, the variance of the functional signal across datasets is one of the major contributors to the total signal variance in the cortex, except for occipital areas. 28
For the same brain area, the HRF could vary across subjects in timing, approximately up to two seconds in the motor cortex, 24 and the amplitude can even double in prefrontal areas compared to other areas.22,24 A study estimating empirically the HRF in primary visual and motor cortices and in frontal and supplementary eye fields found significant variability of HRF shape across subjects and regions. In particular, the variability of time-to-peak and time-to-onset was in the range of four seconds and the variability across subjects was larger than the variability across regions. 17 On the other hand, Miezin et al. 19 found that the HRF parametrization estimated in the visual cortex was inadequate to predict the HRF properties in the motor cortex, suggesting that regional variations might dominate over across-subject variations.
The HRF regional variability, either within or between subjects, is most likely related to vascular features or neurovascular coupling differences rather than to the neuronal response and is a major contributor of variability in the BOLD response.17,19 Accordingly, Gonzalez-Castillo et al., 29 using a simple visual stimulation and an attention control task, have shown that the HRF broadly varies across regions in timing, specifically onset and offset, and shape. This variability was both anatomically and functionally consistent in each subject and reproducible across subjects. A spatial modulation of the HRF was reported also across relatively homogeneous areas, like V1-V3 visual cortices. 30 A further complication is introduced in case of study designs involving contrasts between more stimulated conditions as opposed to the contrast between a stimulated condition and rest. In this case, systematic regional or task-related differences in hemodynamic response interact together, making the response harder to interpret.
The hemodynamic response function across the majority of the cerebral cortex has been recently characterized. Taylor et al., 21 using a single short audio-visual stimulus, found a unimodal positive HRF and a more variegated negative response. The spatial pattern of the HRF amplitude was very similar across subjects and timing was stable across the cortical surface, albeit small deviations from the mean were found in regions of auditory, visual, and motor cortices. Chen et al. 14 examined the properties of the HRF in grey and white matter using a fast event-related fMRI dataset and found that regional differences in HRF shape dominated over differences across tasks, conditions, and groups.
From the fMRI processing perspective, several methodologies for HRF modelling have been proposed.12,31 –37 The most rigid strategy assumes a fixed HRF shape, where a single free parameter (the amplitude of the response) is included in the analysis. 35 More flexible modelling approaches include the expansion of the BOLD response using a Finite Impulse Response (FIR) basis set,32,33,38 sinusoidal functions,5,36 half-cosine functions, 39 Gamma functions, 31 B-spline basis set, 40 spline fitting, 41 smooth splines, 14 a Fourier basis set, 16 or a Taylor basis set including the combination of the canonical HRF and its time-derivatives. 42 Other approaches include randomly generated HRFs, 39 or parametrization of the HRF. 43 All these models must balance statistical power and accuracy of the fitting, while avoiding overfitting. These requirements pose opposite constraints on the number of free parameters.5,43 In this context, the improved temporal and spatial resolution (sub-second TR and millimetric voxel sizes) of state-of-the-art fMRI protocols44,45 is allowing denser sampling of the functional response, with increased spatial specificity. The additional information available allows for a better characterization of the HRF while imposing more stringent requests on the reliability of the HRF used during processing, if models with a modest number of free parameters are used (as in many conventional studies). In this study, we sought to characterize a deconvolved hemodynamic response function (dHRF) for the positive BOLD response across the whole cortex, for applications in block-design linear model analyses.
Deconvolution is not routinely used in the estimation of the HRF, but can be a valuable approach when the aim is to take variances into account instead of using a fixed shape. Deconvolution was used, for example, in the characterization of the impulse response function in the spinal cord 46 and in the estimation of the respiration response function 47 both of which have no reason, in principle, to be identical to the HRF used for modelling the BOLD response in cortical grey matter.
This study focuses on a relatively large and homogeneous cohort of young healthy subjects (taken from the Human Connectome Project (HCP) 48 ) performing a relatively common set of fMRI tasks with the aim of characterizing the haemodynamic response under common conditions for fMRI experiments. This characterization can help setting a baseline for the study of potential deviations in the HRF in different cohorts or under different experimental conditions.
Materials and methods
Subjects
Subjects (N = 48, 24 females and 24 males, age (mean ±std) = 27.9 ± 3.6 years, age range: 22–35 years) included in this study were part of the Q3 release of the Human Connectome Project (HCP). 48 A cohort of young subjects was chosen in order to minimize age-related haemodynamic variability, and in particular to exclude from the study the haemodynamic changes associated with aging. 27 The HCP study was approved by the Washington University Institutional Review Board and was in accordance with the Declaration of Helsinki. All subjects gave informed written consent before participating in the study.
Stimulation paradigms
HCP stimulations are designed to elicit task-evoked responses in several cortical and subcortical networks. In the present study, we included the following:
working memory/cognitive control (‘wm’), consisting of a 0-back and 2-back working memory task; incentive processing (‘gambling’), consisting of a card guessing game; visual, motion, somatosensory, and motor (‘visual’ and ‘motor’), in which participants were presented with visual cues in order to move fingers, toe or tongue; language processing (‘language’), in which participants are asked to answer story or math questions. social cognition (‘social’) in which participants were presented with video clips of objects which could be interacting, and were asked to judge whether the objects had interaction; relational processing (‘relational’) in which the stimuli were shapes filled with 1 of 6 different textures and the participants were asked to evaluate particular differences.
Stimuli were administered in block design; a full description of the functional tasks can be found in the paper by Barch et al.; 49 more details are included in Supplemental methods and Supplemental Table 1.
MRI acquisition and preprocessing
This study was conducted on datasets acquired by the HCP projects and treated with the HCP minimal preprocessing pipeline; 50 scanning protocol and preprocessing pipeline are shortly described in the Supplemental Methods.
Analyses performed for this study used tools from the AFNI package (http://afni.nimh.nih.gov) and some MATLAB routines locally developed (MATLAB Release 2018b; The MathWorks, Inc., Natick, Massachusetts). fMRI data from the minimal preprocessing pipeline were smoothed with an isotropic Gaussian kernel (5 mm FWHM) and high pass filtered (180 s).
All images were masked to remove the cerebellum and the brainstem, which were not investigated in the current work. The masking was done using a group-level mask obtained by thresholding at 90% a probability map defined by the across-subject average of the FreeSurfer-segmented cerebellum and brainstem.
BOLD response models
Functional analysis of fMRI time series was carried out in the time domain on a voxel basis using the function 3dDeconvolve of AFNI. 51 Given the input stimulus timing and the measured BOLD signal changes, the average response shape to each stimulus was estimated using two models.
Model 1 (M1) employed a sine series expansion for the response function, extended from the beginning of the stimulus to 20 seconds after the end of the stimulus.
A variable number of basis functions was included in the model, depending on the stimulus duration: the longest sine period matched the full duration of the fitted response, and faster terms were added in steps following the harmonic series so that the shortest period was around 4 seconds for all tasks. With this choice, the nominal temporal smoothness of the fitted response was comparable across tasks, at the price of a variable number of degrees of freedom. Model 2 (M2) was included to represent the behaviour of more restrictive models commonly used in the literature; it included the SPM12 canonical HRF31,42 and its first and second (dispersion) time derivatives. All three terms were convolved with a square function representing the stimulus paradigm.
No further hypotheses were made on the shape of the response beyond those implicit in the base composition (e.g., value 0 at starting time). Models included the first two Legendre polynomials to account for constant terms and linear drifts. Both models were fitted with a linear regression approach, resulting in the estimated regression coefficients and in a fitted response (i.e., the estimated BOLD response), which was split into responses to individual stimuli, as functional runs generally included more than one stimulus. More details on the models are reported in Supplemental Methods.
Finally, to compare the explanatory power of M1 and M2, the R2 adjusted for the number of predictors (adjusted R2) was computed for each regression model and differences between adjusted R2 from M1 and M2 were assessed by means of a Wilcoxon non-parametric test.
Identification of task-responding regions
For each model, subject-specific regression coefficients entered a second-level analysis to identify task-responding regions at the group level. Linear combinations of coefficients (contrasts) were built, corresponding to the tasks described in Supplemental Table 1, in analogy with the paper by Barch and colleagues, 49 and were tested for significance by F-test. The null hypothesis was discarded in voxels with a statistical significance of p < 0.05 (FWE corrected, with the additional constraint of clusters larger than 100 adjacent voxels). Areas activated by each task were identified by applying to each thresholded statistical map a mask including only those voxels where the inter-subject average of the fitted response to the relevant condition was positive, i.e., voxels that had a group-averaged positive BOLD response to the task.
Features of the haemodynamic response to distinct tasks
To estimate the dHRF for both M1 and M2 (M1-dHRF and M2-dHRF), the voxel-specific fitted BOLD responses were deconvolved with a boxcar function whose duration matched the task duration, using Tikhonov regularization. 52 Then, to characterize the haemodynamic response for a specific task, the time courses of the dHRF and fitted responses were averaged within the activated areas for each task and subject, and parameters describing magnitude and timing were calculated from the averaged dHRF and the fitted response shape.
Two parameters were extracted from the estimated mean dHRF shape:
time-to-peak (TTP): the time interval from the beginning of the stimulation to the time corresponding to the dHRF peak; full width at half maximum (FWHM).
Other two parameters were extracted from the mean fitted BOLD response:
amplitude (A): the 90th percentile value of maximum signal change during the stimulus time window; latency: the time interval from the beginning of the stimulus to the time corresponding to A/2.
Finally, we evaluated the correlations between timing parameters across subjects and contrasts for model M1, after converting each contrast-specific parameter in z-score values across subjects. Unless otherwise stated, values are given as mean ± standard deviation.
Spatial variability of the haemodynamic response
To characterize the spatial variability of the M1-dHRF, irrespectively of any specific task, we analysed the task-averaged response at the voxel level.
Variability of the dHRF shape
The voxel-wise variance of the estimated M1-dHRF shape from the canonical HRF as implemented in SPM12 53 was calculated. For each voxel and subject, the dHRF was obtained by averaging the dHRFs of tasks for which the voxel was considered to be responding (according to the second-level results) and was then correlated to the canonical HRF.
Voxel-wise variability of haemodynamic response parameters
The consistency of the response parameters across tasks was preliminarily assessed by voxel-based repeated-measures ANOVA (rANOVA, with the degrees of freedom changing voxel-by-voxel due to the different spatial extent of the response to different tasks).
The spatial variability of the four parameters previously defined (TTP, FWHM, A, and latency, estimated with model M1) was evaluated on a voxel basis. The parameters computed for each voxel were first averaged across the responding tasks and then mean and standard deviation were calculated across subjects.
Results
Activated areas
The aggregated activated areas covered approximately 68% of the cortex (Supplemental Figure 1). The large-scale activations found at the group level were statistically significant and in agreement with the second-level activation results of the HCP. 49 Activation maps for each contrast are shown in Supplemental Figure 2.
Features of the haemodynamic response to distinct tasks
The average time courses of the fitted responses and M1-dHRFs are reported in Figure 1(a) and (b). The average fitted responses in task-activated areas (Figure 1(a)) are generally characterized by the expected positive, sustained BOLD-like response features. The post-stimulus undershoot was discernible in 5 out of 7 contrasts (gambling, social, relational, visual, motor), while an initial dip was visible in 4 contrasts (gambling, visual, social, working memory). Mean dHRFs, deconvolved and averaged in task-activated areas (Figure 1(b)) generally start around zero, show a single peak and a late undershoot.

Average fitted responses and dHRFs across subjects. Task-specific mean fitted BOLD responses and mean estimated dHRFs for M1 (top, a and b) and M2 (bottom, c and d). Data are averaged in each responding area and then across subjects. In all plots, the average response is represented with a black solid line while the coloured shades around the mean represent the 95% confidence interval of the mean across subjects.
The features of the fitted response and the deconvolved HRF obtained with model M2 are overall similar (Figure 1(c) and (d)). Post stimulus undershoot is generally more recognizable than in M1 and initial dip is visible only in one task (social, obtained by difference between active conditions). This is likely a consequence of the HRF features intrinsically included in the basis set.
Similarity between fitted response and dHRF derived with M1 and M2 models was assessed by 2-way ANOVA. Results are shown in Table 1. Fitted response and dHRF derived with the two models are different, with generally significant effects of model and interaction, indicating that the two models differ in a time-variable way. In particular, the factor “model” of the fitted response was highly significant in 3 tasks, significant in 2 tasks and not significant in 2 tasks (gambling and relational). For dHRF, the factor “model” was highly significant in 3 tasks, significant in 2 tasks and not significant in 2 tasks (wm and gambling). The interaction was highly significant or significant for all comparisons, except for the fitted response of relational task.
2-Way analysis of variance of fitted response and dHRF obtained with M1 and M2. ANOVA is extended to the common time points (M1 results in shorter time series, thus later time points obtained by M2 are discarded).
P-values are not corrected for multiple comparisons. Significant comparisons are shown in bold and marked with *(p < 0.01) or **(p < 0.0001).
Quantitative parameters that describe the averaged fitted BOLD response and dHRF for each task for model M1 are shown in Figure 2, top. The amplitude across tasks was (0.93 ± 0.45)%, ranging from a minimum value of 0.4% for visual to a maximum of 1.2% for motor (it should be noted that the visual stimulation consisted of the cue for motor tasks, hence the amplitude is low compared to the one expected with more commonly used visual stimulations). Both the mean amplitude and the corresponding dispersion across subjects were substantially higher in areas responding to motor tasks compared to areas activated by the other tasks (Figure 2(a)).

Across-subject variability in amplitude and timing parameters. Across-subjects distribution of amplitude (a) and timing parameters (latency, TTP, and FWHM) for M1 (top, a and b) and M2 (bottom, c and d). Parameters were extracted from the relevant model and calculated for each task and subject using the spatially averaged task-based response and dHRF. The box plots show the mean (open circle), the median (horizontal line), and the interquartile range (box extremes); the whiskers extend to the data point closest to 1.5 times the interquartile range, above or below the first and third quartiles, respectively; grey dots represents outlier points ranging beyond these thresholds.
Timing parameters (Figure 2(b)) showed distinct behaviours. The latency of the fitted response changed appreciably across tasks, with a mean value of (6.5 ± 3.3) s. In general, motor task showed the lowest values, while language and social tasks the highest. Higher mean values were associated with a larger dispersion across subjects. Both TTP and FWHM tended to show a rather uniform mean across tasks, while the variability across subjects was inconsistent and, in some cases, relatively high, especially for language task. TTP across tasks was (4.57 ± 0.41) and The FWHM was (4.51 ± 0.45) s.
Given the tendency of increased dispersion with increased mean, the mean coefficient of variations across subjects are roughly comparable for each parameter: 34% for amplitude, 37% for latency, 33% for TTP, and 36% for FWHM. The mean CVs calculated across tasks are higher for amplitude and latency, 55% and 67% respectively, and almost identical for TTP and FWHM, 33% and 35% respectively.
Parameters obtained using M2 are reported in Figure 2, bottom. Amplitude (Figure 2(c)) and timing parameters (Figure 2(d)) show a similar pattern to M1, but with a reduced variance between subjects.
Parameters obtained with the two models were compared by paired t-test. Results are shown in Supplemental Table 2. Models show a moderate concordance; 9 parameters have significantly different estimation over 28 comparisons. The correlations between timing parameters of M1 are reported in Figure 3. TTP was slightly, but significantly correlated to latency (R = 0.16, p = 0.00024) (Figure 3(a)) and showed a significant and strong correlation with FWHM (R = 0.35, p < 0.0001) (Figure 3(b)), while latency and FWHM showed only a tendency of negative correlation (R = –0.08, p = 0.095) (Figure 3(c)).

Correlations between M1 parameters. (a) Correlation between TTP and latency (significant, R = 0.16, p = 0.00024). (b) Correlation between TTP and FWHM (significant, R = 0.35, p < 0.0001) and (c) correlation between latency and FWHM (not significant, R = −0.08, p = 0.095).
While adjusted-R2 is variable between subjects and tasks, the Wilcoxon non-parametric test indicated that the power of regression model M1 was always significantly superior to that of model M2 (p < 10−8 for all tasks). Adjusted-R2 for all subjects and tasks are reported in Figure 4.

Adjusted R2 of M2 vs M1. Plot of adjusted R2 for all subjects and tasks. Adjusted R2 of regression model M1 was always significantly superior to that of model M2 (p < 10−8 for all tasks).
Spatial variability of the haemodynamic response
Variability of the dHRF shape
The voxelwise similarity of the dHRF to the SPM12 canonical HRF is represented in Figure 5. Regions showing the maximal canonicity are localized in the occipital cortex. Some small regions are characterized by negative z values (i.e., the local dHRF tended to be inversely correlated to the canonical HRF); they are localized in frontal pole, cingulate and paracingulate gyrus, precuneus, and supracalcarine cortex.

Canonicity of the dHRF. Voxelwise similarity, expressed as z-score, of the dHRF to the SPM12 canonical HRF (a), the relevant SD (b) and sagittal section and scale (c). For each voxel and subject, the task-averaged dHRF was obtained by averaging the dHRF shapes of all tasks for which the voxel was responding and was then correlated to the canonical HRF. Regions showing the maximal canonicity are in the occipital cortex. Several regions, mainly in the insular somatosensory and motor cortex and frontal pole showed some asymmetry between hemispheres.
Voxel-wise variability of haemodynamic response parameters
For each parameter, rANOVA showed that the differences between tasks at a voxel level were not significant (p > 0.7). The spatial distributions of the parameters are represented in Figure 6.

Whole brain maps of the investigated BOLD parameters, estimated with M1 model. In each row, from left: axial map of the parameter, associated standard deviation across subjects, sagittal section and scale. From top to bottom: (a)–(c) amplitude of the fitted Continued.BOLD response, expressed as percentage variation of BOLD signal over the baseline. The amplitude of the fitted response varied appreciably across the cortex. In particular, a larger BOLD signal change was observed in sensory areas, such as the primary and supplementary motor cortex, intracalcarine and supracalcarine cortex, and in the caudate nucleus and cingulate gyrus. (d)–(f) mean latency of the fitted BOLD response. Latency appeared moderate and reproducible across subjects in occipital regions, while some temporal areas were characterized by slower responses. (g)–(i) Time To Peak of the deconvolved HRF. TTP showed a high spatial variability at the voxel level. Higher TTP values were found in the posterior division of the temporal gyrus while lower values were found in the supplementary motor cortex. (j)–(l) Full Width at Half Maximum of the dHRF. FWHM was found to be higher in the frontal medial cortex, lateral occipital cortex and angular gyrus, motor cortex (precentral gyrus), cingulate gyrus, and postcentral gyrus. For each parameter, the voxelwise magnitude was first averaged across the responding tasks and then the mean and SD across subjects were calculated.
The amplitude of the fitted response changes appreciably across the cortex (Figure 6(a) to (c)). A stronger BOLD response is observed in sensory areas, including primary and supplementary motor cortices, intracalcarine and supracalcarine cortices, but also in the caudate nucleus and cingulate gyrus.
Latency shows a high spatial variability (Figure 6(d) to (f)). Regions with lower values correspond to primary and supplementary motor areas; substantially higher latencies are found in the posterior division of the temporal gyrus and in the superior temporal gyrus.
TTP has a comparatively lower spatial variability (Figure 6(g) to (i)). Higher TTP values are found in the posterior division of the temporal gyrus, while lower values are located in the supplementary motor cortex.
The FWHM is higher in regions of the frontal medial cortex, lateral occipital cortex and angular gyrus, motor cortex (precentral gyrus), cingulate gyrus, and post-central gyrus (Figure 6(j) to (l)). Occipital areas are again characterized by the lowest variance across subjects.
Discussion
In this study, we characterized the features of the haemodynamic response function at a whole-brain level using a deconvolution approach. Previous studies focusing on the mapping of the HRF used an event-related design and a specific task.14,21 Here, we assessed the response variability across a wide range of sensorial and cognitive block-design tasks. These stimulations have been shown to elicit a positive BOLD response with a high level of significance across a large part of the cortex (Supplemental Figures 1 and 2).
Modelling the evoked BOLD response is crucial for inference.5,12,39,43 Assessing the BOLD response can shed light on its role in physiological aging, cerebrovascular diseases, and dementia.43,54,55 Indeed, several studies have reported that multiple aspects of the HRF (e.g., amplitude, initial dip, post-stimulus undershoot) may be affected both by aging and neurodegenerative, psychiatric, and neurovascular diseases.27,54 –60
In this work, we explored the variability of the dHRF shape and timing across tasks and brain regions in a cohort of subjects taken from the HCP. 48 Typically, fMRI experiments are conducted on smaller sample sizes and investigate a restricted number of tasks and brain regions. HCP data can be further analysed to characterize, using the same approach, several aspects of the HRF that are often neglected, like dependence on sex or age. In particular, aging is associated with vascular changes 61 which can have an effect on the shape and timing of the HRF; this study focused on a homogeneous cohort of participants (i.e., healthy subjects aged 22–35 years) in order to reduce such variability. Age-related changes in the HRF have been observed, 27 while no significant change in neurovascular coupling was found in association with age. 62
Features of the dHRF
We extracted amplitude and timing parameters from the mean-fitted BOLD response and from the dHRF. Latency was characterized by the highest variance across tasks (Figure 2(b) and (d)), while TTP and FWHM were fairly constant across the tasks considered. This could be taken as an indication that TTP and FWHM are mostly driven by regional vascular features while latency, showing a stronger task-related variability, could hence be more sensitive to different extents of cognitive engagement. Moreover, the apparent latency estimate can be substantially affected in responses derived from the contrast between two active conditions. Indeed, the largest latencies were observed for the task language, social, and relational, while the largest variance between subjects and within task was observed for language and social, all cognitive and composite tasks.
Amplitude was highly variable both across tasks (Figure 2(a) and (c)) and across brain regions (Figure 6(a) to (c)). The highest values were found for motor tasks and in sensory areas in general, consistent with previous observations. 21
The large spatial variability observed in latency (Figure 6(d) to (f)), with occipital and temporal regions showing the fastest and slowest dynamics, respectively, could be due to different mechanisms. In particular, it could have a neuronal origin and, thus, indicate differences in brain activity, or it could be related to haemodynamic features, indicating primarily vascular differences, or a mixture of both. In the first case, the delay would represent a feature of interest for the characterization of brain activity following different tasks and for studies based on the temporal features of the BOLD response; in the second case, it would mainly represent a bias to correct for, if the target of the investigation is isolating the underlying neural activity.
We found that the time-to-peak had a high variability across voxels (Figure 6(g) to (i)), but was relatively stable across subjects. Interestingly, although the spatial pattern of TTP differs from that of latency (Figure 6(d) to (f)), a significant correlation was found between the two parameters for the tasks considered (R = 0.16, Figure 3(a)).
The FWHM did not vary much across areas (Figure 6(j) to (l)), similar to the findings of Taylor and colleagues. 21 A significant correlation was found between FWHM and TTP (R = 0.35, Figure 3(b)) for the tasks considered. Taylor et al. 21 reported a correlation between FWHM and TTP of R = 0.34, very close to what we found. Similarly, a correlation between TTP and FWHM has been previously reported for the impulse response in the human visual cortex; 63 in particular, higher values of TTP and FWHM were associated with responses coming from the venous vasculature.63,64
The shape of the HRF is a potential confound, leading to statistically incorrect modelling and inference. 12 The variability of the HRF across subjects and brain regions limits the accurate evaluation of individual BOLD response. A method proposed to solve this problem is to acquire extra scans, using simple tasks (i.e., visual or motor tasks), to evaluate subject-specific HRFs. However, this solution, apart from requiring extra scan and analysis time, would not prevent biases due to HRF intra-subject differences between tasks and areas. Indeed, we observed that, for all the parameters investigated, there is an appreciable variability across areas (Figure 6), in accordance with previous observations. 14 This suggests that an average region-specific HRF would be preferable to a single subject-specific HRF, at least for young, healthy subjects. In particular, estimating a subject-specific HRF based on the activation in a single area would lead to strong biases. The rANOVA analysis showed that, for each parameter and for the stimulations considered, the differences between tasks at the voxel level are not significant (p > 0.7). This suggests that the responses to different tasks can be safely averaged to characterize the voxel-specific BOLD response. The use of a basis set of functions can account for intra-subject variability, since the combination of several functions can model many HRF shapes bringing out a specific physiological feature at a time. Liao et al. 65 propose a basis set of functions to estimate the BOLD response delay following stimulation onset. Nevertheless, such approaches often need to impose many constraints on model fitting, such as a time window for the signal peak 39 in order to limit the fitted response to physiologically plausible shapes.
From a practical point of view, the use of distinct HRFs corresponding to distinct tasks and areas would introduce a serious degree of complexity in fMRI analyses. Our results suggest that approaches allowing variable HRF shape should be used whenever possible, especially if using short TR.64,66
The spatial heterogeneity in the dHRF shape could be partially ascribed to the fact that a variable number of tasks was considered for each region (Supplemental Figure 1). In this sense, regions where a larger number of tasks is averaged could show a greater variability. Based on our results, however, this does not seem to be the case: the occipital region was the one for which the highest number of tasks was considered, and showed a relatively low variability. This finding confirms that spatial variability prevails over task-related variability in the same area.
It should be noted that this study cannot assess to which extent the variability observed in the HRF shape and timing parameters in a certain region arises from stimulation or regional differences because the HCP data do not allow to fully disentangle one contribution from the other.
dHRF estimation
The main drawback of our approach is that it relies on deconvolution, an inherently unstable operation that leverages on linearity features of the BOLD response. The block-design stimulation paradigms available in the HCP dataset were not designed specifically for the calculation of a deconvolved HRF, and therefore are not fully comparable in terms of stimulation length and response amplitude (as can be seen from Figure 1(a) and (c)). In particular, the visual task, which was the 3-second visual cue used for motor tasks, was shorter and elicited a smaller BOLD response compared to what is usually employed for block-design visual stimulations. Using different stimuli, both in terms of type and length, is likely to give slightly different deconvolved HRF shapes due to the fact that the BOLD response changes, and the dHRF is designed to pick up that variability.
Direct measurement of the HRF by using an impulsive delta-like stimulation would be in principle more desirable to detect the true HRF.21,67 However, a large part of the interest in the characterization of the HRF is for inference purposes in linear models of BOLD signal. HRF measured after short stimulation does not account for non-linearity effects, it is tailored to fit event-related activity, and thus it is not completely suitable for approaches inferring brain activation from convolution of the HRF with experimental conditions, as in conventional block-design fMRI paradigms. The deconvolved HRF, which includes the effect of nonlinear terms, habituation, and other confounds is, in principle, a biased estimate of the true HRF, but might be better suited to predict the BOLD response in block-design experiments rather than a directly measured HRF. The design of the fMRI experiment also plays a role in the estimation of the HRF shape. Unfortunately, the efficiency in HRF estimation and the power in activation detection are maximized by reverse conditions.67,68 In particular, the event-related design maximizes estimation efficiency at the cost of poor detection power, while block design maximizes detection power at the cost of estimation efficiency.43,67 Lindquist et al. 5 assessed the performance of different HRF models for event-related designs evaluating bias, misspecification, and statistical power. Their results showed the difficulty in confidently recovering real task-evoked changes in BOLD signal and the wide difference among HRF shapes in terms of power of detection, bias, and uncertainties in parameters determination. Shan et al. 43 specifically studied the capability of block design fMRI paradigms to estimate HRF, and their findings showed that most HRF models, including the sum of gamma functions and inverse logit functions, were able to correctly estimate HRF parameters from a block-design paradigm. They also reported that HRF parameters were reproducible across experimental sessions, supporting the biological origin of these parameters.
In the presence of non-stationarity, the dHRF should be used only to estimate the response to stimulations having about the same duration as the experimental conditions for which it was assessed. The standardized HCP stimulations we used for our analysis are likely to be a good match for many applications. Moreover, the non-linearity tends to be particularly strong with very short stimulus durations, 69 while it is somewhat mitigated at longer durations, e.g., between 4 s and 12 s.66,68
A high number of basis functions in linear models, or free parameters in nonlinear models, make the model more flexible and able to adapt to the real response. However, this flexibility has costs: reduced degrees of freedom, overfitting of noise, and decreased power for collinear regressors.5,43 On the other hand, using a single canonical HRF could elicit a bias estimate of response across the brain, according to previous findings13,17,20,21,70 and as confirmed by our results, that show variability across brain regions and (to a lesser extent) across tasks. Moreover, mismatched latency and FWHM could be confused with dissimilarities in amplitude of response. 12
Finally, to overcome the limit in providing independent estimates of multiple parameters, nonlinear fitting could be used.19,34 However, nonlinear approaches are time and computational consuming and they are affected by convergence problems. 12
Comparison of model M1, M2 and canonical
In general terms, the shapes of fitted response and dHRF significantly differ between models (Table 1). While this is not true for each parameter and task, the widespread difference of interaction of factors “model” and “time” (significant for all comparisons of fitted responses and dHRFs, except for the fitted response to relational tasks) is of particular relevance, indicating that estimates derived from M1 and M2 have distinct time evolutions. The difference between parameters is less pronounced, with 9 significant differences among 28 comparisons (Supplemental Table 2). This finding indicates that the synthetic parameters commonly used to describe the BOLD response not always capture the difference between time courses.
Model M1 is characterized by a superior explanatory power than model M2, as can be observed from systematically and significantly higher values of adjusted R2 of regression model M1 (Figure 4). This observation does not imply that model M1 is a better representation of the true haemodynamic response, but rather than it provides a better fit for reproducing the BOLD signal changes elicited by the block-design paradigms considered.
The voxelwise similarity between the dHRF derived from model M1 and the canonical HRF, as evaluated by a correlation analysis, showed that the deviations of the dHRF from canonicity are region-dependent (Figure 5). This is a direct consequence of the spatial variability of the dHRF and, while in principle it cannot be stated that M1 represents a more realistic description of the true underlying haemodynamic response, it is likely to improve activation detection when response deviates from canonicity.66,70 Deviations from canonicity were previously reported,17,70 and are to be expected in particular for higher spatiotemporal resolutions,66,71 which indicate that flexible models for the HRF might be preferable. However, the actual effect of using a more flexible model on the identification of responding areas can’t be directly derived from data presented in this study, and likely depends on multiple experimental factors.
Outlook
With the shift towards higher magnetic field strengths and smaller voxel sizes in fMRI experiments, spatial resolution becomes a crucial factor that needs to be taken into account in the HRF estimation approach. Human fMRI studies with conventional voxel sizes (around 3 mm), although more likely to exhibit canonical HRFs, 71 are affected by partial volume effects with WM and CSF, which are characterized by distinct signal dynamics.14,72
Even within GM, the HRF has been shown to vary with cortical depth. A few studies reported higher amplitudes, varying TTPs, and broader FWHMs for superficial laminae compared to deeper laminae.73 –75 In general, BOLD responses change across cortical depths partly due to the different neuronal activity of different layers and partly due to a vascular “carry-over” effect from the activated layer to the cortical surface. 76 These differences are accessible only to submillimetric fMRI and are expected to become significant especially at Ultra High Magnetic field and in studies tailored to investigate laminar fMRI.
In terms of subcortical areas, although the whole brain was included in the analysis, the only subcortical regions with a strong enough response were the putamen, pallidum, and thalamus. Cerebellum showed a modest number of voxels above the threshold and was not included in the results.
The spatial heterogeneity of the timing features of the dHRF is of particular relevance in functional connectivity applications. According to our findings, all timing parameters exhibit some degree of spatial variability. In particular, the mean latency of the BOLD response can have across-region variations in the order of several seconds (for example, up to 9 seconds between primary motor areas and temporal gyrus), that borders with the band of interest for functional connectivity. Functional connectivity based on correlation analysis may therefore miss the presence of existing connections between certain areas or underestimate their strength. In the case of time-lagged correlation analysis, strong timing variability could even lead to a misinterpretation of the directional relationship between brain regions. Given that the variability of the timing parameters both across tasks and subjects is relatively low, these biases might be systematically affecting the same regions, i.e., those characterized by larger differences.
Conclusions
We investigated HRF shape and timing parameters in the majority of the brain cortex and for a broad range of functional tasks using a deconvolution approach. BOLD response and dHRF shape and parametrization were found to vary both across subjects and across brain areas, with amplitude and latency showing a higher variability compared to FWHM and TTP.
Each parameter exhibited a distinct pattern of spatial variability, possibly indicating that each of them is driven by a different feature of the response. Accordingly, the extent of deviations from the canonical HRF was found to be region-specific.
Our finding suggests that the assumption of a single HRF can lead to biases in task-based and in connectivity fMRI studies, and that the use of flexible models is better suited for modelling the BOLD response when the exact shape of the response is expected to play a role.
Supplemental Material
sj-pdf-1-jcb-10.1177_0271678X251325413 - Supplemental material for Towards whole brain mapping of the haemodynamic response function
Supplemental material, sj-pdf-1-jcb-10.1177_0271678X251325413 for Towards whole brain mapping of the hemodynamic response function by Fabio Mangini, Marta Moraschi, Daniele Mascali, Maria Guidi, Michela Fratini, Silvia Mangia, Mauro DiNuzzo, Fabrizio Frezza and Federico Giove in Journal of Cerebral Blood Flow & Metabolism
Footnotes
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Hub Life Science – Advanced Diagnostics (HLS-DA), PNC-E3-2022-23683266–CUP:J83C23000110007, funded by the Italian Ministry of Health as part of the National Complementary Plan ‘Innovative Health Ecosystem’ – Unique investment code: PNC-E.3.
Acknowledgements
The authors wish to thank Ilaria Di Sannio for her support with processing. Data were provided by the Human Connectome Project, WU-Minn Consortium (Principal Investigators: David Van Essen and Kamil Ugurbil; 1U54MH091657) funded by the 16 NIH Institutes and Centers that support the NIH Blueprint for Neuroscience Research; and by the McDonnell Center for Systems Neuroscience at Washington University.
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: One of the coauthors (Mauro Di Nuzzo) is member of the Editorial Board of JCBFM.
Authors’ contributions
Conceptualization: SM, FF, FG. Methodology: FM, MM, MG, SM, MDN, FG. Software: FM, DM. Validation: FM, MM, DM, MF. Formal analysis: FG, FM, MM, DM, MG, MF, MDN. Data Curation: FM. Writing – Original Draft: FM, MM, FG. Writing – Review & Editing: all authors. Visualization: FM, MM, DM, MG, MF. Supervision and Funding acquisition: FG, FF.
Supplemental material
Supplemental material for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
