How processing choices effect repeatability in BOLD

Abstract

Cerebrovascular reactivity (CVR) is increasingly recognized as a valuable clinical biomarker, making accurate, and reliable quantification essential, particularly in the absence of a gold-standard reference. However, both the acquisition and processing of CVR data are influenced by numerous methodological factors, including imaging technique, sequence parameters, vascular paradigm, and data processing strategies. These complexities can be daunting for newcomers and hinder methodological consistency across studies. To support both novice and experienced researchers, we systematically evaluated how different processing strategies affect CVR map repeatability, quantified using spatial intraclass correlation, across multiple vascular paradigms. Twenty-four healthy volunteers underwent BOLD–CVR imaging using CO₂ inhalation, breath-hold, and resting-state paradigms, each repeated in a test–retest setup. We found that optimal processing choices varied across paradigms and interacted in non-trivial ways. For example, the inclusion of motion-confounds and the application of temporal filtering require careful consideration, as they can introduce substantial collinearity in the regression model and reduce repeatability. We summarized these findings into practical insights to guide researchers in making sound methodological choices and promote consistency within the field.

Keywords

BOLD–MRI breath-hold cerebrovascular reactivity resting-state

Introduction

Cerebrovascular reactivity (CVR) refers to the ability for blood vessels to dilate and constrict in response to vasoactive stimulus. CVR mapping has become increasingly important tool in both clinical and research settings, particularly for assessing vascular health in conditions such as stroke, brain tumors, small vessel disease, and neurodegenerative diseases.¹

CVR is typically measured using magnetic resonance imaging (MRI) based techniques in combination with a stimulus paradigm to evoke a vascular response, referred to here as a vascular paradigm. A commonly used MRI sequence is blood-oxygen-level-dependent (BOLD) imaging, which is also used in functional MRI (fMRI), due to its high temporal resolution and signal-to-noise ratio (SNR).² The most frequent vascular paradigms are carbon dioxide (CO₂) inhalation and breath-holding (BH).¹ CO₂ inhalation increases blood CO₂ partial pressure (PCO₂) and lowers blood pH, resulting in vasodilation, and provides a well-controlled and reproducible stimulus, though it requires specialized delivery equipment.³ BH also results in increased PCO₂ and is, by contrast, more accessible but introduces greater variability due to differences in subject compliance.⁴ An alternative approach, resting-state (RS) CVR, avoids external stimuli altogether, and estimates CVR from spontaneous fluctuations in respiration and end-tidal CO₂ (ET-CO₂).⁵ However, this method assumes that sufficient vascular responses occur at rest, which may limit its sensitivity to detect underlying pathophysiology or blood redistribution leading to vascular steal, a local reduction in blood flow caused by increases elsewhere and an important marker of vascular impairment.⁶

While different vascular paradigms are generally assumed to probe the same underlying physiology, the ways in which they elicit a vascular response differ. Consequently, the same data processing strategies, such as the degree of spatial smoothing, temporal filtering, inclusion of motion-confounds, and choice of regressor, may not be equally suitable across paradigms, and different parameters might interact in complex ways.

Individual comparisons, such as CO₂ inhalation versus BH,⁷ BH versus RS,⁸ and CO₂ inhalation versus RS,⁹ have previously been investigated, however no study (to the authors’ knowledge) have simultaneously evaluated more than two paradigms within the same cohort or examined how optimal processing strategies may differ between them.

Several studies have examined the selection of regressor for CVR mapping. In BH-based CVR this has been done to mitigate the effect of low-patient compliance⁴ or to compensate for the absence of high-quality CO₂ recordings.^10,11 Other work has addressed regional variability in cerebrovascular dynamics by accounting for differences in delay and/or fitting a hemodynamic response function (HRF).^12–15 Data-driven approaches have also been proposed, either by deriving a regressor from the mean signal within a predefined region¹⁶ or by refining an initial regressor (e.g. a CO₂ trace) by finding highly correlated imaging signals and aligning them to maximum correlation to produce a new regressor.¹⁷

A smaller number of studies have systematically examined other aspects of CVR data processing. For example, Liu et al. investigated optimal temporal filtering of RS signals to improve the spatial correspondence between RS-CVR and CO₂-CVR maps,⁹ while Moia et al. employed independent component analysis (ICA) to remove motion-related signals while preserving CO₂-related fluctuations.¹⁸

There also exists a rich body of fMRI studies examining methodological choices. However, it remains unclear whether these findings are directly transferable to CVR imaging, particularly given that optimal processing strategies within fMRI, such as degree of spatial smoothing and/or removal of motion influence, can depend on the specific experimental design and research question.^19–22

Although many CVR processing approaches exist, the literature offers little systematic, evidence-based guidance to inform processing choices and ensure reproducible results.

In this study, we sought to address this gap by systematically comparing BOLD-based CVR derived from CO₂ inhalation, BH, and RS within a single cohort of healthy subjects. Our primary aim was to evaluate how widely used processing steps, including spatial smoothing, linear detrending, temporal filtering, and inclusion of motion-confounds, affect the repeatability of CVR maps across vascular paradigms. We found that while several choices, such as spatial smoothing, exerted broadly similar effects across paradigms, others, most notably temporal filtering and motion-confounds, showed more variable impacts and strong interactions with one another.

By presenting our findings within a cohesive framework, we provide actionable, data-driven recommendations to improve CVR repeatability and to guide methodological decision-making, particularly for researchers entering the field.

Materials and methods

Subjects

All subjects gave written informed consent in accordance with the Declaration of Helsinki. The study was approved by the Swedish Ethical Review Authority (ref: 2021-04825). Before participation, subjects completed a health form to screen for factors affecting cerebrovascular function or task compliance. Inclusion required good general health and no history of brain surgery, neurological/psychiatric disorders, cognitive impairment, or substance abuse. Subjects reported prescription drug use; only those on psychoactive medications were excluded. Additional exclusions included diagnosed pulmonary disease and regular smoking.

Procedure

Subjects underwent MRI examinations of BOLD–CVR using three vascular paradigms: fixed CO₂ inhalation, BH, and RS, each repeated in a test–retest design. These data were subsequently used to evaluate how processing choices affect repeatability and to gain insights into CVR image generation, more details below.

After completing the MRI session, subjects rated their discomfort for each vascular paradigm, as well as their general discomfort from wearing a facemask (see below) and from being inside the MRI scanner. Discomfort was rated on a scale from “None” to “Extreme.” For the CO₂ paradigm, subjects additionally reported any experienced symptoms (e.g. dizziness, anxiety). Differences in discomfort ratings were assessed using Friedman’s test. Significant results were followed by Wilcoxon signed-rank tests with Bonferroni correction. Further details are provided in Supplementary Material—Discomfort.

We also evaluated differences in head motion across paradigms by comparing mean framewise displacement (FD) during the MRI acquisitions. Pairwise comparisons were conducted using paired t-tests with Holm correction for multiple comparison. Further information is provided in Supplementary Material—Head motion.

Data acquisition

BOLD–CVR was measured on a 3 T MRI scanner (MAGNETOM Prisma; Siemens Healthineers, Forcheim, Germany), under one of three vascular paradigms: fixed CO₂ inhalation, BH, or RS. Each 5-min run was block-randomized (block size = 6). To assess test–retest repeatability, each paradigm was repeated after a 3–4 min break, during which subjects exited the scanner briefly.

For the CO₂ paradigm, we used our in-house iCO₂ CVR system,²³ involving a 30 s baseline, two 60 s blocks of 5% CO₂, each followed by 75 s of room air. BH instructions, tailored to subjects’ breathing pace,²⁴ were displayed using PsychoPy2,²⁵ a screen and a projector. The BH task included three 15 s end-expiratory holds, interleaved with 45 s of paced breathing, plus 60 s paced breathing before and after. Compliance with the BH task was verified post-acquisition by visual inspection of the respiratory traces recorded by the iCO₂ CVR system. During RS, subjects fixated on a cross without task or paced breathing. The stimulus protocols are shown in top part of Figure 1. To aid compliance, shorter practice versions of CO₂ and BH were conducted pre-scan.

Figure 1.

Overview of the design parameters selected for in-depth analysis, organized within a generalized CVR processing pipeline. The pipeline includes key stages: data acquisition, pre-/post-processing, regression analysis, and refinement. Parameters examined include vascular paradigm (with the specific stimulus protocols used in this study illustrated), susceptibility distortion correction, spatial smoothing, regressor choice, and outlier filtering. Linear detrending, temporal filtering, and motion-confounds were assessed jointly to explore their interaction effects. For a full list of all design parameters and their tested values, see Supplementary Table S1.

Subjects wore a facemask (Mask 7450 V2; Vyaire, Chicago, USA) connected to an MRI-conditional monitor (Expression MR400; Philips, Orlando, USA) throughout (excluding breaks) to deliver CO₂ and sample ET-CO₂ (sample rate: 1 Hz). For BH, subjects exhaled before and after each hold to aid interpolation, assuming linear ET-CO₂ increase. An issue was that the MR400 only provided calculated ET-CO₂, not full traces, which occasionally resulted in missing post-BH measurements. These missing values were imputed using data from the subject’s other BH trials. Additional details on handling of CO₂ data is provided in the Supplementary Material—CO₂ sampling.

Alongside functional T2*-weighted BOLD images, one T1-weighted MPRAGE image was acquired for registration and two (one per test/retest) dual-echo fieldmaps for susceptibility distortion correction (SDC). Acquisition parameters are listed in Supplementary Table S2.

CVR image and signal processing

Below, we describe the processing strategies, or “design parameters,” considered in this study, including both parameters left untested and those systematically varied. All strategies were applied identically across the vascular paradigms used for CVR imaging. For each tested parameter, we report the range of values examined as well as the default setting used during evaluation. When presenting paradigm-independent results, we choose to focus on the CO₂ paradigm, effectively treating it as the default paradigm. A complete overview of all parameters, including their tested levels and default values, is provided in Supplementary Table S1.

BOLD data were converted to BIDS-compliant NIfTI using DCM2Bids v3.1.1,²⁶ and preprocessed with fMRIPrep v23.2.0a3,²⁷ including motion correction, SDC, MNI normalization (MNI152NLin2009cAsym), and FreeSurfer v7.3.2²⁸ reconstruction. To assess the impact of SDC, we additionally ran fMRIPrep without including the fieldmaps (effectively disabling SDC), default: SDC enabled.

Further processing used a custom Python v3.8 framework with Nilearn v0.10.4, SciPy v1.14.1, Pandas v2.2.3, and NumPy v2.1.1.^29–33 Brain voxel-masking used Nilearn’s compute_epi_mask (“EPI-mask”; default) or fMRIPrep’s provided mask (“brain-mask”). Gaussian smoothing was applied using a variable full width at half maximum (fwhm; fwhm 0–9 mm; default: 5 mm; 0 mm = no smoothing).

Both voxel- and region-based analyses were conducted using Nilearn’s NiftiMasker/NiftiLabelsMasker. Region-based analysis used aparc + aseg parcellation³⁴; voxel-wise analysis was default. BOLD and confound timeseries (from fMRIPrep) were upsampled (factor: 1–4; default: 2), as was ET-CO₂, to match the temporal resolution. Polynomial linear detrending (degree 0–3; default: 1; 0 = no detrending) and low-pass temporal filtering (cutoff: 7.275–465.6 mHz, ∞; default: 116.4 mHz; ∞ = no filter) was applied to all timeseries. BOLD signals were scaled to percent signal change (PSC) using variable baseline strategies (see Supplementary Table S1; default: overall-mean). Global signal was PSC-scaled after summing unscaled signals. ET-CO₂ baselines were subtracted using the same approach. To correct for acquisition delays, ET-CO₂ was aligned to global signal via cross-correlation (±15 s). The first 15 s were then discarded to ensure steady-state.

CVR, defined as %BOLD change per unit change in regressor, was estimated via linear regression using BOLD timeseries as the dependent variable and either ET-CO₂ (default for CO₂/BH) or global signal (default for RS) as regressor. To account for physiological delay, the regressor was optionally shifted to the point of absolute maximum correlation with individual BOLD timeseries using cross-correlation and variable max-delay (±0–60 s, ∞; default: 0 s = no alignment). Timeseries were then downsampled to original BOLD sampling rate (0.878 s). Motion-confounds (24 timeseries; x, y, z translation and rotation plus their temporal derivatives, and squared terms) were included in the regression model if their absolute correlation with the main regressor was below a variable threshold (0–1; default: 0 = no motion-confounds).

To ensure robust repeatability estimates, quantified in this study using correlation-based metrics (see below), which are highly sensitive to extreme values, we removed outlying CVR estimates using a median absolute deviation (MAD) filter with a z-corrected threshold of 10 (approximately equivalent to excluding values more than 10 standard deviations from the mean under a normal distribution). While this approach was appropriate for our dataset of healthy volunteers, caution is warranted when applying outlier filtering more broadly, as overly aggressive thresholds may inadvertently remove physiologically meaningful signal in clinical or heterogeneous populations.

As a final step, we evaluated additional techniques for refining the CVR maps. First, we assessed several quality metrics for identifying low-quality voxels, including overall model fit (coefficient of determination, R²), regressor-specific fit (t-value), susceptibility-related distortion (fieldmap magnitude (fmap)), and temporal signal-to-noise ratio (tSNR; mean signal divided by its standard deviation, computed on raw BOLD data). For each metric, the lowest-performing 10% of voxels were removed; by default, no quality-based exclusion was applied. We then examined normalization strategies in which CVR maps were scaled to the mean value within gray matter (GM), white matter (WM), cerebrospinal fluid (CSF), or the whole-brain (WB), based on tissue segmentations obtained with FSL FAST (via fMRIPrep). The default was to apply no normalization. Analyses could also be restricted to a specific tissue of interest (default: WB).

Analysis

The primary outcome measure was test–retest repeatability of CVR maps, measured using voxel-wise intraclass correlation ICC(C,1).³⁵ We assessed how different design parameters, as detailed above, influenced repeatability across the three vascular paradigms: CO₂ inhalation (from here on referred to as CO₂), BH, and RS.

The analysis had two stages. First, we conducted a univariate sensitivity analysis: each parameter was varied individually (others held at default; see Supplementary Table S1), and repeatability of CVR maps were calculated.

Based on these results, key parameters (see Figure 1) were selected for a second, in-depth analysis. We focused on:

SDC: Compared voxel masks with/without correction to assess spatial alignment.

Smoothing: Analyzed how spatial smoothing influenced repeatability and subject/tissue contrast.

Linear detrending, temporal filtering, motion-confounds: Explored combined and interacting effects using Fourier analysis and regressor–confounds collinearity.

Regressor repeatability: Assessed how regressor stability related to CVR repeatability.

Regressor alignment: Evaluated the impact of aligning the regressor to individual BOLD signals.

Quality metrics: Investigated the relationship between tSNR and CVR across tissue types.

Univariate sensitivity analysis of design parameters

For each vascular paradigm, we perturbed individual parameters while keeping others fixed (defaults in Supplementary Table S1). Test–retest repeatability was quantified using ICC(C,1). Effects were evaluated with a repeated-measures Anova (AnovaRM) model after Fisher z-transformation of ICC values, implemented in R v4.4.3³⁶ using the function aov_ez from the package afex v1.5-0³⁷ via the Python-R interface rpy2.³⁸ Correction of p-values for non-sphericity was performed using the Huynh–Feldt (HF) method, which has been shown to be a robust approach under non-normality conditions^39,40 and is recommended for smaller sample sizes and multiple within-subject measurements.⁴¹ When sphericity corrections could not be performed, due to factor having only two levels (sphericity not defined) or singularities in the covariance matrix, uncorrected p-values were reported and explicitly indicated. For each factor we report F-value, degrees of freedom, HF correction factor, corresponding p-value and partial effect sizes (η²).

This exploratory analysis aimed to identify patterns for future validation; therefore, reported p-values were not corrected for multiple comparison.

In-depth analysis of design parameters

Susceptibility distortion correction

To evaluate the impact of SDC, we examined averaged fieldmaps, thresholded at ±40 Hz (pixel-bandwith was ~30 Hz/pixel in the phase-encoding). Differences between EPI-masks (with vs without SDC) were tested using FSL Randomise v2.9⁴² with threshold-free cluster enhancement (TFCE) and α = 0.05.

Smoothing

We evaluated the effect of spatial smoothing on repeatability and on subject/tissue contrast across fwhm levels. Subject contrast was quantified using “differential correlation,” a metric used in fMRI to assess how uniquely statistical maps identify individual subjects.⁴³ Briefly, voxel-wise Pearson correlation of CVR maps between runs were computed for all pairs of subjects and differential correlation was defined as the mean within-subject correlation minus the mean between-subject correlation. Tissue contrast was assessed by generating average CVR maps and extracting mean CVR values within GM, WM, CSF, and WB.

Interaction

We evaluated the combined impact of linear detrending, temporal filtering, and motion-confounds on test–retest ICC(C,1). Every parameter combination was tested and AnovaRM models, as described above, were used to assess main effects and interactions, again correcting from sphericity but not for multiple comparison.

We also analyzed qualitatively how confound–regressor collinearity affected repeatability, computing R² with the regressor as the dependent variable. FFTs were used to assess how parameter combinations altered regressor spectra.

Optimization

Continuing on the analysis above for linear detrending, temporal filtering, and motion-confounds, we aimed to isolate each parameter’s unique effect by performing an “optimized perturbation” where each parameter was varied while others were fixed at optimal values (based on test–retest ICC(C,1)). As a sanity check, we repeated the optimized and univariate perturbation using voxel-wise Pearson correlation with a reference map, the vascular paradigm and parameter combination with the highest test–retest repeatability, instead of test–retest ICC(C,1).

We also examined whether amount of head motion (measured by average FD) correlated with improvement in test–retest ICC(C,1) when including motion-confounds in the linear regression model. Improvements were assessed relative to models without motion-confounds, using optimal linear detrending and temporal filtering parameters for each condition.

Finally, we computed voxel-wise quality metrics using optimal values for linear detrend order, low-pass cutoff frequency, and confound–regressor correlation thresholds (based on test–retest ICC(C,1)) for each vascular paradigm:

Normalized CVR, scaled by the median CVR across voxels, subjects, and runs.

Model fit, measured by R².

Regressor fit, measured by t-value.

Average absolute CVR difference between test–retest, also scaled by the median CVR.

Coefficient of variation (COV), defined as standard deviation divided by absolute mean (limited to 1 to avoid arbitrary large values when mean CVR is close to 0).

Reliability, assessed using ICC(C,1) between test–retest and across subjects.

To avoid edge effects due to varying brain voxel-masks, all metrics above were restricted to voxels with a valid mask in at least 50% of subjects and runs.

Regressor

To assess the repeatability of the regressor itself, the timeseries underwent default linear detrending and temporal filtering. Test and retest regressors were aligned (±15 s) before computing ICC(C,1) between the two. Repeatability of ET-CO₂ versus global signal regressors was compared using paired t-tests on z-transformed ICCs and compared with corresponding CVR map repeatability. We also compared mean WB-CVR between test–retest. Normality was assessed using the Shapiro–Wilk test, followed, when significant, by inspection of Q–Q plots. Effect sizes are reported using Cohen’s d.

Delay

We assessed how aligning regressors to individual BOLD signals affected repeatability. Specifically, we analyzed test–retest differences in CVR maps and investigated whether these differences could be attributed to the repeatability of estimated delays (measured as absolute delay differences), regressor–BOLD correlation, the maximum allowable delay, and the regressor’s autocorrelation function.

We also evaluated if applying a median filter to delay values (kernel size 3 × 3 × 3) prior to the CVR estimation, could improve CVR test–retest repeatability.

Quality metrics

Among the tested quality metrics used to improve CVR repeatability, we choose to focus on tSNR. To maintain measure independence, tSNR was computed from RS data (RS-tSNR) and CVR from CO₂ inhalation (CO₂-CVR). We compared them using scatterplots and averages maps across subjects. Mean values were also calculated within GM, WM, and CSF after within-subject standardization to enable comparison across tissues.

Results

In total 25 subjects were recruited; however, one was later excluded due to difficulties with being inside of the MRI-scanner, leaving 24 subjects (four complete blocks of size 6) that completed the study. The median age was 26 (range: 18–52), and 63% of subjects (n = 15) were women. Results from the discomfort survey are provided in Supplementary Figure S3 and Table S3. No significant differences in discomfort were reported between the different vascular paradigms.

Head motion was highest during BH (mean FD = 0.14 mm), followed by CO₂ (0.12 mm) and RS (0.11 mm). However, only the BH–RS comparison was statistically significant. For more information see Supplementary Material—Head motion.

All participants demonstrated good compliance with the BH task, as confirmed by inspection of the respiratory traces. In three measurements, the respiratory trace was accidently not saved; in these cases, compliance was instead verified by inspection of the global BOLD signal.

Supplementary Figure S5 and Table S4 present the results of the initial univariate sensitivity analysis. In the sections below, we focus on the in-depth analysis of design parameters illustrated in Figure 1, beginning with SDC.

Susceptibility distortion correction

We observed pronounced positive field offsets near the frontal sinuses, while large negative distortions appear around the ear canals (Figure 2, top panel). In areas adjacent to these regions, we identified significant differences (p < 0.05) in voxel-wise brain masks with and without SDC (Figure 2, bottom panel). At the same time, we only saw moderate improvement in repeatability comparing CVR maps with and without SDC (Supplementary Figure S5), which reached significant levels for BH (p = 0.048; η² = 0.16), but not for CO₂ (p = 0.053; η² = 0.15) or RS (p = 0.23; η² = 0.06).

Figure 2.

Effect of SDC. (top panel) Average fieldmap across subjects and runs, thresholded at ±40 Hz, highlighting areas with substantial field inhomogeneity, most prominently around the frontal sinuses and ear canals, and indicating the direction of voxel displacement along the phase-encoding (anterior–posterior) axis. (bottom panel) Regions showing significant differences (p < 0.05) in brain voxel-mask (EPI-mask) with and without SDC, illustrating the spatial extent of geometric distortions corrected by the SDC procedure and signal loss. These EPI-mask were based on BOLD data acquired during the CO₂ paradigm.

Smoothing

We observed a clear increase in test–retest ICC(C,1) as the fwhm increases from 0 mm (no smoothing) to 9 mm (p < 0.001) for all vascular paradigms (η² = 0.97, 0.91, 0.91 for CO₂, BH, RS, respectively). However, the relative improvement in repeatability diminishes with higher smoothing levels (Supplementary Figure S5). At the same time there was a loss of subject and tissue contrast (Figure 3). Particularly we note that CVR in WM tends to be overestimated and in CSF underestimated.

Figure 3.

Effect of spatial smoothing on CVR map repeatability and tissue/spatial contrast for the CO₂ paradigm. (top panel) Average test–retest Pearson correlation of CVR maps within the same subject (green graph) and differential correlation (average within-subject correlation minus average between-subject correlation; red graph) across spatial smoothing levels (0–9 mm fwhm). Within-subject correlation increases with smoothing, indicating improved repeatability, but at the same time the differential correlation reduces, indicating reduced subject-contrast. (middle row) Mean CVR maps at 2, 5, and 8 mm fwhm illustrate the progressive blurring of anatomical boundaries with increased smoothing. (bottom panel) average CVR values within different tissue types, WM, GM, CSF, and WB, as a function of fwhm. As smoothing increases, tissue-specific values converge toward the WB average, reflecting a loss of tissue contrast.

Interaction

For all paradigms the confound–regressor correlation threshold (η² = 0.95, 0.83, 0.65 for CO₂, BH, RS, respectively), low-pass cutoff frequency (η² = 0.96, 0.95, 0.94) and their two-way interaction (sphericity uncorrected; η² = 0.82, 0.73, 0.66) were significant (p < 0.001). The linear detrend order was significant for CO₂ (p = 0.030; η² = 0.14) and RS (p < 0.001; η² = 0.54) but not for BH (p = 0.11; η² = 0.09), and its interaction with low-pass cutoff frequency was significant for CO₂ (p < 0.001; η² = 0.20) but not for BH (p = 0.097; η² = 0.07) or RS (p < 0.13; η² = 0.07). Additionally, the interaction between linear detrend order and confound–regressor correlation threshold was significant in all paradigms (p < 0.001; η² = 0.22, 0.25, 0.53), as was the three-way interaction among all factors (sphericity uncorrected p < 0.001; η² = 0.16, 0.08, 0.07). In Supplementary Table S5, complete results from the AnovaRM analysis are given, including F-values and HF correction factors.

Without motion-confounds, CO₂ and BH CVR map repeatability was largely insensitive to the exact low-pass cutoff frequency, if major regressor components were preserved (Figure 4, top-left and middle-left panels). RS showed a similar trend, though the regressor lacked clear spectral peaks and instead showed a decline in power around 116.4 mHz (~1/9 s). When motion-confounds were included, the regressor and confounds exhibited near-perfect collinearity for low-pass cutoff frequencies below 29.19 mHz, leading to poor repeatability across all vascular paradigms (Figure 4, top-right and middle-right panels). As the cutoff frequency increased, collinearity decreased, and repeatability improved. We also note that without motion-confounds, the regressor shape remains largely preserved except at the lowest cutoff (7.275 mHz; Figure 4, bottom left panel). When motion-confounds are included, however, both 7.275 and 58.20 mHz cutoffs distort the regressor substantially, while the unfiltered version retains the expected temporal profile (Figure 4, bottom right panel). The effect of linear detrending on repeatability was diminished when motion-confounds was included (Supplementary Figure S7, top panels). A likely explanation seems to be that the effect of linear detrending is to remove low-frequency components, however these are already removed when motion-confounds are included (Supplementary Figure S7, bottom panels).

Figure 4.

Interaction between temporal filtering and motion-confounds. (top row) Test–retest ICC(C,1) for CVR maps as a function of low-pass cutoff frequency for each vascular paradigm, (left) without motion-confounds (regressor-confound correlation threshold = 0.0), (right) with motion-confounds (regressor-confound correlation threshold = 1.0). Without motion-confounds, repeatability plateaus at different frequencies for the different vacular paradgims, indicated by colored dotted lines. With motion-confounds, repeatability improves only as cutoff frequency approaches infinity. (middle-left) In the frequency spectra of each (unfiltered) regressor timeseries, normed by the maximum mean amplitude, major components are evident for CO₂ and BH; RS shows reduced power around 116.4 mHz. (middle-right) The collinearity (R²) between regressor and motion-confounds is near perfect at cutoff frequency below 29.10 mHz and reduces successively as the frequency increases. (bottom row) ET-CO₂ regressors for CO₂ paradigm are markedly different with and without motion-confounds, after filtering at different cutoff frequencies. Without motion-confounds (left), only the 7.275 mHz version (turquoise) deviates notably from the expected shape (two equal-sized 60 s blocks). With motion-confounds (right), only the unfiltered version (purple) retains the expected shape.

Optimization

Using optimized perturbed parameters, we found that the linear detrend order no longer had a significant effect on RS paradigm (p = 0.40; η² = 0.04). Likewise, the confound-regressor correlation threshold no longer significantly affected paradigms BH (sphericity uncorrected p = 0.15; η² = 0.07) or RS (sphericity uncorrected p = 0.74; η² = 0.02). All other parameters remained statistically significant, see Supplementary Table S6 for summary.

When accounting for head motion, we observed that subjects with the highest amount of motion during BH (average FD > 0.2 mm) showed improved test–retest ICC(C,1) when motion-confounds were included in the linear regression model (Supplementary Figure S4). In contrast, for the CO₂ and RS paradigms, amount of head motion did not correlate with changes in repeatability.

Overall, the CO₂ paradigm achieved the highest test–retest ICC(C,1) using the default settings: first-order detrending, a low-pass cutoff frequency of 116.4 mHz, and exclusion of motion-confounds. Consequently, this configuration was selected as the reference CVR map. Similar trends to those shown in Figure 5 were observed when calculating whole-brain voxel-wise Pearson correlations with this reference map instead of test–retest ICC(C,1); (Supplementary Figure S8). However, RS showed a local peak in correlation around low-pass cutoff frequency 232.8 mHz under the univariate perturbation analysis. Summaries of the optimized and univariate perturbed analyses using correlation with the reference CVR map are provided in Supplementary Tables S7 and S8, respectively.

Figure 5.

Optimized perturbation of design parameters. Each design parameter, linear detrend order, low-pass cutoff frequency, and confound–regressor correlation threshold, is perturbed individually. At each level, the other two parameters are held at their empirically optimal settings, defined by the highest mean test–retest ICC(C,1). The overall optimal combination for each vascular paradigm is highlighted with an orange marker. Error bars represent 95% confidence intervals.

Using the optimized processing combination shown in Figure 5 for each paradigm, we qualitatively observed that the average median-scaled CVR maps appeared similar across paradigms, as did the distribution of voxel-wise values (Supplementary Figure S9, top section). However, when comparing quality metrics (R², t-value, CVR absolute difference, CVR COV, CVR reliability) the CO₂ paradigm consistently outperformed the other two paradigms, with BH performing second best (Supplementary Figure S9, bottom sections).

Regressor

For the CO₂ paradigm, regressor repeatability was significantly higher for the ET-CO₂ regressor compared to the global signal regressor (p = 0.0016; Cohen’s d = 0.73). No significant differences were observed for BH (p = 0.30; Cohen’s d = 0.21) or RS (p = 0.95; Cohen’s d = −0.012). In contrast, the corresponding CVR map repeatability showed significant lower repeatability (p < 0.001) using the ET-CO₂ regressor across all paradigms (Cohen’s d = −0.79, −1.0, −2.3 for CO₂, BH, RS, respectively). Supplementary Figure S10 shows a graphical representation of above results and Supplementary Table S9 a summary of the statistical tests.

Additionally, across all paradigms, we observed a small but consistent reduction in whole-brain average CVR values between test and retest sessions using the ET-CO₂ regressor, however only significant for CO₂ paradigm (p = 0.036, Cohen’s d = −0.46). See Supplementary Table S10 for summary.

Delay

Aligning the regressor to account for voxel-wise temporal delays (Supplementary Figure S5) reduced CVR map repeatability across all paradigms (p < 0.001; η² = 0.83, 0.88, 0.86 for CO₂, BH, RS, respectively). We found that the regressor alignment introduced a subset of voxels showing anti-correlated CVR values, similar amplitude but different signs, between test and retest (Figure 6, column 1), primarily located in deep white matter (Figure 6, column 2). These voxels exhibited lower regressor–BOLD correlations (Figure 6, column 3), indicating poor model fit. The absolute delay differences between test and retest also clustered around paradigm-specific times (Figure 6, column 4), which corresponded to inflection points in the corresponding regressor’s autocorrelation function (Figure 6, column 5). This suggests that instability in delay estimation, particularly in regions with low signal, may drive the observed decrease in repeatability following alignment.

Figure 6.

Effects of voxel-wise regressor alignment. (column 1) Voxel-wise CVR values from a representative subject, comparing test (x-axis) and retest (y-axis) for the CO₂, BH, and RS paradigms. Blue points indicate voxels with consistent sign and values along the identity line (x = y), while orange points indicate sign reversals and values along the anti-diagonal (x = −y), indicating anti-correlation. (column 2) Spatial distribution of anti-correlated voxels (orange), averaged across subjects and thresholded at 10% occurrence, primarily located in deep white matter. (column 3) Distribution of regressor–BOLD correlation coefficients for blue and orange voxel groups (scaled by peak density), highlighting weaker signal in anti-correlated voxels. (column 4) absolute delay differences between test and retest for each group, and (column 5) regressor autocorrelation structure, showing a plateau, or minima, corresponding to delay were orange voxels pile up in column 4. Delay bounds for alignment were set to: CO₂ = unbounded, BH = ±15 s, RS = ±8 s.

Given that negative CVR has often been associated with vascular steal effects, although other physiological and non-physiological mechanisms may also contribute, we will refer to these anti-correlated voxels introduced by regressor alignment as “fake steal.” We also observed that applying a median filter to delay values prior to CVR estimation, largely removed the anti-correlated voxels (Supplementary Figure S11).

Quality metrics

We found that identifying poor quality voxels using either R², regressor t-value or fieldmap, had a moderate positive impact on repeatability, while tSNR had a negative effect (Supplementary Figure S5). This reduction appears to stem from tSNR preferentially removing voxels with high variability (Figure 7, top panel), which can attenuate correlation-based metrics that depend on overall variability within the dataset.

Figure 7.

Comparison of tSNR and CVR. (top panel) Voxel-wise scatterplot of CO₂-CVR versus RS-tSNR for a representative subject. The shaded area indicates the lowest 10% of tSNR values that would be excluded by the tSNR-quantile filter. (middle panel) Group-averaged maps of CO₂-CVR and RS-tSNR, revealing spatial similarities, and (bottom panel) within-subject standardized values in WM, GM, and CSF, further illustrating tissue-level correspondence. Whiskers denote 1.5 times the interquartile range.

We also observe that both CVR and tSNR maps seem to highlight different tissue boundaries (Figure 7, middle panel) which is confirmed when comparing average values in various tissue types (Figure 7, bottom panel). In fact, all quality metrics, except fieldmaps, showed large tissue dependencies (Supplementary Figure S9).

Notably, the measures in Figure 7 were derived independently, CVR from the CO₂ paradigm (CO₂-CVR) and tSNR from the resting-state paradigm (RS-tSNR), yet similar associations were observed across other paradigm combinations, underscoring the robustness of this relationship.

Discussion

In this study, we systematically evaluated how processing choices influence the repeatability of BOLD–CVR maps across common vascular paradigms. Our main findings were (i) applying temporal filtering and including motion-confounds can introduce substantial collinearity between the regressor and confounds, warranting caution when using them together and (ii) the temporal properties of the regressor impose fundamental limits on its ability to capture physiological delays, requiring care to avoid artifacts such as “fake steal.”

Temporal filtering and motion-confounds create large collinearity

Our analysis of linear detrending, temporal filtering, and motion-confound reveals that processing steps in CVR analysis are not always additive. Specifically, when motion-confounds are included in the linear regression model, adding temporal filtering can be counterproductive, as filtering reduces degrees of freedom and increase collinearity between regressor and confounds (Figure 4, right column).

Without motion-confounds, temporal filtering showed little effect on test–retest repeatability if fundamental stimulus frequencies were preserved, for our CO₂ paradigm this frequency was ~1/135 Hz (60 s CO₂ + 75 s rest) and for BH ~1/60 Hz (15 s BH + 45 s rest) plus the first and second harmonics, supporting that CVR responses are primarily stimulus driven (Figure 4, left column). For RS, test–retest repeatability plateaued at 232.8 mHz (~1/4 s) and the correlation with CVR reference map showed a peak at approximately the same frequency (Supplementary Figure S8). This corresponds to typical respiratory cycles (~5–8 s), indicating that RS-CVR is primarily driven by spontaneous breathing. These results are consistent with Liu et al.,⁹ although they reported a peak at 116.4 mHz (~1/9 s), possibly reflecting small differences in breathing patterns between studies.

From the optimal perturbation analysis (Figure 4), we not that including motion-confounds tended to reduce test–retest repeatability for the CO₂ and BH paradigms (only statistical significantly for CO₂) and had little to no effect on RS paradigm. However, this likely reflects the generally low levels of head motion in our dataset. For the few subjects with a higher degree of head motion during BH (average FD > 0.2 mm), including motion-confounds improved repeatability (Supplementary Figure S4), suggesting that the relative trade-offs might be different when more motion is present. Caution is therefore needed before generalizing our results to other studies with greater and more variable head movement.

As a general recommendation, when motion-confounds are included in the linear regression model, additional processing steps that reduce the degrees of freedom, such as temporal filtering, should generally be avoided as this can increase the collinearity between the regressor and confounds. In contrast, when motion-confounds are omitted, temporal filtering is acceptable provided that fundamental stimulus frequencies, and relevant harmonics, are preserved.

Temporal limitations of regressor can lead to fake steal

Anti-correlated voxels along the x = −y line in Figure 6 (first column, orange) likely reflect true CVR responses with inverted sign caused by regressor misalignment. Such sign inversions can mimic vascular steal, that is, “fake steal,” and arise when baseline segments of the regressor become aligned with plateau segments of the signal, and vice versa. Consistent with this interpretation, the test–retest delay differences for these anti-correlated voxels cluster around paradigm-specific times: ~60 s for CO₂, 15 s for BH, and ~7 s for RS (Figure 6, fourth column). These values match the stimulus cycle lengths for CO₂ and BH, and for RS they correspond to typical respiratory periods (~5–8 s). These voxels also show weaker overall correlation with the regressor (Figure 6, third column), which would make them more susceptible to misalignment.

Crucially, the consistent, paradigm-specific delays suggest that constraining the delay range, ideally within the first local minimum of the regressor’s autocorrelation (Figure 6, fifth column), can mitigate these fake steal artifacts. Since autocorrelation structures differ across paradigm, the safe delay window should be tailored accordingly. Indeed, such an approach is implemented by the advance package RapidTide.⁴⁴ Another solution is to modify the autocorrelation structure itself, for instance by using an asymmetric stimulus. Alternatively, median filtering delay maps before CVR estimation (Supplementary Figure S11) also reduces anti-correlated voxels, suggesting that post hoc filtering can enhance estimate quality.

Other important considerations in BOLD–CVR imaging

Below, we outline important considerations in BOLD–CVR imaging that are highlighted by our findings and align with broader knowledge in the field.

BOLD–CVR is influenced by baseline vascular factors

Our CVR–tSNR analysis shows that quality metric filtering must be applied cautiously, as aggressive filtering can bias results toward specific tissues or regions (Figure 7). Furthermore, the strong spatial correspondence between CVR and tSNR also suggests a shared underlying source.

This could be baseline vascular factors such as cerebral blood volume or deoxyhemoglobin concentration. Their influence is a known limitation of BOLD–CVR⁴⁵ and it seems plausible that they be captured by fluctuation-based metrics like tSNR, especially since such fluctuations have previously been used to rescale fMRI signals.^46,47 An alternative, not mutually exclusive, explanation is that both measures reflect true physiological CVR, consistent with tSNR being the inverse of resting-state fluctuation amplitude (RSFA), a known CVR proxy.⁴⁸

Regardless of interpretation, the dependence of BOLD–CVR on baseline physiology means caution is warranted when comparing CVR values across regions or tissue types that may be modulated by varying baseline vascular factors.

Smoothing improves SNR but reduces subject and tissue contrasts

Our results support what is generally understood: spatial smoothing improves SNR but also decreases subject identifiability and increases tissue blending. This trade-off is evident comparing the within-subject and differential correlation, as well as the mean CVR values across tissue types as smoothing increases (Figure 3). (Note that the high average CVR observed in CSF arises because the CSF mask includes extracerebral regions that contain large venous sinuses.)

Choosing an optimal smoothing level therefore requires weighing SNR improvement against the need to preserve subject-specific and spatial detail. Larger kernels (e.g. 8 mm fwhm for 3 mm voxels at 3 T) may be appropriate for detecting broad CVR changes, such as those seen in steno-occlusive disease. However, for fine-grained effects, such as those relevant to small vessel disease, this degree of smoothing may obscure diagnostically important features.

Standard Gaussian smoothing also blends signals across tissue boundaries, complicating interpretation, especially considering the above discussion on baseline vascular factors and its influence on BOLD–CVR measurements. We therefore recommend alternative smoothing strategies that respect anatomical boundaries, such as tissue-specific or edge-preserving approaches (e.g. SUSAN in the FSL toolbox⁴⁹), which offer a more appropriate balance between SNR improvement and spatial specificity.

Choice of regressor determines interpretation of CVR maps

Across all vascular paradigms, the global signal produced more repeatable CVR maps than the ET-CO₂ regressor, even for the CO₂ paradigm, where ET-CO₂ itself showed high repeatability. This likely reflect the fact that ET-CO₂ is only an indirect measurement of arterial CO₂ and that a HRF links changes in blood gas concentrations to the resulting cerebral blood flow and BOLD responses. Much of this HRF is implicitly captured by the global signal, although it also incorporates non-CO₂ contributions such as neural activity and regionally varying delays.⁵⁰ Another limitation of using the global signal is that the resulting CVR estimates no longer retain physiologically interpretable units (e.g. %BOLD/mmHg).

An alternative explanation for the improved repeatability is that CVR may change between session. We observed a consistent decrease from test to retest, consistent with Sleight et al.,⁵¹ possibly reflecting reduced CO₂ sensitivity after the initial exposure. Because the global signal scales with the overall BOLD response, using it as a regressor partially normalizes out such global shifts, yielding CVR maps that reflect relative rather than absolute reactivity.

This raises a conceptual question: should BOLD–CVR be interpreted as a quantitative physiological measure or as a spatial map of relative vascular responsiveness? And given BOLD’s dependence on baseline vascular factors, does a quantitative regressor truly make BOLD–CVR more quantitative?

In practice, the best regressor might depend on study goals. As both the global signal and ET-CO₂ can be obtained with minimal effort, we recommend acquiring both and selecting whichever yields the most consistent and meaningful CVR estimates for the population and application of interest.

Lost signal can not be restored using SDC

Applying SDC results in little to no improvement in test–retest repeatability (Supplementary Figure S5). Perhaps expected given that the SDC is driven by stable anatomical features, specifically boundaries between tissue–air and tissue–bone. Our data also confirms the well-known dependence of distortion on boundary geometry and orientation relative to the B₀ field, which determines both the direction and magnitude of field perturbations (Figure 2).⁵²

Accordingly, the benefit of SDC in gradient-echo imaging lies in repositioning signal to its correct anatomical location, not restoring lost signal.⁵³ This is illustrated in the bottom panel of Figure 2, where blue voxels in the anterior brain reflect apparent signal loss after SDC. Strong anterior field perturbations shift these voxels posteriorly into the brain, causing signal mixing that cannot be fully disentangled.

This discussion highlights the importance of planning BOLD–CVR experiments in advance to minimize signal loss. Adjusting the phase-encoding direction or tilting the acquisition slab can reduce dropout in specific regions, often, unfortunately, at the cost of increased distortion elsewhere.⁵⁴

Limitations

Single dataset

Our study examined processing parameters using a single dataset with a specific scanning protocol, which may limit the generalizability of some findings. For example, some toolboxes, such as AFNI, automatically adjust linear detrend order based on the scan length,⁵⁵ why our results might not be transferable to studies with substantially different scan length than used in our study (5 min). Therefore, caution is warranted when applying our results to datasets acquired with different scanning protocols.

Only two measurements

Our repeatability assessment relied on just two measurements (test–retest). A more comprehensive evaluation would involve additional repetitions, though the optimal number is not straightforward to define. However, because our goal was not to estimate absolute repeatability with high precision but rather to identify trends and provide methodological insights, we consider two measurements sufficient for the purposes of this study.

Design parameters scope

Our analysis focused on a selected set of CVR processing parameters (Supplementary Table S1). While not exhaustive, these parameters span a broad cross-section of commonly used practices. Many alternative or more advanced approaches, such as ICA-based denoising, certainly exist, but we consider our chosen set to be broadly representative of current CVR processing strategies.

Vascular paradigm

This study compared several commonly used vascular paradigms in CVR imaging but was not exhaustive; for example, it did not include acetazolamide administration or vasoconstrictive challenges such as paced deep breathing. Although optimal processing strategies may differ between paradigms, we believe that many of the insights gained here remain broadly applicable.

Regressor

Although it is common in CVR imaging to convolve the physiological regressor with a HRF, there is no consensus on the most appropriate HRF, nor on whether the optimal HRF varies across vascular paradigms. Regional differences in venous drainage can further shape the HRF,¹³ which some approach by fitting voxel-wise HRFs.¹⁴ Given these complexities, we chose not to evaluate HRF convolution, as doing so would require dedicated methodological considerations beyond the scope of this work. We also note that using the global signal as a regressor can partially mimic the effect of HRF convolution, though it introduces its own limitations, as discussed above.

Repeatability metric

We assessed repeatability using correlation-based metrics computed across the whole-brain CVR maps and did not evaluate alternative approaches such as regional analyses or Bland–Altman comparisons. Although several metrics are available, correlation effectively captured both repeatability and reliability for our purposes. Because our aim was to provide practical insights into processing choices rather than an exhaustive comparison of repeatability metrics, we focused on a single, simple, and interpretable measure.

Emphasis on CVR amplitude

We focused our analysis on CVR amplitude and did not evaluate delay map repeatability or other dynamical aspects of the CVR response (e.g. response speed or shape). This decision allowed us to concentrate on the primary factors influencing CVR amplitude map generation without diluting the analysis with additional outcome measures.

CO₂ sampling

Our setup estimated ET-CO₂ without capturing full waveforms, limiting post hoc verification and potentially affecting generalizability, particularly during BH. However, in terms of repeatability of the regressor itself, the ET-CO₂ timeseries showed similar performance as the global signal (Supplementary Figure S10), and the amount of missing ET-CO₂ data during BH did not correlate with CVR map repeatability (Supplementary Material—CO₂ sampling). Together, these observations indicate that our findings are reasonably robust despite the limitations of the CO₂ sampling method.

Conclusion

Our findings underscore the importance of standardization across processing strategies to improve reproducibility and facilitate comparison between studies. While this study focused on healthy subjects and widely used vascular paradigms, future work should explore how these findings translate to clinical populations and alternative stimulus protocols.

In offering this overview, we hope to contribute to a more transparent and accessible entry point into CVR imaging, while also encouraging greater consistency and comparability across studies.

Supplemental Material

sj-docx-1-jcb-10.1177_0271678X261420026 – Supplemental material for How processing choices effect repeatability in BOLD–CVR imaging

Supplemental material, sj-docx-1-jcb-10.1177_0271678X261420026 for How processing choices effect repeatability in BOLD–CVR imaging by Gustav Magnusson, Alex A Bhogal, Charalampos Georgiopoulos, Gunnar Cedersund, Lovisa Tobieson, Maria Engström and Anders Tisell in Journal of Cerebral Blood Flow & Metabolism

Footnotes

Acknowledgements

The authors gratefully acknowledge Mats Johansson, Region Östergötland, Sweden, and Bengt Rangemalm, Linköping University, Sweden, for their invaluable technical assistance.

Author contributions

GM: conceptualization, data curation, investigation, methodology, software, validation, visualization, writing—original draft, writing—review and editing. AAB: conceptualization, writing—review and editing. CG: subject screening, methodology, supervision. LT: subject screening, methodology, supervision. GC: methodology, funding acquisition, supervision. ME: methodology, funding acquisition, project administration, resources, supervision, writing—review and editing. AT: methodology, funding acquisition, project administration, resources, supervision, writing—review and editing.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was made possible through funding from the Swedish Research Council (grant 2022-02886), the Swedish Brain Foundation (grant FO2022-0109), and Region Östergötland (ALF grant). GC has used funding from the Swedish Research Council (grant 2018-05418, 2018-03319, 2023-03186, and 2023-05460), EU project STRATIF-AI (grant 101080875), and EU project PRECISE4Q (grant 777107).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

ORCID iDs

Gustav Magnusson

Alex A Bhogal

Charalampos Georgiopoulos

Lovisa Tobieson

Maria Engström

Anders Tisell

Data availability statement

Due to ethical and institutional restrictions, the raw imaging data from this study cannot be shared publicly. However, de-identified data and associated processing scripts may be made available to qualified researchers for non-commercial academic use only, upon reasonable request to the authors.

Supplemental material

Supplemental material for this article is available online.

References

Sleight

Stringer

Marshall

, et al. Cerebrovascular reactivity measurement using magnetic resonance imaging: a systematic review. Front Physiol 2021; 12: 643468.

Liu

De Vis

Cerebrovascular reactivity (CVR) MRI with CO₂ challenge: a technical review. Neuroimage 2019; 187: 104–115.

Fierstra

Sobczyk

Battisti-Charbonney

, et al. Measuring cerebrovascular reactivity: what stimulus to use?. J Physiol 2013; 591(23): 5809–5821.

Bright

Murphy

Reliable quantification of BOLD fMRI cerebrovascular reactivity despite poor breath-hold performance. Neuroimage 2013; 83: 559–568.

Liu

Pinho

, et al. Cerebrovascular reactivity mapping without gas challenges. NeuroImage 2017; 146: 320–326.

De Vis

Bhogal

Hendrikse

, et al. Effect sizes of BOLD CVR, resting-state signal fluctuations and time delay measures for the assessment of hemodynamic impairment in carotid occlusion patients. Neuroimage 2018; 179: 530–539.

Tancredi

Hoge

RD.

Comparison of cerebral vascular reactivity measures obtained using breath-holding and CO₂ inhalation. J Cereb Blood Flow Metab 2013; 33(7): 1066–1074.

Lipp

Murphy

Caseras

, et al. Agreement and repeatability of vascular reactivity estimates based on a breath-hold task and a resting state scan. Neuroimage 2015; 113: 387–396.

Liu

Pinho

, et al. Cerebrovascular reactivity mapping using resting-state BOLD functional MRI in healthy adults and patients with moyamoya disease. Radiology 2021; 299(2): 419–425.

10.

Zvolanek

Moia

Dean

, et al. Comparing end-tidal CO₂, respiration volume per time (RVT), and average gray matter signal for mapping cerebrovascular reactivity amplitude and delay with breath-hold task BOLD fMRI. Neuroimage 2023; 272: 120038.

11.

Clements

Zvolanek

Reddy

, et al. Quantitative mapping of cerebrovascular reactivity amplitude and delay with breath-hold BOLD fMRI when end-tidal CO₂ quality is low. Imaging Neurosci 2025; 3: imag_a_00536.

12.

Yao

Yang

Wang

, et al. A novel method of quantifying hemodynamic delays to improve hemodynamic response, and CVR estimates in CO₂ challenge fMRI. J Cereb Blood Flow Metab 2021; 41(8): 1886–1898.

13.

Bhogal

AA.

Medullary vein architecture modulates the white matter BOLD cerebrovascular reactivity signal response to CO₂: observations from high-resolution T2* weighted imaging at 7T. Neuroimage 2021; 245: 118771.

14.

Poublanc

Crawley

Sobczyk

, et al. Measuring cerebrovascular reactivity: the dynamic response to a step hypercapnic stimulus. J Cereb Blood Flow Metab 2015; 35(11): 1746–1756.

15.

Domingos

Esteves

Fouto

, et al. Cerebrovascular reactivity mapping using breath-hold BOLD–fMRI: comparison of signal models combined with voxelwise lag optimization. Imaging Neurosci 2025; 3: IMAG.a.80.

16.

Niftrik

CHB

Piccirelli

Bozinov

, et al. Fine tuning breath-hold-based cerebrovascular reactivity analysis models. Brain Behav. 2016; 6(2): e00426.

17.

Donahue

Strother

Lindsey

, et al. Time delay processing of hypercapnic fMRI allows quantitative parameterization of cerebrovascular reactivity and blood flow delays. J Cereb Blood Flow Metab 2016; 36(10): 1767–1779.

18.

Moia

Termenon

Uruñuela

, et al. ICA-based denoising strategies in breath-hold induced cerebrovascular reactivity mapping with multi echo BOLD fMRI. Neuroimage 2021; 233: 117914.

19.

Alakörkkö

Saarimäki

Glerean

, et al. Effects of spatial smoothing on functional brain networks. Eur J Neurosci 2017; 46(9): 2471–2480.

20.

Mikl

Mareček

Hluštík

, et al. Effects of spatial smoothing on fMRI group inferences. Magn Reson Imaging 2008; 26(4): 490–503.

21.

Ciric

Wolf

Power

, et al. Benchmarking of participant-level confound regression strategies for the control of motion artifact in studies of functional connectivity. Neuroimage 2017; 154: 174–187.

22.

Huijbers

Van Dijk

KRA

Boenniger

, et al. Less head motion during MRI under task than resting-state conditions. Neuroimage 2017; 147: 111–120.

23.

Magnusson

Engström

Georgiopoulos

, et al. High inspired CO₂ target accuracy in mechanical ventilation and spontaneous breathing using the additional CO₂ method. Front Med 2024; 11: 1352012.

24.

Pinto

Bright

Bulte

, et al. Cerebrovascular reactivity mapping without gas challenges: a methodological guide. Front Physiol 2021; 11: 608475.

25.

Peirce

Gray

Simpson

, et al. PsychoPy2: experiments in behavior made easy. Behav Res Methods 2019; 51(1): 195–203.

26.

Boré

Guay

Bedetti

, et al. Dcm2Bids (Internet). Zenodo, https://zenodo.org/record/8436509 (2023, accessed 25 June 2025).

27.

Esteban

Markiewicz

Blair

, et al. fMRIPrep: a robust preprocessing pipeline for functional MRI. Nat Methods 2019; 16(1): 111–116.

28.

Dale

Fischl

Sereno

MI.

Cortical surface-based analysis. Neuroimage 1999; 9(2): 179–194.

29.

Harris

Millman

Van Der Walt

, et al. Array programming with NumPy. Nature 2020; 585(7825): 357–362.

30.

McKinney

. Data structures for statistical computing in Python. In: Proceedings of the Python in science conference (Internet), 2010, pp.56–61. Austin: SciPy.

31.

Nilearn Contributors, Chamma

Frau-Pascual

, et al. Nilearn (Internet). Zenodo, https://zenodo.org/doi/10.5281/zenodo.8397156 (2025, accessed 10 July 2025).

32.

The Pandas Development Team. pandas-dev/pandas: Pandas (Internet). Zenodo, https://zenodo.org/doi/10.5281/zenodo.3509134 (2025, accessed 10 July 2025).

33.

Virtanen

Gommers

Oliphant

, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020; 17(3): 261–272.

34.

Desikan

Ségonne

Fischl

, et al. An automated labeling system for subdividing the human cerebral cortex on MRI scans into gyral based regions of interest. Neuroimage 2006; 31(3): 968–980.

35.

Liljequist

Elfving

Skavberg Roaldsen

Intraclass correlation—a discussion and demonstration of basic features. PLoS One 2019; 14(7): e0219854.

36.

R Core Team. R: a language and environment for statistical computing (Internet). Vienna: R Foundation for Statistical Computing, https://www.R-project.org/ (2024, accessed 12 November 2025).

37.

Singmann

Bolker

Westfall

, et al. afex: analysis of factorial experiments (Internet), p.1.5-0, https://CRAN.R-project.org/package=afex (2012, accessed 12 November 2025).

38.

Gautier

rpy2 (Internet), https://github.com/rpy2/rpy2/releases/tag/RELEASE_3_6_4 (2023, accessed 12 November 2025).

39.

Berkovits

Hancock

Nevitt

Bootstrap resampling approaches for repeated measure designs: relative robustness to sphericity and normality violations. Educ Psychol Meas 2000; 60(6): 877–892.

40.

Blanca

Arnau

García-Castro

, et al. Non-normal data in repeated measures ANOVA: impact on type I error and power. Psicothema 2023; 35(1): 21–29.

41.

Haverkamp

Beauducel

Violation of the sphericity assumption and its effect on type-I error rates in repeated measures ANOVA and multi-level linear models (MLM). Front Psychol 2017; 8: 1841.

42.

Winkler

Ridgway

Webster

, et al. Permutation inference for the general linear model. Neuroimage 2014; 92: 381–397.

43.

Amico

Goñi

The quest for identifiability in human functional connectomes. Sci Rep 2018; 8(1): 8254.

44.

Frederick

deB, Salo

Daniel

bbfrederick/rapidtide: version 3.1.1—11/17/25 CVR output bug fix (Internet). Zenodo, https://zenodo.org/records/17.633117 (2025, accessed 18 November 2025).

45.

Schulman

Uludağ

Problems and solutions in quantifying cerebrovascular reactivity using BOLD–MRI. Imaging Neurosci 2025; 3: imag_a_00556.

46.

Kalcher

Boubela

Huf

, et al. RESCALE: voxel-specific task-fMRI scaling using resting state fluctuation amplitude. Neuroimage 2013; 70: 80–88.

47.

Kazan

Mohammadi

Callaghan

, et al. Vascular autorescaling of fMRI (VasA fMRI) improves sensitivity of population studies: a pilot study. Neuroimage 2016; 124: 794–805.

48.

Golestani

Wei

Chen

JJ.

Quantitative mapping of cerebrovascular reactivity using resting-state BOLD fMRI: validation in healthy adults. Neuroimage 2016; 138: 147–163.

49.

Smith

Brady

JM.

SUSAN—a new approach to low level image processing. Int J Comput Vis 1997; 23(1): 45–78.

50.

Tong

Hocke

Frederick

BB.

Low frequency systemic hemodynamic “noise” in resting state BOLD fMRI: characteristics, causes, implications, mitigation strategies, and applications. Front Neurosci 2019; 13: 787.

51.

Sleight

Stringer

Mitchell

, et al. Cerebrovascular reactivity measurements using 3T BOLD MRI and a fixed inhaled CO₂ gas challenge: repeatability and impact of processing strategy. Front Physiol 2023; 14: 1070233.

52.

Dardzinski

Collins

, et al. Three-dimensional mapping of the static magnetic field inside the human head. Magn Reson Med 1996; 36(5): 705–714.

53.

Roopchansingh

French

Nielson

, et al. EPI distortion correction is easy and useful, and you should use it: a case study with toddler data. bioRxiv (Preprint) 2020. doi: 10.1101/2020.09.28.306787

54.

Weiskopf

Hutton

Josephs

, et al. Optimal EPI parameters for reduction of susceptibility-induced BOLD sensitivity losses: a whole-brain analysis at 3 T and 1.5 T. Neuroimage 2006; 33(2): 493–504.

55.

Cox

RW.

AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput Biomed Res Int J 1996; 29(3): 162–173.

How processing choices effect repeatability in BOLD–CVR imaging

Abstract

Keywords

Introduction

Materials and methods

Subjects

Procedure

Data acquisition

CVR image and signal processing

Analysis

Univariate sensitivity analysis of design parameters

In-depth analysis of design parameters

Susceptibility distortion correction

Smoothing

Interaction

Optimization

Regressor

Delay

Quality metrics

Results

Susceptibility distortion correction

Smoothing

Interaction

Optimization

Regressor

Delay

Quality metrics

Discussion

Temporal filtering and motion-confounds create large collinearity

Temporal limitations of regressor can lead to fake steal

Other important considerations in BOLD–CVR imaging

BOLD–CVR is influenced by baseline vascular factors

Smoothing improves SNR but reduces subject and tissue contrasts

Choice of regressor determines interpretation of CVR maps

Lost signal can not be restored using SDC

Limitations

Single dataset

Only two measurements

Design parameters scope

Vascular paradigm

Regressor

Repeatability metric

Emphasis on CVR amplitude

CO2 sampling

Conclusion

Supplemental Material

sj-docx-1-jcb-10.1177_0271678X261420026 – Supplemental material for How processing choices effect repeatability in BOLD–CVR imaging

Footnotes

Acknowledgements

Author contributions

Funding

Declaration of conflicting interests

ORCID iDs

Data availability statement

Supplemental material

References

CO₂ sampling