Abstract
Background
The expected value of sample information (EVSI) measures the expected benefits that could be obtained by collecting additional data. Estimating EVSI using the traditional nested Monte Carlo method is computationally expensive, but the recently developed Gaussian approximation (GA) approach can efficiently estimate EVSI across different sample sizes. However, the conventional GA may result in biased EVSI estimates if the decision models are highly nonlinear. This bias may lead to suboptimal study designs when GA is used to optimize the value of different studies. Therefore, we extend the conventional GA approach to improve its performance for nonlinear decision models.
Methods
Our method provides accurate EVSI estimates by approximating the conditional expectation of the benefit in 2 steps. First, a Taylor series approximation expresses the conditional expectation of the benefit as a function of the conditional moments of the parameters of interest, using a spline fitted to the samples of the parameters and the corresponding benefits. Next, the conditional moments of the parameters are approximated by the conventional GA and the expected Fisher information. The proposed approach is applied to several data collection exercises involving non-Gaussian parameters and nonlinear decision models. Its performance is compared with that of the nested Monte Carlo method, the conventional GA approach, and the nonparametric regression-based method for EVSI calculation.
Results
The proposed approach provides accurate EVSI estimates across different sample sizes when the parameters of interest are non-Gaussian and the decision models are nonlinear. The computational cost of the proposed method is similar to that of other recently developed methods.
Conclusions
The proposed approach can estimate EVSI across sample sizes accurately and efficiently, which may support researchers in determining an economically optimal study design using EVSI.
Highlights
The Gaussian approximation method efficiently estimates the expected value of sample information (EVSI) for clinical trials with varying sample sizes, but it may introduce bias when health economic models have a nonlinear structure.
We introduce the spline-based Taylor series approximation method and combine it with the original Gaussian approximation to correct the nonlinearity-induced bias in EVSI estimation.
Our approach can provide more accurate EVSI estimates for complex decision models without sacrificing computational efficiency, which can support more cost-effective resource allocation.
Keywords
Value-of-information (VoI) analysis quantifies the economic benefits of reducing uncertainty in a decision model.1–4 In general, VoI methods use Bayesian decision theory to combine insights from health economic decision models with data from previous studies, which helps inform decisions about future data collection and resource allocation.1,5 One VoI metric is the expected value of sample information (EVSI), which quantifies the expected economic benefit of a specific data collection experiment. EVSI can therefore help identify the optimal trial from a health economic standpoint and guide public resource allocation toward the most valuable future data collection.6–11
However, computing EVSI has traditionally been challenging, both conceptually and computationally. EVSI requires integrating over the conditional expectation of the net benefits, which is usually estimated through nested simulations.12 Most simulation models used in practice are too complex for this conditional expectation to be computed analytically. EVSI is therefore often computed numerically by sampling from the distribution of the conditional expectation of the net benefits a large number of times. Traditional estimation methods rely on numerical techniques such as Markov chain Monte Carlo (MCMC) and can take considerable time to run.12 As a result, the computational burden of EVSI is often high, especially when the health economic decision model is complex. This has limited the application of EVSI in practice.13,14
In recent years, several methods have been proposed to reduce the computational cost of estimating EVSI,15–22 and these methods have been applied in real-world studies.7,13,23 In this article, we use the Gaussian approximation (GA) approach, which estimates EVSI for studies with different sample sizes at a low computational cost and can therefore readily identify the optimal design.22 Specifically, the GA approach estimates the conditional expectation of the net benefit for each intervention using a regression model. In this model, the net benefit samples are the response variable and the parameters of the health economic decision model are the predictor variables.22,24 However, the GA approach may produce biased EVSI estimates when the net benefit function is nonlinear. This is because GA estimates the conditional expectation of the net benefit by evaluating the regression model at an estimate of the conditional expectation of the model parameters. When the relationship between the parameters and the net benefit is nonlinear, this can misestimate the conditional expectation of the net benefit.22
To improve the accuracy of GA-based EVSI estimates for nonlinear decision models, we extend the original GA by introducing higher-order correction terms. We do this by decomposing the conditional expectation of the net benefits into 2 components using Taylor series expansions.25–27 The first component is estimated using the original GA approach. The second component adjusts for the bias resulting from nonlinear net benefit functions and is estimated using the fitted regression model, samples of the parameters, and the expected Fisher information.28–30 We name the resulting method the spline-based Taylor series approximation and Gaussian approximation (TGA). Once the prior effective sample sizes (ESSs) are obtained, TGA retains the efficiency of GA in approximating EVSI across sample sizes.
We begin this article by formally defining EVSI and reviewing the GA methodology.22 We then present our extension of GA. First, we show how to approximate the conditional expectation of the net benefits using Taylor series expansions. Second, we introduce methods for approximating the elements that the Taylor series approximation requires. After that, we use stylized examples and the case study model from the original GA article to demonstrate that our extended GA approach provides more accurate EVSI estimates for nonlinear decision models.22 We conclude with a short discussion.
Methods
The EVSI
Health economic decision models assess the net monetary (or net health) benefits of different interventions to help decision makers choose the optimal decision from
Using the information contained in the health economic decision model, the EVSI is defined as the difference between 2 quantities: the expected net benefit of the optimal decision made after an additional dataset is collected, and the expected net benefit of the optimal decision based on prior knowledge. Using our prior knowledge about
For a study with a sample size n, we plan to collect a dataset
As these data have not been collected, EVSI is defined by averaging over all potential datasets. The probability distribution of potential datasets
We then average over the randomness of
EVSI for a study design with a sample size n is defined as the difference between terms 3 and 1:
where equations 4 and 5 are equivalent because of the law of total expectation.17 Equation 5 is more commonly used in the numerical approximation of EVSI because it can reduce the uncertainty introduced by Monte Carlo sampling.17
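Below is a minimal sketch of the final averaging step. It assumes the conditional expected net benefits E[NB_d | X_i] have already been estimated for a set of simulated datasets X_i, stored with one row per dataset and one column per decision. Both EVSI terms can then be computed from the same array. The column means estimate the prior expected net benefits by the law of total expectation, so the estimate cannot be negative. The function name is hypothetical.

```python
import numpy as np


def evsi_from_conditional_means(cond_nb):
    """Estimate EVSI from an (n_datasets, n_decisions) array.

    Row i holds the conditional expected net benefits E[NB_d | X_i] of every
    decision d for simulated dataset X_i (hypothetical helper, not from the article).
    """
    cond_nb = np.asarray(cond_nb, dtype=float)
    # Value of deciding after seeing the data: average of the row-wise maxima.
    value_with_data = cond_nb.max(axis=1).mean()
    # Value of deciding now: best decision under the averaged (prior) net benefits.
    value_without_data = cond_nb.mean(axis=0).max()
    return value_with_data - value_without_data


# Toy input: 2 simulated datasets, 2 decisions.
print(evsi_from_conditional_means([[0.0, 1.0], [0.0, -1.0]]))  # 0.5
```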
The traditional method for calculating the first component in equation 5 is Monte Carlo sampling, which requires a nested 2-stage process: first, we need to simulate a large number of samples of datasets
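The nested 2-stage process can be sketched on a hypothetical toy problem with a conjugate normal prior and likelihood and an incremental net benefit INB(θ) = a + bθ. All names and parameter values are illustrative assumptions. The toy posterior has a closed form, so the inner stage samples it directly. A realistic model would typically require MCMC at this step, which is where the computational burden arises. A closed-form EVSI for the same toy problem is included as a check.

```python
import math

import numpy as np


def nested_mc_evsi(n, mu0, tau, sigma, a, b, n_outer=2000, n_inner=200, seed=1):
    """Nested Monte Carlo EVSI for a toy model with INB(theta) = a + b * theta.

    Prior: theta ~ N(mu0, tau^2); each of the n observations is N(theta, sigma^2).
    The inner stage samples the conjugate posterior directly; a realistic model
    would usually need MCMC here, which is what makes the method slow.
    """
    rng = np.random.default_rng(seed)
    cond_inb = np.empty(n_outer)
    for i in range(n_outer):
        # Outer stage: draw a parameter value and a dataset of size n.
        theta = rng.normal(mu0, tau)
        data = rng.normal(theta, sigma, size=n)
        # Conjugate normal update (precision-weighted average).
        post_prec = 1.0 / tau**2 + n / sigma**2
        post_mean = (mu0 / tau**2 + data.sum() / sigma**2) / post_prec
        # Inner stage: average INB over posterior draws to estimate E[INB | X_i].
        inner = rng.normal(post_mean, math.sqrt(1.0 / post_prec), size=n_inner)
        cond_inb[i] = np.mean(a + b * inner)
    return np.mean(np.maximum(cond_inb, 0.0)) - max(cond_inb.mean(), 0.0)


def closed_form_evsi(n, mu0, tau, sigma, a, b):
    """Exact EVSI for the same toy model: E[INB | X] is normal across datasets."""
    m = a + b * mu0
    s = abs(b) * tau * math.sqrt(n * tau**2 / (n * tau**2 + sigma**2))
    z = m / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # E[max(0, Y)] for Y ~ N(m, s^2), minus the value of deciding now.
    return m * cdf + s * pdf - max(m, 0.0)


args = (20, 0.0, 1.0, 2.0, -0.2, 1.0)   # n, mu0, tau, sigma, a, b (illustrative)
print(nested_mc_evsi(*args), closed_form_evsi(*args))
```

The two printed values should agree to within Monte Carlo error. The cost grows with the product of the outer and inner loop sizes, and the whole computation must be repeated for every sample size n.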
Approximating the Conditional Expectation of a Prior Parameter Using the GA Approach
This section reviews the original GA approach. Assume we are aiming to approximate
where
Moreover, we denote the sample mean of the simulated dataset with sample size
Since both
Because the prior distribution and likelihood function are conjugate, the conditional mean of
where
Since the prior distribution of
Following equation 11, we can construct the distribution of
Since the distribution in health economic evaluation often aims to reflect uncertainty, the shape of the distribution of
Lastly, Jalal and Alarid-Escudero suggest that
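The GA step can be sketched as follows, assuming the formulation of Jalal and Alarid-Escudero. Prior samples of the parameter are shrunk toward their mean by the factor sqrt(n/(n + n0)), and the fitted regression metamodel is then evaluated at the shrunk values. The function and variable names, the prior ESS value n0 = 4, and the linear INB are illustrative assumptions.

```python
import numpy as np


def ga_preposterior_means(theta, n, n0):
    """Gaussian-approximation draws of E[theta | X] for a future study of size n.

    theta: prior (PA) samples of the parameter of interest.
    n0: prior effective sample size (assumed already estimated).
    Each draw is shrunk toward the sample mean by sqrt(n / (n + n0)), so the
    variance of the result is n / (n + n0) times the prior variance.
    """
    theta = np.asarray(theta, dtype=float)
    center = theta.mean()
    return center + (theta - center) * np.sqrt(n / (n + n0))


# Hypothetical usage with a linear INB metamodel and a two-option decision.
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, size=100_000)     # illustrative prior draws
inb = -0.2 + theta                             # illustrative INB samples from a PA
coef = np.polyfit(theta, inb, 1)               # linear regression metamodel
mu_x = ga_preposterior_means(theta, n=20, n0=4.0)
cond_inb = np.polyval(coef, mu_x)              # GA estimate of E[INB | X]
evsi = np.mean(np.maximum(cond_inb, 0.0)) - max(cond_inb.mean(), 0.0)
print(round(evsi, 3))
```

For a linear INB, evaluating the metamodel at E[θ | X] gives E[INB | X] exactly. For a nonlinear INB, this plug-in step ignores the conditional variance of θ.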
Taylor Series Expansions for the Conditional Expectation of Net Benefit
Taylor series expansions can provide a more accurate estimation for
where
To approximate
where
Following equations 13 and 14, the estimation of
Although equation 14 uses a second-order Taylor series approximation for estimating
Finally, note that this article limits the parameter
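The second-order expansion can be checked numerically. This minimal sketch assumes a single parameter θ with conditional mean μ_X and conditional variance v_X, and it uses the standard approximation E[g(θ) | X] ≈ g(μ_X) + ½ g''(μ_X) v_X. We assume this matches the univariate form of equation 14.

For a quadratic g, the approximation is exact, whereas the plug-in GA estimate g(μ_X) misses the term ½ g''(μ_X) v_X. For g(θ) = θ⁴ with a Gaussian conditional distribution, a residual error of 3v_X² remains. The functions and coefficients are hypothetical.

```python
def taylor2_conditional_mean(g, g2, mu_x, var_x):
    """Second-order Taylor approximation of E[g(theta) | X].

    g: function of theta; g2: its second derivative;
    mu_x, var_x: conditional mean and variance of theta given X.
    """
    return g(mu_x) + 0.5 * g2(mu_x) * var_x


mu_x, var_x = 0.2, 0.01   # illustrative conditional moments

# Quadratic INB (hypothetical coefficients): the approximation is exact.
quad = lambda t: -1.0 + 5.0 * t**2
quad2 = lambda t: 10.0
exact_quad = -1.0 + 5.0 * (mu_x**2 + var_x)
print(quad(mu_x), taylor2_conditional_mean(quad, quad2, mu_x, var_x), exact_quad)

# Quartic INB, Gaussian conditional distribution: E[theta^4] = mu^4 + 6 mu^2 v + 3 v^2.
quart = lambda t: t**4
quart2 = lambda t: 12.0 * t**2
exact_quart = mu_x**4 + 6 * mu_x**2 * var_x + 3 * var_x**2
print(exact_quart - taylor2_conditional_mean(quart, quart2, mu_x, var_x))  # ≈ 3 * var_x**2
```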
Approximating Conditional Variance of a Prior Parameter Using Expected Fisher Information
In this section, we introduce the methods that can be used to estimate the conditional variance
Alternatively, if the Gaussian assumption of
However, since the Gaussian assumption of
The expected Fisher information is a crucial concept in statistical estimation theory. Utilizing asymptotic theory, it approximates
The expected Fisher information function is defined as the negative expectation of the second-order derivative of the log-likelihood function of
The functional forms of
Using the asymptotic properties of the conditional distribution of
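The following sketch shows one way the expected Fisher information can yield a conditional variance. It assumes that the prior acts like n0 pseudo-observations, so that Var(θ | X) ≈ 1 / ((n + n0) I(μ_X)), where I(θ) is the expected Fisher information of a single observation. This combination rule is our assumption and may differ from the article's equation.

For a Bernoulli likelihood, I(p) = 1/(p(1 − p)). Comparing the result with the exact Beta posterior variance μ(1 − μ)/(n0 + n + 1) shows that the approximation is slightly too large, consistent with the overestimation noted in the next paragraph. The names and numbers are hypothetical.

```python
def fisher_conditional_variance(mu_x, n, n0, unit_fisher_info):
    """Approximate Var(theta | X) by 1 / ((n + n0) * I(mu_x)).

    unit_fisher_info(theta): expected Fisher information of ONE observation.
    Treating the prior as n0 pseudo-observations is an assumption of this sketch.
    """
    return 1.0 / ((n + n0) * unit_fisher_info(mu_x))


def bernoulli_fisher_info(p):
    """Expected Fisher information of one Bernoulli(p) observation."""
    return 1.0 / (p * (1.0 - p))


# Hypothetical Beta(3, 7) prior (n0 = 10) and 20 successes in n = 50 trials.
a, b, n, successes = 3.0, 7.0, 50, 20
post_mean = (a + successes) / (a + b + n)
exact_var = post_mean * (1 - post_mean) / (a + b + n + 1)    # Beta posterior variance
approx_var = fisher_conditional_variance(post_mean, n, a + b, bernoulli_fisher_info)
print(approx_var / exact_var)  # (a + b + n + 1) / (a + b + n) = 61/60
```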
Lastly, because the asymptotic conditional variance approximated by the expected Fisher information is usually greater than the true conditional variance,40 we can further adjust the conditional variance provided by the expected Fisher information using the law of total variance. Because equation 15 suggests that the average of the conditional variance should be equal to
After adjustments, the conditional variance associated with
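The law-of-total-variance adjustment can be sketched as follows. Because Var(θ) = E[Var(θ | X)] + Var(E[θ | X]), and GA implies Var(E[θ | X]) = Var(θ) n/(n + n0), the conditional variances should average Var(θ) n0/(n + n0). This sketch meets that target by rescaling the Fisher-based variances with a single multiplicative constant. That rescaling is an assumption, and the article's adjustment may take a different form. The names and values are hypothetical.

```python
import numpy as np


def adjust_conditional_variances(var_x, prior_var, n, n0):
    """Rescale Fisher-based draws of Var(theta | X) so that their average equals
    E[Var(theta | X)] = Var(theta) * n0 / (n + n0), as implied by the law of
    total variance when Var(E[theta | X]) = Var(theta) * n / (n + n0)."""
    var_x = np.asarray(var_x, dtype=float)
    target = prior_var * n0 / (n + n0)
    return var_x * (target / var_x.mean())


adjusted = adjust_conditional_variances([0.01, 0.02, 0.03], prior_var=0.05, n=40, n0=10)
print(adjusted)  # approximately [0.005, 0.01, 0.015]
```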
Approximating Conditional Expectation of Net Benefit Using Splines
This section introduces how to approximate the functional form of
Splines are flexible regression models that characterize nonlinear relationships between responses and predictors using a series of basis functions.16,37 As Strong et al.17 introduced, using the PA datasets, we can regress the samples of net benefits given by the decision
After the functional form of
In equation 14, the conditional expectation of the net benefit can then be approximated using the fitted splines:
Moreover, using equations 12 and 18, the marginal distribution of
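The spline step and the resulting TGA estimate of the conditional expectation can be sketched as follows. The sketch assumes a single parameter of interest. It uses an unpenalized cubic regression spline in the truncated power basis as a simple stand-in for whatever spline implementation is used in practice. The second derivative of this spline is available in closed form, which is what the Taylor correction needs. The class and function names, data, and conditional moments are hypothetical.

```python
import numpy as np


class CubicRegressionSpline:
    """Least-squares cubic regression spline in the truncated power basis.

    A simple stand-in (assumption) for the penalized splines that would usually
    be fitted to the PA samples, e.g. with a generalized additive model.
    """

    def __init__(self, x, y, n_knots=5):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        # Interior knots placed at evenly spaced sample quantiles of x.
        self.knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
        self.coef, *_ = np.linalg.lstsq(self._basis(x), y, rcond=None)

    def _basis(self, x):
        cols = [np.ones_like(x), x, x**2, x**3]
        cols += [np.maximum(x - k, 0.0) ** 3 for k in self.knots]
        return np.column_stack(cols)

    def __call__(self, x):
        return self._basis(np.atleast_1d(np.asarray(x, dtype=float))) @ self.coef

    def second_derivative(self, x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        c = self.coef
        d2 = 2.0 * c[2] + 6.0 * c[3] * x
        for k, ck in zip(self.knots, c[4:]):
            d2 = d2 + 6.0 * ck * np.maximum(x - k, 0.0)   # d2/dx2 of (x - k)_+^3
        return d2


def tga_conditional_mean(spline, mu_x, var_x):
    """Spline value at E[theta | X] plus the second-order Taylor correction."""
    return spline(mu_x) + 0.5 * spline.second_derivative(mu_x) * np.asarray(var_x, dtype=float)


# Hypothetical PA samples with a quadratic INB.
rng = np.random.default_rng(0)
theta = rng.normal(0.1, 0.2, size=5_000)
inb = -1.0 + 5.0 * theta**2
spline = CubicRegressionSpline(theta, inb)

mu_x, var_x = np.array([0.15]), np.array([0.01])  # illustrative conditional moments
print(spline(mu_x))                               # GA-style plug-in, about -0.8875
print(tga_conditional_mean(spline, mu_x, var_x))  # about -0.8375
```

Because the illustrative INB is quadratic, the TGA value matches the exact conditional expectation −1 + 5(μ_X² + v_X). The plug-in value misses the term 5v_X.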
EVSI Calculation
Finally, we can repeat the above procedures to approximate the marginal distribution of
The algorithm for estimating EVSI for k different sample sizes
Estimating EVSI Using GA and Spline-Based Taylor Series Expansions
EVSI, expected value of sample information; GA, Gaussian approximation; PA, probabilistic analysis.
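Putting the steps together, the loop over sample sizes can be sketched as below for a two-option decision and a single parameter of interest. This is a hypothetical end-to-end illustration, not the article's algorithm, and it rests on several assumptions:

- a cubic polynomial from np.polyfit stands in for the spline;
- the Fisher-information rule Var(θ | X) ≈ 1/((n + n0) I(μ_X)) and the multiplicative rescaling are assumed;
- the toy problem is invented for illustration: a normal prior with τ = 1, a normal likelihood with σ = 2 (so n0 = 4), and INB = −1.2 + θ².

The metamodel is fitted only once, and each additional sample size costs only a few vectorized operations.

```python
import numpy as np


def tga_evsi_curve(theta, inb, sample_sizes, n0, unit_fisher_info, deg=3):
    """Hypothetical TGA-style EVSI for several sample sizes (two options, one parameter).

    theta, inb: PA samples of the parameter of interest and of the INB.
    n0: prior effective sample size; unit_fisher_info(theta): Fisher information
    of one observation. A cubic polynomial stands in for the spline metamodel.
    """
    theta = np.asarray(theta, dtype=float)
    coef = np.polyfit(theta, inb, deg)    # metamodel g(theta), fitted once
    coef2 = np.polyder(coef, 2)           # second derivative g''(theta)
    center, prior_var = theta.mean(), theta.var()
    evsi = []
    for n in sample_sizes:
        # Step 1: GA draws of the conditional mean E[theta | X].
        mu_x = center + (theta - center) * np.sqrt(n / (n + n0))
        # Step 2: Fisher-information conditional variances, rescaled so that their
        # mean equals prior_var * n0 / (n + n0) (law of total variance).
        info = np.broadcast_to(np.asarray(unit_fisher_info(mu_x), dtype=float), mu_x.shape)
        var_x = 1.0 / ((n + n0) * info)
        var_x = var_x * (prior_var * n0 / (n + n0)) / var_x.mean()
        # Step 3: second-order Taylor approximation of E[INB | X].
        cond_inb = np.polyval(coef, mu_x) + 0.5 * np.polyval(coef2, mu_x) * var_x
        # Step 4: EVSI for choosing between the two options.
        evsi.append(np.mean(np.maximum(cond_inb, 0.0)) - max(cond_inb.mean(), 0.0))
    return np.array(evsi)


# Hypothetical toy problem: theta ~ N(0, 1), one observation ~ N(theta, 2^2), n0 = 4.
rng = np.random.default_rng(2024)
theta = rng.normal(0.0, 1.0, size=50_000)
inb = -1.2 + theta**2
sigma = 2.0
curve = tga_evsi_curve(theta, inb, [10, 50, 200], n0=4.0,
                       unit_fisher_info=lambda t: 1.0 / sigma**2)
print(curve)
```

In this toy problem the exact conditional variance is the constant σ²/(n + n0). The curve can therefore be checked against a direct simulation of E[INB | X] = −1.2 + μ_X² + σ²/(n + n0).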
Simulation Study
Case Study I: Gaussian Parameters and Nonlinear Net Benefit Functions
In the first case study, our augmented GA method based on splines and Taylor series expansions (TGA) is used to evaluate the EVSI for 4 stylized examples with Gaussian-distributed parameters and linear or nonlinear net benefit functions. To assess its accuracy, we compare the EVSI estimates from TGA with those from the conventional GA approach and the nonparametric regression-based method. In addition, we derive the analytic solution for the conditional net benefit and compute the EVSI based on this quantity. Because this analytic EVSI is the most accurate, it is used as the benchmark against which the other 3 methods are compared.
Incremental net benefit function
Our decision problem compares 2 potential interventions. To simplify the calculation, we can derive the incremental net benefit function using the net benefit functions of 2 decision options by subtracting one net benefit function from the other, that is,
Incremental Net Benefit Functions for 4 Stylized Studies in Case Study I
Using the incremental net benefit function, EVSI can be calculated using the conditional expectation of
Therefore, we can approximate the distribution of
Parameter of interest and dataset generation
For the first 3 scenarios in which the parameter of interest
The likelihood function of
The prior distribution for the fourth (bivariate) incremental net benefit function is an independent bivariate normal distribution with a mean of 0 and a variance of
Method 1: Analytic method
For each data collection exercise, we draw
Using the simulated dataset with the sample size
Note that deriving a closed-form expression for the conditional incremental net benefit is usually infeasible because of the complexity of the underlying health economic decision model. Therefore, the analytic method is rarely, if ever, applicable in practice.
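For Gaussian parameters, the analytic benchmark can be sketched as follows. The sketch assumes a normal prior θ ~ N(μ0, τ²), normally distributed observations with known standard deviation σ, and INB(θ) = a + bθ^k with k ∈ {1, 2, 4}. These mirror the univariate functional forms in Figure 1, but the coefficients are hypothetical.

Because the posterior is normal, E[INB | X] follows from the normal raw moments E[θ²] = m² + v and E[θ⁴] = m⁴ + 6m²v + 3v². The sketch simulates each dataset's sample mean directly rather than all n observations, which is valid because the sample mean is a sufficient statistic here. The function names are hypothetical.

```python
import numpy as np


def normal_raw_moment(mean, var, power):
    """E[theta**power] for theta ~ N(mean, var), for power in {1, 2, 4}."""
    if power == 1:
        return mean
    if power == 2:
        return mean**2 + var
    if power == 4:
        return mean**4 + 6 * mean**2 * var + 3 * var**2
    raise ValueError("only powers 1, 2 and 4 are implemented in this sketch")


def analytic_evsi(n, mu0, tau, sigma, a, b, power, n_datasets=100_000, seed=0):
    """EVSI for INB(theta) = a + b * theta**power under a normal prior and a
    normal likelihood, using the exact posterior for each simulated dataset."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(mu0, tau, size=n_datasets)
    xbar = rng.normal(theta, sigma / np.sqrt(n))   # sufficient statistic of each dataset
    post_prec = 1 / tau**2 + n / sigma**2
    post_mean = (mu0 / tau**2 + n * xbar / sigma**2) / post_prec
    post_var = 1 / post_prec                       # the same for every dataset
    cond_inb = a + b * normal_raw_moment(post_mean, post_var, power)
    prior_inb = a + b * normal_raw_moment(mu0, tau**2, power)
    return np.mean(np.maximum(cond_inb, 0.0)) - max(prior_inb, 0.0)


print(analytic_evsi(10, 0.0, 1.0, 2.0, -1.2, 1.0, power=2))
```

Here the only approximation is Monte Carlo error in the outer simulation. Realistic decision models rarely admit such closed-form posterior moments.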
Method 2: Nonparametric regression-based method
We generate
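The nonparametric regression-based estimator can be sketched on the same kind of toy problem. Each prior draw θ_i generates a simulated dataset, summarized here by its sample mean. The INB samples are then regressed on that summary, so the fitted values estimate E[INB | X_i]. A cubic polynomial (np.polyfit) stands in for the generalized additive model typically used. This substitution, the toy model, and all names are assumptions.

```python
import numpy as np


def regression_evsi(inb, summary, deg=3):
    """Regression-based EVSI for a two-option decision.

    inb: INB evaluated at each prior draw theta_i.
    summary: summary statistic of the dataset simulated from theta_i.
    """
    coef = np.polyfit(summary, inb, deg)     # polynomial stand-in for a GAM
    cond_inb = np.polyval(coef, summary)     # fitted values estimate E[INB | X_i]
    return np.mean(np.maximum(cond_inb, 0.0)) - max(cond_inb.mean(), 0.0)


# Hypothetical toy: theta ~ N(0, 1), observations ~ N(theta, 2^2), study size n = 20.
rng = np.random.default_rng(3)
n, sigma = 20, 2.0
theta = rng.normal(0.0, 1.0, size=100_000)
xbar = rng.normal(theta, sigma / np.sqrt(n))   # sample mean of each simulated dataset
print(regression_evsi(-0.2 + theta, xbar))
```

Unlike GA-based approaches, the dataset simulation and the regression must be repeated for every sample size n.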
Method 3: Linear meta-modeling GA
Since the Gaussian assumption is strictly satisfied, we can derive values of prior ESS
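For a normal prior θ ~ N(μ0, τ²) and normally distributed observations with known standard deviation σ, the prior ESS can be derived by matching the GA variance of the conditional mean to its exact value. The posterior variance does not depend on the data here, so its expectation equals the posterior variance itself. The symbols τ and σ are introduced for illustration:

```latex
\begin{aligned}
\operatorname{Var}\bigl(E[\theta \mid X]\bigr)
  &= \tau^2 - E\bigl[\operatorname{Var}(\theta \mid X)\bigr]
   = \tau^2 - \frac{\tau^2 \sigma^2}{\sigma^2 + n\tau^2}
   = \tau^2 \, \frac{n}{n + \sigma^2/\tau^2}, \\
\text{GA:}\quad \operatorname{Var}\bigl(E[\theta \mid X]\bigr)
  &= \tau^2 \, \frac{n}{n + n_0}
  \quad\Longrightarrow\quad n_0 = \frac{\sigma^2}{\tau^2}.
\end{aligned}
```

With this n0, the GA variance of the conditional mean is exact for every n. Apart from metamodel fitting error, any remaining GA error in this setting therefore comes from nonlinearity of the INB.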
Method 4: Spline-based Taylor series GA
The prior ESSs
Case Study II: Calculating EVSI in a Markov Model
In the second case study, we test the robustness of TGA when both the prior and the likelihood are non-Gaussian. Using the Markov model from Jalal and Alarid-Escudero,22 we compare the accuracy of the EVSI estimates given by the nonparametric regression-based method, conventional GA, and TGA. Four different data collection exercises are considered to reduce the uncertainty in this Markov model, and the corresponding EVSI values are estimated. For comparison, EVSI for each data collection exercise is also computed using the nested Monte Carlo method.22
Incremental net benefit function
A Markov model with 3 states (well, disabled, and dead) is used to simulate a group of 30-y-old patients suffering from a genetic disorder. To prevent the disorder from leading to permanent disability, 3 treatment options, labeled
Parameter of interest and dataset generation
For this case study, we consider the 4 different data collection exercises that are included in Jalal and Alarid-Escudero’s work.22 These 4 data collection exercises aim to reduce the uncertainty in
Prior Distribution and Likelihood Functions for the Markov Model in Case Study II a
Method 1: Nested Monte Carlo method
The EVSI estimates from the nested Monte Carlo method are taken directly from Jalal and Alarid-Escudero’s article.22 Computing each EVSI estimate through this method required approximately 6 h of processing time on 16 parallel cores.22
Method 2: Nonparametric regression-based method
We generate
Method 3: Linear meta-modeling GA
We can derive that
Method 4: Spline-based Taylor series GA
Like the conventional GA approach,
Results
Case Study I: Gaussian Parameters and Nonlinear Net Benefit Functions
Figure 1 compares the EVSI of 4 stylized net benefit functions with Gaussian distributed parameters computed using the analytic method, the conventional GA approach, TGA, and the nonparametric regression-based method for different sample sizes (between

The expected value of sample information (EVSI) computed by the analytic method (Analytic), conventional Gaussian approximation (GA), spline-based Taylor series expansions and Gaussian approximation (GA and Taylor series approximation), and the nonparametric regression-based method (nonparametric) for linear and nonlinear incremental net benefit functions with Gaussian distributed parameters. The expected value of partial perfect information (EVPPI) is shown with the horizontal dashed lines. (A) EVSI for Gaussian θ, INB(θ) = −100 + 5,000θ; (B) EVSI for Gaussian θ, INB(θ) = −1,000 + 50,00θ²; (C) EVSI for Gaussian θ, INB(θ) = −500 + 50,00θ⁴; and (D) EVSI for Gaussian (θ1, θ2), INB(θ1, θ2) = −1,500 +
In subplot A of Figure 1, when
Case Study II: Calculating EVSI in a Markov Model
Figure 2 compares the EVSI of the Markov models computed by the conventional GA approach, TGA approach, and nonparametric regression-based method for different sample sizes (between

The expected value of sample information (EVSI) computed by conventional Gaussian approximation (GA), spline-based Taylor series expansions and Gaussian approximation (GA and Taylor series approximation), and nonparametric regression-based method (nonparametric) for a Markov model across different sample sizes. EVSI estimated by standard nested Monte Carlo is denoted by the red cross. The expected value of partial perfect information (EVPPI) is shown by the horizontal dashed lines. (A) EVSI for non-Gaussian
When the relationship between conditional net benefits and parameters is nearly linear (subplots
Discussion
This article presents a new algorithm, spline-based Taylor series approximation and Gaussian approximation (TGA), for estimating EVSI. In the TGA method, we estimate EVSI by approximating the conditional expectation of net benefits in 2 steps. First, we use a Taylor series expansion to approximate the conditional expectation of net benefits through the net benefit function and the conditional mean and variance of the parameters. Subsequently, the net benefit function is approximated by the spline fitted to the PA dataset, and the conditional moments of the parameters are approximated by the conventional GA and the expected Fisher information.
Strengths and Limitations
The TGA algorithm has several advantages over alternative EVSI estimation methods. First, once the prior ESS is estimated, TGA can estimate EVSI across multiple sample sizes at minimal computational cost. This is more efficient than EVSI estimation algorithms that must estimate EVSI separately for each sample size, whose computational time therefore scales linearly with the number of sample sizes considered. Methods with linear scaling include the nonparametric regression-based method and other estimation algorithms based on advanced Monte Carlo methods.15–19,21 In addition, EVSI estimates obtained using TGA are smooth with respect to sample size. This makes them convenient for identifying study designs that maximize economic benefit through numerical optimization.8 Finally, EVSI estimates from TGA are more accurate than those from conventional GA, especially when the net benefit function is highly nonlinear.20–22
However, TGA’s efficiency and accuracy may be affected in certain scenarios. First, if the parameters of interest are high dimensional and have complex interactions, a spline with many interaction terms may be required to accurately approximate the functional form of the conditional net benefit function. Computing the second-order derivatives of such a fitted spline requires more computational resources, which may reduce the efficiency of TGA. In this case, TGA could be implemented with other nonparametric regression methods that are less affected by the “curse of dimensionality” (e.g., artificial neural networks) to approximate the functional form of the conditional net benefit function. Future studies might examine the performance of these nonparametric regression models in estimating EVSI, particularly when the number of parameters of interest is large.30,42
Second, although a closed-form expression for the expected Fisher information is available for most data-generating processes, there are exceptions in complex scenarios. For instance, evaluating the expected Fisher information becomes problematic for generalized linear mixed-effects models,43 whose likelihood function may not have a closed form. It is also problematic when the likelihood function of the data-generating process is difficult to identify. In such scenarios, alternative approaches to estimating EVSI, such as nonparametric regression-based methods or moment matching,17,20,21 may be more suitable.
Third, while our article demonstrates the proposed method’s accuracy through hypothetical and real-world case studies, this does not guarantee its effectiveness in every complex scenario. Future research should aim to evaluate the method’s adaptability across a broader range of conditions, including more complex prior, likelihood, and net benefit configurations.
Lastly, the expected Fisher information approximates the conditional variance of the parameters of interest more accurately when the sample size of the design is relatively large. EVSI estimates from TGA are therefore less accurate when the sample size is relatively small, and the nonparametric regression-based method may be preferable in that scenario. Alternatively, future research should investigate more precise methods to quantify the uncertainty of EVSI estimates given by TGA, especially for small sample sizes, possibly through Bayesian bootstrap or Taylor series approximation techniques.25,44
Conclusion
We introduced a novel EVSI estimation method that combines Taylor series approximation and GA. As shown by the 2 case studies, the proposed algorithm can efficiently estimate EVSI for multiple sample sizes and is more accurate than conventional GA when the net benefit function is highly nonlinear. We believe that our method could aid in the evaluation and optimization of study designs using EVSI, particularly when the underlying health economic decision model is complex and includes a nonlinear structure.
Supplemental Material
sj-pdf-1-mdm-10.1177_0272989X241264287 – Supplemental material for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method
Supplemental material, sj-pdf-1-mdm-10.1177_0272989X241264287 for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method by Linke Li, Hawre Jalal and Anna Heath in Medical Decision Making
sj-pdf-2-mdm-10.1177_0272989X241264287 – Supplemental material for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method
Supplemental material, sj-pdf-2-mdm-10.1177_0272989X241264287 for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method by Linke Li, Hawre Jalal and Anna Heath in Medical Decision Making
sj-pdf-3-mdm-10.1177_0272989X241264287 – Supplemental material for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method
Supplemental material, sj-pdf-3-mdm-10.1177_0272989X241264287 for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method by Linke Li, Hawre Jalal and Anna Heath in Medical Decision Making
sj-pdf-4-mdm-10.1177_0272989X241264287 – Supplemental material for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method
Supplemental material, sj-pdf-4-mdm-10.1177_0272989X241264287 for Accurate EVSI Estimation for Nonlinear Models Using the Gaussian Approximation Method by Linke Li, Hawre Jalal and Anna Heath in Medical Decision Making
Footnotes
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: LL was funded by the Canadian Statistical Sciences Institute (grant No. Collaborative Research Team 2023) and the Natural Sciences and Engineering Research Council of Canada (grant No. RGPIN-2021-03366).
Authors’ Note
Part of this work was presented at the 44th Annual Meeting of the Society for Medical Decision Making in October 2022.
