Abstract
The effective sample size (ESS) measures the informational value of a probability distribution in terms of an equivalent number of study participants. The ESS plays a crucial role in estimating the expected value of sample information (EVSI) through the Gaussian approximation approach. Despite its importance, existing ESS estimation methods within the Gaussian approximation framework are, except in a limited number of scenarios, either computationally expensive or potentially inaccurate. To address these limitations, we propose a novel approach that estimates the ESS using the summary statistics of generated datasets and nonparametric regression methods. The simulation experiments suggest that the proposed method provides accurate ESS estimates at a low computational cost, making it an efficient and practical way to quantify the information contained in the probability distribution of a parameter. Overall, determining the ESS can help analysts understand the level of uncertainty in complex prior distributions in the probabilistic analyses of decision models and perform efficient EVSI calculations.
Highlights
Effective sample size (ESS) quantifies the informational value of probability distributions, essential for calculating the expected value of sample information (EVSI) using the Gaussian approximation approach. However, current ESS estimation methods are limited by high computational demands and potential inaccuracies.
We propose a novel ESS estimation method that uses summary statistics and nonparametric regression models to efficiently and accurately estimate ESS.
The effectiveness and accuracy of our method are validated through simulations, demonstrating significant improvements in computational efficiency and estimation accuracy.
Introduction
The effective sample size (ESS) quantifies the informational value of a probability distribution by representing it in terms of an equivalent number of observations contributing information to this distribution.1
For example, if the probability of successes
Despite its usefulness, existing ESS estimation methods within the Gaussian approximation (GA) approach have limitations.3 They either entail potentially high computational costs, which impedes their practicality, or are potentially inaccurate, which undermines the reliability of the estimates.2 To efficiently assess how informative a probability distribution is and to simplify the EVSI calculation, we propose a novel method for estimating ESS based on summary statistics and nonparametric regression models, inspired by the nonparametric regression–based EVSI methods proposed by Strong et al.4 We begin by reviewing the ESS, particularly within the GA approach, and then present the nonparametric regression–based ESS estimation method. We then perform a simulation study to demonstrate the effectiveness and accuracy of our method and conclude with a brief discussion.
Methods
ESS and GA
The ESS is a statistical concept that quantifies the informational value of a probability distribution through the number of hypothetical participants.1,i A greater ESS signifies a more informative distribution, while a lower value indicates a less informative one. Estimating the ESS serves 2 important purposes in health economic evaluation. First, it allows subject-area experts to determine whether the amount of information contained in the distribution aligns with their prior beliefs. Second, it facilitates the estimation of the EVSI using the GA approach, which is an efficient computational method that optimizes EVSI across studies with varying sample sizes.2,5 The efficiency of the GA method stems from the fact that the ESS is computed only once and then reused to estimate EVSI across sample sizes.ii This section introduces the concept of ESS as it is presented within the GA approach.
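To illustrate why a single ESS estimate can be reused across sample sizes, the following minimal sketch assumes the GA relationship in which the variance of the preposterior mean for a study of size n equals the prior variance multiplied by n / (n + n0), where n0 denotes the ESS. The function name and the example value n0 = 10 are illustrative assumptions and are not taken from the article.

```python
def preposterior_variance_fraction(n, n0):
    """Fraction of the prior variance carried by the preposterior mean.

    Assumes the GA relationship Var(E[phi | X]) = Var(phi) * n / (n + n0)
    for a proposed study of size n and an effective sample size n0.
    """
    if n < 0 or n0 <= 0:
        raise ValueError("n must be non-negative and n0 must be positive")
    return n / (n + n0)


n0 = 10.0  # hypothetical ESS, estimated once
for n in (10, 50, 200, 1000):
    # The same n0 is reused for every candidate study size.
    print(n, round(preposterior_variance_fraction(n, n0), 3))
```

Under this assumption, the fraction approaches 1 as n grows relative to n0, which matches the intuition that a large study can resolve most of the uncertainty in a weakly informative distribution.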
First, assume we have
where
In addition, we denote the set of
Due to the conjugate nature of the distributions of
Using this formulation, the ESS
and the variance ratio between
Although equations 4 and 5 are derived using the Gaussian assumption, Jalal and Alarid-Escudero2 showed that they can be generalized to estimate
Estimate the ESS Using the Nonparametric Regression–Based Method
In this section, we review current
In the original GA article, Jalal and Alarid-Escudero2 introduced 2 methods that can be generalized to estimate
The second method uses equation 5. However, since the sample mean
Unlike the first method, this approach requires only summary statistics, making it computationally efficient. However, the key limitation lies in identifying suitable summary statistics
We propose a novel approach that substantially reduces the computational burden of estimating
After we find
This equation suggests that the conditional expectation
The choice of regression model is flexible, ranging from statistical models such as splines and Gaussian process regression to machine learning approaches such as Bayesian additive regression trees and neural networks. In our case studies, we used splines because each scenario involves 4 or fewer summary statistics. When the number of summary statistics exceeds 5, however, Gaussian process regression is more appropriate because of its flexibility in modeling complex relationships.4
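The regression-based idea can be sketched for a simple conjugate case. The example below is not the authors' implementation: it assumes a Beta(4, 6) prior with binomial data, uses the sample proportion as the single summary statistic, substitutes a cubic polynomial fit (NumPy's `polyfit`) for a spline, and converts the variance of the fitted values into an ESS through the assumed GA relationship Var(E[phi | X]) = Var(phi) * n / (n + n0). The function name `estimate_ess` and all simulation settings are hypothetical.

```python
import numpy as np


def estimate_ess(phi, summary, n, degree=3):
    """Estimate the ESS (n0) from prior draws and one summary statistic per dataset.

    phi     : parameter draws from the prior, one per simulated dataset
    summary : summary statistic of each simulated dataset of size n
    degree  : degree of the polynomial used as a stand-in for a spline
    """
    # Regress the parameter on the summary statistic; the fitted values
    # approximate the conditional expectation E[phi | X].
    coefs = np.polyfit(summary, phi, deg=degree)
    fitted = np.polyval(coefs, summary)
    # Assumed GA relationship: Var(fitted) / Var(phi) = n / (n + n0).
    return n * (np.var(phi) / np.var(fitted) - 1.0)


rng = np.random.default_rng(2024)
n_sim, n = 50_000, 50

phi = rng.beta(4.0, 6.0, size=n_sim)   # draws from the prior
successes = rng.binomial(n, phi)       # one simulated study per prior draw
summary = successes / n                # sample proportion (the MLE)

n0_hat = estimate_ess(phi, summary, n)
print(round(n0_hat, 2))
```

In this conjugate setting the posterior mean is linear in the sample proportion, so the estimate should be close to alpha + beta = 10, up to Monte Carlo error. With several summary statistics, the polynomial fit would be replaced by a multivariate smoother such as the models listed above.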
The detailed algorithm for estimating
The algorithm for estimating the effective sample size (
Case Study: Examining the Accuracy of the Nonparametric Regression–Based ESS Estimation
We assess the accuracy of the proposed method across 7 diverse data collection scenarios, including settings where
Probability Distribution, Likelihood Function, and Effective Sample Size (
The
The prior distribution for the parameter P follows a truncated normal distribution with a domain ranging from 0 to 1, a mean of 0.2, and a variance of 0.01.
The prior distribution of the parameter θ is defined as the negative natural logarithm of 1 −P, where P follows a beta distribution with parameters α = 4 and β = 6.
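The two nonstandard priors described above can be sampled as in the following sketch. It assumes that the stated mean of 0.2 and variance of 0.01 refer to the normal distribution before truncation and uses simple rejection sampling; the function names are hypothetical.

```python
import numpy as np


def sample_truncated_normal(rng, size, mean=0.2, sd=0.1, low=0.0, high=1.0):
    """Draw from a normal(mean, sd) restricted to [low, high] by rejection.

    Assumes mean and sd describe the normal distribution before truncation
    (sd = sqrt(0.01) = 0.1).
    """
    kept = np.empty(0)
    while kept.size < size:
        draws = rng.normal(mean, sd, size=size)
        kept = np.concatenate([kept, draws[(draws >= low) & (draws <= high)]])
    return kept[:size]


def sample_neg_log_complement(rng, size, alpha=4.0, beta=6.0):
    """Draw theta = -log(1 - P) with P ~ Beta(alpha, beta)."""
    p = rng.beta(alpha, beta, size=size)
    return -np.log1p(-p)


rng = np.random.default_rng(1)
p_draws = sample_truncated_normal(rng, 10_000)
theta_draws = sample_neg_log_complement(rng, 10_000)
print(p_draws.mean(), theta_draws.mean())
```

Draws such as these would serve as the parameter samples from which datasets and their summary statistics are then simulated.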
To calculate the ESS using our nonparametric regression–based method, we generate
Results
Table 1 provides the estimated
Discussion
This article introduces a novel approach for efficiently quantifying the amount of information contained in a probability distribution (ESS) by combining summary statistics and nonparametric regression models. Our method provides an estimate of the ESS for various probability distributions based on the GA. While effective, the method depends on having a sufficiently large training sample for the nonparametric regression model to converge, and the choice of summary statistics is critical. Ideally, the summary statistics should capture most of the information in the original dataset. Maximum likelihood estimates (MLEs) are a particularly good choice, as they are not only robust but also computationally efficient. The accuracy and efficiency of the proposed method are validated through 7 experimental data collection designs involving various types of probability distributions.
In addition to EVSI calculation, our efficient ESS estimation approach can be used in other applications. These include evidence synthesis in clinical trials, as explored by Morita et al.,1 and informing parameter distributions in medical decision making. For instance, our method can be particularly useful when the uncertainty distribution for a given model input is complex and the ESS cannot be readily determined from the parameters of the distribution. In such instances, the ESS provides a valuable metric to quantify the number of participants contributing information to a specific input. This number can be compared with an expert’s intuition, and if there are discrepancies, the uncertainty representation can be revised accordingly.
Overall, ESS is a useful metric that can be readily used to inform EVSI and guide input distributions in medical decision making. Our proposed approach can help estimate ESS more accurately and efficiently.
Supplemental Material
sj-pdf-2-mdm-10.1177_0272989X251324936 – Supplemental material for A Nonparametric Approach for Estimating the Effective Sample Size in Gaussian Approximation of Expected Value of Sample Information
Supplemental material, sj-pdf-2-mdm-10.1177_0272989X251324936 for A Nonparametric Approach for Estimating the Effective Sample Size in Gaussian Approximation of Expected Value of Sample Information by Linke Li, Hawre Jalal and Anna Heath in Medical Decision Making
sj-pdf-3-mdm-10.1177_0272989X251324936 – Supplemental material for A Nonparametric Approach for Estimating the Effective Sample Size in Gaussian Approximation of Expected Value of Sample Information
Supplemental material, sj-pdf-3-mdm-10.1177_0272989X251324936 for A Nonparametric Approach for Estimating the Effective Sample Size in Gaussian Approximation of Expected Value of Sample Information by Linke Li, Hawre Jalal and Anna Heath in Medical Decision Making
sj-rmd-1-mdm-10.1177_0272989X251324936 – Supplemental material for A Nonparametric Approach for Estimating the Effective Sample Size in Gaussian Approximation of Expected Value of Sample Information
Supplemental material, sj-rmd-1-mdm-10.1177_0272989X251324936 for A Nonparametric Approach for Estimating the Effective Sample Size in Gaussian Approximation of Expected Value of Sample Information by Linke Li, Hawre Jalal and Anna Heath in Medical Decision Making
Footnotes
Acknowledgements
The authors appreciate the insightful suggestions provided by the anonymous reviewers, which have significantly improved the quality of this article.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: LL was funded by the Canadian Statistical Sciences Institute (grant No. Collaborative Research Team 2023) and the Natural Sciences and Engineering Research Council of Canada (grant No. RGPIN-2021-03366).