Abstract
Highlights
The nonparametric sampling method is generic and can sample times to an event from any discrete (or discretizable) hazard without requiring any parametric assumption.
The method is showcased with 5 commonly used distributions in discrete-event simulation models.
The method produced very similar expected times to events, as well as their probability distribution, compared with analytical results.
We provide a multivariate categorical sampling function for the R and Python programming languages that samples times to events from processes with different hazards simultaneously.
Keywords
Introduction
Discrete-event simulation (DES) models simulate processes as discrete sequences of events that occur over time. 1 These models rely on sampling the times at which different events occur. For example, if events have a constant rate or hazard of occurrence, their times of occurrence can be sampled from an exponential distribution. In DES models, times to events that follow a nonconstant hazard can be sampled from parametric distributions. 2 However, some events cannot be easily described by parametric distributions. For example, mortality described by life tables, or events whose hazards are a function of time-varying covariates such as smoking histories or tumor size, do not always follow standard parametric distributions. An alternative is to use a nonhomogeneous Poisson point process (NHPPP), which assumes that events follow a Poisson process whose rate can vary over time. 3 Several algorithms exist for sampling from an NHPPP, and they require either numerical integration or rejection sampling. 4
In this brief report, we propose a nonparametric sampling (NPS) implementation of NHPPP that is both generalizable and computationally efficient. The method assumes that time to event is drawn from a nonparametric categorical distribution. We illustrate the NPS method using 5 examples highlighting its accuracy, flexibility, and computational efficiency. In addition, we provide an open-source implementation in R and Python to facilitate wider adoption.
Constructing the Categorical Distribution
The steps to implement the NPS method are described in Box 1 and shown in Figure 1. In summary, the approach involves 6 steps. First, obtaining the discrete-time hazard cumulative distribution function (CDF),
Box 1. Steps to Draw a Time to Event Using a Nonparametric Sampling Approach

Figure 1. Steps to sample time to events using a nonparametric sampling approach.
Let
The CDF of
If the hazard is given in a different scale from the one the analyst is interested in, it can be transformed to the desired scale by multiplying the hazard
We derive the probability of an event happening within the
To conduct an NPS of the time interval at which the event can occur, we define
with a probability mass function
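As a minimal sketch of the steps in Box 1, the Python code below converts per-interval hazards into a categorical distribution and samples from it. It is an illustration under stated assumptions, not the code from the authors' repository: the hazard is taken to be constant within each interval, the CDF is computed as 1 minus the exponential of the negative cumulative hazard, and one extra category is appended for "no event within the time horizon." The function names `hazard_to_pmf` and `sample_intervals` and the hazard value of 0.1 are hypothetical.

```python
import bisect
import math
import random
from itertools import accumulate


def hazard_to_pmf(hazards, width=1.0):
    """Convert per-interval hazards into interval probabilities.

    hazards[i] is treated as constant over interval i of length `width`.
    The returned list has one extra, final entry: the probability that
    no event occurs before the end of the last interval.
    """
    cum_hazard = list(accumulate(h * width for h in hazards))
    cdf = [1.0 - math.exp(-ch) for ch in cum_hazard]
    pmf = [cdf[0]] + [cdf[i] - cdf[i - 1] for i in range(1, len(cdf))]
    pmf.append(1.0 - cdf[-1])
    return pmf


def sample_intervals(pmf, n, rng):
    """Draw n interval indices (0-based) from the categorical distribution."""
    cum = list(accumulate(pmf))
    cum[-1] = 1.0  # guard against floating-point round-off in the last sum
    return [bisect.bisect_right(cum, rng.random()) for _ in range(n)]


# Toy check with a constant hazard of 0.1 per unit time (illustrative value).
rng = random.Random(2024)
pmf = hazard_to_pmf([0.1] * 200)
draws = sample_intervals(pmf, 20_000, rng)
mean_interval_start = sum(draws) / len(draws)
```

Because each draw is the start of an interval, the average for this constant-hazard toy case should fall somewhat below the exponential mean of 10. This is the discretization bias addressed in the section on approximating continuous time to event.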
Multivariate Categorical Distribution
We expand the previous approach to sample values for multiple random variables simultaneously by defining a multivariate categorical distribution as
where
Common statistical software has no built-in functions to sample from a multivariate categorical distribution. However, we provide the code of the multivariate categorical distribution in R and Python in the Supplementary Material.
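To make the idea concrete, the following Python sketch draws one category per row of a table of probabilities, where each row is the categorical distribution of one individual or subgroup. It is not the function from the Supplementary Material. It assumes that one uniform draw per row is compared against that row's cumulative probabilities, and the name `sample_multicat` and the probabilities are hypothetical. The loop is written with the standard library for clarity; an array library would typically vectorize it.

```python
import bisect
import random
from itertools import accumulate


def sample_multicat(prob_rows, rng):
    """Draw one category index per row of `prob_rows`.

    Each row is the probability vector of one categorical variable
    and is expected to sum to 1.
    """
    draws = []
    for probs in prob_rows:
        cum = list(accumulate(probs))
        cum[-1] = 1.0  # guard against floating-point round-off
        draws.append(bisect.bisect_right(cum, rng.random()))
    return draws


# Three hypothetical subgroups, each with its own interval probabilities.
prob_rows = [
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
rng = random.Random(7)
draws = sample_multicat(prob_rows, rng)
```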
Approximating Continuous Time to Event
Approximating a continuous time to event with a discrete-time approach introduces an approximation error.7–9 Because the NPS method samples only the time categories defined when the time interval was divided, it does not allow for events occurring between 2 categories. This generates a systematic bias, which can be reduced by adding a uniformly distributed random number to each sampled time (the corrected variant is labeled NPS-C in Table 1).
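The correction can be sketched in a few lines of Python. The sketch assumes that intervals have equal length, that a sampled category denotes the start of its interval, and that the added random number is uniform over one interval length. The function name `to_continuous` is hypothetical.

```python
import random


def to_continuous(interval_index, width, rng):
    """Spread a sampled interval index uniformly over that interval."""
    return (interval_index + rng.random()) * width


rng = random.Random(11)
event_time = to_continuous(3, 0.5, rng)  # interval 3 spans 1.5 to 2.0
```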
Accounting for Covariates
Hazards could be a function of either time-independent covariates, such as sex, race, or birth cohort, or time-dependent covariates, such as smoking histories, exposure to environmental risk factors, or tumor size. In this section, we demonstrate the use of the NPS method to sample times to events from hazards as functions of time-independent and time-dependent covariates.
Time-Independent Covariates
Let the
Time-Dependent Covariates
We now consider that the covariate can vary over time

Figure 2. (A) Time-dependent hazard, h(t), for different values of a covariate. (B) Example of a covariate path. (C) Corresponding path of the h(t).
Examples
Below, we provide 5 examples to illustrate the implementation of the NPS method for different processes. The R code for these examples and the function for the multivariate categorical distribution are provided in a GitHub repository (https://github.com/DARTH-git/NPS_time_to_event).
Example 1: Time to Event from Parametric Hazards
We used the NPS method for drawing times to events from various commonly used parametric distributions, such as exponential, gamma, and log-normal. We derived the piecewise constant hazard,
Table 1. Comparison of Expected Time to Events and Mean Sampling Time, in Milliseconds, from 100 Iterations of N Samples Each between the Nonparametric Sampling (NPS) Method and Parametric Distributions or Life Table Estimates
IQR, interquartile range; N/A, not applicable; NPS-C, nonparametric sampling corrected by adding a uniformly distributed random number; NPS-U, nonparametric sampling uncorrected.
Example 2: Sampling Age to Death from a Homogeneous Cohort
We sampled the age to death for 100,000 individuals in a hypothetical cohort from the US population in 2015. 10 We estimated the life expectancy by taking the average across the 100,000 samples with the continuous-time approximation. The probability mass function (PMF) for the age to death obtained from the NPS method closely follows the PMF from the life table (Figure 3). The estimated life expectancy from the NPS method was 78.53 y, which is close to the life expectancy of 78.37 y obtained from the life tables. The mean execution time, repeating the sampling process 100 times, was 5.15 ms (Table 1).

Figure 3. Probability mass function of dying within 1 y of age in the total US population in 2015.
Example 3: Drawing Age to Death from a Heterogeneous Cohort
We used the multivariate categorical distribution to simultaneously sample ages to death for 100,000 males and females from sex-specific life tables for the US population in 2015, with the continuous-time approximation defined above. The sex-specific PMF from the NPS method and the exact PMF from life tables are shown in Figure 4. The NPS method estimated a life expectancy of 76.22 and 80.93 y for males and females, respectively. The life expectancy obtained from the life tables was 75.93 and 80.76 y for males and females, respectively. The mean execution time, repeating the sampling process 100 times, was 255.30 ms (Table 1).

Figure 4. Probability mass function of dying within 1 y of age by sex, US population in 2015.
Example 4: Drawing Time to Event from Hazards with Time-Dependent Covariates
We used a proportional hazard setup with a time-dependent covariate that increases linearly over time,
The NPS method produced similar expected times to events for the 3 distributions compared with the DS method, from 1 million draws: exponential (8.61 NPS v. 8.52 DS), Gompertz (35.98 NPS v. 35.48 DS), and Weibull (8.79 NPS v. 8.02 DS). Their mean execution times in milliseconds, repeating the sampling process 100 times, were 38.28, 51.92, and 48.44, respectively.
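A proportional hazards setup with a linearly increasing covariate can be sketched as follows. This is an illustration only and does not reproduce the parameters or results of this example. The function names, the baseline hazard of 0.05, the coefficient of 1.0, and the covariate slope of 0.1 are all hypothetical, and evaluating the hazard at interval midpoints is a discretization choice made for the sketch.

```python
import bisect
import math
import random
from itertools import accumulate


def ph_interval_hazards(baseline, beta, covariate, n_intervals, width=1.0):
    """Hazard per interval under h(t) = h0(t) * exp(beta * x(t)).

    `baseline` and `covariate` are functions of time; both are evaluated
    at interval midpoints (a discretization choice made for this sketch).
    """
    mids = [(i + 0.5) * width for i in range(n_intervals)]
    return [baseline(t) * math.exp(beta * covariate(t)) for t in mids]


def sample_times(hazards, n, rng, width=1.0):
    """Sample event times; returns math.inf if no event occurs by the horizon."""
    cum_hazard = list(accumulate(h * width for h in hazards))
    cdf = [1.0 - math.exp(-ch) for ch in cum_hazard]
    times = []
    for _ in range(n):
        i = bisect.bisect_right(cdf, rng.random())
        # Add a uniform draw so the event falls inside the sampled interval.
        times.append((i + rng.random()) * width if i < len(hazards) else math.inf)
    return times


rng = random.Random(42)
hazards = ph_interval_hazards(
    baseline=lambda t: 0.05,       # constant baseline hazard (hypothetical)
    beta=1.0,                      # log hazard ratio (hypothetical)
    covariate=lambda t: 0.1 * t,   # covariate rising linearly with time
    n_intervals=150,
)
times = sample_times(hazards, 20_000, rng)
mean_time = sum(times) / len(times)
```

With a constant baseline and a linearly increasing covariate, the resulting hazard grows exponentially in time, so the sampled times in this toy case behave like draws from a Gompertz distribution.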
Example 5: Drawing Time to Event from Hazards with Time-Dependent Covariates following Random Paths
We specify a time-varying covariate

Figure 5. (A) Individual-specific trajectories. (B) Individual-specific time-dependent hazards. Sample of 10 individuals.
Discussion
We developed a nonparametric method for sampling times to events with high computational efficiency. The NPS method uses a categorical distribution, which discretizes the hazard of events over a fixed and finite time period, assuming a piecewise-constant hazard. We illustrated the NPS method with 5 examples that show common situations encountered when building DES models and provided their mean execution times. NPS can be used to sample the age of death from age-, sex-, race-, and year-specific life tables and/or times to smoking initiation or cessation from smoking histories. 13 It can also be used to sample times to events with hazards that are functions of either time-independent or time-dependent covariates.
The proposed NPS method works similarly to previous methods when sampling ages of death from a life table for a specific group (e.g., White females born in 1980 in the United States) using a piecewise-constant exponential distribution. 14 However, a strength of the proposed method is the use of multivariate categorical sampling, which extends the NPS method to simultaneously sample multiple ages of death from multiple life tables for different groups.
The NPS method accurately approximates the expected time to events from parametric distributions and can generate times to events from hazards for which no parametric distributions can be accurately fitted, such as time-varying hazards described by time-varying covariates. Once the probability distributions are derived from the observed hazards, the sampling process is computationally efficient and can be easily repeated multiple times. This approach can be very useful for individual-level models that require sampling times to events following processes that could not be appropriately addressed using parametric distributions.
Our approach does not provide criteria to determine the optimal time interval length; the user must define it. This is a limitation because an excessively wide interval can produce distributions that do not resemble the observed hazard, particularly when the hazard changes rapidly. Developing such criteria is a focus for future research. In addition, because this method uses a nonparametric categorical distribution, a sufficient number of samples must be drawn to obtain unbiased estimates. Our method assumes that the analyst is interested in sampling times to events from the mean process. However, if the analyst wants to propagate the uncertainty of the estimated time-to-event process and has access to the mechanism generating that uncertainty, the analyst can sample multiple hazards from this mechanism and apply the NPS method to each sampled hazard. Doing so propagates the uncertainty in the estimated hazards into a probabilistic sensitivity analysis that uses NPS.
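The uncertainty propagation described in this paragraph can be sketched as a loop over sampled hazards. The sketch below is illustrative only. The uncertainty mechanism (a log-normal multiplier applied to a point-estimate hazard), the hazard value, the number of iterations, and the function name `nps_mean_time` are hypothetical stand-ins for whatever mechanism the analyst actually has.

```python
import bisect
import math
import random
from itertools import accumulate


def nps_mean_time(hazards, n, rng, width=1.0):
    """Mean of n sampled event times for one per-interval hazard vector.

    Draws that fall beyond the last interval are set to the horizon.
    """
    cum_hazard = list(accumulate(h * width for h in hazards))
    cdf = [1.0 - math.exp(-ch) for ch in cum_hazard]
    horizon = len(hazards) * width
    total = 0.0
    for _ in range(n):
        i = bisect.bisect_right(cdf, rng.random())
        total += (i + rng.random()) * width if i < len(hazards) else horizon
    return total / n


rng = random.Random(2025)
base_hazards = [0.1] * 200  # hypothetical point estimate of the hazard
psa_means = []
for _ in range(50):
    # Hypothetical uncertainty mechanism: a log-normal multiplier on the hazard.
    multiplier = rng.lognormvariate(0.0, 0.2)
    sampled_hazards = [h * multiplier for h in base_hazards]
    psa_means.append(nps_mean_time(sampled_hazards, 2_000, rng))
```

The spread of `psa_means` across iterations then reflects the uncertainty in the hazard in addition to the sampling variability within each iteration.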
We proposed a method that can efficiently sample times to events from any time-to-event process given its hazard, survival function, or CDF over time. Moreover, this method can simultaneously sample from multiple different hazards with the multivariate categorical distribution, which we provide as R and Python functions in the Supplementary Material.
Supplemental Material
Supplemental material, sj-pdf-1-mdm-10.1177_0272989X241308768 for A Fast Nonparametric Sampling Method for Time to Event in Individual-Level Simulation Models by David U. Garibay-Treviño, Hawre Jalal and Fernando Alarid-Escudero in Medical Decision Making
Footnotes
Acknowledgements
We thank Rowan Iskandar for his valuable contributions to the code for the multivariate categorical sampling. We thank Karen Kuntz, Thomas Trikalinos, and Yuliia Sereda for providing feedback on an earlier version of this article.
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article. The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for this study was provided in part by grants from the National Cancer Institute as part of the Cancer Intervention and Surveillance Modeling Network. Dr. Alarid-Escudero is supported by grant U01CA253913, Dr. Jalal is supported by a Canada Research Chair, and Drs. Alarid-Escudero and Jalal are supported by grant U01CA265750. The funding agencies had no role in the design of the study, interpretation of results, or writing of the manuscript. The funding agreement ensured the authors’ independence in designing the study, interpreting the data, writing, and publishing the report.
Ethical Considerations
The authors did not carry out any human and/or animal studies for this publication submission. In addition, the authors of this article do not have any ethical considerations to disclose.
Consent to Participate
The authors did not carry out investigations involving humans for this publication submission.
Consent for Publication
Not applicable.
Data Availability
References
