Abstract

Incidence rates are popular summary measures of the occurrence over time of events of interest. They are also called mortality rates or failure rates, depending on the context. The incidence rate is defined as the ratio between the total number of events and total follow-up time and can be estimated with the strate command. When the event of interest can occur multiple times on any given subject over a time period, like infections, the incidence rate represents an average count per unit of time, such as the average number of infections per year. When the event of interest can occur only once, such as death, an alternative summary measure is the risk, or probability, of occurrence per unit time, such as the risk of dying in one year. In this article, I present the stprisk command, which estimates risks, and illustrate its use and interpretation through a data example.

Keywords

st0698, stprisk, incidence rates, mortality rates, survival analysis

1 Introduction

Scientific research often entails analyzing the occurrence over time of events of interest, such as death or a cancer diagnosis. This type of analysis is generally known as survival analysis, although different terms are used across different research areas. Stata has a comprehensive suite of commands for the analysis of survival data, and the help documentation offers an excellent description of the relevant methods and available commands (help st).

The strate command estimates incidence rates, which are popular summary measures of the occurrence over time of events of interest. The incidence rate is defined as the ratio between the total number of events and the total follow-up time. When the event of interest can occur multiple times on any given subject over a time period, like infections, the incidence rate represents an average count per unit of time, such as the average number of infections per year. When the event of interest can occur only once, such as death, an alternative summary measure is the risk, or probability, of occurrence per unit of time, such as the risk of dying in one year.

The following section introduces the new stprisk command, which estimates the risk of occurrence of events of interest over time, and its relation to the popular incidence rate through a data example; section 3 shows the syntax of stprisk; section 4 provides some technical details about risks and the implementation of the stprisk command; and section 5 contains some final remarks.

2 Incidence rates and incidence risks

I illustrate the basis of the new stprisk command with an example. I use the data from a fictitious clinical trial on survival in cancer patients available in Stata.

. sysuse cancer

(Patient survival in drug trial)

I summarize the content of the dataset with the describe command.

The aim of this study is to compare survival across the three treatment groups, which comprise two active drugs and a placebo. To help present the arguments in the remainder of this section, I set the unit of measurement of the time-to-death variable to years with the stset command.
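The stset call itself is not shown above. A minimal sketch, assuming the variable names of the cancer dataset shipped with Stata (studytime, recorded in months, and died as the failure indicator), rescales analysis time to years with the scale() option:

```stata
* Sketch only: variable names are those of cancer.dta
* (studytime is in months, died equals 1 if the patient died)
sysuse cancer, clear
describe
* scale(12) divides studytime by 12, so analysis time _t is in years
stset studytime, failure(died) scale(12)
```

After this call, st commands such as strate report their results per year rather than per month.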

Henceforth, I refer to the incidence rate as mortality rate, considering the event of interest is death. I estimate the mortality rate by treatment group with the strate command.

From the above output, the mortality rate in the placebo group is 1.27. On its website, the Centers for Disease Control and Prevention defines the incidence rate as “a measure of the frequency with which new cases of illness, injury, or other health condition occur, expressed explicitly per a time frame […].” A rate is not a risk. If one interpreted the above rate naïvely as a risk, one might conclude that any given patient is expected to die 1.27 times every year or, alternatively, that 100 patients are expected to report 127 deaths every year.
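The placebo rate quoted above is the number of deaths divided by the total follow-up time. A hedged sketch of that arithmetic, assuming the cancer data are stset in years with no delayed entry (so that _t equals each patient's follow-up time) and using an illustrative local macro named events, is:

```stata
sysuse cancer, clear
stset studytime, failure(died) scale(12)
* Rates by treatment group, as tabulated by strate
strate drug
* The same quantity by hand for the placebo group (drug == 1):
* after stset, _d flags deaths and _t holds follow-up time in years
quietly summarize _d if drug == 1
local events = r(sum)
quietly summarize _t if drug == 1
display "deaths per person-year: " `events' / r(sum)
```

The hand calculation should agree with the placebo row of the strate table, which makes clear that the rate is a count per person-year and is not bounded by 1.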

I now estimate the mortality risk with the stprisk command. Section 4 gives more details on its definition and interpretation.

The interpretation of the above risks is simple. For example, in the placebo group, the probability for any given subject to die in a year is 0.83, or alternatively, we expect 83 deaths out of 100 subjects every year.

The mortality rate converges to the mortality risk as the latter tends to zero. This limit behavior is analogous to that of the mean of a Poisson distribution, which converges to the mean of the binomial distribution as the latter tends to zero. This explains why the mortality rate and the mortality risk are numerically closer in drug group 3 than they are in the placebo group 1.
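One way to see the convergence is to assume, purely for illustration, a constant hazard λ over the interval, so that H(t) = λt and the incidence rate estimates λ. This constant-hazard assumption is not made by stprisk; it only serves to make the limit explicit using the definition of the risk G given in section 4:

```latex
\begin{aligned}
G(t_{0}, t_{1}) &= 1 - \exp\left\{ \frac{H(t_{0}) - H(t_{1})}{t_{1} - t_{0}} \right\} \\
                &= 1 - e^{-\lambda} \\
                &= \lambda - \frac{\lambda^{2}}{2} + \frac{\lambda^{3}}{6} - \cdots
\end{aligned}
```

Under this assumption the risk is approximately equal to the rate when λ is small, and the gap between the two grows roughly like λ²/2 as the rate increases.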

3 The stprisk command

The syntax of the stprisk command, similar to that of strate, is

stprisk [varlist] [if] [in] [, level(#) graph nowhisker]

stprisk tabulates risks by one or more categorical variables declared in varlist. When varlist is omitted, stprisk estimates the risk for the entire dataset. The level() option specifies the confidence level of the confidence intervals. The default is level(95) or as set by set level. The graph option plots the risks against the groups defined by varlist when a varlist is specified. The nowhisker option omits the confidence intervals from the graph.

You must stset your data before using stprisk; see [ST] stset.
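The calls below sketch how this syntax might be used on the cancer data from section 2. They assume that stprisk has already been installed and that the data are stset as shown; the choice of level(90) is arbitrary.

```stata
sysuse cancer, clear
stset studytime, failure(died) scale(12)
* Overall risk per year for the whole dataset
stprisk
* Risk by treatment group with 90% confidence intervals
stprisk drug, level(90)
* Plot the group-specific risks without confidence-interval whiskers
stprisk drug, graph nowhisker
```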

4 Incidence risks

This section provides the basic definition and interpretation of incidence risks. Its content consists of slightly edited excerpts from published articles (Bottai 2017; Discacciati and Bottai 2017; Lagergren, Bottai, and Santoni 2021; and Bottai, Discacciati, and Santoni 2021). Let T represent a continuous time-to-event variable with support on the positive real half-line. Let S(t) = Pr(T > t) and H(t) = − log{S(t)} indicate the survival function and the cumulative hazard function, respectively. The function S(t) is defined over the entire real line, ℝ, while H(t) is defined over the set {t ∊ ℝ : S(t) > 0}.

The probability of occurrence of an event over the time interval [t₀, t₁], with t₀ < t₁, conditional on T > t₀, is

\Pr(T \leq t_{1} \mid T > t_{0}) = 1 - S(t_{1}) / S(t_{0})

defined over the set {t ∊ ℝ: S(t) > 0}. Bottai (2017) defined the geometric rate of the event over the interval [t₀, t₁], conditional on T > t₀, as

G(t_{0}, t_{1}) = 1 - \{S(t_{1}) / S(t_{0})\}^{1 / (t_{1} - t_{0})} \qquad (1)

Bottai, Discacciati, and Santoni (2021) later referred to the above as the average probability of occurrence of the event. As explained in the articles, the word average indicates the geometric mean. The following example may help interpret G(t₀, t₁). Suppose the event of interest is death and t₀ = 0 and t₁ = 3. We split the time interval [0, 3] into the three one-unit disjoint intervals [0, 1], [1, 2], and [2, 3]. The mortality risk over the whole period is the complement of the probability of surviving all three intervals conditional on being alive at time 0, which is S(3)/S(0). This probability is algebraically equal to the product of the probabilities of surviving each interval conditional on being alive at its start, S(3)/S(0) = {S(1)/S(0)}{S(2)/S(1)}{S(3)/S(2)}. The average survival probability per unit of time, therefore, is the geometric mean of the probabilities in the three intervals, {S(3)/S(0)}^{1/3}, and G(0, 3) is its complement. Applying the geometric mean to each of the three intervals yields the probability of surviving the entire period [0, 3].
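A small numerical illustration, with made-up conditional survival probabilities of 0.9, 0.8, and 0.7 for the three intervals (these values are hypothetical and not taken from the cancer data), shows how the pieces fit together:

```latex
\begin{aligned}
S(3)/S(0) &= 0.9 \times 0.8 \times 0.7 = 0.504, \\
\{S(3)/S(0)\}^{1/3} &= 0.504^{1/3} \approx 0.796, \\
G(0, 3) &= 1 - 0.504^{1/3} \approx 0.204, \\
(1 - 0.204)^{3} &\approx 0.504 .
\end{aligned}
```

The per-unit risk of about 0.204 differs slightly from the arithmetic mean of the three interval risks, (0.1 + 0.2 + 0.3)/3 = 0.2, because the averaging is done on the geometric scale.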

For example, the probability of surviving one year in the placebo group is 0.225, as evinced in the following output:

From (1), the monthly mortality risk over the first year is G(0, 12) = 1 − 0.225^{1/12} = 0.117.
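The arithmetic can be checked directly in Stata. The sketch below assumes the cancer data are stset in years as in section 2, so that one year corresponds to analysis time 1; the at() values are chosen only for illustration.

```stata
sysuse cancer, clear
stset studytime, failure(died) scale(12)
* Kaplan-Meier survivor function for the placebo group at 0.5 and 1 year
sts list if drug == 1, at(0.5 1)
* Monthly risk implied by a one-year survival probability of 0.225
display %5.3f 1 - 0.225^(1/12)
```

The display line evaluates the twelfth root of the one-year survival probability and takes its complement, which is the monthly risk of 0.117 quoted above.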

The definition of incidence risk given in (1) can also be written as

G(t_{0}, t_{1}) = 1 - \exp\left\{ \frac{H(t_{0}) - H(t_{1})}{t_{1} - t_{0}} \right\} \qquad (2)

The stprisk command uses (2), with t₀ set equal to the start of the follow-up time and t₁ set equal to the largest observed time. The cumulative hazard H(t₀) is equal to zero. The cumulative hazard H(t₁) and its confidence interval are obtained from the sts list command with the Nelson–Aalen option; see [ST] sts list.
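For readers who want to see the calculation spelled out, the following sketch applies the formula above by hand to the placebo group of the cancer data. It is not the source code of stprisk and does not reproduce its confidence intervals. It assumes that follow-up starts at time 0 and that the largest observed time is taken within the group; the variable name H and the macro name t1 are illustrative.

```stata
sysuse cancer, clear
stset studytime, failure(died) scale(12)
* Nelson-Aalen cumulative hazard, estimated separately in each treatment group
sts generate H = na, by(drug)
* Placebo group (drug == 1): t1 is the largest observed time
quietly summarize _t if drug == 1
local t1 = r(max)
* H is nondecreasing in time, so its maximum is H(t1)
quietly summarize H if drug == 1
* With t0 = 0 and H(t0) = 0, the risk per year is 1 - exp{-H(t1)/t1}
display 1 - exp(-r(max) / `t1')
```

If these assumptions match what stprisk does internally, the displayed value should be comparable to the placebo estimate reported by stprisk drug.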

5 Conclusions

The new stprisk command provides nonparametric estimates and confidence intervals for the risk of occurrence of an event of interest over time. The command is computationally efficient, and its syntax is patterned on that of the strate command.

The risk of occurrence of events is applicable to any event that can occur only once. In the example given in section 2, the event of interest is death, but risks can be assessed for other once-only events, such as first cancer diagnosis, hospital discharge, and first employment.

As the length of the time interval (t₀, t₁) tends to zero, the risk G(t₀, t₁) defined in (1) tends to the instantaneous risk, and the incidence rate tends to the instantaneous incidence rate, which is also known as the hazard. The similarities and differences between these quantities are expounded in the article by Bottai, Discacciati, and Santoni (2021).
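The limit of the risk can be made explicit under the additional assumption, not stated above, that H is differentiable at t₀ with derivative h(t₀), the hazard:

```latex
\lim_{t_{1} \to t_{0}} G(t_{0}, t_{1})
  = 1 - \exp\left\{ -\lim_{t_{1} \to t_{0}} \frac{H(t_{1}) - H(t_{0})}{t_{1} - t_{0}} \right\}
  = 1 - \exp\{-h(t_{0})\}
```

Under that assumption the instantaneous risk is a one-to-one transformation of the hazard and is close to it when the hazard is small.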

6 Programs and supplemental materials

Supplemental Material, sj-zip-1-stj-10.1177_1536867X221141057 for Estimating the risk of events with stprisk by Matteo Bottai in The Stata Journal

To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type

References

Bottai. 2017. A regression method for modelling geometric rates. Statistical Methods in Medical Research 26: 2700–2707. https://doi.org/10.1177/0962280215606474.

Bottai, Discacciati, and Santoni. 2021. Modeling the probability of occurrence of events. Statistical Methods in Medical Research 30: 1976–1987. https://doi.org/10.1177/09622802211022403.

Discacciati and Bottai. 2017. Instantaneous geometric rates via generalized linear models. Stata Journal 17: 358–371. https://doi.org/10.1177/1536867X1701700207.

Lagergren, Bottai, and Santoni. 2021. Patient age and survival after surgery for esophageal cancer. Annals of Surgical Oncology 28: 159–166. https://doi.org/10.1245/s10434-020-08653-w.
