Abstract
Incidence rates are popular summary measures of the occurrence over time of events of interest. They are also called mortality rates or failure rates, depending on the context. The incidence rate is defined as the ratio between the total number of events and total follow-up time and can be estimated with the
1 Introduction
Scientific research often entails analyzing the occurrence over time of events of interest, such as death or a cancer diagnosis. This type of analysis is generally known as survival analysis, although different terms are used across different research areas. Stata has a comprehensive suite of commands for the analysis of survival data, and the help documentation offers an excellent description of the relevant methods and available commands.
The
The following section introduces the new
2 Incidence rates and incidence risks
I illustrate the basis of the new
I summarize the content of the dataset with the
The interest of this study is in comparing survival with the three different treatment groups, which comprise two active drugs and a placebo. To help present the arguments contained in the remainder of this section, I set the unit of measurement of the time-to-death variable to be years with the
Henceforth, I refer to the incidence rate as mortality rate, considering the event of interest is death. I estimate the mortality rate by treatment group with the
From the above output, the mortality rate in the placebo group is 1.27 deaths per person-year. On its website, the Centers for Disease Control and Prevention defines the incidence rate as “a measure of the frequency with which new cases of illness, injury, or other health condition occur, expressed explicitly per a time frame […].” A rate is not a risk. If one interpreted the above rate naïvely as a risk, one might conclude that any given patient is expected to die 1.27 times every year or, alternatively, that 100 patients are expected to report 127 deaths every year.
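The point that a rate is not bounded by 1 can be seen with a toy calculation. The sketch below is in Python rather than Stata and uses made-up follow-up times, not the trial data; the function name `incidence_rate` is hypothetical and simply applies the definition of total events over total follow-up time.

```python
# Hypothetical data: five patients, all of whom die (event = 1),
# with follow-up times measured in years. Not the article's dataset.
follow_up_years = [0.5, 0.5, 1.0, 1.0, 1.0]
died = [1, 1, 1, 1, 1]


def incidence_rate(events, times):
    """Total number of events divided by total follow-up time."""
    return sum(events) / sum(times)


rate = incidence_rate(died, follow_up_years)
print(rate)  # 1.25 deaths per person-year
```

Five deaths over four person-years give a rate above 1, which is perfectly valid for a rate but impossible for a probability.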
I now estimate the mortality risk with the
The interpretation of the above risks is simple. For example, in the placebo group, the probability for any given subject to die in a year is 0.83, or alternatively, we expect 83 deaths out of 100 subjects every year.
The mortality rate converges to the mortality risk as the latter tends to zero. This limit behavior is analogous to that of the mean of a Poisson distribution, which converges to the mean of the binomial distribution as the latter tends to zero. This explains why the mortality rate and the mortality risk are numerically closer in drug group 3 than they are in the placebo group 1.
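A small numeric sketch may make this convergence concrete. It assumes a constant hazard, which the article does not require, so that survival is exponential and the risk per unit of time equals 1 − exp(−rate). The function name and the three rates are hypothetical and are not taken from the trial data.

```python
import math


def risk_from_constant_rate(rate, duration=1.0):
    """Probability of the event within `duration`, assuming a constant hazard."""
    return 1.0 - math.exp(-rate * duration)


# Hypothetical rates per unit of time, from rare to frequent events.
for rate in (0.01, 0.1, 1.0):
    risk = risk_from_constant_rate(rate)
    print(f"rate={rate:.2f}  risk={risk:.4f}  difference={rate - risk:.4f}")
```

Under this assumption the gap between rate and risk shrinks roughly like half the squared rate, so the two are nearly equal when events are rare and diverge when events are frequent.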
3 The stprisk command
The syntax of the
You must
4 Incidence risks
This section provides the basic definition and interpretation of incidence risks. Its content consists of slightly edited excerpts from published articles (Bottai 2017; Discacciati and Bottai 2017; Lagergren, Bottai, and Santoni 2021; and Bottai, Discacciati, and Santoni 2021). Let T represent a continuous time-to-event variable with support on the positive real half-line. Let S(t) = Pr(T > t) and H(t) = − log{S(t)} indicate the survival function and the cumulative hazard function, respectively. The function S(t) is defined over the entire real line, ℝ, while H(t) is defined over the set {t ∊ ℝ : S(t) > 0}.
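The probability of occurrence of an event over the time interval [t0, t1], with t0 < t1, conditional on T > t0, is

$$\Pr(T \le t_1 \mid T > t_0) = 1 - \frac{S(t_1)}{S(t_0)}$$

defined over the set {t ∊ ℝ: S(t) > 0}. Bottai (2017) defined the geometric rate of the event over the interval [t0, t1], conditional on T > t0, as

$$G(t_0, t_1) = 1 - \left\{\frac{S(t_1)}{S(t_0)}\right\}^{1/(t_1 - t_0)} \tag{1}$$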
Bottai, Discacciati, and Santoni (2021) later referred to the above as the average probability of occurrence of the event. As explained in the articles, the word average indicates the geometric mean. The following example may help interpret G(t0, t1). Suppose the event of interest is death and t0 = 0 and t1 = 3. We split the time interval [0, 3] into the three consecutive one-unit intervals [0, 1], [1, 2], and [2, 3]. The probability of dying over [0, 3] is the complement of the probability of surviving all three intervals conditional on being alive at time 0, which is S(3)/S(0). This survival probability is algebraically equal to the product of the probabilities of surviving each interval conditional on being alive at its start, S(3)/S(0) = {S(1)/S(0)}{S(2)/S(1)}{S(3)/S(2)}. The average probability of survival per unit of time, therefore, is the geometric mean of the probabilities in the three intervals, {S(3)/S(0)}^{1/3}, and G(0, 3) is its complement. Applying the geometric mean at each interval yields the probability of surviving the entire period [0, 3].
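The three-interval example can be traced numerically. The sketch below is in Python and uses invented survival probabilities S(0), …, S(3) chosen only for illustration; they are not estimates from the trial.

```python
import math

# Hypothetical survival probabilities S(0), S(1), S(2), S(3).
S = [1.0, 0.8, 0.6, 0.3]

# Probability of surviving each one-unit interval, given survival to its start.
conditional = [S[k + 1] / S[k] for k in range(3)]

# Their product is the probability of surviving the whole period [0, 3].
overall = math.prod(conditional)

# Geometric mean of the three conditional probabilities, and its complement.
geometric_mean = overall ** (1 / 3)
G = 1 - geometric_mean

print(conditional, overall, geometric_mean, G)
```

Raising `geometric_mean` to the third power returns `overall`, which is the sense in which applying the same average probability in each interval reproduces survival over the entire period.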
For example, the probability of surviving one year in the placebo group is 0.225, as evinced in the following output:
From (1), the monthly mortality risk over the first year is G(0, 12) = 1 − 0.225^{1/12} = 0.117.
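The arithmetic can be checked outside Stata. The following Python sketch codes the risk as one minus the conditional survival probability raised to the reciprocal of the interval length, which is the form used in the calculation above. The function name `geometric_risk` is hypothetical, and S(0) = 1 is assumed because follow-up starts at time 0.

```python
def geometric_risk(s0, s1, t0, t1):
    """Risk per unit of time over [t0, t1], given survival s0 at t0 and s1 at t1."""
    if not t0 < t1:
        raise ValueError("t0 must be smaller than t1")
    if s0 <= 0:
        raise ValueError("survival at t0 must be positive")
    return 1.0 - (s1 / s0) ** (1.0 / (t1 - t0))


# Placebo group: survival at 12 months is 0.225; S(0) = 1 is assumed.
monthly = geometric_risk(1.0, 0.225, 0, 12)
print(round(monthly, 3))  # 0.117
```

Measuring the same interval in years instead of months, `geometric_risk(1.0, 0.225, 0, 1)`, returns the one-year risk 1 − 0.225 = 0.775, so the unit of time determines the unit of the risk.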
The definition of incidence risk given in (1) can also be written as

$$G(t_0, t_1) = 1 - \exp\left\{-\frac{H(t_1) - H(t_0)}{t_1 - t_0}\right\}$$
The
5 Conclusions
The new
The risk of occurrence of events is applicable to any event that can occur only once. In the example given in section 2, the event of interest is death, but risks can be assessed for other once-only events, such as first cancer diagnosis, hospital discharge, and first employment.
As the length of the time interval [t0, t1] tends to zero, the risk G(t0, t1) defined in (1) tends to the instantaneous risk, and the incidence rate tends to the instantaneous incidence rate, which is also known as the hazard. The similarities and differences between these quantities are expounded in the article by Bottai, Discacciati, and Santoni (2021).
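The limiting behavior of the risk can be explored numerically. The Python sketch below assumes a Weibull survival function with hypothetical shape and scale parameters, which is not a model used in the article, and takes the risk over an interval to be one minus the conditional survival probability raised to the reciprocal of the interval length. Under that form, and with H(t) = −log S(t), the limit as the interval shrinks works out to 1 − exp{−h(t)}, where h(t) is the hazard; this is a consequence of the algebra, not a formula quoted from the article.

```python
import math

# Hypothetical Weibull parameters, chosen only for illustration.
SHAPE, SCALE = 1.5, 2.0


def survival(t):
    """Weibull survival function S(t)."""
    return math.exp(-((t / SCALE) ** SHAPE))


def hazard(t):
    """Weibull hazard h(t), the derivative of H(t) = -log S(t)."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1)


def geometric_risk(t0, t1):
    """Risk per unit of time over [t0, t1], conditional on survival to t0."""
    return 1.0 - (survival(t1) / survival(t0)) ** (1.0 / (t1 - t0))


t = 1.0
limit = 1.0 - math.exp(-hazard(t))
for width in (1.0, 0.1, 0.001):
    print(width, geometric_risk(t, t + width), limit)
```

As `width` shrinks, the printed risk approaches `limit`, which is smaller than the hazard itself and close to it only when the hazard is small.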
6 Programs and supplemental materials
Supplemental Material, sj-zip-1-stj-10.1177_1536867X221141057 for Estimating the risk of events with stprisk by Matteo Bottai in The Stata Journal
To install a snapshot of the corresponding software files as they existed at the time of publication of this article, type
