Abstract

For the analysis of survival data obtained from cancer registries, it is common to use the relative survival framework, which incorporates expected mortality rates rather than relying on cause-of-death information. The relative survival framework isolates the effect of mortality due to the cancer, enabling fair comparisons between population groups when other-cause mortality differs between them. The stpp command provides nonparametric estimates of marginal relative survival and a range of other nonparametric estimates, including all-cause survival, crude probabilities of death, and recently developed reference-adjusted measures. In addition, it enables (age) standardization to be performed using both traditional standardization and the individual weighting approach. The genindweights command simplifies the process of calculating individual weights.

Keywords

st0795, stpp, genindweights, strs, stnet, relative survival, survival analysis

1 Introduction

One of the key functions of cancer registries is to report information on cancer incidence, mortality, and survival (Parkin 2008). For the reporting of cancer survival, it is usual to estimate net survival, which is the probability of being alive as a function of time since diagnosis in the hypothetical situation where it is impossible to die from anything other than the cancer under study (Pohar Perme, Stare, and Estève 2012). This enables comparisons between population groups that have differential other-cause mortality rates.

It is possible to estimate net survival using cause-of-death information obtained from death certificates and assigning whether a death was due to the cancer. However, using cause-of-death information is potentially problematic because of misclassification of cause of death (Percy, Stanek, and Gloeckler 1981) or not capturing deaths that were indirectly due to the cancer (Bright et al. 2020). Therefore, it is more common to use the relative survival framework, where the mortality in excess of that expected in a similar group in the general population is used (Dickman and Adami 2006). The expected mortality rates are stratified by age, calendar year, sex, and potentially other factors such as region or socioeconomic group.

In this article, we discuss the stpp command, which calculates nonparametric estimates of net survival using the Pohar Perme estimator. In addition, various other measures, including all-cause survival and crude probabilities of death (Cronin and Feuer 2000), can be calculated as well as reference-adjusted versions of these measures (Rutherford et al. 2022). Reference adjustment is a recent proposal to enable comparisons of all-cause survival or crude probabilities of death over time or between population groups in such a way that any differences will be due to differential cancer mortality rates and not differences in other-cause mortality rates (Lambert et al. 2020).

When one compares survival between population groups, there may be differences in the age distribution and potentially other important factors. Therefore, it is usual to age-standardize to a common age distribution to allow a fair comparison between population groups with different age distributions. It is also possible to standardize over other factors. Standardization can be performed in two ways: 1) traditional age standardization estimates survival in each age group and then takes a weighted average of the age-group-specific estimates; and 2) the individual weighting approach upweights or downweights individuals relative to a reference population (Brenner and Gefeller 1996). The different approaches generally give similar estimates, but the individual weighting approach is particularly useful with sparse data or when you want to standardize over many factors (Rutherford et al. 2020). stpp incorporates both approaches, with the individual weighting approach made easier with the genindweights command.

There are other commands to estimate net survival nonparametrically—namely, strs (Dickman and Coviello 2015), stnet (Coviello et al. 2015), and stns (Clerc-Urmès, Grzebyk, and Hédelin 2014)—so why is there a need for another command? Both strs and stnet are based on categorization of time, while stpp is estimated in continuous time, as was originally proposed by Pohar Perme, Stare, and Estève (2012). Although stns also estimates in continuous time, it calculates only the Pohar Perme estimate and does not provide all-cause or crude probabilities of death. In addition, stns does not allow for individual weights or an option to directly obtain standardized estimates using traditional standardization. None of the three existing commands incorporates reference-adjusted measures.

The article is organized as follows: Section 2 describes the relative survival framework and the Pohar Perme estimator of net survival. Section 3 describes other measures in the relative survival framework, including crude probabilities of death and reference-adjusted measures. Section 4 describes the syntax of the stpp command. Section 5 describes the syntax of the genindweights command. Section 6 contains five examples demonstrating different aspects of estimation using stpp and genindweights. Section 7 concludes the article.

2 The relative survival framework

The relative survival framework attempts to estimate mortality due to the disease of interest (usually cancer) without using cause-of-death information. Instead, it estimates the mortality rate in excess of that expected by using expected mortality rates from the general population. These expected rates are nearly always stratified by age, calendar year, and sex and sometimes by other factors, such as region or socioeconomic status.

Let $h_{i}(t)$ be the all-cause mortality rate for individual i at time since diagnosis t. This is partitioned into two parts: the expected (background) mortality rate, $h_{i}^{*}(t)$, representing the mortality rate due to other causes; and the excess mortality rate, $\lambda_{i}(t)$, representing the mortality rate associated with the diagnosis of cancer.

$$h_{i}(t) = h_{i}^{*}(t) + \lambda_{i}(t) \tag{1}$$

Transforming to the survival function gives

$$S_{i}(t) = S_{i}^{*}(t) R_{i}(t) \tag{2}$$

where $S_{i}(t)$ is the all-cause survival function, $S_{i}^{*}(t)$ is the expected survival function, and $R_{i}(t)$ is the relative survival function. The subscripts i in (1) and (2) are important because when estimating marginal relative survival, one must account for the fact that expected and excess mortality rates vary by individuals.
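The step from (1) to (2) follows from the usual relationship between a hazard and a survival function. A short sketch, assuming the rates are integrable and writing $R_{i}(t)$ for the exponential of the negative cumulative excess hazard:

```latex
\begin{aligned}
S_{i}(t) &= \exp\left\{-\int_{0}^{t} h_{i}(u)\,du\right\}
          = \exp\left\{-\int_{0}^{t} h_{i}^{*}(u)\,du - \int_{0}^{t} \lambda_{i}(u)\,du\right\} \\
         &= \exp\left\{-\int_{0}^{t} h_{i}^{*}(u)\,du\right\}
            \exp\left\{-\int_{0}^{t} \lambda_{i}(u)\,du\right\}
          = S_{i}^{*}(t)\,R_{i}(t)
\end{aligned}
```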

To describe the estimation approaches, we need to provide some definitions. Let

N_i(t) be a counting process that starts at 0 and jumps to 1 at the time when individual i dies;

Y_i(t) be an at-risk process, where Y_i(t) = I(T_i ≥ t), that is, 1 if at risk, 0 otherwise;

dN_i(t) = Y_i(t)I(T_i = t) be 1 if an individual i dies at time t and 0 otherwise;

$d N (t) = \sum_{i = 1}^{n} d N_{i} (t)$ be the total number of deaths at time t; and

$Y (t) = \sum_{i = 1}^{n} Y_{i} (t)$ be the total number at risk at time t.

The all-cause Nelson–Aalen estimate of the cumulative hazard is

$$\hat{H}_{NA}(t) = \int_{0}^{t} \frac{dN(u)}{Y(u)} = \int_{0}^{t} \frac{\sum_{i=1}^{n} dN_{i}(u)}{\sum_{i=1}^{n} Y_{i}(u)} \tag{3}$$

Because this nonparametric estimate changes only at event times, the cumulative hazard in (3) is obtained in practice by summing over unique event times. However, we will keep the integral form to highlight some issues when estimating marginal relative survival.
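To make the "sum over unique event times" concrete, here is a minimal Python sketch of the Nelson–Aalen estimate. It is an illustration only and not the stpp implementation; the function name `nelson_aalen` and the toy data are hypothetical.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Return [(t_j, H(t_j))] evaluated at the unique event times.

    times  : follow-up time for each individual
    events : 1 if the individual died at that time, 0 if censored
    """
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    cum_hazard = 0.0
    out = []
    for t in sorted(deaths):
        # Y(t): number still at risk just before t
        at_risk = sum(1 for s in times if s >= t)
        # increment dN(t) / Y(t)
        cum_hazard += deaths[t] / at_risk
        out.append((t, cum_hazard))
    return out


# Toy data (hypothetical): five individuals, times in years
times = [1.0, 2.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1, 0]
na = nelson_aalen(times, events)
```

With these toy data, the increments are 1/5, 1/4, and 1/2 at times 1, 2, and 3.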

To extend to relative survival, we need one further definition:

$H_{i}^{*}(t) = \int_{0}^{t} h_{i}^{*}(u)\,du$ is the cumulative expected hazard for individual i.

The Ederer II method incorporates the difference in the observed and expected number of events.

$$\hat{H}_{E2}(t) = \int_{0}^{t} \frac{\sum_{i=1}^{n} Y_{i}(u) \{dN_{i}(u) - dH_{i}^{*}(u)\}}{\sum_{i=1}^{n} Y_{i}(u)} \tag{4}$$

The Ederer II method was used for many years, but Pohar Perme, Stare, and Estève (2012) showed that it is a biased estimator of net survival. This is because net survival is defined in a hypothetical world where you cannot die of other causes, but we are using real-world data to estimate it. In the hypothetical world, we would observe more events and have more people at risk at any time point t because there would be no deaths due to other causes. The idea behind the Pohar Perme method is to upweight individuals who are still at risk to represent those who would have died from other causes. This is done by using the inverse of the expected survival, $S_{i}^{*}(t)$, as the weights.

$$w_{i}(t) = \frac{1}{S_{i}^{*}(t)}$$

Thus, older people, who have lower expected survival, will have higher weights. The cumulative excess hazard for the Pohar Perme estimator is

$$\hat{H}_{PP}(t) = \int_{0}^{t} \frac{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u) \{dN_{i}(u) - dH_{i}^{*}(u)\}}{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u)} \tag{5}$$

The number of events, the number of expected events, and the number at risk will all be upweighted.

In both the Ederer II and Pohar Perme estimators, there will be a decrease in the cumulative hazard at times when there is not an event. This means that the survival function will increase between event times and, unlike the Nelson–Aalen estimate, is not strictly a step function. However, in practice, the function is evaluated at specific times, usually taken to be all unique survival times in the dataset, and thus the integrals in (4) and (5) are replaced with sums over the observed survival times, with $dH_{i}^{*}(t_{j})$ at time $t_{j}$ calculated as $(t_{j} - t_{j-1}) h_{i}^{*}(t_{j-1})$.
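The discrete-time evaluation described above can be sketched in Python as follows. This is an illustration under simplifying assumptions, not the stpp code: each individual is given a constant expected mortality rate (the hypothetical `exp_rates` list), so that the expected survival is exp(-rate × t) and the expected hazard increment is the interval width times the rate. The function name `pohar_perme_cumhaz` is also hypothetical.

```python
import math


def pohar_perme_cumhaz(times, events, exp_rates):
    """Cumulative excess hazard at each unique observed time (sketch).

    times     : follow-up time for each individual
    events    : 1 if died at that time, 0 if censored
    exp_rates : constant expected mortality rate per individual
                (a simplifying assumption for this illustration)
    """
    grid = sorted(set(times))   # all unique survival times
    cum = 0.0
    prev = 0.0
    out = []
    for t in grid:
        num = 0.0
        den = 0.0
        for ti, di, hi in zip(times, events, exp_rates):
            if ti < t:
                continue                       # not at risk: Y_i(t) = 0
            w = 1.0 / math.exp(-hi * t)        # weight 1 / S*_i(t)
            d_n = 1.0 if (di == 1 and ti == t) else 0.0
            d_h = (t - prev) * hi              # (t_j - t_{j-1}) * h*_i
            num += w * (d_n - d_h)
            den += w
        cum += num / den
        out.append((t, cum))
        prev = t
    return out


# Toy data (hypothetical)
times = [1.0, 2.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1, 0]
pp_zero = pohar_perme_cumhaz(times, events, [0.0] * 5)
pp = pohar_perme_cumhaz(times, events, [0.01, 0.02, 0.03, 0.04, 0.05])
```

With all expected rates set to zero, the weights equal 1 and the estimate coincides with the all-cause Nelson–Aalen estimate. With positive expected rates, the cumulative excess hazard falls at the final time point, where there is no event.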

Transformation from the cumulative (excess) hazard to the survival function can be done in two ways, using either a Kaplan–Meier (product integral) type approach or a Breslow/Fleming–Harrington type approach. The default is the Kaplan–Meier (product integral) approach, so to convert the Nelson–Aalen cumulative hazard to a survival function, we use

$$S_{AC}(t) = \prod_{t_{j} \leq t} \{1 - dH_{NA}(t_{j})\} \tag{6}$$

with similar transformations for the Ederer II and Pohar Perme estimators.
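The two transformations can be compared with a small Python sketch. The function name `survival_from_increments` and the hazard increments are hypothetical, chosen only for illustration.

```python
import math


def survival_from_increments(increments, method="km"):
    """Convert cumulative hazard increments dH(t_j) to a survival curve.

    method="km": product integral, S = prod(1 - dH)
    method="fh": Fleming-Harrington type, S = exp(-sum(dH))
    """
    surv = 1.0
    cum = 0.0
    out = []
    for d_h in increments:
        if method == "km":
            surv *= 1.0 - d_h
        else:
            cum += d_h
            surv = math.exp(-cum)
        out.append(surv)
    return out


increments = [0.2, 0.25, 0.5]   # hypothetical increments at three times
km = survival_from_increments(increments, "km")
fh = survival_from_increments(increments, "fh")
```

Because exp(-x) ≥ 1 - x, the Fleming–Harrington type estimate is never below the product integral estimate. The two are close when the increments are small.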

For comparisons between groups, it is usually essential to standardize by age group and sometimes other covariates, which is described in the following section.

2.1 Standardization of covariates

When one compares population groups, it is usually necessary to standardize with a specific covariate pattern. This can be any covariate pattern, but commonly, this may just be standardization to a common age distribution. Traditional standardization estimates marginal relative survival separately within S strata and then obtains a weighted average of the stratum-specific estimates, with weights equal to the proportion within that stratum in the reference population.

Alternatively, the individual weighting approach described by Brenner and Gefeller (1996) upweights or downweights individuals relative to the reference population. For example, consider standardization over S strata (for example, different age groups). Let $p_{i}^{S}$ be the proportion in the stratum to which the ith individual belongs and $p_{i}^{R}$ be the corresponding proportion in the reference population. Then the time-fixed weights are

$$w_{i}^{B} = \frac{p_{i}^{R}}{p_{i}^{S}}$$

For the Pohar Perme estimator, these weights are combined with the inverse expected survival weights, $w_{i}^{*}(t) = 1 / S_{i}^{*}(t)$, to give

$$w_{i}(t) = w_{i}^{B} w_{i}^{*}(t) \tag{7}$$

which are used in (5).

The individual weighting approach can also be applied when estimating the all-cause cumulative hazard, H_NA(t), in (3) or to the Ederer II cumulative excess hazard, H_E2(t), in (4).
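The time-fixed weights can be illustrated with a short Python sketch. This is not the genindweights implementation; the function name `individual_weights`, the stratum labels, and the reference proportions are hypothetical.

```python
from collections import Counter


def individual_weights(strata, ref_props):
    """Weight for each individual: reference proportion / observed proportion.

    strata    : stratum label for each individual in the study population
    ref_props : dict mapping stratum label to its reference proportion
    """
    n = len(strata)
    counts = Counter(strata)
    return [ref_props[s] / (counts[s] / n) for s in strata]


# Hypothetical study population: 75% young, 25% old
strata = ["young", "young", "young", "old"]
# Hypothetical reference population: 50% young, 50% old
ref_props = {"young": 0.5, "old": 0.5}
w = individual_weights(strata, ref_props)
```

Here the single older individual is upweighted (weight 2) and each younger individual is downweighted (weight 2/3). When the reference proportions sum to 1 and every stratum is observed, the weights sum to the sample size.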

3 Other measures in the relative survival framework

3.1 Crude probability of death

The crude probability of death is analogous to the cause-specific cumulative incidence function in competing-risks settings (Geskus 2016).

The all-cause probability of death can be partitioned into the crude probability of death due to the cancer under study, F^c(t), and the crude probability of death due to other causes, F^o(t).

$$F^{c}(t) = P(T \leq t, \text{death due to cancer}) = \int_{0}^{t} S_{AC}(u-)\, d\Lambda_{c}(u) \tag{8}$$

$$F^{o}(t) = P(T \leq t, \text{death due to other causes}) = \int_{0}^{t} S_{AC}(u-)\, dH_{o}^{*}(u) \tag{9}$$

Note that the crude probability of death due to cancer depends on both the mortality rate due to cancer and the mortality rate due to other causes (through the all-cause survival function). Thus, if we observe differences when comparing different population groups, then it is unclear whether the differences are due to differential cancer mortality, differential other-cause mortality, or some combination of both.

The all-cause survival function can be estimated using (6); the other ingredients of (8) and (9) are obtained as follows:

$$dH_{o}^{*}(u) = \frac{\sum_{i=1}^{n} Y_{i}(u)\, dH_{i}^{*}(u)}{Y(u)}, \qquad d\Lambda_{c}(u) = \frac{dN(u)}{Y(u)} - dH_{o}^{*}(u) \tag{10}$$

Crude probabilities can be standardized using either traditional standardization or individual weights; see section 2.1.
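The partition of the all-cause probability of death can be checked numerically with a Python sketch. It is an illustration only and rests on two assumptions: each individual has a constant expected mortality rate (the hypothetical `exp_rates`), and the cancer-specific increment is taken as the all-cause increment dN(u)/Y(u) minus the marginal expected increment among those at risk, so that the two crude probabilities sum to the all-cause probability of death. The function name `crude_probabilities` is hypothetical.

```python
def crude_probabilities(times, events, exp_rates):
    """Return (F_cancer, F_other, S_allcause) at the last observed time."""
    grid = sorted(set(times))
    surv = 1.0        # all-cause survival just before the current time
    f_cancer = 0.0
    f_other = 0.0
    prev = 0.0
    for t in grid:
        at_risk = [(ti, di, hi)
                   for ti, di, hi in zip(times, events, exp_rates) if ti >= t]
        y = len(at_risk)
        d_n = sum(1 for ti, di, _ in at_risk if di == 1 and ti == t)
        # marginal expected hazard increment among those at risk
        d_h_other = sum((t - prev) * hi for _, _, hi in at_risk) / y
        # excess (cancer) hazard increment
        d_l_cancer = d_n / y - d_h_other
        f_cancer += surv * d_l_cancer
        f_other += surv * d_h_other
        surv *= 1.0 - d_n / y      # product integral update
        prev = t
    return f_cancer, f_other, surv


# Toy data (hypothetical)
times = [1.0, 2.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1, 0]
exp_rates = [0.01, 0.02, 0.03, 0.04, 0.05]
f_c, f_o, s_ac = crude_probabilities(times, events, exp_rates)
```

In this sketch, the two crude probabilities and the all-cause survival sum to 1 at the final time point.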

3.2 Reference-adjusted measures

Net survival is the standard way of comparing survival in population-based cancer studies because differences will be solely due to differential cancer mortality rates (under the standard assumptions). Reference-adjusted measures are a way of making all-cause survival and crude probabilities of death comparable. Reference-adjusted all-cause survival gives the all-cause survival that would be observed for the cancer cohort if they experienced a common reference standard set of expected mortality rates rather than those assumed for the cohort. First, other-cause mortality is removed by calculating relative survival using the relevant expected rates for the cohort under study. Then, this is converted back to an all-cause measure using the reference expected mortality rates, which are common across the compared groups.

We need to introduce more definitions. Let

$S_{i}^{**}(t)$ be the expected survival in the reference population and

$h_{i}^{**}(t)$ be the expected mortality rate in the reference population.

We now redefine the weights from (7) as

$$w_{i}(t) = w_{i}^{B} \left\{\frac{S_{i}^{**}(t)}{S_{i}^{*}(t)}\right\} \tag{11}$$

Using these weights in (5) gives a measure proposed by Sasieni and Brentnall (2016). This is a hypothetical measure that depends on the expected survival in the reference population and is designed to allow fair comparisons across population groups using an Ederer II-like estimate. It is acknowledged that interpretation is difficult, and thus Sasieni and Brentnall described it as a standardized relative survival index. When $S_{i}^{**}(t) = S_{i}^{*}(t)$, the Sasieni and Brentnall measure gives the Ederer II estimate. Note that the weights required for standardization by age (or other covariates) are included by incorporation of $w_{i}^{B}$.

Reference adjustment extends these ideas to all-cause survival and the crude probability of death. For all-cause survival, we extend (5) to

$$\hat{H}_{R}(t) = \int_{0}^{t} \frac{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u) \{dN_{i}(u) - dH_{i}^{*}(u) + dH_{i}^{**}(u)\}}{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u)} \tag{12}$$

When transformed to a survival function, this gives the all-cause survival that would be observed if the study population instead experienced the common reference standard expected mortality rates. This means that any differences between population groups depend only on differences in excess mortality rates and not on differences in other-cause mortality rates.

There are several special cases that can be derived from (12). If $H_{i}^{**}(t) = H_{i}^{*}(t)$ and $S_{i}^{**}(t) = S_{i}^{*}(t)$, then it reduces to the Nelson–Aalen estimator in (3). If $H_{i}^{**}(t) = 0$ and $S_{i}^{**}(t) = 1$, then it reduces to the Pohar Perme estimator in (5). Thus, the Pohar Perme estimator is a special case of reference adjustment, where the reference-expected mortality rates lead to individuals being immortal with respect to deaths from other causes.
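Both special cases can be checked by direct substitution into the reference-adjusted weights and the integrand of (12). A sketch, writing $w_{i}^{B}$ for the time-fixed standardization weights (equal to 1 when no standardization is applied):

```latex
% Case 1: S_i^{**}(u) = S_i^{*}(u) and H_i^{**}(u) = H_i^{*}(u)
w_{i}(u) = w_{i}^{B}, \qquad
dN_{i}(u) - dH_{i}^{*}(u) + dH_{i}^{**}(u) = dN_{i}(u)
\;\Longrightarrow\;
\hat{H}_{R}(t) = \int_{0}^{t}
  \frac{\sum_{i=1}^{n} w_{i}^{B} Y_{i}(u)\, dN_{i}(u)}
       {\sum_{i=1}^{n} w_{i}^{B} Y_{i}(u)}

% Case 2: S_i^{**}(u) = 1 and H_i^{**}(u) = 0
w_{i}(u) = \frac{w_{i}^{B}}{S_{i}^{*}(u)}, \qquad
dN_{i}(u) - dH_{i}^{*}(u) + dH_{i}^{**}(u) = dN_{i}(u) - dH_{i}^{*}(u)
\;\Longrightarrow\;
\hat{H}_{R}(t) = \hat{H}_{PP}(t)
```

In the first case, the result is the (individually weighted) Nelson–Aalen estimate, which equals (3) when all $w_{i}^{B} = 1$.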

Similarly, the crude probability of death due to cancer and other causes can be extended to reference adjustment. Rather than the all-cause survival being based on the Nelson–Aalen estimator, the reference-adjusted all-cause survival in (12) is used. Similarly, rather than use $d\Lambda_{c}(u)$ and $dH_{o}^{*}(u)$ as defined in (10), we define

$$\hat{\Lambda}_{R}(t) = \int_{0}^{t} \frac{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u) \{dN_{i}(u) - dH_{i}^{*}(u)\}}{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u)}$$

using the weights defined in (11). This gives the measure proposed by Sasieni and Brentnall (2016) described above. Similarly, the marginal expected cumulative hazard in the reference population is

$$\hat{H}_{R}^{**}(t) = \int_{0}^{t} \frac{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u)\, dH_{i}^{**}(u)}{\sum_{i=1}^{n} w_{i}(u) Y_{i}(u)}$$

The reference-adjusted crude probabilities of death due to cancer and other causes are

$$F_{R}^{c}(t) = \int_{0}^{t} S_{R}(u-)\, d\Lambda_{R}(u), \qquad F_{R}^{o}(t) = \int_{0}^{t} S_{R}(u-)\, dH_{R}^{**}(u)$$

3.3 Some comments on the choice of reference population and age standard

There is a choice over which reference to use for both standardization and the expected mortality rates. It is useful to consider these together and to make one of the population groups being compared the reference for both. For example, when one looks at temporal trends in survival, it is useful to standardize to the age distribution in the most recent calendar period and use the expected mortality rates from the same calendar period (Lambert et al. 2025). This makes the all-cause survival or crude probability of death for the most recent calendar period a factual estimate, with previous calendar periods being counterfactual to ensure fair comparisons. Similarly, if one population group is of specific interest, for example, in international comparisons of cancer survival and taking the perspective for one particular country, then it is sensible to provide a factual estimate for the country of interest, with other countries adjusted.

4 The stpp command

The stpp command calculates nonparametric estimates of marginal relative survival and related measures.

4.1 Syntax

using filename specifies a file containing expected mortality rates.

4.2 Options

datediag(varname) specifies the variable containing the date at diagnosis as a Stata date variable. datediag() is required.

pmother(varlist) specifies additional variables in the population mortality file. Usually, this will include sex, but there could additionally be information on other factors, such as region or socioeconomic status. All variables listed should be in both the data and the population mortality file. pmother() is required.

agediag(varname) specifies the variable containing age at diagnosis, which should be in years. It is best to avoid using truncated (integer) age, which assumes that each person was diagnosed on their birthday. Only one of the agediag() or datebirth() options can be specified.

datebirth(varname) specifies the variable containing the date of birth. Only one of the agediag() or datebirth() options can be specified.

display(display_option) changes the display of the results. By default, stpp displays marginal relative survival (display(rs)) estimates, even if using the allcause() or crudeprob() option. You can instead display all-cause probabilities by specifying display(ac) or display crude probabilities by specifying display(cp). If you specify display(none), no results will be displayed. Note that this option controls only what is displayed on the screen; results are still stored as matrices and potentially a frame if you use the frame() option.

allcause(newvar) calculates all-cause survival or death probabilities.

by(varlist) calculates separate estimates of each statistic for the groups defined by varlist. A missing value is included as a level when using by().

contrast(contrasttype, suboptions) calculates contrasts between different levels of a variable specified in by(). Currently, the only contrast type is difference. The contrasts and confidence intervals are displayed on the screen and stored in a new frame if the frame() option is specified.

The following suboptions are available:

baselevel(#) specifies the base level when performing contrasts. For contrast difference, this is the negative term. When there is more than one variable specified in the by() option, contrasts are calculated at all levels of the variable not specified in the baselevel() option.

per(#) multiplies the calculated contrast by #.
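The arithmetic implied by the baselevel() and per() suboptions can be sketched in Python. This shows only the point estimate of a difference contrast, not how stpp computes its confidence intervals; the function name `contrast_difference` and the survival estimates are hypothetical.

```python
def contrast_difference(estimates, baselevel, per=1):
    """Difference contrasts against a base level, scaled by `per`.

    estimates : dict mapping each level of the by() variable to an estimate
    baselevel : level that enters the difference as the negative term
    per       : multiplier applied to each contrast
    """
    base = estimates[baselevel]
    return {level: (est - base) * per
            for level, est in estimates.items() if level != baselevel}


# Hypothetical five-year survival estimates for two deprivation groups
estimates = {1: 0.80, 5: 0.70}
diff = contrast_difference(estimates, baselevel=1, per=100)
```

With per=100, the difference for group 5 relative to group 1 is expressed in percentage points (here approximately -10).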

crudeprob(newvarlist) calculates crude probability of death. If only one new variable is listed, then the crude probabilities of death due to the disease are calculated. If two variables are listed, then both crude probabilities of death due to the disease and other causes are calculated.

deathprob calculates probabilities of death rather than survival. This affects both net and all-cause survival.

ederer2 calculates the Ederer II estimate rather than the default Pohar Perme estimate.

fh uses the Fleming–Harrington estimator of survival (the exponential of the negative cumulative [excess] hazard). The default is the product integral (Kaplan–Meier type) method.

frame(framename[, replace]) saves the statistics to a frame. These are evaluated at the times specified in the list() option. You can use the replace suboption to overwrite an existing frame.

inccens evaluates survival functions at censoring and event times. This will lead to more precise long-term estimates when there are sparse data but will increase computation time.

indweights(varname) incorporates individual-level weights to upweight or downweight individuals relative to a reference population. This is useful for external age standardization. See the syntax for genindweights in section 5 for various ways to calculate the individual weights.

level(#) specifies the confidence level, as a percentage, for confidence intervals. The default is level(95) or as set by set level.

list(numlist) specifies times to list estimates of marginal relative survival (and confidence intervals). For example, list(1 5 10) will list marginal survival at 1, 5, and 10 years.

pmage(varname) specifies the name of the age variable in the population mortality file. The default is pmage(_age). This variable cannot exist in the patient data file but should exist in the population mortality file.

pmyear(varname) specifies the name of the year variable in the population mortality file. The default is pmyear(_year). This variable cannot exist in the patient data file but should exist in the population mortality file.

pmrate(varname) specifies the name of the rate variable in the population mortality file. The default is pmrate(rate). The rate should be expressed per person year. If you have only one-year survival probabilities in the population mortality file, then you can obtain the rate using generate rate = -ln(survprob), where survprob is the one-year survival probability.
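The same conversion, shown in Python for illustration, assumes a constant rate within the year, so that the one-year survival probability is exp(-rate). The function name `rate_from_survprob` is hypothetical.

```python
import math


def rate_from_survprob(survprob):
    """Convert a one-year survival probability to a rate per person-year."""
    return -math.log(survprob)


rate = rate_from_survprob(0.95)   # hypothetical one-year survival of 95%
```

Applying exp(-rate) to the result recovers the original survival probability.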

pmmaxage(#) specifies the maximum age for which general-population mortality rates are provided in the population mortality file. Rates for individuals older than this value are assumed to be the same as for maximum age #. The default is pmmaxage(99).

pmmaxyear(#) specifies the maximum year for which general-population mortality rates are provided in the population mortality file. Rates for individuals still at risk after this year are assumed to be the same as for maximum year #. The default is pmmaxage(99).

popmort2(filename, suboptions) specifies the name of a second population mortality file. This leads to a second set of time-dependent weights, so the estimator of Sasieni and Brentnall is incorporated when calculating marginal relative survival. When this option is specified when calculating crude probabilities of death or all-cause survival or death probabilities, reference-adjusted measures are calculated. The following suboptions can be specified: pmage2(), pmother2(), pmrate2(), and pmyear2(). If unspecified, they take the same variable names as given in the corresponding main options.

standstrata(varname) specifies the variable defining strata across which to average marginal relative survival. Weights are specified using the standweights() option.

standweights(numlist) specifies the weights for obtaining the standardized estimates. The length of numlist should equal the number of levels of the variable specified in standstrata().

using2(filename, suboptions) is a synonym for popmort2().

verbose gives some details about how far the estimation process has proceeded.

graph creates a plot of the main marginal relative-survival estimate with a confidence interval. Only one of the graph, graphname(), or graphcode() options is required to get a plot.

graphname(name[, replace]) names the graph. You can replace an existing graph with the replace suboption.

graphcode(filename[, replace]) creates a new do-file that contains the code to recreate the standard graph. This makes it possible to change the plot, for example, to add titles or to determine whether to have a risk table.

4.3 Stored results

In addition to the new variables created, the following matrices are returned with estimates evaluated at times given in the list() option.

Most of the time, we find it easier to output this information to a new frame using the frame() option.

5 The genindweights command

The genindweights command generates weights to standardize to a different population. Often, the standardization is by age, but the command is more general and can be used to standardize over many variables.

5.1 Syntax

The reference population to standardize to is defined by using refconditional(), refdata(), refexternal(), or refframe().

5.2 Options

refconditional([exp], strata(varlist)) defines the reference subpopulation based on expression exp. For example, it is sometimes useful to age-standardize to the most recent calendar period when investigating time trends. The strata() option is required and gives the variables that define the groups to standardize over. refconditional(), refdata(), refexternal(), or refframe() must be specified.

refdata(filename, strata(varlist) [refwtname(varname)]) specifies the name of the dataset where external weights are stored. The strata() option is required and specifies the variables that define the groups to standardize over. These variables must exist in both the active frame and the dataset containing the weights. The refwtname() option gives the name of the variable containing the reference weights. By default, this variable is named refp. refconditional(), refdata(), refexternal(), or refframe() must be specified.

refexternal(external_weights) gives the name of the external weights used for age standardization. Options include ICSS1_5, ICSS2_5, and ICSS3_5 for the five age group weights defined by Corazziari, Quinn, and Capocaccia (2004) and ICSS1_5N and ICSS2_5N for the weights defined in the NORDCAN survival studies used by Lundberg et al. (2020). refconditional(), refdata(), refexternal(), or refframe() must be specified.

refframe(frame, strata(varlist) [refwtname(varname)]) specifies the name of the frame where external weights are stored. The strata() option is required and specifies the variables that define the groups to standardize over. These variables must exist in both the active frame and the frame containing the weights. The refwtname() option gives the name of the variable containing the reference weights. By default, this variable is named refp. refconditional(), refdata(), refexternal(), or refframe() must be specified.

agegroup(varname) specifies the name of the variable that defines age groups. This option is required when using refexternal() but cannot be used when using option refconditional(), refdata(), or refframe().

by(varlist) calculates relative weights separately for each of the groups defined by varlist.

byrestrict(exp) restricts calculation of the observed proportions to those defined by exp. Note that values are still obtained for individuals not satisfying the expression but are not included in the calculation. It is generally more useful to use the restrict() option, which also places restrictions when calculating the reference proportions (where appropriate).

obsproportion(newvar) saves the observed proportions in a new variable.

refproportion(newvar) saves the reference proportions in a new variable.

restrict(exp) restricts calculation of the observed and reference proportions (when using the refconditional() option) to those defined by exp. Note that weights are still obtained for individuals not satisfying the expression but are not included in the calculation of the weights. The most common use is when implementing period analysis in survival analysis: here we want the observed proportions to be defined based on those diagnosed in a calendar period window, as well as the reference proportions when using refconditional().

saverefframe(newframename[, replace refwtname(varname)]) gives the name of a new frame to save the external weights. This is useful with refconditional() so that the weights can later be applied to a different dataset using the refframe() option. The refwtname() option gives the name of the variable that will contain the reference weights.

stignore specifies that survival analysis checks not be performed. Although originally developed for use with survival data, genindweights is also potentially useful in other contexts, so this option will omit any checks that data have been stset and _st=1.

nosummary specifies that the summary table of weights not be displayed.

6 Examples

The five examples use data on women diagnosed with breast cancer in the Northwest region of England between 1985 and 1990, with follow-up to the end of 1995. The event of interest is death from any cause. Covariates of interest include deprivation, defined in terms of the area-based Carstairs score, and age at diagnosis. There are five deprivation groups ranging from the least deprived (most affluent) to the most deprived quintile in the population. In this article, we restrict analysis to the least (dep == 1) and the most deprived (dep == 5) groups for simplicity.

The data can be loaded and stset as follows:

There are 7,452 women included in the analysis. Follow-up has been restricted to 10 years using the exit(time 10) option.

In any relative survival analysis, the expected rates (or survival probabilities) need to be merged in. For the nonparametric measures, this information is needed at unique time points, and stpp performs this merging at multiple time points within Mata. An example listing of the population mortality (popmort) file is shown below.

The listing shows the expected rates and expected one-year survival probabilities for women aged 71–75 in 1995 for the least and most deprived groups. stpp requires the expected rate variable, while other relative survival commands require the one-year expected-survival probability.
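To illustrate what merging expected rates at multiple time points involves, the following Python sketch looks up a rate by attained age and attained calendar year, holding rates constant beyond a maximum age and year. It is a simplified assumption about the lookup, not a description of the Mata code in stpp: the dictionary `popmort`, the decimal-year argument `yeardiag`, and the function name `expected_rate` are all hypothetical (stpp itself takes a Stata date variable).

```python
import math


def expected_rate(popmort, agediag, yeardiag, t, sex, maxage=99, maxyear=None):
    """Look up the expected rate at time t since diagnosis.

    popmort  : dict keyed by (sex, year, age) holding rates per person-year
    agediag  : age at diagnosis in (fractional) years
    yeardiag : date of diagnosis as a decimal year (simplification)
    maxage   : rates above this age are taken from this age
    maxyear  : rates after this year are taken from this year, if given
    """
    age = min(math.floor(agediag + t), maxage)   # attained age, capped
    year = math.floor(yeardiag + t)              # attained calendar year
    if maxyear is not None:
        year = min(year, maxyear)
    return popmort[(sex, year, age)]


# Hypothetical population mortality rates
popmort = {
    ("F", 1994, 70): 0.028,
    ("F", 1995, 71): 0.030,
    ("F", 1995, 99): 0.300,
}
r0 = expected_rate(popmort, 70.5, 1994.2, 0.0, "F")
r1 = expected_rate(popmort, 70.5, 1994.2, 1.0, "F")
r_cap = expected_rate(popmort, 98.5, 1990.0, 6.0, "F", maxage=99, maxyear=1995)
```

The last call shows the capping: an attained age of 104 in 1996 is mapped to the rate for age 99 in 1995.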

6.1 Example 1: Simple use of stpp

An initial use of stpp is to estimate marginal relative survival in the two deprivation groups.

The output lists marginal relative survival at 1, 5, and 10 years, showing that women living in more deprived areas have lower survival.

A brief explanation of the syntax follows. The expected rates are stored in a dataset with the filename given after using. The variables containing age and date at diagnosis are given in the agediag() and datediag() options, respectively. The by(varlist) option will calculate marginal relative survival separately in groups defined in varlist. The pmyear(), pmage(), and pmrate() options give the names, respectively, of the calendar year, age, and rate variables in the population mortality file. The pmother() option gives any other variables the rates are stratified by in the population mortality file. In this case, the population mortality file is stratified by sex and deprivation group. The list() option will output estimates of marginal survival at the specified times. The new variable, R_pp, is the marginal relative survival at the value of _t. This can be plotted, but we will demonstrate the graph option, which will produce a graph automatically. In addition, we will add the frame() option, which saves the estimates at the timepoints defined by the list() option in a frame.

The corresponding plot can be seen in figure 1. This shows that women living in the more deprived areas have lower relative survival over the whole of the 10 years of follow-up. The code used to produce the graph can be saved by adding the graphcode() option so that it can be edited to change colors, add text, improve the legend, etc.

Figure 1. Marginal relative survival by deprivation group

The frame() option has saved estimates in a new frame, stpp_results, where the times are defined by those given in the list() option. This is useful for producing summary tables or plots.

6.2 Example 2: Standardization to an external population

In example 1, marginal relative survival was calculated separately in both groups. If there is a difference in the age distribution between the two groups, then this could partially explain the observed difference in survival.

The output below compares the age distribution in five age categories between the two deprivation groups.

There is a higher proportion of older women in the most deprived group, and because survival tends to be worse in older age groups, this could partially explain the differences. We will use age standardization to force the same age distribution on both groups. We will first apply the age distribution defined in the International Cancer Survival Standard (ICSS) (Corazziari, Quinn, and Capocaccia 2004). This applies weights of 0.07, 0.12, 0.23, 0.29, and 0.29 in the five age groups defined in the table above.

In the code for stpp below, we use the standstrata() option to define the variable to standardize over (agegrp) and the standweights() option to define weights in the ICSS1 age standard. These two options will lead to traditional age standardization being used, where the age-standardized estimate is a weighted average of the five age-group-specific estimates.

The estimates of marginal relative survival are slightly lower than those seen in example 1 because the ICSS age distribution is older than that actually observed, so we are giving more weight to age groups with lower survival.
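Traditional age standardization is a weighted average of the age-group-specific estimates. The sketch below uses the ICSS weights quoted above; the age-specific relative survival values are hypothetical and chosen only to show the calculation, and the function name is an assumption rather than anything stpp exposes.

```python
# ICSS weights quoted in the text for the five age groups (youngest to oldest)
ICSS_WEIGHTS = [0.07, 0.12, 0.23, 0.29, 0.29]

def standardize(age_specific, weights):
    """Weighted average of age-group-specific estimates."""
    if len(age_specific) != len(weights):
        raise ValueError("one estimate is needed per age group")
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, age_specific)) / total

# Hypothetical 5-year relative survival by age group (illustrative values only)
rs_by_age = [0.85, 0.88, 0.86, 0.80, 0.70]
print(round(standardize(rs_by_age, ICSS_WEIGHTS), 4))
```

Because the two oldest age groups together receive a weight of 0.58, a standard like this pulls the summary toward the estimates in the older groups.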

An alternative way to age-standardize is to use individual weights where each individual is upweighted or downweighted relative to the reference population. To calculate the individual weights, we can use the genindweights command. Because we are using the predefined external weights from ICSS in five age groups, we can use the refexternal(ICSS1_5) option. Because our analysis will calculate marginal relative survival separately by deprivation group, we need to calculate the weights separately using by(dep).

This has calculated the weights, which can then be passed to stpp. genindweights has listed the observed and reference proportions, together with the calculated individual weights, ref/obs. For example, for the least deprived group, the observed proportion of women in the oldest age group (75 and over) is 0.1963, while the reference proportion is 0.29. Thus, each woman in this age group now represents 0.29/0.1963 = 1.477 women to account for the fact that this age group is underrepresented compared with the reference population.
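The weight quoted above can be reproduced directly as the ratio of the reference to the observed proportion. This is only the arithmetic behind the listed weight, written as a small Python function with a hypothetical name; it is not how genindweights is implemented.

```python
def individual_weight(ref_prop: float, obs_prop: float) -> float:
    """Individual weight shared by everyone in a stratum: reference / observed proportion."""
    if obs_prop <= 0:
        raise ValueError("observed proportion must be positive")
    return ref_prop / obs_prop

# Values quoted in the text: least deprived group, oldest age group
w = individual_weight(ref_prop=0.29, obs_prop=0.1963)
print(round(w, 3))  # 1.477
```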

The individual weights are passed to stpp using the indweights() option with the output shown below.

The standardized estimates are very similar to those obtained when using traditional age standardization. We tend to favor standardization using individual weights because they work better with sparse data and when there are many categories to standardize over.

6.3 Example 3: Reference weights based on a subset of data

In example 2, we used an external age standard. It can also be useful to define the reference age distribution using a particular subgroup in the study population. For example, this can be a recent calendar period when investigating temporal trends or a specific group of interest. Here we will use the age distribution in the most deprived group as the reference age distribution. To do this, we use the genindweights command with the refconditional() option.

The output shows the observed and reference proportions in each age group together with the individual weights. Because we are using the most deprived group for the reference age distribution, the observed and reference proportions are identical for this group, and thus the individual weights are equal to 1. In the least deprived group, there are higher observed proportions for the two youngest age groups compared with the reference, so the individual weights are less than 1. In the three oldest age groups, there is a lower observed proportion compared with the reference, so the individual weights are greater than 1.
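The logic of taking the reference distribution from a subgroup can be sketched as follows. The toy records, the two age-group labels, and the function names are all hypothetical; the sketch only mirrors the ref/obs calculation described above and makes no claim about the internals of genindweights.

```python
from collections import Counter

def proportions(ages):
    """Proportion of records in each age group."""
    counts = Counter(ages)
    n = len(ages)
    return {g: c / n for g, c in counts.items()}

def conditional_weights(records, ref_group):
    """records: list of (group, agegrp) pairs.
    Returns {(group, agegrp): weight}, where weight = reference / observed
    proportion and the reference is the age distribution within ref_group."""
    by_group = {}
    for group, agegrp in records:
        by_group.setdefault(group, []).append(agegrp)
    ref = proportions(by_group[ref_group])
    weights = {}
    for group, ages in by_group.items():
        obs = proportions(ages)
        for agegrp, p in obs.items():
            weights[(group, agegrp)] = ref.get(agegrp, 0.0) / p
    return weights

# Hypothetical toy data: deprivation groups 1 (least) and 5 (most), two age groups
toy = [(1, "young")] * 6 + [(1, "old")] * 4 + [(5, "young")] * 4 + [(5, "old")] * 6
w = conditional_weights(toy, ref_group=5)
print(w)
```

In the toy data, the reference group receives weights of 1, the overrepresented young stratum in group 1 is downweighted, and the weights within group 1 still sum to the group size.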

These individual weights can be passed to stpp using the indweights() option.

Note that because the weights for the most deprived group are equal to 1, the estimates of marginal relative survival for this group are identical to the unadjusted estimates in example 1. The estimates for the least deprived group have changed because they have been age standardized to the age distribution of the most deprived group.

Rather than use a specific subgroup as the reference, we may want to use the combined age distribution. This can be achieved as follows:

In addition, we used the saverefframe() option of genindweights. This saves the reference distribution to a new frame, which could then be used in a new dataset using the refframe() option. The new frame is shown below.

6.4 Example 4: Alternative measures

In this example, we show how stpp will also calculate all-cause survival and crude probabilities of death. These will be calculated through use of the allcause() and crudeprob() options, respectively. In addition, we will use the deathprob option so that we estimate probabilities of death, rather than survival, for both the relative and the all-cause measures.

The different probabilities are plotted in figure 2. The all-cause probability of death is the easiest to interpret: it gives the probability of being dead from any cause as a function of time and is equivalent to what would be obtained using sts graph, failure by(dep). The crude probabilities of death partition the all-cause probability of death into that due to cancer and that due to other causes, so summing the two crude probabilities gives the all-cause probability of death. The net probability of death will be greater than the crude probability of death due to cancer because it assumes that it is not possible to die from other causes. All four probabilities are higher for women from the most deprived areas when compared with women from the least deprived areas.

Figure 2. Net, all-cause, and crude probabilities of death for the least and most deprived groups
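The relationships between the four probabilities can be checked in a simplified setting. The sketch below assumes constant excess and expected mortality rates, which is an assumption made purely for illustration (stpp estimates these quantities nonparametrically), and the rates are hypothetical rather than estimated from the breast cancer data.

```python
import math

def death_probabilities(t, excess_rate, expected_rate):
    """Net, all-cause, and crude probabilities of death by time t,
    assuming constant excess and expected mortality rates (illustration only)."""
    total = excess_rate + expected_rate
    all_cause = 1.0 - math.exp(-total * t)
    crude_cancer = excess_rate / total * all_cause
    crude_other = expected_rate / total * all_cause
    net = 1.0 - math.exp(-excess_rate * t)
    return {"net": net, "all_cause": all_cause,
            "crude_cancer": crude_cancer, "crude_other": crude_other}

# Hypothetical yearly rates, not estimated from the breast cancer data
p = death_probabilities(t=10, excess_rate=0.04, expected_rate=0.03)
print({k: round(v, 3) for k, v in p.items()})
```

In this setting, the two crude probabilities sum to the all-cause probability of death, and the net probability of death exceeds the crude probability of death due to cancer because other causes are not allowed to remove people from risk.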

Because we used the frame() option, results are saved at 1, 5, and 10 years (from the list() option) in a new frame. This is listed below.

Note that confidence intervals have also been calculated but are not shown.

The estimates shown above and in figure 2 are unadjusted. We can standardize in the same way as examples 2 and 3 using individual weights. We will make the most deprived group the reference group as before.

. genindweights indwt, by(dep) refconditional(dep==5, strata(agegrp))

(output omitted)

Now we just add the indweights() option to stpp to obtain the standardized estimates.

The estimates at 1, 5, and 10 years are listed below.

Because the most deprived group is the reference group, the estimates have not changed compared with the unadjusted estimates for this group. The estimates for the least deprived group have changed because they are now age standardized to the age distribution of the most deprived group.

6.5 Example 5: Reference adjustment

This example uses the popmort2() option to incorporate a second set of expected mortality rates. We will use the most deprived group as the reference for both the expected rates and the age distribution to standardize to. Estimates for the most deprived group will therefore be factual, whereas estimates for the least deprived group will be counterfactual: they describe a population with the age distribution and the expected rates of the most deprived group. Any remaining differences between the groups will thus be due to different excess mortality rates only.

We create a separate file for the reference-expected mortality rates.

Only rows where dep == 5 are kept, and then dep is dropped from the dataset. We then load and stset the breast_NW data.

We will make the most deprived group the reference group for age standardization as in previous examples.

We now run stpp with the popmort2() option.

The popmort2() option gives the name of the file containing the reference-expected mortality rates. If the names of the variables in the data file are the same as specified in the pmage(), pmyear(), and pmrate() options, then they do not need to be specified again. The pmother2(sex) option is given because the reference-expected rates are not stratified by deprivation, so it needs to be different from the pmother() option.

This output gives the Sasieni and Brentnall estimates. This measure can be thought of as being similar to an Ederer II estimator, but the inclusion of the reference-expected rates removes dependency on potentially different expected mortality rates. For the most deprived group, the estimates are the same as those obtained by using the ederer2 option in example 3 because for this group, the reference rates are equal to the study-expected rates.

By adding the allcause() and crudeprob() options, we obtain the reference-adjusted measures.

To see the differences from the all-cause estimates calculated in example 4, figure 3 shows the nonreference-adjusted and reference-adjusted estimates for the two groups. All estimates are age standardized to the age distribution in the most deprived group. Because we have used the most deprived group as the reference for both the age distribution and the expected mortality rates, the nonreference-adjusted and reference-adjusted estimates are identical for this group (and would also be identical to nonstandardized estimates). Because the most deprived group has higher expected mortality rates, we can see the impact of using them as the reference rates on the reference-adjusted measures for the least deprived group. The all-cause probability of death increases because we are applying a higher other-cause mortality rate. The crude probability of death due to cancer decreases because a higher expected mortality rate means that individuals have a higher chance of dying from other causes before they would die from their cancer. A key point is that differences between the groups in the reference-adjusted measures are no longer due to differential other-cause mortality because common reference-expected rates are being applied.

Figure 3. Net, all-cause, and crude probabilities of death for the least and most deprived groups. The darker lines are nonreference adjusted, and the lighter lines are reference adjusted. All estimates are age standardized to the age distribution in the most deprived group.
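The direction of the changes seen for the least deprived group can be reproduced in a simplified setting. The sketch assumes constant excess and expected mortality rates and uses hypothetical values; it illustrates the effect of swapping in a higher reference-expected rate and is not the estimator used by stpp.

```python
import math

def all_cause_and_crude(t, excess_rate, expected_rate):
    """All-cause and crude (cancer) probabilities of death by time t
    under constant rates; a simplification used only for illustration."""
    total = excess_rate + expected_rate
    all_cause = 1.0 - math.exp(-total * t)
    crude_cancer = excess_rate / total * all_cause
    return all_cause, crude_cancer

# Hypothetical yearly rates for the least deprived group
excess = 0.03
own_expected = 0.02        # the group's own expected rate
reference_expected = 0.04  # higher expected rate of the reference group

unadjusted = all_cause_and_crude(10, excess, own_expected)
adjusted = all_cause_and_crude(10, excess, reference_expected)
print("not reference adjusted:", [round(x, 3) for x in unadjusted])
print("reference adjusted:    ", [round(x, 3) for x in adjusted])
```

With the excess rate held fixed, applying the higher reference-expected rate raises the all-cause probability of death and lowers the crude probability of death due to cancer.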

7 Conclusions

We have demonstrated the usefulness of the stpp command for estimation of various survival measures in the relative survival framework. Although other commands exist, the advantage of stpp is that it provides estimates in continuous time, provides different approaches to (age) standardization, and incorporates reference-adjusted measures. We believe that the approaches presented here demonstrate that stpp will be valuable for survival analysis performed on data from cancer registries.

There are several extensions that we have not had room to show. First, stpp allows for delayed entry, so the commonly used method of period analysis, which considers only risk time within a specified calendar period window, can be used (Brenner, Gefeller, and Hakulinen 2004). Second, the reference-adjusted methods described here are very similar to the computations required to calculate avoidable deaths (Rutherford et al. 2015).

Supplemental Material

Supplemental material, sj-txt-2-stj-10.1177_1536867X261425755 for The stpp command for marginal relative survival and related measures by Paul C. Lambert and Mark J. Rutherford

Supplemental material, sj-dta-1-stj-10.1177_1536867X261425755 for The stpp command for marginal relative survival and related measures by Paul C. Lambert and Mark J. Rutherford

Footnotes

Acknowledgments

The research presented in this manuscript was supported by the Swedish Cancer Society (Cancerfonden) and the Swedish Research Council (Vetenskapsrådet) and by the National Institute for Health and Care Research (NIHR) Applied Research Collaboration East Midlands and NIHR Leicester Biomedical Research Centre. Mark J. Rutherford is a coinvestigator of the NIHR Policy Research Unit on Cancer Awareness, Screening and Early Diagnosis (NIHR206132). The views expressed are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

8

To install the software files as they existed at the time of publication of this article, type

A more recent version of the command may be available from the Statistical Software Components Archive by typing

About the authors

Paul C. Lambert is a biostatistician based at the Cancer Registry of Norway in Oslo, Norway, and a visiting professor of biostatistics at Karolinska Institutet in Stockholm, Sweden. He is a longtime Stata user who has written various commands, including stpm3, standsurv, gensplines, and mlad.

Mark J. Rutherford is a professor of biostatistics at the University of Leicester, UK. He is a longtime Stata user and has a keen interest in applying survival methods in Stata, particularly for population-based cancer data.

References

Brenner and Gefeller. 1996. An alternative approach to monitoring cancer patient survival. Cancer 78: 2004–2010. 10.1002/(SICI)1097-0142(19961101)78:9%3C2004::AID-CNCR23%3E3.0.CO;2-%23.

Brenner, Gefeller, and Hakulinen. 2004. Period analysis for ‘up-to-date’ cancer survival data: Theory, empirical evaluation, computational realisation and applications. European Journal of Cancer 40: 326–335. 10.1016/j.ejca.2003.10.013.

Bright, C. J., A. R. Brentnall, Wooldrage, Myles, Sasieni, and S. W. Duffy. 2020. Errors in determination of net survival: Cause-specific and relative survival settings. British Journal of Cancer 122: 1094–1101. 10.1038/s41416-020-0739-4.

Clerc-Urmès, Grzebyk, and Hédelin. 2014. Net survival estimation with stns. Stata Journal 14: 87–102. 10.1177/1536867X1401400107.

Corazziari, Quinn, and Capocaccia. 2004. Standard cancer patient population for age standardising survival ratios. European Journal of Cancer 40: 2307–2316. 10.1016/j.ejca.2004.07.002.

Coviello, P. W. Dickman, Seppä, and Pokhrel. 2015. Estimating net survival using a life-table approach. Stata Journal 15: 173–185. 10.1177/1536867X1501500111.

Cronin, K. A., and E. J. Feuer. 2000. Cumulative cause-specific mortality for cancer patients in the presence of other causes: A crude analogue of relative survival. Statistics in Medicine 19: 1729–1740. 10.1002/1097-0258(20000715)19:13%3C1729::AID-SIM484%3E3.0.CO;2-9.

Dickman, P. W., and H. O. Adami. 2006. Interpreting trends in cancer patient survival. Journal of Internal Medicine 260: 103–117. 10.1111/j.1365-2796.2006.01677.x.

Dickman, P. W., and Coviello. 2015. Estimating and modeling relative survival. Stata Journal 15: 186–215. 10.1177/1536867X1501500112.

Geskus, R. B. 2016. Data Analysis with Competing Risks and Intermediate States. New York: Chapman and Hall/CRC. 10.1201/b18695.

Lambert, P. C., T. M. L. Andersson, T. Å. Myklebust, Møller, and M. J. Rutherford. 2025. Monitoring temporal trends in cancer survival: Choosing appropriate standards when accounting for age and other-cause mortality variation over time. Cancer Epidemiology, Biomarkers and Prevention 34: 1141–1148. 10.1158/1055-9965.EPI-24-1727.

Lambert, P. C., T. M.-L. Andersson, M. J. Rutherford, T. Å. Myklebust, and Møller. 2020. Reference-adjusted and standardized all-cause and crude probabilities as an alternative to net survival in population-based cancer studies. International Journal of Epidemiology 49: 1614–1623. 10.1093/ije/dyaa112.

Lundberg, F. E., T. M.-L. Andersson, Lambe, Engholm, L. S. Mørch, T. B. Johannesen, Virtanen, Pettersson, E. J. Ólafsdóttir, Birgisson, A. L. V. Johansson, and P. C. Lambert. 2020. Trends in cancer survival in the Nordic countries 1990–2016: The NORDCAN survival studies. Acta Oncologica 59: 1266–1274. 10.1080/0284186X.2020.1822544.

Parkin, D. M. 2008. The role of cancer registries in cancer control. International Journal of Clinical Oncology 13: 102–111. 10.1007/s10147-008-0762-6.

Percy, Stanek, and Gloeckler. 1981. Accuracy of cancer death certificates and its effect on cancer mortality statistics. American Journal of Public Health 71: 242–250. 10.2105/AJPH.71.3.242.

Pohar Perme, Stare, and Estève. 2012. On estimation in relative survival. Biometrics 68: 113–120. 10.1111/j.1541-0420.2011.01640.x.

Rutherford, M. J., T. M.-L. Andersson, Møller, and P. C. Lambert. 2015. Understanding the impact of socioeconomic differences in breast cancer survival in England and Wales: Avoidable deaths and potential gain in expectation of life. Cancer Epidemiology 39: 118–125. 10.1016/j.canep.2014.11.002.

Rutherford, M. J., T. M.-L. Andersson, T. Å. Myklebust, Møller, and P. C. Lambert. 2022. Non-parametric estimation of reference adjusted, standardised probabilities of all-cause death and death due to cancer for population group comparisons. BMC Medical Research Methodology 22: art. 2. 10.1186/s12874-021-01465-w.

Rutherford, M. J., P. W. Dickman, Coviello, and P. C. Lambert. 2020. Estimation of age-standardized net survival, even when age-specific data are sparse. Cancer Epidemiology 67: art. 101745. 10.1016/j.canep.2020.101745.

Sasieni and A. R. Brentnall. 2016. On standardized relative survival. Biometrics 73: 473–482. 10.1111/biom.12578.
