Sage Journals: Discover world-class research

Abstract

Lexis diagrams are rectangular arrays of event rates indexed by age and period. Analysis of Lexis diagrams is a cornerstone of cancer surveillance research. Typically, population-based descriptive studies analyze multiple Lexis diagrams defined by sex, tumor characteristics, race/ethnicity, geographic region, etc. Inevitably the amount of information per Lexis diminishes with increasing stratification. Several methods have been proposed to smooth observed Lexis diagrams up front to clarify salient patterns and improve summary estimates of averages, gradients, and trends. In this article, we develop a novel bivariate kernel-based smoother that incorporates two key innovations. First, for any given kernel, we calculate its singular values decomposition, and select an optimal truncation point—the number of leading singular vectors to retain—based on the bias-corrected Akaike information criterion. Second, we model-average over a panel of candidate kernels with diverse shapes and bandwidths. The truncated model averaging approach is fast, automatic, has excellent performance, and provides a variance-covariance matrix that takes model selection into account. We present an in-depth case study (invasive estrogen receptor-negative breast cancer incidence among non-Hispanic white women in the United States) and simulate operating characteristics for 20 representative cancers. The truncated model averaging approach consistently outperforms any fixed kernel. Our results support the routine use of the truncated model averaging approach in descriptive studies of cancer.

Keywords

Lexis diagram kernel methods nonparametric smoothing Surveillance Epidemiology and End Results Program cancer surveillance research

1. Introduction

The Lexis diagram is a fundamental construct in epidemiology, demography, and sociology.¹ A Lexis diagram is a rectangular grid of square cells with age along one axis and calendar period along the other. Individuals from a surveilled population contribute person-years and events (births, deaths, cancers, etc.) to each cell. Analysis of Lexis diagrams can elucidate temporal patterns and provide clues about the etiology of an event.² Notably, the analysis of Lexis diagrams is a cornerstone of cancer surveillance research.

Population-based cancer rates usually exhibit Poisson-type variability.³ This is the default mode for analysis, although negative binomial models are available that accommodate more flexible mean-to-variance relationships.^4–6 Within any given Lexis diagram, intrinsic variability (“noise”) can mask important signals and limit the power of comparative analyses to identify heterogeneity. In epidemiologic research, the classic approaches deal with intrinsic variability by aggregating granular data (e.g. data originally sampled at the scale of a single-years of age and/or calendar years) into broader age and period categories,⁷ typically, 5-year age groups. Unfortunately, there is nothing optimal about such traditional groupings.

Several methods have been proposed to de-noise or smooth an observed Lexis diagram before the usual descriptive^7–9 and analytical¹⁰ methods are applied. Keiding¹ pioneered the first such approach using bivariate kernel functions. This was a major advance; however, this classic method has two major limitations. First, the choice of kernel is arbitrary, hence so too is the implicit bias-variance trade-off. Second, the estimated variance-covariance matrix does not reflect uncertainty arising from kernel (model) selection. In practice, analysts tend to use small bandwidths to minimize bias rather than variance. However, there is no evidence that such choices are optimal.

Other investigators have developed more sophisticated methods. Currie et al.¹¹ produced smooth Lexis diagrams using bivariate P-splines and second-difference penalty functions. Camarda¹² implemented the Currie approach in R and developed methods to incorporate established demographic constraints using asymmetric penalties.¹³ Dokumentov et al.¹⁴ developed a hybrid approach that combines smoothing methods with additional parameters that account for abrupt changes in age incidence by period and cohort. Chien et al.^15,16 developed a fully Bayesian smoothing approach using two-dimensional Bernstein polynomials, data-dependent prior distributions, and the Metropolis-Hastings reversible jump algorithm. Martinez-Hernandez and Genton¹⁷ developed nonparametric methods for the situation where the data can be viewed as a functional trend in one temporal dimension (e.g. age) that is modulated over the other dimension (e.g. time).

In this article, we develop a novel and complementary kernel-based approach using truncated bivariate kernel functions^18–20 and information theory.²¹ We refer to these algorithms as “filters” and the outputs as “filtrations” because they do more than simply “smooth” the data: by design, they pass through all signals that materially reduce the within-Lexis mean squared error. Our approach has several attractive features. It is fast, and automatic, provides the smoothest possible kernel-based estimates consistent with data, yields a variance–covariance matrix that takes model selection into account, and has excellent performance.

In Section 2, we review notation and background on Lexis diagrams and kernel functions and assemble a representative panel of incident cancers. In Section 3, we describe the new filters and corresponding variance calculations. In Section 4, we illustrate our proposed methods using invasive estrogen receptor (ER)-negative breast cancer incidence among non-Hispanic white women in the United States, and in Section 6, we simulate our method's operating characteristics over the cancer incidence panel. In Section 7, we give a summary and discuss avenues for future research. We provide technical details in an online supplement. Our R code is freely available upon request.

2. Background

2.1 Event rates on a Lexis diagram

We begin with a brief overview of the Lexis diagram and introduce our notation.²² A Lexis diagram is a rectangular field with attained age a along the y-axis and calendar time p along the x-axis. For any given population and outcome, individual event times and corresponding person-years at risk are summed beginning at age $a_{0}$ at calendar time $p_{0}$ within a grid of square cells defined by A age intervals and P period intervals of width $Δ$ . We can describe the structure of Lexis diagrams using the notation $L (a_{0}, p_{0}, Δ, A, P)$ . For each cell, the event rate is $y_{a p} = E_{a p} / O_{a p}, a = 1, \dots, A, p = 1, \dots, P$ , where $E_{a p}$ is the accumulated number of events and $O_{a p}$ is the corresponding person-years. Each cell can also be referenced by its central value

(a_{*} (a), p_{*} (p)) = ((a_{0} - \frac{Δ}{2}) + Δ a, (p_{0} - \frac{Δ}{2}) + Δ p)

We will denote a rate matrix ascertained over a Lexis diagram as

Y_{A \times P}

, and the vectorized version (stacked column-on-column) as

y = vec (Y)

. Our working model posits that the cells contain realizations of independent quasi-Poisson random variables with over-dispersion parameter

ϕ^{2}

. Given an estimate

{\tilde{ϕ}}^{2}

ϕ^{2}

(Section 3.1), the estimated variance–covariance matrix of

y

\hat{Var} (y) = V_{y} = {\tilde{ϕ}}^{2} diag (vec ({[\frac{E_{a p}}{O_{a p}^{2}}]}_{A \times P})) \equiv {\tilde{ϕ}}^{2} Σ_{y}

2.2 Cancer incidence panel

We extracted authoritative cancer incidence data for the United States from the Surveillance, Epidemiology, and End Results Program's Thirteen Registries Database (SEER-13²³) for 50 single-years of age (ages 35–84) and 27 calendar years (1992–2018). We selected a panel of representative scenarios defined by cancer site (14 cancers associated with obesity²⁴), sex, and standard SEER race/ethnicity categories (Table 1 and Supplemental Part A). We redistributed breast cancer cases with missing or unknown ER status to the corresponding age- and year-specific ER positive and negative cells using a validated approach.^25,26 The panel includes a total of 1,049,633 incident cases.

2.3 Kernel functions

See Supplemental Part B for details. The standard form of a univariate kernel function²⁰ is represented by a canonical “shape” function $K^{(s)} (μ)$ that is non-negative and symmetric within the interval $[- 1, + 1]$ with $\int_{- \infty}^{\infty} K^{(s)} (μ) d u = \int_{- 1}^{1} K^{(s)} (μ) d u = 1$ . The centered bandwidth-specific kernel is

K_{λ}^{(s)} (u | u_{0}) = {\begin{array}{ll} \frac{1}{\int_{- λ}^{λ} K^{(s)} (\frac{v}{λ}) d v} K^{(s)} (\frac{u - u_{0}}{λ}) & u \in [u_{0} - λ, u_{0} + λ] \\ 0 & otherwise \end{array}

where

u_{0}

is the center point and

λ > 0

is the bandwidth. We will use four standard kernels, the boxcar (box), triangle (trg), Epanechnikov (Epan), and triweight (triwt).

We can extend a univariate kernel for use in two isotropic spatial dimensions by expressing its formula as a function of the Euclidean distance between any given point $u_{1} = (x_{1}, y_{1}) \in R^{2}$ from a target point $u_{0} = (x_{0}, y_{0}), i.e., ‖ u_{1} - u_{0} ‖ = \sqrt{{(x_{1} - x_{0})}^{2} + {(y_{1} - y_{0})}^{2}}$ . Bivariate kernels have support over the region $⊙_{u_{0}}^{λ} = {u \in R^{2} : ‖ u - u_{0} ‖ \leq λ}$ , that is, a circle of radius $λ$ centered at $u_{0}$ . Hence

K_{λ}^{(s)} (u | u_{0}) = {\begin{array}{ll} \frac{1}{{\int \int}_{⊙_{u_{0}}^{λ}} K^{(s)} (\frac{‖ (x, y) - u_{0} ‖}{λ}) d x d y} K^{(s)} (\frac{‖ u - u_{0} ‖}{λ}) & u \in ⊙_{u_{0}}^{λ} \\ 0 & otherwise \end{array}

3. Kernel function analysis

3.1 Kernel filtration

Our kernel function analysis is motivated by the classic domain filtering approach of image processing²⁷ which itself is motivated by classic kernel methods in statistics.^20,28,29 For images, bivariate kernels are used to construct mean or low-pass “blurring” filters for purposes of “de-noising” the observed pixel values and/or reducing the impact of high-frequency signals.^30,31 We apply bivariate kernels to Lexis diagrams for the same purposes. A standard kernel function analysis²⁹ is described in Algorithm 1.

Algorithm 1: kernel function analysis of rates on a Lexis diagram:

Choose a kernel shape box, trg, Epan, or triwt. Choose a bandwidth $λ_{a p}$ for each cell.

Repeat steps 3–6 for each cell.

Superimpose a masking circle of radius $λ_{a p}$ over the cell's central value $(a_{*} (a), p_{*} (p))$ .

Calculate kernel weights for all cells whose mid-points are covered by the masking circle.

Divide each weight in step 4 by their sum (normalization).

Use the normalized weights to calculate a weighted average of the corresponding $y_{a p}$ values within the masking circle.

We will refer to the vector of weighted averages as $y_{F}$ ( $F$ for filter, see below), and the corresponding matrix as ${(Y_{A \times P})}_{F} \equiv Y_{F}$ . These calculations can be described by a linear operator or filter that can be represented by a sparse matrix $K_{F}$ such that $y_{F} = K_{F} y$ . Given an estimate of the over-dispersion parameter ${\tilde{ϕ}}^{2}$ , it follows that $\hat{Var} (y_{F}) = {\hat{V}}_{F}^{y} = {\tilde{ϕ}}^{2} K_{F} Σ_{y} {K_{F}}^{'}$ .

Table 1.
Cancer panel.^a

Population and tumor Observed cell counts Simulation parameters

No. Sex^b Race/ethnicity^c Site^a Age groups Min (No. 0) Mean Max Sum Δ ϕ²

1 F NHW ER− breast 35–84 12.9 62.2 143 84,013 1 1

2 F NHW ER + breast 35–84 33.7 292 649 394,190 1 1.5

3 F NHB ER− breast 35–84 0 (6) 15.7 48 21,234 1 1

4 F NHB ER + breast 35–84 3 32.9 85 44,476 1 1

5 F NHW Ovary 35–84 2 34.2 73 46,112 1 1.5

6 F HIS Ovary 35–84 0 (30) 5.5 17 7443 1 1.5

7 F NHW Corpus uteri 35–84 4 72.7 198 98,112 1 2

8 F HIS Rectum 35–84 0 (155) 3.1 16 4147 2 1

9 F NHW Thyroid 35–84 1 31.1 94 42,044 1 2

10 F All Gallbladder 45–84 0 (6) 12 37 14,268 1 1

11 M HIS Colon 40–84 0 (9) 11.2 35 13,604 1 1

12 M NHW Colon 35–84 0 (2) 75.6 212 102,017 1 1

13 M NHW Esophagus 45–84 0 (4) 15.1 44 16,267 2 2

14 M All Kidney 35–84 2 62.6 184 84,526 1 1

15 M NHB Kidney 35–84 0 (53) 6.9 30 9289 1 1

16 M API Liver 40–84 0 (13) 9 30 10,930 1 1

17 M NHB Myeloma 45–84 0 (36) 4.6 17 5020 2 2

18 M API Pancreas 45–84 0 (61) 4.6 20 4963 2 1

19 M NHW Rectum 35–84 0 (8) 25.8 66 34,790 1 2

20 M NHW Stomach 45–84 0 (3) 11.3 32 12,188 2 2

Population and tumor	Observed cell counts	Simulation parameters
1	F	NHW	ER− breast	35–84	12.9	62.2	143	84,013	1	1
2	F	NHW	ER + breast	35–84	33.7	292	649	394,190	1	1.5
3	F	NHB	ER− breast	35–84	0 (6)	15.7	48	21,234	1	1
4	F	NHB	ER + breast	35–84	3	32.9	85	44,476	1	1
5	F	NHW	Ovary	35–84	2	34.2	73	46,112	1	1.5
6	F	HIS	Ovary	35–84	0 (30)	5.5	17	7443	1	1.5
7	F	NHW	Corpus uteri	35–84	4	72.7	198	98,112	1	2
8	F	HIS	Rectum	35–84	0 (155)	3.1	16	4147	2	1
9	F	NHW	Thyroid	35–84	1	31.1	94	42,044	1	2
10	F	All	Gallbladder	45–84	0 (6)	12	37	14,268	1	1
11	M	HIS	Colon	40–84	0 (9)	11.2	35	13,604	1	1
12	M	NHW	Colon	35–84	0 (2)	75.6	212	102,017	1	1
13	M	NHW	Esophagus	45–84	0 (4)	15.1	44	16,267	2	2
14	M	All	Kidney	35–84	2	62.6	184	84,526	1	1
15	M	NHB	Kidney	35–84	0 (53)	6.9	30	9289	1	1
16	M	API	Liver	40–84	0 (13)	9	30	10,930	1	1
17	M	NHB	Myeloma	45–84	0 (36)	4.6	17	5020	2	2
18	M	API	Pancreas	45–84	0 (61)	4.6	20	4963	2	1
19	M	NHW	Rectum	35–84	0 (8)	25.8	66	34,790	1	2
20	M	NHW	Stomach	45–84	0 (3)	11.3	32	12,188	2	2

See Supplemental Part A for details.

Female (F), male (M).

Non-Hispanic white (NHW), non-Hispanic black (NHB), Hispanic (HIS), Asian and Pacific Islander (API), all races combined (all).

The evaluation grid is discrete, while the possible bandwidths $λ$ are continuous. We will select values of $λ$ based on discrete odd integer point values $k = 3, 5, 7, 9, \dots$ with corresponding bandwidths $λ = \frac{k + 1}{2} = 2, 3, 4, 5, \dots$ etc. Our approach allows us to apply a different bandwidth depending on cell coordinates. We will use kernels defined by the size of the target Lexis, the kernel shape, and a chosen point-value $k_{2}$ for the outer two annuli and $k_{1}$ for all other cells. Hence, we can extend the generic notation $K_{F}$ using the specific notation: $K_{k_{2} - k_{1} - s h a p e}$ , or $K_{k_{2} - k_{1} - s h a p e}^{A \times P}$ when we want to include information about the size of the target Lexis. In what follows we consider the simplest possible kernel, $K_{3 - 3 - b o x}^{A \times P}$ , as a benchmark.

Surprisingly, the bivariate kernels used in Algorithm 1 are invertible or nearly so. It follows that little or no information is discarded by these kernels. Their application allows “smooth” signals to pass through almost unchanged, while at the same time, complex variable signals are downweighted. Although the kernel operators are sparse, their inverses are dense.

We these results in hand we can present our “rule-of-thumb” over-dispersion estimator

{\tilde{ϕ}}^{2} \equiv \frac{{([I_{A P} - 2 K_{3 - 3 - b o x}^{A \times P} + {(K_{3 - 3 - b o x}^{A \times P})}^{2}] y)}^{'} Σ_{y}^{- 1} ([I_{A P} - 2 K_{3 - 3 - b o x}^{A \times P} + {(K_{3 - 3 - b o x}^{A \times P})}^{2}] y)}{A P - 1}

See Supplemental Part C for details.

3.2 Adaptive kernel filtration

We hypothesized that we could regularize the output of a filter $K_{F}$ by applying additional smoothing. Intuitively, the kernel smoothers blunt local peaks and fill in local valleys in the rate matrix. The extent to which this happens is greater or lesser if the peaks or valleys cover a smaller or larger domain relative to the kernel's bandwidth. This behavior of $K_{F}$ is entirely characterized by its singular values decomposition (SVD). As illustrated in Supplemental Part D, the singular vectors of $K_{F}$ represent canonical patterns of peaks and valleys that increase in complexity as the magnitude of the corresponding singular values decreases in magnitude. This led us to consider a truncated singular values decomposition approach.³² A key question is how many singular vectors to discard. We posited that information theory would provide a good answer.²¹ By combining these ideas, we developed the adaptive filter described below.

Algorithm 2: Adaptive kernel filtration of rates on a Lexis diagram:

Choose a kernel $K_{F}$

Construct the SVD of $K_{F} = U L V^{'} = U diag (σ_{1}, σ_{2}, \dots, σ_{j}, \dots, σ_{A P}) V^{'}$

Center the data using the inverse-variance weighted average,

y_{0} = y - \bar{y} 1_{A P}, \bar{y} = y^{'} Σ_{y}^{- 1} y / 1_{A P}^{'} Σ_{y}^{- 1} 1_{A P}

. Centering the data stabilizes calculations in the subsequent steps. Calculate the rule-of-thumb over-dispersion estimator

{\tilde{ϕ}}^{2}

3.
Consider the sequence of truncated kernels $K_{F : s} = U diag (σ_{1}, σ_{2}, \dots, σ_{s}, 0, 0, \dots, 0) V^{'}$
4.
For each value of $s$ :
Consider s to be the effective degrees of freedom or $e d f$

Calculate the residual vector $r_{0} (s) = y_{0} - K_{F : s} y_{0}$

Calculate the bias-corrected Akaike information criterion statistic
$A I C_{c}^{F} (s) = {\tilde{ϕ}}^{2} r_{0} {(s)}^{'} Σ_{y}^{- 1} r_{0} (s) + 2 s + \frac{2 s^{2} + 2 s}{A P - s - 1}, s = 1, 2, \dots, A P - 2$
5.
The best-fit model uses $K_{F : a d p}$ , where $a d p = argmi n_{s} A I C_{c}^{F} (s)$ .
6.
The fitted values are $y_{F : a d p} = K_{F : a d p} y$ and the estimated variance–covariance matrix conditional on the selected model is $\hat{Var} (y_{F : a d p}) = {\hat{V}}_{F : a d p}^{y} = {\tilde{ϕ}}^{2} K_{F : a d p} Σ_{y}^{- 1} {K_{F : a d p}}^{'}$ .

Use of the bias-corrected penalty term is essential to avoid over-fitting,²¹ especially for “small” Lexis diagrams with a relatively small total number of cells. The bias-corrected AIC serves as a compromise between the classic uncorrected AIC, which overfits when the number of unknown terms is large relative to the amount of data, and the classic BIC, which is known to be overly conservative producing poorly fitting results.²¹
3.3 Model averaging

We now have three versions of the data: the observed or “raw” data $y_{0}$ , the filtered data $y_{F}$ based on Algorithm 1, which we will also refer to as $y_{F : a l l - i n}$ because all of the kernel's singular vectors contribute to the filtration, and the adaptive filtration $y_{F : a d p}$ based on Algorithm 2. We obtain our penultimate version by applying model averaging as reviewed by Burnham and Anderson.²¹

Algorithm 3: Model averaging:

Select a panel of $G \geq 1$ kernels $K_{F_{1}}, K_{F_{2}}, \dots, K_{F_{G}}$ .

Apply Algorithm 2 to identify optimal truncation values $a d p_{1} for\; K_{F_{1},} a d p_{2} for K_{F_{2}}, \dots, a d p_{G} \; for K_{F_{G}}$ .

Identify the optimal kernel $g_{0} = argmi n_{g \in {1, 2, \dots, G}} A I C_{c}^{F_{g}} (a d p_{g})$ . Discard all kernels with $Δ^{F_{g}} \equiv A I C_{c}^{F_{g}} (a d p_{g}) - A I C_{c}^{F_{g_{0}}} (a d p_{g_{0}}) > t o l$ . By default, use $t o l = 7$ . Values of $t o l$ between 7 and 10 have a theoretical justification.²¹ Label the H remaining kernels $K_{F_{1}}, K_{F_{2}}, \dots, K_{F_{H}}$ , where $1 \leq H \leq G$ .

For each kernel identified in step 3, identify all cut-off values c whose $A I C_{c}^{F_{h}} (c)$ values are within $t o l$ of the global minimum attained by model $g_{0}$ , that is, the sets $C_{h} = {c \in 1, \dots, A P : Δ^{F_{h}} (c) = A I C_{c}^{F_{h}} (c) - A I C_{c}^{F_{g_{0}}} (a d p_{g_{0}}) \leq t o l}, h = 1, \dots, H$ . Let $n_{h} = | C_{h} |$ and $M = \sum_{h = 1}^{H} n_{h}$ . Denote the complete set of “acceptable” models, which includes $n_{1}$ members based on $K_{F_{1}}$ , $K_{F_{1} : C_{1} {1}}, \dots, K_{F_{1} : C_{1} {n_{1}}}, n_{2}$ members based on $K_{F_{2}}$ , $K_{F_{2} : C_{2} {1}}, \dots, K_{F_{2} : C_{2} {n_{2}}}$ , etc., as $K_{F_{m}}, m = 1, \dots, M$ . Assemble the corresponding $A I C_{c}$ differentials $Δ^{F_{m}}$ and filtrations $y_{F_{m}} = K_{F_{m}} y$ .

Calculate Akaike weights

w_{m} = \frac{exp (- \frac{Δ^{F_{m}}}{2})}{\sum_{k = 1}^{M} exp (- \frac{Δ^{F_{k}}}{2})}, m = 1, \dots, M

Calculate the truncated model average filtration

y_{t m a} = \sum_{m = 1}^{M} w_{m} y_{F_{m}}

Calculate the unconditional (model-averaged) variance-covariance matrix

\hat{Var} (y_{t m a}) = {\hat{V}}_{t m a}^{y} = {\tilde{ϕ}}^{2} \sum_{m = 1}^{M} w_{m} (K_{F_{m}} Σ_{y} {K_{F_{m}}}^{'}) + \sum_{m = 1}^{M} w_{m} (y_{F_{m}} - y_{t m a}) {(y_{F_{m}} - y_{t m a})}^{'}

There is no penalty other than computation time for using a comprehensive panel of kernels. We used 48 kernels in our case study and simulations (Supplemental Part B).

3.4. Variance calculations

Each algorithm produces a variance-covariance matrix that allows us to set confidence limits for any given cell, or for any set of linear combinations of cells. Let $F$ denote a filtration; formally, use $F = 0$ to denote the observed data. Then $y_{F}$ is the filtered data, and ${\hat{V}}_{F}^{y}$ is the corresponding variance-covariance matrix ( ${\hat{V}}_{F}^{y} = {\hat{V}}_{F}^{y}$ for Algorithm 1, ${\hat{V}}_{F}^{y} = {\hat{V}}_{F : a d p}^{y}$ for Algorithm 2, and ${\hat{V}}_{F}^{y} = {\hat{V}}_{t m a}^{y}$ for Algorithm 3). Matrices ${\hat{V}}_{F}^{y}$ and ${\hat{V}}_{F : a d p}^{y}$ are conditional variances, that is, conditional on the selected kernel, but ${\hat{V}}_{t m a}^{y}$ is unconditional, that is, it incorporates uncertainty arising from the model selection process.

Given C linear combinations of $y_{F}$ encoded in contrast matrix $O_{C \times A P}$ , the vector of outputs is $z_{F} = O y_{F}$ and ${\hat{V a r}}_{z_{F}} = {\hat{V}}_{F}^{z} = O {\hat{V}}_{F}^{y} O^{'}$ .

To analyze log-transformed rates $ν_{F} = log (y_{F})$ , the delta-method variance-covariance matrix is $\hat{Var} (ν_{F}) = {\hat{V}}_{F}^{ν} = {\hat{V}}_{F}^{y} \circ {(y_{F} {y_{F}}^{'})}^{\circ^{- 1}}$ , where ^∘ denotes the Hadamard (elementwise) product and ${(.)}^{\circ^{p}}$ denotes the elementwise $p$ -th power. Application of a contrast matrix $O^{*}$ to the log rates $ν_{F}$ yields $ζ_{F} = O^{*} ν_{F}$ and ${\hat{V}}_{F}^{ζ} = O^{*} {\hat{V}}_{F}^{ν} {O^{*}}^{'}$ .

4. Results

4.1 Case study: Invasive ER− breast cancer incidence among non-Hispanic white women

Invasive female breast cancer is the most common malignancy among women, and its epidemiology varies by tumor subtype.³³ Two major subtypes are ER-positive (ER+) and ER-negative (ER−) tumors.³³ For reasons that remain unclear, the incidence of ER− breast cancer has been decreasing over time in many populations around the world, and the rate of decrease over time varies by age.^5,34–37 In this case study, we apply our new methods to SEER data on invasive ER− breast cancer incidence among non-Hispanic white women (Table 1, No. 1).

Figure 1(A) presents a heat map of the observed rates per 100,000 women years. There is considerable variability from cell to cell, but the data are not over-dispersed ( ${\tilde{ϕ}}^{2} = {1.06}^{2}$ ). Figure 1(B) presents a canonical plot of log incidence over time within 5-year age groups, and Figure 1(C) plots the cross-sectional log incidence by age within 5-year calendar periods. Incidence appears to be decreasing over time within each age group. However, overlapping values and error bands make it hard to distinguish between many of the curves, and it is difficult to discern in a quantitative sense how fast the rates are changing within and between age groups and calendar periods.

Figure 1.

Estrogen receptor-negative breast cancer incidence among non-Hispanic white women. Raw data (panels A–C), benchmark kernel (panels D – F), and truncated model average (panels G – I). Left panels: Lexis diagram heat maps. Center panels: Rates over time within 5-year age groups. Right panels: Rates by age within 5-year calendar periods. Shaded envelopes show 95% point-wise confidence limits.

Figure 1(D) to (F) repeats these analyses using outputs from the benchmark all-in kernel. Panel 1D presents a heat map based on $y_{3 - 3 - b o x : a l l - i n}$ , and panels 1E-F display log-scale curves obtained from $ν_{3 - 3 - b o x : a l l - i n}$ with confidence bands set using ${\hat{V}}_{3 - 3 - b o x : a l l - i n}^{ν}$ . The patterns are clearer, the curves more easily distinguished, and one can appreciate that age incidence is decreasing over time in every age group, especially among women 45–49 and 50–54 years old. Compared to panels 1B to C, the confidence limits are 55% narrower on average.

Figure 1(G) to (I) repeats these analyses using the filtered data $y_{t m a}$ and $ν_{t m a}$ , with log-scale confidence bands set using ${\hat{V}}_{t m a}^{ν}$ . Algorithm 3 identified $H = 6$ kernels and a total of $M = 31$ acceptable truncation values. The most heavily weighted kernels were $K_{3 - 3 - t r i w t : 51}$ and $K_{5 - 7 - t r i w t : 50}$ (Akaike weights of 0.20 and 0.19, respectively). The patterns are clearest, the curves readily distinguished, and the confidence bands are 71% narrower. The TMA kernel is illustrated in Supplemental Figure D4.

4.2 Case study: Averages, gradients, and trends

Graphs provide insight, but no descriptive study is complete until salient features that may be apparent in graphs are quantified using objective and reproducible statistics. There is latitude regarding the precise feature set^8,10; widely used features obtain from averages, gradients, and trends. We consider five features that together quantify the marginal effects of period and age as well as interactions between period and age.

The marginal period curve $m p c_{F}^{ν}$ and its gradient $\partial m p c_{F}^{ν}$ . The marginal period curve $m p c_{F}^{ν}$ is the log rate averaged over the age group, one value for each calendar period. The gradient $\partial m p c_{F}^{ν}$ is the first difference of $m p c_{F}^{ν}$ divided by the bin width $Δ$ .

The marginal age curve $m a c_{F}^{ν}$ and its gradient $\partial m a c_{F}^{ν}$ . The marginal age curve $m a c_{F}^{ν}$ is the log rate averaged over period, with one value for each age group. The gradient $\partial m a c_{F}^{ν}$ is the first difference of $m a c_{F}^{ν}$ divided by the bin width $Δ$ .

The slope of the age-specific log rates over time, $ℓ_{F}^{π | α}$ , one slope for each age group.

Each feature is a linear function of the log rates; therefore, each can be extracted using a corresponding contrast matrix

O^{*}

whose values depend only on the structure of the corresponding Lexis diagram

L (a_{0}, p_{0}, Δ, A, P)

. See Supplemental Part E for details.

Figure 2(A) to (C) presents these features calculated from the observed data $ν_{0}$ , Figure 2(D) to (F) from $ν_{3 - 3 - b o x : a l l - i n}$ , and Figure 2(G) to (I) from $ν_{t m a}$ . Looking down the columns, the estimates become increasingly regular and the confidence limits narrower. The marginal period curve based on model averaging ( $m p c_{t m a}^{ν}$ , Figure 2(G), left axis) decreases over time, by around 2% per year through 2004 ( $\partial m p c_{t m a}^{ν}$ , Figure 2(G), right axis), and by around 3% per year circa 2005–2013. Subsequently, the gradient approaches 0 as the curve approaches a plateau.

Figure 2.

Breast cancer averages, gradients, and trends. Features extracted from the data are shown in Figure 1. Raw data (panels A–C), benchmark kernel (panels D – F), and truncated model average (panels G – I). Left panels: Marginal period curve (left axis) and gradient (right axis). Center panels: Marginal age curve (left axis) and gradient (right axis). Right panels: Age-specific period trends. Gradient estimates in the left and center panels are trimmed to exclude the first and last time points. Shaded envelopes show pointwise 95% confidence limits.

The marginal age curve ( $m a c_{t m a}^{ν}$ , Figure 2(H), left axis) increases continuously until circa age 72 at a decreasing rate ( $\partial m a c_{t m a}^{ν}$ , Figure 2(H), right axis). Subsequently, the curve attains a plateau and then slowly decreases.

The incidence decreases over time in every age group at a rate that varies by age ( $ℓ_{t m a}^{π | α}$ , Figure 2(I)). The greatest decreases of around 3.25% per year occurred circa ages 45–54, consistent with the curves shown in Figure 1(H).

4.3 Simulation studies

We assessed the operating characteristics of our algorithms by simulating the 20 cancers in Table 1. See Supplemental Part F for details. In five scenarios, we blocked the data into $2 \times 2$ cells, and in nine, we incorporated over-dispersion ( $ϕ^{2} > 1$ ). We considered three approaches: raw data; benchmark all-in kernel (Algorithm 1 with $K_{3 - 3 - b o x}^{A \times P}$ ); and truncated model average (TMA; Algorithm 3 with 48 candidates). We simulated 500 Lexis diagrams for each scenario. We tracked the performance of benchmark and TMA for heat maps (Figure 1(D) and (G)) and features (Figure 2(D) to (I)). To summarize performance, we calculated the percentage reduction in average root mean squared error ( $R M S E$ ) for benchmark versus raw and for TMA versus raw. We also examined bias and variance per cell (heat maps) and per abscissa value (features).

In every scenario, TMA produced more accurate heat maps than the benchmark kernel (Figure 3(A)). Compared to analyzing the raw data, the benchmark kernel reduced the mean percentage error by 58% on average over the 20 scenarios, versus 74% for TMA.

Figure 3.

Arrow plots of simulation results. Rows correspond to 20 cancers summarized in Table 1. Panels correspond to features. Blue circles show a percent reduction for benchmark kernel versus raw, and yellow triangles show a percent reduction for truncated model average (TMA) versus raw.

Performance gains for $m p c$ were comparatively modest (Figure 3(B)). In Scenario 15 (M Kidney NHB), the benchmark kernel performed slightly better. TMA was uniformly superior to the benchmark for estimating the gradient $\partial m p c$ (Figure 3(C); 85% mean reduction versus 68%), and TMA achieved similar gains for $m a c$ and $\partial m a c$ (Figure 3(D) and (E), respectively). TMA was uniformly superior for estimating the slopes $ℓ^{π | α}$ (Figure 3(F); 61% vs. 46%).

How did TMA achieve such gains in performance? The rule-of-thumb over-dispersion estimator, estimated in step 2 and used in step 4 of Algorithm 2, appears adequate (Figure 4(A)). On average over the 20 scenarios, the median value of ${\tilde{ϕ}}^{2}$ (squares) exceeded the true value (triangles) by just 0.043 or 3.1%, and the ninety percent error bars for ${\tilde{ϕ}}^{2}$ varied by just $\pm 0.16$ ( $\pm 11.0 %$ ) around the corresponding median (squares).

Figure 4.

Operating characteristics of truncated model average. (A) Rows correspond to 20 cancers summarized in Table 1. Median (squares) and 90% limits (bars) of the estimated over-dispersion parameter ( ${\tilde{ϕ}}^{2}$ ), versus true values (triangles). (B) Box plots of effective degrees of freedom ( $e d f$ ). (C) Best-fit kernels, frequency of selection across 20 cancers.

In all scenarios, $e d f$ values were one or two orders of magnitude smaller than the maximum possible $e d f$ of $A P - 2$ (Figure 4(B)). Hence, truncated kernels were always superior to any fixed all-in kernel regardless of kernel shape. Small values of $k_{2}$ and $k_{1}$ were selected by TMA a large majority of the time (Figure 4(C)), and all four shapes contributed. Interestingly, amongst the four kernel shapes, the boxcar was selected the least (6.5%) and triweight the most (73%).

5. Discussion

We developed a novel non-parametric approach to regularize (“smooth”) event rates ascertained over Lexis diagrams. Our methods borrow smoothing concepts from time series analysis, that is, k-point moving averages, and classic multivariate kernel methods in statistics²⁹ and image processing, that is, filtering and singular values decomposition. Our approach uses statistical information theory, specifically, the bias-corrected AIC,²¹ to handle the model selection problem and provide a variance–covariance matrix that takes model selection into account.

Our truncated model averaging approach adds to the armamentarium and complements existing non-parametric^12,13,17 and Bayesian¹⁵ methods. The kernel-based methods developed by Duong and Hazelton²⁹ are closest in spirit to our approach. In our approach, we use bias-corrected AIC versus cross-validation, and we use truncated kernels versus full kernels, which, as we show, substantially increases accuracy.

Our approach to selecting effective degrees of freedom to quantify the complexity of the underlying Lexis diagram is similar to approaches used to select basis functions in generalized additive models.³⁸ Furthermore, Gaussian process-based smoothing can be viewed as special case of our new bivariate methods since the Matérn covariance used in previous work³⁹ encompasses both the Exponential and Gaussian kernels, both of which are similar to the triweight kernel. Interestingly, in our simulations, the triweight kernel was by far the most frequently selected kernel shape.

Examination of smoothed Lexis diagrams provides a good overview. Subsequently, scientific conclusions obtain from quantifications of averages, gradients, and trends. As illustrated by our case study, not only does the truncated model averaging approach provide appealingly smoothed heat maps (Figure 1), but corresponding linear combinations of the smoothed values are also substantially more precise (Figure 2).

As shown by our simulation studies, when the cells of the observed Lexis diagrams are statistically independent quasi-Poisson variates, Algorithm 3 (truncated model averaging) is superior to any fixed kernel. Indeed, compared to our benchmark kernel, the truncated model average reduced the intrinsic root mean squared error by 60%–86% depending on the feature.

We focused here on Lexis diagrams, but our methods and software can be applied when the data follow an approximate multivariate normal distribution with a full-rank covariance matrix known up to a scale parameter. Sparse cell counts are an issue in the Poisson case. We implemented our methods using the Normal approximation to the Poisson distribution, which worked well in all scenarios considered (Table 1). If single-year data are sparse (numerous zeros) one can increase the bin width. Future research might consider incorporating a standard or zero-inflated Poisson log-likelihood function. More advanced kernels could also be investigated, for example, steering kernels.²⁷ Indeed, our model averaging approach could be extended to include estimates obtained using complementary methods such as steering kernels or splines.

In cancer surveillance research few studies examine a single Lexis diagram in isolation. Rather, hypotheses are generated by examining related Lexis diagrams defined by sex, race/ethnicity, geographic region, tumor characteristics, etc. Invariably the amount of information per Lexis diminishes with increasing stratification. As demonstrated here, our new truncated model averaging approach can advance such descriptive studies.

In future work, truncated model averaging might be extended to incorporate as covariates the effects of sex, race/ethnicity, etc. For example, one could start with a joint parametric fit, for example, a proportional hazards age-period-cohort model,⁴⁰ then apply truncated model averaging to the residuals to characterize any lack-of-fit.

Supplemental Material

sj-docx-1-smm-10.1177_09622802231192950 - Supplemental material for Smoothing Lexis diagrams using kernel functions: A contemporary approach

Supplemental material, sj-docx-1-smm-10.1177_09622802231192950 for Smoothing Lexis diagrams using kernel functions: A contemporary approach by Philip S Rosenberg, Adalberto Miranda Filho, Julia Elrod, Aryana Arsham, Ana F Best and Pavel Chernyavskiy in Statistical Methods in Medical Research

Footnotes

Acknowledgements

This research was funded by the Intramural Research Program of the National Cancer Institute, Division of Cancer Epidemiology and Genetics. AMF is also supported through an appointment to the National Cancer Institute (NCI) ORISE Research Participation Program under DOE contract number DE-SC0014664.

Data accessibility

Our freely available R code and example data are available upon request.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the ORISE Research Participation Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute (grant number DOE contract DE-SC0014664, Intramural Research Program).

ORCID iD

Philip S Rosenberg

Supplemental material

Supplemental material for this article is available online.

References

Keiding

. Statistical-inference in the Lexis diagram. Philos T Roy Soc A 1990; 332: 487–509.

Carstensen

. Age-period-cohort models for the Lexis diagram. Stat Med 2007; 26: 3018–3045.

Breslow

Day

. Statistical Methods in Cancer Research, Volume 2, The Design and Analysis of Cohort Studies. Oxford: International Agency for Research on Cancer, 1987.

Froelicher

Forjaz

Rosenberg

, et al. Geographic disparities of breast cancer incidence in Portugal at the district level: a spatial age-period-cohort analysis, 1998-2011. Cancer Epidemiol 2021; 74: 102009.

Lynn

BCD

Chernyavskiy

Gierach

, et al. Decreasing incidence of estrogen receptor-negative breast cancer in the United States: trends by race and region. J Natl Cancer Inst 2022; 114: 263–270. DOI: 10.1093/jnci/djab186

Chernyavskiy

Little

Rosenberg

. Spatially varying age-period-cohort analysis with application to US mortality, 2002–2016. Biostatistics 2020; 21: 845–859.

National Cancer Institute SEER*Stat software, Version 8.4.0. Surveillance Research Program, 2022.

Robertson

Boyle

. Age-period-cohort models of chronic disease rates. II: graphical approaches. Stat Med 1998; 17: 1325–1340.

Devesa

Donaldson

Fears

. Graphical presentation of trends in rates. Am J Epidemiol 1995; 141: 300–304.

10.

Fay

Tiwari

Feuer

, et al. Estimating average annual percent change for disease rates without assuming constant change. Biometrics 2006; 62: 847–854.

11.

Currie

Durban

Eilers

PHC

. Smoothing and forecasting mortality rates. Stat Model 2004; 4: 279–298.

12.

Camarda

. Mortalitysmooth: an R package for smoothing Poisson counts with P-splines. J Stat Softw 2012; 50: 1–24.

13.

Camarda

. Smooth constrained mortality forecasting. Demogr Res 2019; 41: 1091–1130.

14.

Dokumentov

Hyndman

Tickle

. Bivariate smoothing of mortality surfaces with cohort and period ridges. Stat-Us 2018; 7: e199.

15.

Chien

Hsiung

, et al. Smoothed Lexis diagrams with applications to lung and breast cancer trends in Taiwan. J Am Stat Assoc 2015; 110: 1000–1012.

16.

Chien

Tseng

Chen

, et al. Comparison of annual percentage change in breast cancer incidence rate between Taiwan and the United States: a smoothed Lexis diagram approach. Cancer Med-Us 2017; 6: 1762–1775.

17.

Martinez-Hernandez

Genton

. Nonparametric trend estimation in functional time series with application to annual mortality rates. Biometrics 2021; 77: 866–878.

18.

Hastie

Tibshirani

Friedman

. The elements of statistical learning: data mining, inference, and prediction. Stanford, CA: Standford University, 2002.

19.

Cleveland

. Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 1979; 74: 829–836.

20.

Ramlauhansen

. Smoothing counting process intensities by means of kernel functions. Ann Stat 1983; 11: 453–466.

21.

Burnham

Anderson

. Multimodel inference—understanding AIC and BIC in model selection. Sociol Method Res 2004; 33: 261–304.

22.

Rosenberg

. A new age-period-cohort model for cancer surveillance research. Stat Methods Med Res 2019; 28: 3363–3391.

23.

Institute

. Surveillance, Epidemiology, and End Results (SEER 13, Plus) Program Populations (1992-2018). (www.seer.cancer.gov/popdata), National Cancer Institute, DCCPS, Surveillance Research Program, released February 2022. February 2022S ed.: National Cancer Institute, 2022.

24.

Sung

Siegel

Rosenberg

, et al. Emerging cancer trends among young adults in the USA: analysis of a population-based cancer registry. Lancet Public Health 2019; 4: E137–E147.

25.

Anderson

Katki

Rosenberg

. Incidence of breast cancer in the United States: current and future trends. J Natl Cancer Inst 2011; 103: 1397–1402.

26.

Howlader

Noone

, et al. Use of imputed population-based cancer registry data as a method of accounting for missing information: application to estrogen receptor status for breast cancer. Am J Epidemiol 2012; 176: 347–356.

27.

Milanfar

. A tour of modern image filtering. IEEE Signal Proc Mag 2013; 30: 106–128.

28.

Wand

Jones

. Comparison of smoothing parameterizations in bivariate kernel density-estimation. J Am Stat Assoc 1993; 88: 520–528.

29.

Duong

Hazelton

. Cross-validation bandwidth matrices for multivariate kernel density estimation. Scand J Stat 2005; 32: 485–506.

30.

Takeda

Farsiu

Milanfar

. Kernel regression for image processing and reconstruction. IEEE Trans Image Process 2007; 16: 349–366.

31.

Takeda

Farsiu

Milanfar

. Regularized kernel regression for image deblurring. Conf Rec Asilomar C 2006: 1914–191+. DOI: 10.1109/Acssc.2006.355096

32.

Hansen

. The truncated SVD as a method for regularization. BIT 1987; 27: 534–553.

33.

Anderson

Rosenberg

Prat

, et al.

How many etiological subtypes of breast cancer: two, three, four, or more?

J Natl Cancer Inst 2014; 106: dju165. DOI: 10.1093/jnci/dju165

34.

Tuan

Lynn

BCD

Chernyavskiy

, et al. Breast cancer incidence trends by estrogen receptor status among Asian American ethnic groups, 1990–2014. Jnci Cancer Spect 2020; 4: pkaa005. DOI: 005.10.1093/jncics/pkaa005

35.

Mullooly

Murphy

Gierach

, et al. Divergent oestrogen receptor-specific breast cancer trends in Ireland (2004-2013): amassing data from independent Western populations provide etiologic clues. Eur J Cancer 2017; 86: 326–333.

36.

Rosenberg

Barker

Anderson

. Estrogen receptor status and the future burden of invasive and in situ breast cancers in the United States. J Natl Cancer Inst 2015; 107: djv159. DOI: 10.1093/jnci/djv159

37.

Anderson

Rosenberg

Petito

, et al. Divergent estrogen receptor-positive and -negative breast cancer trends and etiologic heterogeneity in Denmark. Int J Cancer 2013; 133: 2201–2206.

38.

Wood

. Generalized additive models: an introduction with R. Boca Raton, FL: CRC Press, 2017.

39.

Chernyavskiy

Little

Rosenberg

. Correlated Poisson models for age-period-cohort analysis. Stat Med 2018; 37: 405–424.

40.

Rosenberg

Anderson

. Proportional hazards models and age-period-cohort analysis of cancer rates. Stat Med 2010; 29: 1228–1238.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

0.00 MB

1.36 MB

Population and tumor					Observed cell counts				Simulation parameters
No.	Sex^b	Race/ethnicity^c	Site^a	Age groups	Min (No. 0)	Mean	Max	Sum	Δ	ϕ²
1	F	NHW	ER− breast	35–84	12.9	62.2	143	84,013	1	1
2	F	NHW	ER + breast	35–84	33.7	292	649	394,190	1	1.5
3	F	NHB	ER− breast	35–84	0 (6)	15.7	48	21,234	1	1
4	F	NHB	ER + breast	35–84	3	32.9	85	44,476	1	1
5	F	NHW	Ovary	35–84	2	34.2	73	46,112	1	1.5
6	F	HIS	Ovary	35–84	0 (30)	5.5	17	7443	1	1.5
7	F	NHW	Corpus uteri	35–84	4	72.7	198	98,112	1	2
8	F	HIS	Rectum	35–84	0 (155)	3.1	16	4147	2	1
9	F	NHW	Thyroid	35–84	1	31.1	94	42,044	1	2
10	F	All	Gallbladder	45–84	0 (6)	12	37	14,268	1	1
11	M	HIS	Colon	40–84	0 (9)	11.2	35	13,604	1	1
12	M	NHW	Colon	35–84	0 (2)	75.6	212	102,017	1	1
13	M	NHW	Esophagus	45–84	0 (4)	15.1	44	16,267	2	2
14	M	All	Kidney	35–84	2	62.6	184	84,526	1	1
15	M	NHB	Kidney	35–84	0 (53)	6.9	30	9289	1	1
16	M	API	Liver	40–84	0 (13)	9	30	10,930	1	1
17	M	NHB	Myeloma	45–84	0 (36)	4.6	17	5020	2	2
18	M	API	Pancreas	45–84	0 (61)	4.6	20	4963	2	1
19	M	NHW	Rectum	35–84	0 (8)	25.8	66	34,790	1	2
20	M	NHW	Stomach	45–84	0 (3)	11.3	32	12,188	2	2