Abstract
We propose a framework for jointly modelling tumour size at diagnosis and time to distant metastatic spread, from diagnosis, based on latent dynamic sub-models of growth of the primary tumour and of distant metastatic detection. The framework also includes a sub-model for screening sensitivity as a function of latent tumour size. Our approach connects post-diagnosis events to the natural history of cancer and, once refined, may prove useful for evaluating new interventions, such as personalised screening regimes. We evaluate our model-fitting procedure using Monte Carlo simulation, showing that the estimation algorithm can retrieve the correct model parameters, that key patterns in the data can be captured by the model even with misspecification of some structural assumptions, and that, still, with enough data it should be possible to detect strong misspecifications. Furthermore, we fit our model to observational data from an extension of a case-control study of post-menopausal breast cancer in Sweden, providing model-based estimates of the probability of being free from detected distant metastasis as a function of tumour size, mode of detection (of the primary tumour), and screening history. For women with screen-detected cancer and two previous negative screens, the probabilities of being free from detected distant metastases 5 years after detection and removal of the primary tumour are 0.97, 0.89 and 0.59 for tumours of diameter 5, 15 and 35 mm, respectively. We also study the probability of having latent/dormant metastases at detection of the primary tumour, estimating that 33% of patients in our study had such metastases.
Keywords
1 Introduction
Data collected through cancer screening trials or via population-based studies carried out in the presence of screening programmes enable the estimation of underlying cancer natural history processes. The natural history of breast cancer has traditionally been modelled using either discrete multi-state (semi-) Markov models or continuous tumour growth models, based on using data on tumour characteristics that are collected at diagnosis. Such models have been used to study, e.g., screening effectiveness and over-diagnosis. 1 The most basic multi-state Markov model, used in this context, is a three-states model (undetectable breast cancer, asymptomatic cancer detectable by screening and symptomatic breast cancer) and another example is the five-states model that sub-divides asymptomatic and symptomatic states by lymph nodes involvement. 2 Markov models can be easily extended, but model complexity grows quickly. These models have other limitations too, which have been discussed elsewhere. 2 Biologically-inspired (continuous) tumour growth models are being used increasingly often as an alternative to multi-state Markov models, due to their flexibility and parsimony.3–10These models combine a growth model of breast cancer with a continuous function of screening sensitivity (depending on e.g. tumour size or mammographic density) and accommodate random effects to account for individual variations in tumour growth. From fitting such models to observational data it has been shown that Markov models may assume too little individual variation in cancer progression. 6 These types of continuous growth models have so far been developed only to model data collected up until the time of diagnosis. In this paper, we explore an approach that models events occurring after diagnosis but incorporates the tumour’s natural history. Here, we focus specifically on distant metastatic spread: we describe a framework for modelling tumour characteristics at diagnosis and time to distant metastasis (counted from diagnosis), which is based on specifying a dynamic process of metastatic spread from the onset of the primary tumour. The framework we describe could potentially be extended to other events (such as tumour relapse or death) and can be useful, e.g., for studying personalised screening and treatment effects.
The (metastatic) process of cancer cells disseminating from the primary tumour to a distant organ is complex.11,12Arvelo et al. 13 divide the pathogenesis of metastasis into 10 sequential steps. In the early steps, tumour cells detach from the primary tumour and travel through the vascular system; experimental data show that this first stage is very efficient.11,12 For cancer cells to extravasate to a distant organ and to establish metastases they need to then evade multiple mechanisms which are designed to prevent the formation of metastases. Few cells (compared to the large number of cells that shed from the primary tumour) manage to successfully form a metastasis.14,11 From observational data, it is not possible to distinguish these two stages; in the current paper, we model successful metastases directly. Finally, in the third stage, newly formed metastases grow until they become clinically detectable.
Over the years, several publications have reported that around 90% of cancer deaths can be attributed to metastatic spread of the primary tumour15,16; for some cancers (e.g. brain, heart, liver, airways) local tumours can affect vital organs. In breast cancer, all death will, in essence, be attributable to distant metastases. Patients with breast cancer that initially present with distant metastases are diagnosed with Stage IV disease, which is nearly always incurable. It has also been estimated that among women without detectable metastases at the time of diagnosis (and surgery), and initially diagnosed with early-stage disease, around 30% will ultimately develop metastases months or even years later. 17 Some of the most common sites of distant metastatic involvement are lung, bone, liver and pleura. 18 Refined surgical techniques and primary local and systemic therapies are all aimed at preventing metastatic spread and growth. 19 However, metastatic cells become selected, experience high mutation rates and become resistant to treatments. The growth of these (therapy-resistant) metastases colonies is responsible for the vast majority of cancer-related deaths. Hosseini et al. 20 have estimated, based on molecular studies, that at least 80% of breast cancer metastases are derived from early disseminated cells (well before diagnosis of the primary tumour) and this had been offered as a reason why early detection and treatment often fail to prevent cancer deaths. A better understanding of the metastatic process is fundamental to understanding approaches aimed at reducing cancer mortality.
This paper is organised as follows. In Section 2, we first describe the modelling framework for tumour growth (upon which we build); we then extend it to incorporate the distant metastatic spread process. A model in the simple setting of no screening is described in Section 2.3, which is then extended to incorporate data collected in the presence of screening (Section 2.4). We show via Monte Carlo simulation that our estimation procedure retrieves the correct model parameters (Section 3). In Section 4, we illustrate our methodology using data from a Swedish population-based case-control study of postmenopausal breast cancer. In Section 5, we assess the robustness of the model (in the absence of screening) to violation of some of the structural assumptions. Finally, we conclude the manuscript with a discussion (Section 6).
2 Methods
The model developed in this paper is aimed at being used on breast cancer incident cases, collected in a population of women in which screening is offered. Before dealing with screening, we start by introducing the model in the context of no screening.
2.1 Modelling tumour growth
The growth of the primary tumour is assumed to follow an exponential function such that, for a tumour growing according to an inverse growth rate
Individual variation in growth rates is accounted for by assuming that inverse growth rates follow a Gamma distribution with shape
We assume that tumours are detectable with non-zero probability from a given volume
For an unscreened population, the density for tumour volume at symptomatic detection can, from the above assumptions, be calculated as follows:
2.2 Modelling the metastatic process in the absence of screening
The model for distant metastatic spread is inspired by previous models on localised spread to the lymph nodes. 10 We will first formulate a model for successful distant metastatic seeding; then, we will formulate a model for time to detectability of each metastasis given detection of the primary tumour. A successful seeding can also be thought of here as a colony that will not be eradicated by treatment at diagnosis. The time to detectability of distant metastases will be related to the growth rate of the primary tumour.
2.2.1 A model for successful distant metastatic seeding
The model for successful distant metastatic seeding assumes that the metastatic process is proportional to the average number of mutations in the cancer cells and the rate of cancer cell division. We use this model form as it has previously been shown to fit well to observed data on the lymph nodes spread. We note however that our methodology can be easily adapted to other models of spread, e.g. one which specifies the rate of spread to be proportional to the current volume of the primary tumour; it can be seen however that such a model would provide a poorer fit to empirical data (than the one we use), as we will later discuss.
Under the model considered here, assuming a constant rate of mutation during cell division, the average number of mutations is proportional to the number of times the tumour cells have divided. Specifically, metastatic seeding is assumed to follow an inhomogeneous Poisson process with intensity function
The number of times that cancer cells have divided can be calculated from the following relation:
2.2.2 A model for time to first detected distant metastasis
A model for distant metastatic spread needs to be stated in terms of observable quantities (seedings are not observable), such as the presence of distant metastases at diagnosis and the times of distant metastases from diagnosis of the primary tumour. We therefore define a random variable
The time from metastatic seeding to detection of a given metastasis takes the individual-specific time We assume that breast cancers are only diagnosed through the primary tumour, and not by the clinical manifestation of metastases. We make an exception to the above for metastases growing to a detectable size before the diagnosis of the primary tumour: these metastases will be visible at diagnosis of the primary tumour, but not detected beforehand.
For an individual with a tumour that reaches size

Illustration of key time points for the proposed modelling framework, including tumour volumes at each relevant point in time. Two distinct time scales are included to illustrate the models introduced in Sections 2.2.1 and 2.2.2.
Equations (13) to (15) are defined for
We now introduce additional assumptions: we assume that the metastatic seeding completely stops at diagnosis of the primary tumour and that already seeded, successful colonies are not affected. These assumptions constrain the density and hazard functions – (13) and (15) – to be null for values of
A summary of the model components, parameters and assumptions for the joint model of tumour growth, detection and distant metastatic spread.
Detection of the primary tumour is assumed to be independent of distant metastases: cancers are always detected via the primary tumour.
2.3 Likelihood function in the absence of screening
We describe the joint likelihood of tumour size and time to first distant metastasis, for a sample of incident cases from a stable disease population. 9 Three types of observations contribute to the likelihood function:
Observed events, i.e. metastases that are diagnosed after diagnosis of the primary tumour ( Left censored observations, i.e. individuals with detected metastases at the time of diagnosis ( Right censored observations, i.e. individuals that do not develop metastases before the end of follow-up. These observations will contribute with the survival probability introduced in equation (16), that is
Under our modelling assumptions, the density for the inverse growth rates, conditional on volume (which appears in the likelihood contribution for observed events and right-censored observations), can be written
9
as
To calculate the likelihood we need to approximate the integrals appearing in the three quantities described above (which do not have a closed-form); to do this we use numerical integration. Once the full likelihood function is formulated, it can be maximised using any general-purpose optimiser such as
We note again that this likelihood function is valid for samples of incident cases under stable disease assumptions: specifically, that (1) the rate of births in the population, (2) the distribution of age at tumour onset, and (3) the distribution of time to symptomatic detection are constant across calendar years. Stable disease assumptions are discussed in more detail in Isheden and Humphreys. 9
2.4 Likelihood function for a screened population
So far, we have described how to estimate our model for samples of cases that are detected symptomatically in the absence of screening: we now extend our approach to allow the analysis of data from incident cases collected in the presence of screening. In Sweden, where our study is based (Section 4), mammography screening was introduced in 1974, and nationwide coverage was achieved in 1997. Nowadays, all women in Sweden between the ages of 40 and 74 years old are called to screening every 2 years (or one and a half years, depending on the country of residence).
First, we specify a model for screening sensitivity. We assume that the probability of a positive screening test, given the presence of a tumour in the breast, follows a logistic function:
The contribution to the likelihood function is formulated as a conditional joint probability: the joint probability of tumour size and the distant metastatic outcome, given mode of detection (screen-detected versus symptomatic) and the history of previous (negative) screens. We rely on stable disease assumptions to formulate the likelihood, and we use a procedure (and notation) similar to that described in Isheden and Humphreys that was developed for modelling tumour size with data collected in the presence of screening.
9
2.4.1 Likelihood function for screen-detected tumours
The likelihood contribution for screen-detected (SD) tumours can be written as
The joint probability of tumour size and metastasis can be re-written as
The screening history term in
For screen-detected individuals with observed events
The likelihood contribution of screen-detected individuals that are left censored (
2.4.2 Likelihood function for tumours detected symptomatically
For symptomatic (SYM) cases with a history of negative screens, the likelihood can be written as
The contributions to the likelihood for individuals that are detected symptomatically are evaluated analogously to (19) to (21), with the first term, i.e. the screening sensitivity
2.4.3 Evaluation of the likelihood
The likelihood functions described in Sections 2.4.2 and 2.4.3 contain integrals that have no closed form. We approximate them numerically using a procedure similar to that described in Section 2.3 in the absence of screening. Once the full likelihood function is formulated, it can, again, be maximised using any general-purpose optimiser with standard errors calculated using, e.g. the observed information matrix at the optimum or the bootstrap.
2.5 Model-based estimation of the probability of distant metastasis
The vast majority of breast cancer patients do not have a detected distant metastasis at diagnosis: for these, we can estimate conditional survival probabilities using the model described in Section 2.4. These model-based predictions still rely on assumptions that were previously outlined in Table 1; the calculations implicitly assume removal of the primary tumour at the time of diagnosis, which fully stops the process of metastatic seeding.
For example, for symptomatically-detected tumours, survival probability predictions at a given time
The denominator is calculated as the probability of having no detected metastases at detection of the primary tumour, which is independent of
3 Simulation study
3.1 Aim
We designed a simulation study aimed at assessing the ability to retrieve the true data-generating parameters of the model for data collected in the presence of screening, i.e. the model described in Section 2.4. In other words, we aim to test that our implementation can correctly estimate parameter values when the model is correctly specified.
3.2 Data-generating mechanisms
We simulated independent datasets, each with 1500 individuals. We assumed that tumours originate from a single spherical cell of volume
We simulated inverse growth rates from a Gamma distribution with shape
We simulated age at onset of breast cancer with age-increasing rates based on a procedure described previously,8–10 which makes use of published breast cancer incidence rates.
22
We enforced a screening programme starting at the age of 40 years old, with screens every 2 years. We did not include death in our simulations; deaths (due to causes other than breast cancer) would introduce minimal biases in estimates of growth rate distributions, see Isheden et al.
10
Tumours were detected via screening with probabilities corresponding to equation (17), assuming
We simulated the timing of the first metastatic seeding using the inversion method, as described by Bender et al.
23
; we assumed
For each simulated cancer patient, we calculated time to metastasis
3.3 Estimands
The estimands of interest are the parameters of the model for tumour growth and metastatic spread in the presence of screening:
3.4 Methods
We fitted our model (described in Section 2.4) to each independently simulated data set. Standard errors of the model parameters were computed by inverting the Hessian matrix at the optimum, which we calculated using Richardson extrapolation as implemented in the R package
3.5 Performance measures and number of replications
We considered the following performance measures:
Bias, for quantifying whether the estimation procedure can retrieve the true data-generating parameters, on average. We also report average and median estimated values of each estimand. Empirical and model-based standard errors, to quantify the precision of the point estimates. Coverage probability, quantifying the probability that a confidence interval contains the true value of each estimand.
Further details on each performance measure are described elsewhere
27
; for completeness, we also report their Monte Carlo standard errors which quantify the uncertainty in their estimation.
To estimate the key performance measure (bias) with a given precision, we first ran 10 replications of the simulation study. Allowing a Monte Carlo standard error for bias of 0.01, we would require
3.6 Software
This simulation study was coded and run using R version 4.0.2 and analysed using the
3.7 Results
There were no serious convergence issues with the estimation algorithm. Convergence was achieved for 293 (97.67%) of the 300 repetitions.
The performance measures of this simulation study are summarised in Table 2. The shape and rate parameters were estimated (on the log-scale) with small bias (0.0327 and 0.0416, respectively) and slight undercoverage (91.13% and 89.76%); empirical and model-based standard errors were close. Nevertheless, the average (across repetitions) estimated mean and variance of the Gamma distribution were 0.5509 and 0.1179 (compared to true values of 0.5556 and 0.1235). Furthermore, the estimated distributions of the inverse growth rates were very close to the distribution from which the data was generated. Supplemental Figure S1 depicts the estimated inverse growth rate distributions and the distribution of estimated median doubling time, across repetitions. The parameters of the screening sensitivity function
Performance measures for the simulation study with screening data; values are point estimates with Monte Carlo standard errors in brackets, where applicable.
Overall, the largest Monte Carlo error for bias was 0.0081 and, as expected, given the number of repetitions, was
4 An analysis of distant metastatic spread in Swedish post-menopausal breast cancer patients
We used our modelling approach to analyse data from a case-control study of postmenopausal breast cancer in Sweden (CAHRES; Cancer And Hormone REplacement Study). 30 Data on the participants (women born and residing in Sweden, aged 50–74 years old, diagnosed with an incident primary invasive breast cancer between 1st October 1993, to 31st March 1995) was linked to data from the Swedish Cancer Registry and the Stockholm-Gotland Breast Cancer Registry. The data on distant metastases has been used before to study the association of mammographic density and the risk of distant spread. 31 Mammographic images and screening histories were collected from mammography screening units and radiology departments, in an extension of the original case-control study; the collection of this data has also been described previously.32,33 The data analysed here is from 1614 breast cancer patients, of whom 1035 (64.13%) were detected via screening; the remaining 579 (35.87%) women detected their tumour symptomatically. Among all participants, the median tumour size at detection (as diameter) was 15 mm (inter-quartile interval [IQI]: 10–22 mm).
Four (0.25%) women had metastases at diagnosis; none of these women had small tumours (12, 28, 50 and 70 mm). From the women that were free of detected metastases at detection of the primary tumour, 293 (18.20%) were diagnosed with metastases during follow-up (Table 3). Median follow-up, estimated using the reverse Kaplan-Meier method, 34 was 5.49 years (IQI: 5.41–5.58 years). As shown in Table 3, screen-detected tumours are, on average, smaller than symptomatic-detected tumours. The number of person-years at risk and events by tumour size quintiles and mode of detection are included in Table 3. Average rates of spread increase by tumour size and it appears that, taking tumours of similar size, there is more spread in symptomatic cases (which are, on average, faster-growing tumours) than in screen-detected tumours (although there are of course differences in tumour size distributions even within tumour size groups), which is consistent with our model (e.g. equation (19)).
Summary of time at risk and number of events (detected metastases) by tumour size (quintiles: diameter, in mm) and mode of detection, for women free of metastases at detection of the primary tumour. Four women from the symptomatic cases group, i.e., those with detected metastases at diagnosis of the primary tumour, are excluded from this table. Event rates are depicted per 1000 person-years.
We fitted our model (Section 2.4), first fixing
Log-likelihood values for models with different values of
Model parameter estimates obtained from CAHRES data, assuming
Since examples of how continuous growth models can be used to estimate aspects of screening sensitivity and growth of the primary tumour (in CAHRES data) have been presented elsewhere,8–10 we focus here on novel quantities that we can obtain by modelling future events (distant metastases).
The (hazard) rate of detection of first distant metastasis is defined in equation (15) as a function of

Estimates of survival (from distant metastases) probabilities (and 95% point-wise confidence intervals) for three hypothetical patients detected via screening, with a history of two negative screens, and with varying tumour size at detection. These model-based predictions assume no detected distant metastases at diagnosis and full removal of the primary tumour. Kaplan-Meier estimates for comparable subsets of subjects in CAHRES (see text) are included as a comparison.
Figure 3 depicts survival probabilities for two hypothetical patients with a primary tumour of diameter 20 mm at detection and (once again) with a history of two previous negative screens, 2 years apart from each other. One subject is detected via screening, while the other subject is detected symptomatically 6 months after their previous negative screen: the subject detected symptomatically has a worse prognosis than the subject detected via screening. Again, we include standard errors for these model-based predictions, obtained using parametric bootstrap with 1000 bootstrap re-samples, and Kaplan-Meier curves (for each hypothetical patient) fitted on a comparable subset of patients from CAHRES (patients with tumour of size 17.50–22.50 mm and detected via screening, n = 159, or symptomatically with a negative screen in the last 2 years, n = 75).

Estimates of survival (from distant metastases) probabilities (and 95% point-wise confidence intervals) for two hypothetical patients with a tumour of 20 mm at detection and a history of two negative screens. One subject is detected via screening, while the other subject is detected symptomatically six months after the previous negative screen. These model-based predictions assume no detected distant metastases at diagnosis and full removal of the primary tumour. Kaplan-Meier estimates for comparable subsets of subjects in CAHRES (see text) are included for comparison.
Finally, based on our model, we estimated (1) the probability of a patient having dormant/latent metastases at diagnosis and (2) the probability of detecting distant metastases in a patient at diagnosis of the primary tumour, both as a function of tumour size at detection (Figure 4). Once again, standard errors are obtained using parametric bootstrap with 1000 bootstrap resamples. Averaging over the observed tumour size distribution in CAHRES, the estimated probability of having dormant/latent metastases at diagnosis was 0.327. Similarly, the marginal probability (based on the CAHRES tumour size distribution) of detecting distant metastases at diagnosis in CAHRES was calculated to be 0.011. These values are comparable to those that have been reported elsewhere. 35

Estimated probability of latent and detected metastases at detection of the primary tumour (diagnosis) as a function of tumour size, based on model estimated on CAHRES data and assuming
5 Robustness to misspecified model assumptions
In this section, we discuss the robustness of the proposed modelling framework to violations of some of the underlying, structural modelling assumptions; we assess this using Monte Carlo simulation. For studying robustness to misspecified assumptions it will not matter whether we focus on a screened or unscreened population. Therefore, for simplicity, we based our simulations on the implementation of our model in the absence of screening (Sections 2.2 and 2.3).
The assumptions that we assess (in turn) are the following:
the primary tumour and seeded metastases grow at the same inverse growth rate Growth rates follow a Gamma distribution (equation (2)); All individuals share the same spread parameter
We relax (i) by simulating correlated growth rates from a bivariate Gaussian copula with equally distributed Gamma margins and a correlation parameter
We assessed the performance of the model when its assumptions are violated by comparing model-based predictions with empirical probabilities estimated on a large simulated dataset. We did this in terms of clinically relevant, model-based predictions; we assessed median tumour doubling times, probabilities of detected metastases and latent metastases at diagnosis of the primary tumours, fitted distributions of tumour volume at symptomatic detection, and conditional survival probabilities (for being free from detected distant metastases), conditional on being free from metastases and on having a particular tumour volume at diagnosis of the primary tumour. Here we present results only for the latter; results for the other prediction types are presented in Supplemental Material Appendix D.
Predicted conditional survival probabilities (calculated using a procedure similar to that described in Section 2.5), under the six simulation scenarios and for three distinct values of tumour size at detection, are included in Figure 5. For comparison, Kaplan-Meier curves from the large simulated dataset are also included. Under all simulation scenarios, predicted survival probabilities do not dramatically differ from the observed, empirical probabilities and the effect of tumour size on the survival probabilities is preserved. Violations of assumptions are however reflected/detectable, in particular when (1) the correlation between growth rates of the primary tumour and metastases weakens, (2) the distribution of the growth rates is misspecified, and (3) we fail to account for heterogeneity in the spread parameter

Comparisons of model predictions (solid lines) and reference, non-parametric estimates (dotted lines, using the Kaplan-Meier estimator) of time to diagnosis of distant metastasis (denoted in the plots as survival probabilities) for three distinct values of tumour size (diameter, in mm) at diagnosis of the primary tumour.
6 Discussion
Continuous natural history growth models with random effects provide a useful alternative to multi-state (semi-) Markov models for studying the unobserved history of breast cancer. Models that have been introduced include those for tumour growth, time to symptomatic detection, screening sensitivity and local spread to the lymph nodes.3–10As far as we are aware, the model we introduce in this paper, that includes a component for events (distant metastases) that occur after the diagnosis of the primary tumour, is the first of its kind, although a similar model, but without random effects (that assumes that all primary tumours have the same rate of growth) has been proposed by Szczurek et al. 36 Several models for events occurring after diagnosis have, of course, been developed, e.g., for survival after breast cancer surgery,37,38 breast cancer relapse,39–43 metastasis after breast conservation treatment, 44 susceptibility to breast cancer, 45 major adverse cardiovascular events in breast cancer patients. 46 However, these models condition on patients’ characteristics that are observed at diagnosis, i.e., do not incorporate the natural history of the disease. The model introduced here incorporates the unobserved natural history of the tumour. It relates the underlying biology to future outcomes, whilst at the same time providing insights into the latent growth and spread processes of the tumour. According to a recent review on prognostic models for breast cancer survival and recurrence, the most common model inputs are nodal status, tumour size, tumour grade, age at diagnosis, and oestrogen receptor status. 47 In our model, we directly model tumour size and include it as a sole covariate in the screening sensitivity function, but it would be straightforward to accommodate other covariates in this and other parts of the model.
There exist several mathematical models for breast cancer distant metastasis, including models for the initiation of metastatic growth, and models to assess the efficacy of surgery and adjuvant therapy on metastatic spread (for a summary see Clare et al.
48
). Blumenson and Bross
49
derived a model for the probability of a distant metastasis that is not removed by surgical treatment at diagnosis; they further extended their model to accommodate two distinct disease subgroups, with distinct model parameters. Slack et al. introduced another two-disease model describing the course of breast cancer and implications for clinical practice arising from the two-disease hypothesis.
50
Jorcyk et al. described an animal model and a mathematical model for breast cancer progression, with the latter being solved numerically and showing good agreement with laboratory data.
51
Models developed by Koscielny et al.
52
and by Szczurek et al.
36
are perhaps most relevant to our work. Koscielny et al. estimated that
The model we describe in this paper has, of course, its limitations. It relies on a set of assumptions (described in Section 2 and summarised in Table 1), among which that detection of the primary tumour is a function only of its size (i.e. it is independent of the process of metastatic spread), that metastases growing to a detectable size before the diagnosis of the primary tumour are only visible at diagnosis of the primary tumour, that metastatic seeding completely stops at diagnosis of the primary tumour, and that already seeded, successful colonies are not affected. The first two of these exclude the possibility for cancers of unknown primary. The third and fourth are concerned with the efficiency of surgery and therapy in eradicating tumour from the breast and regional lymph nodes, and in preventing metastatic recurrence. Data analysed here is from a study carried out on women diagnosed before the introduction of, e.g., ERBB2-targeted therapy. 53 Even still, our assumptions will not be completely realistic, e.g., we do not incorporate the effects of chemotherapy, which, depending on the drug, aim to disrupt DNA replication or mitosis. Ideally, we would like to be able to explore approaches for modelling the efficacy of treatments at diagnosis.54,55 Metastatic seeding is unlikely to completely stop when the primary tumour is removed surgically, even when adjuvant therapy is administered. 56 Incorporating and modelling the complexities of breast cancer treatments and cancer subtype differences is a natural extension of our model. Rather than assuming that all successful metastases are not affected by therapy, with sufficient data, it would be possible to relax the assumption by, e.g., allowing a proportion of seeded metastases to be eradicated or for growth to be slowed. However, in practice, the model we propose will be of use and may be reasonable since, as mentioned in the introduction, there is strong evidence (e.g. from studies looking at molecular mechanisms) that most metastases are seeded early. 20 There are of course even other assumptions implicit in our approach, that are not mentioned above. In reality, seeding of cancer metastases may in some cases be complex. 57 For example, implicitly we assume that metastases do not cross-seed new metastases; cross-seeding may contribute to faster, or increase heterogeneity in, metastatic growth.
We examined the robustness of our model to misspecification of some of the structural assumptions, using simulation, in Section 5: key patterns in the data were captured. We note that some of the assumptions that we relaxed could be incorporated into our model without too much difficulty. It would, for example, be straightforward to include a random effect for the spread parameter
The conventional, sequential model of breast cancer progression states that cancer passes through several stages disseminating first to the lymph nodes and then to distant organs. However, genomic analyses suggest that there is minimal seeding from axillary lymph nodes. 61 Our current model for distant metastatic spread does not include a component for local breast cancer spread to the lymph nodes (e.g. as described in the work of Isheden and Humphreys 10 ). In future work, we plan to incorporate both components to fit with the conventional model of breast cancer progression.
Even in its current form, our model adapts to biological principles of breast cancer growth and spread providing building blocks for future extensions that could accommodate more complexity and processes within a single, unified and efficient framework for modelling latent tumour growth and future outcomes. Models and, more importantly, frameworks of this kind are fundamentally important for studying novel interventions (such as personalised screening) and treatments, e.g., using microsimulation, or for studying risk factors behind tumour growth and spread.
Supplemental Material
sj-pdf-2-smm-10.1177_09622802211072496 - Supplemental material for Estimating latent, dynamic processes of breast cancer tumour growth and distant metastatic spread from mammography screening data
Supplemental material, sj-pdf-2-smm-10.1177_09622802211072496 for Estimating latent, dynamic processes of breast cancer tumour growth and distant metastatic spread from mammography screening data by Alessandro Gasparini and Keith Humphreys in Statistical Methods in Medical Research
Supplemental Material
sj-pdf-3-smm-10.1177_09622802211072496 - Supplemental material for Estimating latent, dynamic processes of breast cancer tumour growth and distant metastatic spread from mammography screening data
Supplemental material, sj-pdf-3-smm-10.1177_09622802211072496 for Estimating latent, dynamic processes of breast cancer tumour growth and distant metastatic spread from mammography screening data by Alessandro Gasparini and Keith Humphreys in Statistical Methods in Medical Research
Footnotes
Acknowledgements
We are grateful to Gabriel Isheden and Rickard Strandberg for discussions on the paper by Szczurek et al. 36 and for suggestions on using Poisson processes, e.g., the process described in Isheden et al. 10 , to model distant metastatic spread. We are grateful to Kamila Czene and Per Hall for discussions about the data from the CAHRES study. Finally, we are grateful to the two anonymous reviewers for their suggestions that greatly improved this manuscript.
Code availability statement
The authors welcome interested readers to contact the authors if they wish to replicate the results of the manuscript or to apply the proposed methodology to other data sets.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: this work was supported by the Swedish Research Council (2020-01302), the Swedish Cancer Society (CAN 2020-0714) and the Swedish e-Science Research Centre.
Supplemental material
Supplemental materials for this article is available online.
References
Supplementary Material
Please find the following supplemental material available below.
For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.
For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.
