Data-driven clustering of infectious disease incidence into age groups

Abstract

Understanding how infectious diseases spread through a population is an important element of mitigation and vaccination programs. A common characteristic of most infectious diseases is age-related heterogeneity in transmission, which can affect the dynamics of an epidemic, as manifested by the pattern of disease incidence in different age groups. Currently there are no statistical criteria for partitioning disease incidence data into clusters. We develop the first data-driven methodology for deciding on the best partition of incidence data into age groups, in a well-defined statistical sense. The method employs a top-down hierarchical partitioning algorithm, with a stopping criterion based on multiple hypothesis testing that controls the family-wise error rate. The type I error and statistical power of the method are tested using simulations. The method is then applied to Covid-19 incidence data in Israel, in order to extract the significant age-group clusters in each wave of the epidemic.
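To make the idea of top-down partitioning with a significance-based stopping rule concrete, the following is a minimal illustrative sketch, not the authors' procedure. Everything in it is an assumption made for illustration: the function names `two_proportion_p` and `partition`, the use of a pooled two-proportion z-test to compare incidence rates, and the deliberately conservative Bonferroni correction over every possible contiguous split, which stands in for whatever family-wise error control the paper actually uses.

```python
import math
from itertools import accumulate


def two_proportion_p(c1, n1, c2, n2):
    """Two-sided p-value of a pooled two-proportion z-test (normal approximation)."""
    pooled = (c1 + c2) / (n1 + n2)
    var = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if var == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (c1 / n1 - c2 / n2) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))


def partition(cases, pop, alpha=0.05):
    """Hypothetical top-down split of ordered age bins into contiguous clusters.

    cases[i] and pop[i] are the case count and population of age bin i.
    Returns a sorted list of half-open index ranges (lo, hi).
    """
    n = len(cases)
    # Assumption: Bonferroni over all (interval, split point) pairs, of which
    # there are C(n + 1, 3) for n ordered bins. Conservative by design.
    threshold = alpha / max(1, math.comb(n + 1, 3))
    cum_c = [0, *accumulate(cases)]  # prefix sums for fast range totals
    cum_n = [0, *accumulate(pop)]
    clusters, stack = [], [(0, n)]
    while stack:
        lo, hi = stack.pop()
        best_p, best_k = 1.0, None
        for k in range(lo + 1, hi):  # every contiguous split of [lo, hi)
            p = two_proportion_p(
                cum_c[k] - cum_c[lo], cum_n[k] - cum_n[lo],
                cum_c[hi] - cum_c[k], cum_n[hi] - cum_n[k],
            )
            if p < best_p:
                best_p, best_k = p, k
        if best_k is not None and best_p <= threshold:
            stack += [(lo, best_k), (best_k, hi)]  # significant: keep splitting
        else:
            clusters.append((lo, hi))  # stopping rule: no significant split
    return sorted(clusters)
```

With made-up counts such as `partition([10, 12, 11, 100, 105, 98], [10000] * 6)`, the sketch separates the three low-incidence bins from the three high-incidence bins and then stops, because no further split within either group is significant. Restricting splits to contiguous ranges reflects the natural ordering of age bins; a real analysis would need a test suited to time-series incidence rather than a single pooled count per bin.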

Keywords

Clustering, multiple testing, bagging, epidemic modelling, Covid-19


