Abstract
The Multi-state EHR-based Network for Disease Surveillance (MENDS) developed a pilot electronic health record (EHR) surveillance system capable of providing national chronic disease estimates. To strategically engage partner sites, MENDS conducted a latent class analysis (LCA) and grouped states by similarities in socioeconomics, demographics, chronic disease and behavioral risk factor prevalence, health outcomes, and health insurance coverage. Three latent classes of states were identified, which inform the recruitment of additional partner sites in conjunction with additional factors (e.g. partner site capacity and data availability, information technology infrastructure). This methodology can be used to inform other public health surveillance modernization efforts that leverage timely EHR data to address gaps, use existing technology, and advance surveillance.
Introduction
In 2018, the National Association of Chronic Disease Directors (NACDD), funded by the Centers for Disease Control and Prevention (CDC), initiated the Multi-state EHR-based Network for Disease Surveillance (MENDS) to develop a near real-time chronic disease surveillance system (https://chronicdisease.org/page/MENDSINFO/). 1 MENDS, modeled on a Massachusetts program, 2 aims to support chronic disease surveillance and inform public health program and policy development. An essential attribute of a surveillance system is that it is representative of the intended population to ensure accurate estimates of disease incidence, prevalence, quality, and other health events. 3 Statistical methods are needed that group states, counties, or other geographies in ways that support creating a representative population for key public health activities. This short report explores the use of latent class analysis (LCA) to inform state selection.
Methods
MENDS employed a latent class analysis (LCA) to group the 50 states and the District of Columbia according to their similarities in socioeconomics, demographics, health insurance coverage, and chronic disease and behavioral risk factor prevalence, based on prior methods developed for selection of representative comparison communities for a program evaluation. 4 The LCA examined states, as opposed to communities or counties, as the organizing geographic unit for two reasons: 1) partner sites (e.g. health systems, health information exchanges) may not be limited to a smaller geography such as a county, and 2) some information needed for this modeling was not easily available at smaller geographic levels. The goal of this LCA was to support data-informed selection of additional partner sites beyond the initial four sites identified for the project.
Data used for the LCA were from four broad domains: sociodemographic, prevalence of health-related risk factors and chronic disease, chronic disease-related health outcomes (e.g. heart disease deaths), and health insurance status. All data for the study were from publicly available sources (noted in Table 1): U.S. Census population estimates, the U.S. Census American Community Survey, and CDC Chronic Disease Indicators. These data sets had no personally identifiable information. To preserve stability of the LCA model given the relatively small number of units examined (n = 51), only 12 indicators from the four domains were used in the analysis. 3 An indicator on persons ever-diagnosed with depression was added due to its high prevalence and possible negative effects on chronic disease self-management. 5,6 Each variable was dichotomized prior to analysis to capture relative differences among the states as in Jiang et al. 7 Analysis was performed using the Proc LCA package for SAS version 9.4. 8,9
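The dichotomize-then-classify workflow described above can be illustrated with a minimal Python sketch. This is not the authors' code: the analysis was run in Proc LCA for SAS, and the source does not state the cut point used for dichotomization, so the median split below is an assumption. The function names `dichotomize` and `fit_lca` are hypothetical, and the expectation-maximization (EM) routine is a generic latent class estimator for binary indicators rather than a reimplementation of Proc LCA.

```python
import numpy as np

def dichotomize(values):
    """Code each value as 1 if above the median across units, else 0.

    ASSUMPTION: the median is an illustrative cut point; the source
    does not report the threshold actually used.
    """
    values = np.asarray(values, dtype=float)
    return (values > np.median(values, axis=0)).astype(int)

def fit_lca(X, n_classes, n_iter=500, seed=0, eps=1e-6):
    """Fit a latent class model to binary data X (units x items) by EM.

    Returns class prevalences, item-response probabilities, posterior
    class-membership probabilities, and the log-likelihood evaluated
    in the final E-step.
    """
    rng = np.random.default_rng(seed)
    n_units, n_items = X.shape
    weights = np.full(n_classes, 1.0 / n_classes)
    probs = rng.uniform(0.25, 0.75, size=(n_classes, n_items))
    for _ in range(n_iter):
        # E-step: log joint probability of each unit with each class.
        log_joint = (np.log(weights)
                     + X @ np.log(probs).T
                     + (1 - X) @ np.log(1 - probs).T)
        log_norm = np.logaddexp.reduce(log_joint, axis=1, keepdims=True)
        post = np.exp(log_joint - log_norm)
        # M-step: update prevalences and item-response probabilities.
        weights = np.clip(post.mean(axis=0), eps, None)
        weights = weights / weights.sum()
        probs = (post.T @ X + eps) / (post.sum(axis=0)[:, None] + 2 * eps)
        probs = np.clip(probs, eps, 1 - eps)
    return weights, probs, post, float(log_norm.sum())
```

Because EM can converge to a local maximum, a sketch like this would typically be run from several random seeds, keeping the solution with the highest log-likelihood and checking how often the same solution recurs.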
Table 1. State characteristics by latent class and overall national mean.
n = total number of states [including District of Columbia] in each class.
U.S. Census Bureau. 2017 United States Census Population Estimates. Available at: https://data.census.gov/cedsci/.
U.S. Census Bureau. American Community Survey 5-year Estimates, 2013-2017. Available at: https://data.census.gov/cedsci/.
Centers for Disease Control and Prevention, Chronic Disease Indicators, 2017. Available at: https://www.cdc.gov/cdi/.
Results
Among alternative models, a three-class solution was identified as the best-fitting model in 51% of iterations. This model had strong classification certainty (entropy = 0.97, minimum class probability = 0.78). In terms of model fit, the Akaike Information Criterion (AIC) was 532.30, the Bayesian Information Criterion (BIC) was 623.10, and the sample size-adjusted BIC was 475.54. Core clusters of states remained relatively stable through many different model iterations (not shown). While model fit statistics indicated that a four-class solution could also fit the data (log likelihood = −397.66 compared to log likelihood = −418.29 for the three-class model), the small sample size (50 states and the District of Columbia) and the replicability across different iterations of the model led to selecting the three-class solution.
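For readers less familiar with these fit statistics, the sketch below shows one common way they are computed. It assumes the information criteria are built on the likelihood-ratio statistic G² (the convention we understand Proc LCA to use) and that entropy is the usual relative entropy of the posterior class-membership probabilities. The function names are hypothetical. The inputs G² = 438.30 and 47 free parameters are not reported in the text; they are back-calculated here so that the formulas reproduce the three reported criteria with n = 51, and should be read as an illustrative consistency check only.

```python
import math

def information_criteria(g2, n_params, n_units):
    """Return (AIC, BIC, sample size-adjusted BIC) based on G-squared.

    ASSUMPTION: criteria are computed from the likelihood-ratio
    statistic G-squared rather than from -2 * log-likelihood.
    """
    aic = g2 + 2 * n_params
    bic = g2 + n_params * math.log(n_units)
    adjusted_bic = g2 + n_params * math.log((n_units + 2) / 24)
    return aic, bic, adjusted_bic

def relative_entropy(posteriors):
    """Relative entropy of posterior class probabilities (1 = certain).

    posteriors: one row per unit, one probability per class.
    """
    n_units = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = -sum(p * math.log(p)
                       for row in posteriors for p in row if p > 0)
    return 1 - uncertainty / (n_units * math.log(n_classes))

# Back-calculated, illustrative inputs (NOT reported in the source).
aic, bic, adjusted_bic = information_criteria(438.30, 47, 51)
print(round(aic, 2), round(bic, 2), round(adjusted_bic, 2))
```

Relative entropy near 1, as with the reported 0.97, indicates that most states are assigned to a single class with high posterior probability.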
The three identified classes have a clear geographic orientation and strong similarities within each class. Class 1, the smallest class, contains 14 states and the District of Columbia (Figure 1). The states in this class have younger populations, lower proportions of non-Hispanic White residents, higher proportions of Hispanic and non-Hispanic Asian residents, and lower rates of risk factors such as smoking, obesity, and depression (means for each class shown in Table 1; item response probabilities not shown). Class 2, the largest class, comprises 19 states that have higher percentages of non-Hispanic Black residents, lower percentages of people reporting that they have health insurance, higher percentages of people living below poverty level, and higher rates of risk factors for chronic disease. Class 3 has 17 states including many of the Great Plains states and less populated states in the Northeast. These states have higher proportions of non-Hispanic White residents and lower proportions of non-Hispanic Black residents. The states within each class do vary in population size (e.g. California and DC are in the same class), as evidenced by the large standard deviation for total population. Detailed characteristics of states and the District of Columbia by latent class were also compiled and are available upon request.

Figure 1. States and District of Columbia by latent class.
Discussion
The LCA identified three distinct groups of states, which aids MENDS project planning and may serve as an approach for similar chronic disease surveillance efforts to consider. States within each class exhibited strong similarities across domains (e.g. sociodemographic), while these characteristics differed notably among the classes. In addition, the three classes have a clear geographic orientation. These findings suggest that the three classes are cohesive and can be used to identify states when selecting additional sites. The different distribution of characteristics in the three classes warrants additional investigation to understand the underlying factors driving these patterns (e.g. population chronic disease burden, insurance status, public health infrastructure). The National Health and Nutrition Examination Survey (NHANES) used a similar strategy to group states and territories into four strata according to similarities in sociodemographics and health, but not based on LCAs. 10 Although our LCA is limited by the small number of units under analysis, it generated results to inform continued recruitment of sites. Similar LCAs have been used to classify cities/towns into categories of chronic disease risk despite small sample sizes. 7
Use of EHR data has the potential to advance public health surveillance and guide public health interventions, 11 which may include approaches at various geographic levels. MENDS presents one opportunity to modernize chronic disease surveillance methods with timely EHR data. These results are only one part of the planning process, especially for EHR surveillance, which relies on engagement of clinical partner sites and resources for implementation. In the MENDS pilot, decisions to identify four initial partner sites were made prior to conducting the LCA and relied upon partners’ ability and interest in engaging in the pilot. For future expansion, the underlying data from states in underrepresented latent classes will help inform the selection of partner sites to increase the representativeness of MENDS. Although this work was undertaken as part of surveillance planning at a national level, it could also be replicated at smaller geographies (e.g. county) for other surveillance planning purposes.
Footnotes
Acknowledgements
We acknowledge the following people for their contributions to the development of this publication: Amanda Martinez, MPH, MSN, RN; Kathy Foell, MS; Kayla Craddock, MPH; Jeanne Alongi, DrPH, MPH; and Marti Macchi, MEd, MPH, National Association of Chronic Disease Directors. Ms Martinez provided writing assistance and supported the preparation of submission materials. In addition, we wish to acknowledge Thomas G. Land, PhD, formerly of the University of Massachusetts Medical School, who contributed to the idea and its original iteration, and Matthew Ritchey, PT, DPT, OCS, MPH and Adam Vaughan, PhD, MPH, MS, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division for Heart Disease and Stroke Prevention, who provided input that was adapted into this research brief. The map was created with free map software at mapchart.net. No copyrighted materials were used in this article.
The creation of this information is associated with the Multi-state EHR-based Network for Disease Surveillance (MENDS) (https://chronicdisease.org/page/MENDSINFO/). The MENDS network leverages three software applications: Electronic medical record Support for Public Health (ESP), PopMedNet, and RiskScape. ESP (https://www.esphealth.org/) is an open-source software application that extracts electronic health data, organizes the data into a standard format stored across multiple data tables, and applies algorithms to identify conditions of public health interest. PopMedNet (https://www.popmednet.org/) is a software application that allows for the querying of the ESP data tables. RiskScape is a software application that supports sense-making and action by providing summaries and visualizations of the ESP data. MENDS is the national implementation of these three applications.
Contributorship
LN was the primary author on this manuscript and analyzed output and sensitivity analyses for the paper. BA aggregated the data from several publicly available sources, ran the original model, and prepared an initial report of the latent class analysis that summarized early results. WL and a colleague conceived of the idea and worked with BA to design and implement the original iteration. JW provided critical review and revision of the manuscript throughout its development and administrative and technical support including leadership for CDC clearance. KHH conducted additional data iterations and a critical review of the manuscript for important alignment with the MENDS project. MP obtained funding for the project, as well as provided oversight on the implementation of this work and additional administrative and technical support. All authors contributed to and reviewed the final manuscript.
Declaration of conflicting interests
BA is a psychologist in a private practice and receives payment from insurance providers for her services. The other authors of this report have no conflicts of interest and no financial disclosures to report. Most authors have had contracts with the National Association of Chronic Disease Directors to conduct this work.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, as part of a financial assistance award totaling $1,890,000, 100 percent funded by CDC/HHS (grant number #5NU38OT000286). The findings and conclusions in this report are those of the authors and do not necessarily represent the official views of, nor an endorsement by, CDC/HHS, or the U.S. Government.
Ethical approval
Not applicable. Ethical approval was not sought for this article, because this work was not research and did not need institutional review board approval.
Trial registration
Not applicable, because this article does not contain any clinical trials.
Guarantor
KHH
Informed consent
Not applicable. Informed consent was not sought for this article, because all data for the study were from publicly available data sources: U.S. Census population estimates, the U.S. Census American Community Survey, and CDC Chronic Disease Indicators. These data sets had no personally identifiable information.
