Abstract
Objectives:
Electronic health records (EHRs) and electronic laboratory records (ELRs) are increasingly seen as a rich source of data for performing public health surveillance activities and monitoring community health status. Their potential for surveillance of chronic illness, however, may be underused. Our objectives were to (1) evaluate the use of EHRs and ELRs for diabetes surveillance in 2 California counties and (2) examine disparities in diabetes prevalence by geography, income, and race/ethnicity.
Methods:
We obtained data on a clinical diagnosis of diabetes and hemoglobin A1c (HbA1c) test results for adult members of Kaiser Permanente Northern California living in Contra Costa County or Solano County at any time during 2010-2014. We evaluated the validity of using HbA1c test results to determine diabetes prevalence, using clinical diagnoses as a gold standard. We estimated disparities in diabetes prevalence by combining HbA1c test results with US Census data on income, race, and ethnicity.
Results:
When compared with a clinical diagnosis of diabetes, data on a patient’s 5-year maximum HbA1c value ≥6.5% yielded the best combination of sensitivity (87.4%) and specificity (99.2%). The prevalence of 5-year maximum HbA1c ≥6.5% decreased with increasing median family income and increased with greater proportions of residents who were either non-Hispanic black or Hispanic.
Conclusions:
Timely diabetes surveillance data from ELRs can be used to document disparities, target interventions, and evaluate changes in population health. ELR data may be easier to access than a patient’s entire EHR, but outcome metric validation with diabetes diagnoses would need to be ongoing. Future research should validate ELR and EHR data across multiple providers.
Public health surveillance is an essential public health function, but conducting public health surveillance for many prevalent illnesses, such as diabetes, cardiovascular disease, and asthma, is challenging. One reason for this challenge is that administrative data (eg, hospital discharge records) collected during inpatient visits lack detailed information and are often made available several years after the events that they describe have occurred. Additionally, community surveys include data for only a limited portion of the population, may not reflect patients’ true health status, and are often insufficient for assessing disease prevalence at a more refined geographic scale (eg, at the census-tract level). The use of electronic health records (EHRs) for public health surveillance may offer state and local public health agencies timely, broad-based, accurate, and actionable information on the health of their communities. 1,2 A subset of EHRs, electronic laboratory records (ELRs), may be sufficient for the surveillance of some diseases. ELRs are widely used to transmit data from commercial laboratories, public health laboratories, and health care providers to public health agencies. 3
Type 2 diabetes mellitus (type 2 diabetes) is a target for public health surveillance because of its prevalence, cost, and amenability to prevention and management efforts and because of disparities among patient populations. This study evaluated use of the hemoglobin A1c (HbA1c) blood test, normally used to assess glycemic control in clinical health care settings, as a proxy indicator for the public health surveillance of diabetes. The HbA1c test has historically been used to evaluate glycemic control and inform diabetes therapies, and it is commonly included in clinical data systems. However, it has been used more recently to diagnose prediabetes and diabetes: the American Diabetes Association categorizes HbA1c test results <5.7% as normal, 5.7% to 6.4% as prediabetes, and ≥6.5% as a diagnostic threshold for diabetes. 4 To examine how EHRs and ELRs can enable diabetes surveillance, we analyzed data maintained by Kaiser Permanente Northern California (KPNC), a large integrated health care delivery system in California.
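The American Diabetes Association categories described above can be expressed as a simple lookup. The following is a minimal sketch, not part of the study's methods; the function name `classify_hba1c` is hypothetical, and the sketch assumes that results are reported in percent and that any value below 6.5% but at or above 5.7% falls in the prediabetes band.

```python
def classify_hba1c(value_percent: float) -> str:
    """Map one HbA1c result (in percent) to the ADA category cited in the text.

    Assumption: values between the reported bands (eg, 6.45) are assigned to
    the lower band, because only >=6.5% counts as the diabetes threshold.
    """
    if value_percent < 5.7:
        return "normal"
    if value_percent < 6.5:
        return "prediabetes"
    return "diabetes"


# Illustrative values only; these are not study data.
for example in (5.2, 5.7, 6.4, 6.5, 8.1):
    print(example, classify_hba1c(example))
```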
Public Health Surveillance Efforts
Public health surveillance data are essential for recognizing disease clusters, assessing trends, allocating scarce resources, evaluating interventions, identifying high-risk populations, and developing research hypotheses. 5 Modern public health surveillance systems, however, are fractured across agencies, geographies, and illnesses, leading to challenges in data quality. Data utility is often limited by delays between collection and reporting, poor geographic refinement, and sparse population coverage that hinders the analysis of disease trends and health disparities. 6 EHRs are seen as a potential solution to these challenges. 7
Electronic Health Records
EHRs—which include ELRs—are electronic versions of patients’ paper charts and may be shared with other authorized users for clinical purposes. 8 Public health authorities are also legally allowed to receive individually identifiable health information from EHRs for surveillance purposes, 9 and financial incentives to adopt EHR data systems capable of reporting data to public health agencies are offered to practitioners and health care systems. 10
Among California physicians, however, the most common uses of EHRs are for clinical care, such as documenting diagnoses and procedures, recording clinical notes, maintaining problem and medication lists, or transmitting prescriptions. 11 Even amid increased data abstraction from EHR systems, most reportable conditions are communicable diseases. 12 Most data flowing from EHRs to public health agencies are data on communicable disease diagnoses from laboratory data. 13
Yet, despite their limited use to date outside of communicable diseases, EHRs are increasingly seen as a viable tool for chronic disease surveillance. Researchers have used EHRs to examine the use of various indicators for surveillance of chronic conditions and behaviors, including smoking, 14 metabolic syndrome, 15 obesity, 16–19 and dental disease. 20
Type 2 Diabetes Surveillance
Type 2 diabetes is a target for public health surveillance with EHRs because it is highly prevalent, it is costly, and its onset and most severe complications are preventable. Type 2 diabetes accounts for 95% of diabetes cases among US adults and is more amenable to public health prevention and control than type 1 diabetes. 21,22 Health care costs for people with diabetes in the United States were $245 billion in 2012, which was 2.3 times higher per person than health care costs for people without diabetes, after adjusting for age and sex. 23 Some racial/ethnic populations have a higher prevalence of diabetes, worse glycemic control, and increased rates of complications when compared with non-Hispanic white people. 24 Recent research has indicated correlations between type 2 diabetes and preventable environmental exposures, indicating additional avenues for public health research and prevention. 25–27
Given the health and financial burdens of type 2 diabetes, the public health community needs to conduct accurate, reliable, and timely public health surveillance of this condition. Previous assessments of EHR data for diabetes surveillance have examined longitudinal trends, 28,29 neighborhood-level prevalence, 30,31 algorithms to estimate diabetes type and incidence rates, 31,32 and provider care practices. 33 Uses for the data have included outreach to high-risk patients, 28 clinical care support, 33 documentation of diabetes incidence, 29,32 and high-resolution mapping of diabetes prevalence. 30,31 Some studies relied on HbA1c laboratory data or International Classification of Diseases, Ninth Revision codes alone, 28,30 whereas others have had full EHR access, including clinical notes, diagnoses, and medications. 29,31,32
In practice, the availability of EHR data that can be used for public health surveillance of diabetes is limited. Clinical data in EHRs may have errors or missing information; demographic data are sporadically recorded; and privacy laws may limit the data elements available for public health surveillance. 34
However, several possible advantages exist for using ELRs for diabetes surveillance. For example, ELR data are already widely used for public health surveillance; have a longer history of standardized coding to record, store, and report data, when compared with EHR data; and may be more accessible than a patient’s full EHR. 13 Our aim was to assess the validity and utility of HbA1c laboratory data for the public health surveillance of diabetes, including the assessment of geographic, racial/ethnic, and income disparities in diabetes control. Our research approach was unique in that we examined a narrow set of ELR data (ie, HbA1c test results) for their utility in the public health surveillance of diabetes and documentation of disparities. KPNC has (1) a long history of EHR use for clinical and research purposes, (2) deep market penetration in Northern California, and (3) the ability to validate patient laboratory results, with clinical diagnoses as a gold standard.
Methods
We examined diabetes prevalence in Contra Costa County and Solano County in Northern California because the 2 counties in this region have high KPNC market penetration and patients belong to a range of socioeconomic and racial/ethnic groups in urban, suburban, and rural settings. These demographically diverse counties had a combined population that was 45.4% non-Hispanic white, 10.2% non-Hispanic black, 14.5% non-Hispanic Asian, 24.5% Hispanic, and 5.4% non-Hispanic other; 22.6% were foreign born. Residents were also highly segregated by income, with census tract–level annual median household incomes ranging from $14 965 to $196 079, according to US Census 2010 data. 35 Choosing such an area allowed us to demonstrate the potential of ELRs to document socioeconomic disparities in diabetes control.
As of 2012, 99% of Kaiser Permanente physicians in California had access to KPNC’s EHR system. 36 The KPNC EHR system includes all data elements required to calculate population surveillance metrics based on HbA1c test results and a Diabetes Registry that serves as a gold standard to evaluate HbA1c metrics. The KPNC Diabetes Registry, which was established in 1993 by Kaiser’s Department of Research to conduct epidemiologic and research studies, has a sensitivity of 99% for type 2 diabetes diagnoses. 37 The actively curated KPNC Diabetes Registry uses redundant methods of ascertainment, including clinical diagnoses, use of hypoglycemic drugs, and laboratory blood glucose and HbA1c values. 38–41 The Kaiser Permanente Division of Research Institutional Review Board approved this study and provided access to its data.
We queried EHR data for members of KPNC living in either Contra Costa County or Solano County at any time during 2010-2014. Our study included adults aged ≥18 as of January 1, 2010. These counties had a combined total of 1 514 388 residents, of whom 412 400 (27.2%) were aged ≥18 and were KPNC members at any time during 2010-2014. Among the study population of 412 400 members, 18 131 had type 2 diabetes based on the KPNC Diabetes Registry, for an age-adjusted prevalence of 4.2% (based on the 2010 US population as the standard).
This study addressed 3 broad questions. First, we quantified the consistency with which various populations in the study area were represented in the data. We assessed the continuity of KPNC membership among the patients, with membership tenures ranging from 1 to 60 months for any given member. Because some patient demographic characteristics (eg, race/ethnicity, income) were not reliably recorded in the KPNC EHR, we relied on US Census 2010 data to assess variability in membership tenure by demographic characteristics. To the degree that demographic subgroups had a similar distribution of membership tenures, they had similar opportunities for receiving HbA1c tests, thereby limiting any potential bias in estimating population disparities.
Second, we examined which HbA1c metric best approximated observed diabetes prevalence. We calculated the lowest, average, and maximum HbA1c results for each patient and aggregated data using a priori cut points of ≥6.5%, ≥7.0%, ≥7.5%, ≥8.0%, ≥8.5%, and ≥9.0%. We calculated sensitivity, specificity, positive predictive value (PPV), and the Pearson correlation coefficient for each HbA1c metric for the study population. We also calculated results for patients who had ever received any HbA1c test. We used the HbA1c metric with the highest Pearson correlation coefficient to calculate the sensitivity, specificity, and PPV by sex and age groups.
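The validation step described above can be sketched in a few lines. This is an illustrative outline under stated assumptions, not the study's actual code: the function names (`summarize_patient`, `flag_patients`, `validity`) and the toy records are hypothetical, and the sketch assumes that a member with no HbA1c result is counted as not meeting the cut point when metrics are computed for the whole study population.

```python
from statistics import mean

# A priori cut points named in the text (percent HbA1c).
CUT_POINTS = (6.5, 7.0, 7.5, 8.0, 8.5, 9.0)


def summarize_patient(results):
    """Lowest, average, and maximum HbA1c across one patient's results."""
    return {"min": min(results), "mean": mean(results), "max": max(results)}


def flag_patients(tests, metric, cut_point):
    """Flag each patient whose chosen summary metric meets the cut point.

    tests: dict of patient id -> list of HbA1c results (may be empty).
    Assumption: a patient with no HbA1c result is treated as not flagged.
    """
    return {
        pid: bool(results) and summarize_patient(results)[metric] >= cut_point
        for pid, results in tests.items()
    }


def validity(flags, registry):
    """Sensitivity, specificity, and PPV of flags against a gold standard.

    flags, registry: dicts of patient id -> bool over the same patients.
    A ratio with a zero denominator is returned as None.
    """
    tp = sum(1 for p, dx in registry.items() if dx and flags[p])
    fn = sum(1 for p, dx in registry.items() if dx and not flags[p])
    fp = sum(1 for p, dx in registry.items() if not dx and flags[p])
    tn = sum(1 for p, dx in registry.items() if not dx and not flags[p])

    def ratio(num, den):
        return num / den if den else None

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
    }


# Made-up toy records, for illustration only (not study data).
toy_tests = {"p1": [6.1, 6.8, 7.2], "p2": [5.4], "p3": [6.6], "p4": []}
toy_registry = {"p1": True, "p2": False, "p3": False, "p4": False}
for cut in CUT_POINTS:
    print(cut, validity(flag_patients(toy_tests, "max", cut), toy_registry))
```

Restricting `tests` and `registry` to members with at least one result would give the corresponding metrics for patients who had ever received an HbA1c test.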
Third, using the HbA1c metric with the best combination of sensitivity and specificity, we assessed diabetes disparities by race/ethnicity, income, and geography. We used US Census 2010 data to characterize the race/ethnicity and income characteristics of each census tract in the study area, which allowed us to stratify calculations by grouping the study population by census tract of residence. We used the total number of KPNC enrollees aged ≥18 residing in a given census tract as the denominator when calculating age-adjusted disease rates for that census tract. We mapped results using ESRI ArcGIS version 10.3. 42
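Age adjustment of a tract-level rate is commonly done by direct standardization, in which each age group's rate is weighted by that group's share of a standard population. The sketch below illustrates that general technique only; the source does not specify the age bands or the computation used, so the function name `age_adjusted_rate`, the age groups, and the weights are hypothetical placeholders rather than the actual 2010 US standard population shares.

```python
def age_adjusted_rate(cases, members, std_weights):
    """Directly standardized rate for one census tract.

    cases: dict of age group -> number of members meeting the indicator.
    members: dict of age group -> number of enrolled members (denominator).
    std_weights: dict of age group -> standard population weight.
    Simplification (assumption): an age group with no members in the tract
    contributes nothing to the sum; other handling is possible.
    """
    total_weight = sum(std_weights.values())
    rate = 0.0
    for group, weight in std_weights.items():
        n = members.get(group, 0)
        if n == 0:
            continue
        rate += (cases.get(group, 0) / n) * (weight / total_weight)
    return rate


# Hypothetical tract and placeholder weights, for illustration only.
tract_cases = {"18-44": 10, "45-59": 30, "60+": 60}
tract_members = {"18-44": 1000, "45-59": 1000, "60+": 1000}
placeholder_weights = {"18-44": 0.5, "45-59": 0.3, "60+": 0.2}

adjusted = age_adjusted_rate(tract_cases, tract_members, placeholder_weights)
crude = sum(tract_cases.values()) / sum(tract_members.values())
print(f"crude={crude:.4f} adjusted={adjusted:.4f}")
```

In this toy tract the adjusted rate is lower than the crude rate because the placeholder standard population gives less weight to the oldest group than the tract's own age distribution does.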
Results
Membership Tenure
During the 5-year period, members in the Diabetes Registry were covered for an average of 51.3 months, and those without diabetes diagnoses were covered for an average of 51.6 months. The mean tenure in KPNC was shorter for patients who were younger (aged <45), lived in census tracts with lower median family incomes, or lived in census tracts with higher proportions of non-Hispanic black and Hispanic residents (Table 1).
Table 1. Mean membership tenure and distribution of Kaiser Permanente Northern California members in Contra Costa County and Solano County, California, by demographic characteristics, 2010-2014^a
^a Data source: Kaiser Permanente of Northern California unpublished electronic health records.
^b Percentages may not total to 100 due to rounding.
^c Based on member’s census tract of residence among those with geocoded addresses. Data source: US Census 2010. 35
Selection of Outcome Measure
A maximum HbA1c result ≥6.5% at any time during the 5-year period had the best combination of sensitivity (87.4%) and specificity (99.2%); the PPV was 83.2% (Table 2). The performance of maximum HbA1c ≥6.5% varied slightly by age; sensitivity (84.0%) and specificity (98.4%) were lowest for adults aged ≥60 years (Table 3). Overall, a measure of maximum HbA1c ≥6.5% was highly concordant with the prevalence of type 2 diabetes from the Diabetes Registry at the census-tract level when both were expressed as age-adjusted rates (Pearson correlation coefficient, r = 0.97). A maximum HbA1c ≥7.0% was also highly concordant with type 2 diabetes prevalence (Pearson correlation coefficient, r = 0.96).
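The tract-level concordance reported above rests on the Pearson correlation coefficient between two paired series of age-adjusted rates. A minimal sketch of that calculation follows; the function name `pearson_r` and the example rates are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Made-up tract-level rates (percent), for illustration only.
hba1c_indicator_rates = [2.9, 3.4, 4.1, 5.0, 6.2]
registry_rates = [3.0, 3.3, 4.4, 4.9, 6.5]
print(round(pearson_r(hba1c_indicator_rates, registry_rates), 3))
```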
Table 2. Sensitivity, specificity, and positive predictive value for various HbA1c^a measures among Kaiser Permanente Northern California members aged ≥18 in Contra Costa County and Solano County, California, 2010-2014^b
Abbreviation: HbA1c, hemoglobin A1c.
^a HbA1c is a blood test used to assess glycemic control, diagnose prediabetes and diabetes, and inform diabetes therapies.
^b Data source: Kaiser Permanente of Northern California unpublished electronic health records.
Table 3. Sensitivity, specificity, and positive predictive value for maximum HbA1c^a ≥6.5%, by demographic characteristics, among Kaiser Permanente Northern California members aged ≥18 in Contra Costa County and Solano County, California, 2010-2014^b
Abbreviation: HbA1c, hemoglobin A1c.
^a HbA1c is a blood test used to assess glycemic control, diagnose prediabetes and diabetes, and inform diabetes therapies.
^b Data source: Kaiser Permanente of Northern California unpublished electronic health records.
^c Diabetes prevalence is based on patient’s record in the Kaiser Permanente Northern California Diabetes Registry.
Using Diabetes Surveillance to Document Disparities
Similar to known trends in actual type 2 diabetes, the prevalence of 5-year maximum HbA1c ≥6.5% decreased with increasing median family income and increased with greater proportions of residents who were either non-Hispanic black or Hispanic (Figure 1). For example, the age-adjusted prevalence among residents of tracts having a median annual family income ≥$110 000 was 2.9% (95% CI, 2.8%-3.0%) based on this indicator, whereas for those living in tracts with a median annual family income <$50 000, this prevalence was 6.2% (95% CI, 6.0%-6.4%). When residents of tracts with <4.0% non-Hispanic black populations were compared with those living in tracts with ≥16.0% non-Hispanic black populations, these numbers were 3.1% (95% CI, 3.0%-3.2%) and 6.3% (95% CI, 6.1%-6.4%), respectively. When residents of tracts with <10.0% Hispanic populations were compared with those living in tracts with ≥40.0% Hispanic populations, these numbers were 2.8% (95% CI, 2.7%-2.9%) and 5.9% (95% CI, 5.7%-6.2%), respectively. Among tracts overall, these rates ranged from 0.9% (95% CI, 0.4%-1.9%) to 9.5% (95% CI, 7.2%-12.4%). Geographic variations in age-adjusted prevalence by census tract are shown in Figure 2.

Figure 1. Age-adjusted rates of 5-year maximum HbA1c ≥6.5% among Kaiser Permanente Northern California members aged ≥18, by characteristics of census tract of residence. HbA1c is a blood test used to assess glycemic control, diagnose prediabetes and diabetes, and inform diabetes therapies. Census-tract characteristics were based on US Census 2010. 35 Error bars indicate 95% CIs. Abbreviation: HbA1c, hemoglobin A1c.

Figure 2. Age-adjusted rates of 5-year maximum HbA1c ≥6.5%, by census tract of residence, for Contra Costa County and Solano County, California, 2010-2014. HbA1c is a blood test used to assess glycemic control, diagnose prediabetes and diabetes, and inform diabetes therapies. Reference population is Kaiser Permanente Northern California members aged ≥18. 35 Abbreviation: HbA1c, hemoglobin A1c.
Practice Implications
A 5-year maximum HbA1c value ≥6.5% yielded the best combination of sensitivity and specificity when compared with a patient’s record in the Diabetes Registry. A maximum HbA1c ≥7.0% had a higher PPV and lower sensitivity versus a maximum HbA1c ≥6.5% but similar disparities by income, race, and geography. Overall, we found that the 5-year maximum HbA1c value ≥6.5% was a valid and accessible data metric for performing public health surveillance functions, including the documentation of type 2 diabetes disparities in the study population by race, income, and geography.
Clinical practices vary in their use of HbA1c test results, particularly in screening for diabetes. 43 In this study population, most tests were likely related to glycemic control monitoring over time. Ultimately, the use of the HbA1c test in patient care and its use in public health surveillance are separate concepts: the American Diabetes Association recommends a diabetes diagnosis for patients with HbA1c test results ≥6.5%, 44 but various HbA1c metrics may be adequate for core surveillance functions, such as documenting health disparities and mapping high-resolution HbA1c metrics as a proxy for diabetes prevalence.
Although this pilot study of KPNC data is valuable for understanding the use of EHR data for diabetes surveillance, KPNC covers only a portion of the health care market in California and may be more advanced than other health care providers in terms of EHR access, clinical surveillance of glycemic status, and the ability to link HbA1c test results with full clinical data. Future analyses should assess data quality based on de-duplication results, missing values, and out-of-range values across multiple laboratory and clinical providers. With increased population coverage from multiple providers, a larger volume of patient HbA1c test results may provide high-resolution estimates of diabetes control in narrower intervals than the 5-year period used in our study. Providing data with high temporal and geospatial resolution would benefit public health surveillance and programming.
Limitations
The potential for confounding in these analyses arose from variable membership tenures for some populations. Small deviations in sensitivity may reflect age-related differences in how elements of the gold-standard definition, such as clinical histories, were recorded in the EHR. Regarding EHRs from other health care providers, confounding could also arise from deviations in standards of care because some patients may not be tested as regularly as those in the KPNC health care system. In addition, reliance on maximum HbA1c test results ≥6.5% may exclude future at-risk patients or patients with well-controlled diabetes (ie, patients with HbA1c test results <6.5%) from population surveillance efforts, and bias results toward subpopulations with poorer diabetes control. 45 Finally, diabetes screening and the frequency of glycemic monitoring with HbA1c test results may vary by socioeconomic factors. 46 The extent to which certain subpopulations may be less likely to receive HbA1c testing needs to be considered in future analyses.
Conclusion
Using EHRs and ELRs for public health surveillance may provide state and local public health agencies with timely, population-based, accurate, and actionable information on the health of their communities. 47 Other considerations must also be taken into account. Patient privacy is paramount; surveillance efforts that rely on data reported in aggregate must maintain patient anonymity; and, for many chronic conditions, case-based data are not necessary to develop population-based interventions. 13 For diabetes, laboratory HbA1c data may be sufficient for crafting and evaluating public health activities, but data validation with clinical diagnoses will need to be ongoing to understand potential data biases. Finally, although this study relied on sociodemographic data from the US Census to assess diabetes disparities by race/ethnicity and income, the inclusion of social and behavioral data in EHRs would greatly increase their use in public health surveillance and community interventions. 48 Overall, EHRs can play an increasing role in surveillance as administrative obstacles to their use are overcome.
Footnotes
Authors’ Note
The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the Centers for Disease Control and Prevention or the US Department of Health and Human Services.
Acknowledgments
We thank Drs Bela Matyas, Lynn Silver, and Matt Willis for their on-the-ground insights on the utility of diabetes surveillance data.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by the Centers for Disease Control and Prevention (NU38EH000953-06).
