Abstract
Objective:
In the absence of access to surveillance system data, single-source administrative databases are often used to study health care utilization and health outcomes among people with sickle cell disease (SCD). We compared the case definitions from single-source administrative databases with a surveillance case definition to identify people with SCD.
Materials and Methods:
We used data from Sickle Cell Data Collection programs in California and Georgia (2016-2018). The surveillance case definition for SCD developed for the Sickle Cell Data Collection programs uses multiple databases, including newborn screening, discharge databases, state Medicaid programs, vital records, and clinic data. Single-source administrative database case definitions for SCD varied by database (Medicaid and discharge) and years of data (1, 2, and 3 years). We calculated the proportion of people meeting the surveillance case definition for SCD who were captured by each single administrative database case definition for SCD, by birth cohort, sex, and Medicaid enrollment.
Results:
In California, 7117 people met the surveillance case definition of SCD from 2016 through 2018; 48% of this group was captured by the Medicaid case definition and 41% by the discharge case definition. In Georgia, 10 448 people met the surveillance case definition of SCD from 2016 through 2018; 45% of this group was captured by the Medicaid case definition and 51% by the discharge case definition. These proportions differed by years of data, birth cohort, and length of Medicaid enrollment.
Practice Implications:
The surveillance case definition identified approximately twice as many people with SCD as the single-source administrative database definitions during the same period, but trade-offs exist in using single administrative databases for decisions on policy and program expansion for SCD.
Sickle cell disease (SCD) is a rare, inherited blood disorder that, in the United States, predominantly affects racial and ethnic minority populations. 1 SCD causes substantial lifelong morbidity and an increased risk of early mortality.1,2 Multiple subtypes of SCD are associated with varying degrees of clinical severity. 3 Because early detection of SCD and intervention can reduce morbidity and mortality, screening for SCD has been included on the Recommended Uniform Screening Panel for newborns. 4 Despite the inclusion of SCD in state newborn screening (NBS) panels, no long-term follow-up efforts, national registry, or network of centers has been established to identify all people living with SCD in the United States.5,6 Furthermore, the year SCD was added to NBS panels varies by state, from 1975 in New York to 2006 in New Hampshire. 7 As such, some people living with SCD, such as immigrants and people born before SCD was included in universal NBS in their state, have not been identified by NBS programs. Additionally, no population-based registries exist for SCD. 8
The Healthcare Cost and Utilization Project, the National Inpatient Sample, the Pediatric Health Information System, and state or national Medicaid sources are examples of single-source administrative databases that provide information on quality of care within specific subpopulations or support standardization of practice across hospitals, especially databases that allow linkage of patients over time. However, inclusion of people in these administrative databases is conditional on having health plan coverage, length of coverage, being hospitalized, or other criteria. These limitations constrain how well these databases can be used to develop and target population-based resources to improve the quality of life and outcomes of people living with SCD.
The Sickle Cell Data Collection (SCDC) program is a longitudinal surveillance system that has been implemented in California and Georgia since 2010.9,10 The SCDC program collects health information on all people living with SCD to study long-term trends in prevalence, geographic distribution, treatment, and access to quality care in these 2 states.11,12 The California and Georgia programs have developed population-based cohorts of people living with SCD by linking multiple state-specific datasets, such as NBS and vital records, clinical databases, and administrative claims.13,14 These cohorts were developed by using validated surveillance case definitions to identify people living with SCD in each state. 15 These state-specific cohorts have been used to inform policy, health care standards, and allocation of health care resources, with the overall goal of improving and extending the lives of people with SCD.16-20
Although the SCDC program expanded to 11 states in 2020, representing approximately one-third of people living with SCD in the United States, substantial gaps remain when considering national surveillance of people living with SCD. 21 In the absence of comprehensive data available through the SCDC program, administrative datasets, such as hospital or emergency department discharge databases, Medicaid administrative claims, and other comprehensive health insurance–based claims databases (eg, IBM MarketScan Research Databases or state all-payer claims databases), have the potential to provide insight for populations of people living with SCD.22-25
Our objective was to compare the surveillance case definition that was developed for the SCDC program with single-source administrative database case definitions, which may be more easily implemented by researchers to identify people living with SCD. We compared several single-source administrative database case definitions because of the variation in years and types of data that may be available to researchers. We examined the variation in these comparisons by age, sex, and length of Medicaid enrollment.
Methods
Data Sources
We obtained study data from the California and Georgia SCDC programs; the methodologies for these programs have been previously described.9,21,26 Briefly, the California program draws from all NBS-identified infants with SCD; all nonfederal hospital discharge, emergency department, and ambulatory surgery encounters (“discharge data”); death files from vital records; Medicaid (Medi-Cal, the Genetically Handicapped Persons Program, and Children’s Medical Services) claims and enrollment data for all people with ≥1 SCD International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) 27 or International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) 28 codes; and clinical case reports from SCD care centers in the state. The ICD-9-CM diagnosis codes included are 282.6, 282.60-282.64, 282.68-282.69, and 282.41-282.42; the ICD-10-CM diagnosis codes included are all D57 codes except D57.3 and D57.811. Data are linked across data sources and across years of data and are deduplicated across data sources. Data agreements and access are through the California SCDC’s relationship with the California Department of Public Health.
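As a minimal sketch of the diagnosis code lists above, the Python function below flags a single code as SCD-related. It assumes codes are stored as uppercase strings that keep the decimal point (eg, "D57.00"); real claims files often omit the dot, so a production version would need normalization. The function name and the `include_d57_811` flag are hypothetical; the flag mirrors the one difference between states, because California excludes D57.811 whereas Georgia includes it.

```python
# ICD-9-CM SCD codes as listed in the text (decimal point assumed to be present).
ICD9_SCD = {
    "282.6", "282.60", "282.61", "282.62", "282.63", "282.64",
    "282.68", "282.69", "282.41", "282.42",
}


def is_scd_code(code: str, include_d57_811: bool = False) -> bool:
    """Return True if `code` is on the SCD code list described in the text.

    ICD-10-CM: all D57 codes except D57.3; D57.811 is excluded unless
    include_d57_811 is True (the Georgia variant). Hypothetical helper.
    """
    c = code.strip().upper()
    if c in ICD9_SCD:
        return True
    if not c.startswith("D57"):
        return False
    if c.startswith("D57.3"):  # excluded from both states' lists
        return False
    if c == "D57.811" and not include_d57_811:
        return False
    return True
```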
The Georgia program includes all people with SCD identified through NBS, hospital discharge, emergency department, and ambulatory surgery encounters; death files from vital records; Medicaid, PeachCare for Kids (Georgia’s state children’s health insurance plan), and the State Health Benefit Plan (limited years only) claims data for all people with ≥1 SCD ICD-9-CM or ICD-10-CM codes; and clinical case reports from SCD care centers in the state. The diagnosis codes for Georgia are the same as for California, except that D57.811 is also included. Similar to California, data containing all available personal identifiers are linked across data sources and over years and are deduplicated at the individual level.
In California, the California Committee for the Protection of Human Subjects, each administrative data source, and the institutional review board (IRB) for each SCD care center reviewed and approved this work. In Georgia, the Georgia State University IRB and the IRB for each SCD care center reviewed the SCDC program and this work and either granted a public health exemption or found the work to not be human subjects research.
Surveillance Case Definition
The surveillance case definition for SCD used multiple databases to identify people living in California or Georgia from 2016 through 2018. The definition requires meeting at least 1 of several criteria to indicate that the person was living in the state during the period of interest (2016-2018, 2017-2018, or 2018 only). For example, people with SCD would be included if they met either of the following criteria: (1) born from 2016 through 2018 with a confirmed NBS result for SCD or (2) ≥3 administrative claims with SCD ICD-9-CM or ICD-10-CM codes and evidence of living in the state during at least 1 year from 2016 through 2018. Detailed descriptions of inclusion criteria for the surveillance definition for each state are available in the supplemental files (eTable 1a-c in Supplemental Material). For the purposes of these analyses, we considered the surveillance case definition to be the gold standard. Throughout this article, we refer to the multiple-database surveillance case definition described here as the “surveillance case definition.”
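The two example criteria above can be expressed as a simple rule. This is a partial, illustrative sketch only: the full inclusion criteria are in eTable 1a-c, the `Person` record and its fields are hypothetical, and we assume the claim count has already been restricted to SCD ICD-9-CM or ICD-10-CM codes over whatever lookback period the full definition specifies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    # Hypothetical, simplified fields; linked SCDC records are far richer.
    birth_year: int
    nbs_confirmed_scd: bool = False
    scd_claim_count: int = 0  # claims with SCD ICD-9-CM/ICD-10-CM codes (assumed precomputed)
    years_in_state: set[int] = field(default_factory=set)  # years with evidence of residence


def meets_example_criteria(p: Person, start: int = 2016, end: int = 2018) -> bool:
    """Apply only the two example criteria from the text; NOT the full definition."""
    window = set(range(start, end + 1))
    # Criterion 1: born during the window with a confirmed NBS result for SCD.
    born_with_nbs = start <= p.birth_year <= end and p.nbs_confirmed_scd
    # Criterion 2: >=3 SCD claims plus evidence of living in state in >=1 window year.
    claims_and_residence = p.scd_claim_count >= 3 and bool(p.years_in_state & window)
    return born_with_nbs or claims_and_residence
```

Passing `start=2017` or `start=2018` gives the narrower 2017-2018 and 2018-only periods of interest under the same two example criteria.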
Single-Source Administrative Database Case Definitions
We used case definitions from single-source administrative databases (ie, Medicaid and discharge data) to identify people living with SCD. The definitions varied by data source (Medicaid or discharge database) and years of available data (3 years [2016-2018], 2 years [2017 and 2018], and 1 year [2018]), for a total of 6 single administrative database case definitions. The case definitions from each single-source administrative database are summarized in the supplemental material (eTable 2 in Supplemental Material). Although we treated discharge data as a single-source administrative database, they included inpatient, emergency department, and ambulatory surgery discharge records, as described previously. We included definitions using 3 years, 2 years, and 1 year of data to simulate the varying number of years of data that researchers may be able to access; some researchers are limited in the number of years of data available to them because of constraints such as cost and data availability. For each definition, we identified people with SCD by the presence of ≥3 administrative claims or encounters with SCD ICD-10-CM codes in the database during the specified period. Hereinafter, we refer to case definitions from these single-source databases as single administrative database case definitions, with modifiers included when we refer to a specific database (ie, Medicaid or discharge case definition).
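A minimal sketch of the single-database rule (≥3 claims or encounters with SCD ICD-10-CM codes within the chosen window) follows. It assumes a hypothetical flat list of `(person_id, service_year, icd10_code)` tuples drawn from one database and uses the California code list (all D57 codes except D57.3 and D57.811); the helper names are illustrative, not part of the SCDC methodology. Moving `start_year` from 2016 to 2017 or 2018 yields the 3-, 2-, and 1-year variants.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# (person_id, service_year, icd10_code): hypothetical layout of one database's claims.
Claim = Tuple[str, int, str]


def is_scd_icd10_ca(code: str) -> bool:
    """California ICD-10-CM list: all D57 codes except D57.3 and D57.811 (dots assumed)."""
    c = code.strip().upper()
    return c.startswith("D57") and not c.startswith("D57.3") and c != "D57.811"


def single_source_cases(claims: Iterable[Claim], start_year: int,
                        end_year: int = 2018, min_claims: int = 3) -> Set[str]:
    """Person IDs with >= min_claims SCD claims/encounters in [start_year, end_year]."""
    counts = Counter(
        pid for pid, year, code in claims
        if start_year <= year <= end_year and is_scd_icd10_ca(code)
    )
    return {pid for pid, n in counts.items() if n >= min_claims}


# The 3-, 2-, and 1-year definitions differ only in where the window starts.
WINDOWS = {"3-year": 2016, "2-year": 2017, "1-year": 2018}
```

Because the claim threshold stays fixed while the window shrinks, a person with 3 SCD claims spread over 2016-2018 qualifies under the 3-year definition but not under the 2- or 1-year definitions, which is one plausible reason shorter windows capture fewer people.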
Demographic Characteristics
We included data on demographic characteristics (age, sex, and Medicaid enrollment) in our analyses. We expressed age as the individual’s birth cohort in 10-year intervals based on date of birth. We determined Medicaid enrollment through Medicaid eligibility files and calculated it as the number of months enrolled in Medicaid divided by the number of months of eligibility during 2016-2018; children born during 2016 through 2018 had <36 months of eligibility. We then stratified people as follows: no enrollment, 1%-49% of months enrolled, 50%-99% of months enrolled, and continuously enrolled from 2016 through 2018.
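The enrollment share and its categories can be sketched as follows, under stated assumptions: eligibility is taken to start in the birth month for children born during 2016-2018 and otherwise spans all 36 months; a share of exactly 50% falls in the 50%-99% category; and birth cohorts are aligned to calendar decades. The function names are hypothetical, and other reasons eligibility might be shortened are not modeled.

```python
from typing import Tuple


def months_eligible(birth_year: int, birth_month: int,
                    start: Tuple[int, int] = (2016, 1),
                    end: Tuple[int, int] = (2018, 12)) -> int:
    """Eligible months, assumed to run from the later of January 2016 or the birth month."""
    sy, sm = max(start, (birth_year, birth_month))  # tuple comparison: (year, month)
    ey, em = end
    return (ey - sy) * 12 + (em - sm) + 1


def enrollment_category(months_enrolled: int, eligible: int) -> str:
    """Map months enrolled / months eligible to the 4 strata used in the text."""
    if eligible <= 0 or not 0 <= months_enrolled <= eligible:
        raise ValueError("need eligible > 0 and 0 <= months_enrolled <= eligible")
    share = months_enrolled / eligible
    if share == 0:
        return "no enrollment"
    if share < 0.5:
        return "1%-49%"
    if share < 1:
        return "50%-99%"
    return "continuous"


def birth_cohort(birth_year: int) -> str:
    """10-year birth cohort aligned to calendar decades (an assumption)."""
    lo = birth_year // 10 * 10
    return f"{lo}-{lo + 9}"
```

For example, a child born in March 2017 has 22 eligible months under these assumptions, so 22 months of enrollment counts as continuous enrollment.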
Statistical Analysis
We performed all analyses separately by state. We calculated the proportion of people identified by the surveillance case definition who were captured when each single administrative database case definition was applied, overall and by birth cohort and sex. We then calculated the proportion of people meeting the surveillance case definition who would be captured when specific Medicaid enrollment criteria were applied (1%-49%, 50%-99%, and 100% enrollment from 2016 through 2018). Such constraints on Medicaid enrollment are often part of the inclusion criteria for studies that assess health care utilization or health outcomes during a certain period.
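The proportion described above uses people meeting the surveillance case definition as the denominator, so it behaves like sensitivity relative to that reference, and people captured only by a single-database definition do not enter the calculation; without data on people who do not have SCD, specificity cannot be estimated this way. A minimal sketch with hypothetical data structures (a dict of surveillance cases, a set of single-database case IDs, and a stratifying function):

```python
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple


def capture_by_stratum(
    surveillance: Dict[str, dict],
    captured_ids: Set[str],
    stratum_of: Callable[[dict], str],
) -> Dict[str, Tuple[int, int, float]]:
    """Share of surveillance cases also meeting a single-database definition.

    Returns {stratum: (n_captured, n_surveillance, proportion)}. IDs in
    captured_ids that are absent from `surveillance` are ignored.
    """
    totals: Dict[str, int] = defaultdict(int)
    hits: Dict[str, int] = defaultdict(int)
    for pid, attrs in surveillance.items():
        s = stratum_of(attrs)
        totals[s] += 1
        if pid in captured_ids:
            hits[s] += 1
    return {s: (hits[s], totals[s], hits[s] / totals[s]) for s in totals}
```

Passing `lambda a: "all"` as the stratifying function gives the overall proportion; passing a birth-cohort or sex lookup gives the stratified proportions.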
Results
In California, 7117 people met the surveillance case definition of SCD from 2016 through 2018 (Table 1). Of these 7117 people, 48% (n = 3436) were captured by the Medicaid case definition when 3 years of Medicaid data were available, and 41% (n = 2906) were captured by the discharge case definition when 3 years of discharge data were available. In Georgia, 10 448 people met the surveillance case definition of SCD from 2016 through 2018 (Table 2). Among these 10 448 people, 45% (n = 4724) were captured by the Medicaid case definition when 3 years of Medicaid data were available, and 51% (n = 5364) were captured by the discharge case definition when 3 years of discharge data were available. The proportion of people who were captured by each single administrative database case definition varied by birth cohort, with people who were older (born before 1960) less likely to be captured by the Medicaid case definitions (<40% in California and <30% in Georgia) and people who were younger (born from 2000 through 2018) less likely to be captured by the discharge case definitions (<30% in California and <45% in Georgia).
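The reported percentages can be recomputed from the counts given in the text. The sketch below also computes how many times larger each surveillance count is than the corresponding 3-year single-database count, which is the basis for describing the surveillance definition as capturing roughly twice as many people. Only numbers stated above are used; the dictionary layout and function name are illustrative.

```python
from typing import Dict, Tuple

# (captured by 3-year single-database definition, surveillance total, reported %)
REPORTED: Dict[Tuple[str, str], Tuple[int, int, int]] = {
    ("California", "Medicaid"): (3436, 7117, 48),
    ("California", "discharge"): (2906, 7117, 41),
    ("Georgia", "Medicaid"): (4724, 10448, 45),
    ("Georgia", "discharge"): (5364, 10448, 51),
}


def recompute(reported):
    """Return {key: (percent captured to 1 dp, fold difference, matches reported %)}."""
    out = {}
    for key, (captured, total, pct) in reported.items():
        share = captured / total
        out[key] = (round(100 * share, 1), round(total / captured, 2),
                    round(100 * share) == pct)
    return out


for key, (pct, fold, ok) in recompute(REPORTED).items():
    print(key, f"{pct}% captured; surveillance count {fold}x larger; matches text: {ok}")
```

The fold differences range from about 1.95 (Georgia discharge) to 2.45 (California discharge), so "approximately twice as many" is a fair summary across both states and both databases.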
Table 1. Variations in the proportion of people identified by the surveillance case definition versus single administrative database case definitions for sickle cell disease, by birth cohort and sex, California, 2016-2018 a
a Data sources: Sickle Cell Data Collection program in California (surveillance case definition) 22 and Medicaid or discharge databases in California (single administrative database case definitions).
Discharge data: nonfederal hospital discharge, emergency department, and ambulatory surgery encounters.
Table 2. Variations in the proportion of people identified by the surveillance case definition versus single administrative database case definitions for sickle cell disease, by birth cohort and sex, Georgia, 2016-2018 a
a Data sources: Sickle Cell Data Collection program in Georgia (surveillance case definition) 22 and Medicaid or discharge databases in Georgia (single administrative database case definitions).
Discharge data: nonfederal hospital discharge, emergency department, and ambulatory surgery encounters.
Fewer people with SCD were identified by the 2-year (2017 and 2018) and 1-year (2018) surveillance case definitions than by the 3-year (2016-2018) surveillance case definition. The proportions of these people captured by the 2- and 1-year single administrative database case definitions also decreased (Tables 1 and 2). The 2-year Medicaid case definition captured 45% and 43% of people meeting the corresponding 2-year surveillance case definition in California and Georgia, respectively; the 2-year discharge case definition captured 33% in California and 44% in Georgia. The 1-year Medicaid case definition captured 40% of people meeting the corresponding 1-year surveillance case definition in California and 38% in Georgia; the 1-year discharge case definition captured 22% in California and 29% in Georgia.
Of 7117 people identified with the 2016-2018 surveillance case definition of SCD in California, 52% (n = 3681) did not meet the 3-year Medicaid case definition. The 48% who met the Medicaid case definition in California included 7% (n = 513) who were enrolled in Medicaid during 1%-49% of the 3-year study period, 9% (n = 663) who were enrolled in Medicaid during 50%-99% of the 3-year study period, and 32% (n = 2260) who were continuously enrolled (Table 3). In Georgia, 55% (n = 5724) of people with SCD identified by the 3-year surveillance case definition were not captured by the corresponding Medicaid case definition. The 45% meeting the Medicaid case definition in Georgia included 7% (n = 744) enrolled in Medicaid during 1%-49% of the 3-year study period, 13% (n = 1355) enrolled in Medicaid during 50%-99% of the 3-year study period, and 25% (n = 2625) who were continuously enrolled (Table 4). In both states, these proportions varied by birth cohort. People in the youngest birth cohort had the greatest proportion of Medicaid coverage but also the greatest fluctuation in coverage during the 36 months: among people in this cohort with any Medicaid coverage, 44% (n = 236) in California and 61% (n = 691) in Georgia had at least a 1-month gap in coverage during the months for which they were eligible for Medicaid during the study period.
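The Medicaid enrollment breakdown partitions each state's surveillance cohort, so the groups should sum to the surveillance total, and the 3 enrollment groups should sum to the Medicaid-captured counts reported earlier (3436 in California and 4724 in Georgia). The check below uses only numbers stated in the text; the dictionary layout and function name are illustrative.

```python
# Counts from the text; "not captured" = did not meet the 3-year Medicaid case definition.
COUNTS = {
    "California": {"total": 7117, "not captured": 3681,
                   "1%-49%": 513, "50%-99%": 663, "continuous": 2260},
    "Georgia": {"total": 10448, "not captured": 5724,
                "1%-49%": 744, "50%-99%": 1355, "continuous": 2625},
}
ENROLLMENT_GROUPS = ("1%-49%", "50%-99%", "continuous")


def check_partition(state: str):
    """Verify the groups sum to the total; return (Medicaid-captured count, rounded %s)."""
    row = COUNTS[state]
    parts = {k: v for k, v in row.items() if k != "total"}
    if sum(parts.values()) != row["total"]:
        raise ValueError(f"{state}: groups do not sum to the surveillance total")
    captured = sum(row[g] for g in ENROLLMENT_GROUPS)
    percents = {k: round(100 * v / row["total"]) for k, v in parts.items()}
    return captured, percents
```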
Table 3. Characteristics of people with sickle cell disease identified by the surveillance case definition versus the Medicaid case definition, by Medicaid beneficiary status, California, 2016-2018 a
a Percentage coverage indicates the percentage of eligible months enrolled in Medicaid during 2016-2018. Data sources: Sickle Cell Data Collection program in California (surveillance case definition) 22 and Medicaid database in California to determine Medicaid coverage.
Table 4. Characteristics of people with sickle cell disease identified by the surveillance case definition versus the Medicaid case definition, by Medicaid beneficiary status, Georgia, 2016-2018 a
a Percentage coverage indicates the percentage of eligible months enrolled in Medicaid during 2016-2018. Data sources: Sickle Cell Data Collection program in Georgia (surveillance case definition) 22 and Medicaid database in Georgia to determine Medicaid coverage.
Practice Implications
To our knowledge, this is the first published study to compare SCD case ascertainment using a surveillance case definition versus a single administrative database case definition, such as Medicaid or discharge data. Our results indicate that, during the same 3-year period, the surveillance case definition captured approximately twice as many people with SCD as either single administrative database case definition. Although surveillance data and single administrative databases can each be useful for answering various research and policy-related questions, our study highlights the importance of considering the trade-offs of each approach.
The proportion of people with SCD captured by single administrative database case definitions varied by years of available data and length of Medicaid enrollment. Because of costs and availability of data, researchers are often limited to 1 or 2 years of data. In our study, single administrative database case definitions based on fewer years of data captured smaller proportions of people meeting the corresponding surveillance case definition; for example, the Medicaid case definition captured 48% of people in California with 3 years of data but 40% with 1 year of data. Numerous single administrative database studies have been conducted using state or national Medicaid datasets in which the study population is inherently limited to Medicaid beneficiaries. These studies often further limit the study population by applying enrollment criteria, meaning that they require people to be enrolled in Medicaid for a specific period.29-31 This requirement is intended to capture all billed and received health services. However, researchers should consider the effect of enrollment criteria on results, because the number of people captured decreases as more stringent enrollment criteria are applied, even among people with a complex chronic disease such as SCD, who might be expected to maintain continuous coverage. Furthermore, older people in our study (ie, born before 1960) were less likely to be included than younger cohorts, particularly as enrollment criteria became more stringent. Because SCD-related complications increase with age, studies assessing the proportion of people with comorbidities or negative health outcomes may be biased downward, given the younger age distribution of people who are continuously enrolled.
When estimating the prevalence of SCD in a geographic region, it is important to use surveillance data. The surveillance case definition has distinct strengths because it draws on multiple linked databases. For example, inclusion in surveillance data does not depend on enrollment in a particular public or private health insurance program or on use of acute care alone. This advantage is evident in both states, where roughly half or more (49%-59%) of people living with SCD were not identified by a given single administrative database case definition, each of which is restricted by health insurance type, enrollment, and/or use of acute care. An accurate understanding of the prevalence of SCD in a state can help health care providers and policy makers better plan and appropriately allocate health care resources and avoid exacerbating the disparities in care that may already be experienced by people living with SCD. For example, if programs and policies to improve health outcomes do not intentionally include the entire population of people living with SCD in a state, they may be underfunded or misdirected and, therefore, less effective than programs and policies that target the full population. More comprehensive surveillance data can allow for better targeting of programs and resources, which may lead to improved health outcomes, reduced health disparities, and an increased likelihood of continued investment.
In the absence of access to data from a surveillance system such as SCDC, single administrative databases can be useful for identifying trends and health outcomes among subpopulations of people with SCD.31-36 These databases are often straightforward to access and, thus, can accelerate health services research. Such administrative databases can include discharge data only (eg, Healthcare Cost and Utilization Project, National Inpatient Sample, Pediatric Health Information System) or rely on Medicaid administrative claims (eg, state-specific Medicaid data or Transformed Medicaid Statistical Information System [T-MSIS] national Medicaid data). These administrative databases can provide pertinent information on opportunities for improving quality of care in specific subpopulations or standardizing health care practices across hospitals, particularly databases that allow linkage of patients over time. For example, these databases have been used to answer SCD-related questions on trends over time in comorbidities and receipt of preventive services.37,38 However, the generalizability of these single administrative database studies to the full population of people living with SCD is unknown. Further research could help explain differences in disease severity or health care use among people with SCD who are captured by each type of case definition (surveillance vs single administrative databases).
Limitations
Our study had several limitations. First, the surveillance case definition may not capture all people living with SCD. For example, people with SCD who were neither identified through the state’s NBS program nor seen at 1 of the reporting SCD clinical care centers must have ≥3 health care claims to meet the surveillance case definition; those with fewer than 3 claims during the years of available data would not be captured. In addition, people who recently moved to California or Georgia may not be included. Therefore, the surveillance case definition is not a true gold standard for identification of people living with SCD in each state. However, these results complement work that examined the accuracy of various administrative database case definitions to identify people living with SCD. 15 Second, incomplete case capture and the lack of data on people who do not have SCD or SCD diagnosis codes preclude calculation of specificity and negative predictive values. Third, one additional diagnosis code (D57.811) was included in Georgia but excluded in California; however, the effect of including this code in the California data was negligible (<5 additional people included with the surveillance definition).
Conclusion
Although single administrative databases are typically straightforward to access, a surveillance case definition that uses multiple databases captured approximately twice as many people with SCD and is, therefore, the preferred data source for public health planning. Furthermore, the proportion of people meeting the surveillance case definition who were captured by a single administrative database case definition of SCD varied by age, length of Medicaid enrollment, and number of years of available data. These results can help researchers and policy makers understand the trade-offs of using single administrative databases when making decisions related to policy and program expansion for people living with SCD.
Supplemental Material
sj-docx-1-phr-10.1177_00333549231166465 – Supplemental material for Case Ascertainment of Sickle Cell Disease Using Surveillance or Single Administrative Database Case Definitions
Supplemental material, sj-docx-1-phr-10.1177_00333549231166465 for Case Ascertainment of Sickle Cell Disease Using Surveillance or Single Administrative Database Case Definitions by Sarah L. Reeves, Sophia Horiuchi, Mei Zhou, Susan Paulukonis, Angela Snyder, Shondelle Wilson-Frederick and Mary Hulihan in Public Health Reports
Footnotes
Disclaimer
The findings and conclusions in this article are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported by funding from the Centers for Disease Control and Prevention (CDC-RFA-DD20-2003).
Supplemental Material
Supplemental material for this article is available online. The authors have provided these supplemental materials to give readers additional information about their work. These materials have not been edited or formatted by Public Health Reports’s scientific editors and, thus, may not conform to the guidelines of the AMA Manual of Style, 11th Edition.
References
