Abstract
Objective:
During the past several decades, survival rates for sickle cell disease (SCD) have substantially increased, with many people now living well into middle adulthood. To understand trends in mortality and survival patterns, research has taken 2 diverging approaches to ascertaining the death status for people with SCD. Single-source approaches rely on death certificates alone to identify people with SCD who died, while multiple-source approaches first identify those with SCD and then link them to death certificates to ascertain mortality status. This study evaluated these 2 approaches in understanding SCD mortality.
Methods:
We used 16 years of data (2004 through 2019) from the Sickle Cell Data Collection programs in California and Georgia. Drawing on these population-based surveillance systems and using the single- and multiple-source approaches, we constructed SCD decedent cohorts. For each approach, we examined the number of decedents with SCD, differences in demographic characteristics, and differential causes of death.
Results:
The single-source approach identified 1788 deaths among people with SCD, while the multiple-source approach identified 2524 such deaths, an increase of 41%. While many of the demographic characteristics were similar between the approaches, the multiple-source approach identified the average age of death to be 3.5 years greater than that of the single-source approach. While the multiple-source approach identified more decedents with SCD, the death records contained a higher percentage of nonspecific cause-of-death codes relative to the single-source approach.
Conclusions:
Researchers should be aware of the differences between the single- and multiple-source approaches when analyzing and interpreting mortality patterns among people with SCD. Prior estimates based on single-source approaches may be biased.
Sickle cell disease (SCD) is an inherited rare blood disorder estimated to affect approximately 100 000 people in the United States. 1 It predominantly occurs among racial and ethnic minority populations in the United States; characteristics include genetic subtypes that differ in terms of disease severity, lifetime morbidity, and recurrent episodes of extreme pain due to vaso-occlusive crises.2,3 Although people with SCD still experience a shorter life expectancy as compared with the general population, it is widely recognized that improved early detection of SCD through newborn screening programs, in conjunction with advances in treatment and medical care, has significantly decreased mortality rates for people with SCD since the condition was first documented by Herrick in 1910.2,4-6
Despite the unique health care needs of this group, no population-based registries of comprehensive mortality data exist for SCD.7,8 Therefore, previous SCD mortality studies have primarily taken 2 diverging methodological approaches to ascertain death status. The first approach uses death certificates from vital records programs to identify SCD-related deaths. This single-source approach typically relies on International Classification of Diseases, Tenth Revision, Clinical Modification 9 (ICD-10-CM) codes captured on the death certificate alone to identify people with SCD who died in a given period.10-14 This approach has been used to examine long-term trends in SCD mortality patterns in the United States,15,16 assess excess death among people with SCD due to the COVID-19 pandemic, 17 and document the low prevalence of opioid-related deaths among people with SCD.18,19
The second approach ascertains death status by using multiple sources. This approach first identifies a cohort of people with SCD and then links these people to death certificates to examine mortality outcomes and patterns.20-24 This approach has been prominent in epidemiologic studies from the Sickle Cell Data Collection (SCDC) program, which conducts population-level surveillance for SCD by using validated case definitions applied to integrated and deduplicated data from various sources.25,26 Recent SCDC studies have taken the multisource approach to examine the use of acute care in the last year of life, 27 create baseline death rates in newly established surveillance programs, 28 compare SCD death rates with rates in the general population, 29 and examine the prevalence of COVID-19–related deaths among people with SCD. 30
While these studies have improved our understanding of SCD-related mortality, the effect of differences in the 2 approaches for capturing mortality data among people with SCD warrants investigation. For example, a recent study found that multiple-source approaches identified twice as many people with SCD as did single-source datasets. 8 Therefore, previous research using death certificates alone may have undercounted the number of deaths among people with SCD. In addition, death certificates may disproportionately capture data on people with the most severe forms of SCD, because placement of SCD-related ICD codes on the death certificate requires clinical recognition 13 and known SCD status at the time of death by those completing the death record. As a result, research using multiple sources of data versus a single source likely captures a broader population of people with SCD, and the distribution of the causes of death may differ between the approaches.
The objective of this study was to compare causes of death among decedents with SCD by using 2 methodologies: first, single-source death certificate data; second, a population-based surveillance system to identify people with SCD and then link them to their death certificates. We aimed to elucidate the inconsistencies in our epidemiologic understanding of SCD-related mortality by comparing the similarities and differences that result because of the diverging identification strategies used in these approaches. If the use of death certificates alone leads researchers to underidentify decedents with SCD, then that could lead to an incomplete understanding of patterns of death among people with SCD and to inefficiently designed clinical interventions or underfunded policies. Understanding these methodological implications is especially important as curative gene therapies for SCD become more available, which may increase overall life expectancy in the population with SCD. 31
Methods
Data for this study came from the SCDC programs in California and Georgia. SCDC is a population-based surveillance program that uses various administrative health care data to identify people living with SCD in each program’s state catchment area. Data sources include newborn screening records, hospital discharge data, administrative health care claims data, and data reported from SCD specialty clinics, to name a few. As a component of their public health authority to conduct SCD surveillance, both programs obtain death certificates for all people living in the state (ie, all decedents regardless of SCD status). Complete information about the SCDC program and data linkage methodology is described elsewhere.25,26 We used the 2 surveillance programs to construct 2 SCD decedent cohorts: a single-source approach and a multiple-source approach. We selected these 2 SCDC programs for this study because they are the longest-running surveillance programs in the SCDC network, which allowed for uniform data collection for the largest number of people with SCD over time.
Single-Source Approach
In the single-source approach, we constructed an SCD decedent cohort using death certificates alone to identify people with SCD who died in the 16-year period from 2004 through 2019. Death records are standardized according to the World Health Organization and use ICD-10-CM codes 9 to report on the underlying cause of death (immediate or direct) and the contributory cause of death (additional disease or comorbidities that contributed but were not directly implicated). 32 It is common in epidemiologic research to use all available underlying and contributing cause-of-death fields to identify people with a specific health condition who died in a given period. 33 Use of all fields is particularly important in SCD mortality research, where using only the underlying cause of death may undercount the number of people with SCD. 16 Therefore, following previous mortality studies of SCD,11-13,16 in the single-source approach, we searched all cause-of-death fields on the death certificates to identify people with SCD and used the following ICD-10-CM codes: D57.0, D57.1, D57.2, D57.4, or D57.8.
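The all-fields search described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the study's code (the analyses were run in Stata): the record layout, the record identifiers, and the helper names `normalize` and `mentions_scd` are hypothetical, and codes are assumed to arrive as strings with or without the dot.

```python
from typing import Iterable

# Code stems listed in the text (D57.3, sickle cell trait, is not among them).
SCD_CODE_STEMS = ("D570", "D571", "D572", "D574", "D578")


def normalize(code: str) -> str:
    """Uppercase a code and drop the dot and outer spaces, e.g. ' d57.1 ' -> 'D571'."""
    return code.strip().upper().replace(".", "")


def mentions_scd(cause_fields: Iterable[str]) -> bool:
    """Return True if any cause-of-death field carries one of the SCD codes."""
    return any(normalize(c).startswith(SCD_CODE_STEMS) for c in cause_fields if c)


# Hypothetical records: the first field is the underlying cause,
# the remaining fields are contributing causes.
records = {
    "rec1": ["D57.1", "J18.9"],  # SCD as the underlying cause
    "rec2": ["I50.9", "D57.0"],  # SCD only as a contributing cause
    "rec3": ["C34.9", "D57.3"],  # sickle cell trait only, not counted
    "rec4": ["I21.9", ""],       # no SCD code on the record
}

# Searching all fields versus the underlying-cause field alone.
single_source_cohort = [rid for rid, f in records.items() if mentions_scd(f)]
underlying_only = [rid for rid, f in records.items() if mentions_scd(f[:1])]
```

In this toy example the all-fields search keeps `rec1` and `rec2`, whereas the underlying-cause search keeps only `rec1`, which illustrates why restricting to the underlying cause may undercount decedents with SCD.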
Multiple-Source Approach
In the multiple-source approach, we constructed an SCD decedent cohort by first ascertaining all people with SCD in each state and then linking these people to their death records in each state’s vital statistics registry. The identification of all people with SCD in each state followed validated SCDC case definitions,34,35 which use confirmatory and probable case definitions for population-based SCD surveillance. In the confirmatory case definition, a person was identified as having SCD if indicated by a state newborn screening program and/or other clinical laboratory data sufficient to confirm SCD. Alternatively, a person met the probable case definition if 3 or more SCD-coded health care encounters during any 5-year period were indicated in linked administrative datasets. To construct the multiple-source SCD decedent cohort, for each study year from 2004 through 2019, we identified all people in California and Georgia meeting either the confirmatory or probable SCDC case definition; then, we probabilistically matched these people to their death certificates (Supplement).
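The two-tier case definition above can be illustrated with a short Python sketch. It is not the SCDC implementation: the function names are hypothetical, "any 5-year period" is interpreted here as at least 3 encounter dates that span less than 5 years, and every date passed in is assumed to be a distinct SCD-coded encounter. The probabilistic match to death certificates is described in the Supplement and is not reproduced here.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def meets_probable_definition(encounter_dates, min_encounters=3, window_years=5):
    """True if at least `min_encounters` encounters fall within a span
    shorter than `window_years` years (an assumed reading of the definition)."""
    dates = sorted(encounter_dates)
    for i in range(len(dates) - min_encounters + 1):
        # Compare the first and last of `min_encounters` consecutive encounters.
        if dates[i + min_encounters - 1] < add_years(dates[i], window_years):
            return True
    return False


def classify(confirmed_by_lab_or_nbs: bool, encounter_dates) -> str:
    """Apply the confirmatory definition first, then the probable definition."""
    if confirmed_by_lab_or_nbs:
        return "confirmed"
    if meets_probable_definition(encounter_dates):
        return "probable"
    return "not a case"
```

For example, `classify(False, [date(2005, 1, 10), date(2007, 6, 1), date(2009, 12, 31)])` returns `"probable"` because the three encounters span less than 5 years. People classified as confirmed or probable would then be carried forward to the death certificate linkage.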
Analyses
Our first analysis examined whether the 2 death ascertainment approaches differed in the number of SCD decedents identified. To that end, we counted the number of deaths identified by each approach. Our second analysis aimed to understand how the diverging approaches may have resulted in different epidemiologic profiles of the decedents. Therefore, we compared the distribution of the demographic characteristics and underlying cause of death between the SCD decedent cohorts. We obtained data on demographic characteristics at the time of death directly from the death certificates and included age, sex, ethnicity, race, education level, location of death, and whether an autopsy was performed. The death certificates also included county of residence, which we converted into an indicator for urban residence following Rural–Urban Continuum Codes of the US Department of Agriculture’s Economic Research Service. 36 For both population cohorts, we classified the underlying cause of death by using the algorithm created by Foreman et al, 37 which categorizes the underlying cause into 34 mutually exclusive categories, broadly including communicable diseases, noncommunicable diseases, injuries, and “garbage” codes. Garbage codes refer to ICD-10-CM–coded conditions that indicate vague clinical information or are impossible or improbable causes of death, therefore serving little public health utility for surveillance or for understanding epidemiologic profiles of decedents.38,39 For both approaches, we always obtained information on cause of death directly from the underlying cause-of-death field reported on the death certificate.
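As an illustration of the urban residence indicator mentioned above, the following Python sketch converts a county's Rural–Urban Continuum Code into a binary flag. The cutoff is an assumption: in the USDA scheme, codes 1 through 3 denote metropolitan counties and codes 4 through 9 denote nonmetropolitan counties, and the sketch treats metropolitan as urban; the article does not state the exact rule used. The county names and code values in the lookup are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical lookup from county of residence to its Rural-Urban Continuum Code.
RUCC_BY_COUNTY = {
    "county_A": 1,  # placeholder metropolitan county
    "county_B": 3,  # placeholder metropolitan county
    "county_C": 6,  # placeholder nonmetropolitan county
}


def urban_indicator(county: Optional[str]) -> Optional[bool]:
    """Return True for metropolitan (codes 1-3), False for nonmetropolitan
    (codes 4-9), and None when the county is missing or not in the lookup."""
    if county is None or county not in RUCC_BY_COUNTY:
        return None
    code = RUCC_BY_COUNTY[county]
    if not 1 <= code <= 9:
        raise ValueError(f"unexpected Rural-Urban Continuum Code: {code}")
    return code <= 3


flags = {c: urban_indicator(c) for c in ["county_A", "county_C", None]}
```

Returning `None` rather than `False` for an unknown county keeps missing data distinct from rural residence, which matters when the percentage of records missing the indicator is itself compared between approaches.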
We described differences in the demographic characteristics between the SCD decedent cohorts. Similar to prior population-based SCD surveillance research, 8 our analysis did not include statistical testing given that the observed differences between the approaches represent the true population-level effect (ie, the cohorts were not drawn via a sampling procedure). We stratified all analyses by state (Supplement) and conducted analyses in Stata MP version 16 (StataCorp LLC). We obtained ethical approval for this study from the institutional review boards of Georgia State University (approval H11142) and the Public Health Institute in California (approval 15-10-2249).
Results
Using the single-source approach, we found 1788 deaths among people with SCD from 2004 through 2019 (853 in Georgia and 935 in California); using the multiple-source approach, we found 2524 deaths (1256 in Georgia and 1268 in California). The multiple-source approach identified an additional 736 deaths among people with SCD, a 41% increase in the number of deaths identified.
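The counts above can be checked with simple arithmetic. The Python snippet below uses only the state-level counts reported in this paragraph; the state-specific relative increases it computes are derived here for illustration and are not figures reported by the authors.

```python
# Death counts reported in the text, by state and ascertainment approach.
single_source = {"Georgia": 853, "California": 935}
multiple_source = {"Georgia": 1256, "California": 1268}

single_total = sum(single_source.values())
multiple_total = sum(multiple_source.values())

# Additional deaths found by the multiple-source approach, and the
# relative increase over the single-source count.
additional = multiple_total - single_total
percent_increase = 100 * additional / single_total

# Derived (not reported) relative increase within each state.
by_state = {
    state: round(
        100 * (multiple_source[state] - single_source[state]) / single_source[state], 1
    )
    for state in single_source
}

print(single_total, multiple_total, additional, round(percent_increase, 1))
print(by_state)
```

The totals reproduce the 1788 and 2524 deaths, the 736 additional deaths, and the 41% increase stated in the text.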
We found several differences in the distribution of demographic characteristics between the approaches (Table 1). For example, the mean age at death via the single-source approach was 42.4 years, which increased to 45.8 years via the multiple-source approach (an increase of approximately 3.5 years). This change was likely driven by the multiple-source approach identifying a greater percentage of older people with SCD, as made evident by the higher percentage of people aged 51 to 75 years (35.1% vs 29.4%) and ≥76 years (5.3% vs 2.1%) in the multiple-source approach.
Table 1. Demographic characteristics of decedents with sickle cell disease (SCD) at the time of death, by death ascertainment approach, California and Georgia, 2004-2019 a
a Single-source approaches rely on death certificates alone to identify people with SCD who died, while multiple-source approaches first identify those with SCD and then link them to death certificates to ascertain mortality status.
Includes Native American/American Indian, missing, unknown, and “other specified,” as reported on the death record.
Includes free-standing birthing centers and other long-term care facilities in Georgia and hospice in California.
When compared with the single-source approach, the multiple-source approach identified a greater percentage of female (54.4% vs 50.3%) and Hispanic (3.4% vs 1.7%) decedents. The single-source approach resulted in a higher percentage of missing data for education (49.6% in single source vs 42.7% in multiple source). Similarly, the percentage of records missing data for the rural–urban indicator was greater in the single-source approach (4.2%) than in the multiple-source approach (0.5%). The single-source approach indicated a higher percentage of decedents having an autopsy performed (17.3% vs 13.6% in multiple source), although the multiple-source approach indicated a greater percentage of records missing data on autopsy status (50.0%) as compared with the single-source approach (46.9%). Finally, the distribution of data for place of death differed between the approaches. Most notably, the percentage of deaths occurring in a hospital was higher in the single-source approach than in the multiple-source approach (58.3% vs 55.7%), whereas the percentage occurring at the decedent’s residence was slightly higher in the multiple-source approach than in the single-source approach (15.9% vs 14.5%).
In the single-source approach, we found that 2.8% of deaths were caused by communicable diseases, 90.8% by noncommunicable diseases, and 2.4% by injuries, leaving 4.0% of deaths categorized as garbage codes (Table 2). In the multiple-source approach, we found that 5.1% of deaths were caused by communicable diseases, 80.4% by noncommunicable diseases, and 5.8% by injuries; otherwise, 8.5% of deaths were categorized as garbage codes and 0.2% were missing information on cause of death. The most notable difference in these distributions was the relative attribution of noncommunicable diseases as the underlying cause of death. Specifically, 65.8% of deaths were attributed to “other noncommunicable diseases” as the underlying cause in the single-source approach, as compared with 41.3% in the multiple-source approach. In the classification scheme described by Foreman et al, 37 the ICD-10-CM codes for SCD were assigned to that category. Accordingly, we found a greater percentage of underlying noncommunicable causes of death other than SCD in the multiple-source approach. For example, we found a higher percentage of cancer in the multiple-source approach as compared with the single-source approach (6.9% vs 3.7%). The percentage of garbage codes in the multiple-source approach was more than double the percentage in the single-source approach (8.5% vs 4.0%), most likely attributable to higher percentages of heart failure (1.2% vs 0.2%) and ill-defined codes (2.1% vs 0.1%) in the multiple-source approach.
Table 2. Distribution of underlying cause of death among decedents with sickle cell disease (SCD) at the time of death, by death ascertainment approach, California and Georgia, 2004-2019 a
a Single-source approaches rely on death certificates alone to identify people with SCD who died, while multiple-source approaches first identify those with SCD and then link them to death certificates to ascertain mortality status.
Discussion
The objective of this study was to evaluate how 2 major approaches for ascertaining death status among people with SCD may produce different epidemiologic understandings of SCD mortality patterns. We leveraged population-based surveillance systems in 2 states to evaluate the differences that arise in crude death counts, demographic profiles at time of death, and causes of mortality by using 2 approaches: first, a single-source approach, searching death certificates alone for people with SCD who died; second, a multiple-source approach, identifying a cohort of people with SCD and then linking them to death certificates. The results of our analysis highlight substantial differences between the approaches.
One major finding of this study is that the multiple-source approach identified a substantially higher number of deaths among people with SCD as compared with the single-source approach. The most likely explanation for this finding lies in the design of the single-source approach, in which death certificates alone contain a limited number of ICD-10-CM fields to identify people with SCD, as captured at a discrete point in time (ie, time of death); in contrast, the multiple-source approach relies on several integrated data sources spanning longer periods. The implications of this difference are notable for SCD mortality research. For example, many prior studies that relied on death certificates alone calculated SCD mortality rates15,16; the results of our analysis raise concerns about the accuracy of these rates given that the numerator (ie, the number of deaths in the SCD population) was likely undercounted. It is plausible that SCD mortality rates are higher than previously estimated. Knowledge of these higher rates may inform the creation of SCD mortality review boards to better explore this issue.
A second major finding of this study is that the average age of death among SCD decedents was several years older in the multiple-source approach than in the single-source approach. One explanation for this finding is that the single-source approach may be capturing data on people with the most clinically severe genotypes of SCD, who were dying at earlier ages because of SCD complications (as evidenced by the differences in the “other noncommunicable diseases” category). Conversely, the multiple-source approach is likely capturing more people with SCD than the single-source approach, including people who ultimately die of non-SCD underlying causes of death. Several studies have noted increases in age at death among people with SCD,10,12,15,16 and the results of our study indicate that longevity in the SCD population may be higher than previously estimated, a positive clinical finding.
Despite the multiple-source approach identifying more deaths and an older average age at time of death, this approach involved trade-offs. While the multiple-source approach had less missing demographic data than the single-source approach for education and the rural–urban indicator (although not for autopsy status), it produced a higher percentage of garbage codes for the underlying cause of death. The higher percentage of garbage codes may again be explained by the multiple-source approach identifying a broader population of people with SCD who were more likely to die of non-SCD causes as compared with those identified by the single-source approach. Additionally, the 8.5% of garbage codes identified in our multiple-source approach is close to the approximately 10% of garbage codes found overall in US death data. 37 Nonetheless, among valid cause-of-death codes, the multiple-source approach may yield important findings for the clinical treatment of people with SCD. For example, the findings on higher percentages of death due to cancer and cardiovascular disease in the multiple-source approach than in the single-source approach may indicate that more clinical attention (eg, timely preventive care and coordinated specialty care) is necessary for these conditions as efforts are made to increase life expectancy among people with SCD.
The results of our study underscore the importance of investing in and using population-based surveillance for SCD. We recognize that for many researchers the single-source approach is at times the best available data source for understanding SCD mortality. For example, the Centers for Disease Control and Prevention’s WONDER system (Wide-ranging Online Data for Epidemiologic Research) provides publicly accessible death records for the entire United States and ready access to examining SCD mortality for long periods. However, researchers using the WONDER system should consider the findings of our study when conducting and interpreting their analyses. Namely, the use of death certificates alone is likely undercounting people with SCD who died, underestimating age of death, and overlooking clinically important differences in causes of death.
We contend that, when possible, researchers should use integrated data sources to first identify a cohort of people with SCD and then link their records to mortality data to best calculate the number of people who died and to determine their causes of death. Additionally, because linked data systems such as the SCDC program contain more information than death certificates alone, these systems can help researchers better understand potential drivers of differential mortality patterns by looking back on health care encounters and other measures to identify possible opportunities for intervention. 27 Similarly, as others have noted, 21 integration of additional data sources such as prospective longitudinal clinical registries may provide more clinically accurate causes of death than what are available in administrative records such as death certificates. Such data sources may indicate the true cause of death and additional morbidities from medical records. They are especially important because SCD in and of itself does not cause death; instead, it is the complications of SCD that result in death among people with the condition. Additional linked data sources likely contain more robust cause-of-death information than the cause-of-death algorithm used in our study.
Limitations
Our study had at least 2 limitations. First, because SCDC programs do not exist in every state, we could not directly compare single- and multiple-source approaches for the entire United States. Therefore, generalizability of this study is limited to California and Georgia, although these states do have large populations of people with SCD. Second, the SCDC case definitions used in the multiple-source approach, while population based, may still have missed a small percentage of people with SCD in each state—most likely people who were either unaffiliated with care or born before implementation of newborn screening for SCD. Future research could incorporate additional SCDC states to evaluate whether the findings of this study hold true. Future research could leverage additional data sources (eg, linked electronic health records) to better understand and disaggregate the higher prevalence of garbage codes found in the multiple-source approach than in the single-source approach and obtain cause-of-death data from medical records. Finally, additional research should aim to examine, by genotype, the accuracy of ICD-10-CM SCD codes that appear on death certificates. For example, using the multiple-source approach, future research could compare how genotypes obtained from laboratory confirmatory testing or newborn screening compare with the genotypes coded on the death certificates.
Conclusion
Our study compared 2 approaches for identifying deaths among people with SCD: a single-source approach based on death certificate data alone and a multiple-source approach that first identified people with SCD through population-based surveillance and then linked them to death records. The multiple-source approach identified 41% more deaths than the single-source approach and revealed an older average age at death, a broader demographic distribution, and more diverse underlying causes of death, including higher percentages of deaths attributed to cancer and cardiovascular disease. These findings suggest that relying solely on death certificates may lead to undercounting deaths and underestimating age at death among people with SCD. While the multiple-source approach provides more comprehensive data, it also yields a higher proportion of nonspecific or garbage cause-of-death codes. Nonetheless, this approach offers a more complete picture of mortality patterns of SCD. As treatments for SCD continue to improve, robust integrated data systems such as those used in the SCDC program are essential for accurately tracking outcomes and informing public health and clinical strategies to improve SCD-related care.
Supplemental Material
Supplemental material, sj-docx-1-phr-10.1177_00333549251382847 for Two Approaches for Comparing Characteristics of Decedents With Sickle Cell Disease: Inconsistencies and Implications by Brandon K. Attell, James Marton, Brett Alfrey, Jhaqueline Valle, Sangeetha Lakshmanan, Jiajing Scarlette Shi, Mei Zhou and Angela B. Snyder in Public Health Reports®
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Support for this study was provided by the Centers for Disease Control and Prevention (grant DD-23-0002).
Supplemental Material
Supplemental material for this article is available online. The authors have provided these supplemental materials to give readers additional information about their work. These materials have not been edited or formatted by Public Health Reports’s scientific editors and, thus, may not conform to the guidelines of the AMA Manual of Style, 11th Edition.
