Abstract
Objectives:
Understanding the scope of the current opioid epidemic requires accurate counts of the number of opioid-involved drug overdose deaths. Given known errors and limitations in the reporting of these deaths, several studies have used statistical methods to develop estimates of the true number of opioid-involved overdose deaths. This study validates these procedures using a detailed county-level database of linked toxicology and vital records data.
Methods:
We extracted and linked toxicology and vital records data from Marion County, Indiana (Indianapolis), during a 6-year period (2011-2016). Using toxicology data as a criterion measure, we tested the validity of several imputation procedures, including the Ruhm regression-based imputation approach, for correcting the number of opioid-involved overdose deaths.
Results:
Estimates from the preferred regression-based imputation deviated from true opioid-involved overdose deaths by 3% and became more accurate during the study period (2011-2016). For example, in 2016, 231 opioid-involved overdose deaths were noted in the toxicology data, whereas the corresponding imputed estimate was 233 opioid-involved overdose deaths. A simple imputation approach, based on the share of opioid-involved overdose deaths among all drug overdose deaths for which the death certificate specified ≥1 drug, understated true opioid-involved overdose deaths by 5%.
Conclusions:
Commonly used imputation procedures produced estimates of the number of opioid-involved overdose deaths that are similar to the true number of opioid-involved overdose deaths obtained from toxicology data. Although future studies should examine whether these results extend beyond the geographic area covered in our data set, our findings support the continued use of these imputation procedures to quantify the extent of the opioid epidemic.
The United States is in the midst of an overdose epidemic, marked by drastic increases in the number of drug-related deaths in recent years. 1 Opioid-involved overdose deaths account for most overdose deaths; synthetic opioids, in particular, are largely responsible for growing numbers of overdose deaths. 1,2 However, opioid-involved overdose deaths are believed to be undercounted because, nationally, many drug overdose death certificates do not specify a drug. 3 The National Vital Statistics System derives the official number of overdose deaths attributed to each drug by using data from death certificates; these data specify the drug by using 4-digit International Classification of Diseases, 10th Revision (ICD-10) codes. 4-6 However, in 20% to 25% of overdose deaths, the drugs involved are categorized as unspecified (ICD-10 code T50.9). 7,8 In addition, even for deaths in which the drugs involved are specified using ICD-10 codes listed on death certificates, this list may be incomplete. Together, these problems lead to underreporting of drug-specified overdose deaths and unknown measurement error at the individual level. Measurement error in the number of drug-specified overdose deaths may bias the results of studies that evaluate the effectiveness of interventions to lower the number of drug overdose deaths, such as prescription drug monitoring programs and abuse-deterrent reformulation of prescription opioids. Accurate estimates of the number of drug-specified overdose deaths, especially opioid-involved deaths, are critically important to understanding the scope of the opioid epidemic, its evolution, and, in turn, the allocation of resources to address drug misuse. 9
Imputing the missing proportions of drug-specified overdose deaths is one way of generating corrected numbers of drug-involved overdose deaths. Ruhm 7,8 used a regression-based imputation procedure to derive corrected estimates of opioid-involved overdose deaths. Ruhm used data from the subset of drug overdose death certificates in which at least 1 drug was specified (ie, a drug-specified sample) to impute the number of opioid-involved overdose deaths when no drugs were specified (ie, a drug-unspecified sample). Ruhm predicted opioid involvement with probit regression models that included explanatory variables such as the decedent’s sex, race/ethnicity, marital status, education, day of the week of death, location of death, sex–race interactions, and several characteristics of the county of death (eg, poverty rates, education shares, proportion of households headed by females, median income, population density, and number of physicians per 1000 persons). Separate estimations were run for each year (2008-2014 7 and 1999-2015 3) to allow for time-varying predictors. The estimated coefficients were then used to predict the likelihood of opioid involvement for persons with a drug-unspecified overdose death. Another imputation approach, by Buchanich et al, simply uses the share of opioid-involved overdose deaths in the drug-specified sample to extrapolate the share of opioid-involved overdose deaths in the drug-unspecified sample. 10
These estimates suggest that many states have dramatically underreported the number of opioid-involved overdose deaths. As noted by Ruhm, 3,7 the imputation approach assumes that persons who overdose and whose death certificate specifies at least 1 drug are not systematically more or less likely to have an opioid-involved overdose than persons with no drugs specified on their death certificates, conditional on characteristics such as age, race/ethnicity, and education. Similarly, imputing the share of opioid-involved overdose deaths using only data on drug-specified overdose deaths assumes that the proportion of opioid-involved overdose deaths is the same across drug-specified and drug-unspecified overdose deaths. However, the data used in these imputation approaches do not allow model assumptions to be tested or corrected estimates to be validated.
This study addresses these limitations and contributes to the growing literature on underreporting of the number of opioid-involved overdose deaths. By merging data from individual-level toxicology reports with death certificate data during a multiyear period, we tested the validity of Ruhm’s imputation procedure 3,6 and the approach used by Buchanich et al. 10 For the purposes of this validation, we used toxicology and vital records data from the Marion County Coroner’s Office (MCCO), which provide a criterion measure of substances involved in overdose deaths and, as a result, a unique opportunity to test the validity of these imputation approaches.
Methods
Data Sources and Sample
Through a collaboration with the MCCO, we extracted death certificate data and accompanying toxicology records for all accidental poisoning deaths in Marion County, Indiana, from 2011 through 2016. 11 Marion County is the largest county in Indiana (with a 2017 population of 940 000), and its county seat, Indianapolis, is the state capital. From death certificate data, we extracted data on the cause of death from ICD-10 codes that were recorded by the Indiana Death Registration System and that indicated primary and contributing causes of death. We determined accidental drug poisoning for deaths that were assigned ICD-10 codes X40-X44. In addition, we determined opioid-involved overdose deaths through ICD-10 codes indicating opioids as a contributing cause of death, including ICD-10 codes T40.0-T40.4 or T40.6. Finally, we recorded whether an accidental drug poisoning death was coded as T50.9 (“unspecified”); this code indicates a death caused by polysubstance use, but no single drug is specified on the death certificate. Marion County, similar to Indiana, has a high rate of drug-unspecified accidental poisoning deaths, resulting in an undercounting of opioid-involved overdose deaths. 12,13 In 2018, Indiana passed legislation to address the high number of drug-unspecified overdose deaths through the provision of training and education for county coroners and clearer standards for death investigation and reporting in cases of accidental overdose. 14 However, this legislation was not in effect at the time of our investigation. Death certificates also provided data on age, race/ethnicity, sex, marital status, education, ZIP code, and time (ie, week, month, year) of death.
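The certificate-level coding rules above can be sketched as a small classification function. This is a hypothetical illustration, not the study's coding procedure: the function names are invented, and treating any T36-T50 code other than T50.9 as a "specified" drug is our assumption about how "at least 1 drug specified" is operationalized.

```python
# Hypothetical sketch of the ICD-10 rules described in the text (not the study's code).
ACCIDENTAL_POISONING = {"X40", "X41", "X42", "X43", "X44"}
OPIOID_CODES = {"T40.0", "T40.1", "T40.2", "T40.3", "T40.4", "T40.6"}
UNSPECIFIED_CODE = "T50.9"


def is_specified_drug_code(code: str) -> bool:
    """Assumption: any T36-T50 code except T50.9 names a specific drug class."""
    if len(code) < 3 or code[0] != "T" or not code[1:3].isdigit():
        return False
    return 36 <= int(code[1:3]) <= 50 and code != UNSPECIFIED_CODE


def classify_certificate(underlying: str, contributing: list) -> dict:
    """Flags for one death certificate (underlying cause + contributing-cause codes)."""
    accidental = underlying[:3] in ACCIDENTAL_POISONING
    return {
        "accidental_drug_poisoning": accidental,
        "drug_specified": accidental and any(map(is_specified_drug_code, contributing)),
        "opioid_on_certificate": accidental and any(c in OPIOID_CODES for c in contributing),
        "coded_unspecified": accidental and UNSPECIFIED_CODE in contributing,
    }


# Example: accidental poisoning listing heroin (T40.1) together with T50.9.
print(classify_certificate("X42", ["T40.1", "T50.9"]))
```

A certificate carrying only T50.9 would be flagged as an accidental poisoning but not as drug-specified, which is the group the imputation procedures target.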
From toxicology screening reports, we recorded substances detected at threshold levels. The procedures for manual abstraction and coding of toxicology reports are detailed elsewhere. 12,15,16 Two researchers (P.H. and E.M.) coded data from death certificates and toxicology reports. A third senior reviewer (B.R.R.) reviewed entries at random to check for accuracy. This study was exempt from review by the Indiana University Institutional Review Board, per the university’s policy on research on decedents. The final sample included 1169 matched death certificates and toxicology reports from accidental overdose deaths occurring during 2011-2016.
Measures
Our primary outcome measure was whether an overdose death was specified to have opioid involvement. We defined a drug-specified overdose death as a death for which ICD-10 codes on the death certificate specified at least 1 drug. We defined a drug-unspecified overdose death as a death for which no single drug was listed on the death certificate as a primary or contributing cause of death. However, we still coded these deaths as accidental drug poisoning deaths based on death investigations conducted by the MCCO. We determined opioid involvement by positive detection of opioids on toxicology tests using the DetectiMed (American Institute of Toxicology, Inc, Indianapolis, IN) blood panel thresholds set by the testing agencies (AIT Laboratories, Denton, TX; NMS Labs, Horsham, PA). We based opioid identification on detection of the following substances: codeine, hydrocodone, hydromorphone, morphine, oxycodone, and oxymorphone. These drugs composed 86% of total opioid prescription dispensing during 2011-2016 in Indiana. 17 We used data on the decedent’s demographic characteristics, including age, sex, race/ethnicity, and circumstances of the death (ie, place of death), to predict opioid involvement among drug-specified overdose deaths and then to extrapolate the number of opioid-involved overdose deaths among drug-unspecified overdose deaths.
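The toxicology criterion reduces to a set check over the substances listed above. A minimal sketch, assuming each toxicology report has already been reduced to a list of analyte names detected at or above the laboratory threshold (that input format and the function name are hypothetical):

```python
# Opioids that defined the toxicology criterion in this study.
CRITERION_OPIOIDS = frozenset(
    {"codeine", "hydrocodone", "hydromorphone", "morphine", "oxycodone", "oxymorphone"}
)


def toxicology_opioid_involved(detected_at_threshold) -> bool:
    """True if any criterion opioid appears among the above-threshold analytes.

    `detected_at_threshold` is assumed to be an iterable of analyte names already
    filtered to threshold-level detections; names are normalized for case and spacing.
    """
    normalized = {name.strip().lower() for name in detected_at_threshold}
    return not CRITERION_OPIOIDS.isdisjoint(normalized)


print(toxicology_opioid_involved(["Morphine ", "ethanol"]))  # True
print(toxicology_opioid_involved(["cocaine"]))               # False
```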
Statistical Analysis
We broadly replicated Ruhm’s approach by estimating a probit model (ie, regression for binary data with normally distributed error terms) predicting opioid involvement in drug-specified overdose deaths. Our replication differed from Ruhm’s approach in 4 ways. First, because our data were limited to 1 county, we did not include county-level characteristics. Accordingly, our predictors included binary controls for the decedent’s sex, race/ethnicity, marital status, age, education, place of death, and time (ie, day of the week, month, and year) of death. Second, because few decedents were classified as Hispanic or “other non-white,” we included interactions only between black race and sex instead of the more detailed sex and race/ethnicity interactions used by Ruhm. Third, we conducted our estimates at the county level rather than at the state level, as Ruhm did. Fourth, because our sample size was smaller than Ruhm’s, we included year-specific interactions for selected variables that were most predictive of opioid involvement, instead of the year-specific probits estimated by Ruhm. As reported in the Results, this sparser imputation model nonetheless performed well, providing further evidence of the approach’s suitability for corrections within a single county. Next, we used the estimated probit coefficients to predict the probability of opioid involvement in overdose deaths in the drug-unspecified sample. We assigned 1 (opioid involvement) to predicted probabilities ≥0.5 and 0 (no opioid involvement) to predicted probabilities <0.5. Summing these binary assignments gave the estimated number of opioid-involved overdose deaths in the drug-unspecified sample.
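The probit step and the 0.5 classification rule can be sketched without external libraries. This is a minimal illustration, not the Stata estimation used in the study: the model is fit by plain gradient ascent on the probit log-likelihood, covariates are supplied as lists of numbers with a leading 1 for the intercept, the function names are hypothetical, and the data are a toy sample rather than study data.

```python
import math


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_probit(X, y, lr=0.5, iters=2000):
    """Maximize the probit log-likelihood by gradient ascent.

    X: list of covariate rows (first element 1.0 for the intercept).
    y: list of 0/1 outcomes (1 = opioid involvement).
    """
    k, n = len(X[0]), len(X)
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for xi, yi in zip(X, y):
            z = sum(b * x for b, x in zip(beta, xi))
            p = min(max(norm_cdf(z), 1e-12), 1.0 - 1e-12)  # clamp to avoid division by zero
            # derivative of y*log(p) + (1-y)*log(1-p) with respect to z
            w = norm_pdf(z) * (yi / p - (1 - yi) / (1.0 - p))
            for j in range(k):
                grad[j] += w * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def predict_prob(beta, X):
    return [norm_cdf(sum(b * x for b, x in zip(beta, xi))) for xi in X]


def imputed_opioid_count(probs, cutoff=0.5):
    """Assign 1 when p >= cutoff, else 0, and sum the ones."""
    return sum(1 for p in probs if p >= cutoff)


# Toy drug-specified sample: intercept + male indicator.
X_spec = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
y_spec = [1] * 3 + [0] * 7 + [1] * 8 + [0] * 2   # 30% of females, 80% of males
beta = fit_probit(X_spec, y_spec)

# Toy drug-unspecified sample: 4 females, 6 males.
X_unspec = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 6
print(imputed_opioid_count(predict_prob(beta, X_unspec)))  # 6
```

Because the toy model is saturated, the fitted probabilities equal the group shares (0.3 and 0.8), so only the 6 males cross the 0.5 cutoff.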
We tested 2 specifications of Ruhm’s imputation approach. The first specification included year-fixed effects but pooled the effects of decedent characteristics across years (Imputation A). The second specification allowed the effects of these characteristics to vary by year (Imputation B). Imputation B is in line with Ruhm, who estimated his model separately for each year, essentially introducing interactions by year for all predictors.
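The difference between the 2 specifications comes down to how the covariate vector is built. A hypothetical sketch, assuming decedent characteristics are stored as 0/1 indicators and that the earliest year is the omitted reference category (both choices, and the variable names, are ours rather than the study's):

```python
def design_row(person, years, interacted=(), by_year=False):
    """Build one covariate row.

    person: dict of 0/1 indicators plus an integer "year".
    Imputation A (by_year=False): intercept + indicators + year dummies.
    Imputation B (by_year=True): additionally, each variable in `interacted`
    multiplied by each year dummy, so its effect can differ by year.
    """
    indicators = sorted(k for k in person if k != "year")
    row = [1.0] + [float(person[k]) for k in indicators]
    # Year fixed effects; years[0] is the omitted reference year.
    dummies = [1.0 if person["year"] == yr else 0.0 for yr in years[1:]]
    row += dummies
    if by_year:
        for var in interacted:
            row += [float(person[var]) * d for d in dummies]
    return row


years = list(range(2011, 2017))
decedent = {"male": 1, "black": 0, "died_at_home": 1, "year": 2013}
row_a = design_row(decedent, years)
row_b = design_row(decedent, years, interacted=("male", "black"), by_year=True)
print(len(row_a), len(row_b))  # 9 19
```

Interacting only selected variables keeps the number of parameters manageable in a small sample, whereas fully separate year-specific models multiply every coefficient by the number of years.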
We also replicated the simple imputation approach used by Buchanich et al, 10 which imputed the number of opioid-involved overdose deaths in the drug-unspecified sample based on the share of opioid-involved overdose deaths in the drug-specified sample. We compared estimates based on Ruhm’s imputation with this imputation procedure.
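The simple share-based imputation is one line of arithmetic per year: opioid-involved deaths in the drug-specified sample plus that sample's opioid share applied to the drug-unspecified deaths. A sketch with a hypothetical function name; the example uses the pooled 2011-2016 counts reported in the Results (497 drug-specified deaths, 404 opioid-involved; 672 drug-unspecified), whereas the study applied annual shares.

```python
def simple_imputation(counts_by_year):
    """Share-based imputation of opioid-involved overdose deaths.

    counts_by_year maps a year label to a tuple
    (specified_total, specified_opioid, unspecified_total).
    Returns the imputed total (specified + unspecified) per year, unrounded.
    """
    imputed = {}
    for year, (spec_total, spec_opioid, unspec_total) in counts_by_year.items():
        opioid_share = spec_opioid / spec_total
        imputed[year] = spec_opioid + opioid_share * unspec_total
    return imputed


pooled = simple_imputation({"2011-2016": (497, 404, 672)})
print(round(pooled["2011-2016"]))  # 950
```

Pooled over all years, 404 + 672 × (404/497) ≈ 950.3, which rounds to the 950 reported for this approach in the Results.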
We used these 3 imputation procedures to predict the number of opioid-involved overdose deaths among the 672 drug-unspecified overdose deaths. 18 We compared the total predicted number of opioid-involved overdose deaths with the true number (determined from toxicology reports) and with the number of opioid-involved overdose deaths in the drug-specified sample only, which is the commonly reported measure that the imputation procedures seek to correct.
In the absence of toxicology reports confirming the involvement of specific drugs in drug-unspecified overdose deaths, it is difficult to know whether the assumptions underlying the imputation are accurate. As Ruhm 3,7 noted, the approach assumes that the predictors of opioid involvement in overdose deaths are the same in the drug-specified sample as in the drug-unspecified sample. For example, this assumption would be violated if males were more likely than females to have opioid-involved overdose deaths in the drug-specified sample but not in the drug-unspecified sample. Systematic differences between persons in the drug-specified and drug-unspecified samples could, in turn, lead to errors in predicting the number of opioid-involved overdose deaths in the drug-unspecified sample.
Similarly, the simple imputation approach by Buchanich et al 10 assumes that the share of opioid-involved overdose deaths is the same in the drug-specified and drug-unspecified samples. This assumption would be violated if, for example, persons who died of an overdose and whose death certificate specified no drugs were systematically less likely to have opioid involvement than persons who died of an overdose and whose death certificate specified at least 1 drug. Our data set enabled us to validate these imputation strategies by comparing substances detected through toxicology reports for all overdose deaths, not just those in which at least 1 drug was specified on the death certificate. We reported the estimated number of opioid-involved overdose deaths using this imputation and compared it with the true number of opioid-involved overdose deaths from toxicology data for each year.
Finally, we conducted 3 sensitivity tests to assess the robustness of the imputation procedures. First, we estimated both Imputation A and Imputation B using logistic regression instead of probit regression. Second, we calculated the number of opioid-involved overdose deaths by summing the predicted probabilities, without first assigning 1 to predicted probabilities ≥0.5 and 0 otherwise. Third, we tested 2 alternative optimal cut points for classifying an overdose death as involving opioids, derived by maximizing the sum of model sensitivity and specificity (the Youden J statistic): predicted probabilities >0.82 (from Imputation A) or >0.66 (from Imputation B). We used Stata version 16.0 for all analyses and considered t tests with P < .10 to be significant. 18
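The second and third sensitivity tests change only how predicted probabilities are turned into a count. A minimal sketch with hypothetical function names and toy probabilities: one counter sums binary assignments at a cutoff, one sums the probabilities directly, and a Youden J search picks the cutoff c (predicting opioid involvement when p > c) that maximizes sensitivity + specificity - 1 over the observed probability values.

```python
def count_by_cutoff(probs, cutoff=0.5):
    """Binary assignment: 1 if p >= cutoff, else 0; return the number of ones."""
    return sum(1 for p in probs if p >= cutoff)


def count_by_sum(probs):
    """Additive alternative: expected count, i.e., the sum of probabilities."""
    return sum(probs)


def youden_cutoff(probs, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1,
    classifying p > cutoff as opioid involvement; labels are 0/1."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    for c in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p > c and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p <= c and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = c, j
    return best_cutoff, best_j


probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 1, 0, 0, 0]
print(count_by_cutoff(probs), round(count_by_sum(probs), 2))  # 4 3.7
print(youden_cutoff(probs, labels))  # (0.6, 1.0)
```

Summing probabilities never rounds an individual death up or down, so it can diverge from the binary count whenever many probabilities sit near but above the cutoff.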
Results
In both Imputation A and Imputation B, the most significant predictors of opioid involvement in overdose deaths were male sex (P = .07), black race (P = .01), age 0-30 (P = .07), and dying at home (P = .03) (results shown in appendix available at https://iu.box.com/s/qgfs63k8ynw7u2dsqh9s7v2205z9hon8).
Of 1169 drug overdose deaths, the drugs involved were specified in 497 deaths (42.5%), of which 404 (81.3%) involved opioids (Table 1). Of the remaining 672 drug-unspecified overdose deaths, toxicology reports showed that 591 (87.9%) deaths involved opioids. Therefore, across both samples, toxicology reports showed 995 opioid-involved overdose deaths in Marion County during 2011-2016, and the official estimates underestimated the number of deaths by 59% (404 vs 995).
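The percentages in this paragraph follow directly from the reported counts; a quick arithmetic check:

```python
total_deaths = 1169
specified = 497            # >=1 drug specified on the death certificate
specified_opioid = 404     # opioid-involved per death certificate
unspecified_opioid = 591   # opioid-involved per toxicology reports

unspecified = total_deaths - specified                 # 672
true_opioid = specified_opioid + unspecified_opioid    # 995
undercount = (true_opioid - specified_opioid) / true_opioid

print(f"{specified / total_deaths:.1%}")          # 42.5%
print(f"{specified_opioid / specified:.1%}")      # 81.3%
print(f"{unspecified_opioid / unspecified:.1%}")  # 87.9%
print(f"{undercount:.1%}")                         # 59.4%
```

The last value is the 59% undercount (404 vs 995), expressed relative to the toxicology-confirmed total.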
Table 1. Characteristics of decedents in opioid-involved overdose deaths (n = 995) among 1169 drug overdose deaths, by whether or not a drug was specified on the death certificate, Marion County, Indiana, 2011-2016^a
a Data source: Toxicology screening reports and death certificates from the Marion County Coroner’s Office. 11 Values are percentages unless otherwise noted.
b Sample consists of persons who died of drug overdose and whose death certificate specified ≥1 drug; for this sample, evidence of opioid involvement was based on death certificate data.
c Sample consists of persons whose death certificate did not specify any drugs; for this sample, evidence of opioid involvement was based on data provided in toxicology reports of the Marion County Coroner’s Office.
Imputation A and Imputation B correctly classified 82.9% and 82.7% of deaths, respectively, in the drug-specified sample (ie, the sample on which they were estimated). Both imputations were driven by high sensitivity (98.0% and 97.0%, respectively) and low specificity (17.2% and 26.9%, respectively). The area under the receiver operating characteristic curve (0.62) summarized the diagnostic performance of the imputation (Table 2).
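These performance measures come from the 2 × 2 classification table in the drug-specified sample. A sketch with hypothetical function names; the example counts (396 true positives, 8 false negatives, 16 true negatives, 77 false positives) are not reported in the article but were back-calculated so that, with 404 opioid-involved and 93 non-opioid deaths, they reproduce the Imputation A percentages. The AUC helper uses the rank-based (Mann-Whitney) definition on continuous predicted probabilities.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auc_rank(probs, labels):
    """AUC as the probability that a random positive outranks a random negative
    (ties count one-half). O(n_pos * n_neg); fine for small samples."""
    pos = [p for p, y in zip(probs, labels) if y == 1]
    neg = [p for p, y in zip(probs, labels) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


m = classification_metrics(tp=396, fn=8, tn=16, fp=77)
print({k: round(v, 3) for k, v in m.items()})
# {'sensitivity': 0.98, 'specificity': 0.172, 'accuracy': 0.829}
```

With 81% of the drug-specified sample opioid-involved, a classifier that labels nearly everything as opioid-involved reaches high accuracy despite low specificity.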
Table 2. Performance measures of the Ruhm probit imputation procedure to predict opioid involvement in a drug overdose death in a sample of persons who died of drug overdose and whose death certificate specified ≥1 drug (n = 497), Marion County, Indiana, 2011-2016^a,b
a Data source: Toxicology screening reports and death certificates from the Marion County Coroner’s Office. 11
b The Ruhm probit imputation procedure was designed to predict opioid involvement in a drug overdose, conditional on a person’s characteristics. 3,7 For this sample, evidence of opioid involvement was based on death certificate data. Estimations included day of the week of death, month of death, and year-fixed effects. Coefficients available in online appendix at https://iu.box.com/s/qgfs63k8ynw7u2dsqh9s7v2205z9hon8.
d Ruhm Imputation B allowed the effect of individual characteristics to vary by year.
e Number of true opioid-involved overdose deaths from toxicology data.
Overall, Imputation B, which included by-year interactions on selected variables, predicted 1024 opioid-involved overdose deaths during 2011-2016 (Table 3). Relative to the 995 true opioid-involved overdose deaths (determined from toxicology reports), this estimate overstated the number of opioid-involved overdose deaths by 3%. Imputation A, which excluded by-year interactions, overestimated the number of opioid-involved overdose deaths by 5% (1046 deaths vs 995 deaths). The simple imputation approach predicted 950 opioid-involved overdose deaths during 2011-2016, understating the true number by 5%. As the Figure shows, the imputed counts track the true number of opioid-involved overdose deaths over time.
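The error figures here are signed differences relative to the toxicology-confirmed total, the denominator described in footnote c of Table 3. A small sketch reproducing them from the reported totals; the function name is hypothetical:

```python
def percent_error(imputed: float, true_count: float) -> float:
    """Signed error relative to the true count; positive means overstatement."""
    return 100.0 * (imputed - true_count) / true_count


TRUE_TOTAL = 995  # opioid-involved deaths per toxicology reports, 2011-2016
estimates = {
    "Imputation B": 1024,
    "Imputation A": 1046,
    "Simple imputation": 950,
    "Death certificates only": 404,
}
for label, count in estimates.items():
    print(f"{label}: {percent_error(count, TRUE_TOTAL):+.1f}%")
# Imputation B: +2.9%
# Imputation A: +5.1%
# Simple imputation: -4.5%
# Death certificates only: -59.4%
```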

Figure. Number of opioid-involved overdose deaths determined by using true toxicology data vs imputation, Marion County, Indiana, 2011-2016. Data from death certificates and accompanying toxicology records for all accidental poisoning deaths in Marion County, Indiana, from 2011 through 2016. Data source: Ray et al. 11 Specified sample includes drug overdose deaths with at least 1 drug specified on the death certificate. Unspecified sample includes drug overdose deaths with no drug specified on the death certificate. Ruhm’s Imputation B approach included year-fixed effects and the significant predictors of opioid involvement in overdose death (male sex [P = .07], black race [P = .01], age 0-30 [P = .07], and dying at home [P = .03], where P < .10 was considered significant), with the effect of these characteristics varying by year.
Table 3. Number of opioid-involved overdose deaths determined by using data from toxicology reports and 3 imputation procedures, Marion County, Indiana, 2011-2016^a,b
Abbreviation: pp, percentage point.
a Data source: Toxicology screening reports and death certificates from the Marion County Coroner’s Office. 11
b Ruhm Imputation A and Imputation B correspond to Imputations A and B in Table 2. 3,7 Ruhm Imputation A included year-fixed effects but pooled the effects of individual characteristics across years 2011-2016. Ruhm Imputation B allowed the effect of individual characteristics to vary by year. Simple imputation refers to calculating the number of opioid-involved overdose deaths by extrapolating the annual share of opioid-involved overdose deaths in the drug-specified sample to the drug-unspecified sample. A drug-specified overdose death was defined as a death for which International Classification of Diseases, 10th Revision codes on the death certificate specified at least 1 drug. 6 A drug-unspecified overdose death was defined as a death for which no single drug was listed on the death certificate as a primary or contributing cause of death.
c All values for error were calculated by using the counts for the specified sample and unspecified sample (in the first column) as the denominator.
The error in Imputation B ranged from 0% in 2015 to 10% in 2011 (Table 3), and the estimates were most accurate during 2014-2016. Our preferred imputation approach understated the rise in opioid-involved overdose deaths during 2011-2016 by 15 percentage points, whereas the standard approach, which used data from the drug-specified sample only, overstated the change by 14 percentage points. The simple imputation approach, by contrast, deviated only 2%-3% from toxicology reports in early years (2011-2012) but deviated by 3%-9% in later years (2013-2016).
The imputation approach was robust to alternative parametric specifications. In the logistic regression–based versions of Imputation A and Imputation B, the number of opioid-involved overdose deaths deviated from the true number by only 3%, and accuracy improved in recent years (available in online Table 2 at https://iu.box.com/s/qgfs63k8ynw7u2dsqh9s7v2205z9hon8). When the number of opioid-involved overdose deaths was imputed by summing the probit regression–based predicted probabilities, the deviation from the toxicology data was slightly larger than when the number was calculated by summing binary assignments of predicted probabilities. This additive probability-based imputation underestimated the true number of opioid-involved overdose deaths by 4%-10%, a deviation similar in magnitude to that of the simple imputation (available in online Table A3 at https://iu.box.com/s/qgfs63k8ynw7u2dsqh9s7v2205z9hon8).
Using the alternative optimal cut point of predicted probabilities >0.66 to classify an overdose death as involving opioids, results were similar to those produced by our preferred imputation approach. However, when we classified predicted probabilities >0.82 as involving opioids, the imputation performed poorly (available in online Table A4 at https://iu.box.com/s/qgfs63k8ynw7u2dsqh9s7v2205z9hon8). With both alternative thresholds, imputation accuracy did not improve monotonically over time.
Discussion
Findings from our 6-year study suggest that an imputation strategy based on methods described by Ruhm 3,7 produces accurate estimates of opioid-involved overdose deaths, overestimating the true number by only 3%. Both imputations were effective at detecting opioid involvement (sensitivity ≥97%) but less effective at detecting opioid absence (specificity of 17.2% in the specification without by-year interactions and 26.9% in the specification with by-year interactions). Given this asymmetry, the imputation approaches tended to overestimate the number of opioid-involved overdose deaths in the drug-specified sample. In the full sample, the performance of the 2 specifications was similar, with Imputation B performing marginally better than Imputation A. A simpler approach, proposed by Buchanich et al, 10 underestimated the true number of opioid-involved overdose deaths by a similarly small degree (5%). Overall, Imputation B, which allowed the effect of key predictors to vary by year, performed better, more closely resembled Ruhm’s original approach, and was our preferred specification.
The findings from our study support the validity of imputation strategies to address the undercounting of opioid-involved overdose deaths. In response to high proportions of drug-unspecified overdose deaths, some states (eg, Kentucky and Indiana) have enacted legislation to require toxicology testing in cases of suspected drug overdose. 14,19 However, whether these efforts will reduce high numbers of drug-unspecified overdose deaths remains to be seen. 20 Currently, the National Association of Medical Examiners recommends toxicology testing in all suspected overdose deaths, traumatic deaths, and deaths in which the cause of death is unknown. 21 This group also recommends noting all substances on toxicology reports in the event of polysubstance use. Locally, these efforts are likely to be important for accurate surveillance of opioid-involved overdose deaths. However, because variability in local reporting often limits broader surveillance of opioid-involved overdose deaths, imputation strategies provide one way to correct for drug-unspecified overdose deaths in national data. Imputation strategies are especially useful in jurisdictions with high proportions of drug-unspecified overdose deaths, where more accurate tracking of fentanyl-involved deaths (eg, via advances in toxicology testing) may artificially inflate apparent growth in the number of opioid-involved overdose deaths. 11 Corrected estimates of opioid-involved overdose deaths can more accurately capture the extent and evolution of the opioid epidemic and, thus, its public health importance.
Limitations
Our analysis had several limitations. Our matched sample of death certificates and toxicology reports, which was smaller than the sample used by Ruhm, did not allow year-specific probits such as those conducted by Ruhm. 3,7 We addressed this limitation by including by-year interactions on key variables. Estimates showed that our approach performed well, which provides confidence in our alternative. A second implication of our smaller sample size was that our study was underpowered, which prevented us from formally testing for differences in the estimated number of opioid-involved overdose deaths across imputation approaches. For example, the 3% difference we observed may not have been significant. Significance testing requires accounting for both sampling error in means across samples and standard errors in estimating the probits required for imputation. The most straightforward approach would be a bootstrap procedure that, within each bootstrap replication, first estimates a probit model predicting opioid involvement in the drug-specified sample and then applies the estimated coefficients to estimate the number of opioid-involved overdose deaths in the drug-unspecified sample. In our study, even if the 3% difference were significant using a bootstrap procedure, the small magnitude of the difference would have little practical implication. Finally, we validated the imputation procedure in a single region of the United States; as such, our results may not extend to other regions. Validation efforts should be extended to broader geographic regions.
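The bootstrap procedure described above can be sketched as follows. This is a hedged illustration of an analysis the study did not run: the function names and data layout are hypothetical, the drug-specified and drug-unspecified samples are resampled separately, and `estimate` must refit the imputation model on each resampled drug-specified sample (for the probit approach, fit the probit, predict for the resampled drug-unspecified deaths, and count). For speed, the example plugs in the share-based imputation on toy data, and the interval level (90%) mirrors the P < .10 threshold.

```python
import random


def bootstrap_interval(specified, unspecified, estimate, reps=1000, alpha=0.10, seed=0):
    """Percentile bootstrap interval for a two-step imputation estimate.

    specified: list of (covariates, opioid_flag) for drug-specified deaths.
    unspecified: list of covariates for drug-unspecified deaths.
    estimate: callable(specified, unspecified) -> total opioid-involved deaths;
    it should refit the imputation model on every resampled drug-specified sample.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        s = [rng.choice(specified) for _ in specified]      # resample with replacement
        u = [rng.choice(unspecified) for _ in unspecified]
        draws.append(estimate(s, u))
    draws.sort()
    lower = draws[int(round(alpha / 2 * reps))]
    upper = draws[int(round((1 - alpha / 2) * reps)) - 1]
    return lower, upper


def share_estimate(specified, unspecified):
    """Share-based imputation: specified opioid deaths + share x unspecified deaths."""
    opioid = sum(flag for _, flag in specified)
    return opioid + opioid / len(specified) * len(unspecified)


# Toy data: 8 of 10 drug-specified deaths opioid-involved; 10 drug-unspecified deaths.
spec = [((), 1)] * 8 + [((), 0)] * 2
unspec = [()] * 10
print(share_estimate(spec, unspec), bootstrap_interval(spec, unspec, share_estimate, reps=500))
```

Comparing imputation approaches formally would use the same loop, recording the difference between 2 estimators (or between an estimator and the toxicology count) in each replication.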
Conclusions
In a series of articles, Ruhm 3,7 provided an approach for correcting estimates of the number of opioid-involved overdose deaths. Although this approach has wide potential for application, it relies on assumptions that available national data are unable to test. Using a local data set that links toxicology data with vital records, we provided a validation of this approach. Our findings support the validity of the imputation approach, with a 3% error during a 6-year period, suggesting that imputation strategies are a useful tool for improving local, regional, and national surveillance of opioid-involved overdose deaths. Future research should investigate the validity of imputation approaches to derive the corrected number of overdose deaths involving other substances as well, particularly given growing polysubstance detection in opioid-involved overdose deaths. Accurately quantifying the extent of the opioid epidemic is critically important to properly allocate resources to address it.
Footnotes
Acknowledgment
The authors thank Philip Huynh and Elizabeth May for their research assistance in data collection.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Supplemental Material
Supplemental material for this article is available online.
