Abstract
Objectives:
Reliable methods are needed to monitor the public health impact of changing laws and perceptions about marijuana. Structured and free-text emergency department (ED) visit data offer an opportunity to monitor the impact of these changes in near-real time. Our objectives were to (1) generate and validate a syndromic case definition for ED visits potentially related to marijuana and (2) describe a method for doing so that was less resource intensive than traditional methods.
Methods:
We developed a syndromic case definition for ED visits potentially related to marijuana, applied it to BioSense 2.0 data from 15 hospitals in the Denver, Colorado, metropolitan area for the period September through October 2015, and manually reviewed each case to determine true positives and false positives. We used the number of visits identified by and the positive predictive value (PPV) for each search term and field to refine the definition for the second round of validation on data from February through March 2016.
Results:
Of 126 646 ED visits during the first period, 524 ED visit records matched ≥1 search term in the initial case definition (PPV, 92.7%). Of 140 932 ED visits during the second period, 698 ED visit records matched ≥1 search term in the revised case definition (PPV, 95.7%). After another revision, the final case definition contained 6 keywords for marijuana or derivatives and 5 diagnosis codes for cannabis use, abuse, dependence, poisoning, and lung disease.
Conclusions:
Our syndromic case definition and validation method for ED visits potentially related to marijuana could be used by other public health jurisdictions to monitor local trends and for other emerging concerns.
In the United States, state laws and cultural norms pertaining to marijuana are changing. In Colorado, voters approved the cultivation and possession of medical marijuana in 2001, the commercial sale of medical marijuana in 2010, and the sale of retail marijuana for adults aged ≥21 in 2014. 1 The impact of these changes on public health is unknown, and reliable methods to identify and interpret health trends related to these changes are still emerging. 2–4 Having a source of timely data and a broad case definition for health conditions potentially related to marijuana would help public health departments and other partners monitor trends in adverse health events.
Syndromic surveillance seeks to use existing health data in real or near-real time to analyze public health outcomes, rather than waiting for formal diagnosis data or survey results. One such method searches free-text and coded data in hospital emergency department (ED) visit records for keywords to identify patients with pertinent exposures or symptoms. Reported daily, these data provide an opportunity for timely identification and monitoring of adverse health outcomes related to a range of health exposures and conditions. Syndromic surveillance has been an objective of the Centers for Disease Control and Prevention (CDC) for several decades; in 2012, CDC redesigned its syndromic surveillance system (BioSense 2.0) to improve detection of emerging health concerns through trends in care seeking at EDs. 5,6 Federal incentives have been offered since October 2014 to encourage capable hospitals to report data from ED visits to local and state public health agencies. 7,8 Yet, although CDC has developed case definitions (or binning algorithms) for numerous syndromes, it has not developed any binning algorithms for drug-related ED visits. 9
Most syndromic surveillance studies have focused on assessing case detection methods or fields, evaluating whole surveillance systems, or comparing surveillance systems. 10–13 Several researchers have developed syndromic case definitions, but they have not evaluated the keywords or codes used in their definitions. 14–16 Traditionally, the process of validating new diagnostic or case detection methods includes the use of extensive chart reviews by ≥2 clinicians, direct clinical assessments, or comparisons with a preclassified data set that is based on gold standard case definitions. 17–22 Validations with these methods may accurately capture the number of true- and false-positive cases as well as true- and false-negative cases, but they require a gold standard case definition and can be resource intensive.
No gold standard currently exists for medical visits potentially related to marijuana. Diagnosis codes that describe cannabis use, dependence, or poisoning may approach that standard, but these codes are sometimes used for other substances (eg, synthetic marijuana) and are sometimes absent from BioSense 2.0. When diagnosis codes cannot be used as a gold standard, the cost and time necessary for an optimal validation can be impractical, particularly during outbreaks and for institutions that are not funded as research entities.
As a result, there is a public health need for a less costly and more efficient process for creating and validating a syndromic case definition for medical visits potentially related to marijuana and other substances. Our study had 2 objectives: (1) generate and validate a syndromic case definition for ED visits potentially related to marijuana and (2) develop and describe a method for doing so that is less resource intensive than traditional methods.
Methods
Data Source
Two health departments in the Denver metropolitan region (Denver Public Health and the Tri-County Health Department) serve 4 of Colorado’s most populous counties: Adams, Arapahoe, Denver, and Douglas. We extracted data from all ED records reported by 15 of 16 hospitals with EDs in the 4 counties during 2 periods (September 1 through October 31, 2015, and February 1 through March 31, 2016) from BioSense 2.0, focusing on demographic characteristics, date of treatment, and 5 fields useful for case detection: 2 structured data fields (diagnosis code and diagnosis text [ie, text corresponding to diagnosis code]) and 3 free-text fields (chief complaint, triage notes, and clinical impression).
Although the BioSense 2.0 system required that the diagnosis code, diagnosis text, and chief complaint fields be reported if available, it accepted blank entries. The system did not require entries in the triage notes and clinical impression fields. We processed the data in SAS Enterprise Guide 5.1, 23 removing duplicate records, converting all text and codes to lowercase, and searching the structured data and free-text fields for our search terms.
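The study performed these steps in SAS; the Python sketch below only illustrates the deduplication and lowercasing steps under stated assumptions. The field names (`visit_id`, `diagnosis_code`, and so on) are hypothetical, and treating two rows as duplicates when the visit ID and all 5 searchable fields are identical after lowercasing is an assumption, not a description of how BioSense 2.0 or the authors defined duplicates.

```python
from typing import Dict, List

# Hypothetical names for the 5 searchable fields described in the text.
SEARCH_FIELDS = (
    "diagnosis_code",
    "diagnosis_text",
    "chief_complaint",
    "triage_notes",
    "clinical_impression",
)


def preprocess(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Lowercase every searchable field and drop duplicate records.

    Missing or blank (None) fields become empty strings, because the feed
    accepted blank entries. Two records count as duplicates (assumption)
    when the visit ID and all lowercased fields are identical.
    """
    seen = set()
    cleaned = []
    for rec in records:
        normalized = {"visit_id": rec.get("visit_id") or ""}
        normalized.update({f: (rec.get(f) or "").lower() for f in SEARCH_FIELDS})
        signature = tuple(sorted(normalized.items()))
        if signature in seen:
            continue  # duplicate of a record already kept
        seen.add(signature)
        cleaned.append(normalized)
    return cleaned
```

Lowercasing before deduplication means that records differing only in capitalization collapse into one, which matches the order of operations a case-insensitive keyword search needs anyway.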
Initial Syndromic Case Definition
Before conducting our first search for ED visits potentially related to marijuana, we developed our initial syndromic case definition, using common terms, products, and street names for marijuana (eg, cannabis, THC [tetrahydrocannabinol], and edibles) as well as International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) and International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes for cannabis abuse, dependence, and poisoning. 24,25 We included diagnosis codes both with and without decimal points in our definition (eg, ICD-9-CM codes 304.31 and 30431). We excluded diagnosis codes for cannabis abuse/dependence in remission (ICD-9-CM codes 304.33 and 305.23 and ICD-10-CM code F12.21) and for underdosing of cannabis derivatives (ICD-10-CM code T40.7X6). We also excluded matches on phrases such as “denies marijuana,” “synthetic,” and “quit using marijuana,” as well as on unrelated words that happen to contain a keyword, such as “birthcontrol,” which contains the string “THC.”
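The matching rules above can be sketched in Python as follows. The term and code lists are a small illustrative subset, not the study's definition. Treating each ICD code as a prefix after stripping its decimal point, and blanking exclusion phrases out of the text before searching for keywords, are assumptions about one reasonable way to implement the rules, not a description of the authors' SAS code. Because blanking only the word "synthetic" would leave "marijuana" behind, the sketch blanks the full phrase "synthetic marijuana" instead.

```python
from typing import Iterable, Set

# Illustrative subsets only (hypothetical); NOT the study's full term lists.
KEYWORDS = ("marijuana", "cannabis", "thc", "edible")
EXCLUSION_PHRASES = (
    "denies marijuana",
    "quit using marijuana",
    "synthetic marijuana",
    "birthcontrol",  # contains the substring "thc"
)

# Codes are compared without decimal points, so "304.31" and "30431" are equivalent.
# Prefixes: ICD-9-CM 304.3 and 305.2; ICD-10-CM F12 and T40.7.
INCLUDED_CODE_PREFIXES = ("3043", "3052", "f12", "t407")
# Remission (304.33, 305.23, F12.21) and underdosing (T40.7X6).
EXCLUDED_CODE_PREFIXES = ("30433", "30523", "f1221", "t407x6")


def normalize_code(code: str) -> str:
    """Lowercase a diagnosis code and strip whitespace and its decimal point."""
    return code.strip().lower().replace(".", "")


def keyword_hits(text: str) -> Set[str]:
    """Return the keywords found in free text after blanking out exclusion phrases."""
    cleaned = text.lower()
    for phrase in EXCLUSION_PHRASES:
        cleaned = cleaned.replace(phrase, " ")
    return {kw for kw in KEYWORDS if kw in cleaned}


def code_hits(codes: Iterable[str]) -> Set[str]:
    """Return normalized codes under an included prefix that are not explicitly excluded."""
    hits = set()
    for raw in codes:
        code = normalize_code(raw)
        if code.startswith(EXCLUDED_CODE_PREFIXES):
            continue  # eg, cannabis dependence in remission
        if code.startswith(INCLUDED_CODE_PREFIXES):
            hits.add(code)
    return hits
```

For example, `keyword_hits("Pt denies marijuana, on birthcontrol")` returns an empty set: blanking "birthcontrol" first is what keeps the substring "thc" inside it from producing a false match.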
The syndromic case definition that we developed was not intended to address whether marijuana caused the symptoms or conditions that prompted the ED visit. Not only is research on relationships between marijuana exposure and acute or chronic outcomes incomplete, but the data available from BioSense 2.0 did not typically contain this type of information. Instead, our goal was to assess whether marijuana appeared to be temporally or medically related to ED visits.
Validations
In the absence of a gold standard definition for ED cases potentially related to marijuana and with the goal of limiting the use of resources, a single case reviewer (K.D., an epidemiologist) manually assessed all visit records obtained from BioSense 2.0 that matched ≥1 search term to determine true- and false-positive cases (Figure 1). We assigned cases to 1 of 3 categories: (1) potentially related to marijuana, (2) unrelated to marijuana, or (3) unclear. With these categories, we then defined the cases that were potentially related to marijuana as true-positive cases and those that were unrelated to marijuana or unclear as false-positive cases.
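The scoring rule can be written down explicitly, which makes its conservative choice visible: an "unclear" review counts against the definition exactly as an "unrelated" one does. The enum and function names below are hypothetical; the category counts in the usage lines are the first-round figures reported in the Results.

```python
from enum import Enum


class ReviewCategory(Enum):
    """The 3 categories the single reviewer could assign to a matched visit."""
    POTENTIALLY_RELATED = "potentially related to marijuana"
    UNRELATED = "unrelated to marijuana"
    UNCLEAR = "unclear"


def is_true_positive(category: ReviewCategory) -> bool:
    """Only 'potentially related' is a true positive; 'unclear' is scored as a false positive."""
    return category is ReviewCategory.POTENTIALLY_RELATED


# Usage with the first-round counts (486 related, 32 unrelated, 6 unclear).
first_round = (
    [ReviewCategory.POTENTIALLY_RELATED] * 486
    + [ReviewCategory.UNRELATED] * 32
    + [ReviewCategory.UNCLEAR] * 6
)
true_pos = sum(is_true_positive(c) for c in first_round)
false_pos = len(first_round) - true_pos
print(true_pos, false_pos)  # 486 38
```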

Figure 1. Method used to validate a syndromic case definition for detecting emergency department visits potentially related to marijuana in BioSense 2.0 data, Denver (Colorado) metropolitan region, September-October 2015 and February-March 2016. Data source: Centers for Disease Control and Prevention. National Syndromic Surveillance Program: BioSense background. 5 The actual prevalence of emergency department visits related to marijuana was unknown, so true-positive matches to the case definition were defined as those for which marijuana appeared to be temporally or medically involved in visit circumstances, based on manual record review.
The reviewer considered a visit to be potentially related to marijuana (and, therefore, a true-positive case) if it involved any of the following: a diagnosis code or diagnosis text related to cannabis or psychodysleptics without a contradicting free-text entry, a free-text entry containing a search term and a marijuana-related symptom or outcome (eg, THC hallucinations), a free-text entry indicating that patient symptoms began after marijuana exposure (eg, hallucinations after marijuana edible), a free-text entry specifying recent or regular patient use of marijuana, or a free-text entry describing patient use of marijuana because of or after the symptoms or condition that prompted the ED visit (eg, patient smokes marijuana to treat anxiety).
The reviewer considered a visit to be unrelated to marijuana (and, therefore, a false-positive case) if the free-text fields indicated that the ED visit was clearly not related to marijuana (eg, allergic to weeds), that the patient reported nonrecent use of marijuana (eg, quit marijuana 3 months ago), or that the diagnosis code was for something clearly unrelated to marijuana (eg, spice). The reviewer designated a visit as unclear (and, therefore, also a false-positive case) if there was not enough information to make an assessment.
Once the reviewer completed this assessment for the data set of September 1 through October 31, 2015, we determined the total number of cases and the number of true-positive cases identified by each search term in the initial syndromic case definition. We defined the positive predictive value (PPV) for each search term as the percentage of cases matched by that term that were true-positive cases. In addition, to determine the value of each data set field that we interrogated, we examined the PPV, the total number of cases identified in each field, and the number of cases identified only in that field.
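The per-term tallies described above reduce to a few counters: for each term, PPV = (true-positive cases matched by the term) / (all cases matched by the term) × 100, plus a count of cases that no other term matched. The sketch below is a minimal illustration; the `MatchedCase` structure is hypothetical, and the same function works unchanged if the "terms" are field names instead of search terms.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class MatchedCase:
    """A visit that matched >=1 search term, after manual review (hypothetical structure)."""
    terms: FrozenSet[str]  # search terms (or fields) the visit matched
    true_positive: bool    # reviewer's classification


def term_summary(cases: List[MatchedCase]) -> Dict[str, dict]:
    """Per term: total matches, true positives, cases matched by that term alone, and PPV (%)."""
    stats = defaultdict(lambda: {"total": 0, "true_pos": 0, "only_this_term": 0})
    for case in cases:
        for term in case.terms:
            s = stats[term]
            s["total"] += 1
            s["true_pos"] += int(case.true_positive)
            if len(case.terms) == 1:
                s["only_this_term"] += 1  # would be missed without this term
    for s in stats.values():
        s["ppv"] = round(100 * s["true_pos"] / s["total"], 1)
    return dict(stats)
```

The "only this term" count is the quantity the refinement step leans on: a term with few unique cases adds little even if its PPV is high, whereas a term with many unique cases cannot be dropped without losing those visits.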
Together, these assessments provided insight into which search terms and data set fields were most valuable for identifying cases. We used these results to refine the syndromic case definition, in anticipation of applying the revised definition to a second data set. We particularly focused on cases that were identified by only 1 search term, because these cases would have been missed if that term had not been used. As part of this refinement process, we also identified and modified or removed search terms that returned a high rate of false-positive cases. To reduce the number of false-positive cases, we removed keywords and codes that produced no true-positive cases and modified keywords and codes with low PPV by adding exclusion terms.
We applied this revised case definition to the data set of February through March 2016 to assess its performance in an unrelated period and its generalizability. After a second round of validation, we used similar techniques to revise the definition a final time.
Because this project was a quality improvement effort for an existing syndromic case definition, we determined that it was not research but rather fell within our routine public health work of evaluating and improving surveillance systems; as such, we did not submit the project to our institutional review board.
Results
In the first data set, a total of 126 646 unique visits were made to participating EDs; 524 ED visit records had terms that matched ≥1 search term in the initial case definition. Of these cases, the reviewer classified 486 as true-positive cases (PPV, 92.7%) and 38 (7.3%) as false-positive cases. Of the 38 false-positive cases, 32 were classified as unrelated to marijuana and 6 as unclear (Figure 2).
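As a worked check of the reported percentages, PPV is the share of matched visits classified as true positives, and the false-positive share is its complement (1 minus PPV), not a PPV of its own:

```latex
\begin{aligned}
\mathrm{PPV}_{\text{round 1}} &= \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \times 100\%
  = \frac{486}{486 + 38} \times 100\% = \frac{486}{524} \times 100\% \approx 92.7\% \\
\text{false-positive share}_{\text{round 1}} &= \frac{38}{524} \times 100\% \approx 7.3\% \\
\mathrm{PPV}_{\text{round 2}} &= \frac{668}{698} \times 100\% \approx 95.7\%
\end{aligned}
```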

Figure 2. Flowcharts of the validation of 2 syndromic case definitions for detecting ED visits potentially related to marijuana, based on BioSense 2.0 data from the Denver (Colorado) metropolitan region, September-October 2015 and February-March 2016. Data source: Centers for Disease Control and Prevention. National Syndromic Surveillance Program: BioSense background. 5 True-positive matches to the case definition were those for which marijuana appeared to be temporally or medically involved in ED visit circumstances. Abbreviations: ED, emergency department; PPV, positive predictive value.
The ICD-9-CM diagnosis code 305.2 (nondependent cannabis abuse) identified the most cases (n = 146) and had a PPV of 97.9%, followed by the diagnosis text term “canna,” which identified 130 cases and had a PPV of 98.5% (Table 1). In the free-text fields, the keyword “marijuana” identified 118 cases (PPV, 92.4%), 92 of which were detected by using only this term. The keyword “THC” identified 49 cases (PPV, 98.0%), 32 of which were detected by using only this term.
Table 1. Number of cases identified by and PPVs of search terms for detecting emergency department visits potentially related to marijuana, based on BioSense 2.0 dataᵃ from Denver (Colorado) metropolitan region, September-October 2015 and February-March 2016
Abbreviations: NA, not applicable; PPV, positive predictive value.
ᵃData source: Centers for Disease Control and Prevention. National Syndromic Surveillance Program: BioSense background. 5
ᵇSearch terms identifying <4 true-positive cases not included in table.
ᶜValues in both columns not mutually exclusive, because some cases matched >1 search term.
ᵈTrue-positive matches to the case definition were those for which marijuana appeared to be temporally or medically involved in emergency department visit circumstances.
ᵉInternational Classification of Diseases, Ninth Revision, Clinical Modification. 24
ᶠInternational Classification of Diseases, Tenth Revision, Clinical Modification. 25
After removing keywords that had not identified any true-positive cases and adding exclusion terms to prevent unrelated or unclear matches, we added 2 new terms to the definition (“blunt” and “adible” [a misspelling of “edible”]) to assess if they would capture more true-positive cases.
In the second data set, of 140 932 unique ED visits, 698 had terms that matched ≥1 search term in the revised definition. Of these cases, the reviewer classified 668 (95.7%) as true-positive cases and 30 (4.3%) as false-positive cases (Figure 2). Of all search terms in this second round of validation, “canna” identified the most cases (n = 350; PPV, 100.0%), followed by the ICD-10-CM diagnosis code F12.1, which identified 253 cases (PPV, 98.8%). Of the free-text keywords, “marijuana” identified the most cases (n = 108; PPV, 94.4%), 79 of which were detected by using only this term. The term “THC” identified 28 cases (PPV, 100.0%), 11 of which were detected by using only this term (Table 1).
In the second round of validation, the diagnosis code field yielded the most cases (n = 542; PPV, 99.4%), 174 of which were detected by using the diagnosis code field only. Of the 355 cases detected through the diagnosis text field (PPV, 100.0%), 1 case was detected with only this field and would have been missed if this field had not been searched. This case contained the ICD-10-CM diagnosis code for a condition (cannabinosis) that we had not considered. Of the 142 cases (PPV, 88.0%) that matched a search term in the triage notes free-text field, 113 cases were detected with only this field and would have been missed if this field had not been searched. Similarly, 36 of the 57 cases identified by the chief complaint field would have been missed if that field had not been searched, and 4 of the 7 cases identified through the clinical impression field would have been missed if that field had not been searched (Table 2).
Table 2. Number of cases identified by and PPVs of search fields for detecting emergency department visits potentially related to marijuana, based on BioSense 2.0 dataᵃ from Denver (Colorado) metropolitan region, February-March 2016
Abbreviations: NA, not applicable; PPV, positive predictive value.
ᵃData source: Centers for Disease Control and Prevention. National Syndromic Surveillance Program: BioSense background. 5
ᵇValues in both columns not mutually exclusive, because some cases matched >1 search term.
ᶜTrue-positive matches to the case definition were those for which marijuana appeared to be temporally or medically involved in emergency department visit circumstances.
Of the 698 total cases (668 true-positive cases), 494 (491 true-positive cases; PPV, 99.4%) were identified solely through the diagnosis code or diagnosis text fields, 153 (126 true-positive cases; PPV, 82.4%) were identified solely through ≥1 free-text field, and 51 (51 true-positive cases; PPV, 100.0%) were identified through both ≥1 free-text field and the diagnosis code and/or diagnosis text field.
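The 3-way split above (coded fields only, free text only, or both) can be computed from the set of fields each case matched. In the minimal sketch below, the field names and function names are hypothetical; fed with the aggregate counts reported above, it reproduces the PPVs of 99.4%, 82.4%, and 100.0%.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical field names for the 2 coded and 3 free-text fields.
CODED_FIELDS = frozenset({"diagnosis_code", "diagnosis_text"})
FREE_TEXT_FIELDS = frozenset({"chief_complaint", "triage_notes", "clinical_impression"})


def field_group(matched_fields: FrozenSet[str]) -> str:
    """Label a case by the kinds of fields in which it matched a search term."""
    coded = bool(matched_fields & CODED_FIELDS)
    free_text = bool(matched_fields & FREE_TEXT_FIELDS)
    if coded and free_text:
        return "both"
    if coded:
        return "coded only"
    if free_text:
        return "free text only"
    raise ValueError("case did not match in any searchable field")


def ppv_by_group(
    cases: Iterable[Tuple[FrozenSet[str], bool]],
) -> Dict[str, Tuple[int, int, float]]:
    """Return {group: (total cases, true positives, PPV %)} for (matched_fields, is_true_positive) pairs."""
    counts: Dict[str, List[int]] = {}
    for fields, is_tp in cases:
        tally = counts.setdefault(field_group(fields), [0, 0])
        tally[0] += 1
        tally[1] += int(is_tp)
    return {group: (n, tp, round(100 * tp / n, 1)) for group, (n, tp) in counts.items()}
```

Grouping by field type rather than by individual field avoids double counting: each case lands in exactly 1 group, so the group totals sum to the 698 matched cases.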
The second round of validation informed a final revision of the syndromic case definition. We added the ICD-10-CM diagnosis code J66.2 for cannabinosis and the exclusion phrase “denies any marijuana,” and we removed search terms with <5 true-positive cases or a PPV <94.0%. The ICD-9-CM diagnosis code 305.2 had a PPV of 100.0% in the second round, but it identified only 6 cases, compared with 146 cases in the first round. Because its value as a search term was likely to continue to diminish with the transition from ICD-9-CM to ICD-10-CM, we removed it. The final case definition contained 6 keywords for marijuana or derivatives and 5 ICD-10-CM diagnosis codes for cannabis use, abuse, dependence, poisoning, and lung disease (Table 3).
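The final pruning step amounts to a simple filter over the per-term counts. The cutoffs (≥5 true-positive cases and PPV ≥94.0%) come from the text; the function name, the `{term: (total, true positives)}` input format, and comparing PPV after rounding to 1 decimal place are assumptions for this sketch. Terms removed for reasons other than the cutoffs, such as ICD-9-CM 305.2, are passed in separately.

```python
from typing import Dict, FrozenSet, List, Tuple

MIN_TRUE_POSITIVES = 5  # terms with <5 true-positive cases were removed
MIN_PPV = 94.0          # terms with PPV <94.0% were removed


def retained_terms(
    stats: Dict[str, Tuple[int, int]],
    retired: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Apply the final-round cutoffs to {term: (total cases, true positives)}.

    `retired` lists terms dropped for other reasons even if they pass the
    cutoffs (eg, ICD-9-CM 305.2 during the ICD-10-CM transition).
    """
    keep = []
    for term, (total, true_pos) in stats.items():
        if total == 0 or term in retired:
            continue
        ppv = round(100 * true_pos / total, 1)  # compared at the reported precision (assumption)
        if true_pos >= MIN_TRUE_POSITIVES and ppv >= MIN_PPV:
            keep.append(term)
    return sorted(keep)
```

Keeping the cutoffs as named constants makes the trade-off between case detection and PPV explicit and easy for another jurisdiction to adjust when revalidating the definition locally.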
Table 3. Search and exclusion terms tested and retained in or removed from final syndromic case definition for emergency department visits potentially related to marijuana, after validation with BioSense 2.0 dataᵃ from Denver (Colorado) metropolitan region, September-October 2015 and February-March 2016
Abbreviation: NA, not applicable.
ᵃData source: Centers for Disease Control and Prevention. National Syndromic Surveillance Program: BioSense background. 5
ᵇInternational Classification of Diseases, Tenth Revision, Clinical Modification. 25
ᶜInternational Classification of Diseases, Ninth Revision, Clinical Modification. 24
Discussion
We applied our definition to 5 fields in the BioSense 2.0 database: diagnosis code, diagnosis text, chief complaint, triage notes, and clinical impression. Searching the diagnosis code field yielded a high PPV (99.4%), the greatest number of cases (n = 542), and the largest number of cases (n = 174) that would have been missed if this field had not been searched.
Use of the free-text triage notes field also identified a large number of cases, many of which would have been missed if this field had not been used. This finding is noteworthy because this field was optional for BioSense 2.0, and only 2 facilities in the Denver metropolitan area reported it. Triage notes typically contained narratives about the visit and its causes, unlike the chief complaint text, which contained only short symptom descriptions (eg, “altered mental status”). However, although the triage notes field identified a relatively large number of cases, it increased the chances of identifying patients with noncurrent marijuana use (ie, false-positive cases), as evidenced by a relatively low PPV of 85.0% among cases that matched only on this field. These findings demonstrate the potential trade-off between volume of case detection and PPV. Public health jurisdictions may need to determine an acceptable balance between case detection and PPV as they negotiate with hospital partners for more complete reporting to syndromic surveillance systems or procure other clinical data sources to monitor exposure-specific trends, such as drug-related ED visits.
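Finally, relatively few cases (51 of 698) were detected by terms in both the free-text fields and the diagnosis code or text fields, although these cases had a PPV of 100.0%. Conversely, 126 true-positive cases were detected only through free-text fields and had no diagnosis code or text that matched the definition. This finding demonstrates the difficulty of using diagnosis codes as the gold standard in this data source: a definition based on diagnosis codes alone would have missed these cases.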
During our validation process, we found a large number of true-positive cases and high PPVs using certain diagnosis codes and free-text keywords as search terms. For example, in both rounds of validation, use of the diagnosis text term “canna” and the free-text keyword “marijuana” resulted in high levels of case detection and high PPV.
We also noted that whereas alternate spellings of the term “marijuana” resulted in high PPVs, they identified relatively few cases. This finding suggested that it was not imperative to include all spelling variations in the case definition. In addition, except for the term “edible,” terms for marijuana-containing products such as “candy,” “brownie,” “gummy,” and “wax” had relatively low PPVs, and some terms produced more false-positive cases than true-positive cases. We removed these terms from the case definition. Although the ICD-9-CM diagnosis code 305.2 yielded the most cases in the first round of validation, it identified only 6 cases in the second round, so we removed it from the definition. These modifications demonstrated the importance of periodically revalidating syndromic case definitions. Our observations also suggest that ICD-9-CM diagnosis codes will become less useful as the transition to ICD-10-CM is completed and that certain keywords, such as those for marijuana-containing products, may have less value in jurisdictions without a commercial market for marijuana-infused edible products than in jurisdictions with such a market.
We propose several ways that others may build on our research. Other health departments or partners in jurisdictions with and without a commercial marijuana marketplace should test our final syndromic case definition and share their results. A detailed medical record review of marijuana-related cases and noncases may provide a measure of the true prevalence of potentially marijuana-related ED visits and insights into the sensitivity, specificity, and negative predictive value of our case definition. Finally, a second set of case definitions that would assign cases that are potentially related to marijuana to ≥1 subsyndrome of interest (eg, nausea/vomiting, mental disorders, pediatric exposures, polysubstance abuse, or accidental injury) would be beneficial.
Limitations
This analysis had several limitations. First, because the syndromic case definition was intended to capture any ED visit in which marijuana was involved in the circumstances leading up to the visit, we were unable to determine direct causality, evaluate the severity of symptoms, or distinguish between the acute and chronic effects of marijuana. Indeed, the definition was likely biased toward visits associated with acute marijuana-related health effects; we speculate that this bias arose because patients and health care providers were less likely to associate ED visits with chronic marijuana use. Second, certain diagnosis codes and diagnosis text terms had high PPVs primarily because cases identified from these fields were deemed false positives only if a text entry elsewhere in the record contradicted the diagnosis. As a result, some cases with a marijuana-related diagnosis code may have been classified as true positives even though the visit was actually unrelated to marijuana. Interviews with health care providers and medical coders might clarify how marijuana-related diagnosis codes and text are used during ED visits.
Conclusions
With local fine-tuning, our syndromic case definition for visits potentially related to marijuana should perform well in other public health jurisdictions and could also be applied to other data sources, such as the electronic medical record systems of hospitals and clinics. Additionally, our method for generating, validating, and revising a syndromic case definition is flexible and potentially less resource intensive than traditional methods. Using such a method would allow a health department or other entity to rapidly develop and validate a syndromic case definition and apply it to near–real-time data. Because new drugs of concern emerge frequently, this method could prove useful for public health monitoring and intervention.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This project was supported by the Centers for Disease Control and Prevention’s National Syndromic Surveillance Program grant and the Denver Office of Marijuana Policy.
