Tobacco use status from clinical notes using Natural Language Processing and rule based algorithm

Abstract

BACKGROUND:

This cross-sectional retrospective study utilized Natural Language Processing (NLP) to extract tobacco-use associated variables from clinical notes documented in the Electronic Health Record (EHR).
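The abstract does not describe how the NLP software assigns a class to each note. The following is a purely illustrative Python sketch, not the study's NLP pipeline. It assigns one of the four classes used in this study (“Current Smoker”, “Past Smoker”, “Nonsmoker”, “Unknown”) using ordered keyword rules. The cue patterns and the function name `classify_note` are hypothetical assumptions.

```python
import re

# Hypothetical cue patterns (assumed for illustration, not the study's lexicon).
PAST_CUES = re.compile(r"\b(former|ex-?smoker|quit|stopped smoking|past smoker)\b", re.I)
NEVER_CUES = re.compile(r"\b(never smoked|non-?smoker|denies (tobacco|smoking))\b", re.I)
CURRENT_CUES = re.compile(r"\b(current(ly)? smok\w*|smokes|packs? per day)\b", re.I)


def classify_note(text: str) -> str:
    """Assign a document-level tobacco-use class using ordered keyword rules."""
    # Rules are checked in order: a note mentioning quitting is "Past Smoker"
    # even if it also contains a current-smoking cue.
    if PAST_CUES.search(text):
        return "Past Smoker"
    if NEVER_CUES.search(text):
        return "Nonsmoker"
    if CURRENT_CUES.search(text):
        return "Current Smoker"
    # No tobacco-related cue found in the note.
    return "Unknown"
```

This sketch ignores negation and temporal context. For example, “not a current smoker” would be labelled “Current Smoker”, which is one reason practical systems add negation detection.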

OBJECTIVE:

To develop a rule-based algorithm for determining the present status of the patient’s tobacco-use.

METHODS:

Clinical notes (n = 5,371 documents) from 363 patients were mined and classified by NLP software into four classes: “Current Smoker”, “Past Smoker”, “Nonsmoker” and “Unknown”. Two coders manually classified the same documents into these classes to produce the document-level gold standard classification (DLGSC). The same two coders then derived a tobacco-use status for each patient from the statuses of that patient's individual documents, producing the patient-level gold standard classification (PLGSC). The DLGSC and PLGSC were compared with the results of the NLP software and the rule-based algorithm, respectively.
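The abstract does not list the rules used to derive a patient-level status from document-level statuses. The following minimal sketch uses three assumed rules:

1. Ignore documents classified “Unknown”.
2. Take the most recent remaining status.
3. Reclassify a latest “Nonsmoker” as “Past Smoker” if any earlier document recorded smoking.

These rules and the name `patient_status` are hypothetical and are not the authors' algorithm.

```python
from datetime import date
from typing import Iterable, Tuple

SMOKING = {"Current Smoker", "Past Smoker"}


def patient_status(docs: Iterable[Tuple[date, str]]) -> str:
    """Derive one present status from (note_date, document_class) pairs using assumed rules."""
    # Assumed rule 1: documents classified "Unknown" carry no evidence.
    known = sorted(((d, s) for d, s in docs if s != "Unknown"), key=lambda ds: ds[0])
    if not known:
        return "Unknown"
    # Assumed rule 2: the most recent informative note reflects present status.
    latest = known[-1][1]
    # Assumed rule 3: a patient documented as ever smoking cannot be a never-smoker.
    if latest == "Nonsmoker" and any(s in SMOKING for _, s in known):
        return "Past Smoker"
    return latest
```

For example, a patient with a 2014 “Current Smoker” note and a 2016 “Nonsmoker” note would be assigned “Past Smoker” under these assumed rules.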

RESULTS:

The initial Cohen’s kappa (n = 1,000 documents) was 0.9448 (95% CI = 0.9281–0.9615), indicating strong agreement between the two raters. A subsequent assessment of 371 documents yielded a Cohen’s kappa of 0.9889 (95% CI = 0.979–1.000). The F-measures for the document-level classification of the four classes were 0.700, 0.753, 0.839 and 0.988, and those for the patient-level classification were 0.580, 0.771, 0.730 and 0.933, respectively.
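Cohen’s kappa is κ = (p_o − p_e)/(1 − p_e). Here p_o is the observed agreement and p_e is the agreement expected by chance from each rater's class frequencies. The sketch below computes κ with a 95% CI using one common large-sample approximation of its standard error. The abstract does not state which CI method the authors used, so the interval formula is an assumption. Clipping the upper bound at 1.0 is also an assumption, although it matches the form of the reported 0.979–1.000 interval. The name `cohens_kappa` is illustrative.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def cohens_kappa(r1: Sequence[str], r2: Sequence[str], z: float = 1.96) -> Tuple[float, float, float]:
    """Return (kappa, lower, upper) for two raters' labels on the same items."""
    if len(r1) != len(r2) or len(r1) == 0:
        raise ValueError("ratings must be non-empty and paired item by item")
    n = len(r1)
    # Observed agreement: share of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal class frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (p_o - p_e) / (1 - p_e)
    # Simple large-sample standard error (an assumed approximation).
    se = math.sqrt(p_o * (1 - p_o) / (n * (1 - p_e) ** 2))
    # Clip the interval to kappa's valid range [-1, 1].
    return kappa, max(-1.0, kappa - z * se), min(1.0, kappa + z * se)
```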

CONCLUSIONS:

NLP and the rule-based algorithm showed utility for deriving patients' present tobacco-use status. Current work targets further improvement in precision to enhance the translational value of the tool.
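Precision and the F-measure are defined per class, one class against the rest:

- Precision P = TP/(TP + FP).
- Recall R = TP/(TP + FN).
- The F-measure is their harmonic mean, 2PR/(P + R).

The sketch below computes all three for each class from paired gold-standard and predicted labels. The function name `per_class_prf` is hypothetical, and this is not the authors' evaluation code.

```python
from typing import Dict, Sequence, Tuple

CLASSES = ("Current Smoker", "Past Smoker", "Nonsmoker", "Unknown")


def per_class_prf(
    gold: Sequence[str], pred: Sequence[str], classes: Sequence[str] = CLASSES
) -> Dict[str, Tuple[float, float, float]]:
    """Return {class: (precision, recall, F-measure)}, treating each class one-vs-rest."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted labels must be paired")
    pairs = list(zip(gold, pred))
    scores = {}
    for c in classes:
        tp = sum(g == c and p == c for g, p in pairs)  # correctly predicted as c
        fp = sum(g != c and p == c for g, p in pairs)  # wrongly predicted as c
        fn = sum(g == c and p != c for g, p in pairs)  # c items that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f)
    return scores
```

Improving precision for a class means reducing its false positives, which raises its F-measure whenever recall holds steady.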

Keywords

Data mining; decision support systems, clinical; health information systems; smoking; electronic health records; information storage and retrieval

