Analysis of Unstructured Text-Based Data Using Machine Learning Techniques: The Case of Pediatric Emergency Department Records in Nicaragua

Abstract

Free-text information is still widely used in emergency department (ED) records. Machine learning (ML) techniques are useful for analyzing such narratives, but they have been applied mostly to English-language data sets. Against this background, the performance of an ML classification task on a Spanish-language database of ED visits was tested. The visits were collected in the EDs of nine hospitals in Nicaragua, and the Spanish-language, free-text discharge diagnoses were analyzed. Five hundred random forests were trained on bootstrap samples of the whole data set (1,789 ED visits) to perform the classification task. For each forest, after the optimal parameter value had been identified, the final validated model was trained on the whole bootstrapped data set and tested. The classification accuracies had a median of 0.783 (95% CI [0.779, 0.796]). Machine learning techniques appear to be a promising way to exploit the unstructured information reported in ED records in low- and middle-income Spanish-speaking countries.
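The resampling scheme summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the classifier below is a simple token-vote stand-in rather than a random forest, no parameter tuning is performed, evaluation on the out-of-bag records is an assumption (the abstract does not say which records were used for testing), and the toy diagnoses, labels, and all function names are hypothetical. The sketch only shows how a set of bootstrap accuracies can be reduced to a median and a percentile interval.

```python
import random
import statistics
from collections import Counter, defaultdict

# Hypothetical toy data: (free-text diagnosis, class label). Not from the study.
toy_records = [
    ("neumonia adquirida en la comunidad", "respiratory"),
    ("bronquiolitis aguda", "respiratory"),
    ("crisis de asma", "respiratory"),
    ("neumonia grave", "respiratory"),
    ("diarrea aguda con deshidratacion", "gastrointestinal"),
    ("gastroenteritis aguda", "gastrointestinal"),
    ("diarrea y vomitos", "gastrointestinal"),
    ("fractura de radio", "injury"),
    ("trauma craneal leve", "injury"),
    ("herida en la mano", "injury"),
]


def fit_token_vote(train):
    """Stand-in classifier (NOT a random forest): count labels seen per token."""
    token_labels = defaultdict(Counter)
    label_counts = Counter()
    for text, label in train:
        label_counts[label] += 1
        for token in text.lower().split():
            token_labels[token][label] += 1
    default = label_counts.most_common(1)[0][0]  # fallback: majority class
    return token_labels, default


def predict_token_vote(model, text):
    """Sum the per-token label counts and return the label with most votes."""
    token_labels, default = model
    votes = Counter()
    for token in text.lower().split():
        votes.update(token_labels.get(token, {}))
    return votes.most_common(1)[0][0] if votes else default


def percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, with q in [0, 1]."""
    idx = round(q * (len(sorted_values) - 1))
    return sorted_values[idx]


def bootstrap_accuracies(records, n_boot=500, seed=0):
    """Train on each bootstrap sample; test on its out-of-bag records (assumption)."""
    rng = random.Random(seed)
    n = len(records)
    accuracies = []
    for _ in range(n_boot):
        picked = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        in_bag = set(picked)
        test = [records[i] for i in range(n) if i not in in_bag]
        if not test:  # every record was drawn, so nothing is left to test on
            continue
        model = fit_token_vote([records[i] for i in picked])
        hits = sum(predict_token_vote(model, text) == label for text, label in test)
        accuracies.append(hits / len(test))
    return accuracies


def summarize(accuracies):
    """Return (median, 2.5th percentile, 97.5th percentile) of the accuracies."""
    ordered = sorted(accuracies)
    return statistics.median(ordered), percentile(ordered, 0.025), percentile(ordered, 0.975)


accs = bootstrap_accuracies(toy_records, n_boot=200, seed=42)
median_acc, lower, upper = summarize(accs)
print(f"median accuracy {median_acc:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

Swapping the stand-in for a real random forest would leave the resampling loop and the summary step unchanged; only the fit and predict functions would differ.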

Keywords

emergency department visits, low- and middle-income countries, free-text discharge diagnosis, Spanish, random forest, classification task
