Abstract
Accurate and efficient injury coding is critical for effective injury surveillance. Machine learning (ML) models trained on manually coded historical injury records can predict injury codes efficiently and with reasonable accuracy, but their accuracy has been limited for complex narratives and rare causes of injury. In this study, we examined the performance of a Large Language Model (LLM) in predicting three injury codes (cause of injury, product involved, and nature of injury) for 100 injury cases randomly selected from the National Electronic Injury Surveillance System database. The prediction performance of the LLM (ChatGPT-3.5) was compared with that of a traditional ML model (logistic regression) and a neural network model (multilayer perceptron). The LLM outperformed the other two models in effectively (a) extracting syntactic relationships, (b) handling misspellings and common acronyms, and (c) deciphering semantic information from the text with reasonable accuracy, even when the narratives were not grammatically well formed.
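As an illustration of the traditional ML baseline described above, the sketch below trains a TF-IDF + logistic regression classifier on short injury narratives. The narratives and cause-of-injury labels here are invented toy examples, not NEISS records, and the feature/model settings are assumptions rather than the study's actual configuration.

```python
# Minimal sketch of a logistic-regression baseline for cause-of-injury coding.
# Toy narratives and labels only; NOT real NEISS data or the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

narratives = [
    "PT FELL OFF LADDER WHILE PAINTING, FX WRIST",
    "CUT FINGER ON KITCHEN KNIFE WHILE COOKING",
    "FELL FROM LADDER CLEANING GUTTERS, HEAD INJURY",
    "LACERATION TO HAND FROM KNIFE",
]
causes = ["fall", "cut", "fall", "cut"]  # hypothetical cause-of-injury labels

# TF-IDF unigrams/bigrams feed a multinomial logistic regression classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(narratives, causes)

# Classify an unseen narrative written in the same terse, uppercase style.
print(model.predict(["SLIPPED OFF LADDER, HURT ARM"])[0])
```

A model like this relies on surface word overlap, which is one reason such baselines struggle with rare causes and narratives whose key terms never appear in the training data.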