Abstract
Background/Significance:
Alcohol use carries significant morbidity and mortality, yet accurate identification of alcohol use disorder (AUD) remains a multi-layered problem for both researchers and clinicians.
Objective:
To fine-tune a language model to identify AUD in the clinical narrative and to detect AUD cases not captured by ICD-9 coding in the MIMIC-III database.
Materials and Methods:
We applied clinicalBERT to discharge summaries of unique patients. For classification, patients were divided into nonoverlapping training (80%), validation (10%), and testing (10%) sets, stratified by the presence or absence of an AUD ICD diagnosis. For detection, the model was trained (80%) and validated (20%) on a 1:1 sample of positive and negative patients, then applied to the remaining negative patient population. Physicians adjudicated 600 samples spanning the full range of model confidence to confirm AUD by Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria.
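The two patient-level splitting schemes described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names `stratified_split` and `balanced_detection_split`, the fixed random seed, the floor-based rounding of split sizes, and the input format (one binary ICD-derived AUD label per patient ID) are all hypothetical assumptions. The detection sketch also assumes there are at least as many negative patients as positive ones.

```python
import random
from typing import Dict, List, Tuple


def stratified_split(
    labels: Dict[str, int], fractions=(0.8, 0.1, 0.1), seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Split patient IDs into nonoverlapping train/val/test sets, stratified by label.

    Hypothetical helper: `labels` maps patient ID -> 1 (AUD ICD code) or 0 (none).
    """
    rng = random.Random(seed)
    train, val, test = [], [], []
    for label in (0, 1):
        # Split each label group separately so every set keeps the class ratio.
        ids = sorted(pid for pid, y in labels.items() if y == label)
        rng.shuffle(ids)
        n_train = int(len(ids) * fractions[0])
        n_val = int(len(ids) * fractions[1])
        train.extend(ids[:n_train])
        val.extend(ids[n_train:n_train + n_val])
        test.extend(ids[n_train + n_val:])  # remainder goes to test
    return train, val, test


def balanced_detection_split(
    labels: Dict[str, int], train_frac: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str], List[str]]:
    """Build a 1:1 positive/negative pool, split it 80/20, and return the
    remaining (unsampled) negatives as the population to screen."""
    rng = random.Random(seed)
    pos = sorted(pid for pid, y in labels.items() if y == 1)
    neg = sorted(pid for pid, y in labels.items() if y == 0)
    rng.shuffle(neg)
    # Assumes len(neg) >= len(pos); sample one negative per positive.
    sampled_neg, remaining_neg = neg[:len(pos)], neg[len(pos):]
    pool = pos + sampled_neg
    rng.shuffle(pool)
    n_train = int(len(pool) * train_frac)
    return pool[:n_train], pool[n_train:], remaining_neg
```

With 10 positive and 90 negative patients, `stratified_split` yields 80/10/10 patients with one positive in each of the validation and test sets, while `balanced_detection_split` trains on 16, validates on 4, and leaves 80 negatives for the model to screen. Splitting by patient ID rather than by document keeps any one patient's text out of more than one set.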
Results:
The model achieved the following performance (mean, standard deviation): precision (0.90, 0.02), recall (0.65, 0.03), F1 score (0.75, 0.02), area under the receiver operating characteristic curve (0.97, 0.01), and area under the precision-recall curve (0.86, 0.01). Adjudication yielded an estimated 4% under-documentation rate for the total study population. As model confidence increased, the AUD under-documentation rate rose to 30% of the number of patients identified as positive by ICD-9 coding.
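For readers checking how these metrics relate, the sketch below computes precision, recall, and F1 from binary predictions, and the area under the ROC curve in its rank-based (Mann-Whitney) form, as the probability that a randomly chosen positive patient receives a higher score than a randomly chosen negative one. It is an illustrative stdlib-only sketch with hypothetical function names, not the study's evaluation code; it omits the precision-recall AUC and any averaging across runs.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 (harmonic mean of precision and recall)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# The reported mean precision (0.90) and recall (0.65) imply F1 of about 0.75.
print(round(2 * 0.90 * 0.65 / (0.90 + 0.65), 2))  # 0.75
```

The gap between a high AUROC and a lower recall is expected when positives are rare: ranking can be strong overall while a fixed decision threshold still misses many positive patients.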
Conclusion:
Our model improves the identification of patients meeting AUD criteria, outperforming ICD codes in detecting cases of AUD. The discrepancy between detection by ICD codes and by free text highlights clinician under-documentation, not under-recognition. Adjudication revealed model over-sensitivity to language around substance use, withdrawal, and chronic liver disease; future studies should apply the model to patients spanning a broader range of ages and acuity levels. This model has the potential to speed identification of patients with AUD and improve treatment allocation.
