Abstract
Objectives
Systemic lupus erythematosus (SLE) is a heterogeneous disease characterized by flares that can require hospitalization. Our objective was to apply machine learning methods to predict hospitalizations for SLE from electronic health record (EHR) data.
Methods
We identified patients with SLE in a longitudinal EHR-based cohort who had ≥2 outpatient rheumatology visits between 2012 and 2019. We applied multiple machine learning methods to predict hospitalizations with a primary diagnosis code for SLE: decision tree, random forest, naive Bayes, logistic regression, and an ensemble method. Candidate predictors were derived from structured EHR features, including demographics, laboratory tests, medications, ICD-9/10 codes for SLE manifestations, and healthcare utilization. We used two approaches to assess these variables over longitudinal follow-up, including the incorporation of lagged features to capture changes in clinical data over time. The performance of each model was evaluated by overall accuracy, the F-score, and the area under the receiver operating characteristic curve (AUC).
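The idea of a lagged feature can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `add_lagged_features`, the field name `c3`, and the choice to store both the previous value and the change since that value are assumptions made for illustration only.

```python
from typing import Dict, List, Optional

Visit = Dict[str, Optional[float]]


def add_lagged_features(visits: List[Visit], feature: str, n_lags: int = 1) -> List[Visit]:
    """Return copies of one patient's visits with lagged columns added.

    `visits` must already be sorted from oldest to newest. For each visit and
    each lag k, two hypothetical columns are added:
      <feature>_lag<k>   : the value recorded k visits earlier (None if absent)
      <feature>_delta<k> : current value minus that earlier value (None if
                           either value is missing)
    """
    rows: List[Visit] = []
    for i, visit in enumerate(visits):
        row = dict(visit)  # copy so the input is not mutated
        current = visit.get(feature)
        for lag in range(1, n_lags + 1):
            previous = visits[i - lag].get(feature) if i - lag >= 0 else None
            row[f"{feature}_lag{lag}"] = previous
            if current is None or previous is None:
                row[f"{feature}_delta{lag}"] = None
            else:
                row[f"{feature}_delta{lag}"] = current - previous
        rows.append(row)
    return rows


# Illustrative values only; these are not data from the study.
example_visits = [{"c3": 90.0}, {"c3": 70.0}, {"c3": 85.0}]
example_rows = add_lagged_features(example_visits, "c3")
```

In this sketch the first visit has no earlier value, so its lagged columns are left as `None` rather than imputed; how missing history is handled is a design choice the abstract does not specify.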
Results
We identified 1996 patients with SLE, of whom 4.6% were hospitalized for SLE in their most recent year of follow-up. Random forest models had the highest performance in predicting SLE hospitalizations, with an AUC of 0.751 for the averaging approach and 0.772 for the progressive approach. The leading predictors of SLE hospitalizations included dsDNA positivity, C3 level, blood cell counts, and inflammatory markers, as well as age and albumin.
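To help interpret the AUC values above: the AUC equals the probability that a randomly chosen hospitalized patient receives a higher predicted score than a randomly chosen non-hospitalized patient, with ties counted as one half. The sketch below computes it directly from that definition. The function name `pairwise_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def pairwise_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for an event (e.g. hospitalization), 0 otherwise.
    scores: model output, where higher means more likely to be an event.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 3 of the 4 positive-negative pairs are ordered correctly.
toy_auc = pairwise_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

This quadratic-time form is meant for clarity rather than speed. Because the AUC depends only on ranking, it is less affected than overall accuracy by a low event rate such as the 4.6% reported here.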
Conclusion
We have demonstrated that machine learning methods can predict SLE hospitalizations. We identified key predictors of these events including known markers of SLE disease activity; further validation in external cohorts is warranted.
