Abstract

Dear Editors,
Using natural language processing (NLP) to ascertain missing outcomes from unstructured notes has great potential for enhancing real-world data used to assess the care of patients with multiple sclerosis and other diseases. The novel approach of Alves et al.1 uses the full information available in unstructured notes to address missingness in the Expanded Disability Status Scale (EDSS). In doing so, it blurs the important distinction between extracting information that is plainly in the medical record and singly imputing missing information by prediction based on correlations with other variables. The approach raises new and interesting questions. We focus on two that we believe warrant further consideration when dealing with missing patient outcomes.
First, how can algorithm performance be evaluated on the notes where EDSS is missing? Performance is readily measured in the group with EDSS assessments, but it is the group without them for which performance matters. The excellent performance on the small test set of notes where EDSS is recorded may not reflect, even approximately, performance on the large set of notes where EDSS is missing, since the two sets might differ in many ways. Patient- and practice-level differences between the note samples with and without EDSS should be described. Beyond such Table 1 characteristics, however, the notes without EDSS may differ from those with EDSS in the degree to which disability is assessed and documented, which may have important effects on algorithm performance.
Second, can these imputed outcomes be used for unbiased treatment effect estimation? This is a complex issue that is understandably not the focus of the paper. But consider that any feature from the unstructured notes that correlates with the outcome in the small development sample, comprising 5% of the population, may be used to project eEDSS outcomes in the full sample. Any such feature that is also correlated with treatment in an observational study is a potential confounder, yet these features may not be known or available for deconfounding during effect estimation. Even small biases in outcome prediction across treatment groups can cause clinically important biases in effect estimation, since prediction errors in each treatment group may compound and be magnified in the effect estimate.2,3
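To make the last point concrete, the following is a minimal arithmetic sketch, not an analysis of the data in Alves et al. All numbers are hypothetical: we assume a binary outcome with the same true prevalence in two treatment groups (so the true risk difference is zero) and a predictor whose sensitivity differs modestly between the groups.

```python
# Illustrative only: every value below is a hypothetical assumption,
# not an estimate from any study.

def observed_prevalence(true_prevalence, sensitivity, specificity):
    """Expected prevalence of a predicted binary outcome under misclassification."""
    true_positives = sensitivity * true_prevalence
    false_positives = (1.0 - specificity) * (1.0 - true_prevalence)
    return true_positives + false_positives

# Same true outcome prevalence in both groups, so the true risk difference is 0.
true_prevalence = 0.30

# The predictor is slightly less sensitive in group B than in group A.
observed_a = observed_prevalence(true_prevalence, sensitivity=0.95, specificity=0.95)
observed_b = observed_prevalence(true_prevalence, sensitivity=0.90, specificity=0.95)

apparent_risk_difference = observed_a - observed_b
print(f"Apparent risk difference: {apparent_risk_difference:.3f}")
```

Under these assumed values, a 5-point difference in sensitivity alone produces an apparent risk difference of about 1.5 percentage points where the true difference is zero.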
Given the above and other issues, we suggest maintaining a clear distinction between data (particularly outcome data) that have been extracted from the medical record and are verifiable at the individual patient level (e.g. by expert review of the medical record) and imputed outcomes that are not. We also wondered, since the binarized EDSS depends only on the ability to ambulate without an assistive device, whether a simple rule-based NLP approach might provide high-integrity, verifiable outcome data on many patients. Such an approach could avoid some of the issues above while preserving transparency and analytically important information about where this outcome is actually missing, in accord with emerging guidance.4
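As a rough sketch of what we mean by a simple rule-based approach, the following matches explicit statements about assistive devices and returns no value when the note is silent or contradictory, so that missingness is preserved rather than imputed. The function name, keyword list, and patterns are our own hypothetical choices for illustration; they are not validated and are not taken from Alves et al.

```python
import re
from typing import Optional

# Hypothetical keyword list; a real rule set would need clinical review.
DEVICE = r"(?:assistive device|cane|walker|crutch(?:es)?|wheelchair|rollator)"

# Explicit statements that no device is used.
NEGATED = re.compile(
    rf"\b(?:without|no|not using|does not use)\s+(?:an?\s+|any\s+)?{DEVICE}\b",
    re.IGNORECASE,
)
# Explicit statements that a device is used.
POSITIVE = re.compile(
    rf"\b(?:uses|using|requires|ambulates with|walks with)\s+(?:an?\s+|the\s+)?{DEVICE}\b",
    re.IGNORECASE,
)

def needs_assistive_device(note: str) -> Optional[bool]:
    """Return True or False only when the note states it explicitly, else None."""
    negated = NEGATED.search(note) is not None
    # Remove negated phrases first so "not using a walker" is not read as "using a walker".
    remainder = NEGATED.sub(" ", note)
    positive = POSITIVE.search(remainder) is not None
    if positive and not negated:
        return True
    if negated and not positive:
        return False
    return None  # undocumented or contradictory: leave the outcome missing

print(needs_assistive_device("Patient ambulates with a cane."))
print(needs_assistive_device("Walks without an assistive device."))
print(needs_assistive_device("Reports fatigue and numbness."))
```

Each non-missing result can be traced to a specific phrase in the note, which is what makes the extracted outcome verifiable by expert review.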
Sincerely,
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
