Abstract
As
Introduction
Thinking machines have long occupied and fascinated the human mind, captivating us with their potential to radically alter society. As the digitization of society has progressed and massive amounts of data have accrued, a new vocabulary has emerged. The scale of data collection has given rise to
Health care is no exception, and the prospect of transforming care delivery by way of these technologies is a vibrant and rapidly growing area of research.1–5 Dissatisfied with besting humans at Jeopardy!, IBM’s Watson has set its sights on health and disease (and learned so far that these are very difficult domains, indeed).6
DeepMind, having wowed the world with a reinforcement learning agent that achieved superhuman performance at
The potential success of these new technologies rests largely on 2 key drivers: affordable, accessible high-performance computer hardware and an explosion of data. The latter is often taken for granted. The medical literature has been quick to embrace
Critical care medicine concerns itself with the care of unstable, high-acuity patients, particularly those with multi-organ failure; continuous physiologic monitoring is consequently the hallmark of the intensive care unit (ICU). With the near-total digitization of health care, the ICU represents an incredibly fertile ground for the proliferation of big data technologies. Advancements that take advantage of this wealth of data promise to fortify our currently relatively fragile evidence base by providing large cohorts for knowledge discovery and causal inference, and will provide the substrate for the next generation of clinical decision support tools. 10 Reliable clinical data, whether digital or not, have always been the basis for caring for our patients, but digitization provides the opportunity to leverage the troves of data generated in the ICU to advance the field into a new era of medicine. In this perspective review, we examine the fundamental role of data as we present the current progress that has been made toward a data-driven precision critical care medicine.
Critical Care Databases
Knowledge discovery, decision support model development, and the education of the next generation of clinician data scientists all require health data to be available and easily accessible. In fact, we envision a future in which all clinicians will be data scientists to a certain degree. Prior commercial databases developed primarily for the development of benchmark models and national registries lack the resolution and volume required to support the breakthroughs of this new era. 11 However, over the last 2 decades, we have witnessed the emergence of large-scale, highly granular, critical care databases for use in observational research and predictive model development.
The Multiparameter Intelligent Monitoring in Intensive Care (MIMIC) database was the first resource of this kind. 12 Developed and maintained over the past 2 decades by the MIT Laboratory for Computational Physiology (LCP), the database is now in its third iteration as the Medical Information Mart for Intensive Care (MIMIC-III) database. 13 The MIMIC-III database contains high-resolution, multi-modal, de-identified data from the electronic health record (EHR) associated with 53 342 distinct hospital admissions to the Beth Israel Deaconess Medical Center (BIDMC) in Boston, Massachusetts. The data include, but are not limited to, vital sign recordings and waveforms, laboratory data, clinical notes, diagnostic reports, and administered interventions, including medications. Some of the data are quantitative or structured, but much requires extraction from text format.
In addition to MIMIC-III, the LCP partnered with Philips to release de-identified data from the Philips eICU Research Institute. The eICU Collaborative Research Database (eICU-CRD), now at version 2.0, is a multi-center critical care database containing data from more than 200 000 ICU admissions from across the United States that were archived from Philips’ ICU telehealth platform. 14 This resource allows models to be developed on populations more representative of the entire United States and allows the generalizability of findings and models to be assessed.
Long believing that open access to data spurs innovation and accelerates progress, the LCP makes MIMIC-III and eICU-CRD publicly available to any individual who completes a standard course on human subject research and signs a data use agreement. As a result, these data have supported countless projects in academia and industry, and the availability of MIMIC-III has made the BIDMC ICU population the most intensely studied critically ill cohort to date. In addition, the data use agreement for MIMIC-III requires that the code for projects developed with MIMIC-III be publicly shared. This has led to the rapid development of reusable concepts and their corresponding code and queries, with the LCP maintaining a large, publicly available code repository. 15 The availability of this code accelerates research and promotes reproducibility by ensuring that common concepts are implemented consistently across studies.
By way of international collaborations that will be discussed further below, MIMIC-III has inspired the development of similar critical care databases in Spain, Brazil, China, Australia, and Switzerland. The existence of these databases drives similar progress in those respective countries and should lead to an international system of data sharing capable of supporting the development of large international observational cohorts and generalizable predictive models. Unfortunately, there remain major barriers to data sharing endeavors.
From a technical perspective, international data sharing represents a complex challenge. There is currently no widely accepted database structure for critical care databases. Should such a structure be developed, disparate concepts between centers would require harmonization, and we currently lack a common standard for representing various sources of clinically relevant information. 16 As medical centers have substantial differences in the way care is delivered, with variable access to medical technologies, and cultural differences in the way care is documented, the development of a system for cross talk between critical care databases would be a major engineering feat.
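As a small illustration of what concept harmonization involves, the sketch below maps one laboratory test, serum lactate, from two hypothetical sites onto a single shared concept and unit. The site names, local codes, and the `harmonize_lactate` helper are assumptions made for this example and do not correspond to any existing database or standard; the only external fact used is the molar mass of lactic acid (about 90.08 g/mol), which gives roughly 9.008 mg/dL per mmol/L.

```python
# Hypothetical local codes at two hypothetical sites, mapped to one shared concept.
LOCAL_TO_COMMON = {
    ("site_a", "LACT"): ("lactate", "mmol/L"),
    ("site_b", "lactic_acid_serum"): ("lactate", "mg/dL"),
}

# Lactic acid has a molar mass of about 90.08 g/mol, so 1 mmol/L is about 9.008 mg/dL.
MG_DL_PER_MMOL_L_LACTATE = 9.008


def harmonize_lactate(site: str, local_code: str, value: float) -> dict:
    """Map a site-specific lactate record to the shared concept in mmol/L."""
    concept, unit = LOCAL_TO_COMMON[(site, local_code)]
    if unit == "mg/dL":
        value = value / MG_DL_PER_MMOL_L_LACTATE
    return {"concept": concept, "value_mmol_per_l": round(value, 2)}


record_a = harmonize_lactate("site_a", "LACT", 2.0)
record_b = harmonize_lactate("site_b", "lactic_acid_serum", 18.016)
print(record_a, record_b)
```

Even this toy case needs a lookup table and a unit conversion for a single test; a real effort would have to repeat the exercise for every concept, and would also have to reconcile differences in how and when each site documents care.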
Varying perspectives on privacy and data sharing represent an even greater barrier, and the prospect is increasingly limited by complex legal frameworks. 17 For example, the European Union (EU) General Data Protection Regulation (GDPR) applies to all data controllers and processors of personal data for subjects in the EU regardless of whether the processing occurs in the EU or not, and thus databases based outside of the EU must comply with GDPR if residents from the EU are included in the data. Therefore, linking MIMIC-III, which contains subjects not requiring explicit consent, to a database from the EU for a larger, more broadly applicable analysis would require that the researchers obtain explicit consent if anonymization is deemed inadequate. There are also numerous opponents to public data sharing on the grounds of missed financial opportunities to monetize these intrinsically valuable data. 18 Together, these barriers hinder progress toward the development of a global network for health data exchange, and legal and ethical frameworks must evolve for us to make progress toward this ultimate goal.
Collaborative Data Science
Deriving insight from large EHR databases is a non-trivial task requiring skills and expertise that span multiple disciplines from clinical intensive care to sophisticated statistical methods. The methods by which data are explored, processed, harmonized, transformed, and modeled fall well beyond the purview of traditional medical training and can lead to misunderstandings of what can and cannot be accomplished with these tools. As implementation of these methods often requires acumen with programming languages like SQL, Python, and R, clinicians may find themselves overwhelmed, even when they have a relatively sound understanding of complicated biostatistical approaches.
Similarly, data scientists rarely have the clinical insight to know which questions are relevant to medical care, how the data were generated in practice, and how they should be interpreted. Consider also that patterns of missing data in the EHR are rarely uninformative: a serum lactate level is ordered when physicians are concerned about the adequacy of organ perfusion, and thus the very presence of this laboratory test in a patient’s data tells us something about the clinical context. This small insight is obvious to physicians, but the apparent “missingness” of lactate values might perplex an uninformed data scientist and lead to an incorrect modeling decision. A further key step in the model building process is feature engineering and selection. Considering the breadth of data available in an electronic medical record, when should, for example, a serum phosphate level be included in a predictive model? Certainly, it will be more useful when modeling a population of patients with kidney disease, but likely less useful in a population of patients with acute trauma.
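One common way to keep the information carried by a missing lactate is to encode whether the test was ordered as its own feature rather than silently imputing a value. The sketch below is a minimal illustration under assumed names: the `encode_lactate` helper, the feature names, and the three example patients are hypothetical and are not taken from any particular study or database.

```python
from typing import Optional


def encode_lactate(value: Optional[float], fill: float = 0.0) -> dict:
    """Return two features: whether lactate was measured, and its value.

    The fill value is a placeholder for unmeasured patients; the indicator
    lets a model separate "not measured" from a genuinely measured value.
    """
    measured = value is not None
    return {
        "lactate_measured": 1 if measured else 0,
        "lactate_value": value if measured else fill,
    }


# Hypothetical patients: lactate in mmol/L, or None when the test was never ordered.
patients = [
    {"id": 1, "lactate": 4.2},
    {"id": 2, "lactate": None},
    {"id": 3, "lactate": 1.1},
]
features = [encode_lactate(p["lactate"]) for p in patients]
print(features)
```

Whether the indicator, the imputed value, or both belong in a given model is exactly the kind of decision that benefits from a clinician explaining why the test is ordered in the first place.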
There has been no dearth of literature arguing for changes in medical education such that the next generation of clinicians can understand and work with complex statistical methods, and grasp the computational approaches that will undoubtedly be incorporated into their practices.19–21 That said, whereas the clinician data scientist will surely emerge (in a manner akin to the translational scientist) to bridge the computational and clinical science realms, the future of medical research and health care delivery will progressively rely on collaboration between clinicians and data scientists. The term
The success of the datathon model has relied heavily on the availability of data. The MIMIC-III and the eICU-CRD databases serve as the substrates on which clinicians can learn to ask questions amenable to secondary analysis, and data scientists can begin wrangling real health care data. The events often begin with physicians from local hospitals pitching their research questions to the audience. Teams are formed and immediately get to work to parse the question into a study design, extract the cohort, and build models with the support and guidance of clinical data scientists from MIT Critical Data. With the publicly available code repository containing many of the common concepts required for critical care research, projects can be rapidly performed in the span of a weekend.15,28 Mentors provide feedback throughout the entire process and ultimately evaluate the clinical relevance, technical implementation, and reproducibility of the final projects.
MIT Critical Data has hosted more than 20 datathon events in 10 countries across 5 continents, jumpstarting numerous international collaborations. Many of the projects pitched and initiated at datathon events are eventually published in the scientific literature.29–32 In addition, as mentioned in the previous section, these international collaborations have demonstrated the value of secondary EHR analysis to countless decision makers at health care institutions across the world and have led to the development of similar critical care databases. Despite the aforementioned barriers, this trend is laying the groundwork for a network of EHR data sharing that will ultimately allow for multi-national and multi-institution analyses.
This collaborative format, in which clinicians propose research questions and work with teams of data scientists to address them, has also given rise to a course at the Harvard-MIT Division of Health Sciences and Technology (HST). The course “Collaborative Data Science for Medicine” introduces students to MIMIC-III and the eICU-CRD, and features lectures on database querying, statistics and epidemiology, data exploration and visualization, machine learning, and causal inference. The course, now in its third year, produces numerous abstracts, presentations, and publications and will serve as a model for other courses around the world.33–37 To promote such efforts, MIT Critical Data has published a textbook for the course,
All of these efforts seek to build a bridge between clinician and data scientist that works to improve understanding of health and disease, and ultimately impact patient outcomes. Working side by side, clinicians and data scientists provide a skill set far greater than the sum of their parts. This partnership is the only way medicine can hope to navigate the
Machine Learning and Decision Support
Clinical decision making is rife with uncertainty: we seek to leverage the evidence derived from clinical trials and observational studies, but often the specific study we require does not exist, and when it does, it is usually insufficient in one or more respects. Furthermore, as the ground truth is constantly shifting in medicine, even a perfectly performed and applicable study from a few years prior may no longer apply as new tests and treatments are incorporated into practice and patient demographics change. Information gaps are one of the drivers of variation in care as physicians rely on their prior experiences and training as well as institutional culture to guide decisions. A process of continually using routinely collected clinical data to update knowledge and guide practice, intimately linking knowledge generation and care delivery, represents a new paradigm that promises to bring us closer to truly evidence-based care.39
This concept has often been referred to as the
The emergence of sophisticated machine learning methods has inched us closer to this vision, and we have recently seen a variety of exciting implementations of machine learning applications in critical care medicine.1,3 It should be noted that this is not a completely novel concept in critical care as approaches like multivariate logistic regression, a form of machine learning, have long been applied in this specialty. For example, illness severity scores such as the APACHE (Acute Physiology and Chronic Health Evaluation) system represent an early form of machine learning in health care, although APACHE and similar models have generally not been used to guide clinical decision making.11,41,42 More recent applications of machine learning with EHR data have included gradient boosted decision trees that can forecast acute kidney injury and predict readmission; convolutional neural networks that can diagnose diabetic retinopathy; recurrent neural networks that can prognosticate directly from clinical time series data; and a reinforcement learning agent that can make treatment decisions in sepsis.43–49 This last example encapsulates the essence of a vast collective experience: the agent was trained on the management decisions of clinicians caring for more than 100 000 sepsis patients and learned to tailor treatment to each individual patient with the goal of reducing 90-day mortality. 46
Many of these more complex methods befuddle clinicians. Rooted in intricate mathematical concepts and proofs, their correct application to clinical problems is not trivial. However, formatting data for model training and fitting a model correctly to minimize the generalization error represent the easiest steps in the creation and deployment of clinical decision support tools. As has been stressed above, the first requirement in this process is the data. Data preparation for machine learning—which includes aggregation, integration, and harmonization—requires substantial effort and buy-in from health care administrators, hospital information technologists, data engineers, and data scientists. The challenge of navigating these barriers dwarfs that of model development. Should these 2 steps be successfully traversed, bringing the model to the bedside presents an equally monumental challenge. Model safety with attention to identification of algorithm bias must be considered and clinical validation is crucial; usability and information overload must also be considered.50,51 We will focus the remainder of this section on specific challenges to developing models that can be effectively incorporated into routine care.
George EP Box famously stated, “all models are wrong, but some are useful.” 52 The question then is how do we determine which are useful. Classification models are frequently described by their ability to discriminate. Discrimination is most often measured by the area under the receiver operating characteristic curve (AUROC). 53 However, the AUROC is a less appropriate measure of performance when a model’s task is detection of rare events, as is common in the critical care context, because, for rare events, specificity disproportionately drives accuracy. 54 The area under the precision-recall curve (AUPRC) should be used in these instances, as it provides a more accurate measure in the face of rare events.
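To make the contrast between the two metrics concrete, the following sketch computes both on a synthetic rare-event example (10 events among 1000 patients). The labels and scores are invented for illustration only, and average precision is used here as one common estimator of the AUPRC; the helper names `auroc` and `average_precision` are assumptions for this example rather than a reference implementation.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values observed at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Synthetic cohort: 40 false alarms score above all 10 true events,
# and the remaining 950 non-events score below them.
labels = [0] * 40 + [1] * 10 + [0] * 950
scores = [0.9] * 40 + [0.8] * 10 + [0.1] * 950

roc = auroc(labels, scores)
ap = average_precision(labels, scores)
print(f"AUROC = {roc:.3f}, average precision = {ap:.3f}")
```

In this toy cohort the AUROC is about 0.96 because the 950 easily ranked non-events dominate the pairwise comparisons, while the average precision is only about 0.12 because every one of the highest-scoring patients is a false alarm.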
Neither of these metrics captures a model’s performance regarding the quantification of absolute risk, which is often of greater clinical value than discrimination of event from non-event. 53 A classification model’s ability to adequately quantify absolute risk probabilities is termed calibration. Calibration may be examined visually with reliability curves and may be quantified by way of observed-to-predicted ratios; null hypothesis tests such as the Hosmer-Lemeshow goodness-of-fit test are not recommended. 53 However, calibration has been less emphasized in the literature and has recently been described as the “Achilles Heel” of clinical predictive model development. 55 Model calibration is sensitive to shifts in measured and unmeasured covariates, and thus if a patient is not drawn from a population similar to the cohort the model was trained on, the model may provide an incorrect risk estimate. 56 We have broached this problem in illness severity score development, but ultimately deployed models will need to have calibration continuously evaluated, requiring regular re-calibration, as well as users who have the ability to tell when a model does not apply to the patient in front of them. 57
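The two calibration checks named above can be sketched in a few lines. The example below computes an overall observed-to-predicted ratio and the points of a simple reliability curve using equal-width probability bins; the binning scheme, the function names, and the 20 synthetic patients are assumptions chosen for illustration, not a prescribed method.

```python
def observed_to_predicted(y_true, y_prob):
    """Ratio of observed events to the sum of predicted risks (1.0 is ideal)."""
    return sum(y_true) / sum(y_prob)


def reliability_points(y_true, y_prob, n_bins=5):
    """Return (mean predicted risk, observed event rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # equal-width bins on [0, 1]
        bins[idx].append((y, p))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            obs_rate = sum(y for y, _ in members) / len(members)
            points.append((mean_pred, obs_rate, len(members)))
    return points


# Synthetic example: the low-risk group is well calibrated (1 event in 10 at 10% risk),
# but the model under-predicts in the higher-risk group (8 events in 10 at 50% risk).
y_prob = [0.1] * 10 + [0.5] * 10
y_true = [1] + [0] * 9 + [1] * 8 + [0] * 2

ratio = observed_to_predicted(y_true, y_prob)
points = reliability_points(y_true, y_prob)
print(ratio, points)
```

Here the overall ratio is about 1.5, and the per-bin points show that the miscalibration sits entirely in the higher-risk group, which is the kind of detail a single summary number hides and a reliability curve reveals.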
Although correct metric selection should drive the development of a well-performing model, there remains an important, and as of yet unaddressed, caveat: causation. 58 Machine learning approaches have demonstrated incredible performance in fitting the associations inherent in the underlying data generating process while avoiding overfitting the random noise that threatens generalization. Nevertheless, these models do not grasp causal structure. As such, they optimize for metrics which ensure prediction based on the associations within the data, but sometimes these associations are spurious and the model relies on an association from a “backdoor path” provided by an unmeasured variable or variables. For example, Caruana et al 59 found that asthma appeared protective against death from pneumonia when building predictive models for pneumonia outcomes. In fact, in the institution where the data were obtained for the model, an asthma attack triggers a higher level of care. A similar issue has been noted in the application of illness severity scores to morbidly obese patients who are critically ill: their physiology is altered at baseline so that they appear sicker than they truly are based on cut-off values established in a cohort with few morbidly obese patients. 33 This blindness to causality has recently been discussed more broadly within the context of training models on data fraught with human biases. Racial and sex underrepresentation within datasets as a result of structural biases may lead to models that misclassify underrepresented groups, causing misallocations in care that ultimately amplify health care disparities. 60 Model developers must therefore take care because current machine learning approaches are blind to causal structure. Schulam and Saria 61 have approached this with some success by attempting to model not only the factual Gaussian processes present in data, as current approaches do, but also the counterfactual Gaussian processes. However, there is much work to be done toward the development of models that can identify causal structure.
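The asthma finding described above can be reproduced with a few lines of arithmetic. The counts below are entirely hypothetical, invented to illustrate the mechanism and not drawn from Caruana et al or any real cohort: asthma patients are routed mostly to a higher level of care, and the level of care is not available to the model.

```python
# Hypothetical counts: (level_of_care, has_asthma) -> (patients, deaths)
counts = {
    ("intensive", True): (90, 9),
    ("intensive", False): (100, 8),
    ("standard", True): (10, 3),
    ("standard", False): (900, 180),
}


def mortality(rows):
    """Pooled mortality over a list of (patients, deaths) pairs."""
    patients = sum(n for n, _ in rows)
    deaths = sum(d for _, d in rows)
    return deaths / patients


# Crude comparison, ignoring level of care (what a naive model sees).
crude_asthma = mortality([v for (_, asthma), v in counts.items() if asthma])
crude_no_asthma = mortality([v for (_, asthma), v in counts.items() if not asthma])

# Comparison within each level of care.
stratified = {
    care: (mortality([counts[(care, True)]]), mortality([counts[(care, False)]]))
    for care in ("intensive", "standard")
}

print(f"crude: asthma {crude_asthma:.3f} vs no asthma {crude_no_asthma:.3f}")
for care, (with_asthma, without_asthma) in stratified.items():
    print(f"{care}: asthma {with_asthma:.3f} vs no asthma {without_asthma:.3f}")
```

In these invented numbers the crude mortality is lower with asthma (12% versus 18.8%), yet within each level of care it is higher (10% versus 8%, and 30% versus 20%). A model trained without the level-of-care variable would learn the crude association and could wrongly assign asthma patients a low risk.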
The problem of causality speaks to the greater problem of model interpretation. As models grow more sophisticated, they also tend to become less interpretable. Deep learning models boast a remarkable ability to examine complex non-linear interactions between inputs, exploiting patterns within the data beyond what the human mind could identify alone, but these models are difficult to interpret and currently represent predictive black boxes.8,62 A lack of interpretability is a non-starter for physicians in practice and a significant barrier to incorporation of such models into clinical decision making.
Shortliffe and Sepulveda 63 highlighted this issue and emphasized the need to consider how these tools will actually be deployed in practice. In a recent publication, they provide a series of criteria for the development of clinical decision support tools. Their insightful considerations reflect the real-life barriers to uptake including clinician workload and application usability. As most models are never deployed and end up in a graveyard of citations, future efforts should focus on usability, interpretability, and, most importantly, impact on relevant population or health system outcomes.
Toward a Precision Critical Care
Precision medicine seeks to tailor care to the individual, and precision critical care has become an active research area.64
Whereas clinicians have always individualized care based on their interpretation of clinical data, the term “precision” has come to mean the use of genomics, expression analyses including proteomics, metabolomics, and other data sources to target the mechanisms which define specific disease phenotypes as well as therapeutic responsiveness. This philosophy has flourished in oncology, where driver mutations and pathway-specific therapies have emerged. However, the critically ill are defined by multi-organ failure occurring via a complex interplay of exposure, host response, and genomic substrate and expression along with innumerable other dimensions of variation that challenge the successful application of such approaches.65
These -omics data are inherently
The availability of large EHR databases and sophisticated modeling approaches brings us closer to the promise of a precision critical care. The granularity afforded by access to all of the data a patient generates presents an opportunity to examine nuances of care previously inaccessible to the unaided individual clinician. This ability to capture and analyze all the available data will allow clinicians to continuously trend signals to support the iterative formulation of assessments and plans. 67 These data at the population level should also assist in the future creation of more precise therapeutic interventions than are currently available in critical care. For example, individualized differences in the nature or timing of the immune response to sepsis or trauma would inform the selection of treatment A rather than treatment B for a particular constellation of insult, host state of instability, and immunologic response. 68 Artificial intelligence will grow to fill in the gaps in this process out of necessity as the volume and dynamics of the data inputs exceed even the abilities of a clinician dedicated to the bedside care of a single patient.
One example of a potential use for clinical data analytics is in the area of laboratory test interpretation. The idea that a
More abstractly, for any given patient in the ICU, there exists a set of recorded variables; the values of these variables, as well as the presence or absence of such data, and even the time of the data collection collectively define an interaction between the patient and the caregivers. Furthermore, each of these collections of patient variables is an element of a data mart which defines the interaction between the ICU’s population, the system within which the ICU exists, and the ICU’s caregivers. Formalizing these as the ingredients that define the substrate for understanding collective experience, we envision the next generation of EHR to support “dynamic clinical data mining” (DCDM).70
Specifically, DCDM would enable examination of any single ICU encounter within the context of
As critical care does not yet possess gene-based therapies, precision medicine in this area rests on a data-driven capacity to make more individualized decisions in a greater variety of clinical contexts. To begin to approach this task, we must start to store all pertinent data on individual patients, as we are doing, and develop open, de-identified population databases as we are only beginning to do. Appropriate software, including a variety of machine learning applications, will be required to harness, analyze, and apply the data necessary to ensure that precision medicine can be practiced in this especially complex domain.
Conclusions
In the short term, with the emergence of powerful machine learning approaches, and data volumes that allow patients to be mapped across an expanding dimension of physiologic variations, we stand at the precipice of a new era of critical care that will be individualized in a data-driven manner. The barriers are myriad, but if clinicians, data scientists, and policy-makers can work together, the vision of a learning health care system may be realized. To achieve this vision of personalized care, physician collaboration with relevant experts such as data scientists is crucial. The training of physicians will require some very fundamental overhauls that may be poorly understood by and even in conflict with the educational hierarchy already in place in medicine. And most importantly, having the most complete, reliable, and interoperable data to work with represents a necessary if insufficient goal for the infrastructure of a digital, learning health system for acutely ill patients.
Footnotes
Funding:
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Declaration of conflicting interests:
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Author Contributions
DJS and LAC outlined the scope of the review, and CVC developed the initial manuscript draft. All authors participated in developing the final manuscript.
