Abstract
Screening for social determinants of health (SDOH) is recommended, but numerous barriers exist to implementing SDOH screening in clinical spaces. In this study, the authors identified how both active and passive information retrieval methods may be used in clinical spaces to screen for SDOH and meet patient needs. The authors conducted a retrospective sequential cohort analysis comparing the active identification of SDOH through a patient-led digital manual screening process completed in primary care offices from September 2019 to January 2020 and passive identification of SDOH through natural language processing (NLP) from September 2016 to August 2018, among 1735 patients at a large midwestern tertiary referral hospital system and its associated outlying primary care and outpatient facilities. The percent of patients identified by both the passive and active identification methods as experiencing SDOH varied from 0.3% to 4.7%. The active identification method identified social integration, domestic safety, financial resources, food insecurity, transportation, housing, and stress in proportions ranging from 5% to 36%. The passive method contributed to the identification of financial resource issues and stress, identifying 9.6% and 3% of patients to be experiencing these issues, respectively. SDOH documentation varied by provider type. The combination of passive and active SDOH screening methods can provide a more comprehensive picture by leveraging historic patient interactions, while also eliciting current patient needs. Using passive, NLP-based methods to screen for SDOH will also help providers overcome barriers that have historically prevented screening.
Introduction
Medical care is estimated to account for only 10%–20% of modifiable contributors to healthy outcomes for a population, whereas the other 80%–90% are accounted for by health-related behaviors, socioeconomic factors, and environmental factors, broadly referred to as social determinants of health (SDOH). 1
The World Health Organization (WHO) defines SDOH as the conditions in which people are born, grow, live, work, and age; these conditions are largely responsible for the health inequalities seen within and between countries. 2 The WHO categorizes the determinants into 5 domains: economic stability, education, social and community context, health and health care, and neighborhood and built environment. 3
Physicians and hospital staff agree that screening for SDOH is important. The American Academy of Family Physicians recommends documenting SDOH for every patient at every office visit. 4 Screening for SDOH has allowed large health care institutions to identify social needs and implement appropriate interventions. 5 In addition, implementing SDOH screening is cost-effective for hospitals. Economic benefits include avoiding Medicare or Medicaid readmission penalties, becoming eligible for higher Medicare or Medicaid reimbursement rates, and increasing employee and patient satisfaction and loyalty. 6
Moreover, federal, state, and commercial payers have recently moved to include SDOH as a clinical quality metric. 7 Despite clear recommendations, few hospital systems and physician practices screen patients for major SDOH. 8
Significant concerns have been raised about screening for SDOH needs in health care settings, especially acute care settings. 9,10 Studies have found that physicians may not feel confident discussing SDOH with patients for a variety of reasons, including time constraints on patient interactions and uncertainty about how to ask about social determinants. 11 In addition, time spent screening for SDOH may reduce the time clinicians can spend providing medical care. 12
Natural language processing (NLP) methods can be used to screen medical records and could be key to improving clinicians' identification of SDOH. 13 About 80% of medical data remain unstructured, 14 and hospital systems have difficulty using these unstructured data in ways that are relevant, efficient, and effective. 15,16 Numerous NLP programs have successfully generated structured information from unstructured clinical data. 17–20 Many electronic medical record vendors recognize this potential and already offer screening tools tailored to the individual needs of hospitals and providers. 21 Machine learning has already been used in patient care settings to identify high-risk patients who might benefit from having their social needs addressed. 22,23
In this study, the authors compared the benefits of active versus passive information retrieval methods for SDOH screening. The objective was to identify how both methods may be used in health care settings to best address patient needs.
Methods
Setting, participants, and study design
This study conducted a retrospective sequential cohort analysis comparing the active identification of SDOH through a patient-led digital manual screening process completed in primary care offices from September 2019 to January 2020 and passive identification of SDOH through NLP from September 2016 to August 2018, among 1735 patients at a large midwestern tertiary referral hospital system and its associated outlying primary care and outpatient facilities.
The study population was drawn from rural adult patients seen at 1 of 4 outpatient primary care clinics and the 4 affiliated hospitals that served those clinics within the network. Patients were included if they were at least 18 years of age, agreed to complete manual screening for SDOH, and had an ambulatory or inpatient visit during the study period. The OSF Institutional Review Board approved the research protocol (IRB no. 1689505-4).
Passive identification of SDOH
NLP is a subfield of artificial intelligence that enables machines to read and interpret human language. Passive information retrieval was performed using the Pieces NLP system to identify SDOH from narrative documentation in the electronic health record (EHR) for a sample of 2131 deidentified patients. The Pieces software platform is a cloud-based machine learning platform that securely sends and receives data from external sources, such as the EHR. It is designed to collect and process both structured and unstructured clinical data from multiple sources and to interpret this information using predictive modeling and NLP.
Clinical chart records from both inpatient and outpatient encounters were provided to the Pieces system through a secure file transfer protocol channel compliant with the Health Insurance Portability and Accountability Act. The Pieces system identified patient documentation relating to SDOH and categorized it using the models defined in Figure 1.

Figure 1. Passive social determinants of health identification: natural language processing model definitions.
Active identification of SDOH
Active information retrieval was performed by administering a digital SDOH screening questionnaire to patients at outpatient clinics. The questionnaire was field tested. All patients who received services at the study sites completed the questionnaire on a tablet before seeing their provider, as part of the routine workflow. The questionnaire includes 22 questions covering the following SDOH concepts: social integration, safety and domestic violence, education, financial resource strain, food insecurity, transportation, housing, and stress.
In addition to being screened for SDOH needs, patients were also asked whether they would be willing to accept assistance if they screened positive for 1 or more SDOH categories. Results from the SDOH screening were incorporated into the patients' EHR.
Data analysis
Patients with incomplete narrative notes were excluded from the analysis. The proportion of notes written by each author type (eg, physician, physician assistant, case manager, and social worker) was evaluated. The clinical notes were then processed by the passive identification method, allowing the NLP models to extract phrasing indicating that a patient was experiencing 1 or more SDOH, as defined in Figure 1.
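The note-processing steps described above (excluding patients with incomplete notes, tallying notes by author type, and rolling note-level findings up to patient-level SDOH flags) can be sketched as follows. This is a minimal illustration under stated assumptions: the Pieces NLP models are proprietary and are not reproduced here, so a toy keyword lexicon (`TOY_PATTERNS`) stands in for them, and the completeness rule in `is_complete`, the note fields (`patient_id`, `author_type`, `text`), and all function names are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for the proprietary Pieces NLP models.
# Real SDOH models handle negation, context, and far more phrasing than this.
TOY_PATTERNS = {
    "financial_strain": re.compile(r"\b(cannot|can't|unable to) (afford|pay)\b", re.I),
    "food_insecurity": re.compile(r"\b(food insecurity|food bank|not enough food)\b", re.I),
    "transportation": re.compile(r"\b(no|lacks?|without) (a )?(ride|car|transportation)\b", re.I),
}


def is_complete(note):
    """Assumed completeness rule: the note has an author type and non-blank text."""
    return bool(note.get("author_type")) and bool(note.get("text", "").strip())


def exclude_incomplete_patients(notes):
    """Drop every note belonging to a patient who has any incomplete note."""
    incomplete = {note["patient_id"] for note in notes if not is_complete(note)}
    return [note for note in notes if note["patient_id"] not in incomplete]


def author_type_proportions(notes):
    """Share of notes written by each author type."""
    counts = Counter(note["author_type"] for note in notes)
    total = sum(counts.values())
    return {author: n / total for author, n in counts.items()}


def patient_level_flags(notes):
    """A patient is flagged for an SDOH if any of their notes matches its pattern."""
    flags = defaultdict(set)
    for note in notes:
        for sdoh, pattern in TOY_PATTERNS.items():
            if pattern.search(note["text"]):
                flags[note["patient_id"]].add(sdoh)
    return dict(flags)


# Usage with made-up notes.
notes = [
    {"patient_id": "p1", "author_type": "physician", "text": "Pt states she cannot afford insulin."},
    {"patient_id": "p1", "author_type": "case manager", "text": "Referred to local food bank."},
    {"patient_id": "p2", "author_type": "social worker", "text": "Has no ride to dialysis."},
    {"patient_id": "p3", "author_type": "physician", "text": "   "},  # incomplete: p3 excluded
]
complete = exclude_incomplete_patients(notes)
proportions = author_type_proportions(complete)
flags = patient_level_flags(complete)
```

Rolling up to the patient level mirrors the patient-level statistics reported later; a production pipeline would replace the toy lexicon with the validated models.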
The positive predictive value (PPV) and sensitivity of the SDOH NLP models were calculated by comparing NLP output against a gold standard of manual note review on a random sample of 58% (N = 1132) of all cases. Trained chart reviewers independently examined the EHR notes of the sampled cases for the SDOH features defined in Figure 1, and their findings were compared with the results of the passive identification method to calculate the PPV and sensitivity of each SDOH NLP model (Fig. 2).

Figure 2. Defining natural language processing model evaluation metrics.
The authors calculated the sensitivity, specificity, PPV, and negative predictive value (NPV) of the passive identification method against manual chart review for each SDOH, individually as well as in combination.
PPV indicates the probability that a patient identified by the passive method as having an SDOH has manually verified documentation of that SDOH, whereas NPV indicates the probability that a patient identified as not having an SDOH truly does not.
PPV = True positive/(True positive + False positive)
NPV = True negative/(True negative + False negative)
Sensitivity indicates the probability that the passive system correctly identifies an SDOH among patients with manually verified documentation of that SDOH, whereas specificity indicates the probability that it correctly classifies patients with no documented SDOH as negative. Unlike PPV and NPV, sensitivity and specificity do not depend on the prevalence of SDOH.
Sensitivity = True positive/(True positive + False negative)
Specificity = True negative/(True negative + False positive)
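The 4 metrics defined above can be computed directly from paired patient-level labels (NLP output vs manual chart review). The sketch below assumes each patient has 1 boolean flag per SDOH from each source; the function names are hypothetical. It also includes a small Bayes' rule helper showing why PPV, unlike sensitivity and specificity, shifts with prevalence. The labels, sensitivity, specificity, and prevalence values in the usage lines are illustrative, not study results.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # NLP positive, manual review positive
    fp: int  # NLP positive, manual review negative
    tn: int  # NLP negative, manual review negative
    fn: int  # NLP negative, manual review positive


def confusion_from_labels(nlp_flags, gold_flags):
    """Tally agreement between NLP flags and manual-review (gold standard) flags."""
    if len(nlp_flags) != len(gold_flags):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(nlp_flags, gold_flags))
    return Confusion(
        tp=sum(1 for p, g in pairs if p and g),
        fp=sum(1 for p, g in pairs if p and not g),
        tn=sum(1 for p, g in pairs if not p and not g),
        fn=sum(1 for p, g in pairs if not p and g),
    )


def _ratio(num, den):
    # Undefined metrics (empty denominator) are returned as NaN rather than raising.
    return num / den if den else float("nan")


def metrics(c):
    return {
        "ppv": _ratio(c.tp, c.tp + c.fp),
        "npv": _ratio(c.tn, c.tn + c.fn),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """Bayes' rule: expected PPV at a given prevalence, holding sensitivity and specificity fixed."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Usage with made-up labels for 5 patients.
c = confusion_from_labels([True, True, False, False, True], [True, False, False, True, True])
m = metrics(c)  # ppv 2/3, npv 1/2, sensitivity 2/3, specificity 1/2
# Same hypothetical model (sensitivity 0.90, specificity 0.95) at 2 prevalences:
low = ppv_at_prevalence(0.90, 0.95, 0.10)   # about 0.67
high = ppv_at_prevalence(0.90, 0.95, 0.50)  # about 0.95
```

The prevalence example is one reason PPV estimates for rarely documented SDOH should be read alongside the number of patients they rest on.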
In addition, the passive identification findings were analyzed to determine how frequently each SDOH appeared in notes written by the various author types. Descriptive statistics (means and percentages) were used as appropriate.
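One way to tabulate how often each SDOH appears by author type is sketched below. The denominator is an assumption, since the text does not specify it: here each percentage is the share of distinct patients with at least 1 note from that author type whose notes from that author type mentioned the SDOH. The input format, function name, and example data are hypothetical.

```python
from collections import defaultdict


def sdoh_frequency_by_author(detections):
    """detections: iterable of (patient_id, author_type, set_of_sdoh), one tuple per note.

    Returns {author_type: {sdoh: percent}}, where the assumed denominator is the number
    of distinct patients with at least 1 note by that author type.
    """
    patients = defaultdict(set)                      # author_type -> patients seen
    flagged = defaultdict(lambda: defaultdict(set))  # author_type -> sdoh -> patients
    for patient_id, author_type, sdohs in detections:
        patients[author_type].add(patient_id)
        for sdoh in sdohs:
            flagged[author_type][sdoh].add(patient_id)
    return {
        author: {sdoh: 100 * len(ids) / len(patients[author]) for sdoh, ids in flagged[author].items()}
        for author in patients
    }


# Usage with made-up note-level detections.
detections = [
    ("p1", "physician", {"behavioral_health"}),
    ("p2", "physician", set()),
    ("p1", "case manager", {"transportation"}),
    ("p3", "case manager", {"transportation", "food_insecurity"}),
    ("p4", "case manager", set()),
    ("p5", "case manager", set()),
]
table = sdoh_frequency_by_author(detections)
# table["physician"]    == {"behavioral_health": 50.0}
# table["case manager"] == {"transportation": 50.0, "food_insecurity": 25.0}
```

Counting distinct patients rather than notes keeps a single heavily documented patient from inflating an author type's percentage; a note-level denominator would be an equally defensible alternative.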
The SDOH needs identified by active identification were matched to those identified by the passive approach (Fig. 3). The prevalence of the SDOH identified by each method was reviewed to evaluate the potential benefits of combining both methods of information retrieval for SDOH screening.

Figure 3. Matching of social determinants of health between 2 methods of identification.
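The matching step can be sketched as a per-SDOH set comparison: for each shared category, find the patients flagged by the active method, by the passive method, by both, and by either. The sketch assumes the categories of the 2 methods have already been harmonized to shared names (as in the mapping in Figure 3) and uses the full cohort size as the denominator; the function name, category keys, and example cohort are hypothetical.

```python
def overlap_summary(active, passive, sdoh_names, n_patients):
    """Per-SDOH percentages of the cohort identified by each method, by both, and by either.

    active, passive: {patient_id: set of harmonized SDOH names flagged by that method}.
    """
    summary = {}
    for sdoh in sdoh_names:
        by_active = {pid for pid, flags in active.items() if sdoh in flags}
        by_passive = {pid for pid, flags in passive.items() if sdoh in flags}
        summary[sdoh] = {
            "active_pct": 100 * len(by_active) / n_patients,
            "passive_pct": 100 * len(by_passive) / n_patients,
            "both_pct": 100 * len(by_active & by_passive) / n_patients,
            "either_pct": 100 * len(by_active | by_passive) / n_patients,
        }
    return summary


# Usage with a made-up cohort of 10 patients.
active = {"p1": {"housing", "stress"}, "p2": {"stress"}, "p3": set()}
passive = {"p2": {"stress"}, "p4": {"stress", "housing"}}
result = overlap_summary(active, passive, ["stress", "housing"], n_patients=10)
# result["stress"]  -> active 20.0, passive 20.0, both 10.0, either 30.0
# result["housing"] -> active 10.0, passive 10.0, both 0.0, either 20.0
```

The "either" column is the quantity that motivates combining methods: whenever overlap is small, the union captures noticeably more patients than either method alone.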
The authors compared the frequency of SDOH identified by the 2 methods using the Pearson χ2 test, calculated with SciStat. The differences between the passive and active methods were statistically significant, as described further in the Results section.
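For reference, the Pearson χ2 statistic for a 2 × 2 table (method × identified/not identified) can be computed without external libraries, as sketched below. This is a generic textbook implementation, not SciStat's; it assumes no continuity correction, which the text does not specify, and uses the fact that for 1 degree of freedom the upper-tail probability equals erfc(√(χ2/2)). The counts in the usage lines are hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the 2 x 2 table

        [[a, b],    method 1: identified, not identified
         [c, d]]    method 2: identified, not identified

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column total must be positive")
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Usage with made-up counts: 100 of 1735 patients flagged by one method vs 20 of 1735 by the other.
chi2, p = pearson_chi2_2x2(100, 1635, 20, 1715)
# A table whose statistic is easy to verify by hand: every expected count is 15, so chi2 = 100/15.
check_chi2, check_p = pearson_chi2_2x2(10, 20, 20, 10)
```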
Results
The demographics of the study population are shown in Table 1. The population was primarily White (96%) and female (64%). The mean age was 50 years, 50% of participants reported being married, and over 80% reported having completed high school or higher education.
Table 1. Baseline Patient Characteristics
Table 2 gives the prevalence of social determinants identified by passive or active screening. The percent of patients identified by both identification methods as experiencing an SDOH varied from 0.3% to 4.7%, depending on which SDOH need was being assessed.
Table 2. Prevalence of Social Determinants Identified by Method
The passive method was the only method that sought to identify behavioral health, substance misuse, and disability, which may be viewed as drivers of the other social needs assessed in the study. The passive method also contributed to the identification of financial resource issues and stress, identifying 9.6% and 3% of patients, respectively, as experiencing these issues. The active (manual) identification method identified social integration, domestic safety, financial resources, food insecurity, transportation, housing, and stress in proportions ranging from 5% to 36%.
The differences between the proportions of patients identified by the manual and passive methods were statistically significant (Pearson χ2, P < 0.0001) for all 7 SDOH that both methods sought to identify.
Table 3 reports the performance of the SDOH NLP models. The PPV of the NLP models (passive method) ranged from 84% to 100%, depending on the SDOH assessed; that is, 84%–100% of patients flagged for a given SDOH had that SDOH confirmed by manual review. The average PPV was 0.98. Sensitivity ranged from 75% to 100%, depending on the SDOH assessed, meaning that the models flagged 75%–100% of patients whose clinical notes documented that SDOH. The average sensitivity was 0.96.
Table 3. Pieces Social Determinants of Health Natural Language Processing Model Performance: Patient-Level Statistics
The author types of the notes in which the passive method identified SDOH were analyzed. The types of SDOH discussed with patients, as identified by the passive method, varied by provider. For example, in the inpatient setting, physicians were more likely than case managers to document clinically related issues, such as behavioral health (50% vs 36%; χ2, P < 0.0001) and substance misuse (36% vs 5%; χ2, P = 0.0003). Both differences were statistically significant.
In contrast, case managers were more likely than physicians to document social issues, such as transportation needs (0% for physicians vs 21% for case managers; χ2, P < 0.0001) and food insecurity (0% vs 13%; χ2, P = 0.0003).
The differences in the percentage of patients for whom other SDOH (stress and safety/domestic violence) were identified were not statistically significant.
Discussion
The results of this study indicate that SDOH can be passively identified from patient records with relatively high accuracy, in line with a growing body of literature on using NLP to screen for SDOH. 20,24,25 Effective use of SDOH NLP models to screen patient records may offer improvements over current SDOH screening methods.
For example, NLP screening can identify SDOH across a longer time range than active screening by scanning all historical notes in a patient's chart. In this way, it can flag patterns that providers may not have noticed before a clinical encounter, reflecting the longitudinal nature of social needs. This more extensive identification could prompt more in-depth conversations about social vulnerabilities noted in the past.
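To illustrate how dated note-level detections could be rolled up into a longitudinal view, the sketch below records, for each patient and SDOH, the first and most recent mention and the number of notes mentioning it, and then flags SDOH mentioned repeatedly. The `Detection` record, the `min_notes` threshold, and the example dates are hypothetical; this is not a description of how the Pieces system summarizes chart history.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Detection:
    """Hypothetical record: one SDOH found by NLP in one dated note."""
    patient_id: str
    sdoh: str
    note_date: date


def longitudinal_summary(detections):
    """Map (patient_id, sdoh) -> (first mention, most recent mention, number of notes)."""
    summary = {}
    for d in detections:
        key = (d.patient_id, d.sdoh)
        first, last, count = summary.get(key, (d.note_date, d.note_date, 0))
        summary[key] = (min(first, d.note_date), max(last, d.note_date), count + 1)
    return summary


def recurring(summary, min_notes=2):
    """Hypothetical flag: SDOH mentioned in at least `min_notes` notes, sorted for stable output."""
    return sorted(key for key, (_, _, count) in summary.items() if count >= min_notes)


# Usage with made-up detections spanning several years of chart history.
history = [
    Detection("p1", "housing", date(2016, 10, 3)),
    Detection("p1", "stress", date(2017, 5, 1)),
    Detection("p1", "housing", date(2018, 2, 14)),
    Detection("p2", "transportation", date(2017, 8, 9)),
]
summary = longitudinal_summary(history)
flagged = recurring(summary)  # [("p1", "housing")]
```

Keeping the most recent mention alongside the first also gives a clinician a quick sense of whether a need may already have resolved before the visit.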
In addition, despite clear recommendations in support of regular screening for SDOH, 4 which SDOH should be screened for, and how they should be prioritized, is still debated. 26 The results of this study indicate that passive techniques identify different SDOH than manual screening methods do. Active methods identified social integration, domestic safety, financial resources, food insecurity, transportation, housing, and stress more frequently than the passive method. Similar to the findings of Ridgway et al, 27 our results indicate that neither passive nor active methods alone sufficiently capture a patient's entire experience relating to SDOH (Table 2).
Future studies are needed to determine how best to combine the 2 approaches. For example, a sequential or hybrid model could use passive screening for longitudinal risk assessment outside the clinical environment, followed by in-person screening at a visit.
Passive identification also allows SDOH to be identified proactively without the patient present, which could address an increasing demand for technology-based methods of communicating social resources outside of clinical encounters. 28,29 Such outreach would help reserve clinical visits for urgent medical needs while addressing SDOH needs in the community on an ongoing basis. New technology, such as an NLP-informed chatbot, could potentially improve patient access to SDOH resources by tailoring suggested community resources to information in a patient's electronic medical record.
Providers have been documented to be hesitant to discuss SDOH with patients for a number of reasons. 11 Patients, too, have been found to express hesitancy when discussing SDOH with medical providers. 30 Although research has been conducted on improving providers' approach to screening for SDOH, 31 alternatives to manual screening should also be considered as a way to prepare both patients and providers for these conversations. By using passive methods to identify high-risk patients before a visit, physicians may feel more prepared to discuss SDOH and offer helpful resources.
The results of this study indicate that providers of different types and in different settings document different social vulnerabilities (Table 4), possibly because providers vary in how important they consider SDOH screening, as found by Palacio et al. 32 Evidence shows that both patients and hospital systems benefit from screening for SDOH, 33–35 and SDOH screening is increasingly becoming a payer quality metric in risk-based contracts. 6
Table 4. Social Determinants of Health Identified by Author Type
Because passive identification can track which providers discuss and document SDOH, in which locations, and how frequently, health systems could leverage these data to formulate strategic interventions. Health systems could also use these data to allocate resources more effectively, for example by assessing where and when case managers or social workers should be matched with patients.
Both passive and active methods of SDOH data collection may be necessary at this time, because they overlap only slightly. However, if additional training is offered and case managers and social workers are strategically placed in the patient journey, reliance on active identification through manual screening could decrease as provider documentation of SDOH increases. This is because NLP can only detect what is written in the EHR. Thus, although an initial approach may be to implement a blended model that relies on active screening to initiate conversations, as clinicians become more comfortable discussing social issues as part of routine clinical care, the process could shift to rely more heavily on passive screening, gaining efficiency over time.
In short, passive SDOH identification supports both strategic and tactical efforts across health systems, and the use of each tool will require continued reassessment. Because active and passive screening currently fulfill somewhat different roles, further study is needed to determine the best placement of these tools in the health system to optimize the care of vulnerable patients.
Limitations
The interpretability of data gathered from machine learning models is still debated. 22 These techniques have been found to be limited by the diversity of the data used to develop them and to require consistent methods of data collection to work effectively. 14,36,37 In addition, the study population for this project was predominantly White and female and resided in rural communities, which may limit the generalizability of the findings.
This study compares data from different time periods: the NLP analysis was performed on historical EHR data, whereas active screening was done in real time. It is possible that NLP identified SDOH on retrospective review that were not identified on active screening because they had resolved by the time of the active screen. Although this is a limitation, the authors assumed that such events were symmetrically distributed (ie, it is just as likely that an SDOH was not identified by passive screening and then identified on active screening as the converse).
During passive screening using NLP, fewer than 20 patients were found to have notes mentioning social isolation, housing stability, transportation, or food insecurity. Thus, although the PPV and sensitivity of those models are high, each was estimated from fewer than 20 patients.
Conclusion
This study demonstrates that passive artificial intelligence (AI)-based methods can accurately identify SDOH and, in combination with in-person active screening, have the potential to provide a comprehensive assessment of social vulnerability. Health systems beginning this work may start by increasing the capture and collection of SDOH through a blended model of passive AI-based methods alongside active screening. However, these results suggest that systems may eventually be able to rely on passive AI systems for the entire screening task, reserving active manual efforts for higher value clinical tasks.
Footnotes
Acknowledgments
The authors thank Ruben Amarasingham, MD, for his help in reviewing and editing this article. The authors thank Yuanyuan Feng and Yuqi Si for their tireless data analysis and Michael Seeber for his support and guidance of their efforts. The authors thank Mathew Fowler for providing the research team with the required data sets. They thank Majid Rastegar and Hong Kang for their masterful NLP work, as well as Komli-Kofi Atsina, MD, and Yukun Chen for their leadership and guidance throughout the project execution and analysis.
Authors' Contributions
Dr. Stewart de Ramirez contributed to conceptualization, funding acquisition, investigation (equal), methodology (equal), project administration (lead), resources (equal), supervision, and writing—review and editing (equal). Ms. Shallat was involved in writing—original draft (lead) and writing—review and editing (equal). Mr. McClure contributed to writing—original draft (supporting) and writing—review and editing (equal).
Ms. Barenblat carried out data curation (lead), formal analysis, investigation (equal), methodology (equal), project administration (supporting), resources (equal), software, validation, writing—original draft (supporting), and writing—review and editing (equal). Ms. Foulger was in charge of data curation (supporting), methodology (equal), and blinded review of results (supporting).
Author Disclosure Statement
OSF HealthCare is an investor in Pieces Technologies. Dr. Stewart de Ramirez and Ms. Foulger are employed by OSF HealthCare. Ms. Barenblat is employed by Pieces Technologies. The individual authors have no additional interests to disclose.
Funding Information
No funding was received for this article.
