Abstract
Background:
Type II workplace violence in health care, perpetrated by patients/clients toward home healthcare nurses, is a serious health and safety issue. A significant portion of violent incidents are not officially reported. Natural language processing can detect these “hidden cases” from clinical notes. In this study, we computed the 12-month prevalence of Type II workplace violence from home healthcare nurses’ clinical notes by developing and utilizing a natural language processing system.
Methods:
Nearly 600,000 clinical visit notes from two large U.S.-based home healthcare agencies were analyzed. The notes were recorded from January 1, 2019 to December 31, 2019. Rule- and machine-learning-based natural language processing algorithms were applied to identify clinical notes containing workplace violence descriptions.
Results:
The natural language processing algorithms identified 236 clinical notes that included Type II workplace violence toward home healthcare nurses. The prevalence of physical violence was 0.067 incidents per 10,000 home visits. The prevalence of nonphysical violence was 3.76 incidents per 10,000 home visits. The prevalence of any violence was four incidents per 10,000 home visits. In comparison, no Type II workplace violence incidents were recorded in the official incident report systems of the two agencies in this same time period.
Conclusions and Application to Practice:
Natural language processing can be an effective tool to augment formal reporting by capturing violence incidents from daily, ongoing, large volumes of clinical notes. It can enable managers and clinicians to stay informed of potential violence risks and keep their practice environment safe.
Background
Type II workplace violence in healthcare, committed by patients/clients toward workers (Howard, 1996), is the most common type of violence in healthcare (Occupational Safety and Health Administration [OSHA], 2015). A recent meta-analysis shows that among home healthcare workers, most commonly nurses, over 50% report experiencing nonphysical violence, and nearly 15% experience physical violence annually (Byon, Lee, et al., 2020). Workplace violence victims experience serious negative consequences, including poor mental health, physical injury, and sickness absence (Nyberg et al., 2021). In addition, victims report decreased work functioning, worsening relationships with patients, and diminished quality of the care they provide (Lanctôt & Guay, 2014). Addressing Type II workplace violence is crucial in recruiting and retaining this critical workforce.
Violence is underreported across healthcare settings, especially among home healthcare nurses (Byon, Lee, et al., 2020), limiting our understanding of the full magnitude of the issue. Studies in inpatient institutions show that up to 60% of Type II workplace violence is not reported to management, either verbally or in any other format (Arnetz et al., 2015; Findorff et al., 2005; Pompeii et al., 2016). Underreported incidents create biased data that misinform workplace violence prevention policies and programs (Campbell, 2017).
Pompeii et al. (2016) found that a large portion (24%) of Type II workplace violence events were recorded by hospital workers in electronic health records rather than through formal reporting channels. Part of the workers’ motivation was to document “their side of the story” in the patient medical record and protect themselves rather than seek support from their employer. Barriers to reporting Type II workplace violence among home healthcare nurses were identified in our team’s qualitative examination (Byon, Liu, et al., 2020). The nurses were unwilling to report when reporting was perceived as disadvantageous (reliving the trauma), discouraged (by a norm that experiencing violence is a part of the job), unachievable (unstandardized reporting process), or ambiguous (uncertain of what is reportable). Without an efficient and effective formal reporting system, healthcare workers may continue to resort to informal reporting channels such as electronic medical records. Home healthcare nurses’ clinical visit notes in electronic health records can be an informative source for case detection and analysis. Visit notes can provide rich narratives about violent incidents. However, these notes pose intrinsic challenges for data analysis because of their enormous volume, their complexity, and the free text used to describe an incident.
Natural language processing, a technique from computer science that helps to analyze large bodies of text (Demner-Fushman et al., 2021), can be used to address this challenge. There are broadly two approaches to natural language processing: rule-based and machine-learning-based (Banda et al., 2018; Hirschberg & Manning, 2015). The rule-based approach categorizes the text based on specific rules. The rules rely on keywords selected by domain experts (e.g., home healthcare clinicians). As such, a rule-based natural language processing approach extracts concepts from unstructured texts based on clinical insight. In contrast, in a machine-learning-based approach, algorithms learn to classify texts without being explicitly instructed on how to understand them. Employing statistical methods, the machine-learning-based approach analyzes pre-annotated texts to derive its own rules.
Natural language processing helps quantitatively read, decipher, and understand textual descriptions. It can help identify Type II workplace violence cases by analyzing clinical notes in electronic health records. Capturing previously underreported cases by extracting information from clinical notes can provide a more accurate estimate of Type II workplace violence prevalence. Analysis of clinical notes at scale can also provide a more precise picture of the epidemiology of Type II workplace violence, which can inform policy-making and prevention efforts. Furthermore, developing natural language processing algorithms that identify Type II workplace violence cases from a large volume of clinical notes can help detect cases promptly and potentially identify risks for future violence via risk modeling.
In this study, we extracted and analyzed Type II workplace violence-related information from home healthcare nurses’ clinical notes using natural language processing. The purpose of this study was to determine the 12-month prevalence of Type II workplace violence and compare the prevalence with that of incidents formally reported in the established reporting system (i.e., incident report).
Methods
Study Data
The primary data for this study were clinical visit notes from two large home healthcare agencies on the East Coast of the United States, recorded from January 1, 2019, to December 31, 2019, in the electronic health record. Clinical notes written by home healthcare nurses were analyzed. The study data did not include clinical notes of other occupations. We extracted nearly 600,000 clinical notes (570,865 and 19,743 notes from two agencies, respectively). We also inquired about Type II workplace violence prevalence data in their established reporting systems (i.e., incident reports) for the same year. All data were de-identified before analysis. This study was approved by the Institutional Review Boards at UVA Health and VNS Health.
Data Analysis
Our analytical pipeline consisted of three phases. In Phase I, we developed an initial vocabulary of words/phrases potentially describing Type II workplace violence incidents. Based on the vocabulary list, we automatically screened for potential violence cases across the entire sample of clinical notes. In Phase II, the potential violence cases (from Phase I) were reviewed by a human reviewer and annotated for the actual presence of Type II workplace violence. In Phase III, we used the human-annotated data (from Phase II) to develop and evaluate a natural language processing system to identify Type II workplace violence in clinical notes. Below we provide a detailed description of each phase. Our analysis combined rule-based and machine-learning-based approaches to benefit from both home healthcare clinician insight and automated algorithms. Prevalence was calculated as the number of Type II violence incidents per 10,000 home visits. The unit of analysis was the clinical note, and each note served as a proxy for one visit by a nurse.
Phase I: Violence-Related Vocabulary Development and Screening for Potential Violence Cases Using a Rule-Based Approach
We first created a Type II workplace violence vocabulary list to identify potential Type II workplace violence cases from the clinical notes. The violence included both physical types (e.g., physical assault and sexual assault) and nonphysical types (e.g., verbal abuse, threats, racial slurs, and sexual harassment). As the initial step, a base Type II workplace violence vocabulary list was manually compiled by two content experts (H.B. and M.C.). This vocabulary included expressions such as “punching,” “cursing,” and “hitting,” among others. Then, we expanded the base vocabulary list through NimbleMiner (Topaz et al., 2019), an open-source natural language processing system developed by the study co-investigator (M.T.). Specifically, using the “Rapid Vocabulary Explorer” module in NimbleMiner, two authors (H.B. and M.C.) independently input query terms from the base vocabulary expressions (e.g., “punch” for physical violence; “curse” for nonphysical violence), and NimbleMiner produced potential synonyms (e.g., “pt (patient) punched,” or “using foul language”) for the researchers to manually choose from. This was an iterative process, as newly chosen expressions were then used to identify even more similar terms. The process was stopped when the two content experts (H.B. and M.C.) judged that NimbleMiner could not offer additional synonyms (Figure 1). The final expression lists from the two researchers were compared, and disagreements were resolved by a third reviewer (M.T.). As a result, we identified a total of 359 expressions that described workplace violence (Table 1). Then, we screened the entire set of clinical notes for any notes that contained at least one of these expressions.

Figure 1. Violence Words/Phrases Vocabulary Development Steps.
Table 1. Violence-Related Words and Phrases (n = 359) Describing Type II Workplace Violence.
Note. The underbar (_) indicates a space. Some words and phrases (e.g., assault) were misspelled in the clinical notes and captured as they were. Some words and phrases existed in multiple contexts (e.g., “grabbed” in the physical assault and sexual assault categories).
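The screening step can be illustrated with a minimal sketch. It assumes that vocabulary expressions are stored with underbars in place of spaces and that screening is a case-insensitive whole-phrase search. The three expressions, the sample notes, and the function names `compile_vocabulary` and `screen_notes` are hypothetical; NimbleMiner's actual matching logic may differ.

```python
import re

# Illustrative subset of a vocabulary; underbars stand for spaces.
VOCABULARY = ["pt_punched", "using_foul_language", "cursing"]


def compile_vocabulary(expressions):
    """Build one case-insensitive pattern matching any expression as a whole phrase."""
    alternatives = []
    for expression in expressions:
        words = expression.split("_")
        # Allow any run of whitespace between the words of a multi-word phrase.
        alternatives.append(r"\s+".join(re.escape(word) for word in words))
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def screen_notes(notes, pattern):
    """Return the indices of notes containing at least one vocabulary expression."""
    return [index for index, text in enumerate(notes) if pattern.search(text)]


# Made-up notes for illustration only.
notes = [
    "Pt punched the wall, then calmed down.",
    "Wound care completed, pt tolerated well.",
    "Daughter was using foul  language toward nurse.",
]
flagged = screen_notes(notes, compile_vocabulary(VOCABULARY))
print(flagged)
```

In this sketch the first note is flagged even though the nurse was not the target, which illustrates why keyword screening only yields potential cases that still need human review.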
Phase II: Creating Annotated Corpora by Human Review
In Phase I, the approximately 600,000 clinical notes were reduced to a manageable subset of 1,515 notes that might have included descriptions of Type II workplace violence. Each of the 1,515 notes was then manually reviewed by a research assistant (a master’s-prepared registered nurse with experience in home healthcare), who classified each note as either describing Type II workplace violence toward a nurse or not (binary labeling). A violent incident was considered Type II workplace violence if it was committed toward the nurse during the home healthcare visit. The violence could have been perpetrated by the patient, family members, or any persons in the patient’s house. A clinical note was not considered a case of Type II violence when (a) the clinical note described non-Type II workplace violence (e.g., violence was directed toward a family member), (b) the violence was not directed toward nurses (e.g., the victim was the home health aide), or (c) the victim was unclear (e.g., violence was recorded without mention of the victim). The first 100 clinical notes were reviewed by two reviewers (H.B. and the research assistant). Once a high level of agreement on labeling decisions was achieved between the two reviewers (Cohen’s kappa > 0.90), the rest of the notes were labeled by the research assistant alone. This process was intended to ensure the validity and reliability of the labeling. The 12-month prevalence of Type II workplace violence was defined as the number of identified violent incidents per 10,000 nurse home visits. The number of clinical notes that described Type II workplace violence was used as a proxy for the total number of violent incidents (numerator). The total number of clinical notes in our data set was a proxy for the total number of nursing visits (denominator). When a single note included multiple incidents of the same type of violence, we treated them as one incident.
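As an illustration of the agreement check between two reviewers, the following minimal sketch computes Cohen's kappa for binary labels. The function name `cohens_kappa` and the ten example labels are hypothetical and are not the study's data.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning binary (0/1) labels to the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both raters must label the same non-empty set of notes.")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's marginal rate of positive labels.
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        return 1.0  # Both raters used one identical label throughout.
    return (observed - expected) / (1 - expected)


# Hypothetical labels for 10 notes (1 = Type II violence toward the nurse).
reviewer_1 = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
reviewer_2 = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(round(kappa, 3))
```

Kappa corrects raw percent agreement for the agreement expected by chance, which matters when one label (no violence) dominates the sample.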
Phase III: Developing a Machine-Learning-Based Natural Language Processing Algorithm to Automatically Identify Type II Workplace Violence
Using the annotated corpora from Phase II, we applied a support vector machine (SVM) algorithm to develop a classifier that automatically sorted clinical notes into those that included any violent incident and those that did not. The support vector machine is one of the most popular machine learning algorithms; it classifies data by finding the decision boundary that maximizes the margin between the classes (Cortes & Vapnik, 1995). We split the 1,515 notes into 70% training and 30% testing sets. We evaluated the algorithm’s accuracy in identifying Type II workplace violence by computing precision (number of true positives out of the total number of predicted positives), recall (number of true positives out of the actual number of positives), and F-score (weighted harmonic mean of precision and recall). This process was implemented using KNIME software.
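The evaluation metrics defined above can be sketched as follows. This assumes the equally weighted form of the F-score (F1); the function name `precision_recall_f1` and the eight example labels are hypothetical, and the study itself used KNIME rather than this code.

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Precision, recall, and F1 with `positive` treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall with equal weights.
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical held-out labels (1 = note records a violent incident).
true_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]

violent = precision_recall_f1(true_labels, predicted, positive=1)
non_violent = precision_recall_f1(true_labels, predicted, positive=0)
print(violent, non_violent)
```

Calling the function once per class, as in the last lines, gives the per-class figures that a two-class evaluation typically reports.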
Results
The 12-Month Prevalence of Type II Workplace Violence
Via the natural language processing and human review (Phases I–II), we identified 236 clinical notes that included workplace violence toward home healthcare nurses. Among these notes, 222 (94.1%) recorded nonphysical violence and 20 (8.5%) recorded physical violence; 6 notes (2.5%) recorded both physical and nonphysical violence and are counted in each category. An example of a physical violence note reads: “pt unable to transfer to return to bed attempted to stand her up pt hitting nurse.” An example of a verbal abuse note reads: “pt verbally abusive toward nurse during visit, shouting, cursing and yelling.”
The prevalence of physical violence was 0.067 incidents per 10,000 home visits. The prevalence of nonphysical violence was 3.76 incidents per 10,000 home visits. The prevalence of any violence was four incidents per 10,000 home visits. However, for the same 1-year period, no Type II workplace violence incidents were recorded in the official incident report systems of the two agencies.
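As a worked illustration of the prevalence definition (incident notes per 10,000 notes, with each note standing in for one visit), the sketch below reproduces the nonphysical and any-violence figures from the note counts given in the text. The function name `prevalence_per_10k` is hypothetical.

```python
def prevalence_per_10k(incident_notes, total_notes):
    """Incidents per 10,000 visits, with each clinical note standing in for one visit."""
    if total_notes <= 0:
        raise ValueError("total_notes must be positive")
    return incident_notes / total_notes * 10_000


# Note counts reported for the two agencies.
total_notes = 570_865 + 19_743

# Notes by violence type; notes recording both types are counted once overall.
nonphysical, physical, both = 222, 20, 6
any_violence = nonphysical + physical - both

nonphysical_rate = prevalence_per_10k(nonphysical, total_notes)
any_rate = prevalence_per_10k(any_violence, total_notes)
print(total_notes, any_violence)
print(round(nonphysical_rate, 2), round(any_rate, 1))
```

Because both the numerator and the denominator are note counts, the result is a rate per documented visit rather than per nurse or per patient.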
Accuracy of the Machine-Learning-Based Natural Language Processing Algorithm
Overall, the classification algorithm demonstrated good accuracy (F-measure = 0.79; Table 2) in automatically identifying whether a clinical note includes a violent incident. On average, eight of every 10 notes that the algorithm labeled as positive or negative for a violent incident correctly reflected the true presence or absence of a violent incident in the notes (precision = 0.80). On average, 7.8 of every 10 notes that truly recorded the presence or absence of a violent incident were correctly classified as positive or negative for a violent event (recall = 0.78). Performance was higher for notes that did not record a violent incident (F-measure = 0.94, precision = 0.93, recall = 0.95) than for notes that did record a violent incident (F-measure = 0.63, precision = 0.67, recall = 0.60).
Table 2. Metrics to Evaluate the Accuracy of the Natural Language Processing Algorithm in Classifying Violence Events in Home Healthcare Nurses’ Clinical Notes.
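The overall figures reported above are numerically consistent with an unweighted (macro) average of the two per-class results. This is an inference from the reported numbers, not a statement of how the metrics were actually aggregated; the sketch below simply checks the arithmetic, and the name `macro_average` is hypothetical.

```python
# Per-class metrics as reported in the text.
reported = {
    "no violent incident": {"precision": 0.93, "recall": 0.95, "f_measure": 0.94},
    "violent incident": {"precision": 0.67, "recall": 0.60, "f_measure": 0.63},
}


def macro_average(per_class, metric):
    """Unweighted mean of one metric across classes (assumed aggregation)."""
    values = [scores[metric] for scores in per_class.values()]
    return sum(values) / len(values)


macro = {
    metric: macro_average(reported, metric)
    for metric in ("precision", "recall", "f_measure")
}
for metric, value in macro.items():
    print(f"{metric}: {value:.3f}")
```

A macro average gives the minority class (notes with a violent incident) the same weight as the majority class, so the weaker performance on violent notes pulls the overall figures down more than a note-weighted average would.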
Discussion
To our knowledge, this is the first study that identified Type II workplace violence incidents from clinical notes in electronic health records. Using natural language processing, we searched approximately 600,000 clinical notes and identified 236 Type II violence incidents that were not recorded in the incident report systems. Of those, 20 were incidents of physical violence. Although details of the consequences of violence were not described in the clinical notes, the harmful effects of such workplace violence are well described in various studies and documented in a systematic literature review (Lanctôt & Guay, 2014). These consequences include physical injuries to various body parts (e.g., head, back, and arm) with different levels of severity (e.g., bruise, abrasion, scratch, and laceration) and varied physical symptoms (e.g., headaches, stomach aches, and other pain). Victims of physical violence, as well as nonphysical violence, often suffer from psychological consequences (Lanctôt & Guay, 2014). One of the most frequent psychological aftermaths is post-traumatic stress disorder (PTSD). Studies have found that its symptoms include avoidance, negative change in cognition or mood, recurring disturbing memories, and hypervigilance, to name a few (Lanctôt & Guay, 2014), which all adversely and critically affect nurses’ ability to take care of their patients and themselves. Repercussions of experiencing workplace violence also encompass permanent disability, quitting jobs, absence from work, impaired job performance, reduced organizational commitment, lower levels of job satisfaction, and lower quality of interpersonal relationships with colleagues (Lanctôt & Guay, 2014).
Home healthcare is a critical component in the delivery of healthcare. The U.S. home healthcare industry is projected to grow 7% annually from US$103 billion in 2018 to US$173 billion by 2026, outpacing growth in other healthcare sectors, including hospital care and physician services (Business Insider Intelligence, 2019). The aging population and a shift in patient care to noninstitutional settings are expected to continue to drive the demand for home healthcare services. In 2021, there were 324,460 healthcare practitioners and technicians, including 173,790 registered nurses and 964,760 healthcare support workers in the U.S. home healthcare services (Bureau of Labor Statistics, 2022). However, the shortage in the number of home healthcare workers has been reported widely and remains a significant concern (Foster et al., 2019; Institute of Medicine of the National Academies, 2008; Spetz et al., 2015; Weaver et al., 2018; Yao et al., 2018). One of the main reasons for home healthcare workers’ job dissatisfaction and turnover can be their experience of violence from their clients (Lanctôt & Guay, 2014).
Nurses often do not report Type II workplace violence because their ability to report patient aggression is limited by reporting systems in the workplace and/or they do not know or understand the reporting process. In addition, they are too busy to report, or the systems that are in place do not support reporting incidents (Christensen & Wilson, 2022). Considering violence is grossly underreported among home healthcare nurses (Byon, Lee, et al., 2020), the ability to detect otherwise unreported violence incidents in clinical notes using natural language processing provides excellent opportunities for case detection and violence prevention. Capturing previously unnoticed cases will help develop a more precise picture of the epidemiology of violence. Through developing a timely alert system based on identified cases, nurses and management can take prompt preventive measures, such as exercising heightened vigilance and/or joining a second trained individual on high-risk home visits. Such understanding of the prevalence of violence and risk awareness is especially needed for home healthcare nurses who are lone workers facing unique challenges that put them at a higher risk of experiencing violence—namely, working alone in the patient’s home and being exposed to multiple hazards such as weapons, drugs, and family violence without the standard security services that are present in hospitals and residential care settings.
Our results suggest that it is feasible to develop and utilize natural language processing to identify Type II workplace violence incidents embedded in clinical notes. Our model showed good accuracy in identifying violent incidents. This means that it is possible to automate the daily clinical notes’ search process to alert management of potential violence incidents so that they can look into those notes for actual violence cases. Because it is not plausible for management to read through thousands of daily clinical notes that home healthcare clinicians create, natural language processing systems can significantly assist in detecting those “hidden” violence cases. The system can help bridge the gap in nurses’ underreporting of violence incidents to management. Recently, a machine learning model with natural language processing was applied to identify nonworkplace, interpersonal violence in mental healthcare electronic records, demonstrating its potential practical use (Botelle et al., 2022). In our study, no Type II workplace violence incidents were officially recorded in the incident report systems. Incident report systems do not accurately account for the number of violence cases experienced by clinicians. Systems should be in place to help encourage clinician reporting and identify potential areas of risk. Increasing knowledge of violence incidents can help add to our understanding of where clinicians are at greatest risk for Type II workplace violence. Our results show that it is possible to capture these “hidden” violence cases to some extent using natural language processing.
Our study has limitations. Although the clinical notes were extracted from two different home healthcare agencies, both agencies were located on the East Coast of the United States. The safety and incident reporting practices in the workplace may differ from those in other regions of the United States or in other countries. Some of the words and phrases used by nurses in other regions might not have been captured. Also, we analyzed only the clinical visit notes of home healthcare nurses. Our findings (violence prevalence) may not represent the experience of Type II workplace violence toward other home healthcare occupation groups (e.g., physical therapists, speech therapists, social workers, and home health aides). Therefore, the generalizability of our findings is limited. We also found that the language nurses used to describe violence was not identical between the two agencies (e.g., some words were unique to the southern area). In addition, some words associated with violence are also common in routine care; for example, “pull” is a commonly used term when providing home healthcare. This overlap limits the system’s ability to learn healthcare providers’ language for documenting Type II workplace violence. Using data from geographically different locations and increasing the amount of data available would strengthen the system’s ability to identify cases. On the practice side, standardizing documentation so that free text is easily accessible for reporting and to programs such as NimbleMiner will help strengthen the ability to use natural language processing to identify Type II workplace violence cases.
Using natural language processing to identify Type II workplace violence incidents from clinical notes is feasible. Such a system is much needed, given the underreporting of violence among home healthcare nurses. A fuller picture of the epidemiology of workplace violence can better inform prevention policies and programs in the home healthcare industry, potentially contributing to reduced job dissatisfaction and increased retention of home healthcare nurses. Natural language processing can be used to reduce the manual review of notes, allowing home healthcare leaders to spend less time auditing notes and helping to improve reporting among clinical staff. Developing natural language processing and increasing the amount of data available for research can help better identify the determinants of violence in the home setting and potentially help prevent cases of Type II workplace violence against nurses.
Implications for Occupational Health Nursing Practice
Home healthcare workers are at high risk for workplace violence. Clinicians are often alone in an uncontrollable environment with limited access to resources available in institutional settings. Home healthcare nurses are strongly encouraged to report any actual and potential violence they experience during their home visits to agency management. Reporting is crucial in addressing and preventing workplace violence. In reality, many clinicians only record violent incidents in their clinical notes and do not report them using a formal reporting system. Not all home healthcare agencies may have a formal reporting system. Clinical visit notes are the official record of a visit and often hold the true story of what happened during the visit. This information is very hard for agencies and leaders to read in real-time, given the complexity and volume of data needed to review to find incidents. Natural language processing is a novel analytic tool that can augment formal reporting by automatically capturing violence incidents from nurses’ clinical notes. It can help management to detect violence cases, even as a nurse types a note and syncs the electronic note. Cases identified using natural language processing of clinical notes add another layer of evidence that violence incidents are underreported through a formal reporting system. Employers need to make continuous efforts to improve their policies and procedures and motivate their nurses to report violence incidents.
Applying Research to Occupational Health Practice
This research supports the enhanced occupational communication of risk to management. Home healthcare nurses’ clinical notes in electronic health records can be an informative source for case detection and analysis. Using natural language processing, a real-time analytic tool can be developed and implemented to automatically capture violence incidents from daily, ongoing, large volumes of clinical notes. It can filter through large volumes of free text data, enabling leaders, quality reviewers, and others to identify trends outside of the traditional form/answer format. Identified violence incidents can support management to stay informed and be able to keep their staff safe. Even though home healthcare nurses work alone and function remotely from management, the ability to capture and inform management of violence incidents can be a viable option. Alerting field nurses when there is a potential risk for violence before their next visits offers a level of occupational safety as management captures dangerous situations in advance and can better support their staff.
Footnotes
Author Contributions
H.B. and M.T. conceptualized the study, conducted data analysis, and took the main role in writing the manuscript. C.H., M.C., and J.S. analyzed data and co-wrote the manuscript.
Conflict of Interest
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Two of the coauthors had a financial affiliation with an organization where the study was conducted. To ensure the objectivity of the study, all data were de-identified for analysis and interpretation. Other authors have no affiliation with the organization with any financial or non-financial interest.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research project was supported by the Eugenie and Joseph Doyle Research Partnership Fund.
