Return visits to the emergency department: An analysis using group based curve models

Abstract

Stratification modeling in health services is useful for identifying patient groups with differing levels of risk, or latent classes. Given their frequency and cost, repeat emergency department (ED) visits may be an appropriate candidate for risk stratification modeling. We applied group-based trajectory modeling (GBTM) to a sample of 37,416 patients who visited an urban safety-net ED between 2006 and 2016. Patients had up to 10 ED visits during the study period. Data sources included the hospital’s electronic health record (EHR), the statewide health information exchange, and area-level social determinants of health. Results revealed three distinct trajectory groups. Trajectories with a higher risk of revisit included more patients with behavioral diagnoses, injuries, alcohol and substance abuse, stroke, diabetes, and other factors. Advanced computational techniques such as GBTM give health care organizations opportunities to better understand the underlying risks of their large patient populations. Identifying the patients who are likely to belong to high-risk trajectories allows healthcare organizations to stratify patients by level of risk and develop early, targeted interventions.

Keywords

Health information exchange, electronic health records, group-based trajectory model, data mining, emergency department revisits

Introduction

The concept of risk, that patients differ in the underlying severity of their disease or in their probability of negative outcomes, is fundamental to effective clinical care and population health management.^1,2 Advanced data mining models are increasingly applied to large healthcare data repositories to predict patient risk.³ Healthcare organizations have adopted data mining to improve physician practices, disease management, and resource utilization,⁴ and prediction models exist for outcomes ranging from the development of specific diseases to disease prognosis, mortality, and health care costs.

One such application of risk data mining (specifically, prediction modeling) is repeat emergency department¹ (ED) visits. Overutilization of ED services is a particular challenge for health systems worldwide; ED revisits are common⁵ and can serve as a performance or quality metric.^6–8 Moreover, ED revisits might be avoidable if individuals at higher risk of returning to the ED could be identified at their first visit.⁹ ED revisits may therefore be an appropriate candidate for risk prediction modeling, as prior studies have identified various factors driving return visits,¹⁰ including the type of disease,¹¹ medical errors,¹² patient satisfaction,¹³ and lack of initial evaluation or treatment.^14,15

In this work, we extend risk stratification modeling for health service utilization with an advanced computational technique that captures multiple ED revisits by the same patient over time.^16,17 Specifically, this study analyzes longitudinal patterns of ED visits using group-based trajectory modeling (GBTM). We model distinct trajectories and their structure, and the association of these trajectories with several patient traits. To illustrate the value of a longitudinal approach that models changes in individual patient risk over time, we contrast GBTM with a standard clustering approach (K-means). Furthermore, beyond the typical ED data reported in previous research, our study uses data collected from multiple distributed sources, making more data available to track patients’ risk of ED revisit.

Identifying effective and implementable approaches to risk stratification may help clinicians tailor personalized care to prevent ED revisits and enable healthcare organizations to deliver population health management interventions more efficiently.

Patients’ risk of early ED revisit may change over time. Risk profiling approaches that identify how a patient’s risk develops over time could allow healthcare institutions to tailor interventions and plans to patients’ changing needs. For this reason, our paper focuses on a longitudinal (time-series) model rather than a classical clustering method (e.g., K-means).

Background

ED visits and Revisits

EDs play a critical role in healthcare systems, both by performing acute interventions for emergent situations and as a widely accessible source of primary care within communities.¹⁸ Return visits to the ED are common, with estimates ranging between 10% and 26% depending on the subpopulation.^9,19,20 Repeat ED visits may or may not result in an inpatient admission. In addition, a repeat visit may or may not be for the same reason as the previous visit, and may not even be to the same institution. ED revisits can result from reduced quality of care or from unanticipated complications.¹¹ A return visit can also result from a patient’s overestimation of the urgency of their medical condition,²¹ which could have been treated at other points of care outside the ED (e.g., community clinics). Among clinical conditions, asthma and diabetes, for instance, are associated with higher rates of ED revisit.¹¹ ED revisits are problematic because each visit carries a potential for adverse effects such as functional deterioration and infections.^22,23 Numerous clinical and social factors are associated with an increased risk of ED revisit. For example, age, loneliness, residence in a long-term care facility, and receipt of in-home assistance services have each been associated with ED revisits.^18,23 In addition, patients with limited English proficiency are more likely than English speakers to have an unplanned revisit.²⁴

Clustering methods and machine learning

Clustering, a fundamental method in unsupervised machine learning,²⁵ may be useful for identifying and segmenting patient populations at differing levels of risk of an ED revisit. Clustering algorithms allocate items to clusters using a distance measure between each item and the cluster centroid, without relying on distributional assumptions.²⁶ A common clustering algorithm is K-means, which finds clusters by minimizing the sum of squared deviations between each item and its cluster mean.²⁷ Supervised prediction approaches have also been applied to ED revisits. Hao et al.¹¹ used decision trees to produce scores approximating the probability of an ED revisit, using 1 year of visit history; retrospective and prospective testing, along with a case study, showed that the algorithm identified patients with different ED revisit risks with reasonable sensitivity.²¹ Other studies have used logistic regression,²⁸ two-class boosted decision trees,²⁰ and hidden Markov models.¹⁰
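To make the K-means objective concrete, the following is a minimal Python sketch of Lloyd’s algorithm, the standard procedure for fitting K-means; it is an illustration, not the implementation used in this study. Points are assumed to be tuples of numeric features, and the function name `kmeans` and its parameters are hypothetical. The returned `sse` is the within-cluster sum of squared deviations that K-means minimizes.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps.

    points: list of equal-length tuples of floats (e.g., one per patient).
    Returns (centroids, labels, sse), where sse is the within-cluster
    sum of squared deviations that K-means tries to minimize.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster with the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse
```

Because Lloyd’s algorithm converges only to a local minimum, library implementations (for example scikit-learn’s `KMeans`) typically run several random initializations and keep the solution with the lowest sum of squares.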

Materials and methods

We used group-based trajectory modeling (GBTM) to model trajectories of ED revisit risk in a 10-year longitudinal patient panel.

Study setting and data

The study sample included 145,880 ED visits by 37,416 patients at an urban safety-net hospital in Indianapolis, Indiana, USA, between 2006 and 2016. Because of the requirements of a prior study, these secondary data were limited to Indiana residents who had at least one primary care visit at the hospital’s outpatient primary care clinics before and after 2011.

We combined four data sources. The primary data source was the safety-net hospital’s electronic health record (EHR), which was joined with data from the statewide health information exchange (HIE) repository. We classified the EHR data as either 1) associated with the current ED visit or 2) historical EHR data (i.e., data generated prior to the current visit). Third, we extracted data from the Indiana Network for Patient Care (INPC), which includes patient information from more than one hundred hospitals and thousands of health care providers across the state and is one of the largest and oldest multi-institutional clinical data repositories in the US.²⁹ Lastly, we joined area-level social determinants of health,³⁰ each measured at the census tract level. Census tracts, as defined by the US Census Bureau, are sub-county areas of approximately 4000 individuals.

Features & feature selection

From the four data sources, we extracted a total of 73 features. From the hospital’s EHR, we obtained current-visit and patient information, including demographics (age, gender, race/ethnicity), timing of the visit (AM/PM and weekday/weekend), and the reason for the visit, such as whether it was non-emergent or associated with injury, alcohol, or substance abuse according to the NYU algorithm.³¹ Also from the hospital’s EHR, we obtained historical information generated prior to the ED visit, such as prior diagnoses, Charlson comorbidity index scores,³² and the number of outpatient visits and hospitalizations in the prior 30 days. From the HIE data, we calculated the following measures reflecting healthcare visits beyond those at the hospital: additional prior diagnoses, a recalculated Charlson comorbidity index score (based on both EHR and HIE data), the total number of outpatient visits and hospitalizations in the prior 30 days at any provider organization, and the total number of unique prescriptions in the past 30 days from all providers. From the area-level dataset, we included various measures of socioeconomic status and public safety. To facilitate presentation, we labeled features as:

1) Current ED visit and patient data (7 features with an “ed” prefix and age, gender, and race),

2) Historical EHR data (15 features with an “ehr” prefix),

3) HIE data (14 features with an “hie” prefix), and

4) Area-level social determinants (34 features with a “ct” prefix).
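The prefix convention above lends itself to grouping a feature table programmatically. The Python sketch below is a hypothetical helper: apart from `ct_POPWDIPN1`, which appears in Table 5, the column names are invented for illustration, it assumes a “prefix_name” naming pattern, and it places age, gender, and race in the current-visit class as the list above does.

```python
from collections import defaultdict

# Class labels for the four feature prefixes described in the text.
PREFIX_TO_CLASS = {
    "ed": "Current ED visit and patient data",
    "ehr": "Historical EHR data",
    "hie": "HIE data",
    "ct": "Area-level social determinants",
}
# Demographics carry no prefix but belong to the current-visit class.
DEMOGRAPHICS = {"age", "gender", "race"}


def group_features(columns):
    """Map each feature class to the list of column names that belong to it."""
    groups = defaultdict(list)
    for col in columns:
        if col in DEMOGRAPHICS:
            groups[PREFIX_TO_CLASS["ed"]].append(col)
            continue
        prefix = col.split("_", 1)[0]  # assumes "<prefix>_<name>" column names
        groups[PREFIX_TO_CLASS.get(prefix, "Unclassified")].append(col)
    return dict(groups)


# Hypothetical column names, except ct_POPWDIPN1, which appears in Table 5.
columns = ["age", "gender", "race", "ed_weekend", "ed_injury",
           "ehr_stroke", "hie_inpatient_30d", "ct_POPWDIPN1"]
groups = group_features(columns)
```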

Group-based trajectory modeling (GBTM)

We used group-based trajectory modeling (GBTM), a longitudinal trajectory analysis method, to study the developmental path of each patient’s risk of ED revisit. GBTM is a finite mixture model that assumes the population distribution of trajectories arises from a finite mixture of unknown order $J$.³³ It models the distribution of the dependent variable conditional on a time-related metric, such as time since the beginning of the first period or event, represented by $P(Y_i \mid Z_i)$, where the random vector $Y_i$ denotes individual $i$’s longitudinal sequence of outcomes and the vector $Z_i$ denotes features of individual $i$ that are measured at baseline or are constant. The probability for each individual $i$, conditional on the number of groups $J$, is

$$P(Y_i \mid Z_i) = \sum_{j=1}^{J} \pi_j \, P(Y_i \mid Z_i, j; \beta_j),$$

where $\pi_j$ is the probability of membership in trajectory group $j$, and the conditional distribution of $Y_i$ given membership in group $j$ is indexed by the unknown parameter vector $\beta_j$, which determines the shape of the trajectory. Typically, each trajectory is modeled as a polynomial function (linear, quadratic, or cubic) of the time-related metric. For any given $j$, the sequential outcomes $Y_{it}$ over the $T$ periods are assumed to be conditionally independent. For our binary outcome, ED revisit, $Y_{it}$ equals 0 (no ED revisit) or 1 (ED revisit). The probability of a revisit for a patient in trajectory $j$, $P^{j}(Y_{it} = 1)$, is a probit function if the disturbance $\varepsilon_{it}$ is assumed to be normally distributed;³³ if $\varepsilon_{it}$ instead follows an extreme value distribution, it takes the binary logit form:

$$P^{j}(Y_{it} = 1) = \frac{\exp\left(\beta_0^{j} + \sum_{s=1}^{S} \beta_s^{j} X_{it}^{s}\right)}{1 + \exp\left(\beta_0^{j} + \sum_{s=1}^{S} \beta_s^{j} X_{it}^{s}\right)}$$

where $X_{it}$ is the time-related metric (here, the ED visit number) for individual $i$ at period $t$, $X_{it}^{s}$ is its $s$-th power, and $S$ is the polynomial order (1, 2, or 3 for linear, quadratic, or cubic trajectories).³³ For more details on the method with binary logit, censored normal, or Poisson distributions, see Nagin.³³ GBTM estimates the shape of each group’s trajectory (linear, quadratic, or cubic) and the proportion of the population belonging to each group.
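As an illustration of the binary-logit likelihood above, here is a minimal Python sketch for the basic model without the covariates $Z_i$. It only evaluates the likelihood for given parameters; estimating them by maximum likelihood is what the Traj procedure does. The function names and the parameter values in the example are hypothetical.

```python
import math


def logit_prob(betas, x):
    """P^j(Y_it = 1) for one group: exp(eta) / (1 + exp(eta)), computed in the
    equivalent form 1 / (1 + exp(-eta)), where eta is a polynomial in the time
    metric x. betas = [beta_0, ..., beta_S]; S = 1, 2, 3 gives linear, quadratic,
    cubic. (Not guarded against overflow for extreme eta.)"""
    eta = sum(b * x ** s for s, b in enumerate(betas))
    return 1.0 / (1.0 + math.exp(-eta))


def individual_likelihood(y, times, pis, group_betas):
    """P(Y_i) = sum_j pi_j * prod_t P^j(Y_it = y_it): a finite mixture over groups,
    with outcomes conditionally independent across periods within a group."""
    total = 0.0
    for pi_j, betas in zip(pis, group_betas):
        lik_j = 1.0
        for y_t, x_t in zip(y, times):
            p = logit_prob(betas, x_t)
            lik_j *= p if y_t == 1 else 1.0 - p
        total += pi_j * lik_j
    return total


def log_likelihood(data, pis, group_betas):
    """Sample log-likelihood; data is a list of (y, times) pairs, one per patient."""
    return sum(math.log(individual_likelihood(y, t, pis, group_betas)) for y, t in data)


# Hypothetical 2-group example: a flat low-risk group and a linearly rising group.
pis = [0.8, 0.2]
group_betas = [[-2.0], [-1.0, 0.3]]
patient = ([0, 1, 1], [1, 2, 3])  # revisit outcomes at visits 1-3
print(individual_likelihood(*patient, pis, group_betas))
```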

Modeling and generating the ED revisit trajectories

The unit of analysis was the ED visit: the time-related metric was the ED visit number, and the outcome at each visit was whether a repeat ED visit occurred. We implemented the GBTM models using the user-developed Traj procedure³⁴ in STATA version 17.³⁵ Model selection relied on four goodness-of-fit criteria³³ (Nagin 2005): 1) the best BIC and AIC values across candidate numbers of groups (Traj reports both on the log-likelihood scale, so values closer to zero indicate better fit); 2) the p-values of the trajectory shape terms and overall fit; 3) the smallest difference between the estimated group proportions and the proportions of patients actually assigned to each group; and 4) the highest average posterior probability of group membership for each group.
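Criteria 1, 3, and 4 can be computed from a fitted model’s log-likelihood and posterior membership probabilities. The Python sketch below is illustrative rather than the Traj implementation: it assumes Traj’s convention of AIC = log L − k and BIC = log L − 0.5·k·ln(N), under which larger values are better, and the function names are hypothetical. A commonly cited rule of thumb from Nagin (2005) is an average posterior probability of at least 0.7 in every group.

```python
import math


def traj_aic_bic(log_lik, n_params, n):
    """AIC and BIC on the log-likelihood scale (larger, i.e. closer to zero, is better).
    n may be the number of patients or of observations."""
    aic = log_lik - n_params
    bic = log_lik - 0.5 * n_params * math.log(n)
    return aic, bic


def membership_diagnostics(posteriors, pis):
    """posteriors: one row per patient with posterior probabilities P(j | Y_i).
    pis: estimated group proportions pi_j.

    For each group, reports the average posterior probability (AvePP) among the
    patients assigned to it by the maximum-posterior rule, and the share of
    patients assigned to it next to the estimated proportion pi_j."""
    n_groups = len(pis)
    assigned = [max(range(n_groups), key=row.__getitem__) for row in posteriors]
    report = []
    for j in range(n_groups):
        probs = [row[j] for row, a in zip(posteriors, assigned) if a == j]
        report.append({
            "group": j,
            "avepp": sum(probs) / len(probs) if probs else float("nan"),
            "assigned_share": len(probs) / len(posteriors),
            "estimated_pi": pis[j],
        })
    return report
```

Criterion 3 then compares `assigned_share` with `estimated_pi` for each group; large gaps suggest that the mixture proportions and the modal assignments disagree.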

Feature selection methods

Before fitting the GBTM models, we identified the most important features in our data sets in order to reduce the number of variables entering the GBTM model.

We used two feature selection (FS) methods: Gini importance³⁶ and information gain (IG, based on entropy).³⁷

We used Python to perform feature selection and compared the results of the Gini and IG methods. We then selected the five most impactful variables in each category for use in the full GBTM model, as described in the Feature selection subsection of the Results.
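The text does not specify which Python libraries were used. As a hedged illustration of the two criteria, the plain-Python sketch below scores binary features by the decrease in Gini impurity and by information gain (the decrease in entropy) from a single split on each feature, then keeps the top k. This is a simplification: Gini importance as usually reported for tree ensembles (e.g., scikit-learn’s `feature_importances_`) sums such decreases over every split in every tree, and continuous features would need split thresholds. Feature names and data are hypothetical.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values()) if n else 0.0


def split_gain(feature, labels, impurity):
    """Impurity decrease from splitting on a binary (0/1) feature.
    With impurity=entropy this is the information gain."""
    n = len(labels)
    left = [y for x, y in zip(feature, labels) if x == 0]
    right = [y for x, y in zip(feature, labels) if x == 1]
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(labels) - children


def top_k(features, labels, impurity, k=5):
    """Names of the k features with the largest impurity decrease."""
    scores = {name: split_gain(col, labels, impurity) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data: 30-day revisit outcome and three binary features.
revisit = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],
    "moderate": [1, 1, 1, 0, 0, 0, 0, 1],
    "strong": [1, 1, 1, 1, 0, 0, 0, 0],
}
print(top_k(features, revisit, gini, k=3))
print(top_k(features, revisit, entropy, k=3))
```

Comparing the two rankings, as the study did, is a quick check that the selected variables are not an artifact of one impurity measure.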

The dependent variable: ED revisit

A repeat ED visit within 30 days was the dependent variable. ED revisits could occur at any ED in Indiana and were not restricted to the same institution. For each patient in the sample, we included their first through 10th ED visits (beyond 10 visits, the data were too sparse). ED revisits have also been used as an outcome in earlier work.
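A minimal Python sketch of how such an outcome could be constructed from visit dates is shown below. It is not the study’s code: the input format, the function name, treating a gap of exactly 30 days as a revisit, and looking ahead to an 11th visit to label the 10th are all assumptions.

```python
from collections import defaultdict
from datetime import date, timedelta

REVISIT_WINDOW = timedelta(days=30)  # assumption: a gap of exactly 30 days counts
MAX_VISITS = 10


def label_revisits(visits):
    """visits: iterable of (patient_id, visit_date) pairs from any ED in the state.

    Returns {patient_id: [(visit_number, visit_date, revisit_within_30d), ...]}
    covering each patient's first MAX_VISITS visits."""
    by_patient = defaultdict(list)
    for pid, visit_date in visits:
        by_patient[pid].append(visit_date)
    labeled = {}
    for pid, dates in by_patient.items():
        dates.sort()
        rows = []
        for i, d in enumerate(dates[:MAX_VISITS]):
            # Look ahead in the full history, so visit 10 can be labeled by visit 11
            # (an assumption; the text does not say how the last visit is handled).
            nxt = dates[i + 1] if i + 1 < len(dates) else None
            revisit = int(nxt is not None and nxt - d <= REVISIT_WINDOW)
            rows.append((i + 1, d, revisit))
        labeled[pid] = rows
    return labeled


example = [("a", date(2010, 1, 1)), ("a", date(2010, 1, 20)), ("a", date(2010, 6, 1))]
print(label_revisits(example)["a"])
```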

Figure 1 depicts the main steps of our study.

Figure 1.

Graphical view of the main steps.

Results

Descriptive statistics

Overall, the mean age in our sample was 50.04 years; 34% of the patients were male, and 23% were white non-Hispanic. Table 1 shows that the number of patients decreases as the visit number increases, because most patients had few repeat visits. Notably, however, the percentage of patients with a subsequent revisit rose consistently with each visit, from 7.74% after the first visit to roughly 23.5% by the ninth and tenth visits.

Table 1.

ED Revisits results by ED visit number.

Visit number	Number of patients	% with a 30-day revisit
1	37,416	7.74
2	31,563	15.30
3	26,496	17.67
4	22,357	19.13
5	18,810	21.28
6	15,916	21.90
7	13,590	22.77
8	11,567	23.06
9	9921	23.59
10	8573	23.53

Feature selection

We first compared the feature rankings produced by the Gini and IG methods within each class of variables, and then retained the five most impactful variables in each category for use in GBTM modeling.

Figure 2 shows that the two FS methods produced very similar rankings for the class of current ED visit and patient data; the only difference was that ‘ED visit associated with psychological diagnosis’ and ‘visit was due to an emergency’ switched places. Rankings were similarly stable in the three other classes of variables (HIE data, area-level data, and EHR data).

Figure 2.

Example for feature selection using both methods.

We divided the variables into time-stable (baseline) variables and time-varying variables. The time-varying variables included 6 variables from the class ‘Current ED visit data and patient data’, 15 from the class ‘EHR Data’, 14 from the class ‘HIE Data’, and 10 area-level characteristics that change over time. The first FS run covered the time-stable and baseline variables: 24 area-level characteristics, age, gender, and race. We then ran FS on the three other classes of variables (current ED visit and patient data, historical EHR data, and HIE data).

Finally, to build the GBTM model, we ran GBTM with the five strongest variables from each class of data, as well as with their combination (20 variables, the five strongest from each class). The best goodness-of-fit measures were obtained with the five most impactful variables from the class ‘Current ED visit data and patient data’. The best model therefore included these variables: weekend visit, admission to the hospital via the ED, ED visit associated with a psychological diagnosis, visit due to an emergency, and ED visit due to an injury.

Analyzing the ED visits using the GBTM

GBTM goodness of fit tests

Following model fitting, we applied the four goodness-of-fit criteria described in Materials and methods (Nagin 2005), repeating them separately for each visit and for each trajectory shape (linear, quadratic, and cubic). The 3-group model fit best: its log-likelihood, AIC, and BIC were all higher (closer to zero) than those of the 2-group model for both the basic and full models (Table 2). It also has a practical advantage, since the three groups map naturally onto low-, medium-, and high-risk tiers that healthcare organizations can act on.

Table 2.

Comparison between goodness of fit for two and three trajectories.

	2 groups			3 groups
	Log-likelihood	AIC	BIC	Log-likelihood	AIC	BIC
Basic model	−73,298.54	−73,307.54	−73,345.16	−73,200.73	−73,213.73	−73,268.07
Full model	−73,252.89	−73,271.89	−73,351.31	−73,156.80	−73,184.80	−73,301.84

Note. Basic model: the GBTM run without covariates, which is recommended especially for model fitting (Nagin 2005). Full model: the GBTM run with covariates (both time-varying and time-invariant).
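Because Traj reports AIC on the log-likelihood scale, the number of estimated parameters k implied by each row of Table 2 can be recovered as log L − AIC. The short Python check below does this arithmetic on the Table 2 values; it assumes the AIC = log L − k convention, and the variable and function names are ours.

```python
# Table 2 values: (log-likelihood, AIC) for each model and number of groups.
table2 = {
    ("basic", 2): (-73298.54, -73307.54),
    ("basic", 3): (-73200.73, -73213.73),
    ("full", 2): (-73252.89, -73271.89),
    ("full", 3): (-73156.80, -73184.80),
}


def implied_params(log_lik, aic):
    """Under AIC = log L - k, the parameter count is k = log L - AIC."""
    return round(log_lik - aic)


for key, (ll, aic) in table2.items():
    print(key, implied_params(ll, aic))
```

The resulting counts (9, 13, 19, and 28) are consistent with, though not proof of, the specification described in the text: 9 matches two cubic trajectories (4 coefficients each) plus 1 free group proportion; 13 matches two cubic trajectories and one quadratic trajectory (4 + 4 + 3) plus 2 free proportions; and the full models add 5 covariate coefficients per group (10 and 15), matching the five selected time-varying covariates.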

Basic and full GBTM

Using the Traj procedure in STATA, we examined the features of the patients and their visits in each trajectory group to understand the factors that characterized them. Figure 3 shows the full model of patients’ trajectories over up to 10 ED visits; the X-axis displays the ED visit number and the Y-axis the probability of an ED revisit within 30 days. Applying the BIC criterion for model selection, GBTM clustered the 37,416 patients into three distinct trajectories based on their ED revisit risk. The low-risk group included 79.6% of the population, who were in relatively stable health; their average number of ED revisits was 0.61, and their likelihood of an ED revisit increased slightly over the entire study period. The medium-risk group included 19.1% of the population and remained almost unchanged across visits (its likelihood of an ED revisit increased slightly until visit 6 and then decreased until visit 9); its average number of ED revisits was 4.15. The high-risk group, which we termed the Worsening group, included only 1.2% of the population and had an average of 8.14 ED revisits. It started at a high level of ED revisit risk that continued to increase until visit 7, after which it leveled off somewhat (partly because of a decrease in the number of patients; see Table 3), but it never intersected any other trajectory and remained the riskiest group for ED revisits.

Figure 3.

The shapes of the trajectories over the ED visits.

For both the basic and full models, the best-fitting significant shapes were cubic for the low trajectory, cubic for the medium trajectory, and quadratic for the high trajectory. The basic model (Figure 3) yields very similar trajectories, which supports the stability of the full GBTM model results.

Distribution of patients by trajectories over the ED visits

Table 3 shows the average ED revisit rates over time by group membership. The high-risk group had the highest average ED revisit rate at every visit throughout the study period.

Table 3.

Revisits results by ED visit number by trajectories.

Visit number	Trajectory	Number of patients	Mean ED revisit rate
Visit 2	Low	30,576	14.69%
	Medium	0	0
	High	987	34.25%
	Total	31,563	15.30%
Visit 3	Low	25,566	16.65%
	Medium	0	0
	High	930	45.48%
	Total	26,496	17.67%
Visit 4	Low	20,262	17.11%
	Medium	0	0
	High	2095	38.76%
	Total	22,357	19.13%
Visit 5	Low	15,659	18.71%
	Medium	2910	31.68%
	High	241	62.24%
	Total	18,810	21.28%
Visit 6	Low	13,565	19.46%
	Medium	2200	33.73%
	High	151	68.87%
	Total	15,916	21.90%
Visit 7	Low	11,363	20.04%
	Medium	2003	34.20%
	High	224	59.38%
	Total	13,590	22.77%
Visit 8	Low	9110	20.16%
	Medium	2237	30.98%
	High	220	62.27%
	Total	11,567	23.06%
Visit 9	Low	7536	21.14%
	Medium	2159	28.62%
	High	226	57.08%
	Total	9921	23.59%
Visit 10	Low	6297	20.31%
	Medium	2050	29.61%
	High	226	57.97%
	Total	8573	23.53%

Figure 4 shows how trajectory assignments evolve as the number of visits increases and more information from the time-varying variables is captured and used (the number of patients at each visit and in each trajectory is shown). GBTM assigns each patient to the trajectory with the highest posterior probability of membership, with the goal of assigning patients to their latent trajectory as early as possible. Patients in the riskiest trajectory are assigned to their final trajectory by the second visit, whereas patients in the medium and high-risk levels are assigned to their final trajectory only from the fourth or fifth visit.
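The maximum-posterior assignment rule can be illustrated with a small Python sketch that, holding hypothetical fitted parameters fixed, recomputes a patient’s posterior membership probabilities as visits accumulate. This is a simplification: covariates are omitted, and the parameter values, function names, and truncation of the history to the first t visits are our assumptions, not a description of how Figure 4 was produced.

```python
import math


def logit_prob(betas, x):
    """Group-specific revisit probability: logistic of a polynomial in x."""
    eta = sum(b * x ** s for s, b in enumerate(betas))
    return 1.0 / (1.0 + math.exp(-eta))


def posterior(y, times, pis, group_betas):
    """Posterior membership probabilities P(j | observed visits) via Bayes' rule."""
    joint = []
    for pi_j, betas in zip(pis, group_betas):
        lik = pi_j  # prior probability of group j times the likelihood of the visits
        for y_t, x_t in zip(y, times):
            p = logit_prob(betas, x_t)
            lik *= p if y_t == 1 else 1.0 - p
        joint.append(lik)
    total = sum(joint)
    return [v / total for v in joint]


def progressive_assignment(y, times, pis, group_betas):
    """Modal (highest-posterior) group after each additional visit is observed."""
    groups = []
    for t in range(1, len(y) + 1):
        post = posterior(y[:t], times[:t], pis, group_betas)
        groups.append(max(range(len(post)), key=post.__getitem__))
    return groups


# Hypothetical parameters: group 0 has a flat low revisit risk, group 1 a high one.
print(progressive_assignment([0, 1, 1, 1], [1, 2, 3, 4], [0.8, 0.2], [[-2.0], [1.0]]))
# → [0, 0, 1, 1]
```

In this toy example, the patient is assigned to the low-risk group until the second consecutive revisit outweighs the larger prior weight of that group.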

Figure 4.

The progressive assignment of trajectory membership as the number of visits increases.

For the low-risk group, the average ED revisit rate at each patient’s last visit was 11.1%; for the medium-risk group it was 26.68%, and for the high-risk group 56.59%. Because each patient may have up to 10 visits, this average is taken over each patient’s own last visit rather than over the 10th visit.

Profiling the GBTM trajectories

Tables 4–7 summarize patient characteristics by trajectory (Low, Medium, and High), together with values for the full study sample. In each table, the variables are split into those identified as most influential by the feature selection process and the remaining features that were not.

Table 4.

Patient and visit characteristics (Age, Gender, Race, EDs).

Patient characteristics	Study sample n = 37416	Low n = 34452 (92.08%)	Medium n = 2729 (7.29%)	High n = 235 (0.63%)
Age (years)	50.04 ± 14.20	50.01 ± 14.27	50.37 ± 13.45	50.68 ± 12.31
Gender (% male)***	0.34 ± 0.47	0.33 ± 0.47	0.43 ± 0.50	0.53 ± 0.50
Race (% white)***	0.23 ± 0.42	0.22 ± 0.41	0.30 ± 0.46	0.31 ± 0.46
ED visit associated with psychological diagnosis***	0.011 ± 0.102	0.010 ± 0.098	0.021 ± 0.142	0.030 ± 0.170
ED visit was due to an emergency***	0.092 ± 0.289	0.088 ± 0.283	0.131 ± 0.338	0.166 ± 0.373
ED visit resulted in hospitalization*	0.065 ± 0.246	0.064 ± 0.244	0.076 ± 0.265	0.060 ± 0.237
ED visit occurred on a weekend*	0.249 ± 0.432	0.247 ± 0.431	0.266 ± 0.442	0.281 ± 0.450
ED visit was due to an injury	0.077 ± 0.267	0.077 ± 0.267	0.082 ± 0.274	0.072 ± 0.260
Remaining features not identified in the feature selection process
ED visit number***	0.760 ± 1.793	0.689 ± 1.682	1.567 ± 2.634	1.689 ± 2.768
ED visit associated with alcohol use***	0.006 ± 0.079	0.006 ± 0.075	0.013 ± 0.113	0.021 ± 0.145

Note: Data are mean ± SD or number of subjects (proportion). Differences across groups were tested with ANOVA. *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.1; no symbol indicates no significant difference. The same conventions apply to the tables below.
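For reference, the one-way ANOVA used for these group comparisons can be sketched in plain Python as follows. This is an illustrative implementation with hypothetical data, not the authors’ code; in practice, `scipy.stats.f_oneway` returns the F statistic and its p-value directly.

```python
def one_way_anova(*groups):
    """One-way ANOVA F statistic for differences in means across groups.

    Returns (F, df_between, df_within); the p-value is the upper tail of the
    F distribution with those degrees of freedom."""
    means = [sum(g) / len(g) for g in groups]
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical 0/1 indicator (e.g., weekend visit) for patients in three groups.
low, medium, high = [0, 0, 1, 0, 0], [0, 1, 0, 1], [1, 1, 0]
print(one_way_anova(low, medium, high))
```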

Table 5.

Area-level characteristics.

Patient characteristics	Study sample n = 37416	Low n = 34452 (92.08%)	Medium n = 2729 (7.29%)	High n = 235 (0.63%)
Area-level features
Rate of coronary artery disease (CAD)	4.716 ± 2.612	4.718 ± 2.633	4.675 ± 2.356	4.862 ± 2.253
Rate of drug consumption	6.334 ± 3.900	6.336 ± 3.909	6.333 ± 3.808	6.117 ± 3.567
Rate of cancer ***	4.965 ± 2.208	4.980 ± 2.219	4.801 ± 2.072	4.685 ± 2.103
Rate of depression ***	11.359 ± 7.373	11.449 ± 7.395	10.345 ± 7.049	9.922 ± 6.729
Rate of external injuries ***	11.121 ± 3.830	11.089 ± 3.832	11.460 ± 3.758	11.931 ± 4.101
Remaining features not identified in the feature selection process
Rate of schizophrenia	2.140 ± 2.291	2.142 ± 2.271	2.115 ± 2.531	2.076 ± 2.329
Rate of cardiac arrhythmia ***	4.007 ± 2.219	4.031 ± 2.231	3.717 ± 2.055	3.737 ± 2.077
Rate of hepatitis	2.341 ± 1.732	2.343 ± 1.735	2.314 ± 1.711	2.264 ± 1.452
Rate of HIV	0.603 ± 0.813	0.602 ± 0.813	0.615 ± 0.815	0.568 ± 0.704
Rate of pneumonia	1.884 ± 1.309	1.884 ± 1.311	1.888 ± 1.298	1.841 ± 1.232
Area-level general (time invariant variables)
ct_POPWDIPN1***	0.385 ± 0.782	0.383 ± 0.783	0.412 ± 0.774	0.415 ± 0.765
Percent of occupied housing units (combination of rental and owner) whose occupants pay 30% or more of income for housing costs ***	0.435 ± 0.847	0.429 ± 0.847	0.508 ± 0.842	0.440 ± 0.870
Percent of all occupied units that are owner occupied ***	−0.264 ± 0.841	−0.259 ± 0.845	−0.322 ± 0.791	−0.325 ± 0.774
Percent of households with cash public assistance or food Stamps/SNAP ***	0.468 ± 0.904	0.461 ± 0.903	0.560 ± 0.914	0.460 ± 0.888
Percent of population all ages without health insurance ***	0.427 ± 0.912	0.420 ± 0.912	0.507 ± 0.914	0.472 ± 0.944
Percent of labor force age 16 and over who are unemployed ***	0.433 ± 0.948	0.428 ± 0.947	0.498 ± 0.955	0.352 ± 0.939
Percent of population in poverty for whom poverty status is determined ***	0.433 ± 0.907	0.423 ± 0.906	0.565 ± 0.912	0.452 ± 0.905
Percent of population living below 125% poverty ***	0.453 ± 0.870	0.443 ± 0.869	0.579 ± 0.867	0.483 ± 0.857
Percent of workers age 16 and over who did not drive a car, truck or van as their means of transportation to work ***	0.213 ± 0.986	0.200 ± 0.979	0.356 ± 1.055	0.376 ± 1.024
Violent crimes and simple assaults per 1000 population ***	0.349 ± 0.963	0.333 ± 0.956	0.539 ± 1.027	0.480 ± 0.977
Property crimes per 1000 population ***	0.218 ± 0.942	0.205 ± 0.936	0.374 ± 0.991	0.398 ± 0.935
Total juvenile offense charges per 1000 population age 5–17 ***	0.055 ± 0.847	0.048 ± 0.810	0.141 ± 1.231	0.065 ± 0.513
Births where mother has less than 12 years education as percent of all births ***	0.115 ± 1.022	0.101 ± 1.015	0.285 ± 1.090	0.212 ± 1.022
Population with a disability as percent of civilian noninstitutionalized total population ***	0.337 ± 0.980	0.327 ± 0.982	0.467 ± 0.960	0.402 ± 0.909
Diversity index (index of racial dissimilarity)	0.213 ± 0.900	0.214 ± 0.902	0.201 ± 0.875	0.217 ± 0.866
Tax delinquent properties as a percentage of total parcels ***	0.301 ± 1.094	0.293 ± 1.094	0.405 ± 1.091	0.286 ± 1.084
Parcels within 1/4 mile of an active park or greenway, as a percentage of all parcels ***	0.143 ± 0.981	0.134 ± 0.979	0.249 ± 1.004	0.237 ± 1.039

Table 6.

Patient EHR characteristics.

Patient characteristics	Study sample n = 37416	Low n = 34452 (92.08%)	Medium n = 2729 (7.29%)	High n = 235 (0.63%)
External injury**	0.358 ± 0.479	0.339 ± 0.473	0.568 ± 0.495	0.745 ± 0.437
Substance abuse***	0.237 ± 0.425	0.217 ± 0.412	0.451 ± 0.498	0.647 ± 0.479
Hyperlipidemia	0.384 ± 0.486	0.383 ± 0.486	0.400 ± 0.490	0.387 ± 0.488
Stroke***	0.078 ± 0.268	0.075 ± 0.263	0.112 ± 0.315	0.111 ± 0.314
Diabetes mellitus***	0.340 ± 0.474	0.334 ± 0.472	0.412 ± 0.492	0.438 ± 0.497
Remaining features not identified in the feature selection process
Schizophrenia***	0.052 ± 0.222	0.045 ± 0.207	0.125 ± 0.330	0.243 ± 0.430
HIV***	0.010 ± 0.099	0.009 ± 0.095	0.017 ± 0.129	0.021 ± 0.145
Charlson comorbidity index***	1.405 ± 1.642	1.346 ± 1.600	2.073 ± 1.938	2.306 ± 1.921
Coronary artery disease***	0.137 ± 0.344	0.130 ± 0.336	0.215 ± 0.411	0.315 ± 0.465
Cardiac arrhythmia ***	0.152 ± 0.359	0.144 ± 0.351	0.240 ± 0.427	0.332 ± 0.472
Depression***	0.373 ± 0.484	0.358 ± 0.480	0.537 ± 0.499	0.604 ± 0.490
Chronic obstructive pulmonary disease (COPD)***	0.194 ± 0.395	0.183 ± 0.387	0.309 ± 0.462	0.400 ± 0.491
Congestive heart failure (CHF)***	0.092 ± 0.289	0.085 ± 0.279	0.173 ± 0.378	0.217 ± 0.413
Asthma***	0.147 ± 0.354	0.140 ± 0.347	0.224 ± 0.417	0.285 ± 0.452
Number of unique medications in the past 30 days ***	2.507 ± 10.169	2.291 ± 9.271	4.891 ± 17.275	6.498 ± 17.657

Table 7.

Patient HIE characteristics.

Patient characteristics	Study sample n = 37416	Low n = 34452 (92.08%)	Medium n = 2729 (7.29%)	High n = 235 (0.63%)
Number of inpatient admissions in past 30 days***	0.021 ± 0.157	0.017 ± 0.137	0.063 ± 0.290	0.145 ± 0.439
History of substance abuse ***	0.264 ± 0.441	0.243 ± 0.429	0.495 ± 0.500	0.706 ± 0.456
History of external injury***	0.509 ± 0.500	0.489 ± 0.500	0.721 ± 0.449	0.868 ± 0.339
History of depression***	0.399 ± 0.490	0.383 ± 0.486	0.569 ± 0.495	0.655 ± 0.476
History of diabetes mellitus***	0.370 ± 0.483	0.364 ± 0.481	0.444 ± 0.497	0.485 ± 0.501
Remaining features not identified in the feature selection process
Number of outpatient visits in the prior 30 days ***	0.588 ± 1.073	0.578 ± 1.059	0.713 ± 1.213	0.745 ± 1.149
History of HIV ***	0.011 ± 0.103	0.010 ± 0.099	0.020 ± 0.139	0.030 ± 0.170
History of schizophrenia***	0.062 ± 0.241	0.054 ± 0.225	0.149 ± 0.356	0.277 ± 0.448
History of cardiac arrhythmia**	0.213 ± 0.409	0.201 ± 0.401	0.335 ± 0.472	0.540 ± 0.499
Charlson comorbidity index***	1.663 ± 1.833	1.593 ± 1.787	2.432 ± 2.128	2.915 ± 2.246
History of stroke***	0.099 ± 0.299	0.095 ± 0.293	0.148 ± 0.355	0.170 ± 0.377
History of osteoporosis	0.044 ± 0.205	0.044 ± 0.204	0.048 ± 0.215	0.047 ± 0.212
History of coronary artery disease***	0.165 ± 0.371	0.156 ± 0.363	0.257 ± 0.437	0.362 ± 0.482
A1C test in past 48 hours	0.021 ± 0.143	0.021 ± 0.144	0.017 ± 0.130	0.021 ± 0.145

Another key issue concerns which value of each variable is reported. For time-invariant variables such as age, gender, and race, we report the baseline value from the first visit. For time-varying variables such as the reason for the ED visit and its related diagnoses, we report the most recent value, from the last visit each patient has in our data (e.g., the third value for a patient with three visits and the eighth value for a patient with eight visits).
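The first-value/last-value rule above can be sketched in Python as follows. The record layout (one dictionary per visit, ordered by visit number), the function names, and the example values are hypothetical.

```python
def profile_values(visits_by_patient, time_invariant, time_varying):
    """Build one profiling row per patient.

    visits_by_patient maps patient_id -> list of visit dicts ordered by visit
    number. Time-invariant variables come from the first visit and time-varying
    variables from the patient's last available visit."""
    profiles = {}
    for pid, visits in visits_by_patient.items():
        first, last = visits[0], visits[-1]
        row = {var: first[var] for var in time_invariant}
        row.update({var: last[var] for var in time_varying})
        profiles[pid] = row
    return profiles


def group_means(profiles, assignment, var):
    """Mean of one profiled variable within each trajectory group."""
    sums, counts = {}, {}
    for pid, row in profiles.items():
        g = assignment[pid]
        sums[g] = sums.get(g, 0.0) + row[var]
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


# Hypothetical records: p1 has two visits, p2 has one.
visits = {
    "p1": [{"age": 40, "ed_injury": 0}, {"age": 41, "ed_injury": 1}],
    "p2": [{"age": 60, "ed_injury": 0}],
}
profiles = profile_values(visits, ["age"], ["ed_injury"])
print(group_means(profiles, {"p1": "High", "p2": "Low"}, "ed_injury"))
```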

Gender, race, and the average number of ED visits differed significantly across the groups (Table 4), whereas the mean age was approximately 50 in all trajectories. The riskier the group, the higher the percentage of male patients, the higher the percentage of White patients, and the greater the number of visits. Compared with the low-risk group, the medium- and high-risk groups had more ED visits associated with behavioral diagnoses, emergency conditions (as opposed to non-emergent issues), and alcohol use, and their visits were more likely to occur on weekends (Table 4).

Table 5 shows that all area-level general (time-invariant) characteristics differed significantly across the groups except the diversity index. Among the area-level disease-rate features identified by feature selection, only cancer and depression rates (lower for the riskier groups) and external injury rates (higher for the riskier groups) differed significantly; that is, patients in riskier trajectories tended to live in census tracts with lower cancer and depression rates. Among the remaining area-level features, the cardiac arrhythmia rate also differed significantly and was lower for the riskier groups.

All patient EHR characteristics differed significantly across the groups except history of hyperlipidemia (Table 6). As the risk of ED revisit increased, the prevalence of injuries, substance abuse, stroke, and diabetes mellitus generally rose, as did the prevalence of the remaining EHR conditions.

Table 7 shows that all patient HIE characteristics except history of osteoporosis and having an A1C test in the past 48 hours differed significantly across the groups and were considerably higher for the riskier groups.

Profiling the K-means Clustering

We ran K-means under exactly the same conditions as the GBTM, including all variables available at each visit, and the results show three distinct clusters. Table 8 and Table 9 show the characteristics of each cluster (patient and visit characteristics, and area-level characteristics, respectively). Table 10 shows the patient EHR characteristics and Table 11 the patient HIE characteristics across the three clusters. A close examination of the results shows findings that differ noticeably from the GBTM.
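
For readers less familiar with K-means, the sketch below shows the standard Lloyd iteration with k = 3: assign each point to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignments stop changing. It is a sketch under stated assumptions, not the study's implementation: the deterministic farthest-first seeding is a choice made here for reproducibility (scikit-learn's `KMeans` defaults to k-means++), and the two-feature points are hypothetical stand-ins for standardized visit-level variables. Unlike GBTM, K-means treats each feature vector as an independent point and does not model the ordering of a patient's visits.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means. Returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical, already standardized two-feature visit vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
centroids, labels = kmeans(points, k=3)
print(labels)  # [0, 0, 0, 2, 2, 2, 1, 1, 1]
```

Cluster numbers are arbitrary; labeling clusters as low, medium, or high risk requires a separate step, such as ranking them by their observed revisit rates.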

Table 8.

Patient and visit characteristics (age, gender, race, ED visits).

Patient characteristics	Study sample n = 37416	Low n = 9630 (25.73%)	Medium n = 13115 (35.05%)	High n = 14671 (39.21%)
Age (years)	50.04 ± 14.203	50.06 ± 14.4	50.06 ± 14.21	50.06 ± 14.06
Gender (% male)	0.34 ± 0.472	0.34 ± 0.474	0.33 ± 0.472	0.33 ± 0.472
Race (% white)***	0.23 ± 0.419	0.21 ± 0.409	0.22 ± 0.417	0.24 ± 0.426
ED visit associated with psychological diagnosis	0.01 ± 0.102	0.01 ± 0.104	0.01 ± 0.098	0.01 ± 0.105
ED visit was due to an emergency +	0.09 ± 0.289	0.10 ± 0.294	0.09 ± 0.282	0.09 ± 0.290
ED visit resulted in hospitalization ***	0.06 ± 0.246	0.07 ± 0.262	0.07 ± 0.247	0.06 ± 0.233
ED visit occurred on a weekend ***	0.25 ± 0.432	0.22 ± 0.415	0.26 ± 0.437	0.26 ± 0.438
ED visit was due to an injury ***	0.08 ± 0.267	0.04 ± 0.207	0.07 ± 0.249	0.11 ± 0.311
Remaining features not identified in the feature selection process
ED visit number***	0.76 ± 1.793	0.52 ± 1.538	0.61 ± 1.522	1.05 ± 2.105
ED visit associated with alcohol use	0.01 ± 0.079	0.01 ± 0.079	0.01 ± 0.081	0.01 ± 0.076

Table 9.

Area-level characteristics.

Patient characteristics	Study sample n = 37416	Low n = 9630 (25.73%)	Medium n = 13115 (35.05%)	High n = 14671 (39.21%)
Area-level features
Rate of coronary artery disease (CAD)***	4.716 ± 2.612	4.817 ± 2.695	4.694 ± 2.586	4.670 ± 2.577
Rate of drug consumption	6.334 ± 3.900	6.528 ± 3.988	6.391 ± 3.914	6.156 ± 3.821
Rate of cancer +	4.965 ± 2.208	5.008 ± 2.269	4.958 ± 2.213	4.943 ± 2.163
Rate of depression***	11.359 ± 7.373	12.177 ± 7.439	11.615 ± 7.418	10.593 ± 7.214
Rate of external injury**	11.121 ± 3.830	10.935 ± 3.856	11.160 ± 3.875	11.209 ± 3.768
Remaining features not identified in the feature selection process
Rate of schizophrenia***	2.140 ± 2.291	2.217 ± 2.399	2.168 ± 2.246	2.065 ± 2.256
Rate of cardiac arrhythmia***	4.007 ± 2.219	4.163 ± 2.321	4.037 ± 2.242	3.877 ± 2.121
Rate of hepatitis**	2.341 ± 1.732	2.381 ± 1.722	2.354 ± 1.766	2.303 ± 1.707
Rate of HIV	0.603 ± 0.813	0.606 ± 0.821	0.606 ± 0.865	0.598 ± 0.757
Rate of pneumonia	1.884 ± 1.309	1.882 ± 1.342	1.881 ± 1.316	1.887 ± 1.281
Area-level general features (time-invariant variables)
ct_POPWDIPN1***	0.385 ± 0.782	0.338 ± 0.799	0.390 ± 0.778	0.411 ± 0.773
Percent of occupied housing units (combination of rental and owner) whose occupants pay 30% or more of income for housing costs ***	0.435 ± 0.847	0.394 ± 0.851	0.430 ± 0.846	0.467 ± 0.844
Percent of all occupied units that are owner occupied **	−0.264 ± 0.841	−0.250 ± 0.865	−0.254 ± 0.840	−0.282 ± 0.824
Percent of households with cash public assistance or food stamps/SNAP ***	0.468 ± 0.904	0.413 ± 0.911	0.463 ± 0.904	0.508 ± 0.897
Percent of population all ages without health insurance ***	0.427 ± 0.912	0.397 ± 0.916	0.434 ± 0.913	0.440 ± 0.910
Percent of labor force age 16 and over who are unemployed ***	0.433 ± 0.948	0.367 ± 0.943	0.427 ± 0.951	0.480 ± 0.945
Percent of population in poverty for whom poverty status is determined ***	0.433 ± 0.907	0.384 ± 0.909	0.426 ± 0.910	0.472 ± 0.903
Percent of population living below 125% poverty ***	0.453 ± 0.870	0.406 ± 0.875	0.445 ± 0.871	0.491 ± 0.864
Percent of workers age 16 and over who did not drive a car, truck or van as their means of transportation to work ***	0.213 ± 0.986	0.170 ± 0.975	0.194 ± 0.970	0.258 ± 1.006
Violent crimes and simple assaults per 1000 population ***	0.349 ± 0.963	0.298 ± 0.971	0.326 ± 0.946	0.402 ± 0.970
Property crimes per 1000 population ***	0.218 ± 0.942	0.187 ± 0.961	0.206 ± 0.937	0.250 ± 0.932
Total juvenile offense charges per 1000 population, age 5–17 ***	0.055 ± 0.847	0.047 ± 0.905	0.049 ± 0.836	0.066 ± 0.816
Births where mother has less than 12 years education as percent of all births ***	0.115 ± 1.022	0.086 ± 1.028	0.104 ± 1.017	0.144 ± 1.022
Population with a disability as percent of civilian noninstitutionalized total population ***	0.337 ± 0.980	0.285 ± 0.985	0.320 ± 0.980	0.387 ± 0.975
Diversity index (index of racial dissimilarity)***	0.213 ± 0.900	0.248 ± 0.891	0.221 ± 0.904	0.183 ± 0.902
Tax delinquent properties as a percentage of total parcels ***	0.301 ± 1.094	0.226 ± 1.051	0.287 ± 1.088	0.363 ± 1.122
Parcels within 1/4 mile of an active park or greenway, as a percentage of all parcels ***	0.143 ± 0.981	0.085 ± 0.973	0.135 ± 0.977	0.188 ± 0.989

Table 10.

Patient EHR characteristics.

Patient characteristics	Study sample n = 37416	Low n = 9630 (25.73%)	Medium n = 13115 (35.05%)	High n = 14671 (39.21%)
External injury***	0.36 ± 0.479	0.26 ± 0.440	0.31 ± 0.463	0.46 ± 0.499
Substance abuse***	0.24 ± 0.425	0.21 ± 0.406	0.21 ± 0.407	0.28 ± 0.449
Hyperlipidemia***	0.38 ± 0.486	0.37 ± 0.483	0.38 ± 0.486	0.40 ± 0.489
Stroke***	0.08 ± 0.268	0.07 ± 0.253	0.07 ± 0.258	0.09 ± 0.285
Diabetes mellitus***	0.34 ± 0.474	0.32 ± 0.467	0.33 ± 0.469	0.37 ± 0.481
Remaining features not identified in the feature selection process
Schizophrenia***	0.05 ± 0.222	0.06 ± 0.234	0.04 ± 0.204	0.06 ± 0.228
HIV	0.01 ± 0.099	0.01 ± 0.099	0.01 ± 0.098	0.01 ± 0.099
Charlson comorbidity index***	1.40 ± 1.642	1.27 ± 1.579	1.31 ± 1.553	1.58 ± 1.740
Coronary artery disease***	0.14 ± 0.344	0.12 ± 0.326	0.13 ± 0.332	0.16 ± 0.364
Cardiac arrhythmia***	0.15 ± 0.359	0.14 ± 0.349	0.13 ± 0.341	0.17 ± 0.380
Depression***	0.37 ± 0.484	0.35 ± 0.476	0.35 ± 0.477	0.41 ± 0.492
Chronic obstructive pulmonary disease (COPD)***	0.19 ± 0.395	0.17 ± 0.377	0.18 ± 0.383	0.22 ± 0.416
Congestive heart failure (CHF)***	0.09 ± 0.289	0.09 ± 0.281	0.08 ± 0.274	0.10 ± 0.306
Asthma***	0.15 ± 0.354	0.12 ± 0.327	0.13 ± 0.340	0.18 ± 0.381
Number of unique medications in the past 30 days ***	2.51 ± 10.169	2.83 ± 11.793	2.39 ± 9.203	2.40 ± 9.830

Table 11.

Patient HIE characteristics.

Patient characteristics	Study sample n = 37416	Low n = 9630 (25.73%)	Medium n = 13115 (35.05%)	High n = 14671 (39.21%)
Inpatient admissions in the past 30 days ***	0.02 ± 0.157	0.03 ± 0.203	0.02 ± 0.138	0.02 ± 0.138
History of substance abuse***	0.26 ± 0.441	0.24 ± 0.426	0.24 ± 0.425	0.31 ± 0.461
History of external injury ***	0.51 ± 0.500	0.43 ± 0.495	0.47 ± 0.499	0.60 ± 0.490
History of depression***	0.40 ± 0.490	0.37 ± 0.484	0.38 ± 0.485	0.43 ± 0.496
History of diabetes mellitus***	0.37 ± 0.483	0.35 ± 0.478	0.36 ± 0.479	0.39 ± 0.488
Remaining features not identified in the feature selection process
Number of outpatient visits in the prior 30 days *	0.59 ± 1.073	0.61 ± 1.106	0.58 ± 1.056	0.58 ± 1.065
History of HIV	0.01 ± 0.103	0.01 ± 0.103	0.01 ± 0.105	0.01 ± 0.102
History of schizophrenia***	0.06 ± 0.241	0.07 ± 0.254	0.05 ± 0.224	0.07 ± 0.247
History of cardiac arrhythmia ***	0.21 ± 0.409	0.21 ± 0.408	0.19 ± 0.395	0.23 ± 0.422
Charlson comorbidity index (CCI) ***	1.66 ± 1.833	1.56 ± 1.819	1.56 ± 1.752	1.82 ± 1.901
History of stroke***	0.10 ± 0.299	0.09 ± 0.292	0.09 ± 0.287	0.11 ± 0.313
History of osteoporosis	0.04 ± 0.205	0.04 ± 0.203	0.04 ± 0.203	0.05 ± 0.208
History of coronary artery disease***	0.16 ± 0.371	0.15 ± 0.361	0.15 ± 0.360	0.18 ± 0.386
A1C test in past 48 hours	0.02 ± 0.143	0.02 ± 0.147	0.02 ± 0.148	0.02 ± 0.136

For the low-risk cluster, the average last-visit ED revisit rate was 10.75%; for the medium cluster, 11.69%; and for the high-risk cluster, 14.43%. Overall, the K-means clusters show little difference in the average last-visit revisit rates. The first sign of the GBTM model's superiority is the large spread in these averages across its groups (GBTM: 11.1%, 26.68%, and 56.59% for the low-, medium-, and high-risk groups, respectively).
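
One simple way to quantify "little difference" versus "large differences" is the range of the group averages. The sketch below computes per-group revisit rates from a 0/1 last-visit indicator and then the range; the range as a summary is a choice made here for illustration, not a metric reported in the paper. The toy patient rows are hypothetical, while the percentages are those stated in the text.

```python
from collections import defaultdict


def group_revisit_rates(rows):
    """Mean of a 0/1 last-visit revisit indicator per group label."""
    totals, counts = defaultdict(int), defaultdict(int)
    for group, revisited in rows:
        totals[group] += revisited
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in counts}


def rate_range(rates):
    """Difference between the highest and lowest group rate."""
    return max(rates.values()) - min(rates.values())


# Toy patient-level data (hypothetical).
toy = group_revisit_rates([("low", 0), ("low", 1), ("high", 1), ("high", 1)])

# Group averages (%) as reported in the text.
kmeans_rates = {"low": 10.75, "medium": 11.69, "high": 14.43}
gbtm_rates = {"low": 11.1, "medium": 26.68, "high": 56.59}
print(round(rate_range(kmeans_rates), 2), round(rate_range(gbtm_rates), 2))  # 3.68 45.49
```

On these figures the GBTM groups span roughly 45 percentage points, against under 4 points for the K-means clusters.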

Race, hospitalization, weekend visits, injury-related visits, and the average number of ED visits differed significantly across the clusters (Table 8). In the riskier clusters (medium and high compared with low risk), a larger share of ED visits were due to injury. K-means identified only 5 of the 10 variables as significantly different among the three clusters, whereas GBTM identified 8 of 10.

Table 9 shows that all area-level characteristics differed significantly across the clusters except the drug consumption, HIV, and pneumonia rates. Among the area-level disease rates, only the external injury rate was lower in the low-risk cluster than in the riskier clusters; most of the other rates were highest in the low-risk cluster. In both Tables 5 and 9, nearly all area-level general characteristics differed significantly across the groups. K-means identified 24 of the 27 variables as significantly different among the three clusters, whereas GBTM identified 20 of 27. However, in the K-means solution, higher values of these variables were not necessarily associated with the high-risk cluster.

All patient EHR characteristics except HIV prevalence differed significantly across the clusters (Table 10). The riskiest cluster had substantially more patients with injury, substance abuse, stroke, and diabetes mellitus, and a higher prevalence of most of the remaining EHR conditions. K-means and GBTM both identified 14 of the 15 variables as significantly different among the three groups.

Table 11 partly echoes Table 10: most patient HIE characteristics differed significantly across the clusters and were higher for the riskier clusters. K-means identified 11 of the 14 variables as significantly different among the three clusters, whereas GBTM identified 12 of 14.

The results reveal several qualitative differences between the two models. First, the low-risk group identified by GBTM is much larger (92.08%) than the K-means low-risk cluster (25.73%), and the GBTM high-risk group is much smaller (0.63%) than the K-means high-risk cluster (39.21%) (Table 4 vs. Table 8). The K-means partition is therefore much less deterministic, whereas GBTM is more actionable because it pinpoints a much smaller group of high-risk patients. Second, K-means shows little difference between the average last-visit ED revisit rates (10.75%, 11.68%, and 14.43% for the low-, medium-, and high-risk clusters, respectively), whereas GBTM shows large differences (11.1%, 26.68%, and 56.59% for the low-, medium-, and high-risk groups, respectively).

Discussion and conclusions

This study found three distinct trajectories of ED revisit probabilities among a large sample of adult, urban, safety-net patients. The GBTM approach used longitudinal data and performed better than the K-means clustering method. Advanced computational techniques such as GBTM give health care organizations opportunities to better understand the underlying risks of their broader patient populations.
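
To make the trajectory idea concrete, the sketch below shows the group-assignment step of a group-based trajectory model for a binary outcome, following the general formulation in Nagin's work: each group j has a mixing proportion and a polynomial-in-time logit for the revisit probability, and a patient is assigned to the group with the highest posterior probability given their observed sequence. All parameter values are invented for illustration (they are not the study's estimates), time is assumed to be indexed by visit number, and model estimation itself (for example with the Stata plugin or SAS procedure in the reference list) is not shown.

```python
import math

# Hypothetical fitted parameters (NOT the study's estimates): each group has a
# mixing proportion pi and a quadratic logit trajectory beta = (b0, b1, b2).
GROUPS = {
    "low": {"pi": 0.90, "beta": (-2.0, 0.0, 0.0)},
    "medium": {"pi": 0.08, "beta": (-1.0, 0.1, 0.0)},
    "high": {"pi": 0.02, "beta": (0.5, 0.1, 0.0)},
}


def revisit_prob(beta, t):
    """Group revisit probability at time t from a quadratic logit."""
    b0, b1, b2 = beta
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * t + b2 * t * t)))


def posterior_membership(outcomes, groups=GROUPS):
    """P(group | observed 0/1 revisit outcomes at t = 1, 2, ...) by Bayes' rule."""
    weighted = {}
    for name, g in groups.items():
        likelihood = 1.0
        for t, y in enumerate(outcomes, start=1):
            p = revisit_prob(g["beta"], t)
            likelihood *= p if y else (1.0 - p)
        weighted[name] = g["pi"] * likelihood
    total = sum(weighted.values())
    return {name: w / total for name, w in weighted.items()}


frequent = posterior_membership([1, 1, 1, 1, 1, 1, 1, 1])
rare = posterior_membership([0, 0, 0, 0, 0, 0, 0, 0])
print(max(frequent, key=frequent.get), max(rare, key=rare.get))  # high low
```

Because only observed periods enter the product, patients with fewer visits can still be scored, and the small prior weight on the high-risk group means a patient is placed there only when their own sequence strongly supports it.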

The heterogeneous nature of ED patients can make characterizing the entire population a challenge. Patients seek ED care for urgent, life-threatening issues, but at the same time the ED is a routine source of primary care for many patients, especially underserved ones.^38,39 Additionally, evidence shows that a small percentage of patients accounts for the majority of ED visits.⁴⁰ Such variance calls for risk stratification in order to target interventions that prevent revisits and facilitate transitions to other care settings.¹⁹ Consistent with this situation, we identified three distinct subpopulations with widely varying probabilities of revisit within a single ED setting. The highest-risk group had higher proportions of patients with ED visits associated with behavioral diagnoses, alcohol and substance abuse, injuries, COPD, and diabetes. Each of these factors has individually been associated, to varying degrees, with ED overutilization.^40,41 Identifying patients who are likely to belong to high-risk trajectories enables healthcare organizations to develop early interventions.

The identification of distinct trajectories of ED revisit probabilities within the population has practical implications for both clinicians and health care managers. For clinical care, inferences about individual patient prognosis and eventual clinical outcomes are useful in treatment decisions and care management planning. Health care managers are responsible for ensuring and planning adequate organizational resources to support care plans and clinical decisions. Knowing how many patients fall into each revisit-probability group helps with staffing and resource allocation. Likewise, tracking changes in trajectory size or composition over time may give health care leaders a tool to assess and monitor dynamics within their overall patient populations.

Additionally, the ED revisit risk groups varied by characteristics derived from HIE data as well as by area-level measures. The former illustrates the potential value of additional datasets in risk stratification.²⁰ For example, the average number of hospital visits in the prior 30 days increased substantially across risk groups. Through the HIE, we were able to obtain accurate counts of such admissions from across the state; data from a single institution would have underestimated the total number of visits. The differences in area-level measures across the risk groups also illustrate the role of social and environmental contexts in patient utilization. Our models included both “traditional” measures of social determinants, such as area poverty, household composition, and employment, and area-level measures of disease burden. While the social determinant measures differed across groups, the differences were small because the sample was drawn from a single urban area, which limits socioeconomic variation. Our models do indicate, however, that area-level measures cannot substitute for individual-level measures.⁴² The highest-risk group had the lowest area prevalence of depression and substance abuse, the opposite of the pattern in the individual-level measures.
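
The undercounting point can be illustrated with a small sketch that counts visits in the 30 days before an index ED visit, once across all facilities (an HIE-style view) and once for a single facility. The record layout, facility names, the half-open window, and the deduplication of repeated (facility, date) records are all assumptions made for illustration; the paper does not describe how its HIE counts were computed.

```python
from datetime import date, timedelta


def visits_in_prior_window(index_date, visit_records, days=30, source=None):
    """Count distinct visits in [index_date - days, index_date).

    visit_records: iterable of (facility, visit_date) pairs.
    source: if given, only that facility's records are counted,
    mimicking a single-institution view of the same patient.
    """
    start = index_date - timedelta(days=days)
    seen = set()  # drop duplicate (facility, date) records, e.g. repeated feed messages
    for facility, visit_date in visit_records:
        if source is not None and facility != source:
            continue
        if start <= visit_date < index_date:
            seen.add((facility, visit_date))
    return len(seen)


# Hypothetical records for one patient; facility names are illustrative.
records = [
    ("Hospital A", date(2015, 6, 10)),
    ("Hospital B", date(2015, 6, 20)),
    ("Hospital B", date(2015, 6, 20)),  # duplicate record
    ("Hospital C", date(2015, 6, 25)),
    ("Hospital A", date(2015, 4, 1)),   # outside the 30-day window
]
index_visit = date(2015, 6, 30)
hie_count = visits_in_prior_window(index_visit, records)
local_count = visits_in_prior_window(index_visit, records, source="Hospital A")
print(hie_count, local_count)  # 3 1
```

In this toy case the single-institution view sees one prior visit where the statewide view sees three.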

Methodologically, this paper contributes a comparison of GBTM with K-means, a more widely used clustering method. The better performance of GBTM is important: K-means was less deterministic than GBTM, meaning that its high-risk cluster contained far more patients than the GBTM high-risk group. As access to longitudinal data increases and the computational requirements of such advanced data mining models become easier to meet, time-series methods like GBTM are well positioned to provide more informative risk stratification. The methodology presented here also generalizes to larger cohorts with a different number of ED visits.

Limitations

Our work has several limitations. First, the study was limited to a single healthcare organization in the United States. As such, the factors associated with high revisit risk may not generalize to other countries with differing health system structures, as readmission rates vary widely internationally.⁴³ Similarly, the results may not generalize to other outcome measures such as length of stay.

Future research

Our work can be immediately expanded in two ways that increase external validity. First, the approach of employing time-series methodologies for longitudinal stratification, instead of cross-sectional methods, could be applied to other outcomes of interest such as hospital readmissions or kept (or no-show) appointments. Second, the work on ED revisits could be extended to additional populations and areas to assess the consistency of the identified trajectories.

Conclusions

Risk stratification may be one step toward addressing the challenge of ED service utilization. GBTM is an advanced computational technique that effectively identified at-risk groups by leveraging longitudinal information.

Footnotes

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: JV is a founder and equity holder in Uppstroms, LLC, a health technology company.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Robert Wood Johnson Foundation through the Systems for Action National Coordinating Center (ID: 75549).

ORCID iD

Ofir Ben-Assuli

Notes

References

1. Iezzoni. Risk adjustment for measuring healthcare outcomes. Chicago, IL: Health Administration Press, 1997.

2. Reddy, Sessums, Gupta, et al. Risk stratification methods and provision of care management services in comprehensive primary care initiative practices. Ann Fam Med 2017; 15(5): 451–454.

3. Oztekin, Tomak. An analytic approach to better understanding and management of coronary surgeries. Decis Support Syst 2012; 52(3): 698–705.

4. Ozaydin, Hardin, Chhieng. Data mining and clinical decision support systems. In: Clinical Decision Support Systems. Berlin, Germany: Springer, 2016, pp. 45–68.

5. Naseer, Agerholm, Fastbom, et al. Factors associated with emergency department revisits among older adults in two Swedish regions: a prospective cohort study. Arch Gerontology Geriatrics 2020; 86: 103960.

6. Wiler, Welch, Pines, et al. Emergency department performance measures updates: proceedings of the 2014 emergency department benchmarking alliance consensus summit. Acad Emerg Med 2015; 22(5): 542–553.

7. Pham, Kirsch, Hill, et al. Seventy-two-hour returns may not be a good indicator of safety in the emergency department: a national study. Acad Emerg Med 2011; 18(4): 390–397.

8. Pines, Mullins, Cooper, et al. National trends in emergency department use, care patterns, and quality of care of older adults in the United States. J Am Geriatr Soc 2013; 61(1): 12–17.

9. de Gelder, Lucke, de Groot, et al. Predictors and outcomes of revisits in older adults discharged from the emergency department. J Am Geriatr Soc 2018; 66(4): 735–741.

10. Ben-Assuli, Vest. Data mining techniques utilizing latent class models to evaluate emergency department revisits. J Biomedical Informatics 2020; 101: 103341.

11. Hao, Jin, Shin, et al. Risk prediction of emergency department revisit 30 days post discharge: a prospective study. PloS One 2014; 9(11): e112944.

12. Nunez, Hexdall, Aguirre-Jaime. Unscheduled returns to the emergency department: an outcome of medical errors? BMJ Qual Saf 2006; 15(2): 102–108.

13. Katz, Aufderheide, Gaeth, et al. Satisfaction and emergency department revisits in patients with possible acute coronary syndrome. J Emergency Medicine 2013; 45(6): 947–957.

14. Gordon, Hayward, et al. Initial emergency department diagnosis and return visits: risk versus perception. Ann Emergency Medicine 1998; 32(5): 569–573.

15. C-L, Wang F-T, Chiang Y-C, et al. Unplanned emergency department revisits within 72 hours to a secondary teaching referral hospital in Taiwan. J Emergency Medicine 2010; 38(4): 512–517.

16. Alexander, Grumbach, Remy, et al. Congestive heart failure hospitalizations and survival in California: patterns according to race/ethnicity. Am Heart J 1999; 137(5): 919–927.

17. Black. Learning about 30-day readmissions from patients with repeated hospitalizations. Am J Manag Care 2014; 20(6): e200–207.

18. Šteinmiller, Routasalo, Suominen. Older people in the emergency department: a literature review. Int Journal Older People Nursing 2015; 10(4): 284–305.

19. Lowthian, Straney, Brand, et al. Unplanned early return to the emergency department by older patients: the Safe Elderly Emergency Department Discharge (SEED) project. Age and Ageing 2016; 45(2): 255–261.

20. Vest, Ben-Assuli. Prediction of emergency department revisits using area-level social determinants of health measures and health information exchange information. Int J Med Inform 2019; 129: 205–210.

21. Hao, Jin, Shin, et al. Risk prediction of emergency department revisit 30 days post discharge: a prospective study. PloS One 2014; 9(11): e112944.

22. Ehrenberg, Oredsson, Anttila, et al. Omhändertagande av äldre som inkommer akut till sjukhus-med fokus på sköra äldre: En systematisk litteraturöversikt [Care of older people admitted acutely to hospital, with a focus on frail older people: a systematic literature review]. Stockholm: Statens beredning för medicinsk utvärdering, 2013.

23. Hastings, Oddone, Fillenbaum, et al. Frequency and predictors of adverse health outcomes in older medicare beneficiaries discharged from the emergency department. Med Care 2008; 46: 771–777.

24. Ngai, Grudzen, Lee, et al. The association between limited English proficiency and unplanned emergency department revisit within 72 hours. Ann Emergency Medicine 2016; 68(2): 213–221.

25. Lange (eds). Power k-means clustering. In: International Conference on Machine Learning, June 13, 2019, Long Beach, CA.

26. Kiang, Fisher. An extended self-organizing map network for market segmentation: a telecommunication example. Decis Support Syst 2006; 42(1): 36–47.

27. van Dam J-W, Van De Velden. Online profiling and clustering of Facebook users. Decis Support Syst 2015; 70: 60–72.

28. Pellerin, Gao, Kaminsky. Predicting 72-hour emergency department revisits. Am J Emerg Med 2018; 36(3): 420–424.

29. McDonald, Overhage, Barnes, et al. The Indiana network for patient care: a working local health information infrastructure. Health Affairs 2005; 24(5): 1214–1220.

30. Golembiewski, Allen, Blackmon, et al. Combining nonclinical determinants of health and clinical data for research and evaluation: rapid review. JMIR Public Health Surveillance 2019; 5(4): e12846.

31. NYU Center for Health and Public Service Research. NYU ED Algorithm [Internet]. 2016 [cited 2022 Jun 6]. Available from: http://wagner.nyu.edu/faculty/billings/nyued-background

32. Charlson, Peterson, et al. The Charlson comorbidity index is adapted to predict costs of chronic disease in primary care patients. J Clinical Epidemiology 2008; 61(12): 1234–1240.

33. Nagin. Group based models of development. Boston, MA: Harvard University Press, 2005.

34. Jones, Nagin. A note on a Stata plugin for estimating group-based trajectory models. Sociological Methods Res 2013; 42(4): 608–613.

35. Jones, Nagin. Advances in group-based trajectory modeling and an SAS procedure for estimating them. Sociological Methods Res 2007; 35(4): 542–571.

36. Liu, Zhou, Yao, editors. Weighted Gini index feature selection method for imbalanced data. In: 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC); 27–29 March 2018.

37. Huang, Cai, et al. Feature selection of power quality disturbance signals with an entropy-importance-based random forest. Entropy 2016; 18(2): 44.

38. Arnett, Thorpe, Gaskin, et al. Race, medical mistrust, and segregation in primary care as usual source of care: findings from the exploring health disparities in integrated communities study. J Urban Health 2016; 93(3): 456–467.

39. Shachaf, Davidovitch, Halpern, et al. Utilization profile of emergency department by irregular migrants and hospitalization rates: lessons from a large urban medical center in Tel Aviv, Israel. Int J Equity Health 2020; 19: 1–9.

40. LaCalle, Rabin. Frequent users of emergency departments: the myths, the data, and the policy implications. Ann Emergency Medicine 2010; 56(1): 42–48.

41. Lee, Chen, et al. Prevalence of and predictors for frequent utilization of emergency department: a population-based study. Medicine 2015; 94(29).

42. Gottlieb, Francis, Beck. Uses and misuses of patient- and neighborhood-level social determinants of health data. Permanente J 2018; 22.

43. Yao J-L, Fang, Lou Q-Q, et al. A systematic review of the identification of seniors at risk (ISAR) tool for the prediction of adverse outcome in elderly patients seen in the emergency department. Int Journal Clinical Experimental Medicine 2015; 8(4): 4778–4786.