Abstract
Racial and ethnic disparities in viral load suppression (VLS) have been well documented among people living with HIV (PLWH). The authors hypothesized that a contemporary analytic technique could reveal factors underlying these disparities and provide more explanatory power than broad stereotypes. Classification and regression tree analysis was used to detect factors associated with VLS among 11 419 adult PLWH receiving treatment from 186 New York State HIV clinics in 2013. A total of 8885 (77.8%) patients were virally suppressed. The algorithm identified 8 mutually exclusive subgroups characterized by age, housing stability, drug use, and insurance status but neither race nor ethnicity. Our findings suggest that racial and ethnic disparities in VLS exist but likely reflect underlying social and behavioral determinants of health.
Introduction
Racial and ethnic disparities in HIV treatment outcomes have been well documented in the United States. 1 According to statistics from the Centers for Disease Control and Prevention, African American and Hispanic people living with HIV (PLWH) are less likely to achieve viral load suppression (VLS) and experience higher rates of AIDS-related mortality than whites. 2–5 These disparities have implications for population health and translate into higher rates of new infections among these groups. 6,7 However, broadly defined demographic groups may conceal nuanced disparities, and the transformation of the domestic HIV/AIDS crisis into multiple “microepidemics” demands a precise and unbiased examination of treatment-related disparities. 8,9
An understanding of the underlying determinants of health disparities is essential to the development of interventions that can reach these vulnerable populations. 10,11 Socioeconomic status may be more strongly associated with VLS disparities than race and ethnicity. 12 Lack of transportation, food insecurity, and unstable housing have been established as common barriers to VLS. 13–15 In addition, behavioral health disorders, including active substance abuse and depression, are predictors of nonadherence to antiretroviral therapy (ART) and AIDS-related mortality. 16–19 The prevalence of these and other common barriers to VLS suggests that alternative representations of disparities are available to inform both public health and quality improvement (QI) initiatives.
Because health inequalities may reflect a multifactorial combination of demographic, environmental, and behavioral factors, we hypothesized that a contemporary analytic technique could precisely articulate treatment disparities with information routinely collected in patient records. We sought to use classification and regression tree analysis (C&RT) to identify high-risk segments of the New York State HIV population that share common barriers to desired health outcomes. Classification and regression tree analysis is a commonly used data mining technique and is increasingly used in medicine and public health to stratify risk and predict response to treatment. 20–23 We demonstrate that the technique is easily interpretable and can inform the allocation of resources to population subgroups with the highest needs.
Methodology
Participants and Setting
We used an existing database generated by sampling from 186 outpatient HIV programs in New York State. Data were extracted from eHIVQUAL, 24 a Web-based performance measurement tool that collects a standardized set of data to drive QI activities at each HIV program in the state and monitor the quality of HIV care statewide. Participation is required for all HIV programs in New York State. Sampled patient records from January 01, 2013, to December 31, 2013, were uploaded to the eHIVQUAL platform and included a predefined collection of variables, including patient demographics, behavioral data, laboratory testing, and indicators of care processes. Each participating facility followed a standardized random sampling procedure. Sample sizes were proportional to the size of each clinic’s patient population. Each program’s sample size was calculated so that a 90% confidence interval with a width of 0.16 would result from a theoretical population score of 50%. The sampling process is described at length elsewhere. 25
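The sample size criterion above can be illustrated with a short calculation. The following is a minimal sketch, not the eHIVQUAL procedure itself (which is described in the cited reference): it assumes a normal approximation to the binomial and a finite population correction, and the function name `sample_size` and its parameters are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(population, ci_level=0.90, width=0.16, p=0.5):
    """Approximate number of records to sample from one program.

    Assumptions (not stated in the source): normal approximation to the
    binomial and a finite population correction.
    """
    # Two-sided critical value, e.g. about 1.645 for a 90% interval.
    z = NormalDist().inv_cdf(0.5 + ci_level / 2)
    half_width = width / 2
    # Sample size for an effectively infinite population.
    n0 = z ** 2 * p * (1 - p) / half_width ** 2
    # Shrink the sample for small caseloads (finite population correction).
    n = n0 / (1 + (n0 - 1) / population)
    return min(population, math.ceil(n))


if __name__ == "__main__":
    for caseload in (50, 1000, 10 ** 9):
        print(caseload, sample_size(caseload))
```

Under these assumptions the required sample levels off as the caseload grows, so larger clinics contribute more records but not in strict proportion to their size.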
To be eligible for inclusion in the study cohort, patients had to be 13 years of age or older and have at least 1 HIV medical visit in 2013. Newly treated patients whose final laboratory test was fewer than 12 weeks after ART initiation were excluded. Information regarding ART initiation and subsequent treatment was obtained from medical records and uploaded by participating facilities.
Predictors and Outcome Measures
Patient characteristics, including transmission risk, were abstracted from each medical record and entered into the eHIVQUAL application. Substance use was defined as any documented use of illicit drugs within 6 months of the patient’s HIV medical visit. If drug use was identified, the specific drugs used were listed, and drug use was classified as dependent or nondependent behavior. To discern whether patients had a mental health disorder, the eHIVQUAL system asks providers whether patients screened positive for depression, an anxiety disorder, cognitive impairment, or posttraumatic stress disorder according to a standardized program indicator. 26 Facilities used their own screening tools to identify mental health disorders. Housing status was characterized through mutually exclusive categories: stable housing, unstable housing, and supportive housing. For 773 patients, housing status was unknown, and this response was treated as a legitimate category in all analyses.
Suppression was defined as having an HIV viral load <200 copies/mL on the last viral load test of 2013. Laboratory test results were derived from each patient’s chart. Programs were required to enter data from each of the patient’s viral load tests in 2013, including the dates and results of each test. Laboratory results were externally validated. Twenty-six individuals with missing or invalid viral load data were omitted from the study.
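The outcome definition above can be expressed as a small rule. This is a minimal sketch under stated assumptions: the record layout (a list of test date and result pairs) and the function name `is_suppressed` are hypothetical, and the 12-week ART exclusion is not modeled here.

```python
from datetime import date

SUPPRESSION_THRESHOLD = 200  # copies/mL, per the definition in the text


def is_suppressed(tests, year=2013):
    """Classify a patient from (test_date, copies_per_ml) pairs.

    Returns True or False based on the last valid test in the review year,
    or None when the patient has no valid result (such patients were
    omitted from the study).
    """
    valid = [
        (d, v) for d, v in tests
        if d.year == year and v is not None and v >= 0
    ]
    if not valid:
        return None
    # Only the most recent test of the year determines the outcome.
    _, last_value = max(valid, key=lambda pair: pair[0])
    return last_value < SUPPRESSION_THRESHOLD


if __name__ == "__main__":
    history = [(date(2013, 3, 1), 5000), (date(2013, 11, 5), 48)]
    print(is_suppressed(history))
```

Because only the final test counts, a patient with an early unsuppressed result who later achieves a viral load below 200 copies/mL is classified as suppressed.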
Statistical Analyses
Summary statistics were generated to characterize the demographics of the study cohort. The cohort was then stratified by suppression status. Pearson chi-square tests were used to determine whether certain subgroups were more likely to achieve viral suppression. This process was repeated for selected sociodemographic and clinical variables. Two-sided P values below .05 were considered statistically significant.
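A chi-square comparison of suppression between two subgroups can be sketched as follows. This is an illustration only: the counts are hypothetical, the function name `pearson_chi_square_2x2` is an assumption, no continuity correction is applied, and the source does not state whether one was used.

```python
import math


def pearson_chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table.

    Rows are two subgroups; columns are suppressed and unsuppressed counts:
        group 1: a suppressed, b unsuppressed
        group 2: c suppressed, d unsuppressed
    Returns (chi_square, two_sided_p) with 1 degree of freedom.
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    chi2, p = pearson_chi_square_2x2(70, 30, 85, 15)
    print(round(chi2, 3), round(p, 4))
```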
To further explore disparities in viral suppression among the study cohort, C&RT was applied to the data. Classification and regression tree analysis is a supervised learning technique that may be used for regression or classification. 27 For binary outcomes such as VLS, the algorithm partitions the data into progressively smaller pieces as it attempts to create subgroups that are increasingly homogeneous in outcome. A 2-step process determines each partition. First, for each predictor in the data set, the algorithm finds the threshold value that most reduces overall heterogeneity in patient outcomes. The algorithm then identifies the single predictor (with its respective split determined in the preceding step) that maximally distinguishes suppressed from unsuppressed patients. This process yields 2 mutually exclusive subsets of the original group of observations. As the partitioning continues, the subsets become progressively smaller and more homogeneous in outcome. 20,27 The stopping criteria of the tree can be tuned by trial and error so that the tree has acceptable precision and interpretability. 20,27 For this analysis, the minimum node size was set to 1/20th the size of the data set and the complexity parameter to 0.002.
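The 2-step partitioning described above can be sketched for a single split. This is a simplified illustration rather than the rpart implementation: it assumes Gini impurity as the heterogeneity measure (the source does not name the index), handles numeric or 0/1 predictors only, and omits complexity-parameter pruning. All function names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) outcomes."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels, min_node=1):
    """Step 1: best cut point for one predictor.

    Returns (threshold, weighted_impurity); threshold is None when no
    admissible split lowers the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot cut between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        if len(left) < min_node or len(right) < min_node:
            continue  # respect the minimum node size
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


def best_split(predictors, labels, min_node=1):
    """Step 2: pick the predictor whose best cut gives the purest subsets."""
    results = {
        name: best_threshold(values, labels, min_node)
        for name, values in predictors.items()
    }
    name = min(results, key=lambda key: results[key][1])
    threshold, impurity = results[name]
    return name, threshold, impurity


if __name__ == "__main__":
    # Toy data: 1 = suppressed, 0 = unsuppressed.
    outcome = [0, 0, 0, 1, 1, 1]
    data = {
        "age": [22, 30, 35, 50, 55, 60],
        "unstable_housing": [0, 1, 0, 1, 0, 1],
    }
    print(best_split(data, outcome))
```

Applying `best_split` recursively to each resulting subset, until a subset would fall below the minimum node size, yields the mutually exclusive subgroups that form the leaves of the tree.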
We used the rpart package in R to build the classification trees. Variables available to C&RT included age, race and ethnicity, gender, risk group, past substance abuse, depression, immigration status, current alcohol, cocaine, or heroin use, insurance payer, and housing instability. Analyses were performed using R version 3.0.3 (The R Foundation for Statistical Computing).
Results
A total of 8808 of 11 252 eligible patients were suppressed on their last viral load, yielding an overall suppression rate of 78.3%. Table 1 shows the characteristics of the entire sample. The cohort was predominantly composed of people of color, largely non-Hispanic black (48.3%), and Hispanic (33.1%). The majority of the cohort was men (63.1%), and the mean age was 46.8 years. The most common transmission risk group was heterosexual contact (46.0%), followed by men who have sex with men (MSM) (25.3%) and intravenous drug use (14.6%).
Table 1. Characteristics of Facilities Submitting Data.
Abbreviation: IQR, interquartile range.
In univariate analyses, patients who were under the age of 45, were of African-American race, or acquired their infections through intravenous drug use were less likely to be suppressed (Table 2). In addition, unsuppressed patients were more likely to lack stable housing, have Medicaid or be uninsured, use illicit drugs, and have a mental health disorder. Suppressed patients were more likely to be older, to be MSM, to self-identify as white or as Asian or Pacific Islander, to be stably housed, and not to have used illicit drugs during the review period.
Table 2. Characteristics of the Sample by Viral Load Suppression Status.
Abbreviations: ADAP, AIDS Drug Assistance Program; PTSD, posttraumatic stress disorder.
a Two-tailed Pearson χ2 test; significance level < .01.
b Two-tailed Pearson χ2 test; significance level < .001.
c Two-tailed Pearson χ2 test; significance level < .05.
The classification algorithm stratified the study cohort and identified 5 risk profiles (1, 2, 3, 4, and 6) with significantly lower rates of VLS (P < .01; Figure 1). The size of these groups, as well as their relative likelihood of VLS, is presented in Table 3. Patients without stable housing and with evidence of active substance use were the least likely to be suppressed (profile 1, relative risk [RR] = 2.0), followed by unstably housed individuals with no evidence of substance use (profile 2, RR = 1.4). Among patients with stable housing, 3 risk profiles with a lower than average prevalence of suppression were identified. Two of these comprised patients younger than 47 years who were uninsured or enrolled in public insurance and who either actively used illicit drugs (profile 3, RR = 1.7) or did not (profile 4, RR = 1.5). Patients aged 47 years and older who abused cocaine (profile 6, RR = 1.4) were also significantly less likely to achieve VLS.

Figure 1. Classification tree generated by the classification and regression tree analysis (C&RT) algorithm and the identified risk profiles of viral load suppression.
Table 3. Risk Profiles Identified Using C&RT.
Abbreviations: C&RT, classification and regression tree analysis; CI, confidence interval.
a Significance level (.05) using the Bonferroni correction for multiple comparisons.
Two patient groups (profiles 7 and 8) were more likely than average to be suppressed. These groups comprised patients aged 47 years and older who were privately insured (profile 7, RR = 0.6) and those who were uninsured or publicly insured (profile 8, RR = 0.7).
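The relative risks reported for each profile can be read with the help of a short calculation. The source does not spell out the formula or the comparison group, so the sketch below assumes that RR is the proportion unsuppressed in a profile divided by the proportion unsuppressed in a comparison group, with a log-scale (Katz) 95% confidence interval. The function name and the counts are hypothetical and are not taken from Table 3.

```python
import math


def relative_risk(unsuppressed_a, n_a, unsuppressed_b, n_b, z=1.96):
    """Relative risk of non-suppression in group A versus group B.

    Returns (rr, lower, upper), where the interval is the log-scale (Katz)
    confidence interval. Assumes independent groups and nonzero counts.
    """
    risk_a = unsuppressed_a / n_a
    risk_b = unsuppressed_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR).
    se_log = math.sqrt(
        1 / unsuppressed_a - 1 / n_a + 1 / unsuppressed_b - 1 / n_b
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 44 of 100 unsuppressed in a profile,
    # 220 of 1000 unsuppressed in the comparison group.
    rr, lower, upper = relative_risk(44, 100, 220, 1000)
    print(round(rr, 2), round(lower, 2), round(upper, 2))
```

Under this reading, an RR above 1 marks a profile with more non-suppression than the comparison group, and an RR below 1 marks a profile with less.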
Discussion
Despite the provision of high-quality, comprehensive care in New York State HIV care programs, more than 20% of patients in a large statewide cohort failed to achieve VLS. Although black and Hispanic HIV-infected outpatients were observed to have lower rates of VLS than whites, sociodemographic characteristics likely underlie observed racial and ethnic inequalities. In order to ameliorate disparities in health outcomes, interventions must be targeted and tailored to vulnerable populations defined by common barriers to VLS rather than broadly defined demographic groups.
Using classification tree analysis, a parsimonious set of risk profiles characterized by overlapping characteristics provided a detailed representation of disparities in viral load outcomes. 28 Housing instability was the single strongest predictor of failure to achieve VLS. Unstably housed individuals with substance abuse disorders had poor (56.2%) viral load outcomes, as did unstably housed PLWH who did not abuse substances (70.0%). Housing instability is an established determinant of clinical outcomes and is the focus of several recent interventions nationally and in New York State. 29–33 Among individuals with stable housing, age was highly predictive of VLS. This finding supports recent efforts to improve outcomes along the HIV care and treatment cascade in younger patients. 34,35 Although rates of VLS were lowest among individuals aged 19 to 24 years, the algorithm identified a classification threshold of 47 years because this split effectively segmented a large proportion of the cohort. Substance users had lower rates of VLS across multiple strata, corroborating other reports. 36,37 Implementation of QI initiatives and health disparities research underscores the importance of understanding the factors underlying inequalities when designing organizational and jurisdictional public health interventions. 38 Conversely, a continued focus on racial and ethnic generalizations may give rise to beliefs among providers that blacks and Hispanics, as groups, are less likely to effectively self-manage. These assumptions could lead to inappropriate delays in offering treatment and enhance perceptions of stigma and discrimination toward these groups.
The complexity of identifying multifactorial combinations of barriers to VLS poses specific challenges. 39 Because the algorithm is not hypothesis driven and proceeds automatically, it may neglect groups that are traditionally perceived as vulnerable populations. For example, rates of VLS were uniformly low among young PLWH in our cohort, which prevented further segmentation by C&RT. Although young MSM of color have frequently been identified as vulnerable, the algorithm did not isolate this group owing to the predominance of other variables more predictive of VLS. 40,41 Transgender people were also not represented, likely owing to their small numbers (n = 112). 42 This limitation can also be a strength of C&RT, in that it draws attention only to sizeable population segments within a public health jurisdiction or large clinic. Finally, the algorithm has been characterized as unstable because minor changes to the data can alter a tree’s appearance. 27,43 For example, early in the modeling process, C&RT identified a vulnerable group of stably housed individuals under the age of 26. When insurance status was first introduced into the algorithm, this partition became less powerful and was eliminated from consideration.
Several other limitations of our data should be considered. Although our study group consists of a large and heterogeneous group of facilities, data from the Veterans Health Administration, private practices, and correctional facilities were not available. Additionally, our markers of socioeconomic status (insurance status and housing status) cannot account for additional social determinants that can influence VLS, such as income, history of incarceration, 44–46 and education level. 47 Future analyses should further elucidate the relationship between socioeconomic status and population-level VLS disparities.
To ameliorate treatment disparities, New York State has long been committed to providing comprehensive care for PLWH through insurance programs and state-defined models of care, including centers of excellence. In spite of these initiatives, persisting disparities in VLS suggest that intensive and targeted interventions are needed. Robust process improvement at the facility level can lead to identification of the specific subgroups in need of targeted interventions within the specific clinic population. 48 Peer mentoring and enhanced personal contact from HIV care providers can complement existing initiatives to improve VLS among young PLWH. Active substance users would likely benefit from colocated HIV treatment and drug treatment services. These tailored efforts can be accompanied by intensive case management to help vulnerable PLWH overcome common barriers to VLS. Inductive techniques such as C&RT can be regularly used to evaluate progress and detect population segments that require increased attention.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
