Abstract
Publicly reported patient experience (PE) scores and overall hospital quality and safety ratings (QSR) increasingly influence consumers' healthcare choices and reimbursement, but their correlation remains unclear. We analyzed 2,384 adult US acute care hospitals’ public data from Google® (Google), the Leapfrog Group® (Leapfrog), Hospital Compare® (HC), US News and World Report® (USN), and Healthgrades® (HG). We abstracted QSR from three sources (Leapfrog, HC, and USN) and PE from four sources (HC, USN, HG, and Google). We found significant differences in PE scores across QSR categories in Leapfrog and HC (Kruskal-Wallis Test, all P < 0.001) and between USN-ranked and non-ranked institutions (Mann-Whitney U test, P < 0.001), indicating better PE in hospitals with higher QSR scores. We performed a logistic regression that showed that some but not all QSR were independently associated with PE. In conclusion, our study adds to the evidence that PE and quality often, though not always, correlate.
Introduction
Hospitals in the United States are increasingly leveraging online marketing to deliver services. Personal recommendations, insurance network constraints, and publicly available data influence patients’ hospital and provider choices. Meanwhile, patients’ engagement with public data sources is rising rapidly.1,2
Two categories of online data accessible to consumers are patient experience (PE) information and quality and safety ratings (QSR) generated by public and private organizations. PE data are derived from unscripted, purely evaluative online reviews and standardized tools such as the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) Survey. 3 The HCAHPS surveys aim to measure experiences by asking objective questions about care processes to reduce subjectivity.
Previous research generally supports a positive association between PE and quality-of-care outcomes. 4 A systematic review found a positive association between higher PE scores and higher quality-of-care and patient-safety ratings across most of the 55 studies. 5 However, these studies have limitations, including small sample sizes, a focus on specific conditions, and an emphasis on primary care settings. We identified a gap in the research: the need to include a large number of hospitals and to simultaneously compare multiple sources of QSR and PE. We, therefore, included all U.S. acute care hospitals to examine the correlation between various PE sources, including unscripted evaluative online reviews, and the association between PE and QSR across several rating systems.
Methods
Search Strategy
We obtained the list of acute care hospitals from the American Hospital Directory, which compiles information from publicly available sources. 6 We excluded specialty, pediatric, and critical access hospitals because they were outside the scope of our intended study population. We also excluded facilities with no data entries in any of the five PE and QSR datasets.
Between February 1, 2023, and October 3, 2023, four authors (TN, SI, AE, and JR) independently queried five online databases: Google® (Google), the Leapfrog Group® (Leapfrog), Hospital Compare® (HC), U.S. News & World Report® (USN), and Healthgrades® (HG). To achieve temporal alignment, all QSR and PE data for a given hospital were recorded on a single day, recognizing that there will always be variable lags between sources; i.e., for a given hospital, two QSR sources might be assessing different time periods.
Databases Searched
We gathered data from five sources: Google, HC, HG, Leapfrog, and USN.
Google is a large technology company focusing primarily on online searches. Users can leave unscripted reviews and rate their experience on a scale of 1 to 5, with 5 being the highest rating. The review aggregation is therefore random and unadjusted. HC is CMS’s tool for presenting data from hospitals participating in the Medicare program. HC assigns scores on a 1 to 5-star scale. HG is a private U.S. company that evaluates hospitals based on risk-adjusted mortality and in-hospital complications, converting data from publicly available sources into 1 to 5-star scores. Leapfrog is a not-for-profit organization dedicated to advancing patient safety in healthcare. Each hospital voluntarily completes a survey twice a year. After validating the data, the Leapfrog organization assigns an overall safety grade (A, B, C, D, or F; F is the lowest) based on various safety and quality domains. USN is a private digital media company that publishes regional, national, and specialty procedures & conditions-specific hospital rankings.
Variables Recorded
For each hospital, we recorded the location, number of beds, total average yearly discharges, gross revenue, and patient days. For the QSR data, we noted Leapfrog’s overall safety grade, HC’s total number of quality and safety stars, and USN’s hospital regional or national ranking status. For the PE results, we documented the average ratings on Google, HC, and USN, along with the percentage of patients who rated a hospital either a 9 or 10 in HG.
Statistical Analysis
We present continuous data as median [interquartile range, 25th-75th percentile] and categorical data as numbers (percentage). We calculated the Spearman correlation coefficient to assess the strength of the correlation between the different PE scores. We compared PE scores across quality ratings using the Kruskal-Wallis test (Leapfrog and HC categories) and the Mann-Whitney U test (USN-ranked versus non-ranked hospitals).
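As a minimal sketch of the rank-based statistics named above, the following Python code computes a Spearman coefficient, a Kruskal-Wallis H statistic (without tie correction), and a Mann-Whitney U statistic from first principles. The scores are made up for illustration and are not the study data; the function names are hypothetical, and the authors' analysis was run in SPSS, which also supplies the P values that this sketch omits.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman coefficient: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for a list of groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

def mann_whitney_u(a, b):
    """Mann-Whitney U for group a: its rank sum minus n_a(n_a + 1)/2."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    return rank_sum_a - len(a) * (len(a) + 1) / 2

# Made-up PE scores for illustration only (not the study data).
pe_hc = [2, 3, 3, 4, 5, 4]            # star ratings
pe_hg = [55, 62, 66, 71, 80, 74]      # percent rating 9 or 10
pe_by_grade = {"A": [3.9, 3.6, 3.4], "C": [3.1, 2.9, 3.0], "F": [2.4, 2.6, 2.2]}

rho = spearman_rho(pe_hc, pe_hg)
h_stat = kruskal_wallis_h(list(pe_by_grade.values()))
u_stat = mann_whitney_u(pe_by_grade["A"], pe_by_grade["F"])
print(f"rho={rho:.3f}, H={h_stat:.3f}, U={u_stat:.1f}")
```

With real data, a statistics package would normally be used instead, since it also applies the tie correction and returns P values.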
We performed a linear logistic regression to assess the independent association between the PE scores and QSR. We entered the following variables in one step: number of beds, total average yearly discharges, patient days, gross revenue, and PE for HC, HG, USN, and Google. Because we found evidence of collinearity among the non-PE variables (number of beds, total discharges, and patient days), we added interaction terms for these three variables and re-ran the model, again entering all variables in a single step. We considered results significant at P < 0.05. All analyses were conducted using SPSS software version 28.0, IBM (Chicago, IL, USA).
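To illustrate what adding an interaction term to a regression model involves, here is a small Python sketch that fits an ordinary least-squares model with one product term (beds × PE score). It rests on several assumptions: the hospitals and coefficients are invented, the interaction is modeled as a simple product of two predictors, and ordinary least squares is used only for illustration. It does not reproduce the authors' SPSS procedure or their data.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def design_row(beds_hundreds, pe_score):
    """Intercept, two main effects, and their product (the interaction term)."""
    return [1.0, beds_hundreds, pe_score, beds_hundreds * pe_score]

# Hypothetical hospitals: (beds in hundreds, PE score). Not the study data.
hospitals = [(1, 2.0), (1, 4.0), (3, 2.0), (3, 4.0), (5, 3.0), (2, 5.0)]

# The outcome is generated from known coefficients so the fit can be checked.
true_coef = [1.0, 0.2, 0.5, 0.1]
rows = [design_row(beds, pe) for beds, pe in hospitals]
y = [sum(c * v for c, v in zip(true_coef, row)) for row in rows]

coef = fit_ols(rows, y)
print([round(c, 6) for c in coef])
```

A nonzero coefficient on the product column means the association between the PE score and the outcome changes with hospital size, which is the kind of effect an interaction term is meant to capture.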
Results
Hospital Characteristics
The American Hospital Directory database included 3,871 hospitals. Of these, 2,384 met the inclusion criteria for our study. The average number of beds per hospital was 197 (±237), the average number of yearly discharges was 7,754 (±10,835), and the total patient days were 37,936 (±64,212). The number of Google reviews varied widely, ranging from close to 50 to over 1,000.
Correlation Between the Different PE Scores
Correlation Matrix Between the Patient Experience Ratings
We present the correlation coefficients between the studied PE sources. The correlation is strong between HC, HG, and USN, and very weak between Google and the other three sources.
PE Scores and QSR
We found significant differences in PE scores from HC, USN, Google, and HG across Leapfrog and HC categories (Kruskal-Wallis Test, all P < 0.001), and between USN-ranked and non-ranked institutions (Mann-Whitney U test, P < 0.001), i.e., better PE was associated with higher QSR grades (Figure 1). Appendix Table 1 presents the PE scores for the different categories, expressed as median [IQR].
Figure 1. Patient experience scores in HC, HG, USN, and Google stratified by Leapfrog or HC grades.
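A stratified median [IQR] summary of the kind reported in Appendix Table 1 could be produced along the following lines. This is a sketch under stated assumptions: the (grade, rating) pairs are invented, the function names are hypothetical, and the quartiles use the "inclusive" rule of Python's statistics module, which may differ from the rule SPSS applies.

```python
from statistics import median, quantiles

def median_iqr(values):
    """Return (median, Q1, Q3) using inclusive quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3

def summarize_by_grade(records):
    """Group (grade, score) pairs and format each group as 'median [Q1-Q3]'."""
    groups = {}
    for grade, score in records:
        groups.setdefault(grade, []).append(score)
    summary = {}
    for grade in sorted(groups):
        med, q1, q3 = median_iqr(groups[grade])
        summary[grade] = f"{med:g} [{q1:g}-{q3:g}]"
    return summary

# Invented (safety grade, Google rating) pairs, not the study data.
records = [("A", 3.0), ("A", 3.5), ("A", 4.0), ("A", 4.5), ("A", 5.0),
           ("B", 2.0), ("B", 3.0), ("B", 4.0)]

table = summarize_by_grade(records)
for grade, cell in table.items():
    print(grade, cell)
```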
The logistic regression showed that QSR in Leapfrog was independently associated with PE in Google (P=0.044), HG (P=0.004), and USN (P=0.032) and was close to statistical significance for PE in HC (P=0.054). QSR in HC was independently associated with PE in HC (P=0.002), HG (P<0.001), and Google (P=0.009) but not USN. QSR in USN was not significantly associated with PE from any source, although the result for HC was close to significance (P=0.053). Appendix Tables 2–4 present the complete results of our linear logistic regression analysis.
Discussion
Our analysis of five databases demonstrated that PE and QSR correlate to varying degrees. The dramatic increase in patients’ use of ratings and reviews to over 75% in recent surveys 2 underscores the growing relevance of our subject.
The strong correlation of PE scores among HC, HG, and USN is expected and likely arises from the use of HCAHPS survey data. Despite statistical significance, the crowdsourced database (Google) and the three other PE sources show little correlation, a finding that aligns with published research, such as that of Ellenbogen et al., 7 and may be due to differences in methodology. HCAHPS’s experiential measurement with random sampling and risk adjustment of surveys sent only to patients discharged home 8 contrasts with purely evaluative online review platforms accessible to all to express experiences beyond the constraints of standardized questions. Unsolicited reviews are at risk of self-selection bias, in which a vocal minority, either irate or ecstatic about their experience, posts online. 9
Our results regarding the correlation between QSR and PE were mixed, consistent with prior evidence. For example, research using CMS quality measures has not consistently shown meaningful correlations with Google reviews,7,10 and patient-reported outcomes from a national spine registry correlated poorly with physician ratings on platforms such as HG, Vitals, WebMD, and Google. 11 In some studies, lower 30-day readmissions were associated with better PE,12-15 but not in others.10,16 Regarding mortality, Trzeciak et al. 15 reported lower in-hospital mortality with better PE, and a Norwegian study found that Facebook ratings significantly correlated with 30-day mortality. 17
The lack of correlation between QSR in USN and PE across all sources can be explained by the uniqueness of the USN rating, which heavily rewards reputation and complex specialty outcomes rather than consumer sentiment after routine care. A crowded hospital with an unpleasant emergency room experience might have low PE scores but still receive a USN ranking for, say, advanced neurosurgical procedures that only a few patients receive.
The practical implications of our findings are significant. Hospital leaders must look beyond improving HCAHPS scores, even though these scores directly affect reimbursement under Medicare’s Hospital Value-Based Purchasing Program. A hospital must adopt a dual-track strategy in which clinical quality initiatives will improve its Leapfrog grade, while service excellence and reputation management are required to improve Google ratings. One is not always a proxy for the other.
The financial consequences of poor quality or PE ratings are considerable. In a 2015 survey of 2,360 physicians and other healthcare providers, nearly 55% used online ratings to derive measures to improve patient care, 18 and this number has likely increased over the past decade. Yet, performance-based reimbursement faces skepticism. In a 2017 survey of 4,197 practicing physicians in Rhode Island, only 2% felt that Medicare sites were “very accurate” in depicting physician quality, just over one-third felt that performance-based quality measures were “helpful”, and a similar percentage reported that patient reviews were “helpful” for patients choosing a physician. 19 One reason for this skepticism may be PE’s emphasis on non-clinical factors, such as bedside manner and communication skills, as highlighted in a recent study of one-star reviews of pediatric orthopedic surgeons. 20 Our findings suggest a correlation between PE and QSR, but are not sufficient to reduce that skepticism and can only reinforce the need for a hospital to pay attention to multiple PE and QSR sources.
Our study has several strengths, including the number of hospitals we included and the diversity of PE sources we queried, which mirrors the multi-platform search behavior of modern healthcare consumers. We do not believe excluding specialty, pediatric, and critical-access hospitals is a limitation of our study, given their stark differences from acute adult hospitals.
Limitations
First, we explore limitations due to temporal alignment. PE and quality ratings are dynamic and could change our conclusions if the study were repeated. Along similar lines, the published data across different QSR and PE sources are not in real time, with variable lags between sources. Our data collection spanned eight months, and we could not verify the extent of change within the collection timeframe. Second, we chose not to exclude hospitals with a handful of potentially unstable Google ratings. Third, we used Google as the sole source of crowdsourced PE. Patients may value different platforms unequally, and assuming equal influence could misrepresent real-world behavior. Future research should first explore the patient-preferred and most widely used crowdsourced sites. Fourth, the correlations we found do not establish causation. Fifth, in our logistic regression, we were unable to adjust for the case mix index or Medicaid data.
Conclusion
In our study of 2,384 acute care hospitals in the USA, the correlation between QSR of Leapfrog, HC, USN, and PE in Google, HC, HG, and USN was frequent but not uniform.
Footnotes
Ethical Considerations
The Cooper University Healthcare Institutional Review Board reviewed this study and deemed it exempt from institutional review.
Authors' Contributions
Conceptualization (SR), Data gathering (SR, TN, AAS, SR, MS), Statistical Analysis (SR), Writing-up (SB, SR, MB).
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The raw data are stored in a password-secured manner, and we can share them freely with any researcher who will use them for noncommercial purposes.
Appendix
Appendix Table 1. Patient Experience Stratified by Quality and Safety Ratings
We present the results as median [IQR]. PE’s highest possible score is 5 in all sources except for HG, which is a percentage (percent that rated 9 or 10). The HC score, USN, Google, and HG columns are the patient experience sources.

| Quality and safety result | HC score | USN | Google | HG |
|---|---|---|---|---|
| Leapfrog | | | | |
| A | 3 [3-4] | 3 [3-4] | 3.2 [2.8-3.9] | 72 [68-76] |
| B | 3 [3-4] | 3 [3-4] | 3 [2.6-3.6] | 69 [64-74] |
| C | 3 [2-3] | 3 [2-3] | 2.9 [2.6-3.4] | 67 [61-71] |
| D | 3 [2-3] | 3 [2-3] | 2.8 [2.4-3.1] | 62 [57-69] |
| F | 2.5 [1-3.2] | 1.5 [1-3.2] | 2.5 [2.3-3.2] | 57.5 [43.5-35.2] |
| HC | | | | |
| 5 | 4 [3-4] | 4 [3-4] | 3.2 [2.9-3.7] | 76 [71-79] |
| 4 | 3 [3-4] | 3 [3-4] | 3.1 [2.7-3.6] | 71 [67-75] |
| 3 | 3 [3-3] | 3 [3-3] | 3 [2.6-3.6] | 68 [64-73] |
| 2 | 3 [2-3] | 3 [2-3] | 2.8 [2.5-3.4] | 64 [60-69] |
| 1 | 2 [2-3] | 2 [2-3] | 2.7 [2.4-3.4] | 59 [53-64] |
| USN | | | | |
| Ranked | 3 [3-4] | 3 [3-4] | 3.1 [2.8-3.5] | 72 [68-76] |
| Not Ranked | 3 [3-4] | 3 [2-4] | 3.0 [2.6-3.6] | 68 [62-73] |
Appendix Table 2. Logistic Regression Results, Leapfrog (Leapfrog numerical score)
Coefficients (a)

| Model | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | 1.261 | .301 | | 4.196 | <.001 |
| Staffed | -.005 | .002 | -1.270 | -2.235 | .026 |
| Total discharges | .000 | .000 | 2.110 | 2.644 | .008 |
| Patient days | -1.093E-5 | .000 | -.731 | -1.158 | .247 |
| Gross Patient revenue | 3.840E-8 | .000 | .092 | 2.376 | .018 |
| Google review | .095 | .047 | .067 | 2.012 | .044 |
| HospitalcomparePatientExp | .126 | .066 | .104 | 1.925 | .054 |
| HealthgradesExp | .019 | .006 | .154 | 2.889 | .004 |
| Us New Hospital Satisfaction | .133 | .062 | .112 | 2.150 | .032 |
| InterStaffGoogle | .001 | .000 | .621 | 1.962 | .050 |
| InterStaffHC | .000 | .001 | -.197 | -.434 | .664 |
| InterStaffHgrades | 7.659E-5 | .000 | 1.338 | 1.367 | .172 |
| InterStaffUSN | -.001 | .001 | -.673 | -1.606 | .108 |
| InterDiscUSN | 3.234E-6 | .000 | .117 | .210 | .834 |
| InterDiscHC | -1.280E-5 | .000 | -.475 | -.768 | .443 |
| InterDiscHgrade | -2.228E-6 | .000 | -1.800 | -1.812 | .070 |
| InterDiscGoogle | 2.237E-5 | .000 | .837 | 1.579 | .114 |
| InterDays | -5.938E-6 | .000 | -1.305 | -2.137 | .033 |
| InterDaysHC | 3.278E-6 | .000 | .720 | 1.043 | .297 |
| InterDaysUSN | 3.797E-6 | .000 | .822 | 1.337 | .181 |
| InterstaffedDischarged | -2.631E-9 | .000 | -.035 | -.120 | .905 |
| Interstaffeddays | 3.534E-9 | .000 | .287 | 1.340 | .180 |
| InterDisDays | -1.000E-10 | .000 | -.356 | -1.539 | .124 |

(a) Dependent variable: Leapfrognumerical.
Appendix Table 3. Logistic Regression Results, Hospital Compare
Coefficients (a)

| Model 1 | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -1.030 | .328 | | -3.138 | .002 |
| Staffed | .000 | .003 | .046 | .089 | .929 |
| Total discharges | 5.142E-5 | .000 | .481 | .665 | .506 |
| Patient days | -1.813E-5 | .000 | -1.008 | -1.758 | .079 |
| Gross Patient revenue | 4.331E-8 | .000 | .086 | 2.454 | .014 |
| Google review | .135 | .052 | .079 | 2.622 | .009 |
| HospitalcomparePatientExp | .227 | .072 | .155 | 3.167 | .002 |
| HealthgradesExp | .045 | .007 | .309 | 6.413 | <.001 |
| Us New Hospital Satisfaction | .061 | .068 | .043 | .898 | .369 |
| InterStaffGoogle | .000 | .000 | .092 | .321 | .748 |
| InterStaffHC | .000 | .001 | -.218 | -.530 | .596 |
| InterStaffHgrades | 7.184E-6 | .000 | .104 | .117 | .907 |
| InterStaffUSN | .000 | .001 | -.187 | -.492 | .623 |
| InterDiscUSN | 6.144E-6 | .000 | .185 | .365 | .715 |
| InterDiscHC | -4.781E-6 | .000 | -.148 | -.263 | .793 |
| InterDiscHgrade | -2.170E-7 | .000 | -.146 | -.162 | .872 |
| InterDiscGoogle | -1.100E-5 | .000 | -.342 | -.711 | .477 |
| InterDays | 2.844E-7 | .000 | .052 | .094 | .925 |
| InterDaysHC | 2.596E-6 | .000 | .474 | .757 | .449 |
| InterDaysUSN | 2.139E-6 | .000 | .385 | .690 | .490 |
| InterstaffedDischarged | 2.717E-8 | .000 | .299 | 1.134 | .257 |
| Interstaffeddays | -2.810E-9 | .000 | -.189 | -.976 | .329 |
| InterDisDays | 5.189E-13 | .000 | .002 | .007 | .994 |

(a) Dependent variable: Hospital Compare.
Appendix Table 4. Logistic Regression Results, USN Ranking
Coefficients (a)

| Model 1 | B (unstandardized) | Std. error (unstandardized) | Beta (standardized) | t | Sig. |
|---|---|---|---|---|---|
| (Constant) | -.312 | .100 | | -3.128 | .002 |
| Staffed | -.001 | .001 | -.608 | -1.357 | .175 |
| Total discharges | -2.394E-5 | .000 | -.641 | -1.018 | .309 |
| Patient days | 8.247E-6 | .000 | 1.310 | 2.632 | .009 |
| Gross Patient revenue | 1.188E-9 | .000 | .007 | .221 | .825 |
| Google review | -.004 | .016 | -.006 | -.232 | .817 |
| HospitalcomparePatientExp | .042 | .022 | .083 | 1.935 | .053 |
| HealthgradesExp | .003 | .002 | .050 | 1.184 | .236 |
| Us New Hospital Satisfaction | -.021 | .021 | -.042 | -1.031 | .303 |
| InterStaffGoogle | .000 | .000 | .735 | 2.943 | .003 |
| InterStaffHC | .000 | .000 | -.344 | -.960 | .337 |
| InterStaffHgrades | -1.830E-6 | .000 | -.076 | -.098 | .922 |
| InterStaffUSN | .000 | .000 | .198 | .599 | .549 |
| InterDiscUSN | 3.753E-6 | .000 | .324 | .734 | .463 |
| InterDiscHC | -8.880E-6 | .000 | -.784 | -1.606 | .108 |
| InterDiscHgrade | 6.492E-7 | .000 | 1.246 | 1.591 | .112 |
| InterDiscGoogle | -1.019E-7 | .000 | -.009 | -.022 | .983 |
| InterDays | -2.182E-6 | .000 | -1.140 | -2.367 | .018 |
| InterDaysHC | 2.157E-6 | .000 | 1.126 | 2.069 | .039 |
| InterDaysUSN | -7.705E-7 | .000 | -.396 | -.818 | .414 |
| InterstaffedDischarged | 2.517E-8 | .000 | .791 | 3.457 | <.001 |
| Interstaffeddays | -2.787E-9 | .000 | -.537 | -3.184 | .001 |
| InterDisDays | -7.142E-11 | .000 | -.603 | -3.311 | <.001 |

(a) Dependent variable: USNRankingdichotomized.
