Does Patient Experience Correlate With Overall Hospital Quality Ratings?

Abstract

Publicly reported patient experience (PE) scores and overall hospital quality and safety ratings (QSR) increasingly influence consumers' healthcare choices and reimbursement, but the correlation between them remains unclear. We analyzed public data on 2,384 adult acute care hospitals in the US from Google® (Google), the Leapfrog Group® (Leapfrog), Hospital Compare® (HC), U.S. News & World Report® (USN), and Healthgrades® (HG). We abstracted QSR from three sources (Leapfrog, HC, and USN) and PE from four sources (HC, USN, HG, and Google). PE scores differed significantly across QSR categories in Leapfrog and HC (Kruskal-Wallis test, all P < 0.001) and between USN-ranked and non-ranked institutions (Mann-Whitney U test, P < 0.001), with better PE in hospitals with higher QSR. Multivariable linear regression showed that some, but not all, QSR were independently associated with PE. In conclusion, our study adds to the evidence that PE and quality often, though not always, correlate.

Keywords

CAHPS/HCAHPS; patient experience; patient safety; patient-reported experience measures; quality improvement

Introduction

Hospitals in the United States increasingly use online marketing to promote their services. Personal recommendations, insurance network constraints, and publicly available data influence patients' choice of hospitals and providers, and patients' engagement with public data sources is rising rapidly.¹,²

Two categories of online data accessible to consumers are patient experience (PE) information and quality and safety ratings (QSR) generated by public and private organizations. PE data come from two kinds of sources: unscripted, purely evaluative online reviews and standardized tools such as the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) Survey.³ The HCAHPS Survey aims to reduce subjectivity by asking patients objective questions about specific care processes.

Previous research generally supports a positive association between PE and quality-of-care outcomes.⁴ A systematic review of 55 studies found that higher PE scores were positively associated with better quality-of-care and patient-safety ratings in most of them.⁵ However, these studies had limitations, including small sample sizes, a focus on specific conditions, and an emphasis on primary care settings. We identified a gap in the research: the need to include a large number of hospitals and to compare multiple QSR and PE sources simultaneously. We therefore included all eligible U.S. acute care hospitals to examine the correlation among various PE sources, including unscripted online reviews, and the association between PE and QSR across several rating systems.

Methods

Search Strategy

We obtained the list of acute care hospitals from the American Hospital Directory, which compiles information from publicly available sources.⁶ We excluded specialty, pediatric, and critical access hospitals because they were outside the scope of our intended study population. We also excluded facilities with no data entries in any of the five PE and QSR datasets.

Between February 1, 2023, and October 3, 2023, four authors (TN, SI, AE, and JR) independently queried five online databases: Google® (Google), the Leapfrog Group® (Leapfrog), Hospital Compare® (HC), U.S. News & World Report® (USN), and Healthgrades® (HG). To approximate temporal alignment, all QSR and PE data for a given hospital were recorded on a single day. Variable reporting lags between sources remain unavoidable, however, so for a given hospital two QSR sources might reflect different time periods.

Databases Searched

We gathered data from five sources: Google, HC, HG, Leapfrog, and USN.

Google is a large technology company focused primarily on online search. Users can leave unscripted reviews and rate their experience on a scale of 1 to 5, with 5 being the highest rating. Because reviewers select themselves rather than being sampled, the aggregated Google ratings are neither systematically collected nor risk-adjusted. HC is the Centers for Medicare & Medicaid Services (CMS) tool for presenting data from hospitals participating in the Medicare program; it assigns scores on a 1- to 5-star scale. HG is a private U.S. company that evaluates hospitals based on risk-adjusted mortality and in-hospital complications, converting data from publicly available sources into 1- to 5-star scores. Leapfrog is a not-for-profit organization dedicated to advancing patient safety in healthcare. Each participating hospital voluntarily completes a survey twice a year; after validating the data, Leapfrog assigns an overall safety grade (A, B, C, D, or F, with F the lowest) based on various safety and quality domains. USN is a private digital media company that publishes regional, national, and specialty-, procedure-, and condition-specific hospital rankings.

Variables Recorded

For each hospital, we recorded the location, number of beds, total average yearly discharges, gross revenue, and patient days. For QSR, we noted Leapfrog's overall safety grade, HC's total number of quality and safety stars, and USN's regional or national ranking status. For PE, we recorded the average ratings on Google, HC, and USN, and the percentage of patients who rated the hospital 9 or 10 on HG.

Statistical Analysis

We present continuous data as median [interquartile range, IQR] and categorical data as number (percentage). We calculated Spearman correlation coefficients to assess the strength of the correlation between the different PE scores. We compared PE scores across QSR categories using the Mann-Whitney U test (two categories) and the Kruskal-Wallis test (more than two categories).
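The Spearman coefficient is the Pearson correlation computed on ranks, with tied values receiving the average of the ranks they span. The analyses in this study were run in SPSS; the following is only a minimal pure-Python sketch of that calculation, and the hospital scores in it are hypothetical values, not study data.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Google ratings and HC stars for five hospitals (illustrative only).
google = [4.1, 3.2, 2.8, 3.9, 4.5]
hc_stars = [4, 3, 2, 3, 5]
print(round(spearman(google, hc_stars), 3))  # 0.975
```

Because HC stars take only a few integer values, ties are common in real data, which is why the sketch averages tied ranks rather than breaking ties arbitrarily.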

We performed multivariable linear regression to assess the independent association between PE scores and QSR, fitting a separate model for each QSR source (Leapfrog, HC, and USN ranking). We initially entered the following variables in one step: number of beds, total average yearly discharges, patient days, gross revenue, and PE scores from HC, HG, USN, and Google. Because we found evidence of collinearity among the non-PE variables (number of beds, total discharges, and patient days), we then added interaction terms involving these three variables, both with one another and with the PE scores, and re-ran the model, again entering all variables in a single step. We considered P < 0.05 statistically significant. All analyses were conducted using SPSS software version 28.0, IBM (Chicago, IL, USA).
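To make the model specification concrete, here is a minimal pure-Python sketch, under stated assumptions, of appending product (interaction) columns to a design matrix and fitting ordinary least squares through the normal equations. The column names `beds` and `pe_hc` and the data are hypothetical; the study's models were fitted in SPSS, and the actual interaction terms are those listed in Appendix Tables 2-4. As a general note, product terms are usually correlated with their component variables, so centering variables before multiplying them is a common way to limit the collinearity they add.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def add_interactions(
    rows: Sequence[Sequence[float]],
    names: Sequence[str],
    pairs: Sequence[Tuple[str, str]],
) -> Tuple[Matrix, List[str]]:
    """Append one product column a*b per (a, b) pair to every row."""
    idx = {name: i for i, name in enumerate(names)}
    new_names = list(names) + [f"{a}*{b}" for a, b in pairs]
    new_rows = [
        [float(v) for v in row] + [float(row[idx[a]]) * float(row[idx[b]]) for a, b in pairs]
        for row in rows
    ]
    return new_rows, new_names


def solve_linear_system(a: Matrix, b: Sequence[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: design matrix is not full rank")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows: Matrix, y: Sequence[float]) -> List[float]:
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    X = [[1.0] + list(row) for row in rows]  # prepend the constant column
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data: y = 1 + 0.5*beds + 2*pe_hc + 0.1*beds*pe_hc.
names = ["beds", "pe_hc"]
rows = [[1, 1], [2, 3], [3, 2], [4, 5], [5, 4], [6, 2]]
y = [1 + 0.5 * b + 2 * p + 0.1 * b * p for b, p in rows]
X, cols = add_interactions(rows, names, [("beds", "pe_hc")])
coef = ols(X, y)
for name, value in zip(["(Constant)"] + cols, coef):
    print(f"{name}: {value:.3f}")
```

With noise-free data the fit should recover coefficients close to 1, 0.5, 2, and 0.1. Statistical packages generally use more numerically stable decompositions than the normal equations; they are used here only to keep the sketch short.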

Results

Hospital Characteristics

The American Hospital Directory database included 3,871 hospitals, of which 2,384 met our inclusion criteria. The average number of beds per hospital was 197 (±237), the average number of yearly discharges was 7,754 (±10,835), and the average number of patient days was 37,936 (±64,212). The number of Google reviews per hospital varied widely, from roughly 50 to more than 1,000.

Correlation Between the Different PE Scores

Google ratings showed weak but statistically significant correlations with HC (0.17 [0.13-0.21]), USN (0.14 [0.10-0.18]), and HG (0.28 [0.25-0.32]) (all P < 0.001). Correlations among the other three sources were considerably higher: HC-USN 0.75 [0.73-0.77], HC-HG 0.76 [0.74-0.77], and USN-HG 0.75 [0.73-0.76] (all P < 0.001), in the upper part of the moderate range of our interpretation scale. Table 1 summarizes these findings in a correlation matrix.

Table 1.

Correlation Matrix Between the Patient Experience Ratings

	Google	HC	USN	HG
Google	1
HC	0.17	1
USN	0.14	0.75	1
HG	0.28	0.76	0.75	1

Interpretation of correlation coefficients: 0-0.40, weak; 0.41-0.79, moderate; 0.80-1, strong.

Spearman correlation coefficients between the PE sources studied. Correlations are moderate (0.75-0.76) among HC, HG, and USN and weak (0.14-0.28) between Google and each of the other three sources.
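The interpretation scale in the Table 1 legend can be applied mechanically to each coefficient. The minimal sketch below assumes that magnitudes in the small gap between 0.40 and 0.41, which the legend leaves unassigned, count as moderate; the function name is illustrative.

```python
def interpret_correlation(r: float) -> str:
    """Label a correlation coefficient using the Table 1 legend cut-offs.

    Assumption: |r| values between 0.40 and 0.41 are treated as moderate.
    """
    magnitude = abs(r)
    if magnitude > 1:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if magnitude <= 0.40:
        return "weak"
    if magnitude < 0.80:
        return "moderate"
    return "strong"


# Spearman coefficients reported in Table 1.
TABLE_1 = {
    ("Google", "HC"): 0.17,
    ("Google", "USN"): 0.14,
    ("Google", "HG"): 0.28,
    ("HC", "USN"): 0.75,
    ("HC", "HG"): 0.76,
    ("USN", "HG"): 0.75,
}

for (a, b), r in TABLE_1.items():
    print(f"{a}-{b}: {r:.2f} ({interpret_correlation(r)})")
```

Applied to Table 1, every Google pairing falls in the weak band and every pairing among HC, USN, and HG falls in the moderate band.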

PE Scores and QSR

PE scores from HC, USN, Google, and HG differed significantly across Leapfrog and HC rating categories (Kruskal-Wallis test, all P < 0.001) and between USN-ranked and non-ranked institutions (Mann-Whitney U test, P < 0.001); that is, better PE was associated with higher QSR grades (Figure 1). Appendix Table 1 presents the PE scores for each category as median [IQR].
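The two rank-based tests used for these comparisons can be sketched in pure Python as follows. This is a minimal illustration, assuming the standard tie-corrected form of the Kruskal-Wallis H statistic; it computes only the test statistics, whose P-values would come from the chi-square distribution with k - 1 degrees of freedom (for H, with k groups) or from the exact or normal-approximation distribution of U, as provided by statistical software such as SPSS. The grouped scores are hypothetical.

```python
from typing import Dict, List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    sum_sq, start = 0.0, 0
    for g in groups:  # squared rank sum of each group divided by its size
        rank_sum = sum(ranks[start:start + len(g)])
        sum_sq += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * sum_sq - 3 * (n + 1)
    counts: Dict[float, int] = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are tied")
    return h / correction


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U for sample x: its rank sum minus the smallest possible rank sum."""
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[: len(x)]) - len(x) * (len(x) + 1) / 2


# Hypothetical HG scores (% rating 9 or 10) grouped by Leapfrog grade.
by_grade = {"A": [74, 71, 78], "B": [69, 66, 72], "C": [63, 67, 60]}
print(round(kruskal_wallis_h(list(by_grade.values())), 2))
# Hypothetical HC PE scores for USN-ranked vs. non-ranked hospitals.
print(mann_whitney_u([4, 4, 3, 5], [3, 2, 3, 4, 2]))
```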

Figure 1.

Patient experience scores in HC, HG, USN, and Google stratified by Leapfrog or HC grades.

In the regression models, Leapfrog QSR was independently associated with PE on Google (P = 0.044), HG (P = 0.004), and USN (P = 0.032), and its association with HC PE approached significance (P = 0.054). HC QSR was independently associated with PE on HC (P = 0.002), HG (P < 0.001), and Google (P = 0.009), but not USN (P = 0.369). USN ranking was not significantly associated with PE from any source; the closest result was for HC (P = 0.053). Appendix Tables 2–4 present the complete results of the linear regression analyses.

Discussion

Our analysis of five databases showed that PE and QSR correlate to varying degrees. The dramatic increase in patients' use of online ratings and reviews, to over 75% in recent surveys,² underscores the growing relevance of our subject.

The high correlation of PE scores among HC, HG, and USN is expected and likely arises from their shared use of HCAHPS survey data. Although statistically significant, the correlations between the crowdsourced database (Google) and the three other PE sources were weak, a finding that aligns with published research such as that of Ellenbogen et al.⁷ and may reflect methodological differences. HCAHPS measures experience through randomly sampled, risk-adjusted surveys sent only to patients discharged home;⁸ online review platforms, by contrast, are open to anyone and allow purely evaluative comments that go beyond the constraints of standardized questions. Unsolicited reviews are also prone to self-selection bias, in which a vocal minority, either irate or ecstatic about their experience, posts online.⁹

Our results regarding the correlation between QSR and PE were mixed, consistent with prior evidence. For example, research using CMS quality measures has not consistently shown meaningful correlations with Google reviews,⁷,¹⁰ and patient-reported outcomes from a national spine registry correlated poorly with physician ratings on platforms such as HG, Vitals, WebMD, and Google.¹¹ In some studies, lower 30-day readmission rates were associated with better PE,¹²⁻¹⁵ but not in others.¹⁰,¹⁶ Regarding mortality, Trzeciak et al.¹⁵ reported lower in-hospital mortality with better PE, and a Norwegian study found that Facebook ratings were significantly correlated with 30-day mortality.¹⁷

The lack of an independent association between USN ranking and PE from any source may be explained by the distinctive nature of the USN rating, which heavily rewards reputation and complex specialty outcomes rather than consumer sentiment after routine care. For example, a crowded hospital with an unpleasant emergency department experience might have low PE scores yet still earn a USN ranking for advanced neurosurgical procedures that few patients receive.

Our findings have practical implications. Hospital leaders may need to look beyond improving HCAHPS scores, even though these scores directly affect reimbursement under Medicare's Hospital Value-Based Purchasing Program. A dual-track strategy may be required, in which clinical quality initiatives target the Leapfrog grade while service excellence and reputation management target Google ratings. One is not always a proxy for the other.

The financial consequences of poor quality or PE ratings are considerable. In a 2015 survey of 2,360 physicians and other healthcare providers, nearly 55% reported using online ratings to derive measures to improve patient care,¹⁸ and this proportion has likely increased over the past decade. Yet performance-based reimbursement faces skepticism. In a 2017 survey of 4,197 practicing physicians in Rhode Island, only 2% felt that Medicare websites were “very accurate” in depicting physician quality, just over one-third felt that performance-based quality measures were “helpful,” and a similar proportion reported that patient reviews were “helpful” for patients choosing a physician.¹⁹ One reason for this skepticism may be PE's emphasis on non-clinical factors, such as bedside manner and communication skills, as highlighted in a recent study of one-star reviews of pediatric orthopedic surgeons.²⁰ Our findings suggest a correlation between PE and QSR, but they are not sufficient to dispel that skepticism; rather, they reinforce the need for hospitals to monitor multiple PE and QSR sources.

Our study has several strengths, including the number of hospitals included and the diversity of PE sources queried, which mirrors the multi-platform search behavior of modern healthcare consumers. We do not consider the exclusion of specialty, pediatric, and critical access hospitals a limitation, given their stark differences from adult acute care hospitals.

Limitations

First, our data are subject to limitations in temporal alignment. PE and quality ratings are dynamic, and repeating the study could yield different conclusions. Similarly, the data published by the various QSR and PE sources are not real-time and have variable lags between sources. Our data collection spanned eight months, and we could not verify how much the ratings changed during that period. Second, we chose not to exclude hospitals with only a handful of Google reviews, whose ratings may be unstable. Third, we used Google as the sole source of crowdsourced PE. Patients may value different platforms unequally, and assuming equal influence could misrepresent real-world behavior; future research should first identify the crowdsourced sites that patients prefer and use most widely. Fourth, the correlations we found do not establish causation, and our regression models could not adjust for case mix index or Medicaid data.

Conclusion

In our study of 2,384 acute care hospitals in the USA, QSR from Leapfrog, HC, and USN correlated frequently, but not uniformly, with PE from Google, HC, HG, and USN.

Footnotes

ORCID iDs

Samer Badr

Jean-Sebastien Rachoin

Ethical Considerations

The Cooper University Healthcare Institutional Review Board reviewed this study and deemed it exempt from institutional review.

Author Contributions

Conceptualization (SR), Data gathering (SR, TN, AAS, SR, MS), Statistical Analysis (SR), Writing-up (SB, SR, MB).

Funding

The authors received no financial support for the research, authorship, and/or publication of this article.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data Availability Statement

The raw data are stored in a password-secured manner and can be shared freely with any researcher for noncommercial purposes.

Appendix

Appendix Table 1.

Patient Experience Stratified by Quality and Safety Ratings

Quality and safety rating	HC	USN	Google	HG (%)
Leapfrog
A	3 [3-4]	3 [3-4]	3.2 [2.8-3.9]	72 [68-76]
B	3 [3-4]	3 [3-4]	3.0 [2.6-3.6]	69 [64-74]
C	3 [2-3]	3 [2-3]	2.9 [2.6-3.4]	67 [61-71]
D	3 [2-3]	3 [2-3]	2.8 [2.4-3.1]	62 [57-69]
F	2.5 [1-3.2]	1.5 [1-3.2]	2.5 [2.3-3.2]	57.5 [43.5-35.2]
HC
5	4 [3-4]	4 [3-4]	3.2 [2.9-3.7]	76 [71-79]
4	3 [3-4]	3 [3-4]	3.1 [2.7-3.6]	71 [67-75]
3	3 [3-3]	3 [3-3]	3.0 [2.6-3.6]	68 [64-73]
2	3 [2-3]	3 [2-3]	2.8 [2.5-3.4]	64 [60-69]
1	2 [2-3]	2 [2-3]	2.7 [2.4-3.4]	59 [53-64]
USN
Ranked	3 [3-4]	3 [3-4]	3.1 [2.8-3.5]	72 [68-76]
Not ranked	3 [3-4]	3 [2-4]	3.0 [2.6-3.6]	68 [62-73]

Results are presented as median [IQR]. The highest possible PE score is 5 in all sources except HG, which reports the percentage of patients who rated the hospital 9 or 10.

Appendix Table 2.

Linear Regression Results, Leapfrog

Coefficients^a
Variable	Unstandardized B	Std. error	Standardized Beta	t	Sig.
(Constant)	1.261	.301		4.196	<.001
Staffed	-.005	.002	-1.270	-2.235	.026
Total discharges	.000	.000	2.110	2.644	.008
Patient days	-1.093E-5	.000	-.731	-1.158	.247
Gross Patient revenue	3.840E-8	.000	.092	2.376	.018
Google review	.095	.047	.067	2.012	.044
HospitalcomparePatientExp	.126	.066	.104	1.925	.054
HealthgradesExp	.019	.006	.154	2.889	.004
Us New Hospital Satisfaction	.133	.062	.112	2.150	.032
InterStaffGoogle	.001	.000	.621	1.962	.050
InterStaffHC	.000	.001	-.197	-.434	.664
InterStaffHgrades	7.659E-5	.000	1.338	1.367	.172
InterStaffUSN	-.001	.001	-.673	-1.606	.108
InterDiscUSN	3.234E-6	.000	.117	.210	.834
InterDiscHC	-1.280E-5	.000	-.475	-.768	.443
InterDiscHgrade	-2.228E-6	.000	-1.800	-1.812	.070
InterDiscGoogle	2.237E-5	.000	.837	1.579	.114
InterDays	-5.938E-6	.000	-1.305	-2.137	.033
InterDaysHC	3.278E-6	.000	.720	1.043	.297
InterDaysUSN	3.797E-6	.000	.822	1.337	.181
InterstaffedDischarged	-2.631E-9	.000	-.035	-.120	.905
Interstaffeddays	3.534E-9	.000	.287	1.340	.180
InterDisDays	-1.000E-10	.000	-.356	-1.539	.124

^aDependent Variable: Leapfrognumerical.

Appendix Table 3.

Linear Regression Results, Hospital Compare

Coefficients^a
Variable	Unstandardized B	Std. error	Standardized Beta	t	Sig.
(Constant)	-1.030	.328		-3.138	.002
Staffed	.000	.003	.046	.089	.929
Total discharges	5.142E-5	.000	.481	.665	.506
Patient days	-1.813E-5	.000	-1.008	-1.758	.079
Gross Patient revenue	4.331E-8	.000	.086	2.454	.014
Google review	.135	.052	.079	2.622	.009
HospitalcomparePatientExp	.227	.072	.155	3.167	.002
HealthgradesExp	.045	.007	.309	6.413	<.001
Us New Hospital Satisfaction	.061	.068	.043	.898	.369
InterStaffGoogle	.000	.000	.092	.321	.748
InterStaffHC	.000	.001	-.218	-.530	.596
InterStaffHgrades	7.184E-6	.000	.104	.117	.907
InterStaffUSN	.000	.001	-.187	-.492	.623
InterDiscUSN	6.144E-6	.000	.185	.365	.715
InterDiscHC	-4.781E-6	.000	-.148	-.263	.793
InterDiscHgrade	-2.170E-7	.000	-.146	-.162	.872
InterDiscGoogle	-1.100E-5	.000	-.342	-.711	.477
InterDays	2.844E-7	.000	.052	.094	.925
InterDaysHC	2.596E-6	.000	.474	.757	.449
InterDaysUSN	2.139E-6	.000	.385	.690	.490
InterstaffedDischarged	2.717E-8	.000	.299	1.134	.257
Interstaffeddays	-2.810E-9	.000	-.189	-.976	.329
InterDisDays	5.189E-13	.000	.002	.007	.994

^aDependent Variable: Hospital Compare.

Appendix Table 4.

Linear Regression Results, USN Ranking

Coefficients^a
Variable	Unstandardized B	Std. error	Standardized Beta	t	Sig.
(Constant)	-.312	.100		-3.128	.002
Staffed	-.001	.001	-.608	-1.357	.175
Total discharges	-2.394E-5	.000	-.641	-1.018	.309
Patient days	8.247E-6	.000	1.310	2.632	.009
Gross Patient revenue	1.188E-9	.000	.007	.221	.825
Google review	-.004	.016	-.006	-.232	.817
HospitalcomparePatientExp	.042	.022	.083	1.935	.053
HealthgradesExp	.003	.002	.050	1.184	.236
Us New Hospital Satisfaction	-.021	.021	-.042	-1.031	.303
InterStaffGoogle	.000	.000	.735	2.943	.003
InterStaffHC	.000	.000	-.344	-.960	.337
InterStaffHgrades	-1.830E-6	.000	-.076	-.098	.922
InterStaffUSN	.000	.000	.198	.599	.549
InterDiscUSN	3.753E-6	.000	.324	.734	.463
InterDiscHC	-8.880E-6	.000	-.784	-1.606	.108
InterDiscHgrade	6.492E-7	.000	1.246	1.591	.112
InterDiscGoogle	-1.019E-7	.000	-.009	-.022	.983
InterDays	-2.182E-6	.000	-1.140	-2.367	.018
InterDaysHC	2.157E-6	.000	1.126	2.069	.039
InterDaysUSN	-7.705E-7	.000	-.396	-.818	.414
InterstaffedDischarged	2.517E-8	.000	.791	3.457	<.001
Interstaffeddays	-2.787E-9	.000	-.537	-3.184	.001
InterDisDays	-7.142E-11	.000	-.603	-3.311	<.001

^aDependent Variable: USNRankingdichotomized.

References

1. Holliday, Kachalia, Meyer, Sequist. Physician and Patient Views on Public Physician Rating Websites: A Cross-Sectional Study. J Gen Intern Med. 2017;32(6):626-631.

2. Online rating platforms direct patients to higher-quality physicians: Study. Becker's Hospital Review. https://www.beckershospitalreview.com/hospital-physician-relationships/online-rating-platforms-direct-patients-to-higher-quality-physicians-study/

3. Farley, Enguidanos, Coletti, et al. Patient satisfaction surveys and quality of care: an information paper. Ann Emerg Med. 2014;64(4):351-357.

4. Anhang, Elliott, Zaslavsky, et al. Examining the role of patient experience surveys in measuring health care quality. Med Care Res Rev. 2014;71(5):522-554.

5. Doyle, Lennox, Bell. A systematic review of evidence on the links between patient experience and clinical safety and effectiveness. BMJ Open. 2013;3(1):1-18.

6. American Hospital Directory. https://www.ahd.com/

7. Ellenbogen, Rim, Brotman. Characterizing the Relationship Between Hospital Google Star Ratings, Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) Scores, and Quality. J Patient Exp. 2022;9:23743735221092604.

8. CAHPS Hospital Survey. https://www.hcahpsonline.org

9. Devgan, Klein, Fox, Ozturk. Bifurcation of Patient Reviews: An Analysis of Trends in Online Ratings. Plast Reconstr Surg Glob Open. 2020;8(4):e2781.

10. Ramasubramanian, Joshi, Krishnan. Wisdom of the Experts Versus Opinions of the Crowd in Hospital Quality Ratings: Analysis of Hospital Compare Star Ratings and Google Star Ratings. J Med Internet Res. 2022;24(7):e34030.

11. Wanner, Pennings, Nian, et al. Rating Spine Surgeons: Physician Review Websites Versus a Patient-reported Outcomes-derived Ranking. Clin Spine Surg. 2022;35(8):E643-E648.

12. Chakraborty, Church. Social media hospital ratings and HCAHPS survey scores. J Health Organ Manag. 2020;34(2):162-172. doi:10.1108/jhom-08-2019-0234

13. Hawkins, Brownstein, Tuli, Runels, Broecker, et al. Measuring patient-perceived quality of care in US hospitals using Twitter. BMJ Qual Saf. 2015;25(6):404-413. doi:10.1136/bmjqs-2015-004309

14. Glover, Khalilzadeh, Choy, Prabhakar, Pandharipande, et al. Hospital Evaluations by Social Media: A Comparative Analysis of Facebook Ratings among Performance Outliers. J Gen Intern Med. 2015;30(10):1440-1446. doi:10.1007/s11606-015-3236-3

15. Trzeciak, Gaughan, Bosire, Mazzarelli. Association Between Medicare Summary Star Ratings for Patient Experience and Clinical Outcomes in US Hospitals. J Patient Exp. 2016;3(1):6-9.

16. Campbell. Are Facebook user ratings associated with hospital cost, quality and patient satisfaction? A cross-sectional analysis of hospitals in New York State. BMJ Qual Saf. 2018;27(2):119-129.

17. Bjertnaes, Iversen, Skyrud, Danielsen. The value of Facebook in nation-wide hospital quality assessment: a national mixed-methods study in Norway. BMJ Qual Saf. 2020;29(3):217-224.

18. Emmert, Meszmer, Sander. Do Health Care Providers Use Online Patient Ratings to Improve the Quality of Care? Results From an Online-Based Cross-Sectional Study. J Med Internet Res. 2016;18(9):e254.

19. Lagu, Haskell, Cooper, Harris, Murray, Gardner. Physician Beliefs About Online Reporting of Quality and Experience Data. J Gen Intern Med. 2019;34(11):2542-2548.

20. Hitchman, Baumann, Glasgow, et al. An Analysis of Negative One-star Patient Reviews and Complaints for Pediatric Orthopaedic Surgeons throughout the United States: A Retrospective Study. J Pediatr Orthop. 2024;44(2):129-134.