Abstract
Background:
The recent literature highlights how physician and patient gender influence medical care assessment. Gender impacts Press Ganey survey results, a key measure of patient satisfaction.
Methods:
We analyzed 82,195 outpatient visits at Northwell Health (January 01, 2018–January 04, 2019) to assess the role of gender and gender concordance in patient ratings. Inpatient, pediatric, and geriatric encounters were excluded. Five Press Ganey Care Provider questions were reviewed, covering provider concern, shared decision-making (SDM), medication information, confidence, and likelihood of recommendation. Top box scores and abbreviated summary scores were calculated and compared by specialty, physician gender, and patient gender.
Results:
Male physicians received significantly higher scores in internal medicine subspecialties, primary care, and OB/GYN, whereas female physicians scored higher in surgical specialties. Female patients rated female physicians lower than male physicians in internal medicine subspecialties and OB/GYN but higher in surgical fields. Male patients showed no significant rating differences based on physician gender. Male physicians scored higher on concern, SDM, confidence, and likelihood of recommendation in nonsurgical fields, whereas female physicians scored higher in surgical specialties.
Conclusions:
Our findings suggest that physician gender significantly impacts patient satisfaction scores, with variations by specialty. Expanding this dataset could allow a more nuanced, intersectional analysis in future research.
Introduction
The Press Ganey Patient Satisfaction Survey (PG) is a commonly used tool and the industry’s largest database of physician, caregiver, and patient feedback. The Patient Protection and Affordable Care Act of 2010 and the Centers for Medicare & Medicaid Services highlight the importance of patient satisfaction.1 This linked patient satisfaction surveys to the measurement of quality among health care providers. PG partners with nearly half of the hospitals in the United States. Hospitals, care providers, and administrators may be incentivized based on PG scores.2,3 For physicians, these scores may affect credibility, morale, patient retention rate, professional development, and compensation.3 The data are analyzed to compare the quality of care between participating hospitals, specialties within hospital systems, and even physicians within the database. These data are geared toward improving accountability for patient experience and increasing workflow, quality of care, and patient satisfaction.4 Although intended as a useful tool, PG scores in their current form are flawed. Large margins of error occur with small sample sizes, critically ill patients, and lack of randomization and accountability within high-flow patient units.5,6 Prior studies reported common response errors and the influence of nonmodifiable physician and patient characteristics on PG results.6,7
Many qualitative studies highlight gender-specific concerns in PG’s scoring system, emphasizing the need to differentiate gender from biological sex and consider societal gender norms.8–16 Drawing on the stereotype content model, Fiske et al. and Eckes demonstrate how ambivalent gender stereotypes affect social equity17,18; traditional women are seen as warm but less competent (paternalistic stereotype), whereas nontraditional women are seen as competent but cold (envious stereotype). Men, often in leadership, are viewed as competent but less warm, reinforcing dominance. These stereotypes shape patient behavior and the perception of physician attentiveness, affecting PG scores. Male physicians’ paternalistic behavior is often seen as confident, whereas similar behavior by female physicians is judged more harshly, reflecting gender bias aligned with these stereotypes.17 In contrast, Freire’s “critical pedagogy” promotes a collaborative, reflective, and patient-centered approach that counters these biases.
The distinction between sex and gender is not new, but its integration into medical education is a recent development. The disciplines of Sex and Gender Medicine explore how sex and gender influence health, emphasizing that biological sex—distinct from gender expression—affects disease progression, diagnosis, and treatment. 19 Gender dynamics influence decision-making, treatment decisions, treatment accessibility, and diversity in clinical trials. Recognizing the difference between sex and gender is essential for understanding health outcomes influenced by both biology and society.
Physician specialty and demographics impact patient satisfaction (PG) scores. A study of 44,496 surveys showed that internal medicine subspecialists scored lower than plastic surgeons, dermatologists, and family medicine physicians.18,20 In an analysis of 36,840 outpatient surgical visits, surgeons’ race, gender, and age influenced top box scores, with non-Hispanic White surgeons, older age, and male gender linked to higher ratings.8 A review of 909 surveys from outpatient gynecology visits revealed that female physicians were 47% less likely to receive a top box score,10,12 whereas samples from academic outpatient otolaryngology practices showed no effect of gender on provider-focused PG scores.13 Another systematic review found that patients seeking obstetric or gynecological care preferred female physicians and partially attributed this to a more patient-centered communication style.15 However, a contrasting study showed that female gynecologists were 17% likely to receive top box patient satisfaction scores.21
In this study, we aimed to analyze the relationship of gender with PG scores across multiple specialties in the outpatient setting of a nonprofit integrated health care network. We focused on the outpatient setting to avoid potential confounding variables in inpatient and emergency room settings.
Within the broader search for gender-biased score reporting, we outlined three specific hypotheses: (1) female physicians receive lower overall PG scores than male physicians; (2) gender concordance between physicians and patients affects PG scores, as prior research suggests modest benefits in satisfaction and communication, particularly in female concordant pairs,22,23 although these studies are limited by narrow populations; and (3) the effect of gender on PG scores varies by specialty.
Methods
We collected data on PG surveys for physicians within Northwell Health Physician Partners, the largest private health care provider in New York State. Only de-identified data were collected. Our institutional review board reviewed the study and deemed it exempt. The questions analyzed came from the care provider domain of the PG Satisfaction Survey (Table 1): Question 3 (Q3), concern for patient worries; Question 4 (Q4), inclusion in treatment decisions; Question 5 (Q5), information the provider gave about medications; Question 9 (Q9), confidence in the provider; and Question 10 (Q10), likelihood of recommending the provider.
Table 1. PG Questions Evaluated in This Study
Our Aims
To determine if physician gender is associated with the PG Care Provider (PGCP) abbreviated summary score, stratified by physician specialty.
To determine if physician gender is associated with individual PGCP questions (Q3, Q4, Q5, Q9, and Q10), stratified by physician specialty.
To determine if gender concordance is associated with the PGCP abbreviated summary score, stratified by physician specialty.
To determine if gender concordance is associated with individual PGCP questions (Q3, Q4, Q5, Q9, and Q10), stratified by physician specialty.
Study Variables
Our data query focused on five questions from the PG Satisfaction Survey (Table 1). Each question was collected on a Likert scale of 1–5 (1 = very poor, 2 = poor, 3 = fair, 4 = good, and 5 = very good) and was re-scored to a scale of 0–1 (0, 0.25, 0.5, 0.75, and 1). Scores were examined for each question individually, and a summary score was calculated as the average of the five questions (Qs). We refer to this as an “abbreviated summary score,” as it uses only 5 of the 11 PGCP Qs.
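The rescoring and averaging described above can be sketched in a few lines. This is a minimal illustration rather than the code used in the study (which was run in SAS and R); the function names and the dictionary layout of a survey response are assumptions made for the example.

```python
# Hypothetical names; only the 1-5 to 0-1 mapping and the five-question
# average come from the text. The response layout is an assumption.
QUESTIONS = ("Q3", "Q4", "Q5", "Q9", "Q10")


def rescale(response: int) -> float:
    """Map a 1-5 Likert response onto 0, 0.25, 0.5, 0.75, 1."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError(f"Likert response must be 1-5, got {response!r}")
    return (response - 1) / 4


def abbreviated_summary_score(responses: dict) -> float:
    """Average the rescaled scores of the five questions of interest."""
    return sum(rescale(responses[q]) for q in QUESTIONS) / len(QUESTIONS)


# Illustrative survey (not study data).
example = {"Q3": 5, "Q4": 5, "Q5": 4, "Q9": 5, "Q10": 3}
score = abbreviated_summary_score(example)
```

Because the rescaling is linear, the summary score ranks surveys in the same order as the mean of the raw 1–5 responses would.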
PGCP Qs were evaluated for alignment with the aims of this study. A top box (binary [1 = score of 5 or 0 = scores of 1–4]) variable was created for each of the five Qs by categorizing everyone who responded 5 as “yes” and 1–4 as “no.” Other variables examined included physician specialty, physician gender, and patient gender. A binary variable for patient–physician gender concordance was created, defined as concordant if the patient and physician reported the same gender.
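As a rough sketch of how these two binary variables could be derived, assuming raw 1–5 responses and gender recorded as simple string labels (both the function names and the encoding are hypothetical, not taken from the study's code):

```python
def top_box(response: int) -> int:
    """Return 1 for a response of 5 ("very good") and 0 for responses 1-4."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError(f"Likert response must be 1-5, got {response!r}")
    return 1 if response == 5 else 0


def gender_concordant(patient_gender: str, physician_gender: str) -> bool:
    """True when patient and physician report the same gender."""
    return patient_gender == physician_gender


def top_box_rate(responses) -> float:
    """Share of responses that are top box."""
    responses = list(responses)
    return sum(top_box(r) for r in responses) / len(responses)


# Illustrative responses (not study data).
rate = top_box_rate([5, 5, 4, 1])
```

The top box variable deliberately discards the distinction between a 4 and a 1, which is why the study reports it alongside the continuous rescaled score.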
Statistical Analysis
All variables were first summarized descriptively. Categorical variables were summarized using frequency and percentage. Continuous variables were summarized using means and standard deviations. Abbreviated summary scores and individual questions were then compared across physician gender categories and across physician–patient gender concordance categories using the Wilcoxon rank sum test (WRST). To compare abbreviated summary scores and individual questions across the extended physician–patient gender categories, the Dwass, Steel, Critchlow-Fligner (DSCF) test for multiple pairwise comparisons was used to determine specifically which groups differed from one another. All statistical analyses were stratified by physician specialty.
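For readers unfamiliar with the WRST, the following is a minimal sketch of a two-sided rank sum test using midranks, a tie correction, and the normal approximation without continuity correction. These choices are assumptions for illustration; the study's analyses were run in SAS and R, whose defaults may differ, and the DSCF procedure is not reproduced here. All function names are hypothetical.

```python
import math


def midranks(values):
    """Return 1-based ranks (ties share the average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_rank_sum(x, y):
    """Two-sided rank sum test; returns (U statistic for x, z, p-value)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    n = n1 + n2
    ranks, tie_sizes = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        raise ValueError("all observations are tied")
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, z, p


# Illustrative scores (not study data).
u, z, p = wilcoxon_rank_sum([1, 2, 3], [4, 5, 6])
```

The tie correction matters for this kind of data: rescaled Likert scores take only five distinct values, so most observations fall into large tie groups.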
Provider gender, patient gender, and provider–patient gender concordance were summarized descriptively using frequency and percentage. The PGCP abbreviated summary score was summarized using mean and standard deviation, and differences in the PGCP abbreviated summary score by provider gender were assessed using the WRST. Individual PGCP questions were assessed both as top box binary outcomes and continuous outcomes, and differences in individual questions by provider gender were assessed for these binary and continuous outcomes by using the chi-squared test and the WRST, respectively.
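The chi-squared comparison of top box outcomes by provider gender reduces to a test on a 2×2 table. The sketch below computes the Pearson statistic without continuity correction and its p-value for one degree of freedom; the function name and the counts are hypothetical and are not drawn from the study's data or software.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test for a 2x2 table.

    Rows are provider gender, columns are top box yes/no:
        [[a, b],
         [c, d]]
    Returns (statistic, p-value) with 1 degree of freedom.
    """
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain observations")
    n = row1 + row2
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Made-up counts for illustration only.
stat, p = chi_squared_2x2(20, 10, 10, 20)
```

In a stratified analysis such as the one described, a test like this would be run separately within each specialty group.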
The PGCP abbreviated summary score was also compared across provider–patient gender concordance groups as a collapsed binary outcome (concordant gender versus nonconcordant gender) using the WRST and as an extended categorical outcome (male patient:male provider, male patient:female provider, female patient:male provider, and female patient:female provider) using the DSCF test for multiple pairwise comparisons.
All analyses were stratified by provider specialty, and a p-value of <0.05 was considered statistically significant. Analyses were performed using SAS Studio version 3.8 (SAS Institute Inc., Cary, NC, USA) and R version 4.1.2.
Study Sample
The initial sample included 130,763 observations linked to outpatient visits between January 01, 2018, and January 04, 2019. Due to a concern for potential confounding variables, 31,723 pediatric and geriatric visits were excluded, as there was a high likelihood that a caretaker of uncertain gender completed the survey. An additional 16,845 visits were excluded due to missing data for either patient gender or at least one of the PGCP questions. The final sample size included 82,195 observations, and of these observations, 33,787 were from internal medicine subspecialties (IMSS), 6,438 from OB/GYN, 20,087 from general internal medicine and family practice (primary care), and 21,883 from surgical fields (surgery). The specialties included in each category are listed in Supplementary Appendix S1.
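The sample flow can be checked with simple arithmetic. The short sketch below uses only the counts reported in this paragraph; the variable names are illustrative.

```python
# Counts as reported in the Study Sample paragraph.
initial_observations = 130_763
excluded_pediatric_geriatric = 31_723
excluded_missing_data = 16_845

final_sample = (
    initial_observations - excluded_pediatric_geriatric - excluded_missing_data
)

by_specialty = {
    "IMSS": 33_787,
    "OB/GYN": 6_438,
    "primary care": 20_087,
    "surgery": 21_883,
}

# Share of the final sample contributed by each specialty group.
shares = {name: count / final_sample for name, count in by_specialty.items()}
```

Both the exclusions and the specialty counts reconcile to the reported final sample of 82,195 observations.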
Results
Among IMSS, primary care, and surgery, most physicians were male. Female physicians were more highly represented in OB/GYN. In all groups, most of the patients were female (Table 2). Among IMSS, primary care, and OB/GYN, the average abbreviated summary score was significantly higher for male physicians, whereas the opposite trend was seen in surgery (Table 3). Male physicians scored higher than female physicians on patient responses to questions about concern for patient worries (Q3) and inclusion in treatment decisions (Q4) in IMSS and primary care, but lower in surgery. When evaluating responses about the information provided by the care provider (Q5), patient confidence (Q9), and the likelihood of recommending the provider (Q10), male physicians scored higher in IMSS, primary care, and OB/GYN but lower in surgery in both abbreviated summary and top box scores (Table 4).
Table 2. Data Summary, Stratified by Physician Specialty
IMSS, internal medicine subspecialties; SD, standard deviation.
Table 3. Association Between Physician Gender and Press Ganey Care Provider Abbreviated Summary Score, Stratified by Provider Specialty
Table 4. Association Between Physician Gender and Individual Press Ganey Care Provider Questions, Stratified by Physician Specialty (Abbreviated Summary Score and Top Box % Score)
Female patients (FP) rated female physicians significantly lower than male physicians in IMSS and OB/GYN, but higher in surgery in the abbreviated summary score. In primary care, there was no significant difference in abbreviated summary scores between male and female physicians for FP. Male patients (MP) showed no significant differences in their scores based on the physician’s gender across all specialties. MP gave female physicians significantly higher scores in primary care compared with FP. The analysis of physician–patient gender concordance in OB/GYN was limited due to the predominance of FP, but in surgery, concordant gender combinations had significantly higher scores (0.9479 versus 0.9402, p = 0.0036) (Table 5).
Table 5. Association Between Provider–Patient Gender Concordance and Press Ganey Care Provider Abbreviated Summary Score, Stratified by Provider Specialty
Discussion
Although the pattern varied across medical subspecialties, lower PG scores were associated with female physicians. The link between gender, PG scores, and quality of medical care is yet to be defined. We demonstrated differences in PG scores based on the physician’s gender. We selected the questions to highlight based on precedent in the literature, with the aim of examining previously established gender preconceptions.9–12,24,25 Amid conflicting evidence suggesting that females score females more harshly and males score males more leniently, we found that overall, women physicians received lower scores than men.
We evaluated Q3 as a reflection of patients’ perception of physician empathy. Female physicians are often associated with communal traits such as empathy and warmth. 17 In a study of 480 physicians and 22,431 surveys, these traits were linked to better scores. 26 However, in a sample of 109,997 surveys, there was no significant difference in overall scores by provider sex, although female providers were described with communal adjectives such as “empathetic,” “sweet,” and “attentive,” whereas male providers were described with agentic adjectives such as “informative” and “superior.” 27 Despite these perceptions, our data showed higher PG scores for male physicians in IMSS, primary care, and OB/GYN subspecialties.
Q4 in the PG survey reflects shared decision-making (SDM), whereas Q5 assesses physician communication regarding treatment options. In the described cohort, female physicians had higher aggregate scores in Q4 and Q5 in surgery, but not in IMSS, primary care, or OB/GYN. This raises the question of whether physician characteristics and patient expectations differ by specialty. SDM involves establishing a relationship in which physicians explain treatment options and patients feel supported in the discussion during the decision-making process.27 SDM is incorporated into guidelines for multiple diseases,28–31 and it is the opposite of traditional paternalistic methods of decision-making.32 For patients, SDM is directly related to strengthened communication, understanding, trust, and satisfaction.33 However, despite the widespread incorporation of SDM, the effect of SDM on behavioral and health outcomes has not yet been established.34
Q9 highlights how gender and gender concordance influence patient confidence, trust, and faith in a physician’s decision-making abilities, whereas Q10 summarizes the patient’s overall impression and the likelihood of recommending the physician. Both Q9 and Q10 reflect patients’ overall opinions of their physicians and are commonly used by health systems to assess performance. Male physicians received higher ratings in IMSS, OB/GYN, and primary care, whereas female physicians were rated higher in surgical specialties. The impact of gender on these perceptions can significantly affect a physician’s career, influencing patient trust, referral patterns, and professional evaluations.35–37 Studies show that gender bias in patient assessments may contribute to disparities in career advancement, leadership opportunities, and compensation for female physicians.38,39
While male physicians scored higher than females in IMSS, primary care, and OB/GYN, female physicians scored higher in the surgical subspecialties. We propose that patient expectations of physician characteristics may vary by specialty, but an exploration of this has not been described to date. Further inquiry may include the assessment of patient expectations.
Limited research compares the quality of care and PG scoring by physician gender. In a study of elderly hospitalized patients, those treated by female internists had lower mortality and readmissions compared with those cared for by male internists.39,40 In a Swedish cohort of patients undergoing acute cholecystectomy, female surgeons had more favorable outcomes. 32 In a French cohort, maternal morbidity after C-section was not significantly different when comparing female and male surgeons. 41
Several prior studies have demonstrated that PG results were not directly related to patient outcomes. In a study of patients undergoing hip arthroplasty, no statistically significant relationship was found between PG scoring and visit outcomes. 42 A second study, observing lumbar puncture patients, found no positive correlations between PG scores and quality of care. 43
We propose that male physicians score higher overall due to paternalistic behavior patterns expected to be associated with male physicians. This ties into the distinction between critical pedagogy and the “banking model” of education. Paulo Freire, in Pedagogy of the Oppressed, critiques the view of authority figures as the sole possessors of knowledge, treating patients as passive recipients. 44 This model, often used in patient–physician interactions, can be detrimental—especially for patients with language, literacy, or socioeconomic barriers—contributing to implicit biases, medical distrust, and nonadherence to treatment. However, when paternalistic behavior is exhibited by male physicians, it is often perceived as confidence or competence, whereas similar behavior from female physicians may be judged more negatively. This reflects underlying gender bias consistent with the stereotype content model, 17 which suggests that men are stereotyped as competent and women as warm, penalizing women who violate prescriptive gender norms. In contrast, Freire’s “critical pedagogy” promotes a more dynamic, collaborative exchange, where both physicians and patients are active participants—encouraging reflection, engagement, and more patient-centered care. 45
Nonresponse bias is an unavoidable confounder. Previous research shows that older individuals and women are more likely to respond to patient surveys. 5 The dataset only considers sex (male/female), limiting its scope. Results are valid only when patient and physician gender align with birth sex. Key confounders such as race, age, physician–patient relationship length, socioeconomic status, and education were not accounted for. Clustering potentially impacted the results; some physicians or patients may have been disproportionately represented in the dataset. The sample is also geographically limited to New York City and the surrounding suburbs, which may not reflect broader regional perspectives. In addition, responses from caregivers could skew results. Other common biases, such as recall bias and fear of retaliation, may also affect findings. Finally, as the data were collected before the coronavirus disease 2019 pandemic, they do not capture post-2020 shifts in health care dynamics, particularly considering movements for gender equity and health care reform.
Future Research
Our study demonstrates that gender is related to patient responses to care provider questions on the PG survey, and these differences are influenced by provider specialty and gender concordance between the physician and patient. Further studies are needed to evaluate the impact of gender on the ability of PG to evaluate physicians. The influence of cultural standards, social implicit bias, and patient and physician demographics on PG survey outcomes remains incompletely understood. The impact of physician specialty on patient expectations is unknown and will be explored in future research. Questions remain about the gender composition across subspecialties, score discrepancies, and the factors influencing the specialty choice among genders, notably in fields such as family planning, OB/GYN, and urology. We question if PG is the optimal assessment of physician value, quality, and implied reflection of medical expertise.
Conclusion
Despite its widespread use, the PG survey is influenced by individual patient expectations and factors such as statistical significance, randomization, recall bias, response rates, and gender concordance. Key questions remain about gender distribution, scoring variability, and the factors influencing physician choice across subspecialties. The additional impact of paternalistic attitudes and implicit biases on gender concordance in patient–physician interactions warrants further exploration. PG surveys fail to capture the subtlety and complexity of patient experiences, shaped by societal factors and interpersonal dynamics. While the PG survey is a well-meant measure of patient satisfaction, an updated and more nuanced version of this questionnaire is essential.
Footnotes
Author Disclosure Statement
No competing financial interests exist.
Funding Information
No funding was received for this article.
