Building better pediatric surgeons: A sentiment analysis of online physician review websites

Abstract

Purpose:

Physician review websites are a heavily utilized patient tool for finding, rating, and reviewing surgeons. Natural language processing such as sentiment analysis provides a comprehensive approach to better understand the nuances of patient perception. This study utilizes sentiment analysis to examine how specific patient sentiments correspond to positive and negative experiences in online reviews of pediatric orthopedic surgeons.

Methods:

The online written reviews and star ratings of pediatric surgeons belonging to the Pediatric Orthopaedic Society of North America were obtained from healthgrades.com. A sentiment analysis package obtained compound scores of each surgeon’s reviews. Inferential statistics analyzed relationships between demographic variables and star/sentiment scores. Word frequency analyses and multiple logistic regression analyses were performed on key terms.

Results:

A total of 749 pediatric surgeons (3830 total online reviews) were included. 80.8% were males and 33.8% were below 50 years of age. Male surgeons and younger surgeons had higher mean star ratings. Surgeon attributes including “confident” (p < 0.01) and “comfortable” (p < 0.01) improved the odds of positive reviews, while “rude” (p < 0.01) and “unprofessional” (p < 0.01) decreased these odds. Comments regarding “pain” lowered the odds of positive reviews (p < 0.01), whereas “pain-free” increased these odds (p < 0.01).

Conclusion:

Pediatric surgeons who were younger, communicated effectively, eased pain, and curated a welcoming office setting were more likely to receive positively written online reviews. This suggests that a spectrum of interpersonal and ancillary factors impact patient experience and perceptions beyond surgical skill. These outcomes can advise pediatric surgeons on behavioral and office qualities that patients and families prioritize when rating/recommending surgeons online.

Level of evidence:

Keywords

Physician review websites, natural language processing, sentiment analysis, patient satisfaction, pediatric orthopedic surgeons

Introduction

Patients and their families are determined to find the best possible physician for their healthcare. For this reason, people are increasingly turning toward online physician review websites (PRWs) due to their convenience and the ability to obtain firsthand commentary from other patients.¹ PRWs provide information for potential patients and have been shown to impact patients’ selection of certain providers.^2–4 The objective of the current study was to use a publicly available PRW to analyze online reviews for pediatric orthopedic surgeons quantitatively and qualitatively.

Recent research has analyzed PRWs to better understand what patients value in their surgeons. Research regarding reviews of non-pediatric surgeons found that higher ratings were associated with surgeon trustworthiness, compassion, and strong bedside manner, while lower ratings were associated with rudeness or poor pain management.^5,6 However, few studies have analyzed online reviews for pediatric physicians, let alone pediatric surgeons. Furthermore, most of these studies were limited to subjective analyses. One such study utilized a Likert-type scale survey and found that the patient–physician relationship along with staff friendliness had the strongest correlation with patients recommending the practice to others.⁷ A separate study found that clear communication, cheerfulness, and patient confidence in their provider were strong predictors of patient satisfaction.⁸ While these studies provide an initial glimpse into what patients value in their pediatric surgeon, they are limited by their use of pre-written surveys, their subjective analyses, and their primary focus on non-surgical specialties. The present study utilizes sentiment analysis, a form of natural language processing that takes written prose and assigns a quantitative score based on how positive or negative it is. By doing so, this study more objectively evaluates patient values and explores the factors contributing to more positive and negative reviews of pediatric surgeons.

While it has been shown that patients are increasingly relying on online review sites, some physicians remain wary about their use due to the impact of negative reviews, a lack of correlation between ratings and quality of care, and the fact that non-clinical factors can impact patient reviews.^9–11 However, online reviews are increasingly impacting patient decision making. Therefore, it is imperative that physicians are attentive to their online presence and consider how to best tailor their practice to improve the patient experience. To further explore factors that influence patient and caregiver reviews of pediatric surgeons, this study utilizes machine learning to analyze online patient comments. In line with the previous literature, we hypothesize that confidence, a positive bedside manner, and a welcoming office environment will lead to the most positive reviews of pediatric surgeons.

Materials and methods

Pediatric surgeon data extraction

The physicians identified in this study were obtained from the Pediatric Orthopaedic Society of North America (POSNA). Those without profiles or with no written reviews on healthgrades.com were excluded from the study. The surgeon names were passed to a web-scraping script, which queried Google for “(Physician Name) Pediatrics Healthgrades.” This produced a list of Healthgrades links for all surgeons, which were accessed to extract written reviews and star ratings as well as demographic information. States included in each region for locational analysis followed the United States Census Bureau regions. Healthgrades was selected as it was consistently one of the first websites offered when searching Google for provider reviews and one of the only websites that permitted bulk extraction of data.

Sentiment analysis calculation

The sentiment analysis for this study was performed using the “Valence Aware Dictionary and sEntiment Reasoner” (VADER) Python package, a publicly available, published package for sentiment analysis.¹² Its foundation is a dictionary developed by a team of 10 human raters. These raters assigned scores to thousands of words, ranging from −4 to +4, according to the perceived sentiment of each word. On this scale, more positive scores indicate more positive words, such as “great,” more negative scores indicate more negative words, and 0 represents neutral words. The package “reads” through sentences and develops a score that represents the overall sentiment of the text. This score is normalized to a scale between −1 and +1, with −1 representing the most negative sentiment and +1 representing the most positive.
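The scoring and normalization step described above can be sketched as follows. This is a minimal illustration rather than the package's actual code: the three-word lexicon and its valences are placeholders invented for the example, and the normalization constant of 15 is assumed from the open-source VADER implementation.

```python
import math

# Placeholder valences on the -4..+4 rater scale. These entries are
# hypothetical and are not copied from the published VADER lexicon.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "rude": -2.0}

# Normalization constant; assumed to match the open-source implementation.
ALPHA = 15


def normalize(raw_sum, alpha=ALPHA):
    """Squash an unbounded valence sum into the open interval (-1, +1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


def toy_compound(text):
    """Sum the valences of known words, then normalize.

    Words missing from the lexicon are treated as neutral (0).
    """
    words = (w.strip(".,!?").lower() for w in text.split())
    raw = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return normalize(raw)


if __name__ == "__main__":
    for sentence in ("She is a great surgeon", "He was rude", "We arrived at noon"):
        print(sentence, "->", round(toy_compound(sentence), 3))
```

Because the raw sum is divided by a quantity that is always larger in magnitude, the result can approach but never reach −1 or +1, however many emotive words a review contains.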

The power of the package comes from its ability to interpret parts of speech and modifiers in sentences and alter its calculations based on the context of the words. The package recognizes punctuation, capitalization used for emphasis, and adverbs, as these all affect the meaning or tone of a sentence. For example, the sentence “She is a great surgeon” has less impact than “She is a GREAT surgeon.” Additionally, positive modulators, such as “very,” amplify the effect of the term that follows. As such, a “very good” surgeon is given a higher score than just a “good” one, and vice versa for negative words. Similarly, negative modulators change the sentiment of the term that follows by flipping the sign of its contribution to the score. For example, if “great” was used to describe a surgeon, it would normally contribute positively to the overall score, but the phrase “not great” conveys negative sentiment and lowers the score.
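The three context rules above (emphatic capitalization, intensifiers, and negation) can be illustrated with a small sketch. All word lists and numeric constants here are illustrative assumptions chosen for the example; the published package uses its own lexicon, rule set, and weights.

```python
# Hypothetical lexicon and rule weights, for illustration only.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "bad": -2.0}
BOOSTERS = {"very": 0.3, "extremely": 0.3}  # assumed intensifier increment
NEGATIONS = {"not", "never", "no"}
CAPS_BONUS = 0.7        # assumed extra weight for an ALL-CAPS word
NEGATION_SCALE = -0.75  # assumed: flips the sign and dampens slightly


def word_valence(tokens, i):
    """Valence of tokens[i] after applying the three context rules."""
    word = tokens[i]
    base = TOY_LEXICON.get(word.lower())
    if base is None:
        return 0.0
    sign = 1.0 if base > 0 else -1.0
    valence = base

    # Rule 1: an ALL-CAPS word inside otherwise mixed-case text adds emphasis.
    whole_text_caps = all(t.isupper() for t in tokens if t.isalpha())
    if word.isupper() and len(word) > 1 and not whole_text_caps:
        valence += sign * CAPS_BONUS

    # Rule 2: an intensifier directly before the word pushes it further
    # from zero ("very good" > "good", "very bad" < "bad").
    if i > 0 and tokens[i - 1].lower() in BOOSTERS:
        valence += sign * BOOSTERS[tokens[i - 1].lower()]

    # Rule 3: a negation within the three preceding tokens flips the sign.
    window = [t.lower() for t in tokens[max(0, i - 3):i]]
    if any(t in NEGATIONS for t in window):
        valence *= NEGATION_SCALE

    return valence


def raw_score(text):
    """Sum of context-adjusted valences before any normalization."""
    tokens = [t.strip(".,!?") for t in text.split()]
    return sum(word_valence(tokens, i) for i in range(len(tokens)))


if __name__ == "__main__":
    for s in ("She is a great surgeon", "She is a GREAT surgeon",
              "very good", "not great", "not very good"):
        print(f"{s!r}: {raw_score(s):+.2f}")
```

The negation check looks back a few tokens rather than only one so that phrases such as “not very good” are still treated as negative.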

Model validation and data analysis

Linear regression analysis was performed in Python to relate each surgeon’s average sentiment score to their average reported online star score. This was conducted as a proof of concept for the sentiment analysis. If a significant relationship was seen between the calculated sentiment score and the star score, then the calculated score has a significant relationship with what is being reported by the patients themselves.
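A minimal sketch of this validation step, written with the standard library only, is given below. The per-surgeon values are invented for illustration and are not study data, and the function name `linear_fit` is hypothetical; the study's own code is not shown in this article.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y on x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical per-surgeon averages: sentiment score and star rating.
    avg_sentiment = [-0.20, 0.10, 0.35, 0.50, 0.62, 0.80]
    avg_stars = [2.1, 3.0, 3.6, 4.2, 4.1, 4.8]
    slope, intercept, r2 = linear_fit(avg_sentiment, avg_stars)
    print(f"stars = {slope:.2f} * sentiment + {intercept:.2f}, r^2 = {r2:.2f}")
```

A positive slope with a high r² would indicate that the calculated sentiment score tracks the rating that patients themselves reported.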

Student’s t-tests were utilized to assess the relationship between sex and average sentiment analysis score of written reviews, while one-way analysis of variance (ANOVA) tests were performed on the age ranges and geographical regions. A word frequency analysis was also conducted to report the most commonly used words in both the most positive and most negative reviews. Before this frequency analysis, words that were not clinically or behaviorally relevant, such as “amazing” or “worst,” were removed to focus on words that are relevant and potentially actionable for clinicians. Additionally, after conducting the word frequency analysis, a bigram (two-word string) analysis was performed to provide greater context for the words utilized. Sentiment scores > +0.50 were defined as positive reviews and scores < 0 were defined as negative reviews. Finally, a multiple logistic regression was performed to analyze the effects of specific, clinically relevant words or phrases on the likelihood of a review scoring > +0.50.
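The review labeling and the word and bigram counts described above can be sketched as follows. The cutoffs come from the text; everything else is an assumption made for illustration, including the stop-word list (common function words are dropped here before pairing, which is one way a phrase such as “cares for patients” could yield the pair “cares patients”) and the function names.

```python
from collections import Counter

POSITIVE_CUTOFF = 0.50  # from the text: score > +0.50 is a positive review
NEGATIVE_CUTOFF = 0.0   # from the text: score < 0 is a negative review

# Hypothetical filter lists; the study's actual lists are not given.
STOPWORDS = {"the", "a", "an", "and", "for", "of", "to", "is", "was",
             "i", "we", "he", "she", "my", "our", "his", "her"}
GENERIC_WORDS = {"amazing", "worst"}


def label_review(compound):
    """Map a compound sentiment score to the study's review groups."""
    if compound > POSITIVE_CUTOFF:
        return "positive"
    if compound < NEGATIVE_CUTOFF:
        return "negative"
    return "unlabeled"  # scores from 0 to +0.50 fall in neither group


def clean_tokens(text):
    """Lowercase, strip punctuation, and drop filtered words."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words
            if w and w not in STOPWORDS and w not in GENERIC_WORDS]


def word_and_bigram_counts(reviews):
    """Count single words and adjacent word pairs across reviews."""
    words, bigrams = Counter(), Counter()
    for review in reviews:
        toks = clean_tokens(review)
        words.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return words, bigrams


if __name__ == "__main__":
    sample = ["She truly cares for her patients.", "He truly cares!"]
    words, bigrams = word_and_bigram_counts(sample)
    print(words.most_common(3))
    print(bigrams.most_common(3))
```

Counts produced this way are occurrences of a word or pair, so a review that repeats a word contributes more than once.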

Results

Model validation

When plotting each surgeon’s average star score against their average calculated sentiment score, there was a positive linear relationship (r² = 0.61, p < 0.01), providing validity to our analysis (Figure 1).

Figure 1.

Pediatric orthopedic surgeon cohort model validation. Linear regression of average calculated sentiment analysis scores of each surgeon compared to their reported online star ratings.

Pediatric surgeon demographics

This study identified 749 pediatric orthopedic surgeons who met the inclusion criteria, yielding 3830 online reviews, while 443 surgeons were excluded for having no online profile or reviews. Male surgeons comprised 80.8% of the cohort and received higher average star scores than females (4.17/5.00 ± 0.78 vs 3.89/5.00 ± 1.02, p < 0.01) with no significant difference in sentiment scores (+0.50 ± 0.44 vs +0.45 ± 0.50, p = 0.28). Regarding age, 6.8% were younger than 40 years old, 27.0% were between 40 and 49 years old, 32.5% were between 50 and 59 years old, and 33.8% were 60 years or older. Younger surgeons trended toward having higher sentiment scores (p = 0.07) and received higher star scores (p < 0.01). Of the included surgeons, 19.7% practiced in Midwestern states, 22.1% in Northeastern states, 38.4% in Southern states, and 19.7% in Western states. Regional analysis showed a trend toward different star scores (p = 0.08) but no difference in sentiment scores (p = 0.25). Average sentiment and star scores for age, sex, and location are shown in Table 1.

Table 1.

Pediatric orthopedic surgeon demographic analysis.

	Age range
	<40 y/o	40–49 y/o	50–59 y/o	≥60 y/o	p-value
Frequency (%)	37 (6.8)	148 (27.0)	178 (32.5)	185 (33.8)
Mean sentiment score	+0.56 ± 0.43	+0.50 ± 0.45	+0.49 ± 0.42	+0.40 ± 0.48	0.07
Mean star score	4.56 ± 0.70	4.25 ± 0.72	4.02 ± 0.86	3.97 ± 0.84	<0.01
	Sex
	Male		Female		p-value
Frequency (%)	513 (80.8)		122 (19.2)
Mean sentiment score	+0.50 ± 0.44		+0.45 ± 0.50		0.28
Mean star score	4.17 ± 0.78		3.89 ± 1.02		<0.01
	Region
	Midwest	Northeast	South	West	p-value
Frequency (%)	126 (19.7)	141 (22.1)	245 (38.4)	126 (19.7)
Mean sentiment score	+0.61 ± 0.28	+0.63 ± 0.28	+0.67 ± 0.24	+0.64 ± 0.25	0.25
Mean star score	4.29 ± 0.62	4.21 ± 0.72	4.38 ± 0.68	4.20 ± 0.74	0.08

Some physicians did not have their age, sex, or location listed and were not included in respective analyses. States included in each region for locational analysis followed the United States Census Bureau regions.
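As a rough arithmetic check on the sex comparison in Table 1, the sketch below recomputes a pooled two-sample t statistic from the reported summary values. It rests on several assumptions that the article does not state: that the ± values are standard deviations, that the group sizes equal the frequencies in the table, and that equal variances were assumed. The p-value uses a normal approximation, which is reasonable with several hundred degrees of freedom.

```python
import math
from statistics import NormalDist


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's two-sample t statistic from summary statistics.

    Assumes equal variances. Returns (t, approximate two-sided p).
    """
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / std_err
    # Normal tail used in place of the t distribution (large-sample shortcut).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Male vs female values as reported in Table 1.
t_sent, p_sent = pooled_t(0.50, 0.44, 513, 0.45, 0.50, 122)
t_star, p_star = pooled_t(4.17, 0.78, 513, 3.89, 1.02, 122)

if __name__ == "__main__":
    print(f"sentiment: t = {t_sent:.2f}, p = {p_sent:.2f}")
    print(f"stars:     t = {t_star:.2f}, p = {p_star:.4f}")
```

Under these assumptions the sentiment comparison gives a t statistic of about 1.1 with p near 0.27, and the star comparison gives p well below 0.01, both in line with the values reported in Table 1.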

Word/word-pair frequency and multivariate analysis

Out of 3830 surgeon reviews, 2729 (71.3%) were deemed positive reviews (sentiment score > +0.50) and 670 (17.5%) were negative reviews (sentiment score < 0); the remaining 431 (11.3%) scored between 0 and +0.50 and fell into neither group. The most used words in positive reviews of surgeons were “care,” “kind,” “caring,” “wonderful,” and “friendly.” Conversely, the most used words in negative reviews for surgeons included “pain,” “no,” “rude,” “care,” and “problem.” The most common word pairs in positive reviews included “feel comfortable,” “kind compassionate,” “cares patients,” and “truly cares.” The most frequent word pairs in negative reviews were “back pain,” “no pain,” and “severe pain.” For the 10 most frequently used words and word pairs in positive and negative reviews, see Tables 2 and 3, respectively.

Table 2.

Word frequency analysis for pediatric orthopedic surgeons.

Positive reviews		Negative reviews
Word	Frequency (n)	Word	Frequency (n)
Care	600	Pain	278
Kind	322	No	194
Caring	307	Rude	114
Wonderful	298	Care	75
Friendly	237	Problem	54
Pain	226	Broken	54
No	192	Injury	52
Comfortable	172	Horrible	49
Better	160	Worst	48
Compassionate	143	Terrible	46

Frequency indicates the number of times a word was used in positive or negative reviews, not the number of positive or negative reviews that included that word.

Table 3.

Word-pair frequency analysis for pediatric orthopedic surgeons.

Positive reviews		Negative reviews
Word-pair	Frequency (n)	Word-pair	Frequency (n)
Feel comfortable	78	Back pain	24
Kind compassionate	66	No pain	20
Cares patients	64	Severe pain	18
Truly cares	62	Waste time	17
Took care	59	Urgent care	16
Pain free	58	No one	15
Kind caring	52	Poor bedside	14
Made sure	41	Worst experience	14
God bless	40	Pain no	14
Would definitely	40	Rude unprofessional	12

Frequency indicates the number of times a word-pair was used in positive or negative reviews, not the number of positive or negative reviews that included that word-pair.

Multivariate analysis

The multivariate analysis identifies words that increase or decrease the odds of a physician receiving a positive review. For example, when the word “comfortable” was used in a review, the odds of that review being positive were 8.3 times higher (odds ratio (OR): 8.34, p < 0.01). Conversely, when “wait” was included, the odds of a positive review were 0.44 times those of reviews without it (OR: 0.44; p < 0.01). For the full multivariate analysis, see Tables 4 and 5.

Table 4.

Multivariate analysis of relevant keywords with positive influence on reviews.

Phrase	Odds ratio (95% CI)	p-value
Comfortable	8.34 (3.61–19.28)	<0.01
Confident	8.02 (2.48–25.97)	<0.01
Family	1.78 (1.11–2.85)	0.017
Front desk	1.02 (0.33–3.22)	0.97
Knowledgeable	2.14 (1.33–3.44)	<0.01
Listens	2.88 (1.10–7.59)	0.032
Organized	2.58 (0.28–23.74)	0.40
Pain-free	14.45 (3.20–65.29)	<0.01
Staff	2.23 (1.73–2.87)	<0.01
Warm	3.32 (0.73–15.16)	0.12
Young	1.56 (0.69–3.54)	0.28

CI: confidence interval.

Table 5.

Multivariate analysis of relevant keywords with negative influence on reviews.

Phrase	Odds ratio (95% CI)	p-value
Activities	0.57 (0.15–2.13)	0.40
Arrogant	0.11 (0.02–0.53)	<0.01
Diagnosis	0.90 (0.51–1.59)	0.71
Medication	0.54 (0.14–2.03)	0.36
Nurse	0.41 (0.23–0.74)	<0.01
Old	0.64 (0.52–0.78)	<0.01
Pain	0.26 (0.20–0.35)	<0.01
Return	0.46 (0.25–0.87)	0.016
Rude	0.024 (0.008–0.066)	<0.01
Unprofessional	0.026 (0.003–0.201)	<0.01
Wait	0.44 (0.30–0.64)	<0.01
X-ray	0.61 (0.25–1.51)	0.28

CI: confidence interval.
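The odds ratios and confidence intervals in Tables 4 and 5 can be related back to the underlying logistic regression coefficients. The sketch below is an illustration only: it assumes the intervals are Wald intervals, symmetric on the log-odds scale, which the article does not state, and the function name `wald_summary` is hypothetical.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # two-sided 95% critical value, about 1.96


def wald_summary(odds_ratio, ci_low, ci_high):
    """Recover coefficient, standard error, z, and p from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta - z*se) to exp(beta + z*se).
    """
    beta = math.log(odds_ratio)  # the coefficient is the log of the OR
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    z = beta / std_err
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, std_err, z, p


if __name__ == "__main__":
    # Values as reported in Tables 4 and 5.
    rows = {
        "Comfortable": (8.34, 3.61, 19.28),
        "Front desk": (1.02, 0.33, 3.22),
        "Wait": (0.44, 0.30, 0.64),
    }
    for phrase, (or_, lo, hi) in rows.items():
        beta, se, z, p = wald_summary(or_, lo, hi)
        print(f"{phrase}: beta = {beta:+.2f}, SE = {se:.2f}, z = {z:+.2f}, p = {p:.3f}")
```

An odds ratio above 1 corresponds to a positive coefficient and one below 1 to a negative coefficient, and an interval that spans 1 (as for “front desk”) corresponds to a non-significant p-value.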

Discussion

Online PRWs are increasingly influencing patients and their families as they seek out and select surgeons. Uniquely, pediatric surgeons must strive to build positive rapport with both their patients and their patients’ caregivers. This adds nuances to pediatric surgeon reviews that are typically not present for their non-pediatric colleagues. In this study, we utilized a machine learning approach to objectively analyze 3830 written reviews for 749 pediatric orthopedic surgeons using sentiment analysis.

Previous literature has consistently shown that younger surgeons receive better online reviews.^5,6,13,14 We found that younger pediatric surgeons had significantly higher star ratings, with surgeons under 40 years old scoring the highest. Younger surgeons also trended toward having higher sentiment scores. Damodar et al.¹⁵ suggested that this may be influenced by younger surgeons encouraging more patients to utilize online reviews, indicating that it may serve surgeons well to cultivate a social media presence^5,16 and/or focus on improving factors discussed later in this study. Regarding the impact of surgeon sex on reviews, the literature is less clear, as some studies found no difference in online ratings between sexes^17–19 while others found female surgeons to be rated more highly.^14,20 However, male surgeons in this study were found to have significantly higher star ratings with no difference in sentiment score. Therefore, even though patient reviews were positive to a similar degree for both male and female pediatric surgeons, male surgeons still received higher online ratings. This may indicate the presence of implicit gender biases, as noted by Hutchison,²¹ or the result may be skewed by the much larger number of male than female surgeons in this study.

Interpersonal characteristics such as strong communication and compassion, along with clinical knowledge and skill, have been shown to improve written reviews.^5,22,23 We found that surgeons who appeared confident (OR: 8.02), knowledgeable (OR: 2.14), listened attentively (OR: 2.88), and made patients feel comfortable (OR: 8.34) were significantly more likely to receive positive reviews. Positive engagement with patients’ caregivers was also valued, as “family” (OR: 1.78) significantly predicted positive reviews. Additionally, the top five most common words and the top five most common word-pairs in positive reviews were all related to interpersonal characteristics. Conversely, reviews with the words “rude” (OR: 0.024) or “unprofessional” (OR: 0.026) had roughly 40-fold lower odds of being positive. With respect to pain management, “pain-free” (OR: 14.45) was the strongest predictor of positive reviews in the study, while “pain” (OR: 0.26) alone made reviews less likely to be positive. Additionally, “pain” and word-pairs including “pain” were the most frequently found terms in negative reviews. Patients and their caregivers seem to highly value a strong bedside manner in conjunction with effective alleviation of pain and symptoms when grading surgeons.

Ancillary factors have also been shown to significantly impact patient experiences and reviews.^24–27 This study found that most words regarding ancillary factors, including “wait” (OR: 0.44) and “nurse” (OR: 0.41), were predictive of negative reviews. Reviewers may be more likely to comment on these office aspects only if they are negative or seem unnecessary. Multiple studies have found that office staff interactions promote negative reviews,^22,25 but we found the word “staff” (OR: 2.23) to predict positive reviews. Yu et al.²⁸ similarly found a positive impact of office staff and noted that staff friendliness and helpfulness significantly impacted patient experiences. While certain factors such as office staff personnel may be under a surgeon’s management, other elements including wait time and scheduling may be more difficult for a surgeon to personally improve. Nevertheless, it is important for surgeons to understand how these variables impact a patient’s and their caregiver’s perspective to facilitate a well-rounded patient experience.

This study design is not without limitations. People are more likely to leave reviews if their experiences were exceedingly positive or negative, and we were unable to gauge a patient’s motivation for writing a review. Patients also might review a surgeon at different points in the timeline of care, influencing their overall sentiment. Surgeons can also encourage patients with whom they have interacted positively to leave reviews. While these are all potential sources of bias in online reviews, we believe these findings are still relevant, as prospective patients are increasingly relying on PRWs and have access to the same information we analyzed. Therefore, this study can help physicians understand what factors are currently influencing those patients. Analysis was also limited to pediatric surgeons who are members of POSNA and have at least one written review on a single PRW, which was chosen because it was one of the only sites that permitted bulk extraction of data. Future studies may seek to directly compare pediatric surgeons to non-pediatric surgeons to more finely assess the qualities and factors that influence patient sentiment across various fields of medicine.

To our knowledge, this is the first and largest quantitative analysis of online reviews for pediatric surgeons utilizing a sentiment analysis approach. We found that younger surgeons were associated with higher star scores, as were male surgeons. Unsurprisingly, positive interpersonal skills and traits seemed to be drivers of positive reviews, as were concise and effective pain/symptom management. Negative reviews were primarily associated with comments regarding pain and ancillary factors that surgeons hold less control over. Ultimately, this study advises pediatric surgeons on specific behavioral and office qualities that patients and their caregivers prioritize to best improve their practice, maximize patient satisfaction, and engage more future patients.

Footnotes

Author contributions

Liam R Butler: Study design, data acquisition and analysis, manuscript preparation and approval.

Justin E Tang: Study design, data acquisition and analysis, manuscript preparation and approval.

Skylar M Hess: Study design, manuscript preparation and approval.

Christopher A White: Study design, manuscript preparation and approval.

Varun Arvind: Data acquisition and analysis, manuscript preparation and approval.

Jun S Kim: Data acquisition, manuscript preparation and approval.

Abigail K Allen: Study design, manuscript preparation and approval.

Sheena C Ranade: Study design, data acquisition, manuscript preparation and approval.

Declaration of conflicting interests

The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: J.S.K. declares Aldentyfy Inc. (Stock or Stock Options). The other authors declare that they have no disclosures or conflicts of interest.

Ethical approval

This research involved the collection and study of currently existing data that was publicly available to patients, surgeons, and researchers alike. Therefore, it was not necessary to obtain approval from our Institutional Review Board.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iDs

Liam R Butler

Christopher A White

References

1. Hanauer, Zheng, Singer, et al. Parental awareness and use of online physician rating sites. Pediatrics 2014; 134(4): e966–e975.

2. Emmert, Meier, Pisch, et al. Physician choice making and characteristics associated with using physician-rating websites: cross-sectional study. J Med Internet Res 2013; 15(8): e187.

3. Lee-Won, McKnight. Effects of online physician reviews and physician gender on perceptions of physician skills and primary care physician (PCP) selection. Health Commun 2019; 34(11): 1250–1258.

4. Schulz, Rothenfluh. Influence of health literacy on effects of patient rating websites: survey study using a hypothetical situation and fictitious doctors. J Med Internet Res 2020; 22(4): e14134.

5. Bernstein, Mesfin. Physician-review websites in orthopaedic surgery. JBJS Rev 2020; 8(3): e0158.

6. Tang, Arvind, White, et al. What are patients saying about you online? A sentiment analysis of online written reviews on scoliosis research society surgeons. Spine Deform 2021; 10(2): 301–306.

7. Peng, Burrows, Shirley, et al. Unlocking the doors to patient satisfaction in pediatric orthopaedics. J Pediatr Orthop 2018; 38(8): 398–402.

8. Ahmed, Miller, Burrows, et al. Evaluation of patient satisfaction in pediatric dermatology. Pediatr Dermatol 2017; 34(6): 668–672.

9. Burn, Lintner, Cosculluela, et al. Physician rating scales do not accurately rate physicians. Orthopedics 2018; 41(4): e445–e456.

10. Cloney, Hopkins, Shlobin, et al. Online ratings of neurosurgeons: an examination of web data and its implications. Neurosurgery 2018; 83(6): 1143–1152.

11. Daskivich, Houman, Fuller, et al. Online physician ratings fail to predict actual performance on measures of quality, value, and peer review. J Am Med Inform Assoc 2018; 25(4): 401–407.

12. Hutto, Gilbert. VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: 8th international AAAI conference on weblogs and social media, Ann Arbor, MI, 1–4 June 2014, Vol 8, pp. 216–225. Palo Alto, CA: AAAI Press.

13. Zhang, Omar, Mesfin. Online ratings of spine surgeons: analysis of 208 surgeons. Spine; 43(12): E722–E726.

14. Nwachukwu, Adjei, Trehan, et al. Rating a sports medicine surgeon’s “Quality” in the modern era: an analysis of popular physician online rating websites. HSS J 2016; 12(3): 272–277.

15. Damodar, Donnally 3rd, McCormick, et al. How wait-times, social media, and surgeon demographics influence online reviews on leading review websites for joint replacement surgeons. J Clin Orthop Trauma 2019; 10(4): 761–767.

16. Donnally 3rd, McCormick, Pastore, et al. Social media presence correlated with improved online review scores for spine surgeons. World Neurosurg 2020; 141: e18–e25.

17. Runge, Jay, Vergara, et al. An analysis of online ratings of hip and knee surgeons. J Arthroplasty 2020; 35(5): 1432–1436.

18. Melone, Brodell Jr, Hernandez, et al. Online ratings of spinal deformity surgeons: analysis of 634 surgeons. Spine Deform 2020; 8(1): 17–24.

19. Jay, Runge, Vergara, et al. An analysis of online ratings of pediatric orthopaedic surgeons. J Pediatr Orthop 2021; 41(9): 576–579.

20. Heimdal, Gardner, Dhanani, et al. Factors affecting orthopedic sports medicine surgeons’ online reputation. Orthopedics 2021; 44(2): e281–e286.

21. Hutchison. Four types of gender bias affecting women surgeons and their cumulative impact. J Med Ethics 2020; 46(4): 236–241.

22. Kalagara, Eltorai AEM, DePasse, et al. Predictive factors of positive online patient ratings of spine surgeons. Spine J 2019; 19(1): 182–185.

23. Langerhuizen DWG, Brown, Doornberg, et al. Analysis of online reviews of orthopaedic surgeons and orthopaedic practices using natural language processing. J Am Acad Orthop Surg 2021; 29(8): 337–344.

24. Bakhsh, Mesfin. Online ratings of orthopedic surgeons: analysis of 2185 reviews. Am J Orthop 2014; 43(8): 359–363.

25. Donnally 3rd, Roth, et al. Analysis of internet review site comments for spine surgeons: how office staff, physician likeability, and patient outcome are associated with online evaluations. Spine 2018; 43(24): 1725–1730.

26. Hanauer, Zheng, Singer, et al. Public awareness, perception, and use of online physician rating sites. JAMA 2014; 311(7): 734–735.

27. Espinoza, Perry, et al. Online ratings of ASOPRS surgeons: what do your patients really think of you? Ophthalmic Plast Reconstr Surg 2017; 33(6): 466–470.

28. Yu, Samuel, Yalçin, et al. Patient-recorded physician ratings: what can we learn from 11,527 online reviews of orthopedic surgeons? J Arthroplasty 2020; 35(6S): S364–S367.