Abstract
Purpose:
Physician review websites are a heavily utilized patient tool for finding, rating, and reviewing surgeons. Natural language processing such as sentiment analysis provides a comprehensive approach to better understand the nuances of patient perception. This study utilizes sentiment analysis to examine how specific patient sentiments correspond to positive and negative experiences in online reviews of pediatric orthopedic surgeons.
Methods:
The online written reviews and star ratings of pediatric surgeons belonging to the Pediatric Orthopaedic Society of North America were obtained from healthgrades.com. A sentiment analysis package obtained compound scores of each surgeon’s reviews. Inferential statistics analyzed relationships between demographic variables and star/sentiment scores. Word frequency analyses and multiple logistic regression analyses were performed on key terms.
Results:
A total of 749 pediatric surgeons (3830 total online reviews) were included. Of these surgeons, 80.8% were male and 33.8% were below 50 years of age. Male surgeons and younger surgeons had higher mean star ratings. Surgeon attributes including “confident” (p < 0.01) and “comfortable” (p < 0.01) improved the odds of positive reviews, while “rude” (p < 0.01) and “unprofessional” (p < 0.01) decreased these odds. Comments regarding “pain” lowered the odds of positive reviews (p < 0.01), whereas “pain-free” increased these odds (p < 0.01).
Conclusion:
Younger pediatric surgeons received higher star ratings, and surgeons who communicated effectively, eased pain, and curated a welcoming office setting were more likely to receive positive written online reviews. This suggests that a spectrum of interpersonal and ancillary factors affects patient experience and perceptions beyond surgical skill. These outcomes can advise pediatric surgeons on the behavioral and office qualities that patients and families prioritize when rating or recommending surgeons online.
Level of evidence:
IV
Keywords
Introduction
Patients and their families are determined to find the best possible physician for their healthcare. For this reason, people are increasingly turning toward online physician review websites (PRWs) due to their convenience and the ability to obtain firsthand commentary from other patients.1 PRWs provide information for potential patients and have been shown to impact patients’ selection of certain providers.2–4 The objective of the current study was to use a publicly available PRW to analyze online reviews for pediatric orthopedic surgeons quantitatively and qualitatively.
Recent research has analyzed PRWs to better understand what patients value in their surgeons. Research regarding reviews of non-pediatric surgeons found that higher ratings were associated with surgeon trustworthiness, compassion, and strong bedside manner, while lower ratings were associated with rudeness or poor pain management.5,6 However, few studies have analyzed online reviews for pediatric physicians, let alone pediatric surgeons. Furthermore, most of these studies were limited to subjective analyses. One such study utilized a Likert-type scale survey and found that the patient–physician relationship along with staff friendliness had the strongest correlation with patients recommending the practice to others.7 A separate study found that clear communication, cheerfulness, and patient confidence in their provider were strong predictors of patient satisfaction.8 While these studies provide an initial glimpse into what patients value in their pediatric surgeon, they are limited by their use of pre-written surveys, their subjective analyses, and their primary focus on non-surgical specialties. The present study utilizes sentiment analysis, a form of natural language processing that interprets written human language and derives insights from the language used. Sentiment analysis takes written prose and assigns a quantitative score based on how positive or negative it is. By doing so, this study more objectively evaluates patient values and explores the factors contributing to more positive and negative reviews of pediatric surgeons.
While it has been shown that patients are increasingly relying on online review sites, some physicians remain wary about their use due to the impact of negative reviews, a lack of correlation between ratings and quality of care, and the fact that non-clinical factors can impact patient reviews.9–11 Because online reviews nonetheless increasingly influence patient decision making, it is imperative that physicians attend to their online presence and consider how best to tailor their practice to improve the patient experience. To further explore factors that influence patient and caregiver reviews of pediatric surgeons, this study applies sentiment analysis to online patient comments. In line with the previous literature, we hypothesize that confidence, a positive bedside manner, and a welcoming office environment will lead to the most positive reviews of pediatric surgeons.
Materials and methods
Pediatric surgeon data extraction
The physicians identified in this study were obtained from the membership of the Pediatric Orthopaedic Society of North America (POSNA). Those without profiles or with no written reviews on healthgrades.com were excluded from the study. The surgeon names were passed to a web-scraping script, which queried Google for “(Physician Name) Pediatrics Healthgrades.” This produced a list of Healthgrades links for all surgeons, which were accessed to extract written reviews and star ratings as well as demographic information. States included in each region for locational analysis followed the United States Census Bureau regions. Healthgrades was selected as it was consistently one of the first websites offered when searching Google for provider reviews and one of the only websites that permitted bulk extraction of data.
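The query-building step described above could look roughly like the following sketch. The function names, the example surgeon name, and the link-filtering rule are illustrative assumptions; the study does not publish its scraping code, and the fetching and parsing of search results are deliberately omitted.

```python
from urllib.parse import urlencode, urlparse


def build_search_url(physician_name):
    """Return a Google search URL for '<name> Pediatrics Healthgrades'."""
    query = f"{physician_name} Pediatrics Healthgrades"
    return "https://www.google.com/search?" + urlencode({"q": query})


def is_healthgrades_link(url):
    """Keep only links whose host is healthgrades.com or one of its subdomains."""
    host = urlparse(url).netloc.lower()
    return host == "healthgrades.com" or host.endswith(".healthgrades.com")


# Hypothetical surgeon name, used purely for illustration.
search_url = build_search_url("Jane Doe")
```

Filtering on the parsed host rather than on a substring match avoids keeping unrelated pages that merely mention Healthgrades in their path.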
Sentiment analysis calculation
The sentiment analysis for this study was performed by utilizing the “Valence Aware Dictionary and sEntiment Reasoner” (VADER) Python package. This package is a publicly available, published package for sentiment analysis.12 Its foundation is a dictionary developed by a team of 10 human raters. These raters assigned scores to thousands of words ranging from −4 to +4 in terms of the perceived sentiment of the word. On this scale, more positive scores indicate more positive words, such as “great,” more negative scores indicate more negative words, and 0 represents neutral words. Using this dictionary, the package “reads” through sentences and develops a score that represents the overall sentiment of the text. This score is calculated and normalized to a scale between −1 and +1, with −1 representing the most negative sentiment and +1 representing the most positive.
The power of the package comes from its ability to interpret modifiers in sentences and alter its calculations based on the context of the words. The package recognizes punctuation, capitalization used for emphasis, and adverbs, as these all impact the meaning or tone of a sentence. For example, the sentence “She is a great surgeon” scores lower than “She is a GREAT surgeon.” Additionally, positive modulators, such as “very,” amplify the effect of the following term. As such, a “very good” surgeon is given a higher score than just a “good” one, and the same amplification applies in the opposite direction for negative words. Similarly, negative modulators change the sentiment of the following term by reversing the sign of its contribution. For example, if “great” was used to describe a surgeon, it would normally contribute positively to the overall score, but the phrase “not great” conveys negative sentiment and contributes negatively.
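The scoring rules described above can be illustrated with a deliberately simplified sketch. This is not the VADER implementation: the three-word lexicon, the boost sizes, and the negation factor are assumptions chosen for illustration, and only the general shape (sum the adjusted word valences, then squash the total into the range −1 to +1) mirrors the description in the text.

```python
import math

# Illustrative valences on the -4..+4 scale; NOT the real VADER dictionary.
TOY_LEXICON = {"great": 3.0, "good": 2.0, "rude": -2.0}
INTENSIFIERS = {"very"}
NEGATIONS = {"not", "never"}
INTENSIFIER_BOOST = 0.3   # assumed increment from a preceding intensifier
CAPS_BOOST = 0.7          # assumed increment for an ALL-CAPS sentiment word
NEGATION_FACTOR = -0.74   # assumed: reverses the sign and slightly dampens


def normalize(raw_score, alpha=15.0):
    """Squash an unbounded summed valence into the open interval (-1, +1)."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


def toy_compound(text):
    """Return a compound-style score for text using the toy rules above."""
    tokens = [t.strip(".,!?") for t in text.split()]
    # Capitals only count as emphasis when the text is not entirely upper case.
    mixed_case = any(t.islower() for t in tokens)
    total = 0.0
    for i, token in enumerate(tokens):
        valence = TOY_LEXICON.get(token.lower())
        if valence is None:
            continue  # words outside the lexicon are treated as neutral
        sign = 1.0 if valence > 0 else -1.0
        if token.isupper() and mixed_case:
            valence += sign * CAPS_BOOST
        previous = tokens[i - 1].lower() if i > 0 else ""
        if previous in INTENSIFIERS:
            valence += sign * INTENSIFIER_BOOST
        elif previous in NEGATIONS:
            valence *= NEGATION_FACTOR
        total += valence
    return normalize(total)
```

Under these toy rules, "She is a GREAT surgeon" scores higher than "She is a great surgeon", "very good" scores higher than "good", and "not great" scores below zero, which matches the qualitative behavior described above.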
Model validation and data analysis
Linear regression analysis was performed in Python to relate each surgeon’s average sentiment score to their average reported online star score. This was conducted as a proof of concept for the sentiment analysis: a significant relationship between the calculated sentiment score and the star score would indicate that the calculated score tracks what patients themselves report.
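A minimal sketch of this validation step follows, using an ordinary least-squares fit written out by hand. The per-surgeon values are invented for illustration and are not study data; the function name is likewise an assumption, and the significance test (p value) is not computed here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, plus r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)  # squared Pearson correlation
    return slope, intercept, r_squared


# Hypothetical per-surgeon averages, invented for illustration only.
mean_sentiment = [-0.20, 0.10, 0.45, 0.60, 0.85]
mean_stars = [2.1, 3.0, 3.9, 4.2, 4.8]
slope, intercept, r_squared = linear_fit(mean_sentiment, mean_stars)
```

For a simple regression with one predictor, r squared is the same whichever variable is treated as the outcome, so the choice of axes does not change the validation statistic.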
Student’s t-tests were utilized to assess the relationship between sex and the average sentiment analysis score of written reviews, while one-way analysis of variance (ANOVA) tests were performed on the age ranges and geographical regions. A word frequency analysis was also conducted to report the most commonly used words in both the most positive and most negative reviews. Before this frequency analysis, words without clinical or behavioral relevance, such as “amazing” or “worst,” were removed to focus on words that are potentially actionable and relevant to clinicians. Additionally, after conducting the word frequency analysis, a bigram (two-word string) analysis was performed to provide greater context for the words utilized. Sentiment scores > +0.50 were defined as positive reviews and scores < 0 were defined as negative reviews. Finally, a multiple logistic regression was performed to analyze the effects of specific, clinically relevant words or phrases on the likelihood of a review scoring > +0.50.
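The classification thresholds and the word and word-pair counts can be sketched as follows. The stop-word list, the tokenization rule, and the label for scores between 0 and +0.50 are assumptions made for this example; only the two thresholds and the removal of words such as “amazing” and “worst” come from the text.

```python
from collections import Counter

# Assumed stop-word list for illustration; the study's own list is not given.
STOP_WORDS = {"a", "an", "and", "the", "for", "is", "was", "of", "to",
              "he", "she", "his", "her", "my", "me"}
# Generic praise or criticism dropped before counting, as described above.
EXCLUDED_WORDS = {"amazing", "worst"}


def label_review(compound):
    """Apply the thresholds: > +0.50 is positive, < 0 is negative."""
    if compound > 0.50:
        return "positive"
    if compound < 0:
        return "negative"
    return "unclassified"  # scores from 0 to +0.50 fall in neither group


def tokenize(text):
    """Lower-case, strip punctuation, and drop stop and excluded words."""
    words = (w.strip(".,!?\"'").lower() for w in text.split())
    return [w for w in words
            if w and w not in STOP_WORDS and w not in EXCLUDED_WORDS]


def count_words_and_bigrams(reviews):
    """Count every occurrence of each word and each adjacent word pair."""
    words, bigrams = Counter(), Counter()
    for text in reviews:
        tokens = tokenize(text)
        words.update(tokens)
        bigrams.update(" ".join(pair) for pair in zip(tokens, tokens[1:]))
    return words, bigrams
```

Because bigrams are formed after stop words are removed in this sketch, a phrase such as "cares for his patients" yields the pair "cares patients". The counters tally occurrences rather than the number of reviews containing a term.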
Results
Model validation
When each surgeon’s average star score was plotted against the calculated average sentiment score, there was a positive linear relationship (r² = 0.61, p < 0.01), supporting the validity of our analysis (Figure 1).

Figure 1. Pediatric orthopedic surgeon cohort model validation. Linear regression of average calculated sentiment analysis scores of each surgeon compared to their reported online star ratings.
Pediatric surgeon demographics
This study identified 749 pediatric orthopedic surgeons who met the inclusion criteria, yielding 3830 online reviews; 443 surgeons were excluded for having no online profile or reviews. Male surgeons comprised 80.8% of the cohort and received higher average star scores than female surgeons (4.17/5.00 ± 0.78 vs 3.89/5.00 ± 1.02, p < 0.01), with no significant difference in sentiment scores (+0.50 ± 0.44 vs +0.45 ± 0.50, p = 0.28). Regarding age, 6.8% were younger than 40 years, 27.0% were between 40 and 49 years, 32.5% were between 50 and 59 years, and 33.8% were older than 60 years. Younger surgeons trended toward having higher sentiment scores (p = 0.07) and received higher star scores (p < 0.01). By region, 19.7% of included surgeons practiced in Midwestern states, 22.1% in Northeastern states, 38.4% in Southern states, and 19.7% in Western states. Regional analysis showed a trend toward different star scores (p = 0.08) but no difference in sentiment scores (p = 0.25). Average sentiment and star scores for age, sex, and location are shown in Table 1.
Table 1. Pediatric orthopedic surgeon demographic analysis.
Some physicians did not have their age, sex, or location listed and were not included in respective analyses. States included in each region for locational analysis followed the United States Census Bureau regions.
Word/word-pair frequency and multivariate analysis
Out of 3830 surgeon reviews, 2729 (71.3%) were deemed positive reviews (sentiment score > +0.50) and 670 (17.5%) were negative reviews (sentiment score < 0). The most frequently used words in positive reviews of surgeons were “care,” “kind,” “caring,” “wonderful,” and “friendly.” Conversely, the most frequently used words in negative reviews of surgeons included “pain,” “no,” “rude,” “care,” and “problem.” The most common word pairs in positive reviews included “feel comfortable,” “kind compassionate,” “cares patients,” and “truly cares.” The most frequent word pairs in negative reviews were “back pain,” “no pain,” and “severe pain.” For the 10 most frequently used words and word pairs in positive and negative reviews, see Tables 2 and 3, respectively.
Table 2. Word frequency analysis for pediatric orthopedic surgeons.
Frequency indicates the number of times a word was used in positive or negative reviews, not the number of positive or negative reviews that included that word.
Table 3. Word-pair frequency analysis for pediatric orthopedic surgeons.
Frequency indicates the number of times a word-pair was used in positive or negative reviews, not the number of positive or negative reviews that included that word-pair.
Multivariate analysis
The multivariate analysis identifies words that increase or decrease the odds of a review being positive. For example, when the word “comfortable” was used in a review, the odds of that review being positive were 8.3 times higher (odds ratio (OR): 8.34, p < 0.01). Conversely, when “wait” was included, the odds of a positive review were less than half as high (OR: 0.44; p < 0.01), a 56% reduction in odds. For the full multivariate analysis, see Tables 4 and 5.
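Because an odds ratio scales odds rather than probabilities, a short worked sketch may help with interpretation. The 50% baseline probability below is a hypothetical value chosen for illustration, and the helper names are assumptions. The sketch also applies each odds ratio to a single baseline in isolation, whereas the odds ratios in a multiple logistic regression are adjusted for the other terms in the model.

```python
def to_odds(probability):
    """Convert a probability (0 to 1, exclusive) to odds."""
    return probability / (1.0 - probability)


def to_probability(odds):
    """Convert odds back to a probability."""
    return odds / (1.0 + odds)


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability implied by scaling the baseline odds by an odds ratio."""
    return to_probability(to_odds(baseline_probability) * odds_ratio)


def percent_change_in_odds(odds_ratio):
    """Express an odds ratio as a percentage change in odds."""
    return (odds_ratio - 1.0) * 100.0


# Hypothetical baseline: a review with even odds of being positive.
baseline = 0.50
with_comfortable = apply_odds_ratio(baseline, 8.34)  # OR reported for "comfortable"
with_wait = apply_odds_ratio(baseline, 0.44)         # OR reported for "wait"
```

With this assumed baseline, an odds ratio of 8.34 raises the probability of a positive review to roughly 0.89 rather than multiplying it by 8.34, and an odds ratio of 0.44 lowers it to roughly 0.31.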
Table 4. Multivariate analysis of relevant keywords with positive influence on reviews.
CI: confidence interval.
Table 5. Multivariate analysis of relevant keywords with negative influence on reviews.
CI: confidence interval.
Discussion
Online PRWs are increasingly influencing patients and their families as they seek out and select surgeons. Uniquely, pediatric surgeons must strive to build positive rapport with both their patients and their patients’ caregivers. This adds nuances to pediatric surgeon reviews that are typically not present for their non-pediatric colleagues. In this study, we utilized a natural language processing approach to objectively analyze 3830 written reviews for 749 pediatric orthopedic surgeons using sentiment analysis.
Previous literature has consistently shown that younger surgeons receive better online reviews.5,6,13,14 We found that younger pediatric surgeons had significantly higher star ratings, with surgeons under 40 years old scoring the highest. Younger surgeons also trended toward having higher sentiment scores. Damodar et al.15 suggested that this may be influenced by younger surgeons encouraging more patients to utilize online reviews, indicating that it may serve surgeons well to cultivate a social media presence5,16 and/or focus on improving factors discussed later in this study. Regarding the impact of surgeon sex on reviews, the literature is less clear, as some studies found no difference in online ratings between sexes17–19 while others found female surgeons to be rated more highly.14,20 However, male surgeons in this study were found to have significantly higher star ratings with no difference in sentiment score. Therefore, even though patient reviews were positive to a similar degree for both male and female pediatric surgeons, male surgeons still received higher online ratings. This may indicate the presence of implicit gender biases, as noted by Hutchinson,21 or may be skewed due to the higher number of male surgeons in this study than female surgeons.
Interpersonal characteristics such as strong communication and compassion, along with clinical knowledge and skill, have been shown to improve written reviews.5,22,23 We found that surgeons who appeared confident (OR: 8.02), knowledgeable (OR: 2.14), listened attentively (OR: 2.88), and made patients feel comfortable (OR: 8.34) were significantly more likely to receive positive reviews. Positive engagement with patients’ caregivers was also valued, as “family” (OR: 1.78) significantly predicted positive reviews. Additionally, the top five most common words and the top five most common word-pairs in positive reviews were all related to interpersonal characteristics. Conversely, reviews containing the words “rude” (OR: 0.024) or “unprofessional” (OR: 0.026) had roughly 40-fold lower odds of being positive. With respect to pain management, “pain-free” (OR: 14.45) was the strongest predictor of positive reviews in the study, while “pain” (OR: 0.26) alone lowered the odds of a positive review. Additionally, “pain” and word-pairs including “pain” were the most frequently found terms in negative reviews. Patients and their caregivers seem to highly value a strong bedside manner in conjunction with effective alleviation of pain and symptoms when grading surgeons.
Ancillary factors have also been shown to significantly impact patient experiences and reviews.24–27 This study found that most words regarding ancillary factors, including “wait” (OR: 0.44) and “nurse” (OR: 0.41), were predictive of negative reviews. Reviewers may be more likely to comment on these office aspects only if they are negative or seem unnecessary. Multiple studies have found that office staff interactions promote negative reviews,22,25 but we found the word “staff” (OR: 2.23) to predict positive reviews. Yu et al.28 similarly found a positive impact of office staff and noted that staff friendliness and helpfulness significantly impacted patient experiences. While certain factors such as office staff personnel may be under a surgeon’s management, other elements including wait time and scheduling may be more difficult for a surgeon to personally improve. Nevertheless, it is important for surgeons to understand how these variables impact a patient’s and their caregiver’s perspective to facilitate a well-rounded patient experience.
This study design is not without limitations. People are more likely to leave reviews if their experiences were exceedingly positive or negative, and we were unable to gauge a patient’s motivation for writing a review. Patients also might review a surgeon at different points in the timeline of care, influencing their overall sentiment. Surgeons can also encourage patients with whom they have had positive interactions to leave reviews. While these are all potential sources of bias in online reviews, we believe these findings are still relevant, as prospective patients are increasingly relying on PRWs and have access to the same information we analyzed. Therefore, this study can help physicians understand what factors are currently influencing those patients. Analysis was also limited to pediatric surgeons who are members of POSNA and have at least one written review on a single PRW, which was selected because it was one of the only sites that permitted bulk extraction of data. Future studies may seek to directly compare pediatric surgeons to non-pediatric surgeons to more finely assess the qualities and factors that influence patient sentiment across various fields of medicine.
To our knowledge, this is the first and largest quantitative analysis of online reviews for pediatric surgeons utilizing a sentiment analysis approach. We found that younger surgeons were associated with higher star scores, as were male surgeons. Unsurprisingly, positive interpersonal skills and traits seemed to be drivers of positive reviews, as were concise and effective pain/symptom management. Negative reviews were primarily associated with comments regarding pain and ancillary factors that surgeons hold less control over. Ultimately, this study advises pediatric surgeons on specific behavioral and office qualities that patients and their caregivers prioritize to best improve their practice, maximize patient satisfaction, and engage more future patients.
Footnotes
Author contributions
Declaration of conflicting interests
The author(s) declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: J.S.K. declares Aldentyfy Inc. (Stock or Stock Options). The other authors declare that they have no disclosures or conflicts of interest.
Ethical approval
This research involved the collection and study of currently existing data that was publicly available to patients, surgeons, and researchers alike. Therefore, it was not necessary to obtain approval from our Institutional Review Board.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
