Introduction/Purpose: Online star ratings are increasingly visible and influential when patients select orthopaedic surgeons. However, the reliability of patient-initiated (unsolicited) reviews compared with institutionally administered (solicited) surveys remains unclear. In this multi-institutional analysis, we aimed to (1) quantify differences between solicited and unsolicited star ratings of surgeons, (2) compare review volumes across platforms, and (3) identify the minimum number of unsolicited reviews needed for an unsolicited rating to reach statistical equivalence with surgeons’ global ratings, which combine both solicited and unsolicited ratings.
Methods: We analyzed data on orthopaedic surgeons from three large health systems across each of the five regions of the continental United States. Star ratings and review counts for each surgeon were collected from each platform: unsolicited data from Google, Healthgrades, and Vitals, and solicited data from institutional websites. All ratings used a 1- to 5-star scale. Surgeons were stratified into 10-review increments based on their review count on a given unsolicited platform (1-9 reviews, 10-19 reviews, etc.). For each group, the weighted mean unsolicited rating was compared with the paired global weighted mean, which was calculated from all solicited and unsolicited reviews of the surgeons in that group. Normality was assessed with the Shapiro–Wilk test. Paired t-tests, Wilcoxon signed-rank tests, and equivalence tests were conducted as appropriate. The Benjamini–Hochberg procedure was used to control the false discovery rate across multiple comparisons.
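A minimal sketch of the stratification and weighting step described above, assuming each surgeon contributes one review count and one mean star rating per source. The record layout, the sample values, and the helper names (`bin_label`, `weighted_mean`, `stratify`) are hypothetical; weighting by review count is our reading of "weighted mean," and the abstract does not say how surgeons with zero reviews on a platform are handled, so here they are simply skipped.

```python
from collections import defaultdict

# Hypothetical per-surgeon records for ONE unsolicited platform:
# (surgeon_id, unsolicited_count, unsolicited_mean, solicited_count, solicited_mean).
# Values are illustrative only, not study data.
SURGEONS = [
    ("s1", 4, 4.0, 120, 4.8),
    ("s2", 15, 4.6, 300, 4.9),
    ("s3", 12, 4.2, 250, 4.7),
    ("s4", 95, 4.7, 500, 4.8),
]


def bin_label(count, width=10):
    """Map an unsolicited review count to its stratum, e.g. 4 -> '1-9', 15 -> '10-19'."""
    low = (count // width) * width
    return f"{max(low, 1)}-{low + width - 1}"


def weighted_mean(pairs):
    """Review-count-weighted mean of (count, mean_rating) pairs."""
    total = sum(n for n, _ in pairs)
    return sum(n * r for n, r in pairs) / total


def stratify(records, width=10):
    """Return {stratum: (n_surgeons, unsolicited_mean, global_mean)}."""
    groups = defaultdict(list)
    for _sid, n_u, r_u, n_s, r_s in records:
        if n_u == 0:
            continue  # assumption: surgeons with no reviews on this platform are not binned
        groups[bin_label(n_u, width)].append((n_u, r_u, n_s, r_s))

    summary = {}
    for label, rows in groups.items():
        unsolicited = [(n_u, r_u) for n_u, r_u, _, _ in rows]
        solicited = [(n_s, r_s) for _, _, n_s, r_s in rows]
        # The global mean pools every solicited and unsolicited review in the group.
        summary[label] = (
            len(rows),
            weighted_mean(unsolicited),
            weighted_mean(unsolicited + solicited),
        )
    return summary


for label, (n, unsol, glob) in sorted(stratify(SURGEONS).items()):
    print(f"{label:>7}: n={n}, unsolicited={unsol:.3f}, global={glob:.3f}")
```

Weighting by review count means a surgeon with 300 reviews moves the group mean far more than one with 12, which is why the global mean in each stratum sits close to the solicited ratings.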
Results: A total of 322,907 reviews across 629 orthopaedic surgeons (555 male, 74 female) were included in the analysis: 51,779 (16.0%) unsolicited and 271,128 (84.0%) solicited. The weighted mean unsolicited rating was significantly lower than the weighted mean solicited rating (4.5 ± 0.4 vs. 4.8 ± 0.1, p < 0.001). On average, surgeons had significantly fewer unsolicited than solicited reviews (82 vs. 431 per surgeon, p < 0.001). The total unsolicited group (all unsolicited ratings combined) reached equivalence with the paired global group at 90 reviews, Google ratings at 60 reviews, and Healthgrades ratings at 80 reviews (p < 0.05). Vitals ratings did not reach equivalence with global ratings at any review volume (p > 0.05).
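The equivalence results above can be sketched as a paired two one-sided tests (TOST) procedure followed by Benjamini–Hochberg adjustment across review-volume strata. This is an illustrative sketch, not the study's analysis code: pairing each surgeon's unsolicited rating with that surgeon's global rating is an assumption, the 0.1-star equivalence margin is a hypothetical placeholder because the abstract does not report the margin used, and the Wilcoxon branch for non-normal data is omitted. To stay within the Python standard library, the Student's t CDF is approximated by Simpson's-rule integration rather than taken from a statistics package.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """Student's t CDF: 0.5 plus a Simpson's-rule integral of the density over [0, |x|]."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def paired_tost(x, y, margin):
    """Paired TOST p-value for H1: -margin < mean(x - y) < margin."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)  # needs n >= 2 and non-zero variance of the differences
    se = stdev(d) / math.sqrt(n)
    t_low = (mean(d) + margin) / se   # H0: mean difference <= -margin
    t_high = (mean(d) - margin) / se  # H0: mean difference >= +margin
    p_low = 1.0 - t_cdf(t_low, n - 1)
    p_high = t_cdf(t_high, n - 1)
    return min(1.0, max(0.0, p_low, p_high))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-surgeon (unsolicited, global) ratings in two strata; not study data.
STRATA = {
    "10-19": ([4.2, 4.1, 4.3, 4.25, 4.15], [4.5, 4.5, 4.5, 4.5, 4.5]),
    "90-99": ([4.51, 4.48, 4.50, 4.515, 4.495], [4.5, 4.5, 4.5, 4.5, 4.5]),
}
MARGIN = 0.1  # hypothetical equivalence margin in stars; not stated in the abstract

raw = {label: paired_tost(u, g, MARGIN) for label, (u, g) in STRATA.items()}
q = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
for label in STRATA:
    verdict = "equivalent" if q[label] < 0.05 else "not shown equivalent"
    print(f"{label}: p={raw[label]:.4g}, BH q={q[label]:.4g} -> {verdict}")
```

A statistics library (for example, SciPy's `scipy.stats.ttest_1samp` with its `alternative` argument) would normally replace the hand-rolled t CDF; the integration is used here only to keep the sketch dependency-free.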
Conclusion: Unsolicited ratings are lower and based on fewer reviews than solicited ratings. Equivalence with global ratings emerges at platform-specific review-volume thresholds: approximately 60 reviews for Google, 80 for Healthgrades, and 90 for all unsolicited platforms combined. These data support minimum review thresholds for the public display of star ratings. They also underscore the value of solicited institutional survey platforms in increasing response volume, which supports a more reliable and representative assessment of orthopaedic surgeons.
Scatter plot of star rating vs. review volume across all platforms (Solicited, Google, Healthgrades, and Vitals).