Abstract

Patient satisfaction is critical to a health organization’s sustainability, making patient feedback an important source of insight for improving healthcare delivery. This study analyzes 31,054 patient responses across seven healthcare units to identify factors that influence patient satisfaction and to examine how satisfaction is reflected in qualitative comments across diverse care settings. Deep learning methods were employed to extract features related to sentiment and topics from patient comments. Linear regressions were used to evaluate the explanatory power of both quantitative ratings of healthcare service attributes and features extracted from qualitative comments, accounting for unit-specific healthcare settings. Our results reveal that trust and communication factors are the strongest predictors of patient satisfaction. However, the explanatory power of quantitatively rated survey items shows clear variations across units, suggesting that standardized survey questions inadequately capture patient experience in specific contexts. Text analysis uncovered important unit-specific priorities, such as prescription management in Medical Practice and result efficiency in OP Lab, that were absent from the rated survey items. Patients who mentioned topics not covered in the rated survey items tended to report lower satisfaction scores, indicating potential service gaps. The integration of quantitative ratings and qualitative comments in our analysis significantly improved explanatory power in six of the seven units. Based on the identified key determinants of patient satisfaction and emerging concerns from patient comments, we propose a tiered approach for healthcare practitioners to leverage both forms of feedback for more targeted, emotionally responsive, and context-aware improvements in patient care. 
As healthcare delivery continues to evolve, our findings also underscore the value of flexible, multi-modal feedback strategies—supported by an adaptive analytical framework—that can inform future research and system-level responses to the diverse and shifting expectations of patients.

Keywords

patient satisfaction, healthcare transformation, natural language processing, feedback analysis, patient-centered care

Introduction

Patient satisfaction is important for both treatment outcomes and healthcare providers’ financial performance, as satisfied patients are more likely to choose the same healthcare providers for future care and recommend the provider to family and friends.^1,2 Assessment of patient satisfaction has identified multiple aspects of health service delivery that could influence it, such as provider communication, care coordination, wait times, service accessibility, and administrative rules.³ Past patient satisfaction research has primarily relied on surveys where various health service attributes were measured quantitatively in the form of patient ratings.^4,5 Although quantitative surveys allow standardized data collection and analysis, they often overlook contextual details or emerging patient concerns, especially in today’s evolving healthcare contexts shaped by rapid technological advancements.⁶ Patient comments collected through open-ended survey questions can reveal contextual details missing from quantitative surveys and offer insights into why patients gave certain ratings. Indeed, analyzing qualitative comments provides a deeper understanding of patient experiences, uncovering factors that influence satisfaction beyond what quantitative measures capture. This approach helps healthcare providers better address patient needs and improve care quality by incorporating patient voices into service evaluation and development.⁷

Although quantitative and qualitative patient satisfaction research has contributed greatly to our understanding of patient satisfaction, research opportunities remain. First, most studies have focused on single specialty units,^8–10 thereby limiting generalizability and hindering comprehensive comparisons across diverse healthcare settings.^11,12 Second, existing research has predominantly relied on either quantitative ratings^13–15 or qualitative comments and reviews,^11,16,17 rarely examining how these two forms of patient feedback can jointly influence patient satisfaction. We argue that integrating both data types is crucial because scale ratings may overlook nuanced patient concerns, while comments alone often lack the quantitative clarity needed for actionable insights. Combining scale ratings and patient comments could provide a holistic view, highlighting potential synergies and uncovering new service dimensions not captured by rated survey items. Third, qualitative patient satisfaction research has primarily relied on manual or software-assisted traditional coding methods, such as thematic coding, phenomenology, grounded theory, or content analysis.¹⁸ These methods are most effective when the qualitative dataset is not overly large. Text analytics has expanded the capabilities of traditional coding methods and has been increasingly used in recent years to extract deeper insights from patients’ comments and online reviews.^19,20 However, while traditional text analytics methods offer valuable baseline capabilities, deep learning approaches—such as Bidirectional Encoder Representations from Transformers (BERT)—provide more advanced means of capturing the nuanced and contextual nature of patient experiences and emerging concerns in today’s evolving healthcare environment.^21,22 BERT-based models can capture contextual relationships and semantic subtleties that keyword-based approaches may overlook. 
This is particularly important in healthcare, where the same term may carry different meanings depending on its context. Specifically, sentiment analysis powered by deep learning benefits from semantic embeddings that account for contextual meaning, enabling a more accurate interpretation of the emotional tone in patient feedback than rule-based or lexicon-based approaches.²³ BERT’s bidirectional processing enhances this understanding by accounting for negations and complex sentence structures commonly found in patients’ comments. Furthermore, BERT models can identify emerging themes and sentiments without relying on predefined lexicons or rules, thereby supporting the discovery of novel patient concerns that traditional data mining techniques might miss.

The limitations in the current literature identified above highlight the need for a comprehensive, multi-unit approach that integrates quantitative ratings and patients’ comments to understand the evolving patient experience and satisfaction. To address these limitations, we analyze post-care surveys of patients concerning their patient experiences in seven healthcare units within a health system. The dataset includes both quantitatively rated questionnaire items that elicited patient responses with respect to various health service attributes as well as open-ended questions, allowing us to examine how ratings of those health service attributes and qualitative comments relate to patient satisfaction. Specifically, we investigate the following research questions:

• What key determinants drive patient satisfaction in the healthcare environment, and how do these determinants vary in importance across diverse healthcare settings?

• Do qualitative comments uncover insights not revealed in quantitatively rated survey items? If yes, what additional insights do they uncover about patient satisfaction?

• How can we improve questionnaire design and patient experience based on the analysis of patient feedback?

By combining rated survey responses with deep learning–based analysis of patient comments across multiple units, this study offers both methodological and managerial contributions. Methodologically, we demonstrate how integrating quantitative ratings with qualitative feedback can reveal both core and emerging areas of patient concern that may be overlooked or insufficiently captured by quantitatively rated survey items alone. From a managerial perspective, our findings can guide healthcare administrators in refining or redesigning patient satisfaction measurement and analytical tools that are both responsive to evolving patient needs and comparable across diverse clinical settings.

In the following sections, we detail our data collection and analytical approach, describe the results, and discuss their implications for healthcare providers adapting to a rapidly changing clinical landscape. We then acknowledge the study’s limitations, suggest avenues for future research, and conclude the paper.

Data and methods

Our dataset comprises 31,054 post-care survey responses from patients concerning their patient experiences at seven healthcare units in a university health system in the US, collected in December 2020 by the health system as part of their healthcare delivery processes. The patient experience surveys were designed by the healthcare system’s patient experience team in collaboration with National Research Corporation, a professional survey vendor specializing in healthcare feedback measurement. Data collection utilizes an invitation-based, multi-modal approach including email, text messaging, interactive voice recording, and mail distribution, rather than public online review platforms. Data privacy is a critical concern in healthcare research, especially when handling sensitive patient information. In line with best practices highlighted by Lin et al.,²⁴ our study adheres strictly to privacy and security protocols. Our research protocol was approved by the Institutional Review Board, allowing us to use this patient survey dataset. Given the differences in service settings and the tailored design of questionnaires, we analyzed the survey responses separately for each healthcare unit. Several units included in this study have received Magnet Recognition, an award granted by the American Nurses Credentialing Center to recognize excellence in nursing practice.

The surveys contained questionnaire items that elicited quantitative ratings of various health service attributes (such as “Was it easy to get an appointment when you wanted?” and “Did you have enough input or say in your care?”) on a 1-4 scale (1 = “No” and 4 = “Yes, definitely”). Patient satisfaction was measured using the Net Promoter Score (NPS) on a 0-10 scale, which was the patient’s rating of the likelihood of recommending the provider to family and friends (0 = extremely unlikely and 10 = extremely likely). The survey also contained open-ended questions (such as “What can you tell us about what we did well or how we could have improved your experience?”) that elicited qualitative comments from patients regarding their healthcare experience. Each questionnaire incorporated selected quantitative items from Table 1 along with an open-ended question designed to capture overall satisfaction.

Table 1.

Quantitative survey items, corresponding question texts, and rating scale.

Quantitative survey item	Question text	Scale
Ease of appointment	Was it easy to get an appointment when you wanted?	1-4
Enough input	Did you have enough input or say in your care?	1-4
Explain by nurses	Did nurses explain things in a way you could understand?	1-4
Explain by provider	Did the care providers explain things in a way you could understand?	1-4
Explain by staff	Did someone on the staff explain what to do if the problems or symptoms continued, got worse, or came back?	1-4
Listen by nurses	Did nurses listen carefully to you?	1-4
Listen by provider	Did the (care) providers listen carefully to you?	1-4
Safety as priority	Did our team make safety from coronavirus/Covid-19 a priority during your visit?	1-4
Talk about concerns	Were you comfortable talking with nurses about your worries or concerns?	1-4
Timely being seen	Were you seen by a care provider in a timely manner?	1-4
Respect from nurses	Did nurses treat you with courtesy and respect?	1-4
Respect from provider	Did the staff treat you with courtesy and respect?	1-4
Trust for nurses	Did you have confidence and trust in the nurses treating you?	1-4
Trust for provider	Did you trust this provider with your care?	1-4
Patient satisfaction	How likely would you be to recommend this facility to your family and friends?	0-10

Table 2 summarizes key characteristics of the survey responses across the seven healthcare units. The majority of respondents were seniors, with an overall average age of approximately 61 years (SD = 17.1). Age variability was greater in the Emergency Magnet and Urgent Care units, reflecting the diverse patient populations typically served in those settings. Medical Practice received the highest number of survey responses (14,983) and comments (8,449). Most units reported an average NPS above 9, indicating high levels of patient satisfaction and likelihood of recommending their providers, with Emergency Magnet slightly lower at above 8. Approximately half of all survey respondents provided open-ended comments along with their ratings.

Table 2.

Characteristics of survey responses.

Healthcare unit	Avg age (SD)	# of responses	Avg NPS (SD)	% with comments
Med prac	62 (17.0)	14,983	9.4 (1.52)	56.4
Med prac mag	63 (14.5)	2,239	9.5 (1.48)	48.7
Emergency mag	49 (21.1)	1,789	8.3 (3.05)	50.4
OP rehab	60 (16.1)	4,184	9.4 (1.67)	51.0
OP rehab mag	61 (14.0)	1,858	9.4 (1.59)	48.0
Urgent care	50 (20.2)	1,729	9.3 (1.77)	54.9
OP lab	58 (17.2)	4,272	9.6 (1.36)	61.3

Note: Med, Prac, Mag, OP, and Rehab refer to Medical, Practice, Magnet, Outpatient, and Rehabilitation, respectively.

Figure 1 illustrates the three-phase analytical framework developed to address the research questions. Phase I examines the association between service attributes and patient satisfaction. Phase II focuses on extracting informative features from qualitative comments. This phase consists of three primary analytical components: text preprocessing, sentiment analysis, and topic modeling. In Phase III, we further examine how service attribute ratings and qualitative comments jointly explain patient satisfaction. Details of each phase are provided in the following subsections.

Figure 1.

Analytical framework for the medical survey data.

Service attribute ratings

The existing surveys at the studied health system assess four major aspects of healthcare service: communication, trust, safety, and accessibility. Communication includes measures such as “listen carefully by nurses and/or providers” (7 of 7 units; Medical Practice Magnet includes both items (listen carefully by nurses and providers), while all other units have one), “treat with respect from nurses, staff, and/or providers” (6 of 7; not included in OP Rehabilitation), “enough input during care” (5 of 7; not included in Medical Practice or Urgent Care), “comfort in talking about concerns with nurses,” and “explanations provided understandably by nurses, staff, and/or providers” (3 of 7; included only in the Magnet units of Medical Practice, Emergency, and OP Rehabilitation). Trust is assessed by the item “trust in providers and/or nurses” (5 of 7; not included in OP Rehabilitation or OP Lab). Safety is measured through the item “prioritization of safety” (7 of 7). Accessibility evaluates factors such as “ease of scheduling an appointment” (4 of 7; not included in Emergency Magnet, Urgent Care, or OP Lab) and “timeliness of being seen” (1 of 7; only present in Emergency Magnet). Based on healthcare settings, units such as Medical Practice, Medical Practice Magnet, Emergency Magnet, and Urgent Care involve both providers and nurses/staff in patient interactions, whereas others, such as OP Rehabilitation units and OP Lab, primarily involve nurses and/or staff, with limited or no direct provider involvement.

To examine the associations between ratings of those service attributes and patient satisfaction, we fitted an ordinary least squares (OLS) regression for each healthcare unit and examined the coefficients, their statistical significance, and the total variance explained. The analytical model follows the general linear form specified in equation (1):

\begin{align} y & = \beta_{0} + \sum_{i = 1}^{n} \alpha_{i} X_{i} + \epsilon \end{align}

(1)

where y represents the patient satisfaction score, and X_i is the rating for the i-th quantitative survey item within the corresponding question pod. β₀ is the intercept parameter, and α_i are the regression coefficients associated with each quantitative survey item. Additionally, n is the total number of quantitative items in the question pod.
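The per-unit fitting described above can be sketched as follows. This is a minimal illustration on synthetic data, assuming NumPy is available; the unit names, item counts, and coefficients are invented for the example and are not taken from the study.

```python
import numpy as np


def fit_ols(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, float]:
    """Fit y = b0 + X @ a by least squares; return coefficients and adjusted R^2."""
    n_obs, n_items = X.shape
    design = np.column_stack([np.ones(n_obs), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_items - 1)
    return coef, adj_r2


# Synthetic stand-in data: each hypothetical unit has its own set of rated items (1-4 scale).
rng = np.random.default_rng(0)
units = {"unit_a": 3, "unit_b": 2}  # hypothetical unit name -> number of rated items
results = {}
for unit, n_items in units.items():
    X = rng.integers(1, 5, size=(500, n_items)).astype(float)
    true_coef = np.linspace(0.2, 1.0, n_items)  # invented coefficients
    y = 1.0 + X @ true_coef + rng.normal(0.0, 0.5, size=500)
    results[unit] = fit_ols(X, y)  # one separate regression per unit

for unit, (coef, adj_r2) in results.items():
    print(unit, np.round(coef, 2), round(adj_r2, 3))
```

Fitting each unit separately, rather than pooling, mirrors the fact that each questionnaire contains a different subset of rated items, so the design matrix has a different width per unit.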

Extracting features from qualitative comments

Our analysis of patient comments was informed by recent methodological advances in text analytics. Text mining techniques can reveal underlying patterns in qualitative data that complement quantitative metrics, providing a framework for integrating mixed data types in topic identification.²¹ Meanwhile, benchmarks in sentiment classification have demonstrated the effectiveness of deep learning models, with LSTM networks achieving up to 91% accuracy in identifying sentiments of customer feedback.²³ Building on these developments, we employed deep learning methods to extract both topic and sentiment information from patient comments, identify concerns not covered in quantitative ratings in the survey, and examine how these features influence overall patient satisfaction.

Text data preprocessing

To prepare patient comments into an analyzable format for sentiment extraction and topic modeling, we implemented the following preprocessing steps. First, while most of the comments to open-ended questions were in English, some were in other languages. Therefore, all non-English texts were translated into English to ensure consistency and inclusiveness in the analysis. Next, we performed sentence segmentation, dividing each comment into individual sentences as the unit of analysis. This step is essential for both sentiment analysis and topic modeling, as it facilitates a more granular examination by accounting for variations in emotions and topics across sentences within the same comment, thereby enhancing analytical precision and coherence. We conducted additional preprocessing steps to improve data quality for topic modeling, including converting text to lowercase, removing stop words, eliminating punctuation, and excluding heavily subjective terms to ensure that subsequent topic clusters are not dominated by positive or negative adjectives.²⁵ These steps support the identification of more meaningful themes.
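The segmentation and cleaning steps above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study: the regex-based sentence splitter, the tiny stop-word list, and the list of subjective terms are placeholder assumptions, and the translation step is omitted.

```python
import re
import string

# Hypothetical, deliberately tiny word lists; a real pipeline would use fuller ones.
STOP_WORDS = {"the", "a", "an", "and", "was", "were", "is", "to", "of", "for", "my", "i"}
SUBJECTIVE_TERMS = {"great", "good", "excellent", "terrible", "bad", "awful", "wonderful"}


def split_sentences(comment: str) -> list[str]:
    """Split a comment into sentences at '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", comment.strip())
    return [p for p in parts if p]


def clean_for_topics(sentence: str) -> list[str]:
    """Lowercase, strip punctuation, and drop stop words and subjective terms."""
    lowered = sentence.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    return [t for t in no_punct.split() if t not in STOP_WORDS and t not in SUBJECTIVE_TERMS]


comment = "The nurse was great. I waited two hours for my lab results!"
sentences = split_sentences(comment)            # unit of analysis for sentiment
tokens = [clean_for_topics(s) for s in sentences]  # extra cleaning for topic modeling only
print(sentences)
print(tokens)
```

Note that the uncleaned sentences are kept for sentiment analysis, since removing subjective terms is only meant to keep topic clusters from being dominated by positive or negative adjectives.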

Sentiment analysis

Sentiment analysis is widely applied in fields such as social media monitoring and customer feedback analysis.²⁶ In this paper, BERT-based approaches were chosen over traditional sentiment analysis methods (such as lexicon- or rule-based approaches) because healthcare feedback requires a deep contextual understanding of domain-specific terminology and nuanced sentiment shifts within patient narratives. Traditional methods struggle with medical terminology in context and often miss sentiment shifts within comments (e.g., “Initially concerned…but ultimately satisfied”). In other words, BERT’s bidirectional contextual representations facilitate thorough semantic understanding by simultaneously analyzing both preceding and subsequent tokens, outperforming traditional unidirectional methods in capturing sentiment based on context.²⁷ In this research, we evaluated three deep learning approaches related to BERT for sentiment analysis: the original BERT model (BERT-Origin),²⁷ BERT combined with Long Short-Term Memory (BERT-LSTM),²⁸ and BERT integrated with Convolutional Neural Network (BERT-CNN).²⁹ Using our manually annotated dataset of over 9000 sentences, we trained, fine-tuned, and tested the candidate models. The BERT-LSTM model achieved the highest performance, with accuracy around 91.45% on test folds, surpassing BERT-Origin and BERT-CNN models as it effectively captured both the contextual representations from BERT and the sequential dependencies of LSTM layers.

Topic modeling

In this study, we also adopted the BERT algorithm for topic modeling. BERT’s pre-trained language model architecture makes topic modeling more effective at capturing contextual word relationships and semantic details.^27,30 Recent research also shows its improved performance in identifying coherent and emerging topics among varied document collections.³¹

After generating sentence embeddings, we applied the Uniform Manifold Approximation and Projection (UMAP) algorithm³² for dimensionality reduction, followed by Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) for clustering. To represent topics within the identified clusters, we adopted a bag-of-words approach, which computes the frequency of each word within a cluster. This method is particularly effective when combined with HDBSCAN, as it does not rely on predefined cluster structures, making it suitable for clusters with diverse shapes and densities. Furthermore, we refined the topic word importance using the class-based TF-IDF (C-TF-IDF) algorithm. Through this approach, each cluster is treated as a single document, and the importance of a word x within a cluster c is computed as:

\begin{align} W_{(x, c)} & = \mathrm{tf}_{(x, c)} \times \log \left( 1 + \frac{A}{f_{x}} \right) \end{align}

(2)

where tf_(x,c) represents the frequency of the word x in cluster c, f_x is the frequency of the word x across all clusters, and A represents the average number of words per cluster. The frequency of word x in each cluster c, tf_(x,c), is L1-normalized to account for size differences among clusters. Additionally, we adjusted the inverse document frequency term to reflect the average word count in each category, ensuring the value of the logarithmic term remains positive by adding 1, as suggested by Joachims.³³ In summary, combining BERT sentence embeddings with UMAP, HDBSCAN, and C-TF-IDF provides a robust framework for topic modeling. This BERT-based pipeline was selected over traditional topic modeling (e.g., LDA) due to its ability to handle overlapping topics and automatically identify emerging patient concerns, which are critical capabilities for evolving healthcare contexts.
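To make equation (2) concrete, the sketch below computes C-TF-IDF weights for three small clusters. The cluster names and tokens are invented for illustration, and the code assumes the tokens have already been cleaned; it follows the formula as stated above rather than any particular library implementation.

```python
import math
from collections import Counter

# Hypothetical clusters of already-cleaned sentence tokens (illustrative only).
clusters = {
    "prescriptions": ["refill", "pharmacy", "refill", "medication"],
    "lab_results": ["results", "portal", "results", "wait"],
    "scheduling": ["appointment", "wait", "phone", "appointment"],
}


def c_tf_idf(clusters: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """Weight each word per cluster as tf * log(1 + A / f_x), following equation (2)."""
    counts = {name: Counter(tokens) for name, tokens in clusters.items()}
    # f_x: frequency of word x across all clusters.
    total = Counter()
    for counter in counts.values():
        total.update(counter)
    # A: average number of words per cluster.
    avg_words = sum(len(tokens) for tokens in clusters.values()) / len(clusters)
    weights = {}
    for name, counter in counts.items():
        size = sum(counter.values())
        weights[name] = {
            # freq / size is the L1-normalized term frequency within the cluster.
            word: (freq / size) * math.log(1 + avg_words / total[word])
            for word, freq in counter.items()
        }
    return weights


weights = c_tf_idf(clusters)
for name, scores in weights.items():
    top_word = max(scores, key=scores.get)
    print(name, top_word, round(scores[top_word], 3))
```

In this toy data, "wait" appears in two clusters and therefore receives a lower weight than cluster-specific words with the same within-cluster frequency, which is the behavior the inverse-frequency term is meant to produce.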

Extracted features

Each patient’s comment sentiment was aggregated from individual sentences into a single metric and scaled to a range of 1 to 4, aligning with the scale of quantitative ratings to ensure coefficient comparability, where higher values indicate more positive sentiment and lower values indicate more negative sentiment. Each sentence was assigned to the topic with the highest probability based on the BERT output, and topic meanings were represented using keywords and representative sentences. Topics that did not align with those service attributes were labeled as new topics, and a binary variable, “mentions of new topic(s),” was created to indicate whether a comment included a sentence introducing a new theme. An interaction term between sentiment and the new topic indicator was constructed as an additional feature.
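The comment-level features described above can be sketched as follows. The text does not specify the aggregation rule or the raw score range, so this example assumes a simple mean over sentences and a min-max mapping from an assumed raw range of [-1, 1]; the topic labels and the `RATED_TOPICS` set are likewise hypothetical.

```python
from dataclasses import dataclass

# Assumed set of topics already covered by rated survey items (hypothetical labels).
RATED_TOPICS = {"listen by provider", "respect from staff", "ease of appointment"}


@dataclass
class Sentence:
    sentiment: float  # assumed raw model score in [-1, 1]
    topic: str        # highest-probability topic label for the sentence


def scale_to_rating(raw: float, lo: float = -1.0, hi: float = 1.0) -> float:
    """Min-max map a raw score from [lo, hi] onto the 1-4 rating scale."""
    return 1.0 + 3.0 * (raw - lo) / (hi - lo)


def comment_features(sentences: list[Sentence]) -> dict[str, float]:
    """Aggregate sentence-level outputs into comment-level features."""
    mean_raw = sum(s.sentiment for s in sentences) / len(sentences)  # assumed: mean
    mentions_new = int(any(s.topic not in RATED_TOPICS for s in sentences))
    return {"sentiment_scaled": scale_to_rating(mean_raw), "mentions_new_topic": mentions_new}


comment = [
    Sentence(sentiment=0.8, topic="listen by provider"),
    Sentence(sentiment=-0.4, topic="prescription management"),
]
features = comment_features(comment)
print(features)
```

Here the second sentence falls outside the rated topics, so the binary indicator is set to 1 even though the first sentence maps to an existing survey item.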

Integrated evaluations

To assess how quantitative ratings and qualitative comments jointly reflect patient satisfaction, we employed OLS regressions integrating both sets of measures. This approach enabled us to evaluate the additional explanatory power of variables derived from patient comments while controlling for quantitative ratings. Besides examining the significance and coefficients of individual predictors, we used likelihood ratio tests to assess the added value of textual features, including sentiment, mentions of new topics, and their interaction term. This approach also allowed us to examine the robustness of quantitative predictors when qualitative comment-derived features were included.

To examine the joint contribution of quantitative ratings and qualitative feedback to patient satisfaction, we extended the baseline regression model specified in equation (1) to incorporate features extracted from comments. The expanded model is formulated in equation (3):

\begin{align} y & = \beta_{0} + \sum_{i = 1}^{n} \alpha_{i} X_{i} + \beta_{1} X_{Sentiment} \\ & \quad + \beta_{2} X_{Mentions} + \beta_{3} (X_{Sentiment} \times X_{Mentions}) + \epsilon \end{align}

(3)

where X_Sentiment denotes the sentiment score derived from patient comments after scaling and mean-centering, X_Mentions represents a binary indicator of whether new topics are mentioned in the qualitative feedback, and the interaction term X_Sentiment × X_Mentions captures how the relationship between sentiment and satisfaction changes when a new topic is mentioned.
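The comparison between the baseline and expanded models can be sketched as a likelihood ratio test on nested OLS fits. This is an illustration on synthetic data, assuming NumPy is available; all coefficients are invented. For Gaussian linear models the test statistic reduces to n·ln(RSS_base / RSS_full), and because three terms are added, the p-value here uses the closed-form chi-square survival function for 3 degrees of freedom (valid for that case only).

```python
import math
import numpy as np


def rss(design: np.ndarray, y: np.ndarray) -> float:
    """Residual sum of squares of the least-squares fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


def chi2_sf_df3(x: float) -> float:
    """Survival function of the chi-square distribution with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)


# Synthetic stand-in data (not the study's data).
rng = np.random.default_rng(1)
n = 800
ratings = rng.integers(1, 5, size=(n, 2)).astype(float)  # two rated items, 1-4 scale
sentiment = rng.uniform(1.0, 4.0, size=n)                # scaled comment sentiment
sentiment_c = sentiment - sentiment.mean()               # mean-centered sentiment
mentions = rng.integers(0, 2, size=n).astype(float)      # new-topic indicator (0/1)
y = (2.0 + ratings @ np.array([0.5, 0.8]) + 0.4 * sentiment_c
     - 0.3 * mentions + 0.2 * sentiment_c * mentions + rng.normal(0.0, 1.0, size=n))

base = np.column_stack([np.ones(n), ratings])  # design matrix for equation (1)
full = np.column_stack([base, sentiment_c, mentions, sentiment_c * mentions])  # equation (3)

lr_stat = n * math.log(rss(base, y) / rss(full, y))
p_value = chi2_sf_df3(lr_stat)
print(round(lr_stat, 2), p_value)
```

A small p-value indicates that the three comment-derived terms jointly improve fit beyond the rated items alone; with a different number of added terms, a general chi-square routine would be needed instead of the 3-degree-of-freedom shortcut.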

Results

Quantitative ratings and patient satisfaction

This analysis examines responses that provide quantitative ratings of service attributes in the questionnaire. Table 3 presents the results of a linear regression analyzing the quantitative service attributes, highlighting several factors that significantly influence patient satisfaction. Notably, “trust for provider” emerges as the most influential predictor of patient satisfaction, demonstrating consistent significance across healthcare units that measured this item (coefficients: 0.718–1.055, p < 0.001), including Medical Practice, Medical Practice Magnet, Emergency Magnet, and Urgent Care. “Trust for nurses” exhibits an association with patient satisfaction in Medical Practice Magnet (coefficient: 0.208, p < 0.05) but is not statistically significant in Emergency Magnet or OP Rehabilitation Magnet.

Table 3.

Quantitatively rated questions across different healthcare units.

	Med prac	Med prac mag	Emergency mag	OP rehab	OP rehab mag	Urgent care	OP lab
Number of responses	14,509	2,138	1,673	4,095	1,813	1,702	4,264
Intercept	1.537***	.178	−2.038***	2.544***	2.015***	−1.359***	−1.081***
Ease of appointment	.119***	.046		.151***	.108**
Enough input		.774***	.607***	.678***	.639***		.706***
Explain by nurses		.068	−.107		−.086
Explain by provider			.086
Explain by staff			.114^.
Listen by nurses		.125	.577***		.368**
Listen by provider	.517***	.308**				.527***
Safety as priority	.206***	.126**	.213***	.345***	.160**	−.055	.891***
Talk about concerns		−.075	−.065		.079
Timely being seen			−.025
Respect from nurses		−.053	−.072		−.061
Respect from provider						.725***
Respect from staff	.174***	.174***	.501***	.651***	.708***	.554***	1.134***
Trust for nurses		.208*	.164		.066
Trust for provider	1.055***	.718***	1.019***			1.041***
Adjusted R²	.342	.384	.707	.292	.428	.543	.412

Note: Statistical significance: *** p < 0.001, ** p < 0.01, * p < 0.05, and ^. p < 0.1. Number of responses refers to the number of valid responses included in the linear regression analysis for each healthcare unit. Empty cells indicate that the corresponding quantitative survey items were not included in that particular survey pod.

Similarly, “enough input” is a consistently significant predictor across all healthcare settings that included this item (coefficients: 0.607–0.774, p < 0.001). Medical Practice and Urgent Care did not assess this item but instead measured “listen by provider”, which also demonstrates strong significance (coefficients: 0.517–0.527, p < 0.001). “Listen by nurses” was measured in Emergency Magnet (coefficient: 0.577, p < 0.001) and OP Rehabilitation Magnet (coefficient: 0.368, p < 0.01). While Medical Practice Magnet assessed both provider and nurse attentiveness, the results show that only provider attentiveness has a significant association with patient satisfaction (coefficient: 0.308, p < 0.01). The item “respect from provider” is significantly associated with patient satisfaction in Urgent Care (coefficient: 0.725, p < 0.001), whereas “respect from staff” shows strong positive effects across all units (coefficients: 0.174–1.134, p < 0.001). However, “respect from nurses” does not exhibit statistical significance in the Magnet units of Medical Practice, Emergency, or OP Rehabilitation. Additionally, “explain by provider, nurses, and/or staff” and “talk about concerns” do not show significant relationships with patient satisfaction in the Magnet healthcare units.

Among other factors, “prioritizing safety” maintains significant positive associations across most healthcare units (coefficients: 0.126–0.891, p < 0.001 – p < 0.01), except for Urgent Care. “Ease of appointment” shows weak but significant positive associations in Medical Practice, OP Rehabilitation, and OP Rehabilitation Magnet (coefficients: 0.108–0.151, p < 0.001 – p < 0.01), but is not significant in Medical Practice Magnet. “Timely being seen” also does not reach statistical significance in Emergency Magnet.

The adjusted R² ranges from 0.292 to 0.707, as shown in Table 3, with relatively lower variance explained in the OP Rehabilitation, Medical Practice, and Medical Practice Magnet units (adjusted R² < 0.4) and higher variance explained in Emergency Magnet (adjusted R² > 0.7).
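As a worked reading of Table 3, the fitted Medical Practice model can be written out from its reported coefficients. The subscript labels are shorthand introduced here for the five items rated in that unit (ease of appointment, listen by provider, safety as priority, respect from staff, and trust for provider); the arithmetic simply substitutes ratings into the fitted equation and is not an additional result.

\begin{align} \hat{y} & = 1.537 + 0.119 X_{Ease} + 0.517 X_{Listen} + 0.206 X_{Safety} \\ & \quad + 0.174 X_{Respect} + 1.055 X_{Trust} \end{align}

For a respondent who rates all five items at 4, the predicted score is 1.537 + 4 × 2.071 = 9.821. Lowering only the trust rating from 4 to 3 reduces the prediction by 1.055 to 8.766, whereas the same one-point drop in ease of appointment reduces it by only 0.119.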

Sentiment and topics in patient comments

For each patient comment in the survey, we extracted its sentiment and identified key topics. Each key topic could correspond to an existing service attribute in the quantitative rating items, or an emerging service attribute.

Table 4 presents the average NPS and the average sentiment scores for responses with comments, along with their standard deviations. The NPS for responses containing comments remains close to the overall NPS of each healthcare unit but tends to be slightly lower, with larger SDs, than the values in Table 2. The average Pearson correlation between the NPS and sentiment scores across the seven healthcare units is 0.27, with Emergency Magnet showing the highest correlation (0.47) and OP Rehabilitation Magnet the lowest (0.11).

Table 4.

Summary NPS and sentiment statistics for responses with comments.

Healthcare unit	N	Avg NPS (SD)	Avg sent_score (SD)	Correlation (NPS, sent_score)	Avg sent_scaled (SD)	Correlation (NPS, sent_scaled)
Med prac	8,449	9.4 (1.70)	2.3 (2.04)	0.22	2.3 (.76)	0.23
Med prac mag	1,087	9.4 (1.70)	2.0 (1.90)	0.21	2.2 (.74)	0.18
Emergency mag	896	7.9 (3.48)	1.3 (2.12)	0.47	2.4 (.76)	0.48
OP rehab	2,123	9.3 (1.80)	1.7 (1.64)	0.21	2.3 (.71)	0.21
OP rehab mag	888	9.4 (1.85)	1.7 (2.00)	0.11	2.3 (.71)	0.21
Urgent care	946	9.3 (1.99)	1.9 (1.80)	0.36	2.4 (.73)	0.38
OP lab	2,612	9.5 (1.54)	1.5 (1.34)	0.28	2.4 (.70)	0.27

*NPS ranges from 0 to 10. The sentiment score (sent_score) is obtained from the BERT-LSTM sentiment analysis and then scaled to a 1-4 range (sent_scaled). N refers to the number of responses containing a comment. The first correlation is between NPS and sent_score, and the second is between NPS and sent_scaled.

The topic modeling of patient comments reveals both shared and distinct themes across healthcare units. Table 5 highlights the five topics with the highest number of comment sentences. Common topics included patient perceptions of communication, reflected in themes such as “listen by doctor” and “respect from doctor, nurses, or staff”. “Appointment scheduling” also appeared as a recurring theme. Moreover, patients frequently commented on their overall visit experiences, highlighting their impressions of the hospital or the team. The focus areas also vary across healthcare units. As shown in Table 5, prescription management and treatment plans feature prominently in feedback from the Medical Practice and Medical Practice Magnet units. For Emergency Magnet and Urgent Care units, facility conditions and staff interactions are key concerns. In the OP Lab, feedback centers on results efficiency and test administration, while mammogram- and MRI-related topics feature prominently in OP Rehabilitation and OP Rehabilitation Magnet.

Table 5.

Hot topics across healthcare units.

Healthcare unit	Hot topic 1	Hot topic 2	Hot topic 3	Hot topic 4	Hot topic 5
Med prac	Listen by provider	Respect from nurses	Appointment scheduling	Prescription management	Respect from staff
Med prac mag	Listen by provider	Treatment plans	Appointment scheduling	Respect from doctor	Safety for COVID
Emergency mag	Respect from nurses & provider	Safety for COVID	Pain management	Facility & respect from staff	General experience
OP rehab	General experience	Respect from staff	Appointment scheduling	Mammogram	MRI
OP rehab mag	Listen by nurses & enough input	Appointment scheduling	Mammogram	Respect from staff	General experience
Urgent care	Respect from staff	Listen by doctor	General experience	Respect from provider & nurses	Room & facility
OP lab	General experience	Results efficiency	Appointment scheduling	Test administration	Respect from staff

As the above results show, some patient comments address topics beyond the existing healthcare service attributes. Table 6 further highlights the most prominent emerging themes. These new topics represent patient-reported attributes of care that either (1) are entirely absent from the quantitatively rated attributes, or (2) provide a nuanced perspective that elaborates on those established service attributes. In Medical Practice and Medical Practice Magnet, patients frequently mentioned specific treatments, telehealth or virtual visits, and wait times or wait experiences. In Emergency Magnet and Urgent Care, concerns included facility conditions, wait experience, diagnostic procedures, and pain or injury management. In OP Rehabilitation and OP Rehabilitation Magnet, topics of examination and treatment experience stood out, while the OP Lab feedback emphasized efficiency of the results, test administration, and process management.

Table 6.

New topics across healthcare units.

Healthcare unit	New topic 1	New topic 2	New topic 3	New topic 4	New topic 5
Med prac	Prescription management	Blood tests and exam results	Surgery care	Wait time	Telehealth visit
Med prac mag	Treatment plans	Virtual visit	Team of audiology	Symptoms of hearing	Wait experience & parking
Emergency mag	Pain management	Wait room experience	Breathing symptoms & diagnostics	Discharge experience	Gastrointestinal symptoms & diagnostics
OP rehab	Mammogram	MRI	Follow-up call & bill	Radiology experience	Ultrasound technician
OP rehab mag	Mammogram	Treatment experience	MRI	Pain management	Stress tests accommodation
Urgent care	Room & facility	Prescription management	Wait, app, and text	Xray	Injury management
OP lab	Results efficiency	Test administration	Drive through experience	Process management	Wait time

Integrating quantitative service attributes and comment features for patient satisfaction

To examine how comment-derived features and quantitative ratings jointly influence patient satisfaction, Table 7 shows the results of regressions that integrate both sets of variables.

Table 7.

The integrated evaluations across different healthcare units.

	Med prac	Med prac mag	Emergency mag	OP rehab	OP rehab mag	Urgent care	OP lab
Number of responses	6,848	876	720	1,765	731	780	2,125
Sentiment	.135***	.115	.014	.269***	.135	.189**	.247***
Mentions	−.249***	−.233**	−.131	−.206**	−.136	−.179*	−.06
Mentions*Sentiment	.194***	.087	.351*	−.191*	−.018	.169^.	−.088
Intercept	−.402**	−3.392***	−2.805***	−0.364	−.862*	−3.13***	−1.615**
Ease of appointment	.111***	.122*		.139***	.151**
Enough input		.345**	.669***	.827***	.793***		.695***
Explain by nurses		.196	.025		−.324*
Explain by provider			.152
Explain by staff			.067
Listen by nurses		.094	.319*		.112
Listen by provider	.584***	.777***				.845***
Safety as priority	.363***	.259***	.384***	.594***	.426***	.182*	.872***
Talk about concerns		−.255*	−.027		.256
Timely being seen			−.036
Respect from nurses		.132	−.199		.543***
Respect from provider						.579***
Respect from staff	.336***	.139	.472***	.863***	.964***	.498***	1.32***
Trust for nurses		.567***	.352*		−.195
Trust for provider	1.186***	.958***	1.009***			1.15***
Adjusted R²	.485	.533	.807	.47	.631	.726	.485

*Statistical significance: p < 0.001***, p < 0.01**, p < 0.05*, and p < 0.1 (shown as ^.). ‘Mentions’ refers to ‘Mentions of new topic(s)’. Sentiment was scaled and mean-centered to reduce multicollinearity, given the inclusion of the interaction term in the models.
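To make the structure of the integrated model concrete, the sketch below fits a linear regression with mean-centered sentiment, a new-topic indicator, their interaction, and one rated attribute. It is only an illustration under stated assumptions: the data are simulated and noise-free, the helper `ols` and all variable names are hypothetical, and the "true" coefficients are borrowed from the Medical Practice column of Table 7 purely as inputs to the simulation. The source states only that linear regressions were used, so the estimation code here is not the authors' implementation.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated predictors (hypothetical, not study data)
rng = random.Random(0)
n = 500
sentiment = [rng.uniform(1, 4) for _ in range(n)]   # scaled sentiment, 1-4
mentions = [rng.randint(0, 1) for _ in range(n)]    # 1 if a new topic is mentioned
trust = [rng.randint(1, 5) for _ in range(n)]       # a rated attribute (assumed 1-5 scale)

# Mean-center sentiment before forming the interaction term
mean_sent = sum(sentiment) / n
sent_c = [s - mean_sent for s in sentiment]

# Order: intercept, sentiment, mentions, mentions*sentiment, trust for provider
true_beta = [-0.402, 0.135, -0.249, 0.194, 1.186]

X = [[1.0, s, m, s * m, t] for s, m, t in zip(sent_c, mentions, trust)]
y = [sum(b * x for b, x in zip(true_beta, row)) for row in X]  # noise-free outcome

beta_hat = ols(X, y)

# The interaction changes the sentiment slope when a new topic is mentioned
slope_no_new_topic = beta_hat[1]
slope_new_topic = beta_hat[1] + beta_hat[3]
```

With a positive interaction coefficient, the sentiment slope for comments that mention a new topic is the sum of the sentiment and interaction coefficients, which is how the moderation effects in the table can be read.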

Based on the significance and coefficients of the quantitatively rated attributes shown in Table 7, and controlling for comment-derived features, “trust for provider” remains significant and is the most important factor across the four healthcare units that measured it (coefficients: 0.958–1.186, p < 0.001). The significance of “trust for nurses” becomes stronger for Medical Practice Magnet (coefficient: 0.567, p < 0.001), and the item also becomes significant for Emergency Magnet (coefficient: 0.280, p < 0.05). Similar to the results based on quantitatively rated service attributes in completed survey responses in Table 3, “listen by provider” and “enough input” remain significant and important. For the Medical Practice Magnet unit, however, the influence of “enough input” weakens, with its coefficient declining from 0.774 (p < 0.001) to 0.345 (p < 0.01), while the coefficient of “listen by provider” increases from 0.308 (p < 0.01) to 0.777 (p < 0.001). In this subset of responses containing comments, the item “listen by nurses” remains significant only for Emergency Magnet, not for Medical Practice Magnet or OP Rehab Magnet. The item “respect from staff and/or provider” is consistently statistically significant across healthcare units (coefficients: 0.336–1.32, p < 0.001) except for Medical Practice Magnet. The previously insignificant item “respect from nurses” becomes significant in OP Rehab Magnet, with its coefficient increasing from −0.61 to 0.543 (p < 0.001). However, for OP Rehab Magnet, the coefficient of “explain by nurses” becomes more negative and larger in absolute value (from −0.086 to −0.324, p < 0.05), indicating a stronger effect. The item “explain by provider” stays insignificant for Emergency Magnet. “Safety as priority” remains significant across units (coefficients: 0.259–0.872, p < 0.001) and becomes significant for Urgent Care (coefficient: 0.182, p < 0.05) as well. “Ease of appointment” stays weakly positively associated and also becomes significant for Medical Practice Magnet. “Timely being seen” stays insignificant for Emergency Magnet.

Among the features extracted from patient comments, sentiment exhibits a positive association with patient satisfaction in all units, reaching statistical significance in Medical Practice, OP Rehabilitation, Urgent Care, and OP Lab, but not in the Magnet units. The feature “mentions of new topic(s)” shows a statistically significant negative association with patient satisfaction in four units: Medical Practice, Medical Practice Magnet, OP Rehabilitation, and Urgent Care (coefficients between −0.249 and −0.179, p < 0.001 to p < 0.05). The interaction term between sentiment and new topic mentions is significant in three units, but the direction of moderation varies by healthcare setting. Medical Practice (coefficient: 0.194, p < 0.001) and Emergency Magnet (coefficient: 0.351, p < 0.05) show significantly positive interactions. Urgent Care also shows a positive interaction effect with marginal significance (coefficient: 0.169, p < 0.1). On the other hand, OP Rehab (coefficient: −0.191, p < 0.05) shows a significant negative moderation effect, indicating that the influence of sentiment on satisfaction is weaker when new topics are present. For OP Rehab Magnet (coefficient: −0.018) and OP Lab (coefficient: −0.088), the negative moderation is small and statistically insignificant.

Table 8 presents the model fit comparisons across healthcare units. The model using all responses ($R_{Quan}^{2}$) shows an average variance explained of 44.4%, while the model using only responses that provided comments ($R_{Quan - Comment}^{2}$) explains an average variance of 58.4%, reflecting an average increase of 14 percentage points across healthcare units. The overall model incorporating both quantitative ratings and comment-derived features ($R_{Both - Comment}^{2}$) shows an additional average increase of 0.7 percentage points in model fit compared to the model using comment responses. While the Likelihood Ratio (LR) Test confirms that the inclusion of comment-derived features significantly improves model fit in most units (p < 0.01), an exception is observed in OP Rehabilitation Magnet. Note that a Variance Inflation Factor (VIF) assessment confirmed that multicollinearity was not a significant concern among predictors in the overall model, with all VIF values remaining below 10. As additional context, the standalone model using only comment-derived features ($R_{Comment}^{2}$) explains between 5.8% (in Medical Practice Magnet) and 30.4% (in Emergency Magnet) of the variance in patient satisfaction, indicating variability in how much unstructured feedback alone can predict outcomes. The linear regression models, using either features extracted from comments alone or quantitative service attributes only, are presented in Appendix Table A.1 based on the subset of responses containing comments.
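For readers unfamiliar with the likelihood ratio test mentioned above, the following is a minimal sketch for two nested Gaussian linear models fitted to the same responses. It assumes the standard statistic n · ln(RSS_restricted / RSS_full) and, as a further assumption, three added parameters (sentiment, mentions, and their interaction), so the reference distribution is chi-square with 3 degrees of freedom. The source does not state the exact specification, the function names are hypothetical, and the residual sums of squares are made-up inputs rather than study values.

```python
import math

def lr_statistic(rss_restricted, rss_full, n_obs):
    """LR statistic for nested Gaussian linear models fitted on the same n_obs rows."""
    if rss_full <= 0 or rss_restricted < rss_full:
        raise ValueError("expected 0 < rss_full <= rss_restricted")
    return n_obs * math.log(rss_restricted / rss_full)

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Hypothetical residual sums of squares, NOT taken from the study
n_obs = 1000
rss_ratings_only = 520.0      # model with rated attributes only
rss_with_comments = 505.0     # model that adds the three comment-derived terms

lr = lr_statistic(rss_ratings_only, rss_with_comments, n_obs)
p_value = chi2_sf_df3(lr)
print(f"LR = {lr:.2f}, p = {p_value:.2g}")
```

The closed-form tail probability is specific to 3 degrees of freedom; a different number of added parameters would need the general chi-square distribution from a statistics library.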

Table 8.

Model fit without and with extracted features from comments.

Healthcare unit	$R_{Quan}^{2}$	$R_{Quan - Comment}^{2}$	$R_{Both - Comment}^{2}$	LR test (χ²)	$R_{Comment}^{2}$
Medical prac	.342	.471	.485	287.01***	.106
Medical prac mag	.384	.528	.533	18.184**	.058
Emergency mag	.707	.804	.807	33.127**	.304
OP rehab	.292	.462	.470	5.816***	.064
OP rehab mag	.428	.629	.631	7.867	.043
Urgent care	.543	.717	.726	30.007***	.187
OP lab	.412	.478	.485	40.497***	.086
Average	.444	.584	.591	–	.121
Range	[.292, .707]	[.462, .804]	[.47, .807]	–	[.058, .304]

*Statistical significance: p < 0.001*** and p < 0.01**. The Quan model refers to regressions in Table 3 using only quantitative service attributes from the complete sample; the Quan-Comment model refers to regressions in Table A.1 using only quantitative service ratings from responses that include comments; the Both-Comment model refers to regressions in Table 7 incorporating both quantitative service attributes and comment-extracted features from responses with comments. The LR test compares the standalone model using comment-extracted features in Table A.1 with the Both-Comment model in Table 7. The Comment model refers to regressions in Table A.1 using only comment-extracted features from responses with comments.
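The averages reported for Table 8 can be recomputed directly from its rows. The short sketch below copies the three R² columns from the table and derives the average fit of each model along with the two average gains in percentage points; the dictionary and variable names are illustrative only.

```python
# R² values copied from Table 8: (Quan, Quan-Comment, Both-Comment)
r2_by_unit = {
    "Medical prac": (0.342, 0.471, 0.485),
    "Medical prac mag": (0.384, 0.528, 0.533),
    "Emergency mag": (0.707, 0.804, 0.807),
    "OP rehab": (0.292, 0.462, 0.470),
    "OP rehab mag": (0.428, 0.629, 0.631),
    "Urgent care": (0.543, 0.717, 0.726),
    "OP lab": (0.412, 0.478, 0.485),
}

n_units = len(r2_by_unit)
avg_quan = sum(v[0] for v in r2_by_unit.values()) / n_units
avg_quan_comment = sum(v[1] for v in r2_by_unit.values()) / n_units
avg_both = sum(v[2] for v in r2_by_unit.values()) / n_units

# Average gains, expressed in percentage points
gain_commenting_subset = (avg_quan_comment - avg_quan) * 100
gain_comment_features = (avg_both - avg_quan_comment) * 100

print(f"average R²: {avg_quan:.3f}, {avg_quan_comment:.3f}, {avg_both:.3f}")
print(f"gain from restricting to commenters: {gain_commenting_subset:.1f} points")
print(f"gain from adding comment features: {gain_comment_features:.1f} points")
```

Most of the improvement comes from restricting the sample to respondents who left comments, while the comment-derived features add a much smaller increment on top of that.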

Discussion

This study yields three primary findings. First, the analysis of quantitative ratings of health service attributes indicates that trust in the provider, provider–patient communication, and operational aspects such as accessibility and safety are central to patient satisfaction across all units. However, the relative importance of these factors varies across units, underscoring the importance of unit-specific evaluation and strategy. Additionally, ratings of service attributes do not adequately explain patient satisfaction, particularly for units whose surveys rated fewer service attribute items or that delivered non-standardized care services not captured in the post-care patient satisfaction surveys. This inadequacy points to the need for additional measurement dimensions beyond existing quantitatively rated items to more comprehensively capture patients’ experiences and drivers of satisfaction.

Second, patient comments in response to open-ended questions provide valuable complementary insights. Frequently mentioned topics not only enrich the interpretation of quantitative ratings of service attributes by adding contextual depth but also suggest new areas of concern not captured by quantitatively rated items. Findings from the analysis of patient comments highlight distinct and emerging service priorities across healthcare units and underscore the importance of integrating narrative feedback into the patient satisfaction assessment process.

Third, mentions of new service attributes in patient comments are generally associated with lower satisfaction scores and can moderate the relationship between comment sentiment and satisfaction rating, with patterns varying across healthcare settings. This suggests that patient concerns raised in the comments and not already captured in the quantitatively rated items may reflect unmet expectations or service gaps that warrant closer attention. Additionally, patients who chose to leave comments tended to provide more informative quantitative ratings, as reflected in relatively higher explained variance in patient satisfaction than those who did not leave comments. This pattern suggests deeper engagement with their care experience may be fruitful and points to the potential value of paying greater attention to the responses of this subgroup in future patient experience analysis.

The relationship between rated service attributes and patient satisfaction

Trust in providers was identified as the most prominent factor associated with patient satisfaction based on the analysis of patient ratings of service attributes specified in the survey. This result highlights the uniqueness of the healthcare context, where treatment outcomes greatly depend on patients’ adherence to treatments and their openness about their health and behavior. Revealing sensitive information about matters such as mental health, substance use, and sexual health requires emotional safety (since patients may fear being judged or embarrassed), confidence in confidentiality, assurance of privacy, and confidence in the diagnoses and treatments. Both Ardito and Rabellino³⁴ and Hatcher and Barends³⁵ emphasize that the therapeutic alliance, a collaborative, trusting relationship between provider and patient, is essential for fostering openness and engagement in therapy, which is viewed as a prerequisite for positive health outcomes.

Alongside patient trust in healthcare providers, communication was identified as another prominent factor associated with patient satisfaction, based on the analysis of patient ratings of service attributes specified in the survey. This finding aligns with prior research by Hays and Skootsky³⁶ and Mehra and Mishra,³⁷ which emphasizes the role of provider communication, empathy, and sufficient consultation time in shaping patient satisfaction. Communication-related service attributes, including adequate input, active listening, and respectful behavior from the staff and provider, consistently demonstrated statistical significance across healthcare units. These patterns suggest that patients place considerable importance on relational and interpersonal dimensions of care. These communication behaviors are central tenets of the Patient-Centered Communication framework, which frames communication not as a soft skill but as a set of core clinical functions.³⁸ Empathy, active listening, and compassion expressed through communication help patients feel cared for, not just medically, but personally. Communication also helps patients develop an understanding of diagnoses, treatments, risks, and options, and provides needed emotional support, all of which facilitate shared decision-making in patients’ own treatment plans. Clear communication builds confidence in the treatment plans, which leads to greater trust and improved satisfaction with the provider.

Our finding that trust and communication represent the strongest drivers of patient satisfaction is consistent with the broad importance placed on these interpersonal dimensions in the context of digital healthcare devices.³⁹ In addition, our study extends existing research by analyzing large-scale, multi-modal patient feedback to reveal how these factors actually appear in patient experiences and intersect with changing service priorities, providing insights not fully captured by traditional surveys or reviews.

Given that data were collected during the COVID-19 pandemic, “safety precautions” was consistently statistically significant across most units, reflecting the enduring influence of pandemic-related concerns on patient expectations. This finding supports previous studies such as Chekijian et al.,⁴⁰ which identified personal safety and perceived risk of virus exposure as notable concerns in patient experience. Our study further shows that the impact of safety prioritization is particularly evident in OP Lab (coefficient: 0.891, p < 0.001), while it is not significant in Urgent Care based on the overall responses. These variations likely reflect differing sensitivities to safety concerns based on patient acuity and procedural risk.

The item “appointment scheduling” was found to be a significant factor across multiple healthcare units, highlighting the importance of care access in patient satisfaction. This observation is consistent with findings from Atinga et al.,⁴¹ which highlighted access as a recurring concern in patient satisfaction. Notably, appointment scheduling was not found to be a significant concern in Medical Practice Magnet, suggesting possible operational differences in scheduling systems or patient flow management.

The total variance explained by quantitatively rated service attributes shown in Table 3 and Table 8 suggests that additional dimensions may improve the ability to predict patient satisfaction. Across all healthcare units, 15 unique items were used to measure service attributes. Most items were related to communication, with other items covering service accessibility and personal safety. However, OP Rehabilitation measured only four items, explaining just 29.2% of the variance, indicating potential limitations in capturing the full range of patient experiences. Furthermore, Medical Practice and Medical Practice Magnet measured five and twelve attributes, respectively, yet explained only 34.2% and 38.4% of the variance in patient satisfaction. In contrast, Urgent Care (five items) and Emergency Magnet (twelve items) explain 54.5% and 70.7% of the variance, respectively, suggesting that more complex service processes or situational factors in the former units may require additional measurement dimensions beyond existing structured ratings to better capture patient satisfaction in these settings.

The value of patient feedback in response to open-ended questions

Contextualizing core service attributes and uncovering blind spots in healthcare services

The text analysis of patient comments reinforces the importance of key factors identified in the quantitative ratings part of the survey and adds contextual depth to ratings. Frequently mentioned topics, such as “listening by providers,” “respect from staff,” and operational attributes like “appointment scheduling” and “safety precautions,” support the significance of those service attributes in patient experience. Patient comments in response to open-ended questions also shed light on why patients held certain perceptions, offering explanations that the rated questionnaire items alone cannot fully capture. Take “listen by provider,” for instance: while ratings by patients reflect the extent to which patients felt heard, patient comments provide insight into what patients wanted providers to hear (such as specific symptoms or concerns) and how their experiences were shaped when these were overlooked. For example, one patient noted that their provider listened attentively to their symptoms, while another felt dismissed when the provider ordered routine tests without engaging in a meaningful discussion. These descriptions deepen the understanding of communication quality, revealing variation in patient expectations and perceptions within high- or low-scoring areas.

Patient comments also revealed emerging and unit-specific priorities not captured by the quantitatively rated survey items. One salient area relates to diagnostics and treatments specific to each healthcare setting, for example, blood tests and medications in Medical Practice, and symptom checks and pain management in Emergency care. Patients also emphasized operational factors influencing their experience, including care process efficiency and the physical environment. Comments identified concerns such as wait times, result turnaround, discharge procedures, waiting room conditions, and parking, underscoring the need for patient-centered improvements beyond clinical interactions. Digital health services, including telehealth and electronic communication channels, were frequently mentioned in patient feedback from Medical Practice and Urgent Care, highlighting the emerging role of accessible, technology-supported care in shaping perceptions and experiences in these ambulatory settings. Notably, patients in Emergency Magnet and Urgent Care commented on the facility, an aspect not included in their quantitatively rated survey items, yet warranting further consideration. While sharing certain common themes, specific concerns varied across healthcare units. From “telehealth visits” in Medical Practice to the “drive-through experience” in OP Lab, these setting-specific concerns reflect the distinct characteristics of each care context and would likely remain hidden without the analysis of patient comments to open-ended questions.

Taken together, the topics identified through qualitative comments both contextualize patient ratings of service attributes and reveal critical blind spots. Our findings confirm and extend the work of Khanbhai et al.,¹¹ underscoring the evolving dimensions of patient experience and the meaningful variation across healthcare settings.

Enhancing explanatory power through integrating comments

Beyond providing descriptive insights, patient comments demonstrate their complementary value in explaining patient satisfaction while controlling for quantitative ratings. Mentions of new topics in patient comments were found to be negatively associated with patient satisfaction scores, and this association was statistically significant for four of seven units. This pattern may be partially explained by the direct and spillover effects observed in Xu,⁴² which showed that giving lower ratings of service attributes increased the likelihood of patients writing detailed comments about those attributes, and the lower rating of one service attribute also prompted comments expressing other concerns. Extending this logic, patients who introduced new topics in their qualitative comments could be raising concerns that were not addressed by the quantitatively rated survey items, and these unmeasured issues might have contributed to lower overall satisfaction scores.

Sentiment itself was found to be positively associated with patient satisfaction and was statistically significant across all non-Magnet units. This supports prior research linking sentiment in patient narratives to satisfaction⁴³,⁴⁴ but also underscores its explanatory value in settings without formal nursing excellence structures, even when used alongside quantitative ratings. Furthermore, mentions of new topics moderated the relationship between sentiment and patient satisfaction in certain healthcare settings. More specifically, when comments introduced topics not measured by the quantitative survey items, the influence of sentiment on satisfaction was generally amplified, except in OP Rehabilitation (coefficient: −0.191, p < 0.05), where the association weakened. This negative moderation effect in rehabilitation settings may be attributed to the nature of their services, which involve repeated visits and sustained patient–provider interactions.⁴⁵ In such contexts, patient satisfaction may rely more on perceived care quality, recovery progress, and staff rapport than on newly introduced service attributes. This finding extends the scope of Chatterjee et al.⁴⁶ from healthcare products to services and suggests that mentions of new topic(s) may moderate the sentiment–satisfaction relationship, with their effects varying by care setting.

Patient comments further elucidated the association between quantitative ratings and satisfaction. Patients who left comments appeared to be more engaged and expressed stronger or more consistent opinions. After controlling for extracted features from comments, several rated service attributes in the survey (particularly those related to trust and communication, such as “trust in provider,” “enough input,” “listening by provider,” and “respect from provider”) remained statistically significant across both the overall and commenting-group models. The consistency of these associations suggests that the rated service attributes in the survey represent robust dimensions of patient experience, even after accounting for additional explanatory content expressed in qualitative comments. Notably, “trust in provider” consistently emerged as the strongest predictor across all four units where it was measured, with high and stable coefficients, reinforcing its central role. “Listening by provider” showed slightly higher coefficients in the commenting group across multiple units, suggesting its heightened salience among patients who chose to elaborate on their care experiences. Together, these findings underscore the importance of provider trust and communication as foundational components of patient experience that persist across levels of engagement and feedback formats.

Taking comments into account shifted the importance of quantitatively rated service attributes, suggesting that patients who left comments might have weighed certain elements of care differently or that comment-derived features partially explained variance captured by the models that considered only quantitatively rated items. In Medical Practice Magnet, for example, “trust in nurses” became more strongly associated with satisfaction scores, while “respect from staff” lost significance. This change may reflect patients’ more refined differentiation between general courtesy and interpersonal trust, particularly among those motivated to comment. “Ease of appointment” also emerged as significant in this setting, potentially indicating that logistical barriers or access issues became more salient to those who took time to share detailed feedback. Shifts in the importance of “enough input” and “listening by provider” in the same unit may further reflect the expressive nature of the commenting group, suggesting that engaged patients were more attentive to whether their voices were heard and their perspectives genuinely considered. In Emergency Magnet, “trust in nurses” became newly significant, while “listening by nurses” declined in importance. This may indicate a focus on confidence in clinical judgment over relational communication in high-pressure healthcare settings. Meanwhile, in OP Rehabilitation and Testing, all four surveyed service attributes showed stronger coefficients while remaining statistically significant. In the OP Rehabilitation Magnet unit, “respect from nurses” gained significance as “listening by nurses” diminished, suggesting a potential overlap in how these communication-related items were interpreted by patients in long-term or follow-up care contexts. In Urgent Care, “safety as a priority” became significant in the commenting group, possibly highlighting their sensitivity to personal safety in fast-paced outpatient settings.

Interestingly, the items “explanation by nurses, staff, and/or provider” and “feeling comfortable talking about concerns” were generally not significantly associated with the satisfaction score in rated survey data. However, when integrated with patient comments, these items displayed a small but statistically significant negative relationship in the model. This pattern may suggest that among patients who provided comments, provider explanations alone—particularly when not paired with active listening or respectful engagement—were not consistently viewed as enhancing their care experience. This is because provider explanations may sometimes be perceived as insufficient or formulaic, especially when they fail to directly address the patient’s individual concerns. This highlights the importance of context-sensitive communication, where being heard and actively involved may weigh more heavily in shaping overall impressions than simply receiving care information.

The significant Likelihood Ratio Test across most units, alongside a slight increase in R² (Table 8), suggests that the extracted features contribute to explaining patient satisfaction but do not drastically enhance the variance explained. However, the R² of the model based solely on rated service attributes increases more substantially when moving from the full dataset (n = 31,054) to the subset of responses with both ratings and comments (n = 17,036). This indicates that ratings of service attributes in the survey have stronger explanatory power among respondents who provided comments. These results highlight the strategic value of analyzing responses that include qualitative comments, as they offer clearer insights into patient priorities and strengthen the interpretability of rated questionnaire items among more engaged respondents.

Taken together, these findings confirm and extend prior work suggesting that patient comments serve multiple functions in healthcare quality evaluation.⁴⁷,⁴⁸ They provide rating responses with additional context, reveal emotional dimensions of care experiences, surface new concerns not captured by rated survey items, and enhance the explanatory power of models predicting patient satisfaction. Integrating comment-derived variables transforms not only the model fit but also the interpretive landscape, helping to distinguish between patient concerns that are robust and those that shift with engagement level or care setting. These patterns reveal the limitations of relying solely on quantitatively rated surveys in capturing nuanced or emotionally charged experiences and reinforce the value of open-ended questions in advancing a more complete understanding of patient experience.

Implications for healthcare service measurement and patient experience improvement

This study offers several practical and research implications for healthcare questionnaire design, patient experience improvement, and patient feedback analysis. Our findings first highlight opportunities to enhance questionnaire design in capturing patient experience, particularly in units where quantitatively rated survey items explain only a modest proportion of the variance in patient satisfaction. To address this gap, we recommend strengthening the measurement of core dimensions (such as trust in providers and nurses, communication quality, and respect), which consistently demonstrate strong associations with patient satisfaction. For example, the Medical Practice, OP Rehabilitation and Testing, and OP Lab units currently include only three to five service attribute items, with variance explained ranging from less than 35% to 50% in the overall and commenting-group models, respectively. These limitations suggest a need to more comprehensively capture core aspects of patient experience and to broaden the scope of quantitatively rated items where appropriate.

Beyond clearly measured core healthcare service attributes, survey instruments should remain adaptable and incorporate new dimensions identified through qualitative feedback to capture evolving and context-specific patient concerns. Themes such as diagnostic and treatment experiences, as well as operational factors, were frequently raised in comments. Since these emerging topics were found to negatively impact patient satisfaction when unaddressed, they should be considered for inclusion in quantitatively rated questionnaire items. For example, questionnaire design may incorporate satisfaction measures related to telehealth in Medical Practice and Urgent Care, or drive-through services and result efficiency in OP Lab. The integration of these evolving and unit-specific dimensions into surveys would enhance sensitivity to local concerns, increase the actionability of patient feedback, and deepen cross-unit insights, enabling health systems to better align with patient needs and expectations.

Our findings also suggest a tiered approach for healthcare administrators and practitioners aiming to improve patient experience and enhance patient satisfaction. First, healthcare organizations should prioritize resource allocation and team-based training focused on communication, access to care, and safety precautions—areas consistently shown to be significant in structured ratings and frequently highlighted in open-ended comments. Second, practitioners should regularly review and analyze patient comments to proactively identify emerging patient needs and emotionally charged aspects of care, enabling more targeted and responsive improvement efforts. For example, in units where the interaction between sentiment and newly raised topics is positive—such as Medical Practice and Emergency Magnet, where patient interactions are often time-limited or high-pressure—addressing novel concerns through emotionally positive engagement strategies may amplify patient satisfaction. Additionally, administrators should pay particular attention to feedback from patients who leave comments, as this subgroup typically demonstrates deeper engagement and provides more informative evaluations.

From a research perspective, our findings recommend a multi-form data analysis approach that combines quantitative and qualitative patient feedback on health services to accurately capture patient experience and its link to satisfaction. The variations in the importance of determinants and in prevalent themes across healthcare units underscore the necessity of unit-specific survey design and contextualized comparative analysis. Additionally, our study identifies a previously unexplored negative spillover effect, wherein mentions of new topics in comments often correspond with lower satisfaction scores and can moderate sentiment-satisfaction relationships depending on departmental service settings. Furthermore, our use of deep learning methods (BERT-based models) illustrates the effectiveness of advanced text analytics in identifying nuanced, emerging themes from qualitative comments, insights often missed by conventional text analysis methods. Researchers may also consider the role of commenting behavior and examine how it influences the importance and explanatory value of quantitatively rated service attributes in predicting service outcomes. Overall, we demonstrate that the value of patient comments should not be judged solely by incremental R² increases but by their ability to provide rich contextual insights that enhance the interpretation of quantitative measures, identify evolving and unit-specific patient concerns, and strengthen the clarity of associations with outcomes of interest.
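To make the comparison of rated items and comment features concrete, the following is a minimal sketch, not the authors' code, that fits ordinary least squares models on synthetic data and compares adjusted R² for a ratings-only model against a model that adds sentiment, mentions of new topics, and their interaction. All variable names, scales, and coefficient values here are hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_ols(X, y):
    """Fit OLS with an intercept; return (coefficients, adjusted R^2)."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    k = design.shape[1] - 1  # number of predictors, excluding the intercept
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return beta, adj_r2


# Synthetic data (assumption: none of these values come from the study).
rng = np.random.default_rng(0)
n = 2000
trust = rng.integers(1, 6, n).astype(float)     # hypothetical 1-5 rating item
sentiment = rng.uniform(-1.0, 1.0, n)           # hypothetical sentiment score
mentions = rng.integers(0, 2, n).astype(float)  # 1 if a comment raises a new topic

satisfaction = (
    5.0
    + 0.8 * trust
    + 0.5 * sentiment
    - 0.6 * mentions
    + 0.4 * mentions * sentiment
    + rng.normal(0.0, 1.0, n)
)

# Model 1: rated item only. Model 2: rated item plus comment-derived features.
ratings_only = trust.reshape(-1, 1)
combined = np.column_stack([trust, sentiment, mentions, mentions * sentiment])

_, adj_ratings = fit_ols(ratings_only, satisfaction)
beta, adj_combined = fit_ols(combined, satisfaction)

print(f"Adjusted R^2, ratings only: {adj_ratings:.3f}")
print(f"Adjusted R^2, ratings + comment features: {adj_combined:.3f}")
print("Coefficients (intercept, trust, sentiment, mentions, interaction):")
print(np.round(beta, 3))
```

In this sketch the interaction coefficient is the moderation term: a positive value means sentiment relates more strongly to satisfaction when a new topic is mentioned. A gain in adjusted R² is only one way to assess comment features, in line with the point above about contextual insight.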

Limitations and future work

Despite its contributions, this study has three primary limitations. First, the analyses are based on data from a single month (December 2020), which limits the ability to capture temporal variation or evolving trends in patient feedback. A longitudinal study using data across a broader time span would offer a more comprehensive view of shifting patient concerns. Second, we were unable to assess the association of each identified new topic individually due to the exploratory nature of topic modeling and the unequal distribution of topic occurrences, which limited the feasibility of topic-level statistical analysis. Future research can investigate these nuanced associations further to refine measurement and intervention strategies. Third, while the multi-modal approach effectively uncovers emerging patient concerns, the findings may be influenced by the organizational structures and patient demographics of the participating healthcare units. This framework could be applied across multiple institutions or geographic regions to evaluate the robustness and generalizability of the findings.

Furthermore, while our deep learning model is highly scalable for discovering service aspects of concern, the subsequent aggregation and evaluation of qualitative information can also be viewed as a Group Decision Making (GDM) problem. A promising avenue for future work is a hybrid approach in which a panel of healthcare practitioners acts as experts to confirm and prioritize the emerging themes identified by the model. For example, Ji et al. (2021) introduce a biobjective optimization model that balances consensus and confidence, thereby enhancing the quality of GDM and the effectiveness of service resource allocation.⁴⁹

Beyond future research related to our current limitations, another direction of work could explore how artificial intelligence (AI)-driven solutions enhance patient satisfaction by optimizing service delivery processes, drawing on the theoretical framework proposed by Chiang⁵⁰ for interpreting AI applications in service operations. For example, the implementation of AI-enabled communication platforms may enable healthcare providers to offer more personalized and empathetic interactions with patients while simultaneously improving operational efficiency.

Additionally, future research could adopt a systematic methodology to enhance patient satisfaction, grounded in the unified service system framework proposed by Wang et al.⁵¹ Specifically, leveraging work domain analysis as outlined by Wang et al.⁵² would enable a comprehensive mapping of all variables associated with determinant factors such as trust and communication. This process should include an examination of organizational structures, technology interfaces, staff training programs, and environmental factors that collectively shape trust-building and communication effectiveness within healthcare service systems.

Conclusion

Given the impact of ongoing technological advancements, this study examines how patient concerns in healthcare services may be shifting and how ratings of questionnaire items and comments in response to open-ended questions jointly relate to patient satisfaction. Using deep learning and regression methods, we identified key drivers—such as trust and communication—that consistently influence patient satisfaction, while also revealing emerging, unit-specific concerns. Mentions of new aspects tend to lower satisfaction scores and may moderate the sentiment–satisfaction relationship depending on the healthcare setting. These findings suggest that qualitative patient feedback captures evolving priorities and provides complementary predictive value for patient satisfaction that standardized survey items may overlook. As healthcare services continue to evolve, our findings highlight the need for flexible, multi-modal feedback strategies supported by a context-aware analytical framework to address the diverse and changing needs of patients.

Supplemental Material

Supplemental Material for Assessing patient satisfaction in healthcare: Integrating ratings of service attributes and BERT-based analysis of comments by Lin Lu, Qiong Hu, Zhiping Walter, Donglai Huo, Hongbo Zhang in International Journal of Engineering Business Management.

Footnotes

Funding

The authors acknowledge financial support from the Business School at the University of Colorado Denver and the Dolan School of Business at Fairfield University for this research.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Supplemental Material

Supplemental material for this article is available online.

Appendix

Table A.1. Standalone linear regressions for patient satisfaction across healthcare units.

Service attribute/extracted feature	Med prac		Med prac mag		Emergency mag		OP rehab		OP rehab mag		Urgent care		OP lab
Number of responses	6,851	6,848	877	876	759	720	1,765	1,765	731	731	780	780	2,125	2,125
Mentions*Sentiment	.404***		.064		.194		−.029		−.099		.258		−.005
Mentions	−.677***		−.509***		−1.734***		−.490***		−.235		−.875***		−.373**
Sentiment	.400***		.480***		1.922***		.581***		.592***		.938***		.601***
Intercept	9.626***	−1.093***	9.567***	−3.765***	8.423***	−3.269***	9.489***	−.087	9.432***	−.998**	9.449***	−3.884***	9.503***	−2.083***
Ease of appointment		.129***		.137**				.160***		.161**
Enough input				.364**		.700***		.840***		.827***				.737***
Explain by nurses				.204		.003				−.335*
Explain by provider						.152
Explain by staff						.063
Listen by nurses				.087		.357*				.125
Listen by provider		.618***		.780***								.866***
Safety precautions		.393***		.279***		.396***		.607***		.431***		.209*		.884***
Talk about concerns				−.233.		−.021				.257
Timely being seen						−.001
Respect from nurses				.105		−.224				.484*
Respect from provider												.598***
Respect from staff		.359***		.139		.496***		.892***		.955***		.535***		1.361***
Trust for nurses				.573***		.343				−.153
Trust for provider		1.242***		.982***		1.039***						1.231***
Adjusted R²	.106	.471	.058	.528	.304	.804	.064	.462	.043	.629	.187	.717	.086	.478

Statistical significance: *** p < 0.001, ** p < 0.01, * p < 0.05, and . p < 0.1. "Mentions" here refers to "Mentions of new topic(s)".
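As a reading aid for the comment-feature columns of Table A.1, the standalone model can be written as below. This is a sketch under the assumption that Mentions is a 0/1 indicator for raising a new topic; the table does not state how Mentions or Sentiment is coded. The numbers use the Med prac estimates (Sentiment .400, Mentions*Sentiment .404).

```latex
\begin{align*}
\text{Satisfaction} &= \beta_0 + \beta_1\,\text{Mentions} + \beta_2\,\text{Sentiment}
  + \beta_3\,(\text{Mentions}\times\text{Sentiment}) + \varepsilon \\
\frac{\partial\,\text{Satisfaction}}{\partial\,\text{Sentiment}}
  &= \beta_2 + \beta_3\,\text{Mentions}
  = \begin{cases}
      0.400 & \text{if Mentions} = 0 \\
      0.400 + 0.404 = 0.804 & \text{if Mentions} = 1
    \end{cases}
\end{align*}
```

Under that assumed coding, the estimated slope of sentiment in Med prac roughly doubles when a comment raises a new topic, which is what a positive interaction term means in this table.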

References

1. Boulding, Glickman, Manary, et al. Relationship between patient satisfaction with inpatient care and hospital readmission within 30 days. Am J Manag Care 2011; 17(1): 41–48.

2. Richter, Muhlestein. Patient experience and hospital profitability: is there a link? Health Care Manag Rev 2017; 42(3): 247–257.

3. Batbaatar, Dorjdagva, Luvsannyam, et al. Determinants of patient satisfaction: a systematic review. Perspect Public Health 2017; 137(2): 89–101.

4. Centers for Medicare & Medicaid Services. HCAHPS fact sheet 2022. https://www.hcahpsonline.org/globalassets/hcahps/facts/hcahps_fact_sheet_april_2022.pdf

5. Fatima, Malik, Shabbir. Hospital healthcare service quality, patient satisfaction and loyalty: an investigation in context of private healthcare systems. Int J Qual Reliab Manag 2018; 35(6): 1195–1214.

6. Whaley, Pera, Cantor, et al. Changes in health services use among commercially insured US populations during the COVID-19 pandemic. JAMA Netw Open 2020; 3(11): e2024984.

7. Kumah, Osei-Kesse, Anaba. Understanding and using patient experience feedback to improve health care quality: systematic review and framework development. J Patient Cent Res Rev 2017; 4(1): 24–31, Systematic review and framework proposal.

8. Hentati, Cabrera, D’Anza, et al. Patient satisfaction with telemedicine in rhinology during the COVID-19 pandemic. Am J Otolaryngol 2021; 42(3): 102921.

9. Naik JRK, Anand, Bashir. An empirical investigation to determine patient satisfaction factors at tertiary care hospitals in India. International Journal of Quality and Service Sciences 2015; 7(1): 2–16.

10. Policastro, Carnes, Friedman, et al. Narrative comments about urologists on physician rating websites provide insight into what drives patient satisfaction surveys. Urol Pract 2019; 6(4): 222–226.

11. Khanbhai, Warren, Symons, et al. Using natural language processing to understand, facilitate and maintain continuity in patient experience across transitions of care. Int J Med Inform 2022; 157: 104642.

12. Hao, Zhang. The voice of Chinese health consumers: a text mining approach to web-based physician reviews. J Med Internet Res 2016; 18(5): e108.

13. Alipour, Sharifian, Dehghan Haghighi, et al. Patients’ perceptions, experiences, and satisfaction with e-prescribing system: a cross-sectional study. Int J Med Inform 2024; 181: 105282.

14. Sengupta, Sarkar, Bhattacherjee. The relationship between telemedicine tools and physician satisfaction, quality of care, and patient visits during the COVID-19 pandemic. Int J Med Inform 2024; 190: 105541.

15. Jannati, Nakhaee, Yazdi-Feyzabadi, et al. A cross-sectional online survey on patients’ satisfaction using store-and-forward voice and text messaging teleconsultation service during the COVID-19 pandemic. Int J Med Inform 2021; 151: 104474.

16. Terpend, Rossetti, Kroes, et al. Leveraging free-form comments to assess and improve patient satisfaction. Ann Fam Med 2022; 20(6): 551–555.

17. Shah, Yan, Tariq, et al. What patients like or dislike in physicians: analyzing drivers of patient satisfaction and dissatisfaction using a digital topic modeling approach. Inf Process Manag 2021; 58(3): 102516.

18. Levitt, Pomerville, Surace. A qualitative meta-analysis examining clients’ experiences of psychotherapy: a new agenda. Psychol Bull 2016; 142(8): 801–830.

19. Zaman, Goldberg, Abrahams, et al. Facebook hospital reviews: automated service quality detection and relationships with patient satisfaction. Decis Sci J 2021; 52(6): 1403–1431.

20. Ojo, Rizun, Walsh, et al. Prioritising national healthcare service issues from free text feedback–A computational text analysis & predictive modelling approach. Decis Support Syst 2024; 181: 114215.

21. Chen, Tsang. When text mining meets science mapping in the bibliometric analysis: a review and future opportunities. Int J Eng Bus Manag 2023; 15: 18479790231222349.

22. Chagnon, Pandolfi, Donatelli, et al. Benchmarking topic models on scientific articles using BERTeley. Natural Language Processing Journal 2024; 6: 100044.

23. Samir, Abd-Elmegid, Marie. Sentiment analysis model for airline customers’ feedback using deep learning techniques. Int J Eng Bus Manag 2023; 15: 18479790231206019.

24. Lin, et al. Privacy, security and resilience in Mobile healthcare applications. Enterp Inf Syst 2023; 17(3): 1939896.

25. Wang, et al. Measuring service quality with text analytics: considering both importance and performance of consumer opinions on social and non-social online platforms. J Bus Res 2023; 169: 114298.

26. Dang, Moreno-García, De la Prieta. Sentiment analysis based on deep learning: a comparative study. Electronics 2020; 9(3): 483.

27. Devlin, Chang, Lee, et al. BERT: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 2019.

28. Murthy GSN, Allu, Andhavarapu, et al. Text based sentiment analysis using LSTM. International Journal of Engineering Research and Technology Research. 2020; 9(05): 32–41.

29. Chen, Wang. BERT with CNN for sequential sentence classification. In: 2019 IEEE international conference on big data (big data), Los Angeles, CA, USA, 9–12 December 2019, 2019; 15: 5081–5084.

30. Liu, et al. BETM: A new pre-trained BERT-guided Embedding-based Topic Model. Big Data Research 2025: 100551.

31. Egger, Joanne. A topic modeling comparison between lda, nmf, top2vec, and bertopic to demystify twitter posts. Frontiers in Sociology 2022; 7: 886498.

32. George, Sumathy. An integrated clustering and BERT framework for improved topic modeling. International Journal of Information Technology. 2023; 15(4): 2187–2195.

33. Joachims. A probabilistic analysis of the Rocchio algorithm with TFIDF for text categorization. In: Proceedings of the fourteenth international conference on machine learning (ICML), Nashville, Tennessee, USA, 8–12 July 1997. vol. 97.

34. Ardito, Rabellino. Therapeutic alliance and outcome of psychotherapy: historical excursus, measurements, and prospects for research. Front Psychol 2011; 2: 270.

35. Hatcher, Barends. How a return to theory could help alliance research. Psychother Theor Res Pract Train 2006; 43(3): 292–299.

36. Hays, Skootsky. Patient experience with in-person and telehealth visits before and during the COVID-19 pandemic at a large integrated health system in the United States. J Gen Intern Med 2022; 37(4): 847–852.

37. Mehra, Mishra. Role of communication, influence, and satisfaction in patient recommendations of a physician. Vikalpa 2021; 46(2): 99–111.

38. Epstein, Street. Patient-centered communication in cancer care: promoting healing and reducing suffering. National Cancer Institute, 2007. NIH Publication No. 07-6225.

39. Ogbeyemi, Lin, Odeyemi, et al. Human factors in digital healthcare systems: a critical literature review. Enterp Inf Syst 2025; 19: 2524847.

40. Chekijian, Fodeh. Emergency care and the patient experience: using sentiment analysis and topic modeling to understand the impact of the COVID-19 pandemic. Health Technol 2021; 11(5): 1073–1082.

41. Atinga, Akosen, Bawontuo. Perceived characteristics of outpatient appointment scheduling association with patient satisfaction and treatment adherence: an innovation theory application. Hosp Pract 2021; 49(4): 298–306.

42. Closed-form evaluations and open-ended comment options: how do they affect customer online review behavior and reflect satisfaction with hotels? Decis Support Syst 2021; 145: 113525.

43. Cheng, Ying, et al. Physician review websites: understanding patient satisfaction with ophthalmologists using natural language processing. J Ophthalmol 2023; 2023(1): 4762460.

44. Santuzzi, Brodnik, Rinehart-Thompson, et al. Patient satisfaction: how do qualitative comments relate to quantitative scores on a satisfaction survey? Qual Manag Health Care 2009; 18(1): 3–18.

45. Graham. Organization of rehabilitation services. Handb Clin Neurol 2013; 110: 113–120.

46. Chatterjee, Goyal, Prakash, et al. Exploring healthcare/health-product ecommerce satisfaction: a text mining and machine learning application. J Bus Res 2021; 131: 815–825.

47. Brookes, Baker. What does patient feedback reveal about the NHS? A mixed methods study of comments posted to the NHS choices online service. BMJ Open 2017; 7(4): e013821.

48. Ranard, Werner, Antanavicius, et al. Yelp reviews of hospital care can supplement and inform traditional surveys of the patient experience of care. Health Aff 2016; 35(4): 697–705.

49. Zhang. A biobjective optimization model for expert opinions aggregation and its application in group decision making. IEEE Syst J 2020; 15(2): 2834–2844.

50. Chiang C-H. AI in airport operations: enhancing competitiveness and satisfaction. Enterprise Information Systems. 2025; 19(3–4): 2454003.

51. Wang, Zhang, et al. On a unified definition of the service system: what is its identity? IEEE Syst J 2013; 8(3): 821–826.

52. Wang, Ding, et al. On domain modelling of the service system with its application to enterprise information systems. Enterp Inf Syst 2016; 10(1): 1–16.
