Telemedicine in China: Effective indicators of telemedicine platforms for promoting health and well-being among healthcare consumers

Abstract

Objective

Telemedicine platforms played a crucial role during the COVID-19 pandemic, alleviating issues related to the shortage and unequal distribution of healthcare resources. The purpose of this study is to identify key factors affecting the service quality of telemedicine platforms in China, with the dual objectives of advancing patient wellbeing and informing evidence-based service innovations for industry stakeholders.

Methods

To quantitatively assess the impact of these key factors on health and wellbeing from the perspective of healthcare consumers, a total of 25,499 valid online reviews were collected from telemedicine platforms. To establish a service quality evaluation framework, this study proposes a novel approach that combines the Servqual quality assessment model with a CNN-BiLSTM deep learning model enhanced by an attention mechanism.

Results

Analysis of the full sample shows that healthcare consumers are most concerned about the quality of services provided by telemedicine platforms, with the most important being the professional competence of doctors, a critical factor for promoting consumer health and wellbeing. The proposed hybrid deep learning approach demonstrates superior performance in sentiment classification accuracy, outperforming conventional methods by 11.11 percentage points. This methodological innovation enables more precise identification of consumer sentiment patterns across service dimensions.

Conclusion

The novel quality assessment framework introduced here provides actionable insights for advancing telemedicine platforms, driving progress toward precision healthcare and consumer-centric wellbeing. Furthermore, it enables healthcare consumers to select telemedicine services aligned with their personalized needs.

Keywords

Telemedicine, patient reviews, Servqual model, sentiment analysis, healthcare consumers

Introduction

With the rapid development of the internet and the increasing demand for medical services, internet-based healthcare services and health communities are becoming more prevalent, disrupting traditional methods of accessing health information and disease treatment.^1,2 Many countries have elevated telemedicine to a national strategy, and China's development in this area began in the mid-1980s, gradually establishing a national telemedicine network. By the end of 2001, there were 300 online hospitals in the country. Since 2010, telemedicine has seen widespread application in eastern cities such as Beijing, while western regions like Guizhou have also begun to focus on this field. Finally, China successfully achieved telemedicine coverage for 13,000 medical institutions and all impoverished counties, significantly improving the accessibility of healthcare services.³ Bolstered by emerging technologies such as 5G, artificial intelligence, and big data, China's telemedicine services have demonstrated capabilities that are on par with, and even surpass, internationally advanced standards. Focusing on the developmental challenges of China's telemedicine platforms can provide valuable insights for other countries in deploying telemedicine.

Especially during the COVID-19 pandemic, residents are increasingly accessing health information through the internet and mobile applications. This shift is moving them away from solely relying on hospital visits.^4–6 On the one hand, online consultations by healthcare consumers on telemedicine platforms reduce face-to-face contact between doctors and healthcare consumers, helping to reduce the spread of infectious diseases.⁷ On the other hand, the cross-regional service characteristics of telemedicine platforms broaden the service population, enabling healthcare consumers to contact doctors anytime and anywhere to consult on health issues and disease treatments.^8–11 Despite the significant contributions of telemedicine platforms, issues including user privacy breaches, physician diagnostic errors, and cumbersome platform operations not only affect user loyalty and stickiness but also attract widespread social attention.^12–15 Assessing telemedicine platforms based on user reviews and promptly improving services and functionalities pose major challenges. These efforts are necessary to evaluate various quality indicators and achieve long-term sustainable development of the platform.¹⁶

Service quality is the overall assessment of the superiority of a service by users. Due to the intangibility of services, regulating service quality is more challenging than regulating tangible products.^17,18 Therefore, with the rapid development of telemedicine platforms, the academic community is increasingly focusing on service quality issues.

Previous studies have largely focused on subjective social survey methods such as questionnaires and grounded theory. However, these subjective social survey methodologies are contingent upon the voluntary participation of respondents, which may result in inadequate sample representativeness. Respondents might obscure their genuine sentiments to align with societal norms or the anticipations of others, thereby undermining the objectivity of the data. Furthermore, the static nature of questionnaire design hampers the dynamic capture of evolving user needs, and the protracted data collection period impedes the real-time identification of service quality issues. As the number of online comments about specific service platforms increases and becomes more accessible, research has shifted focus. Studies based on online comments and other objective approaches are growing.^19,20 For example, Liu et al.²¹ investigated the relationship between the tone of doctors’ voices during online health consultations and healthcare consumer satisfaction, finding that healthcare consumer satisfaction was influenced by the speed of the doctor's speech during consultation. Currently, most studies on telemedicine platforms rarely develop a multidimensional model to analyze service quality from aspects such as needs fulfillment, security, responsiveness, and user interface. Although the focus of related research varies, overall, it tends to limit practicality and effectiveness. Therefore, it is necessary to further study the service quality of telemedicine platforms through online user reviews, to provide references for optimizing the service quality and sustainable development of these platforms.

To quantitatively assess the impact of these key factors from the perspective of healthcare consumers, we collected 25,499 healthcare consumer reviews from the telemedicine platforms of “Good Doctor Online” (GDO) and “Doctor Dingxiang” (DDX). In this study, we analyzed the collected healthcare consumer comments through the topic generation model and deep learning-based sentiment analysis, comparing the impact of different dimensions on the quality of telemedicine platform from a real healthcare consumer perspective. Moreover, through the topic relevance matrix²² and the Servqual quality assessment model,^23,24 we sorted out the different aspects that healthcare consumers focus on and constructed a comprehensive and reasonable telemedicine platform quality assessment index system. In our work, a hybrid CNN-BiLSTM-MHA model was proposed, which integrates the local feature extraction capability of convolutional neural networks, the temporal modeling strength of bidirectional long-short-term memory networks, and the multilevel semantic focus enabled by multihead attention mechanisms. This model significantly enhances the accuracy of sentiment classification, achieving an improvement of approximately 11% compared to single models. Consequently, the quality evaluation method for the telemedicine platform presented in this study is rendered more reliable. The specific research questions addressed in this paper are as follows:

RQ1: How to find the factors that affect the quality of telemedicine platforms through healthcare consumer comments?

RQ2: How to aggregate topics generated from health consumer reviews using the Servqual model?

RQ3: How to evaluate the service quality of telemedicine platforms through sentiment analysis?

This article covers several parts, as follows: the second section describes the literature review. The third section details the proposed methodology for quality assessment of telemedicine platforms. The fourth section applies the new method to an actual case analysis, evaluating its effectiveness and practicality. Lastly, the discussions and conclusions are presented in the fifth and sixth sections, respectively.

Literature review

Role and benefits of telemedicine platforms in modern healthcare

The telemedicine platforms are extensions of traditional doctor–patient relationships on the internet, enabling patients to consult with healthcare professionals anytime and anywhere regarding health issues and disease treatments.²⁵ Healthcare consumer consultations conducted through telemedicine platforms represent an innovative approach to meeting the growing demand for medical services, allowing users to overcome time and geographic constraints and providing more options for both doctors and patients.^26,27 The primary benefits include reducing infection risks and improving efficiency.

In terms of infection control, telemedicine platforms avoid face-to-face contact between healthcare providers and healthcare consumers, thereby lowering the risk of transmitting infectious diseases such as COVID-19, which can easily spread among close contacts.^28,29

On the other hand, telemedicine platforms significantly enhance consultation efficiency.³⁰ For example, during the COVID-19 pandemic, shortages of medical equipment and overwhelmed hospitals were major issues. However, telemedicine platforms not only saved substantial costs and time spent on lockdown measures but also enabled excellent physicians to diagnose diseases quickly and conveniently.^13,31

Importance of online reviews in evaluating telemedicine platforms

Online reviews are a valuable resource for revealing user opinions on service quality, and telemedicine platform-related online reviews can effectively assist other consumers in making choices.³² The quantity, quality, content type, and even the length of review content influence users.³³ These online reviews cover aspects ranging from healthcare service quality, doctor professionalism, and staff attitude to platform usability, and some users even engage in conversations through reviews. Classifying and conducting sentiment analysis on these online reviews can help us effectively understand the focal points and issues of user concern.³⁴ Sudirjo et al.³⁵ analyzed online customer reviews on Tokopedia to identify factors influencing purchase decisions on Tokopedia online stores. Li et al.³⁶ used a machine learning-based conditional survival forest model to categorize online restaurant reviews from two popular tourist destinations into five categories: location, taste, price, service, and ambiance, to predict restaurant survival rates and determine which online review features are the best indicators of restaurant survival.

Materials and methods

The methodological framework of this study is illustrated in Figure 1. The study consists of the following steps. First, the collected healthcare consumer comments are introduced and preprocessed. Subsequently, Term Frequency-Inverse Document Frequency (TF-IDF)³⁷ and Latent Dirichlet Allocation (LDA) topic models³⁸ are utilized to extract the service quality indicators of telemedicine platforms that healthcare consumers are concerned about. After the sentiment tendency of the healthcare consumer comments is determined through sentiment analysis techniques, the quality of the telemedicine platforms is assessed based on the extracted service indicators.

Figure 1.

Overview of the research of service quality assessment including three main modules. (1) Preliminary work and LDA topic clustering, (2) sentiment analysis based on deep learning with comments, and (3) service quality evaluation based on comments sentiment analysis. LDA: Latent Dirichlet Allocation.

Data collection and preprocessing

This article collects user information and comments from two well-operated online health communities in China: GDO and DDX. According to the 2023 China Internet Healthcare Development Report, GDO and DDX rank among the top three platforms in terms of user activity on China's telemedicine services, collectively covering over 80% of the online consultation user base and thereby demonstrating broad representativeness. Specifically, GDO primarily offers comprehensive diagnostic and treatment services across multiple departments, including internal medicine, surgery, and others, while DDX focuses on health education and chronic disease management. The combination of these two platforms effectively encompasses the core service modalities of telemedicine.

The data includes information such as healthcare consumer name, healthcare consumer comments and time of healthcare consumer comments. This study strictly adhered to data privacy protection protocols. Direct identifiers, including user aliases and contact information, were systematically removed from the original user-generated comment dataset. Furthermore, potential quasi-identifiers that might enable user identification through data linkage or inference were anonymized using generalization techniques. All data are securely encrypted and stored on controlled servers, with access restricted to authorized members of the research team. Partial data obtained from four different channels are presented in Table 1.

Table 1.

Composition of review datasets from two telemedicine platforms.

	GDO			DDX
Department	Comment dataset	Ratio	Pos	Comment dataset	Ratio	Pos
1	2766	76.57%	2118	2605	74.47%	1940
2	2322	72.09%	1674	2116	69.00%	1460
3	4525	75.38%	3411	4250	70.99%	3017
4	3265	77.98%	2546	3650	72.19%	2635
Total	12,878	75.70%	9749	12,621	71.72%	9052

DDX: Doctor Dingxiang; GDO: Good Doctor Online.

Through manual classification, each comment is categorized based on its sentiment orientation, labeled as positive, neutral, or negative. In order to ensure the accuracy of manual tagging, the study in this article adopts multiperson joint tagging to make multiple judgments on controversial comments, retaining comments with the same tagging by three people, and finally obtaining more than 10,000 comments from the two platforms as the training corpus (Table 2).

Table 2.

Examples of manual tagging of comments for emotional tendencies.

Comments	Polarity	Judgment logic
The platform's interface is clean, simple and easy to use	Positive	Satisfied with the platform's interface and operating experience
The platform's reservation and registration function is convenient and saves time	Positive	Satisfied with the appointment booking function of the platform
The health information provided by the platform is rich and practical	Positive	Satisfied with the health information provided by the platform
I hope more doctors will join the platform	Neutral	Neutral on the number of doctors on the platform
I hope that the platform can reduce the price of drugs, so that patients can benefit more	Neutral	Recommendations were made to the platform
The booking and registration function of the platform is often malfunctioning and cannot be used normally	Negative	Dissatisfaction with the platform's appointment booking feature
Doctors do not reply in time, answer questions unprofessionally, and have a perfunctory attitude	Negative	Dissatisfaction with doctors' response speed, professionalism, and attitude
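The unanimity rule used for manual tagging can be illustrated with a minimal Python sketch. The annotation records below are hypothetical, and the helper name `unanimous_label` is an assumption made for illustration; the study does not describe its tagging workflow at this level of detail.

```python
# Hypothetical annotation records: each comment carries labels from three annotators.
annotations = {
    "The platform's interface is clean, simple and easy to use":
        ["positive", "positive", "positive"],
    "I hope more doctors will join the platform":
        ["neutral", "positive", "neutral"],
    "Doctors do not reply in time":
        ["negative", "negative", "negative"],
}


def unanimous_label(labels):
    """Return the shared label when all annotators agree, otherwise None."""
    return labels[0] if len(set(labels)) == 1 else None


# Keep only comments on which all three annotators agree.
training_corpus = {}
for comment, labels in annotations.items():
    label = unanimous_label(labels)
    if label is not None:
        training_corpus[comment] = label
```

In this toy input, the second comment is dropped because one annotator disagrees, which mirrors the rule of retaining only comments tagged identically by three people.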

Capture topics of Servqual service quality assessment model

To obtain the service quality topics that healthcare consumers are concerned about as the feature dimensions of telemedicine platforms, this study mines topics from healthcare consumer comments using the LDA topic model. Because medical information is highly specialized, the LDA topic clusters are then further classified manually with the help of general medical knowledge.

After tokenizing and performing other preprocessing on the review data, this study used the doc2bow function from the Gensim library to convert the review texts into a bag-of-words model for text vectorization. Subsequently, the perplexity (a metric that measures the model's ability to predict unseen data) of models with varying topic counts was evaluated to determine the optimal number of topics. Finally, three medical professionals independently reviewed and validated the generated topics, confirming that the optimal number of topics for our model is 14.
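The two ideas in this step, bag-of-words vectorization and perplexity, can be sketched in plain Python. This is a standard-library stand-in and not the Gensim pipeline used in the study: `build_vocab`, `doc_to_bow`, and `perplexity` are hypothetical helper names, and the tokens are toy data. The output of `doc_to_bow` mirrors the list of `(token_id, count)` pairs that `doc2bow` produces.

```python
import math
from collections import Counter


def build_vocab(tokenized_docs):
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {}
    for doc in tokenized_docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def doc_to_bow(tokens, vocab):
    """Convert one tokenized review into sorted (token_id, count) pairs."""
    counts = Counter(vocab[t] for t in tokens if t in vocab)
    return sorted(counts.items())


def perplexity(token_log_probs):
    """Perplexity = exp(-average log-probability per held-out token)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Toy tokenized reviews (hypothetical).
docs = [["doctor", "reply", "doctor"], ["reply", "fast"]]
vocab = build_vocab(docs)
bow = [doc_to_bow(doc, vocab) for doc in docs]
```

Lower perplexity means the model assigns higher probability to unseen text, which is why models with different topic counts can be compared on this metric before the expert review.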

By explicitly setting the number of topics to be identified and implementing the LDA model, this study conducted a cluster analysis of healthcare consumer comments. Upon completing the clustering process and conducting an exhaustive analysis of the elicited topics, we observed intersections among some topics.

As shown in Table 3, to enhance the discriminability between topics, the identified topics were further refined and appropriately named with the assistance of experts in the field of telemedicine. This study, by comprehensively considering the intertopic correlation and the Servqual service quality assessment model, endeavors to reconstruct and refine the core indicator system for evaluating the service quality of telemedicine platforms. Given the fundamental differences between telemedicine and conventional services, the study revises the definitions of each dimension of the evaluation model to ensure a better alignment with the unique aspects of telemedicine services.

Table 3.

Results of the LDA clustering of healthcare consumer comments.

Topic	Topic Name	Keyword
$A_{1}$	System Stability	website, mobile, download, online, network, update, system, flashback
$A_{2}$	Interface Aesthetics	interfaces, advertising, search, index, navigation, push, page, module
$A_{3}$	Convenience of Operation	login, operation, text message, download, convenient and quick, unauthorized operation, bind, shortcut
$A_{4}$	Physician Professional Competence	doctor, medical skill, profound, professional, explanation, level, experience, professor
$A_{5}$	Physician Ethics	noble, compassion, medical ethics, accomplishment, morality, responsibility, accountable, conscience
$A_{6}$	Physician Interaction Speed	reply, very fast, time, quick, punctual, delay, anxious, efficient
$A_{7}$	Physician Interaction Attitude	friendly, responsible, encouraging, enthusiastic, approachable, amiable, patient, persevering
$A_{8}$	Platform Functional Diversity	appointment, registration, consultation, treatment, medication, consultation, purchase medicine, prescribe medication
$A_{9}$	Customer Service Level	customer service, attitude, response, service, efficient, staff, service attitude, resolve
$A_{10}$	Usefulness of Information	information, assistance, useful, everyday knowledge, tips, effective, effect, take effect
$A_{11}$	Information Accuracy	false, rigorous, diagnosis, prescription, accurate, reason, inaccurate, vague
$A_{12}$	Degree of privacy information confidentiality	phone number, comment, medical condition, examination report, make a phone call, medical history, private
$A_{13}$	Discount of Price	free, charge, free clinic, inexpensive, spend money, fee, expenditure, payment
$A_{14}$	Speed of after-sales drug delivery	ship, express delivery, shipping fee, logistics, tracking number, order, customer service, location

LDA: Latent Dirichlet Allocation.

Sentiment analysis model

For online reviews, sentiment analysis can be employed to determine healthcare consumers’ sentiments toward various issues.^39–41 Identifying the polarity of text within sentences or documents, determining whether expressions are neutral, positive, or negative, constitutes a primary objective of sentiment analysis. Addressing this aim, this paper introduces a sentiment analysis model based on a hybrid CNN-BiLSTM-MHA architecture. Compared to single models, the hybrid model holds the potential to enhance the accuracy of sentiment analysis. As illustrated in Figure 2, the proposed model comprises five main components: an input layer, a CNN module, a Bidirectional Long Short-term Memory Network (BiLSTM) layer, a multihead attention layer, and an output layer.

Figure 2.

Structure of the proposed hybrid deep learning model.

The role of the input layer is to accept normalized healthcare consumer comment data. In the convolutional layer, the feature extraction module can automatically identify key local features in comments. The BiLSTM layer is tasked with processing temporal sequence information, analyzing the logical relationship before and after a sentence to accurately understand complex sentence structures such as "although the response is fast, the diagnosis is not accurate." In this study, a BiLSTM model was constructed using TensorFlow as the fundamental architecture, employing the Adam optimizer and tanh activation function. The BiLSTM layer processes feature vectors extracted by the CNN layer through bidirectional learning, capturing contextual information in time-series data.

The structure of the BiLSTM model can be expressed by the following formula:

$$ i_{t} = σ (W_{x_{i}} x_{t} + W_{h_{i}} h_{t - 1} + b_{i}) \tag{1} $$

$$ f_{t} = σ (W_{x_{f}} x_{t} + W_{h_{f}} h_{t - 1} + b_{f}) \tag{2} $$

$$ c_{t} = f_{t} ⊙ c_{t - 1} + i_{t} ⊙ \tanh (W_{x_{c}} x_{t} + W_{h_{c}} h_{t - 1} + b_{c}) \tag{3} $$

$$ o_{t} = σ (W_{x_{o}} x_{t} + W_{h_{o}} h_{t - 1} + b_{o}) \tag{4} $$

$$ h_{t} = o_{t} ⊙ \tanh (c_{t}) \tag{5} $$

In the equations, $x_{t}$ represents the input at the current time step, while $h_{t - 1}$ is the hidden state from the previous time step. $W_{x_{i}}$, $W_{x_{f}}$, $W_{x_{c}}$, and $W_{x_{o}}$, as well as $W_{h_{i}}$, $W_{h_{f}}$, $W_{h_{c}}$, and $W_{h_{o}}$, are the weight matrices applied to the input and to the previous hidden state, respectively. $i_{t}$, $f_{t}$, and $o_{t}$ represent the outputs of the input gate, forget gate, and output gate, respectively. $c_{t}$ corresponds to the updated cell state, and $h_{t}$ denotes the updated hidden state. The symbol ⊙ indicates element-wise multiplication, while $\tanh$ and $σ$ are activation functions. $b_{i}$, $b_{f}$, $b_{c}$, and $b_{o}$ are the bias terms.
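For readers who prefer code, the gate computations can be sketched as a scalar LSTM step in plain Python. This is a didactic sketch and not the TensorFlow model used in the study: it uses the standard LSTM cell-state update, in which the forget gate scales the previous cell state and the input gate scales the tanh candidate, and it treats every weight as a single number. The parameter names, the `bilstm` helper, and the demo values are assumptions made for illustration.

```python
import math

PARAM_NAMES = ["W_xi", "W_hi", "b_i", "W_xf", "W_hf", "b_f",
               "W_xc", "W_hc", "b_c", "W_xo", "W_ho", "b_o"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step: returns the new hidden state and cell state."""
    i_t = sigmoid(p["W_xi"] * x_t + p["W_hi"] * h_prev + p["b_i"])    # input gate
    f_t = sigmoid(p["W_xf"] * x_t + p["W_hf"] * h_prev + p["b_f"])    # forget gate
    g_t = math.tanh(p["W_xc"] * x_t + p["W_hc"] * h_prev + p["b_c"])  # candidate
    c_t = f_t * c_prev + i_t * g_t                                    # cell state
    o_t = sigmoid(p["W_xo"] * x_t + p["W_ho"] * h_prev + p["b_o"])    # output gate
    h_t = o_t * math.tanh(c_t)                                        # hidden state
    return h_t, c_t


def bilstm(xs, fwd, bwd):
    """Run one pass left-to-right and one right-to-left, then pair the states."""
    def run(seq, params):
        h, c, states = 0.0, 0.0, []
        for x in seq:
            h, c = lstm_step(x, h, c, params)
            states.append(h)
        return states

    forward = run(xs, fwd)
    backward = run(xs[::-1], bwd)[::-1]
    return list(zip(forward, backward))


# Hypothetical parameters and inputs, chosen only to make the sketch runnable.
demo_params = {name: 0.5 for name in PARAM_NAMES}
states = bilstm([0.1, -0.3, 0.7], demo_params, demo_params)
```

Each position ends up with a forward state that has seen the preceding tokens and a backward state that has seen the following ones, which is what lets the layer relate the two halves of a contrastive sentence.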

Meanwhile, the attention layer serves to capture the inherent correlations between input healthcare consumer comments and related data, thereby enhancing the classification accuracy of healthcare consumer comment polarity.
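The article does not give the attention formulas, so the following is only a sketch of standard scaled dot-product attention, the building block that multihead attention applies in parallel to several learned projections of the same sequence. It is written in plain Python for a single head; the function names and the toy vectors are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs


# Toy sequence of two positions with two features each (hypothetical values).
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[2.0, 0.0], [4.0, 0.0]]
context = attention(keys, keys, values)
```

Because the two keys are identical here, each query weights both positions equally and the output is the average of the value vectors; with distinct keys, positions that match the query more closely receive larger weights.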

Statistical analysis

The statistical analysis in this study employed a comprehensive approach to evaluate telemedicine platform quality. Descriptive statistics were calculated for sentiment scores across service quality dimensions, with means and standard deviations reported to quantify central tendency and variability. The TF-IDF algorithm was utilized to determine indicator weights, while LDA topic modeling extracted key service quality themes from 25,499 user reviews. Sentiment analysis performance metrics (accuracy, precision, recall, and F1-score) for the proposed hybrid deep learning model and comparison models were expressed as mean ± standard deviation across multiple validation runs. Quality assessment scores for platforms were calculated using weighted sentiment values with 95% confidence intervals. Statistical significance of indicator weights was verified through hypothesis testing (p < 0.001 for all secondary indicators). All text processing and deep learning implementations were conducted using Python's natural language processing libraries and TensorFlow framework.
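The article states that indicator weights come from TF-IDF but does not spell out how term scores are aggregated. A minimal sketch, assuming that each indicator's raw score is the sum of the TF-IDF values of its keywords and that the raw scores are then normalized to sum to 1, could look as follows. The function names, the toy reviews, and the aggregation rule are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return {term: TF-IDF summed over all tokenized documents}."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    totals = Counter()
    for doc in docs:
        for term, count in Counter(doc).items():
            tf = count / len(doc)                   # term frequency in this review
            idf = math.log(n_docs / doc_freq[term])  # inverse document frequency
            totals[term] += tf * idf
    return dict(totals)


def indicator_weights(scores, indicator_keywords):
    """Sum keyword scores per indicator and normalize so the weights sum to 1."""
    raw = {name: sum(scores.get(k, 0.0) for k in keywords)
           for name, keywords in indicator_keywords.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


# Toy tokenized reviews and keyword lists (hypothetical).
docs = [
    ["doctor", "professional", "reply", "fast"],
    ["system", "update", "flashback"],
    ["doctor", "professional", "explanation"],
    ["reply", "delay"],
]
indicator_keywords = {
    "Physician Professional Competence": ["doctor", "professional", "explanation"],
    "Physician Interaction Speed": ["reply", "fast", "delay"],
    "System Stability": ["system", "update", "flashback"],
}
weights = indicator_weights(tfidf_scores(docs), indicator_keywords)
```

Normalizing within a group of indicators is what makes the resulting weights comparable and lets them sum to 1.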

Analysis and results

Analysis of service quality evaluation indicators

Figure 3.

Correlation matrix between topics of the healthcare consumer comments obtained through LDA. LDA: Latent Dirichlet Allocation.

Tangibility involves factors such as the physical environment, equipment, and personnel image provided by the service provider. In the context of remote healthcare, which utilizes the internet to deliver services, this study considers the esthetic appeal of the remote healthcare platform's interface, system stability, and ease of operation as indicators of the platform's tangibility dimension. Reliability refers to the service provider's ability to fulfill service commitments punctually and accurately. For example, it entails timely completion of promised tasks and the ability to deliver services as promised. In this study, the professional competence of physicians, diversity of platform functionalities, and level of customer service are regarded as indicators of the platform's reliability dimension. Responsiveness refers to the speed and proactiveness of the service provider's response to customer requests, inquiries, and issues. In the context of remote healthcare, service speed such as the interaction speed of physicians and the promptness of postconsultation medication delivery can serve as indicators of the platform's responsiveness dimension. Assurance pertains to the service provider's knowledge, skills, reputation, and how they establish trust in service quality with customers. In this study, the platform's information accuracy, usefulness, and the ethical conduct of physicians are considered as indicators of the platform's assurance dimension. Empathy denotes the level of care and understanding shown by service providers toward customers, including the attention paid to and fulfillment of customers’ individual needs and expectations. In the process of telemedicine consultations, the demeanor of physicians during interactions, the platform's commitment to privacy protection, and the discount of price are indicative of the platform's concern and understanding toward patients, thus serving as dimensions of the platform's empathy.

Following the identification of primary and secondary indicators for assessing the quality of telemedicine platform services, the weights associated with these indicators were determined through the TF-IDF algorithm's analysis of feature words. This culminated in the formulation of the indicators for evaluating the quality of telemedicine platforms, as delineated in Table 4. As shown in Table 4, the weights of the five indicators of Quality of System, Quality of Service, Speed of Service, Quality of Information, and Attitude of Service are 0.1330, 0.3412, 0.2538, 0.1199, and 0.1521, respectively. It can be seen that for healthcare consumers, the quality of services provided by the platform is the most important, especially the professional competence of doctors.

Table 4.

Weights of primary and secondary service quality indicators.

Primary indicator	Weight of primary indicator	Secondary indicator	Weight of secondary indicator	P-value
Quality of System	0.1330	System Stability	0.5610	<0.001
		Interface Aesthetics	0.1257
		Convenience of operation	0.3133
Quality of Service	0.3412	Physician Professional Competence	0.5459	<0.001
		Platform Functional Diversity	0.2316
		Customer Service Level	0.2225
Speed of Service	0.2538	Physician Interaction Speed	0.6889	<0.001
		Speed of After-sales Drug Delivery	0.3111
Quality of Information	0.1199	Information accuracy	0.3665	<0.001
		Usefulness of Information	0.2825
		Physician Ethics	0.3510
Attitude of Service	0.1521	Physician Interaction Attitude	0.3459	<0.001
		Degree of privacy Information Confidentiality	0.3182
		Discounts of Price	0.3359
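The two weight levels in Table 4 can be combined to compare secondary indicators across dimensions. The sketch below assumes that an indicator's overall importance is the product of its primary and secondary weights; the article does not state this rule explicitly, and `global_weights` is a hypothetical helper name. The numbers are transcribed from Table 4.

```python
# Weights transcribed from Table 4: (primary weight, {secondary indicator: weight}).
TABLE_4 = {
    "Quality of System": (0.1330, {
        "System Stability": 0.5610,
        "Interface Aesthetics": 0.1257,
        "Convenience of Operation": 0.3133,
    }),
    "Quality of Service": (0.3412, {
        "Physician Professional Competence": 0.5459,
        "Platform Functional Diversity": 0.2316,
        "Customer Service Level": 0.2225,
    }),
    "Speed of Service": (0.2538, {
        "Physician Interaction Speed": 0.6889,
        "Speed of After-sales Drug Delivery": 0.3111,
    }),
    "Quality of Information": (0.1199, {
        "Information Accuracy": 0.3665,
        "Usefulness of Information": 0.2825,
        "Physician Ethics": 0.3510,
    }),
    "Attitude of Service": (0.1521, {
        "Physician Interaction Attitude": 0.3459,
        "Degree of Privacy Information Confidentiality": 0.3182,
        "Discounts of Price": 0.3359,
    }),
}


def global_weights(table):
    """Multiply each secondary weight by the weight of its primary indicator."""
    return {
        secondary: primary_weight * weight
        for primary_weight, secondaries in table.values()
        for secondary, weight in secondaries.items()
    }


weights = global_weights(TABLE_4)
ranking = sorted(weights, key=weights.get, reverse=True)
```

Under this assumption, Physician Professional Competence has the largest overall weight (about 0.186), followed by Physician Interaction Speed (about 0.175), which is consistent with the emphasis the article places on doctors' professional competence.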

Table 4 demonstrates that physicians’ professional competence is the cornerstone of service quality. The platform should enhance the verification of physician credentials and provide ongoing training to ensure that physicians possess advanced medical skills and effective communication abilities. Furthermore, expanding services, such as online prescription issuance and health record management, can diversify the platform's functionalities to meet the varied needs of users.

Regarding service speed, the platform could introduce an intelligent triage system or optimize physician scheduling mechanisms to reduce user wait times. In addition, improving postsale pharmaceutical delivery by partnering with logistics companies to establish expedited channels will ensure timely medication distribution.

Concerning system quality, the platform should conduct regular stress tests and optimize compatibility across mobile and web interfaces. Engaging user experience designers for interface iteration can also reduce advertising interference and enhance navigational logic.

For information quality, strengthening the review mechanism for medical content is essential. Establishing an expert team to periodically audit platform content will help ensure that the information provided is both scientifically sound and practical. To improve the usefulness of the information, a personalized health knowledge push feature should be developed to offer customized recommendations based on users’ consultation histories.

The three secondary indicators of service attitude carry similar weights; hence, the platform should implement a physician service attitude evaluation system that integrates user feedback into performance assessments. Simultaneously, reinforcing the confidentiality of personal information—by clearly communicating the scope of data usage and offering privacy settings—will enhance user trust. Finally, to support low-income groups, the platform should introduce public welfare consultations subsidized jointly by the government and the platform.

Table 5 presents the final assessment results obtained from applying the quality evaluation method developed in this study to the platforms “Good Doctor Online” and “Doctor Dingxiang”. The quality assessment calculation is derived from the emotional score of the theme and the weight of its indicators. The quality assessment calculation formulas are shown as follows:

$$ δ_{A_{k}} = 5 \cdot \frac{C_{pos} - C_{neg}}{C_{pos} + C_{neu} + C_{neg}} \tag{6} $$

$$ S_{i} = \sum_{k = 1}^{n} w_{A_{k}} δ_{A_{k}} \tag{7} $$

Table 5.

Results of the quality assessment of the telemedicine platforms.

Variables	Good Doctor Online		Doctor Dingxiang
	Score	95%CI	Score	95%CI
Quality of System	4.161	4.010–4.312	4.109	3.850–4.368
Quality of Service	4.199	3.855–4.543	4.051	3.755–4.347
Speed of Service	3.340	3.040–3.640	3.027	2.663–3.391
Quality of Information	3.575	3.105–4.045	3.670	3.225–4.115
Attitude of Service	3.615	3.135–4.095	2.896	2.635–3.157
Total	3.812	-	3.577	-

Here, $δ_{A_{k}}$ denotes the affective tendency score of topic $A_{k}$, and $C_{pos}$, $C_{neu}$, and $C_{neg}$ denote the numbers of positive, neutral, and negative comments on that topic, respectively. To match the mainstream five-point scale, this article multiplies the affective tendency score by 5. $S_{i}$ denotes the sentiment score of the $i$-th primary indicator, calculated as the weighted sum of the scores of its $n$ topics $A_{k}$ using the indicator weights $w_{A_{k}}$.
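The scoring in equations (6) and (7) can be sketched in a few lines of Python. The comment counts in the first example are hypothetical. The second part assumes that the overall platform score is the sum of the primary indicator scores in Table 5 weighted by the primary weights in Table 4; the article does not state this explicitly, but the assumption reproduces the reported totals to three decimals. Function and variable names are illustrative.

```python
def topic_score(pos, neu, neg):
    """Equation (6): affective tendency score of one topic, scaled by 5."""
    return 5 * (pos - neg) / (pos + neu + neg)


def weighted_score(scores, weights):
    """Equation (7): weighted sum of scores with the matching indicator weights."""
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical counts: 80 positive, 10 neutral, 10 negative comments on a topic.
example_delta = topic_score(80, 10, 10)

# Primary weights from Table 4 and primary scores from Table 5.
PRIMARY_WEIGHTS = {
    "Quality of System": 0.1330,
    "Quality of Service": 0.3412,
    "Speed of Service": 0.2538,
    "Quality of Information": 0.1199,
    "Attitude of Service": 0.1521,
}
GDO_SCORES = {
    "Quality of System": 4.161,
    "Quality of Service": 4.199,
    "Speed of Service": 3.340,
    "Quality of Information": 3.575,
    "Attitude of Service": 3.615,
}
DDX_SCORES = {
    "Quality of System": 4.109,
    "Quality of Service": 4.051,
    "Speed of Service": 3.027,
    "Quality of Information": 3.670,
    "Attitude of Service": 2.896,
}

gdo_total = weighted_score(GDO_SCORES, PRIMARY_WEIGHTS)
ddx_total = weighted_score(DDX_SCORES, PRIMARY_WEIGHTS)
```

With these inputs the hypothetical topic scores 3.5, and the two totals round to 3.812 for GDO and 3.577 for DDX, matching the Total row of Table 5.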

Each comment is classified for sentiment polarity using a hybrid deep learning model, and a comprehensive score for each service dimension is calculated based on topic weights. On this basis, the variance and covariance matrix of each indicator is computed using the Bootstrap resampling technique, ultimately yielding the 95% confidence interval (CI) for each dimension. Numerical calculations are primarily performed using Python's scipy.stats module.
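As an illustration of the resampling idea, the sketch below computes a percentile bootstrap interval for a single topic score using only the Python standard library. It is a simplification and not the procedure used in the study, which is described as working with variance and covariance matrices and scipy.stats; the label counts, the number of resamples, and the function names are assumptions.

```python
import random


def delta(labels):
    """Topic score: 5 * (positive - negative) / total comments."""
    pos = labels.count("positive")
    neg = labels.count("negative")
    return 5 * (pos - neg) / len(labels)


def bootstrap_ci(labels, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the topic score."""
    rng = random.Random(seed)
    n = len(labels)
    # Resample comments with replacement and recompute the score each time.
    stats = sorted(delta(rng.choices(labels, k=n)) for _ in range(n_resamples))
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical predicted labels for one topic: 160 positive, 20 neutral, 20 negative.
labels = ["positive"] * 160 + ["neutral"] * 20 + ["negative"] * 20
point = delta(labels)
lower, upper = bootstrap_ci(labels)
```

The interval narrows as the number of comments on a topic grows, so topics with few reviews naturally yield wider confidence intervals.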

Table 5 shows that the overall platform quality score of GDO (3.812) is slightly higher than that of DDX (3.577). In three dimensions, Quality of System, Quality of Service, and Speed of Service, GDO scores 4.161, 4.199, and 3.340, respectively, outperforming DDX and showcasing its advantages in platform stability, customer response, and operational efficiency. However, in terms of information quality, DDX surpasses GDO, indicating that it provides more accurate and comprehensive medical information. In terms of service attitude, GDO (3.615) scores higher than DDX (2.896), although deficiencies in professionalism and empathy are observed on both platforms. Overall, both platforms exhibit certain shortcomings in service speed, information quality, and service attitude, and neither can satisfy healthcare consumers in terms of postpurchase medication delivery and discounts of price.

Sentiment classification based on a hybrid deep learning model

The sentiment analysis algorithm is the core of the quality assessment method for telemedicine platforms proposed in this article, so its performance directly determines the effectiveness of that method. To further verify its performance, several other deep learning algorithms were run on the same dataset in comparison experiments, and the results are shown in Table 6.

Table 6.

Predicting results of different classification models (mean value ± standard deviation).

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-score (%)
CNN⁴²	82.70 ± 0.64	81.52 ± 1.12	84.58 ± 1.20	83.02 ± 0.77
RNN⁴³	82.59 ± 0.69	83.15 ± 1.09	81.79 ± 1.47	82.45 ± 0.68
BiLSTM⁴⁴	84.42 ± 0.73	83.85 ± 1.35	85.26 ± 0.98	84.55 ± 0.81
BiLSTM-Att⁴⁵	88.60 ± 0.58	88.77 ± 0.83	86.66 ± 0.62	87.69 ± 0.59
CNN-BiLSTM⁴⁶	88.31 ± 0.24	89.91 ± 0.41	89.75 ± 0.37	89.33 ± 0.48
Proposed methodology	91.24 ± 0.12	91.28 ± 0.40	91.23 ± 0.33	91.25 ± 0.26

BiLSTM: Bidirectional Long Short-term Memory Network; CNN: Convolutional Neural Network; RNN: Recurrent Neural Network.

The performance of the different models is presented as mean ± standard deviation for four core metrics: accuracy, precision, recall, and F1-score. The results in Table 6 show that the precision and F1-score of the proposed methodology are 91.28% and 91.25%, respectively, higher than the 89.91% and 89.33% of the CNN-BiLSTM model and the 88.77% and 87.69% of the BiLSTM-Att model.
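A table of this kind can be produced by computing the four metrics for each run and then summarizing across runs. The sketch below is illustrative only: it assumes macro-averaging over the sentiment classes and the sample standard deviation, neither of which is specified in this section, and the labels and predictions are toy data.

```python
from statistics import mean, stdev


def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return {"accuracy": accuracy, "precision": mean(precisions),
            "recall": mean(recalls), "f1": mean(f1s)}


def summarize(runs):
    """Mean and sample standard deviation of each metric, in percent."""
    return {key: (100 * mean([run[key] for run in runs]),
                  100 * stdev([run[key] for run in runs]))
            for key in runs[0]}


# Toy example: one set of gold labels and predictions from two runs.
y_true = ["pos", "pos", "neg", "neg", "neu", "neu"]
run_1 = macro_metrics(y_true, ["pos", "pos", "neg", "neg", "neu", "neu"])
run_2 = macro_metrics(y_true, ["pos", "neg", "neg", "neg", "neu", "neu"])
summary = summarize([run_1, run_2])

for name, (avg, sd) in summary.items():
    print(f"{name}: {avg:.2f} ± {sd:.2f}")
```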

Each baseline has a characteristic limitation. The CNN extracts local textual features through convolutional kernels but cannot model long-range semantic dependencies, so it recognizes complex emotional expressions poorly. The RNN models temporal sequences but suffers from the vanishing gradient problem, which makes dependencies in long texts difficult to capture. The BiLSTM captures contextual information effectively but is insufficiently sensitive to critical local features. The BiLSTM-Att strengthens the weighting of key positions through an attention mechanism, yet its single-head attention cannot comprehensively cover multidimensional semantic correlations. Although the CNN-BiLSTM integrates the CNN's local feature extraction with the BiLSTM's sequential modeling, it lacks a mechanism for connecting features across different hierarchical levels or positions.

The experimental results indicate that the sentiment analysis algorithm proposed in this article greatly assists in evaluating the service quality of telemedicine platforms. By combining the strengths of CNN and BiLSTM and introducing a multihead self-attention mechanism, the proposed model surpasses the other models on accuracy, precision, recall, and F1-score. The multihead attention mechanism analyzes the content of online comments more comprehensively, simulating the human ability to focus on key points during reading, while capturing multidimensional information such as service speed and doctors’ professional competence. Because the heads operate in parallel, the model can analyze a comment from several perspectives at once, yielding a more complete and richer representation of the data.
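To make the role of multiple heads concrete, the following sketch implements scaled dot-product multihead self-attention on a toy sequence using only the Python standard library. It illustrates the general mechanism, not the authors' network: the projection matrices are untrained random values standing in for learned parameters, the input stands in for whatever features the preceding layers would produce, and the sequence length, model width, and number of heads are arbitrary assumptions.

```python
import math
import random


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, num_heads, rng):
    """x is seq_len x d_model; returns (output, attention weights per head)."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    # Untrained random projections stand in for learned W_Q, W_K, W_V, W_O.
    w_q, w_k, w_v, w_o = (random_matrix(d_model, d_model, rng)
                          for _ in range(4))
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    head_outputs, head_weights = [], []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        q_h = [row[cols] for row in q]
        k_h = [row[cols] for row in k]
        v_h = [row[cols] for row in v]
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_head)) V.
        k_h_t = [list(col) for col in zip(*k_h)]
        scores = matmul(q_h, k_h_t)
        weights = [softmax([s / math.sqrt(d_head) for s in row])
                   for row in scores]
        head_outputs.append(matmul(weights, v_h))
        head_weights.append(weights)
    # Concatenate the heads along the feature axis, then mix them with W_O.
    concat = [sum((head[i] for head in head_outputs), [])
              for i in range(len(x))]
    return matmul(concat, w_o), head_weights


rng = random.Random(0)
tokens = random_matrix(4, 8, rng)  # toy input: 4 positions, 8 features
output, head_weights = multi_head_self_attention(tokens, num_heads=2, rng=rng)
print(len(output), len(output[0]), len(head_weights))
```

Each head attends over the same positions but through its own slice of the projected features, so two heads can assign high weight to different words of the same comment.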

Discussion

The main purpose of this study was to establish a new quality assessment method for telemedicine platforms. To better analyze the emotions of healthcare consumers during their use of telemedicine services, we collected 25,499 online reviews from telemedicine platforms. The study analyzes these comments to extract and summarize the factors that influence the evaluation of service quality on telemedicine platforms, and it applies deep learning methods to perform sentiment analysis on the comment text. On this basis, a model for evaluating the service quality of telemedicine platforms is constructed and used to assess the service quality that the platforms offer.

Jonkisz et al. investigated the application of the Servqual model in assessing healthcare quality in Asia, demonstrating its cross-domain applicability.²⁴ This study expands the boundaries of traditional service quality evaluation by integrating the Servqual model with deep learning techniques. In contrast to the classical Servqual framework proposed by Parasuraman et al.,¹⁷ this research restructures the dimensions specifically for the characteristics of telemedicine, aligning it better with the practical needs of digital healthcare services. Furthermore, it overcomes the dependence on subjective data inherent in traditional questionnaire-based surveys.²³

Because services are intangible, a scientifically robust system for evaluating service quality must consider the various factors that influence service standards. At the same time, the indicators should be representative, both concise and inclusive, so that the quality of telemedicine platforms is assessed from diverse perspectives. By jointly considering the inter-topic correlation and the Servqual service quality assessment model, this study reconstructs and refines the core indicator system for evaluating the service quality of telemedicine platforms. Given the fundamental differences between telemedicine and conventional services, the study revises the definition of each dimension of the evaluation model to align it better with the unique aspects of telemedicine services. We collected healthcare consumer comments from two telemedicine platforms. Using LDA topic clustering and the Servqual model's five quality dimensions, we found that healthcare consumer comments predominantly revolve around Quality of System, Quality of Service, Attitude of Service, Quality of Information, and Speed of Service. These comment topics are treated as the service features that healthcare consumers care about and are used to evaluate the quality of the services that the platforms provide.

This study adopted a hybrid deep learning model, CNN-BiLSTM-MHA, for sentiment analysis of healthcare consumer comments. Comparative experiments show that the model achieves a precision of 91.28% and an F1-score of 91.25%, approximately 11% higher than the single models. By introducing multiple attention heads, the model can focus on different parts of the comment text from various perspectives, capturing richer semantic information and achieving better accuracy and generalization in sentiment classification. Combining CNN with BiLSTM and adding the multihead self-attention mechanism, the model surpasses the other models on accuracy, precision, recall, and F1-score, which indicates that the proposed sentiment analysis algorithm greatly assists in evaluating the service quality of telemedicine platforms. The multihead attention mechanism also analyzes the content of online comments more comprehensively, enhancing the model's ability to understand healthcare consumers’ needs and service experiences and thereby supporting more targeted improvement suggestions for the platform. Because the heads operate in parallel, the model can analyze a comment from several perspectives at once, yielding a more complete and richer representation of the data.

Limitations

The data samples used in this study originate from the GDO and DDX platforms, which may introduce certain limitations. Users of both platforms are predominantly distributed in developed eastern cities, while western regions account for a lower proportion of users. This geographical imbalance may lead to underestimation of remote areas’ sensitivity to logistics efficiency (such as medication delivery delays) and information accuracy (including dialect-related communication barriers). Additionally, the platforms’ primary user base consists of young and middle-aged individuals (20–45 years old), with relatively low representation of elderly users (>60 years). Consequently, older users’ requirements for operational simplicity (e.g. a less complex interface) may not be adequately reflected. Regarding platform characteristics, DDX specializes in health science popularization whereas GDO emphasizes online consultations. These distinct service models could lead to different user priorities, with DDX users potentially emphasizing information quality and GDO users prioritizing response speed.

Conclusion

This study introduces a novel method for evaluating the quality of telemedicine platforms, using the Servqual quality assessment model in conjunction with the CNN-BiLSTM-MHA deep learning model. The swift proliferation of such platforms during the COVID-19 crisis has provided substantial assistance to China in addressing issues of medical resource scarcity and imbalances in supply and demand, while also supporting healthcare consumers’ health and wellbeing. However, uncontrolled and aggressive expansion could potentially lead to future challenges, including pricing irregularities and fraudulent marketing practices within the telemedicine sector. Consequently, assessing platform quality through healthcare consumer comments, which authentically capture their sentiments on aspects influencing their health and wellbeing, emerges as indispensable for the sustainable evolution of telemedicine services.

The evaluation framework and hybrid deep learning model developed in this study are not only applicable to telemedicine platforms in China, but also provide a viable methodology for global scenarios with similar characteristics. The urban–rural service disparities revealed in this research show parallels to the pain points of telemedicine implementation in developing regions such as India and Brazil. Through word embedding transfer learning techniques, the proposed model can be effectively transferred to analyze user reviews in other languages, thereby offering a cross-cultural service quality assessment tool for multilingual regions. The framework enables dynamic adjustment of weight coefficients across different evaluation dimensions to accommodate varying policy requirements and regulatory intensities in different national contexts.
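The effect of adjusting weight coefficients can be sketched as follows. The five GDO dimension scores are those reported in Table 5; the weights are hypothetical, since the weights used in the study are not reproduced in this section, and the assumption that the overall score is a normalized weighted sum of the dimension scores is an illustration rather than the study's stated procedure.

```python
# GDO dimension scores as reported in Table 5.
GDO_SCORES = {
    "Quality of System": 4.161,
    "Quality of Service": 4.199,
    "Speed of Service": 3.340,
    "Quality of Information": 3.575,
    "Attitude of Service": 3.615,
}


def overall_score(scores, weights):
    """Weighted mean of dimension scores; weights are normalized to sum to 1."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


# Hypothetical weighting 1: every dimension counts equally.
equal_weights = {name: 1.0 for name in GDO_SCORES}

# Hypothetical weighting 2: a context that stresses response speed.
speed_focused = dict(equal_weights)
speed_focused["Speed of Service"] = 2.0

equal_result = overall_score(GDO_SCORES, equal_weights)
speed_result = overall_score(GDO_SCORES, speed_focused)
print(f"equal weights: {equal_result:.3f}")
print(f"speed-focused weights: {speed_result:.3f}")
```

Because Speed of Service is the lowest-scoring GDO dimension, giving it more weight lowers the overall score, which shows how the same reviews can lead to different conclusions under different regulatory priorities.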

Our findings have important practical implications for participants in telemedicine platforms. This study found that healthcare consumers’ evaluation of the quality of telemedicine platform services can be divided into five main dimensions, the most important being Quality of Service, a critical factor for promoting consumer health and wellbeing. For practitioners of telemedicine platforms, it is essential to standardize and control service pricing while rigorously ensuring service quality to promote both health and wellbeing. As healthcare needs expand, the methodology proposed in this study for assessing the quality of telemedicine services can provide valuable insights into the development of healthcare services. Based on the weights of the service quality dimensions and the identified score deficiencies, it is recommended that platforms implement phased improvements according to the priority levels outlined in Table 7 below. This article validates the proposed method on data from two platforms, so the collected data may be subject to sample bias. In future work, the method can be validated further by collecting data through additional channels.

Table 7.

Schedule of improvement tasks for telemedicine platforms.

Stage	Time (months)	Key points	Expected results
Short-term	0–6	Optimize doctor response speed; streamline interface advertising	10% increase in user satisfaction index; consultation response time ≤15 minutes
Mid-term	6–12	Deploy AI-assisted diagnostic module; establish regional pharmaceutical hubs	15% improvement in diagnostic accuracy rate; 48-hour guaranteed delivery for remote areas
Long-term	12–24	Deploy blockchain encrypted medical record system; comprehensive accessibility upgrade	30% reduction in privacy complaints; 20% increase in elderly user activity

AI: artificial intelligence.

Footnotes

ORCID iDs

Xiaogang Jin

Chaoqi Chang

Ethical considerations

Ethical approval is not applicable to this study as no human participants or animals are used.

Consent to participate

Written informed consent was obtained from all participants prior to their inclusion in the research. The consent process explicitly outlined the study's purpose, procedures, potential risks, and benefits, as well as participants’ right to withdraw at any time without penalty. All data were anonymized prior to analysis to ensure confidentiality. Participants were informed that anonymized findings may be published in an open-access format, freely accessible to the public. No personally identifiable information (e.g. images, names, medical records) is included in this manuscript.

Author contributions

XJ and YY were involved in conceptualization; XJ in methodology, data curation, and writing (original draft preparation); CC in software and investigation; XW in validation; YY in formal analysis, resources, supervision, project administration, and funding acquisition; XT in writing (review and editing); and ZL in visualization. All authors have read and agreed to the published version of the manuscript.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Social Science Fund of China (grant number 23BXW011).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Foglia, Garagiola, Bellavia, et al. Digital technology and COVID-19 pandemic: feasibility and acceptance of an innovative telemedicine platform. Technovation 2024; 130: 102941.

2. Wang, Zhou, et al. Utilization of, satisfaction toward, and challenges for Internet-based healthcare services provided by primary health institutions: evidence from China. Front Public Health 2023; 10: 1100634.

3. Xiao, Chen, Zhou, et al. Challenges in establishing a strong telemedicine system in China. Postgrad Med J 2023; 99: 1–3.

4. Hartasanchez, Heen, Kunneman, et al. Remote shared decision making through telemedicine: a systematic review of the literature. Patient Educ Couns 2022; 105: 356–365.

5. Zhang, Wei, et al. Analysis of health service utilization and influencing factors due to COVID-19 in Beijing: a large cross-sectional survey. Health Res Policy Syst 2024; 22: 31.

6. Benis, Banker, Pinkasovich. Reasons for utilizing telemedicine during and after the COVID-19 pandemic: an internet-based international study. J Clin Med 2021; 10: 5519.

7. Song, Liu, Wang. The role of telemedicine during the COVID-19 epidemic in China—experience from Shandong province. Crit Care 2020; 24: 1–4.

8. Tsou, Robinson, Boyd, et al. Effectiveness of telehealth in rural and remote emergency departments, systematic review. J Med Internet Res 2021; 23: e30632.

9. Rush, Seaton, et al. Rural use of health service and telemedicine during COVID-19: the role of access and eHealth literacy. Health Informatics J 2021; 27: 14604582211020064.

10. Fisher, Tapley, Ralston, et al. General practice trainees’ telehealth use during the COVID-19 pandemic: a cross-sectional study. Fam Pract 2023; 40: 638–647.

11. Fitzsimon, Belanger, Glazier. Clinical and economic impact of a community-based, hybrid model of in-person and virtual care in a Canadian rural setting: a cross-sectional population-based comparative study. BMJ Open 2023; 13: e069699.

12. Hosseini, Boushehri, Alimohammadzadeh. Challenges and solutions for implementing telemedicine in Iran from health policymakers’ perspective. BMC Health Serv Res 2024; 24: 50.

13. Ftouni, AlJardali, Hamdanieh, et al. Challenges of telemedicine during the COVID-19 pandemic: a systematic review. BMC Med Inform Decis Mak 2022; 22: 207.

14. Mishkin, Zabinski, Holt, et al. Ensuring privacy in telemedicine: ethical and clinical challenges. J Telemed Telecare 2023; 29: 217–221.

15. Antonacci, Benevento, Bonavitacola. Healthcare professional and manager perceptions on drivers, benefits, and challenges of telemedicine: results from a cross-sectional survey in the Italian NHS. BMC Health Serv Res 2023; 23: 1115.

16. Rosário. Telemedicine platforms and telemedicine systems in patient satisfaction. In: Improving security, privacy, and connectivity among telemedicine platforms. Pennsylvania: IGI Global, 2024, pp.119–151.

17. Parasuraman, Zeithaml, Berry. A conceptual model of service quality and its implications for future research. J Mark 1985; 49: 41–50.

18. Dam. Relationships between service quality, brand image, customer satisfaction, and customer loyalty. J Asian Finance Econ Bus 2021; 8: 585–593.

19. Quan, Thanh, Thuy TNT. The capability of E-reviews in online shopping. Integration of the PLS-SEM and ANN method. Int J Prof Bus Rev 2023; 8: e02638.

20. Rane, Achari, Choudhary. Enhancing customer loyalty through quality of service: effective strategies to improve customer satisfaction, experience, relationship, and engagement. Int Res J Mod Eng Technol Sci 2023; 5: 427–452.

21. Liu, Zhang, Gao, et al. Physician voice characteristics and patient satisfaction in online health consultation. Inf Manag 2020; 57: 103233.

22. Wang, Zhao, et al. A novel topic clustering algorithm based on graph neural network for question topic diversity. Inf Sci (Ny) 2023; 629: 685–702.

23. Goula, Stamouli, Alexandridou. Public hospital quality assessment. Evidence from Greek health setting using SERVQUAL model. Int J Environ Res Public Health 2021; 18: 3418.

24. Jonkisz, Karniej, Krasowska. The Servqual method as an assessment tool of the quality of medical services in selected Asian countries. Int J Environ Res Public Health 2022; 19: 7831.

25. Wilson, Maeder. Recent directions in telemedicine: review of trends in research and practice. Healthc Inform Res 2015; 21: 213.

26. Hardy, Grinzaid. Benefits and challenges of telemedicine: the JScreen program experience. Curr Genet Med Rep 2017; 5: 84–90.

27. Sharifikia, Rafizadeh, Shahmoradi. Telemedicine in the emergency department: an overview of systematic reviews. J Public Health (Berl.) 2023; 31: 1193–1207.

28. Girum, Lentiro, Geremew. Global strategies and effectiveness for COVID-19 prevention through contact tracing, screening, quarantine, and isolation: a systematic review. Trop Med Health 2020; 48: 91.

29. Pappalardo, Fanelli, Chiné, et al. Telemedicine in pediatric infectious diseases. Children 2021; 8: 260.

30. Snoswell, Chelberg, De Guzman, et al. The clinical effectiveness of telehealth: a systematic review of meta-analyses from 2010 to 2019. J Telemed Telecare 2023; 29: 669–684.

31. Weiner, Bandeian, Hatef, et al. In-person and telehealth ambulatory contacts and costs in a large US insured cohort before and during the COVID-19 pandemic. JAMA Netw Open 2021; 4: e212618.

32. Patil, Rane. Customer experience and satisfaction: importance of customer reviews and customer value on buying preference. Int Res J Mod Eng Technol Sci 2023; 5: 3437–3447.

33. Fan. Effect of online reviews on consumer purchase behavior. J Serv Sci Manag 2015; 8: 419.

34. Thakur. Customer engagement and online reviews. J Retail Consum Serv 2018; 41: 48–59.

35. Sudirjo, Ratnawati, Hadiyati, et al. The influence of online customer reviews and e-service quality on buying decisions in electronic commerce. J Manag Creat Bus 2023; 1: 156–181.

36. Bruce, et al. Restaurant survival prediction using customer-generated content: an aspect-based sentiment analysis of online reviews. Tour Manag 2023; 96: 104707.

37. Wang. Keyword extraction from scientific research projects based on SRP-TF-IDF. Chin J Electron 2021; 30: 652–657.

38. Kim, Kang. Analyzing the discriminative attributes of products using text mining focused on cosmetic reviews. Inf Process Manag 2018; 54: 938–957.

39. Zhao, Zhang, Zeng. Construction of an aspect-level sentiment analysis model for online medical reviews. Inf Process Manag 2023; 60: 103513.

40. Chang, Hwang. Learning bilingual sentiment lexicon for online reviews. Electron Commer Res Appl 2021; 47: 101037.

41. Jain, Pamula, Srivastava. Systematic literature review on machine learning applications for consumer sentiment analysis using online reviews. Comput Sci Rev 2021; 41: 100413.

42. Mhamed, Sutcliffe, Quteineh. A deep CNN architecture with novel pooling layer applied to two Sudanese Arabic sentiment data sets. J Inf Sci 2022: 01655515231188341.

43. Kuang, Safa, Edalatpanah. A hybrid deep learning approach for sentiment analysis in product reviews. Facta Univ Ser: Mech Eng 2023; 21: 479–500.

44. Pei, Chen. AB-LaBSE: Uyghur sentiment analysis via the pre-training model with BiLSTM. Appl Sci 2022; 12: 1182.

45. Jiang, Zhao, et al. A PERT-BiLSTM-Att model for online public opinion text sentiment analysis. Intell Autom Soft Comput 2023; 37: 2387–2406.

46. Alayba, Palade. Leveraging Arabic sentiment classification using an enhanced CNN-LSTM approach and effective Arabic text preparation. J King Saud Univ Comput Inf Sci 2022; 34: 9710–9722.