Abstract
Objective
To identify user needs on digital mental health platforms through text mining to guide service optimization.
Methods
User comments collected over 6 months via Python-based web scraping (ethically reviewed) underwent data cleaning, word segmentation, and stop-word removal. Latent topics were extracted using Latent Dirichlet Allocation (LDA), with sentiment and semantic network analyses via ROST Content Mining System (Version 6.0). A need-intensity formula integrated with LDA enabled quantitative analysis of user needs and emotional tendencies.
Results
From 25,131 valid user comments, six main topics were identified. Topic prevalence, measured by comment volume, was highest for “Career and Personal Growth” (20.40%) and “Social and Emotional Support” (20.15%). Sentiment analysis showed over 77% of comments expressed negative emotions, with each topic exceeding 96% negative sentiment. We then calculated a composite need-intensity score (via LDA integration and Analytic Hierarchy Process weighting: comment count 0.54, sentiment 0.16, user attention 0.30) to prioritize beyond mere prevalence. This analysis identified “Career and Personal Growth” (0.94) and “Social and Emotional Support” (0.88) as the most pressing needs.
Conclusions
Users exhibit pronounced needs in emotion management and psychological counseling, with a strong emphasis on familial and social support. To address the most urgent needs identified by our need-intensity analysis, we recommend that platforms implement targeted features such as a structured career stress triage pathway and secure family linkage options. These actionable strategies can enhance service precision and user engagement, providing a clear roadmap for optimizing digital mental health service delivery.
Keywords
Introduction
The escalating global burden of mental disorders has intensified the demand for accessible and effective mental health services, a need particularly acute in China where the lifetime prevalence of mental disorders is 16.6%. 1 Digital mental health platforms have rapidly developed to meet this demand, positioning themselves as a key supplement to traditional services. However, the core value of these platforms hinges on their ability to accurately understand and respond to user needs, which are inherently dynamic and heterogeneous.2,3 Currently, a significant disconnect exists between user demands and service provision on these platforms, characterized by structural imbalances in supply and a lag in responding to evolving needs.4,5
This disconnect stems from critical shortcomings in the current paradigms for identifying user needs. Firstly, prevailing methodological approaches, which often rely on small-sample surveys or qualitative interviews, 6 lack the scalability to capture the full spectrum of real-world, large-scale user concerns. Secondly, many studies aiming to understand user needs draw data from general social media.7,8 While accessible, this data is prone to source bias and diluted by non-clinical content, thereby lacking the direct relevance and “signal-to-noise ratio” of data from dedicated mental health platforms. Thirdly, and most critically, existing research predominantly stops at identifying the existence of needs. There is a conspicuous lack of frameworks to quantitatively prioritize these needs by their intensity and urgency, 9 which is essential for guiding strategic resource allocation and targeted service optimization by platform managers.
To bridge these gaps, this study introduces a text mining approach specifically designed for precise need identification and prioritization. We analyze user comments from leading specialized digital mental health platforms in China (Yixinli, Yidianling, and Haoxinqing). By integrating Latent Dirichlet Allocation (LDA) topic modeling with sentiment analysis and a novel need-intensity formula, our research moves beyond thematic discovery to deliver a quantifiable ranking of user demands. This methodology directly addresses the identified limitations by leveraging large-scale, high-relevance data to provide actionable, data-driven insights for platform optimization, thereby offering a concrete pathway to better align digital mental health services with the most pressing needs of their users.
Methods
Data source
The data in this study were sourced from mainstream professional digital mental health service platforms in China, including Yixinli, Yidianling, and Haoxinqing. These platforms were selected for three reasons. First, they align closely with the research theme. This study aims to accurately identify user groups with explicit needs for professional mental health services. Compared with comprehensive social platforms (e.g. Zhihu, Weibo) or embedded applications (e.g. WeChat Mini Programs), these independent platforms are the primary venues where users proactively seek professional psychological help. The messages and consultation content that users generate here directly reflect their unmet mental health service needs and have a high data “signal-to-noise ratio,” which avoids interference from the general social and entertainment-oriented content found on comprehensive platforms. Second, platform content is authoritative and rich. The selected platforms have established authority and a broad user base in China's mental health field and provide a large amount of structured user-generated content (UGC), such as message titles, content, view counts, and reply numbers. These metadata provide key indicators for quantitatively analyzing need intensity, which are often lacking in data from other forums. Third, the data are accessible and ethically compliant. The public data on these platforms were obtained through their publicly accessible interfaces. During data collection, the research team collected only explicitly public UGC and excluded any information involving personal privacy. Additionally, this study was approved by the Ethics Committee of Wenzhou Medical University (Approval ID: 2025061), which provides ethical guarantees for the compliant acquisition and use of data, in line with the norms of academic research.
This study collected user message data from the most recent 6 months (March to August 2023). The 6-month window captures a representative snapshot of contemporary user concerns while mitigating the impact of seasonal variations in mental health (e.g. seasonal affective disorder, year-end stress, or academic examination periods). It balances the need for timely data with sufficient volume and stability for robust analysis, reducing confounding from single external events. Ultimately, a total of 26,472 raw text entries were obtained from the aforementioned platforms, providing strong support for in-depth analysis of user needs and emotional tendencies.
Text mining
Latent Dirichlet Allocation topic modeling
To identify latent themes from the collected user comments, we employed LDA, an unsupervised generative probabilistic model that represents documents as mixtures of topics and topics as distributions of words.10,11 The modeling process was implemented using the Gensim library (version 4.3.0) in Python. Text preprocessing included Chinese word segmentation using the Jieba tokenizer (version 0.42.1) with platform-specific dictionaries to enhance accuracy, and stop-word removal using the Dalian University of Technology stop-word list supplemented with custom internet slang filters. To determine the optimal number of topics (K), models with K ranging from 2 to 15 were iteratively evaluated based on perplexity and coherence scores (using the ‘c_v’ coherence metric). The value K = 6 was selected as it achieved the lowest perplexity (Figure 1) and the highest coherence (score = 0.79; Figure 2), indicating an optimal balance between model fit and topic interpretability for our corpus. This quantitative approach was supplemented by a structured qualitative validation process to assign meaningful labels to the LDA-derived topics. The process involved three experts in psychological services, each with over 5 years of research experience in the field. They first independently reviewed the top keywords and a set of representative (paraphrased) comments for each topic. Following the independent review, the experts engaged in a structured online discussion to present their initial thematic labels and justifications. The discussion continued until a full consensus was reached on the final label for each topic, ensuring that the identified topics were not only data-driven but also clinically meaningful and consistently interpreted.
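Of the two model-selection criteria, perplexity has a compact definition: the exponential of the negative average log-likelihood per token, where each token's probability is obtained by summing over topics. The sketch below computes it from a fitted model's document-topic distribution `theta` and topic-word distribution `phi`, assumed here to be plain nested lists indexed by integer word ids. It is a generic illustration, not Gensim's internal routine: Gensim's `LdaModel.log_perplexity` returns a per-word likelihood bound rather than perplexity itself, and c_v coherence would be computed separately (for example with Gensim's `CoherenceModel` and `coherence='c_v'`).

```python
import math


def perplexity(docs, theta, phi):
    """Perplexity of tokenized documents under an LDA-style topic model.

    docs  : list of documents, each a list of integer word ids
    theta : theta[d][k] = P(topic k | document d)
    phi   : phi[k][w]   = P(word w | topic k)
    Lower values mean the model predicts the observed words better.
    """
    log_likelihood = 0.0
    n_tokens = 0
    for d, doc in enumerate(docs):
        for w in doc:
            # P(w | d) marginalizes over topics: sum_k theta[d][k] * phi[k][w]
            p_w = sum(theta[d][k] * phi[k][w] for k in range(len(phi)))
            log_likelihood += math.log(p_w)
            n_tokens += 1
    # Perplexity = exp(-(total log-likelihood) / (number of tokens))
    return math.exp(-log_likelihood / n_tokens)


# Hypothetical toy model: 2 documents, 2 topics, 4-word vocabulary
docs = [[0, 1, 2], [3, 3, 1]]
theta = [[0.5, 0.5], [0.2, 0.8]]
uniform_phi = [[0.25] * 4, [0.25] * 4]
print(perplexity(docs, theta, uniform_phi))  # approximately 4.0
```

With a uniform topic-word distribution over four words every token has probability 0.25, so perplexity equals the vocabulary size; a model that concentrates probability on the words actually observed scores lower.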

Evaluation of perplexity across different topic numbers.

Evaluation of topic coherence across different topic numbers.
ROST content mining system (Version 6)
The ROST Content Mining System (Version 6) is an integrated text analysis platform that utilizes big data technologies and incorporates advanced natural language processing and machine learning methods. 12 It offers multiple core functions including Chinese word segmentation, word frequency analysis, social network, and semantic network analysis, as well as sentiment analysis. In this study, ROST CM6 was applied to derive meaningful insights from the extensive user-generated comments on mental health service platforms. The system's sentiment analysis module calculates sentiment polarity and intensity based on established linguistic resources and machine learning techniques. 13 Semantic network analysis was used to construct a network graph by extracting high-frequency words and their co-occurrence relationships, thereby revealing the conceptual structure and connections within the user discourse.
By integrating LDA topic modeling with ROST CM6's sentiment and semantic network analyses, our research was able to not only identify latent themes in user comments but also examine the emotional tendencies and conceptual relationships associated with each topic. This combined approach enabled a comprehensive understanding of user needs at both macro-level thematic and micro-level emotional dimensions.
It is important to acknowledge the limitations of the ROST CM6 system. Its sentiment analysis may be influenced by biases in its pre-built dictionaries, which might not fully capture domain-specific expressions or emerging terminology. Additionally, although the system combines dictionary resources with machine learning techniques, it may still misclassify nuanced language such as irony, sarcasm, or context-dependent expressions. These limitations suggest that automated results should be interpreted with caution and may benefit from supplementary manual validation.
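To make the dictionary-based mechanism and its failure modes concrete, the following sketch scores tokenized comments with a toy polarity lexicon and a one-token negation rule, then reports class shares. The lexicons, negation list, and scoring rule are all hypothetical illustrations; ROST CM6's actual dictionaries and algorithm are not reproduced here.

```python
# Hypothetical toy lexicons (English glosses for readability); these are NOT
# ROST CM6's dictionaries, which are larger and Chinese-language.
POSITIVE = {"better", "hope", "grateful", "relieved", "understanding"}
NEGATIVE = {"anxious", "lonely", "pressure", "depressed", "insomnia", "lost"}
NEGATORS = {"not", "no", "never"}


def polarity(tokens):
    """Classify one tokenized comment as 'positive', 'negative' or 'neutral'."""
    score = 0
    for i, tok in enumerate(tokens):
        weight = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        # Simple negation rule: flip polarity if the previous token is a negator
        if weight and i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def sentiment_shares(comments):
    """Percentage of comments falling into each polarity class."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for tokens in comments:
        counts[polarity(tokens)] += 1
    return {label: 100 * n / len(comments) for label, n in counts.items()}


examples = [
    "i feel anxious and lonely".split(),  # negative
    "i am not lonely anymore".split(),    # negation flips it: positive
    "so grateful for insomnia".split(),   # sarcasm cancels out: neutral
]
for tokens in examples:
    print(" ".join(tokens), "->", polarity(tokens))
print(sentiment_shares(examples))
```

The third example shows how a sarcastic comment ("so grateful for insomnia") nets out to neutral under a purely lexical rule even though its intended meaning is negative, which is the kind of misclassification described above.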
Statistical analysis
In this study, we integrated the concept of need intensity with LDA to quantitatively assess public demand for psychological services across different identified topics. This approach involved translating key indicators—including the number of user comments, sentiment tendency, and user attention (proxied by average views)—into a composite need-intensity value for each topic. These values were then statistically analyzed to evaluate and compare the demand levels for various mental health service themes, thereby offering data-driven insights to support the refined management of public mental health resources. The need intensity for a specific topic was formalized using the following formula (1):
The weight coefficients α, β, and χ for the three indicators were determined using the Analytic Hierarchy Process (AHP). To ensure the robustness and validity of the weighting, a panel of seven experts specializing in mental health services, clinical psychology, and public health policy was assembled. Each expert independently constructed a pairwise comparison matrix by scoring the relative importance of each indicator. The consistency of each expert's judgments was rigorously evaluated using the consistency ratio (CR), with all CR values falling below the standard threshold of 0.1, indicating acceptable consistency in the pairwise comparisons. 14 The final weights were derived by aggregating the consistent judgments of all seven experts. Using formula (1) with the obtained weights, we calculated the need intensity for each topic.
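The AHP step can be sketched as follows. Each pairwise comparison matrix is reduced to a weight vector via its principal eigenvector (approximated here by power iteration), and consistency is checked with CR = CI / RI, where CI = (λmax - n) / (n - 1) and RI is Saaty's random index (0.58 for three criteria). The text does not state how the seven experts' judgments were aggregated; the element-wise geometric mean used below is a common AHP choice but is an assumption here, as are all function names.

```python
import math

# Saaty's random consistency index (RI) by matrix size n
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12}


def aggregate_geometric_mean(matrices):
    """Element-wise geometric mean of several experts' pairwise matrices
    (assumed aggregation rule; the study does not specify one)."""
    m, n = len(matrices), len(matrices[0])
    return [[math.prod(M[i][j] for M in matrices) ** (1 / m) for j in range(n)]
            for i in range(n)]


def ahp_weights(A, iterations=100):
    """Return (weights, consistency_ratio) for an n x n pairwise matrix, n >= 2.

    Weights approximate the principal eigenvector of A (power iteration),
    normalized to sum to 1; CR = CI / RI with CI = (lambda_max - n) / (n - 1).
    """
    n = len(A)
    w = [1.0 / n] * n
    for _ in range(iterations):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(Aw)
        w = [x / total for x in Aw]
    # Estimate lambda_max as the mean of (A w)_i / w_i
    Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(Aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, cr


# A perfectly consistent matrix built from the reported weights: a_ij = w_i / w_j
target = [0.54, 0.16, 0.30]
A = [[wi / wj for wj in target] for wi in target]
weights, cr = ahp_weights(A)
print([round(x, 2) for x in weights], f"CR < 0.1: {cr < 0.1}")
```

For a perfectly consistent matrix the recovered weights equal the generating weights and CR is zero up to rounding; real expert matrices deviate from this, and the 0.1 threshold decides whether a matrix is accepted.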
In summary, the need intensity of a topic is influenced by (a) the volume of user comments, (b) the strength of sentiment expressed (where more extreme sentiment signals higher urgency), and (c) the degree of public attention measured by view counts. Collectively, a larger number of comments, stronger emotional expression, and greater viewer engagement each contribute to a higher need-intensity score for a given topic.
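The exact form of formula (1), including how the three indicators are normalized, is treated as an assumption in the sketch below: each indicator is min-max normalized across topics and the normalized values are combined as a weighted sum with the reported AHP weights α = 0.54, β = 0.16, and χ = 0.30. The normalization scheme, the per-topic "sentiment strength" input, and the topic statistics are hypothetical, so the sketch is not expected to reproduce the scores reported for the study.

```python
# AHP-derived weights reported in the study
ALPHA, BETA, CHI = 0.54, 0.16, 0.30


def min_max(values):
    """Rescale values to [0, 1]; a constant indicator carries no ranking
    information and is mapped to 0 for every topic."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def need_intensity(topics):
    """topics: {name: (comment_count, sentiment_strength, avg_views)}.

    ASSUMED form of formula (1): weighted sum of min-max normalized indicators.
    Returns {name: score in [0, 1]}.
    """
    names = list(topics)
    counts = min_max([topics[n][0] for n in names])
    sentiment = min_max([topics[n][1] for n in names])
    attention = min_max([topics[n][2] for n in names])
    return {n: ALPHA * c + BETA * s + CHI * a
            for n, c, s, a in zip(names, counts, sentiment, attention)}


# Hypothetical topic statistics (not the study's data)
example = {
    "Topic A": (100, 0.9, 500),
    "Topic B": (50, 0.5, 300),
    "Topic C": (80, 0.7, 400),
}
scores = need_intensity(example)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

This prints Topic A: 1.000, Topic C: 0.554, Topic B: 0.000. Because the weights sum to 1, a topic that leads on all three indicators scores exactly 1, and the large comment-count weight means volume dominates the ranking unless sentiment or attention differ sharply.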
Results
Need identification
To ensure comprehensive and representative data, we developed a Python-based web crawler to capture publicly available user comments from the selected platforms. The raw data collected were then cleaned and preprocessed. We removed duplicate entries, completed any missing fields, and filtered out irrelevant or invalid information. We performed Chinese word segmentation using the Jieba toolkit, augmented with domain-specific dictionaries from the platforms to improve segmentation accuracy. We also filtered out stop words using the Dalian University of Technology stop-word list supplemented with a custom list of meaningless internet slang. After preprocessing, we retained 25,131 valid text entries for analysis. We combined each comment's title, content, timestamp, view count, and number of replies into a structured dataset for analyzing user needs on the digital mental health service platforms.
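The cleaning steps described above (deduplication, segmentation, and stop-word removal) can be sketched as a single function. Segmentation is passed in as a callable: for Chinese comments this would be a segmenter such as Jieba's `jieba.lcut` (optionally after loading a domain dictionary with `jieba.load_userdict`), while the default whitespace split only keeps the sketch dependency-free. The function name and the rule of dropping comments that become empty after filtering are illustrative assumptions.

```python
def preprocess(comments, stopwords, tokenize=str.split, min_tokens=1):
    """Deduplicate, tokenize and stop-word-filter raw comments.

    comments  : list of raw comment strings
    stopwords : set of tokens to discard (stop-word list plus custom slang)
    tokenize  : callable str -> list[str]; for Chinese text pass a segmenter
                such as jieba.lcut (whitespace split keeps this dependency-free)
    min_tokens: comments with fewer remaining tokens are dropped
    """
    seen = set()
    cleaned = []
    for text in comments:
        key = text.strip()
        if not key or key in seen:  # skip blank and exact-duplicate comments
            continue
        seen.add(key)
        tokens = [t for t in tokenize(key) if t.strip() and t not in stopwords]
        if len(tokens) >= min_tokens:
            cleaned.append(tokens)
    return cleaned


raw = ["work pressure is high", "work pressure is high", "   ", "the the"]
print(preprocess(raw, stopwords={"is", "the"}))
```

Here the duplicate and blank entries are skipped and "the the" is dropped after stop-word removal, leaving `[['work', 'pressure', 'high']]`.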
Need analysis
Need content analysis
Table 1 shows the topic modeling results for the collected user comments. We first segmented the text using Jieba (version 0.42.1) and removed stop words, then applied a TF-IDF model to extract keywords and convert the documents into numerical feature vectors. We finally employed the LDA topic model (Gensim 4.3.0) to identify prevalent themes. Using perplexity and coherence metrics (‘c_v’ coherence), we determined the optimal number of topics to be 6 (coherence score = 0.79; see Figure 2). We extracted the top keywords for each topic and ranked them by their probability within that topic, yielding the results in Table 1. By interpreting and inductively grouping these keywords, we identified six prominent “need topics” expressed by users: Career and Personal Growth, Social and Emotional Support, Adolescent Mental Health, Emotional Disorder Treatment, Family and Marriage Counseling, and Family Life and Parenting. As shown in the table, users’ needs on the platforms primarily concentrate on personal growth and career issues, emotional support, and youth mental health.
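The TF-IDF keyword step can be illustrated with a generic implementation on already-segmented comments. The weighting below (raw term count multiplied by log(N / df)) is one textbook variant chosen for clarity; Gensim's `TfidfModel` applies its own weighting and normalization defaults, so its values would differ. The function names are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * log(N / df); a term present in every
    document gets weight 0 because it does not discriminate between them.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n_docs / df[t]) for t, c in Counter(doc).items()}
            for doc in docs]


def top_keywords(weights, k=3):
    """The k highest-weighted terms of one document (ties broken alphabetically)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


docs = [["work", "stress", "sleep"], ["work", "parents"], ["work", "stress"]]
weights = tfidf(docs)
print(top_keywords(weights[0], k=2))
```

In this toy corpus "work" appears in every comment and receives weight 0, while the rarer "sleep" outranks "stress", so the call prints `['sleep', 'stress']`.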
Identified topics from user comments, with topic intensity scores and top keywords.
To qualitatively contextualize the interpreted topic labels and enhance the validity of our findings, we examined representative user expressions corresponding to the key themes. For instance, the theme “Career and Personal Growth” was vividly reflected in user descriptions of work-related anxiety and sleep disturbances due to excessive overtime, as well as feelings of being lost and under immense pressure regarding their future career path. Similarly, the “Social and Emotional Support” topic captured expressions of loneliness and a desire for connection, exemplified by users reporting having no one to confide in when feeling down and expressing a hope to find genuinely understanding friends. Concerns under “Adolescent Mental Health” often related to overwhelming academic pressure and strained family dynamics, with users describing schoolwork as unbearable and feelings of being suffocated, or reporting that their parents focused solely on academic performance while dismissing their emotional needs. Furthermore, users seeking “Emotional Disorder Treatment” expressed clinical concerns, describing persistent low mood, lack of motivation over several months—wondering if it constituted depression—and debilitating panic attacks that impaired their daily functioning. These paraphrased accounts illustrate the lived experiences and core concerns behind the quantitative topic modeling results, grounding our thematic interpretation in the actual voices of users while safeguarding their privacy.
In Figure 3, we present a visualization of the LDA topic model generated with pyLDAvis. In this bubble chart, each bubble represents one topic; larger bubbles indicate topics that appear more frequently in the corpus. The distance between bubbles reflects the similarity between topics, and overlapping bubbles would suggest shared characteristic keywords. 15 Such a visualization provides an intuitive overview of the prominent needs expressed in a large text corpus. In our results, the topic bubbles are well separated with no overlaps, indicating a clear delineation between topics. This is consistent with the model having extracted distinct themes and supports the appropriateness of the chosen number of topics.
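In pyLDAvis, the default inter-topic distances are derived from the Jensen-Shannon divergence between topic-word distributions, which is then projected to two dimensions by principal coordinate analysis. The sketch below computes only the divergence matrix from a hypothetical topic-word matrix `phi` (each row summing to 1); the 2-D projection is omitted, and the function names are illustrative.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats; terms where p_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon(p, q):
    """Symmetric Jensen-Shannon divergence between two word distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_distance_matrix(phi):
    """Pairwise JS divergence between topics (rows of phi, each summing to 1)."""
    k = len(phi)
    return [[jensen_shannon(phi[i], phi[j]) for j in range(k)] for i in range(k)]


# Hypothetical topic-word distributions over a 4-word vocabulary
phi = [
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.4, 0.4, 0.1, 0.1],
]
for row in topic_distance_matrix(phi):
    print([round(d, 3) for d in row])
```

Identical topics have divergence 0, and topics with no words in common reach the maximum of ln 2 (about 0.693 nats), so well-separated bubbles correspond to large entries in this matrix.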

LDA topic model visualization of user needs. LDA: Latent Dirichlet Allocation.
Need correlation analyses
As shown in Figure 4, we generated a semantic network graph via ROST CM6.0 to display the core themes in the user comments and their interconnections. In this graph, each node represents a distinct keyword, and the lines between nodes indicate co-occurrence relationships, revealing the internal links among user needs. Several keywords emerge as central nodes in this network, including “parents,” “children,” “learning,” and “friends.” These central nodes occupy important positions in the user messages, indicating that they are focal points of user concern. For example, the node “emotion” is closely connected to state-related words such as “depression” and “anxiety,” illustrating users’ concern with emotion management. The “family” node is associated with words such as “parents,” “children,” and “marriage,” which further emphasizes the central role of family relationships in discussions of mental health (Vigezzi et al., 2022). Additionally, connections among words such as “learning,” “work,” and “school” and terms such as “effort” and “pressure” indicate user needs pertaining to personal growth and career development. These linkages reveal an urgent desire among users for psychological balance and healthy development in the contexts of family, education, and work.
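A semantic network of this kind can be built by keeping the most frequent keywords as nodes and linking two keywords when they co-occur in the same comment. The sketch below uses a whole-comment co-occurrence window and simple frequency thresholds; ROST CM6's actual window size and cut-offs are not stated in the text, so these choices, like the function name, are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs, top_n=50, min_weight=2):
    """Keyword co-occurrence network from tokenized comments.

    Nodes: the top_n most frequent terms. Edge (a, b): number of comments in
    which a and b both appear (document-level window), kept if >= min_weight.
    Returns (edges, degree) where degree counts each node's neighbours.
    """
    freq = Counter(t for doc in docs for t in doc)
    nodes = {t for t, _ in freq.most_common(top_n)}
    pairs = Counter()
    for doc in docs:
        present = sorted(set(doc) & nodes)  # sorting gives a canonical pair order
        pairs.update(combinations(present, 2))
    edges = {pair: w for pair, w in pairs.items() if w >= min_weight}
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return edges, degree


docs = [
    ["family", "parents", "children"],
    ["family", "parents"],
    ["work", "pressure"],
    ["work", "pressure", "family"],
]
edges, degree = cooccurrence_network(docs)
print(edges)
print(degree.most_common())
```

Nodes with high degree play the role of the "central nodes" discussed above; raising `min_weight` prunes incidental links and leaves only recurring associations.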

Semantic network of user needs.
Need sentiment analysis
As shown in Table 2, a preliminary sentiment analysis was conducted on the user comments using the built-in sentiment module of ROST CM6.0. Overall, negative emotions were expressed far more frequently than positive emotions: positive emotions accounted for only 21.97% of all sentiment expressions and neutral emotions for 0.64%, whereas negative emotions accounted for 77.39%. This strikingly high prevalence of negative sentiment suggests that most users on mental health service platforms express their needs in a negative emotional tone, reflecting considerable psychological distress and highlighting the urgent need for effective emotional support and intervention on these platforms.
Sentiment analysis results of user comments, with overall and intensity-level breakdown.
Need decision-making
Need-intensity analysis
As shown in Table 3, the need-intensity analysis provides statistical data on the number of user comments, their proportion (comment share), the sentiment distribution, and the calculated need-intensity score for each mental health service topic. Among these topics, the “Career and Personal Growth” and “Social and Emotional Support” topics received the highest numbers of comments—2187 and 2160 respectively—accounting for 20.40% and 20.15% of the total. The sentiment distribution indicates that most comments exhibit negative sentiment, with all topics showing a negative sentiment proportion exceeding 96.00%.
Need-intensity analysis of mental health service topics, with AHP weights.
Note: The need-intensity score was calculated using Formula (1) with the following AHP-derived weights: Number of Comments (α = 0.54), Aggregate Sentiment (β = 0.16), and User Attention (χ = 0.30). All values are rounded to two decimal places.
AHP: Analytic Hierarchy Process.
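The text does not specify how individual comments were counted toward a topic. One common convention, assumed in the sketch below, assigns each comment to its highest-probability topic in the LDA document-topic distribution (in Gensim, per-document mixtures can be obtained with `LdaModel.get_document_topics`) and then reports counts and percentage shares per topic. The function name and the toy distribution are hypothetical.

```python
from collections import Counter


def dominant_topic_shares(theta, labels):
    """Assign each comment to its highest-probability topic.

    theta : theta[d][k] = P(topic k | comment d) from the fitted LDA model
    labels: topic names, indexed by topic number k
    Returns {label: (comment_count, percentage_share)}.
    """
    # argmax over topics for each comment (first topic wins on ties)
    assignments = [max(range(len(row)), key=row.__getitem__) for row in theta]
    counts = Counter(assignments)
    total = len(theta)
    return {label: (counts[k], round(100 * counts[k] / total, 2))
            for k, label in enumerate(labels)}


theta = [[0.7, 0.3], [0.2, 0.8], [0.6, 0.4], [0.9, 0.1]]  # hypothetical
print(dominant_topic_shares(theta, ["Topic A", "Topic B"]))
```

This prints `{'Topic A': (3, 75.0), 'Topic B': (1, 25.0)}`. Hard assignment ignores comments that mix several topics; summing `theta` columns instead would give a soft prevalence measure.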
The quantitative need-intensity analysis revealed that “Career and Personal Growth” was the most pressing concern (need-intensity score = 0.94), closely followed by “Social and Emotional Support” (score = 0.88). This finding points to the immense pressure individuals face in modern work environments and their profound need for interpersonal connection and emotional support.
The need-intensity scores were calculated using Formula (1), which integrated the number of comments, sentiment tendency, and user attention (average views), weighted by coefficients derived from the AHP. As noted in Table 3, the AHP assigned weights of 0.54, 0.16, and 0.30 to these three indicators, respectively. This comprehensive analysis provides a quantitative perspective for understanding and prioritizing user needs on digital mental health service platforms beyond mere comment prevalence.
Discussion
Principal results
This study used text mining to analyze user needs on China's digital mental health service platforms. The results reveal that users express strong needs in areas such as emotion management, mental illness consultation, family relationship management, social support, career development, and personal growth. These needs reflect the psychological pressures individuals face in a rapidly changing society. In particular, family relationships and intergenerational pressures stand out in the Chinese cultural context, in contrast to Western studies, where family roles may receive less emphasis. 16 These cultural specificities should directly inform the design of platform user experiences. For instance, incorporating family-centric features, such as secure “family linkage” for progress updates (with user consent), invite-to-session functionality for family members, and culturally attuned educational content for parents, could improve engagement and therapeutic effectiveness within families. 17 Moreover, the semantic network analysis revealed intrinsic links among user needs, especially the close association of “emotion” with state-of-mind terms such as “depression” and “anxiety,” and the linkage of “family” with terms such as “parents,” “children,” and “marriage,” underscoring the importance of family dynamics in mental health discussions. 18
The sentiment analysis results showed that negative emotions accounted for 77.39% of user comments, whereas positive emotions accounted for only 21.97% and neutral emotions for 0.64%. This finding is consistent with previous research indicating that negative emotions dominate users’ expressions on mental health platforms. 19 The prevalence of negative sentiment highlights users’ urgent needs for emotion management and reflects feelings of helplessness when facing life pressures. The need-intensity analysis further indicated that the topics “Career and Personal Growth” and “Social and Emotional Support” had the highest need intensities (0.94 and 0.88, respectively). The strong demand in Career and Personal Growth is related to the intense job competition and workplace stress faced by the younger generation in China. 20 Meanwhile, the high demand for Social and Emotional Support reflects people's longing for emotional connection, which is particularly salient amid globalization and the clash of cultural values. 16
To translate these top-priority needs into concrete platform improvements, we propose two specific product changes. First, for “Career and Personal Growth,” platforms should implement a dedicated career stress triage path. This feature would guide users through a structured self-assessment after they select career-related concerns and automatically route them to the most appropriate service based on urgency and need type, such as immediate crisis counseling, scheduled sessions with career-specialist consultants, or self-help modules on stress management and skill development. Second, to address the pronounced need for “Social and Emotional Support” within the crucial context of family relationships, platforms should develop family-aware features. These include a secure “family linkage” option (with explicit user consent) that allows clients to safely share progress updates or key insights with designated family members. Platforms could also offer invite-to-session functionality, enabling therapists to include family members in selected sessions, and provide culturally attuned educational content to help families better understand and support their loved ones’ mental health journey. 17
These concrete product changes move beyond generic support and leverage technology to deliver targeted, context-sensitive interventions that directly address the most urgent needs identified by users.
Comparison with prior work
Compared with previous studies, our research makes important advances in data source and methodological approach. First, our data were drawn directly from user feedback on specialized mental health service platforms, overcoming biases and coverage limitations present in studies based on social media data. 21 Second, we employed a combination of text mining techniques, including LDA topic modeling, semantic network analysis, and sentiment analysis, to process large amounts of unstructured data and uncover latent patterns and trends in user needs.22,23 This provided rich information for understanding user needs. Finally, to our knowledge, this is the first study to integrate LDA topic modeling with a need-intensity formula, offering a practical basis for prioritizing resource allocation and service optimization on mental health service platforms.
Limitations
This study has several limitations, which can be grouped into three broad themes for clarity: data scope, methodological interpretation, and generalizability.
First, regarding data scope, our collection relied on web scraping from a limited number of major domestic mental health platforms, which may introduce selection bias. The findings primarily reflect the views of active users who frequently post comments, potentially underrepresenting “lurkers” or vulnerable populations who rely on informal or offline support channels.
Second, challenges in methodological interpretation warrant consideration. Although the LDA model effectively identified latent topics, its dependency on word frequency may overlook semantically rich or context-dependent expressions, particularly subtle emotional cues and culturally specific phrasing commonly found in mental health discourse. 24 Furthermore, the sentiment analysis conducted via ROST CM6, while efficient, may struggle with the linguistic complexity of UGC. Informal language, irony, sarcasm, and mixed emotions are prevalent in mental health discussions but are difficult for dictionary- and rule-based systems to classify accurately. 25 Such misclassification could lead to an oversimplified representation of users’ emotional states, and the strikingly high rate of negative sentiment (77.39%) should be considered in light of this potential tool-based bias. Notably, the need-intensity values in this study are point estimates intended for descriptive prioritization; we did not quantify their uncertainty or test whether differences between topics are statistically significant. The relative rankings of need intensity should therefore be interpreted with caution, and future work could address this by incorporating methods that estimate variance.
Finally, the generalizability of our findings is constrained by two factors. The lack of demographic details—such as age, gender, and socioeconomic status—restricts our ability to perform subgroup analyses. Variations in needs and expression patterns across different user profiles remain unexplored, limiting the practical tailoring of intervention strategies. Additionally, the cross-sectional nature of the data impedes insight into longitudinal trends in need evolution and the sustained impact of platform services. 26
Future studies should incorporate stratified sampling across diverse platforms, employ advanced context-aware models (e.g. transformer-based architectures) for sentiment and topic detection, and integrate user demographics to enable more differentiated and dynamic need assessment.
Conclusion
This study employed text mining to analyze user needs on Chinese digital mental health platforms, identifying pronounced demand in areas such as emotion management, psychological counseling, family relationships, and social support. In particular, “Career and Personal Growth” and “Social and Emotional Support” emerged as the most urgent needs based on our quantitative need-intensity scores. Crucially, the need-intensity metric offers direct operational value for platform management: it can inform the development of automated triage systems to direct users to the most appropriate level of care based on their expressed concerns, and it provides a data-driven foundation for prioritizing the development of new features and service lines in platform roadmaps. To systematically address these prioritized needs, we recommend closer integration of digital mental health services within national health policies and public insurance frameworks to improve accessibility and affordability. Furthermore, fostering cooperation between digital platforms and local community clinics can form a continuous, multi-level support system. The establishment of clear regulatory standards covering service quality, data privacy, and ethical practices remains essential to build user trust and promote sustainable industry development. These measures, guided by prioritized need-intensity data, would enable a more equitable, efficient, and user-centered delivery of digital mental health resources.
Footnotes
Acknowledgements
Not applicable.
Ethical considerations
The research protocol was reviewed and approved by the Ethics Committee of Wenzhou Medical University. No private or identifiable personal information was involved in the analysis. The research team strictly adhered to ethical guidelines for internet-based research, ensuring that the data were anonymized and de-identified during the processing and analysis phases to protect user privacy.
Contributorship
LJ, YX, LF, and WW designed and conceptualized the study. WW drafted the article. LJ and WW analyzed the data with support from all other authors. SD and JT participated in the study design and interpretation of findings and assisted in reviewing and revising the article. QZ provided valuable suggestions, assisted with revisions, and conducted critical review during the manuscript revision process. HY co-ordinated and supervised the entire research process. All authors contributed to the study design, interpreting findings, reviewing the article, and critically revising it for important intellectual content. All authors read and approved the final manuscript.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was funded by the National Social Science Fund of China (No. 23BGL300) and the Key Research Center of Philosophy and Social Sciences of Zhejiang Province (Institute of Medical Humanities, Wenzhou Medical University).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Guarantor
Haiyan Yu.
