Abstract
Objective
This study examines shifting trends in infertility-related concerns among Sina Weibo users by comparing data from 2018 (pre-pandemic) and 2023 (post-pandemic). The goal is to provide evidence-based insights to inform targeted health education and digital information strategies.
Methods
Utilizing web crawler technology, Weibo posts tagged with #infertility# were harvested for the years 2018 and 2023. A hybrid methodological approach was employed: “High-Attention” posts were categorized via manual coding, while “Low-Attention” posts were analyzed using Latent Dirichlet Allocation (LDA) topic modeling.
Results
High-Attention posts in both periods were characterized by four primary orientations: Policy and News, Medical Science Popularization, Personal Sharing, and Solely Commercial. In 2018, the implicit hotspots centered on outpatient appointment consultations, the integration of traditional Chinese and Western medicine, and etiology screening. By 2023, the focus of implicit hotspots shifted toward spiritual practices and body conditioning, alongside a burgeoning interest in IVF technology and international medical trends.
Conclusion
This study demonstrates that combining manual content analysis with LDA topic modeling is an effective framework for longitudinal monitoring of health discourse on social media. These findings elucidate evolving public concerns and engagement patterns over time, offering a valuable reference for health informatics research and public health policy monitoring.
Introduction
Driven by global population aging and evolving social structures, infertility has emerged as a critical public health challenge worldwide. 1 This issue is particularly pronounced in China, where rapid economic growth and lifestyle shifts have intensified the focus on reproductive health. 2 According to the Infertility Prevalence Estimates (1990-2021) report published by the World Health Organization (WHO) in 2023, 3 approximately 17.5% of adults (roughly one in six people) are affected by infertility globally, with prevalence continuing to rise. In China, infertility rates currently range from 12% to 15%, impacting nearly 50 million individuals. 4 The WHO forecasts that infertility will eventually rank among the top three health challenges of the 21st century, behind only cancer and cardiovascular diseases, carrying profound implications for both family stability and broader societal development.
Over recent decades, China’s health policy has undergone substantial reforms, most notably with the formal introduction of the “three-child policy” in May 2021. 5 This pivotal shift has sparked extensive public discourse on fertility, accompanied by a marked increase in conversations surrounding infertility. Concurrently, the COVID-19 pandemic has significantly altered health behaviors and information accessibility worldwide. Restrictions on traditional healthcare services during the pandemic precipitated a growing reliance on internet hospitals and online counseling, as individuals increasingly sought digital platforms for medical support.6,7 These internet hospitals—digital healthcare platforms integrated with physical medical institutions—enable patients to conduct online consultations, manage follow-up visits, and receive electronic prescriptions remotely. In the context of infertility, such platforms provide a vital channel for accessing specialist advice and managing long-term treatment protocols while mitigating the need for frequent physical travel.
Research indicates a pervasive lack of awareness regarding infertility among both men and women, which often leads to delayed diagnosis and treatment, thereby diminishing conception prospects.8,9 Sina Weibo, one of China’s most expansive and influential microblogging platforms, functions similarly to a hybrid of X (formerly Twitter) and Facebook. By enabling information dissemination through short-form posts, multimedia content, and interactive features—such as retweets, likes, and hashtags—Weibo serves as a vital conduit for public discourse. With a user base of 337 million and a 42.1% netizen usage rate, it has become a primary arena for health communication. These digital platforms not only facilitate rapid access to health information but also empower the public to voice their health concerns directly. In the realm of reproductive health, social media has actively influenced fertility intentions, 10 early recognition of infertility, 11 and subsequent treatment seeking. 12 Furthermore, while text mining research has proven instrumental across diverse health domains—including endometriosis, 13 COVID-19, 14 and oncology 15 —underscoring its role in shaping health behaviors, research specifically addressing infertility remains comparatively scarce. This deficiency emphasizes the imperative to investigate infertility-related discourse on social media, particularly to understand its implications for public health perceptions within today’s rapidly evolving digital and policy landscape.
This study scrutinizes infertility-related discussions on Sina Weibo across 2018 and 2023 to elucidate shifts in public concern over this five-year interval. By comparing data from these two distinct temporal points, the research evaluates the impact of social policies and global events on public attitudes and behaviors regarding reproductive health. This investigation not only enriches the understanding of China’s role in global health issues but also offers vital insights for optimizing reproductive health information services. These findings will assist public health policymakers and medical professionals in better comprehending the influence of social media on health communication. Ultimately, the insights derived will inform the development of future health policies and information services aimed at more effectively addressing public needs.
Methods
This study utilized Weibo posts tagged with the hashtag #Infertility# to construct a dataset encompassing two full calendar years: 2018 (pre-pandemic) and 2023 (post-pandemic). The retrieved metadata included timestamps, usernames, post content, and engagement metrics, such as the number of retweets, comments, and likes, alongside user authentication status.
Inclusion and exclusion criteria
To ensure thematic relevance and the high quality of the study corpus, the following inclusion and exclusion criteria were established.
Inclusion Criteria:
(1) Thematic Relevance: Weibo posts containing the hashtag #不孕不育# (#Infertility#).
(2) Temporal Scope: Posts published between January 1 and December 31 in both 2018 and 2023.
(3) Engagement-Based Categorization: To facilitate comparative analysis, the data were divided into two groups:
● High-Attention Group: Posts with > 0 retweets AND (> 0 likes OR > 0 comments). 16
● Low-Attention Group: All remaining relevant posts that met the thematic and temporal criteria.
Exclusion Criteria:
(1) Redundancy: Exact duplicate posts and identified automated “bot” spam.
(2) Irrelevance: Posts utilizing the hashtag in contexts unrelated to human reproductive health, such as metaphorical or non-medical usage.
(3) Incomplete Data: Records with corrupted text or missing essential engagement metrics.
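The engagement rule and the exact-duplicate exclusion can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the `Post` record and its field names are hypothetical, duplicates are detected by identical text alone, and the bot, relevance, and completeness screens are omitted because they depend on manual review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical record layout; field names are illustrative only."""
    text: str
    retweets: int
    comments: int
    likes: int


def attention_group(post: Post) -> str:
    """Apply the stated rule: > 0 retweets AND (> 0 likes OR > 0 comments)."""
    if post.retweets > 0 and (post.likes > 0 or post.comments > 0):
        return "high"
    return "low"


def split_corpus(posts):
    """Drop exact duplicate texts, then split the rest by attention group."""
    seen = set()
    groups = {"high": [], "low": []}
    for post in posts:
        if post.text in seen:  # exact-duplicate exclusion only (assumption)
            continue
        seen.add(post.text)
        groups[attention_group(post)].append(post)
    return groups
```

Under this reading, a post that was retweeted but received no likes or comments falls into the Low-Attention group, as does a post with likes and comments but no retweets.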
Procedure
The study workflow is illustrated in Figure 1. Data retrieval was executed using Octoparse (version 8.x, Octopus Data Inc.), an automated visual web scraping tool that simulates human browsing behavior to ensure the precise extraction of dynamic content from the Sina Weibo platform.
Figure 1. Workflow depicting the methodological pipeline of the study.
Following data retrieval and the application of inclusion/exclusion criteria, the dataset was categorized into High-Attention and Low-Attention groups based on the pre-defined engagement thresholds. This specific threshold was established to account for the sensitive nature and social stigma surrounding infertility on Chinese social media; in such a context, any interaction—particularly a retweet coupled with further engagement—represents a significant “tipping point” of public resonance.
To ensure a comprehensive analysis, a dual-track hybrid methodology was adopted: (1) Track 1 (Manual Content Analysis): Applied to the High-Attention group to identify “explicit hotspots,” defined as visible and highly impactful narratives. (2) Track 2 (LDA Topic Modeling): Utilized for the Low-Attention group to uncover “implicit hotspots,” representing latent thematic patterns and pervasive collective concerns. This balanced approach captures both the high-resolution insights of influential posts and the broad-scale thematic trends within the “long-tail” of personal experiences.
Text mining analysis
Sina Weibo offers a non-intrusive source of online discourse, facilitating data collection through naturalistic observation without direct researcher interference. Compared to traditional questionnaire-based surveys, this approach yields richer, more ecologically valid data by capturing authentic public discourse within a real-world context. 17
To ensure methodological rigor in processing unstructured data, the text mining analysis followed a structured pipeline: (1) Data Preprocessing: Microblog texts were segmented into words, and stop words (including common particles and punctuation marks with minimal semantic value) were removed to reduce noise. (2) Manual Coding Rigor: For explicit hotspots, two independent researchers coded a random 10% sample of the High-Attention posts. Inter-rater reliability was assessed using Cohen’s Kappa coefficient; the obtained value of 0.75 indicates substantial agreement according to the criteria of Landis and Koch. Furthermore, bot activity and commercial bias were screened by verifying user authentication status and auditing content attributes to ensure data authenticity. (3) LDA Algorithm Deployment: For implicit hotspots, we deployed the Latent Dirichlet Allocation (LDA) algorithm, a generative probabilistic model with a three-layer structure comprising words, topics, and documents. 18 LDA was selected for its efficiency in identifying latent thematic clusters that would otherwise be labor-intensive to analyze manually. 19
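For readers unfamiliar with the reliability statistic in step (2), the following minimal sketch computes Cohen's Kappa for two coders and maps the value onto the Landis and Koch descriptive bands. It is a generic textbook implementation, not the authors' script; the function names are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the raters' marginal counts per label.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def landis_koch_band(kappa):
    """Descriptive agreement bands from Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
             (0.80, "substantial"), (1.00, "almost perfect")]
    for upper, name in bands:
        if kappa <= upper:
            return name
    return "almost perfect"  # guards against floating-point overshoot
```

With the four thematic orientations as labels, each coder's list would hold one label per sampled post; a value of 0.75 falls in the 0.61 to 0.80 band, which is the "substantial" category reported above.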
The LDA model parameters were optimized through iterative testing. Specifically, we systematically tested a range of values for the number of topics (K) and evaluated the resulting clusters through a dual-evaluation approach: quantitative coherence scores to assess topic quality, and qualitative interpretability reviews to ensure semantic clarity. This selection process determined the optimal K and helped uncover nuanced, lesser-reported topics with significant social implications. The reporting of this study conforms to the STROBE statement, and the completed STROBE checklist is provided as a Supplementary File.
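The quantitative half of this selection step can be illustrated as follows. The text does not state which coherence metric was used, so this sketch assumes UMass coherence (document co-occurrence of each topic's top words) purely for illustration. The function names and the `topics_by_k` structure are hypothetical; in practice the top-word lists would come from LDA models fitted at each candidate K, a step omitted here.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents, eps=1.0):
    """UMass coherence of one topic from document co-occurrence counts.

    top_words: the topic's words, most probable first.
    documents: iterable of token lists (segmented, stop words removed).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(all(w in doc for w in words) for doc in doc_sets)

    score = 0.0
    # For each pair, condition on the higher-ranked (earlier) word.
    for earlier, later in combinations(top_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # word absent from the corpus; avoid division by zero
        score += math.log((doc_freq(earlier, later) + eps) / denom)
    return score


def pick_k(topics_by_k, documents):
    """Return the candidate K whose topics have the highest mean coherence."""
    def mean_coherence(topics):
        return sum(umass_coherence(t, documents) for t in topics) / len(topics)
    return max(topics_by_k, key=lambda k: mean_coherence(topics_by_k[k]))
```

A score-based pick of this kind covers only the quantitative criterion; the qualitative interpretability review described above would still be needed before settling on a final K.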
Results
Users and time distribution
In 2018, a total of 2,059 Weibo posts from 379 users were analyzed. Among these users, 225 (59.4%) were personally authenticated, while 154 (40.6%) were officially authenticated. The user structure on Sina Weibo, comprising verified accounts (e.g., medical institutions and influencers) and unverified individual accounts, significantly shaped the findings. Verified accounts generally drove the High-Attention (explicit) discourse, prioritizing authoritative medical and policy information. Conversely, unverified individual users dominated the Low-Attention (implicit) group, providing authentic, personal insights into the daily struggles of infertility. This distinction ensures that the results reflect a balance between institutional guidance and grassroots patient experiences.
By 2023, the volume increased to 6,451 Weibo posts from 750 users, with 436 (58.1%) being personally authenticated and 314 (41.9%) officially authenticated. Temporal distribution analysis revealed that discussion peaks occurred in December 2018 and February 2023. Notably, the volume of posts in February 2023 experienced a surge, indicative of a localized “outbreak” in discussion, followed by a subsequent decline and fluctuating intensity thereafter (Figure 2).
Figure 2. Comparative monthly trends in Weibo post volumes across 2018 and 2023.
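A temporal distribution of this kind amounts to counting posts per calendar month and locating the maximum within each year. The sketch below shows one way to do so; the timestamp format string is an assumption about how the scraped timestamps are stored, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def monthly_counts(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Count posts per (year, month) from timestamp strings."""
    counts = Counter()
    for stamp in timestamps:
        moment = datetime.strptime(stamp, fmt)
        counts[(moment.year, moment.month)] += 1
    return counts


def peak_month(counts, year):
    """Return the month number with the most posts in the given year."""
    in_year = {month: n for (y, month), n in counts.items() if y == year}
    if not in_year:
        raise ValueError(f"no posts recorded for {year}")
    return max(in_year, key=in_year.get)
```

Calling `peak_month(monthly_counts(stamps), 2023)` on the full list of post timestamps would return the month in which discussion peaked for that year.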
Thematic distribution of High-Attention posts
In 2018 and 2023, the High-Attention group comprised 99 and 609 Weibo posts, respectively. Through manual content analysis, these posts were categorized into four primary thematic orientations: (1) Policy and News: The dissemination of authoritative governmental regulations and medical news. (2) Medical Science Popularization: Professional health education and scientific outreach. (3) Personal Sharing: Narratives of lived experiences and emotional expression. (4) Solely Commercial: Content exclusively focused on product or service promotion.
In 2018, the thematic distribution was dominated by Medical Science Popularization (40.4%), followed by Solely Commercial (30.3%) and Personal Sharing (23.2%) content, while Policy and News accounted for the remaining 6.1%. By 2023, a profound shift in public engagement emerged: Personal Sharing became the dominant theme, rising to 49.0%. Conversely, Policy and News declined sharply to only 0.3%. These changes indicate a clear transition in public interest, from a reliance on authoritative professional guidance in 2018 toward the exchange of authentic, lived experiences in 2023.
Using the volume of retweets, comments, and likes as coordinates, three-dimensional distribution matrices were constructed to visualize engagement patterns. In 2018, the highest engagement density was concentrated within a cluster characterized by fewer than 20 comments, 250 likes, and 250 retweets. Notably, the Solely Commercial category achieved the highest levels of likes and retweets, while Medical Science Popularization content garnered the most comments (Figure 3).
Figure 3. Matrix distribution of High-Attention (Explicit Hotspot) Weibo posts in 2018 identified via manual content analysis.
By 2023, the engagement profile shifted significantly, with high-density areas typically featuring fewer than 100 comments, 100 likes, and 20 retweets. During this period, Personal Sharing posts predominated in both likes and comments, whereas Solely Commercial content maintained the highest retweet volume. These evolving patterns underscore a fundamental shift in how users interact with and respond to infertility-related discourse over the five-year study period (Figure 4).
Figure 4. Matrix distribution of High-Attention (Explicit Hotspot) Weibo posts in 2023 identified via manual content analysis.
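One way to build such a three-dimensional engagement distribution is to bin each post's comment, like, and retweet counts and tally the posts in each cell. The sketch below is illustrative only: the bin edges are hypothetical (the text does not report the bins used), the function names are invented for this example, and each post is assumed to be a `(comments, likes, retweets)` tuple.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical bin edges; the bins actually used are not reported.
EDGES = (20, 100, 250)


def bin_index(value, edges=EDGES):
    """Index of the half-open bin containing value: [0,20) -> 0, [20,100) -> 1, ..."""
    return bisect_right(edges, value)


def engagement_matrix(posts, edges=EDGES):
    """Sparse 3-D histogram keyed by (comment_bin, like_bin, retweet_bin)."""
    cells = Counter()
    for comments, likes, retweets in posts:
        cells[(bin_index(comments, edges),
               bin_index(likes, edges),
               bin_index(retweets, edges))] += 1
    return cells
```

The densest cell is then `engagement_matrix(posts).most_common(1)`, which corresponds to the "highest engagement density" cluster described for each year.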
Implicit themes of Low-Attention posts
LDA topic modeling results for the Low-Attention (implicit hotspot) group in 2018: Thematic clustering and keyword identification.
LDA topic modeling results for the Low-Attention (implicit hotspot) group in 2023: Thematic clustering and keyword identification.
Discussion
This study reveals a substantial increase in infertility-related discourse on Sina Weibo between 2018 and 2023, reflecting a complex socio-demographic transition driven by various socioeconomic and policy factors. China’s total fertility rate fell to 1.3 in 2020 20 —well below the replacement level of 2.1—prompting the 2021 “three-child policy” to counteract this decline. 21 Despite these legislative efforts, the surge in social media discussions suggests that public anxiety regarding fertility challenges remains acute.
Furthermore, the COVID-19 pandemic acted as a catalyst, exacerbating psychological stress and altering lifestyles, which further amplified infertility concerns. 22 Restrictions on traditional healthcare accelerated the adoption of internet hospitals and online counseling, fundamentally transforming the dynamics of health information access. 23 Within this digital ecosystem, Weibo has emerged as a crucial conduit for health communication due to its rapid, extensive, and interactive capabilities. 24 Notably, the spike in discussions in February 2023, coinciding with the China National Health Insurance Bureau’s response to infertility treatment coverage, underscores the platform’s role as a responsive arena for policy discourse. Thus, social media platforms like Weibo serve not only as information channels but as amplifiers of public sentiment, illustrating the intricate interplay between social dynamics and health perceptions in contemporary Chinese society. 25
Our findings demonstrate a notable increase in the proportion of Personal Sharing posts between 2018 and 2023. These Weibo posts are predominantly characterized by subjective narratives and peer-to-peer advice on global medical treatments. Such communicative behavior is likely driven by a dual need: the expression of complex emotions associated with infertility and the cultivation of personal identity and community belonging within digital spaces. 26 Conversely, the proportion of Medical Science Popularization content has declined, despite its substantial potential to enhance public scientific literacy. 27 While content from certified users remains a cornerstone of authority, this downward trend underscores a pressing need for healthcare organizations to more proactively engage in digital scientific outreach. Furthermore, Solely Commercial content maintains a persistent and substantial presence. The resilience of these commercial promotions—notwithstanding their potential for negative impact—highlights the enduring nature of profit-driven narratives on social media. This persistence necessitates more stringent regulatory measures and robust public education initiatives to empower users to critically discern and challenge potentially misleading health information.
LDA, an unsupervised machine learning model, was deployed to identify implicit topics within the large-scale corpora of “Low-Attention” posts. 28 Our longitudinal analysis reveals a distinct shift in public focus between 2018 and 2023. In 2018, the discourse centered on pragmatic concerns, including outpatient consultations, comparative experiences with traditional Chinese and Western medicine, and etiology screening. By 2023, however, the thematic focus expanded to encompass spiritual practices and body conditioning, alongside a growing interest in In Vitro Fertilization (IVF) technology and international medical trends. The emergence of “spiritual practices” in 2023 likely reflects a proactive psychological coping strategy rather than a rejection of medical science. From a psychological perspective, rituals such as praying for children provide positive suggestion, helping individuals regain a sense of agency and a “locus of control” amidst the profound uncertainty of infertility treatments. By attributing outcomes to external spiritual forces, patients may alleviate self-blame and emotional distress, utilizing these practices as an emotional supplement to clinical interventions.
This shift indicates that despite escalating challenges, the reproductive intentions of the infertile population remain robust. Over this five-year interval, the number of assisted reproduction facilities in China surged from 498 to over 2,700, significantly enhancing accessibility while simultaneously shifting public demand toward higher medical quality. Furthermore, the rise of transnational reproductive tourism has introduced new avenues for surrogacy while simultaneously raising complex socio-ethical issues. 29 This evolving landscape underscores the dynamic nature of infertility discussions and the critical role of social media in both mirroring and shaping public priorities.
Based on the study’s findings, several evidence-based recommendations are proposed to optimize the dissemination of reproductive health information: (1) Tailoring Outreach to Evolving Public Concerns: Relevant health authorities should dynamically adjust their outreach content to reflect the shifting concerns within the infertility community identified over the five-year study period. Addressing the growing demand for comprehensive, longitudinal information is essential for meeting the public’s informational needs. (2) Enhancing Science Communication within Medical Institutions: As the primary authoritative sources for infertility treatment, medical institutions must refine their scientific communication strategies. Implementing multimodal, new-media-based approaches—specifically tailored to key demographics such as couples of childbearing age—will significantly bolster reproductive health awareness and literacy. (3) Strengthening Information Governance and Monitoring: It is imperative to establish robust monitoring and governance frameworks to regulate misleading or malicious content. Ensuring the authenticity and reliability of digital health information is crucial for protecting the public’s health rights. By implementing these strategies, stakeholders can more effectively satisfy societal information requirements regarding infertility, thereby enhancing public health literacy and informing the development of future reproductive health policies.
This study addresses a significant research gap by providing a longitudinal comparison of infertility discourse on Chinese social media surrounding the COVID-19 pandemic. The identification of emerging trends in 2023—such as spiritual practices and international IVF-seeking—serves as a critical clinical reminder to prioritize patients’ psychological distress and evolving help-seeking preferences. Looking forward, future research should focus on developing AI-driven tools for the real-time monitoring of health misinformation or exploring how social media interactions directly influence the actual treatment-seeking behaviors of infertile couples. Furthermore, it is essential to acknowledge the burgeoning influence of video-based platforms in reproductive medicine. Recent cross-sectional studies on TikTok, YouTube, and Instagram have evaluated information quality regarding hysteroscopy,30,31 PCOS, 32 and pregnancy strategies, 33 consistently highlighting a universal challenge: the prevalence of low-quality or inaccurate medical claims across diverse digital formats.
The study’s primary strengths lie in its longitudinal design and hybrid methodology. However, several limitations must be acknowledged: (1) Demographic Constraints: The lack of granular demographic data for Weibo users restricts the ability to fully assess the sample’s representativeness. (2) External Influences: Findings are shaped by external factors, such as platform moderation policies and national fertility shifts, which may affect long-term replicability. (3) Data Accuracy and Verification: The medical accuracy of claims made by non-certified users could not be independently verified. Additionally, engagement metrics such as ‘likes’ may include self-interactions, which the platform’s public metadata does not distinguish. (4) Sampling Approach: A formal a priori sample size calculation was not performed; instead, a total population sampling approach was adopted to capture all relevant discourse. Consequently, the results should be interpreted as a longitudinal thematic reflection rather than a statistically powered generalization.
Code Availability
The Python code and LDA model parameters used for this analysis are available on GitHub at https://github.com/yeguolin/PerspectivesOnInfertility/tree/main. The repository includes a comprehensive English README for international accessibility, with a corresponding Chinese version (README_zh.md) also provided.
Footnotes
Ethical considerations
This study did not require ethics approval because all data collected were publicly available. Nothing in this paper or its supporting materials allows users to be identified or linked to their Weibo posts.
Author contributions
ST was responsible for drafting the manuscript, collecting data, and data analysis. YRX assisted with data collection and analysis. HQX provided mock peer review, supervised the investigation and acquired funding.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Shanghai Nursing Association Research Project (Grant Number: 2021MS-B01); Shanghai Jiao Tong University School of Medicine: Nursing Development Program (Grant Number: SJTUHLXK2024); the Humanities and Social Sciences Young Talent Training Program of Shanghai Jiao Tong University (2025QN014).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
