Sage Journals: Discover world-class research

Abstract

Objectives

Urinary tract infections (UTIs) frequently affect individuals of all ages, necessitating antibiotic treatment and medical care, which can impair quality of life and cause psychological strain. Online Health Consultation (OHC) platforms serve as a widely used communication tool, offering integrated support for medical guidance and disease management. By examining OHC interactions, this study explores the concerns and difficulties experienced by UTI patients to better understand their perspectives.

Methods

Data from 20,000 anonymized UTI-related records (2020–2024) were obtained from a major Chinese online healthcare platform, Good Doctor Online. Analysis occurred in two stages: BERTopic extracted key themes and keywords from text data, followed by sentiment analysis of these findings using a generative AI language model. All data were publicly accessible and de-identified.

Results

Analysis of 18,479 cleaned records using BERTopic identified six key themes: “Polite Expressions for Consultation,” “Diagnosis, Symptoms, and Management Challenges,” “Differential Diagnosis of Cystitis,” “Etiology Related to Sexual Activity,” “Nocturnal Symptoms and Fever,” and “Perinatal Considerations.” Sentiment analysis showed predominantly negative emotions, reflecting the condition's substantial physical and mental toll. The “Etiology Related to Sexual Activity” theme had the highest negativity (97%), while “Polite Expressions for Consultation” showed the most positivity (9%).

Conclusion

These research results highlight the important role of online communities in providing support and information to patients, and the insights derived from this study can provide valuable reference for social media developers, medical service providers, and policymakers.

Keywords

Urinary tract infections, topic modeling, sentiment analysis, disease management, artificial intelligence

Introduction

Urinary tract infections (UTIs) represent a complex and escalating public health challenge, particularly pronounced in developing nations. UTIs affect individuals across all age groups, frequently necessitating medical consultations and antibiotic therapy while significantly compromising patient quality of life.^1–4 Beyond the immediate physical discomfort, characteristic symptoms disrupt fundamental daily activities, including work, exercise, and sleep. Moreover, the persistent burden of disease management contributes to substantial emotional distress, often manifesting as psychological sequelae that elevate the risk of comorbid anxiety and depression.^5,6

Concurrently, the integration of digital technologies into healthcare systems has fundamentally transformed patient–physician interactions. Online Health Consultation (OHC) platforms have emerged as pivotal channels for healthcare delivery, particularly in regions with limited access to traditional medical expertise. Facilitated by mobile internet technology, these platforms exemplify a broader digital transformation that has reshaped public health information acquisition.⁷ This technological shift, bolstered by supportive policies and evolving healthcare-seeking behaviors, has accelerated the digitalization of healthcare in China,⁸ establishing the internet as the primary source for health-related information.⁹ Areas such as information sharing and online diagnosis have positioned the internet as a critical space for obtaining health knowledge,¹⁰ a trend amplified during the COVID-19 pandemic.^11–13

Importantly for conditions like UTIs, which may involve sensitive or stigmatized topics, OHC offers distinct advantages over traditional face-to-face consultations: 24/7 accessibility irrespective of location,¹⁴ and a degree of anonymity that facilitates disclosure of concerns with reduced psychological burden.^15–17 Consequently, a 2021 nationwide survey indicated that 28.9% of Chinese residents intended to utilize OHC for diagnosis and treatment, confirming its status as a prevalent doctor–patient communication modality that delivers an integrative support environment for professional guidance and illness management.¹⁸ These digital interactions generate extensive clinical narratives through text-based consultations, creating an unprecedented repository of real-world clinical insights. Crucially, such patient-generated data captures nuanced aspects of disease presentation, progression, and management often undetected in conventional structured clinical data. Furthermore, OHC empowers patients by providing social support, including informational and emotional support from healthcare professionals.¹⁹

Despite this wealth of data generated by OHC platforms, characterizing patient perspectives and experiences on conditions like UTIs has traditionally relied on cohorts recruited through clinical settings.^20,21 While valuable, these methods are often inefficient for capturing insights from the broader patient community actively engaging online. Within these high-interaction digital communities, vast amounts of highly unstructured, tacit knowledge accumulate.^22,23 However, the inherent complexity of this narrative data presents significant analytical challenges. Extracting meaningful, clinically relevant insights necessitates sophisticated computational methods for knowledge extraction and interpretation,²⁴ which have historically been lacking.

Fortunately, contemporary advances in medical informatics offer significant methodological progress in handling unstructured healthcare data. Text mining techniques, encompassing natural language processing (NLP) and machine learning, have demonstrated notable efficacy in extracting clinically relevant insights from diverse medical text sources^25,26 (e.g., adverse drug reaction detection,²⁷ disease trajectory prediction,²⁸ novel clinical association discovery). Nevertheless, unique linguistic, contextual, and cultural features within Chinese medical discourse introduce additional methodological hurdles that remain inadequately addressed within current analytical frameworks.

To address these challenges, particularly the nuanced and indirect expression prevalent in Chinese social media discourse, advanced analytical tools are required. While traditional topic modeling methods like Latent Dirichlet Allocation (LDA) and Top2Vec often struggle with contextual complexity, BERTopic has demonstrated its capability to excel in identifying subtle themes with greater precision, terminological diversity, and flexibility.²⁹ Building upon this foundation and to achieve deeper semantic analysis, we employed the capabilities of ChatGPT-4. This approach integrates BERTopic's contextual depth with ChatGPT-4's advanced semantic understanding and generative labeling. We utilized ChatGPT-4 both to classify the themes identified by BERTopic and to assess sentiment within the discourse. This AI-driven integration aims to improve the accuracy and interpretability of topic modeling, thereby unlocking meaningful insights from the intricate linguistic patterns found in large-scale, real-world patient narratives.

Therefore, this research aims to harness the combined analytical power of BERTopic and ChatGPT-4. We will apply this integrated methodology to delve into large volumes of consultation records from China's premier OHC platform. Our primary goal is to uncover the common concerns, information needs, and emotional experiences of patients and health community users regarding UTIs within this digital landscape. Ultimately, we seek to generate actionable insights that can inform strategies for improving patient-centered healthcare services and optimizing the organization and structuring of health information within online health communities.

Methods

Study overview

This retrospective observational study integrates data mining and textual analysis within the interdisciplinary fields of medical informatics and health services research. It employs a mixed-methods approach, combining quantitative topic modeling and qualitative sentiment analysis on textual data, to systematically identify key challenges in UTI diagnosis and treatment and to uncover multidimensional characteristics of patient experiences.

The research consisted of two main phases: First, the BERTopic model was applied to patient consultation texts to extract core themes and representative keywords, with typical comment excerpts selected to enhance thematic interpretability. Subsequently, large language models (LLMs) were employed to further deepen the understanding of thematic connotations and their associated emotional tendencies. The entire process included detailed topic labeling and emotion-oriented identification to uncover predominant emotional characteristics within each theme.

Data collection

Data were obtained from “Haodf.com” (Good Doctor Online), a prominent Chinese OHC platform. The selection of this platform was based on comprehensive ranking metrics for medical and health websites, incorporating authoritative indicators such as Alexa ranking, Baidu weight, PR value, and mobile compatibility.

Relevant consultation records were collected by searching the following Chinese keywords: “Urinary Tract Infection,” “Bladder Infection,” “Cystitis” and “Urethritis.” All related records from June 2020 to September 2024 were crawled. Haodf.com provides full access to registered users, allowing the review of all publicly available content. The consultation data typically include text-based interactions between patients and healthcare providers, covering symptom descriptions, medical history, timestamps, and physician identifiers, among other metadata.

In evaluating the stability of clustering results, Harloff and colleagues³⁰ examined the convergence rates of various clustering methods using six sets of partitioned ranking data. For domains containing 25 items or more, to ensure clustering stability, there should be around 20 samples per topic. Applying the above principle (i.e., 20 samples per topic), the theoretical sample size should be 400 entries. The present study already uses 20,000 samples, well above the minimum requirement, thereby fully meeting the statistical standards for the stability of the clustering analysis.
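The arithmetic behind this heuristic can be sketched as follows. The function name and the topic counts passed to it are illustrative assumptions; the text states only the 20-per-topic rule, the 400-entry figure, and the 20,000 collected records. At 20 samples per topic, 400 entries corresponds to 20 topics, and the same rule applied to the 14 topics reported in the Results would require 280.

```python
SAMPLES_PER_TOPIC = 20  # heuristic quoted in the text


def minimum_sample_size(n_topics: int, per_topic: int = SAMPLES_PER_TOPIC) -> int:
    """Smallest corpus size that supplies `per_topic` samples for every topic."""
    return n_topics * per_topic


collected = 20_000  # records collected, as stated in the text
for n_topics in (14, 20):  # 14 topics reported in Results; 20 implied by 400 entries
    needed = minimum_sample_size(n_topics)
    print(f"{n_topics} topics -> at least {needed} records (collected: {collected})")
```

Under either topic count, the collected corpus exceeds the minimum by well over an order of magnitude.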

Although these consultation data are publicly accessible and commonly used for research, this study strictly adhered to ethical guidelines, emphasizing user privacy and data confidentiality. All records underwent manual review to exclude any potentially personally identifiable information, ensuring compliance with privacy protection standards.

Data preprocessing

Preprocessing of the raw data was performed using Python (Python Software Foundation). From an initial set of 20,000 consultation records, incomplete documents, entries from unrelated medical specialties, and duplicate records were excluded. Numbers, special characters, and stop words were also filtered out. The application of these criteria resulted in a final dataset of 18,479 valid online consultation records, forming a robust sample for analyzing UTI-related consultations.

For Chinese word segmentation, the Python library “jieba” was used to convert continuous text into discrete lexical sequences. The process integrated four major stop word lexicons: the Harbin Institute of Technology Stop Word List, Baidu Stop Words, the Renmin University of China Stop Words, and the Sichuan University Machine Intelligence Laboratory Stop Words, supplemented by a custom stop word list. This effectively removed irrelevant words and transformed the text into a word-list format. These preprocessing steps significantly enhanced the robustness and interpretability of subsequent analyses.
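A minimal sketch of the cleaning steps described above (deduplication, removal of numbers and special characters, segmentation, and stop word filtering) might look as follows. The function name, the regular expression, and the toy records are assumptions for illustration and do not come from the study. The segmenter is passed in as a parameter: with jieba installed, `jieba.lcut` could be supplied, while a plain whitespace split stands in here so that the sketch runs without third-party packages.

```python
import re
from typing import Callable, Iterable

# Keep CJK characters and Latin letters; digits, punctuation and symbols become spaces.
_NOISE = re.compile(r"[^\u4e00-\u9fffA-Za-z]+")


def preprocess(
    records: Iterable[str],
    stop_words: set[str],
    segment: Callable[[str], list[str]],
) -> list[list[str]]:
    """Deduplicate records, strip noise, segment, and drop stop words."""
    seen: set[str] = set()
    cleaned: list[list[str]] = []
    for text in records:
        text = text.strip()
        if not text or text in seen:  # skip empty and duplicate records
            continue
        seen.add(text)
        tokens = [
            tok
            for tok in segment(_NOISE.sub(" ", text))
            if tok.strip() and tok not in stop_words
        ]
        if tokens:
            cleaned.append(tokens)
    return cleaned


# Toy records (hypothetical); the second is an exact duplicate of the first.
docs = preprocess(
    ["尿频 尿急 3 天!!", "尿频 尿急 3 天!!", "你好 医生 尿痛"],
    stop_words={"你好", "医生"},
    segment=str.split,
)
print(docs)
```

In practice the stop word set would be the union of the four lexicons and the custom list named above.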

Topic identification with BERTopic methodology

The BERTopic method is an unsupervised, pretrained deep learning method used for topic modeling and is often used in social science research. It achieves significant results in analyzing social media content,³¹ performs well when reviewing analyst reports,³² plays a key role in conducting literature meta-analysis,³³ and is also effective in evaluating customer reviews.³⁴ This technology uses the self-attention mechanism and class-based Term Frequency-Inverse Document Frequency (C-TF-IDF) to generate well-defined clusters that improve the interpretability of topics while retaining key descriptive keywords.

While conventional topic modeling approaches such as LDA,³¹ Probabilistic Latent Semantic Analysis (PLSA),³⁵ and the Correlated Topic Model (CTM)³⁶ have their merits, BERTopic offers three distinct improvements. First, it uses a pre-trained bidirectional BERT model to produce text embeddings, which identifies document themes more precisely and helps disambiguate words with multiple meanings.^37,38 Second, by integrating the Transformer framework with the C-TF-IDF technique, it builds compact clusters with improved topic coherence.³⁹ Third, it retains the most meaningful terms throughout the clustering process, where other methods may lose them, which makes the final output easier to interpret.

The BERTopic topic modeling workflow comprises five stages⁴⁰: first, a BERT model generates numerical embedding vectors; next, Uniform Manifold Approximation and Projection (UMAP) reduces their dimensionality; then, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) clusters the reduced embeddings; following that, various topic generalization approaches are applied; lastly, C-TF-IDF extracts the pivotal keywords for each cluster, streamlining topic identification and ranking. The modular structure allows researchers to adjust each phase to suit their requirements. Given these benefits, this research employs BERTopic to identify topics within consultation records.
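The keyword-extraction stage can be illustrated with a small pure-Python sketch of class-based TF-IDF. It follows the commonly published form of the weighting, in which the frequency of a term within a cluster is multiplied by log(1 + A / f), where A is the average number of words per cluster and f is the frequency of the term across all clusters. The function name and toy clusters are assumptions for illustration, and the BERTopic library's own implementation differs in details such as normalization.

```python
import math
from collections import Counter


def c_tf_idf(class_tokens: dict[int, list[str]]) -> dict[int, dict[str, float]]:
    """Weight each term per cluster as tf(term, cluster) * log(1 + A / f(term))."""
    tf = {c: Counter(tokens) for c, tokens in class_tokens.items()}
    total = Counter()
    for counts in tf.values():
        total.update(counts)  # frequency of each term across all clusters
    avg_words = sum(total.values()) / len(tf)  # A: average words per cluster
    return {
        c: {t: n * math.log(1 + avg_words / total[t]) for t, n in counts.items()}
        for c, counts in tf.items()
    }


# Toy clusters (hypothetical): all documents in a cluster are pooled into one token list.
clusters = {
    0: ["pregnancy", "pregnancy", "urine", "doctor"],
    1: ["fever", "night", "urine", "doctor"],
}
weights = c_tf_idf(clusters)
top_term_cluster_0 = max(weights[0], key=weights[0].get)
print(top_term_cluster_0)
```

Terms shared by every cluster, such as "urine" and "doctor" in the toy data, receive lower weights than terms concentrated in a single cluster, which is why the top-ranked words tend to be topic-specific.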

Therefore, we employed the BERTopic Python library for structured topic modeling to explore patients’ experiences with UTIs. The default Sentence-Transformers embedding method in BERTopic was utilized, specifically the paraphrase-multilingual-MiniLM-L12-v2 model, which supports Chinese text embedding. For dimensionality reduction and clustering, we adopted the default UMAP and HDBSCAN methods, respectively. After comparing multiple modeling results, the key parameters were set as follows: UMAP n_neighbors = 10 and n_components = 10; HDBSCAN min_cluster_size = 200; and random_state = 42. During text processing, Scikit-learn's CountVectorizer was first applied for feature extraction and vectorization, converting raw text into a term frequency matrix. In terms of performance, the BERTopic model demonstrated significantly higher topic coherence than the traditional LDA model.^41,42 Subsequent optimization strategies (“C-TF-IDF,” “Distributions,” and “Embeddings”) were implemented by reloading the model, transforming the text data, reducing outlier topics, and updating the model to iteratively improve topic quality and structural rationality.
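One way such a configuration might be expressed with the BERTopic library is sketched below. The values in the first group of constants are those stated in the text. Everything else is an assumption: the `fit_topics` helper is hypothetical, the extra UMAP and HDBSCAN arguments simply mirror BERTopic's own defaults, the random state is assumed to apply to UMAP, the outlier strategies are applied one after another in the order listed, and the input is assumed to be pre-segmented text joined by spaces. The model reloading step mentioned above is not reproduced.

```python
# Values stated in the text.
EMBEDDING_MODEL = "paraphrase-multilingual-MiniLM-L12-v2"
UMAP_PARAMS = {"n_neighbors": 10, "n_components": 10, "random_state": 42}
HDBSCAN_PARAMS = {"min_cluster_size": 200}
OUTLIER_STRATEGIES = ("c-tf-idf", "distributions", "embeddings")

# Assumptions: not stated in the text; these mirror BERTopic's own defaults.
UMAP_ASSUMED = {"min_dist": 0.0, "metric": "cosine"}
HDBSCAN_ASSUMED = {"metric": "euclidean", "prediction_data": True}


def fit_topics(docs):
    """Fit BERTopic on pre-segmented, space-joined documents (hypothetical helper)."""
    # Imported lazily so the constants above can be inspected without the packages.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sklearn.feature_extraction.text import CountVectorizer
    from umap import UMAP

    topic_model = BERTopic(
        embedding_model=EMBEDDING_MODEL,
        umap_model=UMAP(**UMAP_PARAMS, **UMAP_ASSUMED),
        hdbscan_model=HDBSCAN(**HDBSCAN_PARAMS, **HDBSCAN_ASSUMED),
        vectorizer_model=CountVectorizer(),
    )
    topics, _ = topic_model.fit_transform(docs)

    # Reassign outlier documents (topic -1) with each strategy in turn,
    # then refresh the topic representations with the new assignments.
    for strategy in OUTLIER_STRATEGIES:
        topics = topic_model.reduce_outliers(docs, topics, strategy=strategy)
    topic_model.update_topics(docs, topics=topics, vectorizer_model=CountVectorizer())
    return topic_model, topics
```

A call such as `fit_topics([" ".join(tokens) for tokens in segmented_docs])` would then return the fitted model and the per-document topic assignments, where `segmented_docs` is a hypothetical list of token lists.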

Topic labeling with LLM using prompt engineering

We used LLMs to refine and validate the interpretation of the topics that emerge from BERTopic. By inputting BERTopic's findings into OpenAI's ChatGPT-4o-mini, guided by carefully designed prompts, we aimed to validate the core meanings and emotional undertones of the topics identified via unsupervised machine learning. This strategy clarifies the results and supports a more nuanced automatic thematic analysis. In the context of large-scale text sentiment analysis, LLMs have demonstrated significant advantages in understanding context, parsing complex semantics (e.g., identifying sarcasm and irony), and capturing fine-grained emotional tendencies (e.g., distinguishing between “disappointment” and “anger”). A growing body of literature has already adopted this approach.^43,44

We have developed a multitier emotional dictionary, integrating automated collection with manual validation to balance efficiency and accuracy. The dictionary comprises two main parts:

1. Basic emotion words: contains general Mandarin emotion terms, including:

Positive emotion words: for example, “satisfied,” “effective,” “improved,” etc., used to express positive emotions.

Negative emotion words: for example, “pain,” “discomfort,” “worry,” etc., used to express negative emotions.

Each emotion word is tagged with its emotional polarity (positive/negative) and an intensity score reflecting how strongly the emotion is conveyed.

2. Domain-specific emotion words: to boost analysis performance in particular contexts, we have collected specialized vocabulary for medical consultation scenarios, such as “alleviate,” “side effects,” “therapeutic effect good,” etc. These terms were extracted from medical forums, consultation records, and professional literature to ensure strong applicability in the medical field.

To ensure the accuracy and consistency of LLMs in sentiment annotation tasks, this study employs a systematic validation process that comprises two main stages:

First, human reviewers directly evaluated whether the LLM's sentiment judgments were correct and corrected any mislabeling. To ensure uniform review standards and reduce subjective differences among reviewers, we established explicit sentiment-discrimination criteria. Following statistical requirements at the 95% confidence level with a ±5% margin of error, 400 texts were randomly sampled from the overall dataset for detailed verification. Two reviewers independently assessed the same batch of samples, and a third senior expert arbitrated any disagreements to determine the final labeling. Furthermore, during manual review, systematic errors of the LLM (such as consistently misclassifying certain sarcastic statements as positive sentiment) were proactively identified and fed back into the model optimization process. These findings can inform iterative adjustments to prompt design or postprocessing workflows, forming a continuous closed-loop quality control mechanism for ongoing improvement.
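The verification sample size can be checked against the standard formula for estimating a proportion. The sketch below is illustrative only: the z-value of 1.96 and the most conservative proportion p = 0.5 are assumptions, since the text states only the confidence level, the margin of error, and the 400 texts sampled. The finite population correction uses the 18,479 cleaned records as the population.

```python
import math


def sample_size(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Cochran's formula n0 = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)


def with_finite_population(n0: int, population: int) -> int:
    """Finite population correction n = n0 / (1 + (n0 - 1) / N), rounded up."""
    return math.ceil(n0 / (1 + (n0 - 1) / population))


n0 = sample_size()
n_corrected = with_finite_population(n0, population=18_479)
print(n0, n_corrected)
```

Under these assumptions the required sample is 385 before correction and slightly smaller after it, so a sample of 400 texts meets the stated requirement.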

Second, to quantify the overall performance of LLM labeling, another random sample of 200 items was selected for independent coding validation. Domain experts independently labeled the sentiment of each item, and the agreement between LLM outputs and expert labels was calculated. Interrater reliability was assessed using Cohen's Kappa, and statistical analysis was conducted with IBM SPSS Statistics 23.0. The Kappa value was 0.854, exceeding the conventional threshold of 0.80 and indicating a high level of agreement between the LLM outputs and human expert annotations. This supports the reliability of the approach in this study and enables further analyses based on automated labeling results. The prompt used in this study is provided in the supplementary materials.
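For readers unfamiliar with the statistic, Cohen's Kappa compares the observed agreement between two raters with the agreement expected by chance from their label frequencies. The sketch below is a minimal pure-Python version with hypothetical toy labels; the study itself computed the value in SPSS, and the function name is an assumption.

```python
from collections import Counter


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for two equally long label lists."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy labels, not data from the study.
llm_labels = ["neg", "neg", "pos", "pos"]
expert_labels = ["neg", "pos", "pos", "pos"]
kappa = cohen_kappa(llm_labels, expert_labels)
print(kappa)
```

In the toy example the raters agree on three of four items (0.75), chance agreement is 0.5, and Kappa is therefore 0.5.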

Ethical considerations

The Ethics Committee of Jiangsu Province Hospital of Chinese Medicine approved this study and waived the requirement for written informed consent (Approval No. 2024NL-025-01), as all data were obtained from a public platform. Our study was conducted in accordance with the principles outlined in the Declaration of Helsinki. All data underwent comprehensive de-identification to remove personal details. In accordance with the platform's terms of service and established international research practices, user consent followed an implied consent model.^45–47

Results

Building on the framework of prior studies, this research utilizes the BERTopic topic modeling technique to identify predominant themes in UTI online consultation data. Subsequently, the study undertakes a detailed analysis of these thematic clusters.

Topic identification

Under the preset parameter conditions, the BERTopic model identified 14 core topics (ranging from Topic 0 to Topic 13). These topics were arranged in descending order of their prevalence within the corpus. Each topic is represented by a set of characteristic terms, with varying weights assigned to each term; terms possessing higher weights contribute more significantly to defining the topic. The marginal diminishing effect associated with the feature word weights for each topic is illustrated in Figure 1.

Figure 1.

The declining trend of feature words’ weights.

Analysis reveals that 3–5 key terms are typically sufficient to capture the essence of most topics. Beyond five terms, the inclusion of additional features yields diminishing returns, offering minimal added value for topic representation. Figure 2 presents the probabilistic feature weights and corresponding lexicon for the 14 identified research topics. This visualization enables systematic corpus-based assessment of topic-defining terminology across consultation records, so that each topic can be given a thematic label. To illustrate, consider Topic 0: key terms such as “Pregnancy,” “Infant,” and “Lactation” distinctly delineate its core focus. An examination of associated discussions confirms that this topic centers on the causes of urinary tract infections during the perinatal period. Consequently, documents associated with Topic 0 are categorized under “Perinatal considerations.” The identification and naming of subsequent topics follow this methodology, ensuring that the assigned labels accurately reflect the content of each topic.

Figure 2.

Fourteen consultation topic feature words and weight distribution.

This study uses a combination of internal consistency (evaluating keywords within each topic) and external validity (document sampling validation) to comprehensively validate the results of the topic model.

Regarding internal consistency, a panel of experts assessed the semantic coherence and logical connections among the keywords within each topic. The panel also compared keyword sets across different topics to test how well the topics are distinguished, thereby confirming the uniqueness of each topic.

Regarding external validity, we randomly selected 50 records assigned to each topic for manual cross-checking by the expert panel. For example, for Topic 0 (keywords: “Pregnancy,” “Infant,” “Preconception care”), the experts reviewed the original texts and found that 48 of the 50 records were related to pregnancy, family planning, infant care, and menstrual health, indicating a high level of classification accuracy for this topic.

Main themes

To begin, we computed a similarity matrix across topics using cosine similarity. This metric allowed us to visualize the strength of associations between different subjects through a heatmap (Figure 3). Building on these relationships, we then applied clustering techniques to systematically organize the topics, ultimately establishing a comprehensive framework for discussions about urinary tract infections. The clustering outcomes (Figure 4) revealed that the 14 initial topics naturally fell into 6 broader thematic categories, each reflecting core areas of patient inquiry. We assigned descriptive labels to these clusters based on their predominant keywords, with the full breakdown provided in Table 1. Topic 12 exhibits minimal thematic associations with other research topics in the cluster analysis visualization. Given that its characteristic keywords constitute standard consultation terminology, it has consequently been classified alongside topics 2, 3, and 4 within the overarching theme of “Polite expressions for consultation.”
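The similarity step can be sketched in a few lines of pure Python. The text does not specify which topic representation was compared (for instance, topic embeddings or C-TF-IDF vectors), so the vectors below are hypothetical toy values and the function names are assumptions.

```python
import math


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """Pairwise cosine similarities; entry [i][j] compares topic i with topic j."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Hypothetical topic vectors: the first two are identical, the third is orthogonal.
toy_topics = [[1.0, 0.0, 1.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
matrix = similarity_matrix(toy_topics)
for row in matrix:
    print([round(value, 2) for value in row])
```

A matrix of this kind is what a heatmap visualizes, and a hierarchical clustering procedure can then merge the topics whose rows are most similar.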

Figure 3.

Heatmap of themes in consultation records related to urinary tract infections.

Figure 4.

Thematic hierarchy diagram: research theme clustering of urinary tract infection records based on cosine similarity.

Table 1.

Keywords representative of the six main themes.

Theme	Keywords^a
Polite expressions for consultation	“Years Ago,” “Urination,” “Trouble you,” “Hello,” “Thanks,” “Hi,” “Professor,” “Frequent,” “Medication Use,” and “Consultation”
Diagnosis, symptoms, and management challenges	“WBCs,” “Bacteria,” “UA,” “A few days,” “Urine culture,” “Cephalosporins,” “Urethral meatus,” “Incomplete bladder emptying,” “RBCs,” and “Antibiotics”
Differential diagnosis of cystitis	“Cystitis glandularis,” “Surgery,” “Cystoscopy,” “Bladder,” “Cystitis,” “IC,” “Instillation,” “TUR,” “Postoperative,” and “Chronic cystitis”
Etiology related to sexual activity	“Sexual activity,” “Masturbation,” “Testicle(s),” “Erection,” “Ejaculation,” “Semen,” “Penis,” and “Frequent”
Nocturnal symptoms and fever	“Fever,” “Yesterday,” “Night,” “Daytime,” “Nocturia,” “Sleeping,” “Last night,” “Morning,” “The day before yesterday,” and “Washroom”
Perinatal considerations	“Pregnancy,” “Infant,” “Preconception care,” “Postpartum,” “Effect/Impact,” “Lactation period,” “Sexual intercourse,” and “Menstruation”

^a Keywords are ranked by their relevance scores from the BERTopic results and represent the 10 most representative words for each topic. The table shows the results after removing duplicates. WBCs: white blood cells; UA: urinalysis; RBCs: red blood cells; IC: interstitial cystitis; TUR: transurethral resection.

Attention is distributed unevenly across the thematic clusters. Table 2 details the number of topics encompassed within each cluster, the corresponding volume of related consultation records, and their proportional distribution. This quantification offers an objective basis for analyzing the primary thematic clusters within UTI online consultation data. The overall distribution indicates that the cluster “Diagnosis, Symptoms, and Management Challenges” contains the highest number of consultation records, suggesting that topics within this cluster receive greater patient attention. The clusters “Polite Expressions for Consultation” and “Nocturnal Symptoms and Fever” exhibit comparable record counts. In contrast, the clusters “Perinatal Considerations,” “Differential Diagnosis of Cystitis,” and “Etiology Related to Sexual Activity” each comprise only a single topic, representing the clusters with the lowest number of constituent topics.

Table 2.

The volume of themes and records related to urinary tract infections.

Theme	Number (topics)	Count (records)	Percentage (%)
Polite expressions for consultation	4	2624	14.20%
Diagnosis, symptoms, and management challenges	4	10,349	56.01%
Differential diagnosis of cystitis	1	1793	9.70%
Etiology related to sexual activity	1	1217	6.59%
Nocturnal symptoms and fever	3	2236	12.10%
Perinatal considerations	1	259	1.40%
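The percentage column can be recomputed from the record counts as a quick consistency check. In the sketch below, the counts are copied from Table 2 and the denominator is the 18,479 cleaned records reported in the text; the variable names are assumptions. The listed counts sum to 18,478, one fewer than the stated total, so a recomputed share may differ from the table in the last decimal place.

```python
# Record counts per theme, copied from Table 2.
records = {
    "Polite expressions for consultation": 2624,
    "Diagnosis, symptoms, and management challenges": 10349,
    "Differential diagnosis of cystitis": 1793,
    "Etiology related to sexual activity": 1217,
    "Nocturnal symptoms and fever": 2236,
    "Perinatal considerations": 259,
}
TOTAL_RECORDS = 18_479  # cleaned records reported in the text

# Share of each theme as a percentage of all cleaned records, to two decimals.
shares = {theme: round(100 * n / TOTAL_RECORDS, 2) for theme, n in records.items()}

for theme, share in shares.items():
    print(f"{theme}: {share:.2f}%")
print("Sum of listed counts:", sum(records.values()))
```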

Thematic content analysis

Topic 1: Polite expressions for consultation

This topic accounts for 14.20% of the discussions (2624/18,479), focusing primarily on the conventional phrases used when consulting doctors online. Key terms include “Trouble you,” “Hello,” and “Thanks.” In internet-based communications, patients frequently employ such polite expressions to convey respect for medical professionals. Examples are as follows:

Hello, Professor. I’ve been experiencing painful urination ever since a high-risk encounter with my ex-girlfriend in March.

Hello, Doctor. I’m a patient who consulted you before. You mentioned I was suffering from anxiety and advised me to see a psychiatrist.

Topic 2: Diagnosis, symptoms, and management challenges

Constituting 56.01% of the discussions (10,349/18,479), this topic centers on the diagnosis of UTIs, their primary symptoms, and preventive and therapeutic measures. As the most prominent topic, it garners the greatest attention from patients.

Terms related to specific diagnostic tests include “White blood cells (WBCs),” “Red blood cells (RBCs),” “Bacteria,” as well as testing methods such as “Urinalysis (UA)” and “Urine culture.”

Examples of records utilizing these keywords:

Now my routine urine test is basically normal, but the urine culture still grows bacteria—though not in large quantities. Even after taking sensitive medications, the symptoms only improve slightly.

The urine test showed elevated levels of occult blood, white blood cells, and bacteria, leading to a diagnosis of acute cystitis.

Patients also frequently report UTI-related discomfort, with key terms such as “Incomplete bladder emptying” and “Urinary hesitancy.”

Examples of such consultations:

Frequent urination, incomplete emptying, and occasional difficulty urinating. The bladder trigone feels swollen and irritated, and I have to get up to urinate over ten times at night.

Two weeks ago, after drinking lemon water, I developed frequent and urgent urination with pain. Due to the pandemic, I couldn’t get a proper urine culture, so I couldn’t receive targeted medication and had to rely on empirical treatment instead.

Posts addressing the correct usage, precautions, and dosages of medications are also prevalent, with high-frequency keywords including “Antibiotics,” “Cephalosporins,” “Levofloxacin,” and “Sanjin tablets (Chinese herbal medicine).”

For instance:

I’ve been taking antibiotics before, sometimes along with Sanjin tablets. This time, I was worried about antibiotic resistance, so I tried using only Sanjin tablets, but the results weren’t satisfactory.

Topic 3: Differential diagnosis of cystitis

This topic makes up 9.70% of the discussions (1793/18,479), focusing on the differential diagnosis of various types of cystitis. Key terms include “Cystitis glandularis,” “Cystitis,” “Interstitial cystitis,” and “Chronic cystitis.” Cystitis is a common form of UTI. While glandular cystitis and interstitial cystitis share similar symptoms with ordinary cystitis and UTIs, they differ in nature, etiology, clinical significance, and management.

Examples of records using these keywords:

Is it possible for a pathological examination of glandular cystitis to be misdiagnosed? Is it necessary to get a second opinion at another hospital?

Hello, my condition is a bit complicated. I suspect I have interstitial cystitis.

Frequent urination, discomfort, and pubic/lower abdominal pain—this feels like interstitial cystitis.

Topic 4: Etiology related to sexual activity

Accounting for 6.59% of the discussions (1217/18,479), this topic explores the etiological links between sexual activity and UTIs. Key terms include “Sexual activity,” “Masturbation,” “Erection,” “Ejaculation,” “Semen,” “Penis,” and “Frequent.” Posters in this category attribute UTIs primarily to sexual activity.

Examples incorporating these keywords:

Unprotected sexual activity (including condomless intercourse and oral sex) … led to frequent urination, incomplete emptying, and urgency.

Two days after masturbating, combined with eating too much sugarcane, I started having frequent urination the next day. Then, at night, I had severe night sweats, couldn’t sleep, couldn’t eat, and felt weak all over.

Two days ago, after excessive sexual activity, I developed itching in the urethra during urination.

Topic 5: Nocturnal symptoms and fever

This topic constitutes 12.10% of the discussions (2236/18,479), focusing on patients’ nocturnal symptoms and fever. Key terms include “Fever,” “Nocturia,” “Sleep,” and “Midnight,” which appear frequently in descriptions of discomfort, indicating significant distress. Nocturnal symptoms include worsening conditions at night, frequent nighttime awakenings, and insomnia. Examples of consultation records:

My buttocks and lower abdomen hurt nonstop, making it impossible to sleep. I’m extremely exhausted and have had insomnia for days.

Drinking more water leads to frequent urination, with nighttime awakenings every two hours. There's a constant dull pain.

Recurrent high fever for three days, accompanied by frequent urination and mild lower abdominal pain.

Topic 6: Perinatal considerations

Comprising 1.40% of the discussions (259/18,479), this topic reflects the distinct concerns of perinatal women regarding UTIs. Key terms include “Pregnancy,” “Infant,” “Preconception care,” and “Postpartum.”

Examples of sentences containing these keywords:

Doctor! Hello! I had in vitro fertilization and am currently around 6 weeks pregnant. I’ve had frequent urination with small volumes since the embryo transfer, and it's gotten worse lately.

During pregnancy, I have a UTI, with lower back and leg pain. My urinary tract gets infected very easily.

What medications are safe to take for a UTI during preconception? I have urine occult blood and lower back pain.

Have symptoms of frequent urination. I got a urinary tract infection while I was pregnant and I never took my medicine. The diagnosis is now cystitis.
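As an arithmetic check, the topic shares reported above can be recomputed from the record counts. The following Python sketch is illustrative only: the counts and the total of 18,479 records are taken from the text, while the names `TOTAL_RECORDS`, `TOPIC_COUNTS`, and `topic_share` are assumptions introduced here and are not part of the study's pipeline.

```python
# Total number of cleaned records reported in the paper.
TOTAL_RECORDS = 18479

# Record counts for the three topics whose counts are given in this section.
TOPIC_COUNTS = {
    "Topic 4: Etiology related to sexual activity": 1217,
    "Topic 5: Nocturnal symptoms and fever": 2236,
    "Topic 6: Perinatal considerations": 259,
}


def topic_share(count: int, total: int = TOTAL_RECORDS) -> float:
    """Return the share of `count` in `total` as a percentage, rounded to 2 decimals."""
    return round(100 * count / total, 2)


if __name__ == "__main__":
    for name, count in TOPIC_COUNTS.items():
        print(f"{name}: {topic_share(count):.2f}% ({count}/{TOTAL_RECORDS})")
```

Running the sketch reproduces the percentages quoted for Topics 4, 5, and 6 to two decimal places.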

Sentiment analysis of the UTIs topics

Figure 5 presents the proportions of positive, neutral, and negative consultation records under each topic. Negative records predominate in all six topics, whereas neutral and positive records account for much smaller shares.

Figure 5.

Topic emotional analysis: comparative distribution of positive, neutral, and negative emotions across six consultation topics.

Specifically, four topics (“Diagnosis, Symptoms, and Management Challenges,” “Differential Diagnosis of Cystitis,” “Nocturnal Symptoms and Fever,” and “Perinatal Considerations”) show similar distributions of positive, neutral, and negative records. Among all six topics, “Etiology Related to Sexual Activity” has the highest proportion of negative records (96.88%). In contrast, “Polite Expressions for Consultation” has the lowest proportion of negative records (82.32%) and the highest proportion of positive records (9.03%).
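The per-topic proportions described here amount to counting sentiment labels within each topic and dividing by the topic size. The Python sketch below shows one way to do this, assuming each record has already been assigned a topic and one of three sentiment labels. The function name `sentiment_proportions` and the toy records are hypothetical and do not reproduce the study's data or code.

```python
from collections import Counter, defaultdict

SENTIMENTS = ("positive", "neutral", "negative")


def sentiment_proportions(records):
    """Map each topic to the percentage of its records in each sentiment class.

    `records` is an iterable of (topic, sentiment) pairs.
    """
    counts = defaultdict(Counter)
    for topic, sentiment in records:
        counts[topic][sentiment] += 1

    proportions = {}
    for topic, counter in counts.items():
        total = sum(counter.values())
        proportions[topic] = {
            s: round(100 * counter[s] / total, 2) for s in SENTIMENTS
        }
    return proportions


# Hypothetical toy labels, for illustration only (not the study's records).
toy_records = (
    [("A", "negative")] * 8
    + [("A", "neutral"), ("A", "positive")]
    + [("B", "negative")] * 3
    + [("B", "positive")]
)

if __name__ == "__main__":
    for topic, shares in sentiment_proportions(toy_records).items():
        print(topic, shares)
```

If the three classes are exhaustive, the shares within a topic sum to 100%, so the figures reported for “Polite Expressions for Consultation” (82.32% negative, 9.03% positive) would imply a neutral share of about 8.65%.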

Discussion

Online doctor–patient communication is a core component of internet-based healthcare: physicians and patients remain the main participants in medical services, while new technologies extend the settings and boundaries of healthcare delivery. Because internet-based healthcare is still at an early stage of development, research on online doctor–patient interactions remains limited. Against this backdrop, our findings offer insights into the concerns and challenges faced by patients with UTIs. The analysis of discussions within online health communities revealed several key points with implications for patient care, education, and support.

Our analysis revealed that patients commonly adopt polite and respectful language when seeking online consultations. This communication style aligns with established frameworks for effective doctor–patient interaction, such as the Pendleton model and the Calgary-Cambridge guidelines, which emphasize empathy, rapport-building, and mutual respect. Moreover, prior research has shown that users’ positive attitudes toward online platforms significantly enhance their willingness to use such services (P < .01).⁸ Patients’ polite expressions may thus reflect both trust in and appreciation for the online consultation process, potentially eliciting more positive and empathetic responses from physicians. This may explain why the topic “Polite Expressions for Consultation” showed the highest proportion of positive patient emotions.

The present study found that patients’ primary concerns regarding UTIs center on interpreting test results and making sense of complex symptoms. These concerns stem partly from the variability of UTI symptoms across populations and partly from the limitations of conventional diagnostic methods. Urine culture remains the diagnostic gold standard, but significant bacterial growth in culture does not necessarily indicate an active infection, particularly in asymptomatic bacteriuria, which is common among elderly women. Accurate diagnosis therefore requires that laboratory findings be interpreted in light of patient-reported symptoms.⁴⁸ Our findings also show that patients frequently mention nocturia, particularly in relation to sleep disturbance. This symptom is likely linked to bacterial irritation of the bladder mucosa, which induces hypersensitivity and abnormal contractility, thereby diminishing the patient’s ability to perceive bladder fullness and leading to frequent nighttime urination. The relationship appears to be bidirectional, as fragmented sleep may in turn exacerbate nocturia.⁴⁹ Notably, this connection has been found to persist even after adjusting for covariates such as BMI and diabetes.⁵⁰ Elevated body temperature is another concern that deserves attention, especially when accompanied by chills and flank discomfort. These are typical symptoms of upper urinary tract infections such as pyelonephritis and are common in people with weakened immune systems.⁵¹ Overall, our data underscore two fundamental challenges in UTI diagnosis: the subjectivity of symptoms and the necessity of laboratory confirmation. Consistent with clinical guidelines, diagnosis cannot rely solely on symptoms or on isolated test results. Patients often struggle to interpret discrepant findings, such as positive cultures without symptoms, which suggests that online health platforms should strengthen patient education. Clear explanations of diagnostic procedures, the significance of culture results, and the difference between asymptomatic bacteriuria and symptomatic infection may help alleviate patient anxiety and improve understanding.

Our data also reveal a worrisome phenomenon: patients frequently mention antibiotics without adequate understanding, reflecting that the public generally regards them as a universal remedy for all urinary tract discomfort. This cognitive bias may stem from two factors: one, the long-standing overuse of antibiotics; and two, the prescription pressure exerted by patients on doctors. Studies have confirmed that self-medication with antibiotics is commonplace and fraught with risks, not only potentially delaying proper treatment but also worsening the escalating problem of antimicrobial resistance.^52–55 In this context, the role of online doctors extends beyond information provision; they must take on a crucial health education remit. They should proactively correct patients’ erroneous beliefs, emphasize the importance of targeted therapy when indicated by urine culture and susceptibility results, and firmly discourage self-medication. This educational function is essential for the deep integration of online consultations with antimicrobial stewardship principles.

International experience offers useful reference points. Registry data from Sweden⁵⁶ show that antibiotic prescribing in digital primary care is markedly lower than in offline clinics (for urinary tract infections, the difference in prescribing rates can reach 34–41 percentage points). This finding suggests that digital healthcare can substitute for traditional service models without increasing the risk of antibiotic misuse. The study also identifies two core problems in doctor–patient communication records: first, patients are widely confused about antibiotic use (e.g., duration of therapy, handling of side effects); second, clinicians fail to convey prescribing guidelines adequately. An analysis of electronic prescriptions for urinary tract infections in Saudi Arabia during the COVID-19 pandemic yields a comparable warning. That study⁵⁷ found an antibiotic prescribing rate of 32.1%, with higher prescribing among men, children, and patients with urogenital disorders. Even more concerning, “primary adherence” was very low (only 35.5% of prescriptions were redeemed), which was attributed primarily to “short prescription validity” and “poor patient awareness.” These findings underscore the urgency of strengthening patient education.

Our analysis also revealed frequent confusion among patients regarding different subtypes of cystitis. While acute cystitis is highly prevalent among women,^58–60 noninfectious etiologies such as glandular cystitis or interstitial cystitis (IC) must also be considered. Glandular cystitis involves metaplastic changes in the bladder lining, often triggered by chronic inflammation, whereas IC, or bladder pain syndrome (BPS), is characterized by chronic pelvic pain, urgency, and frequency in the absence of infection.^61,62 Despite their differing pathophysiology, the similarity in symptoms between these conditions complicates differential diagnosis,⁶³ underscoring the importance of accurate interpretation in both online and offline clinical settings.

The analysis of consultation timing within our data suggests potential delays in seeking professional care, with patients often attempting self-management first. Such delay can be particularly risky for upper UTIs like pyelonephritis, where prompt treatment is essential. Online consultations, while potentially adding a step, can expedite care by triaging these cases effectively. Our data also reveal concerns specific to particular demographic groups. A notable advantage of online consultations is the degree of anonymity they offer, which appears to reduce patients’ anxiety and increase their willingness to disclose sensitive information. This is especially evident in discussions of sexual activity, a topic often stigmatized in face-to-face settings. The reduced visibility of social cues (e.g., facial expression, tone) may shield patients from judgment, thereby encouraging more open communication.^64,65 Sexually active young women are at higher risk of UTIs, largely because of the frequency of intercourse and female anatomy: a shorter urethra located close to the anus makes it easier for bacteria to reach the urinary tract.^66,67 In our study, patients appeared more willing to disclose risky sexual behavior, a well-known contributor to UTIs.⁶⁸ Pregnant women are another group in whom UTIs are more common. Hormonal changes and physical compression of the urinary tract slow urine flow and promote bacterial growth. If untreated, bacteriuria in pregnancy may lead to serious complications such as pyelonephritis or preterm birth.^69–72 This risk is further elevated in those with gestational diabetes mellitus (GDM), where insulin resistance may impair immune responses. In parallel, our findings show that pregnant women express substantial concerns about medication use. Consistent with earlier studies, many believe that all medications may pose risks to the fetus, leading over one-third to avoid prescriptions altogether and more than half to self-modify medication regimens.^73,74 This hesitation was echoed in our dataset, where perinatal patients were often reluctant to inquire about pharmaceutical options, highlighting the need for better education and reassurance in these scenarios.

Across all topics, sentiment analysis shows that patients are in a predominantly negative emotional state. Several factors contribute to this negativity. First, the persistent physical discomfort and pain caused by UTI symptoms such as frequent, urgent, and painful urination take a considerable toll. Second, chronic discomfort and worry about recurrence give rise to emotional distress, including anxiety, low mood, and a sense of helplessness. Together, these factors reduce overall quality of life, affecting physical function, mental well-being, social life, and the ability to carry out daily routines. In addition, repeated treatment failures or recurrent infections can provoke intense frustration and a sense of losing control. Finally, the potential stigma attached to the illness or its symptoms, such as urinary urgency or incontinence, can lead to social withdrawal. These emotional findings are consistent with previous studies.^5,75,76 Given the complex interactions among these factors, a comprehensive intervention model that integrates physical treatment and psychological support is vital in the clinical management of UTI.

Limitations

This study has several limitations that warrant consideration when interpreting the results. Although the research design focuses exclusively on patient-initiated questions, thereby excluding the dynamics of physician–patient interaction, this specific approach enables the capture of patients’ most spontaneous and unguided expressions of healthcare needs. Current research on online healthcare often prioritizes the physician's perspective; consequently, patient-driven inquiries serve as crucial indicators of unmet medical demands. Future investigations should adopt more comprehensive frameworks. A critically underrepresented patient group is the elderly population, whose potentially lower engagement with social media platforms may limit their representation. Furthermore, participation bias is likely, as individuals actively posting about UTIs on social media may disproportionately represent those experiencing more severe symptoms, complex treatment courses, or particularly burdensome healthcare challenges compared to nonposting patients. In addition, the LLM employed in this study has inherent limitations. First, its operational mechanism is essentially a “black box,” with internal reasoning processes that lack transparency and make full verification challenging. Second, as a general-purpose model trained on vast internet-scale data, it may carry forward biases present in its training data, which could influence outcomes in tasks like sentiment classification. To address this, we have not only incorporated a manual verification step but also recognized the need to adopt more automated verification frameworks—such as Retrieval-Augmented Generation (RAG)—in future work. This approach aims to enhance the verifiability of the model's outputs and establish a more robust verification process.

Conclusions

This study, by analyzing online doctor–patient communications, investigates the concerns and challenges faced by patients with UTI. It finds that patients commonly struggle with interpreting diagnoses, using antibiotics, and anxieties particular to different population groups, underscoring the need for more proactive, educational, and guidance-oriented approaches in online consultations. The findings suggest that incorporating clinical decision support tools, guiding clinicians to address key knowledge points, providing standardized test interpretation information, and identifying potential risk factors based on patients’ reported symptoms, can markedly improve the quality and safety of online consultation platforms.

Moreover, this study not only systematically maps patients’ concerns but also offers practical insights for clinical practice in the digital health arena. Future research should prioritize assessing how such integrated educational interventions affect patient outcomes, antibiotic prescribing rates, and the management of common infections via remote healthcare, thereby supplying internet-based platforms and policymakers with a clearer understanding of patient needs and a basis for designing targeted intervention strategies.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076251393289 - Supplemental material for Personalized insights into urinary tract infection management: A text mining analysis of online consultation data

Supplemental material, sj-docx-1-dhj-10.1177_20552076251393289 for Personalized insights into urinary tract infection management: A text mining analysis of online consultation data by Ruijie Tang, Peiqi Zhu, Ruxue Yan, Yaping Zhou, Zhian Tang and Weiming He in DIGITAL HEALTH

Supplemental Material

sj-docx-2-dhj-10.1177_20552076251393289 - Supplemental material for Personalized insights into urinary tract infection management: A text mining analysis of online consultation data

Supplemental material, sj-docx-2-dhj-10.1177_20552076251393289 for Personalized insights into urinary tract infection management: A text mining analysis of online consultation data by Ruijie Tang, Peiqi Zhu, Ruxue Yan, Yaping Zhou, Zhian Tang and Weiming He in DIGITAL HEALTH

Footnotes

Acknowledgments

We thank the database provider, maintenance team, and all reviewers for their valuable contributions.

ORCID iD

Weiming He

Ethical approval

This study was approved by the Ethics Committee of Jiangsu Province Hospital of Chinese Medicine. Since the observed data were obtained directly from the public platform, the requirement for participant written informed consent was waived by the Ethics Committee of Jiangsu Province Hospital of Chinese Medicine (Approval No. 2024NL-025-01). Our study was conducted in accordance with the principles outlined in the Declaration of Helsinki.

Contributorship

RT led the conceptualization of the study, with PZ contributing equally. RY and YZ were responsible for data curation and investigation. RT conducted the formal analysis, supported by PZ. HW and TZ acquired funding and provided resources. Project administration was led by RT and PZ, with equal participation from YZ and RY, and supporting roles from WH and ZT. Supervision and validation were jointly performed by RT and PZ. RT carried out the visualization. The original draft was led by RT and PZ, with supporting input from WH and ZT. All authors participated in reviewing and editing the manuscript, with RT and PZ taking the lead.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Natural Science Foundation of China (grant number 82575032), the Jiangsu Provincial Medical Innovation Center (grant number 202215), and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (grant number SJCX24_0954).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Guarantor

HW.

Supplemental material

Supplemental material for this article is available online.

References

Huang

Chan

C-K

Yee

, et al. Global burden and temporal trends of lower urinary tract symptoms: a systematic review and meta-analysis. Prostate Cancer Prostatic Dis 2023; 26: 421–428.

Abufaraj

Cao

, et al. Prevalence and trends in urinary incontinence among women in the United States, 2005-2018. Am J Obstet Gynecol 2021; 225: 166.e1.

Medina

Castillo-Pino

. An introduction to the epidemiology and burden of urinary tract infections. Ther Adv Urol 2019; 11: 3–7.

Wagenlehner

Wullt

Ballarini

, et al. Social and economic burden of recurrent urinary tract infections and quality of life: a patient web-based study (GESPRIT). Expert Rev Pharmacoecon Outcomes Res 2018; 18: 107–117.

Grigoryan

Mulgirigama

Powell

, et al. The emotional impact of urinary tract infections in women: a qualitative analysis. BMC Womens Health 2022; 22: 182.

Thompson

Marijam

Mitrani-Gold

, et al. Activity impairment, health-related quality of life, productivity, and self-reported resource use and associated costs of uncomplicated urinary tract infection among women in the United States. PLOS ONE 2023; 18: e0277728.

Liobikiene

Bernatoniene

. The determinants of access to information on the internet and knowledge of health related topics in European countries. Health Policy 2018; 122: 1348–1355.

Meng

You

Liu

, et al. Health information adoption behavior research —— conceptualization, theoretical framework, and future directions. Mod Inf 2024; 44: 157–167.

Woo

. A study on the development of China's medical and health online mutual aid platform industry. J China Stud 2020; 23: 23–48.

10.

Zhang

Jiao

, et al.

Digital dividend” or “digital divide": what role does the internet play in the health inequalities among Chinese residents?

Int J Environ Res Public Health 2022; 19: 15162.

11.

Nasseef

Baabdullah

Alalwan

, et al. Artificial intelligence-based public healthcare systems: g2G knowledge-based exchange to enhance the decision-making process. Gov Inf Q 2022; 39: 101618.

12.

Kumar

Dwivedi

Anand

. Responsible artificial intelligence (AI) for value formation and market performance in healthcare: the mediating role of patient’s cognitive engagement. Inf Syst Front 2023; 25: 2197–2220.

13.

Zhang

Fan

. User adoption of physician's replies in an online health community: an empirical study. J Assoc Inf Sci Technol 2020; 71: 1179–1191.

14.

Chen

Guo

, et al. Exploring the influence of doctor–patient social ties and knowledge ties on patient selection. Internet Res 2022; 32: 219–240.

15.

Liu

, et al. Patients’ self-disclosure positively influences the establishment of patients’ trust in physicians: an empirical study of computer-mediated communication in an online health community. Front Public Health 2022; 10: 1–12.

16.

Liu

Xiao

, et al. Is my doctor around me? Investigating the impact of doctors’ presence on patients’ review behaviors on an online health platform. J Assoc Inf Sci Technol 2022; 73: 1279–1296.

17.

Sun

Zhang

Zhu

, et al. Exploring users’ willingness to disclose personal information in online healthcare communities: the role of satisfaction. Technol Forecast Soc Change 2022; 178: 121596.

18.

Wang

Shukla

Shi

. Digitalized social support in the healthcare environment: effects of the types and sources of social support on psychological well-being. Technol Forecast Soc Change 2021; 164: 120503.

19.

Mirzaei

Esmaeilzadeh

. Engagement in online health communities: channel expansion and social exchanges. Inf Manage 2021; 58: 103404.

20.

Flower

Bishop

Lewith

. How women manage recurrent urinary tract infections: an analysis of postings on a popular web forum. BMC Fam Pract 2014; 15: 162.

21.

Leydon

Turner

Smith

, et al. The journey from self-care to GP care: a qualitative interview study of women presenting with symptoms of urinary tract infection. Br J Gen Pract 2009; 59: e219–e225.

22.

Liu

Wang

Fan

, et al. Finding useful solutions in online knowledge communities: a theory-driven design and multilevel analysis. Inf Syst Res 2020; 31: 731–752.

23.

Lindelof

Aledavood

Keller

. Dynamics of the negative discourse toward COVID-19 vaccines: topic modeling study and an annotated data set of twitter posts. J Med Internet Res 2023; 25: e41319.

24.

Zhang

Yin

Zeng

, et al. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020; 20: 280.

25.

Osei-Frimpong

Wilson

Lemke

. Patient co-creation activities in healthcare service delivery at the micro level: the influence of online access to healthcare information. Technol Forecast Soc Change 2018; 126: 14–27.

26.

Hickson

Talbert

Thornbury

, et al. Online medical care: the current state of “eVisits” in acute primary care delivery. Telemed e-Health 2015; 21: 90–96.

27.

Rajkomar

Oren

Chen

, et al. Scalable and accurate deep learning with electronic health records. npj Digital Med 2018; 1: 18.

28.

Liu

Lin

Mao

, et al. Changing prevalence of chronic hepatitis B virus infection in China between 1973 and 2021: a systematic literature review and meta-analysis of 3740 studies and 231 million people. Gut 2023; 72: 2354.

29.

Lin

Xin

Peng

, et al. Research on topic mining and evolution trends of functional agriculture based on the BERTopic model. Agriculture-Basel 2024; 14: 1691.

30.

Harloff

Stringer

Perry

. Sample size requirements for stable clustering of free partition sorting data. Bull Sociological Methodol 2013; 117: 93–105.

31.

Egger

. A topic modeling comparison between LDA, NMF, Top2Vec, and BERTopic to demystify twitter posts. Front Sociol 2022; 7: 886498.

32.

Mandas

Lahmar

Piras

, et al.

ESG In the financial industry: what matters for rating analysts?

Res Int Bus Finance 2023; 66: 102045.

33.

Wang

Chen

, et al. Identifying interdisciplinary topics and their evolution based on BERTopic. Scientometrics 2024; 129: 7359–7384.

34.

Kim

. What enhances or worsens the user-generated metaverse experience? An application of BERTopic to roblox user eWOM. Internet Res 2024; 34: 1800–1817.

35.

Hofmann

. Probabilistic latent semantic indexing. Acm Sigir 2017; 51: 211–218.

36.

Zhang

Bao

Wang

, et al. Textual semantic mining and sentiment analysis based on review data. Inf Sci 2021; 39: 53–61.

37.

Abuzayed

Al-Khalifa

. BERT For arabic topic modeling: an experimental study on BERTopic technique. Procedia Comput Sci 2021; 189: 191–194.

38.

Devlin

Chang

M-W

Lee

, et al. Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), 2019, pp.4171–4186.

39.

Jeon

Yoon

Sohn

. Exploring new digital therapeutics technologies for psychiatric disorders using BERTopic and PatentSBERTa. Technol Forecast Soc Change 2023; 186: 122130.

40.

Yang

. Research on topic mining and evolution analysis of domestic information resource management based on BERTopic model. J Inf Sci 2024; 42: 12–21.

41.

Ogunleye

Maswera

Hirsch

, et al. Comparison of topic modelling approaches in the banking context. Appl Sci 2023; 13: 797.

42.

Grootendorst

. BERTopic: neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:220305794 2022.

43.

Liu

Arulappan

Naha

, et al. Large language models and sentiment analysis in financial markets: a review, datasets, and case study. IEEE Access 2024; 12: 134041–134061.

44.

Chatzimina

Papadaki

Pontikoglou

, et al. Topic modeling and sentiment analysis of Greek clinician-patient conversations in hematologic malignancies. Int J Med Inf 2025; 204: 106071.

45.

Lyu

Han

Luli

. COVID-19 Vaccine-Related discussion on twitter: topic modeling and sentiment analysis. J Med Internet Res 2021; 23: e24435.

46.

Song

Malin

, et al. Examining online behaviors of adult-child and spousal caregivers for people living with Alzheimer disease or related dementias: comparative study in an open online community. J Med Internet Res 2023; 25: e48193.

47.

Kong

Chen

Zhang

, et al. Public discourse and sentiment toward dementia on Chinese social Media: machine learning analysis of weibo posts. J Med Internet Res 2022; 24: e39805.

48.

Chu

Lowder

. Diagnosis and treatment of urinary tract infections across age groups. Am J Obstet Gynecol 2018; 219: 40–51.

49.

Dani

Esdaille

Weiss

. Nocturia: aetiology and treatment in adults. Nat Rev Urol 2016; 13: 573–583.

50.

Fantus

Packiam

Wang

, et al. The relationship between sleep disorders and lower urinary tract symptoms: results from the NHANES. J Urol 2018; 200: 161–166.

51.

Pietrucha-Dilanchian

Hooton

. Diagnosis, treatment, and prevention of urinary tract infection. Microbiol Spectr 2016; 4: 1–20.

52.

Chen

Wang

, et al. Antimicrobial resistance and molecular epidemiology of carbapenem-resistant Escherichia coli from urinary tract infections in Shandong, China. Int Microbiol 2023; 26: 1157–1166.

53.

Minmin

Yan

, et al. Distribution and antimicrobial susceptibility results of pathogenic bacteria in 473 bacterial urinary tract infections. Gansu Med J 2022; 41: 1085–1088.

54.

Trautner

Kaye

Gupta

, et al. Risk factors associated with antimicrobial resistance and adverse short-term health outcomes among adult and adolescent female outpatients with uncomplicated urinary tract infection. Open Forum Infect Dis 2022; 9: 1–8.

55.

Magiorakos

Srinivasan

Carey

, et al. Multidrug-resistant, extensively drug-resistant and pandrug-resistant bacteria: an international expert proposal for interim standard definitions for acquired resistance. Clin Microbiol Infect 2012; 18: 268–281.

56.

Wilkens

Thulesius

Arvidsson

, et al. Evaluating the effect of digital primary care on antibiotic prescription: evidence using Swedish register data. Digital Health 2023; 9: 1–14.

57.

Alhassoun

Aldossary

. Utilization of remote e-prescription (Anat) in Saudi Arabia during COVID-19: factors associated with primary adherence and antibiotic prescription. Digital Health 2023; 9: 1–17.

58.

Mancuso

Midiri

Gerace

, et al. Urinary tract infections: the current scenario and future prospects. Pathogens 2023; 12: 623.

59.

Yildirim

Shoskes

Kulkarni

, et al.

Urinary microbiome in uncomplicated and interstitial cystitis: is there any similarity?

World J Urol 2020; 38: 2721–2731.

60.

Gezginci

Iyigun

Acikel

, et al. Determination of genital hygiene behaviours in women with cystitis. Int J Urol Nurs 2013; 7: 161–165.

61.

Bogart

Berry

Clemens

. Symptoms of interstitial cystitis, painful bladder syndrome and similar diseases in women: a systematic review. J Urol 2007; 177: 450–456.

62.

Theoharides

Whitmore

Stanford

, et al. Interstitial cystitis: bladder pain and beyond. Expert Opin Pharmacother 2008; 9: 2979–2994.

63.

Dell

Mokrzycki

Jayne

. Differentiating interstitial cystitis from similar conditions commonly seen in gynecologic practice. Eur J Obstet Gynecology Reprod Biol 2009; 144: 105–109.

64.

Walther

. Cues filtered out, cues filtered in: computer mediated communication and relationships. In: Handbook of interpersonal communication, 2002, pp.529.

65.

Chester

Glass

. Online counselling: a descriptive analysis of therapy services on the internet. Br J Guid Counc 2006; 34: 145–160.

66.

Seid

Markos

Aklilu

, et al. Community-Acquired urinary tract infection among sexually active women: risk factors, bacterial profile and their antimicrobial susceptibility patterns, arba minch, southern Ethiopia. Infect Drug Resist 2023; 16: 2297–2310.

67.

Amiri

Rooshan

Ahmady

, et al. Hygiene practices and sexual activity associated with urinary tract infection in pregnant women. East Mediterr Health J 2009; 15: 104–110.

68.

Riaz

Sherwani

NZF

Inam

SHA

, et al. Physician gender preference amongst females attending obstetrics/gynecology clinics. Cureus 2021; 13: e15028.

69.

Pan

Ling

Zhu

, et al. Analysis of influencing factors and prevention strategies for hospital-acquired infections in obstetric patients. Chin J Hosp Infect 2017; 27: 3579–3582.

70.

Xie

. Analysis of risk factors and prevention measures for urinary tract infections during pregnancy. Chin J Maternal Child Health 2017; 32: 3485–3486.

71.

Athanasiou

Antsaklis

Betsi

, et al. Clinical and urodynamic parameters associated with history of urinary tract infections in women: a prospective study. Acta Obstet Gynecol Scand 2007; 86: 1130–1135.

72.

Rizvi

Nazim

. The frequency of urinary symptoms in women attending gynaecology clinics at the Aga Khan University Hospital Karachi, Pakistan. JPMA J Pak Med Assoc 2005; 55: 489–492.

73.

Wolgast

Lindh-Astrand

Lilliecreutz

. Women's perceptions of medication use during pregnancy and breastfeeding—A Swedish cross-sectional questionnaire study. Acta Obstet Gynecol Scand 2019; 98: 856–864.

74.

Head

Doamekpor

South

, et al. Behaviors related to medication safety and use during pregnancy. J Womens Health (Larchmt) 2023; 32: 47–56.

75.

Yun

Powell

Mulgirigama

, et al. The emotional impact of uncomplicated urinary tract infections in women in China and Japan: a qualitative study. BMC Womens Health 2024; 24: 94.

76. Renard, Ballarini, Mascarenhas, et al. Recurrent lower urinary tract infections have a detrimental effect on patient quality of life: a prospective, observational study. Infect Dis Ther 2015; 4: 125–135.

Supplementary Material

Please find the following supplemental material available below.

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.


Personalized insights into urinary tract infection management: A text mining analysis of online consultation data
