Abstract
Objective
To analyze pelvic floor dysfunction heterogeneity by identifying distinct clinical subtypes using unsupervised machine learning in a nationally representative cohort.
Methods
This cross-sectional study analyzed 7291 women from the National Health and Nutrition Examination Survey (NHANES 2005–2012). K-means clustering was applied to pelvic floor dysfunction–positive women to identify latent subtypes based on physiological features. Supervised machine learning models were subsequently developed to predict subtype membership, with SHapley Additive exPlanation analysis used to identify key predictors.
Results
Two distinct subtypes emerged. Phenotype 1 (“metabolic-inflammatory”) was characterized by severe central obesity (mean body mass index, 36.1 kg/m²), diabetes, and hypertension. Conversely, Phenotype 2 (“metabolically-healthy”) had pelvic floor dysfunction symptoms but a metabolic profile nearly indistinguishable from that of the controls. A neural network model best differentiated the controls and the two phenotypes (macro-average area under the curve = 0.848), and feature importance analysis identified waist circumference as the primary predictive factor.
Conclusions
Pelvic floor dysfunction is not a single disorder but includes a distinct “metabolic-inflammatory” phenotype strongly associated with systemic metabolic disease. This data-driven classification challenges traditional paradigms, suggesting “subtype-specific” strategies for precision management.
Keywords
Introduction
Pelvic floor dysfunction (PFD) encompasses a group of highly prevalent yet under-recognized clinical syndromes affecting hundreds of millions of women worldwide, primarily manifesting as stress urinary incontinence (SUI), urgency urinary incontinence (UUI), pelvic organ prolapse (POP), and fecal incontinence (FI).1–3 These conditions frequently coexist and overlap, affecting women’s health across multiple domains. The impact of PFD on women’s quality of life is profound: it causes significant physical distress and imposes substantial psychological and economic burdens.4–6 Furthermore, among middle-aged and older women, symptoms such as UUI significantly increase the risk of falls, a serious health threat.7 As the global population ages, the prevalence of PFD is projected to rise, making it an escalating public health challenge that calls for more holistic and integrative management frameworks.8
Conventional epidemiological research has identified several key risk factors for PFD, most notably advancing age, parity (particularly vaginal delivery), menopause, and obesity.6,9–11 However, a persistent challenge for both clinicians and researchers is the condition’s marked clinical heterogeneity.1,12 Individuals with similar risk exposures often exhibit wide variations in symptom presentation, disease progression, and therapeutic response.12 Current clinical practice and research largely treat PFD as a single, homogeneous entity. This monolithic view critically limits our understanding of its complex pathophysiology and impedes the development of personalized prevention and treatment strategies.13–15
In recent years, machine learning, particularly unsupervised clustering, has emerged as a robust data-driven tool for analyzing complex, heterogeneous diseases, identifying clinically meaningful subtypes in fields such as oncology, diabetes, and cardiovascular disease.16–20 Because it does not require predefined outcome labels, this approach can uncover latent patient clusters in high-dimensional data, challenging traditional disease classifications and advancing the stratification required for precision medicine.20,21 We hypothesized that PFD is not a single pathological entity but rather a collection of distinct phenotypes characterized by different pathophysiological pathways. Specifically, we posited that, in addition to the classic phenotype (primarily characterized by mechanical injury and tissue aging), there may exist another, not yet clearly defined phenotype associated with systemic metabolic dysregulation and chronic low-grade inflammation.
Accordingly, this study aimed to systematically analyze the heterogeneity of female PFD by applying unsupervised machine learning to a large, nationally representative dataset from the US National Health and Nutrition Examination Survey (NHANES). Our specific objectives were to (a) empirically identify data-driven clinical phenotypes within the PFD-positive population; (b) comprehensively characterize and compare these phenotypes across multiple domains, including demographics, clinical comorbidities, and metabolic, inflammatory, and functional status; and (c) develop and validate a machine learning model to predict phenotype membership and identify the most critical clinical predictors. Through this research, we aim to provide new insights into the heterogeneity of PFD and to inform future precision diagnostics and phenotype-targeted therapeutic strategies.
A preliminary version of this manuscript has been previously posted as a preprint on medRxiv.22
Material and methods
Study design and population
This retrospective, cross-sectional study was a secondary analysis of data from the US NHANES 2005–2012 and is reported in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for cross-sectional studies.23 The research was conducted in accordance with the ethical principles of the Declaration of Helsinki of 1975, as revised in 2024. The authors accessed the NHANES database for research purposes on 20 September 2025. Because the dataset was fully deidentified, the authors had no access to information that could identify individual participants during or after data collection. The NHANES protocol was approved by the National Center for Health Statistics (NCHS) Research Ethics Review Board, and all participants provided written informed consent. Given the secondary analysis of publicly available, deidentified data, this study was exempt from Institutional Review Board (IRB) approval.

The participant selection process is detailed in Figure 1. Among 40,790 nonpregnant individuals from the NHANES 2005–2012 cycles, we considered 20,533 female participants. Restricting the cohort to adults aged ≥20 years yielded 11,547 eligible individuals. To focus on PFD not attributable to iatrogenic, neurological, or oncological causes, we then excluded participants with missing PFD outcome data (n = 1777), a history of major pelvic surgery (n = 2121), specific neurological diseases (stroke, multiple sclerosis, spinal cord injury, or Parkinson’s disease; n = 193), and cancers with direct pelvic impact (gynecological, colorectal, or urological cancers; n = 165). After these exclusions, the final analytical cohort comprised 7291 participants.
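As a quick consistency check on the selection flow, the exclusions can be subtracted in sequence from the 11,547 eligible adults. The Python snippet below is only an arithmetic illustration using the counts reported in the text; it is not the authors' data-processing code, which was written in R.

```python
# Counts as reported in the text (participant selection flow).
eligible_adult_women = 11_547  # nonpregnant women aged >= 20 years

# Exclusions, applied in the order described in the text.
exclusions = [
    ("missing PFD outcome data", 1_777),
    ("history of major pelvic surgery", 2_121),
    ("specified neurological diseases", 193),
    ("cancers with direct pelvic impact", 165),
]

remaining = eligible_adult_women
for reason, n_excluded in exclusions:
    remaining -= n_excluded
    print(f"after excluding {reason} (n = {n_excluded}): {remaining}")

final_cohort = remaining
print(f"final analytical cohort: {final_cohort}")  # 7291
```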
To further validate the biological characteristics of the identified subtypes, we also conducted sensitivity analyses in available data subsets using an objective physiological measure (grip strength) and a marker of systemic inflammation (periodontitis) (Supplementary Material A).

Figure 1. Flowchart of participant selection.
Variable definitions
PFD was defined by the presence of one or more of the following symptoms: SUI, UUI, FI, or POP. Symptom presence was determined using standardized questions from the NHANES questionnaires. Detailed definitions for all outcome variables and covariates are provided in Supplementary Material B.
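The composite outcome can be expressed as a simple rule over the four symptom indicators. The Python function below is a hypothetical sketch: the name `pfd_status` and the handling of partially answered questionnaires (positive if any symptom is reported; missing if no symptom is reported but some items are unanswered) are assumptions, because the exact coding is given only in Supplementary Material B.

```python
from typing import Optional


def pfd_status(sui: Optional[bool], uui: Optional[bool],
               fi: Optional[bool], pop: Optional[bool]) -> Optional[bool]:
    """Composite PFD outcome from four symptom indicators (hypothetical helper).

    True  -> at least one of SUI, UUI, FI, or POP is reported.
    False -> all four are reported as absent.
    None  -> no symptom reported but at least one item is missing
             (an assumed coding; such records would count as missing outcome data).
    """
    answers = (sui, uui, fi, pop)
    if any(a is True for a in answers):
        return True
    if all(a is False for a in answers):
        return False
    return None


print(pfd_status(False, True, None, False))    # True
print(pfd_status(False, False, False, False))  # False
print(pfd_status(False, None, False, False))   # None
```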
Statistical analysis
All data cleaning, management, and analyses were performed using R software (version 4.4.2), with the survey package used to account for the complex, multistage sampling design of NHANES (weights, strata, and clusters). All statistical tests were two-sided, and a P-value <0.05 was considered statistically significant.

For missing data, we first assessed the proportion of missingness for all covariates (missing rates for all features and covariates are summarized in Supplementary Table S1). Any covariate with a missing rate >20% was excluded from the models, unless the indicator was a biologically essential derived measure (e.g. triglyceride-glucose [TyG] index) whose underlying components had missingness acceptable for imputation. Missing covariate data were then imputed using multiple imputation by chained equations (MICE), a widely recommended approach (R package mice, m = 20 imputations).24,25

Our analytical workflow consisted of two main stages. Stage 1: Unsupervised discovery of PFD subtypes. We applied the K-means clustering algorithm to the PFD-positive cohort to identify latent clinical subtypes. Clustering was based on 12 core physiological features spanning the domains of obesity, metabolism, nutrition, physical function, and mental health. These variables were selected to capture core biological domains reflecting metabolic, inflammatory, and functional status while minimizing redundancy and collinearity. Behavioral and lifestyle factors (e.g. physical activity, dietary indices, and psychosocial stress) were intentionally excluded from clustering to preserve the biological interpretability of the phenotype centroids and were instead incorporated in downstream comparative analyses. The optimal number of clusters (K) was determined using both the elbow and silhouette methods.
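The missingness screening rule described above can be sketched as a filter over per-covariate missing fractions. In this illustrative Python function (`select_covariates` is a hypothetical name, and the missing fractions are invented), covariates with more than 20% missingness are dropped unless explicitly whitelisted, mirroring the exception for derived measures such as the TyG index. The subsequent imputation step (MICE with m = 20 in the R package mice) is not reproduced here.

```python
def select_covariates(missing_rate, keep_despite_missing=frozenset(), threshold=0.20):
    """Split covariates into (kept, dropped) by missing fraction.

    A covariate is dropped when its missing fraction exceeds `threshold`,
    unless it is listed in `keep_despite_missing` (e.g. a derived index whose
    components are imputed). Which covariates qualify for that exception is a
    judgment made by the analyst, not by this function.
    """
    kept, dropped = [], []
    for name, rate in missing_rate.items():
        if rate <= threshold or name in keep_despite_missing:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped


# Invented missing fractions, for illustration only.
rates = {"age": 0.00, "bmi": 0.02, "tyg_index": 0.50, "hypothetical_marker": 0.35}
kept, dropped = select_covariates(rates, keep_despite_missing={"tyg_index"})
print(kept)     # ['age', 'bmi', 'tyg_index']
print(dropped)  # ['hypothetical_marker']
```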
Stage 2: Supervised prediction and interpretation of subtypes. After two clinical subtypes were identified, the study population was categorized into three classes: (a) healthy controls (PFD-negative participants); (b) Phenotype 1; and (c) Phenotype 2. The dataset was then randomly divided into a training set (80%) and an independent test set (20%). In the training set, we developed and compared seven machine learning algorithms (neural networks, XGBoost, logistic regression, random forest, support vector machines, naive Bayes, and K-nearest neighbors) to construct multiclass models predicting an individual’s class membership. Model performance was evaluated in the test set using the macro-average area under the receiver operating characteristic curve (AUC) as the primary metric. The macro-average AUC was chosen because it is the unweighted mean of the class-specific one-vs-rest AUCs, so performance on smaller phenotypes is not overshadowed by performance on larger groups. To interpret the best-performing model and identify the key drivers of subtype differentiation, we employed SHapley Additive exPlanations (SHAP) analysis to quantify the contribution of each predictor.

Baseline characteristics were described using weighted means (standard deviations) or medians (interquartile ranges) for continuous variables and weighted percentages for categorical variables. Group comparisons were conducted using design-based t-tests (or Kruskal–Wallis tests) and chi-square tests with the Rao & Scott adjustment, as appropriate. We also performed a sensitivity analysis by constructing three nested predictive models to assess how the relative importance of parity changed with model complexity.
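A minimal Python sketch of the macro-average one-vs-rest AUC follows. Each class's AUC is computed with the Mann-Whitney formulation (the probability that a randomly chosen member of the class receives a higher predicted probability than a randomly chosen non-member, with ties counting one half), and the class AUCs are then averaged without weighting. The function names and toy predictions are illustrative assumptions, not the study's implementation.

```python
def binary_auc(scores, is_positive):
    """AUC via the Mann-Whitney formulation: the probability that a random
    positive scores higher than a random negative (ties count 1/2)."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, prob_rows, classes):
    """Unweighted mean of one-vs-rest AUCs.

    prob_rows[i][k] is the predicted probability that sample i belongs to
    classes[k]; every class counts equally regardless of its size.
    """
    per_class = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        per_class.append(binary_auc(scores, [y == cls for y in y_true]))
    return sum(per_class) / len(per_class)


# Invented toy predictions for the three classes.
classes = ["control", "phenotype_1", "phenotype_2"]
y_true = ["control", "control", "phenotype_1", "phenotype_2"]
probs = [
    [0.7, 0.2, 0.1],
    [0.4, 0.1, 0.5],
    [0.2, 0.6, 0.2],
    [0.3, 0.25, 0.45],
]
print(macro_ovr_auc(y_true, probs, classes))  # about 0.889 (= 8/9)
```

In the toy example, controls and Phenotype 1 are perfectly ranked (AUC 1.0 each) while Phenotype 2 is outranked by one control (AUC 2/3), so the macro average is 8/9 regardless of how many samples each class has.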
Prior to clustering, all continuous variables were standardized using z-score normalization, because distance-based algorithms are sensitive to the scale of each variable. K-means clustering was selected because the clustering features were primarily continuous physiological variables and the method is computationally efficient, does not require a parametric distributional model (although it favors compact, roughly spherical clusters), and yields clinically interpretable cluster centroids in large population-based datasets. Hierarchical clustering was additionally explored as a sensitivity analysis to assess robustness.
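The standardization and clustering steps can be illustrated with a small, dependency-free Python sketch of Lloyd's algorithm for K-means on z-scored features. Two choices here are assumptions made for a deterministic illustration rather than details reported by the authors: the sample standard deviation is used for scaling, and initial centroids are chosen by farthest-first (maximin) selection instead of random starts. The toy data are invented.

```python
import statistics


def zscore_columns(rows):
    """Standardize each column to mean 0 and sample SD 1."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic maximin start: first point, then repeatedly the point
    farthest from its nearest already-chosen centroid."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm. Returns (labels, centroids, within-cluster SS)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [statistics.fmean(col) for col in zip(*members)]
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wss


# Invented toy data: two well-separated groups of three points each.
data = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [8.0, 9.0], [8.2, 8.8], [7.8, 9.2]]
z = zscore_columns(data)
labels, centroids, wss = kmeans(z, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]

# Elbow inspection: within-cluster sum of squares for several K.
for k in (1, 2, 3):
    print(k, round(kmeans(z, k)[2], 3))
```

Random restarts (keeping the solution with the lowest within-cluster sum of squares) are the more common practice in real analyses, because Lloyd's algorithm can converge to a poor local optimum from an unlucky start; the deterministic start is used here only so the toy output is reproducible.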
The optimal number of clusters was determined using internal validation metrics (elbow method and average silhouette width) together with biological interpretability. Although several values of K were examined, a two-cluster solution was favored because it produced clinically coherent and biologically distinct phenotypes, whereas higher values of K yielded fragmented subgroups without clear clinical relevance. Because the standard K-means algorithm cannot handle missing values, clustering was performed in a complete-case subset of the PFD-positive population (weighted N = 16,497,385).
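The average silhouette width used to compare candidate values of K can be computed directly from its definition. In the Python sketch below (Euclidean distance is assumed, matching K-means on z-scored features; the toy points are invented), each point's width is s(i) = (b − a) / max(a, b), where a is its mean distance to the other members of its own cluster and b is its smallest mean distance to any other cluster. The average of s(i) over all points is the quantity compared across K; at least two clusters are required.

```python
import math
import statistics


def silhouette_widths(points, labels):
    """Per-point silhouette widths using Euclidean distance.

    a = mean distance to the other members of the point's own cluster
    b = smallest mean distance to the members of any other cluster
    s = (b - a) / max(a, b); points in singleton clusters get 0.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    widths = []
    for i, (p, own) in enumerate(zip(points, labels)):
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            widths.append(0.0)
            continue
        a = statistics.fmean(same)
        b = min(
            statistics.fmean(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in clusters if other != own
        )
        widths.append((b - a) / max(a, b))
    return widths


points = [[0.0], [1.0], [10.0], [11.0]]  # invented 1-D toy data
good = silhouette_widths(points, [0, 0, 1, 1])
bad = silhouette_widths(points, [0, 1, 0, 1])
print(statistics.fmean(good))  # close to 1: compact, well-separated clusters
print(statistics.fmean(bad))   # negative: points sit closer to the other cluster
```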
Results
Participant baseline characteristics
The final analytical cohort included 7291 participants, representing a weighted 72,325,873 US adult women. Of these, a weighted 36,483,067 (50.4%) reported at least one type of PFD (PFD-positive group), while 35,842,806 (49.6%) reported no PFD symptoms (PFD-negative group). The baseline characteristics of these two groups are detailed in Table 1.
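Weighted percentages of this kind are ratios of summed survey weights. The Python sketch below shows the point estimate only, using invented weights: it divides the summed weights of cases by the summed weights of all participants. Standard errors and design-based tests additionally require the strata and primary sampling units, which the R survey package handles and which are omitted here.

```python
def weighted_prevalence(weights, has_condition):
    """Survey-weighted proportion: summed weights of cases divided by summed
    weights of all participants (point estimate only; no variance estimation)."""
    total = sum(weights)
    cases = sum(w for w, y in zip(weights, has_condition) if y)
    return cases / total


# Invented sampling weights for four participants.
weights = [12_000.0, 8_000.0, 15_000.0, 5_000.0]
has_pfd = [True, False, True, False]
print(weighted_prevalence(weights, has_pfd))  # 0.675
```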
Table 1. Baseline characteristics of the study population, stratified by pelvic floor dysfunction status.
Values are mean (SD), median (Q1, Q3), or n (%). Tests: design-based t-test; Pearson’s χ² test with Rao & Scott adjustment.
PFD: pelvic floor dysfunction; PIR: poverty–income ratio; BMI: body mass index; WC: waist circumference; TyG: triglyceride–glucose index; TyG-WC: triglyceride–glucose–waist circumference index; CRP: C-reactive protein; NLR: neutrophil-to-lymphocyte ratio; SII: systemic immune-inflammation index; HEI-2015: Healthy Eating Index-2015; PFI: Physical Function Index; PHQ-9: Patient Health Questionnaire-9; LS7: Life’s Simple 7; MgDS: Magnesium Depletion Score; SD: standard deviation; Q1: first quartile; Q3: third quartile.
Compared with their PFD-negative counterparts, women in the PFD-positive group were significantly older (mean age: 48.2 vs. 38.7 years, P < 0.001). Regarding reproductive history, the PFD-positive group included a significantly lower proportion of nulliparous women (18.7% vs. 37.2%) and a significantly higher proportion of women who had had only vaginal deliveries (63.3% vs. 43.1%; P < 0.001). Furthermore, the proportion of postmenopausal women was approximately twice as high in the PFD-positive group (41.4% vs. 20.7%, P < 0.001).
In terms of comorbidities, the prevalence of several chronic conditions was significantly higher in the PFD-positive group, including diabetes (9.6% vs. 4.1%), hypertension (31.0% vs. 16.9%), and chronic cough (11.2% vs. 7.4%) (all P < 0.001).
Markers of central obesity and metabolic dysregulation were also more pronounced in the PFD-positive group. Mean body mass index (BMI) (29.8 vs. 27.3 kg/m²), waist circumference (WC) (98.0 vs. 91.4 cm), and the TyG-WC index (an indicator of insulin resistance and central obesity) were all significantly higher than in the PFD-negative group (all P < 0.001). Regarding inflammation-related markers, C-reactive protein (CRP) and serum albumin levels differed significantly between the groups (both P < 0.001), whereas the neutrophil-to-lymphocyte ratio (NLR) and systemic immune-inflammation index (SII) did not.
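For reference, the derived indices mentioned above are usually computed as follows. These are the commonly used definitions (TyG = ln[triglycerides (mg/dL) × fasting glucose (mg/dL) / 2]; TyG-WC = TyG × WC in cm; NLR = neutrophils / lymphocytes; SII = platelets × neutrophils / lymphocytes), assumed here because the study's exact definitions are given in Supplementary Material B. The Python function names and example values are illustrative.

```python
import math


def tyg_index(triglycerides_mg_dl, fasting_glucose_mg_dl):
    """Triglyceride-glucose index: ln(TG x FPG / 2), both in mg/dL."""
    return math.log(triglycerides_mg_dl * fasting_glucose_mg_dl / 2)


def tyg_wc(tyg, waist_cm):
    """TyG-WC: TyG index multiplied by waist circumference (cm)."""
    return tyg * waist_cm


def nlr(neutrophils, lymphocytes):
    """Neutrophil-to-lymphocyte ratio (counts in the same units)."""
    return neutrophils / lymphocytes


def sii(platelets, neutrophils, lymphocytes):
    """Systemic immune-inflammation index: platelets x neutrophils / lymphocytes."""
    return platelets * neutrophils / lymphocytes


# Invented example values (not study data).
tyg = tyg_index(150, 100)  # ln(7500)
print(round(tyg, 2), round(tyg_wc(tyg, 100), 1))  # 8.92 892.3
print(nlr(4.0, 2.0), sii(250, 4.0, 2.0))          # 2.0 500.0
```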
In the domains of functional, lifestyle, and psychosocial factors, the PFD-positive group exhibited poorer physical function (higher PFI scores), a higher prevalence of clinical depressive symptoms (12.0% vs. 5.8%), greater food insecurity (14.2% vs. 11.5%), and a higher proportion of ever-smokers (40.2% vs. 36.1%) (all P < 0.05). The Life’s Simple 7 (LS7) score, representing overall cardiovascular health, was significantly lower in the PFD-positive group (mean: 8.1 vs. 9.4, P < 0.001), while the Magnesium Depletion Score (MgDS) indicated a greater risk of magnesium deficiency in this group (P < 0.001).
Notably, no statistically significant differences were observed between the two groups in terms of the poverty–income ratio (PIR), Healthy Eating Index (HEI-2015), or Dietary Inflammatory Index (DII).
Identification and definition of PFD clinical subtypes
To investigate the heterogeneity within the PFD-positive population, we applied K-means clustering, an unsupervised learning algorithm, to 12 core physiological features. The optimal number of clusters (K) was determined using both the elbow and silhouette methods. The elbow method showed a sharp decrease in the within-cluster sum of squares (WSS) as K increased from 1 to 2, suggesting an inflection point at K = 2 (Supplementary Material C, Figure C1). The average silhouette width peaked at K = 2 (approximately 0.24) and declined with additional clusters, supporting the selection of two clusters, although a width of this magnitude indicates only modest separation between clusters (Supplementary Material C, Figure C2). Importantly, the two-cluster solution was not only statistically supported but also biologically coherent, separating a metabolically compromised subgroup from a metabolically healthy subgroup with traditional risk characteristics. We therefore classified the PFD-positive population into two clinical subtypes, designated Phenotype 1 and Phenotype 2.
Table 2 provides a detailed comparison of baseline characteristics among the healthy control group, Phenotype 1, and Phenotype 2. The results demonstrate that these two subtypes possess strikingly different clinical and biological profiles.
Table 2. Baseline characteristics of healthy controls and the two identified PFD phenotypes.
Values are mean (SD) or n (%). Tests: design-based Kruskal–Wallis test; Pearson’s χ² test with Rao & Scott adjustment.
PFD: pelvic floor dysfunction; PIR: poverty–income ratio; BMI: body mass index; WC: waist circumference; TyG: triglyceride–glucose index; TyG-WC: triglyceride–glucose–waist circumference index; CRP: C-reactive protein; NLR: neutrophil-to-lymphocyte ratio; SII: systemic immune-inflammation index; HEI-2015: Healthy Eating Index-2015; PFI: Physical Function Index; PHQ-9: Patient Health Questionnaire-9; LS7: Life’s Simple 7; MgDS: Magnesium Depletion Score; SD: standard deviation.
The sample size in Table 2 (weighted N = 16,497,385) represents a complete-case subset of the total PFD-positive population described in Table 1 (weighted N = 36,483,067). This discrepancy arises because the K-means clustering algorithm requires complete data across all 12 core input features; participants in the PFD-positive group with any missing value in these clustering variables were therefore excluded to ensure a stable phenotypic classification.
Phenotype 1, designated the “metabolic-inflammatory phenotype,” represented a weighted 6,386,347 women and was distinguished by severe metabolic dysregulation and systemic inflammation. Compared with healthy controls, Phenotype 1 had a markedly higher prevalence of diabetes (20.9% vs. 4.1%) and hypertension (49.9% vs. 16.9%). These women had severe obesity, with a mean BMI of 36.1 kg/m² and a mean WC of 113.5 cm, and their TyG-WC index, an indicator of insulin resistance, was markedly elevated (mean: 1025.40). Regarding inflammation-related markers, this phenotype had significantly elevated CRP (mean: 0.73 mg/dL) and reduced serum albumin (mean: 39.86 g/L) levels. At the functional and psychosocial levels, Phenotype 1 reported poorer physical function (PFI score: 0.78), a much higher prevalence of clinical depressive symptoms (17.2% vs. 5.8%), greater food insecurity (20.2% vs. 11.5%), and markedly poor cardiovascular health (mean LS7 score of only 6.3).
Phenotype 2, in contrast, was designated the “metabolically-healthy phenotype” and represented a weighted 10,111,038 women. Its profile differed considerably from that of Phenotype 1. Despite having PFD, these women had metabolic parameters nearly indistinguishable from those of the healthy control group: a mean BMI of 25.4 kg/m², a mean WC of 88.0 cm, a diabetes prevalence of only 3.7%, and a hypertension prevalence of 20%. Their CRP level (0.30 mg/dL) was also within the normal range. Notably, their cardiovascular health score was slightly higher than that of the healthy controls (mean LS7 score: 9.7 vs. 9.4). However, compared with the healthy control group, these women were older (46.9 vs. 38.7 years) and more often had a history of only vaginal deliveries (64.1% vs. 43.1%), a proportion similar to that of the overall PFD-positive group (63.3%).
Table 2 represents a subset of the participants from Table 1; Supplementary Material B presents a detailed explanation.
Performance of multiclass predictive models
To develop a model that distinguishes among the three clinical states (healthy control, Phenotype 1, and Phenotype 2), we trained and evaluated seven machine learning algorithms. Model performance was assessed on an independent test set and quantified using the macro-average AUC. Figure 2 shows the ROC curves for all models on the test set, with AUC values reported with 95% confidence intervals (CIs) estimated by stratified bootstrap resampling (2000 iterations). All algorithms discriminated well above chance (AUC = 0.5). The neural network model performed best, achieving the highest macro-average AUC of 0.848 (95% CI: 0.830–0.865). Several other models performed nearly as well: XGBoost (AUC = 0.837), logistic regression (AUC = 0.834), and random forest (AUC = 0.831) all showed robust discrimination, followed by the support vector machine (AUC = 0.825) and naive Bayes (AUC = 0.792). The K-nearest neighbors algorithm performed worst for this task (AUC = 0.768). Overall, four of the seven models achieved macro-average AUCs above 0.83, indicating that the healthy controls and the two PFD phenotypes were highly distinguishable.
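The confidence intervals were obtained by stratified bootstrap resampling. The Python sketch below illustrates one way to do this, assuming a percentile interval (the text does not state which interval type was used): each replicate resamples, with replacement, within each true class so that class sizes are preserved, and the macro-average one-vs-rest AUC is recomputed. Function names and toy data are hypothetical, and every class must be present in the test labels.

```python
import random


def binary_auc(scores, is_positive):
    """Mann-Whitney AUC: P(random positive outranks random negative), ties = 1/2."""
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true, prob_rows, classes):
    """Unweighted mean of one-vs-rest AUCs (prob_rows[i][k] scores classes[k])."""
    return sum(
        binary_auc([row[k] for row in prob_rows], [y == c for y in y_true])
        for k, c in enumerate(classes)
    ) / len(classes)


def stratified_bootstrap_ci(y_true, prob_rows, classes, n_boot=2000, alpha=0.05, seed=2025):
    """Percentile CI for the macro-average AUC.

    Each replicate resamples with replacement within each true class, so every
    class keeps its original size and all one-vs-rest AUCs stay defined.
    """
    rng = random.Random(seed)
    members = {c: [i for i, y in enumerate(y_true) if y == c] for c in classes}
    stats = []
    for _ in range(n_boot):
        idx = [rng.choice(m) for m in members.values() for _i in m]
        stats.append(macro_ovr_auc([y_true[i] for i in idx],
                                   [prob_rows[i] for i in idx], classes))
    stats.sort()
    lower = stats[int(n_boot * alpha / 2)]
    upper = stats[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Invented toy predictions (a real test set would be far larger).
classes = ["control", "phenotype_1", "phenotype_2"]
y_true = ["control", "control", "phenotype_1", "phenotype_2"]
probs = [[0.7, 0.2, 0.1], [0.4, 0.1, 0.5], [0.2, 0.6, 0.2], [0.3, 0.25, 0.45]]
print(macro_ovr_auc(y_true, probs, classes))
print(stratified_bootstrap_ci(y_true, probs, classes, n_boot=500))
```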

Figure 2. Comparison of machine learning models based on receiver operating characteristic curves.
Key predictors for differentiating PFD phenotypes
To identify the most important factors distinguishing the two PFD clinical phenotypes, we analyzed the XGBoost model, which performed nearly as well as the best-performing neural network (AUC 0.837 vs. 0.848) and provides built-in feature importance measures. We used the mean XGBoost Gain to quantify the contribution of each feature; the error bars in Figure 3 indicate ±1 standard deviation (SD) across 200 iterations of bootstrap resampling and model refitting, which were used to assess the stability of the feature rankings. WC emerged as the single most dominant predictor, with an importance score of 0.329, nearly three times that of the second-ranked feature, age (0.117). WC thus played a decisive role in phenotype differentiation and was the primary feature distinguishing the metabolic-inflammatory phenotype.
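Stability summaries of this kind can be sketched as follows, assuming (as the magnitude of the reported scores suggests) that Gain was normalized to sum to 1 within each refit. The Python function below takes one dictionary of raw Gain values per bootstrap refit, normalizes each, treats a feature absent from a refit as contributing 0, and reports the mean and SD per feature. The feature names and values are invented for illustration, and fitting the XGBoost models themselves is not shown.

```python
import statistics


def normalize(gains):
    """Rescale raw Gain values so they sum to 1 within one fitted model."""
    total = sum(gains.values())
    return {feature: g / total for feature, g in gains.items()}


def summarize_importance(replicates):
    """Mean and SD of normalized Gain across bootstrap refits.

    `replicates` holds one {feature: raw_gain} dict per refit; a feature never
    used for a split in a refit contributes 0 there. Needs at least 2 refits.
    Returns [(feature, (mean, sd)), ...] sorted by decreasing mean.
    """
    features = sorted({f for rep in replicates for f in rep})
    normed = [normalize(rep) for rep in replicates]
    summary = {}
    for f in features:
        values = [n.get(f, 0.0) for n in normed]
        summary[f] = (statistics.fmean(values), statistics.stdev(values))
    return sorted(summary.items(), key=lambda item: item[1][0], reverse=True)


# Invented raw Gain values from three hypothetical refits.
replicates = [
    {"waist_circumference": 6.0, "age": 2.0, "tyg_index": 2.0},
    {"waist_circumference": 5.0, "age": 3.0, "tyg_index": 2.0},
    {"waist_circumference": 7.0, "age": 3.0},
]
for feature, (mean, sd) in summarize_importance(replicates):
    print(f"{feature}: {mean:.3f} ± {sd:.3f}")
```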

Figure 3. Feature importance of the XGBoost model (Gain).
Beyond WC, the other key predictors fell into three main domains: (1) demographic and core metabolic metrics, with age (importance score: 0.117) ranking second, closely followed by the TyG index (0.099) and BMI (0.097); (2) systemic health and psychological status, with the depression score (PHQ-9, 0.064) also contributing substantially; and (3) biological markers and reproductive history, with the remaining top 10 factors comprising lipid profile markers (triglycerides and high-density lipoprotein cholesterol), serum albumin (reflecting nutritional and inflammatory status), parity (a classic obstetric risk factor), and hemoglobin level (an indicator of overall health).
In summary, the feature importance analysis indicates that metabolic dysregulation anchored in central obesity was the primary factor differentiating the two PFD phenotypes, followed by age and composite indicators of systemic health and psychological status.
Sensitivity analysis: predictive importance of parity in nested models
To investigate the role of parity, a traditional risk factor, across predictive models of varying complexity, we conducted a sensitivity analysis. We constructed three nested models: (a) a “basic clinical model” with 5 core variables; (b) a “metabolic model” incorporating 10 variables; and (c) a “systemic health model” with 13 variables. Detailed results are available in Supplementary Material D.
As shown in Figure 4, the relative importance of parity decreased systematically as model complexity increased. In the basic clinical model, which included only BMI, age, parity, race/ethnicity, and education, parity ranked third after BMI and age, with an importance score of 8.81 (on a scale on which BMI scored 100 and age 47.24). When metabolic indicators were added (10 features in total), parity dropped to 5th; in the most comprehensive model, which added systemic health indicators (13 features in total), it declined further to 6th. Thus, while parity is a meaningful predictor in a simple clinical assessment, a substantial portion of its predictive value is absorbed by more comprehensive metabolic and systemic health metrics, suggesting that the association of parity with PFD may be partly explained by shared associations with these indicators.
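The rescaling used to report these scores (top feature set to 100) and the resulting ranks can be illustrated with a short Python sketch. The raw importance values below are invented and do not reproduce the study's models; the sketch only shows the transformation and how a feature's rank is read off.

```python
def scale_to_max_100(importance):
    """Rescale importance scores so the top feature scores 100."""
    top = max(importance.values())
    return {f: 100.0 * v / top for f, v in importance.items()}


def rank_of(importance, feature):
    """1-based rank of `feature` when sorted by decreasing importance."""
    order = sorted(importance, key=importance.get, reverse=True)
    return order.index(feature) + 1


# Invented raw importances for a hypothetical five-feature model.
raw = {"bmi": 0.40, "age": 0.20, "parity": 0.05, "race_ethnicity": 0.03, "education": 0.02}
scaled = scale_to_max_100(raw)
print({f: round(v, 2) for f, v in scaled.items()})
print(rank_of(raw, "parity"))  # 3
```

Because the rescaling divides every score by the same constant, it leaves the ranking unchanged; a feature's rank can only shift when other features are added to, or removed from, the model.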

Figure 4. Feature importance shifts across nested models.
Discussion
Principal findings
Using a nationally representative sample and an unsupervised machine learning approach, this study is among the first to suggest that female PFD may not be a single clinical entity but may comprise at least two clinical phenotypes with distinct biological profiles. This finding extends the conventional understanding of PFD and provides a potential framework for precision medicine in this field. We identified a sizable metabolic-inflammatory phenotype, characterized by central obesity and systemic metabolic dysregulation, alongside a metabolically-healthy phenotype more closely aligned with traditional factors such as aging and parity-related changes.
These findings align with emerging clinical evidence suggesting that PFD is a complex, multifactorial condition. As demonstrated in a recent descriptive analysis of a large surgical cohort (n = 832), the onset of PFD is not solely a consequence of local mechanical or surgical insults but may also be associated with systemic and metabolic health profiles,26 underscoring the importance of moving beyond localized anatomical perspectives toward a more integrative, system-based understanding of the disease. Notably, although we explored biological phenotyping, certain PFD components, particularly POP and FI, had relatively low prevalence in the NHANES cycles analyzed, limiting the statistical power to generate stable, phenotype-specific symptom profiles. We therefore focused our analysis on the underlying biological and metabolic heterogeneity.
The metabolic-inflammatory phenotype: PFD as a condition potentially linked to systemic metabolic dysfunction
The core finding of this study is the identification of Phenotype 1, the “metabolic-inflammatory phenotype.” Patients in this group exhibited a pronounced metabolic syndrome profile, including markedly elevated BMI and WC, a diabetes prevalence exceeding 20%, and a hypertension prevalence of nearly 50%. Feature importance analysis further indicated that WC was the most influential variable differentiating the phenotypes, with a contribution substantially exceeding that of other predictors. These results suggest that for many women, PFD may not be solely an isolated structural condition but may also be associated with systemic metabolic dysfunction, potentially reflecting overlapping pathophysiological mechanisms that warrant longitudinal investigation. This perspective aligns with the emerging concepts of “meta-inflammation” and “inflammaging,” which describe chronic low-grade inflammation (associated with obesity and with aging, respectively) as a systemic process affecting the vasculature, neural integrity, and connective tissue biology. As a region subject to significant mechanical stress and structural complexity, the pelvic floor may be particularly susceptible to such systemic pathological states.9,11,27–29
Not all inflammatory indices demonstrated consistent differentiation between phenotypes. Although CRP and serum albumin supported the characterization of a metabolically compromised subgroup, other markers such as NLR and SII exhibited limited separation. Therefore, the inflammatory component should be interpreted as indirect and heterogeneous, rather than as definitive evidence of systemic inflammatory activation.
Our validation analysis in the periodontitis subcohort provides complementary, albeit indirect, support for the inflammatory hypothesis. We found that the prevalence of periodontitis was significantly higher in the PFD-positive group. Although periodontitis serves as a pragmatic, population-level proxy for systemic chronic inflammation in the NHANES database, we acknowledge it is an indirect marker. The absence of direct inflammatory cytokines, such as interleukin (IL)-6 or tumor necrosis factor (TNF)-α in the analyzed cycles, remains a limitation for deeper mechanistic exploration. Conversely, the lack of a significant association between PFD and grip strength suggests that the observed heterogeneity may be more consistent with metabolic-inflammatory pathways than with a generalized sarcopenia/frailty pathway. However, this exploratory analysis should be interpreted cautiously, as residual confounding from factors such as total muscle mass or specific exercise habits could not be fully adjusted due to data limitations. Future longitudinal studies incorporating more granular biomarkers and functional assessments are needed to clarify these biological associations.
The metabolically-healthy phenotype: a reaffirmation of traditional etiology
In contrast to Phenotype 1, Phenotype 2 (the “metabolically-healthy phenotype”) had a metabolic profile nearly indistinguishable from that of the healthy controls. These women, however, were older and more often had a history of vaginal delivery. The existence of this phenotype supports the concept that age-related tissue degeneration and delivery-associated mechanical injury represent a distinct, traditional pathway to PFD. A strength of our study is its ability to distinguish this “classic” pathway from the newly identified metabolic-inflammatory pathway through a data-driven approach, suggesting that multiple biological trajectories may lead to PFD.
Revisiting the role of parity in the context of metabolic and systemic health
Childbirth is widely recognized as one of the most significant initiating factors for PFD.30–32 However, our sensitivity analysis revealed a more complex picture. Although parity ranked as the third most important predictor in a basic clinical model, its relative importance diminished substantially when metabolic and systemic health indicators were included. This finding suggests that the association between parity and PFD may be partly accounted for by long-term shifts in a woman’s metabolic health and systemic inflammatory state, rather than solely by direct mechanical injury from delivery. In other words, childbirth may be associated with long-term metabolic changes in some women, which could in turn contribute to the progression of PFD. Although the causal nature and directionality of these relationships require confirmation in longitudinal studies, this integrated perspective offers a new angle for clinical practice: attention could extend to holistic, long-term metabolic health management for postpartum women, rather than focusing exclusively on parity as a fixed risk factor.
Clinical implications and future directions
These findings have important implications for clinical practice and future research, supporting a shift from a “one-size-fits-all” treatment model toward phenotype-specific precision strategies. For patients with the “metabolic-inflammatory phenotype,” management may extend beyond pelvic floor physical therapy or surgery to include weight optimization, management of insulin resistance, and broader cardiometabolic risk control. Importantly, although these phenotypes were identified using unsupervised machine learning, identifying them in practice may not require specialized tests or advanced computational infrastructure. Key discriminative features, such as waist circumference, age, and routine metabolic indicators (e.g. triglycerides and fasting glucose), are commonly available in primary care. Preliminary phenotype stratification could therefore be achieved with simple rule-based or score-based approaches, potentially making precision urogynecology accessible at the point of care within a multidisciplinary, “one-stop” clinical hub without complex machine learning implementation.
Future studies should validate the longitudinal stability of these phenotypes and examine potential differences in treatment response. At present, the identified phenotypes should be regarded as a conceptual framework rather than a formal diagnostic classification. A critical unanswered question is whether PFD symptom profiles (e.g. the prevalence and severity of SUI, UUI, and POP) differ between the phenotypes; answering it will be key to refining phenotype-specific treatment strategies. Identifying specific biomarkers associated with each phenotype may further refine precision diagnostics and therapeutic development. Such progress will help bridge research findings and integrated clinical practice.8
Strengths and limitations
The primary strengths of this study are its foundation in the large, nationally representative NHANES database and its use of unsupervised machine learning to characterize PFD heterogeneity in a data-driven, objective manner. The analytic workflow also incorporated model validation, feature interpretation, and multiple sensitivity and validation analyses.
However, several limitations must be acknowledged. First, the cross-sectional design of NHANES precludes causal inference; the identified phenotypes should therefore be interpreted as observed associations. Second, the relatively low prevalence of certain PFD components, such as POP and FI, limited our statistical power to generate detailed, phenotype-specific symptom profiles. Third, several variables (e.g. depressive symptoms and delivery history) relied on self-report and may be subject to recall bias, potentially affecting the accuracy of phenotypic classification. Fourth, direct inflammatory biomarkers (e.g. IL-6 or TNF-α) were not available in the analyzed cycles, and objective validation measures (e.g. periodontitis and grip strength) were restricted to smaller sub-cohorts. Fifth, as noted, the clustering analysis was conducted on a complete-case subset of the PFD-positive population. This approach may introduce selection bias if data were not missing completely at random, particularly if individuals with incomplete metabolic or inflammatory profiles differed systematically from those included. Although multiple imputation was applied in downstream analyses, clustering stability and phenotype representation may have been influenced by the exclusion of participants with missing core features. Finally, the study population was drawn from the US NHANES cohort, which reflects a specific demographic composition (predominantly non-Hispanic White), lifestyle patterns, and healthcare access characteristics. For instance, the relatively high prevalence of obesity and metabolic syndrome in the US may lead to over-representation of the “metabolic-inflammatory phenotype” compared with regions with lower BMI distributions, such as parts of East Asia. Furthermore, cross-country differences in parity patterns and healthcare delivery systems (e.g. the availability of routine postpartum pelvic floor rehabilitation) may influence the distribution and clinical expression of PFD phenotypes. Therefore, these findings should be generalized to non-US populations with caution and warrant validation in diverse international cohorts.
Conclusion
In conclusion, this data-driven study identifies two distinct clinical phenotypes of female PFD: a metabolic-inflammatory phenotype, closely associated with central obesity and metabolic dysregulation, and a metabolically-healthy phenotype, more closely aligned with the traditional factors of aging and parity-related changes. These findings provide a more nuanced perspective on the heterogeneity of PFD pathophysiology and open new possibilities for personalized prevention and treatment strategies, marking an important step toward precision medicine in urogynecology.
Footnotes
Acknowledgements
The authors express their gratitude to the National Center for Health Statistics at the Centers for Disease Control and Prevention for the planning, collection, and maintenance of the NHANES database. We would also like to thank the researchers and study participants who made this publicly available dataset possible. The interpretation and conclusions contained herein are those of the authors and do not necessarily represent the views of the NCHS or the CDC.
Authors’ contributions
Jingming Yang, Duo Zhao, and Lan Wang designed research; Jingming Yang and Duo Zhao analyzed data; and Jingming Yang and Lan Wang wrote the paper. Lan Wang had primary responsibility for final content. All authors read and approved the final manuscript.
Data availability statement
The data described in the manuscript were obtained from the National Health and Nutrition Examination Survey (NHANES), a publicly available database. The data, code book, and analytic code are freely and publicly available without restriction at the Centers for Disease Control and Prevention website (
).
Declaration of conflicting interest
The author(s) declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical considerations
This study was a secondary analysis of data from the National Health and Nutrition Examination Survey (NHANES). The NHANES protocol was approved by the research ethics review board of the National Center for Health Statistics (NCHS), and all participants provided written informed consent. As the data used in this study are publicly available and de-identified, our institution’s Institutional Review Board (IRB) deemed this study exempt from review.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
