Abstract
This study systematically evaluated the quality of domestic and international clinical practice guidelines for male infertility using the Appraisal of Guidelines for Research & Evaluation II (AGREE II) tool, identified methodological shortcomings, and proposed evidence-based recommendations for improvement. A systematic search of Chinese and English databases, along with official websites of international organizations, was undertaken to identify relevant guidelines. Four independent reviewers assessed the included guidelines employing the AGREE II tool, focusing on the rigor of development processes and the concordance and divergence among core recommendations. Ten guidelines were included, with only one rated “recommendable” using AGREE II criteria; others were “conditionally recommendable” or “not recommendable.” Key shortcomings included deficient rigor of development (Domain III: 34.4%), applicability (Domain V: 48.3%), and editorial independence (Domain VI: 23.5%). Eighteen core recommendations were identified. Domestic guidelines lacked transparent conflict of interest disclosures and multidisciplinary collaboration, whereas international guidelines demonstrated superior methodological rigor through interprofessional integration. Most guidelines failed to validate clinical impacts of recommendations, hindering practical implementation. This study represents the first systematic evaluation of male infertility guidelines at national and international levels, with comparative analysis of their recommendations. Findings reveal widespread deficiencies in methodological rigor, applicability, and editorial independence. Future guideline development should adopt standardized frameworks, enhance multidisciplinary collaboration, ensure editorial independence, and integrate evidence-based medicine to improve quality and clinical utility.
Introduction
Infertility affects approximately 15% of couples of reproductive age worldwide, and male factors contribute to nearly 50% of cases (Inhorn & Patrizio, 2015). Although various associations and organizations have developed guidelines offering diagnostic and therapeutic strategies for the management of male infertility, considerable heterogeneity persists across these guidelines. Notably, there is a lack of consensus regarding thresholds for varicocele surgery, the application of antioxidant therapies, and the management of idiopathic infertility, leading to inconsistencies that may introduce uncertainty in clinical decision-making and compromise patient care (Colpi et al., 2018; Minhas et al., 2021).
At present, international authorities such as the European Association of Urology (EAU) and the American Society for Reproductive Medicine (ASRM) have issued multiple clinical practice guidelines on male infertility. However, owing to variations in organizational focus, membership composition, and target populations, substantial disparities exist in the methodological rigor and quality of these guidelines. Previous studies have highlighted several limitations in guideline development processes (Chen et al., 2017), including delayed incorporation of emerging evidence, inadequate disclosure of conflicts of interest, and insufficient integration of patient perspectives, all of which may undermine both the scientific validity and clinical applicability of the resulting recommendations.
In this context, the present study employed the internationally recognized AGREE II tool (Supplemental Material 1) to systematically evaluate the quality and recommendation concordance of global clinical practice guidelines on male infertility. By identifying methodological and content-related deficiencies, this analysis aims to propose feasible optimization strategies and to provide an evidence-based foundation for the future development and revision of clinical practice guidelines in this field.
Materials and Methods
To assess guideline quality and concordance, we conducted a comprehensive guideline review comprising four steps: a systematic guideline search, formulation and application of selection criteria, assessment of guideline quality, and compilation and analysis of results.
Literature Search Strategy
This study was conducted following a systematic evaluation framework based on the AGREE II instrument. A comprehensive search was performed across multiple databases, including China National Knowledge Infrastructure (CNKI), VIP Journal Database (VIP Database), Wanfang Med Online (Wanfang Data), PubMed, and Web of Science, as well as the official websites of major organizations such as the World Health Organization (WHO) and the National Institute for Health and Care Excellence (NICE). The search encompassed all records from database inception to April 2025. A combination of Medical Subject Headings (MeSH) and free-text terms was employed, using Boolean logic operators to construct a highly sensitive search strategy. Key terms included “male infertility,” “guidelines,” and their equivalents (Supplemental Material 2).
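The way synonyms and concepts are combined with Boolean operators can be sketched as follows. This is only an illustration of the general pattern (OR within a concept, AND between concepts); the specific terms and PubMed field tags below are assumptions chosen for the example and are not the authors' actual strategy, which is reported in Supplemental Material 2.

```python
from typing import Iterable, List


def or_block(terms: Iterable[str]) -> str:
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(concepts: List[List[str]]) -> str:
    """Combine concept blocks with AND so that every concept must be matched."""
    return " AND ".join(or_block(terms) for terms in concepts)


# Illustrative terms only (assumption); not taken from the study's supplement.
population_terms = [
    '"Infertility, Male"[MeSH Terms]',
    '"male infertility"[Title/Abstract]',
    '"male subfertility"[Title/Abstract]',
]
document_terms = [
    '"Practice Guideline"[Publication Type]',
    'guideline*[Title/Abstract]',
]

query = build_query([population_terms, document_terms])
print(query)
```

Adding synonyms inside a block widens the search (higher sensitivity), whereas adding a further concept block narrows it.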
Inclusion and Exclusion Criteria
Inclusion criteria were as follows: (1) the document was a published guideline; (2) the language was Chinese or English; (3) the target population was patients with a clear diagnosis of male infertility; and (4) the document was the latest version when a guideline series had multiple editions.
Exclusion criteria were as follows: (1) literature for which the full text was not available; (2) consensus statements, normative documents, and expert recommendation articles; (3) translations and interpretations of guidelines, expert commentary, and other literature; (4) guidelines on laboratory or imaging tests for male infertility; and (5) duplicate publications.
Literature Screening and Data Extraction
The retrieved records were managed using NoteExpress V3.7. Two independent investigators screened titles, abstracts, and full texts according to predefined inclusion and exclusion criteria. Discrepancies were resolved through group discussion or adjudication by a third investigator. Following screening, four trained reviewers independently assessed the methodological quality of the included guidelines using the AGREE II tool, with unrestricted access to official training resources provided on the AGREE II website.
Quality Assessment
AGREE II is the most widely validated instrument for quantitatively evaluating the methodological rigor and transparency of clinical practice guideline development. It encompasses six domains comprising 23 items, each rated on a 7-point scale (1 = strongly disagree, 7 = strongly agree). Domain scores were standardized using the following formula: Standardized Score = (Obtained Score − Minimum Possible Score) / (Maximum Possible Score − Minimum Possible Score) × 100%. In addition, AGREE II includes two overall assessments: (1) an overall quality rating of the guideline on a 7-point scale, and (2) a recommendation for clinical use categorized as “recommended,” “recommended with modifications,” or “not recommended” (Brouwers et al., 2010).
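The standardization formula can be made concrete with a short sketch. Following the AGREE II scoring convention, the obtained score is the sum of all item ratings from all reviewers within a domain, and the minimum and maximum possible scores are 1 and 7 multiplied by the number of items and the number of reviewers. The function name and the ratings in the example are hypothetical and are not data from this study.

```python
from typing import Sequence


def standardized_domain_score(ratings: Sequence[Sequence[int]]) -> float:
    """Return the AGREE II standardized score (0-100) for one domain.

    ratings[r][i] is the rating (1-7) given by reviewer r to item i.
    """
    n_reviewers = len(ratings)
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every reviewer must rate the same number of items")
    if any(not 1 <= score <= 7 for row in ratings for score in row):
        raise ValueError("AGREE II item ratings must lie between 1 and 7")

    obtained = sum(sum(row) for row in ratings)
    minimum = 1 * n_items * n_reviewers  # every item rated 1 by every reviewer
    maximum = 7 * n_items * n_reviewers  # every item rated 7 by every reviewer
    return (obtained - minimum) / (maximum - minimum) * 100


# Hypothetical example: four reviewers rating a three-item domain.
example_ratings = [
    [5, 6, 6],
    [6, 6, 7],
    [2, 4, 3],
    [3, 3, 2],
]
print(f"{standardized_domain_score(example_ratings):.1f}%")  # obtained = 53 -> 56.9%
```

Because the score is scaled between the lowest and highest attainable totals, a domain in which every reviewer gives every item a 1 scores 0%, and one in which every item receives a 7 scores 100%.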
In accordance with AGREE II guidelines, all evaluations were conducted by four fixed reviewers who had undergone standardized training. Prior to formal assessment, three guidelines were pre-evaluated to calibrate scoring consistency. Inter-rater reliability was assessed using the intraclass correlation coefficient (ICC) calculated via SPSS 26.0 software; an ICC ≥ 0.75 across all domains was required to proceed to formal evaluation. Where discrepancies exceeding one point between reviewers were observed, resolution was achieved through consensus discussions or consultation with external evidence-based medicine experts.
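As an illustration of the calibration check, the sketch below computes an intraclass correlation coefficient from a ratings matrix and compares it with the 0.75 threshold. The text does not state which ICC model was selected in SPSS; the two-way random-effects, absolute-agreement, single-rater form, often written ICC(2,1), is used here purely as an assumption. The function name and the pilot ratings are hypothetical.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores[t][r] is the rating given to target t (e.g., one AGREE II item of
    one pilot guideline) by reviewer r.
    """
    n = len(scores)      # number of rated targets (rows)
    k = len(scores[0])   # number of reviewers (columns)
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA decomposition without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical pilot ratings: five items (rows) scored by four reviewers (columns).
pilot_ratings = [
    [6, 6, 5, 6],
    [3, 4, 3, 3],
    [7, 6, 7, 7],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
]
icc = icc_2_1(pilot_ratings)
print(f"ICC(2,1) = {icc:.2f}; proceed to formal evaluation: {icc >= 0.75}")
```

Other ICC forms (for example, consistency rather than absolute agreement, or average-measure rather than single-measure) give different values for the same data, so the model choice should be reported alongside the coefficient.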
It should be noted that the AGREE II assessment primarily evaluates the rigor of the guideline development process rather than the validity of the clinical recommendations themselves. For comparative purposes, guidelines were categorized based on their overall quality into three levels following established criteria: high-quality recommendation, ≥5 domains scoring above 60%; fair-quality recommendation, 3 to 4 domains scoring above 60%; and low-quality recommendation, 1 to 2 domains scoring above 60% (Klein Haneveld et al., 2024; Messina et al., 2017; Sakalis et al., 2024).
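The three-level categorization can be expressed as a simple counting rule, sketched below. The criteria as stated do not cover a guideline with no domain above 60%; treating that case as low quality is an assumption made here. The function name and the example scores are hypothetical and are not values from Table 2.

```python
from typing import Mapping


def classify_guideline(domain_scores: Mapping[str, float],
                       threshold: float = 60.0) -> str:
    """Classify a guideline by the number of domains scoring above the threshold."""
    n_above = sum(1 for score in domain_scores.values() if score > threshold)
    if n_above >= 5:
        return "high-quality"
    if n_above >= 3:
        return "fair-quality"
    # 1 to 2 domains above the threshold; 0 domains is also treated as
    # low quality here (assumption, not specified in the stated criteria).
    return "low-quality"


# Hypothetical standardized domain scores (%), for illustration only.
example_scores = {
    "Scope and Purpose": 62.0,
    "Stakeholder Involvement": 90.0,
    "Rigor of Development": 80.0,
    "Clarity of Presentation": 70.0,
    "Applicability": 45.0,
    "Editorial Independence": 30.0,
}
print(classify_guideline(example_scores))  # four domains above 60% -> fair-quality
```

Note that the comparison is strict ("above 60%"), so a domain scoring exactly 60% does not count toward the total.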
Results
Guideline Screening
A total of 2,511 records were retrieved in this study. After title and abstract screening, 2,368 irrelevant or duplicate records were excluded, resulting in 56 articles for full-text assessment. Following full-text review, 46 articles were excluded for the following reasons: 18 were nonguideline documents, 13 were duplicates, seven were outdated versions of guidelines, six had no available full text, and two were comprehensive guidelines not specifically dedicated to male infertility. Ultimately, 10 clinical practice guidelines were included for analysis (Figure 1), comprising five Chinese-language guidelines (M. J. Zhang et al., 2015; J. W. Zhang et al., 2023; Clinical Application Guidelines of Proprietary Chinese Medicines for the Treatment of Dominant Diseases of the Standardized Project Group, 2022; Guidelines for Clinical Diagnosis and Treatment of Varicocele Infertility in Chinese Medicine, 2021; H. Li et al., 2022) and five English-language guidelines (Brannigan et al., 2024; Colpi et al., 2018; Dohle et al., 2005; Flannigan et al., 2023; Minhas et al., 2021) (Table 1).

Figure 1. Diagram of the Search Strategy
Table 1. Characteristics of the Studies.
Note. CIATCM: China Information Association of Traditional Chinese Medicine; CMA: Chinese Medical Association; CAIM: Chinese Association of Integrative Medicine; GPATCM: Guangdong Provincial Association of Chinese Medicine; NATCM: National Administration of Traditional Chinese Medicine; CUA: Canadian Urological Association; EAA: European Academy of Andrology; EAU: European Association of Urology; AUA/ASRM: American Urological Association/American Society for Reproductive Medicine.
AGREE-II Quality Assessment Results
The methodological quality of clinical practice guidelines for male infertility management was generally low. Only one guideline achieved a fair-quality rating; the rest were rated low quality. Scores across the six AGREE II domains varied substantially. “Clarity of presentation” scored highest (mean 65.5%), whereas “editorial independence” scored lowest (mean 23.5%) (Table 2).
Table 2. Summary of Average Reviewer AGREE II Scores for All Guidelines Assessed.
The standardized mean score for Domain I (Scope and Purpose) was 54.8%. Only the CUA and AUA/ASRM guidelines scored above 60%, with the remaining guidelines falling between 40% and 60%. All guidelines clearly defined the overall objectives; however, they exhibited notable shortcomings in specifying health issues and target populations.
For Domain II (Stakeholder Involvement), the average standardized score was 44.9%, with the NATCM guideline achieving the highest score (90%). Most guidelines, with the exception of the EAU 2005 version, listed the members of the development team and institutional affiliations. However, few guidelines incorporated patient preferences or values, and none provided a detailed definition of the target population.
Domain III (Rigor of Development) had a low mean score (34.4%). NATCM scored highest (80%); others ranged 10% to 50%. Most guidelines relied on limited evidence or expert opinion, lowering scores. NATCM uniquely used a modified Delphi method/voting for evidence-based consensus with a clear process. Five guidelines reported systematic searches/evaluations (Brannigan et al., 2024; Colpi et al., 2018; Minhas et al., 2021; M. J. Zhang et al., 2015), but details (especially evidence-based recommendations) were incomplete. All except CAIM involved multiple stakeholders, yet none specified disagreement resolution methods, and supporting evidence was often unclear. Four guidelines mentioned external review (Brannigan et al., 2024; Colpi et al., 2018; Flannigan et al., 2023; Minhas et al., 2021), with two listing reviewers (Brannigan et al., 2024; Chen et al., 2017), but not findings. Five reported updates (Chen et al., 2017; Dohle et al., 2005; Flannigan et al., 2023; H. Li et al., 2022; M. J. Zhang et al., 2015), but four lacked update plans (Chen et al., 2017; Dohle et al., 2005; H. Li et al., 2022; M. J. Zhang et al., 2015).
Domain IV (Clarity of Presentation) had the highest standardized mean score at 65.5%. The majority of guidelines presented key recommendations clearly, with eight guidelines making them easily identifiable through the use of subheadings, tabular summaries, or standalone statements for key recommendations.
Domain V (Applicability) had the second-lowest standardized mean score at 26.9%. Only a few guidelines (AUA/ASRM, 2024; CUA, 2023; EAA, 2018; EAU, 2021; NATCM, 2021) addressed implementation challenges and resource requirements (Brannigan et al., 2024; Chen et al., 2017; Dohle et al., 2005; Flannigan et al., 2023; Minhas et al., 2021), and only the EAU 2021 guideline provided a quick reference manual.
Finally, Domain VI (Editorial Independence) had the lowest score among all domains, with a standardized mean of 23.5%. Four guidelines reported financial support (Brannigan et al., 2024; Chen et al., 2017; Flannigan et al., 2023; H. Li et al., 2022), but only EAU 2021 specified the potential impact of funding on content. In addition, only four guidelines disclosed their conflict of interest management practices (Brannigan et al., 2024; Chen et al., 2017; Dohle et al., 2005; Flannigan et al., 2023).
In terms of overall recommendations, three guidelines were deemed suitable for direct recommendation, five were recommended with modifications, and two were not recommended for use (Figure 2).

Figure 2. Chart Demonstrating Recommendations for Use for Each Guideline
The quality of guidelines developed by different regions or organizations varied considerably. Chinese societies and organizations published the largest number of guidelines, but all, except for NATCM 2021, received low scores. Among international guidelines, the EAU 2021 and CUA 2023 guidelines achieved high scores. Cross-sectional comparisons revealed that Chinese guidelines lagged behind their international counterparts in terms of methodological quality, particularly in the domains of “applicability” and “editorial independence” (Figure 3).

Figure 3. Radar Chart of Guideline Scores for Different Countries or Regions
Results of Core Recommendations
A total of 91 primary recommendations were extracted from the 10 guidelines. Following an expert consensus meeting and a comprehensive multifactorial assessment, 18 core recommendations were ultimately selected. These were categorized into six groups based on their content: diagnostic assessment, genetic counseling and risk prevention, assisted reproduction, surgical intervention, lifestyle modifications, and drug contraindications (Figure 4, Supplementary Material 4).

Specific Recommendations
The diagnostic evaluation encompassed three main components. Semen analysis standardization (strong recommendation): two semen analyses are required to confirm the diagnosis of oligozoospermia or oligoasthenoteratozoospermia (OAT), with an interval of at least 3 months, in strict adherence to WHO guidelines. Endocrine testing (strong recommendation): all OAT patients should undergo testing for follicle-stimulating hormone (FSH), luteinizing hormone (LH), and testosterone levels, with prolactin (PRL) added for those suspected of having hypogonadotropic hypogonadism (EAA 2018). Initial genetic screening (strong recommendation): karyotyping and Y chromosome microdeletion testing are mandatory for sperm concentrations ≤5 × 10⁶/mL (EAA 2018/EAU 2021), and patients with congenital bilateral absence of the vas deferens (CBAVD) must undergo screening for CFTR gene mutations, including the 5T allele (EAU 2021).
Genetic counseling and risk prevention focused primarily on two points. Mandatory genetic counseling (strong recommendation): patients with CFTR mutations or AZF microdeletions should receive genetic counseling to understand the genetic risks for their offspring (EAU 2021). Fertility preservation and pre-implantation genetic testing (PGT) are recommended for individuals with chromosomal abnormalities, such as Klinefelter syndrome (AUA/ASRM 2024). Tumor risk surveillance (weak recommendation): scrotal ultrasound screening is advised for infertile men to assess the risk of testicular carcinoma in situ (EAU 2021).
In the field of assisted reproductive technology (ART), the guidelines emphasized specific indications (strong recommendations) and diverged on one contested test. In vitro fertilization/intracytoplasmic sperm injection (IVF/ICSI) is strongly recommended for couples with refractory OAT (EAA 2018), whereas microdissection testicular sperm extraction (micro-TESE) combined with ICSI is required for non-obstructive azoospermia (NOA), with priority given to cryopreservation of surgically retrieved sperm (AUA/ASRM 2024). The guidelines varied in the strength of recommendation for sperm DNA fragmentation (SDF) index testing: the EAA recommends it only in cases of failed ART or recurrent miscarriage, whereas EAU 2021 strongly advocates routine testing.
Regarding surgical interventions, the guidelines emphasize the preference for microsurgical techniques (strong recommendation), such as microsurgical vasovasostomy or epididymovasostomy for obstructive azoospermia (OA), especially when female ovarian reserve is adequate (EAU 2021). Varicocele repair is recommended for patients with abnormal semen parameters, provided other etiologies have been excluded (conditional recommendation). Furthermore, testicular sperm retrieval is strongly discouraged in individuals with complete AZFa or AZFb deletions (EAU 2021).
The guidelines also address the role of lifestyle interventions, although the strength of evidence remains limited. For instance, EAA 2018 and AUA/ASRM 2024 recommend that OAT patients quit smoking, maintain a body mass index (BMI) <30, limit alcohol intake (strong recommendation), and avoid heat exposure (conditional recommendation) and sedentary behavior (weak recommendation).
Important concerns are raised regarding drug contraindications and restrictions. For instance, multiple guidelines (EAA 2018/EAU 2021/AUA/ASRM 2024) strongly recommend against the use of exogenous testosterone for male infertility treatment because it suppresses spermatogenesis (strong recommendation). The EAA 2018 guideline also advises against using tamoxifen, aromatase inhibitors, or gonadotropins for male infertility unless a clear diagnosis of hypogonadotropic hypogonadism is confirmed. In addition, high-quality evidence supporting the routine use of antioxidants such as vitamin E and coenzyme Q10 for male infertility treatment remains lacking, leading to a weak recommendation in this area.
Discussion
This study represents the first systematic application of the AGREE II tool to evaluate the quality of global clinical practice guidelines on male infertility. The findings highlight several significant deficiencies in the development of these guidelines, particularly in terms of methodological rigor, applicability, and editorial independence. In addition, this study sheds light on the underlying mechanisms contributing to the quality discrepancies observed across guidelines from different regions. The results not only provide actionable strategies for improving guideline quality but also offer valuable insights into how evidence-based medicine can be effectively translated into clinical practice within the field of reproductive health.
Analysis of Overall Quality
Among the 10 guidelines included in this study, only the NATCM 2021 guideline met the criterion for a fair-quality rating (scoring above 60% in at least three domains). A notable variation in quality was observed across the different domains of assessment. Domain IV (Clarity of Presentation) received the highest average score (65.5%), whereas Domain III (Rigor of Development) (34.4%), Domain V (Applicability) (26.9%), and Domain VI (Editorial Independence) (23.5%) scored the lowest. These results suggest that while considerable attention was given to the presentation of the guidelines, critical aspects such as evidence integration and the feasibility of implementation were insufficiently addressed. Notably, the low score in Domain III (Rigor of Development) underscores a systematic shortcoming in the construction of evidence, which compromises the guidelines’ scientific foundation. Furthermore, the very low score in Domain VI (Editorial Independence) points to a substantial deficiency in the management of conflicts of interest, which raises concerns regarding the impartiality and transparency of the guideline development process.
Analysis of Key Quality Deficits
The quality gap observed in male infertility guidelines stems from several fundamental structural deficiencies. This clinical guideline quality assessment revealed the most significant systematic flaws within Domain III (Rigor of Development), Domain V (Applicability), and Domain VI (Editorial Independence).
Regarding methodological rigor, the majority of the guidelines lacked systematic and transparent processes for evidence construction. Only a few guidelines, such as those from NATCM, CUA, and EAU, provided detailed information on their database search strategies, covering mainstream databases like EMBASE and Cochrane. Alarmingly, 30% of the guidelines did not specify the timeframe for their literature search, which risks outdated evidence and selection bias. Furthermore, there was notable inconsistency in the evidence assessment methods used across different guidelines. International guidelines generally relied on randomized controlled trials (RCTs) to support level A/B evidence, whereas some traditional Chinese medicine (TCM) guidelines primarily relied on expert consensus to provide level C evidence. Only 40% of the guidelines mentioned the use of the GRADE grading system, and external review was largely a formality in most guidelines. Although some claimed to have a peer-review process, only the EAU 2021 guideline published the opinions of external reviewers and the process through which their feedback was incorporated. Moreover, it remained unclear whether the external reviewers had expertise in the field of reproductive medicine, which undermines the scientific rigor and authority of the review process.
In terms of applicability, most guidelines lacked the necessary tools and clinical pathways for effective implementation, rendering the recommendations difficult to apply in practice. Only the EAU 2021 guideline offered an online SDF index calculation tool for clinical decision support. In addition, factors related to resource allocation were often overlooked. For instance, the selected guidelines did not assess the cost burden of diagnostic tests or the accessibility of health insurance, especially in the context of high-cost procedures such as CFTR genetic testing, which further disconnects the guidelines from real-world feasibility. This discrepancy poses significant barriers to the implementation of recommendations in low-resource settings, particularly in developing countries. Furthermore, there is an absence of quantitative follow-up and monitoring mechanisms in most guidelines. While the AUA/ASRM 2024 guideline explicitly provides criteria for evaluating semen parameters 6 months post-varicocele surgery, other guidelines generally lack clear operational indicators to monitor efficacy, thereby limiting their utility in closing the management loop in clinical practice.
Both domestic and international guidelines are susceptible to issues of editorial independence. Notably, none of the Chinese guidelines reviewed disclosed their funding sources, revealing a serious lack of transparency. The absence of conflict of interest management mechanisms further diminishes the credibility of these guidelines. With the exception of NATCM 2021, none of the other TCM guidelines disclosed conflict of interest statements from the expert authors. In some cases, such as in the CAIM 2017 guideline, there were concerns about potential undisclosed commercial associations between participating experts and the development of pharmaceutical formulations. If such potential conflicts of interest are not declared in a standardized manner, they may significantly compromise the objectivity and clinical credibility of the guidelines.
Interpreting Differences in the Quality of Domestic and International Guidelines
In the international comparison of male infertility guidelines, it is evident that cultural and institutional contexts play a significant role in shaping the development process. The TCM guidelines notably scored higher in the “stakeholder involvement” dimension. This is primarily due to the inclusion of academic teams and expert groups as the main contributors, which reflects broader participation in form. However, under the “multidisciplinary + patient involvement” criterion recommended by AGREE II, the depth of participation and diversity of representation in TCM guidelines did not match the level seen in other international guidelines. As a result, the overall quality of the guideline development process in TCM guidelines was lower than that of their international counterparts. For example, NATCM (2021) scored 90% in the stakeholder involvement domain, but the proportion of TCM practitioners among the expert participants far exceeded that of specialists in reproductive surgery, andrology, and endocrinology. This imbalance led to gaps in recommendations related to microsurgery, assisted reproduction, and genetic counseling, areas in which TCM guidelines were less comprehensive.
Institutional differences also played a crucial role, particularly in the dimension of independence. International guidelines have generally established robust mechanisms for declaring conflicts of interest and implementing accountability systems. For example, the CUA (2023) guideline involved ethical experts and patient representatives throughout the entire development process. The use of an enhanced Delphi process and blind review mechanisms helped mitigate bias and uncertainty in the recommendations. In contrast, domestic guidelines, except for a few with high methodological rigor, exhibited a severe lack of conflict of interest declaration mechanisms and often lacked an independent review process. This institutional gap has led to significant differences in methodological quality and editorial transparency between domestic and international guidelines, ultimately limiting the comparability and reference value of domestic guidelines on the global stage.
Analysis of Differences in Core Recommendations
Despite broad consensus on diagnostic assessments and basic treatment approaches across various guidelines, considerable divergence exists in the strength of recommendations and implementation pathways, particularly with regard to the application of emerging technologies and the integration of traditional medicine. This is especially evident in the contrast between domestic and international guidelines. Such discrepancies may stem from differences in the depth of understanding of evidence-based medicine, the choice of evidence evaluation systems, and the divergent frameworks for health resource allocation. International guidelines typically issue moderate recommendations based on RCTs, yet their promotion remains cautious due to the absence of long-term safety data. In contrast, TCM guidelines, such as the NATCM (2021), base their recommendations on observational studies and expert consensus, reflecting the distinct logical frameworks underpinning different evidence systems.
A similar divergence is observed in the strength of recommendations for SDF testing. The EAU (2021) routinely recommends SDF testing based on robust clinical evidence, whereas the American Urological Association (AUA) and ASRM (2024) recommend it only for high-risk populations, such as those with recurrent miscarriages, largely due to health economic considerations. This suggests that, even when the quality of evidence is comparable, disparities in the interpretation of health care resource allocation and the promotion of indications exist among guideline developers.
Furthermore, genetic screening strategies also reveal these differences. European guidelines mandate CFTR genetic screening for patients with congenital bilateral absence of the vas deferens (CBAVD), a recommendation underpinned by well-established health insurance systems and standardized laboratory services. In contrast, guidelines in developing countries tend to favor selective screening, a pragmatic approach driven by infrastructural limitations and the high financial burden on patients. Cultural differences are also evident in lifestyle recommendations. Western guidelines emphasize standardized measures for weight control, smoking cessation, and alcohol moderation, whereas TCM guidelines favor individualized interventions based on “constitution identification.” Notably, only the NATCM (2021) explicitly acknowledges the integration of cultural preferences through patient interviews in the development of guidelines, while other guidelines still exhibit a substantial gap in integrating local cultural needs with scientific evidence.
Conclusion
In summary, the overall quality of current guidelines for the diagnosis and treatment of male infertility requires significant improvement. The methodological quality of guidelines and consensus documents can be enhanced only by formulating a standardized development process that strictly adheres to established methodological standards. Such an approach will ensure that clinical practice guidelines provide sound guidance for clinical standards and effectively shape the practice of health care. Based on the results of this study, we propose the following recommendations:
(1) It is essential to establish a dynamic evidence integration mechanism that keeps guidelines up to date. Guidelines and consensus documents reflect the clinical experience and health care challenges of their time, so their validity is time-limited. Regular post-publication updates should therefore be scheduled, supported by a dedicated system for continuous monitoring of relevant literature in databases such as PubMed and Embase. This would enable the incorporation of the most current evidence, particularly in rapidly advancing fields, thereby preserving the scientific validity of the guidelines.
(2) It is crucial to standardize the consensus development process and its methodology. A survey found that 77.9% of editors of domestic medical science and technology journals are unfamiliar with the norms for developing guidelines and consensus reports (Liu et al., 2025). Future work should strengthen the integration of insights from experienced jurists into the consensus process, and multidisciplinary experts must be actively involved in decision-making. The working group responsible for guideline development should also deepen its understanding of guideline methodology to improve the overall quality of the guidelines (Wan et al., 2021). In addition, the inclusion of patient representatives in ethically sensitive areas, such as fertility preservation, is strongly encouraged. Structured questionnaires should be used to collect patient preferences so that their perspectives are systematically incorporated into decision-making.
(3) The adoption of internationally recognized methods for evidence assessment and recommendation is imperative. Transforming evidence into actionable recommendations is a pivotal step in guideline development. It requires systematic evidence retrieval, explicit criteria for evidence selection, and clear standards for the strength of recommendations. The use of internationally recognized grading systems, such as the Oxford evidence grading and GRADE systems, is the current standard in guideline development (Y. L. Li et al., 2023). However, the diversity of evidence grading systems currently employed in male infertility guidelines complicates the objective evaluation of evidence. Guideline developers should therefore adopt authoritative international standards while tailoring their approach to the specific context and needs of their national health care systems.
(4) Strengthening the quality control system is vital, and establishing a conflict of interest declaration system is an essential step. All participants in the guideline development process should be required to clearly declare any potential conflicts of interest, and sources of financial support within the past 5 years should be transparently disclosed. To ensure the objectivity of the guidelines, independent third-party audits should be conducted to detect potential commercial biases, thus safeguarding the academic credibility of the guidelines.
(5) Increasing the transparency of the guideline development process is crucial. None of the guidelines included in this study were registered on the International Platform for Registration and Transparency of Practice Guidelines (http://www.guidelines-registry.org). Future guidelines and consensus documents should be registered on this platform, which would enhance the openness and accountability of the development and reporting process.
(6) In terms of localization, particularly with regard to Chinese guidelines, attention should be paid to the unique characteristics of TCM. Under the AGREE II framework, additional items should be introduced to assess TCM practices. These might cover visualization of the diagnosis and treatment process, such as a flowchart for evidence-based diagnostic decisions, and quality control of Chinese medicinal preparations, such as labeling the purity of active ingredients and conducting batch stability tests.
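As an illustration of the continuous literature monitoring proposed in recommendation (1), the following minimal sketch builds a date-restricted PubMed query through the public NCBI E-utilities ESearch endpoint. The function name, the search terms, the study-type filter, and the 90-day window are assumptions chosen for illustration; they are not part of this study's search strategy, and Embase is not covered.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint (real use is subject to NCBI usage policies).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_surveillance_query(topic_terms, days_back=90, max_records=200):
    """Return an ESearch URL for PubMed records added in the last `days_back` days.

    Hypothetical helper: the query below is illustrative, not a validated search strategy.
    """
    # OR the topic terms together, searching titles and abstracts.
    topic = " OR ".join(f'"{t}"[Title/Abstract]' for t in topic_terms)
    # Restrict to study types that most often trigger a guideline update (an assumption).
    term = f"({topic}) AND (systematic review[pt] OR randomized controlled trial[pt])"
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "edat",     # Entrez date: when the record entered PubMed
        "reldate": days_back,   # look-back window in days
        "retmax": max_records,  # upper bound on returned identifiers
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_surveillance_query(["male infertility", "varicocele"], days_back=30))
```

A guideline working group could fetch such a URL on a fixed schedule and screen the returned PubMed identifiers against the guideline's existing evidence base before deciding whether an update is warranted.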
Limitations
Because the search was restricted to Chinese and English sources, this study could not include several important regional guidelines written in other languages, such as French and Japanese. As a result, the global assessment of the quality of male infertility guidelines may be incomplete and not fully representative, with a potential risk of language bias. Although the AGREE II tool is widely used for the quality assessment of international clinical guidelines, its applicability to specialized fields, such as TCM guidelines, remains a subject of debate. TCM relies on non-RCT evidence, including ancient literature and clinical experience, and these sources rank low within current evidence grading systems. This makes it difficult to adequately capture the overall logic and value of the TCM diagnostic and treatment system and may lead to underestimation of the true clinical significance of such guidelines. In addition, this study assessed the evidence base for recommendations through a systematic evaluation of guideline texts, focusing primarily on whether they cited systematic reviews or high-quality studies. It was therefore not possible to analyze the specific effect sizes or clinical significance of the original studies supporting each recommendation, which limits the empirical validation of their effectiveness. Furthermore, some guidelines did not provide, or only partially provided, supplementary materials such as evidence evaluation forms, external review comments, and conflict of interest statements, which affected the overall assessment of their methodological quality. The assessment may therefore understate the actual level of transparency and rigor of certain guidelines.
Supplemental Material
Supplemental material for Male Infertility Management: A Critical Appraisal of Clinical Practice Guidelines With the AGREE II Instrument by Jie Li, Kecheng Li, Mingqiang Zhang, Jixuan Chen, Maoke Chen, Wenxuan Dong, Wenhao Yu, Lixing Lei, Yao Huang, Haodong Yang, Peixuan Ren, Qiang Zou and Longsheng Deng in American Journal of Men's Health:
sj-docx-2-jmh-10.1177_15579883251380203
sj-docx-3-jmh-10.1177_15579883251380203
sj-pdf-1-jmh-10.1177_15579883251380203
sj-xlsx-4-jmh-10.1177_15579883251380203
Footnotes
Acknowledgements
All authors have made significant contributions to this study and the field of medical education. On completing this paper, we express our sincere gratitude and best wishes to those who supported and guided us throughout this research and learning journey.
Ethical Considerations
This study analyzed publicly available data and did not involve human or animal subjects; ethical approval was therefore not required.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was funded by the Natural Science Foundation of Fujian Province (General Program, No. 2023J011626), the Natural Science Foundation of Xiamen (General Program, No. 3502Z20227274), and the Fourth Batch of Academic Experience Inheritance of Elderly Chinese Medicine Experts in Fujian Province (Min Wei Chinese Medicine Letter [2022] No. 554).
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data Availability Statement
The data used or analyzed during the current study are available upon reasonable request from the corresponding author.
Supplemental Material
Supplemental material for this article is available online.
References
