Research plays a critical role in advancing evidence-based practice by providing measurable insights into the relationships between exposures or interventions and outcomes. These insights are generated through statistical analysis and applied to inform clinical care. The significance of a research finding is not determined by its statistically significant effect estimates or p values alone. For research outcomes to be clinically significant and effectively guide evidence-based practice, they must be evaluated in light of several characteristics: their magnitude (i.e., size of the effect estimates), precision (i.e., range of values within which we think the true effect estimate lies), consistency (i.e., alignment with existing literature), and clinical applicability to the target population (i.e., relevance to informing practice for populations of interest).
The International Board of Lactation Consultant Examiners (IBLCE), the credentialing organization of International Board Certified Lactation Consultants (IBCLCs), wrote a Code of Professional Conduct for all IBCLCs that includes the following principle: “That we will provide care to meet clients’ individual needs that is culturally appropriate and informed by the best available evidence” (IBLCE, 2015, Principle 1, Section 1.2). Different types of research can be used to provide evidence, but quantitative research, which involves numerical data collection and analysis, allows for the calculation of the probability and variability of findings in target populations. Quantitative research is generally classified into experimental and non-experimental (observational) studies. Although experimental studies are considered more rigorous, they are not always ethical or practical for lactation research. In both experimental and observational studies, researchers rely on inferential statistics to characterize how study findings should be interpreted alongside existing evidence to understand their clinical significance within broader populations.
Statistical testing is employed to assess the likelihood that a study’s findings are due to chance. It detects relationships between some exposure (e.g., 6 months of exclusive breastfeeding) or intervention (e.g., a home-visiting lactation support program) and an outcome of interest (e.g., gastrointestinal tract infection) compared to the probability of observing such findings by chance. The possibility that there is no significant difference, that the difference between the group receiving the intervention and those not receiving the intervention is just a chance finding, is the null hypothesis. Comparing two groups involves testing the null hypothesis of no relationship between the exposure and outcome of interest. If statistical testing detects a relationship that is unlikely to have occurred by chance, the null hypothesis is rejected. The conventional threshold for “statistical significance” is p value < 0.05, indicating that there are five or fewer chances out of 100 that the observed finding occurred by chance or randomly (Greenland et al., 2016).
Over-reliance on p values is discouraged since this arbitrary cutoff can be misleading and observational studies are typically affected by biases that ensure that the p value represents issues more complicated than “chance alone” (Westreich, 2019). The author guidelines for the Journal of Human Lactation include the following: “In most cases, 95% confidence intervals are preferred over the p-value for evaluating statistical significance” (see the submission guidelines, https://journals.sagepub.com/author-instructions/JHL). One of the main reasons for this preference is that p values conflate two pieces of information about a study finding: the magnitude and the precision of the estimated difference between the two populations being compared. Instead, it is preferable to present both the magnitude of effect estimates (e.g., odds ratios, rate ratios, risk differences) and their precision estimates (confidence intervals). Additionally, findings should be discussed in the context of the broader literature to assess their consistency with existing evidence and clinical relevance to different populations.
Effect estimates quantify the magnitude of the association between the exposure and outcome of interest, or the change the researchers think they will find between two groups. Effect estimates should be selected to align with the attributes of the study design. For example, a case-control study assessing whether exclusive breastfeeding is sustained at 6 months could directly estimate an odds ratio between the two groups. However, a longitudinal study examining exclusive breastfeeding cessation at any point over the first year after birth would more suitably estimate a hazard ratio, which provides a more dynamic measure of risk by incorporating both the timing and occurrence of events.
Effect estimates of the exposure-outcome relationships are categorized into ratio measures and absolute measures of association. Ratio measures include the risk ratio, odds ratio, and hazard ratio (RR/OR/HR), which express the relative likelihood of an outcome occurring in the exposed group compared to the unexposed group. In contrast, absolute measures of association, such as the risk difference (RD), quantify the actual difference in risk between exposed and unexposed groups. The interpretation of these measures varies based on their type. For ratio measures, an estimate greater than 1 indicates an increased risk, less than 1 indicates a reduced risk, and 1 represents no difference (null risk). For absolute measures, an estimate greater than 0 (zero) indicates an increased risk, less than 0 indicates a reduced risk, and 0 denotes no difference (null risk). For example, in the Promotion of Breastfeeding Intervention Trial (PROBIT), a rigorous experimental lactation study, patients who received a Baby-Friendly style intervention had lower odds of weaning by 3 months compared with control participants. The OR of 0.52 denotes a reduction of almost half in the odds of weaning (Kramer et al., 2001). However, ratio measures communicate only relative differences between groups rather than an absolute difference. For example, if we know the RR = 2, we know an outcome is twice as likely in the exposed group, but the RR does not specify whether this corresponds to an increase from 20% to 40% or from 0.2% to 0.4%. In contrast, a risk difference of 20% clearly defines the actual change in risk. Absolute difference measures are mathematically more intuitive and easier to interpret, making them more clinically accessible. For example, a randomized trial found that offering infants a pacifier once lactation was well established (after 15 days) did not reduce exclusive breastfeeding at 3 months in a clinically meaningful way (RD = 0.004) (Jenik et al., 2009).
The RD effect estimate in this study means that the percentage of exclusively breastfed babies at 3 months differed between groups by only 0.4 percentage points.
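The contrast between ratio and absolute measures can be sketched with hypothetical risks; the 20%/40% and 0.2%/0.4% scenarios below are illustrative numbers, not data from any of the studies cited:

```python
# Minimal sketch with hypothetical risks: the same ratio measure can
# correspond to very different absolute changes in risk.

def risk_ratio(p_exposed, p_unexposed):
    return p_exposed / p_unexposed

def risk_difference(p_exposed, p_unexposed):
    return p_exposed - p_unexposed

# Scenario A: a common outcome whose risk doubles from 20% to 40%
# Scenario B: a rare outcome whose risk doubles from 0.2% to 0.4%
for label, p1, p0 in [("common", 0.40, 0.20), ("rare", 0.004, 0.002)]:
    print(f"{label}: RR = {risk_ratio(p1, p0):.1f}, "
          f"RD = {risk_difference(p1, p0):.3f}")
```

Both scenarios yield RR = 2.0, but the risk differences (0.200 vs. 0.002) convey how much the absolute risk actually changes, which is the clinically accessible quantity.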
Effect estimates should be accompanied by 95% confidence intervals, which provide information on precision by denoting the range of values within which the true effect estimate is likely to lie, given our uncertainty. Interpretation of confidence intervals involves the concept of hypothetical replications: if the study were repeated 100 times under identical conditions, we would expect that 95 out of the 100 confidence intervals would contain the true population estimate of association. For example, Yourkavitch et al. (2018) reported that people who regularly pump are more likely to stop human milk feeding in the first year compared to those who do not regularly pump, using a hazard ratio and confidence interval: HR = 1.62, 95% CI [1.47, 1.78]. This means that regular pumpers are estimated to be 62% more likely to stop human milk feeding in the first year compared to those who do not regularly pump. The confidence interval can be interpreted to mean that, given the level of variability of the data, we expect the true population estimate of an increased risk of human milk feeding cessation for regular pumpers to fall between the lower and upper confidence limits of 47% and 78% (Yourkavitch et al., 2018). This additional quantification of precision is much more informative than a simple p value indicating statistical significance.
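Because ratio measures such as HRs are conventionally estimated on the log scale, their 95% confidence limits take the form exp(log(estimate) ± 1.96 × SE). The sketch below illustrates this; the standard error of about 0.049 is reverse-engineered from the published interval purely for illustration and does not appear in the original paper:

```python
import math

def ratio_ci(estimate, se_log, z=1.96):
    """95% confidence limits for a ratio measure (RR/OR/HR) estimated
    on the log scale, given the standard error of the log estimate."""
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return lower, upper

# Illustrative SE chosen to roughly reproduce the published interval
lower, upper = ratio_ci(1.62, 0.049)
print(f"HR = 1.62, 95% CI [{lower:.2f}, {upper:.2f}]")
# → HR = 1.62, 95% CI [1.47, 1.78]
```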
The following example shows how the p value changes with either the precision or the magnitude of an effect estimate. In studies with a large sample size, statistically significant results may be detected even when the effect size is minimal and lacks clinical significance (Greenland et al., 2016). For example, suppose we have a study of 253 participants and an original RR of 1.3 (95% CI [0.9, 2.0], p value = 0.2), suggesting the association being tested between having more than one versus only one prenatal provider and receiving timely postpartum care is not significant (Wouk et al., 2022). If this small study had enrolled additional patients, increasing the sample size to 3000, the precision of the RR estimate could have improved, resulting in a narrower confidence interval and a smaller p value that is statistically significant (RR = 1.3, 95% CI [1.16, 1.46], p < 0.001). While the larger study sample improved the precision and statistical significance of the study findings, the magnitude of the association remained weak (1.3). A weak effect estimate may not be considered clinically significant except in cases where the outcome is very serious (e.g., mortality or hospitalization).
On the other hand, if the RR increased from 1.3 to 2.0 while retaining the same confidence interval width as our original estimate (RR = 2.0, 95% CI [1.6, 2.7]), the p value would decrease to p < 0.001, indicating statistical significance. However, if the RR further increased to 2.5 but had a wide 95% confidence interval (RR = 2.5, 95% CI [0.43, 14.59]), the p value could rise to 0.3, indicating that the result is not statistically significant. In our example, while the p value and the confidence interval varied depending on the size of the sample and the size of the effect estimate, the confidence limits continued to give more refined information about the precision. Presenting the strength of the estimate and quantifying the uncertainty around it provides more nuanced information than the p value alone, allowing readers to use their judgment in evaluating whether a finding is clinically significant.
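The interplay of sample size, precision, and the p value described above can be reproduced with a standard Wald-style test on the log risk ratio. The group risks below (39% vs. 30%, giving RR = 1.3, with equal-sized groups) are hypothetical stand-ins chosen to roughly mimic the example, not the actual study data:

```python
import math

def log_rr_inference(p1, p0, n1, n0):
    """Wald-style RR, 95% CI, and two-sided p value from group risks
    and group sizes (normal approximation on the log scale)."""
    rr = p1 / p0
    se = math.sqrt((1 - p1) / (n1 * p1) + (1 - p0) / (n0 * p0))
    lower = math.exp(math.log(rr) - 1.96 * se)
    upper = math.exp(math.log(rr) + 1.96 * se)
    z = abs(math.log(rr)) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
    return rr, (lower, upper), p_value

# Same effect size, two sample sizes (~253 vs. 3000 total participants)
for n_per_group in (126, 1500):
    rr, (lower, upper), p = log_rr_inference(0.39, 0.30, n_per_group, n_per_group)
    print(f"n = {2 * n_per_group}: RR = {rr:.1f}, "
          f"95% CI [{lower:.2f}, {upper:.2f}], p = {p:.3f}")
```

The larger sample narrows the confidence interval and pushes the p value below 0.001, yet the RR of 1.3 is unchanged, which is exactly why magnitude and precision should be reported separately.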
The consistency and clinical applicability of statistical outcomes must also be considered when determining clinical significance. Consistency is assessed by comparing the magnitude and precision of effect sizes with those from existing literature across a broader context. Repeated observations of similar findings across studies with different designs, contexts, and populations strengthen the evidence that a real effect has been identified in contrast to an effect resulting from chance alone. The clinical applicability of statistical findings should be assessed by considering to whom the effect estimates and confidence intervals are generalizable. A study with a very homogeneous sample might produce extremely precise estimates, but they might not be clinically relevant to a population that differs from the sample in important characteristics. For example, a rigorously designed clinical trial of lactating people experiencing postpartum depression might report statistically significant findings of the benefit of an intensive breastfeeding support intervention. However, if the sample is systematically different from the general population, because their depression is mild enough to permit their participation in a demanding clinical trial, then the findings may not be clinically significant for patients with severe symptoms. The strength of the effect estimate and the range of the confidence interval may be more clinically significant when presented for subgroups of the study sample that more closely resemble target populations of interest, to communicate how findings differ based on patient-level characteristics. Therefore, researchers should present as much raw data as possible to allow the reader to comprehend differences in the magnitude and precision of findings within different subgroups of interest.
Additionally, qualitative data can be used to reveal underlying cultural differences, patient preferences, or systematic barriers that may affect the generalizability of statistical findings.
Finally, to assess the clinical significance of study findings, it is also important to interpret effect estimates and confidence intervals in the context of other study characteristics that may bias reported measures of magnitude and precision. These common study biases have been described elsewhere (Edwards et al., 2015; Westreich et al., 2019), including in other articles in this series (Berndt, 2020; Duckett, 2021; Haile, 2023). After considering sources of bias, both researchers and consumers of research findings must look beyond p-value reporting to describe the alignment of effect estimates and confidence interval ranges with findings from existing research. If the findings have been replicated across diverse study settings, they may be interpreted as being clinically significant across broad target populations, while new findings or findings that contradict existing research may guide future research (i.e., using larger samples, understudied population subgroups, more intense exposures, and/or more accurate measurement tools).
Qualitative Research
While this paper is focused on quantitative outcomes, qualitative research methods also merit consideration when determining clinical significance. Qualitative methods, rooted in the social sciences and increasingly applied in health research, are crucial for understanding the complex and nuanced nature of everyday realities and challenges of breastfeeding practices (Asiodu et al., 2021; Bookhart et al., 2021; Woods Barr et al., 2021). In the context of applying evidence to practice, qualitative research studies can improve the development of culturally appropriate and context-informed practices to improve the relevance, acceptance, and sustainability of clinical practice guidelines, and facilitate implementation. In determining clinical applicability, the important and distinct contributions of qualitative research are therefore critical, not only for interpreting statistical findings, but for informing the design of and recommendations for interventions, as well as capturing the meaning, context, and values of what is defined as clinically significant (Sandelowski, 1996).
Conclusion
A study does not need to achieve statistical significance to be important or clinically relevant! When presenting research findings in a manuscript, it is important to clarify both the magnitude and the precision of the effect estimates in the results section and accompanying tables. These findings should be interpreted in the discussion section, considering alignment with existing research to make the case for why findings may represent clinically significant outcomes and identify the target populations for which these findings are relevant. Absolute measures of association are often more intelligible to clinicians and patients. Regardless of whether absolute or ratio measures are reported, it is important to make recommendations for future research based on the implications of the study findings, including the direction of the observed effect, consistency of findings with other research, and characteristics of the study population. Instead of only reporting statistical significance, communicating the clinical significance of research findings can ensure that readers have the best available evidence to meet clients’ individual needs.
Acknowledgements
The authors would like to thank D. Marcus Herman-Giddens, MAT, MPH for statistical comments and input.
Author Contributions
Disclosures and Conflicts of Interest
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: Ellen Chetwynd was the Editor-in-Chief of the Journal of Human Lactation at the time this manuscript was written and published. We have no other conflicts or funding to disclose.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
