Abstract
Adequate and transparent reporting is necessary for critically appraising research. Yet, evidence suggests that the design, conduct, analysis, interpretation, and reporting of oral health research could be greatly improved. Accordingly, the Task Force on Design and Analysis in Oral Health Research (statisticians and trialists from academia and industry) empaneled a group of authors to develop methodological and statistical reporting guidelines identifying the minimum information needed to document and evaluate observational studies and clinical trials in oral health: the OHStat Guidelines. Drafts were circulated to the editors of 85 oral health journals and to Task Force members and sponsors and discussed at a December 2020 workshop attended by 49 researchers. The final version was subsequently approved by the Task Force in September 2021, submitted for journal review in 2022, and revised in 2023. The checklist consists of 48 guidelines: 5 for introductory information, 17 for methods, 13 for statistical analysis, 6 for results, and 7 for interpretation; 7 are specific to clinical trials. Each of these guidelines identifies relevant information, explains its importance, and often describes best practices. The checklist was published in multiple journals. The article was published simultaneously in
Keywords
Introduction
“Large proportions of articles contain errors in the application, analysis, interpretation, or reporting of statistics or in the design or conduct of research” (Lang and Altman 2013). Oral health research is not immune to this criticism. For example, a 2009 review of 95 randomized controlled trials (RCTs) published in the leading journal in each of 6 dental specialties found generally suboptimal reporting of key Consolidated Standards of Reporting Trials (CONSORT) guidelines (Pandis et al. 2010). In another review, “spin” (statistically nonsignificant results reported as “clinically important”) was assessed in the abstracts of 75 RCTs published in 10 leading dental journals. Of the 75 trials, 17 incorrectly presented a “statistically nonsignificant result for the primary outcome as showing treatment equivalence or comparable effectiveness” and 2 emphasized the conclusions of a secondary outcome when the primary outcome was not statistically significant (Roszhart et al. 2020). Additionally, a report of quality and spin in RCT abstracts in the periodontal-cardiovascular field found poor adherence to CONSORT guidelines, with 87% of trials not reporting on the primary outcome and 86% of trials showing at least 1 form of spin in the results and/or conclusions (Shaqman et al. 2020). Thus, “overall, dental journals show low reporting of quality-related characteristics with high variation that is journal-dependent” (Pandis et al. 2011).
Although oral health research is similar to clinical research in other fields, many dental studies have design characteristics that can confound analysis. For example, the unit of analysis can be a single tooth, multiple teeth, individual tooth sites, or a single patient. In longitudinal studies, teeth can be lost without disqualifying the participant from the study, and perhaps uniquely in human research, observational units may be added through the primary and permanent dentition process. Another unusual study design in oral health research is the split-mouth study (Lesaffre et al. 2009). A review of 119 such studies found improved reporting across 2 decades, but overall quality “was still below the acceptable level”: 85% did not provide a sample size calculation, 76% did not identify a primary outcome, 61% used inappropriate statistical methods that did not consider the correlated data, and 38% did not justify the design (Qin et al. 2020).
A common approach to improving reports of biomedical research is to use a checklist of reporting guidelines. Checklists can remind authors to report key elements of a study and help reviewers find where each guideline is addressed when evaluating a manuscript. Most such guidelines are modeled after the CONSORT Statement for reporting randomized trials, first published in 1996 (Begg et al. 1996) and most recently updated in 2010 (Schulz et al. 2010). Also of interest to this document is the STROBE Statement for reporting observational studies (von Elm et al. 2014). Use of the CONSORT Statement has been associated with improved reporting of RCTs (Moher et al. 2001; Plint et al. 2006). However, the EQUATOR Network website lists over 550 checklists (University of Oxford Center for Statistics in Medicine n.d.). Thus, there appeared to be a need for a consolidated guideline that could address the main issues in the most common study designs in oral health.
Accordingly, members of the Task Force on Design and Analysis in Oral Health Research (Task Force on Design n.d.) began to develop guidelines for reporting clinical studies in oral health in 2019. The process of development is described in the OHStat Statement (Best et al. 2024). Drafts were circulated to editors of 85 oral health journals and to Task Force members and sponsors. The draft was discussed at a December 2020 workshop, attended by 49 researchers. The revision was circulated to the writing group and approved by the Task Force. As with other guidelines, the recommendations for reporting oral health research should 1) inform authors of the information needed to document and publish their research, 2) allow readers to assess the validity of the research or at least the credibility of the authors, 3) make the research process transparent, and 4) ideally, provide links to the information needed to replicate the study.
The target audiences for the OHStat Guidelines are authors, reviewers, and journal editors. Authors are advised to include the completed OHStat checklist when submitting a manuscript for publication. Journal editors and reviewers may also wish to consult these and other guidelines when evaluating a manuscript and should insist on complete adherence to the guidelines within journal page limits, word limits, or in supplemental information. Critical appraisal and interpretation of observational studies and clinical trials in oral health will improve with an understanding of the details that support study validity. The purpose of this article is to provide the rationale and scientific background for each item. The terminology used is that provided in the original CONSORT Explanation and Elaboration document (Altman et al. 2001).
The OHStat Statement: Explanations and Elaborations
Identifying Information
The primary purpose of identifying information—the title and abstract—is to help readers make an informed choice about whether to read an article. Not so obvious is that this information should also help readers decide not to read an article. Thus, titles should identify the relationship that was studied. The title should not attempt to “capture the reader’s attention” with anything other than an accurate description of the research. Abstracts should not “highlight the research” but, again, should summarize it accurately so readers will know what to expect if they read the article (Lang 2010).
The strength of evidence for health care interventions is limited by the study design. Including this information in the title helps with critical appraisal by helping readers decide whether to read the article. Character limits notwithstanding, try to include as many of the SPICED-T elements as possible: Setting, Patients, Intervention, Comparator, Endpoint, Design, and sometimes Time frame (Lang 2020). A title can easily be shortened by removing the least important element. If applicable, some key elements must always be included in the title and abstract (e.g., single-sex studies).
The International Committee of Medical Journal Editors (ICMJE) recommends including a structured abstract when reporting original research (International Committee of Medical Journal Editors 2018). Such abstracts have 5 or more headings, and journals may specify which headings to use. Usually, only the results and conclusions require complete sentences. However, the form of the abstract will be specified by the individual journal.
Many studies have found important discrepancies between the abstract and the full article (Lang 2022). Because abstracts are often separated from the full article, the information they contain needs to be identical to that in the full article. The conclusions, results, and objectives all need to be consistent throughout the manuscript.
The classic IMRaD structure of scientific articles (Introduction, Methods, Results, and Discussion) is well known, and the OHStat Guidelines emphasize the reasons for this organization. In 1965, Sir Austin Bradford Hill stated in an editorial board meeting of the
Introduction: Why Did You Start? (Hill 1965)
After the title, the introduction is the most important and least-appreciated part of the scientific article. A good introduction can be enormously useful because it prepares readers to understand the paper, orients them to the research by establishing the need and importance of the study, indicates in general how the need was addressed, and tells readers what to expect if they continue to read the article.
Describe the historical, social, medical, ideological, or public health contexts of the problem. Indicate how serious and prevalent it is, as well as its consequences, implications, and whom it affects.
“Little is known about . . .” is rarely a good justification for doing research. A simple lack of knowledge is not sufficient to explain why a relationship needs to be studied or why a research report should be taken seriously (Lang 2017). Novice readers may need the background to understand the problem; experts expect a compelling justification of the research. The background in the Introduction should support a problem statement—the gap in knowledge or an untapped potential—that stimulated the research.
The problem statement in the Introduction should support the choice of the primary outcome—the variable whose change in value is of interest and why it is clinically or practically important. The specific and measurable objectives should determine the methods of the research.
Methods: What Did You Do?
The purpose of the Methods section is to tell how the research question was addressed. The thought that a clear and transparent Methods section would allow someone to replicate the study is laudable but often not realistic, given the word limitations of a typical journal article, even with supplemental information. Instead, it may be better to tell readers where to obtain copies of the protocol, the statistical analysis plan, and the original data set. In an article, a more reasonable goal is to provide enough information to establish the adequacy of the methods and, in so doing, establish the credibility of the authors as careful and thoughtful researchers.
To understand the essential aspects of the study, its design should be described in the Methods. The hierarchy of evidence for clinical studies (both observational studies and clinical trials) arranges sources of information and research designs from those with the most control over error, confounding, and bias to those with the least control. We encourage researchers to aim for the highest appropriate level of evidence (American Dental Association [ADA] 2013; Oxford for Evidence-Based Medicine Group 2013). The hierarchy listed below is one of many versions, although all include essentially the same designs in the same order (Torabinejad and Bahjri 2005):
Meta-analysis of RCTs
Systematic reviews
RCTs
Cohort studies
Case-control studies
Cross-sectional studies
Case series
Case reports
At a minimum, authors should report whether an observational study is a cohort, case-control, or cross-sectional design and include information about the study timeline and variants such as nested designs or crossover studies.
The hierarchy of evidence should not be confused with the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) “system of rating quality of evidence and grading strength of recommendations” (Guyatt et al. 2011). The hierarchy simply ranks therapeutic study designs by their potential to control for bias. Grades or levels of evidence usually refer to ways to describe or score the quality of individual studies.
A clinical trial is “a research study in which one or more human subjects are prospectively assigned to one or more interventions (which may include placebo or other control) to evaluate the effects of those interventions on health-related biomedical or behavioral outcomes” (U.S. National Institutes of Health 2014). A clinical trial is one that meets all 4 of the following criteria:
It involves human participants.
It prospectively assigns participants to an intervention (not necessarily random assignment).
It evaluates the effect of the intervention on the participants.
It has a health-related biomedical or behavioral outcome.
Note that the definition includes both conventional parallel-group studies, in which participants are assigned to interventions, and within-person studies, in which specific body parts or locations, such as lesions or dentition in the same individual, are assigned to experimental groups. The latter type of assignment allows participants to receive 2 or more treatments on different structures or areas so that each patient acts concurrently or sequentially as their own control. Examples include dental split-mouth studies.
Conventionally, there are 2 types of human studies with health-related outcomes—clinical trials and observational studies. Most studies in oral health are observational studies, studies that do not meet the above definition.
The 7 guidelines especially relevant to clinical trials are in bold: 7, 8, 18, 19, 26, 27, and 39.
Although the 2 designs have much in common, they differ greatly in terms of how they are designed and how their results are evaluated. An explanatory RCT evaluates the efficacy of an intervention under controlled conditions in a narrowly defined patient population, which maximizes internal validity but can limit generalizability (external validity). A pragmatic RCT is conducted under real-world conditions, where key aspects of the study can have great variability: the diagnosis, enrollment, treatment, participant adherence to treatment, and data collection. A pragmatic trial is designed to assess comparative effectiveness in more typical settings—to maximize external validity (Chalkidou et al. 2012). Other approaches (e.g., safety trials, dose finding studies) should also be reported, where appropriate.
Both clinical trials and observational studies may be registered, thus improving transparency. Most major journals require, as a condition of publication, that a randomized clinical trial be registered on a public trials registry, such as clinicaltrials.gov or others listed on the WHO International Clinical Trials Registry Platform (ICTRP) (World Health Organization 2014).
Observational studies also benefit from a statistical analysis plan. Outlining the details of study methodology before the study begins can distinguish a priori comparisons from post hoc analyses. Observational studies should clearly specify the hypotheses intended to be tested and the statistical methods planned to test them. Differences between the study as planned and the study as published should also be disclosed, as they should be in any design, and the differences and their effects on the reliability of the study results explained.
If applicable, tell how to obtain the study protocol, the statistical analysis plan, the original data, or any biological samples.
A standard requirement for conducting and publishing any research involving human and animal participants is prestudy approval by a recognized institutional review board (IRB), whose task is to protect the rights and welfare of participants during and after the study.
Some types of research, such as surveys, benign behavioral interventions, and routinely collected clinical or educational data, might be exempt from IRB approval, but such an exemption still needs to be approved by an IRB and documented in the article.
Prospective studies of adults must obtain written informed
Authors (and the authors’ institution or employers) should report any competing or potential conflicts of interest that might influence or bias the conduct or reporting of the research (World Association of Medical Editors n.d.). Competing interests may be related to financial, professional, intellectual, political, or personal relationships. They may be only potential or perceived, or they may be factual. Competing interests do not necessarily mean that the research is biased. This information may be placed at the beginning or end of the Methods section or before the references, depending on the journal.
Several groups fund clinical research, including government agencies, consumer groups, advocacy groups, private foundations, wealthy individuals, clinical centers, and industry. Almost any funding source has competing interests, that is, economic, programmatic, or reputational incentives to report favorable results. In addition, favorable results may also affect the chances of continued funding. Thus, the involvement of a funder in any phase of the research should be disclosed; any supplies, drugs, equipment, technical support, or unrestricted funding provided should be acknowledged. If no support was given, a simple statement to that effect is sufficient.
If participating individual physicians, group practices, clinics, or research sites were compensated for their time or contribution to the research, this must be disclosed.
Importantly, the potential for bias does not necessarily mean that the results are biased.
The setting or venue of a study (e.g., private practice, community hospital, academic medical center) and its location (e.g., rural, inner city, geographic region, country) can affect how well its findings might apply to other settings and locations. Aspects of location include social, cultural, economic, political, and geographic factors. If relevant to the generalizability of the findings, state why the setting(s) and location(s) were chosen.
The challenge of generalizability arises because your study includes past information on individual people, experiencing interventions, measured uniquely, at your particular location. These “instances on which data are collected” are only of scientific interest to the extent that your study results may generalize to the units, treatments, variables, and settings not directly observed (Shadish et al. 2002). Provide sufficiently detailed information to inform readers who was eligible for the study and to assess to whom the findings can be generalized.
For example, as recently as 30 y ago, women of childbearing age were excluded from clinical trials by Food and Drug Administration (FDA) policy. Current federal policies encourage including both sexes and all gender identities in clinical studies (De Castro et al. 2016; Wainer et al. 2020). Data should be routinely disaggregated and analyzed by sex or gender, as appropriate (Heidari et al. 2016). Single-sex studies must be justified if the reasons are not obvious. Women and minority group members are markedly less likely to volunteer for some clinical trials, so additional recruitment efforts may be required to achieve generalizability, inclusivity, and equity (Masood et al. 2019).
If race, ethnicity, sex, primary language, or disability is reported, indicate the classification options used and report who assigned the categories (e.g., self-report, investigator judgment) (Dorsey and Graham 2011).
As item 11 identified
Typically, stratification or matching is employed during the recruitment process to ensure comparability of study groups. If done, give the reader sufficient information to judge the adequacy of these efforts.
Such descriptions might include the pharmacological properties of a drug, the technical aspects of a procedure, or patient home-care instructions. The most common types of comparators include placebo, a competing intervention, usual or standard of care, and untreated or unexposed. If multiple interventions occurred, describe the sequencing. Each intervention must be completely and accurately described if the research is to be evaluable.
This item describes what interventions were done; the next item describes how the database records this and other features.
The primary outcome drives the study’s sample size calculation and statistical power, so it must be clearly specified and defined (International Committee of Medical Journal Editors 2018). If possible, use common definitions and established outcome measures, to make comparing results across similar studies easier. Secondary questions and outcomes may be posed, but because trials are generally not designed or adequately powered to address secondary or exploratory outcomes, authors should interpret the results carefully (Pihlstrom and Barnett 2010). Secondary outcomes should be labeled as exploratory unless they are clearly outlined in a prespecified analysis plan.
Although outcomes with clinical and practical relevance are preferred, surrogate outcomes may also be used. If so, the biochemical mechanism or epidemiologic rationale for their use should be clear. Describe the established relationship between the surrogate measure and the clinical endpoint, where possible.
If a composite outcome is used—where 2 or more variables are combined into a single outcome—results should be reported for each of its components, in addition to the composite variable. Consider whether the components are of similar importance. For example, are the counts of decayed, filled, and missing teeth comparable (Casamassimo et al. 2009)?
In studies measuring outcomes at various time points, specify the follow-up duration of primary interest. State how the comparison time was determined and whether comparisons were made at a prespecified time point.
Participants may be considered independent for the purposes of statistical analysis, but sites measured within a patient’s mouth are dependent, meaning that the value of one measurement is correlated with another. For example, periodontal measures in the same mouth are positively correlated (Imrey 1986; Fleiss et al. 1988; Imrey et al. 1994). The result is that analyzing tooth- or site-level data as if they were independent underestimates variability and overstates statistical significance (Fleiss et al. 1988). Therefore, analyses should account for correlated data (this approach is described in item 24). The statistical method for analyzing correlated data should be described in oral health publications (see item 30), where correlated measurements are the rule rather than the exception (Sterne et al. 1988; DeRouen 1990; McDonald and Pack 1990; DeRouen et al. 1991; Albandar and Goldstein 1992; Smith and Hadgu 1992).
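The cost of ignoring within-mouth correlation can be illustrated with the Kish design effect, a standard approximation that is not part of the OHStat text itself. The sketch below assumes equal numbers of sites per patient and a known intraclass correlation coefficient (ICC); the function names and the example numbers are hypothetical.

```python
def design_effect(sites_per_patient: float, icc: float) -> float:
    """Kish design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    if sites_per_patient < 1:
        raise ValueError("sites_per_patient must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must be between 0 and 1")
    return 1.0 + (sites_per_patient - 1.0) * icc


def effective_sample_size(n_patients: int, sites_per_patient: float, icc: float) -> float:
    """Number of independent observations the correlated sites are 'worth'."""
    total_sites = n_patients * sites_per_patient
    return total_sites / design_effect(sites_per_patient, icc)


# Hypothetical illustration: 100 patients, 6 sites per patient, ICC = 0.3.
deff = design_effect(6, 0.3)
n_eff = effective_sample_size(100, 6, 0.3)
print(f"design effect = {deff:.2f}; 600 sites carry about {n_eff:.0f} independent observations")
```

Under these assumed values, treating all 600 sites as independent would overstate the available information by a factor of 2.5. The approximation only motivates the problem; it does not replace an analysis method that models the correlation directly.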
Data from partial-mouth examinations can underestimate disease prevalence. Disease severity is overestimated if data are restricted to high-risk segments of the dentition (Eke et al. 2010). Accordingly, partial-mouth examinations should be justified, reported, and discussed.
The ultimate goal of medicine is to improve personal and population health. So, research should focus on clinically important and practically useful outcomes. The National Institutes of Health defines clinically meaningful outcomes as a measure of how a patient feels, functions, or survives (Biomarkers Definitions Working Group 2001). Facial aesthetics, tooth retention, and oral function are key to oral health. A practically useful outcome should be well defined, reliable, measurable, interpretable, and sensitive to the effects of an intervention (Fleming and Powers 2012).
For example, the ADA clinical practice guideline on the nonsurgical treatment of chronic periodontitis interpreted a mean difference in clinical attachment loss (CAL) between treatment and control using a “clinical relevance scale” (Smiley et al. 2015). Even if statistically significant, a CAL difference of 0.2 mm was interpreted as “zero effect.” A difference in the range of >0.2 to 0.4 mm was interpreted as a “small effect,” and this was the minimum clinically important difference used in the practice guideline.
Small differences between large groups are often clinically meaningless. A “positive” finding is statistically and clinically defensible if all the values in a 95% confidence interval (CI) around the resulting effect size exceed the minimum clinically important difference.
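The decision rule in the preceding paragraph can be written as a small check. This is a minimal sketch that assumes the effect is oriented so that positive values favor the intervention; the function name, the category labels, and the example intervals are hypothetical, and only the 0.2 mm threshold echoes the CAL example above.

```python
def interpret_effect(ci_lower: float, ci_upper: float, mcid: float) -> str:
    """Compare a 95% CI for a beneficial effect with the minimum clinically
    important difference (MCID). Positive values are assumed to favor the
    intervention."""
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    if mcid <= 0:
        raise ValueError("mcid must be positive")
    if ci_lower > mcid:
        # Every value in the interval exceeds the MCID.
        return "statistically and clinically defensible"
    if ci_lower > 0 and ci_upper < mcid:
        return "statistically significant but below the MCID"
    if ci_lower > 0:
        return "statistically significant, clinical importance inconclusive"
    return "not statistically significant in the beneficial direction"


# Hypothetical CAL differences in mm, judged against an MCID of 0.2 mm.
for ci in [(0.25, 0.60), (0.05, 0.15), (0.10, 0.50), (-0.10, 0.30)]:
    print(ci, "->", interpret_effect(ci[0], ci[1], mcid=0.2))
```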
As item 14 describes
Not all clinical trials use random assignment, nor is concealment always possible (Friedman et al. 2015). Tell how interventions were assigned or allocated. In parallel group studies, the schedule indicates the group to which the next enrolled participant will be assigned. In within-person studies (e.g., split-mouth studies), the schedule indicates the location or ordering of the interventions. If interventions were assigned at random, tell how this was accomplished (e.g., with the use of a validated statistical software program or a table of random numbers). The unit of randomization may not be the unit of measurement.
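As one illustration of an allocation schedule that can be reported and reproduced, the sketch below generates a permuted-block schedule for a parallel-group trial and a side assignment for a split-mouth trial. Permuted blocks, the block size, the seed, and all function names are assumptions chosen for the example; the guideline does not prescribe a particular method, and a real trial would normally use validated software.

```python
import random


def permuted_block_schedule(n_participants, arms=("A", "B"), block_size=4, seed=2024):
    """Return a list of arm labels in enrollment order, balanced within blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a fixed seed makes the schedule reproducible
    schedule = []
    while len(schedule) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        schedule.extend(block)
    return schedule[:n_participants]


def split_mouth_sides(n_participants, seed=2024):
    """For each participant, pick the side of the mouth that receives the
    intervention; the other side serves as the control."""
    rng = random.Random(seed)
    return [rng.choice(("left", "right")) for _ in range(n_participants)]


print(permuted_block_schedule(8))
print(split_mouth_sides(4))
```

Reporting the method, block size, and software (or seed) lets readers judge whether the schedule could have been predicted by those enrolling participants.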
Report how (or whether) the allocation schedule was concealed from study personnel to prevent group assignment from being intentionally or unintentionally manipulated.
Implementation refers to how a participant is assigned to a group without anyone knowing whether it is to the intervention group or the control group. Tell who generated the allocation schedule, who enrolled participants in the trial, and who assigned patients to groups (Schulz et al. 2010). Studies with inadequate or unstated allocation concealment tend to have significantly better outcomes than those with appropriate allocation concealment (Schulz et al. 1995).
Allocation concealment keeps group assignment hidden until after patient recruitment; blinding can also keep assignments hidden during the intervention and after.
Certain forms of bias may be prevented by using blinding (e.g., selection bias, ascertainment bias, and expectation bias). In clinical trials, blinding is not required, but when it is not used, this should be clear. If blinding was used, report the methods used to mask the interventions from study administrators, from participants, and from those measuring outcomes. Contrary to popular belief, there are no widely agreed-on definitions for which groups are masked in a “single-blind” or “double-blind” study, so these terms should not be used (Lang and Stroup 2020). Instead, specify who was blinded to interventions. Examples include participants, care providers, and those assessing outcomes.
Describe the similarities and differences between a placebo or sham procedure and the active drug or the trial procedure. Testing the effectiveness of blinding after the trial is over is uninformative because the results cannot be separated from pretrial expectations of the success of the intervention (Sackett 2004). Instead, indicate whether the interventions could be distinguished by the participants or those assessing outcomes. Report how blinding was maintained and whether or how it may have been compromised.
In all blinded studies using clinical examiners, specify what the assessor was blind to. For example, in a periodontal therapy trial, it is preferable for the assessment of end-of-study pocket depth to be done blinded to the baseline value, as well as to group membership. Report whether laboratory values (e.g., IL-6) were assayed blind to group membership.
The process of capturing data—the operational details of turning a concept into an entry in a database—bears directly on the validity of the data collected. How this process occurs can improve (or limit) the completeness and accuracy of the information used for analysis. Report information sufficient for a reader to judge these important details and to reproduce the process in future research.
Survey instruments (regardless of whether they are conducted on paper, via phone, or electronically) should be identified or provided in supplemental material. Cite a reference for any validation studies or established rating scales used, and disclose any modifications. Report how subscales or dimensions were scored and indicate any important thresholds (e.g., an established “normal” range, “high” or “low” scores). If the scale uses a “total score,” consider the effect of missing values (specifically, a missing value should not necessarily be scored as zero).
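To show why a missing item should not silently count as zero, the sketch below contrasts a naive total with a prorated total. Prorating and the 80% completion threshold are assumptions made only for illustration; the scoring manual of the instrument actually used takes precedence, and the function name and item values are hypothetical.

```python
from typing import Optional, Sequence


def prorated_total(items: Sequence[Optional[float]],
                   min_answered_fraction: float = 0.8) -> Optional[float]:
    """Scale the mean of the answered items up to the full item count.
    Returns None when too few items were answered to score the scale."""
    answered = [x for x in items if x is not None]
    if not items or len(answered) / len(items) < min_answered_fraction:
        return None
    return sum(answered) / len(answered) * len(items)


responses = [3, 4, None, 2, 5]  # one of five items left blank
naive_total = sum(x for x in responses if x is not None)  # blank scored as 0
print("naive total:", naive_total)
print("prorated total:", prorated_total(responses))
```

Whatever rule is used, the Methods should state it so readers can tell how incomplete questionnaires affected the reported scores.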
Large databases—clinical, administrative, billing—are increasingly available for analysis but have several characteristics that must be addressed (Katz 1997). Information recorded for another purpose must be converted into a research database. Report how the original information was collected and how entries are used or combined into variables for the study. Specifically, describe the classification methods for interventions, exposures, outcomes, and confounders. Consider the risk of misclassification because medical records are limited in studying clinical topics and often contain errors and omissions in clinically important areas (Hornberger and Wrone 1997).
For example, the lack of a CDT code (Code on Dental Procedures and Nomenclature) for caries in a database from a periodontal practice does not necessarily indicate noncarious dentition. That is, a missing value may lead to misclassification bias or to unmeasured confounding. Attend to time-stamped records to ensure that the values of predictor variables precede the encoding of outcomes (and not the other way around).
The codes and algorithms for subject selection should be either given in detail or made available on request, including how the algorithms were validated. Report how records are linked between databases. If the algorithms are extensive, consider including this information in supplemental material accompanying an article.
Science depends on measurement (“Can’t measure it, can’t do science on it.”). Accordingly, describe any training, experience, calibration, monitoring, or other efforts to improve the accuracy and consistency of measurements. Indicate the number of measurements taken for each outcome of interest, the number of independent observers, and level of inter- and intraobserver variability (e.g., κ, percent agreement, intraclass correlation coefficients). Disagreements between assessors are common (Holtfreter et al. 2012).
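For two examiners rating the same sites on a categorical scale, percent agreement and Cohen's κ can be computed as in the sketch below. The ratings are invented for illustration (1 = lesion present, 0 = absent), and the function name is hypothetical; weighted κ or intraclass correlation would be needed for ordinal or continuous measures.

```python
from collections import Counter


def cohens_kappa(rater1, rater2):
    """Return (percent agreement, Cohen's kappa) for two raters' labels."""
    if len(rater1) != len(rater2) or not rater1:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n
    c1, c2 = Counter(rater1), Counter(rater2)
    # Agreement expected by chance, from each rater's marginal frequencies.
    expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return observed, (observed - expected) / (1.0 - expected)


examiner_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
examiner_b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
agreement, kappa = cohens_kappa(examiner_a, examiner_b)
print(f"percent agreement = {agreement:.0%}, kappa = {kappa:.2f}")
```

In this invented example the examiners agree on 8 of 10 sites, yet κ is only 0.60 because half of that agreement would be expected by chance, which is why reporting both figures is informative.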
Describe what assessors knew about the study participant before they rendered a judgment. Report whether clinical outcomes were determined by the same individuals implementing the intervention(s). The independent assessment of predictors and outcomes adds to the credibility of findings.
The Methods section should identify potential sources of error, confounding, and bias and tell how these issues were addressed in the design or analysis. Describe how the role of potential confounders was addressed, including the use of stratification or statistical adjustment.
There are a large number of potential sources of bias to consider in the design, execution, analysis, and interpretation of research (Hartman et al. 2002). If your overview of the problem (item 4) identified potential bias in previous research, report your methods to overcome these difficulties.
Statistical Methods
In hypothesis-driven research, but especially in RCTs, a sample size should be reported and based on an a priori power calculation. Report the assumptions made in the determination (e.g., effect size, estimates of variability, expected dropout rates). Where possible, include an estimate of the minimum clinically important difference on the primary outcome. Calculations should consider confounding variables as well as the implications of insufficient enrollment, dropouts, or missing data (Hsieh 1989). Split-mouth studies require additional documentation (e.g., the standard deviation of the within-person differences) for sample size calculations (Pandis 2012).
For example, the minimum clinically important difference of 9 points (out of 80) on an oral health quality of life scale guided the sample size determination in a removable partial denture framework study (Ali et al. 2020).
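To show how the reported assumptions feed into a sample size, here is a minimal sketch for comparing 2 independent group means using the normal approximation. The 9-point difference echoes the example above, but the standard deviation of 15 points and the 15% dropout rate are hypothetical values chosen only for illustration. A t-based calculation, as performed by most statistical software, would give a slightly larger number.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80, dropout=0.0):
    """Approximate participants per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the type I error
    z_beta = z.inv_cdf(power)            # value corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    # Inflate for expected dropout, then round up to whole participants.
    return math.ceil(n / (1 - dropout))


# Hypothetical assumptions: 9-point difference, SD of 15 points.
print(n_per_group(delta=9, sd=15))                 # no dropout assumed
print(n_per_group(delta=9, sd=15, dropout=0.15))   # 15% dropout assumed
```

Every argument to the function corresponds to an assumption that should appear in the manuscript.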
Power calculations to determine sample size are not always required; for example, the analysis data may have been previously collected for another purpose. In studies where the sample is fixed, describe how the study size is sufficient to estimate clinically meaningful differences. If a study is “too small,” the confidence intervals may be too wide to support a meaningful conclusion; if a study is “too big,” clinically inconsequential differences may be found to be statistically significant (Altman and Bland 1995).
Studies designed to test equivalence or noninferiority (or “just as good as” studies) differ from superiority trials (studies of differences). Among other things, equivalence studies require a prespecified range of clinically important therapeutic effects (i.e., the equivalence margins) that must be reported and justified. See the article by Piaggio et al. (2012) for sample size calculation in equivalence studies.
Statistical analyses should include a predetermined plan for analyzing the primary outcome, with specific objectives and clear plans for addressing secondary and exploratory aims. Specify the statistical software used (e.g., SAS). If necessary for clarity, briefly note the procedures/extensions used (e.g., PROC LOGISTIC). As needed, report details in supplemental information.
The goal is to describe statistical methods with enough detail to enable a knowledgeable reader to assess the validity of the results and for those with access to the original data to verify the reported results. The data set and computer code used to perform the analysis should be available if requested.
An RCT of a single outcome may rely on randomization to justify a simple comparison reflecting the primary aim. An RCT with a longitudinal measure (e.g., baseline and follow-up) may also rely on randomization to ensure baseline comparability. By definition, only a longitudinal study may assess “change.” A repeated-measures mixed-model approach to account for baseline imbalance should be considered. In observational studies, analysis of covariance (ANCOVA) or adjustment for baseline values is generally inappropriate (Blance et al. 2007; Etminan et al. 2021).
Always report absolute risks because all other expressions of risk can be derived from these. Be aware that analyzing percent change (the difference from baseline divided by the baseline value) may violate several statistical assumptions and can easily exaggerate effects, so such analyses should be avoided.
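The point that other expressions of risk follow from the absolute risks can be sketched as below. The function name and the event counts are hypothetical, and the sketch assumes that both risks lie strictly between 0 and 1 and are not equal.

```python
def risk_measures(events_treated, n_treated, events_control, n_control):
    """Derive common effect measures from the two absolute risks."""
    risk_t = events_treated / n_treated
    risk_c = events_control / n_control
    difference = risk_t - risk_c
    return {
        "risk_treated": risk_t,
        "risk_control": risk_c,
        "absolute_risk_difference": difference,
        "relative_risk": risk_t / risk_c,
        "odds_ratio": (risk_t / (1 - risk_t)) / (risk_c / (1 - risk_c)),
        "number_needed_to_treat": 1 / abs(difference),
    }


# Hypothetical trial: 12 of 100 treated and 20 of 100 control participants
# developed new caries lesions.
measures = risk_measures(12, 100, 20, 100)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The reverse is not possible: a relative risk of 0.6 alone does not reveal whether the underlying risks were 12% and 20% or 0.12% and 0.20%.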
For information on analyzing and reporting equivalence or noninferiority studies, see the articles by Piaggio et al. (2012), Flight and Julious (2016), and Ebbutt and Frith (1998).
Even in observational studies, it must be clear who was included in every analysis. Missing outcome data can be problematic but can be accommodated by modern statistical methods. In clinical trials, some patients may drop out, not receive the intended treatment, or not adhere to the trial protocol. To preserve the benefits of an RCT, intention-to-treat analysis (ITT) is recommended (Wood et al. 2004). Simply stated, ITT analysis means “once randomized, there analyzed” regardless of whether subjects actually received the allocated interventions or whether they adhered to follow-up visits or trial protocol. This analysis requires 2 conditions: all randomized patients should be included in the analysis, and they should be analyzed in the group to which they were allocated.
For superiority trials, report the ITT analysis for the primary outcome (Lachin 2000). Exclusion of eligible participants for any reason is incompatible with the intention-to-treat principle and may bias the results. Accordingly, include all randomized participants in the primary outcome analysis. This conservative approach acknowledges that participants may drop out
There is no consensus on acceptable “modified ITT” criteria (Brody 2016). Modified ITT is often reported inconsistently, and its use has increased (Abraha and Montedori 2010). Deviations from ITT described as “modified ITT” may exclude patients who did not commence their randomized intervention (as-treated analysis), patients without a baseline assessment, patients without a postbaseline assessment, patients not returning for follow-up assessments, or patients found to lack a specific diagnosis at entry. Report the justification for modifications to the standard criteria. A per-protocol analysis includes only participants who completed the study without major departures from the protocol. Such an analysis may be reported (it can indicate effectiveness) but should not supplant the ITT analysis (Shrier et al. 2014). Published reports of clinical trials should clearly distinguish between ITT analyses and all other forms by describing who was included in each analysis.
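The distinction between analysis populations can be made concrete with a small sketch. The record structure, arm labels, and participants below are hypothetical; the per-protocol rule used here (received the allocated arm and completed follow-up) is only one possible definition and would need to be stated in a real report.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    pid: int
    allocated: str            # arm assigned at randomization
    received: Optional[str]   # arm actually received; None if never treated
    completed: bool           # finished follow-up without major deviations


def itt_counts(participants):
    """Everyone randomized, counted in the arm to which they were allocated."""
    return Counter(p.allocated for p in participants)


def per_protocol_counts(participants):
    """Only those who received their allocated arm and completed the protocol."""
    return Counter(
        p.allocated for p in participants
        if p.received == p.allocated and p.completed
    )


# Hypothetical trial records.
trial = [
    Participant(1, "test", "test", True),
    Participant(2, "test", "test", False),      # dropped out
    Participant(3, "test", None, False),        # never started treatment
    Participant(4, "control", "control", True),
    Participant(5, "control", "test", True),    # crossed over
    Participant(6, "control", "control", True),
]
itt = itt_counts(trial)
pp = per_protocol_counts(trial)
print("ITT:", dict(itt))
print("Per protocol:", dict(pp))
```

Reporting both sets of counts, with the rule that produced each, lets readers see exactly who was included in each analysis.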
Many clinical studies analyze the data for 1 or more subgroups. Planned and well-specified subgroup analysis has a stronger basis for inference. In contrast, post hoc subgroup analysis is at high risk for spurious findings and is typically discouraged; at a minimum, it should always be identified as exploratory (Pocock et al. 1987; Mills 1993). For case-control and cohort studies, analyzing subsets of the study population that were not part of the original study objectives is not appropriate.
The strongest inferences are made in trials that are completed as planned (i.e., reaching 1 or more of the following planned goals: obtaining an adequate sample size, collecting follow-up data from a sufficient number of patients, having event counts sufficient for analysis, or closing the trial on the scheduled date).
However, sometimes trials are stopped early when an interim analysis triggers a statistical stopping rule. Performing multiple statistical analyses as the data accumulate during a trial (usually for safety reasons) weakens inference and increases the chances of reporting spurious results unless appropriate statistical corrections are made. The timing of all interim analyses should be reported, as should adjustments made to account for multiple analyses (e.g., multiple comparison or group sequential methods) if the interim results are to be published.
As above, when an interim analysis finds that a treatment is exceptionally effective or exceptionally harmful, the trial may be stopped early. Withholding an effective intervention from the control group may be unethical, as is continuing to subject the treatment group to a harmful intervention. If applicable, report how an independent data monitoring committee examined the accumulating data and include any formal statistical stopping rules.
In dental studies, missing teeth pose a unique problem. Before the study begins, identify strategies to accommodate missing teeth (e.g., the measurement of a contralateral tooth). In longitudinal studies, when a tooth was measured at baseline but is no longer present at follow-up, it should not be considered “missing” in the statistical sense but rather could be considered a negative outcome. This problem is similar to that of what to do with patients who drop out of a clinical trial, except that in oral health studies, a tooth may drop out and the patient remain.
State specifically how missing data were handled in the analyses. Measures to prevent missing data and to retain participants should also be described. Missing data may be associated with loss of power and potential bias.
Unless missingness is rare, complete case analysis—excluding participants (or teeth) with missing data—is rarely justified. “There are no universally applicable methods for handling missing data” (Shih 2002). We recommend assessing the differences between comparison groups in retention rates and patterns of “missingness” and exploring characteristics likely to be associated with missing data or dropouts. Describe the analytic approach used to address missing data, including methods for imputation and any sensitivity analyses used to explore the potential impact of missing data.
Prematurely dividing a continuous distribution of values into 2 or more categories can reduce statistical power and may introduce bias, depending on how the categories are determined. Thus, continuous variables should be maintained as such during analysis unless well-established and accepted criteria justify categorization. Report why the categories were created, when they were created (before or after data collection), and where and how the boundaries were assigned. This guideline does not preclude categorizing variables after analysis, rescaling units to be more clinically meaningful or to simplify communication and promote clinical utility.
Details on data transformation and imputation should be included in the statistical analysis section. If skewed data were mathematically transformed for analysis, indicate the transformation used (e.g., square root, log) and whether the transformation was successful (i.e., suitable for analysis with parametric methods). When describing the results, transform the results back to make them clinically meaningful (e.g., “square root follow-up time” should be back-transformed to months). If results are best expressed as percentage change, use the preferred method of analysis and then convert the summary statistics into absolute or relative risk (Vickers 2016).
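Back-transformation can be sketched as follows for log-transformed data, where the back-transformed mean is the geometric mean on the original scale. The follow-up times are hypothetical, and the interval uses a normal approximation as a simplifying assumption; with a sample this small, a t-based interval would be wider and more appropriate.

```python
import math
from statistics import NormalDist, mean, stdev


def geometric_mean_ci(values, level=0.95):
    """Analyze on the log scale, then back-transform the mean and its
    confidence limits to the original units."""
    logs = [math.log(v) for v in values]
    log_mean = mean(logs)
    log_se = stdev(logs) / math.sqrt(len(logs))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (
        math.exp(log_mean),
        math.exp(log_mean - z * log_se),
        math.exp(log_mean + z * log_se),
    )


# Hypothetical, right-skewed follow-up times in months.
months = [2, 4, 8, 16, 32]
gm, lower, upper = geometric_mean_ci(months)
print(f"Geometric mean: {gm:.1f} months (95% CI = {lower:.1f} to {upper:.1f})")
```

Note that the back-transformed interval is not symmetric around the estimate, which is expected.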
Describe the predetermined analysis plan for the primary outcome. List specific objectives and clearly address plans for secondary or exploratory aims.
The study design should drive the modeling approach of the specific aims. If variable selection is employed, it should follow a well-defined procedure to control for potential bias in the final model. If possible, determine whether interaction between predictors is present; if so, describe effect modification (Hyman 2006). For prediction modeling, all “candidate” predictors should be evaluated holistically (Steyerberg and Harrell 2016). If applicable, identify the variable-selection process used (Nguyen et al. 2019; Talbot and Massamba 2019). Be aware that data-driven methods have been shown to be biased toward too high an estimate with too narrow confidence limits.
In any study where there are multiple measurements on the same individual, the correlation between these measures should be considered. In within-person trials, each participant is subjected to 2 or more treatments, and measurements are therefore correlated. In such trials, a group is the set of participants’ body sites allocated to a particular intervention or to the order in which the interventions are given. Report the statistical methods appropriate for the specific within-person design employed (Pandis et al. 2019). Report the observed correlation between body sites for continuous outcomes and tabulation of paired results for binary outcomes. In these trials, the expected correlation of within-person treatment outcomes should be incorporated when estimating the sample size (Hujoel and Loesche 1990; Hujoel 1998). In designs in which segments, quadrants, or half-dentitions within each subject are assigned interventions, consider possible carryover effects (Chilton and Fleiss 1986; Hujoel and Moulton 1988; Lesaffre et al. 2009).
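The effect of within-person correlation on sample size can be sketched as follows for a paired continuous outcome. The sketch assumes equal variances at both sites, a normal approximation, and no carryover; the numeric inputs (a 1-mm difference and a standard deviation of 2 mm) are hypothetical.

```python
import math
from statistics import NormalDist


def sd_of_differences(sd, rho):
    """SD of within-person differences when both sites share the same SD
    and their outcomes have correlation rho."""
    return sd * math.sqrt(2 * (1 - rho))


def pairs_needed(delta, sd, rho, alpha=0.05, power=0.80):
    """Approximate number of participants (pairs of sites) for a paired
    comparison of means, using the normal approximation."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return math.ceil((z_total * sd_of_differences(sd, rho) / delta) ** 2)


# Hypothetical inputs: detect a 1-mm difference when the SD is 2 mm.
for rho in (0.0, 0.5, 0.8):
    print(f"rho = {rho}: {pairs_needed(delta=1, sd=2, rho=rho)} participants")
```

The stronger the positive correlation between sites, the fewer participants are needed, which is why the assumed correlation should be reported and justified.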
Ancillary analyses are intended to support the preplanned primary (and perhaps secondary) analyses. Analyses suggested by the data are addressed in item 32.
A sensitivity analysis can determine the robustness of the findings to changes in methods or assumptions. Subgroup or interaction analysis may be used to explore whether the findings are consistent in subpopulations. Missing data can lead to potentially biased results and loss of power. If missing values are imputed, document the prevalence of missing cases for each variable and describe the method of imputation. Multiple imputation requires reporting the results of sensitivity analyses.
There is no consensus on how to assess the assumptions that underlie common analysis methods (Nørskov et al. 2021). Many assumptions cannot be statistically established, and only context knowledge will serve to guide the analyses (e.g., see item 30, correlated data). The validity of results may be enhanced by reporting clear, complete, and transparent assessments—likely in supplemental material because of publication limitations.
Whereas ancillary analyses may support the primary aims, analyses suggested post hoc by the data or initial analyses can only be considered exploratory. As Marcia McNutt, past editor of
If exploratory findings are to be reported (item 41), specify clearly the way the data were approached for the exploratory analyses. The limitations of post hoc analyses are explicit if the process is transparent.
A credible scientific claim has many components, and statistical analyses continue to be critically important in supporting claims. However, researchers often rely on “
That said, in carefully designed and executed clinical trials or in large population-based sampling studies, sometimes it is appropriate to base statistical inference on classical null hypothesis significance testing. In such cases, it is important to emphasize that a
In most instances, the estimated result and its 95% confidence interval are preferred. A larger interval indicates a less precise estimate, so the range of the interval should also be considered in the interpretation of potential clinical importance. More important, an interval that contains both clinically important and unimportant values usually suggests ambiguous results and should be interpreted with caution.
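One way to make this interpretation explicit is to compare the interval with a prespecified minimum clinically important difference (MCID). The classification rule and labels below are an illustrative assumption, not a standard taken from the guidelines, and the numeric intervals are hypothetical.

```python
def interpret_interval(lower, upper, mcid):
    """Classify a confidence interval for a difference against an MCID.

    The interval is treated as symmetric in meaning: differences of at
    least `mcid` in either direction count as clinically important.
    """
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if mcid <= 0:
        raise ValueError("mcid must be positive")
    if lower >= mcid or upper <= -mcid:
        return "clinically important"      # every plausible value is important
    if -mcid < lower and upper < mcid:
        return "clinically unimportant"    # every plausible value is unimportant
    return "ambiguous"                     # interval spans both kinds of value


# Hypothetical 95% CIs for a difference on an 80-point scale, MCID = 9 points.
print(interpret_interval(10, 14, mcid=9))
print(interpret_interval(-2, 3, mcid=9))
print(interpret_interval(2, 12, mcid=9))
```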
The decision to accept or reject a manuscript based on “statistically significant” results should be replaced with criteria based on the strength of the study design and analyses. Similarly, outcomes having small
Never report only “The results were statistically significant (
For large calculated
Discontinue designations for levels of significance—for example, a single asterisk for
With one exception, never report a
Analyses of large databases often can produce very small probabilities (
Results: What Did You Find?
The obvious purpose of the Results section is to report and describe the findings of the study: the data that were collected and the relationships among them. A purpose just as important is to tell what happened during the study, such as protocol deviations, changes in the intervention, and unexpected data losses. Numbers in the text are difficult to read and compare, so data should be reported in tables or graphs whenever possible and duplicated in the text as little as possible (Council of Science Editors 2015; Christiansen et al. 2020). Ideally, call attention to general results in the text and refer readers to the details in tables and figures: for example, “Periodontal disease was present in 28% of the patients (Table 4)” (Lang 2010).
If there are changes in the protocols or other important revisions in the conduct of the study, describe them in the first paragraph of results. Typically, however, the Results section should begin with a description of the participants, include simple presentations of one-variable-at-a-time results, and end with results of a multivariable model. Sufficient detail should be provided so that results can be verified and integrated into other analyses (Lang and Altman 2013).
Although randomized controlled trials are considered the strongest research design because they provide the most control over bias, most studies are observational. Thus, the findings from observational studies should be phrased using terms such as “association” or “related.” Avoid terms that connote causality such as “lead to,” “effect,” “influence,” and “produce” unless the result arises from an appropriate analysis of a causal model (Bellamy et al. 2007).
Report the numerators and denominators for percentages, rates, and ratios. Summarize continuous data with a measure of central tendency and a measure of variability. For distributions that are reasonably symmetrical, means and standard deviations are appropriate. For nonsymmetric data, give the median and an appropriate percentile range. Do not use the standard error of the mean (SE or SEM) to describe the variability of observations. The SE is an inferential statistic, not a descriptive one. (A range encompassing an estimate ±1 SE represents a 68% CI.)
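The difference between the standard deviation (a descriptive statistic) and the standard error (an inferential one) can be sketched as follows. The measurements are hypothetical, and the quartiles use the default method of Python's statistics module, which may differ slightly from other software.

```python
import math
from statistics import NormalDist, mean, median, quantiles, stdev


def describe(values):
    """Descriptive summary of a continuous variable."""
    n = len(values)
    sd = stdev(values)                      # variability of the observations
    q1, _, q3 = quantiles(values, n=4)      # 25th and 75th percentiles
    return {
        "n": n,
        "mean": mean(values),
        "sd": sd,
        "se": sd / math.sqrt(n),            # precision of the mean, not spread
        "median": median(values),
        "iqr": (q1, q3),
    }


# Hypothetical probing depths in mm.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)

# Under a normal model, estimate +/- 1 SE covers about 68% of the
# sampling distribution, which is why it amounts to a 68% CI.
coverage = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(f"Coverage of +/- 1 SE: {coverage:.1%}")
```

Because the SE shrinks as the sample grows, it will always look smaller than the SD and can give a misleading impression of how much the observations vary.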
Visually summarize the research design and analysis populations in a flow diagram. The diagram can show the number of participants at each stage of sample selection, the size of each group and subgroup in the analysis, and the number of participants with various outcomes. Another organization for the flow diagram identifies a target population, a source population screened from the target population, an eligible sample selected from the source population, and the study participants enrolled from those eligible. Both the numerators and denominators for intention-to-treat and per-protocol analysis can be shown, for example. The flow diagram also allows all participants to be accounted for at each stage of the study by checking the numbers.
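The accounting check that a flow diagram makes possible can be sketched as below. The stage labels and counts are hypothetical; the only rule checked is that the number entering each stage equals the number entering the previous stage minus those excluded there.

```python
def check_flow(stages):
    """Return messages for any stage whose counts do not reconcile.

    `stages` is an ordered list of (label, n_entering, n_excluded) tuples.
    """
    problems = []
    for (label, n_in, n_out), (next_label, next_in, _) in zip(stages, stages[1:]):
        expected = n_in - n_out
        if expected != next_in:
            problems.append(
                f"{label} -> {next_label}: expected {expected}, reported {next_in}"
            )
    return problems


# Hypothetical participant flow.
flow = [
    ("screened", 250, 70),
    ("eligible", 180, 30),
    ("randomized", 150, 0),
    ("included in ITT analysis", 150, 0),
]
print(check_flow(flow) or "All participants accounted for")
```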
Participant characteristics are often best reported in tables with standard descriptive statistics. Generally, because items are more easily compared side-to-side (space permitting), groups should be named in the column headings, and the variables on which they are compared should be named in the row headings (Lang 2018). Column headings usually also indicate the size of each group (e.g., “
Report numbers and measurements with an appropriate degree of precision. Percentages are preferred to proportions. Numerators and denominators should always be clear and easily found. Round to whole numbers unless there is a compelling reason not to. Reporting more than 2 decimal places is rarely needed.
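A small helper can keep numerators and denominators next to every percentage. The function name and output format are hypothetical; journals may prefer a different layout.

```python
def format_percent(numerator, denominator, decimals=0):
    """Format a percentage together with the counts it was derived from."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    pct = 100 * numerator / denominator
    return f"{pct:.{decimals}f}% ({numerator}/{denominator})"


# Hypothetical counts.
print(format_percent(14, 50))
print(format_percent(1, 3, decimals=1))
```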
In an RCT, do not compare baseline differences with significance tests; by definition, any imbalances occurred by chance. Consider imbalance in light of the ability of the predictor to influence the outcomes. Consider incorporating clinically important imbalances into the analyses and report how the choice was made. In case-control and cohort studies, this is also important.
Therapies and case definitions continuously evolve, and the exposures and risks in a community can change profoundly. Report the actual time frame of the study so that it may be compared to others.
Summarize the results in clinically meaningful or practical terms. For example, “Brushing with fluoride toothpaste had a statistically significant effect on the mean number of decayed, missing, and filled primary tooth surfaces (DMFS) . . . for populations at high risk of developing caries [standardized mean difference = −0.25 (95% CI = −0.36 to −0.14)]” reports the result as a standardized difference. Reporting in clinically meaningful terms would phrase it as “[DMFS difference = −1.92 (95% CI = −2.49 to −1.32)].” Include starting and ending values and a brief summary of an analysis for the primary outcome of interest, as well as any prespecified secondary outcomes identified in the methods and each of the primary covariates. All baseline and end-of-study descriptive statistics should be accompanied by appropriate measures of variability. As with group descriptions, group comparisons are usually best summarized in a table. As a result, the differences and their confidence intervals (and
There is a difference between reporting a “difference” between independent groups and reporting a “change” across time within a group—a difference reflected in the statistical analysis used. But there are any number of “within-group” studies in oral health; split-mouth and crossover designs come to mind. In these cases, it is important to report the repeated-measures analysis method employed and to report results accounting for the within-person correlation; see item 30: correlated data.
Tables and graphs used to collect or analyze data may not be optimal for communicating data. They should be designed to present patterns in the evidence, such as trends, differences, or associations, especially to clarify relationships that would otherwise be difficult to explain in the text. Tables can effectively summarize and compare detailed information. Figures effectively show trends and patterns in the data (Council of Science Editors 2015; Christiansen et al. 2020).
Tables and graphs should complement rather than duplicate each other or the text. Because missing data are common, the sample size should be clear for every summary statistic. Consider including a column with the number of participants in a table. Tables and figures should be understandable without undue reference to text. Except for horizontal lines, tables should generally be free of lines, boxes, arrows, or other devices unless they indicate the structure of the data (Lang and Secic 2006; Lang 2010).
Clinical and laboratory images (e.g., radiographs, photographs, electrocardiograms, blots) differ from other visuals in scientific publications because they do not present, organize, or summarize information; they are the information. For this reason, images must be well documented. The 6 CLIP principles (Lang et al. 2012) identify key information that could or should be reported:
Identify the subject of the image.
Tell how the image was acquired.
Explain why the specific image was selected.
Describe any modifications of the image after it was obtained.
Emphasize the important details of the image itself.
Interpret and give the implications of the image.
The overarching goal is that an image must correctly and clearly represent the scientific content. However, the National Institutes of Health’s Office of Research Integrity reports that more than 80% of accusations of misconduct involve image manipulation (Office of the Secretary 2017). Always retain the unprocessed image and clearly document all changes made to the submitted image. Follow journal guidelines for permissible processing. The most common problematic manipulations are undisclosed instances of the following (Rossner and Yamada 2004):
Splicing different images together into a single image
Changing brightness and contrast on only part of the image
Using cloning tools to hide details
Cropping images to eliminate information
Describe participants who did not complete the protocol (e.g., those leaving the study, lost to follow-up, whose treatment was ended, and those who deviated from the protocol).
“‘Harm’ is the totality of possible adverse consequences of an intervention or therapy; they are the direct opposite of benefits, against which they must be compared” (Ioannidis et al. 2004). Report expected and unexpected adverse consequences so that readers may make informed decisions about using interventions in practice. Even in observational studies, consider the effect of dropouts and loss to follow-up on the results. Adverse outcomes can impact the validity of the study or affect whether it is ethical to continue a longitudinal study.
Especially in RCTs, where harms may be caused by the intervention, report any harms or adverse events (Ioannidis et al. 2004). Describe or identify harms with standard definitions, including any grades for severity and extent, how they were detected, whether or not they were prespecified, whether they were anticipated or unexpected, and whether they were attributed to an intervention. Provide a balanced discussion of benefits and harms in the context of a study’s limitations and generalizability. When necessary, report harms in supplementary tables.
The results of unadjusted analyses may be reported, often as a prelude to the definitive findings from the adjusted—multivariable and/or multivariate—analysis. Unadjusted analyses should not be used for final interpretation unless confounding can be excluded.
For each multivariable analysis, report the measure of association or difference with corresponding confidence intervals for all variables in the final model, including any interaction terms. If the number of confounders or covariates is large, this detail could be included in supplemental material so that the summary table in the manuscript can focus on the primary factors of interest. Provide an appropriate measure of the model’s goodness of fit to the data (e.g.,
Research results commonly suggest further analyses. These post hoc analyses must be interpreted more cautiously than planned comparisons. Subgroup analyses and comparisons especially must be interpreted carefully, given the reduced statistical power associated with smaller sample sizes and the increased number of hypothesis tests, which can create the multiple comparisons problem of false positives and can lead to claims of “data dredging” (Erasmus et al. 2022). A post hoc power calculation should not be reported as it provides no additional information (Christogiannis et al. 2022).
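The multiple comparisons problem can be illustrated with a short calculation. The sketch assumes that the tests are independent and that every null hypothesis is true, which is a simplification because subgroup tests on the same data are usually correlated. The Bonferroni threshold is shown as one common, conservative correction, not as the method recommended here.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance threshold that keeps the familywise rate
    at or below alpha."""
    return alpha / n_tests


for k in (1, 5, 10, 20):
    print(
        f"{k:>2} tests: chance of a false positive = "
        f"{familywise_error_rate(k):.2f}, "
        f"Bonferroni threshold = {bonferroni_threshold(k):.4f}"
    )
```

With 10 unplanned comparisons, the chance of at least one spurious “significant” finding is about 40% under these assumptions.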
Discussion: What Does It Mean?
When writing the Discussion (and the cover letter to the journal), keep in mind the advice attributed to Franz Ingelfinger, editor of the
Answer the research question posed in the Introduction. Briefly summarize the study but emphasize the final results. With a prespecified analytic plan, including adjustments for multiple comparisons, results may be reliably expressed for the primary analyses.
Avoid emphasizing results suggested by the data. Results from post hoc analyses should be labeled as descriptive or exploratory only and should be summarized separately.
Ensure that the main conclusions match those in the abstract (see item 3). Discrepancies in the information reported in the abstract with that reported in the article are distressingly common, serious, widespread, and longstanding (Zhang and Liu 2011; Bastian 2014; Lang 2022).
Discuss both the expected and unexpected results. The estimated treatment effect should be accompanied by a measure of precision (typically a 95% CI) and should be interpreted in terms of clinical or practical importance (Brignardello-Petersen et al. 2013). The implications of both the lower and upper limits of CIs should be considered when assessing clinical or practical importance.
For each research question, compare and contrast the findings of others with the results presented in this study. Depending on the topic, references more than 5 or 10 y old are generally less relevant, with the exception of seminal articles or comprehensive reviews. Cite the original source when possible; secondary sources are often incomplete and inaccurate. Read the full reference (not just the abstract) before citing it.
Describe the extent to which the study data may be representative of the population of interest (see item 12) (Shadish et al. 2002). In clinical trials, this may be affected by the approach (see item 7). Indicate how the results might be applied to other populations or settings. Generalizing often requires speculation, which should be acknowledged in the article.
If reasonable, speculate about how the findings might improve patient care if the intervention were to be widely adopted. If the results do generalize to other populations or settings, call attention to possible implications. For example, a more sensitive diagnostic test may detect more cases, increasing the number of patients treated and increasing the total treatment costs. A more effective but expensive treatment may not be affordable to the patients eligible to receive it. A new technology might require specialized maintenance capabilities and special training for those who use it.
Avoid saying that “more research is needed.” More research is always needed. Instead, if possible, suggest specific ways in which future research might be improved.
The Cochrane Collaboration has a useful tool for recognizing the main sources of potential bias: the ROBINS-I tool for assessing risk of bias in clinical studies (Sterne et al. 2016). Potential sources of bias include participant selection, unmeasured or uncontrolled confounding factors, inconsistent interventions, imprecise measurements, protocol deviations, missing data, variation in judgments, and selective reporting of results. The major limitations of retrospective and nonrandomized designs, self-reported surveys, analyses of databases or clinical registries, and so on are widely known and need not be reported.
Many authors do not report limitations for fear their paper might be rejected. If limitations are acknowledged, a reviewer knows the authors were competent enough to recognize a limitation and honest enough to acknowledge it. Readers appreciate modesty as well.
Listing each conclusion promotes specificity and helps readers better understand the research. Do not overstate the implications of the research and do not speculate on the conclusions.
Results are not conclusions. “We found a 65% reduction in dental caries” is a result, not a conclusion. A conclusion identifies the clinical or practical implications. For example, “We believe the data clearly support the use of this treatment in children at high risk for caries in supervised brushing environments.”
Conclusions from clinical trials—randomized or not—should be based on the results of the primary outcome measure as analyzed in a prespecified statistical analysis plan.
Conclusions from observational studies should be based on the results of multivariable models or other methods that control for correlated data, potential confounding, and effect modification. Conclusions should not be based on unadjusted analyses with a single predictor (independent or explanatory variable) unless confounding can be excluded.
Closing
Clinical research is difficult, and truth is elusive. The best that can be done is to conduct a well-designed study as rigorously as possible, to acknowledge its shortcomings, to present the results fairly, and to interpret treatment effects carefully, neither overstating their importance nor understating uncertainty (Pollock 2020).
Evidence-based dentistry is literature-based dentistry (Lang 2010). Clinicians, authors, reviewers, and editors should take the time to learn how to accurately report and assess the validity, relevance, and implications of the published literature. The Cochrane Center is the premier site for systematic reviews (The Cochrane Collaboration n.d.). Sites such as the ADA Center for Evidence-Based Dentistry (Center for Evidence-Based Medicine n.d.) and the University of Dundee Centre for Evidence-Based Dentistry (University of Dundee, School of Dentistry n.d.) make it easy to find clinical guidelines. Such clinical guidelines depend directly on the existing evidence and on the ability to appraise that evidence.
Ultimately, patient care is improved when valid and useful research is planned, executed, and communicated to practitioners. The guidelines presented here should assist authors in preparing research reports, journal editors in reviewing those reports, and clinicians in understanding those reports. Journal editors can also disseminate these guidelines by including them in their instructions to authors and insisting that authors follow them as a condition of publication. We also hope the OHStat Guidelines will serve as a template for updating and informing increasingly useful oral health research reports.
Author Contributions
A.M. Best and T.A. Lang contributed to the conception and design of the guidelines, took the lead in organizing, drafting, and documenting the original manuscript, and incorporated comments and insights from the other authors. J.C. Gunsolley, E. Ioannidou, and B.L. Greenberg contributed to the conception of the guidelines, critically appraised each revision, and provided substantive comments and insights throughout the development process. All authors agree to be accountable for all aspects of the work and approved the final draft for publication.
Footnotes
Disclaimer
This article was written by the Task Force writing group authors, who take sole responsibility for the final content. The views expressed do not represent the policies, views, or opinions of the authors’ institutions.
Task Force Writing Group
A.M. Best, Virginia Commonwealth University; B.L. Pihlstrom, University of Minnesota; D.V. Dawson, University of Iowa; B.L. Greenberg, New York Medical College; E. Ioannidou, University of Connecticut; J.C. Gunsolley, Virginia Commonwealth University; J.S. Hodges, University of Minnesota; and T.A. Lang, Principal, Tom Lang Communications and Training International. P.B. Imrey, Cleveland Clinic and Case Western Reserve University, also provided suggested revisions and performed comprehensive reviews.
Reviews
Substantial written critiques were also provided by M. Glick,
Declaration of Conflicting Interests
The authors declared the following potential conflicts of interest with respect to the research, authorship, and/or publication of this article: All authors have completed the ICMJE unified competing interest form, a copy of which is available from the corresponding author.
To encourage dissemination of the OHStat Statement, this article and the checklist are freely available on
. This article has been simultaneously copublished in the
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The Task Force on Design and Analysis in Oral Health Research provided funding for the December 2020 meeting and provided support for the consultant (T.L.). A.B. received funding for travel related to the December 2020 meeting. None of the Task Force sponsors was involved in the planning, execution, or writing of the OHStat documents. Additionally, no funder played a role in the drafting of the manuscript.
