Abstract

Respected seniors, colleagues, students, and fellow readers, we move forward with the utmost care and caution to return to some semblance of normal life and activity after enduring a highly unusual and unprecedented pandemic. I urge you to follow and respect the basic guidelines of social distancing, mask usage, and proper sanitization while all of us resume our daily lives.
A gray area is a world that doesn’t like gray areas. But the gray areas are where you find the complexity, it’s where you find the humanity and it’s where you find the truth. —Jon Ronson
This is a wonderful quote to reflect on. Like me, many of us are ardent academicians and researchers who study, analyze, and publish on interesting topics (or at least try to!) in the orthodontic field. Many of us are also involved in postgraduate teaching, where journal clubs and thesis work are an integral part of the curriculum. In both situations, one of the biggest conundrums we face is the P value and the nagging implications of whether it falls above or below .05. This dichotomy dictates and defines our research.
Introduced in 1900 by Pearson, the P value is the probability, under a given statistical model and assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. The smaller the value, the greater the incompatibility of the data with the null hypothesis.1 Over time, the P value has become the preferred way to describe study results in the medical literature. Because it is the outcome of a statistical test, it is incorrectly assumed to be the most important aspect of the statistical analysis. As mentioned earlier, the P value indicates only the incompatibility of the data with the null hypothesis, NOT their compatibility with the study hypothesis; a small P value does not by itself justify accepting the alternative hypothesis, a fact that I found out during the course of a manuscript submission.2
During the revisions of a manuscript, I came across a very interesting editorial, “Moving to a World Beyond P < .05” by Wasserstein et al.3 Reading it thoroughly led me to a similar editorial by Harvey and Brinkhof.4 Both deal with the same topic: looking beyond the threshold and accepting our study findings without bracketing them as either P < .05 or P > .05. They envision a setting where P = .051 and P = .049 are not treated as all or none, so that our findings are not constrained by a division at the magic number .05. Doing so could make more studies replicable and reproducible, at least in part, and encourage more tailored study and statistical designs. Statistical thinking is given more importance than “statistical significance,” and the emphasis falls on how effectively the study design is communicated rather than on an outcome assessment based arbitrarily on the P value.
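To make the point about P = .049 and P = .051 concrete, the short sketch below converts each P value back into the test statistic that would produce it. It assumes a two-sided z test under a standard normal null, which is an illustrative choice and not something taken from either editorial; the function names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z_for_two_sided_p(p: float) -> float:
    """Return the |z| statistic whose two-sided normal P value equals p."""
    return STANDARD_NORMAL.inv_cdf(1 - p / 2)


def two_sided_p(z: float) -> float:
    """Return the two-sided P value for a z statistic under a standard normal null."""
    return 2 * (1 - STANDARD_NORMAL.cdf(abs(z)))


# Assumption: a two-sided z test; other tests would give different statistics.
z_049 = z_for_two_sided_p(0.049)
z_051 = z_for_two_sided_p(0.051)

print(f"P = .049 corresponds to |z| = {z_049:.4f}")
print(f"P = .051 corresponds to |z| = {z_051:.4f}")
print(f"Difference in |z|          = {z_049 - z_051:.4f}")
```

Under this assumption the two test statistics sit on either side of the familiar 1.96 cutoff and are almost indistinguishable, yet a strict threshold would label one study "significant" and the other not.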
This is a fresh and intriguing take on the subject. Although the debate among statisticians has been ongoing for a while, as the 2 above-mentioned articles note, we have rarely if ever come across such articles in our specialty. Nearly all orthodontic papers report dichotomized results, and clinical and observational inferences are drawn on that basis; viewed from this perspective, those inferences may not be entirely correct. Wasserstein et al3 recommended the following: “
Thus, it is now pertinent to ask, and to introspect to a degree as well: have we, at some point, failed to accept this uncertainty and reported P = .051 as statistically nonsignificant, and vice versa, and if so, were the authors scrutinized by the journal and its peer reviewers? How many times have we come across an article in a good peer-reviewed journal, especially in orthodontics, that reports such findings in an open and thoughtful manner? The answer is very few or none, more so for currently mainstream topics such as accelerated tooth movement or the efficacy of appliance x or y, where restricting our viewpoint to .05 may lead us to draw erroneous conclusions and clinical recommendations. Many may also wonder what the alternatives could be. There are several options, such as confidence intervals, credibility or prediction intervals, likelihood ratios, Bayesian statistics, and decision-theoretic modeling.4,5 These methods address the issue of effect size, focusing more on estimation than on testing.
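As one illustration of estimation over testing, the sketch below reports a mean difference with its confidence interval alongside the P value. The summary figures are invented purely for illustration and do not come from any study; the function name is hypothetical, and the interval uses a large-sample normal approximation (a t distribution would usually be preferred for small samples).

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_ci(mean_a, sd_a, n_a, mean_b, sd_b, n_b, level=0.95):
    """Normal-approximation CI and two-sided P value for mean_a - mean_b.

    Returns (difference, lower bound, upper bound, p_value).
    """
    normal = NormalDist()
    diff = mean_a - mean_b
    # Standard error of the difference between two independent means.
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    # Critical value for the requested confidence level (about 1.96 for 95%).
    z_crit = normal.inv_cdf(0.5 + level / 2)
    p_value = 2 * (1 - normal.cdf(abs(diff / se)))
    return diff, diff - z_crit * se, diff + z_crit * se, p_value


# Hypothetical summary data (invented): monthly tooth movement in mm
# for two groups of 40 patients each.
diff, low, high, p = mean_difference_ci(1.20, 0.50, 40, 0.98, 0.50, 40)

print(f"Mean difference: {diff:.2f} mm")
print(f"95% CI: {low:.2f} to {high:.2f} mm")
print(f"P value: {p:.3f}")
```

With these invented numbers the P value falls just under .05, but the interval shows that the data are compatible with anything from almost no difference to a sizeable one, which is the uncertainty a bare "significant" label hides.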
As I intend to fully debate and deliberate on this “gray” area, I humbly invite our readers to submit articles offering critical analysis of this dichotomy, including cases in which avoiding it could have led to different conclusions had the data been examined from the newer outlook presented today. I am confident that, in doing so, many readers (especially the younger ones) will come away with a broader scientific perspective. Although this breaks a norm that has been followed for decades, I would like to conclude with the following quote, in the hope that it inspires the “researcher” within all of us and prompts some introspection:
You never know what you can do until you try, and very few try unless they have to. —CS Lewis
