Abstract
Reliability refers to the consistency of an outcome when a measurement is repeated. Sport scientists are aware of the importance of reliability and, consequently, a large number of single-center studies (performed in a single research center following specific protocols) have been conducted to evaluate the reliability of different testing procedures. Although single-center reliability studies are valuable, their findings may be compromised by the generally small sample sizes that can be gathered and remain restricted to the specific population and testing procedures used, reducing external validity. More robust information about the reliability of fitness tests could be obtained by pooling, on a collaborative basis, the data collected in independent research centers (multicenter reliability studies). This manuscript highlights the potential benefits of multicenter reliability studies and provides recommendations for conducting them based on both a priori (researchers from different centers agree in advance to collect data to address a specific question) and a posteriori (the data of published single-center studies are combined post hoc) approaches.
Importance of reliability in sports medicine and science
The assessment of muscular function through fitness tests provides valuable information to sport scientists and practitioners [1]. Fitness tests may help to identify specific deficiencies [2], detect individuals suited to a particular athletic endeavor [3], and monitor muscular function over time [4]. One of the main properties of any fitness test is its reliability [5, 6]. It is important to note that different variables can be collected during a specific fitness test. For example, the fitness test could be the vertical jump and the outcome variables could be jump height, maximum force, rate of force development, etc. Accordingly, meaningful differences in reliability can be observed between the different variables collected during a specific fitness test. For instance, during the isometric mid-thigh clean pull (fitness test), maximum force can be obtained with higher reliability than the rate of force development [7]. The present manuscript proposes a guide for exploring the reliability of the different outcome variables that can be collected during specific fitness tests when the measurement is repeated.
The standard error of measurement (SEM), expressed as a coefficient of variation (CV), is the main indicator of the consistency of individual scores [5]. When assessing individual subjects, it is particularly important that the CV of the fitness test is lower than the smallest worthwhile change (SWC) in the tested performance [8]. Namely, if the CV of a fitness test (e.g., 5%) is higher than the SWC (e.g., 3%), a hypothetically meaningful change in performance (e.g., 4%) could be attributable to the typical error of the test. Unfortunately, the CV of many fitness tests is higher than the SWC, limiting their usefulness for monitoring individual changes in performance [8]. Therefore, the reliability of fitness tests should be explored to identify and standardize the testing procedures that provide the most reliable outcomes. Although different criteria have been used to decide whether a fitness test can be considered reliable (e.g., CV < 10%), such thresholds are arbitrary; for monitoring individual athletes, the key criterion is that the CV of the test does not exceed the SWC.
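To illustrate the comparison between CV and SWC, here is a minimal sketch in Python. All numbers are hypothetical; the SWC is computed as 0.2 times the between-subject standard deviation, a common convention, and checked against an assumed test CV:

```python
import statistics

# hypothetical squad of sprint times (s) for 8 athletes
scores = [11.2, 11.5, 10.9, 11.8, 11.1, 11.4, 11.0, 11.6]

# SWC as 0.2 x between-subject SD, expressed as a percentage of the mean
swc_pct = 100 * 0.2 * statistics.stdev(scores) / statistics.mean(scores)

test_cv_pct = 5.0         # typical error of the test (assumed value)
observed_change_pct = 4.0  # hypothetical change in performance

print(f"SWC = {swc_pct:.2f}%, test CV = {test_cv_pct}%")
if test_cv_pct > swc_pct:
    print(f"Noise exceeds signal: a {observed_change_pct}% change "
          "cannot be distinguished from the typical error of the test.")
```

In this sketch the test CV (5%) far exceeds the SWC, so an apparently meaningful individual change could simply reflect measurement noise.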
The evaluation of the reliability of fitness tests has received increasing attention in recent years. A Web of Science search for the keywords “reliability” and “sport” or “fitness” returned 659 articles published in 2015, 719 in 2016, 797 in 2017, 885 in 2018, and 1042 in 2019. Most of these are single-center reliability studies (i.e., studies conducted in only one research center following a specific testing protocol). Although single-center reliability studies are valuable, their findings could be compromised by a relatively low sample size and are somewhat restricted to the specific population and testing procedures used. Of note, the same search with the added keyword “multicenter” or “multicentre” retrieved only 4 articles for 2015, 7 for 2016, 8 for 2017, 3 for 2018, and 7 for 2019. After careful examination, only 6 of those articles had the objective of examining the reliability of fitness tests [9, 10, 11, 12, 13, 14]. Five of them examined the reliability of field-based fitness tests (i.e., conducted outside the laboratory) [9, 10, 11, 12, 14] and only one was conducted under laboratory conditions [13]. Overall, the literature search revealed a paucity of data from multicenter reliability studies, especially for fitness tests performed under laboratory conditions. Therefore, the objective of this manuscript is to propose the multicenter reliability study, which combines the data collected by several research centers, as a viable approach to gather more meaningful information about the reliability of fitness tests performed in field and laboratory conditions.
Potential benefits of multicenter reliability studies
Multicenter reliability studies could be valuable in sports science, not only for exploring the effects of training interventions [19], but also for assessing the reliability of different fitness tests. While Impellizzeri [19] pointed out some advantages of multicenter studies for conducting training interventions, here we highlight the three main benefits of multicenter studies for exploring the reliability of fitness tests.
Recruitment of more subjects
The combination of the data collected in different research centers would allow the recruitment of a larger sample. Note that although a minimum of 50 participants has been recommended for reliability studies in sports science [5], most single-center studies assessing the reliability of fitness tests use smaller samples [20, 21, 22, 23]. The number of recruited participants is especially low when testing highly trained or elite athletes. In this regard, one advantage of the multicenter approach is that data from subjects belonging to a specific population (e.g., elite athletes) can be collected in different research centers using similar testing procedures and later combined in a multicenter reliability study. Therefore, through a larger sample size, multicenter reliability studies are expected to provide a more precise estimation of the reliability of different fitness tests.
Improved generalizability
Slight variations in the characteristics of single-center studies would allow researchers to obtain a more robust indicator of the overall reliability of a fitness test when their outcomes are combined in a multicenter reliability study. The variation could consist of the assessment of different populations (elite athletes, physical education students, older adults, etc.) or the use of different testing procedures (exercises, instructions given to participants, equipment used, etc.). However, multicenter reliability studies face the challenge of standardizing methods and protocols. Therefore, sport scientists should be cautious not to include overly heterogeneous testing procedures, which could compromise the accuracy of the overall CV.
Lower logistical and time resources
Sport scientists of different research centers could share their resources to provide a comprehensive reliability analysis of a fitness test. The reliability of a fitness test could be explored simultaneously in different research centers using slightly different testing procedures to obtain a more generalizable indicator of its reliability. More interestingly, the data of already published single-center studies could also be post-hoc combined within a multicenter reliability study (see next section for details).
A practical guide for the application of multicenter reliability studies
Here we provide a practical guide to help and encourage sport scientists to conduct multicenter reliability studies using both a priori and a posteriori approaches. In the a priori approach, sport scientists from different research centers agree in advance to collect data to address a specific question regarding the reliability of a fitness test. In the a posteriori approach, sport scientists share the original data of their already published single-center studies. Therefore, while in the a priori approach the data collection is pre-planned to address a specific reliability question, in the a posteriori approach the reliability question is addressed from the data collected in already published single-center studies.
A priori multicenter reliability studies
Step 1. Definition of the reliability question
Sport scientists should explicitly formulate the reliability question to be solved (e.g., “Is the reliability of the force-velocity relationship higher when obtained from proximal or distant points?”) [24]. In this first step, the aim (e.g., “to compare the reliability of the force-velocity relationship obtained from proximal and distant points”) and hypothesis (e.g., “the force-velocity relationship would be obtained with higher reliability from distant points as compared to the proximal points”) should also be formulated.
Step 2. Planning the testing procedures
Researchers from different centers should agree on the testing procedures to be used. Slightly different testing procedures could be followed in each research center. However, sport scientists should avoid overly heterogeneous testing procedures because they could compromise the overall CV obtained from the combination of the data collected in the different research centers. Following the example provided in Step 1, note that the force-velocity relationship can be assessed in different exercises (e.g., sprinting, cycling, jumping, bench press, deadlift, isokinetic tasks, etc.). However, a single research center rarely has access to the measurement devices needed to determine the force-velocity relationship in all exercises. Therefore, several research centers could collaborate to collect the data with the locally available measurement devices.
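As background for this example, the two-point method models the force-velocity relationship as a straight line through two load conditions, from which the maximal force (F0), maximal velocity (v0), and maximal power (Pmax = F0·v0/4) are derived. A minimal sketch with hypothetical bench-press values, not data from any cited study:

```python
def fv_two_point(f1, v1, f2, v2):
    """Force-velocity parameters from two (force, velocity) load conditions,
    assuming a linear F-V profile (see Jaric for the underlying model)."""
    slope = (f2 - f1) / (v2 - v1)   # negative for a linear F-V profile
    f0 = f1 - slope * v1            # intercept on the force axis (N)
    v0 = -f0 / slope                # intercept on the velocity axis (m/s)
    pmax = f0 * v0 / 4              # apex of the parabolic power-velocity curve (W)
    return f0, v0, pmax

# hypothetical distant loads: one light, one heavy condition
f0, v0, pmax = fv_two_point(f1=300, v1=1.60, f2=700, v2=0.40)
print(f"F0 = {f0:.0f} N, v0 = {v0:.2f} m/s, Pmax = {pmax:.0f} W")
```

Each center could apply the same calculation to whichever exercise its measurement devices support, yielding comparable F0, v0, and Pmax values for pooling.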
Step 3. Sharing the database
The data needed to conduct the reliability analysis should be shared, preferably in the form of an Excel spreadsheet. The different research centers should use the same spreadsheet template to save time and avoid errors for the researcher processing the data. Following the example provided in Step 1, each research center should provide the values of force and velocity against 4 or more loads/velocities.
Step 4. Data processing and reliability analysis
A researcher should be responsible for processing the data to calculate the dependent variables of interest. In our example, if each research center evaluated the tested task under 4 different loads, the parameters of the force-velocity relationship (see Jaric [25] for calculations) would be obtained from the force and velocity data recorded against the 2 intermediate loads (i.e., proximal loads) as well as from the force and velocity data collected against only the 2 most distant loads [24]. Once the magnitude of the dependent variables has been determined either within the same testing session (within-session reliability) or in different testing sessions (between-sessions reliability), the reliability analysis can be performed. To assess reliability, we recommend calculating the SEM between two consecutive trials as proposed by Hopkins [5]: SEM = standard deviation of the difference scores between trials 1 and 2 for all participants, divided by √2. The CV is then obtained by dividing the SEM by the participants' mean score and multiplying by 100.
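The calculation described above can be sketched as follows, using hypothetical F0 values (N) from two sessions in one research center:

```python
import statistics

def reliability(trial1, trial2):
    """SEM and CV from two repeated trials, following Hopkins:
    SEM = SD of the difference scores / sqrt(2);
    CV (%) = 100 * SEM / grand mean of both trials."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    sem = statistics.stdev(diffs) / 2 ** 0.5
    cv = 100 * sem / statistics.mean(trial1 + trial2)
    return sem, cv

# hypothetical maximum-force values (N) for 8 participants, sessions 1 and 2
s1 = [1510, 1642, 1488, 1725, 1590, 1470, 1680, 1550]
s2 = [1535, 1618, 1502, 1710, 1611, 1455, 1699, 1561]
sem, cv = reliability(s1, s2)
print(f"SEM = {sem:.1f} N, CV = {cv:.2f}%")
```

The same function applies to within-session reliability (two consecutive trials in one session) or between-sessions reliability (one value per session), depending on which pairs of scores are passed in.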
Step 5. Analysis of results
Sport scientists should provide a general reliability outcome based on the combination of the data collected in all independent research centers (i.e., an overall CV). Possible differences in reliability between independent research centers could also be examined by comparing their CVs. The ratio between 2 CVs should be used for reliability comparisons, with 1.15 being the minimum ratio required to claim meaningful differences in reliability [26]. Therefore, besides providing the overall CV of the fitness test, specific recommendations could be made when meaningful differences in reliability are detected between centers. Returning to our example, we would determine an overall CV for the two-point method based on proximal loads and another overall CV for the two-point method based on distant loads [27]. The comparison of the two overall CVs would allow us to state whether one combination of loads is more reliable than the other. In addition, the comparison of the CVs of independent research centers would allow us to determine whether similar results are obtained for all exercises evaluated. The CV ratio provides more valuable information than arbitrary reliability thresholds (e.g., CV < 10%).
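A minimal sketch of the CV-ratio comparison, with assumed CV values for the two load combinations (not results from any cited study):

```python
def cv_ratio(cv_a, cv_b):
    """Ratio of two CVs (larger over smaller); a ratio >= 1.15 is the
    minimum to claim a meaningful difference in reliability."""
    return max(cv_a, cv_b) / min(cv_a, cv_b)

# hypothetical overall CVs (%) for the two-point method
cv_distant, cv_proximal = 4.2, 6.1
ratio = cv_ratio(cv_distant, cv_proximal)
meaningful = ratio >= 1.15
print(f"CV ratio = {ratio:.2f}, meaningful difference: {meaningful}")
```

With these assumed values the ratio exceeds 1.15, so the distant-load combination would be judged meaningfully more reliable than the proximal one.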
A posteriori multicenter reliability studies
The a posteriori approach should follow the same guidelines previously proposed for the a priori multicenter reliability study, with the only difference being Step 2. In Step 2 of the a posteriori approach, sport scientists should identify published manuscripts presenting a study design that allows for testing the previously formulated hypotheses. Note that the selected single-center studies do not necessarily have to address the identical reliability question of the a posteriori multicenter reliability study, but their testing procedures must make it possible to answer the same reliability question. Once the manuscripts are identified, the authors of the original single-center studies should be contacted and invited to participate in the a posteriori multicenter reliability study. Once the original authors agree to collaborate, they should send their database as described in Step 3. An a posteriori multicenter reliability study has recently been published following the guidelines proposed in the present manuscript [13].
Limitations and further research
A limitation of multicenter reliability studies is that pooling data from different laboratories could artificially increase the between-subject variability and, in turn, overestimate reliability when assessed through the ICC [5]. The same problem would apply to a “multicenter validity study” because Pearson's correlation coefficient could also be overestimated. On the other hand, since the CV calculated from the SEM is not affected by the between-subject variability, the overall CV of a fitness test can be accurately obtained by pooling the data of different research centers or single-center studies. We also believe that the a posteriori multicenter reliability study could present worthwhile benefits compared to meta-analyses. While meta-analyses are restricted to the results reported in single-center studies, in the a posteriori multicenter reliability study sport scientists can use the raw data of single-center studies whose testing procedures make it possible to answer the same reliability question. Therefore, the collaboration of researchers in a posteriori multicenter reliability studies could mitigate a common limitation of meta-analyses, namely the low number of studies that typically meet the inclusion and exclusion criteria.
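The ICC inflation described above can be illustrated with a toy example. The sketch below approximates the ICC through the standard relation SEM = SD·√(1 − ICC); all numbers are hypothetical, with center B testing a much stronger population than center A:

```python
import statistics

def sem_and_icc(trial1, trial2):
    """SEM from difference scores (Hopkins), and an approximate ICC via
    the standard relation SEM = SD_between * sqrt(1 - ICC)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    sem = statistics.stdev(diffs) / 2 ** 0.5
    subject_means = [(a + b) / 2 for a, b in zip(trial1, trial2)]
    sd_between = statistics.stdev(subject_means)
    return sem, 1 - (sem / sd_between) ** 2

# hypothetical jump heights (cm): center A (homogeneous group), two trials
a1, a2 = [30, 31, 32, 33], [30.5, 30.6, 32.4, 32.8]
# center B tests a much stronger population with similar measurement error
b1, b2 = [45, 46, 47, 48], [45.4, 46.5, 46.6, 48.3]

sem_a, icc_a = sem_and_icc(a1, a2)
sem_p, icc_p = sem_and_icc(a1 + b1, a2 + b2)   # naively pooled centers
print(f"center A: SEM = {sem_a:.2f} cm, ICC = {icc_a:.2f}")
print(f"pooled  : SEM = {sem_p:.2f} cm, ICC = {icc_p:.2f}")
```

Pooling barely changes the SEM (and hence the CV), but the ICC rises toward 1.00 simply because the between-subject spread widened, which is exactly the overestimation the paragraph warns about.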
Practical implications
Sport scientists are increasingly aware of the importance of reliability, as reflected in the growing number of scientific articles published over the years reporting the reliability of different fitness tests. Of note, the majority of those studies were performed in a single research center (i.e., single-center reliability studies), while the use of multicenter studies for exploring the reliability of fitness tests is still not frequent in our area, especially for measurements that take place in the laboratory (e.g., isokinetic testing). The three main benefits that should encourage researchers to collaborate in multicenter reliability studies are: (I) increased precision of the reliability outcomes through the recruitment of more subjects; (II) improved generalizability by allowing slight variations of the testing protocols; and (III) lower logistical and time demands because sport scientists can share their facilities, measuring devices, or study sample (a priori study), as well as the data of their published manuscripts (a posteriori study). We hope that growing engagement in multicenter reliability studies using both the a priori and a posteriori approaches will contribute to the refinement and standardization of the procedures of different fitness tests.
Footnotes
Acknowledgments
We would like to thank our scientific mentor, Prof. Dr. Slobodan Jaric for his constructive comments regarding the present manuscript as well as for all the valuable advice that he provided during the time that we had the opportunity to learn from him. Dear Slobodan, R.I.P.
Conflict of interest
The authors declare no conflict of interest.
