Abstract
Reliability refers to the consistency of an outcome when a measurement is repeated. Sport scientists are aware of the importance of reliability and, consequently, a large number of single-center studies (performed in a single research center following specific protocols) have been conducted to evaluate the reliability of different testing procedures. Although single-center reliability studies are valuable, their findings may be compromised by the generally small sample sizes that can be gathered and remain restricted to the specific population and testing procedures used, reducing external validity. More robust information about the reliability of fitness tests could be obtained by pooling, on a collaborative basis, the data collected in independent research centers (multicenter reliability studies). This manuscript highlights the potential benefits of multicenter reliability studies and provides recommendations for conducting them based on both a priori (researchers from different centers agree in advance to collect data to address a specific question) and a posteriori (the data of published single-center studies are combined post hoc) approaches.
Importance of reliability in sports medicine and science
The assessment of muscular function through fitness tests provides valuable information to sport scientists and practitioners [1]. Fitness tests may help to identify specific deficiencies [2], detect individuals suited to a particular athletic endeavor [3], and monitor muscular function over time [4]. One of the main properties of any fitness test is its reliability [5, 6]. It is important to note that different variables can be collected during a specific fitness test. For example, the fitness test could be the vertical jump and the outcome variables could be jump height, maximum force, rate of force development, etc. Accordingly, meaningful differences in reliability can be observed between the different variables collected during a specific fitness test. For instance, during the isometric mid-thigh clean pull (fitness test), maximum force can be obtained with higher reliability than the rate of force development [7]. The present manuscript proposes a guide for exploring the reliability of the different outcome variables that can be collected during specific fitness tests when the measurement is repeated.
The standard error of measurement (SEM), expressed as a coefficient of variation (CV), is the main indicator of the consistency of individual scores [5]. When assessing individual subjects, it is particularly important that the CV of the fitness test is lower than the smallest worthwhile change (SWC) in the tested performance [8]. Namely, if the CV of a fitness test (e.g., 5%) is higher than the SWC (e.g., 3%), a hypothetically meaningful change in performance (e.g., 4%) could be attributable to the typical error of the test. Unfortunately, the CV of many fitness tests is higher than the SWC, limiting their usefulness for monitoring individual changes in performance [8]. Therefore, the reliability of fitness tests should be explored to identify and standardize the testing procedures that provide the most reliable outcomes. Although different criteria have been used to decide whether a fitness test can be considered reliable (e.g., CV < 10%), such thresholds are arbitrary; for monitoring individual athletes, the key criterion is that the CV of the test does not exceed the SWC.
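To illustrate the comparison between CV and SWC, here is a minimal sketch in Python. All numbers are hypothetical; the SWC is computed as 0.2 times the between-subject standard deviation, a common convention, and checked against an assumed test CV:

```python
import statistics

# hypothetical squad of sprint times (s) for 8 athletes
scores = [11.2, 11.5, 10.9, 11.8, 11.1, 11.4, 11.0, 11.6]

# SWC as 0.2 x between-subject SD, expressed as a percentage of the mean
swc_pct = 100 * 0.2 * statistics.stdev(scores) / statistics.mean(scores)

test_cv_pct = 5.0         # typical error of the test (assumed value)
observed_change_pct = 4.0  # hypothetical change in performance

print(f"SWC = {swc_pct:.2f}%, test CV = {test_cv_pct}%")
if test_cv_pct > swc_pct:
    print(f"Noise exceeds signal: a {observed_change_pct}% change "
          "cannot be distinguished from the typical error of the test.")
```

In this sketch the test CV (5%) far exceeds the SWC, so an apparently meaningful individual change could simply reflect measurement noise.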
The evaluation of the reliability of fitness tests has received increasing attention in recent years. A Web of Science search for the keywords “reliability” and “sport” or “fitness” returned 659 articles published in 2015, 719 in 2016, 797 in 2017, 885 in 2018, and 1042 in 2019. Most of these are single-center reliability studies (i.e., studies conducted in only one research center following a specific testing protocol). Although single-center reliability studies are valuable, their findings could be compromised by a relatively low sample size and are somewhat restricted to the specific population and testing procedures used. Of note, the same search with the added keyword “multicenter” or “multicentre” retrieved only 4 articles for 2015, 7 for 2016, 8 for 2017, 3 for 2018, and 7 for 2019. After careful examination, only 6 of those articles had the objective of examining the reliability of fitness tests [9, 10, 11, 12, 13, 14]. Five of them examined the reliability of field-based fitness tests (i.e., conducted outside the laboratory) [9, 10, 11, 12, 14] and only one was conducted under laboratory conditions [13]. Overall, the literature search revealed a paucity of data from multicenter reliability studies, especially for fitness tests performed under laboratory conditions. Therefore, the objective of this manuscript is to propose the multicenter reliability study, which combines the data collected by several research centers, as a viable approach to gather more meaningful information about the reliability of fitness tests performed in field and laboratory conditions.
Potential benefits of multicenter reliability studies
Multicenter reliability studies could be valuable in sports science, not only for exploring the effects of training interventions [19], but also for assessing the reliability of different fitness tests. While Impellizzeri [19] pointed out some advantages of multicenter studies for conducting training interventions, here we highlight the three main benefits of multicenter studies for exploring the reliability of fitness tests.
Recruitment of more subjects
The combination of the data collected in different research centers would allow the recruitment of a larger sample. Note that although a minimum of 50 participants has been recommended for reliability studies in sports science [5], most single-center studies assessing the reliability of fitness tests use smaller samples [20, 21, 22, 23]. The number of recruited participants is especially low when testing highly trained or elite athletes. In this regard, one advantage of the multicenter approach is that data from subjects belonging to a specific population (e.g., elite athletes) can be collected in different research centers using similar testing procedures and later combined in a multicenter reliability study. Therefore, through a larger sample size, multicenter reliability studies are expected to provide a more precise estimation of the reliability of different fitness tests.
Improved generalizability
Slight variations in the characteristics of single-center studies would allow researchers to obtain a more robust indicator of the overall reliability of a fitness test when their outcomes are combined in a multicenter reliability study. The variation could consist of the assessment of different populations (elite athletes, physical education students, older adults, etc.) or the use of different testing procedures (exercises, instructions given to participants, equipment used, etc.). However, multicenter reliability studies face the challenge of standardizing methods and protocols. Therefore, sport scientists should be cautious not to include overly heterogeneous testing procedures, which could compromise the accuracy of the overall CV.
Lower logistical and time resources
Sport scientists of different research centers could share their resources to provide a comprehensive reliability analysis of a fitness test. The reliability of a fitness test could be explored simultaneously in different research centers using slightly different testing procedures to obtain a more generalizable indicator of its reliability. More interestingly, the data of already published single-center studies could also be post-hoc combined within a multicenter reliability study (see next section for details).
A practical guide for the application of multicenter reliability studies
Here we provide a practical guide to help and encourage sport scientists to conduct multicenter reliability studies using both a priori and a posteriori approaches. In the a priori approach, sport scientists from different research centers agree in advance to collect data to address a specific question regarding the reliability of a fitness test. In the a posteriori approach, sport scientists share the original data of their already published single-center studies. Therefore, while in the a priori approach the data collection is pre-planned to address a specific reliability question, in the a posteriori approach the reliability question is addressed from the data collected in already published single-center studies.
A priori multicenter reliability studies
Step 1. Definition of the reliability question
Sport scientists should explicitly formulate the reliability question to be solved (e.g., “Is the reliability of the force-velocity relationship higher when obtained from proximal or distant points?”) [24]. In this first step, the aim (e.g., “to compare the reliability of the force-velocity relationship obtained from proximal and distant points”) and hypothesis (e.g., “the force-velocity relationship would be obtained with higher reliability from distant points as compared to the proximal points”) should also be formulated.
Step 2. Planning the testing procedures
Researchers from different centers should agree on the testing procedures to be used. Slightly different testing procedures could be followed in each research center. However, sport scientists should avoid overly heterogeneous testing procedures because they could compromise the overall CV obtained from the combination of the data collected in the different research centers. Following the example provided in Step 1, note that the force-velocity relationship can be assessed in different exercises (e.g., sprinting, cycling, jumping, bench press, deadlift, isokinetic tasks, etc.). However, a single research center rarely has access to the measurement devices needed to determine the force-velocity relationship in all exercises. Therefore, several research centers could collaborate to collect the data with the locally available measurement devices.
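As background for this example, the two-point method models the force-velocity relationship as a straight line through two load conditions, from which the maximal force (F0), maximal velocity (v0), and maximal power (Pmax = F0·v0/4) are derived. A minimal sketch with hypothetical bench-press values, not data from any cited study:

```python
def fv_two_point(f1, v1, f2, v2):
    """Force-velocity parameters from two (force, velocity) load conditions,
    assuming a linear F-V profile (see Jaric for the underlying model)."""
    slope = (f2 - f1) / (v2 - v1)   # negative for a linear F-V profile
    f0 = f1 - slope * v1            # intercept on the force axis (N)
    v0 = -f0 / slope                # intercept on the velocity axis (m/s)
    pmax = f0 * v0 / 4              # apex of the parabolic power-velocity curve (W)
    return f0, v0, pmax

# hypothetical distant loads: one light, one heavy condition
f0, v0, pmax = fv_two_point(f1=300, v1=1.60, f2=700, v2=0.40)
print(f"F0 = {f0:.0f} N, v0 = {v0:.2f} m/s, Pmax = {pmax:.0f} W")
```

Each center could apply the same calculation to whichever exercise its measurement devices support, yielding comparable F0, v0, and Pmax values for pooling.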
Step 3. Sharing the database
The data needed to conduct the reliability analysis should be shared, preferably in the form of an Excel spreadsheet. The different research centers should use the same spreadsheet template to save time and avoid errors for the researcher processing the data. Following the example provided in Step 1, each research center should provide the values of force and velocity against 4 or more loads/velocities.
Step 4. Data processing and reliability analysis
A researcher should be responsible for processing the data to calculate the dependent variables of interest. In our example, if each research center evaluated the tested task under 4 different loads, the parameters of the force-velocity relationship (see Jaric [25] for calculations) would be obtained from the force and velocity data recorded against the 2 intermediate loads (i.e., proximal loads) as well as from the force and velocity data collected against only the 2 most distant loads [24]. Once the magnitude of the dependent variables has been determined either within the same testing session (within-session reliability) or in different testing sessions (between-sessions reliability), the reliability analysis can be performed. To assess reliability, we recommend calculating the SEM between two consecutive trials as proposed by Hopkins [5]: SEM = standard deviation of the difference scores between trials 1 and 2 for all participants, divided by √2. The CV is then obtained by dividing the SEM by the participants' mean score and multiplying by 100.
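The calculation described above can be sketched as follows, using hypothetical F0 values (N) from two sessions in one research center:

```python
import statistics

def reliability(trial1, trial2):
    """SEM and CV from two repeated trials, following Hopkins:
    SEM = SD of the difference scores / sqrt(2);
    CV (%) = 100 * SEM / grand mean of both trials."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    sem = statistics.stdev(diffs) / 2 ** 0.5
    cv = 100 * sem / statistics.mean(trial1 + trial2)
    return sem, cv

# hypothetical maximum-force values (N) for 8 participants, sessions 1 and 2
s1 = [1510, 1642, 1488, 1725, 1590, 1470, 1680, 1550]
s2 = [1535, 1618, 1502, 1710, 1611, 1455, 1699, 1561]
sem, cv = reliability(s1, s2)
print(f"SEM = {sem:.1f} N, CV = {cv:.2f}%")
```

The same function applies to within-session reliability (two consecutive trials in one session) or between-sessions reliability (one value per session), depending on which pairs of scores are passed in.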
Step 5. Analysis of results
Sport scientists should provide a general reliability outcome based on the combination of the data collected in all independent research centers (i.e., an overall CV). Possible differences in reliability between independent research centers could also be examined by comparing their CVs. The ratio between 2 CVs should be used for reliability comparisons, with 1.15 being the minimum ratio required to claim meaningful differences in reliability [26]. Therefore, besides providing the overall CV of the fitness test, specific recommendations could be made when meaningful differences in reliability are detected between centers. Returning to our example, we would determine an overall CV for the two-point method based on proximal loads and another overall CV for the two-point method based on distant loads [27]. The comparison of the two overall CVs would allow us to state whether one combination of loads is more reliable than the other. In addition, the comparison of the CVs of independent research centers would allow us to determine whether similar results are obtained for all exercises evaluated. The CV ratio provides more valuable information than arbitrary reliability thresholds (e.g., CV < 10%).
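A minimal sketch of the CV-ratio comparison, with assumed CV values for the two load combinations (not results from any cited study):

```python
def cv_ratio(cv_a, cv_b):
    """Ratio of two CVs (larger over smaller); a ratio >= 1.15 is the
    minimum to claim a meaningful difference in reliability."""
    return max(cv_a, cv_b) / min(cv_a, cv_b)

# hypothetical overall CVs (%) for the two-point method
cv_distant, cv_proximal = 4.2, 6.1
ratio = cv_ratio(cv_distant, cv_proximal)
meaningful = ratio >= 1.15
print(f"CV ratio = {ratio:.2f}, meaningful difference: {meaningful}")
```

With these assumed values the ratio exceeds 1.15, so the distant-load combination would be judged meaningfully more reliable than the proximal one.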
A posteriori multicenter reliability studies
The a posteriori approach should follow the same guidelines previously proposed for the a priori multicenter reliability study, with the only difference being Step 2. In Step 2 of the a posteriori approach, sport scientists should identify published manuscripts presenting a study design that allows for testing the previously formulated hypotheses. Note that the selected single-center studies do not necessarily have to address the identical reliability question of the a posteriori multicenter reliability study, but their testing procedures must make it possible to answer the same reliability question. Once the manuscripts are identified, the authors of the original single-center studies should be contacted and invited to participate in the a posteriori multicenter reliability study. Once the original authors agree to collaborate, they should send their database as described in Step 3. An a posteriori multicenter reliability study has recently been published following the guidelines proposed in the present manuscript [13].
Limitations and further research
A limitation of multicenter reliability studies is that pooling data from different laboratories could artificially increase the between-subject variability and, in turn, overestimate reliability when assessed through the ICC [5]. The same problem would apply to a “multicenter validity study” because Pearson's correlation coefficient could also be overestimated. On the other hand, since the CV calculated from the SEM is not affected by the between-subject variability, the overall CV of a fitness test can be accurately obtained by pooling the data of different research centers or single-center studies. We also believe that the a posteriori multicenter reliability study could present worthwhile benefits compared to meta-analyses. While meta-analyses are restricted to the results reported in single-center studies, in the a posteriori multicenter reliability study sport scientists can use the raw data of single-center studies whose testing procedures make it possible to answer the same reliability question. Therefore, the collaboration of researchers in a posteriori multicenter reliability studies could mitigate a common limitation of meta-analyses, namely the low number of studies that typically meet the inclusion and exclusion criteria.
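The ICC inflation described above can be illustrated with a toy example. The sketch below approximates the ICC through the standard relation SEM = SD·√(1 − ICC); all numbers are hypothetical, with center B testing a much stronger population than center A:

```python
import statistics

def sem_and_icc(trial1, trial2):
    """SEM from difference scores (Hopkins), and an approximate ICC via
    the standard relation SEM = SD_between * sqrt(1 - ICC)."""
    diffs = [b - a for a, b in zip(trial1, trial2)]
    sem = statistics.stdev(diffs) / 2 ** 0.5
    subject_means = [(a + b) / 2 for a, b in zip(trial1, trial2)]
    sd_between = statistics.stdev(subject_means)
    return sem, 1 - (sem / sd_between) ** 2

# hypothetical jump heights (cm): center A (homogeneous group), two trials
a1, a2 = [30, 31, 32, 33], [30.5, 30.6, 32.4, 32.8]
# center B tests a much stronger population with similar measurement error
b1, b2 = [45, 46, 47, 48], [45.4, 46.5, 46.6, 48.3]

sem_a, icc_a = sem_and_icc(a1, a2)
sem_p, icc_p = sem_and_icc(a1 + b1, a2 + b2)   # naively pooled centers
print(f"center A: SEM = {sem_a:.2f} cm, ICC = {icc_a:.2f}")
print(f"pooled  : SEM = {sem_p:.2f} cm, ICC = {icc_p:.2f}")
```

Pooling barely changes the SEM (and hence the CV), but the ICC rises toward 1.00 simply because the between-subject spread widened, which is exactly the overestimation the paragraph warns about.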
Practical implications
Sport scientists are increasingly aware of the importance of reliability, as reflected in the growing number of scientific articles published over the years reporting the reliability of different fitness tests. Of note, the majority of those studies were performed in a single research center (i.e., single-center reliability studies), while the use of multicenter studies for exploring the reliability of fitness tests is still not frequent in our area, especially for measurements that take place in the laboratory (e.g., isokinetic testing). The three main benefits that should encourage researchers to collaborate in multicenter reliability studies are: (I) increased precision of the reliability outcomes through the recruitment of more subjects; (II) improved generalizability by allowing slight variations of the testing protocols; and (III) lower logistical and time demands because sport scientists can share their facilities, measuring devices, or study sample (a priori study), as well as the data of their published manuscripts (a posteriori study). We hope that growing engagement in multicenter reliability studies using both the a priori and a posteriori approaches will contribute to the refinement and standardization of the procedures of different fitness tests.
Footnotes
Acknowledgments
We would like to thank our scientific mentor, Prof. Dr. Slobodan Jaric for his constructive comments regarding the present manuscript as well as for all the valuable advice that he provided during the time that we had the opportunity to learn from him. Dear Slobodan, R.I.P.
Conflict of interest
The authors declare no conflict of interest.
