
Abstract

Objective:

The objective was to evaluate intrarater and inter-rater reliability of ultrasonography muscle morphology measurements in men and women, considering morphological differences between the genders.

Materials and Methods:

Thirty-two healthy subjects (16 male; 16 female; 26.6 ± 4.9 years) participated in two evaluation days. On day 1, subjects were evaluated by a single experienced rater, who repeated the evaluation on the next testing day along with two other raters. Muscle morphology of quadriceps femoris (Q_MT), rectus femoris (RF_MT), vastus intermedius (VI_MT), vastus medialis (VM_MT), and vastus lateralis (VL_MT) muscle thickness, rectus femoris muscle cross-sectional area (RF_CSA), vastus lateralis muscle fascicle length (VL_FL), and pennation angle (VL_PA) were obtained. Reliability was evaluated by intraclass correlation coefficients (ICCs).

Results:

All intrarater comparisons demonstrated good reliability (ICC ≥0.90), even after stratification by participant gender (ICC ≥0.81). Inter-rater comparisons for RF_CSA and muscle thickness showed good reliability (ICC ≥0.76). Men’s inter-rater VL_FL and VL_PA reliability was considered insufficient (ICC = 0.58 and 0.60, respectively), while women’s was slightly higher (ICC = 0.79 and 0.75, respectively).

Conclusion:

Ultrasonography has potential to be used to observe changes in muscle size, but pennation angle and fascicle length should be evaluated by the same rater.

Keywords

Fascicle length, pennation angle, reliability, ultrasonography, muscle morphology

Ultrasonography (US) is a valid, convenient, and common diagnostic method to evaluate muscle architecture (i.e., muscle thickness, fascicle length, and pennation angle of the fibers),^1–5 which is a determinant of muscle function and force production.^6,7 A 2003 systematic review identified several studies that showed the validity of this method for evaluating muscle size in comparison with the gold standards, magnetic resonance imaging (MRI) and computed tomography (CT).⁸ Since then, other studies have expanded the support for the validity of this method by comparing specific metrics (fascicle length and pennation angle) between US and cadaveric measurements,⁹ identifying its association with function in different populations,^2,10,11 and systematically reviewing current validity studies specifically for the diagnosis of sarcopenia.² The valid and reliable measurement of muscle architecture over time is important for detecting muscle losses or gains (and the associated changes in strength) that can occur during hospitalization^12,13 and training,^10,11 respectively.

A critical matter when conducting measurements in both research and clinical settings, particularly those where the measurement depends mostly on the rater (i.e., does not require the participant to execute a movement), is the reliability of the results. Reliability is defined as the quality of a measure that produces consistent scores on repeated administrations of a test.¹⁴ Although the terminology for the different kinds of reliability may vary between studies, two are commonly found in the literature: (1) intrarater reliability, which concerns the results obtained by the same rater at different moments,¹⁵ and (2) inter-rater reliability, which concerns the results obtained by different raters on the same day.^16,17 Common metrics to quantify reliability are intraclass correlation coefficients (ICCs), standard errors of measurement (SEMs), and minimal detectable changes (MDCs),¹⁸ which are used to understand whether a change observed as a result of a rehabilitation program or hospitalization is true or due to measurement error.

Although several studies have evaluated the reliability of muscle architecture measurements,^{9,15,16,19–22} generally finding good to excellent values, the current literature is still limited. The primary limitation is the lack of studies comparing the reliability of measurements taken in both genders. Because women tend to have smaller muscle mass than men,²³ measurements in men may be less reliable than in women, since a relatively smaller portion of the muscle is visible on US images, which increases the mathematical extrapolation required to calculate fascicle length. Even though several studies have evaluated both men and women,^{1,9,19–22,24,25} no study has specifically compared the reliability between genders. A second limitation is that studies typically focus on a few muscles (or muscle portions in the case of the quadriceps) and metrics, although there are many that can be used in clinical practice.^19,26–29 In addition, studies generally evaluate only intrarater reliability,^9,19,20 although both types matter when measurements are performed on different days to follow a patient’s progress and may also be conducted by different raters because of staff scheduling in a clinical setting. Finally, some studies chose a sample size that was not statistically justified and may have been underpowered to answer the proposed question.^1,9,19,22

Because accurate measurement of muscle architecture is important for following a patient’s response to disuse or to training programs, it is fundamental to know how the reliability of measurements of all portions of the quadriceps muscle is influenced by participant gender. In addition, considering situations in which these measures are carried out by different assessors, both the intrarater and inter-rater reliability should be evaluated. Therefore, the aim of this study was to compare the intrarater and inter-rater reliability of US measurements of the quadriceps portions in healthy men and women, using different metrics and a sufficient number of individuals determined by sample size calculation. It was hypothesized that, given appropriate training of the raters, all measures would be reliable, particularly in the female subgroup.

Materials and Methods

This laboratory-based study used a methodology approved by the University’s Research Ethics Committee (CAEE # 36588914.4.1001.5347). The participants were asked to read and sign an informed consent form after all questions about the study were answered by the designated researcher.

Participants

Healthy and physically active men and women were recruited through social networks, advertisements, and in-person invitations at the institution where the study was conducted. None of the participants presented (1) injury to the evaluated lower limb, (2) any cardiovascular disease, or (3) central or peripheral neurological disease. The sample size was chosen to allow a precise estimation of ICCs when the evaluation is repeated three times (three raters), assuming a true ICC of approximately 0.9 based on previous studies,^21,25,26,30 and a 50% probability of obtaining a precision of 0.20 (i.e., the confidence interval [CI] being within ±0.20 of the ICC).^31,32 The minimal sample size for each group using these criteria was 12. To anticipate possible data loss (and avoid reducing statistical power), 16 male and 16 female participants were recruited.

Procedures

Subjects underwent two assessment sessions on two different days separated by a week. The evaluations were performed without the cooperation of the participants (they were asked to relax during all procedures), in a small temperature-controlled room (23°C) equipped with a stretcher to simulate a hospital intensive care unit environment. This was done because the study was part of a larger project for the implementation of neuromuscular electrical stimulation in intensive care units. Before the tests, participants rested for 20 minutes to allow body fluids to redistribute,³³ while lying on the stretcher in a dorsal decubitus position, with the right lower limb resting on a knee extensor board. Subsequently, the US muscle morphology measurements were performed with the muscle at rest.

On one of the testing days, participants were evaluated by a single experienced rater (R1 = six years of experience with US measurements), who repeated the evaluation on the other testing day along with two other raters, who were duly trained by the experienced rater to perform the procedures (R2 = one year of experience, R3 = three months of experience). Both the order of the raters’ evaluations and the order of the days on which participants were evaluated by a single rater or by the three raters were randomized to rule out any influence of the timing of the evaluations. The second evaluation by R1 was conducted with the help of a map representing the transducer position in relation to moles, scars, and bone protuberances, which was created on the first day.¹¹ Raters were blinded to the other raters’ procedures, and all pen marks made on subjects’ skin were removed with alcohol to prevent raters from influencing one another.

Evaluation of Knee Extensors Morphological Properties

Two-dimensional real-time US was performed using a Vivid-I ultrasound equipment system (GE Healthcare, Waukesha, Wisconsin). Images for the muscle morphology evaluation were recorded with a 44-mm-wide linear-array transducer at a transmit frequency of 9 MHz. The transducer was coated with a water-soluble transmission gel to promote acoustic contact without depressing the skin surface. Optimization parameters (e.g., brightness, gain) were kept the same during all evaluations, with only the depth being modified for different participants.

Images were analyzed using the ImageJ software (National Institutes of Health, USA) by a single experienced analyst, who was not one of the raters. The following measurements were obtained: (1) the rectus femoris muscle cross-sectional area (RF_CSA);³⁴ the muscle thickness of the (2) quadriceps femoris (Q_MT),³⁵ (3) rectus femoris (RF_MT), (4) vastus intermedius (VI_MT), (5) vastus medialis (VM_MT),³⁶ and (6) vastus lateralis (VL_MT);¹¹ (7) the fascicle length of vastus lateralis muscle fibers (VL_FL)¹¹ and (8) its respective pennation angle (VL_PA).¹¹ For each evaluated morphological variable, the mean value of three images was used for statistical analysis.

Rectus Femoris Muscle Cross-Sectional Area Evaluation

Image capture was performed at the 70% level of the RF muscle belly (from the greater trochanter to the knee lateral epicondyle), in the transverse plane (See Figure 1A). The transducer was positioned transversely to the orientation of the muscle belly, with the image depth adjusted so that RF’s aponeuroses, as well as the femur, were visible. The RF_CSA was obtained by tracing the muscle perimeter while excluding the aponeuroses and calculating the area of the resultant shape. The unit of measurement for this assessment was cm².
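As an illustration of how an area can be obtained from a traced outline, the following minimal sketch applies the shoelace formula to a list of outline vertices and converts the result from pixels to cm². It is not the analysis software’s implementation; the function names `polygon_area` and `csa_cm2`, and the `px_per_cm` calibration parameter, are hypothetical names introduced here for illustration only.

```python
def polygon_area(points):
    """Area of a simple (non-self-intersecting) polygon.

    points: ordered list of (x, y) vertices along the traced outline.
    Uses the shoelace formula; the result is in squared input units.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def csa_cm2(trace_px, px_per_cm):
    """Convert a traced outline given in pixels to an area in cm^2.

    px_per_cm is an assumed image calibration (pixels per centimeter).
    """
    if px_per_cm <= 0:
        raise ValueError("px_per_cm must be positive")
    return polygon_area(trace_px) / (px_per_cm ** 2)
```

For example, a 100 × 50 pixel rectangle traced on an image calibrated at 50 pixels per centimeter corresponds to 2 cm × 1 cm, so `csa_cm2` returns 2.0.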

Figure 1.

Sonograms were obtained from a single participant and demonstrate the following: (A) rectus femoris cross-section area (RF_CSA); (B) quadriceps muscle thickness (Q_MT), rectus femoris muscle thickness (RF_MT), and vastus intermedius muscle thickness (VI_MT); (C) vastus medialis muscle thickness (VM_MT); (D) vastus lateralis muscle thickness (VL_MT); and (E) vastus lateralis fiber length (VL_FL) and pennation angle (VL_PA) analysis.

Muscle Thickness Evaluation

The Q_MT, RF_MT, and VI_MT were obtained at 50% of the distance from the greater trochanter to the knee lateral epicondyle. A single vertical measurement was taken in the central portion of each muscle belly, from the superficial to the deep RF’s aponeuroses for RF_MT, from the superficial VI’s aponeurosis to the femur for VI_MT and from the superficial RF’s aponeurosis to the femur for Q_MT (See Figure 1B). For the VM_MT, images were captured at 70% of the thigh length (See Figure 1C) with the transducer positioned obliquely, longitudinally to the muscle fibers. For VL_MT, images were captured at 50% of the thigh length (See Figure 1D), longitudinally to muscle fibers. Muscle thickness (cm) was considered the distance between the superficial and deep aponeuroses, with a single measurement performed for obtaining Q_MT, RF_MT, and VI_MT; five equidistant measurements obtained and averaged for the VM_MT and three for the VL_MT (one at the left, one at the center, and one at the right side of each image).

Evaluation of the Vastus Lateralis Fascicle Length and Pennation Angle

For the evaluation of the VL_FL and the VL_PA, the same images obtained for the evaluation of the VL_MT were analyzed, with one representative muscle fiber being selected for each of the three images. Given that the ultrasound transducer had a small scanning area (44 mm), the VL_FL (cm) was calculated using trigonometry¹¹ considering the VL_PA (°) to mathematically extrapolate the trajectories of the structures out of the image (See Figure 1E).
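The exact trigonometric equation is not given in the text. A simplification often used for this kind of extrapolation assumes a straight fascicle running between parallel aponeuroses, so that fascicle length equals muscle thickness divided by the sine of the pennation angle. The sketch below implements that assumed relation only; the function name `fascicle_length_cm` is hypothetical, and the calculation actually used in the study may differ.

```python
import math


def fascicle_length_cm(muscle_thickness_cm, pennation_angle_deg):
    """Estimate fascicle length from thickness and pennation angle.

    Assumes a straight fascicle and parallel aponeuroses (an assumption
    made for this sketch, not a detail confirmed by the study):
        FL = MT / sin(PA)
    """
    if muscle_thickness_cm <= 0:
        raise ValueError("muscle thickness must be positive")
    if not 0.0 < pennation_angle_deg < 90.0:
        raise ValueError("pennation angle must be between 0 and 90 degrees")
    return muscle_thickness_cm / math.sin(math.radians(pennation_angle_deg))
```

Under this assumption, small pennation angles amplify angle errors: at about 11°, a one-degree error in the angle changes the estimated length by close to 10%, which illustrates why the extrapolated fascicle length is sensitive to how the rater selects the fascicle.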

Statistical Analysis

The Shapiro–Wilk and Levene tests were used to confirm data normality and homogeneity of variance, respectively. To compare RF_CSA, Q_MT, RF_MT, VI_MT, VM_MT, VL_MT, VL_FL, and VL_PA values between men and women, an independent-samples t-test was used. To verify the clinical relevance of any differences found between measurements performed in men and women, effect sizes [Cohen’s d = (M2 − M1)⁄SD_pooled] were calculated adopting the following criteria: <0.2: trivial; ≥0.2: small; ≥0.50: moderate; ≥0.80: large.³⁷
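The effect size calculation and its classification can be sketched as follows. The text does not state how SD_pooled was computed, so the sample-size-weighted pooled standard deviation used here is an assumption, and the function names `cohens_d` and `classify_effect` are hypothetical.

```python
import math


def cohens_d(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Cohen's d = (M2 - M1) / SD_pooled.

    SD_pooled is computed as the sample-size-weighted pooled standard
    deviation (an assumption; the study does not specify the variant).
    """
    pooled_var = ((n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2) / (n_1 + n_2 - 2)
    return (mean_2 - mean_1) / math.sqrt(pooled_var)


def classify_effect(d):
    """Apply the thresholds stated in the text to the magnitude of d."""
    size = abs(d)
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"
```

As a usage example, entering the first-day Q_MT values from Table 1 (women: 3.35 ± 0.46 cm; men: 4.24 ± 0.56 cm; 16 participants per group) gives a d of roughly 1.7, which `classify_effect` labels as large.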

For the evaluation of intrarater and inter-rater reliability, the ICCs and their respective CIs were calculated using the “2, 1” model,¹⁸ as follows:

ICC_{2,1} = \frac{MS_{S} - MS_{E}}{MS_{S} + (k - 1) MS_{E} + \frac{k (MS_{T} - MS_{E})}{n}}

where MS_S is the participants’ mean square, MS_E is the error mean square, MS_T is the trials mean square, k is the number of trials, and n is the sample size. The SEM and MDC were calculated to quantify measurement error, according to the formulas provided by Weir:¹⁸

SEM = SD \sqrt{1 - ICC}

MDC = SEM \times 1.96 \times \sqrt{2}
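The three formulas above can be sketched in a few lines of code. This is a minimal illustration rather than the procedure run in the statistical package used for the study: it assumes the mean squares come from a two-way analysis of variance without replication on an n × k table (participants in rows, trials in columns), with MS_T taken as the between-trials mean square. The function names `icc_2_1`, `sem`, and `mdc` are hypothetical.

```python
import math


def icc_2_1(scores):
    """ICC(2,1) from a table of n participants (rows) by k trials (columns).

    Mean squares are derived from a two-way ANOVA without replication.
    """
    n = len(scores)
    k = len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a complete table with n >= 2 rows and k >= 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for participants, trials, and residual error
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_trials = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_subjects - ss_trials

    ms_s = ss_subjects / (n - 1)
    ms_t = ss_trials / (k - 1)
    ms_e = ss_error / ((n - 1) * (k - 1))

    return (ms_s - ms_e) / (ms_s + (k - 1) * ms_e + k * (ms_t - ms_e) / n)


def sem(sd, icc):
    """Standard error of measurement: SD * sqrt(1 - ICC)."""
    return sd * math.sqrt(1.0 - icc)


def mdc(sem_value):
    """Minimal detectable change at the 95% level: SEM * 1.96 * sqrt(2)."""
    return sem_value * 1.96 * math.sqrt(2.0)
```

Because the trials mean square appears in the denominator, a systematic offset between raters (or days) lowers ICC(2,1) even when participants are ranked identically, which is the behavior wanted when absolute agreement between raters matters.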

Reliability was considered “good” if ICCs were ≥0.75.³⁸ Thus, ICCs <0.75 were considered “insufficient.” All reliability tests were conducted separately for men and women as well as for the pooled sample. A significance level of .05 was adopted for all analyses. Analysis was completed using IBM SPSS software (v 20, IBM Corp., Armonk, New York).

Results

All muscle thickness measurements were higher for the men compared to the women (See Figure 2), with large effect size. All muscle thickness intrarater ICCs were considered good, with only VM_MT for men and VL_MT for women presenting values lower than 0.90 (0.81 and 0.83, respectively). The SEMs ranged from 1.9% to 4.7% and MDCs ranged from 3.5% to 9.2% of the mean values, with no clear difference between the genders (See Table 1). All muscle thickness inter-rater reliability values were considered good, with only VM_MT for women and VL_MT for all groups presenting values lower than 0.90 (0.76 and between 0.84 and 0.87, respectively). The SEMs ranged from 1.7% to 4.3% and MDCs ranged from 3.3% to 8.5% of the mean values, also with no clear differences between genders (See Table 2).

Figure 2.

Example sonograms obtained from the cohort of female (right) and male (left) participants. (A) rectus femoris at 70% of thigh length; (B) rectus femoris and vastus intermedius at 50% of thigh length; (C) vastus lateralis; (D) vastus medialis. Numbers and ticks to the left indicate centimeters to exemplify scale.

Table 1.

Mean and Standard Deviation Values of Ultrasonographic Measurements Acquired in Two Different Days by the Same Evaluator.

Measure	Group	First day (mean ± SD)	Second day (mean ± SD)	SEM	SEM (%)	MDC	MDC (%)	ICC	95% CI
Intrarater
RF_CSA	POOLED	3.72 ± 1.35 cm²	3.83 ± 1.39 cm²	0.25 cm²	6.6	0.49 cm²	12.9	0.97	0.93-0.98
RF_CSA	MEN	4.39 ± 1.37 cm²	4.54 ± 1.42 cm²	0.25 cm²	5.6	0.49 cm²	11.0	0.94	0.85-0.98
RF_CSA	WOMEN	3.05 ± 0.97 cm²**^d	3.12 ± 0.96 cm²**^d	0.17 cm²	5.6	0.34 cm²	11.0	0.97	0.93-0.99
Q_MT	POOLED	3.80 ± 0.68 cm	3.84 ± 0.68 cm	0.13 cm	3.3	0.25 cm	6.6	0.97	0.93-0.98
Q_MT	MEN	4.24 ± 0.56 cm	4.28 ± 0.58 cm	0.10 cm	2.4	0.20 cm	4.7	0.92	0.79-0.97
Q_MT	WOMEN	3.35 ± 0.46 cm*^d	3.40 ± 0.51 cm*^d	0.08 cm	2.6	0.17 cm	5.0	0.95	0.88-0.98
RF_MT	POOLED	2.10 ± 0.35 cm	2.15 ± 0.38 cm	0.07 cm	3.1	0.13 cm	6.1	0.95	0.91-0.97
RF_MT	MEN	2.33 ± 0.29 cm	2.40 ± 0.30 cm	0.05 cm	2.3	0.10 cm	4.4	0.93	0.81-0.97
RF_MT	WOMEN	1.82 ± 0.22 cm*^d	1.89 ± 0.24 cm*^d	0.04 cm	2.2	0.08 cm	4.4	0.90	0.76-0.96
VI_MT	POOLED	1.56 ± 0.40 cm	1.53 ± 0.38 cm	0.07 cm	4.7	0.14 cm	9.2	0.96	0.92-0.98
VI_MT	MEN	1.76 ± 0.39 cm	1.71 ± 0.35 cm	0.06 cm	3.9	0.13 cm	7.6	0.94	0.83-0.97
VI_MT	WOMEN	1.35 ± 0.30 cm**^d	1.36 ± 0.32 cm**^d	0.05 cm	4.1	0.11 cm	8.1	0.96	0.90-0.98
VM_MT	POOLED	4.09 ± 0.57 cm	4.07 ± 0.56 cm	0.13 cm	3.0	0.25 cm	6.0	0.95	0.90-0.98
VM_MT	MEN	4.52 ± 0.35 cm	4.49 ± 0.34 cm	0.06 cm	1.4	0.12 cm	2.7	0.81	0.54-0.93
VM_MT	WOMEN	3.66 ± 0.39 cm*^d	3.65 ± 0.39 cm*^d	0.07 cm	1.9	0.13 cm	3.8	0.93	0.82-0.97
VL_MT	POOLED	2.47 ± 0.33 cm	2.50 ± 0.34 cm	0.08 cm	3.3	0.16 cm	6.6	0.94	0.87-0.97
VL_MT	MEN	2.65 ± 0.32 cm	2.67 ± 0.34 cm	0.05 cm	2.2	0.11 cm	4.4	0.94	0.85-0.98
VL_MT	WOMEN	2.29 ± 0.22 cm**^d	2.33 ± 0.23 cm**^d	0.04 cm	1.8	0.08 cm	3.5	0.83	0.58-0.93
VL_FL	POOLED	12.60 ± 1.81 cm	12.45 ± 1.73 cm	0.42 cm	3.4	0.83 cm	6.6	0.94	0.89-0.97
VL_FL	MEN	12.82 ± 1.61 cm	12.66 ± 1.78 cm	0.30 cm	2.4	0.59 cm	4.7	0.93	0.81-0.97
VL_FL	WOMEN	12.37 ± 2.02 cm^b	12.24 ± 1.71 cm^b	0.33 cm	2.7	0.65 cm	5.3	0.94	0.85-0.98
VL_PA	POOLED	11.07 ± 1.37°	11.43 ± 1.53°	0.45°	4.0	0.89°	8.0	0.90	0.81-0.95
VL_PA	MEN	11.52 ± 1.36°	11.88 ± 1.66°	0.27°	2.3	0.53°	4.6	0.88	0.70-0.95
VL_PA	WOMEN	10.62 ± 1.25°^c	10.97 ± 1.28°^c	0.22°	2.1	0.44°	4.1	0.89	0.73-0.96

Abbreviations: CI, confidence interval; ICC, intraclass correlation coefficient; MDC, minimal detectable change; Q_MT, quadriceps muscle thickness; RF_CSA, rectus femoris cross-sectional area; RF_MT, rectus femoris muscle thickness; SEM, standard error of measurement; VI_MT, vastus intermedius muscle thickness; VL_FL, vastus lateralis fiber length; VL_MT, vastus lateralis muscle thickness; VL_PA, vastus lateralis pennation angle; VM_MT, vastus medialis muscle thickness.

*Significant difference between men and women (p < .0001).

**Significant difference between men and women (p < .05).

Effect sizes: ^atrivial; ^bsmall; ^cmoderate; and ^dlarge.

Table 2.

The Mean and Standard Deviation Values of Ultrasound Measurements Acquired in the Same Day by Three Different Evaluators.

Measure	Group	Rater 1 (mean ± SD)	Rater 2 (mean ± SD)	Rater 3 (mean ± SD)	SEM	SEM (%)	MDC	MDC (%)	ICC	95% CI
Inter-rater
RF_CSA	POOLED	3.75 ± 1.34 cm²	4.13 ± 1.39 cm²	3.82 ± 1.39 cm²	0.39 cm²	10.1	0.77 cm²	19.8	0.91	0.85-0.95
RF_CSA	MEN	4.38 ± 1.40 cm²	4.90 ± 1.36 cm²	4.42 ± 1.42 cm²	0.25 cm²	5.5	0.49 cm²	10.8	0.91	0.81-0.96
RF_CSA	WOMEN	3.12 ± 0.95 cm²**^d	3.36 ± 0.95 cm²**^d	3.23 ± 0.72 cm²**^d	0.17 cm²	5.3	0.33 cm²	10.5	0.82	0.64-0.92
Q_MT	POOLED	3.84 ± 0.68 cm	3.81 ± 0.72 cm	3.86 ± 0.66 cm	0.12 cm	3.0	0.23 cm	6.0	0.97	0.95-0.98
Q_MT	MEN	4.27 ± 0.57 cm	4.25 ± 0.63 cm	4.27 ± 0.56 cm	0.10 cm	2.5	0.20 cm	4.8	0.94	0.87-0.97
Q_MT	WOMEN	3.40 ± 0.46 cm*^d	3.38 ± 0.50 cm*^d	3.44 ± 0.46 cm*^d	0.08 cm	2.5	0.16 cm	4.9	0.96	0.91-0.98
RF_MT	POOLED	2.14 ± 0.35 cm	2.13 ± 0.39 cm	2.15 ± 0.38 cm	0.06 cm	3.1	0.13 cm	6.1	0.96	0.91-0.97
RF_MT	MEN	2.38 ± 0.30 cm	2.40 ± 0.34 cm	2.41 ± 0.33 cm	0.05 cm	2.4	0.11 cm	4.7	0.94	0.87-0.97
RF_MT	WOMEN	1.90 ± 0.22 cm*^d	1.87 ± 0.22 cm*^d	1.90 ± 0.21 cm*^d	0.03 cm	2.0	0.07 cm	4.0	0.93	0.86-0.97
VI_MT	POOLED	1.56 ± 0.40 cm	1.52 ± 0.39 cm	1.56 ± 0.39 cm	0.07 cm	4.3	0.13 cm	8.5	0.96	0.93-0.98
VI_MT	MEN	1.74 ± 0.38 cm	1.68 ± 0.36 cm	1.72 ± 0.39 cm	0.06 cm	4.0	0.13 cm	7.8	0.94	0.87-0.97
VI_MT	WOMEN	1.37 ± 0.32 cm**^d	1.36 ± 0.35 cm**^d	1.40 ± 0.33 cm**^d	0.05 cm	4.3	0.11 cm	8.4	0.97	0.93-0.98
VM_MT	POOLED	4.09 ± 0.58 cm	3.99 ± 0.57 cm	4.04 ± 0.52 cm	0.15 cm	3.6	0.29 cm	7.1	0.93	0.88-0.96
VM_MT	MEN	4.53 ± 0.37 cm	4.38 ± 0.44 cm	4.41 ± 0.43 cm	0.07 cm	1.7	0.14 cm	3.3	0.91	0.81-0.96
VM_MT	WOMEN	3.66 ± 0.38 cm*^d	3.59 ± 0.37 cm*^d	3.67 ± 0.27 cm*^d	0.06 cm	1.7	0.12 cm	3.3	0.76	0.55-0.90
VL_MT	POOLED	2.49 ± 0.34 cm	2.48 ± 0.29 cm	2.54 ± 0.31 cm	0.12 cm	4.6	0.24 cm	8.9	0.87	0.78-0.93
VL_MT	MEN	2.67 ± 0.33 cm	2.57 ± 0.29 cm	2.68 ± 0.26 cm	0.05 cm	2.0	0.10 cm	4.0	0.86	0.71-0.94
VL_MT	WOMEN	2.31 ± 0.23 cm**^d	2.39 ± 0.26 cm^c	2.39 ± 0.29 cm**^d	0.04 cm	2.0	0.09 cm	4.0	0.84	0.68-0.93
VL_FL	POOLED	12.57 ± 1.61 cm	12.52 ± 1.85 cm	12.92 ± 1.94 cm	1.00 cm	7.9	1.95 cm	15.4	0.69^†	0.52-0.82
VL_FL	MEN	12.86 ± 1.56 cm	12.95 ± 2.01 cm	13.53 ± 1.97 cm	0.33 cm	2.5	0.65 cm	5.0	0.58^†	0.29-0.81
VL_FL	WOMEN	12.28 ± 1.66 cm^b	12.09 ± 1.63 cm^b	12.30 ± 1.75 cm^c	0.29 cm	2.5	0.58 cm	4.8	0.79	0.59-0.91
VL_PA	POOLED	11.16 ± 1.40°	11.38 ± 1.42°	11.43 ± 1.54°	0.84°	7.4	1.64°	14.5	0.66^†	0.49-0.80
VL_PA	MEN	11.53 ± 1.45°	11.38 ± 1.58°	11.57 ± 1.56°	0.27°	2.4	0.53°	4.7	0.60^†	0.32-0.82
VL_PA	WOMEN	10.78 ± 1.27°^c	11.37 ± 1.27°^a	11.27 ± 1.54°^a	0.24°	2.2	0.48°	4.4	0.75	0.53-0.89

Abbreviations: CI, confidence interval; ICC, intraclass correlation coefficient; MDC, minimal detectable change; Q_MT, quadriceps muscle thickness; RF_CSA, rectus femoris cross-sectional area; RF_MT, rectus femoris muscle thickness; SEM, standard error of measurement; VI_MT, vastus intermedius muscle thickness; VL_FL, vastus lateralis fiber length; VL_MT, vastus lateralis muscle thickness; VL_PA, vastus lateralis pennation angle; VM_MT, vastus medialis muscle thickness.

*Significant difference between men and women (p < .0001).

**Significant difference between men and women (p < .05).

Effect sizes: ^atrivial; ^bsmall; ^cmoderate; and ^dlarge. ^†Insufficient reliability.

The RF_CSA was greater for men than for women, with a large effect size. Intrarater ICCs were good for all groups (≥0.94), with SEMs ranging from 5.6% to 6.6% and MDCs ranging from 11% to 12.9% of the mean measured values (See Table 1). Inter-rater ICCs were considered good for all groups, with women presenting a slightly lower value (ICC = 0.82) than men (ICC = 0.91). The SEMs ranged from 5.3% to 10.1% and MDCs ranged from 10.5% to 19.8% of the mean values, with no difference between genders (See Table 2).

The VL_FL was not significantly different between men and women (small effect sizes). Intrarater ICCs were considered good (≥0.93), with SEMs ranging from 2.4% to 3.4% and MDCs ranging from 4.7% to 6.6% of the mean measured values, being similar for both genders (See Table 1). However, although inter-rater VL_FL ICCs were considered good for women (ICC = 0.79), they were insufficient for men and for the pooled sample (ICC = 0.58 and 0.69, respectively). The SEMs ranged from 2.5% to 7.9% and MDCs ranged from 4.8% to 15.4% of the mean, with no differences between genders (See Table 2).

The VL_PA was not different between men and women (moderate effect sizes). Intrarater ICCs were considered good for men, women, and pooled samples (≥0.88), with SEMs ranging from 2.1% to 4.0% and MDCs ranging from 4.1% to 8.0% of the mean and being similar between genders (See Table 1). Inter-rater VL_PA ICCs were considered good for women (ICC = 0.75) but insufficient for men and pooled samples (ICC = 0.60 and 0.66, respectively). The SEMs ranged from 2.2% to 7.4% and MDCs ranged from 4.4% to 14.5% of the mean measured values, with no differences between genders (See Table 2).

Discussion

In this study, the intrarater and inter-rater reliability of different US measurements of the quadriceps portions were evaluated, with a particular focus on differences between participant genders. For women, the reliability of all measurements was considered good (ICC ≥0.75), whereas for men, reliability was considered insufficient only for the inter-rater comparison of the VL_FL and VL_PA. These results partially confirm the study hypothesis that all measurements would be reliable, as fascicle length and pennation angle ICCs seemed to be rater- and gender-dependent, but SEMs and MDCs were similar between participants’ genders.

Muscle thickness of the four quadriceps portions and of the overall quadriceps presented high ICCs for both intrarater and inter-rater comparisons, regardless of participant gender (≥0.81 for men and ≥0.76 for women). These results agree with the current literature, which also found high reliability for intrarater^{9,19,24,29,39–41} and inter-rater^{24–26,29,30} comparisons of the quadriceps portions in mixed or exclusively male populations. These studies were conducted in populations that were healthy,^9,26,39,40 hospitalized,^25,29,30 or had diabetes mellitus.²⁴ Similarly, RF_CSA presented good reliability that was similar to or slightly above literature values, with previous studies reporting intrarater ICCs between 0.87 and 0.99^{1,20,22,40,42} and inter-rater values between 0.79 and 0.99.^21,24,42 It is important to note that this muscle was evaluated at 70% of the thigh length, as opposed to the more common 50%, in order to fit the whole muscle in the image with the available transducer. Transducer position did not seem to be an issue, which is further supported by Lima et al,²² who found similar reliability when evaluating at 50% or 15 cm proximal to the patellar edge. Overall, the current study’s reliability findings suggest that thickness and RF_CSA can be reliable measurements to assess changes in muscle size due to training programs or disuse (e.g., during hospitalization).

The VL_FL and VL_PA inter-rater reliabilities were considered insufficient when considering all subjects pooled (0.69 and 0.66, respectively) and exclusively men (0.58 and 0.60). Conversely, when evaluating only women, ICCs rose to 0.79 and 0.75, values that were just above the threshold to be considered good. This difference is likely because women presented smaller VL muscles, making it easier to find a representative fascicle to be used for calculating the pennation angle. Furthermore, errors in pennation angle and muscle thickness are particularly important when extrapolating fascicle length using trigonometry, resulting in a higher reliability when the extrapolation is required for a small percentage of the length or not required at all.²³ Although smaller muscle size in women seems a reasonable explanation for their higher reliability in these measurements, other studies that could corroborate this hypothesis by stratifying groups by gender or presenting groups composed exclusively of women were not found.

There are not many studies that have investigated the reliability of these parameters when made by different raters, particularly in VL, as only Chiaramonte et al²⁶ evaluated pennation angle reliability in this muscle, finding an ICC of 0.95. However, it is not clear how much experience each rater had at the time of the measurements, nor which steps were taken to make sure the raters did not influence each other (e.g., exiting the room or erasing skin marks), all of which could have contributed to increasing the reliability. Inter-rater reliability has also been studied for other muscles. In a series of studies, Cho et al^27,28,43 found medial gastrocnemius and tibialis anterior pennation angle ICC values between 0.81 and 0.98 in stroke patients. The only study found where fascicle length inter-rater reliability was measured was the one by König et al,⁴⁴ where medial gastrocnemius fascicle length ICC was 0.77, in addition to pennation angles between 0.80 and 0.90. Overall, the current study’s results were lower than those previously found, possibly due to the smaller size of medial gastrocnemius and tibialis anterior in comparison to the vastus lateralis, indicating that care should be taken when using fascicle length and pennation angle, measured by different raters, to make clinical decisions.

Intrarater reliability tends to be higher than inter-rater reliability because a given experienced rater uses a similar technique to identify the structures required to acquire a good sonographic image and may remember the characteristics of the participant from the previous evaluation. Another tool that can help keep measurements consistent is a map. This map can be produced using a transparent sheet on which the position of the probe is recorded in relation to anatomical landmarks such as moles, scars, and bony protrusions, so that the probe is positioned in the same place across evaluations. However, even with the map, other factors are more difficult to reproduce accurately, such as the angle of the probe relative to the skin surface, and this variation lowers the reliability scores. Nevertheless, the current findings showed that all the quadriceps measurements evaluated by an experienced rater had good reliability and can be used in research and clinical practice.

Inter-rater reliability can be highly influenced by the raters’ experience, given that experienced raters can more easily identify the anatomical points and structures that need to be captured during US muscle morphology measurements. In the present study, the three raters had different levels of experience: R1 had six years of experience with the assessment technique, R2 had one year, and R3 had worked with the technique for only three months. When the most experienced rater was compared with each of the other two raters individually, only the VL_FL and VL_PA were more reliable for the comparison of R1 with R2 (0.85 and 0.72) than for the comparison of R1 with R3 (0.65 and 0.60). When R1’s two evaluations were compared (intrarater), the ICC values were 0.94 and 0.90, suggesting that these differences may have been caused by the raters’ different levels of experience. In addition, as previously discussed, a landmark map may also help to improve reliability by minimizing probe position variation. However, no study has investigated intrarater or inter-rater reliability with and without the use of this map. Nonetheless, the current findings suggest that, when evaluating fascicle length and pennation angle, it would be preferable for raters to have more than one year of experience with the technique, whereas when evaluating thickness and cross-sectional area, proper training and a shorter period of experience should be enough to provide reliable measurements.

Limitations

This study has major limitations because its design carries threats to internal and external validity. Additional limitations should be noted when interpreting these results: (1) the quadriceps muscle was chosen because it represents the participants’ muscle characteristics well, given that it is the largest muscle in the body and is highly associated with patient function and prognosis.^2,45 However, other muscles can also be used in clinical practice and may present different inter-rater and intrarater reliabilities; (2) for consistency, only one person analyzed all the images from both days and all raters. The identification of structures by this analyst may also have influenced the results, and inter-analyst reliability could provide valuable information for understanding the reliability of these measurements.¹⁷ Finally, (3) the participants were young and healthy and did not have any current pathology. The muscles of older adults and of people with pathologies may have different characteristics that could make the identification of muscle structures more difficult. Thus, care should be taken when extrapolating these results to clinical practice.

Conclusion

The high ICCs and low SEMs and MDCs observed for all intrarater parameters demonstrated that these measurements were reliable when performed by an experienced rater at different moments. High reliability found for RF_CSA and muscle thickness measures in inter-rater comparisons in all groups and high reliability for VL_FL and VL_PA in women demonstrate that these measures could be used in the evaluation of musculoskeletal morphology, when performed by different raters. The insufficient reliability found for VL_FL and VL_PA in mixed-gender and exclusively male groups in the inter-rater comparisons suggests that these parameters are evaluator-dependent. These results should be considered when using these types of measurement for making clinical decisions based on US muscle architecture values.

Footnotes

Ethics Approval

Ethical approval for this study was obtained from the University’s Research Ethics Committee (CAEE # 36588914.4.1001.5347).

Informed Consent

Written informed consent was obtained from all subjects before the study.

Animal Welfare

Guidelines for humane animal treatment did not apply to the present study because no animals were used during the study.

Trial Registration

Not applicable.

Declaration of Conflicting Interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Brazilian Council of Scientific and Technological Development (CNPq) (grant number 458838/2013-6). MAV, PhD, is a recipient of a research fellowship from the CNPq.

ORCID iD

Rodrigo Rabello

References

1. Mendis, Wilson, Stanton, Hides: Validity of real-time ultrasound imaging to measure anterior hip muscle size: a comparison with magnetic resonance imaging. J Orthop Sports Phys Ther. 2010;40(9):577–581. doi:10.2519/jospt.2010.3286.

2. Parry, El-Ansary, Cartwright, et al: Ultrasonography in the intensive care setting can be used to detect changes in the quality and quantity of muscle and is related to muscle strength and function. J Crit Care. 2015;30(5):1151.e9–1151.e14. doi:10.1016/j.jcrc.2015.05.024.

3. Liu, Ling, et al: Reliability and validity of assessing lower-limb muscle architecture of patients with cerebral palsy (CP) using ultrasound: a systematic review. J Clin Ultrasound. 2023;51(7):1212–1222. doi:10.1002/jcu.23498.

4. Nijholt, Scafoglieri, Jager Wittenaar, Hobbelen JSM, van der Schans: The reliability and validity of ultrasound to quantify muscles in older adults: a systematic review. J Cachexia Sarcopenia Muscle. 2017;8(5):702–712. doi:10.1002/jcsm.12210.

5. Nagae, Umegaki, Yoshiko, Fujita: Muscle ultrasound and its application to point-of-care ultrasonography: a narrative review. Ann Med. 2023;55(1):190–197. doi:10.1080/07853890.2022.2157871.

6. Henriksson-Larsén, Wretling, Lorentzon, Oberg: Do muscle fibre size and fibre angulation correlate in pennated human muscles? Eur J Appl Physiol Occup Physiol. 1992;64(1):68–72. doi:10.1007/BF00376443.

7. Lieber, Friden: Functional and clinical significance of skeletal muscle architecture. Muscle Nerve. 2000;23(11):1647–1666. doi:10.1002/1097-4598(200011)23:11<1647::AID-MUS1>3.0.CO;2-M.

8. Perkin, Bond, Thompson, Woods, Smith: Real time ultrasound: an objective measure of skeletal muscle. Phys Ther Rev. 2003;8(2):99–108. doi:10.1179/108331903225002506.

9. Ema, Wakahara, Mogi, et al: In vivo measurement of human rectus femoris architecture by ultrasonography: validity and applicability. Clin Physiol Funct Imaging. 2013;33(4):267–273. doi:10.1111/cpf.12023.

10. Cuthbert, Ripley, McMahon, Evans, Haff, Comfort: The effect of Nordic hamstring exercise intervention volume on eccentric strength and muscle architecture adaptations: a systematic review and meta-analyses. Sports Med. 2020;50(1):83–99. doi:10.1007/s40279-019-01178-7.

11. Baroni, Geremia, Rodrigues, De Azevedo Franke, Karamanidis, Vaz: Muscle architecture adaptations to knee extensor eccentric training: rectus femoris vs. vastus lateralis. Muscle Nerve. 2013;48(4):498–506. doi:10.1002/mus.23785.

12. Hough: Improving physical function during and after critical care. Curr Opin Crit Care. 2013;19(5):488–495. doi:10.1097/MCC.0b013e328364d7ef.

13. Denehy, Skinner, Edbrooke, et al: Exercise rehabilitation for patients with critical illness: a randomized controlled trial with 12 months of follow-up. Crit Care. 2013;17(4):R156. doi:10.1186/cc12835.

14. Batterham, George: Reliability in evidence-based clinical practice: a primer for allied health professionals. Phys Ther Sport. 2003;4(3):122–128. doi:10.1016/S1466-853X(03)00076-2.

15. Vieira, Siqueira, Ferreira-Junior, Pereira, Wagner, Bottaro: Ultrasound imaging in women’s arm flexor muscles: intra-rater reliability of muscle thickness and echo intensity. Brazilian J Phys Ther. 2016;20(6):535–542. doi:10.1590/bjpt-rbf.2014.0186.

16. Zaidman, Wilder, Darras, Rutkove: Minimal training is required to reliably perform quantitative ultrasound of muscle. Muscle Nerve. 2014;50(1):124–128. doi:10.1002/mus.24117.

17. Rabello, Fröhlich, Bueno, et al: Echo intensity reliability between two rectus femoris probe sites. Ultrasound. 2019;27(4):233–240. doi:10.1177/1742271X19853859.

18. Weir: Quantifying test-retest reliability using the intraclass correlation coefficient and the SEM. J Strength Cond Res. 2005;19(1):231–240. doi:10.1519/15184.1.

19. Raj, Bird, Shield: Reliability of ultrasonographic measurement of the architecture of the vastus lateralis and gastrocnemius medialis muscles in older adults. Clin Physiol Funct Imaging. 2012;32(1):65–70. doi:10.1111/j.1475-097X.2011.01056.x.

20. Tomko, Muddle TWD, Magrini, Colquhoun, Luera, Jenkins NDM: Reliability and differences in quadriceps femoris muscle morphology using ultrasonography: the effects of body position and rest time. Ultrasound. 2018;26(4):214–221. doi:10.1177/1742271X18780127.

21. Hammond, Mampilly, Laghi, et al: Validity and reliability of rectus femoris ultrasound measurements: comparison of curved-array and linear-array transducers. J Rehabil Res Dev. 2014;51(7):1155–1164. doi:10.1682/JRRD.2013.08.0187.

22. Lima KMM, da Matta, de Oliveira: Reliability of the rectus femoris muscle cross-sectional area measurements by ultrasonography. Clin Physiol Funct Imaging. 2012;32(3):221–226. doi:10.1111/j.1475-097X.2011.01115.x.

23. Blazevich, Gill, Zhou: Intra and intermuscular variation in human quadriceps femoris architecture assessed in vivo. J Anat. 2006;209(3):289–310. doi:10.1111/j.1469-7580.2006.00619.x.

24. De Souza Silva, Dos Santos Costa, Rocha, De Lima DAM, Do Nascimento, De Moraes SRA: Quadriceps muscle architecture ultrasonography of individuals with type 2 diabetes: reliability and applicability. PLoS ONE. 2018;13(10):1–9. doi:10.1371/journal.pone.0205724.

25. Sabatino, Regolisti, Bozzoli, et al: Reliability of bedside ultrasound for measurement of quadriceps muscle thickness in critically ill patients with acute kidney injury. Clin Nutr. 2017;36(6):1710–1715. doi:10.1016/j.clnu.2016.09.029.

26. Chiaramonte, Bonfiglio, Castorina, Antoci SAM: The primacy of ultrasound in the assessment of muscle architecture: precision, accuracy, reliability of ultrasonography. Physiatrist, radiologist, general internist, and family practitioner’s experiences. Rev Assoc Med Bras. 2019;65(2):165–170. doi:10.1590/1806-9282.65.2.165.

27. Cho, Lee: Reliability of rehabilitative ultrasound imaging for the medial gastrocnemius muscle in poststroke patients. Clin Physiol Funct Imaging. 2014;34(1):26–31. doi:10.1111/cpf.12060.

28. Cho, Lee: Intra- and inter-rater reliabilities of measurement of ultrasound imaging for muscle thickness and pennation angle of tibialis anterior muscle in stroke patients. Top Stroke Rehabil. 2017;24(5):368–373. doi:10.1080/10749357.2017.1285745.

29. Pardo, El Behi, Boizeau, Verdonk, Alberti, Lescot: Reliability of ultrasound measurements of quadriceps muscle thickness in critically ill patients. BMC Anesthesiol. 2018;18(1):1–8. doi:10.1186/s12871-018-0647-9.

30. Hadda, Khilnani, Kumar, et al: Intra- and inter-observer reliability of quadriceps muscle thickness measured with bedside ultrasonography by critical care physicians. Indian J Crit Care Med. 2017;21(7):448–452. doi:10.4103/ijccm.IJCCM_426_16.

31. Borg, Bach AJE, O’Brien, Sainani: Calculating sample size for reliability studies. PMR. 2022;14(8):1018–1025. doi:10.1002/pmrj.12850.

32. Zou: Sample size formulas for estimating intraclass correlation coefficients with precision and assurance. Stat Med. 2012;31(29):3972–3981. doi:10.1002/sim.5466.

33. Lopez, Pinto: Does rest time before ultrasonography imaging affect quadriceps femoris muscle thickness, cross-sectional area and echo intensity measurements? Ultrasound Med Biol. 2019;45(2):612–616. doi:10.1016/j.ultrasmedbio.2018.10.010.

34. Neves, Vechin, Teixeira, et al: Effect of different training frequencies on maximal strength performance and muscle hypertrophy in trained individuals—a within-subject design. PLoS ONE. 2022;17(10):e0276154. doi:10.1371/journal.pone.0276154.

35. Qing, Wang, Huang: Effect of quadriceps training at different levels of blood flow restriction on quadriceps strength and thickness in the mid-term postoperative period after anterior cruciate ligament reconstruction: a randomized controlled external pilot study. BMC Musculoskelet Disord. 2023;24(1):360. doi:10.1186/s12891-023-06483-x.

36. Cheon, Lee, Jun, Chang: Acute effects of open kinetic chain exercise versus those of closed kinetic chain exercise on quadriceps muscle thickness in healthy adults. Int J Environ Res Public Health. 2020;17(13):4669. doi:10.3390/ijerph17134669.

37. Cohen: Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Lawrence Erlbaum Associates; 1988.

38. Portney, Watkins: Foundations of Clinical Research: Applications to Practice. 2nd ed. Prentice Hall Health; 2000.

39. Franke RDA, Baroni, Rodrigues, Geremia, Lanferdini, Vaz: Neural and morphological adaptations of vastus lateralis and vastus medialis muscles to isokinetic eccentric training. Motriz Rev Educ Fis. 2014;20(3):317–324. doi:10.1590/S1980-65742014000300011.

40. Ruas, Pinto, Lima, Costa, Brown LE: Test-retest reliability of muscle thickness, echo-intensity and cross sectional area of quadriceps and hamstrings muscle groups using B-mode ultrasound. Int J Kinesiol Sport Sci. 2017;5(1):35. doi:10.7575/aiac.ijkss.v.5n.1p.35.

41. Oranchuk, Nelson, Storey, Cronin: Variability of regional quadriceps architecture in trained men assessed by B-mode and extended-field-of-view ultrasonography. Int J Sports Physiol Perform. 2020;15(3):430–436. doi:10.1123/ijspp.2019-0050.

42. Mandal, Suh, Thompson, et al: Comparative study of linear and curvilinear ultrasound probes to assess quadriceps rectus femoris muscle mass in healthy subjects and in patients with chronic respiratory disease. BMJ Open Respir Res. 2016;3(1). doi:10.1136/bmjresp-2015-000103.

43. Cho, Yoo, sang Lee, Lee: Reliability and validity of a dual-probe personal computer-based muscle viewer for measuring the pennation angle of the medial gastrocnemius muscle in patients who have had a stroke. Top Stroke Rehabil. 2018;25(1):6–12. doi:10.1080/10749357.2017.1383723.

44. König, Cassel, Intziegianni, Mayer: Inter-rater reliability and measurement error of sonographic muscle architecture assessments. J Ultrasound Med. 2014;33(5):769–777. doi:10.7863/ultra.33.5.769.

45. Joskova, Patkova, Havel, et al: Critical evaluation of muscle mass loss as a prognostic marker of morbidity in critically ill patients and methods for its determination. J Rehabil Med. 2018;50(8):696–704. doi:10.2340/16501977-2368.

Reliability of Ultrasonographically Acquired Muscle Morphology Based on a Cohort of Men and Women