Abstract
BACKGROUND:
Ultrasound is an important tool for diagnosing many clinical conditions. Yet hand-held devices may be prone to greater data variability, due in part to a greater likelihood of human error.
OBJECTIVE:
Quantify intra-rater reliability of subcutaneous skin fold thickness from a hand-held ultrasound device.
PARTICIPANTS:
College-age subjects (18 men, 14 women) submitted to two sets of ultrasound subcutaneous skin fold measurements spaced (mean
RESULTS:
Intra-rater reliability was high for most of our statistical tests.
CONCLUSION:
Despite the relatively long period between measurements, our hand-held ultrasound device exhibited a high degree of intra-rater reliability. Given our results, ultrasound measurements may be a useful tool to quantify skin fold thickness.
Introduction
Ultrasound (US) is a tool that helps confirm numerous medical conditions, including those common to the diagnosis and rehabilitation of sports-related injuries [1, 2, 3, 4]. Many types of US devices exist, yet hand-held units have the added utility of site-specific measurements and ease of use [5, 6]. Hand-held units resulted from the miniaturization of US, made possible by ongoing advances in computer science [3, 5]. They are cheaper and more convenient than typical US devices, and some advocate their use in medical education [3, 5, 6, 7]. They may be used for non-invasive measurement of a variety of tissues, among them the subcutaneous skin fold, which is composed primarily of fat [1, 7, 8]. Such measurements have value, as site-specific adiposity helps identify obesity, which in turn is a risk factor for many lifestyle-related ailments. Subcutaneous skin fold thickness is also used to estimate cross-sectional area changes in various muscle groups to monitor the atrophy seen with disuse [9]. Finally, site-specific skin fold thickness is used to assess cryotherapy durations and the depth of thermocouple placement to measure skeletal muscle temperature [10]. Poor lower leg thermocouple placement may yield inaccurate cryotherapy times and thus slower rates of recovery from injury [10]. Thus, hand-held units offer data of practical importance to researchers and clinicians [6].
Hand-held B-mode US values have been routinely compared to results from dual-energy X-ray absorptiometry (DEXA) and calipers, and such comparisons typically show it to be a valid and reliable measurement tool [11, 12, 13, 14]. US advantages include greater portability than DEXA and higher accuracy than skin fold calipers [12]. Yet B-mode is expensive, and thus prohibitive for many who would benefit from the information it offers [15]. In contrast, hand-held A-mode US is cheaper and assesses skin fold thickness at lower frequencies [16, 17]. The A-mode also has a shallow penetrating depth and detection resolutions as low as 1 mm; these features may make it more practical for subcutaneous skin fold thickness measurements [16, 17]. Yet the merits of A-mode US for assessing subcutaneous skin fold thickness are not as well known as those of B-mode. In addition, just as A-mode has received less attention, the same is true for some anatomical sites, among them the calf and lower leg.
Lower leg cross-sectional areas are composed largely of the calf muscle group [18, 19]. Plantar flexion, for which the calf muscle group’s posterior compartment serves as the prime mover, is integral to various tasks related to athletics, locomotion and postural control [19]. The calf’s medial surface is said to be an easy anatomical site to assess; prior B-mode US studies showed its subcutaneous skin fold thickness was a strong correlate of body composition values obtained from calipers and DEXA [10, 13, 14, 19]. One study concluded that subcutaneous skin fold thickness measurements are important for injury management, and that calf US data should be validated since this is a common area for thermocouple placement to assess muscle temperature, as well as a site prone to venous thrombosis [10]. The lower leg has many roles related to health and performance, yet few data exist on A-mode US measurements for this muscle group [16, 20].
However, in contrast to DEXA, multiple US measurements may be susceptible to greater variability introduced by individual raters due to a higher likelihood of human error and probe misplacement, as values obtained with anatomical site deviations of as little as 1 cm may differ significantly from correctly measured values [5, 21]. Thus, particularly for hand-held devices, establishment of rater proficiency is encouraged prior to data collection [2]. In addition, since hand-held unit screens are far smaller than those of traditional US devices, they offer visual images that raters must interpret more carefully [5]. Finally, at anatomical regions where various tissues overlap, it is difficult to derive multiple measurements devoid of error [8, 13, 19]. Hand-held units are also limited to point-of-care readings [5]. Thus, raters must be trained on how to use these devices, as well as on proper interpretation of their images [3]. Some suggest human error by raters who use these devices, rather than the US unit itself, is the main cause of data variability [1].
Quantifying data variability from hand-held US, particularly with the less understood A-mode, is an issue that should be addressed, as not all health care programs offer training with these devices [19]. Since national guidelines do not exist, there are no standards to teach students how to best use US and interpret its results [2, 3]. Intra-rater reliability establishes the consistency of measurements made over time by a single rater who uses the same collection techniques and instruments. Since the calf muscle group and A-mode US have received limited attention in prior studies, they may serve as an appropriate research topic to assess the intra-rater reliability of hand-held units [19]. The purpose of this study was therefore to quantify the intra-rater reliability of lower leg subcutaneous skin fold thickness from an A-mode hand-held US unit.
Methods
Participants
Healthy college-age subjects (18 men, 14 women) first gave informed written consent, and then filled out a medical questionnaire on which they stated they were free of conditions that could compromise their participation. In addition to the subcutaneous lower leg skin fold measurements done at each of their laboratory visits, subjects’ first visits entailed anthropometric data collection, in which their body height and mass were measured with a stadiometer (Saltner Brecknell; Brooklyn, NY) as they stood barefoot. Anthropometric values from our female subjects (mean
Ethics approval
The current study was approved for data collection by the University of Tulsa’s Institutional Review Board (Protocol 14–56).
Instrumentation
The current study examined a hand-held US device (BodyMetrix Pro System BX2000; Livermore, CA) used in accordance with the manufacturer’s guidelines [22, 23]. The BodyMetrix Pro System BX2000 was supplied with a transducer contained within a hand-held probe, conducting gel, an instructional manual and proprietary software housed on a CD-ROM [23]. To measure skin fold thickness, the probe was placed perpendicular to the skin’s surface [24]. In operation, the unit’s transducer sent a beam into the body site of interest that was instantly returned to the transducer as a series of waves; the characteristics of those waves denoted the density of the tissue the beam encountered and defined skin fold thickness [13, 17]. The manufacturer claimed its device is not impacted by body hydration status, exercise or caffeine intake [23]. Thus, for laboratory visits we imposed no hydration, exercise or caffeine restrictions upon subjects. Figure 1 depicts the US unit used in the current study.
Current study’s ultrasound unit (BodyMetrix Pro System BX2000; Livermore, CA).
For data collection, the same person performed all US skin fold thickness measurements. At the time of data collection, he was a pre-med student. He previously worked in the principal investigator’s laboratory and is a coauthor of this paper. In preparation for the current study’s data collection, he read the instructional manual and then practiced skin fold thickness measurements on a variety of people who did not serve as subjects in our investigation. After three weeks of training, he stated he felt comfortable with the hand-held US device, and data collection for the current study commenced. With three weeks of training performed on a variety of people, our rater’s training exceeded that of recent US studies [15, 19]. For each skin fold measurement, subjects assumed a relaxed upright posture. Measurements were done in the A-mode. Subcutaneous skin fold thickness was assessed at the left leg’s calf muscle group, which is thought to be easier to assess since it typically has less body fat than other anatomical regions [1, 13, 25]. We examined the left calf exclusively for consistency.
With respect to the body’s transverse plane, skin fold thickness measurements occurred at the calf site with the largest circumference, which was confirmed with a cloth tape measure. At that same circumference site, four separate measurements were made, each equidistant to the calf’s medial-lateral and anterior-posterior borders and 90
Image of transverse section of greatest circumference (a). Overhead view of the same section with location of four skin fold sites, shown by the letter x, 90
A repeated-measure, laboratory-based study examined intra-rater reliability of subcutaneous skin fold thickness values with our hand-held US unit. Each subject made two laboratory visits spaced (mean
Absolute (mean ± sem; in mm) subcutaneous skinfold thickness values, averaged across two laboratory visits
Current study intra-rater reliability results
Descriptive statistics on skin fold thickness data collected for the current study, both subdivided by gender and pooled, appear in Table 1. Intra-rater reliability was assessed with intraclass correlation coefficients (ICC) for each of the four sites (anterior, medial, posterior, lateral) measured. ICC and eta squared (
In addition to ICC, we calculated limits of agreement, coefficient of variation and smallest real difference to assess intra-rater reliability. Limits of agreement (LOA) assess the nature of the data distribution and identify homo- or heteroscedastic tendencies. LOA is often presented as a feature of Bland-Altman plots equal to
Results
At the lower leg site where skin fold thickness was measured, circumference was 37.2
Also shown in Table 2 are current LOA, CV% and SRD results. The LOA column includes a value range that represents test-retest reliability for 95% of the population investigated at each skin fold site. CV% results denote low intra-rater reliability, likely due to the dependent variables examined and the way CV% is computed. Table 2 results generally exhibit less data variability at measurement sites that also had lower absolute skin fold thickness values. LOA and SRD results agree with those for our ICC values and help affirm our study purpose. LOA results generally appear homoscedastic, yet with lower medial skin fold values. The low Table 2 medial SRD values are a function of how they were calculated. In contrast to the other skin folds, medial values produced a higher correlation (
Discussion
Current laboratory visits were spaced 10.6
More research has assessed calf skin fold thickness with the B-mode, which is partly the impetus behind our assessment of A-mode US. In one study of well-trained athletes, B-mode US and calipers were each used to measure skin fold thickness at eight sites, which included the medial calf [12]. Results showed B-mode US was consistently more accurate among the sample examined [12]. B-mode US was also compared to calipers to assess medial calf skin fold thickness in women athletes [13, 14]. ICC values were 0.953, with a 95% test-retest range of 0.90–0.98 [14]. Measured values, from calipers and US, produced an R
Establishment of intra- and inter-rater reliability is vital to hand-held US, as limited data exist [15, 16, 19, 21]. It is important to compare our results with those in the literature, as hand-held US is assumed to create more user error from inconsistent probe placement [1, 2, 5, 21, 22]. To assess hand-held US reliability, 11 college students underwent 1- and 3-site skin fold thickness measurements, which elicited ICC values of 0.98 and 0.94, respectively [22]. While those results hold promise, the authors did not state the mode (A or B) they used to collect data [22]. In another study, B-mode hand-held US assessed skin fold thickness in canoeists [19]. Eight skin fold sites were examined, which included the medial calf [19]. Measurements were done by a rater trained only days before actual data collection. Nonetheless, ICC results were 0.99, while those for the medial calf were 0.98 and were said to be close to physiological feasibility, due to fat’s plasticity and blurred muscle-fat interfaces that limit measurement accuracy by
With all measurements done by novice raters, inter-rater A-mode US (BodyMetrix BX-2000) reliability was assessed, with those data compared to data obtained from skin fold calipers [15]. With three-site formulas used that did not include calf measurements, results showed consistently higher ICC values and narrower 95% confidence intervals from US. It was concluded that inter-rater reliability was superior in novice raters with A-mode US, which concurred with prior research [15, 31, 32]. In an investigation perhaps most like the current study, since it assessed intra- and inter-rater A-mode US reliability and included calf data, subjects submitted to subcutaneous skin fold thickness measurements with a Renco Lean-Meater Series US unit [16]. Intra-rater ICC values for calf skin fold thickness were 0.95 and 0.86 from two raters [16]. Such results appear promising since they address the issue of intra-rater error due to inconsistent probe placement, yet the study used a different US brand than the current investigation [1, 2, 5, 16, 21].
Table 2 LOA results, calculated from standard deviations for each skin fold site, display test-retest reliability for 95% of the population examined. Our LOA values exhibit low amounts of variability, as do our SRD values; SRD is the least change needed to note true skin fold measurement differences. Table 2 SRD values, which help identify systematic error attributable to learning effects from US measurements [33], compare favorably with prior results [34, 35]. In contrast to our LOA and SRD values, CV% results exhibit more variability and low intra-rater reliability, which may be due to low mean skin fold thickness values. CV%, which represents the variability about the mean, works best when standard deviations are far less than positive mean values [36]. In contrast, current CV% results appear to be inflated by low mean values, and thus exceed many of those reported in the literature. CV% is also impacted by sample heterogeneity and the range of data values [33, 37]. Given these attributes, CV% may not be an ideal index to assess current study data. Nonetheless, it is noteworthy that calf muscle exercise performance, assessed on an isokinetic dynamometer, yielded intra-subject CV% values of 2.6 (at 1.05 rad
The SRD, perhaps the best current study statistic to assess intra-rater reliability, has been deemed superior to ICC since it offers insights on the responsiveness of an intervention and its clinical relevance [41, 42, 43, 44]. The SRD has been defined as the smallest statistically significant change, and as an assessment of real change rather than random data variations [43, 44]. Prior studies on exercise data reliability saw large SRD values, which were likely due to subjects’ inability to provide consistent values with physical activity [45, 46, 47]. Lower SRD values occurred in studies that did not include exercise. For instance, sickness impact profiles from 40 stroke patients entailed multiple physical and psychosocial measures [42]. SRD results ranged from 9.3–40.3 [42]. Perhaps most pertinent to the current results, some studies calculated SRD values in response to US measurements [41, 44]. Inter-recti distance was assessed with B-mode US at seven weeks and six months post-partum [44]. Distance between the two rectus abdominis muscles was measured at four sites on the linea alba. SRD values varied from 0.33–0.55, while results for those same sites in women who never gave birth ranged from 0.20–0.29 [44]. With only intrasession intra-rater measurements taken, it was implied that inter-rater reliability was needed and important to longitudinal studies [44].
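To make the relationship between the SRD, the standard deviation and the ICC concrete, the following is a minimal sketch. It assumes the commonly used formulation in which the standard error of measurement is SEM = SD × √(1 − ICC) and SRD = 1.96 × √2 × SEM; this section does not state the exact formula used for the current study, so the formulation, the function names and the input values are illustrative assumptions and not the authors' method or data.

```python
import math


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """Return SEM = SD * sqrt(1 - ICC) (assumed formulation, not from the study)."""
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_real_difference(sd: float, icc: float, z: float = 1.96) -> float:
    """Return SRD = z * sqrt(2) * SEM; z = 1.96 corresponds to a 95% level."""
    return z * math.sqrt(2.0) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs (mm), chosen only to show how SD and ICC drive the SRD.
for sd, icc in [(1.0, 0.90), (2.0, 0.90), (2.0, 0.75)]:
    srd = smallest_real_difference(sd, icc)
    print(f"SD={sd:.1f} mm, ICC={icc:.2f} -> SRD={srd:.2f} mm")
```

Under this formulation, the SRD scales linearly with the standard deviation when the ICC is held constant, and it grows as the ICC falls.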
Thirty subjects (20 women, 10 men) underwent US imaging with two devices, one of which was portable, to assess the intra- and inter-rater reliability of abductor hallucis measurements [41]. To assess inter-rater reliability, subjects made two laboratory visits spaced three days apart and all data were collected by the same technician who had two years of prior US experience [41]. With the measurement mode unspecified, data were collected and SRD values, indicative of the magnitude of change that exceeded the expected inter-trial variability, were calculated. SRD dorso-plantar thickness results showed intra- (0.26–0.34) and inter-rater (0.71–0.91) reliability comparable to those of the current study. Yet medio-lateral and cross-sectional area intra- and inter-rater SRD results were far greater [41]. It was concluded that intra-rater reliability of abductor hallucis measurements for the two US units examined was very high. Unlike the current study, data were not partitioned by gender [41]. Except for Table 2 anterior skin fold thickness results, current SRD values were consistently higher for female subjects. The higher anterior SRD values for male subjects are due to the weaker correlation for those same data collected across the two test days, which also appears to be why male anterior skin fold data elicited the lowest Table 2 ICC value. Otherwise, SRD results appear to be influenced by standard deviation values. Bigger standard deviations usually yield higher SRD values. Yet since sample size impacts standard deviations, it is noteworthy that the current study had more men, which influences inter-gender SRD comparisons.
Reliability and validity are interrelated, and each is used to assess research quality. Reliability refers to the consistency of repeated measurements. In contrast, validity denotes measurement accuracy, or an instrument’s ability to measure what it purports to measure, such as when US skin fold values are compared to those from DEXA [39, 40] or calipers [11, 22]. Measurements may exhibit reliability without being valid, as occurred in some [22, 39, 40], but not all [11], hand-held US trials. Skin fold thickness measurement validity was examined at seven anatomical sites with calipers and an US (Logic 500 Pro, General Electric; Milwaukee WI) unit [11]. Results included US medial calf skin fold thickness values comparable to those of the current study; thus, it is assumed they exhibit high reliability [11]. In addition, skin fold thickness values measured by calipers and US were also significantly (
However, and vital to US measurement interpretation, some studies had high data reliability yet low validity [22, 39, 40]. College students underwent 1- and 3-site skin fold thickness measurements done with calipers and an US (BodyMetrix BX-2000) unit [22]. ICC skin fold thickness results for 1- and 3-site US measurements were 0.98 and 0.94, respectively [22]. Yet despite high US reliability values as determined by ICC, it was concluded the same device’s validity was low due to weak correlations with the same measurements done with calipers. Due to the high reliability, the authors suggested US may help track moderate skin fold thickness changes over time [22]. Another study compared US (BodyMetrix 2000) values to DEXA for adipose tissue thickness estimates [40]. Both 1- and 3-site US measurements were taken from 13 college-age female gymnasts. There were significant differences in adipose tissue estimates, whether the 1- or 3-site protocol was used [40]. It was concluded US did not offer valid estimates [40]. Finally, in an A-mode hand-held US (BodyMetrix 2000) study, measured fat and fat-free mass values were compared to those from DEXA [39]. Results included vast intra-subject variability for the measurements employed. Specific to US, it was concluded that the measurements underpredicted fat mass and significantly overpredicted fat-free mass [39]. It was also suggested that a limitation of the US unit was its reliance on proprietary software to define muscle-fat interfaces at measurement sites. Perhaps accuracy and validity would improve with visual feedback provided by scans [39].
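The distinction between high reliability and low validity can be illustrated with a small numerical sketch. The values below are hypothetical and were invented only for illustration; they are not data from the current study or from any cited trial. The sketch uses a simple Pearson correlation as a stand-in for both comparisons, which is an assumption made for brevity and is not equivalent to the ICC reported in the studies above.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical skin fold values (mm) for six subjects; not real data.
calipers = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]   # comparison method
us_visit_1 = [6.0, 9.0, 5.0, 8.0, 7.0, 6.5]       # US, first visit
us_visit_2 = [6.2, 8.8, 5.1, 8.1, 6.9, 6.6]       # US, second visit

# Reliability: do repeated US values agree with each other?
retest_r = pearson_r(us_visit_1, us_visit_2)
# Validity: do US values track the comparison method?
criterion_r = pearson_r(us_visit_1, calipers)

print(f"test-retest r = {retest_r:.2f}")
print(f"US vs. calipers r = {criterion_r:.2f}")
```

In this constructed example the two US visits agree closely with each other while showing almost no relationship to the caliper values, which mirrors the pattern of reliable yet not valid measurements described above.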
Conclusions
The current study examined lower leg subcutaneous skin fold thickness, which has received little attention in prior US studies. With ICC as one of the best statistical tools to assess reliability, our results compare favorably to prior studies that also examined US (and A-mode) measurements. A factor that contributed to current ICC, LOA and SRD results was the familiarization our rater underwent prior to actual data collection. Like other studies, and despite our high intra-rater reliability results, we advise training in US usage, particularly for hand-held units, before actual data collection [1, 2, 3, 5, 6, 7]. Future efforts with hand-held US should focus on development of standards for healthcare professionals to ensure these devices are best utilized.
Author contributions
CONCEPTION: Brian T. McGirr, Jake L. Martin and John F. Caruso.
PERFORMANCE OF WORK: Brian T. McGirr, Jake L. Martin, Chris E. Colborn, Alex C.S. Shefflette, Steve R. Soltysiak, Elisabeth Dichiara and John F. Caruso.
INTERPRETATION/ANALYSIS OF DATA: John F. Caruso.
PREPARATION OF MANUSCRIPT: Brian T. McGirr, Steve R. Soltysiak and John F. Caruso.
REVISION FOR IMPORTANT INTELLECTUAL CONTENT: John F. Caruso.
SUPERVISION: John F. Caruso.
Ethical considerations
The current study was approved for data collection by the University of Tulsa’s Institutional Review Board in 2014 (Protocol 14-56). All subjects provided written informed consent prior to participation in the study.
Funding
The authors report no funding.
Acknowledgments
We wish to thank our subjects for their participation.
Conflict of interest
The authors have no conflicts of interest to report. Given his role as an Editorial Board Member, John F. Caruso had no involvement in, nor access to information regarding, the peer review of this article.
