Abstract
There is not a universally accepted fit test procedure because test protocols need to be customized for specific products or research questions. A fit test protocol can vary significantly based on the product tested, but often includes these steps: define fit criteria, select anthropometric dimensions of interest, perform testing, collect quantitative and qualitative feedback, identify design recommendations, and develop or refine a sizing chart. To illustrate general approaches to human fit testing, case studies of fit tests are described for three different product types: a multi-layered protective clothing ensemble, a therapeutic airway clearance vest, and a patient transfer sling.
Ergonomic design processes typically rely on human testing, with benefits that include improving user acceptance (Jeon et al., 2011; Ármannsdóttir et al., 2020), confirming design requirements (Zehner et al., 1987), establishing or enhancing product preference (e.g., Jeon et al., 2011), and maintaining customer satisfaction. Human testing also yields additional feedback that allows designers to benchmark user preference across multiple designs.
Several business needs leverage fit testing, which collects data on human dimensions and the corresponding user judgments of product fit, comfort, and preference. For example, in the medical device environment, fit testing may be a component of required human factors and usability testing. Businesses may also use the same information as evidence for marketing claims, product positioning, and pricing strategy. A well-fitting product can help reduce product returns, increase utilization, and improve profitability. Much research has applied anthropometry to product design (Deanat, 2018), but human fit testing is not routinely discussed in the ergonomic design literature, except in the case of quantitative fit testing of respirators (e.g., Cameron et al., 2020; Hon et al., 2017), or in specialized military contexts (Brantley, 2000; Choi et al., 2011, 2016). Additionally, because test protocols need to be customized for specific products or research questions, there is not a universally accepted standard fit test procedure.
This article presents case studies of fit tests for three diverse products, illustrating both commonalities and differences in methodology based on product functionality, stage of development, and business goals. The objective is to highlight the breadth of strategies employed across the different projects, not to provide sufficient methodological detail to allow reproducibility, nor to report specific results. The projects vary from a multi-layered protective clothing ensemble with well-established fit criteria, to an evaluation of medical devices that deliver respiratory therapy, and the exploration of fit criteria for a sling designed to lift patients.
FIT TESTING: AN OVERVIEW
A specific fit test protocol can vary significantly based on the product tested, but often includes these common steps: define fit criteria, select anthropometric dimensions of interest, perform testing (on one or more sizes of a product), collect quantitative and qualitative feedback, identify design recommendations, and develop or refine a sizing chart.
Fit Criteria
Two people with an identical body size and shape might experience a different “fit” because people interpret the term differently. A solution to this problem is to define specific fit criteria so that the same concept is used across all potential test participants. The fit test team should define these fit criteria both quantitatively and qualitatively by working in advance with the design team, who will have designed the product with a specific fit in mind.
Fit criteria can take many forms, but ideally, they can be expressed in quantitative terms. For example, a criterion for sleeve length might be that the end of the sleeve should be within ½ inch of the Stylion (bony wrist) landmark. Alternatively, a criterion for chest circumference might be between 4 and 6 inches of circumferential ease, that is, the difference between the garment circumference and the chest circumference. Qualitative fit criteria, although subjective, can also be useful for assessing perceived comfort. For example, one might ask the participant if the garment is too tight or too loose in the waist to determine if a pattern adjustment needs to be made. Note that when making use of qualitative fit criteria for a specialized product, it is important to test participants who understand the way the product is meant to be worn and used, as naïve users might apply inappropriate fit or comfort criteria.
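As a minimal sketch of how such quantitative criteria might be encoded, the Python snippet below checks the two example criteria from this section. The function names, the use of inches, the sign convention for the sleeve measurement, and the sample measurements are illustrative assumptions, not values from any of the case studies.

```python
def circumferential_ease(garment_circ_in, body_circ_in):
    """Ease = garment circumference minus body circumference (inches)."""
    return garment_circ_in - body_circ_in


def chest_ease_ok(garment_circ_in, chest_circ_in, min_ease=4.0, max_ease=6.0):
    """Pass if chest ease falls within the example 4 to 6 inch window."""
    ease = circumferential_ease(garment_circ_in, chest_circ_in)
    return min_ease <= ease <= max_ease


def sleeve_length_ok(sleeve_end_to_stylion_in, tolerance=0.5):
    """Pass if the sleeve end lies within the tolerance of the stylion landmark.

    The argument is the signed distance (inches) from the sleeve end to the
    landmark; the sign convention is a hypothetical choice.
    """
    return abs(sleeve_end_to_stylion_in) <= tolerance


# Hypothetical participant: 38 in chest wearing a garment with a 43 in chest.
print(chest_ease_ok(43.0, 38.0))   # ease is 5.0 in
print(sleeve_length_ok(-0.75))     # sleeve ends 0.75 in from the landmark
```

Recording each criterion as a separate pass/fail check keeps the results easy to tabulate by size and by participant.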
Choosing Anthropometric Dimensions
As fit is partly a function of anthropometry, it is key to obtain anthropometric measurements on test participants. The selection of anthropometric dimensions to determine fit is guided by two principles. First, the list should allow comparison of the test sample with the larger population for whom the product is designed. In a civilian setting, that might mean comparing with any available anthropometric data, such as from the National Health and Nutrition Examination Survey (NHANES) (CDC, 2020). For military products the test sample might be compared with the most recent US Army data (ANSUR II) (Gordon et al., 2014) or similar datasets, which generally contain many more dimensions than NHANES. After data collection, comparisons are made either by using statistical methods or visual comparison of scatterplots depicting the test sample in relation to the general population.
Second, the measurement list should contain body dimensions that are easily relatable to garment or product dimensions. For example, length, width, and circumference measurements are included for all dimensions that could relate to a garment. Alternatively, documenting the location of the eyes and ears may be useful for head-worn devices. Standards or regulations for certain products may also inform the choice of dimensions.
Using Fit Test Data and Developing Recommendations
Translating fit test data into product improvements begins with analyzing the results of each fit criterion and then applying the findings of the test sample to the anthropometric characteristics of the intended user population. For example, if the sleeves of a uniform were too long for the majority of participants, we might recommend shortening the sleeve by a length that provides an improved fit for those with shorter arms while still accommodating those with the longest arms. However, these recommendations are product dependent. In the case of a chemical protective garment, exposed skin poses a burn risk, and we might recommend leaving the sleeve too long for those with shorter arms to avoid the risk of any users having exposed skin. Although the cases described herein focused on product functionality and comfort, fashion or social considerations may also influence the application of fit test data to product designs. These case studies also evaluate products for which the material is determined and the design is relatively mature; approaches may differ when multiple materials or other design attributes are also being evaluated.
The other immediate use for fit test data is the creation, validation, or improvement of size prediction charts. To create a size prediction chart, two-dimensional scatter plots are created to compare the predicted size for each individual with their actual best-fitting size. An optimal chart is determined by adjusting the boundaries between adjacent sizes until the predicted size matches the actual best-fitting size for the greatest number of individuals. The same approach is used when validating or updating size prediction charts: the predicted size is compared to the actual best-fitting size, and the chart is adjusted accordingly.
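The boundary search can be sketched for the simplest case: a single key dimension and two adjacent sizes. The data, the size labels, and the function `best_boundary` are hypothetical. A real chart would typically involve two dimensions and more than two sizes, and the article does not specify the search procedure actually used.

```python
def best_boundary(measurements, best_fit_sizes, small="M", large="L"):
    """Find the cut point on one dimension that best separates two sizes.

    A participant is predicted `small` when the measurement is below the
    boundary and `large` otherwise. Returns (boundary, number_correct).
    Ties are resolved in favor of the lowest candidate boundary.
    """
    values = sorted(set(measurements))
    # Candidate boundaries: below all values, midpoints between neighbours,
    # and above all values.
    candidates = [values[0] - 1.0]
    candidates += [(a + b) / 2.0 for a, b in zip(values, values[1:])]
    candidates.append(values[-1] + 1.0)

    best = (None, -1)
    for boundary in candidates:
        correct = sum(
            (small if m < boundary else large) == actual
            for m, actual in zip(measurements, best_fit_sizes)
        )
        if correct > best[1]:
            best = (boundary, correct)
    return best


# Hypothetical chest circumferences (inches) and observed best-fitting sizes.
chest = [36.0, 37.5, 38.0, 39.0, 40.5, 41.0, 42.0]
fit = ["M", "M", "M", "L", "M", "L", "L"]
boundary, n_correct = best_boundary(chest, fit)
print(boundary, n_correct)
```

In this toy data set no single boundary predicts every participant correctly, which mirrors the practical situation in which a chart is tuned to maximize, rather than guarantee, correct predictions.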
CASE STUDY 1: MULTI-LAYERED PROTECTIVE CLOTHING ENSEMBLE
The Integrated Aircrew Ensemble (IAE) was developed for the US Air Force. It was designed to protect aircrew from a changing variety of threats, from cold weather to chemical/biological weapons to emergency landings over water. Since the type of threat can vary over a service member’s tour of duty, the program goal was that a single set of garments would contain all the elements needed for a variety of working conditions and could be combined or removed as needed. The challenge from a size and fit point of view was making the various layers still fit in any of several possible combinations. The eight layers are:
1) Undergarments (silk-weight T-shirt and briefs; summer socks)
2) Thermal undergarments (mid-weight long sleeve and long leg)
3) Environmental protective layer (EPL; one-piece thermal garment, plus socks)
4) Chemical/Biological protective layer (CBRL; one-piece absorptive garment, plus socks)
5) Pressure vest (torso; to counter loss of blood to the brain in high-G maneuvers)
6) G-Suit (lower body; to counter loss of blood to the brain in high-G maneuvers)
7) Parachute harness
8) Survival vest (contains survival gear, such as knives and lights)
A standard flight suit is worn by all fliers together with the other required layers of the IAE. However, sizing adjustments of the flight suit were not in scope for this study. Layers 3 and 4 are seen in Figure 1 along with the standard flight suit. Figure 2 shows layers 7 and 8.
Figure 1. Selected IAE layers (from left to right): EPL; CBRL; flight suit.
Figure 2. Survival vest and parachute harness.

The program goal was to accommodate 98% of males and 98% of females in the Air Force flying population in as few sizes as possible. The goal for this fit test was to determine whether 98% of males and females were accommodated, determine the accuracy of the size prediction chart, and identify whether any pattern changes were needed.
Seventy-three Air Force personnel were recruited to ensure that we had participants in each of the available sizes. For most of the layers, six circumferential sizes and four lengths were available for testing. Some of the circumferential sizes came in all four lengths, but most sizes came in fewer than four lengths, yielding a total of 16 size/length combinations.
Anthropometric Dimensions Measured for IAE Study.
While the protective system was designed to be worn in multiple configurations, it was not practical to test every possible combination of layers. As a result, we tested two configurations—a minimal set (for summer weather, flying over land, no chem/bio threat) and the full ensemble including all layers. To address questions of fit for each layer, we tested the layers individually as they were incrementally added to the ensemble.
Thermal Undergarment Fit Criteria. Measurements taken with participants standing with both arms hanging straight at the sides. “Ease” Indicates the Difference Between the Body Circumference and Corresponding Garment Circumference.
In addition, for some layers we used a limited functional protocol, verifying the participants could completely move the arms forward and overhead or step up on a high foothold. We did a number of these functional movements in the seated position as well. These criteria are measured differently from the ease dimensions, but they are essentially recorded as a set of pass/fail criteria.
The accuracy of a size prediction chart measures the percentage of participants from the test sample predicted into the correct size. Typically, the preferred approach is for participants to try on adjacent sizes (e.g., one size up and one size down from the usual or predicted size) to determine which is the better fitting size. Since this particular test was part of a multi-year development program, and the size prediction charts were reasonably well established prior to this test, most participants fit the predicted size. The few participants not correctly predicted by the size charts from previous program phases were moved to their best-fitting size before data collection on the remaining fit criteria began. With the best-fitting size overlaid against the anthropometric data, it was possible to adjust the size prediction charts as needed to create the final sizing charts for each of the layers.
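A minimal sketch of this accuracy calculation follows. The size labels and participant data are hypothetical, and the helper `mispredicted` is an illustrative addition for flagging participants who would be moved to their best-fitting size before further data collection.

```python
def chart_accuracy(predicted, best_fitting):
    """Percentage of participants whose predicted size equals their best fit."""
    if len(predicted) != len(best_fitting):
        raise ValueError("both lists must have one entry per participant")
    if not predicted:
        raise ValueError("at least one participant is required")
    matches = sum(p == b for p, b in zip(predicted, best_fitting))
    return 100.0 * matches / len(predicted)


def mispredicted(predicted, best_fitting):
    """Indices of participants whose predicted size is not their best fit."""
    return [i for i, (p, b) in enumerate(zip(predicted, best_fitting)) if p != b]


# Hypothetical size/length labels for five participants.
predicted = ["M-Regular", "M-Long", "L-Regular", "L-Long", "S-Regular"]
best_fit = ["M-Regular", "M-Long", "L-Regular", "L-Regular", "S-Regular"]
print(chart_accuracy(predicted, best_fit))
print(mispredicted(predicted, best_fit))
```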
Finally, in a few cases where pattern adjustments were needed, we used the anthropometric data along with the fit results to recommend specific changes in key dimensions to improve the fit.
CASE STUDY 2: AIRWAY CLEARANCE VEST
An airway clearance vest (ACV) provides High Frequency Chest Wall Oscillation (HFCWO) therapy to mobilize retained secretions that cause respiratory infections, hospitalizations, and reduced lung function (Button & Button, 2013). While most healthy individuals can clear mucus through coughing, some chronic conditions require additional measures (Sutton et al., 1983). Chest physical therapy (CPT) with percussion of the chest by another individual can be used to help loosen lung secretions. As an alternative, ACVs offer an independent option with a consistent and thorough application of therapy. By inflating and deflating a soft vest garment with air pressure, the gentle oscillations of the chest wall help to dislodge mucus from the bronchial walls and move secretions from smaller to larger airways where it can be coughed or suctioned out. An ACV is prescribed for conditions including cystic fibrosis, bronchiectasis, or other chronic pulmonary and neuromuscular conditions.
The ACV of focus in this testing was The Vest® System (Hillrom, Batesville, IN, USA), which consists of a breathable, non-stretch, polyester and polyurethane-coated nylon garment with buckles, a set of air hoses, and an airflow generator/control unit. Once individuals are fitted to the garment, they connect the air hoses to ports in the front of the garment and on the control unit. The control unit has programmable settings that vary frequency and intensity of the airflow delivery (Figure 3). Anecdotally, many of the participants in this study reported a roughly 30-minute therapy routine 2–3 times per day while seated.
Figure 3. Airway clearance vest.
Anthropometric Dimensions Measured for Vest Study.
Fourteen participants with cystic fibrosis, bronchiectasis, COPD, or a combination of respiratory conditions were enrolled in the study (57% female, 43% male), which was approved by an external IRB. It was important to recruit actual users of HFCWO products, as they would be experienced in how the therapy should feel and able to identify functional aspects of appropriate fit and comfort.
After anthropometric measurements, participants were asked to don a garment and answer a series of questions about the fit/feel (pre-test). Next, they were taken through a modified therapy with the device on a low setting for 10 minutes and answered questions about comfort and fit (post-test). Pre-test questions included a baseline report of discomfort level with the garment before the system was turned on, as well as assessments of the adjustment of the buckles, straps, or hook and loop fastener. Post-test questions included responses to tests such as emulating a cough/deep breath, perceptions of quality, proper positioning, fit, and comfort of the garment. Most comfort questions used a 7-point Likert scale, but participants were also asked to estimate which device could be used longest without discomfort. Comparison surveys asked participants to assess perceived differences in quality, fit, and ease of use of the vest. The final question asked the participant to rank the devices by overall preference for which they would want to take home for use.
This study verified the prototype comfort performance and fit accommodation relative to existing products. Additionally, this small preliminary test clarified the best fit criteria for future fit testing and sizing studies, such as the finding that waist circumference was generally a better predictor of sizing than chest circumference. Furthermore, both quantitative and qualitative data provided additional learnings for future design. For example, participants varied in terms of preferred length of the vests. Some liked a garment that did not cover their belly, while others preferred full coverage. Participants also varied in factors that influenced preference for the device. Some users considered comfort and fit, while others prioritized the perceived quality of therapy or the type of oscillation provided. Participants also shared unique strategies and contexts for how they use the product. These nuances will better inform design requirements and associated testing.
CASE STUDY 3: PATIENT TRANSFER SLING
Healthcare workers experience musculoskeletal injuries at rates that exceed most other industries, and these injuries are most often caused by manual handling of patients (BLS, 2018), which exposes workers to physical stresses that far exceed recommended guidelines (Marras, 1999). Safe patient handling and mobility (SPHM) equipment is recognized as the gold standard solution to reduce the risk of injury in caregivers (American Nurses Association, 2014). SPHM equipment is also a tool that facilitates mobilizing patients out of bed to combat the negative effects of bedrest. Despite the clinical and financial benefits of mobilizing patients, it is often among the most omitted nursing tasks, in part because of the physical challenges that arise when SPHM equipment is unavailable.
A seated sling is used together with a ceiling lift or mobile lift to safely transfer a patient to destinations including chairs, toilets, beds, stretchers, exam tables, and vehicles. The seated sling is one of the most commonly used pieces of SPHM equipment. Proper fit is important for ensuring the patient can be lifted safely. Furthermore, although a patient is typically only suspended for a few seconds and rarely more than 1 minute, a comfortable experience is important for gaining cooperation from patients and endorsement from healthcare workers.
The goal of this study was to better define the range of patient anthropometry accommodated by each size of a common model of seated sling. A secondary goal was to evaluate the sizing chart. This approach required developing new fit criteria and methods for assessing comfort that were specific to being suspended in a seated sling.
The sling has unique requirements compared to many other products that are sized for an individual, because the sling fully supports the body weight against gravity and is used for only a short time. The sling evaluated in this testing was made of non-stretchable polyester material with a foam layer for comfort in the thighs and back. The manufacturer had existing criteria for assessing fit, but this testing allowed the opportunity to evaluate additional criteria that could be more reliable for future testing and development. The gold standard used to assess “acceptable” and “ideal” fit was the determination of a clinical expert who reviewed photos taken during testing. A pilot assessment with three clinical experts showed sufficient reliability that a single expert could be used for the full testing.
Additional potential measurements evaluated to predict fit included hip angle, span of sling material not in contact with the back of the body, and protrusion of the buttocks below the seam of the sling. It was known that as a sling becomes too large for a patient, the hip angle decreases, the buttocks can protrude below the edge of the sling, and more sling material is not in contact with the back. Quantifying these measurements allowed evaluation for their use in future testing to predict fit.
Anthropometric Dimensions Measured for Sling Study.
a Measured while the participant was seated in the sling.
After measurement, the participants sat leaning forward slightly on a flat surface, and each sling was consistently applied by placing the edge of the sling just under the buttocks. The participant was raised with the lift to a standardized height fully off the sitting surface. After the participant was suspended, they adjusted themselves in the sling for comfort and symmetry. Pilot testing showed this adjustment resulted in more repeatable measurements.
To define hip angle, adhesive dots were placed on the participant’s right knee at Lateral Femoral Epicondyle, on the sling over the hip at Trochanter, and on the sling where the participant’s shoulder made its highest contact with the sling. To quantify the amount of material not in contact with the participant, the slings had been pre-marked with a grid of one-inch squares, starting with zero in the center of the sling at the bottom seam. The x-y location of the shoulder dots was recorded for both right and left sides and retained for later analysis.
After attaching the markers, two photographs were taken for each sling from standardized front and side views. A clinical expert used these photos to assess the fit of each sling for each participant. Two additional dimensions, Shoulder-Trochanter-Knee Angle (hip angle) and Bottom Protrusion Length, were extracted from the side view photographs (Figure 4). Considerations of the expert reviewer included the patient falling out through the seat or experiencing uncomfortable hip flexion when the sling was too large, or falling through the front or experiencing point pressures in the legs and crotch when the sling was too small.
Figure 4. Participant suspended in the patient transfer sling. Pink dots indicate the shoulder (top left), trochanter (center), and lateral epicondyle (knee). The lines indicate hip angle as measured from the photographs.
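The Shoulder-Trochanter-Knee Angle can be computed from the three marker positions once they are digitized from a side-view photograph. The sketch below is one plausible approach using the angle between two vectors. The article does not describe the extraction method actually used, and the function name and pixel coordinates are hypothetical.

```python
import math


def hip_angle_deg(shoulder, trochanter, knee):
    """Angle at the trochanter (degrees) between the rays to shoulder and knee.

    Each argument is an (x, y) marker position in photo coordinates.
    """
    v1 = (shoulder[0] - trochanter[0], shoulder[1] - trochanter[1])
    v2 = (knee[0] - trochanter[0], knee[1] - trochanter[1])
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("markers must not coincide with the trochanter")
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))


# Hypothetical marker positions in pixels.
angle = hip_angle_deg(shoulder=(120, 80), trochanter=(300, 400), knee=(620, 330))
print(round(angle, 1))
```

Because the result depends only on the relative positions of the markers, it is unaffected by image scale, although it still assumes the camera views the participant squarely from the side.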
Most sizes of the sling were tested on each participant. XXLarge was the first size tested, unless that size was clearly much too large. The slings were then tested in order of decreasing size until a sling was much too small. In that case, significant discomfort, typically in the crotch, was the signal that no smaller sizes should be tested on that individual.
In each sling, investigators measured two additional dimensions, Forward Leg Seam to Popliteal Fossa, and Knee-Knee Breadth, and then asked the participants a series of questions. The time spent in each sling was approximately equal, and the list of questions was the same for each sling. The questions were:
• Is this more or less comfortable than the previous sling configuration? Why?
• How long could you stay in this sling?
• Do you feel discomfort? If yes, where?
• Do you think you could fall out?
The key to understanding the answers to the question “Is this more or less comfortable than the previous sling configuration?” was the sequence of slings tested, which allowed us to infer the participant’s self-determined best-fitting size after the fact. This approach simplified the assessment for the participant by obviating the need to recall the experience of multiple sizes or provide consistency across sizes using Likert scores. We estimated “participant-determined” best-fitting size based primarily on the optimum of the comfort comparisons with ad hoc adjustments or tiebreakers using the other subjective responses. These participant-determined best fits were generally consistent with the clinical expert determination of best-fitting size, although where differences occurred the clinical expert tended to consider a larger size as optimum. This may highlight differences in clinical needs and required flexibility compared to individual preferences.
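One way to turn the sequence of pairwise comfort comparisons into a single participant-determined size is to accumulate a relative comfort score along the test order and take its peak. The sketch below illustrates that idea only. The scoring scheme, the "same" response, the size labels other than XXLarge, and the rule of keeping the earlier-tested size on ties are assumptions; the study resolved ties with ad hoc adjustments based on the other subjective responses.

```python
def participant_best_size(sizes_in_test_order, comparisons):
    """Infer the most comfortable size from pairwise comparisons.

    `comparisons[i]` describes sizes_in_test_order[i + 1] relative to the
    size tested just before it: "more", "less", or "same" comfortable.
    """
    if len(comparisons) != len(sizes_in_test_order) - 1:
        raise ValueError("need one comparison per sling after the first")
    step = {"more": 1, "same": 0, "less": -1}
    score = 0
    best_size, best_score = sizes_in_test_order[0], 0
    for size, answer in zip(sizes_in_test_order[1:], comparisons):
        score += step[answer]
        # Strict comparison keeps the earlier-tested size when scores tie.
        if score > best_score:
            best_size, best_score = size, score
    return best_size


# Hypothetical participant tested from largest to smallest size.
sizes = ["XXL", "XL", "L", "M", "S"]
answers = ["more", "more", "less", "less"]
print(participant_best_size(sizes, answers))
```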
A comparison of best-fitting size to the available anthropometric dimensions showed that Buttock-Knee Length was the best predictor of fit. However, we also evaluated sizing based on weight, as this attribute is known to most caregivers and it is impractical for hospital caregivers to routinely make an anthropometric measurement before performing a transfer with the sling. These attributes were only fair predictors of fit, but this was partially because of forgiving sizing in which all participants experienced acceptable fit in at least three sizes.
This fit test was unique in its measurement of comfort and also the use of the clinical expert who determined best-fitting size and “acceptable” sizes. This test allowed the design team to identify new fit criteria and eliminate others that were less effective (e.g., buttock protrusion depth and sling contact on the back) as objective predictors of fit. Furthermore, the team gained a more nuanced understanding of the relationship of body size and shape to sling fit and how many sizes are needed to best accommodate the intended user population.
DISCUSSION
Anthropometric fit tests can take a variety of forms, but the uniting principle is evaluating the fit of the product for a sample of test participants and using anthropometric data to infer the fit more generally for the target population. Although we have not reported population comparisons, in each case anthropometric data were used to verify that the test sample was anthropometrically representative of the product’s intended user population. Beyond the unifying principle of inferring fit of a sample to a target population, the form and design of the fit test follow the goals of the investigation, and the type of product being tested.
In the case of the IAE, the specific fit criteria simplified data collection and analysis. This yielded targeted recommendations for changes to product dimensions to maximize fit, and small adjustments to the size prediction chart. By contrast, the fit criteria for the sling lacked objective definition prior to the test and there was uncertainty about which functional and anthropometric measurements were needed. This test of the sling yielded a better understanding of fit criteria, which will translate to more streamlined future tests with fewer required measurements. The findings showed that discomfort at the back of the knee was a key driver in participant satisfaction with a particular size, and that adjacent sizes overlapped considerably with respect to the population they accommodated. Finally, the small, targeted fit test for the HFCWO vest confirmed comfort performance and fit accommodation relative to existing products. Further, the test provided insights for future designs.
In addition to evaluating fit, a goal of many fit tests is also to evaluate or optimize a size prediction chart. In the case of the IAE, the fit test enabled slight optimizations of the charts. For the sling, and to a lesser extent the vest, the data validated existing size prediction charts. Such nuanced information can be helpful to large organizations in determining how many of each size to purchase or keep in stock.
SUMMARY
An anthropometric fit test can provide useful information in the development of new products, optimization of existing products, and in enhancing comfort and function of products at any point in a product’s development or lifespan. Unlike, for example, clinical trials in a medical setting, there are no uniform procedures for conducting such tests. Instead, the specific product type and test goals define how the tests should be conducted. We have shown how tests were conducted for three different products to illustrate the range of procedures that might be employed in such testing. We encourage the reporting of anthropometric fit tests in the open literature to increase the use of these valuable testing procedures.
Footnotes
Testing of the Integrated Aircrew Ensemble was funded by TIAX, LLC. Testing of the airway clearance vest and patient handling sling was funded by Hillrom, a subsidiary of Baxter International.
