Abstract
A recent survey, even one limited to human studies, found considerable “publication scatter,” in that more than 250 different professional journals publish articles on obesity. Over the years, and particularly since the 1970s and 1980s, when the so-called obesity epidemic began, there has been an explosion of clinical interest in a field that encompasses general medicine, pediatrics, surgery, psychiatry, and almost every subspecialty. And rightly so: by 2008, an estimated 1.46 billion adults worldwide were overweight, and of these, 502 million were obese, figures that translate into major public health consequences. Despite many highly publicized studies, why do we not understand obesity better than we do? It is certainly not from a lack of trying. This article presents an overview of the limitations and challenges (that is, complexities) arising from discrepant frameworks and diverse conceptualizations of obesity; potential flaws inherent in its clinical studies; and, particularly, impediments due to difficulties in measuring body composition (especially adipose accumulation), food intake, and physical activity, as well as the notoriously inaccurate self-reporting of subjects. As a result, clinicians remain limited in the recommendations they can offer their patients.
A recent survey, even one limited to human studies, found considerable “publication scatter,” in that more than 250 different professional journals publish articles on obesity, and fewer than 20% of those articles appear in the 3 leading obesity journals. 1 Over the years, and particularly since the 1970s and 1980s, when the obesity “epidemic” began, 2 there has been an explosion of interest in the field, which encompasses general medicine, pediatrics, surgery, psychiatry, and almost every subspecialty. And rightly so, since by 2008 there were an estimated 1.46 billion overweight adults worldwide, 502 million of whom were obese, 3 all of which potentially translates into major health consequences. Despite highly publicized, well-conducted studies, such as those on diet and lifestyle, 4 the importance of calories, 5 comparisons of different diets, 6 or the relationship of body mass index (BMI) to mortality, 7 why do we not know more than we do? It is certainly not from a lack of trying.
There are several reasons why the study of obesity lends itself to such complexity. Although no single impediment is specific to the study of obesity, clinicians may find that it is the aggregate of uncontrolled and uncontrollable variables, found in all areas of clinical research, that predisposes investigators to difficulties. Researchers called attention to some of these issues well over a decade ago, 8 and though considerable work has been published since then, we still face many of the same problems, which not only compromise the validity of a study but also make it difficult for clinicians to issue recommendations on disease prevention to their patients.9,10 Borrowing from the language of social planning, 11 researchers have referred to obesity as a “wicked problem,” one that has no definitive formulation, complex (not binary) solutions, no immediate test of a solution, and so on. 2 We can even question whether research on diet and obesity in population-based studies is feasible at all. 12 This article limits discussion to complexities due to different conceptual frameworks and diverse conceptualizations of obesity, potential flaws inherent in its clinical studies, and, particularly, impediments due to difficulties in measuring body composition (especially adipose tissue), food intake, and physical activity, as well as the notoriously inaccurate self-reporting of patients.
An Understanding of Clinical Bias
The major reason that a study has compromised validity is that it suffers from bias. In other words, validity is “the degree to which a study is free from bias.” 13 More specifically, bias is any systematic error (as opposed to random error by chance) that can affect the design or implementation of a research study. 14 Multiple biases may be operating simultaneously and are not, by any means, mutually exclusive. 8 Only when researchers are content with a descriptive approach alone, without making recommendations or inferences, can they avoid bias analysis. 15 Clinical bias can occur, of course, in all fields of scientific research.14,16-20 For example, the testing of “non-prespecified hypotheses”—what is called the bias of “data dredging” 16 —led to detecting implausible and “spurious” associations linking astrological signs and health outcomes in a population of more than 10 million Canadians. 21
Sackett, 16 in his classic article, classified biases by the stage of research at which they arise: reviewing the literature, specifying and selecting the sample population, executing the experimental design, measuring exposures and outcomes, analyzing the data, interpreting the analysis, and publishing the results. Accordingly, he specifies about 55 different categories subsumed under these stages and defines bias as anything that “systematically deviates from the truth.”16,22 More recently, almost 75 different mechanisms by which research bias can manifest itself have been delineated. 19 Biases do not necessarily compromise research studies if they are accounted for by statistical means,20,23-27 but they can certainly compromise research if they are not appreciated and accounted for. A focus on bias has been called “a constant preoccupation among nutritional epidemiologists,” and as such, researchers have to settle for “relative validity.” 8
It was Cochrane in the conclusions to his classic treatise on health care written 40 years ago 28 who called attention to T. S. Eliot’s play in verse, The Family Reunion. 29 Cochrane urged clinicians to “abandon the pursuit” of the “margin of the impossible” and settle instead for what he called “reasonable probability.” Nowhere is this a more relevant and appropriate suggestion than in the study of obesity.
Complexities Due to Discrepant Frameworks of Obesity
One of the first complexities encountered in the study of obesity is its conceptualization. Essentially, obesity is an excess accumulation of adipose (fat) tissue. 30 Simplistically, it results from an energy imbalance in which the energy taken in as food exceeds the energy expended (calories burned). As such, it reflects the first law of thermodynamics, the conservation of energy.31,32 Other than excess adipose tissue, though, there are no “inevitable” or characteristic signs or symptoms present in everyone with obesity.33,34 In fact, this excess adipose tissue does not even accumulate in the same place in everyone with obesity: in some, it accumulates predominantly in the abdomen, most dangerously around the internal organs (the so-called android distribution, so named because it is more common in men), whereas in others it accumulates predominantly under the skin and below the waist (the so-called gynoid distribution, so named because it is more common in women). 35 Researchers cannot even agree on what really causes this accumulation of excess fat, or even on whether obesity is a disorder 36 or a disease at all.33,37 Years ago, it was even referred to as a “psychosomatic disorder,” though “multicausal” in origin. 38 More recently, it has been considered an impulse disorder 39 or even a “psychological disorder involving impulse control” and “reinforcement pathology.” 40
Obesity is, however, recognized as a disease by the World Health Organization (WHO), which lists it under the “Endocrine, Nutritional, and Metabolic Diseases” category of its International Classification of Diseases (ICD-10). 41 The concept of “obesity” as a disease is “controversial,” although obesity “meets all the criteria of a medical disease, including a known etiology, recognized signs and symptoms, and a range of structural and functional changes that culminate in pathological consequences.” 42 Obesity, in fact, has variously been called a brain disease,43,44 a metabolic disease, 45 a genetic disease,46-48 a disease of inflammation, 49 a neurochemical disease, 31 and even an infectious disease caused by a virus. 50 It has also been called a matter of “energy balance dynamics.” 51 On the other hand, from an evolutionary perspective, some believe obesity is an example of “inappropriate adaptation” 52 or even the “result of people responding normally to the obesogenic environments they find themselves in.” 2 Though there are “layered determinants” of obesity, the actual “physiology of energy balance is proximally determined by behaviors and distally by environments.” 2 Members of the National Association to Advance Fat Acceptance believe obesity is a form of “body diversity that should be tolerated and respected,” analogous to diversity of ethnicity, race, or sexual preference. 53 Whatever model we use to define obesity, the regulation of fat accumulation is likely to be extraordinarily complex, multifactorial, and determined by genetic, gender, perinatal, developmental, dietary, environmental, neural, and psychosocial factors, 43 whereby “genetics loads the gun while the environment pulls the trigger” (Table 1). 31 Furthermore, we are probably dealing with “obesities” rather than “obesity.”
Table 1. Complexities Due to Discrepant Frameworks of Obesity.
Complexities Due to Clinical Study Design
The Sample Population
One of the major issues in the study of obesity, as in all research, is the choice of a sample population. An inappropriate sample size can distort results: samples that are too large can prove anything, whereas samples that are too small can prove nothing. 16 The gold standard of research, of course, is the randomized controlled study.54,55 Neither the cohort study, in which 2 groups are identified and followed “forward in time,” nor the case–control study, in which cases are gathered, compared with a control group, and studied retrospectively (“direction of inquiry backward in time”), is as valid and free from bias as the randomized controlled study. 56 A case series, which has no control group, is the most vulnerable to bias and “prone to overinterpretation.” 56 Researchers in the field of obesity face several options: they can conduct large, sweeping community-based epidemiological studies with thousands of subjects (but fairly limited control over their subjects’ behavior), or they can draw their sample from smaller, more specific clinical populations. The most controlled of all human obesity studies are those conducted on an inpatient metabolic unit, but while researchers gain control, they forfeit exposure to real-life conditions and must often limit studies to a short duration. Of course, when the sample population is restricted (eg, to a specific race, gender, ethnic group, or age), not only is the pool of subjects limited but the generalizability of the study may be limited as well. Obesity studies commonly focus on specific populations (eg, Caucasians, Europeans, and postmenopausal women). 57 For example, the original BMI guidelines for obesity were validated in populations of European descent, and hence revisions might be warranted for people of non-European descent, such as those from China or South Asia. 58
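The remark that an overly large sample can “prove anything” can be illustrated with a standard two-sample z-test. The following Python sketch is illustrative only: the BMI difference (0.2 kg/m2) and standard deviation (4 kg/m2) are hypothetical values chosen for the example, not data from any study cited here, and the normal approximation assumes a known, common standard deviation in both groups.

```python
import math


def two_sample_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means between two equal-sized groups.

    Uses a normal (z) approximation and assumes both groups share a known
    standard deviation `sd`; all inputs in this example are hypothetical.
    """
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the difference
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2.0))    # equals 2 * (1 - Phi(z))


if __name__ == "__main__":
    # Hypothetical: a clinically trivial 0.2 kg/m2 difference in mean BMI, SD 4 kg/m2.
    for n in (50, 500, 5_000, 50_000):
        print(f"n per group = {n:>6}: p = {two_sample_p_value(0.2, 4.0, n):.3g}")
```

The difference between groups is identical in every row; only the sample size changes, yet the result moves from clearly “nonsignificant” to overwhelmingly “significant.” Statistical significance in a very large sample therefore says little about clinical importance, while a very small sample may fail to detect a real effect.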
Researchers, though, often opt to restrict the sample population because of the possibility of confounding. Confounding is essentially a “confusion of effects,” in which the apparent effect of the exposure under study becomes “mixed with” or “distorted” by some extraneous factor or factors associated with the outcome. 59 A confounder is a “risk factor” for the outcome that is also associated with the exposure, but it is not itself affected by either the exposure or the outcome.59,60 Failure to account for confounding can lead to either overestimation or underestimation of an effect, and the degree of confounding matters more than its mere presence. 59 Confounding describes an association that is true but potentially misleading, whereas bias creates an association that is not true. 20 In obesity studies, one of the most important confounders is smoking, but age, sex, and race can also be confounders.59,61
Not controlling for smoking, for example, can have serious consequences for studies in obesity. Taking a smoking history, though, is much more complicated than it first appears. Misclassification can result when researchers use a binary classification such as “smoker or nonsmoker.” It is important to inquire not only about general smoking history but also about the duration and intensity of smoking exposure (eg, when smoking began; age at cessation; brand of tobacco; whether cigarettes, cigars, or pipes were smoked; whether cigarettes were filtered or unfiltered; the extent of inhalation; etc). Even the category of “no smoking” may require further clarification.62,63 Because smokers tend both to weigh less and to have higher mortality, failing to account for smoking accurately can severely compromise studies of obesity and mortality in particular, producing the so-called J-curve of mortality and the wrong conclusion that mortality is increased not only among the obese but also among those considered normal weight or thin.7,64-67
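How an unaccounted-for confounder such as smoking can manufacture a J-shaped curve can be sketched with a small, entirely hypothetical cohort. The counts below are invented for illustration and are not taken from the studies cited: within each smoking stratum, thin and normal-weight people have the same risk of death, but smokers are both more likely to be thin and more likely to die. The sketch compares crude risks with stratum-specific risks and with risks directly standardized to the cohort's overall smoking mix. Real studies more often use regression adjustment, and any adjustment helps only if smoking itself is measured accurately.

```python
# Hypothetical cohort: (smoking status, BMI group) -> (people, deaths during follow-up).
COHORT = {
    ("nonsmoker", "thin"): (200, 10),
    ("nonsmoker", "normal"): (1000, 50),
    ("nonsmoker", "obese"): (400, 32),
    ("smoker", "thin"): (300, 30),
    ("smoker", "normal"): (400, 40),
    ("smoker", "obese"): (100, 16),
}
SMOKING = ("nonsmoker", "smoker")
BMI_GROUPS = ("thin", "normal", "obese")


def crude_risk(cohort):
    """Death risk in each BMI group, ignoring smoking altogether."""
    risks = {}
    for bmi in BMI_GROUPS:
        people = sum(cohort[(s, bmi)][0] for s in SMOKING)
        deaths = sum(cohort[(s, bmi)][1] for s in SMOKING)
        risks[bmi] = deaths / people
    return risks


def stratum_risk(cohort, smoking):
    """Death risk in each BMI group within a single smoking stratum."""
    return {bmi: cohort[(smoking, bmi)][1] / cohort[(smoking, bmi)][0] for bmi in BMI_GROUPS}


def standardized_risk(cohort):
    """Direct standardization: weight stratum risks by the whole cohort's smoking mix."""
    total = sum(people for people, _ in cohort.values())
    weights = {s: sum(cohort[(s, bmi)][0] for bmi in BMI_GROUPS) / total for s in SMOKING}
    return {
        bmi: sum(weights[s] * cohort[(s, bmi)][1] / cohort[(s, bmi)][0] for s in SMOKING)
        for bmi in BMI_GROUPS
    }


if __name__ == "__main__":
    print("crude:       ", {k: round(v, 3) for k, v in crude_risk(COHORT).items()})
    for s in SMOKING:
        print(f"{s:<13}", stratum_risk(COHORT, s))
    print("standardized:", {k: round(v, 3) for k, v in standardized_risk(COHORT).items()})
```

In the crude figures the thin group appears to fare worse than the normal-weight group (a J-shape), whereas within each smoking stratum, and after standardization, thin and normal-weight risks are equal and only the obese group shows excess risk.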
“Reverse causation,” also called “effect-cause” bias, can occur in obesity studies.64,68 Here, an underlying (and perhaps even unrecognized) disease is responsible for a low body weight, as in chronic “wasting diseases” (eg, end-stage kidney disease, congestive heart failure, AIDS, and many end-stage cancers). In these cases, the increased mortality among thinner people may give the false impression that obesity even confers a survival advantage. 64 In a related vein, it has been suggested that, from an evolutionary perspective, obesity and its metabolic abnormalities may have been advantageous against the devastating wasting disease tuberculosis. 69 Obesity researchers cannot even agree on the definition of reverse causality and cannot rule out the possibility that “bias due to preexisting illness may affect weight-mortality studies.” 68
Another bias typical of obesity studies is “nonresponder bias”: in general, those who agree to participate in a study may differ from those who do not.8,12,16,25,59 Some believe that “reduced participation” or “suboptimal control samples” are the “most common problem” when conducting population-based studies. 25 For example, results were biased when only 42% of a randomly selected group of more than 3600 Swedish subjects participated in research on cardiovascular risk, 70 and “the main limitation” of one study of BMI and its relationship to psychiatric disorders was a response rate of only 57%. 71 In that study, the investigators had to assume that BMI and common mental problems did not differ between responders and nonresponders, “but it is not possible to test the validity of this assumption.” 71
Furthermore, those who volunteer for a study may also differ from the general population (“volunteer bias”). 16 For example, those who volunteer for obesity studies have been reported to be less likely to have the metabolic syndrome, a serious cluster of metabolic abnormalities often seen in obesity. 12 One of the most important long-term studies in obesity research is the National Weight Control Registry, begun in the early 1990s to investigate successful dieters. Now following thousands of individuals, it began with an original group of 629 women and 155 men, all of whom were self-selected volunteers recruited through local and national media advertisements, mailings to weight loss programs, and so on; none were randomized, and the group was not at all typical of the US population.72,73
A very high attrition (dropout) rate is characteristic of many obesity studies, particularly those involving weight management or specific dietary changes (eg, comparisons of low-carbohydrate with low-fat diets), not uncommonly reaching 50% or more, even after only 1 year of follow-up.26,74,75 A very high dropout rate, for example, was noted in a study that compared the Atkins, Ornish, Weight Watchers, and Zone diets. This kind of bias cannot be easily corrected, even by statistical methods. 64 Subjects may withdraw from studies for a multitude of reasons, including a lack of motivation, 27 but sometimes for reasons that remain completely unknown. 16 Even in a study in which the dropout rate was fairly low (13%), researchers found differences between those who dropped out and those who continued to participate over 2 years. 76 Obtaining high response rates with high-quality data retrieval has been called “the single largest obstacle to high-quality epidemiological research,” and loss to follow-up of recruited participants is much more serious than initial nonparticipation because the rate of loss may reflect both disease and exposure. 77 When only 60% of subjects can be traced, studies are viewed skeptically, and even when 70% or 80% are traced, those numbers can still be too low to guard against bias if loss to follow-up might be associated with both exposure and disease. 63
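Why heavy attrition is so hard to correct can be seen in a toy example. In the hypothetical data below (invented for illustration), half of 10 participants drop out, and they happen to be the ones who lost the least weight. A completers-only average then exaggerates the benefit, and an imputation such as baseline observation carried forward (BOCF, one convention used in weight-loss trials, which assumes dropouts had no change) gives yet another answer. Neither matches the true mean, which in a real trial would be unknowable, because any correction rests on an untestable assumption about the missing outcomes.

```python
# Hypothetical 1-year weight changes (kg) for 10 randomized participants.
# In a real trial the outcomes of those who dropped out would never be observed.
WEIGHT_CHANGE_KG = [-8.0, -6.0, -5.0, -4.0, -3.0, -1.0, 0.0, 1.0, 2.0, 3.0]
DROPPED_OUT = [False] * 5 + [True] * 5  # 50% attrition, concentrated among poor responders


def completers_mean(changes, dropped):
    """Average change among participants who stayed in the study."""
    kept = [c for c, d in zip(changes, dropped) if not d]
    return sum(kept) / len(kept)


def bocf_mean(changes, dropped):
    """Baseline observation carried forward: assume every dropout had zero change."""
    imputed = [0.0 if d else c for c, d in zip(changes, dropped)]
    return sum(imputed) / len(imputed)


def true_mean(changes):
    """Average over everyone randomized (knowable only in a constructed example like this)."""
    return sum(changes) / len(changes)


if __name__ == "__main__":
    print("completers only:", completers_mean(WEIGHT_CHANGE_KG, DROPPED_OUT))
    print("BOCF:           ", bocf_mean(WEIGHT_CHANGE_KG, DROPPED_OUT))
    print("true mean:      ", true_mean(WEIGHT_CHANGE_KG))
```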
In obesity studies, “membership bias” can distort results: those who belong to a certain profession, engage in a certain activity, or are even simply employed may be healthier or more health conscious than the general population. One of the most important studies in obesity, for example, is the longitudinal Nurses’ Health Study. 78 To what extent being in the health professions affects its results is open to question. Along those lines, subjects willing to engage in research may exhibit “clustering,” whereby certain habits, particularly health-related ones (whether positive or negative), cluster together, making it difficult to attribute an observed correlation to any single habit. For example, smokers were more apt to eat red meat, engage in less physical exercise, and drink more sugared soft drinks. 79
Data Collection
Missing information, such as on questionnaires or in records, is also a common problem. 80 Data may be missing because they were normal or negative, were never measured, or were measured but never recorded. 16 Furthermore, participants who realize they are controls may choose to change their behavior (eg, eating more healthily or exercising), creating the possibility of the “bias of contamination.” 81 A more general concern, however, is how often and when to observe. Cross-sectional and longitudinal observation can lead to significantly different results. Weight fluctuations, even in the course of a day, are extraordinarily common. This can be particularly significant in studies of weight cycling (eg, yo-yo dieting), in which repeated patterns of weight gain and weight loss over time may not be accurately captured by the limited data that a single cross-sectional observation provides. Furthermore, subjects are often asked to remember patterns of weight loss that may have occurred many years earlier and that are subject to memory distortion. 82
Longitudinal observation, on the other hand, has its own complexities, especially when subjects do not adhere to the experimental protocol. Many obesity studies involve follow-up over months or even years, so noncompliance with the experimental design can become a problem. Over the 8 years of follow-up in the Women’s Health Initiative Study, for example, the group randomized to a low-fat diet (20% of energy from fat) could not maintain that level, which biased findings regarding cholesterol and triglyceride levels toward the null. 64 In fact, it is never really possible in community studies to measure compliance with a prescribed diet, and this has been called “the fundamental flaw in obesity research.” 83 Although random assignment tends to control confounding in a study, considerable nonrandom confounding can result, even in large randomized studies, when there is substantial nonadherence to the treatment protocol.59,60
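The direction of the bias described above (toward the null) follows from simple arithmetic. Under deliberately simplified, hypothetical assumptions (the diet changes the outcome by a fixed amount only in people who actually follow it, and adherers do not otherwise differ from nonadherers), the difference observed between randomized groups is the true effect scaled by the difference in the proportions actually following the diet in each arm. The Python sketch below uses invented numbers and also allows for control participants who adopt the diet (the “bias of contamination”).

```python
def expected_itt_difference(true_effect, adherence_treated, uptake_control):
    """Expected between-arm difference when only people who follow the diet get its effect.

    true_effect: effect of actually following the diet (hypothetical units, e.g. kg).
    adherence_treated: proportion of the intervention arm that follows the diet.
    uptake_control: proportion of the control arm that adopts the diet anyway.
    """
    if not (0.0 <= adherence_treated <= 1.0 and 0.0 <= uptake_control <= 1.0):
        raise ValueError("proportions must lie between 0 and 1")
    return true_effect * (adherence_treated - uptake_control)


if __name__ == "__main__":
    # Hypothetical: a 4 kg true effect, 60% adherence, 10% contamination of the control arm.
    print(expected_itt_difference(-4.0, 0.6, 0.1))  # diluted toward zero
    print(expected_itt_difference(-4.0, 1.0, 0.0))  # perfect adherence, no contamination
```

Halving the effective contrast between the arms halves the observed effect, so a real benefit can appear small or absent.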
One means of controlling for noncompliance is to conduct a study in a laboratory setting rather than in a free-living environment. The laboratory setting, however, has been criticized for not providing “real meals, real people, real eating situations” 84 and has been called “artificial,” particularly because factors such as the cost of food, short-term compensation in food intake over several days, and the timing of eating (including diurnal rhythms, seasonal effects, and even differences between weekday and weekend eating patterns) are often overlooked in a lab setting. 85 Furthermore, the effects of alcohol and of the presence of other people on food consumption are often not considered, and many of the environmental, psychological, and social factors that influence food intake are lost in the controlled laboratory environment of clinical research. 85
Observation in a laboratory setting, though, may mitigate other forms of bias, such as “obsequious bias,” in which subjects tell researchers what they think the researchers want to hear, and “unacceptability bias,” in which subjects may be embarrassed to admit to certain behaviors. 16 In a free-living environment, these biases occur with some frequency, particularly when obesity studies depend on subjects’ self-reporting 82 (see below). Observation in a lab setting, where subjects know they are being observed (the so-called Hawthorne effect 13), can also affect behavior, however. Even just letting subjects see how much they are eating (eg, by leaving dirty plates on the table) can affect how much they eat. 86
Meta-Analysis
Meta-analysis is the statistical pooling of the results of multiple studies, typically as part of a systematic review. Its purpose is to identify patterns among study results and sources of disagreement among those results. 87 Because meta-analysis must contend with considerable heterogeneity in design and statistical methods, even definitions of the problem can differ so much among studies that an actual meta-analysis becomes impossible and only a qualitative approach is feasible. For example, studies were inconsistent about the potential dangers of weight cycling because there was not even a consistent definition of what constituted a weight cycle, 88 and an attempted meta-analysis of randomized controlled trials assessing the effects of calcium supplementation on weight found so many discrepancies among the studies that the authors could not conduct a proper meta-analysis and had to settle for a “narrative review” 89 (see Table 2).
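Statistical heterogeneity among studies is commonly summarized with Cochran's Q and the I² statistic, computed from each study's effect estimate and standard error. The Python sketch below uses a fixed-effect (inverse-variance) pooled estimate; the four “trials” are hypothetical numbers, not the weight-cycling or calcium studies cited above. These statistics can only flag disagreement among studies that measured the same thing; when, as with weight cycling, studies do not even share a definition, no pooling statistic can repair the problem.

```python
import math


def heterogeneity(effects, std_errors):
    """Return (pooled effect, pooled SE, Cochran's Q, I^2) for a fixed-effect meta-analysis.

    effects: per-study effect estimates (e.g. mean weight change vs control, kg).
    std_errors: their standard errors. I^2 is returned as a fraction between 0 and 1.
    """
    if len(effects) != len(std_errors) or len(effects) < 2:
        raise ValueError("need at least two studies with matching standard errors")
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2


if __name__ == "__main__":
    # Hypothetical trial results (kg) and standard errors.
    pooled, se, q, i2 = heterogeneity([-1.0, -0.2, 0.5, -2.0], [0.4, 0.3, 0.5, 0.6])
    print(f"pooled = {pooled:.2f} kg (SE {se:.2f}), Q = {q:.1f}, I^2 = {100 * i2:.0f}%")
```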
Table 2. Complexities Due to Clinical Study Design.
Complexities Due to Measurement
Nobel Laureate Sir Henry Dale 90 said, “All true measurement is essentially comparative,” and whenever there is measurement, there is always the possibility of error, either by random chance or systematically. 59 One of the most fundamental impediments in obesity research is, in fact, measurement bias, whether in body composition, food and caloric intake, or physical activity (and specifically exercise). This is sometimes categorized as “information bias.”19,20,59
Measurements of Adipose Tissue
Currently, we use BMI to define categories of obesity.91,92 Obesity is defined arbitrarily as a “threshold,” and as such, “a relatively small increase in average weight has had a disproportionate effect” on the actual incidence of obesity. 93 It is not clear how BMI became the general standard for measuring obesity. Adolphe Quetelet, a Belgian mathematician and astronomer often called the father of modern statistics, established this ratio of weight in kilograms to height in meters squared in the middle of the 19th century. 94 BMI, however, did not become popular as a measure of obesity until recent years. In the early 1970s, Ancel Keys and his colleagues noted the “need for an index of relative body weight,” credited Quetelet with this ratio, and for the first time called it the “body mass index.” 95 Earlier in the 20th century, when scales became available for home use, insurance companies gathered data on weight and its relationship to mortality. 96 These early measurements were highly inaccurate: people were weighed in shoes and clothing, without any standardization. Even the categories of “small, medium, and large” build were determined by the subjective judgment of an examiner without any corroborating data. 96 Before the use of BMI classifications, measurements of obesity were much less precise; categories might include, for example, “overweight” and “percent overweight.” 36 As a result, as the definition of obesity has become more standardized (though still arbitrary and subject to potential change in the future), comparing older studies with more recent or future ones can lead to what is called “diagnostic vogue bias,” in which the same condition receives different diagnostic labels over time. 16
Though BMI use began earlier, 97 it was only in the late 1990s that the US Department of Health and Human Services and the WHO established the guidelines for classifying overweight and obesity by the BMI categories in use today.98,99 The WHO had “convened a Consultation on Obesity” because of its concern about the comorbidities of obesity as well as the “social bias, prejudice, and discrimination” to which the obese are often subjected. Its conclusion was that BMI was a “coherent system” that should be adopted internationally. 99
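For reference, the calculation and the WHO adult cut points (18.5, 25, and 30 kg/m2) can be written out as below. This is a minimal Python sketch: it applies only to adults, ignores the subdivisions of the obese range, and, as noted earlier, the cut points may warrant revision for some non-European populations and do not apply to children.

```python
def bmi(weight_kg, height_m):
    """Quetelet's index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def who_adult_category(bmi_value):
    """Classify an adult BMI with the WHO cut points of 18.5, 25, and 30 kg/m2."""
    if bmi_value < 18.5:
        return "underweight"
    if bmi_value < 25.0:
        return "normal weight"
    if bmi_value < 30.0:
        return "overweight"
    return "obese"


if __name__ == "__main__":
    for weight, height in [(55.0, 1.75), (70.0, 1.75), (85.0, 1.75), (95.0, 1.75)]:
        value = bmi(weight, height)
        print(f"{weight} kg, {height} m -> BMI {value:.1f} ({who_adult_category(value)})")
```

An 85 kg person and a 95 kg person of the same height fall into different categories regardless of how much of that weight is fat or muscle, which is the core of the criticisms that follow.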
In the years since, however, the use of BMI as a standard has itself caused considerable controversy. Early on, researchers questioned the new guidelines and felt that lowering the overweight threshold “stigmatized” too many people and was not justified by mortality data. 100 Even today, researchers describe the classification of obesity as a BMI of 30 kg/m2 or more as having “a certain degree of arbitrariness” in the absence of genetic markers. 47 Furthermore, because BMI reflects not only fat mass but also muscle and skeletal mass, it may be inaccurate in people who are particularly muscular or who have lost muscle (eg, sarcopenia), typically in old age (in whom BMI may underestimate adiposity),101,102 and “adjustments” are needed when calculating BMI not only in athletes 103 but also in particularly tall or short people, as height is part of the equation, and in children younger than 16 years. 104 The practice of using BMI as a measure of obesity has been called “obsolete,” resulting in a considerable “underestimation of the grave consequences of the obesity epidemic,” 105 as well as a “deeply flawed measure of fatness,” 105 a “surrogate measure” providing “misleading information,” 30 and only a “proxy” measure for body fat. 106 For example, when BMI was compared with more accurate measurements of total body fat, such as those using deuterium-labeled water, BMI “was a poor surrogate for body fatness for both males and females.” 107 And because BMI is only an indirect measure of obesity, it did not discriminate between body fat and lean muscle in patients with coronary artery disease, 108 and it should not be used alone but only together with other measurements, such as a direct assessment of body composition and measurement of waist circumference. 109 Not all studies, though, have been critical of the use of BMI.65,110 For example, the Prospective Studies Collaboration called it a “reasonably good measure of general adiposity.” 65
One of the most problematic issues with BMI is that many studies calculate it from subjects’ self-reports. Although there is some debate about the accuracy of self-reports, most researchers suggest that people tend to underreport weight and overreport height.111-113 Although self-reports are easier to collect, they “should not be used exclusively as an obesity surveillance tool.” 114 Likewise, self-reports are more likely to be “underestimations” when people round off their measurements or do not even know their height, particularly as height may change with age, 115 or when they have certain diseases. 116 Criticism of self-reported BMI as a measure of obesity has come from around the world.82,113,117-126 Studies from Japan, 117 the Netherlands, 120 Sweden, 121 Australia, 124 France, 125 Canada, 114 Greece, 116 and Spain 113 have reported on the inaccuracy of self-reports, the need for caution in interpreting findings, and even a need to adjust for discrepancies. In the United States, both the sensitivity and specificity of BMI “have been shown to be poor,” and BMI demonstrates “various deficiencies as a measure of obesity” when it is obtained through self-reports, 102 such that BMI by self-report is not interchangeable with BMI by actual measurement. 126 In fact, an analysis of data from 2 waves of NHANES (I and II; National Health and Nutrition Examination Survey) in a subgroup of healthy subjects who had never smoked concluded that the bias and inconsistency produced by self-reported BMI data may actually account for discrepancies in published data on mortality and its relationship to BMI, and that “even small changes in BMI distribution in future studies could have dramatic effects on misclassification rates.” 126 Another difficulty is that measurement conditions, such as the clothing worn, the equipment used, the instructions given, and even the time of measurement, are rarely, if ever, specified. 82
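Even modest reporting errors matter near a threshold. The hypothetical Python sketch below assumes that every person underreports weight by 2 kg and overreports height by 2 cm (both figures invented for illustration, not estimates from the studies cited) and counts how many people who are obese by measurement would be classified as nonobese from their self-reports.

```python
def bmi(weight_kg, height_m):
    """Weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def misclassified_as_nonobese(people, weight_under_kg, height_over_m, cutoff=30.0):
    """Count people whose measured BMI is >= cutoff but whose self-reported BMI is below it.

    people: list of (measured weight in kg, measured height in m).
    weight_under_kg / height_over_m: hypothetical, uniform reporting errors.
    """
    count = 0
    for weight, height in people:
        measured = bmi(weight, height)
        reported = bmi(weight - weight_under_kg, height + height_over_m)
        if measured >= cutoff and reported < cutoff:
            count += 1
    return count


if __name__ == "__main__":
    group = [(88.0, 1.70), (100.0, 1.80), (75.0, 1.65), (120.0, 1.75)]
    print(misclassified_as_nonobese(group, weight_under_kg=2.0, height_over_m=0.02))
```

In this invented group, 2 of the 3 people who are obese by measurement drop below the threshold, even though each individual error is small; across a population, such shifts change prevalence estimates and can distort weight-mortality associations.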
Of course, BMI has not been the only means of measuring obesity. Clinicians have used calipers to measure skinfold thickness at various sites on the body, such as the arm, scapula, back, and hip. Though sometimes seen as a “comparatively simple and reasonably accurate assessment of body fatness,” 127 skinfold measurement is considered by most to be the least accurate of all ways to measure body fat, and results may vary not only from examiner to examiner but also between examinations by the same examiner. 128 Researchers have also measured waist circumference and waist-to-hip ratio, but these, too, may depend on the skill of the examiner. For example, it is often difficult to locate the so-called natural waist (the smallest circumference) on an obese person, so the measurement may instead be taken at the level of the umbilicus. Although intrarater reliability was “acceptable,” measurement error is more likely to occur in overweight and obese people. 129 Some have suggested that waist circumference may be a useful adjunct for correcting errors in self-reported BMI. 130 Others have found that the use of waist circumference led to misclassification. 131 A meta-analysis of more than 82 000 people in the United Kingdom found that self-reported BMI led to inconsistent results in relating mortality to the accumulation of body fat, but waist-to-hip measurements “showed the strongest association with mortality from cardiovascular disease,” compared with either waist circumference or BMI. 132 Because of missing data, though, 25% of the original sample had to be excluded, and the results might not apply to other, more ethnically diverse samples. 132 The waist-to-hip ratio is itself controversial: it has been described as “a superior measure of central obesity with low measurement error,” 133 but its use has been questioned, particularly because hip circumference reflects muscle and bone as well as fat. 134
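Examiner-to-examiner disagreement of the kind described for skinfold and circumference measurements is often quantified in anthropometry with the technical error of measurement (TEM): the square root of the sum of squared differences between paired measurements, divided by twice the number of subjects. The index is not discussed in the source; the Python sketch and its skinfold readings are hypothetical and are shown only to make the idea of measurement reliability concrete.

```python
import math


def technical_error_of_measurement(first, second):
    """TEM for paired measurements of the same subjects: sqrt(sum(d^2) / (2N))."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    sum_sq = sum((a - b) ** 2 for a, b in zip(first, second))
    return math.sqrt(sum_sq / (2 * len(first)))


def relative_tem_percent(first, second):
    """TEM expressed as a percentage of the mean of all measurements."""
    grand_mean = (sum(first) + sum(second)) / (len(first) + len(second))
    return 100.0 * technical_error_of_measurement(first, second) / grand_mean


if __name__ == "__main__":
    # Hypothetical triceps skinfolds (mm) on 5 subjects, taken by two examiners.
    examiner_a = [20.0, 25.0, 30.0, 35.0, 40.0]
    examiner_b = [22.0, 24.0, 33.0, 34.0, 44.0]
    print(f"TEM = {technical_error_of_measurement(examiner_a, examiner_b):.2f} mm")
    print(f"relative TEM = {relative_tem_percent(examiner_a, examiner_b):.1f}%")
```

The same functions apply to repeated measurements by a single examiner, which captures the within-examiner variability the text also mentions.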
The most accurate (and reproducible) way of measuring body composition is dual-energy X-ray absorptiometry (DXA), which is based on the fact that X-ray beams pass through bone, fat, and muscle differently.64,134 Its use is limited because it is not portable (and hence impractical for large epidemiological studies) and cannot be used in pregnant women, but it uses the same machine employed for assessing bone density, so people being evaluated for osteoporosis can easily request a simultaneous evaluation of their body composition. 64 The so-called gold standard of measuring body composition, though, is underwater (hydrostatic) weighing, or densitometry, which uses the principle that fat is less dense than water. 64 Clearly, this is an unwieldy technique that cannot be used in large-scale studies or easily with children or the elderly. Finally, computed tomography and magnetic resonance imaging can both measure body composition but are expensive and require special equipment, and computed tomography exposes subjects to radiation. 64 British cardiologist Sir Thomas Lewis observed that “there is a manifest tendency . . . for the medical profession to exaggerate the accuracy of its subjective methods of examination.” 135 Clearly, there is “no single measurement method that is error-free.” 136
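The arithmetic behind underwater weighing can be sketched as follows. The equations are standard textbook conventions rather than anything given in the source: body volume is estimated from the loss of weight in water, corrected for residual lung volume and an assumed 0.1 L of gastrointestinal gas, and body density is converted to percentage fat with the two-compartment Siri equation. All input values are hypothetical. Because the Siri equation assumes fixed densities for fat and fat-free tissue, which vary between individuals and populations, even this “gold standard” carries its own error.

```python
def body_density(mass_air_kg, mass_water_kg, water_density_kg_per_l,
                 residual_volume_l, gi_gas_l=0.1):
    """Body density (kg/L, numerically equal to g/mL) from hydrostatic weighing.

    Body volume = weight lost in water / water density, minus air in the lungs
    (residual volume) and an assumed volume of gastrointestinal gas.
    """
    body_volume_l = (mass_air_kg - mass_water_kg) / water_density_kg_per_l
    body_volume_l -= residual_volume_l + gi_gas_l
    if body_volume_l <= 0:
        raise ValueError("inputs imply a non-positive body volume")
    return mass_air_kg / body_volume_l


def siri_percent_fat(density):
    """Two-compartment Siri equation: %fat = (4.95 / density - 4.50) * 100."""
    return (4.95 / density - 4.50) * 100.0


if __name__ == "__main__":
    # Hypothetical subject: 80 kg in air, 3 kg underwater, water ~0.996 kg/L,
    # residual lung volume 1.3 L.
    db = body_density(80.0, 3.0, 0.996, 1.3)
    print(f"density = {db:.4f} kg/L, body fat = {siri_percent_fat(db):.1f}%")
```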
Measurement of Food Intake and Caloric Consumption
As noted, our inability to measure accurately what people are really eating is the “fundamental flaw” in obesity research. 83 As a result, we are left with “partly inaccurate information” and fail “in a fundamental task of science, accurately measuring the independent variable.” 83 This problem is compounded, as noted, by social desirability: people can be embarrassed by their behaviors, especially those involving food (and alcohol), and misrepresent their intake (ie, “unacceptability bias” 16 ). Furthermore, much of this information relies on a subject’s recall, which can be notoriously inaccurate, even with the best of intentions. Though sometimes related to social embarrassment, underreporting of food intake may also reflect poor memory or even a genuine lack of awareness of specific food items and the amounts actually consumed. 12 Wansink 86 has described the phenomenon of “portion distortion,” seen not only in obese subjects but also in those of normal weight. Furthermore, studies of diet are often limited by the use of “disappearance data” that are only indirectly related to intake, 137 and the complexity of the human diet represents a “daunting challenge” to those studying a connection between diet and disease. 137 Dietary exposures can rarely be characterized as simply present or absent, because individuals rarely make clear changes in their diet at identifiable points in time. More typically, patterns evolve over years, and even though individuals’ diets are often consistent over time, they usually vary markedly from day to day. 137
Food intake can be measured by several means, including 24-hour recall, the most widely used dietary assessment method (and the basis for national nutrition surveys); food diaries kept for varying periods (often 3-7 days); and food-frequency questionnaires.64,137 Subjects must be highly motivated to keep food diaries, but this effort may increase their awareness of (and hence alter) their food intake. Information can also be collected by telephone or in-person interview. As researchers in all fields appreciate, use of the telephone has made randomization more complex, because cell phone area codes, voicemail, and other technological advances do not necessarily identify a subject’s location. 102 Both food-frequency questionnaires and 24-hour recall depend on memory, which prompted food writer Michael Pollan to wonder whether Marcel Proust could remember with precision all that he had eaten. 138 Complications in dietary research can also stem from the inherent biological complexity of nutrient–nutrient interactions, and because diet is often associated with health consciousness in general, the diets of those who participate in studies may differ substantially from those of nonparticipants, thereby biasing samples. 137 Another problem with dietary studies is that the time between any change in diet and any expected change in the incidence of disease is typically uncertain: even if no effect is found, it may not be possible to rule out that follow-up was too short. 137 And, as noted, compliance often wanes over a long trial, particularly if treatment involves a real change in food intake, and the control group sometimes adopts the prescribed diet of the treatment group, particularly if that diet is thought to be beneficial 137 (“bias of contamination” 81 ).
The obesity literature is replete with references to inaccurate reporting of diet, not only in nonobese subjects but particularly in the obese, often correlated with the degree of obesity as measured by BMI.8,32,139-147 It is a “major challenge” to link diet with health when subjects are “implausible reporters,” and even statistical adjustment for underreporting cannot establish “true validity.” 23 A review of both prospective and retrospective studies found discrepancies indicating underreporting of food intake when intake was compared against energy expenditure measured with doubly labeled water. 32 Underreporting has been linked not only with greater BMI but also with greater body dissatisfaction and lower income. 148 In general, the failure of obese people to lose weight on a specified diet (what the subjects called “diet resistance”) may reflect underreporting of caloric intake and overreporting of physical exercise rather than any metabolic differences between the obese and the nonobese. 139 Furthermore, subjects, particularly the obese, can both underreport and undereat during the period of observation, 32 often by 20%.32,85 The “eye–mouth gap” is the discrepancy between what people believe they are eating and what they actually eat. 149 Doubly labeled water (which measures total energy expenditure) and 24-hour urinary nitrogen excretion (which specifically assesses protein intake) can be used to validate reported dietary intake, but these methods are cumbersome and expensive and not suitable for large epidemiological studies. 8 When protein is overreported, this suggests that fat and carbohydrate are underreported, but there is no means of determining specifically which nonprotein sources (eg, fat, carbohydrates, alcohol) subjects underreport. 8 Underreporting thus leads to a “dual bias”: general underreporting of total caloric intake and underreporting of specific foods. 12 Furthermore, intensified public health campaigns to lower fat and sugar intake may over time have increased underreporting, even among those who are not obese. 10 Underreporting has also been found in up to 45% of pregnant women, and those who underreport tend to be less compliant in general with dietary recommendations for pregnant women. 150 Underreporting may also be particularly common among obese patients who are depressed. 151
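The comparison against doubly labeled water described above is often summarized as the ratio of reported energy intake to measured total energy expenditure. The Python sketch below assumes a weight-stable subject, for whom true intake should roughly equal expenditure, so a ratio below 1 points to underreporting; the function names and the 2000 and 2500 kcal/day figures are hypothetical. As the text notes, the same shortfall would also appear if subjects undereat while keeping their records, and this calculation cannot tell the two apart.

```python
def reporting_ratio(reported_intake_kcal, measured_tee_kcal):
    """Ratio of reported energy intake to total energy expenditure (TEE) from doubly labeled water.

    Assumes the subject is weight-stable, so true intake should roughly equal TEE.
    A ratio below 1.0 suggests underreporting (or undereating during the record).
    """
    if measured_tee_kcal <= 0:
        raise ValueError("measured TEE must be positive")
    return reported_intake_kcal / measured_tee_kcal


def percent_underreported(reported_intake_kcal, measured_tee_kcal):
    """Shortfall of reported intake relative to measured TEE, as a percentage."""
    return (1.0 - reporting_ratio(reported_intake_kcal, measured_tee_kcal)) * 100.0


# Hypothetical subject: reports 2000 kcal/day; doubly labeled water measures 2500 kcal/day.
print(f"ratio = {reporting_ratio(2000, 2500):.2f}")
print(f"underreported by {percent_underreported(2000, 2500):.0f}%")
```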
Measurement of Physical Activity
Perhaps even more difficult than measuring caloric intake or actual percentage of body fat is the measurement of physical activity. Our bodies burn calories through the digestion, absorption, and storage of food (ie, its thermogenic effect); through our resting metabolic rate; and through physical activity, the most variable of the 3 components. 73 Caloric expenditure from physical activity can vary 3-fold between the extremely active and the sedentary. 152
There are 2 kinds of physical activity: nonexercise activity thermogenesis, that is, spontaneous movement of the body (including maintaining posture, fidgeting, sitting, standing, or even chewing gum), and exercise, that is, physical activity that is “purposeful” and planned specifically to maintain health or fitness or to burn calories. Exercise can be characterized by its intensity, frequency, and duration. 73 Many studies, though, particularly large epidemiological surveys, do not measure physical activity precisely and instead rely on inaccurate self-reports that are sometimes merely estimates. The “Compendium of Physical Activities” lists thousands of activities in categories such as sports, occupation, home repair, and self-care, each of which is assigned an intensity value relative to sitting comfortably.153,154 No 2 people, though, perform an activity in exactly the same way, so these values are only approximations. Some studies attempt to measure physical activity more accurately with an Actigraph, an instrument that measures the intensity of movement. 155 Only in a laboratory setting, though, can actual physical activity be measured accurately. Most studies simply report that their subjects did “moderate” exercise without defining the term precisely. Calculating caloric expenditure during exercise poses further difficulties: one must consider not only the calories expended during the exercise but also the calories that would have been expended anyway just by standing or sitting, and subtract them. 156
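The Compendium expresses each activity's intensity in metabolic equivalents (METs), multiples of the energy cost of sitting at rest. A minimal Python sketch follows, assuming the common approximation that 1 MET corresponds to about 1 kcal per kilogram of body mass per hour (individual resting rates vary). It contrasts gross expenditure with net expenditure, which subtracts what sitting would have cost anyway, the adjustment described in the paragraph above. The 3.5-MET value and the 70-kg subject are hypothetical, not figures taken from the Compendium.

```python
# Common approximation (an assumption): 1 MET is about 1 kcal per kg of body mass per hour.
# Actual resting rates vary from person to person.
KCAL_PER_KG_PER_HOUR_AT_1_MET = 1.0


def gross_kcal(met, body_mass_kg, hours):
    """Total energy expended during the activity, including resting metabolism."""
    return met * KCAL_PER_KG_PER_HOUR_AT_1_MET * body_mass_kg * hours


def net_kcal(met, body_mass_kg, hours, baseline_met=1.0):
    """Energy expended above what the baseline activity (default: sitting, 1 MET) would cost."""
    return (met - baseline_met) * KCAL_PER_KG_PER_HOUR_AT_1_MET * body_mass_kg * hours


# Hypothetical: a 70-kg person walks for 30 minutes at an assumed intensity of 3.5 METs.
print(gross_kcal(3.5, 70, 0.5))
print(net_kcal(3.5, 70, 0.5))
```

Here the walk costs 122.5 kcal in total but only 87.5 kcal above the sitting baseline, so a study that reported gross rather than net expenditure would overstate the effect of the exercise by the 35 kcal the person would have burned anyway. The baseline_met parameter is a hypothetical hook for using a baseline other than sitting, such as standing.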
Exercise research suffers from other methodological problems, such as reliance on observational studies with poor follow-up rather than randomized controlled trials. 157 In studies of exercise and its role in psychiatric disorders, the exact nature of the recommended exercise was not even specified, nor was its intensity or, sometimes, even the dropout rate. 158 Intensity of exercise, for example, was also not measured (considered a limitation 158 ) in a study in which subjects were asked to take a “brisk” walk for 30 minutes a day; 159 moreover, the study population was too homogeneous to permit generalization to other populations, and adherence over time was not measured. 159
One measure of physical activity is the pedometer, which counts the steps a person takes during the day and, once set to the person’s stride length, can estimate the distance walked.154,159,160 A pedometer can be a “useful tool” for tabulating the amount of walking because it provides immediate feedback, 159 but evidence on how accurately pedometers capture physical activity is conflicting. When, for example, the pedometer was compared with measurements conducted in a respiratory chamber, it was “at best only a crude predictor” of physical activity, and because it does not record the duration or the intensity of the steps taken, it does not provide information accurate enough for calculating energy expenditure. 161 Furthermore, a single stride-length setting cannot accurately reflect actual stride length, which changes with walking speed, 161 and pedometers may be of limited use when comparing samples that are based on varying recommendations for physical activity. 162
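Distance on a pedometer is derived rather than measured: the device multiplies its step count by a fixed stride-length setting. The Python sketch below, with hypothetical values, shows how a stride set at one walking speed overstates distance when the wearer walks more slowly and takes shorter steps; the function and variable names are illustrative only.

```python
def estimated_distance_m(steps, stride_length_m):
    """Distance a pedometer reports: counted steps times the stride length it was set to."""
    return steps * stride_length_m


steps = 10_000
calibrated_stride_m = 0.70  # hypothetical setting, measured at a normal walking speed
slow_walk_stride_m = 0.60   # hypothetical actual stride during slower walking

reported = estimated_distance_m(steps, calibrated_stride_m)
actual = estimated_distance_m(steps, slow_walk_stride_m)
error_pct = (reported - actual) / actual * 100
print(f"reported {reported:.0f} m, actual {actual:.0f} m, overestimate {error_pct:.1f}%")
```

Under these assumptions the device reports 7000 m for a walk that actually covered 6000 m, an overestimate of roughly 17%. The step count is also silent on how fast or how hard those steps were taken, which is why it cannot by itself support an estimate of energy expenditure.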
As in studies of food intake and body measurements, studies of physical activity are also subject to inaccuracy with self-report, particularly with the use of questionnaires. 163 As with food-intake questionnaires, even the order of the questions can significantly affect responses. “Subjective interpretations” of exercise intensity may contribute to errors in classifying it, and self-reports tend to overestimate physical activity levels compared with “objective monitoring,” for example, by accelerometry. 164 Likewise, a systematic review comparing direct measurements of physical activity with self-report data found considerable inaccuracies in self-reports, with levels reported both higher and lower than those measured directly; self-report data can provide information on an individual’s “perception” of an activity’s difficulty but are less suited to “capturing all levels of activity.” 165 Direct measurement, though, may fail to capture “incidental daily movements” or even activities like swimming, so there is a need for “valid, accurate, and reliable measures” to assess physical activity, particularly as it relates to possible clinical interventions 165 (see Table 3).
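One common way to quantify how far self-reports depart from objective monitoring is a Bland–Altman analysis, which summarizes paired measurements by their mean difference (bias) and 95% limits of agreement. It is used here as an illustrative technique; the cited reviews may have used other statistics. The Python sketch below uses hypothetical weekly minutes of activity for four subjects, and the function name is an assumption.

```python
from statistics import mean, stdev


def bland_altman(self_report, device):
    """Mean bias and 95% limits of agreement between two paired measurement methods.

    A positive bias means self-report exceeds the device measurement on average.
    """
    if len(self_report) != len(device) or len(device) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [s - d for s, d in zip(self_report, device)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical weekly minutes of moderate activity for four subjects.
self_reported = [150, 200, 90, 120]
accelerometer = [100, 150, 80, 100]
bias, lower, upper = bland_altman(self_reported, accelerometer)
print(f"bias {bias:.1f} min, limits of agreement {lower:.1f} to {upper:.1f} min")
```

A positive bias corresponds to self-reports overestimating activity relative to the device, and wide limits of agreement mean that, even where the average bias is small, an individual's self-report could differ substantially from the measured value.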
Table 3. Complexities Due to Measurement.
Abbreviations: DXA, dual-energy X-ray absorptiometry; CT, computed tomography; MRI, magnetic resonance imaging.
Conclusion
Practitioners in all disciplines are familiar with obstacles that confront them and limit their expertise. The field of obesity unfortunately lends itself particularly well to the compounding of these difficulties. Although no single impediment is unique to obesity, clinicians may find themselves inadvertently thwarted by the aggregate of uncontrolled and uncontrollable variables, which can pose sometimes insurmountable challenges. These challenges include complexities due to discrepant frameworks and diverse conceptualizations of obesity; potential flaws inherent in its clinical studies; and, particularly, problems in the measurement of body composition (specifically adipose accumulation), food intake, and physical activity, as well as notoriously inaccurate and misleading self-reporting by subjects. As a result, those who attempt to study and treat obesity are constantly on T. S. Eliot’s “margin of the impossible.” 29 Unfortunately, there are no straightforward solutions to these challenges, and clinicians often remain limited and even tentative in the recommendations they can offer their patients. In fact, given all these difficulties, we can marvel that we know as much as we do. Researchers and clinicians alike, though, while striving for success, must remain cognizant of, and sensitive to, not only their patients’ not-infrequent failures but also their own.
AJLM
