Abstract
The relationship between declining bone density and increasing fracture risk is firmly established; the relationship between increasing bone density and decreasing fracture risk is less clear. Because of this, the clinical utility of assessing the therapeutic efficacy of prescription therapies to reduce fracture risk by measuring changes in bone density has been called into question. However, there is substantial clinical trial data to support this approach. Nevertheless, an apparent increase or decrease in the bone density may be misinterpreted without an understanding of the statistical concepts of precision and least significant change. These concepts are not difficult and are of profound clinical importance. If the least significant change is not known, serial measurements of bone density cannot be interpreted. These concepts will be discussed and illustrated, and the rationale for the importance of changes in bone mineral density on therapy will be explored.
Measuring changes in bone density to assess therapeutic efficacy during therapy for osteoporosis is a relatively straightforward process. The reasons for doing so and the interpretation of the measured changes are often less clear.
Assume that a 62-year-old Caucasian woman, 11 years postmenopausal on no hormone replacement, underwent a dual-energy x-ray absorptiometry (DXA) bone-density study of the posterior-anterior lumbar spine (LS). Her L1-L4 bone mineral density (BMD) was 0.892 g/cm2. She was begun on an antiresorptive agent, and 2 years later her L1-L4 BMD was 0.932 g/cm2. Should this be considered a statistically significant increase? If the follow-up L1-L4 BMD was 0.874 g/cm2, should this be considered a significant decline? And, importantly, do changes in BMD matter? The last question is the most important because if changes in BMD do not matter, then the statistical significance of any change is clinically irrelevant.
The relationship between BMD & fracture risk
BMD is a very good but clearly imperfect surrogate for bone strength [1]. The relationship between declining BMD and increasing fracture risk is well established and quantifiable [2–4]. The controversial issue is whether stability or increases in BMD under the influence of bone-active agents imply a reduction in fracture risk and, conversely, whether declines in BMD during therapy suggest therapeutic failure.
Early work from Wasnich et al. documented a statistically significant and exponential relationship between declining BMD and increasing spine-fracture incidence [2]. Quantifiable increases in the relative risk (RR) for fracture with declining BMD have been documented in several prospective studies [3–7]. In a meta-analysis of such studies, Marshall et al. found a RR for fracture of 1.5 per standard deviation (SD) decline in bone density for all skeletal sites, with greater increases in RR for site-specific fracture risk measured at the site in question [8].
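As a rough illustration of what a gradient of risk of 1.5 per SD implies, the short Python sketch below assumes that the per-SD RR compounds multiplicatively across SDs, which is the log-linear form in which such gradients are usually expressed. The function name and its default are illustrative only and are not taken from Marshall et al.

```python
def relative_risk(sd_decline: float, rr_per_sd: float = 1.5) -> float:
    """RR versus a reference BMD, assuming the per-SD RR compounds multiplicatively."""
    return rr_per_sd ** sd_decline


for sds in (1, 2, 3):
    print(f"BMD {sds} SD lower: RR about {relative_risk(sds):.2f}")
```

Under this assumption, a BMD 2 SD lower corresponds to an RR of about 2.25 rather than 3.0, which is why small differences in BMD can translate into sizeable differences in risk.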
It seemed logical to expect that increases in bone density would result in reductions in fracture risk. However, with the publication of the fracture trials for the earliest antiresorptive agents, the observed relationship between increasing bone density and declining fracture risk was not as expected. Table 1 summarizes the increase in LS BMD versus placebo in these trials and the reduction in spine-fracture risk. With the publication of the results from the two arms of the Fracture Intervention Trial (FIT I and FIT II) using alendronate, it was immediately apparent that the reduction in fracture risk was far greater than anticipated from the increase in BMD [9,10]. In the Multiple Outcomes of Raloxifene Evaluation, not only were the reductions in spine-fracture risk greater than anticipated based on the change in LS BMD, but the reduction in risk was also not proportional to the magnitude of the change in BMD when compared with the results from FIT I and FIT II [11]. The data from the two Vertebral Efficacy with Risedronate Therapy (VERT) trials perpetuated this debate, and with the publication of the Prevent Recurrence of Osteoporotic Fractures trial utilizing intranasal salmon calcitonin, researchers began to wonder whether increases in BMD mattered at all [12–14].
Table 1. Change in lumbar spine BMD versus placebo and the reduction in spine fracture risk for various antiresorptive agents from their sentinel spine-fracture trials.
All trials were 3 years in duration (with the exception of the PROOF trial, which was 5 years in duration).
BMD: Bone mineral density; FIT: Fracture Intervention Trial; LS: Lumbar spine; MORE: Multiple Outcomes of Raloxifene Evaluation; PROOF: Prevent Recurrence of Osteoporotic Fractures; VERT-MN: Vertebral Efficacy with Risedronate Therapy – Multinational; VERT-NA: Vertebral Efficacy with Risedronate Therapy – North America.
To expeditiously examine these issues, meta-analyses of trials with antiresorptive agents were done using summary statistics to evaluate the relationship between changes in BMD versus placebo and the reduction in fracture risk. In a meta-analysis from Wasnich and Miller, a statistically significant relationship was found between increasing LS BMD versus placebo and declining spine-fracture risk [15]. The analysis also suggested that, in the absence of an increase in LS BMD versus placebo, a statistically significant 22% reduction in spine-fracture risk could still be seen. In a second meta-analysis from Cummings et al., there was once again a statistically significant relationship between increasing spine BMD versus placebo and the reduction in spine-fracture risk [16]. Similarly, in the absence of a significant increase in spine BMD versus placebo, a statistically significant reduction in spine fracture risk was still observed, although the magnitude of the reduction in risk differed very slightly, at 25%.
Individual patient data were used by several investigators to determine the percentage of the absolute risk reduction (ARR) that was attributable to the increase in LS BMD with a particular agent. As noted in Table 2, the proportion of the ARR attributable to the increase in LS BMD was as little as 4% with raloxifene and as much as 28% with risedronate [16–19]. These analyses, however, should not be directly compared. Three of the four involved different agents, and none of the four was performed using exactly the same statistical techniques. As illustrated by Shih et al., the use of different statistical techniques, even when applied to the same data, can result in a finding that as little as 10.6% or as much as 79% of the ARR is attributable to the increase in BMD [20].
Table 2. Absolute spine fracture risk reduction attributable to the increase in spine BMD.
ARR: Absolute risk reduction; BMD: Bone mineral density; LS: Lumbar spine.
Although interesting, these competing analyses were not particularly helpful in resolving this issue. It was clear, however, that most of the fracture-risk reduction was not explained by the apparent increase in BMD. Nevertheless, as a single, quantifiable entity, BMD did explain a sizeable proportion of the reduction in fracture risk. It was also clear from the summary-statistic meta-analyses of Wasnich and Miller and of Cummings et al. that a reduction in risk could be seen in the absence of an increase in BMD versus placebo [15,16]. This was also suggested by Chapurlat et al., using patient data from the first year of FIT I and FIT II [21]. This analysis was restricted to women in either arm of FIT who were at least 70% compliant with placebo or 5 mg alendronate per day. The women were divided into four groups based on their change in BMD at the end of the first year. The spine-fracture incidence at the end of FIT was compared between the placebo- and alendronate-treated women within each of the four categories of BMD change. The authors found a statistically significant reduction in spine-fracture incidence in the alendronate-treated women compared with the placebo group within each category of BMD change at the spine or total hip, except for the category in which both groups lost more than 4% from baseline. The reduction in spine-fracture incidence was also similar between categories of BMD change. These data have been incorrectly used to support the premise that changes in BMD are not relevant to fracture-risk reduction. What they actually illustrate, as the earlier meta-analyses suggested, is that a reduction in spine-fracture risk is still seen when the change in BMD in the treated group does not differ from that in the placebo group.
Data from both arms of FIT have also been used to examine the relationship between the magnitude of changes in BMD from baseline under the influence of alendronate therapy and the reduction in spine-fracture risk [22]. When the alendronate-treated women were divided into three groups based on the change in spine BMD at the end of 24 months in FIT, those women who had a measured gain of less than 3% had a RR reduction of 38% compared with the women whose BMD did not change or declined by any magnitude. The alendronate-treated women who had a measured gain in spine BMD of 3% or more had a statistically significant RR reduction of 50%. In a similar, but not identical, analysis of risedronate-treated women from the two Vertebral Efficacy with Risedronate Therapy trials and the Hip Intervention Program, the risedronate-treated women who had no change or a gain of less than 3% or a gain of 3% or more in spine BMD had statistically significant reductions in spine fracture risk compared with those women who had any magnitude decline in spine BMD [23]. Both analyses would lead one to conclude that it is better, in terms of spine-fracture risk reduction, to gain BMD on therapy than to lose it.
Fortunately, the proportion of women with declines in BMD on therapy appears to be small. Table 3 summarizes the available data for several bone-active agents from different clinical trials [24–27]. This is reassuring because there are no data that suggest risk reduction is achieved if a significant decline in BMD from baseline or versus placebo is seen on any therapy.
Table 3. Women with declines in spine BMD from baseline on various therapies.
All drugs were administered orally. Data for one drug should not be compared with data for another drug if the data do not come from the same trial.
ALN: Alendronate; CEE: Conjugated equine estrogen; IBN: Ibandronate; RIS: Risedronate; RLX: Raloxifene; qd: Once daily; qm: Once monthly; qwk: Once weekly.
Determining when a change in BMD is statistically significant
The relevant clinical question is: what constitutes a statistically significant change in BMD? The statistical concept that must be understood is precision. Clinically, precision is the ability to reproduce a quantitative test result when the tests have been performed in identical fashion and under identical conditions in the setting of no real biologic change. The formal definition of precision from the International Organization for Standardization (ISO) refers to the closeness of agreement between independent test results obtained under stipulated conditions [28]. Although repeatability and reproducibility are clinical synonyms for precision, Engelke and Glüer noted that the ISO uses these terms to denote different precision testing conditions [29]. For both, independent test results are obtained with the same method on the same test items. In the case of repeatability, the test results are obtained in the same facility by the same operator using the same equipment for all tests. Reproducibility refers to test results obtained in different facilities, by different operators, using different equipment.
The inherent precision of DXA bone-density testing is excellent but not perfect. If a patient undergoes three LS bone-density tests within moments of each other, the bone-density results will be similar but not identical, even with consistent perfection on the part of the technologist. That is because the precision of the test is not perfect. A perfect precision value is 0. This is because precision reflects variability in the results and variability is undesirable. Therefore, the smaller the precision value, the better. Thus, it is understandable that the precision value is often called the ‘precision error’ or the ‘imprecision’ of the technique.
The International Society for Clinical Densitometry (ISCD) recommends that each densitometry facility perform an in vivo short-term precision study [30]. Such a study involves DXA scanning of a specific number of individuals, each of whom is scanned a specific number of times within a few weeks. The number of individuals and scans per individual is determined on statistical grounds to ensure a minimum of 30 degrees of freedom (df). The concept of df is beyond the scope of this article, but is explained adequately elsewhere [31]. One such combination is 15 individuals scanned three times each. Other combinations are ten individuals scanned four times each or 30 individuals scanned twice each. Once the scans are completed, the standard deviation (SD) for each set of measurements on an individual is calculated. Then the root-mean-square SD (RMS-SD) for the entire group is calculated. The RMS-SD is the short-term precision value for the skeletal site at which the measurement was made. The formula for the RMS-SD, as described by Glüer et al., is [32]
$$\text{RMS-SD} = \sqrt{\frac{\sum_{j=1}^{m} \mathrm{SD}_j^{2}}{m}}$$
Thus, the RMS-SD is the square root of the mean of the squared SDs: the SD for each of the m patients in the precision study is squared, the squares are summed and divided by m, and the square root of the result is taken. The SD and the RMS-SD have the same units as the original measurements, which in the case of DXA is g/cm2. ISO terminology would dictate that an RMS-SD obtained in this manner be called the repeatability of the test, although in clinical densitometry it is simply called precision. Precision calculators are available from several sources, such as the calculator available on the ISCD website [101], which obviates the need to perform these calculations by hand.
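The RMS-SD calculation can be sketched in Python as follows. This is a minimal sketch that assumes every patient has the same number of repeat scans at a single ROI; the function names and the two-patient data set are hypothetical. The df helper reflects the convention that each patient contributes one fewer degree of freedom than his or her number of scans, which is why 15 × 3, 10 × 4 and 30 × 2 each yield 30 df.

```python
import math
import statistics


def rms_sd(scans_by_patient: list[list[float]]) -> float:
    """Short-term precision as the root-mean-square SD (same units as the scans).

    Each inner list holds one patient's repeat BMD values at a single ROI.
    Each patient's SD is the sample SD (n - 1 denominator), so its square is
    the sample variance.
    """
    variances = [statistics.variance(scans) for scans in scans_by_patient]
    return math.sqrt(sum(variances) / len(variances))


def degrees_of_freedom(scans_by_patient: list[list[float]]) -> int:
    """Each patient contributes (number of scans - 1) degrees of freedom."""
    return sum(len(scans) - 1 for scans in scans_by_patient)


# Hypothetical L1-L4 values (g/cm2): two patients, three scans each.
example = [[1.000, 1.010, 0.990], [0.900, 0.920, 0.880]]
print(f"RMS-SD = {rms_sd(example):.4f} g/cm2, df = {degrees_of_freedom(example)}")
```

A real precision study would, of course, include enough patients and scans to reach at least 30 df; the two-patient input only shows the arithmetic.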
The ISCD short-term precision testing guidelines have been criticized on the grounds that 30 df, and therefore the ISCD-recommended combination of 15 patients scanned three times each, were insufficient to establish bone-density testing precision for clinical purposes [33]. The criticism was based on the finding that the precision values for four of six different cohorts of 30 individuals each (with such a sample having 30 df) that were randomly chosen from a much larger cohort fell outside the 95% confidence interval for the precision value of the larger cohort, which had almost 200 df. However, the correct expectation is that the precision value of the entire cohort will fall within the 95% confidence interval of the sample cohort precision value. This was, in fact, the case for five of the six sample cohorts. In addition, it is a statistical certainty that precision studies with very large df will produce a confidence interval for the precision estimate that is narrower than a study with fewer df. Blake et al. noted that 30 df was necessary to ensure that the true precision value was not more than 34% greater or 20% less than the calculated precision value [34]. This uncertainty is reduced to 24 and 16%, respectively, at 50 df, while 200 df reduces the uncertainty to 11 and 9%, respectively. These larger dfs would require substantially more patients scanned and/or more scans per patient than is practical or desirable clinically. The more patients that are required, the more logistically difficult the study is to perform. The more scans per patient that are required, the greater the radiation exposure to the patient and the greater the likelihood that the patient will consciously or unconsciously assist in their repositioning for each study. From a practical and statistical perspective, the current ISCD guidelines are entirely appropriate.
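The percentages quoted from Blake et al. are consistent with the standard chi-square confidence interval for a standard deviation, in which the true SD lies between the estimate multiplied by sqrt(df / upper quantile) and by sqrt(df / lower quantile). The sketch below reproduces them approximately. To stay within the Python standard library it uses the Wilson-Hilferty approximation to chi-square quantiles rather than exact quantiles, so a figure may occasionally differ from exact tables by a percentage point; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def chi2_quantile(p: float, df: int) -> float:
    """Wilson-Hilferty approximation to the chi-square quantile at probability p."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * math.sqrt(h)) ** 3


def precision_uncertainty(df: int, confidence: float = 0.95) -> tuple[float, float]:
    """Return (% the true precision may exceed, % it may fall short of) the estimate."""
    alpha = 1.0 - confidence
    upper = math.sqrt(df / chi2_quantile(alpha / 2, df))      # factor above the estimate
    lower = math.sqrt(df / chi2_quantile(1 - alpha / 2, df))  # factor below the estimate
    return (upper - 1) * 100, (1 - lower) * 100


for df in (30, 50, 200):
    over, under = precision_uncertainty(df)
    print(f"{df} df: up to {over:.0f}% greater or {under:.0f}% less")
```

The output shows the diminishing return described above: going from 30 to 200 df narrows the interval, but only at the cost of many more patients or scans.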
The RMS-SD must be determined separately for each skeletal region of interest (ROI). This information is absolutely necessary for a clinician to determine when the measured change in bone density in a patient can be considered a real, biologic change. Therefore, every facility performing bone density testing must establish a specific precision value for each skeletal site that is measured.
Recognizing that the precision is not 0, one must then ask how much the bone density must change to be sure that the variability in the test has been exceeded. Stated another way, by how much must the precision error of the test be exceeded to consider a change in bone density statistically significant? The relevant concept here is called the least significant change (LSC).
The LSC is the magnitude of the change that must be equaled or exceeded to conclude that a real, biologic change has occurred. In other words, the variability in the test has been sufficiently exceeded such that it is reasonable to conclude that there has been a real change in the BMD. The formula for the LSC is
$$\mathrm{LSC} = Z' \times \mathrm{Pr} \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
In this formula, Pr is the precision value, n1 is the number of bone-density studies at baseline and n2 is the number at follow-up. Z′ is determined by the level of statistical confidence chosen by the clinician. ISCD recommends that the LSC be determined at a 95% confidence level using a two-sided test [30]. With this information, the Z′ value of 1.96 is simply selected from a statistical table. As one scan is generally done at baseline and one at follow-up, the value under the square-root sign in the formula (1/n1 + 1/n2) becomes 2. The formula for the 1×1 LSC95 (the LSC for one scan at baseline and one at follow-up at 95% confidence) can then be reduced to
$$1{\times}1\,\mathrm{LSC}_{95} = 1.96 \times \sqrt{2} \times \mathrm{Pr}$$
Or
$$1{\times}1\,\mathrm{LSC}_{95} = 2.77 \times \mathrm{Pr}$$
Therefore, the LSC, or the magnitude of the change in BMD at any given ROI that needs to be equaled or exceeded to conclude that a biologic change has occurred at a 95% confidence level, is 2.77 times the precision value for that ROI. This formula is not dependent on the direction of the change in BMD and can be used to conclude that there has been a statistically significant loss or a significant gain. If the measured change in BMD does not equal or exceed the LSC, the appropriate conclusion, regardless of the direction or magnitude of the measured change, is that no real change in BMD has occurred.
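The general LSC formula and the two-sided significance test can be written as a small Python helper. This is a sketch that assumes precision is expressed as an RMS-SD in g/cm2; the names least_significant_change and is_significant are hypothetical. Z′ is computed from the standard normal distribution rather than read from a table, which gives approximately 1.96 at 95% confidence.

```python
import math
from statistics import NormalDist


def z_two_sided(confidence: float) -> float:
    """Z' for a two-sided test at the given confidence level (0.95 -> ~1.96)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def least_significant_change(precision: float, confidence: float = 0.95,
                             n1: int = 1, n2: int = 1) -> float:
    """LSC = Z' x Pr x sqrt(1/n1 + 1/n2), in the same units as the precision."""
    return z_two_sided(confidence) * precision * math.sqrt(1 / n1 + 1 / n2)


def is_significant(change: float, lsc: float) -> bool:
    """Two-sided test: a gain or a loss counts only if |change| >= LSC."""
    return abs(change) >= lsc


print(round(least_significant_change(1.0), 2))              # one scan per visit -> 2.77
print(round(least_significant_change(1.0, n1=2, n2=2), 2))  # duplicate scans -> 1.96
```

With a precision of 1.0 the function returns the multiplier itself, which makes it easy to see both the 2.77 factor and how much duplicate scans at each visit would shrink the LSC.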
The magnitude of the LSC can be reduced if two or more scans are done at baseline and follow-up or if a lower level of statistical confidence is used [31]. While duplicate scans are often performed in clinical research trials, this would not seem to be a practical clinical suggestion. However, the use of a lower level of statistical confidence could be considered. The decision for the clinician is how confident he or she needs to be that a change in BMD has or has not occurred to base clinical decisions on the findings. A lower statistical confidence level of 80% is not unreasonable in many circumstances and, if used, would change the formula for the 1×1 LSC to
$$1{\times}1\,\mathrm{LSC}_{80} = 1.28 \times \sqrt{2} \times \mathrm{Pr}$$
Or
$$1{\times}1\,\mathrm{LSC}_{80} = 1.81 \times \mathrm{Pr}$$
This results in an LSC that is approximately 35% less than the LSC calculated at 95% confidence. As a practical matter, this means that a smaller change in BMD is required to conclude that a real biologic change has occurred, but the level of confidence that it is a real biologic change is less. Although a change in BMD may not be significant at a 95% confidence level, it may be significant at a lower confidence level. With this knowledge, a clinician can determine the importance of the findings rather than summarily concluding that no change in BMD has occurred because it was not significant at the 95% confidence level. The level of statistical confidence for any magnitude of change can be determined if the precision is known. An automated statistical confidence calculator is available for purchase [35].
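Both the effect of a lower confidence level and the confidence attached to any particular measured change can be computed once the precision is known. The sketch below is not the commercial calculator cited above; it simply inverts the LSC formula under a two-sided normal approximation, and the helper names lsc and confidence_of_change are hypothetical. The 0.011 g/cm2 precision is the value assumed in the worked example for L1-L4.

```python
import math
from statistics import NormalDist


def lsc(precision: float, confidence: float = 0.95, n1: int = 1, n2: int = 1) -> float:
    """LSC = Z' x Pr x sqrt(1/n1 + 1/n2) for a two-sided test."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * precision * math.sqrt(1 / n1 + 1 / n2)


def confidence_of_change(change: float, precision: float, n1: int = 1, n2: int = 1) -> float:
    """Two-sided confidence level at which |change| would just equal the LSC."""
    z = abs(change) / (precision * math.sqrt(1 / n1 + 1 / n2))
    return 2 * NormalDist().cdf(z) - 1


ratio = lsc(1.0, 0.80) / lsc(1.0, 0.95)
print(f"80% LSC is {1 - ratio:.0%} smaller than the 95% LSC")
print(f"0.018 g/cm2 change at Pr = 0.011 g/cm2: {confidence_of_change(0.018, 0.011):.0%} confidence")
```

In this illustration an apparent change of 0.018 g/cm2 corresponds to roughly 75% confidence: not significant at 95%, but not something a clinician would necessarily dismiss either.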
When should a bone-density study be repeated?
Recommendations from several organizations are published on the timing of repeat bone-density studies for the purpose of monitoring changes in BMD in different clinical circumstances. In 2002, ISCD noted that the typical monitoring interval for an individual receiving a prescription intervention to prevent bone loss or reduce fracture risk was not less than 1 year [30]. In 2001, and again in 2003, the American Association of Clinical Endocrinologists (AACE) recommended that BMD be measured yearly for a minimum of 2 years and until stability was seen [36,37]. Thereafter, a 2-year monitoring interval was recommended. In 2002, and again in 2006, the North American Menopause Society (NAMS) recommended a 2-year interval between studies [38,39]. AACE and NAMS also addressed the monitoring interval for a woman not receiving pharmacologic intervention. AACE recommended an interval of 3–5 years if the baseline BMD was normal [37]. NAMS recommended the same interval in untreated women without characterizing the BMD in those women [39].
These recommendations regarding the monitoring interval are based on understanding the expected time required to reach the LSC. The posterior-anterior LS or total hip ROI is recommended for monitoring changes in BMD, because of the tendency for better precision at these sites than for the femoral neck or trochanter. The anticipated rate of change at an ROI is also important, as it is the combination of precision and rate of change at that ROI that ultimately determines the time to the LSC. When the anticipated rate of change is considered, the posterior-anterior LS is often the preferred site for monitoring [30].
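Because the time to reach the LSC depends on both the precision and the expected rate of change at an ROI, it can be estimated as the LSC divided by the annual rate of change. The sketch below assumes a constant (linear) rate of change; the function name and the two ROIs with their precision and rate values are hypothetical numbers chosen for illustration, not published figures for any skeletal site.

```python
import math
from statistics import NormalDist


def years_to_lsc(precision: float, annual_change: float, confidence: float = 0.95) -> float:
    """Years for a steady change in BMD to reach the 1x1 LSC at one ROI.

    Assumes a constant (linear) rate of change; precision and annual_change
    must be in the same units (for example, g/cm2 and g/cm2 per year).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    least_significant_change = z * precision * math.sqrt(2)
    return least_significant_change / abs(annual_change)


# Hypothetical ROIs: (precision in g/cm2, expected change in g/cm2 per year).
rois = {"ROI A": (0.010, 0.015), "ROI B": (0.012, 0.006)}
for name, (precision, rate) in rois.items():
    print(f"{name}: about {years_to_lsc(precision, rate):.1f} years to reach the LSC")
```

In this illustration, the better precision and faster expected change at ROI A shorten the time to a measurable change, which is the reasoning behind preferring such a site for monitoring.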
Clinical considerations when a significant decline in BMD is observed
While statistically significant increases in BMD are reassuring and reasonably interpreted as indicative of therapeutic efficacy, stability of the bone density also implies efficacy. However, a significant decline in BMD is worrisome. As noted earlier, there are no data to suggest therapeutic efficacy when a significant decline in BMD has occurred. The responder analyses shown in Table 3 are reassuring in this regard, in that only a small percentage of women have any magnitude of bone loss on currently available medications and very few sustain losses that would equal or exceed commonly encountered LSCs and thus be considered statistically significant. Nevertheless, in clinical practice, the physician may need to determine why a patient has a significant bone loss on therapy. Lewiecki summarized the most common reasons for observing BMD nonresponse during therapy [40]. In approximate order of likelihood, these included: poor adherence, comorbid conditions, calcium and vitamin D deficiency, malabsorption, metabolic factors, wrong dose, wrong dosing interval and, finally, lack of efficacy. As the women participating in the clinical trials reflected in Table 3 underwent extensive screening for secondary causes of bone loss, were provided with calcium and vitamin D and were continually monitored to ensure compliance with the drug, the percentages of women losing bone shown in Table 3 are likely lower than those that will be encountered in clinical practice. Correctable causes of bone loss should obviously be addressed. If no cause can be found, a change in therapy may be necessary.
Applying the short-term precision & LSC
The patient presented at the beginning of this discussion had a measured gain in BMD at L1-L4 over a 2-year period of 0.040 g/cm2. Is this a significant change? Three things must be known to answer this question: the precision of testing at L1-L4, the level of statistical confidence at which the LSC is calculated, and the number of scans done at baseline and at follow-up. It is already known that one scan was done at baseline and one at follow-up. The LSC will be calculated at 95% confidence, according to ISCD recommendations. The precision value must be obtained from an L1-L4 in vivo precision study; for this example, assume a precision of 0.011 g/cm2 at L1-L4. Because one scan was done at baseline and one at follow-up, and the LSC is being calculated at 95% confidence, the LSC is equal to 2.77 × 0.011 g/cm2, or approximately 0.030 g/cm2. The measured increase in BMD of 0.040 g/cm2 exceeds the LSC of 0.030 g/cm2 and is considered a statistically significant increase in BMD.
The second possibility described for this patient was a measured decline in BMD after 2 years of 0.018 g/cm2. Is this a significant decline? The measured change of 0.018 g/cm2 does not equal or exceed the previously calculated LSC of 0.030 g/cm2. The same LSC can be used because this two-sided approach is not dependent on the direction of the measured change. Consequently, although there was an apparent decline in BMD, the decline is not considered statistically significant. The correct interpretation of this finding is that the BMD has not changed.
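The worked example can be checked in a few lines of Python. The precision of 0.011 g/cm2 is the value assumed in the text, the function name assess_change is hypothetical, and the LSC is computed from the unrounded multiplier 1.96 × √2 for one scan at baseline and one at follow-up.

```python
import math


def assess_change(baseline: float, follow_up: float, precision: float,
                  z: float = 1.96) -> tuple[float, float, bool]:
    """Return (change, LSC, significant?) for one scan at baseline and one at follow-up."""
    change = follow_up - baseline
    lsc = z * math.sqrt(2) * precision
    return change, lsc, abs(change) >= lsc


for follow_up in (0.932, 0.874):  # the two follow-up L1-L4 values from the case
    change, lsc, significant = assess_change(0.892, follow_up, precision=0.011)
    verdict = "significant" if significant else "not significant"
    print(f"change {change:+.3f} g/cm2 vs LSC {lsc:.3f} g/cm2: {verdict}")
```

The same LSC is applied to the gain and to the loss, reflecting the two-sided test.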
Conclusion
The inherent precision of DXA bone-density testing is superb but not perfect. There will be variability in the measurement even when the test is performed perfectly every time in the setting of no real biologic change in BMD. This variability can and must be quantified in a short-term precision study. The variability is correctly called the precision of the test, but is also referred to as the imprecision or precision error of the test. The precision must be known to calculate the LSC. Changes in BMD cannot be interpreted and proper clinical decisions made without knowledge of the LSC. And changes in BMD do matter. Meta-analyses as well as analyses of patient data from individual studies demonstrate a statistically significant relationship between increases in BMD and decreases in fracture risk. Although a reduction in spine-fracture risk can still be seen with no increase in spine BMD on therapy compared with placebo, no data support a reduction in fracture risk when BMD declines significantly from baseline or versus placebo. And because changes in BMD do matter, if a significant decline in BMD is seen during therapy, the patient should be rigorously evaluated for previously undetected causes of bone loss. If none are found, therapeutic alternatives should be considered.
Future perspective
Ideally, therapeutic efficacy is assessed quickly to avoid wasting time, money and effort in the prevention of a disease or disease outcome. In osteoporosis, therapies are intended to stop bone loss and reduce fracture risk. There is currently no better validated approach here than the measurement of bone density. Nevertheless, efficacy generally cannot be assessed in less than a year. And, as previously noted, the change in BMD does not explain all of the reduction in fracture risk. Thus, we would like to be able to assess efficacy more quickly and with even greater certainty than we currently can with the measurement of bone density. The measurement of biochemical markers of bone turnover, which may change profoundly within days of beginning an antiresorptive therapy, is the most likely near-term candidate for assessing efficacy more quickly. But this approach has not been extensively validated, and it is not at all clear that it would capture a significantly greater proportion of the risk reduction than is currently possible with the measurement of BMD.
Executive summary
Bone mineral density (BMD) is a surrogate for bone strength, rather than a direct measure of bone strength.
The relationship between declining BMD and increasing fracture risk is firmly established; the strength of the relationship between increasing BMD and decreasing fracture risk is controversial.
The decrease in fracture risk in placebo-controlled trials of antiresorptive agents is greater than expected from the magnitude of the increase in BMD and is not proportional among the various agents.
Meta-analyses using summary statistics and individual trial analyses of patient data suggest that increasing BMD versus placebo confers a reduction in spine-fracture risk.
It has never been demonstrated that spine-fracture risk reduction is achieved with any therapy if a significant decline in spine BMD from baseline or versus placebo is seen.
To determine if a change in bone density is statistically significant, the physician must know the least significant change (LSC) for the particular skeletal region measured.
The LSC is the magnitude of the change in BMD that must be equaled or exceeded to conclude that the change is real.
The LSC is dependent on the establishment of the short-term precision for the skeletal region measured and the desired level of statistical confidence.
Guidelines for the timing of repeat bone-density studies are based on an understanding of the length of time needed to reach the LSC.
If a significant decline in BMD is observed, a re-evaluation of the patient for compliance and secondary causes of bone loss is indicated before concluding that a change in therapy is necessary.
Footnotes
The author has no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.
No writing assistance was utilized in the production of this manuscript.
