Abstract
The upper limit of the reference range for serum thyroid-stimulating hormone (TSH) is used to help identify individuals with hypothyroidism. Improvements in TSH assays have led to better definition of the lower limit of the reference range, but the upper limit of the range for a healthy population is currently a topic of some debate. Population studies have improved our understanding of the clinical implications of elevated serum TSH concentrations in terms of future progression to hypothyroidism, but have not yet fully established the relationship between modestly elevated TSH levels and long-term morbidity. This paper reviews the current debate, including the arguments for and against reducing the upper limit of the TSH range, whether such a limit should be based on evidence from epidemiological studies, and the implications of categorizing large numbers of people as having subclinical hypothyroidism. The effects on reference ranges of different TSH measurement methods and of the inherent variability of results are also discussed. We argue that the reference range for TSH should be assay-specific and determined by standard techniques in normal populations, as recommended by the National Academy of Clinical Biochemistry. In contrast, we suggest that a decision level be determined separately, from epidemiological studies, to identify a population with subclinical hypothyroidism. Serial monitoring of TSH in this population deserves further study as a means of identifying those at risk of progressing to frank hypothyroidism and meriting treatment.
Introduction
The biochemical hallmark of primary hypothyroidism is an elevated serum concentration of thyroid-stimulating hormone (TSH). When characteristic clinical findings are accompanied by a markedly raised TSH concentration and a low thyroxine concentration, there is usually no difficulty in making the diagnosis. There is, however, much current debate regarding the significance of a modestly elevated TSH concentration in the presence of a total or free thyroxine concentration within the reference range, so-called subclinical hypothyroidism. Subclinical hypothyroidism is often a purely biochemical diagnosis: patients may be asymptomatic or have non-specific symptoms, without defining findings on physical examination. 1–4
In 2002, the National Academy of Clinical Biochemistry (NACB) published a monograph on laboratory support for the diagnosis and monitoring of thyroid disease. 1 The authors argued for adopting a reduced upper limit of the TSH reference range. The crux of their argument was that the existing upper limit of the range does not accurately discriminate between individuals regarded as euthyroid and those who have mild or undetected hypothyroidism. The authors attributed this to the following factors:
(1) discrepancies between studies in the selection of thyroid disease-free populations, including the lack of sensitivity of the assays used to detect antithyroid antibodies; and (2) differences in results due to the different TSH methods employed in the studies, and the influence of heterogeneity in the TSH molecule and of interfering substances.
The monograph recommended a set of guidelines for establishing the reference range for TSH. Of these criteria, the definition of euthyroidism has proved the most difficult to meet.
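To make "standard techniques" for deriving a TSH reference range concrete, the following minimal sketch computes a parametric central 95% interval on log-transformed values. As we understand the NACB recommendation, limits are derived from the log-transformed TSH results of at least 120 rigorously screened euthyroid volunteers; the monograph's full criteria are not reproduced here, and the function name and example values are hypothetical. The log transformation is used because TSH concentrations in healthy populations are right-skewed.

```python
import math
import statistics


def log_reference_interval(tsh_values, z=1.96):
    """Central 95% reference interval (mU/L) from log-transformed TSH results.

    Limits are mean +/- z * SD on the natural-log scale, back-transformed.
    """
    if len(tsh_values) < 2:
        raise ValueError("need at least two results")
    if any(v <= 0 for v in tsh_values):
        raise ValueError("TSH values must be positive to take logarithms")
    logs = [math.log(v) for v in tsh_values]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation on the log scale
    return math.exp(mu - z * sd), math.exp(mu + z * sd)


# Hypothetical results from screened volunteers; far too few for real use.
sample = [0.6, 0.9, 1.1, 1.3, 1.4, 1.6, 1.8, 2.1, 2.5, 3.2]
low, high = log_reference_interval(sample)
print(f"Reference interval: {low:.2f}-{high:.2f} mU/L")
```

Because the limits are symmetric on the log scale, their geometric midpoint equals the geometric mean of the sample; on the original scale the interval is asymmetric, with a longer upper tail.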
Based on the increased future risk of hypothyroidism in individuals with a TSH of >2.0 mU/L, as identified in the Whickham survey, the authors of the NACB monograph argued that the upper limit of the reference range should be 2.5 mU/L. 1
Reducing the upper limit of the TSH reference range would increase the proportion of the population diagnosed with subclinical hypothyroidism, frequently including people with non-specific symptoms that are probably unrelated to thyroid disease. The proposal is therefore the subject of considerable debate, to which we contribute in this review.
Thyroid-stimulating hormone measurement and reference ranges
The influence of analytical factors on TSH reference ranges and thyroid guidelines was recently reviewed by Beckett and MacKenzie, 5 who concluded that current TSH assays are fit for purpose: they enable the identification both of patients with overt thyroid disease who require treatment and of patients with minor degrees of thyroid dysfunction, in whom the benefits of treatment are still the subject of debate. However, with respect to subclinical thyroid disease, Beckett and MacKenzie recommended that guidelines take account of the current limitations of assays, particularly assay bias. 5 Currently, the non-radiometric immunoassays used by over 80% of laboratories participating in the United Kingdom National External Quality Assessment Scheme (UKNEQAS) for thyroid hormones indicate expected upper reference limits of between 3.05 mU/L and 5.5 mU/L. 6 Some, but not all, of the methods that report TSH concentrations higher than the 'All Laboratories Trimmed Mean' (ALTM) also show higher bias when measuring TSH in euthyroid specimens. This inconsistency suggests that other factors, such as differences in antibody reactivity towards different TSH isoforms, are responsible for the significant differences in reported TSH results among laboratories. 6 The variation in ranges found in large studies is likely to be related, at least in part, to the type of immunoassay used in each study. 7–10
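The comparison with the ALTM can be illustrated with a short sketch: a trimmed consensus mean across all laboratories' results for one specimen, and one method's percentage bias from it. The 10% symmetric trim, the function names and the results are assumptions for illustration; the trimming procedure actually used by UKNEQAS is not described here and may differ.

```python
import statistics


def trimmed_mean(values, trim_fraction=0.1):
    """Mean after dropping trim_fraction of the results from each tail.

    The 10% default is an illustrative assumption, not the UKNEQAS rule.
    """
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # number dropped from each end
    return statistics.mean(ordered[k:len(ordered) - k])


def percent_bias(method_results, all_results, trim_fraction=0.1):
    """Percentage difference of a method's mean from the trimmed consensus."""
    consensus = trimmed_mean(all_results, trim_fraction)
    return 100.0 * (statistics.mean(method_results) - consensus) / consensus


# Hypothetical results (mU/L) returned for one euthyroid EQA specimen.
method_a = [2.1, 2.2, 2.0, 2.3]
method_b = [2.6, 2.7, 2.5, 2.8]
all_labs = method_a + method_b + [2.3, 2.4, 2.2, 5.9]  # 5.9 is an outlier
print(f"Trimmed consensus: {trimmed_mean(all_labs):.2f} mU/L")
print(f"Method B bias: {percent_bias(method_b, all_labs):+.1f}%")
```

Trimming stops a single outlying laboratory from shifting the consensus, so a method's bias reflects its own behaviour rather than a few aberrant returns.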
It should also be noted that there is no published literature to support the reference values supplied by manufacturers with their assays, and it remains unclear how these ranges are derived; only minimal information is available in assay manuals. For the four methods with the largest user groups in the UKNEQAS for thyroid hormones, expected values are derived from TSH measurements in 'euthyroid' individuals, with the caveat that individual laboratories should establish their own local reference ranges (Table 1). It is unclear how many laboratories actually heed this advice.
Table 1. Details of reference populations used to determine reference ranges for thyroid-stimulating hormone, for the methods with the largest user base in UKNEQAS
Siemens Healthcare Diagnostics, IL, USA; Abbott Diagnostics, IL, USA; Roche Diagnostics, IN, USA. Data obtained from current assay manuals.
Population studies and reference ranges for TSH
Over the past 20 years, there have been a small number of large population studies assessing the prevalence of thyroid disease, although only one large study has addressed the long-term implications of an abnormal TSH concentration (Table 2). 7,8
Table 2. Thyroid-stimulating hormone (TSH) ranges (mU/L) in large population studies
RIA, radioimmunoassay; ELISA, enzyme-linked immunosorbent assay; CIMA, chemiluminescence immunometric assay; SCHo, subclinical hypothyroidism; NHANES, National Health and Nutrition Examination Survey
The Whickham surveys
The Whickham survey, a longitudinal study reported in 1977 and 1995, has had a major impact on our understanding of the clinical implications of an elevated TSH concentration. 7,8 In the initial study (2,779 individuals), TSH was measured using a radioimmunoassay. 11 An elevated TSH concentration (defined as >6 mU/L) was found in 2.8% of males and 7.5% of females. In women, this prevalence increased markedly after the age of 45, mostly among those who were positive for antithyroid peroxidase antibody. Minor degrees of hypothyroidism were defined as TSH >6 mU/L with no obvious clinical features of hypothyroidism. Individuals without clinical thyroid disease, goitre, family history of thyroid disease or positive antithyroid antibody status were identified as a 'thyroid negative' subgroup. This subgroup had a mean TSH of 2.2 mU/L and an upper limit of 6.0 mU/L.
In 1995, a follow-up study of 1,877 known survivors, in which TSH was measured using an enzyme-linked immunosorbent assay, was reported. 8 The study defined an upper 'normal' limit of 5.2 mU/L (it is unclear whether this was the manufacturer's value or was derived experimentally). A TSH of 5.2–10.0 mU/L was considered 'borderline' and a TSH of >10 mU/L 'clearly raised'. The 'thyroid negative' group (defined as in the first study) had an upper TSH limit of 4.29 mU/L in men and 4.54 mU/L in women. The odds ratios for developing hypothyroidism in those with TSH above these cut-off values were 8 for women and 44 for men. In this second study, the probability of hypothyroidism in surviving women was plotted against log serum TSH as measured in the first study. This plot showed a distinct change in the relationship at a TSH concentration of 2.00 mU/L, above which any further elevation in TSH concentration was associated with an increased risk of developing hypothyroidism (Figure 1). This risk was further increased by a family history of thyroid disease, positive antithyroid peroxidase antibody status or goitre.
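For readers less familiar with the statistic, an odds ratio compares the odds of an outcome in two groups and is computed from a 2×2 table as (a×d)/(b×c). The sketch below also gives Woolf's approximate 95% confidence interval, calculated on the log scale. The function names and counts are hypothetical and are not Whickham data.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    return (a * d) / (b * c)


def woolf_ci(a, b, c, d, z=1.96):
    """Approximate 95% confidence interval for the odds ratio (Woolf's method)."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for Woolf's method")
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: 'exposed' = TSH above a cut-off at baseline,
# outcome = hypothyroidism at follow-up.
a, b, c, d = 15, 35, 10, 140
print(f"OR = {odds_ratio(a, b, c, d):.1f}")  # OR = 6.0
print("95% CI = {:.1f}-{:.1f}".format(*woolf_ci(a, b, c, d)))
```

With small cell counts, as for the men in the Whickham follow-up, such intervals become wide, which is worth bearing in mind when comparing odds ratios between sexes.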

Figure 1. Logit probability (log odds) of 20-year development of hypothyroidism with increasing concentration of thyroid-stimulating hormone at study entry in 912 female survivors of the Whickham Survey (reproduced with permission from Vanderpump et al. 8)
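Figure 1 plots the log odds (logit) of developing hypothyroidism against log TSH. One way to represent a change in slope at 2.0 mU/L within a logistic model is to add a "hinge" term that is zero below the knot. This is our illustrative assumption, not necessarily the analysis used in the Whickham survey; the function names are ours and the coefficients in the usage loop are hypothetical.

```python
import math


def logit(p):
    """Log odds of a probability p, with 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1 - p))


def inverse_logit(x):
    """Probability for log odds x, computed stably for large |x|."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    e = math.exp(x)
    return e / (1 + e)


def hinge(tsh, knot=2.0):
    """Zero up to the knot (mU/L), then ln(TSH) - ln(knot) above it."""
    return max(0.0, math.log(tsh) - math.log(knot))


def predicted_probability(tsh, b0, b1, b2, knot=2.0):
    """Logistic model: log odds = b0 + b1*ln(TSH) + b2*hinge(TSH)."""
    return inverse_logit(b0 + b1 * math.log(tsh) + b2 * hinge(tsh, knot))


# Hypothetical coefficients: shallow slope below 2.0 mU/L, steeper above.
for tsh in (1.0, 2.0, 4.0, 8.0):
    print(tsh, round(predicted_probability(tsh, b0=-4.0, b1=0.2, b2=1.5), 3))
```

The slope of log odds against ln(TSH) is b1 below the knot and b1 + b2 above it, which is the kind of "distinct change in the relationship" the figure shows.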
The Colorado Thyroid Disease Prevalence Study
The Colorado Thyroid Disease Prevalence Study was a cross-sectional study of 25,862 healthy individuals, conducted in 1995 and reported in 2000. 9 TSH was measured using a third-generation chemiluminescent assay (London Diagnostics, Eden Prairie, MN, USA). A TSH of >5.1 mU/L was found in 9.5% of individuals. A normal range was quoted as 0.3–5.1 mU/L, with reference to an earlier assessment of the assay used. 12 In fact, that assessment quotes a normal range (mean ± 2SD) of 0.39–4.6 mU/L, with a mean of 1.9 mU/L. Individuals with TSH >5.1 mU/L and low thyroxine concentrations were considered to be hypothyroid. Clinical and subclinical hypothyroidism were not defined at the outset of the study but were identified by relating the symptom percentage score to TSH concentration. This reflects the difficulty of making a clinical diagnosis of subclinical hypothyroidism, a point acknowledged by the authors. Both the hypothyroid and the subclinically hypothyroid groups reported more symptoms than those considered to be euthyroid. There was, however, considerable overlap between the groups.
The National Health and Nutrition Examination Survey
The most recent of these cross-sectional studies is the third National Health and Nutrition Examination Survey (NHANES III), a population study of 17,353 individuals conducted between 1988 and 1994 and reported in 2002. 10 This study reported TSH and thyroid hormone results for a 'disease-free' population of 16,533 individuals deemed not to have thyroid disease (as self-reported) or goitre and not to be taking thyroid medication. A 'reference population' of 13,344 was also defined: the disease-free population, further excluding those who were pregnant, were taking androgens or oestrogens, had thyroid antibodies, or had biochemical evidence of hypothyroidism or hyperthyroidism. TSH was measured using a chemiluminescent method (Nichols Institute Diagnostics, San Juan Capistrano, CA, USA), with a manufacturer's reported upper limit of 4.6 mU/L. Subclinical hypothyroidism, defined as TSH >4.5 mU/L with a normal thyroxine concentration, was present in 4.4% of the total population and 3.9% of the reference population. The mean TSH for the reference population was 1.4 mU/L (95% confidence interval 1.37–1.44 mU/L); the median was 1.39 mU/L and the upper limit of the central 95% range was 4.12 mU/L.
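The biochemical definition used in NHANES III (TSH >4.5 mU/L with a normal thyroxine concentration) can be written as a simple rule. In this sketch the function name is ours, the thyroxine limits are parameters that would come from the laboratory's own reference range (the values in the usage lines are hypothetical), and low-TSH states are deliberately left out of scope.

```python
def classify_raised_tsh(tsh, t4, t4_low, t4_high, tsh_cutoff=4.5):
    """Biochemical category for a TSH (mU/L) and thyroxine result.

    tsh_cutoff defaults to the NHANES III definition (>4.5 mU/L); t4_low and
    t4_high are the laboratory's thyroxine reference limits (same units as t4).
    Low-TSH states are outside the scope of this sketch.
    """
    if tsh <= tsh_cutoff:
        return "TSH not raised"
    if t4 < t4_low:
        return "overt hypothyroidism"
    if t4 <= t4_high:
        return "subclinical hypothyroidism"
    return "discordant: raised TSH with high thyroxine"


# Hypothetical total thyroxine limits of 60-150 nmol/L.
print(classify_raised_tsh(6.2, 95, t4_low=60, t4_high=150))  # subclinical hypothyroidism
print(classify_raised_tsh(12.0, 45, t4_low=60, t4_high=150))  # overt hypothyroidism
```

Making the TSH cut-off a parameter shows directly how the prevalence of "subclinical hypothyroidism" depends on where that single number is set.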
Of the studies reviewed above, only the follow-up of the Whickham study included limited clinical examination for the presence or absence of thyroid disease; no clinical or ultrasound examination took place in either the Colorado or the NHANES III study. All three studies relied heavily on self-reporting, historical data and questionnaires, and all three measured total rather than free thyroxine, raising the possibility that thyroxine was over- or underestimated in individuals with thyroid-binding protein abnormalities. 1 Measurement of free thyroxine by analogue methods would have had its own limitations, because analogue immunoassays for free thyroid hormone are affected by changes in protein-binding capacity and may produce erroneous results in patients with non-thyroidal illness. 13 In addition, these methods, in common with other immunoassays, are susceptible to interference from heterophilic antibodies and autoantibodies. 14
Studies based on NACB recommendations
Two further studies have attempted to establish a reference range for TSH following the NACB monograph. 15,16 For a subgroup of 1,036 individuals deemed to have no evidence of thyroid disease or underlying predisposing factors, the first study reported an upper TSH limit of 4.07 mU/L using a time-resolved fluoroimmunometric assay (AutoDelfia, Perkin-Elmer, CT, USA). The second study used an immunochemiluminescent assay (Elecsys, Roche Diagnostics, IN, USA) to study 453 blood donors. Donors were assessed for thyroid disease by questionnaire and thyroid ultrasonography; those included had no personal or family history of thyroid disease, no goitre and negative results for antithyroid antibodies. The upper limit of TSH for the healthy donors with a normal thyroid gland was 3.77 mU/L.
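The workflow these studies follow, excluding individuals with a personal or family history of thyroid disease, goitre or antithyroid antibodies and then taking limits from the remaining results, can be sketched as below. The data class and function names are hypothetical. The nonparametric limits use one common rank convention, r = p(n + 1) with linear interpolation, and the minimum of 120 values reflects widely used reference-interval guidance rather than anything stated in the studies cited here.

```python
import math
from dataclasses import dataclass


@dataclass
class Participant:
    tsh: float               # mU/L
    thyroid_history: bool    # personal or family history of thyroid disease
    goitre: bool             # on examination or ultrasonography
    antibody_positive: bool  # antithyroid antibodies detected


def is_reference_individual(p):
    """Simplified exclusion criteria of the kind described in the text."""
    return not (p.thyroid_history or p.goitre or p.antibody_positive)


def nonparametric_limits(values, lower_pct=2.5, upper_pct=97.5, min_n=120):
    """Percentile limits using rank r = p(n + 1) with linear interpolation."""
    n = len(values)
    if n < min_n:
        raise ValueError(f"need at least {min_n} reference values, got {n}")
    ordered = sorted(values)

    def value_at(pct):
        r = min(max(pct / 100 * (n + 1), 1.0), float(n))  # clamp rank to 1..n
        lo, hi = math.floor(r), math.ceil(r)
        return ordered[lo - 1] + (r - lo) * (ordered[hi - 1] - ordered[lo - 1])

    return value_at(lower_pct), value_at(upper_pct)


cohort = [
    Participant(1.4, False, False, False),
    Participant(6.8, False, False, True),  # antibody positive: excluded
    Participant(2.0, True, False, False),  # thyroid history: excluded
]
reference_tsh = [p.tsh for p in cohort if is_reference_individual(p)]
print(reference_tsh)  # [1.4]; a real study would need at least 120 values
```

Unlike the log-normal approach, rank-based limits make no distributional assumption, at the cost of needing more reference individuals for stable tail estimates.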
Influence of iodine status
Two studies have found evidence that the TSH range varies with the iodine status of the area studied. One study used questionnaires, blood and urine sampling and thyroid ultrasonography to compare TSH levels in two areas of Denmark with slightly different iodine status. 17 This study reported upper limits of the mean TSH (calculated after logarithmic transformation) of 1.47 and 1.37 mU/L among participants with mild and moderate iodine deficiency, respectively. It is unclear, however, how these limits compare with those obtained in iodine-sufficient areas. The second study, conducted in Germany, reported a TSH upper limit of 2.12 mU/L in a previously iodine-deficient area that had had a stable iodine supply for the preceding 10 years. 18 The authors attributed the differences between the ranges derived from their study and from the NHANES III study to differing iodine status. NHANES III did not include ultrasonography of the thyroid gland, whereas in the German study this procedure led to the exclusion from the final analysis of 34% of participants without a known history of thyroid disease. Both the Danish and the German studies used chemiluminescent assays (BRAHMS, Berlin, Germany and Byk Sangtec Diagnostica GmbH, Frankfurt, Germany, respectively). The Danish study defined the upper limit of TSH as 3.6 mU/L, while the method used in the German study quotes an upper reference limit of 3.0 mU/L. These assay differences may affect any comparison between these results and those of the US study, which used another chemiluminescent assay with a reported upper limit of 4.6 mU/L.
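The phrase "upper limit of the mean TSH (after logarithmic transformation)" is most naturally read as the upper confidence limit of a geometric mean; this reading is our assumption, as the Danish study's exact calculation is not given here. The sketch computes that quantity with a normal approximation (z = 1.96); for small samples a t quantile would be more appropriate. Under this reading, because a confidence interval for the mean narrows as sample size grows, such a limit is not comparable with the upper limit of a reference range.

```python
import math
import statistics


def geometric_mean_ci(tsh_values, z=1.96):
    """Geometric mean TSH (mU/L) with an approximate 95% confidence interval.

    Computed as mean +/- z * standard error on the log scale, back-transformed.
    """
    if len(tsh_values) < 2:
        raise ValueError("need at least two values")
    logs = [math.log(v) for v in tsh_values]
    mu = statistics.mean(logs)
    se = statistics.stdev(logs) / math.sqrt(len(logs))
    return math.exp(mu), math.exp(mu - z * se), math.exp(mu + z * se)


# Hypothetical results (mU/L): the upper confidence limit of the mean sits
# well below the largest individual results.
values = [0.5, 0.8, 1.0, 1.2, 1.3, 1.5, 1.7, 2.0, 2.6, 3.4]
gm, lower, upper = geometric_mean_ci(values)
print(f"geometric mean {gm:.2f} mU/L (95% CI {lower:.2f}-{upper:.2f})")
```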
Discussion
In 2004, an expert panel from the American Thyroid Association and the American Association of Clinical Endocrinologists (ATA/AACE) reviewed the guidelines for the diagnosis and management of subclinical thyroid disease. 2 Based on the existing evidence for the benefit of treating subclinical hypothyroidism, the panel decided against recommending a change in the upper limit of the TSH reference range. The panel reasoned that, although some individuals with TSH in the range 2.6–4.5 mU/L may have subclinical thyroid disease, there is a lack of evidence of adverse outcomes in this group. Furthermore, the panel argued that, within this TSH range, problems inherent to TSH methodology (unspecified technical problems, abnormal isoforms and heterophilic antibodies) may also account for the apparently raised baseline TSH concentration found in the NHANES III study. An interesting observation, not highlighted in the first report on the NHANES III study, is that the majority of participants had a TSH concentration <2.5 mU/L. 19 The main argument against change is that it could lead to large numbers of people currently regarded as having a normal TSH concentration being labelled as hypothyroid or borderline hypothyroid. 19,20 In the USA, over 20 million people are likely to have results of 2.5–3.0 mU/L, compared with under 10 million with results of 5–10 mU/L. 20 Indeed, it has already been suggested that defining a normal TSH range implies that individuals outside such a range should be considered for thyroxine replacement. 20
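A rough sense of how strongly the choice of cut-off affects the numbers classified can be had by assuming that TSH is log-normally distributed. The sketch below, with a hypothetical function name, infers the log-scale spread from a median and a 97.5th percentile. Plugging in the NHANES III reference-population values quoted earlier (1.39 and 4.12 mU/L) is purely illustrative: that population excludes people with thyroid disease, the log-normal assumption is ours, and the sketch does not reproduce the population estimates in reference 20.

```python
import math


def fraction_above(cutoff, median, p975):
    """Fraction of a population with TSH above cutoff (all in mU/L).

    Assumes a log-normal distribution whose log-scale SD is inferred from
    the median and 97.5th percentile: sigma = ln(p975 / median) / 1.96.
    """
    sigma = math.log(p975 / median) / 1.96
    z = math.log(cutoff / median) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal probability


# Illustrative inputs: NHANES III reference-population median and upper limit.
above_2_5 = fraction_above(2.5, median=1.39, p975=4.12)
above_4_5 = fraction_above(4.5, median=1.39, p975=4.12)
print(f"Between 2.5 and 4.5 mU/L: {above_2_5 - above_4_5:.1%} of this modelled population")
```

Because the distribution is right-skewed but still concentrated near the median, moving a cut-off down towards the median sweeps in a disproportionately large share of people, which is the core of the concern about relabelling.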
What is lacking is evidence that morbidity increases with increasing TSH concentrations, and evidence of benefit from treating such patients. 8,19,21,22 In selected cases, it may be argued that treating such individuals improves their quality of life.
Nevertheless, the Whickham survey demonstrated that the chance of developing clinical hypothyroidism in the future is increased when the TSH concentration is above 2.0 mU/L, and that the ideal TSH concentration appears to be at, or slightly below, this value. The assay used in this study was a first-generation assay, reported to have a functional sensitivity of between 1 mU/L and 2 mU/L, 23 so this threshold lies close to the lowest concentration that the assay could reliably measure. We believe there is a case for reporting a decision cut-off in TSH concentration above which the likelihood of future thyroid dysfunction is increased, but there is no need to redefine the upper limit of the reference range. Defining a reference range normally takes into account, among other factors, the demographics of the study population, the exclusion of individuals with diseases affecting the analyte under review or taking any medication that may influence results, and the assay used. In these respects, the definition of a TSH reference range should be no different. The epidemiological implications and cost-benefit analysis of redefining the reference range, which is a separate matter from defining the decision levels used to detect subclinical hypothyroidism, should not be determining factors. Indeed, defining the TSH reference range and setting decision levels for further action are separate issues. Decision levels are increasingly employed to streamline patient management, cholesterol targets being a prime example. 24
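The distinction drawn here between a reference limit and a decision level can be expressed as a report-annotation sketch that evaluates the two flags independently. The class, the function and both limit values in the usage lines are hypothetical; in practice the reference limit would be assay-specific and the decision level would come from outcome data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TshLimits:
    upper_reference_limit: float  # assay-specific, from a healthy reference population
    decision_level: float         # set separately, from outcome data


def annotate_tsh(tsh, limits):
    """Report comments for a TSH result (mU/L), with the reference-range
    flag and the decision-level comment evaluated independently."""
    comments = []
    if tsh > limits.upper_reference_limit:
        comments.append("above assay reference range")
    if tsh > limits.decision_level:
        comments.append("above decision level: consider follow-up")
    return comments


limits = TshLimits(upper_reference_limit=4.2, decision_level=2.5)  # hypothetical
print(annotate_tsh(3.1, limits))  # ['above decision level: consider follow-up']
print(annotate_tsh(1.8, limits))  # []
```

Keeping the two limits in separate fields means either can be revised, for a new assay or new outcome evidence, without silently changing the other.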
It has been pointed out that reference ranges derived from population studies may not be sensitive enough to detect changes in thyroid function in individuals. 25–27 Each individual has a genetically determined set-point for TSH concentration, and because within-individual variation is small relative to the wide between-individual variation, TSH has a low index of individuality (the ratio of within-individual to between-individual variability). In addition, TSH reference data may be skewed by the inclusion of subjects with occult thyroid disease. 28 Establishing an individual's TSH set-point, and using serial measurements to assess trend and variation, may therefore be an alternative to the use of reference ranges. However, it has been shown that up to 85 measurements of TSH are required to determine an individual's set-point, and that the mean minimum difference for significance (at the 5% level) was 0.75 mU/L (range 0.2–1.6). 27 In another report, in which TSH concentrations were measured weekly for six weeks in 10 healthy subjects, a critical difference of 1.2 mU/L was reported. 29 In yet another study, of patients considered to be subclinically hypothyroid, a 40% difference between two results was estimated to indicate a significant change. 30 It is clearly impractical to carry out such a large number of observations on each individual, and further work is required to decide how to implement this concept in practice. However, once a significant change is identified or the TSH level exceeds a decision level, the patient could be considered as requiring further follow-up. A clinical decision, taking into account the patient's presentation and findings on clinical examination, could then be made regarding further monitoring or treatment.
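The quantities discussed here are usually calculated with standard formulae from the biological-variation literature, where CVA, CVI and CVG are the analytical, within-individual and between-individual coefficients of variation: the index of individuality as CVI/CVG; the reference change value (critical difference) as √2 × z × √(CVA² + CVI²); and the number of results needed to estimate a set-point within ±D% as (z × √(CVA² + CVI²) / D)². The sketch below implements these with hypothetical function names; the CVs in the usage lines are placeholders, not TSH estimates from the cited studies.

```python
import math


def index_of_individuality(cv_within, cv_between):
    """Ratio of within-individual to between-individual CV (both in %)."""
    return cv_within / cv_between


def reference_change_value(cv_analytical, cv_within, z=1.96):
    """Percentage difference between two serial results needed for a
    significant change (two-sided, 5% level when z = 1.96)."""
    return math.sqrt(2) * z * math.sqrt(cv_analytical ** 2 + cv_within ** 2)


def results_for_set_point(cv_analytical, cv_within, within_pct, z=1.96):
    """Number of results whose mean lies within +/- within_pct of the
    individual's set-point with about 95% probability."""
    total_cv = math.sqrt(cv_analytical ** 2 + cv_within ** 2)
    return math.ceil((z * total_cv / within_pct) ** 2)


def is_significant_change(first, second, rcv_pct):
    """True if the change from first to second exceeds rcv_pct (in %)."""
    return abs(second - first) / first * 100 > rcv_pct


# Hypothetical CVs (%), not TSH estimates from the cited studies.
cv_a, cv_i, cv_g = 5.0, 15.0, 45.0
rcv = reference_change_value(cv_a, cv_i)
print(f"II = {index_of_individuality(cv_i, cv_g):.2f}, RCV = {rcv:.0f}%")
print(results_for_set_point(cv_a, cv_i, within_pct=5.0))
print(is_significant_change(2.0, 3.1, rcv))
```

The set-point formula grows with the inverse square of the desired precision, which is why estimates of the number of measurements required can run into dozens.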
The effects of assay bias and variation in performance have been highlighted in a paper and an editorial in this journal. 5,31 Their impact, however, is no different from that in any other area of care in which targets and decision levels have been set as a pragmatic, albeit not necessarily scientific, means of guiding the delivery of care to a particular cohort of the population. For the majority of clinicians, the issues of bias and variation are bewildering and can potentially hinder meaningful clinical decisions. TSH is no different: these issues are of little help to anyone advising on the management of a patient who presents with non-specific symptoms and is found to have a borderline TSH concentration. Indeed, the notion of treating each patient on his or her own merits, regardless of TSH concentration, tends to support such an approach. 31
Conclusion
The decision of whether to reduce the upper limit of the TSH reference range should ideally be based on clinical studies that enrol large numbers of disease-free individuals and use standardized TSH assays. Decision cut-offs for diagnosing, and perhaps for considering treatment of, mild hypothyroidism are a separate issue and should not be confused with the definition of reference ranges. Such cut-offs may well be lower than the upper reference limit reported for many assays; recent studies suggest perhaps between 2.5 mU/L and 3.0 mU/L. Issues that currently need clarification are whether TSH concentrations above such cut-offs are associated with long-term morbidity, whether treating such patients improves outcomes or indeed causes harm, and whether serial monitoring of TSH concentrations assists in the diagnosis and management of patients with subclinical hypothyroidism. There are currently inadequate data in these areas, but these are questions of whether to diagnose and treat, rather than of what the reference range should be.
