Abstract
Objective
To assess the value of population screening for adult hypothyroidism.
Setting
Healthy people attending for a general health assessment.
Methods
A thyroid-stimulating hormone (TSH) measurement was performed on people attending for a general health assessment (women aged 50–79 [35–49 with a family history of thyroid disease] and men aged 65–79). Those with TSH levels above 4.0 mU/L were invited to join a randomized double-blind crossover trial of thyroxine and placebo, each given in random order for four months. On entry a second blood sample was collected; its TSH was measured after the end of the trial to determine whether a repeat measurement would help select individuals for thyroxine treatment. The daily thyroxine dose started at 50 µg and if necessary was increased to achieve a TSH level of 0.6–2.0 mU/L.
Results
Of 341 (8%) people with a TSH level above 4.0 mU/L, 110 met the eligibility criteria, 64 agreed to participate, and 56 (49 women, 7 men) completed the trial. Among the 15 individuals with a repeat TSH measurement above 4.5 mU/L, 11 reported feeling better on thyroxine than placebo and none reported feeling better on placebo (P = 0.001; four felt no different), indicating that in this group 73% benefitted (i.e. 11/15; 95% CI 45–92%). The main symptoms relieved were tiredness and loss of memory. There was no indication of harm. In the 41 individuals with a repeat serum TSH of 4.5 mU/L or less, 10 reported feeling better on thyroxine than placebo and 15 better on placebo (P = 0.42; 16 felt no different). Thus about 8% of men and women in the specified age groups had a TSH above 4.0 mU/L, and of these about a quarter had a repeat TSH above 4.5 mU/L, of whom about half would benefit from thyroxine treatment.
Conclusion
The results indicate that screening for hypothyroidism would be worthwhile. Approximately 1% of people screened would have a better quality of life. Pilot screening programmes for adult hypothyroidism are justified.
INTRODUCTION
Hypothyroidism affects about 1–2% of women over 50. 1 The disease is often overlooked because its typical symptoms (lethargy, tiredness, weight gain, forgetfulness) are non-specific and may be considered a natural consequence of ageing. 2 Early detection and treatment may improve quality of life, but the value of screening adults for hypothyroidism remains uncertain. 2–5 Serum thyroid-stimulating hormone (TSH) is the appropriate screening test, because it increases in response to an underactive thyroid before serum thyroxine (T4) decreases, 2–5 but about 8% of women over 50 have an elevated TSH 6 while only 1–2% appear to have a symptomatic response to thyroxine. 1 Symptomatic improvement on thyroxine treatment in people with high TSH is therefore a necessary diagnostic criterion, as well as being the long-term treatment.
Of seven randomized placebo-controlled trials of thyroxine in people with high TSH, three reported symptomatic improvement but four reported none, 7–13 a meta-analysis of the trials found no overall statistically significant effect, 14 and expert panels concluded that the evidence was too limited to judge whether screening is worthwhile. 4,5 We therefore carried out a randomized crossover trial. It was accepted at the outset that the 8% of the population identified as having elevated TSH would not all have hypothyroidism and any benefit from screening would reside in a subset of this group, and that it would be necessary to obtain an indication of the appropriate TSH cut-off, and of the value of identifying those people with a persistently raised TSH by performing a repeat TSH measurement several weeks after the first.
METHODS
The study was conducted among healthy adults attending for a general health assessment at five centres operated by the British United Provident Association (BUPA Wellness), at which TSH is measured routinely in all women aged 50–79, men aged 65–79, and women aged 35–49 with a family history of thyroid disease. 15 These age-sex groups thereby formed the eligibility criteria for the trial. BUPA Wellness used a cut-off of 4.0 mU/L, the 92nd centile, to define ‘high’ TSH. Eligible people with high TSH were sent an invitation letter to join the crossover trial with an information leaflet; those who joined gave written informed consent and were randomly allocated to take thyroxine first or placebo first. The following persons with high TSH were ineligible for the crossover trial: those who did not live or work in or near Greater London, did not speak English, had known thyroid, pituitary, adrenal or cardiovascular disease, or were taking drugs whose serum concentration was affected by thyroxine or which affected the serum concentration of TSH or free T4. 16
Participants started the four-month thyroxine period taking 50 µg per day for one month, increasing the dose monthly to 75 and 100 µg if necessary until the dose was sufficient to attain the target TSH range of 0.6–2.0 mU/L 17,18 (about the 5th to 60th centile). The half-life of thyroxine is seven days, 19 so one month is equivalent to four half-lives, a sufficient period to attain steady state. The four-month duration of the thyroxine period ensured that participants took the necessary dose of thyroxine for at least two months before the assessments at the end of the thyroxine periods, and the monthly monitoring avoided undertreatment and overtreatment. 6,20,21 During the four-month placebo period all participants had blood taken after one month, and some (selected at random) after two or three months, to preserve the blinding. The placebo and thyroxine 50, 75 and 100 µg capsules, manufactured by DHP Ltd, Crickhowell, Powys, UK, were identical in appearance, also to preserve the blinding.
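The steady-state argument above can be checked with a short calculation. This is a minimal sketch that assumes simple first-order accumulation on a constant daily dose (an illustrative assumption, not something the trial reports); the function name is hypothetical.

```python
def fraction_of_steady_state(elapsed_days: float, half_life_days: float) -> float:
    """Fraction of the steady-state level reached on a constant dose,
    assuming first-order kinetics (an assumption made for illustration)."""
    n_half_lives = elapsed_days / half_life_days
    return 1.0 - 0.5 ** n_half_lives

# Four half-lives: 28 days with the 7-day half-life quoted in the text
frac = fraction_of_steady_state(28, 7)
print(f"Fraction of steady state after 28 days: {frac:.4f}")
```

Under this assumption, four half-lives bring the level to within about 6% of steady state, which is consistent with treating one month as a sufficient titration interval.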
At the end of the trial, before breaking the blinding, participants were asked whether they felt better during the first period, better during the second period or no different. This was the main outcome measure; it gave each participant the opportunity to judge whether there was benefit from taking thyroxine over placebo. Participants also completed a simple questionnaire asking whether they felt better with respect to each of 14 common symptoms of hypothyroidism during one period compared with the other. At the end of both periods (thyroxine and placebo) the participants completed questionnaires on quality of life (Short Form Survey; SF-36 version 2) and general psychological wellbeing (General Health Questionnaire; GHQ-30), 22–26 they were examined for physical signs of hypothyroidism, their blood pressure, pulse rate and body weight were measured, and their scores on indices of the clinical features of hypothyroidism, the Zulewski score 27 and the Billewicz index, 28 were calculated. These assessments were performed by one researcher (MA-H) who was blind to the order of the thyroxine and placebo periods, while another researcher who was not blind (MRL) despatched either thyroxine at the appropriate dose (according to the TSH measurement) or placebo to each participant, according to the study period.
Serum samples were collected at every visit. At the end of the trial serum concentrations of TSH were measured on every sample and thyroxine (T4), free T4, tri-iodothyronine (T3), free T3 and thyroid peroxidase (TPO) antibodies were measured on samples collected at the start of the trial and the end of each phase. All measurements were made using time-resolved fluoroimmunoassays (AutoDELFIA®, PerkinElmer, Turku, Finland). The Doctors Laboratory measured total cholesterol, HDL cholesterol and apolipoprotein B on the samples collected at the end of each phase, and serum TSH during dose monitoring.
The statistical analysis used Stata 10. 29 A discordant pairs analysis was used to compare the numbers of participants feeling better on thyroxine and placebo, 30,31 and paired t-tests to compare measurements made at the end of each period.
RESULTS
Figure 1 shows that 8% (341) of the 4365 people attending BUPA Wellness centres during the recruitment period in whom serum TSH was measured had values exceeding 4.0 mU/L, of whom 110 were eligible for the trial. The commonest reason for ineligibility was residence abroad or far from London. Of the 64 participants who joined the trial, 56 completed it (49 women and 7 men, mean age 58, range 35–74 years); eight did not complete it (because of loss of interest in 5, headache in 3). Baseline TSH was in the range 4.1–9.0 mU/L in all participants but one (in whom it was 13), and baseline FT4 was within its reference range in all the participants.

Figure 1. Flow of BUPA Wellness Centre attendees through each stage of the study
Table 1 shows the results of thyroid function tests at the end of each treatment period. Serum TSH fell to within the target range of 0.6–2.0 mU/L while taking thyroxine in all participants. The mean thyroxine dose was 72 µg/day (50 µg in 18, 75 µg in 28, 100 µg in 9, 125 µg in 1). During the four-month treatment period, at the time when each participant had taken his or her maximum thyroxine dose for one month, serum TSH was reduced to 1.31 mU/L on average, and at the end of the treatment period, when participants had taken their maximum thyroxine dose for two or more months, TSH was 1.30 mU/L on average, confirming that TSH responds to an increment in thyroxine dose within the one month dosage titration interval (participants could have been overtreated with thyroxine had this taken longer).
Table 1. Results of thyroid function tests at the end of the placebo and thyroxine periods in the 56 participants who completed the randomized crossover trial
Table 2 shows the numbers of participants who felt better on thyroxine, no different or better on placebo according to the initial serum TSH measurements and the repeat measurements (performed on samples taken a median of 7 weeks after the first). The ratio of the number of participants feeling better on thyroxine to the number feeling better on placebo (the odds) increases with increasing TSH. The use of a repeat measurement improved discrimination. Among 15 participants with an elevated repeat serum TSH (>4.5 mU/L), 11 reported feeling better on thyroxine and none on placebo (P = 0.001). Four felt no different, so 73% (11/15; 95% CI 45–92%) of people with high repeat TSH benefitted from thyroxine. Using higher TSH cut-off values on the repeat test missed most of these 11 cases: only two of them, and three feeling no different, had repeat TSH >6.0 mU/L for example.
Table 2. Numbers of trial participants who felt better on thyroxine, no different and better on placebo according to various cut-off values using one TSH measurement and using two TSH measurements about seven weeks apart
The 41 participants in whom the repeat TSH measurement was 4.5 mU/L or below showed no indication of benefit from thyroxine (10 felt better on thyroxine and 15 on placebo [P = 0.42], while 16 felt no different).
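The interval quoted for the 11 of 15 participants who benefitted can be reproduced with an exact binomial calculation. The sketch below assumes a Clopper-Pearson interval; the text does not name the method used, so this is an assumption, and the helper names are hypothetical.

```python
from math import comb

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def exact_ci(x: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided interval, solved by bisection."""
    def solve(f, target: float) -> float:
        # f(p) decreases as p increases, so bisect on [0, 1]
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: p at which P(X >= x) = alpha/2
    lower = 0.0 if x == 0 else solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
    # Upper limit: p at which P(X <= x) = alpha/2
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p), alpha / 2)
    return lower, upper

lower, upper = exact_ci(11, 15)
print(f"11/15 = {11/15:.0%}, 95% CI {lower:.1%} to {upper:.1%}")
```

The width of the interval reflects the small number of participants in this subgroup rather than any weakness in the comparison with placebo.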
Table 3 shows the results of the 14-item symptoms questionnaire in the 11 participants with repeat TSH above 4.5 mU/L who felt better overall while taking thyroxine than placebo. Tiredness, forgetfulness and paraesthesiae were the symptoms most commonly alleviated. Eight of the 11 participants had one or more symptoms alleviated on thyroxine and none on placebo, and the reverse occurred in no participants. Among these 11 participants physical signs of hypothyroidism (delayed ankle jerk) were detected in six on placebo and one on thyroxine. Hence the Zulewski score, 27 an index of clinical features of hypothyroidism based on seven symptoms out of those in Table 3 and five physical signs, when modified to record improvement in a symptom in one period compared with the other rather than its presence or absence, was significantly lower on thyroxine than placebo (0.3 versus 2.4; P = 0.002), whereas the difference was not significant (P = 0.31) when calculated in the conventional way, comparing scores as if obtained in separate groups of people (Web-table 1). There was no significant change in pulse rate on thyroxine (70) compared with placebo (71).
Table 3. Distribution of changes in reported symptoms among the 11 participants with two high TSH levels (>4.0 and >4.5 mU/L in the initial and repeat samples respectively) who felt better overall on thyroxine, according to symptoms and repeat TSH level (+better on thyroxine, −worse on thyroxine)
TSH, thyroid-stimulating hormone
The within-person standard deviation of serum TSH when not taking thyroxine was 1.7 mU/L, indicating wide random fluctuation over time. In people with usual TSH values that were not particularly high, a single measurement could, therefore, readily have exceeded the trial cut-off value of 4.0 mU/L (2 within-person SDs above a usual value of 2 mU/L on a single occasion would be 5.4 mU/L for example). Such people would have been selected as having ‘high’ TSH but their repeat measurements would tend to be lower and ‘regress to the mean’. In people with usual TSH values that were genuinely high the repeat measurements would tend to remain high. These results therefore account for the fall in serum TSH on repeat measurement in many participants (Table 2) through regression to the mean, and indicate the need to use repeat TSH measurements.
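The arithmetic behind this regression-to-the-mean argument can be sketched as follows. The normal distribution for within-person fluctuation is an illustrative assumption (the text reports only the standard deviation), and the variable names are hypothetical.

```python
from statistics import NormalDist

usual_tsh = 2.0          # mU/L, the example "usual" value used in the text
within_person_sd = 1.7   # mU/L, as reported
cutoff = 4.0             # mU/L, the trial's screening cut-off

# Value two within-person SDs above the usual level
two_sd_above = usual_tsh + 2 * within_person_sd

# Chance that one reading exceeds the cut-off, ASSUMING normally
# distributed fluctuation around the usual value (illustrative only)
single_reading = NormalDist(mu=usual_tsh, sigma=within_person_sd)
p_exceed = 1 - single_reading.cdf(cutoff)

print(f"Two SDs above usual: {two_sd_above:.1f} mU/L")
print(f"P(single reading > {cutoff} mU/L) = {p_exceed:.2f}")
```

Under that assumption a person whose usual TSH is 2 mU/L would exceed the cut-off on roughly 12% of single readings, which illustrates why a repeat measurement separates persistently high values from chance excursions.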
There was a reduction of 0.2 mmol/L in serum total cholesterol on thyroxine in the 56 participants (P = 0.01). Web-table 2 shows that family history of thyroid disease 15 and thyroid peroxidase antibody positivity 18 were less discriminatory than repeat serum TSH in identifying people who respond symptomatically to thyroxine.
The 56 participants who completed the trial, the eight who did not complete it and the 46 people who declined to participate all had similar scores with respect to symptoms of hypothyroidism (tiredness, dry skin, weight gain, constipation, depression) in a health questionnaire conducted at BUPA Wellness before recruitment, indicating that those who completed the trial were no more symptomatic on recruitment than people with high TSH in general, so their results should be generalizable.
DISCUSSION
Our results indicate that screening for hypothyroidism would be worthwhile. Of the 15 participants with high TSH on two successive measurements, 11 reported feeling better on thyroxine and none on placebo. All 11 judged their symptomatic improvement to be sufficient to continue taking thyroxine after the end of the trial. The number of participants on whom our conclusions are based is small, but the discordance of 11 improving on thyroxine and none on placebo provides compelling evidence of efficacy. It illustrates the statistical power of a crossover trial and a discordant pairs analysis. 29,30 If thyroxine had no effect, people who expressed a preference would be equally likely to prefer thyroxine or placebo, so for each such person the probability of reporting a benefit on thyroxine would be ½. For two people it would be ¼ (½ × ½), and for 11 people it would be 11 halves multiplied together, (½)¹¹ or one in 2048; this becomes 1 in 1024 (P = 0.001) using a two-tailed test, which allows for the possibility of a true benefit on placebo. In a discordant pairs analysis the people who feel no difference (4 in our analysis of 11 versus 0) are uninformative for the purposes of assessing whether there is a benefit and so are not included. They are included in the assessment of the size of the population benefit, in this case 11 out of 15 or 73%. Had there been 11 improving on thyroxine and 100 feeling no different, the evidence for a benefit would be just as strong (11 versus 0) but the proportion who benefit would be much smaller (11/111 or 10%).
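The discordant pairs reasoning can be written out as an exact two-sided sign test. This is a minimal sketch with a hypothetical function name, not the Stata routine used in the trial; it is applied to the two subgroups reported in the Results.

```python
from math import comb

def sign_test_two_sided(n_a: int, n_b: int) -> float:
    """Exact two-sided sign test on discordant pairs.
    Participants who felt no different are excluded before calling."""
    n = n_a + n_b
    k = min(n_a, n_b)
    # Probability of a split at least this extreme in one direction
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

p_high = sign_test_two_sided(11, 0)    # repeat TSH above 4.5 mU/L
p_low = sign_test_two_sided(10, 15)    # repeat TSH 4.5 mU/L or below

print(f"11 vs 0:  P = {p_high:.4f}")
print(f"10 vs 15: P = {p_low:.2f}")
```

Because only the discordant pairs enter the test, adding any number of participants who felt no different would leave both P values unchanged while lowering the proportion who benefit.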
Figure 2 is a flow diagram showing the outcome in 100 adults screened for hypothyroidism that would be expected from our results on TSH screening in women aged 50 and over and men 65 and over. Eight percent had TSH levels greater than 4.0 mU/L (Figure 1). From the 56 participants in our trial who represent these 8% (see Table 2), seven had TSH measurements on the basis of family history alone but were under the age cut-off leaving 49 (with or without a family history) who were at or above the age cut-off. Of these 11 (22%) had repeat TSH levels greater than 4.5 mU/L, of whom seven (14%) had symptomatic relief from taking thyroxine compared with taking placebo. The results therefore show that 1.1% (14% of 8%), or approximately 1%, of people screened would be expected to benefit from screening and subsequent thyroxine therapy.
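The expected yield per 100 people screened follows from chaining the proportions above. The sketch below uses the counts given in the text with hypothetical variable names; it works from unrounded proportions, whereas the text multiplies the rounded figures (14% of 8%).

```python
# Counts taken from the text
screened = 4365       # people in whom serum TSH was measured
initial_high = 341    # initial TSH above 4.0 mU/L
age_eligible = 49     # trial completers at or above the age cut-off
repeat_high = 11      # of those, repeat TSH above 4.5 mU/L
benefited = 7         # of those, symptomatic relief on thyroxine

p_initial_high = initial_high / screened
p_repeat_high = repeat_high / age_eligible
p_benefit = benefited / age_eligible

# Expected numbers per 100 people screened
high_per_100 = 100 * p_initial_high
repeat_high_per_100 = high_per_100 * p_repeat_high
yield_per_100 = high_per_100 * p_benefit

print(f"High initial TSH per 100 screened: {high_per_100:.1f}")
print(f"High repeat TSH per 100 screened:  {repeat_high_per_100:.1f}")
print(f"Expected to benefit per 100:       {yield_per_100:.1f}")
```

The calculation treats the 49 age-eligible completers as representative of everyone with a high initial TSH, which is the same assumption the text makes.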

Figure 2. Summary of adult screening for hypothyroidism: expected outcome in 100 people screened (based on Table 2 and second paragraph of Discussion section)
There were significant changes in the Zulewski score 27 (an index of clinical features of hypothyroidism), adapted to take advantage of the crossover design of the trial (allocating points for improvement in each clinical feature, not presence or absence). If a parallel group design had been used, a total of about 600 participants would have been needed to demonstrate the expected difference in the Zulewski score (based on the results in Web-table 1), instead of only about 15 at the same level of significance with the crossover design. In this context a crossover design has a considerable advantage.
A previous randomized crossover trial of population screening also suggested a benefit; eight of 17 participants reported symptomatic improvement while taking thyroxine and one while taking placebo. 7 Six other randomized trials, which reported little or no improvement, 8–13 were of parallel-group design (or in one case, crossover but analysed as a parallel group trial) and consequently lacked statistical power. These six trials necessarily compared the Zulewski score or other indices of clinical features of hypothyroidism or quality-of-life questionnaires at the end of each period since parallel group trials can demonstrate benefit only by quantifying any change in symptoms in the thyroxine group and, separately, in the placebo group, and then comparing the two (as in Web-table 1). In view of the insensitivity of this approach it is not surprising that these analyses did not show a statistically significant improvement, again illustrating the value of the crossover design. It is recognized that crossover trials are more powerful but in detecting non-specific symptomatic improvement the gain is striking.
The results confirm that symptomatic response to thyroxine is a necessary diagnostic criterion of hypothyroidism; serum concentrations of TSH or other markers alone do not accurately identify affected individuals as assessed by a benefit from thyroxine treatment. The ‘treatment’ is thus both diagnostic and therapeutic. The exclusion of the results from the eight participants who did not complete the trial does not introduce selection bias because of the crossover design of the trial whereby each person is their own control. The demonstration of benefit in people with TSH levels between 4.5 and 10 mU/L means that recommendations against treating people with TSH below 10 mU/L and serum T4 within the reference range (generally taken as mean ± 2 standard deviations) may need to be revised. 2,3
In our trial, it was expected that any benefit from screening would be limited to a subset of the participants because the prevalence of undiagnosed hypothyroidism in women over 50 is only 1–2%, 1 and this turned out to be the case. About two-thirds of the 341 people found to have high TSH (see Figure 1) were not eligible to join the trial (mainly because they lived far from London), but their exclusion will not have introduced bias because it could not have influenced the response to treatment. There were too few male participants (n = 7) for the trial to show statistically significant benefit in men but there is no reason to conclude that men would not benefit from screening as well as women, though fewer men stand to benefit because hypothyroidism is about one-fifth as common in men. 15
The study population commonly visit BUPA Wellness every 2–3 years, so that people with more pronounced symptoms or more extreme thyroid function may have been detected and treated previously. This may explain why only one participant had serum TSH above 10 mU/L (who incidentally felt no different on thyroxine), and none had serum T4 below the reference range. The study may therefore have underestimated the benefit from screening; the effect may be greater in practice. While people shown to benefit from thyroxine may, in the absence of screening, present with symptoms at a later date and be treated at that time, the opportunity to prevent troublesome symptoms during the intervening period, which could be several years, would have been lost. Without screening some people may never have their hypothyroidism diagnosed, and some may have symptoms misdiagnosed and be given inappropriate medication, such as antidepressants.
While our trial indicates the value of adult screening for hypothyroidism, pilot screening programmes, preferably adopting a placebo-controlled crossover trial of therapy in each person who is screen-positive, would be the next step to assess feasibility, logistics, acceptability and cost-effectiveness. Such pilot programmes would estimate the costs of screening and thereby the cost per case of hypothyroidism detected, and would be able to show whether the benefits persist for years (though there is no reason to believe that they would not). The majority of people with initial TSH values above 4.0 mU/L and repeat TSH above 4.5 mU/L would be expected to benefit from taking thyroxine. About 1% of women aged 50 and over would be expected to benefit. Although our trial includes women aged under 50 with a family history of thyroid disease, there were only seven of them, too few to judge whether selecting women on the basis of a family history is warranted; other evidence suggests that screening based on family history is unlikely to be worthwhile because the increased risk is small. 18,32
An adult hypothyroidism screening programme would be worthwhile. It would be distinctive in that it aims to improve the quality of life rather than to extend the duration of life. The number of people who stand to benefit would be similar to that in existing adult screening programmes that focus on saving life.
Footnotes
ACKNOWLEDGEMENTS
We thank The BUPA Foundation for their financial support (Grant No. TBF 33a/05).
We also thank Peter Mace, Johann Carinus and other BUPA Wellness staff for their practical support and assistance in conducting the study; John Lazarus, Mark Simmonds and Joan Morris for comments on the manuscript; and Lynne George for laboratory assistance.
