Abstract
Background
A large discordance in the diagnosis and potential management of hypothyroidism using Abbott and Roche thyroid assays has been reported recently. The difference in Abbott and Roche thyroid-stimulating hormone (TSH) results in these studies was larger than anticipated from the external quality assessment (EQA) reports.
Methods
Abbott and Roche TSH method means in UK NEQAS for thyroid hormones distributions 430 to 454 were compared against the amount of TSH spiked. A TSH deplete serum pool was spiked with various concentrations of pooled high TSH serum and 3rd WHO International Standard for TSH (WHO-IS). Four serum pools with TSH close to clinical decision limits were spiked with two concentrations of WHO-IS.
Results
On review of EQA data, median (IQR) Roche: Abbott TSH ratio was lower (p < 0.001) in 48 pools spiked with TSH (1.11 (1.07–1.16)) compared to 41 pools not spiked (1.29 (1.25–1.31)) and the decrease was proportionate to the contribution of spiked TSH to total TSH in the samples (ρ=−0.908, p < 0.001). In spiking experiments, the relationship of Roche and Abbott TSH was different in TSH deplete pool spiked with WHO-IS (RocheTSH=1.13*AbbottTSH–0.52) and high TSH serum (RocheTSH=1.43*AbbottTSH–0.50), respectively. The Roche: Abbott TSH ratio decreased and the method agreement improved on spiking serum pools with WHO-IS.
Conclusion
Abbott and Roche TSH assays are not in harmony in human serum samples but the agreement was better in samples spiked with WHO-IS which contains pituitary-derived TSH. Use of pituitary-derived TSH spiked samples, such as provided by EQA schemes, may mask clinically significant between-assay differences.
Background
The International Federation of Clinical Chemistry (IFCC) Committee for Standardization of Thyroid Function Tests (C-STFT) has been tasked to reduce between-method variability for thyroid assays.1 For TSH, the imminent development of a reference measurement procedure was considered unlikely and, therefore, the C-STFT opted for a regression-based mathematical recalibration approach to harmonisation, using split-sample multiple-method comparison and the all-procedure trimmed mean (APTM) of WHO-traceable assays.2,3 This approach transferred traceability from the WHO International Standard to a serum sample harmonisation panel. Pituitary TSH predominantly has N-acetylgalactosamine-sulfated biantennary glycoforms, whereas circulating human TSH predominantly contains sialylated glycans4 and these different TSH isoforms have different immunoreactivity in assays.5,6 Compared to the traditional International Reference Preparation (IRP) referenced approach, the serum sample harmonisation panel–based recalibration approach is more likely to achieve commutable TSH assays as it includes serum samples and therefore typical glycoforms of serum TSH.2
Both Abbott and Roche TSH assays, employed by approximately 75% of clinical laboratories in the United Kingdom, were standardised against the 2nd International Reference Preparation (IRP) for TSH 80/558.7,8 When stocks of the 2nd IRP for TSH were exhausted, the 3rd WHO International Standard (WHO-IS) for TSH 81/565 was obtained from the same highly purified human cadaver pituitary TSH extract as the 2nd IRP.9 Both Abbott and Roche participated in the C-STFT harmonisation and were two of four assays in the APTM-4 panel used for mathematical recalibration of all the participating TSH assays.10 Recently, however, a striking 56% discordance in the diagnosis and potential management of subclinical hypothyroidism (SCH)11,12 and a 14% difference in the potential management of primary hypothyroidism13 have been reported using Abbott and Roche thyroid assays. These differences were largely due not only to differing TSH results but also to variations in the manufacturer-provided method-specific reference ranges.
Roche TSH results were 1.34 ± 0.14-fold higher than the Abbott TSH results in samples from patients with primary hypothyroidism (n = 100) and subclinical hypothyroidism (n = 93) exchanged between July and September 2020 (Figure 1). The difference was predominantly proportional in the TSH range of 0.4–11.1 mIU/L.11,13 The prevailing external quality assessment (EQA) method mean data (UK NEQAS for thyroid hormones) for distributions 430 to 454, however, showed much smaller between-method differences, especially around the clinical decision thresholds of the upper reference limit and a TSH of 10 mIU/L, despite relatively stable B scores. Most UK NEQAS samples with TSH higher than 2.5 mIU/L had been spiked with 3rd WHO-IS for TSH, SciPak TSH or unspecified TSH.
We, therefore, investigated whether the larger difference between Abbott and Roche TSH in biological samples compared to that reported in the EQA scheme was related to the TSH spike used in the scheme. We re-examined the EQA data and performed experiments comparing Roche and Abbott results in samples spiked with serum TSH and the 3rd WHO-IS for TSH.
Methods
The study consisted of three parts: (a) review of Abbott and Roche TSH method mean and pool description data from the laboratory’s EQA reports, (b) spiking a TSH deplete serum pool with human serum TSH and pituitary-derived TSH (3rd WHO-IS for TSH) to assess the relationship of the Abbott and Roche assays for spiked TSH from different sources and (c) spiking human serum pools with TSH results close to clinical decision thresholds with pituitary-derived TSH to assess the consistency of the relationship observed in (b) in the concentration range of clinical importance.
1. UK NEQAS for thyroid hormones distributions 430 (July 2018) to 454 (September 2020) were reviewed. Each monthly distribution consisted of five samples before November 2019 and three samples thereafter. Data on Abbott and Roche TSH method means were tabulated and pool description data, including the amount of TSH spiked and other manipulations, were collected. Samples with an undeclared quantity of TSH spiked (n = 3), unspecified manipulation (n = 5), added human serum albumin (n = 3) and results below the limit of quantitation of either the Roche or Abbott method (n = 1) were excluded. Most UK NEQAS pools did not specify the source of the spiked TSH, but when specified it was either SciPak TSH or the 3rd WHO-IS for TSH.
2. Spiking experiments, outlined in Figure 2, were performed using a pooled high TSH biological sample and pituitary-derived TSH provided by the 3rd WHO-IS for TSH (NIBSC code 81/565).9 A serum pool was created from patient samples with TSH below the limit of quantitation (0.04 mIU/L for Abbott Architect). This TSH deplete serum was used to reconstitute the WHO-IS for TSH, generating a pituitary TSH sample with an Abbott TSH value of 53.3 mIU/L, which was used for the spiking experiments. A high human TSH pooled serum was produced from four anonymized surplus patient serum samples with Abbott TSH values >50 mIU/L received on the day before the spiking experiment and stored at 2–8°C. The resultant pool had an Abbott TSH value of 70.3 mIU/L and was used for the spiking experiments. The remainder of the TSH deplete pool was divided into two halves. One half was spiked with WHO-IS for TSH to create six samples with TSH concentrations ranging from the limit of quantitation to approximately 30 mIU/L. The other half was spiked with high TSH pooled serum to achieve six samples with comparable concentrations.
3. As outlined in Figure 2, four serum pools from anonymized surplus patient samples were prepared to give TSH results close to the lower reference limit, mid reference range, upper reference limit and 10 mIU/L. Each pool was prepared by combining four to six anonymized surplus patient serum samples received within 2 days before the spiking experiment and stored at 2–8°C following the conclusion of the requested tests. The samples were chosen based on arbitrary inclusion criteria of Abbott TSH of 0.4 ± 0.2 mIU/L, 2.0 ± 0.4 mIU/L, 5.0 ± 1.0 mIU/L and 10.0 ± 2.0 mIU/L, respectively. Each of the four serum pools was spiked with approximately 0, 4 and 8 mIU/L WHO-IS for TSH to create three samples from each pool.
Figure 2. Spiking experiment outline.
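The spiking in part (b) amounts to mixing a low TSH base pool with a high TSH material, so the proportions needed for a given target follow from a simple mass balance. The sketch below is illustrative only: the function name, the target levels and the treatment of the TSH deplete pool as 0 mIU/L are assumptions, the volumes actually used in the study are not reported, and the calculation assumes Abbott TSH values combine additively on mixing.

```python
def spike_volume_fraction(base_tsh, spike_tsh, target_tsh):
    """Return the fraction of the final volume taken up by the spike material.

    Mass balance for mixing two pools:
        target = f * spike + (1 - f) * base
    so  f = (target - base) / (spike - base).
    """
    if spike_tsh <= base_tsh:
        raise ValueError("spike material must have a higher TSH than the base pool")
    if not base_tsh <= target_tsh <= spike_tsh:
        raise ValueError("target TSH must lie between base and spike TSH")
    return (target_tsh - base_tsh) / (spike_tsh - base_tsh)


# Abbott TSH values (mIU/L) reported for the two spike materials.
WHO_IS_SAMPLE = 53.3
HIGH_SERUM_POOL = 70.3

# Assumption: the TSH deplete pool is treated as 0 mIU/L (it measured <0.04 mIU/L).
BASE_POOL = 0.0

# Hypothetical target levels; the study's exact targets and volumes are not stated.
for target in (1.0, 5.0, 10.0, 30.0):
    f_who = spike_volume_fraction(BASE_POOL, WHO_IS_SAMPLE, target)
    f_serum = spike_volume_fraction(BASE_POOL, HIGH_SERUM_POOL, target)
    print(f"target {target:5.1f} mIU/L: WHO-IS fraction {f_who:.3f}, "
          f"high TSH serum fraction {f_serum:.3f}")
```

Because the two spike materials differ in concentration, reaching comparable targets requires different mixing proportions, which is one reason the two series are described as having comparable rather than identical concentrations.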

Each sample described above was mixed well and divided into two aliquots, one each for measurements with Abbott and Roche methods. The spiking experiment was performed in the laboratory with Abbott analytical platform and the aliquots were stored at ambient temperature following spiking. An aliquot from each sample was stored and transported at ambient temperature to a neighbouring laboratory within the pathology network for analysis by the Roche method. All the samples were batch analysed in duplicate for TSH on an Architect i2000 SR (Abbott Laboratories, USA) and a cobas e801 (Roche Diagnostics, Germany) within eight hours of sample preparation. Roche measurements were within three hours of Abbott measurements. Intra-assay coefficient of variation (CV) for Abbott TSH was 1.6%, 1.4% and 1.7% and inter-assay CV was 3.0%, 3.1% and 3.6% at 0.05 mIU/L, 3.5 mIU/L and 18.2 mIU/L, respectively using Technopath Multichem® IA Plus quality control. Intra-assay CV for the Roche TSH was 1.5%, 0.7% and 1.2% and inter-assay CV for the Roche TSH was 2.8%, 2.0% and 2.2% at 0.14 mIU/L, 3.4 mIU/L and 26.7 mIU/L, respectively using Randox IA control. Both assays had satisfactory internal quality control results on the day of the study and satisfactory external quality assurance performance, compared with the method mean, in the month of the study (April 2021).
Data were tabulated in Excel (Microsoft Corporation) and statistical analysis was performed using Excel and IBM SPSS Statistics for Windows, version 26 (IBM Corp., USA). Since the data were not normally distributed (Shapiro–Wilk test), non-parametric tests were used: Spearman rank correlation to measure the degree of association, the Wilcoxon signed-rank test to assess the significance of the difference between paired Abbott and Roche TSH results and the Mann–Whitney U test to compare Roche: Abbott TSH ratios in spiked and non-spiked samples. The threshold for statistical significance was 5%. Data were expressed as medians with interquartile ranges (IQRs). An arbitrary cut-off of ≥10%, as used by the IFCC C-STFT,1 was used to categorise TSH results as different.
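For readers who wish to reproduce this type of analysis outside Excel and SPSS, the following is a minimal standard-library sketch of two of the steps: the Spearman rank correlation (computed as the Pearson correlation of tie-averaged ranks) and the ≥10% difference criterion. It is not the workflow used in the study. The function names are hypothetical, the example numbers are invented purely for illustration, and expressing the percentage difference relative to the Abbott result is an assumption, since the denominator is not stated.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


def differs_by_cutoff(abbott_tsh, roche_tsh, cutoff=0.10):
    """True if the results differ by at least the cutoff.

    Assumption: the difference is expressed relative to the Abbott result.
    """
    return abs(roche_tsh - abbott_tsh) / abbott_tsh >= cutoff


# Invented illustrative data (NOT study data): spike contribution (%) and ratio.
spike_percent = [0, 20, 40, 60, 80]
ratio = [1.30, 1.26, 1.19, 1.12, 1.08]
print(f"rho = {spearman_rho(spike_percent, ratio):.3f}")
print(differs_by_cutoff(10.0, 13.4), differs_by_cutoff(10.0, 10.5))
```

P values for the correlation, and the Wilcoxon signed-rank and Mann–Whitney U tests, are omitted here because they are more safely taken from an established statistics package than re-implemented.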
Results
Review of UK NEQAS data
In the 89 pools studied, the method mean TSH results were higher (p < 0.001) for Roche than for Abbott. The median (IQR) Roche method mean TSH: Abbott method mean TSH ratio in the 48 pools spiked with TSH, 1.11 (1.07–1.16), was lower (p < 0.001) than in the 41 pools not spiked with TSH, 1.29 (1.25–1.31). The Roche method mean TSH: Abbott method mean TSH ratio decreased with increasing TSH spike concentrations (ρ = −0.777, p < 0.001) (Figure 3(a)) and the decrease was proportionate to the percentage contribution of spiked TSH to total TSH in the samples (ρ = −0.908, p < 0.001) (Figure 3(b)). All distributed non-spiked pools had an Abbott and Roche method mean average TSH of ≤2.18 mIU/L, with one exception. The exception was a pooled human serum sample in distribution 445 with an Abbott and Roche method mean average TSH of 46.1 mIU/L (marked by an arrow in Figure 3(a); Roche and Abbott method means of 52.6 mIU/L and 39.6 mIU/L, respectively), which had a Roche method mean TSH: Abbott method mean TSH ratio of 1.33.
Figure 3. (a) Roche and Abbott method mean TSH results in UK NEQAS for Thyroid Hormones samples (n = 89). Unfilled circles, filled rectangles and crosses indicate pools not spiked with TSH, pools spiked with TSH <8 mIU/L and pools spiked with TSH ≥8 mIU/L, respectively. (b) Ratio of Roche method mean TSH to Abbott method mean TSH presented against the contribution of spiked TSH to the total TSH in the same UK NEQAS for Thyroid Hormones samples (n = 89). The average of Abbott method mean TSH and Roche method mean TSH was taken as total TSH in the sample. Note: TSH: thyroid stimulating hormone.
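The two quantities plotted in Figure 3(b) can be derived from each pool's method means and declared spike as sketched below. The function names are hypothetical. The first example uses the values reported above for the non-spiked pool in distribution 445; the second pool is invented for illustration and is not taken from the EQA data.

```python
def roche_abbott_ratio(roche_mean, abbott_mean):
    """Ratio of the Roche method mean to the Abbott method mean."""
    return roche_mean / abbott_mean


def total_tsh(roche_mean, abbott_mean):
    """Total TSH in the pool, taken as the average of the two method means."""
    return (roche_mean + abbott_mean) / 2


def spike_contribution_percent(spiked_tsh, roche_mean, abbott_mean):
    """Declared spike as a percentage of the total TSH in the pool."""
    return 100 * spiked_tsh / total_tsh(roche_mean, abbott_mean)


# Distribution 445 (non-spiked pooled human serum), values in mIU/L.
roche, abbott = 52.6, 39.6
print(f"total TSH = {total_tsh(roche, abbott):.1f} mIU/L")             # 46.1
print(f"Roche: Abbott ratio = {roche_abbott_ratio(roche, abbott):.2f}")  # 1.33
print(f"spike contribution = "
      f"{spike_contribution_percent(0.0, roche, abbott):.0f}%")          # 0%

# Invented pool for illustration only: 8 mIU/L spike, means of 11.0 and 10.0.
print(f"hypothetical pool: "
      f"{spike_contribution_percent(8.0, 11.0, 10.0):.1f}% from spike, "
      f"ratio {roche_abbott_ratio(11.0, 10.0):.2f}")
```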
Spiking experiments
Overall, Roche TSH results were higher than the Abbott TSH results (n = 22; p < 0.001) and the difference was greater in samples spiked with human serum TSH compared to those spiked with WHO-IS for TSH (Figure 4(a)). Roche TSH results were higher than Abbott TSH results in all the serum pools with TSH results close to clinical decision thresholds (n = 4) and in all the samples prepared by spiking TSH <LOQ pooled serum with high TSH serum pool (n = 5), and the median (IQR) ratio of Roche: Abbott TSH was 1.25 (1.18–1.38). Abbott TSH results were higher than Roche TSH results in low TSH pools spiked up to 8 mIU/L with the 3rd WHO-IS for TSH (Figures 4(a) and (b) pool 1), and Roche TSH results were higher in the rest. In four out of five samples generated by spiking TSH <LOQ serum pool with the 3rd WHO-IS for TSH, Abbott and Roche results were within 10%. In all five samples generated by spiking TSH <LOQ serum pool with high TSH pooled serum, Roche TSH was >20% higher than Abbott TSH. The relationship of Roche and Abbott TSH results was different in TSH <LOQ pool spiked with high TSH serum pool (Roche TSH = 1.43*Abbott TSH – 0.50) and 3rd WHO-IS for TSH (Roche TSH = 1.13*Abbott TSH – 0.52) (Figure 4(a)).
Figure 4. (a) Abbott and Roche TSH results in samples spiked with WHO International Standard for TSH compared with samples created from the same base pool with TSH <LOQ but spiked with high TSH pooled serum. (b) Abbott and Roche TSH results in pooled samples closer to clinical decision limits. The connected markers for each pool indicate the base pool spiked at 4 mIU/L and 8 mIU/L with WHO International Standard for TSH and demonstrate improvement in the agreement of the methods when spiked with WHO International Standard for TSH. Note: TSH: thyroid stimulating hormone.
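To make the two regression lines concrete, the sketch below evaluates them at a few Abbott TSH values and reports the implied Roche: Abbott ratio. The coefficients are those given above. The function names and the chosen Abbott values are illustrative assumptions, and the equations describe only the spiked TSH deplete pool over the range tested (up to approximately 30 mIU/L), so they should not be extrapolated to patient samples.

```python
def roche_from_abbott_serum_spike(abbott_tsh):
    """Roche TSH predicted for the pool spiked with high TSH human serum."""
    return 1.43 * abbott_tsh - 0.50


def roche_from_abbott_who_is_spike(abbott_tsh):
    """Roche TSH predicted for the pool spiked with the 3rd WHO-IS for TSH."""
    return 1.13 * abbott_tsh - 0.52


# Illustrative Abbott TSH values (mIU/L) within the range that was tested.
for abbott in (5.0, 10.0, 20.0):
    serum = roche_from_abbott_serum_spike(abbott)
    who_is = roche_from_abbott_who_is_spike(abbott)
    print(f"Abbott {abbott:4.1f}: serum spike -> Roche {serum:5.2f} "
          f"(ratio {serum / abbott:.2f}); WHO-IS spike -> Roche {who_is:5.2f} "
          f"(ratio {who_is / abbott:.2f})")
```

At an Abbott TSH of 10 mIU/L, for example, the serum-spike line gives a Roche value of about 13.8 mIU/L whereas the WHO-IS line gives about 10.8 mIU/L, which illustrates how the choice of spike material changes the apparent between-method difference at a decision threshold.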
In four serum pools with TSH results near to clinical decision thresholds, baseline Roche TSH results were higher than Abbott TSH (Figure 4(b)). After spiking with the 3rd WHO-IS for TSH, the difference between Roche and Abbott TSH decreased across the tested range irrespective of starting TSH. The ratio of Roche: Abbott TSH correlated negatively with the 3rd WHO-IS spike (ρ= −0.621, p < 0.001).
Discussion
The Roche TSH results were significantly higher than Abbott TSH results in pooled patient serum samples and samples spiked with high TSH patient serum (Figures 1 and 4). Compared to samples spiked with human serum, the difference between Roche and Abbott TSH results was much smaller in samples spiked with WHO-IS for TSH (Figures 3 and 4).
In the patient samples, between-assay differences may lead to clinically significant differences in the diagnosis and management of hypothyroidism since current manufacturer provided assay-specific reference intervals do not account for these assay differences.11,13 EQA samples spiked with pituitary-derived TSH (3rd WHO-IS for TSH, SciPak TSH or unspecified source), however, did not detect these large differences evident in the patient samples and samples spiked with human serum TSH and may provide false reassurance about comparative assay performance. It is noteworthy that even when method differences are accounted for by method-specific reference intervals, clinical management differences around the universal TSH decision threshold of 10 mIU/L may still prevail as long as the methods remain non-harmonised.11,14
The between-assay differences in samples spiked with TSH from pooled human serum and the 3rd WHO-IS reported in this study are likely to be due to differing immunoreactivity of different TSH isoforms present in circulating and pituitary TSH. A C-STFT study concluded that modern TSH assays measured serum TSH in an equimolar fashion irrespective of glycosylation differences due to thyroid disorder–specific glycoforms of serum TSH and, therefore, are ‘glycosylation blind’. Our data, however, indicate that at least one of the Abbott or Roche TSH assays is not ‘blind’ to the pituitary TSH isoforms when compared to circulating TSH. The noted differences may, however, be due to glycosylation differences or other isoforms of TSH.5,6
Abbott and Roche participated in C-STFT TSH harmonisation.10 Our previous data11,13 and the results of this study indicate that Abbott and Roche TSH assays are currently not in harmony. The reason for this is unknown, but it could be due to long-term assay stability issues with either or both assays or to non-implementation of harmonisation in routine production by the manufacturers. Both manufacturer package inserts state standardisation against the 2nd IRP for TSH but neither mentions further harmonisation.7,8
In the review of UK NEQAS data, NEQAS samples were spiked with either the 3rd WHO-IS for TSH, SciPak TSH or the source of the TSH was not specified in the pool description. SciPak TSH is no longer available for procurement and therefore our study was limited to comparing the Abbott and Roche TSH relationship using human serum TSH and the 3rd WHO-IS for TSH. The relationship may differ depending not only on the TSH source but also on the preparation method.
This study demonstrates better agreement of Abbott and Roche TSH assays in samples spiked with pituitary origin TSH compared to those spiked with human serum TSH. Laboratories should be aware that the presence or absence of method differences in manipulated EQA samples may not be replicated in biological samples. This is often classed as a matrix effect; however, in this study, it is likely to be an analyte effect due to varying cross-reactivity of different TSH glycoforms or isoforms. EQA providers should limit reliance on manipulated samples spiked with non-serum TSH and consider distributing pooled human serum samples or samples spiked with human serum TSH, especially around clinical decision cut-off values.
Footnotes
Acknowledgements
The authors thank Lauren Hughes for help in sourcing WHO International Standard for TSH.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Ethical approval
The study was an institution-approved service evaluation (project ID 5441) with the Royal Wolverhampton NHS Trust.
Guarantor
TK.
Contributorship
TK conceived the study. TK and JF designed the study with input from HSC and AS. JF did laboratory experiments and compiled data. TK analysed the data and wrote the first draft of the manuscript. All the authors contributed to data interpretation, critically reviewed the manuscript and approved the final version.
Data availability statement
Data are available from the corresponding author on request.
