Abstract
True and precise routine measurements of quantities of clinical interest are essential if results are to be optimally interpreted for patient care. Additionally, results produced by different measurement procedures for the same measurand must be comparable if common diagnostic decision values and clinical research findings are to be broadly applied. Metrology, the science of measurement, provides laboratory medicine with a structured approach to the development and terminology of reference measurement systems which, when implemented, improve the accuracy and comparability of patients' results. The metrological approach is underpinned by the concepts of common measurement units, traceability of measured values, measurement uncertainty and commutability. Where traceability to the International System of Units (SI units) is not yet realized for a measurand, result comparability may be achievable by other, less ideal, approaches. Measurements are the core activity of clinical laboratories, and clinical biochemists should ensure that patients' results are traceable to the highest available reference. This review introduces and illustrates the principles of metrological traceability, describes its critical importance to improving the quality of patients' results and highlights the need to actively promote traceability in clinical laboratories.
Introduction
Clinical biochemistry measurement results are used for patient diagnosis and management, medication compliance, population screening, clinical trials, research, wellness testing and other purposes. Patients' measurement results for a given measurand must therefore be reliably comparable over time, and with clinical, legal, sporting and other types of decision values. Such uses require measurement procedures to be analytically specific and produce true and precise results. This goal is often confounded in practice because different measurement procedures for the same analyte are commonly in routine use, and the comparability of their results is often clinically unacceptable. Additionally, clinical decision values for a measurand are often recommended by international expert groups without consideration of the different measurement procedures being used by clinical laboratories. It is therefore essential that patients' results for a measurand be comparable, irrespective of how or where they are produced. Improvement of result comparability lowers health-care costs; for example, implementation of primary reference materials and reference procedures for the measurement of serum cholesterol in the United States reduced result variability from 18% to <5%, with estimated savings in health-care costs in excess of US$100 m (£65 m) per annum due to reduced diagnostic mis-classification. 1 Lack of result comparability between different measurement procedures for the same analyte also limits the transfer to clinical practice of decision values recommended by an expert group or clinical research study because they are only applicable to patients' results produced by the measurement procedure on which the recommendations were based.
Advances in automation, quality management and laboratory accreditation have improved the overall quality of clinical laboratory testing, but comparison studies, inter-laboratory round-robins and external quality assessment (EQA) programmes continue to show that different measurement procedures for the same measurand often still lack clinically acceptable comparability of results. A major challenge for clinical laboratories and in vitro diagnostic (IVD) medical device manufacturers is to ‘close the gap’.
In 1979, Tietz 2 advocated that achieving inter-laboratory comparability of clinical chemistry measurements required implementation of a ‘comprehensive, coherent measurement system’, and described a hierarchical model comprising definitive methods with negligible systematic error, reference methods and materials, with SI the preferred measurement units. Twenty-eight analytes were identified as having definitive methods, and 25 had reference methods either in place or in development. Fifteen years later, frustration at the lack of significant progress towards implementation of the proposed reference system was captured in the title ‘Accuracy in Clinical Chemistry – Does Anybody Care?’, in which Tietz 3 identified that the accuracy of many routine laboratory methods had declined as use of faster, automated methods and instrumentation increased. Since Tietz's cri de coeur there has been significant progress with both the theory and the practice of implementing a coherent reference system for measurements in clinical laboratories.
Basic components of a measurement
Quantities
Substances, objects, bodies or phenomena possess properties such as mass, length, colour, gender and name, some of which are measurable (e.g. mass, length), and some of which are not (e.g. blue, male). A measurable property in metrological terminology is a quantity, and a measurement is a comparison of the quantity of interest with a measurement unit (e.g. centimetre). The ratio of the magnitude of the quantity to the measurement unit is the quantity value (e.g. 2.5 cm). The description of a measurement result should include the name of the system (e.g. steel rod), the quantity measured (e.g. length), the kind-of-quantity (e.g. diameter), the quantity value and the measurement unit (e.g. centimetre). Because a steel rod diameter may depend on its temperature, the description should also record the temperature at which the measurement was made.
Measurands
A measurand is ‘the quantity intended to be measured’ (VIM 2.3). 4 The minimal description of a measurand identifies the system (e.g. serum, plasma), the component (analyte) of interest (e.g. total calcium) and the kind-of-quantity (e.g. amount-of-substance concentration); for example, amount-of-substance concentration of total calcium in plasma; measurement unit: mmol/L. The measurand may also depend on the specific measurement procedure used for its measurement (see later in this subsection). The phrase ‘intended to be measured’ recognizes that biological quantities are generally not amenable to direct measurement by routine procedures, and therefore must be indirectly estimated via measurement of other properties, sometimes referred to as ‘surrogate’ quantities. For example, the amount-of-substance concentration of calcium in serum is not directly measurable (i.e. as the number of calcium atoms/L), so another, directly measurable, property of calcium is used, e.g. the fluorescence intensity produced when calcium complexes with calcein, or the absorption of light by calcium atoms in atomic absorption spectrometry. The measurand is the same for both measurement methods, but the quantity actually measured (surrogate) is different. The relationship between the ‘surrogate’ quantity and the quantity intended to be measured (measurand) is established by calibration of the measuring system using a calibrator with a metrologically traceable assigned value for the measurand, followed by application of the appropriate measurement equation. In contrast, an example of a directly measured measurand is a count of white blood cells in urine using a microscope. The term ‘intended to be measured’ also recognizes that measurement conditions may alter the quantity being measured from that defined by the measurand, for example a small temperature change when measuring the catalytic activity concentration of an enzyme.
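The calibration step described above can be sketched in a few lines. This is a minimal illustration only, assuming a linear relationship between the surrogate signal and the measurand; the calibrator values, the signals and the function names are hypothetical and are not taken from any cited procedure.

```python
def fit_linear_calibration(assigned_values, signals):
    """Least-squares fit of: signal = slope * value + intercept."""
    n = len(assigned_values)
    mean_x = sum(assigned_values) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in assigned_values)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(assigned_values, signals))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def measurand_from_signal(signal, slope, intercept):
    """Measurement equation: invert the calibration line."""
    return (signal - intercept) / slope


# Hypothetical calibrators: assigned total calcium values (mmol/L)
# and the surrogate signal each one produced (arbitrary units).
calibrator_values = [1.0, 2.0, 3.0]
calibrator_signals = [105.0, 205.0, 305.0]

slope, intercept = fit_linear_calibration(calibrator_values, calibrator_signals)

# Convert a patient sample's surrogate signal into a calcium concentration.
patient_calcium = measurand_from_signal(255.0, slope, intercept)
```

The patient result inherits whatever traceability the calibrator values carry, which is why the assigned values, not the signals, are the metrologically important input.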
The unequivocal identification of a measurand of clinical interest is straightforward if it is a chemically well-characterized entity of known molecular structure and weight, for example urea or sodium. However, there can be difficulty with complex molecules such as peptides and proteins, which are often structurally heterogeneous due to post-translational modifications, e.g. glycosylation, sialylation, sulphation, complexes and catabolic fragments. Such structural complexity can mean inadequate knowledge as to whether one, several or all forms of an ‘analyte’ are of clinical interest. An example is human chorionic gonadotrophin (hCG), for which seven quantitatively significant isoforms have been characterized (intact and nicked hCG, α and β subunits, nicked β subunit, β core fragment and hyperglycosylated hCG). Their relative concentrations can differ markedly, depending on the clinical condition (pregnancy stage, gestational trophoblastic disease, hCG-secreting malignancy) and fluid examined (serum, urine); ideally, the clinically most useful hCG isoform(s) for each condition would be selectable for specific measurement.
Structural heterogeneity can also cause lack of analytical specificity. While immunoassays exploit the specificity of a monoclonal antibody for a selected epitope, the epitope may be present and reactive in some or all isoforms of the target protein or peptide, so that a measured quantity may relate to a mix of structurally related entities of unknown relative proportions. A further problem occurs if isoforms are present in significantly variable amounts in individual patients. Result comparability is also hindered because different commercial measurement procedures for the same proteins and peptides often employ antibodies directed against different epitopes, so that the analytical specificity and reactivity for the various isoforms of an analyte can differ significantly between the different measurement procedures, e.g. for hCG. 5 In such cases, the measurement procedure determines what is actually measured, so that identification of the procedure used is an essential part of the definition of the measurand.
Measurement units
Ancient Egyptians used the cubit, the distance from elbow to fingertips, as a measurement unit for length, others being thumb width, hand span, foot length and stride. Variability of body size meant measurements were unreliable for trading purposes, and therefore an attempt at measurement standardization introduced the royal cubit, a black granite rod against which other cubits were compared. Measurement units in late 18th century France differed ‘not only in every province, but in every district and almost every town’, 6 while in 1901 eight different standard gallons were in use in the USA. 1 As trade and science expanded during the 19th century, the use of different measurement units for the same quantity was increasingly recognized as a costly barrier to international commerce and scientific communication.
The need to broadly implement a coherent set of measurement units was formally recognized in 1871 when an International Conference established the International Bureau of Weights and Measures (Bureau International des Poids et Mesures, [BIPM]), followed in 1875 by 17 nations signing a diplomatic treaty, the Convention of the Metre. In 1889, the metre was defined as the distance between two lines on a platinum–iridium bar, and the unit of mass, the kilogram, was defined as equal to a platinum–iridium artefact, termed the international prototype of the kilogram. Both standards were maintained at BIPM and copies sent to international signatories. In 1971, the SI base quantity for chemistry, amount-of-substance, with mole as measurement unit, was introduced. The broad implementation of SI units has greatly facilitated international trade and scientific exchange. The ongoing role of BIPM is to provide the basis for a single, coherent system of measurements traceable to the SI, and to ensure equivalence of national measurement standards, including those used in laboratory medicine.
SI measurement units in clinical biochemistry
The SI base and derived quantities and units, with appropriate decimal multiples and submultiples, are widely used in clinical biochemistry (Table 1). The mole is the amount-of-substance of a system which contains as many elementary entities as there are atoms in 0.012 kg of carbon 12. The elementary entities must be specified, and may be atoms, molecules, ions, electrons, etc. 7 Non-SI units accepted for use with the SI system include minute (min), hour (h), day (d) and litre (L). To be of practical value the theoretical definitions of SI units must be physically created (realized) and easily transferable for global applications in industry, science and medicine.
Table 1. SI quantities and units relevant for clinical biochemistry
Realization and transfer of an SI base unit
The SI base unit of mass, the kilogram (kg), is realized by the international prototype platinum–iridium cylinder held at the BIPM, and is transferable to National Metrology Institutes (NMIs) worldwide using stainless steel copies (Figure 1). An NMI can use its kg copy to produce and calibrate national mass standards for use by mass calibration reference laboratories. In turn, manufacturers can calibrate their scientific and other weighing instrumentation through comparison with the reference laboratory mass standards. The SI definition of the kilogram is therefore realized and transferred in a series of comparison steps (calibrations) from the international prototype kg to end-users. An unbroken sequence of measurement standards and calibrations (calibration hierarchy) linking, for example, the result of a mass measurement by an end-user to the reference (prototype kg) is termed a metrological traceability chain. A measured value produced through such a calibration hierarchy has the property of metrological traceability, defined as a ‘property of a measurement result whereby the result can be related to a reference through a documented unbroken chain of calibrations, each contributing to the measurement uncertainty’. 4

Figure 1. Calibration hierarchy for a mass measurement metrologically traceable to the SI unit (kg). MU, measurement uncertainty.
The purpose of a calibration hierarchy is to transfer the trueness of a realized measurement unit for a quantity (reference) to the values produced by a routine measurement procedure for the same quantity. If results for a measured quantity (e.g. mass) produced by different measurement procedures are metrologically traceable to the same measurement unit, and the measured values have estimated measurement uncertainties, then the values will be comparable. Ideally, measurement results should be metrologically traceable to an SI unit, but if SI traceability is unavailable other references can be used, provided they are well defined and formally adopted (conventional measurement units; e.g. hour).
An unavoidable consequence of a calibration is that some error is introduced to the quantity value being transferred, the error accumulating with each calibration in the hierarchy. Therefore a critical component of metrological traceability is an estimate of the measurement uncertainty (MU) introduced by each calibration step.
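The accumulation of uncertainty down a calibration hierarchy can be illustrated with a short sketch. It assumes that the calibration steps are independent and that their relative standard uncertainties combine in quadrature (root sum of squares), which is the usual convention but is an assumption here rather than something stated in the text. The step names and values are hypothetical.

```python
import math


def combined_relative_uncertainty(step_uncertainties):
    """Root-sum-of-squares combination of independent relative uncertainties."""
    return math.sqrt(sum(u ** 2 for u in step_uncertainties))


# Hypothetical hierarchy: (calibration step, relative standard uncertainty in %).
chain = [
    ("NMI copy compared with the reference", 0.3),
    ("reference laboratory standard compared with the NMI copy", 0.4),
    ("end-user instrument compared with the reference laboratory standard", 1.2),
]

# Uncertainty carried by the value after each successive calibration.
cumulative = []
for position in range(1, len(chain) + 1):
    steps_so_far = [u for _, u in chain[:position]]
    cumulative.append(combined_relative_uncertainty(steps_so_far))

total_uncertainty = cumulative[-1]
```

Because the terms are squared before summing, the largest single step dominates the total, so effort is best spent on the least precise calibration in the chain.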
Measurement uncertainty
All types of measurement are inherently inexact, regardless of scientific field and procedure. Even replicate measurements performed with conditions kept as constant as possible generally yield different values if the measuring system is sufficiently sensitive. A measurement result is therefore an estimate of the true value of the measurand, and it follows that if a true value cannot be exactly known, the magnitude of measurement error cannot be exactly quantified. To address this problem the concept of MU has been developed. 8–11 The MU approach assumes that significant bias from the reference has been eliminated from the measured value, then calculates an interval of values within which the true value of the measured quantity is believed to lie, with a stated level of confidence, e.g. value = x ± y, ≈95% probability. In clinical laboratories, the interval is usually estimated from the dispersion of values obtained when the same measurement is repeated a number of times (e.g. quality control [QC] materials), generally under intermediate conditions. Standard MU (u) is typically expressed as one standard deviation (SD) of the repeated measured values. A single measured value is considered the best available estimate of a true value, and the dispersion of other possible values (i.e. if the measurement were repeated), usually as an expanded MU, is centred on that (Figure 2). Calculation of the MU associated with each calibration step in a calibration hierarchy is an essential component of establishing the metrological traceability of a measurement result.

Figure 2. Expanded measurement uncertainty (U) of a single measured value of an amount-of-substance concentration of x
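The estimation of standard and expanded MU from repeated QC measurements, as described in this section, can be sketched as follows. The QC values are hypothetical, and the coverage factor k = 2 used to approximate a 95% interval is a common convention assumed here, not a value given in the text.

```python
import statistics


def standard_uncertainty(qc_values):
    """Standard MU (u): one sample SD of the repeated measured values."""
    return statistics.stdev(qc_values)


def expanded_uncertainty(u, k=2.0):
    """Expanded MU (U = k * u); k = 2 is assumed for roughly 95% coverage."""
    return k * u


# Hypothetical QC results for plasma total calcium (mmol/L).
qc_values = [2.48, 2.52, 2.50, 2.47, 2.53]

u = standard_uncertainty(qc_values)
U = expanded_uncertainty(u)

# A single patient result is reported as value +/- U.
patient_result = 2.50
interval = (patient_result - U, patient_result + U)
```

This captures only the imprecision of the routine procedure; the uncertainty of the calibrator's assigned value would need to be combined with it to give the full MU of a traceable result.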
Metrological traceability in clinical biochemistry
Clinical need
A 2003 study compared the certified values of 20 measurands commonly measured in human serum with the values obtained by approximately 1000 clinical biochemistry laboratories. Clinically significant differences were found for all measurands. For example, the values reported for serum total calcium and iron concentrations varied by up to ±15% from their target values, albumin varied by −20% to +30%, cholesterol by −10% to +15%, creatinine by −20% to +50%, γ-glutamyltransferase by −60% to +30%, urea by −40% to +35% and thyroxine by −30% to +20%. 12 Such variability impacts on clinical management and health-care costs. Another study estimated that a positive bias of 0.125 mmol/L in a plasma total calcium procedure used to screen for hypercalcaemia would increase patient care costs in the United States by up to US$199 m. 13 A further study estimated that a positive bias of 3% in cholesterol results would cause an approximate 9% increase in patients identified as hypercholesterolaemic, while a 10% positive bias would increase misclassified patients by about 28%. 14
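The way a proportional bias inflates the number of patients classified above a decision value can be illustrated with a simple model. This is a sketch under stated assumptions: true values are taken to be normally distributed, measurement imprecision is ignored, and the population mean, SD and cut-off below are hypothetical. It is not intended to reproduce the figures of the cited studies.

```python
from statistics import NormalDist


def fraction_above_cutoff(mean, sd, cutoff, relative_bias=0.0):
    """Fraction of a population reported above a cut-off.

    A proportional bias multiplies every true value by (1 + bias), so a
    reported value exceeds the cut-off when the true value exceeds
    cutoff / (1 + bias).
    """
    effective_cutoff = cutoff / (1.0 + relative_bias)
    return 1.0 - NormalDist(mean, sd).cdf(effective_cutoff)


# Hypothetical population of serum cholesterol values (mmol/L).
POPULATION_MEAN = 5.0
POPULATION_SD = 1.0
CUTOFF = 6.2

unbiased = fraction_above_cutoff(POPULATION_MEAN, POPULATION_SD, CUTOFF)
bias_3_percent = fraction_above_cutoff(POPULATION_MEAN, POPULATION_SD, CUTOFF, 0.03)
bias_10_percent = fraction_above_cutoff(POPULATION_MEAN, POPULATION_SD, CUTOFF, 0.10)

# Relative increase in patients flagged, compared with an unbiased procedure.
relative_increase_3 = bias_3_percent / unbiased - 1.0
relative_increase_10 = bias_10_percent / unbiased - 1.0
```

The relative increase depends strongly on how close the cut-off lies to the bulk of the population, which is why the impact of a given bias differs between measurands.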
When a clinical decision value for a measurand is recommended by an international expert group without consideration of result comparability between the different measurement procedures in routine use, then clinical decisions for a patient may differ significantly, depending on the measurement procedure used. For example, measurement of serum total prostate-specific antigen (PSA) concentrations with a clinical decision value of >4.0 μg/L for patients <60 years old is widely used to assist patient selection for prostate biopsy. 15 The cut-off value was determined from a large clinical study using a measurement procedure with a calibrator PSA concentration value assigned by the manufacturer (Hybritech®, San Diego, CA, USA) based on total protein content. However, most currently available PSA procedures now have their calibrator values assigned using a World Health Organization (WHO) international reference material (WHO 96/670) with a PSA value based on amino acid analysis and mass spectrometry (MS). A 2004 study of 2304 patients compared PSA results obtained by a procedure using the Hybritech calibrator with those from another commercial procedure calibrated with the WHO standard. Of 288 patients with PSA > 2.5 μg/L by both assays, 55 (19%) would have exceeded the clinical decision value of 4.0 μg/L based on the result produced by the Hybritech assay, but would not have been candidates for prostate biopsy if their WHO-calibrated procedure values had been used. 16
Another study measured PSA concentrations in serum from 106 men (PSA interval 0.1–9.1 μg/L) using a measurement procedure with the Hybritech calibrator, and then re-measured with the same measurement procedure calibrated using WHO 96/670. The values produced by the two different calibrations showed a linear relationship, with the WHO-calibrated values approximately 20% lower. This relationship was then used to convert to WHO-based results the values obtained for 5865 men using the Hybritech-calibrated procedure. Not surprisingly, the study found that if the decision value of 4.0 μg/L from the original study was used to interpret the calculated WHO-calibrated values, the relative biopsy rate and cancer detection rates would have been reduced by 19% and 20%, respectively. 17 Despite this finding, many laboratories continue to report patients' PSA results using a WHO-calibrated procedure but with age-related decision values based on the original study.
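The arithmetic behind this calibration mismatch can be made explicit. The sketch below assumes the linear relationship reduces to a simple proportional factor of 0.80 with zero intercept, which is an approximation of the 'approximately 20% lower' finding and not the study's fitted equation. The patient value is hypothetical, and the rescaled cut-off is shown only to illustrate the mismatch, not as a recommended decision value.

```python
# Assumed proportional relationship: WHO-calibrated = 0.80 * Hybritech-calibrated.
WHO_PER_HYBRITECH = 0.80

# Decision value from the original study, valid for Hybritech calibration (ug/L).
HYBRITECH_CUTOFF = 4.0


def to_who_calibrated(hybritech_value):
    """Convert a Hybritech-calibrated PSA value to the WHO-calibrated scale."""
    return hybritech_value * WHO_PER_HYBRITECH


# The original cut-off expressed on the WHO-calibrated scale.
who_scale_cutoff = to_who_calibrated(HYBRITECH_CUTOFF)

# Hypothetical patient whose Hybritech-calibrated result would be 4.5 ug/L.
hybritech_result = 4.5
who_result = to_who_calibrated(hybritech_result)

flagged_on_hybritech_scale = hybritech_result > HYBRITECH_CUTOFF
# Mismatch: WHO-calibrated result interpreted against the unadjusted cut-off.
flagged_with_mismatched_cutoff = who_result > HYBRITECH_CUTOFF
# Consistent: result and cut-off both on the WHO-calibrated scale.
flagged_with_rescaled_cutoff = who_result > who_scale_cutoff
```

The same patient is flagged or not flagged depending solely on whether the decision value and the result share a calibration, which is the comparability problem described above.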
There is much evidence that poor result comparability is detrimental to the diagnosis and management of patients, while expert groups often continue to recommend clinical decision values for measurands that lack adequate comparability of results across procedures in use, e.g. serum thyroid stimulating hormone (TSH) concentrations in managing primary hypothyroidism in pregnancy. 18 Ensuring calibrators of different measurement procedures for a given measurand are standardized to the same reference through the implementation of metrological traceability is essential to improving result comparability, and will also facilitate the use of common reference intervals and implementation of clinical research findings and evidence-based laboratory medicine. Achieving metrological traceability of patients' measurement results to the highest available reference will lead to improved patient care and reduced health-care costs, and is therefore a vital task for clinical biochemists to vigorously undertake.
A major driver for improving metrological traceability of measurement results was the European Union (EU) Directive for IVD medical devices enacted in 1998, which stated ‘The traceability of values assigned to calibrators and/or control materials must be assured through available reference measurement procedures and/or available reference materials of a higher order’. 19 Commercial IVD measurement procedures failing to comply after 2003 could not be marketed within the EU.
Requirements for metrological traceability
Establishing metrological traceability to an SI unit for physical measurements such as mass or length is generally straightforward because the quantities of interest are well defined and can usually be directly measured by specific methods. In contrast, biological quantities of clinical interest are often difficult to unequivocally define in terms of molecular structure and weight, usually reside in a complex matrix such as serum and, apart from those requiring cell counting or sizing, are not amenable to direct measurement by routine procedures.
If an analyte of clinical interest is of known molecular or atomic weight and obtainable in highly pure form, it can generally be gravimetrically prepared as a solution of known amount-of-substance concentration with a small MU. Such a solution is an ideal primary calibrator for a reference measurement procedure because the measurand value embodies the SI measurement unit, i.e. directly traceable to the mole. However, for patients' measured values of a measurand to be metrologically traceable to an SI unit, the trueness of the primary calibrator must be transferred, with the smallest achievable MU, to the quantity value assigned to the calibrator of the routine laboratory measurement procedure. Direct transfer is not possible because routine measurement procedures are designed for use with biological samples, and therefore require calibrators to be matrix-matched to patient samples. The transfer of trueness from primary calibrators to routine calibrators is therefore achieved by a reference measurement system comprising a sequence of alternating reference materials and reference measurement procedures. The ideal aim of a metrological traceability chain is that measurement results produced by a routine measurement procedure are the same as if the quantity in the patient samples had been measured by the reference measurement procedure.
The general principles and features of reference measurement systems for establishing metrological traceability are described in the international standard, International Organization for Standardization (ISO) 17511:2003, 20 and in several authoritative guidelines and reviews. 21–24
Components of a reference measurement system
1. Primary reference materials
A primary reference material is a preparation of known composition containing the analyte of interest. The analyte must be physico-chemically fully characterized, with a known molecular or atomic weight. The material has a known purity with an associated uncertainty, generally determined by measurement of trace contaminants and moisture content, e.g. crystalline creatinine. Primary reference materials are prepared by NMIs and accredited reference measurement laboratories recognized for the purpose, 25 such as the National Institute of Standards and Technology (NIST) and the Institute for Reference Materials and Measurements (IRMM), which issue certificates of analysis (Standard Reference Material, SRM®; Certified Reference Material, CRM, respectively) if the material meets stringent technical and documentation specifications, including traceability of the assigned quantity value to an SI unit, purity, stability and homogeneity. 26,27 Available primary reference materials are listed on the website of the Joint Committee for Traceability in Laboratory Medicine (JCTLM). 28 A primary reference material can be used with a primary reference measurement procedure to prepare primary calibrators with assigned quantity values that are directly traceable to an SI unit.
2. Primary reference measurement procedures
Primary reference measurement procedures occupy the highest metrological order, and are defined as a ‘reference measurement procedure used to obtain a measurement result without relation to a measurement standard for a quantity of the same kind’ (VIM 2.8). 4
A primary reference procedure:
Can be described by a measurement function and complete uncertainty budget in terms of SI units;
Produces quantity values metrologically traceable to an SI unit without use of a calibrator for the same quantity that is measured by procedures lower in a calibration hierarchy;
Has a small relative MU;
Is endorsed by an authoritative NMI or international scientific body to facilitate universal acceptance.
Primary reference measurement procedures, e.g. gravimetry, are used by NMIs and accredited reference measurement laboratories to prepare primary calibrators that have quantity values and associated uncertainties that are directly traceable to SI units, e.g. mole. For example, if the analyte of interest is obtainable as a primary reference material, an appropriately sensitive and accurate scientific balance and high-grade calibrated volumetric flasks can be used to gravimetrically prepare one or more solutions of known mass concentration, with a small uncertainty. Because the analyte has a known molecular structure and weight, the molar concentration can be calculated, and therefore the quantity value assigned to the primary calibrator is directly traceable to the SI unit. An example is the gravimetric preparation of mixtures of pure HbA0 and HbA1c as primary calibrators. 29 If a primary reference material is unavailable, a primary calibrator cannot be prepared.
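The conversion from a weighed mass to an amount-of-substance concentration can be written out as a short calculation. This is a simplified sketch: the weighed mass, purity and flask volume are hypothetical, the molar mass of creatinine is taken as 113.12 g/mol, and corrections that a reference laboratory would apply (for example for air buoyancy and temperature) are omitted.

```python
CREATININE_MOLAR_MASS_G_PER_MOL = 113.12


def molar_concentration_mmol_per_l(mass_g, purity, molar_mass_g_per_mol, volume_l):
    """Amount-of-substance concentration of a gravimetrically prepared solution.

    mass_g:   weighed mass of the primary reference material
    purity:   certified mass fraction of the analyte (0 to 1)
    volume_l: final volume of the solution
    """
    amount_mol = mass_g * purity / molar_mass_g_per_mol
    return amount_mol / volume_l * 1000.0


# Hypothetical preparation: 0.11312 g of material with a certified purity
# of 0.999, dissolved and made up to 1.000 L.
calibrator_concentration = molar_concentration_mmol_per_l(
    mass_g=0.11312,
    purity=0.999,
    molar_mass_g_per_mol=CREATININE_MOLAR_MASS_G_PER_MOL,
    volume_l=1.000,
)
```

Every input is a mass, a volume or a certified purity, so the assigned value is linked to SI units without reference to another calibrator for the same quantity.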
3. Primary calibrators
Measurement standards (calibrators) are the ‘realization of the definition of a given quantity, with stated quantity value and associated measurement uncertainty, used as a reference’ (VIM 5.1). 4 A primary calibrator prepared using a primary reference measurement procedure is generally an aqueous or acidified solution that realizes the SI unit (e.g. mol/L) for the quantity of interest (e.g. amount-of-substance concentration) with a very small MU. The characteristics of primary calibrators (analyte, matrix, quantity value and uncertainty, traceability, methods used, intended purpose) are fully described, and certified as reference materials in accompanying certificates issued by the responsible NMI or accredited reference laboratory. Currently, primary calibrators are available for a relatively small number of analytes of clinical interest, and are listed on the JCTLM website. 28 Primary calibrators are used to calibrate secondary reference measurement procedures.
4. Secondary reference measurement procedures
For the purpose of establishing metrological calibration hierarchies, secondary reference measurement procedures are required for the assignment of quantity values to matrix-matched (e.g. serum) secondary calibrators, thereby transferring, with the smallest achievable uncertainty, the trueness of the quantity value of the primary calibrator. They should therefore measure the same quantity as the routine measurement procedure, be analytically specific, fully characterized and validated, be uninfluenced by sample matrix, and indicate quantity values with an MU appropriate for their purpose. Secondary reference measurement procedures generally employ a different measurement principle to that of the primary reference procedure, e.g. high-performance liquid chromatography-MS procedure for HbA1c, 30 and for metrological traceability purposes are performed by NMIs and accredited reference measurement laboratories. If the reference and routine measurement procedures for a given measurand do not have the same analytical specificity, then the metrological traceability chain for patients' values is broken. Secondary reference measurement procedures must meet detailed documentation requirements concerning all aspects of the measurement procedure, as described in ISO 15193. 31 Secondary reference measurement procedures are also used to assess the commutability (see later) of lower order calibrators, assign quantity values to trueness controls and panels of patient samples and assess the analytical performance and comparability of results produced by other procedures measuring the same quantity. A list of such higher order reference measurement procedures is maintained on the JCTLM website. 28 If a primary calibrator is unavailable for a given quantity, then a secondary reference measurement procedure cannot be developed for the measurand of interest.
5. Secondary calibrators
Secondary calibrators are matrix-matched materials with assigned quantity values and associated uncertainties, and are produced and certified as CRMs by NMIs and accredited reference laboratories with appropriate expertise. Secondary calibrators can be used by IVD medical device manufacturers to calibrate their selected or standing measurement procedures for assigning quantity values to in-house master or commercial product calibrators, and may occasionally be used by routine laboratories for assessment of systematic error. A list of higher order secondary calibrators is maintained on the JCTLM website. 28 If a secondary reference measurement procedure is unavailable, then secondary calibrators cannot be produced.
6. International conventional reference measurement procedures
For measurands for which secondary calibrators are unavailable, international conventional reference measurement procedures may be available. These are not primary, are calibrated by international conventional calibrators which are not SI-traceable (see below), and are endorsed by an appropriate international body to promote international acceptance of a single reference measurement system for the measurand (e.g. International Federation of Clinical Chemistry and Laboratory Medicine [IFCC], International Council for Standardization in Hematology [ICSH]). International conventional reference measurement procedures may also define the measurement unit for some measurands, for example IFCC reference procedures for measuring catalytic activity concentrations of various serum enzymes, e.g. alanine aminotransferase. 32 International conventional reference measurement procedures are undertaken for traceability purposes by NMIs and accredited reference measurement laboratories with relevant expertise.
7. International conventional calibrators
As noted earlier, many measurands of clinical interest are inadequately defined in terms of their physico-chemical characteristics and molecular weights, so that primary and secondary calibrators with SI-traceable values are unavailable. Instead, international conventional calibrators, which are not SI-traceable, may be available; these are defined as a ‘measurement standard recognized by signatories to an international agreement and intended to serve worldwide’ (VIM 5.2). 4 Such materials are prepared and assigned arbitrary or other units according to protocols developed by expert groups, 33 and adopted as international conventional calibrators by appropriate international scientific bodies. Providers of such calibrators, particularly for peptide and protein hormones, include the WHO international laboratories, coordinated by the WHO Expert Committee on Biological Standardization (WHO-ECBS), 33 which aims ‘To define an internationally agreed unit for each biological substance to allow comparison of biological measurements worldwide’. The WHO-ECBS provides reference materials with long-term stability to which arbitrary units (International Units, IU) are assigned based on the biological activity of the material using specified conditions or, where appropriate, by amino acid analysis, protein mass determination or conventional reference methods if available. The reference material must demonstrate commutability (see later) with patient samples. 34,35 The first preparation of a standard material is designated a First International Standard (IS), with the biological activity expressed in arbitrary IU without an associated MU. To ensure continuity of the arbitrary unit when the first IS requires replacement, values for the second IS are aligned through international collaborative studies using multiple methods. It should be noted that the uncertainty of assigned IU values is not stated, and metrological traceability is not claimed.
8. Manufacturers' measurement procedures and calibrators
Selected and standing measurement procedures are used by IVD medical device manufacturers for assigning quantity values respectively to in-house master calibrators and commercial product calibrators. Depending on the measurand and commercial considerations, a manufacturer may choose a secondary reference measurement procedure as their selected procedure (e.g. hexokinase/glucose-6-phosphate dehydrogenase for glucose measurement), while a standing procedure is usually an optimized version of their product routine laboratory procedure (e.g. glucose oxidase procedure).
Commutability of calibrators
If two consecutive measurement procedures in a calibration hierarchy produce comparable patients' measurement results across their reportable intervals, then the calibrator linking the two procedures is said to have the property of commutability, meaning that the analyte in the calibrator and in patients' samples interacts in the same way with both measuring systems. Commutability of each calibrator in a calibration hierarchy is essential for patients' measurement results to be metrologically traceable to the highest reference used. 20 Commutability has been defined in various ways. 4,20,36 A recent recommendation for a practical definition of commutability is ‘The equivalence of the mathematical relationships among the results of different measurement procedures for an RM and for representative samples of the type intended to be measured.’ 37 A commutability study is a comparison of the measured values produced by the procedures positioned on either side of a calibrator in a calibration hierarchy, one of which is usually a reference procedure, and the other a routine procedure. Commutability is therefore a property of calibrators, and is also applicable to materials used for external proficiency testing and QC.
The commutability of a secondary calibrator with a routine measurement procedure in the same calibration hierarchy can be investigated, for example, by measuring the calibrator and a panel of patient samples by both the secondary reference and the routine measurement procedure. The patient samples should have a range of measurand values representative of those encountered in routine practice, and the reference measurement procedure should be performed by an accredited measurement reference laboratory. If the ratio of the values obtained for the secondary calibrator by the two procedures is not significantly different from the ratio of values obtained for the patient samples, then the secondary calibrator is deemed commutable with the routine measurement procedure. There are various other approaches to assessing commutability. 22,37
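The ratio comparison described above can be illustrated with a minimal sketch. The acceptance rule used here (the calibrator ratio must fall within the mean ± k SD of the patient-sample ratios), the function name and all data are assumptions made for illustration; they are not the statistical protocols of the cited guidelines.

```python
from statistics import mean, stdev


def assess_commutability(patient_pairs, calibrator_pair, k=2.0):
    """Compare the calibrator's routine/reference ratio with patient ratios.

    patient_pairs   -- list of (reference_value, routine_value) for patient samples
    calibrator_pair -- (reference_value, routine_value) for the calibrator
    k               -- half-width of the acceptance interval in SDs (an assumption)
    """
    # Ratio of routine to reference result for every patient sample
    ratios = [routine / reference for reference, routine in patient_pairs]
    centre = mean(ratios)
    spread = stdev(ratios)
    lower, upper = centre - k * spread, centre + k * spread

    cal_reference, cal_routine = calibrator_pair
    cal_ratio = cal_routine / cal_reference
    # The calibrator is deemed commutable if it behaves like the patient samples
    return lower <= cal_ratio <= upper, cal_ratio, (lower, upper)


# Hypothetical serum calcium results in mmol/L: (reference, routine)
patients = [(2.10, 2.14), (2.25, 2.30), (2.40, 2.44),
            (2.55, 2.61), (2.80, 2.85), (3.10, 3.17)]

ok_commutable, ratio_a, interval = assess_commutability(patients, (2.218, 2.262))
ok_noncommutable, ratio_b, _ = assess_commutability(patients, (2.218, 2.400))
print(ok_commutable, round(ratio_a, 3))
print(ok_noncommutable, round(ratio_b, 3))
```

In this invented data set the first calibrator result behaves like the patient samples, whereas the second shows a ratio well outside the patient-sample interval, which would suggest a matrix effect.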
Non-commutability of a calibrator breaks the metrological traceability of results produced by the measurement procedure for which the secondary or international conventional calibrator is non-commutable, i.e. the trueness of the reference calibrator is not transferred to the patients' measured values from the routine procedure. Non-commutability occurs if patient samples cause matrix effects sufficient to alter the measurement signal generated by the quantity being measured, i.e. the calibrator and samples behave differently in the measuring system. A matrix effect is defined as an ‘influence of a property of the sample, other than the measurand, on the measurement of the measurand according to a specified measurement procedure and thereby on its measured value.’ 20 A useful guideline, primarily for use by method manufacturers and producers of proficiency testing materials, describes how to evaluate matrix effects. 38
During manufacture, secondary calibrators may suffer matrix modifications caused by steps such as lyophilization, freeze-thawing, filtration, etc., and will be unsuitable for assigning measurand values to routine calibrators. For example, Thienpont et al. found that minimal preparation steps for a human serum thyroid hormone reference material significantly disturbed the equilibrium between the protein-bound and free hormone. 39 A recent study of two candidate reference materials for the measurement of cardiac troponin I found that non-commutability of one or other material was demonstrated for nearly half of 15 different routine measurement procedures evaluated. 40 Addition of human, non-human or synthetic analyte to matrix-matched material to achieve desirable concentrations may also introduce matrix effects or altered analyte composition, and render a reference material non-commutable, as has been shown for some commercial PSA calibrators. 41 In such cases, a manufacturer may need to calibrate their routine measurement procedure using the correlation between measurand values obtained for a panel of patient samples measured by both a matrix-insensitive reference measurement procedure and the routine procedure. 24
It should be noted that systematic error in patients' measurement results caused by a difference in analyte composition between the calibrator and patient samples also breaks a traceability chain. Lack of analytical specificity is a limiting property of the measurement procedure itself rather than non-commutability of the calibrator, and is a major cause of non-traceability of immunoassay measurement results due to different manufacturers selecting different epitopes and antibodies for the same ‘analyte’. It should also be noted that reference materials listed on the JCTLM website are not evaluated for commutability, and it is the responsibility of manufacturers to investigate the commutability of their product calibrators, and to make such information available to end-users. 20
Calibration hierarchies
The general principles and features of organizing reference measurement procedures and reference materials into calibration hierarchies that reflect their metrological pedigree are described in the international standard ISO 17511:2003, 20 and in several authoritative guidelines and reviews. 21–24
High level of metrological traceability
The highest level of calibration hierarchy transfers the trueness, with an estimated MU, of the SI unit realized by a primary reference material to patients' values produced by a routine measurement procedure via a sequence of alternating measurement procedures and calibrators (Figures 3 and 4a). Patients' values produced by different routine measurement procedures that share the same metrological reference will be comparable. Currently, a minority of analytes have reference calibrators available that are traceable to an SI unit (e.g. mole, katal), comprising electrolytes (6), enzymes (6), metabolites (7), steroid hormones (3), thyroid hormones (2), proteins (9), therapeutic drugs (6), vitamins (2), and a number of amino acids, toxic elements and drugs of abuse. Calcium is an example of a physicochemically well-characterized analyte, the measurement of which in biological fluids has well-established clinical applications.

Metrological traceability chain to the SI unit (mol/L) of patients' measured values for the amount-of-substance concentration of calcium in serum. Central thin solid arrows, calibration steps; central thick solid arrows, transfers of trueness of quantity value. U, expanded combined measurement uncertainty (≈95% confidence); SRM, Standard Reference Material; NIST, National Institute of Standards and Technology; MU, measurement uncertainty; BIPM, International Bureau of Weights and Measures; NMI, National Metrology Institutes; IVD, in vitro diagnostic; AAS, atomic absorption spectrometry; ID-TIMS, isotope dilution-thermal ionization mass spectrometry; ISO, International Organization for Standardization; IEC, International Electrotechnical Commission

Calibration hierarchies based on ISO 17511. 20 CRM, Certified Reference Material; QC, quality control
Primary reference material: e.g. NIST Standard Reference Material® (SRM®) 915b, containing calcium carbonate of mass fraction (99.907 ± 0.021) % (≈95% confidence).
Primary reference measurement procedure: Gravimetry.
Primary calibrator: SRM 915b gravimetrically prepared as a primary calibrator containing a known mass fraction of calcium, with a small MU, in a stable acidified solution (volume fraction of nitric acid: 10%), e.g. NIST SRM 3109a, certified mass fraction of calcium = (10.025 ± 0.017) mg/g.
Secondary reference measurement procedure: A matrix-insensitive method suitable for calcium is ID-thermal ionization mass spectrometry, using NIST SRM 3109a for calibration, to produce matrix-matched (serum) secondary calibrators.
Secondary calibrator: Certified secondary calibrators for amount-of-substance concentration of calcium in human serum are currently available from IRMM and NIST, e.g. NIST SRM® 909b – two amount-of-substance concentrations of calcium in human serum (2.218 ± 0.016) mmol/L; (3.532 ± 0.028) mmol/L; ≈95% confidence.
Manufacturers' selected and standing measurement procedures: A manufacturer's selected measurement procedure assigns quantity values to in-house matrix-matched master calibrators and should be of a higher metrological order than that employed in routine laboratories, e.g. atomic absorption spectrometry procedure calibrated by NIST SRM 909b. Standing measurement procedures, using master calibrators, assign quantity values to product calibrators, and are usually the commercial measurement procedures (e.g. photometry of dye-binding by calcium) with optimized analytical performance.
Product calibrator: When assigning a quantity value to a product calibrator, the manufacturer must state its uncertainty, ensuring that the uncertainties contributed by all previous transfer steps in the calibration hierarchy are included. This information should be provided to laboratory users with each new calibrator batch, enabling them to calculate the combined standard MU of their patients' values.
Routine calcium measurement procedures: Several different methods are available for the routine measurement of amount-of-substance concentrations of total calcium in serum and plasma, commonly utilizing either the arsenazo III or o-cresolphthalein dye-binding properties of calcium, with a small minority using atomic absorption spectrometry or total calcium-specific electrodes. Information provided with product calibrators should include:
The assigned value and associated MU; The highest metrological reference to which the calibrator value is traceable; Statement of commutability.
If the manufacturer's instructions for use of a measurement procedure are not followed, the laboratory is responsible for validating that the modified procedure maintains metrological traceability of patients' results.
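The accumulation of uncertainty down the calcium hierarchy can be sketched numerically. The example below assumes that the uncertainty components are independent and can be combined as relative standard uncertainties by root-sum-of-squares, that the stated expanded uncertainties correspond to a coverage factor of k = 2, and that the certified uncertainty of the secondary calibrator already includes the higher-order steps. The values for the master calibrator, product calibrator and laboratory imprecision are hypothetical.

```python
from math import sqrt


def relative_standard_uncertainty(value, expanded_uncertainty, k=2.0):
    """Convert an expanded uncertainty (assumed k = 2) to a relative standard uncertainty."""
    return (expanded_uncertainty / k) / value


def combine(relative_uncertainties):
    """Root-sum-of-squares combination of independent relative standard uncertainties."""
    return sqrt(sum(u ** 2 for u in relative_uncertainties))


# Secondary calibrator from the text: (2.218 ± 0.016) mmol/L, ≈95% confidence
u_secondary = relative_standard_uncertainty(2.218, 0.016)

# Hypothetical contributions of the later steps (relative standard uncertainties)
u_master = 0.005    # value assignment to the in-house master calibrator
u_product = 0.008   # value assignment to the product calibrator
u_lab = 0.015       # long-term imprecision in the routine laboratory

# Uncertainty the manufacturer would state for the product calibrator
u_calibrator = combine([u_secondary, u_master, u_product])
# Combined standard MU of a patient's value in the routine laboratory
u_patient = combine([u_calibrator, u_lab])

patient_value = 2.40  # mmol/L, hypothetical
expanded_u = 2.0 * u_patient * patient_value
print(f"{patient_value:.2f} ± {expanded_u:.2f} mmol/L (k = 2)")
```

The sketch shows why the laboratory needs the product calibrator uncertainty with each batch: without it, only the laboratory's own imprecision can be estimated and the MU of patients' values is understated.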
Lower levels of metrological traceability
Several hundred measurands of clinical interest lack metrological traceability to SI units because primary and secondary reference measurement procedures are unavailable, but can be accommodated in one or other of several calibration hierarchies of lower metrological order (Figures 4b–e). 20
International conventional reference measurement procedure with an international conventional calibrator
An example is measurement of total haemoglobin concentration in blood using the ICSH-endorsed absorption spectrometry method for measuring haemiglobinocyanide, calibrated using the international conventional calibrator (IRMM BCR 522), a bovine blood lysate containing haemiglobinocyanide with a certified value and uncertainty assigned using calibrated spectrophotometers (Figure 4b). 42
International conventional reference measurement procedure (ICRMP) only
An example is the measurement of the catalytic activity concentration of plasma/serum enzymes, where activity is defined as the number of moles of substrate converted per unit time, with katal as the SI-coherent derived measurement unit (mol/s). In practice, 1 μmol/min is defined as one enzyme unit (U), and when the measurement procedure is exactly specified, U reflects the amount of active enzyme present (Figure 4c). Because catalytic activity of an enzyme with a specific substrate is a biological property that may be variably expressed by one or more isoenzymes, and the magnitude of activity is determined by measurement conditions (e.g. pH, temperature, co-factors etc.), a reference material with an SI-traceable quantity value is unavailable. Instead, the highest available metrological reference is a reference procedure for measuring catalytic activity under exactly specified conditions that control all factors known to influence the activity, and using instrumentation (e.g. pH, temperature, volume, spectrophotometry, etc.) that meets specified measurement and associated uncertainty specifications. 36 An example is the IFCC reference method for measurement of the catalytic activity concentration of alanine aminotransferase. 32 The problems and approaches to achieving traceability and standardization of measuring enzyme catalytic activity concentrations have recently been reviewed. 43,44
The trueness of the catalytic activity concentration realized by an international conventional reference measurement procedure for a given enzyme can be transferred to a calibration material that can be used by manufacturers' selected or standing measurement procedures. An example is lactate dehydrogenase (LD) isoenzyme 1 catalytic activity concentration, obtainable in lyophilized form (IRMM ERM®-AD453/IFCC), which on re-constitution has a certified catalytic activity concentration of 502 ± 7 U/L (8.37 ± 0.12 μkat/L; ≈95% confidence). This material has been shown to be commutable with two different commercial routine procedures for LD, allowing metrological traceability of the patients' values they indicate to the IFCC conventional reference measurement procedure. 45 If non-commutability is demonstrated, an alternative approach is for catalytic activity concentration values to be assigned by the ICRMP to a set of patients' samples with values across the reportable interval, and to use these as matrix-based calibrators for routine procedures. 24
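The relationship between the enzyme unit and the katal can be checked against the certified LD values quoted above. Since 1 U = 1 μmol/min and 1 kat = 1 mol/s, 1 U corresponds to 1/60 μkat. The function names in this short sketch are illustrative only.

```python
SECONDS_PER_MINUTE = 60.0


def units_per_litre_to_microkatal_per_litre(u_per_l):
    """Convert U/L (μmol/min per litre) to μkat/L (μmol/s per litre)."""
    return u_per_l / SECONDS_PER_MINUTE


def microkatal_per_litre_to_units_per_litre(ukat_per_l):
    """Convert μkat/L back to U/L."""
    return ukat_per_l * SECONDS_PER_MINUTE


# Certified LD isoenzyme 1 value and expanded uncertainty quoted in the text
value_ukat = units_per_litre_to_microkatal_per_litre(502)
uncertainty_ukat = units_per_litre_to_microkatal_per_litre(7)
print(round(value_ukat, 2), round(uncertainty_ukat, 2))
```

Rounded to two decimal places, the conversion reproduces the 8.37 ± 0.12 μkat/L given for the reference material.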
International conventional calibrator only
Examples include: the WHO 1st IS for human C-reactive protein (CRP), NIBSC 85/506, 2008, the ampoules of which contain freeze-dried residue of 0.5 mL of a solution containing approximately 50 μg of human CRP in 0.5 mL normal human serum, with an assigned value of 49 mIU/ampoule; the WHO 1st IS for follicle stimulating hormone (FSH), a highly purified recombinant FSH for calibration of immunoassays, with the freeze-dried residue containing approximately 10 μg of recombinant FSH and an assigned value of 60 IU/ampoule; TSH (WHO 81/565, 11.5 mIU/ampoule); and intact parathyroid hormone (PTH) 1-84 (WHO 95/646, 100 μg/ampoule). In the last example, the reference preparation is sufficiently pure and characterized that the assigned value is based on amino acid analysis of the PTH content.
Patients' results produced by a routine measurement procedure calibrated by an international conventional calibrator are traceable no further than the arbitrary measurement unit. The comparability of patients' values produced by different routine measurement procedures using the same commutable conventional calibrator may be improved, and if the comparability is statistically and clinically acceptable, they are said to be harmonized. The results of measurements for the same measurand by routine measurement procedures calibrated by different conventional calibrators are unlikely to be comparable, e.g. procedures for measurement of serum PSA concentrations. 15–17,46
An international conventional calibrator may not contain all relevant isoforms of complex analytes found in patients' samples, nor in a natural form unmodified by calibrator production, which may lead to significant differences in analytical specificity towards the calibrator by different routine measurement procedures. This problem was recently illustrated by a study of 16 different commercial hCG immunoassay measurement procedures, calibrated with WHO 75/589, which found significantly differing levels of recognition of the six WHO international reference reagents for the major hCG isoforms. 5,47 Therefore, international conventional calibrators must be shown to be commutable with the different measurement procedures claiming traceability (Figure 4d).
Manufacturer's selected measurement procedure
Many clinically important measurands lack both international conventional reference measurement procedures and international conventional calibrators, for example the tumour marker group of carbohydrate antigens. Assays of this type, e.g. immunoassay for CA-19.9, are only traceable to the manufacturer's in-house measurement procedure or calibrator, and generally show poor comparability of patients' values with those produced by other commercial procedures (Figure 4e). 48,49 In such cases, the measurement procedure defines the measurand because the epitopes selected and antibodies used are procedure-specific, and therefore patients' measured values will only be comparable with those produced by the same commercial measurement procedure performed according to the manufacturer's instructions. Therefore, when reporting patient values, laboratories should advise clinicians that patients' results are not comparable with those produced by different measurement procedures. Currently, measured values for the majority of clinical chemistry measurands are metrologically traceable no further than the manufacturer's in-house selected measurement procedure or master calibrator(s).
Modified calibration hierarchies
Depending on the measurand, manufacturers can sometimes lower costs by omitting a calibration step whilst still achieving a valid traceability chain to the desired higher reference. Shorter traceability chains usually also reduce the uncertainty of product calibrator values.
Realizing and maintaining metrological traceability
The highest level of metrological traceability in any field of measurement is to the definition of the relevant SI base unit. The BIPM is responsible for establishing fundamental measurement standards and scales under the supervision of the International Committee for Weights and Measures (CIPM), which in turn is responsible to the General Conference on Weights and Measures (CGPM). 7 The development and implementation of higher order metrological traceability of measurements in clinical biochemistry is underpinned by the activities of the BIPM, NMIs, reference measurement laboratories, professional societies, government and accreditation bodies, IVD medical device manufacturers and end-user laboratories, each having a responsibility to maintain and develop their scientific and technical contributions to improving the comparability of patients' measurement results across different measurement procedures and clinical laboratories. To effectively meet this challenge, the different groups need to work together to realize the standardization, or if that is not currently feasible, the harmonization of patients' results for the many measurands with lower order traceability (Figure 5).

Major organizational entities underpinning the quality of patients' measurement results. QA, quality assurance; R&D, research and development; EQA, external quality assessment; QC, quality control; IVD, in vitro diagnostic; ISO, International Organization for Standardization; CLSI, Clinical and Laboratory Standards Institute; IFCC, International Federation of Clinical Chemistry and Laboratory Medicine; AACC, American Association for Clinical Chemistry
Routine clinical laboratories
Producing the required clinical quality of patients' measurement results is paramount. Several decades ago, routine laboratories took full responsibility for the quality of their measurement results, often modifying and developing measurement procedures to meet local clinical needs. During the last two decades, routine measurement procedures have become overwhelmingly dominated by commercial ‘closed’ systems, with manufacturers taking on responsibility for the quality of calibrators, reagents and achievable analytical performance. This technical sea-change has created the perception by many laboratorians that their responsibility for result quality is now limited to the selection of commercial measurement procedures, monitoring imprecision and bias of these procedures by QC, and participation in peer group comparisons through EQA programmes.
Despite the inability to significantly influence the analytical quality of most routine assays, the increasing clinical pressure for patients' results to be metrologically traceable to the highest available reference re-establishes full laboratory responsibility for result quality. Routine laboratories must therefore look beyond QC and EQA activities and also focus on the soundness of the metrological traceability and comparability of their measurement results. This focus will highlight to clinical laboratorians the information they require from manufacturers concerning the metrological pedigree of their product calibrators, i.e. the highest order reference to which metrological traceability is claimed, commutability and the uncertainty of quantity values assigned to each new calibrator batch. Routine provision of such data will also better inform laboratories when selecting new measuring systems, and inform laboratory advice to clinical users concerning the applicability of internationally recommended clinical decision values.
In addition to accessing adequate traceability and commutability information from manufacturers, laboratories need to expand their quality activities to include, where possible, the evaluation and routine monitoring of the trueness of measured values. At present, if EQA or inter-laboratory comparisons suggest systematic error, options for confirmation are limited to repeatability measurements of the relevant reference calibrator or trueness control material, or through comparison of patient sample measurements produced by both routine and reference measurement procedures. However, such action is necessarily a one-off response to events of low frequency, and does not provide a mechanism for routinely monitoring trueness. For this purpose, laboratories need their EQA programmes to routinely provide them with trueness performance data using commutable materials. 50,51 This ability will close the analytical quality assessment gap for routine laboratories, and allow them to better understand the quality of their patients' results.
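A one-off confirmation of suspected systematic error by replicate measurement of a trueness control material can be sketched as follows. The decision rule (bias is flagged when it exceeds k times the combined uncertainty of the certified value and of the replicate mean) is a common convention assumed here rather than one prescribed in this review. The certified value is that of the secondary calcium calibrator quoted earlier, the replicate results are hypothetical, and the material is assumed to be commutable with the routine procedure.

```python
from math import sqrt
from statistics import mean, stdev


def assess_bias(replicates, certified_value, expanded_u_certified, k=2.0):
    """Estimate bias against a certified value and test whether it is significant.

    expanded_u_certified is assumed to be an expanded uncertainty with coverage factor k.
    """
    n = len(replicates)
    bias = mean(replicates) - certified_value
    u_certified = expanded_u_certified / k        # standard uncertainty of the reference
    u_mean = stdev(replicates) / sqrt(n)          # standard uncertainty of the replicate mean
    u_bias = sqrt(u_certified ** 2 + u_mean ** 2)
    return bias, u_bias, abs(bias) > k * u_bias


# Certified value from the text: (2.218 ± 0.016) mmol/L, ≈95% confidence
certified, expanded_u = 2.218, 0.016

# Hypothetical replicate results (mmol/L) from two routine procedures
procedure_a = [2.26, 2.27, 2.25, 2.28, 2.26, 2.27, 2.25, 2.27, 2.26, 2.28]
procedure_b = [2.21, 2.23, 2.22, 2.20, 2.23, 2.22, 2.21, 2.22, 2.23, 2.21]

bias_a, u_a, significant_a = assess_bias(procedure_a, certified, expanded_u)
bias_b, u_b, significant_b = assess_bias(procedure_b, certified, expanded_u)
print(round(bias_a, 3), significant_a)
print(round(bias_b, 3), significant_b)
```

With these invented data, procedure A shows a positive bias larger than can be explained by the stated uncertainties, while procedure B does not. Regular application of such a check requires commutable materials with reference-assigned values.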
EQA programmes
EQA programmes are well established in the quality armamentarium of routine laboratories, providing useful regular snapshots of performance relative to peers using the same measurement procedures, and to target values generally set by consensus or by selected laboratories. The challenge for EQA programmes, recently articulated by Panteghini, 51 is to enable routine laboratories to regularly assess the trueness of their measurement results across reportable ranges by providing commutable materials with measurand values assigned using the highest metrological order reference measurement procedures available. Some national EQA programmes are beginning to use accredited reference measurement laboratories for target setting of some enzymes and SI-traceable simple analytes, and the increased costs of this approach will presumably be passed on to users.
Manufacturers of routine measurement procedures and calibrators
Routine laboratories need to know the metrological traceability and uncertainty of the values assigned to product calibrators, and that commutability has been demonstrated using a recognized protocol. Although traceability information is required by ISO 17511, 20 it is often not adequately provided in the information sheets that accompany commercial reagent sets, typical examples being ‘traceable to a commercial radioimmunoassay’, ‘calibrated against our previous X method’, ‘standardized against ID-MS’ (ID-MS, isotope dilution mass spectrometry). In addition, there is generally no statement as to the commutability of calibration materials nor an uncertainty estimate of assigned values. To meet their responsibility for result quality, clinical laboratories need to ensure the required information is readily provided by manufacturers.
Accreditation, standards and guideline bodies
Accreditation schemes for clinical laboratories ensure that minimum standards of technical and management practices are met, for which purpose ISO 15189 is widely applied. 52 In relation to examination procedures, this standard requires documentation of trueness of measurement (5.5.3c), 52 and of metrological traceability (5.5.3g). 52 Accreditation schemes are also concerned with promoting the quality of clinical laboratory practice beyond minimum standards, and in this context should encourage the use of measurement procedures which produce patient values metrologically traceable to the highest available references. While international standards identify what must be done, other bodies, such as the Clinical and Laboratory Standards Institute (CLSI) 53 produce many excellent bench-friendly guidelines describing how to achieve the technical and other requirements stated in international standards.
Joint Committee for Traceability in Laboratory Medicine 28
The decision in 2002 by the CIPM, the IFCC and the International Laboratory Accreditation Cooperation (ILAC) to establish the JCTLM was a significant positive response to the EU Directive concerning the requirement for metrological traceability of calibrators. 19 The JCTLM provides a platform to globally promote and give guidance on internationally recognized and accepted equivalence of measurements in laboratory medicine. JCTLM Working Group 1 (WG1) evaluates the compliance of candidate materials with the requirements of relevant ISO and other standards for recognition as a primary reference material, or as a primary or secondary reference calibrator. Candidate materials are nominated by NMIs and accredited reference laboratories by completion of a detailed questionnaire available on the JCTLM website. If further information is required, WG1 may have additional studies undertaken by accredited laboratories with expertise demonstrated by participation in Mutual Recognition Agreement (MRA) key comparison or IFCC-RELA (REference LAboratories) studies, the latter being an annual round-robin reference laboratory proficiency testing programme conducted by the Reference Institute for Bioanalytics of the German Society for Clinical Chemistry and Laboratory Medicine under the auspices of the IFCC. The RELA programme ensures participants measuring the same measurand produce comparable results, thereby providing a network of competent reference laboratories for that measurand. Materials, calibrators and measurement procedures that meet the requirements are listed on the JCTLM database, and it is the responsibility of the producers to continue to meet the certified specifications, and to ensure stocks will be available for a significant time. WG2 performs a similar function for candidate reference measurement laboratories, which are also assessed against relevant ISO standards (see below), and those complying are listed on the JCTLM website.
The JCTLM provides a practical framework for the international promotion and implementation of metrological traceability to the SI unit, facilitates close cooperation between NMIs and reference laboratories, and through its website provides IVD medical device manufacturers, regulatory and accreditation bodies, professional organizations and clinical laboratories with the information they need to develop and implement standardization of clinical laboratory measurements.
National Metrology Institutes
An NMI is formally recognized through an MRA with the CIPM, whereby the NMI will ensure national measurement standards are demonstrably traceable to the SI, and are comparable to those of other MRA nations through participation in regular key comparison studies. NMIs are accredited to the general requirements for calibration laboratories (ISO 17025 54 ), and the specialized requirements for the production of reference materials (ISO Guide 34, 25 ISO 15194 27 ). NMIs provide the metrological traceability link between SI units and reference measurement laboratories through provision of primary (pure substance) materials, primary and secondary (matrix-based) reference calibrators (CRMs), and by maintaining secondary reference measurement procedures with which measurement procedures can be compared. For the preparation of pure substance materials, NMIs generally obtain high-grade chemical materials from third party manufacturers, and determine purity by measurement of impurities and moisture content using primary reference measurement procedures. In the case of certified secondary calibrators, a particular challenge is to produce materials with clinically appropriate measurand values in volumes sufficient to meet user needs for a decade or more. Available certified primary reference materials and secondary calibrators, and their sources, are listed on the JCTLM website. The roles and activities of one NMI, NIST, have recently been described. 55
Accredited reference measurement laboratories
These laboratories must be accredited to ISO/International Electrotechnical Commission (IEC) 17025 54 and ISO 15195 56 by an organization that is a full member of ILAC, and have high level expertise in performing reference measurement procedures approved by the JCTLM for the purposes of assigning method-independent quantity values to reference calibrators, trueness control materials, EQA control materials and panels of patients' samples. The detailed requirements for recognition as an accredited reference measurement laboratory can be accessed on the JCTLM website. Reference laboratories are required to participate in the RELA round-robin reference laboratory proficiency testing programme. Currently, fewer than 20 reference laboratories offer calibration and reference measurement services for up to 31 measurands with high level metrological traceability. 28 Accredited reference laboratories are often also involved in the research and development of new reference measurement procedures. The roles and activities of reference laboratories have been recently reviewed. 57
Professional societies
The IFCC 58 has been energetically proactive both in developing a formal structure for promoting metrological traceability, by jointly establishing the JCTLM, and in forming, through its Scientific Division, technical committees and work groups to address specific problems concerning reference measurement systems for clinically important quantities. The IFCC Committee for Traceability in Laboratory Medicine is responsible for overseeing the IFCC-RELA Trials (RELA) for reference laboratories, establishing and maintaining networks of reference laboratories for specific measurands, for example HbA1c, and for evaluating the feasibility of standardization or harmonization for measurands currently lacking high level traceability. The IFCC Committee for Nomenclature, Properties and Units (C-NPU) has developed, in collaboration with the International Union of Pure and Applied Chemistry (IUPAC), a coherent terminology for properties and units in the clinical laboratory sciences. 59 Other IFCC Scientific Division committees are addressing traceability problems in the areas of plasma proteins, reference systems of enzymes and molecular diagnostics. Currently, IFCC working groups are focusing on the standardization of measurands such as thyroid hormones, cystatin C, cardiac troponin I (cTnI), urine albumin and carbohydrate-deficient transferrin. 58 The IFCC also undertakes joint projects with other organizations such as the American Association for Clinical Chemistry (AACC) and the CLSI to promote metrological traceability through technical advances and the development of guidelines for manufacturers and routine laboratories.
Internationally recognized expert clinical committees
Clinical practice guidelines produced by internationally recognized expert clinical groups that specify clinical decision values, and often analytical performance requirements, can be effective initiators of efforts to improve metrological traceability. Unfortunately, such guidelines generally ignore the lack of result comparability between different procedures in use for the same analyte. Widespread clinical demand for improved analytical quality to meet new diagnostic criteria in turn puts pressure on clinical laboratories and manufacturers to meet the expectation, e.g. ID-MS-alignment of serum creatinine concentration measurements.
Discussion
Accurate and comparable patients' values produced by different measurement procedures for the same measurand improve clinical management, facilitate the use of common clinical decision values, reduce health-care costs and lower the risk of clinical error. Such values are best achieved by underpinning measurement procedures with reference measurement systems that have metrological traceability to an SI unit and are supported by networks of accredited reference measurement laboratories. Where such a reference system is in place for an analyte, its measurement is described as standardized, and offers the best opportunity for obtaining measured values that are close to true values and are harmonized across different routine measurement procedures.
Development and implementation of a reference measurement system is a major scientific and organizational challenge that inevitably spans many years. The first reference system in clinical chemistry was the standardization of serum cholesterol measurements in the USA, commenced in 1957 and realized by the 1970s. 60 However, technical advances bring opportunities to improve reference measurement systems, e.g. a recently proposed GC-ID-MS reference measurement procedure for cholesterol. 61 The IFCC Reference Measurement System for HbA1c, initiated 16 years ago, is only now seeing routine laboratories beginning to report HbA1c results in SI units. Such undertakings improve health outcomes, as demonstrated by the improved detection of asymptomatic chronic kidney disease 62,63 due to the routine reporting of estimated GFR, which has been facilitated by the widespread introduction of ID-MS-aligned creatinine concentration measurements. However, it is important to note that even when measurands have primary and secondary reference measurement procedures supported by reference laboratory networks, there is no guarantee of value accuracy unless the routine measurement procedures have analytical specificity. Steroids such as oestradiol and testosterone are examples of components which have had reference measurement systems in place for some years, but the inadequate analytical specificity of many routine measurement procedures severs their traceability chains. 64
Well-defined, commonly measured measurands now have reference measurement systems that have greatly improved result comparability for analytes such as potassium and calcium, and therefore the scientific challenges increase as attention focuses on more complex molecules. The standardization of HbA1c posed the problem of identifying and then achieving analytical specificity for the clinically most appropriate measurand from among the different glycated haemoglobins in diabetic serum. The difficulty was overcome by defining the measurand as haemoglobin molecules with a glycated N-terminal hexapeptide on the β-chain, this being common to the various isoforms and therefore measurable on an equimolar basis. 65 The approach of targeting a part of the molecular structure that is common to all isoforms of an analyte is likely to be of value in achieving analytical specificity and equimolar measurement for other complex analytes, particularly heterogeneous peptides and proteins. 66
cTnI measurements are central to the clinical definition and laboratory diagnosis of acute coronary syndromes, but a recent study comparing cTnI results obtained from seven different commercial measurement procedures found up to five-fold ratios between reported values. 67 The need for standardization of cTnI measurements is therefore urgent, but poses problems because of the structural and chemical heterogeneity of cTnI in post-MI serum. 68 Two candidate reference materials, one isolated from human cardiac tissue and the other produced by recombinant techniques, were non-commutable with approximately half of the routine measurement procedures studied, even when used as calibrators. 69 The IFCC Working Group on Standardization of cTnI, in collaboration with several NMIs, is working to develop a secondary reference immunoassay measurement procedure, calibrated by the original reference material, that can be used to assign cTnI values to a commutable secondary calibrator comprising serum from post-MI patients so as to ensure clinically relevant forms of cTnI are present. 70 Although the measurement of total serum thyroxine and triiodothyronine has been standardized, 71 free thyroid hormone measurements are problematic because measurement conditions, such as pH and temperature, affect the free/protein-bound hormone equilibrium. 72 However, an international conventional reference measurement procedure and a secondary calibrator have been validated. 73,74
For measurands yet to be adequately defined, traceability to non-SI measurement units can improve the harmonization of patients' results, providing the conventional calibrators are commutable with the routine measurement procedures in use. However, more than one conventional calibrator may be available for a given measurand, and if they are not comparable with each other, there will be a lack of result comparability between measurement procedures traceable to the different reference materials, as has occurred for PSA. 15,16 For many measurands there continues to be a lack of both reference and conventional measurement systems, obliging manufacturers to assign product calibrator values that are determined in-house. To address the lack of progress in improving the harmonization of routine results for measurands with the lowest two levels of traceability (Figures 4d and e), the AACC recently sponsored a meeting, ‘Improving Clinical Laboratory Testing through Harmonization: An International Forum’, which brought together clinical laboratorians, professional societies, IVD manufacturers, government and regulatory bodies to explore avenues for achieving consensus in identifying the organizational processes needed to harmonize patients' results produced by measurement procedures that for the foreseeable future will lack higher order reference measurement systems.
Availability of higher order metrological traceability does not realize the potential clinical benefits for patients unless there is across-the-board implementation by IVD medical device manufacturers and clinical laboratories. Uptake of traceability to the highest available reference is generally positive, but a study to assess implementation of traceability to IFCC reference procedures for the measurement of catalytic activity concentrations of serum enzymes identified a lack of trueness for both LD and amylase by several commercial measurement procedures. 75 The cost to manufacturers of developing in-house calibration protocols, conducting commutability and uncertainty studies, and then producing adequate long-term supplies of product calibrators is relatively high. Where improved reference materials replace earlier versions, implementation by manufacturers may face lengthy delays until stocks of current product calibrators are exhausted and the costs of validating a new traceability chain can be commercially justified. When appropriate metrological traceability has been commercially implemented for calibrator value assignment, it is essential that the integrity of the manufacturer's in-house traceability and commutability procedures is regularly monitored, otherwise the traceability chain can be broken, with potentially poorer health-care outcomes for patients. 13,76
Manufacturers should provide clinical laboratories with adequate information about the pedigree of the metrological traceability of their product calibrators. This information should include the identity of the highest reference material or measurement procedure to which the product calibrator is traceable, the uncertainty of the assigned product calibrator value relative to the reference and the commutability of the product calibrator with the reference. Such information, in full or part, is often available on request, but it is desirable for such information to be routinely available, preferably with reagent kit information sheets. However, if such data are considered commercially sensitive, they could be provided, for example, via a customer-access only area of the manufacturer's website.
The great majority of reported measurement results from routine laboratories are produced by automated ‘closed system’ instrumentation, the quality of which is determined by the manufacturers. With routine laboratories effectively locked out of significantly influencing the analytical quality of such measurement procedures, clinical biochemists often consider they have lost responsibility for the trueness of their results, and instead focus on monitoring imprecision and peer group EQA comparisons. However, rather than being disengaged, clinical biochemists can and should play a leading role in promoting the implementation of metrological traceability to the highest available references, particularly by placing high priority on the metrological pedigree of product calibrators when selecting new or replacement measurement procedures.
Routine laboratories expend significant budget on EQA, and therefore clinical biochemists should ensure that the target values of the programmes they use are set using the highest available references, and that the materials are commutable with their routine procedures, thereby enabling effective routine monitoring of result trueness. In this way responsibility for analytical quality will be restored to clinical laboratories. Laboratory standards and accreditation organizations also have a role in improving metrological traceability in routine laboratories by requiring patient measurement results to be demonstrably traceable to the highest order references available.
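As an illustration of the routine trueness monitoring described above, the following is a minimal sketch of how a laboratory might express the deviation of its EQA result from a reference-assigned target value as a percentage bias and compare it with an allowable limit. The function names, the example creatinine values and the allowable limit are hypothetical and chosen only for illustration; they are not taken from any EQA programme or guideline, and the calculation is only meaningful if the EQA material is commutable with the routine procedure.

```python
def percent_bias(measured, target):
    """Return the deviation of a routine result from the target value, in percent.

    'target' is assumed to be a value assigned by a reference measurement
    procedure to a commutable EQA sample (an assumption of this sketch).
    """
    if target == 0:
        raise ValueError("target value must be non-zero")
    return 100.0 * (measured - target) / target


def within_limit(measured, target, allowable_bias_pct):
    """Return True if the absolute percentage bias does not exceed the limit."""
    return abs(percent_bias(measured, target)) <= allowable_bias_pct


# Hypothetical example: serum creatinine in umol/L.
routine_result = 84.0      # value reported by the routine procedure (illustrative)
reference_target = 80.0    # reference-assigned target value (illustrative)
allowable_bias = 4.0       # allowable bias in percent (illustrative, not a guideline value)

bias = percent_bias(routine_result, reference_target)
acceptable = within_limit(routine_result, reference_target, allowable_bias)
print(f"bias = {bias:.1f}%, acceptable = {acceptable}")
```

If the target value were instead a peer group mean, the same calculation would show only agreement with other users of the same procedure, not trueness, which is the distinction the preceding paragraph draws.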
DECLARATIONS
Footnotes
Annex A. Useful metrological terms and definitions (from VIM 4 unless otherwise stated)
