Abstract
New facts have recently enhanced interest in the topic of reference intervals. In particular, the International Organization for Standardization standard 15189, requesting that ‘biological reference intervals shall be periodically reviewed’, and the directive of the European Union on in vitro diagnostic medical devices, asking manufacturers to provide detailed information on reference intervals, have renewed interest in the subject. This review presents an update on the topic, discussing the theoretical aspects and the most critical issues. The basic approach to the definition of reference intervals proposed in the original International Federation of Clinical Chemistry documents still remains valid. The use of data mining to obtain reference data from existing databases has severe limitations. New statistical approaches to discard outliers and to compute reference limits have been recommended. On the other hand, perspectives opened by improved standardization through the implementation of the concept of traceability suggest new models for defining ‘common’ reference intervals that can be transferred to and adopted by different clinical laboratories, in order to reduce the proliferation of different reference intervals not always justified by differences in population characteristics or analytical methodology.
Introduction
The concept of ‘reference intervals’ as known today was developed by Gräsbeck and Saris in the late sixties and presented at a congress of the Scandinavian Society in 1969. 1 Previously, reference intervals were usually named ‘normal values’ or ‘normal ranges’ without a clear definition of the term. Twenty years later, the International Federation of Clinical Chemistry (IFCC) expert panel published the first official IFCC document on the theory of reference values. 2
One might ask why we are still talking about reference intervals in the era of evidence-based medicine, characterized by the use of decision limits, more than 30 years after the publication by Galen and Gambino of the landmark book ‘Beyond normality’. 3 There are several reasons for reconsidering this ‘old-fashioned’ subject. The most important one is that, while the theory is clearly defined, there is a big gap in its everyday application. Moreover, three new facts are driving renewed interest in the topic: the publication of the International Organization for Standardization (ISO) standard 15189:2007 on requirements for quality and competence of clinical laboratories, 4 the implementation of the European Directive 98/79 on in vitro diagnostic (IVD) medical devices 5 and, indirectly related to the Directive, the creation of the Joint Committee for Traceability in Laboratory Medicine (JCTLM). 6
The first document 4 states in clause 5.5.5 that ‘biological reference intervals shall be periodically reviewed’ and, particularly, requires that they be verified every time a variation in analytical and/or preanalytical procedures takes place. It is difficult for laboratories to comply with this requirement, considering the enormous number of different types of tests and the very rapid evolution of analytical technology. The issue is not easy for manufacturers either: the IVD Directive requires, in Annex I ‘Essential requirements’, part B, 8.7, statement l, that the information supplied by the manufacturer include ‘the reference intervals for the quantities being determined, including a description of the appropriate reference population’. 5 The third and probably most relevant element is the general requirement of the IVD Directive that for each method ‘the traceability of values assigned to calibrators and/or control materials must be assured through available reference measurement procedures and/or available reference materials of a higher order’. 5 This element, by improving a method's comparability with other traceable methods, could substantially modify the situation by allowing an easier and more correct definition of reference intervals.
In this review, after an update of the theory that underlines its critical aspects, the new perspectives and developments in the field of reference intervals are presented.
The theory of reference values
Reference values evolved with laboratory tests. They have no meaning per se, but only when referred to a particular context, usually a physiological situation. Until the 1970s, before Gräsbeck's publications and the work of the IFCC expert panel, the term ‘normal values’ was usually used. The use of the term ‘normal’ has, however, been discouraged because it can assume different meanings: Murphy listed seven of them (statistical, i.e. a Gaussian distribution; most representative of a class; most commonly present in a class; most suited for survival; that which does no harm, in medicine; conventional; ideal). 7 Thus, the term may be subjective and ambiguous; moreover, it implies that everything outside the reference range is ‘abnormal’, which, given the way the interval is calculated, is not true. For these reasons, the IFCC document introduced the term ‘reference values’. 2 Usually these values are health-associated, but they can also reflect specific physiological conditions, like pregnancy, or refer to specific population groups, such as professional athletes. The basic concept is that the values represent a specific population and thus depend on the choice of the subjects from whom they were obtained. First of all, the criteria for the selection of reference individuals have to be clearly defined; those individuals represent the reference population, from which a reference sample group is selected, on which the reference values are measured. The obtained values will assume a certain distribution (the reference distribution) and, by analysing it with appropriate statistical methods, reference limits can be calculated. Conventionally, these limits are set to include 95% of the measured values. The reference interval is defined by the reference limits and includes them. This has a series of consequences:
The reference interval of a specific measurand depends upon the intra- and inter-individual biological variability of the sample group of the reference subjects; thus, the numbers of selected subjects and their partition into subgroups may gain relevant importance;
The preanalytical aspects need to be strictly controlled so as to be reproducible when collecting samples from subjects of the general population;
Analytical aspects are essential. The standardization of the measurements allows comparison of data obtained on different groups of subjects and eventually their application to different populations in places and times different from those where they were obtained;
The method by which reference limits are calculated may modify significantly the results obtained, e.g. if inappropriate statistical models are applied or if the exclusion of outliers is not correctly performed.
Selection of the reference subjects
This represents the starting point. Usually, we select ‘healthy’ individuals, but what exactly do we mean by health? How can we define and recognize it? Several commentaries have been written on this topic. 8–10 The World Health Organization's definition, ‘a state of complete physical, mental and social wellbeing and not merely absence of disease or infirmity’, 11 cannot be a realistic starting point. On the other hand, the concept of health can differ between cultures and countries. In 1975, the Scandinavian Committee on Reference Values tried to define a list of pathological conditions to be excluded in order to consider an individual ‘healthy’. 12 However, applying this recommendation proved impractical, 13 especially when aged subjects were involved. When Horn and Pesce 14 looked at the third National Health and Nutrition Examination Survey (NHANES III), they found that no more than 10% of the subjects aged 70–80 fell into the ‘healthiest’ category.
Consequently, a more pragmatic approach is needed and health should be judged subjectively as the absence of signs of disease specifically related to the measurand(s). 8 As clearly stated in the IFCC document, 2 the first step should be the definition of the scope for use of the reference intervals and, secondly, the definition of the method used to select the reference individuals. The main pathological situations to be excluded are reported in Section 3.3 of the IFCC document. 15 This can be done through an anamnestic questionnaire (an example of such a questionnaire is reported in document C28-A2 of the Clinical and Laboratory Standards Institute (CLSI) 16 ), a physical examination and further investigations (e.g. laboratory tests, imaging studies). The exclusion criteria will depend, of course, on the particular analyte evaluated. For example, to determine a reference interval for haemoglobin or related haematological analytes, it would be wise to exclude subjects with iron deficiency, marked vitamin B12 or folic acid deficiency, inflammation or chronic respiratory disease, tumours, genetic abnormalities of haemoglobin synthesis, etc., as all these factors influence haemoglobin synthesis.
When selecting individuals, it is necessary to take into account all variables that can affect the concentration of the analyte: gender, age, environment, lifestyle, ethnicity, etc. In the example of haemoglobin, in addition to gender and age, stratification according to altitude of residence and smoking habits is also important. All these biological aspects can be used as partitioning criteria. In general, a comprehensive knowledge of physiology (in addition to pathology) is required to determine which factors may be of importance. As already stated, the width of the reference interval is influenced by three sources of variability: the intra- and inter-individual biological variability of the selected reference individuals and the analytical variability of the measurement system. Analytical variability, with the exception of analytes showing very low biological variation, such as electrolytes, usually has minimal influence on reference intervals. The effects of intra- and inter-individual variability are inextricably bound together, and the relative sizes of these two sources of variability can substantially affect the utility of a reference interval as a tool for interpreting an individual result. Harris demonstrated in 1974 that only if the intra-individual coefficient of variation (CVI) is substantially larger than the inter-individual one (CVG) will the distribution of the results in a single individual span the entire range of the reference interval, so that the reference interval can be a useful tool to evaluate that individual's state of health. 17 Unfortunately, this is a relatively uncommon situation; usually the CVI is smaller than the CVG and the range of values of an individual spans only a limited part of the distribution of the values of the reference population. Consequently, in the majority of cases the reference interval is of limited utility in evaluating the results of a single subject, and its sensitivity in detecting abnormal results is low.
Harris demonstrated that only if the CVI/CVG ratio (index of individuality) is >1.4 will the reference interval be a sensitive and useful tool, whereas if the ratio is <0.6 the utility of the reference interval is low. 18 In the latter case, the only way to improve the utility of a reference interval is to increase the ratio and, since the CVI cannot be modified, the only possible intervention is to reduce the CVG by stratifying the individuals into more homogeneous subgroups. 18–20
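As a minimal sketch, Harris' criteria can be expressed in code. The function names and the CVI/CVG figures below are illustrative assumptions, not values taken from the review:

```python
def index_of_individuality(cvi: float, cvg: float) -> float:
    """Index of individuality: ratio of intra-individual (CVI) to
    inter-individual (CVG) biological variation."""
    return cvi / cvg

def reference_interval_utility(ii: float) -> str:
    """Harris' criteria: the reference interval is a sensitive tool when
    the index exceeds 1.4 and of little utility when it is below 0.6."""
    if ii > 1.4:
        return "useful"
    if ii < 0.6:
        return "of limited utility"
    return "intermediate"

# Hypothetical analyte with CVI = 12% and CVG = 5%
ii = index_of_individuality(12.0, 5.0)
print(ii, reference_interval_utility(ii))  # 2.4 useful
```

In the far more common case where CVI is much smaller than CVG (ratio < 0.6), the code returns "of limited utility", mirroring the argument for stratifying into more homogeneous subgroups.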
A priori vs. a posteriori selection
Two factors essentially drive the choice between deciding in advance which individuals to select and how to partition them (a priori criterion), and collecting a large number of subjects, analysing them and deciding thereafter which are to be kept in the reference population and how to partition them (a posteriori criterion): first, knowledge of the biology of the analyte and, secondly, the available resources.
If the biology of the analyte is well known, the a priori approach is the most convenient one. This approach is recommended in the IFCC document on selection of individuals. 16 The same document however does not explicitly exclude the possibility of utilizing the existing data, primarily not collected for the scope of defining reference intervals, quoting Martin et al. 21
Indirect reference values
The use of existing databases containing thousands or even millions of patients' records is an exciting opportunity, recently exploited by several authors. 22–30 With appropriate software, it is an inexpensive and relatively rapid procedure. This approach must not be confused with the a posteriori approach, unless the database also contains detailed clinical information. The first papers using this approach were published in the 1960s 31,32 and were based on the postulate that the majority of laboratory results are ‘normal’. Considering the frequency distribution of results, it should be possible to apply statistical procedures to eliminate the extremes of the distribution curve, thereby excluding the less frequent results typical of ‘unhealthy’ subjects. However, care has to be taken, as several relevant limitations weigh against this approach. First, it does not fulfil the fundamental principle of the theory of reference values, which is the careful definition of the characteristics of the reference population. 33 With this approach we know almost nothing about the subjects we are using. Usually, the applied statistical calculation is based on a presumed distribution of the results of the studied population, but in some cases the assumed distribution may not be correct; for instance, in the case of a skewed distribution the usual statistical models fail. 34 Furthermore, there is little or no control over preanalytical variables. Finally, it is very difficult to demonstrate the metrological traceability of the obtained results; consequently, the observed intervals might be applicable only in the laboratory that produced them and cannot be adopted for use elsewhere. 35 In conclusion, even if information technology provides us with a powerful means of calculation, the data mining approach cannot be endorsed as the best way of defining reference intervals.
Only if the original data are obtained with carefully controlled methodology, the laboratory is able to provide traceable results and reliable clinical data are available can this approach be adopted. In the majority of cases, it can only represent a means to confirm and validate the findings obtained with the more scientifically sound a priori selection.
Preanalytical aspects
The samples for reference interval studies must be collected under conditions representative of those used in clinical practice. 36 Unfortunately, in clinical practice the preanalytical phase is usually poorly standardized. For this reason, when performing a reference interval study it is essential to accurately define and describe the preanalytical conditions to allow others to reproduce the same situation and to understand the effects of certain factors (e.g. the collection device or the posture of the individual). Table 1 shows the most important preanalytical conditions to be taken into account when a blood analyte is evaluated. For body fluids other than blood it may be necessary to include different factors, while, for specific analytes, more information may be needed (e.g. emotional stress level for certain hormones).
Main preanalytical factors to be considered in the production of reference values
Analytical aspects
This aspect is neglected in many publications on reference intervals. The IFCC document dealing specifically with this topic gave a series of recommendations for documenting the operating procedures, focusing on how internal quality control should be practiced during the production and application of reference values. 37 These recommendations are, however, useful only if the defined reference intervals are to be applied within the laboratory that defined them, because they allow the baseline conditions to be properly fixed and hence make it possible to understand whether the modification of certain analytical aspects may change the reference intervals. On the contrary, the recommended approach is not very effective in providing procedures that define reference intervals ‘transferable’ to different laboratories.
In 1991, the concepts of reference measurement systems and of the implementation of metrological traceability (defined as a ‘property of a measurement result relating the result to a stated metrological reference through an unbroken chain of calibrations of a measuring system or comparisons, each contributing to the stated measurement uncertainty’ 38 ) were still in an embryonic state. Although reference measurement procedures were already available for some common analytes and the concept of method hierarchy had been introduced more than 10 years earlier, 39 systematic application of these concepts was lacking. Only some years later was the concept of reference measurement systems formalized, based on the implementation of reference measurement procedures, the preparation of reference materials and the identification of reference measurement laboratories. 40 The reference measurement system represents a trueness-based approach in which different commercial methods providing results traceable to the system are able to produce comparable results in the clinical laboratories using these assays. ISO has produced two standards on this concept: ISO 17511:2003 41 and ISO 18153:2003. 42
Only reference intervals obtained with analytical procedures producing results traceable to the same reference measurement system should be transferred between laboratories (see ‘common reference intervals’ section).
Calculation of reference limits
This issue has been of most concern to authors dealing with reference values. There are three main problems: (i) the statistical methodology that provides the most effective way to extrapolate the results obtained on a sample population to the whole population itself; (ii) the partitioning of results among different groups (age, gender, etc.); (iii) the detection and discarding of outliers. Here, we give a brief outline of these aspects, referring the readers to more details from two excellent textbooks: Statistical Bases of Reference Values in Laboratory Medicine by Harris and Boyd 43 and Reference Intervals. A User's Guide by Horn and Pesce. 44
Statistical methods
Harris and Boyd 43 refer to the approach by Wootton et al., 45 who in 1951 applied parametric statistics for the first time to the calculation of reference intervals. However, these authors soon realized that this statistical model was applicable only in a minority of situations and, two years later, they proposed logarithmic transformation of data to achieve a Gaussian-like distribution. 46 However, the incorrect practice of defining the reference interval as the mean ± 2SDs, without any preliminary verification of the shape of the distribution of the data, has unfortunately continued for many years and is still sometimes used today.
Several publications on the use of fractiles for the definition of reference intervals appeared in the 1970s and 1980s, 47–50 but the milestone is represented by an IFCC publication in 1987. 51 This document clearly defined a number of elements that, even more than 20 years later, remain valid. First, it promoted the (arbitrary) choice of using the central 95% of the distribution for the reference interval calculation. If we exclude a few dissenting opinions, such as that of Jørgensen et al., 52 who proposed widening the interval to include 99.8% of the observed data to reduce false-positives in cases where a large battery of tests is requested, this approach is still widely accepted today. Secondly, the IFCC document recommended that reference limits should always be presented together with their 90% CIs. The width of the CIs decreases as the number of evaluated subjects increases and represents a reliable indicator of the uncertainty of the reference limits. The document also recommended the use of a non-parametric statistical method to calculate the reference limits. Even if parametric methods are theoretically more reliable, particularly if the sample of subjects is small, the uncertainty about the real ‘Gaussianity’ of the original distribution (or after its transformation) increases the uncertainty of the final estimate. A proposal for a simple and effective way to calculate reference limits was also included. This method is based on the calculation of the 0.025 and 0.975 fractiles. The α fractile cannot be calculated unless α is at least 1/N, where N is the sample size; thus, the determination of the 0.025 and 0.975 fractiles requires at least 40 values (α = 1/40 = 0.025). However, with only 40 subjects, the minimum and maximum values represent the lower and the upper limit of the reference interval and it is therefore impossible to estimate their CIs.
To calculate the uncertainty around the limits at least 120 subjects are needed: in this case, when the data are arranged in increasing order, the 2.5th centile is the third value in the series and the 97.5th centile is the 118th, while the 90% CI for the lower limit spans from the first to the seventh value and that for the upper limit spans from the 114th to the 120th value. The IFCC recommendation to use a minimum of 120 individuals per class derives from these considerations. Theoretically, this approach should be easy to apply, but the relatively high number of individuals to be enrolled may create practical difficulties, e.g. for paediatric populations, for expensive tests or for analytes with age and gender dependence that require partitioning among different classes. The number of subjects can be reduced by using parametric statistics, but this requires the data to have a Gaussian distribution. As an alternative, Horn et al. 53,54 proposed a ‘robust method’ based on the transformation of the original data according to Box and Cox, 55 followed by a relatively complex algorithm, based on robust indicators, able to provide correct answers even in less-than-ideal situations. This ‘robust’ algorithm gives different weights to the data, depending upon their distance from the mean. This approach should allow estimation of correct reference limits with samples of only 20 subjects. To calculate the 90% CIs around the limits, it is possible to use the so-called ‘bootstrap’ methodology. With this methodology, observations are ‘resampled’, with replacement, from the data, creating a ‘pseudosample’. From each pseudosample the reference interval is derived. This process is repeated a large number of times (1000–2000), yielding a distribution of upper and lower reference limits. From this distribution, the 5th and the 95th quantiles may be used to determine the 90% CI for each limit.
A critical drawback of this approach is that the 90% CIs can be very wide if the sample size is small (at least 80 individuals are needed to obtain acceptably small 90% CIs). Regression analysis has been proposed as an alternative technique to deal with small sample sizes and has been applied to the determination of age-dependent reference intervals. 56–58
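The non-parametric procedure and the bootstrap CIs described above can be sketched as follows. This is a simplified illustration, not the IFCC protocol itself; the function names and the rank convention round(α·(N+1)) are assumptions, chosen so that with N = 120 the limits fall on the 3rd and 118th ordered values, as in the text:

```python
import random

def nonparametric_limits(values):
    """Non-parametric 95% reference limits (0.025 and 0.975 fractiles),
    using the rank formula round(alpha * (N + 1)) on the sorted data."""
    n = len(values)
    s = sorted(values)
    lower = s[round(0.025 * (n + 1)) - 1]
    upper = s[round(0.975 * (n + 1)) - 1]
    return lower, upper

def bootstrap_ci(values, n_boot=1000, seed=0):
    """90% bootstrap CIs around each reference limit: resample with
    replacement, recompute the limits each time, then take the 5th and
    95th percentiles of the resulting distributions."""
    rng = random.Random(seed)
    lows, highs = [], []
    for _ in range(n_boot):
        pseudo = [rng.choice(values) for _ in values]  # one pseudosample
        lo, hi = nonparametric_limits(pseudo)
        lows.append(lo)
        highs.append(hi)
    lows.sort()
    highs.sort()
    i_lo, i_hi = int(0.05 * n_boot), int(0.95 * n_boot) - 1
    return (lows[i_lo], lows[i_hi]), (highs[i_lo], highs[i_hi])

# With synthetic ranked data 1..120, the limits are the 3rd and 118th values
print(nonparametric_limits(list(range(1, 121))))  # (3, 118)
```

Running the bootstrap on such small samples makes the drawback noted above visible directly: with fewer subjects, the spread of the resampled limits (and hence the 90% CIs) widens markedly.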
Partitioning criteria
As discussed in the section on the selection of reference subjects, the decision of whether or not to separate different groups is extremely important and several statistical methodologies have been proposed to achieve this. An intuitive approach is based on the calculation of the statistical significance of the difference between the mean values of two subclasses. This approach can lead to the identification of different subclasses, even for very small differences that are indeed statistically significant although clinically irrelevant, especially when the number of subjects per class is high. Sinton et al. 59 suggested two classes should be separated only if the difference between the respective means is greater than 1/4 of the interval calculated from 95% of the individuals of the combined distribution. This criterion was originally proposed for specific analytes (i.e. calcium, inorganic phosphate and alkaline phosphatase), but did not allow adequate separation when applied to other analytes. 60
The most popular partitioning method was proposed by Harris and Boyd 43,61 and subsequently endorsed by the CLSI document C28-A2. 16 In their studies, the authors first considered the idea that partitioning should lead to reduced inter-individual variability in the subgroups compared with that of the entire data group. However, they found that a worthwhile reduction in inter-individual variation was hard to achieve, even with large differences between subgroup means. Therefore, they abandoned the goal of inter-individual variability reduction as the basis for establishing partitioning criteria and focused on the proportions of the subgroups outside the reference limits of the entire population and suggested that ‘the problem of whether or not to compute separate pairs of 95% reference limits for subgroups of the population may be reduced to the question: Does a single pair of limits, derived from a combined sample of subpopulations, come close enough to satisfying this criterion of 2.5% below and 2.5% above for each subpopulation?’. 61 The problem is in deciding at which level to set the acceptability limit. Harris and Boyd proposed that if the percentage is higher than 4% or lower than 1%, it is necessary to define different reference limits. This criterion appears valid because it considers not only the means but also the standard deviations of the subgroups, as a different standard deviation by itself may produce different reference limits. Their proposed test consists of two steps: first, evaluation of the difference between the means and secondly, comparison of their standard deviations. Further details and practical examples may be found in the CLSI C28-A2 document. 16 However, this approach works well only with Gaussian distributions and with subclasses of similar size and standard deviation. 60 To overcome these limitations Lahti et al. 
62–64 have proposed a method based on similar concepts, but specifically allowing the estimation of the percentage of subjects in a subclass outside the reference intervals of the entire population in any situation. Following the criteria based on biological variability presented by Gowans et al., 65 they proposed creating a subclass when more than 4.1% or less than 0.9% of the subjects of the subgroup fall outside the limits of the entire group. If the percentage is between 1.8% and 3.2%, they suggest combining the groups. For intermediate (marginal) situations, the choice of whether or not to combine should be based on clinical criteria.
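The decision rule can be sketched in code. This is an illustrative simplification using the thresholds quoted above; the original papers by Lahti et al. contain refinements (e.g. checking each reference limit separately) that are glossed over here, and the function names are mine:

```python
def fraction_outside(subgroup, lower, upper):
    """Fraction of a subgroup's values falling outside the reference
    limits computed on the combined (entire) population."""
    return sum(1 for v in subgroup if v < lower or v > upper) / len(subgroup)

def partition_decision(subgroup, lower, upper):
    """Lahti-style rule: partition if more than 4.1% or fewer than 0.9%
    of the subgroup fall outside the combined limits; combine if the
    fraction is between 1.8% and 3.2%; otherwise the case is marginal
    and clinical criteria should decide."""
    p = fraction_outside(subgroup, lower, upper)
    if p > 0.041 or p < 0.009:
        return "partition"
    if 0.018 < p < 0.032:
        return "combine"
    return "marginal"
```

Note that a subgroup with far *fewer* than the expected 2.5% outside also triggers partitioning: the combined limits are then too wide for that subgroup, which harms sensitivity just as too-narrow limits harm specificity.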
Detection of outliers
Whatever the method used for the calculation of the reference interval, the presence of outliers can significantly modify the limits, even if complex mathematical and statistical methods are applied. 53,54 Thus, correct detection and exclusion of outliers is important. A simple but effective method to detect outliers is visual inspection of the distribution of the data. If an outlier is detected and if there are no obvious reasons to discard it (such as conditions of the subject, analytical problems, calculation or transcription errors), it is useful to apply statistical methods to justify its exclusion.
The most popular statistical method is the one proposed by Dixon, 66 which is based on the D/R ratio, where D is the absolute value of the difference between the outlier and the next or preceding value and R represents the entire range of the observations (maximum–minimum), outlier included. Following Reed et al., 47 the CLSI C28-A2 document proposes one-third as the limit for this ratio. The test is, however, not very sensitive; in particular, when there is more than one outlier, the presence of a less extreme outlier may mask the other(s). In this case, the suggestion is to first evaluate the less extreme suspected outlier and, if the test identifies it as a true outlier, also discard the more extreme one. Horn et al. 67 have proposed a more sophisticated two-step algorithm in which the data are first transformed using the Box and Cox method, 55 to obtain a Gaussian distribution, and outliers are then identified using the Tukey robust approach. 68 This method identifies the extremes using the central 50% of the distribution, thus eliminating the confounding effects of multiple outliers. It involves the computation of the lower and upper quartiles (25th and 75th percentiles) of the transformed data (Q1 and Q3), from which the interquartile range (IQR) (Q3–Q1) is calculated. Finally, the lower and upper ‘fences’ are computed: the lower fence as Q1 – 1.5 × IQR and the upper fence as Q3 + 1.5 × IQR. Any data point outside the fences is considered an outlier and discarded.
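Both outlier checks can be sketched as follows. The Box–Cox transformation step of Horn's algorithm is omitted here (the fences are applied to the raw data), and the linear-interpolation quartile convention shown is one of several in common use:

```python
def dixon_check(values):
    """Dixon D/R test: flag an extreme value when D/R > 1/3, where D is
    the gap between the suspect value and its nearest neighbour and R is
    the full range of the observations (suspect included)."""
    s = sorted(values)
    r = s[-1] - s[0]
    return {
        "max_is_outlier": (s[-1] - s[-2]) / r > 1 / 3,
        "min_is_outlier": (s[1] - s[0]) / r > 1 / 3,
    }

def tukey_fences(values, k=1.5):
    """Tukey fences: points below Q1 - k*IQR or above Q3 + k*IQR are
    treated as outliers, using only the central 50% of the data."""
    s = sorted(values)
    n = len(s)

    def quantile(q):
        # linear interpolation between order statistics
        pos = q * (n - 1)
        i = int(pos)
        frac = pos - i
        return s[i] + frac * (s[min(i + 1, n - 1)] - s[i])

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [10, 11, 12, 13, 100]
print(dixon_check(data))   # {'max_is_outlier': True, 'min_is_outlier': False}
print(tukey_fences(data))  # [100]
```

Because the fences depend only on the quartiles, adding a second extreme value to `data` would not mask the first, which is exactly the robustness advantage over the Dixon ratio described above.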
Current situation and future developments
In 1960, Schneider wrote that ‘…practical medicine is basically founded on comparison. If medicine is to be scientific, we must not only understand the structural, functional and chemical relations operating in individuals, but we must also understand the bases of our comparisons’. 69 From all the issues discussed above, we could conclude that the target that Schneider drew in his publication has been reached. Unfortunately, while the theory is well-defined, its practical application in the everyday life of most clinical laboratories is far from optimal. Laboratories often use different reference intervals without any valid reason, such as variations in analytical methodology or population served. A typical example of the situation can be derived from a survey on the reference intervals used in Italy in 2005 for alanine aminotransferase. 70 In 93 laboratories, all claiming to use the IFCC procedure with pyridoxal phosphate addition (even though on different analytical platforms), the upper reference limit for adult males spanned from 40 U/L to 72 U/L, while the lower limit ranged from 0 U/L to 30 U/L. This was partly related to differences in reference intervals suggested by the manufacturers in their package inserts, although not every laboratory adopted manufacturers' values. This common situation is dangerous and misleading both for clinicians and patients (the same analytical result can be considered ‘normal’ in one laboratory and ‘abnormal’ in another, according to the reference interval in use). Moreover, it hampers the creation of common databases, i.e. the combination of data from different laboratories.
The reasons for these differences are multifactorial; they include the adoption of literature data or manufacturers' values without any critical appraisal and changes in analytical methodology that are not accompanied by corresponding changes in reference intervals. Establishing reference intervals based on the laboratory's own served population is a very costly and demanding process, requiring recruitment of appropriate reference individuals. Laboratories have easy access to pathological samples but rarely to samples from apparently healthy subjects. For certain types of samples, e.g. from paediatric subjects, access to healthy individuals is particularly difficult, since ethical issues may prevent phlebotomy simply for establishing reference intervals. Furthermore, it is very time-consuming to establish reference intervals for all analytes and to repeat the work for any change in methods or analytical systems.
A possible alternative to overcome this situation is the development and implementation of ‘common’ reference intervals.
Common reference intervals
The basis for adopting common reference intervals is simple: if analytical methods are the same or yield comparable results because they are correctly standardized, and the population has the same characteristics or, alternatively, it is known that the specific analyte is not influenced by ethnicity or environment, then the same, i.e. common, reference intervals can be used. Unfortunately, the practical application of this simple concept is not as easy as it would appear. A number of prerequisites, summarized in Table 2, need to be in place before it can be adopted.
Necessary pre-requisites for production and use of common reference intervals
IFCC, International Federation of Clinical Chemistry; JCTLM, Joint Committee on Traceability in Laboratory Medicine; EQAS, External Quality Assessment Scheme
Establishing common reference intervals
Assuming that, for a given analyte, a reference measurement system exists, the most demanding task in producing common reference intervals which can be adopted by any clinical laboratory operating under similar preanalytical and analytical conditions is the definition of an adequate set of reference values. This should include subjects from different ethnic groups and from various environments in order to document whether clinically significant differences exist, which would prevent the use of common reference intervals. The best way to obtain this information is to conduct a multicentre study involving clinical laboratories in different regions or countries. This approach has been pursued in Spain 71–75 and further developed in the Nordic countries. 76–80 In particular, it requires:
An a priori selection of reference individuals according to well-defined criteria, as specified before. The number of participating centres and enrolled individuals should be determined according to the number of subjects required for partitioning by age, gender, race, lifestyle, etc. To obtain sufficiently narrow CIs for the reference limits, the optimal number of individuals within each group should be around 500, 78 while 120 is the minimal sample size allowing non-parametric calculation of the confidence limits. Partitioning should follow the criteria of Lahti et al.; 62–64
A clear definition of the preanalytical phase. Ideally, to reproduce sample handling within clinical laboratories, the analyses should be performed on fresh samples. However, to reduce analytical variability it is usual to freeze the samples and analyse them in a single batch or a minimal number of batches. This approach is only acceptable if it has been demonstrated that freezing does not affect the analyte. If sample stability is confirmed, storage of additional aliquots for further use is highly recommended;
The use of methods providing results traceable to the reference measurement system and with high interlaboratory comparability. Traceability to the reference measurement system must be verified through the use of two or more commutable materials (e.g. frozen pools) with values assigned by the reference measurement procedure, ideally by a number of reference laboratories. Interlaboratory comparability must be checked using common quality control materials. An internal quality control programme must be implemented in each participating laboratory, with clearly defined a priori criteria for acceptance or rejection of each analytical run;
Proper data analysis for the calculation of reference limits. The data from the different centres must be compared to identify the presence of any analytical bias (from the quality control data) or atypical distribution of the reference values. Finally, before calculating reference limits, possible outliers must be detected and eliminated.
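The data-analysis step above can be sketched in code. The following is a minimal illustration, not the procedure used in any of the cited studies: it assumes Tukey's fences (one simple, widely used criterion) for outlier removal and the rank-based non-parametric estimate of the 2.5th and 97.5th percentiles, with the minimum of 120 reference values mentioned above. All numbers are simulated.

```python
import numpy as np

def tukey_outlier_filter(values, k=1.5):
    """Drop values outside Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return values[(values >= lo) & (values <= hi)]

def nonparametric_reference_interval(values):
    """Rank-based 2.5th and 97.5th percentiles (requires n >= 120)."""
    values = np.sort(np.asarray(values, dtype=float))
    if values.size < 120:
        raise ValueError("at least 120 reference values are needed")
    return np.percentile(values, 2.5), np.percentile(values, 97.5)

rng = np.random.default_rng(42)
data = rng.normal(100, 10, 500)     # simulated reference values, n ~ 500
clean = tukey_outlier_filter(data)  # discard possible outliers first
lower, upper = nonparametric_reference_interval(clean)
print(f"reference interval: {lower:.1f} - {upper:.1f}")
```

In a real multicentre study the between-centre comparison (checking for analytical bias and atypical distributions) would precede this pooled calculation.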
Adopting common reference intervals
In order to be able to apply common reference intervals, a clinical laboratory has to verify the similarity of the preanalytical conditions to those adopted in the production of the intervals, the performance of the analytical system employed, and the characteristics of the population served.
Preanalytical conditions
The reference intervals can only be used if the same preanalytical conditions are applied (e.g. specimen type, fasting subjects, etc.), or if it is possible to demonstrate that any introduced modification has no significant effect, e.g. demonstrated equivalence between results obtained with heparin plasma and serum samples, analyte concentrations are not modified by meals, etc.
Analytical aspects
The method in use must produce results traceable to the reference measurement system for that specific analyte. For European countries, if the analytical system is ‘CE-marked’ it should be used according to the manufacturer's specifications. However, even though the European Directive on IVD medical devices 5 stipulates traceability as an essential requisite, a number of routine analytical systems may still be significantly biased when compared with the internationally accepted reference systems, as was recently demonstrated for the measurement of some enzymes. 81
The analytical quality of the method in use should be controlled in order to keep its total error within stated limits. Targets for allowable total error can be derived from the criteria related to biological variability. 82 A list of estimated within-subject and between-subject biological variations and analytical quality specifications can be found at Westgard's website. 83 The magnitude of the total error can be checked through participation in External Quality Assessment Schemes (EQAS), provided that the control samples are commutable and their target values are assigned by laboratories using reference methods.
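The derivation of allowable total error from biological variability can be illustrated with the commonly used desirable performance formulas (imprecision ≤ 0.5 CVw; bias ≤ 0.25 √(CVw² + CVg²); total error at the 95% level). The ALT variation figures below are approximate values of the kind tabulated in published biological-variation lists, used here only as an example.

```python
import math

def desirable_specs(cv_within, cv_between):
    """Desirable analytical performance specifications derived from
    biological variation (CVw = within-subject CV, CVg = between-subject
    CV, all expressed in %)."""
    cv_a = 0.5 * cv_within                                 # imprecision
    bias = 0.25 * math.sqrt(cv_within**2 + cv_between**2)  # bias
    tea = 1.65 * cv_a + bias                               # total error, 95%
    return cv_a, bias, tea

# Illustrative figures for alanine aminotransferase: CVw ~ 18%, CVg ~ 42%
cv_a, bias, tea = desirable_specs(18.0, 42.0)
print(f"CVa <= {cv_a:.1f}%  bias <= {bias:.1f}%  TEa <= {tea:.1f}%")
```

A laboratory's observed total error, e.g. from EQAS results on commutable samples, can then be compared against the computed TEa target.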
Characteristics of the population served by the laboratory
In a recent publication by Ichihara et al. 84 large between-city differences were demonstrated for several analytes in six Asian cities. If these results are confirmed and the observed differences considered large enough to merit separate reference intervals, the possibility of adopting common reference intervals is reduced to the analytes demonstrating no or low interpopulation variability. 85
In general, if race or life-style are known not to influence reference intervals, it is sufficient to verify the preanalytical and analytical aspects. If ethnicity or lifestyle are known to influence reference intervals or if no information is available, it is advisable that the clinical laboratory validates them on a small sample group derived from its own population, before their adoption. This validation can be done according to the CLSI document C28-A2, paragraph 8.2. 16 The advice is to examine 20 individuals representing the local apparently healthy population and satisfying the selection criteria. After discarding outliers, if no more than two of the 20 tested values fall outside the common interval, it can be adopted. If three or more values fall outside the common reference limits, the experiment should be repeated with another 20 subjects. If no more than two of the 20 repeat values fall outside the common interval, adopt the interval; if three or more values again fall outside, it probably means that the populations differ and a specific reference interval is needed, provided that all the preanalytical and analytical aspects are controlled. This type of binomial test works well if the reference values have a Gaussian-like distribution, but it is very insensitive if the distribution is skewed. In the latter case, more powerful statistical tests should be carried out, e.g. the Kolmogorov-Smirnov test which compares the full dataset from tested reference individuals with the 20 reference specimens for a given laboratory. A further approach may be the calculation of reference intervals from the values of 20 subjects using a robust statistical algorithm, like that proposed by Horn et al. 53,54 to check whether the obtained experimental limits are within the confidence limits of the common reference limits. 
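The 20-subject transference check described above is simple enough to express directly. The sketch below implements the decision rule as stated (adopt if no more than two of 20 values fall outside; otherwise repeat with another 20); the local values and the 10–40 U/L candidate interval are hypothetical.

```python
def validate_common_interval(local_values, lower, upper):
    """CLSI C28-A2-style transference check: test 20 local, apparently
    healthy subjects against a candidate common reference interval.
    Returns 'adopt' if no more than 2 of the 20 fall outside, otherwise
    'repeat' (a second group of 20 should then be examined)."""
    if len(local_values) != 20:
        raise ValueError("the check is defined for exactly 20 subjects")
    outside = sum(1 for v in local_values if v < lower or v > upper)
    return "adopt" if outside <= 2 else "repeat"

# Hypothetical local results for an analyte with a candidate common
# reference interval of 10-40 U/L (one value, 41, falls outside).
local = [12, 15, 18, 22, 25, 27, 30, 31, 33, 35,
         36, 37, 38, 39, 14, 16, 19, 21, 24, 41]
print(validate_common_interval(local, 10, 40))
```

As noted in the text, this binomial rule presupposes prior outlier removal and loses power for skewed distributions, where a Kolmogorov-Smirnov comparison against the full reference dataset is preferable.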
Finally, an alternative approach could be the application of one of the previously described statistical methods for data mining to the laboratory's stored data. 24,31,32 The comparison of the reference limits obtained, with those of the proposed common reference interval can allow judgement of their applicability.
The approach described for adopting common reference intervals is not straightforward. Development of reference measurement systems and compliance by manufacturers with calibration traceability can be a slow process. Establishing robust reference intervals is time-consuming and expensive. Clinical laboratories are usually disinclined to modify reference intervals as this is a demanding task, which also requires education of clinicians and patients. Large multicentre studies are needed for the correct definition of common reference intervals, at least for certain analytes, in order to make real progress in this field and bridge the large gap existing between sound theory and poor practice.
Adoption of validated reference intervals
If all the previously defined organizational, preanalytical and analytical requirements are fulfilled not only can reference intervals obtained experimentally in multicentre studies be adopted as common reference intervals, but also reference intervals defined by a single laboratory. The IFCC Committee on Reference Intervals and Decision Limits (C-RIDL) has recently published a paper on the validation of already published reference intervals for creatinine in serum. 58 Obviously, when adopting a validated reference interval developed in a single centre, the preliminary verification of the interval on the local population acquires greater importance.
Reference intervals vs. decision limits
The main characteristics related to these two concepts are reported in Table 3. Some confusion can arise from the use of previously defined reference intervals as decision limits in specific circumstances, e.g. in the screening of blood donors, all subjects above the upper reference limit for alanine aminotransferase could be excluded from donation. However, while reference intervals describe the biological characteristics of a well-defined (usually apparently healthy) population, decision limits depend upon the diagnostic question and are obtained from specific clinical studies in order to define the probability of the presence of a certain disease or of a different outcome. The decision limits selected are usually based on the degree of overlap of the two populations (diseased/non-diseased) and on the desired levels of clinical sensitivity and specificity. The purpose of these limits, as the word itself suggests, is to lead to decisions: individuals with values above or below the decision limit should be treated differently.
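The trade-off between sensitivity and specificity can be made concrete with a small sketch. One common (though not the only) way to pick a decision limit from two overlapping populations is to maximize Youden's J = sensitivity + specificity − 1 along the ROC curve; the populations below are simulated, not drawn from any study cited here.

```python
import numpy as np

def youden_cutoff(diseased, healthy):
    """Pick the decision limit maximizing Youden's J = sensitivity +
    specificity - 1, scanning candidate cut-offs over the pooled values
    (assumes higher values indicate disease)."""
    pooled = np.unique(np.concatenate([diseased, healthy]))
    best_cut, best_j = None, -1.0
    for cut in pooled:
        sens = np.mean(diseased >= cut)   # true-positive rate at this cut
        spec = np.mean(healthy < cut)     # true-negative rate at this cut
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

rng = np.random.default_rng(0)
healthy = rng.normal(30, 8, 400)    # simulated non-diseased values
diseased = rng.normal(60, 12, 400)  # simulated diseased values
cut, j = youden_cutoff(diseased, healthy)
print(f"decision limit ~ {cut:.1f}, Youden J = {j:.2f}")
```

In practice the cut-off is often shifted away from the Youden optimum when the clinical cost of a false negative differs from that of a false positive.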
Differences between reference intervals and decision limits
ROC, receiver-operating characteristic
Individual reference intervals
Although Harris 86 defined the theoretical basis and the statistical methods for individual reference intervals more than 30 years ago, their implementation represents a challenge for the future. Today, the development of information technology allows us to archive a huge amount of data and to retrieve and process them rapidly. On the other hand, implementation of traceability concepts and the consequent improvement of assay standardization will increase result stability and comparability over time and location for many analytes. These premises will permit transformation of theory into practice. The experimental model is quite simple and requires the collection of several samples from the same individual during a period of stable health. The results of measurements on these samples for a given analyte will produce a temporal series, forming a baseline against which future results will be judged. A fundamental issue is the number of samples needed to define the baseline value with acceptable approximation. This depends upon the biological variability of the analyte, its analytical reproducibility and the applied mathematical models. 87 In clinical practice, this approach is already used in doping control programmes, in which the baseline values of haematological parameters for athletes are recorded and individual reference intervals calculated. This allows detection of the use of illegal substances causing significant changes in the individual's haematological analytes. 88,89
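The baseline idea described above can be sketched numerically. The code below is a deliberately simplified illustration of an individual reference interval: the homeostatic set point estimated as the mean of serial results, with limits at ±z standard deviations of the series (the observed SD combines within-subject biological and analytical variation). Harris's full treatment 86,87 addresses the required number of samples and more refined models; the haemoglobin series is hypothetical.

```python
import numpy as np

def individual_reference_interval(baseline_results, z=1.96):
    """Estimate a subject-specific interval from serial results collected
    during a period of stable health: mean +/- z * SD of the series."""
    x = np.asarray(baseline_results, dtype=float)
    if x.size < 5:
        raise ValueError("too few baseline samples for a stable estimate")
    mean, sd = x.mean(), x.std(ddof=1)
    return mean - z * sd, mean + z * sd

# Hypothetical haemoglobin series (g/L) from one athlete in stable health
baseline = [148, 151, 146, 150, 149, 147, 152]
lo, hi = individual_reference_interval(baseline)
print(f"individual interval: {lo:.1f} - {hi:.1f} g/L")
```

A future result falling outside this personal interval flags a change for that individual, even when it remains well inside the population-based reference interval.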
Conclusions
Even though a large number of publications on the topic of reference intervals already exist, it is clear that much work is needed to reach an optimal situation. In general, modifying existing reference intervals is always a delicate task, requiring a commitment to inform clinicians and their patients. The definition of common reference intervals will hopefully reduce significantly the number of different reference intervals employed for the same analyte, providing the clinician with more congruent and effective information. Laboratorians need to increase their efforts in these areas, trying to overcome the undoubted practical difficulties and the inactivity that has sometimes characterized the past. Otherwise, improvements in the theory will not be translated into clinical practice and patients will not obtain the expected benefits.
Footnotes
Acknowledgements
F. Ceriotti has been the chair of the IFCC Scientific Division Committee on Reference Intervals and Decision Limits (C-RIDL) since its creation in 2005. R. Hinzmann was the liaison between the Scientific Division Executive Committee and C-RIDL between 2005 and 2007.
