Abstract

Few patients being assessed and treated in acute situations in hospitals have only one set of laboratory investigations undertaken. While the more esoteric and specialist investigations may be done less frequently, readily available tests in laboratory medicine such as urea and electrolytes, liver function tests, bone studies and full blood count are often requested on a daily basis, if not more often. Although this practice is often considered inappropriate and wasteful, it does give serial results on individuals: these should be used to add value in both laboratory and clinical management.
Although fixed numerical decision limits (based on risk, for example) have many advantages and are increasingly applied throughout medicine, laboratories fulfil accreditation and other requirements by reporting population-based reference intervals, sometimes partitioned by age, gender or other important factors, with almost every test result. These intervals remain the basis for interpreting numerical results when no previous data are available on an individual, as in some diagnostic and screening settings. However, it is now well recognized that within-subject biological variation (CVI) is much smaller than between-subject biological variation (CVG) for nearly all quantities assayed in laboratory medicine. 1 As a result, traditional reference intervals have limited value in diagnosis and screening: many patients will have results that are highly unusual for them but that still lie within the reference interval, irrespective of whether the interval was generated by the individual laboratory, transferred, validated or harmonized. Conventional reference intervals are of even more limited use in assessing serial results from an individual. Each individual has values that span only part of the conventional reference interval. Consequently, an individual's results can change significantly while all of them remain within the reference interval. In addition, results can move from inside the interval to outside (and vice versa) without the change being significant. 2 Better use must therefore be made of differences in serial laboratory results.
In this issue, Garner et al. 3 report an investigation of the detection of acute kidney injury (AKI) in hospital patients, comparing three of the proposed AKI definitions and a very simple delta check based on serial serum creatinine results. Unsurprisingly, the different definitions detected different populations of patients, although all of them rely on rises in serum creatinine concentration in an individual. However, the laboratory delta check detected 98% of all the patients identified by the Acute Kidney Injury Network (AKIN), Risk, Injury, Failure (RIFLE) and Waikar & Bonventre strategies combined, and the authors therefore suggested that the delta check could provide a practical way of detecting patients with AKI. This work could be emulated for other quantities for which differences in serial results are of clinical importance.
Delta check functionality is embedded in most laboratory information management systems (LIMS). Although there are a number of ways to select delta check values, 4 they are usually set somewhat empirically. The aim is to detect major blunders, such as samples from different patients submitted for analysis under a single identifier, without flagging so many cases that professionals in laboratory medicine spend their time on further investigations that are often inconsequential. Garner et al. 3 selected a delta check value of an increase of 26 μmol/L in serum creatinine, based on the AKIN and Waikar & Bonventre criteria for the diagnosis of Stage 1 AKI, and this simple approach did appear to be potentially clinically useful. Others have proposed similarly simple numerical criteria for interpreting differences in serial results: recent examples are that reductions in serum monoclonal protein of at least 25% and 50% are considered minimal and partial responses to treatment, respectively, 5 and that a change of 20% is significant for serum troponin. 6 The question arises as to whether such criteria should be used as delta check values in the laboratory, with differences greater than the delta check values notified to users in some way to indicate that they are of potential clinical interest. Garner et al. 3 did not address how information on the patients identified by the delta check should be communicated. A perhaps more interesting question is whether there are better, more scientific ways to set criteria for interpreting differences in serial results, and whether these criteria can then be applied as delta check values and for other purposes in laboratory medicine.
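The fixed criteria quoted above can be written as simple rules. The following is a minimal sketch under stated assumptions, not the implementation used by Garner et al. or in any LIMS: the function names are hypothetical, the percentage difference is taken relative to the earlier result, and the time interval between samples is ignored.

```python
def percent_change(previous: float, current: float) -> float:
    """Difference between serial results as a percentage of the earlier result."""
    return 100.0 * (current - previous) / previous


def creatinine_delta_flag(previous_umol_l: float, current_umol_l: float,
                          rise_limit_umol_l: float = 26.0) -> bool:
    """True when serum creatinine has risen by at least the delta check value."""
    return current_umol_l - previous_umol_l >= rise_limit_umol_l


def troponin_change_flag(previous: float, current: float,
                         limit_percent: float = 20.0) -> bool:
    """True when troponin has changed (up or down) by at least limit_percent."""
    return abs(percent_change(previous, current)) >= limit_percent


def monoclonal_protein_response(previous: float, current: float) -> str:
    """Classify a fall in serum monoclonal protein: >=50% partial, >=25% minimal."""
    reduction = -percent_change(previous, current)  # positive when the result falls
    if reduction >= 50.0:
        return "partial response"
    if reduction >= 25.0:
        return "minimal response"
    return "no response by these criteria"


if __name__ == "__main__":
    print(creatinine_delta_flag(80.0, 110.0))     # rise of 30 umol/L -> True
    print(troponin_change_flag(50.0, 60.0))       # +20% -> True
    print(monoclonal_protein_response(10.0, 6.0)) # 40% reduction -> minimal response
```

Each rule compares a difference with a fixed number chosen in advance; nothing in it reflects the analytical or biological variation of the quantity, which is the gap the RCV approach addresses.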
Garner et al. 3 discuss the use of reference change values (RCV) and consider that RCV could potentially provide a more efficient means of detecting patients with AKI. Differences in serial results from an individual may be due to the individual improving or deteriorating clinically, but also to three inherent sources of variation, namely, preanalytical variation (CVP), analytical imprecision (CVA) and CVI. The basis of the RCV as an aid to interpretation is that, for a difference in serial results to be significant, it must be greater than this inherent variation. The threshold is the RCV (sometimes termed the critical difference), which is calculated as $\mathrm{RCV} = 2^{1/2} \times Z \times (\mathrm{CV}_{\mathrm{A}}^2 + \mathrm{CV}_{\mathrm{I}}^2)^{1/2}$, where Z is the number of standard deviations appropriate to the desired probability, and CVP is considered to be minimized by good laboratory practice. RCV are very simple to calculate, since all laboratories know the CVA of every method in detail from their internal quality control, and estimates of CVI have been documented for many quantities. 7 Estimates of CVI are generally constant over time, geography, methodology and in chronic but stable disease: 8 the CVI of creatinine is 5.3% in health 7 and 5.3%, 5.9% and 6.4% in chronic renal failure, type 1 diabetes mellitus and impaired renal function, respectively. 8 However, Garner et al. 3 are right that it is unclear whether this holds true in acute illness. As their example of variation in serum creatinine following myocardial infarction suggests, CVI is likely to be higher in acute situations than in health. It has been pointed out, however, that an RCV calculated using CVI derived from healthy subjects will simply give what could be termed 'false-positive' results, that is, differences that are not really of major clinical importance will be identified: 4 this is probably a better clinical option than missing important differences because the RCV is larger than appropriate.
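The RCV formula is simple enough to compute directly. This minimal sketch assumes CVA = 5.0% for creatinine (the value this editorial takes from Garner et al. for its worked example) and uses the published CVI estimates quoted above; the function name is hypothetical and CVP is omitted, as in the formula.

```python
import math


def reference_change_value(cv_a: float, cv_i: float, z: float) -> float:
    """RCV (%) = 2**0.5 * Z * (CV_A**2 + CV_I**2)**0.5.

    Preanalytical variation (CV_P) is assumed to be minimized by good
    laboratory practice and is therefore left out.
    """
    return math.sqrt(2.0) * z * math.sqrt(cv_a ** 2 + cv_i ** 2)


if __name__ == "__main__":
    CV_A = 5.0  # assumed analytical imprecision (%) for creatinine
    # Published CVI estimates (%) for creatinine quoted in the text
    cvi_by_state = {
        "health": 5.3,
        "chronic renal failure": 5.3,
        "type 1 diabetes mellitus": 5.9,
        "impaired renal function": 6.4,
    }
    for state, cv_i in cvi_by_state.items():
        rcv = reference_change_value(CV_A, cv_i, z=1.96)  # two-sided, P < 0.05
        print(f"{state}: RCV = {rcv:.1f}%")
```

For creatinine in health this gives an RCV of about 20% at Z = 1.96; the slightly larger CVI estimates in chronic disease raise the RCV only modestly.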
Also important is the selection of the correct Z-score. In the bulk of the literature on the generation and application of data on biological variation (and perhaps much more widely in medicine), 95% probability (P < 0.05) is considered significant. However, other levels of probability may be of real interest. Moreover, it is vital to recognize, as did Garner et al., 3 that the often cited Z-scores of 1.96 for P < 0.05 and 2.58 for P < 0.01 are two-sided and can only be used when both a rise and fall are being considered together, in other words, a change. If the real clinical requirement is the evaluation of a rise or increase, such as in the case of detection of AKI, or a fall, decline, decrease or reduction, then one-sided Z-scores must be used: these are 1.65 for P < 0.05 and 2.33 for P < 0.01. Further information on relationships between differences and probabilities can be gleaned using a simple rearrangement of the RCV equation which makes the Z-score the unknown, that is: $Z = \text{change} / [2^{1/2} \times (\mathrm{CV}_{\mathrm{A}}^2 + \mathrm{CV}_{\mathrm{I}}^2)^{1/2}]$. The probability appropriate for the Z-score can be obtained from standard statistical tables. This approach has been used 4 to create tables and graphs showing probability versus change: an example is shown in a recent investigation on dehydration markers 9 where graphs of probability versus change for plasma osmolality, urine specific gravity and body mass were given together with semantic interpretative anchors on the abscissa, namely, change was likely at P > 0.80, more likely at P > 0.90, very likely at P > 0.95 and virtually certain at P > 0.99.
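The rearranged equation and the one-sided versus two-sided distinction can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the normal cumulative distribution is computed with `math.erf` in place of printed statistical tables, and the inputs are the creatinine values (CVA = 5.0%, CVI = 5.3%) used in this editorial. It prints a small probability-versus-change table with the semantic anchors quoted above attached.

```python
import math


def z_from_change(change_percent: float, cv_a: float, cv_i: float) -> float:
    """Rearranged RCV equation: Z = change / [2**0.5 * (CV_A**2 + CV_I**2)**0.5].

    change_percent is the magnitude (%) of the difference in the direction
    of interest; CV_P is assumed negligible, as in the RCV formula.
    """
    return abs(change_percent) / (math.sqrt(2.0) * math.hypot(cv_a, cv_i))


def normal_cdf(z: float) -> float:
    """Standard normal cumulative probability (replaces a printed Z table)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_change(change_percent: float, cv_a: float, cv_i: float,
                          one_sided: bool = True) -> float:
    """Probability that a difference of this size is significant."""
    z = z_from_change(change_percent, cv_a, cv_i)
    if one_sided:  # evaluating a rise only, or a fall only
        return normal_cdf(z)
    return 2.0 * normal_cdf(z) - 1.0  # a change in either direction


def semantic_anchor(probability: float) -> str:
    """Interpretative anchors quoted in the text."""
    if probability > 0.99:
        return "virtually certain"
    if probability > 0.95:
        return "very likely"
    if probability > 0.90:
        return "more likely"
    if probability > 0.80:
        return "likely"
    return "below the 'likely' anchor"


if __name__ == "__main__":
    # Assumed inputs: creatinine, CV_A = 5.0 %, CV_I = 5.3 %
    for change in (5, 10, 15, 20, 25, 30):
        p = probability_of_change(change, cv_a=5.0, cv_i=5.3)
        print(f"rise of {change:>2}%: P = {p:.3f} ({semantic_anchor(p)})")
```

Passing `one_sided=False` for the same difference always gives a lower probability, which is why two-sided Z-scores understate the significance of a rise when only a rise is of clinical interest.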
It has been proposed previously that RCV can be used for delta checking and autoverification, and it has been demonstrated that LIMS can be adapted to do this. 10 Moreover, serial results showing significant and highly significant differences can be flagged on laboratory reports, 4,10 just like results outside conventional reference intervals. This approach should perhaps be more widely emulated, since it seems the best way to use delta check values of clinical utility, however they are derived: as triggers for flags on laboratory reports that draw clinical attention to a significant difference in serial results, which might indicate improvement or deterioration. Much time and effort is often spent agonizing over the appropriateness of the reference values quoted by laboratories on reports and in handbooks. Since most investigations in laboratory medicine are actually done for monitoring, either in the acute setting in hospitals or in the evaluation of chronic disease over the longer term, surely as much thought and effort should be put into creating the means to inform users of serial test results about the significance of the differences seen.
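A minimal sketch of how RCV-based report flags might be generated from a pair of serial results. The function name, flag wording and direction labels are hypothetical and are not taken from the LIMS described in reference 10; CVA and CVI are supplied by the caller, and the thresholds are the one-sided and two-sided Z-scores given in the text.

```python
import math

# Z-scores quoted in the text: (significant, highly significant), i.e. P < 0.05, P < 0.01
Z_LIMITS = {
    "rise": (1.65, 2.33),    # one-sided
    "fall": (1.65, 2.33),    # one-sided
    "change": (1.96, 2.58),  # two-sided: rise and fall considered together
}


def rcv_flag(previous: float, current: float, cv_a: float, cv_i: float,
             direction: str = "change") -> str:
    """Return a hypothetical report flag for two serial results ('' if none)."""
    if direction not in Z_LIMITS:
        raise ValueError(f"direction must be one of {sorted(Z_LIMITS)}")
    diff_percent = 100.0 * (current - previous) / previous
    # A one-sided check only looks in its own direction.
    if direction == "rise" and diff_percent <= 0:
        return ""
    if direction == "fall" and diff_percent >= 0:
        return ""
    z = abs(diff_percent) / (math.sqrt(2.0) * math.hypot(cv_a, cv_i))
    z_significant, z_highly_significant = Z_LIMITS[direction]
    if z >= z_highly_significant:
        return "highly significant difference (P < 0.01)"
    if z >= z_significant:
        return "significant difference (P < 0.05)"
    return ""


if __name__ == "__main__":
    # Serum creatinine 100 -> 120 umol/L, assuming CV_A = 5.0 % and CV_I = 5.3 %
    for direction in ("rise", "change"):
        flag = rcv_flag(100.0, 120.0, cv_a=5.0, cv_i=5.3, direction=direction)
        print(direction, "->", flag or "no flag")
```

With these inputs, a 20% rise is flagged when a rise is being sought but not when any change is considered, which shows in practice why the choice between one-sided and two-sided Z-scores matters.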
Creatinine has a CVI of 5.3% in health 7 and, following Garner et al., 3 let it be assumed that CVA = 5.0%. Then, using one-sided Z-scores because only rises are of interest, rises of 10%, 20% and 30% are significant at 83.5%, 97.5% and 99.9% probabilities, respectively. Rises to 1.5, 2.0 and 3.0 times baseline, as proposed in the RIFLE and AKIN definitions, are clearly significant at very high levels of probability. However, in these definitions, Stage 3 AKI is defined as a rise in serum creatinine to 3.0 times baseline or, when creatinine is >350 μmol/L, an increase of ≥44 μmol/L from baseline. A rise of 44 μmol/L from a concentration of 350 μmol/L is a difference of just 12.6% and, using the above model, the probability that such a rise is significant is only 89% (P ≈ 0.11). Thus, the alternative criteria in the definitions do not seem to give the same probability that the rise is significant. This probably reflects the somewhat empirical nature of such definitions and highlights the clear need for professionals in laboratory medicine to better educate users in the interpretation of numerical results, in particular, in the use of population-based reference values and RCV, and in the true causes and significance of differences seen in serial results from an individual. Moreover, there appear to be important roles for such professionals to play in generating sound, evidence-based guidelines and recommendations on the interpretation of laboratory data in a wide variety of clinical scenarios.
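The Stage 3 comparison can be set out as a short worked calculation. It uses the same assumed values (CVA = 5.0%, CVI = 5.3%), a one-sided Z because only a rise is of interest, and $\Phi$ for the standard normal cumulative distribution function read from statistical tables; figures are rounded.

$$
\begin{aligned}
2^{1/2} \times (5.0^2 + 5.3^2)^{1/2} &= 2^{1/2} \times 53.09^{1/2} \approx 10.30\% \\
\text{change} &= \frac{44}{350} \times 100 = 12.6\% \\
Z &= \frac{12.6}{10.30} = 1.22, \qquad \Phi(1.22) \approx 0.89 \\
\text{rise to } 1.5 \times \text{baseline: } Z &= \frac{50}{10.30} = 4.85, \qquad \Phi(4.85) > 0.9999
\end{aligned}
$$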
DECLARATIONS
