Statistical validation of reagent lot change in the clinical chemistry laboratory can confer insights on good clinical laboratory practice

Abstract

Verifying the suitability of a new reagent lot is necessary to ensure that results for patients’ samples are consistent before and after a reagent lot change. A typical procedure is to measure some patients’ samples along with quality control (QC) materials. In this study, the results of patients’ samples and QC materials obtained at reagent lot changes were analysed, and a criterion for adjusting the QC target range after a reagent lot change is proposed. Patients’ sample and QC material results from 360 reagent lot change events involving 61 analytes and eight instrument platforms were analysed. The between-lot differences for the patients’ samples (Δ_P) and the QC materials (Δ_QC) were compared using Mann–Whitney U tests. The size of the between-lot differences in the QC data was calculated as a multiple of the standard deviation (SD). The Δ_P and Δ_QC values differed significantly in only 7.8% of the reagent lot change events. This frequency was not affected by the assay principle or the QC material source. A between-lot QC difference of 1 SD is proposed as the cutoff for maintaining the pre-existing target range after a reagent lot change. Although non-commutable QC material results were infrequent in the present study, our data confirmed that QC materials have limited usefulness when assessing new reagent lots.

Keywords

Quality assurance and control; laboratory methods

Introduction

It is essential that clinical laboratories achieve consistent and reliable test results. Tests performed using a modern automated clinical chemistry analyser involve the instrument, the calibrator and one or more reagents. Reagent lot changes occur relatively frequently because large amounts of reagents are consumed over short periods. Therefore, to ensure the consistency of a clinical laboratory’s test results, the results before and after reagent lot changes should be analysed statistically.¹ For this reason, the inspection checklist of the College of American Pathologists (CAP) requires that, when a reagent lot changes, a certain number of patient specimens and 2–3 levels of quality control (QC) materials be tested with the new and old reagent lots.² When the differences between the results measured using the new and old lots meet the laboratory’s acceptance criteria, the test results are considered consistent irrespective of the reagent lot change. Although acceptance criteria vary from laboratory to laboratory, a new lot is generally considered acceptable when (i) the results of the patient specimens obtained with the new lot do not differ from those with the old lot³ and (ii) the QC material values obtained with the new reagent lot fall within the predefined QC material target range of the old reagent lot. The validation process is passed only when both of these criteria are met.

However, there are doubts about the usefulness and significance of the second criterion in lot-to-lot consistency evaluations because QC materials can show ‘non-commutability’, namely, a numeric relationship between test results for QC materials that differs from the relationship observed for patient specimens.¹,⁴–¹³ This non-commutability arises from a matrix effect introduced during the production of QC materials, which use substances whose properties differ from those of the corresponding substances in patient specimens.⁶,¹⁴–¹⁶ Nevertheless, although QC materials cannot completely reflect the characteristics of patient specimens, they are used in the routine daily internal QC practices of clinical laboratories. The target ranges of QC materials therefore need to be adjusted when the QC results measured with a new reagent lot change abruptly because of non-commutability. However, there is no specific guideline on how large a change demands adjustment of the QC material target range.

In this study, we estimated how much the QC materials’ results changed relative to those of patient specimens during reagent lot changes. In addition, we estimated the size of the changes in QC materials’ results caused by reagent lot changes and propose a criterion for adjusting the QC target range.

Materials and methods

The parallel test results of patient and QC samples after 360 reagent lot change events in the clinical chemistry laboratory of the Department of Laboratory Medicine, Asan Medical Center, Seoul from January 2009 to April 2012 were analysed. These reagent lot changes involved eight instrument platforms and 61 analytes. Manufacturers and models of the laboratory measurement procedures used and the manufacturers of the QC materials used for each measurement procedure are listed in Table 1.

Table 1.

Manufacturers and models of the laboratory measurement procedures used, and manufacturers of the QC materials used for each measurement procedure.

Manufacturer	Model of measurement procedures	Analytes	Manufacturer of QC material
Beckman Coulter	DxC	Uric acid	Bio-rad
		Cholesterol	Bio-rad
		Alkaline phosphatase	Bio-rad
		Total bilirubin	Bio-rad
		Direct bilirubin	Bio-rad
		Magnesium	Bio-rad
		Lipase	Bio-rad
		Triglyceride	Bio-rad
		HDL cholesterol	Bio-rad
		CRP	Bio-rad
		hsCRP	Bio-rad
Siemens	Centaur	Myoglobin	Bio-rad
		Troponin-I	Bio-rad
		CK-MB	Bio-rad
		BNP	Bio-rad
		Ferritin	Bio-rad
		Homocysteine	Bio-rad
		Vitamin B12	Bio-rad
		Folate	Bio-rad
Siemens	Viva-E	MPA	Siemens
		NEFA	Bio-rad
		CH50	Bio-rad
Roche	Cobas Integra 800	Carbamazepine	Bio-rad
		Phenobarbital	Bio-rad
		Phenytoin	Bio-rad
		Amikacin	Bio-rad
		Vancomycin	Bio-rad
		Theophylline	Bio-rad
		Digoxin	Bio-rad
		Valproic acid	Bio-rad
		Gentamicin	Bio-rad
		CK	Bio-rad
		Uric acid	Bio-rad
		Iron	Bio-rad
		Lipase	Bio-rad
Siemens	Dimension RXL	Cyclosporin	Bio-rad
Siemens	Dimension RXL	FK506	Bio-rad
Siemens	BN II	IgG	Bio-rad
		IgA	Bio-rad
		IgM	Bio-rad
		IgE	Bio-rad
		C3	Bio-rad
		C4	Bio-rad
		Cystatin C	Bio-rad
		Lp(a)	Siemens
		Ceruloplasmin	Bio-rad
		Haptoglobin	Bio-rad
		Prealbumin	Bio-rad
		RF quantitative	Bio-rad
		Transferrin	Bio-rad
Roche	Cobas C6000	Apo-A	Bio-rad
		Apo-B	Bio-rad
		C-telopeptide	Roche
		Osteocalcin	Roche
		AFP	Roche
		β-hCG	Roche
		CA19-9	Roche
Abbott	Architect	AFP	Abbott
		Free T4	Abbott
		TSH	Abbott
		CEA	Abbott
		CA125	Abbott

AFP: alpha fetoprotein; BNP: B-type natriuretic peptide; CEA: carcinoembryonic antigen; CK: creatine kinase; CK-MB: creatine kinase MB; CRP: C-reactive protein; MPA: mycophenolic acid; NEFA: non-esterified fatty acid; TSH: thyroid stimulating hormone.

In our laboratory, new reagent lots are validated by testing 3–5 patients’ samples and 2–3 levels of QC materials with both the old and new lots. The new lot is considered acceptable when the average Delta % of the patients’ sample results obtained with the new and old lots falls within ±10%. Delta % is calculated as ([new lot result – old lot result] / old lot result) × 100. In addition, the QC materials’ results are regarded as acceptable when the results obtained with the new lot fall within the target range predefined with the old lot. Our laboratory usually establishes the target range as mean ± 2 SD, using the mean and standard deviation (SD) derived from the initial evaluation study.
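
The two acceptance checks described above can be sketched in a few lines of Python. This is a minimal illustration and not the laboratory’s software: the function names and the example values are hypothetical, and treating the ±10% and ±2 SD limits as inclusive is an assumption, since the text does not say how values exactly on a limit are handled.

```python
from statistics import mean

def delta_percent(old, new):
    """Delta %: difference of the new-lot result relative to the old-lot result."""
    return (new - old) / old * 100.0

def patient_criterion_met(pairs, limit=10.0):
    """True when the average Delta % over all (old, new) pairs is within +/- limit.

    Assumption: the limit itself counts as acceptable.
    """
    return abs(mean(delta_percent(old, new) for old, new in pairs)) <= limit

def qc_criterion_met(new_qc, target_mean, sd, k=2.0):
    """True when a new-lot QC result lies within target_mean +/- k * sd."""
    return abs(new_qc - target_mean) <= k * sd

# Hypothetical parallel-test data: (old lot result, new lot result)
patients = [(100.0, 104.0), (50.0, 51.0), (200.0, 196.0)]
accepted = patient_criterion_met(patients) and qc_criterion_met(5.3, 5.0, 0.2)
```

Averaging Delta % across samples, as the text describes, allows positive and negative differences to cancel; a laboratory that wants to guard against that could additionally inspect each sample’s Delta %.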

All patient and QC sample tests with old and new reagent lots were conducted under similar conditions using identical equipment, reagents and manufacturer’s guides.

Statistical procedures

For all 360 reagent lot changes, the results of the patient specimens and QC materials with the old and new lots were natural log-transformed, and the differences between the new and old lots were calculated for each patient sample (Δ_P) and each QC material (Δ_QC). For each reagent lot change event, the difference between Δ_P and Δ_QC was tested for significance using the Mann–Whitney U test, because each event yielded only 3–5 Δ_P and Δ_QC values, well below 30. SPSS version 13.0 (SPSS, Inc., Chicago, IL, USA) was used for this analysis. P values below 0.05 were considered statistically significant.
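
As an illustration of this procedure for a single lot change event, the sketch below computes the log differences and a two-sided exact Mann–Whitney P value by enumerating every way of splitting the pooled values. It is not the SPSS routine used in the study, and whether the authors relied on an exact or an asymptotic P value is not stated; the function names and the data are hypothetical.

```python
import math
from itertools import combinations

def log_differences(pairs):
    """ln(new) - ln(old) for each (old, new) pair of results."""
    return [math.log(new) - math.log(old) for old, new in pairs]

def u_statistic(x, y):
    """Mann-Whitney U for sample x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0
               for xi in x for yj in y)

def mann_whitney_exact(x, y):
    """Two-sided exact P value from all splits of the pooled values."""
    pooled = list(x) + list(y)
    n, total_n = len(x), len(x) + len(y)
    centre = n * len(y) / 2.0                  # expected U under the null
    observed = abs(u_statistic(x, y) - centre)
    extreme = splits = 0
    for idx in combinations(range(total_n), n):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(total_n) if i not in chosen]
        splits += 1
        if abs(u_statistic(gx, gy) - centre) >= observed - 1e-12:
            extreme += 1
    return extreme / splits

# Hypothetical event: five patient samples and three QC levels, (old, new)
patient_pairs = [(100.0, 101.0), (50.0, 50.8), (200.0, 203.0),
                 (75.0, 75.5), (120.0, 121.5)]
qc_pairs = [(40.0, 43.0), (90.0, 97.0), (150.0, 160.0)]
p_value = mann_whitney_exact(log_differences(patient_pairs),
                             log_differences(qc_pairs))
# Every QC difference exceeds every patient difference, so p_value = 2/56
```

With groups this small the exact test has a limited set of attainable P values: five against three values can reach at most 2/56 (about 0.036), which is worth keeping in mind when interpreting a 0.05 threshold.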

The 28 reagent lot changes associated with a significant difference between Δ_P and Δ_QC (Table 2) were then examined more closely. For each of these events, the patient or QC result with the old lot was subtracted from the corresponding result with the new lot to yield D_P or D_QC, respectively, which could be a positive or negative number. The D_P and D_QC of each of the 28 reagent lot changes were then compared to see whether they were both positive or both negative (i.e. the changes were in the same direction) or whether one was positive while the other was negative (i.e. the changes were in opposite directions).

Table 2.

Number of events with significant or non-significant differences between the results of QC materials and those of patients’ samples after changes in reagent lots for analytes grouped by category of analytical measurement procedure and source of QC material.

	All data	Measurement procedure		Source of QC material
	n = 360 Reagent lot change-QC sample events (%)	n = 171 General chemistry (%)	n = 189 Immunoassay (%)	n = 287 Third party QC (%)	n = 73 QC provided by method manufacturer (%)
Significant difference	28 (7.8%)	14 (8.2%)	14 (7.4%)	25 (8.7%)	3 (4.1%)
Non-significant difference	332 (92.2%)	157 (91.8%)	175 (92.6%)	262 (91.3%)	70 (95.9%)

In total, there were 962 D_QC results in the 360 reagent lot change events. For each of these, the size of D_QC was calculated by dividing it by the SD used in the predefined criteria for daily QC tasks and parallel testing, so that D_QC was expressed as a multiple of SD. These multiples of SD were then classified into five categories: <0.5, 0.5–1.0, 1.0–1.5, 1.5–2.0 and >2.0. The QC results in each category were assessed.
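
The classification step can be sketched as follows. The text does not say whether the sign of D_QC was retained or how values falling exactly on a category boundary were assigned, so two assumptions are made here: the absolute value of D_QC is used, and each boundary value goes to the higher category, except 2.0, which stays in 1.5–2.0 because the last category is strictly >2.0. Function names and records are hypothetical.

```python
from collections import Counter

def sd_multiple(old_qc, new_qc, sd):
    """Absolute between-lot QC difference in units of the QC SD (assumption)."""
    return abs(new_qc - old_qc) / sd

def categorise(multiple):
    """Assign a multiple of SD to one of the five categories used in the text."""
    if multiple < 0.5:
        return "<0.5"
    if multiple < 1.0:
        return "0.5-1.0"
    if multiple < 1.5:
        return "1.0-1.5"
    if multiple <= 2.0:   # assumption: exactly 2.0 belongs to 1.5-2.0
        return "1.5-2.0"
    return ">2.0"

def tabulate(qc_records):
    """Count (old result, new result, SD) QC records per category."""
    return Counter(categorise(sd_multiple(old, new, sd))
                   for old, new, sd in qc_records)

# Hypothetical QC records: (old lot result, new lot result, SD of target range)
records = [(10.0, 10.1, 0.5), (10.0, 10.4, 0.5), (10.0, 9.4, 0.5),
           (10.0, 11.0, 0.5), (10.0, 11.5, 0.5)]
counts = tabulate(records)
```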

Results

As presented in Table 2, Δ_P and Δ_QC differed significantly (P < 0.05) in 28 of the 360 reagent lot changes (7.8%). The 360 reagent lot changes involved 171 general chemistry tests and 189 immunoassays. Δ_P and Δ_QC differed significantly in 14 (8.2%) and 14 (7.4%) of these test types, respectively. The two test types did not differ in the frequency of significant differences between Δ_P and Δ_QC (P = 0.937). In addition, the QC material was provided by the method manufacturer in 73 of the reagent lot change events and by a third party in 287. Δ_P and Δ_QC differed significantly in 3 (4.1%) and 25 (8.7%) of these lot change events, respectively. However, the two sources of QC material did not differ significantly in the frequency of significant differences between Δ_P and Δ_QC (P = 0.229).
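
The text does not name the test behind these two P values. Purely as an illustration of how such a comparison of frequencies can be made, the sketch below applies a chi-squared test with Yates’ continuity correction to the 2 × 2 counts in Table 2. The choice of test and the function name are assumptions, not the authors’ stated method.

```python
import math

def yates_chi2_p(a, b, c, d):
    """Two-sided P for the 2x2 table [[a, b], [c, d]] with Yates' correction.

    With 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    corrected = max(abs(a * d - b * c) - n / 2.0, 0.0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2.0))

# Counts from Table 2: (significant, non-significant) per group
p_assay = yates_chi2_p(14, 157, 14, 175)   # general chemistry vs immunoassay
p_source = yates_chi2_p(25, 262, 3, 70)    # third party vs method manufacturer
```

Under this assumption the assay-principle comparison gives a P value of about 0.94, in line with the reported 0.937. The QC-source comparison need not reproduce the reported 0.229 exactly, since a different test (for example, an exact test) may have been used there.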

The 360 reagent lot change events were then analysed more closely by calculating D_QC and D_P and determining the direction of change (Table 3). Among the 28 events associated with significant differences between Δ_P and Δ_QC, D_QC and D_P changed in the same direction in 12 events (42.9%) and in opposite directions in the remaining 16 events (57.1%). In the 332 reagent lot change events that were not associated with significant differences between Δ_P and Δ_QC, D_QC and D_P changed in the same direction in 220 events (66.3%) and in opposite directions in the remaining 112 events (33.7%). However, these frequencies did not differ significantly from the frequencies of the 28 events showing significant differences between Δ_P and Δ_QC.

Table 3.

Comparison of the trends of D_QC and D_P in reagent lot-to-lot change tests.

	Reagent lot change events with a significant difference between Δ_P and Δ_QC n = 28	Reagent lot change events without a significant difference between Δ_P and Δ_QC n = 332
Comparison of the trends of D_QC and D_P	No. of QC events (%)	No. of QC events (%)
The same	12 (42.9)	220 (66.3)
Different	16 (57.1)	112 (33.7)

Table 4 presents how the D_QC in all 360 reagent lot change events (962 new versus old QC pairs) varied in size. For this, D_QC was expressed as a multiple of SD. In total, 446 (46.4%), 237 (24.6%), 123 (12.8%), 72 (7.5%) and 84 (8.7%) QC pairs fell in the <0.5, 0.5–1.0, 1.0–1.5, 1.5–2.0 and >2.0 multiple-of-SD categories, respectively. The two assay methods (general chemistry versus immunoassay) did not differ significantly in terms of the frequencies in the multiple-of-SD categories. For example, the general chemistry and immunoassay frequencies in the <0.5 multiple-of-SD category were 50.9% (209/411) and 43.0% (237/551), respectively. In the >2.0 multiple-of-SD category, these frequencies were 6.6% (27/411) and 10.3% (57/551), respectively.

Table 4.

Number of QC result pairs by size of the difference between two reagent lots, expressed as a multiple of SD.

Multiple of SD	Total		General chemistry		Immunoassay
	No.	%	No.	%	No.	%
<0.5	446	46.4	209	50.9	237	43.0
0.5–1.0	237	24.6	96	23.4	141	25.6
1.0–1.5	123	12.8	54	13.1	69	12.5
1.5–2.0	72	7.5	25	6.1	47	8.5
>2.0	84	8.7	27	6.6	57	10.3
Total	962	100	411	100	551	100

Discussion

Ideally, when patients’ samples and QC materials are measured before and after reagent lot changes, the test results should be the same; this is often not the case. Instead, each laboratory defines acceptance criteria that signal whether the test results with a new reagent lot are consistent with those with the old reagent lot. The standard used in our laboratory is that a new reagent lot is considered acceptable if (i) the values returned by the same patient specimens tested with the new and old reagent lots do not differ by more than ±10%³ and (ii) the values obtained by testing the QC materials with the new reagent lot fall within the conventional predefined target range. We were able to confirm that reagent lot testing in our laboratory satisfied all acceptance criteria and met the requirements of CAP (data not shown). However, these criteria are only the minimal requirement of a practice guideline and are insufficient as a standard for optimal laboratory practice.

In the present study, the differences in the values of patients’ samples (Δ_P) and QC materials (Δ_QC) between the old and new reagent lots were determined. Only 7.8% of the 360 lot change events exhibited statistically significant differences between Δ_P and Δ_QC. By contrast, Miller et al.¹ reported that 40.9% of lot change events were associated with a statistically significant difference, even though similar clinical chemistry test items and instrument platforms were used in both studies. Under our laboratory policy, usually only 3–5 patients’ samples are tested in parallel at each reagent lot change event. We therefore had to analyse the data using the Mann–Whitney U test, a non-parametric statistical test, whereas a t-test with Satterthwaite adjustment was used in the previous study.¹ The disparity may be attributable to this difference in statistical method. The present study also showed that the frequency of significant differences between Δ_P and Δ_QC did not differ between general chemistry tests and immunoassays. In addition, the source of QC material (third party versus method manufacturer) did not significantly affect this frequency. However, 8.7% of events showed significant differences between Δ_P and Δ_QC when QC materials from a third party were used, whereas QC materials from the method manufacturer were associated with a lower rate of such differences (4.1%). This suggests the possibility that QC materials provided by the method manufacturer have fewer matrix effects than those provided by a third party.

The use of QC materials is inevitably compromised by intrinsic non-commutability problems. This non-commutability reflects the fact that the matrix of QC materials generated by the manufacturing process differs from that of patient specimens. As a result, questions have been asked about the usefulness of considering the lot-to-lot QC test data when determining whether a new reagent lot is acceptable. For example, we have encountered non-commutability problems arising from the use of non-human-source (animal placenta) QC materials for an alkaline phosphatase assay.¹⁷ Moreover, when performing new versus old lot tests for a lipase assay, we found that although the patient specimens rarely showed marked differences between the new and old lots, the values of the high-level QC materials often exceeded the target range (unpublished data). Later, we found that this discrepancy was due to a matrix-related bias caused by the use of non-human (porcine-based) lipase in the QC materials. Thus, in our experience, QC material results obtained at a reagent lot change are not fully reliable and should only be considered as a reference. This is supported by the analysis in the present study of the trends of D_QC and D_P in the 332 lot change events that were not associated with significant differences between Δ_P and Δ_QC; D_QC and D_P followed the same trend in only 66.3% of these events. Thus, even in the present study, the usefulness of the lot-to-lot QC material data in establishing the suitability of a new lot was limited.

Since QC materials are used in reagent lot change parallel tests in daily practice, it is important to adjust the target range of QC materials after a reagent lot change to reduce the rates of false positive and false negative errors, as these errors eventually lead to inappropriate laboratory decisions. Applying overly strict standards wastes resources (including reagents) and increases the test time. However, there are currently no guidelines on the degree of readjustment that is associated with the lowest rates of false positive and false negative errors.

In our laboratory, the QC material target range is set as the mean ± 2 SD of the results of the initial validation performed whenever the lot number of the QC material changes. If repeated measurements of a QC material follow a normal distribution, the false positive and false negative rates of QC material measurements with a new reagent lot can be calculated for a given D_QC (the difference between the QC material results with the new and old lots) expressed as a multiple of SD, here from 0.5 to 3.0. The theoretical false positive and false negative rates associated with these shifts are presented in Table 5, which shows that the false detection rate (false positive rate plus false negative rate) is 15.8% (13.6% + 2.2%) when the QC material values with the new reagent lot differ from the mean of the predefined target range by 1.0 SD. The false detection rate rises sharply to 30.9% (28.6% + 2.3%) when the new lot QC material values differ by 1.5 SD. On the basis of this estimate, we propose 1.0 SD as the cutoff for maintaining the pre-existing target range after a reagent lot change. A 1.0 SD criterion seems applicable to daily QC practice in general; however, some modification may be necessary depending on the test item, the concentration of the QC materials and the number of QC materials.

Table 5.

The extent of shift and the resulting false positive and false negative rates.

The extent of shift (SD)	% of false positive	% of false negative
0.5	4.4	1.7
1.0	13.6	2.2
1.5	28.6	2.3
2.0	47.7	2.3
2.5	66.8	2.3
3.0	81.8	2.3
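
Table 5 is not accompanied by a formula. One reading that matches its entries to within about 0.1 percentage point, offered here as an assumption rather than as the authors’ stated calculation, treats the QC results as normally distributed with a mean shifted by k SD against fixed mean ± 2 SD limits. The ‘false positive’ rate is then the additional area beyond the limit on the side of the shift, Φ(2) − Φ(2 − k), and the ‘false negative’ rate is the area lost beyond the opposite limit, Φ(−2) − Φ(−2 − k). The function name is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal cumulative distribution function

def shift_rates(shift_sd, limit_sd=2.0):
    """Return (false positive %, false negative %) for a mean shift in SD units.

    Assumed reconstruction: changes in tail area beyond the +/- limit_sd limits
    relative to the unshifted distribution.
    """
    false_pos = PHI(limit_sd) - PHI(limit_sd - shift_sd)
    false_neg = PHI(-limit_sd) - PHI(-limit_sd - shift_sd)
    return 100.0 * false_pos, 100.0 * false_neg

for k in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    fp, fn = shift_rates(k)
    print(f"{k:.1f} SD: false positive {fp:.1f}%, false negative {fn:.1f}%")
```

Under this reading the ‘false negative’ column saturates near 2.3% because it can never exceed the 2.3% tail that lies beyond −2 SD before any shift, while the ‘false positive’ column keeps growing with the shift.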

When the size of D_QC in all 962 new versus old lot QC result pairs was expressed as a multiple of the SD used in the predefined acceptance criteria, 71.0% of the QC results fell into the multiple-of-SD categories below 1.0 (the theoretical cutoff suggested above). Thus, for the remaining 29.0% of QC result pairs, the QC target range would have to be readjusted when 1.0 SD is used as the cutoff.

In conclusion, this study confirmed that the QC material data from parallel tests performed at reagent lot changes are of limited usefulness in evaluating the suitability of a new lot. In addition, a 1.0 SD cutoff is proposed as a new standard for deciding when to establish a new QC target range after a reagent lot change.

Footnotes

Conflict of interest

None declared.

Funding

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Ethical approval

Asan Medical Center Institutional Review Board, Ethics reference number: (2013-0630).

Guarantor

WL.

Contributorship

M-CC, SYK and WL researched literature and conducted the study. M-CC, T-DJ, WL, SC and W-KM were involved in protocol development and data analysis. All authors reviewed and edited the manuscript and approved the final version of the manuscript.

References

1. Miller, Erek, Cunningham. Commutability limitations influence quality control results with different reagent lots. Clin Chem 2011; 57: 76–83.

2. The College of American Pathologists, CAP Accreditation Program, Chemistry and Toxicology Checklist, 2012.

3. Miller. Quality control. In: McPherson, Pincus (eds). Henry's clinical diagnosis and management by laboratory methods, 22nd ed. Philadelphia: Elsevier Saunders, 2011, pp. 125–127.

4. Cattozzo, Franzini, Melzi d'Eril. Commutability of calibration and control materials for serum lipase. Clin Chem 2001; 47: 2108–2113.

5. Franzini, Ceriotti. Impact of reference materials on accuracy in clinical chemistry. Clin Biochem 1998; 31: 449–457.

6. Miller. Specimen materials, target values and commutability for external quality assessment (proficiency testing) schemes. Clin Chim Acta 2003; 327: 25–37.

7. Miller, Myers, Ashwood. State of the art in trueness and interlaboratory harmonization for 10 analytes in general clinical chemistry. Arch Pathol Lab Med 2008; 132: 838–846.

8. Miller, Myers, Ashwood. Creatinine measurement: state of the art in accuracy and interlaboratory harmonization. Arch Pathol Lab Med 2005; 129: 297–304.

9. Thienpont, Stockl, Friedecky. Trueness verification in European external quality assessment schemes: Time to care about the quality of the samples. Scand J Clin Lab Invest 2003; 63: 195–201.

10. Bock, Endres, Elin. Comparison of fresh frozen serum to traditional proficiency testing material in a College of American Pathologists survey for ferritin, folate, and vitamin B12. Arch Pathol Lab Med 2005; 129: 323–327.

11. Palmer-Toy, Wang, Winter. Comparison of pooled fresh frozen serum to proficiency testing material in College of American Pathologists surveys: Cortisol and immunoglobulin E. Arch Pathol Lab Med 2005; 129: 305–309.

12. Rej, Jenny, Bretaudiere. Quality control in clinical chemistry: characterization of reference materials. Talanta 1984; 31: 851–862.

13. Schreiber, Endres, McDowell. Comparison of fresh frozen serum to proficiency testing material in College of American Pathologists surveys: Alpha-fetoprotein, carcinoembryonic antigen, human chorionic gonadotropin, and prostate-specific antigen. Arch Pathol Lab Med 2005; 129: 331–337.

14. Eckfeldt, Copeland. Accuracy verification and identification of matrix effects. The College of American Pathologists’ Protocol. Arch Pathol Lab Med 1993; 117: 381–386.

15. Vesper, Miller, Myers. Reference materials and commutability. Clin Biochem Rev 2007; 28: 139–147.

16. Miller, Myers, Rej. Why commutability matters. Clin Chem 2006; 52: 553–554.

17. Bae, Chung, Kim. Placental alkaline phosphatase isoenzyme in quality control materials may be a source of variability in alkaline phosphatase activity. Clin Biochem 2011; 44: 251–253.