Abstract
Objective
To analyze the 2014 results of neonatal screening external quality assessment (EQA) performed by the Chinese National Centre for Clinical Laboratories.
Methods
EQA test panels, each consisting of five dried blood spots (three panels each for phenylalanine (Phe) and thyroid stimulating hormone (TSH), and two each for glucose-6-phosphate dehydrogenase (G6PD) and 17-alpha-hydroxy progesterone (17-OHP)), were distributed to laboratories, and the results were collected and evaluated. The chi-square test was used to compare the correct recognition rates.
Results
Test results were received from 170 laboratories for Phe, 176 for TSH, 65 for G6PD and 65 for 17-OHP. The total numbers of effective quantitative and qualitative results for Phe, TSH, G6PD and 17-OHP were 2520 and 2370, 2605 and 2450, 645 and 530, and 645 and 645, respectively. The overall correct recognition rates for qualitative tests of Phe, TSH, G6PD and 17-OHP were 99.79 %, 99.67 %, 93.40 % and 99.84 %, and the proportions of acceptable quantitative results were 94.48 %, 98.31 %, 84.65 % and 99.84 %, respectively. The rates of acceptable quantitative results differed significantly between the two measurement systems for Phe, TSH and G6PD (p<0.001); the χ² test also showed significant differences in correct recognition or acceptable rates among programmes (p<0.001).
Conclusion
Most of the quantitative results were acceptable and the overall correct recognition rates in qualitative results approached 100%. Distributing more challenging samples and increasing the range of concentrations of EQA samples will improve standards in future assessments.
Introduction
Following a pilot study in 1981, neonatal screening was extended to all provinces in China over the next two decades. 1 The first external quality assessment (EQA) of neonatal screening laboratories testing phenylalanine (Phe) and thyroid stimulating hormone (TSH) was performed in 1999, and of laboratories testing for glucose-6-phosphate dehydrogenase (G6PD) deficiency and 17-alpha-hydroxy progesterone (17-OHP) in 2010, by the National Centre for Clinical Laboratories (NCCL). We analyzed the EQA results from these qualitative and quantitative neonatal screening tests.
Methods
EQA programme materials and scheme design
In March 2014, EQA panels, each consisting of five dried blood spots, were prepared, labelled, and distributed to hospitals and maternal and child health centres that provided neonatal screening: three panels (15 samples) each for Phe and TSH, and two panels (10 samples) each for G6PD and 17-OHP. Blood spot homogeneity and stability were guaranteed by a manufacturer approved by the Chinese Food and Drug Administration (CFDA). The concentrations of all the spots were different, and panels were stored at 2–8 °C until distribution. Each sample was coded to make analysis clear: the first four digits represented the year in which the samples were distributed, the fifth digit identified the lot (panel), and the last digit indicated the serial number of the sample within its panel. To ensure confidentiality, code numbers for data reporting and further communication details were included in the instruction manuals sent with the test panels. The participant laboratories were given two weeks to test the EQA panels, using their standard laboratory protocols, and to return the results to NCCL via express mail or a data submission form on the NCCL website. Qualitative EQA participants were required to report each sample as screen positive or negative. Quantitative EQA participants reported results in μmol/L for Phe, mIU/L for TSH, U/gHb for G6PD and nmol/L for 17-OHP.
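The coding scheme described above can be illustrated with a minimal Python sketch. It assumes that a full code is exactly six digits (four for the year, one for the lot, one for the serial number), which is inferred from the description rather than stated; the names `SampleCode` and `parse_sample_code` are hypothetical and are not part of any NCCL software.

```python
from typing import NamedTuple


class SampleCode(NamedTuple):
    """Decoded EQA sample code (hypothetical helper type)."""
    year: int    # year in which the sample was distributed
    lot: int     # lot (panel) identifier
    serial: int  # serial number of the sample within its panel


def parse_sample_code(code: str) -> SampleCode:
    """Split a six-digit code such as '201413' into year, lot and serial.

    Assumption: the code is exactly six ASCII digits.
    """
    code = code.strip()
    if len(code) != 6 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected a six-digit sample code, got {code!r}")
    return SampleCode(year=int(code[:4]), lot=int(code[4]), serial=int(code[5]))


def short_id(code: str) -> str:
    """Return the two-digit ID that remains when the year is omitted."""
    parsed = parse_sample_code(code)
    return f"{parsed.lot}{parsed.serial}"
```

Under this assumption, `parse_sample_code("201413")` yields year 2014, lot 1 and serial number 3, and `short_id("201413")` yields `"13"`.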
Target values assigned and evaluation of the results
Results were assessed against consensus values. 2 For quantitative tests, the assigned value was the robust average of the results reported by all participants in a subgroup, calculated using Algorithm A of ISO 13528. 3 For qualitative tests, the result (positive or negative) reported by more than 80 % of the participants in a group was the assigned value for that group. 4
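As an illustration of how consensus values of this kind can be derived, the following Python sketch implements the commonly described form of Algorithm A (median and scaled median absolute deviation as starting values, then iterative winsorisation at 1.5 robust standard deviations) together with an 80 % consensus rule for qualitative results. It is a sketch under stated assumptions, not the Clinet-EQA implementation: the function names, the numerical convergence tolerance and the iteration cap are hypothetical choices.

```python
import statistics
from collections import Counter


def algorithm_a(values, tol=1e-6, max_iter=100):
    """Return (robust average, robust SD) by iterative winsorisation.

    Assumption: convergence is judged with an absolute tolerance `tol`,
    a simplification of the significant-figure criterion in the standard.
    """
    x = [float(v) for v in values]
    if len(x) < 2:
        raise ValueError("at least two results are required")
    x_star = statistics.median(x)
    s_star = 1.483 * statistics.median([abs(v - x_star) for v in x])
    for _ in range(max_iter):
        delta = 1.5 * s_star
        low, high = x_star - delta, x_star + delta
        # Pull results outside [low, high] back to the nearer limit.
        w = [min(max(v, low), high) for v in x]
        new_x = statistics.fmean(w)
        new_s = 1.134 * statistics.stdev(w)
        done = abs(new_x - x_star) <= tol and abs(new_s - s_star) <= tol
        x_star, s_star = new_x, new_s
        if done:
            break
    return x_star, s_star


def qualitative_consensus(results, threshold_percent=80):
    """Return the label reported by more than `threshold_percent` of
    participants, or None when no label reaches consensus."""
    if not results:
        raise ValueError("no results supplied")
    label, count = Counter(results).most_common(1)[0]
    # Integer arithmetic avoids floating-point issues at exactly 80 %.
    return label if count * 100 > threshold_percent * len(results) else None
```

Because extreme results are pulled back to the winsorisation limits instead of being discarded, a single outlying laboratory shifts the robust average far less than it would shift an ordinary mean.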
For quantitative results, scores were awarded on the basis of bias against the assigned value: 20 points were awarded for each result within ±30 % or ±60 μmol/L (whichever was larger) of the assigned value for Phe, or within ±30 % of the assigned value for TSH, G6PD, and 17-OHP. Results outside these limits scored no points. For qualitative results, 20 points were awarded for each result that matched the assigned value, and results that differed scored no points. Equivocal or borderline results were not accepted: every dried blood spot had to be tested and given a definitive conclusion of positive or negative. The maximum possible score for a panel of five samples was therefore 100 points in both quantitative and qualitative assessment. As with other EQA programmes, the NCCL EQA programme considered 80 points or more acceptable performance, and less than 80 points unacceptable.5,6 The overall correct recognition rate (qualitative results) or acceptable rate (quantitative results) for each programme was calculated as (number of acceptable results)/(overall number of effective results).
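The quantitative scoring rule can be expressed as a short Python sketch. The function names are hypothetical, the numbers in the usage note are invented for illustration only, and treating a result that lies exactly on a limit as acceptable is an assumption, since the text does not say whether the limits are inclusive.

```python
def is_acceptable(analyte, reported, assigned):
    """Check one quantitative result against the acceptance limits.

    Limits: within 30 % of the assigned value for all analytes; for Phe
    the limit is 30 % or 60 umol/L, whichever is larger.
    Assumption: a result exactly on the limit counts as acceptable.
    """
    limit = 0.30 * abs(assigned)
    if analyte == "Phe":
        limit = max(limit, 60.0)
    return abs(reported - assigned) <= limit


def panel_score(analyte, reported, assigned):
    """Score a five-sample panel: 20 points per acceptable result."""
    if len(reported) != len(assigned):
        raise ValueError("reported and assigned values must pair up")
    return sum(20 for r, a in zip(reported, assigned)
               if is_acceptable(analyte, r, a))


def performance_is_acceptable(score):
    """A panel score of 80 points or more is acceptable performance."""
    return score >= 80


def acceptable_rate(n_acceptable, n_effective):
    """Rate = number of acceptable results / number of effective results."""
    if n_effective <= 0:
        raise ValueError("number of effective results must be positive")
    return n_acceptable / n_effective
```

For example, with hypothetical Phe assigned values of 100, 400, 100, 400 and 100 and reported values of 155, 500, 170, 530 and 90, three results fall within the limits, so the panel scores 60 points and the performance is unacceptable.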
Statistical analysis
Data were analyzed using SPSS 13.0 (SPSS Inc.) and the Clinet-EQA evaluation system V 1.0, which was designed by NCCL and is used in the national EQA programme (see http://www.clinet.com.cn/shop/shop). Median, standard deviation (SD), robust average, robust standard deviation (RSD), robust coefficient of variation (RCV) and bias were calculated and used to evaluate the performance of participant laboratories. The chi-square (χ²) test was used to compare performance between the two measurement systems and the overall correct recognition rates among the four neonatal screening EQA programmes. A p value <0.05 was considered significant.
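For readers unfamiliar with the comparison, the Pearson chi-square test on a contingency table (for example, measurement systems in rows and acceptable versus unacceptable results in columns) can be sketched in Python as follows. This is an illustrative implementation without continuity correction, not a reproduction of the SPSS procedure; the function names are hypothetical, and the p value is provided only for 1 or 2 degrees of freedom, where closed forms exist.

```python
import math


def chi_square(table):
    """Return (statistic, degrees of freedom) for a table of counts.

    `table` is a list of rows; no continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs at least one count")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value(stat, df):
    """Upper-tail p value of the chi-square distribution (df 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise NotImplementedError("closed form given only for df = 1 or 2")
```

With the hypothetical 2 × 2 table `[[20, 10], [10, 20]]`, every expected count is 15, the statistic is 20/3 with 1 degree of freedom, and the p value is below 0.05. A general-purpose statistics package would be the natural choice for larger tables.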
Results
Participant laboratories
Hospitals and Maternal and Child Health centres providing neonatal screening services participated in the 2014 EQA programme. There were 170, 176, 65, and 65 laboratories enrolled in Phe, TSH, G6PD and 17-OHP EQA programmes, respectively. Two mainstream measurement systems (PerkinElmer and Ani labsystem) were used. For Phe, TSH, G6PD, and 17-OHP EQA programmes 133, 151, 52, and 58 participants used PerkinElmer; 37, 25, 13, and 7 participants used Ani labsystem, respectively. In 2014, we collected 2520, 2605, 645, and 645 quantitative EQA test reports and 2370, 2450, 530, and 645 qualitative reports for Phe, TSH, G6PD, and 17-OHP (12 laboratories submitted only quantitative results for Phe, TSH, and G6PD).
Analysis of the qualitative results
Table. Analysis of the qualitative results.
Table. Qualitative results of PerkinElmer and Ani labsystem measurement systems in 2014.
Note to both tables: 20 points were awarded for each result that matched the assigned value; results that differed were not scored. The full score for each EQA panel was 100 points.
Analysis of the quantitative results
Table. Analysis of the quantitative results.
Table. Quantitative results of PerkinElmer and Ani labsystem measurement systems in 2014.
Note to both tables: 20 points were awarded for each Phe result within ±30 % or ±60 μmol/L (whichever was larger) of the assigned value, and for each TSH, G6PD or 17-OHP result within ±30 % of the assigned value. Other results were not scored.
The assigned value (robust average) and RSD for each Phe, TSH, G6PD, and 17-OHP sample for both measurement systems are shown in Figures 1–4. For Phe (Figure 1), the robust average was lower than 120 μmol/L for 6 of 15 samples and higher than 300 μmol/L for the others. The robust average ranged from about 60 to 720 μmol/L and the RCV from about 5 to 23 %. The error bars represent RSD, which ranged from 0.14 to 0.809 for PerkinElmer and from 0.149 to 0.988 for Ani labsystem. The RCV was higher for EQA samples with lower concentrations. At lower concentrations, the PerkinElmer RCV was lower than that of Ani labsystem. At higher concentrations, the robust average of Ani labsystem was higher than that of PerkinElmer.
Figure 1. Robust average concentration and RCV of each EQA sample for the two measurement systems for Phe. The panel ID omits "2014" and the error bars represent RSD; the same applies to Figures 2–4.
Figure 2. Robust average concentration and RCV of each EQA sample for the two measurement systems for TSH.
Figure 3. Robust average concentration and RCV of each EQA sample for the two measurement systems for G6PD deficiency.
Figure 4. Robust average concentration and RCV of each EQA sample for the two measurement systems for 17-OHP.
For 6 of 15 samples, the robust average concentration of TSH was less than 10 mIU/L; for the others it was higher than 25 mIU/L (see Figure 2). At higher concentrations, the robust average and RSD using Ani labsystem were larger than those using PerkinElmer. At lower concentrations, the RCV was higher. The RCV using Ani labsystem was higher than that using PerkinElmer except for panel IDs 13, 21, and 35. The RCV using Ani labsystem ranged from about 5 to 25 % and varied considerably among samples. The RCV using PerkinElmer was about 5–10 %, with little difference among samples.
For the G6PD EQA programme, the concentration of G6PD increased uniformly from <1 to 9 U/gHb across samples. The robust average and RSD of G6PD using Ani labsystem were larger than those using PerkinElmer. The RCV using PerkinElmer was considerably smaller than that using Ani labsystem for every sample: it ranged from 0 to 15 % for PerkinElmer and from about 23 to 50 % for Ani labsystem (see Figure 3).
Figure 4 shows that the robust average of 17-OHP was less than 20 nmol/L for 4 of 10 EQA samples and higher than 70 nmol/L for the others. At higher concentrations, the robust average and RSD using PerkinElmer were lower than those using Ani labsystem. The RCV using PerkinElmer was higher at lower concentrations and lower at higher concentrations. The RCV using Ani labsystem varied among samples.
Discussion
We here provide the first analysis of EQA results for the performance of neonatal screening in China. Tens of thousands of infants are born in China every day, and only government accredited laboratories can legally provide neonatal screening services. All neonatal screening laboratories in China were enrolled in the NCCL EQA scheme.
The overall correct recognition rate in qualitative tests for G6PD was lower than for the other programmes, but there was no relationship between EQA results and measurement system. The robust average concentration for all samples of Phe, TSH, and 17-OHP was not near the equivocal or borderline value for either the PerkinElmer or Ani labsystem measurement systems, but the robust average concentration of more than one sample of G6PD was around 2 U/gHb, identified by many laboratories as a borderline result.
In the quantitative results, there was a significant difference in the rate of acceptable results among programmes. The proportions of participants scoring full marks, 80 points, and fewer than 80 points differed significantly between the two measurement systems for Phe, TSH, and G6PD (but not 17-OHP). The RCV of TSH and G6PD using Ani labsystem was markedly higher than that using PerkinElmer, suggesting that results obtained with Ani labsystem were more dispersed, which led to a lower rate of acceptable results. For higher concentrations of Phe, the RCV using Ani labsystem was slightly lower than that using PerkinElmer, while the robust average using Ani labsystem was higher. Relative to the evaluation standard, the acceptable range for Ani labsystem was therefore larger than that for PerkinElmer, and its relative dispersion was smaller. At lower concentrations, although PerkinElmer had a smaller RCV, the acceptable range, calculated as the robust average (approximately 120 μmol/L) ±60 μmol/L, was wide enough to make almost all results acceptable. As a result, the rate of acceptable Phe results using Ani labsystem was higher than that using PerkinElmer.
Although variation within the two measurement systems led to discrepancies, and the RCV and RSD using Ani labsystem were larger than those using PerkinElmer for most samples, the overall correct recognition rate for qualitative results did not differ significantly between the two systems. For neonatal screening, qualitative results are much more useful than quantitative results, so the performance of the two measurement systems could be considered equivalent.
A limitation of our study was that the test panel blood spots were not from infant blood, but simulated infant blood spots, which may have caused matrix effects. In addition, it may be difficult to score or evaluate the performance of participant laboratories if the concentration of the G6PD EQA sample was close to equivocal or borderline values.
In conclusion, most of the quantitative results were acceptable and the overall correct recognition rate in qualitative results approached 100%. The EQA programme is vital, but distributing more challenging samples and increasing the range of concentration of EQA samples may improve the quality of screening in the future.
