Sage Journals: Discover world-class research

Abstract

Although the rabbit pyrogen test is one of the crucial methods included in every pharmacopoeia for evaluating the safety of parenteral medicines, the experimental procedures and pyrogen result judgment algorithms (PRJAs) still differ greatly from one another. In the first stage of testing, original data on 879 batches, obtained from a total of 2637 rabbits in our laboratory, were judged by the PRJAs of the Chinese Pharmacopoeia 2005 Volume III, the Japanese Pharmacopoeia XIV, the Japanese Pharmacopoeia XV, the European Pharmacopoeia 6.0 and the United States Pharmacopoeia 32 NF27, and by two theoretical models proposed by S. Hoffmann. The results were analyzed to evaluate the effects of the various PRJAs. It was shown that: (i) the significant differences among the results judged by the various pharmacopoeias and Hoffmann’s theoretical models were mainly due to the PRJAs themselves, and these large differences should be harmonized worldwide, balancing the reduction of animal use against guaranteeing the safety of medicines; (ii) according to clinical evidence and the experimental data, PRJAs that depend on the threshold of the sum of the temperature rises of all tested rabbits are preferable to those that depend on the number of rabbits exceeding the threshold of temperature rise of an individual rabbit; and (iii) the PRJA of the Japanese Pharmacopoeia XV has clear advantages when the total suspicious rate of samples is less than 10%. Additionally, a new PRJA designed to reduce additional test stages and animal consumption is proposed for evaluation.

Keywords

Rabbit pyrogen test (RPT); pyrogen result judgment algorithm (PRJA); threshold of temperature rise of individual rabbit (TTRIR); threshold of the sum of temperature rise of all tested rabbits (TSTRATR); replacement/refinement/reduction (3Rs)

Introduction

Reactions caused by pyrogens are well known to be harmful to the human body. The major symptoms are fever, chills, nausea, vomiting, headache, lower back and/or joint pain, grayish skin color, leukopenia and increased vascular permeability. In severe cases, pyrogens can lead to coma, shock and even death.^1,2 To guarantee the safety of parenteral pharmaceutical products, especially intravenous medicines, most pharmacopoeias have adopted the rabbit pyrogen test (RPT), which applies to any pharmaceutical product for parenteral use, including biological products and medical devices (products for in vivo use). In view of the lower pyrogen content of samples produced under modern Good Manufacturing Practice (GMP), and of the demand in recent years to reduce animal use under the replacement/refinement/reduction (3Rs) principle, Hoffmann et al.³ proposed two alternative pyrogen test models, 2-2-2-A and 2-2-2-B (Hoffmann’s theoretical models), in 2005.

Although the pyrogen tests in every pharmacopoeia and Hoffmann’s theoretical models share the same purpose, their test procedures and result interpretations differ.⁴ We identified eight main differences among the two Japanese Pharmacopoeias (JPXIV⁵ and JPXV⁶), the European Pharmacopoeia 6.0 (EP6.0),⁷ the United States Pharmacopoeia 32 NF27 (USP32),⁸ the Chinese Pharmacopoeia 2005 Volume III (CHPIII)⁹ and Hoffmann’s theoretical models. First, the number of rabbits used in the first stage of the test differs: three in each pharmacopoeia and two in Hoffmann’s theoretical models. Second, the maximum number of test stages and the total number of rabbits used for the final result judgment differ. EP6.0 requires up to four stages and 12 rabbits (the highest total among the tests compared here); JPXV up to three stages and 9 rabbits; CHPIII, USP32 and JPXIV only up to two stages and 8 rabbits; and Hoffmann’s theoretical models up to three stages and 6 rabbits (the lowest total among the tests compared here). Third, the pyrogen result judgment algorithms (PRJAs) differ. JPXIV uses the threshold of temperature rise of individual rabbit (TTRIR) as the judgment criterion; EP6.0, JPXV and Hoffmann’s theoretical models use the threshold of the sum of temperature rise of all tested rabbits (TSTRATR); and USP32 and CHPIII use combined criteria, applying both the TTRIR and the TSTRATR. Fourth, the values of the TTRIRs and TSTRATRs differ. Although the PRJA of CHPIII is similar to that of USP32, its TTRIR and TSTRATR are higher (0.6°C and 3.5°C in CHPIII versus 0.5°C and 3.3°C in USP32, respectively).^8,9 The TSTRATRs allowed for a qualified sample (defined as ‘passed’ in this context) in the first two stages of EP6.0 are lower than those of JPXV, whereas the TSTRATRs for a pyrogenic sample (defined as ‘failed’ in this context) are higher than those of JPXV.
Fifth, the mean temperature rise per rabbit, calculated as the total permitted temperature rise divided by the total number of rabbits, differs: 0.41°C for USP32 (3.3°C divided by 8, the lowest), 0.42°C for Hoffmann’s 2-2-2-B, 0.44°C for CHPIII, 0.52°C for Hoffmann’s 2-2-2-A, 0.55°C for both EP6.0 and JPXV, and 0.6°C for JPXIV (the highest).³ Sixth, all the PRJAs except USP32 allow a sample to be judged as failed directly in the first stage of the test. Seventh, the method of determining the initial temperature differs among the PRJAs.¹⁰ EP6.0, CHPIII and JPXV use the mean of two readings taken from the same rabbit 30 min apart within the 40 min immediately before injection of the sample, whereas USP32 requires the normal temperature to be determined within 30 min. Finally, the permitted range of initial temperatures differs.¹⁰ It is 38.0–39.6°C in CHPIII and 38.0–39.8°C in EP6.0, not more than 39.8°C in USP32, JPXIV and JPXV, and unspecified in Hoffmann’s theoretical models. In this paper, the comparison is based solely on the differences in how the temperature rises are interpreted among JPXIV, JPXV, EP6.0, USP32, CHPIII and Hoffmann’s theoretical models.
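The per-rabbit means above are simple quotients of a final-stage sum limit and the total number of rabbits. The sketch below reproduces the USP32 (3.3°C over 8 rabbits) and CHPIII (3.5°C over 8 rabbits) figures. The EP6.0 entry of 6.60°C over 12 rabbits is our assumption about the final-stage limit and is not a value given in this paper, and the helper name `mean_rise_threshold` is hypothetical.

```python
# Per-rabbit mean temperature rise implied by a final-stage sum limit (TSTRATR).
# USP32 and CHPIII values are those quoted in the text; the EP6.0 entry
# (6.60 degC over 12 rabbits) is an ASSUMPTION, not a value from this paper.
FINAL_STAGE_LIMITS = {
    "USP32": (3.3, 8),
    "CHPIII": (3.5, 8),
    "EP6.0": (6.6, 12),  # assumed final-stage limit
}


def mean_rise_threshold(sum_limit_c: float, n_rabbits: int) -> float:
    """Average rise per rabbit (degC) allowed by a sum limit over n rabbits."""
    if n_rabbits <= 0:
        raise ValueError("n_rabbits must be positive")
    return sum_limit_c / n_rabbits


if __name__ == "__main__":
    for name, (limit, n) in FINAL_STAGE_LIMITS.items():
        print(f"{name}: {limit} / {n} = {mean_rise_threshold(limit, n):.2f} degC")
```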

The differences between pharmacopoeias, especially in the interpretation of test results, remain an issue. On the one hand, to our knowledge, the differences in result interpretation have never been studied extensively, and no one has examined the feasibility of Hoffmann’s models; on the other hand, the RPT is only a limit test subject to large individual variation, and whether it is suitable and cost-effective for strict pyrogen control is controversial. The present study was initiated to analyze the consistency of, and differences among, the PRJAs of EP6.0, USP32, JPXIV, JPXV, CHPIII and Hoffmann’s theoretical models using the original data from our laboratory, and subsequently to discuss the optimal PRJA. On the basis of this analysis, and of the long record of safe use of products controlled under the current methods, we propose a new PRJA that is simple and reasonable, reduces the number of repeat tests and conforms to the 3R principles (reducing animal consumption).

In what follows, where necessary, the abbreviations EP6.0, USP32, JPXIV, JPXV, CHPIII and Hoffmann’s theoretical models denote the corresponding PRJAs.

Materials and methods

Materials

The ZRY-2A pyrogen testing instrument was purchased from Tianda-Tianfa Science and Technology, LLC (Tianjin, PR China). All test samples came from different Chinese biopharmaceutical companies. The 879 batches of test samples comprised human serum albumin (414 batches), human immunoglobulin for intravenous injection (199 batches), Haemophilus influenzae type b conjugate vaccine (206 batches) and polyvalent pneumococcal polysaccharide vaccine (60 batches). All tests were completed between 2006 and 2008.

Animals and experimental environment

In total, 2637 healthy New Zealand white rabbits, males or non-pregnant females weighing 1.7–2.7 kg, were purchased from KeYu Laboratory Animal Company in Beijing. The qualified certificate numbers are SCXK (Jing) 2002-0005 and SCXK (Jing) 2007-0003.

All tests were done in barrier rooms (license number: SYXK (Jing) 2006-0004). The temperature difference between the experimental rooms and the rabbits’ living quarters was not more than 3°C. The laboratory temperature was 20–25°C and the humidity 40–70%. Rabbits were fed the same diet for 7 d before their body temperatures were measured for the test, and each rabbit was housed in its own cage. During this period, no abnormal manifestations, such as loss of body weight or problems with behavior, appetite or excretion, occurred.

Pyrogen test

The methods and environmental conditions of the CHPIII pyrogen test are similar to those described in the other pharmacopoeias and for Hoffmann’s theoretical models,³⁻¹⁰ but the CHPIII description is more explicit.⁴ It gives the following directions: rabbits that have not previously been used for pyrogen testing shall be selected by the following screening procedure. First, measure the body temperature of each rabbit 8 times at intervals of 30 min, 3 d before the first stage of the test, under the same conditions as the pyrogen test but without injecting a test sample (only the EP6.0 screening test requires intravenous injection of 10 ml/kg body weight of pyrogen-free 9 g/l sodium chloride solution pre-warmed to about 38.5°C). Second, a rabbit may be used for the pyrogen test only if all 8 body temperature readings are within 38.0–39.6°C and the difference between the highest and lowest readings is not more than 0.4°C.
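The screening rule can be written as a small predicate. This is a minimal sketch assuming readings in °C; the function name `passes_screening` and the floating-point tolerance `EPS` are our own additions, not part of the CHPIII text.

```python
# Screening of new rabbits as described for CHPIII: 8 readings at 30-min
# intervals, all within 38.0-39.6 degC, with max - min not exceeding 0.4 degC.
EPS = 1e-9  # float tolerance (e.g. 39.2 - 38.8 evaluates slightly above 0.4)


def passes_screening(readings, low=38.0, high=39.6, max_spread=0.4):
    """Return True if a rabbit's 8 screening temperatures (degC) qualify it."""
    if len(readings) != 8:
        raise ValueError("CHPIII screening uses exactly 8 readings")
    in_range = all(low - EPS <= t <= high + EPS for t in readings)
    spread_ok = max(readings) - min(readings) <= max_spread + EPS
    return in_range and spread_ok


if __name__ == "__main__":
    print(passes_screening([38.8, 39.0, 39.1, 39.2, 39.0, 38.9, 39.1, 39.0]))
    print(passes_screening([38.8, 39.0, 39.1, 39.3, 39.0, 38.9, 39.1, 39.0]))
```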

During the first stage of the pyrogen test, withhold food from the rabbits for at least 1 h before the test, and keep them in a suitable unit until the test is completed. The device used to measure body temperature should be accurate to within 0.1°C. Measure the body temperature of each rabbit twice, 30 min apart, before the test; the two readings should not differ by more than 0.2°C, and their mean is taken as the rabbit’s normal body temperature. The normal body temperatures of all rabbits used on the day of the test should be within 38.0–39.6°C, and those of rabbits in the same group should not differ by more than 1°C. Within 15 min after measuring the normal body temperatures of the three rabbits, slowly inject the prescribed dose of the test sample, preheated to 38°C, into the ear vein of each rabbit. Then measure the body temperature of each rabbit 6 times at intervals of 30 min. The difference between the highest of the six readings and the normal body temperature is taken as that rabbit’s body temperature rise.
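A minimal sketch of the baseline and temperature-rise calculations described above follows. The function names are hypothetical, and the `clamp_negative` option is an assumption not taken from the text: some pharmacopoeial texts count a fall in temperature as a zero rise, but the procedure quoted here does not say so, so the option defaults to off.

```python
# Baseline ("normal") temperature and temperature rise, following the
# CHPIII procedure described above. All temperatures are in degC.
EPS = 1e-9  # guards comparisons against float rounding of 0.1-degC readings


def baseline_temperature(t1, t2, low=38.0, high=39.6, max_diff=0.2):
    """Mean of two pre-injection readings 30 min apart, or None if the
    readings differ by more than max_diff or the mean is out of range."""
    if abs(t1 - t2) > max_diff + EPS:
        return None
    base = (t1 + t2) / 2
    return base if low - EPS <= base <= high + EPS else None


def group_baselines_ok(baselines, max_range=1.0):
    """Baselines of rabbits in the same group must differ by <= 1 degC."""
    return max(baselines) - min(baselines) <= max_range + EPS


def temperature_rise(baseline, readings, clamp_negative=False):
    """Highest of the six post-injection readings minus the baseline.
    clamp_negative is a HYPOTHETICAL option (not in the text above) for
    conventions that count a fall in temperature as a zero rise."""
    if len(readings) != 6:
        raise ValueError("expected 6 readings taken at 30-min intervals")
    rise = max(readings) - baseline
    return max(rise, 0.0) if clamp_negative else rise


if __name__ == "__main__":
    base = baseline_temperature(39.0, 39.1)
    print(base, temperature_rise(base, [39.1, 39.3, 39.5, 39.4, 39.2, 39.1]))
```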

For Hoffmann’s theoretical models, only the temperature rises of the first two of the three rabbits tested were used for judgment, to avoid subjective selection bias.

Result judgments and statistical methods

Using the method above, the original first-stage results for the 879 batches were judged by CHPIII, JPXIV, JPXV, EP6.0, USP32 and Hoffmann’s theoretical models, respectively. The judged results were classified as passed (pyrogen-free) batches, batches needing additional stages (repeat batches) and failed (pyrogenic) batches, i.e. batches whose pyrogen content exceeded the pharmacopoeial limit.
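To make the three first-stage classes concrete, the sketch below encodes first-stage rules using only the thresholds quoted in this paper: a TTRIR of 0.6°C for JPXIV and CHPIII, a CHPIII first-stage sum limit of 1.4°C, JPXV pass/fail sums of 1.3°C and 2.5°C, EP6.0 pass/fail sums of 1.15°C and 2.65°C, and a USP32 TTRIR of 0.5°C. Whether each boundary is inclusive is our assumption, Hoffmann’s models are omitted because their thresholds are not given here, and all function names are hypothetical. It is an illustration, not a validated implementation of any pharmacopoeia.

```python
# First-stage (three-rabbit) judgment rules using only thresholds quoted in
# this paper. Inclusive/exclusive boundaries are ASSUMPTIONS; consult the
# pharmacopoeial texts for the exact wording.
from enum import Enum


class Verdict(Enum):
    PASSED = "passed"
    REPEAT = "needs additional stage"
    FAILED = "failed"


def _rises(rises):
    # Round to 0.01 degC to avoid float artefacts (a sum of 0.5, 0.4 and 0.4
    # is not exactly 1.3 in binary floating point).
    return [round(r, 2) for r in rises]


def _n_over(rises, ttrir):
    return sum(1 for r in _rises(rises) if r >= ttrir)


def _total(rises):
    return round(sum(_rises(rises)), 2)


def jp14(rises, ttrir=0.6):
    """JPXIV: TTRIR only; 0 rabbits over -> pass, 1 -> repeat, >=2 -> fail."""
    n = _n_over(rises, ttrir)
    return Verdict.PASSED if n == 0 else Verdict.REPEAT if n == 1 else Verdict.FAILED


def chp3(rises, ttrir=0.6, sum_limit=1.4):
    """CHPIII: TTRIR plus a first-stage sum limit (1.4 degC as quoted here)."""
    n = _n_over(rises, ttrir)
    if n >= 2:
        return Verdict.FAILED
    if n == 1 or _total(rises) >= sum_limit:
        return Verdict.REPEAT
    return Verdict.PASSED


def _sum_rule(rises, pass_max, fail_limit, fail_inclusive):
    total = _total(rises)
    if total <= pass_max:
        return Verdict.PASSED
    failed = total >= fail_limit if fail_inclusive else total > fail_limit
    return Verdict.FAILED if failed else Verdict.REPEAT


def jp15(rises):
    """JPXV: TSTRATR only; pass if sum <= 1.3, fail if sum >= 2.5 (assumed)."""
    return _sum_rule(rises, 1.3, 2.5, fail_inclusive=True)


def ep6(rises):
    """EP6.0: TSTRATR only; pass if sum <= 1.15, fail if sum > 2.65."""
    return _sum_rule(rises, 1.15, 2.65, fail_inclusive=False)


def usp32(rises, ttrir=0.5):
    """USP32: pass only if no rabbit reaches 0.5 degC; never fails at stage 1."""
    return Verdict.PASSED if _n_over(rises, ttrir) == 0 else Verdict.REPEAT


if __name__ == "__main__":
    rises = [0.7, 0.1, 0.2]  # one rabbit over 0.6 degC, small total
    for rule in (jp14, chp3, jp15, ep6, usp32):
        print(rule.__name__, rule(rises).value)
```

With rises of 0.7, 0.1 and 0.2°C, JPXIV, CHPIII and USP32 call for an additional stage while JPXV and EP6.0 pass the batch. This is the situation the Results attribute to most of the batches that CHPIII judged suspicious but JPXV passed.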

The statistical methods were as follows: for the 3 × 3 tables, Kendall’s tau-b coefficient, the weighted kappa coefficient and Bowker’s test for symmetry; for the 2 × 2 tables, Kendall’s tau-b coefficient, the simple kappa coefficient, McNemar’s test and diagnostic analysis,¹¹ including sensitivity (Se), specificity (Sp), agreement rate (π), Youden index (YI), negative likelihood ratio (LR⁻), the predictive value for the suspicious rate of the total samples (PV⁺) and the predictive value for the passed rate of the total samples (PV⁻). The SAS software package was used for the statistical analysis.

Results

Result judgments by CHPIII, JPXIV, JPXV, EP6.0, USP32 and Hoffmann’s theoretical models

Judged by CHPIII on the original first-stage data, the results for the four biological products were as follows: for human serum albumin (414 batches in total), 325 batches (78.5%) passed, 80 batches (19.3%) needed an additional stage and 9 batches (2.2%) failed; for human immunoglobulin for intravenous (i.v.) injection (199 batches in total), 129 batches (64.8%) passed, 57 batches (28.6%) needed an additional stage and 13 batches (6.5%) failed; for H. influenzae type b conjugate vaccine (206 batches in total), 172 batches (83.5%) passed, 28 batches (13.6%) needed an additional stage and 6 batches (2.9%) failed; and for polyvalent pneumococcal polysaccharide vaccine (60 batches in total), 47 batches (78.3%) passed, 9 batches (15.0%) needed an additional stage and 4 batches (6.7%) failed.

To check the consistency of and differences between the PRJAs, the CHPIII judgments of all 879 batches were taken as the reference standard (the statistical ranking does not change if another PRJA is chosen as the reference), and the results of the other PRJAs were cross-tabulated against them. The sum and percentage of each class for each PRJA are given in the last column; the CHPIII totals and the numbers of batches on which all PRJAs agreed are given in the last two rows (Table 1, 3 × 3 table). To compare the PRJAs further, batches needing additional stages and failed batches were combined as suspicious batches (a suspicious batch being one that a PRJA cannot confirm as safe). Again taking the CHPIII judgments as the reference standard, the results of the other PRJAs were cross-tabulated; the sum and percentage of each class for each PRJA are given in the last column, and the CHPIII totals in the last row (Table 2, 2 × 2 table).
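Collapsing a 3 × 3 cross-tabulation into the suspicious-versus-passed layout is a simple regrouping of cells. A sketch, with a hypothetical function name, assuming rows are the other PRJA and columns CHPIII, each in the order passed / needing additional stages / failed:

```python
# Collapse a 3 x 3 cross-classification (rows: another PRJA, columns: CHPIII,
# order passed / needing additional stages / failed) into the 2 x 2
# suspicious-vs-passed layout, where "suspicious" = repeat + failed.


def collapse_to_suspicious(table3):
    """Return [[A, B], [C, D]] with suspicious listed first (Table 4 notation)."""
    (pp, pr, pf), (rp, rr, rf), (fp, fr, ff) = table3
    a = rr + rf + fr + ff  # suspicious by both
    b = rp + fp            # suspicious by the other PRJA, passed by CHPIII
    c = pr + pf            # passed by the other PRJA, suspicious by CHPIII
    d = pp                 # passed by both
    return [[a, b], [c, d]]


if __name__ == "__main__":
    jpxv_3x3 = [[673, 115, 1], [0, 59, 27], [0, 0, 4]]  # JPXV rows of Table 1
    print(collapse_to_suspicious(jpxv_3x3))
```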

Table 1.

Pyrogen test results of the original data for 879 batches determined by CHP, JP, EP, USP and Hoffmann’s theoretical models (3 × 3 table)

	Test results by CHP III interpretation	Passed batches	Batches needing additional stages	Failed batches	Sum (%)
JPXIV	Passed batches	673	4	0	677 (77.0%)
	Batches needing additional stages	0	170	0	170 (19.3%)
	Failed batches	0	0	32	32 (3.6%)
JPXV	Passed batches	673	115	1	789 (89.8%)
	Batches needing additional stages	0	59	27	86 (9.8%)
	Failed batches	0	0	4	4 (0.5%)
EP6.0	Passed batches	643	67	0	710 (80.8%)
	Batches needing additional stages	30	107	28	165 (18.8%)
	Failed batches	0	0	4	4 (0.5%)
USP32^a	Passed batches	522	0	0	522 (59.4%)
	Batches needing additional stages	151	174	32	357 (40.6%)
	Failed batches	0	0	0	0 (0%)^a
2-2-2-A^b	Passed batches	548	40	0	588 (66.9%)
	Batches needing additional stages	125	128	25	278 (31.6%)
	Failed batches	0	6	7	13 (1.5%)
2-2-2-B^b	Passed batches	548	40	0	588 (66.9%)
	Batches needing additional stages	125	128	23	276 (31.4%)
	Failed batches	0	6	9	15 (1.7%)
	Sum of CHPIII (%)	673 (76.6%)	174 (19.8%)	32 (3.6%)
	Agreement batches judged by all PRJAs (%)^c	548 (62.3%)	80 (9.1%)	3 (0.3%)

The first-stage pyrogen test results for the 879 batches, as judged respectively by JPXIV, JPXV, EP6.0, USP32 and Hoffmann’s theoretical models, were classified as passed, needing additional stages or failed, and cross-tabulated against the CHPIII judgments as the reference standard. The sum and percentage of each class for each PRJA are given in the last column, and the CHPIII totals in the ‘Sum of CHPIII’ row.

^a USP32 cannot directly judge samples as failed in the first stage of the test.

^b Only the temperature rises of the first two of the three rabbits in each test were considered, to avoid subjective bias.

^c Batches given the same classification (passed, needing additional stages or failed) by all PRJAs except USP32.

Table 2.

Pyrogen test results of the original data for 879 batches determined by CHP, JP, EP, USP and Hoffmann’s theoretical models (2 × 2 table)

	Test results by CHP III interpretation	Suspicious batches	Passed batches	Sum (%)
JPXIV	Suspicious batches	202	0	202 (23.0%)
	Passed batches	4	673	677 (77.0%)
JPXV	Suspicious batches	90	0	90 (10.2%)
	Passed batches	116	673	789 (89.8%)
EP6.0	Suspicious batches	139	30	169 (19.2%)
	Passed batches	67	643	710 (80.8%)
USP32	Suspicious batches	206	151	357 (40.6%)
	Passed batches	0	522	522 (59.4%)
2-2-2-A	Suspicious batches	166	125	291 (33.1%)
	Passed batches	40	548	588 (66.9%)
2-2-2-B	Suspicious batches	166	125	291 (33.1%)
	Passed batches	40	548	588 (66.9%)
	Sum of CHPIII (%)	206 (23.4%)	673 (76.6%)

The first-stage pyrogen test results for the 879 batches, classified as passed or suspicious by JPXIV, JPXV, EP6.0, USP32 and Hoffmann’s theoretical models respectively, were cross-tabulated against the CHPIII judgments as the reference standard. The sum and percentage of each class for each PRJA are given in the last column, and the CHPIII totals in the last row.

Statistical analysis

We analyzed the data in Table 1 (3 × 3 table) using Kendall’s tau-b coefficient, the weighted kappa coefficient and Bowker’s test for symmetry to compare the consistency and differences of the results judged by the various PRJAs; the results are shown in Table 3. To compare the consistency and differences between the results of the other PRJAs and CHPIII in Table 2, we used Kendall’s tau-b coefficient, the simple kappa coefficient and McNemar’s test. The diagnostic parameters Se, Sp, π, YI, LR⁻, PV⁺ and PV⁻ of each PRJA were also calculated, taking the CHPIII results as the reference standard and using the notation defined in Table 4. These results are shown in Table 5.

Table 3.

The correlation and symmetry difference of JPs, EP, USP and Hoffmann’s theoretical models in comparison to CHPIII (for 3 × 3 table)

	Kendall’s tau-b coefficient (asymptotic standard error)	Weighted kappa (95% CI)	Bowker’s test for symmetry (P-value)
JPXIV	0.9883 (0.0058)	0.9893 (0.9789–0.9998)	4.0000 (0.2615)
JPXV	0.6340 (0.0255)	0.5026 (0.4423–0.5630)	143.0000 (<0.0001)
EP6.0	0.6828 (0.0283)	0.6229 (0.5687–0.6771)	42.1134 (<0.0001)
USP32	0.6560 (0.0196)	–*	–*
2-2-2-A	0.5646 (0.0281)	0.5156 (0.4609–0.5703)	55.4330 (<0.0001)
2-2-2-B	0.5670 (0.0282)	0.5228 (0.4676–0.5780)	53.7534 (<0.0001)

Kendall’s tau-b coefficients, weighted kappa coefficients and Bowker’s test for symmetry all showed that only the PRJA of JPXIV was closely consistent with that of CHPIII (Kendall’s tau-b coefficient >0.95 and P > 0.05 in Bowker’s test for symmetry); the other PRJAs differed significantly from CHPIII (P < 0.0001 in Bowker’s test for symmetry).

* Because USP32 cannot judge samples as failed in the first stage of the test, the weighted kappa coefficient and Bowker’s test for symmetry could not be applied.
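For readers who wish to check Table 3, the sketch below computes a weighted kappa and Bowker’s statistic for a k × k table in plain Python. We assume linear (Cicchetti–Allison) agreement weights, which we understand to be the SAS default; with them the JPXIV rows of Table 1 give about 0.989, in line with the reported 0.9893. Using k(k − 1)/2 = 3 degrees of freedom likewise reproduces the P-value of 0.2615 reported for JPXIV. The helper names are hypothetical.

```python
import math


def weighted_kappa_linear(t):
    """Weighted kappa with linear weights w_ij = 1 - |i - j| / (k - 1)."""
    k = len(t)
    n = sum(sum(row) for row in t)
    rows = [sum(row) for row in t]
    cols = [sum(t[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        return 1 - abs(i - j) / (k - 1)

    po = sum(w(i, j) * t[i][j] for i in range(k) for j in range(k)) / n
    pe = sum(w(i, j) * rows[i] * cols[j] for i in range(k) for j in range(k)) / n**2
    return (po - pe) / (1 - pe)


def bowker(t):
    """Bowker's symmetry statistic and df = k(k - 1)/2; empty pairs add 0."""
    k = len(t)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pair = t[i][j] + t[j][i]
            if pair:
                stat += (t[i][j] - t[j][i]) ** 2 / pair
    return stat, k * (k - 1) // 2


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


if __name__ == "__main__":
    jpxiv = [[673, 4, 0], [0, 170, 0], [0, 0, 32]]  # rows: JPXIV, cols: CHPIII
    stat, df = bowker(jpxiv)
    print(round(weighted_kappa_linear(jpxiv), 4), stat, df, round(chi2_sf_df3(stat), 4))
```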

Table 4.

Diagnostic parameter analysis of JPs, EP, USP and Hoffmann’s theoretical models in comparison to CHPIII

	Test results by CHPIII	Suspicious batches	Passed batches
Test results by another PRJA	Suspicious batches	A	B
Test results by another PRJA	Passed batches	C	D

Taking the result judgment of CHPIII as the reference standard, A represents the suspicious batches determined by CHPIII and another pyrogen result judgment algorithm (PRJA); B represents the batches that CHPIII determined as passed and another PRJA determined to be suspicious; C represents the batches that CHPIII determined to be suspicious and another PRJA determined as passed; D represents the passed batches determined by CHPIII and another PRJA.

Table 5.

Comparison of the statistical parameters of JPs, EP, USP and Hoffmann’s theoretical models to those of CHPIII (for 2×2 table)

	JPXIV	JPXV	EP6.0	USP32	2-2-2-A and B
Kappa value (95% CI)	0.9872 (0.9748–0.9997)	0.5430 (0.4741–0.6119)	0.6721 (0.6120–0.7321)	0.6184 (0.5671–0.6696)	0.5424 (0.4824–0.6025)
Kendall's tau-b coefficient (asymptotic standard error)	0.9873 (0.0063)	0.6105 (0.0271)	0.6774 (0.0300)	0.6690 (0.0205)	0.5582 (0.0298)
McNemar’s test results (P-value)	4.000 (0.0455)	116.0000 (<0.0001)	14.1134 (0.0002)	151.0000 (<0.0001)	43.7879 (<0.0001)
Se (standard error)	0.9806 (0.0096)	0.4369 (0.0346)	0.6748 (0.0326)	1.0000	0.8058 (0.0276)
Sp (standard error)	1.0000	1.0000	0.9554 (0.0080)	0.7756 (0.0161)	0.8143 (0.0150)
π	0.9954	0.8680	0.8896	0.8282	0.8123
YI (standard error)	0.9806 (0.0096)	0.4369 (0.0346)	0.6302 (0.0336)	0.7756 (0.0161)	0.6201 (0.0314)
LR⁻	0.0194	0.5631	0.3404	–	0.2385
Sample suspicious rates	0.2298	0.1024	0.1923	0.4061	0.3311
PV⁺ for suspicious samples when the total suspicious rate is 0.2344	1	1	0.8225	0.5771	0.5705
PV⁻ for passed samples when the total suspicious rate is 0.2344	0.9941	0.8529	0.9056	1	0.9320
PV⁺ for suspicious samples when the total suspicious rate is 0.10	1	1	0.6271	0.3312	0.3253
PV⁻ for passed samples when the total suspicious rate is 0.10	0.9978	0.9411	0.9636	1	0.9742

Kappa test: evaluates the consistency of two PRJAs in the 2 × 2 table. Kappa ranges from −1 to +1. K = −1 indicates that the judgments of the two PRJAs are completely inconsistent; K = 0 indicates that the observed agreement is no better than chance; K = 1 indicates complete agreement between the two PRJAs beyond chance. Generally, K > 0.75 indicates excellent agreement, 0.40 < K < 0.75 good agreement and K < 0.40 poor agreement.
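The simple kappa for the 2 × 2 layout of Table 4 can be computed directly from A, B, C and D. The sketch below (hypothetical function names) reproduces, to four decimals, the kappa values reported in Table 5 for JPXIV and EP6.0; how the band boundaries at exactly 0.40 and 0.75 are assigned is our assumption.

```python
def cohen_kappa_2x2(a, b, c, d):
    """Simple (Cohen's) kappa for the Table 4 layout: rows = other PRJA,
    columns = CHPIII, with A, B, C, D as defined there."""
    n = a + b + c + d
    po = (a + d) / n                                       # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2    # chance agreement
    return (po - pe) / (1 - pe)


def describe_kappa(k):
    """Verbal bands from the note above (boundary handling assumed)."""
    if k > 0.75:
        return "excellent"
    if k > 0.40:
        return "good"
    return "poor"


if __name__ == "__main__":
    for name, cells in {"JPXIV": (202, 0, 4, 673), "EP6.0": (139, 30, 67, 643)}.items():
        k = cohen_kappa_2x2(*cells)
        print(name, round(k, 4), describe_kappa(k))
```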

Kendall’s tau-b coefficient: a rank correlation coefficient, corrected for ties, applicable when both variables are on an ordinal scale. The higher the tau-b value, the stronger the association with CHPIII.

McNemar’s test: evaluates the difference between the results of two PRJAs in the 2 × 2 table. If P < 0.01, the judgment results of the two PRJAs differ significantly.
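McNemar’s statistic depends only on the discordant cells B and C. The sketch below (a hypothetical helper) uses the uncorrected form (B − C)²/(B + C), which matches the values in Table 5 (e.g. 4.000 for JPXIV and 14.1134 for EP6.0), and obtains the 1-df P-value from the standard-library `math.erfc`.

```python
import math


def mcnemar(b, c):
    """McNemar chi-square (no continuity correction) from the discordant
    cells B and C of Table 4, with its 1-df upper-tail P-value."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant batches: no evidence of a difference
    stat = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))  # P(chi-square with 1 df > stat)
    return stat, p


if __name__ == "__main__":
    for name, (b, c) in {"JPXIV": (0, 4), "EP6.0": (30, 67)}.items():
        stat, p = mcnemar(b, c)
        print(name, round(stat, 4), round(p, 4), "significant" if p < 0.01 else "n.s.")
```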

Sensitivity, Se = A/(A + C): the proportion of the batches judged suspicious by CHPIII that are also judged suspicious by the PRJA in question, reflecting the ability of the PRJA to detect suspicious samples. It is also called the true suspicious rate. A larger value indicates that the ability of the PRJA to identify suspicious batches is closer to that of CHPIII.

Specificity, Sp = D/(B + D): the proportion of the batches judged qualified by CHPIII that are also judged qualified by the PRJA in question, reflecting the ability of the PRJA to detect qualified samples. It is also called the true qualified rate. A larger value indicates that the ability of the PRJA to identify passed batches is closer to that of CHPIII.

Agreement rate, π = (A + D)/(A + B + C + D): the proportion of all batches that the PRJA classifies in the same way as CHPIII (true suspicious plus true qualified batches). It reflects the ability of the PRJA to distinguish correctly the suspicious and qualified samples identified by CHPIII.

Youden index, YI = Se + Sp − 1: the difference between the true suspicious rate and the false suspicious rate, reflecting the overall ability of the PRJA to judge the qualified and suspicious samples identified by CHPIII. A larger value indicates that the results of the PRJA are closer to those of CHPIII.

Negative likelihood ratio, LR⁻ = [C*(B + D)]/[D*(A + C)] = (1 − Se)/Sp: the ratio of the false negative rate to the true negative rate. A smaller value indicates a stronger ability of the PRJA to exclude suspicious samples.
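The diagnostic parameters above, together with the sample suspicious rate reported in Table 5, follow directly from the cells of Table 4. A minimal sketch follows; the class and function names are hypothetical, and the binomial standard errors sqrt(p(1 − p)/n) are our assumption about how the standard errors in Table 5 were obtained (they agree with the reported Se and Sp standard errors we checked, e.g. for EP6.0).

```python
import math
from dataclasses import dataclass


@dataclass
class Diagnostics:
    se: float               # sensitivity, A / (A + C)
    sp: float               # specificity, D / (B + D)
    agreement: float        # pi, (A + D) / N
    youden: float           # YI, Se + Sp - 1
    lr_neg: float           # LR-, (1 - Se) / Sp == C(B + D) / [D(A + C)]
    suspicious_rate: float  # (A + B) / N, share judged suspicious by the PRJA
    se_se: float            # binomial standard error of Se (assumed formula)
    sp_se: float            # binomial standard error of Sp (assumed formula)


def diagnostics(a, b, c, d):
    """Diagnostic parameters of a PRJA against CHPIII (Table 4 notation)."""
    n = a + b + c + d
    se = a / (a + c)
    sp = d / (b + d)
    return Diagnostics(
        se=se,
        sp=sp,
        agreement=(a + d) / n,
        youden=se + sp - 1,
        lr_neg=(1 - se) / sp,
        suspicious_rate=(a + b) / n,
        se_se=math.sqrt(se * (1 - se) / (a + c)),
        sp_se=math.sqrt(sp * (1 - sp) / (b + d)),
    )


if __name__ == "__main__":
    print(diagnostics(139, 30, 67, 643))  # EP6.0 cells from Table 2
```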

The predictive value for the suspicious rate of the total samples, PV⁺ = P*Se/[P*Se + (1 − P)*(1 − Sp)], where P is the total suspicious rate: the probability that a batch judged suspicious by the PRJA is also suspicious by CHPIII. A larger value indicates a stronger predictive ability of the PRJA.

The predictive value for the passed rate of the total samples, PV⁻ = (1 − P)*Sp/[(1 − P)*Sp + P*(1 − Se)], where P is the total suspicious rate: the probability that a batch judged passed by the PRJA is also passed by CHPIII. A larger value indicates a stronger predictive ability of the PRJA.
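Because PV⁺ and PV⁻ depend on the assumed total suspicious rate P, they can be recomputed for any P from Se and Sp alone. The sketch below (hypothetical helper name) uses the EP6.0 cells from Table 2 and returns values matching Table 5 to four decimals at P = 0.2344 and P = 0.10.

```python
def predictive_values(se, sp, p):
    """PV+ and PV- for an assumed total suspicious rate p (0 < p < 1)."""
    pv_pos = p * se / (p * se + (1 - p) * (1 - sp))
    pv_neg = (1 - p) * sp / ((1 - p) * sp + p * (1 - se))
    return pv_pos, pv_neg


if __name__ == "__main__":
    se, sp = 139 / 206, 643 / 673  # EP6.0 from Table 2
    for p in (0.2344, 0.10):
        pos, neg = predictive_values(se, sp, p)
        print(f"P = {p}: PV+ = {pos:.4f}, PV- = {neg:.4f}")
```

For EP6.0 the drop in PV⁺ between the two rates is about 0.195, i.e. the 19.5 percentage points discussed in the Results.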

As shown in Tables 3 and 5, only the results of JPXIV were closely correlated with those of CHPIII (>0.98); the other PRJAs showed moderate correlation, with Kendall’s tau-b and weighted kappa coefficients (Table 1 data) or simple kappa coefficients (Table 2 data) between 0.5 and 0.7. The order of correlation was the same for both statistical methods: JPXIV > EP6.0 > USP32 > JPXV > 2-2-2-A and 2-2-2-B. Comparison of the discordant batch counts by Bowker’s test for symmetry (Table 1) or McNemar’s test (Table 2) showed significant differences (P < 0.001) between CHPIII and each of EP6.0, USP32, JPXV and Hoffmann’s theoretical models, but not between JPXIV and CHPIII.

Taking the CHPIII judgments as the reference standard, the order of Se (Table 5), which reflects the ‘true positive rate’ of the other PRJAs, was, in decreasing order, USP32 (1) > JPXIV (0.98) > 2-2-2-A and B (0.81) > EP6.0 (0.67) > JPXV (0.44). The order of Sp, which reflects the ‘true negative rate’, was JPXIV = JPXV (1) > EP6.0 (0.96) > 2-2-2-A and B (0.81) > USP32 (0.78). Because both the TTRIR and the TSTRATR of USP32 are lower than those of CHPIII (0.5°C and 3.3°C in USP32 versus 0.6°C and 3.5°C in CHPIII), USP32 judged more batches suspicious than CHPIII did, and its Se was the largest (1.000, the most sensitive) and its Sp the smallest (0.78). The Se values of JPXIV and JPXV differed greatly, although their Sp values (1) were the same: the Se of JPXIV (0.98) was almost equal to that of CHPIII, whereas the Se of JPXV (0.44) was the smallest among all the PRJAs. Thus, although the PRJA of CHPIII is most similar in form to that of USP32, its results are most similar to those of JPXIV, not USP32. In addition, many batches judged suspicious by CHPIII (116 of 206) were classified as passed by JPXV. This indicates that most suspicious batches arose because an individual rabbit exceeded the TTRIR of 0.6°C while the sum of the temperature rises did not exceed the 1.3°C TSTRATR of JPXV.

The order of the π values (Table 5), which reflect the ability of the other PRJAs to distinguish correctly between suspicious and passed batches at the observed total suspicious rate of 0.2344, was JPXIV (0.9954) > EP6.0 (0.8896) > JPXV (0.8680) > USP32 (0.8282) > 2-2-2-A and B (0.8123). The order of the YIs, which reflect the overall discriminating capacity of the PRJAs, was JPXIV (0.9806) > USP32 (0.7756) > EP6.0 (0.6302) > 2-2-2-A and B (0.6201) > JPXV (0.4369). The π and YI values of JPXIV were both close to 1, indicating that its ability to judge suspicious and qualified samples was closely consistent with that of CHPIII, in agreement with the Se and Sp results. Although the π values of USP32, EP6.0, JPXV and 2-2-2-A and B all exceeded 0.80, their overall judgment ability differed considerably: the YI of USP32 was close to 0.8, those of EP6.0 and 2-2-2-A and B were within 0.62–0.63, and that of JPXV was the lowest at about 0.44, showing that the overall judgment of JPXV differed most from that of CHPIII, consistent with its low Se.

The order of the LR⁻ values (Table 5), from the strongest to the weakest ability to exclude suspicious samples, was JPXIV (0.0194) > 2-2-2-A and B (0.2385) > EP6.0 (0.3404) > JPXV (0.5631). The LR⁻ of JPXV was the largest, suggesting that its ability to identify suspicious samples was the weakest, which again indicates that JPXV differs most from CHPIII.

When the total suspicious rate was reduced from 0.2344 to 0.10 (Table 5), the PV⁺ values, which represent the ability of a PRJA to predict which batches are truly suspicious, declined. Although the ranking of the PRJAs by PV⁺ was the same at both rates, the size of the decline differed: it was sharpest for USP32 (24.6 percentage points), followed by 2-2-2-A and B (24.5) and EP6.0 (19.5). The PV⁺ values of JPXIV and JPXV did not change (both remained 1): because their Sp is 1, no batch passed by CHPIII was judged suspicious by either JP, so every batch they judge suspicious is also suspicious by CHPIII, whatever the total suspicious rate.

When the total suspicious rate was reduced from 0.2344 to 0.10 (Table 5), the PV⁻ values, which represent the ability of a PRJA to predict which batches are truly passed, increased. Although the ranking of the PRJAs by PV⁻ was the same at both rates, the size of the increase differed. The PV⁻ of USP32 did not change (it remained 1). The PV⁻ values of the other PRJAs rose to varying degrees, in the order JPXV (8.8 percentage points) > EP6.0 (5.8) > 2-2-2-A and B (4.2) > JPXIV (0.4), meaning that, as the total suspicious rate decreased, the predictive ability of all the other PRJAs for qualified samples increased (to over 0.94). JPXV rose the most (by 8.8 percentage points, to 0.94), showing that at a low total suspicious rate the ability of JPXV to identify qualified samples was enhanced the most.

Discussion

At present, the interpretation of the pyrogen test differs among the major pharmacopoeias. Three main patterns can be distinguished. First, the PRJA depends only on the TTRIR, as in JPXIV. Second, the PRJA depends only on the TSTRATR, as in EP6.0, JPXV and Hoffmann’s models. Third, the PRJA depends on both the TTRIR and the TSTRATR, as in USP32 and CHPIII. The data above show that, for the same original test data, the choice of PRJA significantly affects the results of the first stage of the test. More batches needed additional stages under EP6.0 and USP32 because the first-stage TSTRATR of EP6.0 is only 1.15°C and the TTRIR of USP32 only 0.5°C; these are the lowest values compared with JPXV (1.3°C), CHPIII (1.4°C and 0.6°C) and JPXIV (0.6°C). More batches were classified as failed by CHPIII and JPXIV in the first stage because of their strict criterion of a temperature rise of over 0.6°C in two of the three rabbits, which is the strictest failure criterion compared with EP6.0 (a mean of 2.65/3 = 0.88°C per rabbit) and JPXV (2.5/3 = 0.83°C). USP32 cannot judge a sample as failed in the first stage, but it can be deduced^3,12 that if the sum of the temperature rises of the three rabbits in the first stage exceeds 3.3°C (a mean of more than 1.1°C per rabbit), the sample could also be judged as failed, because the 3.3°C limit for all eight rabbits would already be exceeded; however, this rarely happened in our data.

In 2005, Hoffmann et al.³ analyzed how the different thresholds of mean temperature rise used in the PRJAs of the EP, USP and JP affect the results. They found that at an endotoxin dose of 7.38 EU/kg, which produces a mean temperature rise in rabbits of 0.60°C (considered pyrogenic), all three PRJAs gave a probability of at least 95.0% of a pyrogenic classification. However, at an endotoxin dose of 2.53 EU/kg, corresponding to a mean rise of 0.41°C, the probability of a pyrogenic classification ranged from 2.5% for the EP to 53.8% for the USP, so that the same sample could easily be judged differently. Our data showed that some of the batches passed by JPXV were judged suspicious, needing additional stages, by EP6.0 (9.0%), JPXIV (12.8%), CHPIII (13.2%), USP32 (30.4%) and Hoffmann’s theoretical models (22.9%; Tables 1 and 2). This indicates that, because the test has its own limitations, examining a sample with stricter criteria (such as those of USP32) increases the number of additional stages and the attention required. Therefore, the various PRJAs should be further harmonized into a single PRJA that ensures a suitable margin of safety while using fewer rabbits.

Other observations also point to the limitations of the test. Over the past 60 years of RPT use, great individual variation in febrile sensitivity has been found among rabbits.¹³ About two-thirds of rabbits develop a fever of more than 0.5°C at an endotoxin dose of 5 EU/kg, whereas a few sensitive rabbits develop a fever of more than 0.5°C even at doses below 2.5 EU/kg. Our routine work has also shown that most batches requiring additional stages did so because of a high temperature rise in a single rabbit (with the other two rabbits rising less than 0.3°C), and only a few because the sum of the temperature rises of all three rabbits exceeded the threshold.
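A simple probability argument shows why TTRIR-based rules trigger so many additional stages. Suppose, hypothetically, that a fraction f of rabbits responds above the TTRIR to a sample that is in fact acceptable, independently of one another. The chance that a group of n rabbits contains at least one such responder is 1 − (1 − f)^n, and under a TTRIR rule a single responder is enough to require an additional stage. The value f = 0.10 below is illustrative, not an estimate from our data.

```python
def p_group_has_sensitive(f: float, group_size: int = 3) -> float:
    """Chance that at least one of `group_size` independent rabbits is 'sensitive'.

    f is a hypothetical per-rabbit probability of an above-threshold response
    to an acceptable sample.
    """
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be a probability")
    return 1.0 - (1.0 - f) ** group_size

for n in (2, 3, 8):
    print(n, round(p_group_has_sensitive(0.10, n), 3))
```

The chance grows with group size, which is one reason a rule based on a single rabbit's response is sensitive to individual variation, whereas a summed rule dilutes one high responder across the group.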

Interestingly, although the PRJA of CHPIII is similar to that of USP32, its results in this study were highly consistent with those of JPXIV. Further analysis (data not shown) revealed that, under CHPIII, 194 (94.2%) of the 206 suspicious batches were ultimately judged as passed after additional stages. The actual failure rate (<5%) was lower than even that determined by JPXV. This supports, from another angle, the view that the differences among the test results arise mainly from individual variation in sensitivity, and that JPXV distinguishes qualified from unqualified samples more realistically than the other PRJAs: although its Se was the lowest, its Sp and LR⁻ were the highest of all the PRJAs in comparison with CHPIII. In addition, when the total suspicious rate of samples declined from 23.4% to 10%, the predictive value for qualified samples improved most under JPXV (8.8%) compared with the other PRJAs (e.g. 5.8% for EP). Thus, when the total suspicious rate of samples is below 10%, JPXV has the best overall capacity to discriminate between passed and suspicious batches, with further advantages such as fewer animals used and savings in manpower and costs. This may be why JPXV revised its PRJA to use TSTRATRs as criteria. We conclude that the TSTRATR is the more scientific criterion and is capable of guaranteeing product quality, because it takes all of the rabbits used in the test into account; this view is similar to that of Tschumi.¹²
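Se, Sp and LR⁻ are standard diagnostic-test statistics (reference 11). The sketch below shows how they, together with the predictive value of a "pass" call, follow from a 2 × 2 table. It assumes that a first-stage suspicious or failed call counts as a positive result and that the final outcome under a chosen reference (for example, CHPIII after additional stages) is the truth. The counts are hypothetical, not data from this study, and the class and function names are illustrative.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical 2 x 2 table: a PRJA's first-stage call versus a reference outcome."""
    tp: int  # flagged by the PRJA, unqualified by the reference
    fp: int  # flagged by the PRJA, qualified by the reference
    fn: int  # passed by the PRJA, unqualified by the reference
    tn: int  # passed by the PRJA, qualified by the reference

def diagnostic_metrics(t: TwoByTwo) -> dict:
    se = t.tp / (t.tp + t.fn)                         # sensitivity
    sp = t.tn / (t.tn + t.fp)                         # specificity
    lr_pos = se / (1 - sp) if sp < 1 else math.inf    # positive likelihood ratio
    lr_neg = (1 - se) / sp                            # negative likelihood ratio
    npv = t.tn / (t.tn + t.fn)                        # predictive value of a "pass" call
    return {"Se": se, "Sp": sp, "LR+": lr_pos, "LR-": lr_neg, "NPV": npv}

example = TwoByTwo(tp=40, fp=20, fn=10, tn=130)  # hypothetical counts
for name, value in diagnostic_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

Because LR⁻ = (1 − Se)/Sp, a rule with low Se tends to have a higher LR⁻ unless its Sp is correspondingly high, so the two values are best read together.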

As more and more types of parenteral pharmaceutical products are developed and used clinically, more rabbits are needed for the pyrogen test. However, because modern pharmaceutical production complies with GMP, fewer batches contain excessive pyrogen than before. Tables 2 and 5 show that PRJAs using TTRIR as a criterion have a higher false positive rate than those using TSTRATR, and that JPXV is more powerful, realistic and scientific in discriminating between passed and failed batches, even in the first stage of the test, as the total suspicious sample rate declines. This means that when the overall qualification rate of samples is high, PRJAs that use TSTRATR as the criterion (such as EP6.0 and JPXV) can use fewer animals than CHPIII, USP32 and JPXIV, with little effect on the detection of truly suspicious batches.
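The link between the suspicious rate and animal use can be made explicit with the expected number of rabbits per sample in a staged test, where each stage's rabbits are used only if every earlier stage was inconclusive. The sketch below assumes a fixed, hypothetical probability of proceeding after each stage; the stage sizes are inputs, so the same function covers three-rabbit stages or two-rabbit stages. The probabilities are illustrative, not estimates from Tables 2 or 5, and the function name is hypothetical.

```python
from typing import Sequence

def expected_rabbits(stage_sizes: Sequence[int], p_continue: Sequence[float]) -> float:
    """Expected rabbits per sample in a staged test.

    stage_sizes[k] -- rabbits added at stage k
    p_continue[k]  -- probability that stage k is inconclusive, so stage k+1 is run
    """
    if len(p_continue) != len(stage_sizes) - 1:
        raise ValueError("need one continuation probability per stage boundary")
    expected, p_reach = 0.0, 1.0
    for k, size in enumerate(stage_sizes):
        expected += p_reach * size       # this stage runs with probability p_reach
        if k < len(p_continue):
            p_reach *= p_continue[k]     # probability of reaching the next stage
    return expected

# Hypothetical: a lower first-stage suspicious rate cuts expected animal use.
for p1 in (0.25, 0.10):
    print(p1, expected_rabbits([3, 3, 3], [p1, 0.5]))
```

Under the same assumptions, expected_rabbits([2, 2, 2], [p1, p2]) gives the corresponding figure for a scheme with two rabbits per stage.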

The 2-2-2-A and 2-2-2-B models proposed by Hoffmann et al.³ are theoretical models based on endotoxin-induced fever that take into account the 3R principles and global harmonization. The 2-2-2-A model was designed as a compromise between the EP6.0 and JPXIV algorithms, while the 2-2-2-B model simulates USP32. Both models use only two rabbits in each stage, with at most three stages, so no more than six rabbits are needed for a final judgment. Our data show almost no difference between the results of the two models, and more samples needed additional stages under both models than under EP6.0, JPXIV and JPXV, perhaps because only two animals are used per stage. The data also indicate that applying the TSTRATR criterion to three rabbits in the first stage is preferable to using two, because it reduces the number of additional stages and saves manpower and costs. Using only two rabbits per group may also risk missing problematic samples, because it is sometimes a single rabbit's response that causes a sample to fail.

Given that (i) the goal of the pyrogen test is to ensure the safety of parenteral medicines; (ii) a modified PRJA should be accurate and reduce costs (i.e. manpower, materials and animals); and (iii) the pyrogen tests in EP and CHP have been used for several decades without serious clinical problems being reported, we consider the latest improvements to the pyrogen test in JPXV scientific and consistent with the 3R principles (using up to three fewer rabbits than EP). In the future, JPXV may serve as a good reference for revisions of other pharmacopeias.

As explained in our previous paper,¹⁴ the probe currently used in the RPT is made of integrated PVC and has an accuracy of up to 0.1°C. According to the definition of significant digits, it is more scientific and reasonable to record temperatures to the hundredth digit,¹⁵ because doing so avoids propagating large rounding errors into later calculations, improves the accuracy of the results and reduces additional stages and the waste of rabbits. For example, if hundredths are retained, temperature rises of 0.66°C, 0.25°C and 0.22°C in three rabbits (sum 1.13°C) are judged pyrogen-free by EP6.0 and JPXV but require additional stages under USP32, CHPIII and JPXIV. If the values are rounded to tenths, the rises become 0.7°C, 0.3°C and 0.2°C (sum 1.2°C), and only JPXV gives a pyrogen-free conclusion.
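The rounding effect in this example can be checked with exact decimal arithmetic. The sketch rounds each rise to tenths with round-half-up (an assumption, chosen because it turns 0.25°C into 0.3°C as in the example) and applies only the first-stage sum rules of EP6.0 and JPXV as quoted in this discussion; the inclusive or exclusive handling of each limit is assumed, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def to_tenths(x: Decimal) -> Decimal:
    """Round a temperature rise to 0.1 deg C, halves rounded up."""
    return x.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

def ep60_stage1(total: Decimal) -> str:
    if total <= Decimal("1.15"):
        return "pass"
    return "fail" if total > Decimal("2.65") else "additional stage"

def jpxv_stage1(total: Decimal) -> str:
    if total <= Decimal("1.3"):
        return "pass"
    return "fail" if total >= Decimal("2.5") else "additional stage"

raw = [Decimal("0.66"), Decimal("0.25"), Decimal("0.22")]
rounded = [to_tenths(x) for x in raw]

for label, rises in (("hundredths", raw), ("tenths", rounded)):
    total = sum(rises)
    print(label, total, ep60_stage1(total), jpxv_stage1(total))
```

Python's built-in round() is avoided deliberately: it rounds half to even on binary floats, so round(0.25, 1) returns 0.2 rather than 0.3.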

Accumulated research in our laboratory (94.2% of 206 suspicious samples were ultimately proven qualified after additional stages) and the above analysis of the various PRJAs show that using the TSTRATR for the RPT is more reasonable and has more advantages, such as fewer retests and fewer animals used. Taking into account the clinical safety of the products, the advantages of the RPT in JPXV and EP6.0, and the relevant literature,^3,12 we propose another PRJA, shown in Table 6, for evaluation and discussion. In this PRJA, the final judgment threshold for evaluating samples is the same as that of JPXV and EP6.0 (a mean temperature rise of 0.55°C), both of which use the TSTRATR as the judgment criterion. The first and second thresholds are set between those of the two pharmacopoeias, and the mean-rise thresholds of the three steps increase (for passing samples) or decrease (for failing samples) in equal steps. When the original data above were rechecked with this proposed PRJA, the first-stage results were identical to those obtained with JPXV.
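Written out explicitly, the equal spacing of the Table 6 thresholds is as follows. Here k is the step number, 3k the cumulative number of rabbits, \bar{t} the mean-rise threshold per rabbit and T the summed threshold; these symbols are introduced only for illustration.

```latex
\[
\bar{t}^{\,\mathrm{pass}}_{k} = 0.45 + 0.05\,(k-1), \qquad
\bar{t}^{\,\mathrm{fail}}_{k} = 0.87 - 0.16\,(k-1), \qquad k = 1, 2, 3
\]
\[
T^{\mathrm{pass}}_{k} = 3k\,\bar{t}^{\,\mathrm{pass}}_{k} = 1.35,\ 3.00,\ 4.95\ ^{\circ}\mathrm{C}, \qquad
T^{\mathrm{fail}}_{k} = 3k\,\bar{t}^{\,\mathrm{fail}}_{k} = 2.61,\ 4.26,\ 4.95\ ^{\circ}\mathrm{C}
\]
```

At k = 3 the two mean thresholds coincide at 0.55°C, which is why the last step always yields either a pass or a fail.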

Table 6.

A newly proposed PRJA

Cumulative number of rabbits	Product passes if summed temperature rise does not exceed (mean rise per rabbit)	Product fails if summed temperature rise exceeds (mean rise per rabbit)
3	1.35°C (0.45°C)	2.61°C (0.87°C)
6	3.00°C (0.50°C)	4.26°C (0.71°C)
9	4.95°C (0.55°C)	4.95°C (0.55°C)

The proposed PRJA consists of three steps with three rabbits each, and a pass or fail classification is possible at every step. In the first two steps, a sum falling between the two thresholds requires proceeding to the next step. The classification criteria are the TSTRATRs. In the first step, for example, the TSTRATRs are 1.35°C and 2.61°C: a summed temperature rise less than or equal to the lower TSTRATR (1.35°C) results in a pass, a sum above the upper TSTRATR (2.61°C) results in a fail, and a sum between the two requires testing three more rabbits in a second step. In the last step, the sample fails if the sum of the temperature rises of all nine rabbits exceeds 4.95°C and passes otherwise.
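The stepwise procedure of Table 6 can be sketched as a short sequential evaluator. The thresholds are copied from Table 6; the input format (one list of three temperature rises, in °C, per step) and the function name are assumptions made for illustration.

```python
from decimal import Decimal
from typing import Iterable, Sequence, Tuple

# (cumulative rabbits, pass if summed rise <= this, fail if summed rise > this)
TABLE_6 = (
    (3, Decimal("1.35"), Decimal("2.61")),
    (6, Decimal("3.00"), Decimal("4.26")),
    (9, Decimal("4.95"), Decimal("4.95")),
)

def judge_table6(groups: Iterable[Sequence[float]]) -> Tuple[str, int]:
    """Apply Table 6 step by step; returns (verdict, rabbits used)."""
    total = Decimal(0)
    used = 0
    for (n_cum, pass_limit, fail_limit), group in zip(TABLE_6, groups):
        if len(group) != 3:
            raise ValueError("each step uses exactly three rabbits")
        total += sum(Decimal(str(r)) for r in group)  # exact decimal sum
        used += 3
        assert used == n_cum  # cumulative count matches the Table 6 row
        if total <= pass_limit:
            return "pass", used
        if total > fail_limit:
            return "fail", used
    raise ValueError(f"inconclusive after {used} rabbits: test the next three")

print(judge_table6([[0.3, 0.4, 0.5]]))
print(judge_table6([[0.6, 0.5, 0.4], [0.5, 0.4, 0.5]]))
```

Because the pass and fail limits coincide at nine rabbits (4.95°C), the loop always reaches a verdict by the third step when three groups are supplied; supplying fewer groups than the procedure needs raises an error instead of guessing.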

New alternatives to the pyrogen test are needed. Owing to differences between species and between individuals, the credibility of RPT results is often questioned, especially for certain special products (such as human growth hormone). Scientists have therefore been looking for alternative methods of pyrogen detection. The bacterial endotoxin test (BET), one such alternative, has been widely accepted, although it is also affected by species differences and has many limitations; for example, it can detect only endotoxins. At present, the European Centre for the Validation of Alternative Methods (ECVAM) has validated five in vitro pyrogen tests,¹⁶⁻²³ all of which measure pyrogenic activity through cytokine secretion by human blood cells: human whole blood IL-1, human whole blood IL-6, PBMC IL-6, MM6 IL-6 and human cryopreserved whole blood IL-1. Although these five methods have been adopted by the EP and endorsed by the FDA, they still have limitations; for example, they are not appropriate replacements for the RPT for drugs or biologics whose pharmacodynamic activity is to induce cytokine release. These five methods still require further evaluation before wide use, and none can yet be considered a complete replacement for the RPT without additional product-specific information.

Footnotes

Acknowledgements

The authors are grateful to Dr Ji-Fu Wei (Clinical Experiment Center, First Affiliated Hospital of Nanjing Medical University) for critical discussion of our study and to Dr Qian Liu (Department of Pharmacology, National Institute for the Control of Pharmaceutical and Biological Products). This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

References

1. Huang. Development and discussion of pyrogen test methods. Chin J Lab Anim Sci 2002; 12: 232–235.

2. Tan, Zhang, Wang. Pyrogen detection and evaluation. In: Safety Evaluation of Biotechnology Pharmaceuticals, 1st edn. Beijing: People’s Medical Publishing House, 2008; 476–482.

3. Hoffmann, Lüderitz-Püchel, Montag, Hartung. Optimisation of pyrogen testing in parenterals according to different pharmacopoeias by probabilistic modeling. J Endotoxin Res 2005; 11: 25–31.

4. Tan, Zhang. Discussion of the major differences of pyrogen test between USP, EP, JP and ChP. Chin J Pharm Anal 2008; 28: 2149–2155.

5. Society of Japanese Pharmacopoeia. General Tests, 47 Pyrogen Test. In: The Japanese Pharmacopoeia, 14th edn (English version). Tokyo, 2001; 78–79.

6. Society of Japanese Pharmacopoeia. Biological Tests, 4.04 Pyrogen Test. In: The Japanese Pharmacopoeia, 15th edn (English version). Tokyo, 2006; 87–88.

7. Council of Europe. Methods of analysis, Biological Tests, 2.6.8 Pyrogen. In: European Pharmacopeia. Strasbourg: Council of Europe, 2007; 164–165.

8. Biological Tests, ‘pyrogen test’. In: The United States Pharmacopeial Convention, eds. The United States Pharmacopoeia 32, 27th edn. Rockville, MD, 2009; 124–125.

9. Appendix XIID Pyrogen test. In: Chinese Pharmacopoeia Commission, ed. Pharmacopoeia of the People’s Republic of China (2005) Volume III, 1st edn. Beijing: Chemical Industry Publishing House, 2005; appendix 77–78.

10. Tan, Zhang, Ren. The effects on the sham test on the result of rabbit pyrogen test. Chin Pharm Aff 2005; 19: 25–28.

11. Hu. Statistical analysis method of the diagnostic test. In: Testing medical research design and statistical analysis, 1st edn. Beijing: People’s Military Medical Press, 2006; 168–174.

12. Tschumi. Comparison of temperature rise interpretations between European and United States Pharmacopeias’ pyrogen tests. PDA J Pharm Sci Technol 2003; 57: 218–227.

13. Tan, Ren, Huang. Pyrogen and pyrogen detection method of the original research. Chin J Pharm Anal 2004; 24: 653–659.

14. Tan, Ren. Comparison of the pyrogen test methods in ‘Chinese Pharmacopeia’ and ‘Requirements for Biologics of PRC’. Chin Pharm Aff 2005; 19: 304–307.

15. The effect to pyrogen test of the temperature probe’s precision and rounding off method. Chin Pharm Aff 1994; 8: 312–313.

16. Hartung T. Statement on the validity of the in-vitro pyrogen test. European Centre for the Validation of Alternative Methods, Ispra, 2006.

17. Nakagawa, Maeda, Murai. Evaluation of the in vitro pyrogen test system based on proinflammatory cytokine release from human monocytes: comparison with a human whole blood culture test system and with the rabbit pyrogen test. Clin Diagn Lab Immunol 2002; 9: 588–597.

18. Schindler, Spreitzer, Löschner. International validation of pyrogen tests based on cryopreserved human primary blood cells. J Immunol Methods 2006; 316: 42–51.

19. Hasiwa, Kullmann, von Aulock, Klein, Hartung. An in vitro pyrogen safety test for immune-stimulating components on surfaces. Biomaterials 2007; 28: 1367–1375.

20. Gao. The progress of a new in vitro pyrogen test. Chin J Pharm Anal 2007; 27: 777–781.

21. Kikkert, de Groot, Aarden. Cytokine induction by pyrogens: comparison of whole blood, mononuclear cells, and TLR-transfectants. J Immunol Methods 2008; 336: 45–55.

22. Daneshian, von Aulock, Hartung. Assessment of pyrogenic contaminations with validated human whole-blood assay. Nat Protocols 2009; 4: 1709–1721.

23. Alderson NE. ICCVAM/FDA communication. 2009; <http://ecvam.jrc.it/index.htm>.

Comparison of temperature rise interpretations in the rabbit pyrogen test among Chinese, Japanese, European, and United States pharmacopeias and 2-2-2 theoretical models proposed by S. Hoffmann
