Abstract
Although the rabbit pyrogen test is included in every pharmacopoeia as a key method for evaluating the safety of parenteral medicines, the experimental procedures and pyrogen result judgment algorithms (PRJAs) still differ greatly. The original first-stage data of 879 batches, obtained from a total of 2637 rabbits in our laboratory, were judged by the PRJAs of the Chinese Pharmacopoeia 2005 Volume III, the Japanese Pharmacopoeia XIV, the Japanese Pharmacopoeia XV, the European Pharmacopeia 6.0, the United States Pharmacopoeia 32 NF27 and two theoretical models proposed by S. Hoffmann. The results were analyzed to evaluate the effects of the different PRJAs. The analysis showed that: (i) the significant differences among the results judged by the various pharmacopeias and Hoffmann’s theoretical models were mainly due to the PRJAs themselves, and these PRJAs should be harmonized worldwide, balancing the reduction of animal use against guaranteeing the safety of medicines; (ii) according to clinical evidence and the experimental data, PRJAs based on a threshold for the sum of the temperature rises of all tested rabbits are preferable to those based on the number of rabbits whose individual temperature rise exceeds a threshold; and (iii) the PRJA of the Japanese Pharmacopoeia XV has clear advantages when the total suspicious rate of samples is below 10%. Additionally, a new PRJA designed to reduce additional test stages and animal consumption is proposed for evaluation.
Introduction
Reactions caused by pyrogens are well known to be harmful to the human body. The major symptoms are fever, chills, nausea, vomiting, headache, lower back and/or joint pain, grayish skin color, leukopenia and increased vascular permeability. In severe cases, pyrogens can lead to coma, shock and even death.1,2 To guarantee the safety of parenteral pharmaceutical products, especially intravenous medicines, most pharmacopoeias have adopted the rabbit pyrogen test (RPT), which they apply strictly to all pharmaceutical products for parenteral use, including biological products and medical devices (products for in vivo use). Given the lower pyrogen content of samples produced under modern Good Manufacturing Practice (GMP) and the recent demand to reduce animal use under the replacement/refinement/reduction (3Rs) principle, Hoffmann et al. 3 proposed two additional pyrogen test models, 2-2-2-A and 2-2-2-B (Hoffmann’s theoretical models), in 2005.
Although the pyrogen tests in every pharmacopoeia and Hoffmann’s theoretical models share the same purpose, their test procedures and result interpretations differ. 4 We identified eight main differences among the two Japanese Pharmacopoeias (JPXIV 5 and JPXV 6 ), the European Pharmacopoeia 6.0 (EP6.0), 7 the United States Pharmacopoeia 32 NF27 (USP32), 8 the Chinese Pharmacopoeia 2005 Volume III (CHPIII) 9 and Hoffmann’s theoretical models. First, the number of rabbits used in the first stage of the pyrogen test differs: three in each pharmacopoeia and two in Hoffmann’s theoretical models. Second, the maximum number of test stages and the total number of rabbits used for the final judgment differ. EP6.0 requires up to 4 stages and 12 rabbits (the highest total compared here); JPXV requires up to 3 stages and 9 rabbits; CHPIII, USP32 and JPXIV require up to 2 stages and 8 rabbits; and Hoffmann’s theoretical models require up to 3 stages and 6 rabbits (the lowest total compared here). Third, the pyrogen result judgment algorithms (PRJAs) differ. JPXIV uses the threshold of temperature rise of an individual rabbit (TTRIR) as the judgment criterion; EP6.0, JPXV and Hoffmann’s theoretical models use the threshold of the sum of the temperature rises of all tested rabbits (TSTRATR); and USP32 and CHPIII use combined criteria, namely both the TTRIR and the TSTRATR. Fourth, the TTRIR and TSTRATR values differ. Although the PRJA of CHPIII is similar to that of USP32, its TTRIR and TSTRATR are higher (0.6°C and 3.5°C in CHPIII versus 0.5°C and 3.3°C in USP32, respectively).8,9 In the first two stages, the TSTRATRs at or below which a sample is qualified (defined as ‘passed’ in this context) are lower in EP6.0 than in JPXV, whereas the TSTRATRs beyond which a sample is pyrogenic (defined as ‘failed’ in this context) are higher in EP6.0 than in JPXV.
Fifth, the mean temperature rise per rabbit, calculated as the maximum permitted sum of temperature rises divided by the total number of rabbits, differs: USP32 is 0.41°C (3.3°C divided by 8, the lowest), Hoffmann’s 2-2-2-B is 0.42°C, CHPIII is 0.44°C, Hoffmann’s 2-2-2-A is 0.52°C, both EP6.0 and JPXV are 0.55°C, and JPXIV is 0.6°C 3 (the highest). Sixth, all the PRJAs except USP32 allow a sample to be judged as failed directly in the first stage of the test. Seventh, the method for determining the initial temperature differs among the PRJAs. 10 EP6.0, CHPIII and JPXV use the mean of two temperatures recorded from the same rabbit at an interval of 30 min within the 40 min immediately before injection of the sample, whereas USP32 requires the control temperature to be determined within 30 min before injection. Finally, the permitted range of initial rabbit temperature differs. 10 It is 38.0–39.6°C in CHPIII and 38.0–39.8°C in EP6.0, no more than 39.8°C in USP32, JPXIV and JPXV, and is not specified in Hoffmann’s theoretical models. In this paper, the methods are compared solely on the basis of differences in how temperature rises are interpreted, particularly among JPXIV, JPXV, EP6.0, USP32, CHPIII and Hoffmann’s theoretical models.
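The fifth difference reduces to dividing a final-stage TSTRATR by the maximum number of rabbits. The following is a minimal Python sketch, assuming the limits listed in its comments: the USP32 and CHPIII sums are those quoted above, the EP6.0 limit of 6.60°C for 12 rabbits is taken from EP chapter 2.6.8 rather than from this paper, and 4.95°C for 9 rabbits is the final limit of the new PRJA proposed in the Discussion. JPXV is left out because its final sum is not quoted here, and JPXIV because its 0.6°C criterion is already a per-rabbit value. The names `FINAL_LIMITS` and `mean_rise_threshold` are illustrative only.

```python
# Final-stage limits: (maximum permitted sum of temperature rises in °C,
# maximum number of rabbits). USP32 and CHPIII values are quoted in the text;
# the EP6.0 limit is taken from EP 2.6.8 and the "Proposed" limit from the
# new PRJA described in the Discussion. Names are illustrative only.
FINAL_LIMITS = {
    "USP32": (3.3, 8),
    "CHPIII": (3.5, 8),
    "EP6.0": (6.60, 12),
    "Proposed": (4.95, 9),
}


def mean_rise_threshold(max_sum: float, n_rabbits: int) -> float:
    """Mean per-rabbit temperature rise implied by a final-stage TSTRATR."""
    return max_sum / n_rabbits


for name, (max_sum, n) in FINAL_LIMITS.items():
    mean = mean_rise_threshold(max_sum, n)
    print(f"{name:8s} {max_sum:.2f} °C / {n:2d} rabbits = {mean:.4f} °C")
```

Rounded to two decimals, the printed means are 0.41°C (USP32), 0.44°C (CHPIII) and 0.55°C (EP6.0 and the proposed PRJA), in line with the figures above.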
The differences between pharmacopeias, especially in the interpretation of test results, remain an issue. On the one hand, to our knowledge, the differences in result interpretation have never been studied extensively, and the feasibility of Hoffmann’s models has not been examined; on the other hand, the RPT is only a limit test subject to large individual variation, and whether strict pyrogen control by the RPT is suitable and cost-effective is controversial. The present study was initiated to analyze the consistency and differences among the PRJAs of EP6.0, USP32, JPXIV, JPXV, CHPIII and Hoffmann’s theoretical models using the original data from our laboratory, and subsequently to discuss the optimal PRJA. On the basis of this analysis and the long record of safe use of products under current controls, we propose a new PRJA that is simple and reasonable, reduces the number of repeat stages and accords with the 3R principles (reducing animal consumption).
Where necessary below, the abbreviations EP6.0, USP32, JPXIV, JPXV, CHPIII and Hoffmann’s theoretical models also denote the corresponding PRJAs.
Materials and methods
Materials
The ZRY-2A pyrogen testing instrument was purchased from Tianda-Tianfa Science and Technology, LLC (Tianjin, PR China). All test samples came from different Chinese biopharmaceutical companies. The 879 batches of test samples comprised human serum albumin (414 batches), human immunoglobulin for intravenous injection (199 batches), Haemophilus influenzae type b conjugate vaccine (206 batches) and polyvalent pneumococcal polysaccharide vaccine (60 batches). All tests were completed between 2006 and 2008.
Animals and experimental environment
In total, 2637 healthy New Zealand white rabbits, males or non-pregnant females weighing 1.7–2.7 kg, were purchased from KeYu Laboratory Animal Company in Beijing. The qualified certificate numbers are SCXK (Jing) 2002-0005 and SCXK (Jing) 2007-0003.
All tests were performed in barrier rooms (license number: SYXK (Jing) 2006-0004). The temperature difference between the experimental rooms and the rabbits’ living quarters was not more than 3°C. The laboratory temperature was 20–25°C and the humidity 40–70%. Rabbits were housed one per cage and fed the same diet for 7 d before their body temperature was first measured for the test. During this period, no abnormal manifestations, such as loss of body weight or changes in behavior, appetite or excretion, were observed.
Pyrogen test
The methods and environmental conditions of the CHPIII pyrogen test are similar to those described in the other pharmacopoeias and Hoffmann’s theoretical models, 3 – 10 but the CHPIII description is more explicit. 4 It gives the following directions. Rabbits that have not previously been used for pyrogen testing must first pass a screening stage. First, 3 d before the first stage of the test, measure the body temperature of each rabbit 8 times at intervals of 30 min under the same conditions as the pyrogen test but without injecting a test sample (only the EP6.0 screening test requires intravenous injection of 10 ml/kg of body weight of pyrogen-free 9 g/l sodium chloride solution pre-warmed to about 38.5°C). Second, a rabbit may be used for the pyrogen test only if all 8 body temperatures are within 38.0–39.6°C and the difference between the highest and the lowest is not more than 0.4°C.
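The screening rule can be written as a small predicate. This Python sketch assumes readings in °C and encodes only the CHPIII criteria stated above (8 readings within 38.0–39.6°C, spread of at most 0.4°C); the EP6.0 saline-injection variant is not modelled, and `passes_screening` is a hypothetical helper.

```python
def passes_screening(readings, low=38.0, high=39.6, max_spread=0.4, n_required=8):
    """CHPIII-style screening: 8 readings, all within [low, high], and the
    highest minus the lowest not more than max_spread (all in °C)."""
    if len(readings) != n_required:
        raise ValueError(f"expected {n_required} readings, got {len(readings)}")
    all_in_range = all(low <= t <= high for t in readings)
    # Round the spread to 0.01 °C so that float artefacts (e.g. 0.4000000000000057)
    # do not reject a rabbit whose spread is exactly 0.4 °C.
    spread_ok = round(max(readings) - min(readings), 2) <= max_spread
    return all_in_range and spread_ok


steady = [38.6, 38.7, 38.8, 38.9, 39.0, 38.8, 38.7, 38.9]
print(passes_screening(steady))                # True: spread exactly 0.4 °C
print(passes_screening(steady[:-1] + [39.7]))  # False: one reading above 39.6 °C
```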
During the first stage of the pyrogen test, withhold food from the rabbits for at least 1 h before the test, and keep the rabbits in a suitable unit until the test is completed. The device used to measure the rabbits’ body temperature should be accurate to within 0.1°C. Measure the body temperature of each rabbit twice, 30 min apart, before the test; the two measurements should not differ by more than 0.2°C, and their mean is taken as the normal body temperature of the rabbit. On the day of the test, the normal body temperatures of all rabbits used should be within 38.0–39.6°C, and the normal body temperatures of rabbits in the same group should not differ by more than 1°C. Within 15 min after measuring the normal body temperatures of the three rabbits, slowly inject the test sample, pre-warmed to 38°C, at the prescribed dose into the ear vein of each rabbit. Then measure the body temperature of each rabbit 6 times at intervals of 30 min. The difference between the highest of these six measurements and the normal body temperature is the body temperature rise of that rabbit.
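The per-rabbit bookkeeping described above (normal temperature, group check and temperature rise) can be sketched as follows. The function names are hypothetical, the limits are those stated above, and rounding the rise to 0.01°C is our assumption, chosen in line with the later argument for keeping the hundredths digit. The sketch returns the raw difference, so a rabbit whose temperature falls gets a negative rise.

```python
def normal_temperature(first: float, second: float, max_diff: float = 0.2) -> float:
    """Mean of the two pre-injection readings taken 30 min apart (°C)."""
    if round(abs(first - second), 2) > max_diff:
        raise ValueError("pre-injection readings differ by more than 0.2 °C")
    return (first + second) / 2


def group_is_valid(normals, low=38.0, high=39.6, max_spread=1.0) -> bool:
    """All normal temperatures within [low, high] and within 1 °C of each other."""
    in_range = all(low <= t <= high for t in normals)
    return in_range and round(max(normals) - min(normals), 2) <= max_spread


def temperature_rise(normal: float, post_injection) -> float:
    """Highest of the six post-injection readings minus the normal temperature."""
    if len(post_injection) != 6:
        raise ValueError("expected 6 post-injection readings")
    return round(max(post_injection) - normal, 2)


normal = normal_temperature(38.9, 39.1)
print(normal)
print(group_is_valid([normal, 38.6, 39.2]))
print(temperature_rise(normal, [39.1, 39.3, 39.5, 39.4, 39.2, 39.0]))
```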
For Hoffmann’s theoretical models, only the temperature rises of the first two of the three rabbits in each test were considered, to avoid subjective selection bias.
Result judgments and statistical methods
According to the above method, the original results of 879 batches of test samples in the first stage of the pyrogen test were judged by CHPIII, JPXIV, JPXV, EP6.0, USP32 and Hoffmann’s theoretical models, respectively. The judged results were defined as passed (or pyrogen-free) batches, batches needing additional stages (repeat batches) and failed (or pyrogenic) batches (the content of pyrogens exceeded the pharmacopeia standard).
The statistical methods used were as follows. For the 3 × 3 tables: Kendall’s tau-b coefficient, the weighted kappa coefficient and Bowker’s test for symmetry. For the 2 × 2 tables: Kendall’s tau-b coefficient, the simple kappa coefficient, McNemar’s test and diagnostic analysis, 11 including sensitivity (Se), specificity (Sp), agreement rate (π), Youden index (YI), negative likelihood ratio (LR−), the predictive value for the suspicious rate of the total samples (PV+) and the predictive value for the passed rate of the total samples (PV−). The SAS software package was used for the statistical analyses.
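For readers without SAS, Bowker’s test for a 3 × 3 table can be sketched in standard-library Python. This is an illustration, not the SAS implementation: it sums (n_ij − n_ji)²/(n_ij + n_ji) over off-diagonal pairs, skips empty pairs, uses k(k − 1)/2 degrees of freedom (some software reduces the degrees of freedom when a pair is empty), and evaluates the chi-square tail with the closed-form series for integer degrees of freedom. The example table is hypothetical, not data from this study.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2-1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2)
    #         * sum_{i=1}^{(df-1)/2} x^((2i-1)/2) / (1*3*...*(2i-1))
    series, term = 0.0, math.sqrt(x)
    for i in range(1, (df - 1) // 2 + 1):
        series += term
        term *= x / (2 * i + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 / math.pi) * math.exp(-half) * series


def bowker_test(table):
    """Bowker's test of symmetry for a k x k table of paired classifications."""
    k = len(table)
    stat = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # empty off-diagonal pairs contribute nothing
                stat += (table[i][j] - table[j][i]) ** 2 / pair_total
    df = k * (k - 1) // 2
    return stat, df, chi2_sf(stat, df)


# Hypothetical 3 x 3 table. Rows: CHPIII; columns: another PRJA;
# order in both: passed, additional stage, failed.
example = [[600, 60, 13],
           [90, 80, 4],
           [2, 10, 20]]
print(bowker_test(example))
```

For a 2 × 2 table the same statistic reduces to McNemar’s uncorrected statistic with one degree of freedom.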
Results
Result judgments by CHPIII, JPXIV, JPXV, EP6.0, USP32 and Hoffmann’s theoretical models
From the original first-stage data judged by CHPIII, the results for the four biological products were as follows: for human serum albumin (414 batches in total), 325 batches (78.5%) passed, 80 (19.3%) needed an additional stage and 9 (2.2%) failed; for human immunoglobulin for intravenous injection (i.v.) (199 batches in total), 129 (64.8%) passed, 57 (28.6%) needed an additional stage and 13 (6.5%) failed; for H. influenzae type b conjugate vaccine (206 batches in total), 172 (83.5%) passed, 28 (13.6%) needed an additional stage and 6 (2.9%) failed; and for polyvalent pneumococcal polysaccharide vaccine (60 batches in total), 47 (78.3%) passed, 9 (15.0%) needed an additional stage and 4 (6.7%) failed.
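The percentages above, and the total suspicious rate of 0.2344 used in the diagnostic analysis, can be checked with a few lines of Python. Treating ‘suspicious’ as ‘needing an additional stage or failed’ follows the passed/suspicious classification of the 2 × 2 tables; the dictionary layout is only illustrative.

```python
# First-stage CHPIII judgments per product: (passed, additional stage, failed).
COUNTS = {
    "Human serum albumin": (325, 80, 9),
    "Human immunoglobulin (i.v.)": (129, 57, 13),
    "Hib conjugate vaccine": (172, 28, 6),
    "Pneumococcal polysaccharide vaccine": (47, 9, 4),
}

for product, counts in COUNTS.items():
    total = sum(counts)
    shares = ", ".join(f"{100 * n / total:.1f}%" for n in counts)
    print(f"{product}: n = {total}; {shares}")

# Column totals across products.
passed, additional, failed = (sum(col) for col in zip(*COUNTS.values()))
batches = passed + additional + failed
suspicious = additional + failed  # "suspicious" = not passed in the first stage
print(batches, passed, suspicious, round(suspicious / batches, 4))
```

The totals come to 879 batches, of which 673 passed and 206 were suspicious, giving 206/879 ≈ 0.2344.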
Pyrogen test results of the original data for 879 batches determined by CHP, JP, EP, USP and Hoffmann’s theoretical models (3 × 3 table)
The pyrogen test results of the original data for 879 batches, as determined respectively by JPXIV, JPXV, EP, USP and Hoffmann’s theoretical models, were classified as passed, needing additional stages or failed, and were compiled using the results of CHPIII as the reference standard. The sum and percentage of each class for each PRJA are shown in the last column, and those for CHPIII in the last row.
USP32 cannot directly determine unqualified samples in preliminary tests.
Only the temperature rises of the first two of the three rabbits in each test were considered, to avoid subjective bias.
Agreement batches: batches among the 879 that all PRJAs except USP32 judged identically, classified as passed, needing additional stages or failed.
Pyrogen test results of the original data for 879 batches determined by CHP, JP, EP, USP and Hoffmann’s theoretical models (2 × 2 table)
The pyrogen test results of the original data for 879 batches, as determined respectively by CHPIII, the JPs, EP, USP and Hoffmann’s theoretical models, were classified as passed or suspicious and compiled using the results of CHPIII as the reference standard. The sum and percentage of each class for each PRJA are shown in the last column, and those for CHPIII in the last row.
Statistical analysis
The correlation and symmetry difference of JPs, EP, USP and Hoffmann’s theoretical models in comparison to CHPIII (for 3 × 3 table)
Analysis with Kendall’s tau-b coefficients, weighted kappa coefficients and Bowker’s test for symmetry showed that only the PRJA of JPXIV was closely consistent with that of CHPIII (Kendall’s tau-b coefficient >0.95 and P > 0.05 in Bowker’s test for symmetry); the other PRJAs differed significantly from CHPIII (P < 0.0001 in Bowker’s test for symmetry).
Because USP32 cannot judge samples as failed in the first stage of the test, weighted kappa coefficients and Bowker’s test for symmetry could not be computed for it.
Diagnostic parameter analysis of JPs, EP, USP and Hoffmann’s theoretical models in comparison to CHPIII
Taking the result judgment of CHPIII as the reference standard, A represents the suspicious batches determined by CHPIII and another pyrogen result judgment algorithm (PRJA); B represents the batches that CHPIII determined as passed and another PRJA determined to be suspicious; C represents the batches that CHPIII determined to be suspicious and another PRJA determined as passed; D represents the passed batches determined by CHPIII and another PRJA.
Comparison of the statistical parameters of JPs, EP, USP and Hoffmann’s theoretical models to those of CHPIII (for 2×2 table)
Kappa test: evaluates the consistency of two PRJAs in the 2 × 2 table. Kappa ranges from −1 to +1. K = −1 indicates that the judgments of the two PRJAs are completely inconsistent; K = 0 indicates that the observed agreement is entirely attributable to chance; K = 1 indicates that the two PRJAs agree completely, beyond chance. Generally, K > 0.75 is regarded as excellent consistency, 0.40 < K < 0.75 as good consistency and K < 0.40 as poor consistency.
Kendall’s tau-b coefficient is a rank correlation coefficient that corrects for ties; it is used when both variables are on an ordinal scale. The higher the tau-b value, the stronger the association with CHPIII.
McNemar’s test is used to evaluate differences between the results of two PRJAs in the 2 × 2 table. P < 0.01 indicates a significant difference between the judgments of the two PRJAs.
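The simple kappa coefficient and McNemar’s test for the 2 × 2 layout (cells A to D as defined above) can be sketched as follows. This Python sketch uses the uncorrected McNemar statistic (B − C)²/(B + C); a continuity-corrected variant also exists, and which one matches a given SAS output should be checked. The counts in the example are hypothetical and the function names are illustrative.

```python
import math


def cohen_kappa_2x2(a: int, b: int, c: int, d: int) -> float:
    """Simple (Cohen's) kappa for the 2 x 2 layout used in this paper:
    a = suspicious by both, b = passed by CHPIII but suspicious by the other PRJA,
    c = suspicious by CHPIII but passed by the other PRJA, d = passed by both."""
    n = a + b + c + d
    observed = (a + d) / n
    # Chance agreement from the row and column margins.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)


def mcnemar(b: int, c: int):
    """McNemar's chi-square (no continuity correction) and its 1-df p-value."""
    if b + c == 0:
        return 0.0, 1.0
    stat = (b - c) ** 2 / (b + c)
    return stat, math.erfc(math.sqrt(stat / 2))  # chi-square(1) upper tail


# Hypothetical counts for illustration only.
print(cohen_kappa_2x2(20, 5, 10, 65))
print(mcnemar(5, 10))
```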
Sensitivity, Se = A/(A + C), is the proportion of the batches judged suspicious by CHPIII that are also judged suspicious by a particular PRJA, reflecting the ability of the PRJA to detect suspicious samples. It is also called the suspicious rate (true pyrogen suspicious ratio). A larger value indicates that the PRJA’s ability to detect suspicious batches is closer to that of CHPIII.
Specificity, Sp = D/(B + D), is the proportion of the batches judged qualified by CHPIII that are also judged qualified by a particular PRJA, reflecting the ability of the PRJA to identify qualified samples. It is also called the qualified rate (true pyrogen qualified rate). A larger value indicates that the PRJA’s ability to identify passed batches is closer to that of CHPIII.
Agreement rate, π = (A + D)/(A + B + C + D), is the proportion of all batches for which a particular PRJA agrees with CHPIII (true suspicious plus true qualified batches). It reflects the ability of the PRJA to correctly distinguish the suspicious and qualified samples identified by CHPIII.
Youden index, YI = Se + Sp − 1, equals the true suspicious rate minus the false suspicious rate (1 − Sp), reflecting the overall ability of the PRJA to judge the qualified and suspicious samples identified by CHPIII. A larger value indicates that the results of the PRJA are closer to those of CHPIII.
Negative likelihood ratio, LR− = [C*(B + D)]/[D*(A + C)] = (1 − Se)/Sp, is the ratio of the false negative rate to the true negative rate. A smaller value indicates a stronger ability of the PRJA to rule out suspicious samples.
The predictive value for the suspicious rate of the total samples, PV+ = P*Se/[P*Se + (1 − P)*(1 − Sp)], where P is the total suspicious rate of the samples, is the proportion of batches judged suspicious by a PRJA that are also suspicious according to CHPIII. A larger value indicates a stronger predictive ability of the PRJA.
The predictive value for the passed rate of the total samples, PV− = (1 − P)*Sp/[(1 − P)*Sp + P*(1 − Se)], is the proportion of batches judged qualified by a PRJA that are also qualified according to CHPIII. A larger value indicates a stronger predictive ability of the PRJA.
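The diagnostic parameters defined above follow directly from the cell counts A to D and the total suspicious rate P. In this minimal Python sketch the counts are not taken from a table in this paper; they are reconstructed from the JPXV results reported in the next paragraphs (206 suspicious and 673 passed batches by CHPIII, Sp = 1, Se ≈ 0.4369) and serve only as an illustration. The function name is hypothetical.

```python
import math


def diagnostic_parameters(a: int, b: int, c: int, d: int, p: float) -> dict:
    """Se, Sp, pi, YI, LR-, PV+ and PV- as defined above, with CHPIII as the
    reference standard and p the total suspicious rate."""
    se = a / (a + c)
    sp = d / (b + d)
    return {
        "Se": se,
        "Sp": sp,
        "pi": (a + d) / (a + b + c + d),
        "YI": se + sp - 1,
        "LR-": (1 - se) / sp if sp > 0 else math.inf,  # = C(B+D)/[D(A+C)]
        "PV+": p * se / (p * se + (1 - p) * (1 - sp)),
        "PV-": (1 - p) * sp / ((1 - p) * sp + p * (1 - se)),
    }


# Counts reconstructed from the reported JPXV figures (illustration only).
A, B, C, D = 90, 0, 116, 673
observed_rate = (A + C) / (A + B + C + D)  # 206/879, about 0.2344
for rate in (observed_rate, 0.10):
    params = diagnostic_parameters(A, B, C, D, rate)
    print(f"P = {rate:.4f}: " + ", ".join(f"{k} = {v:.4f}" for k, v in params.items()))
```

With these counts the sketch gives π ≈ 0.8680 and LR− ≈ 0.5631; PV+ stays at 1 at both rates because Sp = 1, and PV− rises by about 0.088 when P falls from 0.2344 to 0.10.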
As shown in Tables 3 and 5, only the results of JPXIV were closely correlated with those of CHPIII (>0.98); the other PRJAs showed moderate correlation by Kendall’s tau-b coefficient and the weighted kappa coefficient (Table 1) or the simple kappa coefficient (Table 2), with coefficients between 0.5 and 0.7. The order of correlation was the same for both statistical methods: JPXIV > EP6.0 > USP32 > JPXV > 2-2-2-A and 2-2-2-B. Comparing the discordant batches with Bowker’s test for symmetry (Table 1) or McNemar’s test (Table 2) showed significant differences (P < 0.0001) between CHPIII and each of EP6.0, USP32, JPXV and Hoffmann’s theoretical models, but not between JPXIV and CHPIII.
Taking the judgments of CHPIII as the reference standard, the Se values (Table 5), which reflect the true positive rate, ranked in decreasing order as USP32 (1) > JPXIV (0.98) > 2-2-2-A and B (0.81) > EP6.0 (0.67) > JPXV (0.44). The Sp values, which reflect the true negative rate, ranked as JPXIV = JPXV (1) > EP6.0 (0.96) > 2-2-2-A and B (0.81) > USP32 (0.78). Because both the TTRIR and the TSTRATR of USP32 are lower than those of CHPIII (0.5°C and 3.3°C in USP32 versus 0.6°C and 3.5°C in CHPIII), USP32 judged more batches suspicious than CHPIII, giving it the largest Se (1, the most sensitive) and the smallest Sp (0.78). JPXIV and JPXV had the same Sp (1) but very different Se values: the Se of JPXIV (0.98) was almost equal to that of CHPIII, whereas the Se of JPXV (0.44) was the smallest among all the PRJAs. Thus, although the PRJA of CHPIII is most similar in form to that of USP32, its results are most similar to those of JPXIV, not USP32. In addition, many batches judged suspicious by CHPIII were classified as passed by JPXV. This indicates that most of these batches were suspicious because a rabbit exceeded the 0.6°C TTRIR while the sum of the temperature rises remained below 1.3°C.
At a total suspicious rate of 0.2344, the π values (Table 5), which reflect the ability to distinguish suspicious from passed batches correctly, ranked as JPXIV (0.9954) > EP6.0 (0.8896) > JPXV (0.8680) > USP32 (0.8282) > 2-2-2-A and B (0.8123). The YI values, which reflect the overall discriminating capacity of each PRJA, ranked as JPXIV (0.9806) > USP32 (0.7756) > EP6.0 (0.6302) > 2-2-2-A and B (0.6201) > JPXV (0.4369). Both the π and YI values of JPXIV were close to 1, indicating that its ability to judge suspicious and qualified samples was closely consistent with that of CHPIII, in agreement with the Se and Sp results. Although the π values of USP32, EP6.0, JPXV and 2-2-2-A and B all exceeded 80%, their overall judgment abilities differed considerably: the YI of USP32 was close to 0.8, those of EP6.0 and 2-2-2-A and B were within 0.62–0.64, and that of JPXV was the lowest, at about 0.44. The overall judgment ability of JPXV therefore differed most from that of CHPIII, consistent with its low Se.
Ranked from the strongest to the weakest ability to exclude suspicious samples, that is, from the smallest to the largest LR− (Table 5), the order was JPXIV (0.0194) < 2-2-2-A and B (0.2385) < EP6.0 (0.3404) < JPXV (0.5631). The LR− of JPXV was the largest, suggesting that its ability to exclude suspicious samples was the weakest and again indicating that JPXV differs most from CHPIII.
When the total suspicious rate was reduced from 0.2344 to 0.10 (Table 5), the PV+ values, representing the probability that a batch judged suspicious by a PRJA is also suspicious according to CHPIII, declined. The ranking of the PRJAs by PV+ was the same at both rates, but the size of the decrease differed: the sharpest decline was in USP32 (24.6%), followed by 2-2-2-A and B (24.5%) and EP6.0 (19.5%). The PV+ values of JPXIV and JPXV did not change. This follows from their Sp of 1: no batch passed by CHPIII was judged suspicious by either JP, so every batch they judged suspicious was also suspicious according to CHPIII, whatever the total suspicious rate.
When the total suspicious rate was reduced from 0.2344 to 0.10 (Table 5), the PV− values, representing the probability that a batch judged passed by a PRJA is also passed according to CHPIII, increased. The ranking of the PRJAs by PV− was the same at both rates, but the size of the increase differed. The PV− of USP32 did not change (it remained 1, because its Se was 1). The PV− values of the other PRJAs rose to varying degrees, in the order JPXV (8.8%) > EP6.0 (5.8%) > 2-2-2-A and B (4.2%) > JPXIV (0.4%); that is, when the total suspicious rate decreased, the predictive ability of all the other PRJAs for qualified samples increased (to over 0.94). JPXV rose the most (by 8.8%, to 0.94), showing that at a low total suspicious rate the ability of JPXV to identify qualified samples was enhanced the most.
Discussion
At present, the interpretation of the pyrogen test differs among the major pharmacopeias. Three main patterns can be distinguished. First, the PRJA depends only on the TTRIR, as in JPXIV. Second, the PRJA depends only on the TSTRATR, as in EP6.0, JPXV and Hoffmann’s models. Third, the PRJA depends on both the TTRIR and the TSTRATR, as in USP32 and CHPIII. The data above show that, applied to the same original data, the various PRJAs significantly affect the results of the first stage of the test. More batches needed additional stages under EP6.0 and USP32 because the first-stage TSTRATR of EP6.0 is only 1.15°C and the TTRIR of USP32 is only 0.5°C; these are the lowest thresholds compared with JPXV (1.3°C), CHPIII (1.4°C, 0.6°C) and JPXIV (0.6°C). More batches were classified as failed in the first stage under CHPIII and JPXIV because of their criterion of two of three rabbits exceeding a 0.6°C temperature rise, which is the strictest failure criterion compared with EP6.0 (a mean of 2.65/3 = 0.88°C per rabbit) and JPXV (2.5/3 = 0.83°C per rabbit). USP32 cannot judge a sample as failed in the first stage, but it can be deduced 3,12 that if the sum of the temperature rises of the three rabbits in the first stage exceeds 3.3°C (a mean of more than 1.1°C per rabbit), the sample would also fail, because the sum for all eight rabbits would then already exceed 3.3°C; however, this rarely happened in our data.
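The first-stage rules contrasted in this paragraph can be compared side by side. The Python sketch below encodes only the thresholds quoted in this paper (EP6.0: 1.15°C and 2.65°C; JPXV: 1.3°C and 2.5°C; USP32: 0.5°C per rabbit with no first-stage failure; CHPIII: 0.6°C per rabbit and 1.4°C in total, failing when two of three rabbits reach 0.6°C; JPXIV: 0.6°C per rabbit with the same two-of-three failure rule). Whether each bound is inclusive is our assumption, Hoffmann’s models are omitted because their thresholds are not given here, and the function names are hypothetical.

```python
PASSED, MORE, FAILED = "passed", "additional stage", "failed"


def ep60(rises):
    total = round(sum(rises), 2)  # sum-based (TSTRATR) rule
    if total <= 1.15:
        return PASSED
    return FAILED if total > 2.65 else MORE


def jpxv(rises):
    total = round(sum(rises), 2)  # sum-based (TSTRATR) rule
    if total <= 1.3:
        return PASSED
    return FAILED if total >= 2.5 else MORE


def usp32(rises):
    # Individual (TTRIR) rule only; no first-stage failure under USP32.
    return PASSED if all(r < 0.5 for r in rises) else MORE


def chpiii(rises):
    n_high = sum(r >= 0.6 for r in rises)
    if n_high >= 2:
        return FAILED
    # Combined rule; treating 1.4 °C as an exclusive bound is an assumption.
    if n_high == 0 and round(sum(rises), 2) < 1.4:
        return PASSED
    return MORE


def jpxiv(rises):
    n_high = sum(r >= 0.6 for r in rises)  # individual (TTRIR) rule only
    if n_high >= 2:
        return FAILED
    return PASSED if n_high == 0 else MORE


RULES = {"EP6.0": ep60, "JPXV": jpxv, "USP32": usp32, "CHPIII": chpiii, "JPXIV": jpxiv}


def judge_all(rises):
    """First-stage verdict of each PRJA for three temperature rises (°C)."""
    return {name: rule(rises) for name, rule in RULES.items()}


print(judge_all([0.66, 0.25, 0.22]))
print(judge_all([0.7, 0.8, 0.1]))
```

The first call reproduces the hundredths example discussed later in this section (passed under EP6.0 and JPXV, additional stage elsewhere); the second shows the two-of-three rule failing a sample that both sum-based rules would send to an additional stage.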
In 2005, Hoffmann et al. 3 analyzed how the different thresholds of mean temperature rise in the PRJAs of the EP, USP and JP affect the results. They found that at an endotoxin dose of 7.38 EU/kg, which produces a mean temperature rise in rabbits of 0.60°C (considered pyrogenic), all three PRJAs gave a probability of at least 95.0% of a pyrogenic classification. However, at a threshold of 2.53 EU/kg, corresponding to 0.41°C, the probability of a pyrogenic classification ranged from 2.5% for the EP to 53.8% for the USP, so that the same sample could readily receive different conclusions. Our data showed that some of the batches passed by JPXV were judged suspicious, requiring additional stages, by EP6.0 (9.0%), JPXIV (12.8%), CHPIII (13.2%), USP32 (30.4%) and Hoffmann’s theoretical models (22.9%; Tables 1 and 2). This indicates that, given the inherent limitations of the test, stricter criteria (such as those of USP32) increase the number of additional stages and the scrutiny a sample requires in the RPT. Therefore, the various PRJAs should be further harmonized into a single PRJA that provides an adequate margin of safety while using fewer rabbits.
Other observations also illustrate the limitations of the test. Over the roughly 60 years in which the RPT has been used, great individual variation in febrile sensitivity has been found. 13 About two-thirds of rabbits develop a fever of over 0.5°C at an endotoxin dose of 5 EU/kg, whereas a few sensitive rabbits exceed 0.5°C even at doses below 2.5 EU/kg. In our past routine work, we also found that most batches needing additional stages did so because of a high temperature rise in a single rabbit (the other two rabbits rose by less than 0.3°C), and few because the sum of the temperature rises of all three rabbits exceeded the threshold.
In this study, the data showed an interesting phenomenon: even though the PRJA of CHPIII is similar to that of USP32, its results were strongly consistent with those of JPXIV. Further analysis (data not shown) revealed that, according to CHPIII, 94.2% (194) of the 206 suspicious batches were finally determined to have passed after further testing. The actual rate of failed batches (<5%) was lower than that determined even by JPXV. This supports, from another angle, the view that the differences in test results were mainly due to individual variation in sensitivity, and that JPXV distinguishes qualified from unqualified samples more realistically than the other PRJAs: although its Se was the lowest, its Sp and LR− were the highest among all PRJAs in comparison with CHPIII. In addition, when the total suspicious rate of samples declined from 23.4% to 10%, the predictive value for qualified samples improved the most in JPXV (8.8%) compared with the other PRJAs (e.g. EP, 5.8%). Thus, when the total suspicious rate of samples is below 10%, JPXV has the best overall capacity to discriminate passed from suspicious batches, with further advantages such as reducing the number of animals used and saving manpower and costs. This may be why JPXV modified its PRJA to use TSTRATRs as criteria. We conclude that using the TSTRATR is more scientific and better able to guarantee product quality because it takes all the tested rabbits into account, a view similar to that of Tschumi. 12
As more types of parenteral pharmaceutical products are developed and used clinically, more rabbits are needed for the pyrogen test. However, because modern pharmaceutical production follows GMP, the number of batches with excessive pyrogen content is lower than before. Tables 2 and 5 show that a PRJA using the TTRIR as a criterion has a higher false positive rate than one using the TSTRATR, and that JPXV discriminates more powerfully, realistically and scientifically between passed and failed batches, even in the first stage of the test, as the total suspicious rate declines. Thus, when the overall qualification rate of samples is high, PRJAs using the TSTRATR as the criterion (such as EP6.0 and JPXV) could use fewer animals than CHPIII, USP32 and JPXIV, with little effect on the detection of (truly) suspicious batches.
The 2-2-2-A and B models proposed by Hoffmann et al. 3 are theoretical models based on endotoxin-induced fever that take into account the 3R principles and global harmonization. The 2-2-2-A model was designed as a compromise between the EP6.0 and JPXIV algorithms, while the 2-2-2-B model simulates USP32. Both models use only two rabbits in each stage, with at most three stages, so that no more than six rabbits are needed for the final judgment. In our data, the results of the two models hardly differed, and more samples needed additional stages under the two theoretical models than under EP6.0, JPXIV and JPXV, perhaps because only two animals are used per stage. The data also suggest that applying the TSTRATR criterion to three rabbits in the first stage is preferable to applying it to two, because it reduces the number of additional stages and saves manpower and costs. With only two rabbits per group, the models may also risk missing problematic samples, because whether a sample fails sometimes depends on a single responsive rabbit.
Considering that (i) the goal of the pyrogen test is to ensure the safety of parenteral medicines; (ii) a modified PRJA should be accurate and reduce costs (i.e. manpower, materials and animals); and (iii) the pyrogen tests of the EP and CHP have been used for several decades without serious clinical problems being reported, the latest improvements to the pyrogen test in JPXV can be regarded as scientific and as embodying the 3R principles (requiring up to three fewer rabbits than the EP). In the future, JPXV may serve as a good reference for revisions of other pharmacopeias.
As explained in our previous paper, 14 the probe currently used in the RPT is made of integrated PVC, with an accuracy of up to 0.1°C. According to the rules for significant digits, it is more scientific and reasonable to retain the hundredths digit, 15 because this avoids propagating large rounding errors into later calculations, improves the accuracy of the results and reduces additional stages and the waste of rabbits. For example, if the hundredths digit is kept, temperature rises of 0.66°C, 0.25°C and 0.22°C in three rabbits lead to a pyrogen-free conclusion under EP6.0 and JPXV, and to additional stages under USP32, CHPIII and JPXIV. If the values are rounded to the tenths digit, the temperature rises become 0.7°C, 0.3°C and 0.2°C, and only JPXV gives a pyrogen-free conclusion.
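The rounding effect can be reproduced exactly with Python’s `decimal` module. The built-in `round` is unsuitable here because `round(0.25, 1)` returns 0.2 (binary floats and round-half-to-even), whereas the example above rounds 0.25°C to 0.3°C, so the sketch uses `ROUND_HALF_UP`. Only the two sum-based first-stage rules are included (EP6.0: pass at ≤1.15°C, fail above 2.65°C; JPXV: pass at ≤1.3°C, fail at ≥2.5°C); the bound conventions are our assumption and the helper names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_rise(value: str, step: str) -> Decimal:
    """Round a recorded rise half-up to the given step ('0.01' or '0.1')."""
    return Decimal(value).quantize(Decimal(step), rounding=ROUND_HALF_UP)


def sum_based_verdicts(rises):
    """First-stage sum rules of EP6.0 and JPXV applied to Decimal rises."""
    total = sum(rises, Decimal("0"))
    if total <= Decimal("1.15"):
        ep = "passed"
    else:
        ep = "failed" if total > Decimal("2.65") else "additional stage"
    if total <= Decimal("1.3"):
        jp = "passed"
    else:
        jp = "failed" if total >= Decimal("2.5") else "additional stage"
    return total, ep, jp


raw = ["0.66", "0.25", "0.22"]
for step in ("0.01", "0.1"):
    rises = [round_rise(r, step) for r in raw]
    print(step, [str(r) for r in rises], sum_based_verdicts(rises))
```

With hundredths the sum is 1.13°C and both rules pass the sample; with tenths it becomes 1.2°C, which exceeds the EP6.0 limit of 1.15°C but not the JPXV limit of 1.3°C.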
A new proposed version of PRJA
The proposed PRJA consists of three steps with three rabbits in each step, and every step allows a passed or failed classification. In addition, the first two steps each define an intermediate range of temperature-rise sums that requires proceeding to the next step. The classification criteria are TSTRATRs. For example, in the first step the TSTRATRs are 1.35°C and 2.61°C: a sum of temperature rises at or below the lower TSTRATR (1.35°C) is classified as passed, and a sum above the upper TSTRATR (2.61°C) as failed. A sum between the two TSTRATRs requires testing three additional rabbits in a second step. In the last step, the sample fails if the sum of the temperature rises of all 9 rabbits exceeds 4.95°C, and passes otherwise.
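The proposed PRJA can be sketched as a function of the cumulative temperature rises. Only the step-1 TSTRATRs (1.35°C and 2.61°C) and the final 9-rabbit limit (4.95°C) are given in the text, so the step-2 limits are left as a required argument rather than filled in. We assume that the step-2 sum is taken over all six rabbits, by analogy with the final step; the function and constant names are hypothetical.

```python
STEP1_LIMITS = (1.35, 2.61)  # pass if sum <= 1.35 °C, fail if sum > 2.61 °C (from the text)
FINAL_LIMIT_9 = 4.95         # 9 rabbits: fail if sum > 4.95 °C (from the text)


def proposed_prja(rises, step2_limits=None):
    """Judge cumulative temperature rises (3, 6 or 9 rabbits) under the proposed
    three-step PRJA. step2_limits = (pass_max, fail_above) must be supplied by
    the caller because the step-2 TSTRATRs are not stated here."""
    total = round(sum(rises), 2)
    if len(rises) == 9:
        return "failed" if total > FINAL_LIMIT_9 else "passed"
    if len(rises) == 3:
        pass_max, fail_above = STEP1_LIMITS
    elif len(rises) == 6:
        if step2_limits is None:
            raise ValueError("step-2 TSTRATRs are not given; pass them explicitly")
        pass_max, fail_above = step2_limits
    else:
        raise ValueError("expected cumulative rises for 3, 6 or 9 rabbits")
    if total <= pass_max:
        return "passed"
    if total > fail_above:
        return "failed"
    return "next step"


print(proposed_prja([0.66, 0.25, 0.22]))  # sum 1.13 °C -> passed
print(proposed_prja([0.6, 0.5, 0.4]))     # sum 1.50 °C -> next step
print(proposed_prja([1.0, 1.0, 0.7]))     # sum 2.70 °C -> failed
```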
New alternatives to the pyrogen test are needed. Because of differences between species and individuals, the credibility of RPT results has often been questioned, especially for certain special products (such as human growth hormone). Scientists have therefore been seeking alternative methods of pyrogen detection. The bacterial endotoxin test (BET), one such alternative, has been widely accepted, although it is also subject to species differences and has many limitations; for example, it detects only endotoxins. At present, the European Centre for the Validation of Alternative Methods (ECVAM) has validated five in vitro pyrogen tests, 16 – 23 all of which measure pyrogenic activity through the release of cytokines from human blood cells: human whole blood IL-1, human whole blood IL-6, PBMC IL-6, MM6 IL-6 and human cryopreserved whole blood IL-1. Although these five methods have been adopted by the EP and endorsed by the FDA, they still have limitations; for example, they are not appropriate replacements for the RPT for drugs or biologics whose pharmacodynamic activity is to induce cytokine release. These five methods still require further evaluation, and none can yet be considered a complete replacement for the RPT without additional product-specific information.
Acknowledgements
The authors are grateful to Dr Ji-Fu Wei (Clinical Experiment Center, First Affiliated Hospital of Nanjing Medical University) for critical discussion of our study and to Dr Qian Liu (Department of Pharmacology, National Institute for the Control of Pharmaceutical and Biological Products). This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
