Abstract
Background
Peginterferon alfa-2a (PEG-IFN) treatment stopping rules in chronic hepatitis B (CHB) are clinically desirable. Previous studies exploring this topic contained important limitations resulting in inconsistent recommendations within the current treatment guidelines. We undertook a systematic review and individual patient data meta-analysis to identify the most appropriate PEG-IFN treatment stopping rules.
Methods
Roche's internal database, PubMed and conference abstracts were searched for studies that enrolled >50 treatment-naive patients with CHB who received PEG-IFN treatment for 48 weeks. Stopping rules were identified using receiver-operating characteristic curve analyses and pre-specified biomarker cutoff target performance characteristics (sensitivity >95%, specificity >10%, negative predictive value >90%). Robustness of proposed stopping rules was assessed using internal/external validation analyses.
Results
Eight study datasets were included in the meta-analysis (n=1,423; 765 hepatitis B e antigen [HBeAg]-positive, 658 HBeAg-negative patients). In general, performance of hepatitis B surface antigen (HBsAg) and HBV DNA cutoffs at weeks 12 and 24 was similar, and common biomarker cutoffs that met target performance criteria were identified across multiple patient subgroups. For HBeAg-positive genotype B/C and HBeAg-negative genotype D patients the proposed stopping rule is HBsAg >20,000 IU/ml at week 12. Alternatively, HBV DNA level cutoffs of >8 log10 and >6.5 log10 IU/ml, respectively, can be used. The proposed stopping rules accurately identify up to 26% of non-responders.
Conclusions
The meta-analysis demonstrates that early PEG-IFN discontinuation should be considered in HBeAg-positive genotype B/C and HBeAg-negative genotype D patients at week 12 of treatment based on HBsAg or HBV DNA levels.
Introduction
Chronic hepatitis B (CHB) is a leading cause of mortality and morbidity worldwide. An estimated 257 million people chronically infected are at risk of developing serious and life-threatening sequelae including cirrhosis, hepatic decompensation and hepatocellular carcinoma, which together account for 887,000 deaths annually [1].
Peginterferon alfa-2a (40 kD; PEG-IFN; Pegasys®, Roche, Basel, Switzerland) is an approved treatment option for CHB and, following a finite 48-week course of treatment, durable virological and/or serological responses are observed in approximately 30% of patients [2,3]. However, PEG-IFN therapy is associated with a significant treatment burden that consists of weekly subcutaneous injections and frequent and bothersome interferon-related adverse events. Considering the PEG-IFN benefit–risk profile, the ability to identify patients who are likely or unlikely to respond to PEG-IFN is clinically desirable.
Response-guided therapy is a concept that aims to optimize treatment for individual patients by predicting the likelihood of response early during the treatment course. This personalized healthcare approach is of significant clinical value as it minimizes unnecessary treatment exposure in patients unlikely to respond to further therapy, which in turn improves a treatment's benefit–risk profile as well as cost-effectiveness.
Over the last decade, particularly as accruing evidence implicated levels of hepatitis B surface antigen (HBsAg) as an important predictor of PEG-IFN response, numerous PEG-IFN stopping rules have been proposed [4–13]. These rules are generally applied at either week 12 or week 24 of treatment, and are based on either absolute or change-from-baseline HBsAg and/or HBV DNA levels. However, these analyses had certain limitations and differed in terms of methodology, patient population, sample size, choice of efficacy end point and treatment regimen [14]. These factors have made the interpretation of these analyses and the identification of optimum stopping rules challenging, as evidenced by the inconsistent recommendations in current treatment guidelines [15–18].
Meta-analyses of individual participant data (IPD), the gold standard of systematic reviews, provide the highest quality of evidence. Hence, we undertook a systematic review and IPD meta-analysis of PEG-IFN studies to identify the most appropriate PEG-IFN stopping rules.
Methods
This systematic review and IPD meta-analysis followed a predefined analysis plan and its reporting adheres to the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Statement [19]. A steering committee reviewed the meta-analysis findings and provided input regarding potential additional analyses.
Search strategy and study selection
A comprehensive study search was conducted with the primary source being the Roche internal study database. This database contains listings of all Roche-sponsored/-supported PEG-IFN studies (n>500), with many of the individual study datasets being readily available. Secondary search sources included PubMed (search terms: ‘PEG-IFN’, ‘peginterferon’, ‘pegylated interferon’, ‘hepatitis B’ and ‘CHB’) and published abstracts from annual hepatology conferences (AASLD, EASL and APASL) from 2010 to 2015. The aim of the secondary literature search was to identify any additional potentially eligible study datasets.
Studies were considered eligible if they met all of the following criteria: 1) completed Phase III or IV study; 2) enrolled >50 treatment-naive patients with CHB; 3) PEG-IFN administered using licensed dose/duration (180 μg/week for 48 weeks); 4) quantitative HBsAg data collected during treatment.
Data extraction and processing
To ensure the meta-analysis population was representative of the ‘real-world’ clinical practice treatment population, individual patients were considered eligible for analysis if they met all of the following criteria: 1) baseline HBeAg status available and consistent with the parent study population; 2) completed 80–120% of the planned 48-week PEG-IFN treatment duration (that is, 38–58 weeks); 3) available treatment outcome (response) data.
Clinical and laboratory data (including quantitative HBsAg, HBV DNA and ALT levels) from individual studies were pooled and specific data handling rules applied (Additional file 1). HBsAg and HBV DNA data were log-transformed prior to analysis. Treatment response, determined at 24 weeks post-treatment, was defined as: 1) hepatitis B ‘e’ antigen (HBeAg) seroconversion (loss of HBeAg and presence of anti-HBe antibodies) and HBV DNA <2,000 IU/ml, for HBeAg-positive patients; 2) HBV DNA <2,000 IU/ml and normalization of ALT, for HBeAg-negative patients.
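The response definition above can be written as a short classification function. The following is a minimal sketch only; the record type and its field names are hypothetical and are not taken from the study datasets or the data handling rules in Additional file 1.

```python
from dataclasses import dataclass


@dataclass
class PostTreatmentResult:
    """Hypothetical record of one patient's results at 24 weeks post-treatment."""
    hbeag_positive_at_baseline: bool
    hbeag_lost: bool          # HBeAg no longer detectable
    anti_hbe_present: bool    # anti-HBe antibodies detectable
    hbv_dna_iu_ml: float      # HBV DNA in IU/ml (not log-transformed)
    alt_normalized: bool


def is_responder(p: PostTreatmentResult) -> bool:
    """Apply the treatment response definition stated in the text."""
    low_dna = p.hbv_dna_iu_ml < 2000
    if p.hbeag_positive_at_baseline:
        # HBeAg seroconversion (HBeAg loss plus anti-HBe) and HBV DNA <2,000 IU/ml
        return p.hbeag_lost and p.anti_hbe_present and low_dna
    # HBeAg-negative: HBV DNA <2,000 IU/ml and ALT normalization
    return low_dna and p.alt_normalized
```

Note that the two definitions share the HBV DNA threshold but differ in the second component, so the baseline HBeAg status must be known before a patient can be classified.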
Statistical analysis
Separate analyses were conducted according to HBeAg status (positive or negative) and HBV genotype. Biomarker kinetics were graphically explored to ascertain whether pooling of treatment subgroups (PEG-IFN monotherapy or in combination with nucleos(t)ide analogues [NA]) was appropriate. Stopping rules were developed using a 3-step process: biomarker selection, stopping rule identification and validation analyses. First, receiver operating characteristic (ROC) curve analyses were conducted for individual biomarkers (both absolute levels and change-from-baseline levels) at weeks 12 and 24 of treatment, and the area under the ROC curve (AUC) was calculated. A minimum of 100 observations was considered necessary to allow a robust analysis [20], and nonpredictive biomarkers (AUC <0.6) were excluded from further analyses.
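The biomarker selection step can be illustrated with a rank-based AUC estimate. This is a sketch under stated assumptions, not the SAS or MedCalc procedure used in the analysis: it assumes that higher on-treatment biomarker levels indicate non-response, computes the AUC as the probability that a randomly chosen non-responder has a higher value than a randomly chosen responder (ties counted as one half), and uses hypothetical function names.

```python
def roc_auc(non_responder_values, responder_values):
    """Rank-based (Mann-Whitney) estimate of the area under the ROC curve.

    Assumes both groups are non-empty and that higher values point
    towards non-response.
    """
    wins = 0.0
    for x in non_responder_values:
        for y in responder_values:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(non_responder_values) * len(responder_values))


def passes_selection(n_observations, auc, min_observations=100, min_auc=0.6):
    """Screening step from the text: at least 100 observations and AUC not below 0.6."""
    return n_observations >= min_observations and auc >= min_auc
```

With made-up log10 values, `roc_auc([4.5, 4.2, 3.9], [3.0, 4.0, 3.5])` counts 8 of 9 non-responder/responder pairs in the expected order.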
Next, pre-specified target performance characteristics were used to identify candidate biomarker cutoffs: sensitivity (Se; % of all responders not meeting the stopping rule) >95%, specificity (Sp; % of all non-responders meeting stopping rule) >10%, and negative predictive value (NPV; likelihood of non-response if stopping rule is met) >90%. These target performance characteristics (high Se, low Sp, high NPV) ensure that a) responders would only rarely discontinue treatment prematurely, b) at least 10% of non-responders would be identified using the stopping rule, and c) any patient indicated to stop treatment has a very high likelihood of being a non-responder. The use of multiple target performance characteristics (as opposed to a single target performance characteristic) safeguards acceptable clinical utility of a proposed stopping rule. Similar performance targets were used in the previously published analyses [8].
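Because Se, Sp and NPV are defined here from the perspective of a stopping rule, a small worked sketch may help. It assumes a rule of the form "stop if biomarker level exceeds the cutoff"; the function names and the example values are illustrative and made up, not study data.

```python
def rule_performance(values, responded, cutoff):
    """Return (Se, Sp, NPV) for the rule 'stop if value > cutoff'.

    values    -- on-treatment biomarker levels, one per patient
    responded -- booleans, True if the patient was a treatment responder
    Assumes at least one responder and one non-responder are present.
    """
    stop_resp = sum(1 for v, r in zip(values, responded) if v > cutoff and r)
    stop_nonresp = sum(1 for v, r in zip(values, responded) if v > cutoff and not r)
    n_resp = sum(1 for r in responded if r)
    n_nonresp = len(responded) - n_resp

    se = (n_resp - stop_resp) / n_resp   # responders NOT meeting the rule
    sp = stop_nonresp / n_nonresp        # non-responders meeting the rule
    n_stop = stop_resp + stop_nonresp
    npv = stop_nonresp / n_stop if n_stop else float("nan")  # non-response if rule met
    return se, sp, npv


def meets_targets(se, sp, npv):
    """Pre-specified targets from the text: Se >95%, Sp >10%, NPV >90%."""
    return se > 0.95 and sp > 0.10 and npv > 0.90
```

For example, with ten hypothetical patients of whom three responders all have levels below 20,000 and three of seven non-responders have levels above it, the rule gives Se 1.0, Sp 3/7 and NPV 1.0, and therefore meets all three targets.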
Where multiple cutoffs were identified for a given biomarker, the cutoff with the highest specificity (that is, identifies the largest proportion of non-responders) was selected as a candidate cutoff. Additional candidate cutoffs were considered if they were present across multiple patient subgroups (for example, HBV genotypes) or were identified in previously published analyses [8,11]. Performance characteristics of candidate cutoffs were then compared and additional factors considered to identify a stopping rule.
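The first selection criterion described above (highest specificity among cutoffs meeting all targets) can be sketched as follows. The input format, a list of `(cutoff, Se, Sp, NPV)` tuples, and the function name are assumptions made for illustration; the additional considerations (cutoffs common across subgroups or previously published) are judgement steps and are not modelled.

```python
def select_candidate_cutoff(cutoff_stats):
    """Pick the cutoff with the highest specificity among those meeting all targets.

    cutoff_stats -- iterable of (cutoff, se, sp, npv) tuples
    Returns the selected tuple, or None if no cutoff meets the targets.
    """
    eligible = [
        stats for stats in cutoff_stats
        if stats[1] > 0.95 and stats[2] > 0.10 and stats[3] > 0.90
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda stats: stats[2])
```

A return value of `None` corresponds to the situation in which no cutoff range meets the target performance criteria for a subgroup.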
Finally, the performance of proposed stopping rules was tested in subgroup analyses (internal validation) and/or independent datasets (external validation). For stopping rules derived from pooled treatment subgroups, the internal validation was conducted using the PEG-IFN monotherapy subgroup. Statistical analyses were performed using SAS 7.0 (SAS Institute Inc., Cary, NC, USA) and MedCalc 7.6 (MedCalc Software, Ostend, Belgium) software.
Results
A total of 560 PEG-IFN studies were screened, the majority of which were identified using the Roche study database (primary search source), and eight eligible study datasets were included in the meta-analysis (Figure 1). Of the eight included studies, three were Roche-sponsored [2,3,6] and five were investigator-sponsored [21–25]. One additional investigator-sponsored study dataset became available whilst the meta-analysis was already being conducted and was thus reserved for use in external validation [26]. An overview of the studies included in the meta-analysis and a summary of excluded patients are presented in Additional files 2 and 3, respectively. Approximately 10% of patients were excluded as they did not meet eligibility criteria.

Figure 1. Study selection process (PRISMA flow diagram)
Patient characteristics
Data from 1,423 patients (765 HBeAg-positive and 658 HBeAg-negative) from eight study datasets were included in the meta-analysis. A summary of patient characteristics is shown in Table 1. Notably, patient characteristics and treatment response rates were comparable across treatment subgroups (monotherapy versus combination therapy). The majority of HBeAg-positive patients were infected with HBV genotype B (32%) or C (56%), while the most common genotype among HBeAg-negative patients was genotype D (38%).
Table 1. Patient characteristics
Treatment response was determined 6 months post-treatment and defined as hepatitis B e antigen (HBeAg) seroconversion and HBV DNA <2,000 IU/ml (HBeAg-positive patients) or HBV DNA <2,000 IU/ml and alanine aminotransferase (ALT) normalization (HBeAg-negative patients). HBsAg, hepatitis B surface antigen; NA, nucleos(t)ide analogue; PEG-IFN, peginterferon alfa-2a.
Biomarker selection
Biomarker kinetics during treatment differed according to HBV genotype (Additional file 4), which supported the conduct of genotype-specific analyses. As expected, HBV DNA levels (but not HBsAg or ALT levels) differed markedly during PEG-IFN + NA combination therapy (Additional file 5). Thus, analyses of HBV DNA data were restricted to the PEG-IFN monotherapy subgroup while HBsAg/ALT analyses were performed on the overall dataset.
A summary of ROC curve analyses is presented in Additional file 6. Overall, a trend toward higher AUC values is evident with absolute compared with change-from-baseline biomarker levels, in HBeAg-positive versus HBeAg-negative patients, and at week 24 compared with week 12 time points. HBsAg and HBV DNA levels were generally fair-to-good predictors of response (AUC 0.6–0.8 in most cases), while ALT was consistently a poor predictor (AUC <0.6 in most cases). Thus, ALT was excluded from further consideration. HBeAg-positive genotype A (n=55) and D (n=29) subgroups, and the HBeAg-negative genotype A (n=45) subgroup had insufficient numbers to conduct robust ROC curve analyses and were also excluded from further analysis.
Stopping rule identification
Predictive biomarkers (HBsAg and HBV DNA) were graphically explored to identify cutoffs meeting the target performance criteria (example shown in Additional file 7). Interestingly, no cutoff range was identified in HBeAg-negative patients infected with HBV genotype B or C (data not shown).
The selected candidate biomarker cutoffs are shown in Table 2. In general, HBsAg and HBV DNA candidate cutoffs showed similar performance, and some cutoffs were common across multiple patient subgroups. Although candidate cutoffs at week 24 could maximize specificity (for example, 0.35 versus 0.30 in HBeAg-positive genotype B patients), this modest increase would not justify delaying treatment discontinuation when a similarly performing week 12 cutoff is available.
Table 2. Candidate biomarker cutoffs
Common cutoffs in multiple genotype subgroups are underlined.
Hepatitis B surface antigen (HBsAg) units, IU/ml; HBV DNA units, log10 IU/ml.
PARC rule: no HBsAg decline and <2 log10 IU/ml HBV DNA decline. HBeAg, hepatitis B e antigen; NPV, negative predictive value; R/n, number of responders/number of patients; Se, sensitivity; Sp, specificity.
Based on these findings, the proposed stopping rule for HBeAg-positive patients with HBV genotype B or C infection, as well as for HBeAg-negative patients with HBV genotype D infection, is an HBsAg level >20,000 IU/ml at week 12 (Table 3). If HBsAg levels are not available, the alternative week 12 stopping rules are HBV DNA >8 log10 IU/ml for HBeAg-positive patients with HBV genotype B or C infection, and HBV DNA >6.5 log10 IU/ml for HBeAg-negative patients with HBV genotype D infection. The proposed stopping rules identify up to 26% of overall non-responders (that is, Sp 0.11–0.26; Table 3).
Table 3. Performance characteristics of proposed week 12 stopping rules
HBeAg, hepatitis B e antigen; HBsAg, hepatitis B surface antigen; NPV, negative predictive value; Se, sensitivity; Sp, specificity.
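The proposed week 12 rules can be summarized as a simple decision function. This is an illustrative sketch of the rules as stated in the text, not clinical decision software; the function name, its parameters and the use of `None` for "no rule applies" are assumptions made for the example.

```python
from typing import Optional


def consider_stopping_at_week_12(
    hbeag_positive: bool,
    genotype: str,
    hbsag_iu_ml: Optional[float] = None,
    hbv_dna_log10_iu_ml: Optional[float] = None,
) -> Optional[bool]:
    """Return True if a proposed stopping rule is met, False if not,
    and None if no rule was proposed for the subgroup or no biomarker is given.
    """
    g = genotype.strip().upper()
    if hbeag_positive and g in ("B", "C"):
        dna_cutoff = 8.0   # log10 IU/ml, HBeAg-positive genotype B/C
    elif not hbeag_positive and g == "D":
        dna_cutoff = 6.5   # log10 IU/ml, HBeAg-negative genotype D
    else:
        return None        # no stopping rule proposed for this subgroup

    if hbsag_iu_ml is not None:
        return hbsag_iu_ml > 20000      # primary rule: HBsAg >20,000 IU/ml
    if hbv_dna_log10_iu_ml is not None:
        return hbv_dna_log10_iu_ml > dna_cutoff  # alternative rule
    return None
```

The HBsAg rule is checked first and HBV DNA is used only when HBsAg is unavailable, mirroring the order in which the rules are proposed. The HBV DNA cutoffs were derived in the PEG-IFN monotherapy subgroup only.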
In HBeAg-negative genotype D patients, the proposed HBsAg rule was compared with the PARC rule (no HBsAg decline and <2 log10 IU/ml HBV DNA decline at week 12) in the PEG-IFN monotherapy subset. Although the PARC rule showed numerically higher performance (Se 1.0, Sp 0.17, NPV 1.0) compared with the HBsAg rule (Se 0.96, Sp 0.17, NPV 0.91), this difference in performance was driven by a single patient (that is, 1/26 responders was misclassified using the HBsAg rule); thus, the two rules were considered to have clinically comparable performance.
Validation analyses
Results of validation analyses for the HBsAg rule are presented in Table 4. Internal validation, using the PEG-IFN monotherapy subgroup, demonstrated consistent rule performance. An independent dataset of HBeAg-positive patients (Additional file 8) was used for external validation. Although the sensitivity estimate for the genotype C subgroup was just below the target performance (that is, 0.93 instead of >0.95), this difference is not considered clinically meaningful and the overall performance remains clinically acceptable. No independent dataset was available to allow external validation in HBeAg-negative patients.
Table 4. Validation analyses of proposed week 12 HBsAg stopping rule
HBeAg, hepatitis B e antigen; HBsAg, hepatitis B surface antigen; NPV, negative predictive value; Se, sensitivity; Sp, specificity.
The HBV DNA rule could not be validated, as the rule was already derived from the monotherapy subgroup and the independent HBeAg-positive dataset contained insufficient numbers of PEG-IFN monotherapy-treated patients (n=50 and n=29, for genotypes B and C respectively).
Additional analyses
Following a recommendation from the steering committee, a re-analysis of the HBeAg-negative subgroup was conducted using an alternative treatment end point (HBV DNA <2,000 IU/ml at 6 months post-treatment). Response rates were higher using the virological end point than the original combined end point (38% versus 31%). However, the performance of proposed week 12 stopping rules remained consistent (HBsAg >20,000 IU/ml, Se 0.95, Sp 0.17, NPV 0.91; HBV DNA >6.5 log10 IU/ml, Se 1.00, Sp 0.11, NPV 1.00), and comparable to the PARC rule (Se 0.97, Sp 0.17, NPV 0.94). Logistic regression models combining HBsAg and HBV DNA for prediction of response were also explored. However, the ROC curve AUC for these models (data not shown) was not significantly higher than the AUC of individual biomarkers, suggesting that combining biomarkers does not improve the prediction of response.
Discussion
Response-guided therapy is desirable for therapies such as PEG-IFN that produce therapeutic responses only in a proportion of patients while being associated with a significant tolerability burden. While several PEG-IFN stopping rules have been published previously and incorporated in current treatment guidelines, the previous analyses contained important limitations, in addition to small sample sizes.
For example, the pooled analysis of three HBeAg-positive randomized controlled trials used an unconventional efficacy end point (HBeAg loss and HBV DNA <2,000 IU/ml), compared two previously proposed rules rather than trying to identify optimum rules, and included 25% (204/803) of subjects treated with peginterferon alfa-2b [8]. Although peginterferon alfa-2a and alfa-2b regimens are considered similar in terms of efficacy and safety, these molecules differ in structure and size, with resulting differences in pharmacokinetic profiles and dosing regimens [27]. Thus, on-treatment biomarker kinetics may differ according to peginterferon regimen, in which case the pooling of respective datasets may not be appropriate.
Similarly, the HBeAg-negative PARC rule was derived in a small cohort (n=107) of patients with predominantly HBV genotype D infection (n=85), and individual-biomarker rules were not explored to confirm whether the combined-biomarker PARC rule results in superior performance [24]. Although the PARC rule was subsequently validated in a pooled analysis of two randomized controlled trials [11], genotype A–C subgroups remained limited in size (n<50 each) and 21% (34/160) of patients received a prolonged PEG-IFN treatment duration (96 weeks, at a dose of 135 μg after week 48).
Considering the above, the present meta-analysis has clear advantages over the previously published analyses, including the application of a single analysis method to the largest PEG-IFN biomarker dataset available, and represents the most comprehensive analysis conducted to date for the identification of PEG-IFN stopping rules. Moreover, the use of prespecified study and patient eligibility criteria, the prespecified target performance characteristics, and the validation analyses were all intended to make the meta-analysis results robust and generalizable to the target population in the clinical setting.
An optimum stopping rule is one that maximizes clinical utility, and therefore must take into consideration not only performance but also other clinically relevant factors such as earlier treatment time point, ease of application, single rule for multiple patient subgroups and global applicability. Although potential stopping rules could be identified for week 24 of treatment, the modest increase in performance compared with week 12 stopping rules was not deemed sufficient to justify recommending their use. Similarly, although genotype-specific rules that maximize specificity for given patient subgroups were identified, a stopping rule common across subgroups was chosen as this allows ease of use in the clinical setting. This is particularly useful in regions where HBV genotypes B and C predominate (for example, China, Taiwan, Korea, Thailand, Indonesia) but where genotyping may not be routinely conducted, as the proposed stopping rule can be applied even when the patient's HBV genotype is unknown. Global application of stopping rules is facilitated by providing an alternative biomarker (HBV DNA) that can be used instead of HBsAg in settings where HBsAg is either not routinely monitored or where HBsAg assays are not commercially available.
This meta-analysis has certain limitations. Insufficient data were available to develop genotype-specific stopping rules for certain subgroups such as HBeAg-positive patients with HBV genotype A or D infection, and HBeAg-negative patients with HBV genotype A infection. Treatment response was defined at 24 weeks post-treatment as this was the primary end point of many of the included studies and later post-treatment data were limited; it would have been preferable to evaluate later time points, particularly in HBeAg-negative patients where relapse is not uncommon. Validation analyses were incomplete in that the HBsAg rule was not externally validated for HBeAg-negative patients, and no validation analyses were performed for the alternative HBV DNA stopping rules.
Nonetheless, the meta-analysis results have important implications for the development of future treatment guidelines, as current guidelines are inconsistent in their recommendations. For example, EASL/APASL guidelines [15,17] recommend use of the PARC rule in HBeAg-negative genotype D patients, which requires monitoring of both HBV DNA and HBsAg levels. The rules proposed in this report thus entail a simpler approach while maintaining comparable performance. Chinese guidelines recommend the use of the PARC rule in HBeAg-negative patients regardless of HBV genotype [18]. However, despite sufficient sample sizes, the present meta-analysis could not identify any stopping rule in HBeAg-negative patients with HBV genotype B or C infection. Furthermore, the meta-analysis validates the EASL recommended stopping rule for HBeAg-positive genotype B/C patients at week 12 [17], which is important given this stopping rule was identified in a heterogeneous dataset and lacked independent validation. Lastly, the AASLD guidelines do not currently recommend any PEG-IFN stopping rules [16].
In conclusion, the results of this meta-analysis demonstrate that discontinuation of PEG-IFN may be considered after 12 weeks of treatment if HBsAg >20,000 IU/ml, both in HBeAg-positive patients with HBV genotype B/C infection and in HBeAg-negative patients with genotype D infection. Where HBsAg levels are not available, HBV DNA stopping rules at week 12 can be used instead: >8 log10 IU/ml in HBeAg-positive patients with HBV genotype B/C infection, >6.5 log10 IU/ml in HBeAg-negative patients with genotype D infection.
Footnotes
Acknowledgment
The authors would like to thank the investigators of the studies for their contribution to the generation of data. Support for third-party writing assistance was furnished by John Carron of Health Interactions (Manchester, UK) and funded by Roche Products Ltd, Welwyn, UK. VP conceived and designed the study, performed the study search and acquired the study datasets. LY and VP analysed and, together with CW, interpreted the data. VP drafted the manuscript. All authors discussed the results, critically revised the manuscript, and approved the final version of the manuscript.
VP, CW and LY are employed by Roche. CW and HL-YC have stockholdings in Roche. HLJ, TP, AJT, and HW have received research support from Roche. HL-YC, PL, TP, and HW have provided consultancy and speaker engagements for Roche. HLJ and AJT have provided consultancy for Roche. LW, JH, J-HK and C-YP report no potential conflicts. All authors report medical writing support from Roche.
