Hierarchical Bayesian models that use groupings of adverse events (AEs) into system organ classes (SOCs) are a set of approaches that have been proposed for analysing safety signals in clinical trials. However, AEs may be the expression of more than one clinical pathology, and the classification of an AE into a single SOC may not always be clear. Further, medical dictionaries may assign AEs which are difficult to classify into a generic disorders SOC. When modelling AE data using SOCs, the misclassification of an AE may lead either to a potential safety signal being missed or to a safety signal being incorrectly flagged.
Methods
We investigate the use of mixed membership models as one approach to handling this issue.
Conclusions
Results indicate that this type of approach does have a real effect on model results, and the implications are discussed.
There is a growing body of evidence that hierarchical modelling is increasingly being used, or advocated for use, in the analysis of adverse events (AEs).1 The classification of AEs into related groupings or System Organ Classes (SOCs) by medical dictionaries, for example MedDRA,1 provides a hierarchical structure which lends itself naturally to (Bayesian) hierarchical modelling. Hierarchical models based around a SOC/AE relationship are typically implemented using a common group or SOC mean, which allows related AEs to borrow strength in order to flag potentially rare AEs as being significant (associated with a treatment), or alternatively to shrink non-significant effects. This use of a common SOC mean has a number of implications. In particular, a common group mean tacitly assumes a positive relationship between AEs within a SOC. For the case of more complicated relationships, it is hoped that the models have enough structure to capture these accurately.2
The role of SOC membership is also crucial to the modelling process. The SOCs are pre-determined by the classification system, usually through a biological relationship, rather than through the discovery of clusters or groups of AEs by an exploratory analysis, as might be the case if the data were to be analysed with no preconceptions. Further, not all AEs have a specific single clinical pathway or pathology.3 In these cases there may be no single SOC into which the AE naturally fits. This raises the question of how to account for this type of uncertainty within the analysis. MedDRA, for example, provides a SOC, General disorders and administration site conditions, which is used for AEs which are nonspecific or that may be related to several body systems. This type of generic SOC does not provide a suitable grouping as there may be no actual relationship between the AEs within this SOC, defeating one of the requirements of the hierarchical models.
To handle this type of case we look at the effect of introducing mixed membership4 into hierarchical models by way of an example using the three-level hierarchical model of Berry and Berry (BB),5 which was an early example of using groupings of AEs. Here the posterior probability of an increase in treatment log-odds (θ) is used to assess whether an AE has an increased rate on the treatment arm. An attractive property of the model is that it requires a relatively strong signal in order for an AE to be flagged as associated with treatment.
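As a minimal sketch of how such a posterior probability is typically estimated from MCMC output, the fraction of posterior draws of θ that are strictly positive can be used. The function name and the simulated draws below are hypothetical and are not taken from the trial analysis; draws equal to zero (the no-effect point mass in the BB model) are counted as no increase.

```python
import random


def prob_increase(theta_draws):
    """Estimate Prob(theta > 0) as the fraction of strictly positive draws."""
    draws = list(theta_draws)
    if not draws:
        raise ValueError("need at least one posterior draw")
    return sum(1 for t in draws if t > 0.0) / len(draws)


# Hypothetical draws for illustration only: 40% sit exactly at zero
# (no treatment effect), the rest come from a normal distribution.
rng = random.Random(1)
theta_draws = [
    0.0 if rng.random() < 0.4 else rng.gauss(0.5, 0.3)
    for _ in range(10_000)
]
p = prob_increase(theta_draws)
print(f"Estimated Prob(theta > 0): {p:.3f}")
```

Because the point mass at zero contributes nothing to the count, an AE needs most of its posterior mass on the positive part of the distribution before the probability becomes large.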
The aim of the study is not to suggest new models but to investigate the possible effect of mixed membership in existing approaches, and to see if this is a worthwhile addition. We find the use of mixed SOC membership has a real effect on the outcomes of the models with the probability of an AE having an increase in rate on the treatment arm varying greatly depending on the SOC(s) with which it is associated. This is one possible approach which could be implemented relatively simply in many models, and which has the potential to account for some of the issues regarding AE classification where there is uncertainty about the SOC to which an AE should belong.
Study data
Data from a GlaxoSmithKline plc (GSK) sponsored Phase III randomised clinical trial (ClinicalTrials.gov identifier: NCT00078572) is used to demonstrate the approach. AE data is available through the GSK clinical study register2 and has been discussed previously.2 In this trial Oedema peripheral is classified by MedDRA to be in the General disorders and administration site conditions SOC. However, Oedema peripheral may have several different causes such as renal failure or liver cirrhosis,3 and could potentially be associated with a number of different SOCs.
Mixed membership models
We use a minor re-parameterisation of the BB model and look at two adaptations of this model to allow for mixed membership of a SOC. The first is a typical Bayesian approach assigning a prior probability of membership to each possible SOC for the AE (model 1). If there are NC subjects on the control arm, NT subjects on the treatment arm, j = 1, …, N AEs in total, b = 1, …, B SOCs, kb AEs in SOC b, AE(j) ∈ B(b) the event that AE j is assigned to SOC b, Xj and Yj the number of occurrences of AE(j) in the control and treatment groups respectively, and cj and tj the probability of experiencing AE(j) for the control and treatment groups respectively, the Berry and Berry model with priors for SOC membership is:
where γj is the log-odds for AE(j) occurring in the control group, and θj is the relative increase in the log-odds in the treatment group.
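The displayed equations for the data model are not shown in this copy. Based on the definitions in the text and the usual Berry and Berry formulation, one plausible way to write them is sketched below. The layout and equation numbering of the original may differ, and p_{jb} is assumed here to be the prior membership probability that the Results section calls pjb.

```latex
\begin{align}
X_j &\sim \mathrm{Bin}(N_C, c_j), & Y_j &\sim \mathrm{Bin}(N_T, t_j), \\
\operatorname{logit}(c_j) &= \gamma_j, & \operatorname{logit}(t_j) &= \gamma_j + \theta_j, \\
P\bigl(AE(j) \in B(b)\bigr) &= p_{jb}, & \sum_{b=1}^{B} p_{jb} &= 1 .
\end{align}
```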
The priors for the model parameters and hyperparameters are given in equations (5)-(6).
where the log-odds in the control group are modelled by a normal distribution (representing the SOC containing the AE) with mean μγb and variance σ²γb. The increase in log-odds on the treatment arm is similarly modelled by a normal distribution with mean μθb and variance σ²θb, but also includes a probability, πb, of no difference in occurrence rate between the two groups (i.e. that cj = tj). The rest of the hyperparameters are as defined in the original BB model specification (described in the Supplemental Material).5 The model can be considered to include uncertainty about the SOC to which an AE is assigned as well as to model membership of more than one SOC. So, while it is capable of capturing, in some sense, some of the joint effects of membership of more than one SOC, these effects will be weighted in proportion to the prior probabilities supplied.
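The prior equations themselves are not shown in this copy. A sketch consistent with the description above, conditional on SOC membership, is given below. The variance symbols \sigma^2_{\gamma b} and \sigma^2_{\theta b}, the point-mass notation \delta_0, and the membership probability p_{jb} are assumed notation rather than the paper's own, and the original equations (5)-(6) may be laid out differently.

```latex
\begin{align}
\gamma_j \mid AE(j) \in B(b) &\sim N\bigl(\mu_{\gamma b}, \sigma^2_{\gamma b}\bigr), \\
\theta_j \mid AE(j) \in B(b) &\sim \pi_b \, \delta_0 + (1 - \pi_b)\, N\bigl(\mu_{\theta b}, \sigma^2_{\theta b}\bigr).
\end{align}
```

Under this reading, averaging over the membership prior gives the prior for θj as a mixture of the SOC-level distributions:

```latex
\begin{equation}
\theta_j \sim \sum_{b=1}^{B} p_{jb} \Bigl[ \pi_b \, \delta_0 + (1 - \pi_b)\, N\bigl(\mu_{\theta b}, \sigma^2_{\theta b}\bigr) \Bigr],
\end{equation}
```

which is one way to see why the SOC effects are weighted in proportion to the prior probabilities supplied.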
This type of analysis leads naturally to the question of how to model an AE which is genuinely believed to be associated with more than one SOC. Hierarchical structures such as those in the Berry and Berry and similar models naturally support borrowing strength between AEs within a SOC, with the SOCs themselves (weakly2) borrowing strength from each other through a shared overall distribution.5–7 This structure does not lend itself naturally to a shared model of SOC membership for AEs. A second approach is to define a set of SOCs for each AE for which it is expected that the SOC will have some effect on the AE, and then consider how this may be modelled within the hierarchical structure. A relatively simple approach is to consider a mixed membership model as weighted contributions for the effects of different SOCs for AEs for which it is expected that a dependency on multiple SOCs may exist. A typical implementation of this type of model would be to consider γj as a sample from a normal distribution whose mean is a weighted sum of individual SOC means. For θj the situation is complicated by the presence of the point-mass term. Here we must weight each contribution of the mixture in order to keep the BB model property of the point-mass effect. In this case a very simple extension of the BB model is then (model 2):
with the remainder of the model remaining the same as the original BB model. For any j, choosing wjb = 1 for a single b (1 ≤ b ≤ B) reduces model 2 to the Berry and Berry model. A further extension of the model would be to include a prior for each wjb. The interpretation of the parameters is the same as in the BB model. The role of the weighting parameter (wjb) is to provide, for an AE, a fixed expression of our belief that it is associated with a specific SOC or SOCs. If an AE is believed to be associated with more than one SOC, then it will have multiple weightings (summing to 1). This model is perhaps the simplest extension of the BB model which does not introduce any further distributional assumptions beyond accounting for SOC membership, and it provides a comparator for (the fully Bayesian) model 1.
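The weighted-mean part of model 2 can be illustrated with a short sketch. It covers only the mean of the normal distribution for γj, described in the text as a weighted sum of SOC means. The point-mass weighting for θj is not reproduced here. The function name and the numerical SOC means are hypothetical values chosen for illustration, not estimates from the trial.

```python
def weighted_soc_mean(weights, soc_means):
    """Return sum_b w_jb * mu_gamma_b for one AE.

    weights   : dict mapping SOC name -> weight w_jb (must sum to 1)
    soc_means : dict mapping SOC name -> SOC-level mean mu_gamma_b
    """
    unknown = set(weights) - set(soc_means)
    if unknown:
        raise KeyError(f"no SOC mean supplied for: {sorted(unknown)}")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights w_jb must be non-negative")
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights w_jb must sum to 1")
    return sum(w * soc_means[b] for b, w in weights.items())


# Hypothetical SOC-level means on the log-odds scale (illustration only).
mu_gamma = {
    "Renal and urinary disorders": -3.2,
    "Hepatobiliary disorders": -3.6,
    "Skin and subcutaneous tissue disorders": -2.5,
}

# Equal weights across three candidate SOCs.
equal_weights = {soc: 1 / 3 for soc in mu_gamma}
mixed_mean = weighted_soc_mean(equal_weights, mu_gamma)

# A single weight of 1 recovers the mean of that SOC alone,
# matching the reduction to the Berry and Berry model.
single_mean = weighted_soc_mean({"Hepatobiliary disorders": 1.0}, mu_gamma)

print(mixed_mean, single_mean)
```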
The models may be implemented in the BUGS language and fitted using WinBUGS, OpenBUGS, JAGS or similar software.8 The analysis presented in this paper was run using JAGS 4.3.0. Each model was fitted using 3 parallel chains with an initial burn-in of 20,000 iterations, followed by a further 60,000 samples. Convergence was assessed visually and by using the Gelman-Rubin diagnostic statistic.9
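For readers unfamiliar with the Gelman-Rubin diagnostic, the following sketch computes its basic form (the potential scale reduction factor) for a single parameter from several chains. This is not necessarily the exact variant reported by the software used in the paper, and the simulated chains are hypothetical. Values close to 1 suggest the chains agree, while larger values suggest they have not mixed.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least 2 chains of equal length >= 2")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(0)

# Three hypothetical chains sampling the same distribution (well mixed).
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]

# Three hypothetical chains stuck around different values (not mixed).
stuck = [[rng.gauss(shift, 1.0) for _ in range(2000)] for shift in (0.0, 2.0, 4.0)]

print(f"well mixed: {gelman_rubin(mixed):.3f}")
print(f"not mixed:  {gelman_rubin(stuck):.3f}")
```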
Results
We investigate the case where the AE Oedema peripheral is considered to be a member of more than one SOC. In the trial EGF100151, Oedema peripheral has an incidence rate of 7 out of 191 on the control arm versus 17 out of 210 on the treatment arm, and a p-value of 0.090 for a two-sided Fisher exact test and 0.047 for a one-sided test (for increased incidence rate on the treatment arm). The SOC General disorders and administration site conditions contains 31 AEs none of which, apart from Oedema peripheral, are significant for a one-sided or two-sided Fisher exact test at the 5% level.
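The Fisher exact tests quoted above can be reproduced from the 2 × 2 counts alone. The sketch below uses only the Python standard library and assumes the common two-sided convention of summing the probabilities of all tables no more likely than the observed one. The function names are illustrative, and the printed values can be compared with the 0.090 and 0.047 reported in the text.

```python
from math import comb


def hypergeom_pmf(k, total, events, group):
    """P(k events fall in a group of the given size), margins fixed."""
    return comb(events, k) * comb(total - events, group - k) / comb(total, group)


def fisher_exact(x_control, n_control, y_treat, n_treat):
    """Return (one-sided, two-sided) p-values; one-sided tests for a
    higher incidence rate on the treatment arm."""
    total = n_control + n_treat
    events = x_control + y_treat
    lo = max(0, events - n_control)
    hi = min(events, n_treat)
    pmf = {k: hypergeom_pmf(k, total, events, n_treat) for k in range(lo, hi + 1)}
    p_obs = pmf[y_treat]
    one_sided = sum(p for k, p in pmf.items() if k >= y_treat)
    # Small relative tolerance guards against floating-point ties.
    two_sided = sum(p for p in pmf.values() if p <= p_obs * (1 + 1e-7))
    return one_sided, min(1.0, two_sided)


# Oedema peripheral: 7 of 191 on control, 17 of 210 on treatment.
p_one, p_two = fisher_exact(7, 191, 17, 210)
print(f"one-sided p = {p_one:.3f}, two-sided p = {p_two:.3f}")
```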
From the Berry and Berry model the posterior probability of an increase in rate for Oedema peripheral on the treatment arm is 0.544. This is in part a reflection of the low level rates of the other AEs in the SOC that Oedema peripheral has been assigned to. The top 10 AEs by positive posterior probability from the Berry and Berry model are given in Table 1 and a plot of the estimated median increase in log-odds for the AEs in the SOCs associated with the top 10 AEs as well as Oedema peripheral, versus the posterior probability of an increase on treatment, is given in Figure 1. Rash and Diarrhoea are expected AEs which were noted in the trial protocol. A number of other AEs have a high probability of being associated with treatment. How important these are will be a clinical decision.
Table 1. Trial EGF100151: Top 10 adverse events by posterior probability of an increase in rate on the treatment arm (Berry and Berry model).

| SOC | Adverse event | Prob (θ > 0) |
| --- | --- | --- |
| Gastrointestinal disorders | Diarrhoea | 1.000 |
| Skin and subcutaneous tissue disorders | Rash | 1.000 |
| Gastrointestinal disorders | Dyspepsia | 0.984 |
| Respiratory, thoracic and mediastinal disorders | Epistaxis | 0.984 |
| Skin and subcutaneous tissue disorders | Dermatitis acneiform | 0.965 |
| Skin and subcutaneous tissue disorders | Nail disorder | 0.934 |
| Respiratory, thoracic and mediastinal disorders | Dyspnoea | 0.905 |
| Musculoskeletal and connective tissue disorders | Arthralgia | 0.901 |
| Musculoskeletal and connective tissue disorders | Muscle spasms | 0.881 |
| Musculoskeletal and connective tissue disorders | Back pain | 0.880 |
Figure 1. Plot of the posterior median increase in log odds-ratio (OR) versus the posterior probability of an increase for all AEs.
The clinical trial data is reanalysed using model 1 and model 2 above. We look at two cases: one where there is doubt about which SOC Oedema peripheral belongs to (analysed by model 1), and one where Oedema peripheral is considered to be affected by more than one SOC (analysed by model 2). Clinical input and subject knowledge are of primary importance when choosing the SOCs, prior probabilities pjb, and weights wjb for the AEs. For our purposes we assume the SOCs are Renal and urinary disorders, Hepatobiliary disorders and Skin and subcutaneous tissue disorders.
We choose a discrete uniform prior for model 1 (pjb = 1/3). The top 10 AEs by positive posterior probability are given in Table 2, and a plot of the median increase in log odds versus the posterior probability of an increase is given in Figure 2.
Table 2. Trial EGF100151: model 1 - top 10 adverse events by posterior probability of an increase in rate on the treatment arm.

| SOC | Adverse event | Prob (θ > 0) |
| --- | --- | --- |
| Gastrointestinal disorders | Diarrhoea | 1.000 |
| Skin and subcutaneous tissue disorders | Rash | 1.000 |
| Gastrointestinal disorders | Dyspepsia | 0.985 |
| Respiratory, thoracic and mediastinal disorders | Epistaxis | 0.984 |
| Skin and subcutaneous tissue disorders | Dermatitis acneiform | 0.969 |
| Skin and subcutaneous tissue disorders | Nail disorder | 0.944 |
| N/A | Oedema peripheral | 0.929 |
| Musculoskeletal and connective tissue disorders | Arthralgia | 0.906 |
| Respiratory, thoracic and mediastinal disorders | Dyspnoea | 0.902 |
| Skin and subcutaneous tissue disorders | Dry skin | 0.893 |
Figure 2. Plot of the posterior median increase in log odds-ratio (OR) versus the posterior probability of an increase for all AEs.
Comparing Table 2 to Table 1, there are changes in the top 10 AEs. Muscle spasms and Back pain have dropped out of the top 10 for model 1, although their posterior probabilities are 0.888 and 0.882 respectively. They have been replaced by Dry skin, with posterior probability 0.893, and Oedema peripheral, which now has a posterior probability of 0.929 of an increase in rate on the treatment arm, compared to 0.544 for the model where Oedema peripheral was in General disorders and administration site conditions. This is a large increase in posterior probability and shows the difference the choice of SOC can make when deciding to flag an AE as being potentially associated with a treatment.
For model 2, with wjb = 1/3 for Oedema peripheral, the top 10 AEs by positive posterior probability are given in Table 3. For the SOCs associated with the top 10 AEs, a plot of the median increase in log odds versus the posterior probability of an increase is given in Figure 2.
Table 3. Trial EGF100151: model 2 - top 10 adverse events by posterior probability of an increase in rate on the treatment arm.

| SOC | Adverse event | Prob (θ > 0) |
| --- | --- | --- |
| Gastrointestinal disorders | Diarrhoea | 1.000 |
| Skin and subcutaneous tissue disorders | Rash | 1.000 |
| Gastrointestinal disorders | Dyspepsia | 0.985 |
| Respiratory, thoracic and mediastinal disorders | Epistaxis | 0.985 |
| N/A | Oedema peripheral | 0.981 |
| Skin and subcutaneous tissue disorders | Dermatitis acneiform | 0.965 |
| Skin and subcutaneous tissue disorders | Nail disorder | 0.939 |
| Musculoskeletal and connective tissue disorders | Arthralgia | 0.906 |
| Respiratory, thoracic and mediastinal disorders | Dyspnoea | 0.901 |
| Musculoskeletal and connective tissue disorders | Muscle spasms | 0.887 |
Again the top 10 AEs are very similar to the other models, but Oedema peripheral has an even higher posterior probability than for model 1, indicating the difference between the models with regard to how mixed membership is implemented. Model 2 does not take into account any doubt about the SOCs to which an AE may belong. The effect of varying the weights for model 2 is investigated in a small sensitivity study included in the Supplemental Material.
It should be understood that changing the structure of the model to include effects for AEs from different SOCs may have an effect on the posterior probabilities of the other AEs in the model. However, it is likely that this effect will be small, as the influence of a mixed membership AE will be a partial effect for each SOC with which it is associated. As an example, for the Hepatobiliary disorders SOC the posterior probabilities under the BB and new models are given in Table 4. We can see that the results of the different models are similar: the effect of introducing the mixed membership models is minimal in this SOC.
Table 4. Hepatobiliary disorders: Posterior probability of an increase in rate on the treatment arm for all models.

| Adverse event | Berry and Berry model | Model 1 | Model 2 |
| --- | --- | --- | --- |
| Budd-Chiari syndrome | 0.320 | 0.322 | 0.327 |
| Cholecystitis | 0.285 | 0.288 | 0.288 |
| Hepatic cirrhosis | 0.321 | 0.323 | 0.328 |
| Hepatic function abnormal | 0.423 | 0.424 | 0.427 |
| Hepatic pain | 0.251 | 0.252 | 0.249 |
| Hepatotoxicity | 0.283 | 0.286 | 0.285 |
| Hyperbilirubinaemia | 0.869 | 0.856 | 0.841 |
| Jaundice | 0.318 | 0.321 | 0.327 |
Discussion
High dimensionality, low event rates, and low power to detect treatment differences are among the challenges of detecting safety signals in clinical trials. Appropriate statistical methods, such as hierarchical Bayesian models, address some of these issues and may be used to help characterise the safety profile of a drug. The availability of software packages such as R and WinBUGS has made it possible to apply sophisticated models of AE incidence to clinical trial safety data in a routine manner, and allows an explorative approach to safety analysis, in contrast to the strict cut-off for flagging AEs employed by error controlling procedures.2
The development of methods for the statistical analysis of adverse events continues to be an area of active research,10–12 but regardless of approach, any method which requires a choice of AEs and their assignment to SOCs for the analysis requires careful consideration. While medical dictionaries provide groupings of AEs into SOCs, there may be a number of AEs which do not fit easily into a single SOC. One approach is to assign these AEs to a separate general SOC, such as General disorders and administration site conditions in MedDRA. This highlights our first modelling issue: if the AEs within a SOC have no biological relationship, using them in a grouped analysis is counterproductive. In the case of Oedema peripheral we can see that its posterior probability of being associated with treatment is 0.544 while a member of this SOC. This is due in the main to the fact that the other AEs in the SOC General disorders and administration site conditions have low incidence. For those AEs for which a clinical pathology exists, and which should be included in the statistical modelling, we need to find a way of meaningfully including them in the analysis.
Our second potential modelling issue is this: if it is possible that an AE may be classified in one of two or more SOCs, or influenced by more than one SOC, then unless we use a mixed membership or similar approach, we are left with the choice of deciding which SOC is more suitable, or will have the most influence on the AE. The alternative offered by the mixed membership approaches presented here is to assign prior probabilities to the AE for membership of the SOCs on which the AE may have some dependence (model 1), or to explicitly include these effects in the model (model 2). In either case this allows a number of SOCs to contribute in some way to the modelling of the AE incidence rate. The other AEs in these SOCs will have a shrinkage or strengthening effect depending on whether incidence rates are raised or not raised in these SOCs.
We have seen that classifying Oedema peripheral into the General disorders and administration site conditions SOC does have an effect on the model results and that if we account for the uncertainty about the SOC to which an AE belongs in the model then the results are changed. In our case for the AE Oedema peripheral the posterior probability of an increase in incidence rate goes from 0.544 to 0.929 for model 1, and to 0.981 for model 2, very large increases. These results are demonstrative only and clearly when making an adjustment to the structure of the AE/SOC hierarchy very careful consideration needs to be given to the choice of SOCs with which to associate an AE.
Conclusion
Returning to the issues we considered in the introduction, the use of a generic SOC such as General disorders and administration site conditions is not suitable for modelling and alternatives should be considered. For Bayesian hierarchical models which model AE/SOC membership, alternatives such as mixed membership models may provide a relatively simple extension for some models which is capable of taking multiple SOC membership into account. However, mixed membership requires a number of additional assumptions. In particular, for model 2 above, we have tacitly assumed an additive structure for the multiple SOC membership. This may not always be the case and whether this is a reasonable assumption is dependent on the data and clinical assessment. Further, while model 1 gives similar results to model 2 for our data analysis, the philosophy behind the two models is different. Model 1 is a fully Bayesian extension of the existing model, taking into account the uncertainty around group membership. Model 2 on the other hand is a statement of our membership beliefs. This is reflected in the stronger signal for Oedema peripheral in model 2. Which approach may be more suitable is again a decision which needs to be informed by clinical opinion and, with the appropriate clinical guidance, mixed membership may be a worthwhile addition to modelling AE data.
Supplemental material
Supplemental material - Mixed membership effects in adverse event Bayesian hierarchical modelling
Supplemental Material for Mixed membership effects in adverse event Bayesian hierarchical modelling by Raymond Carragher in Research Methods in Medicine & Health Sciences.
Footnotes
ORCID iD
Raymond Carragher
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) [award reference 1521741] and Frontier Science (Scotland) Ltd, and by Health Data Research (HDR) (UK) (Medical Research Council (UK) award reference: MR/S003967/1).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Supplemental material
Supplemental material for this article is available online.
The supplementary material for this paper contains the models in the BUGS language, the data and model outputs for the tables and figures above, convergence diagnostics for some of the parameters, and the scripts used to run the analyses.
Notes
References
1. de Abreu Nunes L, Hooper R, McGettigan P, et al. Statistical methods leveraging the hierarchical structure of adverse events for signal detection in clinical trials: a scoping review of the methodological literature. BMC Med Res Methodol 2024; 24(1): 253. https://doi.org/10.1186/s12874-024-02369-1
2. Carragher R, Robertson C. Assessing safety at the end of clinical trials using system organ classes: a case and comparative study. Pharm Stat 2021; 20(6): 1278–1287. https://doi.org/10.1002/pst.2148
4. Airoldi EM, Blei D, Erosheva EA, et al. Handbook of mixed membership models and their applications. 1st ed. Chapman & Hall, 2014.
5. Berry SM, Berry DA. Accounting for multiplicities in assessing drug safety: a three-level hierarchical mixture model. Biometrics 2004; 60(2): 418–426. https://doi.org/10.1111/j.0006-341X.2004.00186.x
6. DuMouchel W. Multivariate Bayesian logistic regression for analysis of clinical study safety issues. Stat Sci 2012; 27(3): 319–339. https://doi.org/10.1214/11-sts381
7. Crooks C, Prieto-Merino D, Evans SW. Identifying adverse events of vaccines using a Bayesian method of medically guided information sharing. Drug Saf 2012; 35(1): 61–78. https://doi.org/10.2165/11596630-000000000-00000
8. Lunn D, Jackson C, Best N, et al. The BUGS book: a practical introduction to Bayesian analysis. Chapman & Hall/CRC texts in statistical science. Taylor & Francis, 2012. https://books.google.co.uk/books?id=Cthz3XMa_VQC
10. Tan X, Liu GF, Zeng D, et al. Controlling false discovery proportion in identification of drug-related adverse events from multiple system organ classes. Stat Med 2019; 38(22): 4378–4389. https://doi.org/10.1002/sim.8304
11. Tan X, Chen BE, Sun J, et al. A hierarchical testing approach for detecting safety signals in clinical trials. Stat Med 2020; 39(10): 1541–1557. https://doi.org/10.1002/sim.8495
12. Diao G, Liu GF, Zeng D, et al. Efficient methods for signal detection from correlated adverse events in clinical trials. Biometrics 2019; 75(3): 1000–1008. https://doi.org/10.1111/biom.13031