Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations

Abstract

Antimicrobial resistance is becoming a major threat to public health throughout the world. Researchers are attempting to counter it by developing both new antibiotics and patient-specific treatments. In the second case, whole-genome sequencing has had a huge impact in two ways: first, it is becoming cheaper and faster to perform, which makes it competitive with standard phenotypic tests; second, it makes it possible to statistically associate phenotypic patterns of resistance with specific mutations in the genome. Therefore, it is now possible to develop catalogues of genomic variants associated with resistance to specific antibiotics, in order to improve prediction of resistance and suggest treatments. It is essential to have robust methods for identifying mutations associated with resistance and for continuously updating the available catalogues. This work proposes a general method to study minimal inhibitory concentration distributions and to identify clusters of strains showing different levels of resistance to antimicrobials. Once the clusters are identified and strains allocated to each of them, regression methods can be applied to identify, with high statistical power, the mutations associated with resistance. The method is applied to a new 96-well microtiter plate used for testing Mycobacterium tuberculosis.

Keywords

Antimicrobial resistance, censored data, minimal inhibitory concentration distributions, mixture models, genome-wide association study

1. Introduction

Public health authorities throughout the world are becoming more and more concerned about antimicrobial resistance, due to the reduced ability of standard compounds to treat infectious diseases.^1–3 Antimicrobial resistance mechanisms have been observed in bacteria,^4–6 in fungi,^7,8 and in viruses.^9,10

There are two main causes of the development of drug resistance: the prescription of suboptimal treatments, which encourages resistance to emerge, and the direct transmission of resistant strains. Methods used to tackle the rise of antimicrobial resistance include a wiser prescription of antimicrobials that takes into account known resistance patterns. Such patterns are studied through antimicrobial susceptibility testing, which identifies the concentration of a particular drug at which the growth of the pathogen is inhibited. In this respect, microtiter plates allow the effectiveness of several drugs to be tested at the same time on a single clinical isolate.

Antimicrobial data, obtained through dilution methods,¹¹ are recorded as minimum inhibitory concentration (MIC) values, expressed in milligrams per litre (mg/L). The MIC is defined as the minimal concentration of an antimicrobial substance that inhibits the visible growth of a pathogen after incubation. Since this type of test is more accurate than diffusion tests, MICs are considered the gold standard of susceptibility tests.¹² Under the experimental design adopted to obtain MIC values, data for a specific drug follow a distribution such as those shown in Figure 1. The shape of this distribution may vary considerably from drug to drug, according to the specific resistance patterns.

Figure 1.

Barplots of the minimal inhibitory concentration (MIC) distributions of each drug under plate design UKMYC5. The $y$-axis represents the density of each class/dilution and the $x$-axis represents the $\log_{2}(\mathrm{MIC})$ values.

The aim of this work is to propose a general method, the censored Gaussian mixture approach, to define clusters of strains showing different levels of resistance in order to associate them with specific mutations in genome-wide association studies (GWASs).¹³ The method relies on a latent representation where a continuous variable, following a Gaussian mixture model with a prior on the number of components, is only partially observed. This representation makes it possible to derive a posterior distribution on the number of clusters, that is, levels of resistance; the allocations of strains to each cluster will be shown to perform better in association studies, in particular for rare or low-frequency mutations.

Although the methods presented in this work may be applied to any pathogen and any dilution method, the attention is focused on Mycobacterium tuberculosis, given the importance of the resistance mechanisms developed by this pathogen. While the trend of new cases of tuberculosis is decreasing,¹⁴ the number of cases resistant to one or more drugs, in particular to first-line drugs (rifampicin, ethambutol, isoniazid, and pyrazinamide) is increasing.^15,16

In this work, we propose an alternative approach to the standard definition of critical concentrations to define resistance. A critical concentration is the concentration used to classify isolates in the susceptible group or the resistant group. However, the critical concentrations of most of the anti-TB drugs have been recently revised and updated by the World Health Organization,^17,18 through an extensive study of the literature, and it has emerged that the identification of critical concentrations is not a simple task, as is usually assumed. We propose, instead, to use a classification approach, where isolates are allocated to clusters of resistance, in order to identify potential intermediate levels to define phenotypic subgroups (and not only two main groups – susceptible and resistant); this multi-label classification will be shown to be essential in order to identify the mutations associated with specific levels of resistance more clearly, in particular for those antimicrobials for which only a few resistant cases are observed (e.g. for bedaquiline which is a new treatment).

When defining critical concentrations, it is often assumed that the wild-type group of isolates (defined as the group of isolates with no acquired resistance to antimicrobials) follows a log-normal distribution, as given by Turnidge et al.,¹⁹ where the cutoffs are identified by fitting a log-normal cumulative distribution through non-linear least squares regression. This method is implemented in the ECOFFinder software available on the website of the European Committee on Antimicrobial Susceptibility Testing (EUCAST). This method strongly relies on the assumption that a log-normal distribution is a suitable model for MIC values (equivalently, a normal distribution for their binary logarithm) and it does not take into account the region of the distribution where wild-type and non-wild-type strains overlap. Jaspers et al.²⁰ relaxed the assumption that the wild-type distribution is log-normal: the MIC values for the wild-type distribution are still considered realizations of continuous random variables; however, the authors model the MIC groupings with a multinomial distribution, with parameters corresponding to the probabilities of belonging to each of the concentrations (or dilutions) analysed on the testing plate.

Such methods, also described as ‘local’, rely on the wild-type group of isolates being well identified; however, in many cases, such as M. tuberculosis, the wild-type group itself may be heterogeneous. By contrast, the approach proposed in this work is ‘global’, that is, it aims to model the whole mixing distribution. Jaspers et al.^21–23 considered a mixture-type model

$$g (y) = π f_{1} (y; θ_{1}) + (1 - π) f_{2} (y; θ_{2})$$

where $f_{1}$ and $f_{2}$ represent the wild-type and the non-wild-type component, respectively; $f_{1}$ has a parametric form (log-normal or gamma), while $f_{2}$ is fitted by following a nonparametric approach. Isolates are classified as wild-type when

$$\frac{π f_{1} (y_{i}; θ_{1})}{π f_{1} (y_{i}; θ_{1}) + (1 - π) f_{2} (y_{i}; θ_{2})} \geq 0.5$$

However, this classification is still binary and may not realistically represent the available groups, in particular in the presence of intermediate levels of resistance.
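As a minimal sketch of the binary rule above, the following Python snippet evaluates the posterior probability of the wild-type component and applies the 0.5 threshold. Both components are taken to be Gaussian on the $\log_{2}$ scale purely for illustration (the cited approach fits $f_{2}$ nonparametrically), and all function names and parameter values are hypothetical.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a normal distribution evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def wild_type_probability(y, weight, f1, f2):
    """Posterior probability that y belongs to the wild-type component f1."""
    numerator = weight * f1(y)
    return numerator / (numerator + (1.0 - weight) * f2(y))


def is_wild_type(y, weight, f1, f2):
    """Binary rule: wild-type when the posterior probability is at least 0.5."""
    return wild_type_probability(y, weight, f1, f2) >= 0.5


# Hypothetical components on the log2(MIC) scale (assumed values, not estimates).
def f1(y):
    return normal_pdf(y, -2.0, 1.0)


def f2(y):
    return normal_pdf(y, 3.0, 1.0)


print(is_wild_type(-2.0, 0.8, f1, f2))  # True
print(is_wild_type(3.0, 0.8, f1, f2))   # False
```

Whatever the shape of the two densities, the rule can only return two labels, which is the limitation the proposed approach is meant to address.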

Gaussian mixture models have already been suggested in the study of MIC distributions, for example, by Craig²⁴ and Annis and Craig.²⁵ However, although MIC values may be considered ideally continuous, they are recorded as discrete values or, more specifically, as counts of the number of isolates associated with each dilution. Moreover, considering a fixed and known number of components is a strong implicit assumption when the resistance mechanism is not yet fully understood.

Mixture models for ordinal data, including regression on covariates, have been proposed by McLachlan and Jones,²⁶ Cadez et al.,²⁷ and Hamdan and Wu²⁸ with estimation via the expectation–maximization (EM) algorithm, and by Kottas et al.²⁹ and DeYoreo and Kottas³⁰ in a Bayesian setting, among others. As in the approach proposed in this work, a latent Gaussian variable following a mixture model is introduced to describe the behaviour of the implicit continuous variable, which is observed on a discrete scale. The main difference with the censored Gaussian mixture model (censored GM) proposed here is that in previous works the latent continuous variable is modelled according to an infinite mixture of Gaussian distributions, using a Dirichlet process prior. However, Miller and Harrison³¹ showed that this model is inconsistent in estimating the number of clusters. The results on the available dataset (Section 4) will show such inconsistency empirically.

The use of a prior distribution on the number of components for a finite mixture model has been shown to allow for consistency in estimating the number of clusters, unlike the use of Dirichlet process priors. The reason for this is that, in finite mixture models, most of the prior mass is assigned to clusters of similar size, while in Dirichlet processes the prior mass is assigned to clusters of highly variable size, favouring an increasing number of small clusters.³² See also Frühwirth-Schnatter et al.³³ for a recent characterization of the prior distribution on the number of clusters induced by the prior distribution on the number of components.

The remainder of the article is organized as follows. Section 2 describes the dataset which motivates the study. The censored Gaussian mixture approach proposed in this work is formally presented in Section 3. Several approaches are applied and compared on the motivating dataset in Section 4. The labelling provided by the proposed method is then used to perform a GWAS in Section 5, in order to identify mutations associated with resistance to each of the antimicrobials under consideration: several previously unreported variants, or variants identified in smaller studies, will be associated with several levels of resistance to specific drugs. Section 6 concludes the article. Supporting Information includes a simulation study to test the approach.

2. The dataset: Resistance prediction by means of CRyPTIC

The CRyPTIC Consortium (Comprehensive Resistance Prediction for Tuberculosis: an International Consortium) was created in order to collect and study about 20,000 isolates of M. tuberculosis and to define a catalogue of mutations associated with resistance to 14 antituberculosis compounds: three first-line drugs (isoniazid INH, rifampicin RIF, and ethambutol EMB), other drugs already used in practice as antituberculosis compounds (rifabutin RFB, amikacin AMI, kanamycin KAN, ethionamide ETH, para-aminosalicylic acid PAS, levofloxacin LEV, and moxifloxacin MXF), two new compounds (delamanid DLM and bedaquiline BDQ), and two repurposed compounds (clofazimine CFZ and linezolid LZD).

As part of the project, the CRyPTIC Consortium designed a UKMYC 96-well microtiter plate. The plate design has been validated by seven laboratories in Asia, Europe, South America, and Africa, by using 19 external quality assessment (EQA) strains, including the most frequently studied tuberculosis strain, H37Rv. A full description of the experiment and of the results, in terms of reproducibility of the plate, is given by Rancoita et al.³⁴ Since the highest level of reproducibility was identified for readings at day 14 after inoculation with the Vizion imaging system, attention is restricted here to this subset of the data. Moreover, the PAS compound was shown not to perform well on the plate, and has therefore been discarded in the following part of the CRyPTIC study. For this reason, the outcomes relative to PAS, although still presented in this work, should be considered more uncertain. The validation experiment also showed that there is biological variability depending on both the plate and the culture preparation, so that by repeating the culture of the same strain several times, a full distribution of possible values is obtained, and this distribution is concentrated within three dilutions 95% of the time.

In this article, MIC values obtained from dilution experiments on a 96-well microtiter plate containing a liquid growth medium (broth) are analysed, where the same dose of pathogen is cultured in each well, but in the presence of successively increasing antimicrobial concentrations (double dilutions). The MIC value is identified as the concentration of the first well which does not allow the pathogens to grow. By convention, if growth is inhibited in all wells, the MIC is set to the lowest concentration available and, if growth is observed at each concentration level, the MIC is set to an agreed higher level of antimicrobial concentration that has not been studied on the plate.
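The reading convention just described can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, growth is assumed to be recorded as one boolean per well in order of increasing concentration, and the value recorded when growth is observed in every well is assumed here to be the next doubling above the plate range (the text only says that an agreed higher level is used).

```python
def read_mic(concentrations, growth, above_range_factor=2.0):
    """Return the MIC for one drug on one plate.

    concentrations: tested concentrations in increasing order (mg/L).
    growth: one boolean per well, True if the pathogen grew in that well.
    above_range_factor: assumed multiplier giving the off-plate value recorded
        when growth is observed at every tested concentration.
    """
    for concentration, grew in zip(concentrations, growth):
        if not grew:
            # First well in which growth is inhibited.
            return concentration
    # Growth in every well: record the agreed value above the tested range.
    return concentrations[-1] * above_range_factor


plate = [0.25, 0.5, 1.0, 2.0, 4.0, 8.0]
print(read_mic(plate, [True, True, False, False, False, False]))  # 1.0
print(read_mic(plate, [False] * 6))                               # 0.25
print(read_mic(plate, [True] * 6))                                # 16.0
```

The last two calls correspond to the two boundary conventions: in both cases the recorded value only bounds the true MIC, which is the source of the left and right censoring discussed later.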

While the results of the validation experiment were being studied, the laboratories involved in the CRyPTIC consortium analysed the first set of strains with the initial plate design (Plate Design ‘UKMYC5’); in May 2018 a new plate design (Plate Design ‘UKMYC6’) was agreed and the laboratories started to use it in July 2018. The dataset used for the current work includes only strains analysed with the UKMYC5 plate design ($\sim$7500 isolates). The absolute frequencies of isolates for each compound are shown in Table 1, while the empirical distributions of the $\log_{2}$(MIC) for each compound are shown in Figure 1.

Table 1.
Number of isolates analysed for each compound.

Compound	$n$	Compound	$n$	Compound	$n$
AMI	7312	ETH	7310	MXF	6385
BDQ	7054	INH	7097	PAS	6319
CFZ	6793	KAN	7207	RFB	7331
DLM	7016	LEV	6607	RIF	7145
EMB	6584	LZD	6420

RFB: rifabutin; AMI: amikacin; KAN: kanamycin; ETH: ethionamide; PAS: para-aminosalicylic acid; LEV: levofloxacin; MXF: moxifloxacin; INH: isoniazid; RIF: rifampicin; EMB: ethambutol; DLM: delamanid; BDQ: bedaquiline; CFZ: clofazimine; LZD: linezolid.

Figure 1 also shows an important feature of the dataset and, in general, of the problem of studying MIC distributions: the data are censored. First, the MIC value is only partially known at the boundary of the analysed concentration range. Moreover, the MIC values are not continuous variables as they are observed at fixed levels of concentrations (interval-censoring). Approaches usually applied to the estimation of MIC distributions often do not take into account these two sources of censoring, while one of the advantages of our proposed approach is being able to control for both of them.

3. The proposed models

Consider a set of random variables $Y_{1}, \dots, Y_{n}$ of size $n$. In the case of the application of Section 2, $Y_{i}$ represents the $\log_{2}$(MIC) for the drug under analysis. In this work, each antimicrobial is considered as independent from the others, so $Y_{i}$ is a univariate random variable. A mixture model assumes that the distribution of $Y_{i}$ can be written as a composition of distributions known in closed form

$$g (y_{i}; π, θ) = \sum_{k = 1}^{K} π_{k} f_{k} (y_{i}; θ_{k}), \quad i = 1, \dots, n \tag{1}$$

where $f_{k} (\cdot)$ is the $k$-th component density of the mixture depending on parameter $θ_{k}$ and $π_{k}$ is known as a mixture weight such that $0 \leq π_{k} \leq 1$ for $k = 1, \dots, K$, and $\sum_{k = 1}^{K} π_{k} = 1$. Even though the probability distributions $f_{k} (\cdot)$ may be from any family (and can also model either discrete or continuous random variables), it is usually assumed, in many applications, that all the distributions in the mixture come from the same family, albeit denoted by different parameters. The number of components $K$ is in general unknown and may be considered finite (finite mixture models)³⁵ or infinite (nonparametric mixtures).³⁶ When $f_{k} (\cdot; θ_{k}) = N (μ_{k}, σ_{k}^{2})$, where $μ_{k}$ and $σ_{k}^{2}$ are the mean and the variance of the $k$-th component, respectively, the model is a Gaussian mixture model.

The model can be rewritten as

$$\begin{aligned} Y_{i \mid k} & = μ_{k} + ε_{i, k} \\ ε_{i, k} & \sim N (0, σ_{k}^{2}) \end{aligned} \tag{2}$$

where $μ_{k}$ is an intercept specific to component $k$; the intercept can be modelled such that $E [Y_{s, d \mid k}] = μ_{s, d, k} = a_{d, k} + b_{s, d, k}$, where $a_{d, k}$ is an intercept specific to the compound $d$ and $b_{s, d, k}$ is an intercept specific to the strain $s$, tested with compound $d$. In this work, antimicrobials are considered separately. A model that allows for interactions among antimicrobials may better identify cross-associations. However, in order to flexibly represent this interaction, complex multivariate models are needed. An initial analysis of the association among $\log_{2}$(MIC) values recorded for pairs of antimicrobials showed that the dependence is highly non-linear, suggesting that non-normal models should be preferred. Therefore, such extensions are left for further research.

Equation (1) can be augmented by including a latent variable for the allocation of an observation to a particular component: it is possible to hypothesize the existence of a latent variable $Z_{i}$ that takes values in $\{1, \dots, K\}$ with probabilities $\{π_{1}, \dots, π_{K}\}$ and labels the component to which the observation belongs; in other words, the conditional density of $Y_{i}$ given $Z_{i} = k$ corresponds to the Gaussian distribution $N (μ_{k}, σ_{k}^{2})$. It follows that $Z = (Z_{1}, \dots, Z_{n})$ is distributed according to a multinomial distribution. Diebolt and Robert³⁷ showed that this latent variable representation produces a Gibbs sampler that is ergodic and characterized by geometric convergence.
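The latent allocation representation can be illustrated with a short Python sketch: a label is drawn first, then the observation is drawn from the corresponding Gaussian component, and the conditional allocation probabilities given an observation follow from Bayes' theorem. The parameter values and function names are hypothetical, and this is not the MCMC scheme of the article (which is described in its Supplemental Appendix A), only the generative mechanism it builds on.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of a normal distribution evaluated at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sample_mixture(n, weights, means, sds, rng):
    """Draw labels Z_i with probabilities `weights`, then Y_i given Z_i = k."""
    labels = rng.choices(range(len(weights)), weights=weights, k=n)
    values = [rng.gauss(means[k], sds[k]) for k in labels]
    return labels, values


def allocation_probabilities(y, weights, means, sds):
    """P(Z = k | Y = y), proportional to weight_k times the k-th density at y."""
    unnormalized = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Assumed illustrative values on the log2(MIC) scale.
weights = [0.7, 0.2, 0.1]
means = [-3.0, 0.0, 3.0]
sds = [0.8, 0.6, 0.7]

rng = random.Random(0)
labels, values = sample_mixture(5000, weights, means, sds, rng)
print(allocation_probabilities(values[0], weights, means, sds))
```

In a Gibbs sampler, a step of this kind (sampling each label from its conditional allocation probabilities) alternates with updates of the component parameters given the labels.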

The decision to fit a mixture model is motivated by three reasons. First, it seems more appropriate to model the whole mixing structure rather than only the wild-type group of isolates (preferring a global method to a local method), since the classification is unsupervised and the microtiter plates under study are characterized by biological noise. Second, the mechanisms of resistance are heterogeneous; allowing the resistant group to be described by more than one component may make it possible to identify intermediate levels of resistance. Moreover, the complete patterns of resistance are not known for most of the drugs under analysis and are almost completely unknown for new drugs. This represents an important first step for subsequent analysis, like GWAS. Third, the standard way of defining a wild-type group is by looking at those isolates that have no known resistance-conferring mutations. However, strains of M. tuberculosis have been exposed to antimicrobials for decades and the so-called ‘wild-type’ group is itself heterogeneous. Using an unsupervised method, like a mixture model, makes it possible to cluster strains into several groups, in order to separately investigate their genomic patterns and link the specific mechanisms of resistance to particular genomic variants.

Although the use of Gaussian components is already very flexible, it does not take into account the discrete and censored nature of the data: MIC values are not actually continuous, and are in fact rounded to the next two-fold dilution. Moreover, the data are censored at the minimum and maximum dilution chosen for the plate. Therefore, it is possible to consider a mixture of distributions, where the discrete nature of the data is taken into account by rounding continuous (for instance, Gaussian) variables. A latent variable $Y^{*} \in \mathbb{R}$ is introduced, which is related to the observed variable $Y$ that represents the registered MIC value, so that:

$$y_{i} = \begin{cases} \mathrm{dilution}_{1, d} & \text{if } y_{i}^{*} < \mathrm{dilution}_{1, d} \\ \vdots \\ \mathrm{dilution}_{j, d} & \text{if } \mathrm{dilution}_{(j - 1), d} \leq y_{i}^{*} < \mathrm{dilution}_{j, d} \\ \vdots \\ \mathrm{dilution}_{\max, d} & \text{if } y_{i}^{*} \geq \mathrm{dilution}_{\max, d} \end{cases}$$

that is, the observed $y_{i}$ assumes values in the dilution set, for drug $d$, on the basis of a Gaussian latent variable $Y_{i}^{*}$, which has the distribution described in equation (1). Here, $\mathrm{dilution}_{\max, d}$ is a value not actually tested on the plate, but at which observations are registered when growth of the pathogen is observed in every well: each value of $Y_{i}^{*}$ larger than this maximum dilution corresponds to a value $Y_{i}$ equal to $\mathrm{dilution}_{\max, d}$; similarly, each value of $Y_{i}^{*}$ smaller than the minimum dilution (i.e. no growth is observed in any well) corresponds to a value $Y_{i}$ equal to $\mathrm{dilution}_{1, d}$; this is the way left and right censoring are dealt with in the proposed approach.
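The mapping from the latent value to the recorded dilution can be sketched as follows. The function name and the dilution grid are hypothetical; the grid is expressed on the $\log_{2}$ scale, with its last entry playing the role of the off-plate maximum value.

```python
from bisect import bisect_right


def observe(latent, dilutions):
    """Map a latent log2(MIC) value to the recorded dilution.

    dilutions: increasing list of recorded values on the log2 scale; the last
    entry is the off-plate value used when growth occurs in every well.
    """
    # bisect_right counts the dilutions that are <= latent, so a latent value in
    # [dilution_{j-1}, dilution_j) is mapped to dilution_j; values below the
    # first dilution map to it, and values at or above the last one are clipped.
    index = min(bisect_right(dilutions, latent), len(dilutions) - 1)
    return dilutions[index]


grid = [-3, -2, -1, 0, 1, 2, 3]   # assumed log2 dilution grid
print(observe(-4.2, grid))  # -3 (left-censored)
print(observe(0.4, grid))   # 1
print(observe(10.0, grid))  # 3 (right-censored)
```

Using `bisect_right` rather than `bisect_left` matches the closed lower bound and open upper bound of the intervals in the case definition above.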

The probability mass function $p (\cdot)$ of $Y = (Y_{1}, \dots, Y_{n})$ is defined as

$$p (y_{i} = \mathrm{dilution}_{j, d}) = \int_{\mathrm{dilution}_{j - 1, d}}^{\mathrm{dilution}_{j, d}} g (y_{i}^{*}) \, d y_{i}^{*} = \int_{\mathrm{dilution}_{j - 1, d}}^{\mathrm{dilution}_{j, d}} \sum_{k = 1}^{K} π_{k} f_{k} (y_{i}^{*}; θ_{k}) \, d y_{i}^{*}$$
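For Gaussian components the integral reduces to differences of normal cumulative distribution functions, which the following sketch computes. The treatment of the two boundary cells is an assumption based on the case definition of the observed variable: the lowest dilution collects all latent mass below it, and the off-plate maximum collects all latent mass at or above the second-largest value. Function names and parameter values are hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(x, weights, means, sds):
    """CDF of the latent Gaussian mixture."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))


def censored_pmf(dilutions, weights, means, sds):
    """Probability of each recorded dilution (at least two dilutions required)."""
    cdf = [mixture_cdf(d, weights, means, sds) for d in dilutions]
    last = len(dilutions) - 1
    pmf = [cdf[0]]                                   # left-censored cell
    pmf += [cdf[j] - cdf[j - 1] for j in range(1, last)]  # interior intervals
    pmf.append(1.0 - cdf[last - 1])                  # right-censored cell
    return pmf


grid = [-3, -2, -1, 0, 1, 2, 3]   # assumed log2 dilution grid
pmf = censored_pmf(grid, weights=[0.8, 0.2], means=[-1.0, 2.5], sds=[0.7, 0.6])
print([round(p, 4) for p in pmf])
```

The likelihood of an observed sample is then the product of these cell probabilities raised to the observed counts, which is what distinguishes the censored model from a Gaussian mixture fitted directly to the recorded values.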

This approach may be considered as a generalization to the case of mixture models of the latent Gaussian representation of Albert and Chib³⁸ defined for discrete variables. The mixed nature of the data is transferred to an implicit and richer variable, which, when observed, is censored and then only registered at a discrete scale.

Several approaches are available to generalize latent variable algorithms³⁸ to mixture models, in particular in a nonparametric setting; for example, a nonparametric estimation for mixed count data based on the infinite mixture models is proposed by Kottas et al.²⁹ While these methods are similar to the one proposed here with respect to the latent representation, there are some important differences: the goal of these approaches is often density estimation and not clustering. In this work, the use of a finite mixture model with an unknown number of components is preferred in order to introduce the information that a small number of components is expected; this is particularly important in this setting, where the clusters are defined with a biological interpretation. Moreover, it avoids the inconsistency of the Dirichlet process in estimating the correct number of components discussed by Miller and Harrison.³¹

The estimation of the proposed model is carried out within a Bayesian framework³⁹ to obtain posterior distributions (and the corresponding credible intervals) of all the parameters involved. In this analysis, it is necessary to define prior distributions for all parameters, such that they describe the prior knowledge the experimenter has about them. For the location and scale parameters, it is common to use weakly informative priors, for instance a $N (μ_{0}, τ^{- 2})$ for each location parameter $μ_{k}$, where $τ^{2}$ is a precision parameter that can be fixed with respect to the range of the observations; for the precision parameters $σ_{k}^{- 2}$, a gamma prior distribution $Γ (c, d)$, with shape parameter $c$ and rate parameter $d$, is often used; see Richardson and Green⁴⁰ for a full description of these prior distributions. A Dirichlet prior distribution for the mixture weights is often considered, $π \sim \mathrm{Dir} (δ, \dots, δ)$ for some choice of $δ$: Rousseau and Mengersen⁴¹ and Grazian and Robert⁴² suggested $δ < 1$ for finite mixture models with a known number of components, which is set to be large in order to have a posterior distribution concentrated on a lower number of meaningful components; unlike their approach, here we set $δ = 1$ and place a prior distribution on the number of components $K$, to better investigate the ability of such a prior distribution to encourage consistency of the posterior distribution towards the correct number of clusters.

The choice of prior distribution for the number of components is known to be delicate. Here, the default prior distribution proposed by Grazian et al.,⁴³ based on a loss-information definition, is used, since it has shown a good balance between conservativeness and accuracy: it is important that the number of components is well estimated and that, at the same time, lower values of $K$ are preferred to larger values, unless there is enough support for larger values. This assumption follows a parsimony principle which helps both the interpretation and the estimation procedure. This prior distribution is defined for $K \in \mathbb{N}$ and is obtained by considering a loss function $\mathrm{Loss}_{C} (K)$, representing a complexity loss which increases as the number of parameters increases, so that simpler models are preferred unless there is enough evidence to prefer more complex models. This loss function is linked to the prior distribution such that

$$p (K) \propto \exp \{\mathrm{Loss}_{C} (K)\}$$

From this definition, Grazian et al.⁴³ derived a beta-negative-binomial distribution as prior distribution $p (K)$, where the number of successes before stopping the experiment is equal to one and with shape parameters $α, β > 0$. The parameters $α$ and $β$ can be used to describe available prior information about the true number of components because

$$\begin{aligned} E (K) & = \frac{α + β - 1}{α - 1}, \quad \text{for } α > 1 \\ \mathrm{Var} (K) & = \frac{α β (α + β - 1)}{(α - 2) (α - 1)^{2}}, \quad \text{for } α > 2 \end{aligned}$$

In this work, $α$ and $β$ are taken to be both equal to one as a default choice in the presence of weak prior information. This choice implies that the probability of success in the latent representation of the beta-negative-binomial is given a uniform prior distribution,⁴³ resulting in the expression:

$$p (K) = \frac{1}{K (K + 1)}$$

This prior distribution assigns higher probability mass to small values of $K$, with the probability mass rapidly approaching zero as $K$ increases.
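The behaviour of this default prior can be checked numerically with exact fractions. The sketch below (function names are hypothetical) evaluates $p (K)$ and its cumulative sum; because $1 / (K (K + 1)) = 1 / K - 1 / (K + 1)$, the sum over $K = 1, \dots, n$ telescopes to $n / (n + 1)$, so the prior is proper. Note also that, with $α = 1$, the condition $α > 1$ required for the mean formula above is not met, and indeed $\sum_{K} K \, p (K) = \sum_{K} 1 / (K + 1)$ diverges.

```python
from fractions import Fraction


def prior_k(k):
    """Default prior mass on K components: 1 / (K (K + 1))."""
    return Fraction(1, k * (k + 1))


def cumulative_prior(k_max):
    """Prior probability that the number of components is at most k_max."""
    return sum(prior_k(k) for k in range(1, k_max + 1))


print([str(prior_k(k)) for k in range(1, 5)])  # ['1/2', '1/6', '1/12', '1/20']
print(cumulative_prior(4))                     # 4/5
```

Under this prior, half of the mass sits on a single component and four fifths on at most four components, which is the sense in which small values of $K$ are favoured a priori.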

It is worth recalling that a major issue when estimating the parameters of mixture models is the label-switching phenomenon, due to the invariance of the likelihood to permutations of the component labels. The method used to tackle this problem in this article is to post-process the output of the Bayesian algorithms to re-label the components and keep the labels consistent. In more detail, once the MCMC samples are obtained, each sample is permuted to impose an identifiability constraint such that $μ_{1} < μ_{2} < \dots < μ_{K}$. This can be shown to minimize the Kullback-Leibler divergence between the estimated matrix of classification probabilities and the corresponding true matrix.^44,45 Other methodologies can also be applied; see, for example, Celeux⁴⁶ and Sperrin et al.⁴⁷
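A minimal sketch of this post-processing step for a single MCMC draw is given below. The data layout (separate lists for weights, means, standard deviations, and allocations) and the function name are assumptions made for illustration; the point is only that every quantity indexed by component, including the allocations, must be permuted consistently.

```python
def relabel_draw(weights, means, sds, allocations):
    """Permute one MCMC draw so that the component means are increasing."""
    # order[new] = old index of the component that becomes component `new`.
    order = sorted(range(len(means)), key=lambda k: means[k])
    # rank[old] = new label of the component previously labelled `old`.
    rank = {old: new for new, old in enumerate(order)}
    return (
        [weights[k] for k in order],
        [means[k] for k in order],
        [sds[k] for k in order],
        [rank[z] for z in allocations],
    )


new_weights, new_means, new_sds, new_alloc = relabel_draw(
    weights=[0.2, 0.7, 0.1],
    means=[1.5, -2.0, 4.0],
    sds=[0.6, 0.8, 0.7],
    allocations=[0, 1, 1, 2, 0],
)
print(new_means)   # [-2.0, 1.5, 4.0]
print(new_alloc)   # [1, 0, 0, 2, 1]
```

After relabelling, the first component always has the smallest mean, which is what allows it to be interpreted as the most susceptible group across all retained draws.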

4. Results

The methodology described in Section 3 is now applied to the dataset presented in Section 2. The goal of the analysis is to characterize the clusters representing different levels of resistance. Antimicrobials are analysed independently here.

For the parameters of the mixture, the following prior distributions are used: for each $k = 1, \dots, K$, $μ_{k} \sim N (0, 100)$ and $1 / σ_{k} \sim Γ (1.5, 0.5)$, so that very concentrated components are considered unlikely a priori. Finally, $π$ is given a Dirichlet prior with all the parameters equal to $δ = 1$.

The censored Gaussian mixture model with a conservative prior on the number of components proposed in this work has been compared with three other classification methods:

ECOFFinder,¹⁹ as implemented in the R package antibioticR⁴⁸; three choices of the quantile of interest are selected and compared: $0.95$ , $0.99$ , and $0.999$ ;

a Gaussian mixture (GM) model, as given by Annis and Craig²⁵;

a Dirichlet process (DP) mixture for discrete observations.²⁹

Supplemental Appendix A provides information about the MCMC scheme that was implemented for the censored Gaussian mixture model. For all methods, the MCMC algorithm has been implemented with $10^{6}$ iterations, with a burn-in of $10^{5}$ iterations and a thinning factor of 10. A convergence study is available in Supplemental Appendix B.

Table 2 displays the estimated number of clusters obtained via the censored GM, along with the posterior means of the mixture weights associated with these clusters for each antimicrobial. Certain antimicrobials exhibit associations with two clusters; for instance, the majority of the new or repurposed drugs, such as BDQ, CFZ, and LZD. Conversely, DLM appears to possess a heavy tail, encompassing highly resistant strains as well as intermediate cases, as depicted in Figure 1. Interestingly, RIF, RFB, and INH each display three or four clusters, reinforcing the notion of multiple intermediate levels of resistance. In contrast, EMB presents a single cluster; however, it is worth noting that the range of MIC values might not have been precisely defined in the study.

Table 2.
Number of identified clusters, and posterior means of the relative mixture weights for the censored GM.

Compound	$K$ (MAP)	$π_{1}$	$π_{2}$	$π_{3}$	$π_{4}$
AMI	2	0.9728	0.0272
BDQ	2	0.9982	0.0018
CFZ	2	0.9228	0.0772
DLM	3	0.9184	0.0678	0.0137
EMB	1	1.0000
ETH	2	0.8850	0.1150
INH	4	0.4949	0.0445	0.0647	0.3959
KAN	3	0.8166	0.0709	0.1125
LEV	2	0.8638	0.1362
LZD	2	0.8557	0.1443
MXF	3	0.7909	0.0951	0.1141
PAS	3	0.8374	0.0388	0.1238
RFB	3	0.6276	0.0371	0.3354
RIF	4	0.5310	0.0397	0.0753	0.3541

Table 3.

Percentages of strains characterized by known resistance mutations for the first-line drugs and correctly classified as resistant.

Drug	ECOFFinder (0.95)	ECOFFinder (0.99)	ECOFFinder (0.999)	GM	DP	Censored GM
EMB	21.062	0.000	0.000	99.159	91.150	91.062
INH	94.131	92.054	92.054	97.813	97.813	92.054
RIF	91.904	91.904	91.904	97.885	97.885	93.508

INH: isoniazid; RIF: rifampicin; EMB: ethambutol; GM: Gaussian mixture; DP: Dirichlet process.

The ground truth for the dataset described in Section 2 is not known: we have no information about whether a strain belongs to the wild-type group or to one of the resistant groups. However, some strains can be predicted to be resistant with high confidence because they are characterized by genomic variants well known to be associated with resistance to specific antimicrobials. For example, Walker et al.⁴⁹ reported candidate genomic variants from the literature and classified them as not conferring resistance, resistance determinants, or uncharacterized.

To test the ability of the compared methods to identify resistant cases, we selected strains predicted as resistant to first-line drugs (EMB, INH, and RIF) with high predictive ability ( $> 95 %$ ) according to Walker et al.⁴⁹ Specifically, we used 14 variants in genes embA and embB for resistance to EMB, 42 variants in genes ahpC, fabG1, inhA, katG, and ndh for resistance to INH, and 30 variants in gene rpoB for resistance to RIF.

Table 3 shows the percentage of strains in the dataset correctly identified as resistant, meaning strains that were classified by the method as resistant and were characterized by one or more of the genomic variants selected from Walker et al.⁴⁹ as conferring resistance with high probability ( $> 95 %$ ).

ECOFFinder directly produces cutoffs that classify isolates into a susceptible and a resistant group. For the other methods, it is assumed that the first component represents the susceptible isolates, while the others represent some level of resistance. Once the classification is done, the strains are checked for the presence of the genomic variants that Walker et al.⁴⁹ identified as predictive of resistance. All the methods identify the resistant strains with an accuracy above 90%, except ECOFFinder for EMB. GM and DP show very high levels of accuracy in identifying true positives, particularly for INH and RIF. The censored GM has slightly lower accuracy, which nevertheless remains above 90% for all the first-line drugs.
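The computation behind a single cell of Table 3 can be sketched as follows. This is only an illustration under stated assumptions, not the code used in the study: the function name `percent_correctly_resistant`, the integer component labels (0 for the first, susceptible component), and the toy data are hypothetical, and the denominator is assumed to be the set of strains carrying at least one selected variant.

```python
def percent_correctly_resistant(component, has_known_variant):
    """Percentage of strains carrying a known resistance variant that are
    allocated to a non-susceptible mixture component.

    component: list of int, component label per strain (0 = susceptible;
        this labelling is an assumption made for the sketch).
    has_known_variant: list of bool, True if the strain carries at least
        one variant predicted to confer resistance.
    """
    # Keep only the strains that are predicted resistant from their genome.
    flagged = [c for c, v in zip(component, has_known_variant) if v]
    if not flagged:
        raise ValueError("no strain carries a known resistance variant")
    # A strain counts as correctly classified if it is not in component 0.
    resistant = sum(1 for c in flagged if c > 0)
    return 100.0 * resistant / len(flagged)


# Toy data (hypothetical): six strains, four of which carry a known variant.
component = [0, 1, 2, 0, 3, 1]
has_known_variant = [False, True, True, True, True, False]
rate = percent_correctly_resistant(component, has_known_variant)
```

For ECOFFinder the same function would apply after coding strains above the cutoff as 1 and the others as 0.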

The dataset described in Section 2 does not include information about susceptible strains. However, the experiment carried out by CRyPTIC was first validated by a pilot experiment, as described by Rancoita et al.³⁴ In this validation experiment, the fully susceptible strain H37Rv (reference strain) was subcultured and tested 10 times, and an additional 4 times as a blind strain in each of the laboratories participating in the experiment.

To study the percentage of strains correctly identified as susceptible (true negatives), the methods under comparison were then applied to the validation dataset, and the percentage of duplicates of strain H37Rv correctly identified as susceptible was recorded. For this analysis, we assumed the model with drug and strain intercepts (see Section 3). Percentages of true negative cases are shown in Table 4.

Table 4.

Percentages of strains H37Rv tested during the validation experiment and correctly classified as susceptible.

Drug	ECOFFinder (0.95)	ECOFFinder (0.99)	ECOFFinder (0.999)	GM	DP	Censored GM
AMI	91.500	91.500	91.500	35.000	35.000	95.333
BDQ	92.358	92.358	95.772	13.171	42.764	96.585
CFZ	95.772	95.772	95.772	56.944	56.944	96.528
DLM	94.316	96.448	97.869	98.579	77.798	94.316
EMB	98.152	100.000	100.000	0.185	63.586	98.152
ETH	97.976	97.976	97.976	23.609	82.799	99.325
INH	80.993	91.952	91.952	5.137	5.137	91.952
KAN	98.042	98.042	98.042	8.320	8.320	83.850
LEV	97.414	97.414	97.414	97.414	32.069	98.621
LZD	94.188	94.188	94.188	8.034	8.034	97.265
MXF	97.028	97.028	97.727	2.972	78.846	97.028
PAS	91.107	95.134	100.000	91.107	56.544	91.107
RFB	97.162	98.330	100.000	96.494	93.823	96.494
RIF	96.329	96.329	96.329	76.049	76.049	93.007

For the censored GM, duplicates of H37Rv are correctly identified as susceptible in more than 90% of cases for all drugs except KAN. On the other hand, the percentages of correct classification for GM and DP are low for many antimicrobials: the rate of correct classification falls below 90% for 10 of the 14 drugs with GM and for 13 with DP. ECOFFinder performs well; however, the choice of the reference quantile has a strong impact on the performance of the method.

Comparing Tables 3 and 4 shows that the censored GM performs well (correct classification >90%) in most cases. ECOFFinder classifies the susceptible cases well but can perform poorly in identifying the resistant cases. GM and DP show high levels of correct classification for the resistant cases but low levels for the susceptible ones, and are therefore not conservative enough.

In general, the censored GM reaches good levels of correct classification without additional information or experimental choices (such as the choice of the reference quantile) and can be seen as an automatic method for defining resistance levels. This is particularly important for the less investigated antimicrobials, but it can also highlight unknown mechanisms of resistance for first-line drugs.

5. Application to GWASs

GWASs are a class of methods that involve a model of association of a particular phenotype (e.g. resistance to a specific antimicrobial or the MIC value with respect to that antimicrobial) to a set of genomic variants. Once a genetic association is identified, researchers can further study the biological mechanisms and develop better strategies to detect or treat the disease.

The methods can be classified depending on the type of covariates (e.g. single nucleotide polymorphisms (SNPs) or substrings of some length of the genome, $k$ -mers) or the type of the response variable (e.g. a binary variable of classification for resistance, a continuous variable representing the MIC, or a discrete variable representing the level of resistance). See Marees et al.⁵⁰ and Uffelmann et al.⁵¹ for recent reviews.

GWASs have been run for each of the antimicrobials under study, including SNPs of the whole genome as predictors. The involved model is

\begin{aligned} c_{Y_{i}} & \sim M N (n_{i}, p_{i, 1}, \dots, p_{i, K}) \\ p_{i, k} & = \frac{\exp (η_{i k})}{\sum_{k = 1}^{K} \exp (η_{i k})} \\ η_{i k} & = x_{i}^{T} β_{k} + u \\ u & \sim N (0, σ_{u}^{2} Σ) \end{aligned}

where $c_{Y_{i}}$ is a multinomial random variable representing the dilution into which the phenotype of observation $Y_{i}$ is classified (here the $\log_{2}$(MIC) associated with each drug), $n_{i}$ is usually equal to one, $x_{i}$ is a $p \times 1$ vector of $p$ SNPs, $β$ is a $p \times 1$ vector of fixed effect size of genetic variants, which may or may not include an intercept (the SNP effect size), $u$ is a random effect that captures the polygenic effect of other SNPs, $σ_{u}^{2}$ measures the genetic variation of the phenotype, and $Σ$ is the genetic relationship matrix. A Bayesian categorical regression has been performed, by assuming inverse gamma prior distributions for $σ_{u}^{2}$, and spike-and-slab priors for $β$. It is assumed that if the posterior distribution of $β_{j}$ is concentrated around zero (spike) or the corresponding credible intervals include zero with high posterior probability (>95%), the coefficient is not significantly different from zero.
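The multinomial-logit link in the model above can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the implementation used in the paper: the function name `category_probabilities` and the toy coefficients are hypothetical, and the polygenic random effect $u$ is left out for brevity, so only the fixed-effect part $x_{i}^{T} β_{k}$ of the linear predictor is evaluated.

```python
import math


def category_probabilities(x, betas):
    """Probabilities p_{i,1}, ..., p_{i,K} for one isolate.

    x: list of p SNP indicators for the isolate.
    betas: list of K coefficient vectors, each of length p
        (hypothetical values; the random effect u is omitted here).
    """
    # Linear predictor eta_{ik} = x_i^T beta_k for each resistance level k.
    etas = [sum(xj * bj for xj, bj in zip(x, beta)) for beta in betas]
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    m = max(etas)
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


# Toy example (hypothetical): two SNPs, three resistance levels.
p = category_probabilities([1, 0], [[0.0, 0.0], [1.0, 0.5], [2.0, -1.0]])
```

In this toy case the first SNP pushes probability mass towards the higher resistance levels, because its coefficient grows with the level index.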

Table 5 includes all the variants that have been identified as positively associated with some level of resistance in the isolates, for each compound. For each compound, the proposed approach has been able to suggest variants that are not included in the recent WHO Catalogue,¹⁸ whose results are based on the same dataset analysed in our work. Some of these variants have already been proposed in the literature, but usually from experiments involving a small number of isolates or only a virulent version of H37Rv, or as more generically associated with resistance rather than with a specific compound.

Table 5.
Genomic variants which have been identified by the GWAS for each compound.

Drug	Catalogue	Not in the catalogue	Reference
AMI	rrs_G1484T	Rv3639c_A132E	Unreported
	rrs_C1402	Rv3897c_G74V	Unreported
	rrs_A1401G	Rv0823c_D156N	Unreported
		Rv2242_S43L	Unreported
		Rv2348c_I101M	Unreported
		mmpL10_K384T	Unreported
		lipL_S41G	Unreported
BDQ	Rv1979c_A-129G	Rv0678_CG286-287	Guo et al.⁵² (on H37Rv)
		Rv0678_T179C	Guo et al.⁵² (on H37Rv)
		Rv0678_G198*	Guo et al.⁵² (on H37Rv)
		atpE_G61A	Andres et al.⁵³ (124 patients, in vivo)
CFZ	Rv1979c_D286G	Rv1979c_T1052C	Zhang et al.⁶¹ (96 isolates)
		Rv0678_S68G	Zhang et al.⁶¹ (96 isolates)
		Rv0678_S53L	Zhang et al.⁶¹ (96 isolates)
		Rv0678_S2I	Xu et al.⁶⁰ (90 isolates)
		Rv0678_M146T	Xu et al.⁶⁰ (90 isolates)
		Rv0678_L117R	Xu et al.⁶⁰ (90 isolates)
		Rv1979c_V52G	Xu et al.⁶⁰ (90 isolates)
		Rv0678_V52G	Xu et al.⁶⁰ (90 isolates)
		pepQ_L44P	Almeida et al.⁵⁶ (on H37Rv)
DLM		ddn_W20*	Gómez-González et al.⁵⁴ (>33,000 isolates)
		ddn_A76E	Unreported
		ddn_Y89A	Unreported
		ddn_L37G	Unreported
		Rv1676_E34T	Unreported
EMB	embA_c-12t	embA_c-16g	Perdigão et al.⁸³ (17 isolates)
	embB_Q497R	embA_c-16t	Jouet et al.⁸⁴ (429 isolates)
	embB_Q497K	embA_c-11t	Phelan⁸⁵ (518 isolates)
	embB_G406A	embB_N1033K	Chen et al.⁶⁶ (110 isolates)
	embB_G406D	embB_Q1002R	Earle et al.⁷⁴ (3144 isolates)
	(embB_G406S)	embB_E405D	Napier et al.⁸⁶ (535 isolates)
	embB_D328Y	Rv1565_V48G	Unreported
	(embB_D354A)	pknJ_V447A	Unreported
	embB_M306I	Rv2000_Y305C	Unreported
	embB_M306V
	embB_Y319C
	embB_Y319S
ETH	fabG1_c-15t	ethA_T186K	DeBarber et al.⁸⁷ (11 isolates)
	inhA_S94A	ethA_Y84D	DeBarber et al.⁸⁷ (11 isolates)
	(ethA_M1R)	ethA_P51L	DeBarber et al.⁸⁷ (11 isolates)
		ethA_A381P	DeBarber et al.⁸⁷ (11 isolates)
		ethA_D55A	Morlock et al.⁸⁸ (41 isolates)
		ethA_G385D	Morlock et al.⁸⁸ (41 isolates)
		ethA_G413D	Morlock et al.⁸⁸ (41 isolates)
		ethA_G124D	Brossier et al.⁸⁹ (87 isolates)
		ethA_S266R	Brossier et al.⁸⁹ (87 isolates)
		ethA_I194T	Machado et al.⁹⁰ (17 isolates)
		ethA_T321P	Unreported
		ethA_Q246*	Unreported
INH	(ndh_g-70t)	katG_A109V	Cardoso et al.⁹¹ (97 isolates)
	katG_S315N	inhA_I21T	Hazbón et al.⁹² (1011 isolates)
	katG_S315T	inhA_I194T	Hazbón et al.⁹² (1011 isolates)
		katG_G125D	Chen et al.⁶⁶ (110 isolates)
		katG_S315I	Jeeves et al.⁹³ (on strain H37Rv)
		katG_S315R	Jeeves et al.⁹³ (on strain H37Rv)
		Rv3403c_S23R	Unreported
		Rv2896_S153A	Unreported
		Rv1922_D282Y	Unreported
		Rv0163_T45A	Unreported
LEV	gyrA_D94H	gyrA_S91P	Hameed et al.⁹⁴ (400 isolates)
	gyrA_A90V	ruvA_R39W	Unreported
	gyrA_D94G
	(gyrB_E501D)
	(gyrB_E501V)
	gyrA_D94A
	gyrA_D94N
	gyrA_D94Y
	gyrB_N499T
	gyrB_D461N
LZD	rplC_C154R	rrs_G2447T	Lee et al.⁵⁷ (41 isolates)
		rrl_G2061T	Hillemann et al.⁵⁸ (six isolates)
		rplC_H155D	Unreported
		rrs_V403I	Unreported
		pks4_E537*	Unreported
MXF	gyrA_D94H	secD_Y171D	Unreported
	gyrA_D94G	Rv2923c_A46V	Unreported
	gyrA_D94N	metS_A440V	Unreported
	gyrA_D94Y	desA3_T236P	Unreported
	(gyrB_N499D)	ruvA_R39W	Unreported
RFB		rpoB_H445D	Li et al.⁹⁵ (154 isolates)
		rpoB_H445Y	Farhat et al.⁹⁶ (1003 isolates)
		rpoB_S450L	Farhat et al.⁹⁶ (1003 isolates)
RIF	rpoB_D435F	Rv1565c_V48G	CRyPTIC Consortium⁷³
	(rpoB_L452P)	Rv2011c_D129	Cui et al.⁷² (on H37Rv)
	rpoB_S450Y	Rv2011c_R128	Cui et al.⁷² (on H37Rv)
	rpoB_S450W
	rpoB_S450Q
	rpoB_S450L
	rpoB_S431T
	rpoB_Q432P
	rpoB_I491F
	rpoB_H445Y
	rpoB_H445R
	(rpoB_H445G)
	rpoB_H445D
	(rpoB_H445C)
	rpoB_M434I
	rpoB_V170F

The variants are grouped into (a) those already present in the 2021 WHO Catalogue¹⁸ and (b) those not present in it; for a variant not included in the 2021 WHO Catalogue, the table indicates whether it has already been suggested as associated with resistance to the particular compound or is unreported. Among the variants already present in the catalogue, those identified here but with credible intervals including zero are shown in brackets and in italics.

With respect to the new drugs, it is interesting to note that the method has identified mutations in $Rv0678$ as involved in resistance mechanisms for BDQ, as suggested by Guo et al.⁵² on mutant H37Rv (a concentration of 0.5 mg/L was used for mutant selection). Unlike Guo et al.,⁵² the present study is able to suggest specific variants. Moreover, the method identifies one mutation in $atpE$ as associated with resistance; the gene was previously found by Andres et al.⁵³ to be mutated in seven out of 124 patients, within 9 months after the addition of BDQ and CFZ to the routine treatment. Our approach was also able to identify several mutations in $ddn$ as associated with resistance to DLM: of these, just one was previously identified in a large study (>33,000 isolates),⁵⁴ while four were previously unreported and two were found to be generically associated with resistance; in particular, Antonova et al.⁵⁵ found $Rv1676$ to be generically associated with resistance.

With regard to the repurposed drugs, in the 2021 WHO Catalogue no mutation meets the criteria for association with CFZ resistance. In this study, on the other hand, we were able to associate three mutations in $Rv1979c$, six mutations in $Rv0678$, and one mutation in $pepQ$. Among these, eight variants were already identified in smaller studies (96 isolates and 90 isolates, respectively), while the one in $pepQ$ was suggested on a mutant variant of H37Rv used in vitro and in mice.⁵⁶ With respect to LZD, the 2021 WHO Catalogue reports only one mutation in $rplC$ as associated with resistance, while mutations in $rrs$ and $rrl$ do not meet its criteria. Our approach, on the other hand, finds one additional mutation in $rplC$, which was previously unreported, two mutations in $rrs$, and one in $rrl$. In particular, $rrs$_G2447T was reported in only one patient,⁵⁷ and $rrl$_G2061T was associated with resistance to LZD in a study with only six isolates.⁵⁸ As for CFZ, $Rv1979c$ and $Rv0678$ are genes that have recently been suggested as possibly associated with resistance in cohort studies^59–61 or in vitro.⁶² As a note, Hartkoorn et al.⁶³ speculated that $Rv0678$ can represent a confounder when analysing resistance to BDQ and CFZ.

For these new or repurposed drugs, the proposed approach presents one strong advantage. From Figure 1, it is evident that the MIC distributions of some drugs (BDQ, CFZ, and DLM) present long tails, with a small number of cases at high MIC values: since these drugs have been introduced more recently for the treatment of tuberculosis, the bacteria have not yet developed widespread mechanisms of resistance. The ability to identify clusters that separate susceptible from resistant cases is important because, in a GWAS for such drugs, the signals coming from the susceptible cases are stronger than those coming from the resistant cases, owing to the disparity in the number of strains in each group. As an example, Figure 2 shows the Manhattan plots for CFZ when using the clusters identified by GM (with similar results for DP) and by the censored GM. A Manhattan plot is a scatter plot displaying, on a logarithmic scale, the p-value associated with each genomic variant, with variants ordered on the $x$-axis by their position on the genome. The red line in the figures represents the significance threshold, which is computed here through the Bonferroni correction. With GM, thousands of variants appear to be positively associated with the phenotype, while only four variants are significant when using the censored GM: GM (and DP) identifies a larger number of clusters, including more than one cluster for the susceptible isolates, so the GWAS tends to explain the heterogeneity of the susceptible group. When using the clusters identified by the censored GM, on the other hand, it is possible to select a few candidates that can be associated with resistance to CFZ.

Figure 2.

Manhattan plots resulting from a genome-wide regression with outcomes given by level of resistance identified by a Gaussian mixture model (a) and the censored Gaussian mixture model proposed in this work (b).
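The quantities displayed in a Manhattan plot with a Bonferroni threshold can be sketched in a few lines. This is an illustrative example, not the code behind Figure 2: the function name `manhattan_points`, the family-wise level `alpha = 0.05`, the use of $-\log_{10}$ as the logarithmic scale, and the toy p-values are all assumptions.

```python
import math


def manhattan_points(p_values, alpha=0.05):
    """Heights of the points in a Manhattan plot and the Bonferroni line.

    p_values: list of p-values, one per genomic variant, in genome order.
    alpha: family-wise significance level (0.05 is an assumed default).
    Returns (heights, threshold, significant) where heights are
    -log10(p), threshold is -log10(alpha / m) for m tested variants, and
    significant lists the indices of variants above the threshold.
    """
    m = len(p_values)
    # Bonferroni correction: each test is compared with alpha / m.
    threshold = -math.log10(alpha / m)
    heights = [-math.log10(p) for p in p_values]
    significant = [i for i, h in enumerate(heights) if h > threshold]
    return heights, threshold, significant


# Toy example (hypothetical p-values for four variants).
heights, threshold, significant = manhattan_points([0.2, 1e-6, 0.03, 0.004])
```

Because the threshold grows with the number of tested variants, a whole-genome analysis requires much smaller p-values than this four-variant toy case.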

The first-line drugs have been extensively studied in the literature. The proposed approach identifies most of the positive associations for resistance to INH in genes $katG$ (high levels of resistance) and $inhA$ (low levels of resistance). Several mutations have already been identified either in the 2021 WHO Catalogue or in other studies. As in the 2021 WHO Catalogue, several mutations already identified in the literature are found here to be not significantly different from zero: $katG$_W191G,⁶⁴ $katG$_L141F,⁶⁵ $katG$_L159P and $katG$_L704S,⁶⁶ and $katG$_A614E.⁶⁷ In addition to mutations in $katG$ and $inhA$, the proposed approach identifies a few other genes that may be of interest for further investigation: $Rv3403c$, which was previously associated with resistance to INH by Kruh et al.⁶⁸ through an experiment on a guinea pig; $Rv2896$, which was generically associated with resistance in two isolates treated with INH by Niemann et al.;⁶⁹ $Rv1922$, which was generically associated with resistance by Mortimer et al.;⁷⁰ and $Rv0163$, whose mutations were found by Payros et al.⁷¹ to be needed for $M.~tuberculosis$ to seed in the lungs of mice.

Most mutations present in the 2021 WHO Catalogue as associated with resistance to RIF have been identified in this study as well. However, it is interesting to note that our approach identifies two variants in gene $Rv2011c$ as having a significant impact on the evolution of low resistance levels; while this gene was previously proposed as possibly involved in mechanisms of resistance to RIF,⁷² there is not yet agreement on its role. The role of mutations in $rpoB$ is so strong that standard methods have difficulty identifying associations with intermediate levels of resistance; our approach, however, identifies three groups (susceptible, intermediate-resistance, and high-resistance isolates), which makes it easier to associate the second group with the important variants. Moreover, the CRyPTIC Consortium⁷³ also identified $Rv1565c$ as having a role in mechanisms of resistance to RIF through a standard GWAS, and here we are able to identify one specific variant.

Several variants already present in the 2021 WHO Catalogue as associated with resistance to EMB have been found with our approach (including a few variants that were not found to be significant). Six more variants in genes $embA$ and $embB$ were identified: five of them were already reported in smaller studies, while $embB$_Q1002K was already found by Earle et al.⁷⁴ in a large study. Three previously unreported mutations were also found to be significantly associated with low levels of resistance, in $Rv1565$, $pknJ$, and $Rv2000$. In particular, this last gene was already found to be generically associated with resistance in the Tugela Ferry isolate.⁷⁵

Among the other second-line drugs, the proposed approach identifies additional genes involved in the development of resistance to AMI, beyond $rrs$. In particular, Jain et al.⁷⁶ identified $Rv3639c$ as highly up-regulated during the early stages of invasion for bacteria treated with AMI, Li⁷⁷ observed $Rv3897c$ to be down-regulated in virulent H37Rv treated with AMI, Muzondiwa⁷⁸ identified $Rv0823c$_D156N as a compensatory mutation, Domenech et al.⁷⁹ suggested that $Rv2242$ might be a gene relevant to the host–pathogen dialogue, and Bhargavi et al.⁸⁰ identified $Rv2348c$ as involved in the interactome network, while $mmpL10$_K384T was found to be involved in resistance to KAN by the CRyPTIC Consortium.⁷³ KAN is no longer endorsed for TB treatment and does not appear in Table 5. Unlike the WHO Catalogue, our approach identifies not only mutations in $fabG1$ and $inhA$ as associated with resistance to ETH, but also several variants in gene $ethA$, some previously unreported and some already suggested by smaller studies (<100 isolates).

Most of the variants associated with resistance to LEV in the WHO Catalogue, in genes $gyrA$ and $gyrB$, are also significant in the current study; moreover, one variant previously reported in a smaller study was also identified, together with one variant in $ruvA$, which Klopper et al.⁸¹ reported as generically associated with resistance in a study with 211 isolates. Similarly, most of the variants associated with resistance to MXF are also found here, together with several previously unreported variants; in particular, Sharma et al.⁸² reported $secD$ as generically associated with resistance in a mutated strain of H37Rv through a proteomic approach, while Klopper et al.⁸¹ reported $Rv2923c$ as associated with resistance, even if its function was unclear.

Finally, three variants in $rpoB$ were identified as associated with resistance to RFB; all of them were already reported in previous smaller studies.

6. Conclusion

This work has proposed a method to analyse distributions of MICs through mixture models and allocate strains to groups representing different levels of resistance to the antimicrobials, instead of using a binary classification defined via critical concentrations. The method presents several advantages.

First, the use of mixture models makes it possible to identify several levels of resistance and possibly associate each of them with different genomic variants: some can be associated with high levels of resistance, others with intermediate levels. The ability to separate levels of resistance is important for identifying rare genomic variants.

Second, the method is defined in a Bayesian framework, which makes it possible to introduce assumptions about the phenomenon of resistance. In particular, Section 4 shows that an assumption of conservativeness in the number of groups in the mixture model increases the accuracy of the classification of susceptible and resistant strains. By contrast, using a uniform prior distribution on the number of components or a nonparametric approach based on Dirichlet process priors leads to a large number of components and is more likely to split the susceptible group into subgroups, which may hide resistance mechanisms, in particular those associated with rare variants.

Finally, the method deals with the discrete nature of the recorded data, which are characterized by double censoring: both interval censoring (data are recorded at fixed levels of concentration) and boundary censoring (there are a maximum and a minimum concentration tested in the plate). Handling this double censoring reduces the bias in the estimation process noted by Annis and Craig.²⁵
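The effect of double censoring on the likelihood can be illustrated with a small sketch. It assumes a Gaussian mixture on the $\log_{2}$(MIC) scale and is not necessarily the exact likelihood used in Section 3: the function names `normal_cdf` and `censored_mixture_likelihood`, the half-open interval convention, and the toy parameters are hypothetical.

```python
import math


def normal_cdf(x, mu, sigma):
    """Cumulative distribution function of a N(mu, sigma^2) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def censored_mixture_likelihood(lower, upper, weights, means, sds):
    """Likelihood of one reading known only to lie in (lower, upper].

    Interval censoring: lower and upper are two adjacent tested dilutions.
    Boundary censoring: use lower = -inf when growth is inhibited at the
    lowest tested concentration and upper = inf when it is not inhibited
    at the highest one.
    """
    total = 0.0
    for w, mu, sd in zip(weights, means, sds):
        hi = 1.0 if upper == math.inf else normal_cdf(upper, mu, sd)
        lo = 0.0 if lower == -math.inf else normal_cdf(lower, mu, sd)
        # Probability mass that this component assigns to the interval.
        total += w * (hi - lo)
    return total


# Toy example (hypothetical parameters): two components on the log2 scale
# and a plate whose tested dilutions are -1, 0 and 1.
edges = [-math.inf, -1.0, 0.0, 1.0, math.inf]
weights, means, sds = [0.8, 0.2], [-1.0, 3.0], [1.0, 1.5]
cell_probs = [
    censored_mixture_likelihood(a, b, weights, means, sds)
    for a, b in zip(edges[:-1], edges[1:])
]
```

The cell probabilities sum to one, so every possible plate reading, including the two open-ended ones, receives a proper probability instead of being treated as an exact value.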

The proposed approach is flexible and general, and can be applied automatically, with a reduced need for experimental inputs. At this stage, antimicrobials are treated independently. However, since treatments for tuberculosis are usually defined as combinations of drugs given to the patient at the same time, tuberculosis is known to have developed high levels of multi-drug resistance. Generalizations to a multivariate version of the approach are the subject of current research; such a modification needs to take into account the complex structure of dependence among drugs: while some drugs are dependent because they have similar chemical structures, other groups of drugs are dependent because they are often prescribed together and strains develop associated mechanisms of resistance. Therefore, it is reasonable to expect a non-linear structure of dependence.

Supplemental Material

sj-pdf-1-smm-10.1177_09622802231211010 - Supplemental material for Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations

Supplemental material, sj-pdf-1-smm-10.1177_09622802231211010 for Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations by Clara Grazian in Statistical Methods in Medical Research

Footnotes

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship and/or publication of this article.

ORCID iD

Clara Grazian

Supplemental material

Supplemental material for this article is available in another file.

References

European Commission. A European one health action plan against antimicrobial resistance (AMR) . Brussels, Belgium: European Commission, 2017.

Gelband

Molly Miller

Pant

, et al. The state of the world’s antibiotics 2015. Wound Healing Southern Africa 2015; 8: 30–34.

World Health Organization. Global framework for Development and stewardship to combat antimicrobial resistance? World Health Organization, Geneva, Switzerland, 2017.

Kohanski

DePristo

Collins

. Sublethal antibiotic treatment leads to multidrug resistance via radical-induced mutagenesis. Mol Cell 2010; 37: 311–320.

Tenover

. Mechanisms of antimicrobial resistance in bacteria. Am J Infect Control 2006; 34: S3–S10.

Zignol

Hosseini

Wright

, et al. Global incidence of multidrug-resistant tuberculosis. J Infect Dis 2006; 194: 479–485.

Gulshan

Moye-Rowley

. Multidrug resistance in fungi. Eukaryotic Cell 2007; 6: 1933–1942.

Vandeputte

Ferrari

Coste

. Antifungal resistance and new strategies to control fungal infections. Int J Microbiol 2012; 2012: 713687.

Unemo

Nicholas

. Emergence of multidrug-resistant, extensively drug-resistant and untreatable gonorrhea. Future Microbiol 2012; 7: 1401–1422.

10.

Yim

Hussain

Liu

, et al. Evolution of multi-drug resistant hepatitis B virus during sequential therapy. Hepatology 2006; 44: 703–712.

11.

Wiegand

Hilpert

Hancock

. Agar and broth dilution methods to determine the minimal inhibitory concentration (MIC) of antimicrobial substances. Nat Protoc 2008; 3: 163.

12.

Turnidge

Paterson

. Setting and revising antibacterial susceptibility breakpoints. Clin Microbiol Rev 2007; 20: 391–408.

13.

Hirschhorn

Daly

. Genome-wide association studies for common diseases and complex traits. Nat Rev Genet 2005; 6: 95.

14.

Dheda

Gumbo

Maartens

, et al. The epidemiology, pathogenesis, transmission, diagnosis, and management of multidrug-resistant, extensively drug-resistant, and incurable tuberculosis. Lancet Resp Med 2017; 5: 291–360.

15.

Falzon

Mirzayev

Wares

, et al. Multidrug-resistant tuberculosis around the world: what progress has been made? Eur Respir J 2015; 45: 150–160.

16.

World Health Organization. Global tuberculosis report 2015. Geneva, Switzerland: World Health Organization, 2015.

17.

World Health Organization. Technical report on critical concentrations for drug susceptibility testing of medicines used in the treatment of drug-resistant tuberculosis. Technical report, World Health Organization, 2018.

18.

World Health Organization. Catalogue of mutations in Mycobacterium tuberculosis complex and their association with drug resistance. Technical report, World Health Organization, 2021.

19.

Turnidge

Kahlmeter

Kronvall

. Statistical characterisation of bacterial wild-type MIC value distributions and the determination of epidemiological cut-off values. Clin Microbiol Infect 2006; 12: 418–425.

20.

Jaspers

Aerts

Verbeke

, et al. Estimation of the wild-type minimum inhibitory concentration value distribution. Stat Med 2014; 33: 289–303.

21.

Jaspers

Aerts

Verbeke

, et al. A new semi-parametric mixture model for interval censored data, with applications in the field of antimicrobial resistance. Comput Stat Data Anal 2014; 71: 30–42.

22.

Jaspers

Verbeke

Böhning

, et al. Application of the vertex exchange method to estimate a semi-parametric mixture model for the MIC density of Escherichia coli isolates tested for susceptibility against ampicillin. Biostatistics 2015; 17: 94–107.

23.

Jaspers

Lambert

Aerts

. A Bayesian approach to the semiparametric estimation of a minimum inhibitory concentration distribution. Ann Appl Stat 2016; 10: 906–924.

24.

Craig

. Modeling approach to diameter breakpoint determination. Diagn Microbiol Infect Dis 2000; 36: 193–202.

25.

Annis

Craig

. Statistical properties and inference of the antimicrobial MIC test. Stat Med 2005; 24: 3631–3644.

26.

McLachlan

Jones

. Fitting mixture models to grouped and truncated data via the EM algorithm. Biometrics 1988; 44: 571–578.

27.

Cadez

Smyth

McLachlan

, et al. Maximum likelihood estimation of mixture densities for binned and truncated multivariate data. Mach Learn 2002; 47: 7–34.

28.

Hamdan

. EM algorithm of spherical models for binned data. In: 2011 IEEE international symposium on signal processing and information technology (ISSPIT), Bilbao, Spain, 2011, pp. 99–105.

29.

Kottas

Müller

Quintana

. Nonparametric Bayesian modeling for multivariate ordinal data. J Comput Graph Stat 2005; 14: 610–625.

30.

DeYoreo

Kottas

. Bayesian nonparametric modeling for multivariate ordinal regression. J Comput Graph Stat 2018; 27: 71–84.

31.

Miller

Harrison

. Inconsistency of Pitman-Yor process mixtures for the number of components. J Mach Learn Res 2014; 15: 3333–3370.

32.

Miller

Harrison

. Mixture models with a prior on the number of components. J Am Stat Assoc 2018; 113: 340–356.

33.

Frühwirth-Schnatter

Malsiner-Walli

Grün

. Generalized mixtures of finite mixtures and telescoping sampling. arXiv preprint arXiv:2005.09918, 2021.

34.

Rancoita

PMV

Cugnata

Gibertoni Cruz

, et al. Validating a 14-drug microtitre plate containing bedaquiline and delamanid for large-scale research susceptibility testing of Mycobacterium tuberculosis. Antimicrob Agents Chemother 2018; 62: e00344-18.

35.

Frühwirth-Schnatter

. Finite mixture and Markov switching models. Berlin, Germany: Springer Science & Business Media, 2006.

36.

Hjort

Holmes

Müller

, et al. Bayesian nonparametrics. Vol. 28. Cambridge, UK: Cambridge University Press, 2010.

37.

Diebolt

Robert

. Estimation of finite mixture distributions through Bayesian sampling. J R Stat Soc Ser B (Methodological) 1994; 56: 363–375.

38.

Albert

Chib

. Bayesian analysis of binary and polychotomous response data. J Am Stat Assoc 1993; 88: 669–679.

39.

Robert

. The Bayesian choice: from decision-theoretic foundations to computational implementation. Berlin, Germany: Springer Science & Business Media, 2007.

40.

Richardson

Green

. On Bayesian analysis of mixtures with an unknown number of components (with discussion). J R Stat Soc: Ser B (Statistical Methodology) 1997; 59: 731–792.

41.

Rousseau

Mengersen

. Asymptotic behaviour of the posterior distribution in overfitted mixture models. J R Stat Soc: Ser B (Statistical Methodology) 2011; 73: 689–710.

42.

Grazian

Robert

. Jeffreys priors for mixture estimation: properties and alternatives. Comput Stat Data Anal 2018; 121: 149–163.

43.

Grazian

Villa

Liseo

. On a loss-based prior for the number of components in mixture models. Stat Probab Lett 2020; 158: 108656.

44.

Jasra

Holmes

Stephens

. Markov chain Monte Carlo methods and the label switching problem in Bayesian mixture modeling. Stat Sci 2005; 20: 50–67.

45.

Stephens

. Dealing with label switching in mixture models. J R Stat Soc: Ser B (Statistical Methodology) 2000; 62: 795–809.

46.

Celeux

. Bayesian inference for mixture: the label switching problem. In: Payne R and Green P (eds) Compstat. Heidelberg: Physica, 1998; pp. 227–232.

47.

Sperrin

Jaki

Wit

. Probabilistic relabelling strategies for the label switching problem in Bayesian mixture models. Stat Comput 2010; 20: 357–366.

48.

Petzoldt

. antibioticR-package: Analysis of Antibiotic Resistance Data. https://rdrr.io/github/tpetzoldt/antibioticR/man/antibioticR-package.html, 2019.

49.

Walker

Kohl

Omar

, et al. Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect Dis 2015; 15: 1193–1202.

50.

Marees

de Kluiver

Stringer

, et al. A tutorial on conducting genome-wide association studies: quality control and statistical analysis. Int J Methods Psychiatr Res 2018; 27: e1608.

51.

Uffelmann

Huang

Munung

, et al. Genome-wide association studies. Nat Rev Methods Primers 2021; 1: 59.

52.

Guo

Lin

, et al. Whole genome sequencing identifies novel mutations associated with bedaquiline resistance in Mycobacterium tuberculosis. Front Cell Infect Microbiol 2022; 12: 807095.

53.

Andres

Merker

Heyckendorf

, et al. Bedaquiline-resistant tuberculosis: dark clouds on the horizon. Am J Respir Crit Care Med 2020; 201: 1564–1568.

54.

Gómez-González

Perdigao

Gomes

, et al. Genetic diversity of candidate loci linked to Mycobacterium tuberculosis resistance to bedaquiline, delamanid and pretomanid. Nat Sci Rep 2021; 11: 19431.

55.

Antonova

Gryadunov

Zimenkov

. Molecular mechanisms of drug tolerance in Mycobacterium tuberculosis. Mol Biol (N.Y.) 2018; 52: 372–384.

56.

Almeida

Ioerger

Tyagi

, et al. Mutations in pepQ confer low-level resistance to bedaquiline and clofazimine in Mycobacterium tuberculosis. Antimicrob Agents Chemother 2016; 60: 4590–4599.

57.

Lee

Carroll

, et al. Linezolid for treatment of chronic extensively drug-resistant tuberculosis. N Engl J Med 2012; 367: 1508–1518.

58.

Hillemann

Rüsch-Gerdes

Richter

. In vitro-selected linezolid-resistant Mycobacterium tuberculosis mutants. Antimicrob Agents Chemother 2008; 52: 800.

59.

Ismail

Omar

Joseph

, et al. Defining bedaquiline susceptibility, resistance, cross-resistance and associated genetic determinants: a retrospective cohort study. EBioMedicine 2018; 28: 136–142.

60.

Wang

, et al. Primary clofazimine and bedaquiline resistance among isolates from patients with multidrug-resistant tuberculosis. Antimicrob Agents Chemother 2017; 61: e00239.

61.

Zhang

Chen

Cui

, et al. Identification of novel mutations associated with clofazimine resistance in Mycobacterium tuberculosis. J Antimicrob Chemother 2015; 70: 2507–2510.

62.

Ismail

Peters

Ismail

, et al. Clofazimine exposure in vitro selects efflux pump mutants and bedaquiline resistance. Antimicrob Agents Chemother 2019; 63: e02141.

63.

Hartkoorn

Uplekar

Cole

. Cross-resistance between clofazimine and bedaquiline through upregulation of MmpL5 in Mycobacterium tuberculosis. Antimicrob Agents Chemother 2014; 58: 2979–2981.

64.

Mitarai

Kato

Ogata

, et al. Comprehensive multicenter evaluation of a new line probe assay kit for identification of Mycobacterium species and detection of drug-resistant Mycobacterium tuberculosis. J Clin Microbiol 2012; 50: 884–890.

65.

Brossier

Veziris

Truffot-Pernot

, et al. Performance of the genotype MTBDR line probe assay for detection of resistance to rifampin and isoniazid in strains of Mycobacterium tuberculosis with low-and high-level resistance. J Clin Microbiol 2006; 44: 3659–3664.

66.

Chen

Wang

, et al. Evaluation of whole-genome sequence method to diagnose resistance of 13 anti-tuberculosis drugs and characterize resistance genes in clinical multi-drug resistance Mycobacterium tuberculosis isolates from China. Front Microbiol 2019; 10: 1741.

67.

Singh

Jamal

Ahmed

, et al. Computational modeling and bioinformatic analyses of functional mutations in drug target genes in Mycobacterium tuberculosis. Comput Struct Biotechnol J 2021; 19: 2423–2446.

68.

Kruh

Troudt

Izzo

, et al. Portrait of a pathogen: the Mycobacterium tuberculosis proteome in vivo. PLoS ONE 2010; 5: e13938.

69.

Niemann

Köser

Gagneux

, et al. Genomic diversity among drug sensitive and multidrug resistant isolates of Mycobacterium tuberculosis with identical DNA fingerprints. PLoS ONE 2009; 4: e7407.

70.

Mortimer

Weber

Pepperell

. Signatures of selection at drug resistance loci in Mycobacterium tuberculosis. mSystems 2018; 3: e00108.

71.

Payros

Alonso

Malaga

, et al. Rv0180c contributes to Mycobacterium tuberculosis cell shape and to infectivity in mice and macrophages. PLoS Pathog 2021; 17: e1010020.

72.

Cui

Zeng

. Anti-tuberculosis drug target discovery by targeting the higher in-degree proteins (HidPs) of the pathogen’s transcriptional network. J Tuberc 2018; 1: 1001.

73.

CRyPTIC Consortium

. Genome-wide association studies of global Mycobacterium tuberculosis resistance to 13 antimicrobials in 10,228 genomes identify new resistance mechanisms. PLoS Biol 2022; 20: e3001755.

74.

Earle

Charlesworth

, et al. Identifying lineage effects when controlling for population structure improves power in bacterial association studies. Nature Microbiology 2016; 1: 1–8.

75.

Motiwala

Dai

Jones-López

, et al. Mutations in extensively drug-resistant Mycobacterium tuberculosis that do not code for known drug-resistance mechanisms. J Infect Dis 2010; 201: 881–888.

76.

Jain

Paul-Satyaseela

Lamichhane

, et al. Mycobacterium tuberculosis invasion and traversal across an in vitro human blood–brain barrier as a pathogenic mechanism for central nervous system tuberculosis. J Infect Dis 2006; 193: 1287–1295.

77.

AHL

. Identification of virulence determinants of Mycobacterium tuberculosis via genetic comparisons of a virulent and an attenuated strain of Mycobacterium tuberculosis. Doctoral dissertation, University of British Columbia, 2008.

78.

Muzondiwa

. Exploring the evolution of drug resistance in Mycobacterium using whole genome sequencing data. Doctoral dissertation, University of Pretoria, 2019.

79.

Domenech

Reed

Barry III

. Contribution of the Mycobacterium tuberculosis MmpL protein family to virulence and drug resistance. Infect Immun 2005; 73: 3492–3501.

80.

Bhargavi

Hassan

Balaji

, et al. Protein–protein interaction of Rv0148 with Htdy and its predicted role towards drug resistance in Mycobacterium tuberculosis. BMC Microbiol 2020; 20: 1–15.

81.

Klopper

Heupink

Hill-Cawthorne

, et al. A landscape of genomic alterations at the root of a near-untreatable tuberculosis epidemic. BMC Med 2020; 18: 1–14.

82.

Sharma

Bisht

Khan

. Potential alternative strategy against drug resistant tuberculosis: a proteomics prospect. Proteomes 2018; 6: 26.

83.

Perdigão

Silva

Maltez

, et al. Emergence of multidrug-resistant Mycobacterium tuberculosis of the Beijing lineage in Portugal and Guinea-Bissau: a snapshot of moving clones by whole-genome sequencing. Emerg Microbes Infect 2020; 9: 1342–1353.

84.

Jouet

Gaudin

Badalato

, et al. Deep amplicon sequencing for culture-free prediction of susceptibility or resistance to 13 anti-tuberculous drugs. Eur Respir J 2021; 57: 2002338.

85.

Phelan

. A bioinformatic analysis of Mycobacterium tuberculosis and host genomic data. Doctoral dissertation, London School of Hygiene & Tropical Medicine, 2018.

86.

Napier

Khan

Jabbar

, et al. Characterisation of drug-resistant Mycobacterium tuberculosis mutations and transmission in Pakistan. Nature Scientific Reports 2022; 12: 7703.

87.

DeBarber

Mdluli

Bosman

, et al. Ethionamide activation and sensitivity in multidrug-resistant Mycobacterium tuberculosis. Proc Natl Acad Sci USA 2000; 97: 9677–9682.

88.

Morlock

Metchock

Sikes

, et al. ethA, inhA, and katG loci of ethionamide-resistant clinical Mycobacterium tuberculosis isolates. Antimicrob Agents Chemother 2003; 47: 3799–3805.

89.

Brossier

Veziris

Truffot-Pernot

, et al. Molecular investigation of resistance to the antituberculous drug ethionamide in multidrug-resistant clinical isolates of Mycobacterium tuberculosis. Antimicrob Agents Chemother 2011; 55: 355–360.

90.

Machado

Perdigão

Ramos

, et al. High-level resistance to isoniazid and ethionamide in multidrug-resistant Mycobacterium tuberculosis of the Lisboa family is associated with inhA double mutations. J Antimicrob Chemother 2013; 68: 1728–1732.

91.

Cardoso

Cooksey

Morlock

, et al. Screening and characterization of mutations in isoniazid-resistant Mycobacterium tuberculosis isolates obtained in Brazil. Antimicrob Agents Chemother 2004; 48: 3373–3381.

92.

Hazbón

Brimacombe

Bobadilla del Valle

, et al. Population genetics study of isoniazid resistance mutations and evolution of multidrug-resistant Mycobacterium tuberculosis. Antimicrob Agents Chemother 2006; 50: 2640–2649.

93.

Jeeves

Marriott

Pullan

, et al. Mycobacterium tuberculosis is resistant to isoniazid at a slow growth rate by single nucleotide polymorphisms in katG codon Ser315. PLoS ONE 2015; 10: e0138253.

94.

Hameed

Tan

Islam

, et al. Phenotypic and genotypic characterization of levofloxacin-and moxifloxacin-resistant Mycobacterium tuberculosis clinical isolates in southern China. J Thorac Dis 2019; 11: 4613.

95.

Yang

Hong

, et al. Whole-genome sequencing for resistance level prediction in multidrug-resistant tuberculosis. Microbiol Spectr 2022; 10: e02714.

96.

Farhat

Sixsmith

Calderon

, et al. Rifampicin and rifabutin resistance in 1003 Mycobacterium tuberculosis clinical isolates. J Antimicrob Chemother 2019; 74: 1477–1483.

Supplementary Material

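Table 1. Number of isolates analysed for each compound.

| Compound | $n$ | Compound | $n$ | Compound | $n$ |
|---|---|---|---|---|---|
| AMI | 7312 | ETH | 7310 | MXF | 6385 |
| BDQ | 7054 | INH | 7097 | PAS | 6319 |
| CFZ | 6793 | KAN | 7207 | RFB | 7331 |
| DLM | 7016 | LEV | 6607 | RIF | 7145 |
| EMB | 6584 | LZD | 6420 | | |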

sj-pdf-1-smm-10.1177_09622802231211010 - Supplemental material for Clustering minimal inhibitory concentration data through Bayesian mixture models: An application to detect Mycobacterium tuberculosis resistance mutations