Abstract
We assessed the ability of a commercial DNA microarray to characterize bovine Shiga toxin–producing Escherichia coli (STEC) isolates and evaluated the results using in silico hybridization of the microarray probes within whole genome sequencing scaffolds. From a total of 69,954 reactions (393 probes with 178 isolates), 68,706 (98.2%) gave identical results by DNA microarray and in silico probe hybridization. Results were more congruent when detecting the genoserotype (209 differing results from 19,758 in total; 1.1%) or antimicrobial resistance genes (AMRGs; 141 of 26,878; 0.5%) than when detecting virulence-associated genes (VAGs; 876 of 22,072; 4.0%). Owing to the limited coverage of O-antigens by the microarray, only 37.2% of the isolates could be genoserotyped. However, the microarray proved suitable for rapidly screening bovine STEC strains for large numbers of VAGs and AMRGs and can be integrated into molecular surveillance workflows.
Isolates of Shiga toxin–producing Escherichia coli (STEC) can cause illnesses in humans ranging from mild diarrhea to life-threatening hemolytic uremic syndrome or hemorrhagic colitis. 15 The main reservoir for STEC associated with human disease is ruminants, in particular cattle. 15 Accurate and rapid identification of STEC by bacteriologic workflows is important in human and veterinary medicine. Molecular tools detecting virulence-associated genes (VAGs) facilitate the identification of pathotypes within the species E. coli. However, conventional molecular tools such as the polymerase chain reaction (PCR) or in situ hybridization limit the number of genes per assay; at the other extreme, whole genome sequencing (WGS) allows the investigation of the entire gene set in a single analysis.16,19
DNA microarrays can allow several hundred genes to be economically tested within one reaction and possess some advantages over WGS.1,9 First, the time from isolation of a putative pathogen to the test result can be as short as 1 d (DNA preparation, PCR-based DNA biotinylation, DNA microarray test, and automated interpretation), whereas WGS can take several days (day 1: DNA preparation, library creation; days 2–3: cluster amplification and DNA sequencing reaction; days 4–7: data processing, analysis, interpretation). Second, DNA microarray results are given as a positive or negative value, with parallel proofing of sample purity and/or RNA contamination, whereas raw WGS sequence reads require quality checks 12 (e.g., phred, nucleotide distribution, GC content), as do the results of the alignment and contig building steps (e.g., coverage, number of contigs). Third, the output data of DNA microarray tests is small (<100 kb), whereas raw data from sequencing a single isolate comprises 5–10 mb. Fourth, the interpretation and plausibility check of DNA microarray results is more user-friendly and requires less bioinformatics knowledge than that of WGS results. Finally, the costs for equipment and test buffers are lower for DNA microarray tests than for WGS. However, DNA microarrays require prior knowledge concerning the number and nature of the genes to be included. Isolates of E. coli exhibit high genome flexibility through extensive horizontal gene transfer and, in particular, VAGs are mostly encoded on mobile genetic elements such as bacteriophages, plasmids, or pathogenicity islands. 15 Such flexibility is even further increased by individual subtypes or variants described for several virulence factors. Prime examples are the Shiga toxin types 1 and 2 with their subtypes and variants, 18 as well as the variants of the adhesion factor intimin. 6 It is mandatory, therefore, to evaluate whether the DNA microarray to be used is appropriate for the respective E. coli population to be tested. We assessed whether a commercial E. coli DNA microarray (E. coli PanType AS-2, ArrayStrip format; Alere Technologies, Jena, Germany) is suitable to characterize bovine STEC isolates.
All 178 isolates used in our study originated from fecal samples of individual cattle in German herds and were described previously.8,11 The isolates were selected on the basis of phenotypic and genetic characterization to represent the STEC population from the previous studies. 3 Genomic E. coli DNA was prepared (ZR fungal/bacterial DNA kit, Zymo Research Europe, Freiburg, Germany) from overnight cultures in lysogeny broth (Lennox) and analyzed by miniaturized E. coli oligonucleotide arrays (Alere Technologies) containing targets for the identification of VAGs (124 probes), antimicrobial resistance genes (AMRGs; 151 probes), as well as for DNA-based serotyping (111 probes).1,2,4 For further analysis, either the normalized signal intensities (minus background) or a positive (signal intensity ≥ 0.3) or negative (<0.3) result for each probe were used (IconoClust v. 4.3r0, Alere Technologies, Jena, Germany). Additionally, the whole genome of each isolate was obtained by paired-end sequencing (MiSeq, Illumina, San Diego, CA) as described previously. 3 The DNA sequences of the microarray probes were used as references and mapped to the WGS scaffolds (hereafter, in silico hybridization) (Geneious v. 8.1.3, Biomatters, Auckland, New Zealand). For a positive result, the probe sequence had to match with a maximum of 3 mismatches. If more mismatches or no matching occurred, a negative result was recorded. The agreement between the DNA microarray results and the respective in silico hybridization within the WGS scaffolds was determined by a Cohen kappa measurement (IBM SPSS Statistics v. 19, IBM, New York, NY).
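The in silico hybridization rule described above (a probe counts as positive when it matches a scaffold with at most 3 mismatches) can be illustrated with a minimal sketch. This is not the mapping algorithm of the software used in the study; it is a naive, ungapped sliding-window comparison on both strands, and the function names and the assumption of plain A/C/G/T sequences are hypothetical choices made for illustration only.

```python
from typing import Iterable, Optional

# Assumption: sequences contain only A, C, G, T (no ambiguity codes).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def min_mismatches(probe: str, scaffold: str) -> Optional[int]:
    """Smallest number of mismatches of the probe against any ungapped
    window of the scaffold, or None if the scaffold is shorter than the probe."""
    probe, scaffold = probe.upper(), scaffold.upper()
    n = len(probe)
    if n == 0 or len(scaffold) < n:
        return None
    best = n
    for start in range(len(scaffold) - n + 1):
        window = scaffold[start:start + n]
        mismatches = sum(1 for a, b in zip(probe, window) if a != b)
        best = min(best, mismatches)
        if best == 0:
            break  # an exact match cannot be improved
    return best


def probe_hits(probe: str, scaffolds: Iterable[str], max_mismatches: int = 3) -> bool:
    """True if the probe matches any scaffold (either strand) with at most
    max_mismatches mismatches; otherwise the probe is scored negative."""
    for scaffold in scaffolds:
        for strand in (scaffold, reverse_complement(scaffold)):
            best = min_mismatches(probe, strand)
            if best is not None and best <= max_mismatches:
                return True
    return False
```

Because the sketch ignores insertions and deletions, a dedicated read mapper may score individual probes differently; it is meant only to make the mismatch threshold concrete.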
Seventy-one of 124 gene probes specific for VAGs or VAG variants (57.3%) hybridized with the DNA of at least 1 isolate (Supplemental Dataset 1). The VAGs and their variants detected most often were hemL (glutamate-1-semialdehyde aminotransferase, n = 170), eae (4 intimin variants, n = 135–146), nleB (non–locus of enterocyte effacement [LEE]-encoded effector protein B, n = 140), lpfA (major subunit of long polar fimbriae, n = 138), nleA (non–LEE-encoded effector protein A, 4 variants, n = 125–137), astA (heat stable enterotoxin EAST1, n = 120), hlyA (EHEC hemolysin, n = 111), nleC (non–LEE-encoded effector protein C, n = 108), and tccP (Tir-cytoskeleton coupling protein, n = 93–103). Several VAGs covered by the DNA microarray that were more specific for enteropathogenic (e.g., bfpA, perA), enterotoxigenic (e.g., faeG, fanA, fasA, f17A), extraintestinal pathogenic (e.g., ireA, iroN), enteroinvasive (e.g., ipa), or avian pathogenic E. coli (e.g., hlyE), rather than for STEC,7,15 were not detected in our isolates. Of 111 probes for genoserotyping, 61 (55.0%) reacted with at least 1 isolate. Complete genoserotypes (O- and H-antigen) were determined for 67 isolates (37.2%), with O26:H11 (n = 29), O157:H7 (n = 18), and O6:H49 (n = 10) found most frequently. The microarray probes cover 47 of 53 known H-antigens but only 24 of 188 known O-antigens,2,13 which explains why many isolates were not completely genoserotyped with this approach. Overall, 32 of 151 (22.5%) probes for AMRGs hybridized with at least 1 isolate. The most frequently detected AMRGs encoded resistance to streptomycin (strA, n = 31; strB, n = 34), β-lactam antibiotics (blaTEM-1, n = 21), and sulfonamides (sul2, n = 26).
Applying the in silico hybridization, 51.2% (65 of 124) of the DNA sequences specific for VAGs or their variants matched within the scaffolds of at least 1 isolate (Supplemental Dataset 2). Again, the most frequently detected gene was hemL in 176 isolates, followed by lpfA (143 isolates), eae (133–134), nleB (132), nleA (124), and astA (117). Of 111 probes for genoserotyping, 46 (41.4%) reacted with at least 1 isolate. Complete genoserotypes (O- and H-antigen) were detected for 70 isolates (39.3%); the most frequent serotypes were O26:H11 (n = 28), O157:H7 (n = 18), and O6:H49 (n = 9). An O-antigen could be determined for 70 isolates, and the H-antigen for 176 isolates. Overall, 15 of 151 (9.9%) probes detecting AMRGs hybridized with at least 1 isolate; the most frequently detected genes were strA (n = 29), strB (n = 26), blaTEM-1 (n = 14), and sul2 (n = 22).
To compare the DNA microarray and in silico hybridization results, a κ coefficient was calculated for 121 of 393 probes (excluding the technical biotin control from the DNA microarray; Supplemental Table 1). For the remaining 272 probes, a κ coefficient could not be calculated because the results of 219 probes were negative in both approaches and the results of the remaining 53 probes were either positive or negative for all isolates in one of the assays. For 116 of 121 (95.9%) probes analyzed, the results in both tests agreed significantly (p < 0.001). Agreement was almost perfect (κ > 0.8) for 72 probes (59.5%), substantial (0.6 < κ ≤ 0.8) for 35 probes (28.9%), and moderate (0.4 < κ ≤ 0.6) for 6 probes (5.0%). The results for 3 further probes (toxB.611, fliC.H45.11, espB.O157.20) also agreed significantly (p < 0.05), but only with low κ coefficients (0.314, 0.395, and 0.098, respectively). In contrast, the results of the remaining 5 probes (rrs.612, dnaE.612, hemL.612, tirO157H45.611, fliC.H51.11) exhibited no significant agreement.
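For readers unfamiliar with the statistic, the per-probe κ coefficient and the agreement categories used above can be sketched as follows. The function names are hypothetical, the counts in the usage note are invented for illustration, and the sketch returns no value when one assay gives the same result for every isolate, mirroring the probes for which no κ was reported; it is not the statistics package used in the study.

```python
from typing import Optional


def cohen_kappa(both_pos: int, array_only: int,
                insilico_only: int, both_neg: int) -> Optional[float]:
    """Cohen's kappa for a 2x2 table of microarray vs. in silico results.

    Returns None when either method is constant across all isolates,
    in which case kappa is treated here as not calculable.
    """
    n = both_pos + array_only + insilico_only + both_neg
    if n == 0:
        raise ValueError("no isolates in the table")
    array_pos = both_pos + array_only
    insilico_pos = both_pos + insilico_only
    if array_pos in (0, n) or insilico_pos in (0, n):
        return None
    observed = (both_pos + both_neg) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = (array_pos * insilico_pos
                + (n - array_pos) * (n - insilico_pos)) / n ** 2
    return (observed - expected) / (1 - expected)


def agreement_label(kappa: float) -> str:
    """Map kappa to the categories used in the text."""
    if kappa > 0.8:
        return "almost perfect"
    if kappa > 0.6:
        return "substantial"
    if kappa > 0.4:
        return "moderate"
    return "low"
```

As a usage example with invented counts, `cohen_kappa(40, 10, 5, 45)` gives an observed agreement of 0.85 against a chance agreement of 0.50, hence κ of about 0.7, which `agreement_label` classifies as substantial.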
Of 69,954 reactions, 68,706 (98.2%) yielded identical results by DNA microarray and in silico probe hybridization (5,599 positive; 63,107 negative). Of the 1,248 differing test results, 430 were DNA microarray–/in silico hybridization+, and 818 were DNA microarray+/in silico hybridization–. With respect to the class of genes tested, the results were more congruent for genoserotypes (209 differing results from 19,758 in total; 1.1%) or AMRGs (141 of 26,878; 0.5%) than for VAGs or VAG variants (876 of 22,072; 4.0%). Among the 7 control probes yielding 1,246 results, 22 results (1.8%) differed.
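The reaction totals and discordance rates above follow directly from the number of probes per class multiplied by the 178 isolates. A short sketch of that arithmetic, using only counts stated in the text (the variable and function names are illustrative):

```python
ISOLATES = 178

# Probes per class and differing results per class, as reported in the text.
PROBES = {"VAG": 124, "AMRG": 151, "genoserotype": 111, "control": 7}
DIFFERING = {"VAG": 876, "AMRG": 141, "genoserotype": 209, "control": 22}


def discordance_table(probes, differing, isolates):
    """Return {class: (total reactions, differing results, percent differing)}."""
    table = {}
    for gene_class, n_probes in probes.items():
        total = n_probes * isolates
        diff = differing[gene_class]
        table[gene_class] = (total, diff, round(100 * diff / total, 1))
    return table


table = discordance_table(PROBES, DIFFERING, ISOLATES)
total_reactions = sum(total for total, _, _ in table.values())
total_differing = sum(diff for _, diff, _ in table.values())
percent_identical = round(100 * (total_reactions - total_differing) / total_reactions, 1)

for gene_class, (total, diff, pct) in table.items():
    print(f"{gene_class}: {diff} of {total} differing ({pct}%)")
print(f"identical: {total_reactions - total_differing} of {total_reactions} ({percent_identical}%)")
```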
The discrepancies found between results obtained by DNA microarray and in silico hybridization analyses might have several causes. When using WGS scaffolds instead of complete genome sequences without gaps, it cannot be excluded that some probe sequences are missed or are of insufficient quality, leading to DNA microarray+/in silico hybridization– results. Furthermore, VAGs are predominantly located on mobile genetic elements such as pathogenicity islands (e.g., T3SS, eae), bacteriophages (e.g., stx), or plasmids (e.g., astA, ehxA) that may have been lost during the subculturing process. This may have occurred with the stx genes, for which the DNA microarray and WGS analyses yielded identical results in only 158 of the 178 isolates (88.8%). Indeed, loss of bacteriophage-encoded stx genes during in vitro cultivation has been described repeatedly 5,14 and must be considered when analyzing stored isolates. In our study, the DNA microarray experiments were performed several years before the sequencing and in silico hybridization. Although the isolates had been kept as glycerol stocks, the loss of genes is conceivable. Also, sequencing plasmid DNA from samples with a high genome load frequently results in difficulties with different sequencing techniques. 17 This observation is supported by our study for genes often encoded on large STEC virulence plasmids (e.g., ehxA, toxB, espP, katP 21). Those genes were more often DNA microarray+/in silico hybridization– than other genes and yielded κ values <0.8. It is reasonable to assume, therefore, that for plasmid-encoded genes the performance of the microarray analysis has been underestimated.
In the case of the espB.O157.20 probe (83 isolates DNA microarray+; 19 isolates in silico hybridization+; κ = 0.098), the large number of DNA microarray–positive results mainly represents nonspecific signals in the DNA microarray. In detail, 19 isolates were espB.O157.20 positive in both assays. However, an additional 64 isolates that also reacted with the espB.O157.20 probe in the DNA microarray were espB.O157.20 negative and only espB.O26.40 positive in the in silico hybridization. It seems that the espB.O157.20 probe in the microarray does not differentiate between the two variants.
The selection of DNA microarray probes to cover all known gene variants is a difficult task. This applies particularly to VAGs that occur in several sequence variants (e.g., intimin, Shiga toxin). As previously reported, 10 some isolates tested DNA microarray–/in silico hybridization+ with the probe tir_O157:H45_611. The probe recognized DNA of only 1 isolate by DNA microarray. Sequence data analysis revealed that the tir sequence of this isolate was identical to the probe sequence. However, in silico hybridization with probe tir_O157:H45_611 identified an additional 41 isolates with 2 mismatches (κ = 0.037). This discrepancy was unexpected, as up to 3 mismatches should be tolerated in the DNA microarray according to the manufacturer. It is unlikely that secondary structures in the biotinylated single-stranded DNA (ssDNA) hindered hybridization, because DNA is not as flexible as RNA 20 and therefore ssDNA molecules do not form stable loops or hairpins as easily as does RNA. Additionally, the internal biotinylation that is introduced by incorporation of biotin-linked dUTP nucleotides 2 during the labeling PCR step should prevent the formation of secondary structures. The stoichiometric ratio of labeled dUTP to unlabeled TTP is ~1:3. 2 Stretches with multiple successive adenine nucleotides should therefore be tolerated. Given that the tir_O157:H45_611 probe does not even contain 2 consecutive adenine nucleotides, the negative results in the microarray cannot be explained at present.
The E. coli DNA microarray (Alere Technologies) that we used is suitable for typing of bovine STEC isolates for VAGs and AMRGs. Even though a few VAG variants were not detected correctly by the DNA microarray (espB, tir), the microarray provides fast, affordable, and accurate results concerning 1) the presence of VAGs in a single bovine isolate to identify the pathotype of the respective isolate and/or 2) the presence of AMRGs for preliminary inclusion or exclusion of antimicrobial drugs in a clinical setting until laboratory results of phenotypic resistance assays become available. The genoserotype prediction is limited given that the selection of O-antigens is more relevant for E. coli associated with human diseases than for bovine STEC. Nevertheless, this E. coli DNA microarray (Alere Technologies) may serve as a reliable screening tool when implemented in molecular surveillance workflows to identify isolates with new or atypical gene patterns that should be selected for further detailed analysis by WGS.
Footnotes
Acknowledgements
We thank Birgit Mintel and Susann Schares (FLI, Greifswald-Insel Riems), Anke Hinsching (FLI, Jena), and Petra Krienke (FU Berlin) for their excellent technical assistance, as well as Dr. Christian Berens (FLI, Jena) for critical reading of the manuscript.
Declaration of conflicting interests
The authors declare no potential conflicts of interest with respect to the research, authorship, and publication of this article.
Funding
This study was funded by the German Research Foundation (DFG) under grant GE2509/1-1.
