Abstract
The exponential expansion of metagenomic data obtained through high-throughput sequencing technologies has surpassed the petabyte scale, yielding an unprecedented abundance of data that now enables the in silico discovery of previously unknown viral and bacterial species. Here, we demonstrate the power and promise of mining sequencing data to uncover natural adeno-associated virus (AAV) cap(sid) genes, with the synergistic aims of expanding our repertoire of templates for vector development and enhancing our understanding of the AAV space and of virus evolution across species. Specifically, we harnessed the Serratus Explorer to identify 29 AAV variant genomes from publicly accessible raw metagenomic data generated from bird, nonhuman primate, or human samples, of which 16 were classified as high-quality based on the high coverage of their cap region. To this end, we devised a comprehensive computational pipeline comprising (i) reference candidate selection, (ii) prealigned data acquisition, (iii) variant calling and frequency estimation, (iv) consensus calling, (v) variant resolution, (vi) phylogenetic analysis, and (vii) protein structure analysis steps. Eight representative cap genes from four different host species were synthesized and used to produce so-called metAAV vectors, which exhibited intriguing and biomedically relevant properties, including partial escape from neutralizing anti-AAV antibodies and muscle tropism combined with robust liver detargeting in systemically injected mice. We concurrently pursued a conventional, reference-independent metagenome-based genome assembly, which also successfully reconstructed AAV cap genes, but solely for abundant variants. Together with the fact that this traditional reference-independent method requires substantial computational resources and fails to accurately resolve multiple closely related variants, this highlights the advantages and superiority of our original consensus-based reconstruction pipeline for fundamental virus research and for future gene therapy vector bioengineering.
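To make pipeline steps (iii) and (iv) more concrete, the following is a minimal, illustrative Python sketch of per-position allele frequency estimation and majority-vote consensus calling against a reference. It is not the authors' implementation: the input format (one string of observed read bases per reference position), the function names, and the thresholds `min_depth` and `min_freq` are all assumptions chosen for illustration.

```python
from collections import Counter

VALID_BASES = "ACGT"


def count_bases(column):
    """Count A/C/G/T observations in one pileup column (a string of read bases)."""
    return Counter(b for b in column.upper() if b in VALID_BASES)


def allele_frequencies(column):
    """Return {base: fraction of reads} for one pileup column."""
    counts = count_bases(column)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    return {base: n / depth for base, n in counts.items()}


def call_consensus(reference, columns, min_depth=3):
    """Majority-vote consensus; positions below min_depth are masked as 'N'.

    Ties are broken in favour of the reference base (an illustrative choice).
    """
    consensus = []
    for ref_base, column in zip(reference.upper(), columns):
        counts = count_bases(column)
        if sum(counts.values()) < min_depth:
            consensus.append("N")
            continue
        best = max(counts, key=lambda b: (counts[b], b == ref_base))
        consensus.append(best)
    return "".join(consensus)


def non_reference_variants(reference, columns, min_freq=0.2):
    """List (position, base, frequency) for non-reference alleles at or above min_freq."""
    variants = []
    for pos, (ref_base, column) in enumerate(zip(reference.upper(), columns)):
        freqs = allele_frequencies(column)
        for base in sorted(freqs):
            if base != ref_base and freqs[base] >= min_freq:
                variants.append((pos, base, freqs[base]))
    return variants
```

In a sketch like this, positions where a non-reference allele persists at substantial frequency are the ones that would need the subsequent variant resolution step (v), since a single consensus sequence cannot represent several closely related variants mixed in one sample.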
