High-throughput whole genome sequencing of Porcine reproductive and respiratory syndrome virus from cell culture materials and clinical specimens using next-generation sequencing technology

Abstract

Next-generation sequencing (NGS) technologies play an increasingly crucial role in biological and medical research, but are not yet in routine use in veterinary diagnostic laboratories. We developed and applied a procedure for high-throughput RNA sequencing of Porcine reproductive and respiratory syndrome virus (PRRSV) from cell culture–derived isolates and clinical specimens. Ten PRRSV isolates with known sequence information, 2 mixtures each with 2 different PRRSV isolates, and 51 clinical specimens (19 sera, 16 lungs, and 16 oral fluids) with various PCR threshold cycle (Ct) values were subjected to nucleic acid extraction, cDNA library preparation (24-plexed), and sequencing. Whole genome sequences were obtained from 10 reference isolates with expected sequences and from sera with a PRRSV real-time reverse transcription PCR Ct ≤ 23.6, lung tissues with Ct ≤ 21, and oral fluids with Ct ≤ 20.6. For mixtures with PRRSV-1 and -2 isolates (57.8% nucleotide identity), NGS was able to distinguish them as well as obtain their respective genome sequences. For mixtures with 2 PRRSV-2 isolates (92.4% nucleotide identity), sequence reads with nucleotide ambiguity at numerous sites were observed, indicating mixed infection; however, individual virus sequences could only be separated when the identity and sequence of 1 isolate in the mixture were known. The NGS approach described herein offers the prospect of high-throughput sequencing and could be adapted to routine workflows in veterinary diagnostic laboratories, although further improvement of sequencing outcomes from clinical specimens with higher Ct values remains to be investigated.

Keywords

Cell culture isolates; clinical specimens; lung; next-generation sequencing; oral fluid; serum

Introduction

Porcine reproductive and respiratory syndrome (PRRS), characterized by reproductive failure in breeding females and respiratory distress in pigs of all ages, is an economically devastating disease globally. In a 2013 analysis, its economic impact on the U.S. swine industry was estimated at $664 million annually or $1.8 million per day.¹² A significant amount of time, money, and effort has been dedicated to controlling and/or preventing PRRS, including vaccination, herd management, and biosecurity. Nonetheless, the effectiveness of vaccines has been disappointing given the high rate of genetic and antigenic variability of Porcine reproductive and respiratory syndrome virus (PRRSV), the etiologic agent of PRRS.^16,25,35

PRRSV (order Nidovirales, family Arteriviridae, genus Arterivirus) is an enveloped, single-stranded, positive-sense RNA virus.⁵ The PRRSV genome is ~15 kb in length and contains at least 10 open reading frames (ORFs) in the order of: 5′ untranslated region (UTR), ORF1a/1b, ORF2a, ORF2b, ORF3, ORF4, ORF5a, ORF5, ORF6, ORF7, and 3′-UTR.³⁸ Among the structural protein genes (ORF2a–ORF7), the ORF5 encoding the major envelope glycoprotein 5 is most variable³⁸ and is commonly used for sequencing to study molecular epidemiology and/or genetic relatedness of PRRSV.^14,24,34,40 PRRSV sequencing has also often been used in making an intervention decision, including vaccine selection by swine practitioners and seed virus selection by vaccine manufacturers.^6,10,18,44 However, ORF5 only covers ~4% of the entire genome or 12% of structural genes, and thus may not be able to provide the breadth of evidence needed for differentiating PRRSV strains. In addition, PRRSV strains are divided into at least 2 distinct genotypes: type 1 (European type) and type 2 (North American type), with extensive genetic variability both within and between these genotypes.^1,3,11,30,39,45 Moreover, PRRSV has demonstrated continuous and rapid evolution, generating new variants.^4,13,28 A rapid, high-throughput, and accurate whole genome sequencing method for PRRSV has enormous potential to provide more comprehensive information to diagnosticians, researchers, and the swine industry.

Great progress has been made in nucleic acid sequencing technologies.²⁶ The first high-throughput next-generation sequencing (NGS) technology, the Roche 454 FLX platform, became available in 2005 (http://www.454.com). Subsequently, Illumina released the Genome Analyzer in early 2007 (http://www.illumina.com), and Applied Biosystems distributed the SOLiD System 2.0 platform (http://www.appliedbiosystems.com) in 2008. These novel sequencing techniques, characterized by massive sequence output, low cost per base, and short turnaround time, have significantly changed the way we understand pathogens. Theoretically, the metagenomics-based strategy (i.e., hypothesis-free) is able to detect novel pathogens or variants, and simultaneously identify mixed infection of microorganisms, including bacteria, viruses, and parasites.³³ It can also be used to study the dynamics of pathogens and associated microbiota in animal specimens during the course of infectious diseases.^9,42 Despite these advantages, the use of NGS in veterinary testing is in its infancy. We aimed to establish a fast, cost-effective, and high-throughput procedure using NGS technology and to evaluate its application to determine the whole genome sequences of PRRSV strains in cell culture–derived isolates and clinical specimens.

Materials and methods

Reference PRRSV isolates

Six PRRSV-2 (VR-2332, VR-2385, NADC20, MN184, ISU-P [ATCC VR-2402], and SDSU73 [ATCC PTA-6322]), 1 PRRSV-1 (Lelystad strain), and 3 PRRSV-2 vaccine (MLV vaccine,^a ATP vaccine,^b and F vaccine^c) strains were used in the study. These reference strains were selected because they are commonly used in research studies, and their complete genome sequences have been published,^3,27,31,32,43 except for NADC20. PRRSV reference strains were propagated in MARC-145 cells, a clone of the African green monkey kidney cell line MA-104,¹⁵ and had infectious titers of 10⁴ to 10⁶ TCID₅₀/mL (50% tissue culture infective dose per milliliter) before being used for sequencing.

Mixtures of PRRSV isolates

PRRSV Lelystad isolate and PRRSV VR-2385 isolate were manually mixed in the volume ratio of 1:1 to create a mixture containing a PRRSV-1 and -2 isolate. Similarly, MLV vaccine^a and PRRSV VR-2385 isolates were manually mixed in the volume ratio of 1:1 to create a mixture containing both PRRSV-2 vaccine strain and wild-type isolate.

Clinical specimens

Nineteen sera, 16 lungs, and 16 oral fluid samples were selected from submissions to the Iowa State University Veterinary Diagnostic Laboratory (ISU VDL; Ames, Iowa) and used for whole genome sequencing attempts. The samples had tested positive, with various threshold cycle (Ct) values, by a commercial PRRSV real-time reverse transcription (RT)-PCR.^d

Total RNA extraction for NGS and RNA quantification

The aforementioned PRRSV reference isolates, mixtures of PRRSV isolates, and clinical samples were used for NGS attempts. For virus isolates, serum samples, and oral fluids, 300 µL of each sample was centrifuged at 4,200 × g for 30 min at 4°C to remove host cells and/or other debris. Two hundred microliters of supernatant were then taken for total viral RNA extraction. For lung tissues, 10% (wt/vol) homogenates were made in Earle balanced salt solution.^e Ten milliliters of each homogenate were then centrifuged at 4,200 × g for 30 min at 4°C. All of the supernatants were further filtered through a 0.22-µm membrane filter and then centrifuged at 180,000 × g at 4°C for 3 h. Next, the supernatants were discarded, and the resulting pellets were resuspended in 200 µL of sterile phosphate-buffered saline (PBS; 1× pH 7.4)^f for total viral RNA extraction. To extract total RNA from the preprocessed samples, a magnetic particle processor^g and a viral RNA isolation kit^h were used following the manufacturer’s instructions, with minor modifications. Briefly, 200 µL of each sample and 400 µL of lysis/binding solution without carrier RNA were added into wells along with 20 µL of bead-enhancer mix. After 2 washes with 300 µL of wash solution I and 2 washes with 450 µL of wash solution II, RNAs were eluted in 50 µL of RNase-free water and stored at −80°C until use. Alternatively, total RNA extraction was performed using a column-based total RNA purification kit.ⁱ This method produced RNA yield and quality similar to those of the semiautomatic RNA extraction method with the viral RNA isolation kit^h described above. A spectrophotometer^j and an RNA assay kit^k were used to quantify the RNA in extracts.

Library preparation for NGS

Complementary (c)DNA libraries were constructed from total RNA using a total RNA sample preparation kit^l following the low-sample protocol in manufacturer’s guidelines, with some modifications. The library preparation is summarized in the following 6 steps.

Step 1: rRNA depletion

For the total RNA from lung tissue, ribosomal (r)RNA depletion was performed. Briefly, 0.1–1 μg of total RNA was diluted with nuclease-free water to a final volume of 10 μL. Ribosomal RNA was removed using a commercial kit.^m Finally, 8.5 μL of clean RNA was used in the next steps. Ribosomal RNA depletion was not performed on total RNA extracted from cell culture–derived isolates, serum, or oral fluid samples.

Step 2: RNA fragmentation

The time for fragmentation was optimized to 2 min to generate a median insert size of 340 nucleotides. Distribution of size of RNA fragment was confirmed using a bioanalyzer.ⁿ The fragmentation mixture was immediately subjected to cDNA synthesis.

Step 3: cDNA synthesis

The first-strand cDNA was synthesized using random primers (hexamer) and reverse transcriptase. The RNA template was then removed and a replacement strand was synthesized to generate double-stranded cDNA. These procedures were performed according to the low-sample protocol without in-line control reagent followed by DNA cleanup using a commercial kit.^o

Step 4: 3′-end adenylation and adapter ligation

A single “A” nucleotide was added to the 3′-end of the blunt fragments, and multiple indexing adapters were ligated to the fragment by the complementary overhang with the corresponding single “T” nucleotide at the 3′-end of the adapter. Up to 24 indexing adapters were used to distinguish 24 different samples. DNA was then cleaned up twice using a commercial kit.^o

Step 5: Enrichment of DNA fragments

DNA fragments with adapter molecules at both ends were PCR amplified, and the PCR products were cleaned up using a commercial kit.^o PCR conditions were 98°C for 30 s, 15 cycles of 98°C for 10 s, 60°C for 30 s and 72°C for 30 s, and 72°C for 5 min.
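As a quick check on the run time implied by the enrichment program above, the hold times can be summed as in the following minimal sketch. The variable and function names are illustrative only, and ramp times between temperatures (which depend on the thermal cycler) are ignored.

```python
# Enrichment PCR program from Step 5, written as (temperature in °C, hold in seconds).
INITIAL_DENATURATION = (98, 30)
CYCLE_STEPS = [(98, 10), (60, 30), (72, 30)]  # denature, anneal, extend
N_CYCLES = 15
FINAL_EXTENSION = (72, 5 * 60)


def total_hold_seconds(initial, cycle_steps, n_cycles, final):
    """Sum the programmed hold times; instrument ramp times are not included."""
    per_cycle = sum(seconds for _, seconds in cycle_steps)
    return initial[1] + n_cycles * per_cycle + final[1]


hold_s = total_hold_seconds(INITIAL_DENATURATION, CYCLE_STEPS, N_CYCLES, FINAL_EXTENSION)
print(f"{hold_s} s ({hold_s / 60:.0f} min) of programmed hold time")
```

The programmed holds add up to 23 min; the actual run will be longer by the instrument's ramping time.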

Step 6: Library quantification, normalization, pooling, and sequencing

Libraries were analyzed for size distribution using a bioanalyzerⁿ and further quantified by real-time PCR using a quantification kit^p in accordance with the manufacturer’s protocol. Multiplex libraries were then normalized to 10 nM with nuclease-free water and pooled in equal volumes before loading onto a flow cell. The pooled libraries were sequenced on an NGS sequencer^q with 150-bp paired-end reads by following the operation manual at the ISU DNA Facility (Ames, Iowa).
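The normalization to 10 nM is a standard dilution calculation. The sketch below illustrates it under stated assumptions: the mass-to-molarity conversion uses the common approximation of 660 g/mol per base pair of double-stranded DNA, and the function names and example values are hypothetical rather than taken from the study. When the quantification kit already reports molarity, only the dilution step is needed.

```python
def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration to molarity (nM).

    Assumes an average mass of ~660 g/mol per base pair (an approximation).
    """
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def dilution_volumes(stock_nM, target_nM, final_ul):
    """Return (library_ul, water_ul) needed to reach target_nM in final_ul (C1V1 = C2V2)."""
    if stock_nM < target_nM:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nM * final_ul / stock_nM
    return library_ul, final_ul - library_ul


# Hypothetical library: 6.6 ng/µL with a mean fragment size of 500 bp.
stock = ng_per_ul_to_nM(6.6, 500)
library_ul, water_ul = dilution_volumes(stock, target_nM=10.0, final_ul=20.0)
print(f"stock = {stock:.1f} nM; mix {library_ul:.1f} µL library + {water_ul:.1f} µL water")
```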

NGS data analysis

Keywords “PRRSV complete genome” were used to search against the NCBI nucleotide database, resulting in 459 PRRSV whole genome sequences that were saved as a reference genome library for read mapping. The adapter-filtered original paired-end reads were mapped against the PRRSV reference genome library using mapping software.^r,20 Sequence IDs of mapped reads were extracted using a sequence alignment tool.^s,21 The new paired-end read sets were extracted from the original paired-end FASTQ files using seqtk (https://github.com/lh3/seqtk) with the list of mapped IDs to ensure both reads of a pair were selected. Then, the mapped read sets were used for de novo assembly using assembly software.^t,37 k-mer sizes from 20 to 64 were tested, and the best assembly was chosen according to the longest contig. If a single contig of full-length genome was not achieved after de novo assembly, all contigs exported from the assembly software^t were reassembled with or without a reference genome sequence using another software package.^u If a consensus sequence of full-length genome could not be assembled without a reference, the reference genome obtained via BLAST search of the longest contig was added to the software^u to yield a whole genome consensus. Then, the reference genome sequence was deleted from the assembly, resulting in a consensus genome with gaps properly annotated.
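The pair-selection step (keeping both mates whenever a read ID appears in the mapped list) can be illustrated with a small pure-Python stand-in. This is only a sketch of the logic and not the seqtk implementation used in the study; the function names are hypothetical, and it assumes the two FASTQ inputs list the pairs in the same order with 4 lines per record.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (read_id, record_lines) from an iterable of FASTQ lines (4 lines per record)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        # Read ID: text after '@' up to the first whitespace, without a /1 or /2 mate suffix.
        read_id = record[0].rstrip("\n")[1:].split()[0]
        if read_id.endswith(("/1", "/2")):
            read_id = read_id[:-2]
        yield read_id, record


def select_pairs(r1_lines, r2_lines, mapped_ids):
    """Keep both mates of every pair whose ID is in mapped_ids."""
    kept_r1, kept_r2 = [], []
    for (id1, rec1), (id2, rec2) in zip(read_fastq(r1_lines), read_fastq(r2_lines)):
        if id1 != id2:
            raise ValueError(f"mate files out of sync: {id1} vs {id2}")
        if id1 in mapped_ids:
            kept_r1.extend(rec1)
            kept_r2.extend(rec2)
    return kept_r1, kept_r2


# Toy example with two read pairs, of which only 'read2' mapped to the reference library.
r1 = ["@read1/1\n", "ACGT\n", "+\n", "IIII\n", "@read2/1\n", "GGCC\n", "+\n", "IIII\n"]
r2 = ["@read1/2\n", "TTTT\n", "+\n", "IIII\n", "@read2/2\n", "AACC\n", "+\n", "IIII\n"]
sel1, sel2 = select_pairs(r1, r2, {"read2"})
print("".join(sel1), "".join(sel2), sep="")
```

Selecting by pair rather than by individual read keeps the two output files synchronized, which paired-end assemblers generally require.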

For analyzing sequence data generated from a mixture containing PRRSV-2 MLV vaccine strain and PRRSV-2 VR-2385 isolate, analysis procedures described above (mapping against the PRRSV reference genome library followed by de novo assembly) were first performed. Subsequently, Integrative Genomics Viewer (IGV)^36,41 was used to align sequence reads against the obtained consensus sequences. This process enables visualization of nucleotides at each position of sequencing reads.
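The kind of per-position inspection that IGV provides visually can also be tabulated. The sketch below counts bases at each consensus position and flags sites where a second base is well represented; it is an illustration only, assumes gapless alignments given as (0-based start, sequence) tuples, and uses hypothetical thresholds (`min_depth`, `min_minor_fraction`) that were not part of the study.

```python
from collections import Counter


def base_counts(aligned_reads, genome_length):
    """Count A/C/G/T per consensus position from gapless (start, sequence) alignments."""
    counts = [Counter() for _ in range(genome_length)]
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < genome_length and base in "ACGT":
                counts[pos][base] += 1
    return counts


def ambiguous_sites(counts, min_depth=10, min_minor_fraction=0.2):
    """Return (position, base counts) for sites where the second most common base is frequent."""
    sites = []
    for pos, c in enumerate(counts):
        depth = sum(c.values())
        if depth < min_depth:
            continue
        ranked = c.most_common(2)
        if len(ranked) == 2 and ranked[1][1] / depth >= min_minor_fraction:
            sites.append((pos, dict(c)))
    return sites


# Toy example: 6 reads carry G and 4 reads carry T at position 2.
reads = [(0, "ACGT")] * 6 + [(0, "ACTT")] * 4
sites = ambiguous_sites(base_counts(reads, genome_length=4))
print(sites)
```

Positions flagged this way indicate a mixed population but do not by themselves assign each base to a strain; that still requires a known sequence for at least one of the viruses.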

PRRSV real-time RT-PCR tests and PRRSV ORF5 sequencing

Nucleic acids were extracted from serum (50 µL), lung homogenate (50 µL), and oral fluids (100 µL) using a commercial RNA/DNA extraction kit^v and an automatic instrument^g following the instructions of the manufacturer. Nucleic acids were eluted into 90 µL of elution buffer. A commercial PRRSV real-time RT-PCR^d was used to test samples for the presence or absence of PRRSV RNA according to the manufacturer’s instructions using a real-time PCR instrument.^w

PRRSV ORF5 sequencing was performed using different procedures depending on virus genotypes. For PRRSV-2, a primary primer set (forward primer P5F2 5′-AAGGTGGTATTTGGCAATGTGTC-3′ and reverse primer P5R2 5′-GAGGTGATGAATTTCCAGGTTTCTA-3′) was used for RT-PCR amplification of a fragment (1,082 bp) covering PRRSV ORF5 and 5′- and 3′-flanking regions using a 1-step RT-PCR kit^x following the cycling conditions of: 48°C for 20 min; 94°C for 3 min; 45 cycles of 94°C for 30 s, 50°C for 50 s, and 68°C for 50 s; and 68°C for 7 min. Detection of PCR products was conducted with an advanced detection system.^y If the PCR products with desired size (~1,082 bp) were obtained, the products were purified using a commercial kit^z and submitted to the ISU DNA Facility for sequencing. Sequence data were analyzed using software^u described above. If products of correct size were not obtained, alternative primers (forward primer ORF5U 5′-GGTGGGCAACKGTTTTAGCCTGTC-3′ and reverse primer ORF5L 5′-GGTAATAGARAAYGCCAAAAGCACC-3′) were used to RT-PCR amplify the ORF5 and 5′-and 3′-flanking regions using a one-step RT-PCR kit^x with the same cycling conditions as described for the primary primer set except that the annealing temperature was changed to 52°C. If the products of correct size (723 bp) were obtained, the products were purified and sequenced as described above; if the correct-size products were not obtained, the sample was considered unsuccessful for ORF5 sequencing.
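The alternative PRRSV-2 primers contain IUPAC degenerate bases (K in ORF5U; R and Y in ORF5L), so each is synthesized as a pool of sequences. The following sketch enumerates those pools from the primer sequences given above; the helper name `expand_primer` is illustrative.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence represented by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer.upper()))]


ORF5U = "GGTGGGCAACKGTTTTAGCCTGTC"   # forward primer, one degenerate site (K)
ORF5L = "GGTAATAGARAAYGCCAAAAGCACC"  # reverse primer, two degenerate sites (R, Y)

for name, primer in (("ORF5U", ORF5U), ("ORF5L", ORF5L)):
    variants = expand_primer(primer)
    print(name, len(variants), "variants")
    for v in variants:
        print("  ", v)
```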

For PRRSV-1, forward primer L1F 5′-TGAGGTGGGCTACAACCATT-3′ and reverse primer L1R 5′-AGGCTAGCACGAGCTTTTGT-3′ and a 1-step RT-PCR kit^x were used to amplify the PCR product. The cycling conditions were the same as described for the PRRSV-2 primary primer set except that the annealing temperature was changed to 55°C. If the products with the expected size of 702 bp were obtained, the products were purified and sequenced as described above; if not, the sample was considered unsuccessful for ORF5 sequencing.

Results

Next-generation sequencing outcomes on PRRSV reference isolates

Ten well-characterized PRRSV isolates were chosen as reference strains to optimize the NGS procedures, including 6 PRRSV-2 isolates, 1 PRRSV-1 isolate, and 3 vaccine strains. Full-length genome sequences were obtained from all 10 PRRSV reference isolates using NGS technology (Table 1). The total reads of PRRSV isolates ranged from 91,932 to 1,726,206, and 1.87–74.2% of the total reads were mapped to a reference PRRSV genome library. Although the percentage of mapped reads was only 1.87% for isolate VR-2385, its number of mapped reads was 31,702, and a full-length genome of 14,956 nucleotides (nt) was still assembled. The smallest number of mapped reads was 6,090 from the NADC20 isolate, representing ~61× coverage of the PRRSV genome. The whole genome sequences obtained by NGS in our study for all isolates except NADC20 were compared with the corresponding sequences available in GenBank, which were determined by the traditional Sanger method (Table 1). The complete genome sequence of the F vaccine was not available in GenBank; however, the sequence of its parental strain, PRRSV P129, is available and was used for comparison to the NGS data. The nucleotide identities between the NGS sequences and the previously reported sequences ranged from 99.3% to 99.9% for all PRRSV isolates evaluated.
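The ~61× figure for NADC20 follows from the usual mean-coverage estimate (reads × read length ÷ genome length). The sketch below reproduces it, assuming that every mapped read contributes its full 150 bases and taking the genome as ~15,000 nt; the function name is illustrative.

```python
def mean_coverage(n_reads, read_length_bp, genome_length_bp):
    """Estimate mean depth of coverage as total sequenced bases over genome length."""
    return n_reads * read_length_bp / genome_length_bp


# NADC20: 6,090 mapped reads of 150 bp against a ~15-kb genome.
nadc20 = mean_coverage(6090, 150, 15000)
print(f"NADC20 mean coverage: ~{nadc20:.0f}x")
```

Mean coverage says nothing about evenness: a sample can have adequate mean depth and still leave gaps where local coverage drops to zero.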

Table 1.

Summary of custom* RNA-Seq next-generation sequencing (NGS) on Porcine reproductive and respiratory syndrome virus (PRRSV) reference isolates.

PRRSV	Ct value	No. of total reads	No. of reads mapped	% Reads mapped	Full-length genome by NGS	Longest contig (bp)	GenBank accession of whole genome public sequence	Nucleotide identity between NGS and public sequences (%)
VR2332	12.4	1,027,668	762,176	74.17	Yes	15,396	AY150564	99.9
VR2385	11.1	1,690,980	31,702	1.87	Yes	14,956	JX044140	99.8
NADC20	11.2	165,146	6,090	3.69	Yes	15,393	Not available
MN184C	12.6	988,292	62,480	6.32	Yes	15,001	EF488739	99.3
ISU-P	13.5	1,596,536	774,848	48.53	Yes	15,387	EF532816	99.9
SDSU73	17.4	1,253,312	896,603	71.54	Yes	15,394	JN654458	99.9
Lelystad	14.1	1,267,240	309,634	24.43	Yes	15,083	M96262	99.8
MLV vaccine†	16.6	91,932	54,181	58.94	Yes	15,441	AF066183	99.9
ATP vaccine‡	20.7	113,246	22,951	20.27	Yes	15,395	DQ988080	99.9
F vaccine§	12.9	1,726,406	394,423	22.85	Yes	15,379	AF494042||	99.4

* Illumina, San Diego, California.

† MLV vaccine: Ingelvac PRRS MLV vaccine, Boehringer Ingelheim Vetmedica, St. Joseph, Missouri.

‡ ATP vaccine: Ingelvac PRRS ATP vaccine, Boehringer Ingelheim Vetmedica, St. Joseph, Missouri.

§ F vaccine: Fostera PRRS vaccine, Zoetis, Florham Park, New Jersey.

|| The complete genome sequence of the F vaccine is not available in GenBank. However, the complete genome sequence of the PRRSV P129 strain, the parental virus from which the F vaccine was derived, is available (AF494042), and this sequence was compared with our NGS data.

Next-generation sequencing outcomes on mixtures with 2 PRRSV isolates

For a mixture containing a PRRSV-1 isolate (Lelystad strain) and a PRRSV-2 isolate (VR-2385), 2 sequences corresponding to the Lelystad strain and VR-2385 strain, respectively, were de novo assembled when raw sequence reads were mapped against the reference PRRSV genome library of 459 PRRSV whole genomes. However, for a mixture containing a PRRSV-2 MLV vaccine strain and a PRRSV-2 wild-type isolate (VR-2385), similar analysis procedures resulted in only 1 consensus full-length genomic sequence; the MLV vaccine and VR-2385 genomic sequences could not be separately assembled. When the IGV software was used to align all sequence reads against the obtained consensus whole genome sequence, nucleotide variations at some positions of raw sequencing reads were observed, indicating a mixed infection. Because the nucleotide variation at each position could be clearly visualized and the identity of the PRRSV strains in the mixture was known, we were able to determine which nucleotide corresponded to the MLV vaccine and which to VR-2385 at the positions with nucleotide ambiguity.

Next-generation sequencing outcomes on clinical specimens

PRRSV whole genome sequencing was attempted using NGS directly from 19 sera, 16 lungs, and 16 oral fluid samples with different Ct values (Tables 2–4). Ct values of samples had an impact on the outcome and quality of the NGS data. For example, 52.4% (569,911 of 1,087,018) of the total reads were mapped to PRRSV genomes in a serum sample with a Ct value of 15.1, whereas only 0.89% (19,699 of 2,223,592) of the total reads were mapped to PRRSV genomes in a serum sample with a Ct value of 23.6 (Table 2). From serum samples with Ct 15.1–23.6 (n = 10), full-length genomic sequences were successfully obtained without any gaps. From a serum sample with Ct 24.3, most of the genome sequence was obtained except for 8 gaps of <150 bp and 1 gap of <750 bp. From serum samples with Ct 25.0–26.8 (n = 4), some sequence contigs could be assembled, albeit with gaps of various sizes; full-length sequences were not obtained. When Ct values increased to ≥28, no contigs were obtained for assembly (Table 2).

Table 2.

Summary of custom* RNA-Seq next-generation sequencing outcomes on serum samples.

Sample ID	Ct value	No. of total reads	No. of reads mapped	% Reads mapped	Full-length genome	Total bases assembled	Longest contig (bp)	No. of total gaps	No. of gaps <150 bp	No. of gaps <750 bp	No. of gaps <1,500 bp	No. of gaps ≥1,500 bp
14994-22VD	15.1	1,087,018	569,911	52.43	Yes	14,997	14,997	0	0	0	0	0
26109	17.1	308,880	17,838	5.78	Yes	14,999	14,999	0	0	0	0	0
1295	19.1	411,092	79,321	19.30	Yes	15,510	15,510	0	0	0	0	0
S4099	19.7	1,180,906	24,489	2.07	Yes	15,025	15,025	0	0	0	0	0
30820	19.8	333,428	52,996	15.89	Yes	14,959	14,959	0	0	0	0	0
S5342	20.5	1,554,098	24,895	1.60	Yes	15,002	15,002	0	0	0	0	0
PRRpig1	22.0	1,029,532	9,086	0.88	Yes	15,340	15,340	0	0	0	0	0
S0652A	22.2	1,084,508	8,503	0.78	Yes	15,000	15,000	0	0	0	0	0
PRRpig11	23.0	970,396	112,907	11.64	Yes	15,398	15,398	0	0	0	0	0
S5432	23.6	2,223,592	19,699	0.89	Yes	15,396	15,396	0	0	0	0	0
S4969	24.3	1,650,972	2,305	0.14	No	14,614	4,296	9	8	1	0	0
S4825	25.0	1,681,672	422	0.03	No	2,155	270	11	1	5	3	2
S2154	25.2	1,457,568	582	0.04	No	4,354	580	24	8	10	4	2
S13059316	26.2	1,416,986	2,302	0.16	No	10,546	1,485	19	11	7	1	0
S0652B	26.8	1,327,504	529	0.04	No	4,491	649	19	3	12	2	2
S4143	28.0	1,656,678	321	0.02	No
S4004	29.7	1,407,952	4,728	0.34	No
15-68894-7	32.6	424,734	52	0.01	No
15-69155-18	35.2	87,874	34	0.04	No

* Illumina, San Diego, California.

Table 3.

Summary of custom* RNA-Seq next-generation sequencing outcomes on lung samples.

Sample ID	Ct value	No. of total reads	No. of reads mapped	% Reads mapped	Full-length genome	Total bases assembled	Longest contig (bp)	No. of total gaps	No. of gaps <150 bp	No. of gaps <750 bp	No. of gaps <1,500 bp	No. of gaps ≥1,500 bp
30324	16.4	3,367,976	136,882	4.06	Yes	15,068	15,068	0	0	0	0	0
31034	16.6	5,915,908	100,744	1.70	Yes	14,981	14,981	0	0	0	0	0
30131	17.3	3,418,010	39,526	1.16	Yes	15,011	15,011	0	0	0	0	0
31861	19.2	6,332,688	60,408	0.95	Yes	15,000	15,000	0	0	0	0	0
14-1179A	21.0	963,362	18,666	1.94	Yes	15,377	15,377	0	0	0	0	0
4302	21.3	910,914	12,116	1.33	No	10,499	2,043	23	11	11	1	0
4945	22.2	845,346	1,388	0.16	No	9,897	1,713	21	15	5	0	1
4902	22.9	1,114,286	6,176	0.55	No	6,545	698	26	9	14	3	0
4183	23.1	1,061,682	6,414	0.60	No	1,222	360	6	0	2	1	3
4210	23.3	1,243,518	16,497	1.33	No	1,433	304	8	1	3	0	4
14-8412	26.3	912,364	1,239	0.14	No	712	150	4	0	0	2	2
15-73891A	27.8	440,208	56	0.01	No
15-71435A	29.0	379,282	89	0.02	No
7033A	31.3	916,084	398	0.04	No
71759A	32.0	1,354,144	576	0.04	No
70451A	35.6	1,498,278	740	0.05	No

* Illumina, San Diego, California.

Table 4.

Summary of custom* RNA-Seq next-generation sequencing outcomes on oral fluid samples.

Sample ID	Ct value	No. of total reads	No. of reads mapped	% Reads mapped	Full-length genome	Total bases assembled	Longest contig (bp)	No. of total gaps	No. of gaps <150 bp	No. of gaps <750 bp	No. of gaps <1,500 bp	No. of gaps ≥1,500 bp
71566-5	18.7	1,157,864	31,354	2.71	No	15,365	8,186	1	1	0	0	0
71566-6	18.8	2,421,536	112,603	4.65	No	15,479	8,196	1	1	0	0	0
71566-4	20.4	1,859,260	60,275	3.24	Yes	15,387	15,387	0	0	0	0	0
71566-7	20.6	2,226,248	71,825	3.23	Yes	15,367	15,367	0	0	0	0	0
73567-1	22.9	1,564,396	200	0.01	No	4,943	585	28	9	15	4	0
71033-1	26.0	1,051,242	339	0.03	No	6,978	790	29	15	13	1	0
O4455-56	27.8	1,455,220	788	0.05	No
O3896	28.2	2,022,650	9	0.00	No
O4455-1112	28.4	1,751,840	376	0.02	No
O4137	29.0	1,414,370	9	0.00	No
O3157	29.1	1,509,564	1,051	0.07	No
O3711	29.7	1,225,818	1,094	0.09	No
O4419	29.8	1,503,928	21	0.00	No
68203-6	32.2	2,101,192	306	0.01	No
72166-6	34.0	1,737,050	299	0.02	No
70093-3	35.4	1,419,278	200	0.01	No

* Illumina, San Diego, California.

For lung samples, the total reads that could be mapped to PRRSV reference genomes decreased with increasing real-time RT-PCR Ct values (Table 3). Full-length PRRSV genomic sequences were obtained from lung tissues with Ct values 16.4–21.0 (n = 5). From lung tissues with Ct values of 21.3–26.3 (n = 6), some sequence contigs could be assembled, but gaps were present, and full-length sequences were not obtained. When Ct values increased to ≥27.8, no contigs were obtained for assembly (Table 3).

Whole genome sequencing was attempted on 16 oral fluids with Ct values of 18.7–35.4 (Table 4). From oral fluids with Ct values of 18.7 and 18.8 (n = 2), full-length genomic sequences were obtained, except 1 gap of <150 bp. From oral fluids with Ct values 20.4 and 20.6 (n = 2), full-length genomic sequences without gaps were obtained. From oral fluids with Ct values of 22.9–35.4 (n = 12), only a few reads could be mapped to the PRRSV genome, and no full-length genomic sequences were obtained from these samples.

Discussion

There are several advantages of first attempting NGS on virus isolates: 1) PRRSV isolates grown in cell cultures are relatively “pure” with fewer host gene sequences; 2) abundant PRRSV genetic material is present in the cell culture–derived isolates; 3) starting with PRRSV isolates can help to optimize procedures for sequencing and analysis; and 4) inclusion of some PRRSV reference isolates with known sequences can confirm the integrity and quality of the sequence data obtained by NGS technology. In the current study, 10 well-characterized PRRSV isolates were hence first used to optimize the NGS procedures. Whole genome sequences of all 10 PRRSV reference isolates were successfully obtained using NGS. For these PRRSV isolates, the nucleotide identities between the NGS sequences and the previously reported sequences were 99.3–99.9%. It is not surprising that the nucleotide identities were not 100% because the virus isolates used for sequencing in our study may not have the same passage history as the isolates whose sequences have been deposited in GenBank.

A key question was whether NGS technology can detect and distinguish a mixed infection with 2 PRRSV strains. We found that, by using routine sequence data analysis procedures (mapping sequence reads to the reference PRRSV genome library followed by de novo assembly), NGS was able to detect and distinguish a mixture containing a PRRSV-1 Lelystad strain and a PRRSV-2 VR-2385 strain (these 2 strains have 57.8% nucleotide identity at the whole genome level) with their respective full genomic sequences obtained. However, for a mixture containing a PRRSV-2 vaccine strain (the MLV vaccine^a) and a PRRSV-2 VR-2385 strain (these 2 strains have 92.4% nucleotide identity at the whole genome level), routine sequence analysis procedures including de novo assembly yielded a single consensus of the 2 genome sequences but could not assemble separate genome sequences for the 2 viruses. Use of another bioinformatics program (i.e., IGV software) showed nucleotide variations at individual positions of the sequence reads and enabled us to demonstrate the presence of viral quasi-species or more than 1 PRRSV strain. Nevertheless, the IGV software could not automatically distinguish the MLV vaccine from the VR-2385 sequence reads to assemble them into their respective full genomic sequences. If at least 1 PRRSV isolate in the mixture has a known genome sequence, it may be possible to manually determine whether a nucleotide at a given position belongs to this PRRSV isolate or not. Generally speaking, intratypic pairwise nucleotide sequence variations are up to 30% among PRRSV-1 viruses and more than 21% among PRRSV-2 viruses, based on ORF5 sequences.⁴⁷ The MLV vaccine and VR-2385 viruses evaluated in our study differ at ~7.6% of nucleotides (whole genome identity of 92.4%). Based on the current study data, we cannot conclude that NGS is unable to distinguish and obtain genome sequences of 2 PRRSV-1 viruses or 2 PRRSV-2 viruses in a mixture; the outcome may depend on how different the 2 viruses in the mixture are. This area remains to be further investigated.
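For readers who wish to compute figures such as the 57.8% and 92.4% whole genome identities on their own sequences, a minimal sketch of pairwise percent identity over a precomputed alignment follows. The convention used here (columns where both sequences have a gap are skipped; a gap opposite a base counts as a mismatch) is an assumption, as the exact convention used in the study is not stated, and the function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity between two sequences from the same pairwise alignment.

    Columns where both sequences carry a gap are ignored; a gap opposite a base
    is counted as a mismatch (one of several conventions in use).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable alignment columns")
    return 100.0 * matches / compared


# Toy alignment: 9 of 10 columns match.
print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
```

The percent difference between two genomes is then simply 100 minus this value.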

From the laboratory and swine industry points of view, it is important to establish an efficient procedure to determine whole genome sequences directly from clinical samples. Serum, oral fluid, and lung samples are currently the 3 most common specimen types submitted by swine veterinarians for PRRSV PCR testing. Therefore, serum, lung, and oral fluid were chosen as clinical specimen types in our study to evaluate the performance of NGS technology in determining whole genome sequences of PRRSV. Full-length genomic sequences were successfully obtained from serum samples with Ct values 15.1–23.6, lung samples with Ct values 16.4–21.0, and oral fluid samples with Ct values 20.4–20.6; the 2 oral fluids with Ct values of 18.7 and 18.8 yielded near-full-length sequences with a single gap of <150 bp each. When Ct values were ≥24.3 for serum samples, ≥21.3 for lung samples, and ≥22.9 for oral fluids, full-length sequences could not be obtained, and only partial contigs or no contigs were assembled. This implies that the different specimen types evaluated in this study did not markedly affect the NGS efficiency; instead, the Ct value of PRRSV in a sample is inversely related to the NGS outcome. In addition to Ct values of PRRSV, the sample quality and PRRSV RNA integrity could also be potential factors affecting NGS outcomes on various samples. The RNA integrity number (RIN) determined by a bioanalyzer has generally been used to estimate the integrity of eukaryotic and bacterial total RNA. However, it has been a challenge to accurately and specifically assess viral RNA integrity because viruses generally lack rRNA.
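The Ct ranges observed in Tables 2–4 could be encoded as a simple triage rule when deciding whether to submit a sample for whole genome sequencing. The sketch below is only an illustration of that idea: the cutoffs are read from the tables for this 24-plex protocol, the function name and return labels are hypothetical, and individual samples may deviate (for example, the 2 oral fluids with Ct 18.7 and 18.8 each retained 1 small gap).

```python
# Highest Ct at which a gap-free full-length genome was obtained (Tables 2-4).
FULL_LENGTH_MAX_CT = {"serum": 23.6, "lung": 21.0, "oral fluid": 20.6}
# Lowest Ct at which no contigs were assembled (Tables 2-4).
NO_CONTIG_MIN_CT = {"serum": 28.0, "lung": 27.8, "oral fluid": 27.8}


def expected_ngs_outcome(specimen, ct):
    """Rough expectation of the NGS result for a specimen type and PRRSV PCR Ct value."""
    key = specimen.strip().lower()
    if key not in FULL_LENGTH_MAX_CT:
        raise ValueError(f"unknown specimen type: {specimen!r}")
    if ct <= FULL_LENGTH_MAX_CT[key]:
        return "full-length genome plausible"
    if ct < NO_CONTIG_MIN_CT[key]:
        return "partial contigs plausible"
    return "no contigs expected"


for specimen, ct in (("serum", 23.6), ("lung", 21.3), ("oral fluid", 29.0)):
    print(specimen, ct, "->", expected_ngs_outcome(specimen, ct))
```

Because the cutoffs rest on 16 to 19 samples per specimen type, they are best treated as a starting point that each laboratory would need to re-derive for its own protocol and multiplexing level.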

The NGS data obtained in our study together with PRRSV PCR–positive tests (Table 5) suggest that ~43% of PRRSV PCR–positive serum samples submitted to ISU VDL in 2014 had Ct <25 and would likely undergo successful whole genome sequencing by NGS; ~40% of PRRSV PCR–positive lung samples had Ct <22 and would likely undergo successful full-length genome sequencing by NGS; ~0.14% of PRRSV PCR–positive oral fluids had Ct <25 and would likely undergo successful whole genome sequencing by NGS. We do not recommend performing NGS on clinical serum, lung, and oral fluid samples with Ct >25 in the expectation of obtaining full-length PRRSV genomic sequences from these samples.

Table 5.

Number and percentage of Porcine reproductive and respiratory syndrome virus (PRRSV) PCR–positive tests with stratified threshold cycle (Ct) values in 2014 at the Iowa State University Veterinary Diagnostic Laboratory (Ames, Iowa).

No. of PRRSV PCR–positive tests with stratified Ct values
Specimen	<10	10–<20	20–<25	25–<28	28–<31	31–<33	33–<37	Total
Serum	4	1,575	2,234	1,500	1,435	821	1,194	8,763
Lung	0	847	1,086	397	362	250	377	3,319
Oral fluid	0	1	9	390	1,839	1,685	3,089	7,013
Percentage of PRRSV PCR–positive tests with stratified Ct values
Specimen	<10	10–<20	20–<25	25–<28	28–<31	31–<33	33–<37	Total (%)
Serum	0.05	17.97	25.49	17.12	16.38	9.37	13.63	100.00
Lung	0.00	25.52	32.72	11.96	10.91	7.53	11.36	100.00
Oral fluid	0.00	0.01	0.13	5.56	26.22	24.03	44.05	100.00
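The share of PCR-positive tests below Ct 25 can be recomputed from the counts in Table 5. The sketch below is only an arithmetic check; the variable and function names are illustrative. Note that the ~40% figure quoted for lung samples uses a Ct <22 cutoff, which cannot be reproduced from the binned counts, so the lung value printed here refers to Ct <25 instead.

```python
# Counts of PRRSV PCR-positive tests per Ct bin, copied from Table 5.
CT_BINS = ["<10", "10-<20", "20-<25", "25-<28", "28-<31", "31-<33", "33-<37"]
COUNTS = {
    "serum": [4, 1575, 2234, 1500, 1435, 821, 1194],
    "lung": [0, 847, 1086, 397, 362, 250, 377],
    "oral fluid": [0, 1, 9, 390, 1839, 1685, 3089],
}

# The first three bins together cover Ct <25.
BINS_BELOW_25 = 3


def percent_below_ct25(specimen: str) -> float:
    """Percentage of PCR-positive tests for a specimen type with Ct <25."""
    counts = COUNTS[specimen]
    return 100.0 * sum(counts[:BINS_BELOW_25]) / sum(counts)


if __name__ == "__main__":
    for specimen in COUNTS:
        print(f"{specimen}: {percent_below_ct25(specimen):.2f}% with Ct <25")
```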

Using the traditional Sanger method, all serum, lung, and oral fluid samples with Ct <31 had >95% success rates in PRRSV ORF5 sequencing; for serum, lung, and oral fluid samples with Ct 31–<33, the ORF5 sequencing success rates were 92.1%, 76.1%, and 86.7%, respectively; for serum, lung, and oral fluid samples with Ct 33–<37, the ORF5 sequencing success rates were 78.8%, 29.8%, and 57.2%, respectively (Table 6). Thus, for clinical serum, lung, and oral fluid samples with Ct 25–<37, NGS may not be a good choice for whole genome sequencing, but ORF5 sequencing using the Sanger method would have fairly good success rates on these samples, although success declines at Ct ≥31 (particularly for lung samples), and could be an option for swine practitioners, diagnosticians, and researchers.

Table 6.

Correlation of Porcine reproductive and respiratory syndrome virus (PRRSV) real-time reverse transcription PCR threshold cycle (Ct) values to ORF5 Sanger sequencing (Seq) outcome based on specimen types.*

Specimen	Ct range	Total	ORF5 Seq Neg	ORF5 Seq Pos	ORF5 Seq success rate (%)
Serum	<25	933	1	932	99.89
	25–<28	389	3	386	99.23
	28–<31	295	4	291	98.64
	31–<33	178	14	164	92.13
	33–<37	165	35	130	78.79
Subtotal		1,960	57	1,903	97.09
Lung	<25	821	2	819	99.76
	25–<28	119	1	118	99.16
	28–<31	110	4	106	96.36
	31–<33	67	16	51	76.12
	33–<37	57	40	17	29.82
Subtotal		1,174	63	1,111	94.63
Oral fluid	<25	2	0	2	100.00
	25–<28	91	0	91	100.00
	28–<31	365	16	349	95.62
	31–<33	293	39	254	86.69
	33–<37	407	174	233	57.25
Subtotal		1,158	229	929	80.22

* Iowa State University Veterinary Diagnostic Laboratory (Ames, Iowa) 2014–2015 data. Only 31 sera, 30 lungs, and 13 oral fluid samples containing PRRSV-1 were sequenced, and these samples had scattered Ct distributions. Numbers shown in this table represent the sum of both PRRSV-1 and -2 strains for each specimen at each Ct range.
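The success rates in Table 6 are the number of ORF5-positive results divided by the number of samples tested in each stratum. As a minimal sketch (the names below are illustrative, and the pooled Ct 25–<37 figures are derived here from the table rather than reported in the study), the counts can be pooled across Ct ranges as follows:

```python
# (total tested, ORF5 sequencing positive) per Ct range, copied from Table 6.
TABLE6 = {
    "serum": {"<25": (933, 932), "25-<28": (389, 386), "28-<31": (295, 291),
              "31-<33": (178, 164), "33-<37": (165, 130)},
    "lung": {"<25": (821, 819), "25-<28": (119, 118), "28-<31": (110, 106),
             "31-<33": (67, 51), "33-<37": (57, 17)},
    "oral fluid": {"<25": (2, 2), "25-<28": (91, 91), "28-<31": (365, 349),
                   "31-<33": (293, 254), "33-<37": (407, 233)},
}

# Ct ranges for which NGS is not recommended in the text.
HIGH_CT_RANGES = ["25-<28", "28-<31", "31-<33", "33-<37"]


def pooled_success_rate(specimen: str, ct_ranges: list) -> float:
    """ORF5 Sanger success rate (%) pooled over the given Ct ranges."""
    total = sum(TABLE6[specimen][r][0] for r in ct_ranges)
    positive = sum(TABLE6[specimen][r][1] for r in ct_ranges)
    return 100.0 * positive / total


if __name__ == "__main__":
    for specimen in TABLE6:
        rate = pooled_success_rate(specimen, HIGH_CT_RANGES)
        print(f"{specimen}: {rate:.2f}% ORF5 success for Ct 25-<37")
```

Pooling hides the drop at the highest Ct ranges (for example, 29.82% for lung at Ct 33–<37), so the per-stratum values in the table remain the more informative guide.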

It must be noted that, in our study, 24 samples were multiplexed in a single sequencing run. If fewer samples were multiplexed per run, the success rate of obtaining PRRSV whole genome sequences from clinical specimens using the MiSeq system could be higher, because each sample would receive a larger share of the reads, but the cost per sample would increase accordingly. Another limitation of the current study is that only 51 serum, lung, and oral fluid samples with different ranges of PRRSV PCR Ct values were tested by NGS; testing more clinical samples in each Ct range would increase statistical power and support a stronger conclusion on the relationship between Ct values and NGS success rate. This warrants further investigation in future studies.

We developed an efficient and streamlined NGS approach to quickly determine the complete genomic sequences of PRRSV in cell culture–derived isolates and clinical specimens. Our method does not require sequence-specific primers for PCR amplification or bait-based enrichment as described previously,¹⁷ thus limiting the possibility of primer-based bias in the processing of samples and making it possible to detect any PRRSV variant at the same time. Our strategy is also different from that described in some previous studies,^22,23 which relies on bioinformatic filtering of host reads. In our NGS approach, total RNA is directly extracted from virus isolates and/or clinical specimens, and the cDNA libraries of different samples can be indexed and simultaneously sequenced, making it an efficient approach for use in diagnostic investigations and research. Regarding data analysis, previous methods of reference mapping usually used 1 reference genome,² which restricts their use to samples containing known virus strains. However, considering the high genetic diversity among PRRSV strains, a single reference genome might not be similar enough for mapping reads, and consensus sequences may not be constructed successfully using the SAMtools mpileup command.²¹ Therefore, we built a reference genome library containing all of the publicly available PRRSV genome sequences to extend the range of mapping targets. Usually, more reads can be mapped onto the reference genome library than onto a single genome, and thereby more reads of the viruses of interest can be collected for de novo assembly.

The approach described in the current study has also been used to successfully obtain the whole genome sequences of other RNA viruses, such as Porcine epidemic diarrhea virus, porcine deltacoronavirus, Equine arteritis virus, and Senecavirus A, in the form of cell culture–derived isolates or clinical specimens with high concentrations of virus.^7,8,19,29,46 However, the success rate of determining the whole genome sequences of PRRSV and other viruses from clinical samples with high Ct values was not satisfactory using our current NGS approach. Additional work remains to improve the success rate of whole genome sequencing using NGS technology on clinical samples with low viral concentrations.
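The data-analysis strategy described above (map reads to a library of PRRSV genomes, collect the mapped reads, then assemble them de novo) can be sketched as a sequence of command lines. This is a hedged illustration, not the study's exact commands: the file names, the k-mer size, and the thread count are hypothetical, and the options shown assume recent BWA and SAMtools releases rather than the versions used in the study (for example, `bwa mem -o` and `samtools fastq` are not available in the older releases). The function only builds the commands and executes nothing.

```python
import shlex


def build_pipeline(sample, reads_r1, reads_r2,
                   ref_library="prrsv_library.fasta",  # hypothetical file name
                   kmer=48,                             # hypothetical k-mer size
                   threads=4):
    """Return command lists for one sample; nothing is run here."""
    sam = f"{sample}.sam"
    mapped_bam = f"{sample}.mapped.bam"
    mapped_r1 = f"{sample}.mapped_R1.fastq"
    mapped_r2 = f"{sample}.mapped_R2.fastq"
    return [
        # Index the multi-genome PRRSV reference library.
        ["bwa", "index", ref_library],
        # Map paired-end reads against the whole library.
        ["bwa", "mem", "-t", str(threads), "-o", sam,
         ref_library, reads_r1, reads_r2],
        # Keep pairs in which both the read and its mate mapped
        # (-F 12 drops records flagged read-unmapped or mate-unmapped).
        ["samtools", "view", "-b", "-F", "12", "-o", mapped_bam, sam],
        # Convert the retained pairs back to FASTQ for assembly.
        ["samtools", "fastq", "-1", mapped_r1, "-2", mapped_r2,
         "-0", "/dev/null", "-s", "/dev/null", "-n", mapped_bam],
        # De novo assembly of the collected viral reads with ABySS.
        ["abyss-pe", f"name={sample}", f"k={kmer}",
         f"in={mapped_r1} {mapped_r2}"],
    ]


if __name__ == "__main__":
    for cmd in build_pipeline("sample01", "sample01_R1.fastq", "sample01_R2.fastq"):
        print(shlex.join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the tools are installed; keeping the commands as data makes it straightforward to review them before running 24 multiplexed samples.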

Footnotes

Acknowledgements

We thank Drs. Chandrasen Soans, Michael Bishop, and others from Illumina for technical training on next-generation sequencing.

Authors’ note

Jianqiang Zhang and Ying Zheng contributed equally to this work.

Authors’ contributions

J Zhang contributed to conception and design of the study; contributed to analysis and interpretation of data; drafted the manuscript; critically revised the manuscript; and gave final approval. Y Zheng contributed to acquisition and analysis of data, and drafted the manuscript. XQ Xia contributed to analysis of data. Q Chen and SA Bade contributed to acquisition of data and critically revised the manuscript. KJ Yoon contributed to conception of the study; critically revised the manuscript; and gave final approval. KM Harmon, PC Gauger, and RG Main critically revised the manuscript and gave final approval. G Li contributed to conception and design of the study; contributed to acquisition, analysis, and interpretation of data; drafted the manuscript; critically revised the manuscript; and gave final approval. All authors agreed to be accountable for all aspects of the work in ensuring that questions relating to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

a. MLV vaccine: Ingelvac PRRS MLV, Boehringer Ingelheim Vetmedica, St. Joseph, MO.

b. ATP vaccine: Ingelvac PRRS ATP, Boehringer Ingelheim Vetmedica, St. Joseph, MO.

c. F vaccine: Fostera PRRS, Zoetis, Florham Park, NJ.

d. VetMax NA and EU PRRSV reagents, Thermo Fisher Scientific, Waltham, MA.

e. Earle’s balanced salt solution, Sigma-Aldrich, St. Louis, MO.

f. Phosphate buffered saline (PBS, 1× pH 7.4), Thermo Fisher Scientific, Waltham, MA.

g. Kingfisher-96 magnetic particle processor, Thermo Fisher Scientific, Waltham, MA.

h. MagMAX-96 viral RNA isolation kit, Thermo Fisher Scientific, Waltham, MA.

i. Total RNA purification kit, Norgen Biotek, Thorold, Ontario, Canada.

j. Qubit 2.0 spectrophotometer, Thermo Fisher Scientific, Waltham, MA.

k. Qubit RNA BR (Broad-Range) assay kit, Thermo Fisher Scientific, Waltham, MA.

l. TruSeq stranded total RNA sample preparation kit, Illumina, San Diego, CA.

m. Agencourt RNAClean XP, Beckman Coulter, Brea, CA.

n. Agilent 2100 Bioanalyzer, Agilent Technologies, Santa Clara, CA.

o. Agencourt AMPure XP, Beckman Coulter, Brea, CA.

p. KAPA library quantification kit, KAPA Biosystems, Wilmington, MA.

q. Illumina MiSeq sequencer, Illumina, San Diego, CA.

r. Burrows-Wheeler Aligner (BWA)-MEM (v0.7.5a).

s. Sequence Alignment/Map tools (SAMtools, v0.1.19).

t. Assembly By Short Sequences (ABySS, v1.3.7).

u. DNASTAR Lasergene 11 Core Suite, DNASTAR, Madison, WI.

v. MagMAX pathogen RNA/DNA kit, Thermo Fisher Scientific, Waltham, MA.

w. ABI 7500 Fast instrument, Thermo Fisher Scientific, Waltham, MA.

x. qScript custom one-step RT-PCR kit, Quanta Biosciences, Gaithersburg, MD.

y. QIAxcel advanced system, Qiagen, Valencia, CA.

z. ExoSAP-IT kit, Affymetrix, Santa Clara, CA.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported in part by funds from Zoetis and the Iowa State University Veterinary Diagnostic Laboratory. Illumina provided in-kind support of some reagents.

References

1. Allende, et al. North American and European porcine reproductive and respiratory syndrome viruses differ in non-structural protein coding regions. J Gen Virol 1999;80:307–315.

2. Batty, et al. A modified RNA-Seq approach for whole genome sequencing of RNA viruses from faecal and blood samples. PLoS One 2013;8:e66129.

3. Brockmeier, et al. Genomic sequence and virulence comparison of four type 2 porcine reproductive and respiratory syndrome virus strains. Virus Res 2012;169:212–221.

4. Butler, et al. Porcine reproductive and respiratory syndrome (PRRS): an immune dysregulatory pandemic. Immunol Res 2014;59:81–108.

5. Cavanagh. Nidovirales: a new order comprising Coronaviridae and Arteriviridae. Arch Virol 1997;142:629–633.

6. Chen, et al. Genetic variation of Chinese PRRSV strains based on ORF5 sequence. Biochem Genet 2006;44:425–435.

7. Chen, et al. Pathogenicity and pathogenesis of a United States porcine deltacoronavirus cell culture isolate in 5-day-old neonatal piglets. Virology 2015;482:51–59.

8. Chen, et al. Isolation and characterization of porcine epidemic diarrhea viruses associated with the 2013 disease outbreak among swine in the United States. J Clin Microbiol 2014;52:234–243.

9. Fabre, et al. Modelling the evolutionary dynamics of viruses within their hosts: a case study using high-throughput sequencing. PLoS Pathog 2012;8:e1002654.

10. Fang, et al. Heterogeneity in Nsp2 of European-like porcine reproductive and respiratory syndrome viruses isolated in the United States. Virus Res 2004;100:229–235.

11. Goldberg, et al. Genetic, geographical and temporal variation of porcine reproductive and respiratory syndrome virus in Illinois. J Gen Virol 2000;81:171–179.

12. Holtkamp, et al. Assessment of the economic impact of porcine reproductive and respiratory syndrome virus on United States pork producers. J Swine Health Prod 2013;21:72–84.

13. Kappes, Faaberg. PRRSV structure, replication and recombination: origin of phenotype and genotype diversity. Virology 2015;479–480:475–486.

14. Key, et al. Genetic variation and phylogenetic analyses of the ORF5 gene of acute porcine reproductive and respiratory syndrome virus isolates. Vet Microbiol 2001;83:249–263.

15. Kim, et al. Enhanced replication of porcine reproductive and respiratory syndrome (PRRS) virus in a homogeneous subpopulation of MA-104 cell line. Arch Virol 1993;133:477–483.

16. Kimman, et al. Challenges for porcine reproductive and respiratory syndrome virus (PRRSV) vaccinology. Vaccine 2009;27:3704–3718.

17. Kvisgaard, et al. A fast and robust method for full genome sequencing of porcine reproductive and respiratory syndrome virus (PRRSV) type 1 and type 2. J Virol Methods 2013;193:697–705.

18. Ladinig, et al. Variation in fetal outcome, viral load and ORF5 sequence mutations in a large scale study of phenotypic responses to late gestation exposure to type 2 porcine reproductive and respiratory syndrome virus. PLoS One 2014;9:e96104.

19. , et al. Full-length genome sequence of porcine deltacoronavirus strain USA/IA/2014/8734. Genome Announc 2014;2:e00278-14.

20. Durbin. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 2009;25:1754–1760.

21. , et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078–2079.

22. , et al. Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing. Virol J 2014;11:42.

23. , et al. Complete genome sequence of a pathogenic genotype 1 subtype 3 porcine reproductive and respiratory syndrome virus (strain SU1-Bel) from pig primary tissue. Genome Announc 2015;3:e00340-15.

24. Martín-Valls, et al. Analysis of ORF5 and full-length genome sequences of porcine reproductive and respiratory syndrome virus isolates of genotypes 1 and 2 retrieved worldwide provides evidence that recombination is a common phenomenon and may produce mosaic isolates. J Virol 2014;88:3170–3181.

25. Meeusen, et al. Current status of veterinary vaccines. Clin Microbiol Rev 2007;20:489–510.

26. Metzker. Sequencing technologies - the next generation. Nat Rev Genet 2010;11:31–46.

27. Meulenberg, et al. Lelystad virus, the causative agent of porcine epidemic abortion and respiratory syndrome (PEARS), is related to LDV and EAV. Virology 1993;192:62–72.

28. Murtaugh, et al. The ever-expanding diversity of porcine reproductive and respiratory syndrome virus. Virus Res 2010;154:18–30.

29. Nam, et al. Complete genome sequence of noncytopathic bovine viral diarrhea virus 1 contaminating a high-passage RK-13 cell line. Genome Announc 2015;3:e01115-15.

30. Nelsen, et al. Porcine reproductive and respiratory syndrome virus comparison: divergent evolution on two continents. J Virol 1999;73:270–280.

31. , et al. Establishment of a DNA-launched infectious clone for a highly pneumovirulent strain of type 2 porcine reproductive and respiratory syndrome virus: identification and in vitro and in vivo characterization of a large spontaneous deletion in the nsp2 region. Virus Res 2011;160:264–273.

32. Nielsen, et al. Generation of an infectious clone of VR-2332, a highly virulent North American-type isolate of porcine reproductive and respiratory syndrome virus. J Virol 2003;77:3702–3711.

33. Pallen. Diagnostic metagenomics: potential applications to bacterial, viral and parasitic infections. Parasitology 2014;141:1856–1862.

34. Pirzadeh, et al. Genomic and antigenic variations of porcine reproductive and respiratory syndrome virus major envelope GP5 glycoprotein. Can J Vet Res 1998;62:170–177.

35. Renukaradhya, et al. Live porcine reproductive and respiratory syndrome virus vaccines: current status and future direction. Vaccine 2015;33:4069–4080.

36. Robinson, et al. Integrative genomics viewer. Nat Biotechnol 2011;29:24–26.

37. Simpson, et al. ABySS: a parallel assembler for short read sequence data. Genome Res 2009;19:1117–1123.

38. Snijder, et al. Arterivirus molecular biology and pathogenesis. J Gen Virol 2013;94:2141–2163.

39. Stadejek, et al. Molecular evolution of PRRSV in Europe: current state of play. Vet Microbiol 2013;165:21–28.

40. Tang, et al. Positive effects of porcine IL-2 and IL-4 on virus-specific immune responses induced by the porcine reproductive and respiratory syndrome virus (PRRSV) ORF5 DNA vaccine in swine. J Vet Sci 2014;15:99–109.

41. Thorvaldsdóttir, et al. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Brief Bioinform 2013;14:178–192.

42. Wang, et al. Metagenomic sequencing reveals microbiota and its functional potential associated with periodontal disease. Sci Rep 2013;3:1843.

43. Wang, et al. Attenuation of porcine reproductive and respiratory syndrome virus strain MN184 using chimeric construction with vaccine sequence. Virology 2008;371:418–429.

44. Wernike, et al. Porcine reproductive and respiratory syndrome virus: interlaboratory ring trial to evaluate real-time reverse transcription polymerase chain reaction detection methods. J Vet Diagn Invest 2012;24:855–866.

45. Yin, et al. Genetic diversity of the ORF5 gene of porcine reproductive and respiratory syndrome virus isolates in southwest China from 2007 to 2009. PLoS One 2012;7:e33756.

46. Zhang, et al. Full-length genome sequences of Senecavirus A from recent idiopathic vesicular disease outbreaks in U.S. swine. Genome Announc 2015;3:e01270-15.

47. Zimmerman, et al. Porcine reproductive and respiratory syndrome virus (porcine arterivirus). In: Zimmerman, et al., eds. Diseases of Swine. 10th ed. Ames, IA: Wiley-Blackwell, 2012:461–486.