TY - JOUR
T1 - Improved detection of artifactual viral minority variants in high-throughput sequencing data
AU - Welkers, Matthijs R. A.
AU - Jonges, Marcel
AU - Jeeninga, Rienk E.
AU - Koopmans, Marion P. G.
AU - de Jong, Menno D.
PY - 2015
Y1 - 2015
N2 - High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by duplicate reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after "best practice" quality control (QC), one minority variant with a frequency >0.5% was identified within the plasmid pool, while 84 and 139 were identified in the two RT-PCR-amplified samples, indicating that RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and an uneven distribution of mismatches over the length of the reads. We demonstrate that by adding two QC steps, 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples, 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS.
AB - High-throughput sequencing (HTS) of viral samples provides important information on the presence of viral minority variants. However, detection and accurate quantification are limited by the capacity to distinguish biological from artificial variation. In this study, errors related to the Illumina HiSeq2000 library generation and HTS process were investigated by determining minority variant frequencies in an influenza A/WSN/1933(H1N1) virus reverse-genetics plasmid pool. Errors related to amplification and sequencing were determined using the same plasmid pool, by generation of infectious virus using reverse genetics followed by duplicate reverse-transcriptase PCR (RT-PCR) amplification and HTS in the same sequence run. Results showed that after "best practice" quality control (QC), one minority variant with a frequency >0.5% was identified within the plasmid pool, while 84 and 139 were identified in the two RT-PCR-amplified samples, indicating that RT-PCR amplification artificially increased variation. Detailed analysis showed that artifactual minority variants could be identified by two major technical characteristics: their predominant presence in a single read orientation and an uneven distribution of mismatches over the length of the reads. We demonstrate that by adding two QC steps, 95% of the artifactual minority variants could be identified. When our analysis approach was applied to three clinical samples, 68% of the initially identified minority variants were identified as artifacts. Our study clearly demonstrated that, without additional QC steps, overestimation of viral minority variants is very likely to occur, mainly as a consequence of the required RT-PCR amplification step. The improved ability to detect and correct for artifactual minority variants increases data resolution and could aid both past and future studies incorporating HTS.
U2 - https://doi.org/10.3389/fmicb.2014.00804
DO - https://doi.org/10.3389/fmicb.2014.00804
M3 - Article
C2 - 25657642
SN - 1664-302X
VL - 5
SP - 804
JO - Frontiers in Microbiology
JF - Frontiers in Microbiology
ER -