Enhanced bioinformatic profiling of VIDISCA libraries for virus detection and discovery

Research output: Contribution to journal, Article (academic, peer-reviewed)

12 Citations (Scopus)

Abstract

VIDISCA is a next-generation sequencing (NGS) library preparation method designed to enrich viral nucleic acids in samples prior to highly multiplexed, low-depth sequencing. Reliable detection of known viruses and discovery of novel, divergent viruses from NGS data require dedicated analysis tools that are both sensitive and accurate. Here, existing software tools were combined into a new bioinformatic workflow for high-throughput detection and discovery of viruses from VIDISCA data. The workflow exploits a feature of VIDISCA library preparation: digestion with the MseI restriction enzyme, which produces identical library inserts (biological replicates) from identical genome copies. The workflow performs a total metagenomic analysis to classify non-viral sequences, including those from parasites and the host, and separately carries out virus-specific analyses. Ribosomal RNA sequences are removed to speed up downstream analysis, and the remaining reads are clustered at 100% identity. Known and novel viruses are detected sensitively by alignment to a virus-only protein database, after which false positives are removed. A new cluster-profiling analysis takes advantage of the viral biological replicates produced by MseI digestion, using read clustering to flag short genomes present at very high copy number. Importantly, this analysis identifies highly repeated sequences even when no homology to known sequences is detected, as shown here by the detection of a novel gokushovirus genome in human faecal matter. The workflow was validated using read data from serum and faeces samples taken from HIV-1-positive adults, and from serum samples of pigs infected with atypical porcine pestivirus.
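The cluster-profiling idea can be illustrated with a minimal Python sketch. It is not the published workflow's code: the function names, the `min_size` threshold and the toy genome are hypothetical, "clustering at 100% identity" is simplified to grouping exactly identical read strings, and the MseI digest is modelled on one strand of a linear sequence only (MseI recognises TTAA and cuts T^TAA). Because every copy of a short genome yields the same fragments, a genome at high copy number shows up as a few very large clusters, whether or not any of its fragments has a database homologue.

```python
import re
from collections import Counter

# MseI recognition site TTAA; the enzyme cuts between the first T and TAA (T^TAA).
MSEI_SITE = re.compile(r"(?=TTAA)")


def msei_fragments(genome: str) -> list[str]:
    """In-silico MseI digest of a single, linear strand (simplifying assumption)."""
    cuts = [m.start() + 1 for m in MSEI_SITE.finditer(genome)]
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def cluster_identical(reads: list[str]) -> Counter:
    """Cluster at 100% identity, taken here to mean exact string equality."""
    return Counter(reads)


def profile_clusters(reads: list[str], min_size: int) -> dict[str, int]:
    """Flag clusters whose read count reaches min_size (hypothetical threshold)."""
    clusters = cluster_identical(reads)
    return {seq: n for seq, n in clusters.items() if n >= min_size}


if __name__ == "__main__":
    # Toy short genome present at high copy number (hypothetical sequence).
    genome = "ACGTTAAGGCTTAACCGGA"
    replicate_reads = msei_fragments(genome) * 50
    background = ["ACGTACGTAC", "GGGCCCATAT", "TTTGGGCCCA"]  # unique reads
    flagged = profile_clusters(replicate_reads + background, min_size=10)
    print(flagged)  # → {'ACGT': 50, 'TAAGGCT': 50, 'TAACCGGA': 50}
```

A real implementation would also need to treat reads from both strands (reverse complements) and reads carrying sequencing errors as members of the same cluster, which exact string matching does not do.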
Original language: English
Pages (from-to): 21-26
Journal: Virus Research
Volume: 263
Publication status: Published, 2019
