Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution

Boas C. L. van der Putten; Niek A. H. Huijsmans; Daniel R. Mende; Constance Schultsz

doi:10.1099/mgen.0.000799

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95 % average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 19 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and ska achieve similar accuracy as reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or skesa outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at the species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.

Original language	English
Article number	000799
Journal	Microbial Genomics
Volume	8
Issue number	3
DOIs	https://doi.org/10.1099/mgen.0.000799
Publication status	Published - 1 Mar 2022

Keywords

benchmarking study
in silico evolution
phylogenetics
simulation

Access to Document

https://doi.org/10.1099/mgen.0.000799

Cite this

@article{6b7da6d3244640c1a9f5110ea801e353,

title = "Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution",

abstract = "Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95 % average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 19 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and ska achieve similar accuracy as reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or skesa outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at the species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.",

keywords = "benchmarking study, in silico evolution, phylogenetics, simulation",

author = "{van der Putten}, {Boas C. L.} and Huijsmans, {Niek A. H.} and Mende, {Daniel R.} and Schultsz, Constance",

note = "Funding Information: B.C.L.P. was supported through an internal Academic Medical Center (AMC) Amsterdam grant ({\textquoteleft}Flexibele OiO beurs{\textquoteright}). The HECTOR research project was supported under the framework of the JPIAMR – Joint Programming Initiative on Antimicrobial Resistance – through the third joint call, thanks to the generous funding by the Netherlands Organisation for Health Research and Development (ZonMw, grant number 547001012), the Federal Ministry of Education and Research (BMBF/DLR grant numbers 01KI1703A, 01KI1703C and 01KI703B), the State Research Agency (AEI) of the Ministry of Science, Innovation and Universities (MINECO, grant number PCIN-2016-096), and the Medical Research Council (MRC, grant number MR/R002762/1). Publisher Copyright: {\textcopyright} 2022 The Authors.",

year = "2022",

month = mar,

day = "1",

doi = "10.1099/mgen.0.000799",

language = "English",

volume = "8",

journal = "Microbial Genomics",

issn = "2057-5858",

publisher = "Microbiology Society",

number = "3",

}

TY - JOUR

T1 - Benchmarking the topological accuracy of bacterial phylogenomic workflows using in silico evolution

AU - van der Putten, Boas C. L.

AU - Huijsmans, Niek A. H.

AU - Mende, Daniel R.

AU - Schultsz, Constance

N1 - Funding Information: B.C.L.P. was supported through an internal Academic Medical Center (AMC) Amsterdam grant (‘Flexibele OiO beurs’). The HECTOR research project was supported under the framework of the JPIAMR – Joint Programming Initiative on Antimicrobial Resistance – through the third joint call, thanks to the generous funding by the Netherlands Organisation for Health Research and Development (ZonMw, grant number 547001012), the Federal Ministry of Education and Research (BMBF/DLR grant numbers 01KI1703A, 01KI1703C and 01KI703B), the State Research Agency (AEI) of the Ministry of Science, Innovation and Universities (MINECO, grant number PCIN-2016-096), and the Medical Research Council (MRC, grant number MR/R002762/1). Publisher Copyright: © 2022 The Authors.

PY - 2022/3/1

Y1 - 2022/3/1

AB - Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as de novo assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95 % average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 19 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and ska achieve similar accuracy as reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of de novo assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or skesa outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at the species level, especially in the absence of suitable reference genomes. We show de novo genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.

KW - benchmarking study

KW - in silico evolution

KW - phylogenetics

KW - simulation

UR - http://www.scopus.com/inward/record.url?scp=85126664639&partnerID=8YFLogxK

U2 - https://doi.org/10.1099/mgen.0.000799

DO - 10.1099/mgen.0.000799

M3 - Article

C2 - 35290758

SN - 2057-5858

VL - 8

JO - Microbial Genomics

JF - Microbial Genomics

IS - 3

M1 - 000799

ER -
