Euclidean Distance Analysis Enables Nucleotide Skew Analysis in Viral Genomes

F. van Hemert; M. Jebbink; A. van der Ark; F. Scholer; B. Berkhout

doi:10.1155/2018/6490647

Research output: Contribution to journal › Article › Academic › peer-review

6 Citations (Scopus)

Abstract

Nucleotide skew analysis is a versatile method for studying the nucleotide composition of RNA/DNA molecules, in particular for revealing characteristic sequence signatures. For instance, skew analysis of several viral RNA genomes indicated that the nucleotide bias is enriched in the unpaired, single-stranded genome regions, thus creating an even more striking virus-specific signature. However, comparing skew graphs across many virus isolates or families is difficult, time-consuming, and nonquantitative. Here, we present a procedure for simpler identification of similarities and dissimilarities between nucleotide skew data of coronavirus, flavivirus, picornavirus, and HIV-1 RNA genomes. Window and step sizes were normalized to correct for differences in viral genome length. Cumulative skew data are converted into pairwise Euclidean distance matrices, which can be presented as neighbor-joining trees. We present skew value trees for the four virus families and show that closely related viruses are placed in small clusters. Importantly, the skew value trees are similar to the trees constructed by a “classical” model of evolutionary nucleotide substitution. Thus, we conclude that the simple calculation of Euclidean distances between nucleotide skew data allows an easy and quantitative comparison of characteristic sequence signatures of virus genomes. These results indicate that Euclidean distance analysis of nucleotide skew data is a useful addition to the virology toolbox.
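
The pipeline outlined in the abstract (length-normalized windows, cumulative skew, pairwise Euclidean distances) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the skew formula, so the sketch assumes that skew is the deviation of each nucleotide's window frequency from 0.25, that windows do not overlap (step size equal to window size), and that every genome is split into the same number of windows. The names `cumulative_skew`, `distance_matrix`, and `n_windows` are hypothetical.

```python
import math

NUCLEOTIDES = "ACGU"


def cumulative_skew(seq, n_windows=50):
    """Return a cumulative skew profile with 4 * n_windows values.

    Assumption (not from the source): skew is the window frequency of
    each nucleotide minus 0.25, summed cumulatively along the genome.
    Splitting every genome into the same number of windows makes the
    window size proportional to genome length.
    """
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    length = len(seq)
    if length < n_windows:
        raise ValueError("sequence is shorter than the number of windows")
    totals = {n: 0.0 for n in NUCLEOTIDES}
    profile = []
    for i in range(n_windows):
        # Integer boundaries; each window holds at least one nucleotide.
        start = i * length // n_windows
        end = (i + 1) * length // n_windows
        window = seq[start:end]
        for n in NUCLEOTIDES:
            totals[n] += window.count(n) / len(window) - 0.25
            profile.append(totals[n])
    return profile


def euclidean(p, q):
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def distance_matrix(genomes, n_windows=50):
    """Pairwise Euclidean distances between cumulative skew profiles.

    `genomes` maps a name to a sequence string.
    """
    names = list(genomes)
    profiles = [cumulative_skew(genomes[name], n_windows) for name in names]
    matrix = [[euclidean(p, q) for q in profiles] for p in profiles]
    return names, matrix


if __name__ == "__main__":
    # Toy sequences for illustration only, not real viral genomes.
    toy = {
        "a_rich": "AAAAUAAACAAG" * 20,
        "a_rich_long": "AAAAUAAACAAG" * 40,
        "balanced": "ACGU" * 60,
    }
    names, matrix = distance_matrix(toy, n_windows=10)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

The resulting symmetric matrix is the kind of input a neighbor-joining implementation expects. Tree construction itself is left out of the sketch.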

Original language	English
Article number	6490647
Number of pages	9
Journal	Computational and Mathematical Methods in Medicine
Volume	2018
DOIs	https://doi.org/10.1155/2018/6490647
Publication status	Published - 30 Oct 2018

Access to Document

https://doi.org/10.1155/2018/6490647 (Licence: CC BY)

https://pure.uva.nl/ws/files/32965469/6490647.pdf (Licence: CC BY)

Cite this

@article{a1eb71bef20749b4aad4aa533569689e,

title = "Euclidean Distance Analysis Enables Nucleotide Skew Analysis in Viral Genomes",

abstract = "Nucleotide skew analysis is a versatile method to study the nucleotide composition of RNA/DNA molecules, in particular to reveal characteristic sequence signatures. For instance, skew analysis of the nucleotide bias of several viral RNA genomes indicated that it is enriched in the unpaired, single-stranded genome regions, thus creating an even more striking virus-specific signature. The comparison of skew graphs for many virus isolates or families is difficult, time-consuming, and nonquantitative. Here, we present a procedure for a more simple identification of similarities and dissimilarities between nucleotide skew data of coronavirus, flavivirus, picornavirus, and HIV-1 RNA genomes. Window and step sizes were normalized to correct for differences in length of the viral genome. Cumulative skew data are converted into pairwise Euclidean distance matrices, which can be presented as neighbor-joining trees. We present skew value trees for the four virus families and show that closely related viruses are placed in small clusters. Importantly, the skew value trees are similar to the trees constructed by a “classical” model of evolutionary nucleotide substitution. Thus, we conclude that the simple calculation of Euclidean distances between nucleotide skew data allows an easy and quantitative comparison of characteristic sequence signatures of virus genomes. These results indicate that the Euclidean distance analysis of nucleotide skew data forms a nice addition to the virology toolbox.",

author = "{van Hemert}, F. and M. Jebbink and {van der Ark}, A. and F. Scholer and B. Berkhout",

note = "With supplementary file.",

year = "2018",

month = oct,

day = "30",

doi = "10.1155/2018/6490647",

language = "English",

volume = "2018",

journal = "Computational and Mathematical Methods in Medicine",

issn = "1748-670X",

publisher = "Hindawi Publishing Corporation",

}

TY - JOUR

T1 - Euclidean Distance Analysis Enables Nucleotide Skew Analysis in Viral Genomes

AU - van Hemert, F.

AU - Jebbink, M.

AU - van der Ark, A.

AU - Scholer, F.

AU - Berkhout, B.

N1 - With supplementary file.

PY - 2018/10/30

Y1 - 2018/10/30

AB - Nucleotide skew analysis is a versatile method to study the nucleotide composition of RNA/DNA molecules, in particular to reveal characteristic sequence signatures. For instance, skew analysis of the nucleotide bias of several viral RNA genomes indicated that it is enriched in the unpaired, single-stranded genome regions, thus creating an even more striking virus-specific signature. The comparison of skew graphs for many virus isolates or families is difficult, time-consuming, and nonquantitative. Here, we present a procedure for a more simple identification of similarities and dissimilarities between nucleotide skew data of coronavirus, flavivirus, picornavirus, and HIV-1 RNA genomes. Window and step sizes were normalized to correct for differences in length of the viral genome. Cumulative skew data are converted into pairwise Euclidean distance matrices, which can be presented as neighbor-joining trees. We present skew value trees for the four virus families and show that closely related viruses are placed in small clusters. Importantly, the skew value trees are similar to the trees constructed by a “classical” model of evolutionary nucleotide substitution. Thus, we conclude that the simple calculation of Euclidean distances between nucleotide skew data allows an easy and quantitative comparison of characteristic sequence signatures of virus genomes. These results indicate that the Euclidean distance analysis of nucleotide skew data forms a nice addition to the virology toolbox.

UR - https://pure.uva.nl/ws/files/32965467/6490647.f1.docx

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85062416963&origin=inward

UR - https://www.ncbi.nlm.nih.gov/pubmed/30510593

U2 - https://doi.org/10.1155/2018/6490647

DO - 10.1155/2018/6490647

M3 - Article

C2 - 30510593

SN - 1748-670X

VL - 2018

JO - Computational and Mathematical Methods in Medicine

JF - Computational and Mathematical Methods in Medicine

M1 - 6490647

ER -
