Multiset sparse redundancy analysis for high-dimensional omics data

Attila Csala; Michel H. Hof; Aeilko H. Zwinderman

doi:https://doi.org/10.1002/bimj.201700248

Research output: Contribution to journal › Article › Academic › peer-review

7 Citations (Scopus)

Abstract

Redundancy Analysis (RDA) is a well-known method used to describe the directional relationship between related data sets. Recently, we proposed sparse Redundancy Analysis (sRDA) for high-dimensional genomic data analysis to find explanatory variables that explain the most variance of the response variables. As more and more biomolecular data become available from different biological levels, such as genotypic and phenotypic data from different omics domains, a natural research direction is to apply an integrated analysis approach in order to explore the underlying biological mechanism of certain phenotypes of the given organism. We show that the multiset sparse Redundancy Analysis (multi-sRDA) framework is a prominent candidate for high-dimensional omics data analysis since it accounts for the directional information transfer between omics sets, and, through its sparse solutions, the interpretability of the result is improved. In this paper, we also describe a software implementation for multi-sRDA, based on the Partial Least Squares Path Modeling algorithm. We test our method through simulation and real omics data analysis with data sets of 364,134 methylation markers, 18,424 gene expression markers, and 47 cytokine markers measured on 37 patients with Marfan syndrome.

Original language	English
Pages (from-to)	406-423
Journal	Biometrical journal. Biometrische Zeitschrift
Volume	61
Issue number	2
Early online date	2018
DOIs	https://doi.org/10.1002/bimj.201700248
Publication status	Published - 2019

Cite this

@article{09aa0c77779b4c2d837699ed983124ff,

title = "Multiset sparse redundancy analysis for high-dimensional omics data",

abstract = "Redundancy Analysis (RDA) is a well-known method used to describe the directional relationship between related data sets. Recently, we proposed sparse Redundancy Analysis (sRDA) for high-dimensional genomic data analysis to find explanatory variables that explain the most variance of the response variables. As more and more biomolecular data become available from different biological levels, such as genotypic and phenotypic data from different omics domains, a natural research direction is to apply an integrated analysis approach in order to explore the underlying biological mechanism of certain phenotypes of the given organism. We show that the multiset sparse Redundancy Analysis (multi-sRDA) framework is a prominent candidate for high-dimensional omics data analysis since it accounts for the directional information transfer between omics sets, and, through its sparse solutions, the interpretability of the result is improved. In this paper, we also describe a software implementation for multi-sRDA, based on the Partial Least Squares Path Modeling algorithm. We test our method through simulation and real omics data analysis with data sets of 364,134 methylation markers, 18,424 gene expression markers, and 47 cytokine markers measured on 37 patients with Marfan syndrome.",

author = "Csala, Attila and Hof, {Michel H.} and Zwinderman, {Aeilko H.}",

year = "2019",

doi = "10.1002/bimj.201700248",

language = "English",

volume = "61",

pages = "406--423",

journal = "Biometrical journal. Biometrische Zeitschrift",

issn = "0323-3847",

publisher = "Wiley-VCH Verlag",

number = "2",

}

TY - JOUR

T1 - Multiset sparse redundancy analysis for high-dimensional omics data

AU - Csala, Attila

AU - Hof, Michel H.

AU - Zwinderman, Aeilko H.

PY - 2019

Y1 - 2019

AB - Redundancy Analysis (RDA) is a well-known method used to describe the directional relationship between related data sets. Recently, we proposed sparse Redundancy Analysis (sRDA) for high-dimensional genomic data analysis to find explanatory variables that explain the most variance of the response variables. As more and more biomolecular data become available from different biological levels, such as genotypic and phenotypic data from different omics domains, a natural research direction is to apply an integrated analysis approach in order to explore the underlying biological mechanism of certain phenotypes of the given organism. We show that the multiset sparse Redundancy Analysis (multi-sRDA) framework is a prominent candidate for high-dimensional omics data analysis since it accounts for the directional information transfer between omics sets, and, through its sparse solutions, the interpretability of the result is improved. In this paper, we also describe a software implementation for multi-sRDA, based on the Partial Least Squares Path Modeling algorithm. We test our method through simulation and real omics data analysis with data sets of 364,134 methylation markers, 18,424 gene expression markers, and 47 cytokine markers measured on 37 patients with Marfan syndrome.

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85058052961&origin=inward

UR - https://www.ncbi.nlm.nih.gov/pubmed/30506971

U2 - https://doi.org/10.1002/bimj.201700248

DO - 10.1002/bimj.201700248

M3 - Article

C2 - 30506971

SN - 0323-3847

VL - 61

SP - 406

EP - 423

JO - Biometrical journal. Biometrische Zeitschrift

JF - Biometrical journal. Biometrische Zeitschrift

IS - 2

ER -
