TOSCCA: a framework for interpretation and testing of sparse canonical correlations

Nuria Senar; Mark van de Wiel; Aeilko H. Zwinderman; Michel H. Hof

doi:10.1093/bioadv/vbae021

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Summary: In clinical and biomedical research, multiple high-dimensional datasets are now routinely collected from omics and imaging devices. Multivariate methods, such as Canonical Correlation Analysis (CCA), integrate two (or more) datasets to discover and understand underlying biological mechanisms. For an exploratory method like CCA, interpretation is key. We present a sparse CCA method based on soft-thresholding that produces near-orthogonal components, allows browsing over various sparsity levels, and supports permutation-based hypothesis testing. Our soft-thresholding approach avoids tuning a penalty parameter; such tuning is computationally burdensome and may yield unintelligible results. In addition, compared with alternative approaches, our method is less dependent on the initialization. We examined the performance of our approach with simulations and illustrated its use on real cancer genomics data from drug sensitivity screens. Moreover, we compared its performance to Penalized Matrix Analysis (PMA), a popular alternative for sparse CCA that focuses on yielding interpretable results. Compared to PMA, our method offers improved interpretability of the results while maintaining, or even improving, signal discovery.
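As a rough illustration of the soft-thresholding idea mentioned in the abstract, the sketch below shows the generic soft-thresholding operator and one way a threshold could be derived from a desired number of nonzero weights instead of a tuned penalty. This is not the TOSCCA implementation: the function names `soft_threshold` and `keep_k_nonzero`, the rule for picking the threshold, and the example weights are all assumptions made for illustration.

```python
def soft_threshold(values, threshold):
    """Shrink each value toward zero by `threshold`; values within it become 0."""
    out = []
    for v in values:
        magnitude = abs(v) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if v > 0 else -magnitude)
    return out


def keep_k_nonzero(values, k):
    """Hypothetical helper: soft-threshold so that at most k entries stay nonzero.

    The threshold is set to the (k+1)-th largest absolute value, so the
    sparsity level is chosen directly rather than through a penalty parameter.
    """
    if k >= len(values):
        return [float(v) for v in values]
    ranked = sorted((abs(v) for v in values), reverse=True)
    threshold = ranked[k]
    return soft_threshold(values, threshold)


# Illustrative (made-up) canonical weights for five variables.
weights = [0.9, -0.1, 0.4, -0.7, 0.05]
sparse = keep_k_nonzero(weights, 2)
```

Calling `keep_k_nonzero` with different values of `k` gives a simple way to browse sparsity levels on the same weight vector; how the published method selects and iterates over these levels is described in the article itself.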

Original language	English
Article number	vbae021
Journal	Bioinformatics Advances
Volume	4
Issue number	1
DOIs	https://doi.org/10.1093/bioadv/vbae021
Publication status	Published - 2024

Cite this

@article{3553a1e4129440e781b391a9dd90fabb,

title = "TOSCCA: a framework for interpretation and testing of sparse canonical correlations",

abstract = "Summary: In clinical and biomedical research, multiple high-dimensional datasets are nowadays routinely collected from omics and imaging devices. Multivariate methods, such as Canonical Correlation Analysis (CCA), integrate two (or more) datasets to discover and understand underlying biological mechanisms. For an explorative method like CCA, interpretation is key. We present a sparse CCA method based on soft-thresholding that produces near-orthogonal components, allows for browsing over various sparsity levels, and permutation-based hypothesis testing. Our soft-thresholding approach avoids tuning of a penalty parameter. Such tuning is computationally burdensome and may render unintelligible results. In addition, unlike alternative approaches, our method is less dependent on the initialization. We examined the performance of our approach with simulations and illustrated its use on real cancer genomics data from drug sensitivity screens. Moreover, we compared its performance to Penalized Matrix Analysis (PMA), which is a popular alternative of sparse CCA with a focus on yielding interpretable results. Compared to PMA, our method offers improved interpretability of the results, while not compromising, or even improving, signal discovery.",

author = "Senar, Nuria and {van de Wiel}, Mark and Zwinderman, {Aeilko H.} and Hof, {Michel H.}",

note = "Publisher Copyright: {\textcopyright} The Author(s) 2024.",

year = "2024",

doi = "10.1093/bioadv/vbae021",

language = "English",

volume = "4",

journal = "Bioinformatics Advances",

issn = "2635-0041",

publisher = "Oxford University Press",

number = "1",

}

TY - JOUR

T1 - TOSCCA

T2 - a framework for interpretation and testing of sparse canonical correlations

AU - Senar, Nuria

AU - van de Wiel, Mark

AU - Zwinderman, Aeilko H.

AU - Hof, Michel H.

N1 - Publisher Copyright: © The Author(s) 2024.

PY - 2024

Y1 - 2024

N2 - Summary: In clinical and biomedical research, multiple high-dimensional datasets are nowadays routinely collected from omics and imaging devices. Multivariate methods, such as Canonical Correlation Analysis (CCA), integrate two (or more) datasets to discover and understand underlying biological mechanisms. For an explorative method like CCA, interpretation is key. We present a sparse CCA method based on soft-thresholding that produces near-orthogonal components, allows for browsing over various sparsity levels, and permutation-based hypothesis testing. Our soft-thresholding approach avoids tuning of a penalty parameter. Such tuning is computationally burdensome and may render unintelligible results. In addition, unlike alternative approaches, our method is less dependent on the initialization. We examined the performance of our approach with simulations and illustrated its use on real cancer genomics data from drug sensitivity screens. Moreover, we compared its performance to Penalized Matrix Analysis (PMA), which is a popular alternative of sparse CCA with a focus on yielding interpretable results. Compared to PMA, our method offers improved interpretability of the results, while not compromising, or even improving, signal discovery.

AB - Summary: In clinical and biomedical research, multiple high-dimensional datasets are nowadays routinely collected from omics and imaging devices. Multivariate methods, such as Canonical Correlation Analysis (CCA), integrate two (or more) datasets to discover and understand underlying biological mechanisms. For an explorative method like CCA, interpretation is key. We present a sparse CCA method based on soft-thresholding that produces near-orthogonal components, allows for browsing over various sparsity levels, and permutation-based hypothesis testing. Our soft-thresholding approach avoids tuning of a penalty parameter. Such tuning is computationally burdensome and may render unintelligible results. In addition, unlike alternative approaches, our method is less dependent on the initialization. We examined the performance of our approach with simulations and illustrated its use on real cancer genomics data from drug sensitivity screens. Moreover, we compared its performance to Penalized Matrix Analysis (PMA), which is a popular alternative of sparse CCA with a focus on yielding interpretable results. Compared to PMA, our method offers improved interpretability of the results, while not compromising, or even improving, signal discovery.

UR - http://www.scopus.com/inward/record.url?scp=85187374435&partnerID=8YFLogxK

U2 - 10.1093/bioadv/vbae021

DO - 10.1093/bioadv/vbae021

M3 - Article

C2 - 38456127

SN - 2635-0041

VL - 4

JO - Bioinformatics Advances

JF - Bioinformatics Advances

IS - 1

M1 - vbae021

ER -
