Penalized estimation of the Gaussian graphical model from data with replicates

Wessel N. van Wieringen; Yao Chen

DOI: 10.1002/sim.9028

Research output: Contribution to journal › Article › Academic › peer-review

4 Citations (Scopus)

Abstract

Gaussian graphical models are usually estimated from unreplicated data. The data are, however, likely to comprise signal and noise, and these two cannot be deconvolved from unreplicated data. In practice, the noise is therefore pragmatically ignored. We point out the consequences of this practice for the reconstruction of the conditional independence graph of the signal. Replicated data allow for the deconvolution of signal and noise and for the reconstruction of the former's conditional independence graph. To this end, we present a penalized Expectation-Maximization algorithm. The penalty parameter is chosen to maximize the F-fold cross-validated log-likelihood, and sampling schemes for drawing the folds from replicated data are discussed. By simulation, we investigate the effect of replicates on the reconstruction of the signal's conditional independence graph. Moreover, we compare the proposed method to several obvious competitors. In an application, we use data from oncogenomic studies with replicates to reconstruct gene-gene interaction networks, operationalized as conditional independence graphs. This yields a realistic portrait of the effect of ignoring all sources of variation other than sampling variation. In addition, it has implications for the reproducibility of inferred gene-gene interaction networks reported in the literature.
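
The abstract does not state the model or the update equations. As a minimal sketch, assuming the common signal-plus-noise formulation Y_ik = Z_i + E_ik, with signal Z_i ~ N(0, Sigma_z) and noise E_ik ~ N(0, Sigma_e) independent, a balanced design of K replicates per individual, and centered data, a ridge-penalized EM could look as follows. The E-step computes the posterior of Z_i given its replicates; the M-step applies a closed-form ridge precision estimator with zero target to the expected sufficient statistics; the penalty is chosen by F-fold cross-validation in which all replicates of an individual are kept in the same fold, which is only one of several possible fold sampling schemes. The function names (ridge_precision, penalized_em, individual_folds, marginal_loglik, cv_select), the shared penalty for signal and noise, and the penalty scaling are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def ridge_precision(S, lam):
    """Ridge precision estimate with zero target (assumed penalty form).

    Maximizes log det(Omega) - tr(S @ Omega) - (lam / 2) * ||Omega||_F^2,
    whose solution has covariance eigenvalues d/2 + sqrt(lam + d^2/4).
    """
    d, U = np.linalg.eigh(S)
    sigma_eig = d / 2.0 + np.sqrt(lam + d**2 / 4.0)
    return (U / sigma_eig) @ U.T  # U diag(1 / sigma_eig) U^T


def penalized_em(Y, lam_z, lam_e, n_iter=200, tol=1e-8):
    """Penalized EM for the hypothetical model Y[i, k] = Z[i] + E[i, k].

    Y has shape (n, K, p): n individuals, K replicates each, p variables,
    and is assumed to be centered. Returns ridge-penalized precision
    estimates of the signal (omega_z) and of the noise (omega_e).
    """
    n, K, p = Y.shape
    omega_z, omega_e = np.eye(p), np.eye(p)
    y_sum = Y.sum(axis=1)  # (n, p): sum over replicates per individual
    for _ in range(n_iter):
        # E-step: Z_i | Y_i ~ N(m_i, V) with V = (omega_z + K * omega_e)^{-1}
        V = np.linalg.inv(omega_z + K * omega_e)
        V = (V + V.T) / 2.0
        M = y_sum @ omega_e @ V  # row i is the posterior mean m_i
        # Expected sufficient statistics of signal and noise
        S_z = M.T @ M / n + V
        R = Y - M[:, None, :]  # residuals y_ik - m_i
        S_e = np.einsum("ika,ikb->ab", R, R) / (n * K) + V
        # M-step: closed-form ridge updates of both precision matrices
        new_z = ridge_precision(S_z, lam_z)
        new_e = ridge_precision(S_e, lam_e)
        delta = max(np.abs(new_z - omega_z).max(), np.abs(new_e - omega_e).max())
        omega_z, omega_e = new_z, new_e
        if delta < tol:
            break
    return omega_z, omega_e


def individual_folds(n, n_folds, rng):
    """One possible fold scheme: all replicates of an individual share a fold."""
    perm = rng.permutation(n)
    return [np.sort(perm[f::n_folds]) for f in range(n_folds)]


def marginal_loglik(Y, omega_z, omega_e):
    """Gaussian log-likelihood of the stacked replicates of each individual."""
    n, K, p = Y.shape
    # Block (k, l) of the covariance is Sigma_z + [k == l] * Sigma_e
    sigma = np.kron(np.ones((K, K)), np.linalg.inv(omega_z)) + np.kron(
        np.eye(K), np.linalg.inv(omega_e)
    )
    X = Y.reshape(n, K * p)  # row i stacks y_i1, ..., y_iK
    _, logdet = np.linalg.slogdet(sigma)
    quad = np.einsum("ij,jk,ik->", X, np.linalg.inv(sigma), X)
    return -0.5 * (n * (K * p * np.log(2.0 * np.pi) + logdet) + quad)


def cv_select(Y, lambdas, n_folds=5, seed=0):
    """Pick the penalty maximizing the F-fold cross-validated log-likelihood.

    For simplicity the same penalty is used for signal and noise (assumption).
    """
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    folds = individual_folds(n, n_folds, rng)
    scores = []
    for lam in lambdas:
        total = 0.0
        for test in folds:
            train = np.setdiff1d(np.arange(n), test)
            omega_z, omega_e = penalized_em(Y[train], lam, lam)
            total += marginal_loglik(Y[test], omega_z, omega_e)
        scores.append(total)
    return lambdas[int(np.argmax(scores))], scores
```

For instance, with Y an (n, K, p) array of centered replicated measurements, cv_select(Y, [0.01, 0.1, 1.0]) returns the selected penalty and the cross-validated log-likelihood of each candidate, after which penalized_em(Y, lam, lam) can be refitted on all individuals. Because a ridge penalty does not produce exact zeros, reading off a conditional independence graph from omega_z would require an additional edge-selection step, which this sketch omits.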

Original language	English
Pages (from-to)	4279-4293
Number of pages	15
Journal	Statistics in Medicine
Volume	40
Issue number	19
Early online date	2021
DOIs	https://doi.org/10.1002/sim.9028
Publication status	Published - 30 Aug 2021

Keywords

conditional independence graph
inverse covariance
network
reproducibility
ridge penalty

Access to Document

https://doi.org/10.1002/sim.9028

Cite this

@article{2ca43ea4f3e541ba94137cc90a3de33f,

title = "Penalized estimation of the Gaussian graphical model from data with replicates",

abstract = "Gaussian graphical models are usually estimated from unreplicated data. The data are, however, likely to comprise signal and noise, and these two cannot be deconvolved from unreplicated data. In practice, the noise is therefore pragmatically ignored. We point out the consequences of this practice for the reconstruction of the conditional independence graph of the signal. Replicated data allow for the deconvolution of signal and noise and for the reconstruction of the former's conditional independence graph. To this end, we present a penalized Expectation-Maximization algorithm. The penalty parameter is chosen to maximize the F-fold cross-validated log-likelihood, and sampling schemes for drawing the folds from replicated data are discussed. By simulation, we investigate the effect of replicates on the reconstruction of the signal's conditional independence graph. Moreover, we compare the proposed method to several obvious competitors. In an application, we use data from oncogenomic studies with replicates to reconstruct gene-gene interaction networks, operationalized as conditional independence graphs. This yields a realistic portrait of the effect of ignoring all sources of variation other than sampling variation. In addition, it has implications for the reproducibility of inferred gene-gene interaction networks reported in the literature.",

keywords = "conditional independence graph, inverse covariance, network, reproducibility, ridge penalty",

author = "{van Wieringen}, {Wessel N.} and Yao Chen",

note = "Funding Information: This project has received funding from the Euratom research and training programme 2014--2018 under grant agreement No 755523. Publisher Copyright: {\textcopyright} 2021 The Authors. Statistics in Medicine published by John Wiley \& Sons Ltd.",

year = "2021",

month = aug,

day = "30",

doi = "10.1002/sim.9028",

language = "English",

volume = "40",

pages = "4279--4293",

journal = "Statistics in Medicine",

issn = "0277-6715",

publisher = "John Wiley and Sons Ltd",

number = "19",

}

TY  - JOUR
T1  - Penalized estimation of the Gaussian graphical model from data with replicates
AU  - van Wieringen, Wessel N.
AU  - Chen, Yao
N1  - Funding Information: This project has received funding from the Euratom research and training programme 2014–2018 under grant agreement No 755523. Publisher Copyright: © 2021 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
PY  - 2021/08/30
Y1  - 2021/08/30
AB  - Gaussian graphical models are usually estimated from unreplicated data. The data are, however, likely to comprise signal and noise, and these two cannot be deconvolved from unreplicated data. In practice, the noise is therefore pragmatically ignored. We point out the consequences of this practice for the reconstruction of the conditional independence graph of the signal. Replicated data allow for the deconvolution of signal and noise and for the reconstruction of the former's conditional independence graph. To this end, we present a penalized Expectation-Maximization algorithm. The penalty parameter is chosen to maximize the F-fold cross-validated log-likelihood, and sampling schemes for drawing the folds from replicated data are discussed. By simulation, we investigate the effect of replicates on the reconstruction of the signal's conditional independence graph. Moreover, we compare the proposed method to several obvious competitors. In an application, we use data from oncogenomic studies with replicates to reconstruct gene-gene interaction networks, operationalized as conditional independence graphs. This yields a realistic portrait of the effect of ignoring all sources of variation other than sampling variation. In addition, it has implications for the reproducibility of inferred gene-gene interaction networks reported in the literature.
KW  - conditional independence graph
KW  - inverse covariance
KW  - network
KW  - reproducibility
KW  - ridge penalty
UR  - http://www.scopus.com/inward/record.url?scp=85105685673&partnerID=8YFLogxK
U2  - https://doi.org/10.1002/sim.9028
DO  - 10.1002/sim.9028
M3  - Article
C2  - 33987868
SN  - 0277-6715
VL  - 40
SP  - 4279
EP  - 4293
JO  - Statistics in Medicine
JF  - Statistics in Medicine
IS  - 19
ER  - 
