Penalized estimation of the Gaussian graphical model from data with replicates

Research output: Contribution to journal › Article › Academic › peer-review

4 Citations (Scopus)

Abstract

Gaussian graphical models are usually estimated from unreplicated data. The data are, however, likely to comprise signal and noise, and these two cannot be deconvoluted from unreplicated data. In practice, the noise is therefore ignored. We point out the consequences of this practice for the reconstruction of the conditional independence graph of the signal. Replicated data allow for the deconvolution of signal and noise and the reconstruction of the former's conditional independence graph. To this end, we present a penalized Expectation-Maximization algorithm. The penalty parameter is chosen to maximize the F-fold cross-validated log-likelihood. Sampling schemes of the folds from replicated data are discussed. By simulation we investigate the effect of replicates on the reconstruction of the signal's conditional independence graph. Moreover, we compare the proposed method to several obvious competitors. In an application we use data from oncogenomic studies with replicates to reconstruct the gene-gene interaction networks, operationalized as conditional independence graphs. This yields a realistic portrait of the effect of ignoring sources of variation other than sampling variation. In addition, it has implications for the reproducibility of inferred gene-gene interaction networks reported in the literature.
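To make the signal-versus-noise point concrete, the following minimal sketch simulates replicated data and separates the two covariance components by simple moment matching. It is not the penalized Expectation-Maximization algorithm of the article. The additive model (observation = signal + independent noise), the function names `simulate_replicates` and `moment_estimates`, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def simulate_replicates(n, K, sigma_z, sigma_eps, rng):
    """Draw n samples with K replicates each under an assumed additive model:
    y[i, k] = z[i] + eps[i, k], with z ~ N(0, sigma_z) and eps ~ N(0, sigma_eps)."""
    p = sigma_z.shape[0]
    z = rng.multivariate_normal(np.zeros(p), sigma_z, size=n)            # (n, p)
    eps = rng.multivariate_normal(np.zeros(p), sigma_eps, size=(n, K))   # (n, K, p)
    return z[:, None, :] + eps                                           # (n, K, p)


def moment_estimates(y):
    """Moment-based (unpenalized) estimates from replicated data of shape (n, K, p).

    Returns the covariance of the replicate means (what one gets when the
    noise is ignored), plus deconvoluted signal and noise covariances."""
    n, K, _ = y.shape
    means = y.mean(axis=1)                                               # (n, p)
    resid = y - means[:, None, :]                                        # within-sample residuals
    sigma_eps_hat = np.einsum("ikp,ikq->pq", resid, resid) / (n * (K - 1))
    centered = means - means.mean(axis=0)
    cov_means = centered.T @ centered / (n - 1)                          # approx. sigma_z + sigma_eps / K
    sigma_z_hat = cov_means - sigma_eps_hat / K                          # remove the noise contribution
    return cov_means, sigma_z_hat, sigma_eps_hat


# Illustrative (hypothetical) parameters: a 3-variable chain-like signal covariance.
rng = np.random.default_rng(0)
sigma_z = np.array([[1.0, 0.5, 0.0],
                    [0.5, 1.0, 0.5],
                    [0.0, 0.5, 1.0]])
sigma_eps = np.eye(3)

y = simulate_replicates(n=5000, K=2, sigma_z=sigma_z, sigma_eps=sigma_eps, rng=rng)
cov_means, sigma_z_hat, sigma_eps_hat = moment_estimates(y)

# Frobenius-norm errors with respect to the true signal covariance.
err_naive = np.linalg.norm(cov_means - sigma_z)
err_deconv = np.linalg.norm(sigma_z_hat - sigma_z)
print("error when noise is ignored:", err_naive)
print("error after deconvolution: ", err_deconv)
```

In this sketch the covariance of the replicate means is inflated by the noise covariance divided by the number of replicates, so inverting it would give a precision matrix (and hence a conditional independence graph) of signal plus noise rather than of the signal alone. The moment-based signal estimate is not guaranteed to be positive definite in small samples, which is one reason a penalized likelihood approach may be preferable.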
Original language: English
Pages (from-to): 4279-4293
Number of pages: 15
Journal: Statistics in Medicine
Volume: 40
Issue number: 19
Early online date: 2021
Publication status: Published - 30 Aug 2021

Keywords

  • conditional independence graph
  • inverse covariance
  • network
  • reproducibility
  • ridge penalty
