Semi-supervised empirical Bayes group-regularized factor regression

Magnus M. Münch, Mark A. van de Wiel, Aad W. van der Vaart, Carel F. W. Peeters

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

The features in a high-dimensional biomedical prediction problem are often well described by low-dimensional latent variables (or factors). We use this to include unlabeled features and additional information on the features when building a prediction model. Such additional feature information is often available in biomedical applications. Examples are annotation of genes, metabolites, or p-values from a previous study. We employ a Bayesian factor regression model that jointly models the features and the outcome using Gaussian latent variables. We fit the model using a computationally efficient variational Bayes method, which scales to high dimensions. We use the extra information to set up a prior model for the features in terms of hyperparameters, which are then estimated through empirical Bayes. The method is demonstrated in simulations and two applications. One application considers influenza vaccine efficacy prediction based on microarray data. The second application predicts oral cancer metastasis from RNAseq data.
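As a rough illustration of the kind of model the abstract describes, the sketch below writes down a generic Gaussian factor regression in NumPy: latent factors z drive both the features x and the outcome y, so the joint model implies a linear predictor of y from x. This is not the authors' implementation. It involves no variational Bayes and no empirical Bayes, and the parameterization and all names (`W`, `psi`, `beta`, `posterior_factor_mean`, `implied_coefficients`) are assumptions made for the example. Note that `W` and `psi` describe the features only, which is why samples without an outcome can still be informative about them.

```python
import numpy as np

rng = np.random.default_rng(0)

p, d, n = 50, 3, 20   # features, latent factors, samples (illustrative sizes)

# Hypothetical "true" parameters, chosen only for this illustration.
W = rng.normal(size=(p, d))           # factor loadings: x = W z + noise
psi = rng.uniform(0.5, 1.5, size=p)   # diagonal noise variances of the features
beta = rng.normal(size=d)             # effect of the factors on the outcome

# Simulate features from the assumed model, with z ~ N(0, I).
z = rng.normal(size=(n, d))
x = z @ W.T + rng.normal(size=(n, p)) * np.sqrt(psi)


def posterior_factor_mean(x, W, psi):
    """E[z | x] = (I + W' Psi^-1 W)^-1 W' Psi^-1 x, one row per sample."""
    Wt_psi_inv = W.T / psi                      # W' Psi^-1, shape (d, p)
    M = np.eye(W.shape[1]) + Wt_psi_inv @ W     # d x d system, cheap when d << p
    return np.linalg.solve(M, Wt_psi_inv @ x.T).T


def implied_coefficients(W, psi, beta):
    """Coefficients gamma with E[y | x] = x' gamma under the assumed model."""
    Wt_psi_inv = W.T / psi
    M = np.eye(W.shape[1]) + Wt_psi_inv @ W
    return Wt_psi_inv.T @ np.linalg.solve(M, beta)


# Prediction through the latent factors ...
y_hat = posterior_factor_mean(x, W, psi) @ beta

# ... matches a linear predictor in the original features.
gamma = implied_coefficients(W, psi, beta)

# Cross-check against the Gaussian conditioning formula
# Cov(x)^-1 Cov(x, y), with Cov(x) = W W' + Psi and Cov(x, y) = W beta.
gamma_direct = np.linalg.solve(W @ W.T + np.diag(psi), W @ beta)
```

The two routes to `gamma` agree by the Woodbury identity. The factor-based route only solves a d × d system, which hints at why a low-dimensional latent structure helps when the number of features is large.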
Original language: English
Pages (from-to): 1289-1306
Number of pages: 18
Journal: Biometrical Journal
Volume: 64
Issue number: 7
Early online date: 2022
Publication status: Published - Oct 2022

Keywords

  • empirical Bayes
  • factor regression
  • high-dimensional data
  • semisupervised learning
