TY - JOUR
T1 - Learning from a lot
T2 - Empirical Bayes for high-dimensional model-based prediction
AU - van de Wiel, Mark A.
AU - Te Beest, Dennis E.
AU - Münch, Magnus M.
PY - 2019
Y1 - 2019/03//
N2 - Empirical Bayes is a versatile approach to “learn from a lot” in two ways: first, from a large number of variables and, second, from a potentially large amount of prior information, for example, stored in public repositories. We review applications of a variety of empirical Bayes methods to several well-known model-based prediction methods, including penalized regression, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss “formal” empirical Bayes methods that maximize the marginal likelihood but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and p, the number of variables, we consider a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, which model a priori information on variables termed “co-data”. In particular, we present two novel examples that allow for co-data: first, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types and, second, a hybrid empirical Bayes–full Bayes ridge regression approach for estimation of the posterior predictive interval.
KW - co-data
KW - empirical Bayes
KW - marginal likelihood
KW - prediction
KW - variable selection
UR - http://www.scopus.com/inward/record.url?scp=85061092410&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85061092410&partnerID=8YFLogxK
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85061092410&origin=inward
U2 - https://doi.org/10.1111/sjos.12335
DO - 10.1111/sjos.12335
M3 - Article
C2 - 31007342
SN - 0303-6898
VL - 46
SP - 2
EP - 25
JO - Scandinavian Journal of Statistics
JF - Scandinavian Journal of Statistics
IS - 1
ER -