Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models

Jurre R. Veerman; Gwenaël G. R. Leday; Mark A. van de Wiel

DOI: https://doi.org/10.1080/03610918.2019.1646760

Research output: Contribution to journal (article, academic, peer-reviewed)

5 Citations (Scopus)

Abstract

For high-dimensional linear regression models, we review and compare several estimators of the variances τ² and σ² of the random slopes and the errors, respectively. These variances relate directly to the ridge regression penalty λ and to the heritability index h², which is often used in genetics. Several estimators of λ and h², based either on cross-validation (CV) or on maximum marginal likelihood (MML), are also discussed. The comparisons cover several types of high-dimensional covariate matrix, such as multicollinear covariates and data-derived covariates. Moreover, we study robustness against model misspecification, such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. Several extensions are presented: first, to the high-dimensional linear mixed effects model, with REML as an alternative to MML; second, to the conjugate Bayesian setting, which is shown to be a good alternative; third, and most prominently, to generalized linear models, for which we derive a computationally efficient MML estimator by rewriting the marginal likelihood as an n-dimensional integral. For Poisson and binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of λ compared to CV. Software is provided to enable reproduction of all results.
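The link between the variance components, the ridge penalty and heritability can be made concrete with a small example. The sketch below is a minimal, pure-Python illustration under stated assumptions. It is not the authors' software. It assumes the linear model y = Xβ + ε with β_j ~ N(0, τ²) and ε_i ~ N(0, σ²), a centered response with no intercept, and the ridge objective ||y − Xβ||² + λ||β||². Under that objective λ = σ²/τ². It also assumes standardized covariates and the heritability definition h² = pτ²/(pτ² + σ²), which equals p/(p + λ). Because Xβ ~ N(0, τ²XXᵀ), the marginal likelihood involves only the n × n matrix τ²XXᵀ + σ²I. The helper names (`gram`, `profile_loglik`, `mml_ridge`, and so on) and the log-spaced grid search over λ are hypothetical, illustrative choices.

```python
import math

# Assumed parameterisation (illustrative, not the paper's software):
#   y = X beta + eps, beta_j ~ N(0, tau2), eps_i ~ N(0, sigma2),
#   ridge objective ||y - X beta||^2 + lam * ||beta||^2, so lam = sigma2 / tau2.


def ridge_penalty(tau2, sigma2):
    """Ridge penalty implied by the variance components."""
    return sigma2 / tau2


def heritability(tau2, sigma2, p):
    """h^2 = p*tau2 / (p*tau2 + sigma2), assuming p standardized covariates."""
    return p * tau2 / (p * tau2 + sigma2)


def gram(X):
    """n x n matrix X X^T (rows of X are samples)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]


def _cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][i] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def _forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def profile_loglik(lam, G, y):
    """Log marginal likelihood of lam with sigma2 profiled out.

    y ~ N(0, tau2 * G + sigma2 * I) = N(0, sigma2 * M) with M = I + G / lam.
    Returns (log-likelihood, sigma2_hat).
    """
    n = len(y)
    M = [[(1.0 if i == j else 0.0) + G[i][j] / lam for j in range(n)]
         for i in range(n)]
    L = _cholesky(M)
    z = _forward_solve(L, y)                # y^T M^{-1} y = ||z||^2
    quad = sum(v * v for v in z)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    sigma2_hat = quad / n                   # closed-form maximiser over sigma2
    ll = -0.5 * (n * math.log(2.0 * math.pi * sigma2_hat) + logdet + n)
    return ll, sigma2_hat


def mml_ridge(X, y, grid=None):
    """MML estimate of lam by grid search; returns (lam, tau2, sigma2)."""
    if grid is None:
        grid = [10 ** (k / 20) for k in range(-60, 61)]  # 1e-3 ... 1e3
    G = gram(X)
    lam = max(grid, key=lambda t: profile_loglik(t, G, y)[0])
    sigma2 = profile_loglik(lam, G, y)[1]
    return lam, sigma2 / lam, sigma2
```

Working with the n × n Gram matrix instead of the p × p matrix XᵀX keeps the cost manageable when p ≫ n. The same observation, applied to the linear predictor η = Xβ, is one way to see why the GLM marginal likelihood can be written as an n-dimensional integral. The GLM case, however, needs an approximation or numerical integration that this sketch does not attempt. In a comparison like the one described above, the λ returned by `mml_ridge` would be set against a CV choice of λ on the same data.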

Original language	English
Journal	Communications in Statistics: Simulation and Computation
DOIs	https://doi.org/10.1080/03610918.2019.1646760
Publication status	Published - 1 Jan 2019
Externally published	Yes

Cite this

@article{2b9236a0a77a4f089b4133ed1478be6a,
  title = "Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models",
  abstract = "For high-dimensional linear regression models, we review and compare several estimators of the variances $\tau^2$ and $\sigma^2$ of the random slopes and the errors, respectively. These variances relate directly to the ridge regression penalty $\lambda$ and to the heritability index $h^2$, which is often used in genetics. Several estimators of $\lambda$ and $h^2$, based either on cross-validation (CV) or on maximum marginal likelihood (MML), are also discussed. The comparisons cover several types of high-dimensional covariate matrix, such as multicollinear covariates and data-derived covariates. Moreover, we study robustness against model misspecification, such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. Several extensions are presented: first, to the high-dimensional linear mixed effects model, with REML as an alternative to MML; second, to the conjugate Bayesian setting, which is shown to be a good alternative; third, and most prominently, to generalized linear models, for which we derive a computationally efficient MML estimator by rewriting the marginal likelihood as an $n$-dimensional integral. For Poisson and binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of $\lambda$ compared to CV. Software is provided to enable reproduction of all results.",
  author = "Veerman, {Jurre R.} and Leday, {Gwena{\"e}l G. R.} and {van de Wiel}, {Mark A.}",
  year = "2019",
  month = jan,
  day = "1",
  doi = "10.1080/03610918.2019.1646760",
  language = "English",
  journal = "Communications in Statistics: Simulation and Computation",
  issn = "0361-0918",
  publisher = "Taylor and Francis Ltd.",
}

TY  - JOUR
T1  - Estimation of variance components, heritability and the ridge penalty in high-dimensional generalized linear models
AU  - Veerman, Jurre R.
AU  - Leday, Gwenaël G. R.
AU  - van de Wiel, Mark A.
PY  - 2019/1/1
Y1  - 2019/1/1
AB  - For high-dimensional linear regression models, we review and compare several estimators of the variances τ² and σ² of the random slopes and the errors, respectively. These variances relate directly to the ridge regression penalty λ and to the heritability index h², which is often used in genetics. Several estimators of λ and h², based either on cross-validation (CV) or on maximum marginal likelihood (MML), are also discussed. The comparisons cover several types of high-dimensional covariate matrix, such as multicollinear covariates and data-derived covariates. Moreover, we study robustness against model misspecification, such as sparse instead of dense effects and non-Gaussian errors. An example on weight gain data with genomic covariates confirms the good performance of MML compared to CV. Several extensions are presented: first, to the high-dimensional linear mixed effects model, with REML as an alternative to MML; second, to the conjugate Bayesian setting, which is shown to be a good alternative; third, and most prominently, to generalized linear models, for which we derive a computationally efficient MML estimator by rewriting the marginal likelihood as an n-dimensional integral. For Poisson and binomial ridge regression, we demonstrate the superior accuracy of the resulting MML estimator of λ compared to CV. Software is provided to enable reproduction of all results.
UR  - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85070841242&origin=inward
U2  - https://doi.org/10.1080/03610918.2019.1646760
DO  - 10.1080/03610918.2019.1646760
M3  - Article
SN  - 0361-0918
JO  - Communications in Statistics: Simulation and Computation
JF  - Communications in Statistics: Simulation and Computation
ER  - 
