TY - JOUR
T1 - Sequential Learning of Regression Models by Penalized Estimation
AU - van Wieringen, Wessel N.
AU - Binder, Harald
N1 - Publisher Copyright: © 2022 The Author(s). Published with license by Taylor & Francis Group, LLC.
PY - 2022
Y1 - 2022
N2 - When data arrive in a sequence of two or more datasets, modeling on the most recent dataset should take previous datasets into account. We specifically investigate a strategy for regression modeling when parameter estimates from previous data can be used as anchoring points, yet may not be available for all parameters; thus, covariance information cannot be reused. A procedure that updates through targeted penalized estimation, which shrinks the estimator toward a nonzero value, is presented. The parameter estimate from the previous data serves as this nonzero value when an update is sought from novel data. This naturally extends to a sequence of datasets with the same response, but potentially only partial overlap in covariates. The iteratively updated regression parameter estimator is shown to be asymptotically unbiased and consistent. The penalty parameter is chosen through constrained cross-validated log-likelihood optimization. The constraint bounds the amount of shrinkage of the updated estimator toward the current one from below. The bound aims to preserve the (updated) estimator's goodness of fit on all-but-the-novel data. The proposed approach is compared to other regression modeling procedures. Finally, it is illustrated on an epidemiological study where the data arrive in batches with different covariate availability and the model is refitted when a novel batch becomes available. Supplementary materials for this article are available online.
KW - Asymptotic unbiasedness
KW - Consistency
KW - Constrained cross-validation
KW - Generalized linear model
KW - Targeted ridge penalty
KW - Updating
UR - http://www.scopus.com/inward/record.url?scp=85128190377&partnerID=8YFLogxK
U2 - 10.1080/10618600.2022.2035231
DO - 10.1080/10618600.2022.2035231
M3 - Article
SN - 1061-8600
VL - 31
SP - 877
EP - 886
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 3
ER -