TY - JOUR
T1 - Methods for analyzing data from probabilistic linkage strategies based on partially identifying variables
AU - Hof, M. H. P.
AU - Zwinderman, A. H.
PY - 2012
Y1 - 2012
N2 - In record linkage studies, unique identifiers are often not available, and therefore, the linkage procedure depends on combinations of partially identifying variables with low discriminating power. As a consequence, wrongly linked covariate and outcome pairs will be created and bias further analysis of the linked data. In this article, we investigated two estimators that correct for linkage error in regression analysis. We extended the estimators developed by Lahiri and Larsen and also suggested a weighted least squares approach to deal with linkage error. We considered both linear and logistic regression problems and evaluated the performance of both methods with simulations. Our results show that all wrong covariate and outcome pairs need to be removed from the analysis in order to calculate unbiased regression coefficients in both approaches. This removal requires strong assumptions on the structure of the data. In addition, the bias significantly increases when the assumptions do not hold and wrongly linked records influence the coefficient estimation. Our simulations showed that both methods had similar performance in linear regression problems. With logistic regression problems, the weighted least squares method showed less bias. Because the specific structure of the data in record linkage problems often leads to different assumptions, it?is necessary that the analyst has prior knowledge on the nature of the data. These assumptions are more easily introduced in the weighted least squares approach than in the Lahiri and Larsen estimator. Copyright (c) 2012 John Wiley & Sons, Ltd
AB - In record linkage studies, unique identifiers are often not available, and therefore, the linkage procedure depends on combinations of partially identifying variables with low discriminating power. As a consequence, wrongly linked covariate and outcome pairs will be created and bias further analysis of the linked data. In this article, we investigated two estimators that correct for linkage error in regression analysis. We extended the estimators developed by Lahiri and Larsen and also suggested a weighted least squares approach to deal with linkage error. We considered both linear and logistic regression problems and evaluated the performance of both methods with simulations. Our results show that all wrong covariate and outcome pairs need to be removed from the analysis in order to calculate unbiased regression coefficients in both approaches. This removal requires strong assumptions on the structure of the data. In addition, the bias significantly increases when the assumptions do not hold and wrongly linked records influence the coefficient estimation. Our simulations showed that both methods had similar performance in linear regression problems. With logistic regression problems, the weighted least squares method showed less bias. Because the specific structure of the data in record linkage problems often leads to different assumptions, it?is necessary that the analyst has prior knowledge on the nature of the data. These assumptions are more easily introduced in the weighted least squares approach than in the Lahiri and Larsen estimator. Copyright (c) 2012 John Wiley & Sons, Ltd
U2 - https://doi.org/10.1002/sim.5498
DO - https://doi.org/10.1002/sim.5498
M3 - Article
C2 - 22807060
SN - 0277-6715
VL - 31
SP - 4231
EP - 4242
JO - Statistics in medicine
JF - Statistics in medicine
IS - 30
ER -