TY - JOUR
T1 - A probabilistic record Linkage Model for Survival Data
AU - Hof, Michel H.
AU - Ravelli, Anita C.
AU - Zwinderman, Aeilko H.
PY - 2017
Y1 - 2017
N2 - In the absence of a unique identifier, combining information from multiple files relies on partially identifying variables (e.g., gender, initials). With a record linkage procedure, these variables are used to distinguish record pairs that belong together (matches) from record pairs that do not belong together (nonmatches). Generally, the combined strength of the partially identifying variables is too low causing imperfect linkage; some true nonmatches are identified as match and, on the other hand, some true matches as nonmatch. To avoid bias in further analyses, it is necessary to correct for imperfect linkage. In this article, pregnancy data from the Perinatal Registry of the Netherlands were used to estimate the associations between the (baseline) characteristics from the first delivery and the time to a second delivery. Because of privacy regulations, no unique identifier was available to determine which pregnancies belonged to the same woman. To deal with imperfect linkage in a time-to-event setting, where we have a file with baseline characteristics and a file with event times, we developed a joint model in which the record linkage procedure and the time-to-event analysis are performed simultaneously. R code and example data are available as online supplemental material
AB - In the absence of a unique identifier, combining information from multiple files relies on partially identifying variables (e.g., gender, initials). With a record linkage procedure, these variables are used to distinguish record pairs that belong together (matches) from record pairs that do not belong together (nonmatches). Generally, the combined strength of the partially identifying variables is too low causing imperfect linkage; some true nonmatches are identified as match and, on the other hand, some true matches as nonmatch. To avoid bias in further analyses, it is necessary to correct for imperfect linkage. In this article, pregnancy data from the Perinatal Registry of the Netherlands were used to estimate the associations between the (baseline) characteristics from the first delivery and the time to a second delivery. Because of privacy regulations, no unique identifier was available to determine which pregnancies belonged to the same woman. To deal with imperfect linkage in a time-to-event setting, where we have a file with baseline characteristics and a file with event times, we developed a joint model in which the record linkage procedure and the time-to-event analysis are performed simultaneously. R code and example data are available as online supplemental material
U2 - https://doi.org/10.1080/01621459.2017.1311262
DO - https://doi.org/10.1080/01621459.2017.1311262
M3 - Article
SN - 0162-1459
VL - 112
SP - 1504
EP - 1515
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
IS - 520
ER -