Exploring interaction effects in small samples increases rates of false-positive and false-negative findings: Results from a systematic review and simulation study

Amand F. Schmidt; Rolf H. H. Groenwold; Mirjam J. Knol; Arno W. Hoes; Mirjam Nielen; Kit C. B. Roes; Anthonius de Boer; Olaf H. Klungel

doi:https://doi.org/10.1016/j.jclinepi.2014.02.008

Research output: Contribution to journal › Article › Academic › peer-review

41 Citations (Scopus)

Abstract

Objective: To give a comprehensive comparison of the performance of commonly applied interaction tests. Methods: A literature review and a simulation study were performed, evaluating interaction tests on the odds ratio (OR) or the risk difference (RD) scale: Cochran Q (Q), Breslow-Day (BD), Tarone, unconditional score, likelihood ratio (LR), Wald, and relative excess risk due to interaction (RERI)-based tests. Results: Review results agreed with results from our simulation study, which showed that on the OR scale, in small samples (eg, number of subjects ≤ 250), the type 1 error rate of the LR test was 0.10, whereas the BD and Tarone tests showed rates around 0.05. On the RD scale, the LR and RERI tests had error rates around 0.05. On both scales, tests did not differ regarding power. When exposure prevented the outcome, RERI-based tests were relatively underpowered (eg, N = 100; RERI power = 5% vs. Wald power = 18%). With increasing sample size, this difference decreased. Conclusion: In small samples, interaction tests differed. On the OR scale, the Tarone and BD tests are recommended. On the RD scale, the LR and RERI-based tests performed best. However, RERI-based tests are underpowered compared with other tests when exposure prevents the outcome and sample size is limited. © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
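
As a rough illustration of two quantities named in the abstract, the sketch below computes a RERI point estimate from four stratum risks and a Wald z-test for equal odds ratios across two strata. It uses the textbook formulas (RERI = RR11 - RR10 - RR01 + 1, and Woolf's variance for the log odds ratio) and is not the authors' simulation code. The function names `reri` and `wald_or_interaction` and the cell layout are assumptions made for this example, and no continuity correction is applied.

```python
import math


def reri(risk00, risk10, risk01, risk11):
    """RERI point estimate from the outcome risks in the four exposure groups.

    riskXY is the risk with first factor X and second factor Y (0 = absent, 1 = present).
    """
    rr10 = risk10 / risk00
    rr01 = risk01 / risk00
    rr11 = risk11 / risk00
    return rr11 - rr10 - rr01 + 1.0


def wald_or_interaction(stratum0, stratum1):
    """Wald z-test for equal odds ratios in two strata.

    Each stratum is a tuple (a, b, c, d) of counts, assumed to mean:
    exposed cases, exposed non-cases, unexposed cases, unexposed non-cases.
    Returns (z, two-sided p-value). Zero cells raise ZeroDivisionError.
    """
    def log_or_and_variance(cells):
        a, b, c, d = cells
        log_or = math.log((a * d) / (b * c))
        variance = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf's variance estimate
        return log_or, variance

    log_or0, var0 = log_or_and_variance(stratum0)
    log_or1, var1 = log_or_and_variance(stratum1)
    z = (log_or1 - log_or0) / math.sqrt(var0 + var1)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


if __name__ == "__main__":
    print(reri(0.1, 0.2, 0.3, 0.6))
    print(wald_or_interaction((20, 30, 10, 40), (15, 35, 12, 38)))
```

With few subjects, some cells can be zero and the Wald statistic is then undefined, which is one practical reason small-sample behaviour of these tests deserves the scrutiny the abstract describes.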

Original language	English
Pages (from-to)	821-829
Journal	Journal of Clinical Epidemiology
Volume	67
Issue number	7
DOIs	https://doi.org/10.1016/j.jclinepi.2014.02.008
Publication status	Published - 2014
Externally published	Yes

Cite this

Schmidt, A. F., Groenwold, R. H. H., Knol, M. J., Hoes, A. W., Nielen, M., Roes, K. C. B., de Boer, A., & Klungel, O. H. (2014). Exploring interaction effects in small samples increases rates of false-positive and false-negative findings: Results from a systematic review and simulation study. Journal of Clinical Epidemiology, 67(7), 821-829. https://doi.org/10.1016/j.jclinepi.2014.02.008

@article{5caf6489b00c4c9ebd6603d4d2afe5a1,

title = "Exploring interaction effects in small samples increases rates of false-positive and false-negative findings: Results from a systematic review and simulation study",

abstract = "Objective To give a comprehensive comparison of the performance of commonly applied interaction tests. Methods A literature review and simulation study was performed evaluating interaction tests on the odds ratio (OR) or the risk difference (RD) scales: Cochran Q (Q), Breslow-Day (BD), Tarone, unconditional score, likelihood ratio (LR), Wald, and relative excess risk due to interaction (RERI)-based tests. Results Review results agreed with results from our simulation study, which showed that on the OR scale, in small sample sizes (eg, number of subjects ≤ 250) the type 1 error rates of the LR test was 0.10; the BD and Tarone tests showed results around 0.05. On the RD scale, the LR and RERI tests had error rates around 0.05. On both scales, tests did not differ regarding power. When exposure prevented the outcome RERI-based tests were relatively underpowered (eg, N = 100; RERI power = 5% vs. Wald power = 18%). With increasing sample size, difference decreased. Conclusion In small samples, interaction tests differed. On the OR scale, the Tarone and BD tests are recommended. On the RD scale, the LR and RERI-based tests performed best. However, RERI-based tests are underpowered compared with other tests, when exposure prevents the outcome, and sample size is limited. {\textcopyright} 2014 The Authors. Published by Elsevier Inc. All rights reserved.",

author = "Schmidt, {Amand F.} and Groenwold, {Rolf H. H.} and Knol, {Mirjam J.} and Hoes, {Arno W.} and Nielen, Mirjam and Roes, {Kit C. B.} and {de Boer}, Anthonius and Klungel, {Olaf H.}",

year = "2014",

doi = "10.1016/j.jclinepi.2014.02.008",

language = "English",

volume = "67",

pages = "821--829",

journal = "Journal of Clinical Epidemiology",

issn = "0895-4356",

publisher = "Elsevier USA",

number = "7",

}

Schmidt, AF, Groenwold, RHH, Knol, MJ, Hoes, AW, Nielen, M, Roes, KCB, de Boer, A & Klungel, OH 2014, 'Exploring interaction effects in small samples increases rates of false-positive and false-negative findings: Results from a systematic review and simulation study', Journal of Clinical Epidemiology, vol. 67, no. 7, pp. 821-829. https://doi.org/10.1016/j.jclinepi.2014.02.008

Exploring interaction effects in small samples increases rates of false-positive and false-negative findings: Results from a systematic review and simulation study. / Schmidt, Amand F.; Groenwold, Rolf H. H.; Knol, Mirjam J. et al.
In: Journal of Clinical Epidemiology, Vol. 67, No. 7, 2014, p. 821-829.

TY - JOUR

T1 - Exploring interaction effects in small samples increases rates of false-positive and false-negative findings

T2 - Results from a systematic review and simulation study

AU - Schmidt, Amand F.

AU - Groenwold, Rolf H. H.

AU - Knol, Mirjam J.

AU - Hoes, Arno W.

AU - Nielen, Mirjam

AU - Roes, Kit C. B.

AU - de Boer, Anthonius

AU - Klungel, Olaf H.

PY - 2014

Y1 - 2014

AB - Objective To give a comprehensive comparison of the performance of commonly applied interaction tests. Methods A literature review and simulation study was performed evaluating interaction tests on the odds ratio (OR) or the risk difference (RD) scales: Cochran Q (Q), Breslow-Day (BD), Tarone, unconditional score, likelihood ratio (LR), Wald, and relative excess risk due to interaction (RERI)-based tests. Results Review results agreed with results from our simulation study, which showed that on the OR scale, in small sample sizes (eg, number of subjects ≤ 250) the type 1 error rates of the LR test was 0.10; the BD and Tarone tests showed results around 0.05. On the RD scale, the LR and RERI tests had error rates around 0.05. On both scales, tests did not differ regarding power. When exposure prevented the outcome RERI-based tests were relatively underpowered (eg, N = 100; RERI power = 5% vs. Wald power = 18%). With increasing sample size, difference decreased. Conclusion In small samples, interaction tests differed. On the OR scale, the Tarone and BD tests are recommended. On the RD scale, the LR and RERI-based tests performed best. However, RERI-based tests are underpowered compared with other tests, when exposure prevents the outcome, and sample size is limited. © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=84902545752&origin=inward

UR - https://www.ncbi.nlm.nih.gov/pubmed/24768005

U2 - https://doi.org/10.1016/j.jclinepi.2014.02.008

DO - 10.1016/j.jclinepi.2014.02.008

M3 - Article

C2 - 24768005

SN - 0895-4356

VL - 67

SP - 821

EP - 829

JO - Journal of Clinical Epidemiology

JF - Journal of Clinical Epidemiology

IS - 7

ER -
