Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders

Rajat M. Thomas; Willem Bruin; Paul Zhutovsky; Guido van Wingen

doi:https://doi.org/10.1016/B978-0-12-815739-8.00014-6

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

26 Citations (Scopus)

Abstract

In this chapter we explore three of the most common challenges in the application of machine learning techniques in brain disorders research: missing data, small sample sizes, and heterogeneity. After defining these challenges, we present a simple algorithm to generate data that are similar to a “real” dataset using pairwise correlations. This algorithm enables the reader to test the various strategies that are discussed later in the chapter. We then discuss a range of strategies that are currently available to mitigate the impact of missing data, small sample sizes, and heterogeneity on the results. As part of this discussion, we cover both classical strategies and state-of-the-art approaches based on neural networks. We conclude by providing a summary of key recommendations.
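The abstract mentions generating data that resemble a "real" dataset using pairwise correlations, but this record does not reproduce the chapter's algorithm. The following is only a minimal sketch of one common way to do this, under loudly stated assumptions: the features are treated as jointly Gaussian, the real dataset has more samples than features and no constant columns, and the function name `synthesize_like` and its parameters are hypothetical, not taken from the chapter.

```python
import numpy as np


def synthesize_like(real, n_samples, seed=0):
    """Draw synthetic rows whose means, standard deviations, and pairwise
    correlations approximate those of `real` (shape: samples x features).

    Assumption: a Gaussian model is adequate; this is an illustration,
    not the algorithm described in the chapter.
    """
    real = np.asarray(real, dtype=float)
    rng = np.random.default_rng(seed)

    mean = real.mean(axis=0)
    std = real.std(axis=0, ddof=1)
    corr = np.corrcoef(real, rowvar=False)  # pairwise feature correlations

    # Small diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(corr.shape[0]))

    # Independent standard normals become correlated after multiplying by chol.T.
    z = rng.standard_normal((n_samples, real.shape[1]))
    return mean + (z @ chol.T) * std
```

A synthetic dataset produced this way can then have values removed, be subsampled, or be mixed from several subgroups to try out strategies for missing data, small sample sizes, and heterogeneity without touching the original data.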

Original language	English
Title of host publication	Machine Learning: Methods and Applications to Brain Disorders
Publisher	Elsevier
Pages	249-266
ISBN (Electronic)	9780128157398
DOIs	https://doi.org/10.1016/B978-0-12-815739-8.00014-6
Publication status	Published - 1 Jan 2019

Publication series

Name	Machine Learning: Methods and Applications to Brain Disorders

Access to Document

https://doi.org/10.1016/B978-0-12-815739-8.00014-6

Cite this

Thomas, R. M., Bruin, W., Zhutovsky, P., & van Wingen, G. (2019). Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders. In Machine Learning: Methods and Applications to Brain Disorders (pp. 249-266). (Machine Learning: Methods and Applications to Brain Disorders). Elsevier. https://doi.org/10.1016/B978-0-12-815739-8.00014-6

@inbook{8bf0dce930f840eda55063209cd356ec,
title = "Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders",
abstract = "In this chapter we explore three of the most common challenges in the application of machine learning techniques in brain disorders research: missing data, small sample sizes, and heterogeneity. After defining these challenges, we present a simple algorithm to generate data that are similar to a “real” dataset using pairwise correlations. This algorithm enables the reader to test the various strategies that are discussed later in the chapter. We then discuss a range of strategies that are currently available to mitigate the impact of missing data, small sample sizes, and heterogeneity on the results. As part of this discussion, we cover both classical strategies and state-of-the-art approaches based on neural networks. We conclude by providing a summary of key recommendations.",
author = "Thomas, {Rajat M.} and Willem Bruin and Paul Zhutovsky and {van Wingen}, Guido",
year = "2019",
month = jan,
day = "1",
doi = "https://doi.org/10.1016/B978-0-12-815739-8.00014-6",
language = "English",
series = "Machine Learning: Methods and Applications to Brain Disorders",
publisher = "Elsevier",
pages = "249--266",
booktitle = "Machine Learning: Methods and Applications to Brain Disorders",
}

Thomas, RM, Bruin, W, Zhutovsky, P & van Wingen, G 2019, Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders. in Machine Learning: Methods and Applications to Brain Disorders. Machine Learning: Methods and Applications to Brain Disorders, Elsevier, pp. 249-266. https://doi.org/10.1016/B978-0-12-815739-8.00014-6

Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders. / Thomas, Rajat M.; Bruin, Willem; Zhutovsky, Paul et al.
Machine Learning: Methods and Applications to Brain Disorders. Elsevier, 2019. p. 249-266 (Machine Learning: Methods and Applications to Brain Disorders).

TY - CHAP
T1 - Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders
AU - Thomas, Rajat M.
AU - Bruin, Willem
AU - Zhutovsky, Paul
AU - van Wingen, Guido
PY - 2019/1/1
Y1 - 2019/1/1
N2 - In this chapter we explore three of the most common challenges in the application of machine learning techniques in brain disorders research: missing data, small sample sizes, and heterogeneity. After defining these challenges, we present a simple algorithm to generate data that are similar to a “real” dataset using pairwise correlations. This algorithm enables the reader to test the various strategies that are discussed later in the chapter. We then discuss a range of strategies that are currently available to mitigate the impact of missing data, small sample sizes, and heterogeneity on the results. As part of this discussion, we cover both classical strategies and state-of-the-art approaches based on neural networks. We conclude by providing a summary of key recommendations.
AB - In this chapter we explore three of the most common challenges in the application of machine learning techniques in brain disorders research: missing data, small sample sizes, and heterogeneity. After defining these challenges, we present a simple algorithm to generate data that are similar to a “real” dataset using pairwise correlations. This algorithm enables the reader to test the various strategies that are discussed later in the chapter. We then discuss a range of strategies that are currently available to mitigate the impact of missing data, small sample sizes, and heterogeneity on the results. As part of this discussion, we cover both classical strategies and state-of-the-art approaches based on neural networks. We conclude by providing a summary of key recommendations.
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85079503581&origin=inward
U2 - https://doi.org/10.1016/B978-0-12-815739-8.00014-6
DO - https://doi.org/10.1016/B978-0-12-815739-8.00014-6
M3 - Chapter
T3 - Machine Learning: Methods and Applications to Brain Disorders
SP - 249
EP - 266
BT - Machine Learning: Methods and Applications to Brain Disorders
PB - Elsevier
ER -

Thomas RM, Bruin W, Zhutovsky P, van Wingen G. Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders. In Machine Learning: Methods and Applications to Brain Disorders. Elsevier. 2019. p. 249-266. (Machine Learning: Methods and Applications to Brain Disorders). doi: https://doi.org/10.1016/B978-0-12-815739-8.00014-6
