Dealing with missing data, small sample sizes, and heterogeneity in machine learning studies of brain disorders

Research output: Chapter in Book/Report/Conference proceeding › Chapter › Academic › peer-review

26 Citations (Scopus)

Abstract

In this chapter we explore three of the most common challenges in the application of machine learning techniques in brain disorders research: missing data, small sample sizes, and heterogeneity. After defining these challenges, we present a simple algorithm to generate data that are similar to a “real” dataset using pairwise correlations. This algorithm enables the reader to test the various strategies that are discussed later in the chapter. We then discuss a range of strategies that are currently available to mitigate the impact of missing data, small sample sizes, and heterogeneity on the results. As part of this discussion, we cover both classical strategies and state-of-the-art approaches based on neural networks. We conclude by providing a summary of key recommendations.
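
The abstract does not spell out the chapter's data-generation algorithm, so the following is only a minimal sketch of the general idea of generating data that resemble a real dataset through its pairwise correlations. It assumes the features are continuous and roughly Gaussian, and it uses a Cholesky factor of the empirical correlation matrix, which is a swapped-in standard technique rather than the authors' method. The function name `simulate_like` and its parameters are hypothetical.

```python
import numpy as np


def simulate_like(real, n_samples, seed=0):
    """Draw synthetic rows whose means, standard deviations, and pairwise
    correlations approximate those of `real` (rows = subjects, columns = features).

    Assumption: features are continuous and approximately Gaussian.
    """
    rng = np.random.default_rng(seed)
    real = np.asarray(real, dtype=float)

    mean = real.mean(axis=0)
    std = real.std(axis=0, ddof=1)
    corr = np.corrcoef(real, rowvar=False)  # pairwise Pearson correlations

    # A tiny diagonal jitter keeps the matrix positive definite for Cholesky.
    chol = np.linalg.cholesky(corr + 1e-10 * np.eye(corr.shape[0]))

    # Independent standard normals become correlated after multiplying by chol.T.
    z = rng.standard_normal((n_samples, real.shape[1]))
    return mean + (z @ chol.T) * std


# Usage with a made-up "real" dataset of 200 subjects and 3 features.
demo_rng = np.random.default_rng(1)
demo_cov = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.4], [0.2, 0.4, 1.0]])
demo_real = demo_rng.multivariate_normal([10.0, 0.0, -5.0], demo_cov, size=200)
demo_synthetic = simulate_like(demo_real, n_samples=5000, seed=2)
```

A synthetic dataset of this kind has known properties, so values can be removed, the sample shrunk, or subgroups mixed on purpose to see how a given mitigation strategy behaves. The sketch reproduces only first- and second-order statistics; non-Gaussian or categorical features would need a different generator.
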
Original language: English
Title of host publication: Machine Learning: Methods and Applications to Brain Disorders
Publisher: Elsevier
Pages: 249-266
ISBN (Electronic): 9780128157398
Publication status: Published - 1 Jan 2019

Publication series

Name: Machine Learning: Methods and Applications to Brain Disorders
