A comparison study on creating simulated patient data for individuals suffering from chronic coronary disorders

Angela Koloi, Vasileios S. Loukas, Antonis Sakellarios, Jos A. Bosch, Rick Quax, Karina Nowakowska, Nikolaos Tachos, Jakub Kaźmierski, Costas Papaloukas, Dimitrios Fotiadis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Virtual population (VP) and synthetic data generation is an emerging area of data science that has lately gained attention. This field has the potential to significantly affect the healthcare industry by providing a means to augment clinical research databases that have a shortage of subjects. The current study provides a comparative analysis of five distinct approaches for creating virtual data populations from real patient data. The dataset used for the analyses comprised clinical data collected from patients scheduled for elective coronary artery bypass graft (CABG) surgery. The five computational techniques employed to augment the dataset were: (i) Tabular Preset, (ii) Gaussian Copula Model, (iii) a Generative Adversarial Network (GAN) based Deep Learning data synthesizer (CTGAN), (iv) a variation of the CTGAN Model (Copula GAN), and (v) a Variational Autoencoder (VAE) based Deep Learning data synthesizer (TVAE). The techniques were assessed on their effectiveness in producing high-quality virtual data. For this purpose, dataset correlation matrices, cosine similarity distance, density histograms, and kernel density estimation were employed to compare each attribute with its synthetic equivalent. Our findings demonstrate that the Gaussian Copula Model prevails in creating virtual data with consistent distributions (Kolmogorov-Smirnov (KS) and Chi-Squared (CS) test scores of 0.9 and 0.98, respectively) and correlation patterns (average cosine similarity of 0.95).

Clinical Relevance - It has been shown that the use of a VP can increase the predictive performance of an ML model compared with using a smaller, non-augmented population.
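The abstract names its evaluation metrics but does not spell out how they are computed. The following is a minimal sketch of two of them, under stated assumptions: that the reported KS score has the form 1 minus the two-sample KS statistic (so that higher is better), and that the cosine similarity is taken between the flattened correlation matrices of the real and synthetic data. Neither assumption is confirmed by the abstract. The function names and the toy values are hypothetical and are not taken from the study.

```python
import math
from bisect import bisect_right


def ks_complement(real, synthetic):
    """Return 1 - D, where D is the two-sample Kolmogorov-Smirnov statistic.

    Assumption: the score is oriented so that 1.0 means identical
    empirical distributions and 0.0 means fully separated ones.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    d = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect_right(synth_sorted, x) / len(synth_sorted)
        d = max(d, abs(cdf_real - cdf_synth))
    return 1.0 - d


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def correlation_matrix(columns):
    """Pairwise Pearson correlations for a list of numeric columns."""
    return [[pearson(a, b) for b in columns] for a in columns]


def cosine_similarity(matrix_a, matrix_b):
    """Cosine similarity between two same-shaped matrices, flattened to vectors."""
    flat_a = [v for row in matrix_a for v in row]
    flat_b = [v for row in matrix_b for v in row]
    dot = sum(a * b for a, b in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(a * a for a in flat_a))
    norm_b = math.sqrt(sum(b * b for b in flat_b))
    return dot / (norm_a * norm_b)


# Made-up toy columns (hypothetical values, not data from the study).
real_columns = [[54, 61, 67, 70, 58, 63], [27.1, 29.4, 31.0, 30.2, 26.5, 28.8]]
synthetic_columns = [[55, 60, 66, 71, 59, 64], [27.5, 28.9, 30.6, 30.9, 26.1, 29.0]]

per_column_ks = [
    ks_complement(r, s) for r, s in zip(real_columns, synthetic_columns)
]
correlation_agreement = cosine_similarity(
    correlation_matrix(real_columns), correlation_matrix(synthetic_columns)
)
```

In this sketch, `per_column_ks` holds one score per attribute and `correlation_agreement` is a single number summarizing how closely the synthetic data reproduce the pairwise correlation structure. Categorical attributes would need a separate comparison, such as the Chi-Squared test mentioned in the abstract, which is not covered here.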
Original language: English
Title of host publication: 2023 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Conference, EMBC 2023 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350324471
Publication status: Published - 2023
Event: 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Conference, EMBC 2023 - Sydney, Australia
Duration: 24 Jul 2023 → 27 Jul 2023

Publication series

Name: Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS

Conference

Conference: 45th Annual International Conference of the IEEE Engineering in Medicine and Biology Conference, EMBC 2023
Country/Territory: Australia
City: Sydney
Period: 24/07/2023 → 27/07/2023
