Computational analyses to characterise hidden information in short and long read sequencing data of human genomes: there’s more than meets the reference

Research output: PhD Thesis › PhD-Thesis - Research and graduation internal

Abstract

Next-generation sequencing (NGS) has enabled us to accurately determine the nucleotide sequence of short fragments of DNA at a massive scale, which has led to various clinical applications of human genome sequencing. To extract information from these NGS experiments, virtually all analyses map the sequenced reads to a reference assembly of the human genome. Importantly, a large fraction (~12%) of the sequenced DNA fragments in these experiments is ignored, because their origin cannot be traced back to a single position on the reference assembly. These ignored, or unmapped, fragments have two origins. On the one hand, they originate from sequence that occurs more than once in the genome (repeats). On the other hand, they originate from sequence that is absent from the reference assembly. In practice, many of these unmapped fragments originate from so-called structural variations (SVs), regions where the sequenced genome differs from the reference assembly. In Part 1 of this thesis, we introduce methods that use long-read sequencing technology to study this source of sequence variation. In Part 2, we specifically study the DNA fragments that cannot be traced back to the human reference assembly, but instead appear to originate from DNA viruses.
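
The abstract distinguishes reads that fit more than one reference position (repeats) from reads that do not map at all. As an illustration only, and not the method used in this thesis, the Python sketch below tallies both categories from aligner output in SAM text format. It assumes that FLAG bit 0x4 marks an unmapped read and uses a hypothetical mapping-quality cut-off (`MIN_MAPQ`) as a rough proxy for ambiguously placed, repeat-derived reads; `classify_reads` and `unusable_fraction` are illustrative names.

```python
from collections import Counter

# Hypothetical cut-off (assumption): MAPQ 0 usually signals several equally good hits.
MIN_MAPQ = 1

UNMAPPED = 0x4        # SAM FLAG bit: segment unmapped
SECONDARY = 0x100     # SAM FLAG bit: secondary alignment
SUPPLEMENTARY = 0x800 # SAM FLAG bit: supplementary alignment


def classify_reads(sam_lines):
    """Count reads as 'unique', 'ambiguous' (low MAPQ) or 'unmapped'."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # columns FLAG and MAPQ
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif mapq < MIN_MAPQ:
            counts["ambiguous"] += 1  # assumed to reflect repeat-derived sequence
        else:
            counts["unique"] += 1
    return counts


def unusable_fraction(counts):
    """Fraction of reads that cannot be placed at a single reference position."""
    total = sum(counts.values())
    return (counts["unmapped"] + counts["ambiguous"]) / total if total else 0.0


if __name__ == "__main__":
    example = [
        "@HD\tVN:1.6",
        "r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\t*",    # unique hit
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTGGGGCC\t*",            # unmapped
        "r3\t0\tchr2\t500\t0\t10M\t*\t0\t0\tAAAAAAAAAA\t*",     # MAPQ 0
        "r3\t256\tchr5\t900\t0\t10M\t*\t0\t0\tAAAAAAAAAA\t*",   # secondary, skipped
    ]
    counts = classify_reads(example)
    print(dict(counts), round(unusable_fraction(counts), 3))
```

Secondary and supplementary records are skipped so that each read is counted once. A real pipeline would more likely read BAM files through a dedicated tool such as samtools or pysam than parse SAM text by hand.
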
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution
Supervisors/Advisors
  • Sistermans, Erik, Supervisor
  • Reinders, M.J.T., Supervisor, External person
  • Holstege, Henne, Co-supervisor
Award date: 9 Dec 2022
Place of Publication: s.l.
Publisher
Publication status: Published - 9 Dec 2022

Keywords

  • NIPT
  • cell-free DNA
  • de-novo assembly
  • long-read sequencing
  • next-generation sequencing
  • non-invasive prenatal testing
  • structural variation
  • viral DNA
