Abstract
Next-generation sequencing (NGS) has enabled us to accurately determine the nucleotide sequence of short DNA fragments at a massive scale, which has led to various clinical applications of human genome sequencing. To extract information from these NGS experiments, virtually all analyses map the sequenced reads to a reference assembly of the human genome. Importantly, in these experiments a large fraction (~12%) of the sequenced DNA fragments is ignored, because these sequences cannot be traced back to a single position on the reference assembly. These ignored, or unmapped, fragments have two origins. On the one hand, they originate from sequence that occurs more than once in the genome (repeats). On the other hand, they originate from sequence that is absent from the reference assembly. In practice, many of these unmapped fragments originate from so-called structural variations (SVs), regions where the sequenced genome differs from the reference assembly. In Part 1 of this thesis, we study this source of sequence variation using long-read sequencing technology and introduce methods for doing so. In Part 2, we specifically study the DNA fragments that cannot be traced back to the human reference assembly but instead appear to originate from DNA viruses.
Original language | English |
---|---|
Qualification | Doctor of Philosophy |
Award date | 9 Dec 2022 |
Place of Publication | s.l. |
Publication status | Published - 9 Dec 2022 |
Keywords
- NIPT
- cell-free DNA
- de novo assembly
- long-read sequencing
- next-generation sequencing
- non-invasive prenatal testing
- structural variation
- viral DNA