Bilkent University
Department of Computer Engineering
M.S. THESIS PRESENTATION

 

Characterization of Structural Variation through Assembly-to-Assembly Comparison

 

Rafi Çoktalaş
Master's Student
(Supervisor: Assoc. Prof. Can Alkan)
Computer Engineering Department
Bilkent University

Abstract: Structural variations (SVs) are genomic variations affecting more than 50 nucleotides of DNA. SVs play a crucial role in evolution and have critical phenotypic effects on organisms, such as genetic diseases in humans, including autism, schizophrenia, epilepsy and cancer. Thus, SV characterization is of great significance. In the past, read-based methodologies were utilized due to the infeasibility of constructing genome assemblies. However, with advancements in technology, assembling genomes has become significantly more feasible, and complete assemblies of human and other primate genomes have been constructed. Despite these high-quality assemblies, SV discovery in human genomes remains challenging due to the genome’s repetitive nature and complex rearrangements caused by combinations of SVs. Most existing SV discovery tools operating on genome assemblies require whole-genome alignments, leading to high preprocessing times and memory usage. Therefore, new algorithms are still needed to discover SVs efficiently. Here we propose STRIVE, a linear-time algorithm that operates on genome assembly sketches instead of whole-genome alignments to characterize insertions, deletions and inversions. We evaluated the performance of STRIVE in two experiments with simulated data on the first chromosome of the GRCh38.p14 (hg38) assembly. STRIVE accurately detects insertions, deletions and inversions within 11 to 12 seconds, with preprocessing times ranging from 50 to 55 seconds. STRIVE achieved over 95% precision and recall in the simulations without duplications. In the simulations that included duplications and SNPs, STRIVE still maintained over 95% recall in inversion discovery, but precision and recall for insertions and deletions were lower, suggesting a need for increased robustness to duplications.

 

DATE: Friday, September 6 @ 13:00
PLACE: EA 409