CS 481: Bioinformatics Algorithms
... more like:
(bio)informatics ALGORITHMS
Spring 2024
- Instructor: Can
Alkan
- TA: Ricardo
Román Brenes (ricardo@bilkent)
- Class hours:
- Tue 13:30-15:20 - Fri 09:30-10:20
- Class room: EE-214
- Office hour: by appointment.
NOTE: Some biology, molecular
biology, genetics background would help, but not
required. Basics regarding the topic will be covered in class.
Prerequisites: Elementary discrete mathematics, basic
algorithms and data structures, and programming proficiency with, e.g.,
C/C++/Java will be expected. Knowledge of elementary combinatorics and
probability, fundamental algorithms for sorting, searching, hashing and
dynamic programming, elementary graph algorithms would be very helpful.
Textbook: none required.
- Recommended Material
- Genome-Scale Algorithm Design, Veli Mäkinen, Djamal Belazzougui,
Fabio Cunial, Alexandru I. Tomescu, 2015, Cambridge University Press
- An Introduction to Bioinformatics Algorithms (Computational
Molecular Biology), Neil Jones and Pavel Pevzner, MIT Press, 2004
- Bioinformatics algorithms : an active learning approach, Phillip
Compeau and Pavel Pevzner, Active Learning Publishers; 2018
- Biological Sequence Analysis: Probabilistic Models of Proteins and
Nucleic Acids, Richard Durbin, Sean R. Eddy, Anders Krogh, Graeme
Mitchison, Cambridge University Press
- Bioinformatics: The Machine Learning Approach, Second Edition,
Pierre Baldi, Soren Brunak, MIT Press
- Algorithms on Strings, Trees, and Sequences: Computer Science and
Computational Biology, Dan Gusfield, Cambridge University Press
Grading: There will be one midterm exam (25%) and a
final exam (35%). A further 30% of the final grade will be based on
homework/programming assignments, and 10% will be based on quizzes.
Grading Policy: Note that we do not
discuss with students about grades. Therefore, we will not answer any
questions about the passing grades and/or students' requests for
passing the course. Any emails sent to this effect will be omitted.
Planned
Lecture Cancellations
The following lectures are most likely to be canceled due to conference
travels.
- April 30, 2024
- May 3, 2024
Course material (updated as of Fall 2022 - subject to further updates)
Note: Some slides are adapted from http://bix.ucsd.edu/bioalgorithms/slides.php<
All pseudocodes in the slides are provided as guidelines, not meant for
direct "copy-paste" implementation.
- 0-
Introduction
- 0b- Short note on homework implementations
- 1-
Motif Finding & Algorithm Design Techniques
- 2-
Exact String Matching - Knuth-Morris-Pratt and Boyer-Moore
- 3-
Exact String Matching - Rabin-Karp, Finite Automata. Shift/And
- 4-
Pattern Matching - Hashing, keyword trees, and suffix trees, suffix
arrays, Aho-Corasick
- 5-
Burrows-Wheeler Transformation and Ferragina-Manzini index
- 6-
Dynamic programming, Manhattan grid, edit distance, longest common
subsequence
- 7-
Global alignment, affine gap penalties, local alignment,
Needleman-Wunsch and Smith-Waterman
- Needleman-Wunsch with affine gaps sample
- Smith-Waterman samples
- Static
- Random
sequence generator
- 8-
Banded alignment, linear space alignment, block edit distance, Four
Russians method
- 9-
Multiple sequence alignment
- 10-
Phylogenetic tree construction: Guide trees, evolutionary trees -
Neighbor-Joining and UPGMA, parsimony, Sankoff and Fitch algorithms
- 11
- Heuristic sequence search. Short introduction to BLAST. Hash table
indexes, minimizers. Chaining
- 12- MEMs,
MUMs, BWA-MEM, minimap2, K-mer
data structures
- 14-
Graphs in genome analysis. OLC, de Bruijn, string graphs. Aligning
reads to graphs
- 15-
Applications: short introduction to genome sequencing. Current
platforms and data types. Standard file formats
- 16- Applications: programming libraries, application-specific
programming languages
Key dates (tentative, subject to change)
- Midterm: March 26, 2024 -- during class hours
- Quizzes:
- Quiz 1: February 13, 2024
- Quiz 2: March 5, 2024
- Quiz 3: March 19, 2024
- Quiz 4: April 16, 2024
- Quiz 5: May 14, 2024
- Homeworks:
- HW0:
- Assigned: February 9, 2024
- Deadline: February 16, 2024
- HW1:
- Assigned: February 19, 2024
- Deadline: March 1, 2024
- HW2:
- Assigned: March 6, 2024
- Deadline: March 19, 2024
- HW3:
- HW4:
- Assigned: April 15, 2024
- Deadline: April 26, 2024
- HW5:
- Assigned: May 2, 2024
- Deadline: May 14, 2024
Important information for exams
- You will be given the opportunity to bring cheat sheet to the midterm
and final exams (not the quizzes). DO NOT try to memorize anything,
instead grasp the logic behind each algorithm & data structure.
- Write legibly, and make your answers clear.
- Unintelligible answers may fail to receive credit.
- In the case of a "design an algorithm" question, you are expected to
give only pseudocode. Any "real" code may not be graded.
Frequently Asked Questions
- I need at least grade (C/C+/B etc ..) to be able to graduate or keep
my status at a certain level (satisfactory, honor, high honor, etc.).
Can you please increase my grade?
- I am likely receiving an F and fail the course. Can you lower the F
threshold?
- I need at least some specific grade to keep my cGPA above some
threshold. Can you please give me at least (something)?
- I think I deserve more partial credits from question X in exam.
- Which topics should I prioritize to study more for the midterm and/or
final exam?