Introduction to data science fundamentals, techniques, and applications; data collection, preparation, storage, and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction and visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, and computer vision; broad knowledge of topics and hands-on experience through projects and computer assignments. STARS Syllabus
Prerequisites: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)
Credits: 3
Course Management Systems: Moodle
Course Website: http://www.cs.bilkent.edu.tr/~ge461/2023Spring
Introduction; what is data science; data science applications. [Aksoy, Tüzün]
Topic Details: Introductory concepts in data science and applications. Overview of data science process.
Slides and Additional Material: ge461_lecture1_course_information.pdf
Topic Details: Software engineering applications.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events: Classes begin (Jan 30).
Data science applications; data science pipeline. [Alkan, Dibeklioğlu]
Topic Details: Genomics applications.
Slides and Additional Material: ge461_lectures_3_genomics_applications-spring2023.pdf
Topic Details: Computer vision applications.
Slides and Additional Material: ge461_applications_vision_2023s.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015
Events:
Crowdsourcing; data representation; preprocessing; preparation. [Arashloo]
Topic Details: Crowdsourcing applications and usage in data science.
Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).
Slides and Additional Material: crowdsourcing.pdf
Slides and Additional Material: preprocessing.pdf
Project/Exercise-Problem-Set/Homework: None this week.
Events:
Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing. [Körpeoğlu]
Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
Slides and Additional Material: data-storage-and-processing.pdf
Project/Exercise-Problem-Set/Homework:
References:
SQLite
Pandas
Apache Spark
Spark
Events:
Spring Break
Topic Details:
Slides:
Project/Exercise-Problem-Set/Homework:
References:
Events: Spring Break (Mar 6-8)
Basic models; parametric models; fitting. [S. Dayanık]
Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence.
Slides and Additional Material: s2023_week06.zip
Project/Exercise-Problem-Set/Homework:
References:
Events:
Linear regression, goodness of fit [S. Dayanık]
Topic Details: Linear regression and least squares method, factors and dummy variables, analysis of variance.
Slides and Additional Material: s2023_week07.zip
Project/Exercise-Problem-Set/Homework:
References:
Events:
Diagnostic plot, nested and unnested model comparisons [S. Dayanık]
Topic Details: Hypothesis testing, confidence intervals, prediction intervals
Slides and Additional Material: s2023_week08.zip
Project: Complete Analysis of Dodgers Advertising and Promotion Study, due 19:00 on Sunday, April 23. Details are in dodgers.html in the zip file.
References:
Events:
Dimensionality reduction; visualization. [Aksoy]
Topic Details: Feature reduction, feature selection, high-dimensional data visualization.
Slides and Additional Material: Dimensionality slides, t-SNE slides
Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on May 7, 2023)
References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, Matlab: data visualization, Matplotlib: data visualization, t-SNE
Events: Bilkent Day (April 3)
Midterm Week
Unsupervised learning, clustering. [Aksoy]
Topic Details: K-means clustering, mixture models, hierarchical clustering.
Slides and Additional Material: Clustering slides
Project/Exercise-Problem-Set/Homework:
References: Matlab: cluster analysis, Scikit-learn: clustering, Scikit-learn: clustering examples
Events: Feast of Ramadan holiday (Apr 21-23), National Sovereignty and Children's Day holiday (Apr 23)
Machine learning; supervised learning; classifiers; deep learning. [Dündar]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material: ge461_supervisedlearning_part1.pdf
Project/Exercise-Problem-Set/Homework:
References:
Events:
Machine learning; supervised learning; classifiers; deep learning. [Dündar]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material: ge461_supervisedlearning_part2.pdf
Project/Exercise-Problem-Set/Homework:
References:
Events: Labor and Solidarity Day holiday (May 1)
Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Activation functions, convolutional neural networks, recurrent architectures.
Slides and Additional Material: ge461_deep_learning_2023s.pdf
Project/Exercise-Problem-Set/Homework: [Project Description | Data] (due 23:55 on May 22, 2023)
References:
Events:
Machine learning in healthcare. [Çukur]
Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.
Slides and Additional Material: ge461_ml_in_healthcare.pdf
Project: ge461_pw13_description.pdf ge461_pw13_data.zip
References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5
Events:
Data mining; online data stream classification; applications. [Can]
Topic Details: Concept drift, ensemble-based classification, text mining.
Slides and Additional Material: ge461_datastreamminingspring23.pdf ge461_datastreamhwspringver2_2023.pdf
Project/Exercise-Problem-Set/Homework: ge461_datastreamhwspringver1_2023_2.pdf
References:
Events:
Reinforcement learning; applications. [Tekin]
Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning, multi-armed bandits.
Slides and Additional Material: https://www.dropbox.com/s/65h9melvnvuml2x/ge461_reinforcementlearning.pdf?dl=0
Project/Exercise-Problem-Set/Homework:
References:
Events: