Introduction to data science fundamentals, techniques, and applications; data collection, preparation, storage, and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction and visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, and computer vision; broad knowledge of the topics and hands-on experience through projects and computer assignments.
Prerequisites: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)
Credits: 3
Course Management Systems: Moodle
Course Website: http://www.cs.bilkent.edu.tr/~ge461/2022Spring
Introduction; what is data science; data science applications. [Çiçek, Tüzün]
Topic Details: Introductory concepts in data science and applications. Overview of data science process.
Slides and Additional Material: ge_461_-_lecture_1_-_course_information_spring_2022.pdf
Topic Details: Software engineering applications.
Slides and Additional Material: ge461_lecture_2_datascienceinsoftwareengineering.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events: Classes begin (Jan 31).
Data science applications; data science pipeline. [Alkan, Dibeklioğlu]
Topic Details: Genomics applications.
Slides and Additional Material:
Topic Details: Computer vision applications.
Slides and Additional Material: ge_461_-_lecture_4_-_computer_vision_applications_-_spring_2022.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015
Events:
Data representation; preprocessing; preparation; crowdsourcing. [Arashloo, Çiçek]
Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).
Slides and Additional Material: preprocessing.pdf
Topic Details: Crowdsourcing applications and usage in data science.
Slides and Additional Material: ge_461_-_lecture_6_-_crowdsourcing.pdf
Project/Exercise-Problem-Set/Homework: None this week.
Events:
Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing. [Körpeoğlu]
Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
Slides and Additional Material: Slides
Project/Exercise-Problem-Set/Homework:
References:
SQLite
Pandas
Apache Spark
Spark
Events:
Basic models; parametric models; fitting. [S. Dayanık]
Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence, linear regression and the least squares method, factors and dummy variables, all illustrated on the Dodgers Advertising and Promotion case study with R, RStudio, and SQLite.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:
Spring Break
Topic Details:
Slides:
Project/Exercise-Problem-Set/Homework:
References:
Events: Spring Break (March 10-13)
Application to customer choice problems (conjoint analysis) [S. Dayanık]
Topic Details: Part worths, part importance, and their estimation from product rankings with multiple regression.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework: Dodgers Promotion Project, due at 19:00 on Saturday, April 9, to be submitted on the Moodle page. Project details are in the dodgers.Rmd/dodgers.html files in the Week 7 course materials.
References:
Events:
Conjoint analysis continued, and authorship problem [S. Dayanık]
Topic Details: New product design with market simulation to increase overall market share; who wrote the Federalist Papers (identification of authorship by means of Bayesian classifiers and kNN).
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:
Dimensionality reduction; visualization. [Aksoy]
Topic Details: Feature reduction, feature selection, high-dimensional data visualization.
Slides and Additional Material: Dimensionality slides, t-SNE slides
Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on April 21, 2022)
References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, t-SNE
Events:
Unsupervised learning, clustering. [Aksoy]
Topic Details: K-means clustering, mixture models, hierarchical clustering.
Slides and Additional Material: Clustering slides
Project/Exercise-Problem-Set/Homework:
References: Matlab: cluster analysis, Scikit-learn: clustering
Events: Bilkent Day (April 3)
Machine learning; supervised learning; classifiers; deep learning. [Dündar]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material: Part1, Part2
Project/Exercise-Problem-Set/Homework:
References:
Events:
Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Activation functions, convolutional neural networks, recurrent architectures.
Slides and Additional Material: ge461_deep_learning_2022s.pdf
Project/Exercise-Problem-Set/Homework: [Project Description | Data] (due 23:55 on April 30, 2022)
References:
Events:
Machine learning in healthcare. [Çukur]
Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.
Slides and Additional Material: ge461_ml_in_healthcare.pdf
Project/Exercise-Problem-Set/Homework: ge461_pw13_description.pdf, ge461_pw13_data.zip (due May 6, 2022)
References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5
Events: Spring Festival (Apr 29-30)
Data mining; online data stream classification; applications. [Can]
Topic Details: Concept drift, ensemble-based classification, text mining.
Slides and Additional Material: DataStreamMining
Project/Exercise-Problem-Set/Homework:
References:
Events: Feast of Ramadan holiday (May 2-4)
Reinforcement learning; applications. [Tekin]
Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning.
Slides and Additional Material: ge461_reinforcementlearning.pdf
Project/Exercise-Problem-Set/Homework:
References:
Events: Last day of classes (May 13)