Introduction to data science fundamentals, techniques, and applications; data collection, preparation, storage, and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction and visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, and computer vision; breadth of knowledge across topics and hands-on experience through projects and computer assignments. STARS Syllabus
Prerequisites: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)
Credits: 3
Course Management Systems: Moodle
Course Website: http://www.cs.bilkent.edu.tr/~ge461/2025Spring
Instructor Team
TAs
Classroom and Hours
Grading Policy
Attendance
Exam
Projects
Other
Introduction; what is data science; data science applications. [Çiçek, Tüzün]
Topic Details: Introductory concepts in data science and applications. Overview of data science process.
Slides and Additional Material: ge461-lecture1-course_information-spring-2025.pdf
Topic Details: Software engineering applications.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events:
Data science applications; data science pipeline. [Alkan, Dibeklioğlu]
Topic Details: Genomics applications.
Slides and Additional Material:
Topic Details: Computer vision applications.
Slides and Additional Material: ge461_applications_vision_2025s.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015
Events:
Data representation; preprocessing; preparation; crowdsourcing. [Arashloo, Çiçek]
Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).
Slides and Additional Material: data_pre-processing.pdf
Topic Details: Crowdsourcing applications and usage in data science.
Slides and Additional Material: ge_461-lecture_6-_crowdsourcing.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events:
Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing. [Körpeoğlu]
Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
Slides and Additional Material: data_storage_and_access.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References:
SQLite
Pandas
MapReduce
Apache Hadoop
Apache Spark
Events:
Basic models; parametric models; fitting. [Arıkan]
Topic Details: Multiparameter Linear Regression
Slides and Additional Material: ch3_linear_regression.pdf
Project: Solve the following questions using linear regression: Exercises 3.7.8 and 3.7.9 in the ISLR reference book given below.
References: An Introduction to Statistical Learning with Applications in Python, R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor.
Events:
Application [Arıkan]
Topic Details: Model Selection in Multiparameter Regression
Slides: ch6_model_selection.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: An Introduction to Statistical Learning with Applications in Python, R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor.
Events:
Spring Break
Dimensionality reduction; visualization. [Aksoy]
Topic Details: Feature reduction, feature selection, high-dimensional data visualization.
Slides and Additional Material: Dimensionality slides, t-SNE slides
Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on April 7, 2025)
References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, Matlab: data visualization, Matplotlib: data visualization, t-SNE
Events:
Unsupervised learning, clustering. [Aksoy]
Topic Details: K-means clustering, mixture models, hierarchical clustering.
Slides and Additional Material: Clustering slides
Project/Exercise-Problem-Set/Homework:
References: Matlab: cluster analysis, Scikit-learn: clustering, Scikit-learn: clustering examples
Events:
Ramadan Holiday
Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:
Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Activation functions, convolutional neural networks, recurrent architectures.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:
Machine learning in healthcare. [Çukur]
Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.
Slides and Additional Material: ge461_ml_in_healthcare.pdf
Project/Exercise-Problem-Set/Homework: ge461_pw13_description.pdf; ge461_pw13_data.zip (due date: 11 May 2025, 17:00)
References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5
Events: National Sovereignty and Children's Day (Apr 23)
Data mining; online data stream classification; applications. [Can]
Topic Details: Concept drift, ensemble-based classification, text mining.
Slides and Additional Material: ge461_datastreamminingspring25.pdf
Tentative Project Dates: Announcement on April 28 or earlier; due May 18, 23:59.
Project/Exercise-Problem-Set/Homework:
References:
Events: Labor and Solidarity Day (May 1)
Reinforcement learning; applications. [Tekin]
Topic Details: Applications of reinforcement learning, Markov decision processes, value iteration, Q-learning.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:
No class