GE461: Introduction to Data Science - Spring 2025

Introduction to data science fundamentals, techniques and applications; data collection, preparation, storage and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction, visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, computer vision; breadth knowledge on topics and hands-on experience through projects and computer assignments. STARS Syllabus

Prerequisites: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)
Credits: 3

Course Management Systems: Moodle
Course Website: http://www.cs.bilkent.edu.tr/~ge461/2025Spring

Instructor Team

  • S. Aksoy, C. Alkan, S. Arashloo, O. Arıkan, F. Can, E. Çiçek, T. Çukur, H. Dibeklioğlu, İ. Körpeoğlu, C. Tekin, E. Tüzün
  • Course Coordinator (contact point): S. Aksoy (saksoy AT cs.bilkent.edu.tr)

TAs

  • Farzad Hallaji Azad (farzad.hallaji AT bilkent.edu.tr)
  • Batuhan Uykulu (batuhan.uykulu AT bilkent.edu.tr)

Classroom and Hours

  • Classroom: EB-101
  • Class hours:
    • Mon 13:30-15:20
    • Thu 08:30-10:20

Grading Policy

  • Final: 40 %
  • Projects: 60 %. Multiple computer/programming/exercise assignments of various sizes.
  • There will be 5 projects. Each project is 12 %.

Attendance

  • Attendance is mandatory. A student who misses more than 9 hours will fail the course automatically.

Exam

  • TBD

Projects

  • Multiple computer/programming/exercise assignments of various sizes.
  • A project can be assigned earlier than the indicated date on the weekly plan.
  • Projects can be individual or group based. Instructors will decide.
  • Projects will be uploaded to Moodle.
  • Programming languages like Python, Java, R or Matlab can be used in the projects.
  • Gaining hands-on experience and experimenting will be important. Real-world data sets can be used (economic/financial data sets, medical/biological data sets, image/video data sets, social network data sets, IT data sets, etc.).

Other

  • Grades will be posted in SAPS.
  • There is no mandatory textbook for the course.

Week 1 (Jan 27, Jan 30)

Introduction; what is data science; data science applications. [Çiçek, Tüzün]
Topic Details: Introductory concepts in data science and applications. Overview of data science process.
Slides and Additional Material: ge461-lecture1-course_information-spring-2025.pdf
Topic Details: Software engineering applications.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events:

Week 2 (Feb 3, Feb 6)

Data science applications; data science pipeline. [Alkan, Dibeklioğlu]
Topic Details: Genomics applications.
Slides and Additional Material:
Topic Details: Computer vision applications.
Slides and Additional Material: ge461_applications_vision_2025s.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015
Events:

Week 3 (Feb 10, Feb 13)

Data representation; preprocessing; preparation; crowdsourcing. [Arashloo, Çiçek]
Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).
Slides and Additional Material: data_pre-processing.pdf
Topic Details: Crowdsourcing applications and usage in data science.
Slides and Additional Material: ge_461-lecture_6-_crowdsourcing.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events:

Week 4 (Feb 17, Feb 20)

Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing. [Körpeoğlu]
Topic Details: RDBMS, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
Slides and Additional Material: data_storage_and_access.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: SQLite Pandas MapReduce ApacheHadoop ApacheSpark
Events:

Week 5 (Feb 24, Feb 27)

Basic models; parametric models; fitting. [Arıkan]
Topic Details: Multiparameter Linear Regression
Slides and Additional Material: ch3_linear_regression.pdf
Project: Solve the following questions using linear regression: Exercises 3.7.8 and 3.7.9 in the ISLR reference book given below.
References: An Introduction to Statistical Learning with Applications in Python, R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor.
Events:

Week 6 (Mar 3, Mar 6)

Application [Arıkan]
Topic Details: Model Selection in Multiparameter Regression
Slides: ch6_model_selection.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: An Introduction to Statistical Learning with Applications in Python, R, Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani and Jonathan Taylor.
Events:

Week 7 (Mar 10, Mar 13)

Spring Break

Week 8 (Mar 17, Mar 20)

Dimensionality reduction; visualization. [Aksoy]
Topic Details: Feature reduction, feature selection, high-dimensional data visualization.
Slides and Additional Material: Dimensionality slides, t-SNE slides
Project/Exercise-Problem-Set/Homework: [Project (data)] (due 23:59 on April 7, 2025)
References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, Matlab: data visualization, Matplotlib: data visualization, t-SNE
Events:

Week 9 (Mar 24, Mar 27)

Unsupervised learning, clustering. [Aksoy]
Topic Details: K-means clustering, mixture models, hierarchical clustering.
Slides and Additional Material: Clustering slides
Project/Exercise-Problem-Set/Homework:
References: Matlab: cluster analysis, Scikit-learn: clustering, Scikit-learn: clustering examples
Events:

Week 10 (Mar 31, Apr 3)

Ramadan Holiday

Week 11 (Apr 7, Apr 10)

Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:

Week 12 (Apr 14, Apr 17)

Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Activation functions, convolutional neural networks, recurrent architectures.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:

Week 13 (Apr 21, Apr 24)

Machine learning in healthcare. [Çukur]
Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.
Slides and Additional Material: ge461_ml_in_healthcare.pdf
Project/Exercise-Problem-Set/Homework: ge461_pw13_description.pdf; ge461_pw13_data.zip (due date: 11 May 2025, 17:00)
References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5
Events: National Sovereignty and Children's Day (Apr 23)

Week 14 (Apr 28, May 1)

Data mining; online data stream classification; applications. [Can]
Topic Details: Concept drift, ensemble-based classification, text mining.
Slides and Additional Material: ge461_datastreamminingspring25.pdf
Project Tentative Dates: Announcement on April 28 or earlier; due date May 18, 23:59.
Project/Exercise-Problem-Set/Homework:
References:
Events: Labor and Solidarity Day (May 1)

Week 15 (May 5, May 8)

Reinforcement learning; applications. [Tekin]
Topic Details: Applications of Reinforcement Learning, Markov Decision Processes, Value Iteration, Q Learning
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:

Week 16 (May 12)

No class

Textbooks

Similar / Complementary Courses

Tools, Libraries, Systems, Languages

start.txt · Last modified: 2025/03/23 16:50 by ge461