Introduction to data science fundamentals, techniques and applications; data collection, preparation, storage and querying; parametric models for data; models and methods for fitting, analysis, evaluation, and validation; dimensionality reduction and visualization; various learning methods, classifiers, clustering, data and text mining; applications in diverse domains such as business, medicine, social networks, and computer vision; broad knowledge of these topics and hands-on experience through projects and computer assignments. STARS Syllabus
Prerequisites: (CS 101 or CS 114 or CS 115) and (MATH 230 or MATH 255 or MATH 260) and (MATH 225 or MATH 241 or MATH 220)
Credits: 3
Course Management Systems: Moodle
Course Website: http://www.cs.bilkent.edu.tr/~ge461/2024Spring
Instructor Team
TAs
Classroom and Hours
Grading Policy
Attendance
Exam
Projects
Other
Introduction; what is data science; data science applications. [Çiçek, Tüzün]
Topic Details: Introductory concepts in data science and applications. Overview of data science process.
Slides and Additional Material:
Topic Details: Software engineering applications.
Slides and Additional Material: ge_461_-_lecture_1_-_course_information_compressed.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References:
Events:
Data science applications; data science pipeline. [Alkan, Dibeklioğlu]
Topic Details: Genomics applications.
Slides and Additional Material:
Topic Details: Computer vision applications.
Slides and Additional Material: ge461_applications_vision_2024s.pdf
Project/Exercise-Problem-Set/Homework: None this week.
References: "Big Data: Astronomical or Genomical?", Stephens et al., 2015
Events:
Data representation; preprocessing; preparation; crowdsourcing. [Arashloo, Çiçek]
Topic Details: Normalization, Noise Removal (Filtering), Anomaly Detection, Data Compression, Noise Removal (ICA).
Slides and Additional Material: 2024_pre-processing.pdf
Topic Details: Crowdsourcing applications and usage in data science.
Slides and Additional Material: ge461-crowdsourcing.pdf
Project/Exercise-Problem-Set/Homework: None this week.
Events:
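As a small illustration of the normalization and anomaly-detection topics above, here is a plain-Python sketch of min-max scaling, z-score standardization, and a simple z-score threshold rule for flagging anomalies. The function names, the sample values, and the threshold are illustrative assumptions, not course material.

```python
from statistics import mean, pstdev


def min_max(xs):
    """Rescale values linearly onto [0, 1]; assumes max(xs) > min(xs)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score(xs):
    """Shift to mean 0 and scale to unit (population) standard deviation."""
    mu, sigma = mean(xs), pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def flag_outliers(xs, threshold=3.0):
    """Flag values whose absolute z-score exceeds the threshold."""
    return [abs(z) > threshold for z in z_score(xs)]


data = [10.0, 12.0, 11.0, 13.0, 12.0, 50.0]  # made-up values with one obvious outlier
print(min_max(data))
print(flag_outliers(data, threshold=2.0))  # [False, False, False, False, False, True]
```

A threshold of 2.0 is used here because, with only six points and the population standard deviation, no single value can reach |z| = 3 (the maximum possible is √5 ≈ 2.24): the outlier inflates the very standard deviation it is measured against. Median-based rules are a common, more robust alternative for that reason.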
Data collection; storage; querying; SQL, NoSQL; cloud; distributed storage and computing. [Körpeoğlu]
Topic Details: RDBMSs, SQL; SQLite, Pandas; NoSQL; MapReduce and Hadoop; Spark.
Slides and Additional Material: Slides
Project/Exercise-Problem-Set/Homework:
References:
SQLite
Pandas
MapReduce
Apache Hadoop
Apache Spark
Events:
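To make the SQL and SQLite topics concrete, the sketch below uses Python's built-in sqlite3 module to create an in-memory table, insert rows with a parameterized statement, and run a GROUP BY aggregation. The table name, columns, and values are hypothetical.

```python
import sqlite3

# In-memory database; the "orders" table and its contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",  # ? placeholders avoid SQL injection
    [("east", 10.0), ("east", 30.0), ("west", 5.0)],
)

# Aggregate per region: number of orders and their average amount.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, AVG(amount) AS avg_amount
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()
print(rows)  # [('east', 2, 20.0), ('west', 1, 5.0)]
conn.close()
```

With pandas installed, the same query can be loaded into a DataFrame with pandas.read_sql_query(query, conn) before the connection is closed.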
Basic models; parametric models; fitting. [S. Dayanık]
Topic Details: Exploratory data analysis, loess smoother, chi-squared test of independence, linear regression and least squares method, factors and dummy variables, all illustrated on the Dodgers Advertising and Promotion case study with R, RStudio, and SQLite.
Slides and Additional Material: Week05 Dodgers.zip
Project/Exercise-Problem-Set/Homework: Dodgers Project
References: Posit (formerly RStudio), R, SQLite, R for Data Science, Modern Data Science with R
Events:
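The least-squares fit with a dummy variable can be sketched without any libraries. The course itself works in R; this Python version only shows the arithmetic: a 0/1 dummy column encodes a two-level factor (here a hypothetical promotion flag), and the coefficients solve the normal equations (X^T X) b = X^T y. The data are synthetic, generated exactly from y = 2 + 3x + 5d, so the fit should recover those coefficients.

```python
def ols(X, y):
    """Least-squares coefficients b solving the normal equations (X^T X) b = X^T y."""
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(p)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


x = [1, 2, 3, 4, 5, 6]
promo = ["NO", "YES", "NO", "YES", "NO", "YES"]  # hypothetical two-level factor
y = [5, 13, 11, 19, 17, 25]  # synthetic: y = 2 + 3x + 5*(promo == "YES")

# Design matrix: intercept column, numeric predictor, and a 0/1 dummy for promo == "YES".
X = [[1.0, xi, 1.0 if flag == "YES" else 0.0] for xi, flag in zip(x, promo)]
coef = ols(X, y)
print([round(c, 6) for c in coef])  # [2.0, 3.0, 5.0]
```

Forming X^T X is fine for a toy example, but it squares the condition number of the problem; R's lm() solves least squares with a QR decomposition instead.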
Application to customer choice problems (conjoint analysis) [S. Dayanık]
Topic Details: Part-worths and part importances, and their estimation from product rankings with multiple regression.
Slides: Conjoint Analysis and Market Simulation
Project/Exercise-Problem-Set/Homework:
References:
Events: Spring Break (Mar 7-8)
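The part-worth and importance calculations can be sketched as follows. The attributes, levels, and ranking are hypothetical, and converting a rank r out of n profiles into a preference score n + 1 - r is one common convention (an assumption here). For a balanced full-factorial design like this one, the multiple-regression coefficients under effects coding equal each level's mean score minus the grand mean, so the sketch computes them that way rather than calling a regression routine. An attribute's importance is the range of its part-worths divided by the sum of ranges over all attributes.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical full-factorial design (2 brands x 3 prices), ranked 1 = most preferred.
ranked_profiles = [
    ({"brand": "A", "price": "low"}, 1),
    ({"brand": "B", "price": "low"}, 2),
    ({"brand": "A", "price": "mid"}, 3),
    ({"brand": "B", "price": "mid"}, 4),
    ({"brand": "A", "price": "high"}, 5),
    ({"brand": "B", "price": "high"}, 6),
]
n = len(ranked_profiles)
# Reverse ranks so that a larger score means a more preferred profile.
scored = [(attrs, n + 1 - rank) for attrs, rank in ranked_profiles]
grand_mean = mean(score for _, score in scored)


def part_worths(scored, attribute, grand_mean):
    """Level mean minus grand mean (equals effects-coded OLS in a balanced design)."""
    by_level = defaultdict(list)
    for attrs, score in scored:
        by_level[attrs[attribute]].append(score)
    return {level: mean(s) - grand_mean for level, s in by_level.items()}


worths = {a: part_worths(scored, a, grand_mean) for a in ("brand", "price")}
ranges = {a: max(w.values()) - min(w.values()) for a, w in worths.items()}
importance = {a: r / sum(ranges.values()) for a, r in ranges.items()}
print(worths)      # {'brand': {'A': 0.5, 'B': -0.5}, 'price': {'low': 2.0, 'mid': 0.0, 'high': -2.0}}
print(importance)  # {'brand': 0.2, 'price': 0.8}
```

For fractional or unbalanced designs the shortcut no longer holds, and the part-worths have to come from an actual regression on dummy- or effects-coded levels.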
Authorship problem, text analysis, and topic modeling [S. Dayanık]
Topic Details: Who wrote the Federalist Papers? Identification of authorship by means of Bayesian classifiers and kNN.
Slides and Additional Material:
Federalist Papers Analysis
Latent Dirichlet Allocation Graphical Model
Project/Exercise-Problem-Set/Homework:
References:
Events:
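A minimal sketch of the Bayesian-classifier idea for authorship: a multinomial naive Bayes model over a small vocabulary of function words, with Laplace (add-alpha) smoothing, scored in log space to avoid numerical underflow. The token lists are invented toy data, loosely echoing the well-known observation that Hamilton used "upon" far more often than Madison; they are not drawn from the actual papers, and the function names are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs_by_author, vocab, alpha=1.0):
    """Per-author log prior and smoothed log word probabilities over `vocab`."""
    total_docs = sum(len(docs) for docs in docs_by_author.values())
    model = {}
    for author, docs in docs_by_author.items():
        counts = Counter(w for doc in docs for w in doc if w in vocab)
        denom = sum(counts.values()) + alpha * len(vocab)
        log_prior = math.log(len(docs) / total_docs)
        log_lik = {w: math.log((counts[w] + alpha) / denom) for w in vocab}
        model[author] = (log_prior, log_lik)
    return model


def classify(model, doc, vocab):
    """Return the author with the highest log posterior score; other tokens are ignored."""
    scores = {
        author: log_prior + sum(log_lik[w] for w in doc if w in vocab)
        for author, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


vocab = {"upon", "whilst", "there", "on"}
training = {  # invented toy documents, already tokenized
    "Hamilton": [["upon", "the", "upon", "there"], ["upon", "on", "there"]],
    "Madison": [["whilst", "on", "the"], ["on", "whilst", "on"]],
}
model = train_nb(training, vocab)
print(classify(model, ["upon", "there", "the"], vocab))  # Hamilton
print(classify(model, ["whilst", "on"], vocab))          # Madison
```

A kNN classifier would instead compare the word-frequency vector of a disputed paper with those of papers of known authorship and take a vote among the k nearest.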
Dimensionality reduction; visualization. [Aksoy]
Topic Details: Feature reduction, feature selection, high-dimensional data visualization.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References: Matlab: dimensionality reduction, Scikit-learn: decomposition, Scikit-learn: decomposition examples, Scikit-learn: manifold learning, t-SNE
Events:
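As one example of linear feature reduction, the sketch below finds the first principal component of a small 2-D dataset by power iteration on the sample covariance matrix and projects the centered data onto it, reducing two features to one. The data are hypothetical and deliberately collinear (y = 2x), so the component should point along (1, 2)/√5; in practice a library routine such as scikit-learn's PCA would be used.

```python
import math


def first_principal_component(points, iters=100):
    """Leading eigenvector of the sample covariance matrix, found by power iteration."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # starting vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centered


def project(centered, v):
    """1-D coordinates of each centered point along the unit direction v."""
    return [sum(c[j] * v[j] for j in range(len(v))) for c in centered]


points = [(1, 2), (2, 4), (3, 6), (4, 8)]  # hypothetical collinear data
v, centered = first_principal_component(points)
z = project(centered, v)
print([round(c, 6) for c in v])  # [0.447214, 0.894427]
```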
Unsupervised learning, clustering. [Aksoy]
Topic Details: K-means clustering, mixture models, hierarchical clustering.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References: Matlab: cluster analysis, Scikit-learn: clustering
Events:
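K-means clustering can be sketched with Lloyd's algorithm, which alternates an assignment step (each point joins its nearest center) and an update step (each center moves to the mean of its points) until nothing changes. To keep the example deterministic, it initializes with a farthest-first rule, a simplification in the spirit of k-means++ rather than k-means++ itself; the data and function names are hypothetical.

```python
import math


def farthest_first(points, k):
    """Start at points[0], then repeatedly add the point farthest from all chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and mean-update steps until stable."""
    centers = farthest_first(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to its cluster mean (keep it if the cluster is empty).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centers, clusters = kmeans(points, k=2)
print(centers)  # [(0.5, 0.5), (10.5, 10.5)]
```

Lloyd's algorithm only reaches a local optimum, so with other initializations it can get stuck in a poor partition; scikit-learn's KMeans uses k-means++ initialization by default for this reason.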
Machine learning; supervised learning; classifiers; deep learning. [Dündar]
Topic Details: Bayesian decision theory, linear discriminants, introduction to neural networks, support vector machines, decision trees.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events: Bilkent Day (Apr 3)
Ramadan Holiday
Machine learning; supervised learning; classifiers; deep learning. [Dibeklioğlu]
Topic Details: Activation functions, convolutional neural networks, recurrent architectures.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:
Machine learning in healthcare. [Çukur]
Topic Details: Healthcare analytics: diagnostics, medical imaging, in-patient care, hospital management, risk analytics, wearables. Deep learning architectures for medical applications.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Ch. 11 and 14; Mead, Analog VLSI and Neural Systems, Ch. 4; Bishop, Pattern Recognition and Machine Learning, Ch. 5
Events: National Sovereignty and Children's Day (Apr 23)
Data mining; online data stream classification; applications. [Can]
Topic Details: Concept drift, ensemble-based classification, text mining.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events: Labor and Solidarity Day (May 1)
Reinforcement learning; applications. [Tekin]
Topic Details: Applications of Reinforcement Learning, Markov Decision Processes, Value Iteration, Q-Learning.
Slides and Additional Material:
Project/Exercise-Problem-Set/Homework:
References:
Events:
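Value iteration can be sketched on a toy deterministic MDP: a hypothetical four-state corridor in which moving right into the last state earns reward 1 and ends the episode. Each sweep applies the Bellman optimality update V(s) ← max_a [r(s, a) + γ V(s')] in place until the values stop changing, and the greedy policy is read off the converged values. The corridor, rewards, and discount γ = 0.9 are illustrative assumptions.

```python
N = 4        # hypothetical corridor: states 0..3, state 3 is the terminal goal
GAMMA = 0.9  # discount factor (illustrative)
ACTIONS = ("left", "right")


def step(s, a):
    """Deterministic transition: move one cell, clipped at the ends; reward 1 on reaching the goal."""
    s_next = max(0, min(N - 1, s + (1 if a == "right" else -1)))
    return s_next, (1.0 if s_next == N - 1 else 0.0)


def q_value(V, s, a):
    """One-step lookahead value r(s, a) + GAMMA * V(s')."""
    s_next, r = step(s, a)
    return r + GAMMA * V[s_next]


def value_iteration(tol=1e-10):
    V = [0.0] * N  # the terminal state's value stays 0
    while True:
        delta = 0.0
        for s in range(N - 1):  # skip the terminal state
            best = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


V = value_iteration()
policy = [max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in range(N - 1)]
print([round(v, 4) for v in V])  # [0.81, 0.9, 1.0, 0.0]
print(policy)                    # ['right', 'right', 'right']
```

Q-learning targets the same action values but estimates them from sampled transitions, without needing the transition function step to be known in advance.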
To be used if needed