Instructor:
Pinar Duygulu
Office : EA 433
e-mail : duygulu[at]cs.bilkent.edu.tr
Phone : (312) 290 31 43
Office hours: by appointment
Course web page:
http://www.cs.bilkent.edu.tr/~duygulu/Courses/CS554/
Textbook:
Computer Vision - A Modern Approach
by David A. Forsyth & Jean Ponce, Prentice Hall, Ed. 1,
2002
Other textbooks:
Computer Vision by Dana Ballard and Chris Brown (available online)
Digital Image Processing by Rafael Gonzalez and Richard Woods
Computer Vision by Linda Shapiro and George Stockman
Related Material: http://www.cs.bilkent.edu.tr/~duygulu/CVlinks.html
Also other complementary articles that will be made
available
Time & Location: Thursdays 10:40-13:30, EA 502
Course Description:
Basic concepts in computational vision. Relation to human visual
perception. The analysis and understanding of image and video data.
Mathematical foundations, image formation and representation,
segmentation, feature extraction, contour and region analysis, camera
geometry and calibration, stereo, motion, 3-D reconstruction, object
and scene recognition, object and people tracking, human activity
recognition and inference.
Prerequisites: Knowledge of linear algebra, calculus, probability, and statistics
Topics:
Introduction, Color and Light, Linear
Filters, Texture, Edge detection, Interest Points, Cameras,
Multi-view Geometry, Stereopsis, Motion,
Segmentation,
Object recognition, Face recognition, Image and Video
Databases
Grading:
Projects 55%
Midterm 15%
Final 20%
Paper Presentations 10%
Announcements:
Lectures
Introduction
(slides)
|
- Topics
- What is computer vision? Why is it difficult? Which cues
do humans use to perceive? Application areas
- Links
|
- Topics
- Image Representation, Review of Linear Algebra, Geometrical
Transformations, Introduction to Matlab, Handling Images in Matlab
- Readings:
|
- Topics
- Image Formation, Point Processing, Blob Processing,
Binary image analysis, Thresholding, Connected component analysis,
Mathematical morphology, Region properties
- Readings:
- Links
|
Linear Filters
(slides1,
slides2)
|
- Topics
- Linear filters, convolution, smoothing,
derivatives, Fourier transform, sampling and aliasing, Gaussian
pyramids
- Readings
- Chapter 7 from Forsyth&Ponce
- Correlation
and convolution, by David Jacobs
- Computer vision for interactive computer graphics, W. T. Freeman,
D. Anderson, P. Beardsley, C. Dodge, H. Kage, K. Kyuma, Y. Miyake,
M. Roth, K. Tanaka, C. Weissman, W. Yerazunis, in IEEE Computer
Graphics and Applications, volume 18, number 3, May-June, pp. 42-53, 1998.
- Links
|
Edge Detection
(slides)
|
- Topics
- Derivatives,
Edge detection, Hough Transform
- Readings
- Chapter 8 from Forsyth&Ponce
- A
Computational Approach to Edge
Detection, J. Canny, IEEE Transactions on
Pattern Analysis and Machine
Intelligence, Vol 8, No. 6, Nov 1986.
- Chapter 4 from Olivier Faugeras' book: Three-Dimensional
Computer Vision, MIT Press, 1993
- Links
|
|
- Topics
- Texture analysis and
synthesis
- Readings
- Chapter 9 from Forsyth&Ponce
- Pyramid based Texture Analysis/Synthesis, David
Heeger and James Bergen, SIGGRAPH 1995
- A Computational Model of Texture Segmentation, J. Malik
and P. Perona, Proc. Computer Vision and Pattern Recognition, 1989
- Early Vision and Texture Perception, J.R. Bergen and E.H.
Adelson, Nature, 1988
- W.Y. Ma and B.S. Manjunath, Texture
features and learning similarity,
Proceedings of IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, San Francisco, pp. 425-430, June, 1996
- Texture
Synthesis by Non-parametric Sampling, Alexei A. Efros and Thomas K.
Leung, IEEE International Conference on Computer Vision (ICCV'99),
Corfu, Greece, September 1999,
- Image
Quilting for Texture Synthesis and Transfer, Alexei A. Efros and
William T. Freeman, Proceedings
of SIGGRAPH '01, Los Angeles, California, August, 2001
|
Interest Points
(slides)
|
- Topics
- Harris Detector, Local invariant points,
SIFT descriptors
- Readings
- A combined corner and edge detector, Chris Harris and Mike Stephens,
Proceedings of The Fourth Alvey Vision Conference, Manchester, pp.
147-151, 1988
- Local
Greyvalue Invariants for Image Retrieval,
C. Schmid and R. Mohr. In Pattern Analysis and Machine Intelligence,
1997.
- Indexing
based on scale invariant interest points. K.Mikolajczyk and
C.Schmid. In International
Conference on Computer Vision, 525-531, 2001
- Distinctive
Image Features from
Scale-Invariant Keypoints, David Lowe, International Journal of
Computer Vision, 2004.
|
- Topics
- Radiometry, measuring light
- Readings:
- Chapter 4 from Forsyth&Ponce
|
|
- Topics
- Color perception, color spaces
- Readings:
- Chapter 6 from Forsyth&Ponce
|
|
- Topics
- Perspective projection, Pinhole camera
model, Lenses
- Readings
|
Camera
Calibration
(slides)
|
- Topics
- Camera geometry, camera
calibration
- Readings
|
Multi
view Geometry
(slides)
|
- Readings
- Chapter 10 from Forsyth&Ponce
|
Stereopsis
(slides)
|
- Topics
- Stereopsis, Matching,
Reconstruction
- Readings
- Chapter 11 from Forsyth&Ponce
|
Motion
(slides)
|
- Topics
- Optical flow, structure from motion,
Tracking
- Readings
- An
iterative image registration technique with an application to stereo
vision, Bruce Lucas and Takeo Kanade, Proceedings of the 7th
International Joint Conference on Artificial Intelligence (IJCAI), 1981
- Detection and Tracking of Point Features, Carlo Tomasi and Takeo
Kanade, Carnegie Mellon University Technical Report
CMU-CS-91-132, April 1991.
- Good
Features to Track, Jianbo Shi and Carlo Tomasi, IEEE Conference on
Computer Vision and Pattern Recognition, pages 593-600, 1994.
- Feature
based methods for structure
and motion estimation, Phil Torr and Andrew Zisserman, in Vision
Algorithms: Theory and Practice,
B. Triggs, A. Zisserman, R. Szeliski (Eds.), Springer (2000)
|
Mosaics
(slides)
|
- Topics
- Homographies, Image Mosaics
- Readings
- R. Szeliski and H.-Y. Shum, Creating
Full View Panoramic Image Mosaics and Environment Maps, Proc.
ACM SIGGRAPH, 1997, longer version: Panoramic
Image Mosaics, Technical report, MSR-TR-97-23, 1997
- M. Brown and D. G. Lowe. Recognising
Panoramas. In Proceedings of the 9th International Conference
on Computer Vision (ICCV2003), pages 1218-1225, Nice, France,
October 2003.
- Planar
Scenes and Homography lecture notes by Serge Belongie
- Links
|
Segmentation
(slides)
|
- Topics
- Segmentation, Grouping, Fitting
- Readings
- a
shorter
version published in IEEE Conf. Computer Vision and Pattern
Recognition(CVPR), June 1997, Puerto Rico
- Laws
of Organization in Perceptual Forms, Max Wertheimer, first
published as
Untersuchungen zur Lehre von der Gestalt II, in Psychologische
Forschung, 4, 301-350. Translation published in Ellis, W. (1938). A
source book of Gestalt psychology (pp. 71-88). London: Routledge &
Kegan Paul.
- Links
|
Recognition
(slides)
|
- Topics
- Model based and template matching based
methods for recognition
- Readings
- Object
Detection using statistics of parts, Henry
Schneiderman and Takeo Kanade, International Journal of Computer
Vision
- Robust
Real-time Object Detection, Paul Viola, Michael Jones,
International Journal on Computer Vision, 2001
- Recognition
by linear combinations of models, S. Ullman and R. Basri, IEEE
Transactions on Pattern Analysis and Machine Intelligence, Vol.13,
No.10, October 1991
- Shape
Matching and Object
Recognition Using Shape Contexts , Serge Belongie, Jitendra Malik
and Jan Puzicha, IEEE Transactions on Pattern Analysis and Machine
Intelligence(PAMI), 24(4):509-522, April 2002.
- Object
Class Recognition by Unsupervised Scale-Invariant Learning, Rob
Fergus, Pietro Perona, and Andrew Zisserman, Conference on
Computer Vision and Pattern Recognition, (2003).
- Face
recognition using eigenfaces, M. Turk and A. Pentland, Proc. IEEE
Conference on Computer Vision and Pattern Recognition, Maui, Hawaii,
1991
- Face Recognition Across Pose and Illumination, R. Gross, S. Baker,
I. Matthews, and T. Kanade, Handbook of Face Recognition, Stan Z. Li
and Anil K. Jain, ed., Springer-Verlag, 2005
- Links
|
Image
and Video Databases
(slides)
|
- Topics
- Retrieval, browsing and other novel
applications on large datasets
- Readings
- The
Earth Mover's Distance as a Metric for Image Retrieval, Y. Rubner,
C. Tomasi, L.J. Guibas, Technical Report
STAN-CS-TN-98-86, Computer Science Department, Stanford University,
September 1998
- Faces and Names in the News, Tamara Miller, Alexander C. Berg, Jaety
Edwards, Michael Maire,
Ryan White, Yee Whye Teh, Eric Learned-Miller, David A. Forsyth, CVPR
2004
- Links
|
Student
Presentations
|
December 22, Monday 10:40-11:30
-Bahaeddin Eravci: Context-based vision system for place and object
recognition, Antonio Torralba, Kevin Murphy, William Freeman, Mark
Rubin
-Can Ufuk Hantas: "Object Class Recognition by Unsupervised
Scale-Invariant Learning"
-Umut Uyumaz: Recovering Human Body Configurations: Combining
Segmentation and Recognition, Greg Mori, Xiaofeng Ren, Alexei A.
Efros, Jitendra Malik
December 25, Thursday 10:40-12:30
-Ozlem Gur: A Framework for Sensor Planning and Control with
Applications to Vision Guided Multi-robot Systems, John R. Spletzer,
Camillo J. Taylor
-Reha Oguz Selvitopi: Detecting Unusual Activity in Video, H. Zhong,
J. Shi and M. Visontai
-Murat Kurtcephe: Pedestrian Detection in Crowded Scenes, B. Leibe,
E. Seemann, B. Schiele
-Ismail Uyanik: Robust Real-time Object Detection, Viola, P. and Jones, M.
-Cem Aksoy: Names and Faces in the News, Berg, T., Berg, A.,
Edwards, J., Maire, M., White, R., Teh, Y.W., Learned-Miller, E. and
Forsyth, D.A.
-Sermetcan Baysal: Recognizing Action at a Distance, A.A. Efros, A.C.
Berg, J. Malik
December 29, Monday 10:40-11:30
-Bahadir Ozdemir: Shape Matching and Object Recognition using Low
Distortion Correspondences, Berg, A., Berg, T. and Malik, J.
-Tolga Ozaslan: Automatic Photo Pop-up and Geometric Context
from a Single Image by D. Hoiem, A. Efros and M. Hebert
-Mucahit Kutlu: Detecting Irregularities in Images and in Video, Oren
Boiman, Michal Irani
January 6, Tuesday
10:40-12:30
-Abdulkadiir Eryildirim: Histograms
of Oriented Gradients for Human
Detection, by N. Dalal and B. Triggs
-Celal Cigir: Learning to Detect Natural Image Boundaries Using Local
Brightness, Color and Texture Cues, by D. Martin, C. Fowlkes and J.
Malik
-Merve Saglam: Object
recognition with features inspired by visual
cortex, by T. Serre, L. Wolf, and T. Poggio
-Damla Arifoglu: Finding
and Tracking People from the Bottom Up,
by D. Ramanan and D. Forsyth and Strike a
Pose: Tracking People by Finding Stylized Poses by D. Ramanan, D.
Forsyth and A. Zisserman
-Duygu Atilgan: Video
Google: A Text Retrieval Approach to Object Matching in Videos, by
J. Sivic and A. Zisserman, and Object
Level Grouping for Video Shots, by J. Sivic, F. Schaffalitzky and
A. Zisserman
-Mehmet Can
Kurt: Photo tourism: Exploring photo
collections in 3D, by N. Snavely, S. Seitz, and R. Szeliski
-Kaan Duman: Region Covariance: A Fast Descriptor for
Detection and Classification, Tuzel, O.; Porikli, F.; Meer, P.
Assignments:
- Reading Assignment #1
- Programming Assignment #1 (15%)
- Due: November 7
- Filter design
- Reference: Preattentive
texture discrimination with early vision mechanisms, Jitendra Malik
and Pietro Perona, Journal of Optical Society of America A, 7(5), May
1990, 923-932
- Design spot and bar filters in different orientations as in
Malik & Perona's paper
- For each image
- Build a Gaussian Pyramid of four scales
- Apply your filters on different scales, obtain responses
to
your filters
- Use mean and variance to compute statistics of filter
responses
- Construct a feature vector of length 64 (4x8x2)
- Download the data from http://jfauqueur.free.fr/research/GTDB/
- There are 8 categories, with 130 images from each,
resulting in 1040 images
- Take one image from the data set
- Compare it with the remaining 1039 images
- Compute the similarity of two images based on the
Euclidean
distance of the features
- For each of the images in the data set, rank the other
images according to the similarity
- Report the results
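The feature-extraction and ranking steps listed above can be sketched roughly as follows. This is only an illustration under several assumptions: images are plain lists of rows of gray values, the pyramid uses a 5-tap binomial kernel as a stand-in for a Gaussian, and the filter bank is passed in by the caller (designing the spot and bar filters is the actual assignment, so no filters are defined here). The function names are hypothetical, and reading 4x8x2 as 4 scales, 8 filters, and 2 statistics is an interpretation.

```python
import math

# 5-tap binomial weights, a common stand-in for a 1-D Gaussian (an assumption).
BINOMIAL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]


def correlate2d(img, kernel):
    """Same-size 2-D correlation; coordinates outside the image are clamped."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                yy = min(max(y + i - kh // 2, 0), h - 1)
                for j in range(kw):
                    xx = min(max(x + j - kw // 2, 0), w - 1)
                    acc += kernel[i][j] * img[yy][xx]
            out[y][x] = acc
    return out


def gaussian_pyramid(img, levels=4):
    """Blur with a 5x5 binomial kernel, then keep every second row and column."""
    blur = [[a * b for b in BINOMIAL] for a in BINOMIAL]
    pyramid = [img]
    for _ in range(levels - 1):
        smoothed = correlate2d(pyramid[-1], blur)
        pyramid.append([row[::2] for row in smoothed[::2]])
    return pyramid


def mean_and_variance(img):
    values = [v for row in img for v in row]
    mean = sum(values) / len(values)
    return mean, sum((v - mean) ** 2 for v in values) / len(values)


def texture_features(img, filters, levels=4):
    """Mean and variance of every filter response at every pyramid level."""
    features = []
    for level in gaussian_pyramid(img, levels):
        for kernel in filters:
            features.extend(mean_and_variance(correlate2d(level, kernel)))
    return features


def rank_by_similarity(query, features):
    """Indices of the other images, nearest (Euclidean) to features[query] first."""
    others = [i for i in range(len(features)) if i != query]
    return sorted(others, key=lambda i: math.dist(features[query], features[i]))
```

With 4 levels and a bank of 8 kernels, `texture_features` returns 64 values per image. Plain lists keep the sketch dependency-free; an array library would be the usual choice for images of realistic size.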
- Your report should include
- The filters that you designed: how you designed them, the
parameters that you used, what they look like, their response to some
images
- For each category in the database, choose one image and
show the 20 highest-ranked images
- Evaluate the performance of your system
- For all the categories
- for all images in the category
- rank the most similar images.
- count how many of the other images in the highest
130 correspond to
the same category
- take the average of performances in each category and
report the result per category
- Report the average performance for all categories.
- Discuss the results (which categories are good, which are
bad, why, how the results can be improved, etc)
- Search for other related studies, summarize one by giving
proper references, and compare your method with the method proposed in
that study
- Submit your report together with a tar ball of your code
(do not include the images) through
- http://pclinuxserver.cs.bilkent.edu.tr/cgi-bin/submit/submit.cgi/duygulu
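The evaluation protocol described for this assignment (count how many of the top 130 ranked images share the query's category, average per category, then average over categories) could be computed along these lines. The function name and the data layout are hypothetical, and dividing the count by the number of inspected images to obtain a fraction is an assumption; the assignment only asks for the count. Since the query itself is excluded from its own ranking, at most 129 of the top 130 can match when each category has 130 images.

```python
def category_scores(rankings, labels, top_k=130):
    """Per-category and overall retrieval scores.

    rankings[i] is the ranked list of the other image indices for query i,
    most similar first; labels[i] is the category of image i.
    """
    per_query = {}
    for i, ranked in enumerate(rankings):
        top = ranked[:top_k]
        hits = sum(1 for j in top if labels[j] == labels[i])
        # Fraction of the inspected images that share the query's category.
        per_query.setdefault(labels[i], []).append(hits / len(top))
    per_category = {c: sum(v) / len(v) for c, v in per_query.items()}
    overall = sum(per_category.values()) / len(per_category)
    return per_category, overall
```

For example, `category_scores(rankings, labels)` returns a dictionary with one score per category together with their unweighted mean.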
- Programming Assignment #2 (20%)
- Due: January 7, 2009
- You are required to recognize actions performed by different
people on static backgrounds. The project consists of two steps
- You will use the dataset provided in http://www.wisdom.weizmann.ac.il/~vision/SpaceTimeActions.html#Database
- The dataset is in avi format. In order to obtain the image
sequence you may use the following
- Part1: Silhouette extraction:
- First, find the silhouettes by a background subtraction method
- Model the background (find Mu and Sigma values for each
pixel through the entire video sequence)
- Subtract each frame from the background model
- Since background subtraction may result in many noisy pixels,
apply morphological operators to remove the noise (Optional: if you
still have noisy pixels, try hysteresis thresholding)
- Compare your silhouette images with the ones given in the
website (intersected area / area of the given mask)
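A minimal sketch of the silhouette-extraction steps in Part 1, assuming gray-level frames stored as lists of rows. The threshold factor `k`, the floor `min_sigma` on the standard deviation, and the 3x3 structuring element are illustrative choices that the assignment does not specify, and all function names are hypothetical.

```python
import math


def background_model(frames):
    """Per-pixel mean (Mu) and standard deviation (Sigma) over all frames."""
    n, h, w = len(frames), len(frames[0]), len(frames[0][0])
    mu = [[sum(f[y][x] for f in frames) / n for x in range(w)] for y in range(h)]
    sigma = [[math.sqrt(sum((f[y][x] - mu[y][x]) ** 2 for f in frames) / n)
              for x in range(w)] for y in range(h)]
    return mu, sigma


def foreground_mask(frame, mu, sigma, k=2.5, min_sigma=2.0):
    """Mark pixels that deviate from the model by more than k standard deviations."""
    return [[1 if abs(v - m) > k * max(s, min_sigma) else 0
             for v, m, s in zip(frow, mrow, srow)]
            for frow, mrow, srow in zip(frame, mu, sigma)]


def _morph(mask, op):
    """3x3 erosion (op=min) or dilation (op=max); windows are cropped at borders."""
    h, w = len(mask), len(mask[0])
    return [[op(mask[yy][xx]
                for yy in range(max(y - 1, 0), min(y + 2, h))
                for xx in range(max(x - 1, 0), min(x + 2, w)))
             for x in range(w)] for y in range(h)]


def opening(mask):
    """Erosion followed by dilation, which removes isolated noise pixels."""
    return _morph(_morph(mask, min), max)


def overlap(mask, reference):
    """Intersected area divided by the area of the reference mask."""
    inter = sum(1 for mrow, rrow in zip(mask, reference)
                for m, r in zip(mrow, rrow) if m and r)
    area = sum(map(sum, reference))
    return inter / area if area else 0.0
```

A typical call sequence would be `mu, sigma = background_model(frames)`, then `opening(foreground_mask(frame, mu, sigma))` for each frame, then `overlap(...)` against the mask provided on the website.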
- Part 2: Action Recognition:
- For the
following continue with the masks provided in the website
- Create Motion Energy and Motion History Images as described
in the paper "The Recognition of Human Movement Using
Temporal Templates" by Bobick and Davis. In summary, the MEI is the OR
of the silhouettes, and the MHI is obtained by giving the silhouettes
weights ranging between 0 and 255, normalized by the number of frames in
the sequence: give the highest weight to the first frame, then take the
difference images of consecutive frames, weight those accordingly,
and add all of them.
- Describe MEI and MHI by dividing each of these images into
3x3 grid, calculating Mu and Sigma from each grid, and then
constructing a 9x2=18 length feature vector for each.
- Use these feature vectors for finding the similarities
- Use the nearest neighbor method for recognition. There are 9 people
performing 9 actions (do not use the skip action) and therefore 81
videos. Take one of them and assign it the label of the most similar
one among the remaining 80.
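The Part 2 pipeline might be sketched as below, with binary silhouettes given as lists of rows. Note the simplification: the MHI here is a linear ramp in which each pixel keeps the weight of the last frame where it belonged to the silhouette, so the most recent frame is brightest; it does not accumulate weighted difference images, and passing the frames in reverse order gives the first frame the highest weight instead. All names are hypothetical.

```python
import math


def motion_energy_image(silhouettes):
    """Pixel-wise OR of all binary silhouettes."""
    h, w = len(silhouettes[0]), len(silhouettes[0][0])
    return [[1 if any(s[y][x] for s in silhouettes) else 0 for x in range(w)]
            for y in range(h)]


def motion_history_image(silhouettes):
    """Linear-ramp MHI: weights grow from 255/n to 255 over the n frames."""
    n, h, w = len(silhouettes), len(silhouettes[0]), len(silhouettes[0][0])
    mhi = [[0.0] * w for _ in range(h)]
    for t, s in enumerate(silhouettes):
        weight = 255.0 * (t + 1) / n
        for y in range(h):
            for x in range(w):
                if s[y][x]:
                    mhi[y][x] = weight  # later frames overwrite earlier ones
    return mhi


def grid_features(img, rows=3, cols=3):
    """Mean and standard deviation of each grid cell (3x3 grid -> 18 values)."""
    h, w = len(img), len(img[0])
    features = []
    for r in range(rows):
        for c in range(cols):
            cell = [img[y][x]
                    for y in range(r * h // rows, (r + 1) * h // rows)
                    for x in range(c * w // cols, (c + 1) * w // cols)]
            mean = sum(cell) / len(cell)
            std = math.sqrt(sum((v - mean) ** 2 for v in cell) / len(cell))
            features += [mean, std]
    return features


def leave_one_out_labels(features, labels):
    """Give each sample the label of its nearest neighbour among the others."""
    predicted = []
    for i, f in enumerate(features):
        others = (k for k in range(len(features)) if k != i)
        nearest = min(others, key=lambda k: math.dist(f, features[k]))
        predicted.append(labels[nearest])
    return predicted
```

Comparing the output of `leave_one_out_labels` with the true labels gives the entries of a confusion table; the images must be at least 3 pixels in each dimension so that no grid cell is empty.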
- You will write a report including
- For each action (for a single person) show the background
model, and the silhouettes for the entire video (you can sample the
video by taking every 4th frame)
- For each action (for all people) average error rate when
comparing your results with the resulting masks given on the website
- For each action (for a single person) show motion
energy and motion history images.
- Finally report the results of MEI and MHI separately by a
confusion table, and report the overall performances
- Programming Assignment #3 (15%)
- Due: January 19, 2009
- You will implement Eigenface method for face recognition.
- You can find the details at the following links
- Reference paper: Face
recognition using eigenfaces, M. Turk and A. Pentland, Proc. IEEE
Conference on Computer Vision and Pattern Recognition, Maui, Hawaii,
1991
- Eigenface tutorials:
- http://www.pages.drexel.edu/~sis26/Eigenface%20Tutorial.htm
- http://en.wikipedia.org/wiki/Eigenface
- http://www.cse.unr.edu/~bebis/MathMethods/PCA/case_study_pca1.pdf
- Test your method on the following database
- http://www.cl.cam.ac.uk/Research/DTG/attarchive/facedatabase.html
- Write a report including
- mean face image
- difference images for each person
- recognition rates for each person, and average performance
- discussion of the results
- Important note: There is plenty of code available, but I want you to
implement the method yourselves. Therefore, no code re-use is allowed.
You can only use Linear Algebra tools.
- If you are unable to write the code, use one of the available
implementations, clearly specify it in your report with a link,
and do the required tests. In that case your homework will be evaluated
over 8 rather than over 15.
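For orientation only, the structure of the Eigenface method can be outlined as in the sketch below, with each face flattened into a list of pixel values. It relies on the small N x N matrix of inner products between difference faces, as in Turk and Pentland's paper, instead of the full pixel covariance. The eigenvectors are found with a plain power iteration and deflation, which is an illustrative substitute for a linear algebra library's eigen-solver; the function names, the iteration count, and the fixed random seed are assumptions.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_eigenvectors(matrix, k, iterations=500):
    """Leading k eigenvectors of a small symmetric PSD matrix (power iteration)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    vectors = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = dot(v, [dot(row, v) for row in m])
        vectors.append(v)
        # Deflation: remove the component that was just found.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return vectors


def train_eigenfaces(faces, k):
    """Return the mean face and k unit-length eigenfaces (choose k < len(faces))."""
    n = len(faces)
    mean = [sum(f[p] for f in faces) / n for p in range(len(faces[0]))]
    diffs = [[v - m for v, m in zip(f, mean)] for f in faces]
    gram = [[dot(a, b) for b in diffs] for a in diffs]  # N x N, not D x D
    eigenfaces = []
    for v in top_eigenvectors(gram, k):
        # Map the small eigenvector back to image space and normalise it.
        u = [sum(v[i] * diffs[i][p] for i in range(n)) for p in range(len(mean))]
        norm = math.sqrt(dot(u, u)) or 1.0
        eigenfaces.append([x / norm for x in u])
    return mean, eigenfaces


def project(face, mean, eigenfaces):
    """Weights of a face in the eigenface basis."""
    diff = [v - m for v, m in zip(face, mean)]
    return [dot(diff, u) for u in eigenfaces]


def recognize(face, mean, eigenfaces, gallery_weights, gallery_labels):
    """Label of the gallery face whose weights are nearest (Euclidean)."""
    w = project(face, mean, eigenfaces)
    best = min(range(len(gallery_weights)),
               key=lambda i: math.dist(w, gallery_weights[i]))
    return gallery_labels[best]
```

The gallery weights would be obtained with `[project(f, mean, eigenfaces) for f in faces]` after training; the mean face returned by `train_eigenfaces` and the difference images computed inside it correspond to the items requested in the report.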
Policies
Important
notes about evaluation:
Assignments:
There will be three reading
assignments and three programming
assignments
Late homeworks are not
accepted
All programming assignments are
due midnight and will be sent by e-mail
In your e-mail
use the following format in the title
CS554 -
Programming assignment #
Your programming assignments should be sent as a tar ball in
the following format
<name_surname_PA_#>.tar
All reading assignments are due
before the lecture hours and will be given to the instructor as printed
out
Reading assignments will be a summary of the given paper. Each summary
should be about one page and should explain the main contributions and
the important points of the proposed methods in your own words. Do not
include any figures or formulas. Do not write the values of parameters
or thresholds unless they are very important. Assume that you are
submitting a one- or two-page summary of your paper to a conference.
Projects
The project may be
An original implementation of a
new or published idea,
A detailed empirical evaluation
and comparison of the existing
implementations of two or more methods,
You will work in groups of two or three
You are required to write a proposal, progress
report, final report, and do a demonstration and a presentation
Project
proposal will be a short description of the problem you would
like to tackle, objective of the study,
proposed algorithms,
hardware/software tools and data that you plan to utilize, and
evaluation strategies
that you plan to use. Also
provide a short list of related references.
Progress
report will describe your progress in the project and your plans
for the rest of the semester
Final
report will be a well-written report which provides proper
motivation for the task, proper citation
and discussion of related
literature, proper explanation of the details of the approach and
implementation
strategies, proper performance
evaluation, and detailed discussion of the results. You should
highlight
your contributions and
conclusions.
Final report guidelines:
Follow IEEE
two-column format as shown in the
example and the
format definition table and glossary.
The page limit
is 6 pages.
The report
should not have any page numbers, headers or footers.
You can use
IEEE's
LaTeX template or
Word template. (LaTeX users: Be sure to use the template's
conference mode.)
PDF submission
is recommended.
Presentation will be in the form of a poster session, and each team
will show their contributions on a poster that fits on a board of
approximately 1 m x 1 m
Presentations:
Your presentations will be evaluated according to
the
following criteria. Please, consider them in preparing your
presentations:
Understanding of the topic - how
confident are you with the paper
that you present
Review of the related work - not
just mentioning related papers, but reading
some of them to understand how they relate to your paper
Giving an overview of the paper
- the main contributions of
the paper, and an overview of the approach
Explaining the details -
understanding and explaining the
formulas and methods given in the paper
Presentation - in general how
well you are prepared to give the
talk
Use of visual material when
available