Fall 2008
TuTh 3:00PM - 4:15PM
BA 221
Credit hours: 3
Office hours: 4:30-5:30 PM Tuesdays, 2:00-3:00 PM Thursdays
Office Location: CSB 237
Instructor: Mubarak Shah, email: shah@cs.ucf.edu
Course Web Page: http://www.cs.ucf.edu/courses/cap6412/
TA:
Course Goals:
To prepare students for graduate research in computer vision.
Course Description:
Review recent advances in computer vision.
Required and Optional texts:
No textbook.
Course Prerequisites:
CAP5415 or consent of instructor.
Exam and Grading Policy:
Reports 30%
Discussion and Attendance 20%
Homework 10%
Programs/Project 40%
No exam
Reports:
Summary
Strengths
Weaknesses
Ideas
Questions
Class Policy:
The University Golden Rules will be observed in this class. Copying or plagiarism is a violation of the Golden Rules.
Some Tips on Reading Research Papers:
1. You have to read a paper several times to understand it. On the first reading, if you do not understand something, do not get stuck; keep reading and assume you will figure it out later. On the second reading you will understand much more, and on the third even more ...
2. First try to get a general idea of the paper: What problem is being solved? What are the main steps? How could I implement the method, even if I do not yet understand why each step is performed the way it is?
3. Try to relate the method to other methods you know, and identify the conceptual similarities and differences.
4. On the first reading it may be a good idea to skip the related work: since you do not know all the other papers, they will only confuse you more.
5. Do not use a dictionary to look up technical terms such as particle filters or maximum likelihood. They are concepts, and dictionaries do not define them; a dictionary will give you only the literal meanings, which may not be useful.
6. Try to understand each concept in isolation, and then integrate them to understand the whole paper. For instance, the paper "Feature Integration with adaptive weights in a sequential Monte Carlo Tracker" looks quite complex at first, because it uses Monte Carlo methods, particle filters, likelihoods, etc. But try to understand the gist of it. The paper is about tracking, and you already know a few tracking methods. It uses features: color histograms, templates in correlation, shape, etc. You know these features, and you have used them. The probabilities obtained from each feature are combined (fused) to achieve tracking. How would you combine the probabilities or confidences of the features: multiply, add, apply a threshold and then add ...?
The particle filter/condensation method is already available in the Intel OpenCV library. Use it, get some idea of how it works and what the parameters are, then go back and read the paper again ... If you keep doing this for one week, you will understand a lot about that paper! The next week you do the second paper, and so on ...
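As a rough illustration of the fusion question raised in tip 6, the three options mentioned (multiply, add, threshold and then add) can be tried on toy numbers. This is only a sketch, not the method of the paper: the function names, the threshold of 0.5, and the per-feature confidences for the two candidate locations are all made up for illustration.

```python
import math

def fuse_product(scores):
    """Multiply per-feature likelihoods (treats the features as independent)."""
    return math.prod(scores)

def fuse_weighted_sum(scores, weights):
    """Weighted average of per-feature confidences."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def fuse_threshold_then_sum(scores, threshold=0.5):
    """Add up only the confidences that reach the (assumed) threshold."""
    return sum(s for s in scores if s >= threshold)

# Hypothetical confidences from three features (color, template, shape)
# for two candidate target locations.
candidates = {
    "A": [0.9, 0.6, 0.1],   # two strong cues, one very weak cue
    "B": [0.5, 0.5, 0.5],   # all cues mediocre
}

for name, scores in candidates.items():
    print(
        name,
        "product=%.3f" % fuse_product(scores),
        "mean=%.3f" % fuse_weighted_sum(scores, [1, 1, 1]),
        "thresholded sum=%.3f" % fuse_threshold_then_sum(scores),
    )
```

On these toy numbers the product rule prefers candidate B, because a single weak cue drags candidate A down, while the plain average prefers A. Seeing such a disagreement on a small example is a useful thing to have in mind when reading how a paper chooses and weights its fusion rule.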
Research Tip in MIT:
August 26: Lecture 1
Computer Vision Story and The Changing Shape of Computer Vision in Twenty First Century
August 28: Anomalous Event Detection, Presenter: Arslan Basharat
Arslan Basharat, Alexei Gritai, and Mubarak Shah, "Learning Object Motion Patterns for Anomaly Detection and Improved Object Detection", CVPR 2008.
Related papers:
- Chris Stauffer, W. Eric L. Grimson, "Learning patterns of activity using real-time tracking", PAMI 2000
- Omar Javed, Khurram Shafique, and Mubarak Shah, "A Hierarchical Approach to Robust Background Subtraction using Color and Gradient Information", IEEE Workshop on Motion and Video Computing, 2002.
- Yaser Sheikh and Mubarak Shah, "Bayesian Modelling of Dynamic Scenes for Object Detection", PAMI 2005
- Brandyn White and Mubarak Shah, "Automatically Tuning Background Subtraction Parameters Using Particle Swarm Optimization", IEEE International Conference on Multimedia & Expo, 2007
- I. Saleemi, K. Shafique, M. Shah, "Probabilistic Modeling of Scene Dynamics for Applications in Visual Surveillance", PAMI 2008
- Andrew Miller and Mubarak Shah, "Foreground Segmentation in Surveillance Scenes Containing a Door", IEEE International Conference on Multimedia & Expo 2007
September 2: Video Synopsis and Indexing, Presenter: Mikel D. Rodriguez Sullivan
Y. Pritch, A. Rav-Acha, S. Peleg, "Video Synopsis and Indexing", PAMI 2008.
Related papers:
- A. Rav-Acha, Y. Pritch, A. Gutman, and S. Peleg, "Webcam synopsis: Peeking around the world", In ICCV’07, Rio de Janeiro, October 2007.
- H. Kang, Y. Matsushita, X. Tang, and X. Chen, "Space-time video montage", In CVPR’06, pages 1331–1338, New-York, June 2006.
- C. Kim and J. Hwang. An integrated scheme for object-based video abstraction. In ACM Multimedia, pages 303–311, New York, 2000.
- A. Rav-Acha, Y. Pritch, and S. Peleg, "Making a long video short: Dynamic video synopsis", In CVPR’06, pages 435–441, New-York, June 2006
- X. Zhu, X. Wu, J. Fan, A. K. Elmagarmid, and W. G. Aref, "Exploring video content structure for hierarchical summarization", Multimedia Syst., 10(2):98–115, 2004
- Link to sample results: http://www.vision.huji.ac.il/video-synopsis/
September 4: Activity Recognition, Presenter: Jingen Liu
H. Jiang, D. Martin, "Finding Actions Using Shape Flows", ECCV 2008.
Related papers:
I. Action recognition by body joint trajectories
- Yilmaz, A., and Shah, M, Recognizing human actions in videos acquired by uncalibrated moving cameras, ICCV 2005
- Sheikh, Y., and Shah, M, Exploring the space of an action, ICCV 2005
- Parameswaran, V., and Chellappa, R. Human action-recognition using mutual invariants, CVIU 2005.
II. Action recognition by silhouettes
- Yilmaz, A., and Shah, M, Actions as objects: a novel action representation, CVPR 2005
- Weinland, D, et al., Free viewpoint action recognition using motion history volumes. CVIU 2006.
- J. Liu., S. Ali and M. Shah, Recognizing human actions using multiple features, CVPR 2008
III. Action recognition by interest parts
- I. Laptev, Space-Time interest points, ICCV 2003.
- J. Liu and M. Shah, Learning human actions via information maximization, CVPR 2008
IV. Others
- Interrante, V, et al. Visualizing 3D flow, IEEE computer graphics and applications, 1998.
- Jiang H., et al. Matching by linear programming and successive convexification, PAMI 2007.
September 9: Image Warping, Presenter: Ramin Mehran
T. Leyvand, D. Cohen-Or, G. Dror, D. Lischinski, "Data-Driven Enhancement of Facial Attractiveness", SIGGRAPH 2008.
Related papers:
- Y. Eisenthal, G. Dror, E. Ruppin, Facial attractiveness: Beauty and the machine. Neural Computation 18(1):119–142, 2006.
- D. I. Perrett, K.A. May, and S. Yoshikawa, "Facial Shape And Judgements of Female Attractiveness." Nature 368 (1994): 239-242
- Volker Blanz, Thomas Vetter, Morphable Model for the Synthesis of 3D Faces, SIGGRAPH 99
- Frédéric Pighin, Jamie Hecker, Dani Lischinski, Richard Szeliski, David H. Salesin, Synthesizing realistic facial expressions from photographs, SIGGRAPH 98
- V. Blanz, Manipulation of facial attractiveness. http://www.mpi-inf.mpg.de/~blanz/data/attractiveness/ , 2003
Related Videos and images:
- SIGGRAPH Demo Video: Data-Driven Enhancement of Facial Attractiveness (Video)
- SIGGRAPH Demo Video: Morphable Model for the Synthesis of 3D Faces (youtube)
- Paper's website: http://www.cs.tau.ac.il/~tommer/beautification2008/
September 11: Object Tracking, Presenter: Enrique G. Ortiz
Yanlin Guo, Steve Hsu, Harpreet S. Sawhney, Rakesh Kumar, and Ying Shan, "Robust Object Matching for Persistent Tracking with Heterogeneous Features", PAMI 2007.
Related Material:
September 16: Geo-spatial Aerial Video Processing, Presenter: Vladimir Reilly
Jiangjian Xiao, Hui Cheng, Feng Han, Harpreet Sawhney, "Geo-spatial Aerial Video Processing for Scene Understanding and Object Tracking", CVPR 2008.
Related papers:
Geo Registration:
Book Chapter:
- Yaser Sheikh, Sohaib Khan, Mubarak Shah, and R. Cannata, "Geodetic Alignment of Aerial Video Frames", in "Video Registration", Video Computing Series, Kluwer Academic Publishers, 2003
September 18: Geographic Information from Images, Presenter: Janaka Liyanage
James Hays and Alexei A. Efros, "IM2GPS: Estimating Geographic Information from a Single Image", CVPR 2008.
Related papers:
Data on the web:
- Asaad Hakeem, Roberto Vezzani, Mubarak Shah, and Rita Cucchiara, "Estimating Geospatial Trajectory of a Moving Camera", ICPR 2006.
- A. Oliva and A. Torralba. Building the gist of a scene: The role of global image features in recognition. In Visual Perception, Progress in Brain Research, volume 155, 2006.
- W. Zhang and J. Kosecka. Image based localization in urban environments. In 3DPVT '06, 2006.
September 23: Object Localization, Presenter: Pavel Babenko
Christoph H. Lampert, Matthew B. Blaschko, Thomas Hofmann, "Beyond Sliding Windows: Object Localization by Efficient Subwindow Search", CVPR 2008.
Related papers:
- Scene Understanding and Classification
- Jingen Liu and Mubarak Shah, Scene Modeling using Co-Clustering, IEEE International Conference on Computer Vision (ICCV), 2007.
- Author's Presentation
September 25: 3D Pose Refinement, Presenter: Alexandre Bassel
P. Lagger, M. Salzmann, V. Lepetit, and P. Fua, "3D Pose Refinement from Reflections", CVPR 2008
Related papers:
Related Material:
- Pose estimation, Section 1.10 in Fundamentals of Computer Vision, Mubarak Shah
- Shape from shading 6.31 in Fundamentals of Computer Vision, Mubarak Shah
- Model based image compression, pages 11-24, CAP-6411- Lecture-12
September 30: Recursive GMM, Presenter: Janaka Liyanage
Zivkovic, Zoran and Heijden van der, Ferdinand, "Recursive unsupervised learning of finite mixture models", PAMI 2004
October 2: Crowd Segmentation, Presenter: Ramin Mehran
P. Tu, T. Sebastian, G. Doretto, N. Krahnstoever, J. Rittscher, and T. Yu, "Unified Crowd Segmentation", ECCV 2008.
October 7th and 9th: Presentations for Assignment 1
October 16: Detection and Tracking, Presenter: Alexandre Bassel
Mykhaylo Andriluka, Stefan Roth, Bernt Schiele, "People-Tracking-by-Detection and People-Detection-by-Tracking", CVPR 2008.
October 17: Face Alignment, Presenter: Enrique G. Ortiz
H. Wu, X. Liu, G. Doretto, "Face Alignment using Boosted Ranking Models", In Proc. of IEEE CVPR, 2008
October 21: Actions from Movies, Presenter: Mikel D. Rodriguez Sullivan (Link to Author's Presentations)
Ivan Laptev, Marcin Marszałek, Cordelia Schmid, Benjamin Rozenfeld, "Learning realistic human actions from movies", CVPR 2008.
October 28: Vision Context, Presenter: Vladimir Reilly
Zhuowen Tu, "Auto-context and Its Application to High-level Vision Tasks", In Proc. of IEEE CVPR, 2008
October 30: Image Descriptor, Presenter: Pavel Babenko
Engin Tola, Vincent Lepetit, Pascal Fua, "A Fast Local Descriptor for Dense Matching", In Proc. of IEEE CVPR, 2008
November 4th: Presentations for Assignment 2 (group 2)
November 6: Link Analysis, Presenter: Ramin Mehran
Gunhee Kim, Christos Faloutsos, Martial Hebert, "Unsupervised Modeling of Object Categories Using Link Analysis Techniques", In Proc. of IEEE CVPR, 2008
November 13: Object Category Detection, Presenter: Dr. Rahul Sukthankar
UCF Vision Class Guest Lecture
L. Yang, R. Jin, R. Sukthankar, F. Jurie, "Unifying Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition", In Proc. of IEEE CVPR, 2008
November 13: Levenberg-Marquardt, Presenter: Dr. Mubarak Shah
Course Lecture
Levenberg-Marquardt and Szeliski Registration Method
November 13: Kalman Filter, Presenter: Dr. Mubarak Shah
Course Lecture
Kalman Filter
Potential Papers (CVPR 08, ECCV 08, SIGGRAPH 08):
(b) ECCV 2008
- Large-scale manifold learning
- Single-image Vignetting correction using radial gradient symmetry
- Epitomic Location Recognition
- 3D Pose Refinement from Reflections
- Viewpoint-Independent Object Class Detection using 3D Feature Maps
- Auto-Context and Its Application to High-level Vision Tasks
- Learning realistic human actions from movies
- People-Tracking-by-Detection and People-Detection-by-Tracking
- Motion blur identification from image gradients
- Who killed the directed model?
- Fast Image Search for Learned Metrics
- Segmentation by transduction
- Semantic texton forests for image categorization and segmentation
- Robust Dual Motion Deblurring
- Unsupervised Modeling of Object Categories Using Link Analysis Techniques
- Multi-Object Shape Estimation and Tracking from Silhouette Cues
- Partitioning of Image Datasets using Discriminative Context Information
- Beyond Sliding Windows: Object Localization by Efficient Subwindow Search
- Globally Optimal Bilinear Programming for Computer Vision Applications
- Background Subtraction in Highly Dynamic Scenes
- Face Alignment via Boosted Ranking Model
- Directions of Egomotion from Antipodal Points
- A Unified Framework for Generalized Linear Discriminant Analysis
- Transductive Object Cutout
- Taylor Expansion Based Classifier Adaptation: Application to Person Detection
- Constant Time O(1) Bilateral Filtering
- Demosaicing by Smoothing along 1D Features
- A Fast Local Descriptor for Dense Matching
- Kernel Integral Images: A Framework for Fast Non-Uniform Filtering
- Manifold-Manifold Distance with Application to Face Recognition based on Image Set
- Summarizing Visual Data Using Bidirectional Similarity
- Unifying Discriminative Visual Codebook Generation with Classifier Training for Object Category Recognition
- Human-Assisted Motion Annotation
- Directional Independent Component Analysis with Tensor Representation
- Re-weighting Linear Discrimination Analysis under Ranking Loss
- From Appearance to Context-Based Recognition: Dense Labeling in Small Images
- The patch transform and its applications to image editing
(c) SIGGRAPH 2008
- Implement "Data-Driven Enhancement of Facial Attractiveness". Due September 30
- Active Shape Model Code and instructions (Download)
- Dataset of faces with rating from Mikel Rodriguez
- Work through Assignment 1 in detail. Due Oct 26
- Experiment with the ASM model. Train the model using roughly half of the 150 annotated face images, and test the face feature detection on the remaining half. Summarize the results, and comment on the quality of the results, difficulties, failures ...
- Experiment with triangulation of the 150 annotated face images. Keep the order of the face points the same. Summarize the results, and comment on the quality of the results, difficulties, failures ...
- Train a (non-linear) SVR for computing the beauty score of a face image. Train the SVR on some of the annotated face images and test on the remaining images. Summarize the results, and comment on the quality of the results, difficulties, failures ...
- Test the LM algorithm (non-linear optimization) for estimating the optimal distances between vertices for the beautification process. Study the LM iterations, the effect of the initial estimate, the number of iterations to converge, the error ...
- Study the warping method from MIT and summarize its main steps. Apply the warping to all images. Summarize the results, and comment on the quality of the results, difficulties, failures ...
- Implement the paper: Zhuowen Tu, "Auto-context and Its Application to High-level Vision Tasks", In Proc. of IEEE CVPR, 2008. Due Last Day of Classes
- demonstrate on:
- Weizmann data set for horses
- Human body configuration
- MSRC Scene parsing/labeling
- Dataset of faces with rating from Mikel Rodriguez
- Active Shape Model Code and instructions (Download)
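For the LM item in the assignment details above, it can help to watch Levenberg-Marquardt on a tiny problem before applying it to the face-vertex distances. The sketch below is a generic textbook-style LM loop, not the assignment's objective: it fits the hypothetical two-parameter model y = a * exp(b * x) to synthetic noise-free data, and the function names, the starting damping value, and the factor-of-10 damping schedule are all assumptions chosen for illustration.

```python
import math

def cost(a, b, xs, ys):
    """Sum of squared residuals of the toy model y = a * exp(b * x)."""
    return sum((a * math.exp(b * x) - y) ** 2 for x, y in zip(xs, ys))

def levenberg_marquardt(xs, ys, a, b, lam=1e-3, max_iter=200, tol=1e-12):
    """Fit (a, b); returns the estimate and the list of accepted iterates."""
    history = [(a, b, cost(a, b, xs, ys))]
    for _ in range(max_iter):
        # Accumulate J^T J (2x2, symmetric) and J^T r for the current estimate.
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = a * e - y              # residual
            ja, jb = e, a * x * e      # d(residual)/da, d(residual)/db
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * r
            g2 += jb * r
        # Marquardt damping: scale the diagonal of J^T J by (1 + lam).
        d11, d22 = h11 * (1.0 + lam), h22 * (1.0 + lam)
        det = d11 * d22 - h12 * h12
        if det == 0.0 or lam > 1e12:
            break
        # Solve the damped 2x2 normal equations for the step (da, db).
        da = (-g1 * d22 + g2 * h12) / det
        db = (-g2 * d11 + g1 * h12) / det
        old = history[-1][2]
        new = cost(a + da, b + db, xs, ys)
        if new < old:
            # Step reduced the error: accept it and trust the model more.
            a, b = a + da, b + db
            lam /= 10.0
            history.append((a, b, new))
            if old - new < tol:
                break
        else:
            # Step increased the error: reject it and damp more heavily.
            lam *= 10.0
    return a, b, history

# Synthetic data generated from a = 2.0, b = 0.5 (made-up ground truth).
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

a_hat, b_hat, history = levenberg_marquardt(xs, ys, a=1.0, b=0.1)
print("estimate:", a_hat, b_hat, "accepted steps:", len(history) - 1)
for a_i, b_i, c_i in history:
    print("a=%.5f b=%.5f cost=%.3e" % (a_i, b_i, c_i))
```

Changing the starting values passed as `a` and `b`, or the initial `lam`, and comparing the printed histories is one simple way to study the effect of the initial estimate and the number of iterations to convergence that the assignment asks about.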
HW 1: From "Geo-spatial Aerial Video Processing for Scene Understanding and Object Tracking", CVPR 2008. (Due Tuesday 9/23/2008)
HW 2: From "A Fast Local Descriptor for Dense Matching", CVPR 2008. (Due Thursday 12/03/2008)
CAP 6412 | Department of Electrical Engineering and Computer Sciences | University of Central Florida
Copyright 2008 University of Central Florida