Theme Session: Video Event Detection
::: Volumetric Approaches for Video Event Detection in Crowded Scenes :::

Overview: Real-world actions often occur in crowded, dynamic environments. This poses a difficult challenge for standard approaches to activity recognition in video, because distracting motion from other objects in the scene makes it hard to segment the actor from the background. I will present our recent work on approaches for robustly detecting single events (such as someone in a crowd bending down to pick up an object) in the presence of significant occlusions and background clutter. Our work is driven by three key ideas: (1) efficient matching of volumetric representations of an event against oversegmented spatio-temporal video volumes; (2) combining shape and flow features; (3) parts-based models of spatio-temporal events. This research was done in collaboration with Yan Ke, Pyry Matikainen and Martial Hebert.

Dr. Rahul Sukthankar
Senior Principal Researcher, Intel
Adjunct Research Professor, Robotics Institute, CMU

Brief Biography: Rahul Sukthankar is a senior principal research scientist at Intel Research Pittsburgh and adjunct research professor in the Robotics Institute at Carnegie Mellon. He was previously a senior researcher at HP/Compaq's Cambridge Research Lab and a research scientist at Just Research. Rahul received his Ph.D. in Robotics from Carnegie Mellon and his B.S.E. summa cum laude in computer science from Princeton. His current research focuses on computer vision and machine learning, particularly in the areas of object recognition and information retrieval in medical imaging.

::: Crowd Segmentation and Video Analytics at GE :::

Overview: This talk presents a unified approach to crowd segmentation. A global solution is generated using an Expectation Maximization framework. Initially, a head-and-shoulder detector is used to nominate an exhaustive set of person locations, and these form the person hypotheses. The image is then partitioned into a grid of small patches, which are each assigned to one of the person hypotheses. A key idea of this talk is that while whole-body monolithic person detectors can fail due to occlusion, a partial response to such a detector can be used to evaluate the likelihood of a single patch being assigned to a hypothesis. This captures local appearance information without having to learn specific appearance models. The likelihood of a pair of patches being assigned to a person hypothesis is evaluated based on low-level image features such as uniform motion fields and color constancy. During the E-step, the single and pairwise likelihoods are used to compute a globally optimal set of assignments of patches to hypotheses. In the M-step, parameters which enforce global consistency of assignments are estimated. This can be viewed as a form of occlusion reasoning. The final assignment of patches to hypotheses constitutes a segmentation of the crowd. The resulting system provides a global solution that does not require background modeling and is robust with respect to clutter and partial occlusion. The talk will conclude with an overview of intelligent video research being conducted at the GE Global Research Center.
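As a toy illustration of the E-step/M-step alternation described above (not GE's actual system: the 2-D patch features, the isotropic Gaussian model per hypothesis, and all numbers are hypothetical), the sketch below soft-assigns image patches to detector-nominated person hypotheses and re-estimates each hypothesis from its assigned patches:

```python
import numpy as np

# Toy setup: three person hypotheses nominated by a head-and-shoulder
# detector, twelve patches each described by a 2-D feature (e.g., a mean
# motion vector). Each hypothesis is modeled as an isotropic Gaussian.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])  # true person positions
patches = np.vstack([c + rng.normal(0, 0.5, (4, 2)) for c in centers])
mu = centers + rng.normal(0, 0.3, centers.shape)           # noisy detector nominations

for _ in range(20):
    # E-step: patch likelihood under each hypothesis, normalized into
    # soft assignments (responsibilities).
    d2 = ((patches[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
    resp = np.exp(-0.5 * d2)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: re-estimate each hypothesis's parameters from its patches.
    mu = (resp[:, :, None] * patches[:, None, :]).sum(0) / resp.sum(0)[:, None]

segmentation = resp.argmax(axis=1)  # final patch assignment = crowd segmentation
```

The real system replaces the Gaussian likelihoods with detector responses and pairwise motion/color cues, and its M-step estimates occlusion-consistency parameters rather than cluster means.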

Dr. Peter H. Tu
Computer Scientist
GE Global Research

Brief Biography: Dr. Tu is a senior research scientist who has worked at General Electric’s Global Research (GE GR) center since 1997. He currently leads a group of 15 researchers in the field of multi-view surveillance, with the aim of achieving reliable behavior recognition in complex environments. He has developed a number of algorithms for latent fingerprint matching which have been incorporated into the FBI AFIS system. Dr. Tu is the principal investigator for the FBI ReFace project, which is focused on developing a system for face reconstruction from skeletal remains. He is also the principal investigator for the National Institute of Justice 3D Face Enhancer program, which is targeted at improving face recognition from poor-quality surveillance video. Dr. Tu has over 25 publications and has filed more than 20 U.S. patents. He received his B.S. in Systems Design Engineering from the University of Waterloo, Canada, in 1990, and his Ph.D. in Engineering Science from Oxford University, England, in 1995.

::: Graphical Models for Activity Recognition :::

Overview: Recognizing various human activities in videos is important for many tasks including visual surveillance, human-computer interaction and video indexing. Human activities form a natural hierarchical structure. At the lowest level are what we call "primitive" activities which are inferred directly from spatio-temporal analysis of the videos; this may include low-level feature extraction and tracking of objects. More complex activities can then be constructed by a "composition" of primitive and other lower-level activities.

Graphical models provide a natural mapping for such hierarchical activity representation. Nodes can correspond to primitive (or lower-level) activities, and links represent the relations between the events. Graphical models also provide a convenient and rigorous method to encode the uncertainties inherent in the analysis of video streams.

We have explored many variations of graphical models. These include a variety of hierarchical, parallel and coupled models that also include duration models for actions (so-called semi-Markov models). This talk will provide a summary of these explorations conducted over the last few years.
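As a minimal concrete instance of inference in such models (a plain two-state HMM rather than the hierarchical, coupled, or semi-Markov variants discussed in the talk; the states, observation symbols, and probabilities below are all made up), Viterbi decoding recovers the most likely sequence of primitive activities from per-frame observations:

```python
import numpy as np

# Two primitive activities and a quantized per-frame observation symbol.
states = ["walk", "bend"]
trans = np.log(np.array([[0.9, 0.1], [0.3, 0.7]]))  # state transition probabilities
emit  = np.log(np.array([[0.8, 0.2], [0.2, 0.8]]))  # P(observation | state)
prior = np.log(np.array([0.6, 0.4]))

def viterbi(obs):
    """Most likely primitive-activity sequence for a list of observation symbols."""
    v = prior + emit[:, obs[0]]
    back = []
    for o in obs[1:]:
        scores = v[:, None] + trans          # scores[i, j]: best path ending in j via i
        back.append(scores.argmax(axis=0))
        v = scores.max(axis=0) + emit[:, o]
    path = [int(v.argmax())]
    for b in reversed(back):                 # backtrack through the stored argmaxes
        path.append(int(b[path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 0, 1, 1, 1, 0]))
```

In the hierarchical models discussed in the talk, the decoded primitive sequence would itself become the observation stream for a higher-level model of composite activities.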

Prof. Ramakant Nevatia
University of Southern California

Brief Biography: Ram Nevatia received the BS degree from the University of Bombay, India, the MS and PhD degrees from Stanford University, Palo Alto, California, all in electrical engineering. He has been with the University of Southern California, Los Angeles, since 1975, where he is currently a professor of computer science and electrical engineering and the director of the Institute for Robotics and Intelligent Systems. He has authored two books and contributed chapters to several others. He has been a regular contributor to the literature in computer vision. His research interests include computer vision, artificial intelligence, and robotics. Dr. Nevatia is a fellow of the American Association for Artificial Intelligence and a member of the ACM. He is an associate editor of Pattern Recognition and Computer Vision and Image Understanding. He has served as an associate editor of the IEEE Transactions on Pattern Analysis and Machine Intelligence and as a technical editor in the areas of robot vision and inspection systems for the IEEE Journal of Robotics and Automation. He also served as a cogeneral chair of the IEEE Conference on Computer Vision and Pattern Recognition in June 1997. He is a fellow of the IEEE and member of the IEEE Computer Society.

::: Integrating Recognition and Reasoning in Smart Environments :::

Overview: We present our project on ‘smart indoor environments’ that are monitored unobtrusively by biometric capture devices, such as video cameras, microphones, etc. Such environments will keep track of their occupants and be capable of answering queries about the occupants’ whereabouts. In order to develop a unified model that is applicable across diverse biometric modalities, we present an abstract state transition framework in which different recognition steps are abstracted by events, and the reasoning necessary to effect state transitions is abstracted by a transition function. We define the metrics of ‘precision’ and ‘recall’ of a smart environment to evaluate how well it tracks its occupants. We show how the overall performance of the smart environment is improved through the use of spatiotemporal knowledge of the environment. A prototype based upon our proposed abstract framework indicates that integrating recognition and reasoning capabilities substantially improves the overall performance of the environment.
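One hypothetical way to instantiate the ‘precision’ and ‘recall’ metrics mentioned above (a sketch, not the authors' exact formulation) is to treat each (occupant, location) report the environment produces as a retrieval result and score it against ground truth:

```python
# Precision: fraction of the environment's reports that are correct.
# Recall: fraction of the true occupant locations that were reported.
def precision_recall(reported, truth):
    reported, truth = set(reported), set(truth)
    tp = len(reported & truth)
    precision = tp / len(reported) if reported else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall

# Hypothetical occupants and rooms for one query instant.
truth    = {("alice", "lab"), ("bob", "lobby"), ("carol", "office")}
reported = {("alice", "lab"), ("bob", "office")}  # one correct, one wrong, one missed
p, r = precision_recall(reported, truth)
print(p, r)  # 1 of 2 reports correct; 1 of 3 occupants correctly tracked
```

Aggregating such scores over a monitoring period gives a single figure of merit for how well the environment tracks its occupants.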

Dr. Venu Govindaraju
University at Buffalo (SUNY Buffalo)

Brief Biography: Dr. Venu Govindaraju is a Professor of Computer Science and Engineering at the University at Buffalo (SUNY Buffalo). He received his B-Tech (Honors) from the Indian Institute of Technology (IIT), Kharagpur, India in 1986, and his Ph.D. from UB in 1992.
In a research career spanning over 20 years, Dr. Govindaraju has made significant contributions to many areas of pattern recognition, a major branch of Artificial Intelligence within Computer Science. Much of his early work focused on the automated recognition of written language, both machine-printed and handwritten text; more recently his research has expanded to Information Retrieval and Biometrics.
Dr. Govindaraju has authored more than 270 scientific papers including 50 journal papers. His seminal work in handwriting recognition was at the core of the first handwritten address interpretation system used by the US Postal Service. He was also the prime technical lead responsible for technology transfer to Lockheed Martin and Siemens Corporation for deployment by the US Postal Service, Australia Post and UK Royal Mail. Dr. Govindaraju has been the Principal or Co-principal investigator of projects funded by government and industry for about 50 million dollars. The Center for Unified Biometrics and Sensors (CUBS) that he founded in 2003 has since received over 8 million dollars of research funding covering several projects.
Dr. Govindaraju has given over 75 invited talks and has supervised the dissertations of 17 doctoral students. He has served on the editorial boards of premier journals in his area and has chaired several technical conferences and workshops. Dr. Govindaraju has won several awards for his scholarship, including the prestigious MIT Global Technovator Award. He is a Fellow of the IEEE (Institute of Electrical and Electronics Engineers) and a Fellow of the IAPR (International Association for Pattern Recognition).

Theme Session: Medical Imaging

Coordinators:
Ragini Verma, PhD (Assistant Professor, Department of Radiology, University of Pennsylvania, USA)
Santanu Chaudhury, PhD (Professor, Department of Electrical Engineering, IIT, Delhi)

Overview: The theme session will address various aspects of medical imaging, especially computational anatomy, which involves developing computerized methods to process and analyze medical images. The talks will concentrate on magnetic resonance imaging (MRI), currently one of the most widely used non-invasive imaging techniques in clinical practice for disease investigation and for studying treatment effects. Its growing use in clinical studies and drug trials has provided an impetus for the development of sophisticated mathematical and computerized methods for processing and analyzing MRI data, which has gained tremendously from research in physics, mathematics, vision and visualization. The talks aim to provide researchers with an advanced and comprehensive state-of-the-art overview of the area of medical imaging. Potential attendees include graduate students, post-doctoral researchers, computational researchers and clinically oriented researchers who would like deeper insight into this exciting field.

For more details visit : https://www.rad.upenn.edu/sbia/Events/icgvip08.html

Speakers
Dr. Guido Gerig
Dept. of Computer Science
Dept. of Psychiatry, SCI Institute
University of Utah, Salt Lake City
http://www.sci.utah.edu/personnel/gerig.html
Dr. Carl-Fredrik Westin
Director, Laboratory of Mathematics in Imaging (LMI)
Department of Radiology, Harvard Medical School
Brigham and Women's Hospital, Boston
http://lmi.bwh.harvard.edu/~westin/
Dr. Ragini Verma
Section of Biomedical Image Analysis, Dept. of Radiology
University of Pennsylvania, Philadelphia
https://www.rad.upenn.edu/sbia/rverma/
Dr. Rakesh Mullick
Principal Scientist, Imaging Technologies
GE Research, Bangalore, INDIA
http://www.artree.com/rakesh/
Theme Session: Multimedia Pattern Analysis
::: Recent Development in Duplicate Video Detection Techniques :::

Overview: The rapid development of digital video processing technologies and increasing bandwidth enable easy access, editing, and distribution of digital video content. Copyright protection has become a growing concern for content creators and owners. One of the key problems is duplicate video detection. For example, many illegal video copies are uploaded to video sharing websites (e.g., YouTube) every day, and it is important for these sites to identify illegal copies quickly and accurately. In this talk, I will first review state-of-the-art technologies in this area. Then, I will present a novel duplicate video detection system developed at USC that can identify duplicate video copies efficiently. We propose a compact signature based on the underlying video structure, which is discriminative yet insensitive to various attacks. In addition, we propose an extremely efficient matching technique derived from fast symbol-string search. Unlike images and audio, videos are usually very large, which makes it computationally expensive to match two very long video sequences. Our system performs the matching in linear time, whereas the computational cost of other existing solutions usually grows at least quadratically with the length of the video.
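To make the signature-plus-string-matching idea concrete (a toy sketch, not the USC system: the per-frame mean-intensity feature, the eight-symbol alphabet, and Python's built-in substring search stand in for the actual structural signature and linear-time matcher), each video can be reduced to a short symbol string so that duplicate detection becomes a substring search rather than frame-by-frame comparison:

```python
import numpy as np

def signature(frames, levels=8):
    """Quantize a cheap per-frame feature into a symbol string."""
    feats = np.array([f.mean() for f in frames])             # one scalar per frame
    symbols = np.clip((feats / 256.0 * levels).astype(int), 0, levels - 1)
    return "".join(chr(ord("a") + s) for s in symbols)

def contains_duplicate(query_frames, reference_frames):
    """Is the query clip a copy of some segment of the reference video?"""
    q, r = signature(query_frames), signature(reference_frames)
    return q in r  # fast string search on compact signatures

# Hypothetical 100-frame reference video of 4x4 grayscale frames.
rng = np.random.default_rng(1)
reference = [rng.integers(0, 256, (4, 4)) for _ in range(100)]
clip = reference[30:50]                     # a copied 20-frame segment
print(contains_duplicate(clip, reference))  # True: the segment is found
```

A real signature would need to survive re-encoding, cropping, and other attacks, which is exactly what the compact structural signature in the talk is designed for.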

Dr. C.-C. Jay Kuo
University of Southern California

Brief Biography: Dr. C.-C. Jay Kuo received his Ph.D. degree from the Massachusetts Institute of Technology in 1987. He is now with the University of Southern California (USC) as Director of the Signal and Image Processing Institute and Professor of EE, CS and Mathematics. His research interests are in the areas of digital media processing, multimedia compression, communication and networking technologies, and embedded multimedia system design. Dr. Kuo is a Fellow of IEEE and SPIE. Dr. Kuo has guided about 90 students to their Ph.D. degrees and supervised 20 postdoctoral research fellows. Currently, his research group at USC consists of around 35 Ph.D. students (see website http://viola.usc.edu), which is one of the largest academic research groups in multimedia technologies. He is a co-author of about 150 journal papers, 750 conference papers and 9 books.
Dr. Kuo is Editor-in-Chief for the Journal of Visual Communication and Image Representation, and Editor for the Journal of Information Science and Engineering, LNCS Transactions on Data Hiding and Multimedia Security (a Springer journal), the Journal of Advances in Multimedia (a Hindawi journal) and the EURASIP Journal of Applied Signal Processing (a Hindawi journal). He was on the Editorial Board of the IEEE Signal Processing Magazine in 2003-2004. He served as Associate Editor for IEEE Transactions on Image Processing in 1995-1998, IEEE Transactions on Circuits and Systems for Video Technology in 1995-1997 and IEEE Transactions on Speech and Audio Processing in 2001-2003.

::: Multimedia Surveillance Systems :::

Overview: In recent years, there has been increased research interest in a number of multimodal sensing applications such as surveillance, video ethnography, telepresence, assisted living, and life-logging. However, these applications are currently evolving as separate silos with no interconnection. Further, the individual application-centric architectures typically tend to focus on specific sensors, specific (hardwired) queries and specific environments. We present a generic sensing architecture, the "Observation System", which allows multiple users to undertake different applications through abstracted interaction with a common set of sensors. The observation system observes the behavior of various objects in an environment and keeps a record of important events and activities in an eventbase. In this system, unstructured data collected from diverse sensors and other sources are correlated to understand and gain insights into the environment. The observation system has applications in many areas such as surveillance, ethnography, marketing, and healthcare.
We will then advocate a design methodology for building systems which can explicitly take performance into account. This can aid in optimal selection and placement of multimedia sensors. Finally, we will introduce novel notions related to adversarial modeling in surveillance. For each of these three aspects, we will present some open problems and issues arising from the novel way of looking at multimodal observation systems.

Dr. Mohan Kankanhalli
National University of Singapore

Brief Biography: Mohan Kankanhalli is a Professor at the Department of Computer Science at the National University of Singapore. He is also the Vice-Dean for Academic Affairs and Graduate Studies at the NUS School of Computing. Mohan obtained his BTech (Electrical Engineering) from the Indian Institute of Technology, Kharagpur, in 1986 and his MS and PhD (Computer and Systems Engineering) from the Rensselaer Polytechnic Institute in 1988 and 1990, respectively. He then joined the Institute of Systems Science (ISS - now Institute for Infocomm Research) in Singapore in 1990. He mainly worked on content-based retrieval in the multimedia group. He spent the 1997-1998 academic year at the Department of Electrical Engineering of the Indian Institute of Science, Bangalore. He visited the Garage Cinema Group of the University of California at Berkeley during Jan-Jun 2004.
He is actively involved in organizing many major conferences in the area of Multimedia. He is on the editorial boards of several journals including the ACM Transactions on Multimedia Computing, Communications, and Applications, IEEE Transactions on Multimedia, Springer Multimedia Systems Journal, Pattern Recognition Journal and the IEEE Transactions on Information Forensics and Security.
His current research interests are in Multimedia Systems (content processing, retrieval) and Multimedia Security (surveillance, digital rights management and forensics).
More details are available at: http://www.comp.nus.edu.sg/~mohan