LA∀ID project
Learning about Activities from VIDeo
The LAVID project started in May 2006 and is a three-year EPSRC-funded project investigating the learning of symbolic models of activity from video data. The aim of the project is to move from "simple" pixels through to high-level logical or symbolic models of activity by means of unsupervised machine learning. The project is a collaboration between two research groups in Computing: Knowledge Representation and Reasoning, and Computer Vision.
The LAVID project will build upon and extend work carried out on the COGVIS Cognitive Vision project.
The aim of the project is to automate the joint acquisition of object and activity models from extended observation of general scenes, requiring no supervision or prior knowledge about the objects and the activities involved. We will do this by extending and integrating recent work on object recognition and inductive reasoning.
Specific objectives are:
- To devise an unsupervised method for learning multiple object categories from everyday scenes, assuming the objects of interest are in relative motion;
- To identify a suitable qualitative representation for objects and their spatiotemporal relationships;
- To develop an efficient and general method for logical induction of activities from this qualitative representation;
- To devise a method for adapting object categories to maximise the explanatory adequacy of learned activities;
- To integrate the above components into a system for activity induction, and to evaluate this within the domains of security surveillance and sporting events.
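To give a flavour of what a qualitative representation of spatiotemporal relationships might look like, the following is a minimal sketch only, not the project's method. It assumes tracked objects are available as axis-aligned bounding boxes per frame, and it uses a deliberately coarse three-way relation (disconnected, touching, overlapping). The names `Box`, `qualitative_relation` and `relation_sequence` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one tracked object in one frame (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def qualitative_relation(a: Box, b: Box) -> str:
    """Map two boxes to a coarse symbolic relation (an assumed, simplified vocabulary)."""
    # Signed gap along each axis: positive means separated, zero means
    # the boxes just meet, negative means their extents overlap.
    gap_x = max(a.x_min, b.x_min) - min(a.x_max, b.x_max)
    gap_y = max(a.y_min, b.y_min) - min(a.y_max, b.y_max)
    if gap_x > 0 or gap_y > 0:
        return "disconnected"
    if gap_x == 0 or gap_y == 0:
        return "touching"
    return "overlapping"


def relation_sequence(track_a: Sequence[Box], track_b: Sequence[Box]) -> List[str]:
    """Turn two per-frame tracks into a sequence of relation changes.

    Consecutive frames with the same relation are collapsed, so the result
    records only the qualitative transitions, not their durations.
    """
    sequence: List[str] = []
    for a, b in zip(track_a, track_b):
        relation = qualitative_relation(a, b)
        if not sequence or sequence[-1] != relation:
            sequence.append(relation)
    return sequence


# Illustrative data: a stationary car and a person walking towards it.
car_track = [Box(0, 0, 4, 2)] * 4
person_track = [Box(10, 0, 11, 2), Box(6, 0, 7, 2), Box(4, 0, 5, 2), Box(3, 0, 4, 2)]
print(relation_sequence(car_track, person_track))
```

A symbolic sequence of this kind is the sort of input over which logical induction of activities could in principle operate, since it abstracts away exact pixel coordinates and retains only the qualitative changes between objects.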
Despite the absence of explicit externally assigned semantics (e.g. naming of activities and object categories), the induced activity models should enable:
- Anomaly detection (e.g. noticing someone behaving in an unfamiliar fashion)
- Recognition (e.g. relating the formation of a queue to earlier activities)
- Search (e.g. finding video clips on the web containing similar activities)
- Explanation (e.g. producing a plausible explanation for the hidden part of a partially observed activity, as in 'car disappears behind wall, person appears')
- Prediction (e.g. steering a camera to look for someone getting out of a recently parked car, or generating an animation of what may happen next)
- Imitation (e.g. substituting for one of the players in a table-top game)
- An explicit, inspectable representation of the induced activity models (as a result of their being symbolic).
Introducing external sources of semantics into the loop, for example by examining textual annotations of video clips, would open up further applications. This is beyond the scope of the current project, although such extensions are clearly feasible.
The learning is intended to be end-to-end in the sense that models for object categories and activities are acquired together in an unsupervised fashion from extended observation of everyday scenes. Emergent object categories serve to ground logical terms that appear within induced activities, thereby providing an automatic ontology and avoiding the classical grounding problem of predicate logic. We do not claim that all conceptual objects should be grounded in this way: some may be constructed by synthesis from other concepts, although this may be beyond the scope of the project. A key challenge will be to configure this end-to-end learning so that the set of learned object categories is optimal for concisely representing and efficiently inferring the emergent activities.
People
- Hannah Dee, Research fellow
- Roberto Fraile, Research fellow
- Tony Cohn, Investigator
- David Hogg, Investigator
- Krishna Sridhar, PhD student
Related links
Cognitive Vision project | Cognitive Systems MSc | Engineering and Physical Sciences Research Council
The LAVID project can be contacted by email via Roberto Fraile (rf@comp.leeds.ac.uk) or Hannah Dee (hannah@comp.leeds.ac.uk).
Computer Vision Group
School of Computing
University of Leeds
Leeds LS2 9JT
United Kingdom
+44 113 343 7288
+44 113 343 5868 (fax)
