LA∀ID project

Learning about Activities from VIDeo

The LAVID project started in May 2006 and is a three-year, EPSRC-funded project investigating the learning of symbolic models of activity from video data. The aim is to move from "simple" pixels through to high-level logical or symbolic models of activity by means of unsupervised machine learning. The project is a collaboration between two research groups in Computing: Knowledge Representation and Reasoning, and Computer Vision.

The LAVID project will build upon and extend work carried out on the COGVIS Cognitive Vision project.

The aim of the project is to automate the joint acquisition of object and activity models from extended observation of general scenes, requiring no supervision or prior knowledge about the objects and the activities involved. We will do this by extending and integrating recent work on object recognition and inductive reasoning.

Specific objectives are:

Despite the absence of explicit externally assigned semantics (e.g. naming of activities and object categories), the induced activity models should enable:

Introducing external sources of semantics into the loop, for example by examining textual annotations of video clips, would open up further applications. This is beyond the scope of the current project, although such extensions are clearly feasible.

The learning is intended to be end-to-end in the sense that models for object categories and activities are acquired together, in an unsupervised fashion, from extended observation of everyday scenes. Emergent object categories serve to ground the logical terms that appear within induced activities, thereby providing an automatic ontology and avoiding the classical grounding problem of predicate logic. We are not claiming that all conceptual objects should be grounded in this way; some may be constructed by synthesis from other concepts, although this may be beyond the scope of the project. A key challenge will be to configure this end-to-end learning so that the set of learned object categories is optimal for concisely representing and efficiently inferring the emergent activities.

People

Related links

Cognitive Vision project | Cognitive Systems MSc | Engineering and Physical Sciences Research Council

The LAVID project can be contacted via email to Roberto Fraile on rf@comp.leeds.ac.uk, or Hannah Dee on hannah@comp.leeds.ac.uk.
Computer Vision Group
School of Computing
University of Leeds
Leeds LS2 9JT
United Kingdom

+44 113 343 7288
+44 113 343 5868 (fax)