Models of shape are used widely in computer vision; image features can be located, tracked or classified using a priori knowledge of object shape. Many objects are non-rigid, requiring a deformable model in order to capture shape variability.
One such model is the Point Distribution Model (PDM) [3]. An object is modelled in terms of landmark points positioned on object features, and at regular intervals in between. By identifying such points on a set of training examples, a statistical approach (principal component analysis, or PCA) can be used to discover the mean object shape, and the major modes of shape variation.
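To make the training step concrete, here is a minimal Python/NumPy sketch of building a linear PDM. It assumes the training shapes have already been aligned and flattened into rows of landmark coordinates. The names `train_pdm` and `make_shape`, and the 95% retained-variance threshold, are illustrative choices and are not taken from [3].

```python
import numpy as np

def train_pdm(shapes, variance_kept=0.95):
    """Build a linear Point Distribution Model from aligned training shapes.

    shapes: array (N, 2n); each row is one example's landmarks flattened as
            (x1, y1, ..., xn, yn). Alignment is assumed to be done already.
    Returns the mean shape, the retained modes (as columns) and their variances.
    """
    X = np.asarray(shapes, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)           # (2n, 2n) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # largest variance first
    eigvals = np.clip(eigvals, 0.0, None)          # drop tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # Smallest number of modes explaining at least `variance_kept` of the variance.
    t = min(int(np.searchsorted(cumulative, variance_kept)) + 1, len(eigvals))
    return mean, eigvecs[:, :t], eigvals[:t]

def make_shape(mean, modes, b):
    """Generate a shape x = mean + P b from the shape parameters b."""
    return mean + modes @ b

# Usage: 50 synthetic squares whose width varies (one true mode of variation).
rng = np.random.default_rng(0)
base = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)
direction = np.array([0, 0, 1, 0, 1, 0, 0, 0], dtype=float)   # moves the right edge
shapes = base + rng.normal(size=(50, 1)) * direction
mean, modes, variances = train_pdm(shapes)
```

Each retained column of `modes` is one mode of variation; new shapes are produced by choosing the parameter vector `b`, conventionally limited to a few standard deviations per mode.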
The standard PDM is based purely on linear statistics (the PCA assumes a Gaussian distribution of the training examples in shape space). For any particular mode of variation, the positions of landmark points can vary only along straight lines; non-linear variation is achieved by a combination of two or more modes. This is undesirable for two reasons: first, the representation of shape variability is not as compact as it could be; second, implausible shapes can be generated when invalid combinations of deformations are used.
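The effect of purely linear modes can be seen on a toy example (a hypothetical one, not from the paper): the "shape" is just the tip of a unit-length rod pivoting about the origin, so every valid shape lies on a circular arc of radius 1. A linear PCA of such samples places the mean inside the arc, and moving along the first mode alone traces a straight line that leaves the arc.

```python
import numpy as np

def rod_tip_shapes(n=200, max_angle=np.pi / 3):
    """Synthetic 'shapes': the tip (x, y) of a unit-length rod pivoting about
    the origin. Every valid shape lies on an arc of radius 1."""
    theta = np.linspace(-max_angle, max_angle, n)
    return np.column_stack([np.cos(theta), np.sin(theta)])

shapes = rod_tip_shapes()
mean = shapes.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(shapes - mean, rowvar=False))
modes = eigvecs[:, ::-1]                 # columns, largest variance first
sd1 = np.sqrt(eigvals[-1])               # standard deviation along the first mode

# The mean "shape" has a rod shorter than 1, so it is not itself a valid shape.
print(f"rod length of mean shape: {np.linalg.norm(mean):.3f}")

# Varying only the first mode moves the tip along a straight line, off the arc.
for b1 in (-2 * sd1, 0.0, 2 * sd1):
    x = mean + modes[:, 0] * b1
    print(f"b1 = {b1:+.2f}: rod length = {np.linalg.norm(x):.3f}")
```

Recovering the arc requires a second mode, and arbitrary combinations of the two modes then also generate rods of the wrong length.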
Several attempts have been made to address this problem. Sozou et al.'s Polynomial Regression PDM [9] allows landmark points to move along combinations of polynomial paths. Heap and Hogg's Cartesian-Polar Hybrid PDM [6] makes use of polar coordinates to model bending deformations more accurately. Sozou et al. [10] have also investigated using a multi-layer perceptron to provide a non-linear mapping from shape parameters to shape.
All these approaches give some improvement over the linear PDM, but they have their limitations. The first two can only model certain types of non-linear deformation (polynomial and rotational respectively). The perceptron method can model more general non-linear deformations in one dimension, but performance is poor in cases where there is more than one degree of deformational freedom.
The common feature of previous approaches is some form of `linearising' mapping of the shape space onto another uniform space. In some cases such a mapping cannot exist; for example, when the distribution of valid shapes forms a region which is hollow, has changing dimensionality, or is discontinuous.
Bregler and Omohundro [2] describe a method for approximating an arbitrary surface within an n-dimensional space, using samples taken from it. The training examples are divided into (overlapping) clusters, and a PCA is performed separately on each cluster. This produces a set of locally-linear `patches', the union of which gives the required approximation to the surface.
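A minimal sketch of the patch-building step follows. The clustering method is an assumption: plain k-means is used here, and overlap is obtained by letting each sample join the clusters of its `overlap` nearest centres. The functions `kmeans` and `local_patches` and their parameters are hypothetical names, not the procedure of [2].

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns the k cluster centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old centre if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    return centres

def local_patches(X, k, dims, overlap=2, seed=0):
    """Split samples into k overlapping clusters and run a PCA on each one.

    Each sample joins the clusters of its `overlap` nearest centres, so
    neighbouring patches share samples. Returns a list of (mean, basis) pairs,
    where basis is an orthonormal (D, dims) matrix spanning the local patch.
    """
    X = np.asarray(X, dtype=float)
    centres = kmeans(X, k, seed=seed)
    dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :overlap]
    patches = []
    for j in range(k):
        members = X[np.any(nearest == j, axis=1)]
        if len(members) <= dims:          # too few samples for a local PCA
            continue
        mean = members.mean(axis=0)
        _, _, vt = np.linalg.svd(members - mean, full_matrices=False)
        patches.append((mean, vt[:dims].T))   # principal directions as columns
    return patches

# Usage: samples on a full circle, a hollow region no single linear model fits.
theta = np.linspace(0, 2 * np.pi, 400, endpoint=False)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
patches = local_patches(circle, k=8, dims=1)
```

On the circle, each 1-D patch is a line that approximates one arc of the circle locally, whereas a single global PCA line would cut across the hollow centre.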
This technique applies directly to object shape modelling. If the training samples are examples of valid object shapes, then the surface produced approximates the region of valid shapes. Bregler and Omohundro [2] use it to model the shape of lips.
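Once such a union of patches exists, it can be used to judge or correct a candidate shape. The sketch below assumes each patch stores its mean, an orthonormal basis and the standard deviation of its cluster along each basis vector, and it uses a nearest-patch projection as the union operation. This is one simple choice, not necessarily the one used in [2]; `constrain_shape` and the ±3 standard deviation limit are hypothetical.

```python
import numpy as np

def constrain_shape(x, patches, limit=3.0):
    """Project shape x onto the nearest locally-linear patch.

    Each patch is (mean, basis, sd): an orthonormal basis of shape (D, d) and
    the standard deviation of the training data along each basis vector.
    Parameters are clipped to +/- limit * sd so that a patch only covers the
    neighbourhood of its own cluster. Returns the constrained shape and its
    distance from x (a simple plausibility score: small means x lies close
    to the modelled region of valid shapes).
    """
    x = np.asarray(x, dtype=float)
    best, best_err = None, np.inf
    for mean, basis, sd in patches:
        b = basis.T @ (x - mean)                   # local shape parameters
        b = np.clip(b, -limit * sd, limit * sd)    # stay within this patch
        y = mean + basis @ b
        err = np.linalg.norm(x - y)
        if err < best_err:
            best, best_err = y, err
    return best, best_err

# Usage with two hand-built 1-D patches in a 2-D shape space.
patches = [
    (np.array([0.0, 0.0]), np.array([[1.0], [0.0]]), np.array([1.0])),
    (np.array([0.0, 5.0]), np.array([[0.0], [1.0]]), np.array([1.0])),
]
y, err = constrain_shape([1.0, 0.5], patches)
```

Clipping the parameters matters: without it, every patch would be an unbounded linear subspace and could accept shapes far from its own training cluster.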
Ahmad et al. [1] have also touched on the piecewise-linear approach. They built a multi-gesture hand model consisting of five sub-PCAs (one for each gesture) using a weighted combination of the training examples, and reported promising tracking performance. Automation of the model-building process was not considered; both the collection of training data and the determination of cluster membership were undertaken manually.
In this paper we extend the ideas of Bregler and Omohundro. We describe alternative treatments of the locally-linear patches and union operations, and highlight some of the design choices which must be made. We also show how it is possible to use a two-level hierarchical approach to improve efficiency and reduce noise.
We apply the technique both to synthetic data exhibiting non-linear deformation (an anglepoise lamp) and to automatically collected real data (hand shapes).