
Clustering

Consider 5000 points in 2D, generated from 5 Gaussians.
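A data set of this kind can be produced with a short script. The following is a minimal sketch in Python (standard library only). The five means, the covariances and the equal mixing weights are hypothetical choices, because the demo's actual parameters are not given. Each point is drawn as x = mean + L z, where L is the Cholesky factor of the covariance and z is standard normal.

```python
import math
import random

# Hypothetical ground truth: five 2D Gaussians. These means and 2x2 covariances
# are illustrative choices, not the parameters used in the original demo.
MEANS = [(0.0, 0.0), (5.0, 5.0), (-5.0, 5.0), (5.0, -5.0), (-5.0, -5.0)]
COVS = [((1.0, 0.3), (0.3, 0.5)), ((0.8, 0.0), (0.0, 0.8)),
        ((1.5, -0.6), (-0.6, 0.7)), ((0.5, 0.2), (0.2, 1.2)),
        ((1.0, 0.0), (0.0, 0.3))]

def sample_gaussian_2d(mean, cov, rng):
    """Draw one 2D point as mean + L z, with L the Cholesky factor of cov."""
    (a, b), (_, c) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(c - l21 * l21)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2)

def make_data(n=5000, seed=0):
    """Return n points and their true component labels (equal mixing weights)."""
    rng = random.Random(seed)
    points, labels = [], []
    for _ in range(n):
        k = rng.randrange(len(MEANS))
        points.append(sample_gaussian_2d(MEANS[k], COVS[k], rng))
        labels.append(k)
    return points, labels

points, labels = make_data()
print(len(points), sorted(set(labels)))  # 5000 [0, 1, 2, 3, 4]
```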

How many clusters should be chosen to represent the data?

Movie of Clusters

The demo uses an EM-based algorithm to fit Gaussians with full covariance matrices to the data. It begins with a single Gaussian; once the maximum-likelihood solution has been found, one of the clusters is split into two, with the two new centres placed one standard deviation either side of the original mean along the principal axis of that cluster's distribution. The final solution (the choice of the number of clusters) is reached when adding an extra cluster no longer increases the likelihood.
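The grow-and-split procedure can be sketched in plain Python as follows. This is a hedged illustration, not the demo's own code, and the names `gauss_pdf`, `em`, `split` and `fit_by_splitting` are hypothetical. Several details are assumptions. The text says only that "one of the clusters" is split, so this sketch splits the component with the largest mixing weight. A small `reg` is added to the covariance diagonals to avoid singular matrices. Because the maximum-likelihood value on the training data essentially never falls when a component is added, a relative gain below `tol` is treated as "no increase".

```python
import math
import random

def gauss_pdf(x, mean, cov):
    """Density of a 2D Gaussian with full covariance cov = ((a, b), (b, c))."""
    (a, b), (_, c) = cov
    det = a * c - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
    return math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(det))

def log_likelihood(points, weights, means, covs):
    return sum(math.log(sum(w * gauss_pdf(p, m, s)
                            for w, m, s in zip(weights, means, covs)) + 1e-300)
               for p in points)

def em(points, weights, means, covs, iters=100, tol=1e-6, reg=1e-6):
    """Run EM from the given start until the log-likelihood stops improving."""
    n, prev = len(points), -math.inf
    for _ in range(iters):
        # E-step: responsibilities resp[i][k] = P(component k | point i).
        resp = []
        for p in points:
            row = [w * gauss_pdf(p, m, s) for w, m, s in zip(weights, means, covs)]
            total = sum(row) or 1e-300
            resp.append([r / total for r in row])
        # M-step: re-estimate mixing weights, means and full covariances.
        weights, means, covs = [], [], []
        for k in range(len(resp[0])):
            rk = [r[k] for r in resp]
            nk = sum(rk) + 1e-12
            mx = sum(r * p[0] for r, p in zip(rk, points)) / nk
            my = sum(r * p[1] for r, p in zip(rk, points)) / nk
            sxx = sum(r * (p[0] - mx) ** 2 for r, p in zip(rk, points)) / nk + reg
            syy = sum(r * (p[1] - my) ** 2 for r, p in zip(rk, points)) / nk + reg
            sxy = sum(r * (p[0] - mx) * (p[1] - my) for r, p in zip(rk, points)) / nk
            weights.append(nk / n)
            means.append((mx, my))
            covs.append(((sxx, sxy), (sxy, syy)))
        ll = log_likelihood(points, weights, means, covs)
        if ll - prev < tol * abs(ll):
            break
        prev = ll
    return weights, means, covs, ll

def split(weights, means, covs, k):
    """Replace component k by two halves placed one standard deviation either
    side of its mean along the principal axis (largest-eigenvalue eigenvector)."""
    (a, b), (_, c) = covs[k]
    lam = 0.5 * (a + c) + math.sqrt(0.25 * (a - c) ** 2 + b * b)
    if abs(b) > 1e-12:
        vx, vy = b, lam - a
    else:
        vx, vy = (1.0, 0.0) if a >= c else (0.0, 1.0)
    scale = math.sqrt(lam) / math.hypot(vx, vy)
    (mx, my), half = means[k], weights[k] / 2
    children = [(mx + scale * vx, my + scale * vy), (mx - scale * vx, my - scale * vy)]
    return (weights[:k] + [half, half] + weights[k + 1:],
            means[:k] + children + means[k + 1:],
            covs[:k] + [covs[k], covs[k]] + covs[k + 1:])

def fit_by_splitting(points, max_k=10, tol=1e-2):
    """Start from one Gaussian; keep splitting the heaviest component (assumed
    rule) until a split no longer raises the log-likelihood by a relative tol."""
    # A deliberately broad initial covariance so every point is assigned to it.
    model = em(points, [1.0], [points[0]], [((1e6, 0.0), (0.0, 1e6))])
    while len(model[0]) < max_k:
        w, m, c, ll = model
        k = max(range(len(w)), key=lambda j: w[j])
        candidate = em(points, *split(w, m, c, k))
        if candidate[3] - ll <= tol * abs(ll):
            break
        model = candidate
    return model

# Usage on a toy data set: two unit-variance blobs centred at (-5, 0) and (5, 0).
rng = random.Random(1)
toy = [(rng.gauss(cx, 1.0), rng.gauss(0.0, 1.0)) for cx in (-5.0, 5.0) for _ in range(200)]
weights, means, covs, ll = fit_by_splitting(toy, max_k=4)
print(len(weights), [tuple(round(v, 1) for v in m) for m in means])
```

The pure-Python loops keep the sketch dependency-free but are slow; for the full 5000-point data set a vectorised implementation would be more practical.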

Compare the membership of the points when classified by each method:

EM membership: using the EM-based Gaussian fitting.
K-means membership: using standard K-means.
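The two membership rules differ. Standard K-means assigns each point to the nearest centre in Euclidean distance, while a fitted Gaussian mixture can assign it to the component with the largest posterior probability, which takes each component's covariance and weight into account. Taking that maximum is one common way to turn EM's soft responsibilities into a hard membership; whether the demo does exactly this is an assumption. The sketch below (hypothetical function names, standard library only) shows a plain Lloyd-style K-means and a point that the two rules classify differently.

```python
import math
import random

def nearest_centre(p, centres):
    """K-means membership: index of the closest centre in Euclidean distance."""
    return min(range(len(centres)),
               key=lambda j: (p[0] - centres[j][0]) ** 2 + (p[1] - centres[j][1]) ** 2)

def kmeans(points, k, iters=100, seed=0):
    """Standard K-means (Lloyd's algorithm): alternate nearest-centre assignment
    and centre re-estimation until the assignments stop changing."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # initialise from k randomly chosen data points
    labels = None
    for _ in range(iters):
        new = [nearest_centre(p, centres) for p in points]
        if new == labels:
            break
        labels = new
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster becomes empty
                centres[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centres, labels

def gmm_membership(p, weights, means, covs):
    """EM membership: index of the component maximising w_k * N(p | mean_k, cov_k),
    with full 2x2 covariances ((a, b), (b, c))."""
    def score(k):
        (a, b), (_, c) = covs[k]
        det = a * c - b * b
        dx, dy = p[0] - means[k][0], p[1] - means[k][1]
        maha = (c * dx * dx - 2 * b * dx * dy + a * dy * dy) / det
        return weights[k] * math.exp(-0.5 * maha) / (2 * math.pi * math.sqrt(det))
    return max(range(len(weights)), key=score)

# A broad component at the origin and a tight one at (4, 0): the point (2.5, 0)
# is closer to (4, 0), but far more probable under the broad Gaussian.
centres = [(0.0, 0.0), (4.0, 0.0)]
covs = [((9.0, 0.0), (0.0, 9.0)), ((0.25, 0.0), (0.0, 0.25))]
print(nearest_centre((2.5, 0.0), centres))                    # 1
print(gmm_membership((2.5, 0.0), [0.5, 0.5], centres, covs))  # 0
```

Because K-means uses only distances to the centres, it implicitly treats clusters as equally sized and spherical; elongated or overlapping clusters are where its memberships are most likely to disagree with the EM fit.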


Gaussian mixture model
Vector Quantisation