N. Jojic, N. Petrovic, B. J. Frey and T. S. Huang 2000. Transformed hidden Markov models: Estimating mixture models of images and inferring spatial transformations in video sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2000, IEEE Computer Society Press, Los Alamitos, CA.
SEE THE VIDEOS (AVI FORMAT): Test sequence, Stabilized sequence, Continued sequence, Processing a noisy sequence, Distraction removal!


In this paper we describe a novel generative model for video analysis called the transformed hidden Markov model (THMM). The video sequence is modeled as a set of frames generated by transforming a small number of class images that summarize the sequence. For each frame, the transformation and the class are discrete latent variables that depend on the previous class and transformation in the sequence. The set of possible transformations is defined in advance, and it can include a variety of transformations such as translation, rotation and shearing. At each stage of such a Markov model, a new frame is generated from a transformed Gaussian distribution based on the class/transformation combination generated by the Markov chain. This model can be viewed as an extension of a transformed mixture of Gaussians through time. We use this model to cluster unlabeled video segments and form a video summary in an unsupervised fashion. We also use the trained models to perform tracking, image stabilization and filtering. We demonstrate that the THMM is capable of combining long-term dependencies in video sequences (similar frames repeating in remote parts of the sequence) with short-term dependencies (such as short-term image frame similarities and motion patterns) to better summarize and process a video sequence even in the presence of high levels of white or structured noise (such as foreground occlusion).
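
The generative process described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy setting in which the predefined transformations are circular pixel shifts (a stand-in for translation), the Gaussian noise is isotropic, and the class/transformation pair is treated as a single joint Markov state. All names (sample_thmm, class_means, shifts, trans, init, noise_std) are hypothetical and chosen only for illustration.

```python
import numpy as np

def sample_thmm(class_means, shifts, trans, init, noise_std, n_frames, rng):
    """Sample frames from a toy THMM whose transformations are circular shifts.

    The joint state index encodes (class, transformation) as
    state = class * len(shifts) + transformation.
    """
    n_shifts = len(shifts)
    n_states = len(class_means) * n_shifts
    states, frames = [], []
    state = rng.choice(n_states, p=init)
    for _ in range(n_frames):
        c, t = divmod(int(state), n_shifts)
        # Transform the class image, then add isotropic Gaussian pixel noise.
        mean = np.roll(class_means[c], shifts[t], axis=(0, 1))
        frames.append(mean + noise_std * rng.standard_normal(mean.shape))
        states.append((c, t))
        # The next class/transformation pair depends on the current pair.
        state = rng.choice(n_states, p=trans[state])
    return states, np.stack(frames)

# Toy setup (all values are assumptions): two 8x8 class images, three shifts.
rng = np.random.default_rng(0)
class_means = np.zeros((2, 8, 8))
class_means[0, 2:4, 2:4] = 1.0   # class 0: a small square
class_means[1, 1:6, 3] = 1.0     # class 1: a vertical bar
shifts = [(0, 0), (0, 1), (1, 0)]
n_states = len(class_means) * len(shifts)

# "Sticky" transition matrix: stay in the same joint state with probability 0.9.
trans = np.full((n_states, n_states), 0.1 / (n_states - 1))
np.fill_diagonal(trans, 0.9)
init = np.full(n_states, 1.0 / n_states)

states, frames = sample_thmm(class_means, shifts, trans, init, 0.1, 20, rng)
```

Here frames holds 20 noisy 8x8 images and states records the (class, transformation) pair behind each one. Learning the class images and inferring the hidden pairs from observed frames is the harder inverse problem and is not shown in this sketch.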

Compressed postscript (.ps.Z), uncompressed postscript (.ps), portable document format (.pdf).
