The GazeCom project is funded by the European Commission (contract no. IST-C-033816) within the Information Society Technologies (IST) priority of the 6th Framework Programme.

Analyzing bottom-up saliency in natural movies

by Eleonora Vig last modified 2010-09-03 16:49

Analyzing bottom-up saliency in natural movies (Presented at the European Conference on Visual Perception 2010)

Eleonora Vig, Michael Dorr, and Erhardt Barth

We investigate the contribution of local spatio-temporal variations of image intensity to saliency. To measure different types of variation, we use invariants of the structure tensor. If a video is represented on two spatial axes (x, y) and one temporal axis t, the n-dimensional structure tensor (nD-ST) can be evaluated for different combinations of axes (2D-ST and 3D-ST) and also for the degenerate case of a single axis (1D-ST).
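As a rough illustration of evaluating the structure tensor on a chosen subset of axes, the following is a minimal sketch and not the authors' implementation. It assumes the video is stored as a NumPy array indexed (t, y, x), uses central differences for the derivatives, and uses a small box filter as a stand-in for whatever smoothing window the original work applied. The function names `box_smooth` and `structure_tensor_invariant` are hypothetical, and only one invariant (the product of all n eigenvalues) is computed.

```python
import numpy as np

def box_smooth(a, size=3):
    """Separable box filter along every axis (assumed stand-in for the smoothing window)."""
    kernel = np.ones(size) / size
    for ax in range(a.ndim):
        a = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), ax, a
        )
    return a

def structure_tensor_invariant(video, axes, size=3):
    """Product of the eigenvalues of the nD structure tensor at every location.

    video : array indexed (t, y, x)  (assumed layout)
    axes  : tuple of axis indices, e.g. (0, 1, 2) for 3D, (1, 2) for (y, x), (0,) for t
    """
    video = np.asarray(video, dtype=float)
    # Partial derivatives along the selected axes only.
    grads = [np.gradient(video, axis=ax) for ax in axes]
    n = len(axes)
    # J[..., i, j] holds the locally averaged product of derivatives i and j.
    J = np.empty(video.shape + (n, n))
    for i in range(n):
        for j in range(i, n):
            J[..., i, j] = J[..., j, i] = box_smooth(grads[i] * grads[j], size)
    # Eigenvalues of the symmetric n x n tensor at every location.
    eigenvalues = np.linalg.eigvalsh(J)
    return np.prod(eigenvalues, axis=-1)
```

Under these assumptions, a video whose intensity changes only along x gives a 3D invariant of zero everywhere, because the tensor has a single non-zero eigenvalue, while the 1D invariant along x stays positive.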

Eye movements recorded on 18 natural videos are used to label locations as fixated or non-fixated. For each location, we compute the invariants (products of eigenvalues) of the nD-ST and use these to predict eye movements on unseen videos with an SVM classifier. The 3D-ST performs best (average ROC score of 0.656), which means that the most predictive regions of a movie are those where intensity varies along all spatial and temporal directions. Among 2-dimensional variations, the 2D-ST evaluated on the axes (y,t) gave the best score (0.638), followed by (x,y) (0.626) and (x,t) (0.625). The 1D-ST yielded 0.606 along the temporal axis, 0.604 along the horizontal axis, and 0.602 along the vertical axis. We conclude that bottom-up saliency is determined by joint spatio-temporal variations of image intensity rather than by purely spatial or purely temporal variations.
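For readers unfamiliar with the ROC score quoted above, here is a small sketch of how an area under the ROC curve can be computed from per-location scores and fixated/non-fixated labels. It is not the authors' evaluation code: the function name `roc_auc` is hypothetical, the scores could be classifier outputs or any other saliency measure, and the pairwise formulation is chosen for clarity rather than speed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores : per-location prediction values (higher = more likely fixated)
    labels : truthy for fixated locations, falsy for non-fixated ones
    Returns the fraction of (fixated, non-fixated) pairs ranked correctly,
    counting ties as half correct.
    """
    positives = [s for s, l in zip(scores, labels) if l]
    negatives = [s for s, l in zip(scores, labels) if not l]
    if not positives or not negatives:
        raise ValueError("need at least one fixated and one non-fixated sample")
    correct = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                correct += 1.0
            elif p == q:
                correct += 0.5
    return correct / (len(positives) * len(negatives))
```

A score of 0.5 corresponds to chance and 1.0 to perfect separation, which gives a scale for reading the values between 0.602 and 0.656 reported here.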

Poster in pdf format.

