Tuesday, 26.2.2013, 11:30
Activity recognition in video is a major problem in computer vision that integrates two main challenges. The first is defining a robust and informative set of features. The second is constructing a model that builds on this feature set and provides a distinctive representation for each action. In this work we propose a solution to both challenges. Our features are based on an extension of the GIST descriptor to space-time. We analyze the properties of space-time GIST in the Fourier domain and show that it must be tuned in order to properly capture variations in appearance and velocity. Our model for representing actions is very simple and is based on localizing filter responses in space-time. Interestingly, this very simple representation is shown to provide high-quality action recognition results on two different benchmarks.
M.Sc. seminar under the supervision of Lihi Zelnik-Manor.