Gal Lavee, Ph.D. seminar talk
Wednesday, 27.6.2012, 16:00
This talk considers the problem of recognizing activities in surveillance video. Activities are high-level, non-atomic semantic concepts that may have complex temporal structure. They are not easily identified from image features directly, but rather through the recognition of their constituent events. Unfortunately, these constituent events can be observed only with limited certainty.
Many approaches to classification and recognition in computer vision rely on the availability of a large corpus of training data. This data is fed to a learning algorithm, which yields a classifier for future examples. The domain of activity recognition in surveillance video generally does not have this luxury: interesting activities are usually rare, and few (if any) training examples are available.
This limitation restricts the usefulness of probabilistic state space models, such as Hidden Markov Models (HMMs), for activity recognition. Models of this type generally require a number of training examples on the order of the size of the state space. For this reason, the activity recognition community has turned to formal specifications of domain knowledge to model activities in video.
Formalisms for describing activities in video must be able to model complex temporal relations, including concurrent and asynchronous event streams. Inference procedures must be able to reason over uncertain input in an online manner in order to provide the timely alerts required in the surveillance scenario.
This talk describes PFPN (Particle Filter Petri Net), an approach which combines the specification of activities as Petri Nets with an activity recognition process that determines the likelihood that a particular activity is taking place in a video sequence, given a set of uncertain event observations. This framework improves over existing deterministic approaches to activity recognition by enabling the certainty reasoning required for coping with inherent ambiguity in both low-level video processing and activity definition. Furthermore, the PFPN approach reduces the dependence on a duration model and enables the creation of holistic activity models.
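As a rough illustration of the idea of estimating activity likelihood from uncertain event observations, the following is a minimal sketch of a generic bootstrap particle filter over Petri net markings. It is not the PFPN algorithm as presented in the talk: the toy net, the event labels (`enter_zone`, `leave_bag`), the one-transition-per-step proposal, and the `stay_weight` parameter are all assumptions made for the example.

```python
import random

# Hypothetical toy net: places are indexed 0..2 and one token moves 0 -> 1 -> 2.
# Each transition is (event label, input place, output place).
TRANSITIONS = [
    ("enter_zone", 0, 1),
    ("leave_bag", 1, 2),
]
FINAL_PLACE = 2  # a token here means the activity has been recognized


def enabled(marking):
    """Transitions whose input place holds at least one token."""
    return [t for t in TRANSITIONS if marking[t[1]] > 0]


def fire(marking, transition):
    """Return the marking after moving one token along the transition."""
    _, src, dst = transition
    new = list(marking)
    new[src] -= 1
    new[dst] += 1
    return tuple(new)


def step(particles, observations, rng, stay_weight=0.1):
    """One predict / weight / resample cycle.

    observations maps an event label to a detector confidence in [0, 1].
    stay_weight is an assumed weight for a particle that fires nothing.
    """
    proposed, weights = [], []
    for marking in particles:
        # Predict: propose firing one enabled transition, or staying put.
        choice = rng.choice([None] + enabled(marking))
        if choice is None:
            proposed.append(marking)
            weights.append(stay_weight)
        else:
            proposed.append(fire(marking, choice))
            # Weight: confidence of the event that justifies this firing.
            weights.append(observations.get(choice[0], 0.0))
    if sum(weights) == 0:
        return particles  # no proposal has support; keep the old set
    # Resample markings in proportion to their weights.
    return rng.choices(proposed, weights=weights, k=len(particles))


def activity_probability(particles):
    """Fraction of particles with a token in the final place."""
    return sum(1 for m in particles if m[FINAL_PLACE] > 0) / len(particles)


def run(observation_stream, n_particles=500, seed=0):
    """Filter a stream of per-frame observations; return the estimate per frame."""
    rng = random.Random(seed)
    particles = [(1, 0, 0)] * n_particles
    history = []
    for observations in observation_stream:
        particles = step(particles, observations, rng)
        history.append(activity_probability(particles))
    return history
```

Calling `run([{"enter_zone": 0.9}, {"leave_bag": 0.8}])` returns one estimate per frame of the fraction of particles that have reached the final place. Because each particle is a full marking of the net, the same loop would in principle carry over to nets with concurrent branches, which is the property that motivates using Petri nets here.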
The talk also discusses our extension to the PFPN approach, which separates the physical constraints of the scene (which we call context) from the constraints of the activity. Context constraints then no longer have to be modeled explicitly within the specification of each activity, which decreases the size of the state space over which we must perform probabilistic estimation. Factoring the state into a context state and an activity state yields simpler activity models that achieve better recognition results more efficiently (as we demonstrate experimentally).
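The effect of this factorization on state space size can be seen with a simple counting sketch. The scene below (two tracked objects, three zones, four activity stages) is entirely hypothetical, and counting states this way is only one reading of the claim, not the experimental setup of the talk.

```python
from itertools import product

# Hypothetical scene: each of 2 tracked objects is in one of 3 zones.
ZONES = ("street", "lobby", "vault")
N_OBJECTS = 2
# Hypothetical progress stages shared by every activity model.
STAGES = ("idle", "approach", "tamper", "done")

# Context state: the zone of every object.
context_states = list(product(ZONES, repeat=N_OBJECTS))
# Joint state: context embedded inside a single activity model.
joint_states = list(product(context_states, STAGES))


def sizes(n_activities):
    """Return (joint, factored) state counts for n_activities models."""
    # Joint: every activity model carries its own copy of the context.
    joint = n_activities * len(joint_states)
    # Factored: one shared context model plus the bare activity stages.
    factored = len(context_states) + n_activities * len(STAGES)
    return joint, factored
```

Under these assumptions, five activities need 180 states when context is embedded in each model and 29 when it is shared, and the gap widens as objects, zones, or activities are added.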