|Title:||Visual Tracking in a General Context via Tracker Combination and Low-Level Cues
|Supervisors:||Ehud Rivlin and Michael Lindenbaum
|Abstract:||This research thesis addresses the problem of visual tracking in video in a general context using two approaches. The first approach combines multiple trackers that use different features and thus have different failure modes. A general framework for combining visual trackers that propagate filtering distributions over time is proposed. The individual trackers may propagate the filtering distributions either explicitly, for example, via Kalman filtering, or by using sample-sets of the distributions, via particle filtering. The proposed framework enables the combination of trackers with different state spaces, and in many cases it allows the individual trackers to be treated nearly as "black boxes." Another benefit of the proposed framework is that it may be applied as-is to the combination of trackers that track different, albeit related, targets. The suggested framework was successfully tested using various state spaces and datasets. The second approach employs basic, low-level visual characteristics that typically hold. A tracker of object bitmaps (silhouettes) that uses no prior information about the target or the scene is proposed. The low-level visual characteristics employed are (short-term) Constancy of Color: the color projected to the camera from a point on a surface remains similar in consecutive video frames; Spatial Motion Continuity: the optical flow of the vast majority of the pixels in an image region corresponding to an object is spatially continuous; and Spatial Color Coherence: it is highly probable that adjacent pixels of similar color belong to the same object. The tracker relies only on these basic visual characteristics, which makes it applicable in a very general context. The proposed tracker works by approximating, in each video frame, a probability distribution function of the target’s bitmap and then estimating the maximum a posteriori bitmap.
Many experiments were conducted, demonstrating that the tracker can track objects that undergo drastic appearance changes and are filmed by an arbitrarily moving camera. Kernel-based tracking may also be included in the second approach. Two kernel-based trackers are proposed. The first tracker exploits the constancy of color and the presence of color edges along the target boundary. This tracker uses these two visual cues to affinely transform the kernel over time. In a sense, this work extends previous kernel-based trackers by incorporating the object boundary cue into the tracking process and by allowing the kernels to be affinely transformed instead of only translated and isotropically scaled. These two extensions make for more precise target localization. Moreover, a more accurately localized target facilitates safer updating of its reference color model, further enhancing the tracker’s robustness. The second kernel-based tracker enhances the Mean Shift tracker to use multiple color histograms obtained from different target views. This enhancement makes the Mean Shift tracker suitable for tracking objects for which the colors revealed to the camera change over time. Both kernel-based trackers were experimentally shown to cope with tracking scenarios in which traditional kernel-based trackers fail.|
|Copyright:||The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.|
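
As a rough illustration of what combining trackers that propagate filtering distributions can mean, the sketch below fuses two Gaussian state estimates, such as two Kalman-filter trackers might report for the same frame. It is not the framework proposed in the thesis, whose details are not given on this page. It is a plain product-of-Gaussians fusion that assumes both trackers share one state space and make independent errors, and the function name `fuse_gaussians` and all numbers are hypothetical.

```python
import numpy as np


def fuse_gaussians(mean_a, cov_a, mean_b, cov_b):
    """Fuse two Gaussian estimates of the same state.

    Assumption (not from the thesis): the two estimates are independent,
    so the fused density is proportional to the product of the two Gaussians.
    """
    # Work in information (inverse-covariance) form, where products add.
    info_a = np.linalg.inv(cov_a)
    info_b = np.linalg.inv(cov_b)
    fused_cov = np.linalg.inv(info_a + info_b)
    fused_mean = fused_cov @ (info_a @ mean_a + info_b @ mean_b)
    return fused_mean, fused_cov


# Hypothetical 2-D target positions (x, y) from two trackers.
# The "color" tracker is equally uncertain in x and y;
# the "edge" tracker is more certain about x.
mean_color = np.array([10.0, 20.0])
cov_color = np.diag([4.0, 4.0])
mean_edge = np.array([12.0, 20.0])
cov_edge = np.diag([1.0, 4.0])

fused_mean, fused_cov = fuse_gaussians(mean_color, cov_color, mean_edge, cov_edge)
print(fused_mean)
print(fused_cov)
```

The fused estimate leans toward whichever tracker is more certain along each axis, which is the intuition behind combining trackers with different failure modes. Handling different state spaces or sample-based (particle) distributions, as the abstract describes, would need more than this sketch shows.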
Remark: Any link to this technical report should be to this page (http://www.cs.technion.ac.il/users/wwwb/cgi-bin/tr-info.cgi/2008/PHD/PHD-2008-06), rather than to the URL of the PDF or PS files directly. The latter URLs may change without notice.
Computer science department, Technion