FragTrack - Robust Fragments-based Tracking using the Integral Histogram


In this work we apply a recognition-by-parts approach to object tracking. The template object is represented by multiple image fragments, or patches.
The patches are arbitrary and are not derived from an object model (in contrast with the traditional use of model-based parts, e.g. limbs and torso in human tracking). Every patch votes on the possible positions and scales of the object in the current frame by comparing its histogram with the histogram of the corresponding image patch. We then minimize a robust statistic in order to combine the vote maps of the multiple patches.

FragTrack overcomes several difficulties that cannot be handled by traditional histogram-based algorithms (e.g. mean shift). First, by robustly combining multiple patch votes, we are able to handle partial occlusions and pose changes. Second, the geometric relations between the template patches allow us to take into account the spatial distribution of the pixel intensities - information which is lost in traditional histogram-based algorithms. Third, thanks to the integral histogram, tracking large targets has the same computational cost as tracking small targets.


Amit Adam, Ehud Rivlin and Ilan Shimshoni, "Robust Fragments-based Tracking using the Integral Histogram" (pdf). IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2006.

Example Videos

The following videos show the robust tracking achieved by our algorithm. In all of these examples we have used only the gray-scale information, quantized to 16 bins. The same parameters were used in all the clips.

Note that these results show the tracker's raw output; no filtering whatsoever is employed. Of course, the tracker's estimate may be used as the measurement or update step in a filtering framework.


  face (2.5M)            face - comparison with mean-shift (3.2M) 

  woman (2.5M)         woman - comparison with mean-shift (3.3M)

  living room (<1M)    living room - comparison with mean-shift (<1M)

  hug (<1M)

  caviar occlusion (<1M)

  caviar scale (<1M)    Note: the standard ±10% scale-adaptation solution we implemented has some limitations; see the paper for details.



Below are the sources of a C++ implementation of the tracker, with instructions for building and running it:



Original sequences and ground truth

readme file    face sequence (11M)    woman sequence (11M)    ground truth