Technical Report PHD-2007-15

Title: Visual Attention Processes based on Stochastic Models: Algorithms and Bounds
Authors: Tamar Avraham
Supervisor: Assoc. Prof. Michael Lindenbaum
Abstract: Image analysis processes often scan the image exhaustively, looking for familiar objects at different locations and sizes. An attention mechanism that assigns priorities to different image parts, and thereby directs the analysis process to examine more interesting locations first, can make the whole process much more efficient. Motivated by studies of the human visual system, this study focuses mainly on inner-scene similarity as a source of information, testing its usefulness for directing computer visual attention from several perspectives. A study of the inherent limitations of visual search algorithms led to the COVER measure. Taking a deterministic approach, we found that a measure similar to Kolmogorov's ε-covering bounds the performance of all visual search implementations and can quantify the difficulty of a search task. We proved analytically that a simple algorithm, denoted FLNN (Farthest Labeled Nearest Neighbor), meets this bound. Taking a stochastic approach, we model the identities of the candidate image parts as a set of correlated random variables and derive two attention/search algorithms from this model. The first algorithm, denoted VSLE (Visual Search using Linear Estimation), suggests a 'dynamic' search procedure: each subsequent fixation is selected by combining inner-scene similarity information with the recognizer's feedback on previously attended sub-images. We show that VSLE can accelerate even fast detection processes, such as the face detector suggested by Viola and Jones (2004). The second algorithm, denoted Esaliency (Extended Saliency), needs no recognition feedback and does not change the proposed priorities during the search. It is therefore termed a 'static' algorithm, and it can compete with previous attention mechanisms that likewise compute a saliency map in advance and use it to guide the order of fixations.
This algorithm combines inner-scene similarity information with the common expectation that a scene contains a relatively small number of objects of interest, and with the observation that image content has a relatively clustered structure. Unlike other accepted models of visual attention (e.g., Itti and Koch 1998), which associate saliency with local uniqueness, the Esaliency algorithm takes a global approach: it considers a region salient if there are only a few (or no) other similar regions in the whole scene. The algorithm uses a graphical model approximation that allows the target-location hypotheses with the highest likelihood to be revealed efficiently. Its performance on natural scenes is tested extensively, and its advantages for directing the recognition algorithm's resources are demonstrated. While our main goal is attention mechanisms for computer vision, we also tested the relevance of our measures and models for predicting human performance in a Cognitive Psychology study. We extended the COVER and FLNN models to account for internal noise and tested their predictive abilities on orientation-search and color-search tasks in which distractor homogeneity and target-distractor similarity were systematically manipulated. Compared with other prominent models of human visual search, our models' predictions were the closest to actual human performance.
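The two similarity-driven orderings described in the abstract, a dynamic search that uses recognizer feedback and a static global-uniqueness ranking, can be illustrated with a small sketch. The Python below is a hypothetical illustration, not the report's implementation. It assumes that candidates are feature vectors compared by Euclidean distance and that `is_target` stands in for the recognizer's feedback. `greedy_cover_size` is only a rough greedy proxy for an ε-cover count, not the report's COVER definition, and `global_uniqueness_scores` is a deliberately simplified score that omits Esaliency's stochastic model and graphical-model approximation. All function names and the `eps`/`radius` parameters are assumptions made for illustration.

```python
import math
from typing import Callable, List, Sequence

Feature = Sequence[float]


def flnn_search(features: List[Feature], is_target: Callable[[int], bool],
                first: int = 0) -> List[int]:
    """FLNN-style dynamic search (sketch): return candidate indices in the order
    they are examined, stopping at the first target or when none remain."""
    unexamined = set(range(len(features)))
    # Distance from each candidate to its nearest already-examined (labeled) candidate.
    nearest_labeled = [math.inf] * len(features)
    order: List[int] = []
    current = first
    while True:
        order.append(current)
        unexamined.discard(current)
        # is_target is a hypothetical stand-in for the recognizer's feedback.
        if is_target(current) or not unexamined:
            return order
        # The newly labeled non-target may now be the nearest labeled
        # neighbor of some remaining candidates.
        for i in unexamined:
            nearest_labeled[i] = min(nearest_labeled[i],
                                     math.dist(features[i], features[current]))
        # Farthest Labeled Nearest Neighbor: examine the candidate least similar
        # to everything already rejected (ties go to the lowest index).
        current = max(sorted(unexamined), key=lambda i: nearest_labeled[i])


def greedy_cover_size(features: List[Feature], eps: float) -> int:
    """Greedy upper bound on the number of eps-balls needed to cover the features
    (a rough proxy only, not the report's COVER measure)."""
    centers: List[Feature] = []
    for f in features:
        if all(math.dist(f, c) > eps for c in centers):
            centers.append(f)
    return len(centers)


def global_uniqueness_scores(features: List[Feature], radius: float) -> List[float]:
    """Simplified global score: a region scores high when few other regions
    anywhere in the scene lie within `radius` of it in feature space."""
    scores = []
    for i, fi in enumerate(features):
        similar = sum(1 for j, fj in enumerate(features)
                      if j != i and math.dist(fi, fj) <= radius)
        scores.append(1.0 / (1.0 + similar))
    return scores


def attention_order(scores: List[float]) -> List[int]:
    """Static ordering: visit regions by decreasing score (stable for ties)."""
    return sorted(range(len(scores)), key=lambda i: -scores[i])


if __name__ == "__main__":
    candidates = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0)]
    print(flnn_search(candidates, lambda i: i == 4))  # [0, 4]
    print(greedy_cover_size(candidates, eps=0.5))     # 3
    regions = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
    print(attention_order(global_uniqueness_scores(regions, radius=0.5)))  # [4, 0, 1, 2, 3]
```

In this toy run, the FLNN-style search jumps to the candidate least similar to the rejected one instead of re-examining its near-duplicate, which conveys the intuition for relating search effort to a covering number. The simplified global score ranks the lone region first because nothing else in the scene resembles it, regardless of where the regions lie in the image plane. A local-contrast model would instead compare each region only with its spatial neighbors.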
Copyright: The above paper is copyrighted by the Technion, the author(s), or others. Please contact the author(s) for more information.

Computer Science Department, Technion