Pixel Club: Learning To Perceive: Developing Visual Concepts from Unlabeled

Speaker: Danny Harari (Weizmann Institute of Science)
Date: Monday, 17.6.2013, 11:30
Place: Room 337-8 Taub Bld.

We consider the tasks of learning to recognize hands and direction of gaze from unlabeled natural video streams. Both are known to be highly challenging for current computational methods, yet infants learn to solve them early in development, during the first year of life. This gap between computational difficulty and infant learning is striking. We present a model that is shown a stream of natural videos and learns, without any supervision, to detect human hands by appearance and by context, as well as direction of gaze, in complex natural scenes. The algorithm is guided by an empirically motivated innate mechanism: the detection of ‘mover’ events in dynamic images, that is, events in which a moving image region causes a stationary region to move or change after contact. Mover events provide an internal teaching signal, which is shown to be more effective than alternative cues and sufficient for the efficient acquisition of hand and gaze representations. We will discuss how the approach extends beyond these specific tasks, by showing how domain-specific ‘proto concepts’ can guide the system to acquire concepts that are meaningful to the observer but statistically inconspicuous in the sensory input.
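To make the definition of a ‘mover’ event concrete, here is a minimal Python sketch. It is an illustration only, not the speaker's implementation: the grid of cells, the frame-differencing motion test, the single threshold, and the names `cell_means` and `detect_mover_events` are all assumptions made for this example. A cell is flagged when motion enters it from a neighbouring cell that was already moving (contact), and the cell looks different once the motion has left.

```python
import numpy as np

def cell_means(frame, cell):
    """Average intensity of each cell x cell block of a grayscale frame."""
    h, w = frame.shape
    h, w = h - h % cell, w - w % cell
    blocks = frame[:h, :w].reshape(h // cell, cell, w // cell, cell)
    return blocks.mean(axis=(1, 3))

def detect_mover_events(frames, cell=4, thresh=10.0):
    """Return (t, row, col) for cells flagged as candidate mover events.

    Hypothetical rule: motion enters a static cell from a moving
    neighbour, and after the motion ends the cell has changed.
    """
    grids = [cell_means(np.asarray(f, dtype=float), cell) for f in frames]
    prev_moving = np.zeros(grids[0].shape, dtype=bool)
    before = {}   # (row, col) -> cell appearance just before motion entered
    events = []
    for t in range(1, len(grids)):
        # A cell counts as moving when its mean intensity changes enough.
        moving = np.abs(grids[t] - grids[t - 1]) > thresh
        # Cells that were moving, or adjacent to a moving cell, one step ago.
        near = prev_moving.copy()
        near[1:, :] |= prev_moving[:-1, :]
        near[:-1, :] |= prev_moving[1:, :]
        near[:, 1:] |= prev_moving[:, :-1]
        near[:, :-1] |= prev_moving[:, 1:]
        # Motion entering a static cell from a moving neighbour: contact.
        for r, c in zip(*np.nonzero(moving & ~prev_moving)):
            if near[r, c]:
                before[(int(r), int(c))] = grids[t - 1][r, c]
        # Motion has left the cell: compare with its pre-contact appearance.
        for r, c in zip(*np.nonzero(~moving & prev_moving)):
            key = (int(r), int(c))
            if key in before:
                if abs(grids[t][r, c] - before.pop(key)) > thresh:
                    events.append((t, key[0], key[1]))
        prev_moving = moving
    return events
```

Under this simplified rule, a moving region that merely passes through empty cells produces no event, while a cell whose contents are removed or altered after contact does. A mover that stops inside a cell would also be flagged, which is one of several limitations of this toy version.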
