Daniel Zoran (Google DeepMind)
Tuesday, 29.3.2016, 11:30
We propose a framework which infers mid-level visual properties of an image by learning about ordinal relationships. Instead of estimating metric quantities directly, the system proposes ordinal relationship estimates for pairs of points in the input image. These probabilistic ordinal measurements are then aggregated and globalized to create a full output map of continuous metric measurements.
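The paper does not spell out its globalization procedure in this abstract, but the idea of turning pairwise ordinal estimates into a continuous metric map can be illustrated with a least-squares sketch. The conversion of each ordinal probability into a target difference via log-odds, and the function name `globalize`, are assumptions for illustration, not the authors' actual method:

```python
import numpy as np

def globalize(n, comparisons, scale=1.0):
    """Recover n continuous values from probabilistic ordinal comparisons.

    comparisons: list of (i, j, p) where p is the estimated probability
    that value[i] > value[j]. Each probability is mapped to a target
    difference via its log-odds (an illustrative choice), and the values
    are recovered by solving a least-squares system.
    """
    A = np.zeros((len(comparisons) + 1, n))
    b = np.zeros(len(comparisons) + 1)
    for k, (i, j, p) in enumerate(comparisons):
        A[k, i] = 1.0   # each row encodes the difference value[i] - value[j]
        A[k, j] = -1.0
        p = np.clip(p, 1e-6, 1 - 1e-6)          # avoid infinite log-odds
        b[k] = scale * np.log(p / (1 - p))      # confident pairs -> larger gaps
    # Differences only determine values up to a constant; fix the gauge
    # by constraining the mean of the solution to zero.
    A[-1, :] = 1.0 / n
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x

# Three points with noisy but consistent ordinal evidence 0 < 1 < 2:
vals = globalize(3, [(1, 0, 0.88), (2, 1, 0.88), (2, 0, 0.98)])
```

Even when the pairwise probabilities are not perfectly consistent with any single metric assignment, the least-squares aggregation produces a map whose ordering reflects the weight of evidence.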
Estimating order relationships between pairs of points has several advantages over metric estimation: it is a simpler problem than metric regression; humans are better at making relative judgements, so data collection is easier; and ordinal relationships are invariant to monotonic transformations of the data, which increases the robustness of the system and provides qualitatively different information.
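The invariance claim above is easy to verify directly: applying any monotonically increasing function to the data leaves every pairwise ordering unchanged. A minimal check (the helper name `ordinal_signs` is ours, for illustration only):

```python
import numpy as np

def ordinal_signs(v):
    """Matrix of pairwise ordinal relations: sign(v[i] - v[j]) for all i, j."""
    return np.sign(v[:, None] - v[None, :])

x = np.array([0.2, 1.5, 0.7, 3.0])
# Monotonic transforms (exp, log1p) preserve every pairwise ordering,
# so the ordinal relation matrix is identical before and after.
same_exp = (ordinal_signs(x) == ordinal_signs(np.exp(x))).all()
same_log = (ordinal_signs(x) == ordinal_signs(np.log1p(x))).all()
```

This is why ordinal supervision is robust to, e.g., unknown gamma curves or sensor response functions that metrically rescale depth or reflectance.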
We demonstrate that this framework works well on two important mid-level vision tasks: depth estimation from a single RGB image, where it achieves good results, and intrinsic image decomposition, where it achieves state-of-the-art results. We train two separate systems with the same architecture on data from the two modalities, and our analysis of the resulting models shows that they learn a number of simple rules to make ordinal decisions.
Joint work with Philip Isola, Dilip Krishnan and William T. Freeman.