Yedid Hoshen (Hebrew University of Jerusalem)
Monday, 14.11.2016, 11:30
One of the most exciting possibilities opened up by deep neural networks is end-to-end learning: the ability to learn tasks without feature engineering or decomposition into sub-tasks. This talk will present three cases illustrating how end-to-end learning can operate in machine perception across the senses (hearing, vision) and even across the entire perception-cognition-action cycle.
The talk begins with speech recognition, showing how acoustic models can be learned end-to-end from raw waveforms. This approach bypasses the hand-crafted feature extraction pipeline that has been carefully refined for speech recognition over decades.
Proceeding to vision, a novel application is described: identifying the photographer of video captured by a wearable camera. Such video was previously considered anonymous, as it does not show the photographer.
The talk concludes by presenting a new task encompassing the full perception-cognition-action cycle: visual learning of arithmetic operations using only pictures of numbers. This is done without using or learning the notions of numbers, digits, or operators.
The talk is based on the following papers:
1. Speech Acoustic Modeling From Raw Multichannel Waveforms, Y. Hoshen, R.J. Weiss, and K.W. Wilson, ICASSP’15
2. An Egocentric Look at Video Photographer Identity, Y. Hoshen, S. Peleg, CVPR’16
3. Visual Learning of Arithmetic Operations, Y. Hoshen, S. Peleg, AAAI’16
Yedid Hoshen recently completed his PhD at the Hebrew University of Jerusalem under the supervision of Prof. Shmuel Peleg. He received his Master's degree in Physics from the University of Oxford. He has held several industrial research positions, most recently consulting for Taboola. In winter 2017 he will join Facebook AI Research (FAIR) in New York as a postdoctoral researcher.