Aviv Gabbay (Hebrew University of Jerusalem)
Tuesday, 7.11.2017, 11:30
When video is recorded in a studio, the sound is free of external noise and unrelated sounds.
However, most video is not shot in studios. Video conferences from home or the office are often disturbed by other people, ringing phones, or barking dogs.
TV reports from city streets are commonly mixed with traffic noise or the sound of wind.
We can exploit the visual information of face and mouth movements as seen in the video to enhance the voice of a speaker, eliminating unrelated sounds.
In the first part of this talk, I will describe a few techniques for predicting speech signals from silent video of a speaking person.
In the second part of the talk, I will propose a method to separate the overlapping speech of several people speaking simultaneously (known as the cocktail-party problem), based on
rough speech predictions generated by a video-to-speech system.
Aviv is a computer vision researcher at the Hebrew University of Jerusalem, working under the supervision of Prof. Shmuel Peleg.
His main research interests are computer vision applications using deep learning methods, and audio-visual analysis.
Aviv earned a B.Sc. in Applied Mathematics while still in high school, as part of a special program for gifted youth at Bar-Ilan University. He then moved to the Hebrew University to pursue an M.Sc. in computer science.
Apart from his studies, he served for six years as a software researcher in an elite technological unit of the Government of Israel.