Events and Talks at the Henry and Marilyn Taub Faculty of Computer Science
Ron Slossberg (Ph.D. Thesis Seminar)
Tuesday, 19.07.2022, 13:00
In this thesis, we study the modeling of human faces. Since structured data is believed to reside on a low-dimensional manifold embedded in a high-dimensional space, we wish to study and model the so-called manifold of human faces. By uncovering the latent manifold of faces, one can project onto it (facial reconstruction) as well as sample from it (facial synthesis), two tasks with a wide range of applications in gaming, animation, and AR/VR, to name a few.
In their seminal work (1999), Blanz and Vetter proposed the linear 3D Morphable Model (3DMM). This model has since been widely adopted and can be thought of as a first-order linear approximation of the facial manifold. It has, however, two main drawbacks: fine details are lost, and it is not well suited for facial synthesis. The first drawback stems from the truncation of the PCA basis as well as from the linear nature of the model. The second arises because the proposed method for randomly selecting the coefficient of each basis vector does not take the latent facial manifold into account.
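The linear 3DMM idea can be sketched in a few lines. This is a minimal illustration, not the actual model: the dimensions, the mean face, the orthonormal basis, and the per-component scales below are random stand-ins for quantities a real 3DMM learns from registered scans.

```python
import numpy as np

# Hypothetical dimensions: a mesh of 5000 vertices (x, y, z) and k PCA components.
n_dims, k = 3 * 5000, 50
rng = np.random.default_rng(0)

# Stand-ins for the learned model: the mean face and an orthonormal PCA
# basis (columns), with per-component standard deviations.
mean_face = rng.standard_normal(n_dims)
basis, _ = np.linalg.qr(rng.standard_normal((n_dims, k)))
sigmas = np.linspace(1.0, 0.1, k)

def synthesize(coeffs):
    """Linear 3DMM synthesis: face = mean + basis @ (sigmas * coeffs)."""
    return mean_face + basis @ (sigmas * coeffs)

def project(face):
    """Project a face onto the model's span (least-squares coefficients)."""
    return (basis.T @ (face - mean_face)) / sigmas

# Naive random synthesis: sampling each coefficient independently -- the
# drawback noted above, since independent draws ignore the true facial manifold.
coeffs = rng.standard_normal(k)
face = synthesize(coeffs)
assert np.allclose(project(face), coeffs)
```

Truncating to k components is what discards fine detail: any face component outside the span of the k basis vectors is lost under `project` followed by `synthesize`.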
In our work, we remedy these problems by constructing a non-linear model of facial photometry and combining it with the linear geometric 3DMM to achieve highly realistic facial modeling. Our approach leverages the generative adversarial network (GAN) training methodology to obtain a non-linear model for texture generation. To form the final 3D face, we propose methods for generating a corresponding geometry via the 3DMM for each synthesized texture. In addition, we can project onto the facial manifold by optimizing the generator's input parameters with respect to an image loss, leveraging backpropagation through the generator model. This process enables us to perform a full facial reconstruction even under challenging circumstances such as side views.
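The manifold-projection step can be sketched as gradient descent on the generator input. Everything here is an illustrative assumption: a fixed linear map stands in for the trained generator (so the image-loss gradient is analytic rather than obtained by backpropagation through a deep network), and the dimensions and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
latent_dim, image_dim = 16, 256

# Toy stand-in for a trained texture generator G. In practice G is a deep
# network and the gradient below comes from backpropagation through it.
W = rng.standard_normal((image_dim, latent_dim))
b = rng.standard_normal(image_dim)
G = lambda z: W @ z + b

# A "photo" known to lie on the toy manifold, to be reconstructed.
target = G(rng.standard_normal(latent_dim))

# Reconstruction: optimize the generator input z to minimize the image
# loss ||G(z) - target||^2 by plain gradient descent.
z = np.zeros(latent_dim)
lr = 1e-3
for _ in range(500):
    residual = G(z) - target
    z -= lr * (2.0 * W.T @ residual)  # gradient of ||W z + b - target||^2 w.r.t. z

print(np.linalg.norm(G(z) - target))  # residual shrinks toward zero
```

Because the output `G(z)` always lies in the generator's range, the recovered face stays on the (toy) manifold by construction; this is what makes the approach robust to challenging inputs such as side views, where large parts of the target are unobserved.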
Initially, we propose to learn the model in a supervised manner directly from facial scans. This is done by semantically aligning the scans and mapping the scanned textures to texture images used during training. We later propose a new unsupervised methodology based only on natural facial photos. This approach is much more practical and yields better results thanks to the much larger dataset; however, training without direct texture supervision is more complicated and requires a complex training pipeline. Our method is among the very few proposed for facial texture generation, and our results are shown to be state-of-the-art (SOTA) on this task. For the task of facial reconstruction via our model, we compare against many prior methods and demonstrate favorable results, even outperforming several supervised methods.