MICA - Towards Metrical Reconstruction of Human Faces [ECCV2022]
Wojciech Zielonka, Timo Bolkart, Justus Thies
Max Planck Institute for Intelligent Systems, Tübingen, Germany
Figure (comparison), panels: RGB Input, Deng et al. 19, Li et al. 22, Sanyal et al. 19, Feng et al. 21, Ours.

Face reconstruction and tracking are building blocks of numerous applications in AR/VR, human-machine interaction, and medicine. Most of these applications rely on a metrically correct prediction of the shape, especially when the reconstructed subject is put into a metrical context (i.e., when there is a reference object of known size).

A metrical reconstruction is also needed for any application that measures distances and dimensions of the subject (e.g., to virtually fit a glasses frame). State-of-the-art methods for face reconstruction from a single image are trained on large 2D image datasets in a self-supervised fashion. However, because a perspective projection cannot distinguish a small, close face from a large, distant one, these methods are not able to reconstruct the actual face dimensions; even predicting the average human face outperforms some of them in a metrical sense.
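The scale ambiguity of a perspective projection can be checked numerically. The following is a minimal sketch, assuming an ideal pinhole camera at the origin; the landmark coordinates, the focal length, and the scale factor are made-up illustration values, not data from the paper.

```python
import math

def project(point, focal_length):
    """Pinhole projection of a 3D point (x, y, z) with z > 0 onto the image plane."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)

# Hypothetical face landmarks in metres (two eye corners and a nose tip),
# with the camera at the origin looking along +z.
face = [(-0.03, 0.02, 0.50), (0.03, 0.02, 0.50), (0.0, -0.01, 0.47)]

# A face that is 30% larger and 30% farther away from the camera.
scale = 1.3
bigger_face = [(scale * x, scale * y, scale * z) for x, y, z in face]

focal_length = 1000.0  # in pixels, assumed
image_small = [project(p, focal_length) for p in face]
image_big = [project(p, focal_length) for p in bigger_face]

# Largest difference between the two projections, over all coordinates.
max_diff = max(
    abs(a - b)
    for p_small, p_big in zip(image_small, image_big)
    for a, b in zip(p_small, p_big)
)

# Metric distance between the two eye corners for both faces.
width_small = math.dist(face[0], face[1])
width_big = math.dist(bigger_face[0], bigger_face[1])
```

Both faces produce the same image up to floating-point rounding, although their metric widths differ by the scale factor. A method supervised only through 2D image evidence therefore has no signal that separates the two.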

To learn the actual shape of a face, we argue for a supervised training scheme. Since there exists no large-scale 3D dataset for this task, we annotated and unified small- and medium-scale databases. The resulting unified dataset contains about 2300 identities with corresponding images. We made this dataset publicly available.

To make the most of this medium-scale dataset, we take advantage of a face recognition network pretrained on a large-scale 2D image dataset, which provides distinct features for different faces and is robust to expression, illumination, and camera changes. Using these features, we train our face shape estimator in a supervised fashion, inheriting the robustness and generalization of the face recognition network.
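The idea of supervised shape regression on top of fixed identity features can be sketched as follows. This is a toy illustration, not the authors' architecture: the linear shape model, the linear mapping, the squared-error loss, all sizes, and the synthetic data are assumptions, and random vectors stand in for the features of the pretrained face recognition network.

```python
import random

random.seed(0)

FEAT_DIM, N_COEFF, N_COORDS = 4, 2, 6  # toy sizes, chosen only for illustration

# Hypothetical linear shape model: coords = mean_shape + sum_k coeff_k * basis[k].
mean_shape = [random.uniform(-1, 1) for _ in range(N_COORDS)]
basis = [[random.uniform(-1, 1) for _ in range(N_COORDS)] for _ in range(N_COEFF)]

def decode(coeffs):
    """Turn shape coefficients into (flattened) 3D coordinates."""
    return [m + sum(c * b[i] for c, b in zip(coeffs, basis))
            for i, m in enumerate(mean_shape)]

def predict(weights, feat):
    """Map a fixed identity feature to coefficients, then to coordinates."""
    coeffs = [sum(w * f for w, f in zip(row, feat)) for row in weights]
    return decode(coeffs)

def shape_loss(weights, data):
    """Squared coordinate error against ground-truth shapes, averaged over samples."""
    total = 0.0
    for feat, target in data:
        pred = predict(weights, feat)
        total += sum((p - t) ** 2 for p, t in zip(pred, target))
    return total / len(data)

def gradient_step(weights, data, lr):
    """One gradient-descent step on the mapping; the features stay fixed."""
    grad = [[0.0] * FEAT_DIM for _ in range(N_COEFF)]
    for feat, target in data:
        residual = [p - t for p, t in zip(predict(weights, feat), target)]
        for k in range(N_COEFF):
            back = 2.0 * sum(r * b for r, b in zip(residual, basis[k]))
            for j in range(FEAT_DIM):
                grad[k][j] += back * feat[j] / len(data)
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weights, grad)]

# Synthetic supervision: random "identity features" paired with "scanned" shapes.
true_weights = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)]
                for _ in range(N_COEFF)]
features = [[random.uniform(-1, 1) for _ in range(FEAT_DIM)] for _ in range(32)]
data = [(f, predict(true_weights, f)) for f in features]

weights = [[0.0] * FEAT_DIM for _ in range(N_COEFF)]
initial_loss = shape_loss(weights, data)
for _ in range(200):
    weights = gradient_step(weights, data, lr=0.01)
final_loss = shape_loss(weights, data)
```

Because the loss compares predicted coordinates directly with ground-truth 3D shapes, the scale of the output is supervised explicitly, which a purely image-based loss cannot do.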

Our method, which we call MICA (MetrIC fAce), outperforms the state-of-the-art reconstruction methods by a large margin, both on current non-metric benchmarks and on our metric benchmarks (15% and 24% lower average error on NoW, respectively).



We propose a method for metrical human face shape estimation from a single image that exploits a supervised training scheme based on a mixture of different 2D, 2D/3D, and 3D datasets. This estimate can be used for facial expression tracking with an analysis-by-synthesis approach that optimizes for the camera intrinsics as well as the per-frame illumination, facial expression, and pose.
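A heavily simplified sketch of the fitting idea is given below. It assumes that a fixed metric shape is already available and recovers only a focal length and a distance from 2D landmark positions; illumination, expression, rotation, and image rendering are left out. The grid search over depth with a closed-form focal length is a stand-in chosen for brevity, not the optimizer used by the authors, and all coordinates and camera values are invented.

```python
def project(shape, focal, depth):
    """Pinhole projection of points (x, y, z) after shifting them by depth along z."""
    return [(focal * x / (z + depth), focal * y / (z + depth)) for x, y, z in shape]

def fit_focal(shape, observed, depth):
    """Closed-form least-squares focal length for a fixed depth."""
    num = den = 0.0
    for (x, y, z), (u, v) in zip(shape, observed):
        px, py = x / (z + depth), y / (z + depth)
        num += u * px + v * py
        den += px * px + py * py
    return num / den

def reprojection_error(shape, observed, focal, depth):
    """Sum of squared pixel distances between observed and projected landmarks."""
    return sum((u - pu) ** 2 + (v - pv) ** 2
               for (u, v), (pu, pv) in zip(observed, project(shape, focal, depth)))

def fit_camera(shape, observed, candidate_depths):
    """Try each depth, solve for the focal length, keep the best combination."""
    best = None
    for depth in candidate_depths:
        focal = fit_focal(shape, observed, depth)
        error = reprojection_error(shape, observed, focal, depth)
        if best is None or error < best[0]:
            best = (error, focal, depth)
    return best

# Hypothetical metric landmarks in metres: eyes, nose tip, mouth corners, chin.
metric_shape = [
    (-0.030, 0.020, 0.000), (0.030, 0.020, 0.000),
    (0.000, -0.010, -0.030),
    (-0.025, -0.040, -0.010), (0.025, -0.040, -0.010),
    (0.000, -0.070, 0.000),
]

# Synthetic observation from an assumed camera (focal length in pixels, depth in metres).
observed = project(metric_shape, focal=1200.0, depth=0.6)

candidate_depths = [d / 100 for d in range(30, 101, 5)]  # 0.30 m to 1.00 m
error, focal, depth = fit_camera(metric_shape, observed, candidate_depths)
```

In this toy setting the focal length and the distance are both recoverable because the shape is known in metric units and its landmarks lie at different depths.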


Current methods do not predict metrical faces, which becomes visible when their outputs are displayed in a metrical space rather than in their respective image spaces. To illustrate this, we render the predicted faces of toddlers in a common metrical space using the same projection. State-of-the-art approaches trained in a self-supervised fashion, such as Feng et al. 21 (DECA) [0] (fourth row), or in a weakly supervised fashion, such as Li et al. 22 (FOCUS) [1] (third row), scale the face of an adult to fit the observation in the image space; the prediction in 3D is therefore non-metrical. In contrast, our reconstruction method is able to recover the physiognomy of the toddlers. Input images were generated by StyleGAN2 [2].

BibTeX

@inproceedings{Zielonka2022TowardsMR,
  title={Towards Metrical Reconstruction of Human Faces},
  author={Wojciech Zielonka and Timo Bolkart and Justus Thies},
  booktitle={European Conference on Computer Vision},
  year={2022},
  url={https://api.semanticscholar.org/CorpusID:248177832}
}