INSTA - Instant Volumetric Head Avatars [CVPR2023]
Wojciech Zielonka, Timo Bolkart, Justus Thies
Max Planck Institute for Intelligent Systems, Tübingen, Germany

For immersive telepresence in AR or VR, we aim for digital humans (avatars) that mimic the motions and facial expressions of the actual subjects participating in a meeting. Besides the motion, these avatars should reflect the human's shape and appearance. Instead of prerecorded, old avatars, we aim to instantaneously reconstruct the subject's look to capture the actual appearance during a meeting.

To this end, we propose INSTA, which reconstructs an avatar within a few minutes (~10 min) that can then be driven at interactive frame rates. For easy accessibility, we rely on commodity hardware to train and capture the avatar. Specifically, we use a single RGB camera to record the input video. State-of-the-art methods that use similar input data to reconstruct a human avatar require a relatively long time to train, ranging from around one day (Grassal et al.) to almost a week (Gafni et al.; Zheng et al.).



Given a short monocular RGB video, our method instantaneously optimizes a deformable neural radiance field to synthesize a photo-realistic, animatable 3D neural head avatar. The neural radiance field is embedded in the multi-resolution grid of Müller et al., built around a 3D face model that guides the deformations. The resulting head avatar can be rendered from novel viewpoints and animated at interactive frame rates.
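
To illustrate what it can mean for a 3D face model to guide the deformations, here is a minimal sketch, not the INSTA implementation. It assumes that a sample point in the deformed (current expression) space is mapped into a canonical space through the local frame of its nearest mesh triangle. The function names (`triangle_frame`, `warp_to_canonical`), the brute-force nearest-centroid lookup, and the choice of orthonormal frames are all assumptions made for this example.

```python
import math

# Small 3-vector helpers on plain tuples.
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def triangle_frame(v0, v1, v2):
    """Return the centroid and orthonormal axes (tangent, bitangent, normal)."""
    centroid = tuple((a + b + c) / 3.0 for a, b, c in zip(v0, v1, v2))
    tangent = normalize(sub(v1, v0))
    normal = normalize(cross(sub(v1, v0), sub(v2, v0)))
    bitangent = cross(normal, tangent)
    return centroid, (tangent, bitangent, normal)

def warp_to_canonical(point, deformed_vertices, canonical_vertices, faces):
    """Map a point from deformed space to canonical space (hypothetical helper).

    Assumption: the nearest triangle is picked by centroid distance, and the
    point keeps its coordinates relative to that triangle's local frame.
    """
    best = None
    for face in faces:
        centroid, axes = triangle_frame(*(deformed_vertices[i] for i in face))
        offset = sub(point, centroid)
        dist2 = dot(offset, offset)
        if best is None or dist2 < best[0]:
            best = (dist2, face, offset, axes)
    _, face, offset, axes = best

    # Coordinates of the point in the deformed triangle's local frame.
    local = tuple(dot(axis, offset) for axis in axes)

    # Re-express the same local coordinates in the canonical triangle's frame.
    result, axes_can = triangle_frame(*(canonical_vertices[i] for i in face))
    for coord, axis in zip(local, axes_can):
        result = add(result, tuple(coord * x for x in axis))
    return result
```

In this sketch, if the deformed mesh is a rigidly moved copy of the canonical mesh, a point moved by the same rigid motion maps back to its original canonical position. A radiance field queried at the warped point would then see a stable, expression-independent coordinate.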

Video

BibTeX


@inproceedings{zielonka2022insta,
  title        = {Instant Volumetric Head Avatars},
  author       = {Wojciech Zielonka and Timo Bolkart and Justus Thies},
  booktitle    = {2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year         = {2023},
  pages        = {4574--4584}
}