Abstract
Face2Face is an approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where YouTube videos are reenacted in real time. This live setup was also shown at SIGGRAPH Emerging Technologies 2016 by Thies et al., where it won the Best in Show Award.
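The core reenactment step can be illustrated in a simplified form. In a linear parametric face model, transferring the source actor's expression to the target amounts to applying the tracked source expression coefficients to the target's identity geometry. The sketch below uses hypothetical array shapes and randomly initialized bases purely for illustration; the actual method operates via sub-space deformation transfer on triangle meshes, not by naive coefficient copying.

```python
import numpy as np

# Illustrative sketch only: a linear blendshape model with assumed sizes.
# The real pipeline fits these bases to video via non-rigid model-based bundling.
rng = np.random.default_rng(0)
n_vertices, n_exp = 100, 76                          # hypothetical model sizes
target_identity = rng.normal(size=(n_vertices, 3))   # target's neutral geometry
exp_basis = rng.normal(size=(n_exp, n_vertices, 3))  # shared expression basis

def reenact(identity, basis, source_exp_coeffs):
    """Deform the target identity with the source actor's tracked
    expression coefficients (a simplification of the paper's
    deformation transfer between source and target)."""
    offset = np.tensordot(source_exp_coeffs, basis, axes=1)  # weighted sum of basis shapes
    return identity + offset

source_coeffs = rng.normal(scale=0.1, size=n_exp)    # would come from webcam tracking
reenacted = reenact(target_identity, exp_basis, source_coeffs)
assert reenacted.shape == target_identity.shape
```

With zero expression coefficients the target's neutral geometry is returned unchanged, which matches the intuition that reenactment only replaces the expression component while preserving the target's identity.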
- Blanz, V., Vetter, T. A morphable model for the synthesis of 3D faces. Proc. SIGGRAPH (1999), ACM Press/Addison-Wesley Publishing Co., 187--194.
- Bouaziz, S., Wang, Y., Pauly, M. Online modeling for realtime facial animation. ACM TOG 32, 4 (2013), 40.
- Bregler, C., Covell, M., Slaney, M. Video rewrite: Driving visual speech with audio. Proc. SIGGRAPH (1997), ACM Press/Addison-Wesley Publishing Co., 353--360.
- Cao, C., Bradley, D., Zhou, K., Beeler, T. Real-time high-fidelity facial performance capture. ACM TOG 34, 4 (2015), 46:1--46:9.
- Cao, C., Hou, Q., Zhou, K. Displaced dynamic expression regression for real-time facial tracking and animation. ACM TOG 33, 4 (2014), 43.
- Chen, Y.-L., Wu, H.-T., Shi, F., Tong, X., Chai, J. Accurate and robust 3D facial capture using a single RGBD camera. Proc. ICCV (2013), 3615--3622.
- Garrido, P., Valgaerts, L., Rehmsen, O., Thormaehlen, T., Perez, P., Theobalt, C. Automatic face reenactment. Proc. CVPR (2014).
- Garrido, P., Valgaerts, L., Sarmadi, H., Steiner, I., Varanasi, K., Perez, P., Theobalt, C. VDub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. Computer Graphics Forum (2015).
- Hu, L., Saito, S., Wei, L., Nagano, K., Seo, J., Fursund, J., Sadeghi, I., Sun, C., Chen, Y., Li, H. Avatar digitization from a single image for real-time rendering. ACM TOG 36, 6 (2017), 195:1--195:14.
- Kemelmacher-Shlizerman, I., Sankar, A., Shechtman, E., Seitz, S.M. Being John Malkovich. Proc. ECCV (2010), 341--353.
- Li, H., Yu, J., Ye, Y., Bregler, C. Realtime facial animation with on-the-fly correctives. ACM TOG 32, 4 (2013), 42.
- Li, K., Xu, F., Wang, J., Dai, Q., Liu, Y. A data-driven approach for facial expression synthesis in video. Proc. CVPR (2012), 57--64.
- Ramamoorthi, R., Hanrahan, P. A signal-processing framework for inverse rendering. Proc. SIGGRAPH (2001), 117--128.
- Saragih, J.M., Lucey, S., Cohn, J.F. Deformable model fitting by regularized landmark mean-shift. IJCV 91, 2 (2011), 200--215.
- Saragih, J.M., Lucey, S., Cohn, J.F. Real-time avatar animation from a single image. Proc. Automatic Face and Gesture Recognition Workshops (2011), 213--220.
- Shi, F., Wu, H.-T., Tong, X., Chai, J. Automatic acquisition of high-fidelity facial performances using monocular videos. ACM TOG 33, 6 (2014), 222.
- Siegl, C., Lange, V., Stamminger, M., Bauer, F., Thies, J. FaceForge: Markerless non-rigid face multi-projection mapping. IEEE Transactions on Visualization and Computer Graphics (2017).
- Sumner, R.W., Popović, J. Deformation transfer for triangle meshes. ACM TOG 23, 3 (2004), 399--405.
- Thies, J., Zollhöfer, M., Nießner, M., Valgaerts, L., Stamminger, M., Theobalt, C. Real-time expression transfer for facial reenactment. ACM TOG 34, 6 (2015).
- Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. Demo of Face2Face: Real-time face capture and reenactment of RGB videos. ACM SIGGRAPH 2016 Emerging Technologies (2016), 5:1--5:2.
- Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. Face2Face: Real-time face capture and reenactment of RGB videos. Proc. CVPR (2016).
- Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. FaceVR: Real-time facial reenactment and eye gaze control in virtual reality. arXiv preprint abs/1610.03151 (2016).
- Vlasic, D., Brand, M., Pfister, H., Popović, J. Face transfer with multilinear models. ACM TOG 24, 3 (2005), 426--433.
- Weise, T., Bouaziz, S., Li, H., Pauly, M. Realtime performance-based facial animation. ACM TOG 30, 4 (2011), 77.
- Weise, T., Li, H., Gool, L.V., Pauly, M. Face/Off: Live facial puppetry. Proc. SCA (2009), Eurographics Association.