Abstract
Face2Face is an approach for real-time facial reenactment of a monocular target video sequence (e.g., a YouTube video). The source sequence is also a monocular video stream, captured live with a commodity webcam. Our goal is to animate the facial expressions of the target video by a source actor and re-render the manipulated output video in a photo-realistic fashion. To this end, we first address the under-constrained problem of facial identity recovery from monocular video by non-rigid model-based bundling. At run time, we track facial expressions of both source and target video using a dense photometric consistency measure. Reenactment is then achieved by fast and efficient deformation transfer between source and target. The mouth interior that best matches the re-targeted expression is retrieved from the target sequence and warped to produce an accurate fit. Finally, we convincingly re-render the synthesized target face on top of the corresponding video stream such that it seamlessly blends with the real-world illumination. We demonstrate our method in a live setup, where YouTube videos are reenacted in real time. This live setup was also shown at SIGGRAPH Emerging Technologies 2016 by Thies et al., where it won the Best in Show Award.
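The core reenactment step can be illustrated in a simplified form. In a linear parametric face model, transferring the source actor's expression to the target amounts to applying the tracked source expression coefficients to the target's identity geometry. The sketch below uses hypothetical array shapes and randomly initialized bases purely for illustration; the actual method operates via sub-space deformation transfer on triangle meshes, not by naive coefficient copying.

```python
import numpy as np

# Illustrative sketch only: a linear blendshape model with assumed sizes.
# The real pipeline fits these bases to video via non-rigid model-based bundling.
rng = np.random.default_rng(0)
n_vertices, n_exp = 100, 76                          # hypothetical model sizes
target_identity = rng.normal(size=(n_vertices, 3))   # target's neutral geometry
exp_basis = rng.normal(size=(n_exp, n_vertices, 3))  # shared expression basis

def reenact(identity, basis, source_exp_coeffs):
    """Deform the target identity with the source actor's tracked
    expression coefficients (a simplification of the paper's
    deformation transfer between source and target)."""
    offset = np.tensordot(source_exp_coeffs, basis, axes=1)  # weighted sum of basis shapes
    return identity + offset

source_coeffs = rng.normal(scale=0.1, size=n_exp)    # would come from webcam tracking
reenacted = reenact(target_identity, exp_basis, source_coeffs)
assert reenacted.shape == target_identity.shape
```

With zero expression coefficients the target's neutral geometry is returned unchanged, which matches the intuition that reenactment only replaces the expression component while preserving the target's identity.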
- Blanz, V., Vetter, T. A morphable model for the synthesis of 3D faces. Proc. SIGGRAPH (1999), ACM Press/Addison-Wesley Publishing Co., 187--194.
- Bouaziz, S., Wang, Y., Pauly, M. Online modeling for realtime facial animation. ACM TOG 32, 4 (2013), 40.
- Bregler, C., Covell, M., Slaney, M. Video rewrite: Driving visual speech with audio. Proc. SIGGRAPH (1997), ACM Press/Addison-Wesley Publishing Co., 353--360.
- Cao, C., Bradley, D., Zhou, K., Beeler, T. Real-time high-fidelity facial performance capture. ACM TOG 34, 4 (2015), 46:1--46:9.
- Cao, C., Hou, Q., Zhou, K. Displaced dynamic expression regression for real-time facial tracking and animation. ACM TOG 33, 4 (2014), 43.
- Chen, Y.-L., Wu, H.-T., Shi, F., Tong, X., Chai, J. Accurate and robust 3D facial capture using a single RGBD camera. Proc. ICCV (2013), 3615--3622.
- Garrido, P., Valgaerts, L., Rehmsen, O., Thormaehlen, T., Perez, P., Theobalt, C. Automatic face reenactment. Proc. CVPR (2014).
- Garrido, P., Valgaerts, L., Sarmadi, H., Steiner, I., Varanasi, K., Perez, P., Theobalt, C. VDub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. Computer Graphics Forum (2015).
- Hu, L., Saito, S., Wei, L., Nagano, K., Seo, J., Fursund, J., Sadeghi, I., Sun, C., Chen, Y., Li, H. Avatar digitization from a single image for real-time rendering. ACM TOG 36, 6 (2017), 195:1--195:14.
- Kemelmacher-Shlizerman, I., Sankar, A., Shechtman, E., Seitz, S.M. Being John Malkovich. Proc. ECCV (2010), 341--353.
- Li, H., Yu, J., Ye, Y., Bregler, C. Realtime facial animation with on-the-fly correctives. ACM TOG 32, 4 (2013), 42.
- Li, K., Xu, F., Wang, J., Dai, Q., Liu, Y. A data-driven approach for facial expression synthesis in video. Proc. CVPR (2012), 57--64.
- Ramamoorthi, R., Hanrahan, P. A signal-processing framework for inverse rendering. Proc. SIGGRAPH (2001), 117--128.
- Saragih, J.M., Lucey, S., Cohn, J.F. Deformable model fitting by regularized landmark mean-shift. IJCV 91, 2 (2011), 200--215.
- Saragih, J.M., Lucey, S., Cohn, J.F. Real-time avatar animation from a single image. Proc. Automatic Face and Gesture Recognition Workshops (2011), 213--220.
- Shi, F., Wu, H.-T., Tong, X., Chai, J. Automatic acquisition of high-fidelity facial performances using monocular videos. ACM TOG 33, 6 (2014), 222.
- Siegl, C., Lange, V., Stamminger, M., Bauer, F., Thies, J. FaceForge: Markerless non-rigid face multi-projection mapping. IEEE Transactions on Visualization and Computer Graphics (2017).
- Sumner, R.W., Popović, J. Deformation transfer for triangle meshes. ACM TOG 23, 3 (2004), 399--405.
- Thies, J., Zollhöfer, M., Nießner, M., Valgaerts, L., Stamminger, M., Theobalt, C. Real-time expression transfer for facial reenactment. ACM TOG 34, 6 (2015).
- Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. Demo of Face2Face: Real-time face capture and reenactment of RGB videos. ACM SIGGRAPH 2016 Emerging Technologies (2016), 5:1--5:2.
- Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. Face2Face: Real-time face capture and reenactment of RGB videos. Proc. CVPR (2016).
- Thies, J., Zollhöfer, M., Stamminger, M., Theobalt, C., Nießner, M. FaceVR: Real-time facial reenactment and eye gaze control in virtual reality. arXiv preprint abs/1610.03151 (2016).
- Vlasic, D., Brand, M., Pfister, H., Popović, J. Face transfer with multilinear models. ACM TOG 24, 3 (2005), 426--433.
- Weise, T., Bouaziz, S., Li, H., Pauly, M. Realtime performance-based facial animation. ACM TOG 30, 4 (2011), 77.
- Weise, T., Li, H., Gool, L.V., Pauly, M. Face/Off: Live facial puppetry. Proc. SCA (2009), Eurographics Association.