Abstract
We propose a method for generating video-realistic animations of real humans under user control. In contrast to conventional human character rendering, we do not require a production-quality photo-realistic three-dimensional (3D) model of the human; instead, we rely on a video sequence in conjunction with a (medium-quality) controllable 3D template model of the person. Our approach thus significantly reduces production cost compared to conventional rendering approaches based on production-quality 3D models, and it can also be used to realistically edit existing videos. Technically, this is achieved by training a neural network that translates simple synthetic images of a human character into realistic imagery. To train our networks, we first track the 3D motion of the person in the video using the template model and subsequently generate a synthetically rendered version of the video. These images are then used to train a conditional generative adversarial network that translates synthetic images of the 3D model into realistic imagery of the human. We evaluate our method on reenactment, where the tracked motion of another person drives the synthesis, and we show video results generated from artist-designed skeleton motion. Our results outperform the state of the art in learning-based human image synthesis.
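To make the synthetic-to-real translation step concrete, the following is a minimal, pix2pix-style paired conditional-GAN training sketch in Python (PyTorch). It is not the authors' implementation: the layer sizes, patch-discriminator design, loss weights, and optimizer setup are illustrative assumptions; only the overall idea, conditioning a generator on a synthetic rendering of the tracked template model and training it adversarially against the real video frame, follows the abstract.

import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy encoder-decoder standing in for the image translation network (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, synthetic):
        # synthetic: rendering of the 3D template model in the target pose
        return self.net(synthetic)

class Discriminator(nn.Module):
    """Patch discriminator conditioned on the synthetic input image (assumed design)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # per-patch real/fake logits
        )

    def forward(self, synthetic, image):
        return self.net(torch.cat([synthetic, image], dim=1))

def training_step(G, D, opt_G, opt_D, synthetic, real, l1_weight=100.0):
    """One paired training step: synthetic rendering -> realistic video frame."""
    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()

    # Update the discriminator on a real pair and a generated pair.
    fake = G(synthetic).detach()
    d_real = D(synthetic, real)
    d_fake = D(synthetic, fake)
    loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Update the generator: fool the discriminator while staying close to the real frame.
    fake = G(synthetic)
    d_fake = D(synthetic, fake)
    loss_G = bce(d_fake, torch.ones_like(d_fake)) + l1_weight * l1(fake, real)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()

The sketch only illustrates the paired translation principle; the full system described in the abstract conditions on synthetic renderings obtained by tracking the person with the template model, and its actual network and loss design may differ from the assumptions above.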
Supplemental Material
Supplemental movie and image files for "Neural Rendering and Reenactment of Human Actor Videos" are available for download.