Abstract
We propose a method for generating video-realistic animations of real humans under user control. In contrast to conventional human character rendering, we do not require a production-quality photo-realistic three-dimensional (3D) model of the human; instead, we rely on a video sequence in conjunction with a (medium-quality) controllable 3D template model of the person. Our approach thus significantly reduces production cost compared to conventional rendering approaches based on production-quality 3D models, and it can also be used to realistically edit existing videos. Technically, this is achieved by training a neural network that translates simple synthetic images of a human character into realistic imagery. To train our networks, we first track the 3D motion of the person in the video using the template model and subsequently generate a synthetically rendered version of the video. These images are then used to train a conditional generative adversarial network that translates synthetic images of the 3D model into realistic imagery of the human. We evaluate our method on reenactment, where the tracked motion of another person drives the synthesis, and we show video results generated from artist-designed skeleton motion. Our results outperform the state of the art in learning-based human image synthesis.
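To make the synthetic-to-real translation step concrete, the following is a minimal, pix2pix-style paired conditional-GAN training sketch in Python (PyTorch). It is not the authors' implementation: the layer sizes, patch-discriminator design, loss weights, and optimizer setup are illustrative assumptions; only the overall idea, conditioning a generator on a synthetic rendering of the tracked template model and training it adversarially against the real video frame, follows the abstract.

import torch
import torch.nn as nn

class Generator(nn.Module):
    """Toy encoder-decoder standing in for the image translation network (assumed architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, synthetic):
        # synthetic: rendering of the 3D template model in the target pose
        return self.net(synthetic)

class Discriminator(nn.Module):
    """Patch discriminator conditioned on the synthetic input image (assumed design)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # per-patch real/fake logits
        )

    def forward(self, synthetic, image):
        return self.net(torch.cat([synthetic, image], dim=1))

def training_step(G, D, opt_G, opt_D, synthetic, real, l1_weight=100.0):
    """One paired training step: synthetic rendering -> realistic video frame."""
    bce = nn.BCEWithLogitsLoss()
    l1 = nn.L1Loss()

    # Update the discriminator on a real pair and a generated pair.
    fake = G(synthetic).detach()
    d_real = D(synthetic, real)
    d_fake = D(synthetic, fake)
    loss_D = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Update the generator: fool the discriminator while staying close to the real frame.
    fake = G(synthetic)
    d_fake = D(synthetic, fake)
    loss_G = bce(d_fake, torch.ones_like(d_fake)) + l1_weight * l1(fake, real)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()

The sketch only illustrates the paired translation principle; the full system described in the abstract conditions on synthetic renderings obtained by tracking the person with the template model, and its actual network and loss design may differ from the assumptions above.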
Supplemental Material
Supplemental movie and image files for "Neural Rendering and Reenactment of Human Actor Videos" are available for download.