
Neural Rendering and Reenactment of Human Actor Videos


Abstract

We propose a method for generating video-realistic animations of real humans under user control. In contrast to conventional human character rendering, we do not require a production-quality, photo-realistic three-dimensional (3D) model of the human; instead, we rely on a video sequence in conjunction with a (medium-quality) controllable 3D template model of the person. Our approach thus significantly reduces production costs compared to conventional rendering approaches based on production-quality 3D models, and it can also be used to realistically edit existing videos. Technically, this is achieved by training a neural network that translates simple synthetic images of a human character into realistic imagery. To train our networks, we first track the 3D motion of the person in the video using the template model and then generate a synthetically rendered version of the video. These rendered images are used to train a conditional generative adversarial network that translates synthetic images of the 3D model into realistic imagery of the human. We evaluate our method on the reenactment of another person who is tracked to obtain the motion data, and we show video results generated from artist-designed skeleton motion. Our results outperform the state of the art in learning-based human image synthesis.
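
The core training step the abstract describes is a conditional GAN that maps synthetic renders of the tracked template to realistic video frames. The following is a minimal sketch of such a conditional-GAN update in PyTorch, assuming a pix2pix-style objective (adversarial loss plus an L1 reconstruction term); the tiny architectures and all names (TinyGenerator, TinyDiscriminator, train_step, lambda_l1) are illustrative assumptions, not the paper's actual networks.

```python
# Minimal sketch of the conditional-GAN translation step from the abstract:
# the generator G maps a synthetic render of the tracked 3D template to a
# realistic frame, and the discriminator D judges (render, frame) pairs.
# Architectures and names here are illustrative placeholders only.
import torch
import torch.nn as nn

class TinyGenerator(nn.Module):
    """Stand-in for an encoder-decoder-style generator."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, render):
        return self.net(render)

class TinyDiscriminator(nn.Module):
    """Patch-style discriminator on concatenated (condition, image) pairs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, 4, stride=2, padding=1),
        )

    def forward(self, cond, img):
        return self.net(torch.cat([cond, img], dim=1))

G, D = TinyGenerator(), TinyDiscriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1 = nn.BCEWithLogitsLoss(), nn.L1Loss()

def train_step(render, real, lambda_l1=100.0):
    """One paired update: render = synthetic frame, real = video frame."""
    fake = G(render)

    # Discriminator: push (render, real) toward 1 and (render, fake) toward 0.
    d_real, d_fake = D(render, real), D(render, fake.detach())
    loss_d = (bce(d_real, torch.ones_like(d_real)) +
              bce(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: fool D while staying close to the real frame (L1 term).
    d_fake = D(render, fake)
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + lambda_l1 * l1(fake, real)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Dummy usage on one batch of 256x256 frames.
render = torch.randn(4, 3, 256, 256)  # rendered template frames
real = torch.randn(4, 3, 256, 256)    # corresponding real video frames
print(train_step(render, real))
```

Note how the conditioning enters through the discriminator input: because D sees the render alongside the frame, the generator is rewarded only for outputs that are both photo-realistic and consistent with the rendered pose.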





• Published in

  ACM Transactions on Graphics, Volume 38, Issue 5
  October 2019
  191 pages
  ISSN: 0730-0301
  EISSN: 1557-7368
  DOI: 10.1145/3341165

            Copyright © 2019 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 25 October 2019
            • Accepted: 1 April 2019
            • Revised: 1 February 2019
            • Received: 1 September 2018
Published in TOG, Volume 38, Issue 5


            Qualifiers

            • research-article
            • Research
            • Refereed
