Dynamic 3D avatar creation from hand-held video input

Published: 27 July 2015

Abstract

We present a complete pipeline for creating fully rigged, personalized 3D facial avatars from hand-held video. Our system faithfully recovers facial expression dynamics of the user by adapting a blendshape template to an image sequence of recorded expressions using an optimization that integrates feature tracking, optical flow, and shape from shading. Fine-scale details such as wrinkles are captured separately in normal maps and ambient occlusion maps. From this user- and expression-specific data, we learn a regressor for on-the-fly detail synthesis during animation to enhance the perceptual realism of the avatars. Our system demonstrates that the use of appropriate reconstruction priors yields compelling face rigs even with a minimalistic acquisition system and limited user assistance. This facilitates a range of new applications in computer animation and consumer-level online communication based on personalized avatars. We present realtime application demos to validate our method.
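
The abstract refers to a blendshape template without stating how it is parameterized. As a rough illustration only, the sketch below assumes the common delta formulation, in which a posed face is the neutral mesh plus a weighted sum of per-vertex expression offsets. The function name `evaluate_blendshapes`, the toy two-vertex mesh, and the shapes `smile` and `jaw_open` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def evaluate_blendshapes(
    neutral: Sequence[Vec3],
    deltas: Sequence[Sequence[Vec3]],
    weights: Sequence[float],
) -> List[Vec3]:
    """Return neutral + sum_i weights[i] * deltas[i], evaluated per vertex."""
    if len(deltas) != len(weights):
        raise ValueError("need exactly one weight per blendshape")
    posed = [list(v) for v in neutral]
    for delta, w in zip(deltas, weights):
        if len(delta) != len(neutral):
            raise ValueError("each blendshape must cover every vertex")
        # Accumulate this expression's weighted offset on every vertex.
        for vertex, offset in zip(posed, delta):
            for axis in range(3):
                vertex[axis] += w * offset[axis]
    return [(v[0], v[1], v[2]) for v in posed]


# Toy two-vertex "face" with two hypothetical expression shapes (assumed data).
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
smile = [(0.0, 0.5, 0.0), (0.0, 0.5, 0.0)]
jaw_open = [(0.0, -1.0, 0.0), (0.0, 0.0, 0.25)]
posed = evaluate_blendshapes(neutral, [smile, jaw_open], [0.5, 1.0])
print(posed)
```

In a model of this kind, animating the avatar amounts to changing only the weight vector per frame, while personalizing the rig amounts to fitting the neutral mesh and the offsets to the recorded user.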

Published in

ACM Transactions on Graphics, Volume 34, Issue 4
August 2015
1307 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2809654

Copyright © 2015 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States