research-article

Learning to be a depth camera for close-range human capture and interaction

Published: 27 July 2014

Abstract

We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera, with minor hardware modifications. Our approach targets close-range human capture and interaction where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn how to map from near infrared intensity images to absolute, metric depth in real-time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show an accuracy that outperforms a conventional light fall-off baseline, and is comparable to high-quality consumer depth cameras, but with a dramatically reduced cost, power consumption, and form-factor.
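To make the comparison concrete, the light fall-off baseline mentioned in the abstract can be sketched with the inverse-square law: under an idealized model, observed intensity is I = k / d², so depth is d = sqrt(k / I). The following is a minimal sketch under that assumption only, not the learned forest method of the paper. The function names, the single-point calibration, and the `min_intensity` cutoff are hypothetical choices made for illustration.

```python
import math


def calibrate_falloff_constant(intensity, depth_m):
    """Estimate k in I = k / d^2 from one reference pixel (hypothetical calibration)."""
    if intensity <= 0 or depth_m <= 0:
        raise ValueError("intensity and depth must be positive")
    return intensity * depth_m ** 2


def depth_from_intensity(image, k, min_intensity=1e-6):
    """Map each pixel intensity to depth via d = sqrt(k / I).

    `image` is a list of rows of non-negative intensities. Pixels at or below
    `min_intensity` carry no usable signal and are mapped to infinity.
    """
    return [
        [math.sqrt(k / value) if value > min_intensity else math.inf for value in row]
        for row in image
    ]


# Usage: calibrate on a pixel known to lie at 0.5 m, then convert a small image.
k = calibrate_falloff_constant(intensity=0.25, depth_m=0.5)
depth_map = depth_from_intensity([[0.25, 0.0625], [1.0, 0.0]], k)
```

This model assumes constant surface reflectance and orientation, so a single constant k cannot account for variation in skin albedo or surface angle; that limitation is the usual motivation for replacing the closed-form mapping with a learned one.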


Supplemental Material

a86-sidebyside.mp4 (MP4, 29.9 MB)



Published in

ACM Transactions on Graphics, Volume 33, Issue 4
July 2014
1366 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2601097

Copyright © 2014 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States
