research-article

Learning to be a depth camera for close-range human capture and interaction

Authors:
Sean Ryan Fanello, Microsoft Research and Istituto Italiano di Tecnologia
Cem Keskin, Microsoft Research
Shahram Izadi, Microsoft Research
Pushmeet Kohli, Microsoft Research
David Kim, Microsoft Research
David Sweeney, Microsoft Research
Antonio Criminisi, Microsoft Research
Jamie Shotton, Microsoft Research
Sing Bing Kang, Microsoft Research
Tim Paek, Microsoft Research

ACM Transactions on Graphics, Volume 33, Issue 4, Article No. 86, pp. 1–11
https://doi.org/10.1145/2601097.2601223

Published: 27 July 2014

Abstract

We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera with minor hardware modifications. Our approach targets close-range human capture and interaction, where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn a mapping from near-infrared intensity images to absolute, metric depth in real time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show accuracy that exceeds a conventional light fall-off baseline and is comparable to high-quality consumer depth cameras, at dramatically reduced cost, power consumption, and form factor.
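
For context, a light fall-off baseline of the kind mentioned in the abstract can be sketched with the inverse-square law: if returned intensity falls off as I = k / d^2, then depth is d = sqrt(k / I). The sketch below is illustrative only and is not the authors' implementation. It assumes an idealized light source co-located with the camera, constant surface reflectance, and a single gain k calibrated from one reference measurement. The names calibrate_gain, falloff_depth, and depth_map are hypothetical.

```python
import math


def calibrate_gain(intensity: float, depth_m: float) -> float:
    """Return k in I = k / d**2 from one reference (intensity, depth) pair.

    Assumption: one global gain is enough, i.e. reflectance is constant.
    """
    if intensity <= 0 or depth_m <= 0:
        raise ValueError("intensity and depth must be positive")
    return intensity * depth_m ** 2


def falloff_depth(intensity: float, gain: float) -> float:
    """Invert the inverse-square model: d = sqrt(k / I)."""
    if intensity <= 0:
        # No returned light: the model cannot assign a finite depth.
        return math.inf
    return math.sqrt(gain / intensity)


def depth_map(image, gain):
    """Apply the per-pixel inversion to a 2D list of intensities."""
    return [[falloff_depth(px, gain) for px in row] for row in image]


if __name__ == "__main__":
    # Hypothetical calibration: intensity 0.8 observed at 0.5 m.
    k = calibrate_gain(0.8, 0.5)
    print(depth_map([[0.8, 0.2], [0.05, 0.0]], k))
```

Because this model folds reflectance and surface orientation into one constant, a darker or slanted surface is misread as a more distant one. That limitation is a plausible reason for replacing a fixed physical formula with a mapping learned from data, as the abstract describes.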

Supplemental Material

a86-sidebyside.mp4 (mp4, 29.9 MB)
a86-fanello.zip (zip, 309.3 MB): supplemental material

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Graphics recognition and interpretation
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        1. Scene understanding
  2. Computer graphics

Published in

ACM Transactions on Graphics, Volume 33, Issue 4
July 2014
1366 pages
ISSN: 0730-0301
EISSN: 1557-7368
DOI: 10.1145/2601097

Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
acquisition
depth camera
interaction
learning