Abstract
We present a machine learning technique for estimating absolute, per-pixel depth using any conventional monocular 2D camera with minor hardware modifications. Our approach targets close-range human capture and interaction, where dense 3D estimation of hands and faces is desired. We use hybrid classification-regression forests to learn a mapping from near-infrared intensity images to absolute, metric depth in real time. We demonstrate a variety of human-computer interaction and capture scenarios. Experiments show accuracy that exceeds a conventional light fall-off baseline and is comparable to high-quality consumer depth cameras, at dramatically reduced cost, power consumption, and form factor.
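For context, a light fall-off baseline of the kind the abstract compares against typically rests on the inverse-square law: under active illumination, observed intensity drops roughly as I = k / d^2, so depth can be read back as d = sqrt(k / I). The sketch below illustrates only that general idea; it is not the authors' method or their exact baseline. The function names, the single-point calibration, and the assumption of constant albedo on a surface facing the camera are all hypothetical choices made for illustration.

```python
import math


def calibrate_falloff_constant(ref_intensity, ref_depth_m):
    """Estimate k in I = k / d^2 from one reference pixel.

    Hypothetical calibration: assumes a single measurement of known
    depth (in meters) and constant surface albedo.
    """
    if ref_intensity <= 0 or ref_depth_m <= 0:
        raise ValueError("reference intensity and depth must be positive")
    return ref_intensity * ref_depth_m ** 2


def falloff_depth(intensity_image, k, min_intensity=1e-6):
    """Map a 2D list of intensities to per-pixel depth in meters.

    Pixels at or below min_intensity carry no usable signal and are
    returned as None instead of an arbitrarily large depth.
    """
    return [
        [math.sqrt(k / value) if value > min_intensity else None
         for value in row]
        for row in intensity_image
    ]
```

A purely physical model like this ignores surface orientation and varying skin reflectance, which is the kind of limitation a learned intensity-to-depth mapping is meant to address.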
Index Terms
- Learning to be a depth camera for close-range human capture and interaction