DOI: 10.1145/3240508.3240530
research-article

Self-boosted Gesture Interactive System with ST-Net

Published: 15 October 2018

ABSTRACT

In this paper, we propose a self-boosted intelligent system for joint sign language recognition and automatic education. A novel Spatial-Temporal Net (ST-Net) is designed to exploit the temporal dynamics of localized hands for sign language recognition. The education system uses features from ST-Net to detect learners' failure modes, and in turn helps collect a large amount of data for training ST-Net. The recognition and education systems thus improve each other step by step. On the one hand, with an accurate recognition system, the education system can detect more precisely where a learner fails. On the other hand, with more training data gathered from the education system, the recognition system becomes more robust and accurate. Experiments on a Hong Kong Sign Language dataset containing 227 commonly used words validate the effectiveness of our joint recognition and education system.

  • Published in

    MM '18: Proceedings of the 26th ACM international conference on Multimedia
    October 2018
    2167 pages
    ISBN:9781450356657
    DOI:10.1145/3240508

    Copyright © 2018 ACM

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Acceptance Rates

    MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%
    Overall Acceptance Rate: 995 of 4,171 submissions, 24%
