research-article

Self-boosted Gesture Interactive System with ST-Net

Authors:
Zhengzhe Liu (DJI, Shenzhen, China)
Xiaojuan Qi (CUHK, Hong Kong, China)
Lei Pang (DJI, Shenzhen, China)

MM '18: Proceedings of the 26th ACM international conference on Multimedia, October 2018, Pages 145–153
https://doi.org/10.1145/3240508.3240530

Published: 15 October 2018

ABSTRACT

In this paper, we propose a self-boosted intelligent system for joint sign language recognition and automatic education. A novel Spatial-Temporal Net (ST-Net) is designed to exploit the temporal dynamics of localized hands for sign language recognition. Our education system uses features from ST-Net to detect the failure modes of learners, and in turn helps collect a vast amount of data for training ST-Net. The recognition and education systems thus improve each other step by step. On the one hand, an accurate recognition system lets the education system pinpoint the parts of a learner's signing that fail more precisely. On the other hand, with more training data gathered from the education system, the recognition system becomes more robust and accurate. Experiments on a Hong Kong Sign Language dataset containing 227 commonly used words validate the effectiveness of our joint recognition and education system.
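
The mutual-improvement loop described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: a nearest-centroid classifier stands in for ST-Net, plain tuples stand in for its features, and the names `CentroidRecognizer`, `education_step`, and `threshold` are hypothetical. The feedback rule (an attempt passes if its feature lies close to the target word's centroid) and the policy of collecting only passed attempts are likewise assumed, not the authors' method.

```python
from collections import defaultdict
from math import dist


class CentroidRecognizer:
    """Toy stand-in for a sign recognizer (NOT the paper's ST-Net)."""

    def __init__(self):
        self.samples = defaultdict(list)  # word -> list of feature tuples
        self.centroids = {}               # word -> mean feature tuple

    def add_sample(self, word, feature):
        self.samples[word].append(tuple(feature))

    def fit(self):
        # "Training" here is just recomputing the per-word mean feature.
        for word, feats in self.samples.items():
            n = len(feats)
            self.centroids[word] = tuple(sum(col) / n for col in zip(*feats))

    def predict(self, feature):
        # Return the word whose centroid is nearest to the feature.
        return min(self.centroids, key=lambda w: dist(self.centroids[w], feature))


def education_step(recognizer, target_word, feature, threshold=1.0):
    """Hypothetical feedback rule: the attempt passes if its feature is
    within `threshold` of the target word's centroid. Passed attempts are
    kept as new training data (an assumed collection policy)."""
    passed = dist(recognizer.centroids[target_word], feature) <= threshold
    if passed:
        recognizer.add_sample(target_word, feature)
    return passed


# One round of the loop: train, let a learner practice, then retrain.
recognizer = CentroidRecognizer()
recognizer.add_sample("hello", (0.0, 0.0))
recognizer.add_sample("thanks", (4.0, 4.0))
recognizer.fit()
education_step(recognizer, "hello", (0.6, 0.0))  # close attempt, collected
education_step(recognizer, "hello", (3.0, 3.0))  # far attempt, flagged
recognizer.fit()                                 # retrain on the larger set
```

In this sketch the recognizer's quality determines how reliable the feedback is, and the feedback step grows the training set, which is the two-way dependency the abstract describes.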

Published in
MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018, 2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
General Chairs: Susanne Boll (University of Oldenburg, Germany), Kyoung Mu Lee (Seoul National University, Korea), Jiebo Luo (University of Rochester, USA), Wenwu Zhu (Tsinghua University, China)
Program Chairs: Hyeran Byun (Yonsei University, Korea), Chang Wen Chen (State Univ. of New York at Buffalo, USA), Rainer Lienhart (University of Augsburg, Germany), Tao Mei (JD AI, China)
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery, New York, NY, United States
Author Tags
convolutional neural networks
interactive system
Acceptance Rates
MM '18 Paper Acceptance Rate: 209 of 757 submissions, 28%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%