ABSTRACT
In this paper, we propose a self-boosted intelligent system for joint sign language recognition and automatic education. A novel Spatial-Temporal Net (ST-Net) is designed to exploit the temporal dynamics of localized hands for sign language recognition. Our education system uses features from ST-Net to detect the failure modes of learners. In turn, the education system helps collect a large amount of data for training ST-Net. The recognition and education systems thus improve each other step by step. On the one hand, benefiting from an accurate recognition system, the education system can more precisely detect the failing parts of a learner's signing. On the other hand, with more training data gathered from the education system, the recognition system becomes more robust and accurate. Experiments on a Hong Kong sign language dataset containing 227 commonly used words validate the effectiveness of our joint recognition and education system.