Robust and Real-Time Visual Tracking with Triplet Convolutional Neural Network

ABSTRACT
In this paper, we propose a new visual object tracking method that is robust against object occlusion and deformation. The proposed tracker is built on a triplet convolutional neural network (triplet-CNN) structure. The three inputs to the triplet-CNN are the current query frame, the object tracked in the previous frame, and the reference object. The object location in the query frame is predicted by fusing the latent features of the three inputs. Moreover, the predicted object is compared with the reference object by a Siamese CNN, so that object occlusion and deformation are detected and the search range for the tracked object is adapted accordingly. Comprehensive experiments on a large-scale benchmark database show that the proposed method outperforms state-of-the-art trackers in precision and robustness while running in real time (about 25 fps).
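The occlusion-handling step described above can be sketched in code: a Siamese-style similarity score between the predicted object patch and the reference object decides whether the object is occluded or deformed, and the search range is widened when it is. This is a minimal illustrative sketch, not the authors' implementation; the `embed` stub (a crude histogram standing in for a CNN branch), the cosine similarity, and all names and thresholds (`occl_thresh`, `grow`) are hypothetical assumptions.

```python
import math

def embed(patch):
    """Stand-in for a CNN branch: map a patch (a list of pixel values
    in [0, 1]) to a fixed-length, L2-normalized 4-bin histogram."""
    bins = [0.0] * 4
    for p in patch:
        bins[min(int(p * 4), 3)] += 1.0
    norm = math.sqrt(sum(b * b for b in bins)) or 1.0
    return [b / norm for b in bins]

def siamese_similarity(patch_a, patch_b):
    """Cosine similarity between the two branch embeddings
    (both branches share the same embedding function)."""
    fa, fb = embed(patch_a), embed(patch_b)
    return sum(x * y for x, y in zip(fa, fb))

def update_search_range(pred_patch, ref_patch, search_range,
                        occl_thresh=0.7, grow=1.5, base=1.0):
    """If the predicted object no longer matches the reference
    (occlusion or deformation), widen the search range, capped at 4x;
    otherwise reset it to the base window around the last position."""
    sim = siamese_similarity(pred_patch, ref_patch)
    occluded = sim < occl_thresh
    new_range = min(search_range * grow, 4.0) if occluded else base
    return occluded, new_range
```

For example, comparing a reference patch against an unrelated patch yields a low similarity, flags occlusion, and grows the search range by the factor `grow`; a matching patch resets the range to `base`. In the actual method, a learned Siamese CNN would replace the histogram embedding.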