Robust and Real-Time Visual Tracking with Triplet Convolutional Neural Network

ABSTRACT
In this paper, we propose a new visual object tracking method that is robust against object occlusion and deformation. The proposed tracker is built on a triplet convolutional neural network (triplet-CNN) structure. The three inputs to the triplet-CNN are the current query frame, the object tracked in the previous frame, and the reference object. The object location in the query frame is predicted by fusing the latent features of the three inputs. Moreover, the predicted object is compared with the reference object by a Siamese CNN, so that object occlusion and deformation are detected and the search range for the tracked object is adapted accordingly. Comprehensive experiments on a large-scale benchmark database show that the proposed method outperforms state-of-the-art trackers in precision and robustness while running in real time (about 25 fps).
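The occlusion-handling step described above can be sketched in code: a Siamese-style similarity score between the predicted object patch and the reference object decides whether the object is occluded or deformed, and the search range is widened when it is. This is a minimal illustrative sketch, not the authors' implementation; the `embed` stub (a crude histogram standing in for a CNN branch), the cosine similarity, and all names and thresholds (`occl_thresh`, `grow`) are hypothetical assumptions.

```python
import math

def embed(patch):
    """Stand-in for a CNN branch: map a patch (a list of pixel values
    in [0, 1]) to a fixed-length, L2-normalized 4-bin histogram."""
    bins = [0.0] * 4
    for p in patch:
        bins[min(int(p * 4), 3)] += 1.0
    norm = math.sqrt(sum(b * b for b in bins)) or 1.0
    return [b / norm for b in bins]

def siamese_similarity(patch_a, patch_b):
    """Cosine similarity between the two branch embeddings
    (both branches share the same embedding function)."""
    fa, fb = embed(patch_a), embed(patch_b)
    return sum(x * y for x, y in zip(fa, fb))

def update_search_range(pred_patch, ref_patch, search_range,
                        occl_thresh=0.7, grow=1.5, base=1.0):
    """If the predicted object no longer matches the reference
    (occlusion or deformation), widen the search range, capped at 4x;
    otherwise reset it to the base window around the last position."""
    sim = siamese_similarity(pred_patch, ref_patch)
    occluded = sim < occl_thresh
    new_range = min(search_range * grow, 4.0) if occluded else base
    return occluded, new_range
```

For example, comparing a reference patch against an unrelated patch yields a low similarity, flags occlusion, and grows the search range by the factor `grow`; a matching patch resets the range to `base`. In the actual method, a learned Siamese CNN would replace the histogram embedding.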