DOI: 10.1145/2964284.2967271

Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection

Published: 01 October 2016

ABSTRACT

In this work we propose a method that integrates multi-task learning (MTL) and deep learning. Our method appends an MTL-like loss to a deep convolutional neural network, so that the tasks and the relations between them are learned jointly, and it also incorporates the label correlations between pairs of tasks. We apply the proposed method in a transfer learning scenario, where our objective is to fine-tune the parameters of a network originally trained on a large-scale image dataset for concept detection, so that it can be applied to a target video dataset and a corresponding new set of target concepts. We evaluate the proposed method on the video concept detection problem using the TRECVID 2013 Semantic Indexing dataset. Our results show that the proposed algorithm leads to better concept-based video annotation than existing state-of-the-art methods.
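To make the idea above concrete, the sketch below shows one possible way to combine a pretrained CNN backbone with one linear detector per target concept and a pairwise label-correlation penalty on the detector weights. This is a minimal illustration in PyTorch, not the authors' exact network or loss: the backbone choice, the number of concepts, the correlation matrix corr, and the weight lam are all placeholder assumptions.

# Illustrative sketch only (PyTorch, torchvision >= 0.13): a CNN backbone with
# per-concept linear detectors, trained with per-concept binary cross-entropy
# plus a pairwise label-correlation penalty on the detector weights.
# num_concepts, corr and lam are placeholder assumptions, not the paper's values.
import torch
import torch.nn as nn
import torchvision.models as models

num_concepts = 38                               # assumed number of target concepts
corr = torch.rand(num_concepts, num_concepts)   # placeholder label-correlation matrix
corr = (corr + corr.t()) / 2                    # make it symmetric

backbone = models.resnet18(weights=None)        # stand-in for the pretrained network
backbone.fc = nn.Identity()                     # expose the 512-d feature layer
heads = nn.Linear(512, num_concepts)            # one binary detector per concept

bce = nn.BCEWithLogitsLoss()

def mtl_loss(features, labels, lam=1e-3):
    # Per-concept classification losses.
    logits = heads(features)
    task_loss = bce(logits, labels)
    # Correlation-weighted penalty: pull together the weight vectors of
    # concepts whose labels are (assumed) positively correlated.
    W = heads.weight                            # (num_concepts, 512)
    dists = torch.cdist(W, W) ** 2              # pairwise squared distances
    reg = (corr * dists).sum() / num_concepts ** 2
    return task_loss + lam * reg

# Toy forward/backward pass on random data.
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, 2, (4, num_concepts)).float()
loss = mtl_loss(backbone(x), y)
loss.backward()

In this toy form the penalty pulls the weight vectors of positively correlated concepts closer together; any real use would replace the random correlation matrix with correlations estimated from the training labels.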


        Published in
        MM '16: Proceedings of the 24th ACM international conference on Multimedia
        October 2016
        1542 pages
        ISBN: 9781450336031
        DOI: 10.1145/2964284

        Copyright © 2016 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • short-paper

        Acceptance Rates

        MM '16 paper acceptance rate: 52 of 237 submissions, 22%
        Overall acceptance rate: 995 of 4,171 submissions, 24%

