DOI: 10.1145/2964284.2967271

Deep Multi-task Learning with Label Correlation Constraint for Video Concept Detection

Published: 01 October 2016

ABSTRACT

In this work we propose a method that integrates multi-task learning (MTL) and deep learning. Our method appends an MTL-like loss to a deep convolutional neural network, so that the tasks and the relations between them are learned jointly, and it also incorporates the label correlations between pairs of tasks. We apply the proposed method in a transfer learning scenario, where our objective is to fine-tune the parameters of a network originally trained on a large-scale image dataset for concept detection, so that it can be applied to a target video dataset and a corresponding new set of target concepts. We evaluate the proposed method on the video concept detection problem using the TRECVID 2013 Semantic Indexing dataset. Our results show that the proposed algorithm leads to better concept-based video annotation than existing state-of-the-art methods.
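To make the idea above concrete, the sketch below shows one possible way to combine a pretrained CNN backbone with one linear detector per target concept and a pairwise label-correlation penalty on the detector weights. This is a minimal illustration in PyTorch, not the authors' exact network or loss: the backbone choice, the number of concepts, the correlation matrix corr, and the weight lam are all placeholder assumptions.

# Illustrative sketch only (PyTorch, torchvision >= 0.13): a CNN backbone with
# per-concept linear detectors, trained with per-concept binary cross-entropy
# plus a pairwise label-correlation penalty on the detector weights.
# num_concepts, corr and lam are placeholder assumptions, not the paper's values.
import torch
import torch.nn as nn
import torchvision.models as models

num_concepts = 38                               # assumed number of target concepts
corr = torch.rand(num_concepts, num_concepts)   # placeholder label-correlation matrix
corr = (corr + corr.t()) / 2                    # make it symmetric

backbone = models.resnet18(weights=None)        # stand-in for the pretrained network
backbone.fc = nn.Identity()                     # expose the 512-d feature layer
heads = nn.Linear(512, num_concepts)            # one binary detector per concept

bce = nn.BCEWithLogitsLoss()

def mtl_loss(features, labels, lam=1e-3):
    # Per-concept classification losses.
    logits = heads(features)
    task_loss = bce(logits, labels)
    # Correlation-weighted penalty: pull together the weight vectors of
    # concepts whose labels are (assumed) positively correlated.
    W = heads.weight                            # (num_concepts, 512)
    dists = torch.cdist(W, W) ** 2              # pairwise squared distances
    reg = (corr * dists).sum() / num_concepts ** 2
    return task_loss + lam * reg

# Toy forward/backward pass on random data.
x = torch.randn(4, 3, 224, 224)
y = torch.randint(0, 2, (4, num_concepts)).float()
loss = mtl_loss(backbone(x), y)
loss.backward()

In this toy form the penalty pulls the weight vectors of positively correlated concepts closer together; any real use would replace the random correlation matrix with correlations estimated from the training labels.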


        Published in
        MM '16: Proceedings of the 24th ACM international conference on Multimedia
        October 2016
        1542 pages
        ISBN: 9781450336031
        DOI: 10.1145/2964284

        Copyright © 2016 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Qualifiers

        • short-paper

        Acceptance Rates

        MM '16 paper acceptance rate: 52 of 237 submissions, 22%
        Overall acceptance rate: 995 of 4,171 submissions, 24%

