DOI: 10.1145/2502081.2502171
poster

Revisiting the VLAD image representation

Published: 21 October 2013

ABSTRACT

Recent works on image retrieval have proposed to index images by compact representations encoding powerful local descriptors, such as the closely related VLAD and Fisher vector. By combining such a representation with a suitable coding technique, it is possible to encode an image in a few dozen bytes while achieving excellent retrieval results. This paper revisits some assumptions proposed in this context regarding the handling of "visual burstiness", and shows that undesirable ad-hoc choices are made implicitly. Focusing on VLAD without loss of generality, we propose to modify several steps of the original design. Albeit simple, these modifications significantly improve VLAD and make it compare favorably against the state of the art.
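
For readers unfamiliar with the representation being revisited, the following is a minimal sketch of a commonly used baseline VLAD encoding: each local descriptor is assigned to its nearest centroid, the residuals are summed per centroid, and the concatenated vector is power-law and L2 normalized. It does not implement the modifications proposed in this paper, which the abstract does not detail. The function name `vlad`, the parameter `alpha` and its default of 0.5, and the toy data are assumptions made for illustration only.

```python
import math

def vlad(descriptors, centroids, alpha=0.5):
    """Baseline VLAD sketch (illustrative, not the paper's revised design)."""
    k, d = len(centroids), len(centroids[0])
    v = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest centroid (squared L2 distance).
        i = min(
            range(k),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centroids[j])),
        )
        # Accumulate the residual x - c_i for that centroid.
        for t in range(d):
            v[i][t] += x[t] - centroids[i][t]
    # Concatenate the k residual sums into one vector of length k * d.
    flat = [c for row in v for c in row]
    # Signed power-law normalization; alpha is an assumed value.
    flat = [math.copysign(abs(c) ** alpha, c) for c in flat]
    # Final L2 normalization (skipped if the vector is all zeros).
    norm = math.sqrt(sum(c * c for c in flat))
    return [c / norm for c in flat] if norm > 0 else flat

# Toy example: two 2-D centroids and three descriptors (hypothetical data).
centroids = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
code = vlad(descriptors, centroids)
```

In this toy case the output has length 4 (two centroids times two dimensions) and unit L2 norm. In practice the centroids would come from k-means on a training set of local descriptors, and the resulting vector would typically be reduced and quantized to reach the few-dozen-byte codes mentioned in the abstract.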

Published in

MM '13: Proceedings of the 21st ACM international conference on Multimedia
October 2013
1166 pages
ISBN: 9781450324045
DOI: 10.1145/2502081

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

MM '13 Paper Acceptance Rate: 47 of 235 submissions, 20%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%