poster

Revisiting the VLAD image representation

Authors:
Jonathan Delhumeau, INRIA, Rennes, France
Philippe-Henri Gosselin, INRIA, Rennes, France
Hervé Jégou, INRIA, Rennes, France
Patrick Pérez, Technicolor, Rennes, France

MM '13: Proceedings of the 21st ACM international conference on Multimedia, October 2013, Pages 653–656. https://doi.org/10.1145/2502081.2502171

Published: 21 October 2013

ABSTRACT

Recent work on image retrieval has proposed indexing images with compact representations that encode powerful local descriptors, such as the closely related VLAD and Fisher vector. Combining such a representation with a suitable coding technique makes it possible to encode an image in a few dozen bytes while achieving excellent retrieval results. This paper revisits some assumptions made in this context regarding the handling of "visual burstiness", and shows that they implicitly involve ad hoc choices that are not desirable. Focusing on VLAD without loss of generality, we propose to modify several steps of the original design. Although simple, these modifications significantly improve VLAD and make it compare favorably against the state of the art.
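
For readers unfamiliar with the representation, the following is a minimal sketch of a baseline VLAD encoder as it is commonly described: each local descriptor is hard-assigned to its nearest codebook centroid, the residuals are summed per centroid, and the concatenated vector is power-law normalized (a usual way to damp bursty components) and then L2-normalized. This is an illustration only, not the modified design proposed in the paper, whose steps the abstract does not detail. The function name `vlad`, the parameter `alpha`, and the toy inputs are assumptions made for this example.

```python
import math


def vlad(descriptors, centroids, alpha=0.5):
    """Baseline VLAD sketch (hypothetical helper, not the paper's revised method).

    descriptors: list of d-dimensional local descriptors (lists of floats)
    centroids:   list of k d-dimensional codebook centroids
    alpha:       power-law exponent (assumed default; 1.0 disables the damping)
    """
    k, d = len(centroids), len(centroids[0])
    agg = [[0.0] * d for _ in range(k)]

    for x in descriptors:
        # Hard assignment: nearest centroid under squared Euclidean distance.
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(x, centroids[i])),
        )
        # Accumulate the residual between the descriptor and its centroid.
        for j in range(d):
            agg[nearest][j] += x[j] - centroids[nearest][j]

    # Concatenate the k per-centroid sums into one k*d vector.
    flat = [v for row in agg for v in row]

    # Signed power-law normalization: reduces the weight of large components.
    flat = [math.copysign(abs(v) ** alpha, v) for v in flat]

    # Final L2 normalization (skipped if the vector is all zeros).
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [10.0, 10.0]]
    local_descriptors = [[1.0, 0.0], [0.0, 1.0], [11.0, 10.0]]
    print(vlad(local_descriptors, codebook))
```

The output has `k * d` components (4 in the toy call above) and unit L2 norm, so two images can be compared with a dot product.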

Index Terms

Revisiting the VLAD image representation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics
    1. Image manipulation
      1. Image processing

Published in
MM '13: Proceedings of the 21st ACM international conference on Multimedia
October 2013
1166 pages
ISBN: 9781450324045
DOI: 10.1145/2502081
General Chairs: Alejandro (Alex) Jaimes (Yahoo!, Spain), Nicu Sebe (University of Trento, Italy), Nozha Boujemaa (INRIA, France)
Program Chairs: Daniel Gatica-Perez (IDIAP & EPFL, Switzerland), David A. Shamma (Yahoo!, USA), Marcel Worring (University of Amsterdam, The Netherlands), Roger Zimmermann (National University of Singapore, Singapore)
Copyright © 2013 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
VLAD
image search
multimedia retrieval
Qualifiers
- poster
Conference

Acceptance Rates
MM '13 Paper Acceptance Rate: 47 of 235 submissions, 20%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%