research-article

Matching User Photos to Online Products with Robust Deep Features

Authors:
Xi Wang, Fudan University, Shanghai, China
Zhenfeng Sun, Fudan University, Shanghai, China
Wenqiang Zhang, Fudan University, Shanghai, China
Yu Zhou, Chinese Academy of Sciences, Beijing, China
Yu-Gang Jiang, Fudan University, Shanghai, China

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, June 2016, Pages 7–14
https://doi.org/10.1145/2911996.2912002

Published: 06 June 2016

ABSTRACT

This paper addresses a problem of considerable practical importance: matching a real-world product photo to exactly the same item(s) on online shopping sites. The task is extremely challenging because user photos (i.e., the queries in this scenario) are often captured in uncontrolled environments, whereas product images in online shops are mostly taken by professionals with clean backgrounds and perfect lighting conditions. To tackle the problem, we study deep network architectures and training schemes, with the goal of learning a robust deep feature representation that can bridge the domain gap between user photos and online product images. Our contributions are two-fold. First, we propose an alternative to the popular contrastive loss used in siamese deep networks, namely the robust contrastive loss, in which we "relax" the penalty on positive pairs to alleviate over-fitting. Second, we introduce a multi-task fine-tuning approach to learn a better feature representation, which not only incorporates knowledge from the provided training photo pairs but also exploits additional information from the large ImageNet dataset to regularize the fine-tuning procedure. Experiments on two challenging real-world datasets demonstrate that both the robust contrastive loss and the multi-task fine-tuning approach are effective, leading to very promising results with a time cost suitable for real-time retrieval.
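The abstract does not give the exact form of the robust contrastive loss or of the multi-task objective, so the following is only a minimal NumPy sketch of the two ideas under stated assumptions. It assumes that the "relaxed" penalty on positive pairs can be modeled as a hinge with a slack threshold, and that multi-task fine-tuning can be modeled as a weighted sum of the pair loss and a softmax classification loss on an auxiliary labeled set such as ImageNet. The names `pos_slack`, `margin`, and `lam` are hypothetical parameters chosen for illustration and do not come from the paper.

```python
import numpy as np


def contrastive_loss(dist, is_match, margin=1.0):
    """Standard contrastive loss on precomputed pair distances.

    dist: distances between the two embeddings of each pair.
    is_match: 1 for a positive (same item) pair, 0 for a negative pair.
    """
    dist = np.asarray(dist, dtype=float)
    is_match = np.asarray(is_match, dtype=float)
    pos_term = is_match * dist ** 2
    neg_term = (1.0 - is_match) * np.maximum(0.0, margin - dist) ** 2
    return 0.5 * float(np.mean(pos_term + neg_term))


def relaxed_contrastive_loss(dist, is_match, margin=1.0, pos_slack=0.2):
    """Assumed relaxation: positive pairs closer than pos_slack cost nothing.

    This hinge form is an illustrative guess at what "relaxing" the
    positive-pair penalty could look like, not the paper's definition.
    """
    dist = np.asarray(dist, dtype=float)
    is_match = np.asarray(is_match, dtype=float)
    pos_term = is_match * np.maximum(0.0, dist - pos_slack) ** 2
    neg_term = (1.0 - is_match) * np.maximum(0.0, margin - dist) ** 2
    return 0.5 * float(np.mean(pos_term + neg_term))


def softmax_cross_entropy(logits, labels):
    """Mean softmax cross-entropy for integer class labels."""
    logits = np.asarray(logits, dtype=float)
    labels = np.asarray(labels, dtype=int)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(log_probs[np.arange(len(labels)), labels]))


def multi_task_loss(dist, is_match, logits, labels, lam=0.5):
    """Assumed multi-task objective: pair loss plus weighted classification loss."""
    pair_loss = relaxed_contrastive_loss(dist, is_match)
    cls_loss = softmax_cross_entropy(logits, labels)
    return pair_loss + lam * cls_loss


if __name__ == "__main__":
    d = [0.1, 0.5, 0.3, 1.5]   # toy pair distances
    y = [1, 1, 0, 0]           # first two pairs match, last two do not
    print(contrastive_loss(d, y))
    print(relaxed_contrastive_loss(d, y))
```

In this sketch, a matching pair whose distance is already below `pos_slack` contributes zero loss, so the network is not pushed to collapse every positive pair to distance zero; that is one plausible way such a relaxation could reduce over-fitting.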

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision representations
        Image representations
2. Information systems
  1. Information retrieval
    1. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        Image search


Published in
ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016
452 pages
ISBN:9781450343596
DOI:10.1145/2911996
General Chairs:
John R. Kender, Columbia University, USA
John R. Smith, IBM Research, USA
Program Chairs:
Jiebo Luo, University of Rochester, USA
Susanne Boll, University of Oldenburg, Germany
Winston Hsu, National Taiwan University, Taiwan
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
deep learning
image retrieval
visual similarity
Qualifiers
- research-article
Acceptance Rates
ICMR '16 Paper Acceptance Rate: 20 of 120 submissions, 17%
Overall Acceptance Rate: 254 of 830 submissions, 31%