DOI: 10.1145/2911996.2912002
research-article

Matching User Photos to Online Products with Robust Deep Features

Published: 06 June 2016

ABSTRACT

This paper addresses a problem of practical importance: matching a real-world product photo to exactly the same item(s) on online shopping sites. The task is extremely challenging because the user photos (i.e., the queries in this scenario) are often captured in uncontrolled environments, while the product images in online shops are mostly taken by professionals with clean backgrounds and perfect lighting conditions. To tackle the problem, we study deep network architectures and training schemes, with the goal of learning a robust deep feature representation that bridges the domain gap between user photos and online product images. Our contributions are two-fold. First, we propose an alternative to the popular contrastive loss used in Siamese deep networks, namely the robust contrastive loss, in which we "relax" the penalty on positive pairs to alleviate over-fitting. Second, we introduce a multi-task fine-tuning approach to learn a better feature representation, which not only incorporates knowledge from the provided training photo pairs, but also exploits additional information from the large ImageNet dataset to regularize the fine-tuning procedure. Experiments on two challenging real-world datasets demonstrate that both the robust contrastive loss and the multi-task fine-tuning approach are effective, leading to very promising results with a time cost suitable for real-time retrieval.
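The abstract does not give the exact form of the robust contrastive loss or of the multi-task objective, so the following Python sketch is only one plausible reading. It contrasts the standard contrastive loss on a pair distance with a hypothetical relaxed variant in which matching pairs incur no penalty once they are closer than a small threshold. The names `relaxed_contrastive_loss`, `pos_margin`, `multitask_loss`, and `weight`, as well as all numeric defaults, are assumptions made for illustration and are not taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(d, is_match, margin=1.0):
    """Standard contrastive loss for one pair at feature distance d.

    Matching pairs are pulled together at every distance; non-matching
    pairs are pushed apart until they are at least `margin` away.
    """
    if is_match:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def relaxed_contrastive_loss(d, is_match, margin=1.0, pos_margin=0.3):
    """Hypothetical relaxed variant (an assumption, not the paper's formula).

    Matching pairs closer than `pos_margin` cost nothing, so the network is
    not forced to collapse a user photo and a shop image onto one point.
    """
    if is_match:
        return 0.5 * max(0.0, d - pos_margin) ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def multitask_loss(pair_loss, cls_loss, weight=0.5):
    """Assumed multi-task objective: the pair loss plus a weighted auxiliary
    classification loss (e.g. on ImageNet) acting as a regularizer."""
    return pair_loss + weight * cls_loss


if __name__ == "__main__":
    user_photo = [0.2, 0.1]  # toy 2-D features, purely illustrative
    shop_image = [0.4, 0.1]
    d = euclidean(user_photo, shop_image)
    print("standard:", contrastive_loss(d, is_match=True))
    print("relaxed: ", relaxed_contrastive_loss(d, is_match=True))
```

In this sketch the two losses agree on non-matching pairs and differ only on matching pairs, where the relaxed version stops penalizing a pair once it is already close. That is one way to read "relaxing the penalty on positive pairs"; the paper's own definition may differ.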

Published in

ICMR '16: Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval
June 2016, 452 pages
ISBN: 9781450343596
DOI: 10.1145/2911996

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

ICMR '16 paper acceptance rate: 20 of 120 submissions, 17%
Overall acceptance rate: 254 of 830 submissions, 31%
