DOI: 10.1145/2964284.2967205
short-paper

Learning to Make Better Mistakes: Semantics-aware Visual Food Recognition

Published: 01 October 2016

ABSTRACT

We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random-walk-based smoothing procedure, which further exploits the rich semantic information. We evaluate our algorithm on a large "food-in-the-wild" benchmark, as well as on a challenging dataset of restaurant food dishes with very few training images. The proposed method achieves higher classification accuracy than a baseline that directly fine-tunes a deep learning network on the target dataset. Furthermore, we analyze the consistency of the learned model with the inherent semantic relationships among food categories. The results show that the proposed approach produces more semantically meaningful outputs than the baseline method, even in cases of misprediction.
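The abstract names its two mechanisms without giving their equations, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes (a) the multi-task loss is a weighted sum of softmax cross-entropies, one per semantic level (for example a fine-grained dish label plus a hypothetical coarser grouping), each from its own head on a shared CNN; (b) the random-walk smoothing behaves like a random walk with restart over a class-similarity graph, iterating p <- alpha * T^T p + (1 - alpha) * p0 where T is the row-normalised adjacency; and (c) a hypothetical `sibling_mistake_rate` quantifies "better mistakes" as the share of errors that stay inside the true class's parent group. The graph, `alpha`, the level weights, and every function name here are illustrative assumptions.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax, shifted by each row's max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits, labels):
    """Mean negative log-likelihood of integer labels under softmax(logits)."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked)))


def multitask_loss(head_logits, head_labels, head_weights):
    """Assumed multi-task loss: weighted sum of one cross-entropy per semantic level.

    Each entry of head_logits is a (batch, num_classes_at_level) array that a
    separate classifier head on a shared CNN backbone would produce
    (hypothetical setup).
    """
    return sum(
        w * cross_entropy(z, y)
        for z, y, w in zip(head_logits, head_labels, head_weights)
    )


def random_walk_smooth(p0, adjacency, alpha=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart over a class graph (an assumed stand-in).

    Iterates p <- alpha * T^T p + (1 - alpha) * p0, where T is the
    row-normalised adjacency. Requires 0 <= alpha < 1 and no all-zero rows
    (add self-loops). If p0 sums to 1, every iterate also sums to 1.
    """
    transition = adjacency / adjacency.sum(axis=1, keepdims=True)
    p = p0.copy()
    for _ in range(max_iter):
        p_next = alpha * transition.T @ p + (1 - alpha) * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


def sibling_mistake_rate(true_fine, pred_fine, parent_of):
    """Hypothetical metric: share of mispredictions that stay in the true parent group."""
    true_fine = np.asarray(true_fine)
    pred_fine = np.asarray(pred_fine)
    wrong = true_fine != pred_fine
    if not wrong.any():
        return float("nan")  # no mistakes, so the rate is undefined
    same_parent = parent_of[true_fine[wrong]] == parent_of[pred_fine[wrong]]
    return float(same_parent.mean())


# Toy example: classes 0 and 1 share a parent; class 2 stands alone (self-loop only).
adjacency = np.array([[1.0, 1.0, 0.0],
                      [1.0, 1.0, 0.0],
                      [0.0, 0.0, 1.0]])
parent_of = np.array([0, 0, 1])
p0 = np.array([0.5, 0.1, 0.4])
smoothed = random_walk_smooth(p0, adjacency, alpha=0.5)
```

In this toy graph, classes 0 and 1 are siblings, so smoothing shifts probability mass from class 0 toward class 1, while isolated class 2 keeps its 0.4; `smoothed` comes out as [0.4, 0.2, 0.4]. The restart term `(1 - alpha) * p0` keeps the CNN's original evidence in play, so a larger `alpha` trusts the semantic graph more and the raw prediction less.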


    Published in

      MM '16: Proceedings of the 24th ACM International Conference on Multimedia
      October 2016
      1542 pages
      ISBN: 9781450336031
      DOI: 10.1145/2964284

      Copyright © 2016 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States



      Acceptance Rates

      MM '16 paper acceptance rate: 52 of 237 submissions, 22%. Overall acceptance rate: 995 of 4,171 submissions, 24%.
