short-paper

Learning to Make Better Mistakes: Semantics-aware Visual Food Recognition

Authors:
Hui Wu, Michele Merler, Rosario Uceda-Sosa, John R. Smith

IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA

MM '16: Proceedings of the 24th ACM international conference on Multimedia, October 2016, Pages 172–176
https://doi.org/10.1145/2964284.2967205

Published: 01 October 2016

ABSTRACT

We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions using a random-walk-based smoothing procedure, which further exploits the rich semantic information. We evaluate our algorithm on a large "food-in-the-wild" benchmark, as well as on a challenging dataset of restaurant food dishes with very few training images. The proposed method achieves higher classification accuracy than a baseline that directly fine-tunes a deep learning network on the target dataset. Furthermore, we analyze the consistency of the learned model with the inherent semantic relationships among food categories. Results show that the proposed approach provides more semantically meaningful results than the baseline method, even in cases of mispredictions.
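
The abstract does not spell out the smoothing procedure, so the following is only a minimal sketch of one common way to smooth class predictions over a semantic graph: a random walk with restart. The affinity matrix, the class names, the `alpha` and `steps` parameters, and the function names are all illustrative assumptions, not the authors' formulation.

```python
def row_normalize(affinity):
    """Turn a non-negative class-affinity matrix into a row-stochastic transition matrix."""
    transition = []
    for row in affinity:
        total = sum(row)
        if total <= 0:
            raise ValueError("each class needs at least one positive affinity")
        transition.append([w / total for w in row])
    return transition


def random_walk_smooth(probs, affinity, alpha=0.5, steps=50):
    """Smooth class probabilities with a random walk with restart (assumed variant).

    probs    -- initial class probabilities, e.g. a CNN softmax output summing to 1
    affinity -- hypothetical semantic affinity between classes (square, non-negative)
    alpha    -- probability of following a graph edge instead of restarting at probs
    """
    transition = row_normalize(affinity)
    n = len(probs)
    current = list(probs)
    for _ in range(steps):
        # One walk step: mass at class i moves to class j with probability transition[i][j].
        walked = [sum(current[i] * transition[i][j] for i in range(n)) for j in range(n)]
        # Restart: blend the walked distribution with the original predictions.
        current = [alpha * walked[j] + (1 - alpha) * probs[j] for j in range(n)]
    return current


# Toy example with three hypothetical classes: two semantically close, one unrelated.
classes = ["spaghetti carbonara", "spaghetti bolognese", "chocolate cake"]
affinity = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
cnn_probs = [0.40, 0.15, 0.45]
smoothed = random_walk_smooth(cnn_probs, affinity)
best = classes[max(range(len(classes)), key=lambda k: smoothed[k])]
```

In this toy graph the unrelated class keeps its probability, while probability mass is shared between the two related spaghetti dishes, so the lower-ranked guesses become more consistent with the assumed semantic structure.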

Index Terms

1. Information systems
  1. Information systems applications
    1. Multimedia information systems
      1. Multimedia databases

Published in
MM '16: Proceedings of the 24th ACM international conference on Multimedia
October 2016
1542 pages
ISBN: 9781450336031
DOI: 10.1145/2964284
General Chairs: Alan Hanjalic (Delft University of Technology), Cees Snoek (Qualcomm Research Netherlands / University of Amsterdam), Marcel Worring (University of Amsterdam)
Moderator: Dick Bulterman (CWI / VU University Amsterdam)
Program Chairs: Benoit Huet (EURECOM), Aisling Kelliher (Virginia Tech), Yiannis Kompatsiaris (CERTH-ITI), Jin Li (Microsoft)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
food recognition
multi-task learning
Qualifiers
- short-paper
Acceptance Rates
MM '16 Paper Acceptance Rate: 52 of 237 submissions, 22%
Overall Acceptance Rate: 995 of 4,171 submissions, 24%
Article Metrics
- Total Citations: 46
- Total Downloads: 666
- Downloads (Last 12 months): 40
- Downloads (Last 6 weeks): 7