ABSTRACT
We propose a visual food recognition framework that integrates the inherent semantic relationships among fine-grained classes. Our method learns semantics-aware features by formulating a multi-task loss function on top of a convolutional neural network (CNN) architecture. It then refines the CNN predictions with a random-walk-based smoothing procedure, which further exploits the rich semantic information. We evaluate our algorithm on a large "food-in-the-wild" benchmark, as well as on a challenging dataset of restaurant food dishes with very few training images. The proposed method achieves higher classification accuracy than a baseline that directly fine-tunes a deep learning network on the target dataset. Furthermore, we analyze how consistent the learned model is with the inherent semantic relationships among food categories. The results show that the proposed approach gives more semantically meaningful predictions than the baseline, even when those predictions are wrong.
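The two ingredients named in the abstract, a multi-task loss over semantic levels and a random-walk smoothing of the CNN output, can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: a two-level label hierarchy (fine class plus one coarse parent), the weight `lam`, the restart-style update with parameter `alpha`, and the hand-written similarity matrix are all hypothetical and are not the authors' implementation.

```python
import math


def multitask_loss(fine_probs, coarse_probs, fine_label, coarse_label, lam=0.5):
    """Cross-entropy on the fine class plus a weighted cross-entropy on its
    coarse (semantic parent) class. `lam` is a hypothetical trade-off weight."""
    eps = 1e-12  # avoids log(0)
    fine_term = -math.log(fine_probs[fine_label] + eps)
    coarse_term = -math.log(coarse_probs[coarse_label] + eps)
    return fine_term + lam * coarse_term


def row_normalize(similarity):
    """Turn a non-negative similarity matrix into a row-stochastic transition
    matrix. Assumes every row has a positive sum."""
    return [[value / sum(row) for value in row] for row in similarity]


def random_walk_smooth(probs, similarity, alpha=0.5, steps=20):
    """Smooth class probabilities with a random walk with restart:
    p <- alpha * (p W) + (1 - alpha) * p0,
    where W is the row-normalized class similarity matrix and p0 the CNN output."""
    transition = row_normalize(similarity)
    n = len(probs)
    current = list(probs)
    for _ in range(steps):
        walked = [
            sum(current[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        current = [alpha * w + (1 - alpha) * p0 for w, p0 in zip(walked, probs)]
    return current


# Hypothetical example: ramen and udon are semantically close, cheesecake is not.
classes = ["ramen", "udon", "cheesecake"]
similarity = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
]
cnn_probs = [0.40, 0.15, 0.45]
smoothed = random_walk_smooth(cnn_probs, similarity)
```

In this sketch the walk moves probability mass between semantically related classes, so udon gains mass from its neighbor ramen while the isolated cheesecake class loses some. The loss charges less for a wrong fine-grained prediction whose coarse category is still right, which is one way to read "making better mistakes".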
Index Terms
- Learning to Make Better Mistakes: Semantics-aware Visual Food Recognition