Abstract
This article considers the problem of few-shot learning for food recognition. Automatic food recognition can support various applications, e.g., dietary assessment and food journaling. Most existing work focuses on food recognition with large numbers of labelled samples and fails to recognize food categories that have only a few samples. To address this problem, we propose a Multi-View Few-Shot Learning (MVFSL) framework that exploits additional ingredient information for few-shot food recognition. Besides category-oriented deep visual features, we introduce an ingredient-supervised deep network to extract ingredient-oriented features. As general and intermediate attributes of food, ingredient-oriented features are informative and complementary to category-oriented features, and thus they play an important role in improving food recognition. In few-shot food recognition in particular, ingredient information can bridge the gap between disjoint training and test categories. To take advantage of ingredient information, we fuse the two kinds of features by first combining the feature maps from their respective deep networks and then convolving the combined feature maps. This convolution is further incorporated into a multi-view relation network, which compares pairs of images to enable fine-grained feature learning. MVFSL is trained end to end, jointly optimizing the two types of feature learning subnetworks and the relation subnetworks. Extensive experiments on different food datasets consistently demonstrate the advantage of MVFSL in multi-view feature fusion. Furthermore, we extend two other types of networks, namely, the Siamese Network and the Matching Network, by introducing ingredient information for few-shot food recognition. Experimental results show that introducing ingredient information into these two networks also improves the performance of few-shot food recognition.
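The fusion and comparison steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract states only that the two feature maps are combined and convolved and that pairs of images are then compared, so everything else here is an assumption made for illustration: the 1x1 kernel, the channel-wise concatenation, the pooled linear relation head, the random stand-in feature maps, and all names (`fuse_views`, `relation_score`, `weight`, `w_rel`).

```python
import numpy as np


def fuse_views(cat_map, ing_map, weight, bias):
    """Fuse two (C, H, W) feature maps: channel concat, 1x1 conv, ReLU.

    The 1x1 kernel is an assumption; the abstract only says the combined
    feature maps are convolved.
    """
    stacked = np.concatenate([cat_map, ing_map], axis=0)      # (2C, H, W)
    fused = np.einsum("oc,chw->ohw", weight, stacked)         # (O, H, W)
    return np.maximum(fused + bias[:, None, None], 0.0)


def relation_score(support_fused, query_fused, w_rel, b_rel):
    """Score one (support, query) pair in (0, 1).

    Hypothetical relation head: concat, global average pool, linear, sigmoid.
    """
    pair = np.concatenate([support_fused, query_fused], axis=0)  # (2O, H, W)
    pooled = pair.mean(axis=(1, 2))                              # (2O,)
    logit = pooled @ w_rel + b_rel
    return 1.0 / (1.0 + np.exp(-logit))


# Toy 5-way 1-shot episode with random stand-ins for the two networks' outputs.
rng = np.random.default_rng(0)
C, H, W, O, n_way = 4, 3, 3, 5, 5

weight = rng.normal(size=(O, 2 * C)) * 0.1   # untrained fusion weights
bias = np.zeros(O)
w_rel = rng.normal(size=2 * O) * 0.1         # untrained relation weights
b_rel = 0.0

# One (category-oriented, ingredient-oriented) feature-map pair per class.
supports = [(rng.normal(size=(C, H, W)), rng.normal(size=(C, H, W)))
            for _ in range(n_way)]
query = (rng.normal(size=(C, H, W)), rng.normal(size=(C, H, W)))

fused = fuse_views(query[0], query[1], weight, bias)
scores = np.array([
    relation_score(fuse_views(s_cat, s_ing, weight, bias), fused, w_rel, b_rel)
    for s_cat, s_ing in supports
])
predicted = int(np.argmax(scores))  # class whose support image relates best
```

With untrained random weights the scores carry no meaning. The sketch only shows the data flow: two views per image are fused into one map, and the query is assigned to the support class with the highest relation score.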
Index Terms
- Few-shot Food Recognition via Multi-view Representation Learning