Abstract
Automatic recognition of behavioral context (location, activities, body posture, etc.) can serve health monitoring, aging care, and many other domains. Recognizing context in-the-wild is challenging because of the great variability in behavioral patterns, and it requires a complex mapping from sensor features to predicted labels. Data collected in-the-wild may be unbalanced and incomplete, with cases of missing labels or missing sensors. We propose using the multilayer perceptron (MLP) as a multi-task model for context recognition: based on features from multi-modal sensors, the model simultaneously predicts many diverse context labels. We analyze the advantages of the model's hidden layers, which are shared among all sensors and all labels, and provide insight into the behavioral patterns that these hidden layers may capture. We demonstrate how recognition of new labels can be improved by utilizing a model that was trained for an initial set of labels, and show how to train the model to withstand missing sensors. We evaluate context recognition on the previously published ExtraSensory Dataset, which was collected in-the-wild. Compared to previously suggested models, the MLP improves recognition, even with fewer parameters than a linear model. The ability to train a good model from data with incomplete, unbalanced labeling and missing sensors encourages further research with uncontrolled, in-the-wild behavior.
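The core ideas of the abstract — a shared hidden representation, one sigmoid output per context label, a loss that skips missing labels, and training-time simulation of missing sensors — can be illustrated with a minimal numpy sketch. This is not the authors' implementation; all dimensions, weights, and the sensor grouping below are hypothetical, chosen only to make the mechanics concrete.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_forward(x, W1, b1, W2, b2):
    """Shared ReLU hidden layer -> independent sigmoid output per label."""
    h = np.maximum(0.0, x @ W1 + b1)             # representation shared by all labels
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # per-label probabilities

def masked_bce(p, y, mask):
    """Binary cross-entropy averaged only over labels that were reported."""
    eps = 1e-9
    loss = -(y * np.log(p + eps) + (1.0 - y) * np.log(1.0 - p + eps))
    return float((loss * mask).sum() / max(mask.sum(), 1.0))

def drop_sensors(x, sensor_slices, p=0.5):
    """Zero out whole sensor feature groups to simulate missing sensors."""
    x = x.copy()
    for sl in sensor_slices:
        if rng.random() < p:
            x[sl] = 0.0
    return x

# Hypothetical sizes: 10 features from two sensors, 4 hidden units, 3 labels.
d_in, d_hid, n_labels = 10, 4, 3
W1 = rng.normal(0.0, 0.1, (d_in, d_hid)); b1 = np.zeros(d_hid)
W2 = rng.normal(0.0, 0.1, (d_hid, n_labels)); b2 = np.zeros(n_labels)

x = rng.normal(size=d_in)
x = drop_sensors(x, [slice(0, 5), slice(5, 10)])  # two sensor feature groups
p = mlp_forward(x, W1, b1, W2, b2)
y = np.array([1.0, 0.0, 1.0])      # ground truth where reported
mask = np.array([1.0, 1.0, 0.0])   # third label unreported -> excluded from loss
print(masked_bce(p, y, mask))
```

The label mask lets one example contribute gradient only for the labels the user actually reported, while sensor dropout exposes the shared hidden layer to the missing-sensor patterns it will encounter at test time.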
Supplemental Material
Supplemental movie, appendix, image, and software files for "Context Recognition In-the-Wild: Unified Model for Multi-Modal Sensors and Multi-Label Classification" are available for download.