
Multimodal Deep Learning for Activity and Context Recognition

Published: 08 January 2018

Abstract

Wearables and mobile devices see the world through the lens of half a dozen low-power sensors, such as barometers, accelerometers, microphones, and proximity detectors. But differences between sensors, ranging from sampling rates to discrete versus continuous data, or even the data type itself, make principled approaches to integrating these streams challenging. How, for example, is barometric pressure best combined with an audio sample to infer whether a user is in a car, on a plane, or on a bike? Critically for applications, how well a device exploits the information contained across these multimodal sensor streams often dictates the fidelity at which it can track user behaviors and context changes. This paper studies the benefits of adopting deep learning algorithms for interpreting user activity and context as captured by multi-sensor systems. Specifically, we focus on four variations of deep neural networks based on either fully-connected Deep Neural Networks (DNNs) or Convolutional Neural Networks (CNNs). Two of these architectures follow conventional deep models by performing feature representation learning from a concatenation of sensor types. This classic approach is contrasted with a promising deep model variant characterized by modality-specific partitions of the architecture to maximize intra-modality learning. Our exploration represents the first time these architectures have been evaluated for multimodal deep learning on wearable data -- and with convolutional layers, the modality-specific variant constitutes an entirely novel architecture. Experiments show these generic multimodal neural network models compete well with a rich variety of conventional hand-designed shallow methods (including feature extraction and classifier construction) and task-specific modeling pipelines, across a wide range of sensor types and inference tasks (four different datasets). Although the training and inference overhead of these multimodal deep approaches is in some cases appreciable, we also demonstrate that on-device mobile and wearable execution is feasible, and thus not a barrier to adoption. This study is carefully constructed to focus on the multimodal aspects of wearable data modeling for deep learning by providing a wide range of empirical observations, which we expect to be of considerable value to the community. We summarize our observations into a series of practitioner rules-of-thumb and lessons learned that can guide the usage of multimodal deep learning for activity and context detection.
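To make the contrast between the two DNN variants concrete, the sketch below compares them in PyTorch. It is an illustration only, not the authors' code (their Torch-based framework is at https://github.com/vradu10/deepfusion.git): the two-modality setup (accelerometer and barometer features), the layer sizes, and the six-class output are all assumptions made for the example. The first model performs early fusion by concatenating all sensor features before any learning; the second gives each modality its own subnetwork and merges representations only in shared cross-modality layers.

```python
import torch
import torch.nn as nn

class EarlyFusionDNN(nn.Module):
    """Conventional approach: concatenate all sensor features up front,
    then learn a joint representation over the combined vector."""
    def __init__(self, acc_dim=64, baro_dim=16, n_classes=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(acc_dim + baro_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, acc, baro):
        # Fuse modalities at the input by simple concatenation.
        return self.net(torch.cat([acc, baro], dim=1))

class ModalitySpecificDNN(nn.Module):
    """Modality-specific variant: each sensor stream gets its own
    subnetwork (maximizing intra-modality learning); representations
    are merged only in the shared cross-modality layers."""
    def __init__(self, acc_dim=64, baro_dim=16, n_classes=6):
        super().__init__()
        self.acc_branch = nn.Sequential(
            nn.Linear(acc_dim, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
        )
        self.baro_branch = nn.Sequential(
            nn.Linear(baro_dim, 32), nn.ReLU(),
            nn.Linear(32, 16), nn.ReLU(),
        )
        self.shared = nn.Sequential(
            nn.Linear(64 + 16, 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, acc, baro):
        # Fuse only the learned per-modality representations.
        h = torch.cat([self.acc_branch(acc), self.baro_branch(baro)], dim=1)
        return self.shared(h)

if __name__ == "__main__":
    acc = torch.randn(8, 64)    # e.g. accelerometer frame features
    baro = torch.randn(8, 16)   # e.g. barometric pressure features
    print(EarlyFusionDNN()(acc, baro).shape)       # torch.Size([8, 6])
    print(ModalitySpecificDNN()(acc, baro).shape)  # torch.Size([8, 6])
```

Replacing the fully-connected branches with convolutional layers yields the same partitioned structure as the CNN variant the abstract describes as novel.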




  • Published in

    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 1, Issue 4 (December 2017), 1298 pages
    EISSN: 2474-9567
    DOI: 10.1145/3178157

    Copyright © 2018 ACM

    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 8 January 2018
    • Accepted: 1 October 2017
    • Revised: 1 August 2017
    • Received: 1 February 2017
    Published in IMWUT Volume 1, Issue 4


    Qualifiers

    • research-article
    • Research
    • Refereed
