ABSTRACT
Content-based search over visual media, including images and video, typically relies on manually or automatically assigned concepts or tags, or, depending on the use case, on image-to-image similarity. Although great progress has been made in recent years in automatic concept detection using machine learning, a mismatch remains between the semantics of the concepts we can detect automatically and the semantics of the words used, for example, in a user's query. In this paper we report on a large collection of images from wearable cameras, gathered as part of the Kids'Cam project. These images have been manually annotated from a vocabulary of 83 concepts and also automatically annotated from a vocabulary of 1,000 concepts. This collection allows us to explore how language, in the form of two distinct concept vocabularies or spaces, is used to represent images, in our case images taken with wearable cameras. Because one of the two vocabularies is manually assigned, it serves as a ground truth. The collection also allows us to discuss, in general terms, the mismatch of concepts in visual media that derives from mismatches in language. We report the data processing we have completed on this collection and some initial experiments in mapping between the two vocabularies.
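One common way to map between two concept vocabularies, such as the 83 manual and 1,000 automatic concepts described above, is to represent each concept label as a word vector and link each automatic concept to its most similar manual concept by cosine similarity. The sketch below illustrates that idea under stated assumptions and is not necessarily the method used in this work. The concept names and three-dimensional vectors are invented placeholders for real pretrained embeddings such as word2vec. The `min_score` threshold is likewise a hypothetical parameter, added to show how poorly matched concepts could be left unmapped.

```python
import math

# HYPOTHETICAL toy embeddings: invented 3-d vectors standing in for real
# pretrained word vectors (e.g. word2vec). These are not Kids'Cam data.
TOY_VECTORS = {
    # placeholders for manually annotated concepts
    "food": [0.9, 0.1, 0.0],
    "drink": [0.7, 0.3, 0.1],
    "outdoor": [0.0, 0.2, 0.9],
    # placeholders for automatically detected concepts
    "pizza": [0.95, 0.05, 0.05],
    "soda": [0.6, 0.4, 0.1],
    "park": [0.1, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def map_concepts(auto_concepts, manual_concepts, vectors, min_score=0.0):
    """Map each automatic concept to its most similar manual concept.

    Returns {auto_concept: (manual_concept or None, score)}. A concept whose
    best similarity falls below the hypothetical min_score is left unmapped.
    """
    mapping = {}
    for auto in auto_concepts:
        # Score every manual concept, then keep the highest-scoring one.
        scored = [(cosine(vectors[auto], vectors[m]), m) for m in manual_concepts]
        best_score, best = max(scored)
        mapping[auto] = (best if best_score >= min_score else None, best_score)
    return mapping


if __name__ == "__main__":
    result = map_concepts(["pizza", "soda", "park"],
                          ["food", "drink", "outdoor"], TOY_VECTORS)
    for auto, (manual, score) in result.items():
        print(f"{auto} -> {manual} ({score:.3f})")
```

Raising `min_score` trades coverage for precision. Automatic concepts with no close counterpart in the manual vocabulary stay unmapped instead of being forced onto a weak match. This is one concrete form of the vocabulary mismatch discussed above.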
Index Terms
- Semantic Indexing of Wearable Camera Images: Kids'Cam Concepts