DOI: 10.1145/2983563.2983566
research-article

Semantic Indexing of Wearable Camera Images: Kids'Cam Concepts

Published: 16 October 2016

ABSTRACT

Content-based search over visual media, including images and video, typically relies on manually or automatically assigned concepts or tags, or sometimes on image-to-image similarity, depending on the use case. Although automatic concept detection using machine learning has progressed greatly in recent years, a mismatch remains between the semantics of the concepts we can detect automatically and the semantics of the words used in, for example, a user's query. In this paper we report on a large collection of images from wearable cameras gathered as part of the Kids'Cam project. The images have been both manually annotated from a vocabulary of 83 concepts and automatically annotated from a vocabulary of 1,000 concepts. This collection allows us to explore how language, in the form of two distinct concept vocabularies or spaces, one of them manually assigned and thus forming a ground truth, is used to represent images, in our case images taken using wearable cameras. It also allows us to discuss, in general terms, concept mismatches in visual media that derive from language mismatches. We report the data processing we have completed on this collection and some of our initial experiments in mapping across the two vocabularies.

          Published in

          iV&L-MM '16: Proceedings of the 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion
          October 2016
          70 pages
          ISBN: 9781450345194
          DOI: 10.1145/2983563

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          iV&L-MM '16 Paper Acceptance Rate: 7 of 15 submissions, 47%. Overall Acceptance Rate: 7 of 15 submissions, 47%.
