ABSTRACT
Video tag annotations have become a useful and powerful feature for facilitating video search in many social media and web applications. The majority of tags assigned to videos are supplied by users, a task that is time-consuming and may result in annotations that are subjective and lack precision. A number of studies have utilized content-based extraction techniques to automate tag generation. However, these methods are compute-intensive and challenging to apply across domains. Here, we describe a complementary approach for generating tags based on the geographic properties of videos. With today's sensor-equipped smartphones, the location and orientation of a camera can be continuously acquired in conjunction with the captured video stream. Our novel technique utilizes this sensor metadata to automatically tag outdoor videos in a two-step process. First, we model the viewable scenes of the video as geometric shapes by means of the accompanying sensor data, and we determine the geographic objects that are visible in the video by querying geo-information databases with the viewable scene descriptions. We then extract textual information about the visible objects to serve as tags. Second, we define six criteria to score tag relevance and rank the obtained tags based on these scores. Finally, we associate the tags with the video as a whole and with accurately delimited segments of the video. To evaluate the proposed technique, we implemented a prototype tag generator and conducted a user study. The results demonstrate significant benefits of our method in terms of automation and tag utility.
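The first step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each viewable scene is a circular sector given by camera position, compass heading, horizontal viewing angle, and a maximum visible distance, and it replaces the geo-information database with a small in-memory dictionary of named points. The names `ViewableScene`, `is_visible`, and `candidate_tags` are hypothetical, and the frame count used for ordering is only a placeholder score, not one of the six relevance criteria, which the abstract does not enumerate.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class ViewableScene:
    """Assumed sector model of what the camera sees in one frame."""
    lat: float             # camera latitude in degrees
    lon: float             # camera longitude in degrees
    heading_deg: float     # compass heading, 0 = north, clockwise
    view_angle_deg: float  # full horizontal viewing angle
    max_dist_m: float      # assumed maximum visible distance


def offset_m(lat0, lon0, lat, lon):
    """Approximate (east, north) offset in metres; adequate for short ranges."""
    north = math.radians(lat - lat0) * EARTH_RADIUS_M
    east = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    return east, north


def is_visible(scene, lat, lon):
    """True if the point lies inside the scene's sector (occlusion is ignored)."""
    east, north = offset_m(scene.lat, scene.lon, lat, lon)
    dist = math.hypot(east, north)
    if dist > scene.max_dist_m:
        return False
    if dist == 0.0:
        return True
    bearing = math.degrees(math.atan2(east, north)) % 360.0
    # Smallest absolute angle between the bearing and the camera heading.
    diff = abs((bearing - scene.heading_deg + 180.0) % 360.0 - 180.0)
    return diff <= scene.view_angle_deg / 2.0


def candidate_tags(scenes, objects):
    """Count the frames in which each named object is visible, most frequent first."""
    counts = {}
    for scene in scenes:
        for name, (lat, lon) in objects.items():
            if is_visible(scene, lat, lon):
                counts[name] = counts.get(name, 0) + 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical objects standing in for a geo-information database.
    objects = {"north_tower": (0.005, 0.0), "east_hall": (0.0, 0.005)}
    scenes = [
        ViewableScene(0.0, 0.0, 0.0, 60.0, 1000.0),
        ViewableScene(0.0, 0.0, 0.0, 60.0, 1000.0),
        ViewableScene(0.0, 0.0, 90.0, 60.0, 1000.0),
    ]
    print(candidate_tags(scenes, objects))
```

Because each frame contributes its own visibility result, the per-frame hits could also be grouped into runs of consecutive frames to delimit the video segments a tag applies to. A real system would additionally need to handle occlusion between objects and objects with spatial extent rather than single points.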