YFCC100M: the new data in multimedia research

Authors:
Bart Thomee, Yahoo Labs and Flickr, San Francisco, CA
David A. Shamma, Yahoo Labs and Flickr, San Francisco, CA
Gerald Friedland, International Computer Science Institute, Berkeley, CA
Benjamin Elizalde, International Computer Science Institute, Berkeley, CA
Karl Ni, Lawrence Livermore National Laboratory, Livermore, CA
Douglas Poland, Lawrence Livermore National Laboratory, Livermore, CA
Damian Borth, International Computer Science Institute, Berkeley, CA
Li-Jia Li, Yahoo Labs, San Francisco, CA

Communications of the ACM, Volume 59, Issue 2, February 2016, pp. 64–73. https://doi.org/10.1145/2812802

Published: 25 January 2016

Abstract

This publicly available curated dataset of almost 100 million photos and videos is free and legal for all.

Index Terms

1. Information systems
  1. Data management systems
    1. Information integration
  2. Information systems applications
    1. Data mining

Published in
Communications of the ACM Volume 59, Issue 2
February 2016
110 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/2886013
Editor: Moshe Y. Vardi
Association for Computing Machinery, New York, NY
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States