Towards Engineering a Web-Scale Multimedia Service: A Case Study Using Spark

Authors:
Gylfi Þór Guðmundsson (Reykjavik University, Reykjavik, Iceland)
Laurent Amsaleg (IRISA-CNRS, Rennes, France)
Björn Þór Jónsson (Reykjavik University, Iceland, and IT University of Copenhagen, Denmark)
Michael J. Franklin (University of Chicago, Chicago, IL, USA)

MMSys'17: Proceedings of the 8th ACM on Multimedia Systems Conference, June 2017, Pages 1–12. https://doi.org/10.1145/3083187.3083200

Published: 20 June 2017

ABSTRACT

Computing power has now become abundant with multi-core machines, grids and clouds, but it remains a challenge to harness the available power and move towards gracefully handling web-scale datasets. Several researchers have used automatically distributed computing frameworks, notably Hadoop and Spark, for processing multimedia material, but mostly using small collections on small clusters. In this paper, we describe the engineering process for a prototype of a (near) web-scale multimedia service using the Spark framework running on the AWS cloud service. We present experimental results using up to 43 billion SIFT feature vectors from the public YFCC 100M collection, making this the largest high-dimensional feature vector collection reported in the literature. The design of the prototype and performance results demonstrate both the flexibility and scalability of the Spark framework for implementing multimedia services.

Index Terms

1. Computing methodologies
  1. Distributed computing methodologies
    1. Distributed algorithms
      1. MapReduce algorithms
2. Information systems
  1. Information systems applications
    1. Multimedia information systems
      1. Multimedia databases

Published in

MMSys'17: Proceedings of the 8th ACM on Multimedia Systems Conference
June 2017
407 pages
ISBN:9781450350020
DOI:10.1145/3083187

Copyright © 2017 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Content-based image retrieval
Spark
cloud computing
distributed computing
scalability
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
MMSys'17 Paper Acceptance Rate: 13 of 47 submissions, 28%. Overall Acceptance Rate: 176 of 530 submissions, 33%.