research-article

Open Access

Gloria: a batch friendly indexing and storage framework

Authors:
Rachid Kachemir, Apple Inc.
Brad Kellett, Apple Inc.
Krishna Behara, Apple Inc.

SIGSPATIAL '16: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
October 2016, Article No. 43, Pages 1–10
https://doi.org/10.1145/2996913.2997013

Published: 31 October 2016

ABSTRACT

Indexing and delivering spatial data to a massive user base of over a billion devices around the world stretches the limits of traditional infrastructure and operational tools. For instance, offline bulk indexing and loading fall short of viable solutions when it comes to data at scale: integration with distributed systems such as Apache Hadoop© or Spark© is sparse, while data loading is often performed in a sub-optimal fashion by relying on intermediate file formats.

We present in this paper an approach toward a hybrid online/offline indexing framework called Gloria that has been running in production settings for the past year at over 350k requests per second with lookup latencies under 5μs. The resulting output is an in-memory key-value store, and we show that by leveraging higher-level MapReduce [7] constructs as defined in FlumeJava [5], Gloria can achieve large-scale offline key-value indexing in a fraction of the time required by traditional datastores while maintaining similar operational performance. Gloria also provides a spatial layer based on improvements to pointer-less quadtrees [12] and on locational identifiers we call shift keys, which reduce the nearest neighbor problem in spatial data to simple key-value lookups. Shift keys have been shown to outperform well-established solutions such as Google S2 on locational key operations.
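To make the idea of a pointer-less quadtree concrete, the following is a minimal sketch of how locational keys can turn a spatial lookup into plain dictionary lookups. It uses a generic Morton-style bit interleaving, which is an assumption made for illustration; it is not the shift-key encoding, whose details the abstract does not give. The names `cell_key`, `parent_key`, `locate`, `lookup`, and `MAX_LEVEL` are hypothetical.

```python
# Hypothetical sketch: generic Morton-style locational keys for a
# pointer-less quadtree. This is NOT the paper's shift-key encoding.

MAX_LEVEL = 16  # assumed maximum quadtree depth for this illustration


def cell_key(x: int, y: int, level: int) -> int:
    """Interleave the low `level` bits of x and y into one integer key.

    A leading sentinel 1 bit encodes the level, so keys built at
    different depths never collide.
    """
    key = 1
    for i in range(level - 1, -1, -1):
        key = (key << 2) | (((y >> i) & 1) << 1) | ((x >> i) & 1)
    return key


def parent_key(key: int) -> int:
    """Drop the last quadrant (2 bits) to obtain the parent cell's key."""
    return key >> 2


def locate(lon: float, lat: float, level: int = MAX_LEVEL) -> int:
    """Map a longitude/latitude pair to the key of its cell at `level`."""
    n = 1 << level
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return cell_key(x, y, level)


def lookup(index: dict, lon: float, lat: float):
    """Walk from the finest cell up to the root; return the first hit."""
    key = locate(lon, lat)
    while key:  # the root has key 1; shifting it yields 0 and ends the loop
        if key in index:
            return index[key]
        key = parent_key(key)
    return None


# Usage: store a value under a coarse (level 8) cell, then query a point.
index = {locate(2.35, 48.85, level=8): "paris-cell"}
result = lookup(index, 2.35, 48.85)
```

Because a parent key is a bit prefix of its children, no tree pointers are stored: each step of the upward walk is one shift and one hash lookup, which is the property that lets a key-value store serve spatial queries.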

References

[1] A. Aji, F. Wang, H. Vo, R. Lee, Q. Liu, X. Zhang, and J. Saltz. Hadoop GIS: a high performance spatial data warehousing system over MapReduce. Proceedings of the VLDB Endowment, 6(11):1009–1020, 2013.
[2] P. Bagwell. Ideal hash trees. Technical report, 2001.
[3] A. Barbuzzi, P. Michiardi, E. Biersack, and G. Boggia. Parallel bulk insertion for large-scale analytics applications. In Proceedings of the 4th International Workshop on Large Scale Distributed Systems and Middleware, pages 27–31. ACM, 2010.
[4] B. H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7):422–426, 1970.
[5] C. Chambers, A. Raniwala, F. Perry, S. Adams, R. Henry, R. Bradshaw, and Nathan. FlumeJava: Easy, efficient data-parallel pipelines. pages 363–375, 2010. URL: http://dl.acm.org/citation.cfm?id=1806638.
[6] Apache Crunch. https://crunch.apache.org, 2016. [online, accessed 20-April-2016].
[7] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI'04, pages 10–10, Berkeley, CA, USA, 2004. USENIX Association. URL: http://dl.acm.org/citation.cfm?id=1251254.1251264.
[8] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. In ACM SIGOPS Operating Systems Review, volume 41, pages 205–220. ACM, 2007.
[9] A. Eldawy and M. F. Mokbel. SpatialHadoop: A MapReduce framework for spatial data. In Data Engineering (ICDE), 2015 IEEE 31st International Conference on, pages 1352–1363. IEEE, 2015.
[10] L. Fan, P. Cao, J. Almeida, and A. Z. Broder. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Transactions on Networking (TON), 8(3):281–293, 2000.
[11] R. A. Finkel and J. L. Bentley. Quad trees a data structure for retrieval on composite keys. Acta Informatica, 4(1):1–9, 1974.
[12] I. Gargantini. An effective way to represent quadtrees. Commun. ACM, 25(12):905–910, Dec. 1982. URL: http://doi.acm.org/10.1145/358728.358741, doi:10.1145/358728.358741.
[13] G. H. Gonnet. Expected length of the longest probe sequence in hash code searching. J. ACM, 28(2):289–304, Apr. 1981. URL: http://doi.acm.org/10.1145/322248.322254, doi:10.1145/322248.322254.
[14] D. Guo, J. Wu, H. Chen, Y. Yuan, and X. Luo. The dynamic Bloom filters. Knowledge and Data Engineering, IEEE Transactions on, 22(1):120–133, 2010.
[15] P. Gupta, A. Wildani, E. L. Miller, D. Rosenthal, I. F. Adams, C. Strong, and A. Hospodor. An economic perspective of disk vs. flash media in archival storage. In Modelling, Analysis & Simulation of Computer and Telecommunication Systems (MASCOTS), 2014 IEEE 22nd International Symposium on, pages 249–254. IEEE, 2014.
[16] Apache Hadoop. https://hadoop.apache.org, 2016. [online, accessed 20-April-2016].
[17] M. Herlihy, N. Shavit, and M. Tzafrir. Hopscotch hashing. In Distributed Computing, pages 350–364. Springer, 2008.
[18] G. R. Hjaltason and H. Samet. Speeding up construction of PMR quadtree-based spatial indexes. The VLDB Journal-The International Journal on Very Large Data Bases, 11(2):109–137, 2002.
[19] S. L. Horowitz and T. Pavlidis. Picture segmentation by a tree traversal algorithm. Journal of the ACM (JACM), 23(2):368–388, 1976.
[20] Google Inc. https://github.com/sparsehash/sparsehash, 2016. [online, accessed 20-April-2016].
[21] M. Mitzenmacher and E. Upfal. Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge University Press, 2005.
[22] Oracle. https://docs.oracle.com/javase/7/docs/api/java/util/IdentityHashMap.html, 2016. [online, accessed 20-April-2016].
[23] R. Pagh and F. F. Rodler. Cuckoo hashing. Journal of Algorithms, 51(2):122–144, 2004.
[24] Y. Perl, A. Itai, and H. Avni. Interpolation search - a log log n search. Communications of the ACM, 21(7):550–553, 1978.
[25] P. Rigaux, M. Scholl, and A. Voisard. Spatial databases: with application to GIS. Morgan Kaufmann, 2001.
[26] H. Samet. The quadtree and related hierarchical data structures. ACM Computing Surveys (CSUR), 16(2):187–260, 1984.
[27] A. Silberstein, B. F. Cooper, U. Srivastava, E. Vee, R. Yerneni, and R. Ramakrishnan. Efficient bulk insertion into a distributed ordered table. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 765–778. ACM, 2008.
[28] R. Sumbaly, J. Kreps, L. Gao, A. Feinberg, C. Soman, and S. Shah. Serving large-scale batch computed data with Project Voldemort. In Proceedings of the 10th USENIX conference on File and Storage Technologies, pages 18–18. USENIX Association, 2012.

Index Terms

Information systems

Published in
SIGSPATIAL '16: Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
October 2016
649 pages
ISBN: 9781450345897
DOI: 10.1145/2996913
General Chairs: Mohamed Ali (University of Washington, Tacoma), Shawn Newsam (University of California, Merced)
Program Chairs: Matthias Renz (George Mason University, USA), Goce Trajcevski (Northwestern University, USA), Siva Ravada (Oracle Corporation, USA)
Copyright © 2016 Owner/Author
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
distributed bulk indexing
key value storage
locational identifiers
offline indexing
spatial index
Acceptance Rates
SIGSPATIAL '16 paper acceptance rate: 40 of 216 submissions (19%). Overall acceptance rate: 220 of 1,116 submissions (20%).