research-article

Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database

Authors:
Lucas Braun

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Thomas Etter

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Georgios Gasparis

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Martin Kaufmann

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Donald Kossmann

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Daniel Widmer

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Aharon Avitzur

Huawei Technologies, Shenzhen, China

Huawei Technologies, Shenzhen, China
View Profile

,
Anthony Iliopoulos

Huawei Technologies, Shenzhen, China

Huawei Technologies, Shenzhen, China
View Profile

,
Eliezer Levy

Huawei Technologies, Shenzhen, China

Huawei Technologies, Shenzhen, China
View Profile

,
Ning Liang

Huawei Technologies, Shenzhen, China

Huawei Technologies, Shenzhen, China
View Profile

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of DataMay 2015Pages 251–264https://doi.org/10.1145/2723372.2742783

Published:27 May 2015Publication History

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

Pages 251–264

ABSTRACT

Modern data-centric flows in the telecommunications industry require real time analytical processing over a rapidly changing and large dataset. The traditional approach of separating OLTP and OLAP workloads cannot satisfy this requirement. Instead, a new class of integrated solutions for handling hybrid workloads is needed. This paper presents an industrial use case and a novel architecture that integrates key-value-based event processing and SQL-based analytical processing on the same distributed store while minimizing the total cost of ownership. Our approach combines several well-known techniques such as shared scans, delta processing, a PAX-fashioned storage layout, and an interleaving of scanning and delta merging in a completely new way. Performance experiments show that our system scales out linearly with the number of servers. For instance, our system sustains event streams of 100,000 events per second while simultaneously processing 100 ad-hoc analytical queries per second, using a cluster of 12 commodity servers. In doing so, our system meets all response time goals of our telecommunication customers; that is, 10 milliseconds per event and 100 milliseconds for an ad-hoc analytical query. Moreover, our system beats commercial competitors by a factor of 2.5 in analytical and two orders of magnitude in update performance.

References

A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving Relations for Cache Performance. In VLDB, pages 169--180, 2001. Google ScholarDigital Library
I. Alagiannis, S. Idreos, and A. Ailamaki. H2O: A Hands-free Adaptive Store. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 1103--1114. ACM, 2014. Google ScholarDigital Library
M. Ali. An introduction to microsoft sql server streaminsight. In Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application, page 66. ACM, 2010. Google ScholarDigital Library
Apache Foundation. Apache Storm -- A system for processing streaming data in real time.Google Scholar
Apache Foundation. Hadoop. http://hadoop.apache.org/.Google Scholar
M. Aslett. Data Platforms Landscape Map. http://blogs.the451group.com/information_management/2014/03/18/updated-data-platforms-landscape-map-february-2014.Google Scholar
R. D. Blumofe and C. E. Leiserson. Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM), 46(5):720--748, 1999. Google ScholarDigital Library
P. A. Boncz, M. Zukowski, and N. Nes. MonetDB/X100: Hyper-Pipelining Query Execution. In CIDR, volume 5, pages 225--237, 2005.Google Scholar
F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A Distributed Storage System for Structured Data. ACM Trans. Comput. Syst., 26(2):4:1--4:26, June 2008. Google ScholarDigital Library
R. Cole, F. Funke, L. Giakoumakis, W. Guy, A. Kemper, S. Krompass, H. Kuno, R. Nambiar, T. Neumann, M. Poess, et al. The mixed workload CH-benCHmark. In Proceedings of the Fourth International Workshop on Testing Database Systems, page 8. ACM, 2011. Google ScholarDigital Library
F. Fabret, H.-A. Jacobsen, F. Llirbat, J. Pereira, K. A. Ross, and D. Shasha. Filtering Algorithms and Implementation for Very Fast Publish/Subscribe. In ACM SIGMOD Record, volume 30, pages 115--126. ACM, 2001. Google ScholarDigital Library
F. Färber et al. The SAP HANA Database -- An Architecture Overview. IEEE Data Eng. Bull., 35(1), 2012.Google Scholar
G. Gasparis. AIM: A System for Handling Enormous Workloads under Strict Latency and Scalability Regulations. Master's thesis, Systems Group, Dep. of CS, ETH Zurich, 2013.Google Scholar
G. Giannikis, G. Alonso, and D. Kossmann. SharedDB: killing one thousand queries with one stone. Proceedings of the VLDB Endowment, 5(6):526--537, 2012. Google ScholarDigital Library
Google. Sparsehash. https://code.google.com/p/sparsehash.Google Scholar
Google. Supersonic Query Engine. https://code.google.com/p/supersonic.Google Scholar
M. Grund, J. Krüger, H. Plattner, A. Zeier, P. Cudré-Mauroux, and S. Madden. HYRISE - A Main Memory Hybrid Storage Engine. Proceedings of the VLDB Endowment, 4(2):105--116, 2010. Google ScholarDigital Library
S. Harizopoulos, D. J. Abadi, S. Madden, and M. Stonebraker. Oltp through the looking glass, and what we found there. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 981--992. ACM, 2008. Google ScholarDigital Library
InfiniBand Trade Association. http://www.infinibandta.org.Google Scholar
R. Kallman, H. Kimura, J. Natkins, A. Pavlo, A. Rasin, S. Zdonik, E. P. Jones, S. Madden, M. Stonebraker, Y. Zhang, et al. H-store: a high-performance, distributed main memory transaction processing system. Proceedings of the VLDB Endowment, 1(2):1496--1499, 2008. Google ScholarDigital Library
A. Kemper and T. Neumann. HyPer: A hybrid OLTP & OLAP main memory database system based on virtual memory snapshots. In ICDE, pages 195--206, 2011. Google ScholarDigital Library
A. Khetrapal and V. Ganesh. Hbase and hypertable for large scale distributed storage systems. Dept. of Computer Science, Purdue University, 2006.Google Scholar
R. Kimball. The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses. John Wiley, 1996. Google ScholarDigital Library
C. Koch, Y. Ahmad, O. Kennedy, M. Nikolic, A. Nötzli, D. Lupei, and A. Shaikhha. Dbtoaster: higher-order delta processing for dynamic, frequently fresh views. The VLDB Journal, 23(2):253--278, 2014. Google ScholarDigital Library
J. Krueger, C. Kim, M. Grund, N. Satish, D. Schwalb, J. Chhugani, H. Plattner, P. Dubey, and A. Zeier. Fast updates on read-optimized databases using multi-core CPUs. Proceedings of the VLDB Endowment, 5(1):61--72, 2011. Google ScholarDigital Library
Y. Li and J. M. Patel. Widetable: An accelerator for analytical data processing. Proceedings of the VLDB Endowment, 7(10), 2014. Google ScholarDigital Library
S. Loesing, M. Pilman, T. Etter, and D. Kossmann. On the Design and Scalability of Distributed Shared-Memory Databases. Technical report, ETH Zurich, 2013.Google Scholar
J. Ousterhout, P. Agrawal, D. Erickson, C. Kozyrakis, J. Leverich, D. Mazières, S. Mitra, A. Narayanan, D. Ongaro, G. Parulkar, M. Rosenblum, S. M. Rumble, E. Stratmann, and R. Stutsman. The case for RAMCloud. Commun. ACM, 54(7):121--130, July 2011. Google ScholarDigital Library
M. Stonebraker, D. J. Abadi, A. Batkin, X. Chen, M. Cherniack, M. Ferreira, E. Lau, A. Lin, S. Madden, E. J. O'Neil, P. E. O'Neil, A. Rasin, N. Tran, and S. B. Zdonik. C-Store: A Column-oriented DBMS. In Proceedings of the 31st international conference on Very large data bases, pages 553--564. VLDB Endowment, 2005. Google ScholarDigital Library
M. Stonebraker and A. Weisberg. The voltdb main memory dbms. IEEE Data Eng. Bull., 36(2):21--27, 2013.Google Scholar
E. Tech. Event Series Intelligence: Esper & NEsper. http://esper.codehaus.org.Google Scholar
TELCO-X Network Analytics Technical Questionnaire. Huawei internal document relating to customer TELCO-X, 2012.Google Scholar
A. Thomson and D. J. Abadi. The case for determinism in database systems. Proceedings of the VLDB Endowment, 3(1--2):70--80, 2010. Google ScholarDigital Library
P. Unterbrunner, G. Giannikis, G. Alonso, D. Fauser, and D. Kossmann. Predictable Performance for Unpredictable Workloads. Proceedings of the VLDB Endowment, 2(1):706--717, 2009. Google ScholarDigital Library
T. Willhalm, N. Popovici, Y. Boshmaf, H. Plattner, A. Zeier, and J. Schaffner. Simd-scan: ultra fast in-memory table scan using on-chip vector processing units. Proceedings of the VLDB Endowment, 2(1):385--394, 2009. Google ScholarDigital Library
F. Yang, E. Tschetter, G. Merlino, N. Ray, X. Léauté, D. Ganguli, and H. Singh. Druid: A Real-time Analytical Data Store. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 157--168. ACM, 2014. Google ScholarDigital Library
M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica. Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing, pages 10--17, 2010. Google ScholarDigital Library
J. Zhou and K. A. Ross. Implementing database operations using SIMD instructions. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data, pages 145--156. ACM, 2002. Google ScholarDigital Library

Index Terms

Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database

Recommendations

Big data analytics in Cloud computing: an overview
Abstract
Big Data and Cloud Computing as two mainstream technologies, are at the center of concern in the IT field. Every day a huge amount of data is produced from different sources. This data is so big in size that traditional processing tools are unable ...
Read More
A Real-World Distributed Infrastructure for Processing Financial Data at Scale
DEBS '19: Proceedings of the 13th ACM International Conference on Distributed and Event-based Systems

Financial markets are event- and data-driven to an extremely high degree. For making decisions and triggering actions stakeholders require notifications about significant events and reliable background information that meet their individual requirements ...
Read More
Review on Big Data & Analytics – Concepts, Philosophy, Process and Applications
Abstract
Big Data analytics has been the main focus in all the industries today. It is not overstating that if an enterprise is not using Big Data analytics, it will be a stray and incompetent in their businesses against their Big Data enabled competitors. ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
May 2015
2110 pages
ISBN:9781450327589
DOI:10.1145/2723372
General Chair:
Timos Sellis
RMIT University, Australia
,
Program Chairs:
Susan B. Davidson
University of Pennsylvania, USA
,
Zack Ives
University of Pennsylvania, USA
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 27 May 2015
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
analytics
event-processing
oltp/olap engine
Qualifiers
- research-article
Conference

Acceptance Rates
SIGMOD '15 Paper Acceptance Rate106of415submissions,26%Overall Acceptance Rate741of3,710submissions,20%
More
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 36
  Total Citations
  View Citations
- 1,193
  Total Downloads
- Downloads (Last 12 months)33
- Downloads (Last 6 weeks)2
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database

SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data

ABSTRACT

References

Cited By

Index Terms

Recommendations

Big data analytics in Cloud computing: an overview

A Real-World Distributed Infrastructure for Processing Financial Data at Scale

Review on Big Data & Analytics – Concepts, Philosophy, Process and Applications