ABSTRACT
Modern data-centric flows in the telecommunications industry require real-time analytical processing over a large, rapidly changing dataset. The traditional approach of separating OLTP and OLAP workloads cannot satisfy this requirement. Instead, a new class of integrated solutions for handling hybrid workloads is needed. This paper presents an industrial use case and a novel architecture that integrates key-value-based event processing and SQL-based analytical processing on the same distributed store while minimizing the total cost of ownership. Our approach combines several well-known techniques, such as shared scans, delta processing, a PAX-style storage layout, and an interleaving of scanning and delta merging, in a completely new way. Performance experiments show that our system scales out linearly with the number of servers. For instance, on a cluster of 12 commodity servers, our system sustains event streams of 100,000 events per second while simultaneously processing 100 ad-hoc analytical queries per second. In doing so, our system meets all response time goals of our telecommunication customers: 10 milliseconds per event and 100 milliseconds per ad-hoc analytical query. Moreover, our system outperforms commercial competitors by a factor of 2.5 in analytical performance and by two orders of magnitude in update performance.
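To make the combination of techniques named in the abstract more concrete, the following is a minimal single-node sketch, not the authors' implementation. It assumes a hypothetical `DeltaMainStore` class in which events are written to a small delta, a batch of analytical predicates is answered in one shared pass over the main data, and the delta is merged right after each scan. All names (`DeltaMainStore`, `put`, `shared_scan`, `scan_then_merge`, the `calls` attribute) are illustrative assumptions, and the PAX-style layout and the distribution across servers are not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Record = Dict[str, int]
Predicate = Callable[[Record], bool]


@dataclass
class DeltaMainStore:
    """Hypothetical store: scans read `main`, event writes go to `delta`."""
    main: Dict[int, Record] = field(default_factory=dict)
    delta: Dict[int, Record] = field(default_factory=dict)

    def put(self, key: int, record: Record) -> None:
        # Event path: write only to the delta so a scan over main is not disturbed.
        self.delta[key] = dict(record)

    def get(self, key: int) -> Optional[Record]:
        # Key-value reads check the delta first, since it holds the newest version.
        if key in self.delta:
            return self.delta[key]
        return self.main.get(key)

    def shared_scan(self, predicates: List[Predicate]) -> List[int]:
        # Shared scan: one pass over main evaluates the whole batch of queries.
        counts = [0] * len(predicates)
        for record in self.main.values():
            for i, predicate in enumerate(predicates):
                if predicate(record):
                    counts[i] += 1
        return counts

    def merge_delta(self) -> None:
        # Delta merge: fold pending writes into main, then empty the delta.
        self.main.update(self.delta)
        self.delta.clear()

    def scan_then_merge(self, predicates: List[Predicate]) -> List[int]:
        # Interleaving (simplified): scan step first, merge step right after.
        counts = self.shared_scan(predicates)
        self.merge_delta()
        return counts


store = DeltaMainStore()
store.put(1, {"calls": 3})
store.put(2, {"calls": 12})
store.scan_then_merge([])            # nothing to query yet; moves both records to main

store.put(1, {"calls": 15})          # a new event lands in the delta
newest = store.get(1)                # the key-value read already sees the update
first = store.scan_then_merge([
    lambda r: r["calls"] > 10,       # query A
    lambda r: r["calls"] < 5,        # query B
])                                   # this scan still sees the pre-merge main
second = store.shared_scan([lambda r: r["calls"] > 10])
```

In this sketch the analytical scan reads a version of the data that lags the event stream by at most one merge step, while key-value reads always see the latest write. How the real system bounds that lag and schedules the two steps is described in the paper itself, not here.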
Index Terms
- Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database