DOI: 10.1145/2723372.2742783
research-article

Analytics in Motion: High Performance Event-Processing AND Real-Time Analytics in the Same Database

Published: 27 May 2015

ABSTRACT

Modern data-centric flows in the telecommunications industry require real time analytical processing over a rapidly changing and large dataset. The traditional approach of separating OLTP and OLAP workloads cannot satisfy this requirement. Instead, a new class of integrated solutions for handling hybrid workloads is needed. This paper presents an industrial use case and a novel architecture that integrates key-value-based event processing and SQL-based analytical processing on the same distributed store while minimizing the total cost of ownership. Our approach combines several well-known techniques such as shared scans, delta processing, a PAX-fashioned storage layout, and an interleaving of scanning and delta merging in a completely new way. Performance experiments show that our system scales out linearly with the number of servers. For instance, our system sustains event streams of 100,000 events per second while simultaneously processing 100 ad-hoc analytical queries per second, using a cluster of 12 commodity servers. In doing so, our system meets all response time goals of our telecommunication customers; that is, 10 milliseconds per event and 100 milliseconds for an ad-hoc analytical query. Moreover, our system beats commercial competitors by a factor of 2.5 in analytical and two orders of magnitude in update performance.
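To make the combination of delta processing and shared scans more concrete, here is a minimal single-process sketch. It is not the authors' implementation: the `Store` class, its method names, and the dictionary-based records are all hypothetical, the PAX-style layout and the distribution across servers are not modeled, and the interleaving is reduced to a merge step that runs just before each scan.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Record = Dict[str, int]
# A query is (predicate over a record, attribute to sum); a hypothetical,
# deliberately tiny stand-in for an ad-hoc SQL aggregate.
Query = Tuple[Callable[[Record], bool], str]


@dataclass
class Store:
    # Scan-friendly main store, keyed by entity id (assumed layout).
    main: Dict[int, Record] = field(default_factory=dict)
    # Delta holding the most recent event-driven updates.
    delta: Dict[int, Record] = field(default_factory=dict)

    def apply_event(self, key: int, attr: str, amount: int) -> None:
        """Key-value style update: read the newest version, write to delta."""
        current = self.delta.get(key, self.main.get(key, {}))
        updated = dict(current)
        updated[attr] = updated.get(attr, 0) + amount
        self.delta[key] = updated

    def merge_delta(self) -> None:
        """Fold all pending updates into the main store and empty the delta."""
        self.main.update(self.delta)
        self.delta.clear()

    def shared_scan(self, queries: List[Query]) -> List[int]:
        """Merge the delta, then answer a whole batch of queries in one pass."""
        self.merge_delta()
        results = [0] * len(queries)
        for record in self.main.values():
            # Every record is read once and tested against all queries.
            for i, (predicate, attr) in enumerate(queries):
                if predicate(record):
                    results[i] += record.get(attr, 0)
        return results


store = Store()
store.apply_event(1, "calls", 2)
store.apply_event(1, "calls", 3)
store.apply_event(2, "calls", 10)
batch = [
    (lambda r: True, "calls"),                  # total calls
    (lambda r: r.get("calls", 0) > 5, "calls"), # calls of heavy users only
]
print(store.shared_scan(batch))
```

In this sketch, events only ever touch the small delta, so updates stay cheap, while each scan amortizes one pass over the main store across every query in the batch.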


Published in: SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, May 2015, 2110 pages
ISBN: 9781450327589
DOI: 10.1145/2723372

Copyright © 2015 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Acceptance rates: SIGMOD '15 paper acceptance rate: 106 of 415 submissions (26%). Overall acceptance rate: 785 of 4,003 submissions (20%).
