ABSTRACT
Pattern queries are widely used in complex event processing (CEP) systems. Existing pattern matching techniques, however, can provide only limited performance for expensive queries in real-world applications, which may involve Kleene closure patterns, flexible event selection strategies, and events with imprecise timestamps. To support these expensive queries with high performance, we begin our study by analyzing the complexity of pattern queries, with a focus on the fundamental understanding of which features make pattern queries more expressive and at the same time more computationally expensive. This analysis allows us to identify performance bottlenecks in processing those expensive queries, and provides key insights for us to develop a series of optimizations to mitigate those bottlenecks. Microbenchmark results show superior performance of our system for expensive pattern queries while most state-of-the-art systems suffer from poor performance. A thorough case study on Hadoop cluster monitoring further demonstrates the efficiency and effectiveness of our proposed techniques.
- J. Agrawal, Y. Diao, et al. Efficient pattern matching over event streams. In SIGMOD, 147--160, 2008. Google ScholarDigital Library
- M. Akdere, et al. Plan-based complex event detection across distributed sources. PVLDB, 1(1):66--77, 2008. Google ScholarDigital Library
- M. H. Ali, C. Gerea, et al. Microsoft cep server and online behavioral targeting. VLDB, 2(2):1558--1561, 2009. Google ScholarDigital Library
- R. Alur and P. Madhusudan. Visibly pushdown languages. In STOC, 202--211, 2004. Google ScholarDigital Library
- A. Arasu, S. Babu, et al. The CQL continuous query language: semantic foundations and query execution. VLDB J., 15(2):121--142, 2006. Google ScholarDigital Library
- R. S. Barga, J. Goldstein, et al. Consistent streaming through time: A vision for event stream processing. In CIDR, 363--374, 2007.Google Scholar
- D. A. M. Barrington, N. Immerman, et al. On uniformity within nc 1. J. Comput. Syst. Sci., 41(3):274--306, 1990. Google ScholarDigital Library
- J. Boulon, A. Konwinski, et al. Chukwa, a large-scale monitoring system. In CCA, volume 8, 2008.Google Scholar
- Z. Cao, C. Sutton, et al. Distributed inference and query processing for rfid tracking and monitoring. PVLDB, 4(5):326--337, 2011. Google ScholarDigital Library
- G. Cugola and A. Margara. Tesla: a formally defined event specification language. In DEBS, 50--61, 2010. Google ScholarDigital Library
- A. J. Demers, J. Gehrke, et al. Cayuga: A general purpose event monitoring system. In CIDR, 412--422, 2007.Google Scholar
- N. Immerman. Descriptive Complexity (Texts in Computer Science). Springer, 1999 edition, 1 1999.Google Scholar
- T. Johnson, et al. Monitoring regular expressions on out-of-order streams. In ICDE, 1315--1319, 2007.Google Scholar
- J. Kamp. Tense logic and the theory of linear order. 1968.Google Scholar
- S. Krishnamurthy, C. Wu, et al. On-the- y sharing for streamed aggregation. In SIGMOD, 623--634, 2006. Google ScholarDigital Library
- M. Liu, et al. High-performance nested cep query processing over event streams. In ICDE, 123--134, 2011. Google ScholarDigital Library
- D. Luckham. Event Processing for Business: Organizing the Real-Time Enterprise. Wiley, 2011.Google Scholar
- M. Massie, B. Li, et al. Monitoring with ganglia, 2012. Google ScholarDigital Library
- R. McNaughton and S. A. Papert. Counter-Free Automata (MIT research monograph no. 65). The MIT Press, 1971. Google ScholarDigital Library
- Y. Mei and S. Madden. Zstream: a cost-based query processor for adaptively detecting composite events. In SIGMOD, 193--206, 2009. Google ScholarDigital Library
- B. Mozafari, et al. High-performance complex event processing over xml streams. In SIGMOD, 253--264, 2012. Google ScholarDigital Library
- K. Ren, Y. Kwon, et al. Hadoop's adolescence. PVLDB, 6(10):853--864, 2013. Google ScholarDigital Library
- R. Sadri, C. Zaniolo, et al. Expressing and optimizing sequence queries in database systems. ACM TODS, 29(2):282--318, 2004. Google ScholarDigital Library
- N. P. Schultz-Müller, et al. Distributed complex event processing with query rewriting. In DEBS, 4. , 2009. Google ScholarDigital Library
- H. Vollmer. Introduction to circuit complexity: a uniform approach. Springer, 1999. Google ScholarDigital Library
- D. Wang, et al. Active complex event processing over event streams. PVLDB, 4(10):634--645, 2011. Google ScholarDigital Library
- D. Wang, E. A. Rundensteiner, et al. Probabilistic inference of object identifications for event stream analytics. In EDBT, 513--524, 2013. Google ScholarDigital Library
- E. Wu, Y. Diao, et al. High-performance complex event processing over streams. In SIGMOD, 407--418, 2006. Google ScholarDigital Library
- D. Yang, E. A. Rundensteiner, et al. A shared execution strategy for multiple pattern mining requests over streaming data. PVLDB, 2(1):874--885, 2009. Google ScholarDigital Library
- H. Zhang, Y. Diao, et al. Recognizing patterns in streams with imprecise timestamps. PVLDB, 3(1):244--255, 2010. Google ScholarDigital Library
- H. Zhang, et al. Optimizing expensive queries in complex event processing. Tech Report, http://sase.cs.umass.edu/Google Scholar
Index Terms
- On complexity and optimization of expensive queries in complex event processing
Recommendations
Join query optimization techniques for complex event processing applications
Complex event processing (CEP) is a prominent technology used in many modern applications for monitoring and tracking events of interest in massive data streams. CEP engines inspect real-time information flows and attempt to detect combinations of ...
Optimizing complex queries based on similarities of subqueries
As database technology is applied to more and more application domains, user queries are becoming increasingly complex (e.g. involving a large number of joins and a complex query structure). Query optimizers in existing database management systems (DBMS)...
An Optimal Algorithm for Processing Distributed Star Queries
The problem of optimal query processing in distributed database systems was shown to be NP-hard. However, for a special type of queries called star queries, we have developed a polynomial optimal algorithm. Semijoin tactics are applied for query ...
Comments