ABSTRACT
Data streaming systems are becoming essential for monitoring applications such as financial analysis and network intrusion detection. These systems often have to process many queries that are similar but not identical over the same data. Because executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting the similarities among queries. In this paper we present techniques to efficiently share streaming aggregate queries that have differing periodic windows and arbitrary selection predicates. A major contribution is a sharing technique that requires no up-front multiple query optimization, a significant departure from existing techniques that rely on complex static analyses of fixed query workloads. This property is particularly important in streaming systems, where queries can join and leave the system at any time. We present a detailed performance study that evaluates our strategies using an implementation and real data. In these experiments, our approach yields up to an order of magnitude performance improvement over the state of the art.
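The abstract does not spell out the sharing mechanism, so the following is only a minimal Python sketch, under our own assumptions, of one way aggregate queries with different periodic windows and different predicates can share work. Time is cut at the union of all queries' window edges; each tuple is tested once against every predicate and tagged with the set of queries it satisfies; partial SUMs are kept per (slice, predicate set) and combined per window. This is not presented as the paper's algorithm: `Query`, `slice_edges`, `shared_sums`, the SUM-only aggregate, the `[end - range, end)` window convention and the fixed horizon are all hypothetical simplifications.

```python
# Hypothetical sketch of shared windowed aggregation; NOT the paper's algorithm.
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical periodic-window SUM query: every `slide` time units it
    reports SUM(value) over [end - range_, end) for tuples satisfying `pred`."""
    name: str
    range_: int
    slide: int
    pred: Callable[[dict], bool]


def slice_edges(queries: List[Query], horizon: int) -> List[int]:
    """Union of every query's window start and end points in [0, horizon].
    Cutting time at these edges makes each window an exact run of slices."""
    edges = {0, horizon}
    for q in queries:
        for end in range(q.slide, horizon + 1, q.slide):
            edges.add(end)
            edges.add(max(0, end - q.range_))
    return sorted(edges)


def shared_sums(queries: List[Query], tuples: List[dict],
                horizon: int) -> Dict[str, List[Tuple[int, int]]]:
    edges = slice_edges(queries, horizon)
    # partial[slice][signature] = SUM(value) of the tuples in that slice whose
    # set of satisfied predicates ("signature") is exactly `signature`.
    partial: Dict[int, Dict[FrozenSet[str], int]] = defaultdict(lambda: defaultdict(int))
    for t in tuples:
        if not 0 <= t["ts"] < horizon:
            continue
        s = bisect_right(edges, t["ts"]) - 1  # slice covers [edges[s], edges[s+1])
        # Each tuple is tested against each predicate exactly once, however
        # many windows and queries later reuse its contribution.
        sig = frozenset(q.name for q in queries if q.pred(t))
        if sig:
            partial[s][sig] += t["value"]

    results: Dict[str, List[Tuple[int, int]]] = {}
    for q in queries:
        answers = []
        for end in range(q.slide, horizon + 1, q.slide):
            lo, hi = edges.index(max(0, end - q.range_)), edges.index(end)
            # Combine the shared partials of the slices inside this window,
            # keeping only signatures that include this query.
            total = sum(v for s in range(lo, hi)
                        for sig, v in partial.get(s, {}).items() if q.name in sig)
            answers.append((end, total))
        results[q.name] = answers
    return results


queries = [
    Query("Q1", range_=4, slide=2, pred=lambda t: t["value"] > 0),
    Query("Q2", range_=3, slide=3, pred=lambda t: t["symbol"] == "A"),
]
ticks = [{"ts": i, "symbol": s, "value": v} for i, (s, v) in
         enumerate([("A", 1), ("B", 2), ("A", 3), ("B", 4), ("A", 5), ("A", 6)])]
print(shared_sums(queries, ticks, horizon=6))
# {'Q1': [(2, 3), (4, 10), (6, 18)], 'Q2': [(3, 4), (6, 11)]}
```

Because partials are keyed by the set of satisfied predicates rather than by a precomputed multi-query plan, an incremental version of this sketch would let a joining query contribute only new slice edges and one more predicate to test per tuple. The batch version above recomputes over a fixed horizon purely for simplicity.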
Index Terms
- On-the-fly sharing for streamed aggregation