Article

On-the-fly sharing for streamed aggregation

Authors:
Sailesh Krishnamurthy, UC Berkeley
Chung Wu, Google
Michael Franklin, UC Berkeley

SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data, June 2006, Pages 623–634, https://doi.org/10.1145/1142473.1142543

Published: 27 June 2006


ABSTRACT

Data streaming systems are becoming essential for monitoring applications such as financial analysis and network intrusion detection. These systems often have to process many similar but different queries over common data. Since executing each query separately can lead to significant scalability and performance problems, it is vital to share resources by exploiting similarities in the queries. In this paper we present ways to efficiently share streaming aggregate queries with differing periodic windows and arbitrary selection predicates. A major contribution is our sharing technique that does not require any up-front multiple query optimization. This is a significant departure from existing techniques that rely on complex static analyses of fixed query workloads. Our approach is particularly vital in streaming systems where queries can join and leave the system at any point. We present a detailed performance study that evaluates our strategies with an implementation and real data. In these experiments, our approach gives us as much as an order of magnitude performance improvement over the state of the art.
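To make the idea of sharing one aggregation pass across queries with different windows and predicates concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: it assumes integer timestamps, a finite batch of tuples up to a fixed `horizon`, SUM as the aggregate, and windows of length `range_` that close every `slide` time units. All names (`Query`, `slice_edges`, `shared_window_sums`) are hypothetical. The stream is cut at every window boundary of every registered query, so each query's window is a contiguous run of slices. Each tuple is aggregated once into a partial keyed by its slice and by a bitmask of the queries whose predicates it satisfies, and every window answer is then assembled from those shared partials.

```python
from bisect import bisect_right
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Query:
    """Hypothetical query: SUM(value) WHERE predicate(value),
    over windows of length `range_` that close every `slide` time units."""
    range_: int
    slide: int
    predicate: Callable[[float], bool]


def slice_edges(queries: List[Query], horizon: int) -> List[int]:
    """Cut points: every window start and end of every query in [0, horizon]."""
    edges = {0, horizon}
    for q in queries:
        for end in range(q.slide, horizon + 1, q.slide):
            edges.add(end)                      # a window closes here
            edges.add(max(0, end - q.range_))   # ...and opened here
    return sorted(edges)


def shared_window_sums(
    stream: Iterable[Tuple[int, float]], queries: List[Query], horizon: int
) -> Dict[int, List[Tuple[int, float]]]:
    edges = slice_edges(queries, horizon)
    # One partial SUM per (slice index, signature); the signature is a bitmask
    # of the queries whose predicate the tuple satisfies.
    partials: Dict[Tuple[int, int], float] = defaultdict(float)
    for ts, value in stream:
        if not 0 <= ts < horizon:
            continue
        slice_idx = bisect_right(edges, ts) - 1  # slice [edges[i], edges[i+1])
        sig = 0
        for i, q in enumerate(queries):
            if q.predicate(value):
                sig |= 1 << i
        if sig:
            partials[(slice_idx, sig)] += value

    # Each window [end - range_, end) is a contiguous run of slices;
    # combine the partials whose signature includes this query's bit.
    results: Dict[int, List[Tuple[int, float]]] = {}
    for i, q in enumerate(queries):
        windows = []
        for end in range(q.slide, horizon + 1, q.slide):
            lo = edges.index(max(0, end - q.range_))
            hi = edges.index(end)
            total = sum(
                v for (s, sig), v in partials.items()
                if lo <= s < hi and (sig >> i) & 1
            )
            windows.append((end, total))
        results[i] = windows
    return results


if __name__ == "__main__":
    queries = [
        Query(range_=4, slide=2, predicate=lambda v: v > 10),  # large values only
        Query(range_=6, slide=3, predicate=lambda v: True),    # every tuple
    ]
    stream = [(0, 5.0), (1, 12.0), (3, 20.0), (4, 7.0), (7, 15.0), (8, 1.0)]
    print(shared_window_sums(stream, queries, horizon=9))
    # {0: [(2, 12.0), (4, 32.0), (6, 20.0), (8, 15.0)], 1: [(3, 17.0), (6, 44.0), (9, 43.0)]}
```

SUM is used because its partials combine by simple addition; other aggregates whose partial results can be merged would fit the same structure. Unlike the on-the-fly approach the abstract describes, this sketch fixes the query set and computes all cut points before reading the data. Letting queries join and leave at any time would mean maintaining the cut points incrementally as the query set changes.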


Index Terms

1. Information systems
  1. Data management systems


Published in
SIGMOD '06: Proceedings of the 2006 ACM SIGMOD international conference on Management of data
June 2006
830 pages
ISBN: 1595934340
DOI: 10.1145/1142473
General Chairs: Clement Yu (University of Illinois at Chicago), Peter Scheuermann (Northwestern University)
Program Chair: Surajit Chaudhuri (Microsoft Research)
Copyright © 2006 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
aggregation
multiple-query optimization
shared processing
streaming data

Acceptance Rates
Overall Acceptance Rate: 785 of 4,003 submissions, 20%

Article Metrics
- 126 total citations
- 1,698 total downloads
- Downloads (last 12 months): 34
- Downloads (last 6 weeks): 2