Ripple joins for online aggregation

Authors:
Peter J. Haas

Almaden Research Center, IBM Research Division

Almaden Research Center, IBM Research Division
View Profile

,
Joseph M. Hellerstein

Computer Science Division, University of California, Berkeley

Computer Science Division, University of California, Berkeley
View Profile

Authors Info & Claims

ACM SIGMOD Record Volume 28 Issue 2June 1999pp 287–298https://doi.org/10.1145/304181.304208

Published:01 June 1999Publication History

ACM SIGMOD Record

Abstract

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications.

Traditional offline join algorithms are designed to minimize the time to completion of the query. In contrast, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. Ripple joins are adaptive, adjusting their behavior during processing in accordance with the statistical properties of the data. Ripple joins also permit the user to dynamically trade off the two key performance factors of on-line aggregation: the time between successive updates of the running aggregate, and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.

References

AS72 M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, New York, 1972. Ninth printing. Google ScholarDigital Library
Bil86 P. Billingsley. Probability and Measure. Wiley, New York, 1986.Google Scholar
CGL83 T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing She sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.Google ScholarCross Ref
DKO+84 D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Michael R. Stonebraker, and D. Wood. Implementation Techniques for Main Memory Database Systems. In Proc. 198g A CM SIGMOD Intl. Conf. Managment of Data, pages 1-8. ACM Press, 1984. Google ScholarDigital Library
GG97 J. Gray and G. Graefe. The five-minute rule ten years later and other computer storage rules of thumb. SIGMOD Record, 26(4), 1997. Google ScholarDigital Library
Gra93 G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surveys, 25(2):73-170, June 1993. Google ScholarDigital Library
Haa97 P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. In Proc. Ninth Intl. Con}. Scientific and Statist. Database Management, pages 51-63. I}EEE Computer Society Press, 1997. Google Scholar
HAR99 J. M. Hellerstein, R. Avnur, and V. Raman. informix under CONTROL: Online query processing. Submitted for publication, 1999.Google Scholar
HH98 P. J. Haas and J. M. Hellerstein. Join algorithms for online aggregation. IBM Research Report RJ 10126, IBM Almaden Research Center, San Jose, CA, 1998.Google Scholar
HHW97 J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In Proc. 1997 ACM SIGMOD Intl. Conf. Managment of Data, pages 171-182. ACM Press, 1997. Google ScholarDigital Library
HN96 J. M. Hellerstein and J. F. Naughton. Query Execution Techniques for Caching Expensive Methods. in Proc. 1996 A CM SIGMOD Intl. Conf. Managment of Data, pages 423- 424. ACM Press, 1996. Google ScholarDigital Library
HNS94 P. J. Haas, J. F. Naughton, and A. N. Swami. On the relative cost of sampling for join selectivity estimation. In Proc. Thirteenth A CM SIGA CT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 14-24. ACM Press, 1994. Google Scholar
HNSS96 P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on ,'andom sampling. J. Comput. System Sci., 52:550-569, 1996. Google ScholarDigital Library
Hoe48 W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:293-325, 1948.Google ScholarCross Ref
HOT88 W. Hou, G. Ozsoyoglu, and B. Taneja. Statistical estimators for relational algebra expressions. In Proc. Seventh A CM SIGACT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 276-287. ACM Press, 1988. Google Scholar
HOT89 W. Hou, G. Ozsoyoglu, and B. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. 1989 A CM SIGMOD Intl. Conf. Managment of Data, pages 68-77. ACM Press, 1989. Google ScholarDigital Library
IBM97 IBM Corporation. IBM DB2 Universal Database Administration Guide, Version 5. North York, Ontario, Canada, 1997.Google Scholar
Inf97 Informix Corporation. Informix Universal Server G~ide to SQL: Syntax, Version 9.01. Menlo Park, CA, March 1997.Google Scholar
ODT+91 G. Ozsoyoglu, K. Du, A. Tjahjana, W. Hou, and D. Y. Rowland. On estimating COUNT, SUM, and AVERAGE relational algebra queries. In D. Dimitris Karagiannis, editor, Database and Expert Systems Applications, Proceedings of the International Conference in Berlin, Germany, 1991 (DEXA 91), pages 406-412. Springer-Verlag, 1991.Google Scholar
OJ93 V. O'day and R. Jeffries. Orienteering in an informati:on landscape: How information seekers get from here to there. In Human Factors in Computing Systems: INTERCHI '93 Conf. Proc., pages 438-445. ACM Press, 1993. Google ScholarDigital Library
Olk93 F. Olken. Random Sampling .from Databases. Ph.D. Dissertation, University of California, Berkeley, CA, 1993. Available as Tech. Report LBL-32883, Lawrence Berkeley Laboratories, Berkeley, CA.Google Scholar
Ora97 Oracle Corporation. Oracle8 Server SQL Reference, Release 8.0. Redwood Shores, CA, June 1997.Google Scholar
Pos98 PostgreSQL Home Page, 1998. http://www.postgre,aql .org.Google Scholar
RRH99 V. Raman, B. Raman, and J. M. Hellerstein. Online dynamic reordering for interactive data processing. Technical Report UCB//CSD-99-1043, Computer Science Division, UC Berkeley, 1999. Submitted for publication. Google ScholarDigital Library
RSS94 R. Ramakrishnan, D. Srivastava, and S. Sudarshan. Rtde ordering in bottom-up fixpoint evaluation of logic prograrrts. Trans. Knowledge and Data Engrg., 6(4):501-517, 1994. Google ScholarDigital Library
WA91 A. N. Wilschut and P. M. G. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Info. Sys. (PDIS), pages 68-77. IEEE Computer Society Press, 1991. Google ScholarDigital Library

Index Terms

Ripple joins for online aggregation
1. Information systems
  1. Data management systems
    1. Database design and models

Recommendations

Ripple joins for online aggregation
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
Read More
Reducing outer joins

We present a method for transforming some outer joins to inner joins and describe a generalized semijoin reduction technique. The first part of the paper shows how to transform a given outer join query whose join graph is a tree to an equivalent inner ...
Read More
Optimization of joins using random record generation method
A2CWiC '10: Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in India

Joins are statements that retrieve data from more than one table. A Join is characterized by multiple tables in the FROM clause, and the relationship between the tables is defined through the existence of a Join condition in the WHERE clause. In the ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in
ACM SIGMOD Record Volume 28, Issue 2
June 1999
599 pages
ISSN:0163-5808
DOI:10.1145/304181
Chairmen:
Susan Davidson
Univ. of Pennsylvania
,
Christos Faloutsos
Carnegie Mellon Univ.
Issue’s Table of Contents
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
June 1999
604 pages
ISBN:1581130848
DOI:10.1145/304182
Chairmen:
Susan B. Davidson
Univ. of Pennsylvania, Philidelphia
,
Christos Faloutsos
Carnegie Mellon Univ., Pittsburgh
Copyright © 1999 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 1 June 1999
Check for updates
Qualifiers
- article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 264
  Total Citations
  View Citations
- 2,357
  Total Downloads
- Downloads (Last 12 months)215
- Downloads (Last 6 weeks)22
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Ripple joins for online aggregation

ACM SIGMOD Record

Abstract

References

Cited By

Index Terms

Recommendations

Ripple joins for online aggregation

Reducing outer joins

Optimization of joins using random record generation method