Abstract
We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications.
Traditional offline join algorithms are designed to minimize the time to completion of the query. In contrast, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. Ripple joins are adaptive, adjusting their behavior during processing in accordance with the statistical properties of the data. Ripple joins also permit the user to dynamically trade off the two key performance factors of on-line aggregation: the time between successive updates of the running aggregate, and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.
- AS72 M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, New York, 1972. Ninth printing. Google ScholarDigital Library
- Bil86 P. Billingsley. Probability and Measure. Wiley, New York, 1986.Google Scholar
- CGL83 T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing She sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.Google ScholarCross Ref
- DKO+84 D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Michael R. Stonebraker, and D. Wood. Implementation Techniques for Main Memory Database Systems. In Proc. 198g A CM SIGMOD Intl. Conf. Managment of Data, pages 1-8. ACM Press, 1984. Google ScholarDigital Library
- GG97 J. Gray and G. Graefe. The five-minute rule ten years later and other computer storage rules of thumb. SIGMOD Record, 26(4), 1997. Google ScholarDigital Library
- Gra93 G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surveys, 25(2):73-170, June 1993. Google ScholarDigital Library
- Haa97 P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. In Proc. Ninth Intl. Con}. Scientific and Statist. Database Management, pages 51-63. I}EEE Computer Society Press, 1997. Google Scholar
- HAR99 J. M. Hellerstein, R. Avnur, and V. Raman. informix under CONTROL: Online query processing. Submitted for publication, 1999.Google Scholar
- HH98 P. J. Haas and J. M. Hellerstein. Join algorithms for online aggregation. IBM Research Report RJ 10126, IBM Almaden Research Center, San Jose, CA, 1998.Google Scholar
- HHW97 J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In Proc. 1997 ACM SIGMOD Intl. Conf. Managment of Data, pages 171-182. ACM Press, 1997. Google ScholarDigital Library
- HN96 J. M. Hellerstein and J. F. Naughton. Query Execution Techniques for Caching Expensive Methods. in Proc. 1996 A CM SIGMOD Intl. Conf. Managment of Data, pages 423- 424. ACM Press, 1996. Google ScholarDigital Library
- HNS94 P. J. Haas, J. F. Naughton, and A. N. Swami. On the relative cost of sampling for join selectivity estimation. In Proc. Thirteenth A CM SIGA CT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 14-24. ACM Press, 1994. Google Scholar
- HNSS96 P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on ,'andom sampling. J. Comput. System Sci., 52:550-569, 1996. Google ScholarDigital Library
- Hoe48 W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:293-325, 1948.Google ScholarCross Ref
- HOT88 W. Hou, G. Ozsoyoglu, and B. Taneja. Statistical estimators for relational algebra expressions. In Proc. Seventh A CM SIGACT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 276-287. ACM Press, 1988. Google Scholar
- HOT89 W. Hou, G. Ozsoyoglu, and B. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. 1989 A CM SIGMOD Intl. Conf. Managment of Data, pages 68-77. ACM Press, 1989. Google ScholarDigital Library
- IBM97 IBM Corporation. IBM DB2 Universal Database Administration Guide, Version 5. North York, Ontario, Canada, 1997.Google Scholar
- Inf97 Informix Corporation. Informix Universal Server G~ide to SQL: Syntax, Version 9.01. Menlo Park, CA, March 1997.Google Scholar
- ODT+91 G. Ozsoyoglu, K. Du, A. Tjahjana, W. Hou, and D. Y. Rowland. On estimating COUNT, SUM, and AVERAGE relational algebra queries. In D. Dimitris Karagiannis, editor, Database and Expert Systems Applications, Proceedings of the International Conference in Berlin, Germany, 1991 (DEXA 91), pages 406-412. Springer-Verlag, 1991.Google Scholar
- OJ93 V. O'day and R. Jeffries. Orienteering in an informati:on landscape: How information seekers get from here to there. In Human Factors in Computing Systems: INTERCHI '93 Conf. Proc., pages 438-445. ACM Press, 1993. Google ScholarDigital Library
- Olk93 F. Olken. Random Sampling .from Databases. Ph.D. Dissertation, University of California, Berkeley, CA, 1993. Available as Tech. Report LBL-32883, Lawrence Berkeley Laboratories, Berkeley, CA.Google Scholar
- Ora97 Oracle Corporation. Oracle8 Server SQL Reference, Release 8.0. Redwood Shores, CA, June 1997.Google Scholar
- Pos98 PostgreSQL Home Page, 1998. http://www.postgre,aql .org.Google Scholar
- RRH99 V. Raman, B. Raman, and J. M. Hellerstein. Online dynamic reordering for interactive data processing. Technical Report UCB//CSD-99-1043, Computer Science Division, UC Berkeley, 1999. Submitted for publication. Google ScholarDigital Library
- RSS94 R. Ramakrishnan, D. Srivastava, and S. Sudarshan. Rtde ordering in bottom-up fixpoint evaluation of logic prograrrts. Trans. Knowledge and Data Engrg., 6(4):501-517, 1994. Google ScholarDigital Library
- WA91 A. N. Wilschut and P. M. G. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Info. Sys. (PDIS), pages 68-77. IEEE Computer Society Press, 1991. Google ScholarDigital Library
Index Terms
- Ripple joins for online aggregation
Recommendations
Ripple joins for online aggregation
SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of dataWe present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support ...
Reducing outer joins
We present a method for transforming some outer joins to inner joins and describe a generalized semijoin reduction technique. The first part of the paper shows how to transform a given outer join query whose join graph is a tree to an equivalent inner ...
Optimization of joins using random record generation method
A2CWiC '10: Proceedings of the 1st Amrita ACM-W Celebration on Women in Computing in IndiaJoins are statements that retrieve data from more than one table. A Join is characterized by multiple tables in the FROM clause, and the relationship between the tables is defined through the existence of a Join condition in the WHERE clause. In the ...
Comments