skip to main content
article
Free Access

Ripple joins for online aggregation

Published:01 June 1999Publication History
Skip Abstract Section

Abstract

We present a new family of join algorithms, called ripple joins, for online processing of multi-table aggregation queries in a relational database management system (DBMS). Such queries arise naturally in interactive exploratory decision-support applications.

Traditional offline join algorithms are designed to minimize the time to completion of the query. In contrast, ripple joins are designed to minimize the time until an acceptably precise estimate of the query result is available, as measured by the length of a confidence interval. Ripple joins are adaptive, adjusting their behavior during processing in accordance with the statistical properties of the data. Ripple joins also permit the user to dynamically trade off the two key performance factors of on-line aggregation: the time between successive updates of the running aggregate, and the amount by which the confidence-interval length decreases at each update. We show how ripple joins can be implemented in an existing DBMS using iterators, and we give an overview of the methods used to compute confidence intervals and to adaptively optimize the ripple join “aspect-ratio” parameters. In experiments with an initial implementation of our algorithms in the POSTGRES DBMS, the time required to produce reasonably precise online estimates was up to two orders of magnitude smaller than the time required for the best offline join algorithms to produce exact answers.

References

  1. AS72 M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions. Dover, New York, 1972. Ninth printing. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Bil86 P. Billingsley. Probability and Measure. Wiley, New York, 1986.Google ScholarGoogle Scholar
  3. CGL83 T. F. Chan, G. H. Golub, and R. J. LeVeque. Algorithms for computing She sample variance: Analysis and recommendation. Amer. Statist., 37:242-247, 1983.Google ScholarGoogle ScholarCross RefCross Ref
  4. DKO+84 D. J. DeWitt, R. H. Katz, F. Olken, L. D. Shapiro, M. R. Michael R. Stonebraker, and D. Wood. Implementation Techniques for Main Memory Database Systems. In Proc. 198g A CM SIGMOD Intl. Conf. Managment of Data, pages 1-8. ACM Press, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. GG97 J. Gray and G. Graefe. The five-minute rule ten years later and other computer storage rules of thumb. SIGMOD Record, 26(4), 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Gra93 G. Graefe. Query Evaluation Techniques for Large Databases. ACM Comput. Surveys, 25(2):73-170, June 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Haa97 P. J. Haas. Large-sample and deterministic confidence intervals for online aggregation. In Proc. Ninth Intl. Con}. Scientific and Statist. Database Management, pages 51-63. I}EEE Computer Society Press, 1997. Google ScholarGoogle Scholar
  8. HAR99 J. M. Hellerstein, R. Avnur, and V. Raman. informix under CONTROL: Online query processing. Submitted for publication, 1999.Google ScholarGoogle Scholar
  9. HH98 P. J. Haas and J. M. Hellerstein. Join algorithms for online aggregation. IBM Research Report RJ 10126, IBM Almaden Research Center, San Jose, CA, 1998.Google ScholarGoogle Scholar
  10. HHW97 J. M. Hellerstein, P. J. Haas, and H. J. Wang. Online aggregation. In Proc. 1997 ACM SIGMOD Intl. Conf. Managment of Data, pages 171-182. ACM Press, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. HN96 J. M. Hellerstein and J. F. Naughton. Query Execution Techniques for Caching Expensive Methods. in Proc. 1996 A CM SIGMOD Intl. Conf. Managment of Data, pages 423- 424. ACM Press, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. HNS94 P. J. Haas, J. F. Naughton, and A. N. Swami. On the relative cost of sampling for join selectivity estimation. In Proc. Thirteenth A CM SIGA CT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 14-24. ACM Press, 1994. Google ScholarGoogle Scholar
  13. HNSS96 P. J. Haas, J. F. Naughton, S. Seshadri, and A. N. Swami. Selectivity and cost estimation for joins based on ,'andom sampling. J. Comput. System Sci., 52:550-569, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Hoe48 W. Hoeffding. A class of statistics with asymptotically normal distribution. Ann. Math. Statist., 19:293-325, 1948.Google ScholarGoogle ScholarCross RefCross Ref
  15. HOT88 W. Hou, G. Ozsoyoglu, and B. Taneja. Statistical estimators for relational algebra expressions. In Proc. Seventh A CM SIGACT-SIGMOD-SIGART Syrup. Principles of Database Sys., pages 276-287. ACM Press, 1988. Google ScholarGoogle Scholar
  16. HOT89 W. Hou, G. Ozsoyoglu, and B. Taneja. Processing aggregate relational queries with hard time constraints. In Proc. 1989 A CM SIGMOD Intl. Conf. Managment of Data, pages 68-77. ACM Press, 1989. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. IBM97 IBM Corporation. IBM DB2 Universal Database Administration Guide, Version 5. North York, Ontario, Canada, 1997.Google ScholarGoogle Scholar
  18. Inf97 Informix Corporation. Informix Universal Server G~ide to SQL: Syntax, Version 9.01. Menlo Park, CA, March 1997.Google ScholarGoogle Scholar
  19. ODT+91 G. Ozsoyoglu, K. Du, A. Tjahjana, W. Hou, and D. Y. Rowland. On estimating COUNT, SUM, and AVERAGE relational algebra queries. In D. Dimitris Karagiannis, editor, Database and Expert Systems Applications, Proceedings of the International Conference in Berlin, Germany, 1991 (DEXA 91), pages 406-412. Springer-Verlag, 1991.Google ScholarGoogle Scholar
  20. OJ93 V. O'day and R. Jeffries. Orienteering in an informati:on landscape: How information seekers get from here to there. In Human Factors in Computing Systems: INTERCHI '93 Conf. Proc., pages 438-445. ACM Press, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Olk93 F. Olken. Random Sampling .from Databases. Ph.D. Dissertation, University of California, Berkeley, CA, 1993. Available as Tech. Report LBL-32883, Lawrence Berkeley Laboratories, Berkeley, CA.Google ScholarGoogle Scholar
  22. Ora97 Oracle Corporation. Oracle8 Server SQL Reference, Release 8.0. Redwood Shores, CA, June 1997.Google ScholarGoogle Scholar
  23. Pos98 PostgreSQL Home Page, 1998. http://www.postgre,aql .org.Google ScholarGoogle Scholar
  24. RRH99 V. Raman, B. Raman, and J. M. Hellerstein. Online dynamic reordering for interactive data processing. Technical Report UCB//CSD-99-1043, Computer Science Division, UC Berkeley, 1999. Submitted for publication. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. RSS94 R. Ramakrishnan, D. Srivastava, and S. Sudarshan. Rtde ordering in bottom-up fixpoint evaluation of logic prograrrts. Trans. Knowledge and Data Engrg., 6(4):501-517, 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. WA91 A. N. Wilschut and P. M. G. Apers. Dataflow query execution in a parallel main-memory environment. In Proc. First Intl. Conf. Parallel and Distributed Info. Sys. (PDIS), pages 68-77. IEEE Computer Society Press, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Ripple joins for online aggregation

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image ACM SIGMOD Record
      ACM SIGMOD Record  Volume 28, Issue 2
      June 1999
      599 pages
      ISSN:0163-5808
      DOI:10.1145/304181
      Issue’s Table of Contents
      • cover image ACM Conferences
        SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
        June 1999
        604 pages
        ISBN:1581130848
        DOI:10.1145/304182

      Copyright © 1999 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 June 1999

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader