skip to main content
10.1145/1807167.1807239acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Results Reproduced / v1.1

Histograms reloaded: the merits of bucket diversity

Published:06 June 2010Publication History

ABSTRACT

Virtually all histograms store for each bucket the number of distinct values it contains and their average frequency. In this paper, we question this paradigm. We start out by investigating the estimation precision of three commercial database systems which also follow the above paradigm. It turns out that huge errors are quite common. We then introduce new bucket types and investigate their accuracy when building optimal histograms with them. The results are ambiguous. There is no clear winner among the bucket types. At this point, we (1) switch to heterogeneous histograms, where different buckets of the same histogram possibly are of different types, and (2) design more bucket types. The nice consequence of introducing heterogeneous histograms is that we can guarantee decent upper error bounds while at the same time heterogeneous histograms require far less space than homogeneous histograms.

References

  1. M. Charikar, S. Chaudhuri, R. Motwani, and V. Narasayya. Towards estimation error guarantees for distinct values. In Proc. ACM SIGMOD/SIGACT Conf. on Princ. of Database Syst. (PODS), pages 268--279, 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. T. Cormen, C. Leiserson, and R. Rivest. Introduction to Algorithms. MIT Press, 1990. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. P. Gibbons. Distinct sampling for highly-accurate answers to distinct values queries and event reports. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 541--550, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. L. Haas, M. Carey, M. Livny, and A. Shukla. Seeking the truth about ad hoc join costs. The VLDB Journal, 6(3):241--256, 1997. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Y. Ioannidis. Universality of serial histograms. In Proc. Int. Conf. on Very Large Data Bases (VLDB), pages 256--267, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Y. Ioannidis and S. Christodoulakis. Optimal histograms for limiting worst-case error propagation in the size of join results. ACM Trans. on Database Systems, 18(4):709--748, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Y. Ioannidis and V. Poosola. Balancing histogram optimality and practicality for query result size estimation. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 233--244, 1995. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Y. Ioannidis and V. Poosola. Histogram-based solutions to diverse database estimation problems. IEEE Data Engineering Bulletin, 18(3):10--18, Sept 1995.Google ScholarGoogle Scholar
  9. R. Kooi. The Optimization of Queries in Relational Databases. PhD thesis, Case Western Reserve University, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. G. Moerkotte. Profiles for single attributes. Technical Report TR-2010-003, Department of Mathematics and Computer Science, University of Mannheim, Mannheim, Germany, 2010.Google ScholarGoogle Scholar
  11. G. Moerkotte, T. Neumann, and G. Steidl. Preventing bad plans by bounding the impact of cardinality estimation errors. Proc. Int. Conf. on Very Large Data Bases (VLDB), 2(1):982--993, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. B. Piatetsky-Shapiro and C. Connell. Accurate estimation of the number of tuples satisfying a condition. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 256--276, 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. V. Poosola, Y. Ioannidis, P. Haas, and E. Shekita. Improved histograms for selectivity estimates of range predicates. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 294--305, 1996. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Histograms reloaded: the merits of bucket diversity

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            SIGMOD '10: Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
            June 2010
            1286 pages
            ISBN:9781450300322
            DOI:10.1145/1807167

            Copyright © 2010 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 6 June 2010

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate785of4,003submissions,20%

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader