skip to main content
10.1145/304182.304203acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article
Free Access

A comparison of selectivity estimators for range queries on metric attributes

Authors Info & Claims
Published:01 June 1999Publication History

ABSTRACT

In this paper, we present a comparison of nonparametric estimation methods for computing approximations of the selectivities of queries, in particular range queries. In contrast to previous studies, the focus of our comparison is on metric attributes with large domains which occur for example in spatial and temporal databases. We also assume that only small sample sets of the required relations are available for estimating the selectivity. In addition to the popular histogram estimators, our comparison includes so-called kernel estimation methods. Although these methods have been proven to be among the most accurate estimators known in statistics, they have not been considered for selectivity estimation of database queries, so far. We first show how to generate kernel estimators that deliver accurate approximate selectivities of queries. Thereafter, we reveal that two parameters, the number of samples and the so-called smoothing parameter, are important for the accuracy of both kernel estimators and histogram estimators. For histogram estimators, the smoothing parameter determines the number of bins (histogram classes). We first present the optimal smoothing parameter as a function of the number of samples and show how to compute approximations of the optimal parameter. Moreover, we propose a new selectivity estimator that can be viewed as an hybrid of histogram and kernel estimators. Experimental results show the performance of different estimators in practice. We found in our experiments that kernel estimators are most efficient for continuously distributed data sets, whereas for our real data sets the hybrid technique is most promising.

References

  1. 1.Chert, C.M. & Roussopoulos, N. "Adaptive Selectivity Estimation Using Query Feedback" Proc. ACM-SIG- MOD Intl. Conf. on Management of Data 1994. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. 2.Ioannidis, Yannis E. & Christodoulakis, S. "Optimal Histograms for Limiting Worst-Case Error Propagation in the Size of Join Results" TODS 18(4) 1993.709-748. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. 3.Gregory Piatetsky-Shapiro & Charles Connell. "Accurate Estimation of the Number of Tuples Satisfying a Condition" SIGMOD Conference 1984. 256-276. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. 4.Yossi Matias, Jeffrey Scott Vitter, Min Wang. "Wavelet-Based Histograms for Selectivity Estimation" SIG- MOD Conference 1998.448-459. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. 5.Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya. "Random Sampling for Histogram Con-. struction: How much is enough?" SIGMOD Conference 1998.436-447. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. 6.Joseph M. Hellerstein, Peter J. Haas, Helen Wang. "Online Aggregation" SIGMOD Conference 1997. 171-182. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. 7.H. V. Jagadish, Nick Koudas, S. Muthukrishnan, Viswanath Poosala, Kenneth C. Sevcik, Torsten Sue,1. "Optimal Histograms with Quality Guarantees" VLDB 1998. 275-286. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. 8.Viswanath Poosala, Yannis E. Ioannidis, Peter J. Haas, Eugene J. Shekita. "Improved Histograms for Selectivity Estimation of Range Predicates" SIGMOD Conference 1996. 294-305. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. 9.Michael V. Mannino, Paicheng Chu, Thomas Sager. "Statistical Profile Estimation in Database Systems" Computing Surveys 20(3) 1988. 191-221. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. 10.Gasser, T. & Engel, J. & Seifert, B. ,,Nonparametric function estimation" in: Rao (Ed.),"Handbook of Sta.tistics Vol. 9", North Holland 1993.Google ScholarGoogle Scholar
  11. 11.David W. Scott. "Multivariate Density Estimation" Wiley & Sons 1992.Google ScholarGoogle Scholar
  12. 12.Selinger, P.G. & Astrahan, M.M. & Chamberlin, D.D. & Lorie, R.A. & Price, T.T. ,,Access path selection in a relational database management system" Proc. ACM SGMOD Conference 1979.23-34. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. 13.Silverman, B.W. ,,Density Estimation for Statistics and Data Analysis" Chapman & Hall 1986.Google ScholarGoogle Scholar
  14. 14.Simonoff, J. & Dong, J. "The Construction and Properties of Boundary Kernels for Sparse Multinomials" Journal of Computational and Graphical Statistics 1994.Google ScholarGoogle Scholar
  15. 15.Wand, M.P. & Jones, M.C. ,,Kernel Smoothing" Chapman & Hall 1995.Google ScholarGoogle Scholar
  16. 16.Brodsky, B.E. & Darkhovsky, B.S. "Nonparametric Methods in change-point problems" Kluwer Academic Publishers 1993.Google ScholarGoogle Scholar
  17. 17.http://www.mathematik.uni-marburg.de/DBS/download/data.htmlGoogle ScholarGoogle Scholar
  18. 18.http://www, tiger, gov/Google ScholarGoogle Scholar
  19. 19.http://www.kdnuggets.com/datasets,htmlGoogle ScholarGoogle Scholar

Index Terms

  1. A comparison of selectivity estimators for range queries on metric attributes

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in
        • Published in

          cover image ACM Conferences
          SIGMOD '99: Proceedings of the 1999 ACM SIGMOD international conference on Management of data
          June 1999
          604 pages
          ISBN:1581130848
          DOI:10.1145/304182

          Copyright © 1999 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 1 June 1999

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • Article

          Acceptance Rates

          Overall Acceptance Rate785of4,003submissions,20%

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader