skip to main content
article

Aggregate operators in probabilistic databases

Published:01 January 2005Publication History
Skip Abstract Section

Abstract

Though extensions to the relational data model have been proposed in order to handle probabilistic information, there has been very little work to date on handling aggregate operators in such databases. In this article, we present a very general notion of an aggregate operator and show how classical aggregation operators (such as COUNT, SUM, etc.) as well as statistical operators (such as percentiles, variance, etc.) are special cases of this general definition. We devise a formal linear programming based semantics for computing aggregates over probabilistic DBMSs, develop algorithms that satisfy this semantics, analyze their complexity, and introduce several families of approximation algorithms that run in polynomial time. We implemented all of these algorithms and tested them on a large set of data to help determine when each one is preferable.

References

  1. Aitchison, J., and Dunsmore, I. R. 1975. Statistical Prediction Analysis. Cambridge University Press, Cambridge.]]Google ScholarGoogle Scholar
  2. Barbara, D., Garcia-Molina, H., and Porter, D. 1992. The management of probabilistic data. IEEE Trans. Knowl. Data Eng. 4, 5, 487--502.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Bobick, A., and Ivanov, Y. 1998. Action recognition using probabilistic parsing. In Proceedings of the Symposium on Computer Vision and Pattern Recognition (San Barbara, Calif.). IEEE Computer Society Press, Los Alamitos, Calif., 196--202.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Cavallo, R., and Pittarelli, M. 1987. The theory of probabilistic databases. In Proceedings of the 13th International Conference on Very Large Databases (Los Altos, Calif.). Morgan-Kaufmann, San Mateo, Calif., 71--81.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2003. Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD Conference. ACM, New York, 551--562.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Cormen, T. H., Leiserson, C. E., and Rivest, R. L. 1992. Introduction to Algorithms. MIT Press, Cambridge, Mass.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Cramton, P. 1997. The FCC spectrum auctions: An early assessment. J. Econ. Manage. Strat. 6, 3, 431--495.]]Google ScholarGoogle ScholarCross RefCross Ref
  8. Dekhtyar, A., Ross, R., and Subrahmanian, V. S. 2001. Probabilistic temporal databases. I: Algebra. ACM Trans. Datab. Syst. 26, 1, 41--95.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Dekhtyar, A., and Subrahmanian, V. S. 1997. Hybrid probabilistic programs. In Proceedings of the 14th International Conference on Logic Program. MIT Press, Cambridge, Mass., 391--405.]]Google ScholarGoogle Scholar
  10. Dey, D., and Sarkar, S. 1996. A probabilistic relational model and algebra. ACM Trans. Datab. Syst. 21, 3, 339--369.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Dyreson, C. E., and Snodgrass, R. T. 1998. Supporting valid-time indeterminacy. ACM Trans. Datab. Syst. 23, 1, 1--57.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Eiter, T., Lukasiewicz, T., and Walter, M. 2000. Extension of the relational algebra to probabilistic complex values. In Proceedings of the 1st International Symposium on Foundations of Information and Knowledge Systems. Springer, Heidelberg, Germany, 94--115.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Fagin, R., Halpern, J. Y., and Megiddo, N. 1990. A logic for reasoning about probabilities. Inf. Comput. 87, 1/2, 78--128.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Fitting, M. 1991. Bilattices and the semantics of logic programming. J. Logic Program. 11, 1--2, 91--116.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Fuhr, N., and Pfeifer, U. 1994. Probabilistic information retrieval as a combination of abstraction, inductive learning, and probabilistic assumptions. ACM Trans. Inf. Syst. 12, 1, 92--115.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Fuhr, N., and Rolleke, T. 1996. A probabilistic NF2 relational algebra for integrated information retrieval and database systems. In Proceedings of the 2nd World Conference on Integrated Design and Process Technology. Society for Design and Process Science, Austin, Tex. 17--30.]]Google ScholarGoogle Scholar
  17. Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, San Francisco, Calif.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Gee, J., Briquer, L., Barillot, C., and Haynor, D. 1995. Probabilistic matching of brain images. In Information Processing in Medical Imaging. Kluwer Academic Publishers, Dordrecht, 113--125.]]Google ScholarGoogle Scholar
  19. Kiessling, W., Thone, H., and Guntzer, U. 1992. Database support for problematic knowledge. In Proceedings of the 3rd International Conference on Extending Database Technology. Springer-Verlag, Berlin, Germany, 421--436.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Kifer, M., and Li, A. 1988. On the semantics of rule-based expert systems with uncertainty. In Proceedings of the 2nd International Conference on Database Theory. Springer-Verlag, New York, 102--117.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Korth, H., and Silberschatz, A. 1991. Database System Concepts, 2nd ed. McGraw-Hill, New York.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Lakshmanan, L. V. S., Leone, N., Ross, R., and Subrahmanian, V. S. 1997. ProbView: A flexible probabilistic database system. ACM Trans. Datab. Syst. 22, 3, 419--469.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Lakshmanan, L. V. S., and Sadri, F. 1994a. Modeling uncertainty in deductive databases. In Proceedings of the 5th International Conference on Database and Expert Systems Applications. Springer, Berlin, Germany, 724--733.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Lakshmanan, L. V. S., and Sadri, F. 1994b. Probabilistic deductive databases. In Proceedings of the 11th International Logic Programing Symposium (London, England). MIT Press, Cambridge, Mass., 254--268.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Lakshmanan, L. V. S., and Shiri, N. 2001. A parametric approach to deductive databases with uncertainty. IEEE Trans. Knowl. Data Eng. 13, 4, 554--570.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Mandelbaum, R., Kamberova, G., and Mintz, M. 1998. Stereo depth estimation: A confidence interval approach. In Proceedings of the 6th International Conference on Computer Vision. Narosa Publishing House, New Delhi, 503--509.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. Ng, R., and Subrahmanian, V. S. 1992. Probabilistic logic programming. Inf. Comput. 101, 2, 150--201.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Ng, R., and Subrahmanian, V. S. 1994. Stable semantics for probabilistic deductive databases. Inf. Comput. 110, 1, 42--83.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Oliver, N., Rosario, B., and Pentland, A. 1998. Statistical modeling of human interactions. In Proceedings of the Symposium on Computer Vision and Pattern Recognition (Santa Barbara, Calif.). IEEE Computer Society Press, Los Alamitos, Calif., 39--46.]]Google ScholarGoogle Scholar
  30. Ross, R., Subrahmanian, V. S., and Grant, J. 2002. Probabilistic aggregates. In Proceedings of the 13th International Symposium on Methodologies for Intelligent System. Springer, Berlin, Germany, 553--564.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Shiri, N. 1997. On a generalized theory of deductive databases. Ph.D. thesis, Concordia University, Montreal.]]Google ScholarGoogle Scholar
  32. Street, W. N., Mangasarian, O. L., and Wolberg, W. H. 1995. An inductive learning approach to prognostic prediction. In Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, Calif. 522--530.]]Google ScholarGoogle Scholar
  33. Subrahmanian, V. S. 1987. On the semantics of quantitative logic programs. In Proceedings of the 4th Symposium on Logic Programming (Washington, D.C.). IEEE Computer Society Press, Los Alamitos, Calif., 173--182.]]Google ScholarGoogle Scholar
  34. Thirunarayan, K., and Kifer, M. 1993. A theory of nonmonotonic inheritance based on annotated logic. Artif. Intel. 60, 1, 23--50.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. van Emden, M. H. 1986. Quantitative deduction and its fixpoint theory. J. Logic Program. 3, 1, 37--53.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. van Rijsbergen, C. J. 1979. Information Retrieval. Butterworth, London, England.]] Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Wolfram, C. D. 1998. Strategic bidding in a multi-unit auction: An empirical analysis of bids to supply electricity in England and Wales. RAND J. Econ. 29, 4, 703--773.]]Google ScholarGoogle ScholarCross RefCross Ref
  38. Yamanishi, K., and Takeuchi, J. 2001. Discovering outlier filtering rules from unlabeled data: Combining a supervised learner with an unsupervised learner. In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining. ACM, New York, 389--394.]] Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Aggregate operators in probabilistic databases

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    • Published in

      cover image Journal of the ACM
      Journal of the ACM  Volume 52, Issue 1
      January 2005
      146 pages
      ISSN:0004-5411
      EISSN:1557-735X
      DOI:10.1145/1044731
      Issue’s Table of Contents

      Copyright © 2005 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 1 January 2005
      Published in jacm Volume 52, Issue 1

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • article

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader