Abstract
Though extensions to the relational data model have been proposed in order to handle probabilistic information, there has been very little work to date on handling aggregate operators in such databases. In this article, we present a very general notion of an aggregate operator and show how classical aggregation operators (such as COUNT, SUM, etc.) as well as statistical operators (such as percentiles, variance, etc.) are special cases of this general definition. We devise a formal linear programming based semantics for computing aggregates over probabilistic DBMSs, develop algorithms that satisfy this semantics, analyze their complexity, and introduce several families of approximation algorithms that run in polynomial time. We implemented all of these algorithms and tested them on a large set of data to help determine when each one is preferable.
- Aitchison, J., and Dunsmore, I. R. 1975. Statistical Prediction Analysis. Cambridge University Press, Cambridge.]]Google Scholar
- Barbara, D., Garcia-Molina, H., and Porter, D. 1992. The management of probabilistic data. IEEE Trans. Knowl. Data Eng. 4, 5, 487--502.]] Google ScholarDigital Library
- Bobick, A., and Ivanov, Y. 1998. Action recognition using probabilistic parsing. In Proceedings of the Symposium on Computer Vision and Pattern Recognition (San Barbara, Calif.). IEEE Computer Society Press, Los Alamitos, Calif., 196--202.]] Google ScholarDigital Library
- Cavallo, R., and Pittarelli, M. 1987. The theory of probabilistic databases. In Proceedings of the 13th International Conference on Very Large Databases (Los Altos, Calif.). Morgan-Kaufmann, San Mateo, Calif., 71--81.]] Google ScholarDigital Library
- Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2003. Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD Conference. ACM, New York, 551--562.]] Google ScholarDigital Library
- Cormen, T. H., Leiserson, C. E., and Rivest, R. L. 1992. Introduction to Algorithms. MIT Press, Cambridge, Mass.]] Google ScholarDigital Library
- Cramton, P. 1997. The FCC spectrum auctions: An early assessment. J. Econ. Manage. Strat. 6, 3, 431--495.]]Google ScholarCross Ref
- Dekhtyar, A., Ross, R., and Subrahmanian, V. S. 2001. Probabilistic temporal databases. I: Algebra. ACM Trans. Datab. Syst. 26, 1, 41--95.]] Google ScholarDigital Library
- Dekhtyar, A., and Subrahmanian, V. S. 1997. Hybrid probabilistic programs. In Proceedings of the 14th International Conference on Logic Program. MIT Press, Cambridge, Mass., 391--405.]]Google Scholar
- Dey, D., and Sarkar, S. 1996. A probabilistic relational model and algebra. ACM Trans. Datab. Syst. 21, 3, 339--369.]] Google ScholarDigital Library
- Dyreson, C. E., and Snodgrass, R. T. 1998. Supporting valid-time indeterminacy. ACM Trans. Datab. Syst. 23, 1, 1--57.]] Google ScholarDigital Library
- Eiter, T., Lukasiewicz, T., and Walter, M. 2000. Extension of the relational algebra to probabilistic complex values. In Proceedings of the 1st International Symposium on Foundations of Information and Knowledge Systems. Springer, Heidelberg, Germany, 94--115.]] Google ScholarDigital Library
- Fagin, R., Halpern, J. Y., and Megiddo, N. 1990. A logic for reasoning about probabilities. Inf. Comput. 87, 1/2, 78--128.]] Google ScholarDigital Library
- Fitting, M. 1991. Bilattices and the semantics of logic programming. J. Logic Program. 11, 1--2, 91--116.]] Google ScholarDigital Library
- Fuhr, N., and Pfeifer, U. 1994. Probabilistic information retrieval as a combination of abstraction, inductive learning, and probabilistic assumptions. ACM Trans. Inf. Syst. 12, 1, 92--115.]] Google ScholarDigital Library
- Fuhr, N., and Rolleke, T. 1996. A probabilistic NF2 relational algebra for integrated information retrieval and database systems. In Proceedings of the 2nd World Conference on Integrated Design and Process Technology. Society for Design and Process Science, Austin, Tex. 17--30.]]Google Scholar
- Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Company, San Francisco, Calif.]] Google ScholarDigital Library
- Gee, J., Briquer, L., Barillot, C., and Haynor, D. 1995. Probabilistic matching of brain images. In Information Processing in Medical Imaging. Kluwer Academic Publishers, Dordrecht, 113--125.]]Google Scholar
- Kiessling, W., Thone, H., and Guntzer, U. 1992. Database support for problematic knowledge. In Proceedings of the 3rd International Conference on Extending Database Technology. Springer-Verlag, Berlin, Germany, 421--436.]] Google ScholarDigital Library
- Kifer, M., and Li, A. 1988. On the semantics of rule-based expert systems with uncertainty. In Proceedings of the 2nd International Conference on Database Theory. Springer-Verlag, New York, 102--117.]] Google ScholarDigital Library
- Korth, H., and Silberschatz, A. 1991. Database System Concepts, 2nd ed. McGraw-Hill, New York.]] Google ScholarDigital Library
- Lakshmanan, L. V. S., Leone, N., Ross, R., and Subrahmanian, V. S. 1997. ProbView: A flexible probabilistic database system. ACM Trans. Datab. Syst. 22, 3, 419--469.]] Google ScholarDigital Library
- Lakshmanan, L. V. S., and Sadri, F. 1994a. Modeling uncertainty in deductive databases. In Proceedings of the 5th International Conference on Database and Expert Systems Applications. Springer, Berlin, Germany, 724--733.]] Google ScholarDigital Library
- Lakshmanan, L. V. S., and Sadri, F. 1994b. Probabilistic deductive databases. In Proceedings of the 11th International Logic Programing Symposium (London, England). MIT Press, Cambridge, Mass., 254--268.]] Google ScholarDigital Library
- Lakshmanan, L. V. S., and Shiri, N. 2001. A parametric approach to deductive databases with uncertainty. IEEE Trans. Knowl. Data Eng. 13, 4, 554--570.]] Google ScholarDigital Library
- Mandelbaum, R., Kamberova, G., and Mintz, M. 1998. Stereo depth estimation: A confidence interval approach. In Proceedings of the 6th International Conference on Computer Vision. Narosa Publishing House, New Delhi, 503--509.]] Google ScholarDigital Library
- Ng, R., and Subrahmanian, V. S. 1992. Probabilistic logic programming. Inf. Comput. 101, 2, 150--201.]] Google ScholarDigital Library
- Ng, R., and Subrahmanian, V. S. 1994. Stable semantics for probabilistic deductive databases. Inf. Comput. 110, 1, 42--83.]] Google ScholarDigital Library
- Oliver, N., Rosario, B., and Pentland, A. 1998. Statistical modeling of human interactions. In Proceedings of the Symposium on Computer Vision and Pattern Recognition (Santa Barbara, Calif.). IEEE Computer Society Press, Los Alamitos, Calif., 39--46.]]Google Scholar
- Ross, R., Subrahmanian, V. S., and Grant, J. 2002. Probabilistic aggregates. In Proceedings of the 13th International Symposium on Methodologies for Intelligent System. Springer, Berlin, Germany, 553--564.]] Google ScholarDigital Library
- Shiri, N. 1997. On a generalized theory of deductive databases. Ph.D. thesis, Concordia University, Montreal.]]Google Scholar
- Street, W. N., Mangasarian, O. L., and Wolberg, W. H. 1995. An inductive learning approach to prognostic prediction. In Proceedings of the 12th International Conference on Machine Learning. Morgan Kaufmann, San Francisco, Calif. 522--530.]]Google Scholar
- Subrahmanian, V. S. 1987. On the semantics of quantitative logic programs. In Proceedings of the 4th Symposium on Logic Programming (Washington, D.C.). IEEE Computer Society Press, Los Alamitos, Calif., 173--182.]]Google Scholar
- Thirunarayan, K., and Kifer, M. 1993. A theory of nonmonotonic inheritance based on annotated logic. Artif. Intel. 60, 1, 23--50.]] Google ScholarDigital Library
- van Emden, M. H. 1986. Quantitative deduction and its fixpoint theory. J. Logic Program. 3, 1, 37--53.]] Google ScholarDigital Library
- van Rijsbergen, C. J. 1979. Information Retrieval. Butterworth, London, England.]] Google ScholarDigital Library
- Wolfram, C. D. 1998. Strategic bidding in a multi-unit auction: An empirical analysis of bids to supply electricity in England and Wales. RAND J. Econ. 29, 4, 703--773.]]Google ScholarCross Ref
- Yamanishi, K., and Takeuchi, J. 2001. Discovering outlier filtering rules from unlabeled data: Combining a supervised learner with an unsupervised learner. In Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining. ACM, New York, 389--394.]] Google ScholarDigital Library
Index Terms
- Aggregate operators in probabilistic databases
Recommendations
Reengineering Probabilistic Relational Databases with Fuzzy Probability Measures into XML Model
This paper concentrates on modeling probabilistic events with fuzzy probability measures in relational databases and XML Extensible Markup Language. Instead of crisp probability degrees or interval probability degrees, fuzzy sets are applied to ...
Probabilistic Relational Database Applications for Biomedical Informatics
AINAW '08: Proceedings of the 22nd International Conference on Advanced Information Networking and Applications - WorkshopsIn the information age, information is usually communicated and exchanged with heterogeneous databases in a distributed internetworking environment. When multiple heterogeneous databases show different values for the same data item, its actual value is ...
A Software System Development for Probabilistic Relational Database Applications for Biomedical Informatics
WAINA '09: Proceedings of the 2009 International Conference on Advanced Information Networking and Applications WorkshopsProbabilistic relational databases (PDB) have extended from the relational database model by incorporating probability measures to capture the data uncertainty associated with data items. There are many useful applications have been studied such as ...
Comments