skip to main content
research-article

Asymptotically efficient algorithms for skyline probabilities of uncertain data

Published:02 June 2011Publication History
Skip Abstract Section

Abstract

Skyline computation is widely used in multicriteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied. Some earlier work focused on probabilistic skylines with a given threshold; Atallah and Qi [2009] studied the problem to compute skyline probabilities for all instances of uncertain objects without the use of thresholds, and proposed an algorithm with subquadratic time complexity. In this work, we propose a new algorithm for computing all skyline probabilities that is asymptotically faster: worst-case O(nn log n) time and O(n) space for 2D data; O(n2−1/d logd−1 n) time and O(n logd−2 n) space for d-dimensional data. Furthermore, we study the online version of the problem: Given any query point p (unknown until the query time), return the probability that no instance in the given data set dominates p. We propose an algorithm for answering such an online query for d-dimensional data in O(n1−1/d logd−1 n) time after preprocessing the data in O(n2−1/d logd−1) time and space.

References

  1. Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 1039--1051. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Afshani, P., Agarwal, P. K., Arge, L., Larsen, K. G., and Phillips, J. M. 2011. (Approximate) uncertain skylines. In Proceedings of the 14th International Conference on Database Theory. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Antova, L., Jansen, T., Koch, C., and Olteanu, D. 2008. Fast and simple relational processing of uncertain data. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 983--992. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Atallah, M. J. and Qi, Y. 2009. Computing all skyline probabilities for uncertain data. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). ACM, New York, NY, 279--287. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Benjelloun, O., Sarma, A. D., Halevy, A., and Widom, J. 2006. ULDBs: Databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 953--964. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Bentley, J. L. 1980. Multidimensional divide-and-conquer. Comm. ACM 23, 214--229. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Beskales, G., Soliman, M. A., and IIyas, I. F. 2008. Efficient search for the top-k probable nearest neighbors in uncertain databases. Proc. VLDB Endow. 1, 326--339. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 421--430. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Boulos, J., Dalvi, N., Mandhani, B., Mathur, S., Re, C., and Suciu, D. 2005. MYSTIQ: a system for finding more answers by using probabilities. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 891--893. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Chen, B.-C., LeFevre, K., and Ramakrishnan, R. 2007. Privacy skyline: Privacy with multidimensional adversarial knowledge. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 770--781. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Cheng, R., Chen, J., Mokbel, M., and Chow, C.-Y. 2008. Probabilistic verifiers: Evaluating constrained nearest-neighbor queries over uncertain data. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 973--982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2003. Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 551--562. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2004a. Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng. 16, 1112--1127. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Cheng, R., Xia, Y., Prabhakar, S., Shah, R., and Vitter, J. S. 2004b. Efficient indexing methods for probabilistic threshold queries over uncertain data. In Proceedings of the 30th International Conference on Very Large Data Bases VLDB. Vol. 30, VLDB Endowment, 876--887. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Cormode, G., Li, F., and Yi, K. 2009. Semantics of ranking queries for probabilistic data and expected ranks. In Proceedings of the IEEE International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 305--316. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Dellis, E. and Seeger, B. 2007. Efficient computation of reverse skyline queries. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 291--302. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Hua, M., Pei, J., Zhang, W., and Lin, X. 2008. Ranking queries on uncertain data: A probabilistic threshold approach. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 673--686. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Kung, H. T., Luccio, F., and Preparata, F. P. 1975. On finding the maxima of a set of vectors. J. ACM 22, 4, 469--476. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Li, J., Saha, B., and Deshpande, A. 2009. A unified approach to ranking in probabilistic databases. Proc. VLDB Endow. 2, 502--513. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Lian, X. and Chen, L. 2008a. Monochromatic and bichromatic reverse skyline search over uncertain databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 213--226. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Lian, X. and Chen, L. 2008b. Probabilistic ranked queries in uncertain databases. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (EDBT). ACM, New York, NY, 511--522. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Lin, X., Yuan, Y., Zhang, Q., and Zhang, Y. 2007. Selecting stars: The k most representative skyline operator. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 86--95.Google ScholarGoogle Scholar
  23. Ljosa, V. and Singh, A. K. 2008. Top-k spatial joins of probabilistic objects. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 566--575. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Mehlhorn, K. 1984. Data Structures and Algorithms 3: Multi-dimensional Searching and Computational Geometry. Springer-Verlag Berlin. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Morse, M., Patel, J. M., and Jagadish, H. V. 2007. Efficient skyline computation over low-cardinality domains. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 267--278. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Pei, J., Fu, A. W.-C., Lin, X., and Wang, H. 2007a. Computing compressed multidimensional skyline cubes efficiently. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 96--105.Google ScholarGoogle Scholar
  27. Pei, J., Jiang, B., Lin, X., and Yuan, Y. 2007b. Probabilistic skylines on uncertain data. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 15--26. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Preparata, F. and Shamos, M. 1985. Computational Geometry: An Introduction. Springer-Verlag. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Sarma, A. D., Theobald, M., and Widom, J. 2008. Exploiting lineage for confidence computation in uncertain and probabilistic databases. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 1023--1032. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. Singh, S., Mayfield, C., Shah, R., Prabhakar, S., Hambrusch, S., Neville, J., and Cheng, R. 2008. Database support for probabilistic attributes and tuples. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 1053--1061. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. Soliman, M. A., Ilyas, I. F., and Chang, K. C.-C. 2007. Urank: formulation and efficient evaluation of top-k queries in uncertain databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 1082--1084. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Tao, Y., Cheng, R., Xiao, X., Ngai, W. K., Kao, B., and Prabhakar, S. 2005. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 922--933. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Vlachou, A., Doulkeridis, C., Kotidis, Y., and Vazirgiannis, M. 2007. SKYPEER: Efficient subspace skyline computation over distributed data. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 416--425.Google ScholarGoogle Scholar
  34. Willard, D. E. 1985. New data structures for orthogonal range queries. SIAM J. Comput. 14, 232--253. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. Wu, P., Agrawal, D., Egecioglu, O., and El Abbadi, A. 2007. DeltaSky: Optimal maintenance of skyline deletions without exclusive dominance region generation. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 486--495.Google ScholarGoogle ScholarCross RefCross Ref
  36. Zhang, W., Lin, X., Zhang, Y., Wang, W., and Yu, J. X. 2009. Probabilistic skyline operator over sliding windows. In Proceedings of the IEEE International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 1060--1071. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Zhang, X. and Chomicki, J. 2008. Semantics and evaluation of top-k queries in probabilistic databases. Distrib. Datab. 26, 67--126. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Zhu, L., Zhou, S., and Guan, J. 2007. Efficient skyline retrieval on peer-to-peer networks. In Proceedings of the Future Generation Communication and Networking—Volume 02 (FGCN). IEEE Computer Society, Los, Alamitos, CA, 309--314. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Asymptotically efficient algorithms for skyline probabilities of uncertain data

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in

      Full Access

      • Published in

        cover image ACM Transactions on Database Systems
        ACM Transactions on Database Systems  Volume 36, Issue 2
        May 2011
        257 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/1966385
        Issue’s Table of Contents

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 2 June 2011
        • Accepted: 1 February 2011
        • Revised: 1 December 2010
        • Received: 1 May 2010
        Published in tods Volume 36, Issue 2

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article
        • Research
        • Refereed

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader