Abstract
Skyline computation is widely used in multicriteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied. Some earlier work focused on probabilistic skylines with a given threshold; Atallah and Qi [2009] studied the problem to compute skyline probabilities for all instances of uncertain objects without the use of thresholds, and proposed an algorithm with subquadratic time complexity. In this work, we propose a new algorithm for computing all skyline probabilities that is asymptotically faster: worst-case O(n √n log n) time and O(n) space for 2D data; O(n2−1/d logd−1 n) time and O(n logd−2 n) space for d-dimensional data. Furthermore, we study the online version of the problem: Given any query point p (unknown until the query time), return the probability that no instance in the given data set dominates p. We propose an algorithm for answering such an online query for d-dimensional data in O(n1−1/d logd−1 n) time after preprocessing the data in O(n2−1/d logd−1) time and space.
- Abrahamson, K. 1987. Generalized string matching. SIAM J. Comput. 16, 1039--1051. Google ScholarDigital Library
- Afshani, P., Agarwal, P. K., Arge, L., Larsen, K. G., and Phillips, J. M. 2011. (Approximate) uncertain skylines. In Proceedings of the 14th International Conference on Database Theory. Google ScholarDigital Library
- Antova, L., Jansen, T., Koch, C., and Olteanu, D. 2008. Fast and simple relational processing of uncertain data. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los Alamitos, CA, 983--992. Google ScholarDigital Library
- Atallah, M. J. and Qi, Y. 2009. Computing all skyline probabilities for uncertain data. In Proceedings of the 28th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS). ACM, New York, NY, 279--287. Google ScholarDigital Library
- Benjelloun, O., Sarma, A. D., Halevy, A., and Widom, J. 2006. ULDBs: Databases with uncertainty and lineage. In Proceedings of the 32nd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 953--964. Google ScholarDigital Library
- Bentley, J. L. 1980. Multidimensional divide-and-conquer. Comm. ACM 23, 214--229. Google ScholarDigital Library
- Beskales, G., Soliman, M. A., and IIyas, I. F. 2008. Efficient search for the top-k probable nearest neighbors in uncertain databases. Proc. VLDB Endow. 1, 326--339. Google ScholarDigital Library
- Börzsönyi, S., Kossmann, D., and Stocker, K. 2001. The skyline operator. In Proceedings of the 17th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 421--430. Google ScholarDigital Library
- Boulos, J., Dalvi, N., Mandhani, B., Mathur, S., Re, C., and Suciu, D. 2005. MYSTIQ: a system for finding more answers by using probabilities. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 891--893. Google ScholarDigital Library
- Chen, B.-C., LeFevre, K., and Ramakrishnan, R. 2007. Privacy skyline: Privacy with multidimensional adversarial knowledge. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 770--781. Google ScholarDigital Library
- Cheng, R., Chen, J., Mokbel, M., and Chow, C.-Y. 2008. Probabilistic verifiers: Evaluating constrained nearest-neighbor queries over uncertain data. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 973--982. Google ScholarDigital Library
- Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2003. Evaluating probabilistic queries over imprecise data. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 551--562. Google ScholarDigital Library
- Cheng, R., Kalashnikov, D. V., and Prabhakar, S. 2004a. Querying imprecise data in moving object environments. IEEE Trans. Knowl. Data Eng. 16, 1112--1127. Google ScholarDigital Library
- Cheng, R., Xia, Y., Prabhakar, S., Shah, R., and Vitter, J. S. 2004b. Efficient indexing methods for probabilistic threshold queries over uncertain data. In Proceedings of the 30th International Conference on Very Large Data Bases VLDB. Vol. 30, VLDB Endowment, 876--887. Google ScholarDigital Library
- Cormode, G., Li, F., and Yi, K. 2009. Semantics of ranking queries for probabilistic data and expected ranks. In Proceedings of the IEEE International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 305--316. Google ScholarDigital Library
- Dellis, E. and Seeger, B. 2007. Efficient computation of reverse skyline queries. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 291--302. Google ScholarDigital Library
- Hua, M., Pei, J., Zhang, W., and Lin, X. 2008. Ranking queries on uncertain data: A probabilistic threshold approach. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 673--686. Google ScholarDigital Library
- Kung, H. T., Luccio, F., and Preparata, F. P. 1975. On finding the maxima of a set of vectors. J. ACM 22, 4, 469--476. Google ScholarDigital Library
- Li, J., Saha, B., and Deshpande, A. 2009. A unified approach to ranking in probabilistic databases. Proc. VLDB Endow. 2, 502--513. Google ScholarDigital Library
- Lian, X. and Chen, L. 2008a. Monochromatic and bichromatic reverse skyline search over uncertain databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 213--226. Google ScholarDigital Library
- Lian, X. and Chen, L. 2008b. Probabilistic ranked queries in uncertain databases. In Proceedings of the 11th International Conference on Extending Database Technology: Advances in Database Technology (EDBT). ACM, New York, NY, 511--522. Google ScholarDigital Library
- Lin, X., Yuan, Y., Zhang, Q., and Zhang, Y. 2007. Selecting stars: The k most representative skyline operator. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 86--95.Google Scholar
- Ljosa, V. and Singh, A. K. 2008. Top-k spatial joins of probabilistic objects. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 566--575. Google ScholarDigital Library
- Mehlhorn, K. 1984. Data Structures and Algorithms 3: Multi-dimensional Searching and Computational Geometry. Springer-Verlag Berlin. Google ScholarDigital Library
- Morse, M., Patel, J. M., and Jagadish, H. V. 2007. Efficient skyline computation over low-cardinality domains. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 267--278. Google ScholarDigital Library
- Pei, J., Fu, A. W.-C., Lin, X., and Wang, H. 2007a. Computing compressed multidimensional skyline cubes efficiently. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 96--105.Google Scholar
- Pei, J., Jiang, B., Lin, X., and Yuan, Y. 2007b. Probabilistic skylines on uncertain data. In Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 15--26. Google ScholarDigital Library
- Preparata, F. and Shamos, M. 1985. Computational Geometry: An Introduction. Springer-Verlag. Google ScholarDigital Library
- Sarma, A. D., Theobald, M., and Widom, J. 2008. Exploiting lineage for confidence computation in uncertain and probabilistic databases. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 1023--1032. Google ScholarDigital Library
- Singh, S., Mayfield, C., Shah, R., Prabhakar, S., Hambrusch, S., Neville, J., and Cheng, R. 2008. Database support for probabilistic attributes and tuples. In Proceedings of the IEEE 24th International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 1053--1061. Google ScholarDigital Library
- Soliman, M. A., Ilyas, I. F., and Chang, K. C.-C. 2007. Urank: formulation and efficient evaluation of top-k queries in uncertain databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD). ACM, New York, NY, 1082--1084. Google ScholarDigital Library
- Tao, Y., Cheng, R., Xiao, X., Ngai, W. K., Kao, B., and Prabhakar, S. 2005. Indexing multi-dimensional uncertain data with arbitrary probability density functions. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB). VLDB Endowment, 922--933. Google ScholarDigital Library
- Vlachou, A., Doulkeridis, C., Kotidis, Y., and Vazirgiannis, M. 2007. SKYPEER: Efficient subspace skyline computation over distributed data. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 416--425.Google Scholar
- Willard, D. E. 1985. New data structures for orthogonal range queries. SIAM J. Comput. 14, 232--253. Google ScholarDigital Library
- Wu, P., Agrawal, D., Egecioglu, O., and El Abbadi, A. 2007. DeltaSky: Optimal maintenance of skyline deletions without exclusive dominance region generation. In Proceedings of the IEEE 23rd International Conference on Data Engineering (ICDE). 486--495.Google ScholarCross Ref
- Zhang, W., Lin, X., Zhang, Y., Wang, W., and Yu, J. X. 2009. Probabilistic skyline operator over sliding windows. In Proceedings of the IEEE International Conference on Data Engineering. IEEE Computer Society, Los, Alamitos, CA, 1060--1071. Google ScholarDigital Library
- Zhang, X. and Chomicki, J. 2008. Semantics and evaluation of top-k queries in probabilistic databases. Distrib. Datab. 26, 67--126. Google ScholarDigital Library
- Zhu, L., Zhou, S., and Guan, J. 2007. Efficient skyline retrieval on peer-to-peer networks. In Proceedings of the Future Generation Communication and Networking—Volume 02 (FGCN). IEEE Computer Society, Los, Alamitos, CA, 309--314. Google ScholarDigital Library
Index Terms
- Asymptotically efficient algorithms for skyline probabilities of uncertain data
Recommendations
Computing all skyline probabilities for uncertain data
PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsSkyline computation is widely used in multi-criteria decision making. As research in uncertain databases draws increasing attention, skyline queries with uncertain data have also been studied, e.g. probabilistic skylines. The previous work requires "...
Continuous probabilistic skyline queries over uncertain data streams
DEXA'10: Proceedings of the 21st international conference on Database and expert systems applications: Part IRecently, some approaches of finding probabilistic skylines on uncertain data have been proposed. In these approaches, a data object is composed of instances, each associated with a probability. The probabilistic skyline is then defined as a set of non-...
An efficient scheme for probabilistic skyline queries over distributed uncertain data
Uncertain data has already widely existed in many practical applications, such as sensor networks, RFID networks, location-based services and mobile object management, etc. The skyline queries over uncertain data as an important aspect of uncertain data ...
Comments