ABSTRACT
Many data stream monitoring applications involve rank queries and hence a number of efficient evaluation algorithms are proposed recently. Most of these techniques assume that rank queries are executed directly over the whole data space. However, we observe that many applications often require to perform clustering over the data streams before rank queries are run on each cluster.
To address the problem, we propose a novel algorithm for integral clustering and ranking processing and we refer to such integrated queries as cluster-based rank queries. The algorithm includes two phases, namely the online phase which maintains the required data structures and statistics, and the query phase which uses these data structures to process queries. Extensive experiments indicate that the proposed algorithm is efficient in both space consumption and query processing.
- C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A framework for clustering evolving data streams. In VLDB, pages 81--92, 2003. Google ScholarDigital Library
- G. Cormode, F. Korn, S. Muthukrishnan, and D. Srivastava. Effective computation of biased quantiles over data streams. In ICDE, pages 20--31, 2005. Google ScholarDigital Library
- F. Farnstrom, J. Lewis, and C. Elkan. Scalability for clustering algorithms revisited. SIGKDD Explor. Newsl., 2(1):51--57, 2000. Google ScholarDigital Library
- T. F. Gonzalez. Clustering to minimize the maximum intercluster distance. Theor. Comput. Sci., 38:293--306, 1985.Google ScholarCross Ref
- M. Greenwald and S. Khanna. Space-effcient online computation of quantile summaries. In SIGMOD Conference, pages 58--66, 2001. Google ScholarDigital Library
- M. Greenwald and S. Khanna. Power-conserving computation of order-statistics over sensor networks. In PODS, pages 275--285, 2004. Google ScholarDigital Library
- C. Li, M. Wang, L. Lim, H. Wang, and K. C.-C. Chang. Supporting ranking and clustering as generalized order-by and group-by. In SIGMOD Conference, pages 127--138, 2007. Google ScholarDigital Library
- X. Lin, H. Lu, J. Xu, and J. X. Yu. Continuously maintaining quantile summaries of the most recent n elements over a data stream. In ICDE, pages 362--374, 2004. Google ScholarDigital Library
Index Terms
- Cluster based rank query over multidimensional data streams
Recommendations
Technology of Continuous Query Optimization over Data Streams
ISISE '08: Proceedings of the 2008 International Symposium on Information Science and Engieering - Volume 02The characteristic of data stream is that it has a huge size and its data change continually, which need to be responded quickly, since the times of query is limited. The continuous query and data stream approximate query model are introduced in this ...
Mining query subtopics from search log data
SIGIR '12: Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrievalMost queries in web search are ambiguous and multifaceted. Identifying the major senses and facets of queries from search log data, referred to as query subtopic mining in this paper, is a very important issue in web search. Through search log analysis, ...
Condensative stream query language for data streams
ADC '07: Proceedings of the eighteenth conference on Australasian database - Volume 63In contrast to traditional database queries, a query on stream data is continuous in that it is periodically evaluated over fractions (sliding windows) of the data stream. This introduces challenges beyond those encountered when processing traditional ...
Comments