ABSTRACT
This paper considers the problem of scalably processing a large number of continuous queries. We propose a flexible framework with novel data structures and algorithms for group-processing and indexing continuous queries by exploiting potential overlaps in query predicates. Our approach partitions the collection of continuous queries into groups based on the clustering patterns of the query ranges, and then applies specialized processing strategies to those heavily-clustered groups (or hotspots). To maintain the partition dynamically, we present efficient algorithms that maintain a nearly optimal partition in nearly amortized logarithmic time. We show how to use the hotspots to scalably process large numbers of continuous select-join and band-join queries, which are much more challenging than simple range selection queries. Experiments demonstrate that this approach can improve the processing throughput by orders of magnitude. As another application of hotspots, we show how to use them to build a high-quality histogram for intervals in linear time.
- {1} P. K. Agarwal, J. Xie, J. Yang, and H. Yu. Scalable continuous queries processing by tracking hotspots. Technical report, Department of Computer Science, Duke University, 2006. http://www.cs.duke.edu/~junyi/papers/joincq/vldb06-full.pdf.Google Scholar
- {2} P. K. Agarwal, J. Xie, J. Yang, and H. Yu. Monitoring continuous band-join queries over dynamic data. In Proc. of the 16th Intl. Sympos. Algorithms and Computation, pages 349-359, 2005. Google ScholarDigital Library
- {3} L. Arge and J. Vitter. Optimal external memory interval management. SIAM J. Comput., 32(6):1488-1508, 2003. Google ScholarDigital Library
- {4} R. Avnur and J. M. Hellerstein. Eddies: Continuously adaptive query processing. In Proc. of the 19th ACM SIGMOD Intl. Conf. on Management of Data, pages 261-272, 2000. Google ScholarDigital Library
- {5} D. Carney, U. Çetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. B. Zdonik. Monitoring streams - a new class of data management applications. In Proc. of the 28th Intl. Conf. on Very Large Data Bases, pages 215-226, 2002. Google ScholarDigital Library
- {6} S. Chandrasekaran and M. J. Franklin. Psoup: a system for streaming queries over streaming data. VLDB Journal, 12(2):140-156, 2003. Google ScholarDigital Library
- {7} J. Chen, D. J. DeWitt, F. Tian, and Y. Wang. NiagraCQ: A scalable continuous query system for internet databases. In Proc. of the 19th ACM SIGMOD Intl. Conf. on Management of Data, pages 379-390, 2000. Google ScholarDigital Library
- {8} M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf. Computational Geometry: Algorithms and Applications. Springer-Verlag, 2nd edition, 2000. Google ScholarDigital Library
- {9} D. J. DeWitt, J. F. Naughton, and D. A. Schneider. An evaluation of non-equijoin algorithms. In Proc. of the 17th Intl. Conf. on Very Large Data Bases, pages 443-452, 1991. Google ScholarDigital Library
- {10} J.-P. Dittrich, P. M. Fischer, and D. Kossmann. Agile: adaptive indexing for context-aware information filters. In Proc. of the 24th ACM SIGMOD Intl. Conf. on Management of Data, pages 215-226, 2005. Google ScholarDigital Library
- {11} Q. Du, V. Faber, and M. Gunzburger. Centroidal Voronoi tessellations: Applications and algorithms. SIAM Reviews, 41:637-676, 1999. Google ScholarDigital Library
- {12} E. Hanson and T. Johnson. The interval skip list: A data structure for finding all intervals that overlap a point. In Proc. of the 2nd Workshop on Algorithms and Data Structures, pages 153-164, 1991.Google ScholarCross Ref
- {13} E. N. Hanson, C. Carnes, L. Huang, M. Konyala, L. Noronha, S. Parthasarathy, J. B. Park, and A. Vernon. Scalable trigger processing. In Proc. of the 15th Intl. Conf. on Data Engineering, pages 266-275, 1999. Google ScholarDigital Library
- {14} S. Har-Peled and S. Mazumdar. Coresets for k-means and k-median clustering and their applications. In Proc. of the 36th Annu. Sympos. Theory of Computing, pages 291-300, 2004. Google ScholarDigital Library
- {15} H. V. Jagadish, N. Koudas, S. Muthukrishnan, V. Poosala, K. Sevcik, and T. Suel. Optimal histograms with quality guarantees. In Proc. of the 24th Intl. Conf. on Very Large Data Bases, pages 275-286, 1998. Google ScholarDigital Library
- {16} M. Katz, F. Nielsen, and M. Segal. Maintenance of a piercing set for intervals with applications. Algorithmica, 36(1):59-73, 2003.Google ScholarDigital Library
- {17} N. Koudas, S. Muthukrishnan, and D. Srivastava. Optimal histograms for hierarchical range queries. In Proc. of the 19th ACM Sympos. on Principles of Database Systems, pages 196-204, 2000. Google ScholarDigital Library
- {18} L. Liu, C. Pu, and W. Tang. Continual queries for Internet scale event-driven information delivery. IEEE Trans. on Knowledge and Data Engineering, 11(4):610-628, 1999. Google ScholarDigital Library
- {19} S. Madden, M. Shah, J. Hellerstein, and V. Raman. Continuously adaptive continuous queries over streams. In Proc. of the 21st ACM SIGMOD Intl. Conf. on Management of Data, pages 49-60, 2002. Google ScholarDigital Library
- {20} E. M. McCreight. Priority search trees. SIAM J. Comput., 14:257-276, 1985.Google ScholarDigital Library
- {21} J. Pereira, F. Fabret, H. A. Jacobsen, F. Llirbat, and D. Shasha. Webfilter: A high-throughput XML-based publish and subscribe system. In Proc. of the 27th Intl. Conf. on Very Large Data Bases, pages 723-724, 2001. Google ScholarDigital Library
- {22} Special issue on data stream processing. IEEE Data Eng. Bull., 26(1), 2003.Google Scholar
- {23} J. Widom and S. Ceri. Active Database Systems: Triggers and Rules For Advanced Database Processing. Morgan Kaufmann, 1996. Google ScholarDigital Library
Index Terms
- Scalable continuous query processing by tracking hotspots
Recommendations
Input-sensitive scalable continuous join query processing
This article considers the problem of scalably processing a large number of continuous queries. Our approach, consisting of novel data structures and algorithms and a flexible processing framework, advances the state-of-the-art in several ways. First, ...
Comments