ABSTRACT
The primary goal of data stream research is to develop space and time efficient solutions for answering continuous on-line summarization queries. Research efforts over the last decade have resulted in a number of efficient algorithms with varying degrees of space and time complexities. While these techniques are developed in a standard CPU setting, many of their applications such as click-fraud detection and network-traffic summarization typically execute on special networking architectures called Network Processing Units (NPUs). These NPUs interface with special associative memories known as Ternary Content Addressable Memories (TCAMs) to provide gigabit rate forwarding at network routers. In this paper, we describe how the integrated architecture of NPU and TCAMs can be exploited towards achieving the goal of developing high-speed stream summarization solutions. We propose two TCAM-conscious solutions for the frequent elements problem in data streams and present a comprehensive evaluation of these techniques on a state-of-the-art networking platform.
- Intel network processing units. http://www.intel.com/, 2006.Google Scholar
- Teja networking systems. http://www.teja.com, 2006.Google Scholar
- D. J. Abadi, D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, M. Stonebraker, N. Tatbul, and S. Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2):120--139, 2003. Google ScholarDigital Library
- N. Bandi, D. Agrawal, and A. E. Abbadi. Fast algorithms for heavy distinct hitters using associative memories. In International Conference on Distributed Computing Systems(ICDCS), 2007. Google ScholarDigital Library
- N. Bandi, S. Schnieder, D. Agrawal, and A. E. Abbadi. Hardware acceleration of database operations using content-addressable memories. In DaMoN, 2005.Google Scholar
- S. Chandrasekaran. Telegraphcq: Continuous dataflow processing for an uncertain world, 2003.Google Scholar
- M. Charikar, K. Chen, and M. Farach-Colton. Finding frequent items in data streams. In ICALP'02, pages 693--703, 2002. Google ScholarDigital Library
- G. Cormode and S. Muthukrishnan. What's hot and what's not: tracking most frequent items dynamically. In PODS '03, pages 296--306, 2003. Google ScholarDigital Library
- C. Cranor, T. Johnson, O. Spataschek, and V. Shkapenyuk. Gigascope: a stream database for network applications. In SIGMOD '03: Proceedings of the 2003 ACM SIGMOD international conference on Management of data, pages 647--651, New York, NY, USA, 2003. ACM Press. Google ScholarDigital Library
- E. D. Demaine, A. López-Ortiz, and J. I. Munro. Frequency estimation of internet packet streams with limited space. In Proceedings of the 10th Annual European Symposium on Algorithms (ESA'02), volume 2461, pages 348--360, 2002. Google ScholarDigital Library
- C. Estan and G. Varghese. New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice. ACM Trans. Comput. Syst., 21(3):270--313, 2003. Google ScholarDigital Library
- R. H. K. Fang Yu. Efficient Multi-Match Packet Classification with TCAM. In IEEE Hot Interconnects, 2004. Google ScholarDigital Library
- B. T. Gold, A. Ailamaki, L. Huston, and B. Falsafi. Accelerating database operations using a network processor. In DaMoN, 2005. Google ScholarDigital Library
- M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In SIGMOD Conference, 2001. Google ScholarDigital Library
- J. Hershberger, N. Shrivastava, S. Suri, and C. D. Tóth. Adaptive spatial partitioning for multidimensional data streams. In ISAAC, pages 522--533, 2004. Google ScholarDigital Library
- R. M. Karp, S. Shenker, and C. H. Papadimitriou. A simple algorithm for finding frequent elements in streams and bags. ACM Trans. Database Syst., 28(1):51--55, 2003. Google ScholarDigital Library
- K. Lakshminarayanan, A. Rangarajan, and S. Venkatachary. Algorithms for advanced packet classification with ternary cams. In SIGCOMM '05, pages 193--204, New York, NY, USA, 2005. ACM Press. Google ScholarDigital Library
- G. S. Manku and R. Motwani. Approximate frequency counts over data streams, 2002.Google Scholar
- G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Approximate medians and other quantiles in one pass and with limited memory. In SIGMOD'98, pages 426--435, 1998. Google ScholarDigital Library
- A. Metwally, D. Agrawal, and A. E. Abbadi. Efficient computation of frequent and top-k elements in data streams. In ICDT, pages 398--412, 2005. Google ScholarDigital Library
- J. Misra and D. Gries. Finding repeated elements. Sci. Comput. Program., 2(2):143--152, 1982.Google ScholarCross Ref
- S. Mysore, B. Agrawal, T. Sherwood, N. Shrivastava, and S. Suri. Profiling over adaptive ranges. In CGO '06: Proceedings of the International Symposium on Code Generation and Optimization, pages 147--158, 2006. Google ScholarDigital Library
- G. J. Narlikar, A. Basu, and F. Zane. Coolcams: Power-efficient tcams for forwarding engines. In INFOCOM, 2003.Google Scholar
- R. Panigrahy and S. Sharma. Sorting and Searching using Ternary CAMs. In IEEE Micro., 2003. Google ScholarDigital Library
- J. Widom and R. Motwani. Query processing, resource management, and approximation in a data stream management system, 2003.Google Scholar
- F. Yu, R. H. Katz, and T. V. Lakshman. Gigabit rate packet pattern-matching using tcam. In ICNP '04, pages 174--183, 2004. Google ScholarDigital Library
Index Terms
- Fast data stream algorithms using associative memories
Recommendations
Fast and approximate stream mining of quantiles and frequencies using graphics processors
SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of dataWe present algorithms for fast quantile and frequency estimation in large data streams using graphics processors (GPUs). We exploit the high computation power and memory bandwidth of graphics processors and present a new sorting algorithm that performs ...
Error-Correcting Codes for Ternary Content Addressable Memories
As VLSI silicon technology continues its relentless advance and memory densities increase, the problem of soft errors--bit upsets caused by alpha particles or neutron hits--demands solutions. Error-correcting codes (ECCs) are routinely used on random-...
Theory of data stream computing: where to go
PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systemsComputing power has been growing steadily, just as communication rate and memory size. Simultaneously our ability to create data has been growing phenomenally and therefore the need to analyze it. We now have examples of massive data streams that are ...
Comments