skip to main content
10.1145/1989323.1989389acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

How soccer players would do stream joins

Published:12 June 2011Publication History

ABSTRACT

In spite of the omnipresence of parallel (multi-core) systems, the predominant strategy to evaluate window-based stream joins is still strictly sequential, mostly just straightforward along the definition of the operation semantics.

In this work we present handshake join, a way of describing and executing window-based stream joins that is highly amenable to parallelized execution. Handshake join naturally leverages available hardware parallelism, which we demonstrate with an implementation on a modern multi-core system and on top of field-programmable gate arrays (FPGAs), an emerging technology that has shown distinctive advantages for high-throughput data processing.

On the practical side, we provide a join implementation that substantially outperforms CellJoin (the fastest published result) and that will directly turn any degree of parallelism into higher throughput or larger supported window sizes. On the semantic side, our work gives a new intuition of window semantics, which we believe could inspire other stream processing algorithms or ongoing standardization efforts for stream query languages.

References

  1. D. J. Abadi, Y. Ahmad, M. Balazinska, U. Çetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In Proc. of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 2005.Google ScholarGoogle Scholar
  2. A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom. STREAM: The Stanford Data Stream Management System. http://infolab.stanford.edu/ usriv/papers/streambook.pdf.Google ScholarGoogle Scholar
  3. A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proc. of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, USA, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. I. Botan, R. Derakhshan, N. Dindar, L. Haas, R. J. Miller, and N. Tatbul. SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. Proc. of the VLDB Endowment, 3(1), 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. M. Butts. Synchronization Through Communication in a Massively Parallel Processor Array. IEEE Micro, 27(5):32--40, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. P. Conway, N. Kalyanasundharam, G. Donley, K. Lepak, and B. Hughes. Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor. IEEE Micro, 30(2):16--29, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. J. Davis, C. Thacker, and C. Chang. BEE3: Revitalizing Computer Architecture Research. Technical Report MSR-TR-2009-45, Microsoft Research, 2009.Google ScholarGoogle Scholar
  8. A. DeHon. Balancing Interconnect and Computation in a Reconfigurable Computing Array (or, why you don't really want 100% LUT utilization). In Proc. of the Int'l Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, USA, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. P. Frey, R. Gonçalves, M. Kersten, and J. Teubner. A Spinning Join That Does Not Get Dizzy. In Proc. of the 30th Int'l Conference on Distributed Computing Systems (ICDCS), Genoa, Italy, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. B. Gedik, P. S. Yu, and R. Bordawekar. CellJoin: A Parallel Stream Join Operator for the Cell Processor. The VLDB Journal, 18(2):501--519, 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. T. M. Ghanem, M. A. Hammad, M. F. Mokbel, W. G. Aref, and A. K. Elmagarmid. Incremental Evaluation of Sliding-Window Queries over Data Streams. IEEE Trans. on Knowledge and Data Engineering (TKDE), 19(1):57--72, 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. L. Golab, S. Garg, and T. Öszu. On Indexing Sliding Windows over Online Data Streams. In Proc. of the 9th Int'l Conference on Extending Database Technology (EDBT), Crete, Greece, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. S. Helmer, T. Westmann, and G. Moerkotte. Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships. In Proc. of the 24th Int'l Conference on Very Large Databases (VLDB), New York, NY, USA, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, F. Pailet, S. Jain, T. Jacob, S. Yada, S. Marella, P. Salihundam, V. Erraguntla, M. Konow, M. Riepen, G. Droege, J. Lindemann, M. Gries, T. Apel, K. Henriss, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, R. v. d. Wijngaart, and T. Mattson. A 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS. In 2010 IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, February 2010.Google ScholarGoogle ScholarCross RefCross Ref
  15. Z. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld. An Adaptive Query Execution System for Data Integration. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Philadelphia, PA, USA, 1999. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, and S. Zdonik. Towards a Streaming SQL Standard. Proc. of the VLDB Endowment, 1(2), 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. In Proc. of the 19th Int'l Conference on Data Engineering (ICDE), Bangalore, India, 2003.Google ScholarGoogle ScholarCross RefCross Ref
  18. S. Krishnamurthy, M. J. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, and N. Thombre. Continuous Analytics Over Discontinuous Streams. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Indidanapolis, IN, USA, 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. H. T. Kung and P. L. Lohman. Systolic (VLSI) Arrays for Relational Database Operations. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Santa Monica, CA, USA, 1980. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. I. Kuon and J. Rose. Measuring the gap between FPGAs and ASICs. In Proc. of the 14th Int'l Symposium on Field-Programmable Gate Arrays (FPGA), Monterey, CA, USA, 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. Semantics and Evaluation Techniques for Window Aggregates in Data Streams. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Baltimore, MD, USA, 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. O. Mencer, K. H. Tsoi, S. Craimer, T. Todman, W. Luk, M. Y. Wong, and P. H. W. Leong. CUBE: A 512-FPGA Cluster. In Proc. of the Southern Programmable Logic Conference (SPL), Sao Carlos, Brazil, 2009.Google ScholarGoogle ScholarCross RefCross Ref
  23. A. Mitra, M. R. Vieira, P. Bakalov, V. J. Tsotras, and W. A. Najjar. Boosting XML Filtering Through a Scalable FPGA-Based Architecture. In Proc. of the 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 2009.Google ScholarGoogle Scholar
  24. Netezza Corp. http://www.netezza.com/.Google ScholarGoogle Scholar
  25. J.-B. Qian, H.-B. Xu, Y.-S. Dong, X.-J. Liu, and Y.-L. Wang. FPGA Acceleration Window Joins Over Multiple Data Streams. Journal of Circuits, Systems, and Computers, 14(4):813--830, 2005.Google ScholarGoogle Scholar
  26. K. A. Ross. Selection Conditions in Main Memory. ACM Trans. on Database Systems (TODS), 29:132--161, 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  27. N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. Load Shedding in a Data Stream Manager. In Proc. of the 29th Int'l Conference on Very Large Databases (VLDB), Berlin, Germany, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. W. B. Teeuw and H. M. Blanken. Control Versus Data Flow in Parallel Database Machines. IEEE Trans. on Parallel and Distributed Systems (TPDS), 4(11):1265--1279, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. J. Teubner, R. Mueller, and G. Alonso. FPGA Acceleration for the Frequent Item Problem. In Proc. of the 26th Int'l Conference on Data Engineering (ICDE), Long Beach, CA, USA, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  30. A. Wilschut and P. Apers. Dataflow Query Execution in a Parallel Main-Memory Environment. In Proc. of the 1st Int'l Conference on Parallel and Distributed Information Systems (PDIS), Miami Beach, FL, USA, 1991. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. How soccer players would do stream joins

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
        June 2011
        1364 pages
        ISBN:9781450306614
        DOI:10.1145/1989323

        Copyright © 2011 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 June 2011

        Permissions

        Request permissions about this article.

        Request Permissions

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate785of4,003submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader