research-article

How soccer players would do stream joins

Authors:
Jens Teubner

ETH Zurich, Zurich, Switzerland

ETH Zurich, Zurich, Switzerland
View Profile

,
Rene Mueller

IBM Almaden, San Jose, CA, USA

IBM Almaden, San Jose, CA, USA
View Profile

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of dataJune 2011Pages 625–636https://doi.org/10.1145/1989323.1989389

Published:12 June 2011Publication History

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Pages 625–636

ABSTRACT

In spite of the omnipresence of parallel (multi-core) systems, the predominant strategy to evaluate window-based stream joins is still strictly sequential, mostly just straightforward along the definition of the operation semantics.

In this work we present handshake join, a way of describing and executing window-based stream joins that is highly amenable to parallelized execution. Handshake join naturally leverages available hardware parallelism, which we demonstrate with an implementation on a modern multi-core system and on top of field-programmable gate arrays (FPGAs), an emerging technology that has shown distinctive advantages for high-throughput data processing.

On the practical side, we provide a join implementation that substantially outperforms CellJoin (the fastest published result) and that will directly turn any degree of parallelism into higher throughput or larger supported window sizes. On the semantic side, our work gives a new intuition of window semantics, which we believe could inspire other stream processing algorithms or ongoing standardization efforts for stream query languages.

References

D. J. Abadi, Y. Ahmad, M. Balazinska, U. Çetintemel, M. Cherniack, J.-H. Hwang, W. Lindner, A. S. Maskey, A. Rasin, E. Ryvkina, N. Tatbul, and S. Zdonik. The Design of the Borealis Stream Processing Engine. In Proc. of the 2nd Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 2005.Google Scholar
A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom. STREAM: The Stanford Data Stream Management System. http://infolab.stanford.edu/ usriv/papers/streambook.pdf.Google Scholar
A. Baumann, P. Barham, P.-E. Dagand, T. Harris, R. Isaacs, S. Peter, T. Roscoe, A. Schüpbach, and A. Singhania. The Multikernel: A new OS architecture for scalable multicore systems. In Proc. of the 22nd ACM Symposium on Operating Systems Principles (SOSP), Big Sky, MT, USA, 2009. Google ScholarDigital Library
I. Botan, R. Derakhshan, N. Dindar, L. Haas, R. J. Miller, and N. Tatbul. SECRET: A Model for Analysis of the Execution Semantics of Stream Processing Systems. Proc. of the VLDB Endowment, 3(1), 2010. Google ScholarDigital Library
M. Butts. Synchronization Through Communication in a Massively Parallel Processor Array. IEEE Micro, 27(5):32--40, 2007. Google ScholarDigital Library
P. Conway, N. Kalyanasundharam, G. Donley, K. Lepak, and B. Hughes. Cache Hierarchy and Memory Subsystem of the AMD Opteron Processor. IEEE Micro, 30(2):16--29, 2010. Google ScholarDigital Library
J. Davis, C. Thacker, and C. Chang. BEE3: Revitalizing Computer Architecture Research. Technical Report MSR-TR-2009-45, Microsoft Research, 2009.Google Scholar
A. DeHon. Balancing Interconnect and Computation in a Reconfigurable Computing Array (or, why you don't really want 100% LUT utilization). In Proc. of the Int'l Symposium on Field Programmable Gate Arrays (FPGA), Monterey, CA, USA, 1999. Google ScholarDigital Library
P. Frey, R. Gonçalves, M. Kersten, and J. Teubner. A Spinning Join That Does Not Get Dizzy. In Proc. of the 30th Int'l Conference on Distributed Computing Systems (ICDCS), Genoa, Italy, 2010. Google ScholarDigital Library
B. Gedik, P. S. Yu, and R. Bordawekar. CellJoin: A Parallel Stream Join Operator for the Cell Processor. The VLDB Journal, 18(2):501--519, 2009. Google ScholarDigital Library
T. M. Ghanem, M. A. Hammad, M. F. Mokbel, W. G. Aref, and A. K. Elmagarmid. Incremental Evaluation of Sliding-Window Queries over Data Streams. IEEE Trans. on Knowledge and Data Engineering (TKDE), 19(1):57--72, 2007. Google ScholarDigital Library
L. Golab, S. Garg, and T. Öszu. On Indexing Sliding Windows over Online Data Streams. In Proc. of the 9th Int'l Conference on Extending Database Technology (EDBT), Crete, Greece, 2004. Google ScholarDigital Library
S. Helmer, T. Westmann, and G. Moerkotte. Diag-Join: An Opportunistic Join Algorithm for 1:N Relationships. In Proc. of the 24th Int'l Conference on Very Large Databases (VLDB), New York, NY, USA, 1998. Google ScholarDigital Library
J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan, G. Ruhl, D. Jenkins, H. Wilson, N. Borkar, G. Schrom, F. Pailet, S. Jain, T. Jacob, S. Yada, S. Marella, P. Salihundam, V. Erraguntla, M. Konow, M. Riepen, G. Droege, J. Lindemann, M. Gries, T. Apel, K. Henriss, T. Lund-Larsen, S. Steibl, S. Borkar, V. De, R. v. d. Wijngaart, and T. Mattson. A 48-Core IA-32 Message-Passing Processor with DVFS in 45nm CMOS. In 2010 IEEE International Solid-State Circuits Conference, San Francisco, CA, USA, February 2010.Google ScholarCross Ref
Z. Ives, D. Florescu, M. Friedman, A. Levy, and D. S. Weld. An Adaptive Query Execution System for Data Integration. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Philadelphia, PA, USA, 1999. Google ScholarDigital Library
N. Jain, S. Mishra, A. Srinivasan, J. Gehrke, J. Widom, H. Balakrishnan, U. Çetintemel, M. Cherniack, R. Tibbetts, and S. Zdonik. Towards a Streaming SQL Standard. Proc. of the VLDB Endowment, 1(2), 2008. Google ScholarDigital Library
J. Kang, J. F. Naughton, and S. D. Viglas. Evaluating Window Joins over Unbounded Streams. In Proc. of the 19th Int'l Conference on Data Engineering (ICDE), Bangalore, India, 2003.Google ScholarCross Ref
S. Krishnamurthy, M. J. Franklin, J. Davis, D. Farina, P. Golovko, A. Li, and N. Thombre. Continuous Analytics Over Discontinuous Streams. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Indidanapolis, IN, USA, 2010. Google ScholarDigital Library
H. T. Kung and P. L. Lohman. Systolic (VLSI) Arrays for Relational Database Operations. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Santa Monica, CA, USA, 1980. Google ScholarDigital Library
I. Kuon and J. Rose. Measuring the gap between FPGAs and ASICs. In Proc. of the 14th Int'l Symposium on Field-Programmable Gate Arrays (FPGA), Monterey, CA, USA, 2006. Google ScholarDigital Library
J. Li, D. Maier, K. Tufte, V. Papadimos, and P. A. Tucker. Semantics and Evaluation Techniques for Window Aggregates in Data Streams. In Proc. of the Int'l Conference on Management of Data (SIGMOD), Baltimore, MD, USA, 2005. Google ScholarDigital Library
O. Mencer, K. H. Tsoi, S. Craimer, T. Todman, W. Luk, M. Y. Wong, and P. H. W. Leong. CUBE: A 512-FPGA Cluster. In Proc. of the Southern Programmable Logic Conference (SPL), Sao Carlos, Brazil, 2009.Google ScholarCross Ref
A. Mitra, M. R. Vieira, P. Bakalov, V. J. Tsotras, and W. A. Najjar. Boosting XML Filtering Through a Scalable FPGA-Based Architecture. In Proc. of the 4th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 2009.Google Scholar
Netezza Corp. http://www.netezza.com/.Google Scholar
J.-B. Qian, H.-B. Xu, Y.-S. Dong, X.-J. Liu, and Y.-L. Wang. FPGA Acceleration Window Joins Over Multiple Data Streams. Journal of Circuits, Systems, and Computers, 14(4):813--830, 2005.Google Scholar
K. A. Ross. Selection Conditions in Main Memory. ACM Trans. on Database Systems (TODS), 29:132--161, 2004. Google ScholarDigital Library
N. Tatbul, U. Çetintemel, S. Zdonik, M. Cherniack, and M. Stonebraker. Load Shedding in a Data Stream Manager. In Proc. of the 29th Int'l Conference on Very Large Databases (VLDB), Berlin, Germany, 2003. Google ScholarDigital Library
W. B. Teeuw and H. M. Blanken. Control Versus Data Flow in Parallel Database Machines. IEEE Trans. on Parallel and Distributed Systems (TPDS), 4(11):1265--1279, 1993. Google ScholarDigital Library
J. Teubner, R. Mueller, and G. Alonso. FPGA Acceleration for the Frequent Item Problem. In Proc. of the 26th Int'l Conference on Data Engineering (ICDE), Long Beach, CA, USA, 2010.Google ScholarCross Ref
A. Wilschut and P. Apers. Dataflow Query Execution in a Parallel Main-Memory Environment. In Proc. of the 1st Int'l Conference on Parallel and Distributed Information Systems (PDIS), Miami Beach, FL, USA, 1991. Google ScholarDigital Library

Index Terms

How soccer players would do stream joins
1. Information systems
  1. Data management systems
    1. Database management system engines
      1. Database query processing
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory
      1. Database query processing and optimization (theory)

Recommendations

CPU load shedding for binary stream joins

We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept ofselective processing for load shedding. We allow stream tuples to be ...
Read More
Adaptive load shedding for windowed stream joins
CIKM '05: Proceedings of the 14th ACM international conference on Information and knowledge management

We present an adaptive load shedding approach for windowed stream joins. In contrast to the conventional approach of dropping tuples from the input streams, we explore the concept of selective processing for load shedding. We allow stream tuples to be ...
Read More
Executing stream joins on the cell processor
VLDB '07: Proceedings of the 33rd international conference on Very large data bases

Low-latency and high-throughput processing are key requirements of data stream management systems (DSMSs). Hence, multi-core processors that provide high aggregate processing capacity are ideal matches for executing costly DSMS operators. The recently ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in
SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
June 2011
1364 pages
ISBN:9781450306614
DOI:10.1145/1989323
General Chair:
Timos Sellis
IMIS/RC Athena
,
Program Chair:
Renée J. Miller
University of Toronto
,
Publications Chairs:
Anastasios Kementsietsidis
IBM T.J. Watson Research Center
,
Yannis Velegrakis
University of Trento
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 12 June 2011
Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
data flow
parallelism
stream joins
Qualifiers
- research-article
Conference

Acceptance Rates
Overall Acceptance Rate785of4,003submissions,20%
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 87
  Total Citations
  View Citations
- 969
  Total Downloads
- Downloads (Last 12 months)34
- Downloads (Last 6 weeks)3
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

How soccer players would do stream joins

SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

ABSTRACT

References

Cited By

Index Terms

Recommendations

CPU load shedding for binary stream joins

Adaptive load shedding for windowed stream joins

Executing stream joins on the cell processor