DOI: 10.1145/3242153.3242155
research-article
Open Access

Streams and Tables: Two Sides of the Same Coin

Published: 27 August 2018

ABSTRACT

Stream processing has emerged as a paradigm for applications that require low-latency evaluation of operators over unbounded sequences of data. Defining the semantics of stream processing is challenging in the presence of distributed data sources. The physical and logical order of data in a stream may become inconsistent in such a setting. Existing models either neglect these inconsistencies or handle them by means of data buffering and reordering techniques, thereby compromising processing latency.

In this paper, we introduce the Dual Streaming Model to reason about physical and logical order in data stream processing. This model presents the result of an operator as a stream of successive updates, which induces a duality of results and streams. As such, it provides a natural way to cope with inconsistencies between the physical and logical order of streaming data in a continuous manner, without explicit buffering and reordering. We further discuss the trade-offs and challenges faced when implementing this model in terms of correctness, latency, and processing cost. A case study based on Apache Kafka illustrates the effectiveness of our model in the light of real-world requirements.
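The idea of presenting an operator's result as a stream of successive updates can be illustrated with a small sketch. This is not the paper's formal model or the Apache Kafka API; the function names `windowed_count` and `materialize`, the record layout `(key, event_time)`, and the window size are assumptions made for illustration only. The sketch counts records per key and time window, emits a revised count for every arriving record (including late ones), and shows that replaying the update stream reproduces the result table.

```python
from collections import defaultdict


def windowed_count(records, window_size):
    """Consume (key, event_time) records in arrival order.

    Yields one update per record: ((key, window_start), new_count).
    Nothing is buffered or reordered; a late record simply revises
    the count of the window it logically belongs to.
    """
    table = defaultdict(int)  # current result: (key, window_start) -> count
    for key, event_time in records:
        window_start = event_time - event_time % window_size
        table[(key, window_start)] += 1
        yield (key, window_start), table[(key, window_start)]


def materialize(updates):
    """Replay an update stream into a table; the latest update per key wins."""
    table = {}
    for key, value in updates:
        table[key] = value
    return table


# Arrival (physical) order differs from event-time (logical) order:
# the record with timestamp 3 arrives after the record with timestamp 12.
arrivals = [("a", 1), ("a", 12), ("a", 3), ("b", 4)]

changelog = list(windowed_count(arrivals, window_size=10))
result = materialize(changelog)
```

In this sketch the late record `("a", 3)` produces a third update that overwrites the earlier count for window `("a", 0)`. A consumer of `changelog` sees every intermediate result with low latency, while `result` holds only the latest value per key.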

      • Published in

        BIRTE '18: Proceedings of the International Workshop on Real-Time Business Intelligence and Analytics
        August 2018
        59 pages
        ISBN: 9781450366076
        DOI: 10.1145/3242153

        Copyright © 2018 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article
        • Research
        • Refereed limited

        Acceptance Rates

        Overall acceptance rate: 12 of 21 submissions, 57%
