skip to main content
10.1145/2723372.2742788acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article
Open Access

Twitter Heron: Stream Processing at Scale

Published:27 May 2015Publication History

ABSTRACT

Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real-time at Twitter has increased, along with an increase in the diversity and the number of use cases, many limitations of Storm have become apparent. We need a system that scales better, has better debug-ability, has better performance, and is easier to manage -- all while working in a shared cluster infrastructure. We considered various alternatives to meet these needs, and in the end concluded that we needed to build a new real-time stream data processing system. This paper presents the design and implementation of this new system, called Heron. Heron is now the de facto stream data processing engine inside Twitter, and in this paper we also share our experiences from running Heron in production. In this paper, we also provide empirical evidence demonstrating the efficiency and scalability of Heron.

References

  1. Apache Aurora. http://aurora.incubator.apache.orgGoogle ScholarGoogle Scholar
  2. Apache Samza. http://samza.incubator.apache.orgGoogle ScholarGoogle Scholar
  3. Tyler Akidau, Alex Balikov, Kaya Bekiroglu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, Sam Whittle: MillWheel: Fault-Tolerant Stream Processing at Internet Scale. PVLDB 6(11): 1033--1044 (2013) Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Mohamed H. Ali, Badrish Chandramouli, Jonathan Goldstein, Roman Schindlauer: The extensibility framework in Microsoft StreamInsight. ICDE 2011: 1242--1253 Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman: Photon: fault-tolerant and scalable joining of continuous data streams. SIGMOD 2013: 577--588 Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Keith Ito, Rajeev Motwani, Itaru Nishizawa, Utkarsh Srivastava, Dilys Thomas, Rohit Varma, Jennifer Widom: STREAM: The Stanford Stream Data Manager. IEEE Data Eng. Bull. 26(1): 19--26 (2003)Google ScholarGoogle Scholar
  7. Hari Balakrishnan, Magdalena Balazinska, Donald Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Eduardo F. Galvez, Jon Salz, Michael Stonebraker, Nesime Tatbul, Richard Tibbetts, Stanley B. Zdonik: Retrospective on Aurora. VLDB J. 13(4): 370--383 (2004) Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. P. Oscar Boykin, Sam Ritchie, Ian O'Connell, Jimmy Lin: Summingbird: A Framework for Integrating Batch and Online MapReduce Computations. PVLDB 7(13): 1441--1451 (2014) Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. DataTorrent. https://www.datatorrent.comGoogle ScholarGoogle Scholar
  10. Minos N. Garofalakis, Johannes Gehrke: Querying and Mining Data Streams: You Only Get One Look. VLDB 2002Google ScholarGoogle Scholar
  11. Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D. Joseph, Randy H. Katz, Scott Shenker, Ion Stoica: Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. NSDI 2011 Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. IBM Infosphere Streams. http://www-03.ibm.com/software/products/en/infosphere-streams/Google ScholarGoogle Scholar
  13. Kestrel: A simple, distributed message queue system. http://robey.github.com/kestrelGoogle ScholarGoogle Scholar
  14. Jay Kreps, Neha Narkhede, and Jun Rao. Kafka: a distributed messaging system for log processing. SIGMOD Workshop on Networking Meets Databases, 2011.Google ScholarGoogle Scholar
  15. Simon Loesing, Martin Hentschel, Tim Kraska, Donald Kossmann: Stormy: an elastic and highly available streaming service in the cloud. EDBT/ICDT Workshops 2012: 55--60 Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Nathan Marz: (Storm) Tutorial. https://github.com/nathanmarz/storm/wiki/TutorialGoogle ScholarGoogle Scholar
  17. S4 Distributed stream computing platform. http://incubator.apache.org/s4/Google ScholarGoogle Scholar
  18. Spark Streaming. https://spark.apache.org/streaming/Google ScholarGoogle Scholar
  19. Sankar Subramanian, Srikanth Bellamkonda, Hua-Gang Li, Vince Liang, Lei Sheng, Wayne Smith, James Terry, Tsae-Feng Yu, Andrew Witkowski: Continuous Queries in Oracle. VLDB 2007: 1173--1184 Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthikeyan Ramasamy, Jignesh M. Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy V. Ryaboy: Storm@twitter. SIGMOD 2014: 147--156 Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Trident: https://github.com/nathanmarz/storm/wikiGoogle ScholarGoogle Scholar
  22. Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler: Apache Hadoop YARN: yet another resource negotiator. SoCC 2013: 5Google ScholarGoogle Scholar
  23. ZeroMQ: http://zeromq.org. Retrieved December 1, 2014.Google ScholarGoogle Scholar

Index Terms

  1. Twitter Heron: Stream Processing at Scale

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in
    • Published in

      cover image ACM Conferences
      SIGMOD '15: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data
      May 2015
      2110 pages
      ISBN:9781450327589
      DOI:10.1145/2723372

      Copyright © 2015 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 27 May 2015

      Permissions

      Request permissions about this article.

      Request Permissions

      Check for updates

      Qualifiers

      • research-article

      Acceptance Rates

      SIGMOD '15 Paper Acceptance Rate106of415submissions,26%Overall Acceptance Rate785of4,003submissions,20%

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader