skip to main content
10.1145/2517349.2522738acmconferencesArticle/Chapter ViewAbstractPublication PagessospConference Proceedingsconference-collections
research-article
Open Access

Naiad: a timely dataflow system

Published:03 November 2013Publication History

ABSTRACT

Naiad is a distributed system for executing data parallel, cyclic dataflow programs. It offers the high throughput of batch processors, the low latency of stream processors, and the ability to perform iterative and incremental computations. Although existing systems offer some of these features, applications that require all three have relied on multiple platforms, at the expense of efficiency, maintainability, and simplicity. Naiad resolves the complexities of combining these features in one framework.

A new computational model, timely dataflow, underlies Naiad and captures opportunities for parallelism across a wide class of algorithms. This model enriches dataflow computation with timestamps that represent logical points in the computation and provide the basis for an efficient, lightweight coordination mechanism.

We show that many powerful high-level programming models can be built on Naiad's low-level primitives, enabling such diverse tasks as streaming data analysis, iterative machine learning, and interactive graph mining. Naiad outperforms specialized systems in their target application domains, and its unique features enable the development of new high-performance applications.

Skip Supplemental Material Section

Supplemental Material

d3-05-derek-murray.mp4

mp4

855.6 MB

References

  1. The ClueWeb09 Dataset. http://lemurproject.org/clueweb09.Google ScholarGoogle Scholar
  2. Parallel Data Warehouse. http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx.Google ScholarGoogle Scholar
  3. Storm: Distributed and fault-tolerant realtime computation. http://storm-project.net/.Google ScholarGoogle Scholar
  4. M. Abadi, F. McSherry, D. G. Murray, and T. L. Rodeheffer. Formal analysis of a distributed algorithm for tracking progress. In Proceedings of the IFIP Joint International Conference on Formal Techniques for Distributed Systems, June 2013.Google ScholarGoogle ScholarCross RefCross Ref
  5. T. Akidau, A. Balikov, K. Bekiroǧlu, S. Chernyak, J. Haberman, R. Lax, S. McVeety, D. Mills, P. Nordstrom, and S. Whittle. MillWheel: fault-tolerant stream processing at Internet scale. In Proceedings of the 39th International Conference on Very Large Data Bases (VLDB), Aug. 2013.Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. M. Alizadeh, A. Greenberg, D. A. Maltz, J. Padhyey, P. Pately, B. Prabhakarz, S. Senguptay, and M. Sridharany. Data Center TCP (DCTCP). In Proceedings of the ACM International Conference on Applications, Technologies, Architectures and Protocols for Computer Communications (SIGCOMM), Aug. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. P. Alvaro, N. Conway, J. M. Hellerstein, and W. R. Marczak. Consistency analysis in Bloom: a CALM and collected approach. In Proceedings of the 5th Conference on Innovative Data Systems Research (CIDR), Jan. 2011.Google ScholarGoogle Scholar
  8. B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold, S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas, C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali, R. Abbasi, A. Agarwal, M. F. ul Haq, M. I. ul Haq, D. Bhardwaj, S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, K. Manivannan, and L. Rigas. Windows Azure Storage: a highly available cloud storage service with strong consistency. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. B. Chandramouli, J. Goldstein, and D. Maier. On-the-fly progress detection in iterative stream queries. Proceedings of the Very Large Database Endowment (PVLDB), 2(1):241--252, Aug. 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. R. Cheng, J. Hong, A. Kyrola, Y. Miao, X. Weng, M. Wu, F. Yang, L. Zhou, F. Zhao, and E. Chen. Kineograph: taking the pulse of a fast-changing and connected world. In Proceedings of the EuroSys Conference, Apr. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. J. Cipar, Q. Ho, J. K. Kim, S. Lee, G. R. Ganger, G. Gibson, K. Keeton, and E. Xing. Solving the straggler problem with bounded staleness. In Proceedings of the 14th Workshop on Hot Topics in Operating Systems (HotOS), May 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. D. D. Clark. Window and acknowledgement strategy in TCP. RFC 813, July 1982. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. N. Conway, W. R. Marczak, P. Alvaro, J. M. Hellerstein, and D. Maier. Logic and lattices for distributed programming. In Proceedings of the 3rd ACM Symposium on Cloud Computing (SoCC), Oct. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. J. Dean and L. A. Barroso. The tail at scale. Communications of the ACM, 56(2):74--80, Feb. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. E. Gonzalez, Y. Low, H. Gu, D. Bickson, and C. Guestrin. PowerGraph: distributed graph-parallel computation on natural graphs. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. D. Hsu, N. Karampatziakis, J. Langford, and A. Smola. Parallel online learning. In R. Bekkerman, M. Bilenko, and J. Langford, editors, Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Dec. 2011.Google ScholarGoogle Scholar
  18. M. Isard, M. Budiu, Y. Yu, A. Birrell, and D. Fetterly. Dryad: Distributed data-parallel programs from sequential building blocks. In Proceedings of the EuroSys Conference, Mar. 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Q. Ke, M. Isard, and Y. Yu. Optimus: A dynamic rewriting framework for execution plans of data-parallel computation. In Proceedings of the EuroSys Conference, Apr. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek. The Click Modular Router. ACM Transactions on Computer Systems, 18(3):263--297, Aug. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. H. Kwak, C. Lee, H. Park, and S. Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International World Wide Web Conference (WWW), Apr. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. J. Li, K. Tufte, V. Shkapenyuk, V. Papadimos, T. Johnson, and D. Maier. Out-of-order processing: a new architecture for high-performance stream systems. Proceedings of the Very Large Database Endowment (PVLDB), 1(1):274--288, Aug. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. B. T. Loo, T. Condie, M. Garofalakis, D. E. Gay, J. M. Hellerstein, P. Maniatis, R. Ranakrishnan, T. Roscoe, and I. Stoica. Declarative networking: language, execution and optimization. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), June 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. B. T. Loo, T. Condie, J. M. Hellerstein, P. Maniatis, T. Roscoe, and I. Stoica. Implementing declarative overlays. In Proceedings of the 20th ACM Symposium on Operating Systems Principles (SOSP), Oct. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. B. T. Loo, J. M. Hellerstein, I. Stoica, and R. Ramakrishnan. Declarative routing: extensible routing with declarative queries. In Proceedings of the ACM International Conference on Applications, Technologies, Architectures and Protocols for Computer Communications (SIGCOMM), Aug. 2005. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, and J. M. Hellerstein. GraphLab: A new parallel framework for machine learning. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010.Google ScholarGoogle Scholar
  27. G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, and G. Czajkowski. Pregel: a system for large-scale graph processing. In Proceedings of the ACM International Conference on Management of Data (SIGMOD), June 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. F. McSherry, D. G. Murray, R. Isaacs, and M. Isard. Differential dataflow. In Proceedings of the 6th Conference on Innovative Data Systems Research (CIDR), Jan. 2013.Google ScholarGoogle Scholar
  29. C. Mitchell, R. Power, and J. Li. Oolong: asynchronous distributed applications made easy. In Proceedings of the 3rd Asia-Pacific Workshop on Systems (APSys), July 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  30. D. G. Murray, M. Schwarzkopf, C. Smowton, S. Smith, A. Madhavapeddy, and S. Hand. CIEL: a universal execution engine for distributed dataflow computing. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Mar. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. D. Nagle, D. Serenyi, and A. Matthews. The Panasas ActiveScale storage cluster: Delivering scalable high bandwidth storage. In Proceedings of the ACM/IEEE Supercomputing Conference (SC), Nov. 2004. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. J. Nagle. Congestion control in IP/TCP internetworks. RFC 896, Jan. 1984. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. M. Najork. The scalable hyperlink store. In Proceedings of the 20th ACM Conference on Hypertext and Hypermedia, June 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  34. M. Najork, D. Fetterly, A. Halverson, K. Kenthapadi, and S. Gollapudi. Of hammers and nails: an empirical comparison of three paradigms for processing large graphs. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM), Feb. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. J. Pelissier. Providing quality of service over InfiniBand#8482; Architecture fabrics. In Proceedings of the 8th IEEE Symposium on High Performance Interconnects (HOT Interconnects), 2000.Google ScholarGoogle Scholar
  36. D. Peng and F. Dabek. Large-scale incremental processing using distributed transactions and notifications. In Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Oct. 2010. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. D. P. Reed and R. K. Kanodia. Synchronization with eventcounts and sequencers. Communications of the ACM, 22(2):115--123, Feb. 1979. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. P. A. Tucker, D. Maier, T. Sheard, and L. Fegaras. Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 15(3), May/June 2002. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. M. Welsh, D. Culler, and E. Brewer. SEDA: an architecture for well-conditioned, scalable internet services. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), Oct. 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. R. Xin, J. Gonzalez, M. Franklin, and I. Stoica. GraphX: A resilient distributed graph system on spark. In Proceedings of the Graph Data-management Experiences and Systems (GRADES) Workshop, June 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Y. Yu, M. Isard, D. Fetterly, M. Budiu, Ú. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. Franklin, S. Shenker, and I. Stoica. Resilient Distributed Datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI), Apr. 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Zaharia, T. Das, H. Li, T. Hunter, S. Shenker, and I. Stoica. Discretized Streams: Fault-tolerant streaming computation at scale. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP), Nov. 2013. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. M. Zaharia, A. Konwinski, A. D. Joseph, R. Katz, and I. Stoica. Improving MapReduce performance in heterogeneous environments. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI), Dec. 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Y. Zhang, Q. Gao, L. Gao, and C. Wang. PrIter: A distributed framework for prioritized iterative computations. In Proceedings of the 2nd ACM Symposium on Cloud Computing (SoCC), Oct. 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Y. Zhang, Q. Gao, L. Gao, and C. Wang. Accelerate large-scale iterative computation through asynchronous accumulative updates. In Proceedings of the 3rd ACM Workshop on Scientific Cloud Computing (ScienceCloud), June 2012. Google ScholarGoogle ScholarDigital LibraryDigital Library

Recommendations

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in
  • Published in

    cover image ACM Conferences
    SOSP '13: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
    November 2013
    498 pages
    ISBN:9781450323888
    DOI:10.1145/2517349

    Copyright © 2013 Owner/Author

    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    • Published: 3 November 2013

    Check for updates

    Qualifiers

    • research-article

    Acceptance Rates

    Overall Acceptance Rate131of716submissions,18%

    Upcoming Conference

    SOSP '24

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader