
Disentanglement in nested-parallel programs

Published: 20 December 2019

Abstract

Nested parallelism has proved to be a popular approach for programming the rapidly expanding range of multicore computers. It allows programmers to express parallelism at a high level and relies on a run-time system and a scheduler to deliver efficiency and scalability. As a result, many programming languages and extensions that support nested parallelism have been developed, including in C/C++, Java, Haskell, and ML. Yet, writing efficient and scalable nested-parallel programs remains challenging, primarily due to difficult concurrency bugs arising from destructive updates or effects. For decades, researchers have argued that functional programming can simplify writing parallel programs by allowing more control over effects, but functional programs continue to underperform in comparison to parallel programs written in lower-level languages. The fundamental difficulty with functional languages is that they have high demand for memory, and this demand only grows with parallelism.

In this paper, we identify a memory property, called disentanglement, of nested-parallel programs, and propose memory management techniques for improved efficiency and scalability. Disentanglement allows for (destructive) effects as long as concurrently executing threads do not gain knowledge of the memory objects allocated by each other. We formally define disentanglement by considering an ML-like higher-order language with mutable references and presenting a dynamic semantics for it that enables reasoning about computation graphs of nested-parallel programs. Based on this graph semantics, we formalize a classic correctness property, determinacy race freedom, and prove that it implies disentanglement. This establishes that disentanglement applies to a relatively broad class of parallel programs. We then propose memory management techniques for nested-parallel programs that take advantage of disentanglement for improved efficiency and scalability. We show that these techniques are practical by extending the MLton compiler for Standard ML to support this form of nested parallelism. Our empirical evaluation shows that our techniques are efficient and scale well.
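To make the disentanglement condition concrete, the following is a minimal illustrative sketch in Python, not the paper's formal semantics or its MLton-based implementation. It models a fork-join task tree in which every object remembers the task that allocated it. The names `Task`, `Obj`, `can_access`, and `merged_into` are hypothetical, introduced only for this illustration. Under the assumptions of this model, an access is disentangled when the object was allocated by the accessing task itself or by one of its ancestors, and objects allocated by a child become visible to the parent once the children have joined.

```python
class Obj:
    """A heap object tagged with the task that allocated it (hypothetical model)."""

    def __init__(self, value, allocator):
        self.value = value
        self.allocator = allocator

    def owner(self):
        # Follow join links: once a task has joined, its objects belong to its parent.
        task = self.allocator
        while task.merged_into is not None:
            task = task.merged_into
        return task


class Task:
    """A node in a fork-join task tree (hypothetical model)."""

    def __init__(self, parent=None):
        self.parent = parent
        self.merged_into = None  # set to the parent when this task joins

    def fork(self):
        # Create two child tasks that are assumed to run concurrently.
        return Task(self), Task(self)

    def join(self, left, right):
        # After the join, the children's allocations are owned by this task.
        left.merged_into = self
        right.merged_into = self

    def alloc(self, value):
        return Obj(value, self)

    def can_access(self, obj):
        # Disentangled access: the owner is this task or one of its ancestors.
        owner = obj.owner()
        node = self
        while node is not None:
            if node is owner:
                return True
            node = node.parent
        return False


root = Task()
shared = root.alloc("allocated before the fork")
left, right = root.fork()
local = left.alloc("allocated by the left task")

print(left.can_access(shared))   # object from an ancestor
print(right.can_access(local))   # object from a concurrent sibling
root.join(left, right)
print(root.can_access(local))    # object from a joined child
```

In this model, a program would violate disentanglement if `right` obtained a reference to `local` while `left` and `right` were still running concurrently, which is the case the second check reports as not permitted.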


