research-article

Open Access

Disentanglement in nested-parallel programs

Authors:
Sam Westrick

Carnegie Mellon University, USA

Carnegie Mellon University, USA
View Profile

,
Rohan Yadav

Carnegie Mellon University, USA

Carnegie Mellon University, USA
View Profile

,
Matthew Fluet

Rochester Institute of Technology, USA

Rochester Institute of Technology, USA
View Profile

,
Umut A. Acar

Carnegie Mellon University, USA

Carnegie Mellon University, USA
View Profile

Proceedings of the ACM on Programming Languages Volume 4 Issue POPLArticle No.: 47pp 1–32https://doi.org/10.1145/3371115

Published:20 December 2019Publication History

Proceedings of the ACM on Programming Languages

Abstract

Nested parallelism has proved to be a popular approach for programming the rapidly expanding range of multicore computers. It allows programmers to express parallelism at a high level and relies on a run-time system and a scheduler to deliver efficiency and scalability. As a result, many programming languages and extensions that support nested parallelism have been developed, including in C/C++, Java, Haskell, and ML. Yet, writing efficient and scalable nested parallel programs remains challenging, primarily due to difficult concurrency bugs arising from destructive updates or effects. For decades, researchers have argued that functional programming can simplify writing parallel programs by allowing more control over effects but functional programs continue to underperform in comparison to parallel programs written in lower-level languages. The fundamental difficulty with functional languages is that they have high demand for memory, and this demand only grows with parallelism.

In this paper, we identify a memory property, called disentanglement, of nested-parallel programs, and propose memory management techniques for improved efficiency and scalability. Disentanglement allows for (destructive) effects as long as concurrently executing threads do not gain knowledge of the memory objects allocated by each other. We formally define disentanglement by considering an ML-like higher-order language with mutable references and presenting a dynamic semantics for it that enables reasoning about computation graphs of nested parallel programs. Based on this graph semantics, we formalize a classic correctness property---determinacy race freedom---and prove that it implies disentanglement. This establishes that disentanglement applies to a relatively broad class of parallel programs. We then propose memory management techniques for nested-parallel programs that take advantage of disentanglement for improved efficiency and scalability. We show that these techniques are practical by extending the MLton compiler for Standard ML to support this form of nested parallelism. Our empirical evaluation shows that our techniques are efficient and scale well.

Supplemental Material

a47-westrick.webm

webm

95.1 MB

Download

Available for Download

zip

popl20main-p169-p-aux.zip (554.9 KB)

This companion appendix includes a proof of the main theorem.

References

Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey. 2018a. Performance Challenges in Modular Parallel Programs. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’18). 381–382.Google ScholarDigital Library
Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, and Mike Rainey. 2019. Provably and Practically Efficient Granularity Control. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19). 214–228.Google ScholarDigital Library
Umut A. Acar, Guy Blelloch, Matthew Fluet, Stefan K. Muller, and Ram Raghunathan. 2015a. Coupling Memory and Computation for Locality Management. In Summit on Advances in Programming Languages (SNAPL).Google Scholar
Umut A. Acar, Guy E. Blelloch, and Robert D. Blumofe. 2002. The Data Locality of Work Stealing. Theory of Computing Systems 35, 3 (2002), 321–347.Google ScholarCross Ref
Umut A. Acar, Arthur Charguéraud, Adrien Guatto, Mike Rainey, and Filip Sieczkowski. 2018b. Heartbeat Scheduling: Provable Efficiency for Nested Parallelism. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2018). 769–782.Google ScholarDigital Library
Umut A. Acar, Arthur Charguéraud, and Mike Rainey. 2013. Scheduling Parallel Programs by Work Stealing with Private Deques. In Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP ’13).Google ScholarDigital Library
Umut A. Acar, Arthur Chargueraud, and Mike Rainey. 2015b. A Work-Efficient Algorithm for Parallel Unordered Depth-First Search. In ACM/IEEE Conference on High Performance Computing (SC). ACM, New York, NY, USA, Article 67, 12 pages.Google ScholarDigital Library
Umut A. Acar, Arthur Charguéraud, and Mike Rainey. 2016. Oracle-guided scheduling for controlling granularity in implicitly parallel languages. Journal of Functional Programming (JFP) 26 (2016), e23.Google ScholarCross Ref
Sarita V. Adve. 2010. Data races are evil with no exceptions: technical perspective. Commun. ACM 53, 11 (2010), 84.Google ScholarDigital Library
T. R. Allen and D. A. Padua. 1987. Debugging Fortran on a Shared Memory Machine. In Proceedings of the 1987 International Conference on Parallel Processing. 721–727.Google Scholar
B. Alpern, L. Carter, and E. Feig. 1990. Uniform memory hierarchies. In Proceedings 31st Annual Symposium on Foundations of Computer Science. 600–608 vol.2. Google ScholarDigital Library
Todd A. Anderson. 2010. Optimizations in a private nursery-based garbage collector. In Proceedings of the 9th International Symposium on Memory Management, ISMM 2010, Toronto, Ontario, Canada, June 5-6, 2010. 21–30.Google ScholarDigital Library
Andrew W. Appel. 1989. Simple Generational Garbage Collection and Fast Allocation. Software Prac. Experience 19, 2 (1989), 171–183. http://www.cs.princeton.edu/fac/~appel/papers/143.psGoogle ScholarDigital Library
Andrew W. Appel and Zhong Shao. 1996. Empirical and analytic study of stack versus heap cost for languages with closures. Journal of Functional Programming 6, 1 (Jan. 1996), 47–74. ftp://daffy.cs.yale.edu/pub/papers/shao/stack.psGoogle ScholarCross Ref
Nimar S. Arora, Robert D. Blumofe, and C. Greg Plaxton. 2001. Thread Scheduling for Multiprogrammed Multiprocessors. Theory of Computing Systems 34, 2 (2001), 115–144.Google ScholarCross Ref
Arvind, Rishiyur S. Nikhil, and Keshav K. Pingali. 1989. I-structures: Data Structures for Parallel Computing. ACM Trans. Program. Lang. Syst. 11, 4 (Oct. 1989), 598–632.Google Scholar
Sven Auhagen, Lars Bergstrom, Matthew Fluet, and John H. Reppy. 2011. Garbage collection for multicore NUMA machines. In Proceedings of the 2011 ACM SIGPLAN workshop on Memory Systems Performance and Correctness (MSPC). 51–57.Google Scholar
Stephanie Balzer, Bernardo Toninho, and Frank Pfenning. 2019. Manifest Deadlock-Freedom for Shared Session Types. In Programming Languages and Systems - 28th European Symposium on Programming, ESOP 2019, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019, Prague, Czech Republic, April 6-11, 2019, Proceedings. 611–639. Google ScholarCross Ref
M. Bauer, S. Treichler, E. Slaughter, and A. Aiken. 2012. Legion: Expressing locality and independence with logical regions. In SC ’12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis. 1–11. Google ScholarDigital Library
Ales Bizjak, Daniel Gratzer, Robbert Krebbers, and Lars Birkedal. 2019. Iron: managing obligations in higher-order concurrent separation logic. PACMPL 3, POPL (2019), 65:1–65:30.Google ScholarDigital Library
Guy E. Blelloch. 1996. Programming Parallel Algorithms. Commun. ACM 39, 3 (1996), 85–97.Google ScholarDigital Library
Guy E. Blelloch, Rezaul A. Chowdhury, Phillip B. Gibbons, Vijaya Ramachandran, Shimin Chen, and Michael Kozuch. 2008. Provably good multicore cache performance for divide-and-conquer algorithms. In In the Proceedings of the 19th ACM-SIAM Symposium on Discrete Algorithms. 501–510.Google Scholar
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Julian Shun. 2012. Internally deterministic parallel algorithms can be fast. In PPoPP ’12. 181–192.Google Scholar
Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, and Harsha Vardhan Simhadri. 2011. Scheduling irregular parallel computations on hierarchical caches. In Proc. ACM Symposium on Parallelism in Algorithms and Architectures (SPAA). 355–366.Google ScholarDigital Library
Guy E. Blelloch and Phillip B. Gibbons. 2004. Effectively sharing a cache among threads. In SPAA.Google ScholarDigital Library
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, and Girija J. Narlikar. 1997. Space-efficient Scheduling of Parallelism with Synchronization Variables. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA ’97). 12–23.Google Scholar
Guy E. Blelloch, Jonathan C. Hardwick, Jay Sipelstein, Marco Zagha, and Siddhartha Chatterjee. 1994. Implementation of a Portable Nested Data-Parallel Language. J. Parallel Distrib. Comput. 21, 1 (1994), 4–14.Google ScholarDigital Library
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1995. Cilk: An Efficient Multithreaded Runtime System. In Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. Santa Barbara, California, 207–216.Google ScholarDigital Library
Robert D. Blumofe, Christopher F. Joerg, Bradley C. Kuszmaul, Charles E. Leiserson, Keith H. Randall, and Yuli Zhou. 1996. Cilk: An Efficient Multithreaded Runtime System. J. Parallel and Distrib. Comput. 37, 1 (1996), 55 – 69.Google ScholarDigital Library
Robert D. Blumofe and Charles E. Leiserson. 1998. Space-Efficient Scheduling of Multithreaded Computations. SIAM J. Comput. 27, 1 (1998), 202–229.Google ScholarDigital Library
Robert D. Blumofe and Charles E. Leiserson. 1999. Scheduling multithreaded computations by work stealing. J. ACM 46 (Sept. 1999), 720–748. Issue 5.Google ScholarDigital Library
Robert L. Bocchino, Stephen Heumann, Nima Honarmand, Sarita V. Adve, Vikram S. Adve, Adam Welc, and Tatiana Shpeisman. 2011. Safe nondeterminism in a deterministic-by-default parallel language. In ACM POPL.Google Scholar
Robert L. Bocchino, Jr., Vikram S. Adve, Danny Dig, Sarita V. Adve, Stephen Heumann, Rakesh Komuravelli, Jeffrey Overbey, Patrick Simmons, Hyojin Sung, and Mohsen Vakilian. 2009. A type and effect system for deterministic parallel Java. In Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications (OOPSLA ’09). 97–116.Google ScholarDigital Library
Robert L Bocchino, Jr., Vikram S. Adve, Sarita V. Adve, and Marc Snir. 2009. Parallel programming must be deterministic by default. In First USENIX Conference on Hot Topics in Parallelism.Google ScholarDigital Library
Hans-Juergen Boehm. 2011. How to Miscompile Programs with "Benign" Data Races. In 3rd USENIX Workshop on Hot Topics in Parallelism, HotPar’11, Berkeley, CA, USA, May 26-27, 2011.Google Scholar
Manuel M. T. Chakravarty, Roman Leshchinskiy, Simon L. Peyton Jones, Gabriele Keller, and Simon Marlow. 2007. Data parallel Haskell: a status report. In Proceedings of the POPL 2007 Workshop on Declarative Aspects of Multicore Programming, DAMP 2007, Nice, France, January 16, 2007. 10–18.Google ScholarDigital Library
Philippe Charles, Christian Grothoff, Vijay Saraswat, Christopher Donawa, Allan Kielstra, Kemal Ebcioglu, Christoph von Praun, and Vivek Sarkar. 2005. X10: an object-oriented approach to non-uniform cluster computing. In Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications (OOPSLA ’05). ACM, 519–538.Google ScholarDigital Library
C. J. Cheney. 1970. A Non-Recursive List Compacting Algorithm. Commun. ACM 13, 11 (Nov. 1970), 677–8.Google Scholar
Guang-Ien Cheng, Mingdong Feng, Charles E. Leiserson, Keith H. Randall, and Andrew F. Stark. 1998. Detecting data races in Cilk programs that use locks. In Proceedings of the 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA ’98).Google Scholar
Intel Corp. 2017. Knights landing (KNL): 2nd Generation Intel Xeon Phi processor. In Intel Xeon Processor E7 v4 Family Specification. https://ark.intel.com/products/series/93797/Intel- Xeon- Processor- E7- v4- Family .Google Scholar
Damien Doligez and Georges Gonthier. 1994. Portable, Unobtrusive Garbage Collection for Multiprocessor Systems. In Conference Record of the Twenty-first Annual ACM Symposium on Principles of Programming Languages (ACM SIGPLAN Notices). ACM Press, Portland, OR. ftp://ftp.inria.fr/INRIA/Projects/para/doligez/DoligezGonthier94.ps.gzGoogle Scholar
Damien Doligez and Xavier Leroy. 1993. A Concurrent Generational Garbage Collector for a Multi-Threaded Implementation of ML. In Conference Record of the Twentieth Annual ACM Symposium on Principles of Programming Languages (ACM SIGPLAN Notices). ACM Press, 113–123. file://ftp.inria.fr/INRIA/Projects/cristal/Xavier.Leroy/publications/concurrentgc.ps.gzGoogle Scholar
Tamar Domani, Elliot K. Kolodner, Ethan Lewis, Erez Petrank, and Dafna Sheinwald. 2002. Thread-Local Heaps for Java. In ISMM’02 Proceedings of the Third International Symposium on Memory Management (ACM SIGPLAN Notices), David Detlefs (Ed.). ACM Press, Berlin, 76–87. http://www.cs.technion.ac.il/~erez/publications.htmlGoogle Scholar
Martin Elsman. 2001. A Stack Machine for Region Based Programs, See [ SPACE 2001 ]. http://www.diku.dk/topps/space2001/ program.html#MartinElsmanGoogle Scholar
Perry A. Emrath, Sanjoy Ghosh, and David A. Padua. 1991. Event Synchronization Analysis for Debugging Parallel Programs. In Supercomputing ’91. 580–588.Google Scholar
Perry A. Emrath and Davis A. Padua. 1988. Automatic Detection of Nondeterminacy in Parallel Programs. In Proceedings of the Workshop on Parallel and Distributed Debugging. 89–99.Google Scholar
Kayvon Fatahalian, Daniel Reiter Horn, Timothy J. Knight, Larkhoon Leem, Mike Houston, Ji Young Park, Mattan Erez, Manman Ren, Alex Aiken, William J. Dally, and Pat Hanrahan. 2006. Sequoia: Programming the Memory Hierarchy. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC ’06). ACM, New York, NY, USA, Article 83.Google ScholarDigital Library
Mingdong Feng and Charles E. Leiserson. 1997. Efficient Detection of Determinacy Races in Cilk Programs. In Proceedings of the Ninth Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA). 1–11.Google Scholar
Mingdong Feng and Charles E. Leiserson. 1999. Efficient Detection of Determinacy Races in Cilk Programs. Theory of Computing Systems 32, 3 (1999), 301–326.Google ScholarCross Ref
Cormac Flanagan and Stephen N. Freund. 2009. FastTrack: efficient and precise dynamic race detection. SIGPLAN Not. 44, 6 (June 2009), 121–133. Google ScholarDigital Library
Cormac Flanagan, Stephen N. Freund, Marina Lifshin, and Shaz Qadeer. 2008. Types for atomicity: Static checking and inference for Java. ACM Trans. Program. Lang. Syst. 30, 4 (2008), 20:1–20:53.Google ScholarDigital Library
Matthew Fluet, Greg Morrisett, and Amal J. Ahmed. 2006. Linear Regions Are All You Need. In Proceedings of the 15th Annual European Symposium on Programming (ESOP).Google Scholar
Matthew Fluet, Mike Rainey, and John Reppy. 2008. A scheduling framework for general-purpose parallel languages. In ACM SIGPLAN International Conference on Functional Programming (ICFP).Google ScholarDigital Library
Matthew Fluet, Mike Rainey, John Reppy, and Adam Shaw. 2011. Implicitly threaded parallelism in Manticore. Journal of Functional Programming 20, 5-6 (2011), 1–40.Google ScholarDigital Library
Matthew Fluet, Mike Rainey, John Reppy, Adam Shaw, and Yingqi Xiao. 2007. Manticore: A Heterogeneous Parallel Language. In Proceedings of the 2007 Workshop on Declarative Aspects of Multicore Programming (DAMP ’07). 37–44.Google ScholarDigital Library
Matteo Frigo, Pablo Halpern, Charles E. Leiserson, and Stephen Lewin-Berlin. 2009. Reducers and Other Cilk++ Hyperobjects. In 21st Annual ACM Symposium on Parallelism in Algorithms and Architectures. 79–90.Google ScholarDigital Library
Matteo Frigo, Charles E. Leiserson, and Keith H. Randall. 1998. The Implementation of the Cilk-5 Multithreaded Language. In PLDI. 212–223.Google Scholar
David K. Gifford and John M. Lucassen. 1986. Integrating Functional and Imperative Programming. In Proceedings of the ACM Symposium on Lisp and Functional Programming (LFP). ACM Press, 22–38.Google Scholar
Marcelo J. R. Gonçalves. 1995. Cache Performance of Programs with Intensive Heap Allocation and Generational Garbage Collection. Ph.D. Dissertation. Department of Computer Science, Princeton University.Google Scholar
Marcelo J. R. Gonçalves and Andrew W. Appel. 1995. Cache Performance of Fast-Allocating Programs. In Record of the 1995 Conference on Functional Programming and Computer Architecture.Google Scholar
Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. 2002. Region-Based Memory Management in Cyclone. In Proceedings of SIGPLAN 2002 Conference on Programming Languages Design and Implementation (ACM SIGPLAN Notices). ACM Press, Berlin, 282–293.Google ScholarDigital Library
Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut A. Acar, and Matthew Fluet. 2018. Hierarchical memory management for mutable state. In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2018, Vienna, Austria, February 24-28, 2018. 81–93.Google ScholarDigital Library
Robert H. Halstead, Jr. 1984. Implementation of Multilisp: Lisp on a Multiprocessor. In Proceedings of the 1984 ACM Symposium on LISP and functional programming (LFP ’84). ACM, 9–17.Google ScholarDigital Library
Kevin Hammond. 2011. Why Parallel Functional Programming Matters: Panel Statement. In Reliable Software Technologies -Ada-Europe 2011 - 16th Ada-Europe International Conference on Reliable Software Technologies, Edinburgh, UK, June 20-24, 2011. Proceedings. 201–205.Google Scholar
David R. Hanson. 1990. Fast Allocation and Deallocation of Memory Based on Object Lifetimes. Software Prac. Experience 20, 1 (Jan. 1990), 5–12.Google ScholarDigital Library
Shams Mahmood Imam and Vivek Sarkar. 2014. Habanero-Java library: a Java 8 framework for multicore programming. In 2014 International Conference on Principles and Practices of Programming on the Java Platform Virtual Machines, Languages and Tools, PPPJ ’14. 75–86.Google ScholarCross Ref
Intel. 2011. Intel Threading Building Blocks. https://www.threadingbuildingblocks.org/ .Google Scholar
Intel Corporation 2009a. Intel Cilk++ SDK Programmer’s Guide. Intel Corporation. Document Number: 322581-001US.Google Scholar
Intel Corporation 2009b. Intel(R) Threading Building Blocks. Intel Corporation. Available from http://www. threadingbuildingblocks.org/documentation.php .Google Scholar
Richard Jones, Antony Hosking, and Eliot Moss. 2011. The garbage collection handbook: the art of automatic memory management. Chapman & Hall/CRC.Google ScholarDigital Library
Ralf Jung, Jacques-Henri Jourdan, Robbert Krebbers, and Derek Dreyer. 2018a. RustBelt: securing the foundations of the rust programming language. PACMPL 2, POPL (2018), 66:1–66:34. Google ScholarDigital Library
Ralf Jung, Robbert Krebbers, Jacques-Henri Jourdan, Ales Bizjak, Lars Birkedal, and Derek Dreyer. 2018b. Iris from the ground up: A modular foundation for higher-order concurrent separation logic. J. Funct. Program. 28 (2018), e20.Google ScholarCross Ref
Gabriele Keller, Manuel M.T. Chakravarty, Roman Leshchinskiy, Simon Peyton Jones, and Ben Lippmeier. 2010. Regular, shape-polymorphic, parallel arrays in Haskell. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (ICFP ’10). 261–272.Google ScholarDigital Library
A. Krishnamurthy, D. E. Culler, A. Dusseau, S. C. Goldstein, S. Lumetta, T. von Eicken, and K. Yelick. 1993. Parallel Programming in Split-C. In Proceedings of the 1993 ACM/IEEE Conference on Supercomputing (Supercomputing ’93). ACM, New York, NY, USA, 262–273. Google ScholarDigital Library
Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew. 2007. Optimistic Parallelism Requires Abstractions. In Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’07). 211–222.Google Scholar
Lindsey Kuper and Ryan R Newton. 2013. LVars: lattice-based data structures for deterministic parallelism. In Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing. ACM, 71–84.Google ScholarDigital Library
Lindsey Kuper, Aaron Todd, Sam Tobin-Hochstadt, and Ryan R. Newton. 2014a. Taming the Parallel Effect Zoo: Extensible Deterministic Parallelism with LVish. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’14). ACM, New York, NY, USA, 2–14. Google ScholarDigital Library
Lindsey Kuper, Aaron Turon, Neelakantan R. Krishnaswami, and Ryan R. Newton. 2014b. Freeze After Writing: Quasideterministic Parallel Programming with LVars. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’14). ACM, New York, NY, USA, 257–270.Google Scholar
Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. 2010. What is Twitter, a social network or a news media?. In WWW ’10. ACM, 591–600.Google Scholar
John Launchbury and Simon L. Peyton Jones. 1994. Lazy Functional State Threads. In Proceedings of the ACM SIGPLAN’94 Conference on Programming Language Design and Implementation (PLDI), Orlando, Florida, USA, June 20-24, 1994. 24–35.Google Scholar
Matthew Le and Matthew Fluet. 2015. Partial Aborts for Transactions via First-class Continuations. In Proceedings of the 20th ACM SIGPLAN International Conference on Functional Programming (ICFP 2015). 230–242.Google ScholarDigital Library
Doug Lea. 2000. A Java fork/join framework. In Proceedings of the ACM 2000 conference on Java Grande (JAVA ’00). 36–43.Google ScholarDigital Library
Daan Leijen, Wolfram Schulte, and Sebastian Burckhardt. 2009. The design of a task parallel library. In Proceedings of the 24th ACM SIGPLAN conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’09). 227–242.Google ScholarDigital Library
Peng Li, Simon Marlow, Simon L. Peyton Jones, and Andrew P. Tolmach. 2007. Lightweight concurrency primitives for GHC. In Proceedings of the ACM SIGPLAN Workshop on Haskell, Haskell 2007, Freiburg, Germany, September 30, 2007. 107–118.Google Scholar
Henry Lieberman and Carl E. Hewitt. 1981. A Real-Time Garbage Collector Based on the Lifetimes of Objects. AI Memo 569a. MIT. ftp://publications.ai.mit.edu/ai- publications/pdf/AIM- 569a.pdfGoogle Scholar
J. M. Lucassen and D. K. Gifford. 1988. Polymorphic Effect Systems. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’88). ACM, New York, NY, USA, 47–57.Google Scholar
Simon Marlow. 2011. Parallel and Concurrent Programming in Haskell. In Central European Functional Programming School - 4th Summer School, CEFP 2011, Budapest, Hungary, June 14-24, 2011, Revised Selected Papers. 339–401.Google Scholar
John Mellor-Crummey. 1991. On-the-fly Detection of Data Races for Programs with Nested Fork-Join Parallelism. In Proceedings of Supercomputing’91. 24–33.Google ScholarDigital Library
Stefan K. Muller and Umut A. Acar. 2016. Latency-Hiding Work Stealing: Scheduling Interacting Parallel Computations with Work Stealing. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, Asilomar State Beach/Pacific Grove, CA, USA, July 11-13, 2016. 71–82.Google Scholar
Stefan K. Muller, Umut A. Acar, and Robert Harper. 2017. Responsive Parallel Computation: Bridging Competitive and Cooperative Threading. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM, New York, NY, USA, 677–692.Google ScholarDigital Library
Stefan K. Muller, Umut A. Acar, and Robert Harper. 2018. Types and Cost Models for Responsive Parallelism (Draft). In Proceedings of the 14th ACM SIGPLAN International Conference on Functional Programming (ICFP ’18).Google Scholar
Girija J. Narlikar. 1999. Scheduling threads for low space requirement and good locality. In 11th Annual ACM Symposium on Parallel Algorithms and Architectures. 83–95.Google ScholarDigital Library
Robert H. B. Netzer and Barton P. Miller. 1992. What are Race Conditions? ACM Letters on Programming Languages and Systems 1, 1 (March 1992), 74–88.Google ScholarDigital Library
Robert W. Numrich and John Reid. 1998. Co-array Fortran for Parallel Programming. SIGPLAN Fortran Forum 17, 2 (Aug. 1998), 1–31. Google ScholarDigital Library
Atsushi Ohori, Kenjiro Taura, and Katsuhiro Ueno. 2018. Making SML# a General-purpose High-performance Language. Unpublished Manuscript.Google Scholar
Sungwoo Park, Frank Pfenning, and Sebastian Thrun. 2008. A Probabilistic Language Based on Sampling Functions. ACM Trans. Program. Lang. Syst. 31, 1, Article 4 (Dec. 2008), 46 pages.Google ScholarDigital Library
Simon L. Peyton Jones, Roman Leshchinskiy, Gabriele Keller, and Manuel M. T. Chakravarty. 2008. Harnessing the Multicores: Nested Data Parallelism in Haskell. In FSTTCS. 383–414.Google Scholar
Simon L. Peyton Jones and Philip Wadler. 1993. Imperative Functional Programming. In Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’93). 71–84.Google Scholar
Keshav Pingali, Donald Nguyen, Milind Kulkarni, Martin Burtscher, Muhammad Amber Hassaan, Rashid Kaleem, TsungHsien Lee, Andrew Lenharth, Roman Manevich, Mario Méndez-Lojo, Dimitrios Prountzos, and Xin Sui. 2011. The tao of parallelism in algorithms. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011. 12–25.Google ScholarDigital Library
Ram Raghunathan, Stefan K. Muller, Umut A. Acar, and Guy Blelloch. 2016. Hierarchical Memory Management for Parallel Programs. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP 2016). ACM, New York, NY, USA, 392–406.Google ScholarDigital Library
Raghavan Raman, Jisheng Zhao, Vivek Sarkar, Martin T. Vechev, and Eran Yahav. 2012. Scalable and precise dynamic datarace detection for structured parallelism. In ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’12, Beijing, China - June 11 - 16, 2012. 531–542.Google ScholarDigital Library
John C. Reynolds. 1978. Syntactic Control of Interference. In Proceedings of the 5th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL ’78). ACM, New York, NY, USA, 39–46.Google ScholarDigital Library
John C. Reynolds. 2002. Separation Logic: A Logic for Shared Mutable Data Structures. In 17th IEEE Symposium on Logic in Computer Science (LICS 2002), 22-25 July 2002, Copenhagen, Denmark, Proceedings. 55–74.Google Scholar
Dan Robinson. 2017. HPE shows The Machine — with 160TB of shared memory. Data Center Dynamics (May 2017).Google Scholar
Douglas T. Ross. 1967. The AED free storage package. Commun. ACM 10, 8 (Aug. 1967), 481–492.Google ScholarDigital Library
Rust Team. 2019. Rust Language. https://www.rust- lang.org/Google Scholar
Jacob T. Schwartz. 1975. Optimization of very high level languages (parts I and II). Computer Languages 2–3, 1 (1975), 161–194,197–218.Google ScholarDigital Library
Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, Phillip B. Gibbons, Aapo Kyrola, Harsha Vardhan Simhadri, and Kanat Tangwongsan. 2012. Brief Announcement: The Problem Based Benchmark Suite. In Proceedings of the Twenty-fourth Annual ACM Symposium on Parallelism in Algorithms and Architectures (SPAA ’12). 68–70.Google ScholarDigital Library
KC Sivaramakrishnan and Stephen Dolan. 2017. A deep dive into Multicore OCaml garbage collector. http://kcsrk.info/ multicore/gc/2017/07/06/multicore- ocaml- gc/ Unpublished manuscript.Google Scholar
K. C. Sivaramakrishnan, Lukasz Ziarek, and Suresh Jagannathan. 2014. MultiMLton: A multicore-aware runtime for standard ML. Journal of Functional Programming FirstView (6 2014), 1–62.Google Scholar
A. Sodani. 2015. Knights landing (KNL): 2nd Generation Intel Xeon Phi processor. In 2015 IEEE Hot Chips 27 Symposium (HCS). 1–24.Google ScholarCross Ref
SPACE 2001. Proceedings of the Second workshop on Semantics, Program Analysis and Computing Environments for Memory Management (SPACE’01). London. http://www.diku.dk/topps/space2001/Google Scholar
Daniel Spoonhower. 2009. Scheduling Deterministic Parallel Programs. Ph.D. Dissertation. Carnegie Mellon University. https://www.cs.cmu.edu/~rwh/theses/spoonhower.pdfGoogle ScholarDigital Library
Guy L. Steele, Jr. 1994. Building Interpreters by Composing Monads. In Proceedings of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL ’94). ACM, New York, NY, USA, 472–492.Google ScholarDigital Library
Guy L. Steele Jr. 1990. Making Asynchronous Parallelism Safe for the World. In Proceedings of the Seventeenth Annual ACM Symposium on Principles of Programming Languages (POPL). ACM Press, 218–231.Google ScholarDigital Library
Robert Endre Tarjan. 1975. Efficiency of a Good But Not Linear Set Union Algorithm. J. ACM 22, 2 (April 1975), 215–225.Google ScholarDigital Library
Tachio Terauchi and Alex Aiken. 2008. Witnessing Side Effects. ACM Trans. Program. Lang. Syst. 30, 3, Article 15 (May 2008), 42 pages.Google ScholarDigital Library
Mads Tofte and Jean-Pierre Talpin. 1997. Region-Based Memory Management. Information and Computation (Feb. 1997). http://www.diku.dk/research- groups/topps/activities/kit2/infocomp97.psGoogle Scholar
Aaron Turon, Derek Dreyer, and Lars Birkedal. 2013. Unifying refinement and hoare-style reasoning in a logic for higherorder concurrency. In ACM SIGPLAN International Conference on Functional Programming, ICFP’13, Boston, MA, USA -September 25 - 27, 2013. 377–390.Google ScholarDigital Library
David M. Ungar. 1984. Generation Scavenging: A Non-Disruptive High Performance Storage Reclamation Algorithm. ACM SIGPLAN Notices 19, 5 (April 1984), 157–167. Also published as ACM Software Engineering Notes 9, 3 (May 1984) — Proceedings of the ACM/SIGSOFT/SIGPLAN Software Engineering Symposium on Practical Software Development Environments, 157–167, April 1984.Google ScholarDigital Library
Robert Utterback, Kunal Agrawal, Jeremy T. Fineman, and I-Ting Angelina Lee. 2016. Provably Good and Practically Efficient Parallel Race Detection for Fork-Join Programs. In Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2016, Asilomar State Beach/Pacific Grove, CA, USA, July 11-13, 2016. 83–94.Google ScholarDigital Library
Viktor Vafeiadis and Matthew J. Parkinson. 2007. A Marriage of Rely/Guarantee and Separation Logic. In CONCUR 2007 - Concurrency Theory, 18th International Conference, CONCUR 2007, Lisbon, Portugal, September 3-8, 2007, Proceedings. 256–271.Google Scholar
David Walker. 2001. On Linear Types and Regions, See [ SPACE 2001 ]. http://www.diku.dk/topps/space2001/program.html# DavidWalkerGoogle Scholar
Kathy Yelick, Luigi Semenzato, Geoff Pike, Carleton Miyamoto, Ben Liblit, Arvind Krishnamurthy, Paul Hilfinger, Susan Graham, David Gay, Phil Colella, and Alex Aiken. 1998. Titanium: a high-performance Java dialect. Concurrency: Practice and Experience 10, 11ÃćÂĂÂŘ13 (1998), 825–836.Google Scholar
Lukasz Ziarek, K. C. Sivaramakrishnan, and Suresh Jagannathan. 2011. Composable asynchronous events. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2011, San Jose, CA, USA, June 4-8, 2011. 628–639.Google ScholarDigital Library

Index Terms

Disentanglement in nested-parallel programs
1. Software and its engineering
  1. Software notations and tools
    1. General programming languages
      1. Language types
        Functional languages
        Parallel programming languages
  2. Software organization and properties
    1. Contextual software domains
      1. Operating systems
        Memory management
        Garbage collection
2. Theory of computation
  1. Design and analysis of algorithms
    1. Parallel algorithms

Recommendations

Provably space-efficient parallel functional programming

Because of its many desirable properties, such as its ability to control effects and thus potentially disastrous race conditions, functional programming offers a viable approach to programming modern multicore computers. Over the past decade several ...
Read More
Efficient Parallel Functional Programming with Effects

Although functional programming languages simplify writing safe parallel programs by helping programmers to avoid data races, they have traditionally delivered poor performance. Recent work improved performance by using a hierarchical memory ...
Read More
On-the-fly maintenance of series-parallel relationships in fork-join multithreaded programs
SPAA '04: Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures

A key capability of data-race detectors is to determine whether one thread executes logically in parallel with another or whether the threads must operate in series. This paper provides two algorithms, one serial and one parallel, to maintain series-...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Published in

Proceedings of the ACM on Programming Languages Volume 4, Issue POPL
January 2020
1984 pages
EISSN:2475-1421
DOI:10.1145/3377388
Issue’s Table of Contents

Copyright © 2019 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 December 2019
Published in pacmpl Volume 4, Issue POPL

Permissions
Request permissions about this article.
Request Permissions

Check for updates
Author Tags
data race
disentanglement
functional programming
memory management
parallel computing
Qualifiers
- research-article
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 9
  Total Citations
  View Citations
- 1,271
  Total Downloads
- Downloads (Last 12 months)215
- Downloads (Last 6 weeks)59
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Disentanglement in nested-parallel programs

Proceedings of the ACM on Programming Languages

Abstract

Supplemental Material

Available for Download

References

Cited By

Index Terms

Recommendations

Provably space-efficient parallel functional programming

Efficient Parallel Functional Programming with Effects

On-the-fly maintenance of series-parallel relationships in fork-join multithreaded programs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

Caption

Disentanglement in nested-parallel programs

Proceedings of the ACM on Programming Languages

Abstract

Supplemental Material

Available for Download

References

Cited By

Index Terms

Recommendations

Provably space-efficient parallel functional programming

Efficient Parallel Functional Programming with Effects

On-the-fly maintenance of series-parallel relationships in fork-join multithreaded programs

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Permissions

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

Share this Publication link

Share on Social Media