research-article
Open Access

A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories

Published: 17 November 2014

Abstract

Spatial architectures provide energy-efficient computation but require effective scheduling algorithms. Existing heuristic-based approaches offer low compiler/architect productivity, little optimality insight, and low architectural portability.

We seek to develop a spatial-scheduling framework by utilizing constraint-solving theories and find that architecture primitives and scheduler responsibilities can be related through five abstractions: computation placement, data routing, event timing, resource utilization, and the optimization objective. We encode these responsibilities as 20 mathematical constraints, using SMT and ILP, and create schedulers for the TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using constraint solving is implementable, is practical, and can outperform specialized schedulers.
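To make the abstractions named above concrete, here is a minimal sketch of placement, routing, resource utilization, and an objective on a toy problem. It is not the paper's SMT or ILP encoding: it swaps the solver for an exhaustive search, omits event timing, and every name in it (the `OPS` dataflow graph, the 2x2 `TILES` grid, the hop-count cost) is a hypothetical chosen for illustration.

```python
from itertools import permutations

# Hypothetical dataflow graph: operations and producer -> consumer edges.
OPS = ["load", "mul", "add", "store"]
EDGES = [("load", "mul"), ("load", "add"), ("mul", "add"), ("add", "store")]

# Hypothetical 2x2 grid of tiles, each identified by (row, col).
TILES = [(r, c) for r in range(2) for c in range(2)]


def hops(a, b):
    """Manhattan distance between two tiles, used here as the routing cost."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def cost(placement):
    """Objective: total hops over all dataflow edges for a given placement."""
    return sum(hops(placement[src], placement[dst]) for src, dst in EDGES)


def best_placement():
    """Exhaustively search placements and return the cheapest one.

    Iterating over permutations of tiles enforces the resource constraint
    that each tile hosts at most one operation.
    """
    best, best_cost = None, None
    for tiles in permutations(TILES, len(OPS)):
        placement = dict(zip(OPS, tiles))
        c = cost(placement)
        if best_cost is None or c < best_cost:
            best, best_cost = placement, c
    return best, best_cost


if __name__ == "__main__":
    placement, total = best_placement()
    for op in OPS:
        print(op, "->", placement[op])
    print("total routing hops:", total)
```

Exhaustive search only works at this toy scale. A declarative encoding would state the same one-operation-per-tile constraint and hop-count objective as solver constraints rather than enumerating placements.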

        • Published in

          ACM Transactions on Programming Languages and Systems  Volume 37, Issue 1
          January 2015
          170 pages
          ISSN:0164-0925
          EISSN:1558-4593
          DOI:10.1145/2688877
          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 17 November 2014
          • Accepted: 1 July 2014
          • Revised: 1 March 2014
          • Received: 1 October 2013
          Qualifiers

          • research-article
          • Research
          • Refereed
