Abstract
Spatial architectures provide energy-efficient computation but require effective scheduling algorithms. Existing heuristic-based approaches offer low compiler/architect productivity, little optimality insight, and low architectural portability.
We seek to develop a spatial-scheduling framework by utilizing constraint-solving theories and find that architecture primitives and scheduler responsibilities can be related through five abstractions: computation placement, data routing, event timing, resource utilization, and the optimization objective. We encode these responsibilities as 20 mathematical constraints, using SMT and ILP, and create schedulers for the TRIPS, DySER, and PLUG architectures. Our results show that a general declarative approach using constraint solving is implementable, is practical, and can outperform specialized schedulers.
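As a rough illustration of how the five abstractions fit together, the following minimal sketch schedules a toy dataflow graph onto a small grid. It is not the paper's formulation: it does not reproduce the 20 constraints, and it swaps in plain exhaustive enumeration for an SMT or ILP solver. The grid size, the graph, the latencies, and every name (`TILES`, `DEPS`, `hops`, `finish_time`, `schedule`) are assumptions made for illustration only.

```python
from itertools import permutations

# Hypothetical 2x3 grid of tiles, addressed as (row, col).
TILES = [(r, c) for r in range(2) for c in range(3)]

# Hypothetical dataflow graph: operation -> list of producer operations.
# The dict is written in topological order, which finish_time relies on.
DEPS = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
OPS = list(DEPS)

LATENCY = 1    # assumed cycles to execute one operation
HOP_DELAY = 1  # assumed cycles per network hop


def hops(src, dst):
    """Data routing: Manhattan distance between two tiles."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])


def finish_time(placement):
    """Event timing: an operation starts once all of its operands arrive."""
    done = {}
    for op in OPS:
        start = max(
            (done[p] + HOP_DELAY * hops(placement[p], placement[op])
             for p in DEPS[op]),
            default=0,
        )
        done[op] = start + LATENCY
    return max(done.values())


def schedule():
    """Optimization objective: keep the placement that finishes earliest."""
    best = None
    # Computation placement and resource utilization: each permutation maps
    # every operation to exactly one tile, and no tile hosts two operations.
    for tiles in permutations(TILES, len(OPS)):
        placement = dict(zip(OPS, tiles))
        cost = finish_time(placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


if __name__ == "__main__":
    cost, placement = schedule()
    print(cost, placement)
```

For this toy graph the search settles on a 5-cycle schedule, with `c` on a tile adjacent to `a`, `b`, and `d`. Enumeration only works at this scale; a declarative encoding would instead state the same placement, routing, timing, and utilization conditions as constraints and leave the search to the solver.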
Index Terms
- A Scheduling Framework for Spatial Architectures Across Multiple Constraint-Solving Theories