Optimizing compilers for modern architectures: a dependence-based approach
October 2001
Publisher: Morgan Kaufmann Publishers Inc., 340 Pine Street, Sixth Floor, San Francisco, CA, United States
ISBN: 978-1-55860-286-1
Published: 01 October 2001
Pages: 790
Abstract

Modern computer architectures designed with high-performance microprocessors offer tremendous potential gains in performance over previous designs. Yet their very complexity makes it increasingly difficult to produce efficient code and to realize their full potential. This landmark text from two leaders in the field focuses on the pivotal role that compilers can play in addressing this critical issue. The basis for all the methods presented in this book is data dependence, a fundamental compiler analysis tool for optimizing programs on high-performance microprocessors and parallel architectures. Dependence analysis enables compiler designers to write compilers that automatically transform simple, sequential programs into forms that can exploit special features of these modern architectures. The text provides a broad introduction to data dependence, to the many transformation strategies it supports, and to its applications to important optimization problems such as parallelization, compiler memory hierarchy management, and instruction scheduling. The authors demonstrate the importance and wide applicability of dependence-based compiler optimizations and give the compiler writer the basics needed to understand and implement them. They also offer computational scientists and engineers who need the best possible performance from their complex applications cookbook explanations for transforming those applications by hand.

Cited By

  1. Tayeb H, Paillat L and Bramas B (2023). Autovesk: Automatic Vectorized Code Generation from Unstructured Static Kernels Using Graph Transformations, ACM Transactions on Architecture and Code Optimization, 21:1, (1-25), Online publication date: 31-Mar-2024.
  2. Xu J, Song G, Zhou B, Li F, Hao J and Zhao J A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, (55-67)
  3. Marron M Toward Programming Languages for Reasoning: Humans, Symbolic Systems, and AI Agents Proceedings of the 2023 ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, (136-152)
  4. Reber B, Gould M, Kneipp A, Liu F, Prechtl I, Ding C, Chen L and Patru D (2023). Cache Programming for Scientific Loops Using Leases, ACM Transactions on Architecture and Code Optimization, 20:3, (1-25), Online publication date: 30-Sep-2023.
  5. Chen T, Jia H, Zhang Y, Li K, Li Z, Zhao X, Yao J and Li C OpenFFT: An Adaptive Tuning Framework for 3D FFT on ARM Multicore CPUs Proceedings of the 37th International Conference on Supercomputing, (398-409)
  6. Su Z, Wang D, Yu Z, Yang Y, Jiang Y, Wang R, Chang W, Li W, Cui A and Sun J (2023). PHCG: Optimizing Simulink Code Generation for Embedded System With SIMD Instructions, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 42:4, (1072-1084), Online publication date: 1-Apr-2023.
  7. Bai A Million.js: A Fast Compiler-Augmented Virtual DOM for the Web Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing, (1813-1820)
  8. Sundararajah K, Saumya C and Kulkarni M (2022). UniRec: a unimodular-like framework for nested recursions and loops, Proceedings of the ACM on Programming Languages, 6:OOPSLA2, (1264-1290), Online publication date: 31-Oct-2022.
  9. Borum H and Clausen M Transforming domain models to efficient C# for the Danish pension industry Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings, (766-773)
  10. Tong G, Yan R, Yang L, Lan M, Zhang J, Cheng Y, Ma W, Lü Y, Ma S and Huang L Optimizing Winograd Convolution on GPUs via Partial Kernel Fusion Network and Parallel Computing, (17-29)
  11. Praharenka W, Pankratz D, De Carvalho J, Amiri E and Amaral J (2022). Vectorizing divergent control flow with active-lane consolidation on long-vector architectures, The Journal of Supercomputing, 78:10, (12553-12588), Online publication date: 1-Jul-2022.
  12. Susungi A and Tadonki C (2021). Intermediate Representations for Explicitly Parallel Programs, ACM Computing Surveys, 54:5, (1-24), Online publication date: 30-Jun-2022.
  13. Khan S, Chatterjee B and Pande S VICO Proceedings of the 36th ACM International Conference on Supercomputing, (1-14)
  14. Ziraksima M, Lotfi S and Razmara J (2022). Deep reinforcement learning in loop fusion problem, Neurocomputing, 481:C, (102-120), Online publication date: 7-Apr-2022.
  15. Rocha R, Petoumenos P, Franke B, Bhatotia P and O'Boyle M Loop rolling for code size reduction Proceedings of the 20th IEEE/ACM International Symposium on Code Generation and Optimization, (217-229)
  16. Abdollahi-Kalkhoran A, Lotfi S and Izadkhah H (2022). TEA-SEA, Expert Systems with Applications: An International Journal, 191:C, Online publication date: 1-Apr-2022.
  17. Ding C, Chen D, Liu F, Reber B and Smith W (2022). CARL: Compiler Assigned Reference Leasing, ACM Transactions on Architecture and Code Optimization, 19:1, (1-28), Online publication date: 31-Mar-2022.
  18. Chatarasi P, Kwon H, Parashar A, Pellauer M, Krishna T and Sarkar V (2021). Marvel: A Data-Centric Approach for Mapping Deep Learning Operators on Spatial Accelerators, ACM Transactions on Architecture and Code Optimization, 19:1, (1-26), Online publication date: 31-Mar-2022.
  19. Liu L, Isaacman S and Kremer U (2021). An Adaptive Application Framework with Customizable Quality Metrics, ACM Transactions on Design Automation of Electronic Systems, 27:2, (1-33), Online publication date: 31-Mar-2022.
  20. de Souza Neto J, Martins Moreira A, Vargas-Solar G and Musicante M (2022). A two-level formal model for Big Data processing programs, Science of Computer Programming, 215:C, Online publication date: 1-Mar-2022.
  21. Feng J, He Y, Tao Q, Ma H and Hashmi M (2022). An SLP Vectorization Method Based on Equivalent Extended Transformation, Wireless Communications & Mobile Computing, 2022, Online publication date: 1-Jan-2022.
  22. Álvarez Casado C and Bordallo López M (2021). Real-time face alignment: evaluation methods, training strategies and implementation optimization, Journal of Real-Time Image Processing, 18:6, (2239-2267), Online publication date: 1-Dec-2021.
  23. Tao X, Pang J, Xu J and Zhu Y (2021). Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture, The Journal of Supercomputing, 77:12, (14502-14524), Online publication date: 1-Dec-2021.
  24. Bednárek D, Kruliš M and Yaghob J (2021). Letting future programmers experience performance-related tasks, Journal of Parallel and Distributed Computing, 155:C, (74-86), Online publication date: 1-Sep-2021.
  25. Di Luna G, Italiano D, Massarelli L, Österlund S, Giuffrida C and Querzoni L Who’s debugging the debuggers? exposing debug information bugs in optimized binaries Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, (1034-1045)
  26. Vasiladiotis C, Lozano R, Cole M and Franke B Loop parallelization using dynamic commutativity analysis Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, (150-161)
  27. Poesia G and Pereira F (2020). Dynamic dispatch of context-sensitive optimizations, Proceedings of the ACM on Programming Languages, 4:OOPSLA, (1-28), Online publication date: 13-Nov-2020.
  28. Brinich P and Johnson J Verification of Vectorization of Signal Transforms Languages and Compilers for Parallel Computing, (215-231)
  29. Lezos C, Dimitroulakos G, Latifis I and Masselos K (2020). A Locality Optimizer for Loop-dominated Applications Based on Reuse Distance Analysis, ACM Transactions on Design Automation of Electronic Systems, 25:6, (1-26), Online publication date: 12-Oct-2020.
  30. Gharat P, Khedker U and Mycroft A (2020). Generalized Points-to Graphs, ACM Transactions on Programming Languages and Systems, 42:2, (1-78), Online publication date: 30-Jun-2020.
  31. Prabhu I and Nandivada V Chunking loops with non-uniform workloads Proceedings of the 34th ACM International Conference on Supercomputing, (1-12)
  32. Gupta S, Purandare S and Ramachandra K Aggify: Lifting the Curse of Cursor Loops using Custom Aggregates Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, (559-573)
  33. Vasilache N, Zinenko O, Theodoridis T, Goyal P, Devito Z, Moses W, Verdoolaege S, Adams A and Cohen A (2019). The Next 700 Accelerated Layers, ACM Transactions on Architecture and Code Optimization, 16:4, (1-26), Online publication date: 31-Dec-2020.
  34. Kunft A, Katsifodimos A, Schelter S, Breß S, Rabl T and Markl V (2019). An intermediate representation for optimizing machine learning pipelines, Proceedings of the VLDB Endowment, 12:11, (1553-1567), Online publication date: 1-Jul-2019.
  35. Jacob D and Singer J ALPyNA: acceleration of loops in Python for novel architectures Proceedings of the 6th ACM SIGPLAN International Workshop on Libraries, Languages and Compilers for Array Programming, (25-34)
  36. Sundararajah K and Kulkarni M Composable, sound transformations of nested recursion and loops Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, (902-917)
  37. Zou Y and Lin M Graph-Morphing Proceedings of the 56th Annual Design Automation Conference 2019, (1-6)
  38. Angerer F, Grimmer A, Prähofer H and Grünbacher P (2019). Change impact analysis for maintenance and evolution of variable software systems, Automated Software Engineering, 26:2, (417-461), Online publication date: 1-Jun-2019.
  39. Wei J, Gibson G, Gibbons P and Xing E Automating Dependence-Aware Parallelization of Machine Learning Training on Distributed Shared Memory Proceedings of the Fourteenth EuroSys Conference 2019, (1-17)
  40. Teixeira T, Ancourt C, Padua D and Gropp W Locus: a system and a language for program optimization Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, (217-228)
  41. Wang Q, Su P, Chabbi M and Liu X Lightweight hardware transactional memory profiling Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming, (186-200)
  42. Crago N, Stephenson M and Keckler S (2018). Exposing Memory Access Patterns to Improve Instruction and Memory Efficiency in GPUs, ACM Transactions on Architecture and Code Optimization, 15:4, (1-23), Online publication date: 8-Jan-2019.
  43. Sato Y, Yuki T and Endo T (2019). An Autotuning Framework for Scalable Execution of Tiled Code via Iterative Polyhedral Compilation, ACM Transactions on Architecture and Code Optimization, 15:4, (1-23), Online publication date: 31-Dec-2019.
  44. Zhao H, Zheng F, Wu J, Nan B, Li B and Mei K Automatic Parallelization for Binary on Multi-core Platforms Proceedings of the 2nd International Conference on Computer Science and Application Engineering, (1-6)
  45. Boehm M, Reinwald B, Hutchison D, Sen P, Evfimievski A and Pansare N (2018). On optimizing operator fusion plans for large-scale machine learning in systemML, Proceedings of the VLDB Endowment, 11:12, (1755-1768), Online publication date: 1-Aug-2018.
  46. Jinyang Y, Rongcai Z, Qi W and Xiaohan T Loop-nest Auto-vectorization Method Based on Benefit Analysis Proceedings of the 2nd International Conference on Advances in Image Processing, (240-244)
  47. Vahabzadeh A, Stocco A and Mesbah A Fine-grained test minimization Proceedings of the 40th International Conference on Software Engineering, (210-221)
  48. Stpiczyński P (2018). Language-based vectorization and parallelization using intrinsics, OpenMP, TBB and Cilk Plus, The Journal of Supercomputing, 74:4, (1461-1472), Online publication date: 1-Apr-2018.
  49. Zhao J and Zhao R (2018). K-DT, The Journal of Supercomputing, 74:4, (1655-1675), Online publication date: 1-Apr-2018.
  50. Zinenko O, Huot S and Bastoul C (2018). Visual Program Manipulation in the Polyhedral Model, ACM Transactions on Architecture and Code Optimization, 15:1, (1-25), Online publication date: 31-Mar-2018.
  51. Kotsifakou M, Srivastava P, Sinclair M, Komuravelli R, Adve V and Adve S (2018). HPVM, ACM SIGPLAN Notices, 53:1, (68-80), Online publication date: 23-Mar-2018.
  52. Shen D, Chabbi M and Liu X An Evaluation of Vectorization and Cache Reuse Tradeoffs on Modern CPUs Proceedings of the 9th International Workshop on Programming Models and Applications for Multicores and Manycores, (21-30)
  53. Rodrigues C, Phaosawasdi A and Wu P SIMDization of Small Tensor Multiplication Kernels for Wide SIMD Vector Processors Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing, (1-8)
  54. Lemaitre F, Couturier B and Lacassagne L Small SIMD Matrices for CERN High Throughput Computing Proceedings of the 2018 4th Workshop on Programming Models for SIMD/Vector Processing, (1-8)
  55. Zinenko O, Verdoolaege S, Reddy C, Shirako J, Grosser T, Sarkar V and Cohen A Modeling the conflicting demands of parallelism and Temporal/Spatial locality in affine scheduling Proceedings of the 27th International Conference on Compiler Construction, (3-13)
  56. Kotsifakou M, Srivastava P, Sinclair M, Komuravelli R, Adve V and Adve S HPVM Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (68-80)
  57. Harris B, Moghaddam M, Kang D, Bae I, Kim E, Min H, Cho H, Kim S, Egger B, Ha S and Choi K Architectures and algorithms for user customization of CNNs Proceedings of the 23rd Asia and South Pacific Design Automation Conference, (540-547)
  58. Harris B, Moghaddam M, Kang D, Bae I, Kim E, Min H, Cho H, Kim S, Egger B, Ha S and Choi K Architectures and algorithms for user customization of CNNs 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), (540-547)
  59. Shrivastava R and Nandivada V (2017). Energy-Efficient Compilation of Irregular Task-Parallel Loops, ACM Transactions on Architecture and Code Optimization, 14:4, (1-29), Online publication date: 20-Dec-2017.
  60. Ramachandra K, Park K, Emani K, Halverson A, Galindo-Legaria C and Cunningham C (2017). Froid, Proceedings of the VLDB Endowment, 11:4, (432-444), Online publication date: 1-Dec-2017.
  61. Ramachandra K, Park K, Emani K, Halverson A, Galindo-Legaria C and Cunningham C (2018). Froid, Proceedings of the VLDB Endowment, 11:4, (432-444), Online publication date: 1-Dec-2017.
  62. Li Z, Liu L, Deng Y, Yin S, Wang Y and Wei S (2017). Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware, ACM SIGARCH Computer Architecture News, 45:2, (575-586), Online publication date: 14-Sep-2017.
  63. Henriksen T, Serup N, Elsman M, Henglein F and Oancea C (2017). Futhark: purely functional GPU-programming with nested parallelism and in-place array updates, ACM SIGPLAN Notices, 52:6, (556-571), Online publication date: 14-Sep-2017.
  64. Jensen N and Karlsson S (2017). Improving Loop Dependence Analysis, ACM Transactions on Architecture and Code Optimization, 14:3, (1-24), Online publication date: 6-Sep-2017.
  65. Li Z, Liu L, Deng Y, Yin S, Wang Y and Wei S Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware Proceedings of the 44th Annual International Symposium on Computer Architecture, (575-586)
  66. Gupta S, Shrivastava R and Nandivada V Optimizing recursive task parallel programs Proceedings of the International Conference on Supercomputing, (1-11)
  67. Henriksen T, Serup N, Elsman M, Henglein F and Oancea C Futhark: purely functional GPU-programming with nested parallelism and in-place array updates Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, (556-571)
  68. Bilardi G, Ekanadham K and Pattnaik P Optimal On-Line Computation of Stack Distances for MIN and OPT Proceedings of the Computing Frontiers Conference, (237-246)
  69. Sundararajah K, Sakka L and Kulkarni M (2017). Locality Transformations for Nested Recursive Iteration Spaces, ACM SIGPLAN Notices, 52:4, (281-295), Online publication date: 12-May-2017.
  70. Sundararajah K, Sakka L and Kulkarni M (2017). Locality Transformations for Nested Recursive Iteration Spaces, ACM SIGARCH Computer Architecture News, 45:1, (281-295), Online publication date: 11-May-2017.
  71. Sundararajah K, Sakka L and Kulkarni M Locality Transformations for Nested Recursive Iteration Spaces Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, (281-295)
  72. Shi X, Cui B, Dobbie G and Ooi B (2016). UniAD, ACM Transactions on Database Systems, 42:1, (1-42), Online publication date: 2-Mar-2017.
  73. Shirako J, Hayashi A and Sarkar V Optimized two-level parallelization for GPU accelerators using the polyhedral model Proceedings of the 26th International Conference on Compiler Construction, (22-33)
  74. Kusano M and Wang C Flow-sensitive composition of thread-modular abstract interpretation Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering, (799-809)
  75. Huang J, Prabhu P, Jablin T, Ghosh S, Apostolakis S, Lee J and August D Speculatively Exploiting Cross-Invocation Parallelism Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (207-221)
  76. Kristensen M, Lund S, Blum T and Avery J Fusion of Parallel Array Operations Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (71-85)
  77. Agullo E, Buttari A, Guermouche A and Lopez F (2016). Implementing Multifrontal Sparse Solvers for Multicore Architectures with Sequential Task Flow Runtime Systems, ACM Transactions on Mathematical Software, 43:2, (1-22), Online publication date: 2-Sep-2016.
  78. Truong L, Barik R, Totoni E, Liu H, Markley C, Fox A and Shpeisman T (2016). Latte: a language, compiler, and runtime for elegant and efficient deep neural networks, ACM SIGPLAN Notices, 51:6, (209-223), Online publication date: 1-Aug-2016.
  79. Sultana N, Calvert A, Overbey J and Arnold G From OpenACC to OpenMP 4 Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale, (1-8)
  80. Agullo E, Bramas B, Coulaud O, Darve E, Messner M and Takahashi T (2016). Task-based FMM for heterogeneous architectures, Concurrency and Computation: Practice & Experience, 28:9, (2608-2629), Online publication date: 25-Jun-2016.
  81. Truong L, Barik R, Totoni E, Liu H, Markley C, Fox A and Shpeisman T Latte: a language, compiler, and runtime for elegant and efficient deep neural networks Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, (209-223)
  82. Šinkarovs A and Scholz S (2016). Type-driven data layouts for improved vectorisation, Concurrency and Computation: Practice & Experience, 28:7, (2092-2119), Online publication date: 1-May-2016.
  83. Lin Y and Lee J (2016). Vector data flow analysis for SIMD optimizations on OpenCL programs, Concurrency and Computation: Practice & Experience, 28:5, (1629-1654), Online publication date: 10-Apr-2016.
  84. Elkhouly R, El-Mahdy A and Elmasry A Optimality analysis of if-conversion transformation Proceedings of the 24th High Performance Computing Symposium, (1-8)
  85. Na Y, Kim S and Han Y (2016). JavaScript Parallelizing Compiler for Exploiting Parallelism from Data-Parallel HTML5 Applications, ACM Transactions on Architecture and Code Optimization, 12:4, (1-25), Online publication date: 7-Jan-2016.
  86. Yiapanis P, Brown G and Luján M (2015). Compiler-Driven Software Speculation for Thread-Level Parallelism, ACM Transactions on Programming Languages and Systems, 38:2, (1-45), Online publication date: 4-Jan-2016.
  87. Tan M, Liu G, Zhao R, Dai S and Zhang Z ElasticFlow Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, (78-85)
  88. Ding C, Lu H and Ye C MMC Proceedings of the 2015 International Symposium on Memory Systems, (47-50)
  89. Stevens J, Tschirhart P and Jacob B The Semantic Gap Between Software and the Memory System Proceedings of the 2015 International Symposium on Memory Systems, (43-46)
  90. Guo S, Kusano M, Wang C, Yang Z and Gupta A Assertion guided symbolic execution of multithreaded programs Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, (854-865)
  91. Venkat A, Hall M and Strout M (2015). Loop and data transformations for sparse matrix code, ACM SIGPLAN Notices, 50:6, (521-532), Online publication date: 7-Aug-2015.
  92. Weijiang Y, Balakrishna S, Liu J and Kulkarni M (2015). Tree dependence analysis, ACM SIGPLAN Notices, 50:6, (314-325), Online publication date: 7-Aug-2015.
  93. Kotha A, Anand K, Creech T, ElWazeer K, Smithson M, Yellareddy G and Barua R (2015). Affine Parallelization Using Dependence and Cache Analysis in a Binary Rewriter, IEEE Transactions on Parallel and Distributed Systems, 26:8, (2154-2163), Online publication date: 1-Aug-2015.
  94. Chatty S, Magnaudet M and Prun D Verification of properties of interactive components from their executable code Proceedings of the 7th ACM SIGCHI Symposium on Engineering Interactive Computing Systems, (276-285)
  95. Aloor R and Nandivada V Unique Worker model for OpenMP Proceedings of the 29th ACM on International Conference on Supercomputing, (47-56)
  96. Caballero D, Royuela S, Ferrer R, Duran A and Martorell X Optimizing Overlapped Memory Accesses in User-directed Vectorization Proceedings of the 29th ACM on International Conference on Supercomputing, (393-404)
  97. Venkat A, Hall M and Strout M Loop and data transformations for sparse matrix code Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, (521-532)
  98. Weijiang Y, Balakrishna S, Liu J and Kulkarni M Tree dependence analysis Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, (314-325)
  99. Hassaan M, Nguyen D and Pingali K (2015). Kinetic Dependence Graphs, ACM SIGARCH Computer Architecture News, 43:1, (457-471), Online publication date: 29-May-2015.
  100. Wang D, Janjusic T, Iversen C, Thornton P, Karssovski M, Wu W and Xu Y A scientific function test framework for modular environmental model development Proceedings of the 2015 International Workshop on Software Engineering for High Performance Computing in Science, (16-23)
  101. Hassaan M, Nguyen D and Pingali K (2015). Kinetic Dependence Graphs, ACM SIGPLAN Notices, 50:4, (457-471), Online publication date: 12-May-2015.
  102. Streit K, Doerfert J, Hammacher C, Zeller A and Hack S (2015). Generalized Task Parallelism, ACM Transactions on Architecture and Code Optimization, 12:1, (1-25), Online publication date: 16-Apr-2015.
  103. Hassaan M, Nguyen D and Pingali K Kinetic Dependence Graphs Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, (457-471)
  104. Kim H, El Hajj I, Stratton J, Lumetta S and Hwu W Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (257-268)
  105. Lazarescu M and Lavagno L (2015). Interactive Trace-Based Analysis Toolset for Manual Parallelization of C Programs, ACM Transactions on Embedded Computing Systems, 14:1, (1-20), Online publication date: 21-Jan-2015.
  106. Huda Z, Jannesari A and Wolf F (2015). Using Template Matching to Infer Parallel Design Patterns, ACM Transactions on Architecture and Code Optimization, 11:4, (1-21), Online publication date: 9-Jan-2015.
  107. Kong M, Pop A, Pouchet L, Govindarajan R, Cohen A and Sadayappan P (2015). Compiler/Runtime Framework for Dynamic Dataflow Parallelization of Tiled Programs, ACM Transactions on Architecture and Code Optimization, 11:4, (1-30), Online publication date: 9-Jan-2015.
  108. Cilardo A and Gallo L (2015). Improving Multibank Memory Access Parallelism with Lattice-Based Partitioning, ACM Transactions on Architecture and Code Optimization, 11:4, (1-25), Online publication date: 9-Jan-2015.
  109. Yi Q, Wang Q and Cui H Specializing Compiler Optimizations through Programmable Composition for Dense Matrix Computations Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, (596-608)
  110. Shirako J, Pouchet L and Sarkar V Oil and water can mix Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (287-298)
  111. Overbey J, Behrang F and Hafiz M A foundation for refactoring C with macros Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, (75-85)
  112. Sommer R, Vallentin M, De Carli L and Paxson V HILTI Proceedings of the 2014 Conference on Internet Measurement Conference, (461-474)
  113. Liu C, Zhang J, Zhou H, McDirmid S, Guo Z and Moscibroda T Automating Distributed Partial Aggregation Proceedings of the ACM Symposium on Cloud Computing, (1-12)
  114. Campanoni S, Brownell K, Kanev S, Jones T, Wei G and Brooks D (2014). HELIX-RC, ACM SIGARCH Computer Architecture News, 42:3, (217-228), Online publication date: 16-Oct-2014.
  115. Albert C, Murray A and Ravindran B Applying source level auto-vectorization to Aparapi Java Proceedings of the 2014 International Conference on Principles and Practices of Programming on the Java platform: Virtual machines, Languages, and Tools, (122-132)
  116. Kusano M and Wang C Assertion guided abstraction Proceedings of the 29th ACM/IEEE International Conference on Automated Software Engineering, (175-186)
  117. Shi X, Cui B, Dobbie G and Ooi B Towards unified ad-hoc data processing Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, (1263-1274)
  118. Campanoni S, Brownell K, Kanev S, Jones T, Wei G and Brooks D HELIX-RC Proceeding of the 41st annual international symposium on Computer architecture, (217-228)
  119. Waterland A, Angelino E, Adams R, Appavoo J and Seltzer M (2014). ASC, ACM SIGARCH Computer Architecture News, 42:1, (575-590), Online publication date: 5-Apr-2014.
  120. Waterland A, Angelino E, Adams R, Appavoo J and Seltzer M (2014). ASC, ACM SIGPLAN Notices, 49:4, (575-590), Online publication date: 5-Apr-2014.
  121. Kim T and Hoskote Y Automatic generation of custom SIMD instructions for superword level parallelism Proceedings of the conference on Design, Automation & Test in Europe, (1-6)
  122. Boehm M, Tatikonda S, Reinwald B, Sen P, Tian Y, Burdick D and Vaithyanathan S (2014). Hybrid parallelization strategies for large-scale machine learning in SystemML, Proceedings of the VLDB Endowment, 7:7, (553-564), Online publication date: 1-Mar-2014.
  123. Waterland A, Angelino E, Adams R, Appavoo J and Seltzer M ASC Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, (575-590)
  124. Lacassagne L, Etiemble D, Hassan Zahraee A, Dominguez A and Vezolle P High level transforms for SIMD and low-level computer vision algorithms Proceedings of the 2014 Workshop on Programming models for SIMD/Vector processing, (49-56)
  125. Venkat A, Shantharam M, Hall M and Strout M Non-affine Extensions to Polyhedral Code Generation Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, (185-194)
  127. Ketterlin A and Clauss P (2014). Recovering memory access patterns of executable programs, Science of Computer Programming, 80:PB, (440-456), Online publication date: 1-Feb-2014.
  128. Wang Z, Tournavitis G, Franke B and O'Boyle M (2014). Integrating profile-driven parallelism detection and machine-learning-based mapping, ACM Transactions on Architecture and Code Optimization, 11:1, (1-26), Online publication date: 1-Feb-2014.
  129. Brock J, Gu X, Bao B and Ding C (2013). Pacman, ACM SIGPLAN Notices, 48:11, (39-50), Online publication date: 4-Dec-2013.
  130. Fauzia N, Elango V, Ravishankar M, Ramanujam J, Rastello F, Rountev A, Pouchet L and Sadayappan P (2013). Beyond reuse distance analysis, ACM Transactions on Architecture and Code Optimization, 10:4, (1-29), Online publication date: 1-Dec-2013.
  131. Ravi N, Yang Y, Bao T and Chakradhar S Semi-automatic restructuring of offloadable tasks for many-core accelerators Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12)
  132. Seo S, Lee J, Jo G and Lee J Automatic OpenCL work-group size selection for multicore CPUs Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, (387-398)
  133. Govindaraju V, Nowatzki T and Sankaralingam K Breaking SIMD shackles with an exposed flexible microarchitecture and the access execute PDG Proceedings of the 22nd international conference on Parallel architectures and compilation techniques, (341-352)
  134. Compiler-directed memory hierarchy design for low-energy embedded systems Proceedings of the Eleventh ACM/IEEE International Conference on Formal Methods and Models for Codesign, (147-156)
  135. Henriksen T and Oancea C A T2 graph-reduction approach to fusion Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing, (47-58)
  136. Liu P, Huang C, Guo J, Geng Y, Wang W and Yang M Scalable-Grain Pipeline Parallelization Method for Multi-core Systems Proceedings of the 10th IFIP International Conference on Network and Parallel Computing - Volume 8147, (269-283)
  137. Ding C and Liu L Access Annotation for Safe Program Parallelization Proceedings of the 10th IFIP International Conference on Network and Parallel Computing - Volume 8147, (13-26)
  138. Papakonstantinou A, Gururaj K, Stratton J, Chen D, Cong J and Hwu W (2013). Efficient compilation of CUDA kernels for high-performance computing on FPGAs, ACM Transactions on Embedded Computing Systems, 13:2, (1-26), Online publication date: 1-Sep-2013.
  139. Agullo E, Buttari A, Guermouche A and Lopez F Multifrontal QR factorization for multicore architectures over runtime systems Proceedings of the 19th international conference on Parallel Processing, (521-532)
  140. Barthe G, Crespo J, Gulwani S, Kunz C and Marron M (2013). From relational verification to SIMD loop synthesis, ACM SIGPLAN Notices, 48:8, (123-134), Online publication date: 23-Aug-2013.
  141. Benoit A, Çatalyürek Ü, Robert Y and Saule E (2013). A survey of pipelined workflow scheduling, ACM Computing Surveys, 45:4, (1-36), Online publication date: 1-Aug-2013.
  142. Waterland A, Angelino E, Cubuk E, Kaxiras E, Adams R, Appavoo J and Seltzer M Computational caches Proceedings of the 6th International Systems and Storage Conference, (1-7)
  143. Sheffield D, Anderson M and Keutzer K Three fingered jack Proceedings of the 5th USENIX Conference on Hot Topics in Parallelism, (2-2)
  144. ACM
    Johnson N, Oh T, Zaks A and August D (2013). Fast condensation of the program dependence graph, ACM SIGPLAN Notices, 48:6, (39-50), Online publication date: 23-Jun-2013.
  145. ACM
    Kong M, Veras R, Stock K, Franchetti F, Pouchet L and Sadayappan P (2013). When polyhedral transformations meet SIMD code generation, ACM SIGPLAN Notices, 48:6, (127-138), Online publication date: 23-Jun-2013.
  146. ACM
    Brock J, Gu X, Bao B and Ding C Pacman Proceedings of the 2013 international symposium on memory management, (39-50)
  148. ACM
    Johnson N, Oh T, Zaks A and August D Fast condensation of the program dependence graph Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (39-50)
  149. ACM
    Kong M, Veras R, Stock K, Franchetti F, Pouchet L and Sadayappan P When polyhedral transformations meet SIMD code generation Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (127-138)
  150. ACM
    Alle M, Morvan A and Derrien S Runtime dependency analysis for loop pipelining in high-level synthesis Proceedings of the 50th Annual Design Automation Conference, (1-10)
  151. ACM
    Papakonstantinou A, Chen D, Hwu W, Cong J and Liang Y Throughput-oriented kernel porting onto FPGAs Proceedings of the 50th Annual Design Automation Conference, (1-10)
  152. Leung A, Lhoták O and Lashari G (2013). Parallel execution of Java loops on Graphics Processing Units, Science of Computer Programming, 78:5, (458-480), Online publication date: 1-May-2013.
  153. ACM
    Oh T, Kim H, Johnson N, Lee J and August D (2013). Practical automatic loop specialization, ACM SIGPLAN Notices, 48:4, (419-430), Online publication date: 23-Apr-2013.
  154. ACM
    Xiang X, Ding C, Luo H and Bao B (2013). HOTL, ACM SIGPLAN Notices, 48:4, (343-356), Online publication date: 23-Apr-2013.
  155. ACM
    Nandivada V, Shirako J, Zhao J and Sarkar V (2013). A Transformation Framework for Optimizing Task-Parallel Programs, ACM Transactions on Programming Languages and Systems, 35:1, (1-48), Online publication date: 1-Apr-2013.
  156. ACM
    Oh T, Kim H, Johnson N, Lee J and August D (2013). Practical automatic loop specialization, ACM SIGARCH Computer Architecture News, 41:1, (419-430), Online publication date: 29-Mar-2013.
  157. ACM
    Xiang X, Ding C, Luo H and Bao B (2013). HOTL, ACM SIGARCH Computer Architecture News, 41:1, (343-356), Online publication date: 29-Mar-2013.
  158. ACM
    Vasilache N, Baskaran M, Meister B and Lethin R Memory reuse optimizations in the R-Stream compiler Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, (42-53)
  159. ACM
    Oh T, Kim H, Johnson N, Lee J and August D Practical automatic loop specialization Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, (419-430)
  160. ACM
    Xiang X, Ding C, Luo H and Bao B HOTL Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems, (343-356)
  161. ACM
    Barthe G, Crespo J, Gulwani S, Kunz C and Marron M From relational verification to SIMD loop synthesis Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming, (123-134)
  162. August D, Huang J, Beard S, Johnson N and Jablin T Automatically exploiting cross-invocation parallelism using runtime information Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), (1-11)
  163. O'Boyle M, Wang Z and Grewe D Portable mapping of data parallel programs to OpenCL for heterogeneous systems Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), (1-10)
  164. ACM
    Pouchet L, Zhang P, Sadayappan P and Cong J Polyhedral-based data reuse optimization for configurable computing Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays, (29-38)
  165. Bocchino R Alias control for deterministic parallelism Aliasing in Object-Oriented Programming, (156-195)
  166. ACM
    Verdoolaege S, Carlos Juega J, Cohen A, Ignacio Gómez J, Tenllado C and Catthoor F (2013). Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, 9:4, (1-23), Online publication date: 1-Jan-2013.
  167. ACM
    Baghdadi R, Cohen A, Verdoolaege S and Trifunović K (2013). Improved loop tiling based on the removal of spurious false dependences, ACM Transactions on Architecture and Code Optimization, 9:4, (1-26), Online publication date: 1-Jan-2013.
  168. ACM
    Cui H, Yi Q, Xue J and Feng X (2013). Layout-oblivious compiler optimization for matrix computations, ACM Transactions on Architecture and Code Optimization, 9:4, (1-20), Online publication date: 1-Jan-2013.
  169. ACM
    Xydis S, Pekmestzi K, Soudris D and Economakos G (2013). Compiler-in-the-loop exploration during datapath synthesis for higher quality delay-area trade-offs, ACM Transactions on Design Automation of Electronic Systems, 18:1, (1-35), Online publication date: 1-Jan-2013.
  170. Ketterlin A and Clauss P Profiling Data-Dependence to Assist Parallelization Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, (437-448)
  171. ACM
    Jo Y and Kulkarni M (2012). Automatically enhancing locality for tree traversals with traversal splicing, ACM SIGPLAN Notices, 47:10, (355-374), Online publication date: 15-Nov-2012.
  172. ACM
    Li P, Wang Y, Zhang P, Luo G, Wang T and Cong J Memory partitioning and scheduling co-optimization in behavioral synthesis Proceedings of the International Conference on Computer-Aided Design, (488-495)
  173. ACM
    Jo Y and Kulkarni M Automatically enhancing locality for tree traversals with traversal splicing Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (355-374)
  174. Guo Z, Fan X, Chen R, Zhang J, Zhou H, McDirmid S, Liu C, Lin W, Zhou J and Zhou L Spotting code optimizations in data-parallel pipelines through PeriSCOPE Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation, (121-133)
  175. ACM
    Raman A, Lee J and August D From sequential programming to flexible parallel execution Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems, (37-40)
  176. Pellegrini S, Hoefler T and Fahringer T Exact dependence analysis for increased communication overlap Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface, (89-99)
  177. ACM
    Oancea C, Andreetta C, Berthold J, Frisch A and Henglein F Financial software on GPUs Proceedings of the 1st ACM SIGPLAN workshop on Functional high-performance computing, (61-72)
  178. ACM
    Kim S and Han H (2012). Efficient SIMD code generation for irregular kernels, ACM SIGPLAN Notices, 47:8, (55-64), Online publication date: 11-Sep-2012.
  179. Bikker J (2012). Improving Data Locality for Efficient In-Core Path Tracing, Computer Graphics Forum, 31:6, (1936-1947), Online publication date: 1-Sep-2012.
  180. ACM
    Oancea C and Rauchwerger L (2012). Logical inference techniques for loop parallelization, ACM SIGPLAN Notices, 47:6, (509-520), Online publication date: 6-Aug-2012.
  181. ACM
    Holewinski J, Ramamurthi R, Ravishankar M, Fauzia N, Pouchet L, Rountev A and Sadayappan P (2012). Dynamic trace-based analysis of vectorization potential of applications, ACM SIGPLAN Notices, 47:6, (371-382), Online publication date: 6-Aug-2012.
  182. ACM
    Raman A, Zaks A, Lee J and August D (2012). Parcae, ACM SIGPLAN Notices, 47:6, (133-144), Online publication date: 6-Aug-2012.
  183. ACM
    Yu H and Li Z Fast loop-level data dependence profiling Proceedings of the 26th ACM international conference on Supercomputing, (37-46)
  184. ACM
    Ramachandra K, Guravannavar R and Sudarshan S Program analysis and transformation for holistic optimization of database applications Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis, (39-44)
  185. ACM
    Oancea C and Rauchwerger L Logical inference techniques for loop parallelization Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (509-520)
  186. ACM
    Holewinski J, Ramamurthi R, Ravishankar M, Fauzia N, Pouchet L, Rountev A and Sadayappan P Dynamic trace-based analysis of vectorization potential of applications Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (371-382)
  187. ACM
    Raman A, Zaks A, Lee J and August D Parcae Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (133-144)
  188. Xu G, Yan D and Rountev A Static detection of loop-invariant data structures Proceedings of the 26th European conference on Object-Oriented Programming, (738-763)
  189. ACM
    Cong J, Zhang P and Zou Y Optimizing memory hierarchy allocation with loop transformations for high-level synthesis Proceedings of the 49th Annual Design Automation Conference, (1233-1238)
  190. ACM
    Campanoni S, Jones T, Holloway G, Wei G and Brooks D The HELIX project Proceedings of the 49th Annual Design Automation Conference, (277-282)
  191. ACM
    Park Y, Seo S, Park H, Cho H and Mahlke S (2012). SIMD defragmenter, ACM SIGPLAN Notices, 47:4, (363-374), Online publication date: 1-Jun-2012.
  192. Bao B, Ding C, Gao Y and Archambault R Delta Send-Recv for Dynamic Pipelining in MPI Programs Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), (384-392)
  193. Zhang J, Zhou H, Chen R, Fan X, Guo Z, Lin H, Li J, Lin W, Zhou J and Zhou L Optimizing data shuffling in data-parallel computation by understanding user-defined functions Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, (22-22)
  194. ACM
    Park Y, Seo S, Park H, Cho H and Mahlke S (2012). SIMD defragmenter, ACM SIGARCH Computer Architecture News, 40:1, (363-374), Online publication date: 18-Apr-2012.
  195. ACM
    Zhou X, Giacalone J, Garzarán M, Kuhn R, Ni Y and Padua D Hierarchical overlapped tiling Proceedings of the Tenth International Symposium on Code Generation and Optimization, (207-218)
  196. ACM
    Campanoni S, Jones T, Holloway G, Reddi V, Wei G and Brooks D HELIX Proceedings of the Tenth International Symposium on Code Generation and Optimization, (84-93)
  197. Unkule S, Shaltz C and Qasem A Automatic restructuring of GPU kernels for exploiting inter-thread data locality Proceedings of the 21st international conference on Compiler Construction, (21-40)
  198. ACM
    Park Y, Seo S, Park H, Cho H and Mahlke S SIMD defragmenter Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, (363-374)
  199. ACM
    Qasem A Efficient execution of time-step computations with pipelined parallelism and inter-thread data locality optimizations Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (27-35)
  200. ACM
    Kim S and Han H Efficient SIMD code generation for irregular kernels Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, (55-64)
  201. ACM
    Burrows E and Haveraaen M Programmable data dependencies and placements Proceedings of the 7th workshop on Declarative aspects and applications of multicore programming, (31-40)
  202. ACM
    Stock K, Pouchet L and Sadayappan P (2012). Using machine learning to improve automatic vectorization, ACM Transactions on Architecture and Code Optimization, 8:4, (1-23), Online publication date: 1-Jan-2012.
  203. ACM
    Feng M, Lin C and Gupta R (2012). PLDS, ACM Transactions on Architecture and Code Optimization, 8:4, (1-21), Online publication date: 1-Jan-2012.
  204. Owaida M, Bellas N, Antonopoulos C, Daloukas K and Antoniadis C Massively parallel programming models used as hardware description languages Proceedings of the International Conference on Computer-Aided Design, (326-333)
  205. Cong J, Zhang P and Zou Y Combined loop transformation and hierarchy allocation for data reuse optimization Proceedings of the International Conference on Computer-Aided Design, (185-192)
  206. Overbey J and Johnson R Differential precondition checking Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering, (303-312)
  207. ACM
    Jo Y and Kulkarni M Enhancing locality for recursive traversals of recursive structures Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, (463-482)
  208. ACM
    Ke C, Liu L, Zhang C, Bai T, Jacobs B and Ding C Safe parallel programming using dynamic dependence hints Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications, (243-258)
  209. ACM
    Jo Y and Kulkarni M (2011). Enhancing locality for recursive traversals of recursive structures, ACM SIGPLAN Notices, 46:10, (463-482), Online publication date: 18-Oct-2011.
  210. ACM
    Ke C, Liu L, Zhang C, Bai T, Jacobs B and Ding C (2011). Safe parallel programming using dynamic dependence hints, ACM SIGPLAN Notices, 46:10, (243-258), Online publication date: 18-Oct-2011.
  211. ACM
    Smith A and Kulkarni P Localizing globals and statics to make C programs thread-safe Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems, (205-214)
  212. Misailovic S, Roy D and Rinard M Probabilistically accurate program transformations Proceedings of the 18th international conference on Static analysis, (316-333)
  213. Burak D and Chudzik M Parallelization of the discrete chaotic block encryption algorithm Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part II, (323-332)
  214. Kalinnik N, Korch M and Rauber T (2011). An efficient time-step-based self-adaptive algorithm for predictor-corrector methods of Runge-Kutta type, Journal of Computational and Applied Mathematics, 236:3, (394-410), Online publication date: 1-Sep-2011.
  215. Krzikalla O, Feldhoff K, Müller-Pfefferkorn R and Nagel W Scout Proceedings of the 2011 international conference on Parallel Processing - Volume 2, (137-145)
  216. Donaldson A, Kaiser A, Kroening D and Wahl T Symmetry-aware predicate abstraction for shared-variable concurrent programs Proceedings of the 23rd international conference on Computer aided verification, (356-371)
  217. ACM
    Cong J, Huang H, Liu C and Zou Y A reuse-aware prefetching scheme for scratchpad memory Proceedings of the 48th Design Automation Conference, (960-965)
  218. ACM
    Udupa A, Rajan K and Thies W ALTER Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (480-491)
  219. ACM
    Sato S and Iwasaki H Automatic parallelization via matrix multiplication Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (470-479)
  220. ACM
    Raman A, Kim H, Oh T, Lee J and August D Parallelism orchestration using DoPE Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (26-37)
  221. ACM
    Pingali K, Nguyen D, Kulkarni M, Burtscher M, Hassaan M, Kaleem R, Lee T, Lenharth A, Manevich R, Méndez-Lojo M, Prountzos D and Sui X The tao of parallelism in algorithms Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (12-25)
  222. ACM
    Prabhu P, Ghosh S, Zhang Y, Johnson N and August D Commutative set Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, (1-11)
  223. ACM
    Udupa A, Rajan K and Thies W (2011). ALTER, ACM SIGPLAN Notices, 46:6, (480-491), Online publication date: 4-Jun-2011.
  224. ACM
    Sato S and Iwasaki H (2011). Automatic parallelization via matrix multiplication, ACM SIGPLAN Notices, 46:6, (470-479), Online publication date: 4-Jun-2011.
  225. ACM
    Raman A, Kim H, Oh T, Lee J and August D (2011). Parallelism orchestration using DoPE, ACM SIGPLAN Notices, 46:6, (26-37), Online publication date: 4-Jun-2011.
  226. ACM
    Pingali K, Nguyen D, Kulkarni M, Burtscher M, Hassaan M, Kaleem R, Lee T, Lenharth A, Manevich R, Méndez-Lojo M, Prountzos D and Sui X (2011). The tao of parallelism in algorithms, ACM SIGPLAN Notices, 46:6, (12-25), Online publication date: 4-Jun-2011.
  227. ACM
    Prabhu P, Ghosh S, Zhang Y, Johnson N and August D (2011). Commutative set, ACM SIGPLAN Notices, 46:6, (1-11), Online publication date: 4-Jun-2011.
  228. ACM
    McFarlin D, Arbatov V, Franchetti F and Püschel M Automatic SIMD vectorization of fast Fourier transforms for the Larrabee and AVX instruction sets Proceedings of the international conference on Supercomputing, (265-274)
  229. ACM
    Rahman S, Yi Q and Qasem A Understanding stencil code performance on multicore architectures Proceedings of the 8th ACM International Conference on Computing Frontiers, (1-10)
  230. ACM
    Bilardi G, Ekanadham K and Pattnaik P Efficient stack distance computation for priority replacement policies Proceedings of the 8th ACM International Conference on Computing Frontiers, (1-10)
  231. Newburn C, So B, Liu Z, McCool M, Ghuloum A, Toit S, Wang Z, Du Z, Chen Y, Wu G, Guo P, Liu Z and Zhang D Intel's Array Building Blocks Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (224-235)
  232. Kandemir M, Zhang Y, Liu J and Yemliha T Neighborhood-aware data locality optimization for NoC-based multicores Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (191-200)
  233. Nuzman D, Dyshel S, Rohou E, Rosen I, Williams K, Yuste D, Cohen A and Zaks A Vapor SIMD Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (151-160)
  234. Henretty T, Stock K, Pouchet L, Franchetti F, Ramanujam J and Sadayappan P Data layout transformation for stencil computations on short-vector SIMD architectures Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software, (225-245)
  235. ACM
    Kalinnik N, Korch M and Rauber T Dynamic selection of implementation variants of sequential iterated Runge-Kutta methods with tile size sampling Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering, (189-200)
  236. ACM
    Cong J, Jiang W, Liu B and Zou Y (2011). Automatic memory partitioning and scheduling for throughput and power optimization, ACM Transactions on Design Automation of Electronic Systems, 16:2, (1-25), Online publication date: 1-Mar-2011.
  237. Liu M, Sha E, Zhuge Q, He Y and Qiu M (2011). Loop Distribution and Fusion with Timing and Code Size Optimization, Journal of Signal Processing Systems, 62:3, (325-340), Online publication date: 1-Mar-2011.
  238. ACM
    Daloukas K, Antonopoulos C and Bellas N GLOpenCL Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, (15-24)
  239. Qiu M, Niu J, Yang L, Qin X, Zhang S and Wang B Energy-Aware Loop Parallelism Maximization for Multi-core DSP Architectures Proceedings of the 2010 IEEE/ACM Int'l Conference on Green Computing and Communications & Int'l Conference on Cyber, Physical and Social Computing, (205-212)
  240. Barik R, Zhao J and Sarkar V Efficient Selection of Vector Instructions Using Dynamic Programming Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, (201-212)
  241. Kotha A, Anand K, Smithson M, Yellareddy G and Barua R Automatic Parallelization in a Binary Rewriter Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, (547-557)
  242. Kim H, Raman A, Liu F, Lee J and August D Scalable Speculative Parallelization on Commodity Clusters Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, (3-14)
  243. Pouchet L, Bondhugula U, Bastoul C, Cohen A, Ramanujam J and Sadayappan P Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  244. ACM
    Li G and Gopalakrishnan G Scalable SMT-based verification of GPU kernel functions Proceedings of the eighteenth ACM SIGSOFT international symposium on Foundations of software engineering, (187-196)
  245. ACM
    Yu C and Petrov P (2010). Energy- and Performance-Efficient Communication Framework for Embedded MPSoCs through Application-Driven Release Consistency, ACM Transactions on Design Automation of Electronic Systems, 16:1, (1-39), Online publication date: 1-Nov-2010.
  246. ACM
    Palem K Compilers, architectures and synthesis for embedded computing Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, (167-176)
  247. ACM
    Méndez-Lojo M, Mathew A and Pingali K (2010). Parallel inclusion-based points-to analysis, ACM SIGPLAN Notices, 45:10, (428-443), Online publication date: 17-Oct-2010.
  248. ACM
    Herzeel C and Costanza P (2010). Dynamic parallelization of recursive code, ACM SIGPLAN Notices, 45:10, (377-396), Online publication date: 17-Oct-2010.
  249. ACM
    Tian K, Jiang Y, Zhang E and Shen X (2010). An input-centric paradigm for program dynamic optimizations, ACM SIGPLAN Notices, 45:10, (125-139), Online publication date: 17-Oct-2010.
  250. ACM
    Chakraborty S and Nandivada V Inferring arbitrary distributions for data and computation Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion, (51-60)
  251. ACM
    Méndez-Lojo M, Mathew A and Pingali K Parallel inclusion-based points-to analysis Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (428-443)
  252. ACM
    Herzeel C and Costanza P Dynamic parallelization of recursive code Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (377-396)
  253. ACM
    Tian K, Jiang Y, Zhang E and Shen X An input-centric paradigm for program dynamic optimizations Proceedings of the ACM international conference on Object oriented programming systems languages and applications, (125-139)
  254. Afek Y, Korland G and Zilberstein A Lowering STM overhead with static analysis Proceedings of the 23rd international conference on Languages and compilers for parallel computing, (31-45)
  255. Philippidis C and Shang W (2010). On minimizing register usage of linearly scheduled algorithms with uniform dependencies, Computer Languages, Systems and Structures, 36:3, (250-267), Online publication date: 1-Oct-2010.
  256. Qasem A, Guo J, Rahman F and Yi Q Exposing tunable parameters in multi-threaded numerical code Proceedings of the 2010 IFIP international conference on Network and parallel computing, (46-60)
  257. Nie J, Cheng B, Li S, Wang L and Li X Vectorization for Java Proceedings of the 2010 IFIP international conference on Network and parallel computing, (3-17)
  258. ACM
    Vandierendonck H, Rul S and De Bosschere K The Paralax infrastructure Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (389-400)
  259. ACM
    Tournavitis G and Franke B Semi-automatic extraction and exploitation of hierarchical pipeline parallelism using profiling information Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (377-388)
  260. ACM
    Lee J, Kim J, Seo S, Kim S, Park J, Kim H, Dao T, Cho Y, Seo S, Lee S, Cho S, Song H, Suh S and Choi J An OpenCL framework for heterogeneous multicores with local memory Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (193-204)
  261. ACM
    Zhao J, Shirako J, Nandivada V and Sarkar V Reducing task creation and termination overhead in explicitly parallel programs Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (169-180)
  262. ACM
    Purnaprajna M, Porrmann M, Rueckert U, Hussmann M, Thies M and Kastens U (2010). Runtime Reconfiguration of Multiprocessors Based on Compile-Time Analysis, ACM Transactions on Reconfigurable Technology and Systems, 3:3, (1-25), Online publication date: 1-Sep-2010.
  263. Lionetti F, McCulloch A and Baden S Source-to-source optimization of CUDA C for GPU accelerated cardiac cell modeling Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I, (38-49)
  264. Mak J, Faxén K, Janson S and Mycroft A Estimating and exploiting potential parallelism by source-level dependence profiling Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I, (26-37)
  265. ACM
    Tian C, Feng M and Gupta R (2010). Speculative parallelization using state separation and multiple value prediction, ACM SIGPLAN Notices, 45:8, (63-72), Online publication date: 1-Aug-2010.
  266. Agullo E, Bouwmeester H, Dongarra J, Kurzak J, Langou J and Rosenberg L Towards an efficient tile matrix inversion of symmetric positive definite matrices on multicore architectures Proceedings of the 9th international conference on High performance computing for computational science, (129-138)
  267. ACM
    Kandemir M, Muralidhara S, Karakoy M and Son S Computation mapping for multi-level storage cache hierarchies Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, (179-190)
  268. ACM
    Tian C, Feng M and Gupta R Speculative parallelization using state separation and multiple value prediction Proceedings of the 2010 international symposium on Memory management, (63-72)
  269. ACM
    Zhang E, Jiang Y and Shen X (2010). Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs?, ACM SIGPLAN Notices, 45:5, (203-212), Online publication date: 1-May-2010.
  270. ACM
    Harper K, Zheng J and Mahate S Experiences in initiating concurrency software research efforts Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering - Volume 2, (139-148)
  271. Steinberg R (2010). Mapping loop nests to multipipelined architecture, Programming and Computer Software, 36:3, (177-185), Online publication date: 1-May-2010.
  272. ACM
    Jiang Y, Zhang E, Tian K, Mao F, Gethers M, Shen X and Gao Y Exploiting statistical correlations for proactive prediction of program behaviors Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, (248-256)
  273. ACM
    Huang J, Raman A, Jablin T, Zhang Y, Hung T and August D Decoupled software pipelining creates parallelization opportunities Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization, (121-130)
  274. ACM
    Chen N and Johnson R Patterns for cache optimizations on multi-processor machines Proceedings of the 2010 Workshop on Parallel Programming Patterns, (1-10)
  275. ACM
    Hormati A, Choi Y, Woh M, Kudlur M, Rabbah R, Mudge T and Mahlke S MacroSS Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, (285-296)
  276. ACM
    Raman A, Kim H, Mason T, Jablin T and August D Speculative parallelization using software multi-threaded transactions Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, (65-76)
  277. ACM
    Hormati A, Choi Y, Woh M, Kudlur M, Rabbah R, Mudge T and Mahlke S (2010). MacroSS, ACM SIGPLAN Notices, 45:3, (285-296), Online publication date: 5-Mar-2010.
  278. ACM
    Raman A, Kim H, Mason T, Jablin T and August D (2010). Speculative parallelization using software multi-threaded transactions, ACM SIGPLAN Notices, 45:3, (65-76), Online publication date: 5-Mar-2010.
  279. ACM
    Hormati A, Choi Y, Woh M, Kudlur M, Rabbah R, Mudge T and Mahlke S (2010). MacroSS, ACM SIGARCH Computer Architecture News, 38:1, (285-296), Online publication date: 5-Mar-2010.
  280. ACM
    Raman A, Kim H, Mason T, Jablin T and August D (2010). Speculative parallelization using software multi-threaded transactions, ACM SIGARCH Computer Architecture News, 38:1, (65-76), Online publication date: 5-Mar-2010.
  281. ACM
    Askitis N and Zobel J (2011). Redesigning the string hash table, burst trie, and BST to exploit cache, ACM Journal of Experimental Algorithmics, 15, (1.1-1.61), Online publication date: 1-Mar-2010.
  282. ACM
    Zhang E, Jiang Y and Shen X Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, (203-212)
  283. ACM
    Renganarayana L, Bondhugula U, Derisavi S, Eichenberger A and O'Brien K Compact multi-dimensional kernel extraction for register tiling Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, (1-12)
  284. ACM
    Cong J, Jiang W, Liu B and Zou Y Automatic memory partitioning and scheduling for throughput and power optimization Proceedings of the 2009 International Conference on Computer-Aided Design, (697-704)
  285. ACM
    Barik R, Budimlic Z, Cavè V, Chatterjee S, Guo Y, Peixotto D, Raman R, Shirako J, Taşırlar S, Yan Y, Zhao Y and Sarkar V The Habanero multicore software research project Proceedings of the 24th ACM SIGPLAN conference companion on Object oriented programming systems languages and applications, (735-736)
  286. ACM
    Liu D, Shao Z, Wang M, Guo M and Xue J Optimal loop parallelization for maximizing iteration-level parallelism Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, (67-76)
  287. Kwiatkowski J and Iwaszyn R Automatic program parallelization for multicore processors Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I, (236-245)
  288. Bielecki W and Palkowski M Extracting both affine and non-linear synchronization-free slices in program loops Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I, (196-205)
  289. ACM
    Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L (2009). Optimistic parallelism requires abstractions, Communications of the ACM, 52:9, (89-97), Online publication date: 1-Sep-2009.
  290. ACM
    Leung A, Lhoták O and Lashari G Automatic parallelization for graphics processing units Proceedings of the 7th International Conference on Principles and Practice of Programming in Java, (91-100)
  291. ACM
    Zhong Y, Shen X and Ding C (2009). Program locality analysis using reuse distance, ACM Transactions on Programming Languages and Systems, 31:6, (1-39), Online publication date: 1-Aug-2009.
  292. ACM
    Bilardi G, Ekanadham K and Pattnaik P (2009). On approximating the ideal random access machine by physical machines, Journal of the ACM, 56:5, (1-57), Online publication date: 1-Aug-2009.
  293. ACM
    Mak J and Mycroft A Limits of parallelism using dynamic dependency graphs Proceedings of the Seventh International Workshop on Dynamic Analysis, (42-48)
  294. Long S and Fursin G (2009). Systematic search within an optimisation space based on Unified Transformation Framework, International Journal of Computational Science and Engineering, 4:2, (102-111), Online publication date: 1-Jul-2009.
  295. ACM
    Tournavitis G, Wang Z, Franke B and O'Boyle M Towards a holistic approach to auto-parallelization Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, (177-187)
  296. ACM
    Mehrara M, Hao J, Hsu P and Mahlke S Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation, (166-176)
  297. ACM
    Shirako J, Zhao J, Nandivada V and Sarkar V Chunking parallel loops in the presence of synchronization Proceedings of the 23rd international conference on Supercomputing, (181-192)
  298. ACM
    Tournavitis G, Wang Z, Franke B and O'Boyle M (2009). Towards a holistic approach to auto-parallelization, ACM SIGPLAN Notices, 44:6, (177-187), Online publication date: 28-May-2009.
  299. ACM
    Mehrara M, Hao J, Hsu P and Mahlke S (2009). Parallelizing sequential applications on commodity hardware using a low-cost software transactional memory, ACM SIGPLAN Notices, 44:6, (166-176), Online publication date: 28-May-2009.
  300. Liao C, Quinlan D, Willcock J and Panas T Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism, (28-41)
  301. Dimitroulakos G, Kostaras N, Galanis M and Goutis C (2009). Compiler assisted architectural exploration framework for coarse grained reconfigurable arrays, The Journal of Supercomputing, 48:2, (115-151), Online publication date: 1-May-2009.
  302. Kelsey K, Bai T, Ding C and Zhang C Fast Track Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization, (157-168)
  303. ACM
    Magee J and Qasem A A case for compiler-driven superpage allocation Proceedings of the 47th Annual Southeast Regional Conference, (1-4)
  304. ACM
    Jang B, Do S, Pien H and Kaeli D Architecture-aware optimization targeting multithreaded stream computing Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, (62-70)
  305. ACM
    Kulkarni M, Burtscher M, Inkulu R, Pingali K and Casçaval C (2009). How much parallelism is there in irregular applications?, ACM SIGPLAN Notices, 44:4, (3-14), Online publication date: 14-Feb-2009.
  306. ACM
    Kulkarni M, Burtscher M, Inkulu R, Pingali K and Casçaval C How much parallelism is there in irregular applications? Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, (3-14)
  307. ACM
    Cooper K, Eckhardt J and Kennedy K Redundancy elimination revisited Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (12-21)
  308. ACM
    Nuzman D and Zaks A Outer-loop vectorization Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (2-11)
  309. ACM
    Ghodrat M, Givargis T and Nicolau A Control flow optimization in loops using interval analysis Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, (157-166)
  310. ACM
    Leha A, Chalabine M and Kessler C Parallelizing scientific code with invasive interactive parallelization Proceedings of the 2008 compFrame/HPC-GECO workshop on Component based high performance, (1-10)
  311. ACM
    Arenaz M, Touriño J and Doallo R (2008). XARK, ACM Transactions on Programming Languages and Systems, 30:6, (1-56), Online publication date: 1-Oct-2008.
  312. ACM
    Youseff L, Seymour K, You H, Dongarra J and Wolski R The impact of paravirtualized memory hierarchy on linear algebra computational kernels and software Proceedings of the 17th international symposium on High performance distributed computing, (141-152)
  313. ACM
    Pouchet L, Bastoul C, Cohen A and Cavazos J Iterative optimization in the polyhedral model Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, (90-100)
  314. ACM
    Shou Y and van Engelen R Automatic SIMD vectorization of chains of recurrences Proceedings of the 22nd annual international conference on Supercomputing, (245-255)
  315. ACM
    Pouchet L, Bastoul C, Cohen A and Cavazos J (2008). Iterative optimization in the polyhedral model, ACM SIGPLAN Notices, 43:6, (90-100), Online publication date: 30-May-2008.
  316. ACM
    Rodrigues C, Hardy D, Stone J, Schulten K and Hwu W GPU acceleration of cutoff pair potentials for molecular modeling applications Proceedings of the 5th conference on Computing frontiers, (273-282)
  317. ACM
    Nuzman D, Namolaru M, Zaks A and Derby J Compiling for an indirect vector register architecture Proceedings of the 5th conference on Computing frontiers, (199-208)
  318. ACM
    Kotzmann T, Wimmer C, Mössenböck H, Rodriguez T, Russell K and Cox D (2008). Design of the Java HotSpot™ client compiler for Java 6, ACM Transactions on Architecture and Code Optimization, 5:1, (1-32), Online publication date: 1-May-2008.
  319. ACM
    Hampton M and Asanovic K Compiling for vector-thread architectures Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (205-215)
  320. ACM
    Ryoo S, Rodrigues C, Stone S, Baghsorkhi S, Ueng S, Stratton J and Hwu W Program optimization space pruning for a multithreaded gpu Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (195-204)
  321. ACM
    Raman E, Vachharajani N, Rangan R and August D Spice Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (175-184)
  322. ACM
    Raman E, Ottoni G, Raman A, Bridges M and August D Parallel-stage decoupled software pipelining Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization, (114-123)
  323. ACM
    Suleman M, Qureshi M and Patt Y (2008). Feedback-driven threading, ACM SIGPLAN Notices, 43:3, (277-286), Online publication date: 25-Mar-2008.
  324. ACM
    Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L (2008). Optimistic parallelism benefits from data partitioning, ACM SIGPLAN Notices, 43:3, (233-243), Online publication date: 25-Mar-2008.
  325. ACM
    Suleman M, Qureshi M and Patt Y (2008). Feedback-driven threading, ACM SIGOPS Operating Systems Review, 42:2, (277-286), Online publication date: 25-Mar-2008.
  326. ACM
    Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L (2008). Optimistic parallelism benefits from data partitioning, ACM SIGOPS Operating Systems Review, 42:2, (233-243), Online publication date: 25-Mar-2008.
  327. ACM
    Suleman M, Qureshi M and Patt Y (2008). Feedback-driven threading, ACM SIGARCH Computer Architecture News, 36:1, (277-286), Online publication date: 25-Mar-2008.
  328. ACM
    Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L (2008). Optimistic parallelism benefits from data partitioning, ACM SIGARCH Computer Architecture News, 36:1, (233-243), Online publication date: 25-Mar-2008.
  329. ACM
    Suleman M, Qureshi M and Patt Y Feedback-driven threading Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, (277-286)
  330. ACM
    Kulkarni M, Pingali K, Ramanarayanan G, Walter B, Bala K and Chew L Optimistic parallelism benefits from data partitioning Proceedings of the 13th international conference on Architectural support for programming languages and operating systems, (233-243)
  331. ACM
    Ryoo S, Rodrigues C, Baghsorkhi S, Stone S, Kirk D and Hwu W Optimization principles and application performance evaluation of a multithreaded GPU using CUDA Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming, (73-82)
  332. Reid W, Kelly W and Craik A Reasoning about inherent parallelism in modern object-oriented languages Proceedings of the thirty-first Australasian conference on Computer science - Volume 74, (27-36)
  333. Berzal F, Cubero J and Jiménez A Hierarchical program representation for program element matching Proceedings of the 8th international conference on Intelligent data engineering and automated learning, (467-476)
  334. Berzal F, Cubero J and Jiménez A Hierarchical Program Representation for Program Element Matching Intelligent Data Engineering and Automated Learning - IDEAL 2007, (467-476)
  335. Beletska A, Bielecki W and Pietro P Extracting synchronization-free slices of operations in perfectly-nested loops Proceedings of the 19th IASTED International Conference on Parallel and Distributed Computing and Systems, (244-249)
  336. Lokhmotov A, Gaster B, Mycroft A, Hickey N and Stuttard D Revisiting SIMD Programming Languages and Compilers for Parallel Computing, (32-46)
  337. Fritz N, Lucas P and Wilhelm R Exploiting SIMD Parallelism with the CGiS Compiler Framework Languages and Compilers for Parallel Computing, (246-260)
  338. ACM
    Absar J, Li M, Raghavan P, Lambrechts A, Jayapala M, Vandecappelle A and Catthoor F Locality optimization in wireless applications Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis, (125-130)
  339. ACM
    Fellahi M, Cohen A and Touati S Code-size conscious pipelining of imperfectly nested loops Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, (49-55)
  340. ACM
    Xiao S and Lai E (2007). VLIW instruction scheduling for minimal power variation, ACM Transactions on Architecture and Code Optimization, 4:3, (18-es), Online publication date: 1-Sep-2007.
  341. Korch M and Rauber T Locality optimized shared-memory implementations of iterated runge-kutta methods Proceedings of the 13th international Euro-Par conference on Parallel Processing, (737-747)
  342. Lokhmotov A, Mycroft A and Richards A Delayed side-effects ease multi-core programming Proceedings of the 13th international Euro-Par conference on Parallel Processing, (641-650)
  343. Donaldson A, Riley C, Lokhmotov A and Cook A Auto-parallelisation of sieve C++ programs Proceedings of the 2007 conference on Parallel processing, (18-27)
  344. Ryoo S, Ueng S, Rodrigues C, Kidd R, Frank M and Hwu W Automatic Discovery of Coarse-Grained Parallelism in Media Applications Transactions on High-Performance Embedded Architectures and Compilers I, (194-213)
  345. Zelenov S and Zelenova S Model-based testing of optimizing compilers Proceedings of the 19th IFIP TC6/WG6.1 international conference, and 7th international conference on Testing of Software and Communicating Systems, (365-377)
  346. ACM
    Ding C, Shen X, Kelsey K, Tice C, Huang R and Zhang C Software behavior oriented parallelization Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, (223-234)
  347. ACM
    Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L Optimistic parallelism requires abstractions Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, (211-222)
  348. ACM
    Ding C, Shen X, Kelsey K, Tice C, Huang R and Zhang C (2007). Software behavior oriented parallelization, ACM SIGPLAN Notices, 42:6, (223-234), Online publication date: 10-Jun-2007.
  349. ACM
    Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L (2007). Optimistic parallelism requires abstractions, ACM SIGPLAN Notices, 42:6, (211-222), Online publication date: 10-Jun-2007.
  350. ACM
    Yotov K, Roeder T, Pingali K, Gunnels J and Gustavson F An experimental comparison of cache-oblivious and cache-conscious programs Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures, (93-104)
  351. ACM
    Kennedy K, Koelbel C and Zima H The rise and fall of High Performance Fortran Proceedings of the third ACM SIGPLAN conference on History of programming languages, (7-1-7-22)
  352. ACM
    Dimitroulakos G, Galanis M, Kostaras N and Goutis C A unified evaluation framework for coarse grained reconfigurable array architectures Proceedings of the 4th international conference on Computing frontiers, (161-172)
  353. Fireman L, Petrank E and Zaks A New algorithms for SIMD alignment Proceedings of the 16th international conference on Compiler construction, (1-15)
  354. ACM
    Gontmakher A, Mendelson A and Schuster A Using fine grain multithreading for energy efficient computing Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, (259-269)
  355. Pouchet L, Bastoul C, Cohen A and Vasilache N Iterative Optimization in the Polyhedral Model Proceedings of the International Symposium on Code Generation and Optimization, (144-156)
  356. Birkbeck N, Levesque J and Amaral J A Dimension Abstraction Approach to Vectorization in Matlab Proceedings of the International Symposium on Code Generation and Optimization, (115-130)
  357. ACM
    Gill G, Hansen J and Singh M Loop pipelining for high-throughput stream computation using self-timed rings Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design, (289-296)
  358. Wang S, Zhai A and Yew P Exploiting speculative thread-level parallelism in data compression applications Proceedings of the 19th international conference on Languages and compilers for parallel computing, (126-140)
  359. Zhao Y and Kennedy K Dependence-based code generation for a CELL processor Proceedings of the 19th international conference on Languages and compilers for parallel computing, (64-79)
  360. ACM
    Audsley N and Ward M Syntax-driven implementation of software programming language control constructs and expressions on FPGAs Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, (253-260)
  361. ACM
    Birch J, van Engelen R, Gallivan K and Shou Y An empirical evaluation of chains of recurrences for array dependence testing Proceedings of the 15th international conference on Parallel architectures and compilation techniques, (295-304)
  362. Cohen A, Donadio S, Garzaran M, Herrmann C, Kiselyov O and Padua D (2006). In search of a program generator to implement generic transformations for high-performance computing, Science of Computer Programming, 62:1, (25-46), Online publication date: 1-Sep-2006.
  363. Parsa S and Lotfi S (2006). A New Genetic Algorithm for Loop Tiling, The Journal of Supercomputing, 37:3, (249-269), Online publication date: 1-Sep-2006.
  364. Hu Z, del Cuvillo J, Zhu W and Gao G Optimization of dense matrix multiplication on IBM cyclops-64 Proceedings of the 12th international conference on Parallel Processing, (134-144)
  365. ACM
    Vasilache N, Bastoul C, Cohen A and Girbal S Violated dependence analysis Proceedings of the 20th annual international conference on Supercomputing, (335-344)
  366. Parsa S and Lotfi S Loop parallelization in multi-dimensional cartesian space Proceedings of the 6th international Andrei Ershov memorial conference on Perspectives of systems informatics, (335-348)
  367. Zumbusch G Data dependence analysis for the parallelization of numerical tree codes Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (890-899)
  368. ACM
    Bocchino R and Adve V Vector LLVA Proceedings of the 2nd international conference on Virtual execution environments, (46-56)
  369. ACM
    Nuzman D, Rosen I and Zaks A Auto-vectorization of interleaved data for SIMD Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation, (132-143)
  370. ACM
    Nuzman D, Rosen I and Zaks A (2006). Auto-vectorization of interleaved data for SIMD, ACM SIGPLAN Notices, 41:6, (132-143), Online publication date: 11-Jun-2006.
  371. Dimitroulakos G, Galanis M and Goutis C Exploring the design space of an optimized compiler approach for mesh-like coarse-grained reconfigurable architectures Proceedings of the 20th international conference on Parallel and distributed processing, (113-113)
  372. Galanis M, Dimitroulakos G and Goutis C Design flow for optimizing performance in processor systems with on-chip coarse-grain reconfigurable logic Proceedings of the 20th international conference on Parallel and distributed processing, (112-112)
  373. Zhang Z and Seidel S A performance model for fine-grain accesses in UPC Proceedings of the 20th international conference on Parallel and distributed processing, (65-65)
  374. ACM
    Absar J and Catthoor F (2006). Reuse analysis of indirectly indexed arrays, ACM Transactions on Design Automation of Electronic Systems, 11:2, (282-305), Online publication date: 1-Apr-2006.
  375. Zhang T, Zhuang X and Pande S Compiler Optimizations to Reduce Security Overhead Proceedings of the International Symposium on Code Generation and Optimization, (346-357)
  376. Son S, Chen G and Kandemir M A Compiler-Guided Approach for Reducing Disk Power Consumption by Exploiting Disk Access Locality Proceedings of the International Symposium on Code Generation and Optimization, (256-268)
  377. ACM
    Tang P Complete inlining of recursive calls Proceedings of the 44th annual Southeast regional conference, (579-584)
  378. Dongarra J, Bosilca G, Chen Z, Eijkhout V, Fagg G, Fuentes E, Langou J, Luszczek P, Pjesivac-Grbovic J, Seymour K, You H and Vadhiyar S (2006). Self-adapting numerical software (SANS) effort, IBM Journal of Research and Development, 50:2/3, (223-238), Online publication date: 1-Mar-2006.
  379. Liu M, Zhuge Q, Shao Z, Xue C, Qiu M and Sha E Loop distribution and fusion with timing and code size optimization for embedded DSPs Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing, (121-130)
  380. Yang H, Govindarajan R, Gao G and Hu Z (2005). Improving power efficiency with compiler-assisted cache replacement, Journal of Embedded Computing, 1:4, (487-499), Online publication date: 1-Dec-2005.
  381. Pop S, Cohen A and Silber G Induction variable analysis with delayed abstractions Proceedings of the First international conference on High Performance Embedded Architectures and Compilers, (218-232)
  382. Weinberg J, McCracken M, Strohmaier E and Snavely A Quantifying Locality In The Memory Access Patterns of HPC Applications Proceedings of the 2005 ACM/IEEE conference on Supercomputing
  383. Larsen S, Rabbah R and Amarasinghe S Exploiting Vector Parallelism in Software Pipelined Loops Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, (119-129)
  384. Ottoni G, Rangan R, Stoler A and August D Automatic Thread Extraction with Decoupled Software Pipelining Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, (105-118)
  385. Zuck L, Pnueli A, Goldberg B, Barrett C, Fang Y and Hu Y (2005). Translation and Run-Time Validation of Loop Transformations, Formal Methods in System Design, 27:3, (335-360), Online publication date: 1-Nov-2005.
  386. Chalabine M and Kessler C Parallelisation of sequential programs by invasive composition and aspect weaving Proceedings of the 6th international conference on Advanced Parallel Processing Technologies, (131-140)
  387. Shen X and Ding C Parallelization of utility programs based on behavior phase analysis Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (425-432)
  388. Epshteyn A, Garzaran M, DeJong G, Padua D, Ren G, Li X, Yotov K and Pingali K Analytic models and empirical search Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (259-273)
  389. Renganarayana L, Ramakrishna U and Rajopadhye S Combined ILP and register tiling Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (244-258)
  390. Yotov K, Jackson S, Steele T, Pingali K and Stodghill P Automatic measurement of instruction cache capacity Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (230-243)
  391. ACM
    Matosevic I, Abdelrahman T, Karim F and Mellan A Power optimizations for the MLCA using dynamic voltage scaling Proceedings of the 2005 workshop on Software and compilers for embedded systems, (109-123)
  392. Narasamdya I and Voronkov A Finding basic block and variable correspondence Proceedings of the 12th international conference on Static Analysis, (251-267)
  393. ACM
    Johnson J, Krandick W and Ruslanov A Architecture-aware classical Taylor shift by 1 Proceedings of the 2005 international symposium on Symbolic and algebraic computation, (200-207)
  394. Barrett C, Fang Y, Goldberg B, Hu Y, Pnueli A and Zuck L TVOC Proceedings of the 17th international conference on Computer Aided Verification, (291-295)
  395. ACM
    Yotov K, Pingali K and Stodghill P Think globally, search locally Proceedings of the 19th annual international conference on Supercomputing, (141-150)
  396. ACM
    Shen X, Gao Y, Ding C and Archambault R Lightweight reference affinity analysis Proceedings of the 19th annual international conference on Supercomputing, (131-140)
  397. ACM
    Ni Y, Kremer U, Stere A and Iftode L Programming ad-hoc networks of mobile and resource-constrained devices Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation, (249-260)
  398. ACM
    Ni Y, Kremer U, Stere A and Iftode L (2005). Programming ad-hoc networks of mobile and resource-constrained devices, ACM SIGPLAN Notices, 40:6, (249-260), Online publication date: 12-Jun-2005.
  399. ACM
    Yotov K, Pingali K and Stodghill P (2005). Automatic measurement of memory hierarchy parameters, ACM SIGMETRICS Performance Evaluation Review, 33:1, (181-192), Online publication date: 6-Jun-2005.
  400. ACM
    Yotov K, Pingali K and Stodghill P Automatic measurement of memory hierarchy parameters Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (181-192)
  401. Alam S and Vetter J Performance and scalability analysis of cray x1 vectorization and multistreaming optimization Proceedings of the 5th international conference on Computational Science - Volume Part I, (304-312)
  402. Chen G, Chen G, Ozturk O and Kandemir M Exploiting Inter-Processor Data Sharing for Improving Behavior of Multi-Processor SoCs Proceedings of the IEEE Computer Society Annual Symposium on VLSI: New Frontiers in VLSI Design, (90-95)
  403. Agerwala T and Chatterjee S (2005). Computer Architecture, IEEE Micro, 25:3, (58-69), Online publication date: 1-May-2005.
  404. Grelck C (2005). Shared memory multiprocessor support for functional array processing in SAC, Journal of Functional Programming, 15:3, (353-401), Online publication date: 1-May-2005.
  405. Guo Z, Wang X and Zhou A WSQuery Proceedings of the 10th international conference on Database Systems for Advanced Applications, (372-384)
  406. Barton C, Tal A, Blainey B and Amaral J Generalized index-set splitting Proceedings of the 14th international conference on Compiler Construction, (106-120)
  407. Shashidhar K, Bruynooghe M, Catthoor F and Janssens G Verification of source code transformations by program equivalence checking Proceedings of the 14th international conference on Compiler Construction, (221-236)
  408. Shin J, Hall M and Chame J Superword-Level Parallelism in the Presence of Control Flow Proceedings of the international symposium on Code generation and optimization, (165-175)
  409. Edwards S The Challenges of Hardware Synthesis from C-Like Languages Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, (66-67)
  410. Shashidhar K, Bruynooghe M, Catthoor F and Janssens G Functional Equivalence Checking for Verification of Algebraic Transformations on Array-Intensive Source Code Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, (1310-1315)
  411. Beletskyy V and Burak D Parallelization of the data encryption standard(DES) algorithm Enhanced methods in computer security, biometric and artificial intelligence systems, (23-33)
  412. Zhao Y and Kennedy K (2005). Scalarization using loop alignment and loop skewing, The Journal of Supercomputing, 31:1, (5-46), Online publication date: 1-Jan-2005.
  413. Ding C and Orlovich M The Potential of Computation Regrouping for Improving Locality Proceedings of the 2004 ACM/IEEE conference on Supercomputing
  414. ACM
    Brifault K and Charles H Efficient data driven run-time code generation Proceedings of the 7th workshop on Workshop on languages, compilers, and run-time support for scalable systems, (1-7)
  415. Zhu Y, Magklis G, Scott M, Ding C and Albonesi D The Energy Impact of Aggressive Loop Fusion Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (153-164)
  416. ACM
    Liu M, Zhuge Q, Shao Z and Sha E General loop fusion technique for nested loops considering timing and code size Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, (190-201)
  417. Shen X, Zhong Y and Ding C Phase-Based miss rate prediction across program inputs Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (42-55)
  418. Baradaran N, Diniz P and Park J Extending the applicability of scalar replacement to multiple induction variables Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (455-469)
  419. Zhang G, Unnikrishnan P and Ren J Experiments with auto-parallelizing SPEC2000FP benchmarks Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (348-362)
  420. Yi Q and Quinlan D Applying loop optimizations to object-oriented abstractions through general classification of array semantics Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (253-267)
  421. Rauber T and Rünger G (2004). Improving locality for ODE solvers by program transformations, Scientific Programming, 12:3, (133-154), Online publication date: 1-Aug-2004.
  422. ACM
    Carribault P and Cohen A Applications of storage mapping optimization to register promotion Proceedings of the 18th annual international conference on Supercomputing, (247-256)
  423. Drakenberg N A matrix-type for performance–portability Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (237-246)
  424. Kowarschik M, Christadler I and Rüde U Towards cache-optimized multigrid using patch-adaptive relaxation Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (901-910)
  425. ACM
    Zhong Y, Orlovich M, Shen X and Ding C (2004). Array regrouping and structure splitting using whole-program reference affinity, ACM SIGPLAN Notices, 39:6, (255-266), Online publication date: 9-Jun-2004.
  426. ACM
    Zhong Y, Orlovich M, Shen X and Ding C Array regrouping and structure splitting using whole-program reference affinity Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, (255-266)
  427. ACM
    Yi Q, Kennedy K, You H, Seymour K and Dongarra J Automatic blocking of QR and LU factorizations for locality Proceedings of the 2004 workshop on Memory system performance, (12-22)
  428. Yi Q and Kennedy K (2004). Improving Memory Hierarchy Performance through Combined Loop Interchange and Multi-Level Fusion, International Journal of High Performance Computing Applications, 18:2, (237-253), Online publication date: 1-May-2004.
  429. ACM
    Allen R and Kennedy K (2004). Automatic loop interchange, ACM SIGPLAN Notices, 39:4, (75-90), Online publication date: 1-Apr-2004.
  430. Li X, Garzarán M and Padua D A Dynamically Tuned Sorting Library Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
  431. Rong H, Tang Z, Govindarajan R, Douillet A and Gao G Single-Dimension Software Pipelining for Multi-Dimensional Loops Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
  432. Yi Q, Kennedy K and Adve V (2004). Transforming Complex Loop Nests for Locality, The Journal of Supercomputing, 27:3, (219-264), Online publication date: 1-Mar-2004.
  433. ACM
    Song L and Kavi K (2004). What can we gain by unfolding loops?, ACM SIGPLAN Notices, 39:2, (26-33), Online publication date: 1-Feb-2004.
  434. Ding C and Kennedy K (2004). Improving effective bandwidth through compiler enhancement of global cache reuse, Journal of Parallel and Distributed Computing, 64:1, (108-134), Online publication date: 1-Jan-2004.
  435. Scholz S (2003). Single Assignment C: efficient support for high-level array operations in a functional setting, Journal of Functional Programming, 13:6, (1005-1059), Online publication date: 1-Nov-2003.
  436. ACM
    Chen M and Olukotun K The Jrpm system for dynamically parallelizing Java programs Proceedings of the 30th annual international symposium on Computer architecture, (434-446)
  437. ACM
    Ding C and Zhong Y Predicting whole-program locality through reuse distance analysis Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, (245-257)
  438. ACM
    Yotov K, Li X, Ren G, Cibulskis M, DeJong G, Garzaran M, Padua D, Pingali K, Stodghill P and Wu P A comparison of empirical and model-driven optimization Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation, (63-76)
  439. ACM
    Ding C and Zhong Y (2003). Predicting whole-program locality through reuse distance analysis, ACM SIGPLAN Notices, 38:5, (245-257), Online publication date: 9-May-2003.
  440. ACM
    Yotov K, Li X, Ren G, Cibulskis M, DeJong G, Garzaran M, Padua D, Pingali K, Stodghill P and Wu P (2003). A comparison of empirical and model-driven optimization, ACM SIGPLAN Notices, 38:5, (63-76), Online publication date: 9-May-2003.
  441. ACM
    Chen M and Olukotun K (2003). The Jrpm system for dynamically parallelizing Java programs, ACM SIGARCH Computer Architecture News, 31:2, (434-446), Online publication date: 1-May-2003.
  442. Ghosh S, Kanhere A, Krishnaiyer R, Kulkarni D, Li W, Lim C and Ng J Integrating high-level optimizations in a production compiler Proceedings of the 12th international conference on Compiler construction, (303-319)
  443. Chen M and Olukotun K TEST Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, (301-312)
  444. ACM
    Carter L, Ferrante J and Thomborson C (2003). Folklore confirmed, ACM SIGPLAN Notices, 38:1, (106-114), Online publication date: 15-Jan-2003.
  445. ACM
    Carter L, Ferrante J and Thomborson C Folklore confirmed Proceedings of the 30th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (106-114)
  446. Dongarra J, Foster I, Fox G, Gropp W, Kennedy K, Torczon L and White A References Sourcebook of parallel computing, (729-789)
  447. Grelck C and Scholz S Axis control in SAC Proceedings of the 14th international conference on Implementation of functional languages, (182-198)
  448. Bik A, Girkar M, Grey P and Tian X Automatic detection of saturation and clipping idioms Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing, (61-74)
  449. Tan M, Liu G, Zhao R, Dai S and Zhang Z ElasticFlow: A complexity-effective approach for pipelining irregular loop nests 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), (78-85)
Contributors
  • Rice University
  • Rice University

    Reviews

    Robert Ballance

    Data and control dependencies are the constraints among program statements that influence or dictate execution order. Dependence analysis extracts dependence information from a program in order to guide optimization and to expose potential parallelism. Data dependence analysis is particularly important in optimizing programs for parallel execution. In this book, Allen and Kennedy have managed to summarize 20 years of academic theory and industrial practice in designing dependence-based optimizations for high-performance parallel computing. The authors start with data dependence analysis, and then present analysis methods and optimizations for loops, fine- and coarse-grained parallelism, and other control constructs. As the book unfolds, the reader is brought closer to machine architectural issues such as register management, cache management, and instruction scheduling. Interprocedural analysis and optimization are also introduced. The book concludes with several case studies in optimization: C, hardware design languages, array assignments, and compiling High Performance Fortran. Each chapter concludes with both practical and theoretical exercises. Allen and Kennedy have not really written a compiler book, but rather an excellent introduction and guide to implementing optimizations for high-performance computing. It could be used as a graduate-level text on the applications of dependence analysis, or as an advanced compiler construction text. It would be an excellent follow-on text to a more general compiler construction book, such as Muchnick’s Advanced compiler design and implementation [1]. The publisher provides a Web site where one can find supporting materials, answers to sample exercises, prepared lectures, and errata. Overall, dependence analysis and optimization are hard topics, which the authors handle well. This is to be expected, given their experience in the field and their roles in developing the subject.
As computer architectures and high-performance applications continue to evolve, the techniques presented here will continue to be relevant to compiler writers. A good example is the problem of vectorization, which is again becoming an issue for the most advanced current architectures. The book is intended for academic instruction, and as a reference for industrial practitioners. Practicing programmers who are developing high-performance numerical codes would also benefit from reading the book. Online Computing Reviews Service

    Olivier Louis Marie Lecarme

    I have a very high opinion of the Morgan Kaufmann series of books in computer science. They are generally very well presented, and bound in handy and pleasant volumes. Thus, I was expecting the same for this new book. The book is almost 800 pages, with 36 pages of index, 289 entries in the reference section, a very pleasant presentation and layout, and a beautiful cover: these are the first things that are visible. The reputation of the authors is also impressive, especially Kennedy, who can be considered to be an authority in the subject area. As the preface explains, this book is the product of a 20-year research project at Rice University. Some aspects of the book are disappointing, however. First of all, as soon as you open the book, a leaflet of errata falls out (two leaflets, in fact). The first leaflet corrects a systematic error made by an overzealous copy editor, who added a comma in the number 1000 wherever it occurred in Fortran programs. The second leaflet corrects and replaces page 46, where there is some confusion in comparison operators in a definition. This is not the end: when you go to the Web site for the book (http://www.mkp.com/ocma/), you discover a 105 KB PDF file of errata, completely different from the previous ones. Another disappointment is that it seems impossible to find the PowerPoint presentations, used in classes based on the book, on this Web site, contrary to what is stated in the preface. Despite this impressive quantity of errata, let's consider the preface, table of contents, and first chapters. One rapidly discovers that the title of the book has been chosen by the editor, but not by the authors. Reading only the title, you could imagine that the book addresses the topic of building optimizing compilers for any general-purpose language. In fact, the book addresses the optimization of loops in Fortran programs, using the theory of dependence.
Almost all the program examples in the book (and there are plenty of examples) are in Fortran. I'm not saying this is not an important and interesting subject, but the book title is somewhat misleading. Fortunately, the book is very well written and organized. The authors are true specialists, both in the practice of compiler writing, and in the teaching of it. They are the authors of the main part of the theory they are explaining, as demonstrated by the enormous number of times their names occur in the large bibliography. The chapters in the book are as follows: This book can be used as support for an advanced undergraduate course, or, better, for a graduate course. It should also have a place in the library of anybody seriously working on implementing optimizing compilers for modern architectures. Online Computing Reviews Service