High Performance Compilers for Parallel Computing
June 1995
Publisher:
  • Addison-Wesley Longman Publishing Co., Inc.
  • 75 Arlington Street, Suite 300 Boston, MA
  • United States
ISBN: 978-0-8053-2730-4
Published: 01 June 1995
Pages: 570
Abstract

No abstract available.

Cited By

  1. ACM
    Jain A, Lin H, Villavieja C, Kasikci B, Kennelly C, Hashemi M and Ranganathan P Limoncello: Prefetchers for Scale Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, (577-590)
  2. ACM
    Zhang Y, Sobotka N, Park S, Jamilan S, Khan T, Kasikci B, Pokam G, Litz H and Devietti J RPG2: Robust Profile-Guided Runtime Prefetch Generation Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, (999-1013)
  3. ACM
    Reber B, Gould M, Kneipp A, Liu F, Prechtl I, Ding C, Chen L and Patru D (2023). Cache Programming for Scientific Loops Using Leases, ACM Transactions on Architecture and Code Optimization, 20:3, (1-25), Online publication date: 30-Sep-2023.
  4. Li J, Xie Q, Ma Y, Ma J, Ji K, Zhang Y, Zhang C, Chen Y, Wu G, Zhang J, Yang K, He X, Shen Q, Tao Y, Zhao H, Jiao P, Zhu C, Qian D and Xu C (2023). Big Data Analytic Toolkit: A General-Purpose, Modular, and Heterogeneous Acceleration Toolkit for Data Analytical Engines, Proceedings of the VLDB Endowment, 16:12, (3702-3714), Online publication date: 1-Aug-2023.
  5. ACM
    Essadki M, Michel B, Maugars B, Zinenko O, Vasilache N and Cohen A Code Generation for In-Place Stencils Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, (2-13)
  6. Praharenka W, Pankratz D, De Carvalho J, Amiri E and Amaral J (2022). Vectorizing divergent control flow with active-lane consolidation on long-vector architectures, The Journal of Supercomputing, 78:10, (12553-12588), Online publication date: 1-Jul-2022.
  7. ACM
    Ding C, Chen D, Liu F, Reber B and Smith W (2022). CARL: Compiler Assigned Reference Leasing, ACM Transactions on Architecture and Code Optimization, 19:1, (1-28), Online publication date: 31-Mar-2022.
  8. ACM
    Hecker M, Bischof S and Snelting G (2021). On Time-sensitive Control Dependencies, ACM Transactions on Programming Languages and Systems, 44:1, (1-37), Online publication date: 31-Mar-2022.
  9. ACM
    Jamilan S, Khan T, Ayers G, Kasikci B and Litz H APT-GET Proceedings of the Seventeenth European Conference on Computer Systems, (747-764)
  10. ACM
    Behroozi A, Park S and Mahlke S Loner: utilizing the CPU vector datapath to process scalar integer data Proceedings of the 31st ACM SIGPLAN International Conference on Compiler Construction, (205-217)
  11. Guo J, Yi Q and Psarris K (2022). Enhancing the Effectiveness of Inlining in Automatic Parallelization, International Journal of Parallel Programming, 50:1, (65-88), Online publication date: 1-Feb-2022.
  12. ACM
    Barredo A, Armejach A, Beard J and Moreto M PLANAR Proceedings of the ACM International Conference on Supercomputing, (164-176)
  13. Honorio B, de Carvalho J, Skaf M and Araujo G Using OpenMP to Detect and Speculate Dynamic DOALL Loops OpenMP: Portable Multi-Level Parallelism on Modern Systems, (231-246)
  14. Arabnejad H, Bispo J, Cardoso J and Barbosa J (2019). Source-to-source compilation targeting OpenMP-based automatic parallelization of C applications, The Journal of Supercomputing, 76:9, (6753-6785), Online publication date: 1-Sep-2020.
  15. ACM
    Mendonça G, Liao C and Pereira F AutoParBench Proceedings of the 34th ACM International Conference on Supercomputing, (1-10)
  16. ACM
    Ayers G, Litz H, Kozyrakis C and Ranganathan P Classifying Memory Access Patterns for Prefetching Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, (513-526)
  17. ACM
    Yuan L, Ding C, Smith W, Denning P and Zhang Y (2019). A Relational Theory of Locality, ACM Transactions on Architecture and Code Optimization, 16:3, (1-26), Online publication date: 20-Aug-2019.
  18. Porpodas V, Rocha R, Brevnov E, Góes L and Mattson T Super-Node SLP: optimized vectorization for code sequences containing operators and their inverse elements Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization, (206-216)
  19. ACM
    Porpodas V, Rocha R and Góes L VW-SLP Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, (1-15)
  20. Zhao J and Zhao R (2018). K-DT, The Journal of Supercomputing, 74:4, (1655-1675), Online publication date: 1-Apr-2018.
  21. ACM
    Zinenko O, Huot S and Bastoul C (2018). Visual Program Manipulation in the Polyhedral Model, ACM Transactions on Architecture and Code Optimization, 15:1, (1-25), Online publication date: 31-Mar-2018.
  22. ACM
    Porpodas V, Rocha R and Góes L Look-ahead SLP: auto-vectorization in the presence of commutative operations Proceedings of the 2018 International Symposium on Code Generation and Optimization, (163-174)
  23. Maalej M, Paisante V, Magno Quinto Pereira F and Gonnord L (2018). Combining range and inequality information for pointer disambiguation, Science of Computer Programming, 152:C, (161-184), Online publication date: 15-Jan-2018.
  24. Jin C, de Supinski B, Abramson D, Poxon H, DeRose L, Dinh M, Endrei M and Jessup E (2017). A survey on software methods to improve the energy efficiency of parallel computing, International Journal of High Performance Computing Applications, 31:6, (517-549), Online publication date: 1-Nov-2017.
  25. ACM
    Ahmed H, Skjellum A, Bangalore P and Pirkelbauer P Transforming blocking MPI collectives to Non-blocking and persistent operations Proceedings of the 24th European MPI Users' Group Meeting, (1-11)
  26. ACM
    Mendonça G, Guimarães B, Alves P, Pereira M, Araújo G and Pereira F (2017). DawnCC, ACM Transactions on Architecture and Code Optimization, 14:2, (1-25), Online publication date: 21-Jul-2017.
  27. ACM
    Sampaio D, Pouchet L and Rastello F Simplification and runtime resolution of data dependence constraints for loop transformations Proceedings of the International Conference on Supercomputing, (1-11)
  28. ACM
    Bilardi G, Ekanadham K and Pattnaik P Optimal On-Line Computation of Stack Distances for MIN and OPT Proceedings of the Computing Frontiers Conference, (237-246)
  29. Liu X, Qiu M, Wang X, Liu W and Cai K (2017). Energy Efficiency Optimization for Communication of Air-Based Information Network with Guaranteed Timing Constraints, Journal of Signal Processing Systems, 86:2-3, (299-312), Online publication date: 1-Mar-2017.
  30. Maalej M, Paisante V, Ramos P, Gonnord L and Pereira F Pointer disambiguation via strict inequalities Proceedings of the 2017 International Symposium on Code Generation and Optimization, (134-147)
  31. Zhang H, Venkat A and Hall M Compiler transformation to generate hybrid sparse computations Proceedings of the Sixth Workshop on Irregular Applications: Architectures and Algorithms, (34-41)
  32. ACM
    Sampaio D, Ketterlin A, Pouchet L and Rastello F POSTER Proceedings of the 2016 International Conference on Parallel Architectures and Compilation, (439-440)
  33. Wende F, Noack M, Steinke T, Klemm M, Newburn C and Zitzlsberger G Portable SIMD Performance with OpenMP* 4.x Compiler Directives Proceedings of the 22nd International Conference on Euro-Par 2016: Parallel Processing - Volume 9833, (264-277)
  34. ACM
    Sultana N, Calvert A, Overbey J and Arnold G From OpenACC to OpenMP 4 Proceedings of the XSEDE16 Conference on Diversity, Big Data, and Science at Scale, (1-8)
  35. Sridi M, Raffin B and Faucher V (2016). Cache Aware Dynamics Data Layout for Efficient Shared Memory Parallelisation of EUROPLEXUS, Procedia Computer Science, 80:C, (1083-1092), Online publication date: 1-Jun-2016.
  36. ACM
    Bao W, Krishnamoorthy S, Pouchet L, Rastello F and Sadayappan P (2016). PolyCheck: dynamic verification of iteration space transformations on affine programs, ACM SIGPLAN Notices, 51:1, (539-554), Online publication date: 8-Apr-2016.
  37. ACM
    Paisante V, Maalej M, Barbosa L, Gonnord L and Quintão Pereira F Symbolic range analysis of pointers Proceedings of the 2016 International Symposium on Code Generation and Optimization, (171-181)
  38. ACM
    Bagnères L, Zinenko O, Huot S and Bastoul C Opening polyhedral compiler's black box Proceedings of the 2016 International Symposium on Code Generation and Optimization, (128-138)
  39. ACM
    Bao W, Krishnamoorthy S, Pouchet L, Rastello F and Sadayappan P PolyCheck: dynamic verification of iteration space transformations on affine programs Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, (539-554)
  40. Yan C and Cheung A (2016). Leveraging lock contention to improve OLTP application performance, Proceedings of the VLDB Endowment, 9:5, (444-455), Online publication date: 1-Jan-2016.
  41. Membarth R, Reiche O, Hannig F, Teich J, Korner M and Eckert W (2016). HIPAcc: A Domain-Specific Language and Compiler for Image Processing, IEEE Transactions on Parallel and Distributed Systems, 27:1, (210-224), Online publication date: 1-Jan-2016.
  42. ACM
    Alves P, Gruber F, Doerfert J, Lamprineas A, Grosser T, Rastello F and Pereira F (2015). Runtime pointer disambiguation, ACM SIGPLAN Notices, 50:10, (589-606), Online publication date: 18-Dec-2015.
  43. ACM
    Alves P, Gruber F, Doerfert J, Lamprineas A, Grosser T, Rastello F and Pereira F Runtime pointer disambiguation Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, (589-606)
  44. Demontiê F, Cezar J, Bigonha M, Campos F and Magno Quintão Pereira F Automatic Inference of Loop Complexity Through Polynomial Interpolation Proceedings of the 19th Brazilian Symposium on Programming Languages - Volume 9325, (1-15)
  45. ACM
    Mehta S and Yew P (2015). Improving compiler scalability: optimizing large programs at small price, ACM SIGPLAN Notices, 50:6, (143-152), Online publication date: 7-Aug-2015.
  46. Sheikh R, Tuck J and Rotenberg E (2015). Control-Flow Decoupling: An Approach for Timely, Non-Speculative Branching, IEEE Transactions on Computers, 64:8, (2182-2203), Online publication date: 1-Aug-2015.
  47. ACM
    Pananilath I, Acharya A, Vasista V and Bondhugula U (2015). An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations, ACM Transactions on Architecture and Code Optimization, 12:2, (1-23), Online publication date: 8-Jul-2015.
  48. ACM
    Mehta S and Yew P Improving compiler scalability: optimizing large programs at small price Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, (143-152)
  49. Porpodas V, Magni A and Jones T PSLP Proceedings of the 13th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (190-201)
  50. Shirako J, Pouchet L and Sarkar V Oil and water can mix Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, (287-298)
  51. ACM
    Bondhugula U, Bandishti V, Cohen A, Potron G and Vasilache N Tiling and optimizing time-iterated computations on periodic domains Proceedings of the 23rd international conference on Parallel architectures and compilation, (39-50)
  52. ACM
    Piccoli G, Santos H, Rodrigues R, Pousa C, Borin E and Quintão Pereira F Compiler support for selective page migration in NUMA architectures Proceedings of the 23rd international conference on Parallel architectures and compilation, (369-380)
  53. ACM
    Meloni P, Tuveri G, Raffo L, Loi I and Conti F A Stream Buffer Mechanism for Pervasive Splitting Transformations on Polyhedral Process Networks Proceedings of International Workshop on Manycore Embedded Systems, (25-32)
  54. ACM
    Ahmed N, Mateev N and Pingali K Synthesizing transformations for locality enhancement of imperfectly-nested loop nests ACM International Conference on Supercomputing 25th Anniversary Volume, (299-310)
  55. ACM
    Bilmes J, Asanovic K, Chin C and Demmel J Optimizing matrix multiply using PHiPAC ACM International Conference on Supercomputing 25th Anniversary Volume, (253-260)
  56. ACM
    Clauss P Author retrospective for counting solutions to linear and nonlinear constraints through ehrhart polynomials ACM International Conference on Supercomputing 25th Anniversary Volume, (37-39)
  57. ACM
    Grosser T, Cohen A, Holewinski J, Sadayappan P and Verdoolaege S Hybrid Hexagonal/Classical Tiling for GPUs Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, (66-75)
  59. Ketterlin A and Clauss P (2014). Recovering memory access patterns of executable programs, Science of Computer Programming, 80:PB, (440-456), Online publication date: 1-Feb-2014.
  60. ACM
    Samadi M, Hormati A, Lee J and Mahlke S (2014). Leveraging GPUs using cooperative loop speculation, ACM Transactions on Architecture and Code Optimization, 11:1, (1-26), Online publication date: 1-Feb-2014.
  61. ACM
    Park J, Bikshandi G, Vaidyanathan K, Tang P, Dubey P and Kim D Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12)
  62. ACM
    Park J, Yoo R, Khudia D, Hughes C and Kim D Location-aware cache management for many-core processors with deep cache hierarchy Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12)
  63. Chen N and Johnson R JFlow Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, (202-212)
  64. Zuo W, Li P, Chen D, Pouchet L, Zhong S and Cong J Improving polyhedral code generation for high-level synthesis Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, (1-10)
  65. ACM
    Kong M, Veras R, Stock K, Franchetti F, Pouchet L and Sadayappan P (2013). When polyhedral transformations meet SIMD code generation, ACM SIGPLAN Notices, 48:6, (127-138), Online publication date: 23-Jun-2013.
  66. ACM
    Kong M, Veras R, Stock K, Franchetti F, Pouchet L and Sadayappan P When polyhedral transformations meet SIMD code generation Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, (127-138)
  67. Leung A, Lhoták O and Lashari G (2013). Parallel execution of Java loops on Graphics Processing Units, Science of Computer Programming, 78:5, (458-480), Online publication date: 1-May-2013.
  68. ACM
    Nandivada V, Shirako J, Zhao J and Sarkar V (2013). A Transformation Framework for Optimizing Task-Parallel Programs, ACM Transactions on Programming Languages and Systems, 35:1, (1-48), Online publication date: 1-Apr-2013.
  69. Bhaskaracharya S and Bondhugula U PolyGLoT Proceedings of the 22nd international conference on Compiler Construction, (123-143)
  70. ACM
    Bhattacharyya A and Amaral J Automatic speculative parallelization of loops using polyhedral dependence analysis Proceedings of the First International Workshop on Code OptimiSation for MultI and many Cores, (1-9)
  71. ACM
    Verdoolaege S, Carlos Juega J, Cohen A, Ignacio Gómez J, Tenllado C and Catthoor F (2013). Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, 9:4, (1-23), Online publication date: 1-Jan-2013.
  72. ACM
    Coelho F and Irigoin F (2013). API compilation for image hardware accelerators, ACM Transactions on Architecture and Code Optimization, 9:4, (1-25), Online publication date: 1-Jan-2013.
  73. Tang P, Park J, Kim D and Petrov V A framework for low-communication 1-D FFT Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-12)
  74. Ding W, Zhang Y, Kandemir M and Son S Compiler-directed file layout optimization for hierarchical storage systems Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  75. Bandishti V, Pananilath I and Bondhugula U Tiling stencil computations to maximize parallelism Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  76. Park J, Tang P, Smelyanskiy M, Kim D and Benson T Efficient backprojection-based synthetic aperture radar computation with many-core processors Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  77. ACM
    Holewinski J, Ramamurthi R, Ravishankar M, Fauzia N, Pouchet L, Rountev A and Sadayappan P (2012). Dynamic trace-based analysis of vectorization potential of applications, ACM SIGPLAN Notices, 47:6, (371-382), Online publication date: 6-Aug-2012.
  78. ACM
    Österlund E and Löwe W Analysis of pure methods using garbage collection Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness, (48-57)
  79. ACM
    Holewinski J, Ramamurthi R, Ravishankar M, Fauzia N, Pouchet L, Rountev A and Sadayappan P Dynamic trace-based analysis of vectorization potential of applications Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, (371-382)
  80. Xu G, Yan D and Rountev A Static detection of loop-invariant data structures Proceedings of the 26th European conference on Object-Oriented Programming, (738-763)
  81. ACM
    Stock K, Pouchet L and Sadayappan P (2012). Using machine learning to improve automatic vectorization, ACM Transactions on Architecture and Code Optimization, 8:4, (1-23), Online publication date: 1-Jan-2012.
  82. Yedlapalli P, Kultursay E and Kandemir M Cooperative parallelization Proceedings of the International Conference on Computer-Aided Design, (134-141)
  83. ACM
    Teodoro G, Valle E, Mariano N, Torres R and Meira W Adaptive parallel approximate similarity search for responsive multimedia retrieval Proceedings of the 20th ACM international conference on Information and knowledge management, (495-504)
  84. ACM
    Karmani R, Madhusudan P and Moore B (2011). Thread contracts for safe parallelism, ACM SIGPLAN Notices, 46:8, (125-134), Online publication date: 7-Sep-2011.
  85. ACM
    Li Y and Dos Reis G An automatic parallelization framework for algebraic computation systems Proceedings of the 36th international symposium on Symbolic and algebraic computation, (233-240)
  86. ACM
    Bilardi G, Ekanadham K and Pattnaik P Efficient stack distance computation for priority replacement policies Proceedings of the 8th ACM International Conference on Computing Frontiers, (1-10)
  87. Kandemir M, Zhang Y, Liu J and Yemliha T Neighborhood-aware data locality optimization for NoC-based multicores Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (191-200)
  88. Karrenberg R and Hack S Whole-function vectorization Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization, (141-150)
  89. Henretty T, Stock K, Pouchet L, Franchetti F, Ramanujam J and Sadayappan P Data layout transformation for stencil computations on short-vector SIMD architectures Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software, (225-245)
  90. Liu M, Sha E, Zhuge Q, He Y and Qiu M (2011). Loop Distribution and Fusion with Timing and Code Size Optimization, Journal of Signal Processing Systems, 62:3, (325-340), Online publication date: 1-Mar-2011.
  91. Membarth R, Hannig F, Teich J, Körner M and Eckert W Frameworks for multi-core architectures Proceedings of the 24th international conference on Architecture of computing systems, (62-73)
  92. ACM
    Karmani R, Madhusudan P and Moore B Thread contracts for safe parallelism Proceedings of the 16th ACM symposium on Principles and practice of parallel programming, (125-134)
  93. ACM
    Pouchet L, Bondhugula U, Bastoul C, Cohen A, Ramanujam J, Sadayappan P and Vasilache N Loop transformations Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (549-562)
  94. ACM
    Pouchet L, Bondhugula U, Bastoul C, Cohen A, Ramanujam J, Sadayappan P and Vasilache N (2011). Loop transformations, ACM SIGPLAN Notices, 46:1, (549-562), Online publication date: 26-Jan-2011.
  95. Yeom J and Nikolopoulos D Strider Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  96. Pouchet L, Bondhugula U, Bastoul C, Cohen A, Ramanujam J and Sadayappan P Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, (1-11)
  97. ACM
    Lin H, Liu T, Li H, Chen T, Renganarayana L, O'Brien J and Shao L DMATiler Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (559-560)
  98. ACM
    Patrick C, Kandemir M, Karaköy M, Son S and Choudhary A Cashing in on hints for better prefetching and caching in PVFS and MPI-IO Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, (191-202)
  99. ACM
    Kandemir M, Muralidhara S, Karakoy M and Son S Computation mapping for multi-level storage cache hierarchies Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, (179-190)
  100. Breitbart J An approach for semiautomatic locality optimizations using OpenMP Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, (291-301)
  101. ACM
    Cardoso J, Diniz P and Weinhardt M (2010). Compiling for reconfigurable computing, ACM Computing Surveys, 42:4, (1-65), Online publication date: 1-Jun-2010.
  102. Raghavendra P, Behki A, Hariprasad K, Mohan M, Jain P, Bhat S, Thejus V and Prabhu V A study of performance scalability by parallelizing loop iterations on multi-core SMPs Proceedings of the 10th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I, (476-486)
  103. ACM
    Wolfe M Implementing the PGI Accelerator model Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, (43-50)
  104. Robert Y and Vivien F Algorithmic issues in grid computing Algorithms and theory of computation handbook, (29-29)
  105. Yang C and Lai K (2009). A directive-based MPI code generator for Linux PC clusters, The Journal of Supercomputing, 50:2, (177-207), Online publication date: 1-Nov-2009.
  106. ACM
    Lublinerman R, Chaudhuri S and Cerny P Parallel programming with object assemblies Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications, (61-80)
  107. ACM
    Lublinerman R, Chaudhuri S and Cerny P (2009). Parallel programming with object assemblies, ACM SIGPLAN Notices, 44:10, (61-80), Online publication date: 25-Oct-2009.
  108. Diouf B, Ozturk O and Cohen A Optimizing local memory allocation and assignment through a decoupled approach Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing, (408-415)
  109. ACM
    Leung A, Lhoták O and Lashari G Automatic parallelization for graphics processing units Proceedings of the 7th International Conference on Principles and Practice of Programming in Java, (91-100)
  110. ACM
    Zhong Y, Shen X and Ding C (2009). Program locality analysis using reuse distance, ACM Transactions on Programming Languages and Systems, 31:6, (1-39), Online publication date: 1-Aug-2009.
  111. ACM
    Bilardi G, Ekanadham K and Pattnaik P (2009). On approximating the ideal random access machine by physical machines, Journal of the ACM, 56:5, (1-57), Online publication date: 1-Aug-2009.
  112. ACM
    Wasserrab D, Lohner D and Snelting G On PDG-based noninterference and its modular proof Proceedings of the ACM SIGPLAN Fourth Workshop on Programming Languages and Analysis for Security, (31-44)
  113. ACM
    Shirako J, Zhao J, Nandivada V and Sarkar V Chunking parallel loops in the presence of synchronization Proceedings of the 23rd international conference on Supercomputing, (181-192)
  114. ACM
    Nicolau A, Li G, Veidenbaum A and Kejariwal A Synchronization optimizations for efficient execution on multi-cores Proceedings of the 23rd international conference on Supercomputing, (169-180)
  115. Dig D, Marrero J and Ernst M Refactoring sequential Java code for concurrency via concurrent libraries Proceedings of the 31st International Conference on Software Engineering, (397-407)
  116. ACM
    Kejariwal A, Nicolau A, Banerjee U, Veidenbaum A and Polychronopoulos C Cache-aware partitioning of multi-dimensional iteration spaces Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference, (1-12)
  117. ACM
    Unnikrishnan P, Chen G, Kandemir M, Karakoy M and Kolcu I (2009). Reducing memory requirements of resource-constrained applications, ACM Transactions on Embedded Computing Systems, 8:3, (1-37), Online publication date: 1-Apr-2009.
  118. ACM
    Hohenauer M, Engel F, Leupers R, Ascheid G and Meyr H (2009). A SIMD optimization framework for retargetable compilers, ACM Transactions on Architecture and Code Optimization, 6:1, (1-27), Online publication date: 30-Mar-2009.
  119. ACM
    Sidelnik A, Sung I, Wu W, Garzarán M, Hwu W, Nahrstedt K, Padua D and Patel S Optimization of tele-immersion codes Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, (85-93)
  120. ACM
    Qiu M and Sha E (2009). Cost minimization while satisfying hard/soft timing constraints for heterogeneous embedded systems, ACM Transactions on Design Automation of Electronic Systems, 14:2, (1-30), Online publication date: 1-Mar-2009.
  121. ACM
    Mittal G, Zaretsky D and Banerjee P Streaming implementation of a sequential decompression algorithm on an FPGA Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays, (283-283)
  122. ACM
    Son S, Kandemir M, Karakoy M and Chakrabarti D (2009). A compiler-directed data prefetching scheme for chip multiprocessors, ACM SIGPLAN Notices, 44:4, (209-218), Online publication date: 14-Feb-2009.
  123. ACM
    Son S, Kandemir M, Karakoy M and Chakrabarti D A compiler-directed data prefetching scheme for chip multiprocessors Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, (209-218)
  124. Masri W and Podgurski A (2009). Algorithms and tool support for dynamic information flow analysis, Information and Software Technology, 51:2, (385-404), Online publication date: 1-Feb-2009.
  125. Ozturk O, Son S, Kandemir M and Karakoy M Prefetch throttling and data pinning for improving performance of shared caches Proceedings of the 2008 ACM/IEEE conference on Supercomputing, (1-12)
  126. ACM
    Son S, Muralidhara S, Ozturk O, Kandemir M, Kolcu I and Karakoy M Profiler and compiler assisted adaptive I/O prefetching for shared storage caches Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (112-121)
  127. ACM
    Nuzman D and Zaks A Outer-loop vectorization Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (2-11)
  128. ACM
    Ghodrat M, Givargis T and Nicolau A Control flow optimization in loops using interval analysis Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, (157-166)
  129. ACM
    Arenaz M, Touriño J and Doallo R (2008). XARK, ACM Transactions on Programming Languages and Systems, 30:6, (1-56), Online publication date: 1-Oct-2008.
  130. Dwyer M and Purandare R Residual Checking of Safety Properties Proceedings of the 15th international workshop on Model Checking Software, (1-2)
  131. Torkey F, Salah A, El Desouky N and Gomaa S Affine and unimodular transformations for non-uniform nested loops Proceedings of the 12th WSEAS international conference on Computers, (414-419)
  132. ACM
    Pouchet L, Bastoul C, Cohen A and Cavazos J Iterative optimization in the polyhedral model Proceedings of the 29th ACM SIGPLAN Conference on Programming Language Design and Implementation, (90-100)
  133. ACM
    Pouchet L, Bastoul C, Cohen A and Cavazos J (2008). Iterative optimization in the polyhedral model, ACM SIGPLAN Notices, 43:6, (90-100), Online publication date: 30-May-2008.
  134. ACM
    Nuzman D, Namolaru M, Zaks A and Derby J Compiling for an indirect vector register architecture Proceedings of the 5th conference on Computing frontiers, (199-208)
  135. Qiu M, Sha E, Liu M, Lin M, Hua S and Yang L (2008). Energy minimization with loop fusion and multi-functional-unit scheduling for multidimensional DSP, Journal of Parallel and Distributed Computing, 68:4, (443-455), Online publication date: 1-Apr-2008.
  136. Kandemir M, Son S and Karakoy M Improving I/O performance of applications through compiler-directed code restructuring Proceedings of the 6th USENIX Conference on File and Storage Technologies, (1-16)
  137. Xue J, Guo M and Wei D (2008). Improving the parallelism of iterative methods by aggressive loop fusion, The Journal of Supercomputing, 43:2, (147-164), Online publication date: 1-Feb-2008.
  138. Meduna A and Techet J (2007). Canonical scattered context generators of sentences with their parses, Theoretical Computer Science, 389:1-2, (73-81), Online publication date: 10-Dec-2007.
  139. ACM
    Zhao P, Cui S, Gao Y, Silvera R and Amaral J (2007). Forma, ACM Transactions on Programming Languages and Systems, 30:1, (2-es), Online publication date: 1-Nov-2007.
  140. Huang C, Ravi S, Raghunathan A and Jha N (2007). Generation of heterogeneous distributed architectures for memory-intensive applications through high-level synthesis, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 15:11, (1191-1204), Online publication date: 1-Nov-2007.
  141. Escuder V, Durán R and Rico R Quantifying ILP by means of graph theory Proceedings of the 2nd international conference on Performance evaluation methodologies and tools, (1-7)
  142. Woo Son S, Chen G, Ozturk O, Kandemir M and Choudhary A (2007). Compiler-Directed Energy Optimization for Parallel Disk Based Systems, IEEE Transactions on Parallel and Distributed Systems, 18:9, (1241-1257), Online publication date: 1-Sep-2007.
  143. Du J, Yang X, Wang G, Tang T and Zeng K Architecture-based optimization for mapping scientific applications to imagine Proceedings of the 5th international conference on Parallel and Distributed Processing and Applications, (32-43)
  144. Zhao J, Horsnell M, Rogers I, Dinn A, Kirkham C and Watson I Optimizing chip multiprocessor work distribution using dynamic compilation Proceedings of the 13th international Euro-Par conference on Parallel Processing, (258-267)
  145. Mahajan A and Ali M Optimization of memory system in real-time embedded systems Proceedings of the 11th WSEAS International Conference on Computers, (13-19)
  146. ACM
    Rus S, Pennings M and Rauchwerger L Sensitivity analysis for automatic parallelization on multi-cores Proceedings of the 21st annual international conference on Supercomputing, (263-273)
  147. ACM
    Chandraiah P and Doemer R Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification Proceedings of the 44th annual Design Automation Conference, (787-790)
  148. ACM
    Xue L, Ozturk O and Kandemir M A memory-conscious code parallelization scheme Proceedings of the 44th annual Design Automation Conference, (230-233)
  149. Dutta H, Hannig F, Ruckdeschel H and Teich J (2007). Efficient control generation for mapping nested loop programs onto processor arrays, Journal of Systems Architecture: the EUROMICRO Journal, 53:5-6, (300-309), Online publication date: 1-May-2007.
  150. Kandemir M, Yemliha T, Son S and Ozturk O Memory bank aware dynamic loop scheduling Proceedings of the conference on Design, automation and test in Europe, (1671-1676)
  151. Meijer S, Kienhuis B, Turjan A and de Kock E Interactive presentation: A process splitting transformation for Kahn process networks Proceedings of the conference on Design, automation and test in Europe, (1355-1360)
  152. ACM
    Kejariwal A, Tian X, Girkar M, Li W, Kozhukhov S, Banerjee U, Nicolau A, Veidenbaum A and Polychronopoulos C Tight analysis of the performance potential of thread speculation using spec CPU 2006 Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming, (215-225)
  153. Ozturk O, Chen G, Kandemir M and Karakoy M Compiler-Directed Variable Latency Aware SPM Management to Cope With Timing Problems Proceedings of the International Symposium on Code Generation and Optimization, (232-243)
  154. Pouchet L, Bastoul C, Cohen A and Vasilache N Iterative Optimization in the Polyhedral Model Proceedings of the International Symposium on Code Generation and Optimization, (144-156)
  155. Birkbeck N, Levesque J and Amaral J A Dimension Abstraction Approach to Vectorization in Matlab Proceedings of the International Symposium on Code Generation and Optimization, (115-130)
  156. Yang X, Du J, Yan X and Deng Y Matrix-Based programming optimization for improving memory hierarchy performance on imagine Proceedings of the 4th international conference on Parallel and Distributed Processing and Applications, (782-793)
  157. ACM
    Franchetti F, Voronenko Y and Püschel M FFT program generation for shared memory Proceedings of the 2006 ACM/IEEE conference on Supercomputing, (115-es)
  158. ACM
    Govindaraju N, Larsen S, Gray J and Manocha D A memory model for scientific algorithms on graphics processors Proceedings of the 2006 ACM/IEEE conference on Supercomputing, (89-es)
  159. ACM
    Birch J, van Engelen R, Gallivan K and Shou Y An empirical evaluation of chains of recurrences for array dependence testing Proceedings of the 15th international conference on Parallel architectures and compilation techniques, (295-304)
  160. Hillers M and Nebel W Impact of array data flow analysis on the design of energy-efficient circuits Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation, (117-126)
  161. Amiranoff P, Cohen A and Feautrier P Beyond iteration vectors Proceedings of the 13th international conference on Static Analysis, (161-180)
  162. Hu Z, del Cuvillo J, Zhu W and Gao G Optimization of dense matrix multiplication on IBM cyclops-64 Proceedings of the 12th international conference on Parallel Processing, (134-144)
  163. ACM
    Vasilache N, Bastoul C, Cohen A and Girbal S Violated dependence analysis Proceedings of the 20th annual international conference on Supercomputing, (335-344)
  164. ACM
    Nuzman D, Rosen I and Zaks A Auto-vectorization of interleaved data for SIMD Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation, (132-143)
  165. ACM
    Nuzman D, Rosen I and Zaks A (2006). Auto-vectorization of interleaved data for SIMD, ACM SIGPLAN Notices, 41:6, (132-143), Online publication date: 11-Jun-2006.
  166. Narayanan S, Kandemir M, Brooks R and Kolcu I Secure execution of computations in untrusted hosts Proceedings of the 11th Ada-Europe international conference on Reliable Software Technologies, (106-118)
  167. ACM
    Ozturk O, Chen G and Kandemir M Multi-compilation Proceedings of the 3rd conference on Computing frontiers, (157-170)
  168. ACM
    Son S and Kandemir M Energy-aware data prefetching for multi-speed disks Proceedings of the 3rd conference on Computing frontiers, (105-114)
  169. Ciorba F, Andronikos T, Riakiotakis I, Chronopoulos A and Papakonstantinou G Dynamic multi phase scheduling for heterogeneous clusters Proceedings of the 20th international conference on Parallel and distributed processing, (72-72)
  170. Zhou J and Zeng G A general data dependence analysis to nested loop using integer interval theory Proceedings of the 20th international conference on Parallel and distributed processing, (386-386)
  171. Cornwall J, Beckmann O and Kelly P Automatically translating a general purpose C++ image processing library for GPUs Proceedings of the 20th international conference on Parallel and distributed processing, (381-381)
  172. Fishgold L, Danalis A, Pollock L and Swany M An automated approach to improve communication-computation overlap in clusters Proceedings of the 20th international conference on Parallel and distributed processing, (290-290)
  173. ACM
    Kandemir M (2006). Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality, ACM Transactions on Design Automation of Electronic Systems, 11:2, (410-441), Online publication date: 1-Apr-2006.
  174. Vasilache N, Bastoul C and Cohen A Polyhedral code generation in the real world Proceedings of the 15th international conference on Compiler Construction, (185-201)
  175. Ozturk O, Kandemir M and Kolcu I Shared Scratch-Pad Memory Space Management Proceedings of the 7th International Symposium on Quality Electronic Design, (576-584)
  176. Nuzman D and Henderson R Multi-platform Auto-vectorization Proceedings of the International Symposium on Code Generation and Optimization, (281-294)
  177. Dutta H, Hannig F and Teich J Controller synthesis for mapping partitioned programs on array architectures Proceedings of the 19th international conference on Architecture of Computing Systems, (176-190)
  178. ACM
    Shou Y, van Engelen R, Birch J and Gallivan K Toward efficient flow-sensitive induction variable analysis and dependence testing for loop optimization Proceedings of the 44th annual Southeast regional conference, (1-6)
  179. Chen G, Ozturk O, Kandemir M and Karakoy M Dynamic scratch-pad memory management for irregular array access patterns Proceedings of the conference on Design, automation and test in Europe: Proceedings, (931-936)
  180. Korch M and Rauber T (2006). Optimizing locality and scalability of embedded Runge-Kutta solvers using block-based pipelining, Journal of Parallel and Distributed Computing, 66:3, (444-468), Online publication date: 1-Mar-2006.
  181. Ozturk O, Chen G, Kandemir M and Kolcu I Compiler-Guided data compression for reducing memory consumption of embedded applications Proceedings of the 2006 Asia and South Pacific Design Automation Conference, (814-819)
  182. Liu M, Zhuge Q, Shao Z, Xue C, Qiu M and Sha E Loop distribution and fusion with timing and code size optimization for embedded DSPs Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing, (121-130)
  183. Giordano M and Furnari M An incremental compilation approach for OpenMP applications Proceedings of the 2005 IFIP international conference on Network and Parallel Computing, (249-252)
  184. ACM
    Vargas P, de Castro Dutra I, Dalto do Nascimento V, Santos L, da Silva L, Geyer C and Schulze B Hierarchical submission in a Grid environment Proceedings of the 3rd international workshop on Middleware for grid computing, (1-6)
  185. Pop S, Cohen A and Silber G Induction variable analysis with delayed abstractions Proceedings of the First international conference on High Performance Embedded Architectures and Compilers, (218-232)
  186. Larsen S, Rabbah R and Amarasinghe S Exploiting Vector Parallelism in Software Pipelined Loops Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, (119-129)
  187. Xue J Aggressive loop fusion for improving locality and parallelism Proceedings of the Third international conference on Parallel and Distributed Processing and Applications, (224-238)
  188. Sheridan-Smith N, O'Neill T, Leaney J and Hunter M Enhancements to policy distribution for control flow and looping Proceedings of the 16th IFIP/IEEE Ambient Networks international conference on Distributed Systems: operations and Management, (269-280)
  189. Li X and Agrawal G Code transformations for one-pass analysis Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (377-396)
  190. Shirako J, Oshiyama N, Wada Y, Shikano H, Kimura K and Kasahara H Compiler control power saving scheme for multi core processors Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing, (362-376)
  191. ACM
    Ozturk O, Kandemir M and Irwin M Increasing on-chip memory space utilization for embedded chip multiprocessors through data compression Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (87-92)
  192. Shirako J, Yoshida M, Oshiyama N, Wada Y, Nakano H, Shikano H, Kimura K and Kasahara H Performance evaluation of compiler controlled power saving scheme Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems, (480-493)
  193. Murai H and Okabe Y Pipelined parallelization in HPF programs on the earth simulator Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems, (365-373)
  194. Chen G, Kandemir M and Karakoy M Memory space conscious loop iteration duplication for reliable execution Proceedings of the 12th international conference on Static Analysis, (52-69)
  195. Bernabé G, García J and González J (2005). Reducing 3D Fast Wavelet Transform Execution Time Using Blocking and the Streaming SIMD Extensions, Journal of VLSI Signal Processing Systems, 41:2, (209-223), Online publication date: 1-Sep-2005.
  196. ACM
    Chen G and Kandemir M Dataflow analysis for energy-efficient scratch-pad memory management Proceedings of the 2005 international symposium on Low power electronics and design, (327-330)
  197. ACM
    Kandemir M, Son S and Chen G An evaluation of code and data optimizations in the context of disk power reduction Proceedings of the 2005 international symposium on Low power electronics and design, (209-214)
  198. Ruckdeschel H, Dutta H, Hannig F and Teich J Automatic FIR filter generation for FPGAs Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation, (51-61)
  199. ACM
    Kandemir M, Chen G and Kadayif I (2005). Compiling for memory emergency, ACM SIGPLAN Notices, 40:7, (213-221), Online publication date: 12-Jul-2005.
  200. ACM
    Xu R and Li Z (2005). A sample-based cache mapping scheme, ACM SIGPLAN Notices, 40:7, (166-174), Online publication date: 12-Jul-2005.
  201. ACM
    Son S, Chen G and Kandemir M Disk layout optimization for reducing energy consumption Proceedings of the 19th annual international conference on Supercomputing, (274-283)
  202. ACM
    Chafi H, Minh C, McDonald A, Carlstrom B, Chung J, Hammond L, Kozyrakis C and Olukotun K TAPE Proceedings of the 19th annual international conference on Supercomputing, (199-208)
  203. ACM
    Cohen A, Sigler M, Girbal S, Temam O, Parello D and Vasilache N Facilitating the search for compositions of program transformations Proceedings of the 19th annual international conference on Supercomputing, (151-160)
  204. ACM
    Son S, Chen G, Kandemir M and Choudhary A Exposing disk layout to compiler for reducing energy consumption of parallel disk based systems Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, (174-185)
  205. ACM
    Kejariwal A, Nicolau A, Banerjee U and Polychronopoulos C A novel approach for partitioning iteration spaces with variable densities Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, (120-131)
  206. ACM
    Darte A and Schreiber R A linear-time algorithm for optimal barrier placement Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming, (26-35)
  207. ACM
    Kandemir M, Chen G and Kadayif I Compiling for memory emergency Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, (213-221)
  208. ACM
    Xu R and Li Z A sample-based cache mapping scheme Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems, (166-174)
  209. Li F, Chen G, Kandemir M and Kolcu I Improving scratch-pad memory reliability through compiler-guided data block duplication Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design, (1002-1005)
  210. Li F, Chen G and Kandemir M Compiler-directed voltage scaling on communication links for reducing power consumption Proceedings of the 2005 IEEE/ACM International conference on Computer-aided design, (456-460)
  211. Kim H A new carried-dependence self-scheduling algorithm Proceedings of the 2005 international conference on Computational Science and its Applications - Volume Part I, (281-290)
  212. Kadayif I, Kandemir M, Chen G, Ozturk O, Karakoy M and Sezer U (2005). Optimizing Array-Intensive Applications for On-Chip Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 16:5, (396-411), Online publication date: 1-May-2005.
  213. Grelck C (2005). Shared memory multiprocessor support for functional array processing in SAC, Journal of Functional Programming, 15:3, (353-401), Online publication date: 1-May-2005.
  214. Goldberg B, Zuck L and Barrett C (2005). Into the Loops, Electronic Notes in Theoretical Computer Science (ENTCS), 132:1, (53-71), Online publication date: 1-May-2005.
  215. Thomas A and Olukotun K An Application Analysis Framework For Polymorphic Chip Multiprocessors Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
  216. Son S, Kandemir M and Choudhary A Software-Directed Disk Power Management for Scientific Applications Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
  217. Tian X, Krishnaiyer R, Saito H, Girkar M and Li W Impact of Compiler-based Data-Prefetching Techniques on SPEC OMP Application Performance Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
  218. Nalabalapu P and Sass R Bandwidth Management with a Reconfigurable Data Cache Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
  219. Barton C, Tal A, Blainey B and Amaral J Generalized index-set splitting Proceedings of the 14th international conference on Compiler Construction, (106-120)
  220. Shashidhar K, Bruynooghe M, Catthoor F and Janssens G Verification of source code transformations by program equivalence checking Proceedings of the 14th international conference on Compiler Construction, (221-236)
  221. Li F, Chen G, Kandemir M and Brooks R A compiler-based approach to data security Proceedings of the 14th international conference on Compiler Construction, (188-203)
  222. ACM
    Bulić P and Guštin V (2005). An efficient way to filter out data dependences with a sufficiently large distance between memory references, ACM SIGPLAN Notices, 40:4, (51-60), Online publication date: 1-Apr-2005.
  223. Chen G and Kandemir M Optimizing Address Code Generation for Array-Intensive DSP Applications Proceedings of the international symposium on Code generation and optimization, (141-152)
  224. Chen G, Kandemir M and Karakoy M A Constraint Network Based Approach to Memory Layout Optimization Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, (1156-1161)
  225. ACM
    Zhang C and Kurdahi F On combining iteration space tiling with data space tiling for scratch-pad memory systems Proceedings of the 2005 Asia and South Pacific Design Automation Conference, (973-976)
  226. Meduna A and Techet J (2005). Generation of sentences with their parses, Acta Cybernetica, 17:1, (11-20), Online publication date: 1-Jan-2005.
  227. Tian X and Girkar M Effect of optimizations on performance of OpenMP programs Proceedings of the 11th international conference on High Performance Computing, (133-143)
  228. Huang C, Ravi S, Raghunathan A and Jha N High-level synthesis using computation-unit integrated memories Proceedings of the 2004 IEEE/ACM International conference on Computer-aided design, (783-790)
  229. Kadayif I and Kandemir M (2004). Quasidynamic Layout Optimizations for Improving Data Locality, IEEE Transactions on Parallel and Distributed Systems, 15:11, (996-1011), Online publication date: 1-Nov-2004.
  230. ACM
    Liu M, Zhuge Q, Shao Z and Sha E General loop fusion technique for nested loops considering timing and code size Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, (190-201)
  231. Kejariwal A, D'Alberto P, Nicolau A and Polychronopoulos C A geometric approach for partitioning n-dimensional non-rectangular iteration spaces Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (102-116)
  232. Ishizaka K, Miyamoto T, Shirako J, Obata M, Kimura K and Kasahara H Performance of OSCAR multigrain parallelizing compiler on SMP servers Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (319-331)
  233. Chen G, Ozturk O and Kandemir M An ILP-Based approach to locality optimization Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing, (149-163)
  234. ACM
    Kandemir M, Kadayif I and Chen G Compiler-directed code restructuring for reducing data TLB energy Proceedings of the 2nd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (98-103)
  235. Song Y, Xu R, Wang C and Li Z (2004). Improving Data Locality by Array Contraction, IEEE Transactions on Computers, 53:9, (1073-1084), Online publication date: 1-Sep-2004.
  236. Bulić P and Guštin V On dependence analysis for SIMD enhanced processors Proceedings of the 6th international conference on High Performance Computing for Computational Science, (527-540)
  237. ACM
    van Engelen R, Birch J, Shou Y, Walsh B and Gallivan K A unified framework for nonlinear dependence testing and symbolic analysis Proceedings of the 18th annual international conference on Supercomputing, (106-115)
  238. Drakenberg N A matrix-type for performance–portability Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (237-246)
  239. ACM
    Ozturk O, Kandemir M, Demirkiran I, Chen G and Irwin M Data compression for improving SPM behavior Proceedings of the 41st annual Design Automation Conference, (401-406)
  240. ACM
    Kandemir M LODS Proceedings of the 41st annual Design Automation Conference, (125-128)
  241. ACM
    Allen R and Kennedy K (2004). Automatic loop interchange, ACM SIGPLAN Notices, 39:4, (75-90), Online publication date: 1-Apr-2004.
  242. De La Luz V, Kadayif I, Kandemir M and Sezer U (2004). Access Pattern Restructuring for Memory Energy, IEEE Transactions on Parallel and Distributed Systems, 15:4, (289-303), Online publication date: 1-Apr-2004.
  243. Kim H, Yoon Y and Han D (2004). Parallel Processing of First Order Linear Recurrence on SMP Machines, The Journal of Supercomputing, 27:3, (295-310), Online publication date: 1-Mar-2004.
  244. Hsu C and Kremer U (2004). A Quantitative Analysis of Tile Size Selection Algorithms, The Journal of Supercomputing, 27:3, (279-294), Online publication date: 1-Mar-2004.
  245. Kandemir M Impact of Data Transformations on Memory Bank Locality Proceedings of the conference on Design, automation and test in Europe - Volume 1
  246. ACM
    Song L and Kavi K (2004). What can we gain by unfolding loops?, ACM SIGPLAN Notices, 39:2, (26-33), Online publication date: 1-Feb-2004.
  247. Guo M Linear data distribution based on index analysis High performance scientific and engineering computing, (15-29)
  248. Chang W, Huang J and Chu C (2004). Using Elementary Linear Algebra to Solve Data Alignment for Arrays with Linear or Quadratic References, IEEE Transactions on Parallel and Distributed Systems, 15:1, (28-39), Online publication date: 1-Jan-2004.
  249. De La Luz V and Kandemir M (2004). Array Regrouping and Its Use in Compiling Data-Intensive Embedded Applications, IEEE Transactions on Computers, 53:1, (1-19), Online publication date: 1-Jan-2004.
  250. ACM
    Ding Y and Li Z A Compiler Analysis of Interprocedural Data Communication Proceedings of the 2003 ACM/IEEE conference on Supercomputing
  251. Chen G, Kandemir M, Nadgir A and Sezer U Array Composition and Decomposition for Optimizing Embedded Applications Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
  252. Huang C, Ravi S, Raghunathan A and Jha N Synthesis of Heterogeneous Distributed Architectures for Memory-Intensive Applications Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
  253. ACM
    Menon V, Pingali K and Mateev N (2003). Fractal symbolic analysis, ACM Transactions on Programming Languages and Systems, 25:6, (776-813), Online publication date: 1-Nov-2003.
  254. Scholz S (2003). Single Assignment C: efficient support for high-level array operations in a functional setting, Journal of Functional Programming, 13:6, (1005-1059), Online publication date: 1-Nov-2003.
  255. ACM
    Chen G, Kandemir M, Saputra H and Irwin M Exploiting bank locality in multi-bank memories Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (287-297)
  256. ACM
    Colavin O and Rizzo D A scalable wide-issue clustered VLIW with a reconfigurable interconnect Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (148-158)
  257. ACM
    Naishlos D, Biberstein M, Ben-David S and Zaks A Vectorizing for a SIMdD DSP architecture Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (2-11)
  258. Talla D, John L and Burger D (2003). Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements, IEEE Transactions on Computers, 52:8, (1015-1031), Online publication date: 1-Aug-2003.
  259. Onbasçioglu E and Özdamar L (2003). Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing, The Journal of Supercomputing, 25:3, (237-253), Online publication date: 1-Jul-2003.
  260. ACM
    Arenaz M, Touriño J and Doallo R A GSA-based compiler infrastructure to extract parallelism from complex loops Proceedings of the 17th annual international conference on Supercomputing, (193-204)
  261. ACM
    Zhang W, Karakoy M, Kandemir M and Chen G A compiler approach for reducing data cache energy Proceedings of the 17th annual international conference on Supercomputing, (76-85)
  262. Unnikrishnan P, Chen G, Kandemir M, Karakoy M and Kolcu I Loop transformations for reducing data space requirements of resource-constrained applications Proceedings of the 10th international conference on Static analysis, (383-400)
  263. ACM
    Zhang W, Chen G, Kandemir M and Karakoy M Interprocedural optimizations for improving data cache performance of array-intensive embedded applications Proceedings of the 40th annual Design Automation Conference, (887-892)
  264. Ghosh S, Kanhere A, Krishnaiyer R, Kulkarni D, Li W, Lim C and Ng J Integrating high-level optimizations in a production compiler Proceedings of the 12th international conference on Compiler construction, (303-319)
  265. Kandemir M, Irwin M, Chen G and Ramanujam J Address register assignment for reducing code size Proceedings of the 12th international conference on Compiler construction, (273-289)
  266. Kandemir M, Choudhary A, Ramanujam J and Banerjee P (2003). Reducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework, IEEE Transactions on Parallel and Distributed Systems, 14:4, (337-354), Online publication date: 1-Apr-2003.
  267. Bulić P and Guštin V (2003). An extended ANSI C for processors with a multimedia extension, International Journal of Parallel Programming, 31:2, (107-136), Online publication date: 1-Apr-2003.
  268. Chen M and Olukotun K TEST Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, (301-312)
  269. Marathe J, Mueller F, Mohan T, de Supinski B, McKee S and Yoo A METRIC Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization, (289-300)
  270. De La Luz V, Kandemir M, Kadayif I and Sezer U Generalized Data Transformations for Enhancing Cache Behavior Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
  271. Kandemir M, Zhang W and Karakoy M Runtime Code Parallelization for On-Chip Multiprocessors Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
  272. Högstedt K, Carter L and Ferrante J (2003). On the Parallel Execution Time of Tiled Loops, IEEE Transactions on Parallel and Distributed Systems, 14:3, (307-321), Online publication date: 1-Mar-2003.
  273. Kittitornkun S and Hu Y (2003). Processor Array Synthesis from Shift-Variant Deep Nested Do Loops, The Journal of Supercomputing, 24:3, (229-249), Online publication date: 1-Mar-2003.
  274. Quinn M and Miller R Parallel processing Encyclopedia of Computer Science, (1349-1365)
  275. Lee H and Fortes J (2003). Generation of Injective and Reversible Modular Mappings, IEEE Transactions on Parallel and Distributed Systems, 14:1, (1-12), Online publication date: 1-Jan-2003.
  276. Vijaykrishnan N, Kandemir M, Irwin M, Kim H, Ye W and Duarte D (2003). Evaluating Integrated Hardware-Software Optimizations Using a Unified Energy Estimation Framework, IEEE Transactions on Computers, 52:1, (59-76), Online publication date: 1-Jan-2003.
  277. Schweitz E and Agrawal D (2002). A Parallelization Domain Oriented Multilevel Graph Partitioner, IEEE Transactions on Computers, 51:12, (1435-1441), Online publication date: 1-Dec-2002.
  278. Zhang W, Hu J, Degalahal V, Kandemir M, Vijaykrishnan N and Irwin M Compiler-directed instruction cache leakage optimization Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture, (208-218)
  279. ACM
    Unnikrishnan P, Chen G, Kandemir M and Mudgett D Dynamic compilation for energy adaptation Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design, (158-163)
  280. ACM
    Kandemir M, Kadayif I, Choudhary A and Zambreno J Optimizing inter-nest data locality Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems, (127-135)
  281. Tembe W and Pande S (2002). Loop Restructuring for Data I/O Minimization on Limited On-Chip Memory Embedded Processors, IEEE Transactions on Computers, 51:10, (1269-1280), Online publication date: 1-Oct-2002.
  282. Goldberg B, Crutcher E, Huneycutt C and Palem K Software Bubbles Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques, (211-221)
  283. ACM
    Bilardi G, Ekanadham K and Pattnaik P Optimal organizations for pipelined hierarchical memories Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, (109-116)
  284. Kadayif I, Kandemir M and Choudhary A A hybrid strategy based on data distribution and migration for optimizing memory locality Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing, (111-125)
  285. Bik A, Girkar M, Grey P and Tian X Automatic detection of saturation and clipping idioms Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing, (61-74)
  286. Obata M, Shirako J, Kaminaga H, Ishizaka K and Kasahara H Hierarchical parallelism control for multigrain parallel processing Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing, (31-44)
  287. ACM
    Saputra H, Kandemir M, Vijaykrishnan N, Irwin M, Hu J, Hsu C and Kremer U (2002). Energy-conscious compilation based on voltage scaling, ACM SIGPLAN Notices, 37:7, (2-11), Online publication date: 17-Jul-2002.
  288. ACM
    Saputra H, Kandemir M, Vijaykrishnan N, Irwin M, Hu J, Hsu C and Kremer U Energy-conscious compilation based on voltage scaling Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems, (2-11)
  289. Larus J and Parkes M Using Cohort-Scheduling to Enhance Server Performance Proceedings of the General Track of the annual conference on USENIX Annual Technical Conference, (103-114)
  290. ACM
    Kadayif I, Kandemir M and Sezer U An integer linear programming based approach for parallelizing applications in On-chip multiprocessors Proceedings of the 39th annual Design Automation Conference, (703-706)
  291. ACM
    Kandemir M and Choudhary A Compiler-directed scratch pad memory hierarchy design and management Proceedings of the 39th annual Design Automation Conference, (628-633)
  292. ACM
    Huang Z and Malik S Exploiting operation level parallelism through dynamically reconfigurable datapaths Proceedings of the 39th annual Design Automation Conference, (337-342)
  293. ACM
    Kandemir M, Ramanujam J and Choudhary A Exploiting shared scratch pad memory space in embedded multiprocessor systems Proceedings of the 39th annual Design Automation Conference, (219-224)
  294. ACM
    Kadayif I, Kandemir M and Karakoy M An energy saving strategy based on adaptive loop parallelization Proceedings of the 39th annual Design Automation Conference, (195-200)
  295. ACM
    Kadayif I, Kandemir M, Kolcu I and Chen G Locality-conscious process scheduling in embedded systems Proceedings of the tenth international symposium on Hardware/software codesign, (193-198)
  296. Gao G, Theobald K, Hu Z, Wu H, Lu J, Pingali K, Stodghill P, Sterling T, Stevens R and Hereld M Next Generation System Software for Future High-End Computing Systems Proceedings of the 16th International Parallel and Distributed Processing Symposium
  297. Zoppetti G, Agrawal G and Kumar R Compiler and Runtime Support for Irregular Reductions on a Multithreaded Architecture Proceedings of the 16th International Parallel and Distributed Processing Symposium
  298. Kandemir M and Choudhary A Compiler-Directed I/O Optimization Proceedings of the 16th International Parallel and Distributed Processing Symposium
  299. ACM
    Leopold C On optimal temporal locality of stencil codes Proceedings of the 2002 ACM symposium on Applied computing, (948-952)
  300. Kandemir M A Compiler-Based Approach for Improving Intra-Iteration Data Reuse Proceedings of the conference on Design, automation and test in Europe
  301. Kandemir M, Choudhary A and Ramanujam J (2002). An I/O-Conscious Tiling Strategy for Disk-Resident Data Sets, The Journal of Supercomputing, 21:3, (257-284), Online publication date: 1-Mar-2002.
  302. Chen T and Chang C (2002). Skewed Data Partition and Alignment Techniques for Compiling Programs on Distributed Memory Multicomputers, The Journal of Supercomputing, 21:2, (191-211), Online publication date: 25-Feb-2002.
  303. Böhm W, Hammes J, Draper B, Chawathe M, Ross C, Rinker R and Najjar W (2002). Mapping a Single Assignment Programming Language to Reconfigurable Systems, The Journal of Supercomputing, 21:2, (117-130), Online publication date: 25-Feb-2002.
  304. Crosbie N, Kandemir M, Kolcu I, Ramanujam J and Choudhary A Strategies for Improving Data Locality in Embedded Applications Proceedings of the 2002 Asia and South Pacific Design Automation Conference
  305. Delaluz V, Kandemir M, Vijaykrishnan N, Irwin M, Sivasubramaniam A and Kolcu I Compiler-Directed Array Interleaving for Reducing Energy in Multi-Bank Memories Proceedings of the 2002 Asia and South Pacific Design Automation Conference
  306. Bednara M, Hannig F and Teich J Generation of distributed loop control Embedded processor design challenges, (154-170)
  307. Coors M, Keding H, Lüthje O and Meyr H (2002). Design and DSP implementation of fixed-point systems, EURASIP Journal on Advances in Signal Processing, 2002:1, (908-925), Online publication date: 1-Jan-2002.
  308. Moreira J, Midkiff S, Gupta M, Wu P, Almasi G and Artigas P (2002). NINJA: Java for high performance numerical computing, Scientific Programming, 10:1, (19-33), Online publication date: 1-Jan-2002.
  309. Ramasubramanian N, Subramanian R and Pande S (2002). Automatic Compilation of Loops to Exploit Operator Parallelism on Configurable Arithmetic Logic Units, IEEE Transactions on Parallel and Distributed Systems, 13:1, (45-66), Online publication date: 1-Jan-2002.
  310. Loechner V, Meister B and Clauss P (2002). Precise Data Locality Optimization of Nested Loops, The Journal of Supercomputing, 21:1, (37-76), Online publication date: 1-Jan-2002.
  311. Kandemir M, Ramanujam J, Choudhary A and Banerjee P (2001). A Layout-Conscious Iteration Space Transformation Technique, IEEE Transactions on Computers, 50:12, (1321-1336), Online publication date: 1-Dec-2001.
  312. ACM
    Lüthje O, Coors M, Keding H and Meyr H A novel approach to code analysis of digital signal processing systems Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems, (76-83)
  313. ACM
    Jin G, Mellor-Crummey J and Fowler R Increasing temporal locality with skewing and recursive blocking Proceedings of the 2001 ACM/IEEE conference on Supercomputing, (43-43)
  314. Kandemir M, Sezer U and Delaluz V Improving memory energy using access pattern classification Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design, (201-206)
  315. Delaluz V, Kandemir M, Vijaykrishnan N, Sivasubramaniam A and Irwin M (2001). Hardware and Software Techniques for Controlling DRAM Power Modes, IEEE Transactions on Computers, 50:11, (1154-1173), Online publication date: 1-Nov-2001.
  316. ACM
    Moreira J, Midkiff S, Gupta M, Artigas P, Wu P and Almasi G (2001). The NINJA project, Communications of the ACM, 44:10, (102-109), Online publication date: 1-Oct-2001.
  317. López D, Llosa J, Valero M and Ayguadé E (2001). Cost-Conscious Strategies to Increase Performance of Numerical Programs on Aggressive VLIW Architectures, IEEE Transactions on Computers, 50:10, (1033-1051), Online publication date: 1-Oct-2001.
  318. Ahmed N, Mateev N and Pingali K (2001). Synthesizing Transformations for Locality Enhancement of Imperfectly-Nested Loop Nests, International Journal of Parallel Programming, 29:5, (493-544), Online publication date: 1-Oct-2001.
  319. ACM
    Chung E, Benini L and De Micheli G Source code transformation based on software cost analysis Proceedings of the 14th international symposium on Systems synthesis, (153-158)
  320. Kandemir M, Banerjee P, Choudhary A, Ramanujam J and Ayguadé E (2001). Static and Dynamic Locality Optimizations Using Integer Linear Programming, IEEE Transactions on Parallel and Distributed Systems, 12:9, (922-941), Online publication date: 1-Sep-2001.
  321. ACM
    Kandemir M, Ramanujam J and Sezer U Compiler support for block buffering Proceedings of the 2001 international symposium on Low power electronics and design, (76-79)
  322. ACM
    Kadayif I, Kandemir M, Vijaykrishnan N, Irwin M and Ramanujam J Morphable Cache Architectures Proceedings of the 2001 ACM SIGPLAN workshop on Optimization of middleware and distributed systems, (128-137)
  323. ACM
    Kadayif I, Kandemir M, Vijaykrishnan N, Irwin M and Ramanujam J Morphable Cache Architectures Proceedings of the ACM SIGPLAN workshop on Languages, compilers and tools for embedded systems, (128-137)
  324. ACM
    Kadayif I, Kandemir M, Vijaykrishnan N, Irwin M and Ramanujam J (2001). Morphable Cache Architectures, ACM SIGPLAN Notices, 36:8, (128-137), Online publication date: 1-Aug-2001.
  325. Joisha P and Banerjee P (2001). The Efficient Computation of Ownership Sets in HPF, IEEE Transactions on Parallel and Distributed Systems, 12:8, (769-788), Online publication date: 1-Aug-2001.
  326. Kandemir M and Ramanujam J (2001). Data Relation Vectors, IEEE Transactions on Computers, 50:8, (798-810), Online publication date: 1-Aug-2001.
  327. Chang W, Chu C and Wu J (2001). Communication-Free Alignment for Array References with Linear Subscripts in Three Loop Index Variables or Quadratic Subscripts, The Journal of Supercomputing, 20:1, (67-83), Online publication date: 1-Aug-2001.
  328. ACM
    Bilardi G, Ekanadham K and Pattnaik P Computational power of pipelined memory hierarchies Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, (144-152)
  329. ACM
    Lim A, Liao S and Lam M (2001). Blocking and array contraction across arbitrarily nested loops using affine partitioning, ACM SIGPLAN Notices, 36:7, (103-112), Online publication date: 1-Jul-2001.
  330. ACM
    Keding H, Coors M, Lüthje O and Meyr H Fast bit-true simulation Proceedings of the 38th annual Design Automation Conference, (708-713)
  331. ACM
    Kandemir M, Ramanujam J, Irwin J, Vijaykrishnan N, Kadayif I and Parikh A Dynamic management of scratch-pad memory space Proceedings of the 38th annual Design Automation Conference, (690-695)
  332. ACM
    Ramanujam J, Hong J, Kandemir M and Narayan A Reducing memory requirements of nested loops for embedded systems Proceedings of the 38th annual Design Automation Conference, (359-364)
  333. ACM
    Lim A, Liao S and Lam M Blocking and array contraction across arbitrarily nested loops using affine partitioning Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, (103-112)
  334. ACM
    Chakrabarti D and Banerjee P Global optimization techniques for automatic parallelization of hybrid applications Proceedings of the 15th international conference on Supercomputing, (166-180)
  335. ACM
    Rauber T and Rünger G Optimizing locality for ODE solvers Proceedings of the 15th international conference on Supercomputing, (123-132)
  336. ACM
    Cociorva D, Wilkins J, Lam C, Baumgartner G, Ramanujam J and Sadayappan P Loop optimization for a class of memory-constrained computations Proceedings of the 15th international conference on Supercomputing, (103-113)
  337. ACM
    Song Y, Xu R, Wang C and Li Z Data locality enhancement by memory reduction Proceedings of the 15th international conference on Supercomputing, (50-64)
  338. ACM
    Mateev N, Menon V and Pingali K Fractal symbolic analysis Proceedings of the 15th international conference on Supercomputing, (38-49)
  339. Kotlyar V, Bau D, Kodukula I, Pingali K and Stodghill P Solving alignment using elementary linear algebra Compiler optimizations for scalable parallel systems, (385-411)
  340. Darte A, Robert Y and Vivien F Loop parallelization algorithms Compiler optimizations for scalable parallel systems, (141-171)
  341. Kodukula I and Pingali K (2001). Data-Centric Transformations for Locality Enhancement, International Journal of Parallel Programming, 29:3, (319-364), Online publication date: 1-Jun-2001.
  342. Catthoor F, Danckaert K, Wuytack S and Dutt N (2001). Code Transformations for Data Transfer and Storage Exploration Preprocessing in Multimedia Processors, IEEE Design & Test, 18:3, (70-82), Online publication date: 1-May-2001.
  343. ACM
    Kandemir M and Kadayif I Compiler-directed selection of dynamic memory layouts Proceedings of the ninth international symposium on Hardware/software codesign, (219-224)
  344. Hoeflinger J, Paek Y and Yi K (2001). Unified Interprocedural Parallelism Detection, International Journal of Parallel Programming, 29:2, (185-215), Online publication date: 1-Apr-2001.
  345. Chakrabarti D and Banerjee P (2001). Static Single Assignment Form for Message-Passing Programs, International Journal of Parallel Programming, 29:2, (139-184), Online publication date: 1-Apr-2001.
  346. ACM
    Kandemir M (2001). A compiler technique for improving whole-program locality, ACM SIGPLAN Notices, 36:3, (179-192), Online publication date: 1-Mar-2001.
  347. ACM
    Kandemir M A dynamic locality optimization algorithm for linear algebra codes Proceedings of the 2001 ACM symposium on Applied computing, (632-635)
  348. ACM
    Leopold C Exploiting non-uniform reuse for cache optimization Proceedings of the 2001 ACM symposium on Applied computing, (560-564)
  349. ACM
    Moisset P, Diniz P and Park J Matching and searching analysis for parallel hardware implementation on FPGAs Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays, (125-133)
  350. Kittitornkun S and Hu Y (2001). Efficient implementation of nested-loop multimedia algorithms, EURASIP Journal on Advances in Signal Processing, 2001:1, (129-146), Online publication date: 1-Jan-2001.
  351. ACM
    Kandemir M A compiler technique for improving whole-program locality Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, (179-192)
  352. Kandemir M, Choudhary A, Banerjee P, Ramanujam J and Shenoy N (2000). Minimizing Data and Synchronization Costs in One-Way Communication, IEEE Transactions on Parallel and Distributed Systems, 11:12, (1232-1251), Online publication date: 1-Dec-2000.
  353. Ahmed N, Mateev N and Pingali K Tiling imperfectly-nested loop nests Proceedings of the 2000 ACM/IEEE conference on Supercomputing, (31-es)
  354. ACM
    Esakkimuthu G, Vijaykrishnan N, Kandemir M and Irwin M Memory system energy (poster session) Proceedings of the 2000 international symposium on Low power electronics and design, (244-246)
  355. Kandemir M, Choudhary A, Ramanujam J and Kandaswamy M (2000). A Unified Framework for Optimizing Locality, Parallelism, and Communication in Out-of-Core Computations, IEEE Transactions on Parallel and Distributed Systems, 11:7, (648-668), Online publication date: 1-Jul-2000.
  356. ACM
    Vijaykrishnan N, Kandemir M, Irwin M, Kim H and Ye W Energy-driven integrated hardware-software optimizations using SimplePower Proceedings of the 27th annual international symposium on Computer architecture, (95-106)
  357. ACM
    Kandemir M, Vijaykrishnan N, Irwin M and Ye W Influence of compiler optimizations on system power Proceedings of the 37th Annual Design Automation Conference, (304-307)
  358. ACM
    Ahmed N, Mateev N and Pingali K Synthesizing transformations for locality enhancement of imperfectly-nested loop nests Proceedings of the 14th international conference on Supercomputing, (141-152)
  359. ACM
    Vijaykrishnan N, Kandemir M, Irwin M, Kim H and Ye W (2000). Energy-driven integrated hardware-software optimizations using SimplePower, ACM SIGARCH Computer Architecture News, 28:2, (95-106), Online publication date: 1-May-2000.
  360. Healy C, Sjödin M, Rustagi V, Whalley D and Engelen R (2000). Supporting Timing Analysis by Automatic Bounding of Loop Iterations, Real-Time Systems, 18:2/3, (129-156), Online publication date: 1-May-2000.
  361. Diniz P and Park J Automatic Synthesis of Data Storage and Control Structures for FPGA-Based Computing Engines Proceedings of the 2000 IEEE Symposium on Field-Programmable Custom Computing Machines
  362. Yan Y, Zhang X and Zhang Z (2000). Cacheminer, IEEE Transactions on Parallel and Distributed Systems, 11:4, (357-374), Online publication date: 1-Apr-2000.
  363. Rauber T and Rünger G (2000). A Transformation Approach to Derive Efficient Parallel Implementations, IEEE Transactions on Software Engineering, 26:4, (315-339), Online publication date: 1-Apr-2000.
  364. Ishizaki K, Komatsu H and Nakatani T (2000). A Loop Transformation Algorithm for Communication Overlapping, International Journal of Parallel Programming, 28:2, (135-154), Online publication date: 1-Apr-2000.
  365. Chamberlain B, Choi S, Lewis E, Lin C, Snyder L and Weathersby W (2000). ZPL, IEEE Transactions on Software Engineering, 26:3, (197-211), Online publication date: 1-Mar-2000.
  366. Shih K, Sheu J and Huang C (2000). Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers, The Journal of Supercomputing, 15:3, (243-269), Online publication date: 1-Mar-2000.
  367. ACM
    Ganesan S and Vemuri R An integrated temporal partitioning and partial reconfiguration technique for design latency improvement Proceedings of the conference on Design, automation and test in Europe, (320-325)
  368. ACM
    Menon V and Pingali K (1999). A case for source-level transformations in MATLAB, ACM SIGPLAN Notices, 35:1, (53-65), Online publication date: 1-Jan-2000.
  369. Tongsima S, Sha E, Chantrapornchai C, Surma D and Passos N (2000). Probabilistic Loop Scheduling for Applications with Uncertain Execution Time, IEEE Transactions on Computers, 49:1, (65-80), Online publication date: 1-Jan-2000.
  370. ACM
    Menon V and Pingali K A case for source-level transformations in MATLAB Proceedings of the 2nd conference on Domain-specific languages, (53-65)
  371. White R, Mueller F, Healy C, Whalley D and Harmon M (1999). Timing Analysis for Data and Wrap-Around Fill Caches, Real-Time Systems, 17:2-3, (209-233), Online publication date: 14-Dec-1999.
  372. ACM
    Kandemir M, Banerjee P, Choudhary A, Ramanujam J and Shenoy N (1999). A global communication optimization technique based on data-flow analysis and linear algebra, ACM Transactions on Programming Languages and Systems, 21:6, (1251-1297), Online publication date: 1-Nov-1999.
  373. Menon V and Pingali K A case for source-level transformations in MATLAB Proceedings of the 2nd Conference on Domain-Specific Languages - Volume 2, (5-5)
  374. Tsai J, Huang J, Amlo C, Lilja D and Yew P (1999). The Superthreaded Processor Architecture, IEEE Transactions on Computers, 48:9, (881-902), Online publication date: 1-Sep-1999.
  375. ACM
    Knoop J and Steffen B (1999). Code motion for explicitly parallel programs, ACM SIGPLAN Notices, 34:8, (13-24), Online publication date: 1-Aug-1999.
  376. ACM
    Grelck C and Scholz S Accelerating APL programs with SAC Proceedings of the conference on APL '99: On track to the 21st century, (50-57)
  377. ACM
    Kandemir M, Banerjee P, Choudhary A, Ramanujam J and Ayguadé E An integer linear programming approach for optimizing cache locality Proceedings of the 13th international conference on Supercomputing, (500-509)
  378. ACM
    Kodukula I, Pingali K, Cox R and Maydan D An experimental evaluation of tiling and shackling for memory hierarchy management Proceedings of the 13th international conference on Supercomputing, (482-491)
  379. ACM
    Menon V and Pingali K High-level semantic optimization of numerical codes Proceedings of the 13th international conference on Supercomputing, (434-443)
  380. ACM
    Kaul M, Vemuri R, Govindarajan S and Ouaiss I An automated temporal partitioning and loop fission approach for FPGA based reconfigurable synthesis of DSP applications Proceedings of the 36th annual ACM/IEEE Design Automation Conference, (616-622)
  381. ACM
    Högstedt K, Carter L and Ferrante J Selecting tile shape for minimal execution time Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, (201-211)
  382. O'Boyle M and Knijnenburg P (1999). Nonsingular Data Transformations, International Journal of Parallel Programming, 27:3, (131-159), Online publication date: 1-Jun-1999.
  383. ACM
    Song Y and Li Z (1999). New tiling techniques to improve cache temporal locality, ACM SIGPLAN Notices, 34:5, (215-228), Online publication date: 1-May-1999.
  384. ACM
    Leung A and George L (1999). Static single assignment form for machine code, ACM SIGPLAN Notices, 34:5, (204-214), Online publication date: 1-May-1999.
  385. ACM
    Song Y and Li Z New tiling techniques to improve cache temporal locality Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, (215-228)
  386. ACM
    Leung A and George L Static single assignment form for machine code Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation, (204-214)
  387. ACM
    Knoop J and Steffen B Code motion for explicitly parallel programs Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, (13-24)
  388. Weinhardt M and Luk W Pipeline Vectorization for Reconfigurable Systems Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines
  389. ACM
    Scholz S (1998). On defining application-specific high-level array operations by means of shape-invariant programming facilities, ACM SIGAPL APL Quote Quad, 29:3, (32-38), Online publication date: 1-Mar-1999.
  390. Li Z, Reif J and Gupta S (1999). Synthesizing Efficient Out-of-Core Programs for Block Recursive Algorithms Using Block-Cyclic Data Distributions, IEEE Transactions on Parallel and Distributed Systems, 10:3, (297-315), Online publication date: 1-Mar-1999.
  391. ACM
    Rauber T and Rünger G A coordination language for mixed task and data parallel programs Proceedings of the 1999 ACM symposium on Applied computing, (146-155)
  392. Kandemir M, Choudhary A, Shenoy N, Banerjee P and Ramanujam J (1999). A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts, IEEE Transactions on Parallel and Distributed Systems, 10:2, (115-135), Online publication date: 1-Feb-1999.
  393. Kandemir M, Ramanujam J and Choudhary A (1999). Improving Cache Locality by a Combination of Loop and Data Transformations, IEEE Transactions on Computers, 48:2, (159-167), Online publication date: 1-Feb-1999.
  394. ACM
    Strout M, Carter L, Ferrante J and Simon B (1998). Schedule-independent storage mapping for loops, ACM SIGOPS Operating Systems Review, 32:5, (24-33), Online publication date: 1-Dec-1998.
  395. ACM
    Grelck C and Scholz S (1998). Accelerating APL programs with SAC, ACM SIGAPL APL Quote Quad, 29:2, (50-57), Online publication date: 1-Dec-1998.
  396. Kandemir M, Choudhary A, Ramanujam J and Banerjee P Improving locality using loop and data transformations in an integrated framework Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture, (285-297)
  397. ACM
    Strout M, Carter L, Ferrante J and Simon B (1998). Schedule-independent storage mapping for loops, ACM SIGPLAN Notices, 33:11, (24-33), Online publication date: 1-Nov-1998.
  398. ACM
    Strout M, Carter L, Ferrante J and Simon B Schedule-independent storage mapping for loops Proceedings of the eighth international conference on Architectural support for programming languages and operating systems, (24-33)
  399. Abdelrahman T and Wong T (1998). Compiler Support for Array Distribution on NUMA Shared Memory Multiprocessors, The Journal of Supercomputing, 12:4, (349-371), Online publication date: 1-Oct-1998.
  400. ACM
    Pollard N and May D Using interval arithmetic to calculate data sizes for compilation to multimedia instruction sets Proceedings of the sixth ACM international conference on Multimedia, (279-284)
  401. ACM
    Scholz S On defining application-specific high-level array operations by means of shape-invariant programming facilities Proceedings of the APL98 conference on Array Processing Languages, (32-38)
  402. ACM
    Vajracharya S and Grunwald D Dependence driven execution for multiprogrammed multiprocessor Proceedings of the 12th international conference on Supercomputing, (329-336)
  403. ACM
    Chang W and Chu C The infinity Lambda test Proceedings of the 12th international conference on Supercomputing, (196-203)
  404. ACM
    Jiménez M, Llabería J, Fernández A and Morancho E A general algorithm for tiling the register level Proceedings of the 12th international conference on Supercomputing, (133-140)
  405. ACM
    Roth G and Kennedy K Loop fusion in high performance Fortran Proceedings of the 12th international conference on Supercomputing, (125-132)
  406. Calland P, Darte A, Robert Y and Vivien F (1998). On the Removal of Anti- and Output-Dependences, International Journal of Parallel Programming, 26:3, (285-312), Online publication date: 1-Jun-1998.
  407. ACM
    Lewis E, Lin C and Snyder L (1998). The implementation and evaluation of fusion and contraction in array languages, ACM SIGPLAN Notices, 33:5, (50-59), Online publication date: 1-May-1998.
  408. ACM
    Lo R, Chow F, Kennedy R, Liu S and Tu P (1998). Register promotion by sparse partial redundancy elimination of loads and stores, ACM SIGPLAN Notices, 33:5, (26-37), Online publication date: 1-May-1998.
  409. ACM
    Lewis E, Lin C and Snyder L The implementation and evaluation of fusion and contraction in array languages Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, (50-59)
  410. ACM
    Lo R, Chow F, Kennedy R, Liu S and Tu P Register promotion by sparse partial redundancy elimination of loads and stores Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation, (26-37)
  411. ACM
    Karkowski I and Corporaal H Design space exploration algorithm for heterogeneous multi-processor embedded system design Proceedings of the 35th annual Design Automation Conference, (82-87)
  412. Grun P, Balasa F and Dutt N Memory size estimation for multimedia applications Proceedings of the 6th international workshop on Hardware/software codesign, (145-149)
  413. Autrey T and Wolfe M (1998). Initial Results for Glacial Variable Analysis, International Journal of Parallel Programming, 26:1, (43-64), Online publication date: 1-Feb-1998.
  414. ACM
    Kandemir M, Choudhary A, Ramanujam J and Kandaswamy M A unified compiler algorithm for optimizing locality, parallelism and communication in out-of-core computations Proceedings of the fifth workshop on I/O in parallel and distributed systems, (79-92)
  415. ACM
    Vajracharya S and Grunwald D Loop re-ordering and pre-fetching at run-time Proceedings of the 1997 ACM/IEEE conference on Supercomputing, (1-13)
  416. ACM
    Alverson G, Briggs P, Coatney S, Kahan S and Korry R Tera hardware-software cooperation Proceedings of the 1997 ACM/IEEE conference on Supercomputing, (1-16)
  417. Bringmann O and Rosenstiel W Resource sharing in hierarchical synthesis Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design, (318-325)
  418. Kandemir M, Ramanujam J and Choudhary A Improving the Performance of Out-of-Core Computations Proceedings of the international Conference on Parallel Processing, (128-136)
  419. Lee H and Fortes J Automatic generation of injective modular mappings Proceedings of the international Conference on Parallel Processing
  420. ACM
    Sakellariou R and Gurd J Compile-time minimisation of load imbalance in loop nests Proceedings of the 11th international conference on Supercomputing, (277-284)
  421. ACM
    Bilmes J, Asanovic K, Chin C and Demmel J Optimizing matrix multiply using PHiPAC Proceedings of the 11th international conference on Supercomputing, (340-347)
  422. ACM
    Kandemir M, Ramanujam J and Choudhary A A compiler algorithm for optimizing locality in loop nests Proceedings of the 11th international conference on Supercomputing, (269-276)
  423. ACM
    Chrisochoides N, Kodukula I and Pingali K Compiler and run-time support for semi-structured applications Proceedings of the 11th international conference on Supercomputing, (229-236)
  424. ACM
    Kodukula I, Ahmed N and Pingali K (1997). Data-centric multi-level blocking, ACM SIGPLAN Notices, 32:5, (346-357), Online publication date: 1-May-1997.
  425. ACM
    Chow F, Chan S, Kennedy R, Liu S, Lo R and Tu P (1997). A new algorithm for partial redundancy elimination based on SSA form, ACM SIGPLAN Notices, 32:5, (273-286), Online publication date: 1-May-1997.
  426. ACM
    Kodukula I, Ahmed N and Pingali K Data-centric multi-level blocking Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, (346-357)
  427. ACM
    Chow F, Chan S, Kennedy R, Liu S, Lo R and Tu P A new algorithm for partial redundancy elimination based on SSA form Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation, (273-286)
  428. Park S and Koo M Detection of Implicit Parallelisms in the Task Parallel Language Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
  429. Koo M, Park S, Yook H and Park M A transformation method to reduce loop overhead in HPF compiler Proceedings of the High-Performance Computing on the Information Superhighway, HPC-Asia '97
  430. Hwang R An Efficient Technique of Instruction Scheduling on a Superscalar-Based Multiprocessor Proceedings of the 11th International Symposium on Parallel Processing, (33-39)
  431. Sakellariou R A Compile-Time Partitioning Strategy for Non-Rectangular Loop Nests Proceedings of the 11th International Symposium on Parallel Processing, (633-637)
  432. Govindarajan S and Vemuri R Cone Based Clustering for List Scheduling Algorithms Proceedings of the 1997 European conference on Design and Test
  433. Koo M, Park S, Yook H and Park M A New Transformation Method to Generate Optimized DO Loop from FORALL Construct Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
  434. Kodukula I and Pingali K Transformations for imperfectly nested loops Proceedings of the 1996 ACM/IEEE conference on Supercomputing, (12-es)
  435. Thirumalai A and Ramanujam J (1996). Efficient Computation of Address Sequences in Data Parallel Programs Using Closed Forms for Basis Vectors, Journal of Parallel and Distributed Computing, 38:2, (188-203), Online publication date: 1-Nov-1996.
  436. van Reeuwijk K, Denissen W, Sips H and Paalvast E (1996). An Implementation Framework for HPF Distributed Arrays on Message-Passing Parallel Computer Systems, IEEE Transactions on Parallel and Distributed Systems, 7:9, (897-914), Online publication date: 1-Sep-1996.
  437. Song W, Park D, Kim B and Kong Y Extracting Parallelism in Nested Loops Proceedings of the 20th Conference on Computer Software and Applications
  438. ACM
    Acharya A, Uysal M, Bennett R, Mendelson A, Beynon M, Hollingsworth J, Saltz J and Sussman A Tuning the performance of I/O-intensive parallel applications Proceedings of the fourth workshop on I/O in parallel and distributed systems: part of the federated computing research conference, (15-27)
  439. ACM
    Chakrabarti S, Gupta M and Choi J (1996). Global communication analysis and optimization, ACM SIGPLAN Notices, 31:5, (68-78), Online publication date: 1-May-1996.
  440. ACM
    Chakrabarti S, Gupta M and Choi J Global communication analysis and optimization Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation, (68-78)
  441. Zima E, Vadivelu K and Casavant T Mapping Techniques for Parallel Evaluation of Chains of Recurrences Proceedings of the 10th International Parallel Processing Symposium, (620-624)
  442. Bik A and Wijshoff H (1996). The Use of Iteration Space Partitioning to Construct Representative Simple Sections, Journal of Parallel and Distributed Computing, 34:1, (95-110), Online publication date: 10-Apr-1996.
  443. ACM
    Acharya A Eliminating redundant barrier synchronizations in rule-based programs Proceedings of the 10th international conference on Supercomputing, (325-332)
  444. ACM
    Yoshida A, Koshizuka K and Kasahara H Data-localization for Fortran macro-dataflow computation using partial static task assignment Proceedings of the 10th international conference on Supercomputing, (61-68)
