Parallel Computer Architecture: A Hardware/Software Approach
September 1997
Publisher:
  • Morgan Kaufmann Publishers Inc.
  • 340 Pine Street, Sixth Floor
  • San Francisco
  • CA
  • United States
ISBN: 978-1-55860-343-1
Published: 01 September 1997
Pages: 1100
Abstract

From the Publisher:

This book outlines a set of issues that are critical to all of parallel architecture across modern designs: communication latency, communication bandwidth, and coordination of cooperative work. It describes the set of techniques available in hardware and in software to address each issue and explores how the various techniques interact.

Cited By

  1. Upadhyay B, Ros A and M. S (2023). Fine-grain data classification to filter token coherence traffic, Journal of Parallel and Distributed Computing, 171:C, (40-53), Online publication date: 1-Jan-2023.
  2. Zheng R and Pai S Efficient execution of graph algorithms on CPU with SIMD extensions Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, (262-276)
  3. Upadhyay B, Ros A and Shah J (2021). Efficient classification of private memory blocks, Journal of Parallel and Distributed Computing, 157:C, (256-268), Online publication date: 1-Nov-2021.
  4. Khalil K, Eldash O, Kumar A and Bayoumi M (2019). Self-healing hardware systems, Microelectronics Journal, 93:C, Online publication date: 1-Nov-2019.
  5. Jalaparti V, Douglas C, Ghosh M, Agrawal A, Floratou A, Kandula S, Menache I, Naor J and Rao S Netco Proceedings of the ACM Symposium on Cloud Computing, (186-198)
  6. Chen C, Hsia A, Zhan Y and Liu T (2018). Energy-efficient hybrid coherence protocol for multicore processors, Cluster Computing, 21:3, (1521-1541), Online publication date: 1-Sep-2018.
  7. Dutt S, Nandi S and Trivedi G (2017). Analysis and Design of Adders for Approximate Computing, ACM Transactions on Embedded Computing Systems, 17:2, (1-28), Online publication date: 31-Mar-2018.
  8. Bijo S, Johnsen E, Pun K, Seidl C and Tarifa S Deployment by Construction for Multicore Architectures Leveraging Applications of Formal Methods, Verification and Validation. Modeling, (448-465)
  9. Titos-Gil R, Flores A, Fernández-Pascual R, Ros A and Acacio M Way-combining directory Proceedings of the International Conference on Supercomputing, (1-10)
  10. Bijo S, Johnsen E, Pun K and Tarifa S An operational semantics of cache coherent multicore architectures Proceedings of the 31st Annual ACM Symposium on Applied Computing, (1219-1224)
  11. Ros A and Kaxiras S Racer The 49th Annual IEEE/ACM International Symposium on Microarchitecture, (1-13)
  12. Farias C, Li W, Delicato F, Pirmez L, Zomaya A, Pires P and Souza J (2016). A Systematic Review of Shared Sensor Networks, ACM Computing Surveys, 48:4, (1-50), Online publication date: 2-May-2016.
  13. Zhang G, Horn W and Sanchez D Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems Proceedings of the 48th International Symposium on Microarchitecture, (13-25)
  14. Chen A, Bhat D and Gehringer E An extensible simulator for bus- and directory-based cache coherence Proceedings of the Workshop on Computer Architecture Education, (1-7)
  15. Kuiper G, Geuns S and Bekooij M Utilization Improvement by Enforcing Mutual Exclusive Task Execution in Modal Stream Processing Applications Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, (28-37)
  16. Cabezas J, Jordà M, Gelado I, Navarro N and Hwu W GPU-SM: shared memory multi-GPU programming Proceedings of the 8th Workshop on General Purpose Processing using GPUs, (13-24)
  17. Venkataramani S, Chakradhar S, Roy K and Raghunathan A Computing approximately, and efficiently Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, (748-751)
  18. Psathakis A, Papaefstathiou V, Chrysos N, Chaix F, Vasilakis E, Pnevmatikatos D and Katevenis M A Systematic Evaluation of Emerging Mesh-like CMP NoCs Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for networking and communications systems, (159-170)
  19. Jiang Y and Chen W (2015). Task scheduling for grid computing systems using a genetic algorithm, The Journal of Supercomputing, 71:4, (1357-1377), Online publication date: 1-Apr-2015.
  20. Bavarsad A and Atoofian E (2015). TurboLock, Computing, 97:6, (649-661), Online publication date: 1-Jun-2015.
  21. Asher Y, Shajrawi Y, Gendel Y, Haber G and Segal O A study of manycore shared memory architecture as a way to build SOC applications Proceedings of the Symposium on High Performance Computing, (174-181c)
  22. Abadal S, Mestres A, Iannazzo M, Solé-Pareta J, Alarcón E and Cabellos-Aparicio A Evaluating the Feasibility of Wireless Networks-on-Chip Enabled by Graphene Proceedings of the 2014 International Workshop on Network on Chip Architectures, (51-56)
  23. Daya B, Chen C, Subramanian S, Kwon W, Park S, Krishna T, Holt J, Chandrakasan A and Peh L (2014). SCORPIO, ACM SIGARCH Computer Architecture News, 42:3, (25-36), Online publication date: 16-Oct-2014.
  24. Voskuilen G and Vijaykumar T (2014). High-performance fractal coherence, ACM SIGARCH Computer Architecture News, 42:1, (701-714), Online publication date: 5-Apr-2014.
  25. Voskuilen G and Vijaykumar T (2014). High-performance fractal coherence, ACM SIGPLAN Notices, 49:4, (701-714), Online publication date: 5-Apr-2014.
  26. Atoofian E Acceleration of Software Transactional Memory through Hardware Clock Proceedings of International Workshop on Manycore Embedded Systems, (41-47)
  27. Geuns S, Hausmans J and Bekooij M Temporal analysis model extraction for optimizing modal multi-rate stream processing applications Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, (21-30)
  28. Liu C and Yang C Exploiting heterogeneity in MPSoCs to prevent potential trojan propagation across malicious IPs Proceedings of the 24th edition of the great lakes symposium on VLSI, (335-340)
  29. Rutgers J, Bekooij M and Smit G Programming a Multicore Architecture without Coherency and Atomic Operations Proceedings of Programming Models and Applications on Multicores and Manycores, (29-38)
  30. Hu J, Zhuge Q, Xue C, Tseng W and Sha E (2014). Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors, ACM Transactions on Embedded Computing Systems, 13:4, (1-25), Online publication date: 5-Dec-2014.
  31. Voskuilen G and Vijaykumar T High-performance fractal coherence Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, (701-714)
  32. Xu W, Yu H, Lu D, Song F, Wang D, Ye X, Pei S, Fan D and Xie H (2014). Fast and scalable lock methods for video coding on many-core architecture, Journal of Visual Communication and Image Representation, 25:7, (1758-1762), Online publication date: 1-Oct-2014.
  33. Braojos R, Dogan A, Beretta I, Ansaloni G and Atienza D Hardware/software approach for code synchronization in low-power multi-core sensor nodes Proceedings of the conference on Design, Automation & Test in Europe, (1-6)
  34. Kim T and Hoskote Y Automatic generation of custom SIMD instructions for superword level parallelism Proceedings of the conference on Design, Automation & Test in Europe, (1-6)
  35. Daya B, Chen C, Subramanian S, Kwon W, Park S, Krishna T, Holt J, Chandrakasan A and Peh L SCORPIO Proceeding of the 41st annual international symposium on Computer architecture, (25-36)
  36. Rutgers J, Bekooij M and Smit G Programming a Multicore Architecture without Coherency and Atomic Operations Proceedings of Programming Models and Applications on Multicores and Manycores, (29-38)
  37. Albericio J, Ibáñez P, Viñals V and Llabería J The reuse cache Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, (310-321)
  38. Singh A, Das A and Kumar A Energy optimization by exploiting execution slacks in streaming applications on multiprocessor systems Proceedings of the 50th Annual Design Automation Conference, (1-7)
  39. Yiapanis P, Rosas-Ham D, Brown G and Luján M (2013). Optimizing software runtime systems for speculative parallelization, ACM Transactions on Architecture and Code Optimization, 9:4, (1-27), Online publication date: 1-Jan-2013.
  40. Dogan A, Braojos R, Constantin J, Ansaloni G, Burg A and Atienza D Synchronizing code execution on ultra-low-power embedded multi-channel signal analysis platforms Proceedings of the Conference on Design, Automation and Test in Europe, (396-399)
  41. Rodrigues E, Navaux P, Panetta J and Mendes C (2013). Preserving the original MPI semantics in a virtualized processor environment, Science of Computer Programming, 78:4, (412-421), Online publication date: 1-Apr-2013.
  42. Atoofian E VGTS Proceedings of the 19th international conference on Parallel Processing, (203-214)
  43. Schor L, Bacivarov I, Rai D, Yang H, Kang S and Thiele L Scenario-based design flow for mapping streaming applications onto on-chip many-core systems Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems, (71-80)
  44. Nychis G, Fallin C, Moscibroda T, Mutlu O and Seshan S (2012). On-chip networks from a networking perspective, ACM SIGCOMM Computer Communication Review, 42:4, (407-418), Online publication date: 24-Sep-2012.
  45. Kramer W Top500 versus sustained performance Proceedings of the 21st international conference on Parallel architectures and compilation techniques, (223-230)
  46. Dreslinski R, Manville T, Sewell K, Das R, Pinckney N, Satpathy S, Blaauw D, Sylvester D and Mudge T XPoint cache Proceedings of the 21st international conference on Parallel architectures and compilation techniques, (75-86)
  47. Carpenter A, Hu J, Kocabas O, Huang M and Wu H (2012). Enhancing effective throughput for transmission line-based bus, ACM SIGARCH Computer Architecture News, 40:3, (165-176), Online publication date: 5-Sep-2012.
  48. Nychis G, Fallin C, Moscibroda T, Mutlu O and Seshan S On-chip networks from a networking perspective Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, (407-418)
  49. Ratanaworabhan P, Burtscher M, Kirovski D and Zorn B Hardware support for enforcing isolation in lock-based parallel programs Proceedings of the 26th ACM international conference on Supercomputing, (301-310)
  50. Aggarwal V, Stitt G, George A and Yoon C (2012). SCF, ACM Transactions on Reconfigurable Technology and Systems, 5:2, (1-23), Online publication date: 1-Jun-2012.
  51. Solano-Quinde L, Bode B and Somani A Techniques for the parallelization of unstructured grid applications on multi-GPU systems Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (140-147)
  52. Edwards J and Vishkin U Better speedups using simpler parallel programming for graph connectivity and biconnectivity Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (103-114)
  53. Atoofian E and Bavarsad A AGC Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (11-16)
  54. Santos B and Macedo H (2012). Improving CUDA™ C/C++ encoding readability to foster parallel application development, ACM SIGSOFT Software Engineering Notes, 37:1, (1-5), Online publication date: 27-Jan-2012.
  55. Pricopi M and Mitra T (2012). Bahurupi, ACM Transactions on Architecture and Code Optimization, 8:4, (1-21), Online publication date: 1-Jan-2012.
  56. Carpenter A, Hu J, Kocabas O, Huang M and Wu H Enhancing effective throughput for transmission line-based bus Proceedings of the 39th Annual International Symposium on Computer Architecture, (165-176)
  57. Hart S, Frachtenberg E and Berezecki M Predicting memcached throughput using simulation and modeling Proceedings of the 2012 Symposium on Theory of Modeling and Simulation - DEVS Integrative M&S Symposium, (1-8)
  58. Van der Wijngaart R, Sridharan S and Lee V Extending the BT NAS parallel benchmark to exascale computing Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-9)
  59. Tavangarian D Virtual computing Software Service and Application Engineering, (53-70)
  60. Atoofian E and Bavarsad A Maintaining consistency in software transactional memory through dynamic versioning tuning Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II, (40-49)
  61. Terechko A, Hoogerbrugge J, Alkadi G, Guntur S, Lahiri A, Duranton M, Wüst C, Christie P, Nackaerts A and Kumar A (2012). Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures, ACM Transactions on Embedded Computing Systems, 11S:1, (1-32), Online publication date: 1-Jun-2012.
  62. Kramer W How to measure useful, sustained performance State of the Practice Reports, (1-18)
  63. Clemons J, Jones A, Perricone R, Savarese S and Austin T EFFEX Proceedings of the 48th Design Automation Conference, (1020-1025)
  64. Vishkin U (2011). Using simple abstraction to reinvent computing for parallelism, Communications of the ACM, 54:1, (75-85), Online publication date: 1-Jan-2011.
  65. Cappiello C, Hinostroza A, Pernici B, Sami M, Henis E, Kat R, Meth K and Mura M ADSC Proceedings of the First international conference on Information and communication on technology for the fight against global warming, (165-179)
  66. Khan M and Herbordt M (2011). Parallel discrete molecular dynamics simulation with speculation and in-order commitment, Journal of Computational Physics, 230:17, (6563-6582), Online publication date: 1-Jul-2011.
  67. Kourtis K, Goumas G and Koziris N (2010). Exploiting compression opportunities to improve SpMxV performance on shared memory systems, ACM Transactions on Architecture and Code Optimization, 7:3, (1-31), Online publication date: 1-Dec-2010.
  68. Habgood K and Arel I Revisiting Cramer's rule for solving dense linear systems Proceedings of the 2010 Spring Simulation Multiconference, (1-8)
  69. Pugsley S, Spjut J, Nellans D and Balasubramonian R SWEL Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (465-476)
  70. Kim H, Ahn J and Kim J Replication-aware leakage management in chip multiprocessors with private L2 cache Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design, (135-140)
  71. Liao D and Berkovich S A new multi-core pipelined architecture for executing sequential programs for parallel geospatial computing Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application, (1-8)
  72. Xue J, Garg A, Ciftcioglu B, Hu J, Wang S, Savidis I, Jain M, Berman R, Liu P, Huang M, Wu H, Friedman E, Wicks G and Moore D (2010). An intra-chip free-space optical interconnect, ACM SIGARCH Computer Architecture News, 38:3, (94-105), Online publication date: 19-Jun-2010.
  73. Xue J, Garg A, Ciftcioglu B, Hu J, Wang S, Savidis I, Jain M, Berman R, Liu P, Huang M, Wu H, Friedman E, Wicks G and Moore D An intra-chip free-space optical interconnect Proceedings of the 37th annual international symposium on Computer architecture, (94-105)
  74. Rodrigues E, Navaux P, Panetta J and Mendes C A new technique for data privatization in user-level threads and its use in parallel applications Proceedings of the 2010 ACM Symposium on Applied Computing, (2149-2154)
  75. Kirman N and Martínez J A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, (15-28)
  76. Kirman N and Martínez J (2010). A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing, ACM SIGPLAN Notices, 45:3, (15-28), Online publication date: 5-Mar-2010.
  77. Kirman N and Martínez J (2010). A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing, ACM SIGARCH Computer Architecture News, 38:1, (15-28), Online publication date: 5-Mar-2010.
  78. Rupnow K, Adriaens J, Fu W and Compton K Accurately evaluating application performance in simulated hybrid multi-tasking systems Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, (135-144)
  79. Bueno D, Conger C and George A (2010). Optimizing rapidIO architectures for onboard processing, ACM Transactions on Embedded Computing Systems, 9:3, (1-30), Online publication date: 1-Feb-2010.
  80. Canedo A, Yoshizawa T and Komatsu H Skewed pipelining for parallel simulink simulations Proceedings of the Conference on Design, Automation and Test in Europe, (891-896)
  81. Daneshtalab M, Ebrahimi M, Liljeberg P, Plosila J and Tenhunen H A Low-Latency and Memory-Efficient On-chip Network Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip, (99-106)
  82. Ye Y and Megson G Distributed acceleration of mobile radio network optimisation algorithms Proceedings of the 9th conference on Wireless telecommunications symposium, (124-131)
  83. Shabbir A, Kumar A, Stuijk S, Mesman B and Corporaal H (2010). CA-MPSoC, Journal of Systems Architecture: the EUROMICRO Journal, 56:7, (265-277), Online publication date: 1-Jul-2010.
  84. Razavi S and Sarbazi-Azad H (2010). The triangular pyramid, Information Sciences: an International Journal, 180:11, (2328-2339), Online publication date: 1-Jun-2010.
  85. Akay M and Abasıkeleş I (2010). Predicting the performance measures of an optical distributed shared memory multiprocessor by using support vector regression, Expert Systems with Applications: An International Journal, 37:9, (6293-6301), Online publication date: 1-Sep-2010.
  86. Akay M, Abasıkeleş İ and Oral M (2010). Application of self organizing maps for investigating network latency on a broadcast-based distributed shared memory multiprocessor, Expert Systems with Applications: An International Journal, 37:4, (2937-2942), Online publication date: 1-Apr-2010.
  87. Abasıkeleş İ and Akay M (2010). Performance evaluation of directory protocols on an optical broadcast-based distributed shared memory multiprocessor, Computers and Electrical Engineering, 36:1, (114-131), Online publication date: 1-Jan-2010.
  88. Breitbart J An approach for semiautomatic locality optimizations using OpenMP Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, (291-301)
  89. Shabbir A, Stuijk S, Kumar A, Theelen B, Mesman B and Corporaal H A predictable communication assist Proceedings of the 7th ACM international conference on Computing frontiers, (97-98)
  90. Suleman M, Mutlu O, Qureshi M and Patt Y (2009). Accelerating critical section execution with asymmetric multi-core architectures, ACM SIGARCH Computer Architecture News, 37:1, (253-264), Online publication date: 1-Mar-2009.
  91. Kandemir M, Muralidhara S, Narayanan S, Zhang Y and Ozturk O Optimizing shared cache behavior of chip multiprocessors Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, (505-516)
  92. Zeng H, Yourst M, Ghose K and Ponomarev D MPTLsim Proceedings of the 46th Annual Design Automation Conference, (226-231)
  93. Ophelders F, Bekooij M and Corporaal H A tuneable software cache coherence protocol for heterogeneous MPSoCs Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, (383-392)
  94. Ha P, Tsigas P and Anshus O (2009). Preliminary results on nb-feb, a synchronization primitive for parallel programming, ACM SIGPLAN Notices, 44:4, (295-296), Online publication date: 14-Feb-2009.
  95. Firoozshahian A, Solomatnikov A, Shacham O, Asgar Z, Richardson S, Kozyrakis C and Horowitz M (2009). A memory system design framework, ACM SIGARCH Computer Architecture News, 37:3, (406-417), Online publication date: 15-Jun-2009.
  96. Firoozshahian A, Solomatnikov A, Shacham O, Asgar Z, Richardson S, Kozyrakis C and Horowitz M A memory system design framework Proceedings of the 36th annual international symposium on Computer architecture, (406-417)
  97. Müller T and Knoll A Attention driven visual processing for an interactive dialog robot Proceedings of the 2009 ACM symposium on Applied Computing, (1151-1155)
  98. Suleman M, Mutlu O, Qureshi M and Patt Y (2009). Accelerating critical section execution with asymmetric multi-core architectures, ACM SIGPLAN Notices, 44:3, (253-264), Online publication date: 28-Feb-2009.
  99. Suleman M, Mutlu O, Qureshi M and Patt Y Accelerating critical section execution with asymmetric multi-core architectures Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, (253-264)
  100. Ha P, Tsigas P and Anshus O Preliminary results on nb-feb, a synchronization primitive for parallel programming Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, (295-296)
  101. Hansson A, Goossens K, Bekooij M and Huisken J (2009). CoMPSoC, ACM Transactions on Design Automation of Electronic Systems, 14:1, (1-24), Online publication date: 1-Jan-2009.
  102. Larsson A, Gidenstam A, Ha P, Papatriantafilou M and Tsigas P (2009). Multiword atomic read/write registers on multiprocessor systems, ACM Journal of Experimental Algorithmics, 13, (1.7-1.30), Online publication date: 1-Feb-2009.
  103. Huang C, Lin C and Tsai W (2009). A multi-core based parallel streaming mechanism for concurrent video-on-demand applications, IEEE Communications Letters, 13:4, (286-288), Online publication date: 1-Apr-2009.
  104. Li X and Hammami O (2009). An automatic design flow for data parallel and pipelined signal processing applications on embedded multiprocessor with NoC, International Journal of Reconfigurable Computing, 2009, (2-2), Online publication date: 1-Jan-2009.
  105. Wagner I and Bertacco V CASPAR Proceedings of the Conference on Design, Automation and Test in Europe, (658-663)
  106. Kim H, Youn S and Kim J (2009). Reusability-aware cache memory sharing for chip multiprocessors with private L2 caches, Journal of Systems Architecture: the EUROMICRO Journal, 55:10-12, (446-456), Online publication date: 1-Oct-2009.
  107. Palermo G, Silvano C and Zaccaria V (2009). ReSPIR, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28:12, (1816-1829), Online publication date: 1-Dec-2009.
  108. Han W, Yi Y, Muir M, Nousias I, Arslan T and Erdogan A (2009). Multicore architectures with dynamically reconfigurable array processors for wireless broadband technologies, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28:12, (1830-1843), Online publication date: 1-Dec-2009.
  109. Martínez-Pérez I and Zimmermann K (2009). Parallel bioinspired algorithms for NP complete graph problems, Journal of Parallel and Distributed Computing, 69:3, (221-229), Online publication date: 1-Mar-2009.
  110. Park K, Park H, Jeun W and Ha S (2009). Boolean circuit programming, Journal of Discrete Algorithms, 7:2, (267-277), Online publication date: 1-Jun-2009.
  111. Honda K, Vasconcelos V and Yoshida N (2009). Type-Directed Compilation for Multicore Programming, Electronic Notes in Theoretical Computer Science (ENTCS), 241, (101-111), Online publication date: 1-Jul-2009.
  112. Mihu I and Caprita H A strategy for parallel sorting algorithms evaluation based on MPI technology Proceedings of the 8th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases, (49-54)
  113. Ernst D and Stevenson D (2008). Concurrent CS, ACM SIGCSE Bulletin, 40:3, (230-234), Online publication date: 25-Aug-2008.
  114. De A, Roychoudhury A and D'Souza D Java memory model aware software validation Proceedings of the 8th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, (8-14)
  115. Leverich J, Arakida H, Solomatnikov A, Firoozshahian A, Horowitz M and Kozyrakis C (2008). Comparative evaluation of memory models for chip multiprocessors, ACM Transactions on Architecture and Code Optimization, 5:3, (1-30), Online publication date: 1-Nov-2008.
  116. Kluter T, Brisk P, Ienne P and Charbon E Speculative DMA for architecturally visible storage in instruction set extensions Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis, (243-248)
  117. Wagner I and Bertacco V MCjammer Proceedings of the conference on Design, automation and test in Europe, (670-675)
  118. Moonen A, Bekooij M, van den Berg R and van Meerbergen J Cache aware mapping of streaming applications on a multiprocessor system-on-chip Proceedings of the conference on Design, automation and test in Europe, (300-305)
  119. Ernst D and Stevenson D Concurrent CS Proceedings of the 13th annual conference on Innovation and technology in computer science education, (230-234)
  120. Popovici K, Guerin X, Rousseau F, Paolucci P and Jerraya A (2008). Platform-based software design flow for heterogeneous MPSoC, ACM Transactions on Embedded Computing Systems, 7:4, (1-23), Online publication date: 1-Jul-2008.
  121. Inoue H, Sakai J and Edahiro M (2008). Processor virtualization for secure mobile terminals, ACM Transactions on Design Automation of Electronic Systems, 13:3, (1-23), Online publication date: 1-Jul-2008.
  122. Wen X and Vishkin U Fpga-based prototype of a pram-on-chip processor Proceedings of the 5th conference on Computing frontiers, (55-66)
  123. Bijlsma T, Bekooij M, Jansen P and Smit G Communication between nested loop programs via circular buffers in an embedded multiprocessor system Proceedings of the 11th international workshop on Software & compilers for embedded systems, (33-42)
  124. Gehringer E, Cassel L, Deibel K and Joel W (2008). Wikis, ACM SIGCSE Bulletin, 40:1, (379-380), Online publication date: 29-Feb-2008.
  125. Gehringer E, Cassel L, Deibel K and Joel W Wikis Proceedings of the 39th SIGCSE technical symposium on Computer science education, (379-380)
  126. Liao D and Berkovich S The design of parallel solid voxelization based on multi-processor pipeline by program slicing Proceedings of the 12th WSEAS international conference on Computers, (167-172)
  127. Fernández-Pascual R, García J, Acacio M and Duato J Fault-tolerant cache coherence protocols for CMPs Proceedings of the 15th international conference on High performance computing, (555-568)
  128. Chung C and Kim J Broadcast filtering-aware task assignment techniques for low-power MPSoCs Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, (89-96)
  129. Subramaniam M, Chundi P and Siy H Aggregating changes to efficiently check consistency Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting, (39-42)
  130. Moreira O, Valente F and Bekooij M Scheduling multiple independent hard-real-time jobs on a heterogeneous multiprocessor Proceedings of the 7th ACM & IEEE international conference on Embedded software, (57-66)
  131. Chandraiah P and Doemer R Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification Proceedings of the 44th annual Design Automation Conference, (787-790)
  132. Stuijk S, Basten T, Geilen M and Corporaal H Multiprocessor resource allocation for throughput-constrained synchronous dataflow graphs Proceedings of the 44th annual Design Automation Conference, (777-782)
  133. Hwu W, Ryoo S, Ueng S, Kelm J, Gelado I, Stone S, Kidd R, Baghsorkhi S, Mahesri A, Tsao S, Navarro N, Lumetta S, Frank M and Patel S Implicitly parallel programming models for thousand-core microprocessors Proceedings of the 44th annual Design Automation Conference, (754-759)
  134. Gambhir M, Gehringer E and Solihin Y Animations of important concepts in parallel computer architecture Proceedings of the 2007 workshop on Computer architecture education, (23-29)
  135. Leverich J, Arakida H, Solomatnikov A, Firoozshahian A, Horowitz M and Kozyrakis C (2007). Comparing memory systems for chip multiprocessors, ACM SIGARCH Computer Architecture News, 35:2, (358-368), Online publication date: 9-Jun-2007.
  136. Leverich J, Arakida H, Solomatnikov A, Firoozshahian A, Horowitz M and Kozyrakis C Comparing memory systems for chip multiprocessors Proceedings of the 34th annual international symposium on Computer architecture, (358-368)
  137. Atoofian E, Baniasadi A and Aasaraai K Speculative supplier identification for reducing power of interconnects in snoopy cache coherence protocols Proceedings of the 4th international conference on Computing frontiers, (259-266)
  138. Wheeler P and Fulp E A taxonomy of parallel techniques for intrusion detection Proceedings of the 45th annual southeast regional conference, (278-282)
  139. Heirman W, Dambre J and Van Campenhout J Synthetic traffic generation as a tool for dynamic interconnect evaluation Proceedings of the 2007 international workshop on System level interconnect prediction, (65-72)
  140. Tumeo A, Monchiero M, Palermo G, Ferrandi F and Sciuto D A design kit for a fully working shared memory multiprocessor on FPGA Proceedings of the 17th ACM Great Lakes symposium on VLSI, (219-222)
  141. Chung C, Kim J and Kim D Reducing snoop-energy in shared bus-based mpsocs by filtering useless broadcasts Proceedings of the 17th ACM Great Lakes symposium on VLSI, (126-131)
  142. Cameron K, Ge R and Sun X (2007). $\log_{\rm n}{\rm P}$ and $\log_{3}{\rm P}$, IEEE Transactions on Computers, 56:3, (314-327), Online publication date: 1-Mar-2007.
  143. Poletti F, Poggiali A, Bertozzi D, Benini L, Marchal P, Loghi M and Poncino M (2007). Energy-Efficient Multiprocessor Systems-on-Chip for Embedded Computing, IEEE Transactions on Computers, 56:5, (606-621), Online publication date: 1-May-2007.
  144. Chen G and Kandemir M An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors Transactions on High-Performance Embedded Architectures and Compilers I, (214-233)
  145. Borkar S, Jouppi N and Stenstrom P Microprocessors in the era of terascale integration Proceedings of the conference on Design, automation and test in Europe, (237-242)
  146. Narayanan S, Kandemir M and Brooks R Performance aware secure code partitioning Proceedings of the conference on Design, automation and test in Europe, (1122-1127)
  147. Popovici K and Jerraya A Simulink based hardware-software codesign flow for heterogeneous MPSoC Proceedings of the 2007 Summer Computer Simulation Conference, (497-504)
  148. Bolotin E, Guz Z, Cidon I, Ginosar R and Kolodny A The Power of Priority Proceedings of the First International Symposium on Networks-on-Chip, (117-126)
  149. Hong B and Prasanna V (2007). Adaptive Allocation of Independent Tasks to Maximize Throughput, IEEE Transactions on Parallel and Distributed Systems, 18:10, (1420-1435), Online publication date: 1-Oct-2007.
  150. Conway P and Hughes B (2007). The AMD Opteron Northbridge Architecture, IEEE Micro, 27:2, (10-21), Online publication date: 1-Mar-2007.
  151. Vlassov V, Merino O, Moritz C and Popov K Support for fine-grained synchronization in shared-memory multiprocessors Proceedings of the 9th international conference on Parallel Computing Technologies, (453-467)
  152. Ros A, Acacio M and García J Direct coherence Proceedings of the 14th international conference on High performance computing, (147-160)
  153. ACM
    Kennedy K, Koelbel C and Zima H The rise and fall of High Performance Fortran Proceedings of the third ACM SIGPLAN conference on History of programming languages, (7-1-7-22)
  154. Ha P, Papatriantafilou M and Tsigas P (2007). Efficient self-tuning spin-locks using competitive analysis, Journal of Systems and Software, 80:7, (1077-1090), Online publication date: 1-Jul-2007.
  155. ACM
    Moerschell A and Owens J Distributed texture memory in a multi-GPU environment Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, (31-38)
  156. ACM
    Cheng L, Muralimanohar N, Ramani K, Balasubramonian R and Carter J (2006). Interconnect-Aware Coherence Protocols for Chip Multiprocessors, ACM SIGARCH Computer Architecture News, 34:2, (339-351), Online publication date: 1-May-2006.
  157. ACM
    Strauss K, Shen X and Torrellas J (2006). Flexible Snooping, ACM SIGARCH Computer Architecture News, 34:2, (327-338), Online publication date: 1-May-2006.
  158. ACM
    Lin Y, Lee H, Woh M, Harel Y, Mahlke S, Mudge T, Chakrabarti C and Flautner K (2006). SODA, ACM SIGARCH Computer Architecture News, 34:2, (89-101), Online publication date: 1-May-2006.
  159. ACM
    Inoue H, Ikeno A, Kondo M, Sakai J and Edahiro M VIRTUS Proceedings of the 43rd annual Design Automation Conference, (484-489)
  160. ACM
    Jerraya A, Bouchhima A and Pétrot F Programming models and HW-SW interfaces abstraction for multi-processor SoC Proceedings of the 43rd annual Design Automation Conference, (280-285)
  161. Sinnen O, Sousa L and Eika Sandnes F (2006). Toward a Realistic Task Scheduling Model, IEEE Transactions on Parallel and Distributed Systems, 17:3, (263-275), Online publication date: 1-Mar-2006.
  162. Jeong J and Dubois M (2006). Cache Replacement Algorithms with Nonuniform Miss Costs, IEEE Transactions on Computers, 55:4, (353-365), Online publication date: 1-Apr-2006.
  163. Kirman N, Kirman M, Dokania R, Martinez J, Apsel A, Watkins M and Albonesi D Leveraging Optical Technology in Future Bus-based Chip Multiprocessors Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, (492-503)
  164. Sampson J, Gonzalez R, Collard J, Jouppi N, Schlansker M and Calder B Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, (235-246)
  165. Lin Y, Lee H, Woh M, Harel Y, Mahlke S, Mudge T, Chakrabarti C and Flautner K SODA Proceedings of the 33rd annual international symposium on Computer Architecture, (89-101)
  166. Cheng L, Muralimanohar N, Ramani K, Balasubramonian R and Carter J Interconnect-Aware Coherence Protocols for Chip Multiprocessors Proceedings of the 33rd annual international symposium on Computer Architecture, (339-351)
  167. Strauss K, Shen X and Torrellas J Flexible Snooping Proceedings of the 33rd annual international symposium on Computer Architecture, (327-338)
  168. Chen H, Decker J and Bierbaum N Future networking for scalable I/O Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks, (128-135)
  169. Farley R and Fulp E Effects of processing delay on function-parallel firewalls Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks, (136-141)
  170. Gu P and Vishkin U (2006). Case study of gate-level logic simulation on an extremely fine-grained chip multiprocessor, Journal of Embedded Computing, 2:2, (181-190), Online publication date: 1-Apr-2006.
  171. Mouhoub R and Hammami O Multiprocessor on chip Proceedings of the 20th international conference on Parallel and distributed processing, (319-319)
  172. Sendag R, Yilmazer A, Yi J and Uht A Quantifying and reducing the effects of wrong-path memory references in cache-coherent multiprocessor systems Proceedings of the 20th international conference on Parallel and distributed processing, (21-21)
  173. Blazewicz J, Kovalyov M, Machowiak M, Trystram D and Weglarz J (2006). Preemptable Malleable Task Scheduling Problem, IEEE Transactions on Computers, 55:4, (486-490), Online publication date: 1-Apr-2006.
  174. Xue L, Ozturk O, Li F, Kandemir M and Kolcu I Dynamic partitioning of processing and memory resources in embedded MPSoC architectures Proceedings of the conference on Design, automation and test in Europe: Proceedings, (690-695)
  175. Dumitrescu C, Ciocoi V and Pop M Power QUICC™ II pro family of communications processors Proceedings of the 5th WSEAS international conference on Data networks, communications and computers, (125-130)
  176. Navarrete C, Holgado S and Anguiano E Epitaxial surface growth with local interaction, parallel and non-parallel simulations Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (882-889)
  177. Ivanov L (2006). A modern course on parallel and distributed processing, Journal of Computing Sciences in Colleges, 21:6, (29-38), Online publication date: 1-Jun-2006.
  178. ACM
    Jansen K and Zhang H (2006). An approximation algorithm for scheduling malleable tasks under general precedence constraints, ACM Transactions on Algorithms, 2:3, (416-434), Online publication date: 1-Jul-2006.
  179. ACM
    Kandemir M (2006). Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality, ACM Transactions on Design Automation of Electronic Systems, 11:2, (410-441), Online publication date: 1-Apr-2006.
  180. ACM
    Bjerregaard T and Mahadevan S (2006). A survey of research and practices of Network-on-chip, ACM Computing Surveys, 38:1, (1-es), Online publication date: 29-Jun-2006.
  181. Fatoohi R, Kardys K, Koshy S, Sivaramakrishnan S and Vetter J (2006). Performance evaluation of high-speed interconnects using dense communication patterns, Parallel Computing, 32:11-12, (794-807), Online publication date: 1-Dec-2006.
  182. Chen K and Sha E (2006). The fat-stack and universal routing in interconnection networks, Journal of Parallel and Distributed Computing, 66:5, (705-715), Online publication date: 1-May-2006.
  183. James T, Barkhi R and Johnson J (2006). Platform impact on performance of parallel genetic algorithms, Engineering Applications of Artificial Intelligence, 19:8, (843-856), Online publication date: 1-Dec-2006.
  184. Shen Z (2006). A bypassing path based routing algorithm for the pyramid structures, Applied Mathematics and Computation, 181:2, (1523-1543), Online publication date: 1-Oct-2006.
  185. Antony J, Janes P and Rendell A Exploring thread and memory placement on NUMA architectures Proceedings of the 13th international conference on High Performance Computing, (338-352)
  186. Tang X, Li K, Xiao D, Yang J, Liu M and Qin Y A dynamic communication contention awareness list scheduling algorithm for arbitrary heterogeneous system Proceedings of the 2006 Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, GADA, and ODBASE - Volume Part II, (1315-1324)
  187. ACM
    Bekooij M, Parmar S and van Meerbergen J Performance guarantees by simulation of process Proceedings of the 2005 workshop on Software and compilers for embedded systems, (10-19)
  188. ACM
    Sampson J, González R, Collard J, Jouppi N and Schlansker M (2005). Fast synchronization for chip multiprocessors, ACM SIGARCH Computer Architecture News, 33:4, (64-69), Online publication date: 1-Nov-2005.
  189. ACM
    Davis J, Richardson S, Charitsis C and Olukotun K (2005). A chip prototyping substrate, ACM SIGARCH Computer Architecture News, 33:4, (34-43), Online publication date: 1-Nov-2005.
  190. ACM
    Löf H and Holmgren S affinity-on-next-touch Proceedings of the 19th annual international conference on Supercomputing, (387-392)
  191. ACM
    Nurvitadhi E, Chalainanont N and Lu S Characterization of L3 cache behavior of SPECjAppServer2002 and TPC-C Proceedings of the 19th annual international conference on Supercomputing, (12-20)
  192. ACM
    Chen G and Kandemir M Optimizing inter-processor data locality on embedded chip multiprocessors Proceedings of the 5th ACM international conference on Embedded software, (227-236)
  193. ACM
    Jansen K and Zhang H Scheduling malleable tasks with precedence constraints Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures, (86-95)
  194. ACM
    Suh T, Kim D and Lee H Cache coherence support for non-shared bus architecture on heterogeneous MPSoCs Proceedings of the 42nd annual Design Automation Conference, (553-558)
  195. ACM
    Loghi M, Letis M, Benini L and Poncino M Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors Proceedings of the 15th ACM Great Lakes symposium on VLSI, (276-281)
  196. Sinnen O and Sousa L (2005). Communication Contention in Task Scheduling, IEEE Transactions on Parallel and Distributed Systems, 16:6, (503-515), Online publication date: 1-Jun-2005.
  197. Kadayif I, Kandemir M, Chen G, Ozturk O, Karakoy M and Sezer U (2005). Optimizing Array-Intensive Applications for On-Chip Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 16:5, (396-411), Online publication date: 1-May-2005.
  198. Acacio M, Gonzalez J, Garcia J and Duato J (2005). A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 16:1, (67-79), Online publication date: 1-Jan-2005.
  199. Bertozzi D, Jalabert A, Murali S, Tamhankar R, Stergiou S, Benini L and De Micheli G (2005). NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip, IEEE Transactions on Parallel and Distributed Systems, 16:2, (113-129), Online publication date: 1-Feb-2005.
  200. Frachtenberg E, Feitelson D, Petrini F and Fernandez J (2005). Adaptive Parallel Job Scheduling with Flexible Coscheduling, IEEE Transactions on Parallel and Distributed Systems, 16:11, (1066-1077), Online publication date: 1-Nov-2005.
  201. Bhunia S, Datta A, Banerjee N and Roy K (2005). GAARP, IEEE Transactions on Computers, 54:6, (752-766), Online publication date: 1-Jun-2005.
  202. Vuletic M, Pozzi L and Ienne P (2005). Seamless Hardware-Software Integration in Reconfigurable Computing Systems, IEEE Design & Test, 22:2, (102-113), Online publication date: 1-Mar-2005.
  203. Nava M, Blouet P, Teninge P, Coppola M, Ben-Ismail T, Picchiottino S and Wilson R (2005). An Open Platform for Developing Multiprocessor SoCs, Computer, 38:7, (60-67), Online publication date: 1-Jul-2005.
  204. Foglia P, Giorgi R and Prete C (2005). Reducing coherence overhead and boosting performance of high-end SMP multiprocessors running a DSS workload, Journal of Parallel and Distributed Computing, 65:3, (289-306), Online publication date: 1-Mar-2005.
  205. Basharahil R, Wims B, Xu C and Fu S (2005). Distributed Shared Arrays, The Journal of Supercomputing, 31:2, (161-184), Online publication date: 1-Feb-2005.
  206. Huerta P, Castillo J, Martínez J and López V Multi MicroBlaze system for parallel computing Proceedings of the 9th International Conference on Circuits, (1-6)
  207. Chen J, Watson III W, Edwards R and Mao W Message Passing for Linux Clusters with Gigabit Ethernet Mesh Connections Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
  208. Datta A, Bhunia S, Banerjee N and Roy K A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks Proceedings of the 6th International Symposium on Quality of Electronic Design, (358-363)
  209. Liu C, Sivasubramaniam A, Kandemir M and Irwin M Exploiting Barriers to Optimize Power Consumption of CMPs Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
  210. Di Blas A, Dahle D, Diekhans M, Grate L, Hirschberg J, Karplus K, Keller H, Kendrick M, Mesa-Martinez F, Pease D, Rice E, Schultz A, Speck D and Hughey R (2005). The UCSC Kestrel Parallel Processor, IEEE Transactions on Parallel and Distributed Systems, 16:1, (80-92), Online publication date: 1-Jan-2005.
  211. Min G and Ould-Khaoua M (2005). Prediction of communication delay in torus networks under multiple time-scale correlated traffic, Performance Evaluation, 60:1-4, (255-273), Online publication date: 1-May-2005.
  212. Seo D, Ali A, Lim W, Rafique N and Thottethodi M Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks Proceedings of the 32nd annual international symposium on Computer Architecture, (432-443)
  213. Chishti Z, Powell M and Vijaykumar T Optimizing Replication, Communication, and Capacity Allocation in CMPs Proceedings of the 32nd annual international symposium on Computer Architecture, (357-368)
  214. ACM
    Chishti Z, Powell M and Vijaykumar T (2005). Optimizing Replication, Communication, and Capacity Allocation in CMPs, ACM SIGARCH Computer Architecture News, 33:2, (357-368), Online publication date: 1-May-2005.
  215. ACM
    Seo D, Ali A, Lim W, Rafique N and Thottethodi M (2005). Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks, ACM SIGARCH Computer Architecture News, 33:2, (432-443), Online publication date: 1-May-2005.
  216. Kirman M, Kirman N and Martínez J Cherry-MP Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, (245-256)
  217. Poletti F, Poggiali A and Marchal P Flexible Hardware/Software Support for Message Passing on a Distributed Shared Memory Architecture Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, (736-741)
  218. Cao F and Singh J MEDYM Proceedings of the ACM/IFIP/USENIX 6th international conference on Middleware, (292-313)
  219. Jansen K and Zhang H An approximation algorithm for scheduling malleable tasks under general precedence constraints Proceedings of the 16th international conference on Algorithms and Computation, (236-245)
  220. Brown J and Wen Z Toward an application support layer Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics, (912-919)
  221. Bilardi G, Pietracaprina A, Pucci G, Schifano F and Tripiccione R The potential of on-chip multiprocessing for QCD machines Proceedings of the 12th international conference on High Performance Computing, (386-397)
  222. Subramaniam M and Shi J Using Dominators to Extract Observable Protocol Contexts Proceedings of the Third IEEE International Conference on Software Engineering and Formal Methods, (96-105)
  223. Loghi M and Poncino M Exploring Energy/Performance Tradeoffs in Shared Memory MPSoCs Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, (508-513)
  224. Yin Z, Yuan L and Tang T (2005). A new parallel strategy for two-dimensional incompressible flow simulations using pseudo-spectral methods, Journal of Computational Physics, 210:1, (325-341), Online publication date: 20-Nov-2005.
  225. Shen Z (2005). The impact of the apex node on routing inside a pyramid structure, Applied Mathematics and Computation, 169:1, (157-178), Online publication date: 1-Oct-2005.
  226. Jayanti P, Petrovic S and Narula N Read/Write based fast-path transformation for FCFS mutual exclusion Proceedings of the 31st international conference on Theory and Practice of Computer Science, (209-218)
  227. Ros A, Acacio M and García J A novel lightweight directory architecture for scalable shared-memory multiprocessors Proceedings of the 11th international Euro-Par conference on Parallel Processing, (582-591)
  228. Stuijk S, Basten T, Mesman B and Geilen M Predictable Embedding of Large Data Structures in Multiprocessor Networks-on-Chip Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, (254-255)
  229. ACM
    González J, Latorre F and González A Cache organizations for clustered microarchitectures Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, (46-55)
  230. ACM
    McCurdy C and Fischer C A localizing directory coherence protocol Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, (23-29)
  231. ACM
    Chaudhuri M and Heinrich M (2004). SMTp, ACM SIGARCH Computer Architecture News, 32:2, (124), Online publication date: 2-Mar-2004.
  232. ACM
    Brifault K and Charles H (2003). Data cache management on EPIC architecture, ACM SIGARCH Computer Architecture News, 32:3, (35-42), Online publication date: 1-Jun-2004.
  233. ACM
    Teo Y and Onggo B Formalization and strictness of simulation event orderings Proceedings of the eighteenth workshop on Parallel and distributed simulation, (89-96)
  234. ACM
    Dutot P, Eyraud L, Mounié G and Trystram D Bi-criteria algorithm for scheduling jobs on cluster platforms Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, (125-132)
  235. ACM
    Chung F, Graham R and Varghese G Parallelism versus memory allocation in pipelined router forwarding engines Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, (103-111)
  236. Louri A and Kodi A (2004). An Optical Interconnection Network and a Modified Snooping Protocol for the Design of Large-Scale Symmetric Multiprocessors (SMPs), IEEE Transactions on Parallel and Distributed Systems, 15:12, (1093-1104), Online publication date: 1-Dec-2004.
  237. Acacio M, Gonzalez J, Garcia J and Duato J (2004). An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-Chip Integration, IEEE Transactions on Parallel and Distributed Systems, 15:8, (755-768), Online publication date: 1-Aug-2004.
  238. Xu C and Ibrahim T (2004). A Keyword-Based Semantic Prefetching Approach in Internet News Services, IEEE Transactions on Knowledge and Data Engineering, 16:5, (601-611), Online publication date: 1-May-2004.
  239. Zhang Z, Zhu Z and Zhang X (2004). Design and Optimization of Large Size and Low Overhead Off-Chip Caches, IEEE Transactions on Computers, 53:7, (843-855), Online publication date: 1-Jul-2004.
  240. Sinnen O and Sousa L (2004). On Task Scheduling Accuracy, The Journal of Supercomputing, 27:2, (177-194), Online publication date: 1-Feb-2004.
  241. Basten T, Bošnački D and Geilen M (2004). Cluster-Based Partial-Order Reduction, Automated Software Engineering, 11:4, (365-402), Online publication date: 1-Oct-2004.
  242. ACM
    Wang T, Qi Z and Moritz C Opportunities and challenges in application-tuned circuits and architectures based on nanodevices Proceedings of the 1st conference on Computing frontiers, (503-511)
  243. ACM
    Ozturk O, Kandemir M, Irwin M and Kolcu I Tuning data replication for improving behavior of MPSoC applications Proceedings of the 14th ACM Great Lakes symposium on VLSI, (170-173)
  244. ACM
    Banerjee S and Dutt N FIFO power optimization for on-chip networks Proceedings of the 14th ACM Great Lakes symposium on VLSI, (187-191)
  245. ACM
    Han S, Baghdadi A, Bonaciu M, Chae S and Jerraya A An efficient scalable and flexible data transfer architecture for multiprocessor SoC with massive distributed memory Proceedings of the 41st annual Design Automation Conference, (250-255)
  246. Rauber T and Rünger G (2004). Improving locality for ODE solvers by program transformations, Scientific Programming, 12:3, (133-154), Online publication date: 1-Aug-2004.
  247. Mitra T, Roychoudhury A and Shen Q Impact of Java Memory Model on Out-of-Order Multiprocessors Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (99-110)
  248. Kadayif I, Kandemir M and Kolcu I Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors Proceedings of the conference on Design, automation and test in Europe - Volume 2
  249. Suh T, Blough D and Lee H Supporting Cache Coherence in Heterogeneous Multiprocessor Systems Proceedings of the conference on Design, automation and test in Europe - Volume 2
  250. Rădulescu A, Dielissen J, Goossens K, Rijpkema E and Wielage P An Efficient On-Chip Network Interface Offering Guaranteed Services, Shared-Memory Abstraction, and Flexible Network Configuration Proceedings of the conference on Design, automation and test in Europe - Volume 2
  251. Millberg M, Nilsson E, Thid R and Jantsch A Guaranteed Bandwidth Using Looped Containers in Temporally Disjoint Networks within the Nostrum Network on Chip Proceedings of the conference on Design, automation and test in Europe - Volume 2
  252. Chaudhuri M and Heinrich M SMTp Proceedings of the 31st annual international symposium on Computer architecture
  253. Cameron K and Ge R Predicting and Evaluating Distributed Communication Performance Proceedings of the 2004 ACM/IEEE conference on Supercomputing
  254. Jalby W, Lemuet C and Le Pasteur X (2004). WBTK, International Journal of High Performance Computing Applications, 18:2, (211-224), Online publication date: 1-May-2004.
  255. Gürsoy A and Kale L (2004). Performance and modularity benefits of message-driven execution, Journal of Parallel and Distributed Computing, 64:4, (461-480), Online publication date: 1-Apr-2004.
  256. Knoll D and Keyes D (2004). Jacobian-free Newton-Krylov methods, Journal of Computational Physics, 193:2, (357-397), Online publication date: 20-Jan-2004.
  257. ACM
    Mahapatra N, Liu J and Sundaresan K (2002). The performance advantage of applying compression to the memory system, ACM SIGPLAN Notices, 38:2 supplement, (86-96), Online publication date: 15-Feb-2003.
  258. ACM
    Kiran S, Jayram M, Rao P and Nandy S A complexity effective communication model for behavioral modeling of signal processing applications Proceedings of the 40th annual Design Automation Conference, (412-415)
  259. ACM
    Brifault K and Charles H Data cache management on EPIC architecture Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture, (35-42)
  260. Yin G, Xu C and Wang L (2003). Optimal Remapping in Dynamic Bulk Synchronous Computations via a Stochastic Control Approach, IEEE Transactions on Parallel and Distributed Systems, 14:1, (51-62), Online publication date: 1-Jan-2003.
  261. Tam A and Wang C (2003). Contention-Aware Communication Schedule for High-Speed Communication, Cluster Computing, 6:4, (339-353), Online publication date: 1-Oct-2003.
  262. ACM
    McCurdy C and Fischer C User-controllable coherence for high performance shared memory multiprocessors Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, (73-82)
  263. ACM
    Goel A, Roychoudhury A and Mitra T Compactly representing parallel program executions Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, (191-202)
  264. ACM
    Saunders S and Rauchwerger L ARMI Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, (230-241)
  265. ACM
    Chen M and Olukotun K The Jrpm system for dynamically parallelizing Java programs Proceedings of the 30th annual international symposium on Computer architecture, (434-446)
  266. ACM
    Chen M and Olukotun K (2003). The Jrpm system for dynamically parallelizing Java programs, ACM SIGARCH Computer Architecture News, 31:2, (434-446), Online publication date: 1-May-2003.
  267. ACM
    Jayanti P Adaptive and efficient abortable mutual exclusion Proceedings of the twenty-second annual symposium on Principles of distributed computing, (295-304)
  268. ACM
    Paul J Programmers' views of SoCs Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (156-181)
  269. ACM
    Poplavko P, Basten T, Bekooij M, van Meerbergen J and Mesman B Task-level timing models for guaranteed performance in multiprocessor networks-on-chip Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (63-72)
  270. ACM
    Shen Z An alternative routing algorithm for the pyramid structures Proceedings of the 2003 ACM symposium on Applied computing, (1009-1013)
  271. ACM
    Shen Z An optimal broadcasting schema for multidimensional mesh structures Proceedings of the 2003 ACM symposium on Applied computing, (1019-1023)
  272. ACM
    McCurdy C and Fischer C (2003). User-controllable coherence for high performance shared memory multiprocessors, ACM SIGPLAN Notices, 38:10, (73-82), Online publication date: 1-Oct-2003.
  273. ACM
    Goel A, Roychoudhury A and Mitra T (2003). Compactly representing parallel program executions, ACM SIGPLAN Notices, 38:10, (191-202), Online publication date: 1-Oct-2003.
  274. ACM
    Saunders S and Rauchwerger L (2003). ARMI, ACM SIGPLAN Notices, 38:10, (230-241), Online publication date: 1-Oct-2003.
  275. Baer J Multiprocessing Encyclopedia of Computer Science, (1205-1207)
  276. Quinn M and Miller R Parallel processing Encyclopedia of Computer Science, (1349-1365)
  277. Ye T, Benini L and De Micheli G Packetized On-Chip Interconnect Communication Analysis for MPSoC Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
  278. Dongarra J, Foster I, Fox G, Gropp W, Kennedy K, Torczon L and White A References Sourcebook of parallel computing, (729-789)
  279. Emerson E and Kahlon V Rapid parameterized model checking of snoopy cache coherence protocols Proceedings of the 9th international conference on Tools and algorithms for the construction and analysis of systems, (144-159)
  280. Madsen J, Mahadevan S, Virk K and Gonzalez M Network-on-Chip Modeling for System-Level Multiprocessor Simulation Proceedings of the 24th IEEE International Real-Time Systems Symposium
  281. Jeong J and Dubois M Cost-Sensitive Cache Replacement Algorithms Proceedings of the 9th International Symposium on High-Performance Computer Architecture
  282. Radovic Z and Hagersten E Hierarchical Backoff Locks for Nonuniform Communication Architectures Proceedings of the 9th International Symposium on High-Performance Computer Architecture
  283. Vetter J and Mueller F (2003). Communication characteristics of large-scale scientific applications for contemporary cluster architectures, Journal of Parallel and Distributed Computing, 63:9, (853-865), Online publication date: 1-Sep-2003.
  284. Iwamoto Y, Suga K, Ootsu K, Yokota T and Baba T (2003). Receiving message prediction method, Parallel Computing, 29:11-12, (1509-1538), Online publication date: 1-Nov-2003.
  285. ACM
    Kee Y, Kim J and Ha S ParADE Proceedings of the 2003 ACM/IEEE conference on Supercomputing
  286. ACM
    Mauer C, Hill M and Wood D Full-system timing-first simulation Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (108-116)
  287. ACM
    Kandiraju G and Sivasubramaniam A Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (129-139)
  288. ACM
    Mauer C, Hill M and Wood D (2002). Full-system timing-first simulation, ACM SIGMETRICS Performance Evaluation Review, 30:1, (108-116), Online publication date: 1-Jun-2002.
  289. ACM
    Kandiraju G and Sivasubramaniam A (2002). Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks, ACM SIGMETRICS Performance Evaluation Review, 30:1, (129-139), Online publication date: 1-Jun-2002.
  290. ACM
    Gharsalli F, Meftali S, Rousseau F and Jerraya A Automatic generation of embedded memory wrapper for multiprocessor SoC Proceedings of the 39th annual Design Automation Conference, (596-601)
  291. ACM
    Cesário W, Baghdadi A, Gauthier L, Lyonnard D, Nicolescu G, Paviot Y, Yoo S, Jerraya A and Diaz-Nava M Component-based design approach for multicore SoCs Proceedings of the 39th annual Design Automation Conference, (789-794)
  292. ACM
    Brown J, Grossman J and Knight T A lightweight idempotent messaging protocol for faulty networks Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, (248-257)
  293. ACM
    Gharsalli F, Lyonnard D, Meftali S, Rousseau F and Jerraya A Unifying memory and processor wrapper architecture in multiprocessor SoC design Proceedings of the 15th international symposium on System Synthesis, (26-31)
  294. ACM
    Paul J, Andrews C, Cassidy A and Thomas D System-level modeling of a network switch SoC Proceedings of the 15th international symposium on System Synthesis, (62-67)
  295. ACM
    Lepak K and Lipasti M Temporally silent stores Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, (30-41)
  296. ACM
    Lepak K and Lipasti M (2002). Temporally silent stores, ACM SIGPLAN Notices, 37:10, (30-41), Online publication date: 1-Oct-2002.
  297. ACM
    Lepak K and Lipasti M (2002). Temporally silent stores, ACM SIGARCH Computer Architecture News, 30:5, (30-41), Online publication date: 1-Dec-2002.
  298. ACM
    Lepak K and Lipasti M (2002). Temporally silent stores, ACM SIGOPS Operating Systems Review, 36:5, (30-41), Online publication date: 1-Dec-2002.
  299. ACM
    Mahapatra N, Liu J and Sundaresan K The performance advantage of applying compression to the memory system Proceedings of the 2002 workshop on Memory system performance, (86-96)
  300. Chatterjee S, Lebeck A, Patnala P and Thottethodi M (2002). Recursive Array Layouts and Fast Matrix Multiplication, IEEE Transactions on Parallel and Distributed Systems, 13:11, (1105-1123), Online publication date: 1-Nov-2002.
  301. Sorin D, Plakal M, Condon A, Hill M, Martin M and Wood D (2002). Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol, IEEE Transactions on Parallel and Distributed Systems, 13:6, (556-578), Online publication date: 1-Jun-2002.
  302. Charlesworth A (2002). The Sun Fireplane Interconnect, IEEE Micro, 22:1, (36-45), Online publication date: 1-Jan-2002.
  303. Parthasarathy S and Dwarkadas S (2002). Shared State for Distributed Interactive Data Mining Applications, Distributed and Parallel Databases, 11:2, (129-155), Online publication date: 1-Mar-2002.
  304. Simmonds R, Kiddle C and Unger B Addressing blocking and scalability in critical channel traversing Proceedings of the sixteenth workshop on Parallel and distributed simulation, (17-24)
  305. Vetter J and Mueller F Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures Proceedings of the 16th International Parallel and Distributed Processing Symposium
  306. Zahran M and Franklin M A Feasibility Study of Hierarchical Multithreading Proceedings of the 16th International Parallel and Distributed Processing Symposium
  307. Beaumont O, Boudet V and Robert Y A Realistic Model and an Efficient Heuristic for Scheduling with Heterogeneous Processors Proceedings of the 16th International Parallel and Distributed Processing Symposium
  308. Lepère R and Trystram D A New Clustering Algorithm for Large Communication Delays Proceedings of the 16th International Parallel and Distributed Processing Symposium
  309. Yin G, Xu C and Wang L Optimal Remapping in Dynamic Bulk Synchronous Computations via a Stochastic Control Approach Proceedings of the 16th International Parallel and Distributed Processing Symposium
  310. Acacio M, González J, García J and Duato J A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors Proceedings of the 16th International Parallel and Distributed Processing Symposium
  311. Acacio M, González J, García J and Duato J The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques, (155-164)
  312. Acacio M, González J, García J and Duato J Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture Proceedings of the 2002 ACM/IEEE conference on Supercomputing, (1-12)
  313. Radović Z and Hagersten E Efficient synchronization for nonuniform communication architectures Proceedings of the 2002 ACM/IEEE conference on Supercomputing, (1-13)
  314. Vetter J and Yoo A An empirical performance evaluation of scalable scientific applications Proceedings of the 2002 ACM/IEEE conference on Supercomputing, (1-18)
  315. van der Steen A and Dongarra J Overview of high performance computers Handbook of massive data sets, (791-852)
  316. Baydal E, López P and Duato J Increasing the adaptivity of routing algorithms for k-ary n-cubes Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing, (455-462)
  317. Acacio M, González J, García J and Duato J Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing, (368-375)
  318. Povitsky A (2002). Parallel ADI solver based on processor scheduling, Applied Mathematics and Computation, 133:1, (43-81), Online publication date: 25-Nov-2002.
  319. Shen Z A routing algorithm for the pyramid structures Proceedings of the 2001 ACM symposium on Applied computing, (484-488)
  320. Pressel D Fundamental limitations on the use of prefetching and stream buffers for scientific applications Proceedings of the 2001 ACM symposium on Applied computing, (554-559)
  321. Nikolopoulos D, Ayguadé E, Papatheodorou T, Polychronopoulos C and Labarta J The trade-off between implicit and explicit data distribution in shared-memory programming paradigms Proceedings of the 15th international conference on Supercomputing, (23-37)
  322. Tang H and Yang T Optimizing threaded MPI execution on SMP clusters Proceedings of the 15th international conference on Supercomputing, (381-392)
  323. Shuf Y, Serrano M, Gupta M and Singh J Characterizing the memory behavior of Java workloads Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (194-205)
  324. Dutot P and Trystram D Scheduling on hierarchical clusters using malleable tasks Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, (199-208)
  325. Vetter J and McCracken M Statistical scalability analysis of communication operations in distributed applications Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, (123-132)
  326. Shuf Y, Serrano M, Gupta M and Singh J (2001). Characterizing the memory behavior of Java workloads, ACM SIGMETRICS Performance Evaluation Review, 29:1, (194-205), Online publication date: 1-Jun-2001.
  327. Meftali S, Gharsalli F, Rousseau F and Jerraya A An optimal memory allocation for application-specific multiprocessor system-on-chip Proceedings of the 14th international symposium on Systems synthesis, (19-24)
  328. Aslot V and Eigenmann R (2001). Performance characteristics of the SPEC OMP2001 benchmarks, ACM SIGARCH Computer Architecture News, 29:5, (31-40), Online publication date: 1-Dec-2001.
  329. Vetter J and McCracken M (2001). Statistical scalability analysis of communication operations in distributed applications, ACM SIGPLAN Notices, 36:7, (123-132), Online publication date: 1-Jul-2001.
  330. Charlesworth A The sun fireplane system interconnect Proceedings of the 2001 ACM/IEEE conference on Supercomputing, (7-7)
  331. Banikazemi M, Govindaraju R, Blackmore R and Panda D (2001). MPI-LAPI, IEEE Transactions on Parallel and Distributed Systems, 12:10, (1081-1093), Online publication date: 1-Oct-2001.
  332. Xu C and Chaudhary V (2001). Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences, IEEE Transactions on Parallel and Distributed Systems, 12:5, (433-450), Online publication date: 1-May-2001.
  333. Vaidya A, Sivasubramaniam A and Das C (2001). Impact of Virtual Channels and Adaptive Routing on Application Performance, IEEE Transactions on Parallel and Distributed Systems, 12:2, (223-237), Online publication date: 1-Feb-2001.
  334. Li T and John L (2001). ADir_pNB, IEEE Transactions on Computers, 50:9, (921-934), Online publication date: 1-Sep-2001.
  335. Nikolopoulos D and Papatheodorou T (2001). The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors, International Journal of Parallel Programming, 29:3, (249-282), Online publication date: 1-Jun-2001.
  336. Hsiao H and King C (2001). An Application-Driven Study of Multicast Communication for Write Invalidation, The Journal of Supercomputing, 18:3, (279-304), Online publication date: 1-Mar-2001.
  337. Hsiao H and King C (2001). Exploiting Network Locality for CC-NUMA Multiprocessors, The Journal of Supercomputing, 18:1, (63-87), Online publication date: 1-Jan-2001.
  338. Brock B, Carpenter G, Chiprout E, Dean M, De Backer P, Elnozahy E, Franke H, Giampapa M, Glasco D, Peterson J, Rajamony R, Ravindran R, Rawson F, Rockhold R and Rubio J (2001). Experience with building a commodity intel-based ccNUMA system, IBM Journal of Research and Development, 45:2, (207-227), Online publication date: 1-Mar-2001.
  339. Baghdadi A, Lyonnard D, Zergainoh N and Jerraya A An efficient architecture model for systematic design of application-specific multiprocessor SoC Proceedings of the conference on Design, automation and test in Europe, (55-63)
  340. Martin M, Sorin D, Cain H, Hill M and Lipasti M Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, (328-337)
  341. Moh S, Yu C, Lee B, Youn H, Han D and Lee D (2001). Four-Ary Tree-Based Barrier Synchronization for 2D Meshes without Nonmember Involvement, IEEE Transactions on Computers, 50:8, (811-823), Online publication date: 1-Aug-2001.
  342. Min R and Hu Y (2001). Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses, IEEE Transactions on Computers, 50:11, (1191-1201), Online publication date: 1-Nov-2001.
  343. Beaumont O, Boudet V and Petitet A (2001). A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers), IEEE Transactions on Computers, 50:10, (1052-1070), Online publication date: 1-Oct-2001.
  344. Lyonnard D, Yoo S, Baghdadi A and Jerraya A Automatic generation of application-specific architectures for heterogeneous multiprocessor system-on-chip Proceedings of the 38th annual Design Automation Conference, (518-523)
  345. Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K (2000). System architecture directions for networked sensors, ACM SIGPLAN Notices, 35:11, (93-104), Online publication date: 1-Nov-2000.
  346. Sánchez J and González A Modulo scheduling for a fully-distributed clustered VLIW architecture Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, (124-133)
  347. Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K System architecture directions for networked sensors Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, (93-104)
  348. Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K (2000). System architecture directions for networked sensors, ACM SIGARCH Computer Architecture News, 28:5, (93-104), Online publication date: 1-Dec-2000.
  349. Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K (2000). System architecture directions for networked sensors, ACM SIGOPS Operating Systems Review, 34:5, (93-104), Online publication date: 1-Dec-2000.
  350. Vishkin D and Vishkin U (2000). Experiments with list ranking for explicit multi-threaded (XMT) instruction parallelism, ACM Journal of Experimental Algorithmics, 5, (10-es), Online publication date: 31-Dec-2001.
  351. Vishkin U A no-busy-wait balanced tree parallel algorithmic paradigm Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures, (147-155)
  352. Nikolopoulos D, Papatheodorou T, Polychronopoulos C, Labarta J and Ayguadé E A case for user-level dynamic page migration Proceedings of the 14th international conference on Supercomputing, (119-130)
  353. Manjikian N Enhancements and applications of the SimpleScalar simulator for undergraduate and graduate computer architecture education Proceedings of the 2000 workshop on Computer architecture education, (8-es)
  354. Fleury M, Downton A and Clark A (2000). Performance Metrics for Embedded Parallel Pipelines, IEEE Transactions on Parallel and Distributed Systems, 11:11, (1164-1185), Online publication date: 1-Nov-2000.
  355. Prieto M, Llorente I and Tirado F (2000). Data Locality Exploitation in the Decomposition of Regular Domain Problems, IEEE Transactions on Parallel and Distributed Systems, 11:11, (1141-1150), Online publication date: 1-Nov-2000.
  356. Milenkovic A (2000). Achieving High Performance in Bus-Based Shared-Memory Multiprocessors, IEEE Concurrency, 8:3, (36-44), Online publication date: 1-Jul-2000.
  357. Rauber T and Rünger G (2000). A Transformation Approach to Derive Efficient Parallel Implementations, IEEE Transactions on Software Engineering, 26:4, (315-339), Online publication date: 1-Apr-2000.
  358. Gao G and Sarkar V (2000). Location Consistency-A New Memory Model and Cache Consistency Protocol, IEEE Transactions on Computers, 49:8, (798-813), Online publication date: 1-Aug-2000.
  359. Nikolopoulos D, Papatheodorou T, Polychronopoulos C, Labarta J and Ayguadé E (2000). A transparent runtime data distribution engine for OpenMP, Scientific Programming, 8:3, (143-162), Online publication date: 1-Aug-2000.
  360. Acquaviva J and Jalby W Hardware prediction for data coherency of scientific codes on DSM Proceedings of the 2000 ACM/IEEE conference on Supercomputing, (41-es)
  361. Nikolopoulos D, Papatheodorou T, Polychronopoulos C, Labarta J and Ayguadé E Is data distribution necessary in OpenMP? Proceedings of the 2000 ACM/IEEE conference on Supercomputing, (47-es)
  362. Hsiao H and King C The Thread-Based Protocol Engines for CC-NUMA Multiprocessors Proceedings of the 2000 International Conference on Parallel Processing
  363. Tang H, Shen K and Yang T (2000). Program transformation and runtime support for threaded MPI execution on shared-memory machines, ACM Transactions on Programming Languages and Systems, 22:4, (673-700), Online publication date: 1-Jul-2000.
  364. Bagrodia R, Deelman E, Docy S and Phan T (1999). Performance prediction of large parallel applications using parallel simulations, ACM SIGPLAN Notices, 34:8, (151-162), Online publication date: 1-Aug-1999.
  365. Tang H, Shen K and Yang T (1999). Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines, ACM SIGPLAN Notices, 34:8, (107-118), Online publication date: 1-Aug-1999.
  366. Chatterjee S, Lebeck A, Patnala P and Thottethodi M Recursive array layouts and fast parallel matrix multiplication Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, (222-231)
  367. Jeong J and Dubois M Optimal replacements in caches with two miss costs Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, (155-164)
  368. Hill M, Condon A, Plakal M and Sorin D A system-level specification framework for I/O architectures Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, (138-147)
  369. Bagrodia R, Deelman E, Docy S and Phan T Performance prediction of large parallel applications using parallel simulations Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, (151-162)
  370. Tang H, Shen K and Yang T Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, (107-118)
  371. Sodan A and Torra V Hierarchical fuzzy configuration of implementation strategies Proceedings of the 1999 ACM symposium on Applied computing, (250-259)
  372. Giorgi R and Prete C (1999). PSCR, IEEE Transactions on Parallel and Distributed Systems, 10:7, (742-763), Online publication date: 1-Jul-1999.
  373. Dai D and Panda D (1999). Exploiting the Benefits of Multiple-Path Network in DSM Systems, IEEE Transactions on Computers, 48:2, (236-244), Online publication date: 1-Feb-1999.
  374. Kwak H, Lee B, Hurson A, Yoon S and Hahn W (1999). Effects of Multithreading on Cache Performance, IEEE Transactions on Computers, 48:2, (176-184), Online publication date: 1-Feb-1999.
  375. Messina P, Culler D, Pfeiffer W, Martin W, Oden J and Smith G (1998). Architecture, Communications of the ACM, 41:11, (36-44), Online publication date: 1-Nov-1998.
  376. Abandah G and Davidson E (1998). Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance, ACM SIGARCH Computer Architecture News, 26:3, (318-329), Online publication date: 1-Jun-1998.
  377. Keeton K, Patterson D, He Y, Raphael R and Baker W (1998). Performance characterization of a Quad Pentium Pro SMP using OLTP workloads, ACM SIGARCH Computer Architecture News, 26:3, (15-26), Online publication date: 1-Jun-1998.
  378. Abandah G and Davidson E Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance Proceedings of the 25th annual international symposium on Computer architecture, (318-329)
  379. Keeton K, Patterson D, He Y, Raphael R and Baker W Performance characterization of a Quad Pentium Pro SMP using OLTP workloads Proceedings of the 25th annual international symposium on Computer architecture, (15-26)
  380. Pinkston T and Beerel P Computer engineering using innovative instructional technologies at the University of Southern California Proceedings of the 1998 workshop on Computer architecture education, (27-es)
  381. Abandah G and Davidson E (1998). Characterizing Distributed Shared Memory Performance, IEEE Transactions on Parallel and Distributed Systems, 9:2, (206-216), Online publication date: 1-Feb-1998.
  382. Lee J and Jhon C Reducing coherence overhead of barrier synchronization in software DSMs Proceedings of the 1998 ACM/IEEE conference on Supercomputing, (1-18)
  383. Dubois M, Jeong J, Song Y and Moga A (1998). Rapid Hardware Prototyping on RPM-2, IEEE Design & Test, 15:3, (112-118), Online publication date: 1-Jul-1998.
  384. Shi W, Hu W and Tang Z (1997). An interaction of coherence protocols and memory consistency models in DSM systems, ACM SIGOPS Operating Systems Review, 31:4, (41-54), Online publication date: 1-Oct-1997.
  385. Vishkin U From algorithm parallelism to instruction-level parallelism Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, (260-271)
Contributors
  • Google LLC
  • Microsoft Research
  • Princeton University
