From the Publisher:
This book outlines a set of issues that are critical to all of parallel architecture across modern designs: communication latency, communication bandwidth, and coordination of cooperative work. It describes the set of techniques available in hardware and in software to address each issue and explores how the various techniques interact.
Cited By
- Upadhyay B, Ros A and M. S (2023). Fine-grain data classification to filter token coherence traffic, Journal of Parallel and Distributed Computing, 171:C, (40-53), Online publication date: 1-Jan-2023.
- Zheng R and Pai S Efficient execution of graph algorithms on CPU with SIMD extensions Proceedings of the 2021 IEEE/ACM International Symposium on Code Generation and Optimization, (262-276)
- Upadhyay B, Ros A and Shah J (2021). Efficient classification of private memory blocks, Journal of Parallel and Distributed Computing, 157:C, (256-268), Online publication date: 1-Nov-2021.
- Khalil K, Eldash O, Kumar A and Bayoumi M (2019). Self-healing hardware systems, Microelectronics Journal, 93:C, Online publication date: 1-Nov-2019.
- Jalaparti V, Douglas C, Ghosh M, Agrawal A, Floratou A, Kandula S, Menache I, Naor J and Rao S Netco Proceedings of the ACM Symposium on Cloud Computing, (186-198)
- Chen C, Hsia A, Zhan Y and Liu T (2018). Energy-efficient hybrid coherence protocol for multicore processors, Cluster Computing, 21:3, (1521-1541), Online publication date: 1-Sep-2018.
- Dutt S, Nandi S and Trivedi G (2017). Analysis and Design of Adders for Approximate Computing, ACM Transactions on Embedded Computing Systems, 17:2, (1-28), Online publication date: 31-Mar-2018.
- Bijo S, Johnsen E, Pun K, Seidl C and Tarifa S Deployment by Construction for Multicore Architectures Leveraging Applications of Formal Methods, Verification and Validation. Modeling, (448-465)
- Titos-Gil R, Flores A, Fernández-Pascual R, Ros A and Acacio M Way-combining directory Proceedings of the International Conference on Supercomputing, (1-10)
- Bijo S, Johnsen E, Pun K and Tarifa S An operational semantics of cache coherent multicore architectures Proceedings of the 31st Annual ACM Symposium on Applied Computing, (1219-1224)
- Ros A and Kaxiras S Racer The 49th Annual IEEE/ACM International Symposium on Microarchitecture, (1-13)
- Farias C, Li W, Delicato F, Pirmez L, Zomaya A, Pires P and Souza J (2016). A Systematic Review of Shared Sensor Networks, ACM Computing Surveys, 48:4, (1-50), Online publication date: 2-May-2016.
- Zhang G, Horn W and Sanchez D Exploiting commutativity to reduce the cost of updates to shared data in cache-coherent systems Proceedings of the 48th International Symposium on Microarchitecture, (13-25)
- Chen A, Bhat D and Gehringer E An extensible simulator for bus- and directory-based cache coherence Proceedings of the Workshop on Computer Architecture Education, (1-7)
- Kuiper G, Geuns S and Bekooij M Utilization Improvement by Enforcing Mutual Exclusive Task Execution in Modal Stream Processing Applications Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems, (28-37)
- Cabezas J, Jordà M, Gelado I, Navarro N and Hwu W GPU-SM: shared memory multi-GPU programming Proceedings of the 8th Workshop on General Purpose Processing using GPUs, (13-24)
- Venkataramani S, Chakradhar S, Roy K and Raghunathan A Computing approximately, and efficiently Proceedings of the 2015 Design, Automation & Test in Europe Conference & Exhibition, (748-751)
- Psathakis A, Papaefstathiou V, Chrysos N, Chaix F, Vasilakis E, Pnevmatikatos D and Katevenis M A Systematic Evaluation of Emerging Mesh-like CMP NoCs Proceedings of the Eleventh ACM/IEEE Symposium on Architectures for networking and communications systems, (159-170)
- Jiang Y and Chen W (2015). Task scheduling for grid computing systems using a genetic algorithm, The Journal of Supercomputing, 71:4, (1357-1377), Online publication date: 1-Apr-2015.
- Bavarsad A and Atoofian E (2015). TurboLock, Computing, 97:6, (649-661), Online publication date: 1-Jun-2015.
- Asher Y, Shajrawi Y, Gendel Y, Haber G and Segal O A study of manycore shared memory architecture as a way to build SOC applications Proceedings of the Symposium on High Performance Computing, (174-181c)
- Abadal S, Mestres A, Iannazzo M, Solé-Pareta J, Alarcón E and Cabellos-Aparicio A Evaluating the Feasibility of Wireless Networks-on-Chip Enabled by Graphene Proceedings of the 2014 International Workshop on Network on Chip Architectures, (51-56)
- Daya B, Chen C, Subramanian S, Kwon W, Park S, Krishna T, Holt J, Chandrakasan A and Peh L (2014). SCORPIO, ACM SIGARCH Computer Architecture News, 42:3, (25-36), Online publication date: 16-Oct-2014.
- Voskuilen G and Vijaykumar T (2014). High-performance fractal coherence, ACM SIGARCH Computer Architecture News, 42:1, (701-714), Online publication date: 5-Apr-2014.
- Voskuilen G and Vijaykumar T (2014). High-performance fractal coherence, ACM SIGPLAN Notices, 49:4, (701-714), Online publication date: 5-Apr-2014.
- Atoofian E Acceleration of Software Transactional Memory through Hardware Clock Proceedings of International Workshop on Manycore Embedded Systems, (41-47)
- Geuns S, Hausmans J and Bekooij M Temporal analysis model extraction for optimizing modal multi-rate stream processing applications Proceedings of the 17th International Workshop on Software and Compilers for Embedded Systems, (21-30)
- Liu C and Yang C Exploiting heterogeneity in MPSoCs to prevent potential trojan propagation across malicious IPs Proceedings of the 24th edition of the great lakes symposium on VLSI, (335-340)
- Rutgers J, Bekooij M and Smit G Programming a Multicore Architecture without Coherency and Atomic Operations Proceedings of Programming Models and Applications on Multicores and Manycores, (29-38)
- Hu J, Zhuge Q, Xue C, Tseng W and Sha E (2014). Management and optimization for nonvolatile memory-based hybrid scratchpad memory on multicore embedded processors, ACM Transactions on Embedded Computing Systems, 13:4, (1-25), Online publication date: 5-Dec-2014.
- Voskuilen G and Vijaykumar T High-performance fractal coherence Proceedings of the 19th international conference on Architectural support for programming languages and operating systems, (701-714)
- Xu W, Yu H, Lu D, Song F, Wang D, Ye X, Pei S, Fan D and Xie H (2014). Fast and scalable lock methods for video coding on many-core architecture, Journal of Visual Communication and Image Representation, 25:7, (1758-1762), Online publication date: 1-Oct-2014.
- Braojos R, Dogan A, Beretta I, Ansaloni G and Atienza D Hardware/software approach for code synchronization in low-power multi-core sensor nodes Proceedings of the conference on Design, Automation & Test in Europe, (1-6)
- Kim T and Hoskote Y Automatic generation of custom SIMD instructions for superword level parallelism Proceedings of the conference on Design, Automation & Test in Europe, (1-6)
- Daya B, Chen C, Subramanian S, Kwon W, Park S, Krishna T, Holt J, Chandrakasan A and Peh L SCORPIO Proceeding of the 41st annual international symposium on Computer architecture, (25-36)
- Albericio J, Ibáñez P, Viñals V and Llabería J The reuse cache Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, (310-321)
- Singh A, Das A and Kumar A Energy optimization by exploiting execution slacks in streaming applications on multiprocessor systems Proceedings of the 50th Annual Design Automation Conference, (1-7)
- Yiapanis P, Rosas-Ham D, Brown G and Luján M (2013). Optimizing software runtime systems for speculative parallelization, ACM Transactions on Architecture and Code Optimization, 9:4, (1-27), Online publication date: 1-Jan-2013.
- Dogan A, Braojos R, Constantin J, Ansaloni G, Burg A and Atienza D Synchronizing code execution on ultra-low-power embedded multi-channel signal analysis platforms Proceedings of the Conference on Design, Automation and Test in Europe, (396-399)
- Rodrigues E, Navaux P, Panetta J and Mendes C (2013). Preserving the original MPI semantics in a virtualized processor environment, Science of Computer Programming, 78:4, (412-421), Online publication date: 1-Apr-2013.
- Atoofian E VGTS Proceedings of the 19th international conference on Parallel Processing, (203-214)
- Schor L, Bacivarov I, Rai D, Yang H, Kang S and Thiele L Scenario-based design flow for mapping streaming applications onto on-chip many-core systems Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems, (71-80)
- Nychis G, Fallin C, Moscibroda T, Mutlu O and Seshan S (2012). On-chip networks from a networking perspective, ACM SIGCOMM Computer Communication Review, 42:4, (407-418), Online publication date: 24-Sep-2012.
- Kramer W Top500 versus sustained performance Proceedings of the 21st international conference on Parallel architectures and compilation techniques, (223-230)
- Dreslinski R, Manville T, Sewell K, Das R, Pinckney N, Satpathy S, Blaauw D, Sylvester D and Mudge T XPoint cache Proceedings of the 21st international conference on Parallel architectures and compilation techniques, (75-86)
- Carpenter A, Hu J, Kocabas O, Huang M and Wu H (2012). Enhancing effective throughput for transmission line-based bus, ACM SIGARCH Computer Architecture News, 40:3, (165-176), Online publication date: 5-Sep-2012.
- Nychis G, Fallin C, Moscibroda T, Mutlu O and Seshan S On-chip networks from a networking perspective Proceedings of the ACM SIGCOMM 2012 conference on Applications, technologies, architectures, and protocols for computer communication, (407-418)
- Ratanaworabhan P, Burtscher M, Kirovski D and Zorn B Hardware support for enforcing isolation in lock-based parallel programs Proceedings of the 26th ACM international conference on Supercomputing, (301-310)
- Aggarwal V, Stitt G, George A and Yoon C (2012). SCF, ACM Transactions on Reconfigurable Technology and Systems, 5:2, (1-23), Online publication date: 1-Jun-2012.
- Solano-Quinde L, Bode B and Somani A Techniques for the parallelization of unstructured grid applications on multi-GPU systems Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (140-147)
- Edwards J and Vishkin U Better speedups using simpler parallel programming for graph connectivity and biconnectivity Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (103-114)
- Atoofian E and Bavarsad A AGC Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores, (11-16)
- Santos B and Macedo H (2012). Improving CUDA™ C/C++ encoding readability to foster parallel application development, ACM SIGSOFT Software Engineering Notes, 37:1, (1-5), Online publication date: 27-Jan-2012.
- Pricopi M and Mitra T (2012). Bahurupi, ACM Transactions on Architecture and Code Optimization, 8:4, (1-21), Online publication date: 1-Jan-2012.
- Carpenter A, Hu J, Kocabas O, Huang M and Wu H Enhancing effective throughput for transmission line-based bus Proceedings of the 39th Annual International Symposium on Computer Architecture, (165-176)
- Hart S, Frachtenberg E and Berezecki M Predicting memcached throughput using simulation and modeling Proceedings of the 2012 Symposium on Theory of Modeling and Simulation - DEVS Integrative M&S Symposium, (1-8)
- Van der Wijngaart R, Sridharan S and Lee V Extending the BT NAS parallel benchmark to exascale computing Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-9)
- Tavangarian D Virtual computing Software Service and Application Engineering, (53-70)
- Atoofian E and Bavarsad A Maintaining consistency in software transactional memory through dynamic versioning tuning Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II, (40-49)
- Terechko A, Hoogerbrugge J, Alkadi G, Guntur S, Lahiri A, Duranton M, Wüst C, Christie P, Nackaerts A and Kumar A (2012). Balancing Programmability and Silicon Efficiency of Heterogeneous Multicore Architectures, ACM Transactions on Embedded Computing Systems, 11S:1, (1-32), Online publication date: 1-Jun-2012.
- Kramer W How to measure useful, sustained performance State of the Practice Reports, (1-18)
- Clemons J, Jones A, Perricone R, Savarese S and Austin T EFFEX Proceedings of the 48th Design Automation Conference, (1020-1025)
- Vishkin U (2011). Using simple abstraction to reinvent computing for parallelism, Communications of the ACM, 54:1, (75-85), Online publication date: 1-Jan-2011.
- Cappiello C, Hinostroza A, Pernici B, Sami M, Henis E, Kat R, Meth K and Mura M ADSC Proceedings of the First international conference on Information and communication on technology for the fight against global warming, (165-179)
- Khan M and Herbordt M (2011). Parallel discrete molecular dynamics simulation with speculation and in-order commitment, Journal of Computational Physics, 230:17, (6563-6582), Online publication date: 1-Jul-2011.
- Kourtis K, Goumas G and Koziris N (2010). Exploiting compression opportunities to improve SpMxV performance on shared memory systems, ACM Transactions on Architecture and Code Optimization, 7:3, (1-31), Online publication date: 1-Dec-2010.
- Habgood K and Arel I Revisiting Cramer's rule for solving dense linear systems Proceedings of the 2010 Spring Simulation Multiconference, (1-8)
- Pugsley S, Spjut J, Nellans D and Balasubramonian R SWEL Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (465-476)
- Kim H, Ahn J and Kim J Replication-aware leakage management in chip multiprocessors with private L2 cache Proceedings of the 16th ACM/IEEE international symposium on Low power electronics and design, (135-140)
- Liao D and Berkovich S A new multi-core pipelined architecture for executing sequential programs for parallel geospatial computing Proceedings of the 1st International Conference and Exhibition on Computing for Geospatial Research & Application, (1-8)
- Xue J, Garg A, Ciftcioglu B, Hu J, Wang S, Savidis I, Jain M, Berman R, Liu P, Huang M, Wu H, Friedman E, Wicks G and Moore D (2010). An intra-chip free-space optical interconnect, ACM SIGARCH Computer Architecture News, 38:3, (94-105), Online publication date: 19-Jun-2010.
- Xue J, Garg A, Ciftcioglu B, Hu J, Wang S, Savidis I, Jain M, Berman R, Liu P, Huang M, Wu H, Friedman E, Wicks G and Moore D An intra-chip free-space optical interconnect Proceedings of the 37th annual international symposium on Computer architecture, (94-105)
- Rodrigues E, Navaux P, Panetta J and Mendes C A new technique for data privatization in user-level threads and its use in parallel applications Proceedings of the 2010 ACM Symposium on Applied Computing, (2149-2154)
- Kirman N and Martínez J A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, (15-28)
- Kirman N and Martínez J (2010). A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing, ACM SIGPLAN Notices, 45:3, (15-28), Online publication date: 5-Mar-2010.
- Kirman N and Martínez J (2010). A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing, ACM SIGARCH Computer Architecture News, 38:1, (15-28), Online publication date: 5-Mar-2010.
- Rupnow K, Adriaens J, Fu W and Compton K Accurately evaluating application performance in simulated hybrid multi-tasking systems Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, (135-144)
- Bueno D, Conger C and George A (2010). Optimizing rapidIO architectures for onboard processing, ACM Transactions on Embedded Computing Systems, 9:3, (1-30), Online publication date: 1-Feb-2010.
- Canedo A, Yoshizawa T and Komatsu H Skewed pipelining for parallel simulink simulations Proceedings of the Conference on Design, Automation and Test in Europe, (891-896)
- Daneshtalab M, Ebrahimi M, Liljeberg P, Plosila J and Tenhunen H A Low-Latency and Memory-Efficient On-chip Network Proceedings of the 2010 Fourth ACM/IEEE International Symposium on Networks-on-Chip, (99-106)
- Ye Y and Megson G Distributed acceleration of mobile radio network optimisation algorithms Proceedings of the 9th conference on Wireless telecommunications symposium, (124-131)
- Shabbir A, Kumar A, Stuijk S, Mesman B and Corporaal H (2010). CA-MPSoC, Journal of Systems Architecture: the EUROMICRO Journal, 56:7, (265-277), Online publication date: 1-Jul-2010.
- Razavi S and Sarbazi-Azad H (2010). The triangular pyramid, Information Sciences: an International Journal, 180:11, (2328-2339), Online publication date: 1-Jun-2010.
- Akay M and Abasıkeleş I (2010). Predicting the performance measures of an optical distributed shared memory multiprocessor by using support vector regression, Expert Systems with Applications: An International Journal, 37:9, (6293-6301), Online publication date: 1-Sep-2010.
- Akay M, Abasıkeleş İ and Oral M (2010). Application of self organizing maps for investigating network latency on a broadcast-based distributed shared memory multiprocessor, Expert Systems with Applications: An International Journal, 37:4, (2937-2942), Online publication date: 1-Apr-2010.
- Abasıkeleş İ and Akay M (2010). Performance evaluation of directory protocols on an optical broadcast-based distributed shared memory multiprocessor, Computers and Electrical Engineering, 36:1, (114-131), Online publication date: 1-Jan-2010.
- Breitbart J An approach for semiautomatic locality optimizations using OpenMP Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2, (291-301)
- Shabbir A, Stuijk S, Kumar A, Theelen B, Mesman B and Corporaal H A predictable communication assist Proceedings of the 7th ACM international conference on Computing frontiers, (97-98)
- Suleman M, Mutlu O, Qureshi M and Patt Y (2009). Accelerating critical section execution with asymmetric multi-core architectures, ACM SIGARCH Computer Architecture News, 37:1, (253-264), Online publication date: 1-Mar-2009.
- Kandemir M, Muralidhara S, Narayanan S, Zhang Y and Ozturk O Optimizing shared cache behavior of chip multiprocessors Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, (505-516)
- Zeng H, Yourst M, Ghose K and Ponomarev D MPTLsim Proceedings of the 46th Annual Design Automation Conference, (226-231)
- Ophelders F, Bekooij M and Corporaal H A tuneable software cache coherence protocol for heterogeneous MPSoCs Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, (383-392)
- Ha P, Tsigas P and Anshus O (2009). Preliminary results on nb-feb, a synchronization primitive for parallel programming, ACM SIGPLAN Notices, 44:4, (295-296), Online publication date: 14-Feb-2009.
- Firoozshahian A, Solomatnikov A, Shacham O, Asgar Z, Richardson S, Kozyrakis C and Horowitz M (2009). A memory system design framework, ACM SIGARCH Computer Architecture News, 37:3, (406-417), Online publication date: 15-Jun-2009.
- Firoozshahian A, Solomatnikov A, Shacham O, Asgar Z, Richardson S, Kozyrakis C and Horowitz M A memory system design framework Proceedings of the 36th annual international symposium on Computer architecture, (406-417)
- Müller T and Knoll A Attention driven visual processing for an interactive dialog robot Proceedings of the 2009 ACM symposium on Applied Computing, (1151-1155)
- Suleman M, Mutlu O, Qureshi M and Patt Y (2009). Accelerating critical section execution with asymmetric multi-core architectures, ACM SIGPLAN Notices, 44:3, (253-264), Online publication date: 28-Feb-2009.
- Suleman M, Mutlu O, Qureshi M and Patt Y Accelerating critical section execution with asymmetric multi-core architectures Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, (253-264)
- Ha P, Tsigas P and Anshus O Preliminary results on nb-feb, a synchronization primitive for parallel programming Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, (295-296)
- Hansson A, Goossens K, Bekooij M and Huisken J (2009). CoMPSoC, ACM Transactions on Design Automation of Electronic Systems, 14:1, (1-24), Online publication date: 1-Jan-2009.
- Larsson A, Gidenstam A, Ha P, Papatriantafilou M and Tsigas P (2009). Multiword atomic read/write registers on multiprocessor systems, ACM Journal of Experimental Algorithmics, 13, (1.7-1.30), Online publication date: 1-Feb-2009.
- Huang C, Lin C and Tsai W (2009). A multi-core based parallel streaming mechanism for concurrent video-on-demand applications, IEEE Communications Letters, 13:4, (286-288), Online publication date: 1-Apr-2009.
- Li X and Hammami O (2009). An automatic design flow for data parallel and pipelined signal processing applications on embedded multiprocessor with NoC, International Journal of Reconfigurable Computing, 2009, (2-2), Online publication date: 1-Jan-2009.
- Wagner I and Bertacco V CASPAR Proceedings of the Conference on Design, Automation and Test in Europe, (658-663)
- Kim H, Youn S and Kim J (2009). Reusability-aware cache memory sharing for chip multiprocessors with private L2 caches, Journal of Systems Architecture: the EUROMICRO Journal, 55:10-12, (446-456), Online publication date: 1-Oct-2009.
- Palermo G, Silvano C and Zaccaria V (2009). ReSPIR, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28:12, (1816-1829), Online publication date: 1-Dec-2009.
- Han W, Yi Y, Muir M, Nousias I, Arslan T and Erdogan A (2009). Multicore architectures with dynamically reconfigurable array processors for wireless broadband technologies, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28:12, (1830-1843), Online publication date: 1-Dec-2009.
- Martínez-Pérez I and Zimmermann K (2009). Parallel bioinspired algorithms for NP complete graph problems, Journal of Parallel and Distributed Computing, 69:3, (221-229), Online publication date: 1-Mar-2009.
- Park K, Park H, Jeun W and Ha S (2009). Boolean circuit programming, Journal of Discrete Algorithms, 7:2, (267-277), Online publication date: 1-Jun-2009.
- Honda K, Vasconcelos V and Yoshida N (2009). Type-Directed Compilation for Multicore Programming, Electronic Notes in Theoretical Computer Science (ENTCS), 241, (101-111), Online publication date: 1-Jul-2009.
- Mihu I and Caprita H A strategy for parallel sorting algorithms evaluation based on MPI technology Proceedings of the 8th WSEAS international conference on Artificial intelligence, knowledge engineering and data bases, (49-54)
- Ernst D and Stevenson D (2008). Concurrent CS, ACM SIGCSE Bulletin, 40:3, (230-234), Online publication date: 25-Aug-2008.
- De A, Roychoudhury A and D'Souza D Java memory model aware software validation Proceedings of the 8th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, (8-14)
- Leverich J, Arakida H, Solomatnikov A, Firoozshahian A, Horowitz M and Kozyrakis C (2008). Comparative evaluation of memory models for chip multiprocessors, ACM Transactions on Architecture and Code Optimization, 5:3, (1-30), Online publication date: 1-Nov-2008.
- Kluter T, Brisk P, Ienne P and Charbon E Speculative DMA for architecturally visible storage in instruction set extensions Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis, (243-248)
- Wagner I and Bertacco V MCjammer Proceedings of the conference on Design, automation and test in Europe, (670-675)
- Moonen A, Bekooij M, van den Berg R and van Meerbergen J Cache aware mapping of streaming applications on a multiprocessor system-on-chip Proceedings of the conference on Design, automation and test in Europe, (300-305)
- Ernst D and Stevenson D Concurrent CS Proceedings of the 13th annual conference on Innovation and technology in computer science education, (230-234)
- Popovici K, Guerin X, Rousseau F, Paolucci P and Jerraya A (2008). Platform-based software design flow for heterogeneous MPSoC, ACM Transactions on Embedded Computing Systems, 7:4, (1-23), Online publication date: 1-Jul-2008.
- Inoue H, Sakai J and Edahiro M (2008). Processor virtualization for secure mobile terminals, ACM Transactions on Design Automation of Electronic Systems, 13:3, (1-23), Online publication date: 1-Jul-2008.
- Wen X and Vishkin U Fpga-based prototype of a pram-on-chip processor Proceedings of the 5th conference on Computing frontiers, (55-66)
- Bijlsma T, Bekooij M, Jansen P and Smit G Communication between nested loop programs via circular buffers in an embedded multiprocessor system Proceedings of the 11th international workshop on Software & compilers for embedded systems, (33-42)
- Gehringer E, Cassel L, Deibel K and Joel W (2008). Wikis, ACM SIGCSE Bulletin, 40:1, (379-380), Online publication date: 29-Feb-2008.
- Gehringer E, Cassel L, Deibel K and Joel W Wikis Proceedings of the 39th SIGCSE technical symposium on Computer science education, (379-380)
- Liao D and Berkovich S The design of parallel solid voxelization based on multi-processor pipeline by program slicing Proceedings of the 12th WSEAS international conference on Computers, (167-172)
- Fernández-Pascual R, García J, Acacio M and Duato J Fault-tolerant cache coherence protocols for CMPs Proceedings of the 15th international conference on High performance computing, (555-568)
- Chung C and Kim J Broadcast filtering-aware task assignment techniques for low-power MPSoCs Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture, (89-96)
- Subramaniam M, Chundi P and Siy H Aggregating changes to efficiently check consistency Ninth international workshop on Principles of software evolution: in conjunction with the 6th ESEC/FSE joint meeting, (39-42)
- Moreira O, Valente F and Bekooij M Scheduling multiple independent hard-real-time jobs on a heterogeneous multiprocessor Proceedings of the 7th ACM & IEEE international conference on Embedded software, (57-66)
- Chandraiah P and Doemer R Designer-controlled generation of parallel and flexible heterogeneous MPSoC specification Proceedings of the 44th annual Design Automation Conference, (787-790)
- Stuijk S, Basten T, Geilen M and Corporaal H Multiprocessor resource allocation for throughput-constrained synchronous dataflow graphs Proceedings of the 44th annual Design Automation Conference, (777-782)
- Hwu W, Ryoo S, Ueng S, Kelm J, Gelado I, Stone S, Kidd R, Baghsorkhi S, Mahesri A, Tsao S, Navarro N, Lumetta S, Frank M and Patel S Implicitly parallel programming models for thousand-core microprocessors Proceedings of the 44th annual Design Automation Conference, (754-759)
- Gambhir M, Gehringer E and Solihin Y Animations of important concepts in parallel computer architecture Proceedings of the 2007 workshop on Computer architecture education, (23-29)
- Leverich J, Arakida H, Solomatnikov A, Firoozshahian A, Horowitz M and Kozyrakis C (2007). Comparing memory systems for chip multiprocessors, ACM SIGARCH Computer Architecture News, 35:2, (358-368), Online publication date: 9-Jun-2007.
- Leverich J, Arakida H, Solomatnikov A, Firoozshahian A, Horowitz M and Kozyrakis C Comparing memory systems for chip multiprocessors Proceedings of the 34th annual international symposium on Computer architecture, (358-368)
- Atoofian E, Baniasadi A and Aasaraai K Speculative supplier identification for reducing power of interconnects in snoopy cache coherence protocols Proceedings of the 4th international conference on Computing frontiers, (259-266)
- Wheeler P and Fulp E A taxonomy of parallel techniques for intrusion detection Proceedings of the 45th annual southeast regional conference, (278-282)
- Heirman W, Dambre J and Van Campenhout J Synthetic traffic generation as a tool for dynamic interconnect evaluation Proceedings of the 2007 international workshop on System level interconnect prediction, (65-72)
- Tumeo A, Monchiero M, Palermo G, Ferrandi F and Sciuto D A design kit for a fully working shared memory multiprocessor on FPGA Proceedings of the 17th ACM Great Lakes symposium on VLSI, (219-222)
- Chung C, Kim J and Kim D Reducing snoop-energy in shared bus-based mpsocs by filtering useless broadcasts Proceedings of the 17th ACM Great Lakes symposium on VLSI, (126-131)
- Cameron K, Ge R and Sun X (2007). $\log_{\rm n}{\rm P}$ and $\log_{3}{\rm P}$, IEEE Transactions on Computers, 56:3, (314-327), Online publication date: 1-Mar-2007.
- Poletti F, Poggiali A, Bertozzi D, Benini L, Marchal P, Loghi M and Poncino M (2007). Energy-Efficient Multiprocessor Systems-on-Chip for Embedded Computing, IEEE Transactions on Computers, 56:5, (606-621), Online publication date: 1-May-2007.
- Chen G and Kandemir M An Approach for Enhancing Inter-processor Data Locality on Chip Multiprocessors Transactions on High-Performance Embedded Architectures and Compilers I, (214-233)
- Borkar S, Jouppi N and Stenstrom P Microprocessors in the era of terascale integration Proceedings of the conference on Design, automation and test in Europe, (237-242)
- Narayanan S, Kandemir M and Brooks R Performance aware secure code partitioning Proceedings of the conference on Design, automation and test in Europe, (1122-1127)
- Popovici K and Jerraya A Simulink based hardware-software codesign flow for heterogeneous MPSoC Proceedings of the 2007 Summer Computer Simulation Conference, (497-504)
- Bolotin E, Guz Z, Cidon I, Ginosar R and Kolodny A The Power of Priority Proceedings of the First International Symposium on Networks-on-Chip, (117-126)
- Hong B and Prasanna V (2007). Adaptive Allocation of Independent Tasks to Maximize Throughput, IEEE Transactions on Parallel and Distributed Systems, 18:10, (1420-1435), Online publication date: 1-Oct-2007.
- Conway P and Hughes B (2007). The AMD Opteron Northbridge Architecture, IEEE Micro, 27:2, (10-21), Online publication date: 1-Mar-2007.
- Vlassov V, Merino O, Moritz C and Popov K Support for fine-grained synchronization in shared-memory multiprocessors Proceedings of the 9th international conference on Parallel Computing Technologies, (453-467)
- Ros A, Acacio M and García J Direct coherence Proceedings of the 14th international conference on High performance computing, (147-160)
- Kennedy K, Koelbel C and Zima H The rise and fall of High Performance Fortran Proceedings of the third ACM SIGPLAN conference on History of programming languages, (7-1-7-22)
- Ha P, Papatriantafilou M and Tsigas P (2007). Efficient self-tuning spin-locks using competitive analysis, Journal of Systems and Software, 80:7, (1077-1090), Online publication date: 1-Jul-2007.
- Moerschell A and Owens J Distributed texture memory in a multi-GPU environment Proceedings of the 21st ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, (31-38)
- Cheng L, Muralimanohar N, Ramani K, Balasubramonian R and Carter J (2006). Interconnect-Aware Coherence Protocols for Chip Multiprocessors, ACM SIGARCH Computer Architecture News, 34:2, (339-351), Online publication date: 1-May-2006.
- Strauss K, Shen X and Torrellas J (2006). Flexible Snooping, ACM SIGARCH Computer Architecture News, 34:2, (327-338), Online publication date: 1-May-2006.
- Lin Y, Lee H, Woh M, Harel Y, Mahlke S, Mudge T, Chakrabarti C and Flautner K (2006). SODA, ACM SIGARCH Computer Architecture News, 34:2, (89-101), Online publication date: 1-May-2006.
- Inoue H, Ikeno A, Kondo M, Sakai J and Edahiro M VIRTUS Proceedings of the 43rd annual Design Automation Conference, (484-489)
- Jerraya A, Bouchhima A and Pétrot F Programming models and HW-SW interfaces abstraction for multi-processor SoC Proceedings of the 43rd annual Design Automation Conference, (280-285)
- Sinnen O, Sousa L and Eika Sandnes F (2006). Toward a Realistic Task Scheduling Model, IEEE Transactions on Parallel and Distributed Systems, 17:3, (263-275), Online publication date: 1-Mar-2006.
- Jeong J and Dubois M (2006). Cache Replacement Algorithms with Nonuniform Miss Costs, IEEE Transactions on Computers, 55:4, (353-365), Online publication date: 1-Apr-2006.
- Kirman N, Kirman M, Dokania R, Martinez J, Apsel A, Watkins M and Albonesi D Leveraging Optical Technology in Future Bus-based Chip Multiprocessors Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, (492-503)
- Sampson J, Gonzalez R, Collard J, Jouppi N, Schlansker M and Calder B Exploiting Fine-Grained Data Parallelism with Chip Multiprocessors and Fast Barriers Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, (235-246)
- Lin Y, Lee H, Woh M, Harel Y, Mahlke S, Mudge T, Chakrabarti C and Flautner K SODA Proceedings of the 33rd annual international symposium on Computer Architecture, (89-101)
- Cheng L, Muralimanohar N, Ramani K, Balasubramonian R and Carter J Interconnect-Aware Coherence Protocols for Chip Multiprocessors Proceedings of the 33rd annual international symposium on Computer Architecture, (339-351)
- Strauss K, Shen X and Torrellas J Flexible Snooping Proceedings of the 33rd annual international symposium on Computer Architecture, (327-338)
- Chen H, Decker J and Bierbaum N Future networking for scalable I/O Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks, (128-135)
- Farley R and Fulp E Effects of processing delay on function-parallel firewalls Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks, (136-141)
- Gu P and Vishkin U (2006). Case study of gate-level logic simulation on an extremely fine-grained chip multiprocessor, Journal of Embedded Computing, 2:2, (181-190), Online publication date: 1-Apr-2006.
- Mouhoub R and Hammami O Multiprocessor on chip Proceedings of the 20th international conference on Parallel and distributed processing, (319-319)
- Sendag R, Yilmazer A, Yi J and Uht A Quantifying and reducing the effects of wrong-path memory references in cache-coherent multiprocessor systems Proceedings of the 20th international conference on Parallel and distributed processing, (21-21)
- Blazewicz J, Kovalyov M, Machowiak M, Trystram D and Weglarz J (2006). Preemptable Malleable Task Scheduling Problem, IEEE Transactions on Computers, 55:4, (486-490), Online publication date: 1-Apr-2006.
- Xue L, Ozturk O, Li F, Kandemir M and Kolcu I Dynamic partitioning of processing and memory resources in embedded MPSoC architectures Proceedings of the conference on Design, automation and test in Europe: Proceedings, (690-695)
- Dumitrescu C, Ciocoi V and Pop M Power QUICC™ II pro family of communications processors Proceedings of the 5th WSEAS international conference on Data networks, communications and computers, (125-130)
- Navarrete C, Holgado S and Anguiano E Epitaxial surface growth with local interaction, parallel and non-parallel simulations Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing, (882-889)
- Ivanov L (2006). A modern course on parallel and distributed processing, Journal of Computing Sciences in Colleges, 21:6, (29-38), Online publication date: 1-Jun-2006.
- Jansen K and Zhang H (2006). An approximation algorithm for scheduling malleable tasks under general precedence constraints, ACM Transactions on Algorithms, 2:3, (416-434), Online publication date: 1-Jul-2006.
- Kandemir M (2006). Reducing energy consumption of multiprocessor SoC architectures by exploiting memory bank locality, ACM Transactions on Design Automation of Electronic Systems, 11:2, (410-441), Online publication date: 1-Apr-2006.
- Bjerregaard T and Mahadevan S (2006). A survey of research and practices of Network-on-chip, ACM Computing Surveys, 38:1, (1-es), Online publication date: 29-Jun-2006.
- Fatoohi R, Kardys K, Koshy S, Sivaramakrishnan S and Vetter J (2006). Performance evaluation of high-speed interconnects using dense communication patterns, Parallel Computing, 32:11-12, (794-807), Online publication date: 1-Dec-2006.
- Chen K and Sha E (2006). The fat-stack and universal routing in interconnection networks, Journal of Parallel and Distributed Computing, 66:5, (705-715), Online publication date: 1-May-2006.
- James T, Barkhi R and Johnson J (2006). Platform impact on performance of parallel genetic algorithms, Engineering Applications of Artificial Intelligence, 19:8, (843-856), Online publication date: 1-Dec-2006.
- Shen Z (2006). A bypassing path based routing algorithm for the pyramid structures, Applied Mathematics and Computation, 181:2, (1523-1543), Online publication date: 1-Oct-2006.
- Antony J, Janes P and Rendell A Exploring thread and memory placement on NUMA architectures Proceedings of the 13th international conference on High Performance Computing, (338-352)
- Tang X, Li K, Xiao D, Yang J, Liu M and Qin Y A dynamic communication contention awareness list scheduling algorithm for arbitrary heterogeneous system Proceedings of the 2006 Confederated international conference on On the Move to Meaningful Internet Systems: CoopIS, DOA, GADA, and ODBASE - Volume Part II, (1315-1324)
- Bekooij M, Parmar S and van Meerbergen J Performance guarantees by simulation of process Proceedings of the 2005 workshop on Software and compilers for embedded systems, (10-19)
- Sampson J, González R, Collard J, Jouppi N and Schlansker M (2005). Fast synchronization for chip multiprocessors, ACM SIGARCH Computer Architecture News, 33:4, (64-69), Online publication date: 1-Nov-2005.
- Davis J, Richardson S, Charitsis C and Olukotun K (2005). A chip prototyping substrate, ACM SIGARCH Computer Architecture News, 33:4, (34-43), Online publication date: 1-Nov-2005.
- Löf H and Holmgren S affinity-on-next-touch Proceedings of the 19th annual international conference on Supercomputing, (387-392)
- Nurvitadhi E, Chalainanont N and Lu S Characterization of L3 cache behavior of SPECjAppServer2002 and TPC-C Proceedings of the 19th annual international conference on Supercomputing, (12-20)
- Chen G and Kandemir M Optimizing inter-processor data locality on embedded chip multiprocessors Proceedings of the 5th ACM international conference on Embedded software, (227-236)
- Jansen K and Zhang H Scheduling malleable tasks with precedence constraints Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures, (86-95)
- Suh T, Kim D and Lee H Cache coherence support for non-shared bus architecture on heterogeneous MPSoCs Proceedings of the 42nd annual Design Automation Conference, (553-558)
- Loghi M, Letis M, Benini L and Poncino M Exploring the energy efficiency of cache coherence protocols in single-chip multi-processors Proceedings of the 15th ACM Great Lakes symposium on VLSI, (276-281)
- Sinnen O and Sousa L (2005). Communication Contention in Task Scheduling, IEEE Transactions on Parallel and Distributed Systems, 16:6, (503-515), Online publication date: 1-Jun-2005.
- Kadayif I, Kandemir M, Chen G, Ozturk O, Karakoy M and Sezer U (2005). Optimizing Array-Intensive Applications for On-Chip Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 16:5, (396-411), Online publication date: 1-May-2005.
- Acacio M, Gonzalez J, Garcia J and Duato J (2005). A Two-Level Directory Architecture for Highly Scalable cc-NUMA Multiprocessors, IEEE Transactions on Parallel and Distributed Systems, 16:1, (67-79), Online publication date: 1-Jan-2005.
- Bertozzi D, Jalabert A, Murali S, Tamhankar R, Stergiou S, Benini L and De Micheli G (2005). NoC Synthesis Flow for Customized Domain Specific Multiprocessor Systems-on-Chip, IEEE Transactions on Parallel and Distributed Systems, 16:2, (113-129), Online publication date: 1-Feb-2005.
- Frachtenberg E, Feitelson D, Petrini F and Fernandez J (2005). Adaptive Parallel Job Scheduling with Flexible Coscheduling, IEEE Transactions on Parallel and Distributed Systems, 16:11, (1066-1077), Online publication date: 1-Nov-2005.
- Bhunia S, Datta A, Banerjee N and Roy K (2005). GAARP, IEEE Transactions on Computers, 54:6, (752-766), Online publication date: 1-Jun-2005.
- Vuletic M, Pozzi L and Ienne P (2005). Seamless Hardware-Software Integration in Reconfigurable Computing Systems, IEEE Design & Test, 22:2, (102-113), Online publication date: 1-Mar-2005.
- Nava M, Blouet P, Teninge P, Coppola M, Ben-Ismail T, Picchiottino S and Wilson R (2005). An Open Platform for Developing Multiprocessor SoCs, Computer, 38:7, (60-67), Online publication date: 1-Jul-2005.
- Foglia P, Giorgi R and Prete C (2005). Reducing coherence overhead and boosting performance of high-end SMP multiprocessors running a DSS workload, Journal of Parallel and Distributed Computing, 65:3, (289-306), Online publication date: 1-Mar-2005.
- Basharahil R, Wims B, Xu C and Fu S (2005). Distributed Shared Arrays, The Journal of Supercomputing, 31:2, (161-184), Online publication date: 1-Feb-2005.
- Huerta P, Castillo J, Martínez J and López V Multi MicroBlaze system for parallel computing Proceedings of the 9th International Conference on Circuits, (1-6)
- Chen J, Watson III W, Edwards R and Mao W Message Passing for Linux Clusters with Gigabit Ethernet Mesh Connections Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 9 - Volume 10
- Datta A, Bhunia S, Banerjee N and Roy K A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks Proceedings of the 6th International Symposium on Quality of Electronic Design, (358-363)
- Liu C, Sivasubramaniam A, Kandemir M and Irwin M Exploiting Barriers to Optimize Power Consumption of CMPs Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
- Di Blas A, Dahle D, Diekhans M, Grate L, Hirschberg J, Karplus K, Keller H, Kendrick M, Mesa-Martinez F, Pease D, Rice E, Schultz A, Speck D and Hughey R (2005). The UCSC Kestrel Parallel Processor, IEEE Transactions on Parallel and Distributed Systems, 16:1, (80-92), Online publication date: 1-Jan-2005.
- Min G and Ould-Khaoua M (2005). Prediction of communication delay in torus networks under multiple time-scale correlated traffic, Performance Evaluation, 60:1-4, (255-273), Online publication date: 1-May-2005.
- Seo D, Ali A, Lim W, Rafique N and Thottethodi M Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks Proceedings of the 32nd annual international symposium on Computer Architecture, (432-443)
- Chishti Z, Powell M and Vijaykumar T Optimizing Replication, Communication, and Capacity Allocation in CMPs Proceedings of the 32nd annual international symposium on Computer Architecture, (357-368)
- Chishti Z, Powell M and Vijaykumar T (2005). Optimizing Replication, Communication, and Capacity Allocation in CMPs, ACM SIGARCH Computer Architecture News, 33:2, (357-368), Online publication date: 1-May-2005.
- Seo D, Ali A, Lim W, Rafique N and Thottethodi M (2005). Near-Optimal Worst-Case Throughput Routing for Two-Dimensional Mesh Networks, ACM SIGARCH Computer Architecture News, 33:2, (432-443), Online publication date: 1-May-2005.
- Kirman M, Kirman N and Martínez J Cherry-MP Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture, (245-256)
- Francesco P, Antonio P and Marchal P Flexible Hardware/Software Support for Message Passing on a Distributed Shared Memory Architecture Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, (736-741)
- Cao F and Singh J MEDYM Proceedings of the ACM/IFIP/USENIX 6th international conference on Middleware, (292-313)
- Jansen K and Zhang H An approximation algorithm for scheduling malleable tasks under general precedence constraints Proceedings of the 16th international conference on Algorithms and Computation, (236-245)
- Brown J and Wen Z Toward an application support layer Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics, (912-919)
- Bilardi G, Pietracaprina A, Pucci G, Schifano F and Tripiccione R The potential of on-chip multiprocessing for QCD machines Proceedings of the 12th international conference on High Performance Computing, (386-397)
- Subramaniam M and Shi J Using Dominators to Extract Observable Protocol Contexts Proceedings of the Third IEEE International Conference on Software Engineering and Formal Methods, (96-105)
- Loghi M and Poncino M Exploring Energy/Performance Tradeoffs in Shared Memory MPSoCs Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, (508-513)
- Yin Z, Yuan L and Tang T (2005). A new parallel strategy for two-dimensional incompressible flow simulations using pseudo-spectral methods, Journal of Computational Physics, 210:1, (325-341), Online publication date: 20-Nov-2005.
- Shen Z (2005). The impact of the apex node on routing inside a pyramid structure, Applied Mathematics and Computation, 169:1, (157-178), Online publication date: 1-Oct-2005.
- Jayanti P, Petrovic S and Narula N Read/Write based fast-path transformation for FCFS mutual exclusion Proceedings of the 31st international conference on Theory and Practice of Computer Science, (209-218)
- Ros A, Acacio M and García J A novel lightweight directory architecture for scalable shared-memory multiprocessors Proceedings of the 11th international Euro-Par conference on Parallel Processing, (582-591)
- Stuijk S, Basten T, Mesman B and Geilen M Predictable Embedding of Large Data Structures in Multiprocessor Networks-on-Chip Proceedings of the conference on Design, Automation and Test in Europe - Volume 1, (254-255)
- González J, Latorre F and González A Cache organizations for clustered microarchitectures Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, (46-55)
- McCurdy C and Fischer C A localizing directory coherence protocol Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, (23-29)
- Chaudhuri M and Heinrich M (2004). SMTp, ACM SIGARCH Computer Architecture News, 32:2, (124), Online publication date: 2-Mar-2004.
- Brifault K and Charles H (2003). Data cache management on EPIC architecture, ACM SIGARCH Computer Architecture News, 32:3, (35-42), Online publication date: 1-Jun-2004.
- Teo Y and Onggo B Formalization and strictness of simulation event orderings Proceedings of the eighteenth workshop on Parallel and distributed simulation, (89-96)
- Dutot P, Eyraud L, Mounié G and Trystram D Bi-criteria algorithm for scheduling jobs on cluster platforms Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, (125-132)
- Chung F, Graham R and Varghese G Parallelism versus memory allocation in pipelined router forwarding engines Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, (103-111)
- Louri A and Kodi A (2004). An Optical Interconnection Network and a Modified Snooping Protocol for the Design of Large-Scale Symmetric Multiprocessors (SMPs), IEEE Transactions on Parallel and Distributed Systems, 15:12, (1093-1104), Online publication date: 1-Dec-2004.
- Acacio M, Gonzalez J, Garcia J and Duato J (2004). An Architecture for High-Performance Scalable Shared-Memory Multiprocessors Exploiting On-Chip Integration, IEEE Transactions on Parallel and Distributed Systems, 15:8, (755-768), Online publication date: 1-Aug-2004.
- Xu C and Ibrahim T (2004). A Keyword-Based Semantic Prefetching Approach in Internet News Services, IEEE Transactions on Knowledge and Data Engineering, 16:5, (601-611), Online publication date: 1-May-2004.
- Zhang Z, Zhu Z and Zhang X (2004). Design and Optimization of Large Size and Low Overhead Off-Chip Caches, IEEE Transactions on Computers, 53:7, (843-855), Online publication date: 1-Jul-2004.
- Sinnen O and Sousa L (2004). On Task Scheduling Accuracy, The Journal of Supercomputing, 27:2, (177-194), Online publication date: 1-Feb-2004.
- Basten T, Bošnački D and Geilen M (2004). Cluster-Based Partial-Order Reduction, Automated Software Engineering, 11:4, (365-402), Online publication date: 1-Oct-2004.
- Wang T, Qi Z and Moritz C Opportunities and challenges in application-tuned circuits and architectures based on nanodevices Proceedings of the 1st conference on Computing frontiers, (503-511)
- Ozturk O, Kandemir M, Irwin M and Kolcu I Tuning data replication for improving behavior of MPSoC applications Proceedings of the 14th ACM Great Lakes symposium on VLSI, (170-173)
- Banerjee S and Dutt N FIFO power optimization for on-chip networks Proceedings of the 14th ACM Great Lakes symposium on VLSI, (187-191)
- Han S, Baghdadi A, Bonaciu M, Chae S and Jerraya A An efficient scalable and flexible data transfer architecture for multiprocessor SoC with massive distributed memory Proceedings of the 41st annual Design Automation Conference, (250-255)
- Rauber T and Rünger G (2004). Improving locality for ODE solvers by program transformations, Scientific Programming, 12:3, (133-154), Online publication date: 1-Aug-2004.
- Mitra T, Roychoudhury A and Shen Q Impact of Java Memory Model on Out-of-Order Multiprocessors Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, (99-110)
- Kadayif I, Kandemir M and Kolcu I Exploiting Processor Workload Heterogeneity for Reducing Energy Consumption in Chip Multiprocessors Proceedings of the conference on Design, automation and test in Europe - Volume 2
- Suh T, Blough D and Lee H Supporting Cache Coherence in Heterogeneous Multiprocessor Systems Proceedings of the conference on Design, automation and test in Europe - Volume 2
- Rădulescu A, Dielissen J, Goossens K, Rijpkema E and Wielage P An Efficient On-Chip Network Interface Offering Guaranteed Services, Shared-Memory Abstraction, and Flexible Network Configuration Proceedings of the conference on Design, automation and test in Europe - Volume 2
- Millberg M, Nilsson E, Thid R and Jantsch A Guaranteed Bandwidth Using Looped Containers in Temporally Disjoint Networks within the Nostrum Network on Chip Proceedings of the conference on Design, automation and test in Europe - Volume 2
- Chaudhuri M and Heinrich M SMTp Proceedings of the 31st annual international symposium on Computer architecture
- Cameron K and Ge R Predicting and Evaluating Distributed Communication Performance Proceedings of the 2004 ACM/IEEE conference on Supercomputing
- Jalby W, Lemuet C and Le Pasteur X (2004). WBTK, International Journal of High Performance Computing Applications, 18:2, (211-224), Online publication date: 1-May-2004.
- Gürsoy A and Kale L (2004). Performance and modularity benefits of message-driven execution, Journal of Parallel and Distributed Computing, 64:4, (461-480), Online publication date: 1-Apr-2004.
- Knoll D and Keyes D (2004). Jacobian-free Newton-Krylov methods, Journal of Computational Physics, 193:2, (357-397), Online publication date: 20-Jan-2004.
- Mahapatra N, Liu J and Sundaresan K (2002). The performance advantage of applying compression to the memory system, ACM SIGPLAN Notices, 38:2 supplement, (86-96), Online publication date: 15-Feb-2003.
- Kiran S, Jayram M, Rao P and Nandy S A complexity effective communication model for behavioral modeling of signal processing applications Proceedings of the 40th annual Design Automation Conference, (412-415)
- Brifault K and Charles H Data cache management on EPIC architecture Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture, (35-42)
- Yin G, Xu C and Wang L (2003). Optimal Remapping in Dynamic Bulk Synchronous Computations via a Stochastic Control Approach, IEEE Transactions on Parallel and Distributed Systems, 14:1, (51-62), Online publication date: 1-Jan-2003.
- Tam A and Wang C (2003). Contention-Aware Communication Schedule for High-Speed Communication, Cluster Computing, 6:4, (339-353), Online publication date: 1-Oct-2003.
- McCurdy C and Fischer C User-controllable coherence for high performance shared memory multiprocessors Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, (73-82)
- Goel A, Roychoudhury A and Mitra T Compactly representing parallel program executions Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, (191-202)
- Saunders S and Rauchwerger L ARMI Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming, (230-241)
- Chen M and Olukotun K The Jrpm system for dynamically parallelizing Java programs Proceedings of the 30th annual international symposium on Computer architecture, (434-446)
- Chen M and Olukotun K (2003). The Jrpm system for dynamically parallelizing Java programs, ACM SIGARCH Computer Architecture News, 31:2, (434-446), Online publication date: 1-May-2003.
- Jayanti P Adaptive and efficient abortable mutual exclusion Proceedings of the twenty-second annual symposium on Principles of distributed computing, (295-304)
- Paul J Programmers' views of SoCs Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (156-181)
- Poplavko P, Basten T, Bekooij M, van Meerbergen J and Mesman B Task-level timing models for guaranteed performance in multiprocessor networks-on-chip Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (63-72)
- Shen Z An alternative routing algorithm for the pyramid structures Proceedings of the 2003 ACM symposium on Applied computing, (1009-1013)
- Shen Z An optimal broadcasting schema for multidimensional mesh structures Proceedings of the 2003 ACM symposium on Applied computing, (1019-1023)
- McCurdy C and Fischer C (2003). User-controllable coherence for high performance shared memory multiprocessors, ACM SIGPLAN Notices, 38:10, (73-82), Online publication date: 1-Oct-2003.
- Goel A, Roychoudhury A and Mitra T (2003). Compactly representing parallel program executions, ACM SIGPLAN Notices, 38:10, (191-202), Online publication date: 1-Oct-2003.
- Saunders S and Rauchwerger L (2003). ARMI, ACM SIGPLAN Notices, 38:10, (230-241), Online publication date: 1-Oct-2003.
- Baer J Multiprocessing Encyclopedia of Computer Science, (1205-1207)
- Quinn M and Miller R Parallel processing Encyclopedia of Computer Science, (1349-1365)
- Ye T, Benini L and De Micheli G Packetized On-Chip Interconnect Communication Analysis for MPSoC Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
- Dongarra J, Foster I, Fox G, Gropp W, Kennedy K, Torczon L and White A References Sourcebook of parallel computing, (729-789)
- Emerson E and Kahlon V Rapid parameterized model checking of snoopy cache coherence protocols Proceedings of the 9th international conference on Tools and algorithms for the construction and analysis of systems, (144-159)
- Madsen J, Mahadevan S, Virk K and Gonzalez M Network-on-Chip Modeling for System-Level Multiprocessor Simulation Proceedings of the 24th IEEE International Real-Time Systems Symposium
- Jeong J and Dubois M Cost-Sensitive Cache Replacement Algorithms Proceedings of the 9th International Symposium on High-Performance Computer Architecture
- Radovic Z and Hagersten E Hierarchical Backoff Locks for Nonuniform Communication Architectures Proceedings of the 9th International Symposium on High-Performance Computer Architecture
- Vetter J and Mueller F (2003). Communication characteristics of large-scale scientific applications for contemporary cluster architectures, Journal of Parallel and Distributed Computing, 63:9, (853-865), Online publication date: 1-Sep-2003.
- Iwamoto Y, Suga K, Ootsu K, Yokota T and Baba T (2003). Receiving message prediction method, Parallel Computing, 29:11-12, (1509-1538), Online publication date: 1-Nov-2003.
- Kee Y, Kim J and Ha S ParADE Proceedings of the 2003 ACM/IEEE conference on Supercomputing
- Mauer C, Hill M and Wood D Full-system timing-first simulation Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (108-116)
- Kandiraju G and Sivasubramaniam A Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (129-139)
- Mauer C, Hill M and Wood D (2002). Full-system timing-first simulation, ACM SIGMETRICS Performance Evaluation Review, 30:1, (108-116), Online publication date: 1-Jun-2002.
- Kandiraju G and Sivasubramaniam A (2002). Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks, ACM SIGMETRICS Performance Evaluation Review, 30:1, (129-139), Online publication date: 1-Jun-2002.
- Gharsalli F, Meftali S, Rousseau F and Jerraya A Automatic generation of embedded memory wrapper for multiprocessor SoC Proceedings of the 39th annual Design Automation Conference, (596-601)
- Cesário W, Baghdadi A, Gauthier L, Lyonnard D, Nicolescu G, Paviot Y, Yoo S, Jerraya A and Diaz-Nava M Component-based design approach for multicore SoCs Proceedings of the 39th annual Design Automation Conference, (789-794)
- Brown J, Grossman J and Knight T A lightweight idempotent messaging protocol for faulty networks Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures, (248-257)
- Gharsalli F, Lyonnard D, Meftali S, Rousseau F and Jerraya A Unifying memory and processor wrapper architecture in multiprocessor SoC design Proceedings of the 15th international symposium on System Synthesis, (26-31)
- Paul J, Andrews C, Cassidy A and Thomas D System-level modeling of a network switch SoC Proceedings of the 15th international symposium on System Synthesis, (62-67)
- Lepak K and Lipasti M Temporally silent stores Proceedings of the 10th international conference on Architectural support for programming languages and operating systems, (30-41)
- Lepak K and Lipasti M (2002). Temporally silent stores, ACM SIGPLAN Notices, 37:10, (30-41), Online publication date: 1-Oct-2002.
- Lepak K and Lipasti M (2002). Temporally silent stores, ACM SIGARCH Computer Architecture News, 30:5, (30-41), Online publication date: 1-Dec-2002.
- Lepak K and Lipasti M (2002). Temporally silent stores, ACM SIGOPS Operating Systems Review, 36:5, (30-41), Online publication date: 1-Dec-2002.
- Mahapatra N, Liu J and Sundaresan K The performance advantage of applying compression to the memory system Proceedings of the 2002 workshop on Memory system performance, (86-96)
- Chatterjee S, Lebeck A, Patnala P and Thottethodi M (2002). Recursive Array Layouts and Fast Matrix Multiplication, IEEE Transactions on Parallel and Distributed Systems, 13:11, (1105-1123), Online publication date: 1-Nov-2002.
- Sorin D, Plakal M, Condon A, Hill M, Martin M and Wood D (2002). Specifying and Verifying a Broadcast and a Multicast Snooping Cache Coherence Protocol, IEEE Transactions on Parallel and Distributed Systems, 13:6, (556-578), Online publication date: 1-Jun-2002.
- Charlesworth A (2002). The Sun Fireplane Interconnect, IEEE Micro, 22:1, (36-45), Online publication date: 1-Jan-2002.
- Parthasarathy S and Dwarkadas S (2002). Shared State for Distributed Interactive Data Mining Applications, Distributed and Parallel Databases, 11:2, (129-155), Online publication date: 1-Mar-2002.
- Simmonds R, Kiddle C and Unger B Addressing blocking and scalability in critical channel traversing Proceedings of the sixteenth workshop on Parallel and distributed simulation, (17-24)
- Vetter J and Mueller F Communication Characteristics of Large-Scale Scientific Applications for Contemporary Cluster Architectures Proceedings of the 16th International Parallel and Distributed Processing Symposium
- Zahran M and Franklin M A Feasibility Study of Hierarchical Multithreading Proceedings of the 16th International Parallel and Distributed Processing Symposium
- Beaumont O, Boudet V and Robert Y A Realistic Model and an Efficient Heuristic for Scheduling with Heterogeneous Processors Proceedings of the 16th International Parallel and Distributed Processing Symposium
- Lepère R and Trystram D A New Clustering Algorithm for Large Communication Delays Proceedings of the 16th International Parallel and Distributed Processing Symposium
- Yin G, Xu C and Wang L Optimal Remapping in Dynamic Bulk Synchronous Computations via a Stochastic Control Approach Proceedings of the 16th International Parallel and Distributed Processing Symposium
- Acacio M, González J, García J and Duato J A Novel Approach to Reduce L2 Miss Latency in Shared-Memory Multiprocessors Proceedings of the 16th International Parallel and Distributed Processing Symposium
- Acacio M, González J, García J and Duato J The Use of Prediction for Accelerating Upgrade Misses in cc-NUMA Multiprocessors Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques, (155-164)
- Acacio M, González J, García J and Duato J Owner prediction for accelerating cache-to-cache transfer misses in a cc-NUMA architecture Proceedings of the 2002 ACM/IEEE conference on Supercomputing, (1-12)
- Radović Z and Hagersten E Efficient synchronization for nonuniform communication architectures Proceedings of the 2002 ACM/IEEE conference on Supercomputing, (1-13)
- Vetter J and Yoo A An empirical performance evaluation of scalable scientific applications Proceedings of the 2002 ACM/IEEE conference on Supercomputing, (1-18)
- van der Steen A and Dongarra J Overview of high performance computers Handbook of massive data sets, (791-852)
- Baydal E, López P and Duato J Increasing the adaptivity of routing algorithms for k-ary n-cubes Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing, (455-462)
- Acacio M, González J, García J and Duato J Reducing the latency of L2 misses in shared-memory multiprocessors through on-chip directory integration Proceedings of the 10th Euromicro conference on Parallel, distributed and network-based processing, (368-375)
- Povitsky A (2002). Parallel ADI solver based on processor scheduling, Applied Mathematics and Computation, 133:1, (43-81), Online publication date: 25-Nov-2002.
- Shen Z A routing algorithm for the pyramid structures Proceedings of the 2001 ACM symposium on Applied computing, (484-488)
- Pressel D Fundamental limitations on the use of prefetching and stream buffers for scientific applications Proceedings of the 2001 ACM symposium on Applied computing, (554-559)
- Nikolopoulos D, Ayguadé E, Papatheodorou T, Polychronopoulos C and Labarta J The trade-off between implicit and explicit data distribution in shared-memory programming paradigms Proceedings of the 15th international conference on Supercomputing, (23-37)
- Tang H and Yang T Optimizing threaded MPI execution on SMP clusters Proceedings of the 15th international conference on Supercomputing, (381-392)
- Shuf Y, Serrano M, Gupta M and Singh J Characterizing the memory behavior of Java workloads Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems, (194-205)
- Dutot P and Trystram D Scheduling on hierarchical clusters using malleable tasks Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures, (199-208)
- Vetter J and McCracken M Statistical scalability analysis of communication operations in distributed applications Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming, (123-132)
- Shuf Y, Serrano M, Gupta M and Singh J (2001). Characterizing the memory behavior of Java workloads, ACM SIGMETRICS Performance Evaluation Review, 29:1, (194-205), Online publication date: 1-Jun-2001.
- Meftali S, Gharsalli F, Rousseau F and Jerraya A An optimal memory allocation for application-specific multiprocessor system-on-chip Proceedings of the 14th international symposium on Systems synthesis, (19-24)
- Aslot V and Eigenmann R (2001). Performance characteristics of the SPEC OMP2001 benchmarks, ACM SIGARCH Computer Architecture News, 29:5, (31-40), Online publication date: 1-Dec-2001.
- Vetter J and McCracken M (2001). Statistical scalability analysis of communication operations in distributed applications, ACM SIGPLAN Notices, 36:7, (123-132), Online publication date: 1-Jul-2001.
- Charlesworth A The sun fireplane system interconnect Proceedings of the 2001 ACM/IEEE conference on Supercomputing, (7-7)
- Banikazemi M, Govindaraju R, Blackmore R and Panda D (2001). MPI-LAPI, IEEE Transactions on Parallel and Distributed Systems, 12:10, (1081-1093), Online publication date: 1-Oct-2001.
- Xu C and Chaudhary V (2001). Time Stamp Algorithms for Runtime Parallelization of DOACROSS Loops with Dynamic Dependences, IEEE Transactions on Parallel and Distributed Systems, 12:5, (433-450), Online publication date: 1-May-2001.
- Vaidya A, Sivasubramaniam A and Das C (2001). Impact of Virtual Channels and Adaptive Routing on Application Performance, IEEE Transactions on Parallel and Distributed Systems, 12:2, (223-237), Online publication date: 1-Feb-2001.
- Li T and John L (2001). ADir_pNB, IEEE Transactions on Computers, 50:9, (921-934), Online publication date: 1-Sep-2001.
- Nikolopoulos D and Papatheodorou T (2001). The Architectural and Operating System Implications on the Performance of Synchronization on ccNUMA Multiprocessors, International Journal of Parallel Programming, 29:3, (249-282), Online publication date: 1-Jun-2001.
- Hsiao H and King C (2001). An Application-Driven Study of Multicast Communication for Write Invalidation, The Journal of Supercomputing, 18:3, (279-304), Online publication date: 1-Mar-2001.
- Hsiao H and King C (2001). Exploiting Network Locality for CC-NUMA Multiprocessors, The Journal of Supercomputing, 18:1, (63-87), Online publication date: 1-Jan-2001.
- Brock B, Carpenter G, Chiprout E, Dean M, De Backer P, Elnozahy E, Franke H, Giampapa M, Glasco D, Peterson J, Rajamony R, Ravindran R, Rawson F, Rockhold R and Rubio J (2001). Experience with building a commodity intel-based ccNUMA system, IBM Journal of Research and Development, 45:2, (207-227), Online publication date: 1-Mar-2001.
- Baghdadi A, Lyonnard D, Zergainoh N and Jerraya A An efficient architecture model for systematic design of application-specific multiprocessor SoC Proceedings of the conference on Design, automation and test in Europe, (55-63)
- Martin M, Sorin D, Cain H, Hill M and Lipasti M Correctly implementing value prediction in microprocessors that support multithreading or multiprocessing Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture, (328-337)
- Moh S, Yu C, Lee B, Youn H, Han D and Lee D (2001). Four-Ary Tree-Based Barrier Synchronization for 2D Meshes without Nonmember Involvement, IEEE Transactions on Computers, 50:8, (811-823), Online publication date: 1-Aug-2001.
- Min R and Hu Y (2001). Improving Performance of Large Physically Indexed Caches by Decoupling Memory Addresses from Cache Addresses, IEEE Transactions on Computers, 50:11, (1191-1201), Online publication date: 1-Nov-2001.
- Beaumont O, Boudet V and Petitet A (2001). A Proposal for a Heterogeneous Cluster ScaLAPACK (Dense Linear Solvers), IEEE Transactions on Computers, 50:10, (1052-1070), Online publication date: 1-Oct-2001.
- Lyonnard D, Yoo S, Baghdadi A and Jerraya A Automatic generation of application-specific architectures for heterogeneous multiprocessor system-on-chip Proceedings of the 38th annual Design Automation Conference, (518-523)
- Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K (2000). System architecture directions for networked sensors, ACM SIGPLAN Notices, 35:11, (93-104), Online publication date: 1-Nov-2000.
- Sánchez J and González A Modulo scheduling for a fully-distributed clustered VLIW architecture Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture, (124-133)
- Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K System architecture directions for networked sensors Proceedings of the ninth international conference on Architectural support for programming languages and operating systems, (93-104)
- Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K (2000). System architecture directions for networked sensors, ACM SIGARCH Computer Architecture News, 28:5, (93-104), Online publication date: 1-Dec-2000.
- Hill J, Szewczyk R, Woo A, Hollar S, Culler D and Pister K (2000). System architecture directions for networked sensors, ACM SIGOPS Operating Systems Review, 34:5, (93-104), Online publication date: 1-Dec-2000.
- Vishkin D and Vishkin U (2000). Experiments with list ranking for explicit multi-threaded (XMT) instruction parallelism, ACM Journal of Experimental Algorithmics, 5, (10-es), Online publication date: 31-Dec-2001.
- Vishkin U A no-busy-wait balanced tree parallel algorithmic paradigm Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures, (147-155)
- Nikolopoulos D, Papatheodorou T, Polychronopoulos C, Labarta J and Ayguadé E A case for user-level dynamic page migration Proceedings of the 14th international conference on Supercomputing, (119-130)
- Manjikian N Enhancements and applications of the SimpleScalar simulator for undergraduate and graduate computer architecture education Proceedings of the 2000 workshop on Computer architecture education, (8-es)
- Fleury M, Downton A and Clark A (2000). Performance Metrics for Embedded Parallel Pipelines, IEEE Transactions on Parallel and Distributed Systems, 11:11, (1164-1185), Online publication date: 1-Nov-2000.
- Prieto M, Llorente I and Tirado F (2000). Data Locality Exploitation in the Decomposition of Regular Domain Problems, IEEE Transactions on Parallel and Distributed Systems, 11:11, (1141-1150), Online publication date: 1-Nov-2000.
- Milenkovic A (2000). Achieving High Performance in Bus-Based Shared-Memory Multiprocessors, IEEE Concurrency, 8:3, (36-44), Online publication date: 1-Jul-2000.
- Rauber T and Rünger G (2000). A Transformation Approach to Derive Efficient Parallel Implementations, IEEE Transactions on Software Engineering, 26:4, (315-339), Online publication date: 1-Apr-2000.
- Gao G and Sarkar V (2000). Location Consistency-A New Memory Model and Cache Consistency Protocol, IEEE Transactions on Computers, 49:8, (798-813), Online publication date: 1-Aug-2000.
- Nikolopoulos D, Papatheodorou T, Polychronopoulos C, Labarta J and Ayguadé E (2000). A transparent runtime data distribution engine for OpenMP, Scientific Programming, 8:3, (143-162), Online publication date: 1-Aug-2000.
- Acquaviva J and Jalby W Hardware prediction for data coherency of scientific codes on DSM Proceedings of the 2000 ACM/IEEE conference on Supercomputing, (41-es)
- Nikolopoulos D, Papatheodorou T, Polychronopoulos C, Labarta J and Ayguadé E Is data distribution necessary in OpenMP? Proceedings of the 2000 ACM/IEEE conference on Supercomputing, (47-es)
- Hsiao H and King C The Thread-Based Protocol Engines for CC-NUMA Multiprocessors Proceedings of the 2000 International Conference on Parallel Processing
- Tang H, Shen K and Yang T (2000). Program transformation and runtime support for threaded MPI execution on shared-memory machines, ACM Transactions on Programming Languages and Systems, 22:4, (673-700), Online publication date: 1-Jul-2000.
- Bagrodia R, Deelman E, Docy S and Phan T (1999). Performance prediction of large parallel applications using parallel simulations, ACM SIGPLAN Notices, 34:8, (151-162), Online publication date: 1-Aug-1999.
- Tang H, Shen K and Yang T (1999). Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines, ACM SIGPLAN Notices, 34:8, (107-118), Online publication date: 1-Aug-1999.
- Chatterjee S, Lebeck A, Patnala P and Thottethodi M Recursive array layouts and fast parallel matrix multiplication Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, (222-231)
- Jeong J and Dubois M Optimal replacements in caches with two miss costs Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, (155-164)
- Hill M, Condon A, Plakal M and Sorin D A system-level specification framework for I/O architectures Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures, (138-147)
- Bagrodia R, Deelman E, Docy S and Phan T Performance prediction of large parallel applications using parallel simulations Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, (151-162)
- Tang H, Shen K and Yang T Compile/run-time support for threaded MPI execution on multiprogrammed shared memory machines Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming, (107-118)
- Sodan A and Torra V Hierarchical fuzzy configuration of implementation strategies Proceedings of the 1999 ACM symposium on Applied computing, (250-259)
- Giorgi R and Prete C (1999). PSCR, IEEE Transactions on Parallel and Distributed Systems, 10:7, (742-763), Online publication date: 1-Jul-1999.
- Dai D and Panda D (1999). Exploiting the Benefits of Multiple-Path Network in DSM Systems, IEEE Transactions on Computers, 48:2, (236-244), Online publication date: 1-Feb-1999.
- Kwak H, Lee B, Hurson A, Yoon S and Hahn W (1999). Effects of Multithreading on Cache Performance, IEEE Transactions on Computers, 48:2, (176-184), Online publication date: 1-Feb-1999.
- Messina P, Culler D, Pfeiffer W, Martin W, Oden J and Smith G (1998). Architecture, Communications of the ACM, 41:11, (36-44), Online publication date: 1-Nov-1998.
- Abandah G and Davidson E (1998). Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance, ACM SIGARCH Computer Architecture News, 26:3, (318-329), Online publication date: 1-Jun-1998.
- Keeton K, Patterson D, He Y, Raphael R and Baker W (1998). Performance characterization of a Quad Pentium Pro SMP using OLTP workloads, ACM SIGARCH Computer Architecture News, 26:3, (15-26), Online publication date: 1-Jun-1998.
- Abandah G and Davidson E Effects of architectural and technological advances on the HP/Convex Exemplar's memory and communication performance Proceedings of the 25th annual international symposium on Computer architecture, (318-329)
- Keeton K, Patterson D, He Y, Raphael R and Baker W Performance characterization of a Quad Pentium Pro SMP using OLTP workloads Proceedings of the 25th annual international symposium on Computer architecture, (15-26)
- Pinkston T and Beerel P Computer engineering using innovative instructional technologies at the University of Southern California Proceedings of the 1998 workshop on Computer architecture education, (27-es)
- Abandah G and Davidson E (1998). Characterizing Distributed Shared Memory Performance, IEEE Transactions on Parallel and Distributed Systems, 9:2, (206-216), Online publication date: 1-Feb-1998.
- Lee J and Jhon C Reducing coherence overhead of barrier synchronization in software DSMs Proceedings of the 1998 ACM/IEEE conference on Supercomputing, (1-18)
- Dubois M, Jeong J, Song Y and Moga A (1998). Rapid Hardware Prototyping on RPM-2, IEEE Design & Test, 15:3, (112-118), Online publication date: 1-Jul-1998.
- Shi W, Hu W and Tang Z (1997). An interaction of coherence protocols and memory consistency models in DSM systems, ACM SIGOPS Operating Systems Review, 31:4, (41-54), Online publication date: 1-Oct-1997.
- Vishkin U From algorithm parallelism to instruction-level parallelism Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures, (260-271)
Recommendations
A universal parallel computer architecture
Abstract: Advances in interconnection network performance and interprocessor interaction mechanisms enable the construction of fine-grain parallel computers in which the nodes are physically small and have a small amount of memory. This class of machines ...
Architecture of the VPP500 parallel supercomputer
Supercomputing '94: Proceedings of the 1994 ACM/IEEE conference on Supercomputing
The VPP500 vector parallel processor is a highly parallel, distributed memory supercomputer that has a performance range of 6.4 to 355 gigaFLOPS and a main memory capacity from 1 to 222 gigabytes. The system scalably supports between 4 and 222 ...