Computer Architecture: A Quantitative Approach
Publisher: Morgan Kaufmann Publishers Inc., 340 Pine Street, Sixth Floor, San Francisco, CA, United States
ISBN: 978-1-55860-724-8
Published: 01 May 2003
Pages: 1100
Abstract

From the Book:

I am very lucky to have studied computer architecture under Prof. David Patterson at U.C. Berkeley more than 20 years ago. I enjoyed the courses I took from him, in the early days of RISC architecture. Since leaving Berkeley to help found Sun Microsystems, I have used the ideas from his courses and many more that are described in this important book.

The good news today is that this book covers incredibly important and contemporary material. The further good news is that much exciting and challenging work remains to be done, and that working from Computer Architecture: A Quantitative Approach is a great way to start.

The most successful architectural projects that I have been involved in have always started from simple ideas, with advantages explainable using simple numerical models derived from hunches and rules of thumb. The continuing rapid advances in computing technology and new applications ensure that we will need new similarly simple models to understand what is possible in the future, and that new classes of applications will stress systems in different and interesting ways.

The quantitative approach introduced in Chapter 1 is essential to understanding these issues. In particular, we expect to see, in the near future, much more emphasis on minimizing power to meet the demands of a given application, across all sizes of systems; much remains to be learned in this area.

I have worked with many different instruction sets in my career. I first programmed a PDP-8, whose instruction set was so simple that a friend easily learned to disassemble programs just by glancing at the hole punches in paper tape! I wrote a lot of assembler code for the PDP-11, including an interpreter for the Pascal programming language, and for the VAX (which was used as an example in the first edition of this book); the success of the VAX led to the widespread use of UNIX on the early Internet.

Cited By

  1. San Juan P, Rodríguez-Sánchez R, Igual F, Alonso-Jordá P and Quintana-Ortí E (2021). Low precision matrix multiplication for efficient deep learning in NVIDIA Carmel processors, The Journal of Supercomputing, 77:10, (11257-11269), Online publication date: 1-Oct-2021.
  2. Haverkort B Performance Evaluation: Model-Driven or Problem-Driven? Quantitative Evaluation of Systems, (3-11)
  3. Lyu Y, Qin X, Chen M and Mishra P (2018). Directed Test Generation for Validation of Cache Coherence Protocols, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38:1, (163-176), Online publication date: 1-Jan-2019.
  4. ACM
    Eisl J, Leopoldseder D and Mössenböck H Parallel trace register allocation Proceedings of the 15th International Conference on Managed Languages & Runtimes, (1-7)
  5. Tomar G and George M (2018). Modified Binary Multiplier Architecture to Achieve Reduced Latency and Hardware Utilization, Wireless Personal Communications: An International Journal, 98:4, (3549-3561), Online publication date: 1-Feb-2018.
  6. Chalios C, Georgakoudis G, Tovletoglou K, Karakonstantis G, Vandierendonck H and Nikolopoulos D (2018). DARE, International Journal of High Performance Computing Applications, 32:1, (74-88), Online publication date: 1-Jan-2018.
  7. Li M, Tanimura Y and Nakada H A Quantitative Analysis on Required Network Bandwidth for Large-Scale Parallel Machine Learning Machine Learning, Optimization, and Big Data, (389-400)
  8. ACM
    Wong H, Betz V and Rose J (2016). Microarchitecture and Circuits for a 200 MHz Out-of-Order Soft Processor Memory System, ACM Transactions on Reconfigurable Technology and Systems, 10:1, (1-22), Online publication date: 28-Dec-2016.
  9. Dalui M and Sikdar B (2016). A Cache System Design for CMPs with Built-In Coherence Verification, VLSI Design, 2016, (2), Online publication date: 1-Oct-2016.
  10. ACM
    Low T, Igual F, Smith T and Quintana-Orti E (2016). Analytical Modeling Is Enough for High-Performance BLIS, ACM Transactions on Mathematical Software, 43:2, (1-18), Online publication date: 2-Sep-2016.
  11. Terzopoulos G and Karatza H (2016). Power-aware Bag-of-Tasks scheduling on heterogeneous platforms, Cluster Computing, 19:2, (615-631), Online publication date: 1-Jun-2016.
  12. ACM
    Furbach F, Meyer R, Schneider K and Senftleben M (2015). Memory-Model-Aware Testing, ACM Transactions on Embedded Computing Systems, 14:4, (1-25), Online publication date: 8-Dec-2015.
  13. ACM
    Mariani G, Anghel A, Jongerius R and Dittmann G Scaling application properties to exascale Proceedings of the 12th ACM International Conference on Computing Frontiers, (1-8)
  14. ACM
    Martins P and Sousa L Stretching the limits of Programmable Embedded Devices for Public-key Cryptography Proceedings of the Second Workshop on Cryptography and Security in Computing Systems, (19-24)
  15. ACM
    Tong X, Koju T, Kawahito M and Moshovos A (2015). Optimizing Memory Translation Emulation in Full System Emulators, ACM Transactions on Architecture and Code Optimization, 11:4, (1-24), Online publication date: 9-Jan-2015.
  16. ACM
    Ding W, Kandemir M, Guttman D, Jog A, Das C and Yedlapalli P Trading cache hit rate for memory performance Proceedings of the 23rd international conference on Parallel architectures and compilation, (357-368)
  17. ACM
    Harris T and Fraser K (2014). Language support for lightweight transactions, ACM SIGPLAN Notices, 49:4S, (64-78), Online publication date: 1-Jul-2014.
  18. ACM
    Ding W and Kandemir M (2014). CApRI, ACM SIGMETRICS Performance Evaluation Review, 42:1, (477-489), Online publication date: 20-Jun-2014.
  19. ACM
    Ding W and Kandemir M CApRI The 2014 ACM international conference on Measurement and modeling of computer systems, (477-489)
  20. ACM
    Schneider J, Peddersen J and Parameswaran S MASH{fifo} Proceedings of the 51st Annual Design Automation Conference, (1-6)
  21. ACM
    Chen T, Chen Y, Guo Q, Zhou Z, Li L and Xu Z (2014). Effective and efficient microprocessor design space exploration using unlabeled design configurations, ACM Transactions on Intelligent Systems and Technology, 5:1, (1-18), Online publication date: 1-Dec-2013.
  22. ACM
    Vandierendonck H, Tzenakis G and Nikolopoulos D (2013). Analysis of dependence tracking algorithms for task dataflow execution, ACM Transactions on Architecture and Code Optimization, 10:4, (1-24), Online publication date: 1-Dec-2013.
  23. ACM
    Baldwin D, Walker H and Henderson P (2013). The roles of mathematics in computer science, ACM Inroads, 4:4, (74-80), Online publication date: 1-Dec-2013.
  24. ACM
    Plavec F, Vranesic Z and Brown S (2013). Exploiting Task- and Data-Level Parallelism in Streaming Applications Implemented in FPGAs, ACM Transactions on Reconfigurable Technology and Systems (TRETS), 6:4, (1-37), Online publication date: 1-Dec-2013.
  25. ACM
    Yuan X, Mahapatra S, Nienaber W, Pakin S and Lang M A new routing scheme for Jellyfish and its performance with HPC workloads Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  26. ACM
    Zhang G, Zheng W and Li K (2013). Design and Evaluation of a New Approach to RAID-0 Scaling, ACM Transactions on Storage, 9:4, (1-31), Online publication date: 1-Nov-2013.
  27. Clemons J, Pellegrini A, Savarese S and Austin T EVA Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, (1-10)
  28. ACM
    Serrano M Trace construction using enhanced performance monitoring Proceedings of the ACM International Conference on Computing Frontiers, (1-10)
  29. ACM
    Palem K and Lingamneni A (2013). Ten Years of Building Broken Chips, ACM Transactions on Embedded Computing Systems (TECS), 12:2s, (1-23), Online publication date: 1-May-2013.
  30. ACM
    Schuurman D Step-by-step design and simulation of a simple CPU architecture Proceeding of the 44th ACM technical symposium on Computer science education, (335-340)
  31. Cheng C (2013). Design example of useful memory latency for developing a hazard preventive pipeline high-performance embedded-microprocessor, VLSI Design, 2013, (6-6), Online publication date: 1-Jan-2013.
  32. ACM
    Coelho F and Irigoin F (2013). API compilation for image hardware accelerators, ACM Transactions on Architecture and Code Optimization (TACO), 9:4, (1-25), Online publication date: 1-Jan-2013.
  33. ACM
    Oboril F, Firouzi F, Kiamehr S and Tahoori M Reducing NBTI-induced processor wearout by exploiting the timing slack of instructions Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (443-452)
  34. ACM
    Thielmann B, Huthmann J and Koch A (2012). Memory Latency Hiding by Load Value Speculation for Reconfigurable Computers, ACM Transactions on Reconfigurable Technology and Systems, 5:3, (1-14), Online publication date: 1-Oct-2012.
  35. Chiu J, Yang K and Wong C Analytical modeling for multi-transaction bus on distributed systems Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part II, (1-9)
  36. ACM
    Arora D, Aaraj N, Raghunathan A and Jha N (2012). INVISIOS, ACM Transactions on Embedded Computing Systems (TECS), 11:3, (1-20), Online publication date: 1-Sep-2012.
  37. ACM
    Sau S, Paul R, Biswas T and Chakrabarti A A novel AES-256 implementation on FPGA using co-processor based architecture Proceedings of the International Conference on Advances in Computing, Communications and Informatics, (632-638)
  38. ACM
    Wang W, Mishra P and Gordon-Ross A (2012). Dynamic Cache Reconfiguration for Soft Real-Time Systems, ACM Transactions on Embedded Computing Systems (TECS), 11:2, (1-31), Online publication date: 1-Jul-2012.
  39. ACM
    Hefeeda M, Gao F and Abd-Almageed W Distributed approximate spectral clustering for large-scale datasets Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing, (223-234)
  40. ACM
    Tartara M and Crespi Reghizzi S Parallel iterative compilation Proceedings of third international workshop on MapReduce and its Applications Date, (33-40)
  41. Firouzi F, Kiamehr S and Tahoori M NBTI mitigation by optimized NOP assignment and insertion Proceedings of the Conference on Design, Automation and Test in Europe, (218-223)
  42. Qin X and Mishra P Automated generation of directed tests for transition coverage in cache coherence protocols Proceedings of the Conference on Design, Automation and Test in Europe, (3-8)
  43. ACM
    Monteiro P, Monteiro M and Pingali K Parallelizing irregular algorithms Proceedings of the 18th Conference on Pattern Languages of Programs, (1-18)
  44. Park S, Kim S, Lee D, Kim J, Griffin W and Roy K Column-selection-enabled 8T SRAM array with ~1R/1W multi-port operation for DVFS-enabled processors Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design, (303-308)
  45. Kontkanen J, Tabellion E and Overbeck R Coherent out-of-core point-based global illumination Proceedings of the Twenty-second Eurographics conference on Rendering, (1353-1360)
  46. Grund D, Reineke J and Gebhard G (2011). Branch target buffers, Journal of Systems Architecture: the EUROMICRO Journal, 57:6, (625-637), Online publication date: 1-Jun-2011.
  47. Rajamony R, Arimilli L and Gildea K (2019). PERCS, IBM Journal of Research and Development, 55:3, (233-244), Online publication date: 1-May-2011.
  48. ACM
    Kim H, Ghoshal P, Grot B, Gratz P and Jiménez D Reducing Network-on-Chip energy consumption through spatial locality speculation Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip, (233-240)
  49. ACM
    Perks O, Hammond S, Pennycook S and Jarvis S (2011). Should we worry about memory loss?, ACM SIGMETRICS Performance Evaluation Review, 38:4, (69-74), Online publication date: 29-Mar-2011.
  50. Zheng W and Zhang G FastScale Proceedings of the 9th USENIX conference on File and stroage technologies, (11-11)
  51. ACM
    Park J, Balfour J and Dally W Fine-grain dynamic instruction placement for L0 scratch-pad memory Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, (137-146)
  52. ACM
    Kelm J, Johnson M, Lumettta S and Patel S WAYPOINT Proceedings of the 19th international conference on Parallel architectures and compilation techniques, (99-110)
  53. Sciampacone R, Sundaresan V, Maier D and Gray-Donald T (2010). Exploitation of multicore systems in a java virtual machine, IBM Journal of Research and Development, 54:5, (445-455), Online publication date: 1-Sep-2010.
  54. ACM
    Lee B and Brooks D (2010). Applied inference, ACM Transactions on Architecture and Code Optimization (TACO), 7:2, (1-37), Online publication date: 1-Sep-2010.
  55. ACM
    Zhang L, Speight E, Rajamony R and Lin J Enigma Proceedings of the 24th ACM International Conference on Supercomputing, (159-168)
  56. ACM
    Monteiro P and Monteiro M A pattern language for parallelizing irregular algorithms Proceedings of the 2010 Workshop on Parallel Programming Patterns, (1-14)
  57. Kranenburg T and van Leuken R MB-LITE Proceedings of the Conference on Design, Automation and Test in Europe, (997-1000)
  58. Chen M, Qin X and Mishra P Efficient decision ordering techniques for SAT-based test generation Proceedings of the Conference on Design, Automation and Test in Europe, (490-495)
  59. ACM
    Askitis N and Zobel J (2011). Redesigning the string hash table, burst trie, and BST to exploit cache, Journal of Experimental Algorithmics (JEA), 15, (1.1-1.61), Online publication date: 1-Mar-2010.
  60. Pirzadeh H, Dubé D and Hamou-Lhadj A An extended proof-carrying code framework for security enforcement Transactions on computational science XI, (249-269)
  61. Silvestri F (2018). Mining Query Logs, Foundations and Trends in Information Retrieval, 4:1—2, (1-174), Online publication date: 1-Jan-2010.
  62. ACM
    Dubois D, Dubois A, Boorman T, Connor C and Poole S (2010). Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application, ACM Transactions on Reconfigurable Technology and Systems (TRETS), 3:1, (1-31), Online publication date: 1-Jan-2010.
  63. Chiu J and Yang K (2010). A Novel instruction stream buffer for VLIW architectures, Computers and Electrical Engineering, 36:1, (190-198), Online publication date: 1-Jan-2010.
  64. ACM
    Belter G, Jessup E, Karlin I and Siek J Automating the generation of composed linear algebra kernels Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, (1-12)
  65. Molnos A, Cotofana S, Heijligers M and Eijndhoven J (2009). Compositional, Dynamic Cache Management for Embedded Chip Multiprocessors, Journal of Signal Processing Systems, 57:2, (155-172), Online publication date: 1-Nov-2009.
  66. Olschanowsky C, Tikir M, Carrington L and Snavely A PSnAP Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing, (353-367)
  67. ACM
    Nikolov A (2010). Queuing theoretic model for a multiprocessor with private caches and shared memory, ACM SIGARCH Computer Architecture News, 37:4, (35-44), Online publication date: 27-Sep-2009.
  68. Scandolo L, Kunz C and Hermenegildo M Program parallelization using synchronized pipelining Proceedings of the 19th international conference on Logic-Based Program Synthesis and Transformation, (173-187)
  69. ACM
    Gopalakrishnan G, Yang Y, Vakkalanka S, Vo A, Aananthakrishnan S, Szubzda G, Sawaya G, Williams J, Sharma S, DeLisi M and Atzeni S Some resources for teaching concurrency Proceedings of the 7th Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging, (1-6)
  70. ACM
    Koo H and Mishra P (2009). Functional test generation using design and property decomposition techniques, ACM Transactions on Embedded Computing Systems, 8:4, (1-33), Online publication date: 1-Jul-2009.
  71. Wilhelm R, Grund D, Reineke J, Schlickling M, Pister M and Ferdinand C (2009). Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28:7, (966-978), Online publication date: 1-Jul-2009.
  72. Andryc K, Tessier R and Kelly P An Interactive Approach to Timing Accurate PCI-X Simulation Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping, (181-187)
  73. ACM
    Prokopov S and Tyanev D Hardware implementation of strategies for servicing queues Proceedings of the International Conference on Computer Systems and Technologies and Workshop for PhD Students in Computing, (1-8)
  74. Casteres J and Ramaherirariny T Aircraft integration real-time simulator modeling with AADL for architecture tradeoffs Proceedings of the Conference on Design, Automation and Test in Europe, (346-351)
  75. ACM
    Gupta A, Kim Y and Urgaonkar B DFTL Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, (229-240)
  76. ACM
    Gupta A, Kim Y and Urgaonkar B (2009). DFTL, ACM SIGARCH Computer Architecture News, 37:1, (229-240), Online publication date: 1-Mar-2009.
  77. ACM
    Gupta A, Kim Y and Urgaonkar B (2009). DFTL, ACM SIGPLAN Notices, 44:3, (229-240), Online publication date: 28-Feb-2009.
  78. ACM
    Quintana-Ortí G, Igual F, Quintana-Ortí E and van de Geijn R (2009). Solving dense linear systems on platforms with multiple hardware accelerators, ACM SIGPLAN Notices, 44:4, (121-130), Online publication date: 14-Feb-2009.
  79. ACM
    Quintana-Ortí G, Igual F, Quintana-Ortí E and van de Geijn R Solving dense linear systems on platforms with multiple hardware accelerators Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, (121-130)
  80. Kim H and Oh H (2009). A DSP-enhanced 32-bit embedded microprocessor, Journal of Embedded Computing, 3:1, (19-28), Online publication date: 1-Jan-2009.
  81. Zipf P, Sassatelli G, Utlu N, Saint-Jean N, Benoit P and Glesner M (2009). A decentralised task mapping approach for homogeneous multiprocessor network-on-chips, International Journal of Reconfigurable Computing, 2009, (1-14), Online publication date: 1-Jan-2009.
  82. ACM
    Eisley N, Peh L and Shang L Leveraging on-chip networks for data cache migration in chip multiprocessors Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (197-207)
  83. ACM
    Romanescu B and Sorin D Core cannibalization architecture Proceedings of the 17th international conference on Parallel architectures and compilation techniques, (43-51)
  84. ACM
    Lickly B, Liu I, Kim S, Patel H, Edwards S and Lee E Predictable programming on a precision timed architecture Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems, (137-146)
  85. Merniz S and Benmohammed M Modelling and verification of superscalar Micro-architectures functional approach Proceedings of the 12th WSEAS international conference on Computers, (446-451)
  86. ACM
    Park C, Cheon W, Kang J, Roh K, Cho W and Kim J (2008). A reconfigurable FTL (flash translation layer) architecture for NAND flash-based applications, ACM Transactions on Embedded Computing Systems, 7:4, (1-23), Online publication date: 1-Jul-2008.
  87. ACM
    Mishra P and Dutt N (2008). Specification-driven directed test generation for validation of pipelined processors, ACM Transactions on Design Automation of Electronic Systems (TODAES), 13:3, (1-36), Online publication date: 1-Jul-2008.
  88. Choudhury A, Potter K and Parker S Interactive visualization for memory reference traces Proceedings of the 10th Joint Eurographics / IEEE - VGTC conference on Visualization, (815-822)
  89. ACM
    Pratas F, Gaydadjiev G, Berekovic M, Sousa L and Kaxiras S Low power microarchitecture with instruction reuse Proceedings of the 5th conference on Computing frontiers, (149-158)
  90. ACM
    Qi Z and Stan M NBTI resilient circuits using adaptive body biasing Proceedings of the 18th ACM Great Lakes symposium on VLSI, (285-290)
  91. Zang C, Imai S, Frank S and Kimura S (2018). Issue Mechanism for Embedded Simultaneous Multithreading Processor, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E91-A:4, (1092-1100), Online publication date: 1-Apr-2008.
  92. Bergstra J and Middelburg C (2008). Maurer computers for pipelined instruction processing†, Mathematical Structures in Computer Science, 18:2, (373-409), Online publication date: 1-Apr-2008.
  93. Shestak V, Chong E, Siegel H, Maciejewski A, Benmohamed L, Wang I and Daley R (2008). A hybrid Branch-and-Bound and evolutionary approach for allocating strings of applications to heterogeneous distributed computing systems, Journal of Parallel and Distributed Computing, 68:4, (410-426), Online publication date: 1-Apr-2008.
  94. ACM
    Moonen A, Bekooij M, van den Berg R and van Meerbergen J Cache aware mapping of streaming applications on a multiprocessor system-on-chip Proceedings of the conference on Design, automation and test in Europe, (300-305)
  95. He R and Delgado-Frias J (2007). Fault Tolerant Interleaved Switching Fabrics For Scalable High-Performance Routers, IEEE Transactions on Parallel and Distributed Systems, 18:12, (1727-1739), Online publication date: 1-Dec-2007.
  96. Liu Y, Chen P, Xie G, Liu G and Li Z Evaluating a low-power dual-core architecture Proceedings of the 7th international conference on Advanced parallel processing technologies, (80-89)
  97. Liu Y, Chen P, Xie G, Liu G and Li Z Evaluating a Low-Power Dual-Core Architecture Advanced Parallel Processing Technologies, (80-89)
  98. Moreno L, Gonzalez C, Castilla I, Gonzalez E and Sigut J (2007). Applying a constructivist and collaborative methodological approach in engineering education, Computers & Education, 49:3, (891-915), Online publication date: 1-Nov-2007.
  99. ACM
    Singer J, Brown G, Watson I and Cavazos J Intelligent selection of application-specific garbage collectors Proceedings of the 6th international symposium on Memory management, (91-102)
  100. ACM
    Jayant C, Renzelmann M, Wen D, Krisnandi S, Ladner R and Comden D Automated tactile graphics translation Proceedings of the 9th international ACM SIGACCESS conference on Computers and accessibility, (75-82)
  101. Yi K and Gaudiot J Architectural implications of cache coherence protocols with network applications on chip multiprocessors Proceedings of the 2007 IFIP international conference on Network and parallel computing, (394-403)
  102. Ogoubi É, Pouliot D, Turcotte M and Hafid A Parallel multiprocessor approaches to the RNA folding problem Proceedings of the 7th international conference on Parallel processing and applied mathematics, (1230-1239)
  103. Maani R and Parsa S An algorithm to improve parallelism in distributed systems using asynchronous calls Proceedings of the 7th international conference on Parallel processing and applied mathematics, (49-58)
  104. Sheaffer J, Luebke D and Skadron K A hardware redundancy and recovery mechanism for reliable scientific computation on graphics processors Proceedings of the 22nd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware, (55-64)
  105. Molnos A, Cotofana S, Heijligers M and Eijndhoven J Static Cache Partitioning Robustness Analysis for Embedded On-Chip Multi-processors Transactions on High-Performance Embedded Architectures and Compilers I, (279-297)
  106. Jääskeläinen P, Guzma V and Takala J Resource conflict detection in simulation of function unit pipelines Proceedings of the 7th international conference on Embedded computer systems: architectures, modeling, and simulation, (233-240)
  107. ACM
    Lee S, Park D, Chung T, Lee D, Park S and Song H (2007). A log buffer-based flash translation layer using fully-associative sector translation, ACM Transactions on Embedded Computing Systems (TECS), 6:3, (18-es), Online publication date: 1-Jul-2007.
  108. ACM
    Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L Optimistic parallelism requires abstractions Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation, (211-222)
  109. Eyole-Monono M, Harle R and Rose A SpotCore Proceedings of the ICST 2nd international conference on Body area networks, (1-8)
  110. ACM
    Kulkarni M, Pingali K, Walter B, Ramanarayanan G, Bala K and Chew L (2007). Optimistic parallelism requires abstractions, ACM SIGPLAN Notices, 42:6, (211-222), Online publication date: 10-Jun-2007.
  111. ACM
    Shee S and Parameswaran S Design methodology for pipelined heterogeneous multiprocessor system Proceedings of the 44th annual Design Automation Conference, (811-816)
  112. Agrawal K, Bender M and Fineman J The worst page-replacement policy Proceedings of the 4th international conference on Fun with algorithms, (135-145)
  113. ACM
    Becchi M, Franklin M and Crowley P Performance/area efficiency in chip multiprocessors with micro-caches Proceedings of the 4th international conference on Computing frontiers, (247-258)
  114. ACM
    Fraser K and Harris T (2007). Concurrent programming without locks, ACM Transactions on Computer Systems (TOCS), 25:2, (5-es), Online publication date: 1-May-2007.
  115. Poletti F, Poggiali A, Bertozzi D, Benini L, Marchal P, Loghi M and Poncino M (2007). Energy-Efficient Multiprocessor Systems-on-Chip for Embedded Computing, IEEE Transactions on Computers, 56:5, (606-621), Online publication date: 1-May-2007.
  116. Saghir M and Naous R A configurable multi-ported register file architecture for soft processor cores Proceedings of the 3rd international conference on Reconfigurable computing: architectures, tools and applications, (14-25)
  117. Dai J, Li L and Huang B Pipelined Execution of Critical Sections Using Software-Controlled Caching in Network Processors Proceedings of the International Symposium on Code Generation and Optimization, (312-324)
  118. ACM
    Zhang G, Shu J, Xue W and Zheng W (2007). SLAS, ACM Transactions on Storage, 3:1, (3-es), Online publication date: 1-Mar-2007.
  119. ACM
    Hwang Y and Li J (2007). Snug set-associative caches, ACM Transactions on Architecture and Code Optimization, 4:1, (6-es), Online publication date: 1-Mar-2007.
  120. Eisley N, Peh L and Shang L In-Network Cache Coherence Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, (321-332)
  121. ACM
    Noonan L and Flanagan C An effective network processor design framework Proceedings of the 2006 ACM/IEEE symposium on Architecture for networking and communications systems, (103-112)
  122. ACM
    Shee S, Erdos A and Parameswaran S Heterogeneous multiprocessor implementations for JPEG: Proceedings of the 4th international conference on Hardware/software codesign and system synthesis, (217-222)
  123. Scott M, Costigan N and Abdulwahab W Implementing cryptographic pairings on smartcards Proceedings of the 8th international conference on Cryptographic Hardware and Embedded Systems, (134-147)
  124. ACM
    Cai L and Lu Y Power reduction of multiple disks using dynamic cache resizing and speed control Proceedings of the 2006 international symposium on Low power electronics and design, (186-190)
  125. Larkin D, Kinane A and O'Connor N Towards hardware acceleration of neuroevolution for multimedia processing applications on mobile devices Proceedings of the 13th international conference on Neural information processing - Volume Part III, (1178-1188)
  126. ACM
    Yao X and Wang J (2006). RIMAC, ACM SIGOPS Operating Systems Review, 40:4, (249-262), Online publication date: 1-Oct-2006.
  127. Monchiero M, Palermo G, Silvano C and Villa O (2006). Efficient synchronization for embedded on-chip multiprocessors, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 14:10, (1049-1062), Online publication date: 1-Oct-2006.
  128. ACM
    Rajan K and Govindarajan R Two-level mapping based cache index selection for packet forwarding engines Proceedings of the 15th international conference on Parallel architectures and compilation techniques, (212-221)
  129. Athanasaki E, Anastopoulos N, Kourtis K and Koziris N Exploring the capacity of a modern SMT architecture to deliver high scientific application performance Proceedings of the Second international conference on High Performance Computing and Communications, (180-189)
  130. Liu Y, Furber S and Li Z The design of a dataflow coprocessor for low power embedded hierarchical processing Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation, (425-438)
  131. Keung K and Tyagi A SRAM CP Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation, (95-106)
  132. Vintan L, Gellert A, Florea A, Oancea M and Egan C Understanding prediction limits through unbiased branches Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture, (480-487)
  133. Fournier J and Tunstall M Cache based power analysis attacks on AES Proceedings of the 11th Australasian conference on Information Security and Privacy, (17-28)
  134. ACM
    Hampton M and Asanović K Implementing virtual memory in a vector processor with software restart markers Proceedings of the 20th annual international conference on Supercomputing, (135-144)
  135. ACM
    Yi J, Vandierendonck H, Eeckhout L and Lilja D The exigency of benchmark and compiler drift Proceedings of the 20th annual international conference on Supercomputing, (75-86)
  136. ACM
    Petit S, Tomás N, Sahuquillo J and Pont A An execution-driven simulation tool for teaching cache memories in introductory computer organization courses Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture, (4-es)
  137. ACM
    Smullen C and Taha T PSATSim Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture, (3-es)
  138. Chen J, Yi H, Yang X and Qian L Compile-Time energy optimization for parallel applications in on-chip multiprocessors Proceedings of the 6th international conference on Computational Science - Volume Part II, (904-911)
  139. Efremides O and Ivanov G A dynamic workload balancing technique of a text matching algorithm on a cluster Proceedings of the 5th WSEAS international conference on Telecommunications and informatics, (287-292)
  140. ACM
    Molnos A, Cotofana S, Heijligers M and van Eijndhoven J Static cache partitioning robustness analysis for embedded on-chip multi-processors Proceedings of the 3rd conference on Computing frontiers, (353-360)
  141. ACM
    Becchi M and Crowley P Dynamic thread assignment on heterogeneous multiprocessor architectures Proceedings of the 3rd conference on Computing frontiers, (29-40)
  142. Zeffer H, Radović Z and Hagersten E Exploiting locality Proceedings of the 20th international conference on Parallel and distributed processing, (33-33)
  143. Carvalho M, Góes L and Martins C Dynamically reconfigurable cache architecture using adaptive block allocation policy Proceedings of the 20th international conference on Parallel and distributed processing, (217-217)
  144. ACM
    Yao X and Wang J RIMAC Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, (249-262)
  145. Koo H and Mishra P Functional test generation using property decompositions for validation of pipelined processors Proceedings of the conference on Design, automation and test in Europe: Proceedings, (1240-1245)
  146. Monchiero M, Palermo G, Silvano C and Villa O Power/performance hardware optimization for synchronization intensive applications in MPSoCs Proceedings of the conference on Design, automation and test in Europe: Proceedings, (606-611)
  147. Molnos A, Heijligers M, Cotofana S and van Eijndhoven J Compositional, efficient caches for a chip multi-processor Proceedings of the conference on Design, automation and test in Europe: Proceedings, (345-350)
  148. Monchiero M, Palermo G, Silvano C and Villa O (2005). An efficient synchronization technique for multiprocessor systems on-chip, ACM SIGARCH Computer Architecture News, 34:1, (33-40), Online publication date: 1-Mar-2006.
  149. Naz A, Kavi K, Rezaei M and Li W (2005). Making a case for split data caches for embedded applications, ACM SIGARCH Computer Architecture News, 34:1, (19-26), Online publication date: 1-Mar-2006.
  150. Chang Y Lazy BTB Proceedings of the 2006 Asia and South Pacific Design Automation Conference, (917-922)
  151. Kim H and Oh H A DSP-Enhanced 32-bit embedded microprocessor Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing, (17-26)
  152. Li J and Martínez J (2005). Power-performance considerations of parallel computing on chip multiprocessors, ACM Transactions on Architecture and Code Optimization (TACO), 2:4, (397-422), Online publication date: 1-Dec-2005.
  153. Kim H and Oh H A low-power DSP-enhanced 32-bit EISC processor Proceedings of the First international conference on High Performance Embedded Architectures and Compilers, (302-316)
  154. Kulkarni M and Bommi J. B Assertion-Based verification for the SpaceCAKE multiprocessor – a case study Proceedings of the First Haifa international conference on Hardware and Software Verification and Testing, (43-55)
  155. Jacob P, Erdogan O, Zia A, Belemjian P, Kraft R and McDonald J (2005). Predicting the Performance of a 3D Processor-Memory Chip Stack, IEEE Design & Test, 22:6, (540-547), Online publication date: 1-Nov-2005.
  156. Fournier J and Moore S A vector approach to cryptography implementation Proceedings of the First international conference on Digital Rights Management: technologies, Issues, Challenges and Systems, (277-297)
  157. Modarressi M, Goudarzi M and Hessabi S Application-Specific hardware-driven prefetching to improve data cache performance Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture, (761-774)
  158. Kelly D and Phillips B Arithmetic data value speculation Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture, (353-366)
  159. Kim J, Wills D and Wills L Architectural enhancements for color image and video processing on embedded systems Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture, (104-117)
  160. Araiza R, Aguilera M, Pham T and Teller P Towards a cross-platform microbenchmark suite for evaluating hardware performance counter data Proceedings of the 2005 conference on Diversity in computing, (36-39)
  161. Ladner R, Ivory M, Rao R, Burgstahler S, Comden D, Hahn S, Renzelmann M, Krisnandi S, Ramasamy M, Slabosky B, Martin A, Lacenski A, Olsen S and Groce D Automating tactile graphics translation Proceedings of the 7th international ACM SIGACCESS conference on Computers and accessibility, (150-157)
  162. Li P, Deng Y and Pileggi L Temperature-Dependent Optimization of Cache Leakage Power Dissipation Proceedings of the 2005 International Conference on Computer Design, (7-12)
  163. Middha B, Simpson M and Barua R MTSS Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, (191-201)
  164. Simpson M, Middha B and Barua R Segment protection for embedded systems using run-time checks Proceedings of the 2005 international conference on Compilers, architectures and synthesis for embedded systems, (66-77)
  165. Locasto M, Sidiroglou S and Keromytis A Speculative virtual verification Proceedings of the 2005 workshop on New security paradigms, (119-124)
  166. Shee S, Parameswaran S and Cheung N Novel architecture for loop acceleration Proceedings of the 3rd IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis, (297-302)
  167. Wenzel I, Kirner R, Puschner P and Rieder B Principles of Timing Anomalies in Superscalar Processors Proceedings of the Fifth International Conference on Quality Software, (295-306)
  168. Monchiero M, Palermo G, Silvano C and Villa O An efficient synchronization technique for multiprocessor systems on-chip Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture, (33-40)
  169. Naz A, Kavi K, Rezaei M and Li W Making a case for split data caches for embedded applications Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture, (19-26)
  170. Olukotun K and Hammond L (2005). The Future of Microprocessors, Queue, 3:7, (26-29), Online publication date: 1-Sep-2005.
  171. Li J and Hwang Y Snug set-associative caches Proceedings of the 2005 international symposium on Low power electronics and design, (345-350)
  172. Pande P, Grecu C, Jones M, Ivanov A and Saleh R (2005). Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures, IEEE Transactions on Computers, 54:8, (1025-1040), Online publication date: 1-Aug-2005.
  173. Mark W and Fussell D Real-time rendering systems in 2010 ACM SIGGRAPH 2005 Courses, (19-es)
  174. Vayá G, Langerwerf J and Pirsch P RAPANUI Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation, (32-40)
  175. Shim S, Kwak J, Kim C, Jhang S and Jhon C Power-aware branch logic Proceedings of the 5th international conference on Embedded Computer Systems: architectures, Modeling, and Simulation, (162-171)
  176. A Simple Project for Teaching Instruction Set Architecture Proceedings of the Fifth IEEE International Conference on Advanced Learning Technologies, (69-71)
  177. Azimi R, Stumm M and Wisniewski R Online performance analysis by statistical sampling of microprocessor performance counters Proceedings of the 19th annual international conference on Supercomputing, (101-110)
  178. Teller J, Silio C and Jacob B Performance characteristics of MAUI Proceedings of the 2005 workshop on Memory system performance, (44-53)
  179. Ricks K, Stapleton W and Jackson D An embedded systems course and course sequence Proceedings of the 2005 workshop on Computer architecture education: held in conjunction with the 32nd International Symposium on Computer Architecture, (8-es)
  180. Srinivasan J, Adve S, Bose P and Rivers J Exploiting Structural Duplication for Lifetime Reliability Enhancement Proceedings of the 32nd annual international symposium on Computer Architecture, (520-531)
  181. Naz A, Rezaei M, Kavi K and Sweany P (2004). Improving data cache performance with integrated use of split caches, victim cache and stream buffers, ACM SIGARCH Computer Architecture News, 33:3, (41-48), Online publication date: 1-Jun-2005.
  182. Bhunia S, Datta A, Banerjee N and Roy K (2005). GAARP, IEEE Transactions on Computers, 54:6, (752-766), Online publication date: 1-Jun-2005.
  183. Srinivasan J, Adve S, Bose P and Rivers J (2005). Exploiting Structural Duplication for Lifetime Reliability Enhancement, ACM SIGARCH Computer Architecture News, 33:2, (520-531), Online publication date: 1-May-2005.
  184. Grelck C (2005). Shared memory multiprocessor support for functional array processing in SAC, Journal of Functional Programming, 15:3, (353-401), Online publication date: 1-May-2005.
  185. Shestak V, Chong E, Maciejewski A, Siegel H, Benmohamed L, Wang I and Daley R Resource Allocation for Periodic Applications in a Shipboard Environment Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 1 - Volume 02
  186. Datta A, Bhunia S, Banerjee N and Roy K A Power-Aware GALS Architecture for Real-Time Algorithm-Specific Tasks Proceedings of the 6th International Symposium on Quality of Electronic Design, (358-363)
  187. Dalton A and Norris C An experimental evaluation of a distributed Java compiler Proceedings of the 43rd annual Southeast regional conference - Volume 2, (294-299)
  188. van de Waerdt J, Slavenburg G, van Itegem J and Vassiliadis S Motion estimation performance of the TM3270 processor Proceedings of the 2005 ACM symposium on Applied computing, (850-856)
  189. Molnos A, Heijligers M, Cotofana S and Eijndhoven J Compositional Memory Systems for Multimedia Communicating Tasks Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, (932-937)
  190. Ekman M, Warg F and Nilsson J (2005). An in-depth look at computer performance growth, ACM SIGARCH Computer Architecture News, 33:1, (144-147), Online publication date: 1-Mar-2005.
  191. Crandall J and Chong F (2005). A security assessment of the minos architecture, ACM SIGARCH Computer Architecture News, 33:1, (48-57), Online publication date: 1-Mar-2005.
  192. Monchiero M, Palermo G, Sami M, Silvano C, Zaccaria V and Zafalon R (2005). Low-power branch prediction techniques for VLIW architectures, Integration, the VLSI Journal, 38:3, (515-524), Online publication date: 1-Jan-2005.
  193. van Berkel K, Heinle F, Meuwissen P, Moerman K and Weiss M (2005). Vector processing as an enabler for software-defined radio in handheld devices, EURASIP Journal on Advances in Signal Processing, 2005, (2613-2625), Online publication date: 1-Jan-2005.
  194. Song J (2005). Segment-based proxy caching for distributed cooperative media content servers, ACM SIGOPS Operating Systems Review, 39:1, (22-33), Online publication date: 1-Jan-2005.
  196. Crandall J and Chong F Minos Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture, (221-232)
  197. Thiele L and Wilhelm R (2004). Design for Timing Predictability, Real-Time Systems, 28:2-3, (157-177), Online publication date: 1-Nov-2004.
  198. Patterson D (2004). Latency lags bandwidth, Communications of the ACM, 47:10, (71-75), Online publication date: 1-Oct-2004.
  199. Naz A, Rezaei M, Kavi K and Sweany P Improving data cache performance with integrated use of split caches, victim cache and stream buffers Proceedings of the 2004 workshop on MEmory performance: DEaling with Applications, systems and architecture, (41-48)
  200. Beck A and Carro L A VLIW low power Java processor for embedded applications Proceedings of the 17th symposium on Integrated circuits and system design, (157-162)
  201. Liu Y and Furber S The design of a low power asynchronous multiplier Proceedings of the 2004 international symposium on Low power electronics and design, (301-306)
  202. Ekman M and Stenstrom P A case for multi-level main memory Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, (1-8)
  203. Kowarschik M, Christadler I and Rüde U Towards cache-optimized multigrid using patch-adaptive relaxation Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing, (901-910)
  204. Uy R, Bernardo M and Erica J DARC2 Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture, (20-es)
  205. Nakkar M Integrating research and e-learning in advanced computer architecture courses Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture, (14-es)
  206. Claver J, Castillo M and Mayo R Improving Instruction Set Architecture learning results Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture, (13-es)
  207. Kim J and Yi J Performance sensitivity of SPEC CPU2000 over operating frequency Proceedings of the 2004 international symposium on Information and communication technologies, (196-201)
  208. Zhang H, Newman T and Zhang X Case study of multithreaded in-core isosurface extraction algorithms Proceedings of the 5th Eurographics conference on Parallel Graphics and Visualization, (83-92)
  209. Mishra P, Shrivastava A and Dutt N Architecture description language (ADL)-driven software toolkit generation for architectural exploration of programmable SOCs Proceedings of the 41st annual Design Automation Conference, (626-658)
  210. Karim F, Mellan A, Nguyen A, Aydonat U and Abdelrahman T (2004). A Multilevel Computing Architecture for Embedded Multimedia Applications, IEEE Micro, 24:3, (56-66), Online publication date: 1-May-2004.
  211. Monchiero M, Palermo G, Sami M, Silvano C, Zaccaria V and Zafalon R Power-aware branch prediction techniques Proceedings of the 14th ACM Great Lakes symposium on VLSI, (440-443)
  212. Juurlink B Approximating the optimal replacement algorithm Proceedings of the 1st conference on Computing frontiers, (313-319)
  213. de Langen P and Juurlink B Reducing traffic generated by conflict misses in caches Proceedings of the 1st conference on Computing frontiers, (235-239)
  214. Galluzzi M, Puente V, Cristal A, Beivide R, Gregorio J and Valero M A first glance at Kilo-instruction based multiprocessors Proceedings of the 1st conference on Computing frontiers, (212-221)
  215. Al Na'mneh R, Pan W and Wells B Two parallel implementations for one dimension FFT on symmetric multiprocessors Proceedings of the 42nd annual Southeast regional conference, (273-278)
  216. Al-Zoubi H, Milenkovic A and Milenkovic M Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite Proceedings of the 42nd annual Southeast regional conference, (267-272)
  217. Gurumani S and Milenkovic A Execution characteristics of SPEC CPU2000 benchmarks Proceedings of the 42nd annual Southeast regional conference, (261-266)
  218. Schindler J, Schlosser S, Shao M, Ailamaki A and Ganger G Atropos Proceedings of the 3rd USENIX conference on File and storage technologies, (12-12)
  219. Schindler J, Schlosser S, Shao M, Ailamaki A and Ganger G Atropos: A Disk Array Volume Manager for Orchestrated Use of Disks Proceedings of the 3rd USENIX Conference on File and Storage Technologies, (159-172)
  220. Molnos A, Heijligers M, Cotofana S and van Eijndhoven J Compositional Memory Systems for Data Intensive Applications Proceedings of the conference on Design, automation and test in Europe - Volume 1
  221. Breen K and Elliott D (2003). Aliasing and anti-aliasing in branch history table prediction, ACM SIGARCH Computer Architecture News, 31:5, (1-4), Online publication date: 1-Dec-2003.
  222. Verma M, Wehmeyer L and Marwedel P Efficient scratchpad allocation algorithms for energy constrained embedded systems Proceedings of the Third international conference on Power - Aware Computer Systems, (41-56)
  223. Harris T and Fraser K (2003). Language support for lightweight transactions, ACM SIGPLAN Notices, 38:11, (388-402), Online publication date: 26-Nov-2003.
  224. Kim N, Blaauw D and Mudge T Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
  225. Harris T and Fraser K Language support for lightweight transactions Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, (388-402)
  226. Liu J, Chow F, Kong T and Roy R (2003). Variable Instruction Set Architecture and Its Compiler Support, IEEE Transactions on Computers, 52:7, (881-895), Online publication date: 1-Jul-2003.
  227. Citron D MisSPECulation Proceedings of the 30th annual international symposium on Computer architecture, (52-61)
  228. Citron D (2003). MisSPECulation, ACM SIGARCH Computer Architecture News, 31:2, (52-61), Online publication date: 1-May-2003.
  229. Xin Q, Miller E, Schwarz T, Long D, Brandt S and Litwin W Reliability Mechanisms for Very Large Storage Systems Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS'03)
  230. Aspnes J (2002). Fast deterministic consensus in a noisy environment, Journal of Algorithms, 45:1, (16-39), Online publication date: 1-Oct-2002.
Contributors
  • Stanford University
  • University of California, Berkeley

Recommendations