Computer architecture: a quantitative approach
January 2002
Publisher:
  • Morgan Kaufmann Publishers Inc.
  • 340 Pine Street, Sixth Floor
  • San Francisco
  • CA
  • United States
ISBN: 978-1-55860-596-1
Published: 01 January 2002
Pages: 1096
Abstract

This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high-performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and Web technologies, and high-performance computing.

Cited By

  1. ACM
    Zhong W, Li J, Niu N and Fu F Algorithm analysis of MCU automatic trimming The 2nd International Conference on Computing and Data Science, (1-6)
  2. ACM
    Li Y, Phanishayee A, Murray D and Kim N Doing more with less Proceedings of the Workshop on Hot Topics in Operating Systems, (119-127)
  3. Fosse T, Tisi M, Bousse E, Mottu J and Sunyé G Towards platform specific energy estimation for executable domain-specific modeling languages Proceedings of the 22nd International Conference on Model Driven Engineering Languages and Systems, (314-317)
  4. ACM
    Dong X, Shen Z, Criswell J, Cox A and Dwarkadas S Spectres, virtual ghosts, and hardware support Proceedings of the 7th International Workshop on Hardware and Architectural Support for Security and Privacy, (1-9)
  5. ACM
    Kurth A, Capotondi A, Vogel P, Benini L and Marongiu A HERO Proceedings of the 2nd Workshop on AutotuniNg and aDaptivity AppRoaches for Energy efficient HPC Systems, (1-6)
  6. Multanen J, Viitanen T, Jääskeläinen P and Takala J (2018). Instruction Fetch Energy Reduction with Biased SRAMs, Journal of Signal Processing Systems, 90:11, (1519-1532), Online publication date: 1-Nov-2018.
  7. ACM
    Hammari E, Kjeldsberg P and Catthoor F (2018). Runtime Precomputation of Data-Dependent Parameters in Embedded Systems, ACM Transactions on Embedded Computing Systems, 17:3, (1-21), Online publication date: 31-May-2018.
  8. Altaf M and Wood D (2015). LogCA: A Performance Model for Hardware Accelerators, IEEE Computer Architecture Letters, 14:2, (132-135), Online publication date: 1-Jul-2015.
  9. ACM
    Childers B, Yang J and Zhang Y Achieving Yield, Density and Performance Effective DRAM at Extreme Technology Sizes Proceedings of the 2015 International Symposium on Memory Systems, (78-84)
  10. ACM
    Jurkiewicz T and Mehlhorn K (2015). On a Model of Virtual Address Translation, ACM Journal of Experimental Algorithmics, 19, (1-28), Online publication date: 3-Feb-2015.
  11. Xiang P, Yang Y, Mantor M, Rubin N and Zhou H Revisiting ILP designs for throughput-oriented GPGPU architecture Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, (121-130)
  12. ACM
    Tu C, Hsu H, Chen J, Chen C and Hung S (2014). Performance and power profiling for emulated Android systems, ACM Transactions on Design Automation of Electronic Systems, 19:2, (1-25), Online publication date: 1-Mar-2014.
  13. ACM
    Dossis M A Floating-Point Paradigm for High-level Synthesis Proceedings of the 18th Panhellenic Conference on Informatics, (1-6)
  14. Stewin P A Primitive for Revealing Stealthy Peripheral-Based Attacks on the Computing Platform's Main Memory Proceedings of the 16th International Symposium on Research in Attacks, Intrusions, and Defenses - Volume 8145, (1-20)
  15. Ge R, Feng X and Sun X SERA-IO Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), (204-211)
  16. Alvarez L, Vilanova L, Gonzalez M, Martorell X, Navarro N and Ayguade E Hardware-software coherence protocol for the coexistence of caches and local memories Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, (1-11)
  17. ACM
    Park S, Gupta S, Mojumder N, Raghunathan A and Roy K Future cache design using STT MRAMs for improved energy efficiency Proceedings of the 49th Annual Design Automation Conference, (492-497)
  18. Stewin P and Bystrov I Understanding DMA malware Proceedings of the 9th international conference on Detection of Intrusions and Malware, and Vulnerability Assessment, (21-41)
  19. Velev M and Gao P Automatic formal verification of multithreaded pipelined microprocessors Proceedings of the International Conference on Computer-Aided Design, (679-686)
  20. Thiyagalingam J, Goodman D, Schnabel J, Trefethen A and Grau V (2011). On the usage of GPUs for efficient motion estimation in medical image sequences, Journal of Biomedical Imaging, 2011, (1-15), Online publication date: 1-Jan-2011.
  21. ACM
    Gilroy M, Irvine J and Atkinson R (2011). RAID 6 Hardware Acceleration, ACM Transactions on Embedded Computing Systems, 10:4, (1-17), Online publication date: 1-Nov-2011.
  22. ACM
    Schoeberl M, Korsholm S, Kalibera T and Ravn A (2011). A Hardware Abstraction Layer in Java, ACM Transactions on Embedded Computing Systems, 10:4, (1-40), Online publication date: 1-Nov-2011.
  23. ACM
    Caparrós Cabezas V and Stanley-Marbell P Parallelism and data movement characterization of contemporary application classes Proceedings of the twenty-third annual ACM symposium on Parallelism in algorithms and architectures, (95-104)
  24. Yuan F, Wright S, Eder K and May D Managing complexity through abstraction Proceedings of the 13th international conference on Formal methods and software engineering, (585-600)
  25. Chen W, Wang Z, Dou Q and Wang Y A novel chaining approach to indirect control transfer instructions Proceedings of the IFIP WG 8.4/8.9 international cross domain conference on Availability, reliability and security for business, enterprise and health information systems, (309-320)
  26. Habgood K and Arel I Revisiting Cramer's rule for solving dense linear systems Proceedings of the 2010 Spring Simulation Multiconference, (1-8)
  27. ACM
    Torbert S, Vishkin U, Tzur R and Ellison D Is teaching parallel algorithmic thinking to high school students possible? Proceedings of the 41st ACM technical symposium on Computer science education, (290-294)
  28. Vaidyanathan N, Billionniere E and Collofello J (2010). A preliminary comparative survey of computer architecture courses across the nation's top schools, Journal of Computing Sciences in Colleges, 25:4, (193-202), Online publication date: 1-Apr-2010.
  29. ACM
    Jin Z, Pittman R and Forin A Reconfigurable custom floating-point instructions (abstract only) Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays, (287-287)
  30. Velev M and Gao P A method for debugging of pipelined processors in formal verification by correspondence checking Proceedings of the 2010 Asia and South Pacific Design Automation Conference, (619-624)
  31. ACM
    Schoeberl M, Preußer T and Uhrig S The embedded Java benchmark suite JemBench Proceedings of the 8th International Workshop on Java Technologies for Real-Time and Embedded Systems, (120-127)
  32. ACM
    Pesterev A, Zeldovich N and Morris R Locating cache performance bottlenecks using data profiling Proceedings of the 5th European conference on Computer systems, (335-348)
  33. ACM
    Cabodi G, Lavagno L, Murciano M, Kondratyev A and Watanabe Y (2010). Speeding-up heuristic allocation, scheduling and binding with SAT-based abstraction/refinement techniques, ACM Transactions on Design Automation of Electronic Systems, 15:2, (1-34), Online publication date: 1-Feb-2010.
  34. Amir A and Levy A String rearrangement metrics Algorithms and Applications, (1-33)
  35. Velev M and Gao P Method for formal verification of soft-error tolerance mechanisms in pipelined microprocessors Proceedings of the 12th international conference on Formal engineering methods and software engineering, (355-370)
  36. Amir A, Eisenberg E, Keller O, Levy A and Porat E Approximate string matching with stuck address bits Proceedings of the 17th international conference on String processing and information retrieval, (395-405)
  37. ACM
    La Fratta P and Kogge P Models for generating locality-tuned traveling threads for a hierarchical multi-level heterogeneous multicore Proceedings of the 7th ACM international conference on Computing frontiers, (227-236)
  38. ACM
    Schwartz-Narbonne D, Chan C, Mahajan Y and Malik S Supporting RTL flow compatibility in a microarchitecture-level design framework Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis, (343-352)
  39. Wang B, Yao Y, Himmelspach J, Ewald R and Uhrmacher A Experimental analysis of logical process simulation algorithms in JAMES II Winter Simulation Conference, (1167-1179)
  40. ACM
    Murase M, Shimizu K, Plouffe W and Sakamoto M Effective implementation of the cell broadband engine™ isolation loader Proceedings of the 16th ACM conference on Computer and communications security, (303-313)
  41. ACM
    Ferri B and Ferri A (2009). Reconfiguration of IIR filters in response to computer resource availability, ACM Transactions on Embedded Computing Systems, 9:1, (1-25), Online publication date: 1-Oct-2009.
  42. ACM
    El-Shobaky S, El-Mahdy A and El-Nahas A Automatic vectorization using dynamic compilation and tree pattern matching technique in Jikes RVM Proceedings of the 4th workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems, (63-69)
  43. ACM
    Bilardi G, Ekanadham K and Pattnaik P (2009). On approximating the ideal random access machine by physical machines, Journal of the ACM, 56:5, (1-57), Online publication date: 1-Aug-2009.
  44. ACM
    Sirowy S, Sheldon D, Givargis T and Vahid F (2009). Virtual microcontrollers, ACM SIGBED Review, 6:1, (1-8), Online publication date: 1-Jan-2009.
  45. ACM
    Moreto M, Cazorla F, Ramirez A, Sakellariou R and Valero M (2009). FlexDCP, ACM SIGOPS Operating Systems Review, 43:2, (86-96), Online publication date: 21-Apr-2009.
  46. ACM
    Williams S, Waterman A and Patterson D (2009). Roofline, Communications of the ACM, 52:4, (65-76), Online publication date: 1-Apr-2009.
  47. Sahoo S, Shekhar C, Kodali S, Asati A and Gupta A (2009). Dual channel addition based FFT processor architecture for signal and image processing, International Journal of High Performance Systems Architecture, 2:1, (35-45), Online publication date: 1-Dec-2009.
  48. Amir A, Aumann Y, Kapah O, Levy A and Porat E (2009). Approximate string matching with address bit errors, Theoretical Computer Science, 410:51, (5334-5346), Online publication date: 1-Nov-2009.
  49. Le G and Shi Y (2009). Access region cache with register guided memory reference partitioning, Journal of Systems Architecture: the EUROMICRO Journal, 55:10-12, (434-445), Online publication date: 1-Oct-2009.
  50. Xu L (2008). A modular approach to language engineering using XML and inexpensive robots, Journal of Computing Sciences in Colleges, 23:5, (133-141), Online publication date: 1-May-2008.
  51. Kiselyov O, Byrd W, Friedman D and Shan C Pure, declarative, and constructive arithmetic relations (declarative pearl) Proceedings of the 9th international conference on Functional and logic programming, (64-80)
  52. ACM
    Pirzadeh H and Dubé D VEP Proceedings of the 1st ACM workshop on Virtual machine security, (9-18)
  53. ACM
    Bungo J (2008). The use of compiler optimizations for embedded systems software, XRDS: Crossroads, The ACM Magazine for Students, 15:1, (8-15), Online publication date: 1-Sep-2008.
  54. ACM
    Koo H and Mishra P Specification-based compaction of directed tests for functional validation of pipelined processors Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis, (137-142)
  55. ACM
    Chowdhury R and Ramachandran V Cache-efficient dynamic programming algorithms for multicores Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures, (207-216)
  56. ACM
    Middha B, Simpson M and Barua R (2008). MTSS, ACM Transactions on Embedded Computing Systems, 7:4, (1-37), Online publication date: 1-Jul-2008.
  57. ACM
    Wang W, Wang Q, Wei W and Liu D Modeling and evaluating heterogeneous memory architectures by trace-driven simulation Proceedings of the 2008 workshop on Memory access on future processors: a solved problem?, (369-376)
  58. ACM
    He B and Luo Q (2008). Cache-oblivious databases, ACM Transactions on Database Systems, 33:2, (1-42), Online publication date: 1-Jun-2008.
  59. ACM
    Xu L (2008). Language engineering in the context of a popular, inexpensive robot platform, ACM SIGCSE Bulletin, 40:1, (43-47), Online publication date: 29-Feb-2008.
  60. ACM
    Xu L Language engineering in the context of a popular, inexpensive robot platform Proceedings of the 39th SIGCSE technical symposium on Computer science education, (43-47)
  61. Schoeberl M (2008). A Java processor architecture for embedded real-time systems, Journal of Systems Architecture: the EUROMICRO Journal, 54:1-2, (265-286), Online publication date: 1-Jan-2008.
  62. Khanli L and Analoui M (2008). An approach to grid resource selection and fault management based on ECA rules, Future Generation Computer Systems, 24:4, (296-316), Online publication date: 1-Apr-2008.
  63. Amir A, Aumann Y, Kapah O, Levy A and Porat E Approximate String Matching with Address Bit Errors Proceedings of the 19th annual symposium on Combinatorial Pattern Matching, (118-129)
  64. Xu L (2007). Project the wiki way, Journal of Computing Sciences in Colleges, 22:6, (109-116), Online publication date: 1-Jun-2007.
  65. ACM
    Shacham A, Bergman K and Carloni L The case for low-power photonic networks on chip Proceedings of the 44th annual Design Automation Conference, (132-135)
  66. Murphy R and Kogge P (2007). On the Memory Access Patterns of Supercomputer Applications, IEEE Transactions on Computers, 56:7, (937-945), Online publication date: 1-Jul-2007.
  67. Sugihara M, Ishihara T and Murakami K Task scheduling for reliable cache architectures of multiprocessor systems Proceedings of the conference on Design, automation and test in Europe, (1490-1495)
  68. Rhod E, Lisbôa C and Carro L A low-SER efficient core processor architecture for future technologies Proceedings of the conference on Design, automation and test in Europe, (1448-1453)
  69. Verma S, Harris I and Ramineni K Interactive presentation: Automatic generation of functional coverage models from behavioral verilog descriptions Proceedings of the conference on Design, automation and test in Europe, (900-905)
  70. ACM
    Dominguez A, Nguyen N and Barua R Recursive function data allocation to scratch-pad memory Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems, (65-74)
  71. ACM
    Koc H, Kandemir M, Ercanli E and Ozturk O Reducing off-chip memory access costs using data recomputation in embedded chip multi-processors Proceedings of the 44th annual Design Automation Conference, (224-229)
  72. ACM
    Ali A, Johnsson L and Subhlok J Scheduling FFT computation on SMP and multicore systems Proceedings of the 21st annual international conference on Supercomputing, (293-301)
  73. ACM
    Nesbit K, Laudon J and Smith J (2007). Virtual private caches, ACM SIGARCH Computer Architecture News, 35:2, (57-68), Online publication date: 9-Jun-2007.
  74. ACM
    Nesbit K, Laudon J and Smith J Virtual private caches Proceedings of the 34th annual international symposium on Computer architecture, (57-68)
  75. ACM
    Sasanka R, Li M, Adve S, Chen Y and Debes E (2007). ALP, ACM Transactions on Architecture and Code Optimization, 4:1, (3-es), Online publication date: 1-Mar-2007.
  76. Li X and Parashar M (2007). Hybrid Runtime Management of Space-Time Heterogeneity for Parallel Structured Adaptive Applications, IEEE Transactions on Parallel and Distributed Systems, 18:9, (1202-1214), Online publication date: 1-Sep-2007.
  77. Qin X (2007). Design and analysis of a load balancing strategy in data grids, Future Generation Computer Systems, 23:1, (132-137), Online publication date: 1-Jan-2007.
  78. Yang H, Ziavras S and Hu J (2007). Reconfiguration support for vector operations, International Journal of High Performance Systems Architecture, 1:2, (89-97), Online publication date: 1-Oct-2007.
  79. Marescaux T, Brockmeyer E and Corporaal H The Impact of Higher Communication Layers on NoC Supported MP-SoCs Proceedings of the First International Symposium on Networks-on-Chip, (107-116)
  80. Lin T, Lin H, Chao C, Liu C and Jen C (2006). A Compact DSP Core with Static Floating-Point Arithmetic, Journal of VLSI Signal Processing Systems, 42:2, (127-138), Online publication date: 1-Feb-2006.
  81. Andrews J and Baker N (2006). Xbox 360 System Architecture, IEEE Micro, 26:2, (25-37), Online publication date: 1-Mar-2006.
  82. Yang X and Vaidya N (2006). A Wireless MAC Protocol Using Implicit Pipelining, IEEE Transactions on Mobile Computing, 5:3, (258-273), Online publication date: 1-Mar-2006.
  83. Zheng K, Che H, Wang Z, Liu B and Zhang X (2006). DPPC-RE, IEEE Transactions on Computers, 55:8, (947-961), Online publication date: 1-Aug-2006.
  84. Kwak J, Jhang S and Jhon C Accuracy enhancement by selective use of branch history in embedded processor Proceedings of the 6th international conference on Computational Science - Volume Part IV, (979-986)
  85. Kwak J and Jhon C Recovery logics for speculative update global and local branch history Proceedings of the 21st international conference on Computer and Information Sciences, (258-266)
  86. Bariamis D, Iakovidis D and Maroulis D Dedicated hardware for real-time computation of second-order statistical features for high resolution images Proceedings of the 8th international conference on Advanced Concepts For Intelligent Vision Systems, (67-77)
  87. Dolev S and Haviv Y (2006). Self-Stabilizing Microprocessor, IEEE Transactions on Computers, 55:4, (385-399), Online publication date: 1-Apr-2006.
  88. Cérin C, Koskas M, Fkaier H and Jemni M (2006). Sequential in-core sorting performance for a SQL data service and for parallel sorting on heterogeneous clusters, Future Generation Computer Systems, 22:7, (776-783), Online publication date: 1-Aug-2006.
  89. Zhu Y and Jiang H (2006). CEFT, Journal of Parallel and Distributed Computing, 66:2, (291-306), Online publication date: 1-Feb-2006.
  90. ACM
    Mendes J, Coutinho L and Martins C Web memory hierarchy learning and research environment Proceedings of the 2006 workshop on Computer architecture education: held in conjunction with the 33rd International Symposium on Computer Architecture, (5-es)
  91. ACM
    Gill G, Hansen J and Singh M Loop pipelining for high-throughput stream computation using self-timed rings Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design, (289-296)
  92. ACM
    Bellens P, Perez J, Badia R and Labarta J CellSs Proceedings of the 2006 ACM/IEEE conference on Supercomputing, (86-es)
  93. ACM
    Gilbert J and Abrahamson D Adaptive object code compression Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems, (282-292)
  94. ACM
    Adams K and Agesen O (2006). A comparison of software and hardware techniques for x86 virtualization, ACM SIGARCH Computer Architecture News, 34:5, (2-13), Online publication date: 20-Oct-2006.
  95. ACM
    Adams K and Agesen O (2006). A comparison of software and hardware techniques for x86 virtualization, ACM SIGPLAN Notices, 41:11, (2-13), Online publication date: 1-Nov-2006.
  96. ACM
    Adams K and Agesen O (2006). A comparison of software and hardware techniques for x86 virtualization, ACM SIGOPS Operating Systems Review, 40:5, (2-13), Online publication date: 20-Oct-2006.
  97. ACM
    Adams K and Agesen O A comparison of software and hardware techniques for x86 virtualization Proceedings of the 12th international conference on Architectural support for programming languages and operating systems, (2-13)
  98. ACM
    Bardine A, Bechini A, Foglia P and Prete C (2005). Analysis of embedded video coder systems, ACM SIGARCH Computer Architecture News, 34:1, (71-76), Online publication date: 1-Mar-2006.
  99. ACM
    Chiyonobu A and Sato T (2005). Energy-efficient instruction scheduling utilizing cache miss information, ACM SIGARCH Computer Architecture News, 34:1, (65-70), Online publication date: 1-Mar-2006.
  100. ACM
    Yue Y, Lin C and Tan Z (2005). NPCryptBench, ACM SIGARCH Computer Architecture News, 34:1, (49-56), Online publication date: 1-Mar-2006.
  101. ACM
    Chen W, Bhansali S, Chilimbi T, Gao X and Chuang W Profile-guided proactive garbage collection for locality optimization Proceedings of the 27th ACM SIGPLAN Conference on Programming Language Design and Implementation, (332-340)
  102. ACM
    Chen W, Bhansali S, Chilimbi T, Gao X and Chuang W (2006). Profile-guided proactive garbage collection for locality optimization, ACM SIGPLAN Notices, 41:6, (332-340), Online publication date: 11-Jun-2006.
  103. ACM
    Koo H and Mishra P Test generation using SAT-based bounded model checking for validation of pipelined processors Proceedings of the 16th ACM Great Lakes symposium on VLSI, (362-365)
  104. Ou S, Lin T, Huang C, Kuo Y, Chao C, Liu C and Jen C A 52mW 1200MIPS compact DSP for multi-core media SoC Proceedings of the 2006 Asia and South Pacific Design Automation Conference, (118-119)
  105. Heffernan M, Wilken K and Shobaki G Data-Dependency Graph Transformations for Superblock Scheduling Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, (77-88)
  106. Velev M Formal Verification of Pipelined Microprocessors with Delayed Branches Proceedings of the 7th International Symposium on Quality Electronic Design, (296-299)
  107. Sugihara M, Ishihara T, Muroyama M and Hashimoto K A Simulation-Based Soft Error Estimation Methodology for Computer Systems Proceedings of the 7th International Symposium on Quality Electronic Design, (196-203)
  108. Velev M Using Abstraction for Efficient Formal Verification of Pipelined Processors with Value Prediction Proceedings of the 7th International Symposium on Quality Electronic Design, (51-56)
  109. Billerbeck B and Zobel J (2006). Efficient query expansion with auxiliary data structures, Information Systems, 31:7, (573-584), Online publication date: 1-Nov-2006.
  110. Koukis E and Koziris N Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1, (345-354)
  111. Vedantham R, Zhuang Z and Sivakumar R (2006). Hazard avoidance in wireless sensor and actor networks, Computer Communications, 29:13-14, (2578-2598), Online publication date: 1-Aug-2006.
  112. Zhang C, Zhou H, Zhang M and Xing Z An architectural leakage power reduction method for instruction cache in ultra deep submicron microprocessors Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture, (588-594)
  113. Amir A Asynchronous pattern matching Proceedings of the 17th Annual conference on Combinatorial Pattern Matching, (1-10)
  114. Wang F, Zhang S, Feng D, Jiang H, Zeng L and Lv S A hybrid scheme for object allocation in a distributed object-storage system Proceedings of the 6th international conference on Computational Science - Volume Part IV, (396-403)
  115. Di Blas A, Dahle D, Diekhans M, Grate L, Hirschberg J, Karplus K, Keller H, Kendrick M, Mesa-Martinez F, Pease D, Rice E, Schultz A, Speck D and Hughey R (2005). The UCSC Kestrel Parallel Processor, IEEE Transactions on Parallel and Distributed Systems, 16:1, (80-92), Online publication date: 1-Jan-2005.
  116. ACM
    Roy A, Panda S, Kumar R and Chakrabarti P (2005). A framework for systematic validation and debugging of pipeline simulators, ACM Transactions on Design Automation of Electronic Systems, 10:3, (462-491), Online publication date: 1-Jul-2005.
  117. ACM
    Gunawi H, Agrawal N, Arpaci-Dusseau A, Arpaci-Dusseau R and Schindler J (2005). Deconstructing Commodity Storage Clusters, ACM SIGARCH Computer Architecture News, 33:2, (60-71), Online publication date: 1-May-2005.
  118. Velev M Automatic formal verification of liveness for pipelined processors with multicycle functional units Proceedings of the 13th IFIP WG 10.5 international conference on Correct Hardware Design and Verification Methods, (97-113)
  119. Gunawi H, Agrawal N, Arpaci-Dusseau A, Arpaci-Dusseau R and Schindler J Deconstructing Commodity Storage Clusters Proceedings of the 32nd annual international symposium on Computer Architecture, (60-71)
  120. Haga S, Reeves N, Barua R and Marculescu D (2005). Dynamic functional unit assignment for low power, The Journal of Supercomputing, 31:1, (47-62), Online publication date: 1-Jan-2005.
  121. ACM
    Sinha R and Zobel J (2005). Using random sampling to build approximate tries for efficient string sorting, ACM Journal of Experimental Algorithmics, 10, (2.10-es), Online publication date: 31-Dec-2005.
  122. Papathanasiou A and Scott M Aggressive prefetching Proceedings of the 10th conference on Hot Topics in Operating Systems - Volume 10, (6-6)
  123. Bardine A, Bechini A, Foglia P and Prete C Analysis of embedded video coder systems Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture, (71-76)
  124. Chiyonobu A and Sato T Energy-efficient instruction scheduling utilizing cache miss information Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture, (65-70)
  125. Yue Y, Lin C and Tan Z NPCryptBench Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications, systems and architecture, (49-56)
  126. ACM
    Rao W, Orailoglu A and Karri R Fault tolerant nanoelectronic processor architectures Proceedings of the 2005 Asia and South Pacific Design Automation Conference, (311-316)
  127. ACM
    Velev M Comparison of schemes for encoding unobservability in translation to SAT Proceedings of the 2005 Asia and South Pacific Design Automation Conference, (1056-1059)
  128. ACM
    Hasan J and Vijaykumar T (2005). Dynamic pipelining, ACM SIGCOMM Computer Communication Review, 35:4, (205-216), Online publication date: 1-Oct-2005.
  129. ACM
    Zennaro M and Sengupta R Distributing synchronous programs using bounded queues Proceedings of the 5th ACM international conference on Embedded software, (325-334)
  130. ACM
    Hasan J and Vijaykumar T Dynamic pipelining Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications, (205-216)
  131. ACM
    Gulati A and Varman P Lexicographic QoS scheduling for parallel I/O Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures, (29-38)
  132. ACM
    Johnson J, Krandick W and Ruslanov A Architecture-aware classical Taylor shift by 1 Proceedings of the 2005 international symposium on Symbolic and algebraic computation, (200-207)
  133. ACM
    Zhong L and Jha N Energy efficiency of handheld computer interfaces Proceedings of the 3rd international conference on Mobile systems, applications, and services, (247-260)
  134. ACM
    Schaumont P, Lai B, Qin W and Verbauwhede I Cooperative multithreading on embedded multiprocessor architectures enables energy-scalable design Proceedings of the 42nd annual Design Automation Conference, (27-30)
  135. ACM
    Zhang C, Vahid F, Yang J and Najjar W (2005). A way-halting cache for low-energy high-performance systems, ACM Transactions on Architecture and Code Optimization, 2:1, (34-54), Online publication date: 1-Mar-2005.
  136. ACM
    Lin T, Chao C, Liu C, Hsiao P, Chen S, Lin L, Liu C and Jen C A unified processor architecture for RISC & VLIW DSP Proceedings of the 15th ACM Great Lakes symposium on VLSI, (50-55)
  137. Hashempour H and Lombardi F (2005). Application of Arithmetic Coding to Compression of VLSI Test Data, IEEE Transactions on Computers, 54:9, (1166-1177), Online publication date: 1-Sep-2005.
  138. Petrou D, Gibson G and Ganger G Scheduling speculative tasks in a compute farm Proceedings of the 2005 ACM/IEEE conference on Supercomputing
  139. Vuletic M, Pozzi L and Ienne P (2005). Seamless Hardware-Software Integration in Reconfigurable Computing Systems, IEEE Design & Test, 22:2, (102-113), Online publication date: 1-Mar-2005.
  140. Heffernan M and Wilken K (2005). Data-Dependency Graph Transformations for Instruction Scheduling, Journal of Scheduling, 8:5, (427-451), Online publication date: 1-Oct-2005.
  141. Schoeberl M Design and Implementation of an Efficient Stack Machine Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 3 - Volume 04
  142. Datta A, Bhunia S, Mukhopadhyay S, Banerjee N and Roy K Statistical Modeling of Pipeline Delay and Design of Pipeline under Process Variation to Enhance Yield in sub-100nm Technologies Proceedings of the conference on Design, Automation and Test in Europe - Volume 2, (926-931)
  143. Herruzo E, Mesones A, Benavides J, Plata O and Zapata E Distributed architecture system for computer performance testing Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics, (140-147)
  144. Athanasaki E, Kourtis K, Anastopoulos N and Koziris N Tuning blocked array layouts to exploit memory hierarchy in SMT architectures Proceedings of the 10th Panhellenic conference on Advances in Informatics, (600-610)
  145. Butt A, Johnson T, Zheng Y and Hu Y Kosha Proceedings of the 2004 ACM/IEEE conference on Supercomputing
  146. Togawa N, Tachikake K, Miyaoka Y, Yanagisawa M and Ohtsuki T Instruction set and functional unit synthesis for SIMD processor cores Proceedings of the 2004 Asia and South Pacific Design Automation Conference, (743-750)
  147. Candea G, Cutler J and Fox A (2004). Improving availability with recursive microreboots, Performance Evaluation, 56:1-4, (213-248), Online publication date: 1-Mar-2004.
  148. Kumar R, Tullsen D, Ranganathan P, Jouppi N and Farkas K Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance Proceedings of the 31st annual international symposium on Computer architecture
  149. Zhang Y and Chakrabarty K Task Feasibility Analysis and Dynamic Voltage Scaling in Fault-Tolerant Real-Time Embedded Systems Proceedings of the conference on Design, automation and test in Europe - Volume 2
  150. Velev M Exploiting Signal Unobservability for Efficient Translation to CNF in Formal Verification of Microprocessors Proceedings of the conference on Design, automation and test in Europe - Volume 1
  151. Chen M, Accardi A, Kiciman E, Lloyd J, Patterson D, Fox A and Brewer E Path-based failure and evolution management Proceedings of the 1st conference on Symposium on Networked Systems Design and Implementation - Volume 1, (23-23)
  152. Deng Y and Maly W 2.5D system integration Proceedings of the 2004 Asia and South Pacific Design Automation Conference, (450-455)
  153. Velev M Using positive equality to prove liveness for pipelined microprocessors Proceedings of the 2004 Asia and South Pacific Design Automation Conference, (316-321)
  154. Velev M Efficient translation of boolean formulas to CNF in formal verification of microprocessors Proceedings of the 2004 Asia and South Pacific Design Automation Conference, (310-315)
  155. ACM
    Johnson T, Eigenmann R and Vijaykumar T (2004). Min-cut program decomposition for thread-level speculation, ACM SIGPLAN Notices, 39:6, (59-70), Online publication date: 9-Jun-2004.
  156. ACM
    Johnson T, Eigenmann R and Vijaykumar T Min-cut program decomposition for thread-level speculation Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation, (59-70)
  157. ACM
    Vuletić M, Pozzi L and Ienne P Virtual memory window for application-specific reconfigurable coprocessors Proceedings of the 41st annual Design Automation Conference, (948-953)
  158. ACM
    Netto E, Azevedo R, Centoducatte P and Araujo G Multi-profile based code compression Proceedings of the 41st annual Design Automation Conference, (244-249)
  159. ACM
    Velev M Efficient formal verification of pipelined processors with instruction queues Proceedings of the 14th ACM Great Lakes symposium on VLSI, (92-95)
  160. ACM
    Lin T, Lin H, Chao C, Liu C and Jen C A compact DSP core with static floating-point unit & its microcode generation Proceedings of the 14th ACM Great Lakes symposium on VLSI, (57-60)
  161. ACM
    Metzgen P A high performance 32-bit ALU for programmable logic Proceedings of the 2004 ACM/SIGDA 12th international symposium on Field programmable gate arrays, (61-70)
  162. ACM
    Branovic I, Giorgi R and Martinelli E WebMIPS Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture, (19-es)
  163. ACM
    Bečvář M Teaching basics of instruction pipelining with HDLDLX Proceedings of the 2004 workshop on Computer architecture education: held in conjunction with the 31st International Symposium on Computer Architecture, (16-es)
  164. ACM
    Chihaia I and Gross T An analytical model for software-only main memory compression Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture, (107-113)
  165. ACM
    Smolens J, Gold B, Kim J, Falsafi B, Hoe J and Nowatzyk A (2004). Fingerprinting, ACM SIGOPS Operating Systems Review, 38:5, (224-234), Online publication date: 1-Dec-2004.
  166. ACM
    Smolens J, Gold B, Kim J, Falsafi B, Hoe J and Nowatzyk A (2004). Fingerprinting, ACM SIGARCH Computer Architecture News, 32:5, (224-234), Online publication date: 1-Dec-2004.
  167. ACM
    Smolens J, Gold B, Kim J, Falsafi B, Hoe J and Nowatzyk A (2004). Fingerprinting, ACM SIGPLAN Notices, 39:11, (224-234), Online publication date: 1-Nov-2004.
  168. ACM
    Kumar R, Tullsen D, Ranganathan P, Jouppi N and Farkas K (2004). Single-ISA Heterogeneous Multi-Core Architectures for Multithreaded Workload Performance, ACM SIGARCH Computer Architecture News, 32:2, (64), Online publication date: 2-Mar-2004.
  169. ACM
    Smolens J, Gold B, Kim J, Falsafi B, Hoe J and Nowatzyk A Fingerprinting Proceedings of the 11th international conference on Architectural support for programming languages and operating systems, (224-234)
  170. ACM
    Berekovic M, Moch S and Pirsch P (2003). A scalable, clustered SMT processor for digital signal processing, ACM SIGARCH Computer Architecture News, 32:3, (62-69), Online publication date: 1-Jun-2004.
  171. ACM
    Biswas S, Simpson M and Barua R Memory overflow protection for embedded systems using run-time checks, reuse and compression Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, (280-291)
  172. ACM
    Mathew B, Davis A and Parker M A low power architecture for embedded perception Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems, (46-56)
  173. ACM
    Citron D, Haber G and Levin R Reducing program image size by extracting frozen code and data Proceedings of the 4th ACM international conference on Embedded software, (297-305)
  174. ACM
    Sica F, Coelho C, Nacif J, Foster H and Fernandes A Exception handling in microprocessors using assertion libraries Proceedings of the 17th symposium on Integrated circuits and system design, (55-59)
  175. ACM
    Choi K, Soma R and Pedram M Dynamic voltage and frequency scaling based on workload decomposition Proceedings of the 2004 international symposium on Low power electronics and design, (174-179)
  176. ACM
    Zhang C, Vahid F, Yang J and Najjar W A way-halting cache for low-energy high-performance systems Proceedings of the 2004 international symposium on Low power electronics and design, (126-131)
  177. ACM
    Oliver J, Akella V and Chong F Efficient orchestration of sub-word parallelism in media processors Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures, (225-234)
  178. Fields B, Bodík R, Hill M and Newburn C (2004). Interaction Cost, IEEE Micro, 24:6, (57-61), Online publication date: 1-Nov-2004.
  179. Samavi S, Shirani S, Karimi N and Deen M (2004). A Pipeline Architecture for Processing of DNA Microarrays Images, Journal of VLSI Signal Processing Systems, 38:3, (287-297), Online publication date: 1-Nov-2004.
  180. Yim K, Lee J, Kim J, Kim S and Koh K A space-efficient on-chip compressed cache organization for high performance computing Proceedings of the Second international conference on Parallel and Distributed Processing and Applications, (952-964)
  181. Aggarwal A Single FU bypass networks for high clock rate superscalar processors Proceedings of the 11th international conference on High Performance Computing, (319-332)
  182. Liu G, Xia F, Yang X, Zhou H, Zhao H and Deng Y The design and performance analysis of embedded parallel multiprocessing system Proceedings of the First international conference on Embedded Software and Systems, (210-215)
  183. ACM
    Tachikake K, Togawa N, Miyaoka Y, Choi J, Yanagisawa M and Ohtsuki T A hardware/software partitioning algorithm for SIMD processor cores Proceedings of the 2003 Asia and South Pacific Design Automation Conference, (135-140)
  184. Egan C, Steven G, Quick P, Anguera R, Steven F and Vintan L (2003). Two-level branch prediction using neural networks, Journal of Systems Architecture: the EUROMICRO Journal, 49:12-15, (557-570), Online publication date: 1-Dec-2003.
  185. Fields B, Bodík R, Hill M and Newburn C Using Interaction Costs for Microarchitectural Bottleneck Analysis Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
  186. Denning P Virtual memory Encyclopedia of Computer Science, (1832-1835)
  187. ACM
    Udayakumaran S and Barua R Compiler-decided dynamic memory allocation for scratch-pad based embedded systems Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (276-286)
  188. ACM
    Goodwin D and Petkov D Automatic generation of application specific processors Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems, (137-147)
  189. ACM
    Dufour B, Driesen K, Hendren L and Verbrugge C (2003). Dynamic metrics for java, ACM SIGPLAN Notices, 38:11, (149-168), Online publication date: 26-Nov-2003.
  190. ACM
    Dufour B, Driesen K, Hendren L and Verbrugge C Dynamic metrics for java Proceedings of the 18th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications, (149-168)
  191. ACM
    Chen M and Olukotun K (2003). The Jrpm system for dynamically parallelizing Java programs, ACM SIGARCH Computer Architecture News, 31:2, (434-446), Online publication date: 1-May-2003.
  192. ACM
    Kozyrakis C and Patterson D (2003). Overcoming the limitations of conventional vector processors, ACM SIGARCH Computer Architecture News, 31:2, (399-409), Online publication date: 1-May-2003.
  193. ACM
    Ernst D, Hamel A and Austin T (2003). Cyclone, ACM SIGARCH Computer Architecture News, 31:2, (253-263), Online publication date: 1-May-2003.
  194. ACM
    Chen M and Olukotun K The Jrpm system for dynamically parallelizing Java programs Proceedings of the 30th annual international symposium on Computer architecture, (434-446)
  195. ACM
    Kozyrakis C and Patterson D Overcoming the limitations of conventional vector processors Proceedings of the 30th annual international symposium on Computer architecture, (399-409)
  196. ACM
    Ernst D, Hamel A and Austin T Cyclone Proceedings of the 30th annual international symposium on Computer architecture, (253-263)
  197. ACM
    Aziz A, Prakash A and Ramachandran V A near optimal scheduler for switch-memory-switch routers Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures, (343-352)
  198. ACM
    Becvar M, Pluhacek A and Danecek J DOP Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture, (4-es)
  199. ACM
    Cornea M, Harrison J and Tang P Intel® Itanium® floating-point architecture Proceedings of the 2003 workshop on Computer architecture education: Held in conjunction with the 30th International Symposium on Computer Architecture, (3-es)
  200. ACM
    Berekovic M, Moch S and Pirsch P A scalable, clustered SMT processor for digital signal processing Proceedings of the 2003 workshop on MEmory performance: DEaling with Applications, systems and architecture, (62-69)
  201. Velev M and Bryant R (2003). Effective use of boolean satisfiability procedures in the formal verification of superscalar and VLIW microprocessors, Journal of Symbolic Computation, 35:2, (73-106), Online publication date: 1-Feb-2003.
  202. Venkateswaran N and Chandramouli C General purpose processor architecture for modeling stochastic biological neuronal assemblies Proceedings of the 5th international conference on Evolvable systems: from biology to hardware, (387-397)
  203. Song D, Heywood M and Zincir-Heywood A A linear genetic programming approach to intrusion detection Proceedings of the 2003 international conference on Genetic and evolutionary computation: Part II, (2325-2336)
  204. ACM
    Pisharath J and Choudhary A An integrated approach to reducing power dissipation in memory hierarchies Proceedings of the 2002 international conference on Compilers, architecture, and synthesis for embedded systems, (88-97)
  205. ACM
    Brorsson M MipsIt Proceedings of the 2002 workshop on Computer architecture education: Held in conjunction with the 29th International Symposium on Computer Architecture, (12-es)
  206. ACM
    Herath J, Ramnath S, Herath A and Herath S An active learning environment for intermediate computer architecture courses Proceedings of the 2002 workshop on Computer architecture education: Held in conjunction with the 29th International Symposium on Computer Architecture, (8-es)
  207. ACM
    Miyaoka Y, Kataoka Y, Togawa N, Yanagisawa M and Ohtsuki T Area/delay estimation for digital signal processor cores Proceedings of the 2001 Asia and South Pacific Design Automation Conference, (156-161)
  208. Rhea S, Wells C, Eaton P, Geels D, Zhao B, Weatherspoon H and Kubiatowicz J (2001). Maintenance-Free Global Data Storage, IEEE Internet Computing, 5:5, (40-49), Online publication date: 1-Sep-2001.
  209. ACM
    Vajracharya S and Grunwald D Loop re-ordering and pre-fetching at run-time Proceedings of the 1997 ACM/IEEE conference on Supercomputing, (1-13)
  210. Szymanski T (1997). Design Principles for Practical Self-Routing Nonblocking Switching Networks with O(N · log N) Bit-Complexity, IEEE Transactions on Computers, 46:10, (1057-1069), Online publication date: 1-Oct-1997.
  211. ACM
    Gupta R Analysis of operation delay and execution rate constraints for embedded systems Proceedings of the 33rd annual Design Automation Conference, (601-604)
  212. ACM
    Wulf W and McKee S (1995). Hitting the memory wall, ACM SIGARCH Computer Architecture News, 23:1, (20-24), Online publication date: 1-Mar-1995.
  213. ACM
    Kuga M, Murakami K and Tomita S (1991). DSNS (dynamically-hazard-resolved statically-code-scheduled, nonuniform superscalar), ACM SIGARCH Computer Architecture News, 19:4, (14-29), Online publication date: 1-Jul-1991.
Contributors
  • Stanford University
  • University of California, Berkeley

Reviews

Diego R. Llanos

Since its first edition in 1990, this book has rapidly become a reference text in most advanced computer architecture courses all over the world. The idea of writing a book to explain computer architecture from the performance point of view has proved its usefulness, allowing students and engineers to better understand the tradeoffs involved in the design of a computer system. The second edition, published in 1996 [1], updated the first edition’s contents, adding many new advances in hardware, from pipelined instruction-level parallelism to shared-bus multiprocessors to computer interconnection technologies. This third edition presents a better organization of subjects in each chapter.

To fully understand the contents, a good background in computer architecture is needed. Most of the introductory material needed is covered by Computer organization and design: the hardware-software interface [2], an excellent introductory book on computer architecture written by the same authors, and has therefore been removed from this book or moved to its appendices. Most of the high-quality appendices are available online, an excellent idea that keeps the book to a reasonable size.

Each chapter concludes with several common sections: “Putting It All Together” presents real-world examples of the topics covered; “Fallacies and Pitfalls” shows common mistakes and architectural traps; and “Historical Perspective and Further Reading” provides an absorbing description of the historic evolution of the concepts presented, together with an excellent collection of useful references and many exercises with different degrees of difficulty, some of them with their solutions.

This huge work (1200 pages) is structured in eight chapters plus the appendices. Chapter 1, “Fundamentals of Computer Design,” presents the evolution of computing performance over the years and establishes an initial taxonomy of computer markets. This new classification, extensively referenced in the rest of the book, defines three categories: desktop computing, servers, and embedded computers. The last of these is a field that was not covered in the previous edition. The importance of the embedded market, with its rapid growth rate in the last few years, is immense, and its several distinguishing factors (power consumption, real-time requirements, lack of huge amounts of memory) justify a detailed study. Relationships between cost, price, and performance are also analyzed here. The chapter also describes some basic principles of computer design, and finishes with a discussion of performance and price-performance in relation to each of the computer categories mentioned above.

Chapter 2, “Instruction Set Principles and Examples,” classifies different characteristics found in an instruction set, such as the memory addressing scheme, type and size of operands, operations, and flow control. This new edition also shows those concepts in the context of digital signal processors (DSP) and media processors. As in the previous edition, the RISC concept is extensively described, together with the role of compilers in obtaining better performance. The MIPS architecture is described as a classic example of a RISC machine, replacing the DLX architecture used in the previous edition. The Trimedia TM32 processor, dedicated to multimedia processing, is also analyzed here, although some concepts of very long instruction word (VLIW) architectures, which the book defers until chapter 4, would help the reader follow the description.

Chapter 3, “Instruction-Level Parallelism and its Dynamic Exploitation,” describes some of the problems associated with the use of pipelining, branch prediction, and hardware-based speculation in the design of a new architecture. To fully understand these advanced topics, it might be necessary to refresh some pipelining concepts by reading the material provided in Appendix A. This appendix, together with chapter 3, condenses the material covered in chapters 3 and 4 of the previous edition, also reducing the overlap with the content presented in the authors’ introductory book [2].

Chapter 4, “Exploiting Instruction-Level Parallelism with Software Approaches,” shows how advanced compiler techniques can improve the performance of pipelines and multi-issue processors. The VLIW approach is also presented here, as a software solution to avoid dependency checking by the hardware. Many pages are devoted to the description of compiler techniques to exploit inherent parallelism, and how the hardware can help. It is clear that compiling for processors with significant amounts of ILP has become quite complex. The chapter concludes with an extensive description of the Intel IA-64 architecture and Itanium processor, and the Trimedia TM32 and the Transmeta Crusoe chip as examples of VLIW in the embedded space.

The coverage of memory hierarchy design in chapter 5 is simply perfect, with a complete review of the components of cache performance (miss penalty, miss rate, and hit time), together with a clear enumeration of techniques to improve each one of them. Mechanisms that reduce miss penalty or miss rate by overlapping memory accesses with instruction execution are also described. The organization of this chapter is excellent, allowing the reader to fully understand the impact of each technique on different aspects of the cache design. The chapter continues with a description of main memory organizations that help reduce latency and supply higher bandwidth; a survey of memory technology; and an introduction to virtual memory and its relationship with caches. As in other chapters, this chapter concludes with the description of real-world examples of the topics discussed; in this case, the memory hierarchy of the Alpha 21264 and the Emotion Engine of the Sony PlayStation 2 are covered, together with the Sun Fire 6800 server as an example from the server market.

Multiprocessors and thread-level parallelism are covered in chapter 6, after which the discussion turns to storage systems and network technologies. As the authors say, multiprocessor architecture is a large and diverse field that would require an additional volume: their intention is only to focus on the mainstream of multiprocessor design. Chapter 6 contains most of the material about multiprocessors included in chapter 8 of the previous edition. A commercial workload has been added to the scientific workload that was used in the previous edition to show the behavior and performance of symmetric and distributed shared-memory architectures. A new section on multithreading architectures and their challenges is included here. Sun’s Wildfire prototype is the example chosen to show how the advantages of centralized and distributed shared-memory architectures can be combined, presenting a uniform access to memory while allowing good scalability.

Chapters 7 and 8 present storage systems and network technologies. In addition to the topics already presented in the second edition (buses, I/O performance measures, RAID systems), chapter 7 includes the study of failures in storage systems, with some real-world examples and statistics that are hard to find elsewhere. Benchmarking of storage performance is also included here. The design of an I/O system is more effectively explained than in the previous edition, with more elaborate examples. Chapter 8 presents the basic concepts of networking from the computer architect’s point of view. Three new sections deal with cluster technology, with a discussion of its performance challenges and some recommendations on cluster design with examples, followed by an extremely interesting description of the cluster of PCs used by the Google search engine. The chapter concludes with an additional example from the embedded world; in this case, some wireless networking concepts and the anatomy of a digital cell phone.

Previous editions of this book have become a standard reference for advanced computer architecture courses and for practitioners of computer design. This new edition updates its contents to reflect the rapid evolution of the discipline and presents an improved organization of the information, especially in the chapters devoted to instruction-level parallelism and memory hierarchy. If the previous editions quickly became the reference book in the field, this new edition will surely be at least as successful as they were.

Online Computing Reviews Service

Fernando Berzal

This excellent book is the third edition of a classic that began its journey with two previous editions in the 1990s. Suffice it to say that, in computer architecture and related subjects, particularly in the study of computer design and organization, this is THE advanced textbook. If you studied computer design a few years ago, and you want to keep up to date with the latest trends and advances, you should definitely buy and read this book. If, however, you are only beginning to make some inroads into the field of computer architecture, maybe you should start with the authors’ other textbook [1]. That text is more light-hearted, and will probably spur your interest in the field and make you ask for more in the future, while this book is intended for those knowledgeable people who expect more than simplistic descriptions of the fundamentals.

The book advocates a quantitative approach, based on measurement. Any design decision should be made after extensive simulations and proper measurements on actual examples, not just on cleverly devised scenarios that tend to bias the experimental results. Often, easy-to-perform, back-of-the-envelope calculations will suffice to evaluate competing designs. Hunches and vested (mainly commercial) interests should never replace a quantitative evaluation, or a study of cost-performance-power trade-offs.

Following this approach, the book tackles instruction set design, studies instruction-level parallelism (both from a hardware and from a software point of view), looks at memory hierarchies in detail (a topic whose origin dates back to 1946), and analyzes the higher-level parallelism found in multiprocessors and clusters. This lengthy book even overviews storage systems and introduces networking topics. All of this is seasoned with thought-provoking comments (see, for instance, “Throughput Versus Response Time,” pp. 717-719), cleverly written introductions to ancillary topics (such as queuing theory, on p. 720), and insightful discussions to broaden your perspective (for example, in “Studies on the Limitations of ILP [instruction-level parallelism],” on p. 240).

Parallelism is a recurring theme throughout the book. In fact, the advantages of parallel execution were already proposed in 1842 (p. 652), the era of Charles Babbage’s analytical engine. While there have been continuous technological advances during the last decades, computer performance has maintained a growth rate above and beyond the growth due to these technology improvements. This is because of clever ideas that make use of parallelism at the instruction and thread level. Pipelined, superscalar, very long instruction word (VLIW), explicitly parallel instruction computing (EPIC), simultaneous multithreading (SMT), and vector processors are examined, as well as multiprocessors and multicomputers, such as clusters, thus covering the entire spectrum of hardware design alternatives. The authors also discuss techniques that are used in modern optimizing compilers to improve performance. A combination of varied techniques is necessary in order to keep improving performance, since parallelism at a given level is hampered both by the mismatch between hardware architecture and current software development approaches, and by the square law of computation [2], which states that, unless simplifications are made, the amount of computation involved (hardware complexity) increases at least as fast as the square of the size of the problem (instruction complexity).

Apart from its broad coverage of almost every topic that might be of interest to computer architects, the book stands out because of its almost unique style, which makes it an invaluable reference. The “Putting It All Together” and “Another View” sections present real examples of actual computer systems, from the desktop, server, and embedded markets. In these sections, readers will learn about the internal organization of the Intel P6 and NetBurst microarchitectures (covering microprocessors from the Pentium Pro to the Pentium 4), the IA-64 architecture (implemented by the Itanium processor), the cluster used by the Google search engine, the innards of the Sony PlayStation 2, the building blocks of a Sanyo digital camera, and even the architecture of a Nokia cell phone, as well as the MIPS instruction set, the Alpha 21264 memory hierarchy, and other interesting examples.

Other outstanding sections are present in all chapters. “Fallacies and Pitfalls” collects some wrongly believed assumptions and widely made mistakes, as a warning sign for practitioners. “Historical Perspective” tries to put everything in context, describing how ideas originated and evolved over time due to competing forces. Comprehensive lists of exercises, to verify what you have learned and hone your skills, close the book chapters, which are complemented by a few appendices containing additional material. Some of these appendices are available online (http://www.mkp.com/CA3), and address topics such as vector processors, computer arithmetic, and coherence protocols, as well as a wide range of instruction set architectures.

The book is definitely not for newbies, assuming they want to fully understand all the cross-cutting issues, as the title says, in those sections that discuss trade-offs and interactions among different aspects of a computer design. This apparent limitation on the target audience for the book, however, does not make it inaccessible to those new to the field. The book’s thorough and clear explanations fit the bill for a five-star textbook, which is worth its weight (and your wrist pain, if you get absorbed into reading this book for too long).

I am looking forward to the next edition of this book. There are people who, without enough time to peruse research journals and conference proceedings, will nevertheless always be interested in understanding the innards of the latest computers from an objective and critical point of view, which is not easily found in the trade publications.

Online Computing Reviews Service
